8.1. Understanding Psychological Measurement
By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim
What Is Measurement?
Measurement involves assigning scores to individuals in a way that those scores represent specific characteristics or traits of those individuals. This concept applies to everyday situations, such as stepping on a bathroom scale to measure weight or using a meat thermometer to check the internal temperature of a turkey. It is also central to scientific disciplines. In physics, for example, measuring an object’s potential energy involves determining its mass and height, then using a formula that includes Earth’s gravitational acceleration (9.8 m/s²) to calculate the final value. The resulting number represents the object’s potential energy.
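The physics example above uses the standard gravitational potential-energy formula, PE = m × g × h. As a minimal sketch, the calculation looks like this in Python. The function name and the 2 kg / 3 m example values are hypothetical, chosen only for illustration.

```python
G = 9.8  # Earth's gravitational acceleration in m/s², as given in the text

def potential_energy(mass_kg: float, height_m: float, g: float = G) -> float:
    """Gravitational potential energy in joules, using PE = m * g * h."""
    return mass_kg * g * height_m

# Hypothetical example: a 2 kg object held 3 m above the ground
pe = potential_energy(2.0, 3.0)
print(round(pe, 1))  # 58.8
```

The point is the same one made for psychological measurement: a fixed, explicit rule turns observations (mass, height) into a single score.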
This same principle applies to psychological measurement, also known as psychometrics. In psychology, the goal is to systematically assign scores to represent intangible traits or mental states. For instance, a cognitive psychologist interested in working memory capacity might use a backward digit span task. In this task, the psychologist reads a series of digits and asks the participant to repeat them in reverse order. The length of the longest digit sequence correctly repeated serves as the participant’s score, representing their working memory capacity.
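The scoring rule described above (the length of the longest sequence repeated correctly in reverse) can be sketched as follows. This is a simplified illustration under stated assumptions: the trial data are hypothetical, and real administrations usually add rules, such as several trials per length or stopping after repeated errors, that this sketch does not model.

```python
def is_correct_backward(presented: list[int], response: list[int]) -> bool:
    """A trial is correct if the response is the presented digits in reverse order."""
    return response == presented[::-1]

def backward_span_score(trials: list[tuple[list[int], list[int]]]) -> int:
    """Length of the longest presented sequence reversed correctly (0 if none)."""
    correct_lengths = [len(p) for p, r in trials if is_correct_backward(p, r)]
    return max(correct_lengths, default=0)

# Hypothetical (presented, response) pairs for one participant
trials = [
    ([3, 8], [8, 3]),
    ([5, 1, 9], [9, 1, 5]),
    ([7, 2, 6, 4], [4, 6, 2, 7]),
    ([1, 9, 3, 8, 5], [5, 8, 9, 3, 1]),  # error: 3 and 9 are swapped
]
print(backward_span_score(trials))  # 4
```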
Similarly, a clinical psychologist might want to measure depression levels. To do this, they could use the Beck Depression Inventory (BDI), a 21-item self-report questionnaire in which participants rate the extent to which they have experienced symptoms such as sadness or fatigue over the past two weeks. The total score from these ratings reflects the participant’s current level of depression.
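A total score like this is simply the sum of the item ratings. The sketch below assumes each of the 21 items is rated from 0 to 3 (as in the BDI-II), so totals range from 0 to 63. The responses are hypothetical, and the function is an illustration of summed scoring, not a clinical scoring tool.

```python
def bdi_total(item_ratings: list[int]) -> int:
    """Sum 21 item ratings, each assumed to be on a 0-3 scale."""
    if len(item_ratings) != 21:
        raise ValueError("expected 21 item ratings")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(item_ratings)

# Hypothetical responses: ten items rated 1, eight rated 0, three rated 2
ratings = [1] * 10 + [0] * 8 + [2] * 3
print(bdi_total(ratings))  # 16
```

Validating the input before summing reflects the chapter's point that measurement must follow a structured procedure: a missing or out-of-range rating should not silently change the score.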
The key takeaway is that measurement does not require a specific tool or instrument. Instead, it relies on a systematic method for assigning scores in a way that accurately reflects the characteristic being measured. Whether it is working memory, depression, or physical weight, the measurement process must follow a structured approach to ensure consistency and meaningful results.
Psychological Constructs
Some variables studied in psychology, such as age, height, weight, and birth order, are relatively easy to measure. For example, asking someone their age usually provides an accurate answer, and if someone is unsure or unwilling to share their weight, a bathroom scale offers an objective measurement. However, most psychological variables are not so simple to measure. You cannot determine someone’s intelligence just by looking at them, nor can you measure self-esteem with a scale. These more abstract variables are called constructs and include traits like extraversion, emotional states like fear, attitudes such as opinions on taxes, and abilities like athleticism.
Psychological constructs cannot be observed directly, for two reasons. First, they often represent tendencies to think, feel, or act in certain ways rather than observable actions at any given moment. For instance, saying that a student is highly extraverted does not mean she is always outgoing. At this moment, she might be sitting quietly reading a book. Instead, extraversion reflects her general tendency to be outgoing and socially engaged across various situations.
Second, constructs often involve internal processes that are not visible to an observer. For example, fear activates certain parts of the nervous system, triggers specific thoughts and feelings, and may lead to behaviours like avoiding danger, all of which may not be apparent to someone watching. Importantly, constructs like extraversion or fear are not reduced to one specific behaviour, thought, or biological response. Instead, each construct acts as a summary of a broader pattern of behaviours and internal processes.
A conceptual definition of a construct explains the behaviours and internal processes that make up that construct and outlines how it relates to other variables. For example, neuroticism is defined as a tendency to experience negative emotions like anxiety, anger, and sadness across different situations. This definition might also mention that neuroticism has a genetic basis, remains relatively stable over time, and is associated with a higher tendency to experience physical pain and other symptoms.
Students sometimes wonder why researchers do not simply rely on dictionary definitions for constructs like self-esteem or neuroticism. The reason is that scientific constructs often have no direct counterpart in everyday language. For example, working memory capacity is not a term you would typically find in casual conversation. More importantly, scientific definitions are far more detailed and precise than dictionary definitions. Researchers aim to create definitions that accurately reflect reality and are refined through empirical testing and adjustment based on evidence.
In psychology, it is common to find multiple definitions for the same construct in the research literature. This happens because researchers are continually testing, refining, and sometimes replacing older definitions with ones that better explain their findings. In some cases, there is an ongoing debate about which definition is most accurate. This iterative process is central to the scientific study of psychological constructs and helps ensure that these abstract ideas are measured and understood as precisely as possible.
Operational Definitions
An operational definition explains a variable in terms of how it is specifically measured in a study. Psychologists typically measure variables in three main ways: self-report measures, behavioural measures, and physiological measures.
In self-report measures, participants describe their own thoughts, feelings, or behaviours. For example, the Rosenberg Self-Esteem Scale asks people to rate statements about their self-worth.
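Self-report scales often mix positively and negatively worded statements, and the negatively worded ones are reverse-scored before summing so that a higher total consistently means more of the construct. The sketch below shows the idea. The 1–4 response range, the item numbers, and the set of reversed items are all hypothetical; they are not the Rosenberg scale's official scoring key.

```python
# Assumed response format: 1 = strongly disagree ... 4 = strongly agree (hypothetical)
SCALE_MIN, SCALE_MAX = 1, 4

def reverse_score(rating: int) -> int:
    """Flip a rating so that 1 becomes 4, 2 becomes 3, and so on."""
    return SCALE_MIN + SCALE_MAX - rating

def scale_total(responses: dict[int, int], reversed_items: set[int]) -> int:
    """Sum item ratings, reverse-scoring the negatively worded items first."""
    return sum(
        reverse_score(rating) if item in reversed_items else rating
        for item, rating in responses.items()
    )

# Hypothetical 4-item scale in which items 2 and 4 are negatively worded
responses = {1: 4, 2: 1, 3: 3, 4: 2}
print(scale_total(responses, reversed_items={2, 4}))  # 14
```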
In behavioural measures, researchers observe and record actions or behaviours. These observations can happen in controlled laboratory settings or in natural environments. For example, working memory capacity can be measured using a backward digit span task, where participants repeat numbers in reverse order. In a more natural setting, Albert Bandura and his colleagues measured physical aggression by observing children play with a Bobo doll. They counted specific aggressive behaviours, such as hitting, kicking, or punching the doll. The number of these actions within a set time frame served as the operational definition of aggression in their study.
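An operational definition like "number of aggressive acts within a set time frame" amounts to a counting rule applied to coded observations. The sketch below assumes a hypothetical coding scheme in which each observed event is recorded as (seconds since the session started, behaviour label); the labels, the event log, and the 10-minute window are illustrative, not Bandura's actual procedure.

```python
# Behaviours counted as physical aggression under this hypothetical coding scheme
AGGRESSIVE_ACTS = {"hit", "kick", "punch"}

def count_aggressive_acts(events: list[tuple[float, str]], window_s: float) -> int:
    """Count coded aggressive acts that occur within the first window_s seconds."""
    return sum(1 for t, act in events if act in AGGRESSIVE_ACTS and 0 <= t < window_s)

# Hypothetical event log for one child
events = [(12.0, "punch"), (45.5, "talk"), (80.2, "kick"), (130.0, "hit"), (1250.0, "punch")]
print(count_aggressive_acts(events, window_s=10 * 60))  # 3
```

The last punch falls outside the window and "talk" is not on the list, so neither is counted. Changing the list of acts or the window would change the operational definition itself.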
In physiological measures, researchers record biological processes such as heart rate, blood pressure, stress hormone levels, or brain activity. These measures provide objective data on participants’ physical responses to stimuli or conditions.
For any single construct, there are often multiple valid operational definitions. Stress is a good example. Conceptually, stress can be defined as an adaptive response to a perceived threat, involving physiological, emotional, and behavioural changes. However, stress has been measured in many ways:
- The Social Readjustment Rating Scale evaluates stress by assigning points to life events, such as divorce or job change, based on their severity.
- The Hassles and Uplifts Scale focuses on everyday stressors like misplacing items or worrying about weight.
- The Perceived Stress Scale asks participants how frequently they feel nervous or overwhelmed.
- Physiological measures, such as blood pressure or cortisol levels, provide biological markers of stress.
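A life-events measure such as the Social Readjustment Rating Scale turns a list of experiences into a score by summing points assigned to each event. The sketch below shows only that summing logic. The point values are hypothetical placeholders, not the published SRRS weights.

```python
# Hypothetical point values for illustration only; not the published SRRS weights
EVENT_POINTS = {"divorce": 70, "job change": 35, "moved house": 20}

def life_event_score(events: list[str], points: dict[str, int] = EVENT_POINTS) -> int:
    """Sum the points for each reported life event (unknown events raise KeyError)."""
    return sum(points[event] for event in events)

print(life_event_score(["divorce", "moved house"]))  # 90
```

Raising an error for an unlisted event, rather than scoring it as zero, keeps the operational definition explicit about which events count.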
When psychologists use multiple operational definitions for the same construct, either in one study or across different studies, they are applying the principle of converging operations. This approach assumes that different measures of the same construct should produce similar results.
For example, if different stress measures (e.g., self-report questionnaires and physiological indicators) correlate with each other and show consistent patterns, this strengthens confidence that the construct is being accurately measured. Studies have shown that various measures of stress all correlate with immune system functioning, reinforcing the conclusion that stress negatively affects immune health (Segerstrom & Miller, 2004).
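One common way to check whether two operational definitions converge is to correlate participants' scores on both. The sketch below computes Pearson's r by hand for two hypothetical stress measures (a questionnaire score and a physiological reading); the numbers are invented for illustration and are not data from any study. Python 3.10 and later also provide `statistics.correlation` for the same calculation.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five participants on two stress measures
questionnaire = [10, 14, 18, 22, 30]
physiological = [8.1, 9.0, 11.2, 12.5, 15.0]
r = pearson_r(questionnaire, physiological)
print(round(r, 2))
```

A strong positive r in real data would be evidence of convergence; a weak one would suggest the measures capture different things.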
Levels of Measurement
Psychologist S. S. Stevens introduced the idea that measurements can be categorised based on how much quantitative information they communicate about a variable. For example, in a 100-metre race, runners’ performance can be recorded in two ways: simply by their rank order (1st, 2nd, 3rd) or by using a stopwatch to record exact times (11.5 seconds, 12.1 seconds). While both methods measure performance, the stopwatch provides more detailed information because it shows not only the order but also how much faster or slower one runner was compared to another.
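The race example can be made concrete by converting times into ranks and seeing what is lost. In the sketch below, the runner names and the third runner's time are hypothetical; 11.5 and 12.1 seconds come from the text.

```python
def ranks_from_times(times: dict[str, float]) -> dict[str, int]:
    """Rank runners by finishing time (1 = fastest); ties get consecutive places by sort order."""
    ordered = sorted(times, key=times.get)
    return {runner: place for place, runner in enumerate(ordered, start=1)}

times = {"Ana": 11.5, "Ben": 12.1, "Cho": 13.4}  # Cho's time is hypothetical
ranks = ranks_from_times(times)
print(ranks)  # {'Ana': 1, 'Ben': 2, 'Cho': 3}

# The ranks are evenly spaced, but the gaps in time are not
gap_1_2 = round(times["Ben"] - times["Ana"], 1)
gap_2_3 = round(times["Cho"] - times["Ben"], 1)
print(gap_1_2, gap_2_3)  # 0.6 1.3
```

Once only the ranks are kept, the information that the second gap is more than twice the first cannot be recovered.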
As shown in Table 8.1.1, Stevens identified four levels of measurement (nominal, ordinal, interval, and ratio), each conveying a different amount of quantitative information and determining which statistical methods are appropriate.
Nominal Level: Categorising Data
At the nominal level, data is grouped into categories or labels without any implied order. For example, asking participants about their marital status (single, married, divorced) or ethnicity involves nominal-level measurement. These labels indicate differences but do not suggest any ranking or order, as being “single” is not inherently higher or lower than being “married”.
Key takeaway: Nominal scales classify data but do not rank it.
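Because nominal categories have no order or spacing, meaningful summaries are limited to counts and the most frequent category (the mode). The sketch below uses hypothetical marital-status responses and also shows why averaging numeric codes for categories is meaningless: the result depends entirely on which arbitrary numbers were assigned.

```python
from collections import Counter

# Hypothetical marital-status responses (nominal data)
statuses = ["single", "married", "married", "divorced", "single", "married"]

# Frequencies and the mode are meaningful summaries of category labels
counts = Counter(statuses)
print(counts.most_common(1))  # [('married', 3)]

# Numeric codes are arbitrary labels: their mean can be computed,
# but it changes whenever the codes are reassigned
codes = {"single": 1, "married": 2, "divorced": 3}
alt_codes = {"single": 2, "married": 3, "divorced": 1}
mean_code = sum(codes[s] for s in statuses) / len(statuses)
mean_alt = sum(alt_codes[s] for s in statuses) / len(statuses)
print(mean_code != mean_alt)  # True
```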
Ordinal Level: Ranking Data
At the ordinal level, data is ranked or ordered, but the intervals between ranks are not necessarily equal. For example, if people rate their satisfaction with a product as “very dissatisfied”, “somewhat dissatisfied”, “somewhat satisfied”, or “very satisfied”, the categories are ranked. “Very satisfied” is clearly higher than “somewhat satisfied”, but the difference between these two categories might not be the same as the difference between “somewhat dissatisfied” and “very dissatisfied”.
Similarly, in a race, the difference in time between the 1st and 2nd place finishers might be tiny, while the difference between 2nd and 3rd place could be much larger. Ordinal scales tell us who is higher or lower, but not how much higher or lower.
Key takeaway: Ordinal scales rank data, but intervals between ranks may not be consistent.
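Since ordinal data carry order but not equal spacing, the median is a natural summary: it depends only on rank order. The sketch below uses hypothetical satisfaction responses. `statistics.median_low` is used so the result is always one of the actual categories, even with an even number of responses.

```python
import statistics

# Categories listed from lowest to highest
LEVELS = ["very dissatisfied", "somewhat dissatisfied", "somewhat satisfied", "very satisfied"]

# Hypothetical responses from five customers
responses = ["somewhat satisfied", "very satisfied", "somewhat dissatisfied",
             "somewhat satisfied", "very satisfied"]

# Convert each response to its position in the ordering, then take the median position
positions = [LEVELS.index(r) for r in responses]
median_pos = statistics.median_low(positions)
print(LEVELS[median_pos])  # somewhat satisfied
```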
Interval Level: Equal Intervals, No True Zero
The interval level provides more information by ensuring that the differences between values are consistent across the scale. Good examples are the Celsius and Fahrenheit temperature scales: the difference between 30°C and 40°C is the same as the difference between 80°C and 90°C.
However, interval scales lack a true zero point. For instance, 0°C does not mean the absence of temperature; it is just another point on the scale. Because the zero point is arbitrary, ratios between values are not meaningful, so you cannot say that 80°C is “twice as hot” as 40°C.
In psychology, IQ scores are considered interval-level measurements. A score of 0 does not mean no intelligence, and an IQ of 140 is not “twice as intelligent” as an IQ of 70. However, the difference between an IQ of 80 and 100 is the same as the difference between 120 and 140.
Key takeaway: Interval scales have equal intervals but no true zero, making ratio comparisons meaningless.
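One way to see why ratios fail on an interval scale is to re-express the same temperatures in Fahrenheit, which changes both the unit and the zero point. Equal differences remain equal after the conversion, but the apparent "twice as hot" ratio does not survive. The values below are the ones used in the text.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert Celsius to Fahrenheit (changes both the unit size and the zero point)."""
    return c * 9 / 5 + 32

# Equal 10°C differences become equal 18°F differences
diff_low = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30)   # 104 - 86
diff_high = celsius_to_fahrenheit(90) - celsius_to_fahrenheit(80)  # 194 - 176
print(diff_low, diff_high)  # 18.0 18.0

# The ratio 80/40 = 2 in Celsius is not preserved in Fahrenheit
print(80 / 40)  # 2.0
print(round(celsius_to_fahrenheit(80) / celsius_to_fahrenheit(40), 2))  # 1.69
```

Since a statement that changes truth value when the unit changes cannot describe the underlying quantity, ratio comparisons are not meaningful on this scale.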
Ratio Level: True Zero Point
The ratio level is the most informative level of measurement because it has equal intervals and a true zero point, indicating the absence of the characteristic being measured. Examples include weight (in kilograms), height (in metres), and income (in dollars).
For instance, an object that weighs 0 kg has no weight at all, and someone with $50 has exactly twice as much money as someone with $25. The Kelvin temperature scale is another example because 0 K is absolute zero, the lowest possible temperature, rather than an arbitrary reference point.
Key takeaway: Ratio scales allow for meaningful comparisons of both intervals and ratios.
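On a ratio scale, changing units only multiplies every value by a constant, so ratios are unaffected. Converting Celsius to Kelvin supplies the true zero that Celsius lacks, which makes a temperature ratio meaningful. In the sketch below, the 50 kg and 25 kg weights are hypothetical, and the kilogram-to-pound factor is an approximation.

```python
KG_TO_LB = 2.20462  # approximate kilograms-to-pounds conversion factor

def celsius_to_kelvin(c: float) -> float:
    """Shift Celsius to Kelvin, whose zero is absolute zero."""
    return c + 273.15

# A ratio of weights is the same in any unit, because 0 means "no weight" in every unit
ratio_kg = 50 / 25
ratio_lb = (50 * KG_TO_LB) / (25 * KG_TO_LB)
print(ratio_kg, round(ratio_lb, 6))  # 2.0 2.0

# In Kelvin, 80°C is about 1.13 times 40°C, not twice it
print(round(celsius_to_kelvin(80) / celsius_to_kelvin(40), 2))  # 1.13
```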
Table 8.1.1. Levels of Measurement
| Level of Measurement | Category labels | Rank order | Equal intervals | True zero |
|---|---|---|---|---|
| NOMINAL | X | | | |
| ORDINAL | X | X | | |
| INTERVAL | X | X | X | |
| RATIO | X | X | X | X |
Reliability and Validity of Measurement
Measurement in psychology involves assigning scores to individuals to represent certain characteristics or traits accurately. However, when dealing with abstract constructs such as intelligence, self-esteem, or depression, researchers must ensure that these scores genuinely reflect the intended characteristic. To achieve this, psychologists conduct studies to confirm that their measurement tools function as expected. If the results suggest the measure is unreliable or invalid, it is either revised or abandoned altogether.
Imagine you have been dieting for a month. Your clothes fit more loosely, and friends have noticed your weight loss. If your bathroom scale shows you have lost 10 pounds, it aligns with your observations, and you would trust the scale. However, if it indicates a gain of 10 pounds, you would suspect it is broken and either fix or replace it. This analogy highlights how psychologists approach evaluating their measurement tools. Two key dimensions guide this evaluation: reliability and validity.
References
Segerstrom, S. C., & Miller, G. E. (2004). Psychological stress and the human immune system: A meta-analytic study of 30 years of inquiry. Psychological Bulletin, 130(4), 601–630. https://doi.org/10.1037/0033-2909.130.4.601
Chapter Attribution
Content adapted, with editorial changes, from:
Research methods in psychology, (4th ed.), (2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.