8.3. Validity
By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim
Validity refers to how well a measurement tool captures the variable it is intended to measure. While reliability, which refers to the consistency of a measure, is an essential foundation for validity, it does not guarantee it. A measure can be highly reliable but lack validity entirely. For example, if someone attempted to measure self-esteem by using a ruler to measure index finger length, the results might be consistent every time (reliable) but would not reflect self-esteem (invalid). Therefore, researchers must gather evidence to support the claim that their measurement tools accurately represent the intended construct. This evidence is typically categorised into several types, including face validity, content validity, criterion validity, convergent validity, and discriminant validity.
Face Validity
Face validity refers to how much a measurement method appears to measure what it claims to measure, just by looking at it. For example, a self-esteem questionnaire with questions like “Do you see yourself as a person of worth?” or “Do you think you have good qualities?” would seem to have good face validity because these questions directly relate to self-esteem. On the other hand, trying to measure self-esteem by measuring someone’s finger length would have poor face validity because there is no obvious connection between the two.
Face validity is often evaluated informally. Researchers (or sometimes participants) simply consider whether the measure looks like it is assessing the intended construct. Occasionally, it is assessed quantitatively, for instance by asking a group of people to rate how well they think a measure captures the concept it is supposed to measure.
However, face validity is one of the weakest forms of evidence for a measure’s accuracy. One reason is that it relies on intuition, and people’s intuitions about how a construct should be measured are often mistaken. Many well-established psychological tests actually lack face validity but still work effectively.
For example, the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) is a widely used tool for assessing personality traits and disorders. It includes statements like “I enjoy detective or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” to assess traits like aggression suppression. At first glance, these items do not seem connected to aggression. However, the test is not concerned with individual answers to these questions but rather with how a person’s overall pattern of responses compares to known patterns of individuals who suppress aggression.
Content Validity
Content validity refers to how well a measurement method captures all the important aspects of the construct it is supposed to measure. In other words, does the measure fully represent the concept it claims to assess?
For example, if a researcher defines test anxiety as including both physical symptoms (like nervous feelings caused by the activation of the sympathetic nervous system) and negative thoughts (like worrying about failure), then a good measure of test anxiety should include questions about both of these aspects. If the measure only focuses on nervous feelings and ignores negative thoughts, it would lack content validity because it does not fully represent the construct.
Similarly, attitudes are often described as including thoughts, feelings, and actions toward something. For instance, someone with a positive attitude toward exercise might:
- think positively about exercise (“Exercise is good for my health.”)
- feel good about exercising (“I enjoy working out.”)
- actually engage in exercise regularly.
A measure of attitudes toward exercise would need to include items assessing all three of these components to have strong content validity.
Unlike other forms of validity, content validity is not typically assessed through statistical analysis. Instead, it relies on a detailed comparison of the measure with the conceptual definition of the construct. Researchers carefully review whether the measure includes all relevant dimensions and adequately reflects the intended concept.
Criterion Validity
Criterion validity refers to how well a measurement correlates with other variables, called criteria, that are logically related to the construct being measured. In simple terms, it asks: Does this measure behave the way we expect it to when compared to other relevant outcomes?
For example, if a researcher develops a new test anxiety questionnaire, we would expect the scores on this measure to correlate negatively with performance on an important school exam. In other words, students with higher test anxiety scores should, on average, perform worse on the exam. If such a negative correlation is found, it suggests that the test anxiety measure is valid. However, if students with high anxiety scores perform just as well as those with low scores, it would raise doubts about whether the measure actually assesses test anxiety.
A criterion can be any variable that logically connects to the construct being measured. For test anxiety, relevant criteria might include:
- exam performance (expected negative correlation)
- overall course grades (expected negative correlation)
- blood pressure during an exam (expected positive correlation).
For another example, consider a measure of physical risk-taking. Validating such a measure might involve checking whether the scores are related to:
- participation in extreme sports like snowboarding or rock climbing
- the number of speeding tickets a person has received
- the number of broken bones they have experienced.
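To make this concrete, here is a minimal sketch, in Python, of how a researcher might check one such criterion: correlating scores on a test anxiety measure with exam scores and examining the direction of the relationship. The numbers and variable names below are invented purely for illustration; a real validation study would, of course, use data from actual participants.

```python
import numpy as np
from scipy import stats

# Hypothetical test anxiety scores and exam scores for ten students.
# These values are invented purely for illustration.
anxiety = np.array([12, 35, 18, 42, 25, 30, 8, 38, 22, 28])
exam = np.array([88, 62, 80, 55, 70, 66, 92, 58, 75, 68])

# Criterion validity check: the anxiety measure should correlate
# negatively with the criterion (exam performance).
r, p = stats.pearsonr(anxiety, exam)
print(f"r = {r:.2f}, p = {p:.3f}")
```

The same calculation applies whether the criterion is measured at the same time as the construct or later; as described below, the timing is what distinguishes concurrent from predictive validity.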
Types of Criterion Validity
- Concurrent validity: When the criterion is measured at the same time as the construct, it is called concurrent validity. For example, if test anxiety scores are correlated with blood pressure readings taken during an actual test, this would be evidence of concurrent validity.
- Predictive validity: When the criterion is measured in the future, it is called predictive validity. For example, if test anxiety scores predict lower final exam scores at the end of the semester, the measure demonstrates predictive validity.
Convergent Validity
A special case of criterion validity is called convergent validity. It examines whether scores on a new measure align with scores from existing, well-established measures of the same construct. If a new test anxiety scale correlates strongly with a widely accepted test anxiety questionnaire, this would demonstrate convergent validity.
For example, psychologists John Cacioppo and Richard Petty developed the Need for Cognition Scale, which measures how much people enjoy and value thinking (Cacioppo & Petty, 1982). To validate their scale:
- They found that higher scores correlated positively with academic achievement test scores.
- They found that higher scores correlated negatively with dogmatism (a measure of rigid, uncritical thinking).
Over time, the Need for Cognition Scale has been shown to correlate with a variety of outcomes, such as the effectiveness of advertisements, political interest, and juror decision-making (Petty et al., 2009).
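As a rough illustration (not Cacioppo and Petty’s actual procedure or data), the Python sketch below shows the kind of result that would count as convergent evidence: scores on a hypothetical new scale correlating strongly with scores on an established measure of the same construct. The simulated data and the `new_scale` and `established` names are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores on an established measure of the construct.
established = rng.uniform(20, 90, size=50)

# A hypothetical new scale, generated here to track the established
# measure plus random measurement error.
new_scale = established + rng.normal(0, 10, size=50)

# Convergent evidence: a strong positive correlation is expected
# between two measures of the same construct.
r = np.corrcoef(new_scale, established)[0, 1]
print(f"convergent r = {r:.2f}")
```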
Discriminant Validity
Discriminant validity refers to the extent to which a measure does not strongly correlate with measures of variables that are theoretically different from it. In other words, it ensures that the measure captures the intended construct and not something else.
For example, self-esteem reflects a stable, long-term attitude toward oneself, while mood refers to temporary feelings that can change from moment to moment. A valid self-esteem questionnaire should show little correlation with a measure of mood. If the two are highly correlated, it could suggest that the self-esteem measure is unintentionally capturing mood rather than actual self-esteem.
When psychologists John Cacioppo and Richard Petty developed the Need for Cognition Scale (which measures how much people enjoy and value thinking), they also tested its discriminant validity. They found:
- only a weak correlation between need for cognition and cognitive style (e.g., whether someone tends to think analytically or holistically)
- no correlation between need for cognition and test anxiety
- no correlation between need for cognition and social desirability (the tendency to respond in a way that makes one appear socially acceptable).
These weak or nonexistent correlations provided strong evidence that the Need for Cognition Scale was measuring a distinct construct, separate from cognitive style, anxiety, or social desirability.
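A hypothetical sketch of how convergent and discriminant evidence might be inspected side by side: a new self-esteem scale should correlate strongly with an established self-esteem measure but only weakly with a mood measure. The simulated scores and column names below are invented for illustration and are not drawn from any real study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 60

# Hypothetical scores: the new self-esteem scale is simulated to
# track an established self-esteem measure, while mood scores are
# generated independently of both.
established_se = rng.normal(50, 10, n)
new_se = established_se + rng.normal(0, 8, n)   # same construct
mood = rng.normal(50, 10, n)                    # different construct

scores = pd.DataFrame({
    "new_self_esteem": new_se,
    "established_self_esteem": established_se,
    "mood": mood,
})

# Convergent evidence: strong correlation with the same-construct measure.
# Discriminant evidence: weak correlation with the different-construct measure.
print(scores.corr().round(2))
```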
References
Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42(1), 116–131. https://doi.org/10.1037/0022-3514.42.1.116
Petty, R. E., Briñol, P., Loersch, C., & McCaslin, M. J. (2009). The need for cognition. In M. R. Leary & R. H. Hoyle (Eds.), Handbook of individual differences in social behavior. The Guilford Press.
Chapter Attribution
Content adapted, with editorial changes, from:
Research methods in psychology (4th ed., 2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.