
9.3. Experimentation and Validity

By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim


The Four Big Validities in Psychology Research

When evaluating a psychology experiment, one key question to ask is: “Is this study valid?” However, determining validity is not as simple as it might seem because there are different types of validity, each addressing a specific aspect of a study’s accuracy and soundness.

Researchers generally focus on four main types of validity to determine whether an experiment is well-designed (Judd & Kenny, 1981; Morling, 2014):

  1. Internal Validity: Does the study establish a clear cause-and-effect relationship between the independent and dependent variables?
  2. External Validity: Can the study’s results be generalised to other people, settings, or situations?
  3. Construct Validity: Does the study accurately measure the concepts or variables it claims to measure?
  4. Statistical Validity: Are the statistical analyses appropriate, and do they support the study’s conclusions?

Each of these validities focuses on a different question about the research. In the following sections, we will take a closer look at each type to better understand how they contribute to a study’s overall quality and reliability.

Internal Validity: Ensuring Cause-and-Effect Relationships

Just because two variables are statistically related does not mean one causes the other. You have probably heard the saying, “Correlation does not imply causation”. For example, if studies show that people who exercise regularly tend to be happier, it does not automatically mean that exercise causes happiness. It is possible that happier people are more likely to exercise, or that another factor, such as better physical health, leads to both increased happiness and regular exercise.

The goal of an experiment is to demonstrate a causal relationship between two variables by showing that changes in the independent variable directly cause changes in the dependent variable. The logic is straightforward: if a researcher creates two or more similar conditions and only manipulates the independent variable while keeping everything else constant, then any differences observed in the dependent variable must have been caused by the independent variable.

Take Darley and Latané’s experiment as an example. The only difference between their experimental conditions was the number of students participants believed were involved in the discussion. Because this was the only manipulated difference, it must have been the cause of differences in helping behaviour across the conditions.

A study is said to have high internal validity if its design supports the conclusion that the independent variable caused the observed changes in the dependent variable. Experiments typically have strong internal validity because they involve direct manipulation of the independent variable and control of extraneous variables, often through techniques like random assignment.

In contrast, non-experimental designs (e.g., correlational studies), where variables are observed and measured but not directly manipulated by the researcher, generally have lower internal validity. Without manipulation and control, it is harder to rule out alternative explanations for observed relationships between variables.
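The logic of random assignment described above can be sketched in a few lines of code. This is purely an illustrative sketch; the participant IDs and condition names are hypothetical and not drawn from any study discussed in this chapter:

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into conditions,
    so pre-existing differences between people are spread across groups
    by chance rather than by any systematic factor."""
    rng = random.Random(seed)          # seeded only to make the example reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# 60 hypothetical participants dealt into three equal-sized conditions
groups = randomly_assign(range(1, 61), ["alone", "one_other", "four_others"], seed=1)
```

Because only chance decides who ends up in which condition, any later difference between groups on the dependent variable can more plausibly be attributed to the manipulated independent variable.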

External Validity: How Well Do Results Apply Beyond the Experiment?

Experiments often face criticism for being conducted under artificial conditions due to the need to manipulate independent variables and control extraneous factors. For example, in many psychology experiments, participants are typically undergraduate students who complete paper-and-pencil questionnaires or computer tasks in a laboratory setting. Consider a study by Barbara Fredrickson and her colleagues (1998), where undergraduate students were asked to complete a maths test while wearing swimsuits. At first glance, this scenario seems unrealistic, as it is difficult to imagine a situation outside of a laboratory where someone would need to solve maths problems in a swimsuit.

This concern highlights the issue of external validity, which refers to how well the results of a study can be generalised to people and situations beyond those directly studied. A study has high external validity when its participants and conditions closely match real-world scenarios, a concept known as mundane realism. For example, if researchers wanted to study how cereal box colours (yellow vs. purple) influence shoppers’ choices, they would achieve high external validity by observing real shoppers in a real grocery store. If shoppers bought more cereal in purple boxes, the findings would likely apply to other grocery stores and shoppers.

However, if the same study were conducted in a university lab, where undergraduate students simply rated colours on a computer screen, the study would have lower mundane realism. While the visual processing of colours might still reflect real-world decision-making (psychological realism), the results would not directly translate to real grocery store behaviour.

It is important to note that experiments are not inherently low in external validity. Many experiments are carefully designed to simulate real-world conditions. For example, Darley and Latané’s experiment realistically simulated an emergency situation. Additionally, field experiments, which take place outside the lab, often achieve high external validity. In one study, Robert Cialdini and his colleagues tested how hotel guests responded to different towel reuse messages (Goldstein et al., 2008). They found that guests were far more likely to reuse their towels when told that most other guests did the same. Since the experiment was conducted in real hotel rooms with real guests, the findings are highly generalisable to other hotels.

Another reason experiments can still have strong external validity is that they often focus on universal psychological processes, which are mechanisms that operate consistently across different people and situations. Returning to Fredrickson’s swimsuit study, the researchers found that women, but not men, performed worse on the maths test while wearing swimsuits. They concluded that this was due to self-objectification, where women are more likely to view themselves from an outsider’s perspective, which can divert attention away from other tasks. While solving maths problems in swimsuits might be rare, the underlying psychological process of self-objectification is likely to occur in many different situations and contexts.

Construct Validity: How Well Does the Experiment Measure What it Claims to Measure?

Construct validity refers to how well an experiment’s design captures the concept it intends to study. It focuses on whether the manipulations and measures accurately represent the research question.

In their famous study, Darley and Latané explored the question: “Does helping behaviour become diffused when more people are present?” They hypothesised that participants would be less likely to help in an emergency if they believed more people were available to assist. Translating a research question into an experimental design in this way is called operationalisation: defining how abstract variables will be measured or manipulated.

Darley and Latané operationalised “diffusion of responsibility” by varying the number of other people participants believed were involved in the discussion. Their experiment created a clear emergency situation, provided participants with an opportunity to help, and systematically increased the number of perceived bystanders. This design had high construct validity because the manipulations closely aligned with the core research question.

However, what if the study had only included two conditions, one with a single student and another with two students in the discussion? While a decrease in helping behaviour might still be observed, it would not provide strong evidence for diffusion of responsibility. Instead, the effect might be interpreted as social inhibition, a concept from Bandura’s research. In this case, construct validity would be lower because the study would not fully capture the intended phenomenon.

On the other hand, imagine if there had been five conditions instead of three. Researchers might have observed whether helping behaviour continued to decline as the number of bystanders increased or if it plateaued at a certain point. This would offer a more nuanced understanding of diffusion of responsibility. However, adding even more conditions beyond this point would not necessarily improve construct validity further, and it could make the study unnecessarily complex without adding new insights.

When designing your own experiment, think carefully about how well your operationalisation aligns with your research question. The goal is to ensure that your manipulations and measurements are clear, relevant, and directly address the concept you are trying to study. High construct validity strengthens the connection between your experimental design and the conclusions you can draw from your results.

Statistical Validity: Ensuring Accurate and Appropriate Data Analysis

Statistical validity refers to whether the correct statistical methods were used to analyse the data and if the conclusions drawn from those analyses are sound. In psychology, there are many statistical tests, such as t-tests, ANOVA, regression, and correlation, and choosing the right one depends on two main factors: the type of data collected (e.g., numerical or categorical) and the study design (e.g., between-subjects or within-subjects).

Each statistical test also comes with specific assumptions, like data being normally distributed or having equal variances between groups. If these assumptions are violated and the test is still applied, the statistical conclusions may become unreliable, which threatens statistical validity.
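One such assumption check can be illustrated with a common rule of thumb for the equal-variances assumption of a two-sample t-test: treat the assumption as doubtful when the larger sample variance is several times the smaller one. This is only a heuristic sketch with made-up data; in practice researchers use formal tests such as Levene’s test:

```python
from statistics import variance

def equal_variance_ok(group_a, group_b, max_ratio=4.0):
    """Heuristic check of the equal-variances assumption: return False
    when the larger sample variance exceeds the smaller by more than
    max_ratio (a common rule of thumb, not a formal test)."""
    var_a, var_b = variance(group_a), variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b) <= max_ratio

# Hypothetical reaction times (seconds): the treatment group is far more
# spread out than the control group, so the assumption looks violated.
control = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
treatment = [1.5, 3.9, 0.8, 4.2, 1.1, 3.6]
```

If the check fails, a researcher would switch to a test that does not assume equal variances (such as Welch’s t-test) rather than apply the standard test and risk unreliable conclusions.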

One common critique in research is that a study does not have enough participants. While this might seem like a concern about external validity (how well results generalise to a larger population), it is actually an issue of statistical validity. Small sample sizes make it harder to detect meaningful effects, even if they exist, because they reduce the statistical power of the study. However, it is worth noting that small sample sizes are not always a problem; certain types of research (e.g., single-case studies) can still provide valuable insights, as we will discuss later.

To ensure statistical validity, researchers must use the appropriate statistical tests for their data and design. Additionally, they should conduct a power analysis before starting their study. A power analysis helps determine the minimum number of participants needed to detect a specific effect size. In short, larger sample sizes increase the likelihood of detecting a real effect, but only if the right statistical tools are applied correctly.
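The kind of calculation a power analysis performs can be sketched with the standard normal-approximation formula for a two-sided, two-sample t-test. This is a simplification for illustration: dedicated power-analysis tools use the noncentral t distribution and typically return slightly larger numbers:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided
    two-sample t-test, using the normal-approximation formula
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the expected effect size (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = .05
    z_power = z.inv_cdf(power)           # ~0.84 for 80% power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "medium" effect (d = 0.5) needs roughly 63 people per group under
# this approximation; a larger effect (d = 0.8) needs far fewer.
```

Note how strongly the answer depends on the expected effect size: halving the effect size roughly quadruples the required sample, which is why underpowered studies so often fail to detect real but modest effects.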

Prioritising Validities in Research

The four key types of validity (internal, external, construct, and statistical) are essential tools for evaluating and designing experiments. However, achieving high validity across all four areas is often challenging, and researchers must prioritise based on their study’s goals.

For example, in Cialdini’s study on towel reuse in hotels, the external validity was notably high because the research was conducted in a real-world setting with typical hotel guests. However, the statistical validity was somewhat limited. This difference does not mean the study was flawed; instead, it highlights areas for improvement in future follow-up research (Goldstein et al., 2008).

As Morling (2014) explains, many psychology experiments tend to prioritise internal and construct validity to ensure that the independent variable truly causes changes in the dependent variable and that the variables are well-measured. However, this focus sometimes comes at the expense of external validity, meaning the findings may not always generalise well to real-world situations.


References

Cialdini, R. (2005). Don’t throw in the towel: Use social influence research. Observer, 2005(April). https://www.psychologicalscience.org/observer/dont-throw-in-the-towel-use-social-influence-research

Fredrickson, B. L., Roberts, T.-A., Noll, S. M., Quinn, D. M., & Twenge, J. M. (1998). That swimsuit becomes you: Sex differences in self-objectification, restrained eating, and math performance. Journal of Personality and Social Psychology, 75(1), 269–284. https://doi.org/10.1037/0022-3514.75.1.269

Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. The Journal of Consumer Research, 35(3), 472–482. https://doi.org/10.1086/586910

Judd, C. M., & Kenny, D. A. (1981). Estimating the effects of social interventions. Cambridge University Press.

Morling, B. (2014). Guide your students to become better research consumers. Observer, 2014(April). https://www.psychologicalscience.org/observer/teach-your-students-to-be-better-consumers

Chapter Attribution 

Content adapted, with editorial changes, from:

Research methods in psychology (4th ed., 2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.

License


9.3. Experimentation and Validity Copyright © 2025 by Marc Chao and Muhamad Alif Bin Ibrahim is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
