9.2. Experimental Design

By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim

This section explores different ways to design an experiment. The main difference lies in how participants interact with the independent variable. In one approach, each participant experiences only one level of the independent variable. This is known as a between-subjects experiment. In the other approach, each participant experiences every level of the independent variable. This is called a within-subjects experiment.

Between-Subjects Experiments

In a between-subjects experiment, each participant is exposed to only one level of the independent variable. For example, in a study with 100 university students, half might be asked to write about a traumatic event while the other half writes about a neutral event. Similarly, in a study with 60 people who have severe agoraphobia, 20 participants might be assigned to each of three different treatments for the disorder.

In this type of experiment, it is crucial that the groups are, on average, as similar as possible. Participants in each condition should have comparable characteristics, such as gender balance, average IQ, motivation levels, and general health status. This similarity ensures that extraneous participant variables, which are factors other than the independent variable, do not become confounding variables that could distort the results. By carefully balancing these variables across groups, researchers can be confident that any observed differences in outcomes are due to the independent variable, not unintended factors.

Random Assignment

Random assignment is a method researchers use to distribute extraneous participant variables evenly, on average, across experimental conditions. It involves assigning participants to conditions using a random process, so that each participant has an equal chance of being placed in any group.

It is important not to confuse random assignment with random sampling. Random sampling is about selecting participants from a population, while random assignment focuses on distributing participants into experimental groups. In psychology, random sampling is rarely used, but random assignment is a standard and crucial practice.

For random assignment to be effective, two criteria must be met:

Equal Chance: Every participant must have the same probability of being assigned to each condition (e.g., a 50% chance for two conditions).
Independence: Each participant’s assignment must be made independently of others.

A simple example of random assignment is flipping a coin: heads could mean the participant goes to Condition A, while tails assigns them to Condition B. For three conditions, researchers might use a random number generator to assign participants based on numbers (e.g., 1 for Condition A, 2 for Condition B, 3 for Condition C).

In practice, researchers often create an assignment sequence in advance, especially when using software. The sequence ensures that participants are assigned fairly as they arrive.
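A pre-generated sequence of this kind can be sketched in a few lines of Python. This is only an illustration of the two criteria above (equal chance and independence); the function name `simple_random_assignment` and the `seed` parameter are assumptions made for the example, not part of any standard procedure.

```python
import random

def simple_random_assignment(n_participants, conditions, seed=None):
    """Assign each participant independently, with equal probability per condition."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    # Each draw ignores all earlier draws, so assignments are independent.
    return [rng.choice(conditions) for _ in range(n_participants)]

# Prepare the sequence in advance; participants take the next slot as they arrive.
sequence = simple_random_assignment(10, ["A", "B", "C"], seed=1)
for participant, condition in enumerate(sequence, start=1):
    print(participant, condition)
```

Because every draw is independent, nothing in this sketch forces the groups to end up the same size.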

Addressing Unequal Group Sizes

A challenge with pure random assignment (e.g., coin flipping) is that group sizes might become unequal. While unequal sample sizes are not usually a big issue, equal group sizes are more efficient for statistical analysis.

To address this, researchers often use a method called block randomisation. In this approach:

Each condition appears once within a block before any condition is repeated.
The order of conditions within each block is randomised.
This sequence is prepared before participants arrive, and each new participant is assigned to the next available slot in the sequence.

For example, if there are three conditions (A, B, and C) and nine participants, the random assignment might look like this (Table 9.2.1):

Table 9.2.1. Block randomisation sequence for assigning nine participants to three conditions
Participant	Condition
1	A
2	C
3	B
4	B
5	C
6	A
7	C
8	B
9	A

Online tools, such as Research Randomizer, can help generate these sequences automatically.
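The same kind of sequence can also be generated with a short script. The sketch below is one possible implementation of the block procedure described above, not the method used by any particular tool; the function name `block_randomisation` and the `seed` parameter are illustrative assumptions.

```python
import random

def block_randomisation(n_participants, conditions, seed=None):
    """Build an assignment sequence from consecutive, separately shuffled blocks."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = list(conditions)  # every condition appears exactly once per block
        rng.shuffle(block)        # the order within the block is random
        sequence.extend(block)
    # If n_participants is not a multiple of the block size, the last block is cut short.
    return sequence[:n_participants]

sequence = block_randomisation(9, ["A", "B", "C"], seed=42)
for participant, condition in enumerate(sequence, start=1):
    print(participant, condition)
```

With nine participants and three conditions, each condition is used exactly three times, and group sizes never differ by more than one at any point in recruitment.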

Limitations of Random Assignment

While random assignment is highly effective, it is not without its limitations. There is always a chance that, by pure coincidence, the groups might differ in meaningful ways. For example, one group could unintentionally have slightly older participants, or participants who are more motivated than those in another group.

However, this concern is generally minimal for several reasons. First, random assignment tends to work better with larger sample sizes, as larger groups reduce the impact of chance imbalances. Second, statistical tests used to analyse experimental data are specifically designed to account for the imperfections of random assignment. Finally, if an unnoticed confounding variable does influence the results, replication of the experiment can often reveal and address such issues.
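The first of these points can be illustrated with a small simulation. The sketch below repeatedly splits a simulated sample into two random groups and records how far apart the group mean ages end up. The age distribution (mean 21, standard deviation 3) and the function name `mean_age_gap` are arbitrary assumptions chosen for illustration, not data from any study.

```python
import random
import statistics

def mean_age_gap(n_participants, n_simulations=500, seed=0):
    """Average absolute difference in mean age between two randomly formed groups."""
    rng = random.Random(seed)
    half = n_participants // 2
    gaps = []
    for _ in range(n_simulations):
        # Simulated ages; the distribution is a made-up assumption.
        ages = [rng.gauss(21, 3) for _ in range(n_participants)]
        rng.shuffle(ages)  # shuffling and splitting in half = random assignment
        gap = abs(statistics.mean(ages[:half]) - statistics.mean(ages[half:]))
        gaps.append(gap)
    return statistics.mean(gaps)

for n in (20, 200, 2000):
    print(n, round(mean_age_gap(n), 2))
```

Under these assumptions, the typical chance difference between groups shrinks as the sample grows, which is why chance imbalances are less of a concern in larger experiments.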

Matched Groups

In a matched-groups design, participants are carefully matched across conditions based on their scores on the dependent variable or other relevant extraneous variables before the independent variable is manipulated. This approach ensures that these variables will not become confounding factors across experimental conditions.

For example, imagine we want to study whether expressive writing impacts people’s health. First, we would measure health-related variables for all potential participants. Using these measurements, we would rank participants from the healthiest to the least healthy.

Next, we would pair participants based on their health rankings. In each pair, one participant would be randomly assigned to the traumatic writing condition, while the other would be assigned to the neutral writing condition. This process would continue until every participant is assigned to a condition, ensuring that both groups are balanced in terms of health from the start.

If we observe a difference in health outcomes between the two groups at the end of the study, we can be more confident that it reflects the writing intervention rather than pre-existing differences in health, because the groups were matched on health at the outset. This design reduces variability between groups and strengthens the internal validity of the experiment.
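The rank, pair, and assign steps of the expressive writing example can be sketched as follows. This is a minimal illustration, assuming that each participant has a single numeric health score where higher means healthier; the function name `matched_pairs_assignment` and the participant labels are hypothetical.

```python
import random

def matched_pairs_assignment(health_scores, seed=None):
    """Rank participants by health, pair neighbours, and split each pair at random.

    health_scores maps a participant ID to a score (assumed: higher = healthier).
    """
    rng = random.Random(seed)
    ranked = sorted(health_scores, key=health_scores.get, reverse=True)
    assignment = {}
    # Step through the ranking two at a time; with an odd number of
    # participants, the last one is left unassigned in this sketch.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the matched pair
        assignment[pair[0]] = "traumatic"
        assignment[pair[1]] = "neutral"
    return assignment

# Placeholder scores for illustration only.
scores = {"P1": 82, "P2": 67, "P3": 91, "P4": 74, "P5": 60, "P6": 88}
print(matched_pairs_assignment(scores, seed=5))
```

Each pair contributes one participant to each condition, so the two groups start with closely similar health profiles.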

Within-Subjects Experiments

In a within-subjects experiment, each participant experiences all conditions of the study. For example, in an experiment examining how a defendant’s physical attractiveness influences judgements of guilt, a between-subjects design would involve one group evaluating an attractive defendant and another group evaluating an unattractive defendant. In contrast, a within-subjects design would have the same participants evaluate both an attractive and an unattractive defendant.

The main advantage of a within-subjects design is its ability to control extraneous participant variables effectively. Since each participant serves as their own control, factors like IQ, socioeconomic status, or family background remain consistent across conditions. This reduces variability caused by individual differences and makes it easier to detect the effect of the independent variable. Additionally, within-subjects designs allow researchers to use statistical techniques that account for these consistent participant variables, further minimising “noise” in the data.

However, not all experiments are suitable for a within-subjects design, and in some cases, it may not be the best choice. Certain types of studies or research questions might require a different approach, which we will explore further later in the chapter.

Carryover Effects and Counterbalancing

In a within-subjects experiment, participants experience all conditions of the independent variable. While this design has advantages, it also introduces potential issues known as order effects. Order effects happen when the order in which participants experience conditions influences their responses. One common type is a carryover effect, where being tested in one condition affects performance in later conditions.

Carryover effects can take different forms. A practice effect occurs when participants perform better in later conditions because they have had time to practise a task. In contrast, a fatigue effect happens when participants perform worse in later conditions due to tiredness or boredom. Another type, called a context effect (or contrast effect), happens when participants’ perceptions are influenced by the order of conditions. For example, if participants first judge an attractive defendant and then judge an average-looking one, their judgements might be harsher simply because of the comparison.

Experiencing every condition can also make participants more likely to guess the hypothesis of the study. If participants are asked to judge both an attractive and an unattractive defendant, they might realise the study is about how attractiveness influences judgements. This knowledge could lead them to alter their responses, consciously or not, to match (or oppose) the perceived expectations of the researcher.

While carryover effects are sometimes interesting research topics on their own, they pose a problem when they are not the focus of the study. For example, if participants always judge the attractive defendant first and the unattractive defendant second, any differences observed might result from the order of conditions rather than from the attractiveness variable itself. In this case, the order of conditions becomes a confounding variable.

The most effective solution to this issue is counterbalancing, where participants experience conditions in different orders. The best approach is complete counterbalancing, where every possible order of conditions is used equally across participants. For example, in a study with two conditions (A and B), half the participants would experience condition A first, while the other half would experience condition B first. With three conditions (A, B, and C), participants would be randomly assigned to one of six possible orders: ABC, ACB, BAC, BCA, CAB, or CBA. However, as the number of conditions increases, the number of possible orders grows rapidly. For example, four conditions require 24 orders, and five conditions require 120 orders.
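The growth in the number of orders follows the factorial of the number of conditions. The short sketch below lists every order for three conditions and confirms the counts quoted above; the helper name `all_orders` is an assumption made for the example.

```python
from itertools import permutations
from math import factorial

def all_orders(conditions):
    """Return every possible order of the given conditions as strings."""
    return ["".join(order) for order in permutations(conditions)]

print(all_orders("ABC"))  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# The number of orders for k conditions is k factorial.
for k in (2, 3, 4, 5):
    print(k, "conditions:", factorial(k), "orders")
```

For complete counterbalancing, the number of participants should be a multiple of the number of orders so that each order is used equally often.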

When complete counterbalancing is not practical, researchers often use a Latin square design. A Latin square ensures that each condition appears in every position (first, second, third, etc.) an equal number of times. A balanced Latin square goes further and also ensures that each condition immediately precedes and follows every other condition exactly once; the simple cyclic square shown here controls position only, since, for instance, B comes directly after A in three of its four rows. For example, in a study with four conditions (A, B, C, D), a Latin square might look like this (Table 9.2.2):

Table 9.2.2. An example of a Latin square
A	B	C	D
B	C	D	A
C	D	A	B
D	A	B	C

In this setup, each condition appears in every order position once, and the total number of orders equals the number of conditions (4 instead of 24). This approach drastically reduces the complexity of counterbalancing while still minimising order effects.
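A square like the one in Table 9.2.2 can be built by shifting the list of conditions one place to the left for each new row. The sketch below generates it and checks the position property described above; the function names `cyclic_latin_square` and `is_latin_square` are illustrative assumptions.

```python
def cyclic_latin_square(conditions):
    """Build a Latin square by rotating the condition list one step per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """Check that every condition appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[col] for row in square} == symbols for col in range(n))
    return len(symbols) == n and rows_ok and cols_ok

square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

Each row is one order of conditions, so participants would be divided as evenly as possible among the four rows.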

For experiments with a very large number of conditions, random counterbalancing can be used. In this method, the order of conditions is randomly assigned for each participant. While this approach is less effective than complete counterbalancing or Latin square designs, it can still help reduce order effects when they are expected to be minor.
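Random counterbalancing amounts to shuffling the list of conditions separately for each participant. A minimal sketch, with the function name `random_counterbalancing` and the `seed` parameter as assumptions for the example:

```python
import random

def random_counterbalancing(n_participants, conditions, seed=None):
    """Give each participant an independently shuffled order of all conditions."""
    rng = random.Random(seed)
    # sample() with k equal to the list length returns a shuffled copy.
    return [rng.sample(conditions, k=len(conditions)) for _ in range(n_participants)]

orders = random_counterbalancing(5, ["A", "B", "C", "D", "E", "F"], seed=2)
for participant, order in enumerate(orders, start=1):
    print(participant, "".join(order))
```

Unlike complete counterbalancing, nothing here guarantees that every order, or every position for each condition, is used equally often.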

Counterbalancing achieves two key objectives. First, it prevents order effects from becoming confounding variables by ensuring conditions are presented in different orders across participants. This way, any observed differences in the dependent variable cannot be solely attributed to the order of conditions. Second, counterbalancing allows researchers to detect carryover effects by analysing whether the order of conditions had any significant impact on the results.

Simultaneous Within-Subjects Designs

In traditional within-subjects designs, participants experience one condition at a time. However, there is an alternative approach where participants respond to multiple conditions simultaneously. This method is often used when participants make repeated responses in each condition.

For example, imagine a study where participants judge the guilt of 10 attractive defendants and 10 unattractive defendants. Instead of having participants rate all the attractive defendants first and then all the unattractive ones, the researcher could mix the two types together in a random order. Participants would then make judgements for all 20 defendants, and the researcher could calculate the average guilt rating for each type.
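The mixing and averaging steps in this example can be sketched as follows. The function names `build_mixed_trials` and `mean_rating_by_type` are assumptions for illustration, and the ratings in the usage lines are randomly generated placeholders on an assumed 1 to 7 scale, not real data.

```python
import random
from statistics import mean

def build_mixed_trials(n_per_type=10, seed=None):
    """Return attractive and unattractive defendants interleaved in random order."""
    rng = random.Random(seed)
    trials = [("attractive", i) for i in range(1, n_per_type + 1)]
    trials += [("unattractive", i) for i in range(1, n_per_type + 1)]
    rng.shuffle(trials)
    return trials

def mean_rating_by_type(trials, ratings):
    """Average the ratings per defendant type; ratings[i] belongs to trials[i]."""
    by_type = {}
    for (kind, _), rating in zip(trials, ratings):
        by_type.setdefault(kind, []).append(rating)
    return {kind: mean(values) for kind, values in by_type.items()}

trials = build_mixed_trials(seed=3)
# Placeholder ratings: random numbers standing in for one participant's responses.
placeholder = random.Random(0)
ratings = [placeholder.randint(1, 7) for _ in trials]
print(mean_rating_by_type(trials, ratings))
```

In practice a fresh random order would usually be generated for each participant, so that no single ordering of the 20 defendants is shared by everyone.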

Another example might involve studying memory in people with social anxiety disorder. Suppose a researcher wants to know whether these individuals remember negative adjectives (e.g., “stupid”, “incompetent”) better than positive ones (e.g., “happy”, “productive”). Instead of presenting two separate lists of positive and negative adjectives, the researcher could create one list containing both types of words. Participants would study the mixed list and then try to recall as many words as possible. The researcher would then count how many positive and negative words were remembered.

This simultaneous approach allows participants to process multiple conditions within a single session, which can help reduce order effects and make the study more efficient. It also provides a clear comparison between conditions while maintaining the advantages of a within-subjects design.

Choosing Between Between-Subjects and Within-Subjects Designs

Most experiments can be designed using either a between-subjects or a within-subjects approach. Researchers must carefully consider the strengths and weaknesses of each method to determine which is best suited for their specific study.

Between-subjects designs are often simpler to set up and require less time per participant. They naturally avoid carryover effects without the need for complex counterbalancing. This design is particularly useful when testing time is limited or when exposure to one condition might permanently alter a participant’s response in another condition.

On the other hand, within-subjects designs offer better control over extraneous participant variables, reducing noise in the data and making it easier to detect the effect of the independent variable on the dependent variable. They also typically require fewer participants to achieve the same statistical power as a between-subjects design.

A good general rule is: if you can conduct a within-subjects experiment in the available time per participant, and if carryover effects can be managed with proper counterbalancing, then a within-subjects design is usually the better choice. However, if a within-subjects design is impractical, either because of time constraints or because one condition might permanently affect responses in another, a between-subjects design is more appropriate.

For example, if you are studying participants in a busy setting like a doctor’s waiting room or a grocery store line, you may not have the time to test each person under multiple conditions. A between-subjects design would be more efficient in this case. Similarly, if you are testing an intervention designed to reduce prejudice, a within-subjects design would require participants to be exposed to the treatment and then the control condition. If the treatment is effective, participants’ prejudice levels would already be reduced, making them unsuitable for the control condition. In such cases, a between-subjects design is the only feasible option.

Finally, remember that choosing one design does not exclude the other from future studies. Researchers often use both designs, sometimes even within the same research program, to explore a question from multiple angles. Combining designs in this way is common in professional research and can provide a more comprehensive understanding of the phenomenon being studied.

Chapter Attribution

Content adapted, with editorial changes, from:

Research methods in psychology, (4th ed.), (2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.

License

9.2. Experimental Design Copyright © 2025 by Marc Chao and Muhamad Alif Bin Ibrahim is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.