11.3. The Analysis of Variance
By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim
While t-tests are ideal for comparing two means, such as between a sample mean and a population mean, or between the means of two groups or conditions, they are not suitable when more than two groups or conditions need to be compared. For these cases, the analysis of variance (ANOVA) is the most common statistical method.
This section focuses primarily on the one-way ANOVA, which is used in between-subjects designs with a single independent variable. Additionally, it provides a brief overview of other types of ANOVA, such as those used in within-subjects designs and factorial designs.
One-Way ANOVA
The one-way ANOVA is a statistical test used to compare the means of more than two groups (e.g., M1, M2…MG) in a between-subjects design. It is particularly useful when the research involves multiple conditions or groups.
The null hypothesis for a one-way ANOVA is that all group means are equal in the population (μ1 = μ2 = ⋯ = μG). The alternative hypothesis asserts that not all means are equal.
The F Statistic in ANOVA
The test statistic for ANOVA is called F, which is calculated as the ratio of two estimates of population variance derived from the sample data:
- Mean Squares Between Groups (MSB): This measures variability between the group means.
- Mean Squares Within Groups (MSW): This measures variability within each group.
The formula for the F statistic is:
F = MSB / MSW
This statistic is useful because we know how it behaves under the null hypothesis. As shown in Figure 11.3.1, the F-distribution is unimodal and positively skewed, with most values clustering around 1 when the null hypothesis is true. The shape of the distribution depends on two types of degrees of freedom:
- Between-groups degrees of freedom: dfB = G − 1, where G is the number of groups.
- Within-groups degrees of freedom: dfW = N − G, where N is the total sample size.
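To make these quantities concrete, here is a minimal sketch in Python (using NumPy; the helper function name is hypothetical) of how MSB, MSW, and F can be computed from raw group data:

```python
import numpy as np

def one_way_f(groups):
    """One-way ANOVA F ratio from a list of samples (hypothetical helper)."""
    N = sum(len(g) for g in groups)   # total sample size
    G = len(groups)                   # number of groups
    grand_mean = np.concatenate(groups).mean()

    # Between-groups variability: squared deviations of each group mean
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups variability: squared deviations of each score from
    # its own group mean, summed across groups.
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)

    msb = ss_between / (G - 1)   # dfB = G - 1
    msw = ss_within / (N - G)    # dfW = N - G
    return msb / msw
```

Applied to the three groups in the worked example later in this section, this function returns F ≈ 9.92.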

Interpreting F and the p-Value
By knowing the distribution of F under the null hypothesis, we can calculate the p-value. Statistical software like SPSS or Excel can compute both the F statistic and the p-value.
- If p ≤ 0.05, reject the null hypothesis and conclude that there are differences among the group means.
- If p > 0.05, retain the null hypothesis, as there is not enough evidence to conclude differences exist.
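The p-value itself is simply the upper-tail area of the F-distribution beyond the observed F ratio. A minimal sketch using SciPy, with the F ratio and degrees of freedom taken from the worked example later in this section:

```python
from scipy import stats

# Upper-tail area of the F distribution beyond the observed F ratio,
# with (df_between, df_within) degrees of freedom.
f_ratio, df_between, df_within = 9.92, 2, 21
p_value = stats.f.sf(f_ratio, df_between, df_within)
print(f"p = {p_value:.4f}")  # p = 0.0009
```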
Using Critical Values to Interpret F
In cases where F is calculated manually, a table of critical values (like Table 11.3.1) can be used. For example:
- If the computed F ratio exceeds the critical value for the given degrees of freedom (dfB and dfW), the null hypothesis is rejected.
- If the computed F ratio is less than the critical value, the null hypothesis is retained.
- Critical values vary based on the degrees of freedom and the significance level (α = 0.05).
Table 11.3.1 Critical values of F at α = 0.05
dfW | dfB = 2 | dfB = 3 | dfB = 4 |
8 | 4.459 | 4.066 | 3.838 |
9 | 4.256 | 3.863 | 3.633 |
10 | 4.103 | 3.708 | 3.478 |
11 | 3.982 | 3.587 | 3.357 |
12 | 3.885 | 3.490 | 3.259 |
13 | 3.806 | 3.411 | 3.179 |
14 | 3.739 | 3.344 | 3.112 |
15 | 3.682 | 3.287 | 3.056 |
16 | 3.634 | 3.239 | 3.007 |
17 | 3.592 | 3.197 | 2.965 |
18 | 3.555 | 3.160 | 2.928 |
19 | 3.522 | 3.127 | 2.895 |
20 | 3.493 | 3.098 | 2.866 |
21 | 3.467 | 3.072 | 2.840 |
22 | 3.443 | 3.049 | 2.817 |
23 | 3.422 | 3.028 | 2.796 |
24 | 3.403 | 3.009 | 2.776 |
25 | 3.385 | 2.991 | 2.759 |
30 | 3.316 | 2.922 | 2.690 |
35 | 3.267 | 2.874 | 2.641 |
40 | 3.232 | 2.839 | 2.606 |
45 | 3.204 | 2.812 | 2.579 |
50 | 3.183 | 2.790 | 2.557 |
55 | 3.165 | 2.773 | 2.540 |
60 | 3.150 | 2.758 | 2.525 |
65 | 3.138 | 2.746 | 2.513 |
70 | 3.128 | 2.736 | 2.503 |
75 | 3.119 | 2.727 | 2.494 |
80 | 3.111 | 2.719 | 2.486 |
85 | 3.104 | 2.712 | 2.479 |
90 | 3.098 | 2.706 | 2.473 |
95 | 3.092 | 2.700 | 2.467 |
100 | 3.087 | 2.696 | 2.463 |
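As a cross-check, the entries in Table 11.3.1 can be reproduced from the F-distribution itself; a minimal sketch using SciPy:

```python
from scipy import stats

# The critical value at alpha = 0.05 is the 95th percentile of the
# F distribution with (dfB, dfW) degrees of freedom.
alpha = 0.05
for df_within in (8, 21, 100):
    row = [round(stats.f.ppf(1 - alpha, df_between, df_within), 3)
           for df_between in (2, 3, 4)]
    print(df_within, row)
# 8   [4.459, 4.066, 3.838]
# 21  [3.467, 3.072, 2.84]
# 100 [3.087, 2.696, 2.463]
```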
Example: One-Way ANOVA
A health psychologist is investigating whether different groups estimate the calorie content of a chocolate chip cookie differently. The three groups in the study are psychology majors, nutrition majors, and professional dietitians. The psychologist collects calorie estimation data from participants in each group and compares the group means using a one-way ANOVA. This statistical test is designed to determine if there are significant differences among the group means in the population.
The calorie estimates from each group are as follows:
- Psychology majors (8 participants): 200, 180, 220, 160, 150, 200, 190, 200
  - Mean (M1) = 187.50
  - Standard Deviation (SD1) = 23.14
- Nutrition majors (9 participants): 190, 220, 200, 230, 160, 150, 200, 210, 195
  - Mean (M2) = 195.00
  - Standard Deviation (SD2) = 27.77
- Dietitians (8 participants): 220, 250, 240, 275, 250, 230, 200, 240
  - Mean (M3) = 238.13
  - Standard Deviation (SD3) = 22.35
From the descriptive statistics, dietitians appear to give noticeably higher calorie estimates than both psychology and nutrition majors, although a statistical test is needed before drawing any conclusion.
Hypotheses
- Null Hypothesis (H0): The mean calorie estimates for the three groups are equal in the population (μ1 = μ2 = μ3).
- Alternative Hypothesis (HA): Not all of the group means are equal in the population; at least one group mean differs from the others.
The psychologist decides to run a one-way ANOVA to statistically test whether the group means are significantly different in the population. The psychologist uses statistical software (e.g., SPSS, Excel, or another tool) to conduct the one-way ANOVA. The software calculates the F ratio and the p-value to determine whether the differences among the group means are statistically significant.
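The same analysis can be reproduced in a few lines with SciPy's f_oneway; a minimal sketch using the estimates listed above:

```python
from scipy import stats

psychology = [200, 180, 220, 160, 150, 200, 190, 200]
nutrition  = [190, 220, 200, 230, 160, 150, 200, 210, 195]
dietitians = [220, 250, 240, 275, 250, 230, 200, 240]

# One-way ANOVA across the three independent groups.
f_ratio, p_value = stats.f_oneway(psychology, nutrition, dietitians)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")  # F = 9.92, p = 0.0009
```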
The results of the ANOVA are summarised in the following table (Table 11.3.2):
Source of variation | SS | df | MS | F | p-value | Fcritical |
Between groups | 11,943.75 | 2 | 5,971.875 | 9.916234 | 0.000928 | 3.4668 |
Within groups | 12,646.88 | 21 | 602.2321 | | | |
Total | 24,590.63 | 23 | | | | |
Key results from the table:
- MSB (mean squares between groups): 5,971.88
- MSW (mean squares within groups): 602.23
- F (F ratio): 9.92
- p-value: 0.0009
Interpretation of Results
- Decision rule: The critical value of F (Fcritical) at α = 0.05, with 2 and 21 degrees of freedom, is 3.467. The computed F ratio (9.92) exceeds this critical value, indicating that the result is statistically significant.
- p-value: The p-value (0.0009) is much smaller than 0.05, providing strong evidence to reject the null hypothesis.
- Conclusion: The psychologist concludes that the mean calorie estimates for the three groups are not equal in the population. This suggests significant differences in how the three groups estimate calorie content.
The results provide strong evidence that the three groups do not all estimate calories the same way, with the descriptive statistics pointing to the dietitians as the group whose estimates stand apart. However, the ANOVA itself does not specify which group means differ from each other. To identify the specific group differences, post hoc comparisons (e.g., Tukey’s HSD test) would need to be conducted.
The results can be reported as follows:
- F(2, 21) = 9.92, p = 0.0009
This format indicates:
- Degrees of freedom (dfB = 2, dfW = 21)
- F-statistic (F = 9.92)
- p-value (p = 0.0009)
The sum of squares (SS) values for “Between Groups” and “Within Groups” are intermediate calculations used to find the MSB and MSW values but are not typically reported in the final results.
If the F ratio had been computed manually, the psychologist could use a table of critical values (like Table 11.3.1) to confirm the statistical significance.
Post Hoc Comparisons: Follow-Up Tests After ANOVA
When the null hypothesis is rejected in a one-way ANOVA, we conclude that the group means are not all equal in the population. However, this result does not clarify the specific nature of the differences. For instance, in a study with three groups:
- All three means might differ significantly from one another. For example, the mean calorie estimates for psychology majors, nutrition majors, and dietitians could all be distinct.
- Only one mean might differ significantly. For example, dietitians’ calorie estimates might differ significantly from those of psychology and nutrition majors, while the psychology and nutrition majors’ means remain similar.
Because ANOVA does not specify which groups differ, significant results are typically followed by post hoc comparisons to identify specific group differences.
Why Not Just Use Multiple t-Tests?
A straightforward approach might be to run a series of independent-samples t-tests, comparing each group’s mean with every other group’s mean. However, this method has a critical flaw: an inflated risk of Type I error. At α = 0.05, each t-test has a 5% chance of incorrectly rejecting the null hypothesis when it is true, so conducting multiple t-tests increases the cumulative probability of making at least one such error. For example, comparing three groups requires three t-tests; if the tests were independent, the chance of at least one false rejection would be about 14%, nearly triple the nominal 5% level (see the sketch below).
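A quick calculation illustrates the problem. This is a minimal sketch under the simplifying assumption that the tests are independent (pairwise t-tests on the same data are not fully independent, so this is an approximation):

```python
# Familywise error rate for k tests, each at alpha = 0.05, assuming
# independence: P(at least one false rejection) = 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 3, 6, 10):  # e.g., 3 tests for 3 groups, 6 tests for 4 groups
    print(k, round(1 - (1 - alpha) ** k, 3))
# 1 0.05
# 3 0.143
# 6 0.265
# 10 0.401
```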
To control for this inflated error risk, researchers use modified t-test procedures designed to keep the overall error rate within acceptable limits (close to 5%). These methods include:
- Bonferroni Procedure: Divides the significance threshold by the number of comparisons (e.g., testing each of three comparisons at α = 0.05/3 ≈ 0.017).
- Fisher’s Least Significant Difference (LSD) Test: Conducts pairwise comparisons only after the overall ANOVA is statistically significant, which helps limit the familywise error rate.
- Tukey’s Honestly Significant Difference (HSD) Test: Specifically designed for all pairwise comparisons while holding the familywise error rate near 5%.
While the technical details of these methods are beyond the scope of this explanation, their purpose is straightforward: to minimise the chance of mistakenly rejecting a true null hypothesis while identifying meaningful group differences.
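As an illustration, SciPy (version 1.8 or later) provides tukey_hsd for Tukey’s HSD; a minimal sketch using the calorie-estimation data from the worked example above:

```python
from scipy import stats

psychology = [200, 180, 220, 160, 150, 200, 190, 200]
nutrition  = [190, 220, 200, 230, 160, 150, 200, 210, 195]
dietitians = [220, 250, 240, 275, 250, 230, 200, 240]

# All pairwise comparisons, with the familywise error rate held near 5%.
result = stats.tukey_hsd(psychology, nutrition, dietitians)
print(result)  # table of mean differences, p-values, and confidence intervals
```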
Repeated-Measures ANOVA
The one-way ANOVA is designed for between-subjects designs, where the means being compared come from separate groups of participants. However, it is not suitable for within-subjects designs, where the same participants are tested under different conditions or at different times. In these cases, a repeated-measures ANOVA is used instead.
The fundamental principles of repeated-measures ANOVA are similar to those of the one-way ANOVA. The primary distinction lies in how the analysis accounts for variability within the data:
Handling Individual Differences
Measuring the dependent variable multiple times for each participant allows the analysis to account for stable individual differences. These are characteristics that consistently vary among participants but remain unchanged across conditions. For example:
- In a reaction-time study, some participants might be naturally faster or slower due to stable factors like their nervous system or muscle response.
- In a between-subjects design, these differences add to the variability within groups, increasing the mean squares within groups (MSW). A higher MSW results in a lower F value, making it harder to detect significant differences.
- In a repeated-measures design, these stable individual differences are measured and subtracted from MSW, reducing its value.
Increased Sensitivity
By reducing the value of MSW, repeated-measures ANOVA produces a higher F value, making the test more sensitive to detecting actual differences between conditions.
This approach leverages the fact that participants serve as their own control, allowing the analysis to isolate and remove irrelevant variability caused by stable individual differences. As a result, repeated-measures ANOVA offers a more precise and powerful way to detect meaningful effects in within-subjects designs.
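As a sketch of what this looks like in practice, statsmodels provides AnovaRM for repeated-measures designs. The data below are hypothetical: each of five participants is measured under all three conditions, and the subject column identifies the repeated measures.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per participant per condition.
data = pd.DataFrame({
    "subject":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "condition": ["A", "B", "C"] * 5,
    "rt":        [420, 450, 470, 380, 410, 430, 500, 520, 555,
                  440, 455, 480, 400, 425, 445],
})

# Repeated-measures ANOVA: stable subject-level variability is removed
# from the error term before the F ratio for condition is computed.
model = AnovaRM(data, depvar="rt", subject="subject", within=["condition"])
print(model.fit().anova_table)
```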
Factorial ANOVA
When a study includes more than one independent variable, the appropriate statistical approach is the factorial ANOVA. This method builds on the principles of the one-way and repeated-measures ANOVAs but is designed to handle the complexities of factorial designs.
Multiple Effects Analysis
Factorial ANOVA examines not only the impact of each independent variable (main effects) but also how these variables interact (interaction effects). For each effect:
- It calculates an F ratio to measure the variability explained by the effect.
- It provides a p-value to indicate whether the effect is statistically significant.
Main Effects
These represent the individual influence of each independent variable on the dependent variable. For example:
- In a calorie estimation study, one main effect might measure whether a participant’s major (psychology vs. nutrition) influences calorie estimates.
- Another main effect might examine whether the type of food (cookie vs. hamburger) affects the estimates.
Interaction Effects
The interaction effect explores whether the influence of one independent variable depends on the level of the other. For instance:
- Does the participant’s major (psychology vs. nutrition) affect calorie estimates differently depending on the type of food (cookie vs. hamburger)?
Customising for Study Design
The factorial ANOVA is adaptable and must be tailored based on the study design:
- Between-subjects design: Participants are assigned to separate groups based on the levels of the independent variables.
- Within-subjects design: The same participants experience all levels of the independent variables.
- Mixed design: Combines between-subjects and within-subjects elements.
Imagine a health psychologist investigates calorie estimation using two independent variables: participant major (psychology vs. nutrition) and food type (cookie vs. hamburger). A factorial ANOVA would calculate:
- An F ratio and p-value for the main effect of major.
- An F ratio and p-value for the main effect of food type.
- An F ratio and p-value for the interaction between major and food type.
This analysis provides a detailed understanding of how the independent variables individually and collectively influence calorie estimates, making the factorial ANOVA a powerful tool for complex research designs.
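A minimal sketch of such a two-way, between-subjects analysis using statsmodels follows; the data are hypothetical, and the formula term major * food expands to both main effects plus their interaction:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical between-subjects data: calorie estimates by major and food.
data = pd.DataFrame({
    "major":    ["psych"] * 8 + ["nutrition"] * 8,
    "food":     (["cookie"] * 4 + ["hamburger"] * 4) * 2,
    "estimate": [200, 180, 220, 190, 450, 500, 480, 470,
                 160, 150, 200, 195, 400, 420, 390, 410],
})

# Fit a linear model with both main effects and their interaction,
# then produce the ANOVA table (one F ratio and p-value per effect).
model = smf.ols("estimate ~ major * food", data=data).fit()
print(anova_lm(model, typ=2))
```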
Chapter Attribution
Content adapted, with editorial changes, from:
Research methods in psychology, (4th ed.), (2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.