"

6.6. Analysing the Data

By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim


Once a study has been conducted and the data collected, researchers must systematically analyse the data to draw meaningful conclusions. This stage is critical because raw data, no matter how extensive, do not speak for themselves. Data analysis involves applying statistical techniques to identify patterns, relationships, and trends within the data. Typically, researchers use two primary types of statistics: descriptive statistics and inferential statistics. Together, these approaches help summarise the data, test hypotheses, and determine whether the results can be generalised to a larger population.

Descriptive Statistics

Descriptive statistics are used to summarise, organise, and simplify data so that they can be more easily interpreted. They provide a clear picture of what the data look like and allow researchers to highlight key patterns. Descriptive statistics typically fall into three major categories: measures of central tendency, measures of dispersion, and correlation coefficients.

Measures of Central Tendency

These statistics describe the centre or average value of a data set and give researchers an idea of the “typical” response within the sample. The three main measures of central tendency are:

  • Mean: The arithmetic average of a set of scores. It is calculated by adding all the scores together and dividing by the number of scores.
  • Median: The middle score in a dataset when the scores are arranged in ascending or descending order.
  • Mode: The most frequently occurring score in a dataset.

For example, if researchers measure how many hours a group of students sleep per night, the mean would give the average number of hours, the median would show the midpoint value, and the mode would identify the most common number of hours reported.
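
To make these measures concrete, here is a brief Python sketch using a small set of invented sleep-duration scores (the numbers are purely illustrative):

```python
from statistics import mean, median, mode

# Hypothetical hours of sleep reported by ten students (invented data)
sleep_hours = [6, 7, 7, 8, 6, 7, 9, 5, 7, 8]

print("Mean:", mean(sleep_hours))      # arithmetic average: sum of scores / number of scores
print("Median:", median(sleep_hours))  # middle value once the scores are sorted
print("Mode:", mode(sleep_hours))      # most frequently occurring value
```

In this invented dataset the mean, median, and mode all come out to 7 hours, but in skewed datasets the three measures can differ substantially, which is one reason researchers often report more than one.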

Measures of Dispersion

While measures of central tendency show the average or typical value, measures of dispersion indicate how spread out the data are around that central point. These include:

  • Range: The difference between the highest and lowest scores.
  • Standard Deviation: A more sophisticated measure that indicates how far, on average, each score deviates from the mean.
  • Variance: The square of the standard deviation, providing another measure of spread, though less commonly interpreted directly.

For instance, if two classrooms report an average test score of 85, but one classroom has a standard deviation of 2 while the other has a standard deviation of 15, the latter classroom shows far greater variability in student performance.
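
A short Python sketch with invented scores for two classrooms illustrates how the same mean can conceal very different amounts of spread:

```python
from statistics import mean, pstdev, pvariance

# Hypothetical test scores for two classrooms with the same mean (invented data)
classroom_a = [84, 85, 86, 83, 87, 85]    # scores clustered tightly around 85
classroom_b = [60, 95, 70, 100, 85, 100]  # scores spread widely around 85

for name, scores in [("A", classroom_a), ("B", classroom_b)]:
    spread = max(scores) - min(scores)  # range: highest score minus lowest score
    sd = pstdev(scores)                 # population standard deviation
    var = pvariance(scores)             # variance: the standard deviation squared
    print(f"Classroom {name}: mean={mean(scores):.1f}, "
          f"range={spread}, sd={sd:.1f}, variance={var:.1f}")
```

The sketch uses the population formulas (pstdev, pvariance); sample formulas divide by n − 1 rather than n, but the interpretation of the result is the same.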

Correlation Coefficients

In non-experimental research, researchers often seek to identify relationships between two variables rather than comparing groups. The correlation coefficient measures both the strength and direction of these relationships, ranging from -1.00 to +1.00:

  • A positive correlation means that as one variable increases, the other also increases (e.g., height and weight).
  • A negative correlation means that as one variable increases, the other decreases (e.g., stress and happiness).
  • A correlation coefficient close to 0 indicates no relationship between the variables.

For example, if researchers observe a correlation of +0.75 between sleep duration and cognitive performance, it suggests a strong positive relationship, where better sleep is associated with better cognitive outcomes.
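
As a rough sketch, the Pearson correlation coefficient can be computed by hand in Python from invented sleep and memory-test scores:

```python
from statistics import mean
from math import sqrt

# Hypothetical sleep (hours) and memory-test scores for eight participants (invented data)
sleep  = [5, 6, 6, 7, 7, 8, 8, 9]
memory = [60, 65, 62, 70, 72, 75, 74, 80]

def pearson_r(x, y):
    """Pearson correlation: how consistently x and y deviate from their means together."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

print(f"r = {pearson_r(sleep, memory):+.2f}")  # a value near +1 indicates a strong positive relationship
```

In practice researchers rely on statistical software rather than hand computation, but writing out the formula makes clear that the coefficient captures how consistently the two variables rise and fall together.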

Descriptive statistics serve as the foundation for understanding the dataset, preparing it for more complex statistical analysis, and communicating the findings in a clear and accessible way.

Inferential Statistics

While descriptive statistics summarise what happened within the sample, inferential statistics allow researchers to draw conclusions about the broader population based on sample data. This process is crucial because most psychological studies rely on samples rather than entire populations.

Inferential statistics enable researchers to test hypotheses and determine whether the observed effects in their data are statistically significant, that is, whether they are unlikely to have occurred by chance.

Statistical Significance and Probability

Statistical significance is determined using a p-value: a probability that indicates how likely it would be to obtain results at least as extreme as those observed if the null hypothesis (the assumption that there is no real effect or relationship) were true. In most research, a p-value of less than 0.05 (5%) is considered statistically significant, meaning that results this extreme would occur less than 5% of the time if there were no real effect.

For example, if a study finds that a new anxiety treatment significantly reduces symptoms compared to a placebo, and the p-value is less than 0.05, researchers can conclude that the effect is unlikely to be due to chance.
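
The sketch below shows how such a comparison is typically carried out, here with an independent-samples t-test from SciPy; the anxiety scores and the choice of test are assumptions made for illustration:

```python
from scipy import stats

# Hypothetical post-treatment anxiety scores (invented data; lower = less anxious)
treatment = [12, 14, 11, 13, 10, 12, 13, 11]
placebo   = [16, 18, 15, 17, 19, 16, 14, 17]

# Independent-samples t-test: is the difference between the group means larger
# than we would expect by chance if the null hypothesis were true?
t_stat, p_value = stats.ttest_ind(treatment, placebo)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant at the conventional 5% threshold.")
else:
    print("Not statistically significant at the 5% threshold.")
```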

The Role of Probability and Error

It is important to note that inferential statistics are probabilistic and never provide absolute certainty. Instead, they offer confidence levels about whether an observed effect reflects a real relationship in the population. However, this probabilistic nature opens the door to potential errors:

  • Type I Error (False Positive): This occurs when researchers conclude that an effect exists when it actually does not. For example, they might conclude that a drug improves memory when the observed results were purely due to chance. The 5% significance threshold limits this risk to 5% but does not eliminate it.
  • Type II Error (False Negative): This happens when researchers fail to detect an effect that actually exists. For instance, they might conclude that a treatment has no impact when it genuinely does, perhaps because the sample size was too small or the statistical power was inadequate.

Researchers aim to strike a balance between minimising Type I and Type II errors, often adjusting sample sizes, significance thresholds, and statistical techniques to ensure their conclusions are as reliable as possible.
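
A small simulation makes both kinds of error tangible: when two samples come from identical populations, roughly 5% of t-tests still come out "significant" (Type I errors), and when a real but modest effect exists, small samples frequently fail to detect it (Type II errors). The effect size and sample sizes in this Python sketch are arbitrary choices for demonstration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, alpha = 5000, 0.05

# Type I error: both groups drawn from the same population (no true effect)
false_positives = sum(
    stats.ttest_ind(rng.normal(0, 1, 20), rng.normal(0, 1, 20)).pvalue < alpha
    for _ in range(n_sims)
)

# Type II error: a real but modest effect (mean difference of 0.4 SD), small samples
misses = sum(
    stats.ttest_ind(rng.normal(0, 1, 20), rng.normal(0.4, 1, 20)).pvalue >= alpha
    for _ in range(n_sims)
)

print(f"Type I error rate (no true effect): {false_positives / n_sims:.1%}")  # close to 5%
print(f"Type II error rate (true effect):   {misses / n_sims:.1%}")           # often large with n = 20 per group
```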

Drawing Conclusions from Statistical Analyses

Once researchers have completed their statistical analyses, they must carefully interpret their results. Did the findings support the hypothesis? Were there unexpected patterns? Do the results align with or contradict previous research?

Statistical Significance vs. Practical Significance

While statistical significance indicates whether an effect is unlikely due to chance, practical significance considers whether the effect is meaningful in real-world terms. For example, if a study finds that a drug reduces anxiety scores by 0.5 points on a 100-point scale, the result might be statistically significant but not practically meaningful.
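
One common way to gauge practical significance is an effect size such as Cohen's d, which expresses a difference in standard-deviation units. The Python sketch below, using invented anxiety scores, shows how a 0.5-point difference can correspond to a very small standardised effect when individual scores vary widely:

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(group1, group2):
    """Standardised mean difference: the raw difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

# Hypothetical anxiety scores on a 100-point scale (invented data; lower = less anxious)
control = [44, 57, 60, 50, 68, 55, 59, 41]
drug    = [40, 55, 62, 48, 70, 52, 58, 45]

print(f"Raw difference: {mean(control) - mean(drug):.1f} points on a 100-point scale")
print(f"Cohen's d: {cohens_d(control, drug):.2f}")  # about 0.05: a negligible effect by conventional benchmarks
```

By Cohen's widely used benchmarks, values of roughly 0.2, 0.5, and 0.8 correspond to small, medium, and large effects, so a d of about 0.05 suggests the difference has little real-world importance even if a very large sample rendered it statistically significant.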

Replicability and Transparency

To strengthen confidence in their findings, researchers often conduct replication studies by repeating the experiment under similar conditions to see if the same results emerge. They also share their data, methods, and analyses transparently, enabling other scientists to verify or challenge their conclusions.


Chapter Attribution

Content adapted, with editorial changes, from:

Research methods in psychology (4th ed., 2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.

License


6.6. Analysing the Data Copyright © 2025 by Marc Chao and Muhamad Alif Bin Ibrahim is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
