10.6. Conducting Your Analyses
By Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler and Dana C. Leighton, adapted by Marc Chao and Muhamad Alif Bin Ibrahim
Analysing data can be a challenging task, even when you have a solid understanding of the statistical methods involved. Typically, you will be working with data collected from multiple participants, covering several variables. These might include demographic details like age and sex, independent and dependent variables, and possibly manipulation checks to verify experimental conditions.
The raw data you collect might come in various forms, including paper questionnaires, digital files filled with numbers or text, video recordings, or written observations. These different sources of information often need to be organised, coded, or merged into a cohesive dataset before analysis can begin. Additionally, you may encounter missing data, errors, or responses that seem unusual or inconsistent, all of which need to be addressed carefully.
In this section, we will explore practical strategies to streamline your data analysis process. By staying organised and approaching the task systematically, you can reduce errors, save time, and ensure your results are accurate and reliable.
Preparing Your Data for Analysis
Before analysing your data, whether it is in paper form or stored in a digital file, there are some essential steps to follow to ensure everything is organised, secure, and ready for processing.
First, make sure your data does not include any information that could identify individual participants. Confidentiality is crucial, so store raw data securely, either in a locked room or on a password-protected computer. Consent forms should be stored separately in another secure location. Additionally, create backup copies of your data, either photocopies or digital backups, and store them securely in a different location. Professional researchers typically keep these records for several years in case questions arise later about the data, procedure, or consent process.
Next, carefully review your raw data for completeness and accuracy. Check that all responses are legible, recorded correctly, and make sense. You might encounter missing responses, unclear answers, or obvious errors (e.g., someone marking “12” on a 1-to-10 scale). If these issues affect critical independent or dependent variables, or if too many responses are missing or questionable, you may need to exclude that participant’s data from your analysis. However, do not delete or discard excluded data. Instead, set them aside, document the reasons for exclusion, and keep detailed notes, as you will need to report these exclusions when you write up your results.
Once your data are clean and ready, you can enter them into a spreadsheet or statistical software like Microsoft Excel or SPSS. If your data are already in a digital file, ensure they are properly formatted for analysis. Typically, data are organised so that each row represents one participant, and each column represents one variable, with clear variable names at the top of each column.
For example, a typical data file shown in Table 10.6.1 starts with a column for participant identification numbers, followed by demographic variables (e.g., sex and age), independent variables (e.g., mood), multiple survey items (e.g., self-esteem questions), and dependent variables (e.g., intentions and attitudes). Categorical variables can be entered either as labels (e.g., “M” for male, “F” for female) or as numbers (e.g., “0” for negative mood and “1” for positive mood). While labels are more intuitive for reading, certain statistical analyses may require numerical coding. Tools like SPSS allow you to enter numerical values and attach corresponding labels for clarity.
Table 10.6.1. Sample data file

| ID | SEX | AGE | MOOD | SE1 | SE2 | SE3 | SE4 | TOTAL | INT | ATT |
|----|-----|-----|------|-----|-----|-----|-----|-------|-----|-----|
| 1  | M   | 20  | 1    | 2   | 3   | 2   | 3   | 10    | 6   | 5   |
| 2  | F   | 22  | 1    | 1   | 0   | 2   | 1   | 4     | 4   | 4   |
| 3  | F   | 19  | 0    | 2   | 2   | 2   | 2   | 8     | 2   | 3   |
| 4  | F   | 24  | 0    | 3   | 3   | 2   | 3   | 11    | 5   | 6   |
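If you prefer a scripting tool to Excel or SPSS, the same layout carries over directly. The sketch below is a minimal illustration in Python (pandas) of entering the data from Table 10.6.1, attaching readable labels to the numeric mood codes, and saving the file for later steps. The file name study_data.csv and the MOOD_LABEL column are purely hypothetical, and the TOTAL column is deliberately left out because it is better computed by the software (as described next) than entered by hand.

```python
import pandas as pd

# One row per participant, one column per variable, as in Table 10.6.1.
data = pd.DataFrame({
    "ID":   [1, 2, 3, 4],
    "SEX":  ["M", "F", "F", "F"],   # categorical variable entered as labels
    "AGE":  [20, 22, 19, 24],
    "MOOD": [1, 1, 0, 0],           # 0 = negative mood, 1 = positive mood
    "SE1":  [2, 1, 2, 3],
    "SE2":  [3, 0, 2, 3],
    "SE3":  [2, 2, 2, 2],
    "SE4":  [3, 1, 2, 3],
    "INT":  [6, 4, 2, 5],
    "ATT":  [5, 4, 3, 6],
})

# Attach readable labels to the numeric MOOD codes, similar in spirit to SPSS value labels.
data["MOOD_LABEL"] = data["MOOD"].map({0: "negative", 1: "positive"})

# Save the file so the same layout can be read back in at later steps (hypothetical file name).
data.to_csv("study_data.csv", index=False)
```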
If you are working with multiple-response measures, such as several survey items assessing self-esteem, it is better to enter each response as a separate variable in your spreadsheet rather than manually calculating a total score beforehand. Software tools like Excel or SPSS have built-in functions (e.g., “AVERAGE” in Excel or “Compute” in SPSS) to combine these responses accurately. This method reduces errors, allows you to check internal consistency, and provides flexibility if you decide to analyse individual survey items later.
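As a minimal sketch of the same idea in Python (pandas), assuming the hypothetical study_data.csv file from the previous example, the combined scores can be computed from the individual items rather than totalled by hand:

```python
import pandas as pd

data = pd.read_csv("study_data.csv")  # hypothetical file laid out like Table 10.6.1

# Combine the individual self-esteem items, analogous to AVERAGE in Excel or Compute in SPSS.
se_items = ["SE1", "SE2", "SE3", "SE4"]
data["TOTAL"] = data[se_items].sum(axis=1)     # total score
data["SE_MEAN"] = data[se_items].mean(axis=1)  # or the item average
```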
Preliminary Analyses
Before diving into your primary research questions, it is important to run a few preliminary analyses to ensure your data are reliable and ready for deeper examination.
If you are using a multiple-response measure, start by checking its internal consistency. This ensures that the items on your measure are reliably capturing the same underlying concept. Statistical programs like SPSS can calculate reliability coefficients such as Cronbach’s α for internal consistency (or Cohen’s κ if your measure involves ratings from multiple observers). If these seem too complex, you can still assess reliability with a simpler method like a split-half correlation, which compares how two halves of the measure align with each other.
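If you are working outside SPSS, both statistics can be computed directly from their definitions. The sketch below, which assumes the hypothetical study_data.csv file and the four self-esteem items from Table 10.6.1, computes Cronbach’s α from the item and total-score variances and an odd–even split-half correlation; it is an illustration, not a substitute for a dedicated reliability routine.

```python
import numpy as np
import pandas as pd

data = pd.read_csv("study_data.csv")  # hypothetical file laid out like Table 10.6.1
items = data[["SE1", "SE2", "SE3", "SE4"]].to_numpy(dtype=float)
k = items.shape[1]

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of the total score).
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# Simple split-half correlation: odd-numbered items versus even-numbered items.
odd_half = items[:, ::2].sum(axis=1)    # SE1 + SE3
even_half = items[:, 1::2].sum(axis=1)  # SE2 + SE4
split_half_r = np.corrcoef(odd_half, even_half)[0, 1]

print(f"Cronbach's alpha = {alpha:.2f}, split-half r = {split_half_r:.2f}")
```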
Next, analyse each key variable on its own (this step is not necessary for manipulated independent variables since their values are determined by the researcher). Start by creating histograms for each variable to visualise their distributions. Pay attention to their shapes and calculate common measures of central tendency (e.g., mean, median, mode) and variability (e.g., standard deviation). Be sure you understand what these statistics reveal about your data. For example, if participants rated their happiness on a 1-to-10 scale and the distribution is unimodal and negatively skewed, with a mean of 8.25 and a standard deviation of 1.14, it means most participants rated themselves fairly high on happiness, with a few giving noticeably lower ratings.
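A minimal Python sketch of this step, again assuming the hypothetical study_data.csv file, might produce the same summaries and histogram that Excel or SPSS would:

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("study_data.csv")  # hypothetical file laid out like Table 10.6.1

# Measures of central tendency and variability for each key variable.
for column in ["AGE", "INT", "ATT"]:
    print(column,
          "mean =", round(data[column].mean(), 2),
          "median =", data[column].median(),
          "SD =", round(data[column].std(ddof=1), 2))

# Histogram to inspect the shape of one distribution.
data["INT"].plot(kind="hist", bins=7, title="Distribution of intention ratings")
plt.xlabel("Intention rating")
plt.show()
```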
At this stage, it is also essential to identify outliers. These are data points that stand out as extreme compared to the rest of your dataset. Investigate these outliers carefully to determine whether they result from simple data-entry errors. If you find a mistake, correct it and move on. However, if an outlier seems to stem from a misunderstanding or lack of effort from a participant, you might need to consider excluding it. For example, in a reaction-time study where most participants responded within a few seconds, a response time of three minutes would likely indicate confusion or inattention. Including such an extreme value would significantly distort the mean and standard deviation.
If you decide to exclude outliers, document your reasons carefully and apply the same criteria consistently across all participants. Keep detailed notes on which data points were removed, why they were excluded, and the rules you followed. When you report your results, make sure to mention how many participants or responses were excluded and the criteria used for exclusion. Importantly, do not delete or discard excluded data. Set them aside in case you or another researcher needs to review them later.
It is worth noting that not all outliers are errors or misunderstandings. Sometimes, they genuinely reflect extreme but valid responses. For example, in a survey on the number of sexual partners among university students, most participants might report fewer than 15, but a handful might report 60 or 70. While these numbers could be errors, exaggerations, or misunderstandings, they might also be accurate reflections of those participants’ experiences.
In such cases, there are a few strategies you can use. One approach is to rely on statistics like the median, which are less affected by extreme values. Another approach is to run your analysis twice, once with the outliers included and once without them. If the results are essentially the same, it is usually safe to leave the outliers in the dataset. If the results differ significantly, you can report both analyses and explain how the outliers influenced the findings.
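One way to implement this double check, sketched below in Python under the same hypothetical study_data.csv file, is to flag values beyond an explicit cut-off (here, z-scores beyond ±3, chosen purely for illustration) and compare the results with and without them:

```python
import pandas as pd

data = pd.read_csv("study_data.csv")  # hypothetical file laid out like Table 10.6.1
scores = data["INT"].astype(float)    # any key quantitative variable

# Flag values more than 3 SDs from the mean (an illustrative cut-off, not a universal rule).
z_scores = (scores - scores.mean()) / scores.std(ddof=1)
is_outlier = z_scores.abs() > 3

print("Mean with outliers:   ", round(scores.mean(), 2))
print("Mean without outliers:", round(scores[~is_outlier].mean(), 2))
print("Median (less affected by extreme values):", scores.median())
print("Flagged cases:", list(data.loc[is_outlier, "ID"]))  # document what was flagged and why
```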
Planned and Exploratory Analyses
Once your data are prepared and preliminary analyses are complete, you are ready to address your primary research questions. When designing your study, you likely had specific hypotheses in mind, such as predictions about relationships or patterns you expected to find in the data. Testing these predictions involves planned analyses, where you focus on analysing the relationships you anticipated.
For example, if your hypothesis predicted a difference between group or condition means, you would calculate the means and standard deviations for each group, create a bar graph to visualise the results, and calculate Cohen’s d to measure the size of the difference. If your hypothesis involved a correlation between two quantitative variables, you would create a scatterplot or line graph (making sure to check for any signs of nonlinearity or restriction of range) and calculate Pearson’s r to measure the strength and direction of the relationship.
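A hedged Python sketch of these planned statistics, assuming the hypothetical study_data.csv file and the mood manipulation and intention ratings from Table 10.6.1, might look like this:

```python
import numpy as np
import pandas as pd
from scipy import stats

data = pd.read_csv("study_data.csv")  # hypothetical file laid out like Table 10.6.1

# Planned comparison of two condition means: descriptives plus Cohen's d
# (pooled SD computed as for equal-sized groups).
positive = data.loc[data["MOOD"] == 1, "INT"].astype(float)
negative = data.loc[data["MOOD"] == 0, "INT"].astype(float)
pooled_sd = np.sqrt((positive.var(ddof=1) + negative.var(ddof=1)) / 2)
cohens_d = (positive.mean() - negative.mean()) / pooled_sd
print(f"M+ = {positive.mean():.2f}, M- = {negative.mean():.2f}, d = {cohens_d:.2f}")

# Planned correlation between two quantitative variables: Pearson's r.
r, p_value = stats.pearsonr(data["INT"], data["ATT"])
print(f"Pearson's r = {r:.2f}")
```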
After completing your planned analyses, you might decide to look for additional patterns or relationships in your data that you did not predict beforehand. These are called exploratory analyses because they are not based on pre-existing hypotheses. Exploratory analyses can uncover unexpected findings that might inspire future research or provide valuable insights for the discussion section of your report.
As psychologist Daryl Bem (2003) suggests, exploratory analysis often involves examining the data from multiple perspectives. You might analyse subgroups separately (e.g., by sex), create new composite scores by combining variables, or reorganise the data in different ways to reveal potential patterns. If an interesting trend emerges, you might explore whether similar evidence exists elsewhere in your dataset. While Bem humorously describes this as a “fishing expedition”, the goal is to uncover meaningful insights hidden in the data.
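A brief, purely illustrative sketch of this kind of exploration in Python, again assuming the hypothetical study_data.csv file; the subgroup split and composite score here are arbitrary examples, not analyses the chapter prescribes:

```python
import pandas as pd

data = pd.read_csv("study_data.csv")  # hypothetical file laid out like Table 10.6.1

# Re-examine key variables separately for each subgroup (here, by sex).
print(data.groupby("SEX")[["INT", "ATT"]].mean())

# Build a new composite score by combining existing variables.
data["INT_ATT_COMPOSITE"] = data[["INT", "ATT"]].mean(axis=1)
print(data[["ID", "INT_ATT_COMPOSITE"]])
```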
However, it is important to distinguish planned analyses from exploratory analyses when presenting your results. Planned analyses are based on specific hypotheses and have a clearer foundation, while exploratory analyses are more open-ended and carry a higher risk of identifying patterns that occurred purely by chance. This risk is known as a Type I error, where a random anomaly is mistaken for a genuine finding.
Because of this risk, findings from exploratory analyses should be interpreted cautiously and ideally tested again in a follow-up study before being presented as reliable results. In your report, make it clear which results came from planned analyses and which emerged during exploratory analysis. If you discover intriguing patterns during exploratory analysis, describe them as potential areas for further investigation rather than definitive conclusions.
Understanding Your Descriptive Statistics
Before diving into inferential statistics, which help determine whether your study’s results are likely to apply to the larger population, it is essential to fully understand your descriptive statistics. These statistics tell the story of what actually happened in your study, providing a clear snapshot of your data.
For example, imagine a study where a treatment group of 50 participants has an average score of 34.32 with a standard deviation (SD) of 10.45, while a control group of 50 participants has an average score of 21.45 with an SD of 9.22. Additionally, the effect size (Cohen’s d) is a very strong 1.31. Even without running a formal inferential test like a t-test, these descriptive statistics already make it clear that the treatment had a substantial effect.
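As a quick check, the reported effect size follows directly from these descriptive statistics when Cohen’s d is computed with the pooled standard deviation for two equal-sized groups; a short Python verification:

```python
import math

m_treatment, sd_treatment = 34.32, 10.45
m_control, sd_control = 21.45, 9.22

# Pooled SD for two equal-sized groups, then Cohen's d = mean difference / pooled SD.
pooled_sd = math.sqrt((sd_treatment**2 + sd_control**2) / 2)
d = (m_treatment - m_control) / pooled_sd
print(round(d, 2))  # about 1.31, matching the value reported above
```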
Similarly, consider a scatterplot showing a random cloud of data points and a Pearson’s r value of −0.02. This tiny correlation tells you that there is essentially no relationship between the two variables. Again, while inferential statistical testing would still be part of a formal report, the descriptive statistics alone already paint a clear picture.
References
Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The compleat academic: A career guide for the beginning social scientist (2nd ed.). American Psychological Association.
Chapter Attribution
Content adapted, with editorial changes, from:
Research methods in psychology (4th ed., 2019) by R. S. Jhangiani et al., Kwantlen Polytechnic University, is used under a CC BY-NC-SA licence.