Appendix T — SPSS Tutorial: Repeated Measures ANOVA
Conducting one-way repeated measures ANOVA in SPSS
T.1 Overview
This tutorial walks through a one-way repeated measures ANOVA using the core_session_wide_training.csv dataset. Download it here: core_session_wide_training.csv. The research question is: Did muscular strength change significantly across three time points (pre-, mid-, and post-training) in participants enrolled in a 12-week resistance training program?
The dataset is already in wide format (one row per participant, separate columns for each time point) and filtered to the training group only — no restructuring is needed.
Prerequisites: You should have completed the SPSS tutorials for independent samples t-test and one-way ANOVA before working through this tutorial.
T.2 Part 1: Opening the dataset
Step 1: Open core_session_wide_training.csv in SPSS
Go to File → Open → Data, locate core_session_wide_training.csv, and open it. The dataset contains 30 participants (training group only) in wide format, with variables strength_kg_pre, strength_kg_mid, and strength_kg_post ready for analysis.
T.3 Part 2: Running the repeated measures ANOVA
Step 1: Open the General Linear Model dialog
Go to Analyze → General Linear Model → Repeated Measures
Step 2: Define the within-subject factor
In the Repeated Measures Define Factor(s) dialog:
- In the Within-Subject Factor Name field, type: Time
- In the Number of Levels field, type: 3
- Click Add
- Click Define
Step 3: Assign variables to the factor levels
In the Repeated Measures dialog:
- Move strength_kg_pre to the box next to Time(1)
- Move strength_kg_mid to the box next to Time(2)
- Move strength_kg_post to the box next to Time(3)
Step 4: Request options
Click Options:
- Check Descriptive statistics
- Check Estimates of effect size
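- Check Homogeneity tests (this option applies to between-subjects factors; SPSS prints Mauchly’s test of sphericity by default whenever the within-subject factor has three or more levels)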
Step 5: Request estimated marginal means
Click Continue to close the Options dialog, then click EM Means:
- Under Display Means for, move Time to the right panel
- Check Compare main effects
- Under Confidence interval adjustment, select Bonferroni
- Click Continue
Step 6: Request a means plot
Click Plots:
- Move Time to the Horizontal Axis box
- Click Add
- Check Error bars and select 95% CI or Standard Error
- Click Continue
Step 7: Run the analysis
Click OK. SPSS will produce several output tables.
T.4 Part 3: Interpreting the output
T.4.1 Table 1: Descriptive statistics
The first table shows the mean, standard deviation, and n for each time point:
| Time Point | M (kg) | SD | n |
|---|---|---|---|
| Pre-training | 79.67 | 12.26 | 30 |
| Mid-training (6-week) | 81.69 | 12.26 | 30 |
| Post-training (12-week) | 85.06 | 12.48 | 30 |
Strength increased progressively at each time point, with a total gain of approximately 5.4 kg from pre to post.
T.4.2 Table 2: Mauchly’s test of sphericity
| Effect | Mauchly’s W | χ² | df | p | ε (GG) | ε (HF) | ε (LB) |
|---|---|---|---|---|---|---|---|
| Time | .623 | 13.234 | 2 | .001 | .726 | .755 | .500 |
Interpreting this table:
Mauchly’s W = .623, χ²(2) = 13.234, p = .001. Because p < .05, the sphericity assumption is violated. We therefore do not use the Sphericity Assumed row. Because ε_GG = .726, which is below .75, we will report the Greenhouse-Geisser corrected row in the within-subjects effects table.
When Mauchly’s test is significant and sphericity is violated, inspect the GG epsilon:
- ε_GG ≥ .75 → use Huynh-Feldt corrected row
- ε_GG < .75 → use Greenhouse-Geisser corrected row
The within-subjects effects table will contain four rows for the Time effect: Sphericity Assumed, Greenhouse-Geisser, Huynh-Feldt, and Lower-bound. You must determine which row to report based on Mauchly’s test result — not by choosing the row with the most favorable p-value. Selecting a row post hoc based on significance constitutes p-hacking[1,2].
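The decision rule can also be written as a small function, which makes explicit that the choice depends only on Mauchly’s p-value and ε_GG and never on the F-rows. This is a minimal Python sketch; the function name `choose_sphericity_row` and its default thresholds are illustrative and are not part of SPSS.

```python
def choose_sphericity_row(mauchly_p, eps_gg, alpha=0.05, eps_cutoff=0.75):
    """Return the row of the within-subjects effects table to report.

    Hypothetical helper that encodes this tutorial's decision rule.
    """
    if mauchly_p >= alpha:
        # Mauchly's test is not significant: sphericity is tenable
        return "Sphericity Assumed"
    if eps_gg >= eps_cutoff:
        # Sphericity violated, epsilon at or above the cutoff
        return "Huynh-Feldt"
    # Sphericity violated, epsilon below the cutoff
    return "Greenhouse-Geisser"


# Values from the Mauchly's test table in this tutorial
print(choose_sphericity_row(mauchly_p=0.001, eps_gg=0.726))
```

For the tutorial data the function returns the Greenhouse-Geisser row, matching the choice made above.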
T.4.3 Table 3: Tests of within-subjects effects
| Source | Correction | SS | df | MS | F | p | η²_p |
|---|---|---|---|---|---|---|---|
| Time | Sphericity Assumed | 443.73 | 2 | 221.87 | 116.0 | < .001 | .800 |
| | Greenhouse-Geisser | 443.73 | 1.452 | 305.60 | 116.0 | < .001 | .800 |
| | Huynh-Feldt | 443.73 | 1.510 | 293.86 | 116.0 | < .001 | .800 |
| Error (Time) | Sphericity Assumed | 110.94 | 58 | 1.91 | | | |
| | Greenhouse-Geisser | 110.94 | 42.11 | 2.63 | | | |
| | Huynh-Feldt | 110.94 | 43.79 | 2.53 | | | |
Interpreting this table:
Because Mauchly’s test was significant and ε_GG < .75, we read the Greenhouse-Geisser row: F(1.45, 42.11) = 116.0, p < .001, η²_p = .80.
The F-ratio is the same across all rows — only the degrees of freedom (and therefore the exact p-value) change with the corrections. Here, the correction meaningfully reduces the degrees of freedom, so the correct reporting choice is the Greenhouse-Geisser row rather than the uncorrected sphericity-assumed row.
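To see why F stays fixed while the degrees of freedom shrink, it can help to apply the correction by hand. The sketch below multiplies both degrees of freedom by ε and recomputes the mean squares; the helper name `corrected_df` is hypothetical, and the inputs are the SS, df, and ε_GG values from the tables above.

```python
def corrected_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by the sphericity epsilon."""
    return df_effect * epsilon, df_error * epsilon


ss_time, ss_error = 443.73, 110.94            # sums of squares from the output
df_time, df_err = corrected_df(2, 58, 0.726)  # Greenhouse-Geisser epsilon

# Both mean squares are divided by the same factor, so their ratio is unchanged
ms_time = ss_time / df_time
ms_error = ss_error / df_err
f_ratio = ms_time / ms_error

print(round(df_time, 3), round(df_err, 2))
print(round(ms_time, 2), round(ms_error, 2), round(f_ratio, 1))
```

Under these inputs the corrected degrees of freedom come out at 1.452 and 42.11, and F remains 116.0, in line with the Greenhouse-Geisser row.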
Partial eta-squared of .80 means that 80% of the variance in strength that is attributable to within-person sources (time + error, excluding between-person baseline differences) is explained by the time effect. This is a very large effect by Cohen’s (1988) benchmarks (.01 small, .06 medium, .14 large). The within-subjects design is so efficient here because participants differ widely from one another in strength (SD ≈ 12 kg at every time point), yet each participant’s standing relative to the others is highly stable across time points. Removing this between-subjects variability from the error term makes the time effect stand out very clearly.
T.4.4 Table 4: Pairwise comparisons (Bonferroni-corrected)
| Comparison | Mean Difference (kg) | SE | p (adjusted) | 95% CI |
|---|---|---|---|---|
| Pre → Mid | −2.02 | 0.27 | < .001 | [−2.70, −1.34] |
| Pre → Post | −5.38 | 0.33 | < .001 | [−6.22, −4.54] |
| Mid → Post | −3.36 | 0.45 | < .001 | [−4.50, −2.22] |
Interpreting this table:
All three pairwise comparisons are statistically significant after Bonferroni correction. Strength increased significantly from pre to mid (2.02 kg gain), from mid to post (3.36 kg gain), and from pre to post (5.38 kg total gain). The confidence intervals are narrow, indicating high precision in the estimates of change. Negative signs in the mean difference column reflect the direction of subtraction (earlier time − later time); reverse the sign for reporting (mid − pre = +2.02 kg).
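Reversing the direction of a comparison changes the sign of the mean difference and also flips the confidence interval: both bounds are negated and they swap places. The following Python sketch illustrates this with the Pre − Mid row; the function name `reverse_comparison` is hypothetical.

```python
def reverse_comparison(mean_diff, ci_lower, ci_upper):
    """Convert (earlier - later) into (later - earlier).

    Negating a difference negates both CI bounds and swaps their order.
    """
    return -mean_diff, -ci_upper, -ci_lower


# Pre - Mid row from the pairwise comparisons table
gain, lower, upper = reverse_comparison(-2.02, -2.70, -1.34)
print(f"Mid - Pre = {gain:+.2f} kg, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The same conversion applies to the other two rows when the write-up describes gains rather than earlier-minus-later differences.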
T.5 Part 4: Computing effect sizes
T.5.1 Partial eta-squared (η²_p)
SPSS reports η²_p directly in the within-subjects effects table when you check Estimates of effect size in the Options dialog. Read the value straight from the Partial Eta Squared column — no additional calculation is needed. From our output: η²_p = .800.
The underlying formula, shown here for conceptual understanding only, is:
\[\eta^2_p = \frac{SS_{\text{time}}}{SS_{\text{time}} + SS_{\text{error}}}\]
where \(SS_{\text{error}}\) is the time × subjects error term from the Within-Subjects Effects table.
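As a quick check, substituting the sums of squares from the within-subjects effects table into this formula reproduces the value that SPSS reports:

\[\eta^2_p = \frac{443.73}{443.73 + 110.94} = \frac{443.73}{554.67} \approx .800\]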
T.5.2 Partial omega-squared (ω²_p)
SPSS does not report ω²_p directly, but you do not need to compute it by hand. Two convenient options are available:
- Statistical Calculators appendix — Use the interactive effect size calculator in the Statistical Calculators appendix. Enter the SS and MS values from the SPSS output table and it returns ω²_p instantly.
- SPSS 31 and later — The built-in power analysis module (Analyze → Power Analysis → General Linear Model) reports ω²_p alongside η²_p for repeated measures designs.
For reference, one convenient way to compute partial omega-squared for this repeated-measures effect uses \(df_{\text{time}}\), \(MS_{\text{time}}\), \(MS_{\text{error}}\), and the number of participants \(n\):
\[\omega^2_p = \frac{df_{\text{time}}\left(MS_{\text{time}} - MS_{\text{error}}\right)}{df_{\text{time}}MS_{\text{time}} + \left(n - df_{\text{time}}\right)MS_{\text{error}}}\]
For our example, this yields ω²_p = .88 — a very large effect. Report ω²_p as the primary effect size in publications; include η²_p for comparability with other studies.
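The arithmetic can be checked with a few lines of Python. This sketch implements the formula exactly as written above and assumes the Sphericity Assumed values (df = 2, MS_time = 221.87, MS_error = 1.91) with n = 30 participants, which are the inputs that reproduce the reported .88. The function name `partial_omega_squared` is illustrative.

```python
def partial_omega_squared(df_time, ms_time, ms_error, n):
    """Partial omega-squared using the formula given in this section."""
    numerator = df_time * (ms_time - ms_error)
    denominator = df_time * ms_time + (n - df_time) * ms_error
    return numerator / denominator


# Sphericity Assumed row of the within-subjects effects table, n = 30
omega_sq = partial_omega_squared(df_time=2, ms_time=221.87, ms_error=1.91, n=30)
print(round(omega_sq, 2))
```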
T.5.3 Cohen’s d for pairwise comparisons
For each significant pairwise comparison, compute Cohen’s d from the Bonferroni output or from the difference scores directly:
| Comparison | Mean Diff | SD of Diff | Cohen’s d | Interpretation |
|---|---|---|---|---|
| Mid − Pre | 2.02 | 1.46 | 1.38 | Large |
| Post − Pre | 5.38 | 1.81 | 2.97 | Very large |
| Post − Mid | 3.36 | 2.47 | 1.36 | Large |
The SD of the difference scores is not shown in the Bonferroni output. To obtain it in SPSS, create the difference variable using Transform → Compute Variable (e.g., diff_mid_pre = strength_kg_mid - strength_kg_pre) and then run Analyze → Descriptive Statistics → Descriptives on that new variable. SPSS will report the SD directly in the output — no hand arithmetic needed. Repeat for each pairwise comparison, then divide the mean difference by the SD to obtain Cohen’s d.
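The same calculation is easy to reproduce outside SPSS. The Python sketch below first divides the tabled mean difference by the tabled SD, then shows the full computation from raw scores. The four pre and mid scores are made up for illustration and are not taken from the dataset; the function name `cohens_d_paired` is hypothetical.

```python
from statistics import mean, stdev


def cohens_d_paired(later, earlier):
    """Cohen's d for paired scores: mean difference / SD of the differences."""
    diffs = [b - a for a, b in zip(earlier, later)]
    # stdev() is the sample SD (n - 1 denominator), like SPSS Descriptives
    return mean(diffs) / stdev(diffs)


# Summary values from the table above (Mid - Pre)
print(round(2.02 / 1.46, 2))

# Hypothetical scores for four participants, for illustration only
pre = [70.0, 82.0, 91.0, 65.0]
mid = [72.0, 83.0, 94.0, 68.0]
print(round(cohens_d_paired(mid, pre), 2))
```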
T.6 Part 5: Creating a visualization
SPSS produces a basic estimated marginal means plot automatically when you request it in the Plots dialog. To improve it:
- Double-click the chart in the output to open it in the Chart Editor
- Add error bars: Elements → Error Bars → 95% CI or Standard Error
- Adjust axis labels and titles as needed
- Right-click → Export to save as PNG or PDF
For a more polished visualization, use the R code provided in Chapter 15 (the line plot and spaghetti plot figures).
T.7 Part 6: APA-style write-up
Using the output from this tutorial, a complete APA-style report reads:
“A one-way repeated measures ANOVA was conducted to examine the effect of training time on muscular strength (kg) in 30 participants enrolled in a 12-week resistance training program. Mauchly’s test indicated that the sphericity assumption was violated, W = .623, χ²(2) = 13.234, p = .001. Because ε_GG = .726, the Greenhouse-Geisser correction was applied. The within-subjects effect of time was statistically significant, F(1.45, 42.11) = 116.0, p < .001, η²_p = .80, ω²_p = .88. Descriptive statistics indicated progressive strength gains from pre-training (M = 79.7, SD = 12.3 kg) to mid-training at six weeks (M = 81.7, SD = 12.3 kg) and post-training at twelve weeks (M = 85.1, SD = 12.5 kg). Bonferroni-corrected pairwise comparisons confirmed that all three time points differed significantly: pre vs. mid, mean difference = 2.02 kg, p < .001, 95% CI [1.34, 2.70]; mid vs. post, mean difference = 3.36 kg, p < .001, 95% CI [2.22, 4.50]; and pre vs. post, mean difference = 5.38 kg, p < .001, 95% CI [4.54, 6.22].”
T.8 Part 7: Checking the normality assumption
Repeated measures ANOVA requires normality of the difference scores between each pair of time points, not of the raw scores.
Step 1: Create difference score variables
Go to Transform → Compute Variable:
- diff_mid_pre = strength_kg_mid - strength_kg_pre
- diff_post_pre = strength_kg_post - strength_kg_pre
- diff_post_mid = strength_kg_post - strength_kg_mid
Step 2: Run Shapiro-Wilk tests on each difference variable
Go to Analyze → Descriptive Statistics → Explore:
- Move all three difference variables to the Dependent List
- Click Plots, check Normality plots with tests
- Click Continue → OK
SPSS will produce Shapiro-Wilk W statistics, Q-Q plots, and histograms for each difference variable.
Interpreting the output:
For this dataset, the Shapiro-Wilk tests are:
- diff_mid_pre: \(W = 0.972\), \(p = .599\)
- diff_post_pre: \(W = 0.924\), \(p = .034\)
- diff_post_mid: \(W = 0.963\), \(p = .375\)
Two of the three difference scores show no evidence of non-normality, but the post - pre difference score departs significantly from normality (\(p = .034\)). With \(n = 30\), that single significant Shapiro-Wilk test should prompt a visual inspection of the Q-Q plot rather than an automatic abandonment of repeated measures ANOVA. In this example, the ANOVA result is still typically treated as reasonably robust, but you should note the mild normality concern when interpreting the findings.
T.9 Troubleshooting common issues
“SPSS won’t let me define the within-subject factor”
Make sure you clicked Add after typing the factor name and number of levels, before clicking Define. Both steps are required.
“I get an error about unequal group sizes”
The repeated measures GLM requires complete data for every participant across all time points. Check for missing values using Analyze → Descriptive Statistics → Frequencies with the Display frequency tables option. Participants with any missing time point will be excluded from the analysis by SPSS (listwise deletion). Address missing data using imputation or alternative software if the exclusions are substantial.
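To see in advance how many participants listwise deletion would remove, you can count the complete cases before running the analysis. This is a minimal Python sketch: the two rows are hypothetical stand-ins for what `csv.DictReader` would return from the data file (a blank string marks a missing value), and the `id` column and the function name `complete_cases` are assumptions for illustration.

```python
def complete_cases(rows, columns):
    """Keep only rows with a non-missing value in every listed column."""
    return [r for r in rows if all(r.get(c) not in (None, "") for c in columns)]


time_vars = ["strength_kg_pre", "strength_kg_mid", "strength_kg_post"]

# Hypothetical rows; participant 2 is missing the mid-training score
rows = [
    {"id": "1", "strength_kg_pre": "70.1", "strength_kg_mid": "72.0", "strength_kg_post": "75.3"},
    {"id": "2", "strength_kg_pre": "81.4", "strength_kg_mid": "", "strength_kg_post": "86.0"},
]

kept = complete_cases(rows, time_vars)
print(f"{len(kept)} of {len(rows)} participants have complete data")
```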
“Mauchly’s test cannot be computed”
If you have only two levels in your within-subjects factor, SPSS does not compute Mauchly’s test because a two-level factor automatically satisfies sphericity. This is expected — proceed with the sphericity-assumed F-row.
“The within-subjects effects table shows ‘Greenhouse-Geisser’ but I’m not sure which row to report”
You must base your choice on Mauchly’s p-value and ε_GG before inspecting the F-rows. If p > .05 → Sphericity Assumed. If p < .05 and ε_GG ≥ .75 → Huynh-Feldt. If p < .05 and ε_GG < .75 → Greenhouse-Geisser. Never choose based on which row gives the smallest p-value.
“The pairwise comparisons show different signs than I expected”
SPSS computes differences in the order the time points were defined (e.g., Time 1 − Time 2 = Pre − Mid). If the training group improves over time, these differences will be negative, because the earlier (lower) mean comes first in the subtraction. Reverse the sign when describing gains in your write-up.
T.10 Practice exercises
Use the core_session_wide_training.csv dataset to complete the following exercises.
1. Training group, VO₂max: Test whether VO₂max (mL·kg⁻¹·min⁻¹) changed significantly from pre to mid to post in the training group (n = 25 with complete data). Report Mauchly’s test, the appropriate F-row, Bonferroni comparisons, and η²_p.
2. Control group, strength: Run the same repeated measures ANOVA for muscular strength in the control group. Compare the pattern of results to the training group. What do the F-ratio and η²_p tell you about change in the control condition?
3. Training group, agility: Conduct a repeated measures ANOVA for agility T-test time (s) in the training group across pre, mid, and post. Note that lower scores indicate better agility. Report the direction of change (i.e., are times getting faster?) and compute Cohen’s d for each pairwise comparison.
4. Write-up practice: Using the results from Exercise 1 (VO₂max, training group), write a complete APA-style results paragraph including Mauchly’s test, the omnibus F with effect size, and all three pairwise comparisons with confidence intervals.