Appendix Q — SPSS Tutorial: Comparing Two Means
Conducting one-sample, independent, and paired t-tests in SPSS
Q.1 Overview
Comparing means between groups or conditions is one of the most common statistical procedures in Movement Science research. SPSS provides comprehensive tools for conducting one-sample t-tests (comparing one group to a benchmark), independent-samples t-tests (comparing two separate groups), and paired-samples t-tests (comparing two related measurements). This tutorial demonstrates:
- How to distinguish between independent and paired designs
- How to perform and interpret both types of t-tests
- How to check assumptions (normality, equal variances)
- How to interpret Levene’s test and choose appropriate test versions
- How to compute effect sizes and confidence intervals
- How to create effective visualizations for mean comparisons
Understanding SPSS output for t-tests is critical because the software provides multiple versions of tests (Student’s vs. Welch’s) and assumption diagnostics that guide interpretation. Learning to read these outputs ensures you select the correct analysis and report results accurately.
Prerequisites: Familiarity with SPSS data entry, descriptive statistics, and basic hypothesis testing concepts (see Chapter 10 and Chapter 13).
Q.2 Dataset for this tutorial
We will use the Core Dataset (core_session.csv). Download it here: core_session.csv
For this tutorial, we will:
- Independent-samples t-test: Compare sprint_20m_s (20-m sprint time) between the training and control groups at pre-training (N = 60, 30 per group)
- Paired-samples t-test: Compare sprint_20m_s at pre vs. post across all participants with complete data (N = 55 pairs)
- One-sample t-test: Test whether mean vo2_mlkgmin differs from a population reference value of 40 mL·kg⁻¹·min⁻¹
Open the file in SPSS using File → Open → Data, choose the CSV file type, and import using the Text Import Wizard.
Q.3 Part 0: One-sample t-test
The one-sample t-test compares the mean of a single group to a known benchmark or population mean.
Q.3.1 Example scenario
We test whether the mean maximal oxygen consumption (vo2_mlkgmin) in our sample (N = 60) differs from a population reference value of 40 mL·kg⁻¹·min⁻¹.
Q.3.2 Procedure
- Analyze → Compare Means → One-Sample T Test…
- Move vo2_mlkgmin to the Test Variable(s) list.
- In the Test Value box, enter 40.
- Ensure Estimate effect sizes is checked.
- Click OK.
Q.3.3 Interpreting the output
SPSS produces two main tables:
Q.3.3.1 Table 1: One-Sample Statistics
Shows the sample mean, standard deviation, and standard error. If your sample mean is 42.5, you can see at a glance that it is higher than the test value of 40.
Q.3.3.2 Table 2: One-Sample Test
- t: The test statistic (larger absolute values indicate stronger evidence against the null).
- df: Degrees of freedom (n - 1).
- Sig. (2-tailed): The p-value. If p < .05, the difference is statistically significant.
- Mean Difference: The difference between your sample mean and the test value.
- 95% Confidence Interval: The range of plausible values for the true population difference.
Q.3.4 Reporting the results
“The sample mean VO₂max (M = 42.5 mL/kg/min, SD = 5.2, n = 60) was significantly higher than the population reference value of 40 mL/kg/min, t(59) = 3.72, p < .001, mean difference = 2.5, 95% CI [1.16, 3.84], Cohen’s d = 0.48.”
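Outside SPSS, every number in this sentence can be reproduced from the summary statistics alone. A stdlib-Python cross-check (not part of the SPSS workflow; the critical value t(.975, 59) ≈ 2.001 is taken from a t table):

```python
import math

# Summary statistics reported above: M = 42.5, SD = 5.2, n = 60, test value = 40
mean, sd, n, test_value = 42.5, 5.2, 60, 40
t_crit = 2.001                            # t(.975, df = 59), from a t table

se = sd / math.sqrt(n)                    # standard error of the mean
t = (mean - test_value) / se              # t statistic
d = (mean - test_value) / sd              # Cohen's d
ci = (mean - test_value - t_crit * se,    # 95% CI for the mean difference
      mean - test_value + t_crit * se)

print(round(t, 2), round(d, 2), [round(x, 2) for x in ci])
# → 3.72 0.48 [1.16, 3.84]
```

Matching the SPSS output by hand like this is a useful habit for catching data-entry or variable-selection mistakes.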
Q.4 Part 1: Independent-samples t-test
The independent-samples t-test compares the means of two separate, unrelated groups (e.g., experimental vs. control, males vs. females, trained vs. untrained).
Q.4.1 Example scenario
We compare 20-meter sprint time (s) between the training group (n = 30) and control group (n = 30) at baseline (pre-training) to determine whether the two groups started with equivalent sprint performance.
Q.4.2 Data structure
For independent-samples t-tests, your data should have:
- One grouping variable (e.g., group) with two levels: training and control
- One continuous dependent variable (e.g., sprint_20m_s)
- Each row represents one participant
Filter to pre-training: Data → Select Cases → If: time = 'pre'
Q.4.3 Procedure
- Analyze → Compare Means → Independent-Samples T Test…
- Move sprint_20m_s to Test Variable(s)
- Move group to Grouping Variable
- Click Define Groups… and enter control and training
- Ensure Estimate effect sizes is checked (this will output Cohen’s d and other effect size metrics directly).
- Click Continue → OK
Q.4.4 Interpreting the output
SPSS produces three main tables:
Q.4.4.1 Table 1: Group Statistics
Group Statistics
group N Mean Std. Deviation Std. Error Mean
sprint_20m_s control 30 3.811 .340 .062
training 30 3.772 .373 .068
Initial observation: Control and training groups have nearly identical means at baseline (3.81 vs. 3.77 s), as expected for a pre-intervention comparison.
Q.4.4.2 Table 2: Levene’s Test for Equality of Variances
Independent Samples Test
Levene's Test for Equality of Variances
                         F      Sig.
Equal variances assumed  0.029  .864
Decision rule:
- If Sig. (p-value) > .05: Assume equal variances (use “Equal variances assumed” row)
- If Sig. (p-value) ≤ .05: Do not assume equal variances (use “Equal variances not assumed” row)
In this example: p = .864 > .05, so we assume equal variances.
Many statisticians recommend always using Welch’s t-test (“Equal variances not assumed” row) because it is robust to unequal variances and performs well even when variances are equal. This is increasingly the default in modern statistical practice.
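The fractional df in the “Equal variances not assumed” row (57.5 rather than 58) comes from the Welch–Satterthwaite approximation. A stdlib-Python sketch using the group statistics from Table 1:

```python
# Welch–Satterthwaite degrees of freedom, from the group statistics
# above (SDs 0.340 and 0.373, n = 30 per group)
s1, s2, n1, n2 = 0.340, 0.373, 30, 30

v1, v2 = s1**2 / n1, s2**2 / n2           # per-group variance of the mean
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(round(df, 1))                       # → 57.5, matching the Welch row
```

When the two variances (and group sizes) are equal, this formula reduces to nearly n1 + n2 − 2, which is why the two rows here are almost identical.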
Q.4.4.3 Table 3: Independent Samples Test (t-test results)
Independent Samples Test
t-test for Equality of Means
t df Sig. Mean Std. Error 95% CI of the Difference
(2-tailed) Difference Difference Lower Upper
Equal variances assumed 0.429 58 .669 .039 .091 -.145 .224
Equal variances not assumed 0.429 57.5 .669 .039 .091 -.145 .224
Key information:
- t = 0.429, df = 58, p = .669
- Mean Difference = 0.039 s (negligible)
- 95% CI: [−0.145, 0.224] s (includes zero → not significant)
Q.4.5 Decision and interpretation
Decision: p = .669 > .05, so we fail to reject H₀
APA-style interpretation:
“Sprint time at baseline did not differ significantly between the control (M = 3.81 s, SD = 0.34, n = 30) and training groups (M = 3.77 s, SD = 0.37, n = 30), t(58) = 0.43, p = .669, mean difference = 0.04 s, 95% CI [−0.15, 0.22] s. This confirms that the two groups were equivalent at baseline.”
Q.4.6 Computing Cohen’s d in SPSS
Modern versions of SPSS (v27 and later) compute effect sizes automatically if you select the Estimate effect sizes option in the t-test dialog. It produces an “Independent Samples Effect Sizes” table.
Look for the Cohen’s d row and the Point Estimate column.
For this example: The output will show a Point Estimate for Cohen’s d of approximately 0.11.
Interpretation: Cohen’s d = 0.11 indicates a negligible effect size, consistent with the non-significant result and confirming the groups were well-matched at baseline.
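These values can be cross-checked by hand from the Group Statistics table. A stdlib-Python sketch (small discrepancies from the SPSS output reflect rounding of the table means):

```python
import math

# Group statistics above: control M = 3.811, SD = 0.340;
# training M = 3.772, SD = 0.373; n = 30 per group
m1, s1, n1 = 3.811, 0.340, 30
m2, s2, n2 = 3.772, 0.373, 30

# Pooled standard deviation (Student's t-test standardizer)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)      # standard error of the difference
t = (m1 - m2) / se                        # t ≈ 0.42 (SPSS shows 0.429 from unrounded means)
d = (m1 - m2) / sp                        # Cohen's d ≈ 0.11
```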
Q.5 Part 2: Paired-samples t-test
The paired-samples t-test compares two related measurements on the same participants (e.g., pre-test vs. post-test, left vs. right limb).
Q.5.1 Example scenario
We compare 20-meter sprint time (s) at pre-training vs. post-training across all participants with data at both time points (N = 55 pairs) to determine whether sprint performance changed over the study period.
Q.5.2 Data structure
For paired-samples t-tests, reshape the data to wide format so each participant has one row with sprint_pre and sprint_post columns. In SPSS, use Data → Restructure if your data are in long format.
Q.5.3 Procedure
- Analyze → Compare Means → Paired-Samples T Test…
- Select both variables (e.g., sprint_20m_s_pre and sprint_20m_s_post)
- Click the arrow to move them to Paired Variables as a pair
- Ensure Estimate effect sizes is checked.
- Options… → Continue
- OK
Q.5.4 Interpreting the output
SPSS produces three main tables:
Q.5.4.1 Table 1: Paired Samples Statistics
Paired Samples Statistics
Mean N Std. Deviation Std. Error Mean
Pair 1 sprint_pre 3.802 55 .365 .049
sprint_post 3.792 55 .402 .054
Initial observation: Mean sprint time changed minimally from pre (3.802 s) to post (3.792 s).
Q.5.4.2 Table 2: Paired Samples Correlations
Paired Samples Correlations
N Correlation Sig.
Pair 1 sprint_pre & 55 .920 <.001
sprint_post
- Correlation = .920: Very high positive correlation — individual sprint ability is stable across time
- Sig. < .001: The correlation is statistically significant
Why this matters: High pre-post correlations increase the statistical power of the paired t-test by reducing error variance.
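This power gain can be quantified: the standard deviation of the difference scores shrinks as the pre-post correlation grows. A stdlib-Python sketch using the paired statistics above:

```python
import math

# Paired statistics above: SD_pre = 0.365, SD_post = 0.402, r = .920
s_pre, s_post, r = 0.365, 0.402, 0.920

# SD of the difference scores as a function of the correlation
sd_diff = math.sqrt(s_pre**2 + s_post**2 - 2 * r * s_pre * s_post)
print(round(sd_diff, 3))                  # → 0.158
# With r = 0 the SD of the differences would be about 0.54 — more than
# three times larger, so the same mean change would be far harder to detect.
```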
Q.5.4.3 Table 3: Paired Samples Test
Paired Samples Test
Paired Differences
Mean Std. Std. Error 95% CI of the Difference t df Sig.
Deviation Mean Lower Upper (2-tailed)
Pair 1 sprint_pre - .010 .158 .021 -.033 .053 0.469 54 .641
sprint_post
Key information:
- Mean (of differences) = 0.010 s (trivially small pre-post change)
- Std. Deviation = 0.158 s (variability in individual changes)
- 95% CI = [−0.033, 0.053] (includes zero → not significant)
- t = 0.469, df = 54, p = .641
Q.5.5 Decision and interpretation
Decision: p = .641 > .05, so we fail to reject H₀
APA-style interpretation:
“Sprint time did not change significantly from pre-training (M = 3.80 s, SD = 0.37) to post-training (M = 3.79 s, SD = 0.40), t(54) = 0.47, p = .641, mean difference = 0.01 s, 95% CI [−0.03, 0.05] s. The paired pre-post correlation was r = .92 (p < .001), confirming high individual consistency across time points.”
Q.5.6 Computing Cohen’s d for paired samples
As with the independent t-test, checking the Estimate effect sizes box generates a “Paired Samples Effect Sizes” table.
Look for the Cohen’s d row and the Point Estimate column.
For this example: The output will show a Point Estimate for Cohen’s d of approximately 0.06.
Interpretation: Cohen’s d = 0.06 indicates a negligible effect size, consistent with the non-significant result.
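The paired t statistic and Cohen’s d can both be reproduced from the difference-score summary in Table 3. A stdlib-Python cross-check:

```python
import math

# Difference-score summary above: mean = 0.010, SD = 0.158, n = 55
m_diff, sd_diff, n = 0.010, 0.158, 55

se = sd_diff / math.sqrt(n)               # standard error of the mean difference
t = m_diff / se                           # paired t statistic
d = m_diff / sd_diff                      # Cohen's d (difference-score standardizer)
print(round(t, 2), round(d, 2))           # → 0.47 0.06
```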
Q.6 Part 3: Checking assumptions
Q.6.1 Checking normality
For one-sample t-tests, check normality of the single continuous variable. For independent t-tests, check normality separately for each group. For paired t-tests, check normality of the difference scores (Post − Pre).
Q.6.1.1 Visual inspection: Histograms and Q-Q plots
For one-sample and independent t-tests:
- Graphs → Legacy Dialogs → Histogram…
- Select your dependent variable (e.g., sprint_20m_s)
- Check Display normal curve
- (For independent t-tests only) Click Panels…, move the grouping variable (e.g., group) to Rows, then Continue
- OK
For paired t-tests:
- Transform → Compute Variable…
- Create a new variable: Difference = sprint_20m_s_post - sprint_20m_s_pre
- Graphs → Legacy Dialogs → Histogram…
- Select Difference
- Check Display normal curve
- OK
Q-Q plots:
- Analyze → Descriptive Statistics → Q-Q Plots…
- Move the variable to Variables
- OK
Interpretation:
- Histogram should be approximately bell-shaped
- Q-Q plot: Points should fall roughly along the diagonal line
- Minor deviations are acceptable, especially with n > 30
Q.6.1.2 Formal test: Shapiro-Wilk test
For one-sample and paired t-tests:
- (For paired tests) First compute the difference score (see above)
- Analyze → Descriptive Statistics → Explore…
- Move the variable (or Difference) to the Dependent List
- Click Plots…
- Check Normality plots with tests
- Continue
- OK
For independent t-tests:
- Analyze → Descriptive Statistics → Explore…
- Move dependent variable to Dependent List
- Move grouping variable to Factor List
- Click Plots…
- Check Normality plots with tests
- Continue
- OK
Interpreting Shapiro-Wilk output:
Tests of Normality
Shapiro-Wilk
Statistic df Sig.
sprint_20m_s .986 60 .737
- H₀: Data are normally distributed
- H₁: Data are not normally distributed
Decision rule:
- If Sig. > .05: Assume normality
- If Sig. ≤ .05: Normality violated
In this example: W = 0.986, p = .737 > .05, so sprint times are approximately normally distributed. Parametric t-test is appropriate.
T-tests are robust to moderate violations of normality, especially with larger samples (n > 30 per group). If normality is severely violated with small samples, consider:
- Transformations (log, square root)
- Nonparametric alternatives (Mann-Whitney U for independent, Wilcoxon signed-rank for paired)
- Bootstrap methods
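The bootstrap idea can be sketched in a few lines of standard-library Python: resample the observed data with replacement many times and take percentiles of the resampled means as a CI, with no normality assumption. (The difference scores below are hypothetical, not from the core dataset.)

```python
import random
import statistics

random.seed(1)

# Hypothetical difference scores (post - pre) for 12 participants
diffs = [0.21, -0.05, 0.10, 0.33, -0.12, 0.08,
         0.15, 0.02, 0.27, -0.01, 0.19, 0.06]

# Percentile bootstrap: 5000 resampled means, sorted
boot_means = sorted(
    statistics.mean(random.choices(diffs, k=len(diffs)))  # resample with replacement
    for _ in range(5000)
)
lo, hi = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
# If the 95% bootstrap CI [lo, hi] excludes zero, the mean change is
# significant at the .05 level without assuming normal difference scores.
```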
Q.7 Part 4: Creating visualizations
Q.7.1 Box plots (one-sample)
- Graphs → Legacy Dialogs → Boxplot…
- Select Simple and Summaries of separate variables
- Move your variable to Boxes Represent
- OK
Q.7.2 Box plots for independent groups
- Graphs → Legacy Dialogs → Boxplot…
- Select Simple and Summaries for groups of cases
- Define
- Move the dependent variable to Variable
- Move the grouping variable to Category Axis
- OK
Q.7.3 Error bar plots
- Graphs → Legacy Dialogs → Error Bar…
- Select Simple and Summaries for groups of cases
- Define
- Move the dependent variable to Variable
- Move the grouping variable to Category Axis
- Under Bars Represent, select Confidence interval for mean (95%)
- OK
Q.7.4 Pre-post line plots (paired designs)
- Graphs → Legacy Dialogs → Line…
- Select Multiple and Summaries of separate variables
- Define
- Move both variables (e.g., sprint_20m_s_pre and sprint_20m_s_post) to Lines Represent
- OK
SPSS offers a newer graphing interface:
- Graphs → Chart Builder…
- Drag chart types to the canvas and customize
However, legacy dialogs are often quicker for simple plots.
Q.8 Part 5: Reporting guidelines
Q.8.1 One-sample t-test report template
“[Sample] (M = [mean], SD = [SD], n = [n]) [differed/did not differ] significantly from the [benchmark/population mean] of [value], t([df]) = [t-value], p = [p-value], d = [Cohen’s d], 95% CI [lower, upper].”
Q.8.2 Independent t-test report template
“[Group 1] (M = [mean], SD = [SD], n = [n]) [differed/did not differ] significantly from [Group 2] (M = [mean], SD = [SD], n = [n]), t([df]) = [t-value], p = [p-value], [Cohen’s d = …], 95% CI [lower, upper].”
Example:
“Sprint time at baseline did not differ significantly between control (M = 3.81 s, SD = 0.34, n = 30) and training groups (M = 3.77 s, SD = 0.37, n = 30), t(58) = 0.43, p = .669, mean difference = 0.04 s, 95% CI [−0.15, 0.22] s, Cohen’s d = 0.11.”
Q.8.3 Paired t-test report template
“[Post-condition] (M = [mean], SD = [SD]) was significantly [higher/lower] than [pre-condition] (M = [mean], SD = [SD]), t([df]) = [t-value], p = [p-value], mean difference = [M_diff], 95% CI [lower, upper], [Cohen’s d = …].”
Example:
“Sprint time did not change significantly from pre-training (M = 3.80 s, SD = 0.37) to post-training (M = 3.79 s, SD = 0.40), t(54) = 0.47, p = .641, mean difference = 0.01 s, 95% CI [−0.03, 0.05] s, Cohen’s d = 0.06.”
Q.9 Part 6: Common issues and troubleshooting
Q.9.1 Issue 1: Levene’s test is significant (p < .05)
Solution: Use Welch’s t-test (read the “Equal variances not assumed” row in the output). This is the default recommendation in modern practice.
Q.9.2 Issue 2: Very small sample sizes (n < 15 per group)
Solution:
- Check normality carefully using Q-Q plots and Shapiro-Wilk test
- If normality is violated, consider nonparametric alternatives:
- Mann-Whitney U test (for independent samples)
- Wilcoxon signed-rank test (for paired samples)
To run Mann-Whitney U in SPSS:
- Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples…
- Move dependent variable to Test Variable List
- Move grouping variable to Grouping Variable
- Define Groups…
- Select Mann-Whitney U
- OK
Q.9.3 Issue 3: Missing data in paired tests
Problem: Paired t-tests require complete pairs. If one measurement is missing, SPSS excludes the entire pair.
Solution:
- Use listwise deletion (SPSS default) if missing data are minimal
- Consider multiple imputation for systematic missing data
- Report how many cases were excluded
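Listwise pair deletion can be sketched in a few lines of Python (the values and the None markers below are hypothetical; None stands in for a missing measurement):

```python
# Each position is one participant; None marks a missing measurement
pre  = [3.9, 4.1, None, 3.7, 3.8]
post = [3.8, None, 3.6, 3.6, 3.7]

# Listwise deletion keeps only complete pairs (SPSS's default behavior)
pairs = [(a, b) for a, b in zip(pre, post)
         if a is not None and b is not None]
n_excluded = len(pre) - len(pairs)        # report this number in your write-up
print(len(pairs), n_excluded)             # → 3 2
```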
Q.9.4 Issue 4: One-tailed vs. two-tailed tests
SPSS reports two-tailed p-values by default. If you have a strong directional hypothesis:
- One-tailed p-value = Two-tailed p-value / 2 (valid only when the observed effect is in the hypothesized direction)
Example: If SPSS reports p = .042 (two-tailed), the one-tailed p = .021.
Only use one-tailed tests if you have strong a priori theoretical reasons and would not interpret effects in the opposite direction.
Q.10 Practice exercises
Q.10.1 Exercise 0: One-sample t-test
Scenario: Compare the average daily caloric intake of 25 soccer players (M = 2850 kcal, SD = 450) to a recommended target of 3000 kcal.
- Enter the data into SPSS
- Run a one-sample t-test with a Test Value of 3000
- Ensure “Estimate effect sizes” is checked
- Write an APA-style results statement
Q.10.2 Exercise 1: Independent t-test
Scenario: Compare reaction time (ms) between 20 young adults (M = 285, SD = 32) and 18 older adults (M = 340, SD = 45).
- Enter the data into SPSS
- Run an independent-samples t-test with “Estimate effect sizes” checked
- Check Levene’s test and determine which row to report
- Locate Cohen’s d in the Effect Sizes table
- Write an APA-style results statement
Q.10.3 Exercise 2: Paired t-test
Scenario: Measure VO₂max (mL/kg/min) before and after 10 weeks of interval training in 15 runners.
- Pre: M = 52.3, SD = 6.1
- Post: M = 58.7, SD = 6.5
- Enter the paired data
- Compute a difference score
- Check normality of the differences
- Run a paired-samples t-test with “Estimate effect sizes” checked
- Locate Cohen’s d in the Effect Sizes table
- Write an APA-style results statement
Q.10.4 Exercise 3: Assumption checking
Use the Training Study Dataset and check:
- Normality for each group using histograms, Q-Q plots, and Shapiro-Wilk tests
- Homogeneity of variance using Levene’s test
- Decide whether assumptions are met or if corrections/alternatives are needed
Q.11 Summary
This tutorial covered:
- One-sample t-tests: Comparing one group to a benchmark
- Independent-samples t-tests: Comparing two separate groups
- Paired-samples t-tests: Comparing two related measurements
- Assumption checking: Normality (Shapiro-Wilk, Q-Q plots) and equal variances (Levene’s test)
- Effect sizes: Computing and interpreting Cohen’s d
- Visualizations: Box plots, error bar plots, and line plots
- Reporting: APA-style guidelines for presenting t-test results
Mastering these procedures in SPSS enables you to conduct rigorous group comparisons, check assumptions, and communicate findings transparently in Movement Science research.
For more on t-tests and mean comparisons, see:
- Chapter 10: Hypothesis Testing and Statistical Inference
- Chapter 13: Comparing Two Means
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.)
- Pallant, J. (2020). SPSS Survival Manual (7th ed.)