Appendix P — SPSS Tutorial: Hypothesis Testing
Conducting t-tests, interpreting p-values, and making statistical decisions
P.1 Overview
Hypothesis testing is one of the most common statistical procedures in Movement Science research. SPSS provides comprehensive tools for conducting t-tests and interpreting the results. This tutorial demonstrates:
- How to perform one-sample, independent-samples, and paired-samples t-tests
- How to interpret SPSS output tables
- How to check normality and homogeneity of variance assumptions
- How to compute and interpret effect sizes
- How to choose between different versions of the t-test (Student’s vs. Welch’s)
Understanding SPSS output for hypothesis tests is critical because the software provides more information than just the p-value. Learning to interpret confidence intervals, effect sizes, and assumption diagnostics will help you conduct more responsible and transparent statistical analyses.
Prerequisites: Familiarity with SPSS data entry, descriptive statistics, and basic data management.
P.2 Dataset for this tutorial
We will use the Core Dataset (core_session.csv), available from the textbook website.
For this tutorial:
- One-sample t-test: Test whether vo2_mlkgmin at pre-training differs from a reference value of 40 mL·kg⁻¹·min⁻¹ (N = 60)
- Independent-samples t-test: Compare sprint_20m_s between training and control groups at pre-training (N = 30 per group)
- Paired-samples t-test: Compare sprint_20m_s pre vs. post (N = 55 pairs)
P.3 Part 1: One-sample t-test
The one-sample t-test compares a sample mean to a known or hypothesized population value.
P.3.1 Example scenario
We test whether the mean VO₂max (vo2_mlkgmin) in our sample (N = 60, pre-training) differs from a commonly cited population reference value of 40 mL·kg⁻¹·min⁻¹ for recreationally active adults.
P.3.2 Procedure
- Analyze → Compare Means → One-Sample T Test…
- Move vo2_mlkgmin to Test Variable(s)
- Enter Test Value = 40
- OK
P.3.3 Interpreting the output
SPSS produces two tables:
One-Sample Statistics:

| | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| vo2_mlkgmin | 60 | 41.340 | 6.817 | 0.880 |
One-Sample Test (Test Value = 40):

| | t | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| vo2_mlkgmin | 1.523 | 59 | .133 | 1.340 | −0.421 | 3.101 |
Key information:
- t = 1.523: Test statistic
- df = 59: Degrees of freedom (n − 1)
- Sig. (2-tailed) = .133: Two-tailed p-value
- Mean Difference = 1.340: Observed mean minus test value (41.34 − 40)
- 95% CI [−0.421, 3.101]: CI includes zero → not significant
P.3.4 Decision and interpretation
Decision: p = .133 > .05, so fail to reject H₀
Interpretation:
“The mean VO₂max in our sample (M = 41.34 mL·kg⁻¹·min⁻¹, SD = 6.82) did not differ significantly from the population reference value of 40 mL·kg⁻¹·min⁻¹, t(59) = 1.52, p = .133, mean difference = 1.34 mL·kg⁻¹·min⁻¹, 95% CI [−0.42, 3.10] mL·kg⁻¹·min⁻¹.”
SPSS reports two-tailed p-values by default. If you need a one-tailed test, divide the p-value by 2 (but only if the direction matches your hypothesis). However, two-tailed tests are recommended in most situations.
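If you want to verify these numbers outside SPSS, the entire One-Sample Test table can be reproduced from the summary statistics alone. The following Python sketch (using scipy, which is not part of the SPSS workflow) recomputes t, p, and the confidence interval from the values above:

```python
import math
from scipy import stats

# Summary statistics from the One-Sample Statistics table
n, mean, sd, test_value = 60, 41.340, 6.817, 40.0

se = sd / math.sqrt(n)                  # standard error of the mean
t = (mean - test_value) / se            # t statistic
df = n - 1
p = 2 * stats.t.sf(abs(t), df)          # two-tailed p-value
half_width = stats.t.ppf(0.975, df) * se
ci = (mean - test_value - half_width, mean - test_value + half_width)

print(f"t({df}) = {t:.3f}, p = {p:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
# matches the SPSS output: t(59) = 1.523, p = .133, 95% CI [-0.421, 3.101]
```

Reproducing the arithmetic this way is a useful habit: it confirms you understand what each cell of the output table means.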
P.4 Part 2: Independent-samples t-test
The independent-samples t-test compares means between two independent groups.
P.4.1 Example scenario
Compare 20-m sprint time (sprint_20m_s) between training and control groups at pre-training (N = 30 per group).
P.4.2 Procedure
- Analyze → Compare Means → Independent-Samples T Test…
- Move sprint_20m_s to Test Variable(s)
- Move group to Grouping Variable
- Click Define Groups… and enter control and training
- Continue → OK
P.4.3 Interpreting the output
SPSS produces two tables:
Group Statistics:

| | group | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| sprint_20m_s | control | 30 | 3.811 | .340 | .062 |
| | training | 30 | 3.772 | .373 | .068 |
Independent Samples Test:

| | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 0.029 | .864 | 0.429 | 58 | .669 | .039 | −.145 | .224 |
| Equal variances not assumed | | | 0.429 | 57.5 | .669 | .039 | −.145 | .224 |
Key components:
- Levene’s Test:
- F = 0.029, Sig. = .864
- Interpretation: p = .864 > .05, so variances are approximately equal
- T-test results:
- Equal variances assumed: t = 0.429, df = 58, p = .669
- Both rows are nearly identical (as expected when Levene’s is non-significant)
P.4.4 Which t-test to use?
Rule of thumb:
- If Levene’s p > .05: Use “Equal variances assumed” row (Student’s t-test)
- If Levene’s p < .05: Use “Equal variances not assumed” row (Welch’s t-test)
Modern recommendation: Many statisticians recommend always using Welch’s t-test (equal variances not assumed) because it is more robust and performs well even when variances are equal.
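Because the choice only matters when variances (and especially group sizes) differ, it can be instructive to run both versions side by side. This Python sketch uses scipy's ttest_ind_from_stats on the rounded summary statistics from the tables above, so its t differs slightly from the SPSS value computed on the raw data:

```python
from scipy import stats

# Rounded summary statistics from the Group Statistics table
m1, s1, n1 = 3.811, 0.340, 30   # control
m2, s2, n2 = 3.772, 0.373, 30   # training

student = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=True)
welch = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)

print(student.statistic, welch.statistic)  # identical t when group sizes are equal
print(student.pvalue, welch.pvalue)        # p-values differ only through the df
```

With equal group sizes the two t statistics coincide exactly; only the degrees of freedom (and hence the p-values, very slightly) differ, which is why Welch's test costs essentially nothing as a default.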
P.4.5 Decision and interpretation
Using Equal variances assumed (since Levene’s p = .864):
Decision: p = .669 > .05, fail to reject H₀
Interpretation:
“Sprint time at baseline did not differ significantly between control (M = 3.81 s, SD = 0.34) and training groups (M = 3.77 s, SD = 0.37), t(58) = 0.43, p = .669, mean difference = 0.04 s, 95% CI [−0.15, 0.22] s.”
Older versions of SPSS (before version 27) do not report Cohen's d automatically. To compute it manually:
\[ d = \frac{\text{Mean difference}}{s_{\text{pooled}}} \]
For this example:
\[ s_{\text{pooled}} = \sqrt{\frac{(30-1)(0.340^2) + (30-1)(0.373^2)}{58}} = 0.357 \]
\[ d = \frac{0.039}{0.357} = 0.11 \text{ (negligible effect)} \]
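The same hand calculation can be checked in code (Python, using the rounded SDs from the Group Statistics table):

```python
import math

n1, n2 = 30, 30
s1, s2 = 0.340, 0.373        # group SDs from the Group Statistics table
mean_diff = 0.039            # mean difference from the t-test table

# Pooled SD: weighted average of the two group variances
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = mean_diff / s_pooled

print(round(s_pooled, 3), round(d, 2))  # 0.357 0.11
```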
P.5 Part 3: Paired-samples t-test
The paired-samples t-test compares two related measurements (e.g., pre-test and post-test on the same participants).
P.5.1 Example scenario
Compare 20-m sprint time at pre-training vs. post-training across all participants with complete data (N = 55 pairs).
P.5.2 Procedure
- Analyze → Compare Means → Paired-Samples T Test…
- Select both variables (sprint_pre and sprint_post) and click the arrow to move them to Paired Variables
- OK
P.5.3 Interpreting the output
SPSS produces three tables:
Paired Samples Statistics:

| | | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | sprint_pre | 3.802 | 55 | .365 | .049 |
| | sprint_post | 3.792 | 55 | .402 | .054 |
Paired Samples Correlations:

| | | N | Correlation | Sig. |
|---|---|---|---|---|
| Pair 1 | sprint_pre & sprint_post | 55 | .920 | <.001 |
This table shows the correlation between pre- and post-test scores. High correlation (r = .920) indicates very good individual rank-order stability across time points.
Paired Samples Test:

| | Mean Difference | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Pair 1: sprint_pre − sprint_post | .010 | .158 | .021 | −.033 | .053 | 0.469 | 54 | .641 |
Key information:
- Mean Difference = 0.010 s: Trivially small pre-to-post change
- Std. Deviation = 0.158 s: Variability in change scores
- 95% CI [−0.033, 0.053]: CI includes zero → not significant
- t = 0.469: Test statistic
- df = 54: Degrees of freedom (n − 1)
- Sig. = .641: p > .05
P.5.4 Decision and interpretation
Decision: p = .641 > .05, fail to reject H₀
Interpretation:
“Sprint time did not change significantly from pre-training (M = 3.80 s, SD = 0.37) to post-training (M = 3.79 s, SD = 0.40), mean change = 0.01 s, 95% CI [−0.033, 0.053] s, t(54) = 0.47, p = .641.”
\[ d = \frac{\text{Mean difference}}{s_d} = \frac{0.010}{0.158} = 0.06 \text{ (negligible effect)} \]
Where \(s_d\) is the standard deviation of the difference scores (provided in SPSS output).
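As a cross-check outside SPSS, the paired t statistic, p-value, and d can all be recovered from the difference-score summary (a Python sketch using the values in the Paired Samples Test table):

```python
import math
from scipy import stats

n = 55
mean_diff, sd_diff = 0.010, 0.158   # mean and SD of the difference scores

se_diff = sd_diff / math.sqrt(n)    # standard error of the mean difference
t = mean_diff / se_diff
df = n - 1
p = 2 * stats.t.sf(abs(t), df)      # two-tailed p-value
d = mean_diff / sd_diff             # Cohen's d for paired data

print(f"t({df}) = {t:.3f}, p = {p:.3f}, d = {d:.2f}")
```

Note that the paired test is simply a one-sample t-test on the difference scores against a test value of zero, which is why the same formulas reappear.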
P.6 Part 4: Checking assumptions
Hypothesis tests rely on assumptions that should be checked.
P.6.1 Checking normality
For small to moderate samples (n < 50), check normality:
- Analyze → Descriptive Statistics → Explore…
- Move your variable to Dependent List
- For independent t-tests, move the grouping variable to Factor List
- Click Plots…
- ✓ Check Normality plots with tests
- Continue
- OK
SPSS produces:
- Shapiro-Wilk test: If p > .05, assume normality
- Q-Q plots: Points should fall roughly on the diagonal line
What if normality is violated?
- For large samples (n > 30 per group), t-tests are robust to violations
- For severe violations with small samples, consider nonparametric tests (Mann-Whitney U for independent samples, Wilcoxon signed-rank for paired samples)
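The same diagnostics and fallbacks exist outside SPSS. This Python sketch shows the scipy function calls; the data are simulated stand-ins for illustration, so only the function names carry over to real analyses:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(3.81, 0.34, 30)    # simulated independent groups
training = rng.normal(3.77, 0.37, 30)
pre = rng.normal(3.80, 0.36, 55)        # simulated paired measurements
post = pre + rng.normal(0.01, 0.16, 55)

# Shapiro-Wilk on each variable (or on difference scores for paired data)
w, p_norm = stats.shapiro(post - pre)

# Nonparametric fallbacks for severe violations with small samples
u_stat, p_mwu = stats.mannwhitneyu(control, training)  # Mann-Whitney U
t_stat, p_wil = stats.wilcoxon(pre, post)              # Wilcoxon signed-rank
```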
P.6.2 Checking homogeneity of variance
For independent t-tests only, Levene’s test is automatically provided in the output.
- Levene’s p > .05: Variances are approximately equal (use Student’s t-test)
- Levene’s p < .05: Variances differ significantly (use Welch’s t-test)
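Levene's test is also available in scipy, with one caveat: scipy's default centers deviations on the group medians (the Brown-Forsythe variant), so center='mean' is needed to match SPSS's mean-centered version. A sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
control = rng.normal(3.81, 0.34, 30)   # simulated group data
training = rng.normal(3.77, 0.37, 30)

# center='mean' matches SPSS's Levene test; the default 'median'
# gives the more robust Brown-Forsythe variant
f_stat, p = stats.levene(control, training, center='mean')
use_welch = p < 0.05   # if True, read the "Equal variances not assumed" row
```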
P.7 Part 5: Reporting results
P.7.1 APA-style reporting template
One-sample t-test:
“The mean [variable] (M = [mean], SD = [SD], n = [n]) was significantly [greater/less] than [test value], t([df]) = [t-value], p = [p-value], mean difference = [diff], 95% CI [lower, upper].”
Independent-samples t-test:
“[Group 1] (M = [mean], SD = [SD], n = [n]) [differed/did not differ] significantly from [Group 2] (M = [mean], SD = [SD], n = [n]), t([df]) = [t-value], p = [p-value], mean difference = [diff], 95% CI [lower, upper], d = effect size.”
Paired-samples t-test:
“[Outcome] increased/decreased significantly from [pre] (M = [mean], SD = [SD]) to [post] (M = [mean], SD = [SD]), mean [increase/decrease] = [diff], 95% CI [lower, upper], t([df]) = [t-value], p = [p-value], d = effect size.”
P.7.2 Example table
| Test Type | Variable | Group/Condition | n | Mean (SD) | t | df | p | 95% CI | d |
|---|---|---|---|---|---|---|---|---|---|
| One-sample | VO₂max | vs. Reference (40) | 60 | 41.34 (6.82) | 1.52 | 59 | .133 | [−0.42, 3.10] | 0.20 |
| Independent | Sprint | Control | 30 | 3.81 (0.34) | 0.43 | 58 | .669 | [−0.15, 0.22] | 0.11 |
| | | Training | 30 | 3.77 (0.37) | | | | | |
| Paired | Sprint | Pre vs. Post | 55 | 0.01 (0.16) | 0.47 | 54 | .641 | [−0.03, 0.05] | 0.06 |

Note: For the paired test, Mean (SD) refers to the pre−post difference scores.
P.8 Part 6: Common mistakes and troubleshooting
P.8.1 Mistake 1: Using paired t-test for independent groups
Problem: Paired t-tests require the same participants measured twice. Independent groups need independent-samples t-tests.
Solution: Ensure your data structure matches the test. Paired data: two columns (pre, post). Independent data: one column (outcome) and one grouping variable.
P.8.2 Mistake 2: Reporting only p-values without effect sizes or CIs
Problem: “p < .05” provides limited information.
Solution: Always report descriptive statistics, confidence intervals, and effect sizes alongside p-values.
P.8.3 Mistake 3: Concluding “no difference” from non-significant results
Problem: p > .05 does not mean “no effect.”
Solution: Examine the confidence interval. A wide CI suggests the study was underpowered. Report: “The difference was not statistically significant (p = .08), but the 95% CI [−2.1, 12.5] cm includes both trivial and meaningful effects.”
P.8.4 Mistake 4: Ignoring assumption violations
Problem: Using Student’s t-test when variances differ markedly.
Solution: Use Welch’s t-test (equal variances not assumed row) when Levene’s test is significant.
P.8.5 Mistake 5: Multiple testing without correction
Problem: Conducting many t-tests increases Type I error rate.
Solution: Use ANOVA for multiple groups (Chapter 14) or apply corrections (e.g., Bonferroni) when conducting multiple comparisons.
P.9 Part 7: Power analysis and sample size planning
SPSS long lacked built-in power analysis (dedicated procedures were only added in version 27). External tools remain the standard choice:
- **G*Power** (free): User-friendly power analysis software
- R packages: pwr, simr
- Online calculators
**Example using G*Power:**
- Open G*Power
- Test family: t-tests
- Statistical test: Means: Difference between two independent means (two groups)
- Type of power analysis: A priori (compute required sample size)
- Input parameters:
- Effect size d: 0.5 (medium effect)
- α = 0.05
- Power = 0.80
- Calculate
Result: n = 64 per group (128 total)
Conducting power analysis before data collection ensures adequate sample size to detect meaningful effects. Underpowered studies waste resources and produce unreliable findings.
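The G*Power calculation above can be cross-checked in Python with the statsmodels package (an assumption: statsmodels must be installed separately; it is not part of SPSS or the Python standard library):

```python
import math
from statsmodels.stats.power import TTestIndPower

# A priori sample size: two-sided independent-samples t-test,
# d = 0.5, alpha = .05, power = .80, equal group sizes
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.80,
    ratio=1.0, alternative='two-sided',
)

print(math.ceil(n_per_group))  # 64 per group, matching G*Power
```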
P.10 Part 8: Bayesian t-tests (optional)
SPSS has limited Bayesian capabilities, but specialized software (JASP, R) can compute Bayes factors, which quantify evidence for H₀ vs. H₁:
- BF₁₀ > 3: Moderate evidence for H₁
- BF₁₀ > 10: Strong evidence for H₁
- BF₁₀ < 1/3: Moderate evidence for H₀
Bayesian methods provide a more intuitive interpretation than p-values and can quantify evidence for the null hypothesis.
P.11 Summary
This tutorial covered:
- Conducting one-sample, independent-samples, and paired-samples t-tests in SPSS
- Interpreting SPSS output tables including t-statistics, p-values, degrees of freedom, and confidence intervals
- Checking assumptions using Levene’s test and normality diagnostics
- Choosing between Student’s and Welch’s t-tests based on variance equality
- Computing effect sizes (Cohen’s d) to quantify magnitude
- Reporting results following APA guidelines
Key takeaways:
- Always report descriptive statistics, confidence intervals, and effect sizes alongside p-values
- Use Welch’s t-test when variances differ or as a default robust option
- Check assumptions, but remember t-tests are robust to moderate violations
- Non-significant results do not prove “no effect”; examine confidence intervals for precision

Next steps:
- Practice conducting t-tests on your own datasets
- Compare Student’s vs. Welch’s t-test results when variances differ
- Compute effect sizes manually and interpret practical significance
- Consult Chapter 10 of the textbook for a deeper understanding of hypothesis testing logic
- Learn ANOVA (Chapter 14) for comparing more than two groups
P.12 Additional resources
- SPSS manuals: IBM SPSS Statistics Base documentation
- APA Style (7th ed.): Guidelines for reporting statistical tests
- **G*Power**: Free power analysis software (https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower)
- Textbook website: Download practice datasets and syntax files