Chapter 13: Comparing Two Means
Student Resources
I use the 4 “P’s” framework to help you learn the material in this chapter: Prepare, Practice, Participate, and Perform. To increase your chances of success in this course, I strongly encourage you to complete all four “P’s” for each chapter.
1 Prepare
1.1 Chapter Overview
This chapter introduces t-tests—essential tools for evaluating group differences in Movement Science. You’ll learn how to distinguish between independent and paired sample designs, select the appropriate t-test, compute and interpret effect sizes (like Cohen’s d), and use confidence intervals to assess the magnitude and precision of mean differences.
1.2 Multimedia Resources
The following table provides access to video and slide resources for this chapter. Click the links to open them in an overlay for better viewing on all devices.
| Resource | Description | Link |
|---|---|---|
| Long Video Overview | A detailed video explaining independent and paired t-tests, assumptions, effect sizes, and interpretation in movement science research. | 🔗 Watch Video |
| Slide Overview PDF | PDF slides that serve as an overview of this chapter. Read these before the textbook to introduce the main concepts and vocabulary. | 🔗 Download PDF |
| Slide Deck HTML | Interactive HTML slides for class. During class, the instructor controls the presentation; after class, review at your own pace. | 🔗 Open Slides |
| Slide Deck PDF | PDF version of the slide deck for download and offline viewing. | 🔗 Download PDF |
1.3 Read the Chapter
Read Weir & Vincent (2021, Ch. 10) and Furtado (2026, Ch. 13) to understand the theoretical and practical application of t-tests for comparing two means.
To succeed in this course, you must read the textbook chapters assigned for each topic. This is the only way to learn the material in depth.
Once done, proceed to the next section to practice what you learned.
2 Practice
Practicing what you learned in the chapter is essential to mastering it. Below are some resources to help you practice the material in this chapter.
2.1 Frequently Asked Questions
**When should I use a paired t-test versus an independent t-test?**

Use a paired t-test when the same participants are measured twice (such as in pre–post designs) or when observations are matched in pairs (e.g., twins, or left–right limb comparisons). Paired designs control for individual differences by comparing each person to themselves, reducing error variance and increasing statistical power. In contrast, use an independent t-test when comparing two separate, unrelated groups (e.g., experimental vs. control) where participants in one group are distinct from those in the other. Using an independent t-test on paired data wastes power, while using a paired t-test on independent data treats unrelated observations as if they were matched, which invalidates the analysis.
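As an illustration only (the course itself uses SPSS), the two designs map onto two different SciPy functions. The data below are hypothetical, simulated values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Paired design: the same 12 athletes measured before and after training,
# so each person serves as their own control.
pre = rng.normal(50, 5, 12)
post = pre + rng.normal(2, 2, 12)
t_paired, p_paired = stats.ttest_rel(pre, post)

# Independent design: two separate, unrelated groups of participants.
control = rng.normal(50, 5, 15)
treatment = rng.normal(53, 5, 15)
t_ind, p_ind = stats.ttest_ind(control, treatment)

print(f"paired: t = {t_paired:.2f}, p = {p_paired:.4f}")
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
```

Note that `ttest_rel` requires the two arrays to have the same length and matching order (person 1's pre score lines up with person 1's post score), while `ttest_ind` accepts groups of different sizes.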
**Why is Welch’s t-test recommended as the default?**

Welch’s t-test does not assume equal population variances, making it more robust than the traditional pooled-variance t-test. When variances differ substantially between groups, the pooled-variance t-test can produce inflated Type I error rates or reduced power. Welch’s t-test corrects for this by using separate variance estimates and adjusting the degrees of freedom. Importantly, Welch’s t-test performs well even when variances are equal, meaning it rarely performs worse than the pooled-variance version and often performs better. For this reason, most modern statistical practice recommends Welch’s t-test as the default.
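In SciPy, the difference between the two tests is a single argument, `equal_var`. A minimal sketch with hypothetical data where one group is far more variable than the other:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(50, 4, 20)    # small spread
group_b = rng.normal(54, 12, 20)   # three times the standard deviation

# Student's t-test: pools the variances, assumes they are equal.
student = stats.ttest_ind(group_a, group_b)

# Welch's t-test: separate variance estimates, adjusted degrees of freedom.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {student.statistic:.2f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

When the variances really are equal, the two results converge, which is why defaulting to `equal_var=False` costs little.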
**What are the assumptions of the independent t-test?**

Independent t-tests assume:

1. **Independence of observations:** Scores in one group do not influence scores in the other.
2. **Normality:** Data in each group are approximately normally distributed. This is particularly critical with small samples (e.g., \(n < 15\)); with large samples, the test is robust to normality violations due to the Central Limit Theorem.
3. **Homogeneity of variance:** Population variances are equal (though this assumption can be relaxed by using Welch’s t-test).
**What are the assumptions of the paired t-test?**

Paired t-tests assume:

1. **Pairs are independent:** One pair does not influence another pair.
2. **Differences are normally distributed:** It is critical to check the normality of the difference scores (e.g., post − pre), not the raw pre- or post-test scores separately.
3. **No order effects:** For repeated measures, researchers should use counterbalancing or randomization if possible to prevent systematic order effects, like fatigue or learning.
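The assumption checks above can be sketched with SciPy (the class activities run the equivalent checks in SPSS). The data are hypothetical; Levene’s test probes homogeneity of variance, and the Shapiro–Wilk test probes normality of the difference scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Independent design: check homogeneity of variance with Levene's test.
g1 = rng.normal(50, 5, 20)
g2 = rng.normal(53, 6, 20)
_, p_levene = stats.levene(g1, g2)   # p > .05: no evidence of unequal variances

# Paired design: check normality of the DIFFERENCE scores, not the raw scores.
pre = rng.normal(50, 5, 15)
post = pre + rng.normal(2, 2, 15)
_, p_shapiro = stats.shapiro(post - pre)

print(f"Levene p = {p_levene:.3f}, Shapiro-Wilk (differences) p = {p_shapiro:.3f}")
```

A non-significant result on these checks is consistent with the assumption holding, though with small samples the checks themselves have little power.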
**How should I interpret Cohen’s d?**

Cohen’s d quantifies the standardized magnitude of a mean difference. Cohen suggested standard benchmarks: \(|d| = 0.2\) (small), \(0.5\) (medium), and \(0.8\) (large). However, context is crucial. In injury prevention research, even a “small” effect (\(d = 0.2\)) may save lives and strongly justify an intervention. Conversely, in elite athletic contexts, a “large” effect (\(d = 0.8\)) may be unrealistic to achieve. Always interpret effect sizes relative to your specific research domain rather than relying exclusively on arbitrary guidelines.
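For two independent groups, Cohen’s d divides the mean difference by the pooled standard deviation. A minimal sketch (illustrative only; SPSS reports this directly in recent versions):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Toy example: the groups differ by 1 unit, and the pooled SD is about 1.58,
# so the standardized difference is a bit above "medium" in magnitude.
print(round(cohens_d([4, 5, 6, 7, 8], [5, 6, 7, 8, 9]), 3))  # → -0.632
```

The sign of d simply reflects the direction of subtraction; the benchmarks above apply to its absolute value.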
**What do confidence intervals tell me that p-values do not?**

P-values indicate whether an effect is statistically detectable (e.g., \(p < .05\) suggests the difference is unlikely due to chance), but they do not quantify the size or precision of the effect. Confidence intervals (CIs) provide a range of plausible values for the true population difference. They enable researchers to evaluate both statistical significance (does the CI exclude zero?) and practical importance (are the bounds meaningful in the real world?). Wide CIs indicate high uncertainty, while narrow CIs indicate high precision—bringing transparency to your findings.
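The CI for a difference in means can be built from the standard error and a t critical value. A sketch using the Welch approximation (so it does not assume equal variances), with hypothetical toy data:

```python
import numpy as np
from scipy import stats

def welch_ci(x, y, conf=0.95):
    """Confidence interval for the difference in means (Welch approximation)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    diff = x.mean() - y.mean()
    margin = stats.t.ppf((1 + conf) / 2, df) * se
    return diff - margin, diff + margin

lo, hi = welch_ci([4, 5, 6, 7, 8], [5, 6, 7, 8, 9])
print(f"95% CI for the mean difference: [{lo:.2f}, {hi:.2f}]")
```

Here the interval straddles zero, matching a non-significant two-sided test at \(\alpha = .05\): with only five scores per group, the data are consistent with anything from a sizable deficit to a sizable advantage.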
**What is statistical power, and how can I increase it?**

Statistical power is the probability of detecting a true effect when it exists (Power = \(1 - \beta\)). Power increases with:

1. Larger sample sizes
2. Larger effect sizes
3. A higher significance level (\(\alpha\))
4. Lower data variability
5. Matched/paired designs, which typically offer higher power than independent designs because they control for individual baseline differences

Low power (e.g., < 0.50) means a study will frequently miss real effects, producing false negatives. Researchers should conduct an a priori power analysis to determine the sample size they need.
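Power for an independent t-test can be computed from the noncentral t distribution. This is a sketch of the calculation that dedicated tools such as G*Power perform; the function name is my own:

```python
import numpy as np
from scipy import stats

def power_ind_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided independent t-test (equal group sizes),
    computed from the noncentral t distribution."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided rejection threshold
    # Probability the test statistic lands in either rejection region.
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# Classic benchmark: a medium effect (d = 0.5) needs about 64 participants
# per group to reach 80% power at alpha = .05.
print(round(power_ind_t(0.5, 64), 2))  # → 0.8
```

Running this a priori—solving for the n that pushes power to your target—protects against the underpowered designs described above.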
**What is the difference between statistical and practical significance?**

Statistical significance (\(p < .05\)) simply indicates that an observed difference is unlikely to have occurred by chance, assuming the null hypothesis holds true. Practical significance evaluates whether the magnitude of the difference actually matters in real-world applications. A difference can be statistically significant but completely trivial in practice (e.g., a 0.1 s difference that reaches significance only because \(n = 5000\)). To assess practical significance, focus on effect sizes (like Cohen’s d), confidence intervals, and domain-specific thresholds like minimal clinically important differences (MCIDs).
2.2 Test your Knowledge
Take this low-stakes quiz to test your knowledge of the material in this chapter. This quiz is for practice only and will help you identify areas where you may need additional review.
3 Participate
This section includes activities and discussions that will be completed during class time. Your active participation is essential for deepening your understanding of the material.
During class, we will:

- Differentiate research scenarios that require independent versus paired t-tests
- Verify assumptions (homogeneity of variance, normality) using SPSS
- Run independent and paired-samples t-tests in SPSS
- Compare Student’s t-test with Welch’s t-test outputs
- Interpret effect sizes (Cohen’s d) and confidence intervals practically
- Practice writing APA-style results statements for mean comparisons
4 Perform
4.1 Apply Your Learning
Now that you’ve prepared, practiced, and participated, it’s time to demonstrate your mastery of the material through assignments and assessments.
I strongly encourage you to complete the previous “Ps” (Prepare, Practice, Participate) before attempting any assignments or assessments associated with this chapter.