Chapter 11: Correlation and Bivariate Regression
Student Resources
I use the 4 “P’s” framework to help you learn the material in this chapter: Prepare, Practice, Participate, and Perform. To increase your chances of succeeding in this course, I strongly encourage you to complete all four “P’s” for each chapter.
1 Prepare
1.1 Chapter Overview
This chapter introduces correlation and bivariate regression—essential tools for quantifying and modeling relationships between variables in Movement Science. You’ll learn how to compute and interpret Pearson’s correlation coefficient, fit a linear regression model, and use these methods responsibly to examine associations and make predictions from movement data.
1.2 Multimedia Resources
The following table provides access to video and slide resources for this chapter. Click the links to open them in an overlay for better viewing on all devices.
| Resource | Description | Link |
|---|---|---|
| Long Video Overview | A detailed video explaining correlation and bivariate regression, interpreting Pearson’s r, and computing regression models in movement science research. | 🔗 Watch Video |
| Slide Overview PDF | PDF slides that serve as an overview of this chapter. Read these before the textbook to introduce the main concepts and vocabulary. | 🔗 Download PDF |
| Slide Deck HTML | Interactive HTML slides for class. During class, the instructor controls the presentation; after class, review at your own pace. | 🔗 Open Slides |
| Slide Deck PDF | PDF version of the slide deck for download and offline viewing. | 🔗 Download PDF |
1.3 Read the Chapter
Read Chapter 11 of Weir & Vincent (2021) and Chapter 11 of Furtado (2026) to understand how to quantify relationships between variables using correlation and bivariate regression.
To succeed in this course, you must read the textbook chapters assigned for each topic. This is the only way to learn the material in depth.
Once done, proceed to the next section to practice what you learned.
2 Practice
Practicing what you learned in the chapter is essential to mastering the material. The resources below will help you do that.
2.1 Frequently Asked Questions
**What is correlation?**

Correlation measures the strength and direction of the linear relationship between two continuous variables. It quantifies the degree to which two variables tend to change together systematically—that is, whether knowing the value of one variable helps predict the value of the other. Positive correlation means higher values of one variable tend to occur with higher values of the other (e.g., leg strength and vertical jump height). Negative correlation means higher values of one variable tend to occur with lower values of the other (e.g., body mass and endurance performance). A correlation of zero indicates no systematic linear relationship.
**How is Pearson’s correlation coefficient (\(r\)) interpreted?**

Pearson’s correlation coefficient (\(r\)) is the most common measure of correlation. It ranges from \(-1\) to \(+1\):

- \(r = +1\): Perfect positive linear relationship
- \(r = -1\): Perfect negative linear relationship
- \(r = 0\): No linear relationship
- \(|r| > 0.7\): Strong correlation (approximate)
- \(0.4 < |r| < 0.7\): Moderate correlation (approximate)
- \(|r| < 0.4\): Weak correlation (approximate)
These thresholds are approximate and context-dependent. Always interpret the magnitude of \(r\) relative to what is theoretically expected and practically meaningful in your specific research domain.
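As a concrete illustration, Pearson’s \(r\) can be computed in Python with `scipy.stats.pearsonr`. The leg-strength and jump-height values below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np
from scipy import stats

# Hypothetical data: leg strength (kg) and vertical jump height (cm)
strength = np.array([120, 135, 150, 160, 175, 190, 205, 220])
jump = np.array([38.0, 41.0, 45.0, 46.0, 50.0, 53.0, 57.0, 60.0])

# pearsonr returns the correlation coefficient and its two-sided p-value
r, p = stats.pearsonr(strength, jump)
print(f"r = {r:.2f}, p = {p:.4f}")
```

Because these made-up values increase together almost perfectly, \(r\) comes out close to \(+1\); real movement data are noisier.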
**Does correlation imply causation?**

No. This is the most important limitation of correlation. A strong correlation between two variables does not mean that changes in one variable cause changes in the other. Possible explanations for a correlation include:

1. Confounding variables: A third, unmeasured variable influences both (e.g., hot weather causes both increased ice cream sales and more drownings).
2. Reverse causation: The assumed direction of causation may be backwards.
3. Spurious correlations: The association may be coincidental, with no meaningful connection.
Establishing causation requires experimental evidence: random assignment, manipulation of the independent variable, and control of confounders. Always use cautious language: “X is associated with Y,” not “X causes Y.”
**Can Pearson’s \(r\) detect non-linear relationships?**

No. Pearson’s \(r\) quantifies only linear associations. Two variables may have a strong, systematic non-linear relationship yet show a weak or near-zero \(r\). For example, the relationship between exercise intensity and lactate concentration is exponential, and the Yerkes-Dodson inverted-U relationship between arousal and performance would yield \(r \approx 0\) despite a clear pattern. This is why plotting your data in a scatterplot before computing \(r\) is essential: visual inspection reveals nonlinearity that \(r\) cannot detect.
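A small sketch makes this concrete: the synthetic inverted-U data below follow a perfect quadratic pattern, yet Pearson’s \(r\) is essentially zero.

```python
import numpy as np

# Synthetic inverted-U pattern: "performance" peaks at moderate "arousal"
arousal = np.linspace(-3, 3, 61)    # centered so the curve is symmetric
performance = 9 - arousal ** 2      # perfect quadratic relationship

r = np.corrcoef(arousal, performance)[0, 1]
print(f"r = {r:.3f}")  # near zero despite a perfectly systematic pattern
```

A scatterplot of these data would immediately reveal the curve that \(r\) misses.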
**What factors can distort the magnitude of \(r\)?**

Several factors can artificially influence the magnitude of \(r\):

- Outliers: A single extreme data point can dramatically inflate or deflate \(r\). Always inspect scatterplots for outliers.
- Restriction of range: If the range of values for one or both variables is artificially narrowed (e.g., studying only elite athletes), \(r\) will be attenuated.
- Measurement error: Unreliable measurements reduce the observed \(r\) relative to the true relationship.
- Sample size: Small samples produce unstable \(r\) estimates with wide confidence intervals.
**What is bivariate (simple) linear regression?**

Bivariate (simple) linear regression models the relationship between one predictor variable (\(X\)) and one outcome variable (\(Y\)) using a straight line:

\[\hat{Y} = b_0 + b_1 X\]

where \(b_0\) is the intercept (the predicted value of \(Y\) when \(X = 0\)) and \(b_1\) is the slope (the change in predicted \(Y\) for a one-unit increase in \(X\)). The regression line is determined by the least squares criterion: it minimizes the sum of squared differences between observed and predicted values. Regression is used when the goal is prediction, whereas correlation quantifies the strength of association.
**How do I interpret the slope and intercept?**

- Slope (\(b_1\)): For every one-unit increase in \(X\), the predicted \(Y\) changes by \(b_1\) units. A positive slope indicates a positive relationship; a negative slope indicates a negative relationship. The slope has units (units of \(Y\) per unit of \(X\)).
- Intercept (\(b_0\)): The predicted value of \(Y\) when \(X = 0\). The intercept is often not directly interpretable if \(X = 0\) is outside the range of the data.
Example: If the regression equation for predicting jump height (cm) from leg strength (kg) is \(\hat{Y} = 10.2 + 0.44X\), then for every 1 kg increase in leg strength, predicted jump height increases by 0.44 cm.
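A fit like this can be obtained with `scipy.stats.linregress`. The data below are hypothetical and were constructed to lie close to \(\hat{Y} = 10.2 + 0.44X\):

```python
import numpy as np
from scipy import stats

# Hypothetical data built to sit near jump = 10.2 + 0.44 * strength
strength = np.array([100, 120, 140, 160, 180, 200, 220, 240])
jump = np.array([54.0, 63.0, 72.0, 80.0, 89.0, 98.0, 107.0, 116.0])

fit = stats.linregress(strength, jump)
print(f"intercept b0 = {fit.intercept:.2f} cm")
print(f"slope     b1 = {fit.slope:.2f} cm per kg")

# Prediction: plug a new X value into the fitted equation
print(f"predicted jump at 150 kg: {fit.intercept + 150 * fit.slope:.1f} cm")
```

Note the units in the output: the slope is in cm of jump height per kg of leg strength, exactly as the interpretation above describes.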
**What is \(R^2\) and what does it tell me?**

\(R^2\) (the coefficient of determination) is the square of Pearson’s \(r\) and represents the proportion of variance in \(Y\) explained by \(X\). It ranges from 0 to 1:

- \(R^2 = 0.64\) means 64% of the variability in \(Y\) is accounted for by \(X\)
- The remaining \(1 - R^2\) is unexplained (residual) variance
\(R^2\) is a measure of effect size and practical importance, not just statistical significance. A statistically significant \(r\) can have a small \(R^2\) in large samples, meaning the predictor accounts for little practical variance.
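The identity \(R^2 = r^2\), and its variance-explained interpretation \(R^2 = 1 - SS_{res}/SS_{tot}\), can be verified numerically. The paired observations below are hypothetical:

```python
import numpy as np

# Hypothetical paired observations
x = np.array([2.0, 3.5, 4.1, 5.0, 6.2, 7.3])
y = np.array([1.1, 2.9, 3.0, 4.4, 5.1, 6.8])

r = np.corrcoef(x, y)[0, 1]

# R² from the least-squares line: 1 - SS_residual / SS_total
b1, b0 = np.polyfit(x, y, 1)               # slope, intercept
ss_res = np.sum((y - (b0 + b1 * x)) ** 2)  # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)       # total variation in Y
r_squared = 1 - ss_res / ss_tot

print(f"r² = {r**2:.3f}, 1 - SS_res/SS_tot = {r_squared:.3f}")  # identical
```

Both routes give the same number, which is why squaring a reported \(r\) immediately tells you the proportion of variance explained.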
**What assumptions do correlation and regression share?**

Both methods assume:

1. Linearity: The relationship between \(X\) and \(Y\) is linear (check with a scatterplot)
2. Independence: Observations are independent of one another
3. Homoscedasticity: The variance of the residuals is constant across all levels of \(X\) (check with a residual plot)
4. Normality of residuals: For inference (hypothesis tests, CIs), the residuals should be approximately normally distributed
Violating these assumptions—especially linearity and homoscedasticity—can produce misleading results. Always inspect plots before trusting numerical output.
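As a rough numeric sketch of assumption checking (normally done visually with a residual plot), the code below fits a line to hypothetical data and compares residual spread across the lower and upper halves of \(X\):

```python
import numpy as np

# Hypothetical data; values roughly follow a straight line
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# With an intercept in the model, OLS residuals always average to zero;
# homoscedasticity concerns their *spread* across the range of X.
low, high = residuals[:4], residuals[4:]
print(f"residual SD (low X):  {low.std(ddof=1):.3f}")
print(f"residual SD (high X): {high.std(ddof=1):.3f}")
```

Comparable spread in the two halves is consistent with homoscedasticity; a markedly larger spread at one end (a fan shape in a residual plot) would signal a violation.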
**How do I report correlation and regression results?**

Correlation: Report \(r\), the degrees of freedom (\(n - 2\)), and the p-value or confidence interval:

- “Leg strength was significantly correlated with vertical jump height, \(r(6) = .99\), \(p < .001\).”

Regression: Report the equation, \(R^2\), and the significance of the model:

- “Leg strength significantly predicted vertical jump height (\(b = 0.44\), \(\beta = .99\)), \(R^2 = .98\), \(F(1, 6) = 314.2\), \(p < .001\).”

Always include a scatterplot with the fitted regression line when reporting regression results.
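The quantities in such a report are linked: in bivariate regression \(\beta = r\), \(R^2 = r^2\), the degrees of freedom are \(n - 2\), and \(F(1,\, n-2) = \frac{R^2}{1 - R^2}(n - 2)\). The sketch below computes them together from a hypothetical \(n = 8\) dataset:

```python
import numpy as np
from scipy import stats

# Hypothetical n = 8 dataset, so df = n - 2 = 6 as in the example report
strength = np.array([110, 125, 140, 155, 170, 185, 200, 215])
jump = np.array([40.0, 44.0, 47.0, 52.0, 55.0, 60.0, 63.0, 68.0])

fit = stats.linregress(strength, jump)
df = len(strength) - 2
r2 = fit.rvalue ** 2
F = r2 / (1 - r2) * df  # F statistic for the overall bivariate model

print(f"r({df}) = {fit.rvalue:.2f}, p = {fit.pvalue:.4f}")
print(f"R² = {r2:.2f}, F(1, {df}) = {F:.1f}")
```

These printed values slot directly into the APA-style sentences above (the specific numbers here reflect the made-up data, not a real study).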
2.2 Test your Knowledge
Take this low-stakes quiz to test your knowledge of the material in this chapter. This quiz is for practice only and will help you identify areas where you may need additional review.
3 Participate
This section includes activities and discussions that will be completed during class time. Your active participation is essential for deepening your understanding of the material.
During class, we will:

- Construct and interpret scatterplots to visualize bivariate relationships
- Compute and interpret Pearson’s correlation coefficient for Movement Science datasets
- Distinguish between correlation and causation using real-world examples
- Fit a bivariate regression model and interpret the slope, intercept, and \(R^2\)
- Identify violations of assumptions (linearity, homoscedasticity) using residual plots
- Practice reporting correlation and regression results in APA format
4 Perform
4.1 Apply Your Learning
Now that you’ve prepared, practiced, and participated, it’s time to demonstrate your mastery of the material through assignments and assessments.
I strongly encourage you to complete the previous “Ps” (Prepare, Practice, Participate) before attempting any assignments or assessments associated with this chapter.