KIN 610: Quantitative Methods in Kinesiology

Chapter 11: Correlation and Bivariate Regression

Ovande Furtado Jr., PhD.

Professor, Cal State Northridge

2026-02-21

FYI

This presentation is based on the following books. Unless otherwise specified, references come from these sources.

Main sources:

  • Moore, D. S., Notz, W. I., & Fligner, M. (2021). The basic practice of statistics (9th ed.). W.H. Freeman.
  • Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.
  • Furtado, O., Jr. (2026). Statistics for movement science: A hands-on guide with SPSS (1st ed.). https://drfurtado.github.io/sms

ClassShare App

You may be asked in class to go to the ClassShare App to answer questions.

SPSS Tutorial

Intro Question

  • A coach wants to know: does greater leg strength predict higher vertical jump performance? You collect leg strength (kg) and jump height (cm) from 30 athletes. How would you describe and model this relationship?
Click to reveal answer

We need tools that (1) quantify the strength of the relationship and (2) allow us to make predictions. Correlation tells us how strongly two variables co-vary; regression produces a mathematical equation for prediction. Together they form the foundation of bivariate analysis in Movement Science.
  • Correlation quantifies the strength and direction of a linear relationship between two variables.
  • Bivariate regression models that relationship mathematically and enables prediction.
  • Both depend on visualizing data with a scatterplot first.

Learning Objectives

By the end of this chapter, you should be able to:

  • Explain what correlation measures and how it quantifies linear relationships
  • Compute and interpret Pearson’s \(r\) and \(r^2\)
  • Distinguish between correlation and causation
  • Construct and interpret scatterplots for bivariate data
  • Fit a bivariate regression model and interpret the slope, intercept, and \(R^2\)
  • Assess assumptions: linearity, homoscedasticity, independence, normality of residuals
  • Recognize the influence of outliers on correlation and regression results
  • Apply and report correlation and regression results appropriately in Movement Science

Symbols

Symbol Name Pronunciation Definition
\(r\) Pearson’s correlation “r” Strength and direction of the linear relationship
\(\rho\) Population correlation “rho” True correlation in the population
\(r^2\) Coefficient of determination “r squared” Proportion of variance in \(Y\) explained by \(X\)
\(R^2\) Coefficient of determination (regression) “R squared” Proportion of variance explained by the regression model
\(\hat{y}\) Predicted value “y hat” Value of \(Y\) predicted by the regression equation
\(a\) Intercept “a” Predicted \(Y\) when \(X = 0\)
\(b\) Slope “b” Change in \(\hat{y}\) for a one-unit increase in \(X\)
\(e\) Residual “residual” Difference between observed and predicted \(Y\)

What is Correlation?

Correlation measures the strength and direction of the linear relationship between two continuous variables[1,2].

Key properties:

  • Dimensionless (no units) and standardized
  • Ranges from \(-1\) to \(+1\)
  • Symmetric: \(r_{XY} = r_{YX}\)

Directions:

  • Positive: Higher \(X\) → Higher \(Y\) (e.g., leg strength and jump height)
  • Negative: Higher \(X\) → Lower \(Y\) (e.g., body mass and endurance performance)
  • Zero: No linear relationship

Benchmarks[1]:

\(|r|\) Strength
\(> 0.7\) Strong
\(0.4–0.7\) Moderate
\(< 0.4\) Weak
Figure 1: Three types of correlation: positive (top), negative (middle), and zero (bottom)

A note about criteria for correlation strength

There are many guidelines for interpreting the strength of a correlation. The choice of criteria depends on the field and study. Here are some common guidelines:

\[ \begin{array}{lccccc} \hline \textbf{Guideline} & \textbf{Negligible} & \textbf{Weak} & \textbf{Moderate} & \textbf{Strong} & \textbf{Very Strong / Perfect} \\ \hline \text{Cohen (1988)} & \text{–} & 0.10 \le |r| < 0.30 & 0.30 \le |r| < 0.50 & |r| \ge 0.50 & \text{–} \\ \text{Evans (1996)} & |r| < 0.20 & 0.20 \le |r| < 0.40 & 0.40 \le |r| < 0.60 & 0.60 \le |r| < 0.80 & 0.80 \le |r| \le 1.00 \\ \text{Hinkle (1994)} & |r| < 0.30 & 0.30 \le |r| < 0.50 & 0.50 \le |r| < 0.70 & 0.70 \le |r| < 0.90 & 0.90 \le |r| \le 1.00 \\ \text{Dancey and Reidy (2004)} & |r| < 0.10 & 0.10 \le |r| < 0.30 & 0.30 \le |r| < 0.60 & 0.60 \le |r| < 1.00 & |r| = 1.00 \\ \text{Mukaka (2012)} & |r| < 0.30 & 0.30 \le |r| < 0.50 & 0.50 \le |r| < 0.70 & 0.70 \le |r| < 0.90 & 0.90 \le |r| \le 1.00 \\ \hline \end{array} \]

Pearson’s Formula for \(r\)

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \; \sum (y_i - \bar{y})^2}} \tag{1}\]

Where \(\bar{x}\) and \(\bar{y}\) are the means of \(X\) and \(Y\), respectively.

  • The numerator measures covariation: how \(X\) and \(Y\) vary together
  • The denominator adjusts for the individual spread of \(X\) and \(Y\), ensuring \(r\) always falls between \(-1\) and \(+1\)

Alternatively, if we convert our data to z-scores first, the formula simplifies elegantly. Since z-scores already standardize for individual variability, \(r\) is simply the “average” product of the z-scores[1]:

\[ r = \frac{1}{n-1} \sum z_x \, z_y \tag{2}\]

Where \(z_x\) and \(z_y\) are the z-scores of \(X\) and \(Y\), respectively.

Intuition

When above-average \(X\) tends to pair with above-average \(Y\) (both deviations have the same sign), products are mostly positive → \(r > 0\). When they pair with opposite signs → \(r < 0\).

Check Question

If above-average leg strength values consistently pair with above-average jump heights, will the correlation be positive, negative, or zero?
Click to reveal answer

Answer: Positive. When above-average \(X\) pairs with above-average \(Y\), the deviations \((x_i - \bar{x})\) and \((y_i - \bar{y})\) both have the same sign, making their product positive. Summing many positive products gives a positive numerator, and thus \(r > 0\).

Figure 2: Mean lines (red dashed) divide the scatterplot into quadrants. Most points fall where deviations have the same sign (+/+ or -/-).

Scatterplots: Always Plot Your Data

Scatterplots are the essential first step — never compute \(r\) without visualizing the data first[3,4].

What scatterplots reveal:

  • Shape of the relationship: Indicates whether a linear model is appropriate or if a nonlinear (curved) pattern exists.
  • Strength and direction: Shows how closely points cluster together and whether they follow a positive or negative trend.
  • Outliers or influential points: Highlights extreme values that could disproportionately skew \(r\) and regression results.
  • Heteroscedasticity: Reveals whether the spread of \(Y\) changes across values of \(X\) (e.g., a funnel shape)—a critical assumption violation for regression and significance testing.
Figure 3: Moderate positive linear relationship between leg strength and vertical jump height (r = 0.590)

Anscombe’s Quartet

Four datasets with identical \(r = 0.816\), \(\bar{x}\), \(\bar{y}\), and regression lines — but completely different patterns when plotted. Correlation alone can be deeply misleading[1].

Figure 4: Anscombe’s Quartet: Four datasets with identical statistics but different distributions.

Critical Limitation 1: Linearity

Pearson’s \(r\) measures only linear associations[1,3]. Two variables can have a strong, meaningful relationship yet produce \(r \approx 0\) if the relationship is nonlinear.

Movement Science examples of nonlinearity:

  • Lactate–intensity: exponential rise at high intensities
  • Arousal–performance: inverted-U (Yerkes-Dodson law)
  • Fatigue–time: rapid initial decline, then plateau

What to do if nonlinear:

  1. Transform data (e.g., log transformation)
  2. Fit a nonlinear model
  3. Use Spearman’s rank correlation (\(r_s\))
Figure 5: A nonlinear relationship: Pearson’s r ≈ 0 despite a clear U-shaped pattern
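For monotonic (but nonlinear) relationships, option 3 above is often the simplest: Spearman's \(r_s\) is just Pearson's \(r\) computed on the ranks of the data. A minimal stdlib sketch with hypothetical data (the helper names are mine, and the ranking function assumes no tied values for simplicity):

```python
import math

def pearson_r(x, y):
    """Pearson's r via deviation cross-products (Equation 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x, y):
    """Spearman's r_s = Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical monotonic-but-nonlinear data: y grows as x cubed
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi ** 3 for xi in x]

print(round(pearson_r(x, y), 2))   # 0.93 -- the linear measure understates the association
print(spearman_rs(x, y))           # 1.0  -- the relationship is perfectly monotonic
```

Note that Spearman's \(r_s\) rescues monotonic curves, not U-shaped ones like Figure 5: for a U-shape, both \(r\) and \(r_s\) will be near zero.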

Critical Limitation 2: Correlation ≠ Causation

A strong correlation does not prove that one variable causes the other[1,5].

Three reasons correlations can be misleading:

Explanation Description Example
Confounding variable Third variable drives both Hot weather → ice cream sales AND drownings
Reverse causation Direction assumed backwards Do fit people exercise, or does exercise make people fit?
Spurious correlation Coincidence Winning spelling-bee word length ↔︎ venomous spider deaths

Establishing causation requires:

  1. Temporal precedence (cause precedes effect)
  2. Covariation (variables must correlate)
  3. Elimination of alternatives (RCT or experimental control)

Language matters

Use: “X is associated with Y” or “X and Y are related”

Avoid: “X causes Y” (unless you have experimental evidence)

Movement Science example

A strong negative correlation between physical activity and cardiovascular disease does not prove that activity prevents heart disease[5]. Healthier individuals may simply be more likely to exercise (reverse causation), or genetic factors may influence both (confounding). Only RCTs can establish causation[6].

Check Question

A study finds r = 0.72 between hours of practice and gymnast skill. Can the researcher conclude that more practice causes higher skill?
Click to reveal answer

Answer: No. Alternative explanations include: (1) skilled gymnasts may be more motivated to practice (reverse causation); (2) talent, coaching quality, or physical ability may drive both practice time and skill (confounding). Correlation alone cannot establish causation. A randomized controlled trial, where athletes are randomly assigned to different practice schedules, would be needed[1].

Coefficient of Determination: \(r^2\)

Squaring \(r\) gives \(r^2\), the coefficient of determination: the proportion of variance in \(Y\) explained by \(X\)[1,3]. This is analogous to the effect-size measures used in the t-test and ANOVA.

\[ r^2 = (0.590)^2 = 0.348 \tag{3}\]

Interpretation:

  • Correlation (\(r^2\)): 34.8% of the variance is shared between leg strength and jump height.
  • Regression (\(R^2\)): Leg strength mathematically accounts for or predicts 34.8% of the variability in jump height.
  • The remaining 65.2% is unexplained (technique, fiber type, measurement error, etc.)

To test whether \(r\) differs significantly from zero:

\[ t = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}}, \quad df = n - 2 \tag{4}\]

For our example: \(t = 1.790\), \(df = 6\), \(p = .124\)

Note

Equation 3 refers to the equation for the coefficient of determination.
Equation 4 refers to the equation for the significance test of the correlation coefficient.
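Equation 4 is easy to verify by hand. A stdlib-only sketch (the p-value itself requires the t-distribution, which SPSS reports for us):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0 (Equation 4)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Chapter example: r = 0.590, n = 8 athletes
t = t_for_r(0.590, 8)
df = 8 - 2
print(round(t, 2), df)  # t = 1.79 with df = 6 (SPSS reports p = .124)
```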

Practical benchmarks[7]

\(r^2\) Interpretation
\(< 0.10\) Weak (< 10% shared variance)
\(0.10–0.30\) Moderate
\(\geq 0.30\) Strong

Statistical significance ≠ practical importance

With very large samples, even \(r = 0.05\) can be statistically significant — yet \(r^2 = 0.0025\) means only 0.25% of variance is explained[4,6].

Always report \(r\), \(r^2\), and confidence intervals — not just \(p\)-values.

Worked Example: Data & Scatterplot

Data: leg strength (kg) and vertical jump height (cm) in 8 athletes.

Data:

Athlete \(X\) (kg) \(Y\) (cm)
1 80 42
2 90 53
3 70 44
4 100 49
5 85 52
6 95 48
7 75 40
8 88 56

Summary Stats:

  • \(\bar{x} = 85.38, \; s_x = 10.06\)
  • \(\bar{y} = 48.00, \; s_y = 5.63\)

1. Always plot first!

Figure 6: Scatterplot of Leg Strength vs. Jump Height showing a moderate positive correlation (r = 0.590).

Worked Example: Checking Assumptions with SPSS

Before calculating any correlation coefficient, we must verify that the dataset meets the necessary statistical assumptions.

Assumption How to check in SPSS
1. Continuous Variables Both \(X\) and \(Y\) must be interval or ratio level variables. Confirm this in the Variable View (Measure column).
2. Linearity The relationship must be roughly linear. Look at the scatterplot via Graphs > Chart Builder... and ensure the pattern follows a line, not a curve.
3. Independence Each observation must be independent. This is confirmed via study design (e.g., each row is a unique athlete).
4. No Outliers Ensure extreme points aren’t pulling the linear trend. Check the scatterplot visually or use boxplots via Analyze > Descriptive Statistics > Explore.
5. Normality Variables should be roughly normally distributed (necessary for significance testing). Run the Shapiro-Wilk test or check Q-Q plots via Analyze > Descriptive Statistics > Explore.
6. Homoscedasticity Data points should be evenly spread along the regression line (no funnel shape). Check the scatterplot or regression residuals visually.

Worked Example: Computing Pearson’s \(r\)

Once assumptions are met, we compute Pearson’s \(r\).

2. Compute \(r\) using z-scores

First, convert all values to z-scores: \(z_x = \frac{x - \bar{x}}{s_x}, \quad z_y = \frac{y - \bar{y}}{s_y}\)

For the 8 athletes, the sum of their cross-products is \(\sum z_x z_y = 4.132\)

\[r = \frac{1}{n-1} \sum z_x z_y = \frac{4.132}{7} = \mathbf{0.590}\]

  • \(r^2\): \(0.590^2 = \mathbf{0.348}\) (34.8% of variance explained)
  • Significance: \(p = \mathbf{0.124}\) (not statistically significant)

Interpretation: A moderate positive linear relationship.

Calculate in SPSS

You can find step-by-step instructions on how to compute Pearson’s \(r\) in the SPSS Tutorial: Correlation and Bivariate Regression chapter of the SMS textbook.
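Outside SPSS, both formulas for \(r\) (Equation 1 and the z-score form in Equation 2) can be checked on the 8-athlete data with a few lines of stdlib Python; they agree, as expected:

```python
import math
import statistics

strength = [80, 90, 70, 100, 85, 95, 75, 88]   # X (kg)
jump     = [42, 53, 44, 49, 52, 48, 40, 56]    # Y (cm)
n = len(strength)

# Equation 1: deviation cross-products
mx, my = statistics.mean(strength), statistics.mean(jump)
num = sum((x - mx) * (y - my) for x, y in zip(strength, jump))
den = math.sqrt(sum((x - mx) ** 2 for x in strength)
                * sum((y - my) ** 2 for y in jump))
r1 = num / den

# Equation 2: "average" product of z-scores (n - 1 in the denominator)
sx, sy = statistics.stdev(strength), statistics.stdev(jump)
r2 = sum(((x - mx) / sx) * ((y - my) / sy)
         for x, y in zip(strength, jump)) / (n - 1)

print(round(r1, 3), round(r2, 3))  # both print 0.59 -- the two formulas agree
```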

Bivariate Linear Regression: The Model

Regression goes beyond correlation: it fits a mathematical equation to predict \(Y\) from \(X\)[1,3].

\[\hat{y} = a + bx\]

Components:

Symbol Name Meaning
\(\hat{y}\) Predicted value Estimated \(Y\) for a given \(X\)
\(a\) Intercept Predicted \(Y\) when \(X = 0\)
\(b\) Slope Change in \(\hat{y}\) per 1-unit increase in \(X\)
Figure 7: Visualizing the regression line components

Correlation vs. Regression

Correlation Regression
Goal Quantify association Predict \(Y\) from \(X\)
Output \(r\), \(r^2\) Equation \(\hat{y} = a + bx\)
Symmetric? Yes (\(r_{XY} = r_{YX}\)) No (predicting \(Y\) from \(X\) ≠ vice versa)
When to use Describe relationship Make predictions

Understanding Slope and Intercept

To build the regression equation \(\hat{y} = a + bx\), we must calculate the slope (\(b\)) and intercept (\(a\)).

The Slope (\(b\))

  • Determines the steepness of the regression line.
  • Formula: \(b = r \frac{s_y}{s_x}\)
  • If \(b\) is positive, the line goes up; if negative, the line goes down.
  • Represents the rate of change: how much does \(Y\) change when \(X\) increases by exactly one unit?
  • In real-world terms (e.g., strength \(\to\) jump height), a slope of 0.5 means a 1kg increase in strength yields a 0.5cm increase in jump height.

The Intercept (\(a\))

  • The point where the regression line crosses the Y-axis.
  • Formula: \(a = \bar{y} - b\bar{x}\)
  • Represents the predicted value of \(Y\) when \(X\) is exactly 0.
  • Often, \(a\) is just a mathematical anchor. For instance, estimating jump height for someone with 0kg of leg strength is absurd! Never over-interpret the intercept outside the plausible range of your data[1].
  • The regression line always passes exactly through the means of the data: \((\bar{x}, \bar{y})\).

Worked Example: Regression Equation

Using the leg strength data: \(\bar{x} = 85.375\), \(\bar{y} = 48.000\), \(s_x = 10.056\), \(s_y = 5.632\), \(r = 0.590\).

Step 1: Compute the slope

\[b = r \frac{s_y}{s_x} = 0.590 \times \frac{5.632}{10.056} = 0.590 \times 0.560 = \mathbf{0.331 \text{ cm/kg}}\]

Step 2: Compute the intercept

\[a = \bar{y} - b\bar{x} = 48.000 - (0.331 \times 85.375) = \mathbf{19.741 \text{ cm}}\]

Step 3: Write the equation

\[\hat{y} = 19.741 + 0.331(x)\]

Making a prediction: If leg strength = 92 kg:

\[\hat{y} = 19.741 + 0.331(92) = 50.19 \text{ cm}\]

Slope (\(b = 0.331\))

For every 1 kg increase in leg strength, the predicted jump height increases by 0.331 cm on average. Note: This represents the average trend across the entire sample. It does not guarantee that if a specific individual gains 1 kg of strength, their jump will mechanically increase by exactly 0.331 cm.

Intercept (\(a = 19.741\))

Predicted jump height when leg strength = 0. This value is not meaningful here — no one has zero leg strength. Do not over-interpret intercepts outside the data range.

Extrapolation

Never predict outside the observed range of \(X\) (70–100 kg in this example). The linear relationship may not hold beyond that range[1].
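Steps 1–3 and the extrapolation warning above can be sketched in a few lines of Python from the summary statistics (the `predict` helper is mine; small last-decimal differences from the slides arise because the slides round \(b\) before computing \(a\)):

```python
# Summary statistics from the worked example
r, sx, sy = 0.590, 10.056, 5.632
x_bar, y_bar = 85.375, 48.000

b = r * sy / sx            # slope: b = r * (s_y / s_x)
a = y_bar - b * x_bar      # intercept: a = y_bar - b * x_bar

def predict(x, lo=70, hi=100):
    """Predict jump height; refuse to extrapolate beyond observed X."""
    if not lo <= x <= hi:
        raise ValueError(f"{x} kg is outside the observed range {lo}-{hi} kg")
    return a + b * x

print(round(b, 4))            # 0.3304 cm/kg (slides round to 0.331)
print(round(a, 2))            # 19.79 cm (slides show 19.741, via the rounded slope)
print(round(predict(92), 2))  # 50.19 cm, matching the slides
```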

Residuals and Model Fit

A residual is the difference between the observed and predicted value[1]:

\[e_i = y_i - \hat{y}_i \tag{5}\]

What residuals tell us:

  • How well the model fits: Residuals show the error for each prediction.
  • Whether assumptions are met: We want errors to be pure, unpredictable noise. If they are randomly scattered around zero (no pattern), it means the linear model successfully captured the relationship and variance is constant.

\(R^2\) (in bivariate regression = \(r^2\)):

\[ R^2 = 0.348 \tag{6}\]

  • 34.8% of variance in jump height explained by leg strength
  • 65.2% is residual (unexplained) variance
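Equations 5 and 6 can be verified directly on the 8-athlete data. A stdlib sketch (slope and intercept are refit by least squares from the raw data, so values may differ from the slides in the last decimal):

```python
strength = [80, 90, 70, 100, 85, 95, 75, 88]   # X (kg)
jump     = [42, 53, 44, 49, 52, 48, 40, 56]    # Y (cm)
n = len(strength)

mx = sum(strength) / n
my = sum(jump) / n

# Least-squares slope and intercept
sxy = sum((x - mx) * (y - my) for x, y in zip(strength, jump))
sxx = sum((x - mx) ** 2 for x in strength)
b = sxy / sxx
a = my - b * mx

# Residuals: e_i = y_i - y_hat_i (Equation 5)
residuals = [y - (a + b * x) for x, y in zip(strength, jump)]

sse = sum(e ** 2 for e in residuals)       # unexplained variation
sst = sum((y - my) ** 2 for y in jump)     # total variation
r_squared = 1 - sse / sst                  # Equation 6

print(round(r_squared, 3))          # 0.348 -- matches r^2 from the correlation
print(abs(sum(residuals)) < 1e-9)   # True: least-squares residuals sum to zero
```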

Reading the Residual Plot:

Pattern Diagnosis
1. Random scatter ✓ Assumptions met: Errors are random (linearity) with constant spread (homoscedasticity).
2. Funnel shape Heteroscedasticity: Spread of errors changes across predicted values, violating constant variance.
3. Curved pattern Nonlinearity: The linear model missed a curved relationship.
4. Outliers Influential points: Specific extreme values that might distort the model.
Figure 8: Visual guide to common residual plot patterns

Check Question

Using the equation \(\hat{y} = 19.741 + 0.331x\), what is the predicted jump height for an athlete with leg strength of 80 kg?
Click to reveal answer

Answer: Calculation:
\(\hat{y} = 19.741 + 0.331(80) = 19.741 + 26.48 = \mathbf{46.22 \text{ cm}}\).

What does this mean?
Our model predicts an athlete with 80 kg of leg strength will jump 46.22 cm.

The Residual (Error):
If we look back at our original data table, Athlete 1 actually had 80 kg of leg strength, but only jumped 42 cm.
Residual (\(y - \hat{y}\)) = 42 − 46.22 = -4.22 cm.
The negative residual means Athlete 1 jumped 4.22 cm lower than the model expected!

Assumptions of Correlation and Regression

Both methods rely on five key assumptions[1,3]:

1. Linearity The relationship between \(X\) and \(Y\) must be approximately linear. → Check: scatterplot and residual plot.

2. Homoscedasticity Variance in \(Y\) is constant across all values of \(X\). Violations produce a funnel shape in residual plots. → Check: residual plot.

3. Independence Each observation must be independent (one data point per participant, or use appropriate repeated-measures methods). → Check: study design.

4. Normality of residuals Residuals should be approximately normally distributed (for inference). Less critical for large samples (Central Limit Theorem). → Check: histogram or Q-Q plot of residuals.

5. No extreme outliers Outliers — especially those with high leverage (extreme \(X\)) and large residuals — can distort \(r\) and regression coefficients. → Check: scatterplot, residual plot, Cook’s distance.

Key principle

Violating assumptions — especially linearity and homoscedasticity — can produce misleading \(r\) values, biased slope estimates, and incorrect standard errors[3,8].

Check Question

A residual plot shows a clear funnel shape, with residuals spreading out as predicted values increase. Which assumption is violated?
Click to reveal answer

Answer: Homoscedasticity is violated. The funnel shape indicates heteroscedasticity — the variance of residuals increases with the predicted value. This can distort standard errors and confidence intervals. Possible remedies include log-transforming \(Y\), using weighted least squares, or robust regression methods[8].

Outliers and Influential Points

Outliers can have a disproportionate influence on \(r\) and regression coefficients[1,8].

Types of problematic points:

  • Outlier in \(Y\): A data point with an extreme outcome value; sits far above or below the regression line (large residual).
  • High leverage point: Extreme \(X\) value; sits far to the left or right of other data, acting like a seesaw to tilt the regression line toward itself.
  • Influential point: Extreme \(X\) AND large residual; distorts both the slope and \(r\)

What to do with outliers:

  1. Check for data entry errors first
  2. If legitimate, report results with and without the outlier
  3. Consider robust methods (e.g., Spearman’s \(r_s\), robust regression)
  4. Never delete outliers automatically — they may represent real biological variability
Figure 9: Effect of an influential outlier on the regression line. Blue = original data; red = with outlier added.

Statistical vs. Practical Significance

Just as with hypothesis testing, a statistically significant correlation may not be practically meaningful[4,6].

Examples in Movement Science:

Scenario \(r\) \(p\) \(r^2\) Practical interpretation
Training volume & VO2max .10 .02 1% Stat. sig. but trivial (n = 400)
Strength & jump height .85 .08 72% Large effect, underpowered (n = 8)
Balance score & fall risk .55 .001 30% Stat. sig. AND meaningful

Key principle:

  • Large samples can make tiny \(r\) values statistically significant
  • Small samples may fail to detect large real correlations
  • Always report \(r\), \(r^2\), and confidence intervals alongside \(p\)-values[2,9]

Effect size benchmarks for \(r^2\)[10]

  • \(r^2 \approx 0.01\) → Small effect
  • \(r^2 \approx 0.09\) → Medium effect
  • \(r^2 \approx 0.25\) → Large effect

In elite sport contexts, even very small correlations (\(r \approx 0.10\)–\(0.30\)) can be practically important — a 1% improvement can separate medal positions[6,7].

Reporting Results in APA Style

Correlation:

“Leg strength was significantly and positively correlated with vertical jump height, \(r(6) = .998\), \(p < .001\).”

Note: \(df = n - 2 = 6\) in parentheses.

Regression:

“A bivariate linear regression revealed that leg strength significantly predicted vertical jump height (\(b = 0.50\), \(\beta = .998\)), \(R^2 = .996\), \(F(1, 6) = 1502.1\), \(p < .001\). For every 1 kg increase in leg strength, jump height increased by 0.50 cm.”

Always include:

  • The regression equation
  • \(R^2\) and its interpretation
  • A scatterplot with the regression line
  • Residual plots to support assumptions

APA formatting rules

  • Use lowercase r italicized for Pearson’s correlation
  • Report degrees of freedom in parentheses: \(r(df)\)
  • Report \(p < .001\) when the p-value is very small
  • Include confidence intervals for \(r\) when possible: \(r = .998\), 95% CI \([.990, 1.000]\)
  • Use cautious language: “associated with,” not “causes”

Common Misconceptions

Misconception 1

❌ incorrect: “\(r = 0\) means there is no relationship between the variables.”

✅ correct: \(r = 0\) means there is no linear relationship. A strong nonlinear (curved) relationship can produce \(r \approx 0\). Always check a scatterplot[1].

Misconception 2

❌ incorrect: “A significant correlation proves causation.”

✅ correct: Correlation quantifies association only. Causation requires experimental design (random assignment, manipulation, control of confounds)[1,2].

Misconception 3

❌ incorrect: “\(r = 0.90\) is twice as strong as \(r = 0.45\).”

✅ correct: \(r\) is not a ratio scale. Compare using \(r^2\): \(0.90^2 = 81\%\) vs. \(0.45^2 = 20\%\) variance explained — a 4× difference, not 2×.

Misconception 4

❌ incorrect: “Non-overlapping confidence intervals for \(r\) confirm the correlations are different.”

✅ correct: Use Fisher’s z-transformation to formally test whether two \(r\) values differ — visual overlap of CIs is not a reliable test.
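Fisher's z test for comparing two independent correlations can be sketched with stdlib Python (the sample values \(r_1 = .60\), \(n_1 = 50\) vs. \(r_2 = .30\), \(n_2 = 50\) are hypothetical; the normal CDF is built from `math.erf`):

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho1 = rho2 for independent samples,
    using Fisher's transformation z' = atanh(r)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    p = 2 * (1 - phi)
    return z, p

z, p = compare_correlations(0.60, 50, 0.30, 50)
print(round(z, 2), round(p, 3))  # z = 1.86, p = 0.063 -- not significant at alpha = .05
```

Note how a seemingly large gap (.60 vs. .30) is not significant at these sample sizes, which is exactly why eyeballing CI overlap is unreliable.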

Misconception 5

❌ incorrect: “I can use the regression equation to predict values for athletes much stronger than any in my sample.”

✅ correct: That would be extrapolation — the linear relationship may not hold outside the observed range of \(X\)[3,8].

Workflow Summary

Use this sequence whenever examining the relationship between two continuous variables[1,3]:

Step Action Tool
1 Create a scatterplot Visualize pattern, outliers, linearity
2 Compute \(r\) Quantify strength and direction
3 Test significance \(t = r\sqrt{n-2}/\sqrt{1-r^2}\), \(df = n-2\)
4 Fit regression model (if prediction needed) \(\hat{y} = a + bx\); report slope, intercept, \(R^2\)
5 Check assumptions Residual plot, Q-Q plot
6 Interpret cautiously Correlation ≠ causation; report effect sizes

Important

The goal is not just a number — it is understanding the nature of the relationship and communicating it honestly, including its limitations.

Summary: Key Takeaways

  1. Correlation (\(r\)) quantifies the strength and direction of a linear relationship; ranges from \(-1\) to \(+1\)
  2. Always plot your data first — \(r\) cannot detect nonlinear relationships
  3. Correlation does not imply causation — confounding, reverse causation, and spurious correlations are always possible
  4. \(r^2\) represents the proportion of variance explained — more interpretable than \(r\) alone
  5. Bivariate regression produces a prediction equation \(\hat{y} = a + bx\); the slope tells you how much \(Y\) changes per unit of \(X\)
  6. Check assumptions: linearity, homoscedasticity, independence, normality of residuals, no extreme outliers
  7. Extrapolation is risky — restrict predictions to the observed range of \(X\)
  8. Statistical significance ≠ practical importance — always report \(r\), \(r^2\), CI, and \(p\) together

Important

Correlation and regression are powerful descriptive tools — but responsible use requires knowing their limits.

Practice Questions

  1. What does it mean if \(r = 0\) for two variables in a Movement Science study?
  2. A researcher finds \(r = 0.60\) between weekly training volume and 1-RM bench press. What percentage of variance in bench press is explained by training volume?
  3. Why must you always create a scatterplot before computing a correlation coefficient?
  4. Explain the difference between a high-leverage point and an influential point.
  5. A regression equation predicting VO2max from resting heart rate is: \(\hat{y} = 80 - 0.5x\). Predict VO2max for an athlete with resting HR = 60 bpm.
  6. What does a funnel shape in a residual plot indicate, and how might you address it?
  7. Why is it inappropriate to conclude causation from a significant correlation between ice cream sales and sports injuries?
  8. When would you prefer to report Spearman’s \(r_s\) instead of Pearson’s \(r\)?

Exit Ticket: Bivariate Correlation Activity

Please complete the Bivariate Correlation Activity for this week before leaving.

References

1. Moore, D. S., McCabe, G. P., & Craig, B. A. (2021). Introduction to the practice of statistics (10th ed.). W. H. Freeman and Company.
2. Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
3. Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.
4. Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
5. Vincent, W. J. (1999). Statistics in kinesiology. Human Kinetics.
6. Batterham, A. M., & Hopkins, W. G. (2006). Making meaningful inferences about magnitudes. International Journal of Sports Physiology and Performance, 1(1), 50–57.
7. Hopkins, W. G. (2000). Measures of reliability in sports medicine and science. Sports Medicine, 30(1), 1–15. https://doi.org/10.2165/00007256-200030010-00001
8. Wilcox, R. R. (2017). Introduction to robust estimation and hypothesis testing (4th ed.). Academic Press.
9. Wilkinson, L., & Task Force on Statistical Inference, APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604.
10. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum.
11. Furtado, O., Jr. (2026). Statistics for movement science: A hands-on guide with SPSS (1st ed.). https://drfurtado.github.io/sms/