Chapter 7: The Normal Distribution
2026-02-04
This presentation is based on the following books. Unless otherwise specified, references come from these books.
Main sources:
ClassShare App
You may be asked in class to go to the ClassShare App to answer questions.
SPSS Tutorial
Try generating samples from a normal distribution to see how the histogram changes with different sample sizes and parameters. This demonstrates the concept that most values cluster around the mean.
Interactive Demo: Click here to open the interactive normal distribution sampling demo
Note: The interactive demo opens in a new tab/window. It starts with kinesiology-relevant defaults (μ = 50 cm, σ = 10 cm, like jump height) but allows you to adjust all parameters to see how the histogram changes.
By the end of this chapter, you should be able to:
| Symbol | Name | Pronunciation | Definition |
|---|---|---|---|
| \(\mu\) | Population mean | “myoo” | Center of the distribution |
| \(\sigma\) | Population standard deviation | “sigma” | Spread of the distribution |
| \(z\) | Z-score | “zee” | \((x - \mu) / \sigma\) |
| \(\Phi(z)\) | Cumulative probability | “phi of z” | Area under curve to the left of z |
| \(P(X \leq x)\) | Probability | “probability of X less than or equal to x” | Probability that X is less than or equal to x |
| Skewness | Skewness | “skew-ness” | Measure of asymmetry; 0 for symmetric |
| Kurtosis | Kurtosis | “kur-toh-sis” | Measure of tail weight; 0 for normal (excess kurtosis) |
| \(z_{\text{skew}}\) | Z-score for skewness | – | Skewness / SE of skewness |
| \(z_{\text{kurt}}\) | Z-score for kurtosis | – | Kurtosis / SE of kurtosis |
| \(Q_1\) | First quartile | – | 25th percentile |
| \(Q_3\) | Third quartile | – | 75th percentile |
The normal distribution (also called the bell curve or Gaussian distribution) is a continuous probability distribution that is central to statistical theory and practice[1].
Understanding terminology is essential for normality assessment[1,7]:
The normal distribution has five defining characteristics[1]:
Standard normal distribution (μ = 0, σ = 1) allows us to use a single z-table for any normal distribution[1]. See Equation 1 for the standard z-score calculation:
\[ z = \frac{x - \mu}{\sigma} \]
Example: Vertical jump heights: μ = 45 cm, σ = 7 cm. What proportion jump higher than 52 cm?
Step 1: \(z = \frac{52 - 45}{7} = 1.00\)
Step 2: Cumulative probability for z = 1.00 is 0.8413 (84.13% below)
Step 3: Upper tail: \(P(X > 52) = 1 - 0.8413 = 0.1587\) (15.87% jump higher)
How did I get the percentile? I used the cumulative probability table (CPT) for the standard normal distribution.
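The table lookup can also be reproduced in software. A minimal Python sketch (using SciPy's standard normal functions; the numbers match the three steps above):

```python
from scipy.stats import norm

# Worked example: mu = 45 cm, sigma = 7 cm, threshold x = 52 cm
mu, sigma, x = 45, 7, 52

z = (x - mu) / sigma      # Step 1: z = 1.00
p_below = norm.cdf(z)     # Step 2: cumulative probability = 0.8413
p_above = 1 - p_below     # Step 3: upper tail = 0.1587

print(f"z = {z:.2f}, P(X <= {x}) = {p_below:.4f}, P(X > {x}) = {p_above:.4f}")
```

`norm.sf(z)` gives the upper tail directly and is numerically safer far out in the tails.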
Real-World Context: Professional Basketball
A study of 53 professional basketball players reported that Spanish League (LEB) players had a mean Countermovement Jump (CMJ) of 41.17 cm during the pre-season (1st assessment).
“The vertical jump is considered a fundamental skill in basketball… [data] allow coaches to compare their players’ performance with high-level athletes.” — Read Abstract
Assuming a hypothetical SD = 5.0 cm (consistent with similar cohorts), a 52 cm jump would have a z-score of: \[ z = \frac{52 - 41.17}{5.0} = 2.17 \] This exceeds 98.5% of the professional cohort!
Answer: A z-score of -1.5 means the score is 1.5 standard deviations below the mean (using the example's μ = 45, σ = 7: \(x = 45 - 1.5(7) = 34.5\) cm).
Calculation:
Percentile: cumulative probability at z = -1.5 is 0.0668, or 6.68%
Meaning: The score sits at roughly the 7th percentile; only ~6.7% of scores are lower (and ~93.3% are higher).
Interpretation: This is 1.5 SD below average, a relatively low performance compared to the group.
Skewness quantifies the degree of asymmetry in a distribution[7,8].
Interpretation:
Rules of thumb[8]:
| Skewness Range | Interpretation |
|---|---|
| \(\|\)Skewness\(\|\) < 0.5 | Approximately symmetric |
| 0.5 ≤ \(\|\)Skewness\(\|\) < 1.0 | Moderately skewed |
| \(\|\)Skewness\(\|\) ≥ 1.0 | Highly skewed |
Real-World Context
Reaction times in elite sprinters are typically positively (right) skewed.
Answer: Positive skewness indicates a long right tail where the mean is pulled higher than the median by extreme values.
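As an illustration (simulated values, not data from any cited study), right-skewed reaction times can be generated and their skewness computed with `scipy.stats.skew`:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(123)

# Simulated reaction times in seconds: lognormal, hence right-skewed
# (distribution and parameters are illustrative assumptions)
rt = rng.lognormal(mean=-2.0, sigma=0.4, size=500) + 0.10

g1 = skew(rt)
print(f"Skewness = {g1:.2f} (positive: long right tail)")
print(f"Mean = {rt.mean():.3f} s, Median = {np.median(rt):.3f} s")
```

Because the long right tail pulls the mean above the median, the printed mean exceeds the median, exactly the signature of positive skew described above.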
Z-score for skewness (z-skew) tests whether observed skewness is statistically different from zero[7,9].
Formula:
\[ z_{\text{skew}} = \frac{\text{Skewness}}{SE_{\text{skew}}} \]
Decision rules:
To make it simple!
If z-skew is between -2.0 and 2.0, the distribution is approximately symmetric.
Example 1 (not significant):
Example 2 (highly significant):
Answer: Formula: z-skew = skewness / SE_skew
Calculation:
z-skew = 1.45 / 0.31 = 4.68
Decision: |4.68| ≥ 2.58, so the skewness is highly significant at α = .01 (p < .01).
Interpretation: This is not a trivial departure; it is a strong, systematic skew that cannot be attributed to sampling variation. A distribution this skewed is clearly not normal!
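SPSS reports both the skewness statistic and its standard error; the z-score is simply their ratio. A Python sketch of the arithmetic (the sample size n = 62 and the large-sample shortcut SE ≈ √(6/n) are illustrative assumptions, not values from the text):

```python
import math

# z-score for skewness: the statistic divided by its standard error
skewness, se_skew = 1.45, 0.31   # values from the worked example
z = skewness / se_skew
print(f"z-skew = {z:.2f}")       # 4.68 -> highly significant (|z| >= 2.58)

# When only n is known, SE of skewness is roughly sqrt(6/n)
n = 62                           # hypothetical sample size for illustration
print(f"Approximate SE_skew = {math.sqrt(6 / n):.2f}")
```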
Kurtosis quantifies the “tailedness” or extremity of a distribution relative to the normal distribution[10,11].
Types:
Rules of thumb:
| Kurtosis Range | Interpretation |
|---|---|
| \(\|k\|\) < 1.0 | Normal-like (OK) |
| 1.0 ≤ \(\|k\|\) < 2.0 | Moderate |
| \(\|k\|\) ≥ 2.0 | Severe |
Important
Modern interpretation emphasizes tail weight rather than “peakedness”[10].
Answer: Heavy tails indicate more extreme outliers than expected under a normal distribution.
Z-score for kurtosis (z-kurtosis) tests whether observed kurtosis differs significantly from zero[7,9].
Formula:
\[ z_{\text{kurt}} = \frac{\text{Kurtosis}}{SE_{\text{kurt}}} \]
Decision rules:
To make it simple!
If z-kurt is between -2.0 and 2.0, the distribution is approximately normal.
Combined interpretation:
| Condition | Decision |
|---|---|
| Both \(\|z_{\text{skew}}\|\) and \(\|z_{\text{kurt}}\|\) < 1.96 | Approximately normal |
| Either \(\|z\| \geq 1.96\) | Significant departure |
| Both \(\|z\| \geq 2.58\) | Severe non-normality |
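The combined rule can be written as a small helper. A sketch in Python (the function name and return labels are mine; the thresholds are those in the table):

```python
def normality_flag(z_skew: float, z_kurt: float) -> str:
    """Classify a distribution by the combined z-skew / z-kurtosis rule."""
    zs, zk = abs(z_skew), abs(z_kurt)
    if zs >= 2.58 and zk >= 2.58:
        return "Severe non-normality"
    if zs >= 1.96 or zk >= 1.96:
        return "Significant departure"
    return "Approximately normal"

print(normality_flag(0.8, -1.2))   # Approximately normal
print(normality_flag(2.1, 0.5))    # Significant departure
print(normality_flag(3.4, -2.9))   # Severe non-normality
```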
Histograms display the frequency distribution of data by grouping values into bins, revealing the overall shape and patterns that numerical summaries alone cannot capture[7]. They allow you to quickly identify:
Histograms provide an intuitive visual complement to statistics like mean, median, skewness, and kurtosis—helping you see what the numbers are telling you.
Q-Q plots (quantile-quantile plots) compare observed data quantiles to expected normal quantiles[1,7].
How to interpret:
```r
set.seed(123)
par(mfrow = c(1, 3))  # 3 plots side-by-side

# 1. Normal
norm_data <- rnorm(200)
qqnorm(norm_data, main = "Normal", pch = 19, col = "gray50")
qqline(norm_data, col = "red", lwd = 2)

# 2. Right-skewed (positive skew)
# Points curve UP at both ends (convex / U-shape) relative to the line
right_skew <- rexp(200, rate = 1)
qqnorm(right_skew, main = "Right-Skewed", pch = 19, col = "gray50")
qqline(right_skew, col = "red", lwd = 2)

# 3. Left-skewed (negative skew)
# Points curve DOWN at both ends (concave / inverted U) relative to the line
left_skew <- 100 - rexp(200, rate = 1)
qqnorm(left_skew, main = "Left-Skewed", pch = 19, col = "gray50")
qqline(left_skew, col = "red", lwd = 2)

par(mfrow = c(1, 1))  # Reset layout
```

Gold standard
Q-Q plots are the most informative visual tool for assessing normality because they show how data deviate from the normal model across the entire distribution[7].
Answer: Curvature in one direction indicates skew (convex/U-shape for right skew, concave for left skew, as in the R examples above); a true S-shape at both ends indicates tails that are heavier or lighter than normal (a kurtosis problem).
Shapiro-Wilk test is the most powerful normality test for small to moderate samples[13,14].
Hypotheses: H₀: the data come from a normal distribution; H₁: the data do not come from a normal distribution.
Decision rule: reject H₀ (conclude non-normality) when p < .05; retain H₀ when p ≥ .05.
Simple interpretation
Example:
Critical limitation
Do not rely solely on p-values! Sample size strongly affects test results[5,9]:
So, always combine formal tests with visual assessment (Q-Q plots, histograms) to make informed decisions about normality.
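In Python, the same test is available as `scipy.stats.shapiro`. A minimal sketch on simulated data (sample sizes and distributions chosen for illustration):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)

# Truly normal data, small sample
w1, p1 = shapiro(rng.normal(loc=50, scale=10, size=30))
print(f"Normal data, n = 30:  W = {w1:.3f}, p = {p1:.3f}")

# Strongly right-skewed data, moderate sample: expect rejection
w2, p2 = shapiro(rng.exponential(scale=1.0, size=100))
print(f"Skewed data, n = 100: W = {w2:.3f}, p = {p2:.4f}")
```

As the text warns, a small p-value flags non-normality but says nothing about how severe the departure is; always pair it with the Q-Q plot.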
Why visual and formal methods often conflict:
Decision: Trust visual assessment
Decision: Trust visual assessment
Important
Modern statistical practice prioritizes visual assessment with formal tests serving as supplementary evidence[5,7,12].
Key message: Due to sample size effects, formal tests work best with moderate samples (n = 30-100) but are problematic with very small or very large samples. Always start with visual assessment (Q-Q plots, histograms) regardless of sample size.
Answer: They most often conflict due to sample size effects: formal tests flag trivial departures in large samples and miss real departures in small samples.
When visual and formal methods conflict, follow this hierarchical approach[5,7]:
Step 1: Prioritize visual assessment (reveals nature and magnitude)
Step 2: Consider sample size when interpreting formal tests
Step 3: Evaluate practical vs. statistical significance
Step 4: Apply convergence rule
Apply the convergence rule to resolve conflicts between methods:
Scenario 1: Large Sample (n = 150)
Decision: Use parametric
Scenario 2: Small Sample (n = 22)
Decision: Transform or use nonparametric
Key insight
When methods conflict, trust the pattern across multiple indicators rather than relying on a single test. Large samples reveal trivial issues; small samples hide real problems.
Answer: Prioritize visual assessment (Q-Q plots, histograms) because they reveal the nature and magnitude of departures. Formal tests serve as supplementary evidence, but are highly influenced by sample size.
| Visual Assessment | Formal Test | Sample Size | Recommended Action |
|---|---|---|---|
| Approximately normal | p < 0.05 | n < 30 | Use parametric (test underpowered) |
| Approximately normal | p < 0.05 | n ≥ 100 | Use parametric (trivial departure) |
| Clear departure | p > 0.05 | n < 30 | Transform/nonparametric (test underpowered) |
| Clear departure | p > 0.05 | n ≥ 100 | Investigate data quality |
| Mild departure | p < 0.05 | Any | Use robust methods (Welch’s t) |
| Severe departure | p < 0.05 | Any | Transform/nonparametric |
Practical checklist
When reviewing SPSS output, systematically check:
✓ Sample size, ✓ Visual (Q-Q + histogram), ✓ Magnitude (z-skew, z-kurt), ✓ Formal test (Shapiro-Wilk), ✓ Integrated decision
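The decision table can be collapsed into a small lookup function. A sketch (the category labels and the simplification into a few branches are mine; the actions follow the table's rows):

```python
def recommended_action(visual: str, p_value: float, n: int) -> str:
    """Map visual impression, Shapiro-Wilk p, and n to a recommended action.
    visual is one of: "approx_normal", "mild", "clear", "severe"."""
    if visual == "approx_normal":
        return "Use parametric"            # trivial departure or underpowered test
    if visual == "mild":
        return "Use robust methods (e.g., Welch's t)"
    if p_value > 0.05 and n < 30:
        return "Transform or use nonparametric (test underpowered)"
    if p_value > 0.05 and n >= 100:
        return "Investigate data quality"
    return "Transform or use nonparametric"

print(recommended_action("approx_normal", 0.02, 150))
print(recommended_action("clear", 0.30, 22))
```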
Movement Science data often show systematic departures from normality[3,4]:
Real example
A researcher collects reaction times: strong right skew (skew = 1.8), Shapiro-Wilk p = .002, Q-Q plot shows clear upward curvature.
Options:
High priority: Check normality carefully
Lower priority: Normality less critical
When departures are consequential, consider these principled options[4,12]:
Use robust or nonparametric methods:
Transform the variable:
Accept departure and proceed with caution:
Separate subgroups: If bimodal, analyze groups separately[15]
Acknowledge and report: Describe distributional shape and justify your approach[7]
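The transformation option can be sanity-checked quickly: for right-skewed data, a log transform pulls skewness toward zero. A sketch on simulated lognormal data (by construction, the logged values are exactly normal):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)

raw = rng.lognormal(mean=3.0, sigma=0.6, size=300)  # right-skewed
logged = np.log(raw)                                # normal by construction

print(f"Skewness before log transform: {skew(raw):.2f}")
print(f"Skewness after log transform:  {skew(logged):.2f}")
```

Log transforms require strictly positive values; for data containing zeros, a square-root transform is a common alternative.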
Scenario: Sprint times from 40 participants
Step 1: Visualize (histogram + Q-Q plot)
Step 2: Compute shape measures
Step 3: Run formal test
Step 4: Integrated decision
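The four steps can be strung together in code. A sketch on simulated sprint times (all data and parameters are illustrative; `scipy.stats.kurtosis` returns excess kurtosis by default, so 0 corresponds to normal):

```python
import numpy as np
from scipy.stats import skew, kurtosis, shapiro

rng = np.random.default_rng(2024)

# Step 1 (visualize) is graphical; here we simulate n = 40 sprint times (s)
times = rng.normal(loc=5.2, scale=0.3, size=40)
n = len(times)

# Step 2: shape measures with approximate standard errors
z_sk = skew(times) / np.sqrt(6 / n)
z_ku = kurtosis(times) / np.sqrt(24 / n)

# Step 3: formal test
w, p = shapiro(times)

# Step 4: integrated decision using the simple |z| < 2 rule
print(f"z-skew = {z_sk:.2f}, z-kurt = {z_ku:.2f}, Shapiro-Wilk p = {p:.3f}")
ok = abs(z_sk) < 2 and abs(z_ku) < 2 and p > 0.05
print("Proceed with parametric methods" if ok else "Investigate the departure")
```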