KIN 610: Quantitative Methods in Kinesiology

Chapter 8: Probability and Sampling Error

Ovande Furtado Jr., PhD.

Professor, Cal State Northridge

2026-02-11

FYI

This presentation is based on the following books. Unless otherwise specified, all references come from these sources.

Main sources:

ClassShare App

You may be asked in class to go to the ClassShare App to answer questions.

SPSS Tutorial

Intro Question

  • If you took 100 different random samples from the same population, would you expect all the sample means to be exactly the same? Why or why not?
Click to reveal answer

Answer: No! Each sample would include different individuals, so the sample means would naturally vary. This variability is called sampling error, and understanding it is the foundation of statistical inference.
  • In this chapter, we’ll explore how sample statistics (e.g., means, standard deviations) vary from sample to sample, introduce the Central Limit Theorem, and learn how confidence intervals and probability help us make inferences about populations from sample data.

Learning Objectives

By the end of this chapter, you should be able to:

  • Define probability and explain its role in statistical inference
  • Explain sampling error and why sample statistics vary
  • Describe the sampling distribution of the mean
  • State the Central Limit Theorem and explain its importance
  • Calculate and interpret the standard error of the mean
  • Construct and interpret confidence intervals
  • Distinguish between point estimates and interval estimates
  • Apply probability concepts to movement science research

Symbols

| Symbol | Name | Pronunciation | Definition |
|--------|------|---------------|------------|
| \(\mu\) | Population mean | "myoo" | True average of the population |
| \(\bar{x}\) | Sample mean | "x bar" | Average of the sample |
| \(\sigma\) | Population standard deviation | "sigma" | Population variability |
| \(s\) | Sample standard deviation | "s" | Sample variability |
| \(n\) | Sample size | "n" | Number of observations in a sample |
| \(SE\) | Standard error | "standard error" | Standard deviation of the sampling distribution |
| \(CI\) | Confidence interval | "C.I." | Range of plausible values for a parameter |
| \(P(A)\) | Probability of event A | "probability of A" | Likelihood of event A occurring |
| \(z\) | Z-score | "zee" | Number of standard deviations from the mean |

The Reality of Research: Your Sample

In the real world, you don’t see the whole population. You only see one sample.

Scenario: You measure vertical jump height in 30 students.

  • You calculate the sample mean: \(\bar{x} = 51.2\) cm.
  • This is your point estimate.

Critical Questions:

  1. Is \(51.2\) cm the “true” average of all students?
  2. If you repeated the study tomorrow with different students, would you get \(51.2\) again?
Figure 1: Your Observed Data (n=30)

The Challenge

We have this one number (\(\bar{x} = 51.2\)), but we know that sampling variability exists. How do we go from this single snapshot to making a confident statement about the entire population?

Introduction: From Samples to Populations

Statistical inference is the process of drawing conclusions about a population based on information from a sample[1].

  • We rarely have access to entire populations
    • e.g., every graduate student in California
  • Instead, we collect a sample and use it to estimate population parameters
  • The key challenge: How confident can we be in our estimates?
  • This chapter introduces the foundational concepts that make inference possible
Figure 2: Relationship between population and sample

What is Probability?

Probability quantifies the likelihood that an event will occur, expressed as a number between 0 and 1[1].

Key concepts:

  • \(P(A) = 0\): Event A is impossible
  • \(P(A) = 1\): Event A is certain
  • \(P(A) = 0.5\): Event A has a 50-50 chance

Types of probability:

  1. Classical: Based on equally likely outcomes (e.g., coin flip: \(P(\text{heads}) = 0.5\))
  2. Relative frequency: Based on long-run proportions from data (e.g., 68% of athletes jump above 40 cm)
  3. Subjective: Based on personal judgment or expertise

Movement Science example:

If vertical jump heights are normally distributed with \(\mu = 50\) cm and \(\sigma = 8\) cm, what is the probability of jumping higher than 58 cm?

\[z = \frac{58 - 50}{8} = 1.0\]

\(P(X > 58) = 1 - 0.8413 = 0.1587\), or 15.87%.

0.8413 is the cumulative probability up to 1 standard deviation above the mean.
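This calculation can be verified with a short Python snippet. It uses only the standard library (`math.erf` gives the standard-normal CDF); the values 50, 8, and 58 come from the example above:

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, x = 50, 8, 58          # population mean, SD, and threshold (cm)
z = (x - mu) / sigma              # z = (58 - 50) / 8 = 1.0
p_above = 1 - normal_cdf(z)       # P(X > 58)
print(f"z = {z}, P(X > 58) = {p_above:.4f}")   # 0.1587, i.e., 15.87%
```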

Real-World Context

In a study of 200 college athletes, approximately 32 would be expected to jump higher than 58 cm (15.87% × 200 ≈ 32).

Check Question

Check your understanding: If P(A) = 0.05, how would you interpret this?
Click to reveal answer

Answer: There is a 5% chance (or 1 in 20) that event A will occur. In statistics, this is the conventional threshold for “rare enough to be noteworthy” — the significance level α = 0.05.

Probability in Movement Science

Probability formalizes statements about uncertainty and enables evidence-based predictions.

Key Applications:

  • Performance Thresholds: What is the probability a player jumps > 70 cm?
    • Example: If 25/80 exceed this, empirical probability = 31%.
  • Outcome Likelihood: Is a 10% sprint improvement due to training or random chance?
  • Treatment Effects: Is a difference between groups real or sampling variability?
Figure 3: Visualizing probability concepts

Why it matters

Research inherently involves uncertainty (sampling, measurement error, uncontrolled factors). Probability gives us the language to reason about this uncertainty.

Sampling Error

Sampling error is the natural, unavoidable difference between a sample statistic (e.g., \(\bar{x}\)) and the true population parameter (e.g., \(\mu\))[1,2].

Key points:

  • Even with perfect random sampling, sample means will differ from the population mean
  • This is not a mistake — it’s a natural consequence of randomness
  • Different samples from the same population will produce different statistics
  • Sampling error decreases as sample size increases

Formula:

\[\text{Sampling Error} = \bar{x} - \mu\]

Figure 4: Five random samples from the same population show different means

Important

Sampling error is not an error in the colloquial sense — it is the expected variability that arises from using a sample to estimate a population parameter.

Workflow for understanding sampling error

Use this sequence when interpreting sample-based research findings:

  1. Recognize that your sample is one of many possible samples from the population.
  2. Acknowledge that sampling error exists and affects all sample statistics.
  3. Quantify uncertainty using the standard error (SE) or confidence intervals.
  4. Consider how sample size influences the precision of estimates.
  5. Make probabilistic (not absolute) statements about populations based on samples.
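The workflow above can be made concrete with a short simulation. This is a sketch using a hypothetical population of jump heights (no real data), showing that every random sample produces a slightly different mean:

```python
import random

random.seed(1)
# Hypothetical population: 10,000 jump heights, mean ≈ 50 cm, SD ≈ 8 cm
population = [random.gauss(50, 8) for _ in range(10_000)]
mu = sum(population) / len(population)

# Draw five random samples of n = 30; each mean differs a little from mu
means = []
for i in range(5):
    sample = random.sample(population, 30)
    x_bar = sum(sample) / len(sample)
    means.append(x_bar)
    print(f"Sample {i + 1}: mean = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```

None of these differences is a mistake; they are exactly the variability the workflow asks you to anticipate and quantify.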

The Sampling Distribution

The sampling distribution of the mean (also called the distribution of sample means) is the distribution of a statistic — here, the sample mean — across all possible samples of the same size from a population[1].

The bridge between descriptive statistics (describing our sample) and inferential statistics (making claims about populations) is built on sampling distributions[1].

Key properties:

  1. Center: The mean of the sampling distribution equals the population mean (\(\mu_{\bar{x}} = \mu\))
  2. Spread: The standard deviation is the standard error (\(SE = \sigma / \sqrt{n}\))
  3. Shape: Becomes approximately normal as \(n\) increases (Central Limit Theorem)
Figure 5: Sampling distribution of the mean (1000 samples of n = 30)

Think of it this way

If you could take an infinite number of samples (each of size \(n\)) from the same population and calculate the mean of each, the distribution of those means would form the sampling distribution.

Building the Sampling Distribution

Open App in New Tab

Check Question

Check your understanding: Is sampling error a mistake made during data collection?
Click to reveal answer

Answer: No! Sampling error is the natural, expected variability between a sample statistic and the population parameter. It occurs even with perfect random sampling because each sample contains different individuals.

Standard Error of the Mean

The standard error (SE) is the standard deviation of the sampling distribution — it tells us how much sample means vary from sample to sample[1,2].

\[ SE = \frac{\sigma}{\sqrt{n}} \]

Equation 1: Standard error formula

Key insights:

  • SE measures the precision of our estimate (the sample mean).
  • SE decreases as \(n\) increases (larger samples → more precise estimates).
  • SE depends on population variability (\(\sigma\)) and sample size (\(n\)).
    • More variable populations require larger samples to achieve the same precision.

Estimating Standard Error in Practice

In real research, we rarely know the true population standard deviation (\(\sigma\)). Instead, we estimate it using the sample standard deviation (\(s\)).

\[ SE_{\bar{x}} \approx \frac{s}{\sqrt{n}} \]
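In practice this is a two-line calculation. The sketch below uses a small hypothetical sample of jump heights (the values are illustrative, not from a real study):

```python
import math

# Hypothetical sample of 10 vertical jump heights (cm)
jumps = [48.2, 55.1, 51.0, 47.5, 53.3, 50.8, 49.9, 56.2, 52.4, 50.6]
n = len(jumps)
x_bar = sum(jumps) / n
# Sample SD uses n - 1 in the denominator (unbiased estimate of sigma)
s = math.sqrt(sum((x - x_bar) ** 2 for x in jumps) / (n - 1))
se = s / math.sqrt(n)              # estimated standard error of the mean
print(f"x_bar = {x_bar:.2f} cm, s = {s:.2f} cm, SE = {se:.2f} cm")
```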

The Power of Sample Size

The table below shows how the SE shrinks as sample size grows (assuming \(s = 10\)).

| Sample Size (\(n\)) | Standard Error (\(SE\)) |
|---------------------|-------------------------|
| 10 | 3.16 |
| 30 | 1.83 |
| 100 | 1.00 |
| 400 | 0.50 |

The “Square Root” Rule: Because \(n\) is in the denominator under a square root, to cut the error in half, you must quadruple (\(4\times\)) the sample size.
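The table values follow directly from the formula, as this quick check shows (assuming \(s = 10\) as above):

```python
import math

s = 10  # sample SD assumed in the table above
se_by_n = {n: s / math.sqrt(n) for n in (10, 30, 100, 400)}
for n, se in se_by_n.items():
    print(f"n = {n:>3}: SE = {se:.2f}")
# Quadrupling n (100 -> 400) cuts the SE in half (1.00 -> 0.50)
```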

Figure 6: Standard error decreases as sample size increases

Important

Diminishing Returns: Notice that the curve flattens out. Increasing \(n\) from 10 to 30 gives a huge gain in precision. Increasing \(n\) from 100 to 120 gives very little. This is crucial for cost-effective study design.

The Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most important theorems in statistics[1,2].

CLT: Regardless of the shape of the population distribution, the sampling distribution of the mean (sampling distribution of \(\bar{x}\)) approaches a normal distribution as the sample size increases.

Warning

We are not saying that sample means (your data) are normal. We are saying that the distribution of sample means (sampling distribution) is normal.

Conditions:

  1. Random sampling from the population
  2. Sample size is sufficiently large (\(n \geq 30\) as a common rule of thumb)
  3. Observations are independent

Implications:

  • Even if the population is skewed, the distribution of sample means will be approximately normal
  • This justifies using normal-based methods (z-tests, confidence intervals) for inference
Figure 7: CLT: Population is right-skewed, but sampling distribution of means becomes normal

Important

The CLT explains why statistics works: Even when we don’t know the shape of the population, we can make valid inferences about the mean as long as our sample is large enough.
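The CLT can be seen directly in simulation. The sketch below uses a deliberately right-skewed (exponential) hypothetical population and shows that the spread of the sample means still matches \(\sigma / \sqrt{n}\):

```python
import math
import random
import statistics

random.seed(42)
# Right-skewed population: exponential with mean 5 (its SD is also 5)
population = [random.expovariate(1 / 5) for _ in range(50_000)]

# Build the sampling distribution: means of 2,000 samples of n = 30
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(2_000)]

pop_mean = statistics.mean(population)
se_theory = 5 / math.sqrt(30)                 # sigma / sqrt(n)
se_empirical = statistics.stdev(sample_means)
print(f"Population mean ≈ {pop_mean:.2f} (skewed population)")
print(f"SD of sample means ≈ {se_empirical:.2f} vs. sigma/sqrt(n) ≈ {se_theory:.2f}")
```

Plotting `sample_means` as a histogram would show the familiar bell shape, even though the population itself is strongly skewed.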

Check Question

Check your understanding: Does the CLT require the population to be normally distributed?
Click to reveal answer

Answer: No! That’s the beauty of the CLT — the sampling distribution of the mean approaches normality regardless of the population’s shape, as long as the sample size is sufficiently large.

Confidence Intervals

A confidence interval (CI) is a range of plausible values for a population parameter, constructed from sample data[1,2].

Formula for 95% CI:

\[ CI_{95\%} = \bar{x} \pm 1.96 \times SE \]

Equation 2: 95% confidence interval formula

Components:

  • \(\bar{x}\): Sample mean (our best estimate)
  • \(1.96\): Z-value for 95% confidence
  • \(SE\): Standard error (\(\sigma / \sqrt{n}\) or \(s / \sqrt{n}\))

Example: \(\bar{x} = 53\) cm, \(SE = 1.1\) cm

\[CI_{95\%} = 53 \pm 1.96(1.1) = [50.84, 55.16]\]
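The same interval can be computed in a few lines of Python, using the \(\bar{x} = 53\) and \(SE = 1.1\) from the example:

```python
x_bar, se = 53.0, 1.1      # sample mean and standard error from the example
z = 1.96                   # z-value for 95% confidence
lower = x_bar - z * se
upper = x_bar + z * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")   # [50.84, 55.16]
```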

Figure 8: 20 confidence intervals: Most capture μ, but some miss it

Important

Interpretation: “We are 95% confident that the true population mean falls between 50.84 and 55.16 cm.” This means that if we repeated this process many times, about 95% of our intervals would capture the true μ.

Check Question

Check your understanding: If you calculated 100 different 95% confidence intervals from 100 different samples, how many would you expect to contain the true population mean?
Click to reveal answer

Answer: About 95 out of 100 intervals would be expected to contain the true population mean μ. Approximately 5 intervals would miss it — this is the inherent 5% error rate of 95% confidence intervals.
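This coverage property can be checked by simulation. The sketch below draws 100 samples from a hypothetical normal population and counts how many 95% intervals capture \(\mu\) (because SE is estimated from \(s\) with a z-value rather than a t-value, coverage at \(n = 30\) runs slightly below the nominal 95%):

```python
import random
import statistics

random.seed(7)
mu, sigma, n = 50, 8, 30   # hypothetical population parameters and sample size
captured = 0
for _ in range(100):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5   # SE estimated from sample SD
    if x_bar - 1.96 * se <= mu <= x_bar + 1.96 * se:
        captured += 1
print(f"{captured} of 100 intervals captured mu")  # typically near 95
```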

Factors Affecting Confidence Interval Width

The width of a confidence interval is influenced by three factors[1]:

1. Sample size (\(n\)):

  • Larger \(n\) → narrower intervals (more precision)
  • \(SE = \sigma / \sqrt{n}\), so larger \(n\) reduces SE

2. Variability (\(\sigma\) or \(s\)):

  • More variable data → wider intervals
  • Less control over this factor

3. Confidence level:

  • Higher confidence → wider intervals
  • 99% CI is wider than 95% CI (z = 2.576 vs. 1.96)

| Confidence Level | Z-value | Interval Width |
|------------------|---------|----------------|
| 90% | 1.645 | Narrowest |
| 95% | 1.960 | Moderate |
| 99% | 2.576 | Widest |

Trade-off

Higher confidence requires wider intervals (less precision). The 95% level is a convention that balances confidence and precision.
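The trade-off is easy to see numerically. Reusing the \(SE = 1.1\) cm from the earlier jump-height example, each step up in confidence widens the interval:

```python
# Interval width = 2 * z * SE; SE = 1.1 cm from the earlier example
se = 1.1
widths = []
for level, z in ((90, 1.645), (95, 1.960), (99, 2.576)):
    width = 2 * z * se
    widths.append(width)
    print(f"{level}% CI: z = {z}, width = {width:.2f} cm")
```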

Point Estimates vs. Interval Estimates

Point estimate: A single value used to estimate a population parameter

  • \(\bar{x} = 53\) cm
  • Simple but provides no information about precision
  • Could be very close to μ or far from it

Interval estimate: A range of plausible values

  • 95% CI: [50.84, 55.16] cm
  • Communicates both the estimate and its uncertainty
  • Preferred in modern statistical reporting

APA Recommendation

The American Psychological Association (APA) recommends reporting confidence intervals alongside or instead of p-values[3].

Why? Confidence intervals provide more information:

  • The point estimate (center of the CI)
  • The precision of the estimate (width of the CI)
  • The direction and magnitude of the effect

Summary: Key Takeaways

  1. Probability quantifies uncertainty and is the foundation of statistical inference
  2. Sampling error is natural variability, not a mistake — it’s expected when using samples
  3. The sampling distribution shows how sample statistics vary across all possible samples
  4. Standard error measures the precision of sample estimates (\(SE = \sigma / \sqrt{n}\))
  5. Central Limit Theorem: Sampling distribution of the mean is approximately normal for large \(n\), regardless of population shape
  6. Confidence intervals provide a range of plausible values for population parameters
  7. Larger samples lead to smaller standard errors and narrower confidence intervals
  8. 95% confidence means that 95% of intervals from repeated sampling would capture the true parameter

Important

These concepts form the foundation for hypothesis testing (Chapter 10), where we’ll use probability to make formal decisions about whether observed differences are real or due to chance.

Practice Questions

  1. What is the difference between a population parameter and a sample statistic?
  2. If \(\sigma = 15\) and \(n = 25\), what is the standard error?
  3. Explain the Central Limit Theorem in your own words. Why is it so important?
  4. A 95% CI is [42, 58] cm. What does this mean?
  5. How does increasing sample size affect the confidence interval width?
  6. If you took 200 random 95% CIs, how many would you expect to miss the true μ?
  7. What is the relationship between probability and statistical inference?
  8. Why is the sample mean considered a good estimator of the population mean?

References

1. Moore, D. S., McCabe, G. P., & Craig, B. A. (2021). Introduction to the practice of statistics (10th ed.). W. H. Freeman and Company.
2. Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.
3. Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
4. Furtado, O., Jr. (2026). Statistics for movement science: A hands-on guide with SPSS (1st ed.). https://drfurtado.github.io/sms/