Chapter 1: Measurement, Statistics, and Research

Ovande Furtado, Jr., Ph.D.

Module 1: The Foundation

This module establishes the bedrock of empirical research by turning observations into reliable data.

  • We will explore:
    • The precise process of measurement
    • Systems used to ensure universal understanding
    • Classification scales that determine what we are allowed to do with our data

Important

Before building the study, we need to understand the “materials” (data) and how they are created.

What is measurement?

  • Measurement is the process of comparing a value to a standard, transforming abstract observations into concrete, objective data.
  • Statements like “I believe,” “I feel,” or “I observe” become “The measured value is X units.”
  • Kinesiology example:
    • Grip strength is measured with a dynamometer
    • The force output is compared to a standard unit (pounds or kilograms)
    • The result is a quantified value (e.g., 52.3 kg)

Important

Reproducibility: Record the device model, calibration date, and protocol in the report so others can repeat the measurement.

Note

Quantification is the first step from anecdotal observation to scientific inquiry.
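As a sketch of what reproducible measurement can look like in practice, the record below bundles the measured value with the device, calibration date, and protocol from the note above. The class and field names are illustrative, not from the chapter:

```python
from dataclasses import dataclass

# Hypothetical record structure; names are illustrative only.
@dataclass
class MeasurementRecord:
    object_measured: str   # what is measured (e.g., "grip strength")
    value: float           # the quantified result
    unit: str              # the standard of comparison (e.g., "kg")
    device: str            # device model, for reproducibility
    calibration_date: str  # last calibration date, for reproducibility
    protocol: str          # brief protocol description

grip = MeasurementRecord(
    object_measured="grip strength",
    value=52.3,
    unit="kg",
    device="hand dynamometer (model noted in report)",
    calibration_date="2024-01-15",
    protocol="seated, elbow at 90 degrees, best of three trials",
)
print(f"The measured value is {grip.value} {grip.unit}")
```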

Measurement vs statistics vs evaluation

These are sequential, not interchangeable:

  • Measurement: produces data via comparison to a standard
  • Statistics: organizes and interprets measured data
  • Evaluation: assigns meaning or worth to the statistical results
  • Example:
    • Measure a client’s VO2 max
    • Use statistics to summarize or compare (45 mL/kg/min)
    • Evaluate whether the value is “good” for age and sex, then design training

[Diagram: Measurement (measure grip strength, 52.3 kg) → Statistics (summarize/compare) → Evaluation (is it adequate for age/sex? design hand therapy)]

Note

The ethical and scientific risk is confusing objective data collection with subjective interpretation.
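A minimal sketch of the measure → summarize → evaluate sequence, assuming hypothetical VO2 max values and an invented normative cutoff:

```python
from statistics import mean

# Measurement: hypothetical VO2 max values (mL/kg/min) for a group of clients.
vo2max_samples = [45.0, 48.2, 43.5, 50.1, 46.7]

# Statistics: summarize the measured data.
avg = mean(vo2max_samples)

# Evaluation: assign meaning against a norm (cutoff is invented for illustration).
GOOD_CUTOFF = 44.0
verdict = "good" if avg >= GOOD_CUTOFF else "needs improvement"
print(f"Mean VO2 max = {avg:.1f} mL/kg/min -> {verdict}")
```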

Process of measurement (4 steps)

  • Assuming that you have a clear research question, the measurement process involves four key steps:


  1. Define object: Identify and define what is measured (example: athlete’s vertical jump height)
  2. Define standard: Choose and define the standard (example: centimeters)
  3. Compare: Collect data by comparing object to standard (example: jump mat or Vertec device)
  4. State relationship: Quantify the result (example shown: 65 cm)

Note

If you cannot clearly state the object and standard, your measurement is not reproducible.

Test your knowledge: Measurement process

  • You want to measure the peak power output of cyclists during a 5-second sprint on a cycle ergometer.
  • Outline the four steps of the measurement process for this scenario.

Using the ClassShare App, submit your answers.

Answer
  1. Define object: Peak power output during a 5-second sprint
  2. Define standard: Watts (W)
  3. Compare: Use a calibrated cycle ergometer to measure power output during the sprint
  4. State relationship: Quantify the result (example shown: 1,150 Watts)

Classification of data

  • Once a dataset is collected, it must be classified according to its properties.
  • This classification determines which statistical analyses are appropriate.
  • Nominal is the most basic level; ratio is the most advanced.
  • Each scale builds on the previous one by adding more properties - see next slide.
  • Understanding scale is essential for valid data analysis.
  • Four primary scales of measurement:
[Figure: Nominal → Ordinal → Interval → Ratio]
Figure 1: Hierarchy of measurement scales in kinesiology research

Note

This is a major decision point. Misclassifying scale can lead to invalid analysis choices.

Test your knowledge: scale classification

  • For each of the following examples, identify the measurement scale (nominal, ordinal, interval, or ratio):
    1. Sport type (e.g., soccer, basketball, swimming)
    2. Finish place in a race (1st, 2nd, 3rd)
    3. Body temperature in Celsius
    4. Body mass in kilograms

Using the ClassShare App, submit your answers.

Answer
  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

Measurement scales at a glance

Scale    | Core idea                     | What you can do        | Example (kinesiology)
Nominal  | Categories only               | counts, proportions    | sport type, group labels
Ordinal  | Rank order                    | medians, ranks         | finish place, Likert-type ratings
Interval | Equal intervals, no true zero | add/subtract, means    | Celsius temperature
Ratio    | Equal intervals and true zero | all arithmetic, ratios | mass, time, distance, power

Note

If 0 means none of the quantity, it is ratio. If 0 is arbitrary, it is interval.

Scale pitfalls and examples

  • Nominal: labels only; numbers are arbitrary (example: “1” gymnasts, “2” weightlifters)
  • Ordinal: ordered, but gaps are unknown (example: 1st vs 2nd place could differ by milliseconds or seconds)
  • Interval: equal steps, but 0 is not absence (example: 0°C does not mean “no heat”)
  • Ratio: includes a true zero; ratio statements make sense (example: 100 kg is twice 50 kg)

Note

Using parametric tests that assume interval or ratio data on ordinal-only data is a methodological flaw that can invalidate conclusions.
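The operations each scale allows can be illustrated in a short sketch using Python's statistics module (the datasets are hypothetical):

```python
from statistics import mode, median, mean

# Illustrative datasets, one per measurement scale (values are hypothetical).
sport = ["soccer", "soccer", "swimming"]  # nominal: labels only
finish_place = [1, 2, 3, 3, 5]            # ordinal: rank order, unknown gaps
temp_c = [36.5, 37.0, 38.2]               # interval: equal steps, no true zero
mass_kg = [50.0, 100.0, 75.0]             # ratio: true zero

print(mode(sport))               # nominal supports counts and the mode
print(median(finish_place))      # ordinal adds medians and ranks
print(mean(temp_c))              # interval adds means (differences are meaningful)
print(mass_kg[1] / mass_kg[0])   # only ratio supports ratio statements: 100 kg is twice 50 kg
```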

Module 2: The Blueprint

Research Designs and Variable Roles

  • A research design is the strategic blueprint that dictates how a scientific question will be answered.
  • The choice of design is not arbitrary. It must match the problem you intend to solve.
  • Primary design styles:
    • Historical
    • Observational
    • Experimental

Note

Move from “materials” (measurement) to the architectural plan (design).

Research design and statistical analysis

  • Historical
    • Investigates the past to describe and understand events
    • Example: analyzing training logs from the 1960s to understand marathon preparation trends
    • Provides context, but cannot establish cause and effect
  • Observational (descriptive)
    • Describes existing phenomena without manipulation
    • Example: survey weekly physical activity levels of office workers
    • Useful for identifying correlations (example: sitting time and low back pain), but does not prove causation
  • Experimental
    • Manipulates a variable to test cause and effect
    • Example: sports drink vs placebo, then measure anaerobic power output (Wingate test)
    • Provides the strongest evidence for causal claims
    • Requires careful control of confounding variables
  • Statistical analyses depend on design:
    • Historical: qualitative summaries, thematic analysis
    • Observational: correlation, regression
    • Experimental: t-tests, ANOVA, regression

Note

Selecting the right design is the most critical strategic decision because it determines the strength of the evidence you can claim.

Independent and dependent variables

In experimental research, variable roles clarify the cause-effect relationship:

  • Independent variable (cause): manipulated or controlled by the researcher
    Example: beverage type (sports drink vs placebo)
  • Dependent variable (effect): measured outcome expected to change
    Example: anaerobic power output during a Wingate test

Read the abstract of this study and identify the independent and dependent variables. https://pubmed.ncbi.nlm.nih.gov/17507739/

Answer This is an experimental study that manipulates the independent variable (carbohydrate vs placebo consumption) and measures the dependent variable (anaerobic power output on the Wingate Anaerobic Test).

Note

We will use DV for dependent variable and IV for independent variable throughout the course.

Predictor and Criterion variables

In observational studies (no manipulation):

  • Predictor variable: used to predict an outcome
  • Criterion variable: the outcome being predicted

Read this study and identify the predictor and criterion variables. https://pubmed.ncbi.nlm.nih.gov/38782723/

Answer
  • Predictor: chronic short sleep duration (hours of sleep per night)
  • Criterion: reaction time

Note

Best practice for scatter plots: Predictor variable on x-axis, criterion variable on y-axis. This follows the causal flow (predictor → criterion) and makes regression relationships intuitive to interpret.

Module 3: The Integrity Check

  • Evaluating Research Validity
  • A blueprint can look elegant, but it is worthless if the resulting structure is unsound.
  • Validity refers to the truthfulness and appropriateness of the research process.
  • We focus on:
    • Internal validity
    • External validity
    • Common threats that compromise validity

Note

Validity is not just a technical detail; it is trustworthiness.

Internal and external validity

Internal validity

  • Degree of control within the experiment
  • Confidence that changes are due to the independent variable, not confounds
  • Example: a 12-week strength training study with a control group (pre and post tests, no training) helps rule out time and test learning effects

External validity

  • Ability to generalize findings beyond the specific sample and setting
  • Example: treadmill oxygen consumption measured in a lab may not generalize perfectly to outdoor competitive running with many uncontrolled factors
  • There is often a trade-off between internal and external validity

Note

Design studies to be both methodologically sound and practically relevant so findings are both true and useful.

Real-world examples of validity in research

Internal validity example

Study: “Internal Validity in Resistance Training Research” (Makaruk et al., 2022)
https://pubmed.ncbi.nlm.nih.gov/36281664/

This review of 340 randomized controlled trials (RCTs) in resistance training identifies threats to internal validity like maturation, history, testing effects, instrumentation, selection bias, and attrition. It provides recommendations for control groups, randomization, standardized protocols, and blinding to strengthen causal inferences in exercise science research.

External validity example

Study: “Decision-Making Skills in Youth Basketball Players: Diagnostic and External Validation of a Video-Based Assessment” (Rösch et al., 2021)
https://pubmed.ncbi.nlm.nih.gov/33673427/

This study validated a video-based decision-making assessment for youth basketball players by correlating results with real-game performance data (assists and turnovers). Significant associations showed the tool predicts on-court behavior, ensuring generalizability from lab to competitive sports for talent identification.

Threats to internal validity

Three common categories:

  1. Intervening variables
    1. extraneous variables: these influence the DV but are not related to the IV (e.g., weather)
    2. confounding variables: these influence the DV and are related to the IV (e.g., prior training status)
  2. Instrument error
    1. example: scale not zeroed, force plate miscalibrated
  3. Investigator error
    1. example: inconsistent instructions, subjective scoring without blinding


Note

Students should act like “validity defenders” who anticipate threats before data collection begins.

Threat 1: Intervening variables

  • Intervening variables are factors outside the planned design that influence the dependent variable.

Example:

  • A diet plan study on body composition that does not control participants’ resistance training habits

Mitigation strategies:

  • Control groups
  • Inclusion and exclusion criteria
  • Clear protocols and monitoring of key behaviors

Knowledge check: threat 1

  • You are conducting a study to evaluate the effect of a new hydration strategy on cycling performance.
  • Identify one potential confounding variable and describe how you would control for it.

Using the ClassShare App, submit your answers.

Answer
  • Potential confounding variable: Ambient temperature during cycling tests
  • Control strategy: Conduct all performance tests in a climate-controlled lab environment to ensure consistent temperature

Threat 2: Instrument error

  • Instrument error occurs when measurement tools are faulty or uncalibrated.

Examples:

  • A scale or force plate not properly zeroed
  • Faulty equipment settings that create systematic inaccuracies

Mitigation strategies:

  • Calibration routines
  • Quality checks and logs
  • Standard operating procedures for setup and measurement

Knowledge check: threat 2

  • You are using a force plate to measure ground reaction forces during a jump test.
  • Describe one step you would take to minimize instrument error.

Using the ClassShare App, submit your answers.

Answer
  • Step to minimize instrument error: Perform a calibration check before each testing session by using known weights to ensure the force plate readings are accurate and consistent.

Threat 3: Investigator error

  • Investigator actions can unintentionally introduce bias into data.

Examples:

  • Inconsistent verbal encouragement during performance tests
  • Subjective scoring of movement quality without blinding

Mitigation strategies:

  • Blinding when feasible
  • Scripts and standardized instructions
  • Training and reliability checks for scoring

Knowledge check: threat 3

  • You are conducting a study where participants perform a series of balance tests, and you are scoring their performance based on movement quality.
  • Describe one strategy you would implement to reduce investigator error.

Using the ClassShare App, submit your answers.

Answer
  • Strategy to reduce investigator error: Use video recordings of the balance tests and have multiple blinded raters independently score the performances to ensure consistency and reduce subjective bias.

Module 4: Scaling the Model

  • Statistical Inference and Sampling
  • After a study is built and its integrity is checked, the next challenge is scaling conclusions.
  • Statistical inference is the bridge:
    • Making educated generalizations about a population based on a sample

Sampling focus:

  • Random sampling
  • Stratified sampling
  • Parameter vs statistic

Parameters and statistics

Definitions:

  • Population: the full group of interest
  • Sample: the measured subset
  • Parameter: numerical characteristic of the population
    • Example: average VO2 max of all college athletes
  • Statistic: numerical characteristic of the sample
    • Example: average VO2 max of surveyed college athletes

Sampling bias example:

  • Surveying only university gym students likely overestimates physical activity compared to the broader student body
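A small simulation of parameter vs statistic, and of the gym-only sampling bias, under an assumed population (all numbers are hypothetical):

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical population: VO2 max values for all college athletes (mL/kg/min).
population = [random.gauss(47, 5) for _ in range(10_000)]
parameter = mean(population)             # parameter: describes the whole population

sample = random.sample(population, 50)   # unbiased random sample
statistic = mean(sample)                 # statistic: describes only the sample
print(f"parameter = {parameter:.1f}, statistic = {statistic:.1f}")

# Biased sampling: surveying only the fittest (e.g., gym regulars) overestimates.
gym_only = sorted(population)[-500:]
print(f"biased statistic = {mean(gym_only):.1f}")
```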

Random and stratified sampling

Random sampling

  • Every population member has an equal and independent chance of selection
  • Helps avoid systematic bias
  • With sufficient size, tends to reflect population characteristics in natural proportions

Stratified sampling

  • Used when population has distinct subgroups
  • Steps:
    1. Divide population into strata (example: endurance, power, team sports)
    2. Randomly sample within each stratum proportional to its size
  • Helps ensure no subgroup is over- or under-represented
  • Can improve representativeness and precision
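The two stratified-sampling steps above can be sketched as follows (the strata names and sizes are hypothetical):

```python
import random

random.seed(1)

# Hypothetical athlete IDs grouped into strata by sport type.
strata = {
    "endurance": [f"E{i}" for i in range(60)],  # 60% of the population
    "power":     [f"P{i}" for i in range(30)],  # 30%
    "team":      [f"T{i}" for i in range(10)],  # 10%
}
total = sum(len(members) for members in strata.values())
sample_size = 20

# Steps 1-2: randomly sample within each stratum, proportional to its size.
sample = []
for name, members in strata.items():
    k = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, k))

print(len(sample))  # strata proportions mirror the population: 12 / 6 / 2
```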

Module 5: The Stress Test

Hypotheses and Theoretical Frameworks

Now the finished structure undergoes a final test.

This module covers:

  • How broad theories generate testable hypotheses
  • How hypotheses are tested against statistical reality
  • The logic of null hypothesis significance testing using probability

Theories and hypotheses

  • Theory: broad conceptual framework explaining a phenomenon, informed by prior research and observation
    Example: mental visualization in motor learning suggests imagined practice can improve performance
  • Hypothesis: specific, testable prediction derived from theory
    Example hypothesis shown:
    “Basketball players who engage in 20 minutes of daily mental visualization of free throws will show greater improvement in free-throw accuracy than players who do not.”

Key idea:

  • Theories are usually too broad to test directly
  • Hypotheses are the testable units that accumulate evidence for or against a theory

Note

Synthesis note: translating a broad idea into a precise, testable question is a core research skill and a core source of scientific creativity.

Hypothesis testing (H0 and H1)

  • Research hypothesis (H1): an effect exists
  • Null hypothesis (H0): no difference or no relationship

Example null statement:

  • “There is no difference in free-throw accuracy improvement between the visualization group and the control group.”

Key logic:

  • We do not prove H1 directly
  • We evaluate how consistent the data are with H0

Hypothesis testing workflow and the p-value

  1. State hypotheses: H0 and H1
  2. Conduct the experiment and analyze the data
  3. Calculate the p-value: the probability of the observed results if H0 is true
  4. Decide:
    • p < 0.05: reject H0 (statistically significant; evidence supports H1)
    • p >= 0.05: fail to reject H0 (not statistically significant; insufficient evidence for H1)

Interpretation aligned with the chapter:

  • The p-value is the probability of observing the results (or more extreme results) if H0 were true
  • If p < 0.05, the results are unlikely to be due to random chance or sampling error alone, so we reject H0 and conclude the effect is statistically significant
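One way to make the p-value concrete is a permutation test: under H0 the group labels are interchangeable, so we can shuffle the labels many times and ask how often a difference as large as the observed one appears. The Wingate peak power values below are invented for illustration:

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical Wingate peak power (W): sports drink vs placebo groups.
drink = [1150, 1190, 1230, 1175, 1210, 1160]
placebo = [1100, 1130, 1090, 1150, 1120, 1105]

observed = mean(drink) - mean(placebo)

# Permutation test: shuffle the pooled values, re-split into two groups of the
# same sizes, and count how often the difference is at least as large.
pooled = drink + placebo
n, reps = len(drink), 10_000
count = 0
for _ in range(reps):
    random.shuffle(pooled)
    if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
        count += 1

p_value = count / reps  # probability of results this extreme if H0 were true
print(f"observed difference = {observed:.1f} W, p = {p_value:.4f}")
```

With clearly separated groups like these, the p-value comes out well below 0.05, so we would reject H0.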