Descriptive Statistics

A rigorous, evidence-based introduction to descriptive statistics for beginner researchers in kinesiology and the humanities. Covers measures of central tendency and variability, the coefficient of variation, R-based visualizations, and best practices for reporting.

Keywords: mean, median, mode, variance, standard deviation, interquartile range, range, coefficient of variation

Author: Ovande Furtado
Affiliation: Cal State Northridge
Published: February 18, 2023

1 Learning Objectives

By the end of this post, you should be able to:

  1. Define descriptive statistics and explain why they are the essential first step in any data analysis pipeline.
  2. Distinguish between the mean, median, and mode, and select the appropriate measure for a given data distribution.
  3. Interpret measures of variability—range, IQR, variance, standard deviation, and coefficient of variation—and explain what each reveals about the spread of data.
  4. Recognize how outliers and skewness affect each measure.
  5. Produce and interpret descriptive graphs in R, including histograms and boxplots.
  6. Apply descriptive statistics concepts to real examples from movement science and the humanities.

2 Why Descriptive Statistics Matter

Before asking whether a new training protocol improves sprint speed, or whether a new teaching method raises test scores, researchers must first describe their data clearly. Descriptive statistics are mathematical summaries that characterize the location, spread, and shape of a distribution (Field, 2018; Weir & Vincent, 2021). Without them, patterns remain invisible and errors go unnoticed.

Consider a practical example: a sport scientist studying 30-meter sprint times in youth soccer players could report pages of raw numbers, or could report that the group averaged 4.3 s (SD = 0.4 s), with one outlying time of 6.1 s from a player later identified as recovering from a hamstring strain. The descriptive summary communicates far more efficiently, and it flags a data quality issue that raw tables would bury.

Note

Descriptive vs. Inferential Statistics
Descriptive statistics summarize the sample in hand. Inferential statistics use the sample to make probability-based claims about a broader population (Weir & Vincent, 2021). This post focuses exclusively on the descriptive layer—the foundation that must be solid before any inference is attempted.

3 Mathematical Symbols

The table below collects all notation used in this post for quick reference.

| Measure | Symbol |
|---|---|
| Mean (population) | \(\mu\) |
| Mean (sample) | \(\bar{x}\) |
| Median | \(Mdn\) |
| Mode | \(Mo\) |
| Range | \(R\) |
| Interquartile Range | \(IQR\) |
| Variance (population) | \(\sigma^2\) |
| Variance (sample) | \(s^2\) |
| Standard Deviation (population) | \(\sigma\) |
| Standard Deviation (sample) | \(s\) |
| Coefficient of Variation | \(CV\) |

4 Measures of Central Tendency

Measures of central tendency answer one fundamental question: What is the typical value in this dataset? (Gravetter et al., 2021; Weir & Vincent, 2021). Three statistics address that question in different ways—the mean, median, and mode—each making different assumptions about the data and each sensitive to different distributional features.

4.1 Mean

The arithmetic mean (\(\bar{x}\)) is the sum of all observations divided by the number of observations (Weir & Vincent, 2021). It is the most widely used and statistically efficient estimator of the population center when data are approximately normally distributed.

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Movement science example. Suppose we measure the vertical jump height (cm) of eight collegiate volleyball players:

\[ 52, \; 55, \; 48, \; 60, \; 53, \; 57, \; 50, \; 61 \]

\[ \bar{x} = \frac{52 + 55 + 48 + 60 + 53 + 57 + 50 + 61}{8} = \frac{436}{8} = 54.5 \text{ cm} \]

This single number tells coaches that, on average, the team clears 54.5 cm—immediately useful for program benchmarking (Hopkins, 2000).

Humanities example. In a study of reading fluency among undergraduate students, researchers might report the mean number of words read correctly per minute (WCPM). A mean of 148 WCPM with a small standard deviation suggests the group is homogeneous; a large SD points to heterogeneity that may require differentiated instruction.

Warning

Sensitivity to outliers. The mean incorporates every observation equally, so a single extreme score can distort it substantially. If the analyst in the vertical-jump example had accidentally entered 601 cm instead of 61 cm, the sum would rise from 436 to 976 and the mean would jump to 122.0 cm, an absurd value that could easily go unnoticed unless the analyst also examined graphical displays (Field, 2018).

4.1.1 Worked Example in R

jump_heights <- c(52, 55, 48, 60, 53, 57, 50, 61)

cat("Mean jump height:", round(mean(jump_heights), 2), "cm\n")
Mean jump height: 54.5 cm
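The effect of the data-entry error described in the warning above can be checked directly. The sketch below is only an illustration: `jump_typo` is a hypothetical name for a copy of the data in which the last value has been mistyped as 601.

```r
# Jump heights from the example above
jump_heights <- c(52, 55, 48, 60, 53, 57, 50, 61)

# Hypothetical data-entry error: the 8th value typed as 601 instead of 61
jump_typo <- replace(jump_heights, 8, 601)

mean(jump_heights)    # 54.5
mean(jump_typo)       # 122
median(jump_heights)  # 54
median(jump_typo)     # 54
range(jump_typo)      # 48 601
```

The mean more than doubles while the median does not move, and a glance at `range()` exposes the impossible maximum.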

4.2 Median

The median (\(Mdn\)) is the middle value when observations are ordered from smallest to largest. For an even number of observations it is the average of the two central values (Weir & Vincent, 2021). The median is resistant to outliers because it is determined by rank position, not magnitude.

When to prefer the median. Research on athletes’ salaries routinely uses the median rather than the mean, because a handful of top-earners would inflate the mean to a level unrepresentative of the typical player. The same logic applies to any skewed physiological or performance variable—for example, injury recovery time, where most athletes recover quickly but a few require months (Field, 2018).

Movement science example. Ten cross-country runners complete a 5 km race (minutes):

\[ 18.2, \; 19.0, \; 19.5, \; 19.8, \; 20.1, \; 20.4, \; 21.0, \; 21.6, \; 22.3, \; 31.4 \]

The last value (31.4 min) belongs to a runner who fell and finished despite an ankle sprain. Ordered, the two central values are 20.1 and 20.4, giving:

\[ Mdn = \frac{20.1 + 20.4}{2} = 20.25 \text{ min} \]

The mean would be 21.3 min—inflated by the outlier. The median better represents the “typical” runner.

4.2.1 Worked Example in R

race_times <- c(18.2, 19.0, 19.5, 19.8, 20.1, 20.4, 21.0, 21.6, 22.3, 31.4)

writeLines(c(
  paste0("Mean race time : ",  round(mean(race_times), 2), " min"),
  paste0("Median race time: ", round(median(race_times), 2), " min"),
  paste0("Difference      : ", round(mean(race_times) - median(race_times), 2),
         " min — illustrates outlier pull on the mean")
))
Mean race time : 21.33 min
Median race time: 20.25 min
Difference      : 1.08 min — illustrates outlier pull on the mean

4.3 Mode

The mode (\(Mo\)) is the most frequently occurring value in a dataset (Weir & Vincent, 2021). It is the only measure of central tendency applicable to nominal (categorical) data and is particularly informative when the researcher wants to identify the most common category or score.

Movement science example. A physical education researcher surveys 120 middle-school students about their favorite physical activity. The mode, say basketball (reported by 38 students), directly informs curriculum decisions. Nominal labels have neither numeric values nor a natural order, so neither the mean nor the median can be computed.

Humanities example. A corpus linguist analyzing the most common words in a set of 18th-century philosophical texts finds that reason occurs more frequently than any other content word (the modal word). This frequency analysis is a form of finding the mode across word-type categories.
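Base R has no built-in function for the statistical mode (`mode()` reports an object's storage type instead), so the mode is usually obtained from a frequency table. The following is a minimal sketch: `stat_mode` is a hypothetical helper name, and the survey counts other than the 38 basketball responses and the total of 120 are invented for illustration.

```r
# Hypothetical survey responses: only n = 120 and basketball = 38 come from the text
activity <- rep(c("basketball", "soccer", "dance", "swimming"),
                times = c(38, 31, 27, 24))

# Hypothetical helper: return every value tied for the highest frequency
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

table(activity)               # frequency of each category
stat_mode(activity)           # "basketball"
stat_mode(c(3, 5, 5, 7, 7))   # "5" "7" (two modes, returned as character labels)
```

Returning all tied values, rather than only the first, keeps the function honest when a dataset has more than one mode.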

4.3.1 Multimodal distributions

Data can have more than one mode. Reaction-time data from a mixed-age sample (children vs. adults) sometimes show a bimodal distribution—one peak around 250 ms (adults) and another around 400 ms (children)—revealing that the population is not homogeneous. Reporting only the mean would obscure this structure entirely (Field, 2018).

4.4 Visualizing and Comparing the Three Measures

The figure below uses simulated sprint-time data to show how all three measures co-locate in a roughly normal distribution and how they diverge when one extreme score is introduced.

#| fig-cap: "Sprint times (seconds) for 50 simulated youth athletes. Left panel shows a symmetric distribution; right panel introduces a single outlier (20 s). Dashed vertical lines mark the mean (blue), median (green), and mode (red)."

set.seed(42)
n   <- 50
sym <- rnorm(n, mean = 5.8, sd = 0.35)   # symmetric

# Calculate mode for continuous data using density peak
get_mode_continuous <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

skw <- c(sym, 20)  # introduce one outlier

par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

# ---- Panel A: Symmetric ----
hist(sym, breaks = 12, col = "steelblue", border = "white",
     main = "A: Symmetric", xlab = "Sprint time (s)", ylab = "Frequency", xlim = c(4.5, 7))
abline(v = mean(sym),                    col = "blue",  lwd = 2, lty = 2)
abline(v = median(sym),                  col = "darkgreen", lwd = 2, lty = 2)
abline(v = get_mode_continuous(sym),     col = "red",   lwd = 2, lty = 2)
legend("topright", legend = c("Mean", "Median", "Mode"),
       col = c("blue","darkgreen","red"), lty = 2, lwd = 2, cex = 0.8, bty = "n")

# ---- Panel B: With outlier ----
hist(skw, breaks = 15, col = "salmon", border = "white",
     main = "B: One outlier added (20 s)", xlab = "Sprint time (s)", ylab = "Frequency")
abline(v = mean(skw),                    col = "blue",  lwd = 2, lty = 2)
abline(v = median(skw),                  col = "darkgreen", lwd = 2, lty = 2)
abline(v = get_mode_continuous(skw),     col = "red",   lwd = 2, lty = 2)  # mode of the data shown in this panel
legend("topright", legend = c("Mean", "Median", "Mode"),
       col = c("blue","darkgreen","red"), lty = 2, lwd = 2, cex = 0.8, bty = "n")

par(mfrow = c(1,1))
Figure 1

Interpretation: In the symmetric distribution (Panel A), all three measures nearly coincide, as expected for a symmetric, unimodal distribution. In Panel B, the outlier pulls the mean rightward while the median and mode remain essentially unchanged, demonstrating the mean’s vulnerability to extreme values (Gravetter et al., 2021).

4.5 Summary: Choosing a Measure of Central Tendency

| Measure | Best used when… | Sensitive to outliers? | Works with nominal data? |
|---|---|---|---|
| Mean | Data are continuous, roughly symmetric | Yes | No |
| Median | Data are skewed or contain outliers | No | No |
| Mode | Data are categorical, or you need the most frequent value | No | Yes |

5 Measures of Variability

Reporting only a measure of central tendency is insufficient and can be misleading (Thomas et al., 2015; Weir & Vincent, 2021). Two groups can share an identical mean yet differ dramatically in their spread. For instance, two groups of runners both averaging 10.5 s in the 100-m dash might have standard deviations of 0.2 s (a homogeneous, elite group) versus 1.8 s (a heterogeneous recreational group)—a distinction with profound practical implications for training design.

Measures of variability quantify how spread out observations are around the center of a distribution. Four primary measures are used in kinesiology and behavioral research: range, interquartile range, variance, and standard deviation (Weir & Vincent, 2021). A fifth—the coefficient of variation—extends the concept to relative comparisons across different scales.

5.1 Range

The range is the distance between the maximum and minimum values:

\[ R = X_{max} - X_{min} \]

It requires only two data points to compute and is easily understood. However, it is maximally sensitive to outliers: a single extreme observation can inflate the range irrespective of what the remaining data look like (Weir & Vincent, 2021).

Movement science example. Grip strength (kg) for 12 collegiate wrestlers:

\[ 45, \; 48, \; 50, \; 52, \; 53, \; 54, \; 55, \; 56, \; 58, \; 60, \; 62, \; 89 \]

\[ R = 89 - 45 = 44 \text{ kg} \]

The value 89 kg belongs to the team’s nationally ranked heavyweight; without it, \(R = 62 - 45 = 17\) kg. Reporting only the range would overstate within-team variability for the 11 remaining athletes.

Practical guideline. Report the range alongside the mean or median as a quick data-quality check, but do not rely on it as the sole variability descriptor (Field, 2018).
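The grip-strength calculation above can be reproduced in R. This short sketch uses `diff(range(x))` because `range()` alone returns the minimum and maximum rather than their difference.

```r
# Grip strength (kg) for the 12 wrestlers in the example above
grip <- c(45, 48, 50, 52, 53, 54, 55, 56, 58, 60, 62, 89)

range(grip)                    # 45 89 (minimum and maximum)
diff(range(grip))              # 44 kg, heavyweight included
diff(range(grip[grip < 89]))   # 17 kg, heavyweight excluded
```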

5.2 Interquartile Range

The interquartile range (IQR) is the spread of the middle 50% of the distribution:

\[ IQR = Q_3 - Q_1 \]

where \(Q_1\) is the 25th percentile and \(Q_3\) is the 75th percentile. Because IQR excludes the upper and lower 25% of data, it is robust against outliers and is the natural companion to the median (Field, 2018; Weir & Vincent, 2021).

Movement science example. Hip flexor flexibility (°) measured in 20 collegiate dancers (sorted):

\[ 72, 75, 78, 80, 82, 84, 85, 87, 88, 90, 91, 92, 94, 95, 96, 98, 100, 103, 108, 140 \]

flexibility <- c(72, 75, 78, 80, 82, 84, 85, 87, 88, 90,
                 91, 92, 94, 95, 96, 98, 100, 103, 108, 140)

q <- quantile(flexibility, probs = c(0.25, 0.75))
writeLines(c(
  paste0("Q1  = ", q[1], " degrees"),
  paste0("Q3  = ", q[2], " degrees"),
  paste0("IQR = ", q[2] - q[1], " degrees")
))
Q1  = 83.5 degrees
Q3  = 96.5 degrees
IQR = 13 degrees

The IQR of 13° excludes the extreme value of 140° (likely a hyperflexible individual), giving a more representative picture of the group.

Boxplot: the IQR made visible. The boxplot is the standard graphical display for IQR-based summaries. The box spans \(Q_1\) to \(Q_3\); the line inside the box marks the median; whiskers extend to the most extreme non-outlier values; points beyond the whiskers are plotted individually as potential outliers (Field, 2018).

#| fig-cap: "Boxplot of hip flexor flexibility scores (degrees) in 20 collegiate dancers. The box spans the IQR (Q1–Q3), the center line marks the median, and the filled circle identifies the outlier (140°)."

boxplot(flexibility,
        horizontal = TRUE,
        col = "lightblue",
        border = "navy",
        main = "Hip Flexor Flexibility — Collegiate Dancers",
        xlab = "Flexibility (degrees)",
        pch  = 19,
        outcol = "red")
text(quantile(flexibility, 0.25), 1.3,
     paste0("Q1 = ", quantile(flexibility, 0.25)), cex = 0.8, col = "navy")
text(quantile(flexibility, 0.75), 1.3,
     paste0("Q3 = ", quantile(flexibility, 0.75)), cex = 0.8, col = "navy")
text(median(flexibility), 0.7,
     paste0("Mdn = ", median(flexibility)), cex = 0.8, col = "darkgreen")
Figure 2
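How does the boxplot decide which points count as outliers? By default, R's `boxplot()` flags values more than 1.5 × the box length beyond the box edges. The sketch below computes those fences two ways for the flexibility data. One detail worth knowing: `boxplot()` draws the box at Tukey's hinges (via `boxplot.stats()`), which can differ slightly from the `quantile()` values reported earlier.

```r
flexibility <- c(72, 75, 78, 80, 82, 84, 85, 87, 88, 90,
                 91, 92, 94, 95, 96, 98, 100, 103, 108, 140)

# Fences based on quantile() quartiles (R's default, type 7)
q     <- quantile(flexibility, c(0.25, 0.75), names = FALSE)
fence <- c(lower = q[1] - 1.5 * diff(q), upper = q[2] + 1.5 * diff(q))
fence   # lower 64, upper 116
flexibility[flexibility < fence["lower"] | flexibility > fence["upper"]]   # 140

# What boxplot() itself uses: Tukey's hinges via boxplot.stats()
bs <- boxplot.stats(flexibility)
bs$stats   # 72.0 83.0 90.5 97.0 108.0 (whisker, hinge, median, hinge, whisker)
bs$out     # 140
```

Both approaches flag only the 140° score here, but the box edges (83 and 97) are not identical to the quantile-based quartiles (83.5 and 96.5), so annotations and box edges may differ by a small amount.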

5.3 Variance

The sample variance (\(s^2\)) is the sum of squared deviations from the mean divided by \(n - 1\), which makes it approximately the average squared deviation:

\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} \]

The denominator uses \(n - 1\) (rather than \(n\)) so that \(s^2\) is an unbiased estimate of the population variance, an adjustment known as Bessel’s correction (Gravetter et al., 2021).

Movement science example. Reaction times (ms) for five sprinters at the starting blocks:

\[ 141, \; 150, \; 155, \; 132, \; 145 \]

rt <- c(141, 150, 155, 132, 145)
xbar <- mean(rt)
n    <- length(rt)
deviations_sq <- (rt - xbar)^2

writeLines(c(
  paste0("Mean reaction time: ", xbar, " ms"),
  paste0("Squared deviations: ", paste(round(deviations_sq, 2), collapse = "  ")),
  paste0("Sum of sq. dev.   : ", round(sum(deviations_sq), 2)),
  paste0("Variance (s²)     : ", round(var(rt), 2), " ms²")
))
Mean reaction time: 144.6 ms
Squared deviations: 12.96  29.16  108.16  158.76  0.16
Sum of sq. dev.   : 309.2
Variance (s²)     : 77.3 ms²
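To make the role of the \(n - 1\) denominator concrete, the sketch below divides the same sum of squared deviations by \(n\) and by \(n - 1\). The variable name `ss` is just a convenience for this illustration.

```r
rt <- c(141, 150, 155, 132, 145)
ss <- sum((rt - mean(rt))^2)   # sum of squared deviations: 309.2

ss / length(rt)         # 61.84, dividing by n
ss / (length(rt) - 1)   # 77.3,  dividing by n - 1 (Bessel's correction)
var(rt)                 # 77.3,  R's var() uses n - 1
```

With only five observations the two results differ noticeably; the gap shrinks as \(n\) grows.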
Note

Why squared units? Squaring the deviations removes negative signs and penalizes large deviations more heavily than small ones. However, the result is expressed in squared units (ms²), which are difficult to interpret directly. This motivates the standard deviation, which restores the original measurement scale.

5.4 Standard Deviation

The standard deviation (\(s\)) is the square root of the variance:

\[ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} \]

It is the most commonly reported variability statistic in kinesiology research because it shares units with the original measurement, making it directly interpretable (Hopkins, 2000; Weir & Vincent, 2021).

A useful rule of thumb (Normal distribution). For roughly normal data:

  • ~68% of observations fall within \(\bar{x} \pm 1s\)
  • ~95% of observations fall within \(\bar{x} \pm 2s\)
  • ~99.7% of observations fall within \(\bar{x} \pm 3s\)

This “68–95–99.7 rule” helps researchers quickly identify unusual values and assess normality (Gravetter et al., 2021).
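The percentages in this rule come from the standard normal distribution and can be recovered with `pnorm()`. The sketch below is a theoretical check only; real samples, especially small ones, will match these figures only approximately.

```r
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)   # P(-k < Z < k) for a standard normal Z
round(100 * coverage, 1)           # 68.3 95.4 99.7
```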

Movement science example (continued). From the reaction-time data above:

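# Mean ± 1 SD describes the spread of individual scores; it is not a confidence interval
writeLines(c(
  paste0("Standard deviation (s): ", round(sd(rt), 2), " ms"),
  paste0("Mean ± 1 SD (approx. 68% of scores): ", round(mean(rt) - sd(rt), 1),
         " to ", round(mean(rt) + sd(rt), 1), " ms")
))
Standard deviation (s): 8.79 ms
Mean ± 1 SD (approx. 68% of scores): 135.8 to 153.4 ms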

Humanities example. A historian records the publication lag (years from manuscript completion to print) for 40 major philosophical treatises from the Enlightenment era. A mean of 3.2 years with \(s = 0.8\) years indicates relatively consistent publication timelines; \(s = 5.6\) years would signal high variability potentially attributable to political censorship or patronage networks—a substantively interesting finding.

5.4.1 Visual summary: mean ± SD with individual data points

#| fig-cap: "Reaction times (ms) for five sprinters at the starting blocks. The dashed line marks the group mean; shaded band shows ±1 SD. Individual data points are displayed as filled circles."

rt_data <- data.frame(athlete = factor(1:5), rt = c(141, 150, 155, 132, 145))
xm <- mean(rt_data$rt)
xs <- sd(rt_data$rt)

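# A factor on the x-axis makes plot() draw boxplots, so use numeric positions instead
plot(as.integer(rt_data$athlete), rt_data$rt,
     pch = 19, col = "steelblue", cex = 1.5,
     xlim = c(0.5, 5.5), ylim = c(115, 175), xaxt = "n",
     xlab = "Athlete", ylab = "Reaction Time (ms)",
     main = "Sprinter Reaction Times: Mean ± 1 SD")
axis(1, at = 1:5, labels = levels(rt_data$athlete))   # label each athlete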
abline(h = xm, lty = 2, col = "navy", lwd = 2)
rect(0.5, xm - xs, 5.5, xm + xs, col = rgb(0, 0, 1, 0.08), border = NA)
text(5.3, xm + 2, expression(bar(x)), col = "navy", cex = 0.9)
text(5.3, xm + xs + 2, "+1 SD", col = "steelblue", cex = 0.8)
text(5.3, xm - xs - 2, "-1 SD", col = "steelblue", cex = 0.8)
Figure 3

5.5 Coefficient of Variation

The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean:

\[ CV = \frac{s}{\bar{x}} \times 100\% \]

Because it is dimensionless, the CV is uniquely suited for comparing the relative variability of measurements recorded on different scales or units (Atkinson & Nevill, 1998; Hopkins, 2000; Weir & Vincent, 2021).

Movement science example. A strength and conditioning researcher wants to compare the consistency of two performance tests: vertical jump height (cm) and isokinetic knee extension torque (N·m).

# Vertical jump (cm)
jump <- c(52, 55, 48, 60, 53, 57, 50, 61, 54, 56)
# Isokinetic torque (N·m)
torque <- c(180, 210, 175, 230, 195, 220, 185, 215, 200, 190)

cv <- function(x) round(sd(x) / mean(x) * 100, 1)

results <- data.frame(
  Test           = c("Vertical Jump (cm)", "Isokinetic Torque (N·m)"),
  Mean           = c(round(mean(jump), 1), round(mean(torque), 1)),
  SD             = c(round(sd(jump), 1), round(sd(torque), 1)),
  CV_pct         = c(cv(jump), cv(torque))
)
names(results)[4] <- "CV (%)"
print(results, row.names = FALSE)
                    Test  Mean   SD CV (%)
      Vertical Jump (cm)  54.6  4.1    7.5
 Isokinetic Torque (N·m) 200.0 18.3    9.1

Even though the torque measure has a much larger SD in absolute terms (18.3 N·m vs. 4.1 cm), the CVs (9.1% vs. 7.5%) show that the two tests have broadly similar relative variability. Comparing the raw SDs alone would wrongly suggest that torque is several times “more variable.”

Hopkins (2000) suggests that a CV below 5% typically indicates good test-retest reliability for laboratory performance tests in sports science, that CVs between 5% and 15% are acceptable, and that values above 15% indicate poor reliability or substantial biological variability.

#| fig-cap: "Coefficient of variation (%) for four common kinesiology performance tests. The dashed line at 10% marks a frequently used threshold separating low from moderate variability."

tests <- c("Vertical Jump", "Grip Strength", "VO2max", "Reaction Time")
cvs   <- c(cv(jump), 8.2, 6.5, 12.3)

barplot(cvs,
        names.arg = tests,
        col = ifelse(cvs < 10, "steelblue", "salmon"),
        border = "white",
        ylab = "Coefficient of Variation (%)",
        main = "Relative Variability Across Performance Tests",
        ylim = c(0, 16),
        las = 1,
        cex.names = 0.85)
abline(h = 10, lty = 2, col = "red", lwd = 1.5)
text(4.8, 10.5, "10% threshold", col = "red", cex = 0.8)
Figure 4

5.6 Summary: Comparing Measures of Variability

| Measure | Formula | Units | Sensitive to outliers? | Best use |
|---|---|---|---|---|
| Range | \(X_{max} - X_{min}\) | Same as data | Very sensitive | Quick screen; data entry check |
| IQR | \(Q_3 - Q_1\) | Same as data | Resistant | Skewed data; companion to median |
| Variance | \(\frac{\sum(x_i - \bar{x})^2}{n-1}\) | Squared units | Moderate | Mathematical derivations; ANOVA |
| SD | \(\sqrt{s^2}\) | Same as data | Moderate | Normal data; reporting with mean |
| CV | \(\frac{s}{\bar{x}} \times 100\%\) | Dimensionless (%) | Moderate | Comparing across different scales |

6 Putting It All Together: A Worked Dataset

Suppose we collect resting heart rate (bpm) for 15 physical education students:

\[ 62, 68, 70, 72, 64, 75, 71, 69, 66, 73, 70, 68, 74, 65, 98 \]

The value 98 bpm belongs to a student who reported high caffeine consumption that morning.

hr <- c(62, 68, 70, 72, 64, 75, 71, 69, 66, 73, 70, 68, 74, 65, 98)

writeLines(c(
  paste0("n             = ", length(hr)),
  paste0("Mean          = ", round(mean(hr), 2), " bpm"),
  paste0("Median        = ", median(hr), " bpm"),
  paste0("SD            = ", round(sd(hr), 2), " bpm"),
  paste0("Variance      = ", round(var(hr), 2), " bpm\u00b2"),
  paste0("Range         = ", diff(range(hr)), " bpm"),
  paste0("IQR           = ", IQR(hr), " bpm"),
  paste0("CV            = ", round(sd(hr)/mean(hr)*100, 1), "%")
))
n             = 15
Mean          = 71 bpm
Median        = 70 bpm
SD            = 8.34 bpm
Variance      = 69.57 bpm²
Range         = 36 bpm
IQR           = 5.5 bpm
CV            = 11.7%
#| fig-cap: "Resting heart rate (bpm) for 15 physical education students. The histogram (left) shows a right-skewed distribution caused by the outlier; the boxplot (right) clearly identifies the outlier as a value beyond 1.5×IQR from Q3."

par(mfrow = c(1, 2), mar = c(4.5, 4, 3, 1))

hist(hr, breaks = 10, col = "lightblue", border = "white",
     main = "Histogram", xlab = "Resting Heart Rate (bpm)")
abline(v = mean(hr),   col = "blue",      lwd = 2, lty = 2)
abline(v = median(hr), col = "darkgreen", lwd = 2, lty = 2)
legend("topright", legend = c("Mean", "Median"),
       col = c("blue","darkgreen"), lty = 2, lwd = 2, cex = 0.75, bty = "n")

boxplot(hr, col = "lightblue", border = "navy",
        main = "Boxplot", ylab = "Resting Heart Rate (bpm)",
        pch = 19, outcol = "red", outcex = 1.2)

par(mfrow = c(1, 1))
Figure 5

Key takeaway: The mean (71.0 bpm) is pulled upward by the outlier, whereas the median (70 bpm) is essentially unaffected. The boxplot flags the outlier immediately. A researcher who reports only the mean without a boxplot or histogram would not detect this anomaly (Field, 2018).
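One way to quantify how much a single case drives each statistic is a quick sensitivity check: recompute the summaries with and without it. The sketch below does this for the heart-rate data. `hr_clean` and `compare` are hypothetical names, and dropping the case here is for comparison only, not a recommendation to delete it.

```r
hr       <- c(62, 68, 70, 72, 64, 75, 71, 69, 66, 73, 70, 68, 74, 65, 98)
hr_clean <- hr[hr != 98]   # sensitivity check only

# Small helper so both rows are computed the same way
summarise_hr <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), iqr = IQR(x))
}

compare <- rbind(with_outlier    = summarise_hr(hr),
                 without_outlier = summarise_hr(hr_clean))
round(compare, 2)
```

In this dataset the SD falls from 8.34 to 3.85 bpm and the mean from 71.00 to 69.07 bpm, while the median (70 to 69.5) and IQR (5.5 to 5.25) barely change. That contrast illustrates why the median and IQR are described as resistant.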

7 Limitations and Best Practices

Descriptive statistics describe the sample—they do not support causal claims or generalizations to other populations (Thomas et al., 2015). Additionally, no single statistic should be reported in isolation:

  • Always pair the mean with the SD (or SE), and the median with the IQR.
  • Supplement tabular summaries with at least one graphical display (histogram or boxplot).
  • Screen for outliers before computing the mean and SD; document any cases removed or corrected.
  • Report sample size (\(n\)) explicitly, as the same SD carries very different weight depending on whether \(n = 10\) or \(n = 1{,}000\).
  • When comparing across scales or units, use the CV (Atkinson & Nevill, 1998; Hopkins, 2000).
Tip

APA reporting style. The APA Publication Manual (7th ed.) recommends reporting means and standard deviations in the text as \(M = 70.0\), \(SD = 8.5\), or in a summary table. For non-normal distributions, report the median and IQR instead: \(Mdn = 70\), \(IQR = 7\).
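Reporting strings in this pattern can be generated rather than typed by hand, which avoids transcription errors. The following is a minimal sketch: the helper names `fmt`, `apa_mean_sd`, and `apa_median_iqr` are hypothetical, and one decimal place is an assumed default rather than a rule.

```r
# Format a number with a fixed number of decimals (keeps trailing zeros, e.g. "71.0")
fmt <- function(v, digits = 1) formatC(v, format = "f", digits = digits)

# Hypothetical helpers that build the reporting strings
apa_mean_sd <- function(x, digits = 1) {
  paste0("M = ", fmt(mean(x), digits), ", SD = ", fmt(sd(x), digits))
}
apa_median_iqr <- function(x, digits = 1) {
  paste0("Mdn = ", fmt(median(x), digits), ", IQR = ", fmt(IQR(x), digits))
}

hr <- c(62, 68, 70, 72, 64, 75, 71, 69, 66, 73, 70, 68, 74, 65, 98)
apa_mean_sd(hr)      # "M = 71.0, SD = 8.3"
apa_median_iqr(hr)   # "Mdn = 70.0, IQR = 5.5"
```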

8 Summary

Descriptive statistics are the essential first layer of any quantitative analysis. The key concepts covered in this post are:

  • The mean, median, and mode each capture the center of a distribution differently; choice depends on data level and distributional shape.
  • The range, IQR, variance, SD, and CV quantify spread; they complement central tendency and should always be reported alongside it.
  • Graphical displays—particularly histograms and boxplots—are indispensable because they reveal features (outliers, skewness, multimodality) that summary statistics alone can miss.
  • The CV is uniquely valuable when comparing variability across measurements on different scales—a common scenario in kinesiology research.

These building blocks underpin every subsequent inferential procedure. A well-characterized, carefully visualized dataset is the strongest foundation for valid statistical inference.

Image credit

Illustration by Elisabet Guba from Ouch!

References

Atkinson, G., & Nevill, A. M. (1998). Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Medicine, 26(4), 217–238. https://doi.org/10.2165/00007256-199826040-00002
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.
Gravetter, F. J., Wallnau, L. B., & Forzano, L.-A. B. (2021). Statistics for the behavioral sciences (10th ed.). Cengage Learning.
Hopkins, W. G. (2000). Measures of reliability in sports medicine and science. Sports Medicine, 30(1), 1–15. https://doi.org/10.2165/00007256-200030010-00001
Thomas, J. R., Nelson, J. K., & Silverman, S. J. (2015). Research methods in physical activity (7th ed.). Human Kinetics.
Weir, J. P., & Vincent, W. J. (2021). Statistics in kinesiology (5th ed.). Human Kinetics.

Citation

BibTeX citation:
@misc{furtado2026,
  author = {Furtado, Ovande},
  title = {Descriptive {Statistics}},
  date = {2026-03-04},
  url = {https://drfurtado.github.io/randomstats/posts/02162023-descriptive-statistics/},
  langid = {en}
}
For attribution, please cite this work as:
Furtado, O. (2026, March 4). Descriptive Statistics. RandomStats. https://drfurtado.github.io/randomstats/posts/02162023-descriptive-statistics/