Fundamentals of Inferential Statistics

A rigorous, evidence-based introduction to inferential statistics for beginner researchers in kinesiology and the humanities. Covers probability concepts, sampling distributions, confidence intervals, hypothesis testing, Type I/II errors, effect size, and statistical power—with R-generated visualizations and applied examples throughout.

Keywords: inferential statistics, probability distributions, hypothesis testing, sampling error, confidence intervals, type I error, type II error, effect size, statistical power
Affiliation: Cal State Northridge

Published: March 4, 2026

1 Learning Objectives

By the end of this post, you should be able to:

  1. Distinguish descriptive from inferential statistics and explain why the latter requires probability theory.
  2. Define and contrast a population parameter and a sample statistic.
  3. Explain sampling error and the Central Limit Theorem, and demonstrate their relationship graphically.
  4. Construct and interpret a confidence interval for a population mean.
  5. State a null and alternative hypothesis, select a test statistic, and interpret a p-value.
  6. Distinguish Type I and Type II errors and describe the factors that affect each.
  7. Define effect size and statistical power, and explain why both matter independently of the p-value.
  8. Differentiate one-tailed from two-tailed tests and identify when each is appropriate.

2 The Bridge from Description to Inference

Descriptive statistics (covered in the previous post) summarize a sample—they tell you what you observed. Inferential statistics do something more ambitious: they use the sample to reason about the broader population from which it was drawn (Rosner, 2015; Weir & Vincent, 2021). This shift from “what we measured” to “what is likely true in general” is mathematically grounded in probability theory.

A motivating example. A kinesiology researcher wants to know whether a 12-week resistance training program increases vertical jump height in collegiate soccer players. She cannot test every collegiate soccer player in the country, so she recruits a sample of 30. After the intervention, the sample mean jump height increases by 4.2 cm. The key inferential question is: Is this 4.2 cm increase a real population-level effect, or could it plausibly have arisen by chance in a sample of 30 people even if the program has no true effect? (Thomas et al., 2015)

Answering that question requires understanding populations and samples, probability distributions, and hypothesis testing—the topics of this post.

Note

Parameter vs. Statistic
A population parameter is a fixed (usually unknown) numerical characteristic of the entire population (e.g., \(\mu\), \(\sigma\)). A sample statistic is a value computed from the sample that estimates the corresponding parameter (e.g., \(\bar{x}\), \(s\)). Inferential statistics is the principled process of using statistics to estimate parameters (Gravetter et al., 2021).

3 Probability Foundations

Inferential statistics is built on probability theory. Three concepts are fundamental.

Sample space (\(\Omega\)): The set of all possible outcomes. For a single sprint trial, \(\Omega\) might be all positive real numbers representing completion time.

Event: Any subset of the sample space. “Completing the sprint in under 5.0 s” is an event.

Probability: A number in \([0, 1]\) assigned to an event reflecting its long-run relative frequency. A probability of 0 means the event cannot occur; 1 means it is certain (Rosner, 2015).

Two key relationships between events:

  • Mutually exclusive: Two events cannot both occur. A sprint cannot simultaneously be “under 5.0 s” and “over 6.0 s.”
  • Independent: The occurrence of one event does not change the probability of another. Whether athlete A beats 5.0 s is independent of whether athlete B does, if they perform separately.

These concepts underpin the probability distributions used to compute p-values and confidence intervals.
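These two rules can be checked with simple arithmetic. The probabilities below are made-up illustrative values, not data from any study:

```r
# Hypothetical (illustrative) probabilities for sprint events
p_under5 <- 0.30  # P(sprint completed under 5.0 s)
p_over6  <- 0.15  # P(sprint takes over 6.0 s)

# Mutually exclusive: both cannot occur, so the probabilities simply add
p_either <- p_under5 + p_over6            # P(under 5.0 s OR over 6.0 s)

# Independent: athletes A and B perform separately, so probabilities multiply
p_A_under5 <- 0.30
p_B_under5 <- 0.40
p_both_under5 <- p_A_under5 * p_B_under5  # P(A under 5.0 s AND B under 5.0 s)

p_either        # 0.45
p_both_under5   # 0.12
```

The addition rule applies only when the events cannot co-occur; the multiplication rule only when they are independent.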

4 Probability Distributions

A probability distribution describes how probability is allocated across possible values of a variable (Gravetter et al., 2021; Rosner, 2015). Distributions fall into two broad families.

4.1 Discrete Distributions

Discrete variables take countable values (usually non-negative integers). Two distributions are especially relevant in kinesiology:

Binomial distribution. Models the number of “successes” in \(n\) independent trials where each trial has the same probability \(p\) of success. Example: A physical therapist conducts 20 balance-beam trials with a stroke patient. If the true probability of a successful crossing is 0.7, the number of successes follows a Binomial\((n=20, p=0.7)\) distribution.

Poisson distribution. Models the number of rare events in a fixed interval of time or space. Example: ACL injuries in a professional soccer league across a season. If injuries occur randomly at a constant average rate \(\lambda\), the count follows a Poisson distribution (Rosner, 2015).
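R's built-in `dbinom`/`pbinom` and `dpois`/`ppois` functions make these two examples concrete. The Poisson rate below (\(\lambda = 4\) injuries per season) is an assumed value chosen only for illustration:

```r
# Binomial: 20 balance-beam trials, true success probability 0.7
dbinom(14, size = 20, prob = 0.7)       # P(exactly 14 successes), ~0.19
1 - pbinom(13, size = 20, prob = 0.7)   # P(14 or more successes)

# Poisson: ACL injuries at an assumed average rate of 4 per season
dpois(2, lambda = 4)                    # P(exactly 2 injuries), ~0.15
1 - ppois(6, lambda = 4)                # P(more than 6 injuries)
```

The `d*` functions give the probability of an exact count; the `p*` functions give cumulative probabilities, which is why tail probabilities are computed as one minus the cumulative value.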

4.2 Continuous Distributions

Continuous variables can take any value within a range. Four distributions are central to hypothesis testing (Figure 1):

Normal distribution. Fully described by its mean \(\mu\) and standard deviation \(\sigma\). Many biological and performance variables (height, VO₂max, grip strength) are approximately normal in large samples. The normal distribution has the well-known 68–95–99.7% rule (Gravetter et al., 2021).

Student’s t-distribution. Used whenever the population SD is unknown and must be estimated from the sample, which matters most when samples are small (\(n < 30\)). The t-distribution has heavier tails than the normal, reflecting the added uncertainty from estimating \(\sigma\) with \(s\). As \(n \to \infty\) the t-distribution converges to the normal (Weir & Vincent, 2021).

Chi-square distribution. The distribution of a sum of squared independent standard normal variables, indexed by its degrees of freedom. It is right-skewed and underlies tests of independence and goodness of fit (Rosner, 2015).

F-distribution. Used in analysis of variance (ANOVA) to test whether multiple group means differ. It is the ratio of two independent chi-square quantities, each scaled by its degrees of freedom, and is right-skewed (Rosner, 2015).

Code
#| fig-cap: "Four probability distributions commonly used in kinesiology research. Top-left: standard normal; top-right: t-distribution for small (df=5) and large (df=30) samples; bottom-left: chi-square; bottom-right: F-distribution."

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))

# Standard Normal
x <- seq(-4, 4, length.out = 300)
plot(x, dnorm(x), type = "l", lwd = 2, col = "steelblue",
     main = "Standard Normal", xlab = "z", ylab = "Density")
polygon(c(x, rev(x)), c(dnorm(x), rep(0, length(x))),
        col = rgb(0.27, 0.51, 0.71, 0.2), border = NA)

# t-distribution
plot(x, dt(x, df=5), type="l", lwd=2, col="darkorange",
     main="t-Distribution", xlab="t", ylab="Density", ylim=c(0,0.41))
lines(x, dt(x, df=30), lwd=2, col="purple", lty=2)
lines(x, dnorm(x), lwd=1.5, col="gray50", lty=3)
legend("topright", legend=c("df=5","df=30","Normal"),
       col=c("darkorange","purple","gray50"), lty=c(1,2,3), lwd=2, cex=0.75, bty="n")

# Chi-square
xc <- seq(0, 20, length.out=300)
plot(xc, dchisq(xc, df=4), type="l", lwd=2, col="darkred",
     main="Chi-Square (df=4)", xlab=expression(chi^2), ylab="Density")

# F-distribution
xf <- seq(0, 6, length.out=300)
plot(xf, df(xf, df1=3, df2=20), type="l", lwd=2, col="darkgreen",
     main="F-Distribution (df1=3, df2=20)", xlab="F", ylab="Density")

par(mfrow = c(1,1))
Figure 1

4.3 Null Distributions

Every hypothesis test is built around a null distribution—the sampling distribution of the test statistic assuming the null hypothesis is true. The p-value is the area in the tail(s) of the null distribution at or beyond the observed test statistic (see Table 1 for examples) (Bland & Altman, 1994; Weir & Vincent, 2021).

Table 1: Common null distributions and their applications

Distribution        Test statistic   Common application
Standard Normal     \(z\)            Large-sample tests for means or proportions
t-distribution      \(t\)            Comparing means with unknown \(\sigma\)
F-distribution      \(F\)            ANOVA; testing equality of variances
\(\chi^2\)          \(\chi^2\)       Tests for independence, goodness of fit
Binomial            —                Exact tests for proportions

5 Sampling Error and the Central Limit Theorem

5.1 Sampling Error

Sampling error is the unavoidable discrepancy between a sample statistic and the true population parameter (Thomas et al., 2015; Weir & Vincent, 2021). Even with a perfectly designed study and no measurement error, different random samples from the same population will yield different means. This fluctuation is not a mistake—it is a mathematical certainty arising from the randomness of sampling.

Movement science example. Suppose the true population mean VO₂max of recreational runners is 48.0 mL·kg⁻¹·min⁻¹ (\(\sigma = 6.0\)). A researcher draws a random sample of 25 runners and computes \(\bar{x} = 50.4\) mL·kg⁻¹·min⁻¹. The sampling error here is \(50.4 - 48.0 = 2.4\) mL·kg⁻¹·min⁻¹. It exists not because of procedural mistakes but because any one sample will not perfectly mirror the population.

How to reduce sampling error:

  • Increase sample size (\(n\))—larger samples produce more stable estimates.
  • Use probability sampling methods (simple random, stratified, cluster).
  • Reduce measurement error (use calibrated equipment, standardized protocols) (Atkinson & Nevill, 1998).

5.2 The Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most important results in statistics (Gravetter et al., 2021; Rosner, 2015). It states that, regardless of the shape of the population distribution, the sampling distribution of the sample mean approaches a normal distribution as sample size increases. Formally:

\[ \bar{X} \sim \mathcal{N}\!\left(\mu, \; \frac{\sigma}{\sqrt{n}}\right) \]

The standard deviation of this sampling distribution, \(\text{SE} = \sigma/\sqrt{n}\), is called the standard error of the mean. It quantifies how much sample means vary around the true population mean (Figure 2).

Code
#| fig-cap: "Central Limit Theorem demonstration using simulated sprint times drawn from a right-skewed population (exponential distribution, mean ≈ 5.3 s). Each panel shows the distribution of 5,000 sample means for increasing sample sizes. As n grows, the sampling distribution becomes approximately normal regardless of the population's shape."

set.seed(123)
pop_size <- 100000
# Right-skewed population: exponential + shift
pop <- rexp(pop_size, rate = 0.5) + 3.3  # mean ≈ 5.3 s, right skewed

sample_sizes <- c(2, 5, 15, 30)
n_sim        <- 5000

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))

for (n in sample_sizes) {
  means <- replicate(n_sim, mean(sample(pop, n)))
  hist(means,
       breaks = 30,
       col    = "steelblue",
       border = "white",
       freq   = FALSE,
       main   = paste0("n = ", n),
       xlab   = "Sample Mean Sprint Time (s)",
       ylab   = "Density")
  # Overlay theoretical normal
  curve(dnorm(x, mean = mean(pop), sd = sd(pop)/sqrt(n)),
        col = "red", lwd = 2, add = TRUE)
}

par(mfrow = c(1,1))
Figure 2

Key insight: Even though the underlying sprint-time population is skewed, the distribution of sample means is approximately normal by \(n = 15\) to \(30\). This is why parametric tests (which assume normality of the sampling distribution, not the raw data) are robust for moderate sample sizes (Field, 2018).

6 Confidence Intervals

A confidence interval (CI) is a range of plausible values for the unknown population parameter, calculated from the sample data (Rosner, 2015). A 95% CI does not mean “there is a 95% probability the true value lies inside this specific interval.” Rather, it means: if we repeated this study 100 times, approximately 95 of the resulting intervals would contain the true parameter (Gravetter et al., 2021).

6.1 Formula for the CI of a Population Mean

When \(\sigma\) is unknown (the typical case), we use the t-distribution:

\[ CI = \bar{x} \pm t^*_{\alpha/2, \, n-1} \cdot \frac{s}{\sqrt{n}} \]

where \(t^*_{\alpha/2, \, n-1}\) is the critical value from the t-distribution with \(n-1\) degrees of freedom for the chosen confidence level (see Table 2).

Table 2: Critical values for common confidence levels

Confidence level   \(z\) (large \(n\))   \(t\) (\(n = 20\), df = 19)
90%                1.645                 1.729
95%                1.960                 2.093
99%                2.576                 2.861
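The critical values above can be reproduced directly with `qnorm()` and `qt()`:

```r
conf_levels <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf_levels

z_star <- qnorm(1 - alpha / 2)            # large-sample z critical values
t_star <- qt(1 - alpha / 2, df = 20 - 1)  # t critical values for n = 20

round(z_star, 3)  # 1.645 1.960 2.576
round(t_star, 3)  # 1.729 2.093 2.861
```

Note that each t critical value exceeds its z counterpart, reflecting the t-distribution's heavier tails at small \(n\).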

Movement science example. Researchers measure the time to complete an obstacle course (s) for a sample of 30 Army officer candidates. The sample mean is \(\bar{x} = 142.5\) s and \(s = 18.3\) s.

Code
xbar <- 142.5
s    <- 18.3
n    <- 30
alpha <- 0.05

t_star <- qt(1 - alpha/2, df = n - 1)
SE     <- s / sqrt(n)
MOE    <- t_star * SE

writeLines(c(
  paste0("Standard Error (SE): ", round(SE, 3), " s"),
  paste0("Critical t*        : ", round(t_star, 3)),
  paste0("Margin of Error    : ", round(MOE, 2), " s"),
  paste0("95% CI             : [", round(xbar - MOE, 2), ", ",
         round(xbar + MOE, 2), "] s")
))
Standard Error (SE): 3.341 s
Critical t*        : 2.045
Margin of Error    : 6.83 s
95% CI             : [135.67, 149.33] s

The 95% CI is approximately [135.7, 149.3] seconds, meaning the data are consistent with population means ranging from about 2 min 16 s to 2 min 29 s (Figure 3).

Code
#| fig-cap: "Simulation of 50 confidence intervals (95%) drawn from a population with true mean µ = 142.5 s. Intervals that capture the true mean are shown in blue; those that miss it are shown in red. In the long run, 95% of all intervals would capture µ."

set.seed(7)
true_mu <- 142.5
true_sd <- 18.3
n_obs   <- 30
n_intervals <- 50

results <- matrix(NA, nrow = n_intervals, ncol = 3)
for (i in seq_len(n_intervals)) {
  samp <- rnorm(n_obs, mean = true_mu, sd = true_sd)
  xm   <- mean(samp)
  tstar <- qt(0.975, df = n_obs - 1)
  se_i  <- sd(samp) / sqrt(n_obs)
  results[i, ] <- c(xm, xm - tstar * se_i, xm + tstar * se_i)
}

captures <- results[,2] <= true_mu & results[,3] >= true_mu
cols <- ifelse(captures, "steelblue", "red")

plot(NULL, xlim = c(120, 165), ylim = c(0, n_intervals + 1),
     xlab = "Completion Time (s)", ylab = "Interval #",
     main = "50 Simulated 95% Confidence Intervals")
abline(v = true_mu, lty = 2, col = "black", lwd = 2)
text(true_mu + 0.5, n_intervals + 0.5, expression(mu), cex = 1.1)
for (i in seq_len(n_intervals)) {
  segments(results[i,2], i, results[i,3], i, col = cols[i], lwd = 1.5)
  points(results[i,1], i, pch = 19, col = cols[i], cex = 0.6)
}
legend("bottomright", legend = c("Contains µ", "Misses µ"),
       col = c("steelblue","red"), lwd = 2, cex = 0.8, bty = "n")
Figure 3

7 Estimation

7.1 Point Estimation

A point estimate is a single value used to estimate a population parameter. The sample mean \(\bar{x}\) is the most common point estimate for the population mean \(\mu\) (Weir & Vincent, 2021). A good point estimator should be:

  • Unbiased: The expected value of the estimator equals the parameter (\(E[\bar{x}] = \mu\)).
  • Efficient: Among all unbiased estimators, it has the smallest variance.
  • Consistent: The estimate converges to the true parameter as \(n \to \infty\) (Rosner, 2015).

Movement science example. A researcher estimates mean VO₂max (\(\mu\)) for elite road cyclists using a sample of \(n = 40\). The sample yields \(\bar{x} = 72.1\) mL·kg⁻¹·min⁻¹. This is the point estimate; it provides no information about how precisely \(\mu\) is estimated. The 95% CI ([70.0, 74.2]) provides that crucial precision information.
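A short simulation can illustrate unbiasedness and consistency. The population parameters below are assumed values chosen to echo the cycling example:

```r
set.seed(2024)
mu    <- 72.1  # assumed true population mean (mL/kg/min)
sigma <- 5     # assumed population SD

# Unbiasedness: averaged over many samples, x-bar centers on mu even at small n
many_means_n10 <- replicate(20000, mean(rnorm(10, mu, sigma)))
mean(many_means_n10)  # very close to 72.1

# Consistency: individual estimates cluster more tightly as n grows
sd(many_means_n10)                                 # ~ sigma / sqrt(10)  = 1.58
sd(replicate(20000, mean(rnorm(160, mu, sigma))))  # ~ sigma / sqrt(160) = 0.40
```

The second pair of lines is just the standard error formula \(\sigma/\sqrt{n}\) observed empirically: quadrupling \(n\) halves the spread of the estimates.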

7.2 Interval Estimation

As shown in Section 6, interval estimation captures uncertainty around the point estimate through confidence intervals. Interval estimates are preferred over bare point estimates in scientific reporting because they reveal the precision of measurement and the plausibility of the null hypothesis value (Cumming, 2014).

Tip

Reporting recommendation. The American Psychological Association and the American College of Sports Medicine both recommend routinely reporting confidence intervals alongside p-values (Cumming, 2014; Sullivan & Feinn, 2012). A result can be statistically significant (\(p < 0.05\)) yet practically trivial if the CI reveals the effect is tiny.

8 Hypothesis Testing

Hypothesis testing is a formal decision-making procedure that uses sample data to evaluate a claim about a population parameter (Thomas et al., 2015; Weir & Vincent, 2021). It does not prove hypotheses—it assesses whether the data are sufficiently inconsistent with the null hypothesis to warrant rejecting it.

8.1 Null and Alternative Hypotheses

The null hypothesis (\(H_0\)) is the default claim of “no effect” or “no difference.” The alternative hypothesis (\(H_1\) or \(H_a\)) is the researcher’s prediction of an effect.

Movement science example.

Does a plyometric training program increase jump height in adolescent basketball players?

\[ H_0: \mu_{\text{post}} - \mu_{\text{pre}} = 0 \quad \text{(no change)} \] \[ H_1: \mu_{\text{post}} - \mu_{\text{pre}} > 0 \quad \text{(jump height increases)} \]

Humanities example.

Is the mean reading speed of students who used an e-reader different from those who used a printed textbook?

\[ H_0: \mu_{e} = \mu_{p} \] \[ H_1: \mu_{e} \neq \mu_{p} \]

The directional (\(>\)) vs. non-directional (\(\neq\)) distinction has consequences for test selection—covered in detail in the companion post on directional hypotheses.

8.2 Test Statistics and p-Values

A test statistic is a single number that summarizes how far the observed sample result is from what \(H_0\) predicts, measured in units of standard error (Weir & Vincent, 2021):

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

The p-value is the probability of observing a test statistic as extreme as—or more extreme than—the one obtained, assuming \(H_0\) is true (Bland & Altman, 1994; Rosner, 2015).

Warning

The p-value is widely misinterpreted. A p-value is not the probability that \(H_0\) is true, nor the probability that the result occurred by chance. It is a conditional probability: \(P(\text{data} \geq \text{observed} \mid H_0 \text{ is true})\). This distinction matters enormously for drawing correct conclusions (Cumming, 2014).

Movement science example. Using data from the jump-height intervention above:

Code
# Pre- and post-intervention vertical jump heights (cm) for 20 athletes
set.seed(42)
pre  <- round(rnorm(20, mean = 42, sd = 5), 1)
post <- round(pre + rnorm(20, mean = 3.5, sd = 4), 1)  # true mean gain ≈ 3.5 cm

result <- t.test(post, pre, paired = TRUE, alternative = "greater")

writeLines(c(
  paste0("Mean pre : ",    round(mean(pre),  2), " cm"),
  paste0("Mean post: ",    round(mean(post), 2), " cm"),
  paste0("Mean gain: ",    round(mean(post - pre), 2), " cm"),
  paste0("t-statistic: ",  round(result$statistic, 3)),
  paste0("df         : ",  result$parameter),
  paste0("p-value    : ",  round(result$p.value, 4)),
  paste0("95% CI (lower bound): ", round(result$conf.int[1], 2), " cm")
))
Mean pre : 42.97 cm
Mean post: 45.38 cm
Mean gain: 2.41 cm
t-statistic: 2.425
df         : 19
p-value    : 0.0127
95% CI (lower bound): 0.69 cm

With \(p < 0.05\), we reject \(H_0\) and conclude the plyometric program produced a statistically significant increase in jump height.

8.3 The Decision Rule and Significance Level

The significance level (\(\alpha\)) is the pre-chosen probability threshold below which the p-value triggers rejection of \(H_0\). By convention, \(\alpha = 0.05\) is most common in kinesiology (Weir & Vincent, 2021), though \(\alpha = 0.01\) is used in fields with higher costs of error (e.g., clinical trials). The choice of \(\alpha\) must be made before data collection, not adjusted after examining results (Bland & Altman, 1994).

8.4 Type I and Type II Errors

Any binary decision procedure can make two types of errors (Table 3) (Gravetter et al., 2021; Weir & Vincent, 2021):

Table 3: Decision outcomes in hypothesis testing

Decision                  \(H_0\) is True                             \(H_0\) is False
Reject \(H_0\)            Type I Error (\(\alpha\)) — False Positive  Correct Decision (Power \(= 1 - \beta\))
Fail to reject \(H_0\)    Correct Decision (\(1 - \alpha\))           Type II Error (\(\beta\)) — False Negative

Type I error (\(\alpha\)): Concluding there is an effect when none exists. In kinesiology, this could mean recommending a useless supplement because a study happened to produce a false positive. Controlled by setting \(\alpha\) low.

Type II error (\(\beta\)): Failing to detect a real effect. This might mean dismissing an effective rehabilitation protocol because the study was underpowered. Controlled by increasing sample size or reducing measurement error.

Weir & Vincent (2021) list five common causes of Type I errors:

  1. Measurement error
  2. Non-random sampling
  3. \(\alpha\) set too liberally (e.g., \(\alpha = .10\))
  4. Investigator bias
  5. Improper use of a one-tailed test

And four common causes of Type II errors:

  1. Measurement error
  2. Insufficient statistical power (\(n\) too small)
  3. \(\alpha\) set too conservatively (e.g., \(\alpha = .01\))
  4. Treatment effect not properly administered

A visual representation of Type I and Type II errors is shown in Figure 4.

Code
#| fig-cap: "Visualization of Type I (α, red region) and Type II (β, blue region) errors. The left curve is the null distribution; the right curve is the alternative distribution (true effect d = 0.8). The vertical dashed line marks the critical value for α = 0.05 (one-tailed). The power of the test (1 - β) is the blue area to the right of the critical value under the alternative distribution."

d   <- 0.8  # effect size (Cohen's d)
n   <- 25   # sample size per group
se  <- 1 / sqrt(n)

x   <- seq(-3.5, 5.5, length.out = 500)
crit_val <- qnorm(0.95)  # one-tailed α = 0.05

y_null <- dnorm(x, mean = 0, sd = 1)
y_alt  <- dnorm(x, mean = d / se, sd = 1)

plot(x, y_null, type = "l", lwd = 2, col = "navy",
     ylim = c(0, 0.42),
     xlab = "Test Statistic", ylab = "Density",
     main = expression(paste("Type I (", alpha, ") and Type II (", beta, ") Errors")))
lines(x, y_alt, lwd = 2, col = "darkgreen")

# Type I error region (right tail of null)
x_I <- x[x >= crit_val]
polygon(c(x_I, rev(x_I)),
        c(dnorm(x_I, 0, 1), rep(0, length(x_I))),
        col = rgb(1, 0, 0, 0.4), border = NA)

# Type II error region (left tail of alternative)
x_II <- x[x <= crit_val]
polygon(c(x_II, rev(x_II)),
        c(dnorm(x_II, d/se, 1), rep(0, length(x_II))),
        col = rgb(0, 0, 1, 0.3), border = NA)

abline(v = crit_val, lty = 2, col = "black", lwd = 1.5)

legend("topright",
       legend = c(expression(H[0]~Distribution),
                  expression(H[1]~Distribution),
                  expression(paste("Type I Error (", alpha, ")")),
                  expression(paste("Type II Error (", beta, ")"))),
       col = c("navy","darkgreen",
               rgb(1,0,0,0.6), rgb(0,0,1,0.5)),
       lty = c(1,1,NA,NA),
       pch = c(NA,NA,15,15),
       pt.cex = 1.5,
       lwd = c(2,2,NA,NA),
       cex = 0.78, bty = "n")
Figure 4

8.5 Effect Size and Statistical Power

A statistically significant result (\(p < \alpha\)) tells us the observed effect is unlikely under \(H_0\), but it says nothing about the size or practical importance of the effect (Cumming, 2014; Sullivan & Feinn, 2012). Two complementary metrics fill this gap.

8.5.1 Effect Size

Cohen’s \(d\) is the standardized mean difference between two groups:

\[ d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}} \]

Conventional benchmarks (Table 4) (Cohen, 1988):

Table 4: Conventional benchmarks for Cohen’s \(d\)

Cohen’s \(d\)   Interpretation   Kinesiology example
0.2             Small            1–2 cm difference in jump height
0.5             Medium           ~5 cm difference in jump height
0.8             Large            ~8 cm difference in jump height

These benchmarks are domain-general defaults; researchers should consult domain-specific norms when available (Lakens, 2013).

Movement science example. In a study comparing maximal isometric strength (N) between trained and untrained adults:

Code
trained   <- c(350, 380, 410, 390, 420, 360, 405, 430, 375, 395)
untrained <- c(250, 270, 290, 260, 280, 300, 265, 285, 275, 295)

# Pooled SD: this simple average of the two variances assumes equal group sizes
s_pooled <- sqrt((var(trained) + var(untrained)) / 2)
cohens_d <- (mean(trained) - mean(untrained)) / s_pooled

writeLines(c(
  paste0("Mean trained  : ", round(mean(trained),   1), " N"),
  paste0("Mean untrained: ", round(mean(untrained), 1), " N"),
  paste0("Pooled SD     : ", round(s_pooled,        2), " N"),
  paste0("Cohen's d     : ", round(cohens_d,        2), " (large effect)")
))
Mean trained  : 391.5 N
Mean untrained: 277 N
Pooled SD     : 21.42 N
Cohen's d     : 5.34 (large effect)

8.5.2 Statistical Power

Statistical power (\(1 - \beta\)) is the probability of correctly rejecting \(H_0\) when \(H_1\) is true (Cohen, 1988). It is determined by four interacting factors:

  • Effect size (\(d\)): larger effects are easier to detect.
  • Sample size (\(n\)): larger samples yield more power.
  • Significance level (\(\alpha\)): relaxing \(\alpha\) increases power but also increases Type I errors.
  • Variability (\(\sigma\)): lower measurement error increases power.

The conventional minimum power standard is 0.80 (Cohen, 1988), meaning an 80% chance of detecting a true effect. Studies with power below 0.50 are sometimes called “underpowered” and are at substantial risk of Type II errors (see Figure 5).

Code
#| fig-cap: "Statistical power as a function of sample size (per group) for a two-sample independent t-test (α = 0.05, two-tailed) at three effect sizes: small (d = 0.2), medium (d = 0.5), and large (d = 0.8). The dashed horizontal line marks the conventional 0.80 power target."

n_seq <- seq(5, 200, by = 5)
ds    <- c(0.2, 0.5, 0.8)
cols  <- c("salmon", "steelblue", "darkgreen")

plot(NULL, xlim = c(5, 200), ylim = c(0, 1),
     xlab = "Sample Size (per group)", ylab = "Statistical Power",
     main = expression(paste("Power Curves (", alpha, " = 0.05, two-tailed)")))
abline(h = 0.80, lty = 2, col = "gray40")
text(190, 0.82, "0.80", col = "gray40", cex = 0.8)

for (i in seq_along(ds)) {
  d  <- ds[i]
  pw <- sapply(n_seq, function(n) {
    ncp <- d * sqrt(n / 2)      # non-centrality parameter for independent t-test
    crit <- qt(0.975, df = 2*(n-1))
    pt(crit, df = 2*(n-1), ncp = ncp, lower.tail = FALSE) +
    pt(-crit, df = 2*(n-1), ncp = ncp, lower.tail = TRUE)
  })
  lines(n_seq, pw, col = cols[i], lwd = 2)
}
legend("bottomright",
       legend = c("d = 0.2 (small)", "d = 0.5 (medium)", "d = 0.8 (large)"),
       col = cols, lwd = 2, cex = 0.8, bty = "n")
Figure 5

Practical takeaway: Detecting a small effect (\(d = 0.2\)) with 80% power requires roughly 400 participants per group, far beyond the range shown in Figure 5 and a sobering number for under-resourced kinesiology labs. A large effect (\(d = 0.8\)) can be reliably detected with ~25–30 participants per group (Cohen, 1988).

8.6 One- and Two-Tailed Tests

A one-tailed test localizes all of \(\alpha\) in a single tail of the null distribution; a two-tailed test splits \(\alpha\) equally between both tails. The choice is driven by the directional specificity of \(H_1\) (Figure 6) (Bland & Altman, 1994; Weir & Vincent, 2021).

Code
#| fig-cap: "Rejection regions for one-tailed (left) and two-tailed (right) tests at α = 0.05. In the one-tailed test all 5% is in the right tail; in the two-tailed test 2.5% is allocated to each tail. Shaded red regions are rejection regions; the shaded gray region shows the non-rejection region."

par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

z  <- seq(-3.5, 3.5, length.out = 500)
yz <- dnorm(z)

# --- One-tailed ---
plot(z, yz, type = "l", lwd = 2, col = "navy",
     main = "One-Tailed (Right)\nα = 0.05",
     xlab = "z", ylab = "Density")
# Gray non-rejection region
z_nr <- z[z <= qnorm(0.95)]
polygon(c(z_nr, rev(z_nr)),
        c(dnorm(z_nr), rep(0, length(z_nr))),
        col = "lightgray", border = NA)
# Red rejection region
z_rj <- z[z >= qnorm(0.95)]
polygon(c(z_rj, rev(z_rj)),
        c(dnorm(z_rj), rep(0, length(z_rj))),
        col = rgb(1, 0, 0, 0.5), border = NA)
abline(v = qnorm(0.95), lty = 2, col = "red")
text(qnorm(0.95) + 0.1, 0.20, paste0("z = ", round(qnorm(0.95),2)),
     col = "red", cex = 0.75, adj = 0)
text(2.8, 0.05, "α = 0.05", col = "red", cex = 0.75)

# --- Two-tailed ---
plot(z, yz, type = "l", lwd = 2, col = "navy",
     main = "Two-Tailed\nα = 0.05",
     xlab = "z", ylab = "Density")
# Gray non-rejection region
z_nr2 <- z[z >= qnorm(0.025) & z <= qnorm(0.975)]
polygon(c(z_nr2, rev(z_nr2)),
        c(dnorm(z_nr2), rep(0, length(z_nr2))),
        col = "lightgray", border = NA)
# Red left tail
z_L <- z[z <= qnorm(0.025)]
polygon(c(z_L, rev(z_L)), c(dnorm(z_L), rep(0, length(z_L))),
        col = rgb(1,0,0,0.5), border = NA)
# Red right tail
z_R <- z[z >= qnorm(0.975)]
polygon(c(z_R, rev(z_R)), c(dnorm(z_R), rep(0, length(z_R))),
        col = rgb(1,0,0,0.5), border = NA)
abline(v = qnorm(0.025), lty = 2, col = "red")
abline(v = qnorm(0.975), lty = 2, col = "red")
text(qnorm(0.025) - 0.1, 0.20, paste0("z = ", round(qnorm(0.025),2)),
     col = "red", cex = 0.72, adj = 1)
text(qnorm(0.975) + 0.1, 0.20, paste0("z = ", round(qnorm(0.975),2)),
     col = "red", cex = 0.72, adj = 0)
text(-3, 0.05, "α/2", col = "red", cex = 0.75)
text( 3, 0.05, "α/2", col = "red", cex = 0.75)

par(mfrow = c(1,1))
Figure 6

Two-tailed tests are the default in most kinesiology research; one-tailed tests require strong a priori directional predictions grounded in prior literature (Bland & Altman, 1994; Weir & Vincent, 2021). See the companion post on directional hypotheses for a detailed treatment.

9 Summary

Inferential statistics provides the mathematical tools to make principled conclusions about populations from limited samples. The core chain of reasoning is:

  1. Clearly define the population and formulate \(H_0\) and \(H_1\).
  2. Draw a random, representative sample and measure variables carefully.
  3. Compute an appropriate test statistic; locate it on the relevant null distribution.
  4. Calculate the p-value and compare it to the pre-specified \(\alpha\).
  5. Report the decision, the effect size, the confidence interval, and the power of the study.

No single p-value tells the complete story. Effect sizes and confidence intervals are essential complements that reveal the magnitude and precision of an effect—information that a binary reject/fail-to-reject decision discards (Cumming, 2014; Lakens, 2013; Sullivan & Feinn, 2012).

9.1 Check your Knowledge

Question 1. What is the main purpose of inferential statistics?

- [ ] To summarize the exact characteristics of the collected sample data.
- [x] To use sample data to make probability-based claims about a broader population.
- [ ] To eliminate error from measurement instruments.
- [ ] To ensure the sample perfectly matches the population.

> Inferential statistics uses probability to estimate population parameters from sample statistics.

Question 2. Which distribution is used to compute confidence intervals and test statistics when the population standard deviation is unknown and the sample size is small?

- [ ] Standard normal distribution
- [ ] Chi-square distribution
- [x] Student's t-distribution
- [ ] F-distribution

> The Student's t-distribution has heavier tails than the normal distribution, accounting for the added uncertainty of having to estimate the population standard deviation from the sample.

Question 3. What does the Central Limit Theorem (CLT) state about the sampling distribution of the sample mean?

- [ ] It takes the exact shape of the underlying population distribution.
- [x] It approaches a normal distribution as sample size increases, regardless of the population's shape.
- [ ] It becomes perfectly uniform at $n > 30$.
- [ ] It only applies if the population is normally distributed.

> The CLT explains why sample means tend to follow a normal curve, justifying parametric tests even for skewed data (at sufficient sample sizes).

Question 4. In hypothesis testing, what is a Type I error?

- [x] Concluding an effect exists when the null hypothesis is actually true (a false positive).
- [ ] Failing to detect a real effect when the alternative hypothesis is true (a false negative).
- [ ] Using the wrong test statistic for the data.
- [ ] Using an insufficient sample size to achieve 80% power.

> A Type I error occurs when random chance produces an extreme sample result, leading the researcher to incorrectly reject a true null hypothesis.

Question 5. If a 95% confidence interval for a mean is [12.4, 18.6], what is the correct interpretation?

- [ ] 95% of the data points in the sample fall between 12.4 and 18.6.
- [ ] There is a 95% chance that the next subject tested will score between 12.4 and 18.6.
- [x] If the study were repeated many times, we would expect about 95% of the resulting intervals to contain the true population mean.
- [ ] The true population mean changes 95% of the time.

> A confidence interval is an interval estimate; 95% confidence refers to the long-run success rate of the estimation procedure, not the probability that a specific interval contains the parameter.

Question 6. What does Cohen's $d$ measure?

- [ ] The probability that the null hypothesis is true.
- [x] The standardized mean difference between two groups (effect size).
- [ ] The standard error of the sampling distribution.
- [ ] The probability of making a Type II error.

> Cohen's $d$ is an effect size metric that expresses the difference between means in standard deviation units.

Question 7. What does statistical power ($1 - \beta$) represent?

- [ ] The probability of rejecting a true null hypothesis.
- [x] The probability of correctly rejecting the null hypothesis when an effect actually exists.
- [ ] The probability of failing to find a significant result.
- [ ] The probability that the sample mean perfectly equals the population mean.

> High statistical power means the study has a high chance of detecting a true effect, minimizing the risk of a Type II error.

Question 8. Why are two-tailed tests generally preferred as the scientific default over one-tailed tests?

- [ ] They have higher statistical power in the predicted direction.
- [x] They allow researchers to detect potentially harmful or unexpected effects in the opposite direction.
- [ ] They require half the sample size.
- [ ] They automatically correct for measurement error.

> Two-tailed tests protect against ignoring adverse outcomes and discourage post-hoc switching to a one-tailed test to achieve significance (a form of p-hacking).
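The CLT claim tested in Question 3 is easy to verify by simulation. The sketch below (in Python rather than the R used for this post's figures) draws repeated samples from a heavily right-skewed exponential population, whose mean and standard deviation are both 1, and shows that the distribution of sample means concentrates around the population mean with spread close to the CLT prediction of $\sigma/\sqrt{n}$.

```python
import random
import statistics

random.seed(42)  # reproducible illustration

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n drawn from a skewed
    exponential population (population mean = 1, SD = 1)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

for n in (2, 30):
    means = sample_means(n)
    # CLT prediction: mean of sample means ≈ 1, SD ≈ 1 / sqrt(n)
    print(f"n={n:>2}: mean of sample means = {statistics.fmean(means):.3f}, "
          f"SD = {statistics.stdev(means):.3f} "
          f"(CLT predicts SD ≈ {1 / n ** 0.5:.3f})")
```

Plotting a histogram of `sample_means(30)` would show the familiar bell shape emerging even though every individual observation comes from a sharply skewed distribution.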

Image credit

Illustration by Elisabet Guba from Ouch!

References

Atkinson, G., & Nevill, A. M. (1998). Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Medicine, 26(4), 217–238. https://doi.org/10.2165/00007256-199826040-00002
Bland, J. M., & Altman, D. G. (1994). Statistics notes: One and two sided tests of significance. BMJ, 309(6949), 248. https://doi.org/10.1136/bmj.309.6949.248
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. L. Erlbaum Associates.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.
Gravetter, F. J., Wallnau, L. B., & Forzano, L.-A. B. (2021). Statistics for the behavioral sciences (10th ed.). Cengage Learning.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863. https://doi.org/10.3389/fpsyg.2013.00863
Rosner, B. (2015). Fundamentals of biostatistics (8th ed.). Cengage Learning.
Sullivan, G. M., & Feinn, R. (2012). Using effect size—or why the p value is not enough. Journal of Graduate Medical Education, 4(3), 279–282. https://doi.org/10.4300/JGME-D-12-00156.1
Thomas, J. R., Nelson, J. K., & Silverman, S. J. (2015). Research methods in physical activity (7th ed.). Human Kinetics.
Weir, J. P., & Vincent, W. J. (2021). Statistics in kinesiology (5th ed.). Human Kinetics.

Reuse

Citation

BibTeX citation:
@misc{furtado2026,
  author = {Furtado, Ovande},
  title = {Fundamentals of {Inferential} {Statistics}},
  date = {2026-03-04},
  url = {https://drfurtado.github.io/randomstats/posts/022523-inferential-stats/},
  langid = {en}
}
For attribution, please cite this work as:
Furtado, O. (2026, March 4). Fundamentals of Inferential Statistics. RandomStats. https://drfurtado.github.io/randomstats/posts/022523-inferential-stats/