Descriptive Statistics

This post aims to provide master-level students in kinesiology with a strong foundation in descriptive statistics, including measures of central tendency and variability, the coefficient of variation, and tips for presenting and interpreting results.

Keywords: mean, median, mode, variance, standard deviation, interquartile range, range, coefficient of variation

Affiliation: Cal State Northridge

Published: February 18, 2023

1 Learning Objectives

  1. Define descriptive statistics and explain their importance in data analysis.
  2. Describe the different measures of central tendency, including the mean, median, and mode, and when to use them.
  3. Explain the different measures of variability, such as the range, interquartile range, variance, and standard deviation, and how they are used.
  4. Provide examples of how descriptive statistics can be used in Kinesiology research and practice, such as analyzing exercise performance data or comparing outcomes of different interventions.
  5. Discuss the limitations of descriptive statistics and when inferential statistics should be used instead.
  6. Introduce the coefficient of variation as a measure of relative variability and how it can be used to compare the variability of different data sets.
  7. Demonstrate how to calculate descriptive statistics using statistical software, such as jamovi.
  8. Provide tips and best practices for presenting and interpreting descriptive statistics in research papers and presentations.
  9. Address common misconceptions or challenges related to descriptive statistics and how to overcome them.
  10. Encourage readers to continue learning about descriptive statistics and their applications in Kinesiology research and practice.

2 Symbols

| Measure | Symbol |
|---|---|
| Mean (population) | \(\mu\) |
| Mean (sample) | \(\bar{x}\) |
| Median | \(med\) |
| Mode | \(mode\) |
| Range | \(R\) |
| Interquartile Range | \(IQR\) |
| Variance (population) | \(\sigma^2\) |
| Variance (sample) | \(s^2\) |
| Standard Deviation (population) | \(\sigma\) |
| Standard Deviation (sample) | \(s\) |

3 Measures of Central Tendency

As a master’s student in Kinesiology, you are likely already familiar with the importance of collecting and analyzing data in your research. However, presenting the raw data in a table or graph is not enough. To truly understand your data and draw meaningful conclusions, you need to use various statistical tools, including measures of central tendency. Measures of central tendency are statistical values representing the central characteristics of a data set [weir2021?]. These measures are essential for summarizing data and providing a snapshot of the overall distribution. For example, in Kinesiology, you may use measures of central tendency to describe the performance of athletes, the prevalence of certain conditions or injuries, or the effectiveness of different interventions.

There are three main measures of central tendency: the mean, the median, and the mode. The mean is the arithmetic average of a data set, while the median is the middle value when the data is arranged in order. The mode is the most frequently occurring value. Each measure has advantages and disadvantages, and the appropriate measure depends on the specific data and research question. It is worth noting that measures of central tendency should not be used in isolation. Instead, they should be used in conjunction with other statistical tools, such as measures of variability, to provide a comprehensive picture of the data. For example, a study that reports only the mean without reporting the standard deviation or range may be misleading and potentially lead to incorrect conclusions. Furthermore, measures of central tendency are not appropriate for all types of data. For example, if the data is skewed or contains extreme outliers, the mean may not accurately represent the “typical” value. In these cases, the median or mode may be more appropriate.

In this section, we will explore the different measures of central tendency in more detail. We will discuss how to calculate each measure, their properties and limitations, and when to use each measure in practice. We will also cover some common mistakes to avoid when using measures of central tendency and provide best practices for selecting and interpreting these measures. By the end of this article, you should have a solid understanding of how to use measures of central tendency to describe and analyze data in Kinesiology research.

3.1 Mean

The mean is the most commonly used measure of central tendency in Kinesiology research. It is the arithmetic average of a data set and is calculated by adding up all the values and dividing by the total number of values[weir2021?]. The mean is a useful measure because it considers all the values in a data set and is sensitive to small changes in the data.

In Kinesiology, you may use the mean to describe the average performance of a group of athletes, the average length of time it takes for an injury to heal, or the average dose of a medication administered to patients. However, it is important to remember that the mean is not always the most appropriate measure of central tendency, especially when the data is skewed or contains extreme outliers.

One of the critical properties of the mean is that extreme values or outliers can influence it. For example, if you have a data set of the time it takes for athletes to complete a race, and one athlete takes significantly longer than the others, their time will have a disproportionate effect on the mean. In these cases, it may be more appropriate to use the median, which is not as sensitive to extreme values.

In Kinesiology, you might use the mean to represent the average performance of athletes, such as the average running speed of a group of runners or the average strength gains of a group of weightlifters after a training intervention. The mean can also be used to describe the distribution of a variable, such as the mean body mass index (BMI) of a sample.

To calculate the mean, follow these steps:

  1. Add up all the values in the dataset.

  2. Count the number of observations in the dataset.

  3. Divide the total sum by the number of observations.

For example, suppose you have the following data representing the time (in seconds) it takes for athletes to complete a sprint:

12.5, 10.8, 11.2, 13.1, 12.9, 11.7, 12.3

To calculate the mean, add up all the values in the dataset:

12.5 + 10.8 + 11.2 + 13.1 + 12.9 + 11.7 + 12.3 = 84.5

Then, count the number of observations in the dataset: 7

Finally, divide the total sum by the number of observations to get the mean: 84.5 / 7 = 12.07 seconds.

So the mean time to complete the sprint is 12.07 seconds.
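The three steps above can also be sketched in Python as a quick check. This is a minimal illustration using only the standard library; the variable names are chosen for this example and are not part of the original post.

```python
from statistics import mean

# Sprint times in seconds, taken from the worked example above
sprint_times = [12.5, 10.8, 11.2, 13.1, 12.9, 11.7, 12.3]

total = sum(sprint_times)   # step 1: add up all the values
n = len(sprint_times)       # step 2: count the observations
manual_mean = total / n     # step 3: divide the sum by the count

# The standard library function should agree with the manual steps
library_mean = mean(sprint_times)

print(round(manual_mean, 2), round(library_mean, 2))
```

Rounding to two decimals mirrors how the result is reported in the text; the unrounded value keeps more digits.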

In Kinesiology research, you may use statistical software to calculate the mean for larger datasets. One example of such software is jamovi, a free and open-source statistical package. Here are the steps to calculate the mean using jamovi:

  1. Open jamovi and create a new dataset or import an existing dataset.

  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.

  3. Move the variable(s) for which you want to calculate the mean into the “Variables” box.

  4. Under “Statistics”, make sure “Mean” is checked. jamovi updates the results panel automatically, and the mean is shown in the output table.

For example, suppose you have a dataset with the following variables: age, weight, height, and maximum bench press. Then, to calculate the mean for the maximum bench press variable using jamovi, follow these steps:

  1. Open jamovi and create a new dataset or import an existing dataset that contains the maximum bench press variable.

  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.

  3. Move the “maximum bench press” variable into the “Variables” box.

  4. Under “Statistics”, make sure “Mean” is checked. The mean for the maximum bench press variable is shown in the output table, which updates automatically.

Calculating the mean is a simple and important tool in Kinesiology research. It can help you understand a group’s average performance or characteristics and inform future interventions or studies.

The formula for calculating the mean is:

Mean = (sum of all values in the dataset) / (number of observations)

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

where \(n\) is the total number of values in the set, \(x_i\) is the \(i\)th value in the set, and \(\sum_{i=1}^{n}\) represents the sum of all the values from \(i=1\) to \(i=n\).

3.2 Median

The median is another commonly used measure of central tendency in Kinesiology research. It represents the middle value in a set of data when the values are arranged in order of magnitude [weir2021?]. Unlike the mean, the median is not influenced by extreme values, making it a robust measure of central tendency. However, it may not be as sensitive to small changes in the data as the mean.

In Kinesiology, you might use the median to represent the typical or “average” value when the data contains extreme values that can skew the mean, such as when analyzing the salaries of professional athletes, where a few individuals may earn significantly more than the rest of the population.

To calculate the median, follow these steps:

  1. Arrange the data in order of magnitude.

  2. If the number of observations is odd, the median is the middle value. If the number of observations is even, the median is the average of the two middle values.

For example, suppose you have the following data representing the time (in seconds) it takes for athletes to complete a sprint:

12.5, 10.8, 11.2, 13.1, 12.9, 11.7, 12.3

To calculate the median, first arrange the data in order of magnitude:

10.8, 11.2, 11.7, 12.3, 12.5, 12.9, 13.1

The number of observations is odd, so the median is the middle value, which is 12.3 seconds.
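The odd and even cases can be sketched in Python as follows. The helper `manual_median` is a hypothetical name written for this illustration, and the even-sized example simply drops the last recorded time to show the second rule; neither comes from the original post.

```python
from statistics import median

def manual_median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)      # arrange the data in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two

# The seven sprint times (in seconds) from the example
sprint_times = [12.5, 10.8, 11.2, 13.1, 12.9, 11.7, 12.3]

odd_result = manual_median(sprint_times)        # 7 observations
even_result = manual_median(sprint_times[:-1])  # 6 observations

print(odd_result, even_result, median(sprint_times))
```

The standard library's `statistics.median` applies the same two rules, so it can be used directly once the logic is understood.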

In Kinesiology research, you may use statistical software to calculate the median for larger datasets. One example of such software is jamovi, a free and open-source statistical package. Here are the steps to calculate the median using jamovi:

  1. Open jamovi and create a new dataset or import an existing dataset.
  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.
  3. Move the variable(s) for which you want to calculate the median into the “Variables” box.
  4. Under “Statistics”, make sure “Median” is checked. jamovi updates the results panel automatically, and the median is shown in the output table.

For example, suppose you have a data set with the following variables: age, weight, height, and maximum bench press. Then, to calculate the median for the maximum bench press variable using jamovi, follow these steps:

  1. Open jamovi and create a new data set or import an existing data set that contains the maximum bench press variable.

  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.

  3. Move the “maximum bench press” variable into the “Variables” box.

  4. Under “Statistics”, make sure “Median” is checked. The median for the maximum bench press variable is shown in the output table, which updates automatically.

Calculating the median is important in Kinesiology research when the data contains extreme values that can skew the mean. It can help you understand a group’s typical value or performance and inform future interventions or studies.

3.3 Mode

The mode is the value that occurs most frequently in a data set. It is another measure of central tendency that can be useful in Kinesiology research. The mode is often used with categorical (nominal) data [huck2004?], such as the type of injury sustained in a particular sport, and with discrete counts, such as the number of times a certain exercise is performed in a week.

In Kinesiology, the mode can help researchers identify the most common value or category in a data set, which can inform interventions or training programs. For example, suppose a study on exercise habits in older adults finds that the mode for the number of times per week that individuals engage in moderate-intensity exercise is three. This information can then be used to tailor an exercise program to the needs of the participants.

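To calculate the mode, identify the value or category that occurs most frequently in the data set. In some cases, there may be more than one mode (several values tied for the highest frequency) or no meaningful mode at all (every value occurs equally often).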

For example, suppose you have the following data representing the number of hours of sleep that athletes report getting each night:

7, 6, 7, 8, 8, 8, 6, 7, 7, 9

In this case, the mode is 7 hours of sleep per night, as this value occurs most frequently in the data set.
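Counting frequencies by hand becomes tedious quickly, so here is a minimal Python sketch of the same idea. It uses `collections.Counter` to tally each value and `statistics.multimode` to return every value tied for the highest frequency; the second, bimodal list is made up purely to illustrate a tie.

```python
from collections import Counter
from statistics import multimode

# Reported hours of sleep per night, from the example above
sleep_hours = [7, 6, 7, 8, 8, 8, 6, 7, 7, 9]

counts = Counter(sleep_hours)     # how often each value occurs
modes = multimode(sleep_hours)    # list of the most frequent value(s)

# A made-up data set with two values tied for most frequent
bimodal = multimode([1, 1, 2, 2, 3])

print(counts.most_common(), modes, bimodal)
```

Returning a list rather than a single number makes the "more than one mode" case explicit instead of silently picking one value.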

In Kinesiology research, you may use statistical software to calculate the mode for larger data sets. For example, here are the steps to calculate the mode using jamovi:

  1. Open jamovi and create a new data set or import an existing data set.
  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.
  3. Move the variable(s) for which you want to calculate the mode into the “Variables” box.
  4. Under “Statistics”, check “Mode”. jamovi updates the results panel automatically, and the mode is shown in the output table.

For example, suppose you have a data set with the following variables: age, weight, height, and sport played. Then, to calculate the mode for the “sport played” variable using jamovi, follow these steps:

  1. Open jamovi and create a new data set or import an existing one containing the “sport played” variable.

  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.

  3. Move the “sport played” variable into the “Variables” box.

  4. Under “Statistics”, check “Mode”. The mode for the “sport played” variable is shown in the output table, which updates automatically.

When working with categorical or nominal data, calculating the mode can be useful in Kinesiology research. It can help you identify the most common value or category, which can inform interventions or training programs. However, it is important to note that the mode may not always be the most appropriate measure of central tendency, particularly when the data set is skewed or contains extreme values. In these cases, the mean or median may be more appropriate measures of central tendency.

3.4 Comparing the Measures of Central Tendency

One way to decide which measure of central tendency to use is to consider the distribution of the data. If the data are normally distributed and have no extreme values, the mean may be the most appropriate measure of central tendency. However, if the data are skewed or have extreme values, the median or mode may be more appropriate.

For example, consider a study on the effects of a training program on sprint times. The data set includes the following times for a group of athletes:

9.5, 10.1, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11.0

In this case, the mean would be an appropriate measure of central tendency because the data are roughly symmetric and have no extreme values. The mean, in this case, is 10.48 seconds.

However, consider another study on the effects of a training program on strength gains. The dataset includes the following gains in one-rep maximum for a group of athletes:

10, 15, 20, 25, 30, 35, 40, 45, 50, 200

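In this case, the median would be a more appropriate measure of central tendency because the data are skewed due to the extreme value of 200. The median, in this case, is 32.5 pounds (the average of the two middle values, 30 and 35), whereas the mean is pulled up to 47 pounds by the single extreme value.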
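To see how strongly one extreme value moves the mean compared with the median, the short Python sketch below computes both for the strength-gain data with and without the value of 200. Dropping the extreme value is done here only to illustrate sensitivity, not as a recommendation for handling outliers.

```python
from statistics import mean, median

# Gains in one-rep maximum (pounds) from the example above
strength_gains = [10, 15, 20, 25, 30, 35, 40, 45, 50, 200]
without_extreme = strength_gains[:-1]   # same data minus the value of 200

mean_all = mean(strength_gains)         # 470 / 10
median_all = median(strength_gains)     # average of 30 and 35
mean_trimmed = mean(without_extreme)    # 270 / 9
median_trimmed = median(without_extreme)

print(mean_all, median_all)             # 47 32.5
print(mean_trimmed, median_trimmed)     # 30 30
```

Removing a single observation shifts the mean by 17 pounds but the median by only 2.5 pounds, which is why the median is described as robust.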

In some cases, the mode may be the most appropriate measure of central tendency. For example, consider a study on the number of times a certain exercise is performed in a week. The dataset includes the following responses:

1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7

In this case, the mode would be the most appropriate measure of central tendency because the data are discrete counts, and the frequency of each value is of interest. The mode, in this case, is 7, indicating that the exercise is most commonly performed 7 times per week.

It is important to note that the choice of measure of central tendency can affect the interpretation of the data. For example, if the mean is used as the measure of central tendency for a skewed data set, the resulting value may not accurately represent the typical value in the data set. Therefore, it is important to carefully consider the distribution of the data and choose the appropriate measure of central tendency for the research question at hand.

In addition to considering the distribution of the data, Kinesiology researchers should also consider other factors such as the sample size, the level of measurement of the data, and the research question. By carefully considering these factors, researchers can choose the most appropriate measure of central tendency and ensure that their analysis accurately represents the underlying data.

The table below provides a quick summary of the definitions, calculations, and usefulness of the mean, median, and mode. It can be a helpful reference for Kinesiology researchers when deciding which measure of central tendency to use for a particular data set.

| Measure of Central Tendency | Definition | Calculation | Usefulness |
|---|---|---|---|
| Mean | The average of a set of numbers | Sum of values divided by number of values | Useful for data that are normally distributed and have no extreme values |
| Median | The middle value in a set of numbers | Order values and find the middle value | Useful for data with extreme values or that are not normally distributed |
| Mode | The value that occurs most frequently in a set of numbers | Identify the value that appears most often | Useful for categorical or nominal data |

4 Measures of Variability

When conducting research in the field of Kinesiology, it is essential to collect and analyze data in order to draw meaningful conclusions. However, simply collecting data is not enough. Researchers must also be able to effectively analyze and interpret the data they have collected in order to draw conclusions that are meaningful and accurate[weir2021?]. One of the key aspects of data analysis is understanding measures of variability.

Measures of variability are important because they help us understand how much the individual data points in a data set vary from one another. In other words, they help us understand how spread out the data is. The measures of variability that we will be discussing in this blog include variance, standard deviation, range, and interquartile range.

Variance and standard deviation are measures of how far the data points in a data set are from the mean. The range is a simple measure of the spread of the data, while the interquartile range is a more robust measure of the spread that is not influenced by extreme values in the data set.

Understanding measures of variability is important because they can help researchers understand the precision and consistency of their results. For example, a small standard deviation indicates that the data points are close to the mean, while a large standard deviation indicates that the data points are spread out. Understanding these measures can help researchers better interpret the meaning of their results and draw more meaningful conclusions.

In this section, we will discuss each of these measures of variability in detail, including how to calculate them, how to interpret them, and how to choose the appropriate measure of variability based on the research question, data distribution, and sample size. By the end of this blog, you will have a solid understanding of these measures of variability and be able to effectively apply them to your own research in the field of Kinesiology.

4.1 Range

The range is the simplest measure of variability and is calculated as the difference between the largest and smallest values in a data set[huck2004?]. The range is useful because it gives an idea of the spread of the data, but it has some limitations.

The main limitation of the range is that it is sensitive to outliers or extreme values that are much higher or lower than the rest of the data[weir2021?]. The range may be very large if a data set has one or more extreme values, even if most of the data is tightly clustered. Therefore, the range should always be interpreted in conjunction with other measures of variability, such as the standard deviation.

To calculate the range of a dataset, we subtract the smallest value from the largest value. For example, if we have the following set of data:

10, 15, 18, 20, 22, 25

The range is:

25 - 10 = 15
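A short Python sketch of the same subtraction is shown below. The extra value of 60 is hypothetical and is added only to illustrate how a single extreme observation inflates the range.

```python
# Data from the example above
data = [10, 15, 18, 20, 22, 25]

data_range = max(data) - min(data)   # largest minus smallest value

# Hypothetical extreme value appended to show the range's sensitivity
with_extreme = data + [60]
range_with_extreme = max(with_extreme) - min(with_extreme)

print(data_range, range_with_extreme)   # 15 50
```

One added observation more than triples the range even though the other six values are unchanged.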

To calculate the range using jamovi, we can follow these steps:

  1. Open jamovi and create a new dataset or import an existing one.
  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.
  3. Select the variable(s) you want to analyze and click on the arrow to move them to the “Variables” box.
  4. Under “Statistics”, check the “Minimum” and “Maximum” boxes (jamovi also offers a “Range” option).
  5. jamovi updates the output automatically, including the minimum and maximum values for each selected variable.
  6. Subtract the minimum from the maximum value to obtain the range.

The same steps apply to any variable. For example, suppose jamovi reports a minimum of 196 ms and a maximum of 454 ms for a reaction time variable. The range of that data set is then:

454 - 196 = 258 ms

Overall, the range is a simple measure of variability that can provide an idea of the spread of the data. However, it should be interpreted with caution, especially when there are outliers. It is often used in conjunction with other measures of variability, such as the standard deviation, to provide a more complete picture of the data set.

4.2 Interquartile Range

The interquartile range (IQR) is a measure of variability that is less sensitive to outliers than the range[weir2021?]. It is based on the concept of quartiles, which divide a data set into four equal parts. The IQR is the difference between the upper and lower quartiles and represents the spread of the middle 50% of the data.

To calculate the IQR, we first need to find the values of the first quartile (Q1) and the third quartile (Q3). The first quartile is the value that separates the bottom 25% of the data from the top 75%, while the third quartile separates the bottom 75% of the data from the top 25%.

Once we have found Q1 and Q3, we can calculate the IQR as follows:

$$
\text{IQR} = Q_3 - Q_1
$$

where $Q_1$ is the first quartile and $Q_3$ is the third quartile.

The IQR is a useful measure of variability because it provides information about the range of the middle 50% of the data, where most of the data are typically located. In addition, it is less sensitive to outliers than the range, which makes it a more robust measure of variability.

To calculate the IQR using jamovi, we can follow these steps:

  1. Open jamovi and create a new dataset or import an existing one.
  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.
  3. Select the variable(s) you want to analyze and click on the arrow to move them to the “Variables” box.
  4. Under “Statistics”, check “Percentiles” (25, 50, and 75 by default) to obtain the quartiles, or check “IQR” to obtain the interquartile range directly.
  5. jamovi updates the output automatically, showing the values of Q1, Q2, and Q3 for each selected variable.
  6. Subtract Q1 from Q3 to obtain the IQR.

For example, suppose we have the following data set:

15, 18, 20, 22, 25, 30, 35, 40, 45, 50, 55

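Taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data (excluding the overall median), Q1 is 20, Q2 (the median) is 30, and Q3 is 45. Statistical software such as jamovi may use a different interpolation rule and report slightly different quartiles. With these values, the IQR is:

IQR = Q3 - Q1 = 45 - 20 = 25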
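Quartiles can be computed with more than one rule, and the choice changes the result for small data sets. The sketch below uses Python's `statistics.quantiles` with its two built-in methods; the "exclusive" method reproduces the quartiles of 20 and 45 for this data set, while the "inclusive" method interpolates differently. Which rule a given statistics package applies is not assumed here.

```python
from statistics import quantiles

# Data from the example above
data = [15, 18, 20, 22, 25, 30, 35, 40, 45, 50, 55]

# "exclusive" rule: gives Q1 = 20, Q2 = 30, Q3 = 45 for this data
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# "inclusive" rule: interpolates between neighbouring values
q1_inc, q2_inc, q3_inc = quantiles(data, n=4, method="inclusive")
iqr_inc = q3_inc - q1_inc

print(q1, q2, q3, iqr)                  # 20.0 30.0 45.0 25.0
print(q1_inc, q2_inc, q3_inc, iqr_inc)  # 21.0 30.0 42.5 21.5
```

When reporting an IQR, it helps to state which software or quartile rule produced it, since both results above are legitimate.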

Overall, the interquartile range is a useful measure of variability that is less sensitive to outliers than the range. It provides information about the spread of the middle 50% of the data. It is often used in conjunction with other measures of variability, such as the standard deviation, to provide a more complete picture of the data set.

4.3 Variance

Variance is a measure of how much the individual data points in a data set deviate from the mean or average of the data set[weir2021?]. In other words, it gives us an idea of how much the data is spread out from the central tendency.

The formula for variance is:

\[s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}\]

where, \(s^2\) represents the sample variance, \(n\) is the number of observations, \(x_i\) is the value of the \(i\)th observation, and \(\bar{x}\) is the sample mean.

Let us take an example to illustrate how to calculate the variance. Suppose we have a sample dataset of 5 participants’ reaction times in milliseconds:

141, 150, 155, 132, 145

First, we calculate the mean:

\(\bar{x}\) = (141 + 150 + 155 + 132 + 145) / 5 = 144.6

Next, we subtract the mean from each data point and square the result:

(141 - 144.6)² = 12.96

(150 - 144.6)² = 29.16

(155 - 144.6)² = 108.16

(132 - 144.6)² = 158.76

(145 - 144.6)² = 0.16

Then, we sum up these values:

12.96 + 29.16 + 108.16 + 158.76 + 0.16 = 309.2

Finally, we divide this sum by n - 1 (4 in this case, since we are dealing with a sample):

variance (\(s^2\)) = 309.2 / 4 = 77.3

Therefore, the variance of this data set is 77.3 ms².
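The worked example can be retraced step by step in Python. This is a minimal sketch with illustrative variable names; `statistics.variance` divides by n - 1 (sample variance), while `statistics.pvariance` divides by n (population variance) and is included only for contrast.

```python
from statistics import pvariance, variance

# Reaction times in milliseconds from the example above
reaction_times = [141, 150, 155, 132, 145]

n = len(reaction_times)
sample_mean = sum(reaction_times) / n                        # 144.6
squared_devs = [(x - sample_mean) ** 2 for x in reaction_times]
sum_of_squares = sum(squared_devs)                           # about 309.2
sample_var = sum_of_squares / (n - 1)                        # divide by n - 1

# Library functions: sample variance and, for contrast, population variance
library_var = variance(reaction_times)
population_var = pvariance(reaction_times)

print(round(sample_var, 1), round(library_var, 1), round(population_var, 2))
```

The population version is smaller (61.84 versus 77.3 ms²) because it divides the same sum of squares by 5 instead of 4.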

While the variance is a useful measure of variability, its unit of measurement is squared, which can be difficult to interpret. To make the variance more interpretable, we can take the square root of it to obtain the standard deviation, which we will cover in the next section.

To calculate the variance using jamovi, we can follow these steps:

  1. Open jamovi and create a new data set or import an existing one.
  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.
  3. Select the variable(s) you want to analyze and click on the arrow to move them to the “Variables” box.
  4. Check the “Variance” box under “Statistics”.
  5. jamovi updates the output automatically, including the variance for each selected variable.

Overall, variance is a powerful tool that allows us to quantify the variability of our data, which can provide insights into individual differences or performance changes over time.

4.4 Standard Deviation

Standard deviation is a measure of how much the individual data points in a data set deviate from the mean or average of the data set [thorne2003?]. It is the square root of the variance, which makes it a more interpretable measure of variability because it is expressed in the same units as the data [weir2021?].

The formula for standard deviation is:

$$
s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}
$$

where, \(s\) represents the sample standard deviation, \(n\) is the number of observations, \(x_i\) is the value of the \(i\)th observation, and \(\bar{x}\) is the sample mean.

Let us continue with the example we used for calculating the variance in the previous section. We found that the variance of the reaction time data set was 77.3 ms². To calculate the standard deviation, we take the square root of the variance:

standard deviation (\(s\)) = √77.3 = 8.8 ms

Therefore, the standard deviation of this data set is 8.8 ms, which is a more interpretable measure of the spread of the data than the variance because it is expressed in the original units of the data set.
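As a quick check, the sketch below computes the sample standard deviation in Python in two ways: directly with `statistics.stdev`, and as the square root of the variance of 77.3 ms² found earlier. The variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Reaction times in milliseconds from the variance example
reaction_times = [141, 150, 155, 132, 145]

sample_sd = stdev(reaction_times)   # sample standard deviation (n - 1)
sd_from_variance = sqrt(77.3)       # square root of the sample variance

print(round(sample_sd, 1), round(sd_from_variance, 1))   # 8.8 8.8
```

Both routes give the same value because the standard deviation is defined as the square root of the variance.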

The standard deviation has useful properties that make it a preferred measure of variability in many contexts. For example, it is used to calculate confidence intervals, which provide a range of plausible values for the population mean based on a sample mean and standard deviation. It is also used in many statistical tests, such as t-tests and ANOVA, to determine whether there are significant differences between groups or conditions.

To calculate the standard deviation using jamovi, we can follow these steps:

  1. Open jamovi and create a new dataset or import an existing one.
  2. In the “Analyses” tab, click “Exploration” and select “Descriptives”.
  3. Select the variable(s) you want to analyze and click on the arrow to move them to the “Variables” box.
  4. Check the “Std. deviation” box under “Statistics”.
  5. jamovi updates the output automatically, including the standard deviation for each selected variable.

Overall, the standard deviation is a useful tool that quantifies the variability of our data in a more interpretable way than the variance. It can provide insights into individual differences, performance changes over time, and the significance of group differences.

4.5 Coefficient of Variation

The coefficient of variation (CV) is a statistical measure used to represent a set of data’s relative variation or dispersion[bowen2016?]. It is particularly useful for comparing the variability of data sets with different units or scales. For example, in Kinesiology, the CV can be used to compare the variability of different measurements or variables, such as the variability of running times among athletes of different ages or the variability of muscle strength among individuals with different body compositions.

The CV is defined as the ratio of the standard deviation (\(s\)) to the mean (\(\bar{x}\)) of the data, expressed as a percentage. The formula for calculating the CV is as follows:

$$
CV = \frac{s}{\bar{x}} \times 100\%
$$

where \(s\) is the standard deviation of the sample and \(\bar{x}\) is the mean of the sample. The coefficient of variation is expressed as a percentage.

For example, let us say we have two groups of athletes, Group A and Group B, and we want to compare the variability of their running times. The mean running time for Group A is 10 seconds, and the SD is 1 second, while the mean running time for Group B is 12 seconds, and the SD is 2 seconds. To calculate the CV for each group, we use the formula above:

CV(Group A) = (1 / 10) × 100% = 10%

CV(Group B) = (2 / 12) × 100% = 16.7%

From this calculation, we can see that the variability of running times in Group B is higher than in Group A, as the CV for Group B is higher than the CV for Group A.
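The comparison above can be sketched in Python with a small helper. The function name `coefficient_of_variation` is hypothetical and written for this illustration; it refuses a mean of zero because the ratio is undefined in that case.

```python
def coefficient_of_variation(sd, mean_value):
    """Return the CV as a percentage: (sd / mean) * 100."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean_value * 100

# Group summaries from the example above (running times in seconds)
cv_group_a = coefficient_of_variation(sd=1, mean_value=10)
cv_group_b = coefficient_of_variation(sd=2, mean_value=12)

print(round(cv_group_a, 1), round(cv_group_b, 1))   # 10.0 16.7
```

Because the CV divides by the mean, Group B's larger standard deviation is judged relative to its slower average time and is still proportionally larger.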

The CV is useful because it allows us to compare the variability of data sets with different scales or units. For example, suppose we want to compare the variability of body weight and bench press strength in a group of athletes. Body weight might be measured in kilograms, while bench press strength might be measured in pounds. It would not be meaningful to compare the absolute standard deviations of these two measurements, but the CV allows us to compare their relative variability.

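The CV can also highlight variables or measurements that are particularly variable relative to their mean. As a rough rule of thumb, a CV of less than 15% is often read as low variability and a CV of greater than 30% as high variability, although suitable thresholds depend on the measurement and the field. Because the CV divides by the mean, it is only meaningful for data measured on a ratio scale with a mean well above zero.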

The coefficient of variation is a useful statistical measure that allows us to compare the relative variability or dispersion of different data sets, even when they have different scales or units. It is a valuable tool for analyzing and interpreting data in Kinesiology research and can help researchers make meaningful comparisons between different groups or variables.

4.6 Comparing Measures of Variability

When comparing measures of variability, it is essential to consider the characteristics of the data and the research question. For example, the range and IQR are useful when the data is not normally distributed or when the researcher is interested in identifying outliers. On the other hand, variance and standard deviation are useful for normally distributed data and can provide more information about the spread of the distribution. It is also important to consider the units of the data when choosing a measure of variability.

For example, let us say we have the following data:

10, 12, 14, 16, 18, 20

The range is 10 (20 - 10), and the IQR is 6 (Q3 - Q1 = 18 - 12, taking Q1 and Q3 as the medians of the lower and upper halves of the data). The mean is 15, so the sum of squared deviations is 70; the sample variance is 70 / 5 = 14, and the sample standard deviation is approximately 3.74. The range and IQR describe the spread using only a few positions in the ordered data, whereas the variance and standard deviation use the distance of every data point from the mean. All four measures are expressed in the units of the data (squared units for the variance). If we were comparing the spread of two sets of data with different units, such as the weight and height of athletes, it would therefore be more appropriate to use the coefficient of variation, which is the ratio of the standard deviation to the mean expressed as a percentage.
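The four measures for this data set can be checked with Python's standard `statistics` module. This is a sketch under two assumptions: the variance and standard deviation use the sample (n - 1) formulas, and the quartiles are taken as the medians of the lower and upper halves of the ordered data. The helper `iqr_median_of_halves` is a hypothetical name written for this example; statistical software often uses other quartile conventions and may report a slightly different IQR.

```python
import statistics

data = [10, 12, 14, 16, 18, 20]


def iqr_median_of_halves(values):
    """IQR with Q1 and Q3 taken as the medians of the lower and upper halves.

    For an odd number of values, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return q3 - q1


data_range = max(data) - min(data)
iqr = iqr_median_of_halves(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
sd = statistics.stdev(data)           # sample standard deviation

print(f"Range: {data_range}")    # Range: 10
print(f"IQR: {iqr}")             # IQR: 6
print(f"Variance: {variance}")   # Variance: 14
print(f"SD: {sd:.2f}")           # SD: 3.74
```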

The table below provides a quick summary of the calculations, interpretation, advantages and disadvantages of the range, interquartile range, variance, and standard deviation. It can be a helpful reference for Kinesiology researchers when deciding which measure of variability to use for a particular data set.

| Measure of Variability | Calculation | Interpretation | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Range | Maximum value - Minimum value | The spread of the data from the smallest to the largest value | Easy to understand | Sensitive to outliers |
| Interquartile Range | Q3 - Q1 | The spread of the middle 50% of the data, from the first quartile to the third quartile | Resistant to outliers | Ignores values below the first quartile and above the third quartile |
| Variance | \(s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}\) | The average squared deviation of the data from the mean, in squared units | Widely used and well known | Can be sensitive to outliers |
| Standard Deviation | \(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}\) | The typical distance of the values from the mean, in the original units | Widely used and well known | Can be sensitive to outliers; more difficult to interpret than the range or IQR |
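The sensitivity to outliers noted in the table can be seen in a small Python sketch. The second data set is a hypothetical variation in which the largest value, 20, is replaced by 60. The `summarize` helper is an illustrative name, and the IQR again assumes that Q1 and Q3 are the medians of the lower and upper halves.

```python
import statistics


def summarize(values):
    """Return the range, IQR (median-of-halves quartiles), and sample SD."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "sd": statistics.stdev(ordered),
    }


original = [10, 12, 14, 16, 18, 20]
with_outlier = [10, 12, 14, 16, 18, 60]  # hypothetical: 20 replaced by 60

stats_original = summarize(original)
stats_outlier = summarize(with_outlier)

for label, stats in [("Original", stats_original), ("With outlier", stats_outlier)]:
    print(f"{label}: range={stats['range']}, IQR={stats['iqr']}, SD={stats['sd']:.2f}")
```

With these quartile definitions, the single extreme value inflates the range and the standard deviation but leaves the IQR unchanged.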

5 Summary

Descriptive statistics are an essential component of data analysis and are used to summarize and describe the important features of a data set. Measures of central tendency, such as the mean, median, and mode, provide information about the typical or central value of a data set, while measures of variability, such as the range, interquartile range, variance, and standard deviation, describe how spread out the data is. These measures can be used to gain insights into the data, identify patterns, and make comparisons across different groups or data sets.

It’s important for students in fields such as Kinesiology to understand these descriptive statistics and how they can be used to analyze and interpret data. With this knowledge, they can effectively communicate their findings and make informed decisions based on the results of their analyses.

Moreover, it’s essential to note that descriptive statistics can only be used to summarize and describe the data, and cannot be used to make generalizations or predictions about a larger population. Inferential statistics are necessary for these purposes and involve using statistical methods to draw conclusions from a sample and generalize them to a larger population.

Image credit

Illustration by Elisabet Guba from Ouch!

Citation

BibTeX citation:
@misc{furtado2023,
  author = {Furtado, Ovande},
  title = {Descriptive {Statistics}},
  date = {2023-02-18},
  url = {https://drfurtado.github.io/randomstats/posts/02162023-descriptive-statistics/},
  langid = {en}
}
For attribution, please cite this work as:
1. Furtado, O. (2023, February 18). Descriptive Statistics. RandomStats. https://drfurtado.github.io/randomstats/posts/02162023-descriptive-statistics/