Statistical Tests

Parametric tests

Parametric tests are a class of statistical tests that make assumptions about the underlying probability distribution of the data being analyzed. Chief among these are that the data are sampled from a population that follows a normal distribution and that the sample size is large enough for that assumption to be reasonable. The most commonly used parametric tests include t-tests, ANOVA (analysis of variance), and multiple regression analysis. When their assumptions hold, these tests have greater statistical power than non-parametric tests, which makes them more useful for exploring differences between groups or testing hypotheses about a population.

The assumptions of parametric tests are met when the data are approximately normal and the sample size is large enough. It is therefore recommended to check the normality of the data using a normality test such as the Shapiro-Wilk test, and to check for outliers using box plots and histograms.

Parametric tests are useful when the researcher wants to compare means of two or more groups, establish a relationship between two or more variables, or investigate the correlation between variables. Examples of parametric tests include the t-test, ANOVA, and multiple regression analysis. (The chi-square test, by contrast, is a non-parametric test.)

Simple Linear Regression

Simple linear regression is a statistical method used to model the relationship between two variables, where one variable (called the independent variable or predictor variable) is used to predict the other variable (called the dependent variable or response variable). The relationship between the two variables is assumed to be linear, meaning that the change in the dependent variable is proportional to the change in the independent variable.

The goal of simple linear regression is to find the best-fitting line that describes the relationship between the two variables. This line is called the regression line, and it is defined by an equation of the form y = mx + b, where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the y-intercept of the line.

To find the values of m and b that best fit the data, simple linear regression uses a method called least squares regression. This method minimizes the sum of the squared differences between the predicted values of y and the actual values of y for each value of x. The resulting regression line can be used to predict the value of y for any given value of x within the range of the data.
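The least squares fit can be computed directly from these definitions. The short sketch below uses hypothetical data (hours of training vs. a performance score) and checks the hand computation against `scipy.stats.linregress`:

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours of training (x) and performance score (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Least squares estimates: m = Sxy / Sxx, b = ybar - m * xbar
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

# scipy.stats.linregress computes the same fit
res = stats.linregress(x, y)
print(f"manual: slope={m:.3f}, intercept={b:.3f}")
print(f"scipy:  slope={res.slope:.3f}, intercept={res.intercept:.3f}")
```

The regression line y = mx + b can then predict y for any x within the range of the data, e.g. `m * 4.5 + b`.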

Simple linear regression is commonly used in many fields, including economics, social sciences, and engineering, to analyze the relationship between two variables and make predictions based on that relationship.

Steps

To conduct a hypothesis test in the context of simple linear regression, follow these steps:

  1. State the research question(s), assess the nature of the variables (continuous, discrete), and identify the dependent (outcome) and independent variables.

  2. State the null and alternative hypotheses: The null hypothesis (H₀) typically states that there is no relationship between the predictor (independent) variable and the response (dependent) variable, i.e., the slope of the regression line (β₁) is equal to zero. The alternative hypothesis (H₁) states that there is a relationship between the predictor and response variables, i.e., the slope is not equal to zero.

    H₀: β₁ = 0; H₁: β₁ ≠ 0

  3. Collect data: Gather a sample of paired data points for the predictor and response variables.

  4. Prepare data (diagnostics): Ensure that the data is appropriate for linear regression (i.e., there is a linear relationship, the residuals are normally distributed, homoscedasticity is present, and extreme outliers are not present).

    1. Linearity:

      1. Scatterplot: Create a scatterplot of the response variable against the predictor variable. Visually inspect the plot to determine if the relationship appears to be linear. If the data points form a pattern that roughly resembles a straight line, it suggests a linear relationship. If the pattern is curved, it indicates that the relationship may be nonlinear.

      2. Residual plot: After fitting the linear regression model, plot the residuals (observed values minus predicted values) against the predictor variable or the predicted values. A random scattering of residuals around the horizontal axis (zero) suggests that the linear model is appropriate. If there is a pattern or trend in the residual plot, it indicates a potential violation of the linearity assumption.

      3. Correlation coefficient (Pearson’s r): Calculate the correlation coefficient between the predictor and response variables. Pearson’s r ranges from -1 to 1, with values close to -1 or 1 suggesting a strong linear relationship, and values close to 0 indicating a weak or no linear relationship. While the correlation coefficient can provide information about the strength and direction of the relationship, it doesn’t necessarily imply causation or indicate that a linear regression model is appropriate.

      4. Partial regression plots (added-variable plots): Partial regression plots can help assess the linearity assumption in multiple regression settings. These plots show the relationship between a response variable and a predictor variable, controlling for the effects of other predictor variables in the model. If the partial regression plots display a linear pattern, it supports the linearity assumption.

  5. Estimate the regression coefficients: Using the sample data, calculate the slope (β₁) and the intercept (β₀) of the regression line using the least squares method or another appropriate method.

  6. Calculate the test statistic: Compute the t-statistic for the hypothesis test using the following formula:

    t = (b₁ - 0) / SE(b₁)

    Here, b₁ is the estimated slope of the regression line, and SE(b₁) is the standard error of the slope.

  7. Determine the degrees of freedom: Calculate the degrees of freedom for the t-distribution, which is equal to the number of data points (n) minus the number of estimated parameters (2 for simple linear regression: the slope and the intercept).

    df = n - 2

  8. Determine the critical value or p-value: Based on the chosen significance level (α, usually 0.05), determine the critical t-value from the t-distribution table, or calculate the p-value for the observed t-statistic. The p-value is the probability of observing a t-statistic as extreme or more extreme than the one calculated, assuming the null hypothesis is true.

  9. Make a decision: Compare the calculated t-statistic to the critical value or the p-value to the significance level:

    • If the absolute value of the t-statistic is greater than the critical value, or if the p-value is less than the significance level (p < α), reject the null hypothesis. This suggests that there is a significant relationship between the predictor and response variables.

    • If the absolute value of the t-statistic is less than or equal to the critical value, or if the p-value is greater than or equal to the significance level (p ≥ α), fail to reject the null hypothesis. This suggests that there is no significant relationship between the predictor and response variables.

  10. Interpret the results: Based on the decision made in the previous step, interpret the results in the context of your research question or study. If you rejected the null hypothesis, you can also report the estimated regression coefficients and the coefficient of determination (R²) to describe the strength and direction of the relationship between the predictor and response variables.
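Steps 5 through 9 can be sketched in Python. The paired data below are hypothetical, and the result is checked against `scipy.stats.linregress`, which performs the same slope test:

```python
import numpy as np
from scipy import stats

# Hypothetical paired data (predictor x, response y)
x = np.array([20.0, 22.5, 25.1, 27.3, 30.2, 32.8, 35.0, 37.4, 40.1, 42.6])
y = np.array([52.0, 50.1, 47.8, 46.5, 44.0, 41.9, 40.2, 38.5, 36.1, 34.8])
n = len(x)

# Step 5: estimate the coefficients by least squares
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Step 6: t = (b1 - 0) / SE(b1)
residuals = y - (b0 + b1 * x)
s2 = np.sum(residuals ** 2) / (n - 2)              # residual variance
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))  # standard error of the slope
t_stat = (b1 - 0) / se_b1

# Steps 7-8: df = n - 2, two-sided p-value from the t-distribution
df = n - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Step 9: decision at alpha = 0.05
reject_h0 = p_value < 0.05
print(f"b1={b1:.3f}, t={t_stat:.2f}, df={df}, p={p_value:.4g}, reject H0: {reject_h0}")
```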

Diagnostics

Example

Research question

Can VO2max be predicted from BMI?

Variables:

VO2max: predicted/dependent (continuous)

BMI: predictor/independent (continuous)

Hypotheses

H₀: β₁ = 0; H₁: β₁ ≠ 0

Data collection

The following variables have been recorded for each individual:

  1. Age (in years)
  2. Gender (1 = Male, 2 = Female)
  3. Resting Heart Rate (RHR, in beats per minute)
  4. VO2 Max (in mL/kg/min)
  5. Body Mass Index (BMI)

The dataset can be found here: physical-fitness.csv

Data preparation (Diagnostics)

Linearity

Homoscedasticity

If you violate the assumption of equal variance (also known as homoscedasticity) in a simple linear regression, this can have several consequences for your model:

  1. Coefficient estimates remain unbiased but unreliable: Contrary to a common misconception, heteroscedasticity does not bias the estimated coefficients; on average they still reflect the true relationship between the dependent and independent variables. The problems lie elsewhere, as the following points describe.

  2. Inaccurate standard errors: The standard errors of the estimated coefficients may be inaccurate, which means that the confidence intervals and hypothesis tests based on them may be incorrect.

  3. Invalid hypothesis tests: The F-test and t-tests used to assess the significance of the regression coefficients may not be valid if the assumption of equal variance is violated.

  4. Inefficient estimates: The estimates may be less efficient than they would be under homoscedasticity, which means that you may need a larger sample size to achieve the same level of precision in your estimates.

To address this issue, you may want to consider alternative regression techniques that are robust to heteroscedasticity, such as weighted least squares or generalized least squares regression. Alternatively, you may transform the variables or use nonparametric methods that do not assume equal variance. Diagnostic plots, such as a plot of residuals against predicted values, can help you detect heteroscedasticity in the first place.
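As a minimal sketch of one remedy, weighted least squares can be implemented directly with NumPy by solving the weighted normal equations. The data below are simulated with an error spread that grows with x, and the weights (1/x²) assume that variance structure is known, which in practice it rarely is:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated heteroscedastic data: the error spread grows with x
x = np.linspace(1, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5 * x)  # noise sd proportional to x

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]

# Ordinary least squares: solve (X'X) beta = X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Weighted least squares: weight each point by 1/variance (assumed 1/x^2),
# i.e. solve (X'WX) beta = X'Wy
W = np.diag(1.0 / x**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(f"OLS: intercept={beta_ols[0]:.2f}, slope={beta_ols[1]:.2f}")
print(f"WLS: intercept={beta_wls[0]:.2f}, slope={beta_wls[1]:.2f}")
```

Both estimators target the same true slope (1.5 here); the weighted version simply gives less influence to the noisier, high-x observations.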

One-sample t test

\[ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} \tag{1}\]

where:

  • \(\bar{x}\) is the sample mean

  • \(\mu\) is the population mean

  • \(s\) is the sample standard deviation

  • \(n\) is the sample size

It can also be written with \(s_x\) denoting the sample standard deviation:

\[ t = \frac{\bar{x} - \mu}{s_x/\sqrt{n}} \tag{2}\]

When to use it?

The one-sample t-test should be run when you have a single sample of data and you want to compare the mean of that sample to a known or hypothesized population mean.

Example in kinesiology: A kinesiologist wants to know if a new exercise program improves muscle strength in older adults. The kinesiologist recruits a sample of 20 older adults and has them complete the exercise program for 8 weeks, measuring muscle strength with a dynamometer before and after the 8-week program. The kinesiologist wants to know if the mean muscle strength of the sample is significantly different from the population mean muscle strength of older adults (hypothesized to be 40 kg), and would therefore run a one-sample t-test comparing the mean muscle strength of the sample (after completing the exercise program) to the hypothesized population mean of 40 kg. If the test is significant, the kinesiologist can conclude that the exercise program does improve muscle strength in older adults.
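A hypothetical version of this analysis can be sketched with `scipy.stats.ttest_1samp`; the manual computation of Equation 1 is shown alongside for comparison (the strength values are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical post-program strength scores (kg) for 20 older adults
strength = np.array([42.1, 45.3, 39.8, 44.0, 46.2, 41.5, 43.7, 40.9, 47.1, 42.8,
                     44.5, 43.2, 41.0, 45.8, 42.3, 44.9, 43.5, 40.2, 46.0, 43.1])
mu = 40.0  # hypothesized population mean

# Manual computation (Equation 1): t = (xbar - mu) / (s / sqrt(n))
t_manual = (strength.mean() - mu) / (strength.std(ddof=1) / np.sqrt(len(strength)))

# scipy's built-in one-sample t-test gives the same statistic
t_stat, p_value = stats.ttest_1samp(strength, popmean=mu)

print(f"t({len(strength) - 1}) = {t_stat:.2f}, p = {p_value:.4g}")
```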

Assumptions

The test makes several assumptions about the data:

  1. Independence: The observations in the sample are independent of one another.

  2. Normality: The population from which the sample is drawn is normally distributed.

  3. Random sampling: The sample is drawn randomly from the population.

  4. Sample size: The sample size is large enough (usually greater than 30) for the Central Limit Theorem to apply; with smaller samples, the normality assumption becomes more important.

It is important to check these assumptions before running the one-sample t-test to ensure the validity of the test results. In case the sample size is small or the data distribution is not normal, a non-parametric test such as the Wilcoxon signed-rank test should be used instead.

Reporting results

When reporting the results of a one-sample t-test in APA style, you should include the following information:

  1. The test statistic and the associated degrees of freedom (df). For example: “t(df) = 2.45, p = .03.”

  2. The p-value of the test, reported exactly (e.g., “p = .03” rather than “p < .05”) and rounded to two or three decimal places; very small values are reported as “p < .001”.

  3. The sample size (n). This should be reported as a whole number (e.g., “n = 20”).

  4. The mean and standard deviation of the sample. For example: “M = 5.6, SD = 1.2.”

  5. The direction of the effect. For example, “The mean score was significantly higher than the hypothesized mean (M = 4.0, t(df) = 2.45, p = .03).”

  6. Any relevant effect size measures. For example, you might report the Cohen’s d effect size as “d = 0.87.”

It is also good practice to include a brief description of the research question or hypothesis being tested, as well as a description of the sample and any relevant study variables.

For example: “In this study, we tested the hypothesis that the mean score on a memory task would be significantly higher than 4.0. A sample of 20 participants completed the task, with a mean score of 5.6 (SD = 1.2). The one-sample t-test revealed a significant difference, t(df) = 2.45, p = .03, d = 0.87, indicating that the mean score was significantly higher than the hypothesized mean.”

Examples

The results of the one-sample t test indicated that the mean (M = X, SD = Y) was significantly different from the hypothesized value (t(df) = t-value, p < .05).

In a one-sample t test, the mean of the sample was significantly different from the hypothesized mean (t(df) = t-value, p < .05). Specifically, the mean of the sample was X (SD = standard deviation) while the hypothesized mean was Y.

The results of the one-sample t test indicated that the mean of the sample (M = X, SD = Y) was significantly different from the hypothesized mean (μ = Z), t(df) = t-value, p < .05.

Independent-Samples t test

When to use it?

The independent-samples t-test should be run when comparing the means of two independent groups. For example, in the field of kinesiology, an independent-samples t-test can be used to compare the muscle strength of a group of individuals who have completed a resistance training program to a group of individuals who have not completed a resistance training program. The independent variable would be whether or not the individual completed the resistance training program and the dependent variable would be muscle strength. The t-test would be used to determine if there is a significant difference in muscle strength between the two groups, indicating that the resistance training program had an effect on muscle strength.
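This comparison can be sketched with hypothetical strength data using `scipy.stats.ttest_ind`; the `equal_var` argument switches between Student's t-test (equal variances assumed) and Welch's t-test:

```python
import numpy as np
from scipy import stats

# Hypothetical strength scores (kg): trained vs. untrained groups
trained = np.array([48.2, 51.0, 46.5, 49.8, 52.3, 47.9, 50.4, 48.8, 51.6, 49.1])
untrained = np.array([42.0, 44.5, 41.2, 43.8, 40.9, 45.1, 42.7, 43.3, 41.8, 44.0])

# Student's t-test assumes equal variances; set equal_var=False for
# Welch's t-test when that assumption is doubtful
t_stat, p_value = stats.ttest_ind(trained, untrained, equal_var=True)

df = len(trained) + len(untrained) - 2
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.4g}")
```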

Assumptions

The assumptions of the independent-samples t-test include:

  1. Normality: The data should be approximately normally distributed within each group.

  2. Independence: The observations in each group should be independent of one another.

  3. Equal variances: The variances of the two groups should be roughly equal.

  4. Random Sampling: The sample of each group should be random and representative of the population.

  5. Similar sample sizes: Equal (or similar) group sizes are not a strict assumption, but they make the test more robust to violations of the equal-variances assumption.

It is important to note that not all these assumptions need to be perfectly met for the test to be valid, but the deviations from these assumptions should be small. If the data do not meet these assumptions, the non-parametric version of the independent-samples t-test, such as the Mann-Whitney U test, can be used instead.

Reporting Results

To report the results of an independent samples t-test in APA style, you should include the following information:

  1. The type of test that was conducted (e.g., “An independent samples t-test was conducted to compare the mean scores of two groups on a measure of stress.”)

  2. The sample size for each group (e.g., “The sample consisted of 20 participants in Group A and 25 participants in Group B.”)

  3. The mean and standard deviation for each group (e.g., “The mean score for Group A was M = 3.5, SD = 1.2, and the mean score for Group B was M = 2.8, SD = 0.9.”)

  4. The t-value and p-value obtained from the test (e.g., “The t-value was t(43) = 2.3, p = .03, indicating that there was a significant difference between the mean scores of the two groups, with Group A scoring higher than Group B.”)

  5. The effect size (e.g., “The effect size for the difference between the two groups was d = .7, indicating a moderate effect.”)

It is also a good idea to provide a brief interpretation of the results, explaining what they mean in the context of your research question or hypothesis.

Here is an example of how you might report the results of an independent samples t-test in APA style:

Examples

“An independent samples t-test was conducted to compare the mean scores of two groups on a measure of stress. The sample consisted of 20 participants in Group A and 25 participants in Group B. The mean score for Group A was M = 3.5, SD = 1.2, and the mean score for Group B was M = 2.8, SD = 0.9. The t-value was t(43) = 2.3, p = .03, indicating that there was a significant difference between the mean scores of the two groups, with Group A scoring higher than Group B. The effect size for the difference between the two groups was d = .7, indicating a moderate effect. These results suggest that participants in Group A experienced significantly higher levels of stress than those in Group B.”

“The purpose of this study was to examine the effect of a new teaching method on student achievement. A sample of 50 students was randomly assigned to either the experimental group, which received the new teaching method, or the control group, which received the traditional teaching method. Student achievement was measured using a standardized test. The results of the independent samples t-test showed that there was a statistically significant difference between the experimental and control groups, t(48) = 2.57, p = .01. The experimental group had a higher mean score on the achievement test than the control group. These findings suggest that the new teaching method was effective in improving student achievement. However, it is important to note that the small sample size and lack of generalizability to other populations are limitations of this study.”

Paired-Samples t-test

When to use it?

The paired-samples t-test should be run when you have two sets of related (or paired) data that you want to compare. For example, if you want to compare the effectiveness of two different treatments on a group of patients, you would use a paired-samples t-test. This test is also commonly used in pre- and post-test designs, where you want to compare scores before and after an intervention or treatment. Additionally, if you want to compare the mean differences between two groups or conditions, but you want to control for individual differences, you can use a paired-samples t-test.
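With hypothetical pre/post data, the test can be run with `scipy.stats.ttest_rel`; note that it is equivalent to a one-sample t-test on the within-pair differences:

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post scores for the same 10 participants
pre  = np.array([55.0, 60.2, 58.1, 62.4, 57.3, 59.8, 61.0, 56.5, 63.2, 58.9])
post = np.array([58.4, 63.1, 60.0, 66.2, 59.9, 62.5, 64.8, 58.1, 67.0, 61.3])

# Paired t-test on the two related samples
t_stat, p_value = stats.ttest_rel(post, pre)

diffs = post - pre  # the test operates on these within-pair differences
print(f"mean difference = {diffs.mean():.2f}, "
      f"t({len(pre) - 1}) = {t_stat:.2f}, p = {p_value:.4g}")
```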

Assumptions

The assumptions of the paired-samples t-test include:

  1. Independence: The pairs of observations are independent of one another. (The two measurements within a pair are expected to be correlated; that correlation is what the paired design exploits.)

  2. Normality: The differences between the pairs of observations are approximately normally distributed.

  3. Paired data: The observations are paired, meaning that each individual is measured twice, once before and once after some intervention, or in two different conditions.

  4. Random sampling: The sample being used is selected randomly from the population.

It’s important to note that violations of these assumptions may lead to inaccurate results, so it’s necessary to check them before applying the test. Normality of the differences can be checked with a normal probability plot or a test such as the Shapiro-Wilk test. If the normality assumption is not met, the Wilcoxon signed-rank test, a non-parametric version of the paired-samples t-test, can be used instead.

Reporting Results

When reporting the results of a paired-samples t-test, it is important to include the following information:

  1. The test statistic (e.g. t-value) and the associated p-value. The t-value tells us how many standard errors the mean difference is from zero, and the p-value tells us the probability of observing a t-value as extreme as the one we calculated under the assumption that the null hypothesis is true.

  2. The sample size (e.g. the number of pairs of observations).

  3. The mean difference and the standard deviation of the differences.

  4. The effect size, such as Cohen’s d, which is a measure of the magnitude of the difference between the means.

Example:

“A paired-samples t-test was conducted to compare the pre-test and post-test scores of a group of students. The sample size was 20 pairs of observations. The mean difference between the pre-test and post-test scores was 5.3 (SD = 2.5). The t-value was 3.87, with a p-value of 0.001. The effect size (Cohen’s d) was 0.67, which indicates a moderate effect size.”

“The results of the paired-samples t-test revealed that there was a statistically significant difference between the pre-test and post-test scores of the students, t(19) = 3.87, p = 0.001. The mean difference was 5.3 (95% CI [3.4, 7.2]), and the effect size (d = 0.67) suggests a moderate effect.”

One-way ANOVA

When to use it?

A one-way ANOVA (Analysis of Variance) is used to test for differences in the mean of a continuous outcome variable across two or more categorical groups. It is used when you have one independent variable (also known as a factor) with two or more levels or groups and one continuous dependent variable.

In other words, you should run a one-way ANOVA when:

  1. You want to compare the means of a continuous outcome variable across two or more groups.

  2. You have one independent variable (factor) with two or more levels or groups.

  3. The dependent variable is continuous.

Example: A researcher wants to know if there is a difference in the mean test scores of students who studied using different methods (Method A, Method B, Method C). The researcher collects data on test scores and the method used. Test score is a continuous variable and study method is a categorical variable with three levels, so the researcher can use a one-way ANOVA to test for differences in mean test scores across the study methods.
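This example can be sketched with `scipy.stats.f_oneway` on hypothetical scores; eta-squared is computed by hand, since scipy does not report it:

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for three study methods
method_a = np.array([80, 82, 78, 85, 79, 81, 83, 77, 84, 80], dtype=float)
method_b = np.array([75, 73, 77, 74, 76, 72, 78, 75, 74, 76], dtype=float)
method_c = np.array([70, 68, 72, 71, 69, 73, 67, 70, 72, 68], dtype=float)

f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)

# Eta-squared effect size: SS_between / SS_total
all_scores = np.concatenate([method_a, method_b, method_c])
grand_mean = all_scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2
                 for g in (method_a, method_b, method_c))
ss_total = np.sum((all_scores - grand_mean) ** 2)
eta_squared = ss_between / ss_total

print(f"F(2, {len(all_scores) - 3}) = {f_stat:.2f}, "
      f"p = {p_value:.4g}, eta^2 = {eta_squared:.2f}")
```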

Assumptions

The one-way ANOVA (analysis of variance) makes several assumptions about the data being analyzed:

  1. Independence: The observations in each group are independent of each other and do not affect the observations in the other groups.

  2. Normality: The data within each group is normally distributed, or at least approximately normally distributed. This assumption can be checked using normality tests such as the Shapiro-Wilk test.

  3. Equal variances: The variances of the data within each group are equal. This assumption can be checked using tests such as Levene’s test.

  4. Random sampling: The data are randomly sampled from the population, so that the sample is representative of the population.

Violations of these assumptions can lead to biased or incorrect results. If the assumptions are not met, a non-parametric alternative such as the Kruskal-Wallis test can be used.

Reporting results

When reporting the results of a one-way ANOVA, it is important to include the following information:

  1. The F-value and the associated p-value. The F-value is the ratio of the between-group variance to the within-group variance, and the p-value tells us the probability of observing an F-value as extreme as the one we calculated under the assumption that the null hypothesis is true.

  2. The sample size for each group, or the number of observations in each group.

  3. The means and standard deviations for each group.

  4. The effect size, such as eta-squared (η²) or omega-squared (ω²), which is a measure of the proportion of variance in the dependent variable that is accounted for by the independent variable.

Example:

“A one-way ANOVA was conducted to compare the scores of three groups of students on a test. The sample sizes for the groups were 10, 10, and 12. The means and standard deviations for each group were: Group 1 = 80 (SD = 5), Group 2 = 75 (SD = 4), Group 3 = 70 (SD = 3). The F-value was 12.5, with a p-value of 0.001. The effect size (η²) was 0.17, which indicates a large effect by Cohen’s benchmarks.”

“The results of the one-way ANOVA revealed that there was a statistically significant difference between the scores of the three groups of students, F(2, 29) = 12.5, p = 0.001. Post-hoc comparisons revealed that Group 1 was significantly different from Group 2 (p = 0.05) and Group 3 (p = 0.001). The mean scores for each group were: Group 1 = 80 (95% CI [78, 82]), Group 2 = 75 (95% CI [73, 77]), Group 3 = 70 (95% CI [69, 72]). The effect size (η² = 0.17) suggests that the independent variable accounted for a large proportion of variance in the dependent variable.”

Between-subjects Two-way ANOVA

When to use it?

You should run a Between-subjects Two-way ANOVA when you have two independent variables (also known as factors) that you want to test the effect of on a dependent variable, and the levels of these independent variables are different between groups of participants.

An example related to physical activity could be comparing the effect of two different types of exercise programs (e.g. resistance training vs. cardio) on muscle mass gain in two different age groups (e.g. older adults vs. younger adults). In this example, the independent variables would be the type of exercise program and the age group, and the dependent variable would be muscle mass gain. The levels of the independent variables would be different between groups of participants, as the older adults would be in one group and the younger adults would be in another. A Between-subjects Two-way ANOVA would allow you to test for any interactions between the two independent variables (i.e. whether the effect of the exercise program on muscle mass gain is different in older adults vs. younger adults) and any main effects of each independent variable on the dependent variable.
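For a balanced design, the between-subjects two-way ANOVA sums of squares can be computed by hand with NumPy. The sketch below uses hypothetical muscle-mass data for the program × age example (a statistics package would normally do this for you):

```python
import numpy as np
from scipy import stats

# Hypothetical muscle-mass gain (kg), balanced 2x2 design:
# factor A = program (resistance, cardio), factor B = age (younger, older)
data = {
    ("resistance", "younger"): np.array([2.8, 3.1, 2.9, 3.4, 3.0]),
    ("resistance", "older"):   np.array([2.0, 2.3, 1.9, 2.2, 2.1]),
    ("cardio", "younger"):     np.array([1.2, 1.5, 1.1, 1.4, 1.3]),
    ("cardio", "older"):       np.array([1.0, 1.1, 0.9, 1.2, 1.0]),
}
a_levels, b_levels, n = 2, 2, 5
all_y = np.concatenate(list(data.values()))
grand = all_y.mean()

# Marginal means for each factor
a_means = {a: np.concatenate([data[(a, b)] for b in ("younger", "older")]).mean()
           for a in ("resistance", "cardio")}
b_means = {b: np.concatenate([data[(a, b)] for a in ("resistance", "cardio")]).mean()
           for b in ("younger", "older")}

# Sums of squares for a balanced design
ss_a = b_levels * n * sum((m - grand) ** 2 for m in a_means.values())
ss_b = a_levels * n * sum((m - grand) ** 2 for m in b_means.values())
ss_cells = n * sum((v.mean() - grand) ** 2 for v in data.values())
ss_ab = ss_cells - ss_a - ss_b  # interaction
ss_error = sum(np.sum((v - v.mean()) ** 2) for v in data.values())

df_a, df_b = a_levels - 1, b_levels - 1
df_ab = df_a * df_b
df_error = a_levels * b_levels * (n - 1)
mse = ss_error / df_error

# F ratios and p-values for the two main effects and the interaction
f_a, f_b, f_ab = ss_a / df_a / mse, ss_b / df_b / mse, ss_ab / df_ab / mse
p_a = stats.f.sf(f_a, df_a, df_error)
p_b = stats.f.sf(f_b, df_b, df_error)
p_ab = stats.f.sf(f_ab, df_ab, df_error)
print(f"program:     F({df_a},{df_error}) = {f_a:.2f}, p = {p_a:.4g}")
print(f"age:         F({df_b},{df_error}) = {f_b:.2f}, p = {p_b:.4g}")
print(f"interaction: F({df_ab},{df_error}) = {f_ab:.2f}, p = {p_ab:.4g}")
```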

Assumptions

The assumptions of the between-subjects two-way ANOVA are as follows:

  1. Independence: The observations in each group are independent of each other, meaning that the responses of one participant do not affect the responses of another participant.

  2. Normality: The distribution of the residuals (the difference between the observed and predicted values) is approximately normal. It is important to check this assumption using a normality test such as the Shapiro-Wilk test and to check for outliers using box plots and histograms.

  3. Equal variances: The variances of the residuals are equal across all groups and levels of the independent variables. This assumption can be checked using Levene’s test for equality of variances.

  4. Additivity: The effect of each independent variable on the dependent variable is additive, meaning that the effect of one independent variable does not depend on the level of the other independent variable.

  5. Linearity: The relationship between the independent variables and the dependent variable is linear, meaning that a straight line can be used to represent the relationship.

  6. Independence of errors: the residuals are independent, meaning that the residuals of one observation are not correlated with the residuals of any other observation.

It’s important to note that when these assumptions are not met, the results of the ANOVA should be interpreted with caution and alternative methods of analysis should be considered.

Reporting results

When reporting the results of a between-subjects Two-way ANOVA, you should include the following information:

  1. A description of the study design, including the independent variables, dependent variable, and the levels of each independent variable.

  2. The results of the ANOVA, including the F-value, degrees of freedom, and p-value for each independent variable and any interaction between the two independent variables.

  3. A summary of the main findings, including any significant main effects or interactions between the independent variables.

  4. Follow-up tests, such as post-hoc tests, to determine which specific levels of the independent variables were responsible for the significant effects.

Example:

“The study aimed to examine the effect of two different types of physical education programs (e.g. traditional vs. adventure-based) on physical fitness in two different age groups (e.g. younger children vs. older children). A between-subjects Two-way ANOVA was conducted, with the independent variables being the type of program and the age group, and the dependent variable being physical fitness. The results of the ANOVA showed a significant main effect of the type of program, F(1, 48) = 8.32, p = 0.006, with the adventure-based program resulting in higher physical fitness scores than the traditional program. A significant interaction was also found between the type of program and the age group, F(1, 48) = 4.12, p = 0.047. Follow-up tests showed that the adventure-based program resulted in significantly higher physical fitness scores in older children compared to the traditional program, but there was no significant difference in younger children.”

Within-subjects Two-way ANOVA

When to use it?

You should run a Within-subjects Two-way ANOVA when you have two independent variables (also known as factors) that you want to test the effect of on a dependent variable, and the levels of these independent variables are the same for all participants, but are manipulated within each participant.

An example related to motor skill learning could be comparing the effect of two different types of feedback (e.g. verbal vs. visual) on the learning of a new motor skill (e.g. throwing a ball) in a group of participants. In this example, the independent variables would be the type of feedback and the time of testing (e.g. pre-test and post-test), and the dependent variable would be the accuracy of the throwing skill. The levels of the independent variables would be the same for all participants, as each participant would receive both types of feedback and be tested both before and after training. A Within-subjects Two-way ANOVA would allow you to test for any interactions between the two independent variables (i.e. whether the effect of feedback on motor skill learning is different at the pre-test vs. the post-test) and any main effects of each independent variable on the dependent variable.

Assumptions

The assumptions of the within-subjects two-way ANOVA (also known as a repeated-measures ANOVA) include:

  1. Independence: The observations within each cell of the design must be independent of one another.

  2. Normality: The observations within each cell should be approximately normally distributed.

  3. Equal variances: The variances of the observations within each cell should be approximately equal.

  4. Sphericity: The variances of the differences between all pairs of levels of a repeated-measures factor should be approximately equal. This assumption is often tested using Mauchly’s test.

  5. No carryover effects: There should be no carryover effects from one level of one independent variable to the next level of the same independent variable.

  6. No missing data: There should be no missing data in the dataset.

It is important to note that when the assumptions of the within-subjects two-way ANOVA are not met, the results may not be reliable and alternative methods such as mixed-design ANOVA or non-parametric methods may be needed.

Reporting results

When reporting the results of a within-subjects Two-way ANOVA, you should first indicate the dependent variable, the independent variables and the levels of each independent variable. You should then report the main effect of each independent variable on the dependent variable, as well as any interaction effects between the independent variables.

For example, if you were studying the effect of two different types of practice schedules (random vs. blocked) and two different types of feedback (verbal vs. visual) on motor skill learning, your report would look something like this:

Dependent variable: Motor skill learning
Independent variables: Practice schedule (random vs. blocked) and feedback (verbal vs. visual)

Main Effects:

  • Practice schedule: There was a significant main effect of practice schedule on motor skill learning, F(1,24) = 7.32, p = .012, ηp² = .234, such that participants who practiced in a random schedule showed better motor skill learning than those who practiced in a blocked schedule.

  • Feedback: There was a significant main effect of feedback on motor skill learning, F(1,24) = 4.56, p = .045, ηp² = .158, such that participants who received visual feedback showed better motor skill learning than those who received verbal feedback.

Interaction Effects:

  • There was no significant interaction between practice schedule and feedback on motor skill learning, F(1,24) = .21, p = .65, ηp² = .009.

It is worth noting that p values less than 0.05 would be considered statistically significant, and ηp² (partial eta squared) indicates the effect size of each main effect and interaction.

It is also important to report the means and standard deviations of the dependent variable within each level of the independent variables. This will give a more detailed understanding of the pattern of results.
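The cell means and standard deviations mentioned above can be computed, for example, with pandas; the column names and scores below are invented for illustration.

```python
# Sketch (assuming pandas) of descriptive statistics per design cell,
# using made-up motor skill learning scores.
import pandas as pd

df = pd.DataFrame({
    "schedule": ["random"] * 4 + ["blocked"] * 4,
    "feedback": ["verbal", "verbal", "visual", "visual"] * 2,
    "score":    [78, 82, 85, 88, 70, 74, 76, 79],
})

# Mean, standard deviation, and n for each schedule-by-feedback cell
cells = df.groupby(["schedule", "feedback"])["score"].agg(["mean", "std", "count"])
print(cells)
```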

Mixed-design Two-way ANOVA

When to use it?

You should run a mixed-factors two-way ANOVA when you have two independent variables (also known as factors) that you want to test the effect of on a dependent variable, and one of the independent variables is a between-subjects variable and the other is a within-subjects variable.

A mixed-factors two-way ANOVA is useful when you want to examine the effect of a manipulation that varies within subjects (e.g. different levels of a treatment) while also taking into account individual differences among participants (e.g. gender, age, etc.).

An example of when to run a mixed-factors two-way ANOVA would be if you wanted to investigate the effect of two different types of exercise programs (e.g. resistance training vs. cardio) on muscle mass gain in two different age groups (e.g. older adults vs. younger adults), where every participant completes both exercise programs (e.g. in counterbalanced order). In this example, the independent variables would be the type of exercise program (within-subjects, since each participant experiences both programs) and the age group (between-subjects, since each participant belongs to only one group), and the dependent variable would be muscle mass gain.

A mixed-factors two-way ANOVA would allow you to test for any interactions between the two independent variables, and any main effects of each independent variable on the dependent variable, while also taking into account the individual differences among participants.

Assumptions

The assumptions of the mixed-factors two-way ANOVA (also known as a mixed-design two-way ANOVA) include:

  1. Independence of observations: Each participant’s response should be independent of the responses of other participants.

  2. Normality: The residuals (the difference between the observed and predicted values) should be approximately normally distributed.

  3. Equal variances: The variances of the residuals should be equal across all levels of the independent variables.

  4. Sphericity: The variances of the differences between the levels of the within-subjects variable should be equal.

  5. Linearity: The relationship between the independent and dependent variables should be linear.

  6. Additivity: Participant effects are assumed to combine additively with the treatment effects. Note that an interaction between the two independent variables is not a violation of this assumption; the ANOVA tests for such interactions explicitly.

It’s important to note that the violation of these assumptions can lead to inaccurate results, therefore, it’s important to check them by using various graphical methods, like box plots, histograms and Q-Q plots, and statistical methods like Shapiro-Wilk test, Levene’s test, and Mauchly’s test.
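The statistical checks named above can be sketched with scipy; the data here are simulated placeholders, and note that Mauchly’s sphericity test is not available in scipy (third-party packages such as pingouin provide it).

```python
# Hedged sketch of two of the assumption checks listed above, using scipy
# on simulated data: Shapiro-Wilk for normality, Levene for equal variances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 5, 30)                 # placeholder scores, condition A
group_b = rng.normal(55, 5, 30)                 # placeholder scores, condition B

w, p_norm = stats.shapiro(group_a)              # H0: sample is normally distributed
stat, p_var = stats.levene(group_a, group_b)    # H0: the groups have equal variances

print(f"Shapiro-Wilk p = {p_norm:.3f}, Levene p = {p_var:.3f}")
# Small p-values suggest the corresponding assumption is violated.
```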

Reporting results

When reporting the results of a mixed-factors Two-way ANOVA, you should include the following information:

  1. A description of the study design, including the independent variables, dependent variable, and the levels of each independent variable. Specify which variable is the within-subject variable and which one is the between-subject variable.

  2. The results of the ANOVA, including the F-value, degrees of freedom, and p-value for each independent variable and any interaction between the two independent variables.

  3. A summary of the main findings, including any significant main effects or interactions between the independent variables.

  4. Follow-up tests, such as post-hoc tests, to determine which specific levels of the independent variables were responsible for the significant effects.

Example:

“The study aimed to examine the effect of two different types of motor development programs (e.g. traditional vs. adventure-based) on balance skills in two different age groups (e.g. preschool children vs. school-aged children). A mixed-factors Two-way ANOVA was conducted, with the independent variables being time (within-subject variable) and age group (between-subject variable) and the dependent variable being balance skills. The results of the ANOVA showed a significant main effect of time, F(1, 48) = 8.32, p = 0.006, with the post-test resulting in higher balance skills scores than the pre-test. A significant interaction was also found between time and age group, F(1, 48) = 4.12, p = 0.047. Follow-up tests showed that the post-test resulted in significantly higher balance skills scores in preschool children compared to the pre-test, but there was no significant difference in school-aged children.”

It is important to note that in a mixed-factors ANOVA, the assumptions of normality and equal variances are tested at each level of the within-subject variable separately.

Pearson Correlation Coefficient

The Pearson correlation coefficient (r) is used to measure the strength and direction of the linear relationship between two continuous variables. It is appropriate to use when you want to determine if there is a linear relationship between two variables and the data are at least interval level. It is important to note that the Pearson correlation coefficient only measures linear relationships, and it is not appropriate to use if the relationship between the variables is non-linear.

It is generally considered appropriate to use the Pearson correlation coefficient when the following conditions are met:

  1. The data for both variables is continuous (i.e., interval or ratio level).

  2. Both variables are normally distributed or the sample size is large enough (n>30)

  3. The variables are not categorical

  4. The data are free from outliers

  5. The relationship between the two variables is believed to be linear

It’s also worth noting that the Pearson correlation coefficient does not establish causality between the two variables; it only measures the association between them. If you want to model how one variable predicts another, you can use techniques such as regression analysis, but establishing causality ultimately requires an appropriate experimental design.
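As a minimal sketch, the coefficient and its p-value can be computed with scipy; the sleep and score values below are invented for illustration.

```python
# Sketch (assuming scipy) of a Pearson correlation between two continuous
# variables, using made-up hours-of-sleep and test-score data.
import numpy as np
from scipy import stats

hours_sleep = np.array([5, 6, 6, 7, 7, 8, 8, 9])
test_score  = np.array([60, 65, 63, 70, 72, 75, 78, 80])

r, p = stats.pearsonr(hours_sleep, test_score)
print(f"r = {r:.2f}, p = {p:.4f}")   # r itself serves as the effect size
```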

Reporting results

When reporting the results of a Pearson correlation coefficient, it is important to include the following information:

  1. The correlation coefficient (r) and the associated p-value. The correlation coefficient tells us the strength and direction of the linear relationship between two variables, and the p-value tells us the probability of observing a correlation coefficient as extreme as the one we calculated under the assumption that there is no correlation.

  2. The sample size (e.g. the number of observations)

  3. The scatter plot of the data with the line of best fit and the equation of that line.

  4. The effect size. For a correlation, the coefficient r is itself the effect size: values of roughly .10, .30, and .50 are conventionally interpreted as small, medium, and large.

Example:

“A Pearson correlation coefficient was calculated to examine the relationship between hours of sleep and test scores of a group of students. The sample size was 30. The correlation coefficient was 0.75, with a p-value of 0.001. The scatter plot of the data, with the line of best fit y = 0.5x + 50, showed a positive correlation. The magnitude of the coefficient (r = 0.75) indicates a large effect size.”

“The results of the Pearson correlation coefficient revealed a statistically significant positive correlation between hours of sleep and test scores of the students (r(28) = 0.75, p = 0.001). The scatter plot of the data with the line of best fit y = 0.5x + 50 illustrates the positive correlation. The magnitude of the coefficient (r = 0.75) suggests a strong correlation.”

Non-parametric tests

Non-parametric tests are a class of statistical tests that do not make assumptions about the underlying probability distribution of the data being analyzed. These tests are also known as distribution-free tests, as they do not require the data to follow a specific distribution such as normal distribution. They are useful when the data do not meet the assumptions of parametric tests or when the sample size is small. Non-parametric tests are less powerful than parametric tests, but they are more robust and can be used with a wide range of data types.

Examples of non-parametric tests include: Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis H test, chi-square test, and median test. They can be used to test for differences between groups, compare medians, and test for independence. They are commonly used in fields such as social sciences, medicine, and biology.

It is important to note that non-parametric tests do not require the data to be normally distributed but they do require other assumptions to be met such as independence and ordinal or continuous data.

Wilcoxon signed-rank

The Wilcoxon signed-rank test is a non-parametric test used to determine whether two related samples have the same median. It is used when the data do not meet the assumptions of a parametric test such as the paired t-test, for example when the data are not normally distributed or the sample size is small. The Wilcoxon signed-rank test is a good alternative to the t-test for comparing two related samples.

You should run the Wilcoxon signed-rank test when:

  1. You have two related samples (e.g. pre-test and post-test scores for the same individuals)

  2. The data are not normally distributed

  3. The sample size is small

  4. You want to compare the median of two related samples

An example of when you would use the Wilcoxon signed-rank test would be if you were studying the effectiveness of a new treatment for a medical condition, and you collected pre-treatment and post-treatment scores for a small group of patients. Since the sample size is small, and the distribution of scores is not normal, you would use the Wilcoxon signed-rank test to compare the median pre-treatment and post-treatment scores and determine if there was a significant difference between them.
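A pre/post comparison like the one described can be sketched with scipy; the symptom scores for ten patients below are made up for illustration.

```python
# Sketch (assuming scipy) of a Wilcoxon signed-rank test on paired
# pre- and post-treatment scores. Data are invented placeholders.
import numpy as np
from scipy import stats

pre  = np.array([24, 30, 28, 35, 27, 31, 29, 33, 26, 32])
post = np.array([20, 27, 25, 30, 26, 28, 24, 29, 25, 27])

# Paired, rank-based test on the within-patient differences
stat, p = stats.wilcoxon(pre, post)
print(f"W = {stat}, p = {p:.4f}")
```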

Assumptions

It is a robust test and doesn’t require the assumptions of normality, equal variances, and large sample size that parametric tests like the t-test require.

The Wilcoxon signed-rank test assumes the following:

  1. The data are ordinal or continuous, not categorical, so that the paired differences can be ranked.

  2. The two samples being compared are related or dependent, meaning that each observation in one sample is paired with an observation in the other sample in some meaningful way.

  3. The pairs are independent of one another and randomly selected from the population.

  4. The distribution of the paired differences is approximately symmetric around its median.

It is important to note that the Wilcoxon signed-rank test is a non-parametric test, so it is somewhat less powerful than the paired t-test when the t-test’s assumptions hold. Therefore, it is advisable to use the Wilcoxon signed-rank test only when the assumptions of the t-test are not met.

Reporting Results

When reporting the results of a Wilcoxon signed-rank test, you should include the following information:

  1. A description of the study design, including the dependent variable and the groups being compared.

  2. The results of the test, including the test statistic (e.g. W) and the p-value.

  3. A summary of the main findings, including whether the difference between the groups is statistically significant.

  4. The effect size, such as Cohen’s d or r, to indicate the magnitude of the difference between the groups.

Example:

“The study aimed to examine the effect of a physical therapy intervention on pain levels in patients with knee osteoarthritis. A Wilcoxon signed-rank test was conducted to compare the pain levels before and after the intervention. The results of the test showed a significant decrease in pain levels after the intervention, W = 45, p = 0.03. The effect size was r = 0.5, indicating a moderate effect size. This suggests that the physical therapy intervention was effective in reducing pain levels in patients with knee osteoarthritis.”

It is important to note that the Wilcoxon signed-rank test is a non-parametric test that is used when the data are not normally distributed or the sample size is small. This test is used to compare two related samples or matched-pairs data.

Mann-Whitney U test

The Mann-Whitney U test is a non-parametric test that is used to compare two independent groups to determine whether there is a significant difference in the distribution of scores between the two groups. This test is used when the assumptions of a parametric test such as the independent t-test cannot be met. These assumptions include normality of the data and equal variances between groups.

The Mann-Whitney U test is the non-parametric counterpart of the independent-samples t-test.

The Mann-Whitney U test is particularly useful when the data are not normally distributed, when the sample size is small, or when the data have outliers. It is also used when the data are ordinal or non-normally distributed, and when the variances of the two groups are not equal.

An example of when to run the Mann-Whitney U test is when you want to compare the muscle mass gain of two groups of athletes, one group using a new supplement and the other using a placebo. The muscle mass gain is measured in kilograms and the sample size is small (fewer than 30), so it would be appropriate to use the Mann-Whitney U test to determine if there is a significant difference in muscle mass gain between the two groups.
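The supplement-vs-placebo comparison can be sketched with scipy; the muscle-mass gains in kilograms below are invented for illustration.

```python
# Sketch (assuming scipy) of a Mann-Whitney U test comparing two
# independent groups. Data are made-up placeholder values.
import numpy as np
from scipy import stats

supplement = np.array([2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8])
placebo    = np.array([1.2, 1.5, 1.1, 1.8, 1.4, 1.6, 1.0])

u, p = stats.mannwhitneyu(supplement, placebo, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```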

Assumptions

The Mann-Whitney U test is a non-parametric test that is used to compare the medians of two independent groups. In order for the test to be valid, the following assumptions must be met:

  1. Independence: The observations in each group must be independent of one another. This means that there should be no relationship between the observations within a group.

  2. Ordinal or continuous data: The data must be ordinal or continuous in nature. The Mann-Whitney U test cannot be used for categorical or discrete data.

  3. Few ties: Ideally there should be few tied values (ties occur when two or more observations have the same value). When ties are present, a correction is applied to the test statistic.

  4. Similar distribution shapes: If the result is to be interpreted as a comparison of medians, the two distributions should have roughly the same shape. Because the test is rank-based, it is relatively robust to outliers.

  5. No normal distribution: The data do not need to follow a normal distribution, which makes the test a good option when dealing with non-normal data.

  6. Random sampling: Each group should be a random sample from its population; under the null hypothesis, the two populations have the same distribution.

It is important to check for these assumptions before running the test and report the results accordingly.

Reporting results

When reporting the results of a Mann-Whitney U test, you should include the following information:

  1. A brief description of the test and its purpose.

  2. The test statistic (U) and the p-value.

  3. A statement about the outcome of the test, such as whether the null hypothesis was rejected or not.

  4. A summary of the main findings, such as the differences in the two groups being compared.

  5. A visual representation of the data, such as box plots or histograms, to help users understand the results.

Example:

“Mann-Whitney U Test

Purpose: To compare the differences in physical activity levels between two groups of individuals (e.g. active vs. sedentary).

Test statistic: U = 50

p-value: 0.03

Outcome: The null hypothesis is rejected.

Findings: The active group had significantly higher physical activity levels than the sedentary group.

Visual representation: A box plot is shown to help the user understand the distribution of physical activity levels in the two groups. The active group has a higher median value and a smaller interquartile range than the sedentary group.”

It is important to note that the Mann-Whitney U test is a non-parametric test that is used to compare two independent samples when the data are not normally distributed. This test is also useful to compare two groups when the sample size is small.

Spearman Rho Correlation Coefficient

When to use it?

The Spearman rank-order correlation coefficient (Spearman’s rho, or simply rho) is a non-parametric measure of the correlation between two variables. It is used when the assumptions of the Pearson correlation coefficient, such as normality of the data, are not met, or when the variables are ordinal, or interval with a non-linear relationship.

Spearman’s rho should be used when your data are ordinal, or interval and non-normally distributed, or when you suspect that the relationship between the two variables is non-linear but monotonic.

It’s also useful when your data have outliers or extreme values as it is less affected by them compared to Pearson correlation coefficient.

It’s also useful when you want to investigate the correlation between two ordinal variables, or when you want to investigate the correlation between an ordinal variable and an interval variable.

Keep in mind that Spearman’s rho assumes that the relationship between the two variables is monotonic; it does not test for linearity.
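As a minimal sketch, rho can be computed with scipy; the data below are invented and deliberately monotonic but non-linear, the kind of relationship Spearman’s rho is suited to.

```python
# Sketch (assuming scipy) of Spearman's rho for a monotonic, non-linear
# relationship. Data are made-up placeholder values.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = x ** 3 + np.array([0, 1, -1, 2, 0, 1, -2, 0])   # monotonic, clearly non-linear

rho, p = stats.spearmanr(x, y)
print(f"rho = {rho:.2f}, p = {p:.4f}")   # rho itself serves as the effect size
```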

Reporting results

When reporting the results of a Spearman’s rank correlation coefficient (rho), it is important to include the following information:

  1. The correlation coefficient (rho) and the associated p-value. The correlation coefficient tells us the strength and direction of the monotonic relationship between two variables, and the p-value tells us the probability of observing a correlation coefficient as extreme as the one we calculated under the assumption that there is no correlation.

  2. The sample size (e.g. the number of observations)

  3. A scatter plot of the data, if possible. Since rho measures a monotonic rather than a linear relationship, a straight line of best fit can be misleading for clearly non-linear data.

  4. The effect size. As with Pearson’s r, the coefficient rho itself serves as the effect size.

Example:

“A Spearman’s rank correlation coefficient was calculated to examine the relationship between blood pressure and weight of a group of patients. The sample size was 40. The correlation coefficient was 0.6, with a p-value of 0.001. The scatter plot of the data showed a positive monotonic trend. The magnitude of the coefficient (rho = 0.6) indicates a moderately strong correlation.”

“The results of the Spearman’s rank correlation coefficient revealed a statistically significant positive correlation between blood pressure and weight of the patients (rho(38) = 0.6, p = 0.001). The scatter plot of the data illustrates the positive monotonic relationship. The magnitude of the coefficient (rho = 0.6) suggests a moderately strong correlation.”

It’s important to note that when using a non-parametric measure such as the Spearman correlation, it is better to report the median, percentiles, and interquartile range of the variables instead of the mean and standard deviation.
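These descriptive statistics can be computed with numpy alone; the scores below are made up for illustration (note the outlier, which inflates the mean but barely affects the median and IQR).

```python
# Sketch (numpy only) of the descriptive statistics recommended for
# non-parametric analyses: median, quartiles, and interquartile range.
import numpy as np

scores = np.array([12, 15, 14, 20, 18, 35, 16, 13, 17, 19])  # 35 is an outlier

median = np.median(scores)
q1, q3 = np.percentile(scores, [25, 75])   # first and third quartiles
print(f"median = {median}, IQR = {q3 - q1} (Q1 = {q1}, Q3 = {q3})")
```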