Week 7: Bivariate Correlation

KIN 610 - Spring 2023

Dr. Ovande Furtado Jr

Intro

  • Bivariate correlation analysis is a statistical technique to measure the strength and direction of the linear relationship between two variables
  • It helps identify how variables relevant to human health and physical activity are related to one another

The correlation coefficient

  • A correlation coefficient is a statistical measure used to determine the strength and direction of the linear relationship between two variables.
  • Correlation coefficients range from -1 to +1 and provide a numerical value that summarizes the degree to which two variables are related.
  • Correlation coefficients are denoted by different symbols depending on their type.

Properties of Correlation Coefficient

  • Correlation coefficients are symmetric, meaning that the order of the variables does not affect the value of the coefficient.
  • Outliers are extreme values that can disproportionately influence the analysis results
    • Outliers should be carefully examined and, if necessary, removed from the analysis to ensure that they are not unduly influencing the results
  • Correlation coefficients are a unitless measure of association
  • They are not affected by changes in the units of measurement of the variables
    • Useful for comparing the strength of relationships between variables measured in different units (see the sketch below)
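
As a rough illustration of this unit invariance, here is a minimal Python sketch (NumPy and SciPy are assumed to be available; the data are invented):

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical data: height and body mass for eight people
    height_cm = np.array([160, 165, 170, 172, 175, 180, 185, 190])
    mass_kg = np.array([55, 60, 68, 70, 74, 80, 88, 92])

    # Re-express height in inches: a change of units only
    height_in = height_cm / 2.54

    r_cm, _ = pearsonr(height_cm, mass_kg)
    r_in, _ = pearsonr(height_in, mass_kg)
    print(round(r_cm, 3), round(r_in, 3))  # identical values: r is unitless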

Properties of Correlation Coefficient, cont.

  • The correlation coefficient is always between -1 and +1
  • Provides a standardized measure of association that can be easily interpreted
  • The closer the correlation coefficient is to -1 or +1, the stronger the relationship between the variables

Criteria for interpretation

This table is used to interpret the strength of association between two variables based on the magnitude of their correlation coefficient.

However, it’s important to keep in mind that these are just guidelines, and the strength of association may depend on the context and specific research question.

Correlation coefficient   Correlation strength   Correlation type
-.7 to -1                 Very strong            Negative
-.5 to -.7                Strong                 Negative
-.3 to -.5                Moderate               Negative
0 to -.3                  Weak                   Negative
0                         None                   Zero
0 to .3                   Weak                   Positive
.3 to .5                  Moderate               Positive
.5 to .7                  Strong                 Positive
.7 to 1                   Very strong            Positive
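
As a rough illustration of these guidelines, here is a minimal Python sketch of a helper function whose name and cut-offs simply mirror the table above:

    def interpret_correlation(r):
        """Return (strength, type) labels for a correlation coefficient,
        following the guideline table above."""
        if not -1 <= r <= 1:
            raise ValueError("r must be between -1 and +1")
        direction = "Positive" if r > 0 else "Negative" if r < 0 else "Zero"
        magnitude = abs(r)
        if magnitude == 0:
            strength = "None"
        elif magnitude < 0.3:
            strength = "Weak"
        elif magnitude < 0.5:
            strength = "Moderate"
        elif magnitude < 0.7:
            strength = "Strong"
        else:
            strength = "Very strong"
        return strength, direction

    print(interpret_correlation(0.80))   # ('Very strong', 'Positive')
    print(interpret_correlation(-0.42))  # ('Moderate', 'Negative')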

Types of Correlation Coefficient

  • There are several types of correlation coefficients that can be used in bivariate correlation analysis, each with its strengths and limitations.
  • Most common: Pearson's correlation coefficient (\(r\)), which measures the strength and direction of the linear relationship between two variables.
  • Ranges from -1 to +1, with a value of -1 indicating a perfect negative correlation, a value of 0 indicating no correlation, and a value of +1 indicating a perfect positive correlation.

Comparison table

Correlation Coefficient   Symbol               Scales
Pearson                   \(r\)                Both scales interval (or ratio)
Spearman                  \(\rho\) (rho)       Both scales ordinal
Kendall                   \(\tau\) (tau)       Both scales ordinal
Point-Biserial            \(r_{pb}\)           One scale naturally dichotomous (nominal), one scale interval (or ratio)
Phi                       \(\phi\) (phi)       Both scales naturally dichotomous (nominal)
Gamma                     \(\gamma\) (gamma)   One scale nominal, one scale ordinal

Pearson correlation coefficient

  • Used in research studies with interval or ratio scale variables
  • Measures strength and direction of linear relationship between two continuous variables
  • Widely recognized and accepted, many statistical software packages include functions for calculating it
  • Assumes the relationship is linear and the variables are approximately normally distributed; sensitive to outliers and extreme values (see the sketch below)
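
The course uses jamovi, but as a minimal sketch the same coefficient can be computed in Python with SciPy (the data below are invented for illustration):

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical interval-scale data: weekly training hours and VO2max
    hours = np.array([2, 3, 4, 5, 6, 7, 8, 9])
    vo2max = np.array([35, 38, 40, 44, 45, 49, 50, 54])

    r, p = pearsonr(hours, vo2max)  # Pearson r and its two-tailed p-value
    print(f"r = {r:.2f}, p = {p:.4f}")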

Spearman’s rank correlation coefficient

  • Used when data is not normally distributed or there are outliers
  • Based on ranks of data, measures monotonic relationship between two variables
  • Monotonic relationship means variables move in same direction but not necessarily at constant rate
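
A minimal SciPy sketch of Spearman's rho, with invented data that include an extreme value:

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical skewed data: ranks are used, so the extreme value has little effect
    x = np.array([1, 2, 3, 4, 5, 6, 7, 50])
    y = np.array([3, 5, 4, 7, 9, 8, 12, 40])

    rho, p = spearmanr(x, y)
    print(f"rho = {rho:.2f}, p = {p:.4f}")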

Kendall’s tau correlation coefficient

  • Also based on ranks of data, more robust than Spearman’s rank correlation coefficient
  • Used when sample size is small or there are tied ranks in the data
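
A minimal SciPy sketch of Kendall's tau, with a small invented sample containing tied ranks:

    import numpy as np
    from scipy.stats import kendalltau

    # Small hypothetical ordinal sample with tied ranks
    x = np.array([1, 2, 2, 3, 4, 5])
    y = np.array([2, 1, 3, 3, 5, 6])

    tau, p = kendalltau(x, y)  # tau-b, which adjusts for ties
    print(f"tau = {tau:.2f}, p = {p:.4f}")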

Point-biserial correlation coefficient

  • Used when one variable is dichotomous and the other is continuous
  • Measures association between dichotomous and continuous variables
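
A minimal SciPy sketch of the point-biserial correlation, with invented data (group coded 0/1):

    import numpy as np
    from scipy.stats import pointbiserialr

    # Hypothetical data: training group (0 = control, 1 = trained) and grip strength (kg)
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    strength = np.array([28, 31, 30, 33, 41, 45, 39, 44])

    r_pb, p = pointbiserialr(group, strength)
    print(f"r_pb = {r_pb:.2f}, p = {p:.4f}")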

Phi correlation coefficient

  • Used when both variables are dichotomous
  • Measures association between two dichotomous variables
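
For two variables coded 0/1, the phi coefficient is numerically the same as Pearson's r applied to the binary codes, so a minimal sketch (with invented data) is:

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical dichotomous data: smoker (0/1) and in-season injury (0/1)
    smoker = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0])
    injury = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])

    phi, p = pearsonr(smoker, injury)  # phi = Pearson r on the 0/1 codes
    print(f"phi = {phi:.2f}, p = {p:.4f}")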

Gamma correlation coefficient

  • Non-parametric measure of association between two variables (one nominal and one ordinal)
  • Measures strength and direction of monotonic relationship
  • Used in research studies with categorical data or healthcare research
  • Does not assume linear relationship between variables
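
SciPy has no built-in gamma coefficient, so here is a minimal hand-rolled sketch based on its definition (concordant minus discordant pairs, divided by their sum), using invented ordinal data:

    import numpy as np

    def goodman_kruskal_gamma(x, y):
        """Gamma = (C - D) / (C + D), counting concordant (C) and
        discordant (D) pairs and ignoring ties."""
        concordant = discordant = 0
        n = len(x)
        for i in range(n):
            for j in range(i + 1, n):
                dx = np.sign(x[i] - x[j])
                dy = np.sign(y[i] - y[j])
                if dx * dy > 0:
                    concordant += 1
                elif dx * dy < 0:
                    discordant += 1
        if concordant + discordant == 0:
            return float("nan")  # all pairs tied
        return (concordant - discordant) / (concordant + discordant)

    # Hypothetical data: activity level (1-3) and self-rated health (1-5)
    activity = np.array([1, 1, 2, 2, 2, 3, 3, 3])
    health = np.array([2, 1, 3, 3, 2, 4, 5, 4])
    print(round(goodman_kruskal_gamma(activity, health), 2))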

Graphical Representation of Correlation

  • Graphical representation of correlation can provide valuable insights into the relationship between two variables.
  • Scatterplots
    • are a common method of graphical representation of correlation.
    • can also be used to identify outliers and nonlinear relationships between variables.

Scale of measurement of variables

  • The appropriate correlation coefficient depends on the scale of measurement of the variables (nominal, ordinal, interval, or ratio)
  • Restricted range or coarse measurement of a variable can distort (typically attenuate) the observed correlation
  • The scale of measurement should therefore be considered when choosing and interpreting the correlation coefficient

Graphical representation of correlation

  • A scatter plot is a graph displaying the relationship between two continuous variables as points
  • The horizontal axis represents one variable, and the vertical axis represents the other
  • A scatter plot can reveal the presence and strength of a linear relationship between the two variables (see the sketch below)
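
A minimal matplotlib sketch of such a scatter plot (the data are invented):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical data: exercise intensity (% HRmax) and heart rate (bpm)
    intensity = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90])
    heart_rate = np.array([110, 118, 122, 130, 138, 145, 152, 158, 167])

    plt.scatter(intensity, heart_rate)
    plt.xlabel("Exercise intensity (% HRmax)")
    plt.ylabel("Heart rate (bpm)")
    plt.title("Exercise intensity vs. heart rate")
    plt.show()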

Correlation vs. Causation

  • Correlation does not imply causation.
  • Two variables may be correlated, but other factors may be influencing their relationship.

Limitations of Bivariate Correlation

  • Only examines the relationship between two variables and does not consider the influence of other variables.
  • Does not provide information about causation.
  • Assumes a linear relationship and normality, which may not always be appropriate; non-parametric alternatives (e.g., Spearman's rho) or data transformations can address this.

Steps for Conducting Bivariate Correlation

  1. Check for normality.
  2. Check for outliers.
  3. Calculate the correlation coefficient.
  4. Test for statistical significance.
  5. Interpret the results.
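
The analyses in this course are run in jamovi, but here is a minimal end-to-end sketch of these five steps in Python (SciPy assumed available; the data are invented):

    import numpy as np
    from scipy.stats import shapiro, pearsonr

    # Hypothetical data: exercise intensity (% HRmax) and heart rate (bpm)
    intensity = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90, 95])
    heart_rate = np.array([112, 117, 121, 131, 137, 146, 151, 159, 166, 171])

    # 1. Check for normality (Shapiro-Wilk test on each variable)
    for name, data in [("intensity", intensity), ("heart rate", heart_rate)]:
        w, p_norm = shapiro(data)
        print(f"{name}: Shapiro-Wilk p = {p_norm:.3f}")

    # 2. Check for outliers (simple z-score screen, |z| > 3)
    for name, data in [("intensity", intensity), ("heart rate", heart_rate)]:
        z = (data - data.mean()) / data.std(ddof=1)
        print(f"{name}: outliers at positions {np.where(np.abs(z) > 3)[0]}")

    # 3-4. Calculate the correlation coefficient and test its significance
    r, p = pearsonr(intensity, heart_rate)
    print(f"r = {r:.2f}, p = {p:.4f}")

    # 5. Interpret the result in context
    print("Significant at alpha = 0.05" if p < 0.05 else "Not significant")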

Example

  • Study on the relationship between exercise intensity and heart rate.
  • Normality and outlier checks performed.
  • Correlation coefficient calculated (r = 0.80).
  • Statistical significance tested (p < 0.05).
  • Interpretation: Strong positive relationship between exercise intensity and heart rate.

Hypothesis Testing in Bivariate Correlation Analysis

  • Bivariate correlation analysis involves hypothesis testing to determine if there is a significant correlation between the variables of interest.
  • Hypothesis testing assesses whether the correlation coefficient calculated from the sample data is statistically significant, that is, unlikely to have occurred by chance if there were no correlation in the population.

Null & Alternative Hypotheses

  • The null hypothesis for a bivariate correlation analysis is that there is no significant correlation between the two variables.
  • The alternative hypothesis is that there is a significant correlation between the two variables.
  • The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is true.
  • Follow the example below when stating the null (\(H_0\)) and alternative (\(H_1\)) hypotheses.

Stating the Null and Alternative Hypotheses

\(H_0\): There is no linear correlation between the two variables, or the correlation coefficient is zero.

\(H_1\): There is a linear correlation between the two variables, or the correlation coefficient is not zero.

Mathematically, we can represent the null and alternative hypotheses as:

\(H_0: \rho = 0\)

\(H_1: \rho \neq 0\)

where \(\rho\) is the population correlation coefficient.

Steps to Conduct Hypothesis Testing

  • Calculate the correlation coefficient (r) using the sample data, and determine the degrees of freedom (df), equal to the sample size minus two (two variables).
  • Use a table or statistical software to determine the critical value of r for the given significance level (e.g., alpha = 0.05) and degrees of freedom (see the sketch after this list).
  • Compare the absolute value of the calculated correlation coefficient to the critical value.
  • If the absolute value of the calculated correlation coefficient exceeds the critical value, reject the null hypothesis and conclude that there is a significant correlation between the two variables.
  • If it does not exceed the critical value, fail to reject the null hypothesis and conclude that there is no significant correlation between the two variables.
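
A minimal sketch of obtaining the critical value of r from the t distribution, using the standard relationship \(t = r\sqrt{df}/\sqrt{1 - r^2}\) (SciPy assumed available):

    from math import sqrt
    from scipy.stats import t

    def critical_r(n, alpha=0.05):
        """Two-tailed critical value of r for a sample of size n."""
        df = n - 2                            # degrees of freedom
        t_crit = t.ppf(1 - alpha / 2, df)     # two-tailed critical t
        return t_crit / sqrt(t_crit**2 + df)  # convert critical t back to r

    # Example: n = 20 observations, alpha = .05
    print(round(critical_r(20), 3))  # approximately .444, matching published tables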

Assumptions and Interpretation

  • The correlation of two interval (or ratio) variables assumes that the data meet the assumptions of normality and linearity.
  • Normality refers to the assumption that the distribution of the variables being studied is approximately normal.
  • Linearity is the assumption that the relationship between the two variables is linear.
    • If these assumptions are not met, it may be necessary to use non-parametric correlation coefficients or transform the data to meet these assumptions.
  • Interpret the results of hypothesis testing in the context of the research question and consider other factors that may influence the relationship between the variables.
  • Controlling for confounding variables or conducting subgroup analyses to explore potential interactions may be necessary.
    • Partial correlation measures the strength of a relationship between two variables, while controlling for the effect of one or more other variables.
    • For example, you might want to see if there is a correlation between amount of food eaten and blood pressure, while controlling for weight or amount of exercise (see the sketch below).
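
A minimal sketch of a first-order partial correlation computed from residuals (regress each variable on the control variable, then correlate the residuals); the data are invented:

    import numpy as np
    from scipy.stats import pearsonr

    def partial_corr(x, y, z):
        """Correlation between x and y controlling for z, computed as the
        Pearson r of the residuals from simple regressions on z."""
        res_x = x - np.polyval(np.polyfit(z, x, 1), z)
        res_y = y - np.polyval(np.polyfit(z, y, 1), z)
        return pearsonr(res_x, res_y)[0]

    # Hypothetical data: food intake (kcal), blood pressure (mmHg), body mass (kg)
    food = np.array([1800, 2000, 2200, 2500, 2700, 3000, 3200, 3500])
    bp = np.array([112, 115, 118, 121, 125, 128, 131, 136])
    mass = np.array([60, 64, 68, 72, 78, 82, 88, 95])

    print(round(partial_corr(food, bp, mass), 2))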

Correlation and jamovi

  • esci module - Pearson correlation
    • Data: lsj - Parenthood
  • Correlation matrix (Pearson, Spearman, Kendall’s tau)
    • Data: lsj - Anscombe
