Week 7: Bivariate Correlation

KIN 610 - Spring 2023

Dr. Ovande Furtado Jr

Intro

  • Bivariate correlation analysis is a statistical technique to measure the strength and direction of the linear relationship between two variables
  • It helps identify how variables relevant to human health and physical activity are related to one another

The correlation coefficient

  • A correlation coefficient is a statistical measure used to determine the strength and direction of the linear relationship between two variables.
  • Correlation coefficients range from -1 to +1 and provide a numerical value that summarizes the degree to which two variables are related.
  • Correlation coefficients are denoted by different symbols depending on their type.

Properties of Correlation Coefficient

  • Correlation coefficients are symmetric, meaning that the order of the variables does not affect the value of the coefficient.
  • Outliers are extreme values that can disproportionately influence the analysis results
    • Outliers should be carefully examined and, if necessary, removed from the analysis to ensure that they are not unduly influencing the results
  • Correlation coefficients are a unitless measure of association
  • They are not affected by changes in the units of measurement of the variables
    • Useful for comparing the strength of relationships between variables measured in different units (see the sketch below)
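
As a rough illustration of this unit invariance, here is a minimal Python sketch (NumPy and SciPy are assumed to be available; the data are invented):

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical data: height and body mass for eight people
    height_cm = np.array([160, 165, 170, 172, 175, 180, 185, 190])
    mass_kg = np.array([55, 60, 68, 70, 74, 80, 88, 92])

    # Re-express height in inches: a change of units only
    height_in = height_cm / 2.54

    r_cm, _ = pearsonr(height_cm, mass_kg)
    r_in, _ = pearsonr(height_in, mass_kg)
    print(round(r_cm, 3), round(r_in, 3))  # identical values: r is unitless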

Properties of Correlation Coefficient, cont.

  • The correlation coefficient is always between -1 and +1
  • Provides a standardized measure of association that can be easily interpreted
  • The closer the correlation coefficient is to -1 or +1, the stronger the relationship between the variables

Criteria for interpretation

This table is used to interpret the strength of association between two variables based on the magnitude of their correlation coefficient.

However, it’s important to keep in mind that these are just guidelines, and the strength of association may depend on the context and specific research question.

Correlation coefficient   Correlation strength   Correlation type
-.7 to -1                 Very strong            Negative
-.5 to -.7                Strong                 Negative
-.3 to -.5                Moderate               Negative
0 to -.3                  Weak                   Negative
0                         None                   Zero
0 to .3                   Weak                   Positive
.3 to .5                  Moderate               Positive
.5 to .7                  Strong                 Positive
.7 to 1                   Very strong            Positive
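
As a rough illustration of these guidelines, here is a minimal Python sketch of a helper function whose name and cut-offs simply mirror the table above:

    def interpret_correlation(r):
        """Return (strength, type) labels for a correlation coefficient,
        following the guideline table above."""
        if not -1 <= r <= 1:
            raise ValueError("r must be between -1 and +1")
        direction = "Positive" if r > 0 else "Negative" if r < 0 else "Zero"
        magnitude = abs(r)
        if magnitude == 0:
            strength = "None"
        elif magnitude < 0.3:
            strength = "Weak"
        elif magnitude < 0.5:
            strength = "Moderate"
        elif magnitude < 0.7:
            strength = "Strong"
        else:
            strength = "Very strong"
        return strength, direction

    print(interpret_correlation(0.80))   # ('Very strong', 'Positive')
    print(interpret_correlation(-0.42))  # ('Moderate', 'Negative')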

Types of Correlation Coefficient

  • There are several types of correlation coefficients that can be used in bivariate correlation analysis, each with its strengths and limitations.
  • Most common: Pearson's correlation coefficient (\(r\)), which measures the strength and direction of the linear relationship between two variables.
  • Ranges from -1 to +1, with a value of -1 indicating a perfect negative correlation, a value of 0 indicating no correlation, and a value of +1 indicating a perfect positive correlation.

Comparison table

Correlation Coefficient   Symbol               Scales
Pearson                   \(r\)                Both scales interval (or ratio)
Spearman                  \(\rho\) (rho)       Both scales ordinal
Kendall                   \(\tau\) (tau)       Both scales ordinal
Point-Biserial            \(r_{pb}\)           One scale naturally dichotomous (nominal), one scale interval (or ratio)
Phi                       \(\phi\) (phi)       Both scales naturally dichotomous (nominal)
Gamma                     \(\gamma\) (gamma)   One scale nominal, one scale ordinal

Pearson correlation coefficient

  • Used in research studies with interval or ratio scale variables
  • Measures strength and direction of linear relationship between two continuous variables
  • Widely recognized and accepted, many statistical software packages include functions for calculating it
  • Assumes the relationship is linear and the variables are approximately normally distributed; sensitive to outliers and extreme values (see the sketch below)
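
The course uses jamovi, but as a minimal sketch the same coefficient can be computed in Python with SciPy (the data below are invented for illustration):

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical interval-scale data: weekly training hours and VO2max
    hours = np.array([2, 3, 4, 5, 6, 7, 8, 9])
    vo2max = np.array([35, 38, 40, 44, 45, 49, 50, 54])

    r, p = pearsonr(hours, vo2max)  # Pearson r and its two-tailed p-value
    print(f"r = {r:.2f}, p = {p:.4f}")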

Spearman’s rank correlation coefficient

  • Used when data is not normally distributed or there are outliers
  • Based on ranks of data, measures monotonic relationship between two variables
  • Monotonic relationship means variables move in same direction but not necessarily at constant rate
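
A minimal SciPy sketch of Spearman's rho, with invented data that include an extreme value:

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical skewed data: ranks are used, so the extreme value has little effect
    x = np.array([1, 2, 3, 4, 5, 6, 7, 50])
    y = np.array([3, 5, 4, 7, 9, 8, 12, 40])

    rho, p = spearmanr(x, y)
    print(f"rho = {rho:.2f}, p = {p:.4f}")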

Kendall’s tau correlation coefficient

  • Also based on ranks of data, more robust than Spearman’s rank correlation coefficient
  • Used when sample size is small or there are tied ranks in the data
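
A minimal SciPy sketch of Kendall's tau, with a small invented sample containing tied ranks:

    import numpy as np
    from scipy.stats import kendalltau

    # Small hypothetical ordinal sample with tied ranks
    x = np.array([1, 2, 2, 3, 4, 5])
    y = np.array([2, 1, 3, 3, 5, 6])

    tau, p = kendalltau(x, y)  # tau-b, which adjusts for ties
    print(f"tau = {tau:.2f}, p = {p:.4f}")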

Point-biserial correlation coefficient

  • Used when one variable is dichotomous and the other is continuous
  • Measures association between dichotomous and continuous variables
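
A minimal SciPy sketch of the point-biserial correlation, with invented data (group coded 0/1):

    import numpy as np
    from scipy.stats import pointbiserialr

    # Hypothetical data: training group (0 = control, 1 = trained) and grip strength (kg)
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    strength = np.array([28, 31, 30, 33, 41, 45, 39, 44])

    r_pb, p = pointbiserialr(group, strength)
    print(f"r_pb = {r_pb:.2f}, p = {p:.4f}")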

Phi correlation coefficient

  • Used when both variables are dichotomous
  • Measures association between two dichotomous variables
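
For two variables coded 0/1, the phi coefficient is numerically the same as Pearson's r applied to the binary codes, so a minimal sketch (with invented data) is:

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical dichotomous data: smoker (0/1) and in-season injury (0/1)
    smoker = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0])
    injury = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])

    phi, p = pearsonr(smoker, injury)  # phi = Pearson r on the 0/1 codes
    print(f"phi = {phi:.2f}, p = {p:.4f}")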

Gamma correlation coefficient

  • Non-parametric measure of association between two variables (one nominal and one ordinal)
  • Measures strength and direction of monotonic relationship
  • Used in research studies with categorical data or healthcare research
  • Does not assume linear relationship between variables
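
SciPy has no built-in gamma coefficient, so here is a minimal hand-rolled sketch based on its definition (concordant minus discordant pairs, divided by their sum), using invented ordinal data:

    import numpy as np

    def goodman_kruskal_gamma(x, y):
        """Gamma = (C - D) / (C + D), counting concordant (C) and
        discordant (D) pairs and ignoring ties."""
        concordant = discordant = 0
        n = len(x)
        for i in range(n):
            for j in range(i + 1, n):
                dx = np.sign(x[i] - x[j])
                dy = np.sign(y[i] - y[j])
                if dx * dy > 0:
                    concordant += 1
                elif dx * dy < 0:
                    discordant += 1
        if concordant + discordant == 0:
            return float("nan")  # all pairs tied
        return (concordant - discordant) / (concordant + discordant)

    # Hypothetical data: activity level (1-3) and self-rated health (1-5)
    activity = np.array([1, 1, 2, 2, 2, 3, 3, 3])
    health = np.array([2, 1, 3, 3, 2, 4, 5, 4])
    print(round(goodman_kruskal_gamma(activity, health), 2))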

Graphical Representation of Correlation

  • Graphical representation of correlation can provide valuable insights into the relationship between two variables.
  • Scatterplots
    • are a common method of graphical representation of correlation.
    • can also be used to identify outliers and nonlinear relationships between variables.

Scale of measurement of variables

  • The appropriate correlation coefficient depends on the scale of measurement of the variables (nominal, ordinal, interval, or ratio)
  • Restricted range or coarse measurement of a variable can distort (typically attenuate) the observed correlation
  • The scale of measurement should therefore be considered when choosing and interpreting the correlation coefficient

Graphical representation of correlation

  • A scatter plot is a graph displaying the relationship between two continuous variables as points
  • The horizontal axis represents one variable, and the vertical axis represents the other
  • A scatter plot can reveal the presence and strength of a linear relationship between the two variables (see the sketch below)
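
A minimal matplotlib sketch of such a scatter plot (the data are invented):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical data: exercise intensity (% HRmax) and heart rate (bpm)
    intensity = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90])
    heart_rate = np.array([110, 118, 122, 130, 138, 145, 152, 158, 167])

    plt.scatter(intensity, heart_rate)
    plt.xlabel("Exercise intensity (% HRmax)")
    plt.ylabel("Heart rate (bpm)")
    plt.title("Exercise intensity vs. heart rate")
    plt.show()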

Correlation vs. Causation

  • Correlation does not imply causation.
  • Two variables may be correlated, but other factors may be influencing their relationship.

Limitations of Bivariate Correlation

  • Only examines the relationship between two variables and does not consider the influence of other variables.
  • Does not provide information about causation.
  • Assumes a linear relationship and normality, which may not always be appropriate; non-parametric alternatives (e.g., Spearman's rho) or data transformations can address this.

Steps for Conducting Bivariate Correlation

  1. Check for normality.
  2. Check for outliers.
  3. Calculate the correlation coefficient.
  4. Test for statistical significance.
  5. Interpret the results.
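
The analyses in this course are run in jamovi, but here is a minimal end-to-end sketch of these five steps in Python (SciPy assumed available; the data are invented):

    import numpy as np
    from scipy.stats import shapiro, pearsonr

    # Hypothetical data: exercise intensity (% HRmax) and heart rate (bpm)
    intensity = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90, 95])
    heart_rate = np.array([112, 117, 121, 131, 137, 146, 151, 159, 166, 171])

    # 1. Check for normality (Shapiro-Wilk test on each variable)
    for name, data in [("intensity", intensity), ("heart rate", heart_rate)]:
        w, p_norm = shapiro(data)
        print(f"{name}: Shapiro-Wilk p = {p_norm:.3f}")

    # 2. Check for outliers (simple z-score screen, |z| > 3)
    for name, data in [("intensity", intensity), ("heart rate", heart_rate)]:
        z = (data - data.mean()) / data.std(ddof=1)
        print(f"{name}: outliers at positions {np.where(np.abs(z) > 3)[0]}")

    # 3-4. Calculate the correlation coefficient and test its significance
    r, p = pearsonr(intensity, heart_rate)
    print(f"r = {r:.2f}, p = {p:.4f}")

    # 5. Interpret the result in context
    print("Significant at alpha = 0.05" if p < 0.05 else "Not significant")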

Example

  • Study on the relationship between exercise intensity and heart rate.
  • Normality and outlier checks performed.
  • Correlation coefficient calculated (r = 0.80).
  • Statistical significance tested (p < 0.05).
  • Interpretation: Strong positive relationship between exercise intensity and heart rate.

Hypothesis Testing in Bivariate Correlation Analysis

  • Bivariate correlation analysis involves hypothesis testing to determine if there is a significant correlation between the variables of interest.
  • Hypothesis testing assesses whether the correlation coefficient calculated from the sample data is statistically significant, that is, unlikely to have occurred by chance if there were no correlation in the population.

Null & Alternative Hypotheses

  • The null hypothesis for a bivariate correlation analysis is that there is no significant correlation between the two variables.
  • The alternative hypothesis is that there is a significant correlation between the two variables.
  • The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is true.
  • Follow the example below when stating the null (\(H_0\)) and alternative (\(H_1\)) hypotheses.

Stating the Null and Alternative Hypotheses

\(H_0\): There is no linear correlation between the two variables, or the correlation coefficient is zero.

\(H_1\): There is a linear correlation between the two variables, or the correlation coefficient is not zero.

Mathematically, we can represent the null and alternative hypotheses as:

\(H_0: \rho = 0\)

\(H_1: \rho \neq 0\)

where \(\rho\) is the population correlation coefficient.

Steps to Conduct Hypothesis Testing

  • Calculate the correlation coefficient (r) using the sample data, and determine the degrees of freedom (df), equal to the sample size minus two (two variables).
  • Use a table or statistical software to determine the critical value of r for the given significance level (e.g., alpha = 0.05) and degrees of freedom (see the sketch after this list).
  • Compare the absolute value of the calculated correlation coefficient to the critical value.
  • If the absolute value of the calculated correlation coefficient exceeds the critical value, reject the null hypothesis and conclude that there is a significant correlation between the two variables.
  • If it does not exceed the critical value, fail to reject the null hypothesis and conclude that there is no significant correlation between the two variables.
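
A minimal sketch of obtaining the critical value of r from the t distribution, using the standard relationship \(t = r\sqrt{df}/\sqrt{1 - r^2}\) (SciPy assumed available):

    from math import sqrt
    from scipy.stats import t

    def critical_r(n, alpha=0.05):
        """Two-tailed critical value of r for a sample of size n."""
        df = n - 2                            # degrees of freedom
        t_crit = t.ppf(1 - alpha / 2, df)     # two-tailed critical t
        return t_crit / sqrt(t_crit**2 + df)  # convert critical t back to r

    # Example: n = 20 observations, alpha = .05
    print(round(critical_r(20), 3))  # approximately .444, matching published tables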

Assumptions and Interpretation

  • The correlation of two interval (or ratio) variables assumes that the data meet the assumptions of normality and linearity.
  • Normality refers to the assumption that the distribution of the variables being studied is approximately normal.
  • Linearity is the assumption that the relationship between the two variables is linear.
    • If these assumptions are not met, it may be necessary to use non-parametric correlation coefficients or transform the data to meet these assumptions.
  • Interpret the results of hypothesis testing in the context of the research question and consider other factors that may influence the relationship between the variables.
  • Controlling for confounding variables or conducting subgroup analyses to explore potential interactions may be necessary.
    • Partial correlation measures the strength of a relationship between two variables, while controlling for the effect of one or more other variables.
    • For example, you might want to see if there is a correlation between amount of food eaten and blood pressure, while controlling for weight or amount of exercise (see the sketch below).
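
A minimal sketch of a first-order partial correlation computed from residuals (regress each variable on the control variable, then correlate the residuals); the data are invented:

    import numpy as np
    from scipy.stats import pearsonr

    def partial_corr(x, y, z):
        """Correlation between x and y controlling for z, computed as the
        Pearson r of the residuals from simple regressions on z."""
        res_x = x - np.polyval(np.polyfit(z, x, 1), z)
        res_y = y - np.polyval(np.polyfit(z, y, 1), z)
        return pearsonr(res_x, res_y)[0]

    # Hypothetical data: food intake (kcal), blood pressure (mmHg), body mass (kg)
    food = np.array([1800, 2000, 2200, 2500, 2700, 3000, 3200, 3500])
    bp = np.array([112, 115, 118, 121, 125, 128, 131, 136])
    mass = np.array([60, 64, 68, 72, 78, 82, 88, 95])

    print(round(partial_corr(food, bp, mass), 2))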

Correlation and jamovi

  • esci module - Pearson correlation
    • Data: lsj - Parenthood
  • Correlation matrix (Pearson, Spearman, Kendall’s tau)
    • Data: lsj - Anscombe
