Biserial Correlation:
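The biserial correlation coefficient is used to measure the strength and direction of the relationship between a dichotomous variable and a continuous variable. The formula used below is the point-biserial form, ( r_{pb} ), which treats the two categories as a genuine dichotomy; the classical biserial coefficient additionally assumes that a normally distributed continuous variable underlies the dichotomy and rescales ( r_{pb} ) accordingly. Here’s how it works with an example: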
Example: Relationship between Exam Scores and Pass/Fail Status
Imagine you have a dataset where students either pass or fail an exam (dichotomous variable), and you also have their exam scores (continuous variable).
- Data Representation:
- Pass/Fail (Dichotomous Variable):
- Pass = 1
- Fail = 0
- Exam Scores (Continuous Variable): Numerical scores ranging from 0 to 100.
- Calculation of Biserial Correlation: Suppose the data for a sample of students looks like this:

| Student | Exam Score | Pass/Fail |
| --- | --- | --- |
| 1 | 85 | Pass |
| 2 | 60 | Fail |
| 3 | 75 | Pass |
| 4 | 40 | Fail |
| 5 | 95 | Pass |
| … | … | … |

To calculate the biserial correlation coefficient ( r_{pb} ):
- Convert Dichotomous Variable:
- Assign a numerical value to the dichotomous variable (e.g., Pass = 1, Fail = 0).
- Calculate Correlation:
- Use the formula:
[ r_{pb} = \frac{M_{1} - M_{0}}{s_{y}} \sqrt{\frac{n_{1} n_{0}}{n (n - 1)}} ]
- Where:
- ( M_{1} ) and ( M_{0} ) are the means of the continuous variable for the groups defined by the dichotomous variable (e.g., mean exam scores for Pass and Fail groups).
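- ( s_{y} ) is the sample standard deviation of the continuous variable, computed over all ( n ) observations with an ( n - 1 ) denominator (which is why ( n (n - 1) ) appears under the square root).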
- ( n_{1} ) and ( n_{0} ) are the frequencies of Pass and Fail groups, respectively.
- ( n ) is the total number of observations.
- Interpretation:
- The biserial correlation ( r_{pb} ) ranges from -1 to +1, where:
- ( r_{pb} > 0 ): Positive relationship (higher scores tend to be associated with Pass status).
- ( r_{pb} < 0 ): Negative relationship (lower scores tend to be associated with Pass status).
- ( |r_{pb}| ) closer to 1 indicates a stronger relationship.
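
The formula above can be checked numerically. What follows is a minimal Python sketch, assuming only the five example rows listed earlier (the elided rows are ignored) and using the sample standard deviation so that it matches the ( n (n - 1) ) term. The function name `point_biserial` is an illustrative choice, not part of any library.

```python
import math
import statistics


def point_biserial(scores, groups):
    """Return r_pb for continuous `scores` and 0/1-coded `groups`."""
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    if any(g not in (0, 1) for g in groups):
        raise ValueError("groups must be coded as 0 or 1")

    # Split the continuous variable by the dichotomous one.
    ones = [s for s, g in zip(scores, groups) if g == 1]
    zeros = [s for s, g in zip(scores, groups) if g == 0]
    n1, n0 = len(ones), len(zeros)
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups need at least one observation")

    m1 = statistics.mean(ones)       # M_1: mean score of the group coded 1
    m0 = statistics.mean(zeros)      # M_0: mean score of the group coded 0
    s_y = statistics.stdev(scores)   # sample SD (n - 1 denominator)
    if s_y == 0:
        raise ValueError("scores have zero variance")

    return (m1 - m0) / s_y * math.sqrt(n1 * n0 / (n * (n - 1)))


# The five example rows: Pass = 1, Fail = 0.
exam_scores = [85, 60, 75, 40, 95]
passed = [1, 0, 1, 0, 1]

r_pb = point_biserial(exam_scores, passed)
print(round(r_pb, 3))
```

For these five rows the group means are 85 (Pass) and 50 (Fail), and the coefficient comes out at roughly 0.89, a strong positive relationship. The same value results from computing an ordinary Pearson correlation between the scores and the 0/1 codes, which is a convenient cross-check.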
Tetrachoric Correlation:
The tetrachoric correlation coefficient measures the strength and direction of the relationship between two dichotomous variables, each assumed to be a thresholded version of a latent continuous variable, where the two latent variables jointly follow a bivariate normal distribution. Here’s an example to illustrate this concept:
Example: Relationship between Two Dichotomous Variables
Consider a dataset where you have two dichotomous variables, such as “smoking status” (Yes/No) and “lung cancer diagnosis” (Yes/No).
- Data Representation:
- Smoking Status (Variable A):
- Yes = 1
- No = 0
- Lung Cancer Diagnosis (Variable B):
- Yes = 1
- No = 0
- Calculation of Tetrachoric Correlation: Suppose you have data from a study:

| Patient | Smoking Status | Lung Cancer Diagnosis |
| --- | --- | --- |
| 1 | Yes | Yes |
| 2 | No | No |
| 3 | Yes | Yes |
| 4 | Yes | No |
| 5 | No | Yes |
| … | … | … |

To calculate the tetrachoric correlation coefficient ( r_{t} ):
- Cross-Tabulation:
- Construct a 2×2 contingency table (cross-tabulation) showing the frequencies of the combinations of the two dichotomous variables.

| | Lung Cancer Yes | Lung Cancer No |
| --- | --- | --- |
| Smoking Yes | ( n_{11} ) | ( n_{10} ) |
| Smoking No | ( n_{01} ) | ( n_{00} ) |
- Calculate Tetrachoric Correlation:
- Use statistical software or specialized tables for calculating the tetrachoric correlation, as it involves estimation using methods that consider the underlying bivariate normal distribution assumption.
- Interpretation:
- The tetrachoric correlation ( r_{t} ) ranges from -1 to +1, where:
- ( r_{t} > 0 ): Positive relationship (both variables tend to be either Yes or No together).
- ( r_{t} < 0 ): Negative relationship (one variable tends to be Yes while the other tends to be No).
- ( |r_{t}| ) closer to 1 indicates a stronger relationship between the two dichotomous variables.
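
The cross-tabulation step and a rough estimate of ( r_{t} ) can be sketched in Python. This is a sketch under stated assumptions: it uses only the five example patients listed earlier, and it swaps in the classical cosine-pi approximation, ( r_{t} \approx \cos(\pi / (1 + \sqrt{n_{11} n_{00} / (n_{10} n_{01})})) ), rather than the iterative estimate that statistical software computes from the bivariate normal model. The function names are hypothetical.

```python
import math
from collections import Counter


def contingency_counts(a, b):
    """Return (n11, n10, n01, n00) for paired 0/1 observations a and b."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    cells = Counter(zip(a, b))
    return cells[(1, 1)], cells[(1, 0)], cells[(0, 1)], cells[(0, 0)]


def tetrachoric_cos_pi(n11, n10, n01, n00):
    """Cosine-pi approximation to the tetrachoric correlation.

    This is an approximation, not the maximum-likelihood estimate.
    """
    concordant = n11 * n00   # cells where the two variables agree
    discordant = n10 * n01   # cells where they disagree
    if concordant == 0 and discordant == 0:
        raise ValueError("approximation is undefined for this table")
    if discordant == 0:
        return 1.0           # odds ratio is infinite
    odds_ratio = concordant / discordant
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


# The five example patients: Yes = 1, No = 0.
smoking = [1, 0, 1, 1, 0]
cancer = [1, 0, 1, 0, 1]

n11, n10, n01, n00 = contingency_counts(smoking, cancer)
r_t = tetrachoric_cos_pi(n11, n10, n01, n00)
print((n11, n10, n01, n00), round(r_t, 3))
```

For the five example rows the table is ( n_{11} = 2 ), ( n_{10} = 1 ), ( n_{01} = 1 ), ( n_{00} = 1 ), giving an odds ratio of 2 and an approximate ( r_{t} ) of about 0.27. With so few observations the number is only illustrative, and the approximation tends to be less reliable when either variable is split far from 50/50, so a software estimate is preferable for real analyses.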
Summary:
- Biserial Correlation is used for measuring relationships between a continuous variable and a dichotomous variable.
- Tetrachoric Correlation is used for measuring relationships between two dichotomous variables whose underlying latent variables are assumed to follow a bivariate normal distribution.
Both correlations provide insights into the strength and direction of associations between variables in different types of data scenarios.