Computational Method
The Chi-Square test detects statistically significant associations between categorical variables by testing whether the variables are statistically independent. The observed data are arranged in an r × c contingency table, where:
- Variable 1 corresponds to the rows and has r categories.
- Variable 2 corresponds to the columns and has c categories.
Each cell of the contingency table is the count of the joint occurrence of particular levels of variable 1 and variable 2.
For example, the following two-way contingency table shows the categorical variable Gender with two levels (Male, Female) and the categorical variable Affiliation with two levels (Smokers, Non-smokers).
Gender | Affiliation: Smokers | Affiliation: Non-Smokers |
---|---|---|
Male | n11 | n12 |
Female | n21 | n22 |
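As a rough illustration of how such a table could be tallied from raw data, the sketch below counts joint occurrences with the Python standard library. The observations and the helper name `contingency_table` are hypothetical and are not part of the function described here.

```python
from collections import Counter

# Hypothetical raw observations: one (gender, affiliation) pair per subject.
observations = [
    ("Male", "Smokers"),
    ("Male", "Non-Smokers"),
    ("Female", "Non-Smokers"),
    ("Male", "Smokers"),
    ("Female", "Smokers"),
    ("Female", "Non-Smokers"),
]

row_levels = ["Male", "Female"]          # levels of variable 1 (rows)
col_levels = ["Smokers", "Non-Smokers"]  # levels of variable 2 (columns)


def contingency_table(pairs, rows, cols):
    """Return an r x c list of lists holding the joint counts n_ij."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in cols] for r in rows]


table = contingency_table(observations, row_levels, col_levels)
print(table)  # [[2, 1], [1, 2]]
```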
The cell counts nij, i = 1, 2; j = 1, 2, are the numbers of joint occurrences of Gender at its ith level and Affiliation at its jth level. The Null and alternative hypotheses, H0 and H1, for a χ2 test of independence are as follows:
H0: The two categorical variables are independent
vs
H1: The two categorical variables are not independent
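Use the previous table to calculate the expected cell counts under H0. Each expected count is the product of the corresponding row total and column total, divided by the grand total n = n11 + n12 + n21 + n22:

$$e_{11} = \frac{(n_{11} + n_{12})(n_{11} + n_{21})}{n}$$

$$e_{12} = \frac{(n_{11} + n_{12})(n_{12} + n_{22})}{n}$$

$$e_{21} = \frac{(n_{21} + n_{22})(n_{11} + n_{21})}{n}$$

$$e_{22} = \frac{(n_{21} + n_{22})(n_{12} + n_{22})}{n}$$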
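A minimal sketch of the expected-count calculation, using the standard rule that each expected count equals row total times column total divided by the grand total. The function name `expected_counts` and the observed counts are hypothetical, chosen only for illustration.

```python
def expected_counts(table):
    """Return expected counts e_ij = (row i total * column j total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # grand total of observations
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical observed counts: rows are Male, Female;
# columns are Smokers, Non-Smokers.
observed = [[30, 20], [10, 40]]
expected = expected_counts(observed)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on any implementation.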
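The following formula calculates the χ2 test statistic from the observed cell counts nij and the expected cell counts eij:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$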
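Under H0, the χ2 statistic approximately follows a Chi-Square distribution with (r - 1)(c - 1) degrees of freedom. In the Gender Affiliation table, r = 2 and c = 2, so the test has 1 degree of freedom. The Null hypothesis H0 is rejected at significance level α if $\chi^2_{\text{stat}} > \chi^2_{(r-1)(c-1),\,\alpha}$, where α ∈ {0.10, 0.05, 0.01} and $\chi^2_{(r-1)(c-1),\,\alpha}$ is the upper-α critical value of that distribution.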
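The statistic can be sketched in a few lines of plain Python, summing (observed - expected)² / expected over all cells and taking the degrees of freedom as (r - 1)(c - 1). The function name `chi_square_statistic` and the example counts are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (n_ij - e_ij) ** 2 / e_ij

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical observed counts for the 2 x 2 Gender by Affiliation table.
stat, df = chi_square_statistic([[30, 20], [10, 40]])
print(round(stat, 4), df)  # 16.6667 1
```

This sketch assumes every expected count is positive; a table with an empty row or column would need to be handled separately.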
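The following formula calculates the Cramer's V statistic:

$$V = \sqrt{\frac{\varphi^2}{\min(c - 1,\, r - 1)}} = \sqrt{\frac{\chi^2 / n}{\min(c - 1,\, r - 1)}}$$

where:
- φ is the phi coefficient, with φ2 = χ2 / n
- χ2 is the statistic from Pearson's chi-squared test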
- n is the grand total of observations
- c is the number of columns
- r is the number of rows
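As an illustration of the quantities listed above, Cramer's V can be computed as the square root of (χ2 / n) divided by min(c - 1, r - 1). The function name `cramers_v` and the input values are hypothetical.

```python
import math


def cramers_v(chi2, n, r, c):
    """Return Cramer's V for a chi-square statistic from an r x c table."""
    phi_squared = chi2 / n  # squared phi coefficient
    return math.sqrt(phi_squared / min(c - 1, r - 1))


# Hypothetical values: chi-square statistic 50/3 from a 2 x 2 table
# with n = 100 observations.
v = cramers_v(50 / 3, 100, 2, 2)
print(round(v, 4))  # 0.4082
```

V ranges from 0 (no association) to 1 (perfect association), which makes it easier to compare across tables of different sizes than the raw χ2 value.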
The function compares the chi-square statistic with the critical value to reach a decision:
- If the chi-square statistic is greater than the critical value, then the function rejects the Null hypothesis.
- If the chi-square statistic is less than or equal to the critical value, then the function fails to reject the Null hypothesis.
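The decision rule can be sketched as a simple comparison. To stay within the standard library, this example hard-codes the well-known upper-tail critical values for 1 degree of freedom (the 2 × 2 case); the names `CRITICAL_DF1` and `decide` are hypothetical, and other table sizes would need critical values for their own degrees of freedom.

```python
# Upper-tail critical values of the chi-square distribution with
# 1 degree of freedom, keyed by significance level alpha.
CRITICAL_DF1 = {0.10: 2.706, 0.05: 3.841, 0.01: 6.635}


def decide(chi2_stat, alpha, critical_values=CRITICAL_DF1):
    """Apply the decision rule: reject H0 only if the statistic exceeds the critical value."""
    critical = critical_values[alpha]
    if chi2_stat > critical:
        return "reject H0"
    return "fail to reject H0"


print(decide(16.67, 0.05))  # reject H0
print(decide(3.0, 0.05))    # fail to reject H0
```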