TD_ChiSq Usage Notes - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-10-04
Product Category: Teradata Vantage™

Computational Method

The Chi-Square test checks for a statistically significant association between categorical variables. It determines whether the categorical variables are statistically independent.

A contingency table organizes the data for analysis. A two-way contingency table consists of r rows and c columns, where:
  • Variable 1 corresponds to the rows and has r categories.
  • Variable 2 corresponds to the columns and has c categories.

Each cell of the contingency table is the count of the joint occurrence of particular levels of variable 1 and variable 2.

For example, the following two-way contingency table shows the categorical variable Gender with two levels (Male, Female) and the categorical variable Affiliation with two levels (Smokers, Non-smokers).

Gender Affiliation Table

              Affiliation
Gender      Smokers    Non-Smokers
Male        n11        n12
Female      n21        n22
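
As a rough illustration of how the cell counts arise, the following Python sketch tallies joint occurrences from paired observations. The observations, the level lists, and the helper name contingency_table are hypothetical and invented for this example; they are not part of TD_ChiSq.

```python
from collections import Counter

# Hypothetical (gender, affiliation) observations, invented for illustration.
observations = [
    ("Male", "Smokers"),
    ("Male", "Non-Smokers"),
    ("Female", "Non-Smokers"),
    ("Male", "Smokers"),
    ("Female", "Smokers"),
    ("Female", "Non-Smokers"),
]

row_levels = ["Male", "Female"]          # levels of variable 1 (rows)
col_levels = ["Smokers", "Non-Smokers"]  # levels of variable 2 (columns)


def contingency_table(pairs, rows, cols):
    """Return an r x c list of lists of joint-occurrence counts."""
    counts = Counter(pairs)
    # Counter returns 0 for combinations that never occur.
    return [[counts[(r, c)] for c in cols] for r in rows]


table = contingency_table(observations, row_levels, col_levels)
print(table)  # [[2, 1], [1, 2]]
```

Here table[0][0] plays the role of n11, table[0][1] of n12, and so on.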

The cell counts nij (i = 1, 2; j = 1, 2) are the numbers of joint occurrences of Gender and Affiliation at their ith and jth levels, respectively. The Null and alternative hypotheses, H0 and H1, for a χ2 test of independence are as follows:

H0: The two categorical variables are independent

vs

H1: The two categorical variables are not independent

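Use the previous table to calculate the expected cell counts under H0. Each expected count is the product of the cell's row total and column total, divided by the grand total n = n11 + n12 + n21 + n22:

e11 = (n11 + n12)(n11 + n21) / n

e12 = (n11 + n12)(n12 + n22) / n

e21 = (n21 + n22)(n11 + n21) / n

e22 = (n21 + n22)(n12 + n22) / n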
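
The expected counts can be sketched in Python as follows. This is a minimal sketch that assumes the standard expected count for a test of independence (row total times column total, divided by the grand total). The function name expected_counts and the observed values are hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts under independence for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total of observations
    # e_ij = (row i total) * (column j total) / n
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical observed counts: rows = Male, Female; columns = Smokers, Non-Smokers.
observed = [[30, 20], [10, 40]]
expected = expected_counts(observed)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

The expected counts keep the same row and column totals as the observed table.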

The following formula calculates the χ2 test statistic:


\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \]
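
A small Python sketch of the test statistic, assuming the usual Pearson form: the sum over all cells of (nij - eij)^2 / eij, with eij computed as row total times column total divided by the grand total. The function name and the observed counts are hypothetical, and this is not the function's internal implementation.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (n_ij - e_ij) ** 2 / e_ij
    return stat


# Hypothetical observed counts, invented for illustration.
stat = chi_square_statistic([[30, 20], [10, 40]])
print(round(stat, 4))  # 16.6667
```

A table whose observed counts match the expected counts exactly yields a statistic of 0.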

The χ2 statistic follows a Chi-Square distribution with (r - 1)(c - 1) degrees of freedom. In the Gender Affiliation table, r = 2 and c = 2, so the test has 1 degree of freedom. The Null hypothesis H0 is rejected if χ2stat > χ2(r-1)(c-1),α, the critical value at significance level α, where α ∈ {0.10, 0.05, 0.01}.
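
The following Python sketch shows the degrees of freedom and the critical-value lookup for the 2 x 2 case. It assumes the standard (r - 1)(c - 1) degrees of freedom for a test of independence and hard-codes the upper-tail critical values for 1 degree of freedom from standard chi-square tables (rounded to three decimals). The names are hypothetical, and other degrees of freedom are deliberately not tabulated here.

```python
def degrees_of_freedom(r, c):
    """Degrees of freedom for an r x c test of independence."""
    return (r - 1) * (c - 1)


# Upper-tail chi-square critical values for 1 degree of freedom,
# taken from standard tables and rounded to three decimals.
CRITICAL_VALUES_DF1 = {0.10: 2.706, 0.05: 3.841, 0.01: 6.635}


def critical_value(r, c, alpha):
    """Critical value for a 2 x 2 table; other sizes are out of scope here."""
    if degrees_of_freedom(r, c) != 1:
        raise ValueError("this sketch only tabulates 1 degree of freedom")
    return CRITICAL_VALUES_DF1[alpha]


print(degrees_of_freedom(2, 2))    # 1
print(critical_value(2, 2, 0.05))  # 3.841
```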

The following formula calculates the Cramer's V statistic:


\[ V = \sqrt{\frac{\varphi^2}{\min(c - 1,\, r - 1)}} = \sqrt{\frac{\chi^2 / n}{\min(c - 1,\, r - 1)}} \]
where:
  • φ is the phi coefficient
  • χ2 is the test statistic from Pearson's chi-squared test
  • n is the grand total of observations
  • c is the number of columns
  • r is the number of rows
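
As an illustration, Cramer's V can be sketched in Python under the common definition V = sqrt((χ2 / n) / min(c - 1, r - 1)), where χ2 / n is the squared phi coefficient. The function name and the input values are hypothetical.

```python
import math


def cramers_v(chi_square_stat, n, r, c):
    """Cramer's V from a chi-square statistic, grand total n, and table size."""
    phi_squared = chi_square_stat / n  # squared phi coefficient
    return math.sqrt(phi_squared / min(c - 1, r - 1))


# Hypothetical inputs: chi-square statistic 50/3 from 100 observations in a 2 x 2 table.
v = cramers_v(50 / 3, 100, 2, 2)
print(round(v, 4))  # 0.4082
```

For a 2 x 2 table, min(c - 1, r - 1) is 1, so V equals the absolute value of the phi coefficient.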
The function uses the following rules to determine the hypothesis conclusion:
  • If the chi-square statistic is greater than the critical value, then the function rejects the Null hypothesis.
  • If the chi-square statistic is less than or equal to the critical value, then the function fails to reject the Null hypothesis.
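
The two rules can be sketched as a single comparison in Python. The function name and the returned strings are hypothetical and do not represent the function's actual output.

```python
def hypothesis_conclusion(chi_square_stat, critical_value):
    """Apply the decision rule: reject only when the statistic exceeds the critical value."""
    if chi_square_stat > critical_value:
        return "Reject Null hypothesis"
    # Equality falls on the "fail to reject" side of the rule.
    return "Fail to reject Null hypothesis"


# Hypothetical statistic compared with the 1-degree-of-freedom critical value at alpha = 0.05.
print(hypothesis_conclusion(16.67, 3.841))  # Reject Null hypothesis
print(hypothesis_conclusion(3.841, 3.841))  # Fail to reject Null hypothesis
```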