TD_ChiSq

Database Analytic Functions

Product
Teradata® Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-28
dita:id
B035-1206
Product Category
Teradata Vantage™

TD_ChiSq performs Pearson's chi-squared (χ²) test for independence, which determines whether there is a statistically significant difference between the expected and observed frequencies in one or more cells of a contingency table (also called a cross tabulation).

Test Type

  • One-tailed, upper-tailed
  • One-sample
  • Unpaired

Computational Method

The chi-square test finds statistically significant associations between categorical variables by determining whether the variables are statistically independent.

The data for analysis is organized in a table known as a contingency table. A two-way contingency table consists of r rows and c columns, where:
  • The rows correspond to variable 1 that consists of r categories
  • The columns correspond to variable 2 that consists of c categories

Each cell of the contingency table is the count of the joint occurrence of particular levels of variable 1 and variable 2.

For example, the following two-way contingency table shows the categorical variable Gender with two levels (Male, Female) and the categorical variable Affiliation with two levels (Smokers, Non-smokers).

Gender Affiliation table

          Affiliation
Gender    Smokers    Non-Smokers
Male      n11        n12
Female    n21        n22
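
As a minimal Python sketch (not part of TD_ChiSq itself), a two-way contingency table like the one above can be built by counting joint occurrences of the two variables. The observation counts and the helper name `contingency_table` are made up for illustration.

```python
from collections import Counter

def contingency_table(pairs, row_levels, col_levels):
    """Count joint occurrences of (row level, column level) pairs."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical raw observations: one (Gender, Affiliation) pair per person.
observations = (
    [("Male", "Smokers")] * 30
    + [("Male", "Non-smokers")] * 20
    + [("Female", "Smokers")] * 10
    + [("Female", "Non-smokers")] * 40
)

table = contingency_table(
    observations, ["Male", "Female"], ["Smokers", "Non-smokers"]
)
print(table)  # [[30, 20], [10, 40]]
```

Each inner list is one row of the table, so `table[0][1]` plays the role of n12.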

The cell counts nij, i = 1, 2; j = 1, 2, are the numbers of joint occurrences of Gender at its ith level and Affiliation at its jth level. The null and alternative hypotheses H0 and H1 for a χ² test of independence are as follows:

H0: The two categorical variables are independent

vs

H1: The two categorical variables are not independent

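Using the above table, each expected cell count is the product of the corresponding row total and column total, divided by the grand total n = n11 + n12 + n21 + n22:

e11 = (n11 + n12)(n11 + n21) / n

e12 = (n11 + n12)(n12 + n22) / n

e21 = (n21 + n22)(n11 + n21) / n

e22 = (n21 + n22)(n12 + n22) / n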
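
The expected counts under independence can be sketched in Python using the standard rule that each expected count equals (row total × column total) / grand total. This is an illustration only, not the TD_ChiSq implementation; the observed counts and the function name `expected_counts` are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # grand total of observations
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts: rows = Male, Female; columns = Smokers, Non-smokers.
observed = [[30, 20], [10, 40]]
expected = expected_counts(observed)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

The expected counts always sum to the same grand total as the observed counts, which is a quick sanity check on the calculation.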

The χ² test statistic is calculated as:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$
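
A minimal Python sketch of the Pearson statistic follows, assuming the usual definition: the sum over all cells of (observed − expected)² / expected, with expected counts computed as row total × column total / grand total. The counts are made up and the function name `chi_square_statistic` is hypothetical; this is not the function's internal code.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a two-way contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count
            stat += (n_ij - e_ij) ** 2 / e_ij
    return stat

# Hypothetical counts: rows = Male, Female; columns = Smokers, Non-smokers.
stat = chi_square_statistic([[30, 20], [10, 40]])
print(round(stat, 4))  # 16.6667
```

A table whose rows are exactly proportional, such as [[10, 20], [20, 40]], gives a statistic of 0 because every observed count equals its expected count.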

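Under H0, the χ² statistic approximately follows a chi-square distribution with (r - 1)(c - 1) degrees of freedom. In the Gender Affiliation table, r = 2 and c = 2, so there is 1 degree of freedom. The null hypothesis H0 is rejected if the χ² statistic exceeds the upper-tail critical value of the chi-square distribution with (r - 1)(c - 1) degrees of freedom at significance level α, where α ∈ {0.10, 0.05, 0.01}.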

The Cramér's V statistic is calculated using the following formula:

$$V = \sqrt{\frac{\varphi^2}{\min(c - 1,\ r - 1)}} = \sqrt{\frac{\chi^2 / n}{\min(c - 1,\ r - 1)}}$$
where:
  • φ is the phi coefficient
  • χ² is derived from the Pearson's chi-squared test
  • n is the grand total of observations
  • c is the number of columns
  • r is the number of rows
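
As an illustration, Cramér's V can be sketched in Python from the quantities listed above, assuming the standard definition V = sqrt((χ² / n) / min(c − 1, r − 1)), where χ² / n is the squared phi coefficient. The input values are hypothetical and the function name `cramers_v` is not part of TD_ChiSq.

```python
import math

def cramers_v(chi2, n, r, c):
    """Cramér's V from the chi-square statistic, grand total, and table shape."""
    phi_squared = chi2 / n  # squared phi coefficient
    return math.sqrt(phi_squared / min(c - 1, r - 1))

# Hypothetical 2 x 2 table with chi-square statistic 50/3 and 100 observations.
v = cramers_v(50 / 3, 100, 2, 2)
print(round(v, 4))  # 0.4082
```

V ranges from 0 (no association) to 1 (perfect association), which makes it easier to compare across tables of different sizes than the raw χ² value.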
The following rules are used to compute the hypothesis conclusion:
  • If the chi-square statistic is greater than the critical value, then the function rejects the Null hypothesis.
  • If the chi-square statistic is less than or equal to the critical value, then the function fails to reject the Null hypothesis.
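
The decision rule above can be sketched in Python for a 2 × 2 table, assuming the usual (r − 1)(c − 1) degrees of freedom. The critical values are the standard upper-tail chi-square table values for 1 degree of freedom, rounded to three decimals; the function names and the returned strings are hypothetical and do not represent the output format of TD_ChiSq.

```python
# Upper-tail chi-square critical values for 1 degree of freedom
# (standard table values, rounded to three decimals).
CRITICAL_VALUES_DF1 = {0.10: 2.706, 0.05: 3.841, 0.01: 6.635}

def degrees_of_freedom(r, c):
    """Degrees of freedom for an r x c contingency table."""
    return (r - 1) * (c - 1)

def conclude(chi2_stat, alpha):
    """Apply the decision rule for a table with 1 degree of freedom."""
    critical = CRITICAL_VALUES_DF1[alpha]
    if chi2_stat > critical:
        return "reject H0"
    return "fail to reject H0"

print(degrees_of_freedom(2, 2))  # 1
print(conclude(16.667, 0.05))    # reject H0
print(conclude(3.0, 0.05))       # fail to reject H0
```

Larger tables need critical values for their own degrees of freedom, which this sketch deliberately does not include.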