Chi-Squared Tests | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage

Chi-squared tests determine whether the probabilities observed in an RxC contingency table (R rows by C columns) are the same across the groups being compared. The null hypothesis is that the probabilities are the same. Each test outputs a p-value, which you compare to a specified threshold to decide whether to reject the null hypothesis.
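As an illustration of the test described above, the following Python sketch computes the standard Pearson chi-squared statistic and its p-value for a small contingency table. It is not the Vantage Analytics Library interface: the function names `chi2_sf` and `chi2_independence`, the sample counts, and the 0.05 threshold are all assumptions made for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def chi2_independence(table):
    """Pearson chi-squared test on a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical 2x3 table of counts (illustrative data only).
observed = [[10, 20, 30],
            [20, 20, 20]]
alpha = 0.05  # assumed threshold
stat, df, p_value = chi2_independence(observed)
reject_null = p_value < alpha
print(stat, df, p_value, reject_null)
```

For these counts the statistic is about 5.33 with 2 degrees of freedom and the p-value is about 0.069, so the null hypothesis is not rejected at the assumed 0.05 threshold.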

The most common use for chi-squared tests is comparing observed counts to expected counts. For example, suppose a random sample of N people has m males and f females. Under the 50/50 hypothesis, the expected counts are m=½N and f=½N. The chi-squared test can determine whether the difference between the expected and observed counts is significant enough to rule out the 50/50 hypothesis.
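The 50/50 example can be worked through in a few lines of Python. This is a minimal sketch using the standard Pearson statistic with one degree of freedom; the function name `chi2_fifty_fifty` and the sample of 60 males and 40 females are assumptions chosen for illustration.

```python
import math

def chi2_fifty_fifty(males, females):
    """Pearson chi-squared statistic and p-value against a 50/50 split."""
    n = males + females
    expected = n / 2.0  # expected count for each group under the 50/50 hypothesis
    stat = ((males - expected) ** 2 + (females - expected) ** 2) / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical sample: N = 100 with m = 60 and f = 40.
gof_stat, gof_p = chi2_fifty_fifty(males=60, females=40)
print(gof_stat, gof_p)
```

Here the statistic is 4.0 and the p-value is about 0.0455, so a threshold of 0.05 would rule out the 50/50 hypothesis while a threshold of 0.01 would not.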

The chi-squared tests calculate the following measures of association.

| Measure of Association | Description |
|---|---|
| Phi coefficient | Degree of association between two binary variables. Same as Pearson correlation for two dichotomous variables. Adjusts chi-squared significance to factor out sample size. |
| Cramer's V (also called Cramer's phi) | Correlation between two variables. Used to examine association between two categorical variables in tables larger than 2x2 (in which case phi is inappropriate). Attainable upper limit is always 1. Most popular chi-squared-based measure of nominal association. |
| Likelihood ratio chi-squared statistic | Tests hypothesis of no association of columns and rows in nominal-level tabular data. More recent version of chi-squared test, directly related to log-linear analysis and logistic regression. Based on maximum likelihood estimation, using the ratio between observed and expected frequencies instead of the difference between them. |
| Continuity-adjusted chi-squared statistic | Pearson chi-squared statistic adjusted for continuity of chi-squared distribution. Most useful for small samples. Becomes more like Pearson chi-squared statistic as sample size increases. Continuity adjustment is controversial. This chi-squared test is more conservative, and more like Fisher's exact test, when sample is small. |
| Contingency coefficient | Adjusted phi coefficient recommended only for tables 5x5 or larger. Underestimates level of association for smaller tables. Always less than 1.0. Larger value indicates stronger association. |
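
The measures listed above can be sketched with their common textbook definitions. The formulas below are assumptions about how each measure is usually defined, not the library's confirmed implementation; the function name `association_measures`, the dictionary keys, and the sample counts are hypothetical. The continuity adjustment shown is the Yates correction, which is normally applied only to 2x2 tables.

```python
import math

def association_measures(table):
    """Chi-squared-based measures for a table given as a list of rows of counts.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    adjusted = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = observed - expected
            pearson += diff ** 2 / expected
            if observed > 0:  # cells with zero count contribute nothing
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
            # Yates correction: shrink each absolute difference by 0.5.
            adjusted += max(0.0, abs(diff) - 0.5) ** 2 / expected
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    return {
        "pearson_chi2": pearson,
        "phi": math.sqrt(pearson / n),
        "cramers_v": math.sqrt(pearson / (n * smaller_dim)),
        "likelihood_ratio_chi2": likelihood_ratio,
        "continuity_adj_chi2": adjusted,
        "contingency_coeff": math.sqrt(pearson / (pearson + n)),
    }

# Hypothetical 2x2 table of counts (illustrative data only).
measures = association_measures([[30, 10],
                                 [20, 40]])
for name, value in measures.items():
    print(name, round(value, 4))
```

For a 2x2 table, phi and Cramer's V coincide, and the continuity-adjusted statistic is smaller than the Pearson statistic, which matches the description of it as the more conservative test.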