TD_ChiSq Function Example | Teradata Vantage - TD_ChiSq Example - Teradata® Database

Database Analytic Functions

Product
Teradata® Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-28
dita:mapPath
Teradata_Vantage™___Advanced_SQL_Engine_Analytic_Functions.withLogo_upload_July2021/wnd1589838592459.ditamap
dita:ditavalPath
Teradata_Vantage™___Advanced_SQL_Engine_Analytic_Functions.withLogo_upload_July2021/ayr1485454803741.ditaval
dita:id
B035-1206
lifecycle
previous
Product Category
Teradata Vantage™

Every complete example in this document is available in a zip file that you can download. The zip file includes a SQL script file that creates the input tables for the examples. If you are reading this document on https://docs.teradata.com/, you can download the zip file from the attachment in the left sidebar.

This example tests whether gender influences smoking habits affiliation. The null hypothesis is that gender and smoking habits affiliation are independent. TD_ChiSq compares the null hypothesis (expected frequencies) to the contingency table (observed frequencies).

Input: contingency1

The contingency table contains the frequencies of men and women affiliated with each smoking habit. category_1, gender, has labels "female" and "male". category_2, habits, has labels "smokers" and "non-smokers".

This example illustrates a two-way contingency table with two categories, category_1 and category_2 respectively.

Each row has a label, i which has a value between 1 to r, and each column has a label, j which has a value between 2 to c. The values of c and r are 3 and 2 respectively.

Here, category 1_label1 corresponds to females and category_1_label2 corresponds to males. Similarly, category2_label1 corresponds to smokers and category2_label2 corresponds to non-smokers.

Query to create the contingency table is as follows:

--Query to construct table contingency1
CREATE MULTISET TABLE contingency1 as
(
select gender as gender
, sum((case when smokinghabit = 'smoker' then 1 else 0 end)) as smoker_cnt
, sum((case when smokinghabit= 'nonsmoker' then 1 else 0 end)) as nonsmoker_cnt
from mytesttable
group by gender
) with data;
 
--Alternate Query to construct table contingency1 using PIVOT
CREATE MULTISET TABLE contingency1 as
(
SELECT *
FROM (select gender as gender, smokinghabit, count(smokinghabit) as smokinghabit_count
from mytesttable group by gender, smokinghabit) as mytesttable 
 PIVOT ( SUM(smokinghabit_count) as smokinghabit_cnt FOR smokinghabit                        
                        IN ('smoker' AS smoker, 'nonsmoker' AS nonsmoker))Tmp
) with data;
                   Habits        
     gender        smokers     non-smokers
    ----------    ---------  -------------
     female            6            9
     male              8            5

SQL Call

SELECT * from TD_CHISQ (
  ON contingency1 AS CONTINGENCY
  OUT TABLE EXPCOUNTS (exptable1)
  USING
  Alpha (0.05)
) AS dt;

Output Table

chi_square	         cramers_v	        df	alpha	                    p_value	         criticalvalue	        conclusion
----------               -------------------    ----     --------------------     -------------------     --------------         -----------------------------
1.29230769230769E 000	 2.14834462211830E-001	1	5.00000000000000E-002	2.55623107546413E-001	3.84145882069412E 000	Fail to reject Null hypothesis

exptable1

gender              smokers           non-smokers
----------  ----------------------  ----------------------
female       7.50000000000000E 000   7.50000000000000E 000
male         6.50000000000000E 000   6.50000000000000E 000