Every complete example in this document is available in a zip file that you can download. The zip file includes a SQL script file that creates the input tables for the examples. If you are reading this document on https://docs.teradata.com/, you can download the zip file from the attachment in the left sidebar.
This example tests whether gender influences party affiliation. The null hypothesis is that gender and party affiliation are independent. TD_ChiSq compares the null hypothesis (expected frequencies) to the contingency table (observed frequencies).
TD_ChiSq Input: contingency1
The contingency table contains the frequencies of men and women affiliated with each party. category_1, gender, has the following labels: female and male. category_2, party, has the following labels: Democrats and Republicans.
This example illustrates a two-way contingency table with two categories, category_1 and category_2, respectively.
Each row has a label, i which has a value between 1 to r, and each column has a label, j which has a value between 2 to c. The values of c and r are 3 and 2, respectively.
In this example, category 1_label1 corresponds to females and category_1_label2 corresponds to males. Similarly, category2_label1 corresponds to Democrats and category2_label2 corresponds to Republicans.
Query to create the contingency table is as follows:
Query to construct table contingency1
CREATE MULTISET TABLE contingency1 AS ( SELECT gender AS gender , sum((case when party = 'dem' then 1 else 0 end)) as dem_party_cnt , sum((case when party = 'rep' then 1 else 0 end)) as rep_party_cnt FROM mytesttable GROUP BY gender ) with data;
The input table, contingency1, can be like this:
gender | dem | rep |
---|---|---|
male | 8 | 5 |
female | 6 | 9 |
Alternate Query to construct table contingency1 using PIVOT
DROP TABLE contingency1; CREATE MULTISET TABLE contingency1 AS ( SELECT * FROM (select gender AS gender, party, count(party) AS party_count FROM mytesttable group by gender, party) AS mytesttable PIVOT ( SUM(party_count) AS party_cnt FOR party IN ('dem' AS dem, 'rep' AS rep))Tmp ) with data;
The input table, contingency1 using PIVOT, can be like this:
gender | dem | rep |
---|---|---|
male | 8 | 5 |
female | 6 | 9 |
Alternate Query to construct table contingency1 using TD_Pivoting
DROP TABLE contingency1; CREATE MULTISET TABLE contingency1 AS ( SELECT * FROM TD_Pivoting ( ON (select gender AS gender, party, count(party) AS party_count FROM mytesttable GROUP BY gender, party) AS InputTable PARTITION BY gender USING PartitionColumns ('gender') TargetColumns ('party_count') PivotColumn('party') PivotKeys('dem','rep') Aggregation('party_count:SUM') OutputColumnNames('dem_party_cnt','rep_party_cnt') ) AS dt ) with data;
The input table, contingency1 using TD_Pivoting, can be like this:
gender | dem | rep |
---|---|---|
male | 8 | 5 |
female | 6 | 9 |
TD_ChiSq SQL Call
SELECT * FROM TD_ChiSq ( ON contingency1 AS CONTINGENCY OUT TABLE EXPCOUNTS (exptable1) USING Alpha (0.05) ) AS dt;
TD_ChiSq Output Table
chi_square | cramers_v | df | alpha | p_value | criticalvalue | conclusion |
---|---|---|---|---|---|---|
1.292307692 | 0.214834462 | 1 | 0.050000000 | 0.255623108 | 3.841458821 | Fail to reject Null hypothesis |
exptable1 SQL Call
SELECT * FROM exptable1;
exptable1 Output Table
gender | dem | rep |
---|---|---|
male | 6.500000000 | 6.500000000 |
female | 7.500000000 | 7.500000000 |