TD_ChiSq Example | chisq | Teradata Vantage - Example: How to Use TD_ChiSq - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
dita:mapPath
gjn1627595495337.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
jmh1512506877710
lifecycle
latest
Product Category
Teradata Vantageā„¢

Every complete example in this document is available in a zip file that you can download. The zip file includes a SQL script file that creates the input tables for the examples. If you are reading this document on https://docs.teradata.com/, you can download the zip file from the attachment in the left sidebar.

This example tests whether gender influences party affiliation. The null hypothesis is that gender and party affiliation are independent. TD_ChiSq compares the null hypothesis (expected frequencies) to the contingency table (observed frequencies).

TD_ChiSq Input: contingency1

The contingency table contains the frequencies of men and women affiliated with each party. category_1, gender, has the following labels: female and male. category_2, party, has the following labels: Democrats and Republicans.

This example illustrates a two-way contingency table with two categories, category_1 and category_2, respectively.

Each row has a label, i which has a value between 1 to r, and each column has a label, j which has a value between 2 to c. The values of c and r are 3 and 2, respectively.

In this example, category 1_label1 corresponds to females and category_1_label2 corresponds to males. Similarly, category2_label1 corresponds to Democrats and category2_label2 corresponds to Republicans.

Query to create the contingency table is as follows:

Query to construct table contingency1

CREATE MULTISET TABLE contingency1 AS
(
SELECT gender AS gender
, sum((case when party = 'dem' then 1 else 0 end)) as dem_party_cnt
, sum((case when party = 'rep' then 1 else 0 end)) as rep_party_cnt
FROM mytesttable
GROUP BY gender
) with data;

The input table, contingency1, can be like this:

gender dem rep
male 8 5
female 6 9

Alternate Query to construct table contingency1 using PIVOT

DROP TABLE contingency1;
CREATE MULTISET TABLE contingency1 AS
(
SELECT *
FROM (select gender AS gender, party, count(party) AS party_count
FROM mytesttable group by gender, party) AS mytesttable
 PIVOT ( SUM(party_count) AS party_cnt FOR party
                        IN ('dem' AS dem, 'rep' AS rep))Tmp
) with data;

The input table, contingency1 using PIVOT, can be like this:

gender dem rep
male 8 5
female 6 9

Alternate Query to construct table contingency1 using TD_Pivoting

DROP TABLE contingency1;
CREATE MULTISET TABLE contingency1 AS (
SELECT * FROM TD_Pivoting (
  ON (select gender AS gender, party, count(party) AS party_count FROM mytesttable GROUP BY gender, party) AS InputTable PARTITION BY gender
  USING
  PartitionColumns ('gender')
  TargetColumns ('party_count')
  PivotColumn('party')
  PivotKeys('dem','rep')
  Aggregation('party_count:SUM')
  OutputColumnNames('dem_party_cnt','rep_party_cnt')
) AS dt
) with data;

The input table, contingency1 using TD_Pivoting, can be like this:

gender dem rep
male 8 5
female 6 9

TD_ChiSq SQL Call

SELECT * FROM TD_ChiSq (
  ON contingency1 AS CONTINGENCY
  OUT TABLE EXPCOUNTS (exptable1)
  USING
  Alpha (0.05)
) AS dt;

TD_ChiSq Output Table

chi_square cramers_v df alpha p_value criticalvalue conclusion
1.292307692 0.214834462 1 0.050000000 0.255623108 3.841458821 Fail to reject Null hypothesis

exptable1 SQL Call

SELECT * FROM exptable1;

exptable1 Output Table

gender dem rep
male 6.500000000 6.500000000
female 7.500000000 7.500000000