TD_ChiSq Example | chisq | Teradata Vantage - Example: How to Use TD_ChiSq

TD_ChiSq Example | chisq | Teradata Vantage - Example: How to Use TD_ChiSq - Analytics Database

Database Analytic Functions

Deployment

VantageCloud

VantageCore

Edition

Enterprise

IntelliFlex

VMware

Product

Analytics Database

Release Number

17.20

Published

June 2022

Language

English (United States)

Last Update

2024-04-06

dita:mapPath

gjn1627595495337.ditamap

dita:ditavalPath

ayr1485454803741.ditaval

dita:id

jmh1512506877710

Product Category

Teradata Vantage™

Every complete example in this document is available in a zip file that you can download. The zip file includes a SQL script file that creates the input tables for the examples. If you are reading this document on https://docs.teradata.com/, you can download the zip file from the attachment in the left sidebar.

This example tests whether gender influences party affiliation. The null hypothesis is that gender and party affiliation are independent. TD_ChiSq compares the null hypothesis (expected frequencies) to the contingency table (observed frequencies).

TD_ChiSq Input: contingency1

The contingency table contains the frequencies of men and women affiliated with each party. category_1, gender, has the following labels: female and male. category_2, party, has the following labels: Democrats and Republicans.

This example illustrates a two-way contingency table with two categories, category_1 and category_2, respectively.

Each row has a label, i which has a value between 1 to r, and each column has a label, j which has a value between 2 to c. The values of c and r are 3 and 2, respectively.

In this example, category 1_label1 corresponds to females and category_1_label2 corresponds to males. Similarly, category2_label1 corresponds to Democrats and category2_label2 corresponds to Republicans.

Query to create the contingency table is as follows:

Query to construct table contingency1

CREATE MULTISET TABLE contingency1 AS
(
SELECT gender AS gender
, sum((case when party = 'dem' then 1 else 0 end)) as dem_party_cnt
, sum((case when party = 'rep' then 1 else 0 end)) as rep_party_cnt
FROM mytesttable
GROUP BY gender
) with data;

The input table, contingency1, can be like this:

gender	dem	rep
male	8	5
female	6	9

Alternate Query to construct table contingency1 using PIVOT

DROP TABLE contingency1;
CREATE MULTISET TABLE contingency1 AS
(
SELECT *
FROM (select gender AS gender, party, count(party) AS party_count
FROM mytesttable group by gender, party) AS mytesttable
 PIVOT ( SUM(party_count) AS party_cnt FOR party
                        IN ('dem' AS dem, 'rep' AS rep))Tmp
) with data;

The input table, contingency1 using PIVOT, can be like this:

gender	dem	rep
male	8	5
female	6	9

Alternate Query to construct table contingency1 using TD_Pivoting

DROP TABLE contingency1;
CREATE MULTISET TABLE contingency1 AS (
SELECT * FROM TD_Pivoting (
  ON (select gender AS gender, party, count(party) AS party_count FROM mytesttable GROUP BY gender, party) AS InputTable PARTITION BY gender
  USING
  PartitionColumns ('gender')
  TargetColumns ('party_count')
  PivotColumn('party')
  PivotKeys('dem','rep')
  Aggregation('party_count:SUM')
  OutputColumnNames('dem_party_cnt','rep_party_cnt')
) AS dt
) with data;

The input table, contingency1 using TD_Pivoting, can be like this:

gender	dem	rep
male	8	5
female	6	9

TD_ChiSq SQL Call

SELECT * FROM TD_ChiSq (
  ON contingency1 AS CONTINGENCY
  OUT TABLE EXPCOUNTS (exptable1)
  USING
  Alpha (0.05)
) AS dt;

TD_ChiSq Output Table

chi_square	cramers_v	df	alpha	p_value	criticalvalue	conclusion
1.292307692	0.214834462	1	0.050000000	0.255623108	3.841458821	Fail to reject Null hypothesis

exptable1 SQL Call

SELECT * FROM exptable1;

exptable1 Output Table

gender	dem	rep
male	6.500000000	6.500000000
female	7.500000000	7.500000000