Hypothesis-Test Mode Example 2: Normality Tests with GroupByColumns

Hypothesis-Test Mode Example 2: Normality Tests with GroupByColumns - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.00

1.0

Published

May 2019

Language

English (United States)

Last Update

2019-11-22

dita:mapPath

blj1506016597986.ditamap

dita:ditavalPath

blj1506016597986.ditaval

dita:id

B700-4003

lifecycle

Product Category

Teradata Vantage™

This example shows the use of grouping columns, and also illustrates the syntax for testing against multiple distributions in a single SQL command.

Input

The input table, factory_7, represents hypothetical mean-time-to-failure data for two products. This is a subset of the rows:

factory_7
product	mttf
A	10039.5
A	9926.6
A	9971.34
A	9868.7
A	9940.17
A	10266.7
A	9768.64
A	10043.2
A	10133.7
A	9731.33
...	...
D	9721.21
D	10068.6
D	9952
D	9851.94
D	10378.3
D	9908.9
D	9749.43
D	10448
D	9681.25
D	10147.5
...	...

SQL Call

The function call evaluates two possible distributions (normal and uniform) and applies the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) fit tests.

SELECT * FROM DistributionMatchReduce (
  ON DistributionMatchMultiInput (
	ON (
      SELECT RANK() OVER (PARTITION BY product ORDER BY mttf) AS "rank", product, mttf
	  FROM factory_7 
	  WHERE mttf IS NOT NULL
    ) AS "input" PARTITION BY ANY
	ON (
      SELECT product, COUNT(*) AS group_size 
	  FROM factory_7 
	  WHERE mttf IS NOT NULL 
	  GROUP BY product
    ) AS groupstats DIMENSION
	USING
	ValueColumn ('mttf')
	Tests ('KS', 'AD')
	Distributions ('NORMAL:10000,150','UNIFORMCONTINUOUS:9500,10500')
	GroupByColumns ('product')
	MinGroupSize (50)
  ) PARTITION BY product
) AS dt;

Output

The reported p-values support these conclusions:

For product A:
- Both tests fail to reject the null hypothesis that the data fit a normal distribution with the specified parameters.
- Both tests reject the null hypothesis that the data fit the specified uniform distribution.
For product D:
- Both tests fail to reject the null hypothesis that the data fit a uniform distribution with the specified parameters.
- Both tests reject the null hypothesis that the data fit the specified normal distribution.

In the output table column names, when 'a' and 'b' appear between digits, interpret them as comma (,) and period (.), respectively.

product	group_size	normal$10000a150_ks_statistic	normal$10000a150_ks_p_value	normal$10000a150_ad_statistic	normal$10000a150_ad_p_value	normal$10000a150_chisq_statistic	normal$10000a150_chisq_p_value	uniformcontinuous$9500a10500_ks_statistic	uniformcontinuous$9500a10500_ks_p_value	uniformcontinuous$9500a10500_ad_statistic	uniformcontinuous$9500a10500_ad_p_value	uniformcontinuous$9500a10500_chisq_statistic	uniformcontinuous$9500a10500_chisq_p_value
D	3000	0.207718	0	886.383	0	0.00774023	0.993823	0.442443	0.805828	0.442443	0.805828	4.72667	0.857455
A	3000	0.0113786	0.454754	0.530142	0.175628	0.214864	2.22045e-16	365.583	2e-07	365.583	2e-07	2613.86	0