Hypothesis-Test Mode Example 1: Normality Tests without GroupByColumns

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

Input

The example creates input tables t1 and t2 from a table of raw input, raw_normal_50_2, which contains data drawn from a normal distribution with a mean of 50 and a standard deviation of 2. Here are the first 10 rows:

raw_normal_50_2
price
48.0701
52.6426
48.6372
50.9832
50.523
52.1773
50.3103
48.4424
50.1352
50.1382
...
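To see how closely the sample follows the nominal mean of 50 and standard deviation of 2, summary statistics can be computed directly from the raw table. The query below is a minimal sketch. The column aliases are illustrative, and the use of STDDEV_SAMP rather than STDDEV_POP is an assumption, because the example does not state how the parameter values passed to Distributions in the SQL Call were obtained.

```sql
-- Sketch: summarize the non-NULL prices in the raw input.
-- Aliases are illustrative; STDDEV_SAMP vs. STDDEV_POP is an assumption.
SELECT COUNT(price)       AS non_null_rows,
       AVG(price)         AS sample_mean,
       STDDEV_SAMP(price) AS sample_stddev
FROM raw_normal_50_2
WHERE price IS NOT NULL;
```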

The following statements create tables t1 and t2:

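-- t1: a single row holding the number of non-NULL prices (the group size).
CREATE MULTISET TABLE t1 AS (
  SELECT COUNT(*) AS group_size
  FROM raw_normal_50_2
  WHERE price IS NOT NULL
) WITH DATA;

-- t2: each non-NULL price with its rank in ascending order of price.
CREATE MULTISET TABLE t2 AS (
  SELECT RANK() OVER (ORDER BY price) AS "rank", price
  FROM raw_normal_50_2
  WHERE price IS NOT NULL
) WITH DATA;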
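Because both tables apply the same NOT NULL filter, the group_size value in t1 should equal the number of rows in t2. The following sketch checks this, assuming the tables were created exactly as shown; the aliases g, r, and ranked_rows are illustrative.

```sql
-- Sketch: confirm that t1.group_size matches the row count of t2.
SELECT g.group_size,
       r.ranked_rows
FROM t1 AS g
CROSS JOIN (SELECT COUNT(*) AS ranked_rows FROM t2) AS r;

-- Inspect the lowest-ranked prices. "rank" is quoted because RANK is a reserved word.
SELECT TOP 5 "rank", price
FROM t2
ORDER BY "rank";
```

Note that RANK() assigns the same rank to tied prices, so rank values in t2 are not guaranteed to be unique.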

SQL Call

SELECT * FROM DistributionMatchReduce (
  ON DistributionMatchMultiInput (
    ON t2 AS "input" PARTITION BY ANY
    ON t1 AS groupstats DIMENSION
    USING
    ValueColumn ('price')
    Tests ('KS', 'CvM', 'AD', 'CHISQ')
    Distributions ('NORMAL:49.97225, 2.009698')
    MinGroupSize (50)
    NumCell (10)
  ) PARTITION BY 1
) AS dt;

Output

The reported p-values for the four tests range from about 0.35 to 0.43. At conventional significance levels (for example, 0.05), none of the tests rejects the null hypothesis that the data are consistent with a normal distribution with the specified mean and standard deviation.

In the output table column names, when 'a' and 'b' appear between digits, interpret them as comma (,) and period (.), respectively. For example, 49b97225a2b009698 encodes the distribution parameters 49.97225,2.009698.

group_size	normal$49b97225a2b009698_ks_statistic	normal$49b97225a2b009698_ks_p_value	normal$49b97225a2b009698_cvm_statistic	normal$49b97225a2b009698_cvm_p_value	normal$49b97225a2b009698_ad_statistic	normal$49b97225a2b009698_ad_p_value	normal$49b97225a2b009698_chisq_statistic	normal$49b97225a2b009698_chisq_p_value
400	0.03195	0.411803	0.055654	0.430791	0.376152	0.410291	7.8	0.35056
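The interpretation of the p-values can also be expressed as a query. The sketch below assumes that the result of the SQL Call was saved to a table named normality_results (a hypothetical name not used in this example) and that a significance level of 0.05 is appropriate; both are assumptions. The column names are double-quoted because they contain the $ character.

```sql
-- Hypothetical: normality_results holds the output row shown above.
-- The 0.05 threshold is an assumed significance level.
SELECT group_size,
       CASE WHEN "normal$49b97225a2b009698_ks_p_value" < 0.05
            THEN 'reject' ELSE 'do not reject' END AS ks_decision,
       CASE WHEN "normal$49b97225a2b009698_cvm_p_value" < 0.05
            THEN 'reject' ELSE 'do not reject' END AS cvm_decision,
       CASE WHEN "normal$49b97225a2b009698_ad_p_value" < 0.05
            THEN 'reject' ELSE 'do not reject' END AS ad_decision,
       CASE WHEN "normal$49b97225a2b009698_chisq_p_value" < 0.05
            THEN 'reject' ELSE 'do not reject' END AS chisq_decision
FROM normality_results;
```

With the p-values reported above, each of the four decision columns would read 'do not reject'.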