Distribution Matching Hypothesis-Test Mode Syntax Elements

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10
TargetColumn
Specify the name of the InputTable column that contains the values of the sample data set.
Tests
[Optional] Specify one to four tests to perform:
test       Description
'KS'       Kolmogorov-Smirnov test.
'CvM'      Cramér-von Mises criterion.
'AD'       Anderson-Darling test.
'CHISQ'    Pearson's Chi-squared test.
Default: All tests
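To illustrate what a test such as 'KS' measures, the following minimal Python sketch computes the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF of the sample and a reference CDF. It is not the function's implementation. The names normal_cdf and ks_statistic are hypothetical and exist only for this illustration.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample compared against a standard normal reference.
print(ks_statistic([-1.2, -0.3, 0.1, 0.4, 1.5], normal_cdf))
```

A small statistic means the sample's empirical distribution lies close to the reference distribution.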
Distributions
Specify the reference distributions and their parameters. Either all distributions must be continuous or all must be discrete.
Continuous Distributions and Parameters
distribution:parameters    Parameter Descriptions
BETA:α,β
    α > 0 is the first shape parameter.
    β > 0 is the second shape parameter.
CAUCHY:x,θ
    x, a DOUBLE PRECISION value, is the median parameter.
    θ > 0 is the scale parameter.
CHISQ:k
    k, a positive INTEGER, is the degrees of freedom.
EXPONENTIAL:θ
    θ > 0 is the mean parameter, which is the inverse of the rate.
F:d1,d2
    d1 > 0 and d2 > 0 are the degrees of freedom.
GAMMA:k,θ
    k > 0 is the shape parameter.
    θ > 0 is the scale parameter.
LOGNORMAL:μ,σ
    μ, a DOUBLE PRECISION value, is the mean.
    σ > 0 is the standard deviation.
NORMAL:μ,σ
    μ, a DOUBLE PRECISION value, is the mean.
    σ > 0 is the standard deviation.
T:k
    k, a positive INTEGER, is the degrees of freedom.
TRIANGULAR:a,c,b
    a <= c <= b and a < b, where a is the lower limit of the distribution (inclusive), b is the upper limit (inclusive), and c is the mode.
UNIFORMCONTINUOUS:a,b
    a < b, where a is the lower bound of the distribution (inclusive) and b is the upper bound (exclusive).
WEIBULL:α,β
    α > 0 is the shape parameter.
    β > 0 is the scale parameter.
    The function uses the two-parameter form of the distribution defined by the Weibull Distribution, http://mathworld.wolfram.com/WeibullDistribution.html, equations (1) and (2).
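The parameter constraints for the continuous distributions can be checked before a query is submitted. The following is a minimal Python sketch under stated assumptions: the helper names parse_spec and is_valid are hypothetical, the specification is assumed to be a plain 'NAME:p1,p2' string, and the exact quoting and whitespace rules that the function accepts are not covered.

```python
def parse_spec(spec):
    """Split 'NAME:p1,p2,...' into (NAME, [float parameters])."""
    name, _, params = spec.partition(":")
    return name.strip().upper(), [float(p) for p in params.split(",")]

# One predicate per continuous distribution, mirroring the documented constraints.
CHECKS = {
    "BETA": lambda alpha, beta: alpha > 0 and beta > 0,
    "CAUCHY": lambda x, theta: theta > 0,
    "CHISQ": lambda k: k > 0 and k == int(k),
    "EXPONENTIAL": lambda theta: theta > 0,
    "F": lambda d1, d2: d1 > 0 and d2 > 0,
    "GAMMA": lambda k, theta: k > 0 and theta > 0,
    "LOGNORMAL": lambda mu, sigma: sigma > 0,
    "NORMAL": lambda mu, sigma: sigma > 0,
    "T": lambda k: k > 0 and k == int(k),
    "TRIANGULAR": lambda a, c, b: a <= c <= b and a < b,
    "UNIFORMCONTINUOUS": lambda a, b: a < b,
    "WEIBULL": lambda alpha, beta: alpha > 0 and beta > 0,
}

def is_valid(spec):
    """Return True if spec names a known distribution with acceptable parameters."""
    try:
        name, params = parse_spec(spec)
        check = CHECKS.get(name)
        return check is not None and bool(check(*params))
    except (TypeError, ValueError):
        # TypeError: wrong number of parameters; ValueError: non-numeric parameter.
        return False

print(is_valid("NORMAL:0,1"))        # True
print(is_valid("TRIANGULAR:1,1,1"))  # False, because a < b is required
```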

Discrete Distributions and Parameters
distribution:parameters    Parameter Descriptions
BINOMIAL:n,p
    n, a positive INTEGER, is the number of trials.
    p, in [0,1], is the success probability in each trial.
GEOMETRIC:p
    p, in [0,1], is the success probability in each trial.
NEGATIVEBINOMIAL:r,p
    r, a positive INTEGER, is the number of successes at which the trials stop.
    p, in [0,1], is the success probability in each trial.
    The function represents the distribution of the number of failures before r successes occur.
POISSON:λ
    λ > 0 is the rate parameter.
UNIFORMDISCRETE:a,b
    a < b, where a is the lower bound of the distribution (inclusive) and b is the upper bound (exclusive). Both a and b are INTEGER values.
For discrete distributions:
  • BINOMIAL, GEOMETRIC, NEGATIVEBINOMIAL, and POISSON distributions are on N={0,1,2,...}.
  • UNIFORMDISCRETE distribution is on events, which are represented by integers.
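Two of the conventions above are easy to misread: NEGATIVEBINOMIAL counts failures rather than trials, and UNIFORMDISCRETE excludes its upper bound. The following minimal Python sketch makes both concrete using the standard textbook definitions. The function names are hypothetical, and the code does not reproduce the function's internals.

```python
import math

def negative_binomial_pmf(k, r, p):
    """Probability of exactly k failures before the r-th success."""
    return math.comb(k + r - 1, k) * p**r * (1 - p)**k

def uniform_discrete_support(a, b):
    """Integers covered by UNIFORMDISCRETE:a,b (a inclusive, b exclusive)."""
    return list(range(a, b))

# With r = 1 and p = 0.5, zero failures before the first success has probability 0.5.
print(negative_binomial_pmf(0, 1, 0.5))   # 0.5
# A six-sided die is UNIFORMDISCRETE:1,7, not 1,6.
print(uniform_discrete_support(1, 7))     # [1, 2, 3, 4, 5, 6]
```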
GroupByColumns
[Optional] Specify the names of the InputTable columns that contain the group identifications over which to run the test. The function can run multiple tests for different partitions of the data in parallel. If you omit this syntax element, specify PARTITION BY 1 and omit the GROUP BY clause in the second ON clause.
MinGroupSize
[Optional] Specify the minimum group size. The function ignores groups smaller than the minimum size when calculating statistics.
Default: 50
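The effect of MinGroupSize can be pictured as a filter applied to the groups before any statistics are computed. The following minimal Python sketch assumes that rows arrive as (group_id, value) pairs and that a group whose size equals the minimum is kept. The name groups_to_test is hypothetical.

```python
from collections import defaultdict

def groups_to_test(rows, min_group_size=50):
    """Group (group_id, value) rows and keep groups with at least min_group_size values."""
    groups = defaultdict(list)
    for group_id, value in rows:
        groups[group_id].append(value)
    # Groups smaller than the minimum are ignored.
    return {g: vals for g, vals in groups.items() if len(vals) >= min_group_size}

rows = [("a", i) for i in range(50)] + [("b", i) for i in range(49)]
print(sorted(groups_to_test(rows)))  # ['a']
```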
NumCell
[Optional] Specify the number of cells into which to discretize a continuous distribution. This value, cell_size, must be greater than 3 if the distribution is NORMAL; otherwise, it must be greater than 1. The quotient min_group_size/cell_size, where min_group_size is the MinGroupSize value, cannot be less than 5.
If you specify NumCell, you must specify 'CHISQ' in the Tests syntax element.
Default: 10
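The NumCell constraints can be checked with simple arithmetic. The following minimal Python sketch encodes the rules as stated above. The name check_num_cell is hypothetical, and the distribution argument is assumed to be a specification string such as 'NORMAL:0,1'.

```python
def check_num_cell(cell_size=10, min_group_size=50, distribution="NORMAL:0,1"):
    """Return True if cell_size satisfies the documented NumCell constraints."""
    name = distribution.partition(":")[0].strip().upper()
    # cell_size must exceed 3 for NORMAL and exceed 1 for any other distribution.
    lower_limit = 3 if name == "NORMAL" else 1
    if cell_size <= lower_limit:
        return False
    # The quotient min_group_size/cell_size cannot be less than 5.
    return min_group_size / cell_size >= 5

print(check_num_cell())              # True: the defaults give 50/10 = 5
print(check_num_cell(cell_size=20))  # False: 50/20 = 2.5 is less than 5
```

With the default MinGroupSize of 50, the default NumCell of 10 is the largest value that satisfies the quotient rule; a larger NumCell requires a larger MinGroupSize.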