1.1 - 8.10 - Best-Match Mode Example: INTEGER Input - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)

This example finds the best matching distribution for a data set that consists of integers that are from different sources and are output by different distributions.

Input

The input table, age_distribution, contains the ages of children visiting three amusement parks during a one-week period in spring and another one-week period in summer.

age_distribution
season park_name age
Spring Funland 12
Spring Funland 1
Spring Funland 10
Spring Funland 2
Spring Funland 10
Spring Funland 3
Spring Funland 3
Spring Funland 7
Spring Funland 11
Spring Funland 8
... ... ...
Spring Wonderland 10
Spring Wonderland 12
Spring Wonderland 9
Spring Wonderland 12
Spring Wonderland 10
Spring Wonderland 8
Spring Wonderland 8
Spring Wonderland 8
Spring Wonderland 9
Spring Wonderland 8
... ... ...
Spring KidsWorld 7
Spring KidsWorld 6
Spring KidsWorld 5
Spring KidsWorld 6
Spring KidsWorld 4
Spring KidsWorld 1
Spring KidsWorld 8
Spring KidsWorld 6
Spring KidsWorld 3
Spring KidsWorld 4
... ... ...
Summer Funland 10
Summer Funland 6
Summer Funland 10
Summer Funland 7
Summer Funland 4
Summer Funland 8
Summer Funland 6
Summer Funland 7
Summer Funland 4
Summer Funland 10
... ... ...
Summer Wonderland 7
Summer Wonderland 9
Summer Wonderland 9
Summer Wonderland 2
Summer Wonderland 4
Summer Wonderland 8
Summer Wonderland 5
Summer Wonderland 4
Summer Wonderland 7
Summer Wonderland 11
... ... ...
Summer KidsWorld 5
Summer KidsWorld 5
Summer KidsWorld 3
Summer KidsWorld 8
Summer KidsWorld 4
Summer KidsWorld 7
Summer KidsWorld 6
Summer KidsWorld 3
Summer KidsWorld 2
Summer KidsWorld 0
... ... ...

SQL Call

SELECT * FROM DistributionMatchReduce (
  ON DistributionMatchMultiInput (
    ON (
      SELECT COUNT(1) AS counts,
        SUM(counts) OVER (PARTITION BY park_name, season
          
          ORDER BY age ROWS UNBOUNDED PRECEDING) AS "rank", park_name, season, age
      FROM age_distribution
      WHERE age IS NOT NULL
      GROUP BY park_name, season, age
    ) AS InputTable PARTITION BY ANY
    ON (
      SELECT park_name, season, COUNT(*) AS group_size, AVG(age) AS mean, 
        STDDEV_SAMP(age) AS sd, MAX(age) AS "maximum", MIN(age) AS "minimum"
      FROM age_distribution
      WHERE age IS NOT NULL
      GROUP BY park_name, season
    ) AS GroupStatistics DIMENSION
    USING
    TargetColumn('age')
    TESTS('KS', 'AD', 'CHISQ')
     GroupByColumns('park_name', 'season')
    MINGROUPSIZE(50)
    NumCell(10)
  ) PARTITION BY park_name, season
)AS dt;

Output

The function has attempted to identify the best matching distribution for each partition of the data, based on each test specified in the SQL call. For each partition, the output shows the distribution and parameters identified by each test with the associated p-value.

 park_name  season group_size best_match_ks_top1             p_value_ks_top1      best_match_ad_top1                     p_value_ad_top1      best_match_chisq_top1            p_value_chisq_top1    
 ---------- ------ ---------- ------------------------------ -------------------- -------------------------------------- -------------------- -------------------------------- --------------------- 
 wonderland spring        400 binomial:16,0.5091542759742608  0.06241687387228012 binomial:16,0.5091542759742608         1.500000053056283E-6 binomial:16,0.5091542759742608                     0.0
 funland    spring        400 uniformdiscrete:1,12             0.9992926716804504 negativebinomial:7,0.46705571565376613 1.500000053056283E-6 uniformdiscrete:1,12             8.948397578478762E-13
 wonderland summer        400 poisson:7.005000114440918        0.9931338429450989 poisson:7.005000114440918               6.47109345663921E-6 binomial:110,0.06332936686652757                   0.0
 funland    summer        400 poisson:7.065000057220459      6.726568099111319E-4 binomial:13,0.507505492077339          1.500000053056283E-6 binomial:13,0.507505492077339                      0.0
 kidsworld  spring        400 uniformdiscrete:1,8              0.9639452695846558 binomial:100,0.5                       1.500000053056283E-6 binomial:100,0.5                                   0.0
 kidsworld  summer        400 poisson:4.457499980926514        0.9960735440254211 binomial:100,0.5                       1.500000053056283E-6 binomial:100,0.5                                   0.0

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.