RandomSample Example 3: KMeans|| Sampling - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

This example uses KMeans|| sampling. As in Example 2, the numeric variables cyl, gear, and carb are treated as categorical variables, alongside the categorical variables vs and am. However, this example uses the Manhattan distance metric for the numeric variables and the Hamming distance metric for the categorical variables. Because the Hamming distance metric requires categories of equal length, assume that in column am of the input table, 'manual' has been changed to 'manualsys' (which has the same length as 'automatic').
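The two metrics named above can be illustrated with a minimal Python sketch. It assumes that the Hamming distance is computed character by character over equal-length labels, which is consistent with the equal-length requirement stated above but is not confirmed by this page. The helper names are hypothetical, and the sketch does not show how RandomSample combines the two metrics internally.

```python
def manhattan(a, b):
    """Sum of absolute differences between two numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(s, t):
    """Number of positions at which two equal-length labels differ.

    Assumption: labels are compared character by character.
    """
    if len(s) != len(t):
        raise ValueError(f"labels differ in length: {s!r} vs {t!r}")
    return sum(c1 != c2 for c1, c2 in zip(s, t))


# Two numeric values (mpg, hp) from two rows of this example's output
numeric_distance = manhattan([21.4, 109], [21.5, 97])

# 'manualsys' and 'automatic' are both nine characters long
label_distance = hamming("manualsys", "automatic")

# The original label 'manual' cannot be compared with 'automatic'
try:
    hamming("manual", "automatic")
    unequal_ok = True
except ValueError:
    unequal_ok = False
```

Under this assumption, the renamed label 'manualsys' can be compared with 'automatic', while the original six-character label 'manual' is rejected.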

Input

  • InputTable: fs_input1, created from fs_input (in RandomSample Example 1: Basic Sampling (Weighted)) and populated with these statements:
    CREATE MULTISET TABLE fs_input1 AS (
      SELECT * FROM fs_input
    ) WITH DATA;
    
    UPDATE fs_input1 SET am='manualsys' WHERE am='manual';
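The effect of the UPDATE statement can be checked with a small Python sketch that looks for categorical columns whose labels differ in length. The helper name is hypothetical, and the label lists are copied by hand from this example (the vs labels come from the Output section); the sketch does not query the database.

```python
def unequal_length_columns(columns):
    """Return the names of columns whose labels do not all share one length."""
    bad = []
    for name, labels in columns.items():
        lengths = {len(label) for label in labels}
        if len(lengths) > 1:
            bad.append(name)
    return bad


# Labels of the categorical columns before and after the UPDATE statement
before = {"vs": ["S", "V"], "am": ["automatic", "manual"]}
after = {"vs": ["S", "V"], "am": ["automatic", "manualsys"]}

print(unequal_length_columns(before))
print(unequal_length_columns(after))
```

Before the update only column am is flagged; after it, no column is.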

SQL Call

SELECT * FROM RandomSample (
  ON fs_input1 AS InputTable
  USING
  NumSample (20)
  SamplingMode ('kmeans||')
  InputColumns ('mpg:carb')
  CategoryWeights (1000, 10, 100, 100, 100)
  AsCategories ('cyl', 'gear', 'carb')
  CategoricalDistance ('hamming')
  Distance ('manhattan')
  Seed (1)
  IterationNum (2)
  SeedColumn ('model')
) AS dt ORDER BY 1,2,3;
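The call supplies five category weights, and the selected columns contain five categorical variables (vs and am, plus the three named in AsCategories). The Python sketch below expands the column range and pairs the weights with those variables. It rests on two assumptions that this page does not confirm: that the input table's columns are ordered as in the Output header (without set_id), and that the weights are assigned to the categorical variables in table order. All names in the sketch are hypothetical.

```python
# Assumption: column order taken from the Output header, without set_id
TABLE_COLUMNS = ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec",
                 "vs", "am", "gear", "carb"]


def expand_range(spec, columns):
    """Expand a 'first:last' range into the inclusive list of column names."""
    first, last = spec.split(":")
    start, end = columns.index(first), columns.index(last)
    return columns[start:end + 1]


input_columns = expand_range("mpg:carb", TABLE_COLUMNS)

# Categorical variables: vs and am, plus the AsCategories columns
categorical = {"vs", "am", "cyl", "gear", "carb"}
ordered_categorical = [c for c in input_columns if c in categorical]

# Assumption: weights are paired with categorical variables in table order
weights = [1000, 10, 100, 100, 100]
weight_by_column = dict(zip(ordered_categorical, weights))

# The remaining columns are the ones measured with the Manhattan metric
numeric = [c for c in input_columns if c not in categorical]
```

If the ordering assumption holds, cyl receives the largest weight (1000) and vs the smallest (10).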

Output

set_id  mpg     cyl  disp    hp     drat    wt       qsec     vs  am         gear  carb
------  ------  ---  ------  -----  ------  -------  -------  --  ---------  ----  ----
0       12.42   8    414.4   228    3.324   4.7398   16.808   S   automatic  3     4
0       15.8    8    351     264    4.22    3.17     14.5     S   manualsys  5     4
0       17.225  8    349     162.5  2.9375  3.58125  16.9525  S   automatic  3     2
0       17.3    8    275.8   180    3.07    3.73     17.6     S   automatic  3     3
0       19.2    6    67.6    123    3.92    3.44     18.3     V   automatic  4     4
0       19.7    6    145     175    3.62    2.77     15.5     S   manualsys  5     6
0       21.4    4    121     109    4.11    2.78     18.6     V   manualsys  4     2
0       21.4    6    258     110    3.08    3.215    19.44    V   automatic  3     1
0       21.5    4    120.1   97     3.7     2.465    20.01    V   automatic  3     1
0       23.6    4    143.75  78.5   3.805   3.17     21.45    V   automatic  4     2