1.1 - 8.10 - KModes Example: InitialSeedTable - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)

Input

The InputTable has 32 observations on 11 variables, about different models of cars.

InputTable: kmodes_input
Variable Type Description drat wt qsec vs am gear carb
mpg Normalized numerical Miles/US gallon -0.835197792 0.22254417 -0.307088658 S automatic 3 2
cyl Numerical treated as categorical Number of cylinders -1.246659826 2.077504765 0.073449451 S automatic 3 4
disp Normalized numerical Displacement (cubic inches) 0.24956575 0.636460997 -1.364760755 S automatic 3 4
hp Normalized numerical Gross horsepower -0.685575235 2.174596366 -0.239934874 S automatic 3 4
drat Normalized numerical Rear axle ratio 0.473999587 -0.917004624 0.426006817 V manual 4 1
wt Normalized numerical Weight (lb/1000) -1.564607761 0.309415603 -0.54772305 S automatic 3 2
qsec Normalized numerical 1/4 mile time -0.722980874 0.360516446 -1.124126363 S automatic 3 4
vs Categorical Engine (V or straight) 0.043834734 -0.457097039 -1.314395417 S manual 5 6
am Categorical Transmission (automatic or manual) 0.90416444 -1.039646647 0.907275602 V manual 4 1
gear Numerical treated as categorical Number of forward gears 0.90416444 -1.310481114 0.588295128 V manual 4 1
carb Numerical treated as categorical Number of carburetors 1.166003916 -0.048290296 -1.874010283 S manual 5 4

The InitialSeedTable has three points that serve as initial cluster centers.

InitialSeedTable: kmodes_init
model mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 0.449543447 4 -0.990182091 -0.783040459 0.473999587 -0.917004624 0.426006817 V manual 4 1
Ferrari Dino -0.064813069 6 -0.691647397 0.412942174 0.043834734 -0.457097039 -1.314395417 S manual 5 6
Lincoln Continental -1.607882616 8 1.849931752 0.996348337 -1.115740088 2.255335698 -0.016088927 S automatic 3 4

SQL Call

DROP TABLE kmodes_clusters;

SELECT * FROM KModes (
  ON kmodes_input AS InputTable
  ON kmodes_init AS InitialSeedTable
  OUT TABLE OutputTable (kmodes_clusters)
  USING
  TargetColumns ('mpg:carb')
  NumericAsCategorical ('cyl','gear','carb')
) AS dt;

Output

With InitialSeedTable, the cluster centers and assignments are the same every time, with the same distance metric (in this case, the default, Euclidean).

+

This query returns the following table:

SELECT * FROM kmodes_clusters;
 cluster_id mpg                disp               hp                 drat              wt                 qsec              cyl vs am        gear carb within_cluster_ss cluster_weight distance_metric   category_weights          
 ---------- ------------------ ------------------ ------------------ ----------------- ------------------ ----------------- --- -- --------- ---- ---- ----------------- -------------- ----------------- ------------------------- 
 1               -0.2639188168       -0.059076587        0.760068841      0.4478156392       -0.221011145     -1.2494800924 6   s  manual    5    4     20.7870494221671            5.0 NULL              NULL                     
 0          -0.724943435928571  0.890010157642857  0.511912862714286     -0.9434069635  0.794435602785714      -0.180375863 8   s  automatic 3    4     43.6174097337196           14.0 EUCLIDEAN,OVERLAP [1.0, 1.0, 1.0, 1.0, 1.0]
 2           0.882215552923077 -0.935750713230769 -0.843624945153846 0.843739945692308 -0.770541747153846 0.674820195846154 4   v  manual    4    2     48.7743732753756           13.0 NULL              NULL

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.