KModes Example 1: InitialSeedTable - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:mapPath
blj1506016597986.ditamap
dita:ditavalPath
blj1506016597986.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

Input

The InputTable has 32 observations on 11 variables, about different models of cars.

InputTable: kmodes_input
Variable Type Description drat wt qsec vs am gear carb
mpg Normalized numerical Miles/US gallon -0.835197792 0.22254417 -0.307088658 S automatic 3 2
cyl Numerical treated as categorical Number of cylinders -1.246659826 2.077504765 0.073449451 S automatic 3 4
disp Normalized numerical Displacement (cubic inches) 0.24956575 0.636460997 -1.364760755 S automatic 3 4
hp Normalized numerical Gross horsepower -0.685575235 2.174596366 -0.239934874 S automatic 3 4
drat Normalized numerical Rear axle ratio 0.473999587 -0.917004624 0.426006817 V manual 4 1
wt Normalized numerical Weight (lb/1000) -1.564607761 0.309415603 -0.54772305 S automatic 3 2
qsec Normalized numerical 1/4 mile time -0.722980874 0.360516446 -1.124126363 S automatic 3 4
vs Categorical Engine (V or straight) 0.043834734 -0.457097039 -1.314395417 S manual 5 6
am Categorical Transmission (automatic or manual) 0.90416444 -1.039646647 0.907275602 V manual 4 1
gear Numerical treated as categorical Number of forward gears 0.90416444 -1.310481114 0.588295128 V manual 4 1
carb Numerical treated as categorical Number of carburetors 1.166003916 -0.048290296 -1.874010283 S manual 5 4

The InitialSeedTable has three points that serve as initial cluster centers.

InitialSeedTable: kmodes_init
model mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 0.449543447 4 -0.990182091 -0.783040459 0.473999587 -0.917004624 0.426006817 V manual 4 1
Ferrari Dino -0.064813069 6 -0.691647397 0.412942174 0.043834734 -0.457097039 -1.314395417 S manual 5 6
Lincoln Continental -1.607882616 8 1.849931752 0.996348337 -1.115740088 2.255335698 -0.016088927 S automatic 3 4

SQL Call

DROP TABLE kmodes_clusters;

SELECT * FROM KModes (
  ON kmodes_input AS InputTable
  ON kmodes_init AS InitialSeedTable
  OUT TABLE OutputTable (kmodes_clusters)
  USING
  TargetColumns ('mpg:carb') 
  NumericAsCategorical ('cyl', 'gear', 'carb')
) AS dt;

Output

With the initialseedtable argument, the cluster centers and assignments are the same every time, with the same distance metric (in this case, the default, Euclidean).

Output Table
summary between_cluster_error total_within_cluster_error pseudo_f
Number of Clusters: 3 195.82116758156 113.178832431262 16.7251942136624
Number of Iterations: 5      
Model Converged: true      
Number of Data Points: 32.0      

This query returns the following table:

SELECT * FROM kmodes_clusters ORDER BY 1;
kmodes_clusters
cluster_id mpg disp hp drat wt qsec cyl vs am gear carb within_cluster_ss cluster_weight distance_metric category_weights
0 -0.724943435928571 0.890010157642857 0.511912862714286 -0.9434069635 0.794435602785714 -0.180375863 8 S automatic 3 4 43.6174097337196 14 EUCLIDEAN,OVERLAP [1.0,1.0,1.0,1.0,1.0]
1 -0.2639188168 -0.059076587 0.760068841 0.4478156392 -0.221011145 -1.2494800924 6 S manual 5 4 20.7870494221671 5    
2 0.882215552923077 -0.935750713230769 -0.843624945153846 0.843739945692308 -0.770541747153846 0.674820195846154 4 V manual 4 2 48.7743732753756 13