Input
The InputTable, kmodes_input, has 32 observations of 11 variables describing different car models.
Variable | Type | Description |
---|---|---|
mpg | Normalized numerical | Miles/US gallon |
cyl | Numerical treated as categorical | Number of cylinders |
disp | Normalized numerical | Displacement (cubic inches) |
hp | Normalized numerical | Gross horsepower |
drat | Normalized numerical | Rear axle ratio |
wt | Normalized numerical | Weight (lb/1000) |
qsec | Normalized numerical | 1/4 mile time |
vs | Categorical | Engine (V or straight) |
am | Categorical | Transmission (automatic or manual) |
gear | Numerical treated as categorical | Number of forward gears |
carb | Numerical treated as categorical | Number of carburetors |
Sample InputTable rows (only the drat through carb columns are shown):
drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|
-0.835197792 | 0.22254417 | -0.307088658 | S | automatic | 3 | 2 |
-1.246659826 | 2.077504765 | 0.073449451 | S | automatic | 3 | 4 |
0.24956575 | 0.636460997 | -1.364760755 | S | automatic | 3 | 4 |
-0.685575235 | 2.174596366 | -0.239934874 | S | automatic | 3 | 4 |
0.473999587 | -0.917004624 | 0.426006817 | V | manual | 4 | 1 |
-1.564607761 | 0.309415603 | -0.54772305 | S | automatic | 3 | 2 |
-0.722980874 | 0.360516446 | -1.124126363 | S | automatic | 3 | 4 |
0.043834734 | -0.457097039 | -1.314395417 | S | manual | 5 | 6 |
0.90416444 | -1.039646647 | 0.907275602 | V | manual | 4 | 1 |
0.90416444 | -1.310481114 | 0.588295128 | V | manual | 4 | 1 |
1.166003916 | -0.048290296 | -1.874010283 | S | manual | 5 | 4 |
The InitialSeedTable, kmodes_init, has three rows that serve as the initial cluster centers.
model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Datsun 710 | 0.449543447 | 4 | -0.990182091 | -0.783040459 | 0.473999587 | -0.917004624 | 0.426006817 | V | manual | 4 | 1 |
Ferrari Dino | -0.064813069 | 6 | -0.691647397 | 0.412942174 | 0.043834734 | -0.457097039 | -1.314395417 | S | manual | 5 | 6 |
Lincoln Continental | -1.607882616 | 8 | 1.849931752 | 0.996348337 | -1.115740088 | 2.255335698 | -0.016088927 | S | automatic | 3 | 4 |
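The output of this example reports EUCLIDEAN,OVERLAP as the distance metric, with category weights of 1.0. As a rough illustration of what a mixed distance of that kind can look like, the Python sketch below adds squared differences over the numeric columns to a weighted count of mismatches over the categorical columns, using two of the seed rows above. The function names, the use of squared (unrooted) differences, and the additive combination are assumptions made for illustration; they are not taken from the KModes implementation.

```python
# Illustrative sketch only: an ASSUMED way to combine a Euclidean-style term
# (numeric columns) with an overlap term (categorical columns).
NUMERIC = ["mpg", "disp", "hp", "drat", "wt", "qsec"]
CATEGORICAL = ["cyl", "vs", "am", "gear", "carb"]

# Two rows copied from the InitialSeedTable.
datsun = {"mpg": 0.449543447, "cyl": 4, "disp": -0.990182091,
          "hp": -0.783040459, "drat": 0.473999587, "wt": -0.917004624,
          "qsec": 0.426006817, "vs": "V", "am": "manual", "gear": 4, "carb": 1}
ferrari = {"mpg": -0.064813069, "cyl": 6, "disp": -0.691647397,
           "hp": 0.412942174, "drat": 0.043834734, "wt": -0.457097039,
           "qsec": -1.314395417, "vs": "S", "am": "manual", "gear": 5, "carb": 6}


def overlap_mismatches(a, b):
    """Count the categorical columns on which the two rows differ."""
    return sum(1 for col in CATEGORICAL if a[col] != b[col])


def mixed_distance(a, b, weights=None):
    """Squared numeric differences plus weighted categorical mismatches.

    The weights default to 1.0 per categorical column, mirroring the
    category_weights value shown in the output (hypothetical usage).
    """
    if weights is None:
        weights = {col: 1.0 for col in CATEGORICAL}
    numeric_part = sum((a[col] - b[col]) ** 2 for col in NUMERIC)
    categorical_part = sum(weights[col] for col in CATEGORICAL if a[col] != b[col])
    return numeric_part + categorical_part


print(overlap_mismatches(datsun, ferrari))
print(mixed_distance(datsun, ferrari))
```

The two seed rows differ on cyl, vs, gear, and carb but share the same am value, so the overlap term contributes four mismatches under unit weights.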
SQL Call
DROP TABLE kmodes_clusters;

SELECT * FROM KModes (
  ON kmodes_input AS InputTable
  ON kmodes_init AS InitialSeedTable
  OUT TABLE OutputTable (kmodes_clusters)
  USING
  TargetColumns ('mpg:carb')
  NumericAsCategorical ('cyl', 'gear', 'carb')
) AS dt;
Output
Because the call supplies an InitialSeedTable, the cluster centers and assignments are the same on every run, provided the distance metric is also the same (in this case the default, Euclidean).
summary | between_cluster_error | total_within_cluster_error | pseudo_f |
---|---|---|---|
Number of Clusters: 3 | 195.82116758156 | 113.178832431262 | 16.7251942136624 |
Number of Iterations: 5 | | | |
Model Converged: true | | | |
Number of Data Points: 32.0 | | | |
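The three error figures in the summary are related. The Python sketch below checks one relationship that these particular numbers satisfy: pseudo_f equals (between_cluster_error / k) divided by (total_within_cluster_error / (n - k)), with k = 3 clusters and n = 32 data points. This form is inferred from the reported values, not taken from the function's documentation, and it differs from the common Calinski-Harabasz form, which divides the between-cluster term by k - 1.

```python
# Values copied from the summary table above.
between = 195.82116758156
within = 113.178832431262
reported_pseudo_f = 16.7251942136624
k, n = 3, 32  # number of clusters, number of data points

# Form INFERRED from the reported numbers (an assumption, not documented here).
inferred = (between / k) / (within / (n - k))

# Common Calinski-Harabasz form, shown for comparison only.
calinski_harabasz = (between / (k - 1)) / (within / (n - k))

print("inferred form:        ", inferred)
print("reported pseudo_f:    ", reported_pseudo_f)
print("k - 1 form (for ref.):", calinski_harabasz)
```

With these inputs the inferred form agrees with the reported pseudo_f to within rounding, while the k - 1 form gives a value near 25.09.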
The following query returns the cluster centers stored in the OutputTable:
SELECT * FROM kmodes_clusters ORDER BY 1;
cluster_id | mpg | disp | hp | drat | wt | qsec | cyl | vs | am | gear | carb | within_cluster_ss | cluster_weight | distance_metric | category_weights |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.724943435928571 | 0.890010157642857 | 0.511912862714286 | -0.9434069635 | 0.794435602785714 | -0.180375863 | 8 | S | automatic | 3 | 4 | 43.6174097337196 | 14 | EUCLIDEAN,OVERLAP | [1.0,1.0,1.0,1.0,1.0] |
1 | -0.2639188168 | -0.059076587 | 0.760068841 | 0.4478156392 | -0.221011145 | -1.2494800924 | 6 | S | manual | 5 | 4 | 20.7870494221671 | 5 | | |
2 | 0.882215552923077 | -0.935750713230769 | -0.843624945153846 | 0.843739945692308 | -0.770541747153846 | 0.674820195846154 | 4 | V | manual | 4 | 2 | 48.7743732753756 | 13 | | |
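The per-cluster columns can be cross-checked against the summary. The Python sketch below uses only values copied from the two output tables: it adds up the within_cluster_ss values and compares the result with total_within_cluster_error, then adds up the cluster_weight values and compares the result with the number of data points.

```python
import math

# cluster_id -> (within_cluster_ss, cluster_weight), copied from the table above.
clusters = {
    0: (43.6174097337196, 14),
    1: (20.7870494221671, 5),
    2: (48.7743732753756, 13),
}

# Values copied from the summary table.
total_within_reported = 113.178832431262
n_points = 32

total_within = sum(ss for ss, _ in clusters.values())
total_weight = sum(weight for _, weight in clusters.values())

print("within-cluster sums match:", math.isclose(total_within, total_within_reported, rel_tol=1e-9))
print("weights match point count:", total_weight == n_points)
```

In this example both checks hold: the three within_cluster_ss values sum to the reported total, and the cluster weights 14, 5, and 13 sum to 32.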