KMeans Example 1: NumClusters, UnpackColumns ('false') by Default - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
Product Category
Teradata Vantage™

Input

The InputTable has five attributes of personal computers (price, speed, hard disk size, RAM, and screen size). The table has over 6000 rows. These examples use different arguments to find eight clusters based on the five attributes.

InputTable: computers_train1
id price speed hd ram screen
1 1499 25 80 4 14
2 1795 33 85 2 14
3 1595 25 170 4 15
4 1849 25 170 8 14
5 3295 33 340 16 14
6 3695 66 340 16 14
7 1720 25 170 4 14
8 1995 50 85 2 14
9 2225 50 210 8 14
12 2605 66 210 8 14
13 2045 50 130 4 14
14 2295 25 245 8 14
16 2225 50 130 4 14
17 1595 33 85 2 14
18 2325 33 210 4 15
19 2095 33 250 4 15
20 4395 66 452 8 14
... ... ... ... ... ...
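
Before clustering, it can help to check the size of the table and the range of each attribute. The query below is a minimal sketch in plain SQL. It uses only the table and column names shown above; the output aliases (row_count, min_price, and so on) are made up for illustration. If distances are computed on the raw values, wide-ranging attributes such as price and hd will weigh far more than screen, which is worth keeping in mind when reading the centroids.

```sql
-- Row count and per-attribute ranges for the KMeans input.
-- Aliases are illustrative; only the table and column names come from the example.
SELECT COUNT(*)    AS row_count,
       MIN(price)  AS min_price,  MAX(price)  AS max_price,
       MIN(speed)  AS min_speed,  MAX(speed)  AS max_speed,
       MIN(hd)     AS min_hd,     MAX(hd)     AS max_hd,
       MIN(ram)    AS min_ram,    MAX(ram)    AS max_ram,
       MIN(screen) AS min_screen, MAX(screen) AS max_screen
FROM computers_train1;
```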

SQL Call

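This call groups the five-dimensional data points into eight clusters. It sets StopThreshold to 0.05 and MaxIterNum to 10, and it omits UnpackColumns, which therefore takes its default value, 'false'.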

SELECT * FROM KMeans (
  ON computers_train1 AS InputTable
  OUT TABLE OutputTable (kmeanssample_centroid)
  USING
  NumClusters (8)
  StopThreshold (0.05)
  MaxIterNum (10)
) AS dt;

Output

Results Message Table
clusterid price speed hd ram screen size withinss
0 3862.41481481481 64.437037037037 578.792592592593 12.2666666666667 15.3037037037037 135 2.65353071259265E7
1 1469.89667458432 40.978622327791 258.852731591449 4.09976247030879 14.229216152019 842 3.08899537458434E7
2 2212.28004291845 51.2049356223176 327.561158798283 6.4549356223176 14.5708154506438 932 2.78566847135181E7
3 3001.59325044405 61.1314387211368 440.188277087034 12.2131438721137 14.8845470692718 563 2.43744200888119E7
4 2574.35087719298 53.2968960863698 435.774628879892 10.1376518218623 14.7085020242915 741 2.33952718137665E7
5 2153.42763157895 68.6118421052632 872.039473684211 12.2236842105263 15.0263157894737 304 2.3224274710526E7
6 2931.40569395018 61.5302491103203 1076.79715302491 22.7758007117438 15.0249110320285 281 2.58860688825617E7
7 1854.83719008264 47.9363636363636 301.023140495868 5.20330578512397 14.4330578512397 1210 4.2156215442976E7
  Converged: False    
  NumberofIterations: 10    
  Numberofclusters: 8    
  Successfully created Output table    
  Total_WithinSS: 2.2431819652393007E8    
  Between_SS: 1.8142692960010383E9    
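
The summary lines can be cross-checked against the per-cluster rows with simple arithmetic. The sketch below copies the size and withinss values from the message table as literals. The aliases are assumptions added for illustration, as is the use of Between_SS / (Between_SS + Total_WithinSS) as a rough measure of how much of the total variation the clustering accounts for. None of them are part of the function's output.

```sql
-- Values copied from the message table above; aliases are illustrative.
SELECT
  -- Sum of the size column: 5008 points assigned to clusters.
  135 + 842 + 932 + 563 + 741 + 304 + 281 + 1210 AS total_points,
  -- Sum of the withinss column: about 2.2432E8, in line with Total_WithinSS.
  2.65353071259265E7 + 3.08899537458434E7 + 2.78566847135181E7
    + 2.43744200888119E7 + 2.33952718137665E7 + 2.3224274710526E7
    + 2.58860688825617E7 + 4.2156215442976E7 AS sum_withinss,
  -- Share of total variation between clusters: about 0.89.
  1.8142692960010383E9
    / (1.8142692960010383E9 + 2.2431819652393007E8) AS between_share;
```

Because the message table reports Converged: False with NumberofIterations: 10, the run stopped at the MaxIterNum limit rather than at the StopThreshold.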

The following query returns the centroid table that the OUT TABLE clause created:

SELECT * FROM kmeanssample_centroid;
kmeanssample_centroid
clusterid price speed hd ram screen size withinss
0 2861.5646437995 57.7915567282 428.9525065963 11.746701847 14.7941952507 758 3.23717069498672E7
1 1463.1038461539 39.1371794872 236.5679487179 3.9282051282 14.208974359 780 2.45706992256417E7
2 2392.8843069874 52.4788087056 334.5612829324 7.3195876289 14.5841924399 873 2.60713860412369E7
3 3636.914893617 65.4723404255 615.4042553191 13.6340425532 15.1276595745 235 5.03470581617022E7
4 2095.0655462185 53.6352941176 512.6991596639 8.6588235294 14.7210084034 595 2.04614168806725E7
5 2689.9612756264 62.9225512528 980.2961275626 19.5353075171 15.15261959 439 5.32294371890669E7
6 1762.2406947891 64.9081885856 595.4863523573 7.0967741935 14.682382134 403 1.95361545012407E7
7 1891.4 42.9837837838 197.0627027027 4.08 14.3221621622 925 2.13319771956758E7
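
The centroid table can be queried like any other table. As a sketch, the query below lists the clusters from most to least expensive and reports each cluster's within-cluster sum of squares per point. It relies only on the column names shown above. The alias withinss_per_point is hypothetical, and size is double-quoted on the assumption that quoting is harmless if the name collides with a keyword on your system.

```sql
-- Clusters ordered by centroid price, with withinss normalised by cluster size.
-- withinss_per_point is an illustrative alias, not a column of the output table.
SELECT clusterid,
       price,
       "size",
       withinss / "size" AS withinss_per_point
FROM kmeanssample_centroid
ORDER BY price DESC;
```

Dividing by size makes clusters of very different sizes easier to compare, since a large cluster can have a large withinss simply because it contains more points.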