1.1 - 8.10 - Scale Example: Use Scale Output in KMeans - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)

This example uses the Scale function to scale data (using the maxabs method) before inputting it to the function KMeans, which outputs the centroids of the clusters in the data set. (KMeans (ML Engine) explains the reason for scaling data before inputting it to a distance-based analysis function like KMeans.)

For an example of using Scale functions to convert input variables into z-scores for use with the Principal Components Analysis (PCA) functions, see PCA Example.

Input

SQL Call to Create Table of Scaled Data

CREATE MULTISET TABLE computers_normalized AS (
  SELECT * FROM Scale (
    ON computers_train1 AS InputTable PARTITION BY ANY
    ON (
      SELECT * FROM ScaleMap (
        ON computers_train1
        USING
        TargetColumns ('[1:5]')
        MissValue ('omit')
     ) AS dt1
   ) AS statistic DIMENSION
   USING
   ScaleMethod ('maxabs')
   Accumulate ('id')
  ) AS dt2
) WITH DATA;

SQL Call to Input Scaled Data to KMeans

SELECT * FROM KMeans (
  ON computers_normalized AS InputTable
  OUT TABLE OutputTable (computers_centroid)
  USING
  NumClusters ('8')
  StopThreshold ('0.05')
  MaxIterNum ('10')
) AS dt;

The result of this query varies with each run. For repeatability, use the InitialSeeds syntax element instead of the NumClusters syntax element.

Output

 clusterid mean                                                                                       size withinss         
 --------- ------------------------------------------------------------------------------------------ ---- ---------------- 
 0         0.520659010288158 0.603350604490499 0.175562135044001 0.236183074265976 0.880829015544038  579  11.895719506972 
 1         0.305714191782829 0.306566440349176 0.0933333333333337 0.114997575169738 0.838135448165681 1031 6.96675718192228
 2         0.427463261652133 0.415340136054421 0.103356009070295 0.120535714285714 0.85734293717487   294  4.56252016477447
 3         0.509765433827703 0.371402439024389 0.323603368176539 0.558434959349594 0.859995217599231  492  18.7512162682813
 4         0.3816394266286 0.777139208173692 0.251070972450283 0.234594508301405 0.866877019006832    783  32.2577148011592
 5         0.333863472659545 0.593407407407406 0.137674074074075 0.131944444444444 0.846623093681913  675  12.2440968832223
 6         0.53893810202189 0.759645868465428 0.36271982654782 0.561129848229342 0.885031246900105    593  42.2744605421735
 7         0.404013881226436 0.32103386809269 0.195777947542653 0.247994652406417 0.854147006396138   561  7.9886712185662 
 --------- ------------------------------------------------------------------------------------------ ---- ----------------
           Converged : False                                                                                               
           Number of Iterations : 10                                                                                       
           Number of clusters : 8                                                                                          
           Successfully created Output table                                                                               
           Total_WithinSS : 136.94115656707123                                                                             
           Between_SS : 395.39586320118275

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.