Scale Example 5: Use Scale Output in KMeans - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

This example uses the Scale function (with the maxabs method) to scale data before passing it to the KMeans function, which outputs the centroids of the clusters in the data set. (The KMeans description explains why data should be scaled before it is input to a distance-based analysis function such as KMeans.)
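The following Python sketch illustrates the idea behind maxabs scaling and why it matters for a distance-based function. It is not the Scale function's implementation. It assumes that maxabs divides each value in a column by the largest absolute value in that column, and the helper names and sample rows (hypothetical price and ram values) are made up for illustration.

```python
import math

def maxabs_scale(column):
    """Divide each value by the column's largest absolute value (result in [-1, 1])."""
    max_abs = max(abs(v) for v in column)
    if max_abs == 0:
        # All-zero column: nothing to scale in this sketch.
        return [float(v) for v in column]
    return [v / max_abs for v in column]

def euclidean(a, b):
    """Euclidean distance between two rows of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical (price, ram) rows, not taken from computers_train1.
rows = [(1500.0, 4.0), (1520.0, 32.0), (3000.0, 4.0)]

# Scale column by column, then rebuild the rows.
columns = list(zip(*rows))
scaled_rows = list(zip(*(maxabs_scale(col) for col in columns)))

# Before scaling, price dominates the distance, so row 0 looks closest to row 1.
print(euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2]))

# After scaling, both columns contribute comparably, so row 0 is closer to row 2.
print(euclidean(scaled_rows[0], scaled_rows[1]),
      euclidean(scaled_rows[0], scaled_rows[2]))
```

With unscaled data the column with the largest numeric range decides which rows are near each other. After scaling, every column lies in the same range, so the nearest-neighbor ordering can change.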

For an example of using Scale functions to convert input variables into z-scores for use with the Principal Components Analysis (PCA) functions, see PCA Example.

Input

SQL Call to Create Table of Scaled Data

CREATE MULTISET TABLE computers_normalized AS (
  SELECT * FROM Scale (
    ON computers_train1 AS "input" PARTITION BY ANY
    ON (
      SELECT * FROM ScaleMap (
        ON computers_train1
        USING
        TargetColumns ('[1:5]')
        MissValue ('omit')
      ) AS dt1
    ) AS statistic DIMENSION
    USING
    ScaleMethod ('maxabs')
    Accumulate ('id')
  ) AS dt2
) WITH DATA;

SQL Call to Input Scaled Data to KMeans

SELECT * FROM KMeans (
  ON computers_normalized AS InputTable
  OUT TABLE OutputTable (computers_centroid)
  USING
  NumClusters ('8')
  StopThreshold ('0.05')
  MaxIterNum ('10')
) AS dt;

The result of this query varies with each run. For repeatability, use the InitialSeeds argument instead of the NumClusters argument.
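The Python sketch below illustrates why fixed starting centroids make a k-means run repeatable. It is a generic Lloyd-style loop, not the KMeans function's implementation, and it does not show the InitialSeeds syntax. The points, the seeds, and the function names are hypothetical, and the loop stops on exact convergence rather than on a threshold such as StopThreshold.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members (keep it if empty)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)
        else:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
    return new_centroids

def kmeans(points, initial_centroids, max_iter=10):
    """Run at most max_iter iterations starting from the given centroids."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids

# Hypothetical scaled points and starting centroids.
points = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1),
          (0.8, 0.9), (0.85, 0.8), (0.9, 0.95)]
seeds = [(0.1, 0.2), (0.8, 0.9)]

run1 = kmeans(points, seeds)
run2 = kmeans(points, seeds)
print(run1 == run2)  # same seeds, same data, same centroids
```

Given the same data and the same starting centroids, every step of the loop is deterministic. When the starting centroids are chosen at random instead, different runs can settle on different clusters.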

Output

clusterid  price              speed              hd                 ram                screen             size  withinss
0          0.499153191737689  0.313013698630136  0.273377364644488  0.474743150684932  0.850523771152295  292   6.14410036408697
1          0.346259665529739  0.311809635722682  0.122839796318057  0.156984430082256  0.846547314577999  1702  26.9367642917337
2          0.517252416931197  0.601960461285007  0.174217462932454  0.236408566721582  0.879930225797071  607   12.6014867603556
3          0.351843035925946  0.582768595041321  0.11426800472255   0.118629476584022  0.84086857883649   726   11.1916445231536
4          0.43909933270926   1.0                0.276756756756757  0.300168918918919  0.867408585055643  370   15.9375922030147
5          0.357141678235243  0.648366336633661  0.263063020587773  0.2378300330033    0.871481265773634  606   10.3780145525625
6          0.549091201375517  0.608754716981132  0.511633423180592  0.756603773584906  0.882574916759156  265   19.8411759929145
7          0.533305831046153  0.625136363636362  0.288437229437229  0.497727272727273  0.879144385026736  440   11.5107079157115

Converged : False
Number of Iterations : 10
Number of clusters : 8
Successfully created Output table
Total_WithinSS : 114.54148660353306
Between_SS : 417.79553316469645
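As a consistency check on the output above, the short Python sketch below adds up the per-cluster size and withinss values and compares the withinss sum with the reported Total_WithinSS. The values are copied from the output table and the variable names are illustrative. The final ratio assumes the standard k-means decomposition, in which the total sum of squares is Between_SS plus Total_WithinSS.

```python
import math

# (clusterid, size, withinss) copied from the output table.
clusters = [
    (0, 292, 6.14410036408697),
    (1, 1702, 26.9367642917337),
    (2, 607, 12.6014867603556),
    (3, 726, 11.1916445231536),
    (4, 370, 15.9375922030147),
    (5, 606, 10.3780145525625),
    (6, 265, 19.8411759929145),
    (7, 440, 11.5107079157115),
]
reported_total_withinss = 114.54148660353306
reported_between_ss = 417.79553316469645

# Number of rows assigned to any cluster.
total_size = sum(size for _, size, _ in clusters)

# Sum of the per-cluster within-cluster sums of squares.
total_withinss = sum(withinss for _, _, withinss in clusters)

# The per-cluster values should add up to the reported total.
matches = math.isclose(total_withinss, reported_total_withinss, rel_tol=1e-9)

# Share of the total sum of squares that lies between clusters
# (assumes total SS = Between_SS + Total_WithinSS).
between_share = reported_between_ss / (reported_between_ss + reported_total_withinss)

print(total_size, total_withinss, matches, between_share)
```

A between-cluster share closer to 1 indicates more compact, better separated clusters. Because the run stopped at MaxIterNum without converging, these figures describe the state after 10 iterations rather than a final solution.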