This example uses the Scale function to scale data with the maxabs method before passing it to the KMeans function, which outputs the centroids of the clusters in the data set. (KMeans explains why data should be scaled before it is passed to a distance-based analysis function such as KMeans.)
For an example of using Scale functions to convert input variables into z-scores for use with the Principal Components Analysis (PCA) functions, see PCA Example.
Input
- input: computers_train1, as in KMeans Example 1: NumClusters, UnpackColumns ('false') by Default
- statistic: created by calling the ScaleMap function inside the Scale call
SQL Call to Create Table of Scaled Data
CREATE MULTISET TABLE computers_normalized AS (
  SELECT * FROM Scale (
    ON computers_train1 AS "input" PARTITION BY ANY
    ON (
      SELECT * FROM ScaleMap (
        ON computers_train1
        USING
        TargetColumns ('[1:5]')
        MissValue ('omit')
      ) AS dt1
    ) AS statistic DIMENSION
    USING
    ScaleMethod ('maxabs')
    Accumulate ('id')
  ) AS dt2
) WITH DATA;
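To make the effect of ScaleMethod ('maxabs') concrete, the following is a minimal Python sketch of maxabs scaling under its usual definition: each column is divided by its largest absolute value, so every scaled value falls in [-1, 1]. It is an illustration only, not the Scale function's implementation. The function name `maxabs_scale`, the sample rows, and the handling of an all-zero column are assumptions made for this sketch.

```python
def maxabs_scale(rows):
    """Divide every column by its maximum absolute value (illustrative sketch)."""
    columns = list(zip(*rows))
    # One scale factor per column: the largest absolute value in that column.
    scales = [max(abs(v) for v in col) for col in columns]
    # Assumption for this sketch: an all-zero column is left as zeros.
    return [
        [v / s if s else 0.0 for v, s in zip(row, scales)]
        for row in rows
    ]


# Hypothetical rows (price, speed, hd); not taken from computers_train1.
rows = [
    [1499.0, 25.0, 80.0],
    [1795.0, 33.0, 85.0],
    [2999.0, 100.0, 340.0],
]

scaled = maxabs_scale(rows)
for row in scaled:
    print(row)
```

Under this definition, the row that holds a column's maximum is scaled to exactly 1.0 in that column, which is consistent with a centroid coordinate of 1.0 appearing in the output of this example.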
SQL Call to Input Scaled Data to KMeans
SELECT * FROM KMeans (
  ON computers_normalized AS InputTable
  OUT TABLE OutputTable (computers_centroid)
  USING
  NumClusters ('8')
  StopThreshold ('0.05')
  MaxIterNum ('10')
) AS dt;
The result of this query varies with each run. For repeatability, use the InitialSeeds argument instead of the NumClusters argument.
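The repeatability point can be illustrated with a small Python sketch of Lloyd's k-means iteration that starts from explicitly supplied centroids instead of randomly chosen ones. It is a toy illustration under stated assumptions, not the KMeans function's implementation: the function `kmeans`, the sample points, and the seed values are hypothetical, and the sketch stops when the centroids no longer change rather than applying a StopThreshold.

```python
def kmeans(points, seeds, max_iter=10):
    """Lloyd's algorithm starting from explicitly given centroids (sketch)."""
    centroids = [tuple(s) for s in seeds]
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; an empty cluster stays put.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical, already-scaled points and fixed starting centroids.
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
seeds = [(0.0, 0.0), (1.0, 1.0)]

first = kmeans(points, seeds)
second = kmeans(points, seeds)
print(first == second)
```

Because nothing in the sketch is random once the starting centroids are fixed, two runs on the same data return identical centroids.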
Output
clusterid | price speed hd ram screen | size | withinss |
---|---|---|---|
0 | 0.499153191737689 0.313013698630136 0.273377364644488 0.474743150684932 0.850523771152295 | 292 | 6.14410036408697 |
1 | 0.346259665529739 0.311809635722682 0.122839796318057 0.156984430082256 0.846547314577999 | 1702 | 26.9367642917337 |
2 | 0.517252416931197 0.601960461285007 0.174217462932454 0.236408566721582 0.879930225797071 | 607 | 12.6014867603556 |
3 | 0.351843035925946 0.582768595041321 0.11426800472255 0.118629476584022 0.84086857883649 | 726 | 11.1916445231536 |
4 | 0.43909933270926 1.0 0.276756756756757 0.300168918918919 0.867408585055643 | 370 | 15.9375922030147 |
5 | 0.357141678235243 0.648366336633661 0.263063020587773 0.2378300330033 0.871481265773634 | 606 | 10.3780145525625 |
6 | 0.549091201375517 0.608754716981132 0.511633423180592 0.756603773584906 0.882574916759156 | 265 | 19.8411759929145 |
7 | 0.533305831046153 0.625136363636362 0.288437229437229 0.497727272727273 0.879144385026736 | 440 | 11.5107079157115 |
Converged : False | | | |
Number of Iterations : 10 | | | |
Number of clusters : 8 | | | |
Successfully created Output table | | | |
Total_WithinSS : 114.54148660353306 | | | |
Between_SS : 417.79553316469645 | | | |
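As a quick consistency check on the table above, the following Python sketch adds up the per-cluster withinss values and compares the sum with the reported Total_WithinSS. The numbers are copied from the output table; the variable names are chosen for this sketch only.

```python
import math

# Per-cluster within-cluster sums of squares, copied from the output table.
withinss = [
    6.14410036408697,
    26.9367642917337,
    12.6014867603556,
    11.1916445231536,
    15.9375922030147,
    10.3780145525625,
    19.8411759929145,
    11.5107079157115,
]
# Cluster sizes, copied from the same rows.
sizes = [292, 1702, 607, 726, 370, 606, 265, 440]
# Total_WithinSS as reported in the summary rows.
total_withinss = 114.54148660353306

# True when the per-cluster values add up to the reported total.
print(math.isclose(sum(withinss), total_withinss, abs_tol=1e-9))
# Total number of rows assigned to the eight clusters.
print(sum(sizes))
```

The per-cluster values agree with Total_WithinSS to within floating-point rounding, and the eight clusters together contain 5008 rows.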