This example uses the Scale function to scale data (using the maxabs method) before inputting it to the function KMeans, which outputs the centroids of the clusters in the data set. (KMeans (ML Engine) explains the reason for scaling data before inputting it to a distance-based analysis function like KMeans.)
For an example of using Scale functions to convert input variables into z-scores for use with the Principal Components Analysis (PCA) functions, see PCA Example.
Input
- InputTable: computers_train1, as in KMeans Example: NumClusters, UnpackColumns ('false')
- Statistic: created by calling the ScaleMap function inside the Scale call
SQL Call to Create Table of Scaled Data
CREATE MULTISET TABLE computers_normalized AS ( SELECT * FROM Scale ( ON computers_train1 AS InputTable PARTITION BY ANY ON ( SELECT * FROM ScaleMap ( ON computers_train1 USING TargetColumns ('[1:5]') MissValue ('omit') ) AS dt1 ) AS statistic DIMENSION USING ScaleMethod ('maxabs') Accumulate ('id') ) AS dt2 ) WITH DATA;
SQL Call to Input Scaled Data to KMeans
SELECT * FROM KMeans ( ON computers_normalized AS InputTable OUT TABLE OutputTable (computers_centroid) USING NumClusters ('8') StopThreshold ('0.05') MaxIterNum ('10') ) AS dt;
The result of this query varies with each run. For repeatability, use the InitialSeeds syntax element instead of the NumClusters syntax element.
Output
clusterid mean size withinss --------- ------------------------------------------------------------------------------------------ ---- ---------------- 0 0.520659010288158 0.603350604490499 0.175562135044001 0.236183074265976 0.880829015544038 579 11.895719506972 1 0.305714191782829 0.306566440349176 0.0933333333333337 0.114997575169738 0.838135448165681 1031 6.96675718192228 2 0.427463261652133 0.415340136054421 0.103356009070295 0.120535714285714 0.85734293717487 294 4.56252016477447 3 0.509765433827703 0.371402439024389 0.323603368176539 0.558434959349594 0.859995217599231 492 18.7512162682813 4 0.3816394266286 0.777139208173692 0.251070972450283 0.234594508301405 0.866877019006832 783 32.2577148011592 5 0.333863472659545 0.593407407407406 0.137674074074075 0.131944444444444 0.846623093681913 675 12.2440968832223 6 0.53893810202189 0.759645868465428 0.36271982654782 0.561129848229342 0.885031246900105 593 42.2744605421735 7 0.404013881226436 0.32103386809269 0.195777947542653 0.247994652406417 0.854147006396138 561 7.9886712185662 --------- ------------------------------------------------------------------------------------------ ---- ---------------- Converged : False Number of Iterations : 10 Number of clusters : 8 Successfully created Output table Total_WithinSS : 136.94115656707123 Between_SS : 395.39586320118275
Download a zip file of all examples and a SQL script file that creates their input tables.