The TD_KMeansPredict function uses the cluster centroids in the TD_KMeans function output to assign each input data point to its nearest cluster centroid.
Input Table
This example uses the following input table:
id C1 C2
-- -- --
1  1  1
2  2  2
3  8  8
4  9  9
KMeans_Model (generated using TD_KMeans)
The model is generated by the following TD_KMeans call, which is supplied with an initial centroids table:
SELECT * FROM TD_KMeans (
  ON kmeans_input_table AS InputTable
  ON kmeans_initial_centroids_table AS InitialCentroidsTable DIMENSION
  USING
  IdColumn('id')
  TargetColumns('c1','c2')
  StopThreshold(0.0395)
  MaxIterNum(3)
) AS dt;
Result:
td_clusterid_kmeans C1   C2   td_size_kmeans td_withinss_kmeans id   td_modelinfo_kmeans
------------------- ---- ---- -------------- ------------------ ---- -------------------
0                   1.5  1.5  2              1                  NULL NULL
1                   8.5  8.5  2              1                  NULL NULL
NULL                NULL NULL NULL           NULL               NULL Converged : True
NULL                NULL NULL NULL           NULL               NULL Number of Iterations : 2
NULL                NULL NULL NULL           NULL               NULL Number of Clusters : 2
NULL                NULL NULL NULL           NULL               NULL Total_WithinSS : 2.00000000000000E+00
NULL                NULL NULL NULL           NULL               NULL Between_SS : 9.80000000000000E+01
NULL                NULL NULL NULL           NULL               NULL Method for InitialCentroids : Externally supplied InitialCentroidsTable
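The summary values in the model output can be cross-checked by hand. The Python sketch below recomputes the within-cluster and between-cluster sums of squares for this example, assuming the usual textbook definitions (squared Euclidean distance to the cluster centroid, and to the grand mean weighted by cluster size). Whether TD_KMeans computes them exactly this way internally is an assumption; the variable names are illustrative only.

```python
# Sketch only: textbook definitions, assumed (not confirmed) to match TD_KMeans.
points = {1: (1.0, 1.0), 2: (2.0, 2.0), 3: (8.0, 8.0), 4: (9.0, 9.0)}  # id -> (C1, C2)
centroids = {0: (1.5, 1.5), 1: (8.5, 8.5)}   # from the model output
members = {0: [1, 2], 1: [3, 4]}             # cluster id -> ids of the points assigned to it


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Within-cluster sum of squares, per cluster and in total.
within_ss = {
    cid: sum(sq_dist(points[i], centroids[cid]) for i in ids)
    for cid, ids in members.items()
}
total_within_ss = sum(within_ss.values())

# Between-cluster sum of squares: size-weighted squared distance
# from each centroid to the mean of all points.
n = len(points)
grand_mean = tuple(sum(p[d] for p in points.values()) / n for d in range(2))
between_ss = sum(
    len(ids) * sq_dist(centroids[cid], grand_mean) for cid, ids in members.items()
)

print(within_ss, total_within_ss, between_ss)
```

Under these assumptions the values agree with the td_withinss_kmeans column (1 per cluster), Total_WithinSS (2) and Between_SS (98) shown in the result.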
TD_KMeansPredict Call
SELECT * FROM TD_KMeansPredict (
  ON kmeans_input_table AS InputTable
  ON kmeans_model AS ModelTable DIMENSION
  USING
  OutputDistance('true')
  Accumulate('c1','c2')
) AS dt ORDER BY 1, 2, 3;
TD_KMeansPredict Output
id td_clusterid_kmeans td_distance_kmeans C1 C2
-- ------------------- ------------------ -- --
1  0                   0.707              1  1
2  0                   0.707              2  2
3  1                   0.707              8  8
4  1                   0.707              9  9
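To see where the cluster IDs and distances come from, the following Python sketch assigns each input row to its nearest centroid. It assumes the distance is Euclidean, which is consistent with the 0.707 values above but is not stated in this example; the `predict` helper is hypothetical and is not part of any Teradata API.

```python
import math

# Centroids taken from the KMeans_Model output: cluster id -> (C1, C2).
centroids = {0: (1.5, 1.5), 1: (8.5, 8.5)}

# Input rows as (id, C1, C2).
rows = [(1, 1, 1), (2, 2, 2), (3, 8, 8), (4, 9, 9)]


def predict(point, centroids):
    """Return (cluster_id, distance) for the centroid nearest to point.

    Assumption: Euclidean distance, computed here with math.dist (Python 3.8+).
    """
    return min(
        ((cid, math.dist(point, c)) for cid, c in centroids.items()),
        key=lambda pair: pair[1],
    )


results = [(rid, *predict((c1, c2), centroids)) for rid, c1, c2 in rows]

for rid, cid, dist in results:
    print(rid, cid, round(dist, 3))
```

Each point lies half a unit from its centroid along both axes, so the distance is the square root of 0.5, which rounds to 0.707.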
If you set the value of OutputDistance to 'false' and rerun the query, the output shows these columns:
id td_clusterid_kmeans C1 C2
-- ------------------- -- --
1  0                   1  1
2  0                   2  2
3  1                   8  8
4  1                   9  9