The KMeans function has two required outputs and one optional output. The required outputs are the results message (output to the screen) and the table of cluster centroids (specified by the OutputTable argument). The optional output is a table of the clusters themselves (specified by the ClusteredOutput argument).
The results message starts with information about each cluster, followed by summary information about the run. The following two tables describe these two parts.
Column Name | Data Type | Description |
---|---|---|
clusterid | INTEGER | Contains the cluster identifiers of the centroids. |
feature_set | VARCHAR | Column name is the concatenation of the feature names. For example, if the feature names are 'p1', 'p2', and 'p3', then the column name is 'p1 p2 p3'. Contains the concatenation of the means in the centroid. For example, means 3, 5, and 6 are represented as '3 5 6'. The UnpackColumns argument does not affect this column. |
size | INTEGER | Contains the number of points in the cluster. |
withinss | INTEGER | Contains the within-cluster-sum-of-squares—the sum of squared differences of each point from its cluster centroid. |

Label | Value |
---|---|
Converged : | 'True' if the algorithm converged, 'False' otherwise. |
Number of iterations : | Number of iterations that the algorithm performed. |
Number of clusters : | Number of clusters. |
Output table : | Name of the output table specified by the OutputTable argument. |
Total_WithinSS : | Sum of withinss values in the preceding table. |
Between_SS : | Between-cluster sum of squares: the sum, over all clusters, of the squared distance from the cluster centroid to the global mean, multiplied by the number of data points in the cluster. |
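The withinss, Total_WithinSS, and Between_SS definitions above can be checked by hand on a small data set. The following is a minimal sketch in plain Python, not the KMeans function's own implementation; the function names and the sample points are hypothetical and chosen only for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def cluster_statistics(clusters):
    """Compute per-cluster size and withinss, plus the two summary values.

    `clusters` maps a cluster identifier to the list of points in it
    (a hypothetical in-memory stand-in for the clustered data).
    """
    all_points = [p for points in clusters.values() for p in points]
    global_mean = mean_point(all_points)

    rows = {}
    between_ss = 0.0
    for clusterid, points in clusters.items():
        centroid = mean_point(points)
        # withinss: squared differences of each point from its centroid
        withinss = sum(squared_distance(p, centroid) for p in points)
        rows[clusterid] = {
            "centroid": centroid,
            "size": len(points),
            "withinss": withinss,
        }
        # Between_SS: squared centroid-to-global-mean distance, weighted by size
        between_ss += len(points) * squared_distance(centroid, global_mean)

    total_withinss = sum(row["withinss"] for row in rows.values())
    return rows, total_withinss, between_ss


# Hypothetical sample data: two clusters of two 2-D points each.
clusters = {
    0: [(1.0, 1.0), (3.0, 1.0)],
    1: [(9.0, 5.0), (11.0, 5.0)],
}
rows, total_withinss, between_ss = cluster_statistics(clusters)
print(rows[0]["withinss"], rows[1]["withinss"])  # 2.0 2.0
print(total_withinss)                            # 4.0
print(between_ss)                                # 80.0
```

For this sample, Total_WithinSS plus Between_SS equals 84, which is also the sum of squared distances of all four points from the global mean (6, 3).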
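The schema of the table of cluster centroids depends on the UnpackColumns argument. The first of the following two tables describes the schema when the feature means are packed into a single feature_set column; the second describes the schema when UnpackColumns unpacks them into one column per feature.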
Column Name | Data Type | Description |
---|---|---|
clusterid | INTEGER | Contains the cluster identifiers of the centroids. |
feature_set | VARCHAR | Column name is the concatenation of the feature names. For example, if the feature names are 'p1', 'p2', and 'p3', then the column name is 'p1 p2 p3'. Contains the concatenation of the means of the features in the centroid. For example, means 3, 5, and 6 are represented as '3 5 6'. |
size | INTEGER | Contains the number of points in the cluster. |
withinss | INTEGER | Contains the within-cluster-sum-of-squares—the sum of squared differences of each point from its cluster centroid. |

Column Name | Data Type | Description |
---|---|---|
clusterid | INTEGER | Contains the cluster identifiers of the centroids. |
feature_i | INTEGER or VARCHAR | Contains the mean of feature i in the centroid. The table has one such column for each feature. |
size | INTEGER | Contains the number of points in the cluster. |
withinss | INTEGER | Contains the within-cluster-sum-of-squares—the sum of squared differences of each point from its cluster centroid. |
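The packed and unpacked centroid schemas carry the same information in different shapes. As an illustration, the sketch below splits a packed feature_set column name and value into one entry per feature. It assumes the names and means are separated by single spaces, as in the 'p1 p2 p3' and '3 5 6' examples above; the helper name `unpack_feature_set` is hypothetical and not part of the KMeans function.

```python
def unpack_feature_set(column_name, value):
    """Split a packed feature_set column into a {feature_name: mean} mapping.

    Assumes space-separated names and means, as in the documented example.
    """
    names = column_name.split()
    means = [float(v) for v in value.split()]
    if len(names) != len(means):
        raise ValueError(
            f"{len(names)} feature names but {len(means)} means"
        )
    return dict(zip(names, means))


unpacked = unpack_feature_set("p1 p2 p3", "3 5 6")
print(unpacked)  # {'p1': 3.0, 'p2': 5.0, 'p3': 6.0}
```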
The following table describes the optional table of cluster assignments, specified by the ClusteredOutput argument.
Column Name | Data Type | Description |
---|---|---|
pointid | INTEGER | Contains the identifier of the user or item (from input_table). |
centroidid | INTEGER | Contains the identifier of the centroid for pointid. |
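Once the rows of this table have been fetched, they can be grouped by centroid to list the members of each cluster. The sketch below is a minimal example under two assumptions: the rows are available as (pointid, centroidid) pairs, and centroidid corresponds to clusterid in the centroid table, so that the group sizes can be compared with its size column. The sample rows and the helper name `group_points` are hypothetical.

```python
from collections import defaultdict


def group_points(rows):
    """Group (pointid, centroidid) rows into {centroidid: [pointid, ...]}."""
    members = defaultdict(list)
    for pointid, centroidid in rows:
        members[centroidid].append(pointid)
    return dict(members)


# Hypothetical rows, as they might be read from the clustered output table.
clustered_rows = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]

members = group_points(clustered_rows)
# Number of points per centroid, comparable to the size column
# of the centroid table under the assumption stated above.
sizes = {centroidid: len(pointids) for centroidid, pointids in members.items()}

print(members)  # {0: [1, 2], 1: [3, 4, 5]}
print(sizes)    # {0: 2, 1: 3}
```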