The KMeans function has one required input table (specified by the InputTable argument) and one optional input table (specified by the CentroidsTable argument).
The required input table contains the features by which to cluster the data.
Column Name | Data Type | Description |
---|---|---|
id | INTEGER | Contains the identifier of the user or item. |
dimension_i | DOUBLE PRECISION | Contains the data in dimension i. The table has columns dimension_1 through dimension_n, where n is the number of dimensions. Each dimension is a feature by which to cluster the data. For example, to cluster points on the surface of the earth by latitude and longitude, the input table has three columns: point-id, latitude, and longitude. Clustering is performed on the latitude and longitude columns. The dimensionality n of the data is not specified as an argument; it is derived implicitly from the data. |
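
The latitude and longitude example can be sketched in a few lines of Python. This is only an illustration of the input layout and of how the dimensionality n follows from the data; it is not the KMeans function itself. The sample rows, the `feature_columns` helper, and the rule that every column other than the identifier counts as a dimension are assumptions made for this sketch.

```python
# Hypothetical rows of a KMeans input table for the latitude/longitude example.
# Column names follow the schema above; the values are illustrative only.
rows = [
    {"id": 1, "latitude": 37.77, "longitude": -122.42},
    {"id": 2, "latitude": 40.71, "longitude": -74.01},
    {"id": 3, "latitude": 51.51, "longitude": -0.13},
]


def feature_columns(row, id_column="id"):
    """Return the columns treated as dimensions (assumed: all except the identifier)."""
    return [name for name in row if name != id_column]


features = feature_columns(rows[0])
n = len(features)  # dimensionality inferred from the data, not passed as an argument
print(features, n)
```

Here the identifier column is excluded and the two remaining columns are the dimensions, so n is 2 without being stated anywhere.
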
The optional input table contains the initial seed means for the clusters. This table has the same schema as the table of cluster centroids (specified by the OutputTable argument). That schema is affected by the UnpackColumns argument and is described in KMeans Results Messages and KMeans Output Table Schema for UnpackColumns('true').