Argument | Category | Description |
---|---|---|
InputTable | Required | The table containing the features by which to cluster the data. |
OutputTable | Required | The table where the output is stored. It contains the centroids of the clusters. |
InitialSeedTable | Required if NumClusters is omitted, otherwise not allowed | An input table containing the points that serve as initial cluster centers. |
NumClusters | Required if InitialSeedTable is omitted, otherwise not allowed | If a single value is given, the function trains a model with that number of clusters. If a list of integers is supplied, the function trains a model for each value. Initial seeds are specified by performing KMeans\|\| sampling using the FixedSample function. |
ModelIdColumn | Optional | If this argument is present, it indicates that the table specified in InitialSeedTable contains more than one set of seed values (that is, it contains seed values for more than one model). This argument specifies the column in InitialSeedTable that identifies which rows are associated with each model. |
InputColumns | Required | Specifies the input table columns to use for clustering. |
Threshold | Optional | This is the convergence threshold. When the centroids move by less than this amount, the algorithm has converged. The input value must be no less than 0.0. The default value is 0.0395. |
MaxIterNum | Optional | Specifies the maximum number of iterations that the algorithm runs before quitting if the convergence threshold is not met. The input value must be an integer greater than 0. The default value is 10. |
Distance | Optional | Specifies the distance metric that the KModes function uses for numeric dimensions. The default value is 'euclidean'. |
CategoricalDistance | Optional | Specifies the distance metric that the KModes function uses for categorical dimensions. |
CategoryWeights | Optional | The weights to be assigned to each category in the KModes distance. |
AsCategories | Optional | Indicates which numeric input columns to interpret as categorical variables. The specified columns must contain numeric SQL types. |
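
The constraints in the table (exactly one of InitialSeedTable and NumClusters, a non-negative Threshold, a positive integer MaxIterNum) can be checked before a call is issued. The following is a minimal Python sketch of such a check, not part of the function itself: the `KModesArgs` class, its field names, and `validate` are hypothetical, and the rule that ModelIdColumn needs InitialSeedTable is inferred from the description above rather than stated outright. Only the defaults given in the table are filled in.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class KModesArgs:
    """Hypothetical container mirroring the argument table above."""
    input_table: str
    output_table: str
    input_columns: List[str]
    initial_seed_table: Optional[str] = None
    num_clusters: Optional[Union[int, List[int]]] = None
    model_id_column: Optional[str] = None
    threshold: float = 0.0395      # default from the table
    max_iter_num: int = 10         # default from the table
    distance: str = "euclidean"    # default from the table


def validate(args: KModesArgs) -> None:
    """Raise ValueError if the arguments break a rule stated in the table."""
    if not args.input_columns:
        raise ValueError("InputColumns is required")

    has_seeds = args.initial_seed_table is not None
    has_k = args.num_clusters is not None
    # Each argument is required when the other is omitted and not allowed
    # otherwise, so exactly one of the two must be present.
    if has_seeds == has_k:
        raise ValueError("Specify exactly one of InitialSeedTable or NumClusters")

    # Assumption: ModelIdColumn names a column of InitialSeedTable, so it is
    # treated as meaningless without that table.
    if args.model_id_column is not None and not has_seeds:
        raise ValueError("ModelIdColumn requires InitialSeedTable")

    if args.threshold < 0.0:
        raise ValueError("Threshold must be no less than 0.0")

    # bool is a subclass of int in Python, so it is rejected explicitly.
    is_int = isinstance(args.max_iter_num, int) and not isinstance(args.max_iter_num, bool)
    if not is_int or args.max_iter_num <= 0:
        raise ValueError("MaxIterNum must be an integer greater than 0")
```

For example, `validate(KModesArgs("input_t", "output_t", ["c1", "c2"], num_clusters=[3, 4, 5]))` passes, while supplying both `num_clusters` and `initial_seed_table`, or neither, raises `ValueError`. The table and column names in this call are placeholders.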