1.1 - 8.10 - KModes Syntax Elements - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Date: October 2019
Content Type: Programming Reference
Language: English (United States)
Specify the name of the table in which to output the centroids of the clusters.
[Required if you omit InitialSeedTable, disallowed otherwise.] Specify the number of clusters. If you specify a single value, the function trains a single model with the specified number of clusters. If you specify multiple values, the function trains a model for each value.
[Optional] Specify the name of the InitialSeedTable column that contains seed values for multiple models.
Specify the input table columns to use for clustering.
[Optional] Specify the convergence threshold. When the centroids move by less than the threshold between iterations, the algorithm has converged. The threshold must be a nonnegative DOUBLE value.
Default: 0.0395
[Optional] Specify the maximum number of iterations that the algorithm runs before quitting if the convergence threshold is not met. The max_iterations must be a positive INTEGER.
Default: 10
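The interplay between the convergence threshold and the iteration limit can be sketched in Python. This is a minimal illustration, not the function's implementation. The update_centroids callback, the toy update rule, and the choice of the largest Euclidean centroid shift as the measure of movement are all assumptions; only the default values 0.0395 and 10 come from the descriptions above.

```python
import math

def run_until_converged(centroids, update_centroids,
                        threshold=0.0395, max_iterations=10):
    """Iterate until centroids move less than `threshold` or the limit is hit.

    `update_centroids` is a hypothetical callback that returns new centroids.
    Movement is measured as the largest Euclidean shift of any centroid,
    which is an assumption made for this sketch.
    """
    if threshold < 0:
        raise ValueError("threshold must be nonnegative")
    if max_iterations < 1:
        raise ValueError("max_iterations must be a positive integer")
    for iteration in range(1, max_iterations + 1):
        new_centroids = update_centroids(centroids)
        shift = max(math.dist(old, new)
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            return centroids, iteration, True    # converged
    return centroids, max_iterations, False      # stopped at the limit


# Toy update rule: each centroid moves halfway toward the point (1.0, 1.0).
def halfway_to_one(current):
    return [tuple((c + 1.0) / 2 for c in point) for point in current]

final, iterations, converged = run_until_converged([(0.0, 0.0)], halfway_to_one)
```

With this toy rule the shift halves on every pass, so the run converges under the default threshold before reaching the default limit. Lowering max_iterations makes the loop stop early and report that it did not converge.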
[Optional] Specify the distance metric for numeric dimensions.
Default: 'euclidean'
[Optional] Specify the distance metric for categorical dimensions:
Option    Description
overlap   (Default) Distance is 0 if two points are in the same category, 1 otherwise.
hamming   Used for categories that are strings of equal length. Distance is the percentage of character positions at which the two strings differ.
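The two categorical metrics can be illustrated with a short Python sketch. The function names are hypothetical, and the hamming result is expressed as a fraction between 0 and 1; the description above does not state how the function scales the value internally.

```python
def overlap_distance(a, b):
    """Return 0 if the two category values are identical, 1 otherwise."""
    return 0 if a == b else 1


def hamming_distance(a, b):
    """Return the fraction of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance requires strings of equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


same = overlap_distance("red", "red")        # identical categories
different = overlap_distance("red", "blue")  # different categories
half = hamming_distance("AB12", "AB34")      # 2 of 4 positions differ
```

Overlap treats every mismatch as equally distant, while hamming grades the mismatch by how many characters differ, which is why it applies only to equal-length strings.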
[Optional] Specify the weight of each category in the KModes distance. Each weight must be a DOUBLE value.
Default behavior: All categories have equal weight.
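One way to picture category weights is as multipliers on the per-dimension distances. The sketch below assumes a weighted sum of overlap distances; the function name and this combination rule are assumptions for illustration, since the description above does not specify how the weights enter the distance.

```python
def weighted_categorical_distance(point, centroid, weights=None):
    """Weighted sum of per-dimension overlap distances (assumed combination rule)."""
    if len(point) != len(centroid):
        raise ValueError("point and centroid must have the same number of dimensions")
    if weights is None:
        weights = [1.0] * len(point)  # default behavior: equal weight per category
    if len(weights) != len(point):
        raise ValueError("one weight is required per categorical dimension")
    return sum(w * (0 if p == c else 1)
               for w, p, c in zip(weights, point, centroid))


point = ("red", "small", "round")
centroid = ("red", "large", "square")

equal = weighted_categorical_distance(point, centroid)
custom = weighted_categorical_distance(point, centroid, [1.0, 3.0, 0.5])
```

With equal weights the two mismatched dimensions count the same; with the custom weights a mismatch in the second dimension dominates the distance.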
[Optional] Specify the input table columns that contain numeric variables to interpret as categorical variables. These columns must have numeric SQL data types.
Default behavior: No numeric variables are treated as categorical variables.
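The following Python sketch suggests why a numeric column might be treated as categorical. It assumes, as is usual for k-modes style clustering, that a numeric dimension is summarized by its mean and a categorical dimension by its most frequent value; the product_codes data are hypothetical.

```python
from statistics import mean, mode

# Hypothetical numeric codes whose values carry no arithmetic meaning.
product_codes = [1, 1, 9]

numeric_center = mean(product_codes)      # averages the codes
categorical_center = mode(product_codes)  # picks the most frequent code
```

The mean falls between codes and matches none of them, while the mode is always one of the observed codes, which is the more sensible summary for identifiers stored as numbers.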
[Optional] Specify the random seed the algorithm uses for repeatable results. The seed must be a LONG value.
If you specify Seed:
  • You must also specify SeedColumn.
  • You must specify NumClusters, not InitialSeedTable.
For repeatable results, use both the Seed and UniqueID syntax elements. For more information, see Nondeterministic Results and UniqueID Syntax Element.
[Optional] Specify the names of the InputTable columns by which to partition the input. Function calls that use the same input data, seed, and seed_column output the same result. If you specify SeedColumn, you must also specify Seed.
Ideally, the number of distinct values in the seed_column is the same as the number of workers in the cluster. A very large number of distinct values in the seed_column degrades function performance.
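The repeatability guarantee can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the function's actual sampling scheme: the sample_candidates helper, the per-partition generator derived from the seed and the partition key, and the toy rows are all hypothetical.

```python
import random
from collections import defaultdict

def sample_candidates(rows, seed_column, seed, per_partition=1):
    """Draw candidate rows reproducibly from each partition of `seed_column`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[seed_column]].append(row)

    candidates = []
    for key in sorted(partitions):
        # One generator per partition, derived from the seed and the key.
        rng = random.Random(f"{seed}:{key}")
        # Sort rows so the draw does not depend on their arrival order.
        ordered = sorted(partitions[key], key=lambda r: sorted(r.items()))
        candidates.extend(rng.sample(ordered, min(per_partition, len(ordered))))
    return candidates


rows = [
    {"id": 1, "region": 0, "color": "red"},
    {"id": 2, "region": 0, "color": "blue"},
    {"id": 3, "region": 1, "color": "red"},
    {"id": 4, "region": 1, "color": "green"},
]

first = sample_candidates(rows, "region", seed=42)
second = sample_candidates(list(reversed(rows)), "region", seed=42)
```

Because each partition has its own seeded generator and a fixed row order, the same input data, seed, and seed column produce the same draw, regardless of the order in which rows arrive.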