1.1 - 8.10 - GMM Syntax Elements - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)

IDColumn
Specify the name of the InputTable column that contains the row identifier.
Default: First InputColumn column
OutputTable
Specify the name of the output table to which the function outputs cluster information. The table must not already exist.
MaxClusterNum
[Required if you omit ClusterNum, disallowed otherwise.] Specify the maximum number of clusters in a Dirichlet Process model. Specifying this syntax element causes the function to use the DP-GMM algorithm. This value must have the data type INTEGER.
Default: 20
ClusterNum
[Required if you omit MaxClusterNum, disallowed otherwise.] Specify the number of clusters in the model. Specifying this syntax element causes the function to use the basic GMM algorithm. This value must have the data type INTEGER and be greater than 0.
Default: 10
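The either-or rule between ClusterNum and MaxClusterNum can be sketched as a small check. This is an illustration only, not the function's own validation code: the helper name `choose_algorithm` is hypothetical, and requiring MaxClusterNum to be greater than 0 is an assumption (the text above states that bound only for ClusterNum).

```python
def choose_algorithm(cluster_num=None, max_cluster_num=None):
    """Return the algorithm implied by which cluster-count element is given.

    Hypothetical helper: mirrors the documented rule that exactly one of
    ClusterNum and MaxClusterNum may be specified.
    """
    # Exactly one of the two elements must be present.
    if (cluster_num is None) == (max_cluster_num is None):
        raise ValueError("Specify exactly one of ClusterNum or MaxClusterNum")

    value = cluster_num if cluster_num is not None else max_cluster_num
    # bool is a subclass of int in Python, so reject it explicitly.
    # Assumption: MaxClusterNum must also be greater than 0.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("Cluster count must be an INTEGER greater than 0")

    # ClusterNum selects basic GMM; MaxClusterNum selects DP-GMM.
    return "GMM" if cluster_num is not None else "DP-GMM"
```

For example, `choose_algorithm(cluster_num=10)` selects basic GMM, while passing both arguments raises an error.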
CovarianceType
[Optional] Specify the covariance matrix type, which determines how many parameters the function estimates for each cluster. D is the number of dimensions, so each covariance matrix is D by D.

| Option | Description |
|---|---|
| 'diagonal' (Default) | Each covariance matrix has zeros off the diagonal. Function estimates D parameters for each cluster. |
| 'spherical' | Each covariance matrix has the form σI. Function estimates one parameter for each cluster. |
| 'tied' | All clusters share the same covariance matrix. Function estimates (1/2)D(D+1) parameters in total. |
| 'full' | Each cluster has an arbitrary covariance matrix. Function estimates (1/2)D(D+1) parameters for each cluster. |
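To compare how quickly the estimation cost grows across covariance types, the counts can be computed directly. The sketch below is illustrative and not part of the function: `covariance_param_count` is a hypothetical helper, and it counts the distinct entries of a symmetric D-by-D matrix as D(D+1)/2 (D variances plus D(D-1)/2 covariances).

```python
def covariance_param_count(covariance_type, d, k):
    """Total covariance parameters estimated for k clusters in d dimensions.

    Hypothetical helper for illustration; covers covariance parameters only,
    not cluster means or weights.
    """
    # Distinct entries of a symmetric d-by-d matrix.
    symmetric = d * (d + 1) // 2
    counts = {
        "spherical": k,          # one variance per cluster
        "diagonal": k * d,       # d variances per cluster
        "tied": symmetric,       # one matrix shared by all clusters
        "full": k * symmetric,   # one matrix per cluster
    }
    if covariance_type not in counts:
        raise ValueError(f"Unknown covariance type: {covariance_type!r}")
    return counts[covariance_type]
```

With 10 clusters in 50 dimensions, 'diagonal' needs 500 covariance parameters while 'full' needs 12,750, which is one reason to prefer the simpler types on high-dimensional data.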
Tolerance
[Optional] Specify the convergence threshold. The function terminates when the change in log-likelihood between iterations falls below this value. This value must have the data type DOUBLE PRECISION and be greater than 0.
Default: 0.001
MaxIterNum
[Optional] Specify the maximum number of iterations for which the function runs. This value must have the data type INTEGER and be greater than 0.
Default: 10
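Tolerance and MaxIterNum together decide when fitting stops. A minimal sketch of that stopping rule follows, assuming the function compares the absolute change in log-likelihood against Tolerance with a strict less-than; the actual comparison is not specified here. The helper `iterations_until_stop` is hypothetical and takes a precomputed sequence of log-likelihoods in place of real model updates.

```python
def iterations_until_stop(log_likelihoods, tolerance=0.001, max_iter_num=10):
    """Return how many iterations run before one of the stopping rules fires.

    log_likelihoods: the log-likelihood after each iteration (a stand-in for
    the real model updates, which are not modelled in this sketch).
    """
    previous = None
    iterations = 0
    for current in log_likelihoods:
        iterations += 1
        # Stop when the change falls below Tolerance (assumed: absolute change).
        if previous is not None and abs(current - previous) < tolerance:
            break
        # Stop when the iteration budget MaxIterNum is used up.
        if iterations >= max_iter_num:
            break
        previous = current
    return iterations
```

Because the default MaxIterNum is only 10, a run can end on the iteration limit before the log-likelihood has settled; raising MaxIterNum lets Tolerance become the effective stopping rule.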
ConcentrationParam
[Optional] Specify this syntax element only if you specify MaxClusterNum. Specify the concentration parameter, α, which determines the number of clusters that the DP-GMM algorithm creates. This value must have the data type DOUBLE PRECISION and be greater than 0.
The expected number of clusters is α log N, where N is the number of points in the data set; therefore, a larger α value tends to cause the algorithm to find more clusters.
Default: 0.001
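The α log N estimate is easy to evaluate when choosing a value for ConcentrationParam. The sketch below assumes the logarithm is the natural log, which the text does not state; `expected_cluster_count` is a hypothetical helper, and the result is only an expectation that MaxClusterNum still caps.

```python
import math


def expected_cluster_count(alpha, n):
    """Approximate expected number of DP-GMM clusters as alpha * ln(n).

    Hypothetical helper; assumes a natural logarithm.
    """
    if alpha <= 0:
        raise ValueError("alpha must be greater than 0")
    if n < 1:
        raise ValueError("n must be at least 1")
    return alpha * math.log(n)
```

For a million points, α = 1 gives roughly 13.8 expected clusters under this assumption, while the default of 0.001 gives a value far below 1, so larger values are worth trying when more clusters are expected.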
PackOutput
[Optional] Specify whether the function packs the output.
Default: 'false'
Seed
[Optional] Specify the random seed the algorithm uses for repeatable results. The seed must be a LONG value.
For repeatable results, use both the Seed and UniqueID syntax elements. For more information, see Nondeterministic Results and UniqueID Syntax Element.
Default: 1