GMM Arguments

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.0
8.00
Release Date
May 2019
Content Type
Programming Reference
Publication ID
B700-4003-098K
Language
English (United States)
OutputTable
Specify the name of the output table to which the function outputs cluster information. The table must not already exist.
MaxClusterNum
[Required if you omit ClusterNum, disallowed otherwise.] Specify the maximum number of clusters in a Dirichlet process model. Specifying this argument causes the function to use the DP-GMM algorithm. This value must have the data type INTEGER.
Default: 20
ClusterNum
[Required if you omit MaxClusterNum, disallowed otherwise.] Specify the number of clusters in the model. Specifying this argument causes the function to use the basic GMM algorithm. This value must have the data type INTEGER and be greater than 0.
Default: 10
CovarianceType
[Optional] Specify the covariance matrix type, which determines how many parameters the function estimates for each cluster, where D is the number of dimensions in the matrix:
Option Description
'diagonal' (Default) Each covariance matrix has zeros in its off-diagonal entries. The function estimates D parameters for each cluster.
'spherical' Each covariance matrix has the form σI. The function estimates one parameter for each cluster.
'tied' Every cluster shares the same covariance matrix. The function estimates (1/2)D(D-1) parameters.
'full' Each cluster has an arbitrary covariance matrix. The function estimates (1/2)D(D-1) parameters for each cluster.
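The parameter counts in the table above can be made concrete with a short sketch (Python for illustration only; the helper name is hypothetical, and the counts are taken verbatim from the table):

```python
# Hypothetical helper (not part of the product) that reproduces the
# per-option parameter counts stated in the table, for D dimensions.
def covariance_param_count(covariance_type, d):
    if covariance_type == "diagonal":
        return d                    # one variance per dimension
    if covariance_type == "spherical":
        return 1                    # single sigma shared by all dimensions
    if covariance_type in ("tied", "full"):
        return d * (d - 1) // 2     # count given in the table
    raise ValueError("unknown covariance type: " + covariance_type)

for option in ("diagonal", "spherical", "tied", "full"):
    print(option, covariance_param_count(option, 4))
```

Note that 'tied' and 'full' share the same count per the table, but 'tied' estimates one shared matrix for the whole model while 'full' estimates a separate matrix for each cluster.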
Tolerance
[Optional] Specify the convergence threshold. When the change in log-likelihood between iterations falls below this value, the function terminates. This value must have the data type DOUBLE PRECISION and be greater than 0.
Default: 0.001
MaxIterNum
[Optional] Specify the maximum number of iterations for which the function runs. This value must have the data type INTEGER and be greater than 0.
Default: 10
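Tolerance and MaxIterNum act as the two stopping criteria for the iterative fit: the loop ends either when the log-likelihood stabilizes or when the iteration cap is reached. A minimal sketch of how such an EM-style loop typically terminates (an illustration, not Teradata's implementation; the `step` callback is hypothetical):

```python
# Illustration only: how Tolerance and MaxIterNum typically interact
# as stopping criteria in an EM-style fitting loop.
def run_em(step, tolerance=0.001, max_iter_num=10):
    """step() runs one EM iteration and returns the new log-likelihood."""
    prev_ll = float("-inf")
    for i in range(1, max_iter_num + 1):
        ll = step()
        if ll - prev_ll < tolerance:  # change fell below Tolerance: converged
            return ll, i
        prev_ll = ll
    return prev_ll, max_iter_num      # stopped by MaxIterNum instead

# Simulated log-likelihoods: the third change (0.0005) is below 0.001.
lls = iter([-100.0, -50.0, -49.9995])
ll, iterations = run_em(lambda: next(lls))
print(iterations)  # 3
```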
ConcentrationParam
[Optional] Specify this argument only if you specify MaxClusterNum. Specify the concentration parameter, α, which determines the number of clusters that the DP-GMM algorithm creates. This value must have the data type DOUBLE PRECISION and be greater than 0.
The expected number of clusters is α log N, where N is the number of points in the data set; therefore, a larger α value tends to cause the algorithm to find more clusters.
Default: 0.001
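The expected-cluster-count formula can be checked directly (Python for illustration; `expected_clusters` is a hypothetical helper, not a product API):

```python
import math

# Expected DP-GMM cluster count, alpha * log(N), per the formula above.
def expected_clusters(alpha, n):
    return alpha * math.log(n)

# Larger alpha yields more expected clusters for the same data size.
print(expected_clusters(0.001, 100_000))  # default alpha: well under 1
print(expected_clusters(1.0, 100_000))    # roughly 11.5
```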
PackOutput
[Optional] Specify whether the function packs the output ('true') or leaves it unpacked ('false').
Default: 'false'