5.4.5 - Clustering and Constants in Data - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

February 2018

When one or more of the variables included in the clustering analysis take only a few distinct values, the algorithm may single out those values and place them in particular clusters, where the variable is then a constant. This is most likely when the number of clusters sought is large. When it happens, the covariance matrix becomes singular and cannot be inverted, because some of the variances are zero. The cluster algorithm provides a feature that improves the chance of success under these conditions by limiting how close to zero a variance may be set, for example 10^-3. The default limit is 10^-10.

If the log-likelihood values increase for a number of iterations and then start decreasing, the likely cause is that the clustering algorithm has found clusters in which selected variables all have the same value (a constant), so the cluster variance is zero. Changing the minimum variance exponent to a larger value (for example, from -10 to -3) may reduce the effect of these constants, allowing the other variables to converge to a higher log-likelihood value.
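The effect of a variance floor can be illustrated with a minimal Python sketch. It is not Teradata Warehouse Miner's implementation. It assumes a cluster with a diagonal covariance matrix, and the names floor_variances, min_variance_exponent and diagonal_gaussian_log_density are hypothetical, chosen only for this illustration.

```python
import math


def floor_variances(variances, min_variance_exponent=-10):
    """Clamp each variance to at least 10**min_variance_exponent.

    Hypothetical helper: the parameter name mirrors the "minimum variance
    exponent" option described above but is not a product API.
    """
    floor = 10.0 ** min_variance_exponent
    return [max(v, floor) for v in variances]


def diagonal_gaussian_log_density(x, means, variances):
    """Log density of x under a Gaussian with diagonal covariance.

    Every variance must be strictly positive; a zero variance would make
    math.log fail and the division undefined.
    """
    total = 0.0
    for xi, mu, var in zip(x, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
    return total


# Illustrative cluster: the second variable is constant, so its variance is 0.
means = [1.0, 4.0]
raw_variances = [2.5, 0.0]

default_floor = floor_variances(raw_variances)     # floor of 10^-10
larger_floor = floor_variances(raw_variances, -3)  # floor of 10^-3

# A point that matches the constant value exactly on the second variable.
point = [1.2, 4.0]
lp_default = diagonal_gaussian_log_density(point, means, default_floor)
lp_larger = diagonal_gaussian_log_density(point, means, larger_floor)

print("floored variances:", default_floor, larger_floor)
print("log densities:", lp_default, lp_larger)
```

In this sketch the unfloored variance of 0 cannot be used at all, while both floors give a finite log density. The smaller floor gives the constant variable a much larger contribution for a point that sits exactly on the constant, which is one way to see why raising the exponent can reduce the influence of such constants.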