Clustering and Constants in Data - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product

Teradata Warehouse Miner

Release Number

5.4.5

Published

February 2018

Language

English (United States)

Last Update

2018-05-04

dita:id

B035-2302

Product Category

Software

When one or more of the variables included in the clustering analysis take only a few distinct values, those values may be singled out and assigned to particular clusters as constants. This is most likely when the number of clusters sought is large. When it happens, the variance of such a variable within a cluster is zero, so the cluster's covariance matrix becomes singular and cannot be inverted. To improve the chance of success under these conditions, the cluster algorithm provides a feature that limits how close to zero a variance may be set, for example 10⁻³. The default value is 10⁻¹⁰.

If the log-likelihood values increase for a number of iterations and then start decreasing, the likely cause is that the clustering algorithm has found clusters in which selected variables all have the same value (a constant), so the cluster variance for those variables is zero. Increasing the minimum variance exponent, and with it the minimum variance (for example, from 10⁻¹⁰ to 10⁻³), may reduce the effect of these constants, allowing the other variables to converge to a higher log-likelihood value.
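The effect described above can be illustrated with a small sketch. It assumes each cluster is modeled as a multivariate Gaussian (as the mention of covariance matrices and log-likelihood suggests) and that the minimum variance is enforced by clamping the diagonal of each cluster's covariance matrix. The helper `floor_variances` and its `min_var_exponent` parameter are hypothetical names modeled on the "minimum variance exponent" setting; they are not the Teradata Warehouse Miner implementation. The sketch only shows why a zero variance breaks the matrix factorization and how the size of the floor changes the log-likelihood contribution of a constant variable.

```python
import math

def covariance(rows):
    """Return (means, population covariance matrix) for equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - means[i]) * (r[j] - means[j]) / n
    return means, cov

def floor_variances(cov, min_var_exponent=-10):
    """Hypothetical helper: copy cov and clamp each diagonal variance to >= 10**min_var_exponent."""
    floor = 10.0 ** min_var_exponent
    out = [row[:] for row in cov]
    for i in range(len(out)):
        out[i][i] = max(out[i][i], floor)
    return out

def cholesky(a):
    """Lower-triangular L with L L^T = a; raises ValueError if a is not positive definite."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is singular (not positive definite)")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def gaussian_log_density(x, mean, cov):
    """log N(x | mean, cov), computed through a Cholesky factorization of cov."""
    L = cholesky(cov)
    d = len(x)
    # Forward substitution solves L z = (x - mean); the squared Mahalanobis distance is z . z.
    z = []
    for i in range(d):
        z.append((x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

# A cluster in which the second variable is a constant (always 5.0).
cluster = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
mean, cov = covariance(cluster)

try:
    cholesky(cov)  # zero variance on the diagonal: factorization fails
except ValueError as err:
    print("raw covariance:", err)

for exponent in (-10, -3):
    lp = gaussian_log_density([2.5, 5.0], mean, floor_variances(cov, exponent))
    print(f"minimum variance 10^{exponent}: log-density {lp:.3f}")
```

In this sketch a point sitting exactly on the constant adds -0.5 · ln(floor) to its log-density, so the default floor of 10⁻¹⁰ contributes about 11.5 nats per row from the constant variable alone, versus about 3.5 nats with a floor of 10⁻³. A smaller floor therefore lets the constant dominate the log-likelihood, which is consistent with the advice to raise the minimum variance. Note that clamping only the diagonal handles the zero-variance case described here; it does not by itself guarantee an invertible matrix when two non-constant variables are perfectly correlated.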