5.4.5 - Sampling Large Database Tables as a Starting Method - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-04

For extremely large databases, it is often most effective to begin the analysis with the sample parameter. Execution is much faster on a sample, and the approximate result it produces can serve as a starting point for further analysis. Results can be compared using the log-likelihood value: the largest value indicates the best clustering fit in the maximum-likelihood sense. Because any particular EM clustering analysis can converge to a local maximum, running several analyses on different samples can produce a seed that ultimately yields the best log-likelihood value.
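
The sample-then-seed idea can be sketched outside the product in plain Python. The block below is an illustration only and is not the Teradata Warehouse Miner interface: the function names (`em_fit`, `log_likelihood`, `seed_from_samples`), the one-dimensional Gaussian mixture, and the parameter defaults are all hypothetical. It also assumes that samples are compared by average log-likelihood per row, so that samples of different sizes remain comparable.

```python
import math
import random


def _density(x, w, m, v):
    """Weighted Gaussian density of x for one cluster (weight, mean, variance)."""
    return w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in data:
        p = sum(_density(x, w, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def em_fit(data, start_means, n_iter=50, min_var=1e-6):
    """Run EM from the given starting means; return (weights, means, variances)."""
    n, k = len(data), len(start_means)
    means = list(start_means)
    weights = [1.0 / k] * k
    center = sum(data) / n
    spread = max(sum((x - center) ** 2 for x in data) / n, min_var)
    variances = [spread] * k  # every cluster starts with the overall variance
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each row.
        resp = []
        for x in data:
            dens = [_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate each cluster from its weighted rows.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj > 1e-12:  # leave an empty cluster's mean and variance unchanged
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
                variances[j] = max(var, min_var)
    return weights, means, variances


def seed_from_samples(data, k, n_samples=10, sample_size=200, rng=None):
    """Cluster several random samples; return (score, params) of the best one.

    The score is the average log-likelihood per sampled row (an assumption
    made here so that samples of different sizes can be compared).
    """
    rng = rng or random.Random(0)
    best = None
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        start = rng.sample(sample, k)  # random rows as starting means
        params = em_fit(sample, start)
        score = log_likelihood(sample, *params) / len(sample)
        if best is None or score > best[0]:
            best = (score, params)
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a large table: two well-separated groups.
    data = [rng.gauss(0, 1) for _ in range(1000)] + [rng.gauss(8, 1) for _ in range(1000)]
    score, (_, seed_means, _) = seed_from_samples(data, k=2, rng=rng)
    final = em_fit(data, seed_means)  # refine on the full data from the seed
    print("seed score per row:", score)
    print("final log-likelihood:", log_likelihood(data, *final))
```

Individual sample runs in this sketch can stall at a poor fit when both starting means fall in the same group, which is the local-maximum behavior described above. Keeping only the highest-scoring run is what makes the resulting seed useful for the full-data pass.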