When analyzing extremely large databases, it may be most effective to begin with the sample parameter. Execution is much faster on a sample, and the approximate result obtained can be used as a starting point for further analysis. Results can be compared using the log-likelihood value: the largest value indicates the best clustering fit in the maximum-likelihood sense. Because a particular EM clustering analysis can converge to a local maximum, multiple executions on different samples can produce a seed that ultimately yields the best log-likelihood value.
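
The sample-then-seed workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the product's own interface: the function names (`em_fit`, `best_seed_from_samples`), the one-dimensional Gaussian mixture, the synthetic data, and the use of `random.sample` to stand in for the sample parameter are all hypothetical. Each sampled run is scored by its average log-likelihood per point so that runs are comparable, and the best run seeds the fit on the full data.

```python
import math
import random

def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]

def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of 1-D data under the mixture."""
    return sum(
        math.log(max(sum(component_densities(x, weights, means, variances)), 1e-300))
        for x in data)

def em_fit(data, means, variances=None, weights=None, n_iter=50):
    """Run EM from the given starting values; return (weights, means, variances)."""
    k, n = len(means), len(data)
    means = list(means)
    if variances is None:
        # Assumption: start every component at the overall variance of the data.
        mu = sum(data) / n
        variances = [max(sum((x - mu) ** 2 for x in data) / n, 1e-6)] * k
    variances = list(variances)
    weights = list(weights) if weights is not None else [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = component_densities(x, weights, means, variances)
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)
    return weights, means, variances

def best_seed_from_samples(data, k, n_samples=8, sample_size=200, rng=None):
    """Fit EM on several random samples; keep the fit with the highest
    average log-likelihood per point."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        init_means = rng.sample(sample, k)  # random starting means
        params = em_fit(sample, init_means)
        score = log_likelihood(sample, *params) / len(sample)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Synthetic data (illustrative only): two well-separated clusters.
rng = random.Random(42)
data = ([rng.gauss(0.0, 1.0) for _ in range(2000)]
        + [rng.gauss(6.0, 1.0) for _ in range(2000)])

seed_score, (seed_w, seed_m, seed_v) = best_seed_from_samples(data, k=2, rng=rng)

# Use the best sampled result as the seed for the run on the full data.
final_w, final_m, final_v = em_fit(data, seed_m, seed_v, seed_w, n_iter=20)
final_ll = log_likelihood(data, final_w, final_m, final_v)
print("seed score per point:", seed_score)
print("final means:", sorted(final_m), "log-likelihood:", final_ll)
```

Because each sampled run works on only a few hundred points, the repeated restarts cost far less than a single pass over the full data, and the seeded full run then needs relatively few iterations to refine the result.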