Cluster - INPUT - Analysis Parameters - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Product

Teradata Warehouse Miner

Release Number

5.4.5

Published

February 2018

Language

English (United States)

Last Update

2018-05-04

dita:id

B035-2302

Product Category

Software

On the Clustering dialog box, click INPUT.
Click Analysis Parameters.
Clustering > Input > Analysis Parameters
On this screen, set the following options:
- Clustering Algorithm
  - Gaussian — Cluster the data using a Gaussian Mixture Model as described above. This is the default algorithm.
  - K-Means — Cluster the data using the K-Means Model as described in K-Means Option.
  - Fast K-Means — Cluster the data using a high-performing version of the K-Means Model.
  - Poisson — Cluster the data using a Poisson Mixture Model as described in Poisson Option.
- Number of clusters — Enter the number of clusters before executing the Cluster analysis.
- Convergence Criterion — For the Gaussian and Poisson Mixture Models, clustering stops when the log-likelihood increases by less than this amount from one iteration to the next. The default value is 0.001. Fast K-Means also uses this field, but as a threshold for cluster changes based on a different formula. Generic K-Means does not use this criterion: it has converged when the distances of all points to each cluster, and therefore the assignment of rows to clusters, have not changed from the previous iteration.
- Maximum Iterations — Clustering stops after this number of iterations, even if the convergence criterion has not been met. The default value is 50.
- Remove Null Values (using Listwise deletion) — This option excludes from processing every row that contains a null value in any input column. It is enabled by default. Fast K-Means always performs Listwise deletion; for that algorithm it cannot be turned off.
- Include Variable Importance Evaluation reports — The report shows the resulting log-likelihood when each variable is dropped, in turn, from the clustering calculations. The most important variable is listed with the most negative log-likelihood value, and the least important variable with the least negative value. This option is available only with the Gaussian clustering algorithm.
- Cluster Definitions Database and Table — Applies only to the Fast K-Means algorithm. This table holds the model information and is used when continuing a previous run or when scoring. An option is also provided to Advertise Output with an optional Advertise Note.
- Generate SQL Only — Applies only to the Fast K-Means algorithm. When checked, this option generates the SQL CALL statement for the external stored procedure td_analyze but does not execute it. The generated SQL can be viewed on the Results > SQL tab.
- Continue Execution (instead of starting over) — Results from the previous execution are used as seed values to start clustering.
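
The stopping rules and null handling described in the list above can be illustrated with a minimal Python sketch. This is not the SQL that Teradata Warehouse Miner generates; every function name here is hypothetical, the sample values are made up for illustration, and the Fast K-Means threshold is left out because its formula is not given on this page. Passing in existing centroids loosely mirrors the idea of seeding a run with earlier results.

```python
import math

DEFAULT_CONVERGENCE = 0.001   # default Convergence Criterion stated above
DEFAULT_MAX_ITERATIONS = 50   # default Maximum Iterations stated above


def listwise_delete(rows):
    """Drop every row that has a null (None) in any input column."""
    return [row for row in rows if all(value is not None for value in row)]


def mixture_converged(prev_loglik, curr_loglik, criterion=DEFAULT_CONVERGENCE):
    """Mixture-model rule: stop once the log-likelihood gain is below the criterion."""
    return (curr_loglik - prev_loglik) < criterion


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(rows, seed_centroids, max_iterations=DEFAULT_MAX_ITERATIONS):
    """Generic K-Means: stop when no row changes cluster, or at the iteration cap."""
    centroids = list(seed_centroids)  # copy so the caller's seeds are not modified
    assignment = None
    for iteration in range(1, max_iterations + 1):
        new_assignment = [nearest(row, centroids) for row in rows]
        if new_assignment == assignment:
            return centroids, assignment, iteration  # assignments unchanged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [row for row, a in zip(rows, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment, max_iterations


def rank_by_importance(loglik_when_dropped):
    """Order variables from most to least important (most negative value first)."""
    return sorted(loglik_when_dropped, key=loglik_when_dropped.get)


# Illustrative input only: one row has a null and is removed before clustering.
rows = [(1.0, 1.0), (1.2, 0.8), (None, 3.0), (8.0, 8.0), (8.2, 7.9)]
clean = listwise_delete(rows)
centroids, assignment, iterations = kmeans(clean, [(0.0, 0.0), (10.0, 10.0)])
print(assignment, iterations)
```

In this sketch the second pass reproduces the first pass's assignments, so the loop ends well before the 50-iteration cap. A mixture model would instead compare successive log-likelihood values with `mixture_converged`.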