The Fast K-Means option provides a dramatic performance improvement over the standard K-Means option. When it is selected, the options on the analysis parameters tab change and the options on the expert options tab are not available. With Fast K-Means, the options are as follows:
- The Number of Clusters, Convergence Criterion and Maximum Iterations are provided as before.
- The option to remove null values using list-wise deletion is not offered; list-wise deletion is performed automatically.
- The Variable Importance Evaluation Reports are not offered.
- You supply the Cluster Definitions Database and Table names. This table stores the model, which the scoring module processes. It can also be used to continue execution starting from the cluster definitions in the table, rather than from random starting clusters.
- An Advertise Option is provided for the Cluster Definitions table.
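To make the parameters above concrete, the following is a minimal, generic K-Means sketch in Python. It is not the Fast K-Means implementation and does not call any Teradata API. The function names (`listwise_delete`, `kmeans`), the `start` argument, and the reading of the Convergence Criterion as a threshold on centroid movement are all assumptions made for illustration. It shows list-wise deletion of rows with nulls, the iteration limit, and continuing from previously stored cluster definitions instead of random starting clusters.

```python
import math
import random


def listwise_delete(rows):
    """Drop every row that contains at least one null (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


def kmeans(rows, k, convergence=1e-4, max_iterations=50, start=None, seed=0):
    """Generic K-Means sketch (hypothetical helper, not the product's code).

    convergence: assumed here to be the largest centroid movement allowed
                 before the run is considered converged.
    start:       optional list of centroids, e.g. read back from a stored
                 cluster definitions table; if given, k is ignored.
    """
    rows = listwise_delete(rows)  # nulls are always removed
    if start is not None:
        centroids = [list(c) for c in start]  # continue from saved clusters
    else:
        centroids = [list(r) for r in random.Random(seed).sample(rows, k)]

    for iteration in range(1, max_iterations + 1):
        # Assignment step: attach each row to its nearest centroid.
        members = [[] for _ in centroids]
        for row in rows:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(row, centroids[i]),
            )
            members[nearest].append(row)

        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(group) for col in zip(*group)] if group else old
            for group, old in zip(members, centroids)
        ]

        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= convergence:
            break

    return centroids, iteration
```

In this sketch, passing the centroids returned by an earlier run as `start` plays the role of continuing execution from an existing cluster definitions table: a run that starts from already converged centroids stops after a single iteration.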
The Fast K-Means algorithm creates an output table that is structured differently from the output of the other clustering algorithms. The output table is converted into the style used by the other algorithms, so that it can be viewed as a report and graphed in the usual manner. If the conversion is not possible, you can view the cluster definitions in the new style as data, along with the progress report.
Install the td_analyze external stored procedure, and the tda_kmeans table operator that it calls, in the database where the TWM metadata tables reside. To do so, use the Install or Uninstall UDFs option under the Teradata Warehouse Miner start program item and select the option to Install TD_Analyze UDFs.