Cluster Scoring - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Teradata Warehouse Miner
February 2018
English (United States)

Scoring a table assigns each row to a cluster. In the Gaussian Mixture model, the “maximum probability rule” assigns each row to the cluster for which its conditional probability is largest. The model also gives the row a relative probability for each cluster, so a soft assignment of the row to more than one cluster can be obtained as well.
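The source does not spell out the formula, so the following is only a sketch of how the maximum probability rule is commonly written. It assumes a diagonal-covariance Gaussian Mixture with cluster centroids $\mu_k$, per-column variances $\sigma_{k}^{2}$, and mixing weights $\pi_k$; the symbols and the presence of mixing weights are assumptions, not notation taken from the guide.

```latex
% Relative (posterior) probability of cluster k for a row x (soft assignment)
p(k \mid x) = \frac{\pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^{2}\right)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(x \mid \mu_j, \sigma_j^{2}\right)},
\qquad k = 1, \dots, K

% Maximum probability rule (hard assignment)
\hat{k}(x) = \arg\max_{k} \; p(k \mid x)
```

The probabilities $p(k \mid x)$ sum to 1 across clusters, which is what allows them to be read as relative memberships.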

When scoring is requested, the selected table is scored against the centroids and variances from the selected Clustering analysis. After a single iteration, each row is assigned to one of the previously defined clusters, together with its probability of membership. The row-to-cluster assignment is based on the largest probability.

The Cluster Scoring analysis scores an input table that contains the same columns that were used in the selected Clustering analysis. This implicitly assumes that the input table is drawn from the same underlying population distribution as the data used to build the model. The table is scored against the centroids and variances obtained in that Clustering analysis, and only a single iteration is required to produce the new scored table.

After clusters have been identified by their centroids and variances, the scoring engine determines the cluster to which each row belongs. The Gaussian Mixture model permits membership in multiple clusters, so scoring reports the probability of membership in each cluster. In addition, the highest probability is used to assign the row to exactly one cluster. The resulting score table consists of the index (key) columns, followed by the membership probability for each cluster, followed by the assigned cluster number (the cluster with the highest probability of membership).
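The score table layout described above can be illustrated with a small Python sketch. It is not the product's implementation: the function name `score_rows`, the mixing weights, the diagonal-covariance assumption, and the sample centroids and rows are all hypothetical and chosen only to show the shape of the output (key, one probability per cluster, assigned cluster number).

```python
import math


def score_rows(rows, centroids, variances, weights):
    """Score each (key, values) row against fixed cluster parameters.

    Returns one tuple per row: (key, p_1, ..., p_K, assigned_cluster),
    where assigned_cluster is the 1-based index of the largest probability.
    Assumes a diagonal-covariance Gaussian per cluster (an assumption).
    """
    scored = []
    for key, values in rows:
        # Log of (weight * diagonal Gaussian density) for each cluster.
        log_terms = []
        for mu, var, w in zip(centroids, variances, weights):
            log_density = sum(
                -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                for x, m, v in zip(values, mu, var)
            )
            log_terms.append(math.log(w) + log_density)
        # Normalize with the log-sum-exp trick to avoid underflow.
        peak = max(log_terms)
        unnormalized = [math.exp(t - peak) for t in log_terms]
        total = sum(unnormalized)
        probs = [u / total for u in unnormalized]
        # Maximum probability rule: hard assignment to the most likely cluster.
        assigned = probs.index(max(probs)) + 1
        scored.append((key, *probs, assigned))
    return scored


# Hypothetical two-cluster model over two columns.
centroids = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.5]

# Hypothetical input rows: (key, column values).
rows = [(101, [0.2, -0.1]), (102, [4.8, 5.3]), (103, [2.5, 2.5])]

score_table = score_rows(rows, centroids, variances, weights)
for record in score_table:
    print(record)
```

Because the cluster parameters are fixed, each row is visited once, which matches the single-iteration behavior described for scoring. A row that lies equally close to both centroids, such as key 103 here, receives equal probabilities, and the tie is broken in favor of the lower cluster number in this sketch.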