5.4.5 - Cluster Scoring - RESULTS - Data - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-04
  1. On the Cluster Scoring dialog box, click RESULTS.
  2. Click Data. (The RESULTS tab is disabled until the analysis has completed.)
    Cluster Scoring > Results > Data

    Results data, if any, is displayed in a data grid.

    If a table was created, a sample of its rows is displayed here. The sample size is set by Maximum result rows to display in Tools > Preferences > Limits.

    The Cluster Scoring analysis builds the following table in the requested Output Database. The options selected affect the structure of the table. The columns in bold below comprise the Unique Primary Index (UPI). There may also be repeated groups of columns, and some columns are generated only if specific options are selected.

    Output Database (Built by the Cluster Scoring analysis)
    Name | Type | Definition
    Key | User Defined | One or more unique-key columns from the table to be scored (i.e., in Selected Tables); these default to that table's index columns. The data type defaults to that of the scored table, but can be changed via Primary Index Columns and Types.
    Probability (Default) | FLOAT | The probability that an observation (row) belongs to each cluster, generated if the Include Cluster Probability Scores option is selected. One column is created per cluster; each is named by appending the cluster number, starting at 1, to the prefix entered in the Column Prefix option. By default, the Column Prefix is p, so the columns p1, p2, p3, etc., are generated. These columns may be excluded by not selecting the Include Cluster Probability Scores option, but in that case the cluster membership number must be included.
    Cluster Number (Default) | INTEGER | The cluster number to which an observation (row) belongs. This column may be excluded by not selecting the Include Cluster Membership option, but in that case the cluster probability scores must be included (see above). The column name defaults to Cluster Number, but can be overridden by entering another value in Column Name under the Include Cluster Membership option. The name entered cannot match any column, including the index columns, in the table being scored.
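
The naming and option rules in the table above can be sketched in Python as a quick way to predict which columns a score table will contain. This is an illustration only, not part of Teradata Warehouse Miner: the function name, its parameters, the column order (taken from the table's row order), and the case-insensitive name check are all assumptions.

```python
def score_table_columns(key_columns, num_clusters, scored_table_columns=(),
                        include_probabilities=True, include_membership=True,
                        column_prefix="p", membership_name="Cluster Number"):
    """Hypothetical helper: list the expected columns of a score table."""
    # At least one of the two output options must be selected.
    if not (include_probabilities or include_membership):
        raise ValueError("Select probability scores, cluster membership, or both")

    columns = list(key_columns)

    if include_probabilities:
        # One column per cluster: prefix + cluster number, starting at 1.
        columns += [f"{column_prefix}{i}" for i in range(1, num_clusters + 1)]

    if include_membership:
        # The membership name must not match any column of the scored table.
        # Assumption: names are compared case-insensitively.
        taken = {c.casefold() for c in (*key_columns, *scored_table_columns)}
        if membership_name.casefold() in taken:
            raise ValueError(f"Column name already in use: {membership_name}")
        columns.append(membership_name)

    return columns


print(score_table_columns(["cust_id"], 3))
```

With the defaults, a model with three clusters and a single key column named cust_id (a made-up name) would give the key column, then p1, p2, p3, then Cluster Number.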

    When scoring a Fast K-Means model, the score table differs from that shown above. The first column identifies the cluster and has the default name Cluster Number. The next columns are the Index columns and the last columns are the Retain columns, if any. A Fast K-Means score table does not contain probability columns.
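
The Fast K-Means column order described above can be written out as a short Python sketch. The function and parameter names are hypothetical and only mirror the description: cluster column first, then Index columns, then any Retain columns, with no probability columns.

```python
def fast_kmeans_score_columns(index_columns, retain_columns=(),
                              cluster_column="Cluster Number"):
    """Hypothetical helper: column order of a Fast K-Means score table."""
    # Cluster identifier first, then Index columns, then Retain columns.
    return [cluster_column, *index_columns, *retain_columns]


print(fast_kmeans_score_columns(["cust_id"], ["region", "income"]))
```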