Fast K-Means Cluster Scoring - Teradata Warehouse Miner

Purpose

After building a model using the Fast K-Means Clustering algorithm, new data can be scored using Fast K-Means Cluster Scoring. The first parameter for Fast K-Means Cluster Scoring is the KmeansScore function name, followed by cluster scoring parameters.

Fast K-Means Cluster Scoring returns one or two data sets that can be viewed as result sets. The first is a progress report with two columns: a timestamp and a progress message. The second is returned only if the samplescoresize parameter is set. It contains a sample of the rows in the output score table, with the number of rows determined by the value of the samplescoresize parameter.

Syntax

call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=col1,col2,col3;outscoredatabase=twm;outscoretable=table;keycolumns=col;inclusterdatabase=database;inclustertable=table;kvalue=number;Optional Parameters;');

Required Parameters

columns: The input columns used in clustering. The columns must reside in the table named by the tablename parameter, in the database named by the database parameter. For example: columns=column1,column2,column3
database: The database containing the input table.
inclusterdatabase: The database containing the table that represents the cluster model to score.
inclustertable: The name of the input table containing the cluster model to score.
keycolumns: The names of one or more columns in the input table to use as the primary index of the scored output table.
kvalue: The number of clusters to be contained in the cluster model.
outscoredatabase: The database containing the resulting scored output table.
outscoretable: The name of the scored output table to build.
tablename: The name of the table containing the data to cluster.
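
Because every parameter listed above must appear in the parameter string, it can help to check a parameter set on the client side before issuing the call. The following Python sketch shows one way to do that. The function name missing_required and the dictionary representation of the parameters are illustrative assumptions, not part of Teradata Warehouse Miner.

```python
# Required KmeansScore parameter names, as listed in this section.
REQUIRED = (
    "columns", "database", "inclusterdatabase", "inclustertable",
    "keycolumns", "kvalue", "outscoredatabase", "outscoretable", "tablename",
)


def missing_required(params):
    """Return a sorted list of required parameter names that are absent or empty.

    `params` is a hypothetical dict mapping parameter names to values.
    """
    missing = []
    for name in REQUIRED:
        value = params.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return sorted(missing)


# Example: a parameter set that omits kvalue and keycolumns.
incomplete = {
    "database": "twm_source",
    "tablename": "twm_customer_analysis",
    "columns": "avg_cc_bal,avg_ck_bal,avg_sv_bal",
    "outscoredatabase": "twm",
    "outscoretable": "cust_analysis_data",
    "inclusterdatabase": "twm",
    "inclustertable": "cust_analysis_clusters",
}
print(missing_required(incomplete))
```

The check only confirms that each name is present with a non-empty value. It does not verify that the named databases, tables, or columns exist.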

Optional Parameters

clustername: The name of the column representing the cluster identifier. The default is clusterid.
fallback: An optional flag indicating, when set to true, that the scored output table should have the fallback attribute (that is, have a mirrored copy).
operatordatabase: The database where the tda_kmeans table operator called by td_analyze resides. If not specified, the database software searches the standard search path for table operators, including the current user database. For example: operatordatabase=twm
overwrite: When overwrite is set to true (default), the output tables are dropped before creating new ones.
retaincolumns: A comma-separated list naming columns to include in the scored output table unchanged from their names and values in the input table to be scored.
samplescoresize: The optional number of rows of the output score table to display as a result set.

Example

This example assumes the td_analyze function is installed in a database named twm.

The cluster model in table cust_analysis_clusters is used to score the twm_customer_analysis table, producing the score table twm.cust_analysis_data. Various optional parameters are specified, including samplescoresize, retaincolumns, clustername, and fallback.

call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=avg_cc_bal,avg_ck_bal,avg_sv_bal;outscoredatabase=twm;outscoretable=cust_analysis_data;keycolumns=cust_id;inclusterdatabase=twm;inclustertable=cust_analysis_clusters;kvalue=3;operatordatabase=twm;samplescoresize=10;retaincolumns=city_name,state_code;clustername=mycluster;fallback=true;');
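
The parameter string in the call above is a sequence of name=value pairs, each terminated by a semicolon. When the call is generated from client code, the string can be assembled from a mapping. The Python sketch below reproduces the example call this way. The helper name build_kmeans_score_call and its function_database argument are hypothetical, and values are inserted verbatim with no quoting or escaping, so it is not suitable for untrusted input.

```python
def build_kmeans_score_call(params, function_database="twm"):
    """Assemble a td_analyze call string for KmeansScore from a dict.

    Booleans become true/false, lists and tuples become comma-separated
    values, and every pair is terminated with a semicolon.
    """
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        parts.append(f"{name}={value};")
    param_string = "".join(parts)
    return f"call {function_database}.td_analyze('KmeansScore','{param_string}');"


# Parameters from the example above, in the same order.
example_params = {
    "database": "twm_source",
    "tablename": "twm_customer_analysis",
    "columns": ["avg_cc_bal", "avg_ck_bal", "avg_sv_bal"],
    "outscoredatabase": "twm",
    "outscoretable": "cust_analysis_data",
    "keycolumns": "cust_id",
    "inclusterdatabase": "twm",
    "inclustertable": "cust_analysis_clusters",
    "kvalue": 3,
    "operatordatabase": "twm",
    "samplescoresize": 10,
    "retaincolumns": ["city_name", "state_code"],
    "clustername": "mycluster",
    "fallback": True,
}
print(build_kmeans_score_call(example_params))
```

The printed statement matches the example call character for character, because the dictionary preserves the order in which the parameters were inserted.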