Purpose
After a model is built with the Fast K-Means Clustering algorithm, new data is scored with Fast K-Means Cluster Scoring. The first parameter is the function name, KmeansScore, followed by the cluster scoring parameters.
Fast K-Means Cluster Scoring returns one or two result sets. The first is a progress report with two columns: a timestamp and a progress message. The second is returned only if the samplescoresize parameter is set. It contains a sample of the rows in the output score table; the number of rows is determined by the value of the samplescoresize parameter.
Syntax
call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=col1,col2,col3;outscoredatabase=twm;outscoretable=table;keycolumns=col;inclusterdatabase=database;inclustertable=table;kvalue=number;Optional Parameters;');
Required Parameters
- columns
- The input columns used in clustering. The columns must reside in the table named by the tablename parameter, in the database named by the database parameter.
- database
- The database containing the input table.
- inclusterdatabase
- The database containing the table that represents the cluster model to score.
- inclustertable
- The name of the input table containing the cluster model to score.
- keycolumns
- The names of one or more columns in the input table to use as the primary index of the scored output table.
- kvalue
- The number of clusters in the cluster model.
- outscoredatabase
- The database containing the resulting scored output table.
- outscoretable
- The name of the scored output table to build.
- tablename
- The name of the table containing the data to cluster.
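As an illustration, the following is a minimal sketch of a call that supplies only the required parameters listed above. It assumes the td_analyze function is installed in a database named twm, and it reuses the table and column names from the Example section of this document; substitute your own names as needed.

```sql
-- Sketch: score twm_source.twm_customer_analysis with required parameters only.
-- All database, table, and column names are taken from the Example section.
call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=avg_cc_bal,avg_ck_bal,avg_sv_bal;outscoredatabase=twm;outscoretable=cust_analysis_data;keycolumns=cust_id;inclusterdatabase=twm;inclustertable=cust_analysis_clusters;kvalue=3;');
```

Because samplescoresize is not set in this call, only the progress report result set is expected to be returned.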
Optional Parameters
- clustername
- The name of the column representing the cluster identifier. The default is clusterid.
- fallback
- An optional flag that, when set to true, gives the scored output table the fallback attribute (that is, a mirrored copy).
- operatordatabase
- The database where the table operators called by td_analyze reside. If not specified, the database software searches the standard search path for table operators, including the current user database.
- overwrite
- When overwrite is set to true (default), the output tables are dropped before creating new ones.
- retaincolumns
- A comma-separated list naming columns to include in the scored output table unchanged from their names and values in the input table to be scored.
- samplescoresize
- The optional number of rows of the output score table to display as a result set.
Example
This example assumes the td_analyze function is installed in a database named twm.
The model stored in table twm.cust_analysis_clusters is used to score the twm_customer_analysis table, producing the score table twm.cust_analysis_data. Several optional parameters are specified, including operatordatabase, samplescoresize, retaincolumns, clustername, and fallback.
call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=avg_cc_bal,avg_ck_bal,avg_sv_bal;outscoredatabase=twm;outscoretable=cust_analysis_data;keycolumns=cust_id;inclusterdatabase=twm;inclustertable=cust_analysis_clusters;kvalue=3;operatordatabase=twm;samplescoresize=10;retaincolumns=city_name,state_code;clustername=mycluster;fallback=true;');
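Once the call above completes, the score table can be inspected with ordinary SQL. The query below is a sketch rather than part of the documented interface: it assumes that the output table twm.cust_analysis_data stores the cluster identifier in the column named by the clustername parameter (mycluster in this example).

```sql
-- Sketch: count the scored rows assigned to each cluster.
-- Assumes the cluster identifier column is named mycluster,
-- as set by clustername=mycluster in the example call.
select mycluster, count(*) as row_count
from twm.cust_analysis_data
group by mycluster
order by mycluster;
```

With kvalue=3, a result of up to three rows would be expected, one per cluster that received at least one scored row.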