Purpose
After building a model using the Fast K-Means Clustering algorithm, new data can be scored using Fast K-Means Cluster Scoring. The first argument to td_analyze is the function name KmeansScore; the second is a string of cluster scoring parameters.
Fast K-Means Cluster Scoring returns one or two result sets. The first is a progress report with two columns, a timestamp and a progress message. The second is returned only if the samplescoresize parameter is set. It contains a sample of the rows in the output score table; the number of rows is determined by the value of samplescoresize.
Syntax
call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=col1,col2,col3;outscoredatabase=twm;outscoretable=table;keycolumns=col;inclusterdatabase=database;inclustertable=table;kvalue=number;Optional Parameters;');
Required Parameters
- columns
- The input columns used in clustering. The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.
- database
- The database containing the input table.
- inclusterdatabase
- The database that contains the table that represents the cluster model to be scored.
- inclustertable
- The name of the input table that contains the cluster model to be scored.
- keycolumns
- The names of one or more columns in the input table to be used as the primary index of the scored output table.
- kvalue
- The number of clusters to be contained in the cluster model.
- outscoredatabase
- The database that will contain the resulting scored output table.
- outscoretable
- The name of the scored output table to be built.
- tablename
- The name of the table containing the data that is to be clustered.
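The parameter string is a sequence of name=value pairs, each ended by a semicolon, with comma-separated lists for multi-column values. The following is a minimal Python sketch of how such a string could be assembled and checked for the required parameters before the call is issued. The helper build_kmeans_score_call and the REQUIRED tuple are hypothetical names introduced here for illustration; they are not part of td_analyze, and the sketch only builds the statement text without connecting to a database.

```python
# Hypothetical helper, not part of td_analyze: builds the CALL statement text.
# The required parameter names are taken from the list above.
REQUIRED = (
    "database", "tablename", "columns", "outscoredatabase", "outscoretable",
    "keycolumns", "inclusterdatabase", "inclustertable", "kvalue",
)


def build_kmeans_score_call(params, install_db="twm"):
    """Return the text of a KmeansScore call, or raise if a required
    parameter is missing. No quoting or escaping of values is attempted."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    parts = []
    for name, value in params.items():
        # Multi-valued parameters (columns, keycolumns) are comma separated.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"{name}={value};")

    param_string = "".join(parts)
    return f"call {install_db}.td_analyze('KmeansScore','{param_string}');"


# Values borrowed from the example in this section.
call = build_kmeans_score_call({
    "database": "twm_source",
    "tablename": "twm_customer_analysis",
    "columns": ["avg_cc_bal", "avg_ck_bal", "avg_sv_bal"],
    "outscoredatabase": "twm",
    "outscoretable": "cust_analysis_data",
    "keycolumns": ["cust_id"],
    "inclusterdatabase": "twm",
    "inclustertable": "cust_analysis_clusters",
    "kvalue": 3,
})
print(call)
```

Optional parameters can be added to the same dictionary; the sketch passes any extra names through unchanged and does not validate them.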
Optional Parameters
- clustername
- The name of the column representing the cluster identifier. The default value is clusterid.
- fallback
- An optional flag. When set to true, the scored output table is created with the fallback attribute (that is, with a mirrored copy).
- operatordatabase
- The database where the tda_kmeans table operator called by td_analyze resides. If not specified, the database software searches the standard search path for table operators, including the current user database.
- retaincolumns
- A comma-separated list of columns to be included in the scored output table, with their names and values unchanged from the input table to be scored.
- samplescoresize
- The optional number of rows of the output score table to be displayed as a result set.
Example
The following example assumes that the td_analyze function has been installed in a database named twm.
The model stored in table twm.cust_analysis_clusters, built previously with Fast K-Means Clustering, is used to score the twm_customer_analysis table, producing score table twm.cust_analysis_data. Several optional parameters are specified, including samplescoresize, retaincolumns, clustername, and fallback.
call twm.td_analyze('KmeansScore','database=twm_source;tablename=twm_customer_analysis;columns=avg_cc_bal,avg_ck_bal,avg_sv_bal;outscoredatabase=twm;outscoretable=cust_analysis_data;keycolumns=cust_id;inclusterdatabase=twm;inclustertable=cust_analysis_clusters;kvalue=3;operatordatabase=twm;samplescoresize=10;retaincolumns=city_name,state_code;clustername=mycluster;fallback=true;');
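When reviewing a long call like the one above, it can help to split the parameter string back into its name=value pairs and see which optional parameters are in use. The following is a minimal Python sketch of that idea. The function parse_parameters and the OPTIONAL set are hypothetical names introduced for illustration only; the sketch assumes that values contain no semicolons.

```python
# Hypothetical review helper, not part of td_analyze.
# Optional parameter names are taken from the Optional Parameters list.
OPTIONAL = {
    "clustername", "fallback", "operatordatabase",
    "retaincolumns", "samplescoresize",
}


def parse_parameters(param_string):
    """Split 'name=value;name=value;' into a dict, preserving order."""
    params = {}
    for item in param_string.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the final semicolon
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        params[name.strip()] = value.strip()
    return params


# The parameter string from the example call above.
example = (
    "database=twm_source;tablename=twm_customer_analysis;"
    "columns=avg_cc_bal,avg_ck_bal,avg_sv_bal;outscoredatabase=twm;"
    "outscoretable=cust_analysis_data;keycolumns=cust_id;"
    "inclusterdatabase=twm;inclustertable=cust_analysis_clusters;kvalue=3;"
    "operatordatabase=twm;samplescoresize=10;"
    "retaincolumns=city_name,state_code;clustername=mycluster;fallback=true;"
)

params = parse_parameters(example)
optional_used = sorted(name for name in params if name in OPTIONAL)
print(optional_used)
```

For the example call, this lists all five optional parameters, including operatordatabase.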