The OutputTable schema depends on the OutputFormat syntax element.
OutputTable Schema, OutputFormat ('sparse') (Default)
The table has D+3 columns, where D is the number of dimensions of the input data.
Column | Data Type | Description |
---|---|---|
accumulate_column | Same as in input table | [Column appears once for each specified accumulate_column.] Column copied from InputTable. Typically, one accumulate_column contains the unique data point identifier. |
data_point | Any numeric SQL data type | [Column appears D times.] Input data point, copied from InputTable. |
cluster_rank | INTEGER | Cluster rank, from most probable to least probable. The most probable cluster has rank 1, the next most probable has rank 2, and so on. |
cluster_id | INTEGER | Cluster identification number. |
prob | DOUBLE PRECISION | Probability that the data point belongs to the cluster identified by cluster_id. |
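
As a rough illustration of how the sparse format can be consumed, the following Python sketch picks the most probable cluster for each data point by keeping only the rows with cluster_rank 1. It assumes the sparse format emits one row per data point and ranked cluster. The accumulated `id` column, the data columns `x1` and `x2` (D = 2), and all values are hypothetical and not taken from any real output.

```python
# Hypothetical sparse-format rows for two 2-D data points (D = 2).
# "id" stands in for an accumulate_column; x1/x2 and all values are made up.
sparse_rows = [
    {"id": 1, "x1": 0.5, "x2": 1.2, "cluster_rank": 1, "cluster_id": 0, "prob": 0.9},
    {"id": 1, "x1": 0.5, "x2": 1.2, "cluster_rank": 2, "cluster_id": 1, "prob": 0.1},
    {"id": 2, "x1": 3.1, "x2": 0.4, "cluster_rank": 1, "cluster_id": 1, "prob": 0.7},
    {"id": 2, "x1": 3.1, "x2": 0.4, "cluster_rank": 2, "cluster_id": 0, "prob": 0.3},
]


def most_probable_cluster(rows):
    """Map each id to the (cluster_id, prob) pair of its rank-1 row."""
    return {
        row["id"]: (row["cluster_id"], row["prob"])
        for row in rows
        if row["cluster_rank"] == 1
    }


best = most_probable_cluster(sparse_rows)
```

Each row in this sketch carries D + 3 = 5 schema columns plus the one accumulated `id` column.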
OutputTable Schema, OutputFormat ('dense')
The table has D+2n columns, where D is the number of dimensions of the input data and n is the number of cluster weights that the function outputs (the value of the TopK syntax element).
Column | Data Type | Description |
---|---|---|
accumulate_column | Same as in input table | [Column appears once for each specified accumulate_column.] Column copied from InputTable. Typically, one accumulate_column contains the unique data point identifier. |
id | Any | Data point identifier. |
data_point | Any numeric SQL data type | [Column appears D times.] Input data point, copied from InputTable. |
cluster_id_i | INTEGER | [Column appears n times.] cluster_id_1 is the most probable cluster for the observation, cluster_id_2 is the next most probable, and so on. |
prob_i | DOUBLE PRECISION | [Column appears n times.] Probability that observation belongs to cluster_id_i. |
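
To show how the two formats relate, the Python sketch below pivots sparse-style rows (one row per data point and ranked cluster) into dense-style rows (one row per data point, with cluster_id_i and prob_i columns). This is only an illustration of the column layout, not how the function itself produces its output. The helper `sparse_to_dense`, the `id`, `x1`, and `x2` column names, and all values are hypothetical.

```python
from collections import defaultdict


def sparse_to_dense(rows, id_col, data_cols, n):
    """Pivot sparse-style rows into one dense-style row per data point.

    rows      -- dicts with id_col, data_cols, cluster_rank, cluster_id, prob
    id_col    -- name of the column that identifies a data point (assumed)
    data_cols -- names of the D data point columns (assumed)
    n         -- number of ranked clusters to keep per data point
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_col]].append(row)

    dense = []
    for key, group in grouped.items():
        # Order each point's rows from most to least probable cluster.
        ranked = sorted(group, key=lambda r: r["cluster_rank"])
        out = {id_col: key}
        out.update({col: ranked[0][col] for col in data_cols})
        for i, r in enumerate(ranked[:n], start=1):
            out[f"cluster_id_{i}"] = r["cluster_id"]
            out[f"prob_{i}"] = r["prob"]
        dense.append(out)
    return dense


# Made-up sparse-style rows for a single 2-D data point (D = 2).
example_rows = [
    {"id": 1, "x1": 0.5, "x2": 1.2, "cluster_rank": 2, "cluster_id": 1, "prob": 0.1},
    {"id": 1, "x1": 0.5, "x2": 1.2, "cluster_rank": 1, "cluster_id": 0, "prob": 0.9},
]
dense_rows = sparse_to_dense(example_rows, "id", ["x1", "x2"], n=2)
```

With D = 2 and n = 2, each dense row in this sketch has D + 2n = 6 data and cluster columns, plus the identifier column.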