The Single_Tree_Drive function outputs console messages, a model table, and (optionally) an intermediate splits table and final response table. The following table shows the schema of the message table.
Column | Data Type | Description |
---|---|---|
message | VARCHAR | Console message. |
The model table has a row for each node in the model (the single decision tree that the function creates). The name of the model table is specified by the OutputTableName argument. The following table shows the schema of the model table.
Column | Data Type | Description |
---|---|---|
node_id | INTEGER | Node identifier. |
node_size | INTEGER | Number of objects in the node. |
node_gini[(p)] | DOUBLE PRECISION | GINI impurity value for the information in the node. If you specify ImpurityMeasurement('gini'), the column name is node_gini(p); otherwise, it is node_gini. |
node_entropy[(p)] | DOUBLE PRECISION | Entropy impurity value for the information in the node. If you specify ImpurityMeasurement('entropy'), the column name is node_entropy(p); otherwise, it is node_entropy. |
node_chisq_pv[(p)] | DOUBLE PRECISION | Chi-square impurity value for the information in the node. If you specify ImpurityMeasurement('chisquare'), the column name is node_chisq_pv(p); otherwise, it is node_chisq_pv. |
node_label | VARCHAR | Output category for the node. |
node_majorvotes | INTEGER | Number of objects that belong to the category identified by node_label. |
split_value | DOUBLE PRECISION | Numerical split value. |
split_gini[(p)] | DOUBLE PRECISION | GINI impurity measurement for the information in the node after splitting. If you specify ImpurityMeasurement('gini'), the column name is split_gini(p); otherwise, it is split_gini. |
split_entropy[(p)] | DOUBLE PRECISION | Entropy impurity measurement for the information in the node after splitting. If you specify ImpurityMeasurement('entropy'), the column name is split_entropy(p); otherwise, it is split_entropy. |
split_chisq_pv[(p)] | DOUBLE PRECISION | Chi-square impurity measurement for the information in the node after splitting. If you specify ImpurityMeasurement('chisquare'), the column name is split_chisq_pv(p); otherwise, it is split_chisq_pv. |
left_id | INTEGER | Identifier of the left child of the node. |
left_size | INTEGER | Number of objects in the left child of the node. |
left_label | VARCHAR | Output category for the left child of the node. |
left_majorvotes | INTEGER | Number of objects that belong to the category identified by left_label. |
right_id | INTEGER | Identifier of the right child of the node. |
right_size | INTEGER | Number of objects in the right child of the node. |
right_label | VARCHAR | Output category for the right child of the node. |
right_majorvotes | INTEGER | Number of objects that belong to the category identified by right_label. |
left_bucket | VARCHAR | If the split attribute is categorical, the attribute value assigned to the left child of the node. |
right_bucket | VARCHAR | If the split attribute is categorical, the attribute value assigned to the right child of the node. |
left_label_probdist | VARCHAR | Probability of each label for the left child of the node. This column appears only if OutputResponseProbDist has the value 'true'. |
right_label_probdist | VARCHAR | Probability of each label for the right child of the node. This column appears only if OutputResponseProbDist has the value 'true'. |
prob_label_order | VARCHAR | Order of the labels in the probability distributions for the left and right children of the node. This column appears only if OutputResponseProbDist has the value 'true'. |
attribute | VARCHAR | Split attribute. |
node_majorfreq | DOUBLE PRECISION | Weighted count of objects that belong to the category identified by node_label. This column appears only if the Weighted argument is 'true'. |
left_majorfreq | DOUBLE PRECISION | Weighted count of objects that belong to the category identified by left_label. This column appears only if the Weighted argument is 'true'. |
right_majorfreq | DOUBLE PRECISION | Weighted count of objects that belong to the category identified by right_label. This column appears only if the Weighted argument is 'true'. |
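
The "(p)" naming rule for the impurity columns can be easy to misread, so the following Python sketch spells it out. It is only an illustration of the rule stated in the table above: the helper name `impurity_column_names` and the `IMPURITY_STEMS` mapping are hypothetical and are not part of the function's interface.

```python
# Hypothetical helper: maps each ImpurityMeasurement value to the column
# stem used in the model table, as listed in the schema above.
IMPURITY_STEMS = {
    "gini": "gini",
    "entropy": "entropy",
    "chisquare": "chisq_pv",
}


def impurity_column_names(impurity_measurement):
    """Return the node_* and split_* impurity column names.

    The pair of columns that matches the chosen measurement gets the
    "(p)" suffix; the other columns keep their plain names.
    """
    if impurity_measurement not in IMPURITY_STEMS:
        raise ValueError(f"unknown impurity measurement: {impurity_measurement!r}")
    names = []
    for prefix in ("node", "split"):
        for measure, stem in IMPURITY_STEMS.items():
            suffix = "(p)" if measure == impurity_measurement else ""
            names.append(f"{prefix}_{stem}{suffix}")
    return names


print(impurity_column_names("entropy"))
```

When reading the model table from client code, a lookup like this avoids hard-coding a column name that changes with the ImpurityMeasurement argument.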
The following table describes the intermediate splits table. The name of the intermediate splits table is specified by the MaterializedSplitsTableWithName argument.
Column | Data Type | Description |
---|---|---|
attribute | VARCHAR | Attribute name (from the attribute table, Input). For each attribute, the table has the number of rows specified by the MaxDepth argument. |
percentile | INTEGER | Percentile at which the split value occurs. For example, if attribute A has 100 different values, then percentile = 10 and value = 1 mean that the 10th value of attribute A (100 × 10% = 10) is 1, and 1 is the split value. |
value | NUMERIC, INTEGER, BIGINT, or DOUBLE PRECISION | Split value (from the attribute table, Input). |
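
The relationship between percentile and value can be sketched in Python as follows. This is a minimal illustration of the example in the table, not the function's actual algorithm: the helper `value_at_percentile` is hypothetical, and rounding the position up is an assumption, because the source does not state how fractional positions are handled.

```python
def value_at_percentile(sorted_values, percentile):
    """Return the value at 1-based position len(sorted_values) * percentile / 100.

    Assumption: fractional positions are rounded up, and the position is
    never smaller than 1. The source does not specify the rounding rule.
    """
    n = len(sorted_values)
    position = (n * percentile + 99) // 100  # integer ceiling of n * percentile / 100
    position = max(position, 1)
    return sorted_values[position - 1]


# Attribute A with 100 distinct values, chosen so that the 10th value is 1.
attribute_a = list(range(-8, 92))

# Rows shaped like the intermediate splits table: (attribute, percentile, value).
rows = [("A", p, value_at_percentile(attribute_a, p)) for p in (10, 50, 90)]
print(rows[0])  # ('A', 10, 1)
```

With 100 values, percentile 10 selects the 10th value, which matches the worked example in the percentile column description.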
The following table describes the output response table. The name of the output response table is specified by the SaveFinalResponseTableTo argument.
Column | Data Type | Description |
---|---|---|
node_id | INTEGER | Node identifier. |
pid | Any | Data point identifier. |
response | NUMERIC, INTEGER, BIGINT, or DOUBLE PRECISION | Response value for the data point. |