Output Table Schema
| Column | Data Type | Description |
|---|---|---|
| task_index | SMALLINT | Identifier of AMP that produced a boosting tree. |
| tree_num | SMALLINT | Identifier of boosted tree. The number of unique tree_num values depends on the NumBoostedTrees syntax element value and the number of AMPs. |
| iter | SMALLINT | Iteration (boosting round) number. |
| class_num | SMALLINT | Index of the class to predict. Appears only in classification. For LossFunction('softmax'), the default, the number of unique class_num values equals the number of class labels in the data set: for K class labels, class_num values are the integers in the range [0, K-1]. For LossFunction('binomial'), there is only one class_num value. |
| tree_order | SMALLINT | Identifier of a complete JSON order of the regression_tree/classification_tree column. |
| regression_tree / classification_tree | VARCHAR | JSON representation of decision tree. The maximum length is 32000. For JSON types that can appear in the representation, see the table JSON Types in JSON Representation of Boosting Tree. |
Out Table [Records Training Accuracy over Iterations] Schema
| Column Name | Data Type | Description |
|---|---|---|
| task_index | SMALLINT | Identifier of AMP that produced a boosting tree. |
| tree_num | SMALLINT | Identifier of boosted tree. The number of unique tree_num values depends on the NumBoostedTrees syntax element value and the number of AMPs. |
| iter | SMALLINT | Iteration (boosting round) number. |
| mse (regression) / accuracy (classification) | Double | Regression MSE (mean squared error) measures the difference between the original and predicted values, obtained by averaging the squared difference over the data set. Classification accuracy summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions. |
| average residuals (regression) / deviance (classification) | Double | Regression average residuals is the average difference between the observed values of the dependent variable (y) and the predicted values (ŷ). Classification deviance measures the difference in fit between a candidate model and the saturated model. |
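
The regression and accuracy metrics described above can be illustrated with a minimal sketch. It assumes MSE is the mean of squared differences, that the average residual is the signed mean of (observed minus predicted), and that accuracy is the fraction of matching labels. The function names are hypothetical and are not part of the function's interface. Deviance is omitted because this section does not give its formula.

```python
def mean_squared_error(observed, predicted):
    """Mean of squared differences between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / len(observed)


def average_residual(observed, predicted):
    """Signed mean of (observed - predicted); the sign convention is an assumption."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(y - y_hat for y, y_hat in zip(observed, predicted)) / len(observed)


def accuracy(labels, predicted_labels):
    """Number of correct predictions divided by the total number of predictions."""
    if not labels or len(labels) != len(predicted_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for y, y_hat in zip(labels, predicted_labels) if y == y_hat)
    return correct / len(labels)
```
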
JSON Types in JSON Representation of Boosting Tree
| JSON Type | Description |
|---|---|
| id_ | Node identifier. |
| sum_ | Appears only for regression trees. Sum of values of response variable at node identified by id. |
| sumSq_ | Appears only for regression trees. Sum of squared values of response variable at node identified by id. |
| responseCounts_ | Appears only for classification trees. The number of observations in each class at a node, identified by id. |
| size_ | The total number of observations at a node, identified by id. |
| maxDepth_ | Maximum possible depth of the subtree that starts at the node identified by id. For the root node, the value is max_depth; for leaf nodes, 0. |
| split_ | Start of JSON item describing a split at node identified by id. |
| splitValue_ | The attribute value used for splitting a tree node. |
| score_ | Gini score of the node identified by id. |
| attr_ | Attribute (predictor) on which algorithm split at node identified by id. |
| type_ | Type of tree and split. Possible values: |
| leftNodeSize_ | The number of observations assigned to the left node in the split. |
| rightNodeSize_ | The number of observations assigned to the right node in the split. |
| leftChild_ | Start of JSON item describing left child of a node, identified by id. |
| rightChild_ | Start of JSON item describing right child of a node, identified by id. |
| nodeType_ | Type of a node identified by id. Possible values: |
| prediction_ | The predicted value for the region represented by a leaf node. |
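
As an illustration of how the JSON types above fit together, the following sketch walks a small tree and collects its leaf predictions and split attributes. The key names come from the table. The nesting (split fields inside `split_`, children as nested objects), the sample values, and the attribute name `age` are assumptions made for this example and are not taken from real function output.

```python
import json

# Hypothetical tree: key names follow the table, structure and values are assumed.
SAMPLE_TREE = json.loads("""
{
  "id_": 1, "size_": 10, "maxDepth_": 1,
  "split_": {"attr_": "age", "splitValue_": 30.5,
             "leftNodeSize_": 4, "rightNodeSize_": 6},
  "leftChild_": {"id_": 2, "size_": 4, "maxDepth_": 0, "prediction_": -0.25},
  "rightChild_": {"id_": 3, "size_": 6, "maxDepth_": 0, "prediction_": 0.4}
}
""")


def iter_nodes(node):
    """Yield the node and all of its descendants, left subtree first."""
    yield node
    for key in ("leftChild_", "rightChild_"):
        child = node.get(key)
        if child is not None:
            yield from iter_nodes(child)


def leaf_predictions(tree):
    """Map id_ to prediction_ for every node that carries a prediction_."""
    return {n["id_"]: n["prediction_"] for n in iter_nodes(tree) if "prediction_" in n}


def split_attributes(tree):
    """List attr_ for every node that has a split_ item."""
    return [n["split_"]["attr_"] for n in iter_nodes(tree) if "split_" in n]
```

With the sample tree, `leaf_predictions(SAMPLE_TREE)` returns the predictions of nodes 2 and 3, and `split_attributes(SAMPLE_TREE)` returns the single split attribute of the root.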