Output Table Schema
Column | Data Type | Description |
---|---|---|
task_index | SMALLINT | Identifier of AMP that produced a boosting tree. |
tree_num | SMALLINT | Identifier of boosted tree. Number of unique tree_num values depends on NumBoostedTrees syntax element value and number of AMPs. |
iter | SMALLINT | Iteration (boosting round) number. |
class_num | SMALLINT | Index of the class to predict. Appears only in classification. For LossFunction('softmax'), the default, the number of unique class_num values equals the number of class labels in the data set; for K class labels, class_num values are the integers in the range [0, K-1]. For LossFunction('binomial'), there is only one class_num value. |
tree_order | SMALLINT | Identifier of a complete JSON order of the regression_tree/classification_tree column. |
regression_tree / classification_tree | VARCHAR | JSON representation of decision tree. For JSON types that can appear in the representation, see the following table. The maximum length is 32000. |
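The tree_order column and the 32000-character limit suggest that a long JSON tree may be stored as several ordered fragments. The following is a minimal Python sketch under that assumption, which the table above does not confirm: rows that share the same task_index, tree_num, iter, and class_num are sorted by tree_order, concatenated, and parsed. The function name `assemble_trees` and the dictionary-based row format are hypothetical.

```python
import json
from collections import defaultdict


def assemble_trees(rows, tree_col="regression_tree"):
    """Rebuild one JSON tree per (task_index, tree_num, iter, class_num).

    Assumption: a tree longer than the VARCHAR limit is split across rows,
    and tree_order gives the order of the fragments.
    """
    fragments = defaultdict(list)
    for row in rows:
        # class_num is absent for regression, so fall back to None.
        key = (row["task_index"], row["tree_num"], row["iter"], row.get("class_num"))
        fragments[key].append((row["tree_order"], row[tree_col]))

    trees = {}
    for key, parts in fragments.items():
        ordered = sorted(parts, key=lambda part: part[0])
        trees[key] = json.loads("".join(text for _, text in ordered))
    return trees


# Hypothetical rows: one tree stored as two fragments, listed out of order.
rows = [
    {"task_index": 0, "tree_num": 1, "iter": 1, "tree_order": 1,
     "regression_tree": '"size_": 10}'},
    {"task_index": 0, "tree_num": 1, "iter": 1, "tree_order": 0,
     "regression_tree": '{"id_": 1, '},
]
trees = assemble_trees(rows)
```

If every tree fits in a single row, the same function still works, because each group then holds exactly one fragment.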
Out Table [Records Training Accuracy over Iterations] Schema
Column Name | Data Type | Description |
---|---|---|
task_index | SMALLINT | Identifier of AMP that produced a boosting tree. |
tree_num | SMALLINT | Identifier of boosted tree. Number of unique tree_num values depends on NumBoostedTrees syntax element value and number of AMPs. |
iter | SMALLINT | Iteration (boosting round) number. |
mse (regression) / accuracy (classification) | Double | Regression MSE (mean squared error) measures the difference between the original and predicted values, computed by averaging the squared differences over the data set. Classification accuracy summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions. |
average residuals (regression) / deviance (classification) | Double | Regression average residuals is the average difference between the observed values of the dependent variable (y) and the predicted values (y hat). Classification deviance measures the difference in "fit" between a candidate model and the saturated model. |
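To make the metric definitions concrete, here is a small Python sketch that uses the textbook formulas for mean squared error, average residual, and accuracy. It is an illustration only: the exact computation the function performs internally is not given in this section, and the function names below are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return (mse, average_residual) for paired observed and predicted values."""
    residuals = [observed - predicted for observed, predicted in zip(y_true, y_pred)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n   # mean of squared differences
    average_residual = sum(residuals) / n     # mean of signed differences
    return mse, average_residual


def classification_accuracy(labels, predictions):
    """Return the fraction of predictions that match the true labels."""
    correct = sum(1 for label, pred in zip(labels, predictions) if label == pred)
    return correct / len(labels)


# Hypothetical data: residuals are 1, 0, and -2.
mse, average_residual = regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

# Hypothetical data: three of four labels are predicted correctly.
accuracy = classification_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```

The signed residuals can cancel each other, so an average residual near zero does not by itself imply a low MSE.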
JSON Types in JSON Representation of Boosting Tree
JSON Type | Description |
---|---|
id_ | Node identifier. |
sum_ | Appears only for regression trees. Sum of values of response variable at node identified by id. |
sumSq_ | Appears only for regression trees. Sum of squared values of response variable at node identified by id. |
responseCounts_ | Appears only for classification trees. The number of observations in each class at a node, identified by id. |
size_ | The total number of observations at a node, identified by id. |
maxDepth_ | Maximum possible depth of the tree, starting from node identified by id. For root node, the value is max_depth; for leaf nodes, 0; for other nodes, maximum possible depth of the tree, starting from that node. |
split_ | Start of JSON item describing a split at node identified by id. |
splitValue_ | The attribute value used for splitting a tree node. |
score_ | Gini score of the node identified by id. |
attr_ | Attribute (predictor) on which algorithm split at node identified by id. |
type_ | Type of tree and split. Possible values: |
leftNodeSize_ | The number of observations assigned to the left node in the split. |
rightNodeSize_ | The number of observations assigned to the right node in the split. |
leftChild_ | Start of JSON item describing left child of a node, identified by id. |
rightChild_ | Start of JSON item describing right child of a node, identified by id. |
nodeType_ | Type of a node identified by id. Possible values: |
prediction_ | The value of region prediction of a leaf node. |
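As an illustration of how these fields relate, the following Python sketch walks a parsed regression tree and derives the mean and variance of the response at each node from sum_, sumSq_, and size_. It assumes that child nodes are nested directly under the leftChild_ and rightChild_ keys of their parent; the sample JSON and the helper names are hypothetical and show only a subset of the fields.

```python
import json


def iter_nodes(node):
    """Yield a node and all of its descendants in pre-order."""
    yield node
    for child_key in ("leftChild_", "rightChild_"):
        child = node.get(child_key)
        if child:
            yield from iter_nodes(child)


def summarize(node):
    """Derive the response mean and variance at a regression tree node."""
    size = node["size_"]
    mean = node["sum_"] / size
    variance = node["sumSq_"] / size - mean ** 2   # E[y^2] - (E[y])^2
    return {"id_": node["id_"], "size_": size, "mean": mean, "variance": variance}


# Hypothetical tree: the root holds responses 2, 2, 8, 8, split into two leaves.
tree_json = """
{"id_": 1, "sum_": 20.0, "sumSq_": 136.0, "size_": 4,
 "leftChild_":  {"id_": 2, "sum_": 4.0,  "sumSq_": 8.0,   "size_": 2},
 "rightChild_": {"id_": 3, "sum_": 16.0, "sumSq_": 128.0, "size_": 2}}
"""
stats = [summarize(node) for node in iter_nodes(json.loads(tree_json))]
```

In this example the root has mean 5.0 and variance 9.0, while both leaves have variance 0.0 because each holds identical response values.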