TD_XGBoost Output - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-02-17

Output Table Schema

Column Data Type Description
task_index SMALLINT Identifier of AMP that produced a boosting tree.
tree_num SMALLINT Identifier of boosted tree. The number of unique tree_num values depends on the NumBoostedTrees syntax element value and the number of AMPs.
iter SMALLINT Iteration (boosting round) number.
class_num SMALLINT Index of the class to predict. Appears only for classification. For LossFunction('softmax'), the default, the number of unique class_num values equals the number of class labels in the data set; for K class labels, the class_num values are the integers in the range [0, K-1]. For LossFunction('binomial'), there is only one class_num value.
tree_order SMALLINT Identifier of a complete JSON order of the regression_tree/classification_tree column.
regression_tree/classification_tree VARCHAR JSON representation of the decision tree. The maximum length is 32000. For the JSON types that can appear in the representation, see JSON Types in JSON Representation of Boosting Tree.
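
The sketch below shows one way a client might rebuild complete trees from output rows after fetching them. It assumes, without confirmation from this page, that a tree whose JSON exceeds the 32000-character limit is stored as several rows and that tree_order gives the order of the pieces. The row dictionaries, the "tree" key, and the assemble_trees helper are hypothetical names used only for illustration.

```python
import json
from collections import defaultdict


def assemble_trees(rows):
    """Group rows by tree and join the JSON pieces in tree_order.

    Assumption: rows sharing task_index, tree_num, iter, and class_num
    belong to one tree, and tree_order orders its JSON pieces.
    """
    pieces = defaultdict(list)
    for row in rows:
        key = (row["task_index"], row["tree_num"], row["iter"], row.get("class_num"))
        pieces[key].append((row["tree_order"], row["tree"]))

    trees = {}
    for key, parts in pieces.items():
        parts.sort(key=lambda part: part[0])  # order pieces by tree_order
        trees[key] = json.loads("".join(text for _, text in parts))
    return trees


# Hypothetical rows: one small tree split into two pieces, listed out of order.
rows = [
    {"task_index": 0, "tree_num": 1, "iter": 1, "class_num": 0,
     "tree_order": 1, "tree": '"prediction_": 0.5}'},
    {"task_index": 0, "tree_num": 1, "iter": 1, "class_num": 0,
     "tree_order": 0, "tree": '{"id_": 1, '},
]

trees = assemble_trees(rows)
print(trees[(0, 1, 1, 0)])
```

For regression output, which has no class_num column, the helper falls back to None for that part of the key.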

Out Table [Records Training Accuracy over Iterations] Schema

Column Name Data Type Description
task_index SMALLINT Identifier of AMP that produced a boosting tree.
iter SMALLINT Iteration (boosting round) number.
tree_num SMALLINT Identifier of boosted tree. The number of unique tree_num values depends on the NumBoostedTrees syntax element value and the number of AMPs.
mse (regression)/accuracy (classification) DOUBLE For regression, the mean squared error (MSE): the average of the squared differences between the original and predicted values over the data set. For classification, accuracy: the number of correct predictions divided by the total number of predictions.
average residuals (regression)/deviance (classification) DOUBLE For regression, the average difference between the observed values of the dependent variable (y) and the predicted values (ŷ). For classification, deviance: a measure of the difference in fit between a candidate model and the saturated model.
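
As a rough illustration of the metrics in this table, the following sketch computes MSE (in the usual sense of mean squared error), the average residual, and classification accuracy from small hand-made lists. It uses the textbook definitions; whether TD_XGBoost computes each value in exactly this way is an assumption, and the function names are hypothetical. Deviance is omitted because its exact form is not given here.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_residual(y_true, y_pred):
    """Average of the signed differences y - y_hat."""
    return sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels, predicted):
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(1 for a, b in zip(labels, predicted) if a == b)
    return correct / len(labels)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
mse = mean_squared_error(y_true, y_pred)      # (1 + 0 + 4) / 3
avg_res = average_residual(y_true, y_pred)    # (1 + 0 - 2) / 3
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])    # 3 correct out of 4

print(mse, avg_res, acc)
```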

JSON Types in JSON Representation of Boosting Tree

JSON Type Description
id_ Node identifier.
sum_ Appears only for regression trees. Sum of values of response variable at node identified by id.
sumSq_ Appears only for regression trees. Sum of squared values of response variable at node identified by id.
responseCounts_ Appears only for classification trees. The number of observations in each class at a node, identified by id.
size_ The total number of observations at a node, identified by id.
maxDepth_ Maximum possible depth of the subtree that starts at the node identified by id. For the root node, the value is max_depth; for leaf nodes, 0.
split_ Start of JSON item describing a split at node identified by id.
splitValue_ The attribute value used for splitting a tree node.
score_ Gini score of the node identified by id.
attr_ Attribute (predictor) on which algorithm split at node identified by id.
type_ Type of tree and split. Possible values:
  • REGRESSION_NUMERIC_SPLIT
leftNodeSize_ The number of observations assigned to the left node in the split.
rightNodeSize_ The number of observations assigned to the right node in the split.
leftChild_ Start of JSON item describing left child of a node, identified by id.
rightChild_ Start of JSON item describing right child of a node, identified by id.
nodeType_ Type of a node identified by id. Possible values:
  • REGRESSION_NODE
  • REGRESSION_LEAF
prediction_ The value of region prediction of a leaf node.
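
To show how the JSON types above fit together, here is a minimal sketch that walks a small, hand-written regression tree. The nesting (split_ holding attr_ and splitValue_, with leftChild_ and rightChild_ on the node itself), the attribute name x1, all numeric values, and the rule that values less than or equal to splitValue_ go left are assumptions made for illustration, not details confirmed by this page.

```python
import json

# Hypothetical tree: a root with one numeric split and two leaves.
TREE_JSON = """
{
  "id_": 1, "sum_": 12.0, "sumSq_": 46.0, "size_": 4, "maxDepth_": 1,
  "nodeType_": "REGRESSION_NODE",
  "split_": {"attr_": "x1", "splitValue_": 2.5,
             "type_": "REGRESSION_NUMERIC_SPLIT",
             "leftNodeSize_": 2, "rightNodeSize_": 2},
  "leftChild_": {"id_": 2, "sum_": 3.0, "sumSq_": 5.0, "size_": 2,
                 "maxDepth_": 0, "nodeType_": "REGRESSION_LEAF",
                 "prediction_": 1.5},
  "rightChild_": {"id_": 3, "sum_": 9.0, "sumSq_": 41.0, "size_": 2,
                  "maxDepth_": 0, "nodeType_": "REGRESSION_LEAF",
                  "prediction_": 4.5}
}
"""


def predict(node, features):
    """Follow splits from the given node down to a leaf and return prediction_."""
    while node["nodeType_"] != "REGRESSION_LEAF":
        split = node["split_"]
        # Assumption: values <= splitValue_ are routed to the left child.
        go_left = features[split["attr_"]] <= split["splitValue_"]
        node = node["leftChild_"] if go_left else node["rightChild_"]
    return node["prediction_"]


def node_mean_and_variance(node):
    """Derive the response mean and population variance from sum_, sumSq_, size_."""
    n = node["size_"]
    mean = node["sum_"] / n
    return mean, node["sumSq_"] / n - mean ** 2


tree = json.loads(TREE_JSON)
left_prediction = predict(tree, {"x1": 2.0})
right_prediction = predict(tree, {"x1": 3.0})
root_mean, root_variance = node_mean_and_variance(tree)
print(left_prediction, right_prediction, root_mean, root_variance)
```

The second helper illustrates why sum_, sumSq_, and size_ are sufficient to recover per-node statistics of the response variable without revisiting the training rows.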