TD_XGBoostPredict Input - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

InputTable Schema

Column Name	Data Type	Description
ID_Column	Any	Unique test point identifier. Cannot be NULL.
target_column(s)	INTEGER, BIGINT, SMALLINT, BYTEINT, FLOAT, DECIMAL, NUMBER	Column appears once for each specified target_column. Predictor variable. Cannot be NULL.
accumulate_column(s)	Any	Column appears once for each specified accumulate_column. Column to copy to output table.
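
The NULL and data type rules in the InputTable schema can be checked on the client before calling TD_XGBoostPredict. The following is a minimal sketch, assuming the rows were already fetched as Python dictionaries; the function name `find_invalid_rows` and the sample column names `id`, `x1`, and `x2` are hypothetical and not part of the Teradata API.

```python
from decimal import Decimal
from numbers import Real


def find_invalid_rows(rows, id_column, target_columns):
    """Return (row_index, column_name) pairs that break the InputTable rules."""
    problems = []
    for index, row in enumerate(rows):
        # ID_Column cannot be NULL.
        if row.get(id_column) is None:
            problems.append((index, id_column))
        for column in target_columns:
            value = row.get(column)
            # Target columns must be numeric and cannot be NULL.
            # bool counts as a Real in Python, so it is rejected explicitly.
            is_numeric = isinstance(value, (Real, Decimal)) and not isinstance(value, bool)
            if value is None or not is_numeric:
                problems.append((index, column))
    return problems


# Hypothetical sample rows; column names are illustrative only.
sample_rows = [
    {"id": 1, "x1": 0.5, "x2": 3},
    {"id": None, "x1": 1.0, "x2": 2},
    {"id": 3, "x1": None, "x2": "7"},
]
invalid = find_invalid_rows(sample_rows, "id", ["x1", "x2"])
```

The sketch does not check that ID_Column values are unique; that check is usually cheaper to run in the database.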

Model Table Schema

Column Name	Data Type	Description
task_index	SMALLINT	Identifier of AMP that produced a boosting tree.
tree_num	SMALLINT	Identifier of boosted tree. The number of unique tree_num values depends on the value of the NumBoostedTrees syntax element and the number of AMPs.
Iter	SMALLINT	Iteration (boosting round) number.
class_num	SMALLINT	Index of the class column to predict. Appears only for classification. For LossFunction('softmax'), the default, the number of unique class_num values equals the number of class labels in the data set; for K class labels, the class_num values are the integers in the range [0, K-1]. For LossFunction('binomial'), there is only one class_num value.
tree_order	SMALLINT	Identifier of a complete JSON order of the regression_tree/classification_tree column.
regression_tree/classification_tree	VARCHAR(32000)	JSON representation of decision tree. For JSON types that can appear in the representation, see the following table.
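
Client code that reads the model table typically needs one parsed JSON document per tree. The following is a minimal sketch under a loudly stated assumption: that a tree too long for one VARCHAR value is stored as several rows whose fragments are concatenated in ascending tree_order. The schema above does not confirm this behavior. The function name `assemble_trees`, the lowercase dictionary keys, and the sample fragments are hypothetical.

```python
import json
from collections import defaultdict


def assemble_trees(rows, tree_column="regression_tree"):
    """Group model rows by tree and parse the joined JSON text.

    ASSUMPTION: fragments of one tree share task_index, tree_num, iter and
    class_num, and are joined in ascending tree_order.
    """
    fragments = defaultdict(list)
    for row in rows:
        key = (row["task_index"], row["tree_num"], row["iter"], row.get("class_num"))
        fragments[key].append((row["tree_order"], row[tree_column]))

    trees = {}
    for key, parts in fragments.items():
        ordered = sorted(parts, key=lambda part: part[0])
        trees[key] = json.loads("".join(text for _, text in ordered))
    return trees


# Hypothetical rows: one tree split into two fragments, stored out of order.
model_rows = [
    {"task_index": 0, "tree_num": 1, "iter": 1, "class_num": 0,
     "tree_order": 1, "regression_tree": '"size_": 10}'},
    {"task_index": 0, "tree_num": 1, "iter": 1, "class_num": 0,
     "tree_order": 0, "regression_tree": '{"id_": 1, '},
]
trees = assemble_trees(model_rows)
```

For a classification model, pass `tree_column="classification_tree"`.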

JSON Types in JSON Representation of Decision Tree

JSON Type	Description
id_	Node identifier.
sum_	Appears only for regression trees. Sum of values of response variable at node identified by id.
sumSq_	Appears only for regression trees. Sum of squared values of response variable at node identified by id.
responseCounts_	Appears only for classification trees. Number of observations in each class at node identified by id.
size_	Total number of observations at node identified by id.
maxDepth_	Maximum possible depth of the subtree rooted at node identified by id. For the root node, the value is max_depth; for leaf nodes, 0.
split_	Start of JSON item describing a split at node identified by id.
splitValue_	The attribute value used for splitting a tree node.
score_	Gini score of the node identified by id.
attr_	Attribute (predictor) on which algorithm split at node identified by id.
type_	Type of tree and split. Possible values: REGRESSION_NUMERIC_SPLIT
leftNodeSize_	The number of observations assigned to the left node in the split.
rightNodeSize_	The number of observations assigned to the right node in the split.
leftChild_	Start of JSON item describing left child of node identified by id.
rightChild_	Start of JSON item describing right child of node identified by id.
nodeType_	Type of node identified by id. Possible values: REGRESSION_NODE, REGRESSION_LEAF
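
The sum_, sumSq_, and size_ values are enough to recover the mean and population variance of the response variable at each node of a regression tree: mean = sum_ / size_ and variance = sumSq_ / size_ - mean². The following is a minimal sketch of a tree walk that reports these values. The nesting used here (leftChild_ and rightChild_ placed directly on the node, next to split_) and all sample values are assumptions made for illustration; the table above lists the JSON types but not their exact layout.

```python
def node_stats(node):
    """Yield (id, size, mean, variance) for a node and all of its descendants."""
    size = node["size_"]
    mean = node["sum_"] / size
    # Population variance from the stored sum and sum of squares.
    variance = node["sumSq_"] / size - mean ** 2
    yield node["id_"], size, mean, variance
    # ASSUMPTION: children are nested directly under the node.
    for child_key in ("leftChild_", "rightChild_"):
        child = node.get(child_key)
        if child is not None:
            yield from node_stats(child)


# Hypothetical tree built from responses [1, 1] (left) and [4, 6] (right).
sample_tree = {
    "id_": 1, "sum_": 12.0, "sumSq_": 54.0, "size_": 4, "maxDepth_": 1,
    "nodeType_": "REGRESSION_NODE",
    "split_": {"splitValue_": 2.5, "attr_": "x1",
               "type_": "REGRESSION_NUMERIC_SPLIT",
               "leftNodeSize_": 2, "rightNodeSize_": 2},
    "leftChild_": {"id_": 2, "sum_": 2.0, "sumSq_": 2.0, "size_": 2,
                   "maxDepth_": 0, "nodeType_": "REGRESSION_LEAF"},
    "rightChild_": {"id_": 3, "sum_": 10.0, "sumSq_": 52.0, "size_": 2,
                    "maxDepth_": 0, "nodeType_": "REGRESSION_LEAF"},
}
stats = list(node_stats(sample_tree))
```

The walk reads only fields that the table marks as regression-only, so it does not apply to classification trees, which carry responseCounts_ instead.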

JSON Types in JSON Representation of Region Prediction

JSON Type	Description
id	Region identifier.
value	The value in the region.