Optional Syntax Elements for TD_XGBoostPredict

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

NumParallelTrees

Specifies how many boosted trees to use to make predictions.

The combination of task_index and tree_num in the model table identifies the AMP and the number of trees generated by that AMP. Because the model table is ordered by these two columns, boosted trees are loaded in that order.

For example, suppose the system has two AMPs: AMP 1 (task_index: 0) generates three boosted trees (tree_num: 1, 2, 3) and AMP 2 (task_index: 1) generates two boosted trees (tree_num: 1, 2). Then NumParallelTrees(4) loads three boosted trees from AMP 1 (task_index: 0) and one boosted tree from AMP 2 (task_index: 1).

Because one boosted tree is never loaded into memory or used for prediction, the query has a shorter elapsed time than it would if all trees were loaded. However, prediction accuracy can decrease. Each unique tree is identified by task_index, tree_num, and iter in the model table.
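
The loading order in the two-AMP example can be sketched in Python. This is only an illustrative model of the selection rule described here, not Teradata's implementation; the function name `select_boosted_trees` and the tuple layout are hypothetical.

```python
def select_boosted_trees(boosted_trees, num_parallel_trees):
    """Return the first num_parallel_trees boosted trees in model-table order.

    boosted_trees: iterable of (task_index, tree_num) pairs (hypothetical layout).
    """
    # Order by task_index, then tree_num, as the model table is ordered.
    ordered = sorted(set(boosted_trees))
    return ordered[:num_parallel_trees]


# Two AMPs: task_index 0 built three boosted trees, task_index 1 built two.
model = [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2)]
loaded = select_boosted_trees(model, 4)
print(loaded)  # [(0, 1), (0, 2), (0, 3), (1, 1)]
```

With a limit of 4, the boosted tree (1, 2) is the one that is skipped. A limit at or above the number of available boosted trees, such as the default of 1000, selects all of them.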

You can still use the previous argument name NumBoostedTrees.

Default: 1000

NumBoostRounds

Specifies how many iterations to load for each boosted tree to make predictions.

For example, AMP 1 (task_index: 0) generates three boosted trees (tree_num: 1, 2, 3), each with four iterations (iter: 1, 2, 3, 4), for 12 trees in total. NumBoostRounds(2) loads only two iterations per boosted tree, so only six trees are loaded.

Because the skipped trees are never loaded into memory or used for prediction, the query has a shorter elapsed time than it would if all trees were loaded. However, prediction accuracy can decrease.
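
The iteration limit in the example can be modeled the same way in Python. This is a sketch under stated assumptions, not the function's actual logic: the name `select_rounds` and the triple layout are hypothetical, and iteration numbers are assumed to start at 1, as in the example.

```python
def select_rounds(trees, num_boost_rounds):
    """Keep only the trees whose iteration number is within num_boost_rounds.

    trees: iterable of (task_index, tree_num, iter) triples (hypothetical layout),
    with iter numbered from 1.
    """
    return [t for t in sorted(trees) if t[2] <= num_boost_rounds]


# task_index 0 built three boosted trees with four iterations each: 12 trees.
model = [(0, tree_num, it) for tree_num in (1, 2, 3) for it in (1, 2, 3, 4)]
loaded = select_rounds(model, 2)
print(len(model), len(loaded))  # 12 6
```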

You can still use the previous argument name IterNum.

Default: 10

ModelType

For classification, outputs the prediction column as integers. The integer values represent categories, so they are easier to read in an integer column. To make the prediction column an integer column, set ModelType to Classification.

Default: the prediction column is output as a real-valued column.

OutputProb

Specifies whether to output the probability for each response.

If OutputProb is true and Responses is not provided, the function outputs the probability of the predicted class.
The OutputProb argument works only for classification.

Default: false

Responses

Specifies the classes for which to output probabilities.

If OutputProb is true and Responses is not provided, the function outputs the probability of the predicted class.
The Responses argument works only for classification.
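
The interaction between OutputProb and Responses can be illustrated with a small Python sketch for one scored row. Everything here is hypothetical: the function name `probability_output`, the output keys `prediction`, `prob`, and `prob_<label>`, and the sample probabilities are illustrations, not the function's documented output schema. The sketch also assumes that Responses has no effect when OutputProb is false, which the text does not state.

```python
def probability_output(class_probs, output_prob=False, responses=None):
    """Return the values reported for one row (hypothetical key names).

    class_probs: dict mapping class label -> predicted probability.
    """
    predicted = max(class_probs, key=class_probs.get)
    row = {"prediction": predicted}
    if not output_prob:
        return row  # assumption: no probabilities without OutputProb
    if responses:
        # One probability per requested class.
        for label in responses:
            row[f"prob_{label}"] = class_probs[label]
    else:
        # No Responses given: probability of the predicted class.
        row["prob"] = class_probs[predicted]
    return row


probs = {0: 0.2, 1: 0.7, 2: 0.1}  # made-up probabilities for illustration
print(probability_output(probs, output_prob=True))
print(probability_output(probs, output_prob=True, responses=[0, 2]))
```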

Accumulate

Specifies the names of the input columns to copy to the output table.

Processing time is proportional to:
- The number of boosted trees used for prediction from the model (controlled by NumParallelTrees).
- The number of iterations (sub-trees) used for prediction in each boosted tree (controlled by NumBoostRounds, previously IterNum).

Choose these values carefully to control processing time. When the boosted trees grow larger than available memory, they are cached in local spool space, which can make the function slower than when all trees fit in memory.
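
As a rough way to reason about this proportionality, the following Python sketch counts how many trees a given pair of settings would load. It is a back-of-the-envelope model, not a timing guarantee: the function name `trees_loaded` is hypothetical, every boosted tree is assumed to have the same number of iterations, and spool caching is ignored. The default values are the ones documented for NumParallelTrees and NumBoostRounds.

```python
def trees_loaded(trees_per_amp, rounds_per_tree,
                 num_parallel_trees=1000, num_boost_rounds=10):
    """Estimate the number of trees loaded for prediction.

    trees_per_amp: list with the number of boosted trees built by each AMP.
    rounds_per_tree: iterations per boosted tree (assumed equal for all trees).
    """
    boosted = min(sum(trees_per_amp), num_parallel_trees)
    rounds = min(rounds_per_tree, num_boost_rounds)
    return boosted * rounds


full = trees_loaded([3, 2], 4)  # defaults load everything: 5 * 4 = 20
reduced = trees_loaded([3, 2], 4, num_parallel_trees=4, num_boost_rounds=2)
print(full, reduced, reduced / full)  # 20 8 0.4
```

In this model, halving NumBoostRounds roughly halves the work, while lowering NumParallelTrees removes whole boosted trees in model-table order.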