Optional Syntax Elements for TD_XGBoostPredict - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-10-04
Product Category: Teradata Vantage™
NumParallelTrees
Specifies how many boosted trees to use to make predictions.
The combination of task_index and tree_num in the model table determines the AMP and the number of boosted trees generated by that AMP. Because the model table is ordered by these two columns, boosted trees are loaded in that order until the specified number is reached.
For example, suppose the system has two AMPs: AMP 1 (task_index: 0) generates three boosted trees (tree_num: 1, 2, 3) and AMP 2 (task_index: 1) generates two boosted trees (tree_num: 1, 2). Then NumParallelTrees(4) loads three boosted trees from AMP 1 (task_index: 0) and one boosted tree from AMP 2 (task_index: 1).
Because one boosted tree is never loaded into memory or used for prediction, the query has a shorter elapsed time than one that loads all trees. However, skipping trees can reduce prediction accuracy. Each unique tree is identified by task_index, tree_num, and iter in the model table.
You can still use the previous argument name NumBoostedTrees.
Default: 1000
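The loading order described for NumParallelTrees can be sketched in Python. This is an illustration only, not Teradata code: the function `select_boosted_trees` and the list of `(task_index, tree_num)` pairs are hypothetical stand-ins for the model table, and the sketch assumes boosted trees are taken in ascending `(task_index, tree_num)` order until the limit is reached.

```python
from collections import Counter


def select_boosted_trees(model_trees, num_parallel_trees):
    """Return the first num_parallel_trees boosted trees in (task_index, tree_num) order.

    model_trees is a hypothetical list of (task_index, tree_num) pairs standing in
    for the distinct boosted trees recorded in the model table.
    """
    ordered = sorted(set(model_trees))
    return ordered[:num_parallel_trees]


# Two AMPs: task_index 0 built three boosted trees, task_index 1 built two.
model_trees = [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2)]

loaded = select_boosted_trees(model_trees, 4)
per_amp = Counter(task_index for task_index, _ in loaded)

print(loaded)         # [(0, 1), (0, 2), (0, 3), (1, 1)]
print(dict(per_amp))  # {0: 3, 1: 1}
```

With a limit of 4, the last boosted tree in the ordering, (1, 2), is the one left out, which matches the two-AMP example above.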
NumBoostRounds
Specifies how many iterations to load for each boosted tree to make predictions.
For example, AMP 1 (task_index: 0) generates three boosted trees (tree_num: 1, 2, 3), each with four iterations (iter: 1, 2, 3, 4), for 12 trees in total. NumBoostRounds(2) loads only two iterations per boosted tree, so only six trees are loaded.
Because the skipped trees are never loaded into memory or used for prediction, the query has a shorter elapsed time than one that loads all trees. However, skipping trees can reduce prediction accuracy.
You can still use the previous argument name IterNum.
Default: 10
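The iteration limit can be illustrated the same way. The following Python sketch is not Teradata code: `select_iterations` is a hypothetical helper, the tuples stand in for model table rows, and the sketch assumes that iterations are numbered from 1 and that the lowest-numbered iterations are the ones kept.

```python
def select_iterations(model_rows, num_boost_rounds):
    """Keep rows whose iteration number is within the first num_boost_rounds.

    model_rows is a hypothetical list of (task_index, tree_num, iter) tuples
    standing in for model table rows; iterations are assumed to start at 1.
    """
    return [row for row in model_rows if row[2] <= num_boost_rounds]


# One AMP (task_index 0), three boosted trees, four iterations each.
model_rows = [(0, tree_num, it) for tree_num in (1, 2, 3) for it in (1, 2, 3, 4)]

loaded = select_iterations(model_rows, 2)

print(len(model_rows))  # 12
print(len(loaded))      # 6
```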
ModelType
For classification, the function can output the prediction column as integers. The predicted values represent categories, so they are easier to read in an integer column. To make the prediction column an integer column in the output schema, set ModelType to Classification.
Default: The prediction column is output as a real-valued column.
OutputProb
Specifies whether to output the probability for each response.
  • If OutputProb is true and Responses is not specified, the function outputs the probability of the predicted class.
  • The OutputProb argument works only with classification.
Default: false
Responses
Specifies the classes for which to output probabilities.
  • If OutputProb is true and Responses is not specified, the function outputs the probability of the predicted class.
  • The Responses argument works only with classification.
Accumulate
Specifies the names of the input columns to copy to the output table.
  • The processing time is proportional to:
    • The number of boosted trees used for prediction from the model (controlled by NumParallelTrees).
    • The number of iterations (sub-trees) used for prediction from the model in each boosted tree (controlled by NumBoostRounds, previously IterNum).

    Choose these arguments carefully to control the processing time. When the boosted trees grow too large to fit in memory, they are cached in local spool space, which can slow the function compared to the case when all trees fit in memory.
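As a rough way to reason about the two limits together, the Python sketch below counts how many sub-trees would be loaded for a given pair of settings. It rests on several assumptions: `loaded_subtree_count` is a hypothetical helper, every boosted tree is assumed to have the same number of iterations, and the count is only a proxy for relative work, not a timing model. The default values of 1000 and 10 are the documented defaults for the two arguments.

```python
def loaded_subtree_count(total_boosted_trees, iterations_per_tree,
                         num_parallel_trees=1000, num_boost_rounds=10):
    """Estimate how many sub-trees are loaded under the two limits.

    Assumes every boosted tree in the model has iterations_per_tree iterations.
    """
    trees = min(total_boosted_trees, num_parallel_trees)
    rounds = min(iterations_per_tree, num_boost_rounds)
    return trees * rounds


# Hypothetical model: five boosted trees with four iterations each.
full = loaded_subtree_count(5, 4)           # defaults exceed the model size
reduced = loaded_subtree_count(5, 4, 4, 2)  # four boosted trees, two iterations

print(full)     # 20
print(reduced)  # 8
```

Under these assumptions the reduced setting loads 8 of 20 sub-trees, which suggests less work per prediction at the cost of possibly lower accuracy.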