Optional Syntax Elements for TD_XGBoost - Analytics Database

ModelType
Specifies whether the analysis is a regression (continuous response variable) or a multiclass classification (predicting the result as one of a set of classes). The only accepted values are Regression and Classification.
Default: Regression.
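For orientation, here is a minimal sketch of a TD_XGBoost call that sets ModelType. The table name and column names are hypothetical, and the ResponseColumn and InputColumns arguments are shown under the assumption that they are supplied as in other Analytics Database function calls; see the full syntax for the authoritative form.

SELECT * FROM TD_XGBoost (
  ON housing_train AS InputTable              -- hypothetical training table
  USING
    ResponseColumn ('price')                  -- response variable (assumed argument)
    InputColumns ('sqft', 'bedrooms', 'age')  -- predictor columns (assumed argument)
    ModelType ('Regression')                  -- continuous response; use 'Classification' for class labels
) AS dt;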
MaxDepth
Specifies a decision tree stopping criterion. If the tree reaches a depth past this value, the algorithm stops looking for splits. A decision tree can grow to at most 2^(max_depth+1) - 1 nodes. This stopping criterion has the greatest effect on the performance of the function. The maximum accepted value is 2147483647.
Default: 5
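For example, with the default MaxDepth of 5, a single tree can contain at most 2^(5+1) - 1 = 63 nodes, while a MaxDepth of 10 allows up to 2^(10+1) - 1 = 2047 nodes, so increasing the depth grows the tree (and the work per tree) roughly exponentially.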
MinNodeSize
Specifies a decision tree stopping criterion; the minimum size of any node within each decision tree.
Default: 1
NumParallelTrees
Specifies the number of parallel boosted trees. The value is an INTEGER in the range [1, 10000]. Each boosted tree operates on a sample of data that fits in an AMP's memory. By default, the number of trees is chosen equal to the number of AMPs with data.
If the number of trees is greater than the number of AMPs with data, each boosted tree operates on a sample of the input data, and the function estimates the sample size (number of rows) using this formula: sample_size = total_number_of_input_rows / number_of_trees
The sample_size must fit in an AMP's memory. The function always uses a sample size (or tree size) that fits in an AMP's memory to build the tree models, and ignores rows that cannot fit in memory.
A higher NumBoostedTrees value may improve function run time but may decrease prediction accuracy.
You can still use the previous argument name NumBoostedTrees.
Default: -1
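As a worked instance of the formula above (the numbers are hypothetical): with 10,000,000 input rows and the number of trees set to 200, sample_size = 10,000,000 / 200 = 50,000 rows per tree, provided that 50,000 rows fit in an AMP's memory.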
RegularizationLambda
Specifies the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
Default: 1
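For intuition, the L2 penalty in XGBoost-style boosting discourages large leaf weights; a common form of the per-tree penalty is sketched below in generic notation (the exact objective used internally by the function is not stated in this section):

\[ \Omega(f) = \frac{\lambda}{2} \sum_{j=1}^{T} w_{j}^{2} \]

where T is the number of leaves and w_j are the leaf weights; a larger \lambda shrinks the weights toward zero.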
LearningRate
Specifies the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the newly learned tree by the shrinkage value to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.
You can still use the previous argument name ShrinkageFactor.
Default: 0.5
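As a conceptual sketch of how the learning rate enters each boosting step, in generic gradient-boosting notation (not taken from this document):

\[ \hat{y}^{(m)} = \hat{y}^{(m-1)} + \eta \, f_m(x), \qquad \eta \in (0, 1] \]

where f_m is the tree learned at step m and \eta is the LearningRate; \eta = 1 applies no shrinkage.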
ColumnSampling
Specifies the fraction of features to sample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
Default: 1.0
CoverageFactor
Specifies the level of coverage of the dataset while boosting trees (for example, 1.25 = 125% coverage). You can use CoverageFactor only if you do not supply NumBoostedTrees. When NumBoostedTrees is specified, the coverage depends on its value. When NumBoostedTrees is not specified, it is chosen to achieve this level of coverage.
Default: 1.0
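One hedged way to read this argument (an interpretation, not a formula given in this section): if each tree samples 50,000 rows from 10,000,000 input rows, a CoverageFactor of 1.25 would correspond to roughly 1.25 × 10,000,000 / 50,000 = 250 boosted trees, so that the combined samples cover the dataset 1.25 times.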
NumBoostRounds
Specifies the number of iterations (rounds) to boost the weak classifiers. The value must be an INTEGER in the range [1, 100000].
You can still use the previous argument name IterNum.
Default: 10
Seed
Specifies an integer value to use in determining the random seed for column sampling.
Default: 1
BaseScore
Specifies the initial prediction value for all data points. Typically, this value is set to the mean of the observed response values in the training set. This information is shown in the meta row of the model table. For classification, the BaseScore value must be in the range (0, 1) and the default value is 0.5. For regression, any DOUBLE PRECISION value in the range [-1e50, 1e50] is accepted and the default value is 0.
Default: 0
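Because the mean of the observed response is suggested as a typical regression BaseScore, one way to obtain that value is a simple aggregate before calling the function (the table and column names are hypothetical, and passing the literal to BaseScore is shown as an assumption):

SELECT AVG(price) AS base_score FROM housing_train;

The returned value can then be supplied in the TD_XGBoost call, for example BaseScore (350000.0).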
MinImpurity
Specifies the minimum impurity at which the tree stops splitting further. For regression, the squared error criterion is used; for classification, Gini impurity is used.
Default: 0.0
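For reference, the impurity measures named above are commonly defined as follows; these are the standard textbook definitions, not a statement of the function's internal implementation:

\[ \mathrm{Gini}(t) = 1 - \sum_{k} p_{k}^{2}, \qquad \mathrm{SE}(t) = \frac{1}{n_t} \sum_{i \in t} \left( y_i - \bar{y}_t \right)^{2} \]

where p_k is the proportion of class k in node t, and \bar{y}_t is the mean response of the rows in node t.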
TreeSize
Specifies the number of rows that each tree uses as its input data set. The function builds a tree using either the number of rows given by the TreeSize argument or, by default, the lesser of the number of rows on the AMP and the number of rows that fit into the AMP's memory.
By using the TreeSize argument and reducing the value of the NumParallelTrees argument, you can resolve most exceptions caused by out-of-memory (OOM) conditions. For example, this is a typical exception caused by OOM.
Default: -1
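A hedged sketch of the tuning described above, capping the per-tree sample with TreeSize and reducing NumParallelTrees to ease per-AMP memory pressure (the table, columns, and argument values are illustrative only):

SELECT * FROM TD_XGBoost (
  ON housing_train AS InputTable
  USING
    ResponseColumn ('price')
    InputColumns ('sqft', 'bedrooms', 'age')
    ModelType ('Regression')
    NumParallelTrees (16)   -- fewer parallel trees lowers per-AMP memory use
    TreeSize (100000)       -- cap rows per tree so each sample fits in AMP memory
) AS dt;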