XGBoost Arguments - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0 / 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)
ResponseColumn
Specify the name of the InputTable column that contains the response variable for each data point in the training data set.
NumericInputs
[Not for sparse format input data. With dense format input data, required if you omit CategoricalInputs.] Specify the names of the InputTable columns that contain the numeric predictor variables. These columns must contain numeric values.
CategoricalInputs
[Not for sparse format input data. With dense format input data, required if you omit NumericInputs.] Specify the names of the InputTable columns that contain the categorical predictor variables. These variables can be either numeric or VARCHAR values. Each categorical_column can have at most 20 distinct values.
For information about columns that you must identify as categorical, see Identification of Categorical Columns.
LossFunction
[Optional] Specify the learning task and corresponding learning objective:
Option       Description
'softmax'    (Default) For multiple-class classification.
'binomial'   Negative binomial likelihood, for binary classification.
PredictionType
[Optional] Specify whether the function predicts the result from the number of classes ('classification') or from a continuous response variable ('regression'). The function supports only 'classification'.
AttributeNameColumn
[Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the names of the attributes of the input data set.
AttributeValueColumn
[Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the values of the attributes of the input data set.
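Sparse format, as consumed through AttributeNameColumn and AttributeValueColumn, stores one attribute of one data point per row. A minimal Python sketch of the layout (the row shape and names here are illustrative, not part of the function API):

```python
# Convert a dense row into sparse (id, attribute, value) triples --
# the one-attribute-per-row layout that AttributeNameColumn and
# AttributeValueColumn identify. Names are illustrative only.
def dense_to_sparse(row_id, row):
    """row: dict mapping attribute name -> attribute value."""
    return [(row_id, attr, value) for attr, value in row.items()]

triples = dense_to_sparse(1, {"lotsize": 5850, "driveway": "yes"})
# Each triple supplies one attribute of one data point.
print(triples)
```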
RegularizationLambda
[Optional] Specify the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
Default: 100000
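For intuition on how lambda acts, the standard XGBoost objective gives each leaf the optimal weight w* = -G / (H + lambda), where G and H sum the first and second derivatives of the loss over the points in the leaf; a larger lambda pulls leaf weights toward zero. A sketch of that formula (generic XGBoost math, not Teradata-specific internals):

```python
# Standard XGBoost optimal leaf weight: w* = -G / (H + lambda).
# G and H are sums of first and second loss derivatives over the
# data points in the leaf. Larger lambda shrinks the weight toward 0.
def leaf_weight(G, H, lam):
    return -G / (H + lam)

print(leaf_weight(10.0, 5.0, 0.0))   # -2.0: no regularization
print(leaf_weight(10.0, 5.0, 95.0))  # -0.1: strong shrink toward zero
```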
ShrinkageFactor
[Optional] Specify the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.
Default: 0.1
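The role of shrinkage can be sketched as follows: each boosting step contributes shrinkage times its tree's output to the running prediction (a generic gradient-boosting sketch, not the function's internals):

```python
# Boosted prediction: each step adds shrinkage * new_tree(x), so a
# smaller shrinkage makes each individual tree less influential.
def boosted_prediction(x, trees, shrinkage=0.1):
    return sum(shrinkage * tree(x) for tree in trees)

# Three toy "trees" with constant outputs 1, 2, and 3.
trees = [lambda x: 1.0, lambda x: 2.0, lambda x: 3.0]
print(boosted_prediction(0, trees, shrinkage=0.1))  # 0.1*(1+2+3) = ~0.6
```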
ColumnSubSampling
[Optional] Specify the fraction of features (columns) to sample for each boosting step. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
Default: 1.0 (no subsampling)
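A minimal sketch of per-step column subsampling (a generic illustration of the technique; the function's actual sampling scheme is not documented here, and the column names are hypothetical):

```python
import random

# Sample a fraction of the feature columns for one boosting step.
def subsample_columns(columns, fraction, seed=1):
    rng = random.Random(seed)          # seeded for repeatability
    k = max(1, int(len(columns) * fraction))
    return rng.sample(columns, k)

cols = ["price", "lotsize", "bedrooms", "bathrooms"]
print(subsample_columns(cols, 0.5))    # 2 of the 4 columns
print(subsample_columns(cols, 1.0))    # fraction 1.0: all columns, no subsampling
```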
IDColumn
[Optional] Used with NumBoostedTrees. Specify the name of the InputTable column that contains a unique identifier for each data point in the test data set.
NumBoostedTrees
[Optional] Requires IDColumn. Specify the number of parallel boosted trees. The num_trees is an INTEGER value in the range [1, 100]. If num_trees is greater than 1, each boosted tree operates on a sample of the input data. Samples are determined by input data partitioning: the number of partitions equals the number of boosted trees. A higher num_trees value might improve function run time but decrease prediction accuracy.
Default: 1 if InputTable is a DIMENSION table; otherwise, the number of vworkers available in the cluster
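The partition-per-tree idea can be sketched as a simple round-robin split of row identifiers (an illustration of the concept only, not the function's actual partitioning scheme):

```python
# Split row ids into num_trees partitions; each parallel boosted tree
# trains on one partition. Round-robin assignment is used here purely
# for illustration.
def partition_rows(row_ids, num_trees):
    parts = [[] for _ in range(num_trees)]
    for i, rid in enumerate(row_ids):
        parts[i % num_trees].append(rid)
    return parts

print(partition_rows(list(range(10)), 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```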
IterNum
[Optional] Specify the number of iterations (rounds) to boost the weak classifiers. The iterations must be an INTEGER in the range [1, 100000].
Default: 10
MinNodeSize
[Optional] Specify a decision-tree stopping criterion, the minimum size of any node within each decision tree. If the size of any node becomes less than min_node_size, the algorithm stops looking for splits. The min_node_size must be an INTEGER of at least 1.
Default: 1
MaxDepth
[Optional] Specify the decision-tree stopping criterion that has the greatest effect on function performance, the maximum tree depth. If the tree depth exceeds max_depth, the algorithm stops looking for splits. A decision tree can grow to 2^(max_depth+1)-1 nodes. The max_depth must be an INTEGER in the range [1, 100000].
Default: 12
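The node-count bound follows from a full binary tree: a tree of depth max_depth has at most 2^(max_depth+1)-1 nodes (one root level plus max_depth further levels, each doubling). A one-line check:

```python
# Maximum node count of a binary decision tree of the given depth:
# levels 0..max_depth contribute 2**0 + 2**1 + ... + 2**max_depth
# = 2**(max_depth + 1) - 1 nodes.
def max_nodes(max_depth):
    return 2 ** (max_depth + 1) - 1

print(max_nodes(12))  # 8191 nodes at the default MaxDepth of 12
print(max_nodes(1))   # 3: root plus two children
```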
Variance
[Optional] Specify a decision-tree stopping criterion, the minimum variance for any node. If the variance within any node becomes less than variance, the algorithm stops looking for splits. The variance is a nonnegative DOUBLE PRECISION value.
Default: 0
Seed
[Optional] Specify the random seed the algorithm uses for repeatable results (for more information, see Nondeterministic Results). If you omit Seed or specify its default value, 1, the function uses a faster algorithm but does not ensure repeatability.
The seed must be a LONG value greater than or equal to 1. To ensure repeatability, specify a seed greater than 1.
Default: 1