XGBoost Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.00

1.0

Published

May 2019

Language

English (United States)

Last Update

2019-11-22

dita:id

B700-4003

ResponseColumn

Specify the name of the InputTable column that contains the response variable for each data point in the training data set.

NumericInputs

[Not for sparse format input data. With dense format input data, required if you omit CategoricalInputs.] Specify the names of the InputTable columns that contain the numeric predictor variables. These variables must be numeric values.

CategoricalInputs

[Not for sparse format input data. With dense format input data, required if you omit NumericInputs.] Specify the names of the InputTable columns that contain the categorical predictor variables. These variables can be either numeric or VARCHAR values. Each categorical_column can have at most 20 distinct values.

For information about columns that you must identify as categorical, see Identification of Categorical Columns.

LossFunction

[Optional] Specify the learning task and corresponding learning objective:

Option	Description
'softmax' (Default)	For multiple-class classification.
'binomial'	Negative binomial likelihood, for binary classification.
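As a rough illustration of the two objectives, the sketch below computes a binary negative log-likelihood and a softmax cross-entropy in plain Python. The function names are hypothetical and the formulas are the textbook definitions; this reference does not state the exact expressions the function evaluates internally.

```python
import math

def binomial_loss(y, score):
    """Negative log-likelihood for a binary label y in {0, 1} given a raw score."""
    p = 1.0 / (1.0 + math.exp(-score))  # logistic link (assumed)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def softmax_loss(true_class, scores):
    """Cross-entropy for one data point given one raw score per class."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    p_true = exps[true_class] / sum(exps)
    return -math.log(p_true)
```

With two classes and equal scores, both losses reduce to log 2, which is one way to see that 'binomial' is the two-class special case of 'softmax'.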

PredictionType

[Optional] Specify whether the function predicts a class label ('classification') or a continuous response variable ('regression'). The function supports only 'classification'.

AttributeNameColumn

[Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the names of the attributes of the input data set.

AttributeValueColumn

[Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the values of the attributes of the input data set.
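To make the difference between dense and sparse input concrete, the following sketch reshapes one dense row into attribute-name and attribute-value pairs. The column names, the helper name, and the exact row layout (identifier, attribute, value, response) are assumptions for illustration only, not the layout the function requires.

```python
def dense_to_sparse(rows, id_col, response_col):
    """Turn dense rows (dicts) into (id, attribute, value, response) tuples."""
    sparse = []
    for row in rows:
        for name, value in row.items():
            if name in (id_col, response_col):
                continue  # identifier and response are carried on every output row
            sparse.append((row[id_col], name, value, row[response_col]))
    return sparse

# Hypothetical dense row with two predictors
dense = [{"id": 1, "age": 34, "color": "red", "label": "yes"}]
sparse_rows = dense_to_sparse(dense, "id", "label")
```

In this layout, the second element of each tuple plays the role of AttributeNameColumn and the third plays the role of AttributeValueColumn.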

RegularizationLambda

[Optional] Specify the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.

Default: 1
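The effect of lambda can be sketched with the leaf-weight formula commonly used in gradient-boosted trees, where the weight is the negative gradient sum divided by the Hessian sum plus lambda. Whether this function uses exactly this expression is an assumption; the helper name and the numbers are hypothetical.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda):
    """Leaf weight under L2 regularization (standard gradient-boosting form, assumed)."""
    return -grad_sum / (hess_sum + reg_lambda)

# Same leaf statistics, increasing regularization
weights = [leaf_weight(-8.0, 4.0, lam) for lam in (0.0, 1.0, 100.0)]
```

As lambda grows, the leaf weight shrinks toward zero, so each tree contributes a smaller correction.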

ShrinkageFactor

[Optional] Specify the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.

Default: 0.1
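A minimal sketch of how shrinkage damps each boosting step, assuming the usual additive update in which every tree's output is scaled before being added to the running score. The function name and the tree outputs are hypothetical.

```python
def boost(initial_score, tree_outputs, shrinkage):
    """Add each tree's output to the running score, scaled by the shrinkage factor."""
    score = initial_score
    for output in tree_outputs:
        score += shrinkage * output
    return score

full = boost(0.0, [1.0, 0.5, 0.25], shrinkage=1.0)    # no shrinkage
damped = boost(0.0, [1.0, 0.5, 0.25], shrinkage=0.1)  # conservative steps
```

A smaller shrinkage value generally needs more iterations (IterNum) to reach the same fit.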

ColumnSubSampling

[Optional] Specify the fraction of features to subsample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].

Default: 1.0 (no subsampling)
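The idea of feature subsampling can be sketched as follows. The helper name, the feature names, and the rounding rule (round up, keep at least one column) are assumptions; this reference does not say how the function rounds the column count.

```python
import math
import random

def subsample_columns(columns, sample_fraction, rng):
    """Pick a random subset of feature columns of the requested fraction."""
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")
    k = max(1, math.ceil(sample_fraction * len(columns)))  # rounding rule is assumed
    return rng.sample(columns, k)

features = ["age", "income", "tenure", "region"]
chosen = subsample_columns(features, 0.5, random.Random(42))
```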

IDColumn

[Optional] Used with NumBoostedTrees. Specify the name of the InputTable column that contains a unique identifier for each data point in the test data set.

NumBoostedTrees

[Optional] Requires IDColumn. Specify the number of parallel boosted trees. The num_trees is an INTEGER value in the range [1, 100]. If num_trees is greater than 1, each boosted tree is trained on a sample of the input data. The samples are determined by partitioning the input data, and the number of partitions equals the number of boosted trees. A higher num_trees value might improve function run time but decrease prediction accuracy.

Default: 1 if InputTable is a DIMENSION table; otherwise, the number of vworkers available in the cluster
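One plausible way to picture the partitioning is to assign each row to a partition by its identifier, one partition per boosted tree. The modulo scheme, the integer identifiers, and the helper name below are hypothetical; the actual partitioning method is not described here.

```python
def partition_rows(row_ids, num_trees):
    """Assign each row to one of num_trees partitions by its identifier."""
    partitions = [[] for _ in range(num_trees)]
    for row_id in row_ids:
        partitions[row_id % num_trees].append(row_id)  # modulo scheme is assumed
    return partitions

parts = partition_rows(range(10), 3)
```

Each partition holds only part of the data, which is consistent with the note that more trees can run faster but may predict less accurately.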

IterNum

[Optional] Specify the number of iterations (rounds) to boost the weak classifiers. The iterations must be an INTEGER in the range [1, 100000].

Default: 10

MinNodeSize

[Optional] Specify a decision-tree stopping criterion, the minimum size of any node within each decision tree. If the size of any node becomes less than min_node_size, the algorithm stops looking for splits. The min_node_size must be an INTEGER of at least 1.

Default: 1

MaxDepth

[Optional] Specify the decision-tree stopping criterion that has the greatest effect on function performance, the maximum tree depth. If the tree depth exceeds max_depth, the algorithm stops looking for splits. A decision tree can grow to 2^(max_depth+1) - 1 nodes. The max_depth must be an INTEGER in the range [1, 100000].

Default: 12
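Reading the node bound as 2 raised to the power (max_depth+1), minus 1, which is the node count of a full binary tree, a short calculation shows how quickly tree size grows with depth. The helper name is hypothetical.

```python
def max_nodes(max_depth):
    """Upper bound on nodes in a binary tree with the given maximum depth."""
    return 2 ** (max_depth + 1) - 1

sizes = {d: max_nodes(d) for d in (1, 5, 12)}
```

Here `sizes` maps depth 1 to 3 nodes, depth 5 to 63 nodes, and depth 12 to 8191 nodes.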

Variance

[Optional] Specify a decision-tree stopping criterion, the minimum variance for any node. If the variance within any node becomes less than variance, the algorithm stops looking for splits. The variance is a nonnegative DOUBLE PRECISION value.

Default: 0
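The three stopping criteria (MinNodeSize, MaxDepth, Variance) can be combined in a single check, sketched below. The function name is hypothetical, and treating the node contents as numeric values with population variance is an assumption; this reference does not specify which quantity the variance is computed over.

```python
from statistics import pvariance

def should_stop(node_values, depth, min_node_size=1, max_depth=12, variance=0.0):
    """Return True if any stopping criterion says to stop looking for splits."""
    if len(node_values) < min_node_size:
        return True  # node is smaller than the minimum size
    if depth > max_depth:
        return True  # tree depth exceeds the maximum
    return pvariance(node_values) < variance  # node is already nearly pure
```

With the default variance of 0, the last test never triggers, because a variance cannot be less than 0.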

Seed

[Optional] Specify the random seed the algorithm uses for repeatable results (for more information, see Nondeterministic Results). If you omit Seed or specify its default value, 1, the function uses a faster algorithm but does not ensure repeatability.

The seed must be a LONG value greater than or equal to 1. To ensure repeatability, specify a seed greater than 1.

Default: 1
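The role of a seed can be illustrated with Python's own random number generator: the same seed yields the same sample. This is only an analogy with a hypothetical helper; the function's internal generator is not described here, and unlike this sketch, the function does not ensure repeatability when the seed is 1.

```python
import random

def sample_rows(n_rows, k, seed):
    """Draw k distinct row indices from n_rows using a seeded generator."""
    rng = random.Random(seed)
    return rng.sample(range(n_rows), k)

a = sample_rows(100, 5, seed=7)
b = sample_rows(100, 5, seed=7)  # same seed, same sample as a
```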