XGBoost Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
OutputTable
[Optional] Specify the name for the model table that the function outputs.
Default: xgboost_model in the current schema
ResponseColumn
Specify the name of the InputTable column that contains the response variable for each data point in the training data set.
NumericInputs
[Not for sparse format input data. With dense format input data, required if you omit CategoricalInputs.] Specify the names of the InputTable columns to treat as the numeric predictor variables. These variables must be numeric values.
CategoricalInputs
[Not for sparse format input data. With dense format input data, required if you omit NumericInputs.] Specify the names of the InputTable columns to treat as the categorical predictor variables. These variables can be either numeric or VARCHAR values.
For information about columns that you must identify as numeric or categorical, see Identification of Numeric and Categorical Columns.
PredictionType
[Required with LossFunction ('mse'), optional otherwise.] Specify the prediction type.
Default: 'classification'
LossFunction
[Required with PredictionType ('regression'), optional otherwise.] Specify the learning task and corresponding learning objective:
PredictionType              Option               Description
'classification' (Default)  'softmax' (Default)  For multiple-class classification.
                            'binomial'           Negative binomial likelihood, for binary classification.
'regression'                'mse'                Mean squared error.
AttributeNameColumn
[Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the names of the attributes of the input data set.
AttributeValueColumn
[Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the values of the attributes of the input data set.
RegularizationLambda
[Optional] Specify the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
Default: 100000
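The reference does not document how the engine applies lambda internally. As an illustrative sketch only, XGBoost-style boosting typically folds the L2 term into each leaf weight as w = -G / (H + lambda), where G and H are the sums of gradients and Hessians in the leaf; the function and values below are hypothetical, chosen to show that a larger lambda shrinks leaf weights toward zero:

```python
# Hypothetical sketch of an XGBoost-style L2-regularized leaf weight;
# not the Vantage implementation.
def leaf_weight(grad_sum, hess_sum, reg_lambda):
    """Optimal leaf weight w = -G / (H + lambda)."""
    return -grad_sum / (hess_sum + reg_lambda)

# lambda = 0: no regularization
w0 = leaf_weight(-4.0, 2.0, 0.0)   # 2.0
# larger lambda gives a stronger shrinkage of the leaf weight
w1 = leaf_weight(-4.0, 2.0, 2.0)   # 1.0
```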
ShrinkageFactor
[Optional] Specify the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.
Default: 0.1
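The description above, that the algorithm multiplies each learned tree by the shrinkage factor, can be sketched as follows. This is an illustration of the general technique, not Vantage internals; the function name, base score, and tree outputs are made up:

```python
# Hedged sketch: each boosted tree's contribution is scaled by the
# shrinkage factor (learning rate) before being added to the prediction.
def boosted_prediction(base_score, tree_outputs, shrinkage=0.1):
    pred = base_score
    for out in tree_outputs:
        pred += shrinkage * out
    return pred

# shrinkage = 1 (no shrinkage): trees contribute fully
full = boosted_prediction(0.5, [0.2, -0.1, 0.3], shrinkage=1.0)   # 0.9
# default shrinkage 0.1: each boosting step is more conservative
slow = boosted_prediction(0.5, [0.2, -0.1, 0.3], shrinkage=0.1)   # 0.54
```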
ColumnSubSampling
[Optional] Specify the fraction of features to subsample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
Default: 1.0 (no subsampling)
IDColumn
[Required with NumBoostedTrees or Seed, or if InputTable is in sparse format; optional otherwise.] Specify the name of the InputTable column that contains a unique identifier for each data point in the training data set.
NumBoostedTrees
[Optional] Requires IDColumn. Specify the number of parallel boosted trees. The num_trees is an INTEGER value in the range [1, 10000]. If num_trees is greater than 1, each boosted tree operates on a sample of the input data, and the function estimates the sample size (number of rows) with this formula:
sample_size = total_number_of_input_rows / number_of_trees
The sample_size must fit in a vworker's memory.
A higher num_trees value might improve function run time but decrease prediction accuracy.
Default: 1 if InputTable is a DIMENSION table; otherwise, number of vworkers available in the cluster.
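The sample-size formula above can be checked with a short sketch; the row and tree counts here are arbitrary example values:

```python
# Sketch of the documented estimate:
# sample_size = total_number_of_input_rows / number_of_trees
def estimated_sample_size(total_rows, num_trees):
    # Integer division: each of the num_trees parallel boosted trees
    # trains on roughly this many rows.
    return total_rows // num_trees

estimated_sample_size(1_000_000, 4)   # 250000 rows per tree
```

Each per-tree sample of this size must fit in a vworker's memory, which is why raising num_trees can shorten run time at the cost of accuracy.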
IterNum
[Optional] Specify the number of iterations (rounds) to boost the weak classifiers. The iterations must be an INTEGER in the range [1, 100000].
Default: 10
MinNodeSize
[Optional] Specify a decision-tree stopping criterion, the minimum size of any node within each decision tree. If the size of any node becomes less than min_node_size, the algorithm stops looking for splits. The min_node_size must be an INTEGER of at least 1.
Default: 1
MaxDepth
[Optional] Specify the decision-tree stopping criterion that has the greatest effect on function performance, the maximum tree depth. If the tree depth exceeds max_depth, the algorithm stops looking for splits. A decision tree can grow to 2^(max_depth+1)-1 nodes. The max_depth must be an INTEGER in the range [1, 100000].
Default: 12
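The node bound cited above is the size of a full binary tree; a short sketch (the function name is illustrative) confirms the arithmetic:

```python
# A full binary tree of depth d has 2^(d + 1) - 1 nodes, the bound the
# MaxDepth description gives for how large each decision tree can grow.
def max_nodes(max_depth):
    return 2 ** (max_depth + 1) - 1

max_nodes(1)    # 3: root plus two children
max_nodes(12)   # 8191 nodes at the default MaxDepth
```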
Variance
[Optional] Specify a decision-tree stopping criterion, the minimum variance for any node. If the variance within any node becomes less than variance, the algorithm stops looking for splits. The variance is a nonnegative DOUBLE PRECISION value.
Default: 0
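The MinNodeSize and Variance criteria both stop the search for splits at a node; as a hedged illustration (the function and thresholds are hypothetical, not Vantage internals), the combined check amounts to:

```python
# Sketch of the documented stopping criteria: stop splitting a node when
# its size drops below MinNodeSize or its variance drops below Variance.
def should_stop(node_size, node_variance, min_node_size=1, min_variance=0.0):
    return node_size < min_node_size or node_variance < min_variance

should_stop(5, 0.02, min_node_size=10)    # True: node is too small to split
should_stop(50, 0.02, min_node_size=10)   # False: both criteria still met
```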
Seed
[Optional] Specify the random seed the algorithm uses for repeatable results. If you omit Seed, the function uses a faster algorithm but does not ensure repeatability.
The seed must be a LONG value greater than or equal to 1.
For repeatable results, use both the Seed and UniqueID syntax elements. For more information, see Nondeterministic Results and UniqueID Syntax Element.
OutputAccuracy
[Optional] Specify whether to show training accuracy over iterations in the output message.
Default: 'false'