XGBoost_Drive Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

Product Category

Software

InputTable

Specifies the name of the table that contains the training data set.

OutputTable

[Optional] Specifies the name of the output table where the function stores the predictive model that it generates. If the database already has a table with this name, the DropOutputTable argument value determines whether the function drops the existing table. Default: 'xgboost_model'.

ResponseColumn

Specifies the name of the input_table column that contains the response variable for each data point in the training data set.

NumericInputs

[Required if you omit CategoricalInputs.] Specifies the names of the input_table columns that contain the numeric predictor variables. These variables must be numeric values.

CategoricalInputs

[Required if you omit NumericInputs.] Specifies the names of the input_table columns that contain the categorical predictor variables. These variables can be either numeric or VARCHAR values. Each categorical_column can have at most 20 distinct values.
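
The 20-distinct-value limit on categorical columns can be checked before calling the function. The following is a minimal sketch, assuming the training rows are available client-side as Python dictionaries; the helper name `columns_over_limit` and the sample data are hypothetical and not part of Aster Analytics.

```python
from collections import defaultdict

MAX_DISTINCT = 20  # limit stated in the CategoricalInputs description


def columns_over_limit(rows, categorical_columns, limit=MAX_DISTINCT):
    """Return {column: distinct_count} for columns that exceed the limit."""
    seen = defaultdict(set)
    for row in rows:
        for col in categorical_columns:
            seen[col].add(row[col])
    return {col: len(vals) for col, vals in seen.items() if len(vals) > limit}


# Hypothetical sample: "color" has 3 distinct values, "zip" has 25.
rows = [{"color": f"c{i % 3}", "zip": f"z{i}"} for i in range(25)]
print(columns_over_limit(rows, ["color", "zip"]))
```

A column reported by this check would need to be binned or dropped before it is listed in CategoricalInputs.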

LossFunction

[Optional] Specifies the learning task and corresponding learning objective:

'softmax' (Default): For multiple-class classification.
'binomial': Negative binomial likelihood, for binary classification.
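
The source does not say how the function converts tree scores into class probabilities. As an illustration only, the sketch below shows the transforms conventionally paired with these two objectives: softmax for multiple classes and the logistic (sigmoid) function for binary classification. Treating these as what XGBoost_Drive does internally is an assumption.

```python
import math


def softmax(scores):
    """Map one raw score per class to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Map a single raw score to the probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-score))


print(softmax([1.0, 2.0, 3.0]))  # three probabilities, largest for the last class
print(sigmoid(0.0))              # a zero score gives probability 0.5
```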

PredictionType

[Optional] Specifies whether the function predicts a class label ('classification') or a continuous response value ('regression'). The function supports only 'classification'.

RegularizationLambda

[Optional] Specifies the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization. Default: 100000.
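
To see why a larger lambda means stronger regularization, consider the leaf-weight formula from the general XGBoost algorithm, w = -G / (H + lambda), where G and H are the sums of the first and second derivatives of the loss over the rows in a leaf. The sketch below assumes XGBoost_Drive follows this standard formulation, which the source does not confirm; the function name `leaf_weight` is hypothetical.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda):
    """Leaf weight in standard XGBoost: -G / (H + lambda).

    Assumes hess_sum + reg_lambda > 0.
    """
    return -grad_sum / (hess_sum + reg_lambda)


# Same leaf statistics, increasing lambda: the weight shrinks toward 0.
for lam in (0.0, 5.0, 100000.0):
    print(lam, leaf_weight(-10.0, 5.0, lam))
```

With lambda at the top of the documented range, leaf weights are close to zero, so each tree contributes very little to the prediction.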

ShrinkageFactor

[Optional] Specifies the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage. Default: 0.1.
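
The effect of shrinkage can be sketched as follows: each tree's output is multiplied by the shrinkage factor before it is added to the running score. This is a simplified illustration with made-up tree outputs, not the function's actual implementation.

```python
def boosted_score(tree_outputs, shrinkage):
    """Accumulate per-tree outputs, scaling each one by the shrinkage factor."""
    if not 0.0 < shrinkage <= 1.0:
        raise ValueError("shrinkage must be in the range (0, 1]")
    score = 0.0
    for output in tree_outputs:
        score += shrinkage * output
    return score


outputs = [1.0, 0.5, 0.25]          # hypothetical outputs of three trees
print(boosted_score(outputs, 1.0))  # no shrinkage
print(boosted_score(outputs, 0.1))  # default: each step counts one tenth as much
```

A smaller shrinkage factor usually needs more boosting iterations (IterNum) to reach the same fit.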

ColumnSubSampling

[Optional] Specifies the fraction of features to subsample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1]. Default: 1.0 (no subsampling).
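
Column subsampling can be pictured as drawing a seeded random subset of the predictor columns. The sketch below is an assumption-laden illustration: the rounding rule, the minimum of one column, and the helper name `sample_columns` are hypothetical, and the function's real sampling procedure is not described in the source.

```python
import random


def sample_columns(columns, sample_fraction, seed=1):
    """Pick a reproducible random subset of the predictor columns."""
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in the range (0, 1]")
    # Assumed rounding rule; always keep at least one column.
    k = max(1, round(len(columns) * sample_fraction))
    rng = random.Random(seed)  # same seed, same subset
    chosen = rng.sample(columns, k)
    return sorted(chosen, key=columns.index)  # keep original column order


cols = ["age", "income", "tenure", "region"]  # hypothetical column names
print(sample_columns(cols, 0.5, seed=1))
print(sample_columns(cols, 1.0))  # a fraction of 1.0 keeps every column
```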

IDColumn

[Optional] Used with NumBoostedTrees. Specifies the name of the input_table column that contains a unique identifier for each data point in the training data set.

NumBoostedTrees

[Optional] Requires IDColumn. Specifies the number of parallel boosted trees. The num_trees is an INTEGER value in the range [1, 100]. If num_trees is greater than 1, each boosting process operates on its own sample of the input data. Samples are determined by input data partitioning, and the number of partitions equals the number of boosted trees. A higher num_trees value might improve function run time but decrease prediction accuracy. Default: 1 if input_table is a DIMENSION table; otherwise, the number of vworkers available in the cluster.
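
One way to picture the partitioning is that each row is assigned to exactly one of num_trees partitions based on its identifier, and each partition feeds one boosted tree. The source does not state the actual partitioning rule; the modulo rule and integer identifiers below are assumptions made purely for illustration.

```python
def partition_rows(row_ids, num_trees):
    """Assign each integer row ID to one of num_trees partitions (assumed rule)."""
    if not 1 <= num_trees <= 100:
        raise ValueError("num_trees must be in the range [1, 100]")
    partitions = [[] for _ in range(num_trees)]
    for row_id in row_ids:
        partitions[row_id % num_trees].append(row_id)
    return partitions


parts = partition_rows(range(10), 3)
print(parts)  # three disjoint partitions that together cover all ten rows
```

Because each tree sees only its own partition, more partitions mean less data per tree, which is consistent with the trade-off between run time and accuracy described above.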

IterNum

[Optional] Specifies the number of iterations (rounds) to boost the weak classifiers. The iterations must be an INTEGER in the range [1, 100000]. Default: 10.

MinNodeSize

[Optional] Specifies a decision-tree stopping criterion, the minimum size of any node within each decision tree. If the size of any node becomes less than min_node_size, the algorithm stops looking for splits. The min_node_size must be an INTEGER of at least 1. Default: 1.

MaxDepth

[Optional] Specifies the decision-tree stopping criterion that has the greatest effect on function performance, the maximum tree depth. If the tree depth exceeds max_depth, the algorithm stops looking for splits. A decision tree can grow to 2^(max_depth+1) - 1 nodes. The max_depth must be an INTEGER in the range [1, 100000]. Default: 12.
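
The node bound is 2 raised to the power (max_depth + 1), minus 1, which is the size of a full binary tree whose root is at depth 0. A short calculation shows how quickly the bound grows with depth; the helper name `max_nodes` is hypothetical.

```python
def max_nodes(max_depth):
    """Upper bound on nodes in a binary tree of the given depth (root at depth 0)."""
    return 2 ** (max_depth + 1) - 1


for depth in (1, 5, 12):
    print(depth, max_nodes(depth))
# 1 3
# 5 63
# 12 8191
```

Each additional level roughly doubles the bound, which is why max_depth has such a strong effect on run time.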

Variance

[Optional] Specifies a decision-tree stopping criterion, the minimum variance for any node. If the variance of any node becomes less than the specified variance value, the algorithm stops looking for splits. The variance is a nonnegative DOUBLE PRECISION value. Default: 0.
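
The three stopping criteria (MinNodeSize, MaxDepth, and Variance) can be combined into a single check, sketched below with the documented defaults. Which node values the variance is computed over is not stated in the source; the population variance of the values passed in is an assumption, and `should_stop` is a hypothetical name.

```python
from statistics import pvariance


def should_stop(node_values, depth, min_node_size=1, max_depth=12, min_variance=0.0):
    """Return True if any documented stopping criterion applies to this node."""
    if len(node_values) < min_node_size:
        return True  # node is smaller than MinNodeSize
    if depth > max_depth:
        return True  # tree depth exceeds MaxDepth
    if node_values and pvariance(node_values) < min_variance:
        return True  # node variance is below the Variance threshold
    return False


print(should_stop([1.0, 2.0, 3.0], depth=3))              # False: keep splitting
print(should_stop([1.0, 2.0, 3.0], depth=13))             # True: too deep
print(should_stop([1.0, 1.0], depth=2, min_variance=0.1)) # True: variance is 0
```

With the default Variance of 0, the variance criterion never triggers, because a variance cannot be negative.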

Seed

[Optional] Specifies a value to use in determining the seed for the random number generator. If you specify this value, you can specify the same value in future calls to this function, and the function builds the same tree. The function uses the seed for column sampling. The seed must be a LONG value of at least 1. Default: 1.

DropOutputTable

[Optional] Specifies whether to drop output_table if it exists. Default: 'false'.
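
The ranges and dependencies listed for the arguments above can be collected into a client-side check that runs before the function is called. This is a minimal sketch: the `RANGES` table and `check_arguments` helper are hypothetical, and treating InputTable and ResponseColumn as required is inferred from the absence of an [Optional] tag on those arguments.

```python
INF = float("inf")

# name: (lower bound, upper bound, lower bound inclusive?); upper bounds are inclusive
RANGES = {
    "RegularizationLambda": (0, 100000, True),
    "ShrinkageFactor": (0, 1, False),
    "ColumnSubSampling": (0, 1, False),
    "NumBoostedTrees": (1, 100, True),
    "IterNum": (1, 100000, True),
    "MinNodeSize": (1, INF, True),
    "MaxDepth": (1, 100000, True),
    "Variance": (0, INF, True),
    "Seed": (1, INF, True),
}


def check_arguments(args):
    """Return a list of problems found in a dict of argument values."""
    errors = []
    for name in ("InputTable", "ResponseColumn"):
        if name not in args:
            errors.append(f"{name} is required")
    if "NumericInputs" not in args and "CategoricalInputs" not in args:
        errors.append("specify NumericInputs, CategoricalInputs, or both")
    if "NumBoostedTrees" in args and "IDColumn" not in args:
        errors.append("NumBoostedTrees requires IDColumn")
    for name, (low, high, low_inclusive) in RANGES.items():
        if name not in args:
            continue
        value = args[name]
        above_low = value >= low if low_inclusive else value > low
        if not (above_low and value <= high):
            errors.append(f"{name}={value} is outside the documented range")
    return errors


good = {"InputTable": "train", "ResponseColumn": "y",
        "NumericInputs": ["x1", "x2"], "ShrinkageFactor": 0.1}
print(check_arguments(good))                              # []
print(check_arguments({**good, "ShrinkageFactor": 0.0}))  # one range error
```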