- OutputTable
- [Optional] Specify the name for the model table that the function outputs.
- ResponseColumn
- Specify the name of the InputTable column that contains the response variable for each data point in the training data set.
- NumericInputs
- [Not for sparse format input data. With dense format input data, required if you omit CategoricalInputs.] Specify the names of the InputTable columns to treat as the numeric predictor variables. These variables must be numeric values.
- CategoricalInputs
- [Not for sparse format input data. With dense format input data, required if you omit NumericInputs.] Specify the names of the InputTable columns to treat as the categorical predictor variables. These variables can be either numeric or VARCHAR values.
- LossFunction
- [Optional] Specify the learning task and corresponding learning objective:
  | Option | Description |
  |--------|-------------|
  | 'softmax' | (Default) For multiple-class classification. |
  | 'binomial' | Negative binomial likelihood, for binary classification. |
- PredictionType
- [Optional] The function supports only 'classification'.
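The two LossFunction options above correspond to different ways of turning boosted scores into class probabilities. The sketch below is a minimal illustration of those two link functions in Python; the function names are ours and are not part of this function's API:

```python
import math

def softmax(scores):
    # Multi-class case: convert per-class scores to probabilities.
    # Subtracting the max before exp() keeps the computation stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def binomial_prob(score):
    # Binary case: logistic link mapping a score to the
    # probability of the positive class.
    return 1.0 / (1.0 + math.exp(-score))

print(softmax([2.0, 1.0, 0.1]))  # three probabilities summing to 1
print(binomial_prob(0.0))        # score 0 maps to probability 0.5
```

'softmax' generalizes the binary case: with exactly two classes, the two choices yield equivalent classifications.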
- AttributeNameColumn
- [Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the names of the attributes of the input data set.
- AttributeValueColumn
- [Required if the input data set is in sparse format] Specify the name of the InputTable column that contains the values of the attributes of the input data set.
- RegularizationLambda
- [Optional] Specify the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
- ShrinkageFactor
- [Optional] Specify the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.
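The update rule described above can be sketched as follows; this is a conceptual illustration only, with invented names, not the function's actual implementation:

```python
def boosted_prediction(base, tree_outputs, shrinkage):
    # Each boosting step adds the new tree's output scaled by the
    # shrinkage factor; shrinkage = 1 means the tree's full output
    # is used (no damping).
    pred = base
    for out in tree_outputs:
        pred += shrinkage * out
    return pred

print(boosted_prediction(0.0, [1.0, 0.5, 0.25], 1.0))  # 1.75
print(boosted_prediction(0.0, [1.0, 0.5, 0.25], 0.1))  # much smaller steps
```

Smaller shrinkage values slow learning, which usually requires more boosting iterations but tends to generalize better.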
- ColumnSubSampling
- [Optional] Specify the fraction of features to subsample during each boosting step. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
- IDColumn
- [Optional] Used with NumBoostedTrees. Specify the name of the InputTable column that contains a unique identifier for each data point in the training data set.
- NumBoostedTrees
- [Optional] Requires IDColumn. Specify the number of parallel boosted trees. The num_trees is an INTEGER value in the range [1, 10000]. If num_trees is greater than 1, each boosted tree is trained on a sample of the input data, and the function estimates the sample size (number of rows) using this formula:
- IterNum
- [Optional] Specify the number of iterations (rounds) to boost the weak classifiers. The iterations must be an INTEGER in the range [1, 100000].
- MinNodeSize
- [Optional] Specify a decision-tree stopping criterion, the minimum size of any node within each decision tree. If the size of any node becomes less than min_node_size, the algorithm stops looking for splits. The min_node_size must be an INTEGER of at least 1.
- MaxDepth
- [Optional] Specify the decision-tree stopping criterion that has the greatest effect on function performance, the maximum tree depth. If the tree depth exceeds max_depth, the algorithm stops looking for splits. A decision tree can grow to 2^(max_depth+1) - 1 nodes. The max_depth must be an INTEGER in the range [1, 100000].
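The node-count bound above follows from the fact that a full binary tree of depth d has 2^(d+1) - 1 nodes. A quick check of a few depths:

```python
def max_nodes(max_depth):
    # A full binary tree of depth d has at most 2^(d+1) - 1 nodes:
    # one root, two children, four grandchildren, and so on.
    return 2 ** (max_depth + 1) - 1

for d in (1, 5, 10):
    print(d, max_nodes(d))  # 1 -> 3, 5 -> 63, 10 -> 2047
```

Because the bound is exponential in max_depth, even modest increases in depth can greatly increase memory use and run time.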
- Variance
- [Optional] Specify a decision-tree stopping criterion, the minimum variance for any node. If the variance within any node becomes less than variance, the algorithm stops looking for splits. The variance is a nonnegative DOUBLE PRECISION value.
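The MinNodeSize and Variance stopping criteria described above can be sketched together; this is a hypothetical illustration of the two tests, not the function's actual split-search code:

```python
def should_stop(node_labels, min_node_size, min_variance):
    # Stop splitting a node if it is too small (MinNodeSize)
    # or too pure, i.e. its response variance is too low (Variance).
    n = len(node_labels)
    if n < min_node_size:
        return True
    mean = sum(node_labels) / n
    variance = sum((y - mean) ** 2 for y in node_labels) / n
    return variance < min_variance

print(should_stop([1.0, 1.0, 1.0, 1.0], 2, 0.01))  # True: zero variance
print(should_stop([0.0, 1.0, 0.0, 1.0], 2, 0.01))  # False: keep splitting
```

Raising either threshold prunes the tree earlier, trading some training accuracy for faster, more conservative models.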
- Seed
- [Optional] Specify the random seed the algorithm uses for repeatable results. If you omit Seed, the function uses a faster algorithm but does not ensure repeatability.