- ModelType
- Specifies whether the analysis is a regression (continuous response variable) or a multiple-class classification (predicting the result from a number of classes). The only accepted values are Regression and Classification.
- Default: Regression.
- MaxDepth
- Specifies a decision tree stopping criterion. If the tree reaches a depth past this value, the algorithm stops looking for splits. Decision trees can grow to (2^(max_depth+1) - 1) nodes. This stopping criterion has the greatest effect on the performance of the function. The maximum value is 2147483647.
- Default: 5
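To get a feel for how quickly the node bound grows with MaxDepth, the following minimal Python sketch evaluates the formula from the description. The name `max_tree_nodes` is hypothetical and not part of the function's interface; the sketch assumes a binary tree whose root sits at depth 0.

```python
def max_tree_nodes(max_depth: int) -> int:
    """Upper bound on the node count of a binary tree of the given depth."""
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    # Formula from the MaxDepth description: 2^(max_depth + 1) - 1
    return 2 ** (max_depth + 1) - 1


if __name__ == "__main__":
    for depth in (1, 5, 10):
        print(f"MaxDepth={depth}: at most {max_tree_nodes(depth)} nodes")
```

With the default MaxDepth of 5 the bound is 63 nodes; each additional level roughly doubles it, which is why this argument dominates run time.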
- MinNodeSize
- Specifies a decision tree stopping criterion: the minimum size of any node within each decision tree.
- Default: 1
- NumParallelTrees
- Specifies the number of boosted trees to build in parallel. The num_trees is an INTEGER value in the range [1, 10000]. Each boosted tree operates on a sample of data that fits in AMP memory. By default, NumParallelTrees equals the number of AMPs with data.
- If NumParallelTrees is greater than the number of AMPs with data, each boosted tree operates on a sample of the input data, and the function estimates the sample size (number of rows) using this formula: sample_size = total_number_of_input_rows / number_of_trees
- The sample_size must fit in AMP memory. The function always builds tree models from a sample size (or tree size) that fits in AMP memory and ignores rows that cannot fit in memory.
- A higher NumParallelTrees value may improve function run time but may decrease prediction accuracy.
You can still use the previous argument name NumBoostedTrees.
- Default: -1
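The sample-size formula above can be sketched in Python as follows. The helper name `estimated_sample_size` is hypothetical, and rounding down to a whole row count is an assumption; the description gives only the division.

```python
def estimated_sample_size(total_rows: int, num_trees: int) -> int:
    """Estimate rows per boosted tree: total_number_of_input_rows / number_of_trees."""
    if not 1 <= num_trees <= 10000:
        raise ValueError("num_trees must be in the range [1, 10000]")
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    # Assumption: the result is truncated to a whole number of rows.
    return total_rows // num_trees


if __name__ == "__main__":
    print(estimated_sample_size(1_000_000, 40))
```

Under these assumptions, one million input rows split across 40 trees gives 25,000 rows per tree; that sample must still fit in AMP memory.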
- RegularizationLambda
- Specifies the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
- Default: 1
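As an illustration of why a larger lambda regularizes more strongly, the sketch below uses the leaf-weight formula from the standard XGBoost formulation, w = -G / (H + lambda), where G and H are the sums of first and second derivatives of the loss over the rows in a leaf. Whether this function computes leaf weights in exactly this way is an assumption; the name `leaf_weight` is hypothetical.

```python
def leaf_weight(grad_sum: float, hess_sum: float, reg_lambda: float) -> float:
    """Leaf weight under L2 regularization (standard XGBoost form, assumed here)."""
    if not 0 <= reg_lambda <= 100000:
        raise ValueError("reg_lambda must be in the range [0, 100000]")
    denominator = hess_sum + reg_lambda
    if denominator == 0:
        raise ValueError("hess_sum + reg_lambda must be non-zero")
    return -grad_sum / denominator


if __name__ == "__main__":
    # The same gradient statistics produce smaller weights as lambda grows.
    for lam in (0.0, 1.0, 100.0):
        print(lam, leaf_weight(grad_sum=-8.0, hess_sum=4.0, reg_lambda=lam))
```

A lambda of 0 leaves the weight unshrunk, matching the statement that 0 specifies no regularization.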
- LearningRate
- Specifies the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by the shrinkage value to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.
You can still use the previous argument name ShrinkageFactor.
- Default: 0.5
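The effect of shrinkage on a single boosting step can be sketched as below. This is a minimal illustration assuming the usual additive update (new prediction = old prediction + learning rate × tree output); the name `boost_step` is hypothetical.

```python
from typing import List


def boost_step(predictions: List[float],
               tree_outputs: List[float],
               learning_rate: float = 0.5) -> List[float]:
    """Add one tree's outputs to the running predictions, scaled by the learning rate."""
    if not 0 < learning_rate <= 1:
        raise ValueError("learning_rate must be in the range (0, 1]")
    if len(predictions) != len(tree_outputs):
        raise ValueError("predictions and tree_outputs must have the same length")
    return [p + learning_rate * t for p, t in zip(predictions, tree_outputs)]


if __name__ == "__main__":
    print(boost_step([0.0, 0.0], [2.0, -4.0], learning_rate=0.5))
    print(boost_step([0.0, 0.0], [2.0, -4.0], learning_rate=1.0))  # no shrinkage
```

Lower learning rates make each tree contribute less, so more boosting rounds are typically needed to reach the same fit.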
- ColumnSampling
- Specifies the fraction of features to sample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
- Default: 1.0
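Column sampling can be pictured with the sketch below, which draws a reproducible subset of feature names for a given fraction and seed. The name `sample_columns`, the rounding rule (round up, at least one column), and the use of Python's `random` module are all assumptions for illustration; the function's actual sampling procedure is not described here.

```python
import math
import random
from typing import List


def sample_columns(columns: List[str], sample_fraction: float, seed: int = 1) -> List[str]:
    """Pick a seeded random subset of columns, keeping their original order."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in the range (0, 1]")
    # Assumption: round up and keep at least one column.
    k = max(1, math.ceil(sample_fraction * len(columns)))
    rng = random.Random(seed)
    chosen = set(rng.sample(columns, k))
    return [c for c in columns if c in chosen]


if __name__ == "__main__":
    features = ["age", "income", "tenure", "region", "score", "visits"]
    print(sample_columns(features, sample_fraction=0.5, seed=1))
```

Reusing the same seed yields the same subset, which is the role the Seed argument plays for column sampling.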
- CoverageFactor
- Specifies the coverage level for the dataset while boosting trees, expressed as a fraction (for example, 1.25 = 125% coverage). You can use CoverageFactor only if you do not supply NumParallelTrees. When NumParallelTrees is specified, coverage depends on its value. When NumParallelTrees is not specified, the function chooses NumParallelTrees to achieve this level of coverage.
- Default: 1.0
- NumBoostRounds
- Specifies the number of iterations (rounds) used to boost the weak classifiers. The value must be an INTEGER in the range [1, 100000].
You can still use the previous argument name IterNum.
- Default: 10
- Seed
- Specifies an integer value to use in determining the random seed for column sampling.
- Default: 1
- BaseScore
- Specifies the initial prediction value for all data points. Typically, this value is set to the mean of the observed values in the training set. This information is shown in the meta row of the model table. For classification, the BaseScore value must be in the range (0, 1), and the default is 0.5. For regression, the value can be any DOUBLE PRECISION value in the range [-1e50, 1e50], and the default is 0.
- Default: 0 (regression), 0.5 (classification)
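A candidate BaseScore can be derived from the training responses as in this sketch, which takes the mean and checks it against the ranges stated above. The name `suggested_base_score` is hypothetical, and the classification branch assumes labels encoded as 0 and 1 so that the mean is the positive-class fraction.

```python
from statistics import fmean
from typing import Sequence


def suggested_base_score(responses: Sequence[float], model_type: str = "Regression") -> float:
    """Return the mean response, validated against the documented BaseScore range."""
    mean = fmean(responses)
    if model_type == "Classification":
        # Assumption: labels are 0/1, so the mean lies between 0 and 1.
        if not 0 < mean < 1:
            raise ValueError("classification BaseScore must be in the range (0, 1)")
    elif model_type == "Regression":
        if not -1e50 <= mean <= 1e50:
            raise ValueError("regression BaseScore must be in the range [-1e50, 1e50]")
    else:
        raise ValueError("model_type must be 'Regression' or 'Classification'")
    return mean


if __name__ == "__main__":
    print(suggested_base_score([1.0, 2.0, 3.0]))
    print(suggested_base_score([0, 1, 1, 1], "Classification"))
```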
- MinImpurity
- Specifies the minimum impurity at which the tree stops splitting further. For regression, the squared error criterion is used; for classification, Gini impurity is used.
- Default: 0.0
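The two impurity measures named above can be computed as in the following sketch. The function names are hypothetical, and using the mean (rather than the sum) of squared deviations for the regression criterion is an assumption; the description does not say which normalization the function applies.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def squared_error(values: Sequence[float]) -> float:
    """Mean squared deviation of the values from their mean (assumed normalization)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


if __name__ == "__main__":
    print(gini_impurity([0, 0, 1, 1]))   # evenly mixed node
    print(gini_impurity([1, 1, 1]))      # pure node
    print(squared_error([1.0, 3.0]))
```

A pure node has impurity 0, so with the default MinImpurity of 0.0 a node stops splitting only once it is pure or another stopping criterion applies.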
- TreeSize
- Specifies the number of rows that each tree uses as its input data set. The function builds a tree using either the number of rows on an AMP or the number of rows that fit into AMP memory (whichever is less), or the number of rows given by the TreeSize argument. By default, this value is computed as the minimum of the number of rows on an AMP and the number of rows that fit into AMP memory.
Specifying TreeSize and reducing the value of NumParallelTrees resolves most exceptions caused by out-of-memory (OOM) conditions. For example, this is a typical exception caused by OOM.
- Default: -1
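The TreeSize rule described above can be sketched as follows. The name `effective_tree_size` is hypothetical, and treating -1 as "compute the default" is an interpretation of the default value; whether an explicit TreeSize is additionally capped by available memory is not stated, so this sketch returns it unchanged.

```python
def effective_tree_size(rows_on_amp: int,
                        rows_fitting_in_memory: int,
                        tree_size: int = -1) -> int:
    """Rows used per tree: an explicit TreeSize, else min(rows on AMP, rows that fit)."""
    if tree_size == -1:
        # Default: the smaller of the rows on the AMP and the rows that fit in memory.
        return min(rows_on_amp, rows_fitting_in_memory)
    if tree_size < 1:
        raise ValueError("tree_size must be -1 or a positive row count")
    # Assumption: an explicit value is used as given.
    return tree_size


if __name__ == "__main__":
    print(effective_tree_size(50_000, 30_000))
    print(effective_tree_size(50_000, 30_000, tree_size=10_000))
```

Choosing a TreeSize well below the number of rows that fit in memory leaves headroom, which is the mechanism behind the OOM advice above.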