- OUT clause
- Specifies the name of the output table that records training accuracy over iterations.
- ModelType
- Specifies whether the analysis is a regression (continuous response variable) or a multiple-class classification (predicting the result from a number of classes). Only Regression and Classification are accepted values.
- MaxDepth
- Specifies a decision tree stopping criterion. If the tree reaches a depth past this value, the algorithm stops looking for splits. Decision trees can grow to (2^(max_depth+1) - 1) nodes. This stopping criterion has the greatest effect on the performance of the function. The maximum value is 2147483647.
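To get a feel for how quickly the node bound above grows with MaxDepth, here is a small sketch of the arithmetic. The function name `max_tree_nodes` is made up for this illustration and is not part of the SQL function.

```python
def max_tree_nodes(max_depth: int) -> int:
    """Upper bound on node count for a binary tree: 2^(max_depth+1) - 1."""
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    return 2 ** (max_depth + 1) - 1


# Print the bound for a few depths to show the exponential growth.
for depth in (1, 5, 10, 20):
    print(f"MaxDepth={depth:>2}: at most {max_tree_nodes(depth):,} nodes")
```

Because the bound roughly doubles with every extra level, small increases in MaxDepth can have a large effect on memory use and run time.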
- MinNodeSize
- Specifies a decision tree stopping criterion; the minimum size of any node within each decision tree.
- NumParallelTrees
- Specifies the number of parallel boosted trees. The num_trees value is an INTEGER in the range [1, 10000]. Each boosted tree operates on a sample of data that fits in an AMP's memory. By default, NumBoostedTrees is chosen equal to the number of AMPs with data.
- RegularizationLambda
- Specifies the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
- LearningRate
- Specifies the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage. You can still use the previous argument name ShrinkageFactor.
- ColumnSampling
- Specifies the fraction of features to sample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
- CoverageFactor
- Specifies the coverage level for the dataset while boosting trees (in percentage, for example, 1.25 = 125% coverage). You can only use CoverageFactor if you do not supply NumBoostedTrees. When NumBoostedTrees is specified, coverage depends on the value of NumBoostedTrees. If NumBoostedTrees is not specified, NumBoostedTrees is chosen to achieve this level of coverage.
- NumBoostRounds
- Specifies the number of iterations (rounds) to boost the weak classifiers. The number of iterations must be an INTEGER in the range [1, 100000].
- Seed
- Specifies an integer value to use in determining the random seed for column sampling.
- BaseScore
- Specifies the initial prediction value for all data points. Typically, that value is set to the mean of the observed values in the training set. This information is shown in the meta row in the model table. For classification, the BaseScore value must be in the range (0, 1) and the default value is 0.5. For regression, any double value in the range [-1e50, 1e50] is accepted and the default value is 0.
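The numeric ranges listed so far can be collected into a pre-flight check before submitting a query. The sketch below is a hypothetical client-side helper (`validate_args` is an invented name, not part of the function), and it assumes ModelType is matched case-insensitively, which the text does not state.

```python
def validate_args(args: dict) -> list:
    """Return a list of range violations for the documented arguments."""
    errors = []

    # Assumption: ModelType is compared case-insensitively.
    model_type = str(args.get("ModelType", "")).lower()
    if model_type not in ("regression", "classification"):
        errors.append("ModelType must be Regression or Classification")

    # (argument name, lower bound, upper bound, lower bound inclusive?)
    ranges = [
        ("NumParallelTrees", 1, 10000, True),
        ("RegularizationLambda", 0, 100000, True),
        ("LearningRate", 0, 1, False),
        ("ColumnSampling", 0, 1, False),
        ("NumBoostRounds", 1, 100000, True),
    ]
    for name, low, high, low_inclusive in ranges:
        if name not in args:
            continue  # argument not supplied, nothing to check
        value = args[name]
        above_low = value >= low if low_inclusive else value > low
        if not (above_low and value <= high):
            errors.append(f"{name}={value} is outside the documented range")

    # BaseScore limits depend on the model type.
    if "BaseScore" in args:
        score = args["BaseScore"]
        if model_type == "classification" and not (0 < score < 1):
            errors.append("BaseScore must be in (0, 1) for classification")
        elif model_type == "regression" and not (-1e50 <= score <= 1e50):
            errors.append("BaseScore must be in [-1e50, 1e50] for regression")

    return errors


print(validate_args({"ModelType": "Classification", "LearningRate": 0.0}))
```

The helper only mirrors the ranges documented here; the database remains the authority on what it accepts.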
- MinImpurity
- Specifies the minimum impurity at which the tree stops splitting further down. For regression, the squared error criterion is used, whereas for classification, Gini impurity is used.
- TreeSize
- Specifies the number of rows that each tree uses as its input data set. The function builds a tree using either the number of rows on an AMP, the number of rows that fit into the AMP memory (whichever is less), or the number of rows given by the TreeSize argument. By default, this value is computed as the minimum of the number of rows on an AMP and the number of rows that fit into the AMP memory. Setting the TreeSize argument and reducing the value of the NumParallelTrees argument resolves most exceptions caused by out-of-memory (OOM) conditions. For example, this is a typical exception caused by OOM.
- DataRedistributionColumn
Specifies the name of the column used to redistribute the data. The maximum value is 128.
- If the number of unique values in this column is less than "total number of input rows / MinRowsPerAmp", the rows in the input table are distributed across a number of AMPs equal to the number of unique values in this column.
- If the number of unique values in this column is greater than "total number of input rows / MinRowsPerAmp", the rows in the input table are distributed across a number of AMPs equal to "total number of input rows / MinRowsPerAmp".
- MinRowsPerAmp
- Specifies the minimum number of rows (input table rows) an AMP should have when data needs to be redistributed using DataRedistributionColumn.
Default: 100
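
The two DataRedistributionColumn rules above amount to taking the smaller of two quantities. The following sketch illustrates that reading. The function name is hypothetical, integer division is an assumption (the text does not say how the quotient is rounded), and any cap imposed by the number of AMPs actually available on the system is not modeled.

```python
def redistribution_amp_count(total_rows: int,
                             unique_values: int,
                             min_rows_per_amp: int = 100) -> int:
    """Number of AMPs that receive rows, per the two rules described above."""
    if min_rows_per_amp <= 0:
        raise ValueError("min_rows_per_amp must be positive")
    # Assumption: the quotient is rounded down to a whole number of AMPs.
    rows_limit = total_rows // min_rows_per_amp
    # Fewer unique values than the limit: one AMP per unique value.
    # More unique values than the limit: the limit applies instead.
    return min(unique_values, rows_limit)


# 10,000 rows with the default MinRowsPerAmp of 100 gives a limit of 100 AMPs.
print(redistribution_amp_count(10_000, unique_values=20))
print(redistribution_amp_count(10_000, unique_values=500))
```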