- ResponseColumn
- Specify the name of the column that contains the response variable (that is, the quantity that you want to predict).
- NumericInputs
- [Required if CategoricalInputs is omitted.] Specify the names of the columns that contain the numeric predictor variables (which must be numeric values).
- CategoricalInputs
- [Required if NumericInputs is omitted.] Specify the names of the columns that contain the categorical predictor variables (which can be either numeric or VARCHAR values).
Each categorical input column can have at most max_cat_values distinct categorical values. If max_cat_values exceeds 20, the function might run out of memory, because classification trees grow rapidly as max_cat_values increases.
For information about columns that you must identify as categorical, see Identification of Categorical Columns.
- TreeType
- [Optional] Specify whether the analysis is a regression (continuous response variable) or a multiple-class classification (the response variable is one of a fixed number of classes).
- NumTrees
- [Optional] Specify the number of trees to grow in the forest model. When specified, number_of_trees must be greater than or equal to the number of vworkers.
When not specified, the function builds the minimum number of trees that provides the input data set with full coverage.
- TreeSize
- [Optional] Specify the number of rows that each tree uses as its input data set.
- MinNodeSize
- [Optional] Specify a decision tree stopping criterion: the minimum size of any node within each decision tree.
- Variance
- [Optional] Specify a decision tree stopping criterion. If the variance within any node dips below this value, the algorithm stops looking for splits in the branch.
- MaxDepth
- [Optional] Specify a decision tree stopping criterion. If the tree reaches a depth past this value, the algorithm stops looking for splits. Decision trees can grow to (2^(max_depth+1) - 1) nodes. This stopping criterion has the greatest effect on the performance of the function.
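The three stopping criteria (MinNodeSize, Variance, MaxDepth) and the node bound for MaxDepth, read here as 2^(max_depth+1) - 1 for a binary tree whose root is at depth 0, can be illustrated with a minimal Python sketch. It is not the function's implementation. The names `max_nodes` and `should_stop` are hypothetical, and the boundary handling (`<=`, `<`, `>=`) and the use of population variance are assumptions that the argument descriptions do not confirm.

```python
from statistics import pvariance


def max_nodes(max_depth):
    """Upper bound on the node count of a binary tree with root at depth 0."""
    return 2 ** (max_depth + 1) - 1


def should_stop(responses, depth, min_node_size, variance, max_depth):
    """Return True if any stopping criterion applies to a node (illustrative).

    responses: response values of the rows that reached this node.
    Boundary handling and population variance are assumptions.
    """
    if not responses:
        return True  # nothing left to split
    if len(responses) <= min_node_size:  # MinNodeSize criterion
        return True
    if pvariance(responses) < variance:  # Variance criterion
        return True
    if depth >= max_depth:  # MaxDepth criterion
        return True
    return False


# The bound roughly doubles with each extra level of depth.
for d in (3, 6, 12):
    print(d, max_nodes(d))
```

Because the bound doubles with every additional level, a small increase in MaxDepth can noticeably increase the work done per tree, which is consistent with the note above about its effect on performance.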
- Mtry
- [Optional] Specify the number of variables to randomly sample from each input at each split. For example, if mtry is 3, then the function randomly samples 3 variables from each input at each split. The mtry value must be an INTEGER.
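As a rough illustration of per-split variable sampling, the following Python sketch draws mtry distinct predictors each time a split is evaluated and shows that a fixed seed makes the draws repeatable. The helper `sample_split_candidates` and the column names are hypothetical, and Python's `random.Random` stands in for whatever generator the function actually uses.

```python
import random


def sample_split_candidates(predictors, mtry, rng):
    """Pick mtry distinct predictors to consider at one split (illustrative)."""
    # Never ask for more variables than exist.
    return rng.sample(predictors, min(mtry, len(predictors)))


# Hypothetical predictor column names.
predictors = ["age", "income", "region", "tenure", "plan"]

# Same seed, same sequence of draws: two runs produce identical candidates.
rng = random.Random(42)
first_run = [sample_split_candidates(predictors, 3, rng) for _ in range(4)]

rng = random.Random(42)
second_run = [sample_split_candidates(predictors, 3, rng) for _ in range(4)]

print(first_run == second_run)
```

Each split sees only a subset of the predictors, so different trees tend to split on different variables; seeding the generator is what makes those subsets reproducible between runs.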
- MtrySeed
- [Optional] Specify a LONG value to use in determining the random seed for mtry.
- OutOfBag
- [Optional] Specify whether to output the out-of-bag estimate of error rate.
- DisplayNumProcessedRows
- [Optional] Specify whether to output the number of input rows allocated to each worker and the number of input rows processed by each worker (excluding rows skipped because they contained NULL values).
- Seed
- [Optional] Specify the random seed the algorithm uses for repeatable results (for more information, see Nondeterministic Results). The seed must be a LONG value.