- ResponseColumn
- Specify the name of the InputTable column that contains the response variable (that is, the quantity you want to predict).
- NumericInputs
- [Optional] Specify the names of the InputTable columns that have the numeric predictor variables (which must be numeric values).
- CategoricalInputs
- [Optional] Specify the names of the InputTable columns that have the categorical predictor variables (which can be either numeric or VARCHAR values).
- TreeType
- [Optional] Specify whether the analysis is a regression (continuous response variable) or a multiple-class classification (the response variable takes one of a fixed number of classes).
- NumTrees
- [Optional] Specify the number of trees to grow in the forest model.
- MinNodeSize
- [Optional] Specify a decision tree stopping criterion: the minimum size of any node within each decision tree.
- Variance
- [Optional] Specify a decision tree stopping criterion. If the variance within any node dips below this value, the algorithm stops looking for splits in the branch.
- MaxDepth
- [Optional] Specify a decision tree stopping criterion. If the tree reaches a depth past this value, the algorithm stops looking for splits. Decision trees can grow to (2^(max_depth+1) - 1) nodes. This stopping criterion has the greatest effect on the performance of the function.
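To get a feel for how quickly the node bound 2^(max_depth+1) - 1 grows, the following minimal Python sketch evaluates it for a few depths. The helper name `max_tree_nodes` is hypothetical and not part of the function; it only illustrates the arithmetic for a full binary tree whose root is at depth 0.

```python
def max_tree_nodes(max_depth: int) -> int:
    """Upper bound on the node count of a binary tree with the root at depth 0."""
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    # A full binary tree has 2^d nodes at depth d, so summing over
    # d = 0..max_depth gives 2^(max_depth + 1) - 1.
    return 2 ** (max_depth + 1) - 1


for depth in (3, 5, 10, 12):
    print(depth, max_tree_nodes(depth))
```

Each additional level roughly doubles the bound, which is consistent with the note that this criterion has the greatest effect on performance.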
- CategoricalEncoding
- [Optional with CategoricalInputs, disallowed otherwise.] Specify the encoding scheme to use for categorical variables.
| Option | Description |
| --- | --- |
| Target | Uses the target encoding described in https://dl.acm.org/citation.cfm?id=507538. Supports regression and binary classification. Does not create high dimensionality, but requires careful validation, because it is prone to overfitting when the distributions of categorical variables in the training data and test data differ significantly. |
| GrayCode | Recommended when accuracy is critical. Depending on available memory, performance may be impacted if a categorical column has a large number (for example, 20) of unique levels, even with a small data set. |
| Hashing | Optimizes calculation speed and minimizes memory use, possibly decreasing accuracy. |

- MinSamplesForEncoding
- [Optional with CategoricalEncoding ('Target'), disallowed otherwise.] Specify the minimum number of samples for target encoding, which is k in the following formula:
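The formula itself is not reproduced in this section. As a hedged reconstruction, assuming the function follows the target-encoding paper cited under CategoricalEncoding, the weight given to a category's own statistics, for a category observed n times in the training data, would be:

```latex
\lambda(n) = \frac{1}{1 + e^{-(n - k)/f}}
```

The symbols n and λ come from the paper, not from this documentation. Under that assumption, the encoded value for a category is λ(n) times the category's mean response plus (1 - λ(n)) times the overall mean response, so k marks the sample count at which the two are weighted equally.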
- Smoothing
- [Optional with CategoricalEncoding ('Target'), disallowed otherwise.] Specify the smoothing parameter for target encoding, which is f in the following formula:
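As an illustration of how k and f interact, here is a minimal Python sketch of smoothed target encoding in the style of the cited paper, assuming the weight has the logistic form 1 / (1 + exp(-(n - k) / f)). The function names and the toy data are hypothetical, and the function's internal implementation may differ.

```python
import math
from collections import defaultdict


def blend_weight(n: int, k: float, f: float) -> float:
    """Weight on the category mean for a category seen n times (assumed logistic form)."""
    if f <= 0:
        raise ValueError("f must be positive")
    return 1.0 / (1.0 + math.exp(-(n - k) / f))


def target_encode(categories, targets, k: float, f: float) -> dict:
    """Map each category to a blend of its own target mean and the overall mean."""
    overall_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {}
    for cat, n in counts.items():
        w = blend_weight(n, k, f)
        encoding[cat] = w * (sums[cat] / n) + (1.0 - w) * overall_mean
    return encoding


# Toy data (hypothetical): category "a" is frequent, category "b" is rare.
cats = ["a"] * 8 + ["b"] * 2
ys = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]
enc = target_encode(cats, ys, k=4, f=1.0)
print(enc)
```

In this sketch the rare category "b" is pulled strongly toward the overall mean, while the frequent category "a" stays close to its own mean. A larger f makes the transition around n = k more gradual.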
- ErrorHandler
- [Optional] Specify whether the function stops on error or continues with the next model:
| Value | Function Behavior |
| --- | --- |
| 'true' | Function skips the partition where the error occurs and continues with the next partition. The output table displays the error for each partition in which an error occurs. |
| 'false' | Function stops and reports the error. |

- Mtry
- [Optional] Specify the number of variables to randomly sample from each input value. For example, if Mtry is 3, then the function randomly samples 3 variables from each input at each split. The Mtry value must be an INTEGER.
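The per-split sampling described here can be sketched in Python as follows. This is a conceptual illustration of random feature subsetting, not the function's implementation; the column names and the helper `sample_split_candidates` are hypothetical.

```python
import random


def sample_split_candidates(predictors, mtry, rng):
    """Pick mtry distinct predictors to consider at one split."""
    if not 1 <= mtry <= len(predictors):
        raise ValueError("mtry must be between 1 and the number of predictors")
    return rng.sample(predictors, mtry)


# Hypothetical predictor columns.
predictors = ["age", "income", "tenure", "region", "plan"]
rng = random.Random(42)  # fixed seed so the draws repeat, in the spirit of MtrySeed

# Each split draws its own subset of candidate predictors.
for split in range(3):
    print(split, sample_split_candidates(predictors, 3, rng))
```

Seeding the generator makes the sequence of draws repeatable from run to run, which is the role a seed such as MtrySeed plays in this sketch.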
- MtrySeed
- [Optional] Specify a LONG value to use in determining the random seed for mtry.
- DisplayNumProcessedRows
- [Optional] Specify whether to output the number of input rows allocated to each worker and the number of input rows processed by each worker (excluding rows skipped because they contained NULL values).
- Seed
- [Optional] Specify the random seed the algorithm uses for repeatable results. The seed must be a LONG value. For repeatable results, use both the Seed and UniqueID syntax elements. For more information, see Nondeterministic Results and UniqueID Syntax Element.