The decision forest functions create a predictive model based on the algorithm for decision-tree training and prediction described in Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone (1984).
Original Random Forests Algorithm
- If the number of cases in the training set is N, sample N cases at random, with replacement, from the original data. This sample becomes the training set for growing the tree.
- If there are M input variables, a number m<<M is specified such that at each node, m variables are selected at random from the M and the best split on those m variables is used to split the node. The value of m is held constant while the forest is grown.
- Each tree is grown to the largest extent possible. There is no pruning.
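The two sampling steps above can be sketched in Python. This is only an illustration of the sampling logic, not ML Engine code; the function names `bootstrap_sample` and `candidate_variables`, and the toy values N = 10, M = 8, m = 3, are assumptions made for the example.

```python
import random

def bootstrap_sample(cases, rng):
    """Draw len(cases) cases uniformly at random, with replacement."""
    n = len(cases)
    return [cases[rng.randrange(n)] for _ in range(n)]

def candidate_variables(num_variables, m, rng):
    """Pick m distinct variable indices (out of M) to consider at one node."""
    return rng.sample(range(num_variables), m)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
cases = list(range(10))           # stand-in for N = 10 training cases

# Step 1: the training set for one tree is a bootstrap sample of size N.
tree_training_set = bootstrap_sample(cases, rng)

# Step 2: at each node, only m of the M variables are candidates for the split.
node_variables = candidate_variables(num_variables=8, m=3, rng=rng)
```

Because sampling is with replacement, some cases typically appear more than once in `tree_training_set` and others not at all. A new set of candidate variables is drawn at every node, while m itself stays fixed.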
Random Forests® and RandomForests® are registered trademarks in the United States, owned by Minitab, Inc.
ML Engine Implementation
- The DecisionForest function lets you specify m using the optional argument Mtry. If you do not specify Mtry, the function uses all variables to train the decision tree (equivalent to bootstrap aggregating or bagging).
- The DecisionForest function randomly assigns rows to individual vworkers. Each vworker creates trees with a bootstrapping technique, using only its local data.
- Each tree grows until any stopping criterion is met.
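The row distribution described above can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the ML Engine implementation: the helper names `assign_rows_to_workers` and `local_bootstrap_samples`, the worker count, and the number of trees per worker are all hypothetical.

```python
import random

def assign_rows_to_workers(rows, num_workers, rng):
    """Randomly assign each row to exactly one worker partition."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[rng.randrange(num_workers)].append(row)
    return partitions

def local_bootstrap_samples(local_rows, num_trees, rng):
    """Build one bootstrap sample per tree, drawing only from local rows."""
    n = len(local_rows)
    if n == 0:
        return []                 # a worker with no rows builds no trees
    return [[local_rows[rng.randrange(n)] for _ in range(n)]
            for _ in range(num_trees)]

rng = random.Random(0)
rows = list(range(100))           # stand-in for the training rows
partitions = assign_rows_to_workers(rows, num_workers=4, rng=rng)

# Each worker bootstraps from its own partition, never from another worker's rows.
samples_per_worker = [local_bootstrap_samples(p, num_trees=3, rng=rng)
                      for p in partitions]
```

In this model each tree sees only a bootstrap sample of one partition, which is the main difference from the original algorithm, where every tree samples from the full training set.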
The ML Engine Decision Forest functions support regression, binary classification, and multiclass classification problems.
For more detailed information about the ML Engine implementation of functionality like that of the Random Forests algorithm, including detailed examples, see the Teradata Orange Book "Bagging and Random Forest in Teradata Analytics," available from Teradata.
Function | Description |
---|---|
DecisionForest | Builds a predictive model based on training data. |
Forest_Predict | Uses the model output by the DecisionForest function to analyze input data and make predictions. |
DecisionForestEvaluator | Analyzes the model output by the DecisionForest function and assigns weights to the variables used in the model. The weights help you understand the basis on which the Forest_Predict function makes predictions. |
You can use the DecisionForest and Forest_Predict functions to create the prediction input for the Receiver Operating Characteristic (ROC) function.