Decision Forest Functions - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

The decision forest functions create a predictive model based on the algorithm for decision-tree training and prediction described in Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone (1984).

Original Random Forests Algorithm

In the original Random Forests algorithm developed by Leo Breiman and Adele Cutler, each tree grows as follows:
  • If the number of cases in the training set is N, sample N cases at random, with replacement, from the original data. This sample becomes the training set for growing the tree.
  • If there are M input variables, a number m << M is specified such that, at each node, m variables are selected at random from the M, and the best split on these m variables is used to split the node. The value of m is held constant while the forest grows.
  • Each tree is grown to the largest extent possible. There is no pruning.
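The two sources of randomness in the list above (a bootstrap sample per tree and a random subset of m variables per node) can be sketched as follows. This is a minimal illustration, not the ML Engine code: the helper names (bootstrap_indices, gini, best_split) are hypothetical, and Gini impurity is assumed as the split criterion for classification because it is the criterion used in CART; the source does not say which criterion ML Engine applies.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices uniformly at random, with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, m, rng):
    """Select m of the M features at random, then return
    (weighted_impurity, feature, threshold) for the best split on those m
    features, or None if no split sends rows to both children."""
    num_features = len(rows[0])
    candidates = rng.sample(range(num_features), m)  # the m random variables
    best = None
    for f in candidates:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


# Usage: one tree's bootstrap sample, then one node split with m = 2 of M = 3.
rng = random.Random(42)
X = [[1.0, 5.0, 0.0], [2.0, 4.0, 1.0], [3.0, 3.0, 0.0], [4.0, 2.0, 1.0]]
y = ["a", "a", "b", "b"]
sample = bootstrap_indices(len(X), rng)
tree_X = [X[i] for i in sample]
tree_y = [y[i] for i in sample]
print(best_split(tree_X, tree_y, m=2, rng=rng))
```

Growing a full tree repeats best_split on each child until the node is pure or cannot be split; with no pruning step, that matches "grown to the largest extent possible."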

Random Forests® and RandomForests® are registered trademarks in the United States, owned by Minitab, Inc.

ML Engine Implementation

The ML Engine implementation differs from the Breiman algorithm in the following ways:
  • The DecisionForest function lets you specify m using the optional argument Mtry. If you do not specify Mtry, the function considers all variables when training each decision tree, which is equivalent to bootstrap aggregating (bagging).
  • The DecisionForest function randomly assigns rows to individual vworkers. Each vworker creates trees with a bootstrapping technique, using only its local data.
  • Each tree grows until any stopping criterion is met, rather than to the largest extent possible.
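The data flow described in the list above can be simulated in a few lines: rows are scattered at random across vworkers, each tree's bootstrap sample is drawn only from its own vworker's rows, and an unspecified Mtry means every variable is considered. This is a hypothetical sketch for intuition only; the function names and the trees_per_vworker parameter are illustrative and are not DecisionForest arguments.

```python
import random


def assign_to_vworkers(num_rows, num_vworkers, rng):
    """Randomly assign each row index to exactly one vworker partition."""
    partitions = [[] for _ in range(num_vworkers)]
    for row in range(num_rows):
        partitions[rng.randrange(num_vworkers)].append(row)
    return partitions


def local_bootstrap(local_rows, rng):
    """Bootstrap sample drawn only from the rows one vworker holds locally."""
    return [local_rows[rng.randrange(len(local_rows))] for _ in local_rows]


def features_at_node(num_features, mtry, rng):
    """Variables considered at one node: mtry drawn at random, or all of them
    when mtry is None (the bagging case)."""
    if mtry is None:
        return list(range(num_features))
    return sorted(rng.sample(range(num_features), mtry))


def plan_trees(num_rows, num_vworkers, trees_per_vworker, seed=0):
    """Return the partitions and, per tree, its vworker and bootstrap sample."""
    rng = random.Random(seed)
    partitions = assign_to_vworkers(num_rows, num_vworkers, rng)
    trees = []
    for worker, local_rows in enumerate(partitions):
        if not local_rows:
            continue  # an empty partition has no data to grow a tree from
        for _ in range(trees_per_vworker):
            trees.append((worker, local_bootstrap(local_rows, rng)))
    return partitions, trees


partitions, trees = plan_trees(num_rows=20, num_vworkers=3, trees_per_vworker=2)
for worker, sample in trees:
    print(worker, sorted(set(sample)))
```

One consequence of this design is that each tree sees only a subset of the full training set, which avoids moving data between vworkers at the cost of each tree being trained on fewer distinct rows.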

The ML Engine Decision Forest functions support regression, binary classification, and multiclass classification problems.
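In Breiman's Random Forests, per-tree outputs are typically combined by majority vote for classification (binary or multiclass) and by averaging for regression. The sketch below shows those two rules, plus the fraction of trees voting for one class as a score. This page does not state how Forest_Predict combines trees, so treat these rules as an assumption; the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def combine_classification(tree_predictions):
    """Majority vote over per-tree class labels (binary or multiclass).
    Ties go to the tied label encountered first."""
    return Counter(tree_predictions).most_common(1)[0][0]


def vote_share(tree_predictions, positive_label):
    """Fraction of trees voting for positive_label, usable as a score."""
    return sum(p == positive_label for p in tree_predictions) / len(tree_predictions)


def combine_regression(tree_predictions):
    """Average of per-tree numeric predictions."""
    return mean(tree_predictions)


print(combine_classification(["setosa", "virginica", "versicolor", "virginica"]))
print(vote_share(["yes", "no", "yes", "yes"], "yes"))
print(combine_regression([1.0, 2.0, 4.5]))
```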

For more information about how ML Engine implements functionality similar to the Random Forests algorithm, including detailed examples, see the Teradata Orange Book "Bagging and Random Forest in Teradata Analytics," available from Teradata.

The Decision Forest functions are:
  • DecisionForest: Builds a predictive model based on training data.
  • Forest_Predict: Uses the model output by the DecisionForest function to analyze input data and make predictions.
  • DecisionForestEvaluator: Analyzes the model output by the DecisionForest function and assigns weights to the variables used in the model. The weights help you understand the basis on which the Forest_Predict function makes predictions.

You can use the DecisionForest and Forest_Predict functions to create prediction input for the Receiver Operating Characteristic (ROC) function.
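For intuition about what an ROC analysis does with binary-classification predictions, the sketch below sweeps a decision threshold over per-row scores (for example, a predicted probability or the fraction of trees voting for the positive class) and computes the false positive rate, the true positive rate, and the area under the curve. It does not reproduce the ML Engine ROC function's arguments or output columns; roc_points and auc are hypothetical helpers.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs as the threshold sweeps from high to low.

    labels are 1 for the positive class and 0 otherwise; a row is predicted
    positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative row")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


pts = roc_points([0.9, 0.6, 0.6, 0.2], [1, 0, 1, 0])
print(pts)
print(auc(pts))
```

Tied scores are handled by grouping them under one threshold, which produces a diagonal segment on the curve instead of an arbitrary ordering of the tied rows.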