Decision Forest Functions (ML Engine) - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.10, 1.1
Published: October 2019
Language: English (United States)
Last Update: 2019-12-31
dita:id: B700-4003

The decision forest functions create a predictive model based on the algorithm for decision-tree training and prediction described in Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone (1984).

Original Random Forests Algorithm

In the original Random Forests algorithm developed by Leo Breiman and Adele Cutler, each tree grows as follows:
  • If the number of cases in the training set is N, sample N cases at random, but with replacement from the original data. This sample becomes the training set for growing the tree.
  • If there are M input variables, a number m<<M is specified such that at each node, m variables are selected at random from M and the best split on those m variables is used to split the node. The value of m is held constant during the forest growing.
  • Each tree is grown to the largest extent possible. There is no pruning.
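
The two sampling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Breiman and Cutler's code: the helper names `bootstrap_sample` and `candidate_features` are hypothetical, rows are stand-in integers, and tree growing itself is omitted.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) cases at random, with replacement, from rows."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def candidate_features(num_features, m, rng):
    """Pick m distinct feature indices at random out of num_features (M)."""
    return rng.sample(range(num_features), m)

rng = random.Random(0)          # fixed seed so the sketch is repeatable
rows = list(range(10))          # N = 10 stand-in training cases
sample = bootstrap_sample(rows, rng)      # training set for one tree
features = candidate_features(8, 3, rng)  # M = 8, m = 3, redrawn at every node
```

Because sampling is done with replacement, some cases typically appear more than once in `sample` and others not at all. In a full implementation, `candidate_features` would be called again at each node with the same m.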

Random Forests® and RandomForests® are registered trademarks in the United States, owned by Minitab, Inc.

ML Engine Implementation

ML Engine implementation differs from the Breiman algorithm in the following ways:
  • The DecisionForest function lets you specify m using the optional syntax element Mtry. If you do not specify Mtry, the function uses all variables to train the decision tree (equivalent to bootstrap aggregating or bagging).
  • The DecisionForest function randomly assigns rows to individual vworkers. Each vworker creates trees with a bootstrapping technique, using only its local data.
  • Each tree grows until any stopping criterion is met, rather than to the largest extent possible.
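
As a rough illustration of these differences, the Python sketch below assigns rows to workers at random, bootstraps within each worker's local partition only, and falls back to all variables when no m is given. It is a conceptual sketch, not ML Engine code: `assign_rows_to_workers`, `local_bootstrap`, and `features_per_split` are hypothetical names, and the `mtry` parameter only mimics the role of the Mtry syntax element.

```python
import random

def assign_rows_to_workers(rows, num_workers, rng):
    """Randomly assign each row to exactly one worker partition."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[rng.randrange(num_workers)].append(row)
    return partitions

def local_bootstrap(partition, rng):
    """Bootstrap (sample with replacement) using only one worker's rows."""
    return [rng.choice(partition) for _ in partition]

def features_per_split(num_features, mtry=None):
    """Use all variables when mtry is not given (plain bagging)."""
    return num_features if mtry is None else mtry

rng = random.Random(42)         # fixed seed so the sketch is repeatable
rows = list(range(100))         # stand-in training rows
partitions = assign_rows_to_workers(rows, 4, rng)
local_samples = [local_bootstrap(p, rng) for p in partitions]
```

Each entry of `local_samples` draws only from the matching entry of `partitions`, so no tree sees rows held by another worker.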

ML Engine Decision Forest functions support regression, binary classification, and multiple-class classification problems.

For more detailed information about ML Engine implementation of functionality like that of the Random Forests algorithm, including detailed examples, see Bagging and Random Forest in Teradata® Aster Analytics, TDN0009013.

| Function | Description |
|---|---|
| DecisionForest (ML Engine) | Builds predictive model based on training data. |
| DecisionForestPredict_MLE (ML Engine) | Uses model output by DecisionForest function to analyze input data and make predictions. |
| DecisionForestEvaluator (ML Engine) | Analyzes model output by DecisionForest function and gives weights to variables used in model. Weights help you understand basis by which DecisionForestPredict_MLE function makes predictions. |

You can use the DecisionForest and DecisionForestPredict_MLE functions to create the prediction input for the Receiver Operating Characteristic (ROC) (ML Engine) function.