Implementation Notes - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
dita:id: B700-1021
Product Category: Software

In the original Random Forest algorithm developed by Leo Breiman, each tree grows as follows:

  • If the training set contains N cases, sample N cases at random, with replacement, from the original data. This sample becomes the training set for growing the tree.
  • If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random from the M and the best split on those m variables is used to split the node. The value of m is held constant while the forest is grown.
  • Each tree is grown to the largest extent possible. There is no pruning.
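
The two sampling steps above can be sketched in a few lines of Python. This is only an illustration of the sampling logic, not code from Breiman or from Aster; the function names bootstrap_sample and candidate_features and the toy data are hypothetical, and the actual tree growing (finding the best split) is left out.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) cases at random, with replacement, from rows."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def candidate_features(num_features, m, rng):
    """Pick m distinct variable indices out of num_features for one node."""
    return rng.sample(range(num_features), m)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy training set: N = 10 cases, M = 4 input variables (made-up values).
rows = [(i, i * 2, i % 3, i % 5) for i in range(10)]

sample = bootstrap_sample(rows, rng)       # training set for one tree
features = candidate_features(4, 2, rng)   # m = 2 variables tried at one node
```

Because the sample is drawn with replacement, some cases usually appear more than once in a tree's training set and others not at all. A new set of m candidate variables is drawn at every node, while m itself stays the same.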

Teradata Aster’s implementation of the Random Forest algorithm differs from Leo Breiman’s algorithm in the following ways:

  • The Forest_Drive function lets you specify m with the optional argument Mtry. If you omit Mtry, the function considers all M variables at each split when training each decision tree, which is equivalent to bootstrap aggregating (bagging).
  • The Forest_Drive function randomly assigns rows to individual vworkers. Each vworker creates trees with a bootstrapping technique, using only its local data.
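
The following Python sketch illustrates the two differences described above under simplifying assumptions. It does not use or reproduce the Forest_Drive interface: the helper names (assign_rows_to_workers, local_bootstrap, effective_mtry), the worker count, and the data are all hypothetical, and vworkers are modeled as plain lists.

```python
import random


def assign_rows_to_workers(rows, num_workers, rng):
    """Randomly assign each row to exactly one worker partition."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[rng.randrange(num_workers)].append(row)
    return partitions


def local_bootstrap(partition, rng):
    """Bootstrap a tree's training set from one worker's local rows only."""
    n = len(partition)
    return [partition[rng.randrange(n)] for _ in range(n)]


def effective_mtry(num_variables, mtry=None):
    """With no Mtry given, every variable is a candidate (bagging)."""
    return num_variables if mtry is None else mtry


rng = random.Random(1)  # fixed seed so the sketch is repeatable
rows = list(range(20))  # stand-in for 20 training rows
partitions = assign_rows_to_workers(rows, 3, rng)
local_samples = [local_bootstrap(p, rng) for p in partitions]
```

In this sketch a tree never sees rows held by another worker, so each tree is trained on a bootstrap of a random subset of the data rather than of the full training set.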