Implementation Notes - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product

Aster Analytics

Release Number

6.21

Published

November 2016

Language

English (United States)

Last Update

2018-04-14

Product Category

Software

In the original Random Forest algorithm developed by Leo Breiman, each tree grows as follows:

If the number of cases in the training set is N, sample N cases at random, with replacement, from the original data. This sample becomes the training set for growing the tree.
If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random from the M, and the best split on those m variables is used to split the node. The value of m is held constant while the forest is grown.
Each tree is grown to the largest extent possible. There is no pruning.
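
The two sampling steps above can be sketched in a few lines of Python. This is only an illustration of the sampling logic, not code from Breiman or from Aster Analytics; the function names bootstrap_sample and candidate_features, the toy rows, and the fixed seed are assumptions made for the example.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) cases at random, with replacement, from rows."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def candidate_features(num_features, m, rng):
    """Pick m distinct feature indices out of M = num_features for one node."""
    if not 1 <= m <= num_features:
        raise ValueError("m must be between 1 and the number of features")
    return sorted(rng.sample(range(num_features), m))


# Hypothetical toy data: four numeric input variables plus a label.
rows = [
    (5.1, 3.5, 1.4, 0.2, "a"),
    (4.9, 3.0, 1.4, 0.2, "a"),
    (6.3, 3.3, 6.0, 2.5, "b"),
    (5.8, 2.7, 5.1, 1.9, "b"),
    (7.0, 3.2, 4.7, 1.4, "c"),
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Step 1: the training set for one tree is a bootstrap sample of size N.
tree_training_set = bootstrap_sample(rows, rng)

# Step 2: at each node, only m of the M variables are considered for the split.
node_features = candidate_features(num_features=4, m=2, rng=rng)
```

Because the sample is drawn with replacement, some rows typically appear more than once in tree_training_set and others not at all. A new set of candidate features would be drawn at every node, always with the same m.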

Teradata Aster’s implementation of the Random Forest algorithm differs from Leo Breiman’s algorithm in the following ways:

The Forest_Drive function lets you specify m using the optional argument Mtry. If you do not specify Mtry, the function considers all M variables at every split when training each decision tree, which is equivalent to bootstrap aggregating (bagging).
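The Forest_Drive function randomly assigns rows to individual vworkers. Each vworker creates trees with a bootstrapping technique, using only its local data, so each tree is trained on a sample drawn from one vworker's share of the rows rather than from the full training set.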
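
The second difference can be pictured with a small Python sketch. It is a conceptual model only and does not reproduce how Forest_Drive is implemented; the helper names assign_rows_to_workers and local_bootstrap, the worker count, and the integer "rows" are assumptions chosen for illustration.

```python
import random


def assign_rows_to_workers(rows, num_workers, rng):
    """Randomly assign every row to exactly one worker partition."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[rng.randrange(num_workers)].append(row)
    return partitions


def local_bootstrap(partition, rng):
    """Bootstrap sample drawn only from one worker's local rows."""
    n = len(partition)
    return [partition[rng.randrange(n)] for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = list(range(100))  # stand-in for 100 training rows

# Each row lands on one of four hypothetical workers.
partitions = assign_rows_to_workers(rows, num_workers=4, rng=rng)

# Each worker draws the training set for a tree from its own partition only.
local_samples = [local_bootstrap(p, rng) for p in partitions]
```

In this model a tree never sees rows held by another worker, which is the practical contrast with sampling N cases from the whole training set.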