DecisionForest Example 3: Regression Tree with Out-of-Bag Error - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00 / 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22
Document ID: B700-4003
Product Category: Teradata Vantage™

This example uses DecisionForest to build a forest of regression trees that predicts the median value of homes, in $1000s, and reports the out-of-bag error.

Input

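The input table, boston, contains Boston housing data. The response column, medv, is the median home value. Predictors include per-capita crime rate (crim), proportion of nonretail business acres (indus), average number of rooms per dwelling (rm), index of accessibility to radial highways (rad), and property-tax rate (tax).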

InputTable: boston
id crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
3 0.02729 32 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
13 0.09378 12.5 7.87 0 0.524 5.889 39 5.4509 5 311 15.2 390.5 15.71 21.7
37 0.09744 0 5.96 0 0.499 5.841 61.4 3.3779 5 279 19.2 377.56 11.41 20
47 0.18836 0 6.91 0 0.448 5.786 33.3 5.1004 3 233 17.9 396.9 14.15 20
71 0.08826 0 10.81 0 0.413 6.417 6.6 5.2873 4 305 19.2 383.73 6.72 24.2
81 0.04113 25 4.86 0 0.426 6.727 33.5 5.4007 4 281 19 396.9 5.29 28
91 0.04684 0 3.41 0 0.489 6.417 66.1 3.0923 2 270 17.8 392.18 8.81 22.6
115 0.14231 0 10.01 0 0.547 6.254 84.2 2.2565 6 432 17.8 388.74 10.45 18.5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

SQL Call

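SELECT * FROM DecisionForest (
  ON boston AS InputTable
  OUT TABLE OutputTable (rtf_model)
  OUT TABLE MonitorTable (boston_monitor_table)
  USING
  ResponseColumn ('medv')                -- column to predict
  NumericInputs ('crim','zn','indus','chas','nox','rm',
    'age','dis','rad','tax','ptratio','black','lstat')  -- numeric predictors
  TreeType ('regression')
  MinNodeSize ('2')                      -- stopping criterion: minimum node size
  MaxDepth ('6')                         -- stopping criterion: maximum tree depth
  NumTrees ('50')                        -- requested number of trees
  OutOfBag ('true')                      -- report out-of-bag error statistics
) AS dt;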

Output

message
Computing 48 regression trees.
Each worker is computing 16 trees.
Each tree will contain approximately 168 points.
Poisson sampling parameter: 0.996
Mean of squared residuals: 15.396538832527742
% Var explained: 81.76188132990643
Decision forest created.
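The output messages can be related with a short worked calculation. The definitions below are an assumption: they follow the convention of R's randomForest package, where the mean of squared residuals is averaged over out-of-bag predictions (each row predicted only by trees whose sample excluded it) and % Var explained is one minus the ratio of that value to the population variance (denominator n) of the response, expressed as a percent. Under that assumption, the reported values imply a medv variance of about 84.42. The last line assumes three workers, inferred from the messages (48 trees at 16 per worker), and an even split of NumTrees that rounds down, which would account for 48 trees being built although NumTrees is 50.

```latex
\begin{aligned}
\mathrm{MSE}_{\mathrm{OOB}} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i^{\mathrm{OOB}}\bigr)^2 \\
\%\ \text{Var explained} &= 100\left(1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\widehat{\operatorname{Var}}(y)}\right) \\
\widehat{\operatorname{Var}}(y) &= \frac{15.3965}{1 - 0.817619} = \frac{15.3965}{0.182381} \approx 84.42 \\
\left\lfloor 50 / 3 \right\rfloor &= 16, \qquad 3 \times 16 = 48
\end{aligned}
```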

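The model table, rtf_model, stores the trees as JSON-formatted strings. This query displays each tree truncated to its first 50 characters:

SELECT task_index, tree_num, cast (tree AS VARCHAR(50))
  FROM rtf_model ORDER BY 1;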
rtf_model
task_index tree_num cast (tree as VARCHAR(50))
0 0 {"sum_":10723.700000000017,"sumSq_":283628.49,"siz
0 2 {"sum_":11020.200000000019,"sumSq_":282336.0800000
0 3 {"sum_":11577.600000000006,"sumSq_":302737.4599999
0 1 {"sum_":11877.69999999999,"sumSq_":309107.12999999
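To check how the trees were distributed, a query such as the following counts the rows of rtf_model for each task_index. It uses only the task_index column shown above. If each row holds one tree (an assumption, not stated in this example), the counts should add up to the 48 trees reported in the output, at 16 per task_index.

```sql
-- Count model rows per task_index (assumes one row per tree)
SELECT task_index, COUNT(*) AS num_trees
  FROM rtf_model
  GROUP BY task_index
  ORDER BY task_index;
```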