This example builds a decision forest of regression trees to predict the median value of homes, in $1000s.
Input
The InputTable, boston, contains Boston housing value data. The response column is medv, the median home value. Predictors include crime rate, proportion of nonretail business, average number of rooms in each dwelling, accessibility to radial highways, and property tax rate.
id | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 0.02729 | 32 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
13 | 0.09378 | 12.5 | 7.87 | 0 | 0.524 | 5.889 | 39 | 5.4509 | 5 | 311 | 15.2 | 390.5 | 15.71 | 21.7 |
37 | 0.09744 | 0 | 5.96 | 0 | 0.499 | 5.841 | 61.4 | 3.3779 | 5 | 279 | 19.2 | 377.56 | 11.41 | 20 |
47 | 0.18836 | 0 | 6.91 | 0 | 0.448 | 5.786 | 33.3 | 5.1004 | 3 | 233 | 17.9 | 396.9 | 14.15 | 20 |
71 | 0.08826 | 0 | 10.81 | 0 | 0.413 | 6.417 | 6.6 | 5.2873 | 4 | 305 | 19.2 | 383.73 | 6.72 | 24.2 |
81 | 0.04113 | 25 | 4.86 | 0 | 0.426 | 6.727 | 33.5 | 5.4007 | 4 | 281 | 19 | 396.9 | 5.29 | 28 |
91 | 0.04684 | 0 | 3.41 | 0 | 0.489 | 6.417 | 66.1 | 3.0923 | 2 | 270 | 17.8 | 392.18 | 8.81 | 22.6 |
115 | 0.14231 | 0 | 10.01 | 0 | 0.547 | 6.254 | 84.2 | 2.2565 | 6 | 432 | 17.8 | 388.74 | 10.45 | 18.5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
SQL Call
SELECT * FROM DecisionForest (
  ON boston AS InputTable
  OUT TABLE OutputTable (rft_model)
  OUT TABLE MonitorTable (boston_monitor_table)
  USING
  ResponseColumn ('medv')
  NumericInputs ('crim','zn','indus','chas','nox','rm','age','dis','rad','tax','ptratio','black','lstat')
  TreeType ('regression')
  MinNodeSize ('2')
  MaxDepth ('6')
  NumTrees ('50')
  OutOfBag ('true')
) AS dt;
Output
message |
---|
Computing 48 regression trees. Each worker is computing 16 trees. Each tree will contain approximately 168 points. Poisson sampling parameter: 0.996 Mean of squared residuals: 15.396538832527742 % Var explained: 81.76188132990643 Decision forest created. |
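
The figures in this message can be cross-checked with a little arithmetic. The Python sketch below is illustrative only and is not part of the function: it copies the numbers from the message, infers the worker count (the call requests NumTrees ('50') but the message reports 48 trees at 16 per worker, which would fit an even split across 3 workers), and converts the mean of squared residuals into a root-mean-square error. It also assumes, without confirmation from this page, that "% Var explained" is defined as 100 * (1 - MSE / variance of the response), as in R's randomForest package.

```python
# Values copied from the DecisionForest output message above.
total_trees = 48
trees_per_worker = 16
mse = 15.396538832527742               # "Mean of squared residuals"
pct_var_explained = 81.76188132990643  # "% Var explained"

# Inferred, not stated in the message: how many workers built the forest.
workers = total_trees // trees_per_worker

# Root-mean-square error, in the same units as medv ($1000s).
rmse = mse ** 0.5

# ASSUMPTION: % Var explained = 100 * (1 - MSE / Var(medv)).
# Under that definition, the variance of medv implied by the message is:
implied_variance = mse / (1 - pct_var_explained / 100)

print(f"workers={workers}, rmse={rmse:.2f}, implied variance={implied_variance:.2f}")
```

Under these assumptions the typical prediction error is a little under 4, that is, roughly $3,900 per home, and the implied variance of medv is about 84.4.
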
This query returns the following table:
SELECT task_index, tree_num, cast (tree AS VARCHAR(50)) FROM rft_model ORDER BY 1;
task_index | tree_num | cast (tree as VARCHAR(50)) |
---|---|---|
0 | 0 | {"sum_":10723.700000000017,"sumSq_":283628.49,"siz |
0 | 2 | {"sum_":11020.200000000019,"sumSq_":282336.0800000 |
0 | 3 | {"sum_":11577.600000000006,"sumSq_":302737.4599999 |
0 | 1 | {"sum_":11877.69999999999,"sumSq_":309107.12999999 |
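
Each value in the tree column is a JSON document, and the cast to VARCHAR(50) cuts it off after 50 characters, so the displayed text is not valid JSON on its own. The following Python sketch, which uses only the first row shown above, demonstrates this and extracts the fields that are fully visible. The regular expression is an illustrative helper, not part of the product, and the meaning of sum_ and sumSq_ (presumably the sum and sum of squares of the response at the root node) is an assumption.

```python
import json
import re

# First row of the query output, exactly as displayed (cut off by the cast).
truncated = '{"sum_":10723.700000000017,"sumSq_":283628.49,"siz'

# The truncated text is not a complete JSON document, so parsing fails.
try:
    json.loads(truncated)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

# Keep only "key":number pairs followed by a comma, so that a number
# cut off mid-way by the truncation is never picked up.
pairs = re.findall(r'"(\w+)":(-?\d+(?:\.\d+)?),', truncated)
fields = {key: float(value) for key, value in pairs}

print(len(truncated), parsed_ok, fields)
```

To work with a whole tree, the cast length in the query would need to be large enough to hold the full JSON text.
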