This example uses home sales data to build a decision forest of classification trees that predicts home style. The resulting model can be used as input to the DecisionForestPredict_MLE example "Omit Responses." By default, the function does not output the out-of-bag estimate of the error rate.
Input
The following table describes the home sales data in the InputTable, housing_train. There are six numeric predictors and six categorical predictors. The response variable is homestyle.
Column | Description |
---|---|
price | Sale price in U.S. dollars (numeric) |
lotsize | Lot size in square feet (numeric) |
bedrooms | Number of bedrooms (numeric) |
bathrms | Number of full bathrooms (numeric) |
stories | Number of stories, excluding basement (numeric) |
driveway | Whether the house has a driveway—yes or no (categorical) |
recroom | Whether the house has a recreation room—yes or no (categorical) |
fullbase | Whether the house has a full finished basement—yes or no (categorical) |
gashw | Whether the house uses gas to heat water—yes or no (categorical) |
airco | Whether the house has central air conditioning—yes or no (categorical) |
garagepl | Number of garage places (numeric) |
prefarea | Whether the house is in a preferred neighborhood—yes or no (categorical) |
homestyle | Style of home (response variable) |

InputTable: housing_train (sample rows)

sn | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase | gashw | airco | garagepl | prefarea | homestyle |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 42000 | 5850 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no | Classic |
2 | 38500 | 4000 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | Classic |
3 | 49500 | 3060 | 3 | 1 | 1 | yes | no | no | no | no | 0 | no | Classic |
4 | 60500 | 6650 | 3 | 1 | 2 | yes | yes | no | no | no | 0 | no | Eclectic |
5 | 61000 | 6360 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | Eclectic |
6 | 66000 | 4160 | 3 | 1 | 1 | yes | yes | yes | no | yes | 0 | no | Eclectic |
7 | 66000 | 3880 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | no | Eclectic |
8 | 69000 | 4160 | 3 | 1 | 3 | yes | no | no | no | no | 0 | no | Eclectic |
9 | 83800 | 4800 | 3 | 1 | 1 | yes | yes | yes | no | no | 0 | no | Eclectic |
10 | 88500 | 5500 | 3 | 2 | 4 | yes | yes | no | no | yes | 1 | no | Eclectic |
11 | 90000 | 7200 | 3 | 2 | 1 | yes | no | yes | no | yes | 3 | no | Eclectic |
12 | 30500 | 3000 | 2 | 1 | 1 | no | no | no | no | no | 0 | no | Classic |
14 | 36000 | 2880 | 3 | 1 | 1 | no | no | no | no | no | 0 | no | Classic |
15 | 37000 | 3600 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | Classic |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
SQL Call
This call explicitly specifies the default values of the MaxDepth, MinNodeSize, and Variance syntax elements, and builds 50 trees on two worker nodes. It sets both seed values (MtrySeed and Seed) to 100 for repeatability. Because TreeType is 'classification' and there are 12 predictor variables, Mtry is 3, that is, round(sqrt(12)). By default, OutOfBag is 'false' and IDColumn is the first InputTable column.
SELECT * FROM DecisionForest (
  ON housing_train AS InputTable
  OUT TABLE OutputTable (rft_model)
  OUT TABLE OutputMessageTable (rf_monitortable)
  USING
  TreeType ('classification')
  ResponseColumn ('homestyle')
  NumericInputs ('price','lotsize','bedrooms','bathrms','stories','garagepl')
  CategoricalInputs ('driveway','recroom','fullbase','gashw','airco','prefarea')
  MaxDepth (12)
  MinNodeSize (1)
  NumTrees (50)
  Variance (0.0)
  Mtry ('3')
  MtrySeed ('100')
  Seed ('100')
) AS dt;
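The Mtry value passed in the call above can be checked by hand. The following is a minimal sketch in Python, not part of the Aster function itself; the helper name `classification_mtry` is hypothetical, and the sketch assumes only the rule stated in this example, round(sqrt(number of predictors)) for classification trees.

```python
import math

# Predictor columns, copied from the NumericInputs and CategoricalInputs arguments.
numeric_inputs = ["price", "lotsize", "bedrooms", "bathrms", "stories", "garagepl"]
categorical_inputs = ["driveway", "recroom", "fullbase", "gashw", "airco", "prefarea"]


def classification_mtry(num_predictors):
    """Hypothetical helper: round(sqrt(p)), the rule this example states
    for classification trees."""
    return round(math.sqrt(num_predictors))


num_predictors = len(numeric_inputs) + len(categorical_inputs)
mtry = classification_mtry(num_predictors)

print(num_predictors)  # 12
print(mtry)            # 3, because sqrt(12) is about 3.46
```

Each split in a tree therefore considers 3 of the 12 predictors, which matches the Mtry ('3') argument in the call.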
Output
message |
---|
Computing 50 classification trees. |
Each worker is computing 25 trees. |
Each tree will contain approximately 246 points. |
Poisson sampling parameter: 1.00 |
Query finished in 1.997 seconds. |
Decision forest created. |
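
The message text is consistent with the description of the call: 50 trees split evenly across workers gives 25 trees per worker, which implies two workers. As a rough sketch, assuming the message lines have exactly the wording shown in this example, the figures can be extracted and cross-checked in Python:

```python
import re

# Message lines copied from the output of this example.
message = """Computing 50 classification trees.
Each worker is computing 25 trees.
Each tree will contain approximately 246 points.
Poisson sampling parameter: 1.00"""

# The patterns below assume the exact wording shown above.
total_trees = int(re.search(r"Computing (\d+) classification trees", message).group(1))
trees_per_worker = int(re.search(r"Each worker is computing (\d+) trees", message).group(1))

# If the trees are divided evenly, the worker count follows directly.
workers = total_trees // trees_per_worker

print(total_trees, trees_per_worker, workers)  # 50 25 2
```
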
The following query displays the model table, showing only the first 50 characters of each tree:
SELECT task_index, tree_num, CAST (tree AS VARCHAR(50)) FROM rft_model ORDER BY 1;
task_index | tree_num | tree |
---|---|---|
0 | 20 | {"responseCounts_":{"classic":73,"bungalow":37,"ec |
0 | 23 | {"responseCounts_":{"classic":60,"bungalow":48,"ec |
0 | 13 | {"responseCounts_":{"classic":73,"bungalow":40,"ec |
0 | 11 | {"responseCounts_":{"classic":69,"bungalow":32,"ec |
0 | 7 | {"responseCounts_":{"classic":63,"bungalow":45,"ec |
0 | 2 | {"responseCounts_":{"classic":70,"bungalow":26,"ec |
0 | 14 | {"responseCounts_":{"classic":60,"bungalow":42,"ec |
0 | 17 | {"responseCounts_":{"classic":71,"bungalow":34,"ec |
0 | 21 | {"responseCounts_":{"classic":71,"bungalow":42,"ec |
0 | 22 | {"responseCounts_":{"classic":67,"bungalow":43,"ec |
0 | 8 | {"responseCounts_":{"classic":63,"bungalow":41,"ec |
0 | 5 | {"responseCounts_":{"classic":71,"bungalow":42,"ec |
0 | 4 | {"responseCounts_":{"classic":73,"bungalow":24,"ec |
0 | 6 | {"responseCounts_":{"classic":81,"bungalow":52,"ec |
0 | 3 | {"responseCounts_":{"classic":84,"bungalow":37,"ec |
0 | 9 | {"responseCounts_":{"classic":70,"bungalow":38,"ec |
0 | 19 | {"responseCounts_":{"classic":75,"bungalow":24,"ec |
0 | 12 | {"responseCounts_":{"classic":70,"bungalow":27,"ec |
0 | 24 | {"responseCounts_":{"classic":79,"bungalow":40,"ec |
0 | 16 | {"responseCounts_":{"classic":57,"bungalow":31,"ec |
0 | 0 | {"responseCounts_":{"classic":69,"bungalow":36,"ec |
0 | 10 | {"responseCounts_":{"classic":76,"bungalow":40,"ec |
0 | 18 | {"responseCounts_":{"classic":59,"bungalow":41,"ec |
0 | 15 | {"responseCounts_":{"classic":71,"bungalow":36,"ec |
0 | 1 | {"responseCounts_":{"classic":70,"bungalow":37,"ec |
3 | 12 | {"responseCounts_":{"classic":80,"bungalow":15,"ec |
3 | 22 | {"responseCounts_":{"classic":59,"bungalow":21,"ec |
3 | 3 | {"responseCounts_":{"classic":64,"bungalow":24,"ec |
3 | 6 | {"responseCounts_":{"classic":88,"bungalow":27,"ec |
3 | 5 | {"responseCounts_":{"classic":79,"bungalow":19,"ec |
3 | 2 | {"responseCounts_":{"classic":74,"bungalow":23,"ec |
3 | 0 | {"responseCounts_":{"classic":68,"bungalow":23,"ec |
3 | 15 | {"responseCounts_":{"classic":73,"bungalow":26,"ec |
3 | 23 | {"responseCounts_":{"classic":73,"bungalow":16,"ec |
3 | 24 | {"responseCounts_":{"classic":77,"bungalow":19,"ec |
3 | 21 | {"responseCounts_":{"classic":83,"bungalow":25,"ec |
3 | 14 | {"responseCounts_":{"classic":84,"bungalow":16,"ec |
3 | 11 | {"responseCounts_":{"classic":73,"bungalow":28,"ec |
3 | 19 | {"responseCounts_":{"classic":75,"bungalow":17,"ec |
3 | 9 | {"responseCounts_":{"classic":76,"bungalow":29,"ec |
3 | 7 | {"responseCounts_":{"classic":79,"bungalow":24,"ec |
3 | 17 | {"responseCounts_":{"classic":68,"bungalow":22,"ec |
3 | 18 | {"responseCounts_":{"classic":81,"bungalow":21,"ec |
3 | 20 | {"responseCounts_":{"classic":68,"bungalow":19,"ec |
3 | 16 | {"responseCounts_":{"classic":81,"bungalow":17,"ec |
3 | 4 | {"responseCounts_":{"classic":72,"bungalow":17,"ec |
3 | 13 | {"responseCounts_":{"classic":61,"bungalow":27,"ec |
3 | 1 | {"responseCounts_":{"classic":79,"bungalow":26,"ec |
3 | 10 | {"responseCounts_":{"classic":69,"bungalow":21,"ec |
3 | 8 | {"responseCounts_":{"classic":82,"bungalow":17,"ec |
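
Each tree value in this output ends abruptly because the query casts the tree column to VARCHAR(50), so only a 50-character prefix is displayed. The sketch below, in Python, illustrates the consequence under the assumption that the tree column holds JSON text, as the visible prefix suggests: the displayed prefix is not a complete document and cannot be parsed. The helper `is_complete_json` is hypothetical and only serves this illustration.

```python
import json

# One truncated value copied from the output above (CAST ... AS VARCHAR(50)).
truncated_tree = '{"responseCounts_":{"classic":73,"bungalow":37,"ec'


def is_complete_json(text):
    """Hypothetical helper: True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(len(truncated_tree))               # 50
print(is_complete_json(truncated_tree))  # False
```

To inspect a whole tree, the query would need to select the tree column without the 50-character cast.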