This example uses test data and the model output by the AdaBoost Example 1 to use real estate sales data to predict home style.
Input
- attributetable: housing_test_att, created by inputting housing_train (the raw data for DecisionForest Example 1: Classification Tree without Out-of-Bag Error) to the Unpivoting function:
CREATE MULTISET TABLE housing_test_att AS ( SELECT * FROM Unpivoting ( ON housing_test USING TargetColumns ('price', 'lotsize', 'bedrooms', 'bathrms', 'stories', 'driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'garagepl', 'prefarea') Accumulate ('sn') ) AS dt ) WITH DATA;
- model: abd_model, output by AdaBoost Example 1
This query returns the following table:
SELECT * FROM housing_test_att ORDER BY 1, 2;
sn | attribute | value_col |
---|---|---|
13 | airco | no |
13 | bathrms | 1 |
13 | bedrooms | 3 |
13 | driveway | yes |
13 | fullbase | no |
13 | garagepl | 0 |
13 | gashw | no |
13 | lotsize | 1700.0 |
13 | prefarea | no |
13 | price | 27000.0 |
13 | recroom | no |
13 | stories | 2 |
16 | airco | yes |
16 | bathrms | 1 |
16 | bedrooms | 2 |
16 | driveway | yes |
16 | fullbase | no |
16 | garagepl | 0 |
16 | gashw | no |
16 | lotsize | 3185.0 |
16 | prefarea | no |
16 | price | 37900.0 |
16 | recroom | no |
16 | stories | 1 |
... | ... | ... |
SQL Call
CREATE MULTISET TABLE housing_prediction AS ( SELECT * FROM AdaBoostPredict ( ON housing_test_att AS attributetable PARTITION BY sn ON abd_model AS model DIMENSION USING AttrTableGroupbyColumns ('attribute') AttrTablePidColumns ('sn') AttrTableValColumn ('value') ) AS dt ) WITH DATA;
Output
This query returns the following table:
SELECT * FROM housing_prediction ORDER BY 1;
The pred_label column contains the predicted response.
sn | pred_label |
---|---|
13 | Classic |
16 | Classic |
25 | Classic |
38 | Eclectic |
53 | Eclectic |
104 | Bungalow |
111 | Classic |
117 | Eclectic |
132 | Classic |
140 | Classic |
142 | Classic |
157 | Eclectic |
161 | Eclectic |
162 | Bungalow |
176 | Eclectic |
177 | Eclectic |
195 | Classic |
198 | Classic |
224 | Eclectic |
234 | Classic |
237 | Classic |
239 | Classic |
249 | Classic |
251 | Classic |
254 | Eclectic |
255 | Eclectic |
260 | Classic |
274 | Eclectic |
294 | Classic |
301 | Eclectic |
306 | Eclectic |
317 | Eclectic |
329 | Bungalow |
339 | Bungalow |
340 | Eclectic |
353 | Eclectic |
355 | Eclectic |
364 | Eclectic |
367 | Bungalow |
377 | Bungalow |
401 | Eclectic |
403 | Eclectic |
408 | Eclectic |
411 | Eclectic |
440 | Eclectic |
441 | Eclectic |
443 | Eclectic |
459 | Classic |
463 | Classic |
469 | Eclectic |
472 | Eclectic |
527 | Bungalow |
530 | Eclectic |
540 | Eclectic |
Prediction Accuracy
This query returns the prediction accuracy:
SELECT (SELECT COUNT(*) FROM housing_prediction, housing_test WHERE housing_prediction.sn = housing_test.sn AND housing_prediction.pred_label = housing_test.homestyle) / (SELECT COUNT(SN) FROM housing_prediction) ) AS PA;
PA |
---|
0.98148148148148148148 |
The prediction accuracy is 98.1%, a large improvement over the Forest_Predict function, whose prediction accuracy is 77.7% on the same input.