This example uses home sales data to create a model that predicts home style, which can be input to the AdaBoostPredict Example 1.
Input
- AttributeTable: housing_train_att, created by inputting housing_train (the raw data for DecisionForest Example 1: Classification Tree without Out-of-Bag Error) to the Unpivoting function:
CREATE MULTISET TABLE housing_train_att AS ( SELECT * FROM Unpivoting ( ON housing_train USING TargetColumns ('price', 'lotsize', 'bedrooms', 'bathrms', 'stories', 'driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'garagepl', 'prefarea') Accumulate ('sn') ) AS dt ) WITH DATA;
- CategoricalAttributeTable: housing_cat, created and populated with this code:
CREATE MULTISET TABLE housing_cat (attribute VARCHAR); INSERT INTO housing_cat VALUES ('driveway'); INSERT INTO housing_cat VALUES ('recroom'); INSERT INTO housing_cat VALUES ('fullbase'); INSERT INTO housing_cat VALUES ( 'gashw'); INSERT INTO housing_cat VALUES ('airco'); INSERT INTO housing_cat VALUES ('prefarea');
- ResponseTable: housing_train_response, created by selecting the columns sn and homestyle from housing_train:
CREATE MULTISET TABLE housing_train_response AS ( SELECT sn, homestyle AS response FROM housing_train ) WITH DATA;
This query returns the following table:
SELECT * FROM housing_train_att ORDER BY 1, 2;
sn | attribute | value_col |
---|---|---|
1 | airco | no |
1 | bathrms | 1 |
1 | bedrooms | 3 |
1 | driveway | yes |
1 | fullbase | yes |
1 | garagepl | 1 |
1 | gashw | no |
1 | lotsize | 5850.0 |
1 | prefarea | no |
1 | price | 42000.0 |
1 | recroom | no |
1 | stories | 2 |
2 | airco | no |
2 | bathrms | 1 |
2 | bedrooms | 2 |
2 | fullbase | no |
2 | garagepl | 0 |
2 | gashw | no |
2 | lotsize | 4000.0 |
2 | prefarea | no |
2 | price | 38500.0 |
2 | recroom | no |
2 | stories | 1 |
... | ... | ... |
This query returns the following table:
SELECT * FROM housing_cat ORDER BY 1;
attribute |
---|
airco |
driveway |
fullbase |
gashw |
prefarea |
recroom |
This query returns the following table:
SELECT * FROM housing_train_response ORDER BY 1;
sn | response |
---|---|
1 | Classic |
2 | Classic |
3 | Classic |
4 | Eclectic |
5 | Eclectic |
6 | Eclectic |
7 | Eclectic |
8 | Eclectic |
9 | Eclectic |
10 | Eclectic |
... | ... |
SQL Call
This call creates the model, abd_model, using the default values for the optional arguments.
SELECT * FROM AdaBoost ( ON housing_train_att AS AttributeTable ON housing_cat AS CategoricalAttributeTable ON housing_train_response AS ResponseTable OUT TABLE OutputTable (abd_model) USING IdColumns ('sn') AttributeNameColumns ('attribute') AttributeValueColumn ('value_col') ResponseColumn ('response') IterNum (20) NumSplits (10) MaxDepth (3) MinNodeSize (100) ) AS dt ;
Output
Because the argument IterNum has the value 20, the function builds 20 classification trees.
message |
---|
Running 20 round AdaBoost, computing 20 classification trees. AdaBoost model created. |
This query returns the following table:
SELECT * FROM abd_model ORDER BY 1,3;
classifier_id | classifier_weight | node_id | node_label | node_majorfreq | attribute | split_value | left_bucket | right_bucket | left_label | right_label | left_majorfreq | right_majorfreq |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2.82867818889083 | 0 | Eclectic | 0.601626016260163 | price | 49500 | Classic | Eclectic | 0.241869918699188 | 0.601626016260163 | ||
1 | 2.82867818889083 | 2 | Eclectic | 0.601626016260163 | price | 90000 | Eclectic | bungalow | 0.538617886178861 | 0.113821138211382 | ||
1 | 2.82867818889083 | 5 | Eclectic | 0.538617886178861 | price | 55000 | Eclectic | Eclectic | 0.0792682926829268 | 0.459349593495934 | ||
2 | 3.284500712218 | 0 | Eclectic | 0.598193473193469 | price | 55000 | Classic | Eclectic | 0.359382284382285 | 0.568648018648015 | ||
2 | 3.284500712218 | 1 | Classic | 0.359382284382285 | gashw | NaN | yes | no | Eclectic | Classic | 0.00303030303030304 | 0.357867132867133 |
2 | 3.284500712218 | 2 | Eclectic | 0.568648018648017 | garagepl | 1 | Eclectic | Eclectic | 0.257575757575758 | 0.311072261072259 | ||
2 | 3.284500712218 | 4 | Classic | 0.357867132867133 | fullbase | NaN | yes | no | Classic | Classic | 0.0294871794871793 | 0.328379953379953 |
2 | 3.284500712218 | 5 | Eclectic | 0.257575757575758 | lotsize | 16200 | Eclectic | bungalow | 0.257575757575758 | 0.000757575757575758 | ||
2 | 3.284500712218 | 6 | Eclectic | 0.311072261072261 | stories | 4 | Eclectic | Eclectic | 0.295221445221445 | 0.0158508158508158 | ||
3 | 2.87425191371606 | 0 | Eclectic | 0.458459069137445 | price | 95000 | Eclectic | bungalow | 0.384960363716916 | 0.398822168720203 | ||
3 | 2.87425191371606 | 1 | Eclectic | 0.384960363716916 | price | 51500 | Classic | Eclectic | 0.142718762142357 | 0.348728479658944 | ||
3 | 2.87425191371606 | 3 | Classic | 0.142718762142357 | garagepl | 2 | Classic | Eclectic | 0.13649642855846 | 0.0144927536231885 | ||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |