Forest_Drive Example Input Data Descriptions describes the real estate sales data contained in the input table. There are six numerical predictors and six categorical predictors. The response variable is homestyle.
The table of raw training data, housing_train, is described by the following two tables.
sn | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase |
---|---|---|---|---|---|---|---|---|
1 | 42000 | 5850 | 3 | 1 | 2 | yes | no | yes |
2 | 38500 | 4000 | 2 | 2 | 1 | yes | no | no |
3 | 49500 | 3060 | 3 | 3 | 1 | yes | no | no |
4 | 60500 | 6650 | 3 | 1 | 2 | yes | yes | no |
5 | 61000 | 6360 | 2 | 1 | 1 | yes | no | no |
6 | 66000 | 4160 | 3 | 1 | 1 | yes | yes | yes |
7 | 66000 | 3880 | 3 | 2 | 2 | yes | no | yes |
8 | 69000 | 4160 | 3 | 1 | 1 | yes | no | no |
9 | 83800 | 4800 | 3 | 1 | 1 | yes | yes | yes |
10 | 88500 | 5500 | 3 | 2 | 4 | yes | yes | no |
11 | 90000 | 7200 | 3 | 2 | 1 | yes | no | yes |
12 | 30500 | 3000 | 2 | 1 | 1 | yes | no | no |
14 | 36000 | 2880 | 3 | 1 | 1 | no | no | no |
15 | 37000 | 3600 | 2 | 1 | 1 | yes | no | no |
... | ... | ... | ... | ... | ... | ... | ... | ... |
gashw | airco | prefarea | garagepl | homestyle |
---|---|---|---|---|
no | no | 1 | no | Classic |
no | no | 0 | no | Classic |
no | no | 0 | no | Classic |
no | no | 0 | no | Eclectic |
no | no | 0 | no | Eclectic |
no | yes | 0 | no | Eclectic |
no | no | 2 | no | Eclectic |
no | no | 0 | no | Eclectic |
no | no | 0 | no | Eclectic |
no | yes | 1 | no | Eclectic |
no | yes | 3 | no | Eclectic |
no | no | 0 | no | Classic |
no | no | 0 | no | Classic |
no | no | 0 | no | Classic |
... | ... | ... | ... | ... |
Create the input table for the AdaBoostDrive function, housing_train_att, by using the Unpivot function on the table of raw data, housing_train:
CREATE TABLE housing_train_att distribute BY HASH(sn) AS ( SELECT * FROM unpivot ( ON housing_train ColsToUnpivot ('price', 'lotsize', 'bedrooms', 'bathrms', 'stories', 'driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'garagepl', 'prefarea') ColsToAccumulate ('sn') ) );
This query returns the following table:
SELECT * FROM housing_train_att ORDER BY 1, 2;
sn | attribute | value |
---|---|---|
1 | airco | no |
1 | bathrms | 1 |
1 | bedrooms | 3 |
1 | driveway | yes |
1 | fullbase | yes |
1 | garagepl | 1 |
1 | gashw | no |
1 | lotsize | 5850.0 |
1 | prefarea | no |
1 | price | 42000.0 |
1 | recroom | no |
1 | stories | 2 |
2 | airco | no |
2 | bathrms | 1 |
2 | bedrooms | 2 |
2 | fullbase | no |
2 | garagepl | 0 |
2 | gashw | no |
2 | lotsize | 4000.0 |
2 | prefarea | no |
2 | price | 38500.0 |
2 | recroom | no |
2 | stories | 1 |
... | ... | ... |
Create the response table for the AdaBoostDrive function, housing_train_response, by selecting the columns sn and homestyle from the table of raw data, housing_train:
CREATE TABLE housing_train_response DISTRIBUTE BY HASH(sn) AS (SELECT sn, homestyle AS response FROM housing_train);
This query returns the following table:
SELECT * FROM housing_train_response ORDER BY 1;
sn | response |
---|---|
1 | Classic |
2 | Classic |
3 | Classic |
4 | Eclectic |
5 | Eclectic |
6 | Eclectic |
7 | Eclectic |
8 | Eclectic |
9 | Eclectic |
10 | Eclectic |
... | ... |
Create and populate the categorical attribute table, housing_cat, for the AdaBoostDrive function:
CREATE TABLE housing_cat (attribute VARCHAR) DISTRIBUTE BY HASH(attribute); INSERT INTO housing_cat VALUES ('driveway'); INSERT INTO housing_cat VALUES ('recroom'); INSERT INTO housing_cat VALUES ('fullbase'); INSERT INTO housing_cat VALUES ( 'gashw'); INSERT INTO housing_cat VALUES ('airco'); INSERT INTO housing_cat VALUES ('prefarea');
This query returns the following table:
SELECT * FROM housing_cat ORDER BY 1;
attribute |
---|
airco |
driveway |
fullbase |
gashw |
prefarea |
recroom |