The example uses the Gaussian model created in GLM Example: Gaussian Distribution Analysis to predict house prices. To evaluate the accuracy of the model, the example calculates the root mean square error (RMSE) between the known actual price from the input table and the price predicted by the model.
Input
- Input table: housing_test, which contains test data for 54 houses
- Model: glm_housing_model from the output section of GLM Example: Gaussian Distribution Analysis
sn | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase | gashw | airco | garagepl | prefarea | homestyle |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | 27000 | 1700 | 3 | 1 | 2 | yes | no | no | no | no | 0 | no | Classic |
16 | 37900 | 3185 | 2 | 1 | 1 | yes | no | no | no | yes | 0 | no | Classic |
25 | 42000 | 4960 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | Classic |
38 | 67000 | 5170 | 3 | 1 | 4 | yes | no | no | no | yes | 0 | no | Eclectic |
53 | 68000 | 9166 | 2 | 1 | 1 | yes | no | yes | no | yes | 2 | no | Eclectic |
104 | 132000 | 3500 | 4 | 2 | 2 | yes | no | no | yes | no | 2 | no | bungalow |
111 | 43000 | 5076 | 3 | 1 | 1 | no | no | no | no | no | 0 | no | Classic |
117 | 93000 | 3760 | 3 | 1 | 2 | yes | no | no | yes | no | 2 | no | Eclectic |
132 | 44500 | 3850 | 3 | 1 | 2 | yes | no | no | no | no | 0 | no | Classic |
140 | 43000 | 3750 | 3 | 1 | 2 | yes | no | no | no | no | 0 | no | Classic |
142 | 40000 | 2650 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no | Classic |
157 | 60000 | 2953 | 3 | 1 | 2 | yes | no | yes | no | yes | 0 | no | Eclectic |
161 | 63900 | 3162 | 3 | 1 | 2 | yes | no | no | no | yes | 1 | no | Eclectic |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
SQL Call
The canonical link specifies the default family link, which is "identity" for the Gaussian distribution.
CREATE MULTISET TABLE glmpredict_housing AS ( SELECT * FROM GLMPredict_MLE ( ON housing_test PARTITION BY ANY ON glm_housing_model AS Model DIMENSION USING Accumulate ('sn', 'price') Family ('GAUSSIAN') LinkFunction ('CANONICAL') ) AS dt ) WITH DATA;
Output
The fitted_value column gives the predicted home price.
SELECT * FROM glmpredict_housing ORDER BY 1;
sn price fitted_value --- -------- ------------------ 13 27000.0 37345.84477329254 16 37900.0 43687.13245987892 25 42000.0 40902.02870941162 38 67000.0 72487.67201280594 53 68000.0 79238.69493055344 104 132000.0 111528.00744915009 111 43000.0 39102.882046699524 117 93000.0 66936.95215988159 132 44500.0 41819.88732004166 140 43000.0 41611.79231786728 142 40000.0 44394.14731836319 157 60000.0 66571.26562905312 161 63900.0 64900.98411035538 162 130000.0 107759.1224937439 176 57500.0 73438.73871564865 177 70000.0 62378.35326194763 195 33000.0 37197.9376707077 198 40500.0 47308.08242368698 224 78500.0 67232.86958527565 234 32500.0 35237.16528749466 237 43000.0 46593.47125959396 239 26000.0 43377.86666679382 249 44500.0 37863.84167766571 251 48500.0 45096.38719415665 254 60000.0 74665.70030021667 255 61000.0 60214.16523933411 260 41000.0 43066.21673202515 274 64900.0 67232.45221805573 294 47000.0 38987.55468940735 301 55000.0 55621.6930809021 306 64000.0 67339.69788479805 317 80000.0 65655.120470047 329 115442.0 123612.01978111267 339 141000.0 126282.07774448395 340 62500.0 58474.835211753845 353 78500.0 67485.69113445282 355 86900.0 68425.80433177948 364 72000.0 77422.12543773651 367 114000.0 128556.01284217834 377 140000.0 127201.90244436264 401 92500.0 84040.80987596512 403 77500.0 79857.25416207314 408 87500.0 76218.38956928253 411 90000.0 78179.1003665924 440 69000.0 80549.23930311203 441 51900.0 64670.294174194336 443 65000.0 61704.09422302246 459 44555.0 42818.367908000946 463 49000.0 49293.44947862625 469 55000.0 61779.35452270508 472 60500.0 63767.0579059124 527 105000.0 119762.26224088669 530 108000.0 116119.24969100952 540 85000.0 73146.08736228943
RMSE
This query returns the following table:
SELECT SQRT(AVG(POWER(glmpredict_housing.price - glmpredict_housing.fitted_value, 2))) AS RMSE FROM glmpredict_housing;
RMSE ------------------ 10246.752127962065
Download a zip file of all examples and a SQL script file that creates their input tables.