The Gaussian family requires a continuous numerical response variable whose values cluster around a single mean, so that their distribution resembles a normal (bell) curve.
Input
The input table, housing_train, contains real estate data on homes. The example models the home price (price) with 12 predictors: five numerical variables and seven categorical variables.
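Variable | Role | Description |
---|---|---|
price | Response variable | |
lotsize | Predictor | |
bedrooms | Predictor | |
bathrms | Predictor | |
stories | Predictor | |
driveway | Predictor | |
recroom | Predictor | |
fullbase | Predictor | |
gashw | Predictor | |
airco | Predictor | |
garagepl | Predictor | |
prefarea | Predictor | |
homestyle | Predictor | Architectural style of the house |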
sn | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase | gashw | airco | garagepl | prefarea | homestyle |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 42000 | 5850 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no | Classic |
2 | 38500 | 4000 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | Classic |
3 | 49500 | 3060 | 3 | 1 | 1 | yes | no | no | no | no | 0 | no | Classic |
4 | 60500 | 6650 | 3 | 1 | 2 | yes | yes | no | no | no | 0 | no | Eclectic |
5 | 61000 | 6360 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | Eclectic |
6 | 66000 | 4160 | 3 | 1 | 1 | yes | yes | yes | no | yes | 0 | no | Eclectic |
7 | 66000 | 3880 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | no | Eclectic |
8 | 69000 | 4160 | 3 | 1 | 3 | yes | no | no | no | no | 0 | no | Eclectic |
9 | 83800 | 4800 | 3 | 1 | 1 | yes | yes | yes | no | no | 0 | no | Eclectic |
10 | 88500 | 5500 | 3 | 2 | 4 | yes | yes | no | no | yes | 1 | no | Eclectic |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
SQL Call
This example sets the family to GAUSSIAN, whose default link function is IDENTITY.
DROP TABLE glm_housing_model;
SELECT * FROM GLM (
  ON housing_train AS InputTable
  OUT TABLE OutputTable (glm_housing_model)
  USING
  InputColumns ('price', 'lotsize', 'bedrooms', 'bathrms', 'stories', 'garagepl', 'driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'prefarea', 'homestyle')
  CategoricalColumns ('driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'prefarea', 'homestyle')
  Family ('GAUSSIAN')
  LinkFunction ('IDENTITY')
  WeightColumn ('1')
  StopThreshold (0.01)
  MaxIterNum (25)
  Step ('false')
  Intercept ('true')
) AS dt;
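
For reference, the model that a Gaussian family with an identity link describes can be written as below. This is the standard textbook form of linear regression expressed as a GLM, not notation taken from the function's documentation. The symbols $y_i$ (the price of home $i$), $x_{ij}$ (its predictor values, with categorical columns expanded into indicator variables), $\beta_j$, and $\sigma^2$ are introduced here for illustration only.

```latex
\begin{align}
y_i \mid x_i &\sim \mathcal{N}(\mu_i, \sigma^2) \\
g(\mu_i) = \mu_i &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
\end{align}
```

Because the link $g$ is the identity, each estimated coefficient reads directly as a change in expected price per unit change in its predictor.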
Output
predictor | estimate | std_error | t_score | p_value | significance |
---|---|---|---|---|---|
(Intercept) | 36349.3 | 2733.46 | 13.2979 | 0 | *** |
lotsize | 2.08095 | 0.26133 | 7.96291 | 1.24345e-14 | *** |
bedrooms | 782.093 | 766.84 | 1.01989 | 0.308296 | |
bathrms | 6772.31 | 1106.78 | 6.11894 | 1.96318e-09 | *** |
stories | 2445.62 | 694.145 | 3.52321 | 0.000467307 | *** |
garagepl | 1483.1 | 623.597 | 2.3783 | 0.0177847 | * |
driveway.no | -2822.63 | 1481.25 | -1.90558 | 0.0573049 | . |
recroom.yes | 1208.53 | 1358.57 | 0.88956 | 0.37415 | |
fullbase.yes | 3588.3 | 1167.37 | 3.07382 | 0.00223419 | ** |
gashw.yes | 5787.25 | 2405.47 | 2.40587 | 0.0165127 | * |
airco.yes | 6478.79 | 1152.16 | 5.62317 | 3.19341e-08 | *** |
prefarea.yes | 6465.64 | 1212.84 | 5.33099 | 1.50887e-07 | *** |
homestyle.Classic | -16550.9 | 1308.59 | -12.6479 | 0 | *** |
homestyle.bungalow | 37577.7 | 1850.17 | 20.3104 | 0 | *** |
ITERATIONS # | 2 | 0 | 0 | 0 | Number of Fisher Scoring iterations |
ROWS # | 492 | 0 | 0 | 0 | Number of rows |
Residual deviance | Infinity | 0 | 0 | 0 | on 478 degrees of freedom |
Pearson goodness of fit | 5.30669e+10 | 0 | 0 | 0 | on 478 degrees of freedom |
AIC | Infinity | 0 | 0 | 0 | Akaike information criterion |
BIC | Infinity | 0 | 0 | 0 | Bayesian information criterion |
Wald Test | 23174 | 0 | 0 | 0 | *** |
Dispersion parameter | 1.11019e+08 | 0 | 0 | 0 | Taken to be 1 for BINOMIAL and POISSON. |
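
The figures in the output table can be cross-checked with a little arithmetic. The sketch below assumes, as is conventional for GLM summaries, that each t_score is the estimate divided by its standard error, that the residual degrees of freedom equal the row count minus the number of fitted coefficients, and that the dispersion parameter is the Pearson statistic divided by those degrees of freedom. The function's own computation is not shown in this example, so treat these as consistency checks rather than a description of its internals. All variable names are illustrative.

```python
import math

# (estimate, std_error, reported t_score), copied from the output table.
rows = {
    "(Intercept)": (36349.3, 2733.46, 13.2979),
    "lotsize": (2.08095, 0.26133, 7.96291),
    "bathrms": (6772.31, 1106.78, 6.11894),
    "driveway.no": (-2822.63, 1481.25, -1.90558),
}


def t_score(estimate, std_error):
    """Assumed definition: coefficient estimate divided by its standard error."""
    return estimate / std_error


# True for every row if the assumed definition reproduces the reported value
# (to within the rounding of the printed table).
t_scores_match = all(
    math.isclose(t_score(est, se), reported, rel_tol=1e-4)
    for est, se, reported in rows.values()
)

# 492 rows and 14 fitted coefficients (intercept, 5 numeric predictors,
# 8 categorical indicator terms) leave 478 residual degrees of freedom.
n_rows = 492
n_coefficients = 14
residual_df = n_rows - n_coefficients

# Assumed definition: dispersion = Pearson goodness of fit / residual df.
pearson = 5.30669e10
dispersion = pearson / residual_df

print(f"t-scores consistent: {t_scores_match}")
print(f"residual df: {residual_df}")
print(f"dispersion: {dispersion:.5e}")
```

Under these assumptions, the computed values agree with the reported 478 degrees of freedom and the dispersion parameter of 1.11019e+08.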
All predictors except bedrooms, driveway.no, and recroom.yes are significant at the 95% confidence level (p-value < 0.05).
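
To make the significance column concrete, the following sketch filters the coefficients by p-value and maps each p-value to a significance code. The thresholds (0.001, 0.01, 0.05, 0.1) are an assumption: they follow the common R-style convention and happen to reproduce every code in the output table, but the example does not state the function's actual cutoffs. The helper name `significance_code` is hypothetical.

```python
# (predictor, p_value, significance) copied from the output table.
coefficients = [
    ("(Intercept)", 0.0, "***"),
    ("lotsize", 1.24345e-14, "***"),
    ("bedrooms", 0.308296, ""),
    ("bathrms", 1.96318e-09, "***"),
    ("stories", 0.000467307, "***"),
    ("garagepl", 0.0177847, "*"),
    ("driveway.no", 0.0573049, "."),
    ("recroom.yes", 0.37415, ""),
    ("fullbase.yes", 0.00223419, "**"),
    ("gashw.yes", 0.0165127, "*"),
    ("airco.yes", 3.19341e-08, "***"),
    ("prefarea.yes", 1.50887e-07, "***"),
    ("homestyle.Classic", 0.0, "***"),
    ("homestyle.bungalow", 0.0, "***"),
]


def significance_code(p_value):
    """Map a p-value to a code using assumed R-style thresholds."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""


# Predictors that are NOT significant at the 95% confidence level.
not_significant = [name for name, p, _ in coefficients if p >= 0.05]

# True if the assumed thresholds reproduce every code in the table.
codes_match = all(significance_code(p) == code for _, p, code in coefficients)

print("Not significant at 0.05:", not_significant)
print("Assumed thresholds match the table:", codes_match)
```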
This query returns the following table:
SELECT * FROM glm_housing_model ORDER BY attribute;
attribute | predictor | category | estimate | std_err | z_score | p_value | significance | family |
---|---|---|---|---|---|---|---|---|
-1 | Loglik | | -Infinity | 492 | 13 | 0 | | GAUSSIAN |
0 | (Intercept) | | 36349.3 | 2733.46 | 13.2979 | 0 | *** | GAUSSIAN |
1 | lotsize | | 2.08095 | 0.26133 | 7.96291 | 1.24345e-14 | *** | GAUSSIAN |
2 | bedrooms | | 782.093 | 766.84 | 1.01989 | 0.308296 | | GAUSSIAN |
3 | bathrms | | 6772.31 | 1106.78 | 6.11894 | 1.96318e-09 | *** | GAUSSIAN |
4 | stories | | 2445.62 | 694.145 | 3.52321 | 0.000467307 | *** | GAUSSIAN |
5 | garagepl | | 1483.1 | 623.597 | 2.3783 | 0.0177847 | * | GAUSSIAN |
6 | driveway | yes | | | | | | GAUSSIAN |
7 | driveway | no | -2822.63 | 1481.25 | -1.90558 | 0.0573049 | . | GAUSSIAN |
8 | recroom | no | | | | | | GAUSSIAN |
9 | recroom | yes | 1208.53 | 1358.57 | 0.88956 | 0.37415 | | GAUSSIAN |
10 | fullbase | no | | | | | | GAUSSIAN |
11 | fullbase | yes | 3588.3 | 1167.37 | 3.07382 | 0.00223419 | ** | GAUSSIAN |
12 | gashw | no | | | | | | GAUSSIAN |
13 | gashw | yes | 5787.25 | 2405.47 | 2.40587 | 0.0165127 | * | GAUSSIAN |
14 | airco | no | | | | | | GAUSSIAN |
15 | airco | yes | 6478.79 | 1152.16 | 5.62317 | 3.19341e-08 | *** | GAUSSIAN |
16 | prefarea | no | | | | | | GAUSSIAN |
17 | prefarea | yes | 6465.64 | 1212.84 | 5.33099 | 1.50887e-07 | *** | GAUSSIAN |
18 | homestyle | Eclectic | | | | | | GAUSSIAN |
19 | homestyle | Classic | -16550.9 | 1308.59 | -12.6479 | 0 | *** | GAUSSIAN |
20 | homestyle | bungalow | 37577.7 | 1850.17 | 20.3104 | 0 | *** | GAUSSIAN |
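
As an illustration of how the model table could be read, the sketch below applies the coefficients by hand to the first row of housing_train. It assumes that categories listed without an estimate (for example, driveway = yes or homestyle = Eclectic) are reference levels that contribute nothing to the prediction, and that with the identity link the prediction is simply the intercept plus the weighted predictors. The names `predict_price` and `row_1` are hypothetical; this is not the scoring function shipped with the product.

```python
INTERCEPT = 36349.3

# Coefficients for numeric predictors, copied from the model table.
NUMERIC = {
    "lotsize": 2.08095,
    "bedrooms": 782.093,
    "bathrms": 6772.31,
    "stories": 2445.62,
    "garagepl": 1483.1,
}

# Coefficients for non-reference categories. Categories absent from this
# mapping are assumed to be reference levels with an implicit coefficient of 0.
CATEGORICAL = {
    ("driveway", "no"): -2822.63,
    ("recroom", "yes"): 1208.53,
    ("fullbase", "yes"): 3588.3,
    ("gashw", "yes"): 5787.25,
    ("airco", "yes"): 6478.79,
    ("prefarea", "yes"): 6465.64,
    ("homestyle", "Classic"): -16550.9,
    ("homestyle", "bungalow"): 37577.7,
}

CATEGORICAL_COLUMNS = sorted({column for column, _ in CATEGORICAL})


def predict_price(row):
    """Identity link: prediction = intercept + sum of coefficient * value."""
    total = INTERCEPT
    for column, coefficient in NUMERIC.items():
        total += coefficient * row[column]
    for column in CATEGORICAL_COLUMNS:
        total += CATEGORICAL.get((column, row[column]), 0.0)
    return total


# First row of housing_train (sn = 1), whose observed price is 42000.
row_1 = {
    "lotsize": 5850, "bedrooms": 3, "bathrms": 1, "stories": 2,
    "driveway": "yes", "recroom": "no", "fullbase": "yes", "gashw": "no",
    "airco": "no", "garagepl": 1, "prefarea": "no", "homestyle": "Classic",
}

predicted = predict_price(row_1)
print(f"predicted price: {predicted:.2f}, observed price: 42000")
```

For this row only two categorical terms are non-zero: fullbase = yes and homestyle = Classic. Every other categorical value falls on an assumed reference level.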