GLM Example 3: Gaussian Distribution Analysis with Default Options - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

For the Gaussian distribution, the response variable must be a continuous numerical variable whose values cluster around a single mean, so that a plot of its distribution resembles a normal (bell) curve.

Input

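The input table, housing_train, contains real estate data on homes. This example models the home price with 12 predictors: five numerical variables (lotsize, bedrooms, bathrms, stories, garagepl) and seven categorical variables (driveway, recroom, fullbase, gashw, airco, prefarea, homestyle).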

Variable Definitions
Response Variable Predictor Description
price    
  lotsize  
  bedrooms  
  bathrms  
  stories  
  driveway  
  recroom  
  fullbase  
  gashw  
  airco  
  garagepl  
  prefarea  
  homestyle Architectural style of the house
housing_train
sn price lotsize bedrooms bathrms stories driveway recroom fullbase gashw airco garagepl prefarea homestyle
1 42000 5850 3 1 2 yes no yes no no 1 no Classic
2 38500 4000 2 1 1 yes no no no no 0 no Classic
3 49500 3060 3 1 1 yes no no no no 0 no Classic
4 60500 6650 3 1 2 yes yes no no no 0 no Eclectic
5 61000 6360 2 1 1 yes no no no no 0 no Eclectic
6 66000 4160 3 1 1 yes yes yes no yes 0 no Eclectic
7 66000 3880 3 2 2 yes no yes no no 2 no Eclectic
8 69000 4160 3 1 3 yes no no no no 0 no Eclectic
9 83800 4800 3 1 1 yes yes yes no no 0 no Eclectic
10 88500 5500 3 2 4 yes yes no no yes 1 no Eclectic
... ... ... ... ... ... ... ... ... ... ... ... ... ...
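
Before fitting, it can help to check that the response looks roughly like the single-mean, bell-shaped variable the Gaussian family expects. The query below is a minimal sketch of such a check, not part of the original example. It assumes price is stored as a numeric column, and the output aliases (n_rows, mean_price, and so on) are illustrative names only.

```sql
-- Summary statistics for the response variable.
-- A skewness near 0 suggests a roughly symmetric distribution;
-- a large positive value suggests a long right tail.
SELECT COUNT(*)           AS n_rows,
       AVG(price)         AS mean_price,
       STDDEV_SAMP(price) AS sd_price,
       MIN(price)         AS min_price,
       MAX(price)         AS max_price,
       SKEW(price)        AS skew_price
FROM housing_train;
```

These statistics are only a rough screen. They do not replace a look at the residuals of the fitted model.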

SQL Call

In this example, the family is GAUSSIAN and the link function is IDENTITY, which is the default link for this family.

DROP TABLE glm_housing_model;

SELECT * FROM GLM (
  ON housing_train AS InputTable
  OUT TABLE OutputTable (glm_housing_model)
  USING
  InputColumns ('price', 'lotsize', 'bedrooms', 'bathrms',
                'stories', 'garagepl', 'driveway', 'recroom',
                'fullbase', 'gashw', 'airco', 'prefarea', 'homestyle')
  CategoricalColumns ('driveway', 'recroom', 'fullbase', 'gashw',
                       'airco', 'prefarea', 'homestyle')
  Family ('GAUSSIAN')
  LinkFunction ('IDENTITY')
  WeightColumn ('1')
  StopThreshold (0.01)
  MaxIterNum (25)
  Step ('false')
  Intercept ('true')
) AS dt;
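
With Family ('GAUSSIAN') and LinkFunction ('IDENTITY'), the model being fit has the form of an ordinary linear regression. The notation below is a sketch introduced here for illustration and does not appear in the original reference: \beta_0 is the intercept, x_{ij} is the value of the j-th numeric predictor or dummy-coded category for row i, and \sigma^2 is the error variance.

```latex
\[
\text{price}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\]
\[
g(\mu_i) = \mu_i, \qquad \mu_i = \mathbb{E}[\text{price}_i \mid x_{i1}, \dots, x_{ip}]
\]
```

In this example p = 13: five numeric predictors, one dummy for the non-reference level of each of the six yes/no columns, and two dummies for homestyle. Together with the intercept that makes 14 estimated coefficients, which is consistent with the 492 - 14 = 478 degrees of freedom reported in the output.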

Output

Model Statistics
predictor estimate std_error t_score p_value significance
(Intercept) 36349.3 2733.46 13.2979 0 ***
lotsize 2.08095 0.26133 7.96291 1.24345e-14 ***
bedrooms 782.093 766.84 1.01989 0.308296  
bathrms 6772.31 1106.78 6.11894 1.96318e-09 ***
stories 2445.62 694.145 3.52321 0.000467307 ***
garagepl 1483.1 623.597 2.3783 0.0177847 *
driveway.no -2822.63 1481.25 -1.90558 0.0573049 .
recroom.yes 1208.53 1358.57 0.88956 0.37415  
fullbase.yes 3588.3 1167.37 3.07382 0.00223419 **
gashw.yes 5787.25 2405.47 2.40587 0.0165127 *
airco.yes 6478.79 1152.16 5.62317 3.19341e-08 ***
prefarea.yes 6465.64 1212.84 5.33099 1.50887e-07 ***
homestyle.Classic -16550.9 1308.59 -12.6479 0 ***
homestyle.bungalow 37577.7 1850.17 20.3104 0 ***
ITERATIONS # 2 0 0 0 Number of Fisher Scoring iterations
ROWS # 492 0 0 0 Number of rows
Residual deviance Infinity 0 0 0 on 478 degrees of freedom
Pearson goodness of fit 5.30669e+10 0 0 0 on 478 degrees of freedom
AIC Infinity 0 0 0 Akaike information criterion
BIC Infinity 0 0 0 Bayesian information criterion
Wald Test 23174 0 0 0 ***
Dispersion parameter 1.11019e+08 0 0 0 Taken to be 1 for BINOMIAL and POISSON.

Most predictors are significant at the 95% confidence level (p-value < 0.05). The exceptions are bedrooms, driveway.no, and recroom.yes.
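
Because the link is IDENTITY, a predicted price is the intercept plus the sum of each estimate multiplied by the corresponding predictor value, where a categorical estimate applies only when the row has that category. The query below is a hand-coded sketch of that calculation using the estimates from the Model Statistics table. It assumes the numeric predictors are stored as numeric columns and the categorical values are spelled exactly as in the sample rows. The names scored, predicted_price, and residual are illustrative.

```sql
-- Apply the fitted coefficients to the first ten rows of the input table.
-- Reference categories (driveway = 'yes', homestyle = 'Eclectic', and the
-- 'no' level of the other yes/no columns) contribute 0.
SELECT sn,
       price,
       predicted_price,
       price - predicted_price AS residual
FROM (
  SELECT sn,
         price,
         36349.3
         + 2.08095 * lotsize
         + 782.093 * bedrooms
         + 6772.31 * bathrms
         + 2445.62 * stories
         + 1483.1  * garagepl
         + CASE WHEN driveway = 'no'  THEN -2822.63 ELSE 0 END
         + CASE WHEN recroom  = 'yes' THEN 1208.53  ELSE 0 END
         + CASE WHEN fullbase = 'yes' THEN 3588.3   ELSE 0 END
         + CASE WHEN gashw    = 'yes' THEN 5787.25  ELSE 0 END
         + CASE WHEN airco    = 'yes' THEN 6478.79  ELSE 0 END
         + CASE WHEN prefarea = 'yes' THEN 6465.64  ELSE 0 END
         + CASE homestyle
             WHEN 'Classic'  THEN -16550.9
             WHEN 'bungalow' THEN 37577.7
             ELSE 0
           END AS predicted_price
  FROM housing_train
  WHERE sn <= 10
) AS scored
ORDER BY sn;
```

For the first sample row (sn = 1), the terms are 36349.3 + 12173.56 + 2346.28 + 6772.31 + 4891.24 + 1483.1 + 3588.3 - 16550.9, which gives a predicted price of about 51,053 against an actual price of 42,000.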

This query returns the following table:

SELECT * FROM glm_housing_model ORDER BY attribute;
glm_housing_model
attribute predictor category estimate std_err z_score p_value significance family
-1 Loglik   -Infinity 492 13 0   GAUSSIAN
0 (Intercept)   36349.3 2733.46 13.2979 0 *** GAUSSIAN
1 lotsize   2.08095 0.26133 7.96291 1.24345e-14 *** GAUSSIAN
2 bedrooms   782.093 766.84 1.01989 0.308296   GAUSSIAN
3 bathrms   6772.31 1106.78 6.11894 1.96318e-09 *** GAUSSIAN
4 stories   2445.62 694.145 3.52321 0.000467307 *** GAUSSIAN
5 garagepl   1483.1 623.597 2.3783 0.0177847 * GAUSSIAN
6 driveway yes           GAUSSIAN
7 driveway no -2822.63 1481.25 -1.90558 0.0573049 . GAUSSIAN
8 recroom no           GAUSSIAN
9 recroom yes 1208.53 1358.57 0.88956 0.37415   GAUSSIAN
10 fullbase no           GAUSSIAN
11 fullbase yes 3588.3 1167.37 3.07382 0.00223419 ** GAUSSIAN
12 gashw no           GAUSSIAN
13 gashw yes 5787.25 2405.47 2.40587 0.0165127 * GAUSSIAN
14 airco no           GAUSSIAN
15 airco yes 6478.79 1152.16 5.62317 3.19341e-08 *** GAUSSIAN
16 prefarea no           GAUSSIAN
17 prefarea yes 6465.64 1212.84 5.33099 1.50887e-07 *** GAUSSIAN
18 homestyle Eclectic           GAUSSIAN
19 homestyle Classic -16550.9 1308.59 -12.6479 0 *** GAUSSIAN
20 homestyle bungalow 37577.7 1850.17 20.3104 0 *** GAUSSIAN
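
The significance screen described for the Model Statistics output can also be run directly against the model table. The query below is a sketch that uses only the columns shown in glm_housing_model. It assumes the blank cells on the reference-category rows are stored as NULL, so that those rows fail the p_value comparison and drop out.

```sql
-- List coefficients that are significant at the 95% confidence level.
SELECT predictor,
       category,
       estimate,
       p_value
FROM glm_housing_model
WHERE attribute >= 0     -- skip the Loglik summary row (attribute = -1)
  AND p_value < 0.05
ORDER BY p_value, attribute;
```

With the values shown above, this would exclude bedrooms, driveway (no), and recroom (yes), whose p-values are 0.308296, 0.0573049, and 0.37415.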