GLM Example 2: Logistic Regression Analysis with Step

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

This example creates the regression model using the Step argument with an intercept.

The Step argument is similar to the R function step(). At each step, the function drops one predictor from the current predictor group. The next step starts from the GLM model with the lowest AIC score. The function repeats this process until only the intercept remains.
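
The elimination loop described above can be sketched in Python, outside Vantage. This is a minimal illustration, not the GLM function's implementation: backward_steps and toy_aic are hypothetical names, and the AIC scores are made-up numbers rather than values from this example. In practice, aic_of would fit a GLM on the given predictors and return its AIC.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Step = Tuple[FrozenSet[str], float]


def backward_steps(predictors: Iterable[str],
                   aic_of: Callable[[FrozenSet[str]], float]) -> List[Step]:
    """Return (predictor set, AIC) for each step, from the full model
    down to the intercept-only model (the empty predictor set)."""
    current = frozenset(predictors)
    path = [(current, aic_of(current))]
    while current:
        # Score every model that drops exactly one predictor from the current set.
        candidates = [current - {p} for p in sorted(current)]
        # The candidate with the lowest AIC is the starting point of the next step.
        current = min(candidates, key=aic_of)
        path.append((current, aic_of(current)))
    return path


def toy_aic(subset: FrozenSet[str]) -> float:
    """Hypothetical AIC: made-up deviance cost for each missing predictor,
    plus the usual penalty of 2 per predictor kept."""
    cost_if_missing = {"masters": 5.0, "gpa": 0.1, "stats": 0.5, "programming": 2.0}
    missing = sum(v for k, v in cost_if_missing.items() if k not in subset)
    return 50.0 + missing + 2 * len(subset)


path = backward_steps(["masters", "gpa", "stats", "programming"], toy_aic)
for subset, aic in path:
    print(sorted(subset), round(aic, 1))
```

The loop always continues to the empty predictor set, which mirrors the statement that the process repeats until only the intercept remains. R's step() would normally stop once no drop improves the AIC.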

SQL Call

DROP TABLE glm_admissions_model1;

SELECT * FROM GLM (
  ON admissions_train AS InputTable
  OUT TABLE OutputTable (glm_admissions_model1)
  USING
  InputColumns ('admitted', 'masters', 'gpa', 'stats', 'programming')
  CategoricalColumns ('masters', 'stats', 'programming')
  Family ('LOGISTIC')
  LinkFunction ('LOGIT')
  WeightColumn ('1')
  StopThreshold (0.01)
  MaxIterNum (25)
  Step ('true')
  Intercept ('true')
) AS dt;

Output

The first model has 33 residual degrees of freedom. Each step drops one predictor, which raises the degrees of freedom by one, until the model reaches 39 degrees of freedom and the response is modeled with only the intercept.
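
The degrees of freedom quoted here match the usual definition of residual degrees of freedom: the number of rows minus the number of estimated coefficients. A minimal Python check, assuming the 40 rows reported in the output and counting the intercept as one coefficient (residual_df is a hypothetical helper, not part of the GLM function):

```python
def residual_df(n_rows: int, n_coefficients: int) -> int:
    """Residual degrees of freedom: rows minus estimated coefficients."""
    return n_rows - n_coefficients


N_ROWS = 40  # from the "ROWS #" line of the output

# The full model estimates 7 coefficients (the intercept plus 6 predictor terms).
# Each step drops one term until only the intercept is left.
dfs = [residual_df(N_ROWS, k) for k in range(7, 0, -1)]
print(dfs)
```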

Model Statistics
predictor                estimate   std_error  z_score    p_value    significance
(Intercept)              1.07751    2.92076    0.368914   0.712192
masters.no               2.21655    1.01999    2.17311    0.0297719  *
gpa                      -0.113935  0.802573   -0.141962  0.88711
stats.Novice             0.0406848  1.11567    0.0364667  0.97091
stats.Beginner           0.526618   1.2229     0.430631   0.666736
programming.Beginner     -1.76976   1.069      -1.65553   0.0978177  .
programming.Novice       -0.98035   1.14004    -0.859923  0.389831
ITERATIONS #             4          0          0          0          Number of Fisher Scoring iterations
ROWS #                   40         0          0          0          Number of rows
Residual deviance        38.9038    0          0          0          on 33 degrees of freedom
Pearson goodness of fit  37.7905    0          0          0          on 33 degrees of freedom
AIC                      52.9038    0          0          0          Akaike information criterion
BIC                      64.726     0          0          0          Bayesian information criterion
Wald Test                9.89642    0          0          0.19452
Dispersion parameter     1          0          0          0          Taken to be 1 for BINOMIAL and POISSON.
....                     ....       ....       ....       ....       ....
Residual deviance        44.7694    0          0          0          on 34 degrees of freedom
Pearson goodness of fit  39.895     0          0          0          on 34 degrees of freedom
AIC                      56.7694    0          0          0          Akaike information criterion
BIC                      66.9027    0          0          0          Bayesian information criterion
....                     ....       ....       ....       ....       ....
Residual deviance        41.8984    0          0          0          on 35 degrees of freedom
Pearson goodness of fit  41.8616    0          0          0          on 35 degrees of freedom
AIC                      51.8984    0          0          0          Akaike information criterion
BIC                      60.3428    0          0          0          Bayesian information criterion
....                     ....       ....       ....       ....       ....
Residual deviance        39.1062    0          0          0          on 36 degrees of freedom
Pearson goodness of fit  37.9515    0          0          0          on 36 degrees of freedom
AIC                      47.1062    0          0          0          Akaike information criterion
BIC                      53.8617    0          0          0          Bayesian information criterion
....                     ....       ....       ....       ....       ....
Residual deviance        45.6566    0          0          0          on 37 degrees of freedom
Pearson goodness of fit  40         0          0          0          on 37 degrees of freedom
AIC                      51.6566    0          0          0          Akaike information criterion
BIC                      56.7232    0          0          0          Bayesian information criterion
....                     ....       ....       ....       ....       ....
Residual deviance        42.8744    0          0          0          on 38 degrees of freedom
Pearson goodness of fit  40         0          0          0          on 38 degrees of freedom
AIC                      46.8744    0          0          0          Akaike information criterion
BIC                      50.2522    0          0          0          Bayesian information criterion
....                     ....       ....       ....       ....       ....
(Intercept)              0.619039   0.331497   1.86741    0.0618448  .
ITERATIONS #             3          0          0          0          Number of Fisher Scoring iterations
ROWS #                   40         0          0          0          Number of rows
Residual deviance        51.7958    0          0          0          on 39 degrees of freedom
Pearson goodness of fit  40         0          0          0          on 39 degrees of freedom
AIC                      53.7958    0          0          0          Akaike information criterion
BIC                      55.4847    0          0          0          Bayesian information criterion
Wald Test                3.48721    0          0          0.0618447  .
Dispersion parameter     1          0          0          0          Taken to be 1 for BINOMIAL and POISSON.
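
The z_score and p_value columns are consistent with a standard Wald test: the z-score is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a standard normal distribution. The Python sketch below reproduces the masters.no row under that assumption. The source does not state how the function computes these columns, and wald_z_and_p is a hypothetical helper.

```python
import math
from typing import Tuple


def wald_z_and_p(estimate: float, std_error: float) -> Tuple[float, float]:
    """Wald z-score and two-sided p-value under a standard normal reference."""
    z = estimate / std_error
    # For Z ~ N(0, 1), P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Values taken from the masters.no row of the first model.
z, p = wald_z_and_p(2.21655, 1.01999)
print(round(z, 5), round(p, 5))
```

The result can be compared with the z_score and p_value reported for masters.no. Small differences in the last digits are expected because the inputs are rounded.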

This query returns the following table:

SELECT * FROM glm_admissions_model1 ORDER BY attribute;

The example uses stepwise selection only to show which predictors are included in the regression at each step. The resulting models are not passed to any prediction function. The following table simply collects the coefficients estimated in each step.

glm_admissions_model1
attribute  predictor    category  estimate  std_err   z_score   p_value    significance  family
-1         Loglik                 -22.3847  40        5         0                        LOGISTIC
-1         Loglik                 -19.4519  40        6         0                        LOGISTIC
-1         Loglik                 -19.4621  40        5         0                        LOGISTIC
-1         Loglik                 -19.5493  40        4         0                        LOGISTIC
-1         Loglik                 -20.9492  40        4         0                        LOGISTIC
-1         Loglik                 -22.8158  40        3         0                        LOGISTIC
-1         Loglik                 -19.5531  40        3         0                        LOGISTIC
-1         Loglik                 -21.4127  40        2         0                        LOGISTIC
-1         Loglik                 -22.8283  40        2         0                        LOGISTIC
-1         Loglik                 -21.4372  40        1         0                        LOGISTIC
-1         Loglik                 -25.8979  40        0         0                        LOGISTIC
0          (Intercept)            0.676415  0.722718  0.935932  0.349308                 LOGISTIC
0          (Intercept)            0.989811  2.90188   0.341094  0.733033                 LOGISTIC
0          (Intercept)            1.46634   0.640513  2.28932   0.022061   *             LOGISTIC
0          (Intercept)            0.666854  2.75228   0.242292  0.808554                 LOGISTIC
0          (Intercept)            1.04008   2.7457    0.378802  0.704835                 LOGISTIC
0          (Intercept)            0.619039  0.331497  1.86741   0.0618448  .             LOGISTIC
...        ...          ...       ...       ...       ...       ...        ...           ...
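
The Loglik rows appear to tie back to the Model Statistics output. The Python sketch below assumes that, in a Loglik row, estimate holds the log-likelihood, std_err the number of rows, and z_score the number of predictor terms. This reading is inferred from the numbers and is not stated in the source. Under that assumption, the residual deviance, AIC, and BIC can be recomputed with the standard formulas. information_criteria is a hypothetical helper.

```python
import math
from typing import Tuple


def information_criteria(loglik: float, n_predictors: int,
                         n_rows: int) -> Tuple[float, float, float]:
    """Return (residual deviance, AIC, BIC) for a binary logistic model.

    Assumes one extra coefficient for the intercept, and that the residual
    deviance equals -2 * log-likelihood (true for ungrouped 0/1 responses).
    """
    k = n_predictors + 1                    # coefficients, including the intercept
    deviance = -2.0 * loglik
    aic = deviance + 2 * k                  # Akaike information criterion
    bic = deviance + k * math.log(n_rows)   # Bayesian information criterion
    return deviance, aic, bic


# Loglik row with 6 predictor terms (the full model) and the intercept-only row.
full = information_criteria(-19.4519, 6, 40)
intercept_only = information_criteria(-25.8979, 0, 40)
print([round(v, 4) for v in full])
print([round(v, 4) for v in intercept_only])
```

The two results can be compared with the Residual deviance, AIC, and BIC lines reported for the 33 and 39 degrees-of-freedom models in Model Statistics.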