This example creates the regression model using the Step argument with an intercept.
The Step argument is similar to the R function step(). At each step, the function drops one predictor from the current predictor group; the next step starts from the GLM model with the lowest AIC score. The function repeats this process until only the intercept remains. A sketch of this backward-elimination idea follows.
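For intuition, here is a minimal Python sketch of the same backward elimination by AIC, written against statsmodels (an illustration under stated assumptions, not the GLM function's internal implementation; it assumes the predictors in X are already numeric or dummy-encoded):

```python
# Backward stepwise selection by AIC, in the spirit of Step ('true'):
# drop one predictor at a time, keep the lowest-AIC candidate, and
# continue until only the intercept remains.
import pandas as pd
import statsmodels.api as sm

def backward_step_by_aic(y, X):
    """Return (dropped_column, aic, fitted_model) for each step."""
    cols = list(X.columns)
    steps = []
    while cols:
        candidates = []
        for col in cols:
            kept = [c for c in cols if c != col]
            if kept:
                design = sm.add_constant(X[kept])
            else:
                # Last step: intercept-only model.
                design = pd.DataFrame({"const": 1.0}, index=X.index)
            fit = sm.GLM(y, design, family=sm.families.Binomial()).fit()
            candidates.append((fit.aic, col, fit))
        aic, dropped, best = min(candidates, key=lambda t: t[0])
        cols.remove(dropped)
        steps.append((dropped, aic, best))
    return steps
```

Note that, unlike R's step(), which stops when no single drop improves AIC, this sweep (like the Step argument as described above) continues all the way down to the intercept-only model.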
Input
- admissions_train, as in GLM Example 1: Logistic Regression Analysis with Intercept
SQL Call
```sql
DROP TABLE glm_admissions_model1;

SELECT * FROM GLM (
    ON admissions_train AS InputTable
    OUT TABLE OutputTable (glm_admissions_model1)
    USING
    InputColumns ('admitted', 'masters', 'gpa', 'stats', 'programming')
    CategoricalColumns ('masters', 'stats', 'programming')
    Family ('LOGISTIC')
    LinkFunction ('LOGIT')
    WeightColumn ('1')
    StopThreshold (0.01)
    MaxIterNum (25)
    Step ('true')
    Intercept ('true')
) AS dt;
```
Output
The model starts with 33 residual degrees of freedom (40 rows minus 7 estimated parameters: the intercept and 6 predictor coefficients) and gains one degree of freedom at each step as a parameter is dropped, reaching 39 when the response is modeled with only the intercept. The model parameters are re-estimated at each step after a predictor is dropped.
predictor | estimate | std_error | z_score | p_value | significance |
---|---|---|---|---|---|
(Intercept) | 1.07751 | 2.92076 | 0.368914 | 0.712192 | |
masters.no | 2.21655 | 1.01999 | 2.17311 | 0.0297719 | * |
gpa | -0.113935 | 0.802573 | -0.141962 | 0.88711 | |
stats.Novice | 0.0406848 | 1.11567 | 0.0364667 | 0.97091 | |
stats.Beginner | 0.526618 | 1.2229 | 0.430631 | 0.666736 | |
programming.Beginner | -1.76976 | 1.069 | -1.65553 | 0.0978177 | . |
programming.Novice | -0.98035 | 1.14004 | -0.859923 | 0.389831 | |
ITERATIONS # | 4 | 0 | 0 | 0 | Number of Fisher Scoring iterations |
ROWS # | 40 | 0 | 0 | 0 | Number of rows |
Residual deviance | 38.9038 | 0 | 0 | 0 | on 33 degrees of freedom |
Pearson goodness of fit | 37.7905 | 0 | 0 | 0 | on 33 degrees of freedom |
AIC | 52.9038 | 0 | 0 | 0 | Akaike information criterion |
BIC | 64.726 | 0 | 0 | 0 | Bayesian information criterion |
Wald Test | 9.89642 | 0 | 0 | 0.19452 | |
Dispersion parameter | 1 | 0 | 0 | 0 | Taken to be 1 for BINOMIAL and POISSON. |
.... | .... | .... | .... | .... | .... |
Residual deviance | 44.7694 | 0 | 0 | 0 | on 34 degrees of freedom |
Pearson goodness of fit | 39.895 | 0 | 0 | 0 | on 34 degrees of freedom |
AIC | 56.7694 | 0 | 0 | 0 | Akaike information criterion |
BIC | 66.9027 | 0 | 0 | 0 | Bayesian information criterion |
.... | .... | .... | .... | .... | .... |
Residual deviance | 41.8984 | 0 | 0 | 0 | on 35 degrees of freedom |
Pearson goodness of fit | 41.8616 | 0 | 0 | 0 | on 35 degrees of freedom |
AIC | 51.8984 | 0 | 0 | 0 | Akaike information criterion |
BIC | 60.3428 | 0 | 0 | 0 | Bayesian information criterion |
.... | .... | .... | .... | .... | ....
Residual deviance | 39.1062 | 0 | 0 | 0 | on 36 degrees of freedom |
Pearson goodness of fit | 37.9515 | 0 | 0 | 0 | on 36 degrees of freedom |
AIC | 47.1062 | 0 | 0 | 0 | Akaike information criterion |
BIC | 53.8617 | 0 | 0 | 0 | Bayesian information criterion |
.... | .... | .... | .... | .... | .... |
Residual deviance | 45.6566 | 0 | 0 | 0 | on 37 degrees of freedom |
Pearson goodness of fit | 40 | 0 | 0 | 0 | on 37 degrees of freedom |
AIC | 51.6566 | 0 | 0 | 0 | Akaike information criterion |
BIC | 56.7232 | 0 | 0 | 0 | Bayesian information criterion |
.... | .... | .... | .... | .... | ....
Residual deviance | 42.8744 | 0 | 0 | 0 | on 38 degrees of freedom |
Pearson goodness of fit | 40 | 0 | 0 | 0 | on 38 degrees of freedom |
AIC | 46.8744 | 0 | 0 | 0 | Akaike information criterion |
BIC | 50.2522 | 0 | 0 | 0 | Bayesian information criterion |
.... | .... | .... | .... | .... | .... |
(Intercept) | 0.619039 | 0.331497 | 1.86741 | 0.0618448 | . |
ITERATIONS # | 3 | 0 | 0 | 0 | Number of Fisher Scoring iterations |
ROWS # | 40 | 0 | 0 | 0 | Number of rows |
Residual deviance | 51.7958 | 0 | 0 | 0 | on 39 degrees of freedom |
Pearson goodness of fit | 40 | 0 | 0 | 0 | on 39 degrees of freedom |
AIC | 53.7958 | 0 | 0 | 0 | Akaike information criterion |
BIC | 55.4847 | 0 | 0 | 0 | Bayesian information criterion |
Wald Test | 3.48721 | 0 | 0 | 0.0618447 | . |
Dispersion parameter | 1 | 0 | 0 | 0 | Taken to be 1 for BINOMIAL and POISSON. |
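The AIC and BIC rows can be reproduced from the residual deviance with the standard identities AIC = deviance + 2k and BIC = deviance + k*ln(n), where k is the number of estimated parameters and n the number of rows (for ungrouped binary data the saturated log-likelihood is zero, so deviance = -2 * log-likelihood). A quick check against the values above:

```python
# Reproduce the reported AIC/BIC from the residual deviance (standard
# GLM identities; the reported values confirm the function uses them).
import math

n = 40
for deviance, k in [(38.9038, 7),    # first step: intercept + 6 coefficients
                    (51.7958, 1)]:   # last step: intercept only
    aic = deviance + 2 * k
    bic = deviance + k * math.log(n)
    print(round(aic, 4), round(bic, 4))
# -> 52.9038 64.726
# -> 53.7958 55.4847
```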
The following query returns the model table:
SELECT * FROM glm_admissions_model1 ORDER BY attribute;
The example uses stepwise selection only to show which predictors are included in the regression; the output models are not intended for use in any prediction function. The following table is only a collection of the coefficients estimated at each step.
attribute | predictor | category | estimate | std_err | z_score | p_value | significance | family |
---|---|---|---|---|---|---|---|---|
-1 | Loglik | | -22.3847 | 40 | 5 | 0 | | LOGISTIC |
-1 | Loglik | | -19.4519 | 40 | 6 | 0 | | LOGISTIC |
-1 | Loglik | | -19.4621 | 40 | 5 | 0 | | LOGISTIC |
-1 | Loglik | | -19.5493 | 40 | 4 | 0 | | LOGISTIC |
-1 | Loglik | | -20.9492 | 40 | 4 | 0 | | LOGISTIC |
-1 | Loglik | | -22.8158 | 40 | 3 | 0 | | LOGISTIC |
-1 | Loglik | | -19.5531 | 40 | 3 | 0 | | LOGISTIC |
-1 | Loglik | | -21.4127 | 40 | 2 | 0 | | LOGISTIC |
-1 | Loglik | | -22.8283 | 40 | 2 | 0 | | LOGISTIC |
-1 | Loglik | | -21.4372 | 40 | 1 | 0 | | LOGISTIC |
-1 | Loglik | | -25.8979 | 40 | 0 | 0 | | LOGISTIC |
0 | (Intercept) | | 0.676415 | 0.722718 | 0.935932 | 0.349308 | | LOGISTIC |
0 | (Intercept) | | 0.989811 | 2.90188 | 0.341094 | 0.733033 | | LOGISTIC |
0 | (Intercept) | | 1.46634 | 0.640513 | 2.28932 | 0.022061 | * | LOGISTIC |
0 | (Intercept) | | 0.666854 | 2.75228 | 0.242292 | 0.808554 | | LOGISTIC |
0 | (Intercept) | | 1.04008 | 2.7457 | 0.378802 | 0.704835 | | LOGISTIC |
0 | (Intercept) | | 0.619039 | 0.331497 | 1.86741 | 0.0618448 | . | LOGISTIC |
... | ... | ... | ... | ... | ... | ... | ... | ...
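In the Loglik rows, the estimate column holds the log-likelihood of a candidate model, and the std_err and z_score columns appear to hold the row count (40) and the number of predictors in that candidate. Reading the rows that match the summary deviances this way recovers the per-step AIC values reported above:

```python
# Recompute per-step AIC from the Loglik rows of the retained models:
# AIC = 2*(p + 1) - 2*loglik, with p predictors plus the intercept.
# (Pairs of (log-likelihood, predictor count) read from the table above.)
loglik_rows = [(-19.4519, 6), (-22.3847, 5), (-20.9492, 4), (-19.5531, 3),
               (-22.8283, 2), (-21.4372, 1), (-25.8979, 0)]

for loglik, p in loglik_rows:
    print(round(2 * (p + 1) - 2 * loglik, 4))
# -> 52.9038, 56.7694, 51.8984, 47.1062, 51.6566, 46.8744, 53.7958
```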