GLM Example 2: Logistic Regression Analysis with Step - Teradata Vantage

This example creates the regression model using the Step argument with an intercept.

The Step argument is similar to the R function step(). At each step, the function drops one predictor at a time from the current predictor group and fits the resulting models; the next step starts from the GLM model with the lowest AIC score. The function repeats this process until only the intercept remains.
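
At each step, the candidate models are compared by their Akaike information criterion, AIC = 2k - 2 ln(L), where k is the number of estimated parameters (including the intercept) and L is the maximized likelihood. In this example's output, the residual deviance equals -2 times the log-likelihood, so each reported AIC is simply the residual deviance plus 2k.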

SQL Call

DROP TABLE glm_admissions_model1;  -- remove any model table left over from a previous run

SELECT * FROM GLM (
  ON admissions_train AS InputTable
  OUT TABLE OutputTable (glm_admissions_model1)
  USING
  InputColumns ('admitted', 'masters', 'gpa', 'stats', 'programming')
  CategoricalColumns ('masters', 'stats', 'programming')
  Family ('LOGISTIC')
  LinkFunction ('LOGIT')
  WeightColumn ('1')
  StopThreshold (0.01)
  MaxIterNum (25)
  Step ('true')
  Intercept ('true')
) AS dt;
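
For comparison, this is a minimal sketch of the same call without stepwise selection; only the Step argument changes (the output table name glm_admissions_model2 is hypothetical):

DROP TABLE glm_admissions_model2;

SELECT * FROM GLM (
  ON admissions_train AS InputTable
  OUT TABLE OutputTable (glm_admissions_model2)
  USING
  InputColumns ('admitted', 'masters', 'gpa', 'stats', 'programming')
  CategoricalColumns ('masters', 'stats', 'programming')
  Family ('LOGISTIC')
  LinkFunction ('LOGIT')
  WeightColumn ('1')
  StopThreshold (0.01)
  MaxIterNum (25)
  Step ('false')
  Intercept ('true')
) AS dt;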

Output

The first model uses all predictors and has 33 residual degrees of freedom (40 observations minus 7 estimated parameters). Each step drops one predictor variable and re-estimates the model parameters, so the residual degrees of freedom increase by one per step until they reach 39, at which point the response is modeled with only the intercept.

Model Statistics
predictor estimate std_error z_score p_value significance
(Intercept) 1.07751 2.92076 0.368914 0.712192  
masters.no 2.21655 1.01999 2.17311 0.0297719 *
gpa -0.113935 0.802573 -0.141962 0.88711  
stats.Novice 0.0406848 1.11567 0.0364667 0.97091  
stats.Beginner 0.526618 1.2229 0.430631 0.666736  
programming.Beginner -1.76976 1.069 -1.65553 0.0978177 .
programming.Novice -0.98035 1.14004 -0.859923 0.389831  
ITERATIONS # 4 0 0 0 Number of Fisher Scoring iterations
ROWS # 40 0 0 0 Number of rows
Residual deviance 38.9038 0 0 0 on 33 degrees of freedom
Pearson goodness of fit 37.7905 0 0 0 on 33 degrees of freedom
AIC 52.9038 0 0 0 Akaike information criterion
BIC 64.726 0 0 0 Bayesian information criterion
Wald Test 9.89642 0 0 0.19452  
Dispersion parameter 1 0 0 0 Taken to be 1 for BINOMIAL and POISSON.
.... .... .... .... .... ....
Residual deviance 44.7694 0 0 0 on 34 degrees of freedom
Pearson goodness of fit 39.895 0 0 0 on 34 degrees of freedom
AIC 56.7694 0 0 0 Akaike information criterion
BIC 66.9027 0 0 0 Bayesian information criterion
.... .... .... .... .... ....
Residual deviance 41.8984 0 0 0 on 35 degrees of freedom
Pearson goodness of fit 41.8616 0 0 0 on 35 degrees of freedom
AIC 51.8984 0 0 0 Akaike information criterion
BIC 60.3428 0 0 0 Bayesian information criterion
... .... .... .... .... ....
Residual deviance 39.1062 0 0 0 on 36 degrees of freedom
Pearson goodness of fit 37.9515 0 0 0 on 36 degrees of freedom
AIC 47.1062 0 0 0 Akaike information criterion
BIC 53.8617 0 0 0 Bayesian information criterion
.... .... .... .... .... ....
Residual deviance 45.6566 0 0 0 on 37 degrees of freedom
Pearson goodness of fit 40 0 0 0 on 37 degrees of freedom
AIC 51.6566 0 0 0 Akaike information criterion
BIC 56.7232 0 0 0 Bayesian information criterion
... .... .... .... .... ....
Residual deviance 42.8744 0 0 0 on 38 degrees of freedom
Pearson goodness of fit 40 0 0 0 on 38 degrees of freedom
AIC 46.8744 0 0 0 Akaike information criterion
BIC 50.2522 0 0 0 Bayesian information criterion
.... .... .... .... .... ....
(Intercept) 0.619039 0.331497 1.86741 0.0618448 .
ITERATIONS # 3 0 0 0 Number of Fisher Scoring iterations
ROWS # 40 0 0 0 Number of rows
Residual deviance 51.7958 0 0 0 on 39 degrees of freedom
Pearson goodness of fit 40 0 0 0 on 39 degrees of freedom
AIC 53.7958 0 0 0 Akaike information criterion
BIC 55.4847 0 0 0 Bayesian information criterion
Wald Test 3.48721 0 0 0.0618447 .
Dispersion parameter 1 0 0 0 Taken to be 1 for BINOMIAL and POISSON.
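
As a quick arithmetic check on the values above: the first model's residual deviance, 38.9038, is -2 times its log-likelihood of -19.4519, and adding twice its 7 estimated parameters gives the reported AIC, 38.9038 + 2 × 7 = 52.9038. The final, intercept-only model likewise gives 51.7958 + 2 × 1 = 53.7958.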

The following query returns the model table:

SELECT * FROM glm_admissions_model1 ORDER BY attribute;

The example uses stepwise selection only to show which predictors are included in the regression at each step. The output models are not used in any prediction function; the following table is only a collection of the coefficients estimated at each step.

glm_admissions_model1
attribute predictor category estimate std_err z_score p_value significance family
-1 Loglik   -22.3847 40 5 0   LOGISTIC
-1 Loglik   -19.4519 40 6 0   LOGISTIC
-1 Loglik   -19.4621 40 5 0   LOGISTIC
-1 Loglik   -19.5493 40 4 0   LOGISTIC
-1 Loglik   -20.9492 40 4 0   LOGISTIC
-1 Loglik   -22.8158 40 3 0   LOGISTIC
-1 Loglik   -19.5531 40 3 0   LOGISTIC
-1 Loglik   -21.4127 40 2 0   LOGISTIC
-1 Loglik   -22.8283 40 2 0   LOGISTIC
-1 Loglik   -21.4372 40 1 0   LOGISTIC
-1 Loglik   -25.8979 40 0 0   LOGISTIC
0 (Intercept)   0.676415 0.722718 0.935932 0.349308   LOGISTIC
0 (Intercept)   0.989811 2.90188 0.341094 0.733033   LOGISTIC
0 (Intercept)   1.46634 0.640513 2.28932 0.022061 * LOGISTIC
0 (Intercept)   0.666854 2.75228 0.242292 0.808554   LOGISTIC
0 (Intercept)   1.04008 2.7457 0.378802 0.704835   LOGISTIC
0 (Intercept)   0.619039 0.331497 1.86741 0.0618448 . LOGISTIC
... ... ... ... ... ... ... ... ...
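
Because each step writes its own log-likelihood row (attribute -1) and coefficient rows into the model table, a simple filter lists the log-likelihood recorded for each fitted model. This is a sketch against the table shown above; it assumes the predictor column stores the literal value 'Loglik', as displayed:

SELECT attribute, predictor, estimate
FROM glm_admissions_model1
WHERE predictor = 'Loglik'
ORDER BY estimate DESC;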