Stepwise Logistic Regression

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex, Lake
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: June 2025
Product Category: Teradata Vantage

Automated stepwise regression procedures are available for logistic regression to aid in model selection, just as they are for linear regression, and the procedures are similar in both cases. This topic highlights some of the similarities and differences.

As is the case with stepwise linear regression, the following automated stepwise procedures provide insight into the variables to include in a logistic regression model. Teradata recommends an element of human decision making in order to produce a model with a useful business application.

Forward-Only Stepwise Logistic Regression

The forward-only procedure consists solely of forward steps, starting without any independent x variables in the model. These steps are continued until no variables can be added to the model.

Forward Stepwise Logistic Regression

The forward stepwise procedure is a combination of the forward and backward steps, always done in pairs, starting without any independent x variables in the model. One forward step is always followed by one backward step, and these single forward and backward steps are alternated until no variables can be added or removed. Additional checks are made after each step to see if the same variables exist in the model as existed after a previous step in the same direction. When this condition is detected in both the forward and backward directions, the algorithm stops running.
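The control flow described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the callbacks `pick_to_add` and `pick_to_remove` are hypothetical stand-ins for the forward and backward steps, and `max_pairs` is an assumed safety cap that the guide does not mention.

```python
from typing import Callable, FrozenSet, Optional, Set

# Hypothetical step callbacks: given the current model, each returns the
# name of one variable to add (or remove), or None if no variable qualifies.
StepFn = Callable[[FrozenSet[str]], Optional[str]]


def forward_stepwise(pick_to_add: StepFn,
                     pick_to_remove: StepFn,
                     max_pairs: int = 1000) -> Set[str]:
    """Alternate single forward and backward steps, starting from an empty model."""
    model: Set[str] = set()        # no independent variables at the start
    seen_after_forward = set()     # models observed after earlier forward steps
    seen_after_backward = set()    # models observed after earlier backward steps

    for _ in range(max_pairs):     # max_pairs is an assumed safety cap
        # Forward step: add at most one variable.
        added = pick_to_add(frozenset(model))
        if added is not None:
            model.add(added)
        state = frozenset(model)
        repeat_forward = state in seen_after_forward
        seen_after_forward.add(state)

        # Backward step: remove at most one variable.
        removed = pick_to_remove(frozenset(model))
        if removed is not None:
            model.discard(removed)
        state = frozenset(model)
        repeat_backward = state in seen_after_backward
        seen_after_backward.add(state)

        # Stop when nothing can be added or removed.
        if added is None and removed is None:
            break
        # Stop when a previously seen model recurs in both directions.
        if repeat_forward and repeat_backward:
            break

    return model
```

Under the same assumptions, the backward stepwise procedure would start with all variables in the model and run the backward step first in each pair.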

Backward-Only Stepwise Logistic Regression

The backward-only procedure consists solely of backward steps, starting with all of the independent x variables in the model. Backward steps are continued until no variables can be removed from the model.

Backward Stepwise Logistic Regression

The backward stepwise procedure is a combination of the backward and forward steps, always done in pairs, starting with all of the independent x variables in the model. One backward step is always followed by one forward step, and these single backward and forward steps are alternated until no variables can be added or removed. Additional checks are made after each step to see if the same variables exist in the model as existed after a previous step in the same direction. When this condition is detected in both the backward and forward directions, the algorithm stops running.

Stepwise Logistic Regression - Forward Step

In stepwise linear regression, the partial F statistic (or the analogous T-statistic probability value) is computed separately for each variable outside the model by adding each of them to the model one at a time. The analogous procedure for logistic regression consists of computing the likelihood ratio statistic G, defined in "Logistic Regression Model" in Data Quality Reports, for each variable outside the model, and then selecting the variable that results in the largest G value when added to the model. For logistic regression, this is expensive because solving the model for each candidate variable requires another iterative maximum likelihood solution, in contrast to the more rapidly achieved closed-form solution available in linear regression.
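For reference, the likelihood ratio statistic is commonly written in the form below. This is the standard textbook form and is given here as an assumption; the guide's own definition is in the topic cited above. The symbols L_0 and L_1 are introduced only for this illustration: L_0 is the maximized likelihood of the current model, and L_1 is the maximized likelihood of the model with one candidate variable added.

```latex
G = -2 \ln\!\left(\frac{L_0}{L_1}\right) = 2\left(\ln L_1 - \ln L_0\right)
```

Each candidate variable has its own L_1, and each L_1 requires a separate iterative fit, which is the source of the cost noted above.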

The Analytics Library instead uses a statistic that can be calculated without an additional maximum likelihood solution. The statistic, proposed by Peduzzi, Hardy, and Holford, is called the W statistic. It is comparatively inexpensive to compute for each variable outside the model and is therefore expedient as a criterion for selecting a variable to add to the model. Because of its similarity to other statistics, the W statistic is treated as following a chi-squared distribution with one degree of freedom, and it behaves similarly to the likelihood ratio statistic. In a forward step, the variable with the smallest chi-squared P-value associated with its W statistic is added to the model, provided that P-value is less than the criterion to enter. If more than one variable has a P-value of 0, the variable with the largest W statistic is entered. For more information, refer to [Peduzzi, Hardy and Holford].
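The selection rule for a forward step can be sketched as follows, assuming the W statistics for the candidate variables have already been computed. Computing W itself is not shown. The function names and the dictionary input are hypothetical and are not part of the library.

```python
import math
from typing import Dict, Optional


def chi2_1df_pvalue(w: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 degree of freedom, P(X > w) = erfc(sqrt(w / 2)) for w >= 0.
    """
    return math.erfc(math.sqrt(w / 2.0))


def select_variable_to_enter(w_stats: Dict[str, float],
                             criterion_to_enter: float) -> Optional[str]:
    """Return the candidate to add in a forward step, or None if none qualifies.

    w_stats maps each variable outside the model to its W statistic.
    """
    if not w_stats:
        return None
    # Smallest P-value wins; ties (for example, several P-values that
    # evaluate to 0) are broken in favor of the largest W statistic.
    best = min(w_stats,
               key=lambda name: (chi2_1df_pvalue(w_stats[name]), -w_stats[name]))
    if chi2_1df_pvalue(w_stats[best]) < criterion_to_enter:
        return best
    return None
```

The tie-break matters in practice because very large W values produce P-values that underflow to 0 in floating point, so the P-value alone can no longer rank them.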

Stepwise Logistic Regression - Backward Step

Each backward step seeks to remove variables whose statistical significance falls below a certain level. The model is first fitted with the currently selected variables, and the probability, or P-value, associated with the T-statistic of each variable is calculated, where the T-statistic is the ratio of the b-coefficient to its standard error. The variable with the largest P-value is removed if that P-value is greater than the criterion to remove.
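The removal rule can be sketched as follows, assuming the model has already been fitted and each variable's b-coefficient and standard error are available. The function names and input layout are hypothetical. The P-value here uses a two-sided standard normal approximation, which is an assumption of this sketch; the library may use a different reference distribution.

```python
import math
from typing import Dict, Optional, Tuple


def two_sided_pvalue(coef: float, std_err: float) -> float:
    """Two-sided P-value for the ratio coef / std_err.

    Assumes a standard normal reference distribution (an assumption of
    this sketch, not something the guide states).
    """
    ratio = coef / std_err
    return math.erfc(abs(ratio) / math.sqrt(2.0))


def select_variable_to_remove(fit: Dict[str, Tuple[float, float]],
                              criterion_to_remove: float) -> Optional[str]:
    """Return the variable to drop in a backward step, or None.

    fit maps each variable in the model to (b_coefficient, standard_error).
    """
    if not fit:
        return None
    p_values = {name: two_sided_pvalue(b, se) for name, (b, se) in fit.items()}
    # The least significant variable is the one with the largest P-value.
    worst = max(p_values, key=p_values.get)
    if p_values[worst] > criterion_to_remove:
        return worst
    return None
```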

Rules

When using stepwise logistic regression, the following rules apply:
  • If stepwise is selected, the default stepwise technique is forward.
  • If stepwise is not requested, then forward, forwardonly, backward, and backwardonly must also not be selected.
  • Do not select more than one of the forward, forwardonly, backward, and backwardonly options.
  • Forward is selected automatically if forwardonly is selected.
  • Backward is selected automatically if backwardonly is selected.
  • The criterion to remove must be greater than or equal to zero (0) and less than or equal to one (1).
  • The criterion to enter must be greater than or equal to zero (0) and less than or equal to one (1).
  • The criterion to remove must be greater than or equal to the criterion to enter.
  • Do not select both the groupby and stepwise parameters.
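The rules above can be expressed as a small validation routine. This is a sketch only: the function `resolve_stepwise_options` is hypothetical, the argument names `criterion_to_enter` and `criterion_to_remove` are assumed names for the two criteria, and the 0.05 defaults are illustrative placeholders, not documented defaults.

```python
from typing import Dict, Optional


def resolve_stepwise_options(stepwise: bool = False,
                             forward: bool = False,
                             forwardonly: bool = False,
                             backward: bool = False,
                             backwardonly: bool = False,
                             criterion_to_enter: float = 0.05,   # placeholder default
                             criterion_to_remove: float = 0.05,  # placeholder default
                             groupby: Optional[str] = None) -> Dict[str, bool]:
    """Check the stepwise rules and return the effective technique flags."""
    flags = {"forward": forward, "forwardonly": forwardonly,
             "backward": backward, "backwardonly": backwardonly}
    selected = [name for name, is_set in flags.items() if is_set]

    if not stepwise:
        if selected:
            raise ValueError("a stepwise technique requires stepwise to be requested")
        return {name: False for name in flags}

    if len(selected) > 1:
        raise ValueError("select at most one stepwise technique")
    if not 0.0 <= criterion_to_enter <= 1.0:
        raise ValueError("criterion to enter must be between 0 and 1")
    if not 0.0 <= criterion_to_remove <= 1.0:
        raise ValueError("criterion to remove must be between 0 and 1")
    if criterion_to_remove < criterion_to_enter:
        raise ValueError("criterion to remove must be >= criterion to enter")
    if groupby:
        raise ValueError("groupby and stepwise cannot both be selected")

    # The default technique is forward when none is selected.
    technique = selected[0] if selected else "forward"
    return {
        # forwardonly implies forward; backwardonly implies backward.
        "forward": technique in ("forward", "forwardonly"),
        "forwardonly": technique == "forwardonly",
        "backward": technique in ("backward", "backwardonly"),
        "backwardonly": technique == "backwardonly",
    }
```

For example, under these assumptions `resolve_stepwise_options(stepwise=True)` resolves to the forward technique, while passing `forward=True` without `stepwise=True` raises an error.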