Model Diagnostics | Logistic Regression | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage

Logistic regression has counterparts to many model diagnostics available with linear regression. These diagnostics provide a mathematically sound way to evaluate a model built with logistic regression.

Standard Errors and Statistics

For each b-coefficient, the logistic function computes the standard error, T-statistic (or Wald statistic), and t-distribution probability value.

The T-statistic is the ratio of a b-coefficient value to its standard error. You can use the T-statistic and t-distribution probability value to assess the statistical significance of this b-coefficient in the model.
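As an illustration, the T-statistic and a two-sided probability can be computed from a coefficient and its standard error. This is a minimal Python sketch, not the library's implementation: the function name `wald_test` is hypothetical, the input values are made up, and the probability uses the standard normal distribution as a large-sample stand-in for the t-distribution, because the degrees of freedom the library uses are not stated here.

```python
import math


def wald_test(coefficient, std_error):
    """Return (T-statistic, two-sided p-value) for one b-coefficient.

    Assumption: the p-value uses the standard normal distribution as a
    large-sample approximation to the t-distribution.
    """
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    t_stat = coefficient / std_error
    # Two-sided tail probability P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


# Hypothetical coefficient and standard error, for illustration only.
t, p = wald_test(0.8, 0.4)
print(f"T = {t:.2f}, p = {p:.4f}")  # prints: T = 2.00, p = 0.0455
```

A small probability suggests the coefficient is unlikely to be zero; with few observations, the t-distribution gives a somewhat larger probability than this normal approximation.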

To compute the standard errors of the b-coefficients, the function uses the information matrix: the negative of the Hessian matrix, whose elements are the second-order partial derivatives of the log likelihood function with respect to each pair of coefficient values.

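This is the formula for information matrix element A_{j,k}, where x_{ij} is the value of variable j for observation i (x_{i0} = 1 for the constant term) and π(x_i) is the predicted probability that the response for observation i is 1:

$$A_{j,k} = \sum_{i=1}^{n} x_{ij}\, x_{ik}\, \pi(x_i)\,\bigl(1 - \pi(x_i)\bigr)$$

The standard error of b_j is the square root of the jth diagonal element of the inverse of the information matrix:

$$SE(b_j) = \sqrt{\left(A^{-1}\right)_{j,j}}$$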
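A minimal Python sketch of this computation follows. It assumes the element formula A[j][k] = sum over observations i of x_ij * x_ik * pi_i * (1 - pi_i), with each standard error taken as the square root of a diagonal element of the inverted matrix, and that the coefficients have already been fitted. The helper names (`information_matrix`, `invert`, `standard_errors`) and the sample data are hypothetical; the inversion is plain Gauss-Jordan elimination, not necessarily what the library uses.

```python
import math


def sigmoid(z):
    """Logistic function: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def information_matrix(X, b):
    """A[j][k] = sum over rows i of x_ij * x_ik * pi_i * (1 - pi_i).

    X is a list of rows; each row starts with 1.0 for the constant term.
    """
    p = len(b)
    A = [[0.0] * p for _ in range(p)]
    for row in X:
        pi = sigmoid(sum(bj * xj for bj, xj in zip(b, row)))
        weight = pi * (1.0 - pi)
        for j in range(p):
            for k in range(p):
                A[j][k] += row[j] * row[k] * weight
    return A


def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(r) + [1.0 if i == j else 0.0 for j in range(n)] for i, r in enumerate(M)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def standard_errors(X, b):
    """Square roots of the diagonal of the inverted information matrix."""
    cov = invert(information_matrix(X, b))
    return [math.sqrt(cov[j][j]) for j in range(len(b))]


# Hypothetical data: a constant column plus one x variable, and fitted b0, b1.
X = [[1.0, x] for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
b = [-1.0, 0.6]
print(standard_errors(X, b))
```

Dividing each coefficient in `b` by the matching entry of the returned list gives the T-statistics described earlier.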

Odds Ratios and Confidence Intervals

The logistic function computes confidence intervals using odds ratios.

In a linear regression model, each b-coefficient represents the change in the dependent y variable value when the corresponding independent x value changes by 1. In a logistic regression model, increasing an x variable value by 1 implies a change in the odds that the outcome y variable value is 1 rather than 0.

Here is the formula for the logit response function again:

$$g(x) = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = b_0 + b_1 x_1 + \cdots + b_n x_n$$

The response function is the log of the odds that the response is 1, where π(x) is the probability that the response is 1 and 1 – π(x) is the probability that the response is 0. If xj varies by 1, the response function varies by bj. That is:

$$g(x_1, \ldots, x_j + 1, \ldots, x_n) - g(x_1, \ldots, x_j, \ldots, x_n) = b_j$$

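Equivalently, with all other variables held fixed:

$$\ln\left(\frac{\pi(x_j + 1) / \bigl(1 - \pi(x_j + 1)\bigr)}{\pi(x_j) / \bigl(1 - \pi(x_j)\bigr)}\right) = b_j$$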

Therefore, this is the formula for the odds ratio of the coefficient bj:

$$OR_j = \frac{\pi(x_j + 1) / \bigl(1 - \pi(x_j + 1)\bigr)}{\pi(x_j) / \bigl(1 - \pi(x_j)\bigr)} = e^{b_j}$$

Exponentiating a b-coefficient gives its odds ratio: the factor by which the odds change when xj increases by 1.
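The following Python sketch checks this numerically for a hypothetical one-variable model with log odds g(x) = b0 + b1 * x: the ratio of the odds at x + 1 to the odds at x equals exp(b1) no matter which x you start from. The coefficients and function names are illustrative only.

```python
import math


def probability(b0, b1, x):
    """pi(x) for a one-variable model with log odds g(x) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Odds that the response is 1: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(b0, b1, x):
    """Ratio of the odds at x + 1 to the odds at x."""
    return odds(probability(b0, b1, x + 1.0)) / odds(probability(b0, b1, x))


b0, b1 = -2.0, 0.7  # hypothetical coefficients
for x in (0.0, 1.0, 5.0):
    print(f"x = {x}: odds ratio = {odds_ratio(b0, b1, x):.6f}")
print(f"exp(b1) = {math.exp(b1):.6f}")
```

The probabilities themselves change by different amounts at each x; only the ratio of odds stays constant, which is why the odds ratio is the natural way to interpret a logistic coefficient.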

Confidence intervals calculated on odds ratios for each b-coefficient are more meaningful than those calculated on the b-coefficients themselves. The confidence interval is computed based on a 95% confidence level and a two-tailed normal distribution.
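The interval formula is not spelled out here. A common construction, assumed in this Python sketch, forms the 95% interval on the coefficient scale as b plus or minus z * SE(b), with z the two-tailed standard normal critical value, and then exponentiates both ends to get an interval for the odds ratio. The function name `odds_ratio_ci` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

# Two-tailed 95% critical value of the standard normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def odds_ratio_ci(b, se, z=Z_95):
    """Return (odds ratio, lower, upper) by exponentiating b and b +/- z * se."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Hypothetical coefficient and standard error.
odds_ratio, lower, upper = odds_ratio_ci(0.7, 0.25)
print(f"odds ratio = {odds_ratio:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Because of the exponentiation, the interval is symmetric around the odds ratio on the log scale but not on the original scale.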

Logistic Regression Goodness of Fit

In linear regression, a key measure of goodness of fit is the residual sum of squares (RSS). The analogous measure for logistic regression is the deviance. The deviance (D) is -2 times the natural log of the ratio of the likelihood of a given model to the likelihood of a perfectly fitted, or saturated, model:

D = -2ln(ModelLH / SatModelLH)

Equivalently, where LM is the log likelihood of the model and LS is the log likelihood of the saturated model:

D = -2LM + 2LS

If the data are treated as a set of n independent Bernoulli observations, the saturated model predicts each observed response exactly, so LS = 0 and D = -2LM.
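A short Python sketch of the deviance for 0/1 responses, given a set of fitted probabilities. The data, the probabilities, and the helper names are hypothetical. The last line confirms that a saturated model, which predicts each observed response exactly, has a log likelihood of 0.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log likelihood: sum of ln(p_i) where y_i = 1 and ln(1 - p_i) where y_i = 0."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def deviance(y, p):
    """D = -2 * LM, because the saturated model has LS = 0 for 0/1 data."""
    return -2.0 * log_likelihood(y, p)


y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]  # hypothetical fitted probabilities
print(round(deviance(y, p), 4))
print(log_likelihood(y, [float(v) for v in y]))  # saturated model: prints 0.0
```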

You can compare two models by taking the difference between their deviance values:

G = D1 - D2 = -2(L1 - L2)
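To illustrate, this Python sketch computes G both ways for two hypothetical sets of fitted probabilities on the same data; the two expressions agree, as the algebra requires. All names and values are illustrative.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 responses y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def deviance(y, p):
    """Deviance for 0/1 data: -2 times the log likelihood."""
    return -2.0 * log_likelihood(y, p)


y = [1, 0, 1, 1, 0]
p1 = [0.6] * 5                  # hypothetical model 1 (constant probability)
p2 = [0.8, 0.3, 0.6, 0.9, 0.2]  # hypothetical model 2
L1, L2 = log_likelihood(y, p1), log_likelihood(y, p2)
G = deviance(y, p1) - deviance(y, p2)
print(round(G, 4), round(-2.0 * (L1 - L2), 4))
```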

To evaluate the independent model terms as a whole, calculate the difference in deviance between the model with only a constant term and the model with all variables fitted:

G = -2(L0 - LM)

Calculate LM with the log likelihood formula.

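Calculate L0 with this formula, where n is the number of observations, n1 is the number of observations whose response is 1, and n0 = n - n1 is the number whose response is 0:

$$L_0 = n_1 \ln\left(\frac{n_1}{n}\right) + n_0 \ln\left(\frac{n_0}{n}\right)$$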
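The Python sketch below computes L0 from the outcome counts (n1 responses equal to 1 and n0 equal to 0, out of n), checks it against the log likelihood of a constant-only model that predicts n1 / n for every observation, and then forms G = -2(L0 - LM). The helper names and the full model's fitted probabilities are hypothetical, and the count formula assumes both outcomes occur at least once.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 responses y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def constant_only_log_likelihood(y):
    """L0 = n1 ln(n1 / n) + n0 ln(n0 / n); assumes 0 < n1 < n."""
    n = len(y)
    n1 = sum(y)
    n0 = n - n1
    return n1 * math.log(n1 / n) + n0 * math.log(n0 / n)


y = [1, 0, 1, 1, 0]
p_model = [0.8, 0.3, 0.6, 0.9, 0.2]  # hypothetical fitted probabilities
L0 = constant_only_log_likelihood(y)
LM = log_likelihood(y, p_model)
G = -2.0 * (L0 - LM)

# The constant-only model predicts the observed proportion of 1s everywhere.
assert abs(L0 - log_likelihood(y, [sum(y) / len(y)] * len(y))) < 1e-12
print(round(L0, 4), round(LM, 4), round(G, 4))
```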

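G has an approximate chi-squared distribution with v - 1 degrees of freedom, where v is the number of variables. The chi-squared probability of G therefore tests the hypothesis that zero is the correct value for every x-term coefficient: a small probability indicates that at least one x-term coefficient is not zero.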
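Turning G into a probability requires the upper tail of the chi-squared distribution. The Python sketch below stands in for a statistics library under one assumption: the degrees of freedom are a positive integer. In that case the chi-squared survival function has closed forms (a finite Poisson-type sum for even degrees of freedom, and an erfc term plus a finite sum for odd ones), which the hypothetical function `chi2_sf` evaluates. The sample value of G is made up.

```python
import math


def chi2_sf(g, df):
    """Upper-tail probability P(X >= g) for X ~ chi-squared(df), integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if g <= 0:
        return 1.0
    y = g / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus exp(-y) * sum_{i=1}^{(df-1)/2} y^(i - 1/2) / Gamma(i + 1/2)
    total = 0.0
    term = 2.0 * math.sqrt(y / math.pi)  # y^(1/2) / Gamma(3/2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (i + 0.5)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


# Hypothetical value of G with 1 degree of freedom.
G = 3.89
print(f"p = {chi2_sf(G, 1):.4f}")
```

In practice, a library routine such as `scipy.stats.chi2.sf` computes the same quantity and also handles non-integer degrees of freedom.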

Several pseudo R-squared measures have been proposed. They are not true goodness-of-fit measures, but they can be useful in evaluating the model. [Agresti]

The logistic function provides one such measure, suggested by McFadden:

(L0 - LM) / L0
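A minimal Python sketch of McFadden's measure, using made-up responses and fitted probabilities; L0 comes from a constant-only model that predicts the observed proportion of 1s for every observation. The function names are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 responses y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(l0, lm):
    """McFadden pseudo R-squared: (L0 - LM) / L0, equivalently 1 - LM / L0."""
    return (l0 - lm) / l0


y = [1, 0, 1, 1, 0]
p_model = [0.8, 0.3, 0.6, 0.9, 0.2]  # hypothetical fitted probabilities
n1 = sum(y)
L0 = log_likelihood(y, [n1 / len(y)] * len(y))  # constant-only model
LM = log_likelihood(y, p_model)
print(round(mcfadden_r2(L0, LM), 4))
```

Because both log likelihoods are negative and LM is at least L0 for a model that includes the constant-only model as a special case, the value falls between 0 and 1, with values closer to 1 indicating a larger improvement over the constant-only model.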