Results Data | Logistic Regression | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage
This function outputs one or more columns of XML. You can transform the XML to HTML, which is easier to view—see Reports.

Logistic Regression Model Statistics

Table name = outputdatabase.outputtablename_rpt
Report Item Description
rid If the groupby option is used, rid is added as an index to the table and is incremented for each distinct value of the groupby column.
Groupby Columns A column is generated for each groupby column. Each column holds the distinct values of that groupby column for which a logistic model was built.
Total Observations Number of rows in the table the logistic regression analysis is based on. The number of observations reflects the row count after any rows were eliminated by listwise deletion (due to one of the variables being null).
Total Iterations Number of iterations used by the non-linear optimization algorithm in maximizing the log likelihood function.
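Initial Log Likelihood Log likelihood of the constant-only model, given only when the constant is included in the model. The formula for the initial log likelihood is:

\[ L_0 = n_1 \ln\left(\frac{n_1}{n}\right) + (n - n_1) \ln\left(\frac{n - n_1}{n}\right) \]

where n is the number of observations and n_1 is the number of observations in which the dependent variable equals the response value.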

Final Log Likelihood Value of the log likelihood function after the last iteration.
Likelihood Ratio G Statistic Deviance, given by D = -2LM, where LM is the log likelihood of the logistic regression model, is a measure analogous to the residual sum of squares (RSS) in a linear regression model. To assess the utility of the independent terms taken as a whole, the deviance difference statistic G compares the model with a constant term only against the model with all variables fitted: G = -2(L0 - LM), where L0 is the log likelihood of a model containing only a constant. The G statistic, like the deviance D, is an example of a likelihood ratio test statistic.
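Chi-Squared Degrees of Freedom The G Statistic follows a chi-squared distribution with “variables minus one” degrees of freedom, where the variable count includes the CONSTANT term, so there is one degree of freedom for each independent variable. This field is the degrees of freedom for the G Statistic’s chi-squared test.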
Chi-Squared Value Chi-squared random variable value for the Likelihood Ratio Test G Statistic. Use this value to test whether all the independent variable coefficients should be 0. However, examining the Chi-squared Probability field is the easiest way to assess this test.
Chi-Squared Probability Chi-squared probability value for the Likelihood Ratio Test G Statistic. Use this value to test whether all the independent variable coefficients should be 0. That is, the probability that a chi-squared distributed variable would have the value G or greater is the probability associated with having all 0 coefficients. The null hypothesis that all the terms should be 0 can be rejected if this probability is sufficiently small, say less than 0.05.
McFadden's Pseudo R-Squared To mimic the Squared Multiple Correlation Coefficient (R²) in a linear regression model, McFadden suggested this measure, given by (L0 - LM) / L0, where L0 is the log likelihood of a model containing only a constant and LM is the log likelihood of the logistic regression model. Although it is not, strictly speaking, a goodness-of-fit measure, it can be useful in assessing a logistic regression model. Experience shows that the value of this statistic tends to be less than the R² value it mimics. In fact, values between 0.20 and 0.40 are quite satisfactory.
Dependent Variable Column chosen as the dependent variable.
Dependent Response Value Response value chosen for the dependent variable.
Total Distinct Values Number of distinct values that the dependent variable takes on.
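
The model-level statistics in this table follow from the initial and final log likelihoods by simple arithmetic. The following Python sketch is illustrative only and is not part of the Vantage Analytics Library. The function names and the sample log likelihood values are hypothetical, and the sketch assumes one degree of freedom per independent variable (CONSTANT excluded).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from a closed form, then apply Q(a + 1, y) = Q(a, y) + y^a e^-y / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def model_summary(l0, lm, n_independent):
    """Derive report items from the initial (l0) and final (lm) log likelihoods.

    n_independent is the number of independent variables, excluding CONSTANT
    (an assumption about how the degrees of freedom are counted).
    """
    g = -2.0 * (l0 - lm)  # Likelihood Ratio G Statistic
    return {
        "deviance": -2.0 * lm,
        "g_statistic": g,
        "degrees_of_freedom": n_independent,
        "chi_squared_probability": chi2_sf(g, n_independent),
        "mcfadden_r2": (l0 - lm) / l0,
    }


# Hypothetical values: L0 = -100, LM = -80, three independent variables.
summary = model_summary(-100.0, -80.0, 3)
for name, value in summary.items():
    print(name, value)
```

With these sample values, G is 40 and McFadden's Pseudo R-Squared is 0.2. A chi-squared probability below 0.05 would lead to rejecting the null hypothesis that all independent variable coefficients are 0.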

Logistic Regression Variables in Model Report

Table name = outputdatabase.outputtablename
Report Item Description
Groupby Value If groupby is specified, the distinct values for which a logistic model was built are added as part of the index here.
Column Name Each independent variable in the model is listed along with accompanying measures. The first independent variable listed is CONSTANT, a fixed value representing the constant term in the logistic regression model.
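B Coefficient Coefficient in the logistic regression model for this variable. The following equations describe the logistic regression model, with π(x) being the probability that the dependent variable is 1, and g(x) being the logit transformation:

\[ \pi(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} \]

\[ g(x) = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = b_0 + b_1 x_1 + \cdots + b_k x_k \]

where b_0 is the coefficient of CONSTANT and b_1 through b_k are the coefficients of the independent variables x_1 through x_k.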
Standard Error Standard error of a b-coefficient in the logistic regression model is a measure of its expected accuracy. It is analogous to the standard error of a coefficient in a linear regression model.
Wald Statistic Calculated as the square of the T-statistic (T Stat) described in this table. The T-statistic is calculated for each b-coefficient as the ratio of the b-coefficient value to its standard error.
T Statistic As in linear regression, the T-statistic is calculated for each b-coefficient as the ratio of the b-coefficient value to its standard error. Use this value, along with its associated t-distribution probability value, to assess the statistical significance of this term in the model.
P-value t-distribution probability value associated with the T-statistic (T Stat), that is, the ratio of the b-coefficient value (B Coef) to its standard error (Std Error). Use this value to assess the statistical significance of this term in the logistic regression model. A value close to 0 implies statistical significance and means this term in the model is important.

The P-value is the probability, under the null hypothesis that the coefficient equals zero, of obtaining a T-statistic at least as extreme as the one observed. It is not the probability that the null hypothesis is true. The smaller the P-value, the stronger the evidence for rejecting the null hypothesis, that is, the stronger the evidence that the coefficient is different from zero.

Odds Ratio Calculated by exponentiating the b-coefficient (e^b). The odds ratio is the factor by which the odds of the dependent variable being 1 change due to a unit increase in this independent variable.
Lower Because of the intuitive meaning of the odds ratio, confidence intervals for coefficients in the model are calculated on odds ratios rather than on the coefficients themselves. The confidence interval is computed based on a 95% confidence level and a two-tailed normal distribution. “Lower” is the lower bound of this confidence interval.
Upper Because of the intuitive meaning of the odds ratio, confidence intervals for coefficients in the model are calculated on odds ratios rather than on the coefficients themselves. The confidence interval is computed based on a 95% confidence level and a two-tailed normal distribution. “Upper” is the upper bound of this confidence interval.
Partial R Calculated for each b-coefficient value as:

\[ R_i = \operatorname{sign}(b_i) \sqrt{\frac{w_i - 2}{-2 L_0}} \]
where:
  • b_i is the b-coefficient of the ith independent variable
  • w_i is the Wald Statistic of the ith independent variable
  • L_0 is the initial log likelihood of the model
If w_i <= 2, then Partial R is set to 0. This statistic provides a measure of the relative importance of each variable in the model. It is calculated only when the constant term is included in the model. [SPSS]
Standardized Coefficient The estimated standardized coefficient is calculated for each b-coefficient value as:

\[ b_i^{*} = \frac{b_i \, \sigma_i}{\pi / \sqrt{3}} \]
where:
  • b_i is the b-coefficient
  • σ_i is the standard deviation of the ith independent variable
  • π/√3 is the standard deviation of the standard logistic distribution
This calculation provides only an estimate of the standardized coefficients, because it uses a constant value for the logistic distribution without regard to the actual distribution of the dependent variable in the model. [Menard]
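
The per-variable measures in this table can be reproduced from a b-coefficient and its standard error. The following Python sketch is illustrative only and is not the library's implementation. The function name, the sample values, and the confidence interval formula exp(b ± z · standard error) with the two-tailed 95% normal quantile are assumptions. The P-value is omitted because the degrees of freedom of the t-distribution are not stated here.

```python
import math

Z_95 = 1.959963984540054  # two-tailed 95% quantile of the standard normal distribution
LOGISTIC_SD = math.pi / math.sqrt(3.0)  # standard deviation of the standard logistic distribution


def coefficient_report(b, std_error, initial_log_likelihood, x_std_dev):
    """Compute illustrative report items for one independent variable.

    b                      -- b-coefficient (B Coef)
    std_error              -- standard error of the b-coefficient (Std Error)
    initial_log_likelihood -- log likelihood of the constant-only model (negative)
    x_std_dev              -- standard deviation of the independent variable
    """
    t_stat = b / std_error
    wald = t_stat ** 2
    if wald <= 2.0:
        partial_r = 0.0  # Partial R is set to 0 when the Wald Statistic is <= 2
    else:
        magnitude = math.sqrt((wald - 2.0) / (-2.0 * initial_log_likelihood))
        partial_r = math.copysign(magnitude, b)
    return {
        "t_statistic": t_stat,
        "wald_statistic": wald,
        "odds_ratio": math.exp(b),
        "lower": math.exp(b - Z_95 * std_error),  # assumed interval formula
        "upper": math.exp(b + Z_95 * std_error),  # assumed interval formula
        "partial_r": partial_r,
        "standardized_coefficient": b * x_std_dev / LOGISTIC_SD,
    }


# Hypothetical values: b = 0.5, standard error = 0.2, L0 = -100, standard deviation = 2.0.
report = coefficient_report(0.5, 0.2, -100.0, 2.0)
for name, value in report.items():
    print(name, value)
```

Because the interval is built on the coefficient and then exponentiated, Lower and Upper are not symmetric around the odds ratio.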

Logistic Regression Model

Table name = outputdatabase.outputtablename_txt
Report Item Data Type Description
Groupby Variable User-Defined A column is generated for each groupby column. Within each column there are distinct values of the groupby columns for which a logistic model was built.
partId INTEGER Sequence number of the XML block in the XmlModel column. partId is incremented for each 31000-byte block of XML.
XmlModel VARCHAR(31000) A 31000-byte block of the XML representation of the model. This column is used for scoring the model.

Any requested reports appear here as well.
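
Because a model can span several rows, a client that reads this table itself needs to put the XmlModel blocks back together. The following Python sketch assumes that concatenating the blocks in ascending partId order yields the complete XML document. The function name, the sample XML, the small block size, and numbering from 1 are hypothetical and chosen only to keep the example short.

```python
import xml.etree.ElementTree as ET


def reassemble_model(rows):
    """Join (partId, XmlModel) pairs into one XML string, ordered by partId."""
    ordered = sorted(rows, key=lambda row: row[0])
    return "".join(block for _, block in ordered)


# Hypothetical stand-in for the model XML; the real schema is not shown here.
document = "<Model><Coefficient name='CONSTANT'>-1.5</Coefficient></Model>"

# Simulate the table rows by cutting the document into fixed-size blocks
# (20 characters here instead of 31000 bytes) and shuffling their order.
block_size = 20
rows = [
    (index // block_size + 1, document[index:index + block_size])
    for index in range(0, len(document), block_size)
]
rows.reverse()

xml_text = reassemble_model(rows)
root = ET.fromstring(xml_text)  # parsing succeeds only if the blocks were joined in order
print(root.tag, len(rows))
```

When groupby columns are present, the same reassembly would be applied separately to the rows of each groupby value.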

To extract the XML into a viewable format, run the following query:

-- Replace outputdatabase and outputtablename with your output database and table names.
SELECT XMLSERIALIZE(CONTENT X.Dot) AS XMLText
FROM (SELECT * FROM "outputdatabase"."outputtablename_txt") AS C,
XMLTABLE (
    '//*'
    PASSING CREATEXML(C.XmlModel)
) AS X ("Dot");