This function outputs one or more columns of XML. You can transform the XML to HTML, which is easier to view—see Reports.
Logistic Regression Model Statistics
Report Item | Description |
---|---|
rid | If the groupby option is used, rid is added as an index to the table and is incremented for each distinct value of the groupby column. |
Groupby Columns | A column is generated for each groupby column. Within each column there are distinct values of the groupby columns for which a logistic model was built. |
Total Observations | Number of rows in the table the logistic regression analysis is based on. The number of observations reflects the row count after any rows were eliminated by listwise deletion (due to one of the variables being null). |
Total Iterations | Number of iterations used by the non-linear optimization algorithm in maximizing the log likelihood function. |
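Initial Log Likelihood | Log likelihood of the constant-only model, given only when the constant is included in the model. The formula for initial log likelihood is given by: $L_0 = n_1 \ln(n_1/n) + n_0 \ln(n_0/n)$, where n is the number of observations, $n_1$ is the number of observations with the dependent response value, and $n_0 = n - n_1$. |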
Final Log Likelihood | Value of the log likelihood function after the last iteration. |
Likelihood Ratio G Statistic | Deviance, given by D = -2LM, where LM is the log likelihood of the logistic regression model, is a measure analogous to the residual sum of squares (RSS) in a linear regression model. To assess the utility of the independent terms taken as a whole in the logistic regression model, the deviance difference statistic G is calculated for the model with a constant term only versus the model with all variables fitted. This statistic is G = -2(L0 - LM), where L0 is the log likelihood of a model containing only a constant. The G statistic, like the deviance D, is an example of a likelihood ratio test statistic. |
Chi-Squared Degrees of Freedom | The G Statistic follows a chi-squared distribution with “variables minus one” degrees of freedom. This field then is the degrees of freedom for the G Statistic’s chi-squared test. |
Chi-Squared Value | Chi-squared random variable value for the Likelihood Ratio Test G Statistic. Use this value to test whether all the independent variable coefficients should be 0. However, examining the Chi-squared Probability field is the easiest way to assess this test. |
Chi-Squared Probability | Chi-squared probability value for the Likelihood Ratio Test G Statistic. Use this value to test whether all the independent variable coefficients should be 0. That is, it is the probability that a chi-squared distributed variable would take the value G or greater if all the coefficients were 0. The null hypothesis that all the terms should be 0 can be rejected if this probability is sufficiently small, say less than 0.05. |
McFadden's Pseudo R-Squared | To mimic the Squared Multiple Correlation Coefficient (R²) in a linear regression model, the researcher McFadden suggested this measure, given by (L0 - LM) / L0, where L0 is the log likelihood of a model containing only a constant and LM is the log likelihood of the logistic regression model. Although it is not, strictly speaking, a goodness of fit measure, it can be useful in assessing a logistic regression model. Experience shows that the value of this statistic tends to be less than the R² value it mimics. In fact, values between 0.20 and 0.40 are generally considered quite satisfactory. |
Dependent Variable | Column chosen as the dependent variable. |
Dependent Response Value | Response value chosen for the dependent variable. |
Total Distinct Values | Number of distinct values that the dependent variable takes on. |
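
The relationships among the log likelihood fields in this report can be checked by hand. The following Python fragment is a minimal sketch, not the function's actual implementation: the names `initial_log_likelihood`, `chi2_sf`, and `model_summary` are hypothetical, and the degrees of freedom are assumed to be read from the Chi-Squared Degrees of Freedom field. It uses only the standard library and assumes integer degrees of freedom.

```python
import math

def initial_log_likelihood(n, n1):
    """Log likelihood of the constant-only model for n rows, n1 of them with response 1."""
    # A class with zero rows contributes nothing (limit of c * ln(c / n) as c -> 0).
    return sum(c * math.log(c / n) for c in (n1, n - n1) if c > 0)

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting point: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def model_summary(l0, lm, df):
    """G statistic, its chi-squared probability, and McFadden's pseudo R-squared."""
    g = -2.0 * (l0 - lm)
    return {
        "g_statistic": g,
        "chi2_probability": chi2_sf(g, df),
        "mcfadden_r2": (l0 - lm) / l0,
    }
```

For example, `model_summary(l0, lm, df)` called with the Initial Log Likelihood, Final Log Likelihood, and Chi-Squared Degrees of Freedom values from the report should reproduce the G statistic and pseudo R-squared shown there, up to rounding.
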
Logistic Regression Variables in Model Report
Report Item | Description |
---|---|
Groupby Value | If groupby is specified, the distinct values for which a logistic model was built are added as part of the index here. |
Column Name | Each independent variable in the model is listed along with accompanying measures. The first independent variable listed is CONSTANT, a fixed value representing the constant term in the logistic regression model. |
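B Coefficient | Coefficient in the logistic regression model for this variable. The following equations describe the logistic regression model, with $\pi(x)$ being the probability that the dependent variable is 1, and g(x) being the logit transformation: $\pi(x) = \dfrac{e^{g(x)}}{1 + e^{g(x)}}$ and $g(x) = \ln\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$. |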
Standard Error | Standard error of a b-coefficient in the logistic regression model is a measure of its expected accuracy. It is analogous to the standard error of a coefficient in a linear regression model. |
Wald Statistic | Calculated as the square of the T-statistic (T Stat) described in this table. The T-statistic is calculated for each b-coefficient as the ratio of the b-coefficient value to its standard error. |
T Statistic | Similar to linear regression, the T-statistic is calculated for each b-coefficient as the ratio of the b-coefficient value to its standard error. Use this value, along with its associated t-distribution probability value, to assess the statistical significance of this term in the model. |
P-value | t-distribution probability value associated with the T-statistic (T Stat), that is, the ratio of the b-coefficient value (B Coef) to its standard error (Std Error). Use this value to assess the statistical significance of this term in the logistic regression model. A value close to 0 implies statistical significance and means this term in the model is important. The P-value represents the probability of observing a coefficient estimate at least as extreme as the one obtained if the null hypothesis were true (the null hypothesis being that the coefficient equals zero). The smaller the P-value, the stronger the evidence for rejecting the null hypothesis, that is, the stronger the evidence that the coefficient is different from zero. |
Odds Ratio | Calculated by exponentiating the b-coefficient (e raised to the power of the b-coefficient). The odds ratio is the factor by which the odds of the dependent variable being 1 change due to a unit increase in this independent variable. |
Lower | Because of the intuitive meaning of the odds ratio, confidence intervals for coefficients in the model are calculated on odds ratios rather than on the coefficients themselves. The confidence interval is computed based on a 95% confidence level and a two-tailed normal distribution. “Lower” is the lower bound of this confidence interval. |
Upper | Because of the intuitive meaning of the odds ratio, confidence intervals for coefficients in the model are calculated on odds ratios rather than on the coefficients themselves. The confidence interval is computed based on a 95% confidence level and a two-tailed normal distribution. “Upper” is the upper bound of this confidence interval. |
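Partial R | Calculated for each b-coefficient value as: $R_i = \operatorname{sign}(b_i)\sqrt{\dfrac{w_i - 2}{-2L_0}}$, where $b_i$ is the b-coefficient of the ith variable, $w_i$ is its Wald statistic, and $L_0$ is the initial log likelihood. If $w_i \le 2$, then Partial R is set to 0. This statistic provides a measure of the relative importance of each variable in the model. It is calculated only when the constant term is included in the model. [SPSS] |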
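Standardized Coefficient | The estimated standardized coefficient is calculated for each b-coefficient value as: $b_i \sigma_i / (\pi / \sqrt{3})$, where $b_i$ is the b-coefficient of the ith variable, $\sigma_i$ is the standard deviation of that variable, and $\pi / \sqrt{3}$ is the standard deviation of the standard logistic distribution. |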
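
Several fields in this report are simple functions of the b-coefficient and its standard error. The following Python fragment is a rough sketch under stated assumptions, not the function's actual code: the names `predicted_probability` and `coefficient_stats` are hypothetical, the confidence bounds are assumed to be obtained by exponentiating the b-coefficient plus or minus a normal quantile times the standard error, and Partial R is assumed to follow the commonly cited SPSS form sign(b) * sqrt((w - 2) / (-2 * L0)), set to 0 when the Wald statistic w is 2 or less.

```python
import math
from statistics import NormalDist

def predicted_probability(coefs, x):
    """Probability that the dependent variable is 1.

    coefs is [constant, b1, b2, ...]; x is [x1, x2, ...] in the same order.
    """
    g = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))  # logit g(x)
    return 1.0 / (1.0 + math.exp(-g))  # equals e^g / (1 + e^g)

def coefficient_stats(b, std_err, l0=None, level=0.95):
    """Per-variable statistics derived from a b-coefficient and its standard error."""
    t_stat = b / std_err
    wald = t_stat ** 2
    # Two-tailed normal quantile, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    stats = {
        "t_stat": t_stat,
        "wald": wald,
        "odds_ratio": math.exp(b),
        "lower": math.exp(b - z * std_err),  # assumed form of the lower bound
        "upper": math.exp(b + z * std_err),  # assumed form of the upper bound
    }
    if l0 is not None:
        # Assumed Partial R formula; l0 is the (negative) initial log likelihood.
        if wald <= 2.0:
            stats["partial_r"] = 0.0
        else:
            stats["partial_r"] = math.copysign(
                math.sqrt((wald - 2.0) / (-2.0 * l0)), b
            )
    return stats
```

Calling `coefficient_stats` with the B Coefficient and Standard Error values for a variable, plus the Initial Log Likelihood from the statistics report, gives values that can be compared against the corresponding report fields. The P-value is omitted here because it depends on the t-distribution degrees of freedom, which this report does not list.
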
Logistic Regression Model
Report Item | Data Type | Description |
---|---|---|
Groupby Variable | User-Defined | A column is generated for each groupby column. Within each column there are distinct values of the groupby columns for which a logistic model was built. |
partId | INTEGER | Index of the XML block. partId is incremented for each 31000-byte block of XML in the XmlModel column. |
XmlModel | VARCHAR(31000) | A 31000-byte block of an XML representation of the model. This column is used for scoring the model. Any requested reports appear here as well. |
To extract the XML into a viewable format, run the following query:
SELECT XMLSERIALIZE(Content X.Dot) as XMLText FROM (SELECT * FROM "outputdatabase"."outputtablename_txt") AS C, XMLTable ( '//*' PASSING CREATEXML(C.XmlModel) ) AS X ("Dot");
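
When a model spans more than one row, the blocks may need to be joined on the client before parsing. The following Python fragment is a minimal sketch that assumes the XmlModel blocks are consecutive slices of a single XML document ordered by partId; that assumption is not confirmed here. The function name `assemble_model_xml`, the sample rows, and the element names are hypothetical stand-ins for rows fetched from the output table.

```python
import xml.etree.ElementTree as ET

def assemble_model_xml(rows):
    """Join (part_id, xml_block) pairs for one model into a single XML string."""
    ordered = sorted(rows, key=lambda row: row[0])  # order blocks by partId
    return "".join(block for _, block in ordered)

# Hypothetical rows, deliberately out of order, standing in for query results.
rows = [
    (2, "<b>1.5</b></model>"),
    (1, "<model><a>x</a>"),
]

xml_text = assemble_model_xml(rows)
root = ET.fromstring(xml_text)  # raises ParseError if the joined text is not well-formed
```

If a groupby option was used, the rows would first need to be separated by their groupby values so that each model is assembled on its own.
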