Model Evaluation | Logistic Regression Scoring | Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage

The same model evaluation that is available when building a Logistic Regression model is also available when scoring one, including the following report tables.

Prediction Success Table

The prediction success table is computed using only probabilities, not estimates based on a threshold value. Using an input table that contains known values for the dependent variable, the probabilities π(x) and 1 – π(x), which are the probabilities that the outcome is 1 or 0 respectively, are summed separately over the rows whose actual value is 1 and the rows whose actual value is 0. This produces a report like the following:

                      Estimate Response   Estimate Non-Response   Actual Total
Actual Response       306.5325            68.4675                 375.0000
Actual Non-Response   69.0115             302.9885                372.0000
Estimated Total       375.5440            371.4560                747.0000

A useful feature of this table is that it does not depend on the threshold value used in scoring to decide which probabilities correspond to an estimate of 1 and which to an estimate of 0. This is because the entries in the “Estimate Response” column are sums of the probabilities π(x) that the outcome is 1, taken separately over the rows where the actual outcome is 1 and the rows where it is 0, and then totaled. Similarly, the entries in the “Estimate Non-Response” column are sums of the probabilities 1 – π(x) that the outcome is 0.
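The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the function name `prediction_success_table` and its list-based inputs (known 0/1 outcomes plus scored probabilities π(x)) are assumptions made for the example. No threshold appears anywhere in the code, which is why the resulting table is threshold-independent.

```python
from typing import Dict, List, Sequence


def prediction_success_table(actual: Sequence[int], prob: Sequence[float]) -> Dict[str, List[float]]:
    """Sum pi(x) and 1 - pi(x) separately over rows with actual 1 and actual 0.

    Hypothetical helper: `actual` holds the known 0/1 values of the dependent
    variable and `prob` holds the scored probability pi(x) that the outcome is 1.
    Each row is [Estimate Response, Estimate Non-Response].
    """
    table = {
        "Actual Response": [0.0, 0.0],
        "Actual Non-Response": [0.0, 0.0],
    }
    for y, p in zip(actual, prob):
        row = table["Actual Response"] if y == 1 else table["Actual Non-Response"]
        row[0] += p          # probability that the outcome is 1
        row[1] += 1.0 - p    # probability that the outcome is 0
    # Column totals form the "Estimated Total" row; each row sums to its actual count.
    table["Estimated Total"] = [
        table["Actual Response"][0] + table["Actual Non-Response"][0],
        table["Actual Response"][1] + table["Actual Non-Response"][1],
    ]
    return table
```

For instance, with actual values `[1, 1, 0]` and probabilities `[0.75, 0.5, 0.25]`, the Actual Response row is `[1.25, 0.75]` and the Actual Non-Response row is `[0.25, 0.75]`; each row sums to the number of rows with that actual value (2 and 1).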

Multi-Threshold Success Table

This table provides values similar to those in the prediction success table, but instead of summing probabilities, it sums the estimated values (1 or 0) obtained by applying a threshold, which amounts to counting observations. Rather than using just one threshold, however, it displays several thresholds ranging from a user-specified low value to a user-specified high value in user-specified increments. This lets the user compare success scenarios across different threshold values to help choose an ideal threshold.
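A minimal sketch of how such a table could be assembled, assuming a row is estimated as a response when π(x) is greater than or equal to the threshold (the text does not say how values exactly at the threshold are handled). The function name `multi_threshold_table` and its defaults of 0.00 to 0.95 in steps of 0.05, chosen to mirror the example table in this section, are illustrative assumptions rather than library parameters.

```python
from typing import List, Sequence, Tuple


def multi_threshold_table(
    actual: Sequence[int],
    prob: Sequence[float],
    low: float = 0.0,
    high: float = 0.95,
    step: float = 0.05,
) -> List[Tuple[float, int, int, int, int]]:
    """Count outcomes for each threshold from low to high in increments of step.

    Hypothetical helper. Each tuple is (threshold, AR/ER, AR/EN, AN/ER, AN/EN),
    where AR/AN = actual response/non-response and ER/EN = estimated
    response/non-response. Assumption: estimate 1 when pi(x) >= threshold.
    """
    n_steps = int(round((high - low) / step))
    rows = []
    for i in range(n_steps + 1):
        # Build each threshold from an integer index to avoid accumulated float drift.
        t = round(low + i * step, 10)
        ar_er = ar_en = an_er = an_en = 0
        for y, p in zip(actual, prob):
            estimate = 1 if p >= t else 0
            if y == 1 and estimate == 1:
                ar_er += 1
            elif y == 1:
                ar_en += 1
            elif estimate == 1:
                an_er += 1
            else:
                an_en += 1
        rows.append((t, ar_er, ar_en, an_er, an_en))
    return rows
```

Generating thresholds from an integer index, rather than by repeatedly adding `step`, keeps floating-point error from adding or dropping a row at the high end of the range.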

For example, the ideal threshold may be taken to be the one that maximizes the number of correctly classified observations. However, subjective business considerations can also be applied by looking at all of the success values, because wrong predictions in one direction may be more tolerable than wrong predictions in the other. For example, if a response indicates fraud, one may mind overlooking fraudulent behavior (estimating 0 when the actual value is 1) less than wrongly accusing someone of fraud (estimating 1 when the actual value is 0).

The following is an example of a logistic regression multi-threshold success table.

Threshold     Actual Response,     Actual Response,         Actual Non-Response,   Actual Non-Response,
Probability   Estimate Response    Estimate Non-Response    Estimate Response      Estimate Non-Response
0.0000        375                  0                        372                    0
0.0500        375                  0                        326                    46
0.1000        374                  1                        231                    141
0.1500        372                  3                        145                    227
0.2000        367                  8                        93                     279
0.2500        358                  17                       59                     313
0.3000        354                  21                       46                     326
0.3500        347                  28                       38                     334
0.4000        338                  37                       32                     340
0.4500        326                  49                       27                     345
0.5000        318                  57                       27                     345
0.5500        304                  71                       26                     346
0.6000        296                  79                       24                     348
0.6500        287                  88                       22                     350
0.7000        279                  96                       21                     351
0.7500        270                  105                      19                     353
0.8000        258                  117                      18                     354
0.8500        245                  130                      16                     356
0.9000        222                  153                      12                     360
0.9500        187                  188                      10                     362
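To make the threshold choice concrete, the counts in the example table can be compared directly. The sketch below, assuming the table has been loaded as a list of tuples, finds the threshold with the most correctly classified observations (Actual Response/Estimate Response plus Actual Non-Response/Estimate Non-Response) and, to illustrate the business trade-off discussed for fraud, the threshold that minimizes a weighted count of wrong predictions. The function names and the 1:5 cost weights are hypothetical.

```python
# Counts copied from the example multi-threshold success table:
# (threshold, AR/ER, AR/EN, AN/ER, AN/EN), where AR = Actual Response,
# AN = Actual Non-Response, ER = Estimate Response, EN = Estimate Non-Response.
TABLE = [
    (0.00, 375, 0, 372, 0), (0.05, 375, 0, 326, 46), (0.10, 374, 1, 231, 141),
    (0.15, 372, 3, 145, 227), (0.20, 367, 8, 93, 279), (0.25, 358, 17, 59, 313),
    (0.30, 354, 21, 46, 326), (0.35, 347, 28, 38, 334), (0.40, 338, 37, 32, 340),
    (0.45, 326, 49, 27, 345), (0.50, 318, 57, 27, 345), (0.55, 304, 71, 26, 346),
    (0.60, 296, 79, 24, 348), (0.65, 287, 88, 22, 350), (0.70, 279, 96, 21, 351),
    (0.75, 270, 105, 19, 353), (0.80, 258, 117, 18, 354), (0.85, 245, 130, 16, 356),
    (0.90, 222, 153, 12, 360), (0.95, 187, 188, 10, 362),
]


def most_correct(table):
    """Row whose threshold maximizes correct classifications (AR/ER + AN/EN)."""
    return max(table, key=lambda r: r[1] + r[4])


def lowest_cost(table, cost_false_neg, cost_false_pos):
    """Row whose threshold minimizes a weighted count of wrong predictions.

    cost_false_neg weights AR/EN (estimate 0 when the actual value is 1);
    cost_false_pos weights AN/ER (estimate 1 when the actual value is 0).
    """
    return min(table, key=lambda r: cost_false_neg * r[2] + cost_false_pos * r[3])


best = most_correct(TABLE)
print(best[0], best[1] + best[4])  # 0.35 681

# Hypothetical weights: a false accusation costs five times as much as a miss.
cautious = lowest_cost(TABLE, cost_false_neg=1, cost_false_pos=5)
print(cautious[0])  # 0.45
```

With equal weights, the best threshold in this example is 0.35 (681 of 747 observations classified correctly). Penalizing false accusations more heavily moves it to 0.45, trading more missed responses (49 instead of 28) for fewer false alarms (27 instead of 38).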

Cumulative Lift Table

The cumulative lift table divides the observations into deciles based on their probability values. The deciles are labeled so that decile 1 holds the highest probability values calculated by logistic regression and decile 10 the lowest. Within each decile, the following measures are given:
  • Count of “response” values
  • Count of observations
  • Percentage response (percentage of response values within the decile)
  • Captured response (percentage of all response values that fall within the decile)
  • Lift value (percentage response / expected response, where the expected response is the percentage of responses over all observations)
  • Cumulative versions of each of the measures listed
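The measures listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's algorithm: rows are sorted by descending π(x) and split into ten groups whose sizes differ by at most one, tied probabilities get no special handling, at least one response is assumed to be present, and the function and key names are hypothetical.

```python
from typing import Dict, List, Sequence


def cumulative_lift_table(actual: Sequence[int], prob: Sequence[float], n_bins: int = 10) -> List[Dict[str, float]]:
    """Per-decile and cumulative lift measures; decile 1 holds the highest pi(x).

    Hypothetical helper. Assumes at least one row and at least one response.
    """
    rows = sorted(zip(prob, actual), key=lambda r: r[0], reverse=True)
    n = len(rows)
    total_resp = sum(y for _, y in rows)
    expected_pct = 100.0 * total_resp / n  # response percentage over all observations
    out = []
    cum_count = cum_resp = 0
    start = 0
    for d in range(n_bins):
        end = (d + 1) * n // n_bins  # near-equal split boundaries
        group = rows[start:end]
        start = end
        count = len(group)
        resp = sum(y for _, y in group)
        cum_count += count
        cum_resp += resp
        pct = 100.0 * resp / count if count else 0.0
        cum_pct = 100.0 * cum_resp / cum_count if cum_count else 0.0
        out.append({
            "decile": d + 1,
            "count": count,
            "response": resp,
            "response_pct": pct,
            "captured_pct": 100.0 * resp / total_resp,
            "lift": pct / expected_pct,
            "cum_response": cum_resp,
            "cum_response_pct": cum_pct,
            "cum_captured_pct": 100.0 * cum_resp / total_resp,
            "cum_lift": cum_pct / expected_pct,
        })
    return out
```

With 747 rows, the `(d + 1) * n // n_bins` boundaries give group sizes of 74, 75, 75, 74, 75, 75, 74, 75, 75 and 75, which happen to match the Count column of the example table; whether the library splits rows the same way is not stated.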

The following is an example of a logistic regression cumulative lift table.

Decile  Count     Response  Response  Captured  Lift    Cumulative  Cumulative  Cumulative    Cumulative
                            (%)       Response          Response    Response    Captured      Lift
                                      (%)                           (%)         Response (%)
1       74.0000   73.0000   98.6486   19.4667   1.9651  73.0000     98.6486     19.4667       1.9651
2       75.0000   69.0000   92.0000   18.4000   1.8326  142.0000    95.3020     37.8667       1.8984
3       75.0000   71.0000   94.6667   18.9333   1.8858  213.0000    95.0893     56.8000       1.8942
4       74.0000   65.0000   87.8378   17.3333   1.7497  278.0000    93.2886     74.1333       1.8583
5       75.0000   63.0000   84.0000   16.8000   1.6733  341.0000    91.4209     90.9333       1.8211
6       75.0000   23.0000   30.6667   6.1333    0.6109  364.0000    81.2500     97.0667       1.6185
7       74.0000   8.0000    10.8108   2.1333    0.2154  372.0000    71.2644     99.2000       1.4196
8       75.0000   2.0000    2.6667    0.5333    0.0531  374.0000    62.6466     99.7333       1.2479
9       75.0000   1.0000    1.3333    0.2667    0.0266  375.0000    55.8036     100.0000      1.1116
10      75.0000   0.0000    0.0000    0.0000    0.0000  375.0000    50.2008     100.0000      1.0000
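As a worked check of the definitions, the first rows of the example table can be reproduced from its Count and Response columns, which total 747 observations and 375 responses (values rounded to four decimal places):

```latex
\begin{aligned}
\text{Expected response} &= \tfrac{375}{747} \times 100 \approx 50.2008\% \\
\text{Response}_{1} &= \tfrac{73}{74} \times 100 \approx 98.6486\% \\
\text{Captured response}_{1} &= \tfrac{73}{375} \times 100 \approx 19.4667\% \\
\text{Lift}_{1} &= \tfrac{98.6486}{50.2008} \approx 1.9651 \\
\text{Cumulative response}_{2} &= \tfrac{73 + 69}{74 + 75} \times 100 = \tfrac{142}{149} \times 100 \approx 95.3020\% \\
\text{Cumulative captured response}_{2} &= \tfrac{142}{375} \times 100 \approx 37.8667\% \\
\text{Cumulative lift}_{2} &= \tfrac{95.3020}{50.2008} \approx 1.8984
\end{aligned}
```

At decile 10 all 747 observations and 375 responses are included, so the cumulative response equals the expected response and the cumulative lift is exactly 1.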