Logistic Scoring - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Product

Teradata Warehouse Miner

Release Number

5.4.5

Published

February 2018

Language

English (United States)

Last Update

2018-05-04

dita:id

B035-2302

Product Category

Software

Once a logistic regression model has been built, it can be used to “score” new data, that is, to estimate the value of the dependent variable for data in which that value may not be known. Scoring uses the b-coefficients in the logistic regression model together with the names of the independent variable columns they correspond to. This information resides in the results metadata stored in the Teradata database by Teradata Warehouse Miner. Scoring also requires the name of the table in which the data resides, the name of the new table to be created, and primary index information for the new table.

Scoring a logistic regression model requires some steps beyond those required in scoring a linear regression model. The result is a new table containing the primary index columns together with the probability that the dependent variable is 1 (representing the response value) rather than 0 (representing the non-response value), an estimate of the dependent variable (either 0 or 1) based on a user-specified threshold value, or both. For example, if the threshold value is 0.5, then a value of 1 is estimated if the probability value is greater than or equal to 0.5. The probability is based on the logistic regression functions given earlier.
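The scoring arithmetic can be sketched outside the database as follows. This is a minimal illustration, not the SQL that Teradata Warehouse Miner generates. It assumes the standard logistic function, p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn))), and the names `logistic_probability`, `score_row`, and the sample coefficients are hypothetical.

```python
import math


def logistic_probability(coefficients, intercept, row):
    """Return the probability that the dependent variable is 1 for one row.

    coefficients: dict mapping independent variable column name to its b-coefficient
    intercept:    the constant term of the model
    row:          dict mapping column name to the value in the row being scored
    """
    linear = intercept + sum(b * row[name] for name, b in coefficients.items())
    # Use the numerically stable form of the standard logistic function
    # so that large negative values of `linear` do not overflow exp().
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


def score_row(coefficients, intercept, row, threshold=0.5):
    """Return (probability, estimate), where estimate is 1 if probability >= threshold."""
    p = logistic_probability(coefficients, intercept, row)
    estimate = 1 if p >= threshold else 0
    return p, estimate


# Hypothetical model and input row, for illustration only.
coefficients = {"income": 0.8, "age": -0.5}
intercept = 0.25
row = {"income": 2.0, "age": 1.0}

probability, estimate = score_row(coefficients, intercept, row, threshold=0.5)
print(probability, estimate)
```

Note that the comparison is `>=`, matching the rule above that a probability equal to the threshold is estimated as 1.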

Different threshold values applied to the probability produce different estimates of the dependent variable. The model evaluation tables described below can be used to determine what this threshold value should be.
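To see why the threshold matters, the sketch below counts correct and incorrect estimates at several thresholds for a small set of scored rows whose actual values are known. It is a hand-rolled illustration with made-up data, not a reproduction of the product's model evaluation tables, and the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(probabilities, actuals, threshold):
    """Count true/false positives and negatives at a given threshold.

    probabilities: scored probabilities that the dependent variable is 1
    actuals:       known values of the dependent variable (0 or 1)
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, actual in zip(probabilities, actuals):
        estimate = 1 if p >= threshold else 0
        if estimate == 1 and actual == 1:
            counts["tp"] += 1
        elif estimate == 1 and actual == 0:
            counts["fp"] += 1
        elif estimate == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up probabilities and actual responses, for illustration only.
probabilities = [0.9, 0.6, 0.4, 0.2]
actuals = [1, 0, 1, 0]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, confusion_counts(probabilities, actuals, threshold))
```

Lowering the threshold estimates more rows as 1, which tends to trade missed responses for false alarms; raising it does the opposite.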

Logistic Scoring applies a Logistic Regression model to a data set that has the same columns as those used in building the model. The one exception is that the scoring input table need not include the predicted or dependent variable column unless model evaluation is requested.