In many regression problems, the response variable (or dependent variable) to be predicted has only two possible outcomes. For example, will the customer buy the product in response to the promotion or not? Is the transaction fraudulent or not? Will the customer close their account or not? There are many examples of business problems with only two possible outcomes. Unfortunately, the linear regression model falls short on this type of problem. It is worth understanding what these shortcomings are and how the logistic regression model improves on linear regression when predicting a two-valued response variable.
When the response variable y has only two possible values, which may be coded as 0 and 1, the expected value of y_i, E(y_i), is the probability that the value will be 1. This creates three problems for linear regression. First, for any given values of the independent variables, the error term for a two-valued response also has only two possible values, so it cannot follow a normal distribution. Second, the variance of the error term is not constant over the values of the independent variables. Finally, the linear regression model can produce a predicted value that falls outside the required range of 0 to 1. It would be better to compute a continuous probability function bounded between 0 and 1. To obtain such a function, the usual linear regression expression b_0 + b_1 x_1 + ... + b_n x_n is transformed using the logistic function, which is the inverse of the logit transformation. The logistic function is an example of a sigmoid function, so named because its plot resembles the letter 's' (sigma). The logit transformation and the logistic function give rise to the term logistic regression.
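As a rough illustration of the transformation described above, the following Python sketch evaluates a linear expression and then passes it through the logistic function. The coefficient values and the function names linear_predictor and logistic are hypothetical, chosen only for this example; they are not taken from Teradata Warehouse Miner or from the cited references.

```python
import math


def linear_predictor(coefs, xs):
    """Compute b_0 + b_1*x_1 + ... + b_n*x_n for one observation."""
    b0, *bs = coefs
    return b0 + sum(b * x for b, x in zip(bs, xs))


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    # Branch on the sign of z so that exp() never receives a large
    # positive argument, which would overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients b_0 = -3.0 and b_1 = 0.5 (assumed, not fitted).
coefs = [-3.0, 0.5]

for x in (-10.0, 0.0, 6.0, 20.0):
    z = linear_predictor(coefs, [x])
    # z can be any real number; logistic(z) always lies between 0 and 1.
    print(f"x={x:6.1f}  linear={z:6.2f}  probability={logistic(z):.4f}")
```

With these assumed coefficients the linear expression ranges from -8 to 7 over the sample inputs, which cannot be read as probabilities, while the transformed values all lie between 0 and 1.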
The type of logistic regression model that Teradata Warehouse Miner supports is one with a two-valued dependent variable, referred to as a binary logit model. The user is not required to code the dependent variable as two distinct values beforehand, however, because Teradata Warehouse Miner can perform this coding itself. The user chooses which values represent the response value (i.e., 1 or TRUE), and all other values are treated as non-response values (i.e., 0 or FALSE). Although values other than 1 and 0 are supported in the dependent variable, throughout this section the response value is represented as 1 and the non-response value as 0 for ease of reading.
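The recoding idea can be sketched in a few lines of Python. This is only an illustration of the concept under assumed data; it does not show how Teradata Warehouse Miner performs the coding internally, and the function name recode_response and the sample account_status values are hypothetical.

```python
def recode_response(values, response_values):
    """Return 1 for each value chosen as a response value, 0 for all others."""
    response_set = set(response_values)
    return [1 if v in response_set else 0 for v in values]


# Hypothetical dependent variable with more than two distinct values.
account_status = ["closed", "open", "suspended", "closed", "open"]

# Treat "closed" as the response value; every other value is a non-response.
y = recode_response(account_status, ["closed"])
print(y)  # [1, 0, 0, 1, 0]
```

More than one value may be designated as a response value by passing a longer list, for example ["closed", "suspended"].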
The primary sources of information and formulae in this section are [Hosmer] and [Neter].