Tutorial - Logistic Regression - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Product
Teradata Warehouse Miner
Release Number
5.4.5
Published
February 2018
Language
English (United States)
Last Update
2018-05-04
dita:id
B035-2302
Product Category
Software

The following example uses the stepwise feature of Logistic Regression analysis. Stepwise processing adds steps to the analysis, so the output of a normal Logistic Regression is a subset of the output shown below. In this example, ccacct (has credit card, 0 or 1) is predicted from 16 independent variables, income through avg_sv_tran_cnt. The forward stepwise process determines that only 7 of the 16 input variables should be used in the model: avg_sv_tran_amt (average amount of savings transactions), avg_sv_tran_cnt (average number of savings transactions per month), avg_sv_bal (average savings account balance), married, years_with_bank, avg_ck_tran_cnt (average number of checking transactions per month), and ckacct (has checking account, 0 or 1).

Step 0 shows all 16 original independent variables excluded from the model, which is the starting point for forward stepwise regression. In Step 1, the Model Assessment report shows that the variable avg_sv_tran_amt is added to the model, along with the constant term, with all other variables still excluded. For the sake of brevity, Steps 2 through 6 are not shown. In Step 7, ckacct is the last variable added to the model.

At this point, the stepwise algorithm stops because there are no more variables qualifying to be added or removed from the model, and the Reweighted Least Squares Logistic Regression and Variables in Model reports are given, just as they would be if these variables were analyzed without stepwise requested. Finally, the Prediction Success Table, Multi-Threshold Success Table, and Cumulative Lift Table are given, as requested, to complete the analysis.
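The control flow of forward selection described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Teradata Warehouse Miner implementation: the function `forward_stepwise`, the callable `p_value_to_enter`, and the toy p-values are hypothetical, and the sketch covers only the entry test (Criterion to Enter), not the removal test.

```python
from typing import Callable, Iterable, List


def forward_stepwise(
    candidates: Iterable[str],
    p_value_to_enter: Callable[[List[str], str], float],
    criterion_to_enter: float = 0.05,
) -> List[str]:
    """Greedy forward selection (hypothetical helper).

    At each step, score every excluded variable given the current model,
    add the one with the smallest p-value, and stop when no excluded
    variable has a p-value below the entry criterion.
    """
    selected: List[str] = []       # Step 0: every variable is excluded
    remaining = list(candidates)
    while remaining:
        scored = [(p_value_to_enter(selected, name), name) for name in remaining]
        best_p, best_name = min(scored)
        if best_p >= criterion_to_enter:
            break                  # nothing qualifies to be added
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Toy p-values (made up, and held fixed regardless of the current model).
toy_p = {"x1": 0.001, "x2": 0.20, "x3": 0.03}
chosen = forward_stepwise(toy_p, lambda model, name: toy_p[name])
print(chosen)
```

In a real analysis the p-value of each excluded variable would be recomputed after every step, because it depends on which variables are already in the model.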

  1. Parameterize a Logistic Regression Analysis as follows:
    • Available Table — twm_customer_analysis
    • Dependent Variable — ccacct

    • Independent Variables
      • income
      • age
      • years_with_bank
      • nbr_children
      • female
      • single
      • married
      • separated
      • ckacct
      • svacct
      • avg_ck_bal
      • avg_sv_bal
      • avg_ck_tran_amt
      • avg_ck_tran_cnt
      • avg_sv_tran_amt
      • avg_sv_tran_cnt
    • Convergence Criterion — 0.001
    • Maximum Iterations — 100
    • Response Value — 1
    • Include Constant — Enabled
    • Prediction Success Table — Enabled
    • Multi-Threshold Success Table — Enabled
      • Threshold Begin — 0
      • Threshold End — 1
      • Threshold Increment — 0.05
    • Cumulative Lift Table — Enabled
    • Use Stepwise Regression — Enabled
      • Criterion to Enter — 0.05
      • Criterion to Remove — 0.05
      • Direction — Forward
    • Optimization Type — Automatic
  2. Run the analysis.
  3. Click Results when it completes.

    For this example, the Logistic Regression Analysis generated the following pages. Clicking a page name displays that page in Results.

    Logistic Regression Report
    Total Observations: 747
    Total Iterations: 9
    Initial Log Likelihood: -517.7749
    Final Log Likelihood: -244.4929
    Likelihood Ratio Test G Statistic: 546.5641
    Chi-Square Degrees of Freedom: 7.0000
    Chi-Square Value: 14.0671
    Chi-Square Probability: 0.0000
    McFadden's Pseudo R-Squared: 0.5278
    Dependent Variable: ccacct
    Dependent Response Value: 1
    Total Distinct Values: 2
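Several of the summary values above can be checked from the two log likelihoods. The following is a minimal sketch, assuming the usual definitions G = 2 × (final log likelihood − initial log likelihood) and McFadden's pseudo R-squared = 1 − (final log likelihood / initial log likelihood). The variable names are illustrative and are not part of Teradata Warehouse Miner.

```python
# Values copied from the Logistic Regression Report above.
ll_initial = -517.7749   # Initial Log Likelihood
ll_final = -244.4929     # Final Log Likelihood

# Likelihood ratio test statistic (assumed definition).
g_statistic = 2.0 * (ll_final - ll_initial)

# McFadden's pseudo R-squared (assumed definition).
mcfadden_r2 = 1.0 - ll_final / ll_initial

print(f"G statistic: {g_statistic:.3f}")
print(f"McFadden's pseudo R-squared: {mcfadden_r2:.4f}")
```

Because the inputs are rounded to four decimal places, the computed values agree with the report only up to rounding. The reported Chi-Square Value of 14.0671 appears to be the 0.05 critical value for 7 degrees of freedom, which the G statistic far exceeds.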
    Execution Summary
    6/20/2004 2:19:02 PM Stepwise Logistic Regression Running.
    6/20/2004 2:19:03 PM Step 0 Complete
    6/20/2004 2:19:03 PM Step 1 Complete
    6/20/2004 2:19:03 PM Step 2 Complete
    6/20/2004 2:19:03 PM Step 3 Complete
    6/20/2004 2:19:03 PM Step 4 Complete
    6/20/2004 2:19:04 PM Step 5 Complete
    6/20/2004 2:19:04 PM Step 6 Complete
    6/20/2004 2:19:04 PM Step 7 Complete
    6/20/2004 2:19:04 PM Log Likelihood: -517.78094387828
    6/20/2004 2:19:04 PM Log Likelihood: -354.38456690558
    6/20/2004 2:19:04 PM Log Likelihood: -287.159936852895
    6/20/2004 2:19:04 PM Log Likelihood: -258.834546711159
    6/20/2004 2:19:04 PM Log Likelihood: -247.445356552554
    6/20/2004 2:19:04 PM Log Likelihood: -244.727173470081
    6/20/2004 2:19:04 PM Log Likelihood: -244.49467692232
    6/20/2004 2:19:04 PM Log Likelihood: -244.492882024522
    6/20/2004 2:19:04 PM Log Likelihood: -244.492881920691
    6/20/2004 2:19:04 PM Computing Multi-Threshold Success Table
    6/20/2004 2:19:06 PM Computing Prediction Success Table
    6/20/2004 2:19:06 PM Computing Cumulative Lift Table
    6/20/2004 2:19:07 PM Creating Report
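The log likelihood values in the Execution Summary can be compared with the Convergence Criterion of 0.001. The sketch below assumes that convergence is judged by the change in log likelihood between consecutive iterations. That is an assumption about the stopping rule, not a statement of the product's exact test, and the variable names are illustrative.

```python
# Log likelihood per iteration, copied from the Execution Summary above.
log_likelihoods = [
    -517.78094387828,
    -354.38456690558,
    -287.159936852895,
    -258.834546711159,
    -247.445356552554,
    -244.727173470081,
    -244.49467692232,
    -244.492882024522,
    -244.492881920691,
]
convergence_criterion = 0.001

# Find the first iteration (1-based) whose change from the previous
# iteration is at or below the criterion.
converged_at = None
for i in range(1, len(log_likelihoods)):
    change = abs(log_likelihoods[i] - log_likelihoods[i - 1])
    if change <= convergence_criterion:
        converged_at = i + 1
        break

print(f"Change first falls within the criterion at iteration {converged_at}")
```

Under this assumption the change first drops to 0.001 or less at the ninth value, which is consistent with the Total Iterations of 9 in the report.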
    Variables
    Column Name      B Coefficient  Standard Error  Wald Statistic  T Statistic  P-Value  Odds Ratio  Lower   Upper   Partial R  Standardized Coefficient
    (Constant)       -1.1864        0.2733          18.8462         -4.3412      0.0000   N/A         N/A     N/A     N/A        N/A
    avg_sv_tran_amt  0.0308         0.0038          64.7039         8.0439       0.0000   1.0312      1.0235  1.0390  0.2461     2.0618
    avg_sv_tran_cnt  -1.1921        0.2133          31.2295         -5.5883      0.0000   0.3036      0.1999  0.4612  -0.1680    -0.9144
    avg_sv_bal       0.0031         0.0006          31.1687         5.5829       0.0000   1.0031      1.0020  1.0042  0.1678     2.6259
    married          -0.6225        0.2334          7.1152          -2.6674      0.0078   0.5366      0.3396  0.8478  -0.0703    -0.1715
    years_with_bank  -0.0981        0.0443          4.9149          -2.2170      0.0269   0.9066      0.8312  0.9887  -0.0531    -0.1447
    avg_ck_tran_cnt  -0.0228        0.0096          5.6088          -2.3683      0.0181   0.9775      0.9592  0.9961  -0.0590    -0.1792
    ckacct           0.4657         0.2365          3.8760          1.9688       0.0494   1.5931      1.0021  2.5326  0.0426     0.1273
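Several columns in the Variables report follow from the B Coefficient and Standard Error alone. The sketch below assumes the standard relationships: T Statistic = B / SE, Wald Statistic = (B / SE) squared, Odds Ratio = exp(B), and Lower and Upper as a 95% confidence interval on the odds ratio, exp(B ± 1.96 × SE). The 95% level and the helper name `derived_stats` are assumptions for illustration. They are not taken from the product documentation.

```python
import math


def derived_stats(b: float, se: float, z: float = 1.96) -> dict:
    """Derive report columns from a coefficient and its standard error.

    Hypothetical helper; z = 1.96 assumes a 95% confidence interval.
    """
    t_stat = b / se
    return {
        "t_statistic": t_stat,
        "wald_statistic": t_stat ** 2,
        "odds_ratio": math.exp(b),
        "lower": math.exp(b - z * se),
        "upper": math.exp(b + z * se),
    }


# B Coefficient and Standard Error for the variable "married".
married = derived_stats(-0.6225, 0.2334)
for name, value in married.items():
    print(f"{name}: {value:.4f}")
```

The results match the married row of the report to within the rounding of the four-decimal inputs.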