1.1 - 8.10 - HMMUnsupervised Example: Loan Default Prediction - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Release Number
1.1
8.10
Release Date
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)

Financial institutions try to predict defaulting customers by analyzing previous transaction data. To reduce the number of defaulting customers, the financial institution may suggest default prevention initiatives to customers.

Loan status updates depend on customer payments. You can use loan status updates to build a Hidden Markov Model (HMM) to predict loan defaults. The following table shows the possible loan statuses and their observation symbols. The observation symbols are used to build the HMM.

Loan Status Symbol
current 1
late 2
one month late 3
two months late 4
three months late 5
four months late 6
defaulted 7
paid 8

Use all transaction sequences ending with "paid" to train one model and all sequences ending with "defaulted" to train a second model. Evaluate the current sequence of statuses for a customer against both models. Base the default prediction on the model that gives a higher probability.

The hidden states of the HMM inherently indicate the financial health of the customer. A hidden state with a high probability of emission of defaulted status indicates poor financial health, while a hidden state with high probability of emission of paid status indicates good financial health.

Input

The Vertices table, loan_prediction, contains data for training the two models. The model_id column has loan status—1 for a paid loan, 2 for a defaulted loan. Rows with model_id=1 are used to train the first model; rows with model_id=2 are used to train the second model. Defaulted loans have observed_id=7; paid loans end have observed_id=8. The seq_vertex_id column provides the ordering of the symbols in the sequences.

Vertices: loan_prediction
model_id seq_id seq_vertex_id observed_id
1 1 0 1
1 1 1 1
1 1 2 1
1 1 3 1
1 1 4 1
1 1 5 1
1 1 6 1
1 1 7 1
1 1 8 1
1 1 9 1
1 1 10 1
1 1 11 1
1 1 12 4
1 1 13 5
1 1 14 6
1 1 15 6
1 1 16 6
1 1 17 7
... ... ... ...
2 1 0 1
2 1 1 1
2 1 2 1
2 1 3 1
2 1 4 1
2 1 5 1
2 1 6 1
2 1 7 1
2 1 8 1
2 1 9 1
2 1 10 1
2 1 11 1
2 1 12 8

SQL Call

DROP TABLE pi_loan;
DROP TABLE A_loan;
DROP TABLE B_loan;

SELECT * FROM HMMUnsupervised (
ON loan_prediction AS Vertices
PARTITION BY model_id, seq_id
ORDER BY seq_vertex_id
OUT TABLE InitStateTable (pi_loan)
OUT TABLE StateTransitionTable (A_loan)
OUT TABLE EmissionTable (B_loan)
USING
ModelColumn ('model_id')
SeqColumn ('seq_id')
ObservationColumn ('observed_id')
HiddenStateNum (3)
InitMethods ('random')
) AS dt;
SELECT * FROM HMMUnsupervised(
ON loan_prediction AS Vertices PARTITION BY model_id, seq_id
ORDER BY seq_vertex_id
OUT TABLE InitStateTable(pi_loan)
OUT TABLE StateTransitionTable(A_loan)
OUT TABLE EmissionTable(B_loan)
USING
ModelColumn('model_id')
SeqColumn('seq_id')
ObservationColumn('observed_id')
HiddenStateNum(3)
InitMethods('random')
) AS dt ;

Output

message
---------------------------------------------------------------------------------------------------
HMM models will be saved to the  output tables once the training process is successfully completed.

SELECT * FROM pi_loan ORDER BY 1, 2, 3;
model_id state probability
-------- ----- ----------------------
1 0     1.9046716874764173E-11
1 1     1.3028751241725672E-12
1 2         0.9999999999796504
2 0       0.017791936081545466
2 1       0.012540722577073648
2 2         0.9696673413413809
SELECT * FROM A_loan ORDER BY 1, 2, 3;
model_id from_state to_state probability
-------- ---------- -------- --------------------
1 0          0        0.034350221654187815
1 0          1          0.9656493269833311
1 0          2        4.513624811896442E-7
1 1          0           0.943657329840527
1 1          1         0.05633694510467473
1 1          2        5.725054798558155E-6
1 2          0        0.016131381366779184
1 2          1         0.11988350200836678
1 2          2          0.8639851166248538
2 0          0        0.006066428449697642
2 0          1          0.7535894555154948
2 0          2         0.24034411603480746
2 1          0          0.6291758649110152
2 1          1         0.13451837399524283
2 1          2         0.23630576109374207
2 2          0         0.09591775004852725
2 2          1          0.1484849472359561
2 2          2          0.7555973027155165
SELECT * FROM B_loan ORDER BY 1, 2, 3;
model_id state observed probability
-------- ----- -------- ----------------------
1 0        1       0.008891457654605486
1 0        2       5.468041729835976E-9
1 0        3      4.1391822187081835E-9
1 0        4        0.27160363157687994
1 0        5      2.2063825107513166E-8
1 0        6         0.6142876565496275
1 0        7        0.10521722254783777
1 0        8       0.019644702389412974
1 1        1        6.76987232427815E-5
1 1        2        0.06232468547238135
1 1        3        0.13122140151624764
1 1        4       0.021965148206372923
1 1        5        0.19092781165768866
1 1        6         0.4713908643412146
1 1        7        0.12210239008285201
1 1        8        0.19138971490485823
1 2        1         0.9879618077382735
1 2        2       8.449631929911974E-4
1 2        3       0.011077522833205329
1 2        4      1.1570559134744652E-4
1 2        5      1.356901157720188E-10
1 2        6     4.1261346059402553E-10
1 2        7      9.587888533000593E-11
1 2        8        0.08108497126039647
2 0        1         0.5958941361046881
2 0        2      1.3089471188240957E-4
2 0        3      1.2661579814008092E-4
2 0        4         0.2505368195493097
2 0        5      6.596053412122555E-10
2 0        6        0.08043035096135323
2 0        7        0.17227888424304832
2 0        8         0.0728811822150213
2 1        1         0.4440144412391816
2 1        2        0.08543890999226664
2 1        3         0.2104246992667131
2 1        4         6.3966539144571E-7
2 1        5        0.09245389189993217
2 1        6        0.03274628512838264
2 1        7       0.030281066209571193
2 1        8         0.1349211328081325
2 2        1         0.9700830457796161
2 2        2      0.0029351316912068154
2 2        3      0.0022187354229495494
2 2        4       6.829831755368084E-5
2 2        5       5.944648340203616E-6
2 2        6      1.7075168214740832E-4
2 2        7        0.08820983879215442
2 2        8       0.024518092458186168

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.