Financial institutions try to predict defaulting customers by analyzing previous transaction data. To reduce the number of defaulting customers, the financial institution may suggest default prevention initiatives to customers.
Loan status updates depend on customer payments. You can use loan status updates to build a Hidden Markov Model (HMM) to predict loan defaults. The following table shows the possible loan statuses and their observation symbols. The observation symbols are used to build the HMM.
Loan Status | Symbol |
---|---|
current | 1 |
late | 2 |
one month late | 3 |
two months late | 4 |
three months late | 5 |
four months late | 6 |
defaulted | 7 |
paid | 8 |
Use all transaction sequences ending with "paid" to train one model and all sequences ending with "defaulted" to train a second model. Evaluate the current sequence of statuses for a customer against both models. Base the default prediction on the model that gives a higher probability.
The hidden states of the HMM inherently indicate the financial health of the customer. A hidden state with a high probability of emission of defaulted status indicates poor financial health, while a hidden state with high probability of emission of paid status indicates good financial health.
Input
The Vertices table, loan_prediction, contains data for training the two models. The model_id column has loan status—1 for a paid loan, 2 for a defaulted loan. Rows with model_id=1 are used to train the first model; rows with model_id=2 are used to train the second model. Defaulted loans have observed_id=7; paid loans end have observed_id=8. The seq_vertex_id column provides the ordering of the symbols in the sequences.
model_id | seq_id | seq_vertex_id | observed_id |
---|---|---|---|
1 | 1 | 0 | 1 |
1 | 1 | 1 | 1 |
1 | 1 | 2 | 1 |
1 | 1 | 3 | 1 |
1 | 1 | 4 | 1 |
1 | 1 | 5 | 1 |
1 | 1 | 6 | 1 |
1 | 1 | 7 | 1 |
1 | 1 | 8 | 1 |
1 | 1 | 9 | 1 |
1 | 1 | 10 | 1 |
1 | 1 | 11 | 1 |
1 | 1 | 12 | 4 |
1 | 1 | 13 | 5 |
1 | 1 | 14 | 6 |
1 | 1 | 15 | 6 |
1 | 1 | 16 | 6 |
1 | 1 | 17 | 7 |
... | ... | ... | ... |
2 | 1 | 0 | 1 |
2 | 1 | 1 | 1 |
2 | 1 | 2 | 1 |
2 | 1 | 3 | 1 |
2 | 1 | 4 | 1 |
2 | 1 | 5 | 1 |
2 | 1 | 6 | 1 |
2 | 1 | 7 | 1 |
2 | 1 | 8 | 1 |
2 | 1 | 9 | 1 |
2 | 1 | 10 | 1 |
2 | 1 | 11 | 1 |
2 | 1 | 12 | 8 |
SQL Call
DROP TABLE pi_loan; DROP TABLE A_loan; DROP TABLE B_loan; SELECT * FROM HMMUnsupervised ( ON loan_prediction AS Vertices PARTITION BY model_id, seq_id ORDER BY seq_vertex_id OUT TABLE InitStateTable (pi_loan) OUT TABLE StateTransitionTable (A_loan) OUT TABLE EmissionTable (B_loan) USING ModelColumn ('model_id') SeqColumn ('seq_id') ObservationColumn ('observed_id') HiddenStateNum (3) InitMethods ('random') ) AS dt;
SELECT * FROM HMMUnsupervised( ON loan_prediction AS Vertices PARTITION BY model_id, seq_id ORDER BY seq_vertex_id OUT TABLE InitStateTable(pi_loan) OUT TABLE StateTransitionTable(A_loan) OUT TABLE EmissionTable(B_loan) USING ModelColumn('model_id') SeqColumn('seq_id') ObservationColumn('observed_id') HiddenStateNum(3) InitMethods('random') ) AS dt ;
Output
message --------------------------------------------------------------------------------------------------- HMM models will be saved to the output tables once the training process is successfully completed. SELECT * FROM pi_loan ORDER BY 1, 2, 3;
model_id state probability -------- ----- ---------------------- 1 0 1.9046716874764173E-11 1 1 1.3028751241725672E-12 1 2 0.9999999999796504 2 0 0.017791936081545466 2 1 0.012540722577073648 2 2 0.9696673413413809
SELECT * FROM A_loan ORDER BY 1, 2, 3;
model_id from_state to_state probability -------- ---------- -------- -------------------- 1 0 0 0.034350221654187815 1 0 1 0.9656493269833311 1 0 2 4.513624811896442E-7 1 1 0 0.943657329840527 1 1 1 0.05633694510467473 1 1 2 5.725054798558155E-6 1 2 0 0.016131381366779184 1 2 1 0.11988350200836678 1 2 2 0.8639851166248538 2 0 0 0.006066428449697642 2 0 1 0.7535894555154948 2 0 2 0.24034411603480746 2 1 0 0.6291758649110152 2 1 1 0.13451837399524283 2 1 2 0.23630576109374207 2 2 0 0.09591775004852725 2 2 1 0.1484849472359561 2 2 2 0.7555973027155165
SELECT * FROM B_loan ORDER BY 1, 2, 3;
model_id state observed probability -------- ----- -------- ---------------------- 1 0 1 0.008891457654605486 1 0 2 5.468041729835976E-9 1 0 3 4.1391822187081835E-9 1 0 4 0.27160363157687994 1 0 5 2.2063825107513166E-8 1 0 6 0.6142876565496275 1 0 7 0.10521722254783777 1 0 8 0.019644702389412974 1 1 1 6.76987232427815E-5 1 1 2 0.06232468547238135 1 1 3 0.13122140151624764 1 1 4 0.021965148206372923 1 1 5 0.19092781165768866 1 1 6 0.4713908643412146 1 1 7 0.12210239008285201 1 1 8 0.19138971490485823 1 2 1 0.9879618077382735 1 2 2 8.449631929911974E-4 1 2 3 0.011077522833205329 1 2 4 1.1570559134744652E-4 1 2 5 1.356901157720188E-10 1 2 6 4.1261346059402553E-10 1 2 7 9.587888533000593E-11 1 2 8 0.08108497126039647 2 0 1 0.5958941361046881 2 0 2 1.3089471188240957E-4 2 0 3 1.2661579814008092E-4 2 0 4 0.2505368195493097 2 0 5 6.596053412122555E-10 2 0 6 0.08043035096135323 2 0 7 0.17227888424304832 2 0 8 0.0728811822150213 2 1 1 0.4440144412391816 2 1 2 0.08543890999226664 2 1 3 0.2104246992667131 2 1 4 6.3966539144571E-7 2 1 5 0.09245389189993217 2 1 6 0.03274628512838264 2 1 7 0.030281066209571193 2 1 8 0.1349211328081325 2 2 1 0.9700830457796161 2 2 2 0.0029351316912068154 2 2 3 0.0022187354229495494 2 2 4 6.829831755368084E-5 2 2 5 5.944648340203616E-6 2 2 6 1.7075168214740832E-4 2 2 7 0.08820983879215442 2 2 8 0.024518092458186168
Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.