1.0 - 8.00 - HMMUnsupervised Example: Loan Default Prediction - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.0
8.00
Release Date
May 2019
Content Type
Programming Reference
Publication ID
B700-4003-098K
Language
English (United States)

Financial institutions try to predict defaulting customers by analyzing previous transaction data. To reduce the number of defaulting customers, the financial institution may suggest default prevention initiatives to customers.

Loan status updates depend on customer payments. You can use loan status updates to build a Hidden Markov Model (HMM) to predict loan defaults. The following table shows the possible loan statuses and their observation symbols. The observation symbols are used to build the HMM.

Loan Status Symbol
current 1
late 2
one month late 3
two months late 4
three months late 5
four months late 6
defaulted 7
paid 8

Use all transaction sequences ending with "paid" to train one model and all sequences ending with "defaulted" to train a second model. Evaluate the current sequence of statuses for a customer against both models. Base the default prediction on the model that gives a higher probability.

The hidden states of the HMM inherently indicate the financial health of the customer. A hidden state with a high probability of emission of defaulted status indicates poor financial health, while a hidden state with high probability of emission of paid status indicates good financial health.

Input

The input table, loan_prediction, contains data for training the two models. The model_id column has loan status—1 for a paid loan, 2 for a defaulted loan. Rows with model_id=1 are used to train the first model; rows with model_id=2 are used to train the second model. Defaulted loans have observed_id=7; paid loans end have observed_id=8. The seq_vertex_id column provides the ordering of the symbols in the sequences.

loan_prediction
model_id seq_id seq_vertex_id observed_id
1 1 0 1
1 1 1 1
1 1 2 1
1 1 3 1
1 1 4 1
1 1 5 1
1 1 6 1
1 1 7 1
1 1 8 1
1 1 9 1
1 1 10 1
1 1 11 1
1 1 12 4
1 1 13 5
1 1 14 6
1 1 15 6
1 1 16 6
1 1 17 7
... ... ... ...
2 1 0 1
2 1 1 1
2 1 2 1
2 1 3 1
2 1 4 1
2 1 5 1
2 1 6 1
2 1 7 1
2 1 8 1
2 1 9 1
2 1 10 1
2 1 11 1
2 1 12 8

SQL Call

DROP TABLE pi_loan;
DROP TABLE A_loan;
DROP TABLE B_loan;

SELECT * FROM HMMUnsupervised (
  ON loan_prediction AS vertices
    PARTITION BY model_id, seq_id
    ORDER BY seq_vertex_id
  OUT TABLE InitStateTable (pi_loan)
  OUT TABLE StateTransitionTable (A_loan)
  OUT TABLE EmissionTable (B_loan)
  USING 
  ModelColumn ('model_id')
  SeqColumn ('seq_id')
  ObservationColumn ('observed_id')
  HiddenStateNum (3)
  InitMethods ('random')
) AS dt;

Output

message
HMM models will be saved to the output tables once the training process is successfully completed.

This query returns the following table:

SELECT * FROM pi_loan ORDER BY 1, 2, 3;
pi_loan
model_id state probability
1 0 3.27859360752863e-09
1 1 6.31101164254861e-11
1 2 0.999999996658296
2 0 0.0281195637034241
2 1 0.0342773193334441
2 2 0.937603116963132

This query returns the following table:

SELECT * FROM A_loan ORDER BY 1, 2, 3;
A_loan
model_id from_state to_state probability
1 0 0 0.0640625915875049
1 0 1 0.935924962187232
1 0 2 1.24462252627054e-05
1 1 0 0.965881665949451
1 1 1 0.034116494612011
1 1 2 1.83943853841951e-06
1 2 0 0.0200526405800346
1 2 1 0.117982394786916
1 2 2 0.86196496463305
2 0 0 0.0135275442521027
2 0 1 0.831238578948845
2 0 2 0.155233876799053
2 1 0 0.650183335972221
2 1 1 0.176213616071229
2 1 2 0.17360304795655
2 2 0 0.143610221301369
2 2 1 0.259795319424887
2 2 2 0.596594459273744

This query returns the following table:

SELECT * FROM B_loan ORDER BY 1, 2, 3;
B_loan
model_id state observed probability
1 0 1 0.0278489806401534
1 0 2 2.13490305615227e-08
1 0 3 2.16452516177902e-08
1 0 4 0.303429629806469
1 0 5 4.02001410009148e-08
1 0 6 0.668721306358954
1 0 7 9.38935391851262e-218
1 0 8 1.07065392354079e-218
1 1 1 8.06980973060956e-05
1 1 2 0.0692752214932328
1 1 3 0.156873643540954
1 1 4 0.0227441708323421
1 1 5 0.213428380115597
1 1 6 0.537597885920567
1 1 7 1.71260624384861e-218
1 1 8 1.08244279935844e-217
1 2 1 0.993584745503789
1 2 2 0.0010870231448872
1 2 3 0.00531194778573366
1 2 4 1.62821429202875e-05
1 2 5 6.37778366603837e-11
1 2 6 1.35889208728403e-09
1 2 7 5.52574644335947e-220
1 2 8 5.07942195209965e-220
2 0 1 0.742492680184544
2 0 2 2.65054280803536e-06
2 0 3 3.46492926202034e-05
2 0 4 0.196209179240367
2 0 5 1.73539886740778e-10
2 0 6 0.0612608405661211
2 0 7 9.55020607438646e-216
2 0 8 1.0889956532584e-216
2 1 1 0.686900699423504
2 1 2 0.066121858529193
2 1 3 0.155155089264256
2 1 4 9.67394943767393e-06
2 1 5 0.0668633786585835
2 1 6 0.0249493001750253
2 1 7 1.5969261489751e-218
2 1 8 1.00932786930684e-217
2 2 1 0.998254250327946
2 2 2 0.000671519946212711
2 2 3 0.000761504268610617
2 2 4 0.00012244326239558
2 2 5 2.2387837278705e-06
2 2 6 0.000188043411107735
2 2 7 3.45823602960767e-219
2 2 8 3.17890807773867e-219