Example 8: Run AutoFraud for fraud detection problem with early stopping timer and metrics threshold

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2025-12-05
Product Category: Teradata Vantage
This example predicts whether a transaction is fraudulent based on different factors. Run AutoFraud to get the best-performing model with the following specifications:
  • Set the early stopping criteria: a time limit of 100 seconds and a MICRO-RECALL performance metric threshold of 0.1.
  • Use verbose level 2 to get detailed logging.
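The two stopping criteria can be pictured as a search loop that ends as soon as either condition holds. The following is a minimal conceptual sketch in plain Python, not AutoFraud's actual implementation: the function `search_with_early_stopping`, the candidate names, and the scores are hypothetical, and it assumes that the search stops once a candidate reaches or exceeds the metric threshold. How AutoFraud applies `max_runtime_secs` and `stopping_tolerance` internally is not shown in this example.

```python
import time

def search_with_early_stopping(candidates, evaluate, max_runtime_secs, stopping_tolerance):
    """Evaluate candidates in order; stop on the time budget or the metric threshold."""
    start = time.monotonic()
    results = []
    for name in candidates:
        # Criterion 1: stop once the time budget is used up.
        if time.monotonic() - start >= max_runtime_secs:
            break
        score = evaluate(name)
        results.append((name, score))
        # Criterion 2 (assumed behavior): stop once a candidate reaches the threshold.
        if score >= stopping_tolerance:
            break
    return results

# Hypothetical scores standing in for one MICRO-RECALL value per candidate.
scores = {"model_a": 0.05, "model_b": 0.42, "model_c": 0.90}
results = search_with_early_stopping(
    candidates=list(scores),
    evaluate=scores.get,
    max_runtime_secs=100,
    stopping_tolerance=0.1,
)
print(results)  # [('model_a', 0.05), ('model_b', 0.42)]
```

In this sketch the third candidate is never evaluated, because the second one already meets the 0.1 threshold.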
  1. Load the online fraud dataset.
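    >>> from teradataml import AutoFraud, DataFrame, load_example_data
    >>> # Assumes an active Vantage connection, for example via create_context().
    >>> load_example_data('teradataml', 'payment_fraud_dataset')
    >>> fraud_df = DataFrame('payment_fraud_dataset')
    >>> # Split the data into 80% training and 20% test samples.
    >>> fraud_sample = fraud_df.sample(frac=[0.8, 0.2])
    >>> fraud_train = fraud_sample[fraud_sample['sampleid'] == 1].drop('sampleid', axis=1)
    >>> fraud_test = fraud_sample[fraud_sample['sampleid'] == 2].drop('sampleid', axis=1)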
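The split above relies on the `sampleid` column that `sample()` adds: rows tagged 1 form the training set and rows tagged 2 form the test set. The plain-Python sketch below only illustrates that idea on ten hypothetical rows; `assign_sample_ids` is an illustrative helper, not a teradataml function, and the way Vantage actually draws the samples may differ.

```python
import random

def assign_sample_ids(n_rows, fractions, seed=0):
    """Return one sample id (1, 2, ...) per row, sized by the given fractions."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # random row order, without replacement
    sample_ids = [None] * n_rows
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        end = start + round(n_rows * frac)
        for row in order[start:end]:
            sample_ids[row] = sample_id
        start = end
    return sample_ids

ids = assign_sample_ids(n_rows=10, fractions=[0.8, 0.2])
# Filter on the sample id, as the example does with fraud_sample['sampleid'].
train_rows = [i for i, s in enumerate(ids) if s == 1]
test_rows = [i for i, s in enumerate(ids) if s == 2]
print(len(train_rows), len(test_rows))  # 8 2
```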
  2. Create an AutoFraud instance.
    >>> fd = AutoFraud(verbose=2,
    ...                max_runtime_secs=100,
    ...                stopping_metric='MICRO-RECALL',
    ...                stopping_tolerance=0.1,
    ...                seed=42)
  3. Fit the data.
    >>> fd.fit(fraud_train, fraud_train.isFraud)
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:20:34,546 | INFO     | Feature Exploration started
    2025-11-04 04:20:34,546 | INFO     | Data Overview:
    2025-11-04 04:20:34,651 | INFO     | Total Rows in the data: 8000
    2025-11-04 04:20:34,693 | INFO     | Total Columns in the data: 10
    2025-11-04 04:20:35,626 | INFO     | Column Summary:
           ColumnName                         Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
    0   oldbalanceOrg                            FLOAT          7991          9         NaN     2314.0         5677.0            0.0          0.1125            99.8875
    1    payment_type  VARCHAR(40) CHARACTER SET LATIN          8000          0         0.0        NaN            NaN            NaN          0.0000           100.0000
    2          amount                            FLOAT          7984         16         NaN        0.0         7984.0            0.0          0.2000            99.8000
    3        nameOrig  VARCHAR(40) CHARACTER SET LATIN          8000          0         0.0        NaN            NaN            NaN          0.0000           100.0000
    4  newbalanceDest                            FLOAT          7992          8         NaN     3839.0         4153.0            0.0          0.1000            99.9000
    5         isFraud                           BIGINT          8000          0         NaN     7849.0          151.0            0.0          0.0000           100.0000
    6  newbalanceOrig                            FLOAT          7994          6         NaN     4068.0         3926.0            0.0          0.0750            99.9250
    7  oldbalanceDest                            FLOAT          7993          7         NaN     3969.0         4024.0            0.0          0.0875            99.9125
    8        nameDest  VARCHAR(40) CHARACTER SET LATIN          8000          0         0.0        NaN            NaN            NaN          0.0000           100.0000
    9            step                           BIGINT          8000          0         NaN        0.0         8000.0            0.0          0.0000           100.0000
    2025-11-04 04:20:36,886 | INFO     | Statistics of Data:
            ATTRIBUTE StatName    StatValue
    0         isFraud  MAXIMUM         1.00
    1  newbalanceOrig  MINIMUM         0.00
    2  newbalanceOrig  MAXIMUM  13000000.00
    3          amount    COUNT      7984.00
    4          amount  MAXIMUM  10000000.00
    5  newbalanceDest    COUNT      7992.00
    6  newbalanceDest  MINIMUM         0.00
    7  newbalanceDest  MAXIMUM  34700000.00
    8          amount  MINIMUM         0.65
    9  newbalanceOrig    COUNT      7994.00
    2025-11-04 04:20:37,862 | INFO     | Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    payment_type              5
    nameOrig                  7996
    nameDest                  7035
    2025-11-04 04:20:42,109 | INFO     | Futile columns in dataset:
      ColumnName
    0   nameOrig
    1   nameDest
    2025-11-04 04:20:48,618 | INFO     | Columns with outlier percentage :-
           ColumnName  OutlierPercentage
    0   oldbalanceOrg             1.1000
    1  newbalanceDest             1.0875
    2  newbalanceOrig             1.0625
    3          amount             2.1750
    4  oldbalanceDest             1.0250
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:20:49,708 | INFO     | Feature Engineering started ...
    2025-11-04 04:20:49,708 | INFO     | Handling duplicate records present in dataset ...
    2025-11-04 04:20:50,031 | INFO     | Analysis completed. No action taken.
    2025-11-04 04:20:50,031 | INFO     | Total time to handle duplicate records: 0.32 sec
    2025-11-04 04:20:50,031 | INFO     | Handling less significant features from data ...
    2025-11-04 04:20:55,929 | INFO     | Removing Futile columns:
    ['nameOrig', 'nameDest']
    2025-11-04 04:20:55,929 | INFO     | Sample of Data after removing Futile columns:
       step payment_type     amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  newbalanceDest  isFraud  automl_id
    0    38      CASH_IN  315951.64      104392.00       420343.64        77879.17            0.00        0         12
    1    17     CASH_OUT  182687.76      261008.42        78320.65       213250.74       395938.50        0          9
    2    17     CASH_OUT  158651.10           0.00            0.00       271682.54       430333.64        0         13
    3    40     CASH_OUT  105987.63           0.00            0.00      1234821.83      1340809.46        0          6
    4    40     CASH_OUT  120549.52           0.00            0.00      2381888.19      2502437.71        0         14
    5    19      PAYMENT    8796.56           0.00            0.00            0.00            0.00        0          7
    6    19     CASH_OUT   87459.82           0.00            0.00      2004723.65      2092183.47        0         11
    7    19      PAYMENT    1154.55      321707.75       320553.20            0.00            0.00        0         15
    8    40     CASH_OUT   75672.79           0.00            0.00       478121.23       553794.01        0         10
    9    17     CASH_OUT   31724.00       35311.00         3587.00       954105.58      1535829.31        0          5
    8000 rows X 9 columns
    2025-11-04 04:20:56,443 | INFO     | Total time to handle less significant features: 6.41 sec
    2025-11-04 04:20:56,444 | INFO     | Handling Date Features ...
    2025-11-04 04:20:56,444 | INFO     | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    2025-11-04 04:20:56,444 | INFO     | Total time to handle date features: 0.00 sec
    2025-11-04 04:20:56,444 | INFO     | Checking Missing values in dataset using AutoFraud function...
    2025-11-04 04:20:57,788 | INFO     | Columns with their missing values:
    newbalanceDest: 8
    oldbalanceDest: 7
    newbalanceOrig: 6
    amount: 16
    oldbalanceOrg: 9
    2025-11-04 04:20:59,787 | INFO     | Flagging these columns for imputation:
    ['newbalanceDest', 'oldbalanceDest', 'newbalanceOrig', 'amount', 'oldbalanceOrg']
    2025-11-04 04:20:59,787 | INFO     | Total time to find missing values in data using AutoFraud : 3.34 sec
    2025-11-04 04:20:59,787 | INFO     | Imputing Missing Values using SimpleImputeFit partition column...
    2025-11-04 04:21:00,168 | INFO     | Columns with their imputation method:
    newbalanceDest: median
    oldbalanceDest: median
    newbalanceOrig: median
    amount: median
    oldbalanceOrg: median
    2025-11-04 04:21:04,228 | INFO     | Sample of dataset after Imputation:
       step payment_type     amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  newbalanceDest  isFraud  automl_id
    0    17     CASH_OUT  158651.10           0.00            0.00       271682.54       430333.64        0         13
    1    40     CASH_OUT   75672.79           0.00            0.00       478121.23       553794.01        0         10
    2    40     CASH_OUT  120549.52           0.00            0.00      2381888.19      2502437.71        0         14
    3    19      PAYMENT    8796.56           0.00            0.00            0.00            0.00        0          7
    4    19      PAYMENT    1154.55      321707.75       320553.20            0.00            0.00        0         15
    5    61     TRANSFER  475368.94      475368.94            0.00            0.00            0.00        1          4
    6    61     CASH_OUT  475368.94      475368.94            0.00      1348026.73      1823395.67        1          8
    7    38      CASH_IN  315951.64      104392.00       420343.64        77879.17            0.00        0         12
    8    19     CASH_OUT   87459.82           0.00            0.00      2004723.65      2092183.47        0         11
    9    40     CASH_OUT  105987.63           0.00            0.00      1234821.83      1340809.46        0          6
    8000 rows X 9 columns
    2025-11-04 04:21:05,312 | INFO     | Time taken to perform imputation: 5.52 sec
    2025-11-04 04:21:05,312 | INFO     | Performing target encoding for categorical columns ...
    2025-11-04 04:21:12,951 | INFO     | Target Encoding completed for categorical columns using CBM_BETA.
    2025-11-04 04:21:12,951 | INFO     | Target Encoding these Columns:
    ['payment_type']
    2025-11-04 04:21:12,951 | INFO     | Sample of dataset after performing target encoding:
                  newbalanceDest  oldbalanceDest  isFraud  newbalanceOrig  oldbalanceOrg  automl_id     amount  step
    payment_type
    0.000005                0.00            0.00        0        90210.24       94916.53      10656    4706.29     2
    0.000005                0.00            0.00        0       115491.65      122275.00       5899    6783.35    23
    0.000005                0.00            0.00        0        11124.12       15077.00       3327    3952.88     6
    0.000005                0.00            0.00        0            0.00           0.00        419    5190.06    19
    0.000005                0.00            0.00        0            0.00           0.00       4911    7338.27     6
    0.000005                0.00            0.00        0       279754.74      287908.00       5591    8153.26    23
    0.034876          1937458.07      1886690.69        0       203571.62      254339.00         32   50767.38    38
    0.034876           245202.78            0.00        0            0.00        8626.00         60  245202.78    38
    0.034876           294692.87            0.00        0            0.00       23171.00         64  294692.87    38
    0.034876           635663.11       630286.84        0        23439.73       28816.00         76    5376.27    38
    8000 rows X 9 columns
    2025-11-04 04:21:13,069 | INFO     | Time taken to encode the columns: 7.76 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:21:13,069 | INFO     | Data preparation started ...
    2025-11-04 04:21:13,069 | INFO     | AutoFraud Outlier preprocessing using Percentile...
    2025-11-04 04:21:17,176 | INFO     | Columns with outlier percentage :-
           ColumnName  OutlierPercentage
    0  newbalanceDest             1.0000
    1  newbalanceOrig             1.0000
    2       automl_id             1.9875
    3  oldbalanceDest             1.0000
    4   oldbalanceOrg             1.0000
    5          amount             1.9875
    2025-11-04 04:21:17,830 | INFO     | Replacing outliers with median:
    ['newbalanceDest', 'oldbalanceOrg', 'amount', 'oldbalanceDest', 'newbalanceOrig', 'automl_id']
    2025-11-04 04:21:21,201 | INFO     | Sample of dataset after replacing outliers with MEDIAN:
                  newbalanceDest  oldbalanceDest  isFraud  newbalanceOrig  oldbalanceOrg  automl_id    amount  step
    payment_type
    0.000005                 0.0             0.0        0        29336.39       30661.00       3216   1324.61    24
    0.000005                 0.0             0.0        0        41370.75       45837.00       3255   4466.25     6
    0.000005                 0.0             0.0        0        49632.45       60989.84       4003  11357.40     2
    0.000005                 0.0             0.0        0        23602.75       25474.00       1668   1871.25     5
    0.000005                 0.0             0.0        0         3246.38        8201.00        542   4954.62    40
    0.000005                 0.0             0.0        0        89752.25       91303.00       3160   1550.75     5
    0.000005                 0.0             0.0        0            0.00        5648.85       2305  34220.19     9
    0.000005                 0.0             0.0        0            0.00        5383.00       2098  13620.42    22
    0.000005                 0.0             0.0        0         5687.72        9569.61       4003   3881.89    19
    0.000005                 0.0             0.0        0       102685.01      104778.00       2168   2092.99     5
    8000 rows X 9 columns
    2025-11-04 04:21:21,315 | INFO     | Time Taken by Outlier processing: 8.25 sec
    2025-11-04 04:21:21,316 | INFO     | Checking imbalance data ...
    2025-11-04 04:21:21,400 | INFO     | Imbalance Found.
    2025-11-04 04:21:21,400 | INFO     | Handling data imbalance using SMOTE ...
    2025-11-04 04:21:25,310 | INFO     | Completed data imbalance handling.
    2025-11-04 04:21:26,852 | INFO     | Feature selection using rfe ...
    2025-11-04 04:21:40,940 | INFO     | feature selected by RFE:
    ['step', 'payment_type', 'newbalanceDest', 'oldbalanceDest', 'newbalanceOrig', 'oldbalanceOrg', 'amount']
    2025-11-04 04:21:40,942 | INFO     | Total time taken by feature selection: 14.09 sec
    2025-11-04 04:21:41,475 | INFO     | Scaling Features of rfe data ...
    2025-11-04 04:21:42,845 | INFO     | columns that will be scaled:
    ['r_step', 'r_payment_type', 'r_newbalanceDest', 'r_oldbalanceDest', 'r_newbalanceOrig', 'r_oldbalanceOrg', 'r_amount']
    2025-11-04 04:21:44,964 | INFO     | Dataset sample after scaling:
       automl_id  isFraud    r_step  r_payment_type  r_newbalanceDest  r_oldbalanceDest  r_newbalanceOrig  r_oldbalanceOrg  r_amount
    0          6        0  0.042553        0.000000          0.000000          0.000000          0.020752         0.021417  0.001921
    1          8        1  0.085106        0.556976          0.047131          0.000000          0.000000         0.049108  0.063790
    2          9        0  0.202128        0.615548          0.052242          0.063600          0.000000         0.049715  0.477112
    3         10        0  0.191489        0.000000          0.000000          0.000000          0.001149         0.001956  0.005329
    4         12        1  0.648936        0.451544          0.001332          0.000000          0.000000         0.003913  0.061007
    5         13        1  0.776596        0.686901          0.152266          0.146592          0.000000         0.129608  0.340321
    6         11        0  0.393617        0.371672          0.120486          0.053914          0.000000         0.006135  0.900905
    7          7        0  0.393617        0.371672          0.049388          0.000000          0.000000         0.004736  0.559428
    8          5        1  0.170213        0.615548          0.332159          0.000000          0.000000         0.265401  0.058486
    9          4        1  0.925532        0.407881          0.138873          0.010018          0.000000         0.267048  0.058486
    8735 rows X 9 columns
    2025-11-04 04:21:46,501 | INFO     | Total time taken by feature scaling: 5.03 sec
    2025-11-04 04:21:46,502 | INFO     | Scaling Features of pca data ...
    2025-11-04 04:21:47,373 | INFO     | columns that will be scaled:
    ['payment_type', 'newbalanceDest', 'oldbalanceDest', 'newbalanceOrig', 'oldbalanceOrg', 'amount', 'step']
    2025-11-04 04:21:49,717 | INFO     | Dataset sample after scaling:
       automl_id  isFraud  payment_type  newbalanceDest  oldbalanceDest  newbalanceOrig  oldbalanceOrg    amount      step
    0         13        1      0.687473        0.152266        0.146592        0.000000       0.129608  0.340321  0.776596
    1          8        1      0.557764        0.047131        0.000000        0.000000       0.049108  0.063790  0.085106
    2         12        1      0.452046        0.001332        0.000000        0.000000       0.003913  0.061007  0.648936
    3       8607        1      0.448108        0.000000        0.004310        0.000000       0.053291  0.353205  0.723404
    4       8615        1      0.032317        0.118966        0.064117        0.000262       0.047860  0.570179  0.265957
    5      23410        1      0.553099        0.000000        0.014718        0.000000       0.154165  0.562081  0.872340
    6      23406        1      0.553099        0.002574        0.000000        0.000000       0.023212  0.141818  0.234043
    7      23414        1      0.567007        0.268718        0.290448        0.000000       0.225627  0.058486  0.521277
    8       8611        1      0.890055        0.091287        0.091824        0.000000       0.019294  0.181024  0.489362
    9          4        1      0.408364        0.138873        0.010018        0.000000       0.267048  0.058486  0.925532
    8735 rows X 9 columns
    2025-11-04 04:21:50,546 | INFO     | Total time taken by feature scaling: 4.04 sec
    2025-11-04 04:21:50,547 | INFO     | Dimension Reduction using pca ...
    2025-11-04 04:21:51,343 | INFO     | PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4']
    2025-11-04 04:21:51,344 | INFO     | Total time taken by PCA: 0.80 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:21:52,201 | INFO     | Model Training started ...
    2025-11-04 04:21:52,244 | INFO     | Hyperparameters used for model training:
    2025-11-04 04:21:52,245 | INFO     | Model: glm
    2025-11-04 04:21:52,245 | INFO     | Hyperparameters: {'response_column': 'isFraud', 'name': 'glm', 'family': 'BINOMIAL', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
    2025-11-04 04:21:52,245 | INFO     | Total number of models for glm: 1296
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:21:52,245 | INFO     | Model: svm
    2025-11-04 04:21:52,245 | INFO     | Hyperparameters: {'response_column': 'isFraud', 'name': 'svm', 'model_type': 'Classification', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'tolerance': (0.001, 0.01), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'nesterov': True, 'intercept': True, 'iter_num_no_change': (5, 10, 50), 'local_sgd_iterations ': (10, 20), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
    2025-11-04 04:21:52,246 | INFO     | Total number of models for svm: 5184
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:21:52,246 | INFO     | Model: knn
    2025-11-04 04:21:52,246 | INFO     | Hyperparameters: {'response_column': 'isFraud', 'name': 'knn', 'model_type': 'Classification', 'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0}
    2025-11-04 04:21:52,246 | INFO     | Total number of models for knn: 6
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:21:52,247 | INFO     | Model: decision_forest
    2025-11-04 04:21:52,247 | INFO     | Hyperparameters: {'response_column': 'isFraud', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'num_trees': (-1,), 'seed': 42}
    2025-11-04 04:21:52,247 | INFO     | Total number of models for decision_forest: 36
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:21:52,247 | INFO     | Model: xgboost
    2025-11-04 04:21:52,247 | INFO     | Hyperparameters: {'response_column': 'isFraud', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
    2025-11-04 04:21:52,248 | INFO     | Total number of models for xgboost: 5832
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:21:52,248 | INFO     | Performing hyperparameter tuning ...
    2025-11-04 04:21:53,716 | INFO     | Model training for glm
    2025-11-04 04:22:07,589 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:22:07,589 | INFO     | Model training for svm
    2025-11-04 04:22:20,837 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:22:20,838 | INFO     | Model training for knn
    2025-11-04 04:23:26,393 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:23:26,393 | INFO     | Model training for decision_forest
    2025-11-04 04:23:50,783 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:23:50,783 | INFO     | Model training for xgboost
    2025-11-04 04:24:05,343 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:24:05,346 | INFO     | Leaderboard
        RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0      1         XGBOOST_2               rfe  0.983400         0.983400  ...      0.940637  0.953124            0.983138         0.983400     0.983163
    1      2  DECISIONFOREST_0               rfe  0.978248         0.978248  ...      0.925238  0.938417            0.977816         0.978248     0.977908
    2      3  DECISIONFOREST_2               rfe  0.977676         0.977676  ...      0.924920  0.936960            0.977245         0.977676     0.977357
    3      4             KNN_4               rfe  0.973097         0.973097  ...      0.884776  0.919385            0.972650         0.973097     0.971854
    4      5             KNN_0               rfe  0.970807         0.970807  ...      0.888515  0.914436            0.969924         0.970807     0.969813
    5      6             KNN_7               pca  0.968517         0.968517  ...      0.872203  0.905663            0.967593         0.968517     0.967063
    6      7             KNN_3               pca  0.966228         0.966228  ...      0.875942  0.901014            0.965004         0.966228     0.965078
    7      8         XGBOOST_3               pca  0.946193         0.946193  ...      0.844744  0.850003            0.945419         0.946193     0.945781
    8      9  DECISIONFOREST_1               pca  0.945621         0.945621  ...      0.831894  0.844827            0.943799         0.945621     0.944547
    9     10  DECISIONFOREST_3               pca  0.945049         0.945049  ...      0.831575  0.843605            0.943311         0.945049     0.944039
    10    11         XGBOOST_0               rfe  0.934745         0.934745  ...      0.928605  0.851790            0.952998         0.934745     0.940204
    11    12             GLM_0               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    12    13             GLM_1               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    13    14             GLM_2               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    14    15             GLM_3               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    15    16             SVM_0               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    16    17             SVM_3               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    17    18             SVM_1               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    18    19             SVM_2               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    19    20         XGBOOST_1               pca  0.895249         0.895249  ...      0.879060  0.781686            0.932727         0.895249     0.907236
    [20 rows x 13 columns]
    20 rows X 13 columns
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    >>> Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 16/16
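The "Total number of models" figures in the training log match the number of combinations in each hyperparameter grid, that is, the product of the number of options listed for each tuple-valued hyperparameter. The sketch below reproduces three of those totals from the search spaces printed in the log; `grid_size` is an illustrative helper, not a teradataml function, and treating every non-tuple value as a single fixed option is an assumption about how the grid is read.

```python
from math import prod

def grid_size(hyperparameters):
    """Count combinations: multiply the lengths of all tuple-valued entries."""
    return prod(len(v) for v in hyperparameters.values() if isinstance(v, tuple))

# Search spaces copied from the training log.
knn = {'response_column': 'isFraud', 'name': 'knn', 'model_type': 'Classification',
       'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0}
decision_forest = {'response_column': 'isFraud', 'name': 'decision_forest',
                   'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2),
                   'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3),
                   'num_trees': (-1,), 'seed': 42}
glm = {'response_column': 'isFraud', 'name': 'glm', 'family': 'BINOMIAL',
       'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85),
       'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1),
       'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50),
       'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}

print(grid_size(knn))              # 6
print(grid_size(decision_forest))  # 36
print(grid_size(glm))              # 1296
```

The leaderboard lists 20 models in total, so only a small part of each grid appears in the result of this run.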
  4. Display leaderboard.
    >>> fd.leaderboard()
        RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0      1         XGBOOST_2               rfe  0.983400         0.983400  ...      0.940637  0.953124            0.983138         0.983400     0.983163
    1      2  DECISIONFOREST_0               rfe  0.978248         0.978248  ...      0.925238  0.938417            0.977816         0.978248     0.977908
    2      3  DECISIONFOREST_2               rfe  0.977676         0.977676  ...      0.924920  0.936960            0.977245         0.977676     0.977357
    3      4             KNN_4               rfe  0.973097         0.973097  ...      0.884776  0.919385            0.972650         0.973097     0.971854
    4      5             KNN_0               rfe  0.970807         0.970807  ...      0.888515  0.914436            0.969924         0.970807     0.969813
    5      6             KNN_7               pca  0.968517         0.968517  ...      0.872203  0.905663            0.967593         0.968517     0.967063
    6      7             KNN_3               pca  0.966228         0.966228  ...      0.875942  0.901014            0.965004         0.966228     0.965078
    7      8         XGBOOST_3               pca  0.946193         0.946193  ...      0.844744  0.850003            0.945419         0.946193     0.945781
    8      9  DECISIONFOREST_1               pca  0.945621         0.945621  ...      0.831894  0.844827            0.943799         0.945621     0.944547
    9     10  DECISIONFOREST_3               pca  0.945049         0.945049  ...      0.831575  0.843605            0.943311         0.945049     0.944039
    10    11         XGBOOST_0               rfe  0.934745         0.934745  ...      0.928605  0.851790            0.952998         0.934745     0.940204
    11    12             GLM_0               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    12    13             GLM_1               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    13    14             GLM_2               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    14    15             GLM_3               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    15    16             SVM_0               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    16    17             SVM_3               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    17    18             SVM_1               pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    18    19             SVM_2               rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
    19    20         XGBOOST_1               pca  0.895249         0.895249  ...      0.879060  0.781686            0.932727         0.895249     0.907236
    [20 rows x 13 columns]
  5. Display the best performing model.
    >>> fd.leader()
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_2               rfe    0.9834           0.9834  ...      0.940637  0.953124            0.983138           0.9834     0.983163
    [1 rows x 13 columns]
  6. Display the hyperparameters of the rank 1 model.
    >>> fd.model_hyperparameters(rank=1)
    {'response_column': 'isFraud', 
      'name': 'xgboost', 
      'model_type': 'Classification', 
      'column_sampling': 1, 
      'min_impurity': 0.0, 
      'lambda1': 1.0, 
      'shrinkage_factor': 0.5, 
      'max_depth': 5, 
      'min_node_size': 1, 
      'iter_num': 10, 
      'num_boosted_trees': 5, 
      'seed': 42, 
      'persist': False, 
      'output_prob': True, 
      'output_responses': ['1', '0']}
    
  7. Generate predictions on the test dataset using the best performing model.
    >>> prediction = fd.predict(fraud_test)
    2025-11-04 04:40:37,188 | INFO     | Data Transformation started ...
    2025-11-04 04:40:37,188 | INFO     | Performing transformation carried out in feature engineering phase ...
    2025-11-04 04:40:37,974 | INFO     | Updated dataset after dropping futile columns :
       step payment_type     amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  newbalanceDest  isFraud  automl_id
    0    17     CASH_OUT  251240.86       52381.00            0.00      4248406.53      4499647.39        0         13
    1    38      PAYMENT   11224.00       76055.24        64831.24            0.00            0.00        0          8
    2    38      PAYMENT    3970.07           0.00            0.00            0.00            0.00        0         12
    3    40     CASH_OUT  336996.80           0.00            0.00       419362.17       756358.97        0          6
    4    40      PAYMENT    6440.87           0.00            0.00            0.00            0.00        0         14
    5    19     CASH_OUT    6448.12           0.00            0.00      1556690.40      1468455.10        0          7
    6    19      CASH_IN   26877.75       15207.00        42084.75        52460.32        25582.58        0         11
    7    19     CASH_OUT  313071.96      216622.57            0.00       292599.86       605671.81        0         15
    8    40     CASH_OUT  482260.45           0.00            0.00      6848237.76      7330498.21        0         10
    9    38     CASH_OUT   14718.25       20848.00         6129.75        50688.71        65406.95        0          4
    2000 rows X 9 columns
    2025-11-04 04:40:38,291 | INFO     | Updated dataset after performing target column transformation :
       step payment_type      amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  newbalanceDest  isFraud  automl_id
    0    40      PAYMENT     6440.87           0.00            0.00            0.00            0.00        0         14
    1    38      PAYMENT    11224.00       76055.24        64831.24            0.00            0.00        0          8
    2    38      PAYMENT     3970.07           0.00            0.00            0.00            0.00        0         12
    3    17      PAYMENT    10038.15       31688.80        21650.65            0.00            0.00        0          5
    4    17     CASH_OUT   251240.86       52381.00            0.00      4248406.53      4499647.39        0         13
    5    19     CASH_OUT     6448.12           0.00            0.00      1556690.40      1468455.10        0          7
    6    19      CASH_IN    26877.75       15207.00        42084.75        52460.32        25582.58        0         11
    7    19     CASH_OUT   313071.96      216622.57            0.00       292599.86       605671.81        0         15
    8    17     TRANSFER  1098532.21       25802.00            0.00       185981.07      1284513.28        0          9
    9    38     CASH_OUT    14718.25       20848.00         6129.75        50688.71        65406.95        0          4
    2000 rows X 9 columns
    2025-11-04 04:40:39,145 | INFO     | Updated dataset after imputing missing value containing columns :
       step payment_type      amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  newbalanceDest  isFraud  automl_id
    0    19     CASH_OUT   313071.96      216622.57            0.00       292599.86       605671.81        0         15
    1    17     TRANSFER  1098532.21       25802.00            0.00       185981.07      1284513.28        0          9
    2    17     CASH_OUT   251240.86       52381.00            0.00      4248406.53      4499647.39        0         13
    3    40     CASH_OUT   336996.80           0.00            0.00       419362.17       756358.97        0          6
    4    40      PAYMENT     6440.87           0.00            0.00            0.00            0.00        0         14
    5    38     CASH_OUT    14718.25       20848.00         6129.75        50688.71        65406.95        0          4
    6    38      PAYMENT    11224.00       76055.24        64831.24            0.00            0.00        0          8
    7    38      PAYMENT     3970.07           0.00            0.00            0.00            0.00        0         12
    8    40     CASH_OUT   482260.45           0.00            0.00      6848237.76      7330498.21        0         10
    9    17      PAYMENT    10038.15       31688.80        21650.65            0.00            0.00        0          5
    2000 rows X 9 columns
    2025-11-04 04:40:41,108 | INFO     | Updated dataset after performing categorical encoding :
                  newbalanceDest  oldbalanceDest  isFraud  newbalanceOrig  oldbalanceOrg  automl_id     amount  step
    payment_type
    0.034876           657807.19       563733.07        0            0.00           0.00         61   94074.12    17
    0.034876           678419.64       398931.35        1       298767.61      340830.43         73   42062.82    17
    0.034876            69163.65            0.00        0            0.00       41567.00         77   69163.65    17
    0.034876           250803.22            0.00        0            0.00      165989.00         85  250803.22    17
    0.034876           272986.03            0.00        0            0.00       31417.00        101  272986.03    17
    0.034876           583424.08       250916.50        0            0.00           0.00        109  159921.79    17
    0.000005                0.00            0.00        0        18195.70       20898.00       1133    2702.30    10
    0.000005                0.00            0.00        0            0.00        4922.00        525    5914.18     9
    0.000005                0.00            0.00        0        18579.92       33714.26       2492   15134.34    25
    0.000005                0.00            0.00        0            0.00           0.00       1204    2556.46     3
    2000 rows X 9 columns
    2025-11-04 04:40:41,243 | INFO     | Performing transformation carried out in data preparation phase ...
    2025-11-04 04:40:41,996 | INFO     | Updated dataset after performing RFE feature selection:
               step  payment_type  newbalanceDest  oldbalanceDest  newbalanceOrig  oldbalanceOrg     amount  isFraud
    automl_id
    122          40        0.0000            0.00            0.00       132295.33      133904.00    1608.67        0
    387          15        0.0000            0.00            0.00       179737.71         316.00  179421.71        0
    856          24        0.0000            0.00            0.00        14885.75       26247.00   11361.25        0
    2528         25        0.0000            0.00            0.00        40269.29       50314.00   10044.71        0
    713          49        0.0000            0.00            0.00       237403.94      239115.80    1711.87        0
    1182         12        0.0000            0.00            0.00        43841.22       51378.00    7536.78        0
    448           5        0.0000        66221.72       105279.81      3457455.51     3418397.43   39058.08        0
    591          47        0.0000            0.00            0.00            0.00           0.00    2821.69        0
    1203          6        0.0000            0.00            0.00            0.00           0.00   45923.16        0
    938          35        0.0349      1993208.51      1941467.64            0.00       12632.00   51740.87        0
    2000 rows X 9 columns
    2025-11-04 04:40:42,845 | INFO     | Updated dataset after performing scaling on RFE selected features :
       automl_id  isFraud    r_step  r_payment_type  r_newbalanceDest  r_oldbalanceDest  r_newbalanceOrig  r_oldbalanceOrg  r_amount
    0       1672        0  0.000000        0.000000          0.000405          0.121635          1.376248         1.373288  0.172193
    1        265        0  0.351064        0.371672          0.540747          0.527264          0.000000         0.000000  0.675852
    2        530        0  0.404255        0.000000          0.291644          0.334035          0.016149         0.000288  0.147508
    3        469        0  0.106383        0.371672          0.325561          0.032070          0.102284         0.112889  0.085889
    4        326        0  0.063830        0.000000          0.000000          0.022551          0.424472         0.422271  0.063689
    5        938        0  0.361702        0.371672          0.334045          0.356581          0.000000         0.002582  0.096518
    6       1203        0  0.053191        0.000000          0.000000          0.000000          0.000000         0.000000  0.085433
    7        122        0  0.414894        0.000000          0.000000          0.000000          0.026736         0.027370  0.000998
    8         61        0  0.170213        0.371672          0.110243          0.103538          0.000000         0.000000  0.177178
    9       1876        0  0.000000        0.001065          0.000908          0.003499          0.020793         0.021950  0.006503
    2000 rows X 9 columns
    2025-11-04 04:40:44,023 | INFO     | Updated dataset after performing scaling for PCA feature selection :
       automl_id  isFraud  payment_type  newbalanceDest  oldbalanceDest  newbalanceOrig  oldbalanceOrg    amount      step
    0        326        0     -0.000058        0.000000        0.022551        0.424472       0.422271  0.063689  0.063830
    1        734        0      0.371832        0.049296        0.006188        0.000000       0.029898  0.494182  0.382979
    2       1672        0     -0.000058        0.000405        0.121635        1.376248       1.373288  0.172193  0.000000
    3        938        0      0.371832        0.334045        0.356581        0.000000       0.002582  0.096518  0.361702
    4        122        0     -0.000058        0.000000        0.000000        0.026736       0.027370  0.000998  0.414894
    5       1876        0      0.001008        0.000908        0.003499        0.020793       0.021950  0.006503  0.000000
    6        265        0      0.371832        0.540747        0.527264        0.000000       0.000000  0.675852  0.351064
    7        530        0     -0.000058        0.291644        0.334035        0.016149       0.000288  0.147508  0.404255
    8       1203        0     -0.000058        0.000000        0.000000        0.000000       0.000000  0.085433  0.053191
    9       1407        0     -0.000058        0.000000        0.000000        0.000000       0.000000  0.002412  0.234043
    2000 rows X 9 columns
    2025-11-04 04:40:44,424 | INFO     | Updated dataset after performing PCA feature selection :
       automl_id     col_0     col_1     col_2     col_3     col_4  isFraud
    0        469  0.087023  0.094250 -0.166024  0.045961 -0.175616        0
    1       1876 -0.354038 -0.040333 -0.207998  0.027194 -0.017408        0
    2       1407 -0.303722 -0.118541  0.006120  0.059805 -0.006441        0
    3        938  0.215680  0.111473  0.066321  0.301660 -0.129706        0
    4         61  0.105994 -0.026166 -0.130554  0.037863 -0.000090        0
    5        265  0.477408  0.540892  0.119593  0.276230  0.258709        0
    6        734  0.223673 -0.005435  0.090766 -0.149628  0.244795        0
    7       1203 -0.316284 -0.036183 -0.157193  0.019565  0.059413        0
    8        326 -0.362477  0.257506 -0.039267 -0.339162 -0.298827        0
    9        530 -0.097231  0.196814  0.209709  0.306424  0.009944        0
    10 rows X 7 columns
    2025-11-04 04:40:44,789 | INFO     | Data Transformation completed.
    2025-11-04 04:40:45,338 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:40:45,338 | INFO     | Model ID : XGBOOST_2
    2025-11-04 04:40:45,338 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:40:45,889 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 04:40:50,120 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 04:40:50,263 | INFO     | Prediction :
       automl_id  Prediction  isFraud    prob_0    prob_1
    0        326           0        0  0.815679  0.184321
    1        265           0        0  0.976456  0.023544
    2        530           0        0  0.989053  0.010947
    3        938           0        0  0.966440  0.033560
    4        122           0        0  0.980079  0.019921
    5       1407           0        0  0.974600  0.025400
    6        734           0        0  0.646486  0.353514
    7       1672           0        0  0.654323  0.345677
    8       1203           0        0  0.981232  0.018768
    9       1876           0        0  0.987697  0.012303
    2025-11-04 04:40:51,855 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.966508  0.933015
       threshold_value       tpr       fpr
    0         0.040816  1.000000  0.367263
    1         0.081633  0.977778  0.305371
    2         0.102041  0.977778  0.285422
    3         0.122449  0.977778  0.276215
    4         0.163265  0.955556  0.254220
    5         0.183673  0.955556  0.231714
    6         0.142857  0.977778  0.265985
    7         0.061224  1.000000  0.336061
    8         0.020408  1.000000  0.648082
    9         0.000000  1.000000  1.000000
    2025-11-04 04:40:52,302 | INFO     | Confusion Matrix :
    [[1949    6]
     [   8   37]]
    >>> prediction
       automl_id  Prediction  isFraud    prob_0    prob_1
    0        530           0        0  0.989053  0.010947
    1        734           0        0  0.646486  0.353514
    2       1672           0        0  0.654323  0.345677
    3        938           0        0  0.966440  0.033560
    4        122           0        0  0.980079  0.019921
    5        469           0        0  0.928566  0.071434
    6         61           0        0  0.971149  0.028851
    7        326           0        0  0.815679  0.184321
    8       1203           0        0  0.981232  0.018768
    9       1407           0        0  0.974600  0.025400
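The ROC-AUC log above prints its threshold rows in no particular order and reports GINI next to AUC. As a minimal sketch in plain Python (no Vantage connection needed; the values are copied by hand from the output above, and the names `roc_rows` and `gini_from_auc` are illustrative, not part of teradataml), the rows can be sorted by threshold and the reported GINI checked against the standard relation GINI = 2 * AUC - 1:

```python
# Values copied from the ROC-AUC output of fd.predict(fraud_test) above.
AUC = 0.966508
GINI = 0.933015

# (threshold_value, tpr, fpr) rows in the order they were printed.
roc_rows = [
    (0.040816, 1.000000, 0.367263),
    (0.081633, 0.977778, 0.305371),
    (0.102041, 0.977778, 0.285422),
    (0.122449, 0.977778, 0.276215),
    (0.163265, 0.955556, 0.254220),
    (0.183673, 0.955556, 0.231714),
    (0.142857, 0.977778, 0.265985),
    (0.061224, 1.000000, 0.336061),
    (0.020408, 1.000000, 0.648082),
    (0.000000, 1.000000, 1.000000),
]


def gini_from_auc(auc):
    """Standard relation between the Gini coefficient and ROC AUC."""
    return 2 * auc - 1


# Sort by threshold so the curve reads from the loosest to the strictest cut-off.
sorted_rows = sorted(roc_rows)

for threshold, tpr, fpr in sorted_rows:
    print(f"{threshold:.6f}  tpr={tpr:.6f}  fpr={fpr:.6f}")

gini_check = gini_from_auc(AUC)
print(f"GINI from AUC: {gini_check:.6f} (reported: {GINI:.6f})")
```

Read top to bottom, the sorted rows show both tpr and fpr falling as the threshold rises, and the recomputed GINI agrees with the reported value to within rounding.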
  8. Generate evaluation metrics on the test dataset using the best performing model.
    >>> performance_metrics = fd.evaluate(fraud_test)
    2025-11-04 04:41:40,851 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 04:41:41,438 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:41:41,438 | INFO     | Model ID : XGBOOST_2
    2025-11-04 04:41:41,438 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:41:44,136 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    1               1  CLASS_2        6       37   0.860465  0.822222  0.840909       45
    0               0  CLASS_1     1949        8   0.995912  0.996931  0.996421     1955
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall     0.993000
    1       5     Macro-Precision     0.928189
    2       6        Macro-Recall     0.909577
    3       7            Macro-F1     0.918665
    4       9     Weighted-Recall     0.993000
    5      10         Weighted-F1     0.992922
    6       8  Weighted-Precision     0.992865
    7       4            Micro-F1     0.993000
    8       2     Micro-Precision     0.993000
    9       1            Accuracy     0.993000
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1     1949        8   0.995912  0.996931  0.996421     1955
    1               1  CLASS_2        6       37   0.860465  0.822222  0.840909       45
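The per-class and macro figures reported by `evaluate` can be reproduced from the confusion matrix printed in step 7. The sketch below uses plain Python and assumes that rows of the matrix are actual classes and columns are predicted classes, which is consistent with the Support column (1955 and 45). The helper name `per_class_metrics` is illustrative and not part of teradataml.

```python
# Confusion matrix from step 7; assumed layout: rows = actual, columns = predicted.
cm = [[1949, 6],
      [8, 37]]


def per_class_metrics(cm, k):
    """Return (precision, recall, f1, support) for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column total: rows predicted as k
    actual_k = sum(cm[k])                    # row total: true members of k (support)
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual_k


metrics = [per_class_metrics(cm, k) for k in range(len(cm))]
total = sum(sum(row) for row in cm)

# Accuracy is the share of rows on the diagonal.
accuracy = sum(cm[k][k] for k in range(len(cm))) / total

# Macro averages give every class the same weight, regardless of support.
macro_precision = sum(m[0] for m in metrics) / len(metrics)
macro_recall = sum(m[1] for m in metrics) / len(metrics)
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

for k, (p, r, f1, support) in enumerate(metrics):
    print(f"class {k}: precision={p:.6f} recall={r:.6f} f1={f1:.6f} support={support}")
print(f"accuracy={accuracy:.6f} macro_precision={macro_precision:.6f} "
      f"macro_recall={macro_recall:.6f} macro_f1={macro_f1:.6f}")
```

For single-label classification, micro-averaged precision, recall, and F1 all equal accuracy, which is why those four rows share the value 0.993 in the metrics table.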
  9. Generate predictions on the test dataset using the second best performing model.
    >>> prediction = fd.predict(fraud_test, 2)
    2025-11-04 04:42:13,468 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 04:42:14,019 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:42:14,020 | INFO     | Model ID : DECISIONFOREST_0
    2025-11-04 04:42:14,020 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:42:14,773 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 04:42:17,753 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 04:42:17,834 | INFO     | Prediction :
       automl_id  prediction  prob_1  prob_0  isFraud
    0       1672           0     0.0     1.0        0
    1         61           0     0.0     1.0        0
    2        326           0     0.0     1.0        0
    3       1876           0     0.0     1.0        0
    4        530           0     0.0     1.0        0
    5        938           0     0.0     1.0        0
    6       1203           0     0.0     1.0        0
    7        122           0     0.0     1.0        0
    8        265           0     0.0     1.0        0
    9        469           0     0.0     1.0        0
    2025-11-04 04:42:20,578 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.873737  0.747474
       threshold_value       tpr       fpr
    0         0.040816  0.755556  0.004604
    1         0.081633  0.755556  0.004604
    2         0.102041  0.755556  0.004604
    3         0.122449  0.755556  0.004604
    4         0.163265  0.755556  0.004604
    5         0.183673  0.755556  0.004604
    6         0.142857  0.755556  0.004604
    7         0.061224  0.755556  0.004604
    8         0.020408  0.755556  0.004604
    9         0.000000  1.000000  1.000000
    2025-11-04 04:42:21,430 | INFO     | Confusion Matrix :
    [[1946    9]
     [  11   34]]
    >>> prediction.head()
       automl_id  prediction  prob_1  prob_0  isFraud
    0        122           0     0.0     1.0        0
    1         61           0     0.0     1.0        0
    2        326           0     0.0     1.0        0
    3       1876           0     0.0     1.0        0
    4        530           0     0.0     1.0        0
    5       1407           0     0.0     1.0        0
    6        734           0     0.0     1.0        0
    7       1672           0     0.0     1.0        0
    8        265           0     0.0     1.0        0
    9        469           0     0.0     1.0        0
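In the ROC-AUC output for this model, every threshold other than 0 gives the same tpr and fpr. One plausible reading, given that the sampled `prob_1` values are all 0.0 or 1.0, is that the scores behave like hard labels, so the curve has a single operating point. That point can be recomputed from the confusion matrix. The sketch below is plain Python and assumes rows are actual classes and columns are predicted classes:

```python
# Confusion matrix printed for DECISIONFOREST_0.
# Assumed layout: rows = actual class (0, 1), columns = predicted class (0, 1).
cm = [[1946, 9],
      [11, 34]]

tn, fp = cm[0]  # actual non-fraud rows: predicted 0, predicted 1
fn, tp = cm[1]  # actual fraud rows: predicted 0, predicted 1

tpr = tp / (tp + fn)  # true positive rate (fraud recall)
fpr = fp / (fp + tn)  # false positive rate

print(f"tpr={tpr:.6f} fpr={fpr:.6f}")
```

Under that layout assumption, the two rates match the repeated tpr and fpr columns in the log after rounding to six decimals.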
  10. Generate evaluation metrics on the test dataset using the second best performing model.
    >>> performance_metrics = fd.evaluate(fraud_test, 2)
    2025-11-04 04:42:50,505 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 04:42:51,049 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:42:51,049 | INFO     | Model ID : DECISIONFOREST_0
    2025-11-04 04:42:51,049 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:42:56,196 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    1               1  CLASS_2        9       34   0.790698  0.755556  0.772727       45
    0               0  CLASS_1     1946       11   0.994379  0.995396  0.994888     1955
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall     0.990000
    1       5     Macro-Precision     0.892538
    2       6        Macro-Recall     0.875476
    3       7            Macro-F1     0.883807
    4       9     Weighted-Recall     0.990000
    5      10         Weighted-F1     0.989889
    6       8  Weighted-Precision     0.989796
    7       4            Micro-F1     0.990000
    8       2     Micro-Precision     0.990000
    9       1            Accuracy     0.990000
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1     1946       11   0.994379  0.995396  0.994888     1955
    1               1  CLASS_2        9       34   0.790698  0.755556  0.772727       45
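The weighted rows in the metrics table average the per-class values using Support as the weight. A minimal sketch in plain Python, with values copied by hand from the step 10 output; the helper name `weighted_average` is illustrative and not part of teradataml:

```python
# Per-class rows from fd.evaluate(fraud_test, 2):
# class -> (precision, recall, f1, support)
per_class = {
    0: (0.994379, 0.995396, 0.994888, 1955),
    1: (0.790698, 0.755556, 0.772727, 45),
}


def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rows = list(per_class.values())
supports = [row[3] for row in rows]

weighted_precision = weighted_average([row[0] for row in rows], supports)
weighted_recall = weighted_average([row[1] for row in rows], supports)
weighted_f1 = weighted_average([row[2] for row in rows], supports)

# Macro recall for comparison: both classes count equally.
macro_recall = sum(row[1] for row in rows) / len(rows)

print(f"weighted_precision={weighted_precision:.6f}")
print(f"weighted_recall={weighted_recall:.6f}")
print(f"weighted_f1={weighted_f1:.6f}")
print(f"macro_recall={macro_recall:.6f}")
```

Because the fraud class accounts for only 45 of the 2000 test rows, the weighted figures stay close to the class 0 values, while the macro figures expose the weaker recall on the fraud class.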