This example predicts whether a transaction is fraudulent based on several transaction attributes. Run AutoML to get the best performing model with the following specifications:
- Set early stopping criteria: a time limit of 100 seconds and a threshold value of 0.1 for the performance metric MICRO-RECALL.
- Opt for verbose level 2 to get detailed logging.
- Load the online fraud dataset.
>>> load_example_data('teradataml', 'payment_fraud_dataset')
>>> fraud_df = DataFrame('payment_fraud_dataset')
>>> fraud_sample = fraud_df.sample(frac = [0.8, 0.2])
>>> fraud_train = fraud_sample[fraud_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> fraud_test = fraud_sample[fraud_sample['sampleid'] == 2].drop('sampleid', axis=1)
- Create an AutoFraud instance.
>>> fd = AutoFraud(verbose=2,
...                max_runtime_secs=100,
...                stopping_metric='MICRO-RECALL',
...                stopping_tolerance=0.1,
...                seed=42)
- Fit the data.
>>> fd.fit(fraud_train, fraud_train.isFraud)
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 04:20:34,546 | INFO | Feature Exploration started 2025-11-04 04:20:34,546 | INFO | Data Overview: 2025-11-04 04:20:34,651 | INFO | Total Rows in the data: 8000 2025-11-04 04:20:34,693 | INFO | Total Columns in the data: 10 2025-11-04 04:20:35,626 | INFO | Column Summary: ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage 0 oldbalanceOrg FLOAT 7991 9 NaN 2314.0 5677.0 0.0 0.1125 99.8875 1 payment_type VARCHAR(40) CHARACTER SET LATIN 8000 0 0.0 NaN NaN NaN 0.0000 100.0000 2 amount FLOAT 7984 16 NaN 0.0 7984.0 0.0 0.2000 99.8000 3 nameOrig VARCHAR(40) CHARACTER SET LATIN 8000 0 0.0 NaN NaN NaN 0.0000 100.0000 4 newbalanceDest FLOAT 7992 8 NaN 3839.0 4153.0 0.0 0.1000 99.9000 5 isFraud BIGINT 8000 0 NaN 7849.0 151.0 0.0 0.0000 100.0000 6 newbalanceOrig FLOAT 7994 6 NaN 4068.0 3926.0 0.0 0.0750 99.9250 7 oldbalanceDest FLOAT 7993 7 NaN 3969.0 4024.0 0.0 0.0875 99.9125 8 nameDest VARCHAR(40) CHARACTER SET LATIN 8000 0 0.0 NaN NaN NaN 0.0000 100.0000 9 step BIGINT 8000 0 NaN 0.0 8000.0 0.0 0.0000 100.0000 2025-11-04 04:20:36,886 | INFO | Statistics of Data: ATTRIBUTE StatName StatValue 0 isFraud MAXIMUM 1.00 1 newbalanceOrig MINIMUM 0.00 2 newbalanceOrig MAXIMUM 13000000.00 3 amount COUNT 7984.00 4 amount MAXIMUM 10000000.00 5 newbalanceDest COUNT 7992.00 6 newbalanceDest MINIMUM 0.00 7 newbalanceDest MAXIMUM 34700000.00 8 amount MINIMUM 0.65 9 newbalanceOrig COUNT 7994.00 2025-11-04 04:20:37,862 | INFO | Categorical Columns with their Distinct values: ColumnName DistinctValueCount payment_type 5 nameOrig 7996 nameDest 7035 2025-11-04 04:20:42,109 | INFO | Futile columns in dataset: ColumnName 0 nameOrig 1 nameDest 2025-11-04 04:20:48,618 | INFO | Columns with outlier percentage :- ColumnName OutlierPercentage 0 oldbalanceOrg 1.1000 1 newbalanceDest 1.0875 2 newbalanceOrig 1.0625 3 
amount 2.1750 4 oldbalanceDest 1.0250 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 04:20:49,708 | INFO | Feature Engineering started ... 2025-11-04 04:20:49,708 | INFO | Handling duplicate records present in dataset ... 2025-11-04 04:20:50,031 | INFO | Analysis completed. No action taken. 2025-11-04 04:20:50,031 | INFO | Total time to handle duplicate records: 0.32 sec 2025-11-04 04:20:50,031 | INFO | Handling less significant features from data ... 2025-11-04 04:20:55,929 | INFO | Removing Futile columns: ['nameOrig', 'nameDest'] 2025-11-04 04:20:55,929 | INFO | Sample of Data after removing Futile columns: step payment_type amount oldbalanceOrg newbalanceOrig oldbalanceDest newbalanceDest isFraud automl_id 0 38 CASH_IN 315951.64 104392.00 420343.64 77879.17 0.00 0 12 1 17 CASH_OUT 182687.76 261008.42 78320.65 213250.74 395938.50 0 9 2 17 CASH_OUT 158651.10 0.00 0.00 271682.54 430333.64 0 13 3 40 CASH_OUT 105987.63 0.00 0.00 1234821.83 1340809.46 0 6 4 40 CASH_OUT 120549.52 0.00 0.00 2381888.19 2502437.71 0 14 5 19 PAYMENT 8796.56 0.00 0.00 0.00 0.00 0 7 6 19 CASH_OUT 87459.82 0.00 0.00 2004723.65 2092183.47 0 11 7 19 PAYMENT 1154.55 321707.75 320553.20 0.00 0.00 0 15 8 40 CASH_OUT 75672.79 0.00 0.00 478121.23 553794.01 0 10 9 17 CASH_OUT 31724.00 35311.00 3587.00 954105.58 1535829.31 0 5 8000 rows X 9 columns 2025-11-04 04:20:56,443 | INFO | Total time to handle less significant features: 6.41 sec 2025-11-04 04:20:56,444 | INFO | Handling Date Features ... 2025-11-04 04:20:56,444 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed. 2025-11-04 04:20:56,444 | INFO | Total time to handle date features: 0.00 sec 2025-11-04 04:20:56,444 | INFO | Checking Missing values in dataset using AutoFraud function... 
2025-11-04 04:20:57,788 | INFO | Columns with their missing values: newbalanceDest: 8 oldbalanceDest: 7 newbalanceOrig: 6 amount: 16 oldbalanceOrg: 9 2025-11-04 04:20:59,787 | INFO | Flagging these columns for imputation: ['newbalanceDest', 'oldbalanceDest', 'newbalanceOrig', 'amount', 'oldbalanceOrg'] 2025-11-04 04:20:59,787 | INFO | Total time to find missing values in data using AutoFraud : 3.34 sec 2025-11-04 04:20:59,787 | INFO | Imputing Missing Values using SimpleImputeFit partition column... 2025-11-04 04:21:00,168 | INFO | Columns with their imputation method: newbalanceDest: median oldbalanceDest: median newbalanceOrig: median amount: median oldbalanceOrg: median 2025-11-04 04:21:04,228 | INFO | Sample of dataset after Imputation: step payment_type amount oldbalanceOrg newbalanceOrig oldbalanceDest newbalanceDest isFraud automl_id 0 17 CASH_OUT 158651.10 0.00 0.00 271682.54 430333.64 0 13 1 40 CASH_OUT 75672.79 0.00 0.00 478121.23 553794.01 0 10 2 40 CASH_OUT 120549.52 0.00 0.00 2381888.19 2502437.71 0 14 3 19 PAYMENT 8796.56 0.00 0.00 0.00 0.00 0 7 4 19 PAYMENT 1154.55 321707.75 320553.20 0.00 0.00 0 15 5 61 TRANSFER 475368.94 475368.94 0.00 0.00 0.00 1 4 6 61 CASH_OUT 475368.94 475368.94 0.00 1348026.73 1823395.67 1 8 7 38 CASH_IN 315951.64 104392.00 420343.64 77879.17 0.00 0 12 8 19 CASH_OUT 87459.82 0.00 0.00 2004723.65 2092183.47 0 11 9 40 CASH_OUT 105987.63 0.00 0.00 1234821.83 1340809.46 0 6 8000 rows X 9 columns 2025-11-04 04:21:05,312 | INFO | Time taken to perform imputation: 5.52 sec 2025-11-04 04:21:05,312 | INFO | Performing target encoding for categorical columns ... 2025-11-04 04:21:12,951 | INFO | Target Encoding completed for categorical columns using CBM_BETA. 
2025-11-04 04:21:12,951 | INFO | Target Encoding these Columns: ['payment_type'] 2025-11-04 04:21:12,951 | INFO | Sample of dataset after performing target encoding: newbalanceDest oldbalanceDest isFraud newbalanceOrig oldbalanceOrg automl_id amount step payment_type 0.000005 0.00 0.00 0 90210.24 94916.53 10656 4706.29 2 0.000005 0.00 0.00 0 115491.65 122275.00 5899 6783.35 23 0.000005 0.00 0.00 0 11124.12 15077.00 3327 3952.88 6 0.000005 0.00 0.00 0 0.00 0.00 419 5190.06 19 0.000005 0.00 0.00 0 0.00 0.00 4911 7338.27 6 0.000005 0.00 0.00 0 279754.74 287908.00 5591 8153.26 23 0.034876 1937458.07 1886690.69 0 203571.62 254339.00 32 50767.38 38 0.034876 245202.78 0.00 0 0.00 8626.00 60 245202.78 38 0.034876 294692.87 0.00 0 0.00 23171.00 64 294692.87 38 0.034876 635663.11 630286.84 0 23439.73 28816.00 76 5376.27 38 8000 rows X 9 columns 2025-11-04 04:21:13,069 | INFO | Time taken to encode the columns: 7.76 sec 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 04:21:13,069 | INFO | Data preparation started ... 2025-11-04 04:21:13,069 | INFO | AutoFraud Outlier preprocessing using Percentile... 
2025-11-04 04:21:17,176 | INFO | Columns with outlier percentage :- ColumnName OutlierPercentage 0 newbalanceDest 1.0000 1 newbalanceOrig 1.0000 2 automl_id 1.9875 3 oldbalanceDest 1.0000 4 oldbalanceOrg 1.0000 5 amount 1.9875 2025-11-04 04:21:17,830 | INFO | Replacing outliers with median: ['newbalanceDest', 'oldbalanceOrg', 'amount', 'oldbalanceDest', 'newbalanceOrig', 'automl_id'] 2025-11-04 04:21:21,201 | INFO | Sample of dataset after replacing outliers with MEDIAN: newbalanceDest oldbalanceDest isFraud newbalanceOrig oldbalanceOrg automl_id amount step payment_type 0.000005 0.0 0.0 0 29336.39 30661.00 3216 1324.61 24 0.000005 0.0 0.0 0 41370.75 45837.00 3255 4466.25 6 0.000005 0.0 0.0 0 49632.45 60989.84 4003 11357.40 2 0.000005 0.0 0.0 0 23602.75 25474.00 1668 1871.25 5 0.000005 0.0 0.0 0 3246.38 8201.00 542 4954.62 40 0.000005 0.0 0.0 0 89752.25 91303.00 3160 1550.75 5 0.000005 0.0 0.0 0 0.00 5648.85 2305 34220.19 9 0.000005 0.0 0.0 0 0.00 5383.00 2098 13620.42 22 0.000005 0.0 0.0 0 5687.72 9569.61 4003 3881.89 19 0.000005 0.0 0.0 0 102685.01 104778.00 2168 2092.99 5 8000 rows X 9 columns 2025-11-04 04:21:21,315 | INFO | Time Taken by Outlier processing: 8.25 sec 2025-11-04 04:21:21,316 | INFO | Checking imbalance data ... 2025-11-04 04:21:21,400 | INFO | Imbalance Found. 2025-11-04 04:21:21,400 | INFO | Handling data imbalance using SMOTE ... 2025-11-04 04:21:25,310 | INFO | Completed data imbalance handling. 2025-11-04 04:21:26,852 | INFO | Feature selection using rfe ... 2025-11-04 04:21:40,940 | INFO | feature selected by RFE: ['step', 'payment_type', 'newbalanceDest', 'oldbalanceDest', 'newbalanceOrig', 'oldbalanceOrg', 'amount'] 2025-11-04 04:21:40,942 | INFO | Total time taken by feature selection: 14.09 sec 2025-11-04 04:21:41,475 | INFO | Scaling Features of rfe data ... 
2025-11-04 04:21:42,845 | INFO | columns that will be scaled: ['r_step', 'r_payment_type', 'r_newbalanceDest', 'r_oldbalanceDest', 'r_newbalanceOrig', 'r_oldbalanceOrg', 'r_amount'] 2025-11-04 04:21:44,964 | INFO | Dataset sample after scaling: automl_id isFraud r_step r_payment_type r_newbalanceDest r_oldbalanceDest r_newbalanceOrig r_oldbalanceOrg r_amount 0 6 0 0.042553 0.000000 0.000000 0.000000 0.020752 0.021417 0.001921 1 8 1 0.085106 0.556976 0.047131 0.000000 0.000000 0.049108 0.063790 2 9 0 0.202128 0.615548 0.052242 0.063600 0.000000 0.049715 0.477112 3 10 0 0.191489 0.000000 0.000000 0.000000 0.001149 0.001956 0.005329 4 12 1 0.648936 0.451544 0.001332 0.000000 0.000000 0.003913 0.061007 5 13 1 0.776596 0.686901 0.152266 0.146592 0.000000 0.129608 0.340321 6 11 0 0.393617 0.371672 0.120486 0.053914 0.000000 0.006135 0.900905 7 7 0 0.393617 0.371672 0.049388 0.000000 0.000000 0.004736 0.559428 8 5 1 0.170213 0.615548 0.332159 0.000000 0.000000 0.265401 0.058486 9 4 1 0.925532 0.407881 0.138873 0.010018 0.000000 0.267048 0.058486 8735 rows X 9 columns 2025-11-04 04:21:46,501 | INFO | Total time taken by feature scaling: 5.03 sec 2025-11-04 04:21:46,502 | INFO | Scaling Features of pca data ... 
2025-11-04 04:21:47,373 | INFO | columns that will be scaled: ['payment_type', 'newbalanceDest', 'oldbalanceDest', 'newbalanceOrig', 'oldbalanceOrg', 'amount', 'step'] 2025-11-04 04:21:49,717 | INFO | Dataset sample after scaling: automl_id isFraud payment_type newbalanceDest oldbalanceDest newbalanceOrig oldbalanceOrg amount step 0 13 1 0.687473 0.152266 0.146592 0.000000 0.129608 0.340321 0.776596 1 8 1 0.557764 0.047131 0.000000 0.000000 0.049108 0.063790 0.085106 2 12 1 0.452046 0.001332 0.000000 0.000000 0.003913 0.061007 0.648936 3 8607 1 0.448108 0.000000 0.004310 0.000000 0.053291 0.353205 0.723404 4 8615 1 0.032317 0.118966 0.064117 0.000262 0.047860 0.570179 0.265957 5 23410 1 0.553099 0.000000 0.014718 0.000000 0.154165 0.562081 0.872340 6 23406 1 0.553099 0.002574 0.000000 0.000000 0.023212 0.141818 0.234043 7 23414 1 0.567007 0.268718 0.290448 0.000000 0.225627 0.058486 0.521277 8 8611 1 0.890055 0.091287 0.091824 0.000000 0.019294 0.181024 0.489362 9 4 1 0.408364 0.138873 0.010018 0.000000 0.267048 0.058486 0.925532 8735 rows X 9 columns 2025-11-04 04:21:50,546 | INFO | Total time taken by feature scaling: 4.04 sec 2025-11-04 04:21:50,547 | INFO | Dimension Reduction using pca ... 2025-11-04 04:21:51,343 | INFO | PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4'] 2025-11-04 04:21:51,344 | INFO | Total time taken by PCA: 0.80 sec 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 04:21:52,201 | INFO | Model Training started ... 
2025-11-04 04:21:52,244 | INFO | Hyperparameters used for model training:
2025-11-04 04:21:52,245 | INFO | Model: glm
2025-11-04 04:21:52,245 | INFO | Hyperparameters: {'response_column': 'isFraud', 'name': 'glm', 'family': 'BINOMIAL', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
2025-11-04 04:21:52,245 | INFO | Total number of models for glm: 1296
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:21:52,245 | INFO | Model: svm
2025-11-04 04:21:52,245 | INFO | Hyperparameters: {'response_column': 'isFraud', 'name': 'svm', 'model_type': 'Classification', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'tolerance': (0.001, 0.01), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'nesterov': True, 'intercept': True, 'iter_num_no_change': (5, 10, 50), 'local_sgd_iterations ': (10, 20), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
2025-11-04 04:21:52,246 | INFO | Total number of models for svm: 5184
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:21:52,246 | INFO | Model: knn
2025-11-04 04:21:52,246 | INFO | Hyperparameters: {'response_column': 'isFraud', 'name': 'knn', 'model_type': 'Classification', 'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0}
2025-11-04 04:21:52,246 | INFO | Total number of models for knn: 6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:21:52,247 | INFO | Model: decision_forest
2025-11-04 04:21:52,247 | INFO | Hyperparameters: {'response_column': 'isFraud', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'num_trees': (-1,), 'seed': 42}
2025-11-04 04:21:52,247 | INFO | Total number of models for decision_forest: 36
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:21:52,247 | INFO | Model: xgboost
2025-11-04 04:21:52,247 | INFO | Hyperparameters: {'response_column': 'isFraud', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
2025-11-04 04:21:52,248 | INFO | Total number of models for xgboost: 5832
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:21:52,248 | INFO | Performing hyperparameter tuning ...
2025-11-04 04:21:53,716 | INFO | Model training for glm
2025-11-04 04:22:07,589 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:22:07,589 | INFO | Model training for svm
2025-11-04 04:22:20,837 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:22:20,838 | INFO | Model training for knn
2025-11-04 04:23:26,393 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:23:26,393 | INFO | Model training for decision_forest
2025-11-04 04:23:50,783 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:23:50,783 | INFO | Model training for xgboost
2025-11-04 04:24:05,343 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:24:05,346 | INFO | Leaderboard
    RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
 0     1         XGBOOST_2                rfe  0.983400         0.983400  ...      0.940637  0.953124            0.983138         0.983400     0.983163
 1     2  DECISIONFOREST_0                rfe  0.978248         0.978248  ...      0.925238  0.938417            0.977816         0.978248     0.977908
 2     3  DECISIONFOREST_2                rfe  0.977676         0.977676  ...      0.924920  0.936960            0.977245         0.977676     0.977357
 3     4             KNN_4                rfe  0.973097         0.973097  ...      0.884776  0.919385            0.972650         0.973097     0.971854
 4     5             KNN_0                rfe  0.970807         0.970807  ...      0.888515  0.914436            0.969924         0.970807     0.969813
 5     6             KNN_7                pca  0.968517         0.968517  ...      0.872203  0.905663            0.967593         0.968517     0.967063
 6     7             KNN_3                pca  0.966228         0.966228  ...      0.875942  0.901014            0.965004         0.966228     0.965078
 7     8         XGBOOST_3                pca  0.946193         0.946193  ...      0.844744  0.850003            0.945419         0.946193     0.945781
 8     9  DECISIONFOREST_1                pca  0.945621         0.945621  ...      0.831894  0.844827            0.943799         0.945621     0.944547
 9    10  DECISIONFOREST_3                pca  0.945049         0.945049  ...      0.831575  0.843605            0.943311         0.945049     0.944039
10    11         XGBOOST_0                rfe  0.934745         0.934745  ...      0.928605  0.851790            0.952998         0.934745     0.940204
11    12             GLM_0                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
12    13             GLM_1                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
13    14             GLM_2                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
14    15             GLM_3                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
15    16             SVM_0                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
16    17             SVM_3                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
17    18             SVM_1                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
18    19             SVM_2                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
19    20         XGBOOST_1                pca  0.895249         0.895249  ...      0.879060  0.781686            0.932727         0.895249     0.907236
[20 rows x 13 columns]
20 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
>>> Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 16/16
- Display leaderboard.
>>> fd.leaderboard()
    RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
 0     1         XGBOOST_2                rfe  0.983400         0.983400  ...      0.940637  0.953124            0.983138         0.983400     0.983163
 1     2  DECISIONFOREST_0                rfe  0.978248         0.978248  ...      0.925238  0.938417            0.977816         0.978248     0.977908
 2     3  DECISIONFOREST_2                rfe  0.977676         0.977676  ...      0.924920  0.936960            0.977245         0.977676     0.977357
 3     4             KNN_4                rfe  0.973097         0.973097  ...      0.884776  0.919385            0.972650         0.973097     0.971854
 4     5             KNN_0                rfe  0.970807         0.970807  ...      0.888515  0.914436            0.969924         0.970807     0.969813
 5     6             KNN_7                pca  0.968517         0.968517  ...      0.872203  0.905663            0.967593         0.968517     0.967063
 6     7             KNN_3                pca  0.966228         0.966228  ...      0.875942  0.901014            0.965004         0.966228     0.965078
 7     8         XGBOOST_3                pca  0.946193         0.946193  ...      0.844744  0.850003            0.945419         0.946193     0.945781
 8     9  DECISIONFOREST_1                pca  0.945621         0.945621  ...      0.831894  0.844827            0.943799         0.945621     0.944547
 9    10  DECISIONFOREST_3                pca  0.945049         0.945049  ...      0.831575  0.843605            0.943311         0.945049     0.944039
10    11         XGBOOST_0                rfe  0.934745         0.934745  ...      0.928605  0.851790            0.952998         0.934745     0.940204
11    12             GLM_0                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
12    13             GLM_1                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
13    14             GLM_2                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
14    15             GLM_3                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
15    16             SVM_0                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
16    17             SVM_3                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
17    18             SVM_1                pca  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
18    19             SVM_2                rfe  0.898683         0.898683  ...      0.500000  0.473319            0.807632         0.898683     0.850728
19    20         XGBOOST_1                pca  0.895249         0.895249  ...      0.879060  0.781686            0.932727         0.895249     0.907236
[20 rows x 13 columns]
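All GLM and SVM rows in the leaderboard share identical scores, with MACRO-RECALL equal to 0.5. That pattern is what a binary classifier produces when it predicts the same class for every row. The sketch below rebuilds the leaderboard values from the accuracy alone under that assumption. The function name is hypothetical, and this is an interpretation of the numbers rather than something the log states.

```python
def single_class_metrics(majority_share):
    """Metrics of a binary classifier that always predicts the majority class.

    majority_share: fraction of evaluated rows that belong to the predicted class.
    """
    p = majority_share
    # Majority class: every row is predicted as this class.
    prec_major, rec_major = p, 1.0
    f1_major = 2 * prec_major * rec_major / (prec_major + rec_major)
    # Minority class is never predicted; its precision, recall and F1 are taken as 0.
    prec_minor = rec_minor = f1_minor = 0.0
    return {
        "accuracy": p,
        "macro_recall": (rec_major + rec_minor) / 2,
        "macro_f1": (f1_major + f1_minor) / 2,
        "weighted_precision": p * prec_major + (1 - p) * prec_minor,
        "weighted_recall": p * rec_major + (1 - p) * rec_minor,
        "weighted_f1": p * f1_major + (1 - p) * f1_minor,
    }


# 0.898683 is the ACCURACY reported for the GLM and SVM rows.
metrics = single_class_metrics(0.898683)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The printed values agree with the GLM and SVM rows to within rounding, which suggests these models did not separate the two classes on the validation data and should not be chosen on accuracy alone.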
- Display best performing model.
>>> fd.leader()
   RANK   MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_2                rfe    0.9834           0.9834  ...      0.940637  0.953124            0.983138           0.9834     0.983163
[1 rows x 13 columns]
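The training log reports a total number of candidate models per algorithm, for example 1296 for glm and 36 for decision_forest. These totals are consistent with a full grid over the tuple-valued hyperparameters, where scalar entries are fixed settings. The sketch below recomputes two of the totals from the logged search spaces. The helper `grid_size` is hypothetical, and how AutoFraud counts candidates internally is an assumption.

```python
from math import prod


def grid_size(hyperparameters):
    """Count grid combinations.

    Tuple-valued entries are treated as search dimensions; scalar entries
    (strings, numbers, booleans) are fixed and contribute a factor of 1.
    """
    return prod(len(v) for v in hyperparameters.values() if isinstance(v, tuple))


# Search spaces copied from the training log.
glm = {
    'response_column': 'isFraud', 'name': 'glm', 'family': 'BINOMIAL',
    'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85),
    'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1),
    'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50),
    'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80),
}
decision_forest = {
    'response_column': 'isFraud', 'name': 'decision_forest',
    'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2),
    'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3),
    'num_trees': (-1,), 'seed': 42,
}

print(grid_size(glm))              # 3 * 2 * 2 * 3 * 3 * 3 * 4 = 1296
print(grid_size(decision_forest))  # 3 * 4 * 3 * 1 = 36
```

With max_runtime_secs=100 only a small part of each grid is trained, which is why the leaderboard lists 20 models rather than thousands.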
- Display model hyperparameters for rank 1.
>>> fd.model_hyperparameters(rank=1)
{'response_column': 'isFraud', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': 1, 'min_impurity': 0.0, 'lambda1': 1.0, 'shrinkage_factor': 0.5, 'max_depth': 5, 'min_node_size': 1, 'iter_num': 10, 'num_boosted_trees': 5, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0']}
- Generate prediction on test dataset using best performing model.
>>> prediction = fd.predict(fraud_test)
2025-11-04 04:40:37,188 | INFO | Data Transformation started ... 2025-11-04 04:40:37,188 | INFO | Performing transformation carried out in feature engineering phase ... 2025-11-04 04:40:37,974 | INFO | Updated dataset after dropping futile columns : step payment_type amount oldbalanceOrg newbalanceOrig oldbalanceDest newbalanceDest isFraud automl_id 0 17 CASH_OUT 251240.86 52381.00 0.00 4248406.53 4499647.39 0 13 1 38 PAYMENT 11224.00 76055.24 64831.24 0.00 0.00 0 8 2 38 PAYMENT 3970.07 0.00 0.00 0.00 0.00 0 12 3 40 CASH_OUT 336996.80 0.00 0.00 419362.17 756358.97 0 6 4 40 PAYMENT 6440.87 0.00 0.00 0.00 0.00 0 14 5 19 CASH_OUT 6448.12 0.00 0.00 1556690.40 1468455.10 0 7 6 19 CASH_IN 26877.75 15207.00 42084.75 52460.32 25582.58 0 11 7 19 CASH_OUT 313071.96 216622.57 0.00 292599.86 605671.81 0 15 8 40 CASH_OUT 482260.45 0.00 0.00 6848237.76 7330498.21 0 10 9 38 CASH_OUT 14718.25 20848.00 6129.75 50688.71 65406.95 0 4 2000 rows X 9 columns 2025-11-04 04:40:38,291 | INFO | Updated dataset after performing target column transformation : step payment_type amount oldbalanceOrg newbalanceOrig oldbalanceDest newbalanceDest isFraud automl_id 0 40 PAYMENT 6440.87 0.00 0.00 0.00 0.00 0 14 1 38 PAYMENT 11224.00 76055.24 64831.24 0.00 0.00 0 8 2 38 PAYMENT 3970.07 0.00 0.00 0.00 0.00 0 12 3 17 PAYMENT 10038.15 31688.80 21650.65 0.00 0.00 0 5 4 17 CASH_OUT 251240.86 52381.00 0.00 4248406.53 4499647.39 0 13 5 19 CASH_OUT 6448.12 0.00 0.00 1556690.40 1468455.10 0 7 6 19 CASH_IN 26877.75 15207.00 42084.75 52460.32 25582.58 0 11 7 19 CASH_OUT 313071.96 216622.57 0.00 292599.86 605671.81 0 15 8 17 TRANSFER 1098532.21 25802.00 0.00 185981.07 1284513.28 0 9 9 38 CASH_OUT 14718.25 20848.00 6129.75 50688.71 65406.95 0 4 2000 rows X 9 columns 2025-11-04 04:40:39,145 | INFO | Updated dataset after imputing missing value containing columns : step payment_type amount oldbalanceOrg newbalanceOrig oldbalanceDest newbalanceDest isFraud automl_id 0 19 CASH_OUT 313071.96 216622.57 0.00 292599.86 
605671.81 0 15 1 17 TRANSFER 1098532.21 25802.00 0.00 185981.07 1284513.28 0 9 2 17 CASH_OUT 251240.86 52381.00 0.00 4248406.53 4499647.39 0 13 3 40 CASH_OUT 336996.80 0.00 0.00 419362.17 756358.97 0 6 4 40 PAYMENT 6440.87 0.00 0.00 0.00 0.00 0 14 5 38 CASH_OUT 14718.25 20848.00 6129.75 50688.71 65406.95 0 4 6 38 PAYMENT 11224.00 76055.24 64831.24 0.00 0.00 0 8 7 38 PAYMENT 3970.07 0.00 0.00 0.00 0.00 0 12 8 40 CASH_OUT 482260.45 0.00 0.00 6848237.76 7330498.21 0 10 9 17 PAYMENT 10038.15 31688.80 21650.65 0.00 0.00 0 5 2000 rows X 9 columns 2025-11-04 04:40:41,108 | INFO | Updated dataset after performing categorical encoding : newbalanceDest oldbalanceDest isFraud newbalanceOrig oldbalanceOrg automl_id amount step payment_type 0.034876 657807.19 563733.07 0 0.00 0.00 61 94074.12 17 0.034876 678419.64 398931.35 1 298767.61 340830.43 73 42062.82 17 0.034876 69163.65 0.00 0 0.00 41567.00 77 69163.65 17 0.034876 250803.22 0.00 0 0.00 165989.00 85 250803.22 17 0.034876 272986.03 0.00 0 0.00 31417.00 101 272986.03 17 0.034876 583424.08 250916.50 0 0.00 0.00 109 159921.79 17 0.000005 0.00 0.00 0 18195.70 20898.00 1133 2702.30 10 0.000005 0.00 0.00 0 0.00 4922.00 525 5914.18 9 0.000005 0.00 0.00 0 18579.92 33714.26 2492 15134.34 25 0.000005 0.00 0.00 0 0.00 0.00 1204 2556.46 3 2000 rows X 9 columns 2025-11-04 04:40:41,243 | INFO | Performing transformation carried out in data preparation phase ... 
2025-11-04 04:40:41,996 | INFO | Updated dataset after performing RFE feature selection: step payment_type newbalanceDest oldbalanceDest newbalanceOrig oldbalanceOrg amount isFraud automl_id 122 40 0.0000 0.00 0.00 132295.33 133904.00 1608.67 0 387 15 0.0000 0.00 0.00 179737.71 316.00 179421.71 0 856 24 0.0000 0.00 0.00 14885.75 26247.00 11361.25 0 2528 25 0.0000 0.00 0.00 40269.29 50314.00 10044.71 0 713 49 0.0000 0.00 0.00 237403.94 239115.80 1711.87 0 1182 12 0.0000 0.00 0.00 43841.22 51378.00 7536.78 0 448 5 0.0000 66221.72 105279.81 3457455.51 3418397.43 39058.08 0 591 47 0.0000 0.00 0.00 0.00 0.00 2821.69 0 1203 6 0.0000 0.00 0.00 0.00 0.00 45923.16 0 938 35 0.0349 1993208.51 1941467.64 0.00 12632.00 51740.87 0 2000 rows X 9 columns 2025-11-04 04:40:42,845 | INFO | Updated dataset after performing scaling on RFE selected features : automl_id isFraud r_step r_payment_type r_newbalanceDest r_oldbalanceDest r_newbalanceOrig r_oldbalanceOrg r_amount 0 1672 0 0.000000 0.000000 0.000405 0.121635 1.376248 1.373288 0.172193 1 265 0 0.351064 0.371672 0.540747 0.527264 0.000000 0.000000 0.675852 2 530 0 0.404255 0.000000 0.291644 0.334035 0.016149 0.000288 0.147508 3 469 0 0.106383 0.371672 0.325561 0.032070 0.102284 0.112889 0.085889 4 326 0 0.063830 0.000000 0.000000 0.022551 0.424472 0.422271 0.063689 5 938 0 0.361702 0.371672 0.334045 0.356581 0.000000 0.002582 0.096518 6 1203 0 0.053191 0.000000 0.000000 0.000000 0.000000 0.000000 0.085433 7 122 0 0.414894 0.000000 0.000000 0.000000 0.026736 0.027370 0.000998 8 61 0 0.170213 0.371672 0.110243 0.103538 0.000000 0.000000 0.177178 9 1876 0 0.000000 0.001065 0.000908 0.003499 0.020793 0.021950 0.006503 2000 rows X 9 columns 2025-11-04 04:40:44,023 | INFO | Updated dataset after performing scaling for PCA feature selection : automl_id isFraud payment_type newbalanceDest oldbalanceDest newbalanceOrig oldbalanceOrg amount step 0 326 0 -0.000058 0.000000 0.022551 0.424472 0.422271 0.063689 0.063830 1 734 0 0.371832 
0.049296 0.006188 0.000000 0.029898 0.494182 0.382979 2 1672 0 -0.000058 0.000405 0.121635 1.376248 1.373288 0.172193 0.000000 3 938 0 0.371832 0.334045 0.356581 0.000000 0.002582 0.096518 0.361702 4 122 0 -0.000058 0.000000 0.000000 0.026736 0.027370 0.000998 0.414894 5 1876 0 0.001008 0.000908 0.003499 0.020793 0.021950 0.006503 0.000000 6 265 0 0.371832 0.540747 0.527264 0.000000 0.000000 0.675852 0.351064 7 530 0 -0.000058 0.291644 0.334035 0.016149 0.000288 0.147508 0.404255 8 1203 0 -0.000058 0.000000 0.000000 0.000000 0.000000 0.085433 0.053191 9 1407 0 -0.000058 0.000000 0.000000 0.000000 0.000000 0.002412 0.234043 2000 rows X 9 columns 2025-11-04 04:40:44,424 | INFO | Updated dataset after performing PCA feature selection : automl_id col_0 col_1 col_2 col_3 col_4 isFraud 0 469 0.087023 0.094250 -0.166024 0.045961 -0.175616 0 1 1876 -0.354038 -0.040333 -0.207998 0.027194 -0.017408 0 2 1407 -0.303722 -0.118541 0.006120 0.059805 -0.006441 0 3 938 0.215680 0.111473 0.066321 0.301660 -0.129706 0 4 61 0.105994 -0.026166 -0.130554 0.037863 -0.000090 0 5 265 0.477408 0.540892 0.119593 0.276230 0.258709 0 6 734 0.223673 -0.005435 0.090766 -0.149628 0.244795 0 7 1203 -0.316284 -0.036183 -0.157193 0.019565 0.059413 0 8 326 -0.362477 0.257506 -0.039267 -0.339162 -0.298827 0 9 530 -0.097231 0.196814 0.209709 0.306424 0.009944 0 10 rows X 7 columns 2025-11-04 04:40:44,789 | INFO | Data Transformation completed.█████| 100% - 9/9 2025-11-04 04:40:45,338 | INFO | Following model is being picked for evaluation: 2025-11-04 04:40:45,338 | INFO | Model ID : XGBOOST_2 2025-11-04 04:40:45,338 | INFO | Feature Selection Method : rfe 2025-11-04 04:40:45,889 | INFO | Applying SHAP for Model Interpretation... 2025-11-04 04:40:50,120 | INFO | SHAP Analysis Completed. Feature Importance Available. 
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 04:40:50,263 | INFO | Prediction :
   automl_id  Prediction  isFraud    prob_0    prob_1
0        326           0        0  0.815679  0.184321
1        265           0        0  0.976456  0.023544
2        530           0        0  0.989053  0.010947
3        938           0        0  0.966440  0.033560
4        122           0        0  0.980079  0.019921
5       1407           0        0  0.974600  0.025400
6        734           0        0  0.646486  0.353514
7       1672           0        0  0.654323  0.345677
8       1203           0        0  0.981232  0.018768
9       1876           0        0  0.987697  0.012303
2025-11-04 04:40:51,855 | INFO | ROC-AUC :
              GINI
AUC
0.966508  0.933015
   threshold_value       tpr       fpr
0         0.040816  1.000000  0.367263
1         0.081633  0.977778  0.305371
2         0.102041  0.977778  0.285422
3         0.122449  0.977778  0.276215
4         0.163265  0.955556  0.254220
5         0.183673  0.955556  0.231714
6         0.142857  0.977778  0.265985
7         0.061224  1.000000  0.336061
8         0.020408  1.000000  0.648082
9         0.000000  1.000000  1.000000
2025-11-04 04:40:52,302 | INFO | Confusion Matrix :
[[1949    6]
 [   8   37]]
>>> prediction
   automl_id  Prediction  isFraud    prob_0    prob_1
0        530           0        0  0.989053  0.010947
1        734           0        0  0.646486  0.353514
2       1672           0        0  0.654323  0.345677
3        938           0        0  0.966440  0.033560
4        122           0        0  0.980079  0.019921
5        469           0        0  0.928566  0.071434
6         61           0        0  0.971149  0.028851
7        326           0        0  0.815679  0.184321
8       1203           0        0  0.981232  0.018768
9       1407           0        0  0.974600  0.025400
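The ROC-AUC section of the prediction log prints two summary numbers, 0.966508 and 0.933015. They are consistent with the standard relation Gini = 2 * AUC - 1 when the larger value is read as the AUC. That reading is an assumption based on the relation itself, and the helper name below is hypothetical.

```python
def gini_from_auc(auc):
    """Gini coefficient of a binary classifier from its ROC AUC."""
    return 2.0 * auc - 1.0


# 0.966508 is the larger of the two values printed in the ROC-AUC log section.
gini = gini_from_auc(0.966508)
print(f"{gini:.6f}")
```

The result differs from the logged 0.933015 only in the last digit, which is expected because the logged AUC is itself rounded to six decimals.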
- Generate evaluation metrics on test dataset using best performing model.
>>> performance_metrics = fd.evaluate(fraud_test)
2025-11-04 04:41:40,851 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 04:41:41,438 | INFO | Following model is being picked for evaluation:
2025-11-04 04:41:41,438 | INFO | Model ID : XGBOOST_2
2025-11-04 04:41:41,438 | INFO | Feature Selection Method : rfe
2025-11-04 04:41:44,136 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
1                1  CLASS_2        6       37   0.860465  0.822222  0.840909       45
0                0  CLASS_1     1949        8   0.995912  0.996931  0.996421     1955
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.993000
1       5     Macro-Precision     0.928189
2       6        Macro-Recall     0.909577
3       7            Macro-F1     0.918665
4       9     Weighted-Recall     0.993000
5      10         Weighted-F1     0.992922
6       8  Weighted-Precision     0.992865
7       4            Micro-F1     0.993000
8       2     Micro-Precision     0.993000
9       1            Accuracy     0.993000
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1     1949        8   0.995912  0.996931  0.996421     1955
1                1  CLASS_2        6       37   0.860465  0.822222  0.840909       45
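The per-class and macro metrics reported for XGBOOST_2 can be recomputed from the confusion matrix [[1949, 6], [8, 37]] printed in the prediction log. The sketch below assumes rows are actual classes and columns are predicted classes, which matches the Support column (1955 and 45). The function name is hypothetical, and this is a plain-Python check rather than the library's implementation.

```python
def per_class_metrics(cm):
    """Precision, recall, F1 and support per class.

    cm[i][j] is the number of rows with actual class i predicted as class j.
    """
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"precision": precision, "recall": recall,
                     "f1": f1, "support": actual_k})
    return rows


cm = [[1949, 6], [8, 37]]
rows = per_class_metrics(cm)

total = sum(sum(r) for r in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
macro_precision = sum(r["precision"] for r in rows) / len(rows)
macro_recall = sum(r["recall"] for r in rows) / len(rows)
macro_f1 = sum(r["f1"] for r in rows) / len(rows)

for label, r in enumerate(rows):
    print(label, f"{r['precision']:.6f} {r['recall']:.6f} {r['f1']:.6f} {r['support']}")
print(f"accuracy={accuracy:.6f} macro_precision={macro_precision:.6f} "
      f"macro_recall={macro_recall:.6f} macro_f1={macro_f1:.6f}")
```

For single-label classification the micro-averaged precision, recall and F1 all equal the accuracy, which is why those four rows of the metric table carry the same value.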
- Generate prediction on test dataset using second best performing model.
>>> prediction = fd.predict(fraud_test, 2)
2025-11-04 04:42:13,468 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 04:42:14,019 | INFO | Following model is being picked for evaluation:
2025-11-04 04:42:14,020 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 04:42:14,020 | INFO | Feature Selection Method : rfe
2025-11-04 04:42:14,773 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 04:42:17,753 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 04:42:17,834 | INFO | Prediction :
   automl_id  prediction  prob_1  prob_0  isFraud
0       1672           0     0.0     1.0        0
1         61           0     0.0     1.0        0
2        326           0     0.0     1.0        0
3       1876           0     0.0     1.0        0
4        530           0     0.0     1.0        0
5        938           0     0.0     1.0        0
6       1203           0     0.0     1.0        0
7        122           0     0.0     1.0        0
8        265           0     0.0     1.0        0
9        469           0     0.0     1.0        0
2025-11-04 04:42:20,578 | INFO | ROC-AUC :
              GINI
AUC
0.873737  0.747474
   threshold_value       tpr       fpr
0         0.040816  0.755556  0.004604
1         0.081633  0.755556  0.004604
2         0.102041  0.755556  0.004604
3         0.122449  0.755556  0.004604
4         0.163265  0.755556  0.004604
5         0.183673  0.755556  0.004604
6         0.142857  0.755556  0.004604
7         0.061224  0.755556  0.004604
8         0.020408  0.755556  0.004604
9         0.000000  1.000000  1.000000
2025-11-04 04:42:21,430 | INFO | Confusion Matrix :
[[1946    9]
 [  11   34]]
>>> prediction.head()
   automl_id  prediction  prob_1  prob_0  isFraud
0        122           0     0.0     1.0        0
1         61           0     0.0     1.0        0
2        326           0     0.0     1.0        0
3       1876           0     0.0     1.0        0
4        530           0     0.0     1.0        0
5       1407           0     0.0     1.0        0
6        734           0     0.0     1.0        0
7       1672           0     0.0     1.0        0
8        265           0     0.0     1.0        0
9        469           0     0.0     1.0        0
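The ROC rows logged for DECISIONFOREST_0 show the same tpr and fpr at every non-zero threshold. Both values can be recovered from the logged confusion matrix [[1946, 9], [11, 34]], as the sketch below shows. It assumes rows are actual classes and columns are predicted classes, with class 1 (fraud) as the positive class; the function name is hypothetical.

```python
def tpr_fpr(cm):
    """True-positive and false-positive rates from a 2x2 confusion matrix.

    cm = [[tn, fp], [fn, tp]] with rows as actual and columns as predicted classes.
    """
    (tn, fp), (fn, tp) = cm
    tpr = tp / (tp + fn)  # share of actual fraud rows that were flagged
    fpr = fp / (fp + tn)  # share of legitimate rows that were flagged
    return tpr, fpr


tpr, fpr = tpr_fpr([[1946, 9], [11, 34]])
print(f"tpr={tpr:.6f} fpr={fpr:.6f}")
```

A constant (tpr, fpr) pair across thresholds is what one would expect when most predicted probabilities are exactly 0.0 or 1.0, as in the sample rows, so moving the threshold between those extremes does not change any prediction.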
- Generate evaluation metrics on test dataset using second best performing model.
>>> performance_metrics = fd.evaluate(fraud_test, 2)
2025-11-04 04:42:50,505 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 04:42:51,049 | INFO | Following model is being picked for evaluation:
2025-11-04 04:42:51,049 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 04:42:51,049 | INFO | Feature Selection Method : rfe
2025-11-04 04:42:56,196 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
1                1  CLASS_2        9       34   0.790698  0.755556  0.772727       45
0                0  CLASS_1     1946       11   0.994379  0.995396  0.994888     1955
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.990000
1       5     Macro-Precision     0.892538
2       6        Macro-Recall     0.875476
3       7            Macro-F1     0.883807
4       9     Weighted-Recall     0.990000
5      10         Weighted-F1     0.989889
6       8  Weighted-Precision     0.989796
7       4            Micro-F1     0.990000
8       2     Micro-Precision     0.990000
9       1            Accuracy     0.990000
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1     1946       11   0.994379  0.995396  0.994888     1955
1                1  CLASS_2        9       34   0.790698  0.755556  0.772727       45
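The Weighted-* metrics in the evaluation log are consistent with support-weighted averages of the per-class values in the table above. The sketch below recomputes them for DECISIONFOREST_0 from the rounded per-class figures. The helper name is hypothetical, and the weighting scheme is inferred from the numbers rather than taken from the library's documentation.

```python
def weighted_average(values, supports):
    """Average of per-class values weighted by the number of actual rows per class."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


# Per-class values for DECISIONFOREST_0, ordered as class 0, class 1.
supports = [1955, 45]
precision = [0.994379, 0.790698]
recall = [0.995396, 0.755556]
f1 = [0.994888, 0.772727]

weighted = {
    "Weighted-Precision": weighted_average(precision, supports),
    "Weighted-Recall": weighted_average(recall, supports),
    "Weighted-F1": weighted_average(f1, supports),
}
for name, value in weighted.items():
    print(f"{name}: {value:.6f}")
```

Because 1955 of the 2000 test rows are legitimate transactions, the weighted scores stay close to 0.99 even though recall on the fraud class is about 0.76. For a fraud use case the class-1 row and the macro averages are usually the more informative figures when comparing the two models.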