This example predicts whether a bank customer will churn based on several factors. Run AutoML to get the best-performing model with the following specifications:
- Set early stopping criteria: a time limit of 100 seconds and a threshold of 0.7 for the performance metric MACRO-F1.
- Use verbose level 2 to get detailed logging.
- Load the churn dataset.
>>> from teradataml import AutoChurn, DataFrame, load_example_data
>>> load_example_data('teradataml', 'bank_churn')
>>> bank_df = DataFrame("bank_churn")
>>> bank_df_sample = bank_df.sample(frac=[0.8, 0.2])
>>> bank_train = bank_df_sample[bank_df_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> bank_test = bank_df_sample[bank_df_sample['sampleid'] == 2].drop('sampleid', axis=1)
- Create an AutoChurn instance.
>>> ch = AutoChurn(verbose=2,
...                max_runtime_secs=100,
...                stopping_metric='MACRO-F1',
...                stopping_tolerance=0.7,
...                seed=42)
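The two early stopping criteria set above (a time budget and a metric threshold) can be pictured with a small stand-alone sketch. This is only an illustration of the general idea and is not AutoChurn's internal logic, which the source does not describe. The function `train_until_stopped`, the candidate names, and the scores are all hypothetical.

```python
import time

def train_until_stopped(candidates, max_runtime_secs, stopping_tolerance,
                        clock=time.monotonic):
    """Train candidates in order until the time budget or metric threshold is hit.

    Hypothetical helper: `candidates` is a list of (model_id, train_fn) pairs,
    where train_fn() returns the stopping-metric score for that model.
    """
    start = clock()
    results = []
    for model_id, train_fn in candidates:
        # Criterion 1: stop before starting a new model once the budget is spent.
        if clock() - start >= max_runtime_secs:
            break
        score = train_fn()
        results.append((model_id, score))
        # Criterion 2: stop as soon as one model reaches the metric threshold.
        if score >= stopping_tolerance:
            break
    return results

# Made-up candidates and scores, for illustration only.
candidates = [
    ("model_a", lambda: 0.55),
    ("model_b", lambda: 0.74),
    ("model_c", lambda: 0.81),
]
results = train_until_stopped(candidates, max_runtime_secs=100,
                              stopping_tolerance=0.7)
print(results)
```

In this sketch the loop ends after `model_b`, because its score is the first to reach the 0.7 threshold. A real AutoML run may apply these criteria at a different granularity.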
- Fit the data.
>>> ch.fit(bank_train, bank_train.churn)
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:44:36,274 | INFO | Feature Exploration started
2025-11-04 04:44:36,274 | INFO | Data Overview:
2025-11-04 04:44:36,400 | INFO | Total Rows in the data: 8000
2025-11-04 04:44:36,442 | INFO | Total Columns in the data: 12
2025-11-04 04:44:37,074 | INFO | Column Summary:
    ColumnName        Datatype                          NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
 0  churn             BIGINT                                    8000          0         NaN     6369.0         1631.0            0.0             0.0              100.0
 1  country           VARCHAR(256) CHARACTER SET LATIN          8000          0         0.0        NaN            NaN            NaN             0.0              100.0
 2  gender            VARCHAR(20) CHARACTER SET LATIN           8000          0         0.0        NaN            NaN            NaN             0.0              100.0
 3  tenure            BIGINT                                    8000          0         NaN      314.0         7686.0            0.0             0.0              100.0
 4  age               INTEGER                                   8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
 5  products_number   BIGINT                                    8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
 6  customer_id       BIGINT                                    8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
 7  balance           FLOAT                                     8000          0         NaN     2887.0         5113.0            0.0             0.0              100.0
 8  credit_card       BIGINT                                    8000          0         NaN     2363.0         5637.0            0.0             0.0              100.0
 9  credit_score      BIGINT                                    8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
10  estimated_salary  FLOAT                                     8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
11  active_member     BIGINT                                    8000          0         NaN     3880.0         4120.0            0.0             0.0              100.0
2025-11-04 04:44:37,906 | INFO | Statistics of Data:
    ATTRIBUTE      StatName  StatValue
 0  credit_card    MAXIMUM         1.0
 1  age            MINIMUM        18.0
 2  age            MAXIMUM        92.0
 3  churn          COUNT        8000.0
 4  churn          MAXIMUM         1.0
 5  active_member  COUNT        8000.0
 6  active_member  MINIMUM         0.0
 7  active_member  MAXIMUM         1.0
 8  churn          MINIMUM         0.0
 9  age            COUNT        8000.0
2025-11-04 04:44:38,499 | INFO | Categorical Columns with their Distinct values:
ColumnName  DistinctValueCount
country                      3
gender                       2
2025-11-04 04:44:40,968 | INFO | No Futile columns found.
2025-11-04 04:44:44,736 | INFO | Columns with outlier percentage :-
    ColumnName        OutlierPercentage
 0  products_number              0.5875
 1  customer_id                  1.9875
 2  balance                      1.0000
 3  estimated_salary             1.9875
 4  age                          1.6875
 5  credit_score                 0.8875
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:44:45,119 | INFO | Feature Engineering started ...
2025-11-04 04:44:45,119 | INFO | Handling duplicate records present in dataset ...
2025-11-04 04:44:45,315 | INFO | Analysis completed. No action taken.
2025-11-04 04:44:45,315 | INFO | Total time to handle duplicate records: 0.20 sec
2025-11-04 04:44:45,316 | INFO | Handling less significant features from data ...
2025-11-04 04:44:49,520 | INFO | Analysis indicates all categorical columns are significant. No action Needed.
2025-11-04 04:44:49,520 | INFO | Total time to handle less significant features: 4.20 sec
2025-11-04 04:44:49,520 | INFO | Handling Date Features ...
2025-11-04 04:44:49,520 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
2025-11-04 04:44:49,520 | INFO | Total time to handle date features: 0.00 sec
2025-11-04 04:44:49,520 | INFO | Checking Missing values in dataset using AutoChurn function...
2025-11-04 04:44:51,510 | INFO | Analysis Completed. No Missing Values Detected.
2025-11-04 04:44:51,510 | INFO | Total time to find missing values in data using AutoChurn : 1.99 sec
2025-11-04 04:44:51,511 | INFO | Imputing Missing Values using SimpleImputeFit partition column...
2025-11-04 04:44:51,511 | INFO | Analysis completed. No imputation required.
2025-11-04 04:44:51,511 | INFO | Time taken to perform imputation: 0.00 sec
2025-11-04 04:44:51,511 | INFO | Performing target encoding for categorical columns ...
2025-11-04 04:44:56,597 | INFO | Target Encoding completed for categorical columns using CBM_BETA.
2025-11-04 04:44:56,597 | INFO | Target Encoding these Columns: ['country', 'gender']
2025-11-04 04:44:56,598 | INFO | Sample of dataset after performing target encoding:
            gender  age    balance  products_number  active_member  estimated_salary  credit_card  tenure  automl_id  customer_id  churn  credit_score
country
 0.16705  0.246369   45       0.00                1              0          73881.68            1       6         45     15737047      1           754
 0.16705  0.168019   36       0.00                2              0          35156.54            1       3         73     15569364      0           666
 0.16705  0.168019   41  144147.68                1              1          14789.90            1       5         77     15728523      0           522
 0.16705  0.168019   34       0.00                2              1          91711.66            1       5         81     15793247      0           498
 0.16705  0.246369   43       0.00                2              0           2465.80            0       3         97     15684925      0           850
 0.16705  0.168019   48  148116.48                1              0         116973.48            0       0        117     15722548      0           540
 0.16705  0.168019   29       0.00                2              0         172097.40            0       9         89     15591428      0           781
 0.16705  0.168019   34  117468.67                1              0         185227.42            1       2         57     15713637      0           699
 0.16705  0.246369   47       0.00                1              1          66408.01            1       8         37     15644692      1           546
 0.16705  0.246369   23       0.00                2              1         141756.32            1       1         17     15675749      0           695
8000 rows X 13 columns
2025-11-04 04:44:56,732 | INFO | Time taken to encode the columns: 5.22 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:44:56,732 | INFO | Data preparation started ...
2025-11-04 04:44:56,733 | INFO | AutoChurn Outlier preprocessing using Percentile...
2025-11-04 04:45:00,566 | INFO | Columns with outlier percentage :-
    ColumnName        OutlierPercentage
 0  balance                      1.0000
 1  customer_id                  1.9875
 2  credit_score                 0.8875
 3  products_number              0.5875
 4  automl_id                    1.9875
 5  age                          1.6875
 6  estimated_salary             1.9875
2025-11-04 04:45:01,097 | INFO | Replacing outliers with median: ['automl_id', 'products_number', 'age', 'estimated_salary', 'customer_id', 'balance', 'credit_score']
2025-11-04 04:45:04,102 | INFO | Sample of dataset after replacing outliers with MEDIAN:
            gender  age    balance  products_number  active_member  estimated_salary  credit_card  tenure  automl_id  customer_id  churn  credit_score
country
0.167050  0.168019   41  144147.68                1              1          14789.90            1       5       4003     15728523      0           522
0.167050  0.246369   43       0.00                2              0         100129.33            0       3       4003     15684925      0           652
0.167050  0.168019   48  148116.48                1              0         116973.48            0       0       4003     15722548      0           540
0.167050  0.168019   37       0.00                1              0         120906.83            0       2       4003     15730460      0           722
0.167050  0.168019   39  139153.68                2              0         147662.33            1       3       4003     15614491      0           539
0.167050  0.246369   33       0.00                1              0         142797.50            0       9       4003     15772243      1           612
0.323218  0.168019   42  117691.00                1              1          23135.65            1       1       4003     15801832      1           684
0.323218  0.246369   34   97440.67                1              0         100129.33            1       2       4003     15690198      0           790
0.323218  0.168019   41  139706.31                1              0          63337.19            1       7       4003     15587451      0           778
0.323218  0.246369   28  105173.99                1              1          29835.37            0      10       4003     15709252      1           616
8000 rows X 13 columns
2025-11-04 04:45:04,235 | INFO | Time Taken by Outlier processing: 7.50 sec
2025-11-04 04:45:04,236 | INFO | Checking imbalance data ...
2025-11-04 04:45:04,319 | INFO | Imbalance Found.
2025-11-04 04:45:04,319 | INFO | Handling data imbalance using SMOTE ...
2025-11-04 04:45:10,751 | INFO | Completed data imbalance handling.
2025-11-04 04:45:12,647 | INFO | Feature selection using rfe ...
2025-11-04 04:46:16,732 | INFO | feature selected by RFE: ['age', 'active_member', 'products_number', 'tenure', 'customer_id', 'credit_score', 'country', 'gender', 'balance', 'estimated_salary']
2025-11-04 04:46:16,734 | INFO | Total time taken by feature selection: 64.09 sec
2025-11-04 04:46:17,682 | INFO | Scaling Features of rfe data ...
2025-11-04 04:46:19,238 | INFO | columns that will be scaled: ['r_age', 'r_active_member', 'r_products_number', 'r_tenure', 'r_customer_id', 'r_credit_score', 'r_country', 'r_gender', 'r_balance', 'r_estimated_salary']
2025-11-04 04:46:21,237 | INFO | Dataset sample after scaling:
   automl_id  churn     r_age  r_active_member  r_products_number  r_tenure  r_customer_id  r_credit_score  r_country  r_gender  r_balance  r_estimated_salary
0          6      1  0.571429              0.0                0.0       0.6       0.706082        0.820988   0.048721       1.0   0.000000            0.355270
1          8      0  0.228571              1.0                1.0       0.6       0.605200        0.506173   0.000000       1.0   0.000000            0.475320
2          9      0  0.257143              0.0                0.0       0.2       0.496531        0.932099   1.000000       1.0   0.595714            0.501021
3         10      0  0.314286              0.0                1.0       0.3       0.496531        0.549383   0.048721       0.0   0.000000            0.140234
4         12      0  0.485714              0.0                1.0       0.3       0.183988        0.506173   0.000000       1.0   0.510438            0.155546
5         13      0  0.457143              0.0                0.0       0.7       0.036955        0.895062   1.000000       0.0   0.854109            0.296718
6         11      1  0.657143              1.0                1.0       0.2       0.443501        0.746914   0.058465       1.0   0.200035            0.576641
7          7      1  0.428571              0.0                0.0       0.3       0.536094        0.438272   0.017052       1.0   0.000000            0.377752
8          5      1  0.485714              1.0                0.0       0.1       0.995858        0.604938   1.000000       0.0   0.719516            0.073483
9          4      0  0.085714              0.0                0.0       0.7       0.630740        0.274691   0.000000       0.0   0.000000            0.740601
10928 rows X 12 columns
2025-11-04 04:46:22,554 | INFO | Total time taken by feature scaling: 4.87 sec
2025-11-04 04:46:22,554 | INFO | Scaling Features of pca data ...
2025-11-04 04:46:23,607 | INFO | columns that will be scaled: ['country', 'gender', 'age', 'balance', 'products_number', 'active_member', 'estimated_salary', 'credit_card', 'tenure', 'customer_id', 'credit_score']
2025-11-04 04:46:25,741 | INFO | Dataset sample after scaling:
   automl_id  churn   country    gender       age   balance  products_number  active_member  estimated_salary  credit_card  tenure  customer_id  credit_score
0       8012      1  0.183401  0.000000  0.571429  0.555790              0.0            1.0          0.500402          0.0     0.4     0.792462      0.296296
1       7965      1  0.005166  1.000000  0.485714  0.000000              0.0            1.0          0.571366          1.0     0.5     0.818096      0.148148
2       7969      1  0.013981  0.258173  0.714286  0.000000              0.0            0.0          0.474954          0.0     0.7     0.884747      0.172840
3          7      1  0.017007  1.000000  0.428571  0.000000              0.0            0.0          0.377752          0.0     0.3     0.536094      0.438272
4         15      1  0.025299  0.000000  0.457143  0.471352              0.0            0.0          0.504431          1.0     0.3     0.490506      0.160494
5      16050      1  0.013621  1.000000  0.571429  0.000000              0.0            1.0          0.896129          0.0     0.5     0.279515      0.175926
6      16054      1  0.033517  1.000000  0.285714  0.000000              0.0            0.0          0.448751          0.0     0.7     0.516266      0.429012
7      16058      1  0.404530  0.618559  0.257143  0.720069              0.0            0.0          0.250664          1.0     0.5     0.336782      0.416667
8         11      1  0.058308  1.000000  0.657143  0.200035              1.0            1.0          0.576641          1.0     0.2     0.443501      0.746914
9       7961      1  0.067613  0.330397  0.571429  0.872392              1.0            0.0          0.748723          1.0     0.9     0.466008      0.354938
10928 rows X 13 columns
2025-11-04 04:46:26,573 | INFO | Total time taken by feature scaling: 4.02 sec
2025-11-04 04:46:26,573 | INFO | Dimension Reduction using pca ...
2025-11-04 04:46:27,412 | INFO | PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9']
2025-11-04 04:46:27,413 | INFO | Total time taken by PCA: 0.84 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 04:46:28,313 | INFO | Model Training started ...
2025-11-04 04:46:28,356 | INFO | Hyperparameters used for model training:
2025-11-04 04:46:28,356 | INFO | Model: glm
2025-11-04 04:46:28,356 | INFO | Hyperparameters: {'response_column': 'churn', 'name': 'glm', 'family': 'BINOMIAL', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50), 'iter_max': (300, 400), 'batch_size': (10, 100, 150)}
2025-11-04 04:46:28,357 | INFO | Total number of models for glm: 648
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:46:28,357 | INFO | Model: svm
2025-11-04 04:46:28,357 | INFO | Hyperparameters: {'response_column': 'churn', 'name': 'svm', 'model_type': 'Classification', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'tolerance': (0.001, 0.01), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'nesterov': True, 'intercept': True, 'iter_num_no_change': (5, 10, 50), 'local_sgd_iterations ': (10, 20), 'iter_max': (300, 400), 'batch_size': (10, 100, 150)}
2025-11-04 04:46:28,357 | INFO | Total number of models for svm: 2592
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:46:28,358 | INFO | Model: knn
2025-11-04 04:46:28,358 | INFO | Hyperparameters: {'response_column': 'churn', 'name': 'knn', 'model_type': 'Classification', 'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0}
2025-11-04 04:46:28,358 | INFO | Total number of models for knn: 6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:46:28,358 | INFO | Model: decision_forest
2025-11-04 04:46:28,358 | INFO | Hyperparameters: {'response_column': 'churn', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.2, 0.3), 'max_depth': (5, 4, 6, 7), 'min_node_size': (1, 3, 4), 'num_trees': (-1,), 'seed': 42}
2025-11-04 04:46:28,358 | INFO | Total number of models for decision_forest: 36
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:46:28,358 | INFO | Model: xgboost
2025-11-04 04:46:28,358 | INFO | Hyperparameters: {'response_column': 'churn', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.2, 0.3), 'lambda1': (1.0, 0.1, 1.0), 'shrinkage_factor': (0.5, 0.01, 0.1, 0.2), 'max_depth': (5, 4, 6, 7), 'min_node_size': (1, 3, 4), 'iter_num': (10, 30, 40), 'num_boosted_trees': (-1, 10, 20), 'seed': 42}
2025-11-04 04:46:28,360 | INFO | Total number of models for xgboost: 7776
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 04:46:28,360 | INFO | Performing hyperparameter tuning ...
2025-11-04 04:46:29,618 | INFO | Model training for glm
2025-11-04 04:46:48,368 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:46:48,368 | INFO | Model training for svm
2025-11-04 04:47:13,008 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:47:13,009 | INFO | Model training for knn
2025-11-04 04:48:27,858 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:48:27,858 | INFO | Model training for decision_forest
2025-11-04 04:48:50,857 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:48:50,857 | INFO | Model training for xgboost
2025-11-04 04:49:09,178 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 04:49:09,181 | INFO | Leaderboard
    RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
 0     1  DECISIONFOREST_0                rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
 1     2  DECISIONFOREST_2                rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
 2     3         XGBOOST_0                rfe  0.811070         0.811070  ...      0.816414  0.809047            0.819893         0.811070     0.812302
 3     4         XGBOOST_2                rfe  0.809241         0.809241  ...      0.817025  0.807742            0.821520         0.809241     0.810553
 4     5             KNN_4                rfe  0.788655         0.788655  ...      0.780982  0.781926            0.788047         0.788655     0.788270
 5     6             KNN_0                rfe  0.787283         0.787283  ...      0.778559  0.780028            0.786415         0.787283     0.786643
 6     7         XGBOOST_1                pca  0.772187         0.772187  ...      0.774954  0.769313            0.779456         0.772187     0.773577
 7     8         XGBOOST_3                pca  0.768984         0.768984  ...      0.774544  0.766894            0.779825         0.768984     0.770550
 8     9  DECISIONFOREST_1                pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
 9    10  DECISIONFOREST_3                pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
10    11             KNN_7                pca  0.750229         0.750229  ...      0.738824  0.740750            0.748694         0.750229     0.748959
11    12             KNN_3                pca  0.745197         0.745197  ...      0.735130  0.736415            0.744002         0.745197     0.744382
12    13             GLM_5                pca  0.718664         0.718664  ...      0.715327  0.713153            0.721856         0.718664     0.719737
13    14             GLM_0                rfe  0.710430         0.710430  ...      0.688011  0.691238            0.708035         0.710430     0.703986
14    15             SVM_4                rfe  0.698079         0.698079  ...      0.716983  0.698072            0.735644         0.698079     0.698301
15    16             SVM_5                pca  0.693047         0.693047  ...      0.659700  0.660269            0.694934         0.693047     0.677744
16    17             GLM_3                pca  0.681153         0.681153  ...      0.688130  0.679500            0.697436         0.681153     0.683311
17    18             SVM_3                pca  0.666057         0.666057  ...      0.630002  0.627414            0.664701         0.666057     0.647284
18    19             SVM_1                pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
19    20             SVM_7                pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
20    21             GLM_1                pca  0.639067         0.639067  ...      0.620867  0.621961            0.634351         0.639067     0.635278
21    22             GLM_4                rfe  0.600640         0.600640  ...      0.652393  0.583213            0.756020         0.600640     0.569100
22    23             SVM_2                rfe  0.547575         0.547575  ...      0.609048  0.511899            0.750107         0.547575     0.490046
23    24             GLM_2                rfe  0.526532         0.526532  ...      0.592241  0.480144            0.755583         0.526532     0.454428
24    25             SVM_0                rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
25    26             SVM_6                rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
[26 rows x 13 columns]
26 rows X 13 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 16/16
- Display leaderboard.
>>> ch.leaderboard()
    RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
 0     1  DECISIONFOREST_0                rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
 1     2  DECISIONFOREST_2                rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
 2     3         XGBOOST_0                rfe  0.811070         0.811070  ...      0.816414  0.809047            0.819893         0.811070     0.812302
 3     4         XGBOOST_2                rfe  0.809241         0.809241  ...      0.817025  0.807742            0.821520         0.809241     0.810553
 4     5             KNN_4                rfe  0.788655         0.788655  ...      0.780982  0.781926            0.788047         0.788655     0.788270
 5     6             KNN_0                rfe  0.787283         0.787283  ...      0.778559  0.780028            0.786415         0.787283     0.786643
 6     7         XGBOOST_1                pca  0.772187         0.772187  ...      0.774954  0.769313            0.779456         0.772187     0.773577
 7     8         XGBOOST_3                pca  0.768984         0.768984  ...      0.774544  0.766894            0.779825         0.768984     0.770550
 8     9  DECISIONFOREST_1                pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
 9    10  DECISIONFOREST_3                pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
10    11             KNN_7                pca  0.750229         0.750229  ...      0.738824  0.740750            0.748694         0.750229     0.748959
11    12             KNN_3                pca  0.745197         0.745197  ...      0.735130  0.736415            0.744002         0.745197     0.744382
12    13             GLM_5                pca  0.718664         0.718664  ...      0.715327  0.713153            0.721856         0.718664     0.719737
13    14             GLM_0                rfe  0.710430         0.710430  ...      0.688011  0.691238            0.708035         0.710430     0.703986
14    15             SVM_4                rfe  0.698079         0.698079  ...      0.716983  0.698072            0.735644         0.698079     0.698301
15    16             SVM_5                pca  0.693047         0.693047  ...      0.659700  0.660269            0.694934         0.693047     0.677744
16    17             GLM_3                pca  0.681153         0.681153  ...      0.688130  0.679500            0.697436         0.681153     0.683311
17    18             SVM_3                pca  0.666057         0.666057  ...      0.630002  0.627414            0.664701         0.666057     0.647284
18    19             SVM_1                pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
19    20             SVM_7                pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
20    21             GLM_1                pca  0.639067         0.639067  ...      0.620867  0.621961            0.634351         0.639067     0.635278
21    22             GLM_4                rfe  0.600640         0.600640  ...      0.652393  0.583213            0.756020         0.600640     0.569100
22    23             SVM_2                rfe  0.547575         0.547575  ...      0.609048  0.511899            0.750107         0.547575     0.490046
23    24             GLM_2                rfe  0.526532         0.526532  ...      0.592241  0.480144            0.755583         0.526532     0.454428
24    25             SVM_0                rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
25    26             SVM_6                rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
[26 rows x 13 columns]
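The "Total number of models" figures in the fit log match the size of the full hyperparameter grid: the product of the number of candidate values for each tuple-valued hyperparameter. The sketch below reproduces three of those counts from the dictionaries printed in the log. The helper `grid_size` is a hypothetical name used only for this illustration and is not part of teradataml. Treating tuples as search lists and everything else as fixed is an assumption that happens to agree with the logged totals.

```python
from math import prod

def grid_size(space):
    """Count grid combinations: tuple values are search lists, others are fixed."""
    return prod(len(v) for v in space.values() if isinstance(v, tuple))

# Search spaces copied from the fit log.
glm_space = {
    'response_column': 'churn', 'name': 'glm', 'family': 'BINOMIAL',
    'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85),
    'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1),
    'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50),
    'iter_max': (300, 400), 'batch_size': (10, 100, 150),
}
knn_space = {
    'response_column': 'churn', 'name': 'knn', 'model_type': 'Classification',
    'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0,
}
decision_forest_space = {
    'response_column': 'churn', 'name': 'decision_forest',
    'tree_type': 'Classification', 'min_impurity': (0.0, 0.2, 0.3),
    'max_depth': (5, 4, 6, 7), 'min_node_size': (1, 3, 4),
    'num_trees': (-1,), 'seed': 42,
}

for name, space in [('glm', glm_space), ('knn', knn_space),
                    ('decision_forest', decision_forest_space)]:
    print(name, grid_size(space))
```

The leaderboard lists only 26 trained models, far fewer than these grid sizes, so only a subset of each grid was actually trained in this run.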
- Display best performing model.
>>> ch.leader()
   RANK          MODEL_ID  FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  DECISIONFOREST_0                rfe  0.817932         0.817932  ...      0.811864  0.812442             0.81763         0.817932     0.817756
[1 rows x 13 columns]
- Get hyperparameters for trained model:
- Display model hyperparameters for rank 1.
>>> ch.model_hyperparameters(rank=1)
{'response_column': 'churn', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': 0.0, 'max_depth': 5, 'min_node_size': 1, 'num_trees': -1, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0']}
- Display model hyperparameters for rank 5.
>>> ch.model_hyperparameters(rank=5)
{'response_column': 'churn', 'name': 'knn', 'model_type': 'Classification', 'k': 5, 'id_column': 'automl_id', 'voting_weight': 1.0, 'persist': False, 'output_prob': True, 'output_responses': ['1', '0']}
- Generate prediction on test dataset using best performing model.
>>> prediction = ch.predict(bank_test)
2025-11-04 04:50:58,466 | INFO | Data Transformation started ...
2025-11-04 04:50:58,467 | INFO | Performing transformation carried out in feature engineering phase ...
2025-11-04 04:50:59,185 | INFO | Updated dataset after performing target column transformation :
   customer_id  credit_score  country  gender  age  tenure    balance  products_number  credit_card  active_member  estimated_salary  churn  automl_id
0     15787884           692   France  Female   30       7       0.00                2            1              1          18826.34      0         14
1     15688963           731   France  Female   52      10       0.00                1            1              1          24998.75      1          9
2     15791045           568   France  Female   38       3  132951.92                1            0              1         124486.28      0         13
3     15707132           465   France    Male   33       5       0.00                2            0              1          78698.09      0          7
4     15805523           717   France  Female   28       1   90537.16                1            0              1          74800.99      0         15
5     15796612           527   France  Female   31       1  112203.25                1            1              0         182266.01      0          4
6     15602909           604    Spain  Female   41      10       0.00                2            1              1         166224.39      0          8
7     15652808           774   France  Female   41       5  126670.37                1            1              0         102426.06      0         12
8     15670039           509    Spain  Female   25       3  108738.71                2            1              0         106920.57      0         11
9     15668775           757   France    Male   47       3  130747.10                1            1              0         143829.54      0          5
2000 rows X 13 columns
2025-11-04 04:51:00,618 | INFO | Updated dataset after performing categorical encoding :
            gender  age    balance  products_number  active_member  estimated_salary  credit_card  tenure  automl_id  customer_id  churn  credit_score
country
0.159022  0.168019   62   64119.38                1              1          76569.64            1       1         50     15727299      1           445
0.159022  0.168019   35       0.00                2              1         140780.80            0       7         94     15611105      0           799
0.159022  0.168019   34  137523.02                1              0          24761.36            0       1         98     15721303      0           640
0.159022  0.246369   26  135219.57                1              1          59747.63            0       4        106     15749851      0           702
0.159022  0.168019   37  138207.08                1              0          60778.11            1       1        114     15583371      1           632
0.159022  0.168019   62       0.00                2              1         180243.56            1       5        118     15719793      0           850
0.323218  0.246369   25  152885.77                1              0          58214.79            1       5         41     15576990      0           790
0.323218  0.246369   33  150412.14                2              0         170764.08            1       9         57     15810457      0           728
0.323218  0.246369   26   97331.19                1              0          63717.49            1       1         89     15604314      0           703
0.323218  0.168019   36  115725.24                2              0           1871.25            0       5         93     15574868      0           792
2000 rows X 13 columns
2025-11-04 04:51:00,734 | INFO | Performing transformation carried out in data preparation phase ...
2025-11-04 04:51:01,544 | INFO | Updated dataset after performing RFE feature selection:
     automl_id  active_member  products_number  tenure  customer_id  credit_score  country  gender    balance  estimated_salary  churn
age
40         408              1                1       5     15784286           641   0.1590  0.1680  102145.13         100637.07      0
40         280              1                2       2     15806230           629   0.3232  0.1680  121647.54          64849.74      1
40         745              0                1       3     15619514           507   0.3232  0.1680  120105.43          92075.01      1
40         951              0                2       8     15610090           667   0.1670  0.1680   72945.29          98931.50      0
40         947              0                1       6     15645572           743   0.1670  0.2464       0.00          28280.80      1
40         642              0                1       4     15626612           741   0.1590  0.1680  104784.23         135163.76      1
40        1590              0                3       5     15765300           596   0.3232  0.1680   62389.03         148623.43      1
40         707              1                2       2     15679733           796   0.3232  0.1680  113228.38          46415.09      0
40         644              1                1       1     15577771           453   0.3232  0.2464  111524.49         120373.84      1
40         210              1                2       3     15739857           785   0.1670  0.2464       0.00          96832.82      0
2000 rows X 12 columns
2025-11-04 04:51:02,510 | INFO | Updated dataset after performing scaling on RFE selected features :
   automl_id  churn      r_age  r_active_member  r_products_number  r_tenure  r_customer_id  r_credit_score  r_country  r_gender  r_balance  r_estimated_salary
0       1289      0   1.028571              1.0                0.0       0.2       1.036606       -0.132716   1.000000       1.0   0.841570            0.111099
1       1491      1   0.257143              0.0                0.0       0.7       0.261669        0.413580   1.000000       1.0   0.603265            0.716348
2        639      0   0.257143              1.0                0.0       0.6       0.048218        1.117284   0.048721       0.0   0.000000            0.238185
3        210      0   0.428571              1.0                1.0       0.3       0.718651        0.916667   0.048721       1.0   0.000000            0.482716
4        408      0   0.428571              1.0                0.0       0.5       0.917377        0.472222   0.000000       0.0   0.624475            0.503840
5       1605      0  -0.171429              0.0                1.0       0.8       0.857807        0.780864   1.000000       1.0   0.664620            0.083043
6        700      0  -0.171429              1.0                1.0       0.2       0.513099        0.336420   0.048721       0.0   0.000000            0.450530
7         86      1  -0.171429              1.0                0.0       0.9       0.909021        0.311728   1.000000       1.0   0.510456            0.739201
8        644      1   0.428571              1.0                0.0       0.1      -0.006343       -0.108025   1.000000       1.0   0.681817            0.613436
9       1201      0   0.257143              0.0                0.0       0.1       0.567972        0.456790   0.048721       0.0   0.513882            0.153196
2000 rows X 12 columns
2025-11-04 04:51:03,878 | INFO | Updated dataset after performing scaling for PCA feature selection :
   automl_id  churn    country     gender        age   balance  products_number  active_member  estimated_salary  credit_card  tenure  customer_id  credit_score
0         86      1   0.999890   1.000394  -0.171429  0.510456              0.0            1.0          0.739201          1.0     0.9     0.909021      0.311728
1       1611      1   0.048586   1.000394   1.028571  0.492381              0.0            0.0          0.674138          1.0     0.7     0.733765      0.614198
2       1289      0   0.999890   1.000394   1.028571  0.841570              0.0            1.0          0.111099          0.0     0.2     1.036606     -0.132716
3       1201      0   0.048586  -0.000247   0.257143  0.513882              0.0            0.0          0.153196          0.0     0.1     0.567972      0.456790
4        639      0   0.048586  -0.000247   0.257143  0.000000              0.0            1.0          0.238185          0.0     0.6     0.048218      1.117284
5        210      0   0.048586   1.000394   0.428571  0.000000              1.0            1.0          0.482716          1.0     0.3     0.718651      0.916667
6        644      1   0.999890   1.000394   0.428571  0.681817              0.0            1.0          0.613436          1.0     0.1    -0.006343     -0.108025
7        408      0  -0.000137  -0.000247   0.428571  0.624475              0.0            1.0          0.503840          1.0     0.5     0.917377      0.472222
8       1491      1   0.999890   1.000394   0.257143  0.603265              0.0            0.0          0.716348          1.0     0.7     0.261669      0.413580
9        827      1   0.048586   1.000394   1.028571  0.398548              0.0            1.0          0.302111          1.0     0.8     0.498969      0.148148
2000 rows X 13 columns
2025-11-04 04:51:04,362 | INFO | Updated dataset after performing PCA feature selection :
   automl_id      col_0      col_1      col_2      col_3      col_4      col_5      col_6      col_7      col_8      col_9  churn
0        827  -0.038034   0.395402   0.476988   0.489123  -0.501875   0.269476  -0.107793  -0.241479  -0.106426  -0.339520      1
1       1201  -0.311645  -0.164746  -0.390991  -0.852924  -0.363499  -0.442413   0.033928  -0.316707   0.108449  -0.025788      0
2        210   0.881702   0.005745   0.554778   0.457411  -0.006528  -0.144999   0.249513  -0.066713  -0.097130   0.415885      0
3       1605  -0.106923  -0.524032   0.365589   0.342021   0.978652   0.282409   0.164703  -0.525404   0.128984   0.246235      0
4       1611  -0.492504  -0.464567   0.357513   0.300643  -0.400913   0.254261   0.237596   0.045910   0.015132   0.114967      1
5       1491  -0.841408  -0.242276   0.343362   0.354427   0.349340   0.190090  -0.199311   0.234644  -0.249660  -0.076840      1
6        644  -0.471611   0.670113   0.487760   0.530222   0.288229  -0.444138  -0.363210   0.282901  -0.279595  -0.583980      1
7        700   1.040975   0.125794  -0.420409   0.247582   0.018582  -0.307925   0.055894   0.011868  -0.085413  -0.166389      0
8       1289  -0.560369   0.880995   0.707058  -0.433059   0.333166  -0.249209   0.442741  -0.547914  -0.218182  -0.606135      0
9        639   0.305519   0.588268  -0.249405  -0.659160  -0.574371   0.014322  -0.512917  -0.130990  -0.291023   0.662586      0
10 rows X 12 columns
2025-11-04 04:51:04,806 | INFO | Data Transformation completed.█████| 100% - 9/9
2025-11-04 04:51:05,416 | INFO | Following model is being picked for evaluation:
2025-11-04 04:51:05,416 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 04:51:05,416 | INFO | Feature Selection Method : rfe
2025-11-04 04:51:06,478 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 04:51:11,383 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 04:51:11,476 | INFO | Prediction :
   automl_id  prediction  prob_1  prob_0  churn
0        639           0     0.0     1.0      0
1        644           1     1.0     0.0      1
2        408           0     0.0     1.0      0
3       1605           0     0.0     1.0      0
4         86           0     0.0     1.0      1
5        827           1     1.0     0.0      1
6       1611           1     1.0     0.0      1
7       1289           1     1.0     0.0      0
8        700           0     0.0     1.0      0
9        210           0     0.0     1.0      0
2025-11-04 04:51:15,457 | INFO | ROC-AUC :
              GINI
AUC
0.662776  0.325553
   threshold_value      tpr       fpr
0         0.040816  0.58867  0.165621
1         0.081633  0.58867  0.165621
2         0.102041  0.58867  0.165621
3         0.122449  0.58867  0.165621
4         0.163265  0.58867  0.165621
5         0.183673  0.58867  0.165621
6         0.142857  0.58867  0.165621
7         0.061224  0.58867  0.165621
8         0.020408  0.58867  0.165621
9         0.000000  1.00000  1.000000
2025-11-04 04:51:16,852 | INFO | Confusion Matrix :
[[1330  264]
 [ 167  239]]
>>> prediction.head()
   automl_id  prediction  prob_1  prob_0  churn
0        639           0     0.0     1.0      0
1        644           1     1.0     0.0      1
2        408           0     0.0     1.0      0
3       1605           0     0.0     1.0      0
4         86           0     0.0     1.0      1
5        827           1     1.0     0.0      1
6       1611           1     1.0     0.0      1
7       1289           1     1.0     0.0      0
8        700           0     0.0     1.0      0
9        210           0     0.0     1.0      0
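In the predict log, some scaled test values fall outside the 0 to 1 range, for example r_age values of 1.028571 and -0.171429. That is what happens when a range scaler fitted on the training data is reused on unseen data containing values beyond the training minimum or maximum. The sketch below assumes min-max scaling, which the source does not state explicitly. The ages are hypothetical, chosen only so that the results resemble values in the log.

```python
def fit_min_max(values):
    """Learn the scaling range from training values only."""
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    """Scale values with a previously learned range; no clipping is applied."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical ages: the real training range is not shown in the source.
train_ages = [25, 40, 60]
test_ages = [19, 45, 61]

lo, hi = fit_min_max(train_ages)
scaled_train = apply_min_max(train_ages, lo, hi)
scaled_test = apply_min_max(test_ages, lo, hi)

print([round(v, 6) for v in scaled_train])
print([round(v, 6) for v in scaled_test])
```

Training values stay within [0, 1] by construction, while a test value below the training minimum becomes negative and one above the training maximum exceeds 1.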
- Generate evaluation metrics on test dataset using best performing model.
>>> performance_metrics = ch.evaluate(bank_test)
2025-11-04 04:51:56,301 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 04:51:56,849 | INFO | Following model is being picked for evaluation:
2025-11-04 04:51:56,849 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 04:51:56,849 | INFO | Feature Selection Method : rfe
2025-11-04 04:52:03,186 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
1                1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.784500
1       5     Macro-Precision     0.681796
2       6        Macro-Recall     0.711524
3       7            Macro-F1     0.693208
4       9     Weighted-Recall     0.784500
5      10         Weighted-F1     0.792617
6       8  Weighted-Precision     0.804545
7       4            Micro-F1     0.784500
8       2     Micro-Precision     0.784500
9       1            Accuracy     0.784500
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
1                1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
0                0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
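The per-class and macro-averaged figures reported by evaluate() can be recomputed from the confusion matrix printed in the predict log. The sketch below uses the standard definitions of precision, recall, and F1. It assumes rows of the matrix are actual classes and columns are predicted classes, which is consistent with the Support column (1594 and 406). The function name `per_class_metrics` is only for this illustration.

```python
# Confusion matrix from the log: rows = actual class, columns = predicted class
# (assumed orientation; row sums match the Support column).
confusion = [[1330, 264],
             [167, 239]]

def per_class_metrics(cm):
    """Return precision, recall and F1 for each class of a square confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": actual_k})
    return metrics

metrics = per_class_metrics(confusion)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total

# Macro averages weight every class equally, regardless of support.
macro_precision = sum(m["precision"] for m in metrics) / len(metrics)
macro_recall = sum(m["recall"] for m in metrics) / len(metrics)
macro_f1 = sum(m["f1"] for m in metrics) / len(metrics)

print(round(accuracy, 6), round(macro_precision, 6),
      round(macro_recall, 6), round(macro_f1, 6))
```

On the test data the macro-F1 works out to about 0.693, lower than the 0.812 shown on the leaderboard for the same model, mainly because recall and precision for the churn class (label 1) are much weaker than for the non-churn class.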
- Generate prediction on test dataset using second best performing model.
>>> prediction = ch.predict(bank_test, 2)
2025-11-04 04:52:32,427 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 04:52:32,976 | INFO | Following model is being picked for evaluation:
2025-11-04 04:52:32,976 | INFO | Model ID : DECISIONFOREST_2
2025-11-04 04:52:32,976 | INFO | Feature Selection Method : rfe
2025-11-04 04:52:33,789 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 04:52:37,270 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 04:52:37,366 | INFO | Prediction :
   automl_id  prediction  prob_1  prob_0  churn
0       1289           1     1.0     0.0      0
1       1491           0     0.0     1.0      1
2        639           0     0.0     1.0      0
3        210           0     0.0     1.0      0
4        408           0     0.0     1.0      0
5       1605           0     0.0     1.0      0
6        700           0     0.0     1.0      0
7         86           0     0.0     1.0      1
8        644           1     1.0     0.0      1
9       1201           0     0.0     1.0      0
2025-11-04 04:52:41,136 | INFO | ROC-AUC :
              GINI
AUC
0.662776  0.325553
   threshold_value      tpr       fpr
0         0.040816  0.58867  0.165621
1         0.081633  0.58867  0.165621
2         0.102041  0.58867  0.165621
3         0.122449  0.58867  0.165621
4         0.163265  0.58867  0.165621
5         0.183673  0.58867  0.165621
6         0.142857  0.58867  0.165621
7         0.061224  0.58867  0.165621
8         0.020408  0.58867  0.165621
9         0.000000  1.00000  1.000000
2025-11-04 04:52:42,530 | INFO | Confusion Matrix :
[[1330  264]
 [ 167  239]]
>>> prediction.head()
   automl_id  prediction  prob_1  prob_0  churn
0          6           1     1.0     0.0      1
1          8           0     0.0     1.0      0
2          9           1     1.0     0.0      1
3         10           1     1.0     0.0      0
4         12           0     0.0     1.0      0
5         13           0     0.0     1.0      0
6         11           0     0.0     1.0      0
7          7           0     0.0     1.0      0
8          5           1     1.0     0.0      0
9          4           0     0.0     1.0      0
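The ROC-AUC entry in the predict log reports two numbers, an AUC of 0.662776 and a GINI of 0.325553. They are consistent with the standard relation Gini = 2 * AUC - 1, as the short check below shows. The function name `gini_from_auc` is hypothetical, and reading the first number as the AUC is an inference from that relation.

```python
def gini_from_auc(auc):
    """Gini coefficient implied by an AUC value (standard relation 2*AUC - 1)."""
    return 2 * auc - 1

auc = 0.662776  # AUC value from the log
gini = gini_from_auc(auc)
print(round(gini, 5))
```

The result agrees with the logged GINI up to rounding of the displayed AUC.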
- Generate evaluation metrics on test dataset using second best performing model.
>>> performance_metrics = ch.evaluate(bank_test, 2)
2025-11-04 04:53:23,452 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 04:53:24,000 | INFO | Following model is being picked for evaluation:
2025-11-04 04:53:24,001 | INFO | Model ID : DECISIONFOREST_2
2025-11-04 04:53:24,001 | INFO | Feature Selection Method : rfe
2025-11-04 04:53:30,301 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
1                1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.784500
1       5     Macro-Precision     0.681796
2       6        Macro-Recall     0.711524
3       7            Macro-F1     0.693208
4       9     Weighted-Recall     0.784500
5      10         Weighted-F1     0.792617
6       8  Weighted-Precision     0.804545
7       4            Micro-F1     0.784500
8       2     Micro-Precision     0.784500
9       1            Accuracy     0.784500
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum
0                0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
1                1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
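The weighted metrics in the evaluate() log can be reproduced from the per-class table above by averaging each column with the class support as the weight. The sketch below uses the rounded per-class values exactly as printed, so the results agree with the log only up to rounding. The helper `weighted_average` is a hypothetical name for this illustration.

```python
# Per-class rows copied from the performance_metrics output (rounded values).
rows = [
    {"label": 0, "precision": 0.888444, "recall": 0.834379,
     "f1": 0.860563, "support": 1594},
    {"label": 1, "precision": 0.475149, "recall": 0.588670,
     "f1": 0.525853, "support": 406},
]

def weighted_average(rows, key):
    """Average a per-class metric, weighting each class by its support."""
    total_support = sum(r["support"] for r in rows)
    return sum(r[key] * r["support"] for r in rows) / total_support

weighted_precision = weighted_average(rows, "precision")
weighted_recall = weighted_average(rows, "recall")
weighted_f1 = weighted_average(rows, "f1")

print(round(weighted_precision, 4), round(weighted_recall, 4),
      round(weighted_f1, 4))
```

Because class 0 accounts for 1594 of the 2000 test rows, the weighted scores sit much closer to the class 0 values than the macro scores do. The weighted recall equals the overall accuracy of 0.7845.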