Example 9: Run AutoChurn for churn prediction problem with early stopping timer and metrics threshold

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Product Category: Teradata Vantage
This example predicts whether a customer will churn based on several factors. Run AutoChurn to get the best-performing model with the following specifications:
  • Set the early stopping criteria: a time limit of 100 seconds and a threshold of 0.7 for the MACRO-F1 performance metric.
  • Set the verbose level to 2 to get detailed logging.
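The two stopping criteria can be pictured as a loop that ends when either the time budget is spent or a model reaches the metric threshold. The following is a minimal, generic sketch in plain Python. It does not reproduce AutoChurn's internal logic, and `run_with_early_stopping`, `evaluate`, and the scores are hypothetical names and values used only for illustration.

```python
import time


def run_with_early_stopping(candidates, evaluate, max_runtime_secs,
                            metric_threshold, clock=time.monotonic):
    """Evaluate candidates in order; stop on the time limit or the metric threshold."""
    start = clock()
    results = []
    for name in candidates:
        # Criterion 1: stop once the time budget has been used up.
        if clock() - start >= max_runtime_secs:
            reason = "time limit reached"
            break
        score = evaluate(name)
        results.append((name, score))
        # Criterion 2: stop as soon as a model meets the metric threshold.
        if score >= metric_threshold:
            reason = "metric threshold reached"
            break
    else:
        reason = "all candidates evaluated"
    return results, reason


# Hypothetical scores, for illustration only (not taken from any real run).
scores = {"glm": 0.62, "svm": 0.66, "decision_forest": 0.81, "xgboost": 0.80}
results, reason = run_with_early_stopping(
    list(scores), scores.get, max_runtime_secs=100, metric_threshold=0.7)
print(results)
print(reason)
```

In this sketch the checks run once per candidate model. AutoChurn may apply its limits at a different granularity, so treat the loop as a conceptual picture only.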
  1. Load the churn dataset.
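    >>> # Assumes an active Vantage connection (for example, one created with create_context).
    >>> from teradataml import AutoChurn, DataFrame, load_example_data
    >>> load_example_data('teradataml', 'bank_churn')
    >>> bank_df = DataFrame("bank_churn")
    >>> # Split the rows 80/20; sample() adds a 'sampleid' column (1 = first fraction, 2 = second).
    >>> bank_df_sample = bank_df.sample(frac=[0.8, 0.2])
    >>> bank_train = bank_df_sample[bank_df_sample['sampleid'] == 1].drop('sampleid', axis=1)
    >>> bank_test = bank_df_sample[bank_df_sample['sampleid'] == 2].drop('sampleid', axis=1)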
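As a rough analogy for the split above, the sketch below tags each row index with a sample ID according to the requested fractions and then filters on that ID. It is plain Python on a toy list, not the in-database sampling that teradataml performs; `tag_samples` and its rounding rule are assumptions made for illustration.

```python
import random


def tag_samples(n_rows, fractions, seed=None):
    """Return one sample ID (1, 2, ...) per row; rows not covered stay None."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # random order, reproducible with a seed
    sampleid = [None] * n_rows
    start = 0
    for sid, frac in enumerate(fractions, start=1):
        count = round(n_rows * frac)  # assumed rounding rule
        for idx in order[start:start + count]:
            sampleid[idx] = sid
        start += count
    return sampleid


ids = tag_samples(10, [0.8, 0.2], seed=42)
train_rows = [i for i, s in enumerate(ids) if s == 1]  # like sampleid == 1
test_rows = [i for i, s in enumerate(ids) if s == 2]   # like sampleid == 2
print(len(train_rows), len(test_rows))
```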
  2. Create an AutoChurn instance.
    >>> ch = AutoChurn(verbose=2,
    ...                max_runtime_secs=100,
    ...                stopping_metric='MACRO-F1',
    ...                stopping_tolerance=0.7,
    ...                seed=42)
  3. Fit the data.
    >>> ch.fit(bank_train, bank_train.churn)
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:44:36,274 | INFO     | Feature Exploration started
    2025-11-04 04:44:36,274 | INFO     | Data Overview:
    2025-11-04 04:44:36,400 | INFO     | Total Rows in the data: 8000
    2025-11-04 04:44:36,442 | INFO     | Total Columns in the data: 12
    2025-11-04 04:44:37,074 | INFO     | Column Summary:
              ColumnName                          Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
    0              churn                            BIGINT          8000          0         NaN     6369.0         1631.0            0.0             0.0              100.0
    1            country  VARCHAR(256) CHARACTER SET LATIN          8000          0         0.0        NaN            NaN            NaN             0.0              100.0
    2             gender   VARCHAR(20) CHARACTER SET LATIN          8000          0         0.0        NaN            NaN            NaN             0.0              100.0
    3             tenure                            BIGINT          8000          0         NaN      314.0         7686.0            0.0             0.0              100.0
    4                age                           INTEGER          8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
    5    products_number                            BIGINT          8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
    6        customer_id                            BIGINT          8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
    7            balance                             FLOAT          8000          0         NaN     2887.0         5113.0            0.0             0.0              100.0
    8        credit_card                            BIGINT          8000          0         NaN     2363.0         5637.0            0.0             0.0              100.0
    9       credit_score                            BIGINT          8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
    10  estimated_salary                             FLOAT          8000          0         NaN        0.0         8000.0            0.0             0.0              100.0
    11     active_member                            BIGINT          8000          0         NaN     3880.0         4120.0            0.0             0.0              100.0
    2025-11-04 04:44:37,906 | INFO     | Statistics of Data:
           ATTRIBUTE StatName  StatValue
    0    credit_card  MAXIMUM        1.0
    1            age  MINIMUM       18.0
    2            age  MAXIMUM       92.0
    3          churn    COUNT     8000.0
    4          churn  MAXIMUM        1.0
    5  active_member    COUNT     8000.0
    6  active_member  MINIMUM        0.0
    7  active_member  MAXIMUM        1.0
    8          churn  MINIMUM        0.0
    9            age    COUNT     8000.0
    2025-11-04 04:44:38,499 | INFO     | Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    country                   3
    gender                    2
    2025-11-04 04:44:40,968 | INFO     | No Futile columns found.
    2025-11-04 04:44:44,736 | INFO     | Columns with outlier percentage :-
             ColumnName  OutlierPercentage
    0   products_number             0.5875
    1       customer_id             1.9875
    2           balance             1.0000
    3  estimated_salary             1.9875
    4               age             1.6875
    5      credit_score             0.8875
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:44:45,119 | INFO     | Feature Engineering started ...
    2025-11-04 04:44:45,119 | INFO     | Handling duplicate records present in dataset ...
    2025-11-04 04:44:45,315 | INFO     | Analysis completed. No action taken.
    2025-11-04 04:44:45,315 | INFO     | Total time to handle duplicate records: 0.20 sec
    2025-11-04 04:44:45,316 | INFO     | Handling less significant features from data ...
    2025-11-04 04:44:49,520 | INFO     | Analysis indicates all categorical columns are significant. No action Needed.
    2025-11-04 04:44:49,520 | INFO     | Total time to handle less significant features: 4.20 sec
    2025-11-04 04:44:49,520 | INFO     | Handling Date Features ...
    2025-11-04 04:44:49,520 | INFO     | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    2025-11-04 04:44:49,520 | INFO     | Total time to handle date features: 0.00 sec
    2025-11-04 04:44:49,520 | INFO     | Checking Missing values in dataset using AutoChurn function...
    2025-11-04 04:44:51,510 | INFO     | Analysis Completed. No Missing Values Detected.
    2025-11-04 04:44:51,510 | INFO     | Total time to find missing values in data using AutoChurn : 1.99 sec
    2025-11-04 04:44:51,511 | INFO     | Imputing Missing Values using SimpleImputeFit partition column...
    2025-11-04 04:44:51,511 | INFO     | Analysis completed. No imputation required.
    2025-11-04 04:44:51,511 | INFO     | Time taken to perform imputation: 0.00 sec
    2025-11-04 04:44:51,511 | INFO     | Performing target encoding for categorical columns ...
    2025-11-04 04:44:56,597 | INFO     | Target Encoding completed for categorical columns using CBM_BETA.
    2025-11-04 04:44:56,597 | INFO     | Target Encoding these Columns:
    ['country', 'gender']
    2025-11-04 04:44:56,598 | INFO     | Sample of dataset after performing target encoding:
               gender  age    balance  products_number  active_member  estimated_salary  credit_card  tenure  automl_id  customer_id  churn  credit_score
    country
    0.16705  0.246369   45       0.00                1              0          73881.68            1       6         45     15737047      1           754
    0.16705  0.168019   36       0.00                2              0          35156.54            1       3         73     15569364      0           666
    0.16705  0.168019   41  144147.68                1              1          14789.90            1       5         77     15728523      0           522
    0.16705  0.168019   34       0.00                2              1          91711.66            1       5         81     15793247      0           498
    0.16705  0.246369   43       0.00                2              0           2465.80            0       3         97     15684925      0           850
    0.16705  0.168019   48  148116.48                1              0         116973.48            0       0        117     15722548      0           540
    0.16705  0.168019   29       0.00                2              0         172097.40            0       9         89     15591428      0           781
    0.16705  0.168019   34  117468.67                1              0         185227.42            1       2         57     15713637      0           699
    0.16705  0.246369   47       0.00                1              1          66408.01            1       8         37     15644692      1           546
    0.16705  0.246369   23       0.00                2              1         141756.32            1       1         17     15675749      0           695
    8000 rows X 13 columns
    2025-11-04 04:44:56,732 | INFO     | Time taken to encode the columns: 5.22 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:44:56,732 | INFO     | Data preparation started ...
    2025-11-04 04:44:56,733 | INFO     | AutoChurn Outlier preprocessing using Percentile...
    2025-11-04 04:45:00,566 | INFO     | Columns with outlier percentage :-
             ColumnName  OutlierPercentage
    0           balance             1.0000
    1       customer_id             1.9875
    2      credit_score             0.8875
    3   products_number             0.5875
    4         automl_id             1.9875
    5               age             1.6875
    6  estimated_salary             1.9875
    2025-11-04 04:45:01,097 | INFO     | Replacing outliers with median:
    ['automl_id', 'products_number', 'age', 'estimated_salary', 'customer_id', 'balance', 'credit_score']
    2025-11-04 04:45:04,102 | INFO     | Sample of dataset after replacing outliers with MEDIAN:
                gender  age    balance  products_number  active_member  estimated_salary  credit_card  tenure  automl_id  customer_id  churn  credit_score
    country
    0.167050  0.168019   41  144147.68                1              1          14789.90            1       5       4003     15728523      0           522
    0.167050  0.246369   43       0.00                2              0         100129.33            0       3       4003     15684925      0           652
    0.167050  0.168019   48  148116.48                1              0         116973.48            0       0       4003     15722548      0           540
    0.167050  0.168019   37       0.00                1              0         120906.83            0       2       4003     15730460      0           722
    0.167050  0.168019   39  139153.68                2              0         147662.33            1       3       4003     15614491      0           539
    0.167050  0.246369   33       0.00                1              0         142797.50            0       9       4003     15772243      1           612
    0.323218  0.168019   42  117691.00                1              1          23135.65            1       1       4003     15801832      1           684
    0.323218  0.246369   34   97440.67                1              0         100129.33            1       2       4003     15690198      0           790
    0.323218  0.168019   41  139706.31                1              0          63337.19            1       7       4003     15587451      0           778
    0.323218  0.246369   28  105173.99                1              1          29835.37            0      10       4003     15709252      1           616
    8000 rows X 13 columns
    2025-11-04 04:45:04,235 | INFO     | Time Taken by Outlier processing: 7.50 sec
    2025-11-04 04:45:04,236 | INFO     | Checking imbalance data ...
    2025-11-04 04:45:04,319 | INFO     | Imbalance Found.
    2025-11-04 04:45:04,319 | INFO     | Handling data imbalance using SMOTE ...
    2025-11-04 04:45:10,751 | INFO     | Completed data imbalance handling.
    2025-11-04 04:45:12,647 | INFO     | Feature selection using rfe ...
    2025-11-04 04:46:16,732 | INFO     | feature selected by RFE:
    ['age', 'active_member', 'products_number', 'tenure', 'customer_id', 'credit_score', 'country', 'gender', 'balance', 'estimated_salary']
    2025-11-04 04:46:16,734 | INFO     | Total time taken by feature selection: 64.09 sec
    2025-11-04 04:46:17,682 | INFO     | Scaling Features of rfe data ...
    2025-11-04 04:46:19,238 | INFO     | columns that will be scaled:
    ['r_age', 'r_active_member', 'r_products_number', 'r_tenure', 'r_customer_id', 'r_credit_score', 'r_country', 'r_gender', 'r_balance', 'r_estimated_salary']
    2025-11-04 04:46:21,237 | INFO     | Dataset sample after scaling:
       automl_id  churn     r_age  r_active_member  r_products_number  r_tenure  r_customer_id  r_credit_score  r_country  r_gender  r_balance  r_estimated_salary
    0          6      1  0.571429              0.0                0.0       0.6       0.706082        0.820988   0.048721       1.0   0.000000            0.355270
    1          8      0  0.228571              1.0                1.0       0.6       0.605200        0.506173   0.000000       1.0   0.000000            0.475320
    2          9      0  0.257143              0.0                0.0       0.2       0.496531        0.932099   1.000000       1.0   0.595714            0.501021
    3         10      0  0.314286              0.0                1.0       0.3       0.496531        0.549383   0.048721       0.0   0.000000            0.140234
    4         12      0  0.485714              0.0                1.0       0.3       0.183988        0.506173   0.000000       1.0   0.510438            0.155546
    5         13      0  0.457143              0.0                0.0       0.7       0.036955        0.895062   1.000000       0.0   0.854109            0.296718
    6         11      1  0.657143              1.0                1.0       0.2       0.443501        0.746914   0.058465       1.0   0.200035            0.576641
    7          7      1  0.428571              0.0                0.0       0.3       0.536094        0.438272   0.017052       1.0   0.000000            0.377752
    8          5      1  0.485714              1.0                0.0       0.1       0.995858        0.604938   1.000000       0.0   0.719516            0.073483
    9          4      0  0.085714              0.0                0.0       0.7       0.630740        0.274691   0.000000       0.0   0.000000            0.740601
    10928 rows X 12 columns
    2025-11-04 04:46:22,554 | INFO     | Total time taken by feature scaling: 4.87 sec
    2025-11-04 04:46:22,554 | INFO     | Scaling Features of pca data ...
    2025-11-04 04:46:23,607 | INFO     | columns that will be scaled:
    ['country', 'gender', 'age', 'balance', 'products_number', 'active_member', 'estimated_salary', 'credit_card', 'tenure', 'customer_id', 'credit_score']
    2025-11-04 04:46:25,741 | INFO     | Dataset sample after scaling:
       automl_id  churn   country    gender       age   balance  products_number  active_member  estimated_salary  credit_card  tenure  customer_id  credit_score
    0       8012      1  0.183401  0.000000  0.571429  0.555790              0.0            1.0          0.500402          0.0     0.4     0.792462      0.296296
    1       7965      1  0.005166  1.000000  0.485714  0.000000              0.0            1.0          0.571366          1.0     0.5     0.818096      0.148148
    2       7969      1  0.013981  0.258173  0.714286  0.000000              0.0            0.0          0.474954          0.0     0.7     0.884747      0.172840
    3          7      1  0.017007  1.000000  0.428571  0.000000              0.0            0.0          0.377752          0.0     0.3     0.536094      0.438272
    4         15      1  0.025299  0.000000  0.457143  0.471352              0.0            0.0          0.504431          1.0     0.3     0.490506      0.160494
    5      16050      1  0.013621  1.000000  0.571429  0.000000              0.0            1.0          0.896129          0.0     0.5     0.279515      0.175926
    6      16054      1  0.033517  1.000000  0.285714  0.000000              0.0            0.0          0.448751          0.0     0.7     0.516266      0.429012
    7      16058      1  0.404530  0.618559  0.257143  0.720069              0.0            0.0          0.250664          1.0     0.5     0.336782      0.416667
    8         11      1  0.058308  1.000000  0.657143  0.200035              1.0            1.0          0.576641          1.0     0.2     0.443501      0.746914
    9       7961      1  0.067613  0.330397  0.571429  0.872392              1.0            0.0          0.748723          1.0     0.9     0.466008      0.354938
    10928 rows X 13 columns
    2025-11-04 04:46:26,573 | INFO     | Total time taken by feature scaling: 4.02 sec
    2025-11-04 04:46:26,573 | INFO     | Dimension Reduction using pca ...
    2025-11-04 04:46:27,412 | INFO     | PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9']
    2025-11-04 04:46:27,413 | INFO     | Total time taken by PCA: 0.84 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 04:46:28,313 | INFO     | Model Training started ...
    2025-11-04 04:46:28,356 | INFO     | Hyperparameters used for model training:
    2025-11-04 04:46:28,356 | INFO     | Model: glm
    2025-11-04 04:46:28,356 | INFO     | Hyperparameters: {'response_column': 'churn', 'name': 'glm', 'family': 'BINOMIAL', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50), 'iter_max': (300, 400), 'batch_size': (10, 100, 150)}
    2025-11-04 04:46:28,357 | INFO     | Total number of models for glm: 648
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:46:28,357 | INFO     | Model: svm
    2025-11-04 04:46:28,357 | INFO     | Hyperparameters: {'response_column': 'churn', 'name': 'svm', 'model_type': 'Classification', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'tolerance': (0.001, 0.01), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'nesterov': True, 'intercept': True, 'iter_num_no_change': (5, 10, 50), 'local_sgd_iterations ': (10, 20), 'iter_max': (300, 400), 'batch_size': (10, 100, 150)}
    2025-11-04 04:46:28,357 | INFO     | Total number of models for svm: 2592
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:46:28,358 | INFO     | Model: knn
    2025-11-04 04:46:28,358 | INFO     | Hyperparameters: {'response_column': 'churn', 'name': 'knn', 'model_type': 'Classification', 'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0}
    2025-11-04 04:46:28,358 | INFO     | Total number of models for knn: 6
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:46:28,358 | INFO     | Model: decision_forest
    2025-11-04 04:46:28,358 | INFO     | Hyperparameters: {'response_column': 'churn', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.2, 0.3), 'max_depth': (5, 4, 6, 7), 'min_node_size': (1, 3, 4), 'num_trees': (-1,), 'seed': 42}
    2025-11-04 04:46:28,358 | INFO     | Total number of models for decision_forest: 36
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:46:28,358 | INFO     | Model: xgboost
    2025-11-04 04:46:28,358 | INFO     | Hyperparameters: {'response_column': 'churn', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.2, 0.3), 'lambda1': (1.0, 0.1, 1.0), 'shrinkage_factor': (0.5, 0.01, 0.1, 0.2), 'max_depth': (5, 4, 6, 7), 'min_node_size': (1, 3, 4), 'iter_num': (10, 30, 40), 'num_boosted_trees': (-1, 10, 20), 'seed': 42}
    2025-11-04 04:46:28,360 | INFO     | Total number of models for xgboost: 7776
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 04:46:28,360 | INFO     | Performing hyperparameter tuning ...
    2025-11-04 04:46:29,618 | INFO     | Model training for glm
    2025-11-04 04:46:48,368 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:46:48,368 | INFO     | Model training for svm
    2025-11-04 04:47:13,008 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:47:13,009 | INFO     | Model training for knn
    2025-11-04 04:48:27,858 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:48:27,858 | INFO     | Model training for decision_forest
    2025-11-04 04:48:50,857 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:48:50,857 | INFO     | Model training for xgboost
    2025-11-04 04:49:09,178 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 04:49:09,181 | INFO     | Leaderboard
        RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0      1  DECISIONFOREST_0               rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
    1      2  DECISIONFOREST_2               rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
    2      3         XGBOOST_0               rfe  0.811070         0.811070  ...      0.816414  0.809047            0.819893         0.811070     0.812302
    3      4         XGBOOST_2               rfe  0.809241         0.809241  ...      0.817025  0.807742            0.821520         0.809241     0.810553
    4      5             KNN_4               rfe  0.788655         0.788655  ...      0.780982  0.781926            0.788047         0.788655     0.788270
    5      6             KNN_0               rfe  0.787283         0.787283  ...      0.778559  0.780028            0.786415         0.787283     0.786643
    6      7         XGBOOST_1               pca  0.772187         0.772187  ...      0.774954  0.769313            0.779456         0.772187     0.773577
    7      8         XGBOOST_3               pca  0.768984         0.768984  ...      0.774544  0.766894            0.779825         0.768984     0.770550
    8      9  DECISIONFOREST_1               pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
    9     10  DECISIONFOREST_3               pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
    10    11             KNN_7               pca  0.750229         0.750229  ...      0.738824  0.740750            0.748694         0.750229     0.748959
    11    12             KNN_3               pca  0.745197         0.745197  ...      0.735130  0.736415            0.744002         0.745197     0.744382
    12    13             GLM_5               pca  0.718664         0.718664  ...      0.715327  0.713153            0.721856         0.718664     0.719737
    13    14             GLM_0               rfe  0.710430         0.710430  ...      0.688011  0.691238            0.708035         0.710430     0.703986
    14    15             SVM_4               rfe  0.698079         0.698079  ...      0.716983  0.698072            0.735644         0.698079     0.698301
    15    16             SVM_5               pca  0.693047         0.693047  ...      0.659700  0.660269            0.694934         0.693047     0.677744
    16    17             GLM_3               pca  0.681153         0.681153  ...      0.688130  0.679500            0.697436         0.681153     0.683311
    17    18             SVM_3               pca  0.666057         0.666057  ...      0.630002  0.627414            0.664701         0.666057     0.647284
    18    19             SVM_1               pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
    19    20             SVM_7               pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
    20    21             GLM_1               pca  0.639067         0.639067  ...      0.620867  0.621961            0.634351         0.639067     0.635278
    21    22             GLM_4               rfe  0.600640         0.600640  ...      0.652393  0.583213            0.756020         0.600640     0.569100
    22    23             SVM_2               rfe  0.547575         0.547575  ...      0.609048  0.511899            0.750107         0.547575     0.490046
    23    24             GLM_2               rfe  0.526532         0.526532  ...      0.592241  0.480144            0.755583         0.526532     0.454428
    24    25             SVM_0               rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
    25    26             SVM_6               rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
    [26 rows x 13 columns]
    26 rows X 13 columns
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 16/16
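The "Total number of models" values in the log above match the product of the number of candidate values for each tuple-valued hyperparameter. The sketch below reproduces that arithmetic for three of the search spaces copied from the log. `grid_size` is a hypothetical helper, not a teradataml function, and how AutoChurn chooses combinations from each grid is not shown here.

```python
from math import prod


def grid_size(space):
    """Count combinations: tuples contribute their length, scalars contribute 1."""
    return prod(len(v) if isinstance(v, tuple) else 1 for v in space.values())


# Search spaces copied from the log output above.
glm_space = {
    'response_column': 'churn', 'name': 'glm', 'family': 'BINOMIAL',
    'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85),
    'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1),
    'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50),
    'iter_max': (300, 400), 'batch_size': (10, 100, 150),
}
knn_space = {
    'response_column': 'churn', 'name': 'knn', 'model_type': 'Classification',
    'k': (3, 5, 6, 8, 10, 12), 'id_column': 'automl_id', 'voting_weight': 1.0,
}
decision_forest_space = {
    'response_column': 'churn', 'name': 'decision_forest',
    'tree_type': 'Classification', 'min_impurity': (0.0, 0.2, 0.3),
    'max_depth': (5, 4, 6, 7), 'min_node_size': (1, 3, 4),
    'num_trees': (-1,), 'seed': 42,
}

print(grid_size(glm_space))              # 648
print(grid_size(knn_space))              # 6
print(grid_size(decision_forest_space))  # 36
```

The leaderboard lists 26 models, which suggests that only a subset of each grid is actually trained in this run.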
  4. Display leaderboard.
    >>> ch.leaderboard()
        RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0      1  DECISIONFOREST_0               rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
    1      2  DECISIONFOREST_2               rfe  0.817932         0.817932  ...      0.811864  0.812442            0.817630         0.817932     0.817756
    2      3         XGBOOST_0               rfe  0.811070         0.811070  ...      0.816414  0.809047            0.819893         0.811070     0.812302
    3      4         XGBOOST_2               rfe  0.809241         0.809241  ...      0.817025  0.807742            0.821520         0.809241     0.810553
    4      5             KNN_4               rfe  0.788655         0.788655  ...      0.780982  0.781926            0.788047         0.788655     0.788270
    5      6             KNN_0               rfe  0.787283         0.787283  ...      0.778559  0.780028            0.786415         0.787283     0.786643
    6      7         XGBOOST_1               pca  0.772187         0.772187  ...      0.774954  0.769313            0.779456         0.772187     0.773577
    7      8         XGBOOST_3               pca  0.768984         0.768984  ...      0.774544  0.766894            0.779825         0.768984     0.770550
    8      9  DECISIONFOREST_1               pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
    9     10  DECISIONFOREST_3               pca  0.752516         0.752516  ...      0.736892  0.740360            0.750839         0.752516     0.749663
    10    11             KNN_7               pca  0.750229         0.750229  ...      0.738824  0.740750            0.748694         0.750229     0.748959
    11    12             KNN_3               pca  0.745197         0.745197  ...      0.735130  0.736415            0.744002         0.745197     0.744382
    12    13             GLM_5               pca  0.718664         0.718664  ...      0.715327  0.713153            0.721856         0.718664     0.719737
    13    14             GLM_0               rfe  0.710430         0.710430  ...      0.688011  0.691238            0.708035         0.710430     0.703986
    14    15             SVM_4               rfe  0.698079         0.698079  ...      0.716983  0.698072            0.735644         0.698079     0.698301
    15    16             SVM_5               pca  0.693047         0.693047  ...      0.659700  0.660269            0.694934         0.693047     0.677744
    16    17             GLM_3               pca  0.681153         0.681153  ...      0.688130  0.679500            0.697436         0.681153     0.683311
    17    18             SVM_3               pca  0.666057         0.666057  ...      0.630002  0.627414            0.664701         0.666057     0.647284
    18    19             SVM_1               pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
    19    20             SVM_7               pca  0.654620         0.654620  ...      0.652905  0.649703            0.661418         0.654620     0.656576
    20    21             GLM_1               pca  0.639067         0.639067  ...      0.620867  0.621961            0.634351         0.639067     0.635278
    21    22             GLM_4               rfe  0.600640         0.600640  ...      0.652393  0.583213            0.756020         0.600640     0.569100
    22    23             SVM_2               rfe  0.547575         0.547575  ...      0.609048  0.511899            0.750107         0.547575     0.490046
    23    24             GLM_2               rfe  0.526532         0.526532  ...      0.592241  0.480144            0.755583         0.526532     0.454428
    24    25             SVM_0               rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
    25    26             SVM_6               rfe  0.525160         0.525160  ...      0.589350  0.481550            0.733600         0.525160     0.456650
    [26 rows x 13 columns]
  5. Display the best-performing model.
    >>> ch.leader()
       RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  DECISIONFOREST_0               rfe  0.817932         0.817932  ...      0.811864  0.812442             0.81763         0.817932     0.817756
    [1 rows x 13 columns]
  6. Get the hyperparameters of the trained models:
    • Display model hyperparameters for rank 1.
      >>> ch.model_hyperparameters(rank=1)
      {'response_column': 'churn', 
        'name': 'decision_forest', 
        'tree_type': 'Classification', 
        'min_impurity': 0.0, 
        'max_depth': 5, 
        'min_node_size': 1, 
        'num_trees': -1, 
        'seed': 42, 
        'persist': False, 
        'output_prob': True, 
        'output_responses': ['1', '0']}
      
    • Display model hyperparameters for rank 5.
      >>> ch.model_hyperparameters(rank=5)
      {'response_column': 'churn', 
        'name': 'knn', 
        'model_type': 'Classification', 
        'k': 5, 
        'id_column': 'automl_id', 
        'voting_weight': 1.0, 
        'persist': False, 
        'output_prob': True, 
        'output_responses': ['1', '0']}
      
  7. Generate predictions on the test dataset using the best-performing model.
    >>> prediction = ch.predict(bank_test)
    2025-11-04 04:50:58,466 | INFO     | Data Transformation started ...
    2025-11-04 04:50:58,467 | INFO     | Performing transformation carried out in feature engineering phase ...
    2025-11-04 04:50:59,185 | INFO     | Updated dataset after performing target column transformation :
       customer_id  credit_score country  gender  age  tenure    balance  products_number  credit_card  active_member  estimated_salary  churn  automl_id
    0     15787884           692  France  Female   30       7       0.00                2            1              1          18826.34      0         14
    1     15688963           731  France  Female   52      10       0.00                1            1              1          24998.75      1          9
    2     15791045           568  France  Female   38       3  132951.92                1            0              1         124486.28      0         13
    3     15707132           465  France    Male   33       5       0.00                2            0              1          78698.09      0          7
    4     15805523           717  France  Female   28       1   90537.16                1            0              1          74800.99      0         15
    5     15796612           527  France  Female   31       1  112203.25                1            1              0         182266.01      0          4
    6     15602909           604   Spain  Female   41      10       0.00                2            1              1         166224.39      0          8
    7     15652808           774  France  Female   41       5  126670.37                1            1              0         102426.06      0         12
    8     15670039           509   Spain  Female   25       3  108738.71                2            1              0         106920.57      0         11
    9     15668775           757  France    Male   47       3  130747.10                1            1              0         143829.54      0          5
    2000 rows X 13 columns
    2025-11-04 04:51:00,618 | INFO     | Updated dataset after performing categorical encoding :
                gender  age    balance  products_number  active_member  estimated_salary  credit_card  tenure  automl_id  customer_id  churn  credit_score
    country
    0.159022  0.168019   62   64119.38                1              1          76569.64            1       1         50     15727299      1           445
    0.159022  0.168019   35       0.00                2              1         140780.80            0       7         94     15611105      0           799
    0.159022  0.168019   34  137523.02                1              0          24761.36            0       1         98     15721303      0           640
    0.159022  0.246369   26  135219.57                1              1          59747.63            0       4        106     15749851      0           702
    0.159022  0.168019   37  138207.08                1              0          60778.11            1       1        114     15583371      1           632
    0.159022  0.168019   62       0.00                2              1         180243.56            1       5        118     15719793      0           850
    0.323218  0.246369   25  152885.77                1              0          58214.79            1       5         41     15576990      0           790
    0.323218  0.246369   33  150412.14                2              0         170764.08            1       9         57     15810457      0           728
    0.323218  0.246369   26   97331.19                1              0          63717.49            1       1         89     15604314      0           703
    0.323218  0.168019   36  115725.24                2              0           1871.25            0       5         93     15574868      0           792
    2000 rows X 13 columns
    2025-11-04 04:51:00,734 | INFO     | Performing transformation carried out in data preparation phase ...
    2025-11-04 04:51:01,544 | INFO     | Updated dataset after performing RFE feature selection:
         automl_id  active_member  products_number  tenure  customer_id  credit_score  country  gender    balance  estimated_salary  churn
    age
    40         408              1                1       5     15784286           641   0.1590  0.1680  102145.13         100637.07      0
    40         280              1                2       2     15806230           629   0.3232  0.1680  121647.54          64849.74      1
    40         745              0                1       3     15619514           507   0.3232  0.1680  120105.43          92075.01      1
    40         951              0                2       8     15610090           667   0.1670  0.1680   72945.29          98931.50      0
    40         947              0                1       6     15645572           743   0.1670  0.2464       0.00          28280.80      1
    40         642              0                1       4     15626612           741   0.1590  0.1680  104784.23         135163.76      1
    40        1590              0                3       5     15765300           596   0.3232  0.1680   62389.03         148623.43      1
    40         707              1                2       2     15679733           796   0.3232  0.1680  113228.38          46415.09      0
    40         644              1                1       1     15577771           453   0.3232  0.2464  111524.49         120373.84      1
    40         210              1                2       3     15739857           785   0.1670  0.2464       0.00          96832.82      0
    2000 rows X 12 columns
    2025-11-04 04:51:02,510 | INFO     | Updated dataset after performing scaling on RFE selected features :
       automl_id  churn     r_age  r_active_member  r_products_number  r_tenure  r_customer_id  r_credit_score  r_country  r_gender  r_balance  r_estimated_salary
    0       1289      0  1.028571              1.0                0.0       0.2       1.036606       -0.132716   1.000000       1.0   0.841570            0.111099
    1       1491      1  0.257143              0.0                0.0       0.7       0.261669        0.413580   1.000000       1.0   0.603265            0.716348
    2        639      0  0.257143              1.0                0.0       0.6       0.048218        1.117284   0.048721       0.0   0.000000            0.238185
    3        210      0  0.428571              1.0                1.0       0.3       0.718651        0.916667   0.048721       1.0   0.000000            0.482716
    4        408      0  0.428571              1.0                0.0       0.5       0.917377        0.472222   0.000000       0.0   0.624475            0.503840
    5       1605      0 -0.171429              0.0                1.0       0.8       0.857807        0.780864   1.000000       1.0   0.664620            0.083043
    6        700      0 -0.171429              1.0                1.0       0.2       0.513099        0.336420   0.048721       0.0   0.000000            0.450530
    7         86      1 -0.171429              1.0                0.0       0.9       0.909021        0.311728   1.000000       1.0   0.510456            0.739201
    8        644      1  0.428571              1.0                0.0       0.1      -0.006343       -0.108025   1.000000       1.0   0.681817            0.613436
    9       1201      0  0.257143              0.0                0.0       0.1       0.567972        0.456790   0.048721       0.0   0.513882            0.153196
    2000 rows X 12 columns
    2025-11-04 04:51:03,878 | INFO     | Updated dataset after performing scaling for PCA feature selection :
       automl_id  churn   country    gender       age   balance  products_number  active_member  estimated_salary  credit_card  tenure  customer_id  credit_score
    0         86      1  0.999890  1.000394 -0.171429  0.510456              0.0            1.0          0.739201          1.0     0.9     0.909021      0.311728
    1       1611      1  0.048586  1.000394  1.028571  0.492381              0.0            0.0          0.674138          1.0     0.7     0.733765      0.614198
    2       1289      0  0.999890  1.000394  1.028571  0.841570              0.0            1.0          0.111099          0.0     0.2     1.036606     -0.132716
    3       1201      0  0.048586 -0.000247  0.257143  0.513882              0.0            0.0          0.153196          0.0     0.1     0.567972      0.456790
    4        639      0  0.048586 -0.000247  0.257143  0.000000              0.0            1.0          0.238185          0.0     0.6     0.048218      1.117284
    5        210      0  0.048586  1.000394  0.428571  0.000000              1.0            1.0          0.482716          1.0     0.3     0.718651      0.916667
    6        644      1  0.999890  1.000394  0.428571  0.681817              0.0            1.0          0.613436          1.0     0.1    -0.006343     -0.108025
    7        408      0 -0.000137 -0.000247  0.428571  0.624475              0.0            1.0          0.503840          1.0     0.5     0.917377      0.472222
    8       1491      1  0.999890  1.000394  0.257143  0.603265              0.0            0.0          0.716348          1.0     0.7     0.261669      0.413580
    9        827      1  0.048586  1.000394  1.028571  0.398548              0.0            1.0          0.302111          1.0     0.8     0.498969      0.148148
    2000 rows X 13 columns
    2025-11-04 04:51:04,362 | INFO     | Updated dataset after performing PCA feature selection :
       automl_id     col_0     col_1     col_2     col_3     col_4     col_5     col_6     col_7     col_8     col_9  churn
    0        827 -0.038034  0.395402  0.476988  0.489123 -0.501875  0.269476 -0.107793 -0.241479 -0.106426 -0.339520      1
    1       1201 -0.311645 -0.164746 -0.390991 -0.852924 -0.363499 -0.442413  0.033928 -0.316707  0.108449 -0.025788      0
    2        210  0.881702  0.005745  0.554778  0.457411 -0.006528 -0.144999  0.249513 -0.066713 -0.097130  0.415885      0
    3       1605 -0.106923 -0.524032  0.365589  0.342021  0.978652  0.282409  0.164703 -0.525404  0.128984  0.246235      0
    4       1611 -0.492504 -0.464567  0.357513  0.300643 -0.400913  0.254261  0.237596  0.045910  0.015132  0.114967      1
    5       1491 -0.841408 -0.242276  0.343362  0.354427  0.349340  0.190090 -0.199311  0.234644 -0.249660 -0.076840      1
    6        644 -0.471611  0.670113  0.487760  0.530222  0.288229 -0.444138 -0.363210  0.282901 -0.279595 -0.583980      1
    7        700  1.040975  0.125794 -0.420409  0.247582  0.018582 -0.307925  0.055894  0.011868 -0.085413 -0.166389      0
    8       1289 -0.560369  0.880995  0.707058 -0.433059  0.333166 -0.249209  0.442741 -0.547914 -0.218182 -0.606135      0
    9        639  0.305519  0.588268 -0.249405 -0.659160 -0.574371  0.014322 -0.512917 -0.130990 -0.291023  0.662586      0
    10 rows X 12 columns
    2025-11-04 04:51:04,806 | INFO     | Data Transformation completed.
    2025-11-04 04:51:05,416 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:51:05,416 | INFO     | Model ID : DECISIONFOREST_0
    2025-11-04 04:51:05,416 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:51:06,478 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 04:51:11,383 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 04:51:11,476 | INFO     | Prediction :
       automl_id  prediction  prob_1  prob_0  churn
    0        639           0     0.0     1.0      0
    1        644           1     1.0     0.0      1
    2        408           0     0.0     1.0      0
    3       1605           0     0.0     1.0      0
    4         86           0     0.0     1.0      1
    5        827           1     1.0     0.0      1
    6       1611           1     1.0     0.0      1
    7       1289           1     1.0     0.0      0
    8        700           0     0.0     1.0      0
    9        210           0     0.0     1.0      0
    2025-11-04 04:51:15,457 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.662776  0.325553
       threshold_value      tpr       fpr
    0         0.040816  0.58867  0.165621
    1         0.081633  0.58867  0.165621
    2         0.102041  0.58867  0.165621
    3         0.122449  0.58867  0.165621
    4         0.163265  0.58867  0.165621
    5         0.183673  0.58867  0.165621
    6         0.142857  0.58867  0.165621
    7         0.061224  0.58867  0.165621
    8         0.020408  0.58867  0.165621
    9         0.000000  1.00000  1.000000
    2025-11-04 04:51:16,852 | INFO     | Confusion Matrix :
    [[1330  264]
     [ 167  239]]
    >>> prediction.head()
       automl_id  prediction  prob_1  prob_0  churn
    0        639           0     0.0     1.0      0
    1        644           1     1.0     0.0      1
    2        408           0     0.0     1.0      0
    3       1605           0     0.0     1.0      0
    4         86           0     0.0     1.0      1
    5        827           1     1.0     0.0      1
    6       1611           1     1.0     0.0      1
    7       1289           1     1.0     0.0      0
    8        700           0     0.0     1.0      0
    9        210           0     0.0     1.0      0
  8. Generate evaluation metrics on the test dataset using the best-performing model.
    >>> performance_metrics = ch.evaluate(bank_test)
    2025-11-04 04:51:56,301 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 04:51:56,849 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:51:56,849 | INFO     | Model ID : DECISIONFOREST_0
    2025-11-04 04:51:56,849 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:52:03,186 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
    1               1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall     0.784500
    1       5     Macro-Precision     0.681796
    2       6        Macro-Recall     0.711524
    3       7            Macro-F1     0.693208
    4       9     Weighted-Recall     0.784500
    5      10         Weighted-F1     0.792617
    6       8  Weighted-Precision     0.804545
    7       4            Micro-F1     0.784500
    8       2     Micro-Precision     0.784500
    9       1            Accuracy     0.784500
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    1               1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
    0               0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
  9. Generate predictions on the test dataset using the second-best-performing model.
    >>> prediction = ch.predict(bank_test, rank=2)
    2025-11-04 04:52:32,427 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 04:52:32,976 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:52:32,976 | INFO     | Model ID : DECISIONFOREST_2
    2025-11-04 04:52:32,976 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:52:33,789 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 04:52:37,270 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 04:52:37,366 | INFO     | Prediction :
       automl_id  prediction  prob_1  prob_0  churn
    0       1289           1     1.0     0.0      0
    1       1491           0     0.0     1.0      1
    2        639           0     0.0     1.0      0
    3        210           0     0.0     1.0      0
    4        408           0     0.0     1.0      0
    5       1605           0     0.0     1.0      0
    6        700           0     0.0     1.0      0
    7         86           0     0.0     1.0      1
    8        644           1     1.0     0.0      1
    9       1201           0     0.0     1.0      0
    2025-11-04 04:52:41,136 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.662776  0.325553
       threshold_value      tpr       fpr
    0         0.040816  0.58867  0.165621
    1         0.081633  0.58867  0.165621
    2         0.102041  0.58867  0.165621
    3         0.122449  0.58867  0.165621
    4         0.163265  0.58867  0.165621
    5         0.183673  0.58867  0.165621
    6         0.142857  0.58867  0.165621
    7         0.061224  0.58867  0.165621
    8         0.020408  0.58867  0.165621
    9         0.000000  1.00000  1.000000
    2025-11-04 04:52:42,530 | INFO     | Confusion Matrix :
    [[1330  264]
     [ 167  239]]
    >>> prediction.head()
       automl_id  prediction  prob_1  prob_0  churn
    0          6           1     1.0     0.0      1
    1          8           0     0.0     1.0      0
    2          9           1     1.0     0.0      1
    3         10           1     1.0     0.0      0
    4         12           0     0.0     1.0      0
    5         13           0     0.0     1.0      0
    6         11           0     0.0     1.0      0
    7          7           0     0.0     1.0      0
    8          5           1     1.0     0.0      0
    9          4           0     0.0     1.0      0
  10. Generate evaluation metrics on the test dataset using the second-best-performing model.
    >>> performance_metrics = ch.evaluate(bank_test, rank=2)
    2025-11-04 04:53:23,452 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 04:53:24,000 | INFO     | Following model is being picked for evaluation:
    2025-11-04 04:53:24,001 | INFO     | Model ID : DECISIONFOREST_2
    2025-11-04 04:53:24,001 | INFO     | Feature Selection Method : rfe
    2025-11-04 04:53:30,301 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
    1               1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall     0.784500
    1       5     Macro-Precision     0.681796
    2       6        Macro-Recall     0.711524
    3       7            Macro-F1     0.693208
    4       9     Weighted-Recall     0.784500
    5      10         Weighted-F1     0.792617
    6       8  Weighted-Precision     0.804545
    7       4            Micro-F1     0.784500
    8       2     Micro-Precision     0.784500
    9       1            Accuracy     0.784500
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1     1330      167   0.888444  0.834379  0.860563     1594
    1               1  CLASS_2      264      239   0.475149  0.588670  0.525853      406
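
The reported metrics can be cross-checked by hand from the confusion matrix printed in the logs. The following is a minimal sketch in plain Python, not part of teradataml. It assumes the matrix uses the common layout in which rows are actual classes and columns are predicted classes, in the order [0, 1], and that the GINI column follows the usual relation GINI = 2 * AUC - 1. The helper name per_class_metrics is hypothetical.

```python
# Confusion matrix copied from the log output above.
# Assumption: rows = actual class, columns = predicted class, order [0, 1].
confusion = [[1330, 264],
             [167, 239]]


def per_class_metrics(matrix, k):
    """Return (precision, recall, f1, support) for class index k."""
    true_positive = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum: rows predicted as k
    actual_k = sum(matrix[k])                    # row sum: rows that really are k
    precision = true_positive / predicted_k
    recall = true_positive / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual_k


rows = [per_class_metrics(confusion, k) for k in range(len(confusion))]
total = sum(support for _, _, _, support in rows)

# Accuracy: correctly classified rows (the diagonal) over all rows.
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total

# Macro averages weight every class equally.
macro_precision = sum(p for p, _, _, _ in rows) / len(rows)
macro_recall = sum(r for _, r, _, _ in rows) / len(rows)
macro_f1 = sum(f for _, _, f, _ in rows) / len(rows)

# Weighted F1 weights each class by its support (number of actual rows).
weighted_f1 = sum(f * s for _, _, f, s in rows) / total

# Assumption: GINI is derived from AUC as 2 * AUC - 1.
auc = 0.662776
gini = 2 * auc - 1

for name, value in [("Accuracy", accuracy),
                    ("Macro-Precision", macro_precision),
                    ("Macro-Recall", macro_recall),
                    ("Macro-F1", macro_f1),
                    ("Weighted-F1", weighted_f1),
                    ("GINI", gini)]:
    print(f"{name:<16} {value:.6f}")
```

Under these assumptions the computed values agree with the Performance Metrics output to the displayed precision; the GINI value can differ in the last digit because the AUC shown in the log is already rounded.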