Example 3: Run AutoClassifier for Classification Problem Using Early Stopping Timer

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2026-02-20
Product Category: Teradata Vantage

This example predicts whether a passenger aboard the RMS Titanic survived, based on different factors.

Run AutoClassifier to get the best performing model out of the available models with the following specifications:
  • Use all default models except 'knn' and 'svm'.
  • Set the early stopping timer to 100 seconds.
  • Use verbose level 2 to get a detailed log.
  1. Load the data and split it into training and test datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> from teradataml import AutoClassifier, DataFrame, load_example_data
      >>> load_example_data("teradataml", "titanic")
      >>> titanic = DataFrame.from_table("titanic")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
    3. Fetch train and test data.
      >>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
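The sampling step tags each row with a sampleid (1 for the first fraction, 2 for the second), and the filters above pick rows by that tag. The following is a minimal client-side sketch of that idea in plain Python. It does not reproduce how teradataml samples rows in the database; the helper name assign_sample_ids, the seed, and the assumed table size of 891 rows are illustrative assumptions only.

```python
import random

def assign_sample_ids(row_ids, fractions=(0.8, 0.2), seed=42):
    """Return {row_id: sampleid}, numbering samples from 1 (hypothetical helper)."""
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    assignment = {}
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        # The last fraction takes whatever rows remain, so no row is dropped.
        end = n if sample_id == len(fractions) else start + round(frac * n)
        for row_id in shuffled[start:end]:
            assignment[row_id] = sample_id
        start = end
    return assignment

# Hypothetical passenger ids 1..891 (assumed table size, not taken from the source).
sample_ids = assign_sample_ids(range(1, 892))
train_ids = [r for r, s in sample_ids.items() if s == 1]
test_ids = [r for r, s in sample_ids.items() if s == 2]
print(len(train_ids), len(test_ids))
```

Under the 891-row assumption, the 80% share rounds to 713 rows, which is the row count the fit log reports for the training data.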
  2. Create an AutoClassifier instance.
    >>> aml = AutoClassifier(exclude=['knn', 'svm'],
                             verbose=2,
                             max_runtime_secs=100)
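To illustrate what an early stopping timer does in general, here is a minimal sketch of a training loop with a time budget. It is not teradataml's implementation of max_runtime_secs; the function train_with_time_budget, the FakeClock class, and the candidate names are hypothetical and exist only to make the idea concrete and deterministic.

```python
import time

def train_with_time_budget(candidates, train_fn, max_runtime_secs, clock=time.monotonic):
    """Train candidates in order until the time budget is used up.

    Returns the (candidate, result) pairs that were started before the deadline.
    """
    deadline = clock() + max_runtime_secs
    finished = []
    for candidate in candidates:
        # The budget is checked before each candidate starts;
        # a fit that is already running is not interrupted.
        if clock() >= deadline:
            break
        finished.append((candidate, train_fn(candidate)))
    return finished

class FakeClock:
    """Deterministic stand-in for time.monotonic, used only for this demo."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now

clock = FakeClock()

def fake_train(name):
    clock.now += 40.0  # pretend every fit takes 40 seconds
    return f"{name}-fitted"

done = train_with_time_budget(
    ["model_a", "model_b", "model_c", "model_d"], fake_train, 100, clock
)
print([name for name, _ in done])  # ['model_a', 'model_b', 'model_c']
```

In this sketch the fourth candidate is skipped because the 100-second budget is already spent when its turn comes, so a short timer trades search coverage for a bounded run time.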
  3. Fit the data.
    >>> aml.fit(titanic_train, 'survived')
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 01:44:50,473 | INFO     | Feature Exploration started
    2025-11-04 01:44:50,473 | INFO     | Data Overview:
    2025-11-04 01:44:50,619 | INFO     | Total Rows in the data: 713
    2025-11-04 01:44:50,661 | INFO     | Total Columns in the data: 12
    2025-11-04 01:44:51,316 | INFO     | Column Summary:
       ColumnName                           Datatype  NonNullCount  NullCount  BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
    0    embarked    VARCHAR(20) CHARACTER SET LATIN           711          2         0.0        NaN            NaN            NaN        0.280505          99.719495
    1       sibsp                            INTEGER           713          0         NaN      486.0          227.0            0.0        0.000000         100.000000
    2    survived                            INTEGER           713          0         NaN      445.0          268.0            0.0        0.000000         100.000000
    3      pclass                            INTEGER           713          0         NaN        0.0          713.0            0.0        0.000000         100.000000
    4         age                            INTEGER           573        140         NaN        5.0          568.0            0.0       19.635344          80.364656
    5      ticket    VARCHAR(20) CHARACTER SET LATIN           713          0         0.0        NaN            NaN            NaN        0.000000         100.000000
    6         sex    VARCHAR(20) CHARACTER SET LATIN           713          0         0.0        NaN            NaN            NaN        0.000000         100.000000
    7       parch                            INTEGER           713          0         NaN      540.0          173.0            0.0        0.000000         100.000000
    8        name  VARCHAR(1000) CHARACTER SET LATIN           713          0         0.0        NaN            NaN            NaN        0.000000         100.000000
    9   passenger                            INTEGER           713          0         NaN        0.0          713.0            0.0        0.000000         100.000000
    10      cabin    VARCHAR(20) CHARACTER SET LATIN           154        559         0.0        NaN            NaN            NaN       78.401122          21.598878
    11       fare                              FLOAT           713          0         NaN       10.0          703.0            0.0        0.000000         100.000000
    2025-11-04 01:44:52,148 | INFO     | Statistics of Data:
      ATTRIBUTE            StatName   StatValue
    0  survived             MAXIMUM    1.000000
    1  survived  STANDARD DEVIATION    0.484688
    2  survived     PERCENTILES(25)    0.000000
    3  survived     PERCENTILES(50)    0.000000
    4      fare               COUNT  713.000000
    5      fare             MINIMUM    0.000000
    6      fare             MAXIMUM  512.329200
    7      fare                MEAN   32.204125
    8      fare  STANDARD DEVIATION   51.384597
    9      fare     PERCENTILES(25)    7.925000
    2025-11-04 01:44:52,477 | INFO     | Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    name                      713
    sex                       2
    ticket                    565
    cabin                     124
    embarked                  3
    2025-11-04 01:44:55,757 | INFO     | Futile columns in dataset:
      ColumnName
    0     ticket
    1       name
    2025-11-04 01:44:59,620 | INFO     | Columns with outlier percentage :-
      ColumnName  OutlierPercentage
    0      sibsp           5.329593
    1       fare          12.762973
    2        age          20.476858
    3      parch          24.263675
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 01:45:00,004 | INFO     | Feature Engineering started ...
    2025-11-04 01:45:00,005 | INFO     | Handling duplicate records present in dataset ...
    2025-11-04 01:45:00,183 | INFO     | Analysis completed. No action taken.
    2025-11-04 01:45:00,183 | INFO     | Total time to handle duplicate records: 0.18 sec
    2025-11-04 01:45:00,184 | INFO     | Handling less significant features from data ...
    2025-11-04 01:45:07,148 | INFO     | Removing Futile columns:
    ['ticket', 'name']
    2025-11-04 01:45:07,148 | INFO     | Sample of Data after removing Futile columns:
       passenger  survived  pclass     sex   age  sibsp  parch     fare cabin embarked  automl_id
    0        795         0       3    male  25.0      0      0   7.8958  None        S         12
    1        591         0       3    male  35.0      0      0   7.1250  None        S         11
    2        387         0       3    male   1.0      5      2  46.9000  None        S         15
    3        530         0       2    male  23.0      2      1  11.5000  None        S          5
    4        570         1       3    male  32.0      0      0   7.8542  None        S         13
    5         40         1       3  female  14.0      1      0  11.2417  None        C          6
    6        162         1       2  female  40.0      0      0  15.7500  None        S         10
    7        631         1       1    male  80.0      0      0  30.0000   A23        S         14
    8        305         0       3    male   NaN      0      0   8.0500  None        S          9
    9        122         0       3    male   NaN      0      0   8.0500  None        S          7
    713 rows X 11 columns
    2025-11-04 01:45:07,948 | INFO     | Total time to handle less significant features: 7.76 sec
    2025-11-04 01:45:07,948 | INFO     | Handling Date Features ...
    2025-11-04 01:45:07,948 | INFO     | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    2025-11-04 01:45:07,948 | INFO     | Total time to handle date features: 0.00 sec
    2025-11-04 01:45:07,948 | INFO     | Checking Missing values in dataset ...
    2025-11-04 01:45:09,342 | INFO     | Columns with their missing values:
    cabin: 559
    age: 140
    embarked: 2
    2025-11-04 01:45:10,421 | INFO     | Deleting rows of these columns for handling missing values:
    ['embarked']
    2025-11-04 01:45:11,071 | INFO     | Sample of dataset after removing 2 rows:
       passenger  survived  pclass     sex   age  sibsp  parch     fare cabin embarked  automl_id
    0        795         0       3    male  25.0      0      0   7.8958  None        S         12
    1        162         1       2  female  40.0      0      0  15.7500  None        S         10
    2        631         1       1    male  80.0      0      0  30.0000   A23        S         14
    3        530         0       2    male  23.0      2      1  11.5000  None        S          5
    4        570         1       3    male  32.0      0      0   7.8542  None        S         13
    5        122         0       3    male   NaN      0      0   8.0500  None        S          7
    6        591         0       3    male  35.0      0      0   7.1250  None        S         11
    7        387         0       3    male   1.0      5      2  46.9000  None        S         15
    8        305         0       3    male   NaN      0      0   8.0500  None        S          9
    9         40         1       3  female  14.0      1      0  11.2417  None        C          6
    711 rows X 11 columns
    2025-11-04 01:45:12,023 | INFO     | Dropping these columns for handling missing values:
    ['cabin']
    2025-11-04 01:45:12,023 | INFO     | Sample of dataset after removing 1 columns:
       passenger  survived  pclass     sex   age  sibsp  parch      fare embarked  automl_id
    0        387         0       3    male   1.0      5      2   46.9000        S         15
    1        326         1       1  female  36.0      0      0  135.6333        C          8
    2        795         0       3    male  25.0      0      0    7.8958        S         12
    3        530         0       2    male  23.0      2      1   11.5000        S          5
    4        570         1       3    male  32.0      0      0    7.8542        S         13
    5         40         1       3  female  14.0      1      0   11.2417        C          6
    6        162         1       2  female  40.0      0      0   15.7500        S         10
    7        631         1       1    male  80.0      0      0   30.0000        S         14
    8        305         0       3    male   NaN      0      0    8.0500        S          9
    9        469         0       3    male   NaN      0      0    7.7250        Q          4
    711 rows X 10 columns
    2025-11-04 01:45:12,885 | INFO     | Total time to find missing values in data: 4.94 sec
    2025-11-04 01:45:12,885 | INFO     | Imputing Missing Values ...
    2025-11-04 01:45:13,164 | INFO     | Columns with their imputation method:
    age: mean
    2025-11-04 01:45:15,562 | INFO     | Sample of dataset after Imputation:
       passenger  survived  pclass     sex  age  sibsp  parch      fare embarked  automl_id
    0        570         1       3    male   32      0      0    7.8542        S         13
    1        591         0       3    male   35      0      0    7.1250        S         11
    2        387         0       3    male    1      5      2   46.9000        S         15
    3        469         0       3    male   29      0      0    7.7250        Q          4
    4        795         0       3    male   25      0      0    7.8958        S         12
    5         40         1       3  female   14      1      0   11.2417        C          6
    6        162         1       2  female   40      0      0   15.7500        S         10
    7        631         1       1    male   80      0      0   30.0000        S         14
    8        326         1       1  female   36      0      0  135.6333        C          8
    9        122         0       3    male   29      0      0    8.0500        S          7
    711 rows X 10 columns
    2025-11-04 01:45:16,322 | INFO     | Time taken to perform imputation: 3.44 sec
    2025-11-04 01:45:16,323 | INFO     | Performing encoding for categorical columns ...
    2025-11-04 01:45:24,872 | INFO     | ONE HOT Encoding these Columns:
    ['sex', 'embarked']
    2025-11-04 01:45:24,873 | INFO     | Sample of dataset after performing one hot encoding:
               survived  pclass  sex_0  sex_1  age  sibsp  parch    fare  embarked_0  embarked_1  embarked_2  automl_id
    passenger
    387               0       3      0      1    1      5      2  46.900           0           0           1         15
    448               1       1      0      1   34      0      0  26.550           0           0           1         23
    713               1       1      0      1   48      1      0  52.000           0           0           1         27
    753               0       3      0      1   33      0      0   9.500           0           0           1         31
    59                1       2      1      0    5      1      2  27.750           0           0           1         39
    324               1       2      1      0   22      1      1  29.000           0           0           1         43
    263               0       1      0      1   52      1      1  79.650           0           0           1         35
    856               1       3      1      0   18      0      1   9.350           0           0           1         19
    591               0       3      0      1   35      0      0   7.125           0           0           1         11
    122               0       3      0      1   29      0      0   8.050           0           0           1          7
    711 rows X 13 columns
    2025-11-04 01:45:24,966 | INFO     | Time taken to encode the columns: 8.64 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 01:45:24,967 | INFO     | Data preparation started ...
    2025-11-04 01:45:24,967 | INFO     | Outlier preprocessing ...
    2025-11-04 01:45:28,116 | INFO     | Columns with outlier percentage :-
      ColumnName  OutlierPercentage
    0      sibsp           5.344585
    1       fare          12.517581
    2        age           7.172996
    3      parch          24.331927
    2025-11-04 01:45:28,547 | INFO     | Deleting rows of these columns:
    ['age', 'sibsp']
    2025-11-04 01:45:30,611 | INFO     | Sample of dataset after removing outlier rows:
               survived  pclass  sex_0  sex_1  age  sibsp  parch      fare  embarked_0  embarked_1  embarked_2  automl_id
    passenger
    795               0       3      0      1   25      0      0    7.8958           0           0           1         12
    509               0       3      0      1   28      0      0   22.5250           0           0           1         24
    774               0       3      0      1   29      0      0    7.2250           1           0           0         28
    366               0       3      0      1   30      0      0    7.2500           0           0           1         32
    242               1       3      1      0   29      1      0   15.5000           0           1           0         48
    38                0       3      0      1   21      0      0    8.0500           0           0           1         52
    467               0       2      0      1   29      0      0    0.0000           0           0           1         40
    652               1       2      1      0   18      0      1   23.0000           0           0           1         20
    326               1       1      1      0   36      0      0  135.6333           1           0           0          8
    469               0       3      0      1   29      0      0    7.7250           0           1           0          4
    629 rows X 13 columns
    2025-11-04 01:45:30,722 | INFO     | median inplace of outliers:
    ['parch', 'fare']
    2025-11-04 01:45:32,775 | INFO     | Sample of dataset after performing MEDIAN inplace:
               survived  pclass  sex_0  sex_1  age  sibsp  parch     fare  embarked_0  embarked_1  embarked_2  automl_id
    passenger
    856               1       3      1      0   18      0      0   9.3500           0           0           1         19
    713               1       1      0      1   48      1      0  52.0000           0           0           1         27
    753               0       3      0      1   33      0      0   9.5000           0           0           1         31
    263               0       1      0      1   52      1      0  13.0000           0           0           1         35
    324               1       2      1      0   22      1      0  29.0000           0           0           1         43
    385               0       3      0      1   29      0      0   7.8958           0           0           1         47
    59                1       2      1      0    5      1      0  27.7500           0           0           1         39
    448               1       1      0      1   34      0      0  26.5500           0           0           1         23
    591               0       3      0      1   35      0      0   7.1250           0           0           1         11
    122               0       3      0      1   29      0      0   8.0500           0           0           1          7
    629 rows X 13 columns
    2025-11-04 01:45:32,932 | INFO     | Time Taken by Outlier processing: 7.96 sec
    2025-11-04 01:45:32,932 | INFO     | Checking imbalance data ...
    2025-11-04 01:45:33,014 | INFO     | Imbalance Not Found.
    2025-11-04 01:45:33,973 | INFO     | Feature selection using rfe ...
    2025-11-04 01:45:53,930 | INFO     | feature selected by RFE:
    ['passenger', 'age', 'sex_1', 'pclass', 'sex_0', 'embarked_0', 'embarked_1', 'sibsp', 'embarked_2', 'fare']
    2025-11-04 01:45:53,931 | INFO     | Total time taken by feature selection: 19.96 sec
    2025-11-04 01:45:54,256 | INFO     | Scaling Features of rfe data ...
    2025-11-04 01:45:55,870 | INFO     | columns that will be scaled:
    ['r_passenger', 'r_age', 'r_pclass', 'r_sibsp', 'r_fare']
    2025-11-04 01:45:58,415 | INFO     | Dataset sample after scaling:
       r_embarked_0  survived  r_sex_1  r_sex_0  automl_id  r_embarked_1  r_embarked_2  r_passenger     r_age  r_pclass  r_sibsp    r_fare
    0             1         1        0        1          6             0             0     0.043870  0.215686       1.0      0.5  0.197223
    1             1         1        0        1          8             0             0     0.365579  0.647059       0.0      0.0  0.228070
    2             0         0        1        0          9             0             1     0.341957  0.509804       1.0      0.0  0.141228
    3             0         1        0        1         10             0             1     0.181102  0.725490       0.5      0.0  0.276316
    4             0         0        1        0         12             0             1     0.893138  0.431373       1.0      0.0  0.138523
    5             0         1        1        0         13             0             1     0.640045  0.568627       1.0      0.0  0.137793
    6             0         0        1        0         11             0             1     0.663667  0.627451       1.0      0.0  0.125000
    7             0         0        1        0          7             0             1     0.136108  0.509804       1.0      0.0  0.141228
    8             0         0        1        0          5             0             1     0.595051  0.392157       0.5      1.0  0.201754
    9             0         0        1        0          4             1             0     0.526434  0.509804       1.0      0.0  0.135526
    629 rows X 12 columns
    2025-11-04 01:45:59,105 | INFO     | Total time taken by feature scaling: 4.85 sec
    2025-11-04 01:45:59,106 | INFO     | Scaling Features of pca data ...
    2025-11-04 01:46:00,204 | INFO     | columns that will be scaled:
    ['passenger', 'pclass', 'age', 'sibsp', 'fare']
    2025-11-04 01:46:02,578 | INFO     | Dataset sample after scaling:
       survived  parch  embarked_0  sex_1  sex_0  embarked_1  automl_id  embarked_2  passenger  pclass       age  sibsp      fare
    0         0      0           0      1      0           0         18           1   0.249719     1.0  0.941176    0.0  0.141228
    1         0      0           0      1      0           0          9           1   0.341957     1.0  0.509804    0.0  0.141228
    2         1      0           0      1      0           0         13           1   0.640045     1.0  0.568627    0.0  0.137793
    3         0      0           0      1      0           1          4           0   0.526434     1.0  0.509804    0.0  0.135526
    4         0      0           0      1      0           0         12           1   0.893138     1.0  0.431373    0.0  0.138523
    5         0      0           0      1      0           0          7           1   0.136108     1.0  0.509804    0.0  0.141228
    6         0      0           0      1      0           0         11           1   0.663667     1.0  0.627451    0.0  0.125000
    7         1      0           0      0      1           0         19           1   0.961755     1.0  0.294118    0.0  0.164035
    8         1      0           1      0      1           0          8           0   0.365579     0.0  0.647059    0.0  0.228070
    9         0      0           0      1      0           0          5           1   0.595051     0.5  0.392157    1.0  0.201754
    629 rows X 13 columns
    2025-11-04 01:46:03,190 | INFO     | Total time taken by feature scaling: 4.08 sec
    2025-11-04 01:46:03,191 | INFO     | Dimension Reduction using pca ...
    2025-11-04 01:46:03,851 | INFO     | PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
    2025-11-04 01:46:03,852 | INFO     | Total time taken by PCA: 0.66 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 01:46:04,269 | INFO     | Model Training started ...
    2025-11-04 01:46:04,313 | INFO     | Hyperparameters used for model training:
    2025-11-04 01:46:04,314 | INFO     | Model: decision_forest
    2025-11-04 01:46:04,314 | INFO     | Hyperparameters: {'response_column': 'survived', 'name': 'decision_forest', 'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'num_trees': (-1,), 'seed': 42}
    2025-11-04 01:46:04,314 | INFO     | Total number of models for decision_forest: 36
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 01:46:04,315 | INFO     | Model: xgboost
    2025-11-04 01:46:04,315 | INFO     | Hyperparameters: {'response_column': 'survived', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1), 'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30), 'num_boosted_trees': (-1, 5, 10), 'seed': 42}
    2025-11-04 01:46:04,318 | INFO     | Total number of models for xgboost: 5832
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 01:46:04,318 | INFO     | Model: glm
    2025-11-04 01:46:04,319 | INFO     | Hyperparameters: {'response_column': 'survived', 'name': 'glm', 'family': 'BINOMIAL', 'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85), 'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1), 'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50), 'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}
    2025-11-04 01:46:04,319 | INFO     | Total number of models for glm: 1296
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 01:46:04,319 | INFO     | Performing hyperparameter tuning ...
    2025-11-04 01:46:05,563 | INFO     | Model training for decision_forest
    2025-11-04 01:46:25,904 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 01:46:25,904 | INFO     | Model training for xgboost
    2025-11-04 01:46:45,404 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 01:46:45,404 | INFO     | Model training for glm
    2025-11-04 01:47:04,963 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 01:47:04,966 | INFO     | Leaderboard
        RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0      1  DECISIONFOREST_2               rfe  0.825397         0.825397  ...      0.808905  0.813358            0.824143         0.825397     0.823892
    1      2  DECISIONFOREST_4               rfe  0.825397         0.825397  ...      0.808905  0.813358            0.824143         0.825397     0.823892
    2      3  DECISIONFOREST_0               rfe  0.817460         0.817460  ...      0.802412  0.805699            0.816153         0.817460     0.816322
    3      4         XGBOOST_2               rfe  0.809524         0.809524  ...      0.814471  0.804601            0.819985         0.809524     0.811493
    4      5         XGBOOST_0               rfe  0.793651         0.793651  ...      0.790353  0.785882            0.797852         0.793651     0.794946
    5      6         XGBOOST_6               rfe  0.793651         0.793651  ...      0.790353  0.785882            0.797852         0.793651     0.794946
    6      7             GLM_2               rfe  0.785714         0.785714  ...      0.776438  0.775401            0.786579         0.785714     0.786096
    7      8  DECISIONFOREST_1               pca  0.785714         0.785714  ...      0.739332  0.751079            0.801736         0.785714     0.771713
    8      9  DECISIONFOREST_5               pca  0.785714         0.785714  ...      0.739332  0.751079            0.801736         0.785714     0.771713
    9     10             GLM_4               rfe  0.777778         0.777778  ...      0.758813  0.762456            0.775583         0.777778     0.775863
    10    11  DECISIONFOREST_3               pca  0.777778         0.777778  ...      0.732839  0.743605            0.789323         0.777778     0.764406
    11    12             GLM_7               pca  0.769841         0.769841  ...      0.752319  0.755012            0.767874         0.769841     0.768406
    12    13         XGBOOST_4               rfe  0.769841         0.769841  ...      0.748609  0.752891            0.767246         0.769841     0.767273
    13    14             GLM_6               rfe  0.761905         0.761905  ...      0.768089  0.756944            0.776963         0.761905     0.764660
    14    15             GLM_0               rfe  0.761905         0.761905  ...      0.764378  0.755751            0.772947         0.761905     0.764366
    15    16         XGBOOST_5               pca  0.761905         0.761905  ...      0.742115  0.745489            0.759396         0.761905     0.759853
    16    17             GLM_3               pca  0.761905         0.761905  ...      0.738404  0.743207            0.758943         0.761905     0.758605
    17    18             GLM_5               pca  0.746032         0.746032  ...      0.710575  0.717568            0.743834         0.746032     0.737493
    18    19         XGBOOST_3               pca  0.738095         0.738095  ...      0.730056  0.727362            0.741511         0.738095     0.739383
    19    20         XGBOOST_1               pca  0.730159         0.730159  ...      0.712430  0.713942            0.728372         0.730159     0.729078
    20    21         XGBOOST_7               pca  0.730159         0.730159  ...      0.712430  0.713942            0.728372         0.730159     0.729078
    21    22             GLM_1               pca  0.682540         0.682540  ...      0.725417  0.682219            0.772840         0.682540     0.679978
    [22 rows x 13 columns]
    22 rows X 13 columns
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    >>> Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 14/14
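The "Total number of models" values in the log are consistent with a full grid over the tuple-valued hyperparameters printed for each model. The sketch below copies the three dictionaries from the log and multiplies the number of options per hyperparameter. The helper grid_size is hypothetical, and treating scalar entries as fixed settings is an assumption made for this calculation.

```python
from math import prod

def grid_size(hyperparameters):
    """Count grid combinations: multiply the lengths of all tuple-valued entries."""
    return prod(len(v) for v in hyperparameters.values() if isinstance(v, tuple))

# Dictionaries copied from the "Hyperparameters:" lines in the log.
decision_forest = {'response_column': 'survived', 'name': 'decision_forest',
                   'tree_type': 'Classification', 'min_impurity': (0.0, 0.1, 0.2),
                   'max_depth': (5, 6, 8, 10), 'min_node_size': (1, 2, 3),
                   'num_trees': (-1,), 'seed': 42}
xgboost = {'response_column': 'survived', 'name': 'xgboost',
           'model_type': 'Classification', 'column_sampling': (1, 0.6),
           'min_impurity': (0.0, 0.1, 0.2), 'lambda1': (1.0, 0.01, 0.1),
           'shrinkage_factor': (0.5, 0.1, 0.3), 'max_depth': (5, 6, 8, 10),
           'min_node_size': (1, 2, 3), 'iter_num': (10, 20, 30),
           'num_boosted_trees': (-1, 5, 10), 'seed': 42}
glm = {'response_column': 'survived', 'name': 'glm', 'family': 'BINOMIAL',
       'lambda1': (0.001, 0.02, 0.1), 'alpha': (0.15, 0.85),
       'learning_rate': 'OPTIMAL', 'initial_eta': (0.05, 0.1),
       'momentum': (0.65, 0.8, 0.95), 'iter_num_no_change': (5, 10, 50),
       'iter_max': (300, 200, 400), 'batch_size': (10, 50, 60, 80)}

for name, params in [('decision_forest', decision_forest), ('xgboost', xgboost), ('glm', glm)]:
    print(name, grid_size(params))
```

The three products are 36, 5832, and 1296, matching the totals in the log. The leaderboard lists 22 models, far fewer than these totals.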
  4. Display model leaderboard.
    >>> aml.leaderboard()
        RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0      1  DECISIONFOREST_2               rfe  0.825397         0.825397  ...      0.808905  0.813358            0.824143         0.825397     0.823892
    1      2  DECISIONFOREST_4               rfe  0.825397         0.825397  ...      0.808905  0.813358            0.824143         0.825397     0.823892
    2      3  DECISIONFOREST_0               rfe  0.817460         0.817460  ...      0.802412  0.805699            0.816153         0.817460     0.816322
    3      4         XGBOOST_2               rfe  0.809524         0.809524  ...      0.814471  0.804601            0.819985         0.809524     0.811493
    4      5         XGBOOST_0               rfe  0.793651         0.793651  ...      0.790353  0.785882            0.797852         0.793651     0.794946
    5      6         XGBOOST_6               rfe  0.793651         0.793651  ...      0.790353  0.785882            0.797852         0.793651     0.794946
    6      7             GLM_2               rfe  0.785714         0.785714  ...      0.776438  0.775401            0.786579         0.785714     0.786096
    7      8  DECISIONFOREST_1               pca  0.785714         0.785714  ...      0.739332  0.751079            0.801736         0.785714     0.771713
    8      9  DECISIONFOREST_5               pca  0.785714         0.785714  ...      0.739332  0.751079            0.801736         0.785714     0.771713
    9     10             GLM_4               rfe  0.777778         0.777778  ...      0.758813  0.762456            0.775583         0.777778     0.775863
    10    11  DECISIONFOREST_3               pca  0.777778         0.777778  ...      0.732839  0.743605            0.789323         0.777778     0.764406
    11    12             GLM_7               pca  0.769841         0.769841  ...      0.752319  0.755012            0.767874         0.769841     0.768406
    12    13         XGBOOST_4               rfe  0.769841         0.769841  ...      0.748609  0.752891            0.767246         0.769841     0.767273
    13    14             GLM_6               rfe  0.761905         0.761905  ...      0.768089  0.756944            0.776963         0.761905     0.764660
    14    15             GLM_0               rfe  0.761905         0.761905  ...      0.764378  0.755751            0.772947         0.761905     0.764366
    15    16         XGBOOST_5               pca  0.761905         0.761905  ...      0.742115  0.745489            0.759396         0.761905     0.759853
    16    17             GLM_3               pca  0.761905         0.761905  ...      0.738404  0.743207            0.758943         0.761905     0.758605
    17    18             GLM_5               pca  0.746032         0.746032  ...      0.710575  0.717568            0.743834         0.746032     0.737493
    18    19         XGBOOST_3               pca  0.738095         0.738095  ...      0.730056  0.727362            0.741511         0.738095     0.739383
    19    20         XGBOOST_1               pca  0.730159         0.730159  ...      0.712430  0.713942            0.728372         0.730159     0.729078
    20    21         XGBOOST_7               pca  0.730159         0.730159  ...      0.712430  0.713942            0.728372         0.730159     0.729078
    21    22             GLM_1               pca  0.682540         0.682540  ...      0.725417  0.682219            0.772840         0.682540     0.679978
    [22 rows x 13 columns]
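Each model family appears on the leaderboard under both feature selection methods (rfe and pca). As a reading aid, the sketch below takes the top row of each family per method, copied from the output above into plain Python, and reports the best accuracy per method. The names `rows` and `best_by_method` are illustrative only and are not part of teradataml.

```python
# (MODEL_ID, FEATURE_SELECTION, ACCURACY) rows copied from the leaderboard above:
# the highest-ranked row of each model family for each feature selection method.
rows = [
    ("DECISIONFOREST_2", "rfe", 0.825397),
    ("XGBOOST_2", "rfe", 0.809524),
    ("GLM_2", "rfe", 0.785714),
    ("DECISIONFOREST_1", "pca", 0.785714),
    ("GLM_7", "pca", 0.769841),
    ("XGBOOST_5", "pca", 0.761905),
]

# Keep the highest-accuracy row seen so far for each feature selection method.
best_by_method = {}
for model_id, method, accuracy in rows:
    if method not in best_by_method or accuracy > best_by_method[method][1]:
        best_by_method[method] = (model_id, accuracy)

for method, (model_id, accuracy) in sorted(best_by_method.items()):
    print(f"{method}: {model_id} ({accuracy:.6f})")
```

In this run, the best rfe model outscores the best pca model for every model family shown.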
  5. Display the best performing model.
    >>> aml.leader()
       RANK          MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  DECISIONFOREST_2               rfe  0.825397         0.825397  ...      0.808905  0.813358            0.824143         0.825397     0.823892
    [1 rows x 13 columns]
  6. Display hyperparameters for trained models.
    1. Display model hyperparameters for rank 1.
      >>> aml.model_hyperparameters(rank=1)
      {'response_column': 'survived', 
        'name': 'decision_forest', 
        'tree_type': 'Classification', 
        'min_impurity': 0.0, 
        'max_depth': 5, 
        'min_node_size': 2, 
        'num_trees': -1, 
        'seed': 42, 
        'persist': False, 
        'output_prob': True, 
        'output_responses': ['1', '0']}
      
    2. Display model hyperparameters for rank 4.
      >>> aml.model_hyperparameters(rank=4)
      {'response_column': 'survived', 
        'name': 'xgboost', 
        'model_type': 'Classification', 
        'column_sampling': 1, 
        'min_impurity': 0.0, 
        'lambda1': 1.0, 
        'shrinkage_factor': 0.5, 
        'max_depth': 5, 
        'min_node_size': 1, 
        'iter_num': 10, 
        'num_boosted_trees': 5, 
        'seed': 42, 
        'persist': False, 
        'output_prob': True, 
        'output_responses': ['1', '0']}
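
The two dictionaries above share some keys and differ in others. A minimal sketch for comparing them, using plain Python dictionaries copied from the outputs above (the variable names are illustrative and not part of teradataml):

```python
# Hyperparameters copied from the rank 1 and rank 4 outputs above.
rank1 = {
    "response_column": "survived", "name": "decision_forest",
    "tree_type": "Classification", "min_impurity": 0.0, "max_depth": 5,
    "min_node_size": 2, "num_trees": -1, "seed": 42, "persist": False,
    "output_prob": True, "output_responses": ["1", "0"],
}
rank4 = {
    "response_column": "survived", "name": "xgboost",
    "model_type": "Classification", "column_sampling": 1, "min_impurity": 0.0,
    "lambda1": 1.0, "shrinkage_factor": 0.5, "max_depth": 5,
    "min_node_size": 1, "iter_num": 10, "num_boosted_trees": 5, "seed": 42,
    "persist": False, "output_prob": True, "output_responses": ["1", "0"],
}

common = rank1.keys() & rank4.keys()
# Keys present in both models but with different values.
differing = sorted(k for k in common if rank1[k] != rank4[k])
# Keys that exist for only one of the two models.
only_rank1 = sorted(rank1.keys() - rank4.keys())
only_rank4 = sorted(rank4.keys() - rank1.keys())

print("differing:", differing)
print("only in rank 1:", only_rank1)
print("only in rank 4:", only_rank4)
```

A comparison like this makes it easier to see which settings are specific to a model family and which are common to both.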
      
  7. Generate predictions on the test dataset using the best performing model.
    >>> prediction = aml.predict(titanic_test)
    2025-11-04 01:49:52,113 | INFO     | Data Transformation started ...
    2025-11-04 01:49:52,113 | INFO     | Performing transformation carried out in feature engineering phase ...
    2025-11-04 01:49:52,736 | INFO     | Updated dataset after dropping futile columns :
       passenger  survived  pclass     sex   age  sibsp  parch     fare cabin embarked  automl_id
    0        793         0       3  female   NaN      8      2  69.5500  None        S         14
    1        814         0       3  female   6.0      4      2  31.2750  None        S          8
    2        812         0       3    male  39.0      0      0  24.1500  None        S         12
    3        265         0       3  female   NaN      0      0   7.7500  None        Q          5
    4        101         0       3  female  28.0      0      0   7.8958  None        S         13
    5         19         0       3  female  31.0      1      0  18.0000  None        S          7
    6        730         0       3  female  25.0      1      0   7.9250  None        S         11
    7        137         1       1  female  19.0      0      2  26.2833   D47        S         15
    8        244         0       3    male  22.0      0      0   7.1250  None        S          9
    9         61         0       3    male  22.0      0      0   7.2292  None        C          4
    178 rows X 11 columns
    2025-11-04 01:49:53,072 | INFO     | Updated dataset after performing target column transformation :
       passenger  survived  pclass     sex   age  sibsp  parch     fare cabin embarked  automl_id
    0        101         0       3  female  28.0      0      0   7.8958  None        S         13
    1        730         0       3  female  25.0      1      0   7.9250  None        S         11
    2        137         1       1  female  19.0      0      2  26.2833   D47        S         15
    3         61         0       3    male  22.0      0      0   7.2292  None        C          4
    4        812         0       3    male  39.0      0      0  24.1500  None        S         12
    5        734         0       2    male  23.0      0      0  13.0000  None        S          6
    6        345         0       2    male  36.0      0      0  13.0000  None        S         10
    7        793         0       3  female   NaN      8      2  69.5500  None        S         14
    8        814         0       3  female   6.0      4      2  31.2750  None        S          8
    9         19         0       3  female  31.0      1      0  18.0000  None        S          7
    178 rows X 11 columns
    2025-11-04 01:49:53,317 | INFO     | Updated dataset after dropping missing value containing columns :
       passenger  survived  pclass     sex   age  sibsp  parch     fare embarked  automl_id
    0        101         0       3  female  28.0      0      0   7.8958        S         13
    1        730         0       3  female  25.0      1      0   7.9250        S         11
    2        137         1       1  female  19.0      0      2  26.2833        S         15
    3         61         0       3    male  22.0      0      0   7.2292        C          4
    4        812         0       3    male  39.0      0      0  24.1500        S         12
    5        734         0       2    male  23.0      0      0  13.0000        S          6
    6        345         0       2    male  36.0      0      0  13.0000        S         10
    7        793         0       3  female   NaN      8      2  69.5500        S         14
    8        814         0       3  female   6.0      4      2  31.2750        S          8
    9         19         0       3  female  31.0      1      0  18.0000        S          7
    178 rows X 10 columns
    2025-11-04 01:49:54,303 | INFO     | Updated dataset after imputing missing value containing columns :
       passenger  survived  pclass     sex  age  sibsp  parch     fare embarked  automl_id
    0        793         0       3  female   29      8      2  69.5500        S         14
    1        730         0       3  female   25      1      0   7.9250        S         11
    2        137         1       1  female   19      0      2  26.2833        S         15
    3         61         0       3    male   22      0      0   7.2292        C          4
    4        812         0       3    male   39      0      0  24.1500        S         12
    5        265         0       3  female   29      0      0   7.7500        Q          5
    6        244         0       3    male   22      0      0   7.1250        S          9
    7        101         0       3  female   28      0      0   7.8958        S         13
    8        814         0       3  female    6      4      2  31.2750        S          8
    9         19         0       3  female   31      1      0  18.0000        S          7
    178 rows X 10 columns
    2025-11-04 01:49:59,120 | INFO     | Updated dataset after performing categorical encoding :
               survived  pclass  sex_0  sex_1  age  sibsp  parch      fare  embarked_0  embarked_1  embarked_2  automl_id
    passenger
    101               0       3      1      0   28      0      0    7.8958           0           0           1         13
    610               1       1      1      0   40      0      0  153.4625           0           0           1         21
    404               0       3      0      1   28      1      0   15.8500           0           0           1         25
    873               0       1      0      1   33      0      0    5.0000           0           0           1         29
    747               0       3      0      1   16      1      1   20.2500           0           0           1         37
    604               0       3      0      1   44      0      0    8.0500           0           0           1         41
    34                0       2      0      1   66      0      0   10.5000           0           0           1         33
    835               0       3      0      1   18      0      0    8.3000           0           0           1         17
    244               0       3      0      1   22      0      0    7.1250           0           0           1          9
    265               0       3      1      0   29      0      0    7.7500           0           1           0          5
    178 rows X 13 columns
    2025-11-04 01:49:59,273 | INFO     | Performing transformation carried out in data preparation phase ...
    2025-11-04 01:49:59,998 | INFO     | Updated dataset after performing RFE feature selection:
              automl_id  passenger  age  sex_1  pclass  sex_0  embarked_0  embarked_1  sibsp  embarked_2     fare
    survived
    1                64        188   45      1       1      0           0           0      0           1  26.5500
    1                84        756    0      1       2      0           0           0      1           1  14.5000
    1               108         75   32      1       3      0           0           0      0           1  56.4958
    1               116        258   30      0       1      1           0           0      0           1  86.5000
    1               132        521   30      0       1      1           0           0      0           1  93.5000
    1               148        866   42      0       2      1           0           0      0           1  13.0000
    0                13        101   28      0       3      1           0           0      0           1   7.8958
    0                25        404   28      1       3      0           0           0      1           1  15.8500
    0                29        873   33      1       1      0           0           0      0           1   5.0000
    0                33         34   66      1       2      0           0           0      0           1  10.5000
    178 rows X 12 columns
    2025-11-04 01:50:00,864 | INFO     | Updated dataset after performing scaling on RFE selected features :
       survived  r_embarked_0  r_sex_1  r_sex_0  automl_id  r_embarked_1  r_embarked_2  r_passenger     r_age  r_pclass  r_sibsp    r_fare
    0         0             0        1        0         29             0             1     0.980877  0.588235       0.0      0.0  0.087719
    1         0             0        1        0         41             0             1     0.678290  0.803922       1.0      0.0  0.141228
    2         0             0        1        0         65             0             1     0.602925  0.823529       0.0      0.0  0.465789
    3         0             0        0        1         81             1             0     0.735658  0.294118       1.0      0.0  0.118421
    4         0             0        1        0         89             0             1     0.179978  0.803922       1.0      0.0  0.282456
    5         0             0        1        0         97             1             0     0.314961  1.215686       1.0      0.0  0.135965
    6         1             1        1        0         40             0             0     0.673791  0.901961       0.0      0.5  0.998758
    7         1             1        0        1         52             0             0     0.532058  0.392157       0.5      0.0  0.241960
    8         1             0        1        0         64             0             1     0.210349  0.823529       0.0      0.0  0.465789
    9         1             1        0        1         72             0             0     0.988751  1.039216       0.0      0.0  1.458918
    178 rows X 12 columns
    2025-11-04 01:50:02,234 | INFO     | Updated dataset after performing scaling for PCA feature selection :
       survived  parch  embarked_0  sex_1  sex_0  embarked_1  automl_id  embarked_2  passenger  pclass       age  sibsp      fare
    0         1      0           0      1      0           0         64           1   0.210349     0.0  0.823529    0.0  0.465789
    1         1      1           0      1      0           0         84           1   0.849269     0.5 -0.058824    0.5  0.254386
    2         1      0           0      1      0           0        108           1   0.083240     1.0  0.568627    0.0  0.991154
    3         1      0           0      0      1           0        116           1   0.289089     0.0  0.529412    0.0  1.517544
    4         1      0           0      0      1           0        132           1   0.584927     0.0  0.529412    0.0  1.640351
    5         1      0           0      0      1           0        148           1   0.973003     0.5  0.764706    0.0  0.228070
    6         0      0           0      0      1           0         13           1   0.112486     1.0  0.490196    0.0  0.138523
    7         0      0           0      1      0           0         25           1   0.453318     1.0  0.490196    0.5  0.278070
    8         0      0           0      1      0           0         29           1   0.980877     0.0  0.588235    0.0  0.087719
    9         0      0           0      1      0           0         33           1   0.037120     0.5  1.235294    0.0  0.184211
    178 rows X 13 columns
    2025-11-04 01:50:02,690 | INFO     | Updated dataset after performing PCA feature selection :
       automl_id     col_0     col_1     col_2     col_3     col_4     col_5  survived
    0         40  0.115488 -1.108618 -0.961301  0.189160  0.037975  0.449300         1
    1         13  0.644199  0.662373  0.419392 -0.274127 -0.270581 -0.327601         0
    2         52  1.242898 -0.612770 -0.061959 -0.352359  0.154754 -0.207412         1
    3         25 -0.587362  0.155686  0.136922 -0.108206 -0.201738  0.375208         0
    4         64 -0.467396  0.116995 -0.719125  0.350888 -0.227753 -0.312627         1
    5         29 -0.494907  0.122419 -0.637915  0.252426  0.496492 -0.111965         0
    6         72  1.362560 -0.551915 -0.871712  0.129982  0.552677  0.112351         1
    7         33 -0.567285  0.105638 -0.284682  0.180347 -0.359583 -0.357239         0
    8         84 -0.503757  0.158937 -0.218662 -0.006225  0.146552  0.436244         1
    9         41 -0.652225  0.137441  0.163958 -0.100921  0.209487 -0.025870         0
    10 rows X 8 columns
    2025-11-04 01:50:02,981 | INFO     | Data Transformation completed.
    2025-11-04 01:50:03,526 | INFO     | Following model is being picked for evaluation:
    2025-11-04 01:50:03,527 | INFO     | Model ID : DECISIONFOREST_2
    2025-11-04 01:50:03,527 | INFO     | Feature Selection Method : rfe
    2025-11-04 01:50:04,320 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 01:50:06,531 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    /root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
      plt.show()
    2025-11-04 01:50:06,623 | INFO     | Prediction :
       automl_id  prediction  prob_1  prob_0  survived
    0         29           0     0.0     1.0         0
    1         41           0     0.0     1.0         0
    2         65           1     1.0     0.0         0
    3         81           1     1.0     0.0         0
    4         89           0     0.0     1.0         0
    5         97           0     0.0     1.0         0
    6         40           1     1.0     0.0         1
    7         52           1     1.0     0.0         1
    8         64           1     1.0     0.0         1
    9         72           1     1.0     0.0         1
    2025-11-04 01:50:08,738 | INFO     | ROC-AUC :
                  GINI
    AUC
    0.672362  0.344725
       threshold_value      tpr       fpr
    0         0.040816  0.77027  0.240385
    1         0.081633  0.77027  0.240385
    2         0.102041  0.77027  0.240385
    3         0.122449  0.77027  0.240385
    4         0.163265  0.77027  0.240385
    5         0.183673  0.77027  0.240385
    6         0.142857  0.77027  0.240385
    7         0.061224  0.77027  0.240385
    8         0.020408  0.77027  0.240385
    9         0.000000  1.00000  1.000000
    2025-11-04 01:50:09,226 | INFO     | Confusion Matrix :
    [[79 25]
     [17 57]]
    >>> prediction.head()
       automl_id  prediction  prob_1  prob_0  survived
    0         64           1     1.0     0.0         1
    1         84           1     1.0     0.0         1
    2        108           0     0.0     1.0         1
    3        116           1     1.0     0.0         1
    4        132           1     1.0     0.0         1
    5        148           1     1.0     0.0         1
    6         13           0     0.0     1.0         0
    7         25           0     0.0     1.0         0
    8         29           0     0.0     1.0         0
    9         33           0     0.0     1.0         0
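
The ROC-AUC figures and the confusion matrix in the log above are linked by simple arithmetic. The sketch below assumes the confusion matrix rows are actual classes and the columns are predicted classes; the output does not state this, but the assumption reproduces the tpr and fpr values in the ROC table. It also uses the standard relation GINI = 2 * AUC - 1.

```python
# Confusion matrix copied from the log above.
# Assumption: rows are actual classes (0, 1), columns are predicted classes (0, 1).
confusion = [[79, 25],
             [17, 57]]
(tn, fp), (fn, tp) = confusion

tpr = tp / (tp + fn)  # true positive rate (recall for class 1)
fpr = fp / (fp + tn)  # false positive rate

# AUC copied from the log; Gini coefficient derived from it.
auc = 0.672362
gini = 2 * auc - 1

print(f"tpr={tpr:.5f} fpr={fpr:.6f} gini={gini:.6f}")
```

The derived Gini value can differ from the logged one in the last digit because the logged AUC is already rounded.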
  8. Generate evaluation metrics on the test dataset using the best performing model.
    >>> performance_metrics = aml.evaluate(titanic_test)
    2025-11-04 01:50:49,987 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 01:50:50,536 | INFO     | Following model is being picked for evaluation:
    2025-11-04 01:50:50,536 | INFO     | Model ID : DECISIONFOREST_2
    2025-11-04 01:50:50,536 | INFO     | Feature Selection Method : rfe
    2025-11-04 01:50:54,574 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1       79       17   0.822917  0.759615  0.790000      104
    1               1  CLASS_2       25       57   0.695122  0.770270  0.730769       74
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall     0.764045
    1       5     Macro-Precision     0.759019
    2       6        Macro-Recall     0.764943
    3       7            Macro-F1     0.760385
    4       9     Weighted-Recall     0.764045
    5      10         Weighted-F1     0.765376
    6       8  Weighted-Precision     0.769789
    7       4            Micro-F1     0.764045
    8       2     Micro-Precision     0.764045
    9       1            Accuracy     0.764045
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum
    0               0  CLASS_1       79       17   0.822917  0.759615  0.790000      104
    1               1  CLASS_2       25       57   0.695122  0.770270  0.730769       74
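
The per-class and aggregate metrics above can be reproduced from the four counts in the table. The sketch below is plain Python, not the teradataml implementation, and assumes that each row of the table is a predicted class and that the CLASS_1 and CLASS_2 columns are the actual classes 0 and 1. This reading is consistent with the reported Support values.

```python
# counts[p][a] = number of test rows predicted as class p whose actual class is a.
# Values copied from the Performance Metrics table above.
counts = [[79, 17],
          [25, 57]]
n_classes = len(counts)
total = sum(sum(row) for row in counts)

precision, recall, f1, support = [], [], [], []
for c in range(n_classes):
    correct = counts[c][c]
    predicted_as_c = sum(counts[c])                         # row total
    actual_c = sum(counts[p][c] for p in range(n_classes))  # column total
    p = correct / predicted_as_c
    r = correct / actual_c
    precision.append(p)
    recall.append(r)
    f1.append(2 * p * r / (p + r))
    support.append(actual_c)

accuracy = sum(counts[c][c] for c in range(n_classes)) / total
macro_f1 = sum(f1) / n_classes
# Weighted metrics average the per-class values using support as the weight.
weighted_precision = sum(p * s for p, s in zip(precision, support)) / total

print(f"accuracy={accuracy:.6f}")
print(f"macro_f1={macro_f1:.6f}")
print(f"weighted_precision={weighted_precision:.6f}")
```

With two classes, Micro-Precision, Micro-Recall, and Micro-F1 all equal Accuracy, which is why those four rows report the same value.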