Example 5: Run AutoML for multiclass classification problem using early stopping timer and max_models

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2026-01-07
Product Category: Teradata Vantage

This example predicts the species of an iris flower from its sepal and petal measurements.

Run AutoML to find the best-performing model with the following specifications:
  • Set the early stopping timer (max_runtime_secs) to 100 seconds and max_models to 5.
  • Include only the ‘xgboost’ model for training.
  • Use verbose level 2 to get a detailed log.
  1. Load the data and split it into train and test datasets.
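    1. Load the example data and create a teradataml DataFrame from the iris_input table.
      >>> from teradataml import DataFrame, load_example_data
      >>> load_example_data("teradataml", "iris_input")
      >>> iris = DataFrame.from_table("iris_input")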
    2. Sample the data: 80% for training and 20% for testing.
      >>> iris_sample = iris.sample(frac=[0.8, 0.2])
    3. Separate the train and test data, dropping the sampleid column.
      >>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
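The sampling step can be pictured in plain Python. This is only an illustration of the idea, not teradataml's implementation: it assumes that sample(frac=[0.8, 0.2]) labels each row with a sampleid of 1 or 2 in those proportions, and assign_sample_ids and select_sample are hypothetical helpers. With 150 rows it yields the 120/30 split that appears later in the log (120 training rows, 30 test rows).

```python
import random

def assign_sample_ids(rows, fracs, seed=0):
    """Hypothetical helper: tag each row with a 1-based sampleid.

    Row positions are shuffled, then consecutive slices sized by `fracs`
    get sample ids 1, 2, ...; the last slice takes any rounding remainder.
    """
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    sample_ids = [0] * len(rows)
    start = 0
    for sid, frac in enumerate(fracs, start=1):
        end = len(rows) if sid == len(fracs) else start + round(frac * len(rows))
        for pos in order[start:end]:
            sample_ids[pos] = sid
        start = end
    return [dict(row, sampleid=sid) for row, sid in zip(rows, sample_ids)]

def select_sample(tagged, sid):
    """Keep rows with the given sampleid and drop the sampleid column."""
    return [{k: v for k, v in row.items() if k != "sampleid"}
            for row in tagged if row["sampleid"] == sid]

# 150 placeholder rows standing in for the iris_input table.
rows = [{"id": i} for i in range(1, 151)]
tagged = assign_sample_ids(rows, [0.8, 0.2])
train = select_sample(tagged, 1)  # like iris_sample[iris_sample['sampleid'] == 1]
test = select_sample(tagged, 2)   # like iris_sample[iris_sample['sampleid'] == 2]
print(len(train), len(test))  # 120 30
```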
  2. Create an AutoML instance.
    >>> from teradataml import AutoML
    >>> aml = AutoML(task_type="Classification",
    ...              include=['xgboost'],
    ...              verbose=2,
    ...              max_runtime_secs=100,
    ...              max_models=5)
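Setting both max_runtime_secs and max_models gives AutoML two budgets. The sketch below models one plausible interaction, assuming training stops as soon as either limit is reached. It is a plain-Python illustration, not teradataml's internal logic; train_with_budget, FakeClock, and the per-model durations are hypothetical.

```python
import time

def train_with_budget(candidates, train_fn, max_runtime_secs, max_models,
                      clock=time.monotonic):
    """Train candidates until the time budget or the model cap is reached.

    Hypothetical illustration only; returns (results, reason_for_stopping).
    """
    start = clock()
    results = []
    for params in candidates:
        if len(results) >= max_models:
            return results, "max_models reached"
        if clock() - start >= max_runtime_secs:
            return results, "max_runtime_secs reached"
        results.append(train_fn(params))
    return results, "all candidates trained"

class FakeClock:
    """Manually advanced clock so the example is deterministic."""
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

# Scenario 1: slow models (30 s each), so the 100 s timer stops training first.
slow_clock = FakeClock()

def slow_train(params):
    slow_clock.now += 30.0
    return params

slow_results, slow_reason = train_with_budget(
    range(10), slow_train, max_runtime_secs=100, max_models=5, clock=slow_clock)
print(len(slow_results), slow_reason)  # 4 max_runtime_secs reached

# Scenario 2: fast models (1 s each), so the 5-model cap stops training first.
fast_clock = FakeClock()

def fast_train(params):
    fast_clock.now += 1.0
    return params

fast_results, fast_reason = train_with_budget(
    range(10), fast_train, max_runtime_secs=100, max_models=5, clock=fast_clock)
print(len(fast_results), fast_reason)  # 5 max_models reached
```

Injecting the clock keeps the sketch testable; with a real timer the outcome depends on how long each model actually takes to train.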
  3. Fit the AutoML instance on the training data.
    >>> aml.fit(iris_train, iris_train.species)
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 02:09:34,217 | INFO     | Feature Exploration started
    2025-11-04 02:09:34,217 | INFO     | Data Overview:
    2025-11-04 02:09:34,302 | INFO     | Total Rows in the data: 120
    2025-11-04 02:09:34,343 | INFO     | Total Columns in the data: 6
    2025-11-04 02:09:34,971 | INFO     | Column Summary:
         ColumnName Datatype  NonNullCount  NullCount BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
    0  petal_length    FLOAT           120          0       None          0            120              0             0.0              100.0
    1   petal_width    FLOAT           120          0       None          0            120              0             0.0              100.0
    2            id  INTEGER           120          0       None          0            120              0             0.0              100.0
    3       species  INTEGER           120          0       None          0            120              0             0.0              100.0
    4   sepal_width    FLOAT           120          0       None          0            120              0             0.0              100.0
    5  sepal_length    FLOAT           120          0       None          0            120              0             0.0              100.0
    2025-11-04 02:09:35,778 | INFO     | Statistics of Data:
          ATTRIBUTE StatName  StatValue
    0   petal_width  MAXIMUM        2.5
    1  petal_length  MINIMUM        1.0
    2  petal_length  MAXIMUM        6.9
    3            id    COUNT      120.0
    4            id  MAXIMUM      150.0
    5  sepal_length    COUNT      120.0
    6  sepal_length  MINIMUM        4.4
    7  sepal_length  MAXIMUM        7.9
    8            id  MINIMUM        1.0
    9  petal_length    COUNT      120.0
    2025-11-04 02:09:38,931 | INFO     | Columns with outlier percentage :-
        ColumnName  OutlierPercentage
    0  sepal_width           3.333333
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 02:09:39,251 | INFO     | Feature Engineering started ...
    2025-11-04 02:09:39,252 | INFO     | Handling duplicate records present in dataset ...
    2025-11-04 02:09:39,387 | INFO     | Analysis completed. No action taken.
    2025-11-04 02:09:39,387 | INFO     | Total time to handle duplicate records: 0.14 sec
    2025-11-04 02:09:39,387 | INFO     | Handling less significant features from data ...
    2025-11-04 02:09:40,241 | INFO     | Total time to handle less significant features: 0.85 sec
    2025-11-04 02:09:40,241 | INFO     | Handling Date Features ...
    2025-11-04 02:09:40,241 | INFO     | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    2025-11-04 02:09:40,241 | INFO     | Total time to handle date features: 0.00 sec
    2025-11-04 02:09:40,241 | INFO     | Checking Missing values in dataset ...
    2025-11-04 02:09:41,623 | INFO     | Analysis Completed. No Missing Values Detected.
    2025-11-04 02:09:41,623 | INFO     | Total time to find missing values in data: 1.38 sec
    2025-11-04 02:09:41,624 | INFO     | Imputing Missing Values ...
    2025-11-04 02:09:41,624 | INFO     | Analysis completed. No imputation required.
    2025-11-04 02:09:41,624 | INFO     | Time taken to perform imputation: 0.00 sec
    2025-11-04 02:09:41,624 | INFO     | Performing encoding for categorical columns ...
    2025-11-04 02:09:41,975 | INFO     | Analysis completed. No categorical columns were found.
    2025-11-04 02:09:41,975 | INFO     | Time taken to encode the columns: 0.35 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 02:09:41,976 | INFO     | Data preparation started ...
    2025-11-04 02:09:41,976 | INFO     | Outlier preprocessing ...
    2025-11-04 02:09:45,835 | INFO     | Columns with outlier percentage :-
        ColumnName  OutlierPercentage
    0  sepal_width           3.333333
    2025-11-04 02:09:46,508 | INFO     | Deleting rows of these columns:
    ['sepal_width']
    2025-11-04 02:09:49,451 | INFO     | Sample of dataset after removing outlier rows:
         sepal_length  sepal_width  petal_length  petal_width  species  automl_id
    id
    99            5.1          2.5           3.0          1.1        2         15
    97            5.7          2.9           4.2          1.3        2         23
    15            5.8          4.0           1.2          0.2        1         27
    53            6.9          3.1           4.9          1.5        2         31
    30            4.7          3.2           1.6          0.2        1         39
    91            5.5          2.6           4.4          1.2        2         43
    114           5.7          2.5           5.0          2.0        3         35
    36            5.0          3.2           1.2          0.2        1         19
    59            6.6          2.9           4.6          1.3        2         11
    19            5.7          3.8           1.7          0.3        1          7
    116 rows X 7 columns
    2025-11-04 02:09:49,602 | INFO     | Time Taken by Outlier processing: 7.63 sec
    2025-11-04 02:09:49,603 | INFO     | Checking imbalance data ...
    2025-11-04 02:09:49,707 | INFO     | Imbalance Not Found.
    2025-11-04 02:09:50,712 | INFO     | Feature selection using rfe ...
    2025-11-04 02:09:57,569 | INFO     | feature selected by RFE:
    ['id', 'petal_length']
    2025-11-04 02:09:57,571 | INFO     | Total time taken by feature selection: 6.86 sec
    2025-11-04 02:09:57,831 | INFO     | Scaling Features of rfe data ...
    2025-11-04 02:09:58,603 | INFO     | columns that will be scaled:
    ['r_id', 'r_petal_length']
    2025-11-04 02:10:00,490 | INFO     | Dataset sample after scaling:
       automl_id  species      r_id  r_petal_length
    0          7        1  0.120805        0.118644
    1          9        1  0.107383        0.050847
    2         10        3  0.798658        0.677966
    3         11        2  0.389262        0.610169
    4         13        3  0.926174        0.644068
    5         14        2  0.375839        0.627119
    6         12        2  0.503356        0.576271
    7          8        1  0.248322        0.067797
    8          6        2  0.530201        0.423729
    9          5        3  0.939597        0.779661
    116 rows X 4 columns
    2025-11-04 02:10:01,015 | INFO     | Total time taken by feature scaling: 3.18 sec
    2025-11-04 02:10:01,016 | INFO     | Scaling Features of pca data ...
    2025-11-04 02:10:01,532 | INFO     | columns that will be scaled:
    ['id', 'sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    2025-11-04 02:10:03,425 | INFO     | Dataset sample after scaling:
       automl_id  species        id  sepal_length  sepal_width  petal_length  petal_width
    0         16        2  0.617450      0.400000     0.222222      0.508475     0.458333
    1         10        3  0.798658      0.457143     0.000000      0.677966     0.583333
    2         14        2  0.375839      0.542857     0.611111      0.627119     0.625000
    3          5        3  0.939597      0.657143     0.500000      0.779661     0.958333
    4         13        3  0.926174      0.457143     0.444444      0.644068     0.708333
    5          7        1  0.120805      0.371429     0.888889      0.118644     0.083333
    6         11        2  0.389262      0.628571     0.388889      0.610169     0.500000
    7         15        2  0.657718      0.200000     0.166667      0.338983     0.416667
    8          9        1  0.107383      0.285714     0.944444      0.050847     0.125000
    9          6        2  0.530201      0.371429     0.222222      0.423729     0.375000
    116 rows X 7 columns
    2025-11-04 02:10:03,972 | INFO     | Total time taken by feature scaling: 2.96 sec
    2025-11-04 02:10:03,972 | INFO     | Dimension Reduction using pca ...
    2025-11-04 02:10:04,588 | INFO     | PCA columns:
    ['col_0', 'col_1', 'col_2']
    2025-11-04 02:10:04,589 | INFO     | Total time taken by PCA: 0.62 sec
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    2025-11-04 02:10:04,945 | INFO     | Model Training started ...
    2025-11-04 02:10:04,989 | INFO     | Hyperparameters used for model training:
    2025-11-04 02:10:04,989 | INFO     | Model: xgboost
    2025-11-04 02:10:04,989 | INFO     | Hyperparameters: {'response_column': 'species', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1), 'lambda1': (1.0, 0.001, 0.01), 'shrinkage_factor': (0.5, 0.1, 0.2), 'max_depth': (5, 6, 7, 8), 'min_node_size': (1, 2), 'iter_num': (10, 20), 'num_boosted_trees': (-1, 2, 5), 'seed': 42}
    2025-11-04 02:10:04,989 | INFO     | Total number of models for xgboost: 1728
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2025-11-04 02:10:04,990 | INFO     | Performing hyperparameter tuning ...
    2025-11-04 02:10:06,204 | INFO     | Model training for xgboost
    2025-11-04 02:10:19,205 | INFO     | ----------------------------------------------------------------------------------------------------
    2025-11-04 02:10:19,207 | INFO     | Leaderboard
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_3               pca  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
    1     2  XGBOOST_0               rfe  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
    2     3  XGBOOST_1               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
    3     4  XGBOOST_2               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
    [4 rows x 13 columns]
    4 rows X 13 columns
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    >>> Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 12/12
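The scaled values in the log are consistent with min-max scaling, x' = (x - min) / (max - min), using the training ranges from the "Statistics of Data" output. For example, id 19 with range 1 to 150 maps to 18/149 ≈ 0.120805, and petal_length 1.7 with range 1.0 to 6.9 maps to 0.7/5.9 ≈ 0.118644, matching the row with automl_id 7. The same training ranges appear to be reused on the test data in the prediction step (id 106 becomes 0.704698). The sketch below reproduces that arithmetic; transform_min_max is a hypothetical helper, and the formula is inferred from the logged numbers rather than taken from teradataml documentation.

```python
def transform_min_max(value, value_range):
    """Map value into [0, 1] using a (min, max) range learned on training data."""
    lo, hi = value_range
    return (value - lo) / (hi - lo)

# Training ranges taken from the "Statistics of Data" log.
id_range = (1.0, 150.0)
petal_length_range = (1.0, 6.9)

# Training row with automl_id 7: id 19, petal_length 1.7.
r_id = round(transform_min_max(19, id_range), 6)
r_petal_length = round(transform_min_max(1.7, petal_length_range), 6)
print(r_id, r_petal_length)  # 0.120805 0.118644

# Test row id 106, petal_length 6.6, scaled with the training ranges.
test_r_id = round(transform_min_max(106, id_range), 6)
test_r_petal_length = round(transform_min_max(6.6, petal_length_range), 6)
print(test_r_id, test_r_petal_length)  # 0.704698 0.949153
```

Reusing the training ranges, rather than refitting on the test set, keeps train and test features on the same scale.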
  4. Display the model leaderboard.
    >>> aml.leaderboard()
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_3               pca  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
    1     2  XGBOOST_0               rfe  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
    2     3  XGBOOST_1               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
    3     4  XGBOOST_2               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
    [4 rows x 13 columns]
  5. Display the best-performing model.
    >>> aml.leader()
       RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0     1  XGBOOST_3               pca       1.0              1.0  ...           1.0       1.0                 1.0              1.0          1.0
    [1 rows x 13 columns]
  6. Display the hyperparameters of a trained model by its leaderboard rank.
    >>> aml.model_hyperparameters(rank=2)
    {'response_column': 'species', 
      'name': 'xgboost', 
      'model_type': 'Classification', 
      'column_sampling': 1, 
      'min_impurity': 0.0, 
      'lambda1': 0.001, 
      'shrinkage_factor': 0.2, 
      'max_depth': 6, 
      'min_node_size': 2, 
      'iter_num': 10, 
      'num_boosted_trees': 5, 
      'seed': 42, 
      'persist': False, 
      'output_prob': True, 
      'output_responses': ['1', '2', '3'], 
      'max_models': 3}
    
    >>> aml.model_hyperparameters(rank=4)
    {'response_column': 'species', 
      'name': 'xgboost', 
      'model_type': 'Classification', 
      'column_sampling': 1, 
      'min_impurity': 0.0, 
      'lambda1': 0.01, 
      'shrinkage_factor': 0.1, 
      'max_depth': 5, 
      'min_node_size': 1, 
      'iter_num': 20, 
      'num_boosted_trees': -1, 
      'seed': 42, 
      'persist': False, 
      'output_prob': True, 
      'output_responses': ['1', '2', '3'], 
      'max_models': 3}
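The search space logged during fitting lists candidate values per hyperparameter, and the logged total of 1728 models for xgboost is the product of the option counts (2 × 2 × 3 × 3 × 4 × 2 × 2 × 3). The sketch below recomputes that count and checks that the values reported for ranks 2 and 4 come from the logged grid. The dictionaries are copied from the output above; treating the tuples as an exhaustive grid is an assumption about how AutoML enumerates candidates, and in_search_space is a hypothetical helper.

```python
import itertools
import math

# Tunable search space copied from the "Hyperparameters:" log line (fixed keys omitted).
search_space = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1),
    "lambda1": (1.0, 0.001, 0.01),
    "shrinkage_factor": (0.5, 0.1, 0.2),
    "max_depth": (5, 6, 7, 8),
    "min_node_size": (1, 2),
    "iter_num": (10, 20),
    "num_boosted_trees": (-1, 2, 5),
}

# Grid size: product of the number of options per hyperparameter.
grid_size = math.prod(len(options) for options in search_space.values())
# Same count by enumerating every combination explicitly.
enumerated = sum(1 for _ in itertools.product(*search_space.values()))
print(grid_size, enumerated)  # 1728 1728

# Tuned values reported by aml.model_hyperparameters(rank=2) and (rank=4).
rank_2 = {"column_sampling": 1, "min_impurity": 0.0, "lambda1": 0.001,
          "shrinkage_factor": 0.2, "max_depth": 6, "min_node_size": 2,
          "iter_num": 10, "num_boosted_trees": 5}
rank_4 = {"column_sampling": 1, "min_impurity": 0.0, "lambda1": 0.01,
          "shrinkage_factor": 0.1, "max_depth": 5, "min_node_size": 1,
          "iter_num": 20, "num_boosted_trees": -1}

def in_search_space(params, space):
    """True if every tuned value is one of the candidates listed for it."""
    return all(params[name] in options for name, options in space.items())

print(in_search_space(rank_2, search_space), in_search_space(rank_4, search_space))  # True True
```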
    
  7. Generate predictions on the test dataset using the best-performing model.
    >>> prediction = aml.predict(iris_test)
    2025-11-04 02:13:26,384 | INFO     | Data Transformation started ...
    2025-11-04 02:13:26,384 | INFO     | Performing transformation carried out in feature engineering phase ...
    2025-11-04 02:13:26,914 | INFO     | Updated dataset after performing target column transformation :
        id  sepal_length  sepal_width  petal_length  petal_width  species  automl_id
    0  106           7.6          3.0           6.6          2.1        3         13
    1  116           6.4          3.2           5.3          2.3        3          8
    2   43           4.4          3.2           1.3          0.2        1         12
    3  122           5.6          2.8           4.9          2.0        3          7
    4   74           6.1          2.8           4.7          1.2        2         15
    5   40           5.1          3.4           1.5          0.2        1          6
    6   62           5.9          3.0           4.2          1.5        2         10
    7   37           5.5          3.5           1.3          0.2        1         14
    8  137           6.3          3.4           5.6          2.4        3         11
    9   78           6.7          3.0           5.0          1.7        2          4
    30 rows X 7 columns
    2025-11-04 02:13:27,368 | INFO     | Performing transformation carried out in data preparation phase ...
    2025-11-04 02:13:28,134 | INFO     | Updated dataset after performing RFE feature selection:
                id  petal_length  species
    automl_id
    30          67           4.5        2
    5          101           6.0        3
    24          18           1.4        1
    17          64           4.7        2
    13         106           6.6        3
    7          122           4.9        3
    22          92           4.6        2
    12          43           1.3        1
    34         105           5.8        3
    26         149           5.4        3
    30 rows X 4 columns
    2025-11-04 02:13:28,833 | INFO     | Updated dataset after performing scaling on RFE selected features :
       automl_id  species      r_id  r_petal_length
    0         13        3  0.704698        0.949153
    1         15        2  0.489933        0.627119
    2         30        2  0.442953        0.593220
    3          7        3  0.812081        0.661017
    4         12        1  0.281879        0.050847
    5         26        3  0.993289        0.745763
    6          5        3  0.671141        0.847458
    7         24        1  0.114094        0.067797
    8         22        2  0.610738        0.610169
    9         19        3  0.778523        0.762712
    30 rows X 4 columns
    2025-11-04 02:13:29,863 | INFO     | Updated dataset after performing scaling for PCA feature selection :
       automl_id  species        id  sepal_length  sepal_width  petal_length  petal_width
    0         12        1  0.281879      0.000000     0.555556      0.050847     0.041667
    1         15        2  0.489933      0.485714     0.333333      0.627119     0.458333
    2         30        2  0.442953      0.342857     0.444444      0.593220     0.583333
    3         17        2  0.422819      0.485714     0.388889      0.627119     0.541667
    4         13        3  0.704698      0.914286     0.444444      0.949153     0.833333
    5         26        3  0.993289      0.514286     0.666667      0.745763     0.916667
    6          5        3  0.671141      0.542857     0.611111      0.847458     1.000000
    7         24        1  0.114094      0.200000     0.722222      0.067797     0.083333
    8         34        3  0.697987      0.600000     0.444444      0.813559     0.875000
    9         19        3  0.778523      0.600000     0.444444      0.762712     0.708333
    30 rows X 7 columns
    2025-11-04 02:13:30,204 | INFO     | Updated dataset after performing PCA feature selection :
       automl_id     col_0     col_1     col_2  species
    0         26  0.642726  0.216844  0.320843        3
    1         17  0.129536 -0.024644 -0.161745        2
    2          7  0.433299 -0.149514  0.189948        3
    3         19  0.502673  0.055083  0.030002        3
    4          5  0.602389  0.215334  0.029613        3
    5         34  0.580657  0.075005 -0.032312        3
    6         22  0.204231  0.008493 -0.001782        2
    7         15  0.127093 -0.086724 -0.135035        2
    8         24 -0.736908  0.143687 -0.010681        1
    9         13  0.752095  0.201770 -0.234565        3
    10 rows X 5 columns
    2025-11-04 02:13:30,488 | INFO     | Data Transformation completed.
    2025-11-04 02:13:31,034 | INFO     | Following model is being picked for evaluation:
    2025-11-04 02:13:31,034 | INFO     | Model ID : XGBOOST_3
    2025-11-04 02:13:31,034 | INFO     | Feature Selection Method : pca
    2025-11-04 02:13:31,664 | INFO     | Applying SHAP for Model Interpretation...
    2025-11-04 02:13:33,748 | INFO     | SHAP Analysis Completed. Feature Importance Available.
    2025-11-04 02:13:33,847 | INFO     | Prediction :
       automl_id  Prediction  species    prob_1    prob_2    prob_3
    0          7           3        3  0.100455  0.100461  0.799084
    1          5           3        3  0.100455  0.100461  0.799084
    2         34           3        3  0.100455  0.100461  0.799084
    3         22           2        2  0.104979  0.790093  0.104928
    4         24           1        1  0.799107  0.100466  0.100428
    5         13           3        3  0.100455  0.100461  0.799084
    6         15           2        2  0.104979  0.790093  0.104928
    7         19           3        3  0.100455  0.100461  0.799084
    8         17           2        2  0.104979  0.790093  0.104928
    9         26           3        3  0.100455  0.100461  0.799084
    2025-11-04 02:13:34,222 | INFO     | Confusion Matrix :
    [[ 8  0  0]
     [ 0 12  0]
     [ 0  0 10]]
    >>> prediction.head()
       automl_id  Prediction  species    prob_1    prob_2    prob_3
    0          7           3        3  0.100455  0.100461  0.799084
    1          5           3        3  0.100455  0.100461  0.799084
    2         34           3        3  0.100455  0.100461  0.799084
    3         22           2        2  0.104979  0.790093  0.104928
    4         24           1        1  0.799107  0.100466  0.100428
    5         13           3        3  0.100455  0.100461  0.799084
    6         15           2        2  0.104979  0.790093  0.104928
    7         19           3        3  0.100455  0.100461  0.799084
    8         17           2        2  0.104979  0.790093  0.104928
    9         26           3        3  0.100455  0.100461  0.799084
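In the prediction output, the Prediction column agrees with the class whose prob_ column is largest, and the confusion matrix tallies actual against predicted labels. The sketch below illustrates both on the ten rows shown by prediction.head(); predict_from_probs and confusion_matrix are hypothetical helpers, and the orientation (rows are actual classes, columns are predicted classes) is an assumption because the log does not label the axes.

```python
# The ten rows shown by prediction.head(): (species, prob_1, prob_2, prob_3).
rows = [
    (3, 0.100455, 0.100461, 0.799084),
    (3, 0.100455, 0.100461, 0.799084),
    (3, 0.100455, 0.100461, 0.799084),
    (2, 0.104979, 0.790093, 0.104928),
    (1, 0.799107, 0.100466, 0.100428),
    (3, 0.100455, 0.100461, 0.799084),
    (2, 0.104979, 0.790093, 0.104928),
    (3, 0.100455, 0.100461, 0.799084),
    (2, 0.104979, 0.790093, 0.104928),
    (3, 0.100455, 0.100461, 0.799084),
]
CLASSES = [1, 2, 3]

def predict_from_probs(probs, classes=CLASSES):
    """Return the class label with the highest probability (argmax)."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return classes[best_index]

def confusion_matrix(actual, predicted, classes=CLASSES):
    """Count (actual, predicted) pairs; row = actual, column = predicted (assumed)."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

actual = [row[0] for row in rows]
predicted = [predict_from_probs(row[1:]) for row in rows]
print(predicted)  # [3, 3, 3, 2, 1, 3, 2, 3, 2, 3]
print(confusion_matrix(actual, predicted))  # [[1, 0, 0], [0, 3, 0], [0, 0, 6]]
```

Applied to all 30 test rows, the same tally would give the diagonal matrix logged above, since every prediction matches the actual species.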
  8. Generate evaluation metrics on the test dataset using the best-performing model.
    >>> performance_metrics = aml.evaluate(iris_test)
    2025-11-04 02:14:08,789 | INFO     | Skipping data transformation as data is already transformed.
    2025-11-04 02:14:09,339 | INFO     | Following model is being picked for evaluation:
    2025-11-04 02:14:09,340 | INFO     | Model ID : XGBOOST_3
    2025-11-04 02:14:09,340 | INFO     | Feature Selection Method : pca
    2025-11-04 02:14:11,998 | INFO     | Performance Metrics :
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
    SeqNum
    2               3  CLASS_3        0        0       10        1.0     1.0  1.0       10
    1               2  CLASS_2        0       12        0        1.0     1.0  1.0       12
    0               1  CLASS_1        8        0        0        1.0     1.0  1.0        8
    --------------------------------------------------------------------------------
       SeqNum              Metric  MetricValue
    0       3        Micro-Recall          1.0
    1       5     Macro-Precision          1.0
    2       6        Macro-Recall          1.0
    3       7            Macro-F1          1.0
    4       9     Weighted-Recall          1.0
    5      10         Weighted-F1          1.0
    6       8  Weighted-Precision          1.0
    7       4            Micro-F1          1.0
    8       2     Micro-Precision          1.0
    9       1            Accuracy          1.0
    >>> performance_metrics
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
    SeqNum
    0               1  CLASS_1        8        0        0        1.0     1.0  1.0        8
    2               3  CLASS_3        0        0       10        1.0     1.0  1.0       10
    1               2  CLASS_2        0       12        0        1.0     1.0  1.0       12
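The per-class Precision, Recall, F1, and Support values, and the micro, macro, and weighted averages, all follow from the confusion matrix. The sketch below applies the standard definitions to the matrix logged in the prediction step ([[8, 0, 0], [0, 12, 0], [0, 0, 10]]); it is an assumption that aml.evaluate computes them the same way, and classification_metrics is a hypothetical helper. Because every test row lies on the diagonal, every metric is 1.0, matching the output above.

```python
def classification_metrics(matrix):
    """Per-class and averaged metrics from a square confusion matrix.

    Assumes rows are actual classes and columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    per_class = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        support = sum(matrix[k])                           # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall,
                          "f1": f1, "support": support})

    accuracy = sum(matrix[k][k] for k in range(n)) / total
    # With exactly one predicted label per row, micro-averaged P, R, and F1 equal accuracy.
    summary = {"accuracy": accuracy, "micro_precision": accuracy,
               "micro_recall": accuracy, "micro_f1": accuracy}
    for name in ("precision", "recall", "f1"):
        # Macro: unweighted mean over classes; weighted: mean weighted by support.
        summary[f"macro_{name}"] = sum(c[name] for c in per_class) / n
        summary[f"weighted_{name}"] = sum(c[name] * c["support"] for c in per_class) / total
    return per_class, summary

per_class, summary = classification_metrics([[8, 0, 0], [0, 12, 0], [0, 0, 10]])
print([c["support"] for c in per_class])  # [8, 12, 10]
print(summary["accuracy"], summary["macro_f1"], summary["weighted_f1"])  # 1.0 1.0 1.0
```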