This example predicts the species of an iris flower from its sepal and petal measurements.
Run AutoML to obtain the best performing model with the following specifications:
- Set the early stopping timer to 100 seconds and max_models to 5.
- Include only the 'xgboost' model for training.
- Use verbose level 2 to get detailed logs.
- Load the data and split it into train and test datasets.
- Load the example data.
>>> load_example_data("teradataml", "iris_input")
>>> iris = DataFrame("iris_input")
- Perform sampling to get 80% of the rows for training and 20% for testing.
>>> iris_sample = iris.sample(frac = [0.8, 0.2])
- Fetch train and test data.
>>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
>>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
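For readers unfamiliar with teradataml's sampling semantics: sample(frac=[0.8, 0.2]) tags each row with a 1-based 'sampleid' column indicating which fraction the row fell into, which is why the split above filters on sampleid == 1 and sampleid == 2. A minimal plain-Python sketch of that behavior (the sample_frac helper is hypothetical, for illustration only, not part of the teradataml API):

```python
import random

def sample_frac(rows, fracs, seed=42):
    """Hypothetical sketch of DataFrame.sample(frac=[...]):
    shuffle the rows, then tag each row with a 1-based
    'sampleid' according to the requested fractions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    out, start = [], 0
    for sid, frac in enumerate(fracs, start=1):
        count = round(len(rows) * frac)
        for row in shuffled[start:start + count]:
            out.append({**row, "sampleid": sid})
        start += count
    return out

rows = [{"id": i} for i in range(150)]
tagged = sample_frac(rows, [0.8, 0.2])
train = [r for r in tagged if r["sampleid"] == 1]
test = [r for r in tagged if r["sampleid"] == 2]
print(len(train), len(test))  # 120 30
```

With 150 input rows this yields 120 training rows and 30 test rows, and the two partitions are disjoint, mirroring the iris split used in this example.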
- Create an AutoML instance.
>>> aml = AutoML(task_type="Classification",
...              include=['xgboost'],
...              verbose=2,
...              max_runtime_secs=100,
...              max_models=5)
- Fit training data.
>>> aml.fit(iris_train, iris_train.species)
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 02:09:34,217 | INFO | Feature Exploration started
2025-11-04 02:09:34,217 | INFO | Data Overview:
2025-11-04 02:09:34,302 | INFO | Total Rows in the data: 120
2025-11-04 02:09:34,343 | INFO | Total Columns in the data: 6
2025-11-04 02:09:34,971 | INFO | Column Summary:
     ColumnName Datatype  NonNullCount  NullCount BlankCount  ZeroCount  PositiveCount  NegativeCount  NullPercentage  NonNullPercentage
0  petal_length    FLOAT           120          0       None          0            120              0             0.0              100.0
1   petal_width    FLOAT           120          0       None          0            120              0             0.0              100.0
2            id  INTEGER           120          0       None          0            120              0             0.0              100.0
3       species  INTEGER           120          0       None          0            120              0             0.0              100.0
4   sepal_width    FLOAT           120          0       None          0            120              0             0.0              100.0
5  sepal_length    FLOAT           120          0       None          0            120              0             0.0              100.0
2025-11-04 02:09:35,778 | INFO | Statistics of Data:
      ATTRIBUTE StatName  StatValue
0   petal_width  MAXIMUM        2.5
1  petal_length  MINIMUM        1.0
2  petal_length  MAXIMUM        6.9
3            id    COUNT      120.0
4            id  MAXIMUM      150.0
5  sepal_length    COUNT      120.0
6  sepal_length  MINIMUM        4.4
7  sepal_length  MAXIMUM        7.9
8            id  MINIMUM        1.0
9  petal_length    COUNT      120.0
2025-11-04 02:09:38,931 | INFO | Columns with outlier percentage :-
    ColumnName  OutlierPercentage
0  sepal_width           3.333333
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 02:09:39,251 | INFO | Feature Engineering started ...
2025-11-04 02:09:39,252 | INFO | Handling duplicate records present in dataset ...
2025-11-04 02:09:39,387 | INFO | Analysis completed. No action taken.
2025-11-04 02:09:39,387 | INFO | Total time to handle duplicate records: 0.14 sec
2025-11-04 02:09:39,387 | INFO | Handling less significant features from data ...
2025-11-04 02:09:40,241 | INFO | Total time to handle less significant features: 0.85 sec
2025-11-04 02:09:40,241 | INFO | Handling Date Features ...
2025-11-04 02:09:40,241 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
2025-11-04 02:09:40,241 | INFO | Total time to handle date features: 0.00 sec
2025-11-04 02:09:40,241 | INFO | Checking Missing values in dataset ...
2025-11-04 02:09:41,623 | INFO | Analysis Completed. No Missing Values Detected.
2025-11-04 02:09:41,623 | INFO | Total time to find missing values in data: 1.38 sec
2025-11-04 02:09:41,624 | INFO | Imputing Missing Values ...
2025-11-04 02:09:41,624 | INFO | Analysis completed. No imputation required.
2025-11-04 02:09:41,624 | INFO | Time taken to perform imputation: 0.00 sec
2025-11-04 02:09:41,624 | INFO | Performing encoding for categorical columns ...
2025-11-04 02:09:41,975 | INFO | Analysis completed. No categorical columns were found.
2025-11-04 02:09:41,975 | INFO | Time taken to encode the columns: 0.35 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 02:09:41,976 | INFO | Data preparation started ...
2025-11-04 02:09:41,976 | INFO | Outlier preprocessing ...
2025-11-04 02:09:45,835 | INFO | Columns with outlier percentage :-
    ColumnName  OutlierPercentage
0  sepal_width           3.333333
2025-11-04 02:09:46,508 | INFO | Deleting rows of these columns: ['sepal_width']
2025-11-04 02:09:49,451 | INFO | Sample of dataset after removing outlier rows:
     sepal_length  sepal_width  petal_length  petal_width  species  automl_id
id
99            5.1          2.5           3.0          1.1        2         15
97            5.7          2.9           4.2          1.3        2         23
15            5.8          4.0           1.2          0.2        1         27
53            6.9          3.1           4.9          1.5        2         31
30            4.7          3.2           1.6          0.2        1         39
91            5.5          2.6           4.4          1.2        2         43
114           5.7          2.5           5.0          2.0        3         35
36            5.0          3.2           1.2          0.2        1         19
59            6.6          2.9           4.6          1.3        2         11
19            5.7          3.8           1.7          0.3        1          7
116 rows X 7 columns
2025-11-04 02:09:49,602 | INFO | Time Taken by Outlier processing: 7.63 sec
2025-11-04 02:09:49,603 | INFO | Checking imbalance data ...
2025-11-04 02:09:49,707 | INFO | Imbalance Not Found.
2025-11-04 02:09:50,712 | INFO | Feature selection using rfe ...
2025-11-04 02:09:57,569 | INFO | feature selected by RFE: ['id', 'petal_length']
2025-11-04 02:09:57,571 | INFO | Total time taken by feature selection: 6.86 sec
2025-11-04 02:09:57,831 | INFO | Scaling Features of rfe data ...
2025-11-04 02:09:58,603 | INFO | columns that will be scaled: ['r_id', 'r_petal_length']
2025-11-04 02:10:00,490 | INFO | Dataset sample after scaling:
   automl_id  species      r_id  r_petal_length
0          7        1  0.120805        0.118644
1          9        1  0.107383        0.050847
2         10        3  0.798658        0.677966
3         11        2  0.389262        0.610169
4         13        3  0.926174        0.644068
5         14        2  0.375839        0.627119
6         12        2  0.503356        0.576271
7          8        1  0.248322        0.067797
8          6        2  0.530201        0.423729
9          5        3  0.939597        0.779661
116 rows X 4 columns
2025-11-04 02:10:01,015 | INFO | Total time taken by feature scaling: 3.18 sec
2025-11-04 02:10:01,016 | INFO | Scaling Features of pca data ...
2025-11-04 02:10:01,532 | INFO | columns that will be scaled: ['id', 'sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2025-11-04 02:10:03,425 | INFO | Dataset sample after scaling:
   automl_id  species        id  sepal_length  sepal_width  petal_length  petal_width
0         16        2  0.617450      0.400000     0.222222      0.508475     0.458333
1         10        3  0.798658      0.457143     0.000000      0.677966     0.583333
2         14        2  0.375839      0.542857     0.611111      0.627119     0.625000
3          5        3  0.939597      0.657143     0.500000      0.779661     0.958333
4         13        3  0.926174      0.457143     0.444444      0.644068     0.708333
5          7        1  0.120805      0.371429     0.888889      0.118644     0.083333
6         11        2  0.389262      0.628571     0.388889      0.610169     0.500000
7         15        2  0.657718      0.200000     0.166667      0.338983     0.416667
8          9        1  0.107383      0.285714     0.944444      0.050847     0.125000
9          6        2  0.530201      0.371429     0.222222      0.423729     0.375000
116 rows X 7 columns
2025-11-04 02:10:03,972 | INFO | Total time taken by feature scaling: 2.96 sec
2025-11-04 02:10:03,972 | INFO | Dimension Reduction using pca ...
2025-11-04 02:10:04,588 | INFO | PCA columns: ['col_0', 'col_1', 'col_2']
2025-11-04 02:10:04,589 | INFO | Total time taken by PCA: 0.62 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
2025-11-04 02:10:04,945 | INFO | Model Training started ...
2025-11-04 02:10:04,989 | INFO | Hyperparameters used for model training:
2025-11-04 02:10:04,989 | INFO | Model: xgboost
2025-11-04 02:10:04,989 | INFO | Hyperparameters: {'response_column': 'species', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1), 'lambda1': (1.0, 0.001, 0.01), 'shrinkage_factor': (0.5, 0.1, 0.2), 'max_depth': (5, 6, 7, 8), 'min_node_size': (1, 2), 'iter_num': (10, 20), 'num_boosted_trees': (-1, 2, 5), 'seed': 42}
2025-11-04 02:10:04,989 | INFO | Total number of models for xgboost: 1728
--------------------------------------------------------------------------------
2025-11-04 02:10:04,990 | INFO | Performing hyperparameter tuning ...
2025-11-04 02:10:06,204 | INFO | Model training for xgboost
2025-11-04 02:10:19,207 | INFO | Leaderboard
   RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_3               pca  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
1     2  XGBOOST_0               rfe  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
2     3  XGBOOST_1               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
3     4  XGBOOST_2               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
[4 rows x 13 columns]
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: 100% - 12/12
- Display model leaderboard.
>>> aml.leaderboard()
   RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_3               pca  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
1     2  XGBOOST_0               rfe  1.000000         1.000000  ...      1.000000   1.00000            1.000000         1.000000      1.00000
2     3  XGBOOST_1               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
3     4  XGBOOST_2               rfe  0.958333         0.958333  ...      0.958333   0.95817            0.962963         0.958333      0.95817
[4 rows x 13 columns]
- Display the best performing model.
>>> aml.leader()
   RANK   MODEL_ID FEATURE_SELECTION  ACCURACY  MICRO-PRECISION  ...  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0     1  XGBOOST_3               pca       1.0              1.0  ...           1.0       1.0                 1.0              1.0          1.0
[1 rows x 13 columns]
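As a side note, the "Total number of models for xgboost: 1728" reported in the training log is simply the size of the Cartesian product of the tuple-valued hyperparameters shown there. A quick self-contained check (grid values copied from the log; this is an illustration, not how AutoML enumerates models internally):

```python
from itertools import product

# Tuple-valued hyperparameters from the training log; scalar
# entries such as 'seed' contribute only one choice each.
grid = {
    'column_sampling': (1, 0.6),
    'min_impurity': (0.0, 0.1),
    'lambda1': (1.0, 0.001, 0.01),
    'shrinkage_factor': (0.5, 0.1, 0.2),
    'max_depth': (5, 6, 7, 8),
    'min_node_size': (1, 2),
    'iter_num': (10, 20),
    'num_boosted_trees': (-1, 2, 5),
}
combos = list(product(*grid.values()))
print(len(combos))  # 2*2*3*3*4*2*2*3 = 1728
```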
- Display hyperparameters of the trained models.
>>> aml.model_hyperparameters(rank=2)
{'response_column': 'species', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': 1, 'min_impurity': 0.0, 'lambda1': 0.001, 'shrinkage_factor': 0.2, 'max_depth': 6, 'min_node_size': 2, 'iter_num': 10, 'num_boosted_trees': 5, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '2', '3'], 'max_models': 3}
>>> aml.model_hyperparameters(rank=4)
{'response_column': 'species', 'name': 'xgboost', 'model_type': 'Classification', 'column_sampling': 1, 'min_impurity': 0.0, 'lambda1': 0.01, 'shrinkage_factor': 0.1, 'max_depth': 5, 'min_node_size': 1, 'iter_num': 20, 'num_boosted_trees': -1, 'seed': 42, 'persist': False, 'output_prob': True, 'output_responses': ['1', '2', '3'], 'max_models': 3}
- Generate predictions on the test dataset using the best performing model.
>>> prediction = aml.predict(iris_test)
2025-11-04 02:13:26,384 | INFO | Data Transformation started ...
2025-11-04 02:13:26,384 | INFO | Performing transformation carried out in feature engineering phase ...
2025-11-04 02:13:26,914 | INFO | Updated dataset after performing target column transformation :
    id  sepal_length  sepal_width  petal_length  petal_width  species  automl_id
0  106           7.6          3.0           6.6          2.1        3         13
1  116           6.4          3.2           5.3          2.3        3          8
2   43           4.4          3.2           1.3          0.2        1         12
3  122           5.6          2.8           4.9          2.0        3          7
4   74           6.1          2.8           4.7          1.2        2         15
5   40           5.1          3.4           1.5          0.2        1          6
6   62           5.9          3.0           4.2          1.5        2         10
7   37           5.5          3.5           1.3          0.2        1         14
8  137           6.3          3.4           5.6          2.4        3         11
9   78           6.7          3.0           5.0          1.7        2          4
30 rows X 7 columns
2025-11-04 02:13:27,368 | INFO | Performing transformation carried out in data preparation phase ...
2025-11-04 02:13:28,134 | INFO | Updated dataset after performing RFE feature selection:
            id  petal_length  species
automl_id
30          67           4.5        2
5          101           6.0        3
24          18           1.4        1
17          64           4.7        2
13         106           6.6        3
7          122           4.9        3
22          92           4.6        2
12          43           1.3        1
34         105           5.8        3
26         149           5.4        3
30 rows X 4 columns
2025-11-04 02:13:28,833 | INFO | Updated dataset after performing scaling on RFE selected features :
   automl_id  species      r_id  r_petal_length
0         13        3  0.704698        0.949153
1         15        2  0.489933        0.627119
2         30        2  0.442953        0.593220
3          7        3  0.812081        0.661017
4         12        1  0.281879        0.050847
5         26        3  0.993289        0.745763
6          5        3  0.671141        0.847458
7         24        1  0.114094        0.067797
8         22        2  0.610738        0.610169
9         19        3  0.778523        0.762712
30 rows X 4 columns
2025-11-04 02:13:29,863 | INFO | Updated dataset after performing scaling for PCA feature selection :
   automl_id  species        id  sepal_length  sepal_width  petal_length  petal_width
0         12        1  0.281879      0.000000     0.555556      0.050847     0.041667
1         15        2  0.489933      0.485714     0.333333      0.627119     0.458333
2         30        2  0.442953      0.342857     0.444444      0.593220     0.583333
3         17        2  0.422819      0.485714     0.388889      0.627119     0.541667
4         13        3  0.704698      0.914286     0.444444      0.949153     0.833333
5         26        3  0.993289      0.514286     0.666667      0.745763     0.916667
6          5        3  0.671141      0.542857     0.611111      0.847458     1.000000
7         24        1  0.114094      0.200000     0.722222      0.067797     0.083333
8         34        3  0.697987      0.600000     0.444444      0.813559     0.875000
9         19        3  0.778523      0.600000     0.444444      0.762712     0.708333
30 rows X 7 columns
2025-11-04 02:13:30,204 | INFO | Updated dataset after performing PCA feature selection :
   automl_id     col_0     col_1     col_2  species
0         26  0.642726  0.216844  0.320843        3
1         17  0.129536 -0.024644 -0.161745        2
2          7  0.433299 -0.149514  0.189948        3
3         19  0.502673  0.055083  0.030002        3
4          5  0.602389  0.215334  0.029613        3
5         34  0.580657  0.075005 -0.032312        3
6         22  0.204231  0.008493 -0.001782        2
7         15  0.127093 -0.086724 -0.135035        2
8         24 -0.736908  0.143687 -0.010681        1
9         13  0.752095  0.201770 -0.234565        3
10 rows X 5 columns
2025-11-04 02:13:30,488 | INFO | Data Transformation completed.
Completed: 100% - 9/9
2025-11-04 02:13:31,034 | INFO | Following model is being picked for evaluation:
2025-11-04 02:13:31,034 | INFO | Model ID : XGBOOST_3
2025-11-04 02:13:31,034 | INFO | Feature Selection Method : pca
2025-11-04 02:13:31,664 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 02:13:33,748 | INFO | SHAP Analysis Completed. Feature Importance Available.
2025-11-04 02:13:33,847 | INFO | Prediction :
   automl_id  Prediction  species    prob_1    prob_2    prob_3
0          7           3        3  0.100455  0.100461  0.799084
1          5           3        3  0.100455  0.100461  0.799084
2         34           3        3  0.100455  0.100461  0.799084
3         22           2        2  0.104979  0.790093  0.104928
4         24           1        1  0.799107  0.100466  0.100428
5         13           3        3  0.100455  0.100461  0.799084
6         15           2        2  0.104979  0.790093  0.104928
7         19           3        3  0.100455  0.100461  0.799084
8         17           2        2  0.104979  0.790093  0.104928
9         26           3        3  0.100455  0.100461  0.799084
2025-11-04 02:13:34,222 | INFO | Confusion Matrix :
[[ 8  0  0]
 [ 0 12  0]
 [ 0  0 10]]
- Display sample prediction rows.
>>> prediction.head()
   automl_id  Prediction  species    prob_1    prob_2    prob_3
0          7           3        3  0.100455  0.100461  0.799084
1          5           3        3  0.100455  0.100461  0.799084
2         34           3        3  0.100455  0.100461  0.799084
3         22           2        2  0.104979  0.790093  0.104928
4         24           1        1  0.799107  0.100466  0.100428
5         13           3        3  0.100455  0.100461  0.799084
6         15           2        2  0.104979  0.790093  0.104928
7         19           3        3  0.100455  0.100461  0.799084
8         17           2        2  0.104979  0.790093  0.104928
9         26           3        3  0.100455  0.100461  0.799084
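A quick sanity check on the rows displayed above: the Prediction column should equal the 1-based index of the largest probability column, and on this sample it also matches the true species everywhere. A plain-Python check with the values copied from the head() output (not part of the teradataml API):

```python
# (Prediction, species, prob_1, prob_2, prob_3) copied from the output above.
rows = [
    (3, 3, 0.100455, 0.100461, 0.799084),
    (3, 3, 0.100455, 0.100461, 0.799084),
    (3, 3, 0.100455, 0.100461, 0.799084),
    (2, 2, 0.104979, 0.790093, 0.104928),
    (1, 1, 0.799107, 0.100466, 0.100428),
    (3, 3, 0.100455, 0.100461, 0.799084),
    (2, 2, 0.104979, 0.790093, 0.104928),
    (3, 3, 0.100455, 0.100461, 0.799084),
    (2, 2, 0.104979, 0.790093, 0.104928),
    (3, 3, 0.100455, 0.100461, 0.799084),
]
for pred, actual, *probs in rows:
    # the predicted class is the 1-based position of the largest probability
    assert pred == probs.index(max(probs)) + 1
accuracy = sum(pred == actual for pred, actual, *_ in rows) / len(rows)
print(accuracy)  # 1.0 on these ten rows
```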
- Generate evaluation metrics on the test dataset using the best performing model.
>>> performance_metrics = aml.evaluate(iris_test)
2025-11-04 02:14:08,789 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 02:14:09,339 | INFO | Following model is being picked for evaluation:
2025-11-04 02:14:09,340 | INFO | Model ID : XGBOOST_3
2025-11-04 02:14:09,340 | INFO | Feature Selection Method : pca
2025-11-04 02:14:11,998 | INFO | Performance Metrics :
        Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
SeqNum
2                3  CLASS_3        0        0       10        1.0     1.0  1.0       10
1                2  CLASS_2        0       12        0        1.0     1.0  1.0       12
0                1  CLASS_1        8        0        0        1.0     1.0  1.0        8
--------------------------------------------------------------------------------
   SeqNum              Metric  MetricValue
0       3        Micro-Recall          1.0
1       5     Macro-Precision          1.0
2       6        Macro-Recall          1.0
3       7            Macro-F1          1.0
4       9     Weighted-Recall          1.0
5      10         Weighted-F1          1.0
6       8  Weighted-Precision          1.0
7       4            Micro-F1          1.0
8       2     Micro-Precision          1.0
9       1            Accuracy          1.0
>>> performance_metrics
        Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
SeqNum
0                1  CLASS_1        8        0        0        1.0     1.0  1.0        8
2                3  CLASS_3        0        0       10        1.0     1.0  1.0       10
1                2  CLASS_2        0       12        0        1.0     1.0  1.0       12
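The all-1.0 metrics above follow directly from the purely diagonal confusion matrix reported during prediction. A self-contained sketch of how the per-class values are derived (per_class_metrics is an illustrative helper, not part of teradataml):

```python
def per_class_metrics(cm):
    """Precision/recall/F1/support per class from a square
    confusion matrix (rows = actual, columns = predicted)."""
    n = len(cm)
    out = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, actually other
        fn = sum(cm[k]) - tp                        # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out.append((precision, recall, f1, sum(cm[k])))
    return out

# Confusion matrix reported by aml.predict() above.
cm = [[8, 0, 0],
      [0, 12, 0],
      [0, 0, 10]]
metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(3)) / sum(map(sum, cm))
print(accuracy)  # 1.0
```

With no off-diagonal entries, every class scores 1.0 on precision, recall, and F1, so the micro, macro, and weighted averages are all 1.0 as well.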