AutoClassifier for multiclass classification using early stopping timer - Example 6: Run AutoClassifier for Multiclass Classification Problem using Early Stopping Timer - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
March 2025
ft:locale
en-US
ft:lastEdition
2025-04-02
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

This example predicts the species of an iris flower from its sepal and petal measurements.

Run AutoML to acquire the most effective model with the following specifications:
  • Set the early stopping timer to 300 seconds.
  • Include only the ‘xgboost’ model for training.
  • Use verbose level 2 to get detailed logs.
  • Add customization for specific processes of AutoClassifier.
  1. Load data and split it into train and test datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> load_example_data("teradataml", "iris_input")
      >>> iris = DataFrame("iris_input")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> iris_sample = iris.sample(frac=[0.8, 0.2])
    3. Fetch train and test data.
      >>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
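For illustration, `sample(frac=[0.8, 0.2])` tags each row with a `sampleid` (1 for the 80% bucket, 2 for the 20% bucket), which the next step uses to separate train and test rows. A rough pure-Python sketch of that tagging semantics (an approximation for intuition only, not the in-database teradataml implementation):

```python
import random

def sample_split(rows, fracs, seed=42):
    """Tag each row with a 1-based sampleid drawn according to fracs,
    approximating DataFrame.sample(frac=[...]) semantics."""
    rng = random.Random(seed)
    ids = list(range(1, len(fracs) + 1))
    return [dict(row, sampleid=rng.choices(ids, weights=fracs)[0])
            for row in rows]

rows = [{"id": i} for i in range(150)]
tagged = sample_split(rows, [0.8, 0.2])
train = [r for r in tagged if r["sampleid"] == 1]
test = [r for r in tagged if r["sampleid"] == 2]
```

Because the assignment is random per row, the realized split is only approximately 80/20, which matches the behavior seen in this example (120 of 150 rows landed in the training bucket).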
  2. Add customization.
    >>> AutoClassifier.generate_custom_config("custom_iris")
    Generating custom config JSON for AutoML ...
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  1
    
    Customizing Feature Engineering Phase ...
    
    Available options for customization of feature engineering phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Missing Value Handling
    
    Index 2: Customize Bincode Encoding
    
    Index 3: Customize String Manipulation
    
    Index 4: Customize Categorical Encoding
    
    Index 5: Customize Mathematical Transformation
    
    Index 6: Customize Nonlinear Transformation
    
    Index 7: Customize Antiselect Features
    
    Index 8: Back to main menu
    
    Index 9: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in feature engineering phase:  8
    
    Customization of feature engineering phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  2
    
    Customizing Data Preparation Phase ...
    
    Available options for customization of data preparation phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Train Test Split
    
    Index 2: Customize Data Imbalance Handling
    
    Index 3: Customize Outlier Handling
    
    Index 4: Customize Feature Scaling
    
    Index 5: Back to main menu
    
    Index 6: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in data preparation phase:  1, 4, 5
    
    Customizing Train Test Split ...
    
    Enter the train size for train test split:  0.85
    
    Customization of train test split has been completed successfully.
    
    Available feature scaling methods with corresponding indices:
    Index 1: maxabs
    Index 2: mean
    Index 3: midrange
    Index 4: range
    Index 5: rescale
    Index 6: std
    Index 7: sum
    Index 8: ustd
    
    Enter the corresponding index feature scaling method:  4
    
    Customization of feature scaling has been completed successfully.
    
    Customization of data preparation phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  3
    
    Customizing Model Training Phase ...
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  1
    
    Customizing Model Hyperparameter ...
    
    Available models for hyperparameter tuning with corresponding indices:
    Index 1: decision_forest
    Index 2: xgboost
    Index 3: knn
    Index 4: glm
    Index 5: svm
    
    Available hyperparamters update methods with corresponding indices:
    Index 1: ADD
    Index 2: REPLACE
    
    Enter the list of model indices for performing hyperparameter tuning:  2
    
    Available hyperparameters for model 'xgboost' with corresponding indices:
    Index 1: min_impurity
    Index 2: max_depth
    Index 3: min_node_size
    Index 4: shrinkage_factor
    Index 5: iter_num
    
    Enter the list of hyperparameter indices for model 'xgboost':  2
    
    Enter the index of corresponding update method for hyperparameters 'max_depth' for model 'xgboost':  2
    
    Enter the list of value for hyperparameter 'max_depth' for model 'xgboost':  3,4
    
    Customization of model hyperparameter has been completed successfully.
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  3
    
    Customization of model training phase has been completed successfully.
    
    Process of generating custom config file for AutoML has been completed successfully.
    
    'custom_iris.json' file is generated successfully under the current working directory.
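Because the generated file is plain JSON, the same customization can also be authored directly, without the interactive prompts. A sketch that writes an equivalent (abridged) configuration; the key names mirror those echoed back by `fit()` in step 4:

```python
import json

# Abridged custom configuration matching the choices made above:
# 85/15 train/test split, 'range' feature scaling, and extra
# max_depth values for xgboost hyperparameter tuning.
custom_config = {
    "TrainTestSplitIndicator": True,
    "TrainingSize": 0.85,
    "FeatureScalingIndicator": True,
    "FeatureScalingMethod": "range",
    "HyperparameterTuningIndicator": True,
    "HyperparameterTuningParam": {
        "xgboost": {
            "max_depth": {"Method": "ADD", "Value": [3, 4]}
        }
    },
}

with open("custom_iris.json", "w") as f:
    json.dump(custom_config, f, indent=4)
```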
  3. Create an AutoML instance.
    >>> aml = AutoClassifier(include=['xgboost'],
    ...                      verbose=2,
    ...                      max_runtime_secs=300,
    ...                      custom_config_file='custom_iris.json')
  4. Fit training data.
    >>> aml.fit(iris_train, iris_train.species)
    Received below input for customization : 
    {
        "TrainTestSplitIndicator": true,
        "TrainingSize": 0.85,
        "DataImbalanceIndicator": true,
        "DataImbalanceMethod": "SMOTE",
        "FeatureScalingIndicator": true,
        "FeatureScalingMethod": "range",
        "HyperparameterTuningIndicator": true,
        "HyperparameterTuningParam": {
            "xgboost": {
                "max_depth": {
                    "Method": "ADD",
                    "Value": [
                        3,
                        4
                    ]
                }
            }
        }
    }
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    Data Overview:
    Total Rows in the data: 120
    Total Columns in the data: 6
    
    Column Summary:
    ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
    species	INTEGER	120	0	None	0	120	0	0.0	100.0
    sepal_length	FLOAT	120	0	None	0	120	0	0.0	100.0
    petal_length	FLOAT	120	0	None	0	120	0	0.0	100.0
    sepal_width	FLOAT	120	0	None	0	120	0	0.0	100.0
    petal_width	FLOAT	120	0	None	0	120	0	0.0	100.0
    id	INTEGER	120	0	None	0	120	0	0.0	100.0
    
    Statistics of Data:
    func	id	sepal_length	sepal_width	petal_length	petal_width	species
    std	42.746	0.828	0.445	1.784	0.764	0.825
    25%	37.75	5.1	2.775	1.5	0.275	1
    50%	72.5	5.8	3	4.2	1.3	2
    75%	110.25	6.4	3.3	5.1	1.8	3
    max	149	7.7	4.4	6.9	2.5	3
    min	2	4.3	2	1	0.1	1
    mean	73.642	5.818	3.033	3.683	1.158	1.975
    count	120	120	120	120	120	120
    
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                          
        ColumnName  OutlierPercentage
    0  sepal_width           0.833333
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Feature Engineering started ...
    
    Handling duplicate records present in dataset ...
    Analysis completed. No action taken.                                                    
    
    Total time to handle duplicate records: 1.57 sec
    
    Handling less significant features from data ...
    
    Total time to handle less significant features: 5.83 sec
    
    Handling Date Features ...
    Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    
    Total time to handle date features: 0.00 sec
    Proceeding with default option for missing value imputation.                             
    Proceeding with default option for handling remaining missing values.                    
    
    Checking Missing values in dataset ...
    Analysis Completed. No Missing Values Detected.                                          
    
    Total time to find missing values in data: 7.15 sec
    
    Imputing Missing Values ...
    Analysis completed. No imputation required.                                              
    
    Time taken to perform imputation: 0.01 sec
    No information provided for Variable-Width Transformation.                               
    Skipping customized string manipulation.
    
    Starting Customized Categorical Feature Encoding ...
    AutoML will proceed with default encoding technique.                                     
    
    Performing encoding for categorical columns ...
    Analysis completed. No categorical columns were found.                                   
    
    Time taken to encode the columns: 1.42 sec
    
    Starting customized mathematical transformation ...
    Skipping customized mathematical transformation.                                         
    
    Starting customized non-linear transformation ...
    Skipping customized non-linear transformation.                                           
    
    Starting customized anti-select columns ...
    Skipping customized anti-select columns.                                                 
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Data preparation started ...
    
    Spliting of dataset into training and testing ...
    Training size : 0.85                                                                      
    Testing size  : 0.15                                                                      
    
    Training data sample
    sepal_length	sepal_width	petal_length	petal_width	species	id
    5.1	3.4	1.5	0.2	1	10
    5.0	2.0	3.5	1.0	2	14
    5.0	3.2	1.2	0.2	1	22
    5.7	2.6	3.5	1.0	2	12
    6.3	3.3	6.0	2.5	3	9
    5.4	3.9	1.3	0.4	1	17
    5.1	2.5	3.0	1.1	2	13
    5.6	2.7	4.2	1.3	2	21
    6.7	3.0	5.0	1.7	2	15
    6.0	3.0	4.8	1.8	3	23
    
    102 rows X 6 columns
    
    Testing data sample
    sepal_length	sepal_width	petal_length	petal_width	species	id
    6.4	3.2	5.3	2.3	3	30
    5.7	2.9	4.2	1.3	2	31
    6.3	2.9	5.6	1.8	3	103
    6.3	3.4	5.6	2.4	3	27
    6.5	2.8	4.6	1.5	2	28
    6.7	2.5	5.8	1.8	3	108
    6.4	2.7	5.3	1.9	3	29
    5.4	3.9	1.7	0.4	1	85
    6.2	2.2	4.5	1.5	2	107
    5.6	2.9	3.6	1.3	2	110
    
    18 rows X 6 columns
    
    Time taken for spliting of data: 11.26 sec
    
    Starting customized outlier processing ...
    No information provided for customized outlier processing. AutoML will proceed with default settings.
    
    Outlier preprocessing ...
    Columns with outlier percentage :-                                                                          
        ColumnName  OutlierPercentage
    0  sepal_width           0.833333
    
    Deleting rows of these columns:
    ['sepal_width']
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713845938369288"'
    
    Sample of training dataset after removing outlier rows:
    sepal_length	sepal_width	petal_length	petal_width	species	id
    7.2	3.6	6.1	2.5	3	34
    7.4	2.8	6.1	1.9	3	24
    6.2	3.4	5.4	2.3	3	106
    6.1	2.8	4.7	1.2	2	35
    7.3	2.9	6.3	1.8	3	71
    7.0	3.2	4.7	1.4	2	63
    6.7	3.3	5.7	2.1	3	95
    6.7	3.0	5.2	2.3	3	87
    5.4	3.9	1.3	0.4	1	17
    5.4	3.4	1.5	0.4	1	47
    
    101 rows X 6 columns
    
    Time Taken by Outlier processing: 35.55 sec
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713844347449263"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713849039953629"'
    
    Checking imbalance data ...
    
    Imbalance Not Found.
    
    Feature selection using lasso ...
    
    feature selected by lasso:
    ['petal_width', 'sepal_width', 'sepal_length', 'petal_length']
    
    Total time taken by feature selection: 3.37 sec
    
    scaling Features of lasso data ...
    
    columns that will be scaled:
    ['petal_width', 'sepal_width', 'sepal_length', 'petal_length']
    
    Training dataset sample after scaling:
    id	species	petal_width	sepal_width	sepal_length	petal_length
    40	3	0.7916666666666666	0.4761904761904763	0.6470588235294118	0.711864406779661
    80	1	0.08333333333333333	0.7142857142857144	0.23529411764705874	0.06779661016949151
    99	1	0.0	0.4761904761904763	0.0	0.016949152542372895
    61	3	0.7083333333333334	0.4761904761904763	0.6470588235294118	0.7627118644067796
    93	2	0.625	0.3333333333333335	0.5	0.6949152542372881
    34	3	1.0	0.7619047619047621	0.8529411764705882	0.8644067796610169
    78	2	0.5	0.1428571428571428	0.35294117647058826	0.5084745762711864
    19	2	0.5	0.4285714285714286	0.676470588235294	0.6101694915254237
    17	1	0.12500000000000003	0.9047619047619049	0.323529411764706	0.05084745762711865
    76	1	0.04166666666666667	0.6666666666666667	0.14705882352941174	0.1016949152542373
    
    101 rows X 6 columns
    
    Testing dataset sample after scaling:
    id	species	petal_width	sepal_width	sepal_length	petal_length
    110	2	0.5	0.4285714285714286	0.3823529411764705	0.4406779661016949
    29	3	0.75	0.3333333333333335	0.6176470588235295	0.7288135593220338
    116	1	0.08333333333333333	0.4761904761904763	0.14705882352941174	0.06779661016949151
    108	3	0.7083333333333334	0.23809523809523814	0.7058823529411765	0.8135593220338982
    30	3	0.9166666666666666	0.5714285714285716	0.6176470588235295	0.7288135593220338
    28	2	0.5833333333333334	0.38095238095238093	0.6470588235294118	0.6101694915254237
    127	1	0.0	0.523809523809524	0.17647058823529427	0.0847457627118644
    122	3	0.8750000000000001	0.4761904761904763	0.6470588235294118	0.8135593220338982
    101	2	0.375	0.09523809523809534	0.5	0.5084745762711864
    26	2	0.625	0.6190476190476191	0.588235294117647	0.6271186440677966
    
    18 rows X 6 columns
    
    Total time taken by feature scaling: 42.16 sec
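The 'range' method selected during customization is min-max scaling: each feature is mapped linearly onto [0, 1] via (x - min) / (max - min). A minimal pure-Python sketch of the idea (the sample values are hypothetical, and this is not the in-database scaling implementation):

```python
def range_scale(values):
    """Min-max ('range') scaling: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical petal_width values spanning the 0.1 .. 2.5 range
# reported in the data statistics above.
petal_width = [0.2, 1.0, 1.3, 2.5, 0.1]
scaled = range_scale(petal_width)
```

The minimum maps to 0, the maximum to 1, and everything else falls proportionally in between, which is why all scaled columns in the samples above lie in [0, 1].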
    
    Feature selection using rfe ...
    
    feature selected by RFE:
    ['petal_length', 'petal_width']
    
    Total time taken by feature selection: 10.35 sec
    
    scaling Features of rfe data ...
    
    columns that will be scaled:
    ['r_petal_length', 'r_petal_width']
    
    Training dataset sample after scaling:
    id	species	r_petal_length	r_petal_width
    40	3	0.711864406779661	0.7916666666666666
    80	1	0.06779661016949151	0.08333333333333333
    99	1	0.016949152542372895	0.0
    61	3	0.7627118644067796	0.7083333333333334
    93	2	0.6949152542372881	0.625
    34	3	0.8644067796610169	1.0
    78	2	0.5084745762711864	0.5
    19	2	0.6101694915254237	0.5
    17	1	0.05084745762711865	0.12500000000000003
    76	1	0.1016949152542373	0.04166666666666667
    
    101 rows X 4 columns
    
    Testing dataset sample after scaling:
    id	species	r_petal_length	r_petal_width
    110	2	0.4406779661016949	0.5
    29	3	0.7288135593220338	0.75
    116	1	0.06779661016949151	0.08333333333333333
    108	3	0.8135593220338982	0.7083333333333334
    30	3	0.7288135593220338	0.9166666666666666
    28	2	0.6101694915254237	0.5833333333333334
    127	1	0.0847457627118644	0.0
    122	3	0.8135593220338982	0.8750000000000001
    101	2	0.5084745762711864	0.375
    26	2	0.6271186440677966	0.625
    
    18 rows X 4 columns
    
    Total time taken by feature scaling: 40.20 sec
    
    scaling Features of pca data ...
    
    columns that will be scaled:
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    
    Training dataset sample after scaling:
    id	species	sepal_length	sepal_width	petal_length	petal_width
    95	3	0.7058823529411765	0.6190476190476191	0.7966101694915254	0.8333333333333334
    24	3	0.911764705882353	0.38095238095238093	0.8644067796610169	0.75
    106	3	0.5588235294117647	0.6666666666666667	0.7457627118644068	0.9166666666666666
    35	2	0.5294117647058822	0.38095238095238093	0.6271186440677966	0.4583333333333333
    34	3	0.8529411764705882	0.7619047619047621	0.8644067796610169	1.0
    112	3	0.8529411764705882	0.4761904761904763	0.8135593220338982	0.625
    17	1	0.323529411764706	0.9047619047619049	0.05084745762711865	0.12500000000000003
    47	1	0.323529411764706	0.6666666666666667	0.0847457627118644	0.12500000000000003
    71	3	0.8823529411764705	0.4285714285714286	0.8983050847457626	0.7083333333333334
    63	2	0.7941176470588235	0.5714285714285716	0.6271186440677966	0.5416666666666666
    
    101 rows X 6 columns
    
    Testing dataset sample after scaling:
    id	species	sepal_length	sepal_width	petal_length	petal_width
    27	3	0.588235294117647	0.6666666666666667	0.7796610169491525	0.9583333333333333
    30	3	0.6176470588235295	0.5714285714285716	0.7288135593220338	0.9166666666666666
    110	2	0.3823529411764705	0.4285714285714286	0.4406779661016949	0.5
    31	2	0.411764705882353	0.4285714285714286	0.5423728813559322	0.5
    29	3	0.6176470588235295	0.3333333333333335	0.7288135593220338	0.75
    85	1	0.323529411764706	0.9047619047619049	0.11864406779661016	0.12500000000000003
    28	2	0.6470588235294118	0.38095238095238093	0.6101694915254237	0.5833333333333334
    108	3	0.7058823529411765	0.23809523809523814	0.8135593220338982	0.7083333333333334
    103	3	0.588235294117647	0.4285714285714286	0.7796610169491525	0.7083333333333334
    107	2	0.5588235294117647	0.09523809523809534	0.5932203389830508	0.5833333333333334
    
    18 rows X 6 columns
    
    Total time taken by feature scaling: 36.83 sec
    
    Dimension Reduction using pca ...
    
    PCA columns:
    ['col_0', 'col_1']
    
    Total time taken by PCA: 11.03 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Model Training started ...
    
    Starting customized hyperparameter update ...
    
    Completed customized hyperparameter update.
    
    Hyperparameters used for model training:
    response_column : species                                                                                                                             
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.2)
    max_depth : (3, 4, 5, 6, 7, 8)
    min_node_size : (1, 2)
    iter_num : (10, 20)
    Total number of models for xgboost : 1152
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
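The total of 1152 candidate models reported above is the size of the Cartesian product of the hyperparameter value lists; a quick sketch of the arithmetic, using the value lists from the log:

```python
from itertools import product
from math import prod

# Value lists as printed in the training log above.
grid = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1),
    "lambda1": (0.01, 0.1, 1, 10),
    "shrinkage_factor": (0.5, 0.1, 0.2),
    "max_depth": (3, 4, 5, 6, 7, 8),
    "min_node_size": (1, 2),
    "iter_num": (10, 20),
}

# 2 * 2 * 4 * 3 * 6 * 2 * 2 = 1152 hyperparameter combinations
n_models = prod(len(v) for v in grid.values())
combos = list(product(*grid.values()))
```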
    
    
    Performing hyperParameter tuning ...
    
    xgboost
    
    ----------------------------------------------------------------------------------------------------
    
    Evaluating models performance ...
    
    Evaluation completed.
    
    Leaderboard
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_0	lasso	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    1	2	XGBOOST_1	rfe	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    2	3	XGBOOST_3	lasso	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    3	4	XGBOOST_4	rfe	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    4	5	XGBOOST_7	rfe	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    5	6	XGBOOST_6	lasso	0.944444	0.944444	0.944444	0.944444	0.952381	0.944444	0.944056	0.952381	0.944444	0.944056
    6	7	XGBOOST_2	pca	0.888889	0.888889	0.888889	0.888889	0.916667	0.888889	0.885714	0.916667	0.888889	0.885714
    7	8	XGBOOST_5	pca	0.888889	0.888889	0.888889	0.888889	0.916667	0.888889	0.885714	0.916667	0.888889	0.885714
    
    8 rows X 13 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 19/19
  5. Display the model leaderboard.
    >>> aml.leaderboard()
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_0	lasso	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    1	2	XGBOOST_1	rfe	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    2	3	XGBOOST_3	lasso	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    3	4	XGBOOST_4	rfe	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    4	5	XGBOOST_7	rfe	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
    5	6	XGBOOST_6	lasso	0.944444	0.944444	0.944444	0.944444	0.952381	0.944444	0.944056	0.952381	0.944444	0.944056
    6	7	XGBOOST_2	pca	0.888889	0.888889	0.888889	0.888889	0.916667	0.888889	0.885714	0.916667	0.888889	0.885714
    7	8	XGBOOST_5	pca	0.888889	0.888889	0.888889	0.888889	0.916667	0.888889	0.885714	0.916667	0.888889	0.885714
  6. Display the best performing model.
    >>> aml.leader()
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_0	lasso	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
  7. Generate predictions on the validation dataset using the best performing model.
    In the data preparation phase, AutoML creates the validation dataset by splitting the data provided during fitting into training and testing sets. Model training uses the training portion, while the testing portion serves as the validation dataset for model evaluation.
    >>> prediction = aml.predict()
    Following model is being used for generating prediction :
    Model ID : XGBOOST_0 
    Feature Selection Method : lasso
    
     Prediction : 
        id  Prediction  Confidence_Lower  Confidence_upper  species
    0  110           2             0.750             0.750        2
    1   29           3             0.750             0.750        3
    2  116           1             0.750             0.750        1
    3  108           3             0.750             0.750        3
    4   30           3             1.000             1.000        3
    5   28           2             0.625             0.625        2
    6  127           1             0.750             0.750        1
    7  122           3             1.000             1.000        3
    8  101           2             1.000             1.000        2
    9   26           2             0.500             0.500        2
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
    SeqNum                                                                                
    0               1  CLASS_1        6        0        0        1.0     1.0  1.0        6
    2               3  CLASS_3        0        0        6        1.0     1.0  1.0        6
    1               2  CLASS_2        0        6        0        1.0     1.0  1.0        6
    
     Confusion Matrix : 
    array([[6, 0, 0],
           [0, 6, 0],
           [0, 0, 6]], dtype=int64)
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	species
    28	2	0.625	0.625	2
    30	3	1.0	1.0	3
    31	2	0.75	0.75	2
    82	1	0.75	0.75	1
    101	2	1.0	1.0	2
    103	3	1.0	1.0	3
    85	1	0.875	0.875	1
    29	3	0.75	0.75	3
    27	3	1.0	1.0	3
    26	2	0.5	0.5	2
  8. Generate predictions on the test dataset using the best performing model.
    >>> prediction = aml.predict(iris_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping irrelevent columns :
    sepal_length	sepal_width	petal_length	petal_width	species
    7.7	3.8	6.7	2.2	3
    5.9	3.2	4.8	1.8	2
    4.6	3.2	1.4	0.2	1
    5.1	3.5	1.4	0.2	1
    5.7	3.8	1.7	0.3	1
    6.8	3.2	5.9	2.3	3
    6.7	3.1	5.6	2.4	3
    5.8	2.8	5.1	2.4	3
    5.9	3.0	5.1	1.8	3
    4.6	3.4	1.4	0.3	1
    
    Updated dataset after performing target column transformation :
    sepal_length	id	sepal_width	petal_width	petal_length	species
    5.5	9	4.2	0.2	1.4	1
    5.7	11	3.8	0.3	1.7	1
    6.8	19	3.2	2.3	5.9	3
    5.9	10	3.0	1.8	5.1	3
    5.9	13	3.2	1.8	4.8	2
    4.6	21	3.2	0.2	1.4	1
    7.7	12	3.8	2.2	6.7	3
    5.7	20	2.5	2.0	5.0	3
    6.7	14	3.1	2.4	5.6	3
    5.8	22	2.8	2.4	5.1	3
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713849678727662"'
    
    Updated dataset after performing Lasso feature selection:
    id	petal_width	sepal_width	sepal_length	petal_length	species
    26	0.4	3.4	5.0	1.6	1
    22	2.4	2.8	5.8	5.1	3
    35	1.6	3.4	6.0	4.5	2
    36	1.8	3.0	6.1	4.9	3
    17	1.5	3.1	6.7	4.7	2
    34	0.2	3.1	4.6	1.5	1
    15	0.5	3.3	5.1	1.7	1
    32	1.2	3.0	5.7	4.2	2
    38	2.1	3.0	7.1	5.9	3
    12	2.2	3.8	7.7	6.7	3
    
    Updated dataset after performing scaling on Lasso selected features :
    id	species	petal_width	sepal_width	sepal_length	petal_length
    26	1	0.12500000000000003	0.6666666666666667	0.2058823529411765	0.1016949152542373
    17	2	0.5833333333333334	0.523809523809524	0.7058823529411765	0.6271186440677966
    34	1	0.04166666666666667	0.523809523809524	0.088235294117647	0.0847457627118644
    38	3	0.8333333333333334	0.4761904761904763	0.8235294117647057	0.8305084745762712
    36	3	0.7083333333333334	0.4761904761904763	0.5294117647058822	0.6610169491525424
    28	2	0.5416666666666666	0.523809523809524	0.7058823529411765	0.576271186440678
    19	3	0.9166666666666666	0.5714285714285716	0.7352941176470588	0.8305084745762712
    30	2	0.5	0.23809523809523814	0.35294117647058826	0.5084745762711864
    15	1	0.16666666666666669	0.6190476190476191	0.23529411764705874	0.11864406779661016
    32	2	0.4583333333333333	0.4761904761904763	0.411764705882353	0.5423728813559322
    
    Updated dataset after performing RFE feature selection:
    id	petal_length	petal_width	species
    26	1.6	0.4	1
    17	4.7	1.5	2
    34	1.5	0.2	1
    36	4.9	1.8	3
    22	5.1	2.4	3
    35	4.5	1.6	2
    38	5.9	2.1	3
    12	6.7	2.2	3
    15	1.7	0.5	1
    32	4.2	1.2	2
    
    Updated dataset after performing scaling on RFE selected features :
    id	species	r_petal_length	r_petal_width
    22	3	0.6949152542372881	0.9583333333333333
    17	2	0.6271186440677966	0.5833333333333334
    34	1	0.0847457627118644	0.04166666666666667
    38	3	0.8305084745762712	0.8333333333333334
    15	1	0.11864406779661016	0.16666666666666669
    32	2	0.5423728813559322	0.4583333333333333
    36	3	0.6610169491525424	0.7083333333333334
    28	2	0.576271186440678	0.5416666666666666
    19	3	0.8305084745762712	0.9166666666666666
    30	2	0.5084745762711864	0.5
    
    Updated dataset after performing scaling for PCA feature selection :
    id	species	sepal_length	sepal_width	petal_length	petal_width
    22	3	0.4411764705882352	0.38095238095238093	0.6949152542372881	0.9583333333333333
    36	3	0.5294117647058822	0.4761904761904763	0.6610169491525424	0.7083333333333334
    28	2	0.7058823529411765	0.523809523809524	0.576271186440678	0.5416666666666666
    19	3	0.7352941176470588	0.5714285714285716	0.8305084745762712	0.9166666666666666
    17	2	0.7058823529411765	0.523809523809524	0.6271186440677966	0.5833333333333334
    34	1	0.088235294117647	0.523809523809524	0.0847457627118644	0.04166666666666667
    38	3	0.8235294117647057	0.4761904761904763	0.8305084745762712	0.8333333333333334
    12	3	1.0	0.8571428571428572	0.9661016949152542	0.8750000000000001
    15	1	0.23529411764705874	0.6190476190476191	0.11864406779661016	0.16666666666666669
    32	2	0.411764705882353	0.4761904761904763	0.5423728813559322	0.4583333333333333
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	species
    0	26	-0.552814	-0.064726	1
    1	17	0.306241	-0.131285	2
    2	22	0.488587	0.103787	3
    3	19	0.641246	-0.184527	3
    4	38	0.648012	-0.133717	3
    5	36	0.333749	-0.014838	3
    6	15	-0.493909	-0.034387	1
    7	20	0.388955	0.248874	3
    8	34	-0.640616	0.115578	1
    9	35	0.190191	-0.175907	2
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_0 
    Feature Selection Method : lasso
    
     Prediction : 
       id  Prediction  Confidence_Lower  Confidence_upper  species
    0  26           1             0.875             0.875        1
    1  36           3             1.000             1.000        3
    2  28           2             0.750             0.750        2
    3  15           1             0.875             0.875        1
    4  17           2             0.625             0.625        2
    5  34           1             0.750             0.750        1
    6  38           3             1.000             1.000        3
    7  12           3             1.000             1.000        3
    8  19           3             1.000             1.000        3
    9  30           2             1.000             1.000        2
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision    Recall        F1  Support
    SeqNum                                                                                       
    0               1  CLASS_1        8        0        0   1.000000  1.000000  1.000000        8
    2               3  CLASS_3        0        1       11   0.916667  1.000000  0.956522       11
    1               2  CLASS_2        0       10        0   1.000000  0.909091  0.952381       11
    
     Confusion Matrix : 
    array([[ 8,  0,  0],
           [ 0, 10,  1],
           [ 0,  0, 11]], dtype=int64)
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	species
    10	3	0.875	0.875	3
    12	3	1.0	1.0	3
    13	3	0.875	0.875	2
    14	3	1.0	1.0	3
    16	2	0.75	0.75	2
    17	2	0.625	0.625	2
    15	1	0.875	0.875	1
    11	1	0.875	0.875	1
    9	1	0.875	0.875	1
    8	1	0.875	0.875	1
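The per-class Precision, Recall, and F1 values reported above can be cross-checked against the confusion matrix, where rows are actual classes and columns are predicted classes. A minimal NumPy sketch, independent of teradataml:

```python
import numpy as np

# Confusion matrix printed above: rows = actual class, columns = predicted class.
cm = np.array([[ 8,  0,  0],
               [ 0, 10,  1],
               [ 0,  0, 11]])

for k, label in enumerate(["CLASS_1", "CLASS_2", "CLASS_3"]):
    tp = cm[k, k]
    precision = tp / cm[:, k].sum()   # correct predictions / all predictions of class k
    recall    = tp / cm[k, :].sum()   # correct predictions / all actual class-k samples
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{label}: precision={precision:.6f} recall={recall:.6f} f1={f1:.6f}")
```

For CLASS_3 this yields precision 11/12 ≈ 0.916667 and recall 11/11 = 1.0, reproducing the performance metrics table.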
  9. Generate predictions on the test dataset using the second-best performing model.
    >>> prediction = aml.predict(iris_test,2)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping irrelevant columns :
    sepal_length	sepal_width	petal_length	petal_width	species
    5.5	4.2	1.4	0.2	1
    7.7	3.8	6.7	2.2	3
    5.7	2.5	5.0	2.0	3
    6.7	3.1	5.6	2.4	3
    5.9	3.2	4.8	1.8	2
    4.6	3.2	1.4	0.2	1
    5.7	3.8	1.7	0.3	1
    6.8	3.2	5.9	2.3	3
    5.9	3.0	5.1	1.8	3
    4.6	3.4	1.4	0.3	1
    
    Updated dataset after performing target column transformation :
    sepal_length	id	sepal_width	petal_width	petal_length	species
    5.9	13	3.2	1.8	4.8	2
    6.7	14	3.1	2.4	5.6	3
    5.8	22	2.8	2.4	5.1	3
    5.1	8	3.5	0.2	1.4	1
    5.9	10	3.0	1.8	5.1	3
    4.6	18	3.4	0.3	1.4	1
    7.7	12	3.8	2.2	6.7	3
    5.7	20	2.5	2.0	5.0	3
    5.7	11	3.8	0.3	1.7	1
    6.8	19	3.2	2.3	5.9	3
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713844745578509"'
    
    Updated dataset after performing Lasso feature selection:
    id	petal_width	sepal_width	sepal_length	petal_length	species
    19	2.3	3.2	6.8	5.9	3
    22	2.4	2.8	5.8	5.1	3
    35	1.6	3.4	6.0	4.5	2
    26	0.4	3.4	5.0	1.6	1
    36	1.8	3.0	6.1	4.9	3
    28	1.4	3.1	6.7	4.4	2
    15	0.5	3.3	5.1	1.7	1
    32	1.2	3.0	5.7	4.2	2
    38	2.1	3.0	7.1	5.9	3
    12	2.2	3.8	7.7	6.7	3
    
    Updated dataset after performing scaling on Lasso selected features :
    id	species	petal_width	sepal_width	sepal_length	petal_length
    17	2	0.5833333333333334	0.523809523809524	0.7058823529411765	0.6271186440677966
    19	3	0.9166666666666666	0.5714285714285716	0.7352941176470588	0.8305084745762712
    30	2	0.5	0.23809523809523814	0.35294117647058826	0.5084745762711864
    15	1	0.16666666666666669	0.6190476190476191	0.23529411764705874	0.11864406779661016
    26	1	0.12500000000000003	0.6666666666666667	0.2058823529411765	0.1016949152542373
    20	3	0.7916666666666666	0.23809523809523814	0.411764705882353	0.6779661016949152
    36	3	0.7083333333333334	0.4761904761904763	0.5294117647058822	0.6610169491525424
    28	2	0.5416666666666666	0.523809523809524	0.7058823529411765	0.576271186440678
    38	3	0.8333333333333334	0.4761904761904763	0.8235294117647057	0.8305084745762712
    12	3	0.8750000000000001	0.8571428571428572	1.0	0.9661016949152542
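The scaled values above are consistent with standard min-max scaling, x' = (x - min) / (max - min), applied with per-feature bounds learned from the training data. As an illustration, assuming petal_width spans 0.1 to 2.5 in the training set (the range of the full iris data), the raw value 2.2 for id 12 maps to 0.875:

```python
def min_max_scale(x, lo, hi):
    """Standard min-max scaling to [0, 1] using training-set bounds."""
    return (x - lo) / (hi - lo)

# Assumed training-set bounds for petal_width: 0.1 .. 2.5 (the full iris range).
print(min_max_scale(2.2, 0.1, 2.5))   # ≈ 0.875, matching id 12 above
print(min_max_scale(2.3, 0.1, 2.5))   # ≈ 0.9167, matching id 19 above
```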
    
    Updated dataset after performing RFE feature selection:
    id	petal_length	petal_width	species
    26	1.6	0.4	1
    22	5.1	2.4	3
    35	4.5	1.6	2
    36	4.9	1.8	3
    19	5.9	2.3	3
    30	4.0	1.3	2
    15	1.7	0.5	1
    32	4.2	1.2	2
    38	5.9	2.1	3
    12	6.7	2.2	3
    
    Updated dataset after performing scaling on RFE selected features :
    id	species	r_petal_length	r_petal_width
    22	3	0.6949152542372881	0.9583333333333333
    38	3	0.8305084745762712	0.8333333333333334
    12	3	0.9661016949152542	0.8750000000000001
    17	2	0.6271186440677966	0.5833333333333334
    19	3	0.8305084745762712	0.9166666666666666
    30	2	0.5084745762711864	0.5
    15	1	0.11864406779661016	0.16666666666666669
    32	2	0.5423728813559322	0.4583333333333333
    36	3	0.6610169491525424	0.7083333333333334
    28	2	0.576271186440678	0.5416666666666666
    
    Updated dataset after performing scaling for PCA feature selection :
    id	species	sepal_length	sepal_width	petal_length	petal_width
    17	2	0.7058823529411765	0.523809523809524	0.6271186440677966	0.5833333333333334
    15	1	0.23529411764705874	0.6190476190476191	0.11864406779661016	0.16666666666666669
    32	2	0.411764705882353	0.4761904761904763	0.5423728813559322	0.4583333333333333
    36	3	0.5294117647058822	0.4761904761904763	0.6610169491525424	0.7083333333333334
    22	3	0.4411764705882352	0.38095238095238093	0.6949152542372881	0.9583333333333333
    35	2	0.5	0.6666666666666667	0.5932203389830508	0.625
    19	3	0.7352941176470588	0.5714285714285716	0.8305084745762712	0.9166666666666666
    30	2	0.35294117647058826	0.23809523809523814	0.5084745762711864	0.5
    38	3	0.8235294117647057	0.4761904761904763	0.8305084745762712	0.8333333333333334
    12	3	1.0	0.8571428571428572	0.9661016949152542	0.8750000000000001
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	species
    0	26	-0.552814	-0.064726	1
    1	17	0.306241	-0.131285	2
    2	22	0.488587	0.103787	3
    3	19	0.641246	-0.184527	3
    4	38	0.648012	-0.133717	3
    5	36	0.333749	-0.014838	3
    6	15	-0.493909	-0.034387	1
    7	20	0.388955	0.248874	3
    8	34	-0.640616	0.115578	1
    9	35	0.190191	-0.175907	2
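The PCA step projects the four scaled features onto two principal components (col_0, col_1). A generic NumPy sketch of the technique follows; this is not teradataml's internal implementation, whose components are fitted on the training data, and the random matrix stands in for the 10x4 scaled feature matrix above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 4))            # stand-in for the 10x4 scaled feature matrix

# Center the data, then take the top-2 right singular vectors as principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                # principal axes, shape (2, 4)
projected = Xc @ components.T      # shape (10, 2): analogous to col_0, col_1

print(projected.shape)             # (10, 2)
```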
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_1 
    Feature Selection Method : rfe
    
     Prediction : 
       id  Prediction  Confidence_Lower  Confidence_upper  species
    0  17           2             0.625             0.625        2
    1  38           3             1.000             1.000        3
    2  12           3             1.000             1.000        3
    3  36           2             0.500             0.500        3
    4  22           3             0.875             0.875        3
    5  35           2             0.625             0.625        2
    6  19           3             1.000             1.000        3
    7  30           2             1.000             1.000        2
    8  15           1             0.875             0.875        1
    9  32           2             1.000             1.000        2
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision    Recall        F1  Support
    SeqNum                                                                                       
    0               1  CLASS_1        8        0        0   1.000000  1.000000  1.000000        8
    2               3  CLASS_3        0        0       10   1.000000  0.909091  0.952381       11
    1               2  CLASS_2        0       11        1   0.916667  1.000000  0.956522       11
    
     Confusion Matrix : 
    array([[ 8,  0,  0],
           [ 0, 11,  0],
           [ 0,  1, 10]], dtype=int64)
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	species
    10	3	0.875	0.875	3
    12	3	1.0	1.0	3
    13	2	0.5	0.5	2
    14	3	1.0	1.0	3
    16	2	1.0	1.0	2
    17	2	0.625	0.625	2
    15	1	0.875	0.875	1
    11	1	0.875	0.875	1
    9	1	0.875	0.875	1
    8	1	0.875	0.875	1
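On this test sample, each of the two models misclassifies exactly one of the 30 observations, so their overall accuracy is the same; this can be read off the two confusion matrices printed above:

```python
import numpy as np

cm_best   = np.array([[8, 0, 0], [0, 10, 1], [0, 0, 11]])   # XGBOOST_0 (lasso)
cm_second = np.array([[8, 0, 0], [0, 11, 0], [0, 1, 10]])   # XGBOOST_1 (rfe)

for name, cm in [("XGBOOST_0", cm_best), ("XGBOOST_1", cm_second)]:
    accuracy = np.trace(cm) / cm.sum()   # correct predictions / total samples
    print(f"{name}: accuracy = {accuracy:.4f}")   # 29/30 ≈ 0.9667 for both
```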