AutoClassifier for multiclass classification using early stopping timer - Example 6: Run AutoClassifier for Multiclass Classification Problem using Early Stopping Timer - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
March 2024
Language
English (United States)
Last Update
2024-04-09
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

This example predicts the species of an iris flower based on its measurements.

Run AutoML to acquire the most effective model with the following specifications:

  • Set the early stopping timer to 300 seconds.
  • Include only the 'xgboost' model for training.
  • Use verbose level 2 to get detailed logs.
  • Add customization for specific AutoClassifier processes.
  1. Load data and split it into train and test datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> load_example_data("teradataml", "iris_input")
      >>> iris = DataFrame("iris_input")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> iris_sample = iris.sample(frac = [0.8, 0.2])
    3. Fetch train and test data.
      >>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
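The `sample(frac=[0.8, 0.2])` call tags each row with a `sampleid` column (1 for the 80% sample, 2 for the 20% sample), which is why the split is recovered by filtering on `sampleid`. For readers without a Vantage connection, the tagging scheme can be emulated locally with pandas; this is a hypothetical local illustration on toy data, not the teradataml implementation:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 150-row iris teradataml DataFrame (hypothetical data).
df = pd.DataFrame({"id": range(150)})

# Emulate sample(frac=[0.8, 0.2]): tag an exact 80/20 split via a sampleid column.
rng = np.random.default_rng(42)
order = rng.permutation(len(df))
df["sampleid"] = 1
df.loc[order[int(0.8 * len(df)):], "sampleid"] = 2

# Recover the two samples the same way the example does.
train = df[df["sampleid"] == 1].drop("sampleid", axis=1)
test = df[df["sampleid"] == 2].drop("sampleid", axis=1)
```

With 150 rows this yields exactly 120 training and 30 testing rows; the in-database sampling is similar in spirit but runs on Vantage without moving data to the client.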
  2. Add customization.
    >>> AutoClassifier.generate_custom_config("custom_iris")
    
    
    Generating custom config JSON for AutoML ...
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  1
    
    Customizing Feature Engineering Phase ...
    
    Available options for customization of feature engineering phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Missing Value Handling
    
    Index 2: Customize Bincode Encoding
    
    Index 3: Customize String Manipulation
    
    Index 4: Customize Categorical Encoding
    
    Index 5: Customize Mathematical Transformation
    
    Index 6: Customize Nonlinear Transformation
    
    Index 7: Customize Antiselect Features
    
    Index 8: Back to main menu
    
    Index 9: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in feature engineering phase:  8
    
    Customization of feature engineering phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  2
    
    Customizing Data Preparation Phase ...
    
    Available options for customization of data preparation phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Train Test Split
    
    Index 2: Customize Data Imbalance Handling
    
    Index 3: Customize Outlier Handling
    
    Index 4: Customize Feature Scaling
    
    Index 5: Back to main menu
    
    Index 6: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in data preparation phase:  1, 4, 5
    
    Customizing Train Test Split ...
    
    Enter the train size for train test split:  0.85
    
    Customization of train test split has been completed successfully.
    
    Available feature scaling methods with corresponding indices:
    Index 1: maxabs
    Index 2: mean
    Index 3: midrange
    Index 4: range
    Index 5: rescale
    Index 6: std
    Index 7: sum
    Index 8: ustd
    
    Enter the corresponding index feature scaling method:  4
    
    Customization of feature scaling has been completed successfully.
    
    Customization of data preparation phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  3
    
    Customizing Model Training Phase ...
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  1
    
    Customizing Model Hyperparameter ...
    
    Available models for hyperparameter tuning with corresponding indices:
    Index 1: decision_forest
    Index 2: xgboost
    Index 3: knn
    Index 4: glm
    Index 5: svm
    
    Available hyperparameters update methods with corresponding indices:
    Index 1: ADD
    Index 2: REPLACE
    
    Enter the list of model indices for performing hyperparameter tuning:  2
    
    Available hyperparameters for model 'xgboost' with corresponding indices:
    Index 1: min_impurity
    Index 2: max_depth
    Index 3: min_node_size
    Index 4: shrinkage_factor
    Index 5: iter_num
    
    Enter the list of hyperparameter indices for model 'xgboost':  2
    
    Enter the index of corresponding update method for hyperparameters 'max_depth' for model 'xgboost':  2
    
    Enter the list of value for hyperparameter 'max_depth' for model 'xgboost':  3,4
    
    Customization of model hyperparameter has been completed successfully.
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  3
    
    Customization of model training phase has been completed successfully.
    
    Process of generating custom config file for AutoML has been completed successfully.
    
    'custom_iris.json' file is generated successfully under the current working directory.
  3. Create an AutoML instance.
    >>> aml = AutoClassifier(include=['xgboost'],
    ...                      verbose=2,
    ...                      max_runtime_secs=300,
    ...                      custom_config_file='custom_iris.json')
    
    
  4. Fit training data.
    >>> aml.fit(iris_train, iris_train.species)
    
    Received below input for customization : 
    {
        "TrainTestSplitIndicator": true,
        "TrainingSize": 0.85,
        "FeatureScalingIndicator": true,
        "FeatureScalingMethod": "range",
        "HyperparameterTuningIndicator": true,
        "HyperparameterTuningParam": {
            "xgboost": {
                "max_depth": {
                    "Method": "REPLACE",
                    "Value": [
                        3,
                        4
                    ]
                }
            }
        }
    }
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    Column Summary:
    ColumnName    Datatype    NonNullCount    NullCount    BlankCount    ZeroCount    PositiveCount    NegativeCount    NullPercentage    NonNullPercentage
    sepal_length    FLOAT    120    0    None    0    120    0    0.0    100.0
    species    INTEGER    120    0    None    0    120    0    0.0    100.0
    sepal_width    FLOAT    120    0    None    0    120    0    0.0    100.0
    petal_width    FLOAT    120    0    None    0    120    0    0.0    100.0
    petal_length    FLOAT    120    0    None    0    120    0    0.0    100.0
    id    INTEGER    120    0    None    0    120    0    0.0    100.0
    Statistics of Data:
    func    id    sepal_length    sepal_width    petal_length    petal_width    species
    std    43.9    0.807    0.424    1.762    0.766    0.82
    25%    34.75    5.175    2.8    1.5    0.3    1
    50%    70.5    5.85    3    4.4    1.35    2
    75%    112.25    6.425    3.3    5.1    1.8    3
    max    149    7.9    4.2    6.7    2.5    3
    min    1    4.3    2    1    0.1    1
    mean    73.083    5.868    3.06    3.768    1.201    2
    count    120    120    120    120    120    120
    
    Target Column Distribution:
    
    Columns with outlier percentage :-
        ColumnName  OutlierPercentage
    0  sepal_width                2.5
                                                                                            
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                            
    Feature Engineering started ...
                                                                                            
    Handling duplicate records present in dataset ...
                                                                                            
    Updated dataset after removing duplicate records:
    id    sepal_length    sepal_width    petal_length    petal_width    species
    33    5.2    4.1    1.5    0.1    1
    17    5.4    3.9    1.3    0.4    1
    66    6.7    3.1    4.4    1.4    2
    89    5.6    3.0    4.1    1.3    2
    42    4.5    2.3    1.3    0.3    1
    139    6.0    3.0    4.8    1.8    3
    73    6.3    2.5    4.9    1.5    2
    4    4.6    3.1    1.5    0.2    1
    55    6.5    2.8    4.6    1.5    2
    35    4.9    3.1    1.5    0.2    1
                                                                                            
    Handling less significant features from data ...
                                                                                            
    Total time to handle less significant features: 6.63 sec
                                                                                             
    Handling Date Features ...
    Dataset does not contain any feature related to dates.                                   
                                                                                             
    Total time to handle date features: 0.00 sec
    Proceeding with default option for missing value imputation.                             
    Proceeding with default option for handling remaining missing values.                    
                                                                                             
    Checking Missing values in dataset ...
    No Missing Value Detected.                                                               
                                                                                             
    Total time to find missing values in data: 7.55 sec
                                                                                             
    Imputing Missing Values ...
    No imputation is Required.                                                               
                                                                                             
    Time taken to perform imputation: 0.00 sec
    No information provided for Variable-Width Transformation.                               
    Skipping customized string manipulation.
                                                                                             
    Starting Customized Categorical Feature Encoding ...
    AutoML will proceed with default encoding technique.                                     
                                                                                             
    Performing encoding for categorical columns ...
    Encoding not required.                                                                   
                                                                                             
    Time taken to encode the columns: 1.32 sec
                                                                                             
    Starting customized mathematical transformation ...
    Skipping customized mathematical transformation.                                         
                                                                                             
    Starting customized non-linear transformation ...
    Skipping customized non-linear transformation.                                           
                                                                                             
    Starting customized anti-select columns ...
    Skipping customized anti-select columns.                                                 
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                              
    Data preparation started ...
    No information provided for performing customized imbalanced dataset sampling. AutoML will Proceed with default option.
                                                                                              
    Splitting of dataset into training and testing ...
    Training size : 0.85                                                                      
    Testing size  : 0.15                                                                      
                                                                                              
    Training data
    sepal_length    sepal_width    petal_length    petal_width    species    id
    5.3    3.7    1.5    0.2    1    10
    5.6    3.0    4.1    1.3    2    9
    5.0    3.4    1.5    0.2    1    17
    6.3    2.5    4.9    1.5    2    15
    4.8    3.0    1.4    0.1    1    14
    5.5    2.3    4.0    1.3    2    22
    6.5    2.8    4.6    1.5    2    13
    4.9    3.1    1.5    0.2    1    21
    5.2    4.1    1.5    0.1    1    11
    6.4    2.7    5.3    1.9    3    19
                                                                                              
    Testing data
    sepal_length    sepal_width    petal_length    petal_width    species    id
    4.9    3.1    1.5    0.1    1    28
    5.6    2.5    3.9    1.1    2    30
    6.4    3.1    5.5    1.8    3    126
    6.0    2.2    5.0    1.5    3    27
    5.9    3.0    4.2    1.5    2    29
    5.8    2.8    5.1    2.4    3    77
    5.8    2.7    5.1    1.9    3    123
    6.5    3.2    5.1    2.0    3    124
    6.4    2.8    5.6    2.2    3    79
    4.8    3.0    1.4    0.3    1    31
                                                                                              
    Time taken for splitting of data: 10.75 sec
                                                                                              
    Starting customized outlier processing ...
    No information provided for customized outlier processing. AutoML will proceed with default settings.
                                                                                              
    Outlier preprocessing ...
    Columns with outlier percentage :-
        ColumnName  OutlierPercentage
    0           id           3.333333
    1  sepal_width           2.500000
                                                                                              
    Deleting rows of these columns:
    ['sepal_width', 'id']
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710270864119054"'
                                                                                              
    Time Taken by Outlier processing: 39.66 sec
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710271039598725"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710273641403702"'
                                                                                              
    Checking imbalance data ...
                                                                                              
    Imbalance Not Found.
                                                                                              
    Feature selection using lasso ...
                                                                                              
    feature selected by lasso:
    ['sepal_width', 'petal_width', 'petal_length', 'sepal_length']
                                                                                              
    Total time taken by feature selection: 2.74 sec
                                                                                              
    scaling Features of lasso data ...
                                                                                              
    columns that will be scaled:
    ['sepal_width', 'petal_width', 'petal_length', 'sepal_length']
                                                                                              
    Training dataset after scaling:
    species    id    sepal_width    petal_width    petal_length    sepal_length
    1    78    0.6666666666666666    0.04166666666666667    0.10526315789473685    0.13888888888888887
    1    142    0.8888888888888887    0.08333333333333333    0.08771929824561403    0.22222222222222213
    3    49    0.6111111111111109    0.8333333333333334    0.8245614035087719    0.6666666666666666
    1    14    0.44444444444444436    0.0    0.07017543859649121    0.13888888888888887
    1    100    0.8888888888888887    0.08333333333333333    0.12280701754385964    0.38888888888888895
    3    105    0.5    0.9583333333333333    0.8070175438596491    0.6666666666666666
    3    155    0.6111111111111109    1.0    0.8245614035087719    0.6666666666666666
    3    66    0.44444444444444436    0.8333333333333334    0.9824561403508771    0.9166666666666665
    3    45    0.33333333333333315    0.7083333333333334    0.6666666666666666    0.5277777777777778
    1    46    0.6666666666666666    0.12500000000000003    0.08771929824561403    0.30555555555555564
                                                                                              
    Testing dataset after scaling:
    species    id    sepal_width    petal_width    petal_length    sepal_length
    1    28    0.5    0.0    0.08771929824561403    0.1666666666666668
    1    214    0.3888888888888888    0.04166666666666667    0.07017543859649121    0.027777777777777922
    3    126    0.5    0.7083333333333334    0.7894736842105263    0.5833333333333334
    1    31    0.44444444444444436    0.08333333333333333    0.07017543859649121    0.13888888888888887
    1    230    0.7222222222222222    0.08333333333333333    0.07017543859649121    0.22222222222222213
    3    77    0.33333333333333315    0.9583333333333333    0.719298245614035    0.41666666666666663
    3    27    0.0    0.5833333333333334    0.7017543859649122    0.4722222222222222
    3    123    0.2777777777777778    0.75    0.719298245614035    0.41666666666666663
    3    124    0.5555555555555556    0.7916666666666666    0.719298245614035    0.611111111111111
    1    132    0.6666666666666666    0.04166666666666667    0.07017543859649121    0.25000000000000006
                                                                                              
    Total time taken by feature scaling: 36.97 sec
                                                                                              
    Feature selection using rfe ...
                                                                                              
    feature selected by RFE:
    ['petal_length', 'petal_width']
                                                                                              
    Total time taken by feature selection: 8.03 sec
                                                                                              
    scaling Features of rfe data ...
                                                                                              
    columns that will be scaled:
    ['r_petal_length', 'r_petal_width']
                                                                                              
    Training dataset after scaling:
    species    id    r_petal_length    r_petal_width
    1    78    0.10526315789473685    0.04166666666666667
    1    142    0.08771929824561403    0.08333333333333333
    3    49    0.8245614035087719    0.8333333333333334
    1    14    0.07017543859649121    0.0
    1    100    0.12280701754385964    0.08333333333333333
    3    105    0.8070175438596491    0.9583333333333333
    3    155    0.8245614035087719    1.0
    3    66    0.9824561403508771    0.8333333333333334
    3    45    0.6666666666666666    0.7083333333333334
    1    46    0.08771929824561403    0.12500000000000003
                                                                                              
    Testing dataset after scaling:
    species    id    r_petal_length    r_petal_width
    1    28    0.08771929824561403    0.0
    1    214    0.07017543859649121    0.04166666666666667
    3    126    0.7894736842105263    0.7083333333333334
    1    31    0.07017543859649121    0.08333333333333333
    1    230    0.07017543859649121    0.08333333333333333
    3    77    0.719298245614035    0.9583333333333333
    3    27    0.7017543859649122    0.5833333333333334
    3    123    0.719298245614035    0.75
    3    124    0.719298245614035    0.7916666666666666
    1    132    0.07017543859649121    0.04166666666666667
                                                                                              
    Total time taken by feature scaling: 39.35 sec
                                                                                              
    scaling Features of pca data ...
                                                                                              
    columns that will be scaled:
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
                                                                                              
    Training dataset after scaling:
    species    id    sepal_length    sepal_width    petal_length    petal_width
    3    70    0.8055555555555556    0.5555555555555556    0.8771929824561403    0.7083333333333334
    1    46    0.30555555555555564    0.6666666666666666    0.08771929824561403    0.12500000000000003
    2    158    0.7499999999999999    0.5555555555555556    0.6491228070175439    0.5416666666666666
    2    94    0.4722222222222222    0.2777777777777778    0.719298245614035    0.625
    3    59    0.5277777777777778    0.6666666666666666    0.7719298245614036    0.9166666666666666
    2    48    0.4999999999999999    0.33333333333333315    0.6491228070175439    0.4583333333333333
    2    75    0.4999999999999999    0.33333333333333315    0.5263157894736842    0.5
    2    85    0.6666666666666666    0.5    0.6491228070175439    0.5833333333333334
    3    49    0.6666666666666666    0.6111111111111109    0.8245614035087719    0.8333333333333334
    3    42    0.5555555555555555    0.6666666666666666    0.8070175438596491    0.9583333333333333
                                                                                              
    Testing dataset after scaling:
    species    id    sepal_length    sepal_width    petal_length    petal_width
    1    31    0.13888888888888887    0.44444444444444436    0.07017543859649121    0.08333333333333333
    2    29    0.44444444444444453    0.44444444444444436    0.5614035087719298    0.5833333333333334
    3    77    0.41666666666666663    0.33333333333333315    0.719298245614035    0.9583333333333333
    1    28    0.1666666666666668    0.5    0.08771929824561403    0.0
    2    30    0.361111111111111    0.16666666666666657    0.5087719298245613    0.4166666666666667
    3    126    0.5833333333333334    0.5    0.7894736842105263    0.7083333333333334
    3    124    0.611111111111111    0.5555555555555556    0.719298245614035    0.7916666666666666
    3    79    0.5833333333333334    0.33333333333333315    0.8070175438596491    0.8750000000000001
    3    123    0.41666666666666663    0.2777777777777778    0.719298245614035    0.75
    3    27    0.4722222222222222    0.0    0.7017543859649122    0.5833333333333334
                                                                                              
    Total time taken by feature scaling: 38.94 sec
                                                                                              
    Dimension Reduction using pca ...
                                                                                              
    PCA columns:
    ['col_0', 'col_1']
                                                                                              
    Total time taken by PCA: 9.98 sec
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                              
    Model Training started ...
                                                                                              
    Starting customized hyperparameter update ...
                                                                                              
    Completed customized hyperparameter update.
                                                                                              
    Hyperparameters used for model training:
    response_column : species                                                                                                                             
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.2)
    max_depth : (3, 4)
    min_node_size : (1, 2)
    iter_num : (10, 20)
    Total number of models for xgboost : 384
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
                                                                                              
    Performing hyperParameter tuning ...
                                                                                              
    xgboost
    XGBOOST_3                                                                                                                                                                                               
    XGBOOST_1                                                                                 
    XGBOOST_2                                                                                 
                                                                                              
    ----------------------------------------------------------------------------------------------------
                                                                                              
    Evaluating models performance ...
                                                                                              
    Evaluation completed.
                                                                                              
    Leaderboard
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    xgboost    pca    0.888889    0.888889    0.888889    0.888889    0.916667    0.888889    0.885714    0.916667    0.888889    0.885714
    1    2    xgboost    lasso    0.333333    0.333333    0.333333    0.333333    0.111111    0.333333    0.166667    0.111111    0.333333    0.166667
    2    3    xgboost    rfe    0.333333    0.333333    0.333333    0.333333    0.111111    0.333333    0.166667    0.111111    0.333333    0.166667
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 19/19 
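The scaled tables in the log above contain values in [0, 1] because the `range` scaling method chosen in the custom config applies min-max scaling, x' = (x - min) / (max - min), per column. A minimal sketch of the formula on toy values (plain Python, not the in-database scaling implementation; min and max here echo the sepal_width statistics shown earlier):

```python
def range_scale(values):
    """Min-max (range) scaling: x' = (x - min) / (max - min), mapping to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Toy values spanning the sepal_width range from the statistics table (min 2.0, max 4.2).
scaled = range_scale([2.0, 3.1, 4.2])
```

Note that AutoML fits the scaler on the training split after outlier removal, so the actual min/max it uses can differ slightly from the full-dataset statistics.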
  5. Display model leaderboard.
    >>> aml.leaderboard()
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    xgboost    pca    0.888889    0.888889    0.888889    0.888889    0.916667    0.888889    0.885714    0.916667    0.888889    0.885714
    1    2    xgboost    lasso    0.333333    0.333333    0.333333    0.333333    0.111111    0.333333    0.166667    0.111111    0.333333    0.166667
    2    3    xgboost    rfe    0.333333    0.333333    0.333333    0.333333    0.111111    0.333333    0.166667    0.111111    0.333333    0.166667
  6. Display the best performing model.
    >>> aml.leader()
       Rank     Name  Feature selection  Accuracy  Micro-Precision  Micro-Recall  Micro-F1  Macro-Precision  Macro-Recall  Macro-F1  Weighted-Precision  Weighted-Recall  Weighted-F1
    0     1  xgboost                pca  0.888889         0.888889      0.888889  0.888889         0.916667      0.888889  0.885714            0.916667         0.888889     0.885714
  7. Generate prediction on validation dataset using best performing model.
    During the data preparation phase, AutoML splits the data supplied to fit into training and testing sets. Model training uses the training set, while the testing set serves as the validation dataset for evaluating the trained models.
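    The internal holdout described above can be illustrated with a plain-Python sketch. Note that the 80/20 ratio below is an assumption chosen purely for illustration; the actual split ratio AutoML uses internally is not shown in this output.

    ```python
    import random

    # Hypothetical holdout split illustrating the train/validation concept;
    # the 80/20 ratio is assumed for this sketch only.
    random.seed(42)
    rows = list(range(100))          # stand-in for the rows passed during fitting
    random.shuffle(rows)

    cut = int(len(rows) * 0.8)
    train, validation = rows[:cut], rows[cut:]

    print(len(train), len(validation))   # 80 20
    ```

    Calling predict() with no arguments, as in the next step, evaluates the best model on the held-out validation portion.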
    >>> prediction = aml.predict()
    xgboost pca
    
     Prediction : 
        id  Prediction  Confidence_Lower  Confidence_upper  species
    0   29           2             0.875             0.875        2
    1   31           1             1.000             1.000        1
    2  123           2             0.750             0.750        3
    3  124           3             0.875             0.875        3
    4  126           3             0.750             0.750        3
    5   79           3             1.000             1.000        3
    6   77           3             0.750             0.750        3
    7   30           2             1.000             1.000        2
    8   28           1             1.000             1.000        1
    9   27           2             0.875             0.875        3
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision    Recall        F1  Support
    SeqNum                                                                                       
    0               1  CLASS_1        6        0        0       1.00  1.000000  1.000000        6
    2               3  CLASS_3        0        0        4       1.00  0.666667  0.800000        6
    1               2  CLASS_2        0        6        2       0.75  1.000000  0.857143        6
    
     Confusion Matrix : 
    array([[6, 0, 0],
           [0, 6, 0],
           [0, 2, 4]], dtype=int64)
           
    
    >>> prediction.head()
    
     id  Prediction  Confidence_Lower  Confidence_upper  species
     29           2             0.875             0.875        2
     31           1             1.000             1.000        1
     77           3             0.750             0.750        3
     79           3             1.000             1.000        3
    109           1             1.000             1.000        1
    123           2             0.750             0.750        3
     95           2             1.000             1.000        2
     30           2             1.000             1.000        2
     28           1             1.000             1.000        1
     27           2             0.875             0.875        3
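    The leaderboard metrics for the best model can be cross-checked against the confusion matrix printed above. A minimal sketch in plain Python, using the matrix values from the output (rows = actual class, columns = predicted class, which is consistent with the per-class precision and recall reported in the Performance Metrics table):

    ```python
    # Confusion matrix from the aml.predict() output above
    # (rows = actual class, columns = predicted class).
    cm = [[6, 0, 0],
          [0, 6, 0],
          [0, 2, 4]]

    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[i][i] for i in range(n)) / total

    # Per-class precision (column-wise) and recall (row-wise)
    precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]

    print(round(accuracy, 6))            # 0.888889 (leaderboard Accuracy)
    print(round(sum(precision) / n, 6))  # 0.916667 (Macro-Precision)
    print(round(sum(recall) / n, 6))     # 0.888889 (Macro-Recall)
    ```

    These values match the rank-1 (xgboost + pca) row of the leaderboard.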
  8. Generate prediction on test dataset using second best performing model.
    >>> prediction = aml.predict(iris_test,2)
    
    
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping irrelevant columns :
    sepal_length    sepal_width    petal_length    petal_width    species
    4.9    3.6    1.4    0.1    1
    5.7    4.4    1.5    0.4    1
    4.4    3.0    1.3    0.2    1
    4.8    3.1    1.6    0.2    1
    5.4    3.4    1.7    0.2    1
    5.1    3.8    1.6    0.2    1
    4.6    3.2    1.4    0.2    1
    5.6    2.7    4.2    1.3    2
    5.0    3.5    1.6    0.6    1
    5.8    2.7    3.9    1.2    2
    
    Updated dataset after performing target column transformation :
    sepal_width    petal_width    sepal_length    id    petal_length    species
    3.7    0.4    5.1    10    1.5    1
    3.6    0.1    4.9    12    1.4    1
    2.7    1.2    5.8    20    3.9    2
    4.4    0.4    5.7    15    1.5    1
    3.2    0.2    4.6    13    1.4    1
    2.7    1.3    5.6    21    4.2    2
    3.1    0.2    4.8    14    1.6    1
    3.5    0.6    5.0    22    1.6    1
    3.0    0.2    4.4    23    1.3    1
    3.4    0.2    5.1    18    1.5    1
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710270638695242"'
    
    Updated dataset after performing Lasso feature selection:
    id    sepal_width    petal_width    petal_length    sepal_length    species
    34    2.5    1.7    4.5    4.9    3
    55    3.0    2.3    5.2    6.7    3
    43    2.8    2.0    4.9    5.6    3
    47    2.9    1.8    6.3    7.3    3
    36    3.8    2.2    6.7    7.7    3
    37    2.8    2.1    5.6    6.4    3
    14    3.1    0.2    1.6    4.8    1
    10    3.7    0.4    1.5    5.1    1
    23    3.0    0.2    1.3    4.4    1
    15    4.4    0.4    1.5    5.7    1
    
    Updated dataset after performing scaling on Lasso selected features :
    species    id    sepal_width    petal_width    petal_length    sepal_length
    1    23    0.44444444444444436    0.04166666666666667    0.052631578947368425  0.027777777777777922
    1    19    0.8888888888888887    0.04166666666666667    0.10526315789473685    0.22222222222222213
    1    11    0.6666666666666666    0.04166666666666667    0.12280701754385964    0.30555555555555564
    1    18    0.6666666666666666    0.04166666666666667    0.08771929824561403    0.22222222222222213
    1    22    0.7222222222222222    0.20833333333333334    0.10526315789473685    0.19444444444444448
    1    12    0.7777777777777778                    0.0    0.07017543859649121    0.1666666666666668
    3    44    0.44444444444444436    0.9166666666666666    0.894736842105263      0.9444444444444444
    3    46    0.16666666666666657                  0.75    0.7017543859649122     0.5555555555555555
    3    34    0.16666666666666657    0.6666666666666666    0.6140350877192983     0.1666666666666668
    3    42    0.44444444444444436    0.7083333333333334    0.719298245614035      0.44444444444444453
    
    Updated dataset after performing RFE feature selection:
    id    petal_length    petal_width    species
    34    4.5    1.7    3
    55    5.2    2.3    3
    43    4.9    2.0    3
    47    6.3    1.8    3
    36    6.7    2.2    3
    37    5.6    2.1    3
    14    1.6    0.2    1
    10    1.5    0.4    1
    23    1.3    0.2    1
    15    1.5    0.4    1
    
    Updated dataset after performing scaling on RFE selected features :
    species    id    r_petal_length    r_petal_width
    1    23    0.052631578947368425    0.04166666666666667
    1    19    0.10526315789473685    0.04166666666666667
    1    11    0.12280701754385964    0.04166666666666667
    1    18    0.08771929824561403    0.04166666666666667
    1    22    0.10526315789473685    0.20833333333333334
    1    12    0.07017543859649121    0.0
    3    44    0.894736842105263    0.9166666666666666
    3    46    0.7017543859649122    0.75
    3    34    0.6140350877192983    0.6666666666666666
    3    42    0.719298245614035    0.7083333333333334
    
    Updated dataset after performing scaling for PCA feature selection :
    species    id    sepal_length    sepal_width    petal_length    petal_width
    1    23    0.027777777777777922    0.44444444444444436    0.052631578947368425    0.04166666666666667
    1    19    0.22222222222222213    0.8888888888888887    0.10526315789473685    0.04166666666666667
    1    11    0.30555555555555564    0.6666666666666666    0.12280701754385964    0.04166666666666667
    1    18    0.22222222222222213    0.6666666666666666    0.08771929824561403    0.04166666666666667
    1    22    0.19444444444444448    0.7222222222222222    0.10526315789473685    0.20833333333333334
    1    12    0.1666666666666668    0.7777777777777778    0.07017543859649121    0.0
    3    44    0.9444444444444444    0.44444444444444436    0.894736842105263    0.9166666666666666
    3    46    0.5555555555555555    0.16666666666666657    0.7017543859649122    0.75
    3    34    0.1666666666666668    0.16666666666666657    0.6140350877192983    0.6666666666666666
    3    42    0.44444444444444453    0.44444444444444436    0.719298245614035    0.7083333333333334
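    The scaled columns above all lie in [0, 1], which is consistent with min-max scaling. A minimal sketch in plain Python, assuming training-set bounds of 1.0/6.7 for petal_length and 0.1/2.5 for petal_width (inferred from the scaled values shown, not reported by AutoML itself):

    ```python
    # Min-max scaling sketch: x_scaled = (x - min) / (max - min).
    # The min/max bounds below are assumptions inferred from the scaled
    # output above, not values reported by AutoML.
    def min_max(x, lo, hi):
        return (x - lo) / (hi - lo)

    # id 23 from the tables above: petal_length 1.3, petal_width 0.2
    print(min_max(1.3, 1.0, 6.7))  # ~0.0526, matching the scaled petal_length
    print(min_max(0.2, 0.1, 2.5))  # ~0.0417, matching the scaled petal_width
    ```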
    
    Updated dataset after performing PCA feature selection :
    id    col_0    col_1    species
    0    14    0.661743    0.114347    1
    1    44    -0.737294    -0.114781    3
    2    10    0.641438    -0.236553    1
    3    46    -0.392272    0.264719    3
    4    23    0.732906    0.194758    1
    5    34    -0.118346    0.370489    3
    6    15    0.636191    -0.652497    1
    7    42    -0.282277    0.030886    3
    8    19    0.692194    -0.280471    1
    9    55    -0.519254    -0.047785    3
    
    Data Transformation completed.
    xgboost lasso
    
     Prediction : 
       id  Prediction  Confidence_Lower  Confidence_upper  species
    0  23           1               1.0               1.0        1
    1  19           1               1.0               1.0        1
    2  11           1               1.0               1.0        1
    3  18           1               1.0               1.0        1
    4  22           1               1.0               1.0        1
    5  12           1               1.0               1.0        1
    6  44           1               0.5               0.5        3
    7  46           1               0.5               0.5        3
    8  34           1               0.5               0.5        3
    9  42           1               0.5               0.5        3
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
    SeqNum                                                                                
    0               1  CLASS_1       10       10       10   0.333333     1.0  0.5       10
    2               3  CLASS_3        0        0        0   0.000000     0.0  0.0       10
    1               2  CLASS_2        0        0        0   0.000000     0.0  0.0       10
    
     Confusion Matrix : 
    array([[10,  0,  0],
           [10,  0,  0],
           [10,  0,  0]], dtype=int64)
    
    >>> prediction.head()
    
    id  Prediction  Confidence_Lower  Confidence_upper  species
    12           1               1.0               1.0        1
    14           1               1.0               1.0        1
    15           1               1.0               1.0        1
    18           1               1.0               1.0        1
    20           1               0.5               0.5        2
    21           1               0.5               0.5        2
    19           1               1.0               1.0        1
    13           1               1.0               1.0        1
    11           1               1.0               1.0        1
    10           1               1.0               1.0        1
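    The degenerate metrics above (every test row predicted as class 1) can likewise be reproduced from the second model's confusion matrix. A minimal sketch in plain Python, guarding against classes that were never predicted:

    ```python
    # Confusion matrix of the second-ranked (lasso) model from the output
    # above: every sample is predicted as class 1, so columns 2 and 3 are empty.
    cm = [[10, 0, 0],
          [10, 0, 0],
          [10, 0, 0]]

    n = len(cm)
    accuracy = sum(cm[i][i] for i in range(n)) / sum(map(sum, cm))

    def precision(i):
        predicted = sum(cm[r][i] for r in range(n))
        # 0.0 for classes that were never predicted (empty columns)
        return cm[i][i] / predicted if predicted else 0.0

    macro_precision = sum(precision(i) for i in range(n)) / n

    print(round(accuracy, 6))         # 0.333333 (lasso row's Accuracy)
    print(round(macro_precision, 6))  # 0.111111 (lasso row's Macro-Precision)
    ```

    These values match the rank-2 (xgboost + lasso) row of the leaderboard, confirming that its low score stems from collapsing all predictions into a single class.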