Example 5: Run AutoML for Multiclass Classification Problem using Early Stopping Timer

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

This example predicts the species of iris flower based on different factors.

Run AutoML to acquire the most effective model with the following specifications:

  • Set the early stopping timer to 300 seconds.
  • Include only the 'xgboost' model for training.
  • Use verbose level 2 to get a detailed log.
  1. Load the data and split it into train and test datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> load_example_data("teradataml", "iris_input")
      >>> iris = DataFrame.from_table("iris_input")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> iris_sample = iris.sample(frac = [0.8, 0.2])
    3. Fetch train and test data.
      >>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
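The `sampleid` filter used above can be pictured with a small plain-Python sketch. It is an illustration only and makes no claim about how Vantage draws the sample: the helper `assign_sample_ids` is hypothetical, the 150 row ids are an assumption about the example table, and the code simply shuffles the ids and labels the first 80% with sample id 1 and the remaining 20% with sample id 2.

```python
import random

def assign_sample_ids(row_ids, fractions=(0.8, 0.2), seed=42):
    """Hypothetical helper: map each row id to a 1-based sample id."""
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)
    labelled = {}
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        # round() keeps each partition size close to the requested fraction
        size = round(frac * len(shuffled))
        for row_id in shuffled[start:start + size]:
            labelled[row_id] = sample_id
        start += size
    return labelled

# Assumed: 150 rows with ids 1..150.
labels = assign_sample_ids(range(1, 151))
train_ids = [rid for rid, sid in labels.items() if sid == 1]
test_ids = [rid for rid, sid in labels.items() if sid == 2]
print(len(train_ids), len(test_ids))  # 120 30
```

Filtering on sample id 1 or 2 and then dropping the label column mirrors what the two `iris_sample[...]` expressions do with the `sampleid` column.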
  2. Create an AutoML instance.
    >>> aml = AutoML(task_type="Classification",
                     include=['xgboost'],
                     verbose=2,
                     max_runtime_secs=300)
  3. Fit training data.
    >>> aml.fit(iris_train, iris_train.species)
    
    # Fitting train data
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    Column Summary:
    ColumnName    Datatype    NonNullCount    NullCount    BlankCount    ZeroCount    PositiveCount    NegativeCount    NullPercentage    NonNullPercentage
    petal_width    FLOAT    120    0    None    0    120    0    0.0    100.0
    petal_length    FLOAT    120    0    None    0    120    0    0.0    100.0
    sepal_length    FLOAT    120    0    None    0    120    0    0.0    100.0
    species    INTEGER    120    0    None    0    120    0    0.0    100.0
    sepal_width    FLOAT    120    0    None    0    120    0    0.0    100.0
    id    INTEGER    120    0    None    0    120    0    0.0    100.0
    Statistics of Data:
    func    id    sepal_length    sepal_width    petal_length    petal_width    species
    50%    70.5    5.85    3    4.4    1.35    2
    count    120    120    120    120    120    120
    mean    73.083    5.868    3.06    3.767    1.201    2
    min    1    4.3    2    1    0.1    1
    max    149    7.9    4.2    6.7    2.5    3
    75%    112.25    6.425    3.3    5.1    1.8    3
    25%    34.75    5.175    2.8    1.5    0.3    1
    std    43.9    0.807    0.424    1.762    0.766    0.82
    
    Target Column Distribution:
    
    Columns with outlier percentage :-
        ColumnName  OutlierPercentage
    0  sepal_width                2.5
                                                                                            
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                            
    Feature Engineering started ...
                                                                                            
    Handling duplicate records present in dataset ...
                                                                                            
    Updated dataset after removing duplicate records:
    id    sepal_length    sepal_width    petal_length    petal_width    species
    17    5.4    3.9    1.3    0.4    1
    49    5.3    3.7    1.5    0.2    1
    71    5.9    3.2    4.8    1.8    2
    89    5.6    3.0    4.1    1.3    2
    33    5.2    4.1    1.5    0.1    1
    112    6.4    2.7    5.3    1.9    3
    55    6.5    2.8    4.6    1.5    2
    35    4.9    3.1    1.5    0.2    1
    13    4.8    3.0    1.4    0.1    1
    54    5.5    2.3    4.0    1.3    2
                                                                                            
    Handling less significant features from data ...
                                                                                            
    Total time to handle less significant features: 5.49 sec
                                                                                             
    Handling Date Features ...
    Dataset does not contain any feature related to dates.                                   
                                                                                             
    Total time to handle date features: 0.00 sec
                                                                                             
    Checking Missing values in dataset ...
    No Missing Value Detected.                                                               
                                                                                             
    Total time to find missing values in data: 6.96 sec
                                                                                             
    Imputing Missing Values ...
    No imputation is Required.                                                               
                                                                                             
    Time taken to perform imputation: 0.01 sec
                                                                                             
    Performing encoding for categorical columns ...
    Encoding not required.                                                                   
                                                                                             
    Time taken to encode the columns: 1.21 sec
                                                                                             
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                             
    Data preparation started ...
                                                                                             
    Spliting of dataset into training and testing ...
    Training size : 0.8                                                                      
    Testing size  : 0.2                                                                      
                                                                                             
    Training data
    sepal_length    sepal_width    petal_length    petal_width    species    id
    5.3    3.7    1.5    0.2    1    10
    6.5    2.8    4.6    1.5    2    13
    4.9    3.1    1.5    0.2    1    21
    4.8    3.0    1.4    0.1    1    14
    6.3    2.5    4.9    1.5    2    15
    4.6    3.1    1.5    0.2    1    23
    5.6    3.0    4.1    1.3    2    9
    5.0    3.4    1.5    0.2    1    17
    5.4    3.9    1.3    0.4    1    12
    6.7    3.1    4.4    1.4    2    20
                                                                                             
    Testing data
    sepal_length    sepal_width    petal_length    petal_width    species    id
    4.8    3.0    1.4    0.3    1    31
    5.9    3.0    4.2    1.5    2    29
    5.8    2.8    5.1    2.4    3    77
    4.9    3.1    1.5    0.1    1    28
    5.6    2.5    3.9    1.1    2    30
    6.4    3.1    5.5    1.8    3    126
    6.0    2.2    5.0    1.5    3    27
    5.0    3.0    1.6    0.2    1    107
    6.5    3.2    5.1    2.0    3    124
    6.4    2.8    5.6    2.2    3    79
                                                                                             
    Time taken for spliting of data: 8.62 sec
                                                                                             
    Outlier preprocessing ...
    Columns with outlier percentage :-
        ColumnName  OutlierPercentage
    0           id           3.333333
    1  sepal_width           2.500000
                                                                                             
    Deleting rows of these columns:
    ['sepal_width', 'id']
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710253139382738"'5
                                                                                             
    Time Taken by Outlier processing: 31.94 sec
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710253422711382"'5
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710252970147583"'
                                                                                             
    Checking imbalance data ...
                                                                                             
    Imbalance Not Found.
                                                                                              
    Feature selection using lasso ...
                                                                                              
    feature selected by lasso:
    ['sepal_width', 'petal_width', 'petal_length', 'sepal_length']
                                                                                              
    Total time taken by feature selection: 2.46 sec
                                                                                              
    scaling Features of lasso data ...
                                                                                              
    columns that will be scaled:
    ['sepal_width', 'petal_width', 'petal_length', 'sepal_length']
                                                                                              
    Training dataset after scaling:
    id    species    sepal_width    petal_width    petal_length    sepal_length
    40    2    0.18181818181818177    0.4166666666666667    0.49122807017543857    0.3333333333333333
    38    1    0.6363636363636362    0.04166666666666667    0.15789473684210525    0.13888888888888887
    99    1    0.6363636363636362    0.08333333333333333    0.07017543859649121    0.08333333333333327
    61    1    0.7272727272727273    0.04166666666666667    0.0    0.08333333333333327
    93    2    0.45454545454545453    0.5416666666666666    0.5964912280701755    0.6388888888888887
    34    1    0.45454545454545453    0.04166666666666667    0.07017543859649121    0.1666666666666668
    78    1    0.6363636363636362    0.04166666666666667    0.10526315789473685    0.13888888888888887
    19    3    0.31818181818181823    0.75    0.7543859649122806    0.5833333333333334
    17    1    0.6363636363636362    0.04166666666666667    0.08771929824561403    0.19444444444444448
    76    3    0.5454545454545455    0.9166666666666666    0.8596491228070176    0.6944444444444443
                                                                                              
    Testing dataset after scaling:
    id    species    sepal_width    petal_width    petal_length    sepal_length
    77    3    0.36363636363636354    0.9583333333333333    0.719298245614035    0.41666666666666663
    198    3    0.31818181818181823    0.75    0.719298245614035    0.41666666666666663
    95    2    0.31818181818181823    0.375    0.5438596491228069    0.41666666666666663
    238    2    0.09090909090909098    0.375    0.5263157894736842    0.4722222222222222
    126    3    0.5    0.7083333333333334    0.7894736842105263    0.5833333333333334
    123    3    0.31818181818181823    0.75    0.719298245614035    0.41666666666666663
    79    3    0.36363636363636354    0.8750000000000001    0.8070175438596491    0.5833333333333334
    30    2    0.22727272727272727    0.4166666666666667    0.5087719298245613    0.361111111111111
    101    2    0.409090909090909    0.5    0.45614035087719296    0.361111111111111
    26    3    0.5    0.8333333333333334    0.7719298245614036    0.7222222222222222
                                                                                              
    Total time taken by feature scaling: 31.72 sec
                                                                                              
    Feature selection using rfe ...
                                                                                              
    feature selected by RFE:
    ['petal_length', 'petal_width']
                                                                                              
    Total time taken by feature selection: 6.92 sec
                                                                                              
    scaling Features of rfe data ...
                                                                                              
    columns that will be scaled:
    ['r_petal_length', 'r_petal_width']
                                                                                              
    Training dataset after scaling:
    id    species    r_petal_length    r_petal_width
    40    2    0.49122807017543857    0.4166666666666667
    38    1    0.15789473684210525    0.04166666666666667
    99    1    0.07017543859649121    0.08333333333333333
    61    1    0.0    0.04166666666666667
    93    2    0.5964912280701755    0.5416666666666666
    34    1    0.07017543859649121    0.04166666666666667
    78    1    0.10526315789473685    0.04166666666666667
    19    3    0.7543859649122806    0.75
    17    1    0.08771929824561403    0.04166666666666667
    76    3    0.8596491228070176    0.9166666666666666
                                                                                              
    Testing dataset after scaling:
    id    species    r_petal_length    r_petal_width
    77    3    0.719298245614035    0.9583333333333333
    198    3    0.719298245614035    0.75
    95    2    0.5438596491228069    0.375
    238    2    0.5263157894736842    0.375
    126    3    0.7894736842105263    0.7083333333333334
    123    3    0.719298245614035    0.75
    79    3    0.8070175438596491    0.8750000000000001
    30    2    0.5087719298245613    0.4166666666666667
    101    2    0.45614035087719296    0.5
    26    3    0.7719298245614036    0.8333333333333334
                                                                                              
    Total time taken by feature scaling: 31.84 sec
                                                                                              
    scaling Features of pca data ...
                                                                                              
    columns that will be scaled:
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
                                                                                              
    Training dataset after scaling:
    id    species    sepal_length    sepal_width    petal_length    petal_width
    56    3    0.5555555555555555    0.36363636363636354    0.719298245614035    0.5833333333333334
    63    3    0.4999999999999999    0.45454545454545453    0.6842105263157895    0.7083333333333334
    110    2    0.4999999999999999    0.409090909090909    0.6491228070175439    0.5416666666666666
    8    1    0.055555555555555594    0.13636363636363627    0.052631578947368425    0.08333333333333333
    94    2    0.4722222222222222    0.31818181818181823    0.719298245614035    0.625
    39    3    0.8055555555555556    0.45454545454545453    0.8421052631578947    0.625
    70    3    0.8055555555555556    0.5454545454545455    0.8771929824561403    0.7083333333333334
    12    1    0.30555555555555564    0.8636363636363635    0.052631578947368425    0.12500000000000003
    46    1    0.30555555555555564    0.6363636363636362    0.08771929824561403    0.12500000000000003
    37    3    0.6666666666666666    0.22727272727272727    0.8421052631578947    0.7083333333333334
                                                                                              
    Testing dataset after scaling:
    id    species    sepal_length    sepal_width    petal_length    petal_width
    26    3    0.7222222222222222    0.5          0.7719298245614036    0.8333333333333334
    30    2    0.361111111111111     0.22727272727272727    0.5087719298245613    0.4166666666666667
    126   3    0.5833333333333334    0.5    0.7894736842105263    0.7083333333333334
    28    1    0.1666666666666668    0.5    0.08771929824561403    0.0
    31    1    0.13888888888888887   0.45454545454545453    0.07017543859649121    0.08333333333333333
    79    3    0.5833333333333334    0.36363636363636354    0.8070175438596491    0.8750000000000001
    27    3    0.4722222222222222    0.09090909090909098    0.7017543859649122    0.5833333333333334
    107   1    0.19444444444444448   0.45454545454545453    0.10526315789473685    0.04166666666666667
    124   3    0.611111111111111     0.5454545454545455    0.719298245614035    0.7916666666666666
    50    1    0.19444444444444448   0.5454545454545455    0.035087719298245605    0.04166666666666667
                                                                                              
    Total time taken by feature scaling: 31.30 sec
                                                                                              
    Dimension Reduction using pca ...
                                                                                              
    PCA columns:
    ['col_0', 'col_1']
                                                                                              
    Total time taken by PCA: 8.38 sec
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                              
    Model Training started ...
                                                                                              
    Hyperparameters used for model training:
    response_column : species                                                                                                                             
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.2)
    max_depth : (5, 6, 7, 8)
    min_node_size : (1, 2)
    iter_num : (10, 20)
    Total number of models for xgboost : 768
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
                                                                                              
    Performing hyperParameter tuning ...
                                                                                              
    xgboost
    XGBOOST_0                                                                                                                                                                                               
    XGBOOST_1                                                                                 
    XGBOOST_2                                                                                 
                                                                                              
    ----------------------------------------------------------------------------------------------------
                                                                                              
    Evaluating models performance ...
                                                                                              
    Evaluation completed.
                                                                                              
    Leaderboard
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    xgboost    lasso    0.958333    0.958333    0.958333    0.958333    0.962963    0.958333    0.958170    0.962963    0.958333    0.958170
    1    2    xgboost    rfe    0.958333    0.958333    0.958333    0.958333    0.962963    0.958333    0.958170    0.962963    0.958333    0.958170
    2    3    xgboost    pca    0.875000    0.875000    0.875000    0.875000    0.909091    0.875000    0.870445    0.909091    0.875000    0.870445
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 15/15
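The training log above reports 768 candidate models for xgboost, yet the run is capped at 300 seconds. The sketch below reproduces that count from the logged hyperparameter ranges and then shows, conceptually, how a time budget can cut a search short. The function `run_with_budget` and the fake clock are hypothetical and say nothing about how AutoML schedules its own training.

```python
import itertools

# Hyperparameter ranges copied from the training log.
grid = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1),
    "lambda1": (0.01, 0.1, 1, 10),
    "shrinkage_factor": (0.5, 0.1, 0.2),
    "max_depth": (5, 6, 7, 8),
    "min_node_size": (1, 2),
    "iter_num": (10, 20),
}

# One candidate per combination of values, one value per hyperparameter.
candidates = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]
print(len(candidates))  # 768


def run_with_budget(candidates, train_one, budget_secs, clock):
    """Hypothetical loop: train candidates in order until the budget is spent."""
    start = clock()
    trained = []
    for params in candidates:
        if clock() - start >= budget_secs:
            break  # budget exhausted, stop early
        trained.append(train_one(params))
    return trained


# Fake clock for a deterministic demo: every call advances time by 40 "seconds".
ticks = itertools.count(start=0, step=40)
trained = run_with_budget(candidates, lambda params: params,
                          budget_secs=300, clock=lambda: next(ticks))
print(len(trained))  # 7 with this fake clock
```

In real use the clock would be something like `time.monotonic`, and the number of candidates that finish depends on how long each model takes to train.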
  4. Display model leaderboard.
    >>> aml.leaderboard()
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    xgboost    lasso    0.958333    0.958333    0.958333    0.958333    0.962963    0.958333    0.958170    0.962963    0.958333    0.958170
    1    2    xgboost    rfe    0.958333    0.958333    0.958333    0.958333    0.962963    0.958333    0.958170    0.962963    0.958333    0.958170
    2    3    xgboost    pca    0.875000    0.875000    0.875000    0.875000    0.909091    0.875000    0.870445    0.909091    0.875000    0.870445
  5. Display the best performing model.
    >>> aml.leader()
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    xgboost    lasso    0.958333    0.958333    0.958333    0.958333    0.962963    0.958333    0.95817    0.962963    0.958333    0.95817
  6. Generate predictions on the validation dataset using the best-performing model.
    During data preparation, AutoML splits the data passed to fit() into training and testing sets. It trains models on the training set and uses the testing set as the validation dataset for model evaluation. Calling predict() without arguments generates predictions on that validation dataset.
    >>> prediction = aml.predict()
    xgboost lasso
    
     Prediction : 
        id  Prediction  Confidence_Lower  Confidence_upper  species
    0   77           3             0.875             0.875        3
    1  198           3             0.875             0.875        3
    2   95           2             1.000             1.000        2
    3  238           2             1.000             1.000        2
    4  126           3             1.000             1.000        3
    5  123           3             0.875             0.875        3
    6   79           3             0.875             0.875        3
    7   30           2             1.000             1.000        2
    8  101           2             1.000             1.000        2
    9   26           3             1.000             1.000        3
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall        F1  Support
    SeqNum                                                                                     
    0               1  CLASS_1        8        0        0   1.000000   1.000  1.000000        8
    2               3  CLASS_3        0        0        7   1.000000   0.875  0.933333        8
    1               2  CLASS_2        0        8        1   0.888889   1.000  0.941176        8
    
     Confusion Matrix : 
    array([[8, 0, 0],
           [0, 8, 0],
           [0, 1, 7]], dtype=int64)
    
    
    >>> prediction.head()
    
    id    Prediction    Confidence_Lower    Confidence_upper    species
    28    1    0.875    0.875    1
    30    2    1.0        1.0    2
    31    1    0.875    0.875    1
    50    1    0.875    0.875    1
    79    3    0.875    0.875    3
    95    2    1.0        1.0    2
    77    3    0.875    0.875    3
    29    2    0.875    0.875    2
    27    2    0.875    0.875    3
    26    3    1.0        1.0    3
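The figures in the Performance Metrics table and the leaderboard can be recomputed from the confusion matrix printed in this step. The sketch below is plain Python and assumes that rows are actual classes and columns are predicted classes, which is consistent with the logged precision and recall values. The function name `classification_metrics` is illustrative.

```python
def classification_metrics(matrix):
    """Per-class precision/recall/F1 plus accuracy and macro averages.

    Assumes rows are actual classes and columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    per_class = []
    for i in range(n):
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        precision = matrix[i][i] / predicted_i if predicted_i else 0.0
        recall = matrix[i][i] / actual_i if actual_i else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return {
        "accuracy": correct / total,
        "per_class": per_class,
        "macro_precision": sum(p for p, _, _ in per_class) / n,
        "macro_recall": sum(r for _, r, _ in per_class) / n,
        "macro_f1": sum(f for _, _, f in per_class) / n,
    }


# Confusion matrix from the validation run above.
validation = [[8, 0, 0], [0, 8, 0], [0, 1, 7]]
metrics = classification_metrics(validation)
print(round(metrics["accuracy"], 6))         # 0.958333
print(round(metrics["macro_precision"], 6))  # 0.962963
print(round(metrics["macro_f1"], 6))         # 0.95817
```

These values match the Accuracy, Macro-Precision, and Macro-F1 columns of the rank 1 leaderboard row, which suggests the leaderboard scores are computed on the same validation split.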
  7. Generate predictions on the test dataset using the best-performing model.
    >>> prediction = aml.predict(iris_test)
    
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping irrelevent columns :
    sepal_length    sepal_width    petal_length    petal_width    species
    5.1    3.7    1.5    0.4    1
    4.6    3.2    1.4    0.2    1
    5.6    2.7    4.2    1.3    2
    4.8    3.1    1.6    0.2    1
    4.9    3.6    1.4    0.1    1
    5.8    2.7    3.9    1.2    2
    5.7    4.4    1.5    0.4    1
    4.4    3.0    1.3    0.2    1
    5.0    3.5    1.6    0.6    1
    5.1    3.4    1.5    0.2    1
    
    Updated dataset after performing target column transformation :
    sepal_width    petal_length    sepal_length    petal_width    id    species
    3.2    1.4    4.6    0.2    13    1
    3.7    1.5    5.1    0.4    10    1
    3.4    1.5    5.1    0.2    18    1
    3.6    1.4    4.9    0.1    12    1
    3.1    1.6    4.8    0.2    14    1
    3.5    1.6    5.0    0.6    22    1
    4.4    1.5    5.7    0.4    15    1
    3.0    1.3    4.4    0.2    23    1
    2.7    3.9    5.8    1.2    20    2
    2.7    4.2    5.6    1.3    21    2
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1710259143008936"'
    
    Updated dataset after performing Lasso feature selection:
    id    sepal_width    petal_width    petal_length    sepal_length    species
    19    3.8    0.2    1.6    5.1    1
    22    3.5    0.6    1.6    5.0    1
    37    2.8    2.1    5.6    6.4    3
    34    2.5    1.7    4.5    4.9    3
    36    3.8    2.2    6.7    7.7    3
    28    3.0    1.4    4.6    6.1    2
    15    4.4    0.4    1.5    5.7    1
    39    2.8    1.3    4.1    5.7    2
    38    2.6    2.3    6.9    7.7    3
    55    3.0    2.3    5.2    6.7    3
    
    Updated dataset after performing scaling on Lasso selected features :
    id    species    sepal_width    petal_width    petal_length    sepal_length
    19    1    0.818181818181818    0.04166666666666667    0.10526315789473685    0.22222222222222213
    26    2    0.13636363636363627    0.375    0.40350877192982454    0.19444444444444448
    43    3    0.36363636363636354    0.7916666666666666    0.6842105263157895    0.361111111111111
    15    1    1.090909090909091    0.12500000000000003    0.08771929824561403    0.38888888888888895
    36    3    0.818181818181818    0.8750000000000001    1.0    0.9444444444444444
    28    2    0.45454545454545453    0.5416666666666666    0.631578947368421    0.4999999999999999
    34    3    0.22727272727272727    0.6666666666666666    0.6140350877192983    0.1666666666666668
    13    1    0.5454545454545455    0.04166666666666667    0.07017543859649121    0.08333333333333327
    38    3    0.27272727272727276    0.9166666666666666    1.0350877192982457    0.9444444444444444
    55    3    0.45454545454545453    0.9166666666666666    0.7368421052631579    0.6666666666666666
    
    Updated dataset after performing RFE feature selection:
    id    petal_length    petal_width    species
    22    1.6    0.6    1
    36    6.7    2.2    3
    28    4.6    1.4    2
    19    1.6    0.2    1
    34    4.5    1.7    3
    13    1.4    0.2    1
    15    1.5    0.4    1
    39    4.1    1.3    2
    38    6.9    2.3    3
    55    5.2    2.3    3
    
    Updated dataset after performing scaling on RFE selected features :
    id    species    r_petal_length    r_petal_width
    22    1    0.10526315789473685    0.20833333333333334
    36    3    1.0    0.8750000000000001
    28    2    0.631578947368421    0.5416666666666666
    19    1    0.10526315789473685    0.04166666666666667
    38    3    1.0350877192982457    0.9166666666666666
    55    3    0.7368421052631579    0.9166666666666666
    15    1    0.08771929824561403    0.12500000000000003
    39    2    0.5438596491228069    0.5
    34    3    0.6140350877192983    0.6666666666666666
    13    1    0.07017543859649121    0.04166666666666667
    
    Updated dataset after performing scaling for PCA feature selection :
    id    species    sepal_length    sepal_width    petal_length    petal_width
    22    1    0.19444444444444448    0.6818181818181818    0.10526315789473685    0.20833333333333334
    19    1    0.22222222222222213    0.818181818181818    0.10526315789473685    0.04166666666666667
    30    2    0.3333333333333333    0.22727272727272727    0.5263157894736842    0.5
    15    1    0.38888888888888895    1.090909090909091    0.08771929824561403    0.12500000000000003
    26    2    0.19444444444444448    0.13636363636363627    0.40350877192982454    0.375
    43    3    0.361111111111111    0.36363636363636354    0.6842105263157895    0.7916666666666666
    36    3    0.9444444444444444    0.818181818181818    1.0    0.8750000000000001
    28    2    0.4999999999999999    0.45454545454545453    0.631578947368421    0.5416666666666666
    38    3    0.9444444444444444    0.27272727272727276    1.0350877192982457    0.9166666666666666
    55    3    0.6666666666666666    0.45454545454545453    0.7368421052631579    0.9166666666666666
    
    Updated dataset after performing PCA feature selection :
    id    col_0    col_1    species
    0    26    0.173228    0.431893    2
    1    34    -0.116759    0.345608    3
    2    22    0.554195    -0.084019    1
    3    19    0.669626    -0.210643    1
    4    38    -0.862007    0.050611    3
    5    36    -0.725450    -0.461870    3
    6    15    0.601462    -0.528200    1
    7    43    -0.300226    0.148522    3
    8    13    0.706527    0.090071    1
    9    37    -0.498051    0.078520    3
    
    Data Transformation completed.
    xgboost lasso
    
     Prediction : 
       id  Prediction  Confidence_Lower  Confidence_upper  species
    0  19           1             1.000             1.000        1
    1  22           1             0.875             0.875        1
    2  37           3             0.875             0.875        3
    3  15           1             1.000             1.000        1
    4  38           3             0.875             0.875        3
    5  55           3             1.000             1.000        3
    6  34           2             0.750             0.750        3
    7  13           1             0.875             0.875        1
    8  36           3             1.000             1.000        3
    9  28           2             0.875             0.875        2
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall        F1  Support
    SeqNum                                                                                     
    0               1  CLASS_1       10        0        0   1.000000     1.0  1.000000       10
    2               3  CLASS_3        0        0        9   1.000000     0.9  0.947368       10
    1               2  CLASS_2        0       10        1   0.909091     1.0  0.952381       10
    
     Confusion Matrix : 
    array([[10,  0,  0],
           [ 0, 10,  0],
           [ 0,  1,  9]], dtype=int64)
    
    
    >>> prediction.head()
    
    id    Prediction    Confidence_Lower    Confidence_upper    species
    12    1    0.875    0.875    1
    14    1    0.875    0.875    1
    15    1    1.0        1.0    1
    18    1    0.875    0.875    1
    20    2    1.0        1.0    2
    21    2    1.0        1.0    2
    19    1    1.0        1.0    1
    13    1    0.875    0.875    1
    11    1    0.875    0.875    1
    10    1    0.875    0.875    1
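Some scaled values in the step 7 transformation log exceed 1, for example 1.0350877192982457 for a petal_length of 6.9 and 1.090909090909091 for a sepal_width of 4.4. The log does not name the scaling method, but these numbers are consistent with min-max scaling whose bounds were learned on the training data and then reused on the test data. The sketch below works under that assumption. The helper names are hypothetical, and the toy training columns only carry the minimum, median, and maximum reported in the "Statistics of Data" table.

```python
import math

def fit_min_max(values):
    """Learn the (min, max) bounds from a training column."""
    return min(values), max(values)

def apply_min_max(value, bounds):
    """Scale a value with bounds learned elsewhere; the result may leave [0, 1]."""
    lo, hi = bounds
    return (value - lo) / (hi - lo)

# Toy training columns: min, median, and max taken from the fit log (assumed bounds).
petal_length_bounds = fit_min_max([1.0, 4.4, 6.7])
sepal_width_bounds = fit_min_max([2.0, 3.0, 4.2])

# Test rows that fall outside the training range scale to values above 1.
print(apply_min_max(6.9, petal_length_bounds))  # about 1.0351
print(apply_min_max(4.4, sepal_width_bounds))   # about 1.0909

# A test value inside the training range stays within [0, 1].
print(apply_min_max(3.8, sepal_width_bounds))   # about 0.8182
```

Reusing the training bounds keeps the test data on the same scale the model was trained on, at the cost of occasional values slightly outside the 0 to 1 interval.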