Example 5: Run AutoML for Multiclass Classification Problem using Early Stopping Timer

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2025-01-23
Product Category: Teradata Vantage

This example predicts the species of an iris flower from its sepal and petal measurements.

Run AutoML to find the most effective model with the following specifications:
  • Set the early stopping timer to 300 seconds.
  • Include only the ‘xgboost’ model for training.
  • Use verbose level 2 to get a detailed log.
  1. Load the data and split it into training and test datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> load_example_data("teradataml", "iris_input")
      >>> iris = DataFrame.from_table("iris_input")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> iris_sample = iris.sample(frac = [0.8, 0.2])
    3. Fetch train and test data.
      >>> iris_train = iris_sample[iris_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> iris_test = iris_sample[iris_sample['sampleid'] == 2].drop('sampleid', axis=1)
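As a rough mental model for the sampling step, the sketch below shuffles 150 hypothetical row ids, tags 80% of them with sampleid 1 and the rest with sampleid 2, and then filters on that tag the way the two statements above do. It is plain Python rather than teradataml: the helper name `assign_sampleid` and the fixed seed are illustrative assumptions, and the real `sample()` call runs inside the database.

```python
import random

def assign_sampleid(row_ids, fractions, seed=0):
    """Tag each row id with a 1-based sample number according to `fractions`."""
    ids = list(row_ids)
    random.Random(seed).shuffle(ids)  # fixed seed only to make the sketch repeatable
    tagged, start = {}, 0
    for sample_no, frac in enumerate(fractions, start=1):
        size = round(frac * len(ids))
        for rid in ids[start:start + size]:
            tagged[rid] = sample_no
        start += size
    return tagged

# 150 hypothetical row ids, split 80/20.
sampleid = assign_sampleid(range(1, 151), [0.8, 0.2])
train_ids = [rid for rid, s in sampleid.items() if s == 1]
test_ids = [rid for rid, s in sampleid.items() if s == 2]
print(len(train_ids), len(test_ids))
```

With 150 rows this yields 120 training rows and 30 test rows, which matches the row counts reported later in the fit log and in the test-set metrics.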
  2. Create an AutoML instance.
    >>> aml = AutoML(task_type="Classification",
                     include=['xgboost'],
                     verbose=2,
                     max_runtime_secs=300)
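To illustrate what a time budget such as `max_runtime_secs=300` means for a tuning loop, here is a minimal sketch. It is not AutoML's internal logic: the function `tune_with_budget`, the fake clock, and the candidate scores are all hypothetical, and the sketch assumes the budget is checked before each candidate starts rather than interrupting one in progress.

```python
import time

def tune_with_budget(candidates, evaluate, max_runtime_secs, clock=time.monotonic):
    """Evaluate candidates in order; stop starting new ones once the budget is spent."""
    started = clock()
    results = []
    for params in candidates:
        if clock() - started >= max_runtime_secs:
            break  # budget exhausted: keep what has been evaluated so far
        results.append((evaluate(params), params))
    # Best score first, similar to how a leaderboard is ordered.
    return sorted(results, key=lambda r: r[0], reverse=True)

# Hypothetical stand-ins: a fake clock that advances 100 "seconds" per call,
# and candidates that carry a made-up score instead of training a model.
ticks = iter(range(0, 10_000, 100))
fake_clock = lambda: next(ticks)
candidates = [
    {"max_depth": 5, "score": 0.79},
    {"max_depth": 6, "score": 0.96},
    {"max_depth": 7, "score": 0.88},
    {"max_depth": 8, "score": 0.75},
]
ranked = tune_with_budget(candidates, lambda p: p["score"], 300, clock=fake_clock)
print([p["max_depth"] for _, p in ranked])
```

In this toy run only the first two candidates fit in the 300-second budget, so the ranking contains two entries. The practical consequence is the same as in the example: a short timer trades search coverage for turnaround time.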
  3. Fit training data.
    >>> aml.fit(iris_train, iris_train.species)
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    Data Overview:
    Total Rows in the data: 120
    Total Columns in the data: 6
    
    Column Summary:
    ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
    species	INTEGER	120	0	None	0	120	0	0.0	100.0
    sepal_length	FLOAT	120	0	None	0	120	0	0.0	100.0
    id	INTEGER	120	0	None	0	120	0	0.0	100.0
    sepal_width	FLOAT	120	0	None	0	120	0	0.0	100.0
    petal_width	FLOAT	120	0	None	0	120	0	0.0	100.0
    petal_length	FLOAT	120	0	None	0	120	0	0.0	100.0
    
    Statistics of Data:
    func	id	sepal_length	sepal_width	petal_length	petal_width	species
    min	1	4.3	2	1	0.1	1
    std	42.789	0.873	0.45	1.811	0.765	0.82
    25%	35.75	5.1	2.8	1.5	0.3	1
    50%	75.5	5.8	3	4.35	1.3	2
    75%	109.25	6.5	3.325	5.1	1.8	3
    max	147	7.9	4.4	6.9	2.5	3
    mean	73.592	5.877	3.068	3.746	1.185	1.983
    count	120	120	120	120	120	120
    
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                           
        ColumnName  OutlierPercentage
    0  sepal_width           3.333333
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Feature Engineering started ...
    
    Handling duplicate records present in dataset ...
    Analysis completed. No action taken.                                                    
    
    Total time to handle duplicate records: 1.59 sec
    
    Handling less significant features from data ...
    
    Total time to handle less significant features: 5.89 sec
    
    Handling Date Features ...
    Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    
    Total time to handle date features: 0.00 sec
    
    Checking Missing values in dataset ...
    Analysis Completed. No Missing Values Detected.                                          
    
    Total time to find missing values in data: 7.32 sec
    Imputing Missing Values ...
    Analysis completed. No imputation required.                                              
    
    Time taken to perform imputation: 0.01 sec
    
    Performing encoding for categorical columns ...
    Analysis completed. No categorical columns were found.                                   
    
    Time taken to encode the columns: 1.56 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Data preparation started ...
    
    Spliting of dataset into training and testing ...
    Training size : 0.8                                                                      
    Testing size  : 0.2                                                                      
    
    Training data sample
    sepal_length	sepal_width	petal_length	petal_width	species	id
    5.6	2.8	4.9	2.0	3	11
    6.0	2.2	5.0	1.5	3	10
    6.3	3.3	4.7	1.6	2	18
    5.1	2.5	3.0	1.1	2	13
    5.7	2.6	3.5	1.0	2	12
    4.9	3.6	1.4	0.1	1	20
    5.0	2.0	3.5	1.0	2	14
    6.7	3.1	5.6	2.4	3	22
    6.3	3.3	6.0	2.5	3	9
    5.4	3.9	1.3	0.4	1	17
    
    96 rows X 6 columns
    
    Testing data sample
    sepal_length	sepal_width	petal_length	petal_width	species	id
    4.6	3.1	1.5	0.2	1	114
    6.4	2.7	5.3	1.9	3	29
    5.4	3.9	1.7	0.4	1	85
    5.7	2.9	4.2	1.3	2	31
    5.5	4.2	1.4	0.2	1	25
    7.7	3.8	6.7	2.2	3	28
    6.1	3.0	4.6	1.4	2	84
    5.0	3.2	1.2	0.2	1	30
    5.6	2.9	3.6	1.3	2	102
    6.3	2.9	5.6	1.8	3	95
    
    24 rows X 6 columns
    
    Time taken for spliting of data: 11.17 sec
    
    Outlier preprocessing ...
    Columns with outlier percentage :-                                                                           
        ColumnName  OutlierPercentage
    0  sepal_width           3.333333
    
    Deleting rows of these columns:
    ['sepal_width']
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713844652180031"'
    
    Sample of training dataset after removing outlier rows:
    sepal_length	sepal_width	petal_length	petal_width	species	id
    7.3	2.9	6.3	1.8	3	55
    6.7	2.5	5.8	1.8	3	108
    6.7	3.3	5.7	2.1	3	87
    6.1	2.8	4.0	1.3	2	52
    7.2	3.0	5.8	1.6	3	120
    7.2	3.6	6.1	2.5	3	26
    7.4	2.8	6.1	1.9	3	24
    6.2	2.2	4.5	1.5	2	107
    5.4	3.0	4.5	1.5	2	71
    5.4	3.7	1.5	0.2	1	41
    
    93 rows X 6 columns
    
    Time Taken by Outlier processing: 35.26 sec
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713844313813785"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713850192151644"'
    
    Checking imbalance data ...
    
    Imbalance Not Found.
    
    Feature selection using lasso ...
    
    feature selected by lasso:
    ['sepal_length', 'petal_width', 'sepal_width', 'petal_length']
    
    Total time taken by feature selection: 3.57 sec
    
    scaling Features of lasso data ...
    
    columns that will be scaled:
    ['sepal_length', 'petal_width', 'sepal_width', 'petal_length']
    
    Training dataset sample after scaling:
    id	species	sepal_length	petal_width	sepal_width	petal_length
    40	1	0.1818181818181817	0.04166666666666667	0.7777777777777778	0.035087719298245605
    80	1	0.2121212121212119	0.08333333333333333	0.7222222222222222	0.035087719298245605
    112	1	0.1818181818181817	0.04166666666666667	0.6666666666666666	0.052631578947368425
    61	1	0.1818181818181817	0.04166666666666667	0.6111111111111109	0.035087719298245605
    26	3	0.8484848484848485	1.0	0.7777777777777778	0.8596491228070174
    34	1	0.06060606060606039	0.08333333333333333	0.6666666666666666	0.035087719298245605
    78	1	0.12121212121212106	0.04166666666666667	0.5	0.07017543859649125
    122	3	0.6363636363636362	0.8750000000000001	0.44444444444444436	0.8070175438596491
    17	1	0.30303030303030304	0.12500000000000003	0.9444444444444444	0.017543859649122823
    76	1	0.12121212121212106	0.04166666666666667	0.6666666666666666	0.07017543859649125
    
    93 rows X 6 columns
    
    Testing dataset sample after scaling:
    id	species	sepal_length	petal_width	sepal_width	petal_length
    127	3	1.0606060606060606	0.7916666666666666	0.8888888888888887	0.9122807017543859
    118	3	0.5757575757575756	0.7083333333333334	0.2777777777777778	0.6491228070175439
    99	1	-0.030303030303030467	0.0	0.44444444444444436	-0.017543859649122782
    116	3	0.5151515151515149	0.7083333333333334	0.44444444444444436	0.6491228070175439
    130	2	0.33333333333333326	0.375	0.11111111111111098	0.43859649122807015
    27	2	0.6666666666666665	0.5	0.3888888888888888	0.5964912280701753
    85	1	0.30303030303030304	0.12500000000000003	0.9444444444444444	0.08771929824561403
    30	1	0.1818181818181817	0.04166666666666667	0.5555555555555556	0.0
    101	2	0.48484848484848475	0.375	0.0	0.49122807017543857
    93	2	0.48484848484848475	0.625	0.2777777777777778	0.6842105263157894
    
    24 rows X 6 columns
    
    Total time taken by feature scaling: 38.49 sec
    
    Feature selection using rfe ...
    
    feature selected by RFE:
    ['petal_length', 'petal_width']
    
    Total time taken by feature selection: 10.98 sec
    
    scaling Features of rfe data ...
    
    columns that will be scaled:
    ['r_petal_length', 'r_petal_width']
    
    Training dataset sample after scaling:
    id	species	r_petal_length	r_petal_width
    40	1	0.035087719298245605	0.04166666666666667
    80	1	0.035087719298245605	0.08333333333333333
    112	1	0.052631578947368425	0.04166666666666667
    61	1	0.035087719298245605	0.04166666666666667
    26	3	0.8596491228070174	1.0
    34	1	0.035087719298245605	0.08333333333333333
    78	1	0.07017543859649125	0.04166666666666667
    122	3	0.8070175438596491	0.8750000000000001
    17	1	0.017543859649122823	0.12500000000000003
    76	1	0.07017543859649125	0.04166666666666667
    
    93 rows X 4 columns
    
    Testing dataset sample after scaling:
    id	species	r_petal_length	r_petal_width
    127	3	0.9122807017543859	0.7916666666666666
    118	3	0.6491228070175439	0.7083333333333334
    99	1	-0.017543859649122782	0.0
    116	3	0.6491228070175439	0.7083333333333334
    130	2	0.43859649122807015	0.375
    27	2	0.5964912280701753	0.5
    85	1	0.08771929824561403	0.12500000000000003
    30	1	0.0	0.04166666666666667
    101	2	0.49122807017543857	0.375
    93	2	0.6842105263157894	0.625
    
    24 rows X 4 columns
    
    Total time taken by feature scaling: 35.82 sec
    
    scaling Features of pca data ...
    columns that will be scaled:
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    
    Training dataset sample after scaling:
    id	species	sepal_length	sepal_width	petal_length	petal_width
    18	2	0.5757575757575756	0.6111111111111109	0.6140350877192983	0.625
    52	2	0.5151515151515149	0.33333333333333315	0.49122807017543857	0.5
    44	3	0.5151515151515149	0.22222222222222218	0.7719298245614034	0.5416666666666666
    55	3	0.8787878787878787	0.3888888888888888	0.894736842105263	0.7083333333333334
    120	3	0.8484848484848485	0.44444444444444436	0.8070175438596491	0.625
    26	3	0.8484848484848485	0.7777777777777778	0.8596491228070174	1.0
    24	3	0.9090909090909092	0.33333333333333315	0.8596491228070174	0.75
    107	2	0.5454545454545454	0.0	0.5789473684210525	0.5833333333333334
    108	3	0.6969696969696969	0.16666666666666657	0.8070175438596491	0.7083333333333334
    87	3	0.6969696969696969	0.6111111111111109	0.7894736842105263	0.8333333333333334
    
    93 rows X 6 columns
    
    Testing dataset sample after scaling:
    id	species	sepal_length	sepal_width	petal_length	petal_width
    91	3	0.7575757575757576	0.5	0.7368421052631579	0.8333333333333334
    85	1	0.30303030303030304	0.9444444444444444	0.08771929824561403	0.12500000000000003
    114	1	0.06060606060606039	0.5	0.052631578947368425	0.04166666666666667
    130	2	0.33333333333333326	0.11111111111111098	0.43859649122807015	0.375
    102	2	0.3636363636363634	0.3888888888888888	0.4210526315789474	0.5
    31	2	0.3939393939393939	0.3888888888888888	0.5263157894736842	0.5
    95	3	0.5757575757575756	0.3888888888888888	0.7719298245614034	0.7083333333333334
    28	3	1.0	0.8888888888888887	0.9649122807017544	0.8750000000000001
    84	2	0.5151515151515149	0.44444444444444436	0.5964912280701753	0.5416666666666666
    30	1	0.1818181818181817	0.5555555555555556	0.0	0.04166666666666667
    
    24 rows X 6 columns
    
    Total time taken by feature scaling: 33.92 sec
    
    Dimension Reduction using pca ...
    
    PCA columns:
    ['col_0', 'col_1']
    
    Total time taken by PCA: 10.29 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Model Training started ...
    
    Hyperparameters used for model training:
    response_column : species                                                                                    
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.2)
    max_depth : (5, 6, 7, 8)
    min_node_size : (1, 2)
    iter_num : (10, 20)
    Total number of models for xgboost : 768
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    Performing hyperParameter tuning ...
    
    xgboost
    
    ----------------------------------------------------------------------------------------------------
    
    Evaluating models performance ...
    
    Evaluation completed.
    
    Leaderboard
    	Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_2	pca	0.958333	0.958333	0.958333	0.958333	0.962963	0.958333	0.958170	0.962963	0.958333	0.958170
    1	2	XGBOOST_5	pca	0.958333	0.958333	0.958333	0.958333	0.962963	0.958333	0.958170	0.962963	0.958333	0.958170
    2	3	XGBOOST_1	rfe	0.875000	0.875000	0.875000	0.875000	0.878307	0.875000	0.874510	0.878307	0.875000	0.874510
    3	4	XGBOOST_4	rfe	0.875000	0.875000	0.875000	0.875000	0.878307	0.875000	0.874510	0.878307	0.875000	0.874510
    4	5	XGBOOST_7	rfe	0.875000	0.875000	0.875000	0.875000	0.878307	0.875000	0.874510	0.878307	0.875000	0.874510
    5	6	XGBOOST_0	lasso	0.791667	0.791667	0.791667	0.791667	0.812121	0.791667	0.784076	0.812121	0.791667	0.784076
    6	7	XGBOOST_3	lasso	0.791667	0.791667	0.791667	0.791667	0.812121	0.791667	0.784076	0.812121	0.791667	0.784076
    7	8	XGBOOST_6	lasso	0.750000	0.750000	0.750000	0.750000	0.857143	0.750000	0.709091	0.857143	0.750000	0.709091
    
    8 rows X 13 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 15/15  
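The figure of 768 candidate xgboost models in the log above is the number of combinations in the printed hyperparameter grid. The following sketch reproduces that count from the listed values; the dictionary is copied from the log, and the way the combinations are enumerated here (`itertools.product`) is only an illustration, not a statement about how AutoML walks the grid.

```python
from itertools import product
from math import prod

# Hyperparameter values as printed in the model training log.
grid = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1),
    "lambda1": (0.01, 0.1, 1, 10),
    "shrinkage_factor": (0.5, 0.1, 0.2),
    "max_depth": (5, 6, 7, 8),
    "min_node_size": (1, 2),
    "iter_num": (10, 20),
}

# Number of combinations = product of the number of values per hyperparameter.
n_combinations = prod(len(values) for values in grid.values())

# One concrete combination, shown as a name -> value mapping.
first = dict(zip(grid, next(product(*grid.values()))))

print(n_combinations)
print(first)
```

The count is 2 x 2 x 4 x 3 x 4 x 2 x 2 = 768. The leaderboard in this run lists far fewer models than that.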
  4. Display model leaderboard.
    >>> aml.leaderboard()
    	Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_2	pca	0.958333	0.958333	0.958333	0.958333	0.962963	0.958333	0.958170	0.962963	0.958333	0.958170
    1	2	XGBOOST_5	pca	0.958333	0.958333	0.958333	0.958333	0.962963	0.958333	0.958170	0.962963	0.958333	0.958170
    2	3	XGBOOST_1	rfe	0.875000	0.875000	0.875000	0.875000	0.878307	0.875000	0.874510	0.878307	0.875000	0.874510
    3	4	XGBOOST_4	rfe	0.875000	0.875000	0.875000	0.875000	0.878307	0.875000	0.874510	0.878307	0.875000	0.874510
    4	5	XGBOOST_7	rfe	0.875000	0.875000	0.875000	0.875000	0.878307	0.875000	0.874510	0.878307	0.875000	0.874510
    5	6	XGBOOST_0	lasso	0.791667	0.791667	0.791667	0.791667	0.812121	0.791667	0.784076	0.812121	0.791667	0.784076
    6	7	XGBOOST_3	lasso	0.791667	0.791667	0.791667	0.791667	0.812121	0.791667	0.784076	0.812121	0.791667	0.784076
    7	8	XGBOOST_6	lasso	0.750000	0.750000	0.750000	0.750000	0.857143	0.750000	0.709091	0.857143	0.750000	0.709091
  5. Display the best performing model.
    >>> aml.leader()
    	Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_2	pca	0.958333	0.958333	0.958333	0.958333	0.962963	0.958333	0.95817	0.962963	0.958333	0.95817
  6. Generate predictions on the validation dataset using the best-performing model.
    During data preparation, AutoML splits the data passed to fit() into training and testing sets. It trains models on the training set and uses the testing set as the validation dataset for model evaluation. Calling predict() without arguments generates predictions on that validation dataset.
    >>> prediction = aml.predict()
    Following model is being used for generating prediction :
    Model ID : XGBOOST_2 
    Feature Selection Method : pca
    
     Prediction : 
        id  Prediction  Confidence_Lower  Confidence_upper  species
    0   27           2             0.625             0.625        2
    1   29           3             0.750             0.750        3
    2   30           1             1.000             1.000        1
    3   31           2             0.625             0.625        2
    4   91           3             0.750             0.750        3
    5   84           2             0.625             0.625        2
    6  130           2             1.000             1.000        2
    7   28           3             0.750             0.750        3
    8  114           1             1.000             1.000        1
    9   25           1             1.000             1.000        1
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall        F1  Support
    SeqNum                                                                                     
    0               1  CLASS_1        8        0        0   1.000000   1.000  1.000000        8
    2               3  CLASS_3        0        0        7   1.000000   0.875  0.933333        8
    1               2  CLASS_2        0        8        1   0.888889   1.000  0.941176        8
    
     Confusion Matrix : 
    array([[8, 0, 0],
           [0, 8, 0],
           [0, 1, 7]], dtype=int64)
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	species
    28	3	0.75	0.75	3
    30	1	1.0	1.0	1
    31	2	0.625	0.625	2
    84	2	0.625	0.625	2
    91	3	0.75	0.75	3
    93	2	0.625	0.625	2
    85	1	1.0	1.0	1
    29	3	0.75	0.75	3
    27	2	0.625	0.625	2
    25	1	1.0	1.0	1
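The performance metrics printed for the validation dataset can be recomputed by hand from the confusion matrix. The sketch below does this in plain Python, assuming that rows are actual classes and columns are predicted classes; that orientation is inferred from the fact that it reproduces the printed precision and recall values, not stated in the output.

```python
# Validation confusion matrix from the predict() output above.
confusion = [
    [8, 0, 0],  # actual class 1
    [0, 8, 0],  # actual class 2
    [0, 1, 7],  # actual class 3
]
n = len(confusion)
total = sum(map(sum, confusion))

# Accuracy: correctly classified rows (the diagonal) over all rows.
accuracy = sum(confusion[i][i] for i in range(n)) / total

per_class = {}
for k in range(n):
    predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
    actual_k = sum(confusion[k])                          # row sum (support)
    precision = confusion[k][k] / predicted_k
    recall = confusion[k][k] / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    per_class[k + 1] = (precision, recall, f1)

# Macro average: unweighted mean of the per-class values.
macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / n

print(round(accuracy, 6))
for label, (p, r, f1) in per_class.items():
    print(label, round(p, 6), round(r, 6), round(f1, 6))
print(round(macro_f1, 6))
```

The accuracy works out to 23/24, and the per-class values agree with the Precision, Recall, and F1 columns above; the macro F1 agrees with the Macro-F1 value shown for XGBOOST_2 on the leaderboard.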
  7. Generate predictions on the test dataset using the best-performing model.
    >>> prediction = aml.predict(iris_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping irrelevent columns :
    sepal_length	sepal_width	petal_length	petal_width	species
    6.4	2.8	5.6	2.1	3
    4.4	2.9	1.4	0.2	1
    5.6	3.0	4.1	1.3	2
    5.7	2.5	5.0	2.0	3
    4.9	2.4	3.3	1.0	2
    5.5	2.3	4.0	1.3	2
    6.1	2.8	4.7	1.2	2
    5.1	3.8	1.6	0.2	1
    5.1	3.4	1.5	0.2	1
    5.9	3.0	5.1	1.8	3
    
    Updated dataset after performing target column transformation :
    petal_width	sepal_length	petal_length	sepal_width	id	species
    0.2	4.4	1.4	2.9	9	1
    0.2	5.1	1.5	3.4	10	1
    1.8	5.9	5.1	3.0	18	3
    0.4	5.4	1.5	3.4	15	1
    1.0	4.9	3.3	2.4	14	2
    1.3	5.5	4.0	2.3	22	2
    2.0	5.7	5.0	2.5	12	3
    1.4	6.7	4.4	3.1	20	2
    2.1	6.4	5.6	2.8	13	3
    1.8	6.5	5.5	3.0	21	3
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713853197024405"'
    
    Updated dataset after performing Lasso feature selection:
    id	sepal_length	petal_width	sepal_width	petal_length	species
    19	5.1	0.2	3.8	1.6	1
    17	5.6	1.3	3.0	4.1	2
    34	6.2	2.3	3.4	5.4	3
    26	5.1	0.4	3.7	1.5	1
    15	5.4	0.4	3.4	1.5	1
    32	6.7	2.5	3.3	5.7	3
    36	4.8	0.3	3.0	1.4	1
    28	5.8	1.2	2.7	3.9	2
    38	6.3	1.5	2.8	5.1	3
    12	5.7	2.0	2.5	5.0	3
    
    Updated dataset after performing scaling on Lasso selected features :
    id	species	sepal_length	petal_width	sepal_width	petal_length
    19	1	0.2121212121212119	0.04166666666666667	0.8888888888888887	0.07017543859649125
    38	3	0.5757575757575756	0.5833333333333334	0.33333333333333315	0.6842105263157894
    12	3	0.3939393939393939	0.7916666666666666	0.16666666666666657	0.6666666666666666
    17	2	0.3636363636363634	0.5	0.44444444444444436	0.5087719298245613
    22	2	0.33333333333333326	0.5	0.055555555555555365	0.49122807017543857
    35	1	0.30303030303030304	0.04166666666666667	0.6666666666666666	0.08771929824561403
    15	1	0.30303030303030304	0.12500000000000003	0.6666666666666666	0.052631578947368425
    32	3	0.6969696969696969	1.0	0.6111111111111109	0.7894736842105263
    36	1	0.12121212121212106	0.08333333333333333	0.44444444444444436	0.035087719298245605
    28	2	0.4242424242424241	0.4583333333333333	0.2777777777777778	0.4736842105263158
    
    Updated dataset after performing RFE feature selection:
    id	petal_length	petal_width	species
    15	1.5	0.4	1
    22	4.0	1.3	2
    35	1.7	0.2	1
    38	5.1	1.5	3
    36	1.4	0.3	1
    28	3.9	1.2	2
    19	1.6	0.2	1
    30	5.1	2.4	3
    17	4.1	1.3	2
    34	5.4	2.3	3
    
    Updated dataset after performing scaling on RFE selected features :
    id	species	r_petal_length	r_petal_width
    19	1	0.07017543859649125	0.04166666666666667
    38	3	0.6842105263157894	0.5833333333333334
    12	3	0.6666666666666666	0.7916666666666666
    36	1	0.035087719298245605	0.08333333333333333
    22	2	0.49122807017543857	0.5
    35	1	0.08771929824561403	0.04166666666666667
    17	2	0.5087719298245613	0.5
    34	3	0.7368421052631579	0.9166666666666666
    15	1	0.052631578947368425	0.12500000000000003
    32	3	0.7894736842105263	1.0
    
    Updated dataset after performing scaling for PCA feature selection :
    id	species	sepal_length	sepal_width	petal_length	petal_width
    22	2	0.33333333333333326	0.055555555555555365	0.49122807017543857	0.5
    38	3	0.5757575757575756	0.33333333333333315	0.6842105263157894	0.5833333333333334
    12	3	0.3939393939393939	0.16666666666666657	0.6666666666666666	0.7916666666666666
    15	1	0.30303030303030304	0.6666666666666666	0.052631578947368425	0.12500000000000003
    19	1	0.2121212121212119	0.8888888888888887	0.07017543859649125	0.04166666666666667
    30	3	0.4242424242424241	0.33333333333333315	0.6842105263157894	0.9583333333333333
    36	1	0.12121212121212106	0.44444444444444436	0.035087719298245605	0.08333333333333333
    28	2	0.4242424242424241	0.2777777777777778	0.4736842105263158	0.4583333333333333
    17	2	0.3636363636363634	0.44444444444444436	0.5087719298245613	0.5
    34	3	0.5454545454545454	0.6666666666666666	0.7368421052631579	0.9166666666666666
    
    Updated dataset after performing PCA feature selection :
    	id	col_0	col_1	species
    0	26	-0.637053	-0.237890	1
    1	17	0.019809	0.057742	2
    2	22	0.074623	0.428188	2
    3	19	-0.688597	-0.282715	1
    4	38	0.298288	0.083267	3
    5	36	-0.635618	0.157913	1
    6	15	-0.561092	-0.117245	1
    7	20	0.219649	-0.117778	2
    8	34	0.451922	-0.235519	3
    9	35	-0.590568	-0.109948	1
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_2 
    Feature Selection Method : pca
    
     Prediction : 
       id  Prediction  Confidence_Lower  Confidence_upper  species
    0  22           2             0.875             0.875        2
    1  38           2             0.500             0.500        3
    2  36           1             1.000             1.000        1
    3  15           1             1.000             1.000        1
    4  34           3             0.750             0.750        3
    5  35           1             1.000             1.000        1
    6  20           2             0.625             0.625        2
    7  19           1             1.000             1.000        1
    8  17           2             0.625             0.625        2
    9  26           1             1.000             1.000        1
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision    Recall        F1  Support
    SeqNum                                                                                       
    0               1  CLASS_1        9        0        0   1.000000  1.000000  1.000000        9
    2               3  CLASS_3        0        0        9   1.000000  0.818182  0.900000       11
    1               2  CLASS_2        0       10        2   0.833333  1.000000  0.909091       10
    
     Confusion Matrix : 
    array([[ 9,  0,  0],
           [ 0, 10,  0],
           [ 0,  2,  9]], dtype=int64)
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	species
    10	1	1.0	1.0	1
    12	3	0.625	0.625	3
    13	3	0.75	0.75	3
    14	2	1.0	1.0	2
    16	2	0.625	0.625	2
    17	2	0.625	0.625	2
    15	1	1.0	1.0	1
    11	2	0.625	0.625	2
    9	1	1.0	1.0	1
    8	3	0.75	0.75	3
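The transformation log in step 7 shows predict() reapplying the scaling from the data preparation phase to new rows. The sketch below illustrates the general idea with min-max scaling, where the minimum and maximum are learned from training rows only and then reused. The log does not name the scaling method, so treating it as min-max is an assumption, and the training values and helper names here are hypothetical.

```python
def fit_min_max(values):
    """Learn the scaling range from training values only."""
    return min(values), max(values)

def apply_min_max(value, lo, hi):
    """Scale a value with a previously learned range; no clipping is applied."""
    return (value - lo) / (hi - lo)

# Hypothetical training values for one feature.
train_sepal_length = [4.4, 5.0, 5.8, 6.7, 7.7]
lo, hi = fit_min_max(train_sepal_length)

# New rows may contain values outside the training range.
new_values = [4.3, 5.0, 7.9]
scaled = [apply_min_max(v, lo, hi) for v in new_values]
print([round(s, 4) for s in scaled])
```

Because the range comes from the training rows, unseen rows can scale to slightly below 0 or above 1. The testing sample in the fit log contains such values, which is expected behavior rather than an error.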