Example 3: Run AutoClassifier for Classification Problem using Early Stopping Timer

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

This example predicts whether a passenger aboard the RMS Titanic survived, based on different factors.

Run AutoClassifier to get the best-performing model out of the available models, with the following specifications:

  • Use all default models except ‘knn’.
  • Set the early stopping timer to 300 seconds.
  • Use verbose level 2 to get a detailed log.
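
To make the idea of an early stopping timer concrete, the sketch below shows one generic way a time budget can cut a model search short while keeping the best result found so far. It is not teradataml's implementation: the function `search_with_time_budget`, its parameters, the toy scoring function, and the fake clock are all hypothetical names introduced only for illustration.

```python
import time


def search_with_time_budget(candidates, evaluate, budget_secs, clock=time.monotonic):
    """Score candidates in order and stop once budget_secs have elapsed.

    Hypothetical helper for illustration; not part of teradataml.
    Returns (best_candidate, best_score, number_evaluated).
    """
    deadline = clock() + budget_secs
    best, best_score, evaluated = None, float("-inf"), 0
    for candidate in candidates:
        if clock() >= deadline:  # budget spent: keep the best result so far
            break
        score = evaluate(candidate)
        evaluated += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, evaluated


# Deterministic demo: a fake clock that advances 100 "seconds" per check.
ticks = iter([0, 0, 100, 200, 300, 400])
best, score, evaluated = search_with_time_budget(
    candidates=[1, 2, 3, 4, 5],
    evaluate=lambda depth: -abs(depth - 2),  # toy score, highest at depth 2
    budget_secs=300,
    clock=lambda: next(ticks),
)
print(best, score, evaluated)  # 2 0 3
```

Only three of the five candidates are scored before the 300-second budget runs out, so the result is the best of what was tried, not necessarily the best overall.
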
  1. Load the data and split it into training and testing datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> from teradataml import AutoClassifier, DataFrame, load_example_data
      >>> load_example_data("teradataml", "titanic")
      >>> titanic = DataFrame.from_table("titanic")
    2. Sample the data so that 80% of the rows are used for training and 20% for testing.
      >>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
    3. Extract the training and testing data using the 'sampleid' column.
      >>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
  2. Create an AutoClassifier instance.
    >>> aml = AutoClassifier(exclude='knn',
                             verbose=2,
                             max_runtime_secs=300)
  3. Fit the data, using 'survived' as the target column.
    >>> aml.fit(titanic_train, 'survived')
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    Data Overview:
    Total Rows in the data: 713
    Total Columns in the data: 12
    
    Column Summary:
    ColumnName    Datatype    NonNullCount    NullCount    BlankCount    ZeroCount    PositiveCount    NegativeCount    NullPercentage    NonNullPercentage
    name    VARCHAR(1000) CHARACTER SET LATIN    713    0    0    None    None    None    0.0    100.0
    ticket    VARCHAR(20) CHARACTER SET LATIN    713    0    0    None    None    None    0.0    100.0
    cabin    VARCHAR(20) CHARACTER SET LATIN    168    545    0    None    None    None    76.43758765778401    23.56241234221599
    survived    INTEGER    713    0    None    436    277    0    0.0    100.0
    passenger    INTEGER    713    0    None    0    713    0    0.0    100.0
    embarked    VARCHAR(20) CHARACTER SET LATIN    711    2    0    None    None    None    0.2805049088359046    99.71949509116409
    fare    FLOAT    713    0    None    14    699    0    0.0    100.0
    age    INTEGER    573    140    None    7    566    0    19.635343618513325    80.36465638148668
    pclass    INTEGER    713    0    None    0    713    0    0.0    100.0
    parch    INTEGER    713    0    None    541    172    0    0.0    100.0
    
    Statistics of Data:
    func    passenger    survived    pclass    age    sibsp    parch    fare
    min    1    0    1    0    0    0    0
    std    256.436    0.488    0.827    14.688    1.113    0.795    48.047
    25%    225    0    2    20    0    0    7.896
    50%    434    0    3    28    0    0    13.792
    75%    675    1    3    38    1    0    31
    max    890    1    3    80    8    6    512.329
    mean    444.849    0.388    2.327    29.524    0.523    0.381    31.806
    count    713    713    713    573    713    713    713
    
    Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    name                      713       
    sex                       2         
    ticket                    564       
    cabin                     123       
    embarked                  3         
    
    Futile columns in dataset:
    ColumnName
    name
    ticket
    
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
      ColumnName  OutlierPercentage
    0       fare          13.604488
    1      sibsp           5.049088
    2        age          20.476858
    3      parch          24.123422
                                                                                            
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                            
    Feature Engineering started ...
                                                                                            
    Handling duplicate records present in dataset ...
    Analysis complete. No action taken.                                                     
                                                                                            
    Total time to handle duplicate records: 1.87 sec
                                                                                            
    Handling less significant features from data ...
                                                                                            
    Removing Futile columns:
    ['ticket', 'name']
                                                                                            
    Sample of Data after removing Futile columns:
    passenger    survived    pclass    sex    age    sibsp    parch    fare    cabin    embarked    id
    591    0    3    male    35    0    0    7.125    None    S    11
    326    1    1    female    36    0    0    135.6333    C32    C    13
    732    0    3    male    11    0    0    18.7875    None    C    21
    469    0    3    male    None    0    0    7.725    None    Q    8
    631    1    1    male    80    0    0    30.0    A23    S    10
    200    0    2    female    24    0    0    13.0    None    S    18
    265    0    3    female    None    0    0    7.75    None    Q    9
    530    0    2    male    23    2    1    11.5    None    S    17
    80    1    3    female    30    0    0    12.475    None    S    12
    345    0    2    male    36    0    0    13.0    None    S    20
                                                                                            
    Total time to handle less significant features: 23.32 sec
                                                                                             
    Handling Date Features ...
    Dataset does not contain any feature related to dates.                                   
                                                                                             
    Total time to handle date features: 0.00 sec
                                                                                             
    Checking Missing values in dataset ...
                                                                                             
    Columns with their missing values:
    age: 140
    cabin: 545
    embarked: 2
                                                                                             
    Deleting rows of these columns for handling missing values:
    ['embarked']
                                                                                             
    Dropping these columns for handling missing values:
    ['cabin']
                                                                                             
    Total time to find missing values in data: 9.88 sec
                                                                                             
    Imputing Missing Values ...
                                                                                             
    Columns with their imputation method:
    age: mean
                                                                                             
    Sample of Data after Imputation:
    passenger    survived    pclass    sex    age    sibsp    parch    fare    embarked    id
    427    1    2    female    28    1    0    26.0    S    31
    692    1    3    female    4    0    1    13.4167    C    47
    753    0    3    male    33    0    0    9.5    S    55
    202    0    3    male    29    8    2    69.55    S    63
    528    0    1    male    29    0    0    221.7792    S    79
    793    0    3    female    29    8    2    69.55    S    87
    671    1    2    female    40    1    1    39.0    S    71
    223    0    3    male    51    0    0    8.05    S    39
    162    1    2    female    40    0    0    15.75    S    23
    835    0    3    male    18    0    0    8.3    S    15
                                                                                             
    Time taken to perform imputation: 18.76 sec
                                                                                             
    Performing encoding for categorical columns ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711363215070477"'18
                                                                                             
    ONE HOT Encoding these Columns:
    ['sex', 'embarked']
                                                                                             
    Time taken to encode the columns: 13.92 sec
                                                                                             
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                             
    Data preparation started ...
                                                                                             
    Spliting of dataset into training and testing ...
    Training size : 0.8                                                                      
    Testing size  : 0.2                                                                      
                                                                                             
    Training data sample
    passenger    survived    pclass    sex_0    sex_1    age    sibsp    parch    fare    embarked_0    embarked_1    embarked_2    id
    80    1    3    1    0    30    0    0    12.475    0    0    1    12
    326    1    1    1    0    36    0    0    135.6333    1    0    0    13
    732    0    3    0    1    11    0    0    18.7875    1    0    0    21
    835    0    3    0    1    18    0    0    8.3    0    0    1    15
    591    0    3    0    1    35    0    0    7.125    0    0    1    11
    387    0    3    0    1    1    5    2    46.9    0    0    1    19
    631    1    1    0    1    80    0    0    30.0    0    0    1    10
    200    0    2    1    0    24    0    0    13.0    0    0    1    18
    734    0    2    0    1    23    0    0    13.0    0    0    1    14
    61    0    3    0    1    22    0    0    7.2292    1    0    0    22
                                                                                             
    Testing data sample
    passenger    survived    pclass    sex_0    sex_1    age    sibsp    parch    fare    embarked_0    embarked_1    embarked_2    id
    101    0    3    1    0    28    0    0    7.8958    0    0    1    25
    856    1    3    1    0    18    0    1    9.35    0    0    1    27
    768    0    3    1    0    30    0    0    7.75    0    1    0    123
    385    0    3    0    1    29    0    0    7.8958    0    0    1    29
    509    0    3    0    1    28    0    0    22.525    0    0    1    24
    850    1    1    1    0    29    1    0    89.1042    1    0    0    120
    795    0    3    0    1    25    0    0    7.8958    0    0    1    30
    116    0    3    0    1    21    0    0    7.925    0    0    1    126
    427    1    2    1    0    28    1    0    26.0    0    0    1    31
    97    0    1    0    1    71    0    0    34.6542    1    0    0    127
                                                                                             
    Time taken for spliting of data: 10.93 sec
                                                                                             
    Outlier preprocessing ...
    Columns with outlier percentage :-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
      ColumnName  OutlierPercentage
    0      sibsp           5.063291
    1        age           7.594937
    2       fare          13.361463
    3      parch          24.191280
                                                                                             
    Deleting rows of these columns:
    ['sibsp', 'age']
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711358625917026"'18
                                                                                             
    median inplace of outliers:
    ['fare', 'parch']
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711358870160237"'18
                                                                                             
    Time Taken by Outlier processing: 55.26 sec
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711366773447320"'18
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711366579308682"'
                                                                                             
    Checking imbalance data ...
                                                                                             
    Imbalance Not Found.
                                                                                              
    Feature selection using lasso ...
                                                                                              
    feature selected by lasso:
    ['sibsp', 'age', 'fare', 'embarked_1', 'sex_0', 'pclass', 'embarked_0', 'passenger']
                                                                                              
    Total time taken by feature selection: 2.17 sec
                                                                                              
    scaling Features of lasso data ...
                                                                                              
    columns that will be scaled:
    ['sibsp', 'age', 'fare', 'pclass', 'passenger']
                                                                                              
    Training dataset sample after scaling:
    survived    id    sex_0    embarked_1    embarked_0    sibsp    age    fare    pclass    passenger
    1    152    1    0    1    0.5    0.47058823529411764    0.24312807017543858    0.5    0.9741282339707537
    1    232    1    0    0    1.0    0.0196078431372549    0.6842105263157895    0.5    0.6951631046119235
    0    88    0    0    0    0.0    0.43137254901960786    0.13421052631578947    1.0    0.0843644544431946
    1    248    1    0    0    0.5    0.6862745098039216    0.5506578947368421    1.0    0.0281214848143982
    1    288    1    0    0    0.0    0.49019607843137253    0.22807017543859648    0.5    0.7142857142857143
    0    160    0    0    1    0.0    0.6666666666666666    0.5210526315789473    0.0    0.30708661417322836
    0    128    0    0    1    0.0    0.5098039215686274    0.13852280701754385    1.0    0.47244094488188976
    0    72    0    0    0    0.0    0.7058823529411765    0.4236842105263158    1.0    0.9122609673790776
    0    40    0    0    0    0.0    0.5294117647058824    0.12719298245614036    1.0    0.41057367829021374
    1    104    1    0    1    0.0    0.803921568627451    0.22807017543859648    0.0    0.5883014623172104
                                                                                              
    Testing dataset sample after scaling:
    survived    id    sex_0    embarked_1    embarked_0    sibsp    age    fare    pclass    passenger
    1    704    0    0    1    0.0    0.5686274509803921    0.5350877192982456    0.0    0.7109111361079865
    1    27    1    0    0    0.0    0.29411764705882354    0.16403508771929823    1.0    0.9617547806524185
    0    480    0    0    0    0.0    0.6470588235294118    0.13150526315789474    1.0    0.7457817772778402
    1    187    1    0    0    0.5    0.37254901960784315    1.1684210526315788    0.0    0.16985376827896512
    1    371    0    0    0    0.0    0.5294117647058824    0.16666666666666666    1.0    0.3217097862767154
    0    552    0    0    0    0.5    0.6470588235294118    1.3833333333333333    0.0    0.8335208098987626
    0    536    1    0    0    0.0    0.5098039215686274    0.14122807017543862    1.0    0.4668166479190101
    0    448    0    0    0    1.0    0.5686274509803921    1.2894736842105263    0.5    0.7480314960629921
    0    208    0    0    0    0.0    1.1568627450980393    0.46578947368421053    0.0    0.28346456692913385
    1    240    1    0    0    0.0    0.6078431372549019    0.22807017543859648    0.5    0.6479190101237345
                                                                                              
    Total time taken by feature scaling: 51.96 sec
                                                                                              
    Feature selection using rfe ...
                                                                                              
    feature selected by RFE:
    ['sibsp', 'embarked_0', 'embarked_2', 'age', 'embarked_1', 'sex_1', 'parch', 'sex_0', 'pclass', 'passenger', 'fare']
                                                                                              
    Total time taken by feature selection: 17.26 sec
                                                                                              
    scaling Features of rfe data ...
                                                                                              
    columns that will be scaled:
    ['r_sibsp', 'r_age', 'r_pclass', 'r_passenger', 'r_fare']
                                                                                              
    Training dataset sample after scaling:
    survived    id    r_parch    r_embarked_0    r_sex_1    r_embarked_1    r_sex_0    r_embarked_2    r_sibsp    r_age    r_pclass    r_passenger    r_fare
    1    152    0    1    0    0    1    0    0.5    0.47058823529411764    0.5    0.9741282339707537    0.24312807017543858
    1    232    0    0    0    0    1    1    1.0    0.0196078431372549    0.5    0.6951631046119235    0.6842105263157895
    0    88    0    0    1    0    0    1    0.0    0.43137254901960786    1.0    0.0843644544431946    0.13421052631578947
    1    248    0    0    0    0    1    1    0.5    0.6862745098039216    1.0    0.0281214848143982    0.5506578947368421
    1    288    0    0    0    0    1    1    0.0    0.49019607843137253    0.5    0.7142857142857143    0.22807017543859648
    0    160    0    1    1    0    0    0    0.0    0.6666666666666666    0.0    0.30708661417322836    0.5210526315789473
    0    128    0    1    1    0    0    0    0.0    0.5098039215686274    1.0    0.47244094488188976    0.13852280701754385
    0    72    0    0    1    0    0    1    0.0    0.7058823529411765    1.0    0.9122609673790776    0.4236842105263158
    0    40    0    0    1    0    0    1    0.0    0.5294117647058824    1.0    0.41057367829021374    0.12719298245614036
    1    104    0    1    0    0    1    0    0.0    0.803921568627451    0.0    0.5883014623172104    0.22807017543859648
                                                                                              
    Testing dataset sample after scaling:
    survived    id    r_parch    r_embarked_0    r_sex_1    r_embarked_1    r_sex_0    r_embarked_2    r_sibsp    r_age    r_pclass    r_passenger    r_fare
    1    704    0    1    1    0    0    0    0.0    0.5686274509803921    0.0    0.7109111361079865    0.5350877192982456
    1    27    1    0    0    0    1    1    0.0    0.29411764705882354    1.0    0.9617547806524185    0.16403508771929823
    0    480    0    0    1    0    0    1    0.0    0.6470588235294118    1.0    0.7457817772778402    0.13150526315789474
    1    187    0    0    0    0    1    1    0.5    0.37254901960784315    0.0    0.16985376827896512    1.1684210526315788
    1    371    0    0    1    0    0    1    0.0    0.5294117647058824    1.0    0.3217097862767154    0.16666666666666666
    0    552    0    0    1    0    0    1    0.5    0.6470588235294118    0.0    0.8335208098987626    1.3833333333333333
    0    536    0    0    0    0    1    1    0.0    0.5098039215686274    1.0    0.4668166479190101    0.14122807017543862
    0    448    0    0    1    0    0    1    1.0    0.5686274509803921    0.5    0.7480314960629921    1.2894736842105263
    0    208    0    0    1    0    0    1    0.0    1.1568627450980393    0.0    0.28346456692913385    0.46578947368421053
    1    240    0    0    0    0    1    1    0.0    0.6078431372549019    0.5    0.6479190101237345    0.22807017543859648
                                                                                              
    Total time taken by feature scaling: 56.05 sec
                                                                                              
    scaling Features of pca data ...
                                                                                              
    columns that will be scaled:
    ['passenger', 'pclass', 'age', 'sibsp', 'fare']
                                                                                              
    Training dataset sample after scaling:
    survived    embarked_2    id    sex_0    embarked_1    sex_1    parch    embarked_0    passenger    pclass    age    sibsp    fare
    0    0    8    0    1    1    0    0    0.5264341957255343    1.0    0.5098039215686274    0.0    0.1355263157894737
    0    0    9    1    1    0    0    0    0.296962879640045    1.0    0.5098039215686274    0.0    0.13596491228070176
    0    1    17    0    0    1    0    0    0.595050618672666    0.5    0.39215686274509803    1.0    0.20175438596491227
    1    0    13    1    0    0    0    1    0.3655793025871766    0.0    0.6470588235294118    0.0    0.22807017543859648
    0    1    11    0    0    1    0    0    0.6636670416197975    1.0    0.6274509803921569    0.0    0.125
    1    1    35    0    0    1    0    0    0.8008998875140607    0.0    0.8823529411764706    0.5    0.9122807017543859
    1    1    12    1    0    0    0    0    0.08886389201349831    1.0    0.5294117647058824    0.0    0.218859649122807
    0    1    20    0    0    1    0    0    0.38695163104611924    0.5    0.6470588235294118    0.0    0.22807017543859648
    0    1    15    0    0    1    0    0    0.9381327334083239    1.0    0.29411764705882354    0.0    0.1456140350877193
    1    1    23    1    0    0    0    0    0.18110236220472442    0.5    0.7254901960784313    0.0    0.27631578947368424
                                                                                              
    Testing dataset sample after scaling:
    survived    embarked_2    id    sex_0    embarked_1    sex_1    parch    embarked_0    passenger    pclass    age    sibsp    fare
    1    1    27    1    0    0    1    0    0.9617547806524185    1.0    0.29411764705882354    0.0    0.16403508771929823
    1    1    26    1    0    0    0    0    0.06299212598425197    0.5    0.35294117647058826    0.0    0.18421052631578946
    0    1    122    0    0    1    0    0    0.6546681664791901    0.5    1.0    0.0    0.45614035087719296
    0    1    30    0    0    1    0    0    0.8931383577052868    1.0    0.43137254901960786    0.0    0.13852280701754385
    1    0    28    1    1    0    0    0    0.27109111361079863    1.0    0.5098039215686274    0.5    0.2719298245614035
    0    1    124    0    0    1    0    0    0.6782902137232846    1.0    0.803921568627451    0.0    0.14122807017543862
    0    1    29    0    0    1    0    0    0.43194600674915634    1.0    0.5098039215686274    0.0    0.13852280701754385
    0    1    125    0    0    1    0    0    0.10573678290213723    1.0    1.0980392156862746    0.0    0.12719298245614036
    1    1    31    1    0    0    0    0    0.47919010123734535    0.5    0.49019607843137253    0.5    0.45614035087719296
    0    0    127    0    0    1    0    1    0.10798650168728909    0.0    1.3333333333333333    0.0    0.6079684210526316
                                                                                              
    Total time taken by feature scaling: 48.81 sec
                                                                                              
    Dimension Reduction using pca ...
                                                                                              
    PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
                                                                                              
    Total time taken by PCA: 13.50 sec
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
                                                                                              
    Model Training started ...
                                                                                              
    Hyperparameters used for model training:
    response_column : survived                                                                                                                            
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1, 0.2)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.3)
    max_depth : (5, 6, 8, 10)
    min_node_size : (1, 2, 3)
    iter_num : (10, 20, 30)
    Total number of models for xgboost : 2592
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : svm
    model_type : Classification
    lambda1 : (0.001, 0.02, 0.1)
    alpha : (0.15, 0.85)
    tolerance : (0.001, 0.01)
    learning_rate : OPTIMAL
    initial_eta : (0.05, 0.1)
    momentum : (0.65, 0.8, 0.95)
    nesterov : True
    intercept : True
    iter_num_no_change : (5, 10, 50)
    local_sgd_iterations  : (10, 20)
    iter_max : (300, 200, 400)
    batch_size : (10, 50, 60, 80)
    Total number of models for svm : 5184
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : decision_forest
    tree_type : Classification
    min_impurity : (0.0, 0.1, 0.2)
    max_depth : (5, 6, 8, 10)
    min_node_size : (1, 2, 3)
    num_trees : (-1, 20, 30)
    Total number of models for decision_forest : 108
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : glm
    family : BINOMIAL
    lambda1 : (0.001, 0.02, 0.1)
    alpha : (0.15, 0.85)
    learning_rate : OPTIMAL
    initial_eta : (0.05, 0.1)
    momentum : (0.65, 0.8, 0.95)
    iter_num_no_change : (5, 10, 50)
    iter_max : (300, 200, 400)
    batch_size : (10, 50, 60, 80)
    Total number of models for glm : 1296
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
                                                                                              
    Performing hyperParameter tuning ...
                                                                                              
    xgboost
    XGBOOST_3                                                                                                                                                                                               
    XGBOOST_1                                                                                 
    XGBOOST_2                                                                                 
                                                                                              
    ----------------------------------------------------------------------------------------------------
                                                                                              
    svm
    SVM_3                                                                                                                                                                                                   
    SVM_1                                                                                     
    SVM_2                                                                                     
                                                                                              
    ----------------------------------------------------------------------------------------------------
                                                                                              
    decision_forest
    DECISIONFOREST_3                                                                                                                                                                                        
    DECISIONFOREST_1                                                                          
    DECISIONFOREST_2                                                                          
                                                                                              
    ----------------------------------------------------------------------------------------------------
                                                                                              
    glm
    GLM_3                                                                                                                                                                                                   
    GLM_1                                                                                     
    GLM_2                                                                                     
                                                                                              
    ----------------------------------------------------------------------------------------------------
                                                                                              
    Evaluating models performance ...
                                                                                              
    Evaluation completed.
                                                                                              
    Leaderboard
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    glm    lasso    0.790210    0.790210    0.790210    0.790210    0.779288    0.788636    0.782454    0.796451    0.790210    0.791933
    1    2    svm    pca    0.769231    0.769231    0.769231    0.769231    0.761318    0.740909    0.747444    0.766438    0.769231    0.764562
    2    3    xgboost    pca    0.769231    0.769231    0.769231    0.769231    0.787911    0.720455    0.730850    0.779692    0.769231    0.754305
    3    4    decision_forest    pca    0.755245    0.755245    0.755245    0.755245    0.750236    0.719318    0.726995    0.753037    0.755245    0.747261
    4    5    glm    rfe    0.741259    0.741259    0.741259    0.741259    0.755490    0.687500    0.694391    0.749545    0.741259    0.722010
    5    6    glm    pca    0.727273    0.727273    0.727273    0.727273    0.712233    0.713636    0.712896    0.728243    0.727273    0.727722
    6    7    svm    rfe    0.692308    0.692308    0.692308    0.692308    0.706685    0.715909    0.691084    0.738315    0.692308    0.695571
    7    8    svm    lasso    0.671329    0.671329    0.671329    0.671329    0.697538    0.702273    0.671071    0.732135    0.671329    0.673195
    8    9    decision_forest    lasso    0.615385    0.615385    0.615385    0.615385    0.307692    0.500000    0.380952    0.378698    0.615385    0.468864
    9    10    decision_forest    rfe    0.615385    0.615385    0.615385    0.615385    0.307692    0.500000    0.380952    0.378698    0.615385    0.468864
    10    11    xgboost    lasso    0.384615    0.384615    0.384615    0.384615    0.192308    0.500000    0.277778    0.147929    0.384615    0.213675
    11    12    xgboost    rfe    0.384615    0.384615    0.384615    0.384615    0.192308    0.500000    0.277778    0.147929    0.384615    0.213675
                                                                                              
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 18/18  
  4. Display model leaderboard.
    >>> aml.leaderboard()
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    glm    lasso    0.790210    0.790210    0.790210    0.790210    0.779288    0.788636    0.782454    0.796451    0.790210    0.791933
    1    2    svm    pca    0.769231    0.769231    0.769231    0.769231    0.761318    0.740909    0.747444    0.766438    0.769231    0.764562
    2    3    xgboost    pca    0.769231    0.769231    0.769231    0.769231    0.787911    0.720455    0.730850    0.779692    0.769231    0.754305
    3    4    decision_forest    pca    0.755245    0.755245    0.755245    0.755245    0.750236    0.719318    0.726995    0.753037    0.755245    0.747261
    4    5    glm    rfe    0.741259    0.741259    0.741259    0.741259    0.755490    0.687500    0.694391    0.749545    0.741259    0.722010
    5    6    glm    pca    0.727273    0.727273    0.727273    0.727273    0.712233    0.713636    0.712896    0.728243    0.727273    0.727722
    6    7    svm    rfe    0.692308    0.692308    0.692308    0.692308    0.706685    0.715909    0.691084    0.738315    0.692308    0.695571
    7    8    svm    lasso    0.671329    0.671329    0.671329    0.671329    0.697538    0.702273    0.671071    0.732135    0.671329    0.673195
    8    9    decision_forest    lasso    0.615385    0.615385    0.615385    0.615385    0.307692    0.500000    0.380952    0.378698    0.615385    0.468864
    9    10    decision_forest    rfe    0.615385    0.615385    0.615385    0.615385    0.307692    0.500000    0.380952    0.378698    0.615385    0.468864
    10    11    xgboost    lasso    0.384615    0.384615    0.384615    0.384615    0.192308    0.500000    0.277778    0.147929    0.384615    0.213675
    11    12    xgboost    rfe    0.384615    0.384615    0.384615    0.384615    0.192308    0.500000    0.277778    0.147929    0.384615    0.213675
  5. Display the best performing model.
    >>> aml.leader()
    
    Rank    Name    Feature selection    Accuracy    Micro-Precision    Micro-Recall    Micro-F1    Macro-Precision    Macro-Recall    Macro-F1    Weighted-Precision    Weighted-Recall    Weighted-F1
    0    1    glm    lasso    0.79021    0.79021    0.79021    0.79021    0.779288    0.788636    0.782454    0.796451    0.79021    0.791933
  6. Generate predictions on the validation dataset using the best-performing model.
    During the data preparation phase, AutoML splits the data passed to fit() into training and testing sets. Models are trained on the training set, and the testing set serves as the validation dataset for model evaluation. Calling predict() without arguments generates predictions on this validation dataset.
    >>> prediction = aml.predict()
    glm lasso
    
     Prediction : 
        id  prediction      prob  survived
    0  704         1.0  0.564033         1
    1   27         1.0  0.772116         1
    2  480         0.0  0.969111         0
    3  187         1.0  0.995204         1
    4  371         0.0  0.938191         1
    5  552         0.0  0.532455         0
    6  536         1.0  0.848277         0
    7  448         0.0  0.670397         0
    8  208         0.0  0.770221         0
    9  240         1.0  0.905335         1
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    0               0  CLASS_1       70       12   0.853659  0.795455  0.823529       88
    1               1  CLASS_2       18       43   0.704918  0.781818  0.741379       55
    
     ROC-AUC : 
    AUC    GINI
    0.7086776859504131    0.4173553719008263
    threshold_value    tpr    fpr
    0.04081632653061224    0.7818181818181819    0.20454545454545456
    0.08163265306122448    0.7818181818181819    0.20454545454545456
    0.1020408163265306    0.7818181818181819    0.20454545454545456
    0.12244897959183673    0.7818181818181819    0.20454545454545456
    0.16326530612244897    0.7818181818181819    0.20454545454545456
    0.18367346938775508    0.7818181818181819    0.20454545454545456
    0.14285714285714285    0.7818181818181819    0.20454545454545456
    0.061224489795918366    0.7818181818181819    0.20454545454545456
    0.02040816326530612    0.7818181818181819    0.20454545454545456
    0.0    1.0    1.0
    
     Confusion Matrix : 
    array([[70, 18],
           [12, 43]], dtype=int64)
           
    
    >>> prediction.head()
    
    id    prediction    prob    survived
    26    1.0    0.9603717932806239    1
    28    1.0    0.9579491624276063    1
    29    0.0    0.9479066260110912    0
    30    0.0    0.9703120438864229    0
    120    1.0    0.9981173034336402    1
    121    0.0    0.8992972414699342    0
    31    1.0    0.9539365867420146    1
    27    1.0    0.7721157810184932    1
    25    1.0    0.9023805477121637    0
    24    0.0    0.9369559895524328    0
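
The per-class figures under Performance Metrics, and the glm/lasso row of the leaderboard, can be recomputed from the validation confusion matrix printed above. The following is a minimal sketch in plain Python and is not part of teradataml. It assumes the matrix rows are actual classes and the columns are predicted classes, which is consistent with the Support column (88 and 55). The helper name class_metrics is illustrative.

```python
# Validation confusion matrix copied from the aml.predict() output above.
# Assumption: rows = actual class, columns = predicted class.
cm = [[70, 18],
      [12, 43]]

def class_metrics(cm, k):
    """Return (precision, recall, f1, support) for class k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)   # column total: rows predicted as k
    support = sum(cm[k])                      # row total: rows that are actually k
    precision = tp / predicted_k
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support

per_class = [class_metrics(cm, k) for k in range(len(cm))]
total = sum(m[3] for m in per_class)

# Overall accuracy, unweighted (macro) F1, and support-weighted F1.
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
macro_f1 = sum(m[2] for m in per_class) / len(per_class)
weighted_f1 = sum(m[2] * m[3] for m in per_class) / total

print(round(accuracy, 6), round(macro_f1, 6), round(weighted_f1, 6))
```

Under that row/column assumption, the three printed values agree with the Accuracy, Macro-F1, and Weighted-F1 columns of the glm/lasso leaderboard row, which suggests the leaderboard is scored on this same validation split.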
  7. Generate predictions on the test dataset using the best-performing model.
    >>> prediction = aml.predict(titanic_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping futile columns :
    passenger    survived    pclass    sex    age    sibsp    parch    fare    cabin    embarked    id
    814    0    3    female    6    4    2    31.275    None    S    8
    852    0    3    male    74    0    0    7.775    None    S    12
    198    0    3    male    42    0    1    8.4042    None    S    20
    17    0    3    male    2    4    1    29.125    None    Q    9
    570    1    3    male    32    0    0    7.8542    None    S    15
    282    0    3    male    28    0    0    7.8542    None    S    23
    122    0    3    male    None    0    0    8.05    None    S    11
    448    1    1    male    34    0    0    26.55    None    S    19
    305    0    3    male    None    0    0    8.05    None    S    13
    650    1    3    female    23    0    0    7.55    None    S    21
    
    Updated dataset after performing target column transformation :
    sibsp    id    cabin    age    fare    parch    embarked    pclass    passenger    sex    survived
    0    12    None    74    7.775    0    S    3    852    male    0
    0    13    None    None    8.05    0    S    3    305    male    0
    0    21    None    23    7.55    0    S    3    650    female    1
    1    10    None    14    11.2417    0    C    3    40    female    1
    4    9    None    2    29.125    1    Q    3    17    male    0
    0    17    None    27    30.5    0    S    1    608    male    1
    1    14    None    28    24.0    0    C    2    875    female    1
    1    22    None    42    52.0    0    S    1    36    male    0
    0    15    None    32    7.8542    0    S    3    570    male    1
    0    23    None    28    7.8542    0    S    3    282    male    0
    
    Updated dataset after dropping missing value containing columns : 
    sibsp    id    age    fare    parch    embarked    pclass    passenger    sex    survived
    1    14    28    24.0    0    C    2    875    female    1
    0    15    32    7.8542    0    S    3    570    male    1
    0    23    28    7.8542    0    S    3    282    male    0
    0    13    None    8.05    0    S    3    305    male    0
    4    8    6    31.275    2    S    3    814    female    0
    1    16    19    26.0    0    S    2    547    female    1
    0    11    None    8.05    0    S    3    122    male    0
    0    19    34    26.55    0    S    1    448    male    1
    4    9    2    29.125    1    Q    3    17    male    0
    0    17    27    30.5    0    S    1    608    male    1
    
    Updated dataset after imputing missing value containing columns :
    sibsp    id    age    fare    parch    embarked    pclass    passenger    sex    survived
    1    182    27    21.0    0    S    2    42    female    0
    1    118    29    26.0    0    S    2    134    female    1
    1    142    11    120.0    2    S    1    803    male    1
    1    150    48    39.6    0    C    1    557    female    1
    1    27    22    29.0    1    S    2    324    female    1
    1    35    25    7.925    0    S    3    730    female    0
    1    57    25    17.8    0    S    3    354    male    0
    1    61    54    78.2667    0    C    1    497    female    1
    1    109    60    79.2    1    C    1    588    male    1
    1    125    32    26.0    0    S    2    544    male    1
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711360050913278"'
    
    Updated dataset after performing categorical encoding :
    sibsp    id    age    fare    parch    embarked_0    embarked_1    embarked_2    pclass    passenger    sex_0    sex_1    survived
    1    142    11    120.0    2    0    0    1    1    803    0    1    1
    1    27    22    29.0    1    0    0    1    2    324    1    0    1
    1    47    48    76.7292    0    1    0    0    1    646    0    1    1
    1    118    29    26.0    0    0    0    1    2    134    1    0    1
    1    139    39    31.275    5    0    0    1    3    14    0    1    0
    1    179    30    24.0    0    1    0    0    2    309    0    1    0
    4    8    6    31.275    2    0    0    1    3    814    1    0    0
    4    123    3    31.3875    2    0    0    1    3    262    0    1    1
    4    133    8    29.125    1    0    1    0    3    788    0    1    0
    4    55    7    39.6875    1    0    0    1    3    51    0    1    0
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1711359376400501"'
    
    Updated dataset after performing Lasso feature selection:
    id    sibsp    age    fare    embarked_1    sex_0    pclass    embarked_0    passenger    survived
    118    1    29    26.0    0    1    2    0    134    1
    109    1    60    79.2    0    0    1    1    588    1
    125    1    32    26.0    0    0    2    0    544    1
    14    1    28    24.0    0    1    2    1    875    1
    10    1    14    11.2417    0    1    3    1    40    1
    26    1    4    23.0    0    1    2    0    751    1
    149    0    29    8.4583    1    0    3    0    6    0
    62    0    18    14.4542    0    1    3    1    703    0
    86    0    25    13.0    0    0    2    0    344    0
    102    0    30    10.5    0    0    2    0    220    0
    
    Updated dataset after performing scaling on Lasso selected features :
    survived    id    sex_0    embarked_1    embarked_0    sibsp    age    fare    pclass    passenger
    0    86    0    0    0    0.0    0.43137254901960786    0.22807017543859648    0.5    0.3858267716535433
    0    126    0    0    0    0.0    0.5490196078431373    0.8858912280701755    0.0    0.9752530933633295
    0    134    0    0    1    0.0    0.7254901960784313    0.4863298245614035    0.0    0.03374578177727784
    0    158    1    0    0    0.0    0.37254901960784315    0.17258771929824562    1.0    0.5331833520809899
    0    198    0    0    0    0.0    1.3137254901960784    0.18421052631578946    0.5    0.7559055118110236
    0    11    0    0    0    0.0    0.5098039215686274    0.14122807017543862    1.0    0.1361079865016873
    1    150    1    0    1    0.5    0.8823529411764706    0.6947368421052632    0.0    0.625421822272216
    1    47    0    0    1    0.5    0.8823529411764706    1.3461263157894738    0.0    0.7255343082114736
    1    118    1    0    0    0.5    0.5098039215686274    0.45614035087719296    0.5    0.14960629921259844
    1    162    1    1    0    0.5    0.5098039215686274    0.2719298245614035    1.0    0.2092238470191226
    
    Updated dataset after performing RFE feature selection:
    id    sibsp    embarked_0    embarked_2    age    embarked_1    sex_1    parch    sex_0    pclass    passenger    fare    survived
    118    1    0    1    29    0    0    0    1    2    134    26.0    1
    109    1    1    0    60    0    1    1    0    1    588    79.2    1
    125    1    0    1    32    0    1    0    0    2    544    26.0    1
    14    1    1    0    28    0    0    0    1    2    875    24.0    1
    10    1    1    0    14    0    0    0    1    3    40    11.2417    1
    26    1    0    1    4    0    0    1    1    2    751    23.0    1
    149    0    0    0    29    1    1    0    0    3    6    8.4583    0
    62    0    1    0    18    0    0    1    1    3    703    14.4542    0
    86    0    0    1    25    0    1    0    0    2    344    13.0    0
    102    0    0    1    30    0    1    0    0    2    220    10.5    0
    
    Updated dataset after performing scaling on RFE selected features :
    survived    id    r_parch    r_embarked_0    r_embarked_1    r_sex_1    r_embarked_2    r_sex_0    r_sibsp    r_age    r_pclass    r_passenger    r_fare
    1    118    0    0    0    0    1    1    0.5    0.5098039215686274    0.5    0.14960629921259844    0.45614035087719296
    1    109    1    1    0    1    0    0    0.5    1.1176470588235294    0.0    0.6602924634420697    1.3894736842105264
    1    125    0    0    0    1    1    0    0.5    0.5686274509803921    0.5    0.6107986501687289    0.45614035087719296
    1    14    0    1    0    0    0    1    0.5    0.49019607843137253    0.5    0.983127109111361    0.42105263157894735
    1    10    0    1    0    0    0    1    0.5    0.21568627450980393    1.0    0.043869516310461196    0.19722280701754386
    1    26    1    0    0    0    1    1    0.5    0.0196078431372549    0.5    0.843644544431946    0.40350877192982454
    0    149    0    0    1    1    0    0    0.0    0.5098039215686274    1.0    0.00562429696287964    0.14839122807017543
    0    62    1    1    0    0    0    1    0.0    0.29411764705882354    1.0    0.7896512935883014    0.25358245614035085
    0    86    0    0    0    1    1    0    0.0    0.43137254901960786    0.5    0.3858267716535433    0.22807017543859648
    0    102    0    0    0    1    1    0    0.0    0.5294117647058824    0.5    0.24634420697412823    0.18421052631578946
    
    Updated dataset after performing scaling for PCA feature selection :
    survived    embarked_2    id    embarked_1    parch    sex_1    sex_0    embarked_0    passenger    pclass    age    sibsp    fare
    1    1    118    0    0    0    1    0    0.14960629921259844    0.5    0.5098039215686274    0.5    0.45614035087719296
    1    0    109    0    1    1    0    1    0.6602924634420697    0.0    1.1176470588235294    0.5    1.3894736842105264
    1    1    125    0    0    1    0    0    0.6107986501687289    0.5    0.5686274509803921    0.5    0.45614035087719296
    1    0    14    0    0    0    1    1    0.983127109111361    0.5    0.49019607843137253    0.5    0.42105263157894735
    1    0    10    0    0    0    1    1    0.043869516310461196    1.0    0.21568627450980393    0.5    0.19722280701754386
    1    1    26    0    1    0    1    0    0.843644544431946    0.5    0.0196078431372549    0.5    0.40350877192982454
    0    0    149    1    0    1    0    0    0.00562429696287964    1.0    0.5098039215686274    0.0    0.14839122807017543
    0    0    62    0    1    0    1    1    0.7896512935883014    1.0    0.29411764705882354    0.0    0.25358245614035085
    0    1    86    0    0    1    0    0    0.3858267716535433    0.5    0.43137254901960786    0.0    0.22807017543859648
    0    1    102    0    0    1    0    0    0.24634420697412823    0.5    0.5294117647058824    0.0    0.18421052631578946
    
    Updated dataset after performing PCA feature selection :
    id    col_0    col_1    col_2    col_3    col_4    col_5    survived
    0    150    0.757581    0.807563    -0.847880    0.040376    -0.017386    0.240675    1
    1    149    0.248643    0.398625    0.483062    0.064297    0.386706    -0.275761    0
    2    47    0.163710    1.028822    -1.157471    0.160891    -0.049759    0.451629    1
    3    62    0.545024    1.099536    0.539294    0.163490    -0.384128    0.033723    0
    4    118    0.117377    -0.386966    -0.162140    0.082217    0.458866    0.196698    1
    ...    ...    ...    ...    ...    ...    ...    ...    ...
    173    67    -0.575225    -0.117044    -0.232419    0.084700    -0.027788    -0.168891    0
    174    110    0.485727    0.946989    0.229257    -0.466639    0.329940    -0.236554    0
    175    30    -0.159525    1.195071    0.030579    -0.422641    -0.167647    -0.079639    0
    176    141    0.087384    1.076244    -0.926916    0.057947    -0.179088    -0.133110    0
    177    77    -0.424272    -0.200842    -0.801291    0.343286    -0.035128    -0.070753    0
    178 rows × 8 columns
    
    Data Transformation completed.
    glm lasso
    
     Prediction : 
        id  prediction      prob  survived
    0   86         0.0  0.871985         0
    1  126         0.0  0.752665         0
    2  134         1.0  0.734018         0
    3  158         1.0  0.854871         0
    4  198         0.0  0.958001         0
    5   11         0.0  0.923132         0
    6  150         1.0  0.994359         1
    7   47         1.0  0.791938         1
    8  118         1.0  0.969970         1
    9  162         1.0  0.961282         1
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    0               0  CLASS_1       83       20   0.805825  0.734513  0.768519      113
    1               1  CLASS_2       30       45   0.600000  0.692308  0.642857       65
    
     ROC-AUC : 
    AUC    GINI
    0.6215112321307011    0.24302246426140228
    threshold_value    tpr    fpr
    0.04081632653061224    0.6923076923076923    0.26548672566371684
    0.08163265306122448    0.6923076923076923    0.26548672566371684
    0.1020408163265306    0.6923076923076923    0.26548672566371684
    0.12244897959183673    0.6923076923076923    0.26548672566371684
    0.16326530612244897    0.6923076923076923    0.26548672566371684
    0.18367346938775508    0.6923076923076923    0.26548672566371684
    0.14285714285714285    0.6923076923076923    0.26548672566371684
    0.061224489795918366    0.6923076923076923    0.26548672566371684
    0.02040816326530612    0.6923076923076923    0.26548672566371684
    0.0    1.0    1.0
    
     Confusion Matrix : 
    array([[83, 30],
           [20, 45]], dtype=int64)
           
    
    >>> prediction.head()
    id    prediction    prob    survived
    10    1.0    0.9860207773053312    1
    12    0.0    0.9861224826327603    0
    13    0.0    0.9411445593307792    0
    14    1.0    0.9784749370056496    1
    16    1.0    0.9511307192227907    1
    17    0.0    0.7636821536786946    1
    15    0.0    0.9620817432767261    1
    11    0.0    0.9231315492751757    0
    9    0.0    0.5281838121774518    0
    8    1.0    0.9229986368509839    0
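
As a cross-check on the two predict() runs, the sketch below recomputes accuracy from each confusion matrix and checks that the reported GINI values are consistent with the standard relation GINI = 2 * AUC - 1. It is plain Python that uses only the numbers printed above. The names runs and accuracy are illustrative and not part of teradataml.

```python
import math

# Numbers copied from the validation and test outputs above.
runs = {
    "validation": {"cm": [[70, 18], [12, 43]],
                   "auc": 0.7086776859504131, "gini": 0.4173553719008263},
    "test":       {"cm": [[83, 30], [20, 45]],
                   "auc": 0.6215112321307011, "gini": 0.24302246426140228},
}

def accuracy(cm):
    """Fraction of rows on the diagonal of a square confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

for name, run in runs.items():
    # Standard relation between the Gini coefficient and ROC AUC.
    gini_from_auc = 2 * run["auc"] - 1
    assert math.isclose(gini_from_auc, run["gini"], abs_tol=1e-12)
    print(f"{name}: accuracy={accuracy(run['cm']):.4f}, gini={gini_from_auc:.4f}")
```

For this run, accuracy is about 0.79 on the validation split (113 of 143 rows) and about 0.72 on the held-out test data (128 of 178 rows). Results will differ between runs because titanic.sample() draws a random split.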