Example 4: Run AutoML for Classification Problem using Early Stopping Timer and Customization - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
December 2024
ft:locale
en-US
ft:lastEdition
2025-01-23
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
lifecycle
latest
Product Category
Teradata Vantage

This example predicts whether a passenger aboard the RMS Titanic survived, based on various factors.

Run AutoML to get the best performing model using the following specifications:
  • Add customizations for specific processes in the AutoML run.
  • Use only two models, ‘xgboost’ and ‘decision_forest’, for AutoML training.
  • Set the early stopping timer to 300 seconds.
  • Opt for verbose level 2 to get detailed logs.
  1. Load the data and split it into training and testing datasets.
    1. Load the example data and create a teradataml DataFrame.
      >>> load_example_data("teradataml", "titanic")
      >>> titanic = DataFrame.from_table("titanic")
    2. Perform sampling to get 80% of rows for training and 20% for testing.
      >>> titanic_sample = titanic.sample(frac=[0.8, 0.2])
    3. Fetch the train and test data.
      >>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
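As an aside, the split logic above can be sketched in plain Python (a local illustration only — in teradataml the sampling runs in-database, and sample(frac=[0.8, 0.2]) assigns each row a sampleid of 1 or 2 in roughly the requested proportions):

```python
import random

random.seed(42)  # reproducible illustration
rows = [{"passenger": i, "survived": random.randint(0, 1)} for i in range(100)]

# Emulate sample(frac=[0.8, 0.2]): tag each row with sampleid 1 (~80%)
# or sampleid 2 (~20%).
for row in rows:
    row["sampleid"] = 1 if random.random() < 0.8 else 2

# Emulate the filter-and-drop step from the example above.
train = [{k: v for k, v in r.items() if k != "sampleid"}
         for r in rows if r["sampleid"] == 1]
test = [{k: v for k, v in r.items() if k != "sampleid"}
        for r in rows if r["sampleid"] == 2]
```

Because the assignment is probabilistic, the realized split is only approximately 80/20 — the same holds for the in-database sampling.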
  2. Add customizations and generate the custom config JSON file.
    >>> AutoML.generate_custom_config("custom_titanic")
    Generating custom config JSON for AutoML ...
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  1
    
    Customizing Feature Engineering Phase ...
    
    Available options for customization of feature engineering phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Missing Value Handling
    
    Index 2: Customize Bincode Encoding
    
    Index 3: Customize String Manipulation
    
    Index 4: Customize Categorical Encoding
    
    Index 5: Customize Mathematical Transformation
    
    Index 6: Customize Nonlinear Transformation
    
    Index 7: Customize Antiselect Features
    
    Index 8: Back to main menu
    
    Index 9: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in feature engineering phase:  1,2,4,6,7,8
    
    Customizing Missing Value Handling ...
    
    Provide the following details to customize missing value handling:
    
    Available missing value handling methods with corresponding indices: 
    Index 1: Drop Columns
    Index 2: Drop Rows
    Index 3: Impute Missing values
    
    Enter the list of indices for missing value handling methods :  1,3
    
    Enter the feature or list of features for dropping columns with missing values:  cabin
    
    Available missing value imputation methods with corresponding indices: 
    Index 1: Statistical Imputation
    Index 2: Literal Imputation
    
    Enter the list of corresponding index missing value imputation methods you want to use:  1
    
    Enter the feature or list of features for imputing missing values using statistic values:  age, embarked
    
    Available statistical methods with corresponding indices:
    Index 1: min
    Index 2: max
    Index 3: mean
    Index 4: median
    Index 5: mode
    
    Enter the index of corresponding statistic imputation method for feature age:  4
    
    Enter the index of corresponding statistic imputation method for feature embarked:  5
    
    Customization of missing value handling has been completed successfully.
    
    Customizing Bincode Encoding ...
    
    Provide the following details to customize binning and coding encoding:
    
    Available binning methods with corresponding indices:
    Index 1: Equal-Width
    Index 2: Variable-Width
    
    Enter the feature or list of features for binning:  pclass
    
    Enter the index of corresponding binning method for feature pclass:  2
    
    Enter the number of bins for feature pclass:  2
    
    Available value type of feature for variable binning with corresponding indices:
    Index 1: int
    Index 2: float
    
    Provide the range for bin 1 of feature pclass: 
    
    Enter the index of corresponding value type of feature pclass:  1
    
    Enter the minimum value for bin 1 of feature pclass:  0
    
    Enter the maximum value for bin 1 of feature pclass:  1
    
    Enter the label for bin 1 of feature pclass:  low
    
    Provide the range for bin 2 of feature pclass: 
    
    Enter the index of corresponding value type of feature pclass:  1
    
    Enter the minimum value for bin 2 of feature pclass:  2
    
    Enter the maximum value for bin 2 of feature pclass:  3
    
    Enter the label for bin 2 of feature pclass:  high
    
    Customization of bincode encoding has been completed successfully.
    
    Customizing Categorical Encoding ...
    
    Provide the following details to customize categorical encoding:
    
    Available categorical encoding methods with corresponding indices:
    Index 1: OneHotEncoding
    Index 2: OrdinalEncoding
    Index 3: TargetEncoding
    
    Enter the list of corresponding index categorical encoding methods you want to use:  2,3
    
    Enter the feature or list of features for OrdinalEncoding:  pclass
    
    Enter the feature or list of features for TargetEncoding:  embarked
    
    Available target encoding methods with corresponding indices:
    Index 1: CBM_BETA
    Index 2: CBM_DIRICHLET
    Index 3: CBM_GAUSSIAN_INVERSE_GAMMA
    
    Enter the index of target encoding method for feature embarked:  3
    
    Enter the response column for target encoding method for feature embarked:  survived
    
    Customization of categorical encoding has been completed successfully.
    
    Customizing Nonlinear Transformation ...
    
    Provide the following details to customize nonlinear transformation:
    
    Enter number of non-linear combination you want to make:  1
    
    Provide the details for non-linear combination 1:
    
    Enter the list of target feature/s for non-linear combination 1:  parch, sibsp
    
    Enter the formula for non-linear combination 1:  Y=(X0+X1+1)
    
    Enter the resultant feature for non-linear combination 1:  Family_count
    
    Customization of nonlinear transformation has been completed successfully.
    
    Customizing Antiselect Features ...
    
    Enter the feature or list of features for antiselect:  passenger
    
    Customization of antiselect features has been completed successfully.
    
    Customization of feature engineering phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  2
    
    Customizing Data Preparation Phase ...
    
    Available options for customization of data preparation phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Train Test Split
    
    Index 2: Customize Data Imbalance Handling
    
    Index 3: Customize Outlier Handling
    
    Index 4: Customize Feature Scaling
    
    Index 5: Back to main menu
    
    Index 6: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in data preparation phase:  1,2,3,4,5
    
    Customizing Train Test Split ...
    
    Enter the train size for train test split:  0.75
    
    Customization of train test split has been completed successfully.
    
    Customizing Data Imbalance Handling ...
    
    Available data sampling methods with corresponding indices:
    Index 1: SMOTE
    Index 2: NearMiss
    
    Enter the corresponding index data imbalance handling method:  1
    
    Customization of data imbalance handling has been completed successfully.
    
    Customizing Outlier Handling ...
    
    Available outlier detection methods with corresponding indices:
    Index 1: percentile
    Index 2: tukey
    Index 3: carling
    
    Enter the corresponding index oulier handling method:  1
    
    Enter the lower percentile value for outlier handling:  0.1
    
    Enter the upper percentile value for outlier handling:  0.9
    
    Enter the feature or list of features for outlier handling:  fare
    
    Available outlier replacement methods with corresponding indices:
    Index 1: delete
    Index 2: median
    Index 3: Any Numeric Value
    
    Enter the index of corresponding replacement method for feature fare:  2
    
    Customization of outlier handling has been completed successfully.
    
    Available feature scaling methods with corresponding indices:
    Index 1: maxabs
    Index 2: mean
    Index 3: midrange
    Index 4: range
    Index 5: rescale
    Index 6: std
    Index 7: sum
    Index 8: ustd
    
    Enter the corresponding index feature scaling method:  6
    
    Customization of feature scaling has been completed successfully.
    
    Customization of data preparation phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  3
    
    Customizing Model Training Phase ...
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  1,2
    
    Customizing Model Hyperparameter ...
    
    Available models for hyperparameter tuning with corresponding indices:
    Index 1: decision_forest
    Index 2: xgboost
    Index 3: knn
    Index 4: glm
    Index 5: svm
    
    Available hyperparamters update methods with corresponding indices:
    Index 1: ADD
    Index 2: REPLACE
    
    Enter the list of model indices for performing hyperparameter tuning:  2
    
    Available hyperparameters for model 'xgboost' with corresponding indices:
    Index 1: min_impurity
    Index 2: max_depth
    Index 3: min_node_size
    Index 4: shrinkage_factor
    Index 5: iter_num
    
    Enter the list of hyperparameter indices for model 'xgboost':  3
    
    Enter the index of corresponding update method for hyperparameters 'min_node_size' for model 'xgboost':  1
    
    Enter the list of value for hyperparameter 'min_node_size' for model 'xgboost':  1,5
    
    Customization of model hyperparameter has been completed successfully.
    
    Customization of model training phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  4
    
    Generating custom json and exiting ...
    
    Process of generating custom config file for AutoML has been completed successfully.
    
    'custom_titanic.json' file is generated successfully under the current working directory.
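The interactive wizard ultimately just writes a JSON file, so the same configuration can also be produced programmatically. The sketch below (standard json module only) covers a subset of the options chosen above — the full key set is defined by generate_custom_config() and is echoed back by fit() in the next step:

```python
import json

# Subset of the customization captured by the interactive session above;
# the authoritative schema is whatever generate_custom_config() emits.
custom_config = {
    "MissingValueHandlingIndicator": True,
    "MissingValueHandlingParam": {
        "DroppingColumnIndicator": True,
        "DroppingColumnList": ["cabin"],
        "ImputeMissingIndicator": True,
        "StatImputeList": ["age", "embarked"],
        "StatImputeMethod": ["median", "mode"],
    },
    "TrainTestSplitIndicator": True,
    "TrainingSize": 0.75,
    "DataImbalanceIndicator": True,
    "DataImbalanceMethod": "SMOTE",
    "FeatureScalingIndicator": True,
    "FeatureScalingMethod": "std",
}

# Write the file that AutoML reads via the custom_config_file argument.
with open("custom_titanic.json", "w") as f:
    json.dump(custom_config, f, indent=4)
```

Passing the resulting file via custom_config_file should yield the same customized run without answering the prompts.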
  3. Create an AutoML instance.
    >>> aml = AutoML(task_type="Classification",
    ...              include=['decision_forest', 'xgboost'],
    ...              verbose=2,
    ...              max_runtime_secs=300,
    ...              custom_config_file='custom_titanic.json')
  4. Fit the data.
    >>> aml.fit(titanic_train, titanic_train.survived)
     Received below input for customization : 
    {
        "MissingValueHandlingIndicator": true,
        "MissingValueHandlingParam": {
            "DroppingColumnIndicator": true,
            "DroppingColumnList": [
                "cabin"
            ],
            "ImputeMissingIndicator": true,
            "StatImputeList": [
                "age",
                "embarked"
            ],
            "StatImputeMethod": [
                "median",
                "mode"
            ]
        },
        "BincodeIndicator": true,
        "BincodeParam": {
            "pclass": {
                "Type": "Variable-Width",
                "NumOfBins": 2,
                "Bin_1": {
                    "min_value": 0,
                    "max_value": 1,
                    "label": "low"
                },
                "Bin_2": {
                    "min_value": 2,
                    "max_value": 3,
                    "label": "high"
                }
            }
        },
        "CategoricalEncodingIndicator": true,
        "CategoricalEncodingParam": {
            "OrdinalEncodingIndicator": true,
            "OrdinalEncodingList": [
                "pclass"
            ],
            "TargetEncodingIndicator": true,
            "TargetEncodingList": {
                "embarked": {
                    "encoder_method": "CBM_GAUSSIAN_INVERSE_GAMMA",
                    "response_column": "survived"
                }
            }
        },
        "NonLinearTransformationIndicator": true,
        "NonLinearTransformationParam": {
            "Combination_1": {
                "target_columns": [
                    "parch",
                    "sibsp"
                ],
                "formula": "Y=(X0+X1+1)",
                "result_column": "Family_count"
            }
        },
        "AntiselectIndicator": true,
        "AntiselectParam": [
            "passenger"
        ],
        "TrainTestSplitIndicator": true,
        "TrainingSize": 0.75,
        "DataImbalanceIndicator": true,
        "DataImbalanceMethod": "SMOTE",
        "OutlierFilterIndicator": true,
        "OutlierFilterMethod": "percentile",
        "OutlierLowerPercentile": 0.1,
        "OutlierUpperPercentile": 0.9,
        "OutlierFilterParam": {
            "fare": {
                "replacement_value": "median"
            }
        },
        "FeatureScalingIndicator": true,
        "FeatureScalingMethod": "std",
        "HyperparameterTuningIndicator": true,
        "HyperparameterTuningParam": {
            "xgboost": {
                "min_node_size": {
                    "Method": "ADD",
                    "Value": [
                        1,
                        5
                    ]
                }
            }
        }
    }
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    Data Overview:
    Total Rows in the data: 713
    Total Columns in the data: 12
    
    Column Summary:
    ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
    cabin	VARCHAR(20) CHARACTER SET LATIN	167	546	0	None	None	None	76.57784011220197	23.422159887798035
    pclass	INTEGER	713	0	None	0	713	0	0.0	100.0
    sex	VARCHAR(20) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
    sibsp	INTEGER	713	0	None	483	230	0	0.0	100.0
    parch	INTEGER	713	0	None	535	178	0	0.0	100.0
    passenger	INTEGER	713	0	None	0	713	0	0.0	100.0
    embarked	VARCHAR(20) CHARACTER SET LATIN	712	1	0	None	None	None	0.1402524544179523	99.85974754558205
    fare	FLOAT	713	0	None	12	701	0	0.0	100.0
    name	VARCHAR(1000) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
    survived	INTEGER	713	0	None	439	274	0	0.0	100.0
    age	INTEGER	570	143	None	7	563	0	20.05610098176718	79.94389901823281
    ticket	VARCHAR(20) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
    
    Statistics of Data:
    func	passenger	survived	pclass	age	sibsp	parch	fare
    50%	456	0	3	28	0	0	14.5
    count	713	713	713	570	713	713	713
    mean	452.764	0.384	2.293	29.335	0.53	0.394	33.635
    min	1	0	1	0	0	0	0
    max	891	1	3	71	8	5	512.329
    75%	673	1	3	38	1	0	31.275
    25%	235	0	2	20	0	0	7.925
    std	257.37	0.487	0.839	14.481	1.111	0.797	52.824
    
    Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    name                      713       
    sex                       2         
    ticket                    561       
    cabin                     130       
    embarked                  3         
    
    Futile columns in dataset:
    ColumnName
    ticket
    name
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0        age          20.757363
    1      sibsp           5.189341
    2       fare          14.165498
    3      parch          24.964937
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Feature Engineering started ...
    
    Handling duplicate records present in dataset ...
    Analysis completed. No action taken.                                                    
    
    Total time to handle duplicate records: 1.64 sec
    
    Handling less significant features from data ...
    
    Removing Futile columns:
    ['ticket', 'name']
    
    Sample of Data after removing Futile columns:
    passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
    80	1	3	female	30	0	0	12.475	None	S	12
    183	0	3	male	9	4	2	31.3875	None	S	8
    509	0	3	male	28	0	0	22.525	None	S	16
    305	0	3	male	None	0	0	8.05	None	S	13
    835	0	3	male	18	0	0	8.3	None	S	15
    162	1	2	female	40	0	0	15.75	None	S	23
    265	0	3	female	None	0	0	7.75	None	Q	9
    530	0	2	male	23	2	1	11.5	None	S	17
    61	0	3	male	22	0	0	7.2292	None	C	14
    652	1	2	female	18	0	1	23.0	None	S	22
    
    713 rows X 11 columns
    
    Total time to handle less significant features: 20.57 sec
    
    Handling Date Features ...
    Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    
    Total time to handle date features: 0.00 sec
    
    Dropping these columns for handling customized missing value:
    ['cabin']
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845516787654"'
    
    Updated dataset sample after performing customized missing value imputation:
    passenger	survived	pclass	sex	age	sibsp	parch	fare	embarked	id
    427	1	2	female	28	1	0	26.0	S	31
    671	1	2	female	40	1	1	39.0	S	47
    528	0	1	male	28	0	0	221.7792	S	55
    793	0	3	female	28	8	2	69.55	S	63
    833	0	3	male	28	0	0	7.2292	C	79
    282	0	3	male	28	0	0	7.8542	S	87
    589	0	3	male	22	0	0	8.05	S	71
    692	1	3	female	4	0	1	13.4167	C	39
    162	1	2	female	40	0	0	15.75	S	23
    835	0	3	male	18	0	0	8.3	S	15
    
    713 rows X 10 columns
    Proceeding with default option for handling remaining missing values.                    
    
    Checking Missing values in dataset ...
    Analysis Completed. No Missing Values Detected.                                          
    
    Total time to find missing values in data: 6.47 sec
    
    Imputing Missing Values ...
    Analysis completed. No imputation required.                                              
    
    Time taken to perform imputation: 0.01 sec
    No information provided for Equal-Width Transformation.                                  
    
    Variable-Width binning information:-
    ColumnName	MinValue	MaxValue	Label
    0	pclass	0	1	low
    1	pclass	2	3	high
    
    2 rows X 4 columns
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845626085490"'
    
    Updated dataset sample after performing Variable-Width binning:
    sibsp	parch	fare	age	id	survived	passenger	sex	embarked	pclass
    3	0	15.85	33	691	1	86	female	S	high
    3	1	25.4667	28	92	0	177	male	S	high
    3	1	21.075	3	220	0	375	female	S	high
    3	1	25.4667	28	700	0	410	female	S	high
    3	2	27.9	10	646	0	820	male	S	high
    3	2	27.9	9	718	0	635	female	S	high
    0	0	7.25	30	32	0	366	male	S	high
    0	0	8.05	16	56	1	221	male	S	high
    0	0	24.15	39	80	0	812	male	S	high
    0	0	8.05	43	88	0	669	male	S	high
    
    713 rows X 10 columns
    Skipping customized string manipulation.
    
    Starting Customized Categorical Feature Encoding ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845980398494"'
    
    Updated dataset sample after performing ordinal encoding:
    sibsp	parch	fare	age	id	survived	passenger	sex	embarked	pclass
    4	1	29.125	2	49	0	17	male	Q	0
    4	1	29.125	4	558	0	172	male	Q	0
    4	2	31.275	9	539	0	542	female	S	0
    4	2	31.275	6	40	0	814	female	S	0
    4	1	39.6875	14	429	0	687	male	S	0
    4	1	39.6875	1	302	0	165	male	S	0
    4	1	29.125	8	445	0	788	male	Q	0
    4	2	31.3875	5	180	1	234	female	S	0
    4	2	7.925	17	627	1	69	female	S	0
    4	1	39.6875	16	735	0	267	male	S	0
    
    713 rows X 10 columns
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713844678774235"'
    
    Updated dataset sample after performing target encoding:
    embarked	sibsp	parch	fare	age	id	survived	passenger	sex	pclass
    0.36363636363636365	1	0	15.5	28	398	0	365	male	0
    0.36363636363636365	1	0	24.15	28	178	1	110	female	0
    0.36363636363636365	1	0	24.15	28	457	0	769	male	0
    0.36363636363636365	1	1	15.5	40	585	0	189	male	0
    0.36363636363636365	2	0	23.25	28	420	1	302	male	0
    0.36363636363636365	2	0	90.0	44	742	0	246	male	1
    0.3300395256916996	1	1	29.0	22	67	1	324	female	0
    0.3300395256916996	1	5	31.3875	38	280	1	26	female	0
    0.3300395256916996	1	2	65.0	48	774	1	755	female	0
    0.3300395256916996	0	0	7.8958	22	160	0	522	male	0
    
    713 rows X 10 columns
    
    Performing encoding for categorical columns ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845375023505"'
    
    ONE HOT Encoding these Columns:
    ['sex']
    
    Sample of dataset after performing one hot encoding:
    embarked	sibsp	parch	fare	age	id	survived	passenger	sex_0	sex_1	pclass
    0.36363636363636365	1	0	24.15	28	457	0	769	0	1	0
    0.36363636363636365	2	0	23.25	28	420	1	302	0	1	0
    0.36363636363636365	2	0	90.0	44	742	0	246	0	1	1
    0.36363636363636365	0	0	7.7375	28	304	0	779	0	1	0
    0.36363636363636365	0	0	7.75	28	42	0	791	0	1	0
    0.36363636363636365	0	0	7.75	32	466	0	891	0	1	0
    0.3300395256916996	1	1	11.1333	1	197	1	173	1	0	0
    0.3300395256916996	0	0	6.2375	61	298	0	327	0	1	0
    0.3300395256916996	1	2	65.0	24	662	1	616	1	0	0
    0.3300395256916996	0	0	14.0	54	139	0	318	0	1	0
    
    713 rows X 11 columns
    
    Time taken to encode the columns: 13.91 sec
    
    Starting customized mathematical transformation ...
    Skipping customized mathematical transformation.                                         
    
    Starting customized non-linear transformation ...
    
    Possible combination :
    ['Combination_1']
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713844655147652"'
    
    Updated dataset sample after performing non-liner transformation:
    embarked	sibsp	parch	fare	age	id	survived	passenger	sex_0	sex_1	pclass	Family_count
    0.3300395256916996	0.0	0.0	0.0	28	106	0	278	0	1	0	1.0
    0.3300395256916996	0.0	0.0	50.4958	31	518	0	868	0	1	1	1.0
    0.3300395256916996	1.0	1.0	20.525	33	41	0	549	0	1	0	3.0
    0.3300395256916996	2.0	0.0	133.65	50	203	1	661	0	1	1	3.0
    0.3300395256916996	1.0	1.0	39.0	60	502	0	685	0	1	0	3.0
    0.3300395256916996	1.0	0.0	26.0	25	444	0	729	0	1	0	2.0
    0.3300395256916996	0.0	0.0	14.5	28	592	0	761	0	1	0	1.0
    0.3300395256916996	1.0	0.0	17.8	18	509	0	50	1	0	0	2.0
    0.3300395256916996	0.0	0.0	8.05	28	544	0	416	1	0	0	1.0
    0.3300395256916996	0.0	0.0	8.05	22	71	0	589	0	1	0	1.0
    
    713 rows X 12 columns
    
    Starting customized anti-select columns ...
    
    Updated dataset sample after performing anti-select columns:
    embarked	sibsp	parch	fare	age	id	survived	sex_0	sex_1	pclass	Family_count
    0.36363636363636365	2.0	0.0	90.0	44	742	0	0	1	1	3.0
    0.36363636363636365	0.0	0.0	7.75	28	42	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	32	466	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	28	530	1	1	0	0	1.0
    0.36363636363636365	0.0	0.0	7.75	30	131	0	1	0	0	1.0
    0.36363636363636365	0.0	0.0	7.8792	19	283	1	1	0	0	1.0
    0.3300395256916996	0.0	0.0	8.05	22	71	0	0	1	0	1.0
    0.3300395256916996	0.0	0.0	8.05	28	544	0	1	0	0	1.0
    0.3300395256916996	0.0	0.0	0.0	28	106	0	0	1	0	1.0
    0.3300395256916996	1.0	0.0	17.8	18	509	0	1	0	0	2.0
    
    713 rows X 11 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Data preparation started ...
    
    Spliting of dataset into training and testing ...
    Training size : 0.75                                                                      
    Testing size  : 0.25                                                                      
    
    Training data sample
    embarked	sibsp	parch	fare	age	id	survived	sex_0	sex_1	pclass	Family_count
    0.3300395256916996	0.0	0.0	8.05	28	13	0	0	1	0	1.0
    0.3300395256916996	0.0	0.0	22.525	28	16	0	0	1	0	1.0
    0.3300395256916996	2.0	1.0	11.5	23	17	0	0	1	0	4.0
    0.3300395256916996	4.0	2.0	31.275	2	18	0	1	0	0	7.0
    0.3300395256916996	0.0	0.0	13.0	36	20	0	0	1	0	1.0
    0.5763888888888888	0.0	0.0	18.7875	11	21	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	28	9	0	1	0	0	1.0
    0.36363636363636365	4.0	1.0	29.125	2	49	0	0	1	0	6.0
    0.36363636363636365	0.0	0.0	7.75	28	73	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	40	96	0	0	1	0	1.0
    
    534 rows X 11 columns
    
    Testing data sample
    embarked	sibsp	parch	fare	age	id	survived	sex_0	sex_1	pclass	Family_count
    0.5763888888888888	0.0	0.0	7.2292	22	14	0	0	1	0	1.0
    0.3300395256916996	0.0	0.0	7.25	30	32	0	0	1	0	1.0
    0.3300395256916996	0.0	0.0	13.0	24	34	0	1	0	0	1.0
    0.3300395256916996	0.0	0.0	26.55	34	35	1	0	1	1	1.0
    0.3300395256916996	0.0	0.0	30.5	27	57	1	0	1	1	1.0
    0.5763888888888888	1.0	0.0	24.0	28	62	1	1	0	0	2.0
    0.36363636363636365	1.0	0.0	15.5	28	28	1	1	0	0	2.0
    0.36363636363636365	0.0	0.0	7.75	28	42	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	28	81	1	1	0	0	1.0
    0.36363636363636365	0.0	5.0	29.125	39	196	0	1	0	0	6.0
    
    179 rows X 11 columns
    
    Time taken for spliting of data: 11.91 sec
    
    Starting customized outlier processing ...
    Columns with outlier percentage :-                                                                           
         ColumnName  OutlierPercentage
    0            id           9.817672
    1           age           9.256662
    2          fare           9.116410
    3  Family_count           2.805049
    4         parch           1.683029
    5         sibsp           3.225806
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713845708648720"'
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713853330820560"'
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846364308763"'
    
    Checking imbalance data ...
    
    Imbalance Not Found.
    
    Feature selection using lasso ...
    
    feature selected by lasso:
    ['sibsp', 'sex_1', 'fare', 'sex_0', 'Family_count', 'age', 'pclass']
    
    Total time taken by feature selection: 2.80 sec
    
    scaling Features of lasso data ...
    
    columns that will be scaled:
    ['sibsp', 'fare', 'Family_count', 'age', 'pclass']
    
    Training dataset sample after scaling:
    id	survived	sex_1	sex_0	sibsp	fare	Family_count	age	pclass
    163	0	1	0	-0.4733327294262123	-0.7081553190636275	-0.5661773649877848	-0.09018879949184125	-0.5759086932341287
    27	0	1	0	3.6567273606652484	0.91382440495327	3.5194809174916357	-2.2466433783863144	-0.5759086932341287
    53	1	0	1	-0.4733327294262123	-0.08005100856639845	0.01748810393784668	0.38902332915137505	-0.5759086932341287
    210	0	1	0	-0.4733327294262123	1.1201887036809417	-0.5661773649877848	1.9863970912954294	1.7363863608036514
    461	0	1	0	-0.4733327294262123	0.33579644478911136	0.01748810393784668	-0.09018879949184125	-0.5759086932341287
    43	0	0	1	0.3526792885920798	-0.28797473524415335	0.01748810393784668	0.1494172648297669	-0.5759086932341287
    63	0	0	1	6.134763414720124	1.8557188868034997	5.270477324268531	-0.09018879949184125	-0.5759086932341287
    25	0	0	1	-0.4733327294262123	-0.7081553190636275	-0.5661773649877848	-0.09018879949184125	-0.5759086932341287
    60	1	0	1	-0.4733327294262123	2.560580320241089	-0.5661773649877848	-1.0486130567782739	1.7363863608036514
    17	0	1	0	1.1786913066103721	-0.5582755799252347	1.1848190417891098	-0.48953224002785484	-0.5759086932341287
    
    534 rows X 9 columns
    
    Testing dataset sample after scaling:
    id	survived	sex_1	sex_0	sibsp	fare	Family_count	age	pclass
    57	1	1	0	-0.4733327294262123	0.23183458145023392	-0.5661773649877848	-0.17005748759904396	1.7363863608036514
    71	0	1	0	-0.4733327294262123	-0.7017429513328856	-0.5661773649877848	-0.5694009281350575	-0.5759086932341287
    67	1	0	1	0.3526792885920798	0.16945746344690746	0.6011535728634781	-0.5694009281350575	-0.5759086932341287
    79	0	1	0	-0.4733327294262123	-0.7358757103043059	-0.5661773649877848	-0.09018879949184125	-0.5759086932341287
    82	0	1	0	0.3526792885920798	1.171649826033686	0.01748810393784668	-0.8090069924566656	1.7363863608036514
    126	0	0	1	-0.4733327294262123	-0.713178756300162	-0.5661773649877848	-0.8888756805638683	-0.5759086932341287
    99	1	0	1	-0.4733327294262123	7.751915966067934	0.01748810393784668	-1.1284817448854765	1.7363863608036514
    62	1	0	1	0.3526792885920798	-0.03846626323084746	0.01748810393784668	-0.09018879949184125	-0.5759086932341287
    34	0	0	1	-0.4733327294262123	-0.4958984619219083	-0.5661773649877848	-0.4096635519206521	-0.5759086932341287
    32	0	1	0	-0.4733327294262123	-0.7350107476013265	-0.5661773649877848	0.06954857672256418	-0.5759086932341287
    
    179 rows X 9 columns
    
    Total time taken by feature scaling: 44.70 sec
    
    Feature selection using rfe ...
    
    feature selected by RFE:
    ['sex_1', 'sex_0', 'age', 'pclass', 'fare', 'Family_count']
    
    Total time taken by feature selection: 22.47 sec
    
    scaling Features of rfe data ...
    
    columns that will be scaled:
    ['r_age', 'r_pclass', 'r_fare', 'r_Family_count']
    
    Training dataset sample after scaling:
    r_sex_1	id	survived	r_sex_0	r_age	r_pclass	r_fare	r_Family_count
    1	163	0	0	-0.09018879949184125	-0.5759086932341287	-0.7081553190636275	-0.5661773649877848
    1	27	0	0	-2.2466433783863144	-0.5759086932341287	0.91382440495327	3.5194809174916357
    0	53	1	1	0.38902332915137505	-0.5759086932341287	-0.08005100856639845	0.01748810393784668
    1	210	0	0	1.9863970912954294	1.7363863608036514	1.1201887036809417	-0.5661773649877848
    1	461	0	0	-0.09018879949184125	-0.5759086932341287	0.33579644478911136	0.01748810393784668
    0	43	0	1	0.1494172648297669	-0.5759086932341287	-0.28797473524415335	0.01748810393784668
    0	63	0	1	-0.09018879949184125	-0.5759086932341287	1.8557188868034997	5.270477324268531
    0	25	0	1	-0.09018879949184125	-0.5759086932341287	-0.7081553190636275	-0.5661773649877848
    0	60	1	1	-1.0486130567782739	1.7363863608036514	2.560580320241089	-0.5661773649877848
    1	17	0	0	-0.48953224002785484	-0.5759086932341287	-0.5582755799252347	1.1848190417891098
    
    534 rows X 8 columns
    
    Testing dataset sample after scaling:
    r_sex_1	id	survived	r_sex_0	r_age	r_pclass	r_fare	r_Family_count
    1	57	1	0	-0.17005748759904396	1.7363863608036514	0.23183458145023392	-0.5661773649877848
    1	71	0	0	-0.5694009281350575	-0.5759086932341287	-0.7017429513328856	-0.5661773649877848
    0	67	1	1	-0.5694009281350575	-0.5759086932341287	0.16945746344690746	0.6011535728634781
    1	79	0	0	-0.09018879949184125	-0.5759086932341287	-0.7358757103043059	-0.5661773649877848
    1	82	0	0	-0.8090069924566656	1.7363863608036514	1.171649826033686	0.01748810393784668
    0	126	0	1	-0.8888756805638683	-0.5759086932341287	-0.713178756300162	-0.5661773649877848
    0	99	1	1	-1.1284817448854765	1.7363863608036514	7.751915966067934	0.01748810393784668
    0	62	1	1	-0.09018879949184125	-0.5759086932341287	-0.03846626323084746	0.01748810393784668
    0	34	0	1	-0.4096635519206521	-0.5759086932341287	-0.4958984619219083	-0.5661773649877848
    1	32	0	0	0.06954857672256418	-0.5759086932341287	-0.7350107476013265	-0.5661773649877848
    
    179 rows X 8 columns
    
    Total time taken by feature scaling: 42.68 sec
    
    scaling Features of pca data ...
    
    columns that will be scaled:
    ['embarked', 'sibsp', 'parch', 'fare', 'age', 'pclass', 'Family_count']
    
    Training dataset sample after scaling:
    id	survived	sex_1	sex_0	embarked	sibsp	parch	fare	age	pclass	Family_count
    17	0	1	0	-0.5232459472817322	1.1786913066103721	0.7672497196248651	-0.5582755799252346	-0.4895322400278547	-0.5759086932341301	1.1848190417891102
    31	1	0	1	-0.5232459472817322	0.3526792885920798	-0.50514577813811	0.04470322744025449	-0.09018879949184123	-0.5759086932341301	0.017488103937846687
    60	1	0	1	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	2.560580320241088	-1.0486130567782737	1.7363863608036554	-0.566177364987785
    65	0	1	0	-0.5232459472817322	0.3526792885920798	-0.50514577813811	-0.37738193771558787	-0.09018879949184123	-0.5759086932341301	0.017488103937846687
    25	0	0	1	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	-0.7081553190636273	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    163	0	1	0	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	-0.7081553190636273	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    73	0	1	0	-0.17262793722408595	-0.4733327294262123	-0.50514577813811	-0.7142183749335507	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    97	0	1	0	-0.17262793722408595	-0.4733327294262123	-0.50514577813811	-0.5229285463900163	2.226003155617037	-0.5759086932341301	-0.566177364987785
    131	0	0	1	-0.17262793722408595	-0.4733327294262123	-0.50514577813811	-0.7142183749335507	0.06954857672256418	-0.5759086932341301	-0.566177364987785
    133	1	0	1	-0.17262793722408595	-0.4733327294262123	-0.50514577813811	-0.7088456258361975	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    
    534 rows X 11 columns
    
    Testing dataset sample after scaling:
    id	survived	sex_1	sex_0	embarked	sibsp	parch	fare	age	pclass	Family_count
    14	0	1	0	2.0476663405184086	-0.4733327294262123	-0.50514577813811	-0.7358757103043057	-0.5694009281350575	-0.5759086932341301	-0.566177364987785
    32	0	1	0	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	-0.7350107476013262	0.06954857672256418	-0.5759086932341301	-0.566177364987785
    34	0	0	1	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	-0.4958984619219081	-0.40966355192065207	-0.5759086932341301	-0.566177364987785
    35	1	1	0	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	0.06757483737480756	0.389023329151375	1.7363863608036554	-0.566177364987785
    57	1	1	0	-0.5232459472817322	-0.4733327294262123	-0.50514577813811	0.23183458145023386	-0.17005748759904393	1.7363863608036554	-0.566177364987785
    62	1	0	1	2.0476663405184086	0.3526792885920798	-0.50514577813811	-0.03846626323084745	-0.09018879949184123	-0.5759086932341301	0.017488103937846687
    28	1	0	1	-0.17262793722408595	0.3526792885920798	-0.50514577813811	-0.3919365985830307	-0.09018879949184123	-0.5759086932341301	0.017488103937846687
    42	0	1	0	-0.17262793722408595	-0.4733327294262123	-0.50514577813811	-0.7142183749335507	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    81	1	0	1	-0.17262793722408595	-0.4733327294262123	-0.50514577813811	-0.7142183749335507	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    196	0	0	1	-0.17262793722408595	-0.4733327294262123	5.856831710676766	0.17465555661385126	0.7883667696873885	-0.5759086932341301	2.3521499796403735
    
    179 rows X 11 columns
    
    Total time taken by feature scaling: 42.52 sec
    
    Dimension Reduction using pca ...
    
    PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
    
    Total time taken by PCA: 11.87 sec
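The PCA step above reduces the scaled features to six components (`col_0` … `col_5`) in-database. As a rough standalone illustration of the same idea, here is a NumPy-only sketch (not teradataml's implementation; the data here is random stand-in data with the same shape as the training set):

```python
# Illustrative PCA via SVD -- teradataml performs this in-database;
# this NumPy version only demonstrates the dimension-reduction idea.
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)              # center each feature
    # Right singular vectors of the centered data are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T      # scores, shape (n_rows, n_components)

rng = np.random.default_rng(0)
X = rng.normal(size=(534, 9))            # stand-in for the 9 scaled features
scores = pca_reduce(X, 6)                # keep 6 components, as in the log
print(scores.shape)                      # (534, 6)
```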
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Model Training started ...
    
    Starting customized hyperparameter update ...
    
    Completed customized hyperparameter update.
    
    Hyperparameters used for model training:
    response_column : survived                                                                                   
    name : decision_forest
    tree_type : Classification
    min_impurity : (0.0, 0.1, 0.2)
    max_depth : (5, 6, 8, 10)
    min_node_size : (1, 2, 3)
    num_trees : (-1, 20, 30)
    Total number of models for decision_forest : 108
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1, 0.2)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.3)
    max_depth : (5, 6, 8, 10)
    min_node_size : (1, 2, 3, 5)
    iter_num : (10, 20, 30)
    Total number of models for xgboost : 3456
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
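The "Total number of models" figures above are simply the products of the hyperparameter grid sizes. A quick sketch reproducing both counts from the tuples in the log:

```python
# Each model count is the Cartesian-product size of its hyperparameter grid.
from math import prod

decision_forest = {
    "min_impurity": (0.0, 0.1, 0.2),
    "max_depth": (5, 6, 8, 10),
    "min_node_size": (1, 2, 3),
    "num_trees": (-1, 20, 30),
}
xgboost = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1, 0.2),
    "lambda1": (0.01, 0.1, 1, 10),
    "shrinkage_factor": (0.5, 0.1, 0.3),
    "max_depth": (5, 6, 8, 10),
    "min_node_size": (1, 2, 3, 5),
    "iter_num": (10, 20, 30),
}

def grid_size(grid):
    return prod(len(values) for values in grid.values())

print(grid_size(decision_forest))  # 108
print(grid_size(xgboost))          # 3456
```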
    
    
    Performing hyperParameter tuning ...
    
    decision_forest
    
    ----------------------------------------------------------------------------------------------------
    
    xgboost
    
    ----------------------------------------------------------------------------------------------------
    
    Evaluating models performance ...
    
    Evaluation completed.
    
    Leaderboard
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_2	pca	0.793296	0.793296	0.793296	0.793296	0.781918	0.788603	0.784583	0.796967	0.793296	0.794506
    1	2	DECISIONFOREST_3	lasso	0.793296	0.793296	0.793296	0.793296	0.785015	0.772398	0.777281	0.791188	0.793296	0.790961
    2	3	XGBOOST_0	lasso	0.782123	0.782123	0.782123	0.782123	0.774160	0.757905	0.763684	0.779693	0.782123	0.778804
    3	4	XGBOOST_3	lasso	0.782123	0.782123	0.782123	0.782123	0.774160	0.757905	0.763684	0.779693	0.782123	0.778804
    4	5	DECISIONFOREST_0	lasso	0.770950	0.770950	0.770950	0.770950	0.763252	0.743412	0.749838	0.768262	0.770950	0.766484
    5	6	DECISIONFOREST_2	pca	0.765363	0.765363	0.765363	0.765363	0.752372	0.752372	0.752372	0.765363	0.765363	0.765363
    6	7	XGBOOST_1	rfe	0.664804	0.664804	0.664804	0.664804	0.646978	0.605731	0.602222	0.654215	0.664804	0.638361
    7	8	DECISIONFOREST_1	rfe	0.664804	0.664804	0.664804	0.664804	0.653798	0.597628	0.588695	0.657792	0.664804	0.629221
    
    8 rows X 13 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 20/20
  5. Display the model leaderboard.
    >>> aml.leaderboard()
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_2	pca	0.793296	0.793296	0.793296	0.793296	0.781918	0.788603	0.784583	0.796967	0.793296	0.794506
    1	2	DECISIONFOREST_3	lasso	0.793296	0.793296	0.793296	0.793296	0.785015	0.772398	0.777281	0.791188	0.793296	0.790961
    2	3	XGBOOST_0	lasso	0.782123	0.782123	0.782123	0.782123	0.774160	0.757905	0.763684	0.779693	0.782123	0.778804
    3	4	XGBOOST_3	lasso	0.782123	0.782123	0.782123	0.782123	0.774160	0.757905	0.763684	0.779693	0.782123	0.778804
    4	5	DECISIONFOREST_0	lasso	0.770950	0.770950	0.770950	0.770950	0.763252	0.743412	0.749838	0.768262	0.770950	0.766484
    5	6	DECISIONFOREST_2	pca	0.765363	0.765363	0.765363	0.765363	0.752372	0.752372	0.752372	0.765363	0.765363	0.765363
    6	7	XGBOOST_1	rfe	0.664804	0.664804	0.664804	0.664804	0.646978	0.605731	0.602222	0.654215	0.664804	0.638361
    7	8	DECISIONFOREST_1	rfe	0.664804	0.664804	0.664804	0.664804	0.653798	0.597628	0.588695	0.657792	0.664804	0.629221
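Note that the Micro-Precision, Micro-Recall, and Micro-F1 columns always equal Accuracy here: in single-label classification, every misclassified row counts as exactly one false positive (for the predicted class) and one false negative (for the true class), so the micro-averaged metrics collapse to accuracy. A minimal check, using the confusion matrix reported for the top-ranked model on the validation set in step 7:

```python
# In single-label classification, micro precision == micro recall
# == micro F1 == accuracy, which is why the Micro-* leaderboard
# columns are identical to the Accuracy column.
def micro_metrics(confusion):
    """confusion[i][j] = rows of true class i predicted as class j."""
    tp = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return tp / total  # accuracy == all micro-averaged metrics

# Confusion matrix of XGBOOST_2 on the validation dataset (step 7).
print(round(micro_metrics([[89, 21], [16, 53]]), 6))  # 0.793296
```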
  6. Display the best performing model.
    >>> aml.leader()
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	XGBOOST_2	pca	0.793296	0.793296	0.793296	0.793296	0.781918	0.788603	0.784583	0.796967	0.793296	0.794506
  7. Generate predictions on the validation dataset using the best performing model.
    In the data preparation phase, AutoML creates the validation dataset by splitting the data provided during fitting into training and testing sets. Model training uses the training portion, while the testing portion serves as the validation dataset for model evaluation.
    >>> prediction = aml.predict()
    Following model is being used for generating prediction :
    Model ID : XGBOOST_2 
    Feature Selection Method : pca
    
     Prediction : 
       survived   id  Prediction  Confidence_Lower  Confidence_upper
    0         0   42           0             1.000             1.000
    1         1   81           1             1.000             1.000
    2         0   14           0             0.875             0.875
    3         0  196           0             0.500             0.500
    4         0  337           1             0.875             0.875
    5         0   32           0             0.875             0.875
    6         1   23           1             0.750             0.750
    7         0   11           0             0.750             0.750
    8         1   10           1             0.750             0.750
    9         1   28           1             0.625             0.625
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    0               0  CLASS_1       89       16   0.847619  0.809091  0.827907      110
    1               1  CLASS_2       21       53   0.716216  0.768116  0.741259       69
    
     ROC-AUC : 
    AUC	GINI
    0.7152832674571804	0.4305665349143608
    threshold_value	tpr	fpr
    0.04081632653061224	0.7681159420289855	0.19090909090909092
    0.08163265306122448	0.7681159420289855	0.19090909090909092
    0.1020408163265306	0.7681159420289855	0.19090909090909092
    0.12244897959183673	0.7681159420289855	0.19090909090909092
    0.16326530612244897	0.7681159420289855	0.19090909090909092
    0.18367346938775508	0.7681159420289855	0.19090909090909092
    0.14285714285714285	0.7681159420289855	0.19090909090909092
    0.061224489795918366	0.7681159420289855	0.19090909090909092
    0.02040816326530612	0.7681159420289855	0.19090909090909092
    0.0	1.0	1.0
    
     Confusion Matrix : 
    array([[89, 21],
           [16, 53]], dtype=int64)
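The reported metrics can be cross-checked by hand from the confusion matrix and AUC. A quick sketch (arithmetic only, not part of the teradataml API) using rows as true classes and columns as predicted classes:

```python
# GINI is 2*AUC - 1, and per-class precision/recall follow directly
# from the confusion matrix columns (predicted) and rows (true).
auc = 0.7152832674571804
gini = 2 * auc - 1                               # 0.4305665349143608

cm = [[89, 21],   # true class 0: 89 correct, 21 predicted as class 1
      [16, 53]]   # true class 1: 16 predicted as class 0, 53 correct
precision_0 = cm[0][0] / (cm[0][0] + cm[1][0])   # 89 / 105 ~= 0.847619
recall_0 = cm[0][0] / (cm[0][0] + cm[0][1])      # 89 / 110 ~= 0.809091
print(round(gini, 6), round(precision_0, 6), round(recall_0, 6))
```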
    >>> prediction.head()
    survived	id	Prediction	Confidence_Lower	Confidence_upper
    0	32	0	0.875	0.875
    0	380	0	1.0	1.0
    0	413	0	0.75	0.75
    0	556	0	1.0	1.0
    0	731	0	0.875	0.875
    0	79	0	1.0	1.0
    0	71	0	0.875	0.875
    0	355	0	0.75	0.75
    0	337	1	0.875	0.875
    0	14	0	0.875	0.875
  8. Generate predictions on the test dataset using the best performing model.
    >>> prediction = aml.predict(titanic_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping futile columns :
    passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
    469	0	3	male	None	0	0	7.725	None	Q	8
    570	1	3	male	32	0	0	7.8542	None	S	15
    223	0	3	male	51	0	0	8.05	None	S	23
    856	1	3	female	18	0	1	9.35	None	S	11
    326	1	1	female	36	0	0	135.6333	C32	C	13
    650	1	3	female	23	0	0	7.55	None	S	21
    734	0	2	male	23	0	0	13.0	None	S	14
    795	0	3	male	25	0	0	7.8958	None	S	22
    631	1	1	male	80	0	0	30.0	A23	S	10
    57	1	2	female	21	0	0	10.5	None	S	18
    
    Updated dataset after performing target column transformation :
    sibsp	cabin	parch	fare	age	id	passenger	sex	pclass	embarked	survived
    0	None	1	9.35	18	11	856	female	3	S	1
    0	B51 B53 B55	0	5.0	33	9	873	male	1	S	0
    0	C106	0	30.5	None	17	299	male	1	S	1
    0	None	0	8.05	21	12	38	male	3	S	0
    0	A23	0	30.0	80	10	631	male	1	S	1
    0	None	0	10.5	21	18	57	female	2	S	1
    0	C32	0	135.6333	36	13	326	female	1	C	1
    0	None	0	7.55	23	21	650	female	3	S	1
    0	None	0	13.0	23	14	734	male	2	S	0
    0	None	0	7.8958	25	22	795	male	3	S	0
    
    Updated dataset after dropping customized missing value containing columns :
    sibsp	parch	fare	age	id	passenger	sex	pclass	embarked	survived
    0	1	9.35	18	11	856	female	3	S	1
    0	0	30.0	80	10	631	male	1	S	1
    0	0	10.5	21	18	57	female	2	S	1
    0	0	13.0	23	14	734	male	2	S	0
    0	0	135.6333	36	13	326	female	1	C	1
    0	0	7.55	23	21	650	female	3	S	1
    0	0	8.05	21	12	38	male	3	S	0
    0	0	7.8542	48	20	772	male	3	S	0
    0	0	7.8542	32	15	570	male	3	S	1
    0	0	8.05	51	23	223	male	3	S	0
    
    Updated dataset after imputing customized missing value containing columns :
    sibsp	parch	fare	age	id	passenger	sex	pclass	embarked	survived
    4	2	31.275	4	116	851	male	3	S	0
    4	1	39.6875	7	63	51	male	3	S	0
    4	1	39.6875	2	62	825	male	3	S	0
    4	2	31.275	11	42	543	female	3	S	0
    0	0	8.05	44	49	697	male	3	S	0
    0	0	7.75	22	57	142	female	3	S	1
    0	0	7.75	65	65	281	male	3	Q	0
    0	0	7.8542	21	105	624	male	3	S	0
    0	0	8.05	55	113	153	male	3	S	0
    0	5	39.6875	41	121	639	female	3	S	0
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846972810707"'
    
    Updated dataset after performing customized variable width bin-code transformation :
    sibsp	parch	fare	age	id	survived	passenger	sex	embarked	pclass
    0	0	8.05	55	113	0	153	male	S	high
    0	0	26.2875	36	153	1	513	male	S	low
    0	0	7.2292	28	161	1	368	female	C	high
    0	0	13.0	23	14	0	734	male	S	high
    0	0	7.8958	24	54	0	295	male	S	high
    0	0	0.0	49	70	0	598	male	S	high
    4	1	39.6875	7	63	0	51	male	S	high
    4	2	31.275	11	42	0	543	female	S	high
    4	1	39.6875	2	62	0	825	male	S	high
    4	1	29.125	7	143	0	279	male	Q	high
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846171306458"'
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713850847022753"'
    
    Updated dataset after performing customized categorical encoding :
    embarked	sibsp	parch	fare	age	id	survived	passenger	sex	pclass
    0.36363636363636365	0	0	7.6292	28	47	0	503	female	0
    0.36363636363636365	1	0	90.0	33	48	1	413	female	1
    0.36363636363636365	1	0	15.5	28	154	1	187	female	0
    0.36363636363636365	2	0	23.25	28	84	1	331	female	0
    0.36363636363636365	0	0	7.75	65	65	0	281	male	0
    0.36363636363636365	0	0	7.725	28	8	0	469	male	0
    0.3300395256916996	0	0	7.8542	20	158	0	641	male	0
    0.3300395256916996	0	0	7.925	39	126	0	529	male	0
    0.3300395256916996	0	0	8.05	28	117	0	88	male	0
    0.3300395256916996	0	0	0.0	28	190	0	675	male	0
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846624114222"'
    
    Updated dataset after performing categorical encoding :
    embarked	sibsp	parch	fare	age	id	survived	passenger	sex_0	sex_1	pclass
    0.3300395256916996	0	0	56.4958	26	61	1	510	0	1	0
    0.3300395256916996	0	0	7.8958	24	54	0	295	0	1	0
    0.3300395256916996	0	0	10.5	21	18	1	57	1	0	0
    0.3300395256916996	0	0	10.1708	19	34	0	688	0	1	0
    0.3300395256916996	0	0	7.8	21	122	0	52	0	1	0
    0.3300395256916996	0	0	7.4958	36	120	0	664	0	1	0
    0.36363636363636365	0	0	7.6292	28	47	0	503	1	0	0
    0.36363636363636365	1	0	90.0	33	48	1	413	1	0	1
    0.36363636363636365	1	0	15.5	28	154	1	187	1	0	0
    0.36363636363636365	2	0	23.25	28	84	1	331	1	0	0
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713846909702579"'
    
    Updated dataset after performing customized non-linear transformation :
    embarked	sibsp	parch	fare	age	id	survived	passenger	sex_0	sex_1	pclass	Family_count
    0.3300395256916996	0.0	0.0	10.5	21	18	1	57	1	0	0	1.0
    0.3300395256916996	0.0	0.0	8.05	45	60	1	339	0	1	0	1.0
    0.3300395256916996	0.0	0.0	7.4958	36	120	0	664	0	1	0	1.0
    0.3300395256916996	0.0	1.0	9.35	18	11	1	856	1	0	0	2.0
    0.3300395256916996	0.0	0.0	26.2875	35	144	1	702	0	1	1	1.0
    0.3300395256916996	0.0	0.0	10.5	26	138	0	620	0	1	0	1.0
    0.36363636363636365	1.0	0.0	15.5	28	154	1	187	1	0	0	2.0
    0.36363636363636365	0.0	0.0	7.75	65	65	0	281	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.725	28	8	0	469	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	28	90	0	127	0	1	0	1.0
    
    Updated dataset after performing customized anti-selection :
    embarked	sibsp	parch	fare	age	id	survived	sex_0	sex_1	pclass	Family_count
    0.3300395256916996	0.0	0.0	10.5	21	18	1	1	0	0	1.0
    0.3300395256916996	0.0	0.0	8.05	45	60	1	0	1	0	1.0
    0.3300395256916996	0.0	0.0	7.4958	36	120	0	0	1	0	1.0
    0.3300395256916996	0.0	1.0	9.35	18	11	1	1	0	0	2.0
    0.3300395256916996	0.0	0.0	26.2875	35	144	1	0	1	1	1.0
    0.3300395256916996	0.0	0.0	10.5	26	138	0	0	1	0	1.0
    0.36363636363636365	1.0	0.0	15.5	28	154	1	1	0	0	2.0
    0.36363636363636365	0.0	0.0	7.75	65	65	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.725	28	8	0	0	1	0	1.0
    0.36363636363636365	0.0	0.0	7.75	28	90	0	0	1	0	1.0
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713849399082665"'
    
    Updated dataset after performing Lasso feature selection:
    id	sibsp	sex_1	fare	sex_0	Family_count	age	pclass	survived
    190	0.0	1	0.0	0	1.0	28	0	0
    28	1.0	1	20.2125	0	3.0	18	0	0
    26	0.0	1	8.05	0	1.0	30	0	0
    79	0.0	1	7.775	0	1.0	28	0	1
    98	0.0	1	13.0	0	1.0	34	0	1
    114	0.0	1	26.2875	0	1.0	42	1	1
    142	0.0	0	7.75	1	1.0	45	0	0
    16	0.0	0	7.55	1	1.0	28	0	0
    134	1.0	0	26.0	1	2.0	29	0	1
    110	0.0	0	55.0	1	2.0	28	1	1
    
    Updated dataset after performing scaling on Lasso selected features :
    id	survived	sex_1	sex_0	sibsp	fare	Family_count	age	pclass
    190	0	1	0	-0.4733327294262123	-1.036500151284071	-0.5661773649877848	-0.09018879949184125	-0.5759086932341287
    28	0	1	0	0.3526792885920798	-0.19596848618924687	0.6011535728634781	-0.8888756805638683	-0.5759086932341287
    26	0	1	0	-0.4733327294262123	-0.7017429513328856	-0.5661773649877848	0.06954857672256418	-0.5759086932341287
    79	1	1	0	-0.4733327294262123	-0.713178756300162	-0.5661773649877848	-0.09018879949184125	-0.5759086932341287
    98	1	1	0	-0.4733327294262123	-0.4958984619219083	-0.5661773649877848	0.38902332915137505	-0.5759086932341287
    114	1	1	0	-0.4733327294262123	0.056658841724225466	-0.5661773649877848	1.0279728340089966	1.7363863608036514
    142	0	0	1	-0.4733327294262123	-0.7142183749335509	-0.5661773649877848	1.2675788983306049	-0.5759086932341287
    16	0	0	1	-0.4733327294262123	-0.7225353240006611	-0.5661773649877848	-0.09018879949184125	-0.5759086932341287
    134	1	0	1	0.3526792885920798	0.044703227440254505	0.01748810393784668	-0.010320111384638534	-0.5759086932341287
    110	1	0	1	-0.4733327294262123	1.250660842171233	0.01748810393784668	-0.09018879949184125	1.7363863608036514
    
    Updated dataset after performing RFE feature selection:
    id	sex_1	sex_0	age	pclass	fare	Family_count	survived
    190	1	0	28	0	0.0	1.0	0
    28	1	0	18	0	20.2125	3.0	0
    26	1	0	30	0	8.05	1.0	0
    79	1	0	28	0	7.775	1.0	1
    98	1	0	34	0	13.0	1.0	1
    114	1	0	42	1	26.2875	1.0	1
    142	0	1	45	0	7.75	1.0	0
    16	0	1	28	0	7.55	1.0	0
    134	0	1	29	0	26.0	2.0	1
    110	0	1	28	1	55.0	2.0	1
    
    Updated dataset after performing scaling on RFE selected features :
    r_sex_1	id	survived	r_sex_0	r_age	r_pclass	r_fare	r_Family_count
    1	190	0	0	-0.09018879949184125	-0.5759086932341287	-1.036500151284071	-0.5661773649877848
    1	28	0	0	-0.8888756805638683	-0.5759086932341287	-0.19596848618924687	0.6011535728634781
    1	26	0	0	0.06954857672256418	-0.5759086932341287	-0.7017429513328856	-0.5661773649877848
    1	79	1	0	-0.09018879949184125	-0.5759086932341287	-0.713178756300162	-0.5661773649877848
    1	98	1	0	0.38902332915137505	-0.5759086932341287	-0.4958984619219083	-0.5661773649877848
    1	114	1	0	1.0279728340089966	1.7363863608036514	0.056658841724225466	-0.5661773649877848
    0	142	0	1	1.2675788983306049	-0.5759086932341287	-0.7142183749335509	-0.5661773649877848
    0	16	0	1	-0.09018879949184125	-0.5759086932341287	-0.7225353240006611	-0.5661773649877848
    0	134	1	1	-0.010320111384638534	-0.5759086932341287	0.044703227440254505	0.01748810393784668
    0	110	1	1	-0.09018879949184125	1.7363863608036514	1.250660842171233	0.01748810393784668
    
    Updated dataset after performing scaling for PCA feature selection :
    id	survived	sex_1	sex_0	embarked	sibsp	parch	fare	age	pclass	Family_count
    190	0	1	0	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	-1.0365001512840708	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    28	0	1	0	-0.5236584390582704	0.3526792885920798	0.7672497196248651	-0.1959684861892468	-0.8888756805638682	-0.5759086932341301	0.6011535728634785
    26	0	1	0	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	-0.7017429513328853	0.06954857672256418	-0.5759086932341301	-0.566177364987785
    79	1	1	0	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	-0.7131787563001618	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    98	1	1	0	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	-0.4958984619219081	0.389023329151375	-0.5759086932341301	-0.566177364987785
    114	1	1	0	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	0.05665884172422545	1.0279728340089966	1.7363863608036554	-0.566177364987785
    142	0	0	1	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	-0.7142183749335507	1.2675788983306047	-0.5759086932341301	-0.566177364987785
    16	0	0	1	-0.5236584390582704	-0.4733327294262123	-0.50514577813811	-0.7225353240006609	-0.09018879949184123	-0.5759086932341301	-0.566177364987785
    134	1	0	1	-0.5236584390582704	0.3526792885920798	-0.50514577813811	0.04470322744025449	-0.010320111384638533	-0.5759086932341301	0.017488103937846687
    110	1	0	1	-0.5236584390582704	-0.4733327294262123	0.7672497196248651	1.2506608421712326	-0.09018879949184123	1.7363863608036554	0.017488103937846687
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	survived
    0	34	-0.885025	-1.180737	0.100885	-0.635803	-0.007118	0.330957	0
    1	142	-1.149064	-0.329648	-0.857521	0.961232	0.258437	-1.077695	0
    2	120	-1.174492	-0.715732	-0.605833	0.365936	-0.214848	0.235200	0
    3	16	-0.891443	-0.859990	-0.138386	0.002481	0.458539	-0.989043	0
    4	190	-1.135244	-1.133515	-0.235449	0.025433	-0.140323	0.258624	0
    ...	...	...	...	...	...	...	...	...
    173	183	1.321608	-1.392658	0.622355	-0.251383	1.497978	1.361819	1
    174	138	-0.988370	-0.956751	-0.196295	-0.244715	-0.088857	0.295064	0
    175	60	-1.305914	-0.424775	-0.988488	0.866806	-0.319594	0.189375	1
    176	72	5.525947	-1.038043	-0.239991	-0.357263	-1.256634	0.570059	0
    177	61	-0.478894	0.088547	-0.394947	-0.932471	0.033191	0.408747	1
    
    178 rows × 8 columns
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_2 
    Feature Selection Method : pca
    
     Prediction : 
       survived   id  Prediction  Confidence_Lower  Confidence_upper
    0         0  120           0             1.000             1.000
    1         0  190           0             1.000             1.000
    2         1  134           1             0.625             0.625
    3         1  144           0             0.750             0.750
    4         0   28           0             0.750             0.750
    5         1  168           1             0.750             0.750
    6         1  110           1             1.000             1.000
    7         0   16           1             0.750             0.750
    8         0  142           1             0.750             0.750
    9         0   34           0             0.750             0.750
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    0               0  CLASS_1       99       25   0.798387  0.900000  0.846154      110
    1               1  CLASS_2       11       43   0.796296  0.632353  0.704918       68
    
     ROC-AUC : 
    AUC	GINI
    0.7345588235294118	0.46911764705882364
    threshold_value	tpr	fpr
    0.04081632653061224	0.6323529411764706	0.1
    0.08163265306122448	0.6323529411764706	0.1
    0.1020408163265306	0.6323529411764706	0.1
    0.12244897959183673	0.6323529411764706	0.1
    0.16326530612244897	0.6323529411764706	0.1
    0.18367346938775508	0.6323529411764706	0.1
    0.14285714285714285	0.6323529411764706	0.1
    0.061224489795918366	0.6323529411764706	0.1
    0.02040816326530612	0.6323529411764706	0.1
    0.0	1.0	1.0
    
     Confusion Matrix : 
    array([[99, 11],
           [25, 43]], dtype=int64)
    >>> prediction.head()
    survived	id	Prediction	Confidence_Lower	Confidence_upper
    0	28	0	0.75	0.75
    0	152	0	0.625	0.625
    0	103	0	1.0	1.0
    0	31	0	0.875	0.875
    0	43	0	0.875	0.875
    0	37	0	0.875	0.875
    0	127	1	0.875	0.875
    0	26	0	1.0	1.0
    0	190	0	1.0	1.0
    0	120	0	1.0	1.0