Example 3: Run AutoClassifier for Classification Problem using Early Stopping Timer

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Product Category: Teradata Vantage

This example predicts whether a passenger aboard the RMS Titanic survived, based on various factors.

Run AutoClassifier to find the best performing model among the available models, with the following specifications:
  • Use all default models except 'knn'.
  • Set the early stopping timer to 300 seconds.
  • Use verbose level 2 to get detailed logs.
  1. Load data and split it into train and test datasets.
    1. Load the example data and create teradataml DataFrame.
      >>> load_example_data("teradataml", "titanic")
      >>> titanic = DataFrame.from_table("titanic")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
    3. Fetch train and test data.
      >>> titanic_train = titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
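The `sample(frac=[0.8, 0.2])` call tags each row with a `sampleid` of 1 or 2 in an approximate 80/20 ratio inside the database. As a rough illustration of the idea only (plain Python, not how teradataml implements in-database sampling):

```python
import random

def split_80_20(rows, seed=42):
    """Shuffle rows and split roughly 80% train / 20% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]

train, test = split_80_20(range(713))
print(len(train), len(test))  # 570 143
```

With 713 rows this sketch yields 570 train and 143 test rows; the in-database sampler draws proportions probabilistically, so the exact counts in your session can differ.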
  2. Create an AutoClassifier instance.
    >>> aml = AutoClassifier(exclude='knn',
                             verbose=2,
                             max_runtime_secs=300)
  3. Fit the data.
    >>> aml.fit(titanic_train, 'survived')
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    Data Overview:
    Total Rows in the data: 713
    Total Columns in the data: 12
    
    Column Summary:
    ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
    survived	INTEGER	713	0	None	444	269	0	0.0	100.0
    passenger	INTEGER	713	0	None	0	713	0	0.0	100.0
    embarked	VARCHAR(20) CHARACTER SET LATIN	712	1	0	None	None	None	0.1402524544179523	99.85974754558205
    fare	FLOAT	713	0	None	13	700	0	0.0	100.0
    sibsp	INTEGER	713	0	None	481	232	0	0.0	100.0
    name	VARCHAR(1000) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
    parch	INTEGER	713	0	None	535	178	0	0.0	100.0
    age	INTEGER	564	149	None	7	557	0	20.897615708274895	79.1023842917251
    sex	VARCHAR(20) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
    pclass	INTEGER	713	0	None	0	713	0	0.0	100.0
    cabin	VARCHAR(20) CHARACTER SET LATIN	159	554	0	None	None	None	77.69985974754559	22.30014025245442
    ticket	VARCHAR(20) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
    
    Statistics of Data:
    func	passenger	survived	pclass	age	sibsp	parch	fare
    min	1	0	1	0	0	0	0
    std	256.46	0.485	0.825	14.656	1.119	0.811	51.196
    25%	226	0	2	20	0	0	7.896
    50%	451	0	3	28	0	0	14.454
    75%	667	1	3	38	1	0	30.5
    max	891	1	3	80	8	6	512.329
    mean	446.952	0.377	2.325	29.246	0.54	0.393	31.973
    count	713	713	713	564	713	713	713
    
    Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    name                      713       
    sex                       2         
    ticket                    563       
    cabin                     124       
    embarked                  3         
    
    Futile columns in dataset:
    ColumnName
    name
    ticket
    
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0       fare          12.342216
    1      parch          24.964937
    2      sibsp           5.189341
    3        age          21.739130
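The outlier percentages reported here come from AutoML's internal analysis; the exact detection rule is not shown, but a common convention is Tukey's IQR fences. A plain-Python sketch under that assumption:

```python
def outlier_percentage(values, k=1.5):
    """Percent of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    n = len(s)

    def quantile(q):
        # linear-interpolation quantile on the sorted sample
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    out = sum(1 for v in values if v < lo_fence or v > hi_fence)
    return 100.0 * out / n

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]   # one extreme value
print(round(outlier_percentage(data), 1))  # 10.0
```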
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Feature Engineering started ...
    
    Handling duplicate records present in dataset ...
    Analysis completed. No action taken.                                                    
    
    Total time to handle duplicate records: 1.61 sec
    
    Handling less significant features from data ...
    
    Removing Futile columns:
    ['ticket', 'name']
    
    Sample of Data after removing Futile columns:
    passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
    61	0	3	male	22	0	0	7.2292	None	C	14
    469	0	3	male	None	0	0	7.725	None	Q	8
    183	0	3	male	9	4	2	31.3875	None	S	16
    80	1	3	female	30	0	0	12.475	None	S	12
    591	0	3	male	35	0	0	7.125	None	S	11
    387	0	3	male	1	5	2	46.9	None	S	19
    570	1	3	male	32	0	0	7.8542	None	S	15
    162	1	2	female	40	0	0	15.75	None	S	23
    40	1	3	female	14	1	0	11.2417	None	C	10
    631	1	1	male	80	0	0	30.0	A23	S	18
    
    713 rows X 11 columns
    
    Total time to handle less significant features: 21.47 sec
    
    Handling Date Features ...
    Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    
    Total time to handle date features: 0.02 sec
    
    Checking Missing values in dataset ...
    
    Columns with their missing values:
    age: 149
    cabin: 554
    embarked: 1
    
    Deleting rows of these columns for handling missing values:
    ['embarked']
    
    Sample of dataset after removing 1 rows:
    passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
    40	1	3	female	14	1	0	11.2417	None	C	10
    591	0	3	male	35	0	0	7.125	None	S	11
    387	0	3	male	1	5	2	46.9	None	S	19
    570	1	3	male	32	0	0	7.8542	None	S	15
    61	0	3	male	22	0	0	7.2292	None	C	14
    652	1	2	female	18	0	1	23.0	None	S	22
    469	0	3	male	None	0	0	7.725	None	Q	8
    183	0	3	male	9	4	2	31.3875	None	S	16
    80	1	3	female	30	0	0	12.475	None	S	12
    345	0	2	male	36	0	0	13.0	None	S	20
    
    712 rows X 11 columns
    
    Dropping these columns for handling missing values:
    ['cabin']
    
    Sample of dataset after removing 1 columns:
    passenger	survived	pclass	sex	age	sibsp	parch	fare	embarked	id
    469	0	3	male	None	0	0	7.725	Q	8
    80	1	3	female	30	0	0	12.475	S	12
    345	0	2	male	36	0	0	13.0	S	20
    61	0	3	male	22	0	0	7.2292	C	14
    305	0	3	male	None	0	0	8.05	S	13
    446	1	1	male	4	0	2	81.8583	S	21
    570	1	3	male	32	0	0	7.8542	S	15
    162	1	2	female	40	0	0	15.75	S	23
    591	0	3	male	35	0	0	7.125	S	11
    387	0	3	male	1	5	2	46.9	S	19
    
    712 rows X 10 columns
    
    Total time to find missing values in data: 17.32 sec
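Note the three different treatments: rows are deleted for `embarked` (a single missing value), the `cabin` column is dropped (≈78% missing), and `age` (≈21% missing) is kept for imputation in the next step. The thresholds below are illustrative assumptions, not AutoML's actual cutoffs:

```python
def missing_value_plan(null_pct, row_threshold=0.5, col_threshold=50.0):
    """Pick an action per column from its percentage of missing values.

    Thresholds are illustrative, not AutoML's internals:
    - below row_threshold %: cheap to drop the affected rows
    - above col_threshold %: too little signal left, drop the column
    - otherwise: impute
    """
    plan = {}
    for col, pct in null_pct.items():
        if pct == 0:
            continue
        elif pct < row_threshold:
            plan[col] = "drop rows"
        elif pct > col_threshold:
            plan[col] = "drop column"
        else:
            plan[col] = "impute"
    return plan

# Percentages from the column summary above
print(missing_value_plan({"embarked": 0.14, "age": 20.90, "cabin": 77.70}))
```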
    
    Imputing Missing Values ...
    
    Columns with their imputation method:
    age: mean
    
    Sample of dataset after Imputation:
    passenger	survived	pclass	sex	age	sibsp	parch	fare	embarked	id
    711	1	1	female	24	0	0	49.5042	C	29
    709	1	1	female	22	0	0	151.55	S	45
    484	1	3	female	63	0	0	9.5875	S	53
    545	0	1	male	50	1	0	106.425	C	61
    667	0	2	male	25	0	0	13.0	S	77
    463	0	1	male	47	0	0	38.5	S	85
    402	0	3	male	26	0	0	8.05	S	69
    444	1	2	female	28	0	0	13.0	S	37
    446	1	1	male	4	0	2	81.8583	S	21
    305	0	3	male	29	0	0	8.05	S	13
    
    712 rows X 10 columns
    
    Time taken to perform imputation: 16.40 sec
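Mean imputation replaces each missing `age` with the average of the observed ages. A minimal sketch:

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [22, None, 9, 30, 35, None]
print(impute_mean(ages))  # [22, 24.0, 9, 30, 35, 24.0]
```

In the sample above the imputed rows show `age` 29 rather than 29.246 because `age` is an INTEGER column, so the mean (≈29.25) is stored as an integer.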
    
    Performing encoding for categorical columns ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713847448878735"'
    
    ONE HOT Encoding these Columns:
    ['sex', 'embarked']
    
    Sample of dataset after performing one hot encoding:
    passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
    774	0	3	0	1	29	0	0	7.225	1	0	0	24
    814	0	3	1	0	6	4	2	31.275	0	0	1	40
    364	0	3	0	1	35	0	0	7.05	0	0	1	48
    221	1	3	0	1	16	0	0	8.05	0	0	1	56
    812	0	3	0	1	39	0	0	24.15	0	0	1	72
    669	0	3	0	1	43	0	0	8.05	0	0	1	80
    547	1	2	1	0	19	1	0	26.0	0	0	1	64
    366	0	3	0	1	30	0	0	7.25	0	0	1	32
    183	0	3	0	1	9	4	2	31.3875	0	0	1	16
    469	0	3	0	1	29	0	0	7.725	0	1	0	8
    
    712 rows X 13 columns
    
    Time taken to encode the columns: 14.11 sec
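One-hot encoding expands each categorical column into one indicator column per distinct value (`sex` becomes `sex_0`/`sex_1`; `embarked` becomes `embarked_0`..`embarked_2`). A minimal sketch — the actual value-to-index mapping is internal to the in-database function and is assumed here to follow sorted order:

```python
def one_hot(values):
    """Map each distinct value to a 0/1 indicator column."""
    categories = sorted(set(values))
    return {
        f"col_{i}": [1 if v == cat else 0 for v in values]
        for i, cat in enumerate(categories)
    }

print(one_hot(["male", "female", "male"]))
# {'col_0': [0, 1, 0], 'col_1': [1, 0, 1]}
```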
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Data preparation started ...
    
    Splitting of dataset into training and testing ...
    Training size : 0.8                                                                      
    Testing size  : 0.2                                                                      
    
    Training data sample
    passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
    40	1	3	1	0	14	1	0	11.2417	1	0	0	10
    591	0	3	0	1	35	0	0	7.125	0	0	1	11
    387	0	3	0	1	1	5	2	46.9	0	0	1	19
    80	1	3	1	0	30	0	0	12.475	0	0	1	12
    530	0	2	0	1	23	2	1	11.5	0	0	1	9
    101	0	3	1	0	28	0	0	7.8958	0	0	1	17
    305	0	3	0	1	29	0	0	8.05	0	0	1	13
    446	1	1	0	1	4	0	2	81.8583	0	0	1	21
    570	1	3	0	1	32	0	0	7.8542	0	0	1	15
    162	1	2	1	0	40	0	0	15.75	0	0	1	23
    
    569 rows X 13 columns
    
    Testing data sample
    passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
    774	0	3	0	1	29	0	0	7.225	1	0	0	24
    38	0	3	0	1	21	0	0	8.05	0	0	1	28
    339	1	3	0	1	45	0	0	8.05	0	0	1	124
    244	0	3	0	1	22	0	0	7.125	0	0	1	30
    711	1	1	1	0	24	0	0	49.5042	1	0	0	29
    194	1	2	0	1	3	1	1	26.0	0	0	1	125
    427	1	2	1	0	28	1	0	26.0	0	0	1	31
    97	0	1	0	1	71	0	0	34.6542	1	0	0	127
    448	1	1	0	1	34	0	0	26.55	0	0	1	27
    137	1	1	1	0	19	0	2	26.2833	0	0	1	123
    
    143 rows X 13 columns
    
    Time taken for splitting of data: 11.05 sec
    
    Outlier preprocessing ...
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0       fare          12.219101
    1        age           7.162921
    2      sibsp           5.196629
    3      parch          25.000000
    
    Deleting rows of these columns:
    ['sibsp', 'age']
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713849531417344"'
    
    Sample of training dataset after removing outlier rows:
    passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
    141	0	3	1	0	29	0	2	15.2458	1	0	0	46
    406	0	2	0	1	34	1	0	21.0	0	0	1	62
    875	1	2	1	0	28	1	0	24.0	1	0	0	70
    467	0	2	0	1	29	0	0	0.0	0	0	1	78
    343	0	2	0	1	28	0	0	13.0	0	0	1	110
    36	0	1	0	1	42	1	0	52.0	0	0	1	118
    629	0	3	0	1	26	0	0	7.8958	0	0	1	102
    610	1	1	1	0	40	0	0	153.4625	0	0	1	54
    652	1	2	1	0	18	0	1	23.0	0	0	1	22
    61	0	3	0	1	22	0	0	7.2292	1	0	0	14
    
    500 rows X 13 columns
    
    median inplace of outliers:
    ['fare', 'parch']
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713843547408813"'
    
    Sample of training dataset after performing MEDIAN inplace:
    passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
    141	0	3	1	0	29	0	0	15.2458	1	0	0	46
    406	0	2	0	1	34	1	0	21.0	0	0	1	62
    875	1	2	1	0	28	1	0	24.0	1	0	0	70
    467	0	2	0	1	29	0	0	0.0	0	0	1	78
    343	0	2	0	1	28	0	0	13.0	0	0	1	110
    36	0	1	0	1	42	1	0	52.0	0	0	1	118
    629	0	3	0	1	26	0	0	7.8958	0	0	1	102
    610	1	1	1	0	40	0	0	13.0	0	0	1	54
    652	1	2	1	0	18	0	0	23.0	0	0	1	22
    61	0	3	0	1	22	0	0	7.2292	1	0	0	14
    
    500 rows X 13 columns
    
    Time Taken by Outlier processing: 55.03 sec
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713843671280482"'
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713843382441478"'
    
    Checking imbalance data ...
    
    Imbalance Not Found.
    
    Feature selection using lasso ...
    
    feature selected by lasso:
    ['embarked_2', 'sex_0', 'sibsp', 'embarked_0', 'age', 'sex_1', 'pclass', 'embarked_1', 'passenger', 'fare']
    
    Total time taken by feature selection: 2.92 sec
    
    scaling Features of lasso data ...
    
    columns that will be scaled:
    ['sibsp', 'age', 'pclass', 'passenger', 'fare']
    
    Training dataset sample after scaling:
    embarked_2	id	sex_0	embarked_0	survived	sex_1	embarked_1	sibsp	age	pclass	passenger	fare
    1	59	1	0	1	0	0	0.5	0.37254901960784315	0.5	0.36292134831460676	0.5087719298245614
    1	67	1	0	1	0	0	0.0	0.2549019607843137	0.0	0.9584269662921349	0.6912280701754385
    0	218	0	1	0	1	0	0.0	0.5098039215686274	0.0	0.6258426966292134	0.22807017543859648
    1	75	0	0	0	1	0	0.0	0.3137254901960784	1.0	0.3393258426966292	0.0
    1	91	1	0	1	0	0	0.0	0.23529411764705882	0.0	0.7741573033707865	0.22807017543859648
    0	338	0	1	0	1	0	0.0	0.5294117647058824	1.0	0.27415730337078653	0.1267543859649123
    0	274	0	0	0	1	1	0.0	0.5098039215686274	1.0	0.14157303370786517	0.13596491228070176
    0	138	0	1	0	1	0	0.0	0.6274509803921569	1.0	0.9516853932584269	0.13852280701754385
    0	66	1	1	1	0	0	0.5	0.23529411764705882	1.0	0.9325842696629213	0.25358245614035085
    1	43	0	0	0	1	0	0.5	0.9607843137254902	0.0	0.2943820224719101	0.22807017543859648
    
    500 rows X 12 columns
    
    Testing dataset sample after scaling:
    embarked_2	id	sex_0	embarked_0	survived	sex_1	embarked_1	sibsp	age	pclass	passenger	fare
    1	369	0	0	0	1	0	0.0	0.29411764705882354	0.5	0.43258426966292135	1.2894736842105263
    1	449	0	0	0	1	0	0.5	0.29411764705882354	1.0	0.19662921348314608	0.13779298245614036
    0	373	0	1	1	1	0	0.5	0.43137254901960786	0.0	0.5438202247191011	1.597880701754386
    1	537	1	0	1	0	0	0.5	0.6470588235294118	0.5	0.5820224719101124	0.45614035087719296
    1	553	1	0	0	0	0	0.0	0.7450980392156863	1.0	0.7168539325842697	0.6962719298245614
    0	24	0	1	0	1	0	0.0	0.5098039215686274	1.0	0.8685393258426967	0.1267543859649123
    0	541	0	1	0	1	0	0.5	0.23529411764705882	1.0	0.3955056179775281	0.1268280701754386
    0	29	1	1	1	0	0	0.0	0.4117647058823529	0.0	0.797752808988764	0.8684947368421052
    0	481	0	1	0	1	0	0.0	0.5098039215686274	0.0	0.8606741573033708	0.6947368421052632
    1	185	0	0	0	1	0	0.0	0.5098039215686274	1.0	0.6752808988764045	0.13852280701754385
    
    143 rows X 12 columns
    
    Total time taken by feature scaling: 52.41 sec
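The scaled values above are consistent with min-max (range) scaling fitted on the training data. Note that test rows outside the training range scale outside [0, 1] (e.g. fare values above 1.0 in the testing sample). A plain-Python sketch of the idea:

```python
def minmax_fit(train):
    """Fit a min-max scaler on training values; return the transform."""
    lo, hi = min(train), max(train)

    def transform(v):
        # values outside the training range map outside [0, 1]
        return (v - lo) / (hi - lo)

    return transform

scale = minmax_fit([1, 14, 28, 52])   # illustrative training ages
print(scale(14))   # 13/51 ~= 0.2549
print(scale(80))   # > 1 for an unseen larger value
```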
    
    Feature selection using rfe ...
    
    feature selected by RFE:
    ['sex_0', 'age', 'sex_1', 'pclass', 'passenger', 'fare']
    
    Total time taken by feature selection: 32.31 sec
    
    scaling Features of rfe data ...
    
    columns that will be scaled:
    ['r_age', 'r_pclass', 'r_passenger', 'r_fare']
    
    Training dataset sample after scaling:
    r_sex_0	r_sex_1	survived	id	r_age	r_pclass	r_passenger	r_fare
    1	0	1	59	0.37254901960784315	0.5	0.36292134831460676	0.5087719298245614
    1	0	1	67	0.2549019607843137	0.0	0.9584269662921349	0.6912280701754385
    0	1	0	218	0.5098039215686274	0.0	0.6258426966292134	0.22807017543859648
    0	1	0	75	0.3137254901960784	1.0	0.3393258426966292	0.0
    1	0	1	91	0.23529411764705882	0.0	0.7741573033707865	0.22807017543859648
    0	1	0	338	0.5294117647058824	1.0	0.27415730337078653	0.1267543859649123
    0	1	0	274	0.5098039215686274	1.0	0.14157303370786517	0.13596491228070176
    0	1	0	138	0.6274509803921569	1.0	0.9516853932584269	0.13852280701754385
    1	0	1	66	0.23529411764705882	1.0	0.9325842696629213	0.25358245614035085
    0	1	0	43	0.9607843137254902	0.0	0.2943820224719101	0.22807017543859648
    
    500 rows X 8 columns
    
    Testing dataset sample after scaling:
    r_sex_0	r_sex_1	survived	id	r_age	r_pclass	r_passenger	r_fare
    0	1	0	369	0.29411764705882354	0.5	0.43258426966292135	1.2894736842105263
    0	1	0	449	0.29411764705882354	1.0	0.19662921348314608	0.13779298245614036
    0	1	1	373	0.43137254901960786	0.0	0.5438202247191011	1.597880701754386
    1	0	1	537	0.6470588235294118	0.5	0.5820224719101124	0.45614035087719296
    1	0	0	553	0.7450980392156863	1.0	0.7168539325842697	0.6962719298245614
    0	1	0	24	0.5098039215686274	1.0	0.8685393258426967	0.1267543859649123
    0	1	0	541	0.23529411764705882	1.0	0.3955056179775281	0.1268280701754386
    1	0	1	29	0.4117647058823529	0.0	0.797752808988764	0.8684947368421052
    0	1	0	481	0.5098039215686274	0.0	0.8606741573033708	0.6947368421052632
    0	1	0	185	0.5098039215686274	1.0	0.6752808988764045	0.13852280701754385
    
    143 rows X 8 columns
    
    Total time taken by feature scaling: 45.27 sec
    
    scaling Features of pca data ...
    
    columns that will be scaled:
    ['passenger', 'pclass', 'age', 'sibsp', 'fare']
    
    Training dataset sample after scaling:
    embarked_2	id	sex_0	embarked_0	survived	sex_1	parch	embarked_1	passenger	pclass	age	sibsp	fare
    0	8	0	0	0	1	0	1	0.5258426966292135	1.0	0.5098039215686274	0.0	0.1355263157894737
    1	9	0	0	0	1	0	0	0.5943820224719101	0.5	0.39215686274509803	1.0	0.20175438596491227
    1	17	1	0	0	0	0	0	0.11235955056179775	1.0	0.49019607843137253	0.0	0.13852280701754385
    0	14	0	1	0	1	0	0	0.06741573033707865	1.0	0.37254901960784315	0.0	0.1268280701754386
    1	15	0	0	1	1	0	0	0.6393258426966292	1.0	0.5686274509803921	0.0	0.13779298245614036
    1	23	1	0	1	0	0	0	0.18089887640449437	0.5	0.7254901960784313	0.0	0.27631578947368424
    1	13	0	0	0	1	0	0	0.3415730337078652	1.0	0.5098039215686274	0.0	0.14122807017543862
    1	21	0	0	1	1	0	0	0.5	0.0	0.0196078431372549	0.0	0.22807017543859648
    1	12	1	0	1	0	0	0	0.08876404494382023	1.0	0.5294117647058824	0.0	0.218859649122807
    1	20	0	0	0	1	0	0	0.3865168539325843	0.5	0.6470588235294118	0.0	0.22807017543859648
    
    500 rows X 13 columns
    
    Testing dataset sample after scaling:
    embarked_2	id	sex_0	embarked_0	survived	sex_1	parch	embarked_1	passenger	pclass	age	sibsp	fare
    1	26	1	0	0	0	2	0	0.13370786516853933	1.0	-0.0196078431372549	2.0	0.5486842105263158
    1	28	0	0	0	1	0	0	0.04157303370786517	1.0	0.35294117647058826	0.0	0.14122807017543862
    1	124	0	0	1	1	0	0	0.3797752808988764	1.0	0.8235294117647058	0.0	0.14122807017543862
    1	25	0	0	1	1	0	0	0.31797752808988766	1.0	0.3137254901960784	0.0	0.14122807017543862
    1	31	1	0	1	0	0	0	0.4786516853932584	0.5	0.49019607843137253	0.5	0.45614035087719296
    0	127	0	1	0	1	0	0	0.10786516853932585	0.0	1.3333333333333333	0.0	0.6079684210526316
    1	30	0	0	0	1	0	0	0.27303370786516856	1.0	0.37254901960784315	0.0	0.125
    1	126	0	0	0	1	0	0	0.12921348314606743	1.0	0.35294117647058826	0.0	0.13903508771929823
    1	27	0	0	1	1	0	0	0.5022471910112359	0.0	0.6078431372549019	0.0	0.46578947368421053
    1	123	1	0	1	0	2	0	0.15280898876404495	0.0	0.3137254901960784	0.0	0.46111052631578947
    
    143 rows X 13 columns
    
    Total time taken by feature scaling: 46.21 sec
    
    Dimension Reduction using pca ...
    
    PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
    
    Total time taken by PCA: 12.01 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Model Training started ...
    
    Hyperparameters used for model training:
    response_column : survived                                                                                   
    name : svm
    model_type : Classification
    lambda1 : (0.001, 0.02, 0.1)
    alpha : (0.15, 0.85)
    tolerance : (0.001, 0.01)
    learning_rate : OPTIMAL
    initial_eta : (0.05, 0.1)
    momentum : (0.65, 0.8, 0.95)
    nesterov : True
    intercept : True
    iter_num_no_change : (5, 10, 50)
    local_sgd_iterations  : (10, 20)
    iter_max : (300, 200, 400)
    batch_size : (10, 50, 60, 80)
    Total number of models for svm : 5184
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : decision_forest
    tree_type : Classification
    min_impurity : (0.0, 0.1, 0.2)
    max_depth : (5, 6, 8, 10)
    min_node_size : (1, 2, 3)
    num_trees : (-1, 20, 30)
    Total number of models for decision_forest : 108
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : glm
    family : BINOMIAL
    lambda1 : (0.001, 0.02, 0.1)
    alpha : (0.15, 0.85)
    learning_rate : OPTIMAL
    initial_eta : (0.05, 0.1)
    momentum : (0.65, 0.8, 0.95)
    iter_num_no_change : (5, 10, 50)
    iter_max : (300, 200, 400)
    batch_size : (10, 50, 60, 80)
    Total number of models for glm : 1296
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : survived
    name : xgboost
    model_type : Classification
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1, 0.2)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.1, 0.3)
    max_depth : (5, 6, 8, 10)
    min_node_size : (1, 2, 3)
    iter_num : (10, 20, 30)
    Total number of models for xgboost : 2592
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    Performing hyperParameter tuning ...
    
    svm
    
    ----------------------------------------------------------------------------------------------------
    
    decision_forest
    
    ----------------------------------------------------------------------------------------------------
    
    glm
    
    ----------------------------------------------------------------------------------------------------
    
    xgboost
    
    ----------------------------------------------------------------------------------------------------
    
    Evaluating models performance ...
    
    Evaluation completed.
    
    Leaderboard
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	DECISIONFOREST_3	lasso	0.832168	0.832168	0.832168	0.832168	0.820710	0.825114	0.822727	0.833646	0.832168	0.832740
    1	2	XGBOOST_3	lasso	0.825175	0.825175	0.825175	0.825175	0.816070	0.808573	0.811892	0.823842	0.825175	0.824126
    2	3	XGBOOST_0	lasso	0.825175	0.825175	0.825175	0.825175	0.816070	0.808573	0.811892	0.823842	0.825175	0.824126
    3	4	XGBOOST_2	pca	0.811189	0.811189	0.811189	0.811189	0.808163	0.782772	0.791444	0.810161	0.811189	0.807150
    4	5	DECISIONFOREST_0	lasso	0.804196	0.804196	0.804196	0.804196	0.816291	0.762588	0.775661	0.809972	0.804196	0.795244
    5	6	DECISIONFOREST_2	pca	0.790210	0.790210	0.790210	0.790210	0.820383	0.736787	0.750581	0.807015	0.790210	0.774915
    6	7	SVM_3	lasso	0.783217	0.783217	0.783217	0.783217	0.775737	0.753017	0.760547	0.780676	0.783217	0.778580
    7	8	SVM_1	rfe	0.783217	0.783217	0.783217	0.783217	0.775737	0.753017	0.760547	0.780676	0.783217	0.778580
    8	9	GLM_3	lasso	0.783217	0.783217	0.783217	0.783217	0.775737	0.753017	0.760547	0.780676	0.783217	0.778580
    9	10	GLM_1	rfe	0.776224	0.776224	0.776224	0.776224	0.768939	0.743758	0.751628	0.773575	0.776224	0.770758
    10	11	SVM_0	lasso	0.762238	0.762238	0.762238	0.762238	0.747332	0.750728	0.748864	0.764161	0.762238	0.763048
    11	12	GLM_2	pca	0.755245	0.755245	0.755245	0.755245	0.739876	0.734186	0.736648	0.752996	0.755245	0.753777
    12	13	SVM_2	pca	0.734266	0.734266	0.734266	0.734266	0.717544	0.706409	0.710465	0.729996	0.734266	0.730783
    13	14	DECISIONFOREST_1	rfe	0.734266	0.734266	0.734266	0.734266	0.730270	0.684561	0.691950	0.732240	0.734266	0.719894
    14	15	XGBOOST_1	rfe	0.713287	0.713287	0.713287	0.713287	0.707576	0.656783	0.661353	0.710172	0.713287	0.693811
    15	16	GLM_0	lasso	0.601399	0.601399	0.601399	0.601399	0.634236	0.636080	0.601321	0.671311	0.601399	0.602685
    
    16 rows X 13 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 18/18
  4. Display model leaderboard.
    >>> aml.leaderboard()
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	DECISIONFOREST_3	lasso	0.832168	0.832168	0.832168	0.832168	0.820710	0.825114	0.822727	0.833646	0.832168	0.832740
    1	2	XGBOOST_3	lasso	0.825175	0.825175	0.825175	0.825175	0.816070	0.808573	0.811892	0.823842	0.825175	0.824126
    2	3	XGBOOST_0	lasso	0.825175	0.825175	0.825175	0.825175	0.816070	0.808573	0.811892	0.823842	0.825175	0.824126
    3	4	XGBOOST_2	pca	0.811189	0.811189	0.811189	0.811189	0.808163	0.782772	0.791444	0.810161	0.811189	0.807150
    4	5	DECISIONFOREST_0	lasso	0.804196	0.804196	0.804196	0.804196	0.816291	0.762588	0.775661	0.809972	0.804196	0.795244
    5	6	DECISIONFOREST_2	pca	0.790210	0.790210	0.790210	0.790210	0.820383	0.736787	0.750581	0.807015	0.790210	0.774915
    6	7	SVM_3	lasso	0.783217	0.783217	0.783217	0.783217	0.775737	0.753017	0.760547	0.780676	0.783217	0.778580
    7	8	SVM_1	rfe	0.783217	0.783217	0.783217	0.783217	0.775737	0.753017	0.760547	0.780676	0.783217	0.778580
    8	9	GLM_3	lasso	0.783217	0.783217	0.783217	0.783217	0.775737	0.753017	0.760547	0.780676	0.783217	0.778580
    9	10	GLM_1	rfe	0.776224	0.776224	0.776224	0.776224	0.768939	0.743758	0.751628	0.773575	0.776224	0.770758
    10	11	SVM_0	lasso	0.762238	0.762238	0.762238	0.762238	0.747332	0.750728	0.748864	0.764161	0.762238	0.763048
    11	12	GLM_2	pca	0.755245	0.755245	0.755245	0.755245	0.739876	0.734186	0.736648	0.752996	0.755245	0.753777
    12	13	SVM_2	pca	0.734266	0.734266	0.734266	0.734266	0.717544	0.706409	0.710465	0.729996	0.734266	0.730783
    13	14	DECISIONFOREST_1	rfe	0.734266	0.734266	0.734266	0.734266	0.730270	0.684561	0.691950	0.732240	0.734266	0.719894
    14	15	XGBOOST_1	rfe	0.713287	0.713287	0.713287	0.713287	0.707576	0.656783	0.661353	0.710172	0.713287	0.693811
    15	16	GLM_0	lasso	0.601399	0.601399	0.601399	0.601399	0.634236	0.636080	0.601321	0.671311	0.601399	0.602685
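Note that Accuracy, Micro-Precision, Micro-Recall, and Micro-F1 coincide in every leaderboard row. For single-label classification this is an identity: micro-averaging pools true positives and errors over all classes, and each misclassified row counts as exactly one false positive and one false negative. A plain-Python check of that identity:

```python
def micro_metrics(y_true, y_pred):
    """Micro-averaged precision/recall pool TP, FP, FN over all classes."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp   # each error is one FP and one FN
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = tp / len(y_true)
    return precision, recall, accuracy

p, r, a = micro_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
print(p == r == a)  # True
```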
  5. Display the best performing model.
    >>> aml.leader()
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	DECISIONFOREST_3	lasso	0.832168	0.832168	0.832168	0.832168	0.82071	0.825114	0.822727	0.833646	0.832168	0.83274
  6. Generate predictions on the validation dataset using the best performing model.
    In the data preparation phase, AutoML generates the validation dataset by splitting the data provided during fitting into training and testing sets. AutoML's model training utilizes the training data, with the testing data acting as the validation dataset for model evaluation.
    >>> prediction = aml.predict()
    Following model is being used for generating prediction :
    Model ID : DECISIONFOREST_3 
    Feature Selection Method : lasso
     Prediction : 
       survived   id  prediction  prob
    0         0  369           0  0.50
    1         0  449           0  0.65
    2         1  373           1  0.65
    3         1  537           1  0.90
    4         0  553           0  0.60
    5         0   24           0  0.95
    6         0  541           0  0.65
    7         1   29           1  1.00
    8         0  481           0  0.85
    9         0  185           0  1.00
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    0               0  CLASS_1       76       11   0.873563  0.853933  0.863636       89
    1               1  CLASS_2       13       43   0.767857  0.796296  0.781818       54
     ROC-AUC : 
    AUC	GINI
    0.7669579692051602	0.5339159384103205
    threshold_value	tpr	fpr
    0.04081632653061224	0.7962962962962963	0.14606741573033707
    0.08163265306122448	0.7962962962962963	0.14606741573033707
    0.1020408163265306	0.7962962962962963	0.14606741573033707
    0.12244897959183673	0.7962962962962963	0.14606741573033707
    0.16326530612244897	0.7962962962962963	0.14606741573033707
    0.18367346938775508	0.7962962962962963	0.14606741573033707
    0.14285714285714285	0.7962962962962963	0.14606741573033707
    0.061224489795918366	0.7962962962962963	0.14606741573033707
    0.02040816326530612	0.7962962962962963	0.14606741573033707
    0.0	1.0	1.0
     Confusion Matrix : 
    array([[76, 13],
           [11, 43]], dtype=int64)
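The per-class Precision, Recall, and F1 above follow directly from the confusion matrix, and GINI relates to AUC by GINI = 2*AUC - 1. A plain-Python check against the reported numbers:

```python
def class_metrics(cm, cls):
    """Precision/recall/F1 for class `cls` from a 2x2 confusion matrix
    laid out with true labels as rows and predicted labels as columns."""
    tp = cm[cls][cls]
    fp = sum(cm[r][cls] for r in range(2)) - tp   # predicted cls, wrong
    fn = sum(cm[cls][c] for c in range(2)) - tp   # true cls, missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = [[76, 13], [11, 43]]   # confusion matrix from the output above
print([round(m, 6) for m in class_metrics(cm, 0)])
# [0.873563, 0.853933, 0.863636]

auc = 0.7669579692051602
print(round(2 * auc - 1, 6))  # 0.533916
```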
    >>> prediction.head()
    survived	id	prediction	prob
    0	553	0	0.6
    0	184	0	0.85
    0	633	0	0.95
    0	200	0	1.0
    0	448	0	1.0
    0	480	0	0.85
    0	208	0	0.95
    0	24	0	0.95
    0	541	0	0.65
    0	369	0	0.5
  7. Generate predictions on the test dataset using the best performing model.
    >>> prediction = aml.predict(titanic_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping futile columns :
    passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
    122	0	3	male	None	0	0	8.05	None	S	11
    734	0	2	male	23	0	0	13.0	None	S	14
    795	0	3	male	25	0	0	7.8958	None	S	22
    326	1	1	female	36	0	0	135.6333	C32	C	13
    242	1	3	female	None	1	0	15.5	None	Q	12
    507	1	2	female	33	0	2	26.0	None	S	20
    383	0	3	male	32	0	0	7.925	None	S	10
    648	1	1	male	56	0	0	35.5	A26	C	18
    835	0	3	male	18	0	0	8.3	None	S	15
    282	0	3	male	28	0	0	7.8542	None	S	23
    
    Updated dataset after performing target column transformation :
    cabin	id	sibsp	sex	age	parch	embarked	pclass	passenger	fare	survived
    C32	13	0	female	36	0	C	1	326	135.6333	1
    None	11	0	male	None	0	S	3	122	8.05	0
    None	19	0	female	18	1	S	3	856	9.35	1
    None	12	1	female	None	0	Q	3	242	15.5	1
    None	14	0	male	23	0	S	2	734	13.0	0
    None	22	0	male	25	0	S	3	795	7.8958	0
    None	8	0	male	28	0	S	3	509	22.525	0
    None	16	3	female	None	1	S	3	486	25.4667	0
    None	10	0	male	32	0	S	3	383	7.925	0
    A26	18	0	male	56	0	C	1	648	35.5	1
    
    Updated dataset after dropping missing value containing columns : 
    id	sibsp	sex	age	parch	embarked	pclass	passenger	fare	survived
    11	0	male	None	0	S	3	122	8.05	0
    13	0	female	36	0	C	1	326	135.6333	1
    21	0	male	11	0	C	3	732	18.7875	0
    9	0	female	None	0	Q	3	265	7.75	0
    12	1	female	None	0	Q	3	242	15.5	1
    20	0	female	33	2	S	2	507	26.0	1
    14	0	male	23	0	S	2	734	13.0	0
    22	0	male	25	0	S	3	795	7.8958	0
    15	0	male	18	0	S	3	835	8.3	0
    23	0	male	28	0	S	3	282	7.8542	0
    
    Updated dataset after imputing missing value containing columns :
    id	sibsp	sex	age	parch	embarked	pclass	passenger	fare	survived
    118	0	male	19	0	S	3	647	7.8958	0
    55	1	male	1	2	S	3	789	20.575	1
    135	0	female	55	0	S	2	16	16.0	1
    114	0	female	50	1	C	1	300	247.5208	1
    66	0	male	35	0	S	3	615	8.05	0
    83	0	male	17	0	S	3	434	7.125	0
    72	0	male	23	0	S	3	754	7.8958	0
    198	2	male	44	0	Q	1	246	90.0	0
    38	0	female	36	2	S	1	541	71.0	1
    80	5	male	11	2	S	3	60	46.9	0
    
    Found additional 1 rows that contain missing values :
    id	sibsp	sex	age	parch	embarked	pclass	passenger	fare	survived
    183	1	female	45	4	S	3	168	27.9	0
    40	1	male	49	0	C	1	600	56.9292	1
    120	0	male	22	0	S	3	113	8.05	0
    99	0	male	42	0	S	3	350	8.6625	0
    80	5	male	11	2	S	3	60	46.9	0
    38	0	female	36	2	S	1	541	71.0	1
    122	0	male	61	0	S	1	626	32.3208	0
    19	0	female	18	1	S	3	856	9.35	1
    61	0	female	45	0	S	2	707	13.5	1
    141	0	female	29	0	Q	3	48	7.75	1
    
    Updated dataset after dropping additional missing value containing rows :
    id	sibsp	sex	age	parch	embarked	pclass	passenger	fare	survived
    99	0	male	42	0	S	3	350	8.6625	0
    122	0	male	61	0	S	1	626	32.3208	0
    19	0	female	18	1	S	3	856	9.35	1
    80	5	male	11	2	S	3	60	46.9	0
    61	0	female	45	0	S	2	707	13.5	1
    141	0	female	29	0	Q	3	48	7.75	1
    183	1	female	45	4	S	3	168	27.9	0
    76	0	male	34	0	C	3	844	6.4375	0
    101	1	female	45	1	S	1	857	164.8667	1
    17	0	female	29	0	Q	3	301	7.75	1
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713851702256826"'
    
    Updated dataset after performing categorical encoding :
    id	sibsp	sex_0	sex_1	age	parch	embarked_0	embarked_1	embarked_2	pclass	passenger	fare	survived
    162	0	0	1	29	0	0	0	1	3	82	9.5	1
    183	1	1	0	45	4	0	0	1	3	168	27.9	0
    76	0	0	1	34	0	1	0	0	3	844	6.4375	0
    80	5	0	1	11	2	0	0	1	3	60	46.9	0
    40	1	0	1	49	0	1	0	0	1	600	56.9292	1
    120	0	0	1	22	0	0	0	1	3	113	8.05	0
    61	0	1	0	45	0	0	0	1	2	707	13.5	1
    141	0	1	0	29	0	0	1	0	3	48	7.75	1
    99	0	0	1	42	0	0	0	1	3	350	8.6625	0
    95	0	0	1	28	0	0	0	1	1	24	35.5	1
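AutoML's internal categorical encoder is not exposed in this log. As a rough illustration of the indicator columns above (sex_0/sex_1, embarked_0/embarked_1/embarked_2), here is a minimal one-hot encoding sketch using pandas; the column names and category-to-index mapping are assumptions, not AutoML's actual output:

```python
import pandas as pd

# Toy frame mirroring the two categorical titanic columns above
df = pd.DataFrame({"sex": ["male", "female", "male"],
                   "embarked": ["S", "C", "Q"]})

# Expand each categorical column into 0/1 indicator columns, analogous
# to sex_0/sex_1 and embarked_0/embarked_1/embarked_2 in the log above
encoded = pd.get_dummies(df, columns=["sex", "embarked"], dtype=int)
print(list(encoded.columns))
# ['sex_female', 'sex_male', 'embarked_C', 'embarked_Q', 'embarked_S']
```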
    Predict using the test data.
    >>> prediction = aml.predict(titanic_test)
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USR"."ml__td_sqlmr_persist_out__1713844397835341"'
    
    Updated dataset after performing Lasso feature selection:
    id	embarked_2	sex_0	sibsp	embarked_0	age	sex_1	pclass	embarked_1	passenger	fare	survived
    87	1	0	0	0	32	1	3	0	520	7.8958	0
    123	1	0	0	0	30	1	3	0	489	8.05	0
    81	1	1	0	0	22	0	3	0	142	7.75	1
    142	1	1	0	0	42	0	2	0	866	13.0	1
    176	1	0	0	0	18	1	3	0	776	7.75	0
    33	1	0	0	0	47	1	1	0	663	25.5875	0
    9	0	1	0	0	29	0	3	1	265	7.75	0
    161	0	0	0	1	29	1	3	0	532	7.2292	0
    153	0	0	0	1	29	1	3	0	860	7.2292	0
    143	0	0	0	0	21	1	3	1	422	7.7333	0
    
    Updated dataset after performing scaling on Lasso selected features :
    embarked_2	id	sex_0	embarked_0	survived	sex_1	embarked_1	sibsp	age	pclass	passenger	fare
    1	87	0	0	0	1	0	0.0	0.5686274509803921	1.0	0.5831460674157304	0.13852280701754385
    1	123	0	0	0	1	0	0.0	0.5294117647058824	1.0	0.5483146067415731	0.14122807017543862
    1	81	1	0	1	0	0	0.0	0.37254901960784315	1.0	0.15842696629213482	0.13596491228070176
    1	142	1	0	1	0	0	0.0	0.7647058823529411	0.5	0.9719101123595506	0.22807017543859648
    1	176	0	0	0	1	0	0.0	0.29411764705882354	1.0	0.8707865168539326	0.13596491228070176
    1	33	0	0	0	1	0	0.0	0.8627450980392157	0.0	0.7438202247191011	0.4489035087719298
    0	9	1	0	0	0	1	0.0	0.5098039215686274	1.0	0.2966292134831461	0.13596491228070176
    0	161	0	1	0	1	0	0.0	0.5098039215686274	1.0	0.596629213483146	0.1268280701754386
    0	153	0	1	0	1	0	0.0	0.5098039215686274	1.0	0.9651685393258427	0.1268280701754386
    0	143	0	0	0	1	1	0.0	0.35294117647058826	1.0	0.4730337078651685	0.1356719298245614
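The scaled values above are consistent with plain min-max scaling to [0, 1]. A minimal sketch follows; the training bounds lo=3 and hi=54 for age are inferred from the printed values, not reported by AutoML:

```python
def min_max_scale(x, lo, hi):
    # Min-max scaling: map x from [lo, hi] onto [0, 1]
    return (x - lo) / (hi - lo)

# age 32 reproduces the 0.5686274509803921 shown in the scaled table above,
# assuming the inferred training bounds lo=3, hi=54
print(min_max_scale(32, 3, 54))
```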
    
    Updated dataset after performing RFE feature selection:
    id	sex_0	age	sex_1	pclass	passenger	fare	survived
    87	0	32	1	3	520	7.8958	0
    123	0	30	1	3	489	8.05	0
    81	1	22	0	3	142	7.75	1
    142	1	42	0	2	866	13.0	1
    176	0	18	1	3	776	7.75	0
    33	0	47	1	1	663	25.5875	0
    9	1	29	0	3	265	7.75	0
    161	0	29	1	3	532	7.2292	0
    153	0	29	1	3	860	7.2292	0
    143	0	21	1	3	422	7.7333	0
    
    Updated dataset after performing scaling on RFE selected features :
    r_sex_0	r_sex_1	survived	id	r_age	r_pclass	r_passenger	r_fare
    0	1	0	87	0.5686274509803921	1.0	0.5831460674157304	0.13852280701754385
    0	1	0	123	0.5294117647058824	1.0	0.5483146067415731	0.14122807017543862
    1	0	1	81	0.37254901960784315	1.0	0.15842696629213482	0.13596491228070176
    1	0	1	142	0.7647058823529411	0.5	0.9719101123595506	0.22807017543859648
    0	1	0	176	0.29411764705882354	1.0	0.8707865168539326	0.13596491228070176
    0	1	0	33	0.8627450980392157	0.0	0.7438202247191011	0.4489035087719298
    1	0	0	9	0.5098039215686274	1.0	0.2966292134831461	0.13596491228070176
    0	1	0	161	0.5098039215686274	1.0	0.596629213483146	0.1268280701754386
    0	1	0	153	0.5098039215686274	1.0	0.9651685393258427	0.1268280701754386
    0	1	0	143	0.35294117647058826	1.0	0.4730337078651685	0.1356719298245614
    
    Updated dataset after performing scaling for PCA feature selection :
    embarked_2	sex_0	id	embarked_0	survived	sex_1	parch	embarked_1	passenger	pclass	age	sibsp	fare
    0	0	153	1	0	1	0	0	0.9651685393258427	1.0	0.5098039215686274	0.0	0.1268280701754386
    0	1	57	1	1	0	0	0	0.44157303370786516	0.0	0.39215686274509803	0.5	1.987280701754386
    0	1	158	0	1	0	0	1	0.7831460674157303	1.0	0.5098039215686274	0.0	0.1356719298245614
    0	1	175	1	1	0	0	0	0.42134831460674155	0.0	0.5098039215686274	0.5	1.4415929824561404
    0	0	35	1	0	1	0	0	0.08202247191011236	1.0	0.45098039215686275	0.5	0.25358245614035085
    0	0	94	1	0	1	0	0	0.06404494382022471	1.0	0.49019607843137253	0.0	0.1268280701754386
    1	1	156	0	0	0	1	0	0.350561797752809	0.5	0.45098039215686275	0.5	0.45614035087719296
    1	0	89	0	0	1	1	0	0.1797752808988764	1.0	0.803921568627451	0.0	0.28245614035087724
    1	0	87	0	0	1	0	0	0.5831460674157304	1.0	0.5686274509803921	0.0	0.13852280701754385
    1	1	106	0	1	0	2	0	0.3831460674157303	0.0	0.4117647058823529	1.5	4.614035087719298
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	survived
    0	156	0.798037	-0.658267	-0.179850	-0.126249	0.169360	0.212713	0
    1	9	1.013059	0.113801	0.916631	0.603353	0.469476	-0.145605	0
    2	89	-0.622825	-0.158456	0.178399	-0.225522	0.240607	-0.114017	0
    3	161	-0.131053	1.130796	0.311504	-0.274859	-0.307450	-0.088209	0
    4	87	-0.632859	-0.161795	0.229150	-0.071917	-0.133835	-0.079546	0
    ...	...	...	...	...	...	...	...	...
    172	110	-0.555536	-0.114490	-0.188127	-0.068215	0.246072	-0.229338	0
    173	64	-0.575110	-0.171498	0.125830	-0.147249	-0.017382	0.428679	0
    174	11	-0.622259	-0.153929	0.255290	-0.286482	0.246139	-0.168777	0
    175	34	0.671605	-0.685764	0.387038	-0.289769	0.073168	-0.259325	1
    176	101	0.962386	-0.693145	-1.316424	0.531144	-0.066025	0.948557	1
    
    177 rows × 8 columns
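The PCA step above reduces the 12 scaled features to the six components col_0 through col_5. It can be sketched with a plain NumPy SVD on centered data; this is a generic illustration on random stand-in data, not AutoML's internal implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((177, 12))          # stand-in for the 177 x 12 scaled feature matrix

# Center the columns, then project onto the top 6 right-singular vectors;
# the projections play the role of col_0 .. col_5 above
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Xc @ Vt[:6].T
print(components.shape)            # (177, 6)
```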
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : DECISIONFOREST_3 
    Feature Selection Method : lasso
    
     Prediction : 
       survived   id  prediction  prob
    0         0  153           0  1.00
    1         1   57           1  0.95
    2         1  158           1  0.70
    3         1  175           1  0.95
    4         0   35           0  0.80
    5         0   94           0  0.90
    6         0  156           1  1.00
    7         0   89           0  0.95
    8         0   87           0  0.95
    9         1  106           1  0.95
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    1               1  CLASS_2       20       59   0.746835  0.819444  0.781457       72
    0               0  CLASS_1       85       13   0.867347  0.809524  0.837438      105
    
     ROC-AUC : 
    AUC	GINI
    0.736441798941799	0.4728835978835979
    threshold_value	tpr	fpr
    0.04081632653061224	0.8194444444444444	0.19047619047619047
    0.08163265306122448	0.8194444444444444	0.19047619047619047
    0.1020408163265306	0.8194444444444444	0.19047619047619047
    0.12244897959183673	0.8194444444444444	0.19047619047619047
    0.16326530612244897	0.8194444444444444	0.19047619047619047
    0.18367346938775508	0.8194444444444444	0.19047619047619047
    0.14285714285714285	0.8194444444444444	0.19047619047619047
    0.061224489795918366	0.8194444444444444	0.19047619047619047
    0.02040816326530612	0.8194444444444444	0.19047619047619047
    0.0	1.0	1.0
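The two summary numbers above are related: the Gini coefficient is a linear rescaling of AUC (GINI = 2·AUC − 1), which the reported values satisfy:

```python
auc = 0.736441798941799                        # AUC reported above
gini = 2 * auc - 1                             # Gini is a linear rescaling of AUC
assert abs(gini - 0.4728835978835979) < 1e-12  # matches the GINI above
```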
    
     Confusion Matrix : 
    array([[85, 20],
           [13, 59]], dtype=int64)
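The precision, recall, and F1 figures in the performance-metrics table follow directly from this confusion matrix. A quick check for CLASS_1, reading rows as actual class and columns as predicted class, as in the array above:

```python
cm = [[85, 20],
      [13, 59]]                   # confusion matrix reported above

tp, fn = cm[0][0], cm[0][1]       # CLASS_1 correctly / incorrectly classified
fp = cm[1][0]                     # CLASS_2 rows predicted as CLASS_1

precision = tp / (tp + fp)        # 85 / 98  -> 0.867347 (matches table)
recall = tp / (tp + fn)           # 85 / 105 -> 0.809524 (matches table)
f1 = 2 * precision * recall / (precision + recall)   # -> 0.837438
print(round(precision, 6), round(recall, 6), round(f1, 6))
```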
    >>> prediction.head()
    survived	id	prediction	prob
    0	191	0	1.0
    0	176	0	0.7
    0	14	0	0.85
    0	120	0	0.9
    0	133	0	0.85
    0	187	1	0.8
    0	9	1	0.75
    0	22	0	1.0
    0	107	0	0.95
    0	56	0	1.0