Example 7: Run AutoML for Classification Problem Using Early Stopping Metrics and max_models

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore (Enterprise, IntelliFlex, VMware)
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Product Category: Teradata Vantage
This example predicts whether a passenger aboard the RMS Titanic survived based on various factors. Run AutoClassifier to get the best-performing model out of the available models with the following specifications:
  • Use three model types for training: ‘glm’, ‘svm’, and ‘xgboost’.
  • Use two early stopping criteria: the early stopping metric ‘MICRO-RECALL’ with a threshold value of 0.9, and a maximum of 13 trained models.
  • Set verbose level 2 to get detailed logs.
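The stopping metric ‘MICRO-RECALL’ is micro-averaged recall: true positives are pooled across all classes before taking one global ratio. For single-label classification this equals plain accuracy, which a small sketch (with illustrative labels, not data from this example) can verify:

```python
def micro_recall(actual, predicted):
    # Pool true positives over all classes; the denominator is the total
    # sample count, since each sample is an actual instance of its class.
    tp = sum(a == p for a, p in zip(actual, predicted))
    return tp / len(actual)

actual    = [0, 0, 1, 1, 0, 1]
predicted = [0, 1, 1, 1, 0, 0]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
assert micro_recall(actual, predicted) == accuracy
```

This is also why the leaderboard later in this example shows identical Accuracy, Micro-Precision, Micro-Recall, and Micro-F1 columns.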
  1. Load data and split it to train and test datasets.
    1. Load the example data and create teradataml DataFrame.
      >>> load_example_data("teradataml", "titanic")
      >>> titanic = DataFrame.from_table("titanic")
    2. Perform sampling to get 80% for training and 20% for testing.
      >>> titanic_sample = titanic.sample(frac = [0.8, 0.2])
    3. Fetch train and test data.
      >>> titanic_train= titanic_sample[titanic_sample['sampleid'] == 1].drop('sampleid', axis=1)
      >>> titanic_test = titanic_sample[titanic_sample['sampleid'] == 2].drop('sampleid', axis=1)
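The same 80/20 split can be mimicked locally with pandas (a sketch on a stand-in frame; teradataml's `sample(frac=[0.8, 0.2])` similarly tags each row with a `sampleid` of 1 or 2):

```python
import pandas as pd

# Stand-in frame for the titanic table (illustrative data).
df = pd.DataFrame({"passenger": range(100),
                   "survived": [i % 2 for i in range(100)]})

# Draw a reproducible 80% sample for training; the remainder is the test set.
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

assert len(train) == 80 and len(test) == 20
```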
  2. Create AutoClassifier instance and fit it on the dataset.
    1. Create an instance of AutoML.
      >>> aml = AutoML(include=['glm', 'svm', 'xgboost'],
      ...              verbose=2,
      ...              stopping_metric='MICRO-RECALL',
      ...              stopping_tolerance=0.9,
      ...              max_models=13)
    2. Fit the data.
      >>> aml.fit(titanic_train, 'survived')
      Task type is set to Classification as target column is having distinct values less than or equal to 20.
      1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
      Feature Exploration started ...
      
      
      Data Overview:
      Total Rows in the data: 713
      Total Columns in the data: 12
      
      
      Column Summary:
      ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
      sex	VARCHAR(20) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
      name	VARCHAR(1000) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
      parch	INTEGER	713	0	None	542	171	0	0.0	100.0
      passenger	INTEGER	713	0	None	0	713	0	0.0	100.0
      fare	FLOAT	713	0	None	12	701	0	0.0	100.0
      cabin	VARCHAR(20) CHARACTER SET LATIN	167	546	0	None	None	None	76.57784011220197	23.422159887798035
      survived	INTEGER	713	0	None	440	273	0	0.0	100.0
      pclass	INTEGER	713	0	None	0	713	0	0.0	100.0
      embarked	VARCHAR(20) CHARACTER SET LATIN	711	2	0	None	None	None	0.2805049088359046	99.71949509116409
      sibsp	INTEGER	713	0	None	489	224	0	0.0	100.0
      age	INTEGER	565	148	None	5	560	0	20.757363253856944	79.24263674614306
      ticket	VARCHAR(20) CHARACTER SET LATIN	713	0	0	None	None	None	0.0	100.0
      
      Statistics of Data:
      func	passenger	survived	pclass	age	sibsp	parch	fare
      50%	448	0	3	28	0	0	14.5
      count	713	713	713	565	713	713	713
      mean	447.447	0.383	2.307	29.517	0.525	0.383	32.735
      min	2	0	1	0	0	0	0
      max	891	1	3	80	8	6	512.329
      75%	674	1	3	38	1	0	31.275
      25%	223	0	2	21	0	0	7.896
      std	256.643	0.486	0.841	14.462	1.14	0.811	49.141
      
      
      Categorical Columns with their Distinct values:
      ColumnName                DistinctValueCount
      name                      713  
      sex                       2   
      ticket                    569 
      cabin                     128 
      embarked                  3
      
      Futile columns in dataset:
      ColumnName
      name
      ticket
      
      
      Target Column Distribution:
      [plot output omitted]
      
      Columns with outlier percentage:
        ColumnName  OutlierPercentage
      0        age          22.159888
      1      parch          23.983170
      2      sibsp           4.908836
      3       fare          14.305750
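The log does not show which outlier rule AutoML applies; a common choice, used here only as an assumption for illustration, is the 1.5 × IQR fence:

```python
def outlier_percentage(values):
    # Flag values beyond 1.5 * IQR from the quartiles (a common rule;
    # teradataml may use a different one internally).
    s = sorted(values)

    def quantile(q):
        idx = q * (len(s) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(s) - 1)
        return s[lo] + (idx - lo) * (s[hi] - s[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return 100 * sum(v < lo or v > hi for v in values) / len(values)

data = [1, 2, 3, 4, 5, 100]   # 100 is a clear outlier: 1 of 6 rows
```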
      
      
      
      1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
      
      
      Feature Engineering started ...
      
      Handling duplicate records present in dataset ...
      Analysis completed. No action taken.                                                    
      
      Total time to handle duplicate records: 1.68 sec
      
      Handling less significant features from data ...
      
      Removing Futile columns:
      ['ticket', 'name']
      
      Sample of Data after removing Futile columns:
      passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
      265	0	3	female	None	0	0	7.75	None	Q	9
      122	0	3	male	None	0	0	8.05	None	S	11
      591	0	3	male	35	0	0	7.125	None	S	19
      734	0	2	male	23	0	0	13.0	None	S	14
      326	1	1	female	36	0	0	135.6333	C32	C	13
      305	0	3	male	None	0	0	8.05	None	S	21
      631	1	1	male	80	0	0	30.0	A23	S	10
      120	0	3	female	2	4	2	31.275	None	S	18
      80	1	3	female	30	0	0	12.475	None	S	12
      345	0	2	male	36	0	0	13.0	None	S	20
      
      713 rows X 11 columns
      
      Total time to handle less significant features: 19.67 sec
      
      Handling Date Features ...
      
      Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
      
      Total time to handle date features: 0.00 sec
      
      Checking Missing values in dataset ...
      
      Columns with their missing values:
      age: 148
      cabin: 546
      embarked: 2
      
      Deleting rows of these columns for handling missing values:
      ['embarked']
      
      Sample of dataset after removing 2 rows:
      passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
      122	0	3	male	None	0	0	8.05	None	S	11
      570	1	3	male	32	0	0	7.8542	None	S	15
      835	0	3	male	18	0	0	8.3	None	S	23
      265	0	3	female	None	0	0	7.75	None	Q	9
      631	1	1	male	80	0	0	30.0	A23	S	10
      120	0	3	female	2	4	2	31.275	None	S	18
      734	0	2	male	23	0	0	13.0	None	S	14
      61	0	3	male	22	0	0	7.2292	None	C	22
      80	1	3	female	30	0	0	12.475	None	S	12
      345	0	2	male	36	0	0	13.0	None	S	20
      
      711 rows X 11 columns
      
      Dropping these columns for handling missing values:
      ['cabin']
      
      Sample of dataset after removing 1 columns:
      passenger	survived	pclass	sex	age	sibsp	parch	fare	embarked	id
      265	0	3	female	None	0	0	7.75	Q	9
      326	1	1	female	36	0	0	135.6333	C	13
      305	0	3	male	None	0	0	8.05	S	21
      80	1	3	female	30	0	0	12.475	S	12
      734	0	2	male	23	0	0	13.0	S	14
      61	0	3	male	22	0	0	7.2292	C	22
      631	1	1	male	80	0	0	30.0	S	10
      120	0	3	female	2	4	2	31.275	S	18
      570	1	3	male	32	0	0	7.8542	S	15
      835	0	3	male	18	0	0	8.3	S	23
      
      711 rows X 10 columns
      
      Total time to find missing values in data: 15.02 sec
      
      Imputing Missing Values ...
      
      Columns with their imputation method:
      age: mean
      
      Sample of dataset after Imputation:
      passenger	survived	pclass	sex	age	sibsp	parch	fare	embarked	id
      162	1	2	female	40	0	0	15.75	S	31
      223	0	3	male	51	0	0	8.05	S	47
      692	1	3	female	4	0	1	13.4167	C	55
      753	0	3	male	33	0	0	9.5	S	63
      671	1	2	female	40	1	1	39.0	S	79
      528	0	1	male	29	0	0	221.7792	S	87
      202	0	3	male	29	8	2	69.55	S	71
      427	1	2	female	28	1	0	26.0	S	39
      835	0	3	male	18	0	0	8.3	S	23
      570	1	3	male	32	0	0	7.8542	S	15
      
      711 rows X 10 columns
      
      Time taken to perform imputation: 15.64 sec
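Mean imputation as applied to age above can be sketched with pandas (a sketch, not the teradataml implementation). Note that because age is an INTEGER column, the mean of 29.517 lands in the data as 29, i.e. truncated to an integer:

```python
import pandas as pd

# Stand-in for the age column with NULLs (illustrative values).
ages = pd.Series([40, 51, None, 33, None, 18], dtype="float64")

# Impute missing values with the column mean, truncated because the
# target column type is INTEGER.
fill = int(ages.mean())
imputed = ages.fillna(fill).astype("int64")

assert imputed.isna().sum() == 0
```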
      
      Performing encoding for categorical columns ...
      result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713324989208786"'
      
      ONE HOT Encoding these Columns:
      ['sex', 'embarked']
      
      Sample of dataset after performing one hot encoding:
      passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
      38	0	3	0	1	21	0	0	8.05	0	0	1	28
      772	0	3	0	1	48	0	0	7.8542	0	0	1	44
      425	0	3	0	1	18	1	1	20.2125	0	0	1	52
      118	0	2	0	1	29	1	0	21.0	0	0	1	60
      852	0	3	0	1	74	0	0	7.775	0	0	1	76
      505	1	1	1	0	16	0	0	86.5	0	0	1	84
      587	0	2	0	1	47	0	0	15.0	0	0	1	68
      507	1	2	1	0	33	0	2	26.0	0	0	1	36
      345	0	2	0	1	36	0	0	13.0	0	0	1	20
      80	1	3	1	0	30	0	0	12.475	0	0	1	12
      
      711 rows X 13 columns
      
      Time taken to encode the columns: 13.25 sec
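The one hot encoding step can be reproduced locally with pandas `get_dummies` (a sketch; the numeric suffixes `sex_0`/`sex_1` in the output above reflect an internal category-to-index mapping that is an assumption here, so this sketch keeps the category names instead):

```python
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "male"],
                   "embarked": ["S", "C", "Q"]})

# Expand each categorical column into one 0/1 indicator column per category:
# 'sex' yields 2 indicator columns, 'embarked' yields 3.
encoded = pd.get_dummies(df, columns=["sex", "embarked"], dtype="int64")

assert list(encoded.columns) == ["sex_female", "sex_male",
                                 "embarked_C", "embarked_Q", "embarked_S"]
```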
      
      
      1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
      
      Data preparation started ...
      
      Splitting of dataset into training and testing ...
      Training size : 0.8                                                                      
      Testing size  : 0.2 
      
      
      Training data sample
      passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
      265	0	3	1	0	29	0	0	7.75	0	1	0	9
      122	0	3	0	1	29	0	0	8.05	0	0	1	11
      591	0	3	0	1	35	0	0	7.125	0	0	1	19
      570	1	3	0	1	32	0	0	7.8542	0	0	1	15
      326	1	1	1	0	36	0	0	135.6333	1	0	0	13
      305	0	3	0	1	29	0	0	8.05	0	0	1	21
      734	0	2	0	1	23	0	0	13.0	0	0	1	14
      61	0	3	0	1	22	0	0	7.2292	1	0	0	22
      80	1	3	1	0	30	0	0	12.475	0	0	1	12
      345	0	2	0	1	36	0	0	13.0	0	0	1	20
      
      568 rows X 13 columns
      
      Testing data sample
      passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
      101	0	3	1	0	28	0	0	7.8958	0	0	1	25
      387	0	3	0	1	1	5	2	46.9	0	0	1	27
      871	0	3	0	1	26	0	0	7.8958	0	0	1	123
      38	0	3	0	1	21	0	0	8.05	0	0	1	28
      732	0	3	0	1	11	0	0	18.7875	1	0	0	29
      196	1	1	1	0	58	0	0	146.5208	1	0	0	125
      652	1	2	1	0	18	0	1	23.0	0	0	1	30
      585	0	3	0	1	29	0	0	8.7125	1	0	0	126
      162	1	2	1	0	40	0	0	15.75	0	0	1	31
      139	0	3	0	1	16	0	0	9.2167	0	0	1	127
      
      143 rows X 13 columns
      
      Time taken for splitting of data: 10.71 sec
      
      Outlier preprocessing ...
      Columns with outlier percentage :-                                                                         
        ColumnName  OutlierPercentage
      0        age           6.751055
      1       fare          14.064698
      2      sibsp           4.922644
      3      parch          24.050633
      
      Deleting rows of these columns:
      ['sibsp', 'age']
      result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713328140917821"'
      
      Sample of training dataset after removing outlier rows:
      passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
      427	1	2	1	0	28	1	0	26.0	0	0	1	39
      692	1	3	1	0	4	0	1	13.4167	1	0	0	55
      753	0	3	0	1	33	0	0	9.5	0	0	1	63
      671	1	2	1	0	40	1	1	39.0	0	0	1	79
      589	0	3	0	1	22	0	0	8.05	0	0	1	103
      833	0	3	0	1	29	0	0	7.2292	1	0	0	111
      528	0	1	0	1	29	0	0	221.7792	0	0	1	87
      223	0	3	0	1	51	0	0	8.05	0	0	1	47
      835	0	3	0	1	18	0	0	8.3	0	0	1	23
      570	1	3	0	1	32	0	0	7.8542	0	0	1	15
      
      494 rows X 13 columns
      
      median inplace of outliers:
      ['fare', 'parch']
      result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713332169253155"'
      
      Sample of training dataset after performing MEDIAN inplace:
      passenger	survived	pclass	sex_0	sex_1	age	sibsp	parch	fare	embarked_0	embarked_1	embarked_2	id
      507	1	2	1	0	33	0	0	26.0	0	0	1	36
      425	0	3	0	1	18	1	0	20.2125	0	0	1	52
      118	0	2	0	1	29	1	0	21.0	0	0	1	60
      587	0	2	0	1	47	0	0	15.0	0	0	1	68
      362	0	2	0	1	29	1	0	27.7208	1	0	0	92
      198	0	3	0	1	42	0	0	8.4042	0	0	1	100
      505	1	1	1	0	16	0	0	13.0	0	0	1	84
      772	0	3	0	1	48	0	0	7.8542	0	0	1	44
      345	0	2	0	1	36	0	0	13.0	0	0	1	20
      80	1	3	1	0	30	0	0	12.475	0	0	1	12
      
      494 rows X 13 columns
      
      Time Taken by Outlier processing: 48.73 sec
      result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713325332166272"'
      result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713325412941558"'
      
      Checking imbalance data ...
      
      Imbalance Not Found.
      
      Feature selection using lasso ...
      
      feature selected by lasso:
      ['sex_1', 'embarked_0', 'pclass', 'fare', 'age', 'sibsp', 'sex_0', 'embarked_2', 'passenger', 'embarked_1']
      
      Total time taken by feature selection: 2.79 sec
      
      scaling Features of lasso data ...
      
      columns that will be scaled:
      ['pclass', 'fare', 'age', 'sibsp', 'passenger']
      
      Training dataset sample after scaling:
      id	sex_1	embarked_0	survived	sex_0	embarked_2	embarked_1	pclass	fare	age	sibsp	passenger
      40	1	0	0	0	1	0	1.0	0.1327683615819209	0.6458333333333334	0.0	0.40719910011248595
      80	0	1	1	1	0	0	0.0	0.2448210922787194	0.5833333333333334	0.0	0.2440944881889764
      326	0	0	1	1	0	1	1.0	0.14595103578154425	0.5208333333333334	0.0	0.4128233970753656
      734	0	0	1	1	1	0	0.5	0.2448210922787194	0.6666666666666666	0.0	0.36670416197975253
      509	1	0	0	0	1	0	1.0	0.3032015065913371	0.5416666666666666	0.5	0.28346456692913385
      101	0	0	1	1	1	0	0.0	1.0	0.6041666666666666	0.5	0.9088863892013498
      570	1	1	0	0	0	0	1.0	0.13606403013182672	0.5208333333333334	0.0	0.0281214848143982
      591	1	1	0	0	0	0	1.0	0.13606403013182672	0.5208333333333334	0.0	0.5860517435320585
      530	1	0	0	0	1	0	0.5	0.2448210922787194	0.625	0.0	0.8110236220472441
      469	0	0	1	1	1	0	0.0	0.2448210922787194	0.375	0.0	0.39932508436445446
      
      494 rows X 12 columns
      
      Testing dataset sample after scaling:
      id	sex_1	embarked_0	survived	sex_0	embarked_2	embarked_1	pclass	fare	age	sibsp	passenger
      120	0	1	1	1	0	0	0.0	1.678045197740113	0.5208333333333334	0.5	0.953880764904387
      242	1	0	0	0	0	1	1.0	0.14595103578154425	0.5208333333333334	0.0	0.140607424071991
      650	0	1	1	1	0	0	0.0	2.0881977401129945	0.5208333333333334	0.0	0.34308211473565803
      244	1	0	0	0	1	0	1.0	0.3032015065913371	0.5208333333333334	0.5	0.7176602924634421
      486	0	0	1	1	1	0	0.5	0.4896421845574388	0.5208333333333334	0.5	0.05849268841394826
      747	1	0	0	0	0	1	1.0	0.14595103578154425	0.5208333333333334	0.0	0.688413948256468
      202	1	0	0	0	1	0	0.5	0.2448210922787194	0.7916666666666666	0.0	0.16647919010123735
      122	1	1	0	0	0	0	1.0	0.14869679849340867	0.6458333333333334	0.0	0.9516310461192351
      549	1	1	0	0	0	0	0.0	2.5542994350282484	0.375	0.0	0.4184476940382452
      774	1	0	0	0	1	0	0.0	1.455508474576271	0.3541666666666667	0.0	0.11361079865016872
      
      143 rows X 12 columns
      
      Total time taken by feature scaling: 45.68 sec
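The scaled values above are consistent with min-max scaling fitted on the training data only, which is why testing rows can fall outside [0, 1] (e.g. the fare value 1.678 at id 120). A minimal sketch of that behavior:

```python
def fit_minmax(values):
    # Fit min/max on the training data; the returned transform is then
    # applied unchanged to test data, so test values beyond the training
    # range map outside [0, 1].
    lo, hi = min(values), max(values)
    return lambda x: (x - lo) / (hi - lo)

train_fare = [0.0, 10.0, 50.0, 100.0]
scale = fit_minmax(train_fare)

assert scale(100.0) == 1.0
assert scale(250.0) == 2.5   # test-set value outside the training range
```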
      
      Feature selection using rfe ...
      
      feature selected by RFE:
      ['pclass', 'age', 'sex_0', 'sex_1', 'passenger', 'fare']
      
      Total time taken by feature selection: 31.48 sec
      
      scaling Features of rfe data ...
      
      columns that will be scaled:
      ['r_pclass', 'r_age', 'r_passenger', 'r_fare']
      
      Training dataset sample after scaling:
      id	r_sex_1	r_sex_0	survived	r_pclass	r_age	r_passenger	r_fare
      40	1	0	0	1.0	0.6458333333333334	0.40719910011248595	0.1327683615819209
      80	0	1	1	0.0	0.5833333333333334	0.2440944881889764	0.2448210922787194
      326	0	1	1	1.0	0.5208333333333334	0.4128233970753656	0.14595103578154425
      734	0	1	1	0.5	0.6666666666666666	0.36670416197975253	0.2448210922787194
      509	1	0	0	1.0	0.5416666666666666	0.28346456692913385	0.3032015065913371
      101	0	1	1	0.0	0.6041666666666666	0.9088863892013498	1.0
      570	1	0	0	1.0	0.5208333333333334	0.0281214848143982	0.13606403013182672
      591	1	0	0	1.0	0.5208333333333334	0.5860517435320585	0.13606403013182672
      530	1	0	0	0.5	0.625	0.8110236220472441	0.2448210922787194
      469	0	1	1	0.0	0.375	0.39932508436445446	0.2448210922787194
      
      494 rows X 8 columns
      
      Testing dataset sample after scaling:
      id	r_sex_1	r_sex_0	survived	r_pclass	r_age	r_passenger	r_fare
      120	0	1	1	0.0	0.5208333333333334	0.953880764904387	1.678045197740113
      242	1	0	0	1.0	0.5208333333333334	0.140607424071991	0.14595103578154425
      650	0	1	1	0.0	0.5208333333333334	0.34308211473565803	2.0881977401129945
      244	1	0	0	1.0	0.5208333333333334	0.7176602924634421	0.3032015065913371
      486	0	1	1	0.5	0.5208333333333334	0.05849268841394826	0.4896421845574388
      747	1	0	0	1.0	0.5208333333333334	0.688413948256468	0.14595103578154425
      202	1	0	0	0.5	0.7916666666666666	0.16647919010123735	0.2448210922787194
      122	1	0	0	1.0	0.6458333333333334	0.9516310461192351	0.14869679849340867
      549	1	0	0	0.0	0.375	0.4184476940382452	2.5542994350282484
      774	1	0	0	0.0	0.3541666666666667	0.11361079865016872	1.455508474576271
      
      143 rows X 8 columns
      
      Total time taken by feature scaling: 46.86 sec
      
      scaling Features of pca data ...
      
      columns that will be scaled:
      ['passenger', 'pclass', 'age', 'sibsp', 'fare']
      
      Training dataset sample after scaling:
      id	sex_1	embarked_0	parch	survived	sex_0	embarked_2	embarked_1	passenger	pclass	age	sibsp	fare
      9	0	0	0	0	1	0	1	0.2958380202474691	1.0	0.5208333333333334	0.0	0.14595103578154425
      11	1	0	0	0	0	1	0	0.13498312710911137	1.0	0.5208333333333334	0.0	0.15160075329566855
      19	1	0	0	0	0	1	0	0.6625421822272216	1.0	0.6458333333333334	0.0	0.13418079096045196
      15	1	0	0	1	0	1	0	0.6389201349831272	1.0	0.5833333333333334	0.0	0.14791337099811674
      13	0	1	0	1	1	0	0	0.3644544431946007	0.0	0.6666666666666666	0.0	0.2448210922787194
      21	1	0	0	0	0	1	0	0.3408323959505062	1.0	0.5208333333333334	0.0	0.15160075329566855
      14	1	0	0	0	0	1	0	0.8233970753655793	0.5	0.3958333333333333	0.0	0.2448210922787194
      22	1	1	0	0	0	0	0	0.06636670416197975	1.0	0.375	0.0	0.13614312617702448
      12	0	0	0	1	1	1	0	0.08773903262092239	1.0	0.5416666666666666	0.0	0.23493408662900186
      20	1	0	0	0	0	1	0	0.3858267716535433	0.5	0.6666666666666666	0.0	0.2448210922787194
      
      494 rows X 13 columns
      
      Testing dataset sample after scaling:
      id	sex_1	embarked_0	parch	survived	sex_0	embarked_2	embarked_1	passenger	pclass	age	sibsp	fare
      25	0	0	0	0	1	1	0	0.11136107986501688	1.0	0.5	0.0	0.14869679849340867
      29	1	1	0	0	0	0	0	0.8211473565804275	1.0	0.14583333333333334	0.0	0.3538135593220339
      125	0	1	0	1	1	0	0	0.21822272215973004	0.0	1.125	0.0	2.759337099811676
      26	0	0	1	1	1	1	0	0.84251968503937	0.5	0.0	0.5	0.4331450094161958
      27	1	0	2	0	0	1	0	0.4330708661417323	1.0	-0.0625	2.5	0.8832391713747646
      123	1	0	0	0	0	1	0	0.9775028121484814	1.0	0.4583333333333333	0.0	0.14869679849340867
      28	1	0	0	0	0	1	0	0.04049493813273341	1.0	0.3541666666666667	0.0	0.15160075329566855
      124	1	0	1	0	0	1	0	0.1968503937007874	1.0	0.5208333333333334	1.5	0.47959887005649715
      31	0	0	0	1	1	1	0	0.17997750281214847	0.5	0.75	0.0	0.2966101694915254
      127	1	0	0	0	0	1	0	0.15410573678290213	1.0	0.25	0.0	0.1735725047080979
      
      143 rows X 13 columns
      
      Total time taken by feature scaling: 44.56 sec
      
      Dimension Reduction using pca ...
      
      PCA columns:
      ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
      
      Total time taken by PCA: 12.29 sec
      
      
      1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
      
      Model Training started ...
      
      Hyperparameters used for model training:
      response_column : survived                                                                                                                            
      name : xgboost
      model_type : Classification
      column_sampling : (1, 0.6)
      min_impurity : (0.0, 0.1, 0.2)
      lambda1 : (0.01, 0.1, 1, 10)
      shrinkage_factor : (0.5, 0.1, 0.3)
      max_depth : (5, 6, 8, 10)
      min_node_size : (1, 2, 3)
      iter_num : (10, 20, 30)
      Total number of models for xgboost : 2592
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      response_column : survived
      name : svm
      model_type : Classification
      lambda1 : (0.001, 0.02, 0.1)
      alpha : (0.15, 0.85)
      tolerance : (0.001, 0.01)
      learning_rate : OPTIMAL
      initial_eta : (0.05, 0.1)
      momentum : (0.65, 0.8, 0.95)
      nesterov : True
      intercept : True
      iter_num_no_change : (5, 10, 50)
      local_sgd_iterations  : (10, 20)
      iter_max : (300, 200, 400)
      batch_size : (10, 50, 60, 80)
      Total number of models for svm : 5184
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      response_column : survived
      name : glm
      family : BINOMIAL
      lambda1 : (0.001, 0.02, 0.1)
      alpha : (0.15, 0.85)
      learning_rate : OPTIMAL
      initial_eta : (0.05, 0.1)
      momentum : (0.65, 0.8, 0.95)
      iter_num_no_change : (5, 10, 50)
      iter_max : (300, 200, 400)
      batch_size : (10, 50, 60, 80)
      Total number of models for glm : 1296
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
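The model counts printed above are simply the sizes of the Cartesian products of the hyperparameter grids; for example, xgboost's 2592 = 2 × 3 × 4 × 3 × 4 × 3 × 3. A quick check:

```python
from math import prod

# Number of values per tunable hyperparameter, taken from the log above.
xgboost_grid = {"column_sampling": 2, "min_impurity": 3, "lambda1": 4,
                "shrinkage_factor": 3, "max_depth": 4,
                "min_node_size": 3, "iter_num": 3}
svm_grid = {"lambda1": 3, "alpha": 2, "tolerance": 2, "initial_eta": 2,
            "momentum": 3, "iter_num_no_change": 3,
            "local_sgd_iterations": 2, "iter_max": 3, "batch_size": 4}
glm_grid = {"lambda1": 3, "alpha": 2, "initial_eta": 2, "momentum": 3,
            "iter_num_no_change": 3, "iter_max": 3, "batch_size": 4}

assert prod(xgboost_grid.values()) == 2592
assert prod(svm_grid.values()) == 5184
assert prod(glm_grid.values()) == 1296
```

With `max_models=13`, only a small fraction of these combinations is actually trained.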
      
      
      Performing hyperParameter tuning ...
      
      xgboost
      
      ----------------------------------------------------------------------------------------------------
      
      svm
      
      ----------------------------------------------------------------------------------------------------
      
      glm
      
      ----------------------------------------------------------------------------------------------------
      
      Evaluating models performance ...
      
      Evaluation completed.
      
      Leaderboard
      Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
      0	1	GLM_3	rfe	0.811189	0.811189	0.811189	0.811189	0.802198	0.795455	0.798413	0.809806	0.811189	0.810124
      1	2	GLM_4	pca	0.811189	0.811189	0.811189	0.811189	0.803978	0.792045	0.796843	0.809512	0.811189	0.809301
      2	3	GLM_1	lasso	0.804196	0.804196	0.804196	0.804196	0.797330	0.782955	0.788462	0.802365	0.804196	0.801775
      3	4	GLM_2	rfe	0.804196	0.804196	0.804196	0.804196	0.799867	0.779545	0.786658	0.802782	0.804196	0.800774
      4	5	GLM_5	pca	0.804196	0.804196	0.804196	0.804196	0.803061	0.776136	0.784731	0.803768	0.804196	0.799669
      5	6	XGBOOST_2	rfe	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
      6	7	XGBOOST_3	rfe	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
      7	8	XGBOOST_4	pca	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
      8	9	SVM_1	lasso	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
      9	10	SVM_3	rfe	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
      10	11	SVM_4	pca	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
      11	12	XGBOOST_0	lasso	0.797203	0.797203	0.797203	0.797203	0.785598	0.790909	0.787866	0.799782	0.797203	0.798136
      12	13	GLM_0	lasso	0.797203	0.797203	0.797203	0.797203	0.788602	0.777273	0.781794	0.795203	0.797203	0.795175
      
      13 rows X 13 columns
      
      
      1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
      Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 17/17
  3. Get model leaderboard.
    >>> aml.leaderboard()
    
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	GLM_3	rfe	0.811189	0.811189	0.811189	0.811189	0.802198	0.795455	0.798413	0.809806	0.811189	0.810124
    1	2	GLM_4	pca	0.811189	0.811189	0.811189	0.811189	0.803978	0.792045	0.796843	0.809512	0.811189	0.809301
    2	3	GLM_1	lasso	0.804196	0.804196	0.804196	0.804196	0.797330	0.782955	0.788462	0.802365	0.804196	0.801775
    3	4	GLM_2	rfe	0.804196	0.804196	0.804196	0.804196	0.799867	0.779545	0.786658	0.802782	0.804196	0.800774
    4	5	GLM_5	pca	0.804196	0.804196	0.804196	0.804196	0.803061	0.776136	0.784731	0.803768	0.804196	0.799669
    5	6	XGBOOST_2	rfe	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
    6	7	XGBOOST_3	rfe	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
    7	8	XGBOOST_4	pca	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
    8	9	SVM_1	lasso	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
    9	10	SVM_3	rfe	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
    10	11	SVM_4	pca	0.804196	0.804196	0.804196	0.804196	0.806977	0.772727	0.782675	0.805367	0.804196	0.798457
    11	12	XGBOOST_0	lasso	0.797203	0.797203	0.797203	0.797203	0.785598	0.790909	0.787866	0.799782	0.797203	0.798136
    12	13	GLM_0	lasso	0.797203	0.797203	0.797203	0.797203	0.788602	0.777273	0.781794	0.795203	0.797203	0.795175
  4. Get best performing model.
    >>> aml.leader()
    
    Rank	Model-ID	Feature-Selection	Accuracy	Micro-Precision	Micro-Recall	Micro-F1	Macro-Precision	Macro-Recall	Macro-F1	Weighted-Precision	Weighted-Recall	Weighted-F1
    0	1	GLM_3	rfe	0.811189	0.811189	0.811189	0.811189	0.802198	0.795455	0.798413	0.809806	0.811189	0.810124
  5. Generate prediction on the validation dataset using the model at rank 4.
    In the data preparation phase, AutoML generates the validation dataset by splitting the data provided during fitting into training and testing sets. Model training uses the training set, while the testing set serves as the validation dataset for model evaluation.
    >>> prediction = aml.predict(rank=4)
    Following model is being used for generating prediction :
    Model ID : GLM_2 
    Feature Selection Method : rfe
    
    
     Prediction : 
        id  prediction      prob  survived
    0  120         1.0  0.920487         1
    1  242         0.0  0.881366         0
    2  650         1.0  0.930461         1
    3  244         0.0  0.875438         0
    4  486         1.0  0.801441         1
    5  747         0.0  0.881366         0
    6  202         0.0  0.791899         0
    7  122         0.0  0.881265         0
    8  549         1.0  0.528228         0
    9  774         0.0  0.568290         0
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    1               1  CLASS_2       10       37   0.787234  0.672727  0.725490       55
    0               0  CLASS_1       78       18   0.812500  0.886364  0.847826       88
    
     ROC-AUC : 
    AUC	GINI
    0.7413223140495868	0.48264462809917363
    threshold_value	tpr	fpr
    0.04081632653061224	0.6727272727272727	0.11363636363636363
    0.08163265306122448	0.6727272727272727	0.11363636363636363
    0.1020408163265306	0.6727272727272727	0.11363636363636363
    0.12244897959183673	0.6727272727272727	0.11363636363636363
    0.16326530612244897	0.6727272727272727	0.11363636363636363
    0.18367346938775508	0.6727272727272727	0.11363636363636363
    0.14285714285714285	0.6727272727272727	0.11363636363636363
    0.061224489795918366	0.6727272727272727	0.11363636363636363
    0.02040816326530612	0.6727272727272727	0.11363636363636363
    0.0	1.0	1.0
    
     Confusion Matrix : 
    array([[78, 10],
           [18, 37]], dtype=int64)
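The per-class metrics above follow directly from the confusion matrix, and GINI is a linear rescaling of AUC (GINI = 2 × AUC − 1). A sketch verifying both against the printed values:

```python
# Confusion matrix from above: rows = actual class, columns = predicted class.
cm = [[78, 10],   # actual 0: 78 predicted as 0, 10 predicted as 1
      [18, 37]]   # actual 1: 18 predicted as 0, 37 predicted as 1

# Class 1 ("CLASS_2" in the metrics table):
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])   # 37 / 47
recall_1    = cm[1][1] / (cm[1][0] + cm[1][1])   # 37 / 55
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

assert round(precision_1, 6) == 0.787234
assert round(recall_1, 6) == 0.672727
assert round(f1_1, 6) == 0.725490

# GINI = 2 * AUC - 1, matching the ROC-AUC output above.
auc = 0.7413223140495868
gini = 2 * auc - 1
assert abs(gini - 0.48264462809917363) < 1e-12
```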
    
    >>> prediction.head()
    
    id	prediction	prob	survived
    120	1.0	0.9204874367314612	1
    242	0.0	0.8813661538418516	0
    650	1.0	0.9304605628045387	1
    244	0.0	0.8754375621867835	0
    486	1.0	0.8014409602906027	1
    747	0.0	0.8813661538418516	0
    202	0.0	0.7918987528859897	0
    122	0.0	0.8812647618359073	0
    549	1.0	0.5282277418701877	0
    774	0.0	0.5682901048373651	0
  6. Generate prediction on the test dataset using the model at rank 13.
    >>> prediction = aml.predict(titanic_test, rank=13)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    
    Updated dataset after dropping futile columns :
    passenger	survived	pclass	sex	age	sibsp	parch	fare	cabin	embarked	id
    301	1	3	female	None	0	0	7.75	None	Q	9
    282	0	3	male	28	0	0	7.8542	None	S	15
    15	0	3	female	14	0	0	7.8542	None	S	23
    40	1	3	female	14	1	0	11.2417	None	C	10
    242	1	3	female	None	1	0	15.5	None	Q	12
    240	0	2	male	33	0	0	12.275	None	S	20
    713	1	1	male	48	1	0	52.0	C126	S	11
    854	1	1	female	16	0	1	39.4	D28	S	19
    795	0	3	male	25	0	0	7.8958	None	S	14
    244	0	3	male	22	0	0	7.125	None	S	22
    
    Updated dataset after performing target column transformation :
    id	pclass	fare	embarked	age	cabin	parch	sibsp	sex	passenger	survived
    8	3	7.225	C	None	None	0	0	male	774	0
    14	3	7.8958	S	25	None	0	0	male	795	0
    22	3	7.125	S	22	None	0	0	male	244	0
    11	1	52.0	S	48	C126	0	1	male	713	1
    12	3	15.5	Q	None	None	0	1	female	242	1
    20	2	12.275	S	33	None	0	0	male	240	0
    15	3	7.8542	S	28	None	0	0	male	282	0
    23	3	7.8542	S	14	None	0	0	female	15	0
    10	3	11.2417	C	14	None	0	1	female	40	1
    18	2	13.0	S	24	None	0	0	female	200	0
    
    Updated dataset after dropping missing value containing columns : 
    id	pclass	fare	embarked	age	parch	sibsp	sex	passenger	survived
    8	3	7.225	C	None	0	0	male	774	0
    10	3	11.2417	C	14	0	1	female	40	1
    18	2	13.0	S	24	0	0	female	200	0
    14	3	7.8958	S	25	0	0	male	795	0
    12	3	15.5	Q	None	0	1	female	242	1
    20	2	12.275	S	33	0	0	male	240	0
    11	1	52.0	S	48	0	1	male	713	1
    19	1	39.4	S	16	1	0	female	854	1
    15	3	7.8542	S	28	0	0	male	282	0
    23	3	7.8542	S	14	0	0	female	15	0
    
    Updated dataset after imputing missing value containing columns :
    id	pclass	fare	embarked	age	parch	sibsp	sex	passenger	survived
    34	3	14.4542	C	15	0	1	female	831	1
    13	2	13.0	S	28	0	0	female	444	1
    11	1	52.0	S	48	0	1	male	713	1
    9	3	7.75	Q	29	0	0	female	301	1
    68	3	6.4375	C	34	0	0	male	844	0
    87	3	7.2292	C	29	0	0	male	569	0
    89	3	27.9	S	4	2	3	male	64	0
    156	3	8.05	S	29	0	0	male	46	0
    17	2	10.5	S	66	0	0	male	34	0
    101	2	26.0	S	44	0	1	male	237	0
    result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713326001300629"'
    
    Updated dataset after performing categorical encoding :
    id	pclass	fare	embarked_0	embarked_1	embarked_2	age	parch	sibsp	sex_0	sex_1	passenger	survived
    34	3	14.4542	1	0	0	15	0	1	1	0	831	1
    13	2	13.0	0	0	1	28	0	0	1	0	444	1
    11	1	52.0	0	0	1	48	0	1	0	1	713	1
    9	3	7.75	0	1	0	29	0	0	1	0	301	1
    68	3	6.4375	1	0	0	34	0	0	0	1	844	0
    87	3	7.2292	1	0	0	29	0	0	0	1	569	0
    89	3	27.9	0	0	1	4	2	3	0	1	64	0
    156	3	8.05	0	0	1	29	0	0	0	1	46	0
    17	2	10.5	0	0	1	66	0	0	0	1	34	0
    101	2	26.0	0	0	1	44	0	1	0	1	237	0
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"automl_user"."ml__td_sqlmr_persist_out__1713326491643092"'
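The encoding step above expands `embarked` into `embarked_0`..`embarked_2` and `sex` into `sex_0`/`sex_1`. AutoML performs this in-database; a minimal local sketch of the same idea with pandas (column names here follow pandas conventions, not the in-database ones):

```python
import pandas as pd

# Illustrative sketch only: one-hot encoding of the categorical columns,
# mirroring the in-database step above on a toy sample.
df = pd.DataFrame({"embarked": ["C", "Q", "S"],
                   "sex": ["male", "female", "male"]})
encoded = pd.get_dummies(df, columns=["embarked", "sex"], dtype=int)
# -> columns: embarked_C, embarked_Q, embarked_S, sex_female, sex_male
```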
    
    Updated dataset after performing Lasso feature selection:
    id	sex_1	embarked_0	pclass	fare	age	sibsp	sex_0	embarked_2	passenger	embarked_1	survived
    139	1	0	3	7.75	29	0	0	0	460	1	0
    97	1	0	2	73.5	18	0	0	1	386	0	0
    15	1	0	3	7.8542	28	0	0	1	282	0	0
    32	1	0	3	8.05	43	0	0	1	669	0	0
    51	0	0	2	13.0	38	0	1	1	358	0	0
    108	1	0	3	7.0542	51	0	0	1	632	0	0
    133	1	0	1	28.5	45	0	0	1	332	0	0
    179	1	1	2	24.0	30	1	0	0	309	0	0
    78	0	0	3	7.775	25	0	1	1	247	0	0
    162	1	1	3	7.225	29	0	0	0	355	0	0
    
    Updated dataset after performing scaling on Lasso selected features :
    id	sex_1	embarked_0	survived	sex_0	embarked_2	embarked_1	pclass	fare	age	sibsp	passenger
    61	0	0	1	1	1	0	1.0	0.20966666666666667	-0.0625	0.5	0.1923509561304837
    101	1	0	0	0	1	0	0.5	0.4896421845574388	0.8333333333333334	0.5	0.2643419572553431
    17	1	0	0	0	1	0	0.5	0.19774011299435026	1.2916666666666667	0.0	0.0359955005624297
    40	1	0	0	0	1	0	0.0	1.152071563088512	0.875	0.5	0.10236220472440945
    122	1	0	0	0	1	0	0.5	0.19774011299435026	0.6458333333333334	0.0	0.9122609673790776
    19	0	0	1	1	1	0	0.0	0.7419962335216572	0.25	0.0	0.9583802024746907
    99	0	0	1	1	1	0	0.0	3.979990583804143	0.5208333333333334	0.0	0.8200224971878515
    95	1	0	1	0	1	0	0.5	0.5461393596986818	-0.08333333333333333	0.0	0.08661417322834646
    162	1	1	0	0	0	0	1.0	0.13606403013182672	0.5208333333333334	0.0	0.39707536557930256
    78	0	0	0	1	1	0	1.0	0.14642184557438795	0.4375	0.0	0.2755905511811024
    
    Updated dataset after performing RFE feature selection:
    id	pclass	age	sex_0	sex_1	passenger	fare	survived
    160	3	36	0	1	664	7.4958	0
    116	3	1	1	0	382	15.7417	1
    154	3	21	1	0	437	34.375	0
    28	3	44	0	1	604	8.05	0
    167	3	45	1	0	168	27.9	0
    163	2	29	1	0	597	33.0	1
    188	3	29	1	0	410	25.4667	0
    36	3	20	1	0	114	9.825	0
    141	2	54	0	1	250	26.0	0
    61	3	1	1	0	173	11.1333	1
    
    Updated dataset after performing scaling on RFE selected features :
    id	r_sex_1	r_sex_0	survived	r_pclass	r_age	r_passenger	r_fare
    80	1	0	0	1.0	0.375	0.4431946006749156	0.14681355932203388
    162	1	0	0	1.0	0.5208333333333334	0.39707536557930256	0.13606403013182672
    78	0	1	0	1.0	0.4375	0.2755905511811024	0.14642184557438795
    122	1	0	0	0.5	0.6458333333333334	0.9122609673790776	0.19774011299435026
    99	0	1	1	0.0	0.5208333333333334	0.8200224971878515	3.979990583804143
    95	1	0	1	0.5	-0.08333333333333333	0.08661417322834646	0.5461393596986818
    61	0	1	1	1.0	-0.0625	0.1923509561304837	0.20966666666666667
    141	1	0	0	0.5	1.0416666666666667	0.27896512935883017	0.4896421845574388
    101	1	0	0	0.5	0.8333333333333334	0.2643419572553431	0.4896421845574388
    17	1	0	0	0.5	1.2916666666666667	0.0359955005624297	0.19774011299435026
    
    Updated dataset after performing scaling for PCA feature selection :
    id	sex_1	embarked_0	parch	survived	sex_0	embarked_2	embarked_1	passenger	pclass	age	sibsp	fare
    122	1	0	0	0	0	1	0	0.9122609673790776	0.5	0.6458333333333334	0.0	0.19774011299435026
    61	0	0	1	1	1	1	0	0.1923509561304837	1.0	-0.0625	0.5	0.20966666666666667
    141	1	0	0	0	0	1	0	0.27896512935883017	0.5	1.0416666666666667	0.5	0.4896421845574388
    40	1	0	0	0	0	1	0	0.10236220472440945	0.0	0.875	0.5	1.152071563088512
    101	1	0	0	0	0	1	0	0.2643419572553431	0.5	0.8333333333333334	0.5	0.4896421845574388
    17	1	0	0	0	0	1	0	0.0359955005624297	0.5	1.2916666666666667	0.0	0.19774011299435026
    162	1	1	0	0	0	0	0	0.39707536557930256	1.0	0.5208333333333334	0.0	0.13606403013182672
    78	0	0	0	0	1	1	0	0.2755905511811024	1.0	0.4375	0.0	0.14642184557438795
    99	0	0	0	1	1	1	0	0.8200224971878515	0.0	0.5208333333333334	0.0	3.979990583804143
    95	1	0	2	1	0	1	0	0.08661417322834646	0.5	-0.08333333333333333	0.0	0.5461393596986818
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	survived
    0	183	1.010659	0.087650	1.015931	0.540143	0.427567	-0.295310	1
    1	101	-0.508872	-0.061286	-0.395290	0.094121	0.437546	0.196669	0
    2	40	-0.372873	-0.029375	-0.969576	0.345081	0.664177	0.172283	0
    3	122	-0.580576	-0.086576	-0.265136	0.176906	-0.382837	0.010624	0
    4	80	-0.659454	-0.112057	0.207767	-0.152085	0.008856	-0.126807	0
    ...	...	...	...	...	...	...	...	...
    173	103	0.851246	-0.646328	-0.718562	0.252324	-0.018630	0.420395	1
    174	168	-0.551856	-0.072476	-0.219560	0.024517	0.179139	-0.274188	0
    175	166	-0.677373	-0.119088	0.163534	-0.037520	-0.340589	0.049311	0
    176	23	0.619989	-0.697611	0.415298	-0.379321	0.293878	-0.337729	0
    177	164	0.932179	-0.652585	-1.199964	0.536862	0.406523	0.124425	1
    178 rows × 8 columns
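The PCA step projects the scaled features down to six components (`col_0` .. `col_5`). The projection can be sketched locally with NumPy's SVD; the matrix below is random stand-in data with the same shape, not the actual Titanic features:

```python
import numpy as np

# Illustrative sketch only: PCA via SVD of a centered matrix, projecting
# a (178, 12) stand-in feature matrix down to 6 components.
rng = np.random.default_rng(0)
X = rng.random((178, 12))
Xc = X - X.mean(axis=0)               # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:6].T                 # shape (178, 6), like col_0..col_5
```

Components come out ordered by explained variance, so `X_pca[:, 0]` carries the most variance, as with `col_0` above.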
    
    Data Transformation completed.
    
     Following model is being used for generating prediction :
    Model ID : GLM_0 
    Feature Selection Method : lasso
    
     Prediction : 
        id  prediction      prob  survived
    0  101         0.0  0.842330         0
    1   40         0.0  0.613704         0
    2  120         0.0  0.883461         0
    3  122         0.0  0.865047         0
    4   61         1.0  0.797549         1
    5  141         0.0  0.876889         0
    6  162         0.0  0.865844         0
    7   78         1.0  0.655147         0
    8   99         1.0  0.950835         1
    9   95         0.0  0.582591         1
    
     Performance Metrics : 
           Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
    SeqNum                                                                              
    0               0  CLASS_1       78       21   0.787879  0.715596  0.750000      109
    1               1  CLASS_2       31       48   0.607595  0.695652  0.648649       69
    
     ROC-AUC : 
    AUC	GINI
    0.6067012365376944	0.21340247307538873
    threshold_value	tpr	fpr
    0.04081632653061224	0.6956521739130435	0.28440366972477066
    0.08163265306122448	0.6956521739130435	0.28440366972477066
    0.1020408163265306	0.6956521739130435	0.28440366972477066
    0.12244897959183673	0.6956521739130435	0.28440366972477066
    0.16326530612244897	0.6956521739130435	0.28440366972477066
    0.18367346938775508	0.6956521739130435	0.28440366972477066
    0.14285714285714285	0.6956521739130435	0.28440366972477066
    0.061224489795918366	0.6956521739130435	0.28440366972477066
    0.02040816326530612	0.6956521739130435	0.28440366972477066
    0.0	1.0	1.0
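The GINI value is a linear rescaling of the AUC, GINI = 2 × AUC − 1, which can be checked against the numbers above:

```python
# GINI is a linear rescaling of AUC: GINI = 2 * AUC - 1.
auc = 0.6067012365376944      # from the ROC-AUC output above
gini = 2 * auc - 1            # ~0.2134, matching the GINI reported above
```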
    
     Confusion Matrix : 
    array([[78, 31],
           [21, 48]], dtype=int64)
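The per-class precision, recall, and F1 in the metrics table can be recomputed from this confusion matrix (rows are actual classes, columns are predicted classes):

```python
import numpy as np

# Recompute the per-class metrics from the confusion matrix shown above.
cm = np.array([[78, 31],
               [21, 48]])

metrics = {}
for i, label in enumerate(["CLASS_1", "CLASS_2"]):
    tp = cm[i, i]
    precision = tp / cm[:, i].sum()   # true positives / predicted as class i
    recall = tp / cm[i, :].sum()      # true positives / actual class i
    f1 = 2 * precision * recall / (precision + recall)
    metrics[label] = (precision, recall, f1)
# CLASS_1 -> precision 0.787879, recall 0.715596, F1 0.750000
# CLASS_2 -> precision 0.607595, recall 0.695652, F1 0.648649
```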
    >>> prediction.head()
    id  prediction      prob  survived
    101        0.0  0.842330         0
    40         0.0  0.613704         0
    120        0.0  0.883461         0
    122        0.0  0.865047         0
    61         1.0  0.797549         1
    141        0.0  0.876889         0
    162        0.0  0.865844         0
    78         1.0  0.655147         0
    99         1.0  0.950835         1
    95         0.0  0.582591         1
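As a quick sanity check, comparing `prediction` against `survived` over just the ten rows displayed above gives 8 correct out of 10:

```python
# Accuracy over only the ten displayed rows: (prediction, survived) pairs
# transcribed from the prediction.head() output above.
rows = [(0, 0), (0, 0), (0, 0), (0, 0), (1, 1),
        (0, 0), (0, 0), (1, 0), (1, 1), (0, 1)]
accuracy = sum(int(p == y) for p, y in rows) / len(rows)  # 0.8
```

This is only a spot check on the displayed sample; the full performance metrics above are computed over all 178 test rows.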