Example 1: Run AutoML for Regression Problem with Early Stopping Timer and Metrics Threshold

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Language: English (United States)
Last Update: 2024-12-18
Product Category: Teradata Vantage

This example predicts the price of a house based on different factors.

Run AutoML to get the best performing model with the following specifications:
  • Set the early stopping criteria: a time limit of 300 seconds and an R2 performance metric threshold of 0.7.
  • Exclude the 'knn' model from the default model training list.
  • Use verbose level 2 to get detailed logging.
  1. Load the example dataset.
    >>> load_example_data("decisionforestpredict", ["housing_train", "housing_test"])
    >>> housing_train = DataFrame.from_table("housing_train")
    >>> housing_test = DataFrame.from_table("housing_test")
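    The calls above assume an active Vantage connection and the relevant teradataml imports. A minimal setup sketch, with placeholder host and credentials, could look like this:
    >>> # Setup sketch (placeholders, not part of the original example):
    >>> # import the teradataml entry points used here and connect to Vantage.
    >>> from teradataml import create_context, load_example_data, DataFrame, AutoML
    >>> create_context(host="<hostname>", username="<username>", password="<password>")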
  2. Create an AutoML instance.
    >>> aml = AutoML(task_type="Regression",
                     exclude=['knn'],
                     verbose=2,
                     max_runtime_secs=300,
                     stopping_metric='R2',
                     stopping_tolerance=0.7)
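    The same constructor accepts other early stopping settings. As an illustrative variation only (not part of this example), a stricter run could raise both the time budget and the R2 threshold and exclude additional model types:
    >>> # Variation sketch: longer time limit, higher R2 threshold, more exclusions.
    >>> aml_strict = AutoML(task_type="Regression",
                            exclude=['knn', 'svm'],
                            verbose=2,
                            max_runtime_secs=600,
                            stopping_metric='R2',
                            stopping_tolerance=0.8)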
  3. Fit the data.
    >>> aml.fit(housing_train, housing_train.price)
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    
    Data Overview:
    Total Rows in the data: 492
    Total Columns in the data: 14
    
    
    Column Summary:
    ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
    sn	INTEGER	492	0	None	0	492	0	0.0	100.0
    recroom	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    garagepl	INTEGER	492	0	None	270	222	0	0.0	100.0
    fullbase	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    gashw	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    price	FLOAT	492	0	None	0	492	0	0.0	100.0
    bathrms	INTEGER	492	0	None	0	492	0	0.0	100.0
    prefarea	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    airco	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    stories	INTEGER	492	0	None	0	492	0	0.0	100.0
    bedrooms	INTEGER	492	0	None	0	492	0	0.0	100.0
    lotsize	FLOAT	492	0	None	0	492	0	0.0	100.0
    homestyle	VARCHAR(20) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    driveway	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    
    
    Statistics of Data:
    func	sn	price	lotsize	bedrooms	bathrms	stories	garagepl
    min	1	25000	1650	1	1	1	0
    std	159.501	26472.496	2182.443	0.731	0.51	0.861	0.854
    25%	132.5	49975	3600	2	1	1	0
    50%	274	62000	4616	3	1	2	0
    75%	413.25	82000	6370	3	2	2	1
    max	546	190000	16200	6	4	4	3
    mean	272.943	68100.396	5181.795	2.965	1.293	1.803	0.685
    count	492	492	492	492	492	492	492
    
    Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    driveway                  2         
    recroom                   2         
    fullbase                  2         
    gashw                     2         
    airco                     2         
    prefarea                  2         
    homestyle                 3         
    
    No Futile columns found.
    
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0    stories           7.113821
    1   bedrooms           2.235772
    2    bathrms           0.203252
    3   garagepl           2.235772
    4    lotsize           2.235772
    5      price           2.439024
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Feature Engineering started ...
    
    Handling duplicate records present in dataset ...
    Analysis completed. No action taken.                                                    
    
    Total time to handle duplicate records: 1.55 sec
    
    Handling less significant features from data ...
    Analysis indicates all categorical columns are significant. No action Needed.           
    
    Total time to handle less significant features: 19.75 sec
    
    Handling Date Features ...
    Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    
    Total time to handle date features: 0.00 sec
    
    Checking Missing values in dataset ...
    Analysis Completed. No Missing Values Detected.                                          
    
    Total time to find missing values in data: 8.82 sec
    
    Imputing Missing Values ...
    Analysis completed. No imputation required.                                              
    
    Time taken to perform imputation: 0.00 sec
    
    Performing encoding for categorical columns ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814119199335"'8
    
    ONE HOT Encoding these Columns:
    ['driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'prefarea', 'homestyle']
    
    Sample of dataset after performing one hot encoding:
    sn	price	lotsize	bedrooms	bathrms	stories	driveway_0	driveway_1	recroom_0	recroom_1	fullbase_0	fullbase_1	gashw_0	gashw_1	airco_0	airco_1	garagepl	prefarea_0	prefarea_1	homestyle_0	homestyle_1	homestyle_2	id
    242	52000.0	3000.0	2	1	2	0	1	1	0	1	0	1	0	0	1	0	1	0	0	0	1	28
    425	65500.0	3840.0	3	1	2	0	1	1	0	1	0	1	0	1	0	1	0	1	0	0	1	44
    118	94500.0	4000.0	3	2	2	0	1	1	0	0	1	1	0	0	1	1	1	0	0	0	1	52
    240	30000.0	3000.0	4	1	2	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	60
    362	145000.0	8580.0	4	3	4	0	1	1	0	1	0	1	0	0	1	2	0	1	1	0	0	76
    259	33500.0	3640.0	2	1	1	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	84
    505	71500.0	8150.0	3	2	1	0	1	0	1	0	1	1	0	1	0	0	1	0	0	0	1	68
    507	75000.0	9800.0	4	2	2	0	1	0	1	1	0	1	0	1	0	2	1	0	0	0	1	36
    345	88000.0	4500.0	3	1	4	0	1	1	0	1	0	1	0	0	1	0	1	0	0	0	1	20
    80	63900.0	6360.0	2	1	1	0	1	1	0	0	1	1	0	0	1	1	1	0	0	0	1	12
    
    492 rows X 23 columns
    
    Time taken to encode the columns: 14.51 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Data preparation started ...
    
    Spliting of dataset into training and testing ...
    Training size : 0.8                                                                      
    Testing size  : 0.2                                                                      
    
    Training data sample
    sn	price	lotsize	bedrooms	bathrms	stories	driveway_0	driveway_1	recroom_0	recroom_1	fullbase_0	fullbase_1	gashw_0	gashw_1	airco_0	airco_1	garagepl	prefarea_0	prefarea_1	homestyle_0	homestyle_1	homestyle_2	id
    40	54500.0	3150.0	2	2	1	1	0	1	0	0	1	1	0	1	0	0	1	0	0	0	1	10
    122	80000.0	10500.0	4	2	2	0	1	1	0	1	0	1	0	1	0	1	1	0	0	0	1	11
    387	83900.0	11460.0	3	1	3	0	1	1	0	1	0	1	0	1	0	2	0	1	0	0	1	19
    326	99000.0	8880.0	3	2	2	0	1	1	0	0	1	1	0	0	1	1	1	0	0	0	1	13
    61	48000.0	4120.0	2	1	2	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	14
    244	27000.0	3649.0	2	1	1	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	22
    427	49500.0	5320.0	2	1	1	0	1	1	0	1	0	1	0	1	0	1	0	1	0	1	0	15
    223	70100.0	4200.0	3	1	2	0	1	1	0	1	0	1	0	1	0	1	1	0	0	0	1	23
    265	50000.0	3640.0	2	1	1	0	1	1	0	1	0	1	0	1	0	1	1	0	0	1	0	9
    101	57000.0	4500.0	3	2	2	1	0	1	0	0	1	1	0	0	1	0	1	0	0	0	1	17
    
    393 rows X 23 columns
    
    Testing data sample
    sn	price	lotsize	bedrooms	bathrms	stories	driveway_0	driveway_1	recroom_0	recroom_1	fullbase_0	fullbase_1	gashw_0	gashw_1	airco_0	airco_1	garagepl	prefarea_0	prefarea_1	homestyle_0	homestyle_1	homestyle_2	id
    385	78000.0	6600.0	4	2	2	0	1	0	1	0	1	1	0	1	0	0	0	1	0	0	1	29
    284	45000.0	6750.0	2	1	1	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	25
    354	86000.0	6800.0	2	1	1	0	1	0	1	0	1	1	0	1	0	2	1	0	0	0	1	121
    488	44100.0	8100.0	2	1	1	0	1	1	0	1	0	1	0	1	0	1	1	0	0	1	0	30
    202	53900.0	2520.0	5	2	1	1	0	1	0	0	1	1	0	0	1	1	1	0	0	0	1	31
    32	48000.0	3500.0	4	1	2	0	1	1	0	1	0	1	0	0	1	2	1	0	0	1	0	127
    448	120000.0	5500.0	4	2	2	0	1	1	0	0	1	1	0	0	1	1	0	1	1	0	0	27
    91	47000.0	6060.0	3	1	1	0	1	0	1	0	1	1	0	1	0	0	1	0	0	1	0	123
    242	52000.0	3000.0	2	1	2	0	1	1	0	1	0	1	0	0	1	0	1	0	0	0	1	28
    379	84000.0	7160.0	3	1	1	0	1	1	0	0	1	1	0	1	0	2	0	1	0	0	1	124
    
    99 rows X 23 columns
    
    Time taken for spliting of data: 10.83 sec
    
    Outlier preprocessing ...
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0    stories           7.113821
    1   bedrooms           2.235772
    2   garagepl           2.235772
    3    bathrms           0.203252
    4    lotsize           2.235772
    5      price           2.439024
    
    Deleting rows of these columns:
    ['price', 'bathrms', 'garagepl', 'bedrooms', 'lotsize', 'stories']
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713813731333800"'8
    
    Sample of training dataset after removing outlier rows:
    sn	price	lotsize	bedrooms	bathrms	stories	driveway_0	driveway_1	recroom_0	recroom_1	fullbase_0	fullbase_1	gashw_0	gashw_1	airco_0	airco_1	garagepl	prefarea_0	prefarea_1	homestyle_0	homestyle_1	homestyle_2	id
    278	65500.0	4000.0	3	1	2	0	1	1	0	1	0	1	0	1	0	1	1	0	0	0	1	74
    175	50000.0	3036.0	3	1	2	0	1	1	0	0	1	1	0	1	0	0	1	0	0	1	0	98
    478	88500.0	5500.0	3	2	1	0	1	0	1	0	1	1	0	1	0	2	0	1	0	0	1	130
    415	52000.0	2850.0	3	2	2	1	0	1	0	0	1	1	0	1	0	0	0	1	0	0	1	154
    455	74900.0	6050.0	3	1	1	0	1	1	0	0	1	1	0	1	0	0	0	1	0	0	1	178
    127	117000.0	5960.0	3	3	2	0	1	0	1	0	1	1	0	1	0	1	1	0	1	0	0	194
    7	66000.0	3880.0	3	2	2	0	1	1	0	0	1	1	0	1	0	2	1	0	0	0	1	162
    461	47600.0	2145.0	3	1	2	0	1	1	0	0	1	1	0	1	0	0	0	1	0	1	0	90
    423	61100.0	3400.0	3	1	2	0	1	1	0	0	1	1	0	1	0	2	0	1	0	0	1	58
    57	25245.0	2400.0	3	1	1	1	0	1	0	1	0	1	0	1	0	0	1	0	0	1	0	34
    
    192 rows X 23 columns
    
    Time Taken by Outlier processing: 33.67 sec
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814511698109"'8
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713818540523806"'
    
    Feature selection using lasso ...
    
    feature selected by lasso:
    ['stories', 'prefarea_1', 'sn', 'bathrms', 'fullbase_0', 'recroom_0', 'homestyle_1', 'recroom_1', 'garagepl', 'driveway_0', 'prefarea_0', 'fullbase_1', 'airco_1', 'driveway_1', 'homestyle_0', 'lotsize']
    
    Total time taken by feature selection: 0.97 sec
    
    scaling Features of lasso data ...
    
    columns that will be scaled:
    ['stories', 'sn', 'bathrms', 'garagepl', 'lotsize']
    
    Training dataset sample after scaling:
    fullbase_0	recroom_1	prefarea_0	driveway_0	fullbase_1	prefarea_1	airco_1	driveway_1	price	homestyle_0	recroom_0	homestyle_1	id	stories	sn	bathrms	garagepl	lotsize
    0	0	1	0	1	0	0	1	44700.0	0	1	1	426	-1.1754677333050345	-0.6087119947083474	-0.5026028286234501	-0.7628533219155937	0.26705222388917843
    1	0	1	1	0	0	0	0	38000.0	0	1	1	133	-1.1754677333050345	-0.6986239543619377	-0.5026028286234501	-0.7628533219155937	-1.3291743663387832
    1	0	1	0	0	0	1	1	84000.0	0	1	0	136	1.8337296639558538	1.6519315622962105	1.6905731508243313	-0.7628533219155937	0.7904478921368461
    1	0	1	0	0	0	0	1	65000.0	0	1	0	108	-1.1754677333050345	-0.9426707019931115	-0.5026028286234501	1.8067578676948275	-0.6365213924388846
    0	0	0	0	1	1	1	1	73000.0	0	1	0	182	-1.1754677333050345	1.0996152387098697	-0.5026028286234501	-0.7628533219155937	0.5821312082571773
    1	0	0	0	0	1	0	1	60000.0	0	1	0	512	1.8337296639558538	1.2152163296930574	-0.5026028286234501	0.5219522728896168	-1.477600003603047
    1	0	1	0	0	0	0	1	62000.0	0	1	0	192	1.8337296639558538	-0.8591810251719205	1.6905731508243313	0.5219522728896168	-0.2511355272614975
    1	0	1	0	0	0	0	1	70000.0	0	1	0	72	1.8337296639558538	0.38031956148114676	-0.5026028286234501	-0.7628533219155937	-1.0479468431012304
    1	1	0	0	0	1	1	1	78000.0	0	0	0	285	1.8337296639558538	0.4830760867995358	-0.5026028286234501	-0.7628533219155937	0.5821312082571773
    0	1	0	0	1	1	0	1	88500.0	0	0	0	130	-1.1754677333050345	1.2601723095198525	1.6905731508243313	1.8067578676948275	0.2696561824376743
    
    192 rows X 18 columns
    
    Testing dataset sample after scaling:
    fullbase_0	recroom_1	prefarea_0	driveway_0	fullbase_1	prefarea_1	airco_1	driveway_1	price	homestyle_0	recroom_0	homestyle_1	id	stories	sn	bathrms	garagepl	lotsize
    1	0	1	0	0	0	0	1	87250.0	0	1	0	492	1.8337296639558538	-0.36466524707717346	-0.5026028286234501	0.5219522728896168	-0.907333081482454
    1	1	1	0	0	0	1	1	103000.0	1	0	0	368	3.338328362586298	1.536330471313023	1.6905731508243313	0.5219522728896168	1.4049821095818689
    0	0	0	0	1	1	0	1	93000.0	0	1	0	454	1.8337296639558538	1.2409054610226546	-0.5026028286234501	-0.7628533219155937	0.8789824827857053
    1	0	1	0	0	0	0	1	30000.0	0	1	1	247	-1.1754677333050345	0.046360854196382535	-0.5026028286234501	0.5219522728896168	-0.8448380763185533
    1	0	0	0	0	1	1	1	100500.0	1	1	0	205	1.8337296639558538	0.5858326121179247	1.6905731508243313	1.8067578676948275	0.7175370527789621
    0	1	1	1	1	0	0	0	72000.0	0	0	0	211	-1.1754677333050345	-0.46742177239556243	-0.5026028286234501	-0.7628533219155937	-0.7510955685727024
    1	0	0	0	0	1	1	1	112000.0	1	1	0	200	3.338328362586298	0.8427239254138973	1.6905731508243313	-0.7628533219155937	0.7175370527789621
    0	1	1	0	1	0	1	1	70000.0	0	0	0	451	-1.1754677333050345	0.17480651084436877	-0.5026028286234501	-0.7628533219155937	0.4258936953474258
    1	0	1	0	0	0	1	1	120000.0	1	1	0	417	3.338328362586298	1.6069755824694154	-0.5026028286234501	1.8067578676948275	1.050843746986432
    1	0	1	1	0	0	0	0	41000.0	0	1	1	187	-1.1754677333050345	-1.507831591244251	-0.5026028286234501	-0.7628533219155937	-1.0114914234222883
    
    99 rows X 18 columns
    
    Total time taken by feature scaling: 48.99 sec
    
    Feature selection using rfe ...
    
    feature selected by RFE:
    ['sn', 'bathrms', 'homestyle_1', 'garagepl', 'homestyle_2', 'airco_0', 'homestyle_0', 'lotsize']
    
    Total time taken by feature selection: 41.78 sec
    
    scaling Features of rfe data ...
    
    columns that will be scaled:
    ['r_sn', 'r_bathrms', 'r_garagepl', 'r_lotsize']
    
    Training dataset sample after scaling:
    r_homestyle_1	r_airco_0	r_homestyle_2	r_homestyle_0	price	id	r_sn	r_bathrms	r_garagepl	r_lotsize
    1	1	0	0	44700.0	426	-0.6087119947083474	-0.5026028286234501	-0.7628533219155937	0.26705222388917843
    1	1	0	0	38000.0	133	-0.6986239543619377	-0.5026028286234501	-0.7628533219155937	-1.3291743663387832
    0	0	1	0	84000.0	136	1.6519315622962105	1.6905731508243313	-0.7628533219155937	0.7904478921368461
    0	1	1	0	65000.0	108	-0.9426707019931115	-0.5026028286234501	1.8067578676948275	-0.6365213924388846
    0	0	1	0	73000.0	182	1.0996152387098697	-0.5026028286234501	-0.7628533219155937	0.5821312082571773
    0	1	1	0	60000.0	512	1.2152163296930574	-0.5026028286234501	0.5219522728896168	-1.477600003603047
    0	1	1	0	62000.0	192	-0.8591810251719205	1.6905731508243313	0.5219522728896168	-0.2511355272614975
    0	1	1	0	70000.0	72	0.38031956148114676	-0.5026028286234501	-0.7628533219155937	-1.0479468431012304
    0	0	1	0	78000.0	285	0.4830760867995358	-0.5026028286234501	-0.7628533219155937	0.5821312082571773
    0	1	1	0	88500.0	130	1.2601723095198525	1.6905731508243313	1.8067578676948275	0.2696561824376743
    
    192 rows X 10 columns
    
    Testing dataset sample after scaling:
    r_homestyle_1	r_airco_0	r_homestyle_2	r_homestyle_0	price	id	r_sn	r_bathrms	r_garagepl	r_lotsize
    0	1	1	0	87250.0	492	-0.36466524707717346	-0.5026028286234501	0.5219522728896168	-0.907333081482454
    0	0	0	1	103000.0	368	1.536330471313023	1.6905731508243313	0.5219522728896168	1.4049821095818689
    0	1	1	0	93000.0	454	1.2409054610226546	-0.5026028286234501	-0.7628533219155937	0.8789824827857053
    1	1	0	0	30000.0	247	0.046360854196382535	-0.5026028286234501	0.5219522728896168	-0.8448380763185533
    0	0	0	1	100500.0	205	0.5858326121179247	1.6905731508243313	1.8067578676948275	0.7175370527789621
    0	1	1	0	72000.0	211	-0.46742177239556243	-0.5026028286234501	-0.7628533219155937	-0.7510955685727024
    0	0	0	1	112000.0	200	0.8427239254138973	1.6905731508243313	-0.7628533219155937	0.7175370527789621
    0	0	1	0	70000.0	451	0.17480651084436877	-0.5026028286234501	-0.7628533219155937	0.4258936953474258
    0	0	0	1	120000.0	417	1.6069755824694154	-0.5026028286234501	1.8067578676948275	1.050843746986432
    1	1	0	0	41000.0	187	-1.507831591244251	-0.5026028286234501	-0.7628533219155937	-1.0114914234222883
    
    99 rows X 10 columns
    
    Total time taken by feature scaling: 39.84 sec
    
    scaling Features of pca data ...
    
    columns that will be scaled:
    ['sn', 'lotsize', 'bathrms', 'stories', 'garagepl']
    
    Training dataset sample after scaling:
    recroom_0	recroom_1	prefarea_0	driveway_0	bedrooms	prefarea_1	fullbase_1	airco_1	homestyle_2	gashw_0	driveway_1	gashw_1	airco_0	homestyle_0	fullbase_0	price	homestyle_1	id	sn	lotsize	bathrms	stories	garagepl
    1	0	1	1	3	0	1	1	1	1	0	0	0	0	0	57000.0	0	17	-1.1610283182946874	-0.25113552726149785	1.690573150824329	0.32913096532540864	-0.7628533219155942
    1	0	0	0	3	1	0	0	1	1	1	0	1	0	1	65500.0	0	44	0.9197913194026884	-0.5948580556629517	-0.5026028286234494	0.32913096532540864	0.5219522728896171
    1	0	1	0	3	0	1	1	1	1	1	0	0	0	0	94500.0	0	52	-1.0518495101438992	-0.5115313821110842	1.690573150824329	0.32913096532540864	0.5219522728896171
    1	0	1	0	3	0	0	0	1	1	1	0	1	0	1	70100.0	0	23	-0.3775098127419719	-0.40737304017124965	-0.5026028286234494	0.32913096532540864	0.5219522728896171
    1	0	1	0	3	0	0	0	0	1	1	0	1	0	1	35500.0	1	51	-1.4307641972554586	-0.3032146982314151	-0.5026028286234494	0.32913096532540864	-0.7628533219155942
    1	0	1	0	3	0	0	1	1	1	1	0	0	0	1	98000.0	0	59	0.2711407533303583	0.5300520372872609	-0.5026028286234494	-1.175467733305031	0.5219522728896171
    1	0	1	0	3	0	0	0	1	1	1	0	1	0	1	56000.0	0	38	-0.9041370049987152	-1.0323230918102566	-0.5026028286234494	0.32913096532540864	-0.7628533219155942
    0	1	0	0	3	1	1	0	1	1	1	0	1	0	0	86000.0	0	46	0.7977679455871016	0.9987645760165162	1.690573150824329	-1.175467733305031	-0.7628533219155942
    1	0	1	0	3	0	1	1	1	1	1	0	0	0	0	99000.0	0	13	0.28398531899515694	2.029932161220878	1.690573150824329	0.32913096532540864	0.5219522728896171
    1	0	1	0	3	0	0	0	1	0	1	1	1	0	1	60000.0	0	21	0.14911737951477144	0.42589369534742644	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    
    192 rows X 23 columns
    
    Testing dataset sample after scaling:
    recroom_0	recroom_1	prefarea_0	driveway_0	bedrooms	prefarea_1	fullbase_1	airco_1	homestyle_2	gashw_0	driveway_1	gashw_1	airco_0	homestyle_0	fullbase_0	price	homestyle_1	id	sn	lotsize	bathrms	stories	garagepl
    1	0	1	0	3	0	1	0	1	1	1	0	1	0	0	52000.0	0	26	-0.525222317887156	-0.7354718172817283	-0.5026028286234494	0.32913096532540864	-0.7628533219155942
    1	0	0	0	4	1	1	1	0	1	1	0	0	1	0	120000.0	0	27	1.0675038245478725	0.2696561824376747	1.690573150824329	0.32913096532540864	0.5219522728896171
    0	1	1	0	3	0	1	0	0	1	1	0	1	0	0	47000.0	1	123	-1.2252511466186806	0.5612995398692113	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	1	0	2	0	1	0	1	0	1	1	1	0	0	99000.0	0	24	0.5408766322911293	4.2797523471213035	-0.5026028286234494	-1.175467733305031	0.5219522728896171
    1	0	1	1	5	0	1	1	1	1	0	0	0	0	0	53900.0	0	31	-0.5123777522223574	-1.2823031124658595	1.690573150824329	-1.175467733305031	0.5219522728896171
    1	0	1	0	4	0	0	1	0	1	1	0	0	0	1	48000.0	1	127	-1.6041658337302398	-0.7719272369606704	-0.5026028286234494	0.32913096532540864	1.8067578676948284
    1	0	1	0	2	0	0	0	0	1	1	0	1	0	1	44100.0	1	30	1.324395137843845	1.6237146276555232	-0.5026028286234494	-1.175467733305031	0.5219522728896171
    1	0	1	1	3	0	0	0	0	1	0	0	1	0	1	42000.0	1	126	-0.8206473281775242	-0.7198480659907531	-0.5026028286234494	0.32913096532540864	0.5219522728896171
    1	0	1	0	2	0	0	1	1	1	1	0	0	0	1	52000.0	0	28	-0.255486438926385	-1.0323230918102566	-0.5026028286234494	0.32913096532540864	-0.7628533219155942
    1	0	0	0	3	1	1	0	1	1	1	0	1	0	0	84000.0	0	124	0.6243663091123203	1.1341704205383012	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    
    99 rows X 23 columns
    
    Total time taken by feature scaling: 49.69 sec
    
    Dimension Reduction using pca ...
    
    PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9']
    
    Total time taken by PCA: 11.21 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Model Training started ...
    
    Hyperparameters used for model training:
    response_column : price                                                                                                                               
    name : glm
    family : GAUSSIAN
    lambda1 : (0.001, 0.02, 0.1)
    alpha : (0.15, 0.85)
    learning_rate : ('invtime', 'constant', 'adaptive')
    initial_eta : (0.05, 0.1)
    momentum : (0.65, 0.8, 0.95)
    iter_num_no_change : (5, 10, 50)
    iter_max : (300, 200, 400, 500)
    batch_size : (10, 80, 100, 150)
    Total number of models for glm : 5184
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : price
    name : xgboost
    model_type : Regression
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1, 0.2, 0.3)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.01, 0.05, 0.1)
    max_depth : (5, 3, 4, 7, 8)
    min_node_size : (1, 2, 3, 4)
    iter_num : (10, 20, 30, 40)
    Total number of models for xgboost : 10240
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : price
    name : decision_forest
    tree_type : Regression
    min_impurity : (0.0, 0.1, 0.2, 0.3)
    max_depth : (5, 3, 4, 7, 8)
    min_node_size : (1, 2, 3, 4)
    num_trees : (-1, 20, 30, 40)
    Total number of models for decision_forest : 320
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : price
    name : svm
    model_type : regression
    lambda1 : (0.001, 0.02, 0.1)
    alpha : (0.15, 0.85)
    tolerance : (0.001, 0.01)
    learning_rate : ('Invtime', 'Adaptive', 'constant')
    initial_eta : (0.05, 0.1)
    momentum : (0.65, 0.8, 0.95)
    nesterov : True
    intercept : True
    iter_num_no_change : (5, 10, 50)
    local_sgd_iterations  : (10, 20)
    iter_max : (300, 200, 400, 500)
    batch_size : (10, 80, 100, 150)
    Total number of models for svm : 20736
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    Performing hyperParameter tuning ...
    
    glm
    
    ----------------------------------------------------------------------------------------------------
    
    xgboost
    
    ----------------------------------------------------------------------------------------------------
    
    decision_forest
    
    ----------------------------------------------------------------------------------------------------
    
    svm
    
    ----------------------------------------------------------------------------------------------------
    
    Evaluating models performance ...
    
    Evaluation completed.
    
    Leaderboard
    Rank	Model-ID	Feature-Selection	MAE	MSE	MSLE	RMSE	RMSLE	R2-score	Adjusted R2-score
    0	1	XGBOOST_3	lasso	9628.455479	1.483976e+08	0.034632	12181.855382	0.186097	0.783807	0.741623
    1	2	XGBOOST_0	lasso	9628.455479	1.483976e+08	0.034632	12181.855382	0.186097	0.783807	0.741623
    2	3	GLM_3	lasso	10200.663327	1.775318e+08	0.037768	13324.104150	0.194340	0.741363	0.690898
    3	4	DECISIONFOREST_0	lasso	11490.006156	2.148976e+08	0.046107	14659.386825	0.214725	0.686927	0.625840
    4	5	GLM_1	rfe	13133.449531	3.101527e+08	0.061828	17611.153178	0.248652	0.548155	0.507991
    5	6	GLM_2	pca	12825.056171	3.332267e+08	0.058128	18254.497063	0.241098	0.514540	0.459374
    6	7	GLM_0	lasso	14148.183643	3.375269e+08	0.069698	18371.906254	0.264003	0.508275	0.412328
    7	8	XGBOOST_2	pca	13922.629788	3.710428e+08	0.064012	19262.470688	0.253006	0.459447	0.398021
    8	9	DECISIONFOREST_2	pca	12586.068723	3.734912e+08	0.059352	19325.919884	0.243623	0.455880	0.394049
    9	10	DECISIONFOREST_3	lasso	16432.237374	4.326218e+08	0.085381	20799.562257	0.292201	0.369736	0.246758
    10	11	XGBOOST_1	rfe	21225.126287	6.860673e+08	0.153932	26192.885820	0.392342	0.000505	-0.088339
    11	12	DECISIONFOREST_1	rfe	22215.932941	7.373789e+08	0.161533	27154.721120	0.401911	-0.074248	-0.169737
    12	13	SVM_0	lasso	66220.929293	5.071762e+09	51.035352	71216.300842	7.143903	-6.388782	-7.830496
    13	14	SVM_1	rfe	66243.121212	5.075075e+09	57.353084	71239.561004	7.573182	-6.393609	-7.050819
    14	15	SVM_3	lasso	66269.585859	5.078071e+09	61.891356	71260.586170	7.867106	-6.397974	-7.841481
    15	16	SVM_2	pca	66271.212121	5.078287e+09	0.000000	71262.102818	0.000000	-6.398289	-7.239004
    
    16 rows X 10 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 18/18
  4. Display model leaderboard.
    >>> aml.leaderboard()
    Rank	Model-ID	Feature-Selection	MAE	MSE	MSLE	RMSE	RMSLE	R2-score	Adjusted R2-score
    0	1	XGBOOST_3	lasso	9628.455479	1.483976e+08	0.034632	12181.855382	0.186097	0.783807	0.741623
    1	2	XGBOOST_0	lasso	9628.455479	1.483976e+08	0.034632	12181.855382	0.186097	0.783807	0.741623
    2	3	GLM_3	lasso	10200.663327	1.775318e+08	0.037768	13324.104150	0.194340	0.741363	0.690898
    3	4	DECISIONFOREST_0	lasso	11490.006156	2.148976e+08	0.046107	14659.386825	0.214725	0.686927	0.625840
    4	5	GLM_1	rfe	13133.449531	3.101527e+08	0.061828	17611.153178	0.248652	0.548155	0.507991
    5	6	GLM_2	pca	12825.056171	3.332267e+08	0.058128	18254.497063	0.241098	0.514540	0.459374
    6	7	GLM_0	lasso	14148.183643	3.375269e+08	0.069698	18371.906254	0.264003	0.508275	0.412328
    7	8	XGBOOST_2	pca	13922.629788	3.710428e+08	0.064012	19262.470688	0.253006	0.459447	0.398021
    8	9	DECISIONFOREST_2	pca	12586.068723	3.734912e+08	0.059352	19325.919884	0.243623	0.455880	0.394049
    9	10	DECISIONFOREST_3	lasso	16432.237374	4.326218e+08	0.085381	20799.562257	0.292201	0.369736	0.246758
    10	11	XGBOOST_1	rfe	21225.126287	6.860673e+08	0.153932	26192.885820	0.392342	0.000505	-0.088339
    11	12	DECISIONFOREST_1	rfe	22215.932941	7.373789e+08	0.161533	27154.721120	0.401911	-0.074248	-0.169737
    12	13	SVM_0	lasso	66220.929293	5.071762e+09	51.035352	71216.300842	7.143903	-6.388782	-7.830496
    13	14	SVM_1	rfe	66243.121212	5.075075e+09	57.353084	71239.561004	7.573182	-6.393609	-7.050819
    14	15	SVM_3	lasso	66269.585859	5.078071e+09	61.891356	71260.586170	7.867106	-6.397974	-7.841481
    15	16	SVM_2	pca	66271.212121	5.078287e+09	0.000000	71262.102818	0.000000	-6.398289	-7.239004
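    The leaderboard can also be filtered programmatically. A sketch, assuming leaderboard() returns a pandas DataFrame as the printed output suggests:
    >>> # Sketch: keep only models that meet the 0.7 R2 threshold used for early stopping.
    >>> lb = aml.leaderboard()
    >>> lb[lb['R2-score'] >= 0.7][['Rank', 'Model-ID', 'Feature-Selection', 'R2-score']]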
  5. Display the best performing model.
    >>> aml.leader()
    Rank	Model-ID	Feature-Selection	MAE	MSE	MSLE	RMSE	RMSLE	R2-score	Adjusted R2-score
    0	1	XGBOOST_3	lasso	9628.455479	1.483976e+08	0.034632	12181.855382	0.186097	0.783807	0.741623
  6. Generate prediction on validation dataset using best performing model.
    During the data preparation phase, AutoML splits the data supplied to fit() into training and testing sets. Model training uses the training set, and the testing set serves as the validation dataset for model evaluation.
    >>> prediction = aml.predict()
    Following model is being used for generating prediction :
    Model ID : XGBOOST_3 
    Feature Selection Method : lasso
     Prediction : 
        id     Prediction  Confidence_Lower  Confidence_upper     price
    0  492   63523.430206      -7990.015251     135036.875663   87250.0
    1  368  102759.944778     -17752.673004     223272.562559  103000.0
    2  454   77782.022661     -14282.526959     169846.572282   93000.0
    3  247   45022.847275      -6566.519982      96612.214533   30000.0
    4  205  117354.767915     -17115.856482     251825.392313  100500.0
    5  211   56755.264163      -8626.057137     122136.585463   72000.0
    6  200  116854.017302     -17656.674664     251364.709267  112000.0
    7  451   66682.350190      -8409.325274     141774.025654   70000.0
    8  417  107685.061593     -13027.683164     228397.806351  120000.0
    9  187   37198.515964      -7152.411752      81549.443681   41000.0
     Performance Metrics : 
               MAE           MSE      MSLE       MAPE       MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
    0  9628.455479  1.483976e+08  0.034632  15.784078 -5.485578  12181.855382  0.186097  52320.270431  0.783807  0.785125  2084.728019  0.033508
    
    >>> prediction
    id	Prediction	Confidence_Lower	Confidence_upper	price
    26	55320.3275035	-14932.158036819601	125572.8130438196	52000.0
    28	68406.69997650001	-3935.3562509474723	140748.75620394747	52000.0
    29	86185.303965	-12767.850251373035	185138.45818137302	78000.0
    30	54504.629284999995	-7394.386681102078	116403.64525110208	44100.0
    120	110679.72956949998	-18432.318021989675	239791.77716098964	163000.0
    121	71314.912363	-14548.560484102243	157178.38521010225	86000.0
    31	61366.627566999996	-10317.129140241195	133050.3842742412	53900.0
    27	114092.16751700001	-20268.7281787322	248453.0632127322	120000.0
    25	52893.839117999996	-6981.666775837832	112769.34501183782	45000.0
    24	80870.8686425	-11200.480495712312	172942.21778071232	99000.0
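    The returned prediction is a teradataml DataFrame, so it can be pulled client-side for additional checks. A sketch, assuming to_pandas() access on the result:
    >>> # Sketch: bring the predictions to the client and inspect the residuals.
    >>> pred_df = prediction.to_pandas()
    >>> pred_df['residual'] = pred_df['price'] - pred_df['Prediction']
    >>> pred_df['residual'].abs().mean()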
  7. Generate prediction on validation dataset using third best performing model.
    >>> prediction = aml.predict(rank=3)
    Following model is being used for generating prediction :
    Model ID : GLM_3 
    Feature Selection Method : lasso
     Prediction : 
        id     prediction     price
    0  492   66186.826718   87250.0
    1  368  119753.381829  103000.0
    2  454   78665.886884   93000.0
    3  247   45251.710670   30000.0
    4  205  113365.000670  100500.0
    5  211   51491.354759   72000.0
    6  200  111707.987096  112000.0
    7  451   74287.740695   70000.0
    8  417  112877.191516  120000.0
    9  187   30706.236368   41000.0
     Performance Metrics : 
                MAE           MSE      MSLE       MAPE       MPE         RMSE    RMSLE            ME        R2        EV          MPD       MGD
    0  10200.663327  1.775318e+08  0.037768  16.300732 -5.466382  13324.10415  0.19434  54055.219486  0.741363  0.742799  2391.983615  0.037205
    >>> prediction.head()
    id	prediction	price
    26	63880.087955143106	52000.0
    28	64887.03364189595	52000.0
    29	81838.00634272449	78000.0
    30	59020.9083389725	44100.0
    120	108944.78051355484	163000.0
    121	76648.23791163946	86000.0
    31	64051.73639523083	53900.0
    27	108332.60522274605	120000.0
    25	51235.20813818371	45000.0
    24	89398.59783781157	99000.0
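    Predictions from different ranks can also be compared side by side on the client. A sketch, using the column names shown above and assuming to_pandas() access on the results:
    >>> # Sketch: join rank-1 ('Prediction') and rank-3 ('prediction') results on id.
    >>> p_best = aml.predict().to_pandas().reset_index()
    >>> p_third = aml.predict(rank=3).to_pandas().reset_index()
    >>> p_best.merge(p_third[['id', 'prediction']], on='id')[['id', 'price', 'Prediction', 'prediction']].head()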
  8. Generate prediction on test dataset using best performing model.
    >>> prediction = aml.predict(housing_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713818093862300"'
    
    Updated dataset after performing categorical encoding :
    sn	price	lotsize	bedrooms	bathrms	stories	driveway_0	driveway_1	recroom_0	recroom_1	fullbase_0	fullbase_1	gashw_0	gashw_1	airco_0	airco_1	garagepl	prefarea_0	prefarea_1	homestyle_0	homestyle_1	homestyle_2	id
    260	41000.0	6000.0	2	1	1	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	10
    469	55000.0	2176.0	2	1	2	0	1	0	1	1	0	1	0	1	0	0	0	1	0	0	1	8
    364	72000.0	10700.0	3	1	2	0	1	0	1	0	1	1	0	1	0	0	1	0	0	0	1	16
    53	68000.0	9166.0	2	1	1	0	1	1	0	0	1	1	0	0	1	2	1	0	0	0	1	11
    255	61000.0	4360.0	4	1	2	0	1	1	0	1	0	1	0	1	0	0	1	0	0	0	1	15
    16	37900.0	3185.0	2	1	1	0	1	1	0	1	0	1	0	0	1	0	1	0	0	1	0	23
    251	48500.0	3450.0	3	1	1	0	1	1	0	0	1	1	0	1	0	2	1	0	0	1	0	14
    408	87500.0	6420.0	3	1	3	0	1	1	0	0	1	1	0	1	0	0	0	1	0	0	1	22
    301	55000.0	4080.0	2	1	1	0	1	1	0	1	0	1	0	1	0	0	1	0	0	0	1	9
    13	27000.0	1700.0	3	1	2	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	17
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815079036517"'
    
    Updated dataset after performing Lasso feature selection:
    id	stories	prefarea_1	sn	bathrms	fullbase_0	recroom_0	homestyle_1	recroom_1	garagepl	driveway_0	prefarea_0	fullbase_1	airco_1	driveway_1	homestyle_0	lotsize	price
    48	1	0	25	1	1	1	1	0	0	0	1	0	0	1	0	4960.0	42000.0
    44	1	0	239	1	0	1	1	0	2	0	1	1	0	1	0	3000.0	26000.0
    39	1	1	441	1	1	1	0	0	2	0	0	0	0	1	0	3520.0	51900.0
    51	1	1	443	1	1	1	0	0	0	0	0	0	0	1	0	3520.0	65000.0
    33	1	1	411	1	0	1	0	0	1	0	0	1	0	1	0	9000.0	90000.0
    25	1	0	249	1	1	1	1	0	0	0	1	0	0	1	0	3500.0	44500.0
    12	4	0	38	1	1	1	0	0	0	0	1	0	1	1	0	5170.0	67000.0
    67	4	0	317	1	1	1	0	0	0	0	1	0	0	1	0	5000.0	80000.0
    22	3	1	408	1	0	1	0	0	0	0	0	1	0	1	0	6420.0	87500.0
    75	1	0	294	1	1	1	1	0	0	0	1	0	0	1	0	4040.0	47000.0
    
    Updated dataset after performing scaling on Lasso selected features :
    fullbase_0	recroom_1	prefarea_0	driveway_0	fullbase_1	prefarea_1	airco_1	driveway_1	price	homestyle_0	recroom_0	homestyle_1	id	stories	sn	bathrms	garagepl	lotsize
    0	0	0	0	1	1	0	1	87500.0	0	1	0	22	1.8337296639558538	0.8106125112519007	-0.5026028286234501	-0.7628533219155937	0.7487845553609124
    0	1	0	0	1	1	1	1	92500.0	0	0	0	40	-1.1754677333050345	0.7656565314251055	-0.5026028286234501	1.8067578676948275	1.2643683479630925
    1	0	1	0	0	0	0	1	42000.0	0	1	1	48	-1.1754677333050345	-1.6491218135570358	-0.5026028286234501	-0.7628533219155937	-0.011571340799878474
    1	0	1	0	0	0	1	1	64000.0	0	1	0	29	-1.1754677333050345	0.15553966234717084	-0.5026028286234501	0.5219522728896168	0.47016099067185546
    1	0	0	0	0	1	0	1	51900.0	0	1	0	39	-1.1754677333050345	1.022547844721078	-0.5026028286234501	1.8067578676948275	-0.7615114027666858
    1	0	0	0	0	1	0	1	65000.0	0	1	0	51	-1.1754677333050345	1.0353924103858767	-0.5026028286234501	-0.7628533219155937	-0.7615114027666858
    1	0	1	0	0	0	0	1	47000.0	0	1	1	75	-1.1754677333050345	0.0784722683583791	-0.5026028286234501	-0.7628533219155937	-0.49069971372311655
    0	0	0	0	1	1	0	1	90000.0	0	1	0	33	-1.1754677333050345	0.8298793597490987	-0.5026028286234501	0.5219522728896168	2.0924271663847755
    1	0	1	0	0	0	0	1	44500.0	0	1	1	25	-1.1754677333050345	-0.21053045909958995	-0.5026028286234501	-0.7628533219155937	-0.7719272369606693
    0	0	1	0	1	0	0	1	26000.0	0	1	1	44	-1.1754677333050345	-0.27475328742358307	-0.5026028286234501	1.8067578676948275	-1.0323230918102553
    
    Updated dataset after performing RFE feature selection:
    id	sn	bathrms	homestyle_1	garagepl	homestyle_2	airco_0	homestyle_0	lotsize	price
    48	25	1	1	0	0	1	0	4960.0	42000.0
    44	239	1	1	2	0	1	0	3000.0	26000.0
    39	441	1	0	2	1	1	0	3520.0	51900.0
    51	443	1	0	0	1	1	0	3520.0	65000.0
    33	411	1	0	1	1	1	0	9000.0	90000.0
    25	249	1	1	0	0	1	0	3500.0	44500.0
    12	38	1	0	0	1	0	0	5170.0	67000.0
    67	317	1	0	0	1	1	0	5000.0	80000.0
    22	408	1	0	0	1	1	0	6420.0	87500.0
    75	294	1	1	0	0	1	0	4040.0	47000.0
    
    Updated dataset after performing scaling on RFE selected features :
    r_homestyle_1	r_homestyle_2	r_airco_0	r_homestyle_0	price	id	r_sn	r_bathrms	r_garagepl	r_lotsize
    1	0	1	0	42000.0	48	-1.6491218135570358	-0.5026028286234501	-0.7628533219155937	-0.011571340799878474
    1	0	1	0	26000.0	44	-0.27475328742358307	-0.5026028286234501	1.8067578676948275	-1.0323230918102553
    0	1	1	0	51900.0	39	1.022547844721078	-0.5026028286234501	1.8067578676948275	-0.7615114027666858
    0	1	1	0	65000.0	51	1.0353924103858767	-0.5026028286234501	-0.7628533219155937	-0.7615114027666858
    0	1	1	0	90000.0	33	0.8298793597490987	-0.5026028286234501	0.5219522728896168	2.0924271663847755
    1	0	1	0	44500.0	25	-0.21053045909958995	-0.5026028286234501	-0.7628533219155937	-0.7719272369606693
    0	1	0	0	67000.0	12	-1.5656321367358448	-0.5026028286234501	-0.7628533219155937	0.09779491823694761
    0	1	1	0	80000.0	67	0.22618477350356328	-0.5026028286234501	-0.7628533219155937	0.009260327588088398
    0	1	1	0	87500.0	22	0.8106125112519007	-0.5026028286234501	-0.7628533219155937	0.7487845553609124
    1	0	1	0	47000.0	75	0.0784722683583791	-0.5026028286234501	-0.7628533219155937	-0.49069971372311655
    
    Updated dataset after performing scaling for PCA feature selection :
    recroom_0	recroom_1	prefarea_0	driveway_0	bedrooms	prefarea_1	fullbase_1	airco_1	homestyle_2	gashw_0	driveway_1	gashw_1	airco_0	homestyle_0	fullbase_0	price	homestyle_1	id	sn	lotsize	bathrms	stories	garagepl
    1	0	0	0	3	1	1	0	1	1	1	0	1	0	0	87500.0	0	22	0.8106125112519003	0.7487845553609134	-0.5026028286234494	1.8337296639558482	-0.7628533219155942
    0	1	0	0	3	1	1	1	1	1	1	0	0	0	0	92500.0	0	40	0.765656531425105	1.2643683479630943	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    1	0	1	0	2	0	0	0	0	1	1	0	1	0	1	42000.0	1	48	-1.649121813557035	-0.01157134079987849	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	1	0	2	0	0	1	1	1	1	0	0	0	1	64000.0	0	29	0.15553966234717076	0.4701609906718561	-0.5026028286234494	-1.175467733305031	0.5219522728896171
    1	0	0	0	3	1	0	0	1	1	1	0	1	0	1	51900.0	0	39	1.0225478447210774	-0.7615114027666869	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    1	0	0	0	3	1	0	0	1	1	1	0	1	0	1	65000.0	0	51	1.035392410385876	-0.7615114027666869	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	1	0	2	0	0	0	0	1	1	0	1	0	1	47000.0	1	75	0.07847226835837905	-0.4906997137231172	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	0	0	3	1	1	0	1	1	1	0	1	0	0	90000.0	0	33	0.8298793597490982	2.0924271663847787	-0.5026028286234494	-1.175467733305031	0.5219522728896171
    1	0	1	0	2	0	0	0	0	1	1	0	1	0	1	44500.0	1	25	-0.21053045909958984	-0.7719272369606704	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	1	0	2	0	1	0	0	1	1	0	1	0	0	26000.0	1	44	-0.2747532874235829	-1.0323230918102566	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	col_6	col_7	col_8	col_9	price
    0	24	0.121058	-0.088188	1.823276	-1.499059	0.807124	-0.001108	-0.486189	-0.327831	0.999693	0.041307	64900.0
    1	12	-1.702851	2.244144	-0.163064	1.531449	-1.726684	1.435539	-0.336577	0.312267	-0.721428	-0.019429	67000.0
    2	22	0.436066	1.674569	-1.184835	-0.517098	-0.149681	0.581587	-1.015373	0.346597	-1.125296	-0.054246	87500.0
    3	40	2.732152	-1.196533	-0.243570	-0.130603	0.061535	0.992846	0.669891	0.796020	0.182694	0.034230	92500.0
    4	67	-1.071889	2.634593	-1.143026	1.133870	-0.666936	0.448109	-1.183424	0.156071	-0.190941	-0.026246	80000.0
    5	48	-1.466384	-1.594199	0.440659	-0.192916	-0.727838	-0.698315	-0.370444	0.078220	-0.266739	-0.400643	42000.0
    6	29	0.713664	-1.049937	-0.245372	0.286599	-0.760120	-0.026496	0.770101	-0.793006	0.435955	0.026876	64000.0
    7	44	-0.084548	-1.910265	0.033862	0.687256	1.538283	0.149991	0.443854	0.136846	0.043304	-1.148533	26000.0
    8	39	1.145001	-1.114998	-1.120372	0.694225	1.703701	-0.350166	0.571300	-0.292433	0.199957	0.403042	51900.0
    9	51	0.035124	-0.359414	-1.219128	-1.145653	0.772859	-0.721654	0.162539	-0.630366	0.185628	0.517291	65000.0
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_3 
    Feature Selection Method : lasso
    
     Prediction : 
       id    Prediction  Confidence_Lower  Confidence_upper    price
    0  48  52832.052623      -1130.634446     106794.739693  42000.0
    1  44  48048.913899      -1342.559233      97440.387030  26000.0
    2  39  59237.625097     -11283.510389     129758.760583  51900.0
    3  51  59341.776847     -11013.321662     129696.875356  65000.0
    4  33  81820.074617     -14655.044080     178295.193314  90000.0
    5  25  47145.112130      -5338.347479      99628.571740  44500.0
    6  22  83148.275354     -13682.206803     179978.757511  87500.0
    7  12  82823.957607      -8536.222191     174184.137404  67000.0
    8  67  76352.427511     -13523.627091     166228.482114  80000.0
    9  75  41305.158083      -8937.031699      91547.347865  47000.0
    
     Performance Metrics : 
               MAE           MSE      MSLE       MAPE       MPE         RMSE    RMSLE            ME        R2        EV          MPD       MGD
    0  7530.032113  9.803829e+07  0.032812  14.340231 -6.394165  9901.428629  0.18114  31376.260153  0.695521  0.702174  1675.413575  0.030915
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	price
    10	50847.987966999994	-2114.7423931209414	103810.71832712094	41000.0
    12	82823.9576065	-8536.222191293302	174184.1374042933	67000.0
    13	42366.405568999995	-7506.190055977691	92239.00119397769	49000.0
    14	50464.071062999996	-2300.8744488123702	103229.01657481235	48500.0
    16	72076.95612349999	-17375.200759916668	161529.11300691665	72000.0
    17	30842.594055999998	-5288.270194834851	66973.45830683484	27000.0
    15	62003.993956999984	-9860.1547885412	133868.14270254117	61000.0
    11	74293.7388875	-8882.318131336462	157469.79590633645	68000.0
    9	60351.362888	-11434.725754912419	132137.45153091243	55000.0
    8	59123.682122	-16171.248326522546	134418.61257052253	55000.0
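    If the scored test set needs to be kept, the prediction DataFrame can be written back to Vantage. A sketch, assuming the standard teradataml DataFrame.to_sql() call and an illustrative table name:
    >>> # Sketch: persist the scored test data as a permanent table.
    >>> prediction.to_sql(table_name="housing_test_scored", if_exists="replace")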
  9. Generate prediction on test dataset using second best performing model.
    >>> prediction = aml.predict(housing_test, 2)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815255642391"'
    
    Updated dataset after performing categorical encoding :
    sn	price	lotsize	bedrooms	bathrms	stories	driveway_0	driveway_1	recroom_0	recroom_1	fullbase_0	fullbase_1	gashw_0	gashw_1	airco_0	airco_1	garagepl	prefarea_0	prefarea_1	homestyle_0	homestyle_1	homestyle_2	id
    53	68000.0	9166.0	2	1	1	0	1	1	0	0	1	1	0	0	1	2	1	0	0	0	1	11
    463	49000.0	2610.0	3	1	2	0	1	1	0	0	1	1	0	1	0	0	0	1	0	1	0	13
    459	44555.0	2398.0	3	1	1	0	1	1	0	1	0	1	0	1	0	0	0	1	0	1	0	21
    38	67000.0	5170.0	3	1	4	0	1	1	0	1	0	1	0	0	1	0	1	0	0	0	1	12
    251	48500.0	3450.0	3	1	1	0	1	1	0	0	1	1	0	1	0	2	1	0	0	1	0	14
    408	87500.0	6420.0	3	1	3	0	1	1	0	0	1	1	0	1	0	0	0	1	0	0	1	22
    255	61000.0	4360.0	4	1	2	0	1	1	0	1	0	1	0	1	0	0	1	0	0	0	1	15
    16	37900.0	3185.0	2	1	1	0	1	1	0	1	0	1	0	0	1	0	1	0	0	1	0	23
    301	55000.0	4080.0	2	1	1	0	1	1	0	1	0	1	0	1	0	0	1	0	0	0	1	9
    13	27000.0	1700.0	3	1	2	0	1	1	0	1	0	1	0	1	0	0	1	0	0	1	0	17
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814881895584"'
    
    Updated dataset after performing Lasso feature selection:
    id	stories	prefarea_1	sn	bathrms	fullbase_0	recroom_0	homestyle_1	recroom_1	garagepl	driveway_0	prefarea_0	fullbase_1	airco_1	driveway_1	homestyle_0	lotsize	price
    23	1	0	16	1	1	1	1	0	0	0	1	0	1	1	0	3185.0	37900.0
    32	1	1	403	1	0	0	0	1	0	0	0	1	1	1	0	6825.0	77500.0
    24	1	0	274	2	0	0	0	1	0	0	1	1	0	1	0	4100.0	64900.0
    75	1	0	294	1	1	1	1	0	0	0	1	0	0	1	0	4040.0	47000.0
    52	1	0	111	1	1	1	1	0	0	1	1	0	0	0	0	5076.0	43000.0
    11	1	0	53	1	0	1	0	0	2	0	1	1	1	1	0	9166.0	68000.0
    39	1	1	441	1	1	1	0	0	2	0	0	0	0	1	0	3520.0	51900.0
    29	1	0	306	1	1	1	0	0	1	0	1	0	1	1	0	5885.0	64000.0
    22	3	1	408	1	0	1	0	0	0	0	0	1	0	1	0	6420.0	87500.0
    51	1	1	443	1	1	1	0	0	0	0	0	0	0	1	0	3520.0	65000.0
    
    Updated dataset after performing scaling on Lasso selected features :
    fullbase_0	recroom_1	prefarea_0	driveway_0	fullbase_1	prefarea_1	airco_1	driveway_1	price	homestyle_0	recroom_0	homestyle_1	id	stories	sn	bathrms	garagepl	lotsize
    0	1	0	0	1	1	1	1	77500.0	0	0	0	32	-1.1754677333050345	0.7785010970899041	-0.5026028286234501	-0.7628533219155937	0.9597051977890769
    1	0	1	0	0	0	0	1	47000.0	0	1	1	75	-1.1754677333050345	0.0784722683583791	-0.5026028286234501	-0.7628533219155937	-0.49069971372311655
    1	0	0	0	0	1	0	1	65000.0	0	1	0	51	-1.1754677333050345	1.0353924103858767	-0.5026028286234501	-0.7628533219155937	-0.7615114027666858
    1	0	1	1	0	0	0	0	43000.0	0	1	1	52	-1.1754677333050345	-1.096805489970695	-0.5026028286234501	-0.7628533219155937	0.04884049752522546
    1	0	0	0	0	1	0	1	51900.0	0	1	0	39	-1.1754677333050345	1.022547844721078	-0.5026028286234501	1.8067578676948275	-0.7615114027666858
    1	0	1	0	0	0	1	1	64000.0	0	1	0	29	-1.1754677333050345	0.15553966234717084	-0.5026028286234501	0.5219522728896168	0.47016099067185546
    1	0	1	0	0	0	0	1	80000.0	0	1	0	67	3.338328362586298	0.22618477350356328	-0.5026028286234501	-0.7628533219155937	0.009260327588088398
    1	0	1	0	0	0	1	1	67000.0	0	1	0	12	3.338328362586298	-1.5656321367358448	-0.5026028286234501	-0.7628533219155937	0.09779491823694761
    0	0	0	0	1	1	0	1	87500.0	0	1	0	22	1.8337296639558538	0.8106125112519007	-0.5026028286234501	-0.7628533219155937	0.7487845553609124
    0	0	1	0	1	0	1	1	68000.0	0	1	0	11	-1.1754677333050345	-1.4692978942498551	-0.5026028286234501	1.8067578676948275	2.178878590194838
    
    Updated dataset after performing RFE feature selection:
    id	sn	bathrms	homestyle_1	garagepl	homestyle_2	airco_0	homestyle_0	lotsize	price
    23	16	1	1	0	0	0	0	3185.0	37900.0
    32	403	1	0	0	1	0	0	6825.0	77500.0
    24	274	2	0	0	1	1	0	4100.0	64900.0
    75	294	1	1	0	0	1	0	4040.0	47000.0
    52	111	1	1	0	0	1	0	5076.0	43000.0
    11	53	1	0	2	1	0	0	9166.0	68000.0
    39	441	1	0	2	1	1	0	3520.0	51900.0
    29	306	1	0	1	1	0	0	5885.0	64000.0
    22	408	1	0	0	1	1	0	6420.0	87500.0
    51	443	1	0	0	1	1	0	3520.0	65000.0
    
    Updated dataset after performing scaling on RFE selected features :
    r_homestyle_1	r_homestyle_2	r_airco_0	r_homestyle_0	price	id	r_sn	r_bathrms	r_garagepl	r_lotsize
    0	1	0	0	77500.0	32	0.7785010970899041	-0.5026028286234501	-0.7628533219155937	0.9597051977890769
    1	0	1	0	47000.0	75	0.0784722683583791	-0.5026028286234501	-0.7628533219155937	-0.49069971372311655
    0	1	1	0	65000.0	51	1.0353924103858767	-0.5026028286234501	-0.7628533219155937	-0.7615114027666858
    1	0	1	0	43000.0	52	-1.096805489970695	-0.5026028286234501	-0.7628533219155937	0.04884049752522546
    0	1	1	0	51900.0	39	1.022547844721078	-0.5026028286234501	1.8067578676948275	-0.7615114027666858
    0	1	0	0	64000.0	29	0.15553966234717084	-0.5026028286234501	0.5219522728896168	0.47016099067185546
    0	1	1	0	87500.0	22	0.8106125112519007	-0.5026028286234501	-0.7628533219155937	0.7487845553609124
    0	1	1	0	80000.0	67	0.22618477350356328	-0.5026028286234501	-0.7628533219155937	0.009260327588088398
    0	1	0	0	67000.0	12	-1.5656321367358448	-0.5026028286234501	-0.7628533219155937	0.09779491823694761
    0	1	0	0	68000.0	11	-1.4692978942498551	-0.5026028286234501	1.8067578676948275	2.178878590194838
    
    Updated dataset after performing scaling for PCA feature selection :
    recroom_0	recroom_1	prefarea_0	driveway_0	bedrooms	prefarea_1	fullbase_1	airco_1	homestyle_2	gashw_0	driveway_1	gashw_1	airco_0	homestyle_0	fullbase_0	price	homestyle_1	id	sn	lotsize	bathrms	stories	garagepl
    0	1	0	0	3	1	1	1	1	1	1	0	0	0	0	77500.0	0	32	0.7785010970899037	0.9597051977890783	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	1	0	2	0	0	0	0	1	1	0	1	0	1	47000.0	1	75	0.07847226835837905	-0.4906997137231172	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	0	0	3	1	0	0	1	1	1	0	1	0	1	65000.0	0	51	1.035392410385876	-0.7615114027666869	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	1	1	3	0	0	0	0	1	0	0	1	0	1	43000.0	1	52	-1.0968054899706945	0.048840497525225526	-0.5026028286234494	-1.175467733305031	-0.7628533219155942
    1	0	0	0	3	1	0	0	1	1	1	0	1	0	1	51900.0	0	39	1.0225478447210774	-0.7615114027666869	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    1	0	1	0	2	0	0	1	1	1	1	0	0	0	1	64000.0	0	29	0.15553966234717076	0.4701609906718561	-0.5026028286234494	-1.175467733305031	0.5219522728896171
    1	0	1	0	3	0	0	0	1	1	1	0	1	0	1	80000.0	0	67	0.22618477350356314	0.009260327588088412	-0.5026028286234494	3.338328362586288	-0.7628533219155942
    1	0	1	0	3	0	0	1	1	1	1	0	0	0	1	67000.0	0	12	-1.565632136735844	0.09779491823694775	-0.5026028286234494	3.338328362586288	-0.7628533219155942
    1	0	0	0	3	1	1	0	1	1	1	0	1	0	0	87500.0	0	22	0.8106125112519003	0.7487845553609134	-0.5026028286234494	1.8337296639558482	-0.7628533219155942
    1	0	1	0	2	0	1	1	1	1	1	0	0	0	0	68000.0	0	11	-1.4692978942498542	2.178878590194841	-0.5026028286234494	-1.175467733305031	1.8067578676948284
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	col_6	col_7	col_8	col_9	price
    0	23	-1.846762	-1.338399	0.503782	-0.227864	-0.735116	-0.101448	1.053769	-0.133609	0.006323	-0.570542	37900.0
    1	67	-1.071889	2.634593	-1.143026	1.133870	-0.666936	0.448109	-1.183424	0.156071	-0.190941	-0.026246	80000.0
    2	22	0.436066	1.674569	-1.184835	-0.517098	-0.149681	0.581587	-1.015373	0.346597	-1.125296	-0.054246	87500.0
    3	21	-0.556103	-0.430944	-1.299009	-1.208186	1.167272	-1.350354	0.655622	0.022532	0.184275	-0.272029	44555.0
    4	12	-1.702851	2.244144	-0.163064	1.531449	-1.726684	1.435539	-0.336577	0.312267	-0.721428	-0.019429	67000.0
    5	32	1.447936	-0.419323	-0.367653	-1.974154	-0.669749	0.666961	0.361696	0.420576	0.244402	0.138265	77500.0
    6	24	0.121058	-0.088188	1.823276	-1.499059	0.807124	-0.001108	-0.486189	-0.327831	0.999693	0.041307	64900.0
    7	75	-0.913410	-0.987968	-0.428719	-0.617452	-0.026463	-1.166772	0.029816	-0.223178	0.401544	-0.558936	47000.0
    8	51	0.035124	-0.359414	-1.219128	-1.145653	0.772859	-0.721654	0.162539	-0.630366	0.185628	0.517291	65000.0
    9	52	-1.417779	-1.546909	0.258422	-0.403733	-0.676443	-1.060856	0.087088	0.370494	-0.220855	0.444359	43000.0
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_0
    Feature Selection Method : lasso
    
     Prediction : 
       id    Prediction  Confidence_Lower  Confidence_upper    price
    0  22  83148.275354     -13682.206803     179978.757511  87500.0
    1  21  40304.804437      -4991.597547      85601.206422  44555.0
    2  32  80038.998447     -15350.979605     175428.976498  77500.0
    3  24  75236.641121     -15797.504638     166270.786879  64900.0
    4  51  59341.776847     -11013.321662     129696.875356  65000.0
    5  52  49527.065554      -1879.863957     100933.995065  43000.0
    6  11  74293.738888      -8882.318131     157469.795906  68000.0
    7  39  59237.625097     -11283.510389     129758.760583  51900.0
    8  29  65776.624865      -9231.694949     140784.944679  64000.0
    9  75  41305.158083      -8937.031699      91547.347865  47000.0
    
     Performance Metrics : 
               MAE           MSE      MSLE       MAPE       MPE         RMSE    RMSLE            ME        R2        EV          MPD       MGD
    0  7530.032113  9.803829e+07  0.032812  14.340231 -6.394165  9901.428629  0.18114  31376.260153  0.695521  0.702174  1675.413575  0.030915
    
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	price
    10	50847.987966999994	-2114.7423931209414	103810.71832712094	41000.0
    12	82823.9576065	-8536.222191293302	174184.1374042933	67000.0
    13	42366.405568999995	-7506.190055977691	92239.00119397769	49000.0
    14	50464.071062999996	-2300.8744488123702	103229.01657481235	48500.0
    16	72076.95612349999	-17375.200759916668	161529.11300691665	72000.0
    17	30842.594055999998	-5288.270194834851	66973.45830683484	27000.0
    15	62003.993956999984	-9860.1547885412	133868.14270254117	61000.0
    11	74293.7388875	-8882.318131336462	157469.79590633645	68000.0
    9	60351.362888	-11434.725754912419	132137.45153091243	55000.0
    8	59123.682122	-16171.248326522546	134418.61257052253	55000.0
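    When the example is finished, the connection opened for it can be released. A minimal cleanup sketch:
    >>> # Cleanup sketch: drop the Vantage connection created for this example.
    >>> from teradataml import remove_context
    >>> remove_context()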