This example predicts the price of a house based on several factors.
Run AutoML to get the best performing model with the following specifications:
- Set the early stopping criteria: a time limit of 300 seconds and an R2 performance-metric threshold of 0.7.
- Exclude the 'knn' model from the default model training list.
- Set the verbose level to 2 for detailed logging.
- Load the example dataset.
>>> from teradataml import AutoML, DataFrame, load_example_data
>>> load_example_data("decisionforestpredict", ["housing_train", "housing_test"])
>>> housing_train = DataFrame.from_table("housing_train")
>>> housing_test = DataFrame.from_table("housing_test")
- Create an AutoML instance.
>>> aml = AutoML(task_type="Regression", exclude=['knn'], verbose=2, max_runtime_secs=300, stopping_metric='R2', stopping_tolerance=0.7)
- Fit the data.
>>> aml.fit(housing_train, housing_train.price)
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation Feature Exploration started ... Data Overview: Total Rows in the data: 492 Total Columns in the data: 14 Column Summary: ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage sn INTEGER 492 0 None 0 492 0 0.0 100.0 recroom VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 garagepl INTEGER 492 0 None 270 222 0 0.0 100.0 fullbase VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 gashw VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 price FLOAT 492 0 None 0 492 0 0.0 100.0 bathrms INTEGER 492 0 None 0 492 0 0.0 100.0 prefarea VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 airco VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 stories INTEGER 492 0 None 0 492 0 0.0 100.0 bedrooms INTEGER 492 0 None 0 492 0 0.0 100.0 lotsize FLOAT 492 0 None 0 492 0 0.0 100.0 homestyle VARCHAR(20) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 driveway VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0 Statistics of Data: func sn price lotsize bedrooms bathrms stories garagepl min 1 25000 1650 1 1 1 0 std 159.501 26472.496 2182.443 0.731 0.51 0.861 0.854 25% 132.5 49975 3600 2 1 1 0 50% 274 62000 4616 3 1 2 0 75% 413.25 82000 6370 3 2 2 1 max 546 190000 16200 6 4 4 3 mean 272.943 68100.396 5181.795 2.965 1.293 1.803 0.685 count 492 492 492 492 492 492 492 Categorical Columns with their Distinct values: ColumnName DistinctValueCount driveway 2 recroom 2 fullbase 2 gashw 2 airco 2 prefarea 2 homestyle 3 No Futile columns found. Target Column Distribution: Columns with outlier percentage :- ColumnName OutlierPercentage 0 stories 7.113821 1 bedrooms 2.235772 2 bathrms 0.203252 3 garagepl 2.235772 4 lotsize 2.235772 5 price 2.439024 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. 
Model Training & Evaluation Feature Engineering started ... Handling duplicate records present in dataset ... Analysis completed. No action taken. Total time to handle duplicate records: 1.55 sec Handling less significant features from data ... Analysis indicates all categorical columns are significant. No action Needed. Total time to handle less significant features: 19.75 sec Handling Date Features ... Analysis Completed. Dataset does not contain any feature related to dates. No action needed. Total time to handle date features: 0.00 sec Checking Missing values in dataset ... Analysis Completed. No Missing Values Detected. Total time to find missing values in data: 8.82 sec Imputing Missing Values ... Analysis completed. No imputation required. Time taken to perform imputation: 0.00 sec Performing encoding for categorical columns ... result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814119199335"'8 ONE HOT Encoding these Columns: ['driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'prefarea', 'homestyle'] Sample of dataset after performing one hot encoding: sn price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 id 242 52000.0 3000.0 2 1 2 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 28 425 65500.0 3840.0 3 1 2 0 1 1 0 1 0 1 0 1 0 1 0 1 0 0 1 44 118 94500.0 4000.0 3 2 2 0 1 1 0 0 1 1 0 0 1 1 1 0 0 0 1 52 240 30000.0 3000.0 4 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 60 362 145000.0 8580.0 4 3 4 0 1 1 0 1 0 1 0 0 1 2 0 1 1 0 0 76 259 33500.0 3640.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 84 505 71500.0 8150.0 3 2 1 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 68 507 75000.0 9800.0 4 2 2 0 1 0 1 1 0 1 0 1 0 2 1 0 0 0 1 36 345 88000.0 4500.0 3 1 4 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 20 80 63900.0 6360.0 2 1 1 0 1 1 0 0 1 1 0 0 1 1 1 0 0 0 1 12 492 rows X 23 columns Time taken to encode the columns: 14.51 sec 1. 
Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation Data preparation started ... Spliting of dataset into training and testing ... Training size : 0.8 Testing size : 0.2 Training data sample sn price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 id 40 54500.0 3150.0 2 2 1 1 0 1 0 0 1 1 0 1 0 0 1 0 0 0 1 10 122 80000.0 10500.0 4 2 2 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 11 387 83900.0 11460.0 3 1 3 0 1 1 0 1 0 1 0 1 0 2 0 1 0 0 1 19 326 99000.0 8880.0 3 2 2 0 1 1 0 0 1 1 0 0 1 1 1 0 0 0 1 13 61 48000.0 4120.0 2 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 14 244 27000.0 3649.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 22 427 49500.0 5320.0 2 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 15 223 70100.0 4200.0 3 1 2 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 23 265 50000.0 3640.0 2 1 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 1 0 9 101 57000.0 4500.0 3 2 2 1 0 1 0 0 1 1 0 0 1 0 1 0 0 0 1 17 393 rows X 23 columns Testing data sample sn price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 id 385 78000.0 6600.0 4 2 2 0 1 0 1 0 1 1 0 1 0 0 0 1 0 0 1 29 284 45000.0 6750.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 25 354 86000.0 6800.0 2 1 1 0 1 0 1 0 1 1 0 1 0 2 1 0 0 0 1 121 488 44100.0 8100.0 2 1 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 1 0 30 202 53900.0 2520.0 5 2 1 1 0 1 0 0 1 1 0 0 1 1 1 0 0 0 1 31 32 48000.0 3500.0 4 1 2 0 1 1 0 1 0 1 0 0 1 2 1 0 0 1 0 127 448 120000.0 5500.0 4 2 2 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 0 27 91 47000.0 6060.0 3 1 1 0 1 0 1 0 1 1 0 1 0 0 1 0 0 1 0 123 242 52000.0 3000.0 2 1 2 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 28 379 84000.0 7160.0 3 1 1 0 1 1 0 0 1 1 0 1 0 2 0 1 0 0 1 124 99 rows X 23 columns Time taken for spliting of data: 10.83 sec Outlier preprocessing ... 
Columns with outlier percentage :- ColumnName OutlierPercentage 0 stories 7.113821 1 bedrooms 2.235772 2 garagepl 2.235772 3 bathrms 0.203252 4 lotsize 2.235772 5 price 2.439024 Deleting rows of these columns: ['price', 'bathrms', 'garagepl', 'bedrooms', 'lotsize', 'stories'] result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713813731333800"'8 Sample of training dataset after removing outlier rows: sn price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 id 278 65500.0 4000.0 3 1 2 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 74 175 50000.0 3036.0 3 1 2 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 0 98 478 88500.0 5500.0 3 2 1 0 1 0 1 0 1 1 0 1 0 2 0 1 0 0 1 130 415 52000.0 2850.0 3 2 2 1 0 1 0 0 1 1 0 1 0 0 0 1 0 0 1 154 455 74900.0 6050.0 3 1 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 0 1 178 127 117000.0 5960.0 3 3 2 0 1 0 1 0 1 1 0 1 0 1 1 0 1 0 0 194 7 66000.0 3880.0 3 2 2 0 1 1 0 0 1 1 0 1 0 2 1 0 0 0 1 162 461 47600.0 2145.0 3 1 2 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 90 423 61100.0 3400.0 3 1 2 0 1 1 0 0 1 1 0 1 0 2 0 1 0 0 1 58 57 25245.0 2400.0 3 1 1 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 0 34 192 rows X 23 columns Time Taken by Outlier processing: 33.67 sec result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814511698109"'8 result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713818540523806"' Feature selection using lasso ... feature selected by lasso: ['stories', 'prefarea_1', 'sn', 'bathrms', 'fullbase_0', 'recroom_0', 'homestyle_1', 'recroom_1', 'garagepl', 'driveway_0', 'prefarea_0', 'fullbase_1', 'airco_1', 'driveway_1', 'homestyle_0', 'lotsize'] Total time taken by feature selection: 0.97 sec scaling Features of lasso data ... 
columns that will be scaled: ['stories', 'sn', 'bathrms', 'garagepl', 'lotsize'] Training dataset sample after scaling: fullbase_0 recroom_1 prefarea_0 driveway_0 fullbase_1 prefarea_1 airco_1 driveway_1 price homestyle_0 recroom_0 homestyle_1 id stories sn bathrms garagepl lotsize 0 0 1 0 1 0 0 1 44700.0 0 1 1 426 -1.1754677333050345 -0.6087119947083474 -0.5026028286234501 -0.7628533219155937 0.26705222388917843 1 0 1 1 0 0 0 0 38000.0 0 1 1 133 -1.1754677333050345 -0.6986239543619377 -0.5026028286234501 -0.7628533219155937 -1.3291743663387832 1 0 1 0 0 0 1 1 84000.0 0 1 0 136 1.8337296639558538 1.6519315622962105 1.6905731508243313 -0.7628533219155937 0.7904478921368461 1 0 1 0 0 0 0 1 65000.0 0 1 0 108 -1.1754677333050345 -0.9426707019931115 -0.5026028286234501 1.8067578676948275 -0.6365213924388846 0 0 0 0 1 1 1 1 73000.0 0 1 0 182 -1.1754677333050345 1.0996152387098697 -0.5026028286234501 -0.7628533219155937 0.5821312082571773 1 0 0 0 0 1 0 1 60000.0 0 1 0 512 1.8337296639558538 1.2152163296930574 -0.5026028286234501 0.5219522728896168 -1.477600003603047 1 0 1 0 0 0 0 1 62000.0 0 1 0 192 1.8337296639558538 -0.8591810251719205 1.6905731508243313 0.5219522728896168 -0.2511355272614975 1 0 1 0 0 0 0 1 70000.0 0 1 0 72 1.8337296639558538 0.38031956148114676 -0.5026028286234501 -0.7628533219155937 -1.0479468431012304 1 1 0 0 0 1 1 1 78000.0 0 0 0 285 1.8337296639558538 0.4830760867995358 -0.5026028286234501 -0.7628533219155937 0.5821312082571773 0 1 0 0 1 1 0 1 88500.0 0 0 0 130 -1.1754677333050345 1.2601723095198525 1.6905731508243313 1.8067578676948275 0.2696561824376743 192 rows X 18 columns Testing dataset sample after scaling: fullbase_0 recroom_1 prefarea_0 driveway_0 fullbase_1 prefarea_1 airco_1 driveway_1 price homestyle_0 recroom_0 homestyle_1 id stories sn bathrms garagepl lotsize 1 0 1 0 0 0 0 1 87250.0 0 1 0 492 1.8337296639558538 -0.36466524707717346 -0.5026028286234501 0.5219522728896168 -0.907333081482454 1 1 1 0 0 0 1 1 103000.0 1 0 0 368 
3.338328362586298 1.536330471313023 1.6905731508243313 0.5219522728896168 1.4049821095818689 0 0 0 0 1 1 0 1 93000.0 0 1 0 454 1.8337296639558538 1.2409054610226546 -0.5026028286234501 -0.7628533219155937 0.8789824827857053 1 0 1 0 0 0 0 1 30000.0 0 1 1 247 -1.1754677333050345 0.046360854196382535 -0.5026028286234501 0.5219522728896168 -0.8448380763185533 1 0 0 0 0 1 1 1 100500.0 1 1 0 205 1.8337296639558538 0.5858326121179247 1.6905731508243313 1.8067578676948275 0.7175370527789621 0 1 1 1 1 0 0 0 72000.0 0 0 0 211 -1.1754677333050345 -0.46742177239556243 -0.5026028286234501 -0.7628533219155937 -0.7510955685727024 1 0 0 0 0 1 1 1 112000.0 1 1 0 200 3.338328362586298 0.8427239254138973 1.6905731508243313 -0.7628533219155937 0.7175370527789621 0 1 1 0 1 0 1 1 70000.0 0 0 0 451 -1.1754677333050345 0.17480651084436877 -0.5026028286234501 -0.7628533219155937 0.4258936953474258 1 0 1 0 0 0 1 1 120000.0 1 1 0 417 3.338328362586298 1.6069755824694154 -0.5026028286234501 1.8067578676948275 1.050843746986432 1 0 1 1 0 0 0 0 41000.0 0 1 1 187 -1.1754677333050345 -1.507831591244251 -0.5026028286234501 -0.7628533219155937 -1.0114914234222883 99 rows X 18 columns Total time taken by feature scaling: 48.99 sec Feature selection using rfe ... feature selected by RFE: ['sn', 'bathrms', 'homestyle_1', 'garagepl', 'homestyle_2', 'airco_0', 'homestyle_0', 'lotsize'] Total time taken by feature selection: 41.78 sec scaling Features of rfe data ... 
columns that will be scaled: ['r_sn', 'r_bathrms', 'r_garagepl', 'r_lotsize'] Training dataset sample after scaling: r_homestyle_1 r_airco_0 r_homestyle_2 r_homestyle_0 price id r_sn r_bathrms r_garagepl r_lotsize 1 1 0 0 44700.0 426 -0.6087119947083474 -0.5026028286234501 -0.7628533219155937 0.26705222388917843 1 1 0 0 38000.0 133 -0.6986239543619377 -0.5026028286234501 -0.7628533219155937 -1.3291743663387832 0 0 1 0 84000.0 136 1.6519315622962105 1.6905731508243313 -0.7628533219155937 0.7904478921368461 0 1 1 0 65000.0 108 -0.9426707019931115 -0.5026028286234501 1.8067578676948275 -0.6365213924388846 0 0 1 0 73000.0 182 1.0996152387098697 -0.5026028286234501 -0.7628533219155937 0.5821312082571773 0 1 1 0 60000.0 512 1.2152163296930574 -0.5026028286234501 0.5219522728896168 -1.477600003603047 0 1 1 0 62000.0 192 -0.8591810251719205 1.6905731508243313 0.5219522728896168 -0.2511355272614975 0 1 1 0 70000.0 72 0.38031956148114676 -0.5026028286234501 -0.7628533219155937 -1.0479468431012304 0 0 1 0 78000.0 285 0.4830760867995358 -0.5026028286234501 -0.7628533219155937 0.5821312082571773 0 1 1 0 88500.0 130 1.2601723095198525 1.6905731508243313 1.8067578676948275 0.2696561824376743 192 rows X 10 columns Testing dataset sample after scaling: r_homestyle_1 r_airco_0 r_homestyle_2 r_homestyle_0 price id r_sn r_bathrms r_garagepl r_lotsize 0 1 1 0 87250.0 492 -0.36466524707717346 -0.5026028286234501 0.5219522728896168 -0.907333081482454 0 0 0 1 103000.0 368 1.536330471313023 1.6905731508243313 0.5219522728896168 1.4049821095818689 0 1 1 0 93000.0 454 1.2409054610226546 -0.5026028286234501 -0.7628533219155937 0.8789824827857053 1 1 0 0 30000.0 247 0.046360854196382535 -0.5026028286234501 0.5219522728896168 -0.8448380763185533 0 0 0 1 100500.0 205 0.5858326121179247 1.6905731508243313 1.8067578676948275 0.7175370527789621 0 1 1 0 72000.0 211 -0.46742177239556243 -0.5026028286234501 -0.7628533219155937 -0.7510955685727024 0 0 0 1 112000.0 200 0.8427239254138973 
1.6905731508243313 -0.7628533219155937 0.7175370527789621 0 0 1 0 70000.0 451 0.17480651084436877 -0.5026028286234501 -0.7628533219155937 0.4258936953474258 0 0 0 1 120000.0 417 1.6069755824694154 -0.5026028286234501 1.8067578676948275 1.050843746986432 1 1 0 0 41000.0 187 -1.507831591244251 -0.5026028286234501 -0.7628533219155937 -1.0114914234222883 99 rows X 10 columns Total time taken by feature scaling: 39.84 sec scaling Features of pca data ... columns that will be scaled: ['sn', 'lotsize', 'bathrms', 'stories', 'garagepl'] Training dataset sample after scaling: recroom_0 recroom_1 prefarea_0 driveway_0 bedrooms prefarea_1 fullbase_1 airco_1 homestyle_2 gashw_0 driveway_1 gashw_1 airco_0 homestyle_0 fullbase_0 price homestyle_1 id sn lotsize bathrms stories garagepl 1 0 1 1 3 0 1 1 1 1 0 0 0 0 0 57000.0 0 17 -1.1610283182946874 -0.25113552726149785 1.690573150824329 0.32913096532540864 -0.7628533219155942 1 0 0 0 3 1 0 0 1 1 1 0 1 0 1 65500.0 0 44 0.9197913194026884 -0.5948580556629517 -0.5026028286234494 0.32913096532540864 0.5219522728896171 1 0 1 0 3 0 1 1 1 1 1 0 0 0 0 94500.0 0 52 -1.0518495101438992 -0.5115313821110842 1.690573150824329 0.32913096532540864 0.5219522728896171 1 0 1 0 3 0 0 0 1 1 1 0 1 0 1 70100.0 0 23 -0.3775098127419719 -0.40737304017124965 -0.5026028286234494 0.32913096532540864 0.5219522728896171 1 0 1 0 3 0 0 0 0 1 1 0 1 0 1 35500.0 1 51 -1.4307641972554586 -0.3032146982314151 -0.5026028286234494 0.32913096532540864 -0.7628533219155942 1 0 1 0 3 0 0 1 1 1 1 0 0 0 1 98000.0 0 59 0.2711407533303583 0.5300520372872609 -0.5026028286234494 -1.175467733305031 0.5219522728896171 1 0 1 0 3 0 0 0 1 1 1 0 1 0 1 56000.0 0 38 -0.9041370049987152 -1.0323230918102566 -0.5026028286234494 0.32913096532540864 -0.7628533219155942 0 1 0 0 3 1 1 0 1 1 1 0 1 0 0 86000.0 0 46 0.7977679455871016 0.9987645760165162 1.690573150824329 -1.175467733305031 -0.7628533219155942 1 0 1 0 3 0 1 1 1 1 1 0 0 0 0 99000.0 0 13 0.28398531899515694 2.029932161220878 
1.690573150824329 0.32913096532540864 0.5219522728896171 1 0 1 0 3 0 0 0 1 0 1 1 1 0 1 60000.0 0 21 0.14911737951477144 0.42589369534742644 -0.5026028286234494 -1.175467733305031 1.8067578676948284 192 rows X 23 columns Testing dataset sample after scaling: recroom_0 recroom_1 prefarea_0 driveway_0 bedrooms prefarea_1 fullbase_1 airco_1 homestyle_2 gashw_0 driveway_1 gashw_1 airco_0 homestyle_0 fullbase_0 price homestyle_1 id sn lotsize bathrms stories garagepl 1 0 1 0 3 0 1 0 1 1 1 0 1 0 0 52000.0 0 26 -0.525222317887156 -0.7354718172817283 -0.5026028286234494 0.32913096532540864 -0.7628533219155942 1 0 0 0 4 1 1 1 0 1 1 0 0 1 0 120000.0 0 27 1.0675038245478725 0.2696561824376747 1.690573150824329 0.32913096532540864 0.5219522728896171 0 1 1 0 3 0 1 0 0 1 1 0 1 0 0 47000.0 1 123 -1.2252511466186806 0.5612995398692113 -0.5026028286234494 -1.175467733305031 -0.7628533219155942 1 0 1 0 2 0 1 0 1 0 1 1 1 0 0 99000.0 0 24 0.5408766322911293 4.2797523471213035 -0.5026028286234494 -1.175467733305031 0.5219522728896171 1 0 1 1 5 0 1 1 1 1 0 0 0 0 0 53900.0 0 31 -0.5123777522223574 -1.2823031124658595 1.690573150824329 -1.175467733305031 0.5219522728896171 1 0 1 0 4 0 0 1 0 1 1 0 0 0 1 48000.0 1 127 -1.6041658337302398 -0.7719272369606704 -0.5026028286234494 0.32913096532540864 1.8067578676948284 1 0 1 0 2 0 0 0 0 1 1 0 1 0 1 44100.0 1 30 1.324395137843845 1.6237146276555232 -0.5026028286234494 -1.175467733305031 0.5219522728896171 1 0 1 1 3 0 0 0 0 1 0 0 1 0 1 42000.0 1 126 -0.8206473281775242 -0.7198480659907531 -0.5026028286234494 0.32913096532540864 0.5219522728896171 1 0 1 0 2 0 0 1 1 1 1 0 0 0 1 52000.0 0 28 -0.255486438926385 -1.0323230918102566 -0.5026028286234494 0.32913096532540864 -0.7628533219155942 1 0 0 0 3 1 1 0 1 1 1 0 1 0 0 84000.0 0 124 0.6243663091123203 1.1341704205383012 -0.5026028286234494 -1.175467733305031 1.8067578676948284 99 rows X 23 columns Total time taken by feature scaling: 49.69 sec Dimension Reduction using pca ... 
PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9'] Total time taken by PCA: 11.21 sec 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation Model Training started ... Hyperparameters used for model training: response_column : price name : glm family : GAUSSIAN lambda1 : (0.001, 0.02, 0.1) alpha : (0.15, 0.85) learning_rate : ('invtime', 'constant', 'adaptive') initial_eta : (0.05, 0.1) momentum : (0.65, 0.8, 0.95) iter_num_no_change : (5, 10, 50) iter_max : (300, 200, 400, 500) batch_size : (10, 80, 100, 150) Total number of models for glm : 5184 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- response_column : price name : xgboost model_type : Regression column_sampling : (1, 0.6) min_impurity : (0.0, 0.1, 0.2, 0.3) lambda1 : (0.01, 0.1, 1, 10) shrinkage_factor : (0.5, 0.01, 0.05, 0.1) max_depth : (5, 3, 4, 7, 8) min_node_size : (1, 2, 3, 4) iter_num : (10, 20, 30, 40) Total number of models for xgboost : 10240 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- response_column : price name : decision_forest tree_type : Regression min_impurity : (0.0, 0.1, 0.2, 0.3) max_depth : (5, 3, 4, 7, 8) min_node_size : (1, 2, 3, 4) num_trees : (-1, 20, 30, 40) Total number of models for decision_forest : 320 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- response_column : price name : svm model_type : regression lambda1 : (0.001, 0.02, 0.1) alpha : (0.15, 0.85) tolerance : (0.001, 0.01) learning_rate : ('Invtime', 
'Adaptive', 'constant') initial_eta : (0.05, 0.1) momentum : (0.65, 0.8, 0.95) nesterov : True intercept : True iter_num_no_change : (5, 10, 50) local_sgd_iterations : (10, 20) iter_max : (300, 200, 400, 500) batch_size : (10, 80, 100, 150) Total number of models for svm : 20736 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Performing hyperParameter tuning ... glm ---------------------------------------------------------------------------------------------------- xgboost ---------------------------------------------------------------------------------------------------- decision_forest ---------------------------------------------------------------------------------------------------- svm ---------------------------------------------------------------------------------------------------- Evaluating models performance ... Evaluation completed. 
Leaderboard Rank Model-ID Feature-Selection MAE MSE MSLE RMSE RMSLE R2-score Adjusted R2-score 0 1 XGBOOST_3 lasso 9628.455479 1.483976e+08 0.034632 12181.855382 0.186097 0.783807 0.741623 1 2 XGBOOST_0 lasso 9628.455479 1.483976e+08 0.034632 12181.855382 0.186097 0.783807 0.741623 2 3 GLM_3 lasso 10200.663327 1.775318e+08 0.037768 13324.104150 0.194340 0.741363 0.690898 3 4 DECISIONFOREST_0 lasso 11490.006156 2.148976e+08 0.046107 14659.386825 0.214725 0.686927 0.625840 4 5 GLM_1 rfe 13133.449531 3.101527e+08 0.061828 17611.153178 0.248652 0.548155 0.507991 5 6 GLM_2 pca 12825.056171 3.332267e+08 0.058128 18254.497063 0.241098 0.514540 0.459374 6 7 GLM_0 lasso 14148.183643 3.375269e+08 0.069698 18371.906254 0.264003 0.508275 0.412328 7 8 XGBOOST_2 pca 13922.629788 3.710428e+08 0.064012 19262.470688 0.253006 0.459447 0.398021 8 9 DECISIONFOREST_2 pca 12586.068723 3.734912e+08 0.059352 19325.919884 0.243623 0.455880 0.394049 9 10 DECISIONFOREST_3 lasso 16432.237374 4.326218e+08 0.085381 20799.562257 0.292201 0.369736 0.246758 10 11 XGBOOST_1 rfe 21225.126287 6.860673e+08 0.153932 26192.885820 0.392342 0.000505 -0.088339 11 12 DECISIONFOREST_1 rfe 22215.932941 7.373789e+08 0.161533 27154.721120 0.401911 -0.074248 -0.169737 12 13 SVM_0 lasso 66220.929293 5.071762e+09 51.035352 71216.300842 7.143903 -6.388782 -7.830496 13 14 SVM_1 rfe 66243.121212 5.075075e+09 57.353084 71239.561004 7.573182 -6.393609 -7.050819 14 15 SVM_3 lasso 66269.585859 5.078071e+09 61.891356 71260.586170 7.867106 -6.397974 -7.841481 15 16 SVM_2 pca 66271.212121 5.078287e+09 0.000000 71262.102818 0.000000 -6.398289 -7.239004 16 rows X 10 columns 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 18/18
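The "Total number of models" figures in the training log above can be reproduced as the product of the number of candidate values listed for each hyperparameter. The sketch below is plain Python, not a teradataml call; the option counts are copied by hand from the log, and the names `grids` and `grid_size` are introduced here only for illustration.

```python
from math import prod

# Number of candidate values per hyperparameter, copied from the fit() log.
# Parameters shown with a single fixed value (nesterov, intercept) count as 1.
grids = {
    "glm": {
        "lambda1": 3, "alpha": 2, "learning_rate": 3, "initial_eta": 2,
        "momentum": 3, "iter_num_no_change": 3, "iter_max": 4, "batch_size": 4,
    },
    "xgboost": {
        "column_sampling": 2, "min_impurity": 4, "lambda1": 4,
        "shrinkage_factor": 4, "max_depth": 5, "min_node_size": 4, "iter_num": 4,
    },
    "decision_forest": {
        "min_impurity": 4, "max_depth": 5, "min_node_size": 4, "num_trees": 4,
    },
    "svm": {
        "lambda1": 3, "alpha": 2, "tolerance": 2, "learning_rate": 3,
        "initial_eta": 2, "momentum": 3, "nesterov": 1, "intercept": 1,
        "iter_num_no_change": 3, "local_sgd_iterations": 2,
        "iter_max": 4, "batch_size": 4,
    },
}


def grid_size(option_counts):
    """Size of a full grid: the product of the per-parameter option counts."""
    return prod(option_counts.values())


sizes = {name: grid_size(counts) for name, counts in grids.items()}
for name, size in sizes.items():
    print(f"Total number of models for {name} : {size}")
```

These totals describe the size of each search space. The leaderboard lists only 16 trained models, so the run evidently evaluates a small subset of each grid rather than every combination.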
- Display the model leaderboard.
>>> aml.leaderboard()
    Rank          Model-ID  Feature-Selection           MAE           MSE       MSLE          RMSE     RMSLE   R2-score  Adjusted R2-score
 0     1         XGBOOST_3              lasso   9628.455479  1.483976e+08   0.034632  12181.855382  0.186097   0.783807           0.741623
 1     2         XGBOOST_0              lasso   9628.455479  1.483976e+08   0.034632  12181.855382  0.186097   0.783807           0.741623
 2     3             GLM_3              lasso  10200.663327  1.775318e+08   0.037768  13324.104150  0.194340   0.741363           0.690898
 3     4  DECISIONFOREST_0              lasso  11490.006156  2.148976e+08   0.046107  14659.386825  0.214725   0.686927           0.625840
 4     5             GLM_1                rfe  13133.449531  3.101527e+08   0.061828  17611.153178  0.248652   0.548155           0.507991
 5     6             GLM_2                pca  12825.056171  3.332267e+08   0.058128  18254.497063  0.241098   0.514540           0.459374
 6     7             GLM_0              lasso  14148.183643  3.375269e+08   0.069698  18371.906254  0.264003   0.508275           0.412328
 7     8         XGBOOST_2                pca  13922.629788  3.710428e+08   0.064012  19262.470688  0.253006   0.459447           0.398021
 8     9  DECISIONFOREST_2                pca  12586.068723  3.734912e+08   0.059352  19325.919884  0.243623   0.455880           0.394049
 9    10  DECISIONFOREST_3              lasso  16432.237374  4.326218e+08   0.085381  20799.562257  0.292201   0.369736           0.246758
10    11         XGBOOST_1                rfe  21225.126287  6.860673e+08   0.153932  26192.885820  0.392342   0.000505          -0.088339
11    12  DECISIONFOREST_1                rfe  22215.932941  7.373789e+08   0.161533  27154.721120  0.401911  -0.074248          -0.169737
12    13             SVM_0              lasso  66220.929293  5.071762e+09  51.035352  71216.300842  7.143903  -6.388782          -7.830496
13    14             SVM_1                rfe  66243.121212  5.075075e+09  57.353084  71239.561004  7.573182  -6.393609          -7.050819
14    15             SVM_3              lasso  66269.585859  5.078071e+09  61.891356  71260.586170  7.867106  -6.397974          -7.841481
15    16             SVM_2                pca  66271.212121  5.078287e+09   0.000000  71262.102818  0.000000  -6.398289          -7.239004
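The "Adjusted R2-score" column is consistent with the standard adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - p - 1), if n is taken as the 99 validation rows and p as the number of features kept by each feature-selection method (16 for lasso, 8 for RFE, 10 PCA columns, all read from the log). That reading is an observation from this output rather than something the source states, and `adjusted_r2` below is a helper written for this illustration, not a teradataml function.

```python
def adjusted_r2(r2, n_rows, n_features):
    """Standard adjusted R2: penalizes R2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_features - 1)


# Assumption: metrics are computed on the 99-row validation split.
N_VALIDATION_ROWS = 99

# (model id, R2 from leaderboard, feature count from log, adjusted R2 from leaderboard)
leaderboard_rows = [
    ("XGBOOST_3", 0.783807, 16, 0.741623),  # lasso kept 16 features
    ("GLM_1", 0.548155, 8, 0.507991),       # RFE kept 8 features
    ("GLM_2", 0.514540, 10, 0.459374),      # PCA produced 10 columns
]

for model_id, r2, n_features, reported in leaderboard_rows:
    computed = adjusted_r2(r2, N_VALIDATION_ROWS, n_features)
    print(f"{model_id}: computed={computed:.6f} reported={reported:.6f}")
```

Because lasso keeps twice as many features as RFE, the penalty is larger for lasso-based models, which is why their adjusted score drops further below the raw R2.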
- Display the best performing model.
>>> aml.leader()
   Rank   Model-ID  Feature-Selection          MAE           MSE      MSLE          RMSE     RMSLE  R2-score  Adjusted R2-score
0     1  XGBOOST_3              lasso  9628.455479  1.483976e+08  0.034632  12181.855382  0.186097  0.783807           0.741623
- Generate predictions on the validation dataset using the best-performing model. During the data preparation phase, AutoML creates the validation dataset by splitting the data provided during fitting into training and testing sets. Model training uses the training set, and the testing set serves as the validation dataset for model evaluation.
>>> prediction = aml.predict()
Following model is being used for generating prediction :
Model ID : XGBOOST_3
Feature Selection Method : lasso

Prediction :
    id     Prediction  Confidence_Lower  Confidence_upper     price
0  492   63523.430206      -7990.015251     135036.875663   87250.0
1  368  102759.944778     -17752.673004     223272.562559  103000.0
2  454   77782.022661     -14282.526959     169846.572282   93000.0
3  247   45022.847275      -6566.519982      96612.214533   30000.0
4  205  117354.767915     -17115.856482     251825.392313  100500.0
5  211   56755.264163      -8626.057137     122136.585463   72000.0
6  200  116854.017302     -17656.674664     251364.709267  112000.0
7  451   66682.350190      -8409.325274     141774.025654   70000.0
8  417  107685.061593     -13027.683164     228397.806351  120000.0
9  187   37198.515964      -7152.411752      81549.443681   41000.0

Performance Metrics :
           MAE           MSE      MSLE       MAPE       MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
0  9628.455479  1.483976e+08  0.034632  15.784078 -5.485578  12181.855382  0.186097  52320.270431  0.783807  0.785125  2084.728019  0.033508
>>> prediction
 id          Prediction     Confidence_Lower    Confidence_upper     price
 26       55320.3275035  -14932.158036819601   125572.8130438196   52000.0
 28   68406.69997650001  -3935.3562509474723  140748.75620394747   52000.0
 29        86185.303965  -12767.850251373035  185138.45818137302   78000.0
 30  54504.629284999995   -7394.386681102078  116403.64525110208   44100.0
120  110679.72956949998  -18432.318021989675  239791.77716098964  163000.0
121        71314.912363  -14548.560484102243  157178.38521010225   86000.0
 31  61366.627566999996  -10317.129140241195   133050.3842742412   53900.0
 27  114092.16751700001    -20268.7281787322   248453.0632127322  120000.0
 25  52893.839117999996   -6981.666775837832  112769.34501183782   45000.0
 24       80870.8686425  -11200.480495712312  172942.21778071232   99000.0
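Several of the reported performance metrics are related by simple arithmetic, which gives a quick way to sanity-check the output. The sketch below uses only the values printed for XGBOOST_3 and the Python standard library; it assumes the usual definitions, in which RMSE is the square root of MSE and RMSLE is the square root of MSLE.

```python
import math

# Values copied from the "Performance Metrics" row for XGBOOST_3.
mse = 1.483976e+08
msle = 0.034632
reported_rmse = 12181.855382
reported_rmsle = 0.186097

# Under the usual definitions, the root metrics are square roots of the mean metrics.
rmse = math.sqrt(mse)
rmsle = math.sqrt(msle)

print(f"RMSE : computed={rmse:.3f} reported={reported_rmse:.3f}")
print(f"RMSLE: computed={rmsle:.6f} reported={reported_rmsle:.6f}")
```

Small differences in the last digits are expected because the printed MSE and MSLE are already rounded.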
- Generate predictions on the validation dataset using the third-best-performing model.
>>> prediction = aml.predict(rank=3)
Following model is being used for generating prediction :
Model ID : GLM_3
Feature Selection Method : lasso

Prediction :
    id     prediction     price
0  492   66186.826718   87250.0
1  368  119753.381829  103000.0
2  454   78665.886884   93000.0
3  247   45251.710670   30000.0
4  205  113365.000670  100500.0
5  211   51491.354759   72000.0
6  200  111707.987096  112000.0
7  451   74287.740695   70000.0
8  417  112877.191516  120000.0
9  187   30706.236368   41000.0

Performance Metrics :
            MAE           MSE      MSLE       MAPE       MPE         RMSE    RMSLE            ME        R2        EV          MPD       MGD
0  10200.663327  1.775318e+08  0.037768  16.300732 -5.466382  13324.10415  0.19434  54055.219486  0.741363  0.742799  2391.983615  0.037205
>>> prediction.head()
 id          prediction     price
 26  63880.087955143106   52000.0
 28   64887.03364189595   52000.0
 29   81838.00634272449   78000.0
 30    59020.9083389725   44100.0
120  108944.78051355484  163000.0
121   76648.23791163946   86000.0
 31   64051.73639523083   53900.0
 27  108332.60522274605  120000.0
 25   51235.20813818371   45000.0
 24   89398.59783781157   99000.0
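The ME value in the performance metrics appears to be the maximum absolute error: among the rows displayed for GLM_3, the largest residual (id 120) equals the reported ME of 54055.219486. The sketch below checks this with plain Python on the ten displayed rows only. Reading ME as "max error" is an inference from these numbers, and the full validation set has 99 rows, so the match holds here only because the worst row happens to be among those shown.

```python
# (id, prediction, actual price) copied from the prediction.head() output for GLM_3.
rows = [
    (26, 63880.087955143106, 52000.0),
    (28, 64887.03364189595, 52000.0),
    (29, 81838.00634272449, 78000.0),
    (30, 59020.9083389725, 44100.0),
    (120, 108944.78051355484, 163000.0),
    (121, 76648.23791163946, 86000.0),
    (31, 64051.73639523083, 53900.0),
    (27, 108332.60522274605, 120000.0),
    (25, 51235.20813818371, 45000.0),
    (24, 89398.59783781157, 99000.0),
]

# Absolute residual per displayed row.
abs_errors = {row_id: abs(actual - predicted) for row_id, predicted, actual in rows}

# Row with the largest miss among the displayed sample.
worst_id = max(abs_errors, key=abs_errors.get)
print(f"largest absolute error: id={worst_id}, error={abs_errors[worst_id]:.6f}")
```

The same row (id 120, price 163000.0) is also the worst miss for XGBOOST_3, whose reported ME is 52320.270431.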
- Generate predictions on the test dataset using the best-performing model.
>>> prediction = aml.predict(housing_test)
Data Transformation started ... Performing transformation carried out in feature engineering phase ... result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713818093862300"' Updated dataset after performing categorical encoding : sn price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 id 260 41000.0 6000.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 10 469 55000.0 2176.0 2 1 2 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 8 364 72000.0 10700.0 3 1 2 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 16 53 68000.0 9166.0 2 1 1 0 1 1 0 0 1 1 0 0 1 2 1 0 0 0 1 11 255 61000.0 4360.0 4 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 15 16 37900.0 3185.0 2 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 23 251 48500.0 3450.0 3 1 1 0 1 1 0 0 1 1 0 1 0 2 1 0 0 1 0 14 408 87500.0 6420.0 3 1 3 0 1 1 0 0 1 1 0 1 0 0 0 1 0 0 1 22 301 55000.0 4080.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 9 13 27000.0 1700.0 3 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 17 Performing transformation carried out in data preparation phase ... 
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815079036517"'

Updated dataset after performing Lasso feature selection:
id stories prefarea_1 sn bathrms fullbase_0 recroom_0 homestyle_1 recroom_1 garagepl driveway_0 prefarea_0 fullbase_1 airco_1 driveway_1 homestyle_0 lotsize price
48 1 0 25 1 1 1 1 0 0 0 1 0 0 1 0 4960.0 42000.0
44 1 0 239 1 0 1 1 0 2 0 1 1 0 1 0 3000.0 26000.0
39 1 1 441 1 1 1 0 0 2 0 0 0 0 1 0 3520.0 51900.0
51 1 1 443 1 1 1 0 0 0 0 0 0 0 1 0 3520.0 65000.0
33 1 1 411 1 0 1 0 0 1 0 0 1 0 1 0 9000.0 90000.0
25 1 0 249 1 1 1 1 0 0 0 1 0 0 1 0 3500.0 44500.0
12 4 0 38 1 1 1 0 0 0 0 1 0 1 1 0 5170.0 67000.0
67 4 0 317 1 1 1 0 0 0 0 1 0 0 1 0 5000.0 80000.0
22 3 1 408 1 0 1 0 0 0 0 0 1 0 1 0 6420.0 87500.0
75 1 0 294 1 1 1 1 0 0 0 1 0 0 1 0 4040.0 47000.0

Updated dataset after performing scaling on Lasso selected features :
fullbase_0 recroom_1 prefarea_0 driveway_0 fullbase_1 prefarea_1 airco_1 driveway_1 price homestyle_0 recroom_0 homestyle_1 id stories sn bathrms garagepl lotsize
0 0 0 0 1 1 0 1 87500.0 0 1 0 22 1.8337296639558538 0.8106125112519007 -0.5026028286234501 -0.7628533219155937 0.7487845553609124
0 1 0 0 1 1 1 1 92500.0 0 0 0 40 -1.1754677333050345 0.7656565314251055 -0.5026028286234501 1.8067578676948275 1.2643683479630925
1 0 1 0 0 0 0 1 42000.0 0 1 1 48 -1.1754677333050345 -1.6491218135570358 -0.5026028286234501 -0.7628533219155937 -0.011571340799878474
1 0 1 0 0 0 1 1 64000.0 0 1 0 29 -1.1754677333050345 0.15553966234717084 -0.5026028286234501 0.5219522728896168 0.47016099067185546
1 0 0 0 0 1 0 1 51900.0 0 1 0 39 -1.1754677333050345 1.022547844721078 -0.5026028286234501 1.8067578676948275 -0.7615114027666858
1 0 0 0 0 1 0 1 65000.0 0 1 0 51 -1.1754677333050345 1.0353924103858767 -0.5026028286234501 -0.7628533219155937 -0.7615114027666858
1 0 1 0 0 0 0 1 47000.0 0 1 1 75 -1.1754677333050345 0.0784722683583791 -0.5026028286234501 -0.7628533219155937 -0.49069971372311655
0 0 0 0 1 1 0 1 90000.0 0 1 0 33 -1.1754677333050345 0.8298793597490987 -0.5026028286234501 0.5219522728896168 2.0924271663847755
1 0 1 0 0 0 0 1 44500.0 0 1 1 25 -1.1754677333050345 -0.21053045909958995 -0.5026028286234501 -0.7628533219155937 -0.7719272369606693
0 0 1 0 1 0 0 1 26000.0 0 1 1 44 -1.1754677333050345 -0.27475328742358307 -0.5026028286234501 1.8067578676948275 -1.0323230918102553

Updated dataset after performing RFE feature selection:
id sn bathrms homestyle_1 garagepl homestyle_2 airco_0 homestyle_0 lotsize price
48 25 1 1 0 0 1 0 4960.0 42000.0
44 239 1 1 2 0 1 0 3000.0 26000.0
39 441 1 0 2 1 1 0 3520.0 51900.0
51 443 1 0 0 1 1 0 3520.0 65000.0
33 411 1 0 1 1 1 0 9000.0 90000.0
25 249 1 1 0 0 1 0 3500.0 44500.0
12 38 1 0 0 1 0 0 5170.0 67000.0
67 317 1 0 0 1 1 0 5000.0 80000.0
22 408 1 0 0 1 1 0 6420.0 87500.0
75 294 1 1 0 0 1 0 4040.0 47000.0

Updated dataset after performing scaling on RFE selected features :
r_homestyle_1 r_homestyle_2 r_airco_0 r_homestyle_0 price id r_sn r_bathrms r_garagepl r_lotsize
1 0 1 0 42000.0 48 -1.6491218135570358 -0.5026028286234501 -0.7628533219155937 -0.011571340799878474
1 0 1 0 26000.0 44 -0.27475328742358307 -0.5026028286234501 1.8067578676948275 -1.0323230918102553
0 1 1 0 51900.0 39 1.022547844721078 -0.5026028286234501 1.8067578676948275 -0.7615114027666858
0 1 1 0 65000.0 51 1.0353924103858767 -0.5026028286234501 -0.7628533219155937 -0.7615114027666858
0 1 1 0 90000.0 33 0.8298793597490987 -0.5026028286234501 0.5219522728896168 2.0924271663847755
1 0 1 0 44500.0 25 -0.21053045909958995 -0.5026028286234501 -0.7628533219155937 -0.7719272369606693
0 1 0 0 67000.0 12 -1.5656321367358448 -0.5026028286234501 -0.7628533219155937 0.09779491823694761
0 1 1 0 80000.0 67 0.22618477350356328 -0.5026028286234501 -0.7628533219155937 0.009260327588088398
0 1 1 0 87500.0 22 0.8106125112519007 -0.5026028286234501 -0.7628533219155937 0.7487845553609124
1 0 1 0 47000.0 75 0.0784722683583791 -0.5026028286234501 -0.7628533219155937 -0.49069971372311655

Updated dataset after performing scaling for PCA feature selection :
recroom_0 recroom_1 prefarea_0 driveway_0 bedrooms prefarea_1 fullbase_1 airco_1 homestyle_2 gashw_0 driveway_1 gashw_1 airco_0 homestyle_0 fullbase_0 price homestyle_1 id sn lotsize bathrms stories garagepl
1 0 0 0 3 1 1 0 1 1 1 0 1 0 0 87500.0 0 22 0.8106125112519003 0.7487845553609134 -0.5026028286234494 1.8337296639558482 -0.7628533219155942
0 1 0 0 3 1 1 1 1 1 1 0 0 0 0 92500.0 0 40 0.765656531425105 1.2643683479630943 -0.5026028286234494 -1.175467733305031 1.8067578676948284
1 0 1 0 2 0 0 0 0 1 1 0 1 0 1 42000.0 1 48 -1.649121813557035 -0.01157134079987849 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 1 0 2 0 0 1 1 1 1 0 0 0 1 64000.0 0 29 0.15553966234717076 0.4701609906718561 -0.5026028286234494 -1.175467733305031 0.5219522728896171
1 0 0 0 3 1 0 0 1 1 1 0 1 0 1 51900.0 0 39 1.0225478447210774 -0.7615114027666869 -0.5026028286234494 -1.175467733305031 1.8067578676948284
1 0 0 0 3 1 0 0 1 1 1 0 1 0 1 65000.0 0 51 1.035392410385876 -0.7615114027666869 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 1 0 2 0 0 0 0 1 1 0 1 0 1 47000.0 1 75 0.07847226835837905 -0.4906997137231172 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 0 0 3 1 1 0 1 1 1 0 1 0 0 90000.0 0 33 0.8298793597490982 2.0924271663847787 -0.5026028286234494 -1.175467733305031 0.5219522728896171
1 0 1 0 2 0 0 0 0 1 1 0 1 0 1 44500.0 1 25 -0.21053045909958984 -0.7719272369606704 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 1 0 2 0 1 0 0 1 1 0 1 0 0 26000.0 1 44 -0.2747532874235829 -1.0323230918102566 -0.5026028286234494 -1.175467733305031 1.8067578676948284

Updated dataset after performing PCA feature selection :
id col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 price
0 24 0.121058 -0.088188 1.823276 -1.499059 0.807124 -0.001108 -0.486189 -0.327831 0.999693 0.041307 64900.0
1 12 -1.702851 2.244144 -0.163064 1.531449 -1.726684 1.435539 -0.336577 0.312267 -0.721428 -0.019429 67000.0
2 22 0.436066 1.674569 -1.184835 -0.517098 -0.149681 0.581587 -1.015373 0.346597 -1.125296 -0.054246 87500.0
3 40 2.732152 -1.196533 -0.243570 -0.130603 0.061535 0.992846 0.669891 0.796020 0.182694 0.034230 92500.0
4 67 -1.071889 2.634593 -1.143026 1.133870 -0.666936 0.448109 -1.183424 0.156071 -0.190941 -0.026246 80000.0
5 48 -1.466384 -1.594199 0.440659 -0.192916 -0.727838 -0.698315 -0.370444 0.078220 -0.266739 -0.400643 42000.0
6 29 0.713664 -1.049937 -0.245372 0.286599 -0.760120 -0.026496 0.770101 -0.793006 0.435955 0.026876 64000.0
7 44 -0.084548 -1.910265 0.033862 0.687256 1.538283 0.149991 0.443854 0.136846 0.043304 -1.148533 26000.0
8 39 1.145001 -1.114998 -1.120372 0.694225 1.703701 -0.350166 0.571300 -0.292433 0.199957 0.403042 51900.0
9 51 0.035124 -0.359414 -1.219128 -1.145653 0.772859 -0.721654 0.162539 -0.630366 0.185628 0.517291 65000.0

Data Transformation completed.

Following model is being used for generating prediction :
Model ID : XGBOOST_3
Feature Selection Method : lasso

Prediction :
   id  Prediction    Confidence_Lower  Confidence_upper  price
0  48  52832.052623  -1130.634446      106794.739693     42000.0
1  44  48048.913899  -1342.559233      97440.387030      26000.0
2  39  59237.625097  -11283.510389     129758.760583     51900.0
3  51  59341.776847  -11013.321662     129696.875356     65000.0
4  33  81820.074617  -14655.044080     178295.193314     90000.0
5  25  47145.112130  -5338.347479      99628.571740      44500.0
6  22  83148.275354  -13682.206803     179978.757511     87500.0
7  12  82823.957607  -8536.222191      174184.137404     67000.0
8  67  76352.427511  -13523.627091     166228.482114     80000.0
9  75  41305.158083  -8937.031699      91547.347865      47000.0

Performance Metrics :
   MAE          MSE           MSLE      MAPE       MPE        RMSE         RMSLE    ME            R2        EV        MPD          MGD
0  7530.032113  9.803829e+07  0.032812  14.340231  -6.394165  9901.428629  0.18114  31376.260153  0.695521  0.702174  1675.413575  0.030915
>>> prediction.head()
id  Prediction          Confidence_Lower     Confidence_upper    price
10  50847.987966999994  -2114.7423931209414  103810.71832712094  41000.0
12  82823.9576065       -8536.222191293302   174184.1374042933   67000.0
13  42366.405568999995  -7506.190055977691   92239.00119397769   49000.0
14  50464.071062999996  -2300.8744488123702  103229.01657481235  48500.0
16  72076.95612349999   -17375.200759916668  161529.11300691665  72000.0
17  30842.594055999998  -5288.270194834851   66973.45830683484   27000.0
15  62003.993956999984  -9860.1547885412     133868.14270254117  61000.0
11  74293.7388875       -8882.318131336462   157469.79590633645  68000.0
9   60351.362888        -11434.725754912419  132137.45153091243  55000.0
8   59123.682122        -16171.248326522546  134418.61257052253  55000.0
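The Performance Metrics row in the log reports values such as MAE and RMSE for the model. As a rough illustration of how such metrics relate to the Prediction and price columns, the following sketch recomputes MAE and RMSE in plain Python from only the ten rows displayed by prediction.head(). It also counts how many actual prices fall inside the reported confidence interval. The variable names (rows, mae, rmse, coverage) are illustrative and not part of teradataml, and because only ten rows are used, the results are not expected to match the logged metrics.

```python
import math

# Rows copied from the prediction.head() output above:
# (id, Prediction, Confidence_Lower, Confidence_upper, price)
rows = [
    (10, 50847.987966999994, -2114.7423931209414, 103810.71832712094, 41000.0),
    (12, 82823.9576065, -8536.222191293302, 174184.1374042933, 67000.0),
    (13, 42366.405568999995, -7506.190055977691, 92239.00119397769, 49000.0),
    (14, 50464.071062999996, -2300.8744488123702, 103229.01657481235, 48500.0),
    (16, 72076.95612349999, -17375.200759916668, 161529.11300691665, 72000.0),
    (17, 30842.594055999998, -5288.270194834851, 66973.45830683484, 27000.0),
    (15, 62003.993956999984, -9860.1547885412, 133868.14270254117, 61000.0),
    (11, 74293.7388875, -8882.318131336462, 157469.79590633645, 68000.0),
    (9, 60351.362888, -11434.725754912419, 132137.45153091243, 55000.0),
    (8, 59123.682122, -16171.248326522546, 134418.61257052253, 55000.0),
]

# Signed error per row: predicted price minus actual price.
errors = [pred - actual for _, pred, _, _, actual in rows]

# Mean absolute error and root mean squared error over the ten rows.
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# Fraction of rows whose actual price lies within the confidence interval.
covered = sum(lo <= actual <= hi for _, _, lo, hi, actual in rows)
coverage = covered / len(rows)

print(f"MAE over displayed rows:  {mae:.2f}")
print(f"RMSE over displayed rows: {rmse:.2f}")
print(f"Interval coverage:        {coverage:.0%}")
```

Note that every Confidence_Lower value in the output is negative, so the lower bound carries little information for a price, which cannot be negative.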
- Generate predictions on the test dataset using the second-best performing model.
>>> prediction = aml.predict(housing_test, rank=2)
Data Transformation started ...

Performing transformation carried out in feature engineering phase ...
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815255642391"'

Updated dataset after performing categorical encoding :
sn price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 id
53 68000.0 9166.0 2 1 1 0 1 1 0 0 1 1 0 0 1 2 1 0 0 0 1 11
463 49000.0 2610.0 3 1 2 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 13
459 44555.0 2398.0 3 1 1 0 1 1 0 1 0 1 0 1 0 0 0 1 0 1 0 21
38 67000.0 5170.0 3 1 4 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 12
251 48500.0 3450.0 3 1 1 0 1 1 0 0 1 1 0 1 0 2 1 0 0 1 0 14
408 87500.0 6420.0 3 1 3 0 1 1 0 0 1 1 0 1 0 0 0 1 0 0 1 22
255 61000.0 4360.0 4 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 15
16 37900.0 3185.0 2 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 23
301 55000.0 4080.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 9
13 27000.0 1700.0 3 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 17

Performing transformation carried out in data preparation phase ...
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814881895584"'

Updated dataset after performing Lasso feature selection:
id stories prefarea_1 sn bathrms fullbase_0 recroom_0 homestyle_1 recroom_1 garagepl driveway_0 prefarea_0 fullbase_1 airco_1 driveway_1 homestyle_0 lotsize price
23 1 0 16 1 1 1 1 0 0 0 1 0 1 1 0 3185.0 37900.0
32 1 1 403 1 0 0 0 1 0 0 0 1 1 1 0 6825.0 77500.0
24 1 0 274 2 0 0 0 1 0 0 1 1 0 1 0 4100.0 64900.0
75 1 0 294 1 1 1 1 0 0 0 1 0 0 1 0 4040.0 47000.0
52 1 0 111 1 1 1 1 0 0 1 1 0 0 0 0 5076.0 43000.0
11 1 0 53 1 0 1 0 0 2 0 1 1 1 1 0 9166.0 68000.0
39 1 1 441 1 1 1 0 0 2 0 0 0 0 1 0 3520.0 51900.0
29 1 0 306 1 1 1 0 0 1 0 1 0 1 1 0 5885.0 64000.0
22 3 1 408 1 0 1 0 0 0 0 0 1 0 1 0 6420.0 87500.0
51 1 1 443 1 1 1 0 0 0 0 0 0 0 1 0 3520.0 65000.0

Updated dataset after performing scaling on Lasso selected features :
fullbase_0 recroom_1 prefarea_0 driveway_0 fullbase_1 prefarea_1 airco_1 driveway_1 price homestyle_0 recroom_0 homestyle_1 id stories sn bathrms garagepl lotsize
0 1 0 0 1 1 1 1 77500.0 0 0 0 32 -1.1754677333050345 0.7785010970899041 -0.5026028286234501 -0.7628533219155937 0.9597051977890769
1 0 1 0 0 0 0 1 47000.0 0 1 1 75 -1.1754677333050345 0.0784722683583791 -0.5026028286234501 -0.7628533219155937 -0.49069971372311655
1 0 0 0 0 1 0 1 65000.0 0 1 0 51 -1.1754677333050345 1.0353924103858767 -0.5026028286234501 -0.7628533219155937 -0.7615114027666858
1 0 1 1 0 0 0 0 43000.0 0 1 1 52 -1.1754677333050345 -1.096805489970695 -0.5026028286234501 -0.7628533219155937 0.04884049752522546
1 0 0 0 0 1 0 1 51900.0 0 1 0 39 -1.1754677333050345 1.022547844721078 -0.5026028286234501 1.8067578676948275 -0.7615114027666858
1 0 1 0 0 0 1 1 64000.0 0 1 0 29 -1.1754677333050345 0.15553966234717084 -0.5026028286234501 0.5219522728896168 0.47016099067185546
1 0 1 0 0 0 0 1 80000.0 0 1 0 67 3.338328362586298 0.22618477350356328 -0.5026028286234501 -0.7628533219155937 0.009260327588088398
1 0 1 0 0 0 1 1 67000.0 0 1 0 12 3.338328362586298 -1.5656321367358448 -0.5026028286234501 -0.7628533219155937 0.09779491823694761
0 0 0 0 1 1 0 1 87500.0 0 1 0 22 1.8337296639558538 0.8106125112519007 -0.5026028286234501 -0.7628533219155937 0.7487845553609124
0 0 1 0 1 0 1 1 68000.0 0 1 0 11 -1.1754677333050345 -1.4692978942498551 -0.5026028286234501 1.8067578676948275 2.178878590194838

Updated dataset after performing RFE feature selection:
id sn bathrms homestyle_1 garagepl homestyle_2 airco_0 homestyle_0 lotsize price
23 16 1 1 0 0 0 0 3185.0 37900.0
32 403 1 0 0 1 0 0 6825.0 77500.0
24 274 2 0 0 1 1 0 4100.0 64900.0
75 294 1 1 0 0 1 0 4040.0 47000.0
52 111 1 1 0 0 1 0 5076.0 43000.0
11 53 1 0 2 1 0 0 9166.0 68000.0
39 441 1 0 2 1 1 0 3520.0 51900.0
29 306 1 0 1 1 0 0 5885.0 64000.0
22 408 1 0 0 1 1 0 6420.0 87500.0
51 443 1 0 0 1 1 0 3520.0 65000.0

Updated dataset after performing scaling on RFE selected features :
r_homestyle_1 r_homestyle_2 r_airco_0 r_homestyle_0 price id r_sn r_bathrms r_garagepl r_lotsize
0 1 0 0 77500.0 32 0.7785010970899041 -0.5026028286234501 -0.7628533219155937 0.9597051977890769
1 0 1 0 47000.0 75 0.0784722683583791 -0.5026028286234501 -0.7628533219155937 -0.49069971372311655
0 1 1 0 65000.0 51 1.0353924103858767 -0.5026028286234501 -0.7628533219155937 -0.7615114027666858
1 0 1 0 43000.0 52 -1.096805489970695 -0.5026028286234501 -0.7628533219155937 0.04884049752522546
0 1 1 0 51900.0 39 1.022547844721078 -0.5026028286234501 1.8067578676948275 -0.7615114027666858
0 1 0 0 64000.0 29 0.15553966234717084 -0.5026028286234501 0.5219522728896168 0.47016099067185546
0 1 1 0 87500.0 22 0.8106125112519007 -0.5026028286234501 -0.7628533219155937 0.7487845553609124
0 1 1 0 80000.0 67 0.22618477350356328 -0.5026028286234501 -0.7628533219155937 0.009260327588088398
0 1 0 0 67000.0 12 -1.5656321367358448 -0.5026028286234501 -0.7628533219155937 0.09779491823694761
0 1 0 0 68000.0 11 -1.4692978942498551 -0.5026028286234501 1.8067578676948275 2.178878590194838

Updated dataset after performing scaling for PCA feature selection :
recroom_0 recroom_1 prefarea_0 driveway_0 bedrooms prefarea_1 fullbase_1 airco_1 homestyle_2 gashw_0 driveway_1 gashw_1 airco_0 homestyle_0 fullbase_0 price homestyle_1 id sn lotsize bathrms stories garagepl
0 1 0 0 3 1 1 1 1 1 1 0 0 0 0 77500.0 0 32 0.7785010970899037 0.9597051977890783 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 1 0 2 0 0 0 0 1 1 0 1 0 1 47000.0 1 75 0.07847226835837905 -0.4906997137231172 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 0 0 3 1 0 0 1 1 1 0 1 0 1 65000.0 0 51 1.035392410385876 -0.7615114027666869 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 1 1 3 0 0 0 0 1 0 0 1 0 1 43000.0 1 52 -1.0968054899706945 0.048840497525225526 -0.5026028286234494 -1.175467733305031 -0.7628533219155942
1 0 0 0 3 1 0 0 1 1 1 0 1 0 1 51900.0 0 39 1.0225478447210774 -0.7615114027666869 -0.5026028286234494 -1.175467733305031 1.8067578676948284
1 0 1 0 2 0 0 1 1 1 1 0 0 0 1 64000.0 0 29 0.15553966234717076 0.4701609906718561 -0.5026028286234494 -1.175467733305031 0.5219522728896171
1 0 1 0 3 0 0 0 1 1 1 0 1 0 1 80000.0 0 67 0.22618477350356314 0.009260327588088412 -0.5026028286234494 3.338328362586288 -0.7628533219155942
1 0 1 0 3 0 0 1 1 1 1 0 0 0 1 67000.0 0 12 -1.565632136735844 0.09779491823694775 -0.5026028286234494 3.338328362586288 -0.7628533219155942
1 0 0 0 3 1 1 0 1 1 1 0 1 0 0 87500.0 0 22 0.8106125112519003 0.7487845553609134 -0.5026028286234494 1.8337296639558482 -0.7628533219155942
1 0 1 0 2 0 1 1 1 1 1 0 0 0 0 68000.0 0 11 -1.4692978942498542 2.178878590194841 -0.5026028286234494 -1.175467733305031 1.8067578676948284

Updated dataset after performing PCA feature selection :
id col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 price
0 23 -1.846762 -1.338399 0.503782 -0.227864 -0.735116 -0.101448 1.053769 -0.133609 0.006323 -0.570542 37900.0
1 67 -1.071889 2.634593 -1.143026 1.133870 -0.666936 0.448109 -1.183424 0.156071 -0.190941 -0.026246 80000.0
2 22 0.436066 1.674569 -1.184835 -0.517098 -0.149681 0.581587 -1.015373 0.346597 -1.125296 -0.054246 87500.0
3 21 -0.556103 -0.430944 -1.299009 -1.208186 1.167272 -1.350354 0.655622 0.022532 0.184275 -0.272029 44555.0
4 12 -1.702851 2.244144 -0.163064 1.531449 -1.726684 1.435539 -0.336577 0.312267 -0.721428 -0.019429 67000.0
5 32 1.447936 -0.419323 -0.367653 -1.974154 -0.669749 0.666961 0.361696 0.420576 0.244402 0.138265 77500.0
6 24 0.121058 -0.088188 1.823276 -1.499059 0.807124 -0.001108 -0.486189 -0.327831 0.999693 0.041307 64900.0
7 75 -0.913410 -0.987968 -0.428719 -0.617452 -0.026463 -1.166772 0.029816 -0.223178 0.401544 -0.558936 47000.0
8 51 0.035124 -0.359414 -1.219128 -1.145653 0.772859 -0.721654 0.162539 -0.630366 0.185628 0.517291 65000.0
9 52 -1.417779 -1.546909 0.258422 -0.403733 -0.676443 -1.060856 0.087088 0.370494 -0.220855 0.444359 43000.0

Data Transformation completed.

Following model is being used for generating prediction :
Model ID : XGBOOST_0
Feature Selection Method : lasso

Prediction :
   id  Prediction    Confidence_Lower  Confidence_upper  price
0  22  83148.275354  -13682.206803     179978.757511     87500.0
1  21  40304.804437  -4991.597547      85601.206422      44555.0
2  32  80038.998447  -15350.979605     175428.976498     77500.0
3  24  75236.641121  -15797.504638     166270.786879     64900.0
4  51  59341.776847  -11013.321662     129696.875356     65000.0
5  52  49527.065554  -1879.863957      100933.995065     43000.0
6  11  74293.738888  -8882.318131      157469.795906     68000.0
7  39  59237.625097  -11283.510389     129758.760583     51900.0
8  29  65776.624865  -9231.694949      140784.944679     64000.0
9  75  41305.158083  -8937.031699      91547.347865      47000.0

Performance Metrics :
   MAE          MSE           MSLE      MAPE       MPE        RMSE         RMSLE    ME            R2        EV        MPD          MGD
0  7530.032113  9.803829e+07  0.032812  14.340231  -6.394165  9901.428629  0.18114  31376.260153  0.695521  0.702174  1675.413575  0.030915
>>> prediction.head()
id  Prediction          Confidence_Lower     Confidence_upper    price
10  50847.987966999994  -2114.7423931209414  103810.71832712094  41000.0
12  82823.9576065       -8536.222191293302   174184.1374042933   67000.0
13  42366.405568999995  -7506.190055977691   92239.00119397769   49000.0
14  50464.071062999996  -2300.8744488123702  103229.01657481235  48500.0
16  72076.95612349999   -17375.200759916668  161529.11300691665  72000.0
17  30842.594055999998  -5288.270194834851   66973.45830683484   27000.0
15  62003.993956999984  -9860.1547885412     133868.14270254117  61000.0
11  74293.7388875       -8882.318131336462   157469.79590633645  68000.0
9   60351.362888        -11434.725754912419  132137.45153091243  55000.0
8   59123.682122        -16171.248326522546  134418.61257052253  55000.0
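The transformation logs above show the stories column before and after the scaling step: ids 48, 22, and 12 have raw values 1, 3, and 4 and scaled values -1.1754677333050345, 1.8337296639558538, and 3.338328362586298 in the Lasso tables. The sketch below assumes the scaling is ordinary z-score standardization, z = (x - mean) / std, which the log itself does not state. Under that assumption, it recovers the implied mean and standard deviation from two of the points and checks them against the third. All names here are illustrative and not part of teradataml.

```python
import math

# (raw stories value, scaled stories value) pairs read from the logs above.
# Assumption: the scaled value is a z-score, z = (x - mean) / std.
x1, z1 = 1, -1.1754677333050345   # id 48
x2, z2 = 3, 1.8337296639558538    # id 22
x3, z3 = 4, 3.338328362586298     # id 12

# Two points determine the linear map: std is the raw change per unit of z.
implied_std = (x2 - x1) / (z2 - z1)
# Rearranging z1 = (x1 - mean) / std gives the mean.
implied_mean = x1 - z1 * implied_std

# Apply the recovered parameters to the third raw value.
z3_predicted = (x3 - implied_mean) / implied_std
consistent = math.isclose(z3_predicted, z3, rel_tol=1e-9)

print(f"implied mean of stories: {implied_mean:.5f}")
print(f"implied std of stories:  {implied_std:.5f}")
print(f"third point consistent with z-score scaling: {consistent}")
```

If the third point agrees, the logged values are consistent with a linear standardization fitted on the training data. The implied mean and standard deviation need not equal the statistics printed during feature exploration, since those were computed before the data preparation phase.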