This example predicts the price of a house based on several property features.
Run AutoRegressor to get the best-performing model with the following specifications:
- Set early stopping criteria: a time limit of 300 seconds and an R2 performance-metric threshold of 0.7.
- Exclude the ‘glm’, ‘svm’, and ‘knn’ models from the default model training list.
- Opt for verbose level 2 to get detailed logging.
- Use custom_config_file to customize specific processes in the AutoML flow.
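The two early-stopping settings act as separate limits on the search. Below is a hypothetical sketch of how they might combine. It assumes training stops as soon as either the 300-second budget is used up or a model reaches R2 >= 0.7. The function `should_stop` is illustrative only and is not part of teradataml. In this run, the best R2-score on the final leaderboard is 0.690732, just under the 0.7 threshold.

```python
def should_stop(elapsed_secs: float, best_r2: float,
                max_runtime_secs: float = 300, r2_threshold: float = 0.7) -> bool:
    """Hypothetical early-stopping check (not teradataml's internal logic):
    stop once the time budget is spent or the best R2 reaches the threshold."""
    out_of_time = elapsed_secs >= max_runtime_secs
    good_enough = best_r2 >= r2_threshold
    return out_of_time or good_enough


print(should_stop(120, 0.690732))  # False: time remains and R2 is below 0.7
print(should_stop(310, 0.690732))  # True: time budget exhausted
print(should_stop(120, 0.72))      # True: R2 threshold reached
```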
- Load the example dataset.
>>> load_example_data("decisionforestpredict", ["housing_train", "housing_test"])
>>> housing_train = DataFrame.from_table("housing_train")
>>> housing_test = DataFrame.from_table("housing_test")
- Generate a custom config JSON file.
>>> AutoRegressor.generate_custom_config("custom_housing")
Generating custom config JSON for AutoML ...
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 1
Customizing Feature Engineering Phase ...
Available options for customization of feature engineering phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Missing Value Handling
Index 2: Customize Bincode Encoding
Index 3: Customize String Manipulation
Index 4: Customize Categorical Encoding
Index 5: Customize Mathematical Transformation
Index 6: Customize Nonlinear Transformation
Index 7: Customize Antiselect Features
Index 8: Back to main menu
Index 9: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in feature engineering phase: 2,4,7,8
Customizing Bincode Encoding ...
Provide the following details to customize binning and coding encoding:
Available binning methods with corresponding indices:
Index 1: Equal-Width
Index 2: Variable-Width
Enter the feature or list of features for binning: bedrooms
Enter the index of corresponding binning method for feature bedrooms: 2
Enter the number of bins for feature bedrooms: 2
Available value type of feature for variable binning with corresponding indices:
Index 1: int
Index 2: float
Provide the range for bin 1 of feature bedrooms:
Enter the index of corresponding value type of feature bedrooms: 1
Enter the minimum value for bin 1 of feature bedrooms: 0
Enter the maximum value for bin 1 of feature bedrooms: 2
Enter the label for bin 1 of feature bedrooms: small_house
Provide the range for bin 2 of feature bedrooms:
Enter the index of corresponding value type of feature bedrooms: 1
Enter the minimum value for bin 2 of feature bedrooms: 3
Enter the maximum value for bin 2 of feature bedrooms: 5
Enter the label for bin 2 of feature bedrooms: big_house
Customization of bincode encoding has been completed successfully.
Customizing Categorical Encoding ...
Provide the following details to customize categorical encoding:
Available categorical encoding methods with corresponding indices:
Index 1: OneHotEncoding
Index 2: OrdinalEncoding
Index 3: TargetEncoding
Enter the list of corresponding index categorical encoding methods you want to use: 2,3
Enter the feature or list of features for OrdinalEncoding: homestyle
Enter the feature or list of features for TargetEncoding: prefarea
Available target encoding methods with corresponding indices:
Index 1: CBM_BETA
Index 2: CBM_DIRICHLET
Index 3: CBM_GAUSSIAN_INVERSE_GAMMA
Enter the index of target encoding method for feature prefarea: 3
Enter the response column for target encoding method for feature prefarea: price
Customization of categorical encoding has been completed successfully.
Customizing Antiselect Features ...
Enter the feature or list of features for antiselect: sn
Customization of antiselect features has been completed successfully.
Customization of feature engineering phase has been completed successfully.
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 2
Customizing Data Preparation Phase ...
Available options for customization of data preparation phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Train Test Split
Index 2: Customize Data Imbalance Handling
Index 3: Customize Outlier Handling
Index 4: Customize Feature Scaling
Index 5: Back to main menu
Index 6: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in data preparation phase: 1,2,3,4,5
Customizing Train Test Split ...
Enter the train size for train test split: 0.75
Customization of train test split has been completed successfully.
Customizing Data Imbalance Handling ...
Available data sampling methods with corresponding indices:
Index 1: SMOTE
Index 2: NearMiss
Enter the corresponding index data imbalance handling method: 1
Customization of data imbalance handling has been completed successfully.
Customizing Outlier Handling ...
Available outlier detection methods with corresponding indices:
Index 1: percentile
Index 2: tukey
Index 3: carling
Enter the corresponding index oulier handling method: 1
Enter the lower percentile value for outlier handling: 0.1
Enter the upper percentile value for outlier handling: 0.9
Enter the feature or list of features for outlier handling: bathrms
Available outlier replacement methods with corresponding indices:
Index 1: delete
Index 2: median
Index 3: Any Numeric Value
Enter the index of corresponding replacement method for feature bathrms: 1
Customization of outlier handling has been completed successfully.
Available feature scaling methods with corresponding indices:
Index 1: maxabs
Index 2: mean
Index 3: midrange
Index 4: range
Index 5: rescale
Index 6: std
Index 7: sum
Index 8: ustd
Enter the corresponding index feature scaling method: 6
Customization of feature scaling has been completed successfully.
Customization of data preparation phase has been completed successfully.
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 3
Customizing Model Training Phase ...
Available options for customization of model training phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Model Hyperparameter
Index 2: Back to main menu
Index 3: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in model training phase: 1
Customizing Model Hyperparameter ...
Available models for hyperparameter tuning with corresponding indices:
Index 1: decision_forest
Index 2: xgboost
Index 3: knn
Index 4: glm
Index 5: svm
Available hyperparamters update methods with corresponding indices:
Index 1: ADD
Index 2: REPLACE
Enter the list of model indices for performing hyperparameter tuning: 2
Available hyperparameters for model 'xgboost' with corresponding indices:
Index 1: min_impurity
Index 2: max_depth
Index 3: min_node_size
Index 4: shrinkage_factor
Index 5: iter_num
Enter the list of hyperparameter indices for model 'xgboost': 3
Enter the index of corresponding update method for hyperparameters 'min_node_size' for model 'xgboost': 1
Enter the list of value for hyperparameter 'min_node_size' for model 'xgboost': 1,2
Customization of model hyperparameter has been completed successfully.
Available options for customization of model training phase with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Model Hyperparameter
Index 2: Back to main menu
Index 3: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the list of indices you want to customize in model training phase: 2
Customization of model training phase has been completed successfully.
Available main options for customization with corresponding indices:
--------------------------------------------------------------------------------
Index 1: Customize Feature Engineering Phase
Index 2: Customize Data Preparation Phase
Index 3: Customize Model Training Phase
Index 4: Generate custom json and exit
--------------------------------------------------------------------------------
Enter the index you want to customize: 4
Generating custom json and exiting ...
Process of generating custom config file for AutoML has been completed successfully.
'custom_housing.json' file is generated successfully under the current working directory.
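The same file can be written without the interactive prompts. Below is a minimal sketch that uses Python's standard `json` module. The key names and values are copied from the configuration that `fit` echoes back later in this example ("Received below input for customization"). It is an assumption, not something this example confirms, that AutoML accepts a hand-written file with this structure. Running the sketch overwrites any existing `custom_housing.json` in the working directory.

```python
import json

# Settings chosen in the interactive session above; the key names mirror the
# JSON that fit() prints when it reads custom_housing.json.
custom_config = {
    "BincodeIndicator": True,
    "BincodeParam": {
        "bedrooms": {
            "Type": "Variable-Width",
            "NumOfBins": 2,
            "Bin_1": {"min_value": 0, "max_value": 2, "label": "small_house"},
            "Bin_2": {"min_value": 3, "max_value": 5, "label": "big_house"},
        }
    },
    "CategoricalEncodingIndicator": True,
    "CategoricalEncodingParam": {
        "OrdinalEncodingIndicator": True,
        "OrdinalEncodingList": ["homestyle"],
        "TargetEncodingIndicator": True,
        "TargetEncodingList": {
            "prefarea": {
                "encoder_method": "CBM_GAUSSIAN_INVERSE_GAMMA",
                "response_column": "price",
            }
        },
    },
    "AntiselectIndicator": True,
    "AntiselectParam": ["sn"],
    "TrainTestSplitIndicator": True,
    "TrainingSize": 0.75,
    "DataImbalanceIndicator": True,
    "DataImbalanceMethod": "SMOTE",
    "OutlierFilterIndicator": True,
    "OutlierFilterMethod": "percentile",
    "OutlierLowerPercentile": 0.1,
    "OutlierUpperPercentile": 0.9,
    "OutlierFilterParam": {"bathrms": {"replacement_value": "delete"}},
    "FeatureScalingIndicator": True,
    "FeatureScalingMethod": "std",
    "HyperparameterTuningIndicator": True,
    "HyperparameterTuningParam": {
        "xgboost": {"min_node_size": {"Method": "ADD", "Value": [1, 2]}}
    },
}

# Write the configuration as indented JSON (Python True becomes JSON true).
with open("custom_housing.json", "w") as f:
    json.dump(custom_config, f, indent=4)
```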
- Create an AutoRegressor instance.
>>> aml = AutoRegressor(exclude=['glm', 'svm', 'knn'], verbose=2, max_runtime_secs=300, stopping_metric='R2', stopping_tolerance=0.7, custom_config_file='custom_housing.json')
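The customized feature-engineering and data-preparation settings are applied inside `fit`. Below is a rough, in-memory illustration of three of them: variable-width binning, percentile-based row deletion, and standard scaling. It uses plain Python on small hypothetical inputs and is not how teradataml runs these steps in Vantage. All helper names are hypothetical.

A few further assumptions apply:
- The percentile uses linear interpolation, which may differ from teradataml's definition.
- The scaling uses the sample standard deviation.
- Values outside every bin are returned unchanged. The dataset's maximum bedrooms value is 6, and the later one-hot step creates three `bedrooms_*` columns, which suggests (but does not confirm) that teradataml also leaves such values unlabeled.

```python
import statistics

# Bin ranges from the custom config: bedrooms 0-2 -> small_house, 3-5 -> big_house.
BINS = [(0, 2, "small_house"), (3, 5, "big_house")]


def variable_width_bin(value, bins=BINS):
    """Return the label of the first inclusive range containing value,
    or the value itself when no range matches (assumption of this sketch)."""
    for low, high, label in bins:
        if low <= value <= high:
            return label
    return value


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def delete_outliers(rows, column, lower_q=0.1, upper_q=0.9):
    """Drop rows whose value in column lies outside the [lower_q, upper_q] percentiles."""
    values = [row[column] for row in rows]
    lo, hi = percentile(values, lower_q), percentile(values, upper_q)
    return [row for row in rows if lo <= row[column] <= hi]


def std_scale(values):
    """Standardize to zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical rows, not taken from housing_train.
rows = [{"bathrms": b} for b in (1, 1, 1, 2, 4)]
print([variable_width_bin(b) for b in (2, 3, 6)])  # ['small_house', 'big_house', 6]
print(len(delete_outliers(rows, "bathrms")))       # 4: the bathrms=4 row is dropped
print(std_scale([1, 2, 3]))                        # [-1.0, 0.0, 1.0]
```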
- Fit the data.
>>> aml.fit(housing_train, housing_train.price)
Received below input for customization :
{
    "BincodeIndicator": true,
    "BincodeParam": {
        "bedrooms": {
            "Type": "Variable-Width",
            "NumOfBins": 2,
            "Bin_1": {
                "min_value": 0,
                "max_value": 2,
                "label": "small_house"
            },
            "Bin_2": {
                "min_value": 3,
                "max_value": 5,
                "label": "big_house"
            }
        }
    },
    "CategoricalEncodingIndicator": true,
    "CategoricalEncodingParam": {
        "OrdinalEncodingIndicator": true,
        "OrdinalEncodingList": [
            "homestyle"
        ],
        "TargetEncodingIndicator": true,
        "TargetEncodingList": {
            "prefarea": {
                "encoder_method": "CBM_GAUSSIAN_INVERSE_GAMMA",
                "response_column": "price"
            }
        }
    },
    "AntiselectIndicator": true,
    "AntiselectParam": [
        "sn"
    ],
    "TrainTestSplitIndicator": true,
    "TrainingSize": 0.75,
    "DataImbalanceIndicator": true,
    "DataImbalanceMethod": "SMOTE",
    "OutlierFilterIndicator": true,
    "OutlierFilterMethod": "percentile",
    "OutlierLowerPercentile": 0.1,
    "OutlierUpperPercentile": 0.9,
    "OutlierFilterParam": {
        "bathrms": {
            "replacement_value": "delete"
        }
    },
    "FeatureScalingIndicator": true,
    "FeatureScalingMethod": "std",
    "HyperparameterTuningIndicator": true,
    "HyperparameterTuningParam": {
        "xgboost": {
            "min_node_size": {
                "Method": "ADD",
                "Value": [
                    1,
                    2
                ]
            }
        }
    }
}
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Feature Exploration started ...
Data Overview:
Total Rows in the data: 492
Total Columns in the data: 14
Column Summary:
ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage
recroom VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
homestyle VARCHAR(20) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
sn INTEGER 492 0 None 0 492 0 0.0 100.0
price FLOAT 492 0 None 0 492 0 0.0 100.0
prefarea VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
airco VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
stories INTEGER 492 0 None 0 492 0 0.0 100.0
fullbase VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
bedrooms INTEGER 492 0 None 0 492 0 0.0 100.0
gashw VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
bathrms INTEGER 492 0 None 0 492 0 0.0 100.0
garagepl INTEGER 492 0 None 270 222 0 0.0 100.0
lotsize FLOAT 492 0 None 0 492 0 0.0 100.0
driveway VARCHAR(10) CHARACTER SET LATIN 492 0 0 None None None 0.0 100.0
Statistics of Data:
func sn price lotsize bedrooms bathrms stories garagepl
min 1 25000 1650 1 1 1 0
std 159.501 26472.496 2182.443 0.731 0.51 0.861 0.854
25% 132.5 49975 3600 2 1 1 0
50% 274 62000 4616 3 1 2 0
75% 413.25 82000 6370 3 2 2 1
max 546 190000 16200 6 4 4 3
mean 272.943 68100.396 5181.795 2.965 1.293 1.803 0.685
count 492 492 492 492 492 492 492
Categorical Columns with their Distinct values:
ColumnName DistinctValueCount
driveway 2
recroom 2
fullbase 2
gashw 2
airco 2
prefarea 2
homestyle 3
No Futile columns found.
Target Column Distribution:
Columns with outlier percentage :-
  ColumnName OutlierPercentage
0 lotsize 2.235772
1 bedrooms 2.235772
2 garagepl 2.235772
3 stories 7.113821
4 price 2.439024
5 bathrms 0.203252
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Feature Engineering started ...
Handling duplicate records present in dataset ...
Analysis completed. No action taken.
Total time to handle duplicate records: 1.49 sec
Handling less significant features from data ...
Analysis indicates all categorical columns are significant. No action Needed.
Total time to handle less significant features: 15.14 sec
Handling Date Features ...
Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
Total time to handle date features: 0.00 sec
Proceeding with default option for missing value imputation.
Proceeding with default option for handling remaining missing values.
Checking Missing values in dataset ...
Analysis Completed. No Missing Values Detected.
Total time to find missing values in data: 7.37 sec
Imputing Missing Values ...
Analysis completed. No imputation required.
Time taken to perform imputation: 0.01 sec
No information provided for Equal-Width Transformation.
Variable-Width binning information:-
  ColumnName MinValue MaxValue Label
0 bedrooms 0 2 small_house
1 bedrooms 3 5 big_house
2 rows X 4 columns
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816329745055"'0
Updated dataset sample after performing Variable-Width binning:
bathrms lotsize airco gashw garagepl id recroom sn driveway stories prefarea fullbase homestyle price bedrooms
3 4410.0 no no 2 118 no 257 yes 2 no yes Eclectic 71000.0 big_house
3 8580.0 no no 2 510 no 44 yes 2 no no Eclectic 92000.0 big_house
3 3630.0 no no 0 92 yes 55 no 2 no no Classic 38000.0 big_house
3 2610.0 no no 0 73 no 156 no 2 no no Eclectic 60000.0 big_house
3 7500.0 yes no 2 338 no 338 yes 1 yes yes bungalow 155000.0 big_house
3 6000.0 no yes 2 66 yes 217 yes 2 no yes bungalow 138300.0 big_house
3 3300.0 no no 0 291 no 102 yes 2 no yes Eclectic 79000.0 big_house
1 10240.0 yes no 2 96 no 421 yes 1 yes no Eclectic 68000.0 small_house
1 6000.0 yes no 1 59 no 324 yes 1 no no Eclectic 98000.0 big_house
1 6060.0 no no 0 123 yes 91 yes 1 no yes Classic 47000.0 big_house
492 rows X 15 columns
Skipping customized string manipulation.
⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾| 25% - 5/20
Starting Customized Categorical Feature Encoding ...
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816245888910"'0
Updated dataset sample after performing ordinal encoding:
bathrms bedrooms lotsize gashw garagepl id recroom sn driveway stories prefarea airco fullbase price homestyle
3 big_house 3630.0 no 0 92 yes 55 no 2 no no no 38000.0 1
3 big_house 8580.0 no 2 76 no 362 yes 4 yes yes no 145000.0 0
3 big_house 6000.0 yes 2 66 yes 217 yes 2 no no yes 138300.0 0
3 big_house 3300.0 no 0 291 no 102 yes 2 no no yes 79000.0 2
3 big_house 8580.0 no 2 510 no 44 yes 2 no no no 92000.0 2
3 big_house 4410.0 no 2 118 no 257 yes 2 no no yes 71000.0 2
3 big_house 5960.0 no 1 194 yes 127 yes 2 no no yes 117000.0 0
1 big_house 7020.0 no 2 262 no 388 yes 1 yes yes yes 85000.0 2
1 small_house 6800.0 no 2 121 yes 354 yes 1 no no yes 86000.0 2
1 small_house 3640.0 no 1 9 no 265 yes 1 no no no 50000.0 1
492 rows X 15 columns
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814972077644"'0
Updated dataset sample after performing target encoding:
prefarea bedrooms bathrms lotsize gashw garagepl id recroom sn driveway stories airco homestyle fullbase price
62906.33597883598 big_house 1 3300.0 no 1 33 no 17 no 2 no 1 no 40500.0
62906.33597883598 small_house 1 8400.0 no 1 495 no 494 yes 1 no 2 no 54000.0
62906.33597883598 big_house 1 4500.0 no 0 488 no 145 no 2 yes 2 yes 57250.0
62906.33597883598 big_house 1 4046.0 no 1 228 no 348 yes 2 no 2 yes 59500.0
62906.33597883598 small_house 1 2640.0 no 1 173 no 211 no 1 no 1 no 40500.0
62906.33597883598 big_house 1 5200.0 no 0 111 no 320 yes 3 yes 2 no 83000.0
83851.72413793103 big_house 1 2145.0 no 0 387 no 460 yes 2 no 1 yes 47000.0
83851.72413793103 small_house 1 10360.0 no 1 353 no 477 yes 1 no 2 no 61500.0
83851.72413793103 big_house 1 7000.0 no 2 360 no 399 yes 1 no 2 yes 82900.0
83851.72413793103 big_house 1 6600.0 no 3 93 no 360 yes 4 yes 0 no 107000.0
492 rows X 15 columns
Performing encoding for categorical columns ...
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815195827284"'0
ONE HOT Encoding these Columns:
['bedrooms', 'gashw', 'recroom', 'driveway', 'airco', 'fullbase']
Sample of dataset after performing one hot encoding:
prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 sn driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price
83851.72413793103 1 0 0 1 7160.0 1 0 2 124 1 0 379 0 1 1 1 0 2 0 1 84000.0
83851.72413793103 1 0 0 1 5020.0 1 0 0 444 1 0 393 0 1 4 0 1 2 1 0 96000.0
83851.72413793103 1 0 0 1 3800.0 1 0 1 404 0 1 456 0 1 2 1 0 2 0 1 75000.0
83851.72413793103 1 0 0 1 3520.0 1 0 0 183 1 0 438 0 1 2 1 0 2 1 0 60000.0
83851.72413793103 1 0 0 1 2880.0 1 0 0 257 1 0 424 0 1 2 1 0 2 1 0 62900.0
83851.72413793103 1 0 0 1 9620.0 1 0 2 393 1 0 391 0 1 1 1 0 2 0 1 86900.0
62906.33597883598 1 0 0 1 3300.0 1 0 1 33 1 0 17 1 0 2 1 0 1 1 0 40500.0
62906.33597883598 0 0 1 1 8400.0 1 0 1 495 1 0 494 0 1 1 1 0 2 1 0 54000.0
62906.33597883598 1 0 0 1 4500.0 1 0 0 488 1 0 145 1 0 2 0 1 2 0 1 57250.0
62906.33597883598 1 0 0 1 4046.0 1 0 1 228 1 0 348 0 1 2 1 0 2 0 1 59500.0
492 rows X 22 columns
Time taken to encode the columns: 13.96 sec
Starting customized mathematical transformation ...
Skipping customized mathematical transformation.
Starting customized non-linear transformation ...
Skipping customized non-linear transformation.
Starting customized anti-select columns ...
Updated dataset sample after performing anti-select columns:
prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price
83851.72413793103 1 0 0 1 7160.0 1 0 2 124 1 0 0 1 1 1 0 2 0 1 84000.0
83851.72413793103 1 0 0 1 5020.0 1 0 0 444 1 0 0 1 4 0 1 2 1 0 96000.0
83851.72413793103 1 0 0 1 3800.0 1 0 1 404 0 1 0 1 2 1 0 2 0 1 75000.0
83851.72413793103 1 0 0 1 3520.0 1 0 0 183 1 0 0 1 2 1 0 2 1 0 60000.0
83851.72413793103 1 0 0 1 2880.0 1 0 0 257 1 0 0 1 2 1 0 2 1 0 62900.0
83851.72413793103 1 0 0 1 9620.0 1 0 2 393 1 0 0 1 1 1 0 2 0 1 86900.0
62906.33597883598 1 0 0 1 3300.0 1 0 1 33 1 0 1 0 2 1 0 1 1 0 40500.0
62906.33597883598 0 0 1 1 8400.0 1 0 1 495 1 0 0 1 1 1 0 2 1 0 54000.0
62906.33597883598 1 0 0 1 4500.0 1 0 0 488 1 0 1 0 2 0 1 2 0 1 57250.0
62906.33597883598 1 0 0 1 4046.0 1 0 1 228 1 0 0 1 2 1 0 2 0 1 59500.0
492 rows X 21 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Data preparation started ...
Spliting of dataset into training and testing ...
Training size : 0.75
Testing size : 0.25
Training data sample
prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price
62906.33597883598 0 0 1 1 6360.0 1 0 1 12 1 0 0 1 1 0 1 2 0 1 63900.0
62906.33597883598 1 0 0 1 8372.0 1 0 2 16 1 0 0 1 3 0 1 2 1 0 87000.0
62906.33597883598 1 0 0 2 4500.0 1 0 0 17 1 0 1 0 2 0 1 2 0 1 57000.0
62906.33597883598 1 0 0 1 6840.0 1 0 1 18 0 1 0 1 2 0 1 0 0 1 116000.0
62906.33597883598 1 0 0 1 5800.0 0 1 2 21 1 0 0 1 1 1 0 2 1 0 60000.0
62906.33597883598 0 0 1 1 3649.0 1 0 0 22 1 0 0 1 1 1 0 1 1 0 27000.0
83851.72413793103 0 0 1 1 5320.0 1 0 1 15 1 0 0 1 1 1 0 1 1 0 49500.0
83851.72413793103 1 0 0 2 6600.0 1 0 0 29 0 1 0 1 2 1 0 2 0 1 78000.0
83851.72413793103 1 0 0 1 11440.0 1 0 1 37 1 0 0 1 2 1 0 0 0 1 104900.0
83851.72413793103 1 0 0 1 6360.0 1 0 0 41 1 0 0 1 3 1 0 2 1 0 80000.0
369 rows X 21 columns
Testing data sample
prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price
62906.33597883598 1 0 0 2 8880.0 1 0 1 13 1 0 0 1 2 0 1 2 0 1 99000.0
62906.33597883598 1 0 0 1 2400.0 1 0 0 34 1 0 1 0 1 1 0 1 1 0 25245.0
62906.33597883598 1 0 0 2 9800.0 1 0 2 36 0 1 0 1 2 1 0 2 1 0 75000.0
62906.33597883598 1 0 0 1 3000.0 1 0 0 38 1 0 0 1 2 1 0 2 1 0 56000.0
62906.33597883598 0 0 1 1 4040.0 1 0 1 67 1 0 0 1 2 1 0 2 1 0 58500.0
62906.33597883598 1 0 0 1 2970.0 1 0 0 72 1 0 0 1 3 1 0 2 1 0 70000.0
83851.72413793103 1 0 0 1 11460.0 1 0 2 19 1 0 0 1 3 1 0 2 1 0 83900.0
83851.72413793103 1 0 0 2 5500.0 1 0 1 27 1 0 0 1 2 0 1 0 0 1 120000.0
83851.72413793103 1 0 0 2 4880.0 1 0 1 40 1 0 0 1 2 0 1 0 1 0 118500.0
83851.72413793103 1 0 0 1 2145.0 1 0 0 75 1 0 0 1 3 1 0 1 1 0 49500.0
123 rows X 21 columns
Time taken for spliting of data: 14.48 sec
Starting customized outlier processing ...
Columns with outlier percentage :-
  ColumnName OutlierPercentage
0 garagepl 2.235772
1 price 8.739837
2 id 9.756098
3 lotsize 9.552846
4 bathrms 2.235772
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815695090274"'
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816723333520"'20
result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816253699382"'
Feature selection using lasso ...
feature selected by lasso:
['bathrms', 'fullbase_1', 'gashw_0', 'driveway_0', 'stories', 'airco_1', 'gashw_1', 'bedrooms_0', 'bedrooms_2', 'driveway_1', 'garagepl', 'recroom_0', 'fullbase_0', 'homestyle', 'airco_0', 'prefarea', 'lotsize']
Total time taken by feature selection: 1.43 sec
scaling Features of lasso data ...
columns that will be scaled:
['bathrms', 'stories', 'garagepl', 'homestyle', 'prefarea', 'lotsize']
Training dataset sample after scaling:
driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 bathrms stories garagepl homestyle prefarea lotsize
1 0 47000.0 1 1 1 56 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.6389834756925418
1 1 52000.0 1 1 1 26 0 0 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.7437609001391254
1 0 64000.0 1 1 1 245 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 0.3943732959927247 0.7474571831400929 -0.552679423766173 -0.5219348213558176
1 0 78000.0 1 1 0 175 0 1 1 0 0 0 -0.5698449326198071 2.5617608810097847 -0.7757094582336237 0.7474571831400929 -0.552679423766173 0.5022409040905186
0 0 40500.0 1 1 1 33 1 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 0.3943732959927247 -0.7516329215933895 -0.552679423766173 -0.87119290284443
1 1 57000.0 0 1 0 100 0 0 0 1 0 1 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 1.8093671611394 -0.5172151175519175
1 0 57500.0 1 1 1 427 0 1 0 1 0 0 -0.5698449326198071 -0.8999912756370636 2.734538804445422 0.7474571831400929 -0.552679423766173 0.11994489597460505
1 1 75000.0 1 1 1 116 0 0 1 0 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.41810133767001395
1 0 52000.0 0 1 1 28 0 1 1 0 0 1 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -1.012784016961435
0 0 46000.0 1 1 1 32 1 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.956147571314633
359 rows X 19 columns
Testing dataset sample after scaling:
driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 bathrms stories garagepl homestyle prefarea lotsize
0 0 25245.0 1 1 1 34 1 1 0 1 0 0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.295966245195445
1 0 56000.0 1 1 1 38 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -1.012784016961435
0 0 47900.0 1 1 1 122 1 1 0 1 0 0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.15437513107844
1 1 51000.0 1 1 1 140 0 0 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.9419884599029325
0 1 44000.0 1 1 1 452 1 0 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.409239136489049
1 1 60000.0 0 1 0 153 0 0 0 1 0 1 -0.5698449326198071 -0.8999912756370636 1.5644560502190732 0.7474571831400929 -0.552679423766173 0.33233156715011253
1 0 47500.0 0 1 1 142 0 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 0.3943732959927247 -0.7516329215933895 -0.552679423766173 -0.5219348213558176
1 0 59900.0 1 1 1 400 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 0.3943732959927247 0.7474571831400929 -0.552679423766173 -0.8003973457859275
1 0 92000.0 1 1 1 510 0 1 0 1 0 0 4.079571676709981 0.2539261099118858 1.5644560502190732 0.7474571831400929 -0.552679423766173 1.6208107056148582
0 1 70000.0 1 1 0 147 1 0 1 0 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.4959764504343667
123 rows X 19 columns
Total time taken by feature scaling: 54.15 sec
Feature selection using rfe ...
feature selected by RFE:
['bathrms', 'fullbase_1', 'gashw_0', 'driveway_0', 'stories', 'airco_1', 'gashw_1', 'bedrooms_0', 'bedrooms_2', 'driveway_1', 'garagepl', 'recroom_0', 'fullbase_0', 'homestyle', 'airco_0', 'recroom_1', 'prefarea', 'lotsize']
Total time taken by feature selection: 55.81 sec
scaling Features of rfe data ...
columns that will be scaled:
['r_bathrms', 'r_stories', 'r_garagepl', 'r_homestyle', 'r_prefarea', 'r_lotsize']
Training dataset sample after scaling:
r_gashw_0 r_fullbase_1 r_driveway_1 r_recroom_1 r_bedrooms_0 r_bedrooms_2 id r_recroom_0 r_gashw_1 r_driveway_0 r_airco_0 r_airco_1 r_fullbase_0 price r_bathrms r_stories r_garagepl r_homestyle r_prefarea r_lotsize
1 0 1 0 1 0 56 1 0 0 1 0 1 47000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.6389834756925418
1 1 1 0 1 0 26 1 0 0 1 0 0 52000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.7437609001391254
1 0 1 0 1 0 245 1 0 0 1 0 1 64000.0 -0.5698449326198071 0.2539261099118858 0.3943732959927247 0.7474571831400929 -0.552679423766173 -0.5219348213558176
1 0 1 1 1 0 175 0 0 0 0 1 1 78000.0 -0.5698449326198071 2.5617608810097847 -0.7757094582336237 0.7474571831400929 -0.552679423766173 0.5022409040905186
1 0 0 0 1 0 33 1 0 1 1 0 1 40500.0 -0.5698449326198071 0.2539261099118858 0.3943732959927247 -0.7516329215933895 -0.552679423766173 -0.87119290284443
1 1 1 1 0 1 100 0 0 0 1 0 0 57000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 1.8093671611394 -0.5172151175519175
1 0 1 0 1 0 427 1 0 0 1 0 1 57500.0 -0.5698449326198071 -0.8999912756370636 2.734538804445422 0.7474571831400929 -0.552679423766173 0.11994489597460505
1 1 1 0 1 0 116 1 0 0 0 1 0 75000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.41810133767001395
1 0 1 0 0 1 28 1 0 0 0 1 1 52000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -1.012784016961435
1 0 0 0 1 0 32 1 0 1 1 0 1 46000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.956147571314633
359 rows X 20 columns
Testing dataset sample after scaling:
r_gashw_0 r_fullbase_1 r_driveway_1 r_recroom_1 r_bedrooms_0 r_bedrooms_2 id r_recroom_0 r_gashw_1 r_driveway_0 r_airco_0 r_airco_1 r_fullbase_0 price r_bathrms r_stories r_garagepl r_homestyle r_prefarea r_lotsize
1 0 0 0 1 0 34 1 0 1 1 0 1 25245.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.295966245195445
1 0 1 0 1 0 38 1 0 0 1 0 1 56000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -1.012784016961435
1 0 0 0 1 0 122 1 0 1 1 0 1 47900.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.15437513107844
1 1 1 0 1 0 140 1 0 0 1 0 0 51000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.9419884599029325
1 1 0 0 1 0 452 1 0 1 1 0 0 44000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.409239136489049
1 1 1 1 0 1 153 0 0 0 1 0 0 60000.0 -0.5698449326198071 -0.8999912756370636 1.5644560502190732 0.7474571831400929 -0.552679423766173 0.33233156715011253
1 0 1 0 0 1 142 1 0 0 1 0 1 47500.0 -0.5698449326198071 -0.8999912756370636 0.3943732959927247 -0.7516329215933895 -0.552679423766173 -0.5219348213558176
1 0 1 0 1 0 400 1 0 0 1 0 1 59900.0 -0.5698449326198071 0.2539261099118858 0.3943732959927247 0.7474571831400929 -0.552679423766173 -0.8003973457859275
1 0 1 0 1 0 510 1 0 0 1 0 1 92000.0 4.079571676709981 0.2539261099118858 1.5644560502190732 0.7474571831400929 -0.552679423766173 1.6208107056148582
1 1 0 1 1 0 147 0 0 1 0 1 0 70000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.4959764504343667
123 rows X 20 columns
Total time taken by feature scaling: 51.49 sec
scaling Features of pca data ...
columns that will be scaled:
['prefarea', 'bathrms', 'lotsize', 'garagepl', 'stories', 'homestyle']
Training dataset sample after scaling:
bedrooms_1 driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 recroom_1 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 prefarea bathrms lotsize garagepl stories homestyle
0 1 0 65500.0 1 1 1 44 0 0 1 0 1 0 0 1.8093671611393762 -0.5698449326198075 -0.6163288974338209 0.39437329599272386 0.2539261099118857 0.7474571831400939
0 1 0 54000.0 1 1 1 54 0 0 1 0 1 0 0 1.8093671611393762 -0.5698449326198075 -1.0807477517375974 -0.7757094582336221 1.4078434954608345 0.7474571831400939
0 1 1 61100.0 1 1 1 58 0 0 0 0 1 0 0 1.8093671611393762 -0.5698449326198075 -0.8239958648054283 1.5644560502190699 0.2539261099118857 0.7474571831400939
0 1 0 51500.0 0 1 1 61 0 0 1 0 1 0 1 1.8093671611393762 -0.5698449326198075 -0.5408136365714183 -0.7757094582336221 -0.8999912756370632 0.7474571831400939
0 1 1 95000.0 1 1 0 81 0 1 0 1 0 0 0 1.8093671611393762 -0.5698449326198075 0.26153601009161004 1.5644560502190699 -0.8999912756370632 0.7474571831400939
0 1 0 103500.0 1 1 0 88 0 1 1 1 0 0 0 1.8093671611393762 1.7548633720450884 1.8190382653786652 0.39437329599272386 2.5617608810097834 -2.2507230263268747
0 1 1 63900.0 0 1 1 12 0 0 0 1 0 0 1 -0.5526794237661955 -0.5698449326198075 0.5730364611490211 0.39437329599272386 -0.8999912756370632 0.7474571831400939
0 1 0 87000.0 1 1 1 16 0 0 1 1 0 0 0 -0.5526794237661955 -0.5698449326198075 1.5226408664937345 1.5644560502190699 1.4078434954608345 0.7474571831400939
0 0 1 57000.0 1 1 1 17 1 0 0 1 0 0 0 -0.5526794237661955 1.7548633720450884 -0.30482844637640993 -0.7757094582336221 0.2539261099118857 0.7474571831400939
0 1 1 116000.0 1 1 0 18 0 1 0 1 0 0 0 -0.5526794237661955 -0.5698449326198075 0.7995822437362291 0.39437329599272386 0.2539261099118857 -2.2507230263268747
359 rows X 21 columns
Testing dataset sample after scaling:
bedrooms_1 driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 recroom_1 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 prefarea bathrms lotsize garagepl stories homestyle
0 1 1 99000.0 1 1 1 13 0 0 0 1 0 0 0 -0.5526794237661955 1.7548633720450884 1.7624018197318632 0.39437329599272386 0.2539261099118857 0.7474571831400939
0 0 0 25245.0 1 1 1 34 1 0 1 0 1 0 0 -0.5526794237661955 -0.5698449326198075 -1.295966245195445 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905
0 1 0 75000.0 1 1 0 36 0 1 1 0 1 0 0 -0.5526794237661955 1.7548633720450884 2.1966145696906785 1.5644560502190699 0.2539261099118857 0.7474571831400939
0 1 0 56000.0 1 1 1 38 0 0 1 0 1 0 0 -0.5526794237661955 -0.5698449326198075 -1.012784016961435 -0.7757094582336221 0.2539261099118857 0.7474571831400939
0 1 0 58500.0 0 1 1 67 0 0 1 0 1 0 1 -0.5526794237661955 -0.5698449326198075 -0.5219348213558176 0.39437329599272386 0.2539261099118857 0.7474571831400939
0 1 0 70000.0 1 1 1 72 0 0 1 0 1 0 0 -0.5526794237661955 -0.5698449326198075 -1.0269431283731354 -0.7757094582336221 1.4078434954608345 0.7474571831400939
0 1 0 83900.0 1 1 1 19 0 0 1 0 1 0 0 1.8093671611393762 -0.5698449326198075 2.980085401138106 1.5644560502190699 1.4078434954608345 0.7474571831400939
0 1 1 120000.0 1 1 1 27 0 0 0 1 0 0 0 1.8093671611393762 1.7548633720450884 0.16714193401360672 0.39437329599272386 0.2539261099118857 -2.2507230263268747
0 1 0 118500.0 1 1 1 40 0 0 1 1 0 0 0 1.8093671611393762 1.7548633720450884 -0.12547970182820362 0.39437329599272386 0.2539261099118857 -2.2507230263268747
0 1 0 49500.0 1 1 1 75 0 0 1 0 1 0 0 1.8093671611393762 -0.5698449326198075 -1.4163186921948991 -0.7757094582336221 1.4078434954608345 -0.7516329215933905
123 rows X 21 columns
Total time taken by feature scaling: 50.44 sec
Dimension Reduction using pca ...
PCA columns:
['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9']
Total time taken by PCA: 10.63 sec
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Model Training started ...
Starting customized hyperparameter update ...
Completed customized hyperparameter update.
Hyperparameters used for model training:
response_column : price
name : xgboost
model_type : Regression
column_sampling : (1, 0.6)
min_impurity : (0.0, 0.1, 0.2, 0.3)
lambda1 : (0.01, 0.1, 1, 10)
shrinkage_factor : (0.5, 0.01, 0.05, 0.1)
max_depth : (5, 3, 4, 7, 8)
min_node_size : (1, 2, 3, 4)
iter_num : (10, 20, 30, 40)
Total number of models for xgboost : 10240
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
response_column : price
name : decision_forest
tree_type : Regression
min_impurity : (0.0, 0.1, 0.2, 0.3)
max_depth : (5, 3, 4, 7, 8)
min_node_size : (1, 2, 3, 4)
num_trees : (-1, 20, 30, 40)
Total number of models for decision_forest : 320
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Performing hyperParameter tuning ...
xgboost
----------------------------------------------------------------------------------------------------
decision_forest
----------------------------------------------------------------------------------------------------
Evaluating models performance ...
Evaluation completed.
Leaderboard
   Rank          Model-ID  Feature-Selection           MAE           MSE      MSLE          RMSE     RMSLE  R2-score  Adjusted R2-score
0     1         XGBOOST_2                pca   9742.422518  2.064480e+08  0.035378  14368.297209  0.188091  0.690732           0.663119
1     2  DECISIONFOREST_1                rfe  11097.622077  2.093546e+08  0.041378  14469.090168  0.203416  0.686378           0.632097
2     3         XGBOOST_1                rfe  11162.817285  2.219267e+08  0.050646  14897.203438  0.225047  0.667544           0.610004
3     4  DECISIONFOREST_2                pca  10892.666297  2.371269e+08  0.041082  15398.924619  0.202686  0.644773           0.613057
4     5  DECISIONFOREST_0              lasso  12588.499123  3.056303e+08  0.048797  17482.284634  0.220900  0.542152           0.468025
5     6         XGBOOST_3              lasso  12186.337469  3.168533e+08  0.047248  17800.374075  0.217365  0.525340           0.448490
6     7         XGBOOST_0              lasso  12186.337469  3.168533e+08  0.047248  17800.374075  0.217365  0.525340           0.448490
7     8  DECISIONFOREST_3              lasso  14983.007904  4.247282e+08  0.066943  20608.935846  0.258733  0.363738           0.260724
8 rows X 10 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 20/20
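The "Total number of models" figures in the hyperparameter summary are the size of the full Cartesian product of the candidate values: 2 × 4 × 4 × 4 × 5 × 4 × 4 = 10240 for xgboost and 4 × 5 × 4 × 4 = 320 for decision_forest. The sketch below recomputes those counts in plain Python from the tuples printed in the log. It only checks the arithmetic; the helper names are hypothetical, and it does not reproduce how AutoML chooses which combinations to train.

```python
from itertools import product
from math import prod

# Candidate values copied from the "Hyperparameters used for model training" log.
xgboost_grid = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1, 0.2, 0.3),
    "lambda1": (0.01, 0.1, 1, 10),
    "shrinkage_factor": (0.5, 0.01, 0.05, 0.1),
    "max_depth": (5, 3, 4, 7, 8),
    "min_node_size": (1, 2, 3, 4),
    "iter_num": (10, 20, 30, 40),
}
decision_forest_grid = {
    "min_impurity": (0.0, 0.1, 0.2, 0.3),
    "max_depth": (5, 3, 4, 7, 8),
    "min_node_size": (1, 2, 3, 4),
    "num_trees": (-1, 20, 30, 40),
}


def grid_size(grid):
    """Size of a full grid: the product of the number of choices per parameter."""
    return prod(len(values) for values in grid.values())


def iter_grid(grid):
    """Yield every combination as a dict (hypothetical helper for illustration)."""
    names = list(grid)
    for combo in product(*grid.values()):
        yield dict(zip(names, combo))


print(grid_size(xgboost_grid))          # 10240
print(grid_size(decision_forest_grid))  # 320
```

The leaderboard lists only eight trained models, so AutoML appears to evaluate a small subset of these grids under the configured limits rather than exhausting them.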
- Display the model leaderboard.
>>> aml.leaderboard()
   Rank          Model-ID  Feature-Selection           MAE           MSE      MSLE          RMSE     RMSLE  R2-score  Adjusted R2-score
0     1         XGBOOST_2                pca   9742.422518  2.064480e+08  0.035378  14368.297209  0.188091  0.690732           0.663119
1     2  DECISIONFOREST_1                rfe  11097.622077  2.093546e+08  0.041378  14469.090168  0.203416  0.686378           0.632097
2     3         XGBOOST_1                rfe  11162.817285  2.219267e+08  0.050646  14897.203438  0.225047  0.667544           0.610004
3     4  DECISIONFOREST_2                pca  10892.666297  2.371269e+08  0.041082  15398.924619  0.202686  0.644773           0.613057
4     5  DECISIONFOREST_0              lasso  12588.499123  3.056303e+08  0.048797  17482.284634  0.220900  0.542152           0.468025
5     6         XGBOOST_3              lasso  12186.337469  3.168533e+08  0.047248  17800.374075  0.217365  0.525340           0.448490
6     7         XGBOOST_0              lasso  12186.337469  3.168533e+08  0.047248  17800.374075  0.217365  0.525340           0.448490
7     8  DECISIONFOREST_3              lasso  14983.007904  4.247282e+08  0.066943  20608.935846  0.258733  0.363738           0.260724
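The Adjusted R2-score column is consistent with the standard adjustment adj = 1 - (1 - R2)(n - 1)/(n - p - 1). In the sketch below, n = 123 is an assumption taken from the "123 rows" reported for the scaled testing (validation) data, and p is the number of predictor columns each feature-selection method produced, counted from the logged column lists (10 PCA components, 18 RFE features, 17 Lasso features). These inputs are inferred from the output, not documented AutoML behavior.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Standard adjusted R^2 for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Assumption: n is the size of the 123-row validation split shown in the log.
N_VALIDATION = 123
# Assumption: predictor counts inferred from the logged column lists.
FEATURE_COUNTS = {"pca": 10, "rfe": 18, "lasso": 17}

# (Model-ID, Feature-Selection, R2-score, Adjusted R2-score) copied from the leaderboard
leaderboard = [
    ("XGBOOST_2", "pca", 0.690732, 0.663119),
    ("DECISIONFOREST_1", "rfe", 0.686378, 0.632097),
    ("XGBOOST_1", "rfe", 0.667544, 0.610004),
    ("DECISIONFOREST_2", "pca", 0.644773, 0.613057),
    ("DECISIONFOREST_0", "lasso", 0.542152, 0.468025),
    ("XGBOOST_3", "lasso", 0.525340, 0.448490),
    ("XGBOOST_0", "lasso", 0.525340, 0.448490),
    ("DECISIONFOREST_3", "lasso", 0.363738, 0.260724),
]

results = []
for model_id, method, r2, reported in leaderboard:
    recomputed = adjusted_r2(r2, N_VALIDATION, FEATURE_COUNTS[method])
    results.append((model_id, reported, recomputed))
    print(f"{model_id:<17} {method:<6} reported={reported:.6f} recomputed={recomputed:.6f}")
```

Agreement to six decimals supports the inferred n and p values; it does not by itself show how AutoML computes the score internally.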
- Display the best-performing model.
>>> aml.leader()
   Rank          Model-ID  Feature-Selection           MAE           MSE      MSLE          RMSE     RMSLE  R2-score  Adjusted R2-score
0     1         XGBOOST_2                pca   9742.422518  2.064480e+08  0.035378  14368.297209  0.188091  0.690732           0.663119
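The ranks in the leaderboard are consistent with sorting models by R2-score in descending order; MAE, for example, is not monotonic across the ranks (rank 4 has a lower MAE than ranks 2 and 3). The sketch below re-sorts the leaderboard values by R2 and compares the leader's score with the 0.7 R2 threshold set as an early-stopping criterion for this run. Treating R2 as the ranking key is an inference from the table, not a documented AutoML rule.

```python
R2_THRESHOLD = 0.7  # early-stopping value configured for this example run

# (Model-ID, Feature-Selection, MAE, R2-score) copied from the leaderboard, shuffled
leaderboard = [
    ("DECISIONFOREST_2", "pca", 10892.666297, 0.644773),
    ("XGBOOST_3", "lasso", 12186.337469, 0.525340),
    ("XGBOOST_2", "pca", 9742.422518, 0.690732),
    ("DECISIONFOREST_3", "lasso", 14983.007904, 0.363738),
    ("XGBOOST_1", "rfe", 11162.817285, 0.667544),
    ("DECISIONFOREST_1", "rfe", 11097.622077, 0.686378),
    ("DECISIONFOREST_0", "lasso", 12588.499123, 0.542152),
    ("XGBOOST_0", "lasso", 12186.337469, 0.525340),
]

# sorted() is stable, so the tie between XGBOOST_3 and XGBOOST_0
# simply keeps the order in which they appear in the input list.
ranked = sorted(leaderboard, key=lambda row: row[3], reverse=True)
for rank, (model_id, method, mae, r2) in enumerate(ranked, start=1):
    print(rank, model_id, method, mae, r2)

leader = ranked[0]
print("threshold reached:", leader[3] >= R2_THRESHOLD)  # False: 0.690732 < 0.7
```

The leader's R2-score of 0.690732 falls just short of the 0.7 threshold.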
- Generate predictions on the validation dataset using the best-performing model. In the data preparation phase, AutoML generates the validation dataset by splitting the data provided during fitting into training and testing sets. AutoML trains the models on the training set and uses the testing set as the validation dataset for model evaluation.
>>> prediction = aml.predict()
Following model is being used for generating prediction :
Model ID : XGBOOST_2
Feature Selection Method : pca
Prediction :
   id     Prediction  Confidence_Lower  Confidence_upper     price
0  10   63809.865056      33717.239917      93902.490195   54500.0
1  13   92935.646586      46824.448096     139046.845076   99000.0
2  40  105986.892125      59432.832713     152540.951537  118500.0
3  24   79861.408889      40881.522757     118841.295022   99000.0
4  34   38466.813258      21251.107246      55682.519270   25245.0
5  97   98560.011426      47662.676958     149457.345895  106000.0
6  75   58383.595048      29686.945507      87080.244588   49500.0
7  27   96430.203680      51016.647630     141843.759730  120000.0
8  19   86654.297469      44430.989001     128877.605937   83900.0
9   9   41575.195612      22618.003199      60532.388024   50000.0
Performance Metrics :
           MAE           MSE      MSLE       MAPE       MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
0  9742.422518  2.064480e+08  0.035378  13.954839  0.188855  14368.297209  0.188091  63470.842062  0.690732  0.704693  2592.517245  0.036744
>>> prediction.head()
id          Prediction    Confidence_Lower    Confidence_upper     price
13   92935.64658575002   46824.44809589475  139046.84507560526   99000.0
24   79861.40888912501   40881.52275672913   118841.2950215209   99000.0
27         96430.20368   51016.64762957162   141843.7597304284  120000.0
34        38466.813258   21251.10724581299   55682.51927018701   25245.0
38   59323.84226600001   34078.64692764714   84569.03760435287   56000.0
40    105986.892125125  59432.832712888114   152540.9515373619  118500.0
36   89466.34093637503   45426.43042440514   133506.2514483449   75000.0
19   86654.29746912501  44430.989000780464  128877.60593746956   83900.0
10   63809.86505600001   33717.23991664394   93902.49019535608   54500.0
 9  41575.195611500014   22618.00319854991   60532.38802445012   50000.0
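The performance metrics reported by AutoML follow common regression definitions. The sketch below computes several of them in plain Python, assuming the textbook formulas (MAPE in percent, MSLE on log(1 + y)); it is not AutoML's internal implementation, and `regression_metrics` is a hypothetical helper. Applied to the ten rows printed by `aml.predict()` above, it gives sample values that will differ from the reported metrics, which cover the whole validation set.

```python
import math


def regression_metrics(actual, predicted):
    """Textbook regression metrics (assumed definitions, not AutoML source code)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # must be non-zero for R2
    mse = ss_res / n
    # MSLE uses log(1 + y), so it requires non-negative values.
    msle = sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MSLE": msle,
        "RMSLE": math.sqrt(msle),
        # MAPE in percent; assumes no actual value is zero.
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# The ten validation rows displayed above (price, Prediction), a sample only.
actual = [54500.0, 99000.0, 118500.0, 99000.0, 25245.0,
          106000.0, 49500.0, 120000.0, 83900.0, 50000.0]
predicted = [63809.865056, 92935.646586, 105986.892125, 79861.408889, 38466.813258,
             98560.011426, 58383.595048, 96430.203680, 86654.297469, 41575.195612]

metrics = regression_metrics(actual, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```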
- Generate predictions on the validation dataset using the third-best performing model.
>>> prediction = aml.predict(rank=3)
Following model is being used for generating prediction :
Model ID : XGBOOST_1
Feature Selection Method : rfe
Prediction :
    id    Prediction  Confidence_Lower  Confidence_upper    price
0   34  35023.283882      -4837.186404      74883.754169  25245.0
1   38  55898.738253      -8963.818351     120761.294857  56000.0
2  122  35023.283882      -4837.186404      74883.754169  47900.0
3  140  55049.909953      -9742.813129     119842.633036  51000.0
4  452  37547.897706      -4255.979615      79351.775028  44000.0
5  153  67706.129535      -7797.877417     143210.136487  60000.0
6  142  53574.225636     -10035.903135     117184.354408  47500.0
7  400  58698.929647      -7337.989775     124735.849070  59900.0
8  510  74609.789819     -17587.700400     166807.280037  92000.0
9  147  76221.930629     -11523.433164     163967.294423  70000.0
Performance Metrics :
            MAE           MSE      MSLE      MAPE        MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
0  11162.817285  2.219267e+08  0.050646  18.09273  -3.630758  14897.203438  0.225047  71234.841896  0.667544  0.668922  3045.942002  0.047478
>>> prediction.head()
id          Prediction     Confidence_Lower    Confidence_upper     price
13   88394.47405500001  -13505.516260407865   190294.4643704079   99000.0
24       70578.8232925   -5948.346164785151  147105.99274978516   99000.0
27  104401.63842549999  -12692.073419267515   221495.3502702675  120000.0
34       35023.2838825   -4837.186403614331   74883.75416861434   25245.0
38  55898.738252999996    -8963.81835091576  120761.29485691575   56000.0
40  107643.55437249999  -17549.808679008158  232836.91742400813  118500.0
36        72512.602488  -14861.542217641778   159886.7471936418   75000.0
19   89922.82769049998  -10263.101185138774  190108.75656613873   83900.0
10   51887.02052199999  -10237.521547671604   114011.5625916716   54500.0
 9        36789.518904   -6212.072251334146   79791.11005933414   50000.0
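Both `prediction.head()` outputs show the same ten validation ids, which makes a side-by-side comparison of XGBOOST_2 (rank 1) and XGBOOST_1 (rank 3) straightforward. In both outputs each prediction sits at the midpoint of its bounds, so the intervals are symmetric. The sketch below checks that property and compares mean interval width, empirical coverage (share of actual prices inside the interval), and the number of negative lower bounds. Values are copied from the outputs above and rounded to three decimals; the helper names are hypothetical, and how AutoML derives the bounds is not described here.

```python
# id -> (prediction, confidence_lower, confidence_upper, actual price), rounded
xgboost_2 = {  # rank 1
    13: (92935.647, 46824.448, 139046.845, 99000.0),
    24: (79861.409, 40881.523, 118841.295, 99000.0),
    27: (96430.204, 51016.648, 141843.760, 120000.0),
    34: (38466.813, 21251.107, 55682.519, 25245.0),
    38: (59323.842, 34078.647, 84569.038, 56000.0),
    40: (105986.892, 59432.833, 152540.952, 118500.0),
    36: (89466.341, 45426.430, 133506.251, 75000.0),
    19: (86654.297, 44430.989, 128877.606, 83900.0),
    10: (63809.865, 33717.240, 93902.490, 54500.0),
    9: (41575.196, 22618.003, 60532.388, 50000.0),
}
xgboost_1 = {  # rank 3
    13: (88394.474, -13505.516, 190294.464, 99000.0),
    24: (70578.823, -5948.346, 147105.993, 99000.0),
    27: (104401.638, -12692.073, 221495.350, 120000.0),
    34: (35023.284, -4837.186, 74883.754, 25245.0),
    38: (55898.738, -8963.818, 120761.295, 56000.0),
    40: (107643.554, -17549.809, 232836.917, 118500.0),
    36: (72512.602, -14861.542, 159886.747, 75000.0),
    19: (89922.828, -10263.101, 190108.757, 83900.0),
    10: (51887.021, -10237.522, 114011.563, 54500.0),
    9: (36789.519, -6212.072, 79791.110, 50000.0),
}


def is_symmetric(row, tol=0.01):
    """True if the prediction is the midpoint of its interval (within rounding)."""
    prediction, lower, upper, _ = row
    return abs((lower + upper) / 2 - prediction) <= tol


def mean_width(rows):
    return sum(upper - lower for _, lower, upper, _ in rows.values()) / len(rows)


def coverage(rows):
    return sum(lower <= actual <= upper for _, lower, upper, actual in rows.values()) / len(rows)


def negative_lower_bounds(rows):
    return sum(lower < 0 for _, lower, _, _ in rows.values())


for name, rows in (("XGBOOST_2", xgboost_2), ("XGBOOST_1", xgboost_1)):
    print(name,
          "symmetric:", all(is_symmetric(r) for r in rows.values()),
          "mean width:", round(mean_width(rows), 1),
          "coverage:", coverage(rows),
          "negative lower bounds:", negative_lower_bounds(rows))
```

On these rows both models cover every actual price, but XGBOOST_1's intervals are much wider and all of its lower bounds are negative, which is not a meaningful value for a house price.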
- Generate predictions on the test dataset using the best-performing model.
>>> prediction = aml.predict(housing_test)
Data Transformation started ... Performing transformation carried out in feature engineering phase ... result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713817224640943"' Updated dataset after performing customized variable width bin-code transformation : bathrms lotsize airco gashw garagepl id recroom sn driveway stories prefarea fullbase homestyle price bedrooms 1 3520.0 no no 0 51 no 443 yes 1 yes no Eclectic 65000.0 big_house 1 4350.0 no yes 1 20 no 198 no 2 no no Classic 40500.0 big_house 1 3162.0 yes no 1 49 no 161 yes 2 no no Eclectic 63900.0 big_house 1 3750.0 no no 0 43 no 140 yes 2 no no Classic 43000.0 big_house 1 5076.0 no no 0 52 no 111 no 1 no no Classic 43000.0 big_house 1 7980.0 no no 2 69 no 353 yes 1 no no Eclectic 78500.0 big_house 1 3760.0 no yes 2 37 no 117 yes 2 no no Eclectic 93000.0 big_house 1 5000.0 no no 0 67 no 317 yes 4 no no Eclectic 80000.0 big_house 1 3000.0 no no 2 44 no 239 yes 1 no yes Classic 26000.0 small_house 1 5400.0 no no 0 28 no 177 yes 2 no no Eclectic 70000.0 big_house result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815770435510"' result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815975550667"' Updated dataset after performing customized categorical encoding : prefarea bedrooms bathrms lotsize gashw garagepl id recroom sn driveway stories airco homestyle fullbase price 62906.33597883598 small_house 1 3180.0 no 0 59 no 195 yes 1 no 1 no 33000.0 62906.33597883598 small_house 1 5885.0 no 1 29 no 306 yes 1 yes 2 no 64000.0 62906.33597883598 big_house 1 4360.0 no 0 15 no 255 yes 2 no 2 no 61000.0 62906.33597883598 big_house 1 5170.0 no 0 12 no 38 yes 4 yes 2 no 67000.0 62906.33597883598 small_house 1 3185.0 no 0 23 no 16 yes 1 yes 1 no 37900.0 62906.33597883598 small_house 1 9166.0 no 2 11 no 53 yes 1 yes 2 yes 68000.0 83851.72413793103 big_house 1 2787.0 no 0 27 no 472 yes 1 no 2 yes 60500.0 83851.72413793103 small_house 1 2176.0 no 0 8 yes 469 yes 2 no 2 
no 55000.0 83851.72413793103 big_house 1 7410.0 no 2 40 yes 401 yes 1 yes 2 yes 92500.0 83851.72413793103 big_house 1 3520.0 no 2 39 no 441 yes 1 no 2 no 51900.0 result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816198052184"' Updated dataset after performing categorical encoding : prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 sn driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price 83851.72413793103 1 0 0 1 9000.0 1 0 1 33 1 0 411 0 1 1 1 0 2 0 1 90000.0 83851.72413793103 1 0 0 1 2398.0 1 0 0 21 1 0 459 0 1 1 1 0 1 1 0 44555.0 83851.72413793103 1 0 0 1 6862.0 1 0 2 19 1 0 440 0 1 2 0 1 2 1 0 69000.0 83851.72413793103 1 0 0 1 3520.0 1 0 0 51 1 0 443 0 1 1 1 0 2 1 0 65000.0 83851.72413793103 1 0 0 1 3520.0 1 0 2 39 1 0 441 0 1 1 1 0 2 1 0 51900.0 83851.72413793103 0 0 1 1 2176.0 1 0 0 8 0 1 469 0 1 2 1 0 2 1 0 55000.0 62906.33597883598 0 0 1 1 3180.0 1 0 0 59 1 0 195 0 1 1 1 0 1 1 0 33000.0 62906.33597883598 0 0 1 1 5885.0 1 0 1 29 1 0 306 0 1 1 0 1 2 1 0 64000.0 62906.33597883598 1 0 0 1 4360.0 1 0 0 15 1 0 255 0 1 2 1 0 2 1 0 61000.0 62906.33597883598 1 0 0 1 5170.0 1 0 0 12 1 0 38 0 1 4 0 1 2 1 0 67000.0 Updated dataset after performing customized anti-selection : prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price 62906.33597883598 1 0 0 1 4360.0 1 0 0 15 1 0 0 1 2 1 0 2 1 0 61000.0 62906.33597883598 0 0 1 1 3185.0 1 0 0 23 1 0 0 1 1 0 1 1 1 0 37900.0 62906.33597883598 0 0 1 1 9166.0 1 0 2 11 1 0 0 1 1 0 1 2 0 1 68000.0 62906.33597883598 1 0 0 1 10700.0 1 0 0 16 0 1 0 1 2 1 0 2 0 1 72000.0 62906.33597883598 0 0 1 1 4080.0 1 0 0 9 1 0 0 1 1 1 0 2 1 0 55000.0 62906.33597883598 1 0 0 1 1700.0 1 0 0 17 1 0 0 1 2 1 0 1 1 0 27000.0 83851.72413793103 1 0 0 1 7410.0 1 0 2 40 0 1 0 1 1 0 1 2 0 1 92500.0 83851.72413793103 1 
0 0 1 6825.0 1 0 0 32 0 1 0 1 1 0 1 2 0 1 77500.0 83851.72413793103 1 0 0 1 9000.0 1 0 1 33 1 0 0 1 1 1 0 2 0 1 90000.0 83851.72413793103 1 0 0 1 2610.0 1 0 0 13 1 0 0 1 2 1 0 1 0 1 49000.0 Performing transformation carried out in data preparation phase ... result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713818406018603"' Updated dataset after performing Lasso feature selection: id bathrms fullbase_1 gashw_0 driveway_0 stories airco_1 gashw_1 bedrooms_0 bedrooms_2 driveway_1 garagepl recroom_0 fullbase_0 homestyle airco_0 prefarea lotsize price 31 1 1 1 0 2 1 0 1 0 1 0 1 0 2 0 62906.336 2953.0 60000.0 9 1 0 1 0 1 0 0 0 1 1 0 1 1 2 1 62906.336 4080.0 55000.0 17 1 0 1 0 2 0 0 1 0 1 0 1 1 1 1 62906.336 1700.0 27000.0 25 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 62906.336 3500.0 44500.0 10 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 62906.336 6000.0 41000.0 36 1 0 1 1 1 0 0 0 1 0 0 1 1 1 1 62906.336 3970.0 32500.0 8 1 0 1 0 2 0 0 0 1 1 0 0 1 2 1 83851.7241 2176.0 55000.0 16 1 1 1 0 2 0 0 1 0 1 0 0 0 2 1 62906.336 10700.0 72000.0 29 1 0 1 0 1 1 0 0 1 1 1 1 1 2 0 62906.336 5885.0 64000.0 15 1 0 1 0 2 0 0 1 0 1 0 1 1 2 1 62906.336 4360.0 61000.0 Updated dataset after performing scaling on Lasso selected features : driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 bathrms stories garagepl homestyle prefarea lotsize 1 1 60000.0 1 1 1 31 0 0 1 0 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -1.0349666248397658 1 0 55000.0 0 1 1 9 0 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.503056006140217 1 0 27000.0 1 1 1 17 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.6263455114684566 1 0 44500.0 0 1 1 25 0 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.7767988267664266 1 0 
41000.0 0 1 1 10 0 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 0.4031271242086151 0 0 32500.0 0 1 1 36 1 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.5549727479831188 1 0 55000.0 0 1 0 8 0 1 0 1 0 1 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 1.8093671611394 -1.4016876104028086 1 1 72000.0 1 1 0 16 0 0 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 2.6213879120416936 1 0 64000.0 0 1 1 29 0 1 1 0 0 1 -0.5698449326198071 -0.8999912756370636 0.3943732959927247 0.7474571831400929 -0.552679423766173 0.34885053046376313 1 0 61000.0 1 1 1 15 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.3709042996310123 Updated dataset after performing RFE feature selection: id bathrms fullbase_1 gashw_0 driveway_0 stories airco_1 gashw_1 bedrooms_0 bedrooms_2 driveway_1 garagepl recroom_0 fullbase_0 homestyle airco_0 recroom_1 prefarea lotsize price 31 1 1 1 0 2 1 0 1 0 1 0 1 0 2 0 0 62906.336 2953.0 60000.0 9 1 0 1 0 1 0 0 0 1 1 0 1 1 2 1 0 62906.336 4080.0 55000.0 17 1 0 1 0 2 0 0 1 0 1 0 1 1 1 1 0 62906.336 1700.0 27000.0 25 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0 62906.336 3500.0 44500.0 10 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0 62906.336 6000.0 41000.0 36 1 0 1 1 1 0 0 0 1 0 0 1 1 1 1 0 62906.336 3970.0 32500.0 8 1 0 1 0 2 0 0 0 1 1 0 0 1 2 1 1 83851.7241 2176.0 55000.0 16 1 1 1 0 2 0 0 1 0 1 0 0 0 2 1 1 62906.336 10700.0 72000.0 29 1 0 1 0 1 1 0 0 1 1 1 1 1 2 0 0 62906.336 5885.0 64000.0 15 1 0 1 0 2 0 0 1 0 1 0 1 1 2 1 0 62906.336 4360.0 61000.0 Updated dataset after performing scaling on RFE selected features : r_gashw_0 r_fullbase_1 r_driveway_1 r_recroom_1 r_bedrooms_0 r_bedrooms_2 id r_recroom_0 r_gashw_1 r_driveway_0 r_airco_0 r_airco_1 r_fullbase_0 price r_bathrms r_stories r_garagepl r_homestyle 
r_prefarea r_lotsize 1 1 1 0 1 0 31 1 0 0 0 1 0 60000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -1.0349666248397658 1 0 1 0 0 1 9 1 0 0 1 0 1 55000.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.503056006140217 1 0 1 0 1 0 17 1 0 0 1 0 1 27000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.6263455114684566 1 0 1 0 0 1 25 1 0 0 1 0 1 44500.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.7767988267664266 1 0 1 0 0 1 10 1 0 0 1 0 1 41000.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 0.4031271242086151 1 0 0 0 0 1 36 1 0 1 1 0 1 32500.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.5549727479831188 1 0 1 1 0 1 8 0 0 0 1 0 1 55000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 1.8093671611394 -1.4016876104028086 1 1 1 1 1 0 16 0 0 0 1 0 0 72000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 2.6213879120416936 1 0 1 0 0 1 29 1 0 0 0 1 1 64000.0 -0.5698449326198071 -0.8999912756370636 0.3943732959927247 0.7474571831400929 -0.552679423766173 0.34885053046376313 1 0 1 0 1 0 15 1 0 0 1 0 1 61000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.3709042996310123 Updated dataset after performing scaling for PCA feature selection : bedrooms_1 driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 recroom_1 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 prefarea bathrms lotsize garagepl stories homestyle 0 1 1 60000.0 1 1 1 31 0 0 0 1 0 0 0 -0.5526794213794932 -0.5698449326198075 -1.0349666248397658 -0.7757094582336221 0.2539261099118857 0.7474571831400939 0 1 0 55000.0 0 1 1 9 0 0 1 0 1 0 1 
-0.5526794213794932 -0.5698449326198075 -0.503056006140217 -0.7757094582336221 -0.8999912756370632 0.7474571831400939 0 1 0 27000.0 1 1 1 17 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -1.6263455114684566 -0.7757094582336221 0.2539261099118857 -0.7516329215933905 0 1 0 44500.0 0 1 1 25 0 0 1 0 1 0 1 -0.5526794213794932 -0.5698449326198075 -0.7767988267664266 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905 0 1 0 41000.0 0 1 1 10 0 0 1 0 1 0 1 -0.5526794213794932 -0.5698449326198075 0.4031271242086151 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905 0 0 0 32500.0 0 1 1 36 1 0 1 0 1 0 1 -0.5526794213794932 -0.5698449326198075 -0.5549727479831188 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905 0 1 0 55000.0 0 1 0 8 0 1 1 0 1 0 1 1.809367156861831 -0.5698449326198075 -1.4016876104028086 -0.7757094582336221 0.2539261099118857 0.7474571831400939 0 1 1 72000.0 1 1 0 16 0 1 0 0 1 0 0 -0.5526794213794932 -0.5698449326198075 2.6213879120416936 -0.7757094582336221 0.2539261099118857 0.7474571831400939 0 1 0 64000.0 0 1 1 29 0 0 1 1 0 0 1 -0.5526794213794932 -0.5698449326198075 0.34885053046376313 0.39437329599272386 -0.8999912756370632 0.7474571831400939 0 1 0 61000.0 1 1 1 15 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -0.3709042996310123 -0.7757094582336221 0.2539261099118857 0.7474571831400939 Updated dataset after performing PCA feature selection : id col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 price 0 15 -1.003669 0.239619 -0.544437 -0.446579 -0.873745 0.132942 -0.304909 -0.283428 -0.156519 -0.393635 61000.0 1 29 -0.570314 -0.909223 0.891253 -0.739743 -0.298341 0.315286 -0.009998 1.337645 0.357075 0.066034 64000.0 2 31 -0.938230 0.181104 -0.994173 -0.401154 -0.410828 -0.391002 1.254502 0.614167 -0.346502 -0.214179 60000.0 3 16 0.737217 -1.057533 -0.249024 -0.476448 -1.107181 2.240093 1.110594 -1.325931 0.298640 0.170796 72000.0 4 9 -1.786104 -0.439574 0.030216 -0.344894 -0.025365 0.425588 -0.550912 
0.337141 0.623026 -0.124383 55000.0 5 17 -1.414625 1.385440 -0.124862 0.557650 -0.185774 -0.603600 -0.094402 -0.227540 -0.150107 -0.692651 27000.0 6 25 -1.701373 0.405676 0.680010 0.610730 0.412048 0.370640 -0.388688 0.263257 0.585903 -0.277915 44500.0 7 8 -0.880471 -0.664464 -1.511189 1.245302 -0.187289 -0.943057 -0.515748 0.269059 1.517742 -0.028053 55000.0 8 10 -1.105595 0.044414 0.956715 0.552309 0.110896 1.189863 -0.446720 0.107281 0.533577 -0.103056 41000.0 9 36 -1.772019 0.459011 0.740512 0.646621 0.516132 0.508674 -0.404914 0.170345 0.001238 0.920050 32500.0
Data Transformation completed.
Following model is being used for generating prediction :
Model ID : XGBOOST_2
Feature Selection Method : pca
Prediction :
   id    Prediction  Confidence_Lower  Confidence_upper    price
0  31  62660.364670      34660.085646      90660.643694  60000.0
1   9  55986.209180      30785.420530      81186.997830  55000.0
2  17  44426.444076      24229.803083      64623.085068  27000.0
3  25  38398.516388      21188.376911      55608.655865  44500.0
4  10  45683.429188      24617.332667      66749.525709  41000.0
5  36  40066.436509      21095.208542      59037.664475  32500.0
6   8  56863.357233      32185.946248      81540.768217  55000.0
7  16  67979.252919      37332.178823      98626.327014  72000.0
8  29  58572.951390      33177.335198      83968.567583  64000.0
9  15  59827.259623      33990.028359      85664.490888  61000.0
Performance Metrics :
           MAE           MSE      MSLE       MAPE     MPE          RMSE     RMSLE           ME        R2        EV          MPD       MGD
0  8002.145716  1.167270e+08  0.038814  15.298458  -5.345  10804.026128  0.197013  34490.55843  0.637479  0.637927  1995.655639  0.036785
>>> prediction.head()
id          Prediction    Confidence_Lower    Confidence_upper    price
10   45683.42918800001        24617.332667   66749.52570900002  41000.0
12   78811.21095774998  41047.641704317764   116574.7802111822  67000.0
13  57454.380859375015  30640.814398146318   84267.94732060371  49000.0
14     52876.934924625  28793.137666832026   76960.73218241797  48500.0
16       67979.2529185  37332.178823254886   98626.32701374512  72000.0
17   44426.44407562501   24229.80308276226  64623.085068487766  27000.0
15  59827.259623375016   33990.02835867039   85664.49088807964  61000.0
11   71539.44990912499   38962.82825323011  104116.07156501987  68000.0
 9  55986.209179749996  30785.420529747764   81186.99782975223  55000.0
 8   56863.35723262501  32185.946247916516    81540.7682173335  55000.0
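In the feature-engineering log above, the customized categorical encoding replaces every prefarea value with one of two numbers, 62906.33597883598 or 83851.72413793103. These look like per-category means of price learned from the training data, a technique known as target (mean) encoding, and the same numbers reappear when new data is transformed. The sketch below shows plain mean target encoding on a tiny made-up dataset; reading AutoML's output as this exact technique is an assumption, and the helper names and sample rows are hypothetical.

```python
from collections import defaultdict


def fit_target_encoder(categories, targets):
    """Learn the mean target value for each category (plain mean target encoding)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def apply_target_encoder(mapping, categories, default=None):
    """Replace each category with its learned mean; unseen categories get `default`."""
    return [mapping.get(category, default) for category in categories]


# Hypothetical training rows: prefarea value and price (not the real housing data).
train_prefarea = ["no", "yes", "no", "yes", "no"]
train_price = [42000.0, 80000.0, 58000.0, 90000.0, 50000.0]

mapping = fit_target_encoder(train_prefarea, train_price)
print(mapping)  # {'no': 50000.0, 'yes': 85000.0}

# The mapping learned at fit time is reused unchanged on new rows.
overall_mean = sum(train_price) / len(train_price)
print(apply_target_encoder(mapping, ["yes", "no", "unknown"], default=overall_mean))
```

Reusing the fitted mapping, rather than recomputing means on the test data, keeps the test target from leaking into the features.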
- Generate predictions on the test dataset using the second-best performing model.
>>> prediction = aml.predict(housing_test, rank=2)
Data Transformation started ... Performing transformation carried out in feature engineering phase ... result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815691874863"' Updated dataset after performing customized variable width bin-code transformation : bathrms lotsize airco gashw garagepl id recroom sn driveway stories prefarea fullbase homestyle price bedrooms 1 3750.0 no no 0 43 no 140 yes 2 no no Classic 43000.0 big_house 1 9000.0 no no 1 33 no 411 yes 1 yes yes Eclectic 90000.0 big_house 1 10700.0 no no 0 16 yes 364 yes 2 no yes Eclectic 72000.0 big_house 1 4080.0 no no 0 9 no 301 yes 1 no no Eclectic 55000.0 small_house 1 3520.0 no no 0 51 no 443 yes 1 yes no Eclectic 65000.0 big_house 1 3180.0 no no 0 59 no 195 yes 1 no no Classic 33000.0 small_house 1 4960.0 no no 0 48 no 25 yes 1 no no Classic 42000.0 small_house 1 3500.0 no no 0 25 no 249 yes 1 no no Classic 44500.0 small_house 1 2650.0 no no 1 41 no 142 yes 2 no yes Classic 40000.0 big_house 1 3162.0 yes no 1 49 no 161 yes 2 no no Eclectic 63900.0 big_house result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815686869295"' result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816248740571"' Updated dataset after performing customized categorical encoding : prefarea bedrooms bathrms lotsize gashw garagepl id recroom sn driveway stories airco homestyle fullbase price 62906.33597883598 big_house 1 5400.0 no 0 28 no 177 yes 2 no 2 no 70000.0 62906.33597883598 small_house 1 6000.0 no 0 10 no 260 yes 1 no 1 no 41000.0 62906.33597883598 big_house 1 1700.0 no 0 17 no 13 yes 2 no 1 no 27000.0 62906.33597883598 big_house 1 2650.0 no 1 41 no 142 yes 2 no 1 yes 40000.0 62906.33597883598 big_house 1 5000.0 no 0 67 no 317 yes 4 no 2 no 80000.0 62906.33597883598 small_house 1 3000.0 no 2 44 no 239 yes 1 no 1 yes 26000.0 83851.72413793103 small_house 1 2176.0 no 0 8 yes 469 yes 2 no 2 no 55000.0 83851.72413793103 big_house 1 3520.0 no 2 39 no 441 yes 1 no 
2 no 51900.0 83851.72413793103 big_house 1 2610.0 no 0 13 no 463 yes 2 no 1 yes 49000.0 83851.72413793103 big_house 1 6420.0 no 0 22 no 408 yes 3 no 2 yes 87500.0 result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815690312918"' Updated dataset after performing categorical encoding : prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 sn driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price 62906.33597883598 1 0 0 1 1700.0 1 0 0 17 1 0 13 0 1 2 1 0 1 1 0 27000.0 62906.33597883598 1 0 0 1 5000.0 1 0 0 67 1 0 317 0 1 4 1 0 2 1 0 80000.0 62906.33597883598 0 0 1 1 3000.0 1 0 2 44 1 0 239 0 1 1 1 0 1 0 1 26000.0 62906.33597883598 1 0 0 1 5076.0 1 0 0 52 1 0 111 1 0 1 1 0 1 1 0 43000.0 62906.33597883598 1 0 0 1 3900.0 1 0 0 45 1 0 340 0 1 2 1 0 2 1 0 62500.0 62906.33597883598 1 0 0 1 3630.0 1 0 3 53 1 0 237 0 1 2 1 0 1 1 0 43000.0 83851.72413793103 1 0 0 1 2610.0 1 0 0 13 1 0 463 0 1 2 1 0 1 0 1 49000.0 83851.72413793103 1 0 0 1 2398.0 1 0 0 21 1 0 459 0 1 1 1 0 1 1 0 44555.0 83851.72413793103 1 0 0 1 9000.0 1 0 1 33 1 0 411 0 1 1 1 0 2 0 1 90000.0 83851.72413793103 1 0 0 1 2787.0 1 0 0 27 1 0 472 0 1 1 1 0 2 0 1 60500.0 Updated dataset after performing customized anti-selection : prefarea bedrooms_0 bedrooms_1 bedrooms_2 bathrms lotsize gashw_0 gashw_1 garagepl id recroom_0 recroom_1 driveway_0 driveway_1 stories airco_0 airco_1 homestyle fullbase_0 fullbase_1 price 62906.33597883598 1 0 0 1 1700.0 1 0 0 17 1 0 0 1 2 1 0 1 1 0 27000.0 62906.33597883598 1 0 0 1 5000.0 1 0 0 67 1 0 0 1 4 1 0 2 1 0 80000.0 62906.33597883598 0 0 1 1 3000.0 1 0 2 44 1 0 0 1 1 1 0 1 0 1 26000.0 62906.33597883598 1 0 0 1 5076.0 1 0 0 52 1 0 1 0 1 1 0 1 1 0 43000.0 62906.33597883598 1 0 0 1 3900.0 1 0 0 45 1 0 0 1 2 1 0 2 1 0 62500.0 62906.33597883598 1 0 0 1 3630.0 1 0 3 53 1 0 0 1 2 1 0 1 1 0 43000.0 83851.72413793103 1 0 0 1 2610.0 1 0 0 13 1 0 0 1 2 1 0 1 0 1 49000.0 83851.72413793103 
1 0 0 1 2398.0 1 0 0 21 1 0 0 1 1 1 0 1 1 0 44555.0 83851.72413793103 1 0 0 1 9000.0 1 0 1 33 1 0 0 1 1 1 0 2 0 1 90000.0 83851.72413793103 1 0 0 1 2787.0 1 0 0 27 1 0 0 1 1 1 0 2 0 1 60500.0 Performing transformation carried out in data preparation phase ... result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713823026627552"' Updated dataset after performing Lasso feature selection: id bathrms fullbase_1 gashw_0 driveway_0 stories airco_1 gashw_1 bedrooms_0 bedrooms_2 driveway_1 garagepl recroom_0 fullbase_0 homestyle airco_0 prefarea lotsize price 47 1 0 1 0 2 0 0 1 0 1 0 1 1 1 1 62906.336 3850.0 44500.0 45 1 0 1 0 2 0 0 1 0 1 0 1 1 2 1 62906.336 3900.0 62500.0 53 1 0 1 0 2 0 0 1 0 1 3 1 1 1 1 62906.336 3630.0 43000.0 15 1 0 1 0 2 0 0 1 0 1 0 1 1 2 1 62906.336 4360.0 61000.0 75 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 62906.336 4040.0 47000.0 11 1 1 1 0 1 1 0 0 1 1 2 1 0 2 0 62906.336 9166.0 68000.0 32 1 1 1 0 1 1 0 1 0 1 0 0 0 2 0 83851.7241 6825.0 77500.0 52 1 0 1 1 1 0 0 1 0 0 0 1 1 1 1 62906.336 5076.0 43000.0 10 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 62906.336 6000.0 41000.0 17 1 0 1 0 2 0 0 1 0 1 0 1 1 1 1 62906.336 1700.0 27000.0 Updated dataset after performing scaling on Lasso selected features : driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 bathrms stories garagepl homestyle prefarea lotsize 1 0 44500.0 1 1 1 47 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.6116091936299208 1 0 62500.0 1 1 1 45 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.58801067461042 1 0 43000.0 1 1 1 53 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 2.734538804445422 -0.7516329215933895 -0.552679423766173 -0.7154426773157244 1 0 61000.0 1 1 1 15 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.3709042996310123 1 0 
47000.0 0 1 1 75 0 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.5219348213558176
1 1 68000.0 0 1 1 11 0 0 1 0 0 1 -0.5698449326198071 -0.8999912756370636 1.5644560502190732 0.7474571831400929 -0.552679423766173 1.8973853485234078
1 1 77500.0 1 1 0 32 0 0 1 0 0 0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 0.7474571831400929 1.8093671611394 0.7925026880303788
0 0 43000.0 1 1 1 52 1 1 0 1 0 0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.03297350727176035
1 0 41000.0 0 1 1 10 0 1 0 1 0 1 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 0.4031271242086151
1 0 27000.0 1 1 1 17 0 1 0 1 0 0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.6263455114684566
Updated dataset after performing RFE feature selection:
id bathrms fullbase_1 gashw_0 driveway_0 stories airco_1 gashw_1 bedrooms_0 bedrooms_2 driveway_1 garagepl recroom_0 fullbase_0 homestyle airco_0 recroom_1 prefarea lotsize price
47 1 0 1 0 2 0 0 1 0 1 0 1 1 1 1 0 62906.336 3850.0 44500.0
45 1 0 1 0 2 0 0 1 0 1 0 1 1 2 1 0 62906.336 3900.0 62500.0
53 1 0 1 0 2 0 0 1 0 1 3 1 1 1 1 0 62906.336 3630.0 43000.0
15 1 0 1 0 2 0 0 1 0 1 0 1 1 2 1 0 62906.336 4360.0 61000.0
75 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0 62906.336 4040.0 47000.0
11 1 1 1 0 1 1 0 0 1 1 2 1 0 2 0 0 62906.336 9166.0 68000.0
32 1 1 1 0 1 1 0 1 0 1 0 0 0 2 0 1 83851.7241 6825.0 77500.0
52 1 0 1 1 1 0 0 1 0 0 0 1 1 1 1 0 62906.336 5076.0 43000.0
10 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0 62906.336 6000.0 41000.0
17 1 0 1 0 2 0 0 1 0 1 0 1 1 1 1 0 62906.336 1700.0 27000.0
Updated dataset after performing scaling on RFE selected features :
r_gashw_0 r_fullbase_1 r_driveway_1 r_recroom_1 r_bedrooms_0 r_bedrooms_2 id r_recroom_0 r_gashw_1 r_driveway_0 r_airco_0 r_airco_1 r_fullbase_0 price r_bathrms r_stories r_garagepl r_homestyle r_prefarea r_lotsize
1 0 1 0 1 0 47 1 0 0 1 0 1 44500.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.6116091936299208
1 0 1 0 1 0 45 1 0 0 1 0 1 62500.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.58801067461042
1 0 1 0 1 0 53 1 0 0 1 0 1 43000.0 -0.5698449326198071 0.2539261099118858 2.734538804445422 -0.7516329215933895 -0.552679423766173 -0.7154426773157244
1 0 1 0 1 0 15 1 0 0 1 0 1 61000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 0.7474571831400929 -0.552679423766173 -0.3709042996310123
1 0 1 0 0 1 75 1 0 0 1 0 1 47000.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.5219348213558176
1 1 1 0 0 1 11 1 0 0 0 1 0 68000.0 -0.5698449326198071 -0.8999912756370636 1.5644560502190732 0.7474571831400929 -0.552679423766173 1.8973853485234078
1 1 1 1 1 0 32 0 0 0 0 1 0 77500.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 0.7474571831400929 1.8093671611394 0.7925026880303788
1 0 0 0 1 0 52 1 0 1 1 0 1 43000.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -0.03297350727176035
1 0 1 0 0 1 10 1 0 0 1 0 1 41000.0 -0.5698449326198071 -0.8999912756370636 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 0.4031271242086151
1 0 1 0 1 0 17 1 0 0 1 0 1 27000.0 -0.5698449326198071 0.2539261099118858 -0.7757094582336237 -0.7516329215933895 -0.552679423766173 -1.6263455114684566
Updated dataset after performing scaling for PCA feature selection :
bedrooms_1 driveway_1 fullbase_1 price bedrooms_0 gashw_0 recroom_0 id driveway_0 recroom_1 fullbase_0 airco_1 airco_0 gashw_1 bedrooms_2 prefarea bathrms lotsize garagepl stories homestyle
0 1 0 44500.0 1 1 1 47 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -0.6116091936299208 -0.7757094582336221 0.2539261099118857 -0.7516329215933905
0 1 0 62500.0 1 1 1 45 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -0.58801067461042 -0.7757094582336221 0.2539261099118857 0.7474571831400939
0 1 0 43000.0 1 1 1 53 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -0.7154426773157244 2.734538804445416 0.2539261099118857 -0.7516329215933905
0 1 0 61000.0 1 1 1 15 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -0.3709042996310123 -0.7757094582336221 0.2539261099118857 0.7474571831400939
0 1 0 47000.0 0 1 1 75 0 0 1 0 1 0 1 -0.5526794213794932 -0.5698449326198075 -0.5219348213558176 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905
0 1 1 68000.0 0 1 1 11 0 0 0 1 0 0 1 -0.5526794213794932 -0.5698449326198075 1.8973853485234078 1.5644560502190699 -0.8999912756370632 0.7474571831400939
0 1 1 77500.0 1 1 0 32 0 1 0 1 0 0 0 1.809367156861831 -0.5698449326198075 0.7925026880303788 -0.7757094582336221 -0.8999912756370632 0.7474571831400939
0 0 0 43000.0 1 1 1 52 1 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -0.03297350727176035 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905
0 1 0 41000.0 0 1 1 10 0 0 1 0 1 0 1 -0.5526794213794932 -0.5698449326198075 0.4031271242086151 -0.7757094582336221 -0.8999912756370632 -0.7516329215933905
0 1 0 27000.0 1 1 1 17 0 0 1 0 1 0 0 -0.5526794213794932 -0.5698449326198075 -1.6263455114684566 -0.7757094582336221 0.2539261099118857 -0.7516329215933905
Updated dataset after performing PCA feature selection :
id col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 price
0 17 -1.414625 1.385440 -0.124862 0.557650 -0.185774 -0.603600 -0.094402 -0.227540 -0.150107 -0.692651 27000.0
1 10 -1.105595 0.044414 0.956715 0.552309 0.110896 1.189863 -0.446720 0.107281 0.533577 -0.103056 41000.0
2 47 -0.902256 1.074754 0.113105 0.507409 -0.444764 0.100932 -0.144309 -0.361679 -0.195108 -0.542272 44500.0
3 52 -1.229283 0.416949 0.607011 0.564425 0.187417 0.726843 -0.220520 -0.356044 -1.097318 0.421967 43000.0
4 45 -1.113292 0.306091 -0.595351 -0.435829 -0.818333 -0.017795 -0.294231 -0.254729 -0.146891 -0.425810 62500.0
5 53 0.449859 0.338874 2.106509 -0.463153 -0.233032 -2.098387 -0.168963 -0.619831 -0.075069 -0.317378 43000.0
6 15 -1.003669 0.239619 -0.544437 -0.446579 -0.873745 0.132942 -0.304909 -0.283428 -0.156519 -0.393635 61000.0
7 32 0.470801 -2.082235 -1.196821 1.131651 -0.030202 0.613375 1.070719 0.477148 -0.176126 -0.287062 77500.0
8 75 -1.572685 0.327644 0.739779 0.598111 0.346999 0.547592 -0.401223 0.229566 0.574601 -0.240146 47000.0
9 11 0.763052 -1.948140 1.644308 -1.101667 -0.221112 0.611181 0.837938 0.648384 0.374097 0.669405 68000.0
Data Transformation completed.
Following model is being used for generating prediction :
Model ID : DECISIONFOREST_1
Feature Selection Method : rfe
Prediction :
id prediction confidence_lower confidence_upper price
0 47 44000.694444 43999.333333 44002.055556 44500.0
1 45 58427.906977 55346.604651 61509.209302 62500.0
2 53 41336.956522 36117.391304 46556.521739 43000.0
3 15 58427.906977 55346.604651 61509.209302 61000.0
4 75 44000.694444 43999.333333 44002.055556 47000.0
5 11 80174.038462 51703.153846 108644.923077 68000.0
6 32 87175.757576 72428.242424 101923.272727 77500.0
7 52 38000.694444 26239.333333 49762.055556 43000.0
8 10 69350.694444 19666.055556 119035.333333 41000.0
9 17 41000.000000 35120.000000 46880.000000 27000.0
Performance Metrics :
MAE MSE MSLE MAPE MPE RMSE RMSLE ME R2 EV MPD MGD
0 7402.911675 1.081247e+08 0.032827 13.675299 -2.398854 10398.301013 0.181182 31113.636364 0.664195 0.664562 1791.747597 0.032045
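In the verbose log above, the scaled r_lotsize values appear to be an affine function of the raw lotsize values in the RFE table, which is consistent with standardization (z-score) scaling, z = (x - mean) / std. That the scaler is a z-score is an inference from the logged numbers, not something the log states. Because such a map is affine, two rows are enough to recover the implied mean and std, and the remaining rows can then be checked against them. The helper name implied_mean_std is hypothetical.

```python
# Assumption: scaling is z = (x - mean) / std (standardization). This is
# inferred from the logged numbers, not taken from teradataml's source.

# id -> (raw lotsize from the RFE table, r_lotsize from the scaled table)
pairs = {
    47: (3850.0, -0.6116091936299208),
    45: (3900.0, -0.58801067461042),
    53: (3630.0, -0.7154426773157244),
    15: (4360.0, -0.3709042996310123),
    75: (4040.0, -0.5219348213558176),
    11: (9166.0, 1.8973853485234078),
    32: (6825.0, 0.7925026880303788),
    52: (5076.0, -0.03297350727176035),
    10: (6000.0, 0.4031271242086151),
    17: (1700.0, -1.6263455114684566),
}


def implied_mean_std(x1, z1, x2, z2):
    """Hypothetical helper: solve z = (x - mean) / std for mean and std from two points."""
    std = (x2 - x1) / (z2 - z1)
    mean = x1 - z1 * std
    return mean, std


# Fit the affine map on two rows (ids 47 and 11) ...
mean, std = implied_mean_std(*pairs[47], *pairs[11])

# ... then measure how far every row deviates from that map.
max_dev = max(abs((x - mean) / std - z) for x, z in pairs.values())
print(f"implied mean = {mean:.2f}, implied std = {std:.2f}, max deviation = {max_dev:.1e}")
```

A near-zero maximum deviation supports the affine reading; it does not by itself tell you whether std is a population or sample standard deviation, or which rows the statistics were computed from.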
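The Performance Metrics line in the log lists standard regression metrics. The sketch below computes four of them (MAE, RMSE, ME and R2) in plain Python from the ten "Prediction :" rows printed in the log, assuming the textbook definitions and that ME denotes the maximum absolute error. It is an illustration, not teradataml's implementation, and its results should not be expected to match the logged values: under that reading of ME, the logged ME of 31113.636364 is larger than any error among the ten displayed rows, so the logged metrics must cover more rows than are shown. The function name regression_metrics is hypothetical.

```python
import math

# (id, prediction, actual price) copied from the "Prediction :" rows in the log.
rows = [
    (47, 44000.694444, 44500.0),
    (45, 58427.906977, 62500.0),
    (53, 41336.956522, 43000.0),
    (15, 58427.906977, 61000.0),
    (75, 44000.694444, 47000.0),
    (11, 80174.038462, 68000.0),
    (32, 87175.757576, 77500.0),
    (52, 38000.694444, 43000.0),
    (10, 69350.694444, 41000.0),
    (17, 41000.000000, 27000.0),
]


def regression_metrics(pairs):
    """Hypothetical helper: MAE, RMSE, ME (max absolute error) and R2 for (pred, actual) pairs."""
    actuals = [a for _, a in pairs]
    errors = [a - p for p, a in pairs]
    n = len(errors)
    mean_actual = sum(actuals) / n
    ss_res = sum(e * e for e in errors)                       # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actuals)     # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "ME": max(abs(e) for e in errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


metrics = regression_metrics([(p, a) for _, p, a in rows])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```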
>>> prediction.head()
id prediction confidence_lower confidence_upper price
10 69350.69444444444 19666.055555555547 119035.33333333333 41000.0
12 71718.75 48750.0 94687.5 67000.0
13 41000.0 35120.0 46880.0 49000.0
14 41336.956521739135 36117.39130434795 46556.52173913032 48500.0
16 80174.03846153847 51703.1538461539 108644.92307692303 72000.0
17 41000.0 35120.0 46880.0 27000.0
15 58427.90697674418 55346.60465116245 61509.209302325915 61000.0
11 80174.03846153847 51703.1538461539 108644.92307692303 68000.0
9 56427.90697674418 55589.20930232445 57266.604651163914 55000.0
8 56427.90697674418 55589.20930232445 57266.604651163914 55000.0
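A quick way to get a feel for the confidence_lower and confidence_upper columns is to count how many actual prices fall inside their interval. The sketch below copies the ten rows returned by prediction.head() into plain Python. In practice the rows could be pulled locally with the teradataml DataFrame.to_pandas() method instead. The sketch assumes nothing about the confidence level the intervals are meant to have, and ten rows are far too few to judge calibration. The function name interval_coverage is hypothetical.

```python
# (id, confidence_lower, confidence_upper, actual price) from prediction.head()
rows = [
    (10, 19666.055555555547, 119035.33333333333, 41000.0),
    (12, 48750.0, 94687.5, 67000.0),
    (13, 35120.0, 46880.0, 49000.0),
    (14, 36117.39130434795, 46556.52173913032, 48500.0),
    (16, 51703.1538461539, 108644.92307692303, 72000.0),
    (17, 35120.0, 46880.0, 27000.0),
    (15, 55346.60465116245, 61509.209302325915, 61000.0),
    (11, 51703.1538461539, 108644.92307692303, 68000.0),
    (9, 55589.20930232445, 57266.604651163914, 55000.0),
    (8, 55589.20930232445, 57266.604651163914, 55000.0),
]


def interval_coverage(rows):
    """Hypothetical helper: fraction of rows with lower <= actual <= upper, plus the ids outside."""
    outside = [rid for rid, lo, hi, actual in rows if not (lo <= actual <= hi)]
    return 1 - len(outside) / len(rows), outside


coverage, outside_ids = interval_coverage(rows)
print(f"coverage = {coverage:.0%}, ids outside their interval = {outside_ids}")
```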