Example 2: Run AutoRegressor for Regression Problem with Early Stopping Condition and Customization

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Language: English (United States)
Last Update: 2025-01-23
Product Category: Teradata Vantage

This example predicts the price of a house based on different factors.

Run AutoRegressor to get the best-performing model with the following specifications:

  • Set early stopping criteria: a time limit of 300 seconds and an R2 performance metric threshold of 0.7.
  • Exclude the ‘glm’, ‘svm’, and ‘knn’ models from the default model training list.
  • Set the verbose level to 2 to get detailed logging.
  • Use custom_config_file to customize specific processes in the AutoML flow.
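As a rough illustration of the early stopping criteria listed above, the following sketch shows one way such a check could work. It is not teradataml's internal logic: the function `should_stop` and its parameters are hypothetical, and it assumes that the R2 threshold means training may stop once the best R2 reaches 0.7.

```python
# Hypothetical helper, not part of teradataml: illustrates the two early
# stopping criteria used in this example under assumed semantics.
def should_stop(elapsed_secs, best_r2, max_runtime_secs=300, r2_threshold=0.7):
    """Return True when either stopping criterion is met."""
    time_limit_reached = elapsed_secs >= max_runtime_secs
    metric_reached = best_r2 is not None and best_r2 >= r2_threshold
    return time_limit_reached or metric_reached


print(should_stop(elapsed_secs=120, best_r2=0.55))  # neither criterion met
print(should_stop(elapsed_secs=120, best_r2=0.72))  # R2 threshold reached
print(should_stop(elapsed_secs=310, best_r2=0.55))  # time limit reached
```

Either criterion alone is enough to end the search in this sketch, which is why the time limit still applies when no model reaches the R2 threshold.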
  1. Load the example dataset.
    >>> from teradataml import AutoRegressor, DataFrame, load_example_data
    >>> load_example_data("decisionforestpredict", ["housing_train", "housing_test"])
    >>> housing_train = DataFrame.from_table("housing_train")
    >>> housing_test = DataFrame.from_table("housing_test")
  2. Generate custom config JSON file.
    >>> AutoRegressor.generate_custom_config("custom_housing")
    Generating custom config JSON for AutoML ...
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  1
    
    Customizing Feature Engineering Phase ...
    
    Available options for customization of feature engineering phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Missing Value Handling
    
    Index 2: Customize Bincode Encoding
    
    Index 3: Customize String Manipulation
    
    Index 4: Customize Categorical Encoding
    
    Index 5: Customize Mathematical Transformation
    
    Index 6: Customize Nonlinear Transformation
    
    Index 7: Customize Antiselect Features
    
    Index 8: Back to main menu
    
    Index 9: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in feature engineering phase:  2,4,7,8
    
    Customizing Bincode Encoding ...
    
    Provide the following details to customize binning and coding encoding:
    
    Available binning methods with corresponding indices:
    Index 1: Equal-Width
    Index 2: Variable-Width
    
    Enter the feature or list of features for binning:  bedrooms
    
    Enter the index of corresponding binning method for feature bedrooms:  2
    
    Enter the number of bins for feature bedrooms:  2
    
    Available value type of feature for variable binning with corresponding indices:
    Index 1: int
    Index 2: float
    
    Provide the range for bin 1 of feature bedrooms: 
    
    Enter the index of corresponding value type of feature bedrooms:  1
    
    Enter the minimum value for bin 1 of feature bedrooms:  0
    
    Enter the maximum value for bin 1 of feature bedrooms:  2
    
    Enter the label for bin 1 of feature bedrooms:  small_house
    
    Provide the range for bin 2 of feature bedrooms: 
    
    Enter the index of corresponding value type of feature bedrooms:  1
    
    Enter the minimum value for bin 2 of feature bedrooms:  3
    
    Enter the maximum value for bin 2 of feature bedrooms:  5
    
    Enter the label for bin 2 of feature bedrooms:  big_house
    
    Customization of bincode encoding has been completed successfully.
    
    Customizing Categorical Encoding ...
    
    Provide the following details to customize categorical encoding:
    
    Available categorical encoding methods with corresponding indices:
    Index 1: OneHotEncoding
    Index 2: OrdinalEncoding
    Index 3: TargetEncoding
    
    Enter the list of corresponding index categorical encoding methods you want to use:  2,3
    
    Enter the feature or list of features for OrdinalEncoding:  homestyle
    
    Enter the feature or list of features for TargetEncoding:  prefarea
    
    Available target encoding methods with corresponding indices:
    Index 1: CBM_BETA
    Index 2: CBM_DIRICHLET
    Index 3: CBM_GAUSSIAN_INVERSE_GAMMA
    
    Enter the index of target encoding method for feature prefarea:  3
    
    Enter the response column for target encoding method for feature prefarea:  price
    
    Customization of categorical encoding has been completed successfully.
    
    Customizing Antiselect Features ...
    
    Enter the feature or list of features for antiselect:  sn
    
    Customization of antiselect features has been completed successfully.
    
    Customization of feature engineering phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  2
    
    Customizing Data Preparation Phase ...
    
    Available options for customization of data preparation phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Train Test Split
    
    Index 2: Customize Data Imbalance Handling
    
    Index 3: Customize Outlier Handling
    
    Index 4: Customize Feature Scaling
    
    Index 5: Back to main menu
    
    Index 6: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in data preparation phase:  1,2,3,4,5
    
    Customizing Train Test Split ...
    
    Enter the train size for train test split:  0.75
    
    Customization of train test split has been completed successfully.
    
    Customizing Data Imbalance Handling ...
    
    Available data sampling methods with corresponding indices:
    Index 1: SMOTE
    Index 2: NearMiss
    
    Enter the corresponding index data imbalance handling method:  1
    
    Customization of data imbalance handling has been completed successfully.
    
    Customizing Outlier Handling ...
    
    Available outlier detection methods with corresponding indices:
    Index 1: percentile
    Index 2: tukey
    Index 3: carling
    
    Enter the corresponding index oulier handling method:  1
    
    Enter the lower percentile value for outlier handling:  0.1
    
    Enter the upper percentile value for outlier handling:  0.9
    
    Enter the feature or list of features for outlier handling:  bathrms
    
    Available outlier replacement methods with corresponding indices:
    Index 1: delete
    Index 2: median
    Index 3: Any Numeric Value
    
    Enter the index of corresponding replacement method for feature bathrms:  1
    
    Customization of outlier handling has been completed successfully.
    
    Available feature scaling methods with corresponding indices:
    Index 1: maxabs
    Index 2: mean
    Index 3: midrange
    Index 4: range
    Index 5: rescale
    Index 6: std
    Index 7: sum
    Index 8: ustd
    
    Enter the corresponding index feature scaling method:  6
    
    Customization of feature scaling has been completed successfully.
    
    Customization of data preparation phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  3
    
    Customizing Model Training Phase ...
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  1
    
    Customizing Model Hyperparameter ...
    
    Available models for hyperparameter tuning with corresponding indices:
    Index 1: decision_forest
    Index 2: xgboost
    Index 3: knn
    Index 4: glm
    Index 5: svm
    
    Available hyperparamters update methods with corresponding indices:
    Index 1: ADD
    Index 2: REPLACE
    
    Enter the list of model indices for performing hyperparameter tuning:  2
    
    Available hyperparameters for model 'xgboost' with corresponding indices:
    Index 1: min_impurity
    Index 2: max_depth
    Index 3: min_node_size
    Index 4: shrinkage_factor
    Index 5: iter_num
    
    Enter the list of hyperparameter indices for model 'xgboost':  3
    
    Enter the index of corresponding update method for hyperparameters 'min_node_size' for model 'xgboost':  1
    
    Enter the list of value for hyperparameter 'min_node_size' for model 'xgboost':  1,2
    
    Customization of model hyperparameter has been completed successfully.
    
    Available options for customization of model training phase with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Model Hyperparameter
    
    Index 2: Back to main menu
    
    Index 3: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the list of indices you want to customize in model training phase:  2
    
    Customization of model training phase has been completed successfully.
    
    Available main options for customization with corresponding indices: 
    --------------------------------------------------------------------------------
    
    Index 1: Customize Feature Engineering Phase
    
    Index 2: Customize Data Preparation Phase
    
    Index 3: Customize Model Training Phase
    
    Index 4: Generate custom json and exit
    --------------------------------------------------------------------------------
    
    Enter the index you want to customize:  4
    
    Generating custom json and exiting ...
    
    Process of generating custom config file for AutoML has been completed successfully.
    
    'custom_housing.json' file is generated successfully under the current working directory. 
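Before passing the file to AutoRegressor, it can help to check which customizations were recorded. The sketch below uses only the Python standard library. `enabled_customizations` is a hypothetical helper, and the sample written to a temporary file is a fragment of the configuration that the fit step echoes, not the full file. It assumes that top-level keys ending in `Indicator` and set to true mark enabled customizations.

```python
import json
import os
import tempfile


def enabled_customizations(path):
    """Return top-level keys ending in 'Indicator' whose value is true."""
    with open(path) as f:
        config = json.load(f)
    return sorted(
        key for key, value in config.items()
        if key.endswith("Indicator") and value is True
    )


# Fragment of the generated configuration, used here only for illustration.
sample = {
    "BincodeIndicator": True,
    "AntiselectIndicator": True,
    "AntiselectParam": ["sn"],
    "TrainTestSplitIndicator": True,
    "TrainingSize": 0.75,
    "FeatureScalingIndicator": True,
    "FeatureScalingMethod": "std",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "custom_housing_sample.json")
    with open(path, "w") as f:
        json.dump(sample, f, indent=4)
    enabled = enabled_customizations(path)

print(enabled)
```

To inspect the real file, call `enabled_customizations("custom_housing.json")` from the directory where the file was generated.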
  3. Create an AutoRegressor instance.
    >>> aml = AutoRegressor(exclude=['glm','svm','knn'],
                            verbose=2,
                            max_runtime_secs=300,
                            stopping_metric='R2',
                            stopping_tolerance=0.7,
                            custom_config_file='custom_housing.json')
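To see which models remain after the exclusions, the sketch below filters a list of model names. It assumes that the default training list matches the five models offered in the hyperparameter tuning menu in step 2; the variable names are illustrative only.

```python
# Model names as listed in the hyperparameter tuning menu (assumed here to
# be the default training list).
default_models = ["decision_forest", "xgboost", "knn", "glm", "svm"]
exclude = ["glm", "svm", "knn"]

# Keep every default model that is not explicitly excluded.
remaining = [model for model in default_models if model not in exclude]
print(remaining)
```

Under that assumption, only decision_forest and xgboost are candidates for training.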
  4. Fit the data.
    >>> aml.fit(housing_train,housing_train.price)
    Received below input for customization : 
    {
        "BincodeIndicator": true,
        "BincodeParam": {
            "bedrooms": {
                "Type": "Variable-Width",
                "NumOfBins": 2,
                "Bin_1": {
                    "min_value": 0,
                    "max_value": 2,
                    "label": "small_house"
                },
                "Bin_2": {
                    "min_value": 3,
                    "max_value": 5,
                    "label": "big_house"
                }
            }
        },
        "CategoricalEncodingIndicator": true,
        "CategoricalEncodingParam": {
            "OrdinalEncodingIndicator": true,
            "OrdinalEncodingList": [
                "homestyle"
            ],
            "TargetEncodingIndicator": true,
            "TargetEncodingList": {
                "prefarea": {
                    "encoder_method": "CBM_GAUSSIAN_INVERSE_GAMMA",
                    "response_column": "price"
                }
            }
        },
        "AntiselectIndicator": true,
        "AntiselectParam": [
            "sn"
        ],
        "TrainTestSplitIndicator": true,
        "TrainingSize": 0.75,
        "DataImbalanceIndicator": true,
        "DataImbalanceMethod": "SMOTE",
        "OutlierFilterIndicator": true,
        "OutlierFilterMethod": "percentile",
        "OutlierLowerPercentile": 0.1,
        "OutlierUpperPercentile": 0.9,
        "OutlierFilterParam": {
            "bathrms": {
                "replacement_value": "delete"
            }
        },
        "FeatureScalingIndicator": true,
        "FeatureScalingMethod": "std",
        "HyperparameterTuningIndicator": true,
        "HyperparameterTuningParam": {
            "xgboost": {
                "min_node_size": {
                    "Method": "ADD",
                    "Value": [
                        1,
                        2
                    ]
                }
            }
        }
    }
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Feature Exploration started ...
    
    Data Overview:
    Total Rows in the data: 492
    Total Columns in the data: 14
    
    Column Summary:
    ColumnName	Datatype	NonNullCount	NullCount	BlankCount	ZeroCount	PositiveCount	NegativeCount	NullPercentage	NonNullPercentage
    recroom	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    homestyle	VARCHAR(20) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    sn	INTEGER	492	0	None	0	492	0	0.0	100.0
    price	FLOAT	492	0	None	0	492	0	0.0	100.0
    prefarea	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    airco	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    stories	INTEGER	492	0	None	0	492	0	0.0	100.0
    fullbase	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    bedrooms	INTEGER	492	0	None	0	492	0	0.0	100.0
    gashw	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    bathrms	INTEGER	492	0	None	0	492	0	0.0	100.0
    garagepl	INTEGER	492	0	None	270	222	0	0.0	100.0
    lotsize	FLOAT	492	0	None	0	492	0	0.0	100.0
    driveway	VARCHAR(10) CHARACTER SET LATIN	492	0	0	None	None	None	0.0	100.0
    
    Statistics of Data:
    func	sn	price	lotsize	bedrooms	bathrms	stories	garagepl
    min	1	25000	1650	1	1	1	0
    std	159.501	26472.496	2182.443	0.731	0.51	0.861	0.854
    25%	132.5	49975	3600	2	1	1	0
    50%	274	62000	4616	3	1	2	0
    75%	413.25	82000	6370	3	2	2	1
    max	546	190000	16200	6	4	4	3
    mean	272.943	68100.396	5181.795	2.965	1.293	1.803	0.685
    count	492	492	492	492	492	492	492
    
    Categorical Columns with their Distinct values:
    ColumnName                DistinctValueCount
    driveway                  2         
    recroom                   2         
    fullbase                  2         
    gashw                     2         
    airco                     2         
    prefarea                  2         
    homestyle                 3         
    
    No Futile columns found.
    
    Target Column Distribution:
    
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0    lotsize           2.235772
    1   bedrooms           2.235772
    2   garagepl           2.235772
    3    stories           7.113821
    4      price           2.439024
    5    bathrms           0.203252
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Feature Engineering started ...
    
    Handling duplicate records present in dataset ...
    Analysis completed. No action taken.                                                    
    
    Total time to handle duplicate records: 1.49 sec
    
    Handling less significant features from data ...
    Analysis indicates all categorical columns are significant. No action Needed.           
    
    Total time to handle less significant features: 15.14 sec
    
    Handling Date Features ...
    Analysis Completed. Dataset does not contain any feature related to dates. No action needed.
    
    Total time to handle date features: 0.00 sec
    Proceeding with default option for missing value imputation.                             
    Proceeding with default option for handling remaining missing values.                    
    
    Checking Missing values in dataset ...
    Analysis Completed. No Missing Values Detected.                                          
    
    Total time to find missing values in data: 7.37 sec
    
    Imputing Missing Values ...
    Analysis completed. No imputation required.                                              
    
    Time taken to perform imputation: 0.01 sec
    No information provided for Equal-Width Transformation.                                  
    
    Variable-Width binning information:-
    ColumnName	MinValue	MaxValue	Label
    0	bedrooms	0	2	small_house
    1	bedrooms	3	5	big_house
    
    2 rows X 4 columns
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816329745055"'
    
    Updated dataset sample after performing Variable-Width binning:
    bathrms	lotsize	airco	gashw	garagepl	id	recroom	sn	driveway	stories	prefarea	fullbase	homestyle	price	bedrooms
    3	4410.0	no	no	2	118	no	257	yes	2	no	yes	Eclectic	71000.0	big_house
    3	8580.0	no	no	2	510	no	44	yes	2	no	no	Eclectic	92000.0	big_house
    3	3630.0	no	no	0	92	yes	55	no	2	no	no	Classic	38000.0	big_house
    3	2610.0	no	no	0	73	no	156	no	2	no	no	Eclectic	60000.0	big_house
    3	7500.0	yes	no	2	338	no	338	yes	1	yes	yes	bungalow	155000.0	big_house
    3	6000.0	no	yes	2	66	yes	217	yes	2	no	yes	bungalow	138300.0	big_house
    3	3300.0	no	no	0	291	no	102	yes	2	no	yes	Eclectic	79000.0	big_house
    1	10240.0	yes	no	2	96	no	421	yes	1	yes	no	Eclectic	68000.0	small_house
    1	6000.0	yes	no	1	59	no	324	yes	1	no	no	Eclectic	98000.0	big_house
    1	6060.0	no	no	0	123	yes	91	yes	1	no	yes	Classic	47000.0	big_house
    
    492 rows X 15 columns
    Skipping customized string manipulation.
    
    Starting Customized Categorical Feature Encoding ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816245888910"'
    
    Updated dataset sample after performing ordinal encoding:
    bathrms	bedrooms	lotsize	gashw	garagepl	id	recroom	sn	driveway	stories	prefarea	airco	fullbase	price	homestyle
    3	big_house	3630.0	no	0	92	yes	55	no	2	no	no	no	38000.0	1
    3	big_house	8580.0	no	2	76	no	362	yes	4	yes	yes	no	145000.0	0
    3	big_house	6000.0	yes	2	66	yes	217	yes	2	no	no	yes	138300.0	0
    3	big_house	3300.0	no	0	291	no	102	yes	2	no	no	yes	79000.0	2
    3	big_house	8580.0	no	2	510	no	44	yes	2	no	no	no	92000.0	2
    3	big_house	4410.0	no	2	118	no	257	yes	2	no	no	yes	71000.0	2
    3	big_house	5960.0	no	1	194	yes	127	yes	2	no	no	yes	117000.0	0
    1	big_house	7020.0	no	2	262	no	388	yes	1	yes	yes	yes	85000.0	2
    1	small_house	6800.0	no	2	121	yes	354	yes	1	no	no	yes	86000.0	2
    1	small_house	3640.0	no	1	9	no	265	yes	1	no	no	no	50000.0	1
    
    492 rows X 15 columns
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713814972077644"'
    
    Updated dataset sample after performing target encoding:
    prefarea	bedrooms	bathrms	lotsize	gashw	garagepl	id	recroom	sn	driveway	stories	airco	homestyle	fullbase	price
    62906.33597883598	big_house	1	3300.0	no	1	33	no	17	no	2	no	1	no	40500.0
    62906.33597883598	small_house	1	8400.0	no	1	495	no	494	yes	1	no	2	no	54000.0
    62906.33597883598	big_house	1	4500.0	no	0	488	no	145	no	2	yes	2	yes	57250.0
    62906.33597883598	big_house	1	4046.0	no	1	228	no	348	yes	2	no	2	yes	59500.0
    62906.33597883598	small_house	1	2640.0	no	1	173	no	211	no	1	no	1	no	40500.0
    62906.33597883598	big_house	1	5200.0	no	0	111	no	320	yes	3	yes	2	no	83000.0
    83851.72413793103	big_house	1	2145.0	no	0	387	no	460	yes	2	no	1	yes	47000.0
    83851.72413793103	small_house	1	10360.0	no	1	353	no	477	yes	1	no	2	no	61500.0
    83851.72413793103	big_house	1	7000.0	no	2	360	no	399	yes	1	no	2	yes	82900.0
    83851.72413793103	big_house	1	6600.0	no	3	93	no	360	yes	4	yes	0	no	107000.0
    
    492 rows X 15 columns
    
    Performing encoding for categorical columns ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815195827284"'
    
    ONE HOT Encoding these Columns:
    ['bedrooms', 'gashw', 'recroom', 'driveway', 'airco', 'fullbase']
    
    Sample of dataset after performing one hot encoding:
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	sn	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    83851.72413793103	1	0	0	1	7160.0	1	0	2	124	1	0	379	0	1	1	1	0	2	0	1	84000.0
    83851.72413793103	1	0	0	1	5020.0	1	0	0	444	1	0	393	0	1	4	0	1	2	1	0	96000.0
    83851.72413793103	1	0	0	1	3800.0	1	0	1	404	0	1	456	0	1	2	1	0	2	0	1	75000.0
    83851.72413793103	1	0	0	1	3520.0	1	0	0	183	1	0	438	0	1	2	1	0	2	1	0	60000.0
    83851.72413793103	1	0	0	1	2880.0	1	0	0	257	1	0	424	0	1	2	1	0	2	1	0	62900.0
    83851.72413793103	1	0	0	1	9620.0	1	0	2	393	1	0	391	0	1	1	1	0	2	0	1	86900.0
    62906.33597883598	1	0	0	1	3300.0	1	0	1	33	1	0	17	1	0	2	1	0	1	1	0	40500.0
    62906.33597883598	0	0	1	1	8400.0	1	0	1	495	1	0	494	0	1	1	1	0	2	1	0	54000.0
    62906.33597883598	1	0	0	1	4500.0	1	0	0	488	1	0	145	1	0	2	0	1	2	0	1	57250.0
    62906.33597883598	1	0	0	1	4046.0	1	0	1	228	1	0	348	0	1	2	1	0	2	0	1	59500.0
    
    492 rows X 22 columns
    
    Time taken to encode the columns: 13.96 sec
    
    Starting customized mathematical transformation ...
    Skipping customized mathematical transformation.                                         
    
    Starting customized non-linear transformation ...
    Skipping customized non-linear transformation.                                           
    
    Starting customized anti-select columns ...
    
    Updated dataset sample after performing anti-select columns:
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    83851.72413793103	1	0	0	1	7160.0	1	0	2	124	1	0	0	1	1	1	0	2	0	1	84000.0
    83851.72413793103	1	0	0	1	5020.0	1	0	0	444	1	0	0	1	4	0	1	2	1	0	96000.0
    83851.72413793103	1	0	0	1	3800.0	1	0	1	404	0	1	0	1	2	1	0	2	0	1	75000.0
    83851.72413793103	1	0	0	1	3520.0	1	0	0	183	1	0	0	1	2	1	0	2	1	0	60000.0
    83851.72413793103	1	0	0	1	2880.0	1	0	0	257	1	0	0	1	2	1	0	2	1	0	62900.0
    83851.72413793103	1	0	0	1	9620.0	1	0	2	393	1	0	0	1	1	1	0	2	0	1	86900.0
    62906.33597883598	1	0	0	1	3300.0	1	0	1	33	1	0	1	0	2	1	0	1	1	0	40500.0
    62906.33597883598	0	0	1	1	8400.0	1	0	1	495	1	0	0	1	1	1	0	2	1	0	54000.0
    62906.33597883598	1	0	0	1	4500.0	1	0	0	488	1	0	1	0	2	0	1	2	0	1	57250.0
    62906.33597883598	1	0	0	1	4046.0	1	0	1	228	1	0	0	1	2	1	0	2	0	1	59500.0
    
    492 rows X 21 columns
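
The variable-width binning configured for bedrooms can be pictured with a small sketch. `bin_bedrooms` is a hypothetical function, not the in-database implementation. It assumes both bin bounds are inclusive and returns None for values outside every bin, since the log does not show how such values (for example, the maximum of 6 bedrooms) are labeled.

```python
# Bin ranges and labels taken from the custom configuration above.
BINS = [(0, 2, "small_house"), (3, 5, "big_house")]


def bin_bedrooms(value, bins=BINS):
    """Map a bedroom count to its bin label (inclusive bounds assumed)."""
    for low, high, label in bins:
        if low <= value <= high:
            return label
    return None  # outside all bins; the actual AutoML behavior is not shown


for count in (1, 2, 3, 5, 6):
    print(count, bin_bedrooms(count))
```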
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Data preparation started ...
    
    Spliting of dataset into training and testing ...
    Training size : 0.75                                                                      
    Testing size  : 0.25                                                                      
    
    Training data sample
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    62906.33597883598	0	0	1	1	6360.0	1	0	1	12	1	0	0	1	1	0	1	2	0	1	63900.0
    62906.33597883598	1	0	0	1	8372.0	1	0	2	16	1	0	0	1	3	0	1	2	1	0	87000.0
    62906.33597883598	1	0	0	2	4500.0	1	0	0	17	1	0	1	0	2	0	1	2	0	1	57000.0
    62906.33597883598	1	0	0	1	6840.0	1	0	1	18	0	1	0	1	2	0	1	0	0	1	116000.0
    62906.33597883598	1	0	0	1	5800.0	0	1	2	21	1	0	0	1	1	1	0	2	1	0	60000.0
    62906.33597883598	0	0	1	1	3649.0	1	0	0	22	1	0	0	1	1	1	0	1	1	0	27000.0
    83851.72413793103	0	0	1	1	5320.0	1	0	1	15	1	0	0	1	1	1	0	1	1	0	49500.0
    83851.72413793103	1	0	0	2	6600.0	1	0	0	29	0	1	0	1	2	1	0	2	0	1	78000.0
    83851.72413793103	1	0	0	1	11440.0	1	0	1	37	1	0	0	1	2	1	0	0	0	1	104900.0
    83851.72413793103	1	0	0	1	6360.0	1	0	0	41	1	0	0	1	3	1	0	2	1	0	80000.0
    
    369 rows X 21 columns
    Testing data sample
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    62906.33597883598	1	0	0	2	8880.0	1	0	1	13	1	0	0	1	2	0	1	2	0	1	99000.0
    62906.33597883598	1	0	0	1	2400.0	1	0	0	34	1	0	1	0	1	1	0	1	1	0	25245.0
    62906.33597883598	1	0	0	2	9800.0	1	0	2	36	0	1	0	1	2	1	0	2	1	0	75000.0
    62906.33597883598	1	0	0	1	3000.0	1	0	0	38	1	0	0	1	2	1	0	2	1	0	56000.0
    62906.33597883598	0	0	1	1	4040.0	1	0	1	67	1	0	0	1	2	1	0	2	1	0	58500.0
    62906.33597883598	1	0	0	1	2970.0	1	0	0	72	1	0	0	1	3	1	0	2	1	0	70000.0
    83851.72413793103	1	0	0	1	11460.0	1	0	2	19	1	0	0	1	3	1	0	2	1	0	83900.0
    83851.72413793103	1	0	0	2	5500.0	1	0	1	27	1	0	0	1	2	0	1	0	0	1	120000.0
    83851.72413793103	1	0	0	2	4880.0	1	0	1	40	1	0	0	1	2	0	1	0	1	0	118500.0
    83851.72413793103	1	0	0	1	2145.0	1	0	0	75	1	0	0	1	3	1	0	1	1	0	49500.0
    
    123 rows X 21 columns
    
    Time taken for spliting of data: 14.48 sec
    
    Starting customized outlier processing ...
    Columns with outlier percentage :-                                                                           
      ColumnName  OutlierPercentage
    0   garagepl           2.235772
    1      price           8.739837
    2         id           9.756098
    3    lotsize           9.552846
    4    bathrms           2.235772
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815695090274"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816723333520"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816253699382"'
    
    Feature selection using lasso ...
    
    feature selected by lasso:
    ['bathrms', 'fullbase_1', 'gashw_0', 'driveway_0', 'stories', 'airco_1', 'gashw_1', 'bedrooms_0', 'bedrooms_2', 'driveway_1', 'garagepl', 'recroom_0', 'fullbase_0', 'homestyle', 'airco_0', 'prefarea', 'lotsize']
    
    Total time taken by feature selection: 1.43 sec
    
    scaling Features of lasso data ...
    
    columns that will be scaled:
    ['bathrms', 'stories', 'garagepl', 'homestyle', 'prefarea', 'lotsize']
    
    Training dataset sample after scaling:
    driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	bathrms	stories	garagepl	homestyle	prefarea	lotsize
    1	0	47000.0	1	1	1	56	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.6389834756925418
    1	1	52000.0	1	1	1	26	0	0	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.7437609001391254
    1	0	64000.0	1	1	1	245	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	0.3943732959927247	0.7474571831400929	-0.552679423766173	-0.5219348213558176
    1	0	78000.0	1	1	0	175	0	1	1	0	0	0	-0.5698449326198071	2.5617608810097847	-0.7757094582336237	0.7474571831400929	-0.552679423766173	0.5022409040905186
    0	0	40500.0	1	1	1	33	1	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	0.3943732959927247	-0.7516329215933895	-0.552679423766173	-0.87119290284443
    1	1	57000.0	0	1	0	100	0	0	0	1	0	1	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	1.8093671611394	-0.5172151175519175
    1	0	57500.0	1	1	1	427	0	1	0	1	0	0	-0.5698449326198071	-0.8999912756370636	2.734538804445422	0.7474571831400929	-0.552679423766173	0.11994489597460505
    1	1	75000.0	1	1	1	116	0	0	1	0	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.41810133767001395
    1	0	52000.0	0	1	1	28	0	1	1	0	0	1	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-1.012784016961435
    0	0	46000.0	1	1	1	32	1	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.956147571314633
    
    359 rows X 19 columns
    
    Testing dataset sample after scaling:
    driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	bathrms	stories	garagepl	homestyle	prefarea	lotsize
    0	0	25245.0	1	1	1	34	1	1	0	1	0	0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.295966245195445
    1	0	56000.0	1	1	1	38	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-1.012784016961435
    0	0	47900.0	1	1	1	122	1	1	0	1	0	0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.15437513107844
    1	1	51000.0	1	1	1	140	0	0	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.9419884599029325
    0	1	44000.0	1	1	1	452	1	0	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.409239136489049
    1	1	60000.0	0	1	0	153	0	0	0	1	0	1	-0.5698449326198071	-0.8999912756370636	1.5644560502190732	0.7474571831400929	-0.552679423766173	0.33233156715011253
    1	0	47500.0	0	1	1	142	0	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	0.3943732959927247	-0.7516329215933895	-0.552679423766173	-0.5219348213558176
    1	0	59900.0	1	1	1	400	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	0.3943732959927247	0.7474571831400929	-0.552679423766173	-0.8003973457859275
    1	0	92000.0	1	1	1	510	0	1	0	1	0	0	4.079571676709981	0.2539261099118858	1.5644560502190732	0.7474571831400929	-0.552679423766173	1.6208107056148582
    0	1	70000.0	1	1	0	147	1	0	1	0	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.4959764504343667
    
    123 rows X 19 columns
    
    Total time taken by feature scaling: 54.15 sec
    
    Feature selection using rfe ...
    
    feature selected by RFE:
    ['bathrms', 'fullbase_1', 'gashw_0', 'driveway_0', 'stories', 'airco_1', 'gashw_1', 'bedrooms_0', 'bedrooms_2', 'driveway_1', 'garagepl', 'recroom_0', 'fullbase_0', 'homestyle', 'airco_0', 'recroom_1', 'prefarea', 'lotsize']
    
    Total time taken by feature selection: 55.81 sec
    
    scaling Features of rfe data ...
    
    columns that will be scaled:
    ['r_bathrms', 'r_stories', 'r_garagepl', 'r_homestyle', 'r_prefarea', 'r_lotsize']
    
    Training dataset sample after scaling:
    r_gashw_0	r_fullbase_1	r_driveway_1	r_recroom_1	r_bedrooms_0	r_bedrooms_2	id	r_recroom_0	r_gashw_1	r_driveway_0	r_airco_0	r_airco_1	r_fullbase_0	price	r_bathrms	r_stories	r_garagepl	r_homestyle	r_prefarea	r_lotsize
    1	0	1	0	1	0	56	1	0	0	1	0	1	47000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.6389834756925418
    1	1	1	0	1	0	26	1	0	0	1	0	0	52000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.7437609001391254
    1	0	1	0	1	0	245	1	0	0	1	0	1	64000.0	-0.5698449326198071	0.2539261099118858	0.3943732959927247	0.7474571831400929	-0.552679423766173	-0.5219348213558176
    1	0	1	1	1	0	175	0	0	0	0	1	1	78000.0	-0.5698449326198071	2.5617608810097847	-0.7757094582336237	0.7474571831400929	-0.552679423766173	0.5022409040905186
    1	0	0	0	1	0	33	1	0	1	1	0	1	40500.0	-0.5698449326198071	0.2539261099118858	0.3943732959927247	-0.7516329215933895	-0.552679423766173	-0.87119290284443
    1	1	1	1	0	1	100	0	0	0	1	0	0	57000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	1.8093671611394	-0.5172151175519175
    1	0	1	0	1	0	427	1	0	0	1	0	1	57500.0	-0.5698449326198071	-0.8999912756370636	2.734538804445422	0.7474571831400929	-0.552679423766173	0.11994489597460505
    1	1	1	0	1	0	116	1	0	0	0	1	0	75000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.41810133767001395
    1	0	1	0	0	1	28	1	0	0	0	1	1	52000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-1.012784016961435
    1	0	0	0	1	0	32	1	0	1	1	0	1	46000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.956147571314633
    
    359 rows X 20 columns
    
    Testing dataset sample after scaling:
    r_gashw_0	r_fullbase_1	r_driveway_1	r_recroom_1	r_bedrooms_0	r_bedrooms_2	id	r_recroom_0	r_gashw_1	r_driveway_0	r_airco_0	r_airco_1	r_fullbase_0	price	r_bathrms	r_stories	r_garagepl	r_homestyle	r_prefarea	r_lotsize
    1	0	0	0	1	0	34	1	0	1	1	0	1	25245.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.295966245195445
    1	0	1	0	1	0	38	1	0	0	1	0	1	56000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-1.012784016961435
    1	0	0	0	1	0	122	1	0	1	1	0	1	47900.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.15437513107844
    1	1	1	0	1	0	140	1	0	0	1	0	0	51000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.9419884599029325
    1	1	0	0	1	0	452	1	0	1	1	0	0	44000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.409239136489049
    1	1	1	1	0	1	153	0	0	0	1	0	0	60000.0	-0.5698449326198071	-0.8999912756370636	1.5644560502190732	0.7474571831400929	-0.552679423766173	0.33233156715011253
    1	0	1	0	0	1	142	1	0	0	1	0	1	47500.0	-0.5698449326198071	-0.8999912756370636	0.3943732959927247	-0.7516329215933895	-0.552679423766173	-0.5219348213558176
    1	0	1	0	1	0	400	1	0	0	1	0	1	59900.0	-0.5698449326198071	0.2539261099118858	0.3943732959927247	0.7474571831400929	-0.552679423766173	-0.8003973457859275
    1	0	1	0	1	0	510	1	0	0	1	0	1	92000.0	4.079571676709981	0.2539261099118858	1.5644560502190732	0.7474571831400929	-0.552679423766173	1.6208107056148582
    1	1	0	1	1	0	147	0	0	1	0	1	0	70000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.4959764504343667
    
    123 rows X 20 columns
    
    Total time taken by feature scaling: 51.49 sec
    
    scaling Features of pca data ...
    
    columns that will be scaled:
    ['prefarea', 'bathrms', 'lotsize', 'garagepl', 'stories', 'homestyle']
    
    Training dataset sample after scaling:
    bedrooms_1	driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	recroom_1	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	prefarea	bathrms	lotsize	garagepl	stories	homestyle
    0	1	0	65500.0	1	1	1	44	0	0	1	0	1	0	0	1.8093671611393762	-0.5698449326198075	-0.6163288974338209	0.39437329599272386	0.2539261099118857	0.7474571831400939
    0	1	0	54000.0	1	1	1	54	0	0	1	0	1	0	0	1.8093671611393762	-0.5698449326198075	-1.0807477517375974	-0.7757094582336221	1.4078434954608345	0.7474571831400939
    0	1	1	61100.0	1	1	1	58	0	0	0	0	1	0	0	1.8093671611393762	-0.5698449326198075	-0.8239958648054283	1.5644560502190699	0.2539261099118857	0.7474571831400939
    0	1	0	51500.0	0	1	1	61	0	0	1	0	1	0	1	1.8093671611393762	-0.5698449326198075	-0.5408136365714183	-0.7757094582336221	-0.8999912756370632	0.7474571831400939
    0	1	1	95000.0	1	1	0	81	0	1	0	1	0	0	0	1.8093671611393762	-0.5698449326198075	0.26153601009161004	1.5644560502190699	-0.8999912756370632	0.7474571831400939
    0	1	0	103500.0	1	1	0	88	0	1	1	1	0	0	0	1.8093671611393762	1.7548633720450884	1.8190382653786652	0.39437329599272386	2.5617608810097834	-2.2507230263268747
    0	1	1	63900.0	0	1	1	12	0	0	0	1	0	0	1	-0.5526794237661955	-0.5698449326198075	0.5730364611490211	0.39437329599272386	-0.8999912756370632	0.7474571831400939
    0	1	0	87000.0	1	1	1	16	0	0	1	1	0	0	0	-0.5526794237661955	-0.5698449326198075	1.5226408664937345	1.5644560502190699	1.4078434954608345	0.7474571831400939
    0	0	1	57000.0	1	1	1	17	1	0	0	1	0	0	0	-0.5526794237661955	1.7548633720450884	-0.30482844637640993	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	1	116000.0	1	1	0	18	0	1	0	1	0	0	0	-0.5526794237661955	-0.5698449326198075	0.7995822437362291	0.39437329599272386	0.2539261099118857	-2.2507230263268747
    
    359 rows X 21 columns
    
    Testing dataset sample after scaling:
    bedrooms_1	driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	recroom_1	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	prefarea	bathrms	lotsize	garagepl	stories	homestyle
    0	1	1	99000.0	1	1	1	13	0	0	0	1	0	0	0	-0.5526794237661955	1.7548633720450884	1.7624018197318632	0.39437329599272386	0.2539261099118857	0.7474571831400939
    0	0	0	25245.0	1	1	1	34	1	0	1	0	1	0	0	-0.5526794237661955	-0.5698449326198075	-1.295966245195445	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	1	0	75000.0	1	1	0	36	0	1	1	0	1	0	0	-0.5526794237661955	1.7548633720450884	2.1966145696906785	1.5644560502190699	0.2539261099118857	0.7474571831400939
    0	1	0	56000.0	1	1	1	38	0	0	1	0	1	0	0	-0.5526794237661955	-0.5698449326198075	-1.012784016961435	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	0	58500.0	0	1	1	67	0	0	1	0	1	0	1	-0.5526794237661955	-0.5698449326198075	-0.5219348213558176	0.39437329599272386	0.2539261099118857	0.7474571831400939
    0	1	0	70000.0	1	1	1	72	0	0	1	0	1	0	0	-0.5526794237661955	-0.5698449326198075	-1.0269431283731354	-0.7757094582336221	1.4078434954608345	0.7474571831400939
    0	1	0	83900.0	1	1	1	19	0	0	1	0	1	0	0	1.8093671611393762	-0.5698449326198075	2.980085401138106	1.5644560502190699	1.4078434954608345	0.7474571831400939
    0	1	1	120000.0	1	1	1	27	0	0	0	1	0	0	0	1.8093671611393762	1.7548633720450884	0.16714193401360672	0.39437329599272386	0.2539261099118857	-2.2507230263268747
    0	1	0	118500.0	1	1	1	40	0	0	1	1	0	0	0	1.8093671611393762	1.7548633720450884	-0.12547970182820362	0.39437329599272386	0.2539261099118857	-2.2507230263268747
    0	1	0	49500.0	1	1	1	75	0	0	1	0	1	0	0	1.8093671611393762	-0.5698449326198075	-1.4163186921948991	-0.7757094582336221	1.4078434954608345	-0.7516329215933905
    
    123 rows X 21 columns
    
    Total time taken by feature scaling: 50.44 sec
    
    Dimension Reduction using pca ...
    
    PCA columns:
    ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9']
    
    Total time taken by PCA: 10.63 sec
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    
    Model Training started ...
    
    Starting customized hyperparameter update ...
    
    Completed customized hyperparameter update.
    
    Hyperparameters used for model training:
    response_column : price                                                                                                                               
    name : xgboost
    model_type : Regression
    column_sampling : (1, 0.6)
    min_impurity : (0.0, 0.1, 0.2, 0.3)
    lambda1 : (0.01, 0.1, 1, 10)
    shrinkage_factor : (0.5, 0.01, 0.05, 0.1)
    max_depth : (5, 3, 4, 7, 8)
    min_node_size : (1, 2, 3, 4)
    iter_num : (10, 20, 30, 40)
    Total number of models for xgboost : 10240
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    response_column : price
    name : decision_forest
    tree_type : Regression
    min_impurity : (0.0, 0.1, 0.2, 0.3)
    max_depth : (5, 3, 4, 7, 8)
    min_node_size : (1, 2, 3, 4)
    num_trees : (-1, 20, 30, 40)
    Total number of models for decision_forest : 320
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Performing hyperParameter tuning ...
    
    xgboost
    
    ----------------------------------------------------------------------------------------------------
    
    decision_forest
    
    ----------------------------------------------------------------------------------------------------
    
    Evaluating models performance ...
    
    Evaluation completed.
    
    Leaderboard
    Rank	Model-ID	Feature-Selection	MAE	MSE	MSLE	RMSE	RMSLE	R2-score	Adjusted R2-score
    0	1	XGBOOST_2	pca	9742.422518	2.064480e+08	0.035378	14368.297209	0.188091	0.690732	0.663119
    1	2	DECISIONFOREST_1	rfe	11097.622077	2.093546e+08	0.041378	14469.090168	0.203416	0.686378	0.632097
    2	3	XGBOOST_1	rfe	11162.817285	2.219267e+08	0.050646	14897.203438	0.225047	0.667544	0.610004
    3	4	DECISIONFOREST_2	pca	10892.666297	2.371269e+08	0.041082	15398.924619	0.202686	0.644773	0.613057
    4	5	DECISIONFOREST_0	lasso	12588.499123	3.056303e+08	0.048797	17482.284634	0.220900	0.542152	0.468025
    5	6	XGBOOST_3	lasso	12186.337469	3.168533e+08	0.047248	17800.374075	0.217365	0.525340	0.448490
    6	7	XGBOOST_0	lasso	12186.337469	3.168533e+08	0.047248	17800.374075	0.217365	0.525340	0.448490
    7	8	DECISIONFOREST_3	lasso	14983.007904	4.247282e+08	0.066943	20608.935846	0.258733	0.363738	0.260724
    
    8 rows X 10 columns
    
    
    1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
    Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 20/20
  5. Display the model leaderboard.
    >>> aml.leaderboard()
    Rank	Model-ID	Feature-Selection	MAE	MSE	MSLE	RMSE	RMSLE	R2-score	Adjusted R2-score
    0	1	XGBOOST_2	pca	9742.422518	2.064480e+08	0.035378	14368.297209	0.188091	0.690732	0.663119
    1	2	DECISIONFOREST_1	rfe	11097.622077	2.093546e+08	0.041378	14469.090168	0.203416	0.686378	0.632097
    2	3	XGBOOST_1	rfe	11162.817285	2.219267e+08	0.050646	14897.203438	0.225047	0.667544	0.610004
    3	4	DECISIONFOREST_2	pca	10892.666297	2.371269e+08	0.041082	15398.924619	0.202686	0.644773	0.613057
    4	5	DECISIONFOREST_0	lasso	12588.499123	3.056303e+08	0.048797	17482.284634	0.220900	0.542152	0.468025
    5	6	XGBOOST_3	lasso	12186.337469	3.168533e+08	0.047248	17800.374075	0.217365	0.525340	0.448490
    6	7	XGBOOST_0	lasso	12186.337469	3.168533e+08	0.047248	17800.374075	0.217365	0.525340	0.448490
    7	8	DECISIONFOREST_3	lasso	14983.007904	4.247282e+08	0.066943	20608.935846	0.258733	0.363738	0.260724
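The leaderboard columns are related to one another. The values shown are consistent with the standard definitions RMSE = sqrt(MSE) and Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of validation rows (123 in the log above) and p is the number of features used by the model (10 PCA columns, 18 RFE-selected features). The following is a minimal sketch in plain Python that checks this against two leaderboard rows. It is an independent sanity check, not teradataml code, and `adjusted_r2` is a hypothetical helper name.

```python
import math


def adjusted_r2(r2, n_rows, n_features):
    """Standard adjusted R2 for n_rows observations and n_features predictors."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_features - 1)


# Values copied from the leaderboard and the fit log above.
N_VALIDATION_ROWS = 123          # "123 rows X 21 columns" in the testing sample
xgboost_2_pca = {"r2": 0.690732, "mse": 2.064480e+08, "features": 10}
decisionforest_1_rfe = {"r2": 0.686378, "mse": 2.093546e+08, "features": 18}

adj_pca = adjusted_r2(xgboost_2_pca["r2"], N_VALIDATION_ROWS, xgboost_2_pca["features"])
adj_rfe = adjusted_r2(decisionforest_1_rfe["r2"], N_VALIDATION_ROWS, decisionforest_1_rfe["features"])
rmse_pca = math.sqrt(xgboost_2_pca["mse"])

# Compare with the leaderboard columns "Adjusted R2-score" and "RMSE".
print(f"XGBOOST_2 adjusted R2:        {adj_pca:.6f}  (leaderboard: 0.663119)")
print(f"DECISIONFOREST_1 adjusted R2: {adj_rfe:.6f}  (leaderboard: 0.632097)")
print(f"XGBOOST_2 RMSE:               {rmse_pca:.3f}  (leaderboard: 14368.297209)")
```

Because Adjusted R2 penalizes the number of features, a model with fewer input columns (here the PCA variant) can rank ahead of a model with a similar raw R2 but more features.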
  6. Display the best performing model.
    >>> aml.leader()
    Rank	Model-ID	Feature-Selection	MAE	MSE	MSLE	RMSE	RMSLE	R2-score	Adjusted R2-score
    0	1	XGBOOST_2	pca	9742.422518	2.064480e+08	0.035378	14368.297209	0.188091	0.690732	0.663119
    
  7. Generate predictions on the validation dataset using the best-performing model.
    During the data preparation phase, AutoML splits the data provided during fitting into training and testing sets. Models are trained on the training set, and the testing set serves as the validation dataset for model evaluation. Calling predict() without a data argument generates predictions on this validation dataset.
    >>> prediction = aml.predict()
    Following model is being used for generating prediction :
    Model ID : XGBOOST_2 
    Feature Selection Method : pca
    
     Prediction : 
       id     Prediction  Confidence_Lower  Confidence_upper     price
    0  10   63809.865056      33717.239917      93902.490195   54500.0
    1  13   92935.646586      46824.448096     139046.845076   99000.0
    2  40  105986.892125      59432.832713     152540.951537  118500.0
    3  24   79861.408889      40881.522757     118841.295022   99000.0
    4  34   38466.813258      21251.107246      55682.519270   25245.0
    5  97   98560.011426      47662.676958     149457.345895  106000.0
    6  75   58383.595048      29686.945507      87080.244588   49500.0
    7  27   96430.203680      51016.647630     141843.759730  120000.0
    8  19   86654.297469      44430.989001     128877.605937   83900.0
    9   9   41575.195612      22618.003199      60532.388024   50000.0
    
     Performance Metrics : 
               MAE           MSE      MSLE       MAPE       MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
    0  9742.422518  2.064480e+08  0.035378  13.954839  0.188855  14368.297209  0.188091  63470.842062  0.690732  0.704693  2592.517245  0.036744
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	price
    13	92935.64658575002	46824.44809589475	139046.84507560526	99000.0
    24	79861.40888912501	40881.52275672913	118841.2950215209	99000.0
    27	96430.20368	51016.64762957162	141843.7597304284	120000.0
    34	38466.813258	21251.10724581299	55682.51927018701	25245.0
    38	59323.84226600001	34078.64692764714	84569.03760435287	56000.0
    40	105986.892125125	59432.832712888114	152540.9515373619	118500.0
    36	89466.34093637503	45426.43042440514	133506.2514483449	75000.0
    19	86654.29746912501	44430.989000780464	128877.60593746956	83900.0
    10	63809.86505600001	33717.23991664394	93902.49019535608	54500.0
    9	41575.195611500014	22618.00319854991	60532.38802445012	50000.0
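The hold-out split described in this step can be illustrated with a minimal sketch in plain Python. It is not the teradataml implementation: `holdout_split` is a hypothetical helper, and the test fraction of 0.25 and the shuffling strategy are assumptions. The fit log only reports the resulting sizes (359 training rows and 123 testing rows).

```python
import random


def holdout_split(row_ids, test_fraction, seed=0):
    """Shuffle row ids and split them into (train_ids, test_ids)."""
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 482 = 359 training rows + 123 testing rows reported in the fit log.
# test_fraction=0.25 is an assumed value chosen only for illustration.
train_ids, test_ids = holdout_split(range(482), test_fraction=0.25)

print(len(train_ids), len(test_ids))
```

The models never see the held-out rows during training, which is why the metrics reported for the validation dataset give an estimate of performance on unseen data.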
  8. Generate predictions on the validation dataset using the third-best-performing model.
    >>> prediction = aml.predict(rank=3)
    Following model is being used for generating prediction :
    Model ID : XGBOOST_1 
    Feature Selection Method : rfe
     Prediction : 
        id    Prediction  Confidence_Lower  Confidence_upper    price
    0   34  35023.283882      -4837.186404      74883.754169  25245.0
    1   38  55898.738253      -8963.818351     120761.294857  56000.0
    2  122  35023.283882      -4837.186404      74883.754169  47900.0
    3  140  55049.909953      -9742.813129     119842.633036  51000.0
    4  452  37547.897706      -4255.979615      79351.775028  44000.0
    5  153  67706.129535      -7797.877417     143210.136487  60000.0
    6  142  53574.225636     -10035.903135     117184.354408  47500.0
    7  400  58698.929647      -7337.989775     124735.849070  59900.0
    8  510  74609.789819     -17587.700400     166807.280037  92000.0
    9  147  76221.930629     -11523.433164     163967.294423  70000.0
     Performance Metrics : 
                MAE           MSE      MSLE      MAPE       MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
    0  11162.817285  2.219267e+08  0.050646  18.09273 -3.630758  14897.203438  0.225047  71234.841896  0.667544  0.668922  3045.942002  0.047478
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	price
    13	88394.47405500001	-13505.516260407865	190294.4643704079	99000.0
    24	70578.8232925	-5948.346164785151	147105.99274978516	99000.0
    27	104401.63842549999	-12692.073419267515	221495.3502702675	120000.0
    34	35023.2838825	-4837.186403614331	74883.75416861434	25245.0
    38	55898.738252999996	-8963.81835091576	120761.29485691575	56000.0
    40	107643.55437249999	-17549.808679008158	232836.91742400813	118500.0
    36	72512.602488	-14861.542217641778	159886.7471936418	75000.0
    19	89922.82769049998	-10263.101185138774	190108.75656613873	83900.0
    10	51887.02052199999	-10237.521547671604	114011.5625916716	54500.0
    9	36789.518904	-6212.072251334146	79791.11005933414	50000.0
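The rank argument refers to the Rank column of the leaderboard, so rank=3 selects XGBOOST_1 trained on the rfe feature set, as the log above confirms. A minimal sketch of that lookup in plain Python follows. The rows are copied from the leaderboard, and `model_for_rank` is a hypothetical helper, not part of teradataml.

```python
# Top leaderboard rows from the output above: (rank, model_id, feature_selection).
leaderboard = [
    (1, "XGBOOST_2", "pca"),
    (2, "DECISIONFOREST_1", "rfe"),
    (3, "XGBOOST_1", "rfe"),
    (4, "DECISIONFOREST_2", "pca"),
]


def model_for_rank(rows, rank):
    """Hypothetical helper: return (model_id, feature_selection) for a rank."""
    for row_rank, model_id, feature_selection in rows:
        if row_rank == rank:
            return model_id, feature_selection
    raise ValueError(f"no model with rank {rank}")


selected = model_for_rank(leaderboard, 3)
print(selected)
```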
  9. Generate predictions on the test dataset using the best-performing model.
    >>> prediction = aml.predict(housing_test)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713817224640943"'
    
    Updated dataset after performing customized variable width bin-code transformation :
    bathrms	lotsize	airco	gashw	garagepl	id	recroom	sn	driveway	stories	prefarea	fullbase	homestyle	price	bedrooms
    1	3520.0	no	no	0	51	no	443	yes	1	yes	no	Eclectic	65000.0	big_house
    1	4350.0	no	yes	1	20	no	198	no	2	no	no	Classic	40500.0	big_house
    1	3162.0	yes	no	1	49	no	161	yes	2	no	no	Eclectic	63900.0	big_house
    1	3750.0	no	no	0	43	no	140	yes	2	no	no	Classic	43000.0	big_house
    1	5076.0	no	no	0	52	no	111	no	1	no	no	Classic	43000.0	big_house
    1	7980.0	no	no	2	69	no	353	yes	1	no	no	Eclectic	78500.0	big_house
    1	3760.0	no	yes	2	37	no	117	yes	2	no	no	Eclectic	93000.0	big_house
    1	5000.0	no	no	0	67	no	317	yes	4	no	no	Eclectic	80000.0	big_house
    1	3000.0	no	no	2	44	no	239	yes	1	no	yes	Classic	26000.0	small_house
    1	5400.0	no	no	0	28	no	177	yes	2	no	no	Eclectic	70000.0	big_house
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815770435510"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815975550667"'
    
    Updated dataset after performing customized categorical encoding :
    prefarea	bedrooms	bathrms	lotsize	gashw	garagepl	id	recroom	sn	driveway	stories	airco	homestyle	fullbase	price
    62906.33597883598	small_house	1	3180.0	no	0	59	no	195	yes	1	no	1	no	33000.0
    62906.33597883598	small_house	1	5885.0	no	1	29	no	306	yes	1	yes	2	no	64000.0
    62906.33597883598	big_house	1	4360.0	no	0	15	no	255	yes	2	no	2	no	61000.0
    62906.33597883598	big_house	1	5170.0	no	0	12	no	38	yes	4	yes	2	no	67000.0
    62906.33597883598	small_house	1	3185.0	no	0	23	no	16	yes	1	yes	1	no	37900.0
    62906.33597883598	small_house	1	9166.0	no	2	11	no	53	yes	1	yes	2	yes	68000.0
    83851.72413793103	big_house	1	2787.0	no	0	27	no	472	yes	1	no	2	yes	60500.0
    83851.72413793103	small_house	1	2176.0	no	0	8	yes	469	yes	2	no	2	no	55000.0
    83851.72413793103	big_house	1	7410.0	no	2	40	yes	401	yes	1	yes	2	yes	92500.0
    83851.72413793103	big_house	1	3520.0	no	2	39	no	441	yes	1	no	2	no	51900.0
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816198052184"'
    
    Updated dataset after performing categorical encoding :
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	sn	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    83851.72413793103	1	0	0	1	9000.0	1	0	1	33	1	0	411	0	1	1	1	0	2	0	1	90000.0
    83851.72413793103	1	0	0	1	2398.0	1	0	0	21	1	0	459	0	1	1	1	0	1	1	0	44555.0
    83851.72413793103	1	0	0	1	6862.0	1	0	2	19	1	0	440	0	1	2	0	1	2	1	0	69000.0
    83851.72413793103	1	0	0	1	3520.0	1	0	0	51	1	0	443	0	1	1	1	0	2	1	0	65000.0
    83851.72413793103	1	0	0	1	3520.0	1	0	2	39	1	0	441	0	1	1	1	0	2	1	0	51900.0
    83851.72413793103	0	0	1	1	2176.0	1	0	0	8	0	1	469	0	1	2	1	0	2	1	0	55000.0
    62906.33597883598	0	0	1	1	3180.0	1	0	0	59	1	0	195	0	1	1	1	0	1	1	0	33000.0
    62906.33597883598	0	0	1	1	5885.0	1	0	1	29	1	0	306	0	1	1	0	1	2	1	0	64000.0
    62906.33597883598	1	0	0	1	4360.0	1	0	0	15	1	0	255	0	1	2	1	0	2	1	0	61000.0
    62906.33597883598	1	0	0	1	5170.0	1	0	0	12	1	0	38	0	1	4	0	1	2	1	0	67000.0
    
    Updated dataset after performing customized anti-selection :
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    62906.33597883598	1	0	0	1	4360.0	1	0	0	15	1	0	0	1	2	1	0	2	1	0	61000.0
    62906.33597883598	0	0	1	1	3185.0	1	0	0	23	1	0	0	1	1	0	1	1	1	0	37900.0
    62906.33597883598	0	0	1	1	9166.0	1	0	2	11	1	0	0	1	1	0	1	2	0	1	68000.0
    62906.33597883598	1	0	0	1	10700.0	1	0	0	16	0	1	0	1	2	1	0	2	0	1	72000.0
    62906.33597883598	0	0	1	1	4080.0	1	0	0	9	1	0	0	1	1	1	0	2	1	0	55000.0
    62906.33597883598	1	0	0	1	1700.0	1	0	0	17	1	0	0	1	2	1	0	1	1	0	27000.0
    83851.72413793103	1	0	0	1	7410.0	1	0	2	40	0	1	0	1	1	0	1	2	0	1	92500.0
    83851.72413793103	1	0	0	1	6825.0	1	0	0	32	0	1	0	1	1	0	1	2	0	1	77500.0
    83851.72413793103	1	0	0	1	9000.0	1	0	1	33	1	0	0	1	1	1	0	2	0	1	90000.0
    83851.72413793103	1	0	0	1	2610.0	1	0	0	13	1	0	0	1	2	1	0	1	0	1	49000.0
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713818406018603"'
    
    Updated dataset after performing Lasso feature selection:
    id	bathrms	fullbase_1	gashw_0	driveway_0	stories	airco_1	gashw_1	bedrooms_0	bedrooms_2	driveway_1	garagepl	recroom_0	fullbase_0	homestyle	airco_0	prefarea	lotsize	price
    31	1	1	1	0	2	1	0	1	0	1	0	1	0	2	0	62906.336	2953.0	60000.0
    9	1	0	1	0	1	0	0	0	1	1	0	1	1	2	1	62906.336	4080.0	55000.0
    17	1	0	1	0	2	0	0	1	0	1	0	1	1	1	1	62906.336	1700.0	27000.0
    25	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	62906.336	3500.0	44500.0
    10	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	62906.336	6000.0	41000.0
    36	1	0	1	1	1	0	0	0	1	0	0	1	1	1	1	62906.336	3970.0	32500.0
    8	1	0	1	0	2	0	0	0	1	1	0	0	1	2	1	83851.7241	2176.0	55000.0
    16	1	1	1	0	2	0	0	1	0	1	0	0	0	2	1	62906.336	10700.0	72000.0
    29	1	0	1	0	1	1	0	0	1	1	1	1	1	2	0	62906.336	5885.0	64000.0
    15	1	0	1	0	2	0	0	1	0	1	0	1	1	2	1	62906.336	4360.0	61000.0
    
    Updated dataset after performing scaling on Lasso selected features :
    driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	bathrms	stories	garagepl	homestyle	prefarea	lotsize
    1	1	60000.0	1	1	1	31	0	0	1	0	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-1.0349666248397658
    1	0	55000.0	0	1	1	9	0	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.503056006140217
    1	0	27000.0	1	1	1	17	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.6263455114684566
    1	0	44500.0	0	1	1	25	0	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.7767988267664266
    1	0	41000.0	0	1	1	10	0	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	0.4031271242086151
    0	0	32500.0	0	1	1	36	1	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.5549727479831188
    1	0	55000.0	0	1	0	8	0	1	0	1	0	1	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	1.8093671611394	-1.4016876104028086
    1	1	72000.0	1	1	0	16	0	0	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	2.6213879120416936
    1	0	64000.0	0	1	1	29	0	1	1	0	0	1	-0.5698449326198071	-0.8999912756370636	0.3943732959927247	0.7474571831400929	-0.552679423766173	0.34885053046376313
    1	0	61000.0	1	1	1	15	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.3709042996310123
    
    Updated dataset after performing RFE feature selection:
    id	bathrms	fullbase_1	gashw_0	driveway_0	stories	airco_1	gashw_1	bedrooms_0	bedrooms_2	driveway_1	garagepl	recroom_0	fullbase_0	homestyle	airco_0	recroom_1	prefarea	lotsize	price
    31	1	1	1	0	2	1	0	1	0	1	0	1	0	2	0	0	62906.336	2953.0	60000.0
    9	1	0	1	0	1	0	0	0	1	1	0	1	1	2	1	0	62906.336	4080.0	55000.0
    17	1	0	1	0	2	0	0	1	0	1	0	1	1	1	1	0	62906.336	1700.0	27000.0
    25	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	0	62906.336	3500.0	44500.0
    10	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	0	62906.336	6000.0	41000.0
    36	1	0	1	1	1	0	0	0	1	0	0	1	1	1	1	0	62906.336	3970.0	32500.0
    8	1	0	1	0	2	0	0	0	1	1	0	0	1	2	1	1	83851.7241	2176.0	55000.0
    16	1	1	1	0	2	0	0	1	0	1	0	0	0	2	1	1	62906.336	10700.0	72000.0
    29	1	0	1	0	1	1	0	0	1	1	1	1	1	2	0	0	62906.336	5885.0	64000.0
    15	1	0	1	0	2	0	0	1	0	1	0	1	1	2	1	0	62906.336	4360.0	61000.0
    
    Updated dataset after performing scaling on RFE selected features :
    r_gashw_0	r_fullbase_1	r_driveway_1	r_recroom_1	r_bedrooms_0	r_bedrooms_2	id	r_recroom_0	r_gashw_1	r_driveway_0	r_airco_0	r_airco_1	r_fullbase_0	price	r_bathrms	r_stories	r_garagepl	r_homestyle	r_prefarea	r_lotsize
    1	1	1	0	1	0	31	1	0	0	0	1	0	60000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-1.0349666248397658
    1	0	1	0	0	1	9	1	0	0	1	0	1	55000.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.503056006140217
    1	0	1	0	1	0	17	1	0	0	1	0	1	27000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.6263455114684566
    1	0	1	0	0	1	25	1	0	0	1	0	1	44500.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.7767988267664266
    1	0	1	0	0	1	10	1	0	0	1	0	1	41000.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	0.4031271242086151
    1	0	0	0	0	1	36	1	0	1	1	0	1	32500.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.5549727479831188
    1	0	1	1	0	1	8	0	0	0	1	0	1	55000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	1.8093671611394	-1.4016876104028086
    1	1	1	1	1	0	16	0	0	0	1	0	0	72000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	2.6213879120416936
    1	0	1	0	0	1	29	1	0	0	0	1	1	64000.0	-0.5698449326198071	-0.8999912756370636	0.3943732959927247	0.7474571831400929	-0.552679423766173	0.34885053046376313
    1	0	1	0	1	0	15	1	0	0	1	0	1	61000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.3709042996310123
    
    Updated dataset after performing scaling for PCA feature selection :
    bedrooms_1	driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	recroom_1	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	prefarea	bathrms	lotsize	garagepl	stories	homestyle
    0	1	1	60000.0	1	1	1	31	0	0	0	1	0	0	0	-0.5526794213794932	-0.5698449326198075	-1.0349666248397658	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	0	55000.0	0	1	1	9	0	0	1	0	1	0	1	-0.5526794213794932	-0.5698449326198075	-0.503056006140217	-0.7757094582336221	-0.8999912756370632	0.7474571831400939
    0	1	0	27000.0	1	1	1	17	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-1.6263455114684566	-0.7757094582336221	0.2539261099118857	-0.7516329215933905
    0	1	0	44500.0	0	1	1	25	0	0	1	0	1	0	1	-0.5526794213794932	-0.5698449326198075	-0.7767988267664266	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	1	0	41000.0	0	1	1	10	0	0	1	0	1	0	1	-0.5526794213794932	-0.5698449326198075	0.4031271242086151	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	0	0	32500.0	0	1	1	36	1	0	1	0	1	0	1	-0.5526794213794932	-0.5698449326198075	-0.5549727479831188	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	1	0	55000.0	0	1	0	8	0	1	1	0	1	0	1	1.809367156861831	-0.5698449326198075	-1.4016876104028086	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	1	72000.0	1	1	0	16	0	1	0	0	1	0	0	-0.5526794213794932	-0.5698449326198075	2.6213879120416936	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	0	64000.0	0	1	1	29	0	0	1	1	0	0	1	-0.5526794213794932	-0.5698449326198075	0.34885053046376313	0.39437329599272386	-0.8999912756370632	0.7474571831400939
    0	1	0	61000.0	1	1	1	15	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-0.3709042996310123	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	col_6	col_7	col_8	col_9	price
    0	15	-1.003669	0.239619	-0.544437	-0.446579	-0.873745	0.132942	-0.304909	-0.283428	-0.156519	-0.393635	61000.0
    1	29	-0.570314	-0.909223	0.891253	-0.739743	-0.298341	0.315286	-0.009998	1.337645	0.357075	0.066034	64000.0
    2	31	-0.938230	0.181104	-0.994173	-0.401154	-0.410828	-0.391002	1.254502	0.614167	-0.346502	-0.214179	60000.0
    3	16	0.737217	-1.057533	-0.249024	-0.476448	-1.107181	2.240093	1.110594	-1.325931	0.298640	0.170796	72000.0
    4	9	-1.786104	-0.439574	0.030216	-0.344894	-0.025365	0.425588	-0.550912	0.337141	0.623026	-0.124383	55000.0
    5	17	-1.414625	1.385440	-0.124862	0.557650	-0.185774	-0.603600	-0.094402	-0.227540	-0.150107	-0.692651	27000.0
    6	25	-1.701373	0.405676	0.680010	0.610730	0.412048	0.370640	-0.388688	0.263257	0.585903	-0.277915	44500.0
    7	8	-0.880471	-0.664464	-1.511189	1.245302	-0.187289	-0.943057	-0.515748	0.269059	1.517742	-0.028053	55000.0
    8	10	-1.105595	0.044414	0.956715	0.552309	0.110896	1.189863	-0.446720	0.107281	0.533577	-0.103056	41000.0
    9	36	-1.772019	0.459011	0.740512	0.646621	0.516132	0.508674	-0.404914	0.170345	0.001238	0.920050	32500.0
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : XGBOOST_2 
    Feature Selection Method : pca
    
     Prediction : 
       id    Prediction  Confidence_Lower  Confidence_upper    price
    0  31  62660.364670      34660.085646      90660.643694  60000.0
    1   9  55986.209180      30785.420530      81186.997830  55000.0
    2  17  44426.444076      24229.803083      64623.085068  27000.0
    3  25  38398.516388      21188.376911      55608.655865  44500.0
    4  10  45683.429188      24617.332667      66749.525709  41000.0
    5  36  40066.436509      21095.208542      59037.664475  32500.0
    6   8  56863.357233      32185.946248      81540.768217  55000.0
    7  16  67979.252919      37332.178823      98626.327014  72000.0
    8  29  58572.951390      33177.335198      83968.567583  64000.0
    9  15  59827.259623      33990.028359      85664.490888  61000.0
    
     Performance Metrics : 
               MAE           MSE      MSLE       MAPE    MPE          RMSE     RMSLE           ME        R2        EV          MPD       MGD
    0  8002.145716  1.167270e+08  0.038814  15.298458 -5.345  10804.026128  0.197013  34490.55843  0.637479  0.637927  1995.655639  0.036785
    >>> prediction.head()
    id	Prediction	Confidence_Lower	Confidence_upper	price
    10	45683.42918800001	24617.332667	66749.52570900002	41000.0
    12	78811.21095774998	41047.641704317764	116574.7802111822	67000.0
    13	57454.380859375015	30640.814398146318	84267.94732060371	49000.0
    14	52876.934924625	28793.137666832026	76960.73218241797	48500.0
    16	67979.2529185	37332.178823254886	98626.32701374512	72000.0
    17	44426.44407562501	24229.80308276226	64623.085068487766	27000.0
    15	59827.259623375016	33990.02835867039	85664.49088807964	61000.0
    11	71539.44990912499	38962.82825323011	104116.07156501987	68000.0
    9	55986.209179749996	30785.420529747764	81186.99782975223	55000.0
    8	56863.35723262501	32185.946247916516	81540.7682173335	55000.0
  10. Generate prediction on test dataset using second best performing model.
    >>> prediction = aml.predict(housing_test, rank=2)
    Data Transformation started ...
    Performing transformation carried out in feature engineering phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815691874863"'
    
    Updated dataset after performing customized variable width bin-code transformation :
    bathrms	lotsize	airco	gashw	garagepl	id	recroom	sn	driveway	stories	prefarea	fullbase	homestyle	price	bedrooms
    1	3750.0	no	no	0	43	no	140	yes	2	no	no	Classic	43000.0	big_house
    1	9000.0	no	no	1	33	no	411	yes	1	yes	yes	Eclectic	90000.0	big_house
    1	10700.0	no	no	0	16	yes	364	yes	2	no	yes	Eclectic	72000.0	big_house
    1	4080.0	no	no	0	9	no	301	yes	1	no	no	Eclectic	55000.0	small_house
    1	3520.0	no	no	0	51	no	443	yes	1	yes	no	Eclectic	65000.0	big_house
    1	3180.0	no	no	0	59	no	195	yes	1	no	no	Classic	33000.0	small_house
    1	4960.0	no	no	0	48	no	25	yes	1	no	no	Classic	42000.0	small_house
    1	3500.0	no	no	0	25	no	249	yes	1	no	no	Classic	44500.0	small_house
    1	2650.0	no	no	1	41	no	142	yes	2	no	yes	Classic	40000.0	big_house
    1	3162.0	yes	no	1	49	no	161	yes	2	no	no	Eclectic	63900.0	big_house
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815686869295"'
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713816248740571"'
    
    Updated dataset after performing customized categorical encoding :
    prefarea	bedrooms	bathrms	lotsize	gashw	garagepl	id	recroom	sn	driveway	stories	airco	homestyle	fullbase	price
    62906.33597883598	big_house	1	5400.0	no	0	28	no	177	yes	2	no	2	no	70000.0
    62906.33597883598	small_house	1	6000.0	no	0	10	no	260	yes	1	no	1	no	41000.0
    62906.33597883598	big_house	1	1700.0	no	0	17	no	13	yes	2	no	1	no	27000.0
    62906.33597883598	big_house	1	2650.0	no	1	41	no	142	yes	2	no	1	yes	40000.0
    62906.33597883598	big_house	1	5000.0	no	0	67	no	317	yes	4	no	2	no	80000.0
    62906.33597883598	small_house	1	3000.0	no	2	44	no	239	yes	1	no	1	yes	26000.0
    83851.72413793103	small_house	1	2176.0	no	0	8	yes	469	yes	2	no	2	no	55000.0
    83851.72413793103	big_house	1	3520.0	no	2	39	no	441	yes	1	no	2	no	51900.0
    83851.72413793103	big_house	1	2610.0	no	0	13	no	463	yes	2	no	1	yes	49000.0
    83851.72413793103	big_house	1	6420.0	no	0	22	no	408	yes	3	no	2	yes	87500.0
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713815690312918"'
    
    Updated dataset after performing categorical encoding :
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	sn	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    62906.33597883598	1	0	0	1	1700.0	1	0	0	17	1	0	13	0	1	2	1	0	1	1	0	27000.0
    62906.33597883598	1	0	0	1	5000.0	1	0	0	67	1	0	317	0	1	4	1	0	2	1	0	80000.0
    62906.33597883598	0	0	1	1	3000.0	1	0	2	44	1	0	239	0	1	1	1	0	1	0	1	26000.0
    62906.33597883598	1	0	0	1	5076.0	1	0	0	52	1	0	111	1	0	1	1	0	1	1	0	43000.0
    62906.33597883598	1	0	0	1	3900.0	1	0	0	45	1	0	340	0	1	2	1	0	2	1	0	62500.0
    62906.33597883598	1	0	0	1	3630.0	1	0	3	53	1	0	237	0	1	2	1	0	1	1	0	43000.0
    83851.72413793103	1	0	0	1	2610.0	1	0	0	13	1	0	463	0	1	2	1	0	1	0	1	49000.0
    83851.72413793103	1	0	0	1	2398.0	1	0	0	21	1	0	459	0	1	1	1	0	1	1	0	44555.0
    83851.72413793103	1	0	0	1	9000.0	1	0	1	33	1	0	411	0	1	1	1	0	2	0	1	90000.0
    83851.72413793103	1	0	0	1	2787.0	1	0	0	27	1	0	472	0	1	1	1	0	2	0	1	60500.0
    
    Updated dataset after performing customized anti-selection :
    prefarea	bedrooms_0	bedrooms_1	bedrooms_2	bathrms	lotsize	gashw_0	gashw_1	garagepl	id	recroom_0	recroom_1	driveway_0	driveway_1	stories	airco_0	airco_1	homestyle	fullbase_0	fullbase_1	price
    62906.33597883598	1	0	0	1	1700.0	1	0	0	17	1	0	0	1	2	1	0	1	1	0	27000.0
    62906.33597883598	1	0	0	1	5000.0	1	0	0	67	1	0	0	1	4	1	0	2	1	0	80000.0
    62906.33597883598	0	0	1	1	3000.0	1	0	2	44	1	0	0	1	1	1	0	1	0	1	26000.0
    62906.33597883598	1	0	0	1	5076.0	1	0	0	52	1	0	1	0	1	1	0	1	1	0	43000.0
    62906.33597883598	1	0	0	1	3900.0	1	0	0	45	1	0	0	1	2	1	0	2	1	0	62500.0
    62906.33597883598	1	0	0	1	3630.0	1	0	3	53	1	0	0	1	2	1	0	1	1	0	43000.0
    83851.72413793103	1	0	0	1	2610.0	1	0	0	13	1	0	0	1	2	1	0	1	0	1	49000.0
    83851.72413793103	1	0	0	1	2398.0	1	0	0	21	1	0	0	1	1	1	0	1	1	0	44555.0
    83851.72413793103	1	0	0	1	9000.0	1	0	1	33	1	0	0	1	1	1	0	2	0	1	90000.0
    83851.72413793103	1	0	0	1	2787.0	1	0	0	27	1	0	0	1	1	1	0	2	0	1	60500.0
    Performing transformation carried out in data preparation phase ...
    result data stored in table '"AUTOML_USER"."ml__td_sqlmr_persist_out__1713823026627552"'
    
    Updated dataset after performing Lasso feature selection:
    id	bathrms	fullbase_1	gashw_0	driveway_0	stories	airco_1	gashw_1	bedrooms_0	bedrooms_2	driveway_1	garagepl	recroom_0	fullbase_0	homestyle	airco_0	prefarea	lotsize	price
    47	1	0	1	0	2	0	0	1	0	1	0	1	1	1	1	62906.336	3850.0	44500.0
    45	1	0	1	0	2	0	0	1	0	1	0	1	1	2	1	62906.336	3900.0	62500.0
    53	1	0	1	0	2	0	0	1	0	1	3	1	1	1	1	62906.336	3630.0	43000.0
    15	1	0	1	0	2	0	0	1	0	1	0	1	1	2	1	62906.336	4360.0	61000.0
    75	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	62906.336	4040.0	47000.0
    11	1	1	1	0	1	1	0	0	1	1	2	1	0	2	0	62906.336	9166.0	68000.0
    32	1	1	1	0	1	1	0	1	0	1	0	0	0	2	0	83851.7241	6825.0	77500.0
    52	1	0	1	1	1	0	0	1	0	0	0	1	1	1	1	62906.336	5076.0	43000.0
    10	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	62906.336	6000.0	41000.0
    17	1	0	1	0	2	0	0	1	0	1	0	1	1	1	1	62906.336	1700.0	27000.0
    
    Updated dataset after performing scaling on Lasso selected features :
    driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	bathrms	stories	garagepl	homestyle	prefarea	lotsize
    1	0	44500.0	1	1	1	47	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.6116091936299208
    1	0	62500.0	1	1	1	45	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.58801067461042
    1	0	43000.0	1	1	1	53	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	2.734538804445422	-0.7516329215933895	-0.552679423766173	-0.7154426773157244
    1	0	61000.0	1	1	1	15	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.3709042996310123
    1	0	47000.0	0	1	1	75	0	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.5219348213558176
    1	1	68000.0	0	1	1	11	0	0	1	0	0	1	-0.5698449326198071	-0.8999912756370636	1.5644560502190732	0.7474571831400929	-0.552679423766173	1.8973853485234078
    1	1	77500.0	1	1	0	32	0	0	1	0	0	0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	0.7474571831400929	1.8093671611394	0.7925026880303788
    0	0	43000.0	1	1	1	52	1	1	0	1	0	0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.03297350727176035
    1	0	41000.0	0	1	1	10	0	1	0	1	0	1	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	0.4031271242086151
    1	0	27000.0	1	1	1	17	0	1	0	1	0	0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.6263455114684566
    
    Updated dataset after performing RFE feature selection:
    id	bathrms	fullbase_1	gashw_0	driveway_0	stories	airco_1	gashw_1	bedrooms_0	bedrooms_2	driveway_1	garagepl	recroom_0	fullbase_0	homestyle	airco_0	recroom_1	prefarea	lotsize	price
    47	1	0	1	0	2	0	0	1	0	1	0	1	1	1	1	0	62906.336	3850.0	44500.0
    45	1	0	1	0	2	0	0	1	0	1	0	1	1	2	1	0	62906.336	3900.0	62500.0
    53	1	0	1	0	2	0	0	1	0	1	3	1	1	1	1	0	62906.336	3630.0	43000.0
    15	1	0	1	0	2	0	0	1	0	1	0	1	1	2	1	0	62906.336	4360.0	61000.0
    75	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	0	62906.336	4040.0	47000.0
    11	1	1	1	0	1	1	0	0	1	1	2	1	0	2	0	0	62906.336	9166.0	68000.0
    32	1	1	1	0	1	1	0	1	0	1	0	0	0	2	0	1	83851.7241	6825.0	77500.0
    52	1	0	1	1	1	0	0	1	0	0	0	1	1	1	1	0	62906.336	5076.0	43000.0
    10	1	0	1	0	1	0	0	0	1	1	0	1	1	1	1	0	62906.336	6000.0	41000.0
    17	1	0	1	0	2	0	0	1	0	1	0	1	1	1	1	0	62906.336	1700.0	27000.0
    
    Updated dataset after performing scaling on RFE selected features :
    r_gashw_0	r_fullbase_1	r_driveway_1	r_recroom_1	r_bedrooms_0	r_bedrooms_2	id	r_recroom_0	r_gashw_1	r_driveway_0	r_airco_0	r_airco_1	r_fullbase_0	price	r_bathrms	r_stories	r_garagepl	r_homestyle	r_prefarea	r_lotsize
    1	0	1	0	1	0	47	1	0	0	1	0	1	44500.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.6116091936299208
    1	0	1	0	1	0	45	1	0	0	1	0	1	62500.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.58801067461042
    1	0	1	0	1	0	53	1	0	0	1	0	1	43000.0	-0.5698449326198071	0.2539261099118858	2.734538804445422	-0.7516329215933895	-0.552679423766173	-0.7154426773157244
    1	0	1	0	1	0	15	1	0	0	1	0	1	61000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	0.7474571831400929	-0.552679423766173	-0.3709042996310123
    1	0	1	0	0	1	75	1	0	0	1	0	1	47000.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.5219348213558176
    1	1	1	0	0	1	11	1	0	0	0	1	0	68000.0	-0.5698449326198071	-0.8999912756370636	1.5644560502190732	0.7474571831400929	-0.552679423766173	1.8973853485234078
    1	1	1	1	1	0	32	0	0	0	0	1	0	77500.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	0.7474571831400929	1.8093671611394	0.7925026880303788
    1	0	0	0	1	0	52	1	0	1	1	0	1	43000.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-0.03297350727176035
    1	0	1	0	0	1	10	1	0	0	1	0	1	41000.0	-0.5698449326198071	-0.8999912756370636	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	0.4031271242086151
    1	0	1	0	1	0	17	1	0	0	1	0	1	27000.0	-0.5698449326198071	0.2539261099118858	-0.7757094582336237	-0.7516329215933895	-0.552679423766173	-1.6263455114684566
    
    Updated dataset after performing scaling for PCA feature selection :
    bedrooms_1	driveway_1	fullbase_1	price	bedrooms_0	gashw_0	recroom_0	id	driveway_0	recroom_1	fullbase_0	airco_1	airco_0	gashw_1	bedrooms_2	prefarea	bathrms	lotsize	garagepl	stories	homestyle
    0	1	0	44500.0	1	1	1	47	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-0.6116091936299208	-0.7757094582336221	0.2539261099118857	-0.7516329215933905
    0	1	0	62500.0	1	1	1	45	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-0.58801067461042	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	0	43000.0	1	1	1	53	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-0.7154426773157244	2.734538804445416	0.2539261099118857	-0.7516329215933905
    0	1	0	61000.0	1	1	1	15	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-0.3709042996310123	-0.7757094582336221	0.2539261099118857	0.7474571831400939
    0	1	0	47000.0	0	1	1	75	0	0	1	0	1	0	1	-0.5526794213794932	-0.5698449326198075	-0.5219348213558176	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	1	1	68000.0	0	1	1	11	0	0	0	1	0	0	1	-0.5526794213794932	-0.5698449326198075	1.8973853485234078	1.5644560502190699	-0.8999912756370632	0.7474571831400939
    0	1	1	77500.0	1	1	0	32	0	1	0	1	0	0	0	1.809367156861831	-0.5698449326198075	0.7925026880303788	-0.7757094582336221	-0.8999912756370632	0.7474571831400939
    0	0	0	43000.0	1	1	1	52	1	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-0.03297350727176035	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	1	0	41000.0	0	1	1	10	0	0	1	0	1	0	1	-0.5526794213794932	-0.5698449326198075	0.4031271242086151	-0.7757094582336221	-0.8999912756370632	-0.7516329215933905
    0	1	0	27000.0	1	1	1	17	0	0	1	0	1	0	0	-0.5526794213794932	-0.5698449326198075	-1.6263455114684566	-0.7757094582336221	0.2539261099118857	-0.7516329215933905
    
    Updated dataset after performing PCA feature selection :
    id	col_0	col_1	col_2	col_3	col_4	col_5	col_6	col_7	col_8	col_9	price
    0	17	-1.414625	1.385440	-0.124862	0.557650	-0.185774	-0.603600	-0.094402	-0.227540	-0.150107	-0.692651	27000.0
    1	10	-1.105595	0.044414	0.956715	0.552309	0.110896	1.189863	-0.446720	0.107281	0.533577	-0.103056	41000.0
    2	47	-0.902256	1.074754	0.113105	0.507409	-0.444764	0.100932	-0.144309	-0.361679	-0.195108	-0.542272	44500.0
    3	52	-1.229283	0.416949	0.607011	0.564425	0.187417	0.726843	-0.220520	-0.356044	-1.097318	0.421967	43000.0
    4	45	-1.113292	0.306091	-0.595351	-0.435829	-0.818333	-0.017795	-0.294231	-0.254729	-0.146891	-0.425810	62500.0
    5	53	0.449859	0.338874	2.106509	-0.463153	-0.233032	-2.098387	-0.168963	-0.619831	-0.075069	-0.317378	43000.0
    6	15	-1.003669	0.239619	-0.544437	-0.446579	-0.873745	0.132942	-0.304909	-0.283428	-0.156519	-0.393635	61000.0
    7	32	0.470801	-2.082235	-1.196821	1.131651	-0.030202	0.613375	1.070719	0.477148	-0.176126	-0.287062	77500.0
    8	75	-1.572685	0.327644	0.739779	0.598111	0.346999	0.547592	-0.401223	0.229566	0.574601	-0.240146	47000.0
    9	11	0.763052	-1.948140	1.644308	-1.101667	-0.221112	0.611181	0.837938	0.648384	0.374097	0.669405	68000.0
    
    Data Transformation completed.
    Following model is being used for generating prediction :
    Model ID : DECISIONFOREST_1 
    Feature Selection Method : rfe
    
     Prediction : 
       id    prediction  confidence_lower  confidence_upper    price
    0  47  44000.694444      43999.333333      44002.055556  44500.0
    1  45  58427.906977      55346.604651      61509.209302  62500.0
    2  53  41336.956522      36117.391304      46556.521739  43000.0
    3  15  58427.906977      55346.604651      61509.209302  61000.0
    4  75  44000.694444      43999.333333      44002.055556  47000.0
    5  11  80174.038462      51703.153846     108644.923077  68000.0
    6  32  87175.757576      72428.242424     101923.272727  77500.0
    7  52  38000.694444      26239.333333      49762.055556  43000.0
    8  10  69350.694444      19666.055556     119035.333333  41000.0
    9  17  41000.000000      35120.000000      46880.000000  27000.0
    
     Performance Metrics : 
               MAE           MSE      MSLE       MAPE       MPE          RMSE     RMSLE            ME        R2        EV          MPD       MGD
    0  7402.911675  1.081247e+08  0.032827  13.675299 -2.398854  10398.301013  0.181182  31113.636364  0.664195  0.664562  1791.747597  0.032045
    
    >>> prediction.head()
    id	prediction	confidence_lower	confidence_upper	price
    10	69350.69444444444	19666.055555555547	119035.33333333333	41000.0
    12	71718.75	48750.0	94687.5	67000.0
    13	41000.0	35120.0	46880.0	49000.0
    14	41336.956521739135	36117.39130434795	46556.52173913032	48500.0
    16	80174.03846153847	51703.1538461539	108644.92307692303	72000.0
    17	41000.0	35120.0	46880.0	27000.0
    15	58427.90697674418	55346.60465116245	61509.209302325915	61000.0
    11	80174.03846153847	51703.1538461539	108644.92307692303	68000.0
    9	56427.90697674418	55589.20930232445	57266.604651163914	55000.0
    8	56427.90697674418	55589.20930232445	57266.604651163914	55000.0
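The two Performance Metrics tables above can be compared side by side. The sketch below is illustrative only: it copies the reported MSE, RMSE, and R2 values for both models into a plain dictionary (the dictionary layout and variable names are assumptions, not AutoML output), confirms that each RMSE is the square root of the corresponding MSE, and picks the model with the higher R2 on the test data.

```python
import math

# Test-set metrics copied from the two Performance Metrics tables above.
reported = {
    "XGBOOST_2": {"MSE": 1.167270e+08, "RMSE": 10804.026128, "R2": 0.637479},
    "DECISIONFOREST_1": {"MSE": 1.081247e+08, "RMSE": 10398.301013, "R2": 0.664195},
}

# RMSE is defined as sqrt(MSE); the printed MSE is rounded to seven
# significant digits, so allow a small tolerance when comparing.
rmse_gap = {
    model: abs(math.sqrt(m["MSE"]) - m["RMSE"]) for model, m in reported.items()
}

# Model with the highest R2 on the test dataset.
best_on_test = max(reported, key=lambda model: reported[model]["R2"])

for model, m in reported.items():
    print(f"{model}: R2={m['R2']:.6f}, RMSE={m['RMSE']:.2f}, "
          f"|sqrt(MSE) - RMSE|={rmse_gap[model]:.4f}")
print(f"Higher R2 on test data: {best_on_test}")
```

In this run the model requested with the second rank scores slightly higher on the test data than the model used in the previous step. The ranking is presumably decided on data held out during training, so a small reordering on a separate test set is not unusual.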