This example predicts the price of a house based on different factors.
Run AutoML to get the best performing model with the following specifications:
- Set early stopping criteria: a time limit of 100 seconds and a threshold of 0.7 for the R2 performance metric.
- Exclude the 'knn', 'glm', and 'svm' models from the default model training list.
- Set the verbose level to 2 to get detailed logging.
- Load the example dataset.
>>> load_example_data("decisionforestpredict", ["housing_train", "housing_test"])
>>> housing_train = DataFrame.from_table("housing_train")
>>> housing_test = DataFrame.from_table("housing_test")
- Create an AutoML instance.
>>> aml = AutoML(task_type="Regression", exclude=['knn', 'glm', 'svm'], verbose=2, max_runtime_secs=100, stopping_metric='R2', stopping_tolerance=0.7, seed=42)
- Fit the data.
>>> aml.fit(housing_train, housing_train.price)
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:32:53,728 | INFO | Feature Exploration started 2025-11-04 01:32:53,729 | INFO | Data Overview: 2025-11-04 01:32:53,750 | INFO | Total Rows in the data: 492 2025-11-04 01:32:53,772 | INFO | Total Columns in the data: 14 2025-11-04 01:32:54,383 | INFO | Column Summary: ColumnName Datatype NonNullCount NullCount BlankCount ZeroCount PositiveCount NegativeCount NullPercentage NonNullPercentage 0 prefarea VARCHAR(10) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 1 lotsize FLOAT 492 0 NaN 0.0 492.0 0.0 0.0 100.0 2 bedrooms INTEGER 492 0 NaN 0.0 492.0 0.0 0.0 100.0 3 stories INTEGER 492 0 NaN 0.0 492.0 0.0 0.0 100.0 4 sn INTEGER 492 0 NaN 0.0 492.0 0.0 0.0 100.0 5 driveway VARCHAR(10) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 6 gashw VARCHAR(10) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 7 airco VARCHAR(10) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 8 garagepl INTEGER 492 0 NaN 270.0 222.0 0.0 0.0 100.0 9 homestyle VARCHAR(20) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 10 fullbase VARCHAR(10) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 11 price FLOAT 492 0 NaN 0.0 492.0 0.0 0.0 100.0 12 recroom VARCHAR(10) CHARACTER SET LATIN 492 0 0.0 NaN NaN NaN 0.0 100.0 13 bathrms INTEGER 492 0 NaN 0.0 492.0 0.0 0.0 100.0 2025-11-04 01:32:55,156 | INFO | Statistics of Data: ATTRIBUTE StatName StatValue 0 bathrms MAXIMUM 4.0 1 bedrooms MINIMUM 1.0 2 bedrooms MAXIMUM 6.0 3 sn COUNT 492.0 4 sn MAXIMUM 546.0 5 garagepl COUNT 492.0 6 garagepl MINIMUM 0.0 7 garagepl MAXIMUM 3.0 8 sn MINIMUM 1.0 9 bedrooms COUNT 492.0 2025-11-04 01:32:55,306 | INFO | Categorical Columns with their Distinct values: ColumnName DistinctValueCount driveway 2 recroom 2 fullbase 2 gashw 2 airco 2 prefarea 2 homestyle 3 2025-11-04 01:32:58,031 | INFO | No Futile columns found. 
2025-11-04 01:33:00,943 | INFO | Columns with outlier percentage :- ColumnName OutlierPercentage 0 bathrms 0.203252 1 stories 7.113821 2 garagepl 2.235772 3 bedrooms 2.235772 4 lotsize 2.235772 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:33:01,171 | INFO | Feature Engineering started ... 2025-11-04 01:33:01,171 | INFO | Handling duplicate records present in dataset ... 2025-11-04 01:33:01,350 | INFO | Analysis completed. No action taken. 2025-11-04 01:33:01,350 | INFO | Total time to handle duplicate records: 0.18 sec 2025-11-04 01:33:01,351 | INFO | Handling less significant features from data ... 2025-11-04 01:33:04,939 | INFO | Analysis indicates all categorical columns are significant. No action Needed. 2025-11-04 01:33:04,940 | INFO | Total time to handle less significant features: 3.59 sec 2025-11-04 01:33:04,940 | INFO | Handling Date Features ... 2025-11-04 01:33:04,940 | INFO | Analysis Completed. Dataset does not contain any feature related to dates. No action needed. 2025-11-04 01:33:04,940 | INFO | Total time to handle date features: 0.00 sec 2025-11-04 01:33:04,940 | INFO | Checking Missing values in dataset ... 2025-11-04 01:33:06,795 | INFO | Analysis Completed. No Missing Values Detected. 2025-11-04 01:33:06,796 | INFO | Total time to find missing values in data: 1.86 sec 2025-11-04 01:33:06,796 | INFO | Imputing Missing Values ... 2025-11-04 01:33:06,796 | INFO | Analysis completed. No imputation required. 2025-11-04 01:33:06,796 | INFO | Time taken to perform imputation: 0.00 sec 2025-11-04 01:33:06,797 | INFO | Performing encoding for categorical columns ... 
2025-11-04 01:33:10,662 | INFO | ONE HOT Encoding these Columns: ['driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'prefarea', 'homestyle'] 2025-11-04 01:33:10,663 | INFO | Sample of dataset after performing one hot encoding: price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 automl_id sn 488 44100.0 8100.0 2 1 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 1 0 14 345 88000.0 4500.0 3 1 4 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 22 406 86000.0 6900.0 3 2 1 0 1 0 1 0 1 1 0 1 0 0 0 1 0 0 1 26 528 106000.0 6325.0 3 1 4 0 1 1 0 1 0 1 0 0 1 1 1 0 1 0 0 30 446 104900.0 11440.0 4 1 2 0 1 1 0 0 1 1 0 1 0 1 0 1 1 0 0 38 343 80000.0 10500.0 2 1 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 42 120 116000.0 6840.0 5 1 2 0 1 0 1 0 1 1 0 0 1 1 1 0 1 0 0 34 80 63900.0 6360.0 2 1 1 0 1 1 0 0 1 1 0 0 1 1 1 0 0 0 1 18 223 70100.0 4200.0 3 1 2 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 10 40 54500.0 3150.0 2 2 1 1 0 1 0 0 1 1 0 1 0 0 1 0 0 0 1 6 492 rows X 23 columns 2025-11-04 01:33:10,755 | INFO | Time taken to encode the columns: 3.96 sec 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:33:10,756 | INFO | Data preparation started ... 2025-11-04 01:33:10,756 | INFO | Outlier preprocessing ... 
2025-11-04 01:33:14,094 | INFO | Columns with outlier percentage :- ColumnName OutlierPercentage 0 bedrooms 2.235772 1 garagepl 2.235772 2 bathrms 0.203252 3 lotsize 2.235772 4 stories 7.113821 2025-11-04 01:33:14,628 | INFO | Deleting rows of these columns: ['lotsize', 'stories', 'bedrooms', 'bathrms', 'garagepl'] 2025-11-04 01:33:16,825 | INFO | Sample of dataset after removing outlier rows: price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 automl_id sn 19 45000.0 3450.0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 19 59 35500.0 4400.0 3 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 27 324 98000.0 6000.0 3 1 1 0 1 1 0 1 0 1 0 0 1 1 1 0 0 0 1 31 385 78000.0 6600.0 4 2 2 0 1 0 1 0 1 1 0 1 0 0 0 1 0 0 1 35 99 35000.0 3500.0 2 1 1 0 1 0 1 1 0 1 0 1 0 0 1 0 0 1 0 43 160 63000.0 3968.0 3 1 2 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 1 47 303 58500.0 4040.0 2 1 2 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 39 263 48500.0 3640.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 23 448 120000.0 5500.0 4 2 2 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 0 15 122 80000.0 10500.0 4 2 2 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 7 428 rows X 23 columns 2025-11-04 01:33:16,984 | INFO | Time Taken by Outlier processing: 6.23 sec 2025-11-04 01:33:18,064 | INFO | Feature selection using rfe ... 2025-11-04 01:34:01,319 | INFO | feature selected by RFE: ['homestyle_0', 'stories', 'bathrms', 'homestyle_2', 'prefarea_0', 'bedrooms', 'homestyle_1', 'airco_0', 'garagepl', 'fullbase_0', 'sn', 'lotsize'] 2025-11-04 01:34:01,321 | INFO | Total time taken by feature selection: 43.26 sec 2025-11-04 01:34:01,634 | INFO | Scaling Features of rfe data ... 
2025-11-04 01:34:03,368 | INFO | columns that will be scaled: ['r_stories', 'r_bathrms', 'r_bedrooms', 'r_garagepl', 'r_sn', 'r_lotsize'] 2025-11-04 01:34:05,459 | INFO | Dataset sample after scaling: r_homestyle_2 r_homestyle_0 automl_id price r_homestyle_1 r_fullbase_0 r_airco_0 r_prefarea_0 r_stories r_bathrms r_bedrooms r_garagepl r_sn r_lotsize 0 1 0 6 54500.0 0 0 1 1 -1.000531 1.626355 -1.327527 -0.769077 -1.415712 -0.930075 1 1 0 8 99000.0 0 0 0 1 0.579644 1.626355 0.187624 0.516724 0.441857 2.233189 2 0 0 9 27000.0 1 1 1 1 -1.000531 -0.522040 -1.327527 -0.769077 -0.090733 -0.654601 3 1 0 10 70100.0 0 1 1 1 0.579644 -0.522040 0.187624 0.516724 -0.227128 -0.350420 4 1 0 13 60000.0 0 1 1 1 -1.000531 -0.522040 0.187624 1.802525 0.305462 0.532865 5 0 0 14 44100.0 1 1 1 1 -1.000531 -0.522040 -1.327527 0.516724 1.494047 1.802588 6 1 0 12 58000.0 0 1 1 1 -1.000531 -0.522040 0.187624 -0.769077 -0.486928 -0.273132 7 1 0 7 80000.0 0 1 1 1 0.579644 1.626355 1.702775 0.516724 -0.883122 3.127515 8 0 0 5 50000.0 1 1 1 1 -1.000531 -0.522040 -1.327527 0.516724 0.045662 -0.659569 9 0 0 4 48000.0 1 1 1 1 0.579644 -0.522040 -1.327527 -0.769077 -1.279317 -0.394584 428 rows X 14 columns 2025-11-04 01:34:06,093 | INFO | Total time taken by feature scaling: 4.46 sec 2025-11-04 01:34:06,093 | INFO | Scaling Features of pca data ... 
2025-11-04 01:34:08,162 | INFO | columns that will be scaled: ['sn', 'lotsize', 'bedrooms', 'bathrms', 'stories', 'garagepl'] 2025-11-04 01:34:10,316 | INFO | Dataset sample after scaling: homestyle_0 homestyle_1 fullbase_1 driveway_0 airco_0 recroom_1 airco_1 gashw_0 homestyle_2 price automl_id prefarea_0 prefarea_1 gashw_1 fullbase_0 driveway_1 recroom_0 sn lotsize bedrooms bathrms stories garagepl 0 0 1 0 0 1 0 0 1 0 44100.0 14 1 0 0 1 1 1 1.494047 1.802588 -1.327527 -0.522040 -1.000531 0.516724 1 1 0 1 0 0 0 1 1 0 120000.0 15 0 1 0 0 1 1 1.234247 0.367250 1.702775 1.626355 0.579644 0.516724 2 0 1 0 0 1 0 0 1 0 45000.0 19 1 0 0 1 1 1 -1.552107 -0.764460 -2.842678 -0.522040 -1.000531 -0.769077 3 0 1 0 0 1 0 0 1 0 50000.0 5 1 0 0 1 1 1 0.045662 -0.659569 -1.327527 -0.522040 -1.000531 0.516724 4 0 0 0 0 1 0 0 0 1 60000.0 13 1 0 1 1 1 1 0.305462 0.532865 0.187624 -0.522040 -1.000531 1.802525 5 0 1 0 0 1 0 0 1 0 48000.0 4 1 0 0 1 1 1 -1.279317 -0.394584 -1.327527 -0.522040 0.579644 -0.769077 6 0 0 1 0 0 0 1 1 1 99000.0 8 1 0 0 0 1 1 0.441857 2.233189 0.187624 1.626355 0.579644 0.516724 7 0 0 0 0 1 0 0 1 1 58000.0 12 1 0 0 1 1 1 -0.486928 -0.273132 0.187624 -0.522040 -1.000531 -0.769077 8 0 1 0 0 1 0 0 1 0 27000.0 9 1 0 0 1 1 1 -0.090733 -0.654601 -1.327527 -0.522040 -1.000531 -0.769077 9 0 0 0 0 1 0 0 1 1 80000.0 7 1 0 0 1 1 1 -0.883122 3.127515 1.702775 1.626355 0.579644 0.516724 428 rows X 23 columns 2025-11-04 01:34:11,006 | INFO | Total time taken by feature scaling: 4.91 sec 2025-11-04 01:34:11,006 | INFO | Dimension Reduction using pca ... 2025-11-04 01:34:11,661 | INFO | PCA columns: ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7', 'col_8', 'col_9', 'col_10'] 2025-11-04 01:34:11,662 | INFO | Total time taken by PCA: 0.66 sec 1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation 2025-11-04 01:34:12,067 | INFO | Model Training started ... 
2025-11-04 01:34:12,110 | INFO | Hyperparameters used for model training:
2025-11-04 01:34:12,110 | INFO | Model: decision_forest
2025-11-04 01:34:12,110 | INFO | Hyperparameters: {'response_column': 'price', 'name': 'decision_forest', 'tree_type': 'Regression', 'min_impurity': (0.0, 0.1, 0.2, 0.3), 'max_depth': (5, 3, 4, 7, 8), 'min_node_size': (1, 2, 3, 4), 'num_trees': (-1,), 'seed': 42}
2025-11-04 01:34:12,110 | INFO | Total number of models for decision_forest: 80
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 01:34:12,110 | INFO | Model: xgboost
2025-11-04 01:34:12,111 | INFO | Hyperparameters: {'response_column': 'price', 'name': 'xgboost', 'model_type': 'Regression', 'column_sampling': (1, 0.6), 'min_impurity': (0.0, 0.1, 0.2, 0.3), 'lambda1': (1.0, 1.0, 10.0, 100.0), 'shrinkage_factor': (0.5, 0.01, 0.05, 0.1), 'max_depth': (5, 3, 4, 7, 8), 'min_node_size': (1, 2, 3, 4), 'iter_num': (10, 20, 30, 40), 'num_boosted_trees': (-1, 20, 50, 100), 'seed': 42}
2025-11-04 01:34:12,124 | INFO | Total number of models for xgboost: 40960
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025-11-04 01:34:12,124 | INFO | Performing hyperparameter tuning ...
2025-11-04 01:34:13,215 | INFO | Model training for decision_forest
2025-11-04 01:34:38,983 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:34:38,983 | INFO | Model training for xgboost
2025-11-04 01:35:04,641 | INFO | ----------------------------------------------------------------------------------------------------
2025-11-04 01:35:04,644 | INFO | Leaderboard
    RANK          MODEL_ID FEATURE_SELECTION           MAE           MSE      MSLE  ...             ME        R2        EV          MPD       MGD  ADJUSTED_R2
0      1  DECISIONFOREST_0               rfe   9531.052379  1.521203e+08  0.034230  ...   37142.567568  0.683602  0.684739  2201.511486  0.034267     0.674453
1      2         XGBOOST_0               rfe   7822.466077  1.521448e+08  0.024162  ...   74337.127715  0.683551  0.688615  1818.809847  0.025643     0.674400
2      3  DECISIONFOREST_2               rfe   9593.067883  1.557999e+08  0.034362  ...   37142.567568  0.675949  0.676903  2224.474645  0.034410     0.666578
3      4  DECISIONFOREST_4               rfe   9593.067883  1.557999e+08  0.034362  ...   37142.567568  0.675949  0.676903  2224.474645  0.034410     0.666578
4      5  DECISIONFOREST_5               pca  10263.676061  1.800835e+08  0.038719  ...   40566.666667  0.625441  0.630034  2554.009956  0.039661     0.615536
5      6         XGBOOST_2               rfe   8126.079052  1.901138e+08  0.025395  ...   95704.674365  0.604578  0.625691  2127.215276  0.027426     0.593145
6      7  DECISIONFOREST_3               pca  10722.659249  2.004692e+08  0.041859  ...   40566.666667  0.583040  0.584328  2786.784228  0.042189     0.572014
7      8  DECISIONFOREST_1               pca  10669.171897  2.102944e+08  0.043649  ...   51700.000000  0.562604  0.565019  2918.761297  0.044867     0.551039
8      9         XGBOOST_1               pca   9304.694891  2.102998e+08  0.033842  ...   88307.046227  0.562593  0.562641  2485.090162  0.033954     0.551027
9     10         XGBOOST_3               pca   9578.433215  2.361295e+08  0.036369  ...   99170.952412  0.508869  0.515650  2800.835696  0.037728     0.495883
10    11         XGBOOST_5               pca   9461.591069  2.401957e+08  0.037042  ...  106111.874202  0.500412  0.515578  2894.032774  0.039889     0.487202
11    12         XGBOOST_7               pca  10067.123344  2.698733e+08  0.042398  ...  111924.901906  0.438685  0.468630  3326.063729  0.046489     0.423843
12    13         XGBOOST_6               rfe  12950.097039  3.982805e+08  0.065653  ...  115851.893132  0.171608  0.333218  5232.883191  0.074837     0.147655
13    14         XGBOOST_4               rfe  15210.126908  4.912907e+08  0.088076  ...  128690.810653 -0.021846  0.147882  6790.938290  0.101301    -0.051393

[14 rows x 16 columns]
14 rows X 16 columns
1. Feature Exploration -> 2. Feature Engineering -> 3. Data Preparation -> 4. Model Training & Evaluation
>>> Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 13/13
- Display model leaderboard.
>>> aml.leaderboard()
    RANK          MODEL_ID FEATURE_SELECTION           MAE           MSE      MSLE  ...             ME        R2        EV          MPD       MGD  ADJUSTED_R2
0      1  DECISIONFOREST_0               rfe   9531.052379  1.521203e+08  0.034230  ...   37142.567568  0.683602  0.684739  2201.511486  0.034267     0.674453
1      2         XGBOOST_0               rfe   7822.466077  1.521448e+08  0.024162  ...   74337.127715  0.683551  0.688615  1818.809847  0.025643     0.674400
2      3  DECISIONFOREST_2               rfe   9593.067883  1.557999e+08  0.034362  ...   37142.567568  0.675949  0.676903  2224.474645  0.034410     0.666578
3      4  DECISIONFOREST_4               rfe   9593.067883  1.557999e+08  0.034362  ...   37142.567568  0.675949  0.676903  2224.474645  0.034410     0.666578
4      5  DECISIONFOREST_5               pca  10263.676061  1.800835e+08  0.038719  ...   40566.666667  0.625441  0.630034  2554.009956  0.039661     0.615536
5      6         XGBOOST_2               rfe   8126.079052  1.901138e+08  0.025395  ...   95704.674365  0.604578  0.625691  2127.215276  0.027426     0.593145
6      7  DECISIONFOREST_3               pca  10722.659249  2.004692e+08  0.041859  ...   40566.666667  0.583040  0.584328  2786.784228  0.042189     0.572014
7      8  DECISIONFOREST_1               pca  10669.171897  2.102944e+08  0.043649  ...   51700.000000  0.562604  0.565019  2918.761297  0.044867     0.551039
8      9         XGBOOST_1               pca   9304.694891  2.102998e+08  0.033842  ...   88307.046227  0.562593  0.562641  2485.090162  0.033954     0.551027
9     10         XGBOOST_3               pca   9578.433215  2.361295e+08  0.036369  ...   99170.952412  0.508869  0.515650  2800.835696  0.037728     0.495883
10    11         XGBOOST_5               pca   9461.591069  2.401957e+08  0.037042  ...  106111.874202  0.500412  0.515578  2894.032774  0.039889     0.487202
11    12         XGBOOST_7               pca  10067.123344  2.698733e+08  0.042398  ...  111924.901906  0.438685  0.468630  3326.063729  0.046489     0.423843
12    13         XGBOOST_6               rfe  12950.097039  3.982805e+08  0.065653  ...  115851.893132  0.171608  0.333218  5232.883191  0.074837     0.147655
13    14         XGBOOST_4               rfe  15210.126908  4.912907e+08  0.088076  ...  128690.810653 -0.021846  0.147882  6790.938290  0.101301    -0.051393

[14 rows x 16 columns]
- Display the best performing model.
>>> aml.leader()
   RANK          MODEL_ID FEATURE_SELECTION          MAE           MSE     MSLE  ...            ME        R2        EV          MPD       MGD  ADJUSTED_R2
0     1  DECISIONFOREST_0               rfe  9531.052379  1.521203e+08  0.03423  ...  37142.567568  0.683602  0.684739  2201.511486  0.034267     0.674453

[1 rows x 16 columns]
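The ADJUSTED_R2 column can be related to R2 with the standard adjusted R-squared formula. The sketch below is a check against the printed values, not a description of the library's internals: it assumes n = 428 rows (the row count the log reports after outlier removal) and p = 12 features for the rfe models or p = 11 for the pca models (the counts shown in the RFE and PCA log lines). The function name adjusted_r2 is illustrative.

```python
def adjusted_r2(r2, n_rows, n_features):
    """Standard adjusted R-squared: penalizes R2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_features - 1)

# Assumed inputs, read off the log and leaderboard above.
rfe_adj = adjusted_r2(0.683602, n_rows=428, n_features=12)  # DECISIONFOREST_0 (rfe)
pca_adj = adjusted_r2(0.625441, n_rows=428, n_features=11)  # DECISIONFOREST_5 (pca)

print(round(rfe_adj, 6), round(pca_adj, 6))
```

Under these assumptions the results agree with the leaderboard values 0.674453 and 0.615536 to about six decimal places.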
- Display hyperparameters for trained model.
- Display model hyperparameters for rank 2.
>>> aml.model_hyperparameters(rank=2)
{'response_column': 'price', 'name': 'xgboost', 'model_type': 'Regression', 'column_sampling': 1, 'min_impurity': 0.0, 'lambda1': 1.0, 'shrinkage_factor': 0.5, 'max_depth': 5, 'min_node_size': 1, 'iter_num': 10, 'num_boosted_trees': -1, 'seed': 42, 'persist': False}
- Display model hyperparameters for rank 6.
>>> aml.model_hyperparameters(rank=6)
{'response_column': 'price', 'name': 'xgboost', 'model_type': 'Regression', 'column_sampling': 1, 'min_impurity': 0.0, 'lambda1': 1.0, 'shrinkage_factor': 0.5, 'max_depth': 5, 'min_node_size': 1, 'iter_num': 10, 'num_boosted_trees': 20, 'seed': 42, 'persist': False}
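Each dictionary returned by model_hyperparameters() is a single point in the search grid printed in the training log. The "Total number of models" figures in that log (80 for decision_forest, 40960 for xgboost) are consistent with multiplying the number of candidate values per tuple-valued hyperparameter. The following sketch reproduces that arithmetic in plain Python; the helper grid_size is hypothetical, and the counting rule is inferred from the printed numbers rather than taken from the library.

```python
from math import prod

# Tuple-valued entries copied from the training log; fixed entries
# (response_column, name, seed, ...) are omitted because they do not vary.
decision_forest_grid = {
    "min_impurity": (0.0, 0.1, 0.2, 0.3),
    "max_depth": (5, 3, 4, 7, 8),
    "min_node_size": (1, 2, 3, 4),
    "num_trees": (-1,),
}
xgboost_grid = {
    "column_sampling": (1, 0.6),
    "min_impurity": (0.0, 0.1, 0.2, 0.3),
    "lambda1": (1.0, 1.0, 10.0, 100.0),
    "shrinkage_factor": (0.5, 0.01, 0.05, 0.1),
    "max_depth": (5, 3, 4, 7, 8),
    "min_node_size": (1, 2, 3, 4),
    "iter_num": (10, 20, 30, 40),
    "num_boosted_trees": (-1, 20, 50, 100),
}

def grid_size(grid):
    """Number of combinations in a full Cartesian grid."""
    return prod(len(values) for values in grid.values())

print(grid_size(decision_forest_grid))  # 80
print(grid_size(xgboost_grid))          # 40960
```

The leaderboard lists 14 models, so only a small part of this grid was evaluated in this run.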
- Generate prediction on test dataset using best performing model.
>>> prediction = aml.predict(housing_test)
2025-11-04 01:38:20,081 | INFO | Data Transformation started ... 2025-11-04 01:38:20,081 | INFO | Performing transformation carried out in feature engineering phase ... 2025-11-04 01:38:23,789 | INFO | Updated dataset after performing categorical encoding : price lotsize bedrooms bathrms stories driveway_0 driveway_1 recroom_0 recroom_1 fullbase_0 fullbase_1 gashw_0 gashw_1 airco_0 airco_1 garagepl prefarea_0 prefarea_1 homestyle_0 homestyle_1 homestyle_2 automl_id sn 459 44555.0 2398.0 3 1 1 0 1 1 0 1 0 1 0 1 0 0 0 1 0 1 0 13 38 67000.0 5170.0 3 1 4 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 8 364 72000.0 10700.0 3 1 2 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 12 177 70000.0 5400.0 4 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 7 440 69000.0 6862.0 3 1 2 0 1 1 0 1 0 1 0 0 1 2 0 1 0 0 1 15 463 49000.0 2610.0 3 1 2 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 6 255 61000.0 4360.0 4 1 2 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 10 260 41000.0 6000.0 2 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 14 53 68000.0 9166.0 2 1 1 0 1 1 0 0 1 1 0 0 1 2 1 0 0 0 1 11 469 55000.0 2176.0 2 1 2 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 4 46 rows X 23 columns 2025-11-04 01:38:24,308 | INFO | Performing transformation carried out in data preparation phase ... 
2025-11-04 01:38:25,724 | INFO | Updated dataset after performing RFE feature selection: automl_id stories bathrms homestyle_2 prefarea_0 bedrooms homestyle_1 airco_0 garagepl fullbase_0 sn lotsize price homestyle_0 0 25 1 1 1 0 3 0 1 1 0 411 9000.0 90000.0 0 37 1 1 0 1 2 1 0 0 1 16 3185.0 37900.0 0 41 2 2 1 1 3 0 1 2 1 176 3630.0 57500.0 0 45 1 1 1 0 3 0 1 2 1 441 3520.0 51900.0 0 53 3 1 1 0 3 0 1 0 0 408 6420.0 87500.0 0 7 2 1 1 1 4 0 1 0 1 177 5400.0 70000.0 0 49 1 1 1 1 3 0 1 2 1 353 7980.0 78500.0 0 29 2 1 0 1 3 1 1 1 0 142 2650.0 40000.0 0 21 1 1 0 1 2 1 1 0 1 249 3500.0 44500.0 0 13 1 1 0 0 3 1 1 0 1 459 2398.0 44555.0 46 rows X 14 columns 2025-11-04 01:38:26,901 | INFO | Updated dataset after performing scaling on RFE selected features : r_homestyle_2 r_homestyle_0 automl_id price r_homestyle_1 r_fullbase_0 r_airco_0 r_prefarea_0 r_stories r_bathrms r_bedrooms r_garagepl r_sn r_lotsize 0 1 0 25 90000.0 0 0 1 0 -1.000531 -0.522040 0.187624 0.516724 0.993932 2.299436 1 0 0 37 37900.0 1 1 0 1 -1.000531 -0.522040 -1.327527 -0.769077 -1.571592 -0.910754 2 1 0 41 57500.0 0 1 1 1 0.579644 1.626355 0.187624 1.802525 -0.532393 -0.665090 3 1 0 45 51900.0 0 1 1 0 -1.000531 -0.522040 0.187624 1.802525 1.188782 -0.725816 4 1 0 53 87500.0 0 0 1 0 2.159819 -0.522040 0.187624 -0.769077 0.974447 0.875138 5 1 0 7 70000.0 0 1 1 1 0.579644 -0.522040 1.702775 -0.769077 -0.525898 0.312044 6 1 0 49 78500.0 0 1 1 1 -1.000531 -0.522040 0.187624 1.802525 0.617222 1.736341 7 0 0 29 40000.0 1 0 1 1 0.579644 -0.522040 0.187624 0.516724 -0.753222 -1.206102 8 0 0 21 44500.0 1 1 1 1 -1.000531 -0.522040 -1.327527 -0.769077 -0.058258 -0.736857 9 0 0 13 44555.0 1 1 1 0 -1.000531 -0.522040 0.187624 -0.769077 1.305692 -1.345219 46 rows X 14 columns 2025-11-04 01:38:28,789 | INFO | Updated dataset after performing scaling for PCA feature selection : homestyle_0 homestyle_1 fullbase_1 driveway_0 airco_0 recroom_1 airco_1 gashw_0 homestyle_2 price automl_id prefarea_0 prefarea_1 gashw_1 
fullbase_0 driveway_1 recroom_0 sn lotsize bedrooms bathrms stories garagepl 0 0 0 1 0 1 0 0 1 1 90000.0 25 0 1 0 0 1 1 0.993932 2.299436 0.187624 -0.522040 -1.000531 0.516724 1 0 1 0 0 0 0 1 1 0 37900.0 37 1 0 0 1 1 1 -1.571592 -0.910754 -1.327527 -0.522040 -1.000531 -0.769077 2 0 0 0 0 1 0 0 0 1 57500.0 41 1 0 1 1 1 1 -0.532393 -0.665090 0.187624 1.626355 0.579644 1.802525 3 0 0 0 0 1 0 0 1 1 51900.0 45 0 1 0 1 1 1 1.188782 -0.725816 0.187624 -0.522040 -1.000531 1.802525 4 0 0 1 0 1 0 0 1 1 87500.0 53 0 1 0 0 1 1 0.974447 0.875138 0.187624 -0.522040 2.159819 -0.769077 5 0 0 0 0 1 0 0 1 1 70000.0 7 1 0 0 1 1 1 -0.525898 0.312044 1.702775 -0.522040 0.579644 -0.769077 6 0 0 0 0 1 0 0 1 1 78500.0 49 1 0 0 1 1 1 0.617222 1.736341 0.187624 -0.522040 -1.000531 1.802525 7 0 1 1 0 1 0 0 1 0 40000.0 29 1 0 0 0 1 1 -0.753222 -1.206102 0.187624 -0.522040 0.579644 0.516724 8 0 1 0 0 1 0 0 1 0 44500.0 21 1 0 0 1 1 1 -0.058258 -0.736857 -1.327527 -0.522040 -1.000531 -0.769077 9 0 1 0 0 1 0 0 1 0 44555.0 13 0 1 0 1 1 1 1.305692 -1.345219 0.187624 -0.522040 -1.000531 -0.769077 46 rows X 23 columns 2025-11-04 01:38:29,784 | INFO | Updated dataset after performing PCA feature selection : automl_id col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 col_10 price 0 13 -0.791863 -0.311118 1.398681 -0.572869 -1.260209 -0.080711 1.511712 -0.431964 -0.520080 0.552996 -0.024807 44555.0 1 21 -2.002198 -0.719918 0.326292 -0.403918 -0.307726 0.614137 0.245059 -0.045012 -0.229618 0.459610 -0.263007 44500.0 2 25 1.549625 -2.223951 0.393579 -0.221676 0.776631 -0.670459 0.503664 0.628208 -0.004174 -0.014646 1.054372 90000.0 3 29 -0.863769 0.917218 -0.489754 0.732473 -0.710596 -0.758062 -0.093896 0.439251 -0.717270 0.589935 0.259816 40000.0 4 37 -2.472151 -0.173876 -0.599225 -0.153213 0.440801 0.320591 -0.467513 -1.236014 -0.304159 0.693641 -0.041634 37900.0 5 41 0.863854 0.783032 -1.724096 0.706254 -1.317798 0.922764 -0.239666 0.427848 0.740829 0.006107 -0.146211 57500.0 6 45 0.582348 
-1.337774 0.082277 0.981298 -1.841966 -0.377592 1.015298 -0.239363 0.387912 0.011629 0.147673 51900.0
7 49 1.116200 -2.040679 -0.596862 1.227361 0.435827 0.177842 0.797351 0.451317 0.676395 -0.122858 -0.105491 78500.0
8 53 1.323323 0.587434 2.050033 0.108656 0.266096 -0.296743 -0.960196 0.961279 -0.394448 -0.133120 1.123464 87500.0
9 7 0.252334 1.255451 0.516357 0.487479 1.169412 -0.411108 0.761633 0.489646 0.825355 0.265495 -0.022434 70000.0
10 rows X 13 columns
2025-11-04 01:38:30,161 | INFO | Data Transformation completed.⫿⫿⫿⫿⫿⫿⫿| 100% - 9/9
2025-11-04 01:38:31,071 | INFO | Following model is being picked for evaluation:
2025-11-04 01:38:31,071 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 01:38:31,071 | INFO | Feature Selection Method : rfe
2025-11-04 01:38:32,630 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 01:38:35,656 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 01:38:35,766 | INFO | Prediction :
   automl_id    prediction  confidence_lower  confidence_upper    price
0         25  83147.500000      83147.500000      83147.500000  90000.0
1         37  46733.333333      46733.333333      46733.333333  37900.0
2         41  59857.432432      59857.432432      59857.432432  57500.0
3         45  59857.432432      59857.432432      59857.432432  51900.0
4         53  83147.500000      83147.500000      83147.500000  87500.0
5          7  61368.750000      61368.750000      61368.750000  70000.0
6         49  83147.500000      83147.500000      83147.500000  78500.0
7         29  33735.294118      33735.294118      33735.294118  40000.0
8         21  40858.333333      40858.333333      40858.333333  44500.0
9         13  47300.000000      47300.000000      47300.000000  44555.0
- Generate evaluation metrics on test dataset using best performing model.
>>> performance_metrics = aml.evaluate(housing_test)
2025-11-04 01:39:06,226 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 01:39:06,794 | INFO | Following model is being picked for evaluation:
2025-11-04 01:39:06,795 | INFO | Model ID : DECISIONFOREST_0
2025-11-04 01:39:06,795 | INFO | Feature Selection Method : rfe
2025-11-04 01:39:10,042 | INFO | Performance Metrics :
           MAE           MSE      MSLE       MAPE       MPE         RMSE    RMSLE       ME        R2        EV         MPD      MGD
0  5911.315908  5.049682e+07  0.016968  11.020456 -1.777284  7106.111196  0.13026  19400.0  0.843171  0.843329  865.647374  0.01655
>>> performance_metrics
           MAE           MSE      MSLE       MAPE       MPE         RMSE    RMSLE       ME        R2        EV         MPD      MGD
0  5911.315908  5.049682e+07  0.016968  11.020456 -1.777284  7106.111196  0.13026  19400.0  0.843171  0.843329  865.647374  0.01655
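Several columns in this metrics table are related by definition: RMSE is the square root of MSE, and RMSLE is the square root of MSLE. As a quick sanity check, the sketch below recomputes both from the values printed above. The dictionary name reported is illustrative, and small differences are expected because the printed MSE is rounded to seven significant digits.

```python
import math

# Values copied from the evaluation output for DECISIONFOREST_0.
reported = {
    "MSE": 5.049682e+07,
    "RMSE": 7106.111196,
    "MSLE": 0.016968,
    "RMSLE": 0.13026,
}

rmse_from_mse = math.sqrt(reported["MSE"])
rmsle_from_msle = math.sqrt(reported["MSLE"])

print(f"RMSE  reported={reported['RMSE']:.3f}  recomputed={rmse_from_mse:.3f}")
print(f"RMSLE reported={reported['RMSLE']:.5f}  recomputed={rmsle_from_msle:.5f}")
```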
- Generate prediction on test dataset using second best performing model.
>>> prediction = aml.predict(housing_test,2)
2025-11-04 01:40:54,586 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 01:40:55,133 | INFO | Following model is being picked for evaluation:
2025-11-04 01:40:55,133 | INFO | Model ID : XGBOOST_0
2025-11-04 01:40:55,133 | INFO | Feature Selection Method : rfe
2025-11-04 01:40:55,904 | INFO | Applying SHAP for Model Interpretation...
2025-11-04 01:40:58,006 | INFO | SHAP Analysis Completed. Feature Importance Available.
/root/automl_testing/pyTeradata/teradataml/automl/model_evaluation.py:380: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
2025-11-04 01:40:58,099 | INFO | Prediction :
   automl_id    Prediction  Confidence_Lower  Confidence_upper    price
0         25  79763.286423      79763.286423      79763.286423  90000.0
1         37  45663.242276      45663.242276      45663.242276  37900.0
2         41  59266.237561      59266.237561      59266.237561  57500.0
3         45  61778.865480      61778.865480      61778.865480  51900.0
4         53  76168.669602      76168.669602      76168.669602  87500.0
5          7  70216.282691      70216.282691      70216.282691  70000.0
6         49  70612.043027      70612.043027      70612.043027  78500.0
7         29  37348.314071      37348.314071      37348.314071  40000.0
8         21  37390.393707      37390.393707      37390.393707  44500.0
9         13  49388.806579      49388.806579      49388.806579  44555.0
>>> prediction.head()
   automl_id    Prediction  Confidence_Lower  Confidence_upper    price
0         25  79763.286423      79763.286423      79763.286423  90000.0
1         37  45663.242276      45663.242276      45663.242276  37900.0
2         41  59266.237561      59266.237561      59266.237561  57500.0
3         45  61778.865480      61778.865480      61778.865480  51900.0
4         53  76168.669602      76168.669602      76168.669602  87500.0
5          7  70216.282691      70216.282691      70216.282691  70000.0
6         49  70612.043027      70612.043027      70612.043027  78500.0
7         29  37348.314071      37348.314071      37348.314071  40000.0
8         21  37390.393707      37390.393707      37390.393707  44500.0
9         13  49388.806579      49388.806579      49388.806579  44555.0
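To get a feel for the prediction output, the ten rows shown by prediction.head() can be compared against the actual price column by hand. The sketch below hard-codes those rows in plain Python (no database connection is assumed) and computes the absolute error per row and the mean absolute error over this sample. Because it covers only 10 of the 46 test rows, the result is not expected to match the MAE that evaluate() reports for the full test set.

```python
# (automl_id, predicted price, actual price) copied from prediction.head() above.
rows = [
    (25, 79763.286423, 90000.0),
    (37, 45663.242276, 37900.0),
    (41, 59266.237561, 57500.0),
    (45, 61778.865480, 51900.0),
    (53, 76168.669602, 87500.0),
    (7, 70216.282691, 70000.0),
    (49, 70612.043027, 78500.0),
    (29, 37348.314071, 40000.0),
    (21, 37390.393707, 44500.0),
    (13, 49388.806579, 44555.0),
]

abs_errors = {row_id: abs(pred - actual) for row_id, pred, actual in rows}
sample_mae = sum(abs_errors.values()) / len(abs_errors)
worst_id = max(abs_errors, key=abs_errors.get)

print(f"MAE over the 10 sample rows: {sample_mae:.2f}")
print(f"Largest absolute error: automl_id={worst_id}, error={abs_errors[worst_id]:.2f}")
```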
- Generate evaluation metrics on test dataset using second best performing model.
>>> performance_metrics = aml.evaluate(housing_test, 2)
2025-11-04 01:42:34,306 | INFO | Skipping data transformation as data is already transformed.
2025-11-04 01:42:34,874 | INFO | Following model is being picked for evaluation:
2025-11-04 01:42:34,875 | INFO | Model ID : XGBOOST_0
2025-11-04 01:42:34,875 | INFO | Feature Selection Method : rfe
2025-11-04 01:42:36,338 | INFO | Performance Metrics :
           MAE           MSE      MSLE      MAPE      MPE         RMSE    RMSLE            ME        R2        EV         MPD       MGD
0  5540.218852  5.679481e+07  0.015997  9.576885  0.12904  7536.233439  0.12648  23471.971864  0.823611  0.829319  912.853432  0.016114
>>> performance_metrics
           MAE           MSE      MSLE      MAPE      MPE         RMSE    RMSLE            ME        R2        EV         MPD       MGD
0  5540.218852  5.679481e+07  0.015997  9.576885  0.12904  7536.233439  0.12648  23471.971864  0.823611  0.829319  912.853432  0.016114
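The two evaluate() results can be compared side by side. The following sketch copies a few test-set metrics for both models into plain dictionaries and reports which model does better on each one, treating R2 as higher-is-better and the error metrics as lower-is-better. The variable names are illustrative, and the choice of metrics is arbitrary.

```python
# Test-set metrics copied from the two evaluate() outputs above.
decision_forest = {"MAE": 5911.315908, "MAPE": 11.020456, "RMSE": 7106.111196, "R2": 0.843171}
xgboost = {"MAE": 5540.218852, "MAPE": 9.576885, "RMSE": 7536.233439, "R2": 0.823611}

higher_is_better = {"R2"}  # every other metric here is an error measure

winners = {}
for metric, df_value in decision_forest.items():
    xgb_value = xgboost[metric]
    if metric in higher_is_better:
        df_wins = df_value > xgb_value
    else:
        df_wins = df_value < xgb_value
    winners[metric] = "DECISIONFOREST_0" if df_wins else "XGBOOST_0"

for metric, model in winners.items():
    print(f"{metric}: {model}")
```

On this test set, DECISIONFOREST_0 has the higher R2 and lower RMSE, while XGBOOST_0 has the lower MAE and MAPE, so the ranking depends on which metric matters for the use case.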