This example creates an optimized SVM regression model using the GridSearch optimization algorithm. The best model found by GridSearch is used to predict house values in California.
In this example, the teradataml example California housing data is used to build the SVM regression model.
- Example setup.
- Load example data from "cal_housing_ex_raw" that contains California housing data.
>>> load_example_data("teradataml", ["cal_housing_ex_raw"])
- Create teradataml DataFrame objects.
>>> data_input = DataFrame.from_table("cal_housing_ex_raw")
- Scale "target_columns" with respect to 'STD' value of the column.
>>> fit_obj = ScaleFit(data=data_input, target_columns=['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'], scale_method="STD")
- Transform the data.
>>> transform_obj = ScaleTransform(data=data_input, object=fit_obj.output, accumulate=["id", "MedHouseVal"])
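The fit-then-transform pattern used for scaling can be illustrated in plain Python. This is a minimal sketch, not the teradataml implementation: it assumes that 'STD' scaling subtracts the column mean and divides by the population standard deviation, and the function names and sample values are hypothetical. Consult the ScaleFit documentation for the exact definition of each scale method.

```python
from statistics import mean, pstdev


def fit_std_scale(values):
    """Return (location, scale) for one column.

    Assumption: location is the mean and scale is the population
    standard deviation of the column.
    """
    return mean(values), pstdev(values)


def transform_std_scale(values, location, scale):
    """Apply (x - location) / scale to every value in the column."""
    return [(x - location) / scale for x in values]


# Hypothetical column values, not taken from cal_housing_ex_raw.
house_age = [10.0, 20.0, 30.0, 40.0, 50.0]

location, scale = fit_std_scale(house_age)
scaled = transform_std_scale(house_age, location, scale)
print(scaled)
```

Keeping the fit step separate from the transform step means the same location and scale values can be reused on data that was not part of the fit, which mirrors the split between ScaleFit and ScaleTransform.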
- Sample train and validation DataFrames, where 80% of the data is used for model training and 20% is used for model validation.
>>> train_val_sample = transform_obj.result.sample(frac=[0.8, 0.2])
>>> train_df = train_val_sample[train_val_sample.sampleid == 1].drop("sampleid", axis=1)
>>> val_df = train_val_sample[train_val_sample.sampleid == 2].drop("sampleid", axis=1)
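The idea behind the fraction-based split, where each row receives a sample number that is then used for filtering, can be sketched in plain Python. This is an illustration only: the helper `split_by_fraction`, the fixed seed, and the row ids are hypothetical, and teradataml performs its sampling in the database rather than this way.

```python
import random


def split_by_fraction(ids, fractions, seed=0):
    """Assign each id a 1-based sample number according to the fractions."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the sketch repeatable
    assignment = {}
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        end = start + round(frac * len(shuffled))
        for row_id in shuffled[start:end]:
            assignment[row_id] = sample_id
        start = end
    return assignment


# Hypothetical row ids.
assignment = split_by_fraction(range(100), [0.8, 0.2])

# Filter on the sample number, as the example does with the "sampleid" column.
train_ids = [i for i, s in assignment.items() if s == 1]
val_ids = [i for i, s in assignment.items() if s == 2]
print(len(train_ids), len(val_ids))
```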
- Create two training data samples for model optimization.
>>> train_df1 = train_df.iloc[:30]
>>> train_df2 = train_df.iloc[30:]
- Define a parameter space and use GridSearch for hyperparameter tuning.
- Define parameter space for model training.
>>> params = {"input_columns":['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'], "response_column":"MedHouseVal", "model_type":"regression", "batch_size":(11, 50, 75), "iter_max":(100, 301), "lambda1":0.1, "alpha":0.5, "iter_num_no_change":60, "tolerance":0.01, "intercept":False, "learning_rate":"INVTIME", "initial_data":0.5, "decay_rate":0.5, "momentum":0.6, "nesterov_optimization":True, "local_sgd_iterations":1}
- Define the required arguments for model prediction and evaluation.
>>> eval_params = {"id_column": "id", "accumulate": "MedHouseVal"}
- Import trainer function and optimizer.
>>> from teradataml import SVM, GridSearch
- Initialize the GridSearch optimizer with model trainer function and parameter space required for model training.
>>> gs_obj = GridSearch(func=SVM, params=params)
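A grid search trains one model per combination of the values being searched. The sketch below shows that expansion in plain Python, assuming tuple-valued entries are the hyperparameters to search and all other entries stay fixed. This is consistent with the 6 models reported for three "batch_size" values and two "iter_max" values, but the helper `expand_grid` is hypothetical and is not how teradataml is implemented.

```python
from itertools import product


def expand_grid(params):
    """Expand tuple-valued entries into every combination; keep the rest fixed."""
    tuned = {k: v for k, v in params.items() if isinstance(v, tuple)}
    fixed = {k: v for k, v in params.items() if not isinstance(v, tuple)}
    names = list(tuned)
    combos = []
    for values in product(*(tuned[name] for name in names)):
        combo = dict(fixed)
        combo.update(zip(names, values))
        combos.append(combo)
    return combos


# A reduced version of the parameter space from the example.
params = {
    "model_type": "regression",
    "batch_size": (11, 50, 75),
    "iter_max": (100, 301),
    "lambda1": 0.1,
}

grid = expand_grid(params)
print(len(grid))
for combo in grid:
    print(combo["batch_size"], combo["iter_max"])
```

Adding a third tuple-valued entry multiplies the number of models again, so the grid size grows quickly with each searched hyperparameter.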
Model optimization is initiated using the fit method.
- Perform model optimization for the SVM function. Pass a single DataFrame to the model trainer function; hyperparameter tuning progress is shown with a progress bar. Evaluation and prediction arguments are passed along with the training data.
>>> gs_obj.fit(data=train_df, verbose=1, **eval_params)
Completed: |████████████████████████████████████████████████████████████| 100% - 6/6
All model training has passed. In case of failure, use the get_error_log method to retrieve the corresponding error logs.
- View trained model metadata from hyperparameter tuning using the models property. Retrieve the model metadata of the "gs_obj" instance.
>>> gs_obj.models
  MODEL_ID DATA_ID                                      PARAMETERS STATUS       MAE
0    SVM_2    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR   PASS  2.229113
1    SVM_3    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR   PASS  2.229113
2    SVM_0    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR   PASS  2.254733
3    SVM_1    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR   PASS  2.254733
4    SVM_4    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR   PASS  2.229113
5    SVM_5    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR   PASS  2.229113
- View the best model identified by GridSearch. Retrieve the best model id identified by "gs_obj" instance.
>>> gs_obj.best_model_id
'SVM_0'
The identified best model is stored as the default model for future prediction and evaluation operations.
- Perform prediction on validation data using the identified best model.
>>> gs_obj.predict(newdata=val_df, **eval_params)
############ result Output ############
      id  prediction  MedHouseVal
0  11246    0.350175        2.028
1   5328    0.760091        2.775
2  16736    1.236138        3.152
3   3687    1.026577        1.741
4   8783    0.031345        2.500
5  17768    1.022263        1.601
6   7114    0.224646        2.187
7  18164    0.383037        3.674
8  10966    0.253726        1.896
9  16199    0.468865        0.590
- Perform evaluation of validation data using the best model.
>>> gs_obj.evaluate(newdata=val_df, **eval_params)
############ result Output ############
        MAE      MSE      MSLE        MAPE         MPE      RMSE     RMSLE        ME        R2        EV  MPD  MGD
0  2.011638  4.95115  0.097629  117.909264  116.463401  2.225118  0.312457  3.208421 -1.312606  0.436544  NaN  NaN
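Several of the evaluation columns can be reproduced by hand, which helps when reading the report. The sketch below uses the standard textbook definitions of MAE, MSE, RMSE, and R2 on a small hypothetical data set. It is not taken from teradataml, whose evaluation may differ in detail, and the function name `regression_metrics` is an assumption.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R2 from paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": 1 - ss_res / ss_tot}


# Hypothetical values, chosen so the arithmetic is easy to follow.
actual = [3.0, 1.0, 2.0]
predicted = [2.0, 1.0, 4.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Under these definitions R2 becomes negative whenever the residual sum of squares exceeds the total sum of squares, meaning the predictions are worse than always predicting the mean of the actual values. The negative R2 in the evaluation output can be read the same way.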
- View all trained model stats report. Retrieve the model stats of "gs_obj" instance.
>>> gs_obj.model_stats
  MODEL_ID       MAE       MSE      MSLE        MAPE         MPE      RMSE     RMSLE        ME        R2        EV  MPD  MGD
0    SVM_0  2.292239  6.188213  0.286058  110.899315  110.899315  2.487612  0.534843  3.325346 -1.788798  0.579146  NaN  NaN
1    SVM_1  2.018557  6.799822  0.000000   92.159608   92.159608  2.607647  0.000000  5.467911 -2.512063 -0.407573  NaN  NaN
2    SVM_2  2.018557  6.799822  0.000000   92.159608   92.159608  2.607647  0.000000  5.467911 -2.512063 -0.407573  NaN  NaN
3    SVM_3  2.292239  6.188213  0.286058  110.899315  110.899315  2.487612  0.534843  3.325346 -1.788798  0.579146  NaN  NaN
4    SVM_7  2.156224  5.709455  0.127423  101.757168  101.757168  2.389447  0.356964  3.440837 -1.573039  0.522228  NaN  NaN
5    SVM_4  1.960213  6.039734  0.000000   91.243341   91.243341  2.457587  0.000000  4.982941 -2.119482 -0.134890  NaN  NaN
- Update default model with other trained model and perform predictions.
- Find the best model, which is used as the default model.
>>> gs_obj.best_model_id
'SVM_0'
- Update the default trained model. The default model of the GridSearch instance is updated using the set_model method.
>>> gs_obj.set_model(model_id="SVM_1")
Although the default model is updated, the known best model information remains unchanged. The best model and its corresponding information can be retrieved using the GridSearch properties that start with "best_".
- Perform prediction using the "SVM_1" model.
>>> gs_obj.predict(newdata=val_df.iloc[:5], **eval_params)
     id  prediction  MedHouseVal
0   686    0.202843        1.578
1  2018    0.149868        0.578
2  1754    0.211870        1.651
3   670    0.192414        1.922
4   244    0.247545        1.117
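The distinction between the best model and the default model can be sketched with a small stand-in class. Everything here is hypothetical: the class `TunerSketch`, the `default_model_id` property, the model ids, and the rule that the lowest MAE wins are assumptions made for illustration, not the GridSearch implementation.

```python
class TunerSketch:
    """Hypothetical stand-in that separates the best model from the default model."""

    def __init__(self, mae_by_model):
        self._mae_by_model = dict(mae_by_model)
        # Assumption: the best model is the one with the lowest MAE.
        self._best_model_id = min(self._mae_by_model, key=self._mae_by_model.get)
        # The best model starts out as the default for later predictions.
        self._default_model_id = self._best_model_id

    @property
    def best_model_id(self):
        return self._best_model_id

    @property
    def default_model_id(self):  # hypothetical name, for illustration only
        return self._default_model_id

    def set_model(self, model_id):
        """Change the default model without touching the recorded best model."""
        if model_id not in self._mae_by_model:
            raise KeyError(f"unknown model id: {model_id}")
        self._default_model_id = model_id


# Hypothetical model ids and scores.
tuner = TunerSketch({"MODEL_A": 1.5, "MODEL_B": 2.0})
tuner.set_model("MODEL_B")
print(tuner.best_model_id, tuner.default_model_id)
```

Keeping the two pieces of state separate is what lets a caller try another trained model for prediction and still look up which model the search ranked best.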