Example 1: Using Hyperparameter Tuning for Model Trainer Function

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

This example creates an optimized SVM regression model using the GridSearch optimization algorithm. The best model found by GridSearch is then used to predict house values in California.

The SVM regression model is built on the teradataml California housing example data.

  1. Example setup.
    1. Load example data from "cal_housing_ex_raw" that contains California housing data.
      >>> load_example_data("teradataml", ["cal_housing_ex_raw"])
    2. Create a teradataml DataFrame object.
      >>> data_input = DataFrame.from_table("cal_housing_ex_raw")
    3. Scale the "target_columns" with respect to the 'USTD' value of each column.
      >>> fit_obj = ScaleFit(data=data_input,
                             target_columns=['MedInc', 'HouseAge', 'AveRooms',
                                             'AveBedrms', 'Population', 'AveOccup',
                                             'Latitude', 'Longitude'],
                             scale_method="USTD")
    4. Transform the data.
      >>> transform_obj = ScaleTransform(data=data_input,
                                         object=fit_obj.output,
                                         accumulate=["id", "MedHouseVal"])
    5. Sample the train and validation DataFrames, using 80% of the data for model training and 20% for model validation.
      >>> train_val_sample = transform_obj.result.sample(frac=[0.8, 0.2])
      >>> train_df = train_val_sample[train_val_sample.sampleid == 1].drop(\
                                      "sampleid", axis = 1)
      >>> val_df = train_val_sample[train_val_sample.sampleid == 2].drop(\
                                      "sampleid", axis = 1)
    6. Create two training data samples for model optimization.
      >>> train_df1 = train_df.iloc[:30]
      >>> train_df2 = train_df.iloc[30:]
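The 80/20 split in the setup step works by tagging every row with a "sampleid" and then filtering on that tag. The following is a minimal pure-Python sketch of that idea, assuming rows are shuffled and then cut by fraction. The helper `split_by_sampleid` is hypothetical and is not how teradataml implements `sample`; it only illustrates why `sampleid == 1` selects the training rows and `sampleid == 2` the validation rows.

```python
import random


def split_by_sampleid(rows, fractions=(0.8, 0.2), seed=0):
    """Tag each row with a 1-based sampleid so group sizes follow `fractions`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    tagged = []
    start = 0
    for sampleid, frac in enumerate(fractions, start=1):
        end = start + round(frac * len(shuffled))
        tagged.extend((row, sampleid) for row in shuffled[start:end])
        start = end
    return tagged


rows = list(range(100))  # stand-in for 100 table rows
tagged = split_by_sampleid(rows)

# Filter on the tag, mirroring train_val_sample.sampleid == 1 / == 2.
train = [row for row, sampleid in tagged if sampleid == 1]
val = [row for row, sampleid in tagged if sampleid == 2]
print(len(train), len(val))
```

Each row receives exactly one tag, so the two groups never overlap.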
  2. Define a parameter space and use GridSearch for hyperparameter tuning.
    1. Define parameter space for model training.
      >>> params = {"input_columns":['MedInc', 'HouseAge', 'AveRooms',
                                       'AveBedrms', 'Population', 'AveOccup',
                                       'Latitude', 'Longitude'],
                     "response_column":"MedHouseVal",
                     "model_type":"regression",
                     "batch_size":(11, 50, 75),
                     "iter_max":(100, 301),
                     "lambda1":0.1,
                     "alpha":0.5,
                     "iter_num_no_change":60,
                     "tolerance":0.01,
                     "intercept":False,
                     "learning_rate":"INVTIME",
                     "initial_eta":0.5,
                     "decay_rate":0.5,
                     "momentum":0.6,
                     "nesterov_optimization":True,
                     "local_sgd_iterations":1}
    2. Define the arguments required for model prediction and evaluation.
      >>> eval_params = {"id_column": "id",
                          "accumulate": "MedHouseVal"}
    3. Import trainer function and optimizer.
      >>> from teradataml import SVM, GridSearch
    4. Initialize the GridSearch optimizer with the model trainer function and the parameter space required for model training.
      >>> gs_obj = GridSearch(func=SVM, params=params)
      Model optimization is initiated using the fit method.
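The parameter space above yields six candidate models: three "batch_size" values times two "iter_max" values. The sketch below shows how such a grid can be expanded, assuming (as this example suggests) that tuple values are the hyperparameters to search and every other value, including the "input_columns" list, is held fixed. `expand_grid` is a hypothetical helper written for illustration; it is not part of teradataml and may differ from what GridSearch does internally.

```python
from itertools import product


def expand_grid(params):
    """Return one dict per combination of the tuple-valued entries in params."""
    # Assumption: tuple values are candidates to search; everything else is fixed.
    searched = {k: v for k, v in params.items() if isinstance(v, tuple)}
    fixed = {k: v for k, v in params.items() if not isinstance(v, tuple)}
    names = list(searched)
    return [
        {**fixed, **dict(zip(names, combo))}
        for combo in product(*(searched[name] for name in names))
    ]


# A shortened copy of the parameter space from this example.
params = {
    "response_column": "MedHouseVal",
    "model_type": "regression",
    "batch_size": (11, 50, 75),
    "iter_max": (100, 301),
    "lambda1": 0.1,
}

grid = expand_grid(params)
print(len(grid))
for candidate in grid:
    print(candidate["batch_size"], candidate["iter_max"])
```

The number of candidates grows multiplicatively with each additional tuple-valued parameter, so the size of the grid is worth checking before calling fit.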
  3. Perform model optimization for the SVM function by passing a single DataFrame to the fit method. The evaluation and prediction arguments are passed along with the training data, and the progress bar shows the status of the hyperparameter tuning run.
    >>> gs_obj.fit(data=train_df, verbose=1, **eval_params)
    Completed: |████████████████████████████████████████████████████████████| 100% - 6/6
    All model training has been passed. In case of failure, use get_error_log method to retrieve corresponding error logs.
  4. View the trained model metadata from hyperparameter tuning using the models property. Retrieve the model metadata of the "gs_obj" instance.
    >>> gs_obj.models
      MODEL_ID DATA_ID                                         PARAMETERS STATUS       MAE
    0    SVM_2    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR      PASS  2.229113
    1    SVM_3    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR      PASS  2.229113
    2    SVM_0    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR      PASS  2.254733
    3    SVM_1    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR      PASS  2.254733
    4    SVM_4    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR      PASS  2.229113
    5    SVM_5    DF_0  {'input_columns': ['MedInc', 'HouseAge', 'AveR      PASS  2.229113
  5. View the best model identified by GridSearch. Retrieve the best model ID identified by the "gs_obj" instance.
    >>> gs_obj.best_model_id
    'SVM_0'
    The identified best model is stored as the default model for future prediction and evaluation operations.
  6. Perform prediction on validation data using the identified best model.
    >>> gs_obj.predict(newdata=val_df, **eval_params)
    ############ result Output ############
    
          id  prediction  MedHouseVal
    0  11246    0.350175        2.028
    1   5328    0.760091        2.775
    2  16736    1.236138        3.152
    3   3687    1.026577        1.741
    4   8783    0.031345        2.500
    5  17768    1.022263        1.601
    6   7114    0.224646        2.187
    7  18164    0.383037        3.674
    8  10966    0.253726        1.896
    9  16199    0.468865        0.590
  7. Evaluate the validation data using the best model.
    >>> gs_obj.evaluate(newdata=val_df, **eval_params)
    ############ result Output ############
    
            MAE      MSE      MSLE        MAPE         MPE      RMSE     RMSLE        ME        R2        EV  MPD  MGD
    0  2.011638  4.95115  0.097629  117.909264  116.463401  2.225118  0.312457  3.208421 -1.312606  0.436544  NaN  NaN
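To make a few of the reported metrics concrete, the sketch below computes MAE, MSE, RMSE, and R2 from their standard definitions in plain Python. It uses only the ten rows displayed in the prediction step, which are a subset of the validation data, so its figures are not expected to reproduce the evaluate output above. `regression_metrics` is a hypothetical helper, not a teradataml function. A negative R2, as in the evaluate output, means the predictions fit worse than always predicting the mean of the observed values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R2 from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }


# The ten rows displayed in the prediction step (a subset of val_df).
med_house_val = [2.028, 2.775, 3.152, 1.741, 2.500,
                 1.601, 2.187, 3.674, 1.896, 0.590]
prediction = [0.350175, 0.760091, 1.236138, 1.026577, 0.031345,
              1.022263, 0.224646, 0.383037, 0.253726, 0.468865]

metrics = regression_metrics(med_house_val, prediction)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```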
  8. View the stats report for all trained models. Retrieve the model stats of the "gs_obj" instance.
    >>> gs_obj.model_stats
       MODEL_ID       MAE       MSE      MSLE        MAPE         MPE      RMSE     RMSLE        ME        R2        EV  MPD  MGD
    0     SVM_0  2.292239  6.188213  0.286058  110.899315  110.899315  2.487612  0.534843  3.325346 -1.788798  0.579146  NaN  NaN
    1     SVM_1  2.018557  6.799822  0.000000   92.159608   92.159608  2.607647  0.000000  5.467911 -2.512063 -0.407573  NaN  NaN
    2     SVM_2  2.018557  6.799822  0.000000   92.159608   92.159608  2.607647  0.000000  5.467911 -2.512063 -0.407573  NaN  NaN
    3     SVM_3  2.292239  6.188213  0.286058  110.899315  110.899315  2.487612  0.534843  3.325346 -1.788798  0.579146  NaN  NaN
    4     SVM_7  2.156224  5.709455  0.127423  101.757168  101.757168  2.389447  0.356964  3.440837 -1.573039  0.522228  NaN  NaN
    5     SVM_4  1.960213  6.039734  0.000000   91.243341   91.243341  2.457587  0.000000  4.982941 -2.119482 -0.134890  NaN  NaN
  9. Update the default model with another trained model and perform predictions.
    1. Find the best model, which is used as the default model.
      >>> gs_obj.best_model_id
      'SVM_0'
    2. Update the default trained model. The default model of the GridSearch instance is updated using the set_model method.
      >>> gs_obj.set_model(model_id="SVM_1")
      Although the default model is updated, the known best model information remains unchanged. The best model and its corresponding information can be retrieved using the GridSearch properties that start with "best_".
    3. Perform prediction using "SVM_1" model.
      >>> gs_obj.predict(newdata=val_df.iloc[:5], **eval_params)
           id  prediction  MedHouseVal
      0   686    0.202843        1.578
      1  2018    0.149868        0.578
      2  1754    0.211870        1.651
      3   670    0.192414        1.922
      4   244    0.247545        1.117
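The distinction between the default model and the best model in the last step can be sketched with a small stand-in class. `TunerSketch`, its `default_model_id` property, the model IDs, and the scores are all hypothetical and chosen for illustration; the class is not the teradataml GridSearch implementation. It assumes a lower score is better, as with an error metric such as MAE.

```python
class TunerSketch:
    """Hypothetical stand-in that separates 'best' from 'default' model."""

    def __init__(self, scores):
        # scores maps model ID -> error metric (assumption: lower is better).
        self._scores = dict(scores)
        self._best_model_id = min(self._scores, key=self._scores.get)
        # The best model starts out as the default model.
        self._default_model_id = self._best_model_id

    @property
    def best_model_id(self):
        return self._best_model_id

    @property
    def default_model_id(self):
        return self._default_model_id

    def set_model(self, model_id):
        """Change the default model without touching the best-model record."""
        if model_id not in self._scores:
            raise KeyError(f"unknown model id: {model_id}")
        self._default_model_id = model_id


# Made-up IDs and scores, for illustration only.
tuner = TunerSketch({"M_0": 1.9, "M_1": 2.3, "M_2": 2.1})
tuner.set_model("M_1")
print(tuner.best_model_id, tuner.default_model_id)
```

Keeping the two records separate means a later prediction can use any trained model while the tuning result itself stays available for comparison.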