RandomSearch - Example 4: Parallelization in Hyperparameter Tuning for Model and Non-Model Trainer Functions

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Language: English (United States)
Last Update: 2024-12-18
Product Category: Teradata Vantage

teradataml provides the capability to run hyperparameter tuning in parallel for both model and non-model trainer functions using the RandomSearch algorithm. This example executes DecisionForest (a model trainer function) and Antiselect (a non-model trainer function) on the admission dataset.

In this example, the teradataml admission example dataset is used to demonstrate the parallel execution capability.
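The steps below assume an active connection to Vantage and that the required teradataml functions are already imported. A minimal setup sketch follows; the host name and credentials are placeholders, not part of this example:
  >>> from teradataml import create_context, load_example_data, DataFrame
  >>> from teradataml import OrdinalEncodingFit, OrdinalEncodingTransform, ScaleFit, ScaleTransform
  >>> # Placeholder connection details; replace with values for your environment.
  >>> create_context(host="<hostname>", username="<username>", password="<password>")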

  1. Example setup.
    1. Load the example dataset.
      >>> load_example_data("teradataml", "admission_train")
    2. Create teradataml DataFrame.
      >>> df = DataFrame.from_table("admission_train")
    3. Identify the distinct categorical values in the input and transform them into numerical values using ordinal encoding.
      >>> ordinal_fit = OrdinalEncodingFit(data=df,
                                           target_column=['stats','programming','masters'])
      >>> ordinal_transform = OrdinalEncodingTransform(data=df,
                                                       object=ordinal_fit,
                                                       accumulate=['id','admitted','gpa'])
      >>> df = ordinal_transform.result
      >>> target_col='admitted'
      >>> columns =['gpa', 'stats', 'programming', 'masters']
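      Optionally, the category-to-ordinal mapping produced by the fit can be inspected; the result attribute is assumed here, matching the usual teradataml analytic function output:
      >>> # Inspect how each categorical value was mapped to an ordinal value.
      >>> ordinal_fit.result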
    4. Scale the data.
      >>> # ScaleFit is required before ScaleTransform; standard-deviation scaling is assumed here.
      >>> scale_fit = ScaleFit(data=df,
                               target_columns=columns,
                               scale_method="STD")
      >>> scale_transform = ScaleTransform(data=df,
                                           object=scale_fit.output,
                                           accumulate=["id", "admitted"])
    5. Sample the data.
      >>> train_val_sample = scale_transform.result.sample(frac=[0.8, 0.2])
    6. Create train and test data.
      >>> train_df = train_val_sample[train_val_sample.sampleid == 1].drop("sampleid", axis = 1)
      >>> test_df = train_val_sample[train_val_sample.sampleid == 2].drop("sampleid", axis = 1)
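      Optionally, verify the split before training; shape is a standard teradataml DataFrame property (counts vary with the random sample):
      >>> # Roughly 80% of the rows land in train_df and 20% in test_df.
      >>> train_df.shape
      >>> test_df.shape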
  2. Execute model trainer function DecisionForest.
    1. Define hyperparameter tuning for DecisionForest function.
      >>> # Model training parameters
      >>> model_params = {"input_columns":(['gpa', 'stats', 'programming', 'masters']),
                          "response_column":'admitted',
                          "max_depth":(1,15,25,20),
                          "num_trees":(5,15,50),
                          "tree_type":'CLASSIFICATION'}
      >>> # Model evaluation parameters
      >>> eval_params = {"id_column": "id",
                         "accumulate": "admitted"
                        }
      >>> # Import model trainer and optimizer
      >>> from teradataml import DecisionForest, RandomSearch
      >>> # Initialize the RandomSearch optimizer with model trainer
      >>> # function and parameter space required for model training.
      >>> rs_obj = RandomSearch(func=DecisionForest, params=model_params, n_iter=5)
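      RandomSearch draws n_iter random combinations from the full parameter grid. Here the grid spans four max_depth values and three num_trees values (the other parameters are fixed), so n_iter=5 trains on 5 of the 12 possible combinations. A plain-Python sketch of that grid, for illustration only and not part of the teradataml API:
      >>> from itertools import product
      >>> search_space = list(product((1, 15, 25, 20), (5, 15, 50)))
      >>> len(search_space)   # number of (max_depth, num_trees) combinations available to sample from
      12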
    2. Execute the hyperparameter fit function.
      The default setting for run_parallel is True, so hyperparameter tuning runs in parallel by default.
      >>> rs_obj.fit(data=train_df, verbose=2, run_parallel=True, **eval_params)
      Model_id:DECISIONFOREST_2 - Run time:29.327s - Status:PASS - ACCURACY:0.833       
      Model_id:DECISIONFOREST_3 - Run time:29.451s - Status:PASS - ACCURACY:0.833        
      Model_id:DECISIONFOREST_0 - Run time:29.454s - Status:PASS - ACCURACY:0.833        
      Model_id:DECISIONFOREST_1 - Run time:29.453s - Status:PASS - ACCURACY:0.833        
      Model_id:DECISIONFOREST_4 - Run time:16.397s - Status:PASS - ACCURACY:0.667        
      Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 5/5
      A different evaluation_metric can be used to rank the models trained during hyperparameter tuning, as sketched below.
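      For example, assuming fit accepts an evaluation_metric argument as in other teradataml hyperparameter-tuning examples (the metric names correspond to the columns of model_stats shown in the next step), the models can be ranked by micro-averaged F1 and the run made sequential by turning off run_parallel. This is a hedged sketch, not output from this example:
      >>> # Sequential execution, ranking models by MICRO-F1 instead of ACCURACY.
      >>> rs_obj.fit(data=train_df, evaluation_metric="MICRO-F1",
                     verbose=2, run_parallel=False, **eval_params)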
    3. View the results using models and model_stats properties.
      >>> # Trained models can be viewed using models property
      >>> rs_obj.models
               MODEL_ID        DATA_ID                        PARAMETERS                   STATUS   ACCURACY
      0    DECISIONFOREST_2    DF_0    {'input_columns': ['gpa', 'stats', 'programmin...    PASS    0.833333
      1    DECISIONFOREST_3    DF_0    {'input_columns': ['gpa', 'stats', 'programmin...    PASS    0.833333
      2    DECISIONFOREST_0    DF_0    {'input_columns': ['gpa', 'stats', 'programmin...    PASS    0.833333
      3    DECISIONFOREST_1    DF_0    {'input_columns': ['gpa', 'stats', 'programmin...    PASS    0.833333
      4    DECISIONFOREST_4    DF_0    {'input_columns': ['gpa', 'stats', 'programmin...    PASS    0.666667
      >>> # Additional performance metrics can be viewed using the model_stats property
      >>> rs_obj.model_stats
                  MODEL_ID    ACCURACY    MICRO-PRECISION    MICRO-RECALL    MICRO-F1    MACRO-PRECISION    MACRO-RECALL    MACRO-F1    WEIGHTED-PRECISION    WEIGHTED-RECALL    WEIGHTED-F1
      0    DECISIONFOREST_2    0.833333    0.833333    0.833333    0.833333    0.833333    0.875    0.828571    0.888889    0.833333    0.838095
      1    DECISIONFOREST_3    0.833333    0.833333    0.833333    0.833333    0.833333    0.875    0.828571    0.888889    0.833333    0.838095
      2    DECISIONFOREST_0    0.833333    0.833333    0.833333    0.833333    0.833333    0.875    0.828571    0.888889    0.833333    0.838095
      3    DECISIONFOREST_1    0.833333    0.833333    0.833333    0.833333    0.833333    0.875    0.828571    0.888889    0.833333    0.838095
      4    DECISIONFOREST_4    0.666667    0.666667    0.666667    0.666667    0.625000    0.625    0.625000    0.666667    0.666667    0.666667
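      After tuning, the best model can be retrieved and used for scoring. The following is a hedged sketch based on other teradataml hyperparameter-tuning examples; best_params_, best_score_, and predict are assumed to be available (see the note at the end of this page):
      >>> # Best hyperparameter combination and its score on the evaluation metric.
      >>> rs_obj.best_params_
      >>> rs_obj.best_score_
      >>> # Score the held-out test data using the best model found by RandomSearch.
      >>> prediction = rs_obj.predict(newdata=test_df, **eval_params)
      >>> prediction.result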
  3. Execute non-model trainer function Antiselect.
    1. Define the parameter space for Antiselect function.
      >>> # Define the non-model trainer function parameter space.
      >>> params = { "data":train_df,
                     "exclude":(['stats', 'programming', 'masters'],
                                ['id', 'admitted'],
                                ['admitted', 'gpa', 'stats'],
                                ['masters'],
                                ['admitted', 'gpa', 'stats', 'programming'])}
      >>> # Import non-model trainer function and optimizer.
      >>> from teradataml import Antiselect, RandomSearch
      >>> # Initialize the RandomSearch optimizer with non-model trainer
      >>> # function and parameter space required for non-model training.
      >>> rs_obj = RandomSearch(func=Antiselect, params=params, n_iter=3)
    2. Execute hyperparameter tuning with Antiselect in parallel.
      The default setting for run_parallel is True.
      >>> # Fitting Antiselect in parallel
      >>> rs_obj.fit(verbose=2)
              MODEL_ID                                           PARAMETERS    STATUS
      0    ANTISELECT_1    {'data': '"ALICE"."ml__select__170983718572642...    PASS
      1    ANTISELECT_2    {'data': '"ALICE"."ml__select__170983718572642...    PASS
      2    ANTISELECT_0    {'data': '"ALICE"."ml__select__170983718572642...    PASS
    3. View the non-model trainer function execution metadata.
      >>> # Retrieve the model metadata of "rs_obj" instance.
      >>> rs_obj.models
              MODEL_ID                                           PARAMETERS    STATUS
      0    ANTISELECT_1    {'data': '"ALICE"."ml__select__170983718572642...    PASS
      1    ANTISELECT_2    {'data': '"ALICE"."ml__select__170983718572642...    PASS
      2    ANTISELECT_0    {'data': '"ALICE"."ml__select__170983718572642...    PASS
    All the properties, arguments, and functions shown in the previous examples also apply here for both model and non-model trainer functions.
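    For instance, an individual Antiselect execution can be retrieved by its MODEL_ID and its output inspected. This is a hedged sketch; the get_model method is assumed here based on the hyperparameter-tuning examples referenced above:
      >>> # Retrieve one executed Antiselect instance by MODEL_ID and view its output.
      >>> antiselect_0 = rs_obj.get_model("ANTISELECT_0")
      >>> antiselect_0.result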