GridSearch - Example 4: Parallelization in Hyperparameter Tuning for Model and Non-Model Trainer Functions

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

teradataml supports parallel execution of hyperparameter combinations for both model trainer and non-model trainer functions using the GridSearch algorithm. This example runs DecisionForest (a model trainer function) and Antiselect (a non-model trainer function) in parallel on the admission dataset.

  1. Example setup.
    1. Load the example dataset.
      >>> load_example_data("teradataml", "admission_train")
    2. Create teradataml DataFrame.
      >>> train_df = DataFrame.from_table("admission_train")
  2. Execute model trainer function DecisionForest.
    1. Define hyperparameter tuning for DecisionForest function.
      >>> # Model training parameters
      >>> model_params = {"input_columns":(['gpa', 'stats', 'programming', 'masters']),
                          "response_column":'admitted',
                          "max_depth":(1,15,25,20),
                          "num_trees":(5,15,50),
                          "tree_type":'CLASSIFICATION'}
      >>> # Model evaluation parameters
      >>> eval_params = {"id_column": "id",
                         "accumulate": "admitted"
                        }
      >>> # Import model trainer and optimizer
      >>> from teradataml import DecisionForest, GridSearch
      >>> # Initialize the GridSearch optimizer with model trainer
      >>> # function and parameter space required for model training.
      >>> gs_obj = GridSearch(func=DecisionForest, params=model_params)
    2. Execute the hyperparameter fit function.
      The default setting for run_parallel is True.
      >>> gs_obj.fit(data=train_df, run_parallel=True, verbose=2, evaluation_metric="Micro-f1", **eval_params)
      Model_id:DECISIONFOREST_0 - Run time:27.721s - Status:PASS - MICRO-F1:0.5          
      Model_id:DECISIONFOREST_3 - Run time:27.722s - Status:PASS - MICRO-F1:0.5           
      Model_id:DECISIONFOREST_1 - Run time:27.722s - Status:PASS - MICRO-F1:0.5            
      Model_id:DECISIONFOREST_2 - Run time:27.722s - Status:PASS - MICRO-F1:0.5            
      Model_id:DECISIONFOREST_4 - Run time:28.249s - Status:PASS - MICRO-F1:0.5           
      Model_id:DECISIONFOREST_5 - Run time:28.249s - Status:PASS - MICRO-F1:0.167          
      Model_id:DECISIONFOREST_7 - Run time:28.239s - Status:PASS - MICRO-F1:0.167         
      Model_id:DECISIONFOREST_6 - Run time:28.242s - Status:PASS - MICRO-F1:0.5           
      Model_id:DECISIONFOREST_9 - Run time:27.638s - Status:PASS - MICRO-F1:0.5           
      Model_id:DECISIONFOREST_11 - Run time:27.368s - Status:PASS - MICRO-F1:0.167        
      Model_id:DECISIONFOREST_10 - Run time:27.624s - Status:PASS - MICRO-F1:0.167         
      Model_id:DECISIONFOREST_8 - Run time:27.661s - Status:PASS - MICRO-F1:0.167          
      Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 12/12
      Other evaluation_metric values can be used to evaluate the trained models during hyperparameter tuning.
    3. View the results using the models and model_stats properties.
      >>> # Trained models can be viewed using the models property
      >>> gs_obj.models
               MODEL_ID                DATA_ID                 PARAMETERS                              STATUS        MICRO-F1
      0        DECISIONFOREST_0        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      1        DECISIONFOREST_3        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      2        DECISIONFOREST_1        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      3        DECISIONFOREST_2        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      4        DECISIONFOREST_4        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      5        DECISIONFOREST_5        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.166667
      6        DECISIONFOREST_7        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.166667
      7        DECISIONFOREST_6        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      8        DECISIONFOREST_9        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.500000
      9        DECISIONFOREST_11       DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.166667
      10       DECISIONFOREST_10       DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.166667
      11       DECISIONFOREST_8        DF_0        {'input_columns': ['gpa', 'stats', 'programmin       PASS        0.166667
      >>> # Additional performance metrics can be viewed using the model_stats property
      >>> gs_obj.model_stats
      MODEL_ID           ACCURACY   MICRO-PRECISION  MICRO-RECALL  MICRO-F1   MACRO-PRECISION  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
      0   DECISIONFOREST_0  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      1   DECISIONFOREST_3  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      2   DECISIONFOREST_1  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      3   DECISIONFOREST_2  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      4   DECISIONFOREST_4  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      5   DECISIONFOREST_5  0.166667  0.166667          0.166667    0.166667  0.083333           0.5          0.142857  0.027778            0.166667         0.047619
      6   DECISIONFOREST_7  0.166667  0.166667          0.166667    0.166667  0.083333           0.5          0.142857  0.027778            0.166667         0.047619
      7   DECISIONFOREST_6  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      8   DECISIONFOREST_9  0.500000  0.500000          0.500000    0.500000  0.625000           0.7          0.485714  0.875000            0.500000         0.542857
      9   DECISIONFOREST_11 0.166667  0.166667          0.166667    0.166667  0.083333           0.5          0.142857  0.027778            0.166667         0.047619
      10  DECISIONFOREST_10 0.166667  0.166667          0.166667    0.166667  0.083333           0.5          0.142857  0.027778            0.166667         0.047619
      11  DECISIONFOREST_8  0.166667  0.166667          0.166667    0.166667  0.083333           0.5          0.142857  0.027778            0.166667         0.047619
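The twelve models listed above are consistent with a Cartesian product over the tuple-valued entries of model_params: four max_depth values times three num_trees values. The following is a minimal pure-Python sketch of that expansion, not teradataml's implementation. It assumes that tuple values mark hyperparameters to search over and that every other value is passed through unchanged; `expand_grid` is a hypothetical helper introduced only for illustration, and the order of its combinations is not claimed to match the Model_id numbering.

```python
from itertools import product


def expand_grid(params):
    """Expand tuple-valued entries into one parameter dict per combination.

    Assumption for this sketch: a tuple marks a set of values to search
    over; any other value (list, string, number) is treated as fixed.
    """
    tuned = {k: v for k, v in params.items() if isinstance(v, tuple)}
    fixed = {k: v for k, v in params.items() if not isinstance(v, tuple)}

    names = list(tuned)
    combinations = []
    # product() yields one value per tuned name for every combination.
    for values in product(*(tuned[name] for name in names)):
        combo = dict(fixed)
        combo.update(zip(names, values))
        combinations.append(combo)
    return combinations


# Same parameter space as the DecisionForest step of this example.
model_params = {"input_columns": ['gpa', 'stats', 'programming', 'masters'],
                "response_column": 'admitted',
                "max_depth": (1, 15, 25, 20),
                "num_trees": (5, 15, 50),
                "tree_type": 'CLASSIFICATION'}

grid = expand_grid(model_params)
print(len(grid))  # 4 max_depth values x 3 num_trees values = 12
```

Applied to the Antiselect parameter space in the next step, the same reasoning gives five combinations, because "exclude" is a tuple of five column lists and "data" is a single fixed value.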
  3. Execute non-model trainer function Antiselect.
    1. Define the parameter space for the Antiselect function.
      >>> # Define the non-model trainer function parameter space.
      >>> params = { "data":train_df,
                     "exclude":(['stats', 'programming', 'masters'],
                                ['id', 'admitted'],
                                ['admitted', 'gpa', 'stats'],
                                ['masters'],
                                ['admitted', 'gpa', 'stats', 'programming'])}
      >>> # Import non-model trainer function and optimizer.
      >>> from teradataml import Antiselect, GridSearch
      >>> # Initialize the GridSearch optimizer with non-model trainer
      >>> # function and parameter space required for non-model training.
      >>> gs_obj = GridSearch(func=Antiselect, params=params)
    2. Execute hyperparameter tuning with Antiselect in parallel.
      >>> # Execute Antiselect in parallel
      >>> gs_obj.fit(verbose=2)
      Model_id:ANTISELECT_3 - Run time:5.878s - Status:PASS                             
      Model_id:ANTISELECT_2 - Run time:5.882s - Status:PASS                               
      Model_id:ANTISELECT_1 - Run time:5.882s - Status:PASS                              
      Model_id:ANTISELECT_0 - Run time:5.883s - Status:PASS                              
      Model_id:ANTISELECT_4 - Run time:4.402s - Status:PASS                              
      Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 5/5
    3. View the non-model trainer function execution metadata.
      >>> # Retrieve the model metadata of "gs_obj" instance.
      >>> gs_obj.models
          MODEL_ID                              PARAMETERS                          STATUS
      0      ANTISELECT_3      {'data': '"ALICE"."ml__select__170973834455250         PASS
      1      ANTISELECT_2      {'data': '"ALICE"."ml__select__170973834455250         PASS
      2      ANTISELECT_1      {'data': '"ALICE"."ml__select__170973834455250         PASS
      3      ANTISELECT_0      {'data': '"ALICE"."ml__select__170973834455250         PASS
      4      ANTISELECT_4      {'data': '"ALICE"."ml__select__170973834455250         PASS
    All the properties, arguments, and functions shown in the previous examples also apply here, for both model and non-model trainer functions.
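In both fit outputs above, the Model_id lines do not appear in numeric order (for example, ANTISELECT_3 is reported before ANTISELECT_0). That is what one would expect when combinations run concurrently and are reported as they finish. The sketch below illustrates the general pattern with Python's standard `concurrent.futures` module. It is not how teradataml schedules work internally: `run_one`, `run_all`, and the `ALL_COLUMNS` list are hypothetical stand-ins, with the column names taken from those mentioned in this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumed column list, built from the column names used in this example.
ALL_COLUMNS = ["id", "masters", "gpa", "stats", "programming", "admitted"]

# Same "exclude" search space as the Antiselect step.
EXCLUDE_OPTIONS = (['stats', 'programming', 'masters'],
                   ['id', 'admitted'],
                   ['admitted', 'gpa', 'stats'],
                   ['masters'],
                   ['admitted', 'gpa', 'stats', 'programming'])


def run_one(model_id, exclude):
    """Hypothetical stand-in for one function run: drop the excluded columns."""
    kept = [col for col in ALL_COLUMNS if col not in exclude]
    return {"MODEL_ID": model_id, "STATUS": "PASS", "KEPT": kept}


def run_all(exclude_options, max_workers=4):
    """Run every combination in a thread pool and collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, f"ANTISELECT_{i}", exclude)
                   for i, exclude in enumerate(exclude_options)]
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            results.append(future.result())
    return results


results = run_all(EXCLUDE_OPTIONS)
for row in results:
    print(row["MODEL_ID"], row["STATUS"], row["KEPT"])
```

Because results are collected in completion order, any code that consumes them should look rows up by MODEL_ID rather than by position, which is also how the models table above is best read.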