Example 1: Using Hyperparameter Tuning for Model Trainer Function | RandomSearch | teradataml - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

This example performs hyperparameter tuning on the DecisionForest model trainer function for a classification task. The RandomSearch optimization algorithm is used to find an optimal DecisionForest classification model, and the best model it identifies is then used to classify iris flowers.

In this example, the teradataml example iris dataset ("iris_input") is used to build the DecisionForest classification model.

  1. Example Setup.
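    1. Load the example data and create a teradataml DataFrame from the "iris_input" table. This assumes that a connection to Vantage has already been created with create_context().
      >>> from teradataml import DataFrame, load_example_data
      >>> load_example_data("byom", "iris_input")
      >>> iris_input = DataFrame("iris_input")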
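    2. Create two samples of the input data: sample 1 contains 90% of the rows and sample 2 contains the remaining 10%. The sample method adds a "sampleid" column that identifies the sample each row belongs to.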
      >>> iris_sample = iris_input.sample(frac=[0.9, 0.1])
    3. Create the training dataset from sample 1 by filtering on "sampleid", and drop the "sampleid" column because it is not needed for model training.
      >>> iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
    4. Create the validation dataset from sample 2 by filtering on "sampleid", and drop the "sampleid" column because it is not needed for scoring.
      >>> iris_val = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
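The setup steps above split one table into two disjoint samples, tag each row with a "sampleid", and then keep one sample while dropping the helper column. The following plain-Python sketch illustrates that idea on a hypothetical 150-row table. It is not how teradataml implements DataFrame.sample (which runs in Vantage), and the helpers sample_fractions and select_sample are hypothetical names used only for illustration.

```python
import random


def sample_fractions(rows, fracs, seed=None):
    """Shuffle rows and tag consecutive slices with sampleid 1, 2, ...

    Slice sizes follow the requested fractions of the total row count.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    tagged, start = [], 0
    for sample_id, frac in enumerate(fracs, start=1):
        count = round(len(shuffled) * frac)
        for row in shuffled[start:start + count]:
            tagged.append({**row, "sampleid": sample_id})
        start += count
    return tagged


def select_sample(rows, sample_id):
    """Keep the rows of one sample and drop the helper "sampleid" column."""
    return [{k: v for k, v in row.items() if k != "sampleid"}
            for row in rows if row["sampleid"] == sample_id]


# Hypothetical 150-row table with an id and a class label (1, 2 or 3).
table = [{"id": i, "species": (i - 1) // 50 + 1} for i in range(1, 151)]
sampled = sample_fractions(table, [0.9, 0.1], seed=42)
train = select_sample(sampled, 1)
val = select_sample(sampled, 2)
print(len(train), len(val))  # 135 15
```

Because every row lands in exactly one slice, the training and validation sets never share a row.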
  2. Define the parameter space and use RandomSearch for hyperparameter tuning.
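    1. Define the parameter space for model training. Arguments given as tuples (ntree, tree_size, variance, and max_depth) hold the candidate values to be tuned; the other arguments are fixed. The random module is imported to generate candidate values.
      >>> import random
      >>> params = {"input_columns":["sepal_length", "sepal_width", "petal_length", "petal_width"],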
                    "response_column":"species",
                    "tree_type":"classification",
                    "ntree":tuple(set(round(random.uniform(20, 500)) for i in range(50))),
                    "tree_size":(100, 200),
                    "nodesize":10,
                    "variance":tuple(set(round(random.random(), 2) for i in range(20))),
                    "max_depth":tuple(set(round(random.uniform(2, 20)) for i in range(6))),
                    "maxnum_categorical":20,
                    "mtry":30,
                    "mtry_seed":100,
                    "seed":100}
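    2. Define the arguments required for model prediction and evaluation: "id_column" identifies each row, and "accumulate" copies the "species" column to the prediction output.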
      >>> eval_params = {"id_column": "id",
                         "accumulate": "species"}
    3. Import the model trainer function and the optimizer.
      >>> from teradataml import DecisionForest, RandomSearch
    4. Initialize the RandomSearch optimizer with the model trainer function, the parameter space required for model training, and the number of iterations (n_iter).
      >>> rs_obj = RandomSearch(func=DecisionForest, params=params, n_iter=4)
      Model optimization is then started with the fit method.
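Instead of training a model for every combination in the parameter space, RandomSearch trains models for a limited number of randomly chosen combinations (n_iter). The sketch below illustrates that idea in plain Python, assuming, as the params dictionary above suggests, that tuple values are the candidates to tune and all other values are passed through unchanged. Drawing distinct combinations uniformly from the full grid is one common way to implement random search; it is an assumption here rather than a description of teradataml's internals, and split_space and random_combinations are hypothetical helpers.

```python
import itertools
import math
import random


def split_space(params):
    """Tuple values are candidate values to tune; all other values stay fixed."""
    tunable = {k: v for k, v in params.items() if isinstance(v, tuple)}
    fixed = {k: v for k, v in params.items() if not isinstance(v, tuple)}
    return tunable, fixed


def random_combinations(params, n_iter, seed=None):
    """Draw up to n_iter distinct parameter combinations from the full grid."""
    tunable, fixed = split_space(params)
    names = list(tunable)
    grid = list(itertools.product(*(tunable[name] for name in names)))
    rng = random.Random(seed)
    picks = rng.sample(grid, min(n_iter, len(grid)))
    return [{**fixed, **dict(zip(names, combo))} for combo in picks]


# Hypothetical, smaller parameter space in the same shape as "params" above.
space = {"input_columns": ["sepal_length", "petal_length"],  # list: not tuned
         "tree_type": "classification",
         "ntree": (50, 150, 300),
         "tree_size": (100, 200),
         "variance": (0.1, 0.25),
         "max_depth": (5, 10),
         "seed": 100}

tunable, fixed = split_space(space)
print(math.prod(len(v) for v in tunable.values()))  # 24 possible combinations
for combo in random_combinations(space, n_iter=4, seed=1):
    print(combo["ntree"], combo["tree_size"], combo["variance"], combo["max_depth"])
```

For the params dictionary in step 2, the ntree, variance, and max_depth tuples are built from sets of random draws, so they hold at most 50, 20, and 6 distinct values. Together with the two tree_size values, the space has at most 50 × 2 × 20 × 6 = 12,000 combinations, of which n_iter=4 are trained.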
  3. Perform model optimization for the DecisionForest function. A single DataFrame is passed for model training, and the evaluation and prediction arguments are passed along with the training DataFrame.

    Parallel execution is disabled, and an early-stop criterion of 0.93 is set.

    >>> rs_obj.fit(data=iris_train, run_parallel=False, early_stop=0.93, **eval_params)
    All model training has been passed. In case of failure, use get_error_log method to retrieve corresponding error logs.
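    Model optimization stops once the early_stop criterion is met; otherwise, hyperparameter tuning is performed for the specified number of iterations (n_iter). In this run, none of the models reached the early_stop value of 0.93 (each scored 0.925926, as shown in the next step), so all four iterations were performed.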
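The early-stop behavior described above can be pictured as a loop that trains and scores one candidate at a time and stops as soon as a score meets the threshold. This is a minimal sketch under two assumptions: that the criterion is met when the score is greater than or equal to early_stop, and that a score_fn callable stands in for training and evaluating a DecisionForest model. It is not teradataml's implementation, and tune_with_early_stop is a hypothetical helper. The second call mirrors this run, where every model scored 0.925926, below 0.93.

```python
def tune_with_early_stop(candidates, score_fn, early_stop=None):
    """Score candidates one at a time (sequentially, as with run_parallel=False)
    and stop as soon as a score reaches early_stop."""
    results = []
    for model_index, params in enumerate(candidates):
        score = score_fn(params)
        results.append((model_index, params, score))
        if early_stop is not None and score >= early_stop:
            break
    return results


# Hypothetical scores keyed by max_depth, standing in for train-and-evaluate.
scores = {5: 0.91, 10: 0.95, 15: 0.97}
candidates = [{"max_depth": 5}, {"max_depth": 10}, {"max_depth": 15}]

stopped = tune_with_early_stop(candidates, lambda p: scores[p["max_depth"]], early_stop=0.93)
print(len(stopped))  # 2: the second candidate reaches 0.93, so the third is skipped

# When no score reaches the threshold, every candidate is evaluated.
full = tune_with_early_stop([{}] * 4, lambda p: 0.925926, early_stop=0.93)
print(len(full))  # 4
```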
  4. View the metadata of the models trained during hyperparameter tuning using the models property of the "rs_obj" instance.
    >>> rs_obj.models
               MODEL_ID DATA_ID                                         PARAMETERS STATUS  ACCURACY
    0  DECISIONFOREST_0    DF_0  {'input_columns': ['sepal_length', 'sepal_widt      PASS  0.925926
    1  DECISIONFOREST_1    DF_0  {'input_columns': ['sepal_length', 'sepal_widt      PASS  0.925926
    2  DECISIONFOREST_2    DF_0  {'input_columns': ['sepal_length', 'sepal_widt      PASS  0.925926
    3  DECISIONFOREST_3    DF_0  {'input_columns': ['sepal_length', 'sepal_widt      PASS  0.925926
  5. View the best model and corresponding information identified by RandomSearch.
    1. Retrieve the ID of the best model identified by the "rs_obj" instance.
      >>> rs_obj.best_model_id
      'DECISIONFOREST_0'
    2. Retrieve the best data ID, which identifies the dataset used to train the best model.
      >>> rs_obj.best_data_id
      'DF_0'
    3. Retrieve the best model of the "rs_obj" instance.
      >>> rs_obj.best_model
      ############ result Output ############
      
         task_index  tree_num  tree_order  classification_tree
      0           1         0           0  {"id_":1,"size_":87,"maxDepth_":10,"responseCounts_":{"1":28,"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.664817,"scoreImprove_":0.328172,"leftNodeSize_":28,"rightNodeSize_":59},"leftChild_":{"id_":2,"size_":28,"maxDepth_":9,"label_":"1","responseCounts_":{"1":28},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":59,"maxDepth_":9,"responseCounts_":{"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.700000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.496409,"scoreImprove_":0.336645,"leftNodeSize_":32,"rightNodeSize_":27},"leftChild_":{"id_":6,"size_":32,"maxDepth_":8,"label_":"2","responseCounts_":{"2":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":27,"maxDepth_":8,"label_":"3","responseCounts_":{"3":27},"nodeType_":"CLASSIFICATION_LEAF"}}}
      1           0         0           0  {"id_":1,"size_":86,"maxDepth_":10,"responseCounts_":{"1":32,"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.663332,"scoreImprove_":0.351101,"leftNodeSize_":32,"rightNodeSize_":54},"leftChild_":{"id_":2,"size_":32,"maxDepth_":9,"label_":"1","responseCounts_":{"1":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":54,"maxDepth_":9,"responseCounts_":{"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.750000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.497257,"scoreImprove_":0.312231,"leftNodeSize_":29,"rightNodeSize_":25},"leftChild_":{"id_":6,"size_":29,"maxDepth_":8,"label_":"2","responseCounts_":{"2":29},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":25,"maxDepth_":8,"label_":"3","responseCounts_":{"3":25},"nodeType_":"CLASSIFICATION_LEAF"}}}
    4. Retrieve the best parameters.
      >>> rs_obj.best_params_
      {'input_columns': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], 'response_column': 'species', 'tree_type': 'classification', 'ntree': 453, 'tree_size': 100, 'nodesize': 10, 'variance': 0.25, 'max_depth': 10, 'maxnum_categorical': 20, 'mtry': 30, 'mtry_seed': 100, 'seed': 100, 'data': '"ALICE"."ml__select__169836386746840"'}
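    5. Retrieve the sampled data used by the best model: "data" holds the internally sampled training data, and "newdata" holds the internally sampled test data used for evaluation.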
      >>> rs_obj.best_sampled_data_
      [{'data':     
                 sepal_length sepal_width  petal_length petal_width species
      id
      141           6.7          3.1           5.6          2.4        3
      139           6.0          3.0           4.8          1.8        3
      118           7.7          3.8           6.7          2.2        3
      61            5.0          2.0           3.5          1.0        2
      59            6.6          2.9           4.6          1.3        2
      38            4.9          3.6           1.4          0.1        1
      78            6.7          3.0           5.0          1.7        2
      36            5.0          3.2           1.2          0.2        1
      40            5.1          3.4           1.5          0.2        1
      17            5.4          3.9           1.3          0.4        1},
       {'newdata':
                   sepal_length sepal_width  petal_length petal_width species
      id
      5             5.0          3.6           1.4          0.2        1
      58            4.9          2.4           3.3          1.0        2
      77            6.8          2.8           4.8          1.4        2
      99            5.1          2.5           3.0          1.1        2
      28            5.2          3.5           1.5          0.2        1
      89            5.6          3.0           4.1          1.3        2
      106           7.6          3.0           6.6          2.1        3
      125           6.7          3.3           5.7          2.1        3
      70            5.6          2.5           3.9          1.1        2
      43            4.4          3.2           1.3          0.2        1}]
    The identified best model is stored as the default model for subsequent prediction and evaluation operations.
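Because all four models reached the same accuracy, the choice of DECISIONFOREST_0 as the best model depends on how ties are broken. The sketch below shows one selection rule that reproduces the result above: keep the models whose STATUS is PASS and take the highest ACCURACY, preferring the first model on ties. The tie-breaking rule and the pick_best helper are assumptions for illustration; this example does not document how RandomSearch breaks ties.

```python
def pick_best(models, metric="ACCURACY"):
    """Return the PASS record with the highest metric value.

    max() returns the first maximal element, so ties go to the earliest model.
    """
    passed = [m for m in models if m["STATUS"] == "PASS"]
    return max(passed, key=lambda m: m[metric])


# Model metadata as reported by rs_obj.models above (PARAMETERS omitted).
models = [
    {"MODEL_ID": "DECISIONFOREST_0", "DATA_ID": "DF_0", "STATUS": "PASS", "ACCURACY": 0.925926},
    {"MODEL_ID": "DECISIONFOREST_1", "DATA_ID": "DF_0", "STATUS": "PASS", "ACCURACY": 0.925926},
    {"MODEL_ID": "DECISIONFOREST_2", "DATA_ID": "DF_0", "STATUS": "PASS", "ACCURACY": 0.925926},
    {"MODEL_ID": "DECISIONFOREST_3", "DATA_ID": "DF_0", "STATUS": "PASS", "ACCURACY": 0.925926},
]
best = pick_best(models)
print(best["MODEL_ID"], best["DATA_ID"])  # DECISIONFOREST_0 DF_0
```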
  6. Perform prediction on the validation data using the identified best model.
    >>> rs_obj.predict(newdata=iris_val, **eval_params)
    ############ result Output ############
    
       species   id  prediction  confidence_lower  confidence_upper
    0        2   54           2               1.0               1.0
    1        1    8           1               1.0               1.0
    2        1   23           1               1.0               1.0
    3        2   85           2               1.0               1.0
    4        3  138           3               1.0               1.0
    5        1   14           1               1.0               1.0
    6        1   46           1               1.0               1.0
    7        3  107           2               1.0               1.0
    8        2   81           2               1.0               1.0
    9        3  113           3               1.0               1.0
  7. Perform evaluation on the validation data using the best model.
    If the newdata argument is not passed to the evaluate method, the internally sampled test data is used for evaluation instead.
    >>> rs_obj.evaluate(newdata=iris_val, **eval_params)
    ############ output_data Output ############
    
       SeqNum                                              Metric  MetricValue
    0       3  Micro-Recall                                                1.0
    1       5  Macro-Precision                                             1.0
    2       6  Macro-Recall                                                1.0
    3       7  Macro-F1                                                    1.0
    4       9  Weighted-Recall                                             1.0
    5      10  Weighted-F1                                                 1.0
    6       8  Weighted-Precision                                          1.0
    7       4  Micro-F1                                                    1.0
    8       2  Micro-Precision                                             1.0
    9       1  Accuracy                                                    1.0
    
    
    ############ result Output ############
    
           Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
    SeqNum
    2               3  CLASS_3        0        0        5        1.0     1.0  1.0        5
    1               2  CLASS_2        0        5        0        1.0     1.0  1.0        5
    0               1  CLASS_1        5        0        0        1.0     1.0  1.0        5
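The per-class columns of the result table follow the usual definitions: precision is the share of predictions for a class that are correct, recall is the share of rows of that class that are predicted correctly, F1 is their harmonic mean, and Support is the number of rows of that class. The sketch below computes them from a confusion matrix. The orientation cm[actual][predicted] and the class_report helper are assumptions of this sketch, not teradataml's output layout.

```python
def class_report(cm):
    """Per-class precision, recall, F1 and support.

    cm[i][j] counts rows whose actual class is i + 1 and predicted class is j + 1.
    """
    k = len(cm)
    report = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column total
        support = sum(cm[c])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((c + 1, precision, recall, f1, support))
    return report


# Five validation rows per class, all classified correctly, as in the result table above.
cm = [[5, 0, 0],
      [0, 5, 0],
      [0, 0, 5]]
for cls, precision, recall, f1, support in class_report(cm):
    print(f"CLASS_{cls}: precision={precision} recall={recall} f1={f1} support={support}")
```

Macro-averaged metrics are the plain mean of these per-class values, which is why every Macro metric in output_data is also 1.0.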
  8. View the statistics report of all trained models using the model_stats property of the "rs_obj" instance.
    >>> rs_obj.model_stats
               MODEL_ID  ACCURACY  MICRO-PRECISION       WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
    0  DECISIONFOREST_0  0.925926         0.925926                 0.939394         0.925926        0.925
    1  DECISIONFOREST_1  0.925926         0.925926                 0.939394         0.925926        0.925
    2  DECISIONFOREST_2  0.925926         0.925926                 0.939394         0.925926        0.925
    3  DECISIONFOREST_3  0.925926         0.925926                 0.939394         0.925926        0.925
    
    [4 rows x 11 columns]
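In model_stats, MICRO-PRECISION equals ACCURACY for every model. That is expected for single-label classification: each row receives exactly one prediction, so micro-averaged precision (total correct predictions divided by total predictions) reduces to accuracy, and weighted recall equals accuracy for the same reason. Weighted precision can differ because each class's precision is weighted by its support. The sketch below illustrates this with a hypothetical confusion matrix in cm[actual][predicted] orientation; it is not the confusion matrix computed in this run, which the example does not show, and the summarize helper is hypothetical.

```python
def summarize(cm):
    """Accuracy plus micro- and weighted-averaged metrics; cm[actual][predicted]."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(k))
    predicted = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    support = [sum(cm[c]) for c in range(k)]
    precision = [cm[c][c] / predicted[c] if predicted[c] else 0.0 for c in range(k)]
    recall = [cm[c][c] / support[c] if support[c] else 0.0 for c in range(k)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]
    return {
        "ACCURACY": correct / total,
        # Every row gets exactly one prediction, so total predictions == total rows.
        "MICRO-PRECISION": correct / sum(predicted),
        "WEIGHTED-PRECISION": sum(p * s for p, s in zip(precision, support)) / total,
        "WEIGHTED-RECALL": sum(r * s for r, s in zip(recall, support)) / total,
        "WEIGHTED-F1": sum(f * s for f, s in zip(f1, support)) / total,
    }


# Hypothetical 27-row confusion matrix: 2 rows of class 3 predicted as class 2.
cm = [[9, 0, 0],
      [0, 9, 0],
      [0, 2, 7]]
for name, value in summarize(cm).items():
    print(name, round(value, 6))
```

With this matrix the printed values (0.925926, 0.925926, 0.939394, 0.925926, 0.925) equal those in the model_stats rows above, illustrating how weighted precision can exceed accuracy while micro-precision and weighted recall cannot differ from it; the matrix itself is illustrative only.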
  9. Change the default model to another trained model and perform predictions.
    1. Check the ID of the best model, which is the current default model.
      >>> rs_obj.best_model_id
      'DECISIONFOREST_0'
    2. Update the default model of the RandomSearch instance using the set_model method.
      >>> rs_obj.set_model(model_id="DECISIONFOREST_1")
    3. Perform prediction using "DECISIONFOREST_1" model.
      >>> rs_obj.predict(newdata=iris_val.iloc[:5], **eval_params)
      ############ result Output ############
      
         species  id  prediction  confidence_lower  confidence_upper
      0        1  17           1               1.0               1.0
      1        1  26           1               1.0               1.0
      2        1  19           1               1.0               1.0
      3        1  15           1               1.0               1.0
      4        1   6           1               1.0               1.0
  10. Retrieve any trained model from the RandomSearch instance using the get_model method.
    >>> rs_obj.get_model("DECISIONFOREST_3")
    ############ result Output ############
    
       task_index  tree_num  tree_order  classification_tree
    0           1         0           0  {"id_":1,"size_":87,"maxDepth_":6,"responseCounts_":{"1":28,"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.664817,"scoreImprove_":0.328172,"leftNodeSize_":28,"rightNodeSize_":59},"leftChild_":{"id_":2,"size_":28,"maxDepth_":5,"label_":"1","responseCounts_":{"1":28},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":59,"maxDepth_":5,"responseCounts_":{"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.700000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.496409,"scoreImprove_":0.336645,"leftNodeSize_":32,"rightNodeSize_":27},"leftChild_":{"id_":6,"size_":32,"maxDepth_":4,"label_":"2","responseCounts_":{"2":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":27,"maxDepth_":4,"label_":"3","responseCounts_":{"3":27},"nodeType_":"CLASSIFICATION_LEAF"}}}
    1           0         0           0  {"id_":1,"size_":86,"maxDepth_":6,"responseCounts_":{"1":32,"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.663332,"scoreImprove_":0.351101,"leftNodeSize_":32,"rightNodeSize_":54},"leftChild_":{"id_":2,"size_":32,"maxDepth_":5,"label_":"1","responseCounts_":{"1":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":54,"maxDepth_":5,"responseCounts_":{"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.750000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.497257,"scoreImprove_":0.312231,"leftNodeSize_":29,"rightNodeSize_":25},"leftChild_":{"id_":6,"size_":29,"maxDepth_":4,"label_":"2","responseCounts_":{"2":29},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":25,"maxDepth_":4,"label_":"3","responseCounts_":{"3":25},"nodeType_":"CLASSIFICATION_LEAF"}}}
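Steps 9 and 10 separate the default model, which predict and evaluate use, from the other trained models, which remain available by ID. The sketch below models that behavior with a small hypothetical class. ModelSelector and default_model_id are invented for illustration, and the assumption that get_model leaves the default unchanged is part of the sketch rather than something this example states.

```python
class ModelSelector:
    """Hypothetical stand-in for an optimizer that stores trained models by ID
    and routes predict() to a default model (initially the best one)."""

    def __init__(self, models, best_model_id):
        self._models = dict(models)  # model_id -> callable(row) -> prediction
        self._default_id = best_model_id

    @property
    def default_model_id(self):
        return self._default_id

    def set_model(self, model_id):
        # Change which trained model predict() uses.
        if model_id not in self._models:
            raise KeyError(f"unknown model id: {model_id}")
        self._default_id = model_id

    def get_model(self, model_id):
        # Return any trained model; the default model is left unchanged.
        return self._models[model_id]

    def predict(self, rows):
        model = self._models[self._default_id]
        return [model(row) for row in rows]


# Toy "models" that ignore their input, so it is visible which one ran.
selector = ModelSelector({"DECISIONFOREST_0": lambda row: "model 0",
                          "DECISIONFOREST_1": lambda row: "model 1"},
                         best_model_id="DECISIONFOREST_0")
print(selector.predict([{}, {}]))   # ['model 0', 'model 0']
selector.set_model("DECISIONFOREST_1")
print(selector.predict([{}]))       # ['model 1']
selector.get_model("DECISIONFOREST_0")
print(selector.default_model_id)    # DECISIONFOREST_1
```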