teradataml provides the capability to run hyperparameter tuning in parallel for both model trainer and non-model trainer functions using the GridSearch algorithm. This example executes DecisionForest (a model trainer function) and Antiselect (a non-model trainer function) on the admission dataset to demonstrate the parallel execution capability.
- Example setup.
- Load the example dataset.
>>> load_example_data("teradataml", "admission_train")
- Create teradataml DataFrame.
>>> train_df = DataFrame.from_table("admission_train")
- Execute model trainer function DecisionForest.
- Define the hyperparameter space for the DecisionForest function.
>>> # Model training parameters.
>>> model_params = {"input_columns": ['gpa', 'stats', 'programming', 'masters'],
...                 "response_column": 'admitted',
...                 "max_depth": (1, 15, 25, 20),
...                 "num_trees": (5, 15, 50),
...                 "tree_type": 'CLASSIFICATION'}
>>> # Model evaluation parameters.
>>> eval_params = {"id_column": "id", "accumulate": "admitted"}
>>> # Import the model trainer function and optimizer.
>>> from teradataml import DecisionForest, GridSearch
>>> # Initialize the GridSearch optimizer with the model trainer
>>> # function and the parameter space required for model training.
>>> gs_obj = GridSearch(func=DecisionForest, params=model_params)
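Before running fit(), the generated parameter combinations can be inspected. The following is a minimal sketch, assuming the optimizer exposes a get_parameter_grid() helper; with four max_depth values and three num_trees values, twelve combinations are expected, matching the twelve models trained below.
>>> # Inspect the parameter combinations generated from model_params
>>> # (assumes GridSearch provides a get_parameter_grid() helper).
>>> param_grid = gs_obj.get_parameter_grid()
>>> len(param_grid)    # 4 max_depth values x 3 num_trees values = 12 combinations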
- Execute the hyperparameter tuning fit() function. The default setting for run_parallel is True; a sketch with parallel execution disabled follows the output below.
>>> gs_obj.fit(data=train_df, run_parallel=True, verbose=2, evaluation_metric="Micro-f1", **eval_params)
Model_id:DECISIONFOREST_0 - Run time:27.721s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_3 - Run time:27.722s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_1 - Run time:27.722s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_2 - Run time:27.722s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_4 - Run time:28.249s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_5 - Run time:28.249s - Status:PASS - MICRO-F1:0.167
Model_id:DECISIONFOREST_7 - Run time:28.239s - Status:PASS - MICRO-F1:0.167
Model_id:DECISIONFOREST_6 - Run time:28.242s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_9 - Run time:27.638s - Status:PASS - MICRO-F1:0.5
Model_id:DECISIONFOREST_11 - Run time:27.368s - Status:PASS - MICRO-F1:0.167
Model_id:DECISIONFOREST_10 - Run time:27.624s - Status:PASS - MICRO-F1:0.167
Model_id:DECISIONFOREST_8 - Run time:27.661s - Status:PASS - MICRO-F1:0.167
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 12/12
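As referenced above, parallel execution can also be turned off so the parameter combinations run one at a time. A minimal sketch, keeping the other fit() arguments unchanged:
>>> # Train the same parameter combinations sequentially by
>>> # disabling parallel execution (run_parallel=False).
>>> gs_obj.fit(data=train_df, run_parallel=False, verbose=2,
...            evaluation_metric="Micro-f1", **eval_params)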
A different evaluation_metric can be used to train and rank the models during hyperparameter tuning; a sketch appears after the model_stats output below.
- View the results using the models and model_stats properties.
>>> # Trained models can be viewed using the models property.
>>> gs_obj.models
             MODEL_ID DATA_ID                                       PARAMETERS STATUS  MICRO-F1
0    DECISIONFOREST_0    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
1    DECISIONFOREST_3    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
2    DECISIONFOREST_1    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
3    DECISIONFOREST_2    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
4    DECISIONFOREST_4    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
5    DECISIONFOREST_5    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.166667
6    DECISIONFOREST_7    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.166667
7    DECISIONFOREST_6    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
8    DECISIONFOREST_9    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.500000
9   DECISIONFOREST_11    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.166667
10  DECISIONFOREST_10    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.166667
11   DECISIONFOREST_8    DF_0  {'input_columns': ['gpa', 'stats', 'programmin   PASS  0.166667
>>> # Additional performance metrics can be viewed using the model_stats property.
>>> gs_obj.model_stats
             MODEL_ID  ACCURACY  MICRO-PRECISION  MICRO-RECALL  MICRO-F1  MACRO-PRECISION  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0    DECISIONFOREST_0  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
1    DECISIONFOREST_3  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
2    DECISIONFOREST_1  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
3    DECISIONFOREST_2  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
4    DECISIONFOREST_4  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
5    DECISIONFOREST_5  0.166667         0.166667      0.166667  0.166667         0.083333           0.5  0.142857            0.027778         0.166667     0.047619
6    DECISIONFOREST_7  0.166667         0.166667      0.166667  0.166667         0.083333           0.5  0.142857            0.027778         0.166667     0.047619
7    DECISIONFOREST_6  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
8    DECISIONFOREST_9  0.500000         0.500000      0.500000  0.500000         0.625000           0.7  0.485714            0.875000         0.500000     0.542857
9   DECISIONFOREST_11  0.166667         0.166667      0.166667  0.166667         0.083333           0.5  0.142857            0.027778         0.166667     0.047619
10  DECISIONFOREST_10  0.166667         0.166667      0.166667  0.166667         0.083333           0.5  0.142857            0.027778         0.166667     0.047619
11   DECISIONFOREST_8  0.166667         0.166667      0.166667  0.166667         0.083333           0.5  0.142857            0.027778         0.166667     0.047619
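As noted above, the models can be ranked by a different metric. The following is a minimal sketch of re-running the tuning with another evaluation_metric and inspecting the winning combination; it assumes 'Macro-F1' is an accepted metric name (it appears as a column in model_stats above) and that the optimizer exposes best_model_id and best_params_ attributes.
>>> # Re-run hyperparameter tuning, ranking models by a different metric
>>> # ('Macro-F1' is assumed to be an accepted evaluation_metric value).
>>> gs_obj.fit(data=train_df, verbose=2, evaluation_metric="Macro-F1", **eval_params)
>>> # Identify the best model and its parameter combination
>>> # (assumes best_model_id and best_params_ are available on the optimizer).
>>> gs_obj.best_model_id
>>> gs_obj.best_params_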
- Execute non-model trainer function Antiselect.
- Define the parameter space for Antiselect function.
>>> # Define the non-model trainer function parameter space.
>>> params = {"data": train_df,
...           "exclude": (['stats', 'programming', 'masters'],
...                       ['id', 'admitted'],
...                       ['admitted', 'gpa', 'stats'],
...                       ['masters'],
...                       ['admitted', 'gpa', 'stats', 'programming'])}
>>> # Import the non-model trainer function and optimizer.
>>> from teradataml import Antiselect, GridSearch
>>> # Initialize the GridSearch optimizer with the non-model trainer
>>> # function and the parameter space required for execution.
>>> gs_obj = GridSearch(func=Antiselect, params=params)
- Execute hyperparameter tuning with Antiselect in parallel.
>>> # Execute Antiselect in parallel.
>>> gs_obj.fit(verbose=2)
Model_id:ANTISELECT_3 - Run time:5.878s - Status:PASS
Model_id:ANTISELECT_2 - Run time:5.882s - Status:PASS
Model_id:ANTISELECT_1 - Run time:5.882s - Status:PASS
Model_id:ANTISELECT_0 - Run time:5.883s - Status:PASS
Model_id:ANTISELECT_4 - Run time:4.402s - Status:PASS
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 5/5
- View the non-model trainer function execution metadata.
>>> # Retrieve the model metadata of the "gs_obj" instance.
>>> gs_obj.models
       MODEL_ID                                       PARAMETERS STATUS
0  ANTISELECT_3  {'data': '"ALICE"."ml__select__170973834455250   PASS
1  ANTISELECT_2  {'data': '"ALICE"."ml__select__170973834455250   PASS
2  ANTISELECT_1  {'data': '"ALICE"."ml__select__170973834455250   PASS
3  ANTISELECT_0  {'data': '"ALICE"."ml__select__170973834455250   PASS
4  ANTISELECT_4  {'data': '"ALICE"."ml__select__170973834455250   PASS
All the properties, arguments, and functions shown in the previous examples are also applicable here, for both model and non-model trainer functions.
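For example, the output of an individual run can be retrieved by its model identifier. This is a minimal sketch, assuming the optimizer exposes a get_model() helper that accepts the MODEL_ID values listed above and that the returned Antiselect object provides a result attribute.
>>> # Retrieve the output of a single Antiselect execution by its model ID
>>> # (assumes get_model() accepts the MODEL_ID values shown in gs_obj.models).
>>> antiselect_out = gs_obj.get_model("ANTISELECT_0")
>>> antiselect_out.result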