teradataml provides the capability to run hyperparameter tuning in parallel for both model trainer and non-model trainer functions using the RandomSearch algorithm. This example executes DecisionForest (a model trainer function) and Antiselect (a non-model trainer function) on the admission dataset.
In this example, the teradataml admission example data is used to demonstrate the parallel execution capability.
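Before walking through the teradataml calls, it may help to see what random search does conceptually. The sketch below is plain Python (not the teradataml implementation): it draws `n_iter` distinct hyperparameter combinations at random from the Cartesian product of candidate values, which mirrors how tuple-valued entries in the parameter space are sampled. The parameter names and `seed` are illustrative assumptions.

```python
import itertools
import random

# Hypothetical candidate values, mirroring the tuple-valued entries
# passed to RandomSearch later in this example.
param_space = {
    "max_depth": (1, 15, 25, 20),
    "num_trees": (5, 15, 50),
}

def random_search_combinations(space, n_iter, seed=42):
    """Draw n_iter distinct parameter combinations at random
    from the Cartesian product of the candidate values."""
    keys = list(space)
    all_combos = [dict(zip(keys, values))
                  for values in itertools.product(*space.values())]
    rng = random.Random(seed)
    return rng.sample(all_combos, min(n_iter, len(all_combos)))

combos = random_search_combinations(param_space, n_iter=5)
```

Each sampled combination corresponds to one candidate model trained during the fit step.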
- Example setup.
- Load the example dataset.
>>> load_example_data("teradataml", "admission_train")
- Create teradataml DataFrame.
>>> df = DataFrame.from_table("admission_train")
- Identify and transform distinct categorical values into numerical values from the input using Ordinal Encoding.
>>> ordinal_fit = OrdinalEncodingFit(data=df, target_column=['stats','programming','masters'])
>>> ordinal_transform = OrdinalEncodingTransform(data=df, object=ordinal_fit, accumulate=['id','admitted','gpa'])
>>> df = ordinal_transform.result
>>> target_col='admitted'
>>> columns =['gpa', 'stats', 'programming', 'masters']
- Scale the data.
>>> scale_fit = ScaleFit(data=df, target_columns=columns, scale_method="STD")
>>> scale_transform = ScaleTransform(data=df, object=scale_fit.output, accumulate=["id", "admitted"])
- Sample the data.
>>> train_val_sample = scale_transform.result.sample(frac=[0.8, 0.2])
- Create train and test data.
>>> train_df = train_val_sample[train_val_sample.sampleid == 1].drop("sampleid", axis = 1)
>>> test_df = train_val_sample[train_val_sample.sampleid == 2].drop("sampleid", axis = 1)
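The `sample(frac=[0.8, 0.2])` call above assigns each row a `sampleid` of 1 or 2, and the two filters split the data 80/20. As a hedged plain-Python sketch of the same idea (not the teradataml implementation), the `seed` and helper name are illustrative:

```python
import random

def train_test_split(rows, frac_train=0.8, seed=7):
    """Shuffle rows and split into train/test partitions,
    mirroring sample(frac=[0.8, 0.2]) followed by sampleid filters."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac_train)
    return shuffled[:cut], shuffled[cut:]

train_ids, test_ids = train_test_split(list(range(100)))
```

The two partitions are disjoint and together cover the input, just as the `sampleid == 1` and `sampleid == 2` filters do.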
- Execute model trainer function DecisionForest.
- Define hyperparameter tuning for DecisionForest function.
>>> # Model training parameters
>>> model_params = {"input_columns": ['gpa', 'stats', 'programming', 'masters'],
...                 "response_column": 'admitted',
...                 "max_depth": (1, 15, 25, 20),
...                 "num_trees": (5, 15, 50),
...                 "tree_type": 'CLASSIFICATION'}
>>> # Model evaluation parameters
>>> eval_params = {"id_column": "id", "accumulate": "admitted"}
>>> # Import model trainer and optimizer.
>>> from teradataml import DecisionForest, RandomSearch
>>> # Initialize the RandomSearch optimizer with the model trainer
>>> # function and the parameter space required for model training.
>>> rs_obj = RandomSearch(func=DecisionForest, params=model_params, n_iter=5)
- Execute the hyperparameter fit function. The default setting for run_parallel is True; that is, by default, hyperparameter tuning runs in parallel.
>>> rs_obj.fit(data=train_df, verbose=2, run_parallel=True, **eval_params)
Model_id:DECISIONFOREST_2 - Run time:29.327s - Status:PASS - ACCURACY:0.833
Model_id:DECISIONFOREST_3 - Run time:29.451s - Status:PASS - ACCURACY:0.833
Model_id:DECISIONFOREST_0 - Run time:29.454s - Status:PASS - ACCURACY:0.833
Model_id:DECISIONFOREST_1 - Run time:29.453s - Status:PASS - ACCURACY:0.833
Model_id:DECISIONFOREST_4 - Run time:16.397s - Status:PASS - ACCURACY:0.667
Completed: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿| 100% - 5/5
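With `run_parallel=True`, each sampled hyperparameter combination is trained concurrently rather than one after another, which is why the models above complete at similar wall-clock times. A hedged plain-Python sketch of the idea (not the teradataml implementation) using a worker pool, with the `train_one` stand-in and its fixed accuracy being illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for training one model with one
# hyperparameter combination; returns (model_id, accuracy).
def train_one(i):
    return (f"DECISIONFOREST_{i}", 0.8)

# run_parallel=True behaves like submitting each combination
# to a worker pool and collecting every result.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_one, range(5)))
```

In the real fit call the workers issue SQL to the database, so parallelism is bounded by database resources, not just client threads.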
Different evaluation_metric values can be used for training different models in hyperparameter tuning.
- View the results using models and model_stats properties.
>>> # Trained models can be viewed using the models property.
>>> rs_obj.models
           MODEL_ID DATA_ID                                         PARAMETERS STATUS  ACCURACY
0  DECISIONFOREST_2    DF_0  {'input_columns': ['gpa', 'stats', 'programmin...   PASS  0.833333
1  DECISIONFOREST_3    DF_0  {'input_columns': ['gpa', 'stats', 'programmin...   PASS  0.833333
2  DECISIONFOREST_0    DF_0  {'input_columns': ['gpa', 'stats', 'programmin...   PASS  0.833333
3  DECISIONFOREST_1    DF_0  {'input_columns': ['gpa', 'stats', 'programmin...   PASS  0.833333
4  DECISIONFOREST_4    DF_0  {'input_columns': ['gpa', 'stats', 'programmin...   PASS  0.666667
>>> # Additional performance metrics can be viewed using the model_stats property.
>>> rs_obj.model_stats
           MODEL_ID  ACCURACY  MICRO-PRECISION  MICRO-RECALL  MICRO-F1  MACRO-PRECISION  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0  DECISIONFOREST_2  0.833333         0.833333      0.833333  0.833333         0.833333         0.875  0.828571            0.888889         0.833333     0.838095
1  DECISIONFOREST_3  0.833333         0.833333      0.833333  0.833333         0.833333         0.875  0.828571            0.888889         0.833333     0.838095
2  DECISIONFOREST_0  0.833333         0.833333      0.833333  0.833333         0.833333         0.875  0.828571            0.888889         0.833333     0.838095
3  DECISIONFOREST_1  0.833333         0.833333      0.833333  0.833333         0.833333         0.875  0.828571            0.888889         0.833333     0.838095
4  DECISIONFOREST_4  0.666667         0.666667      0.666667  0.666667         0.625000         0.625  0.625000            0.666667         0.666667     0.666667
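A common next step after inspecting these tables is selecting the best-performing model. As a hedged plain-Python sketch (not the teradataml API), this picks the row with the highest ACCURACY from data mimicking the models output above; the `models` list here is illustrative:

```python
# Hypothetical rows mimicking the models property output above.
models = [
    {"MODEL_ID": "DECISIONFOREST_2", "ACCURACY": 0.833333},
    {"MODEL_ID": "DECISIONFOREST_3", "ACCURACY": 0.833333},
    {"MODEL_ID": "DECISIONFOREST_0", "ACCURACY": 0.833333},
    {"MODEL_ID": "DECISIONFOREST_1", "ACCURACY": 0.833333},
    {"MODEL_ID": "DECISIONFOREST_4", "ACCURACY": 0.666667},
]

# Select the row with the highest ACCURACY; ties resolve to the
# first row encountered.
best = max(models, key=lambda m: m["ACCURACY"])
```

teradataml's search optimizers also expose best-model accessors directly on the fitted object, so in practice the selection above is done for you.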
- Execute non-model trainer function Antiselect.
- Define the parameter space for Antiselect function.
>>> # Define the non-model trainer function parameter space.
>>> params = {"data": train_df,
...           "exclude": (['stats', 'programming', 'masters'],
...                       ['id', 'admitted'],
...                       ['admitted', 'gpa', 'stats'],
...                       ['masters'],
...                       ['admitted', 'gpa', 'stats', 'programming'])}
>>> # Import the non-model trainer function and optimizer.
>>> from teradataml import Antiselect, RandomSearch
>>> # Initialize the RandomSearch optimizer with the non-model trainer
>>> # function and the parameter space required for non-model training.
>>> rs_obj = RandomSearch(func=Antiselect, params=params, n_iter=3)
- Execute hyperparameter tuning with Antiselect in parallel. The default setting for run_parallel is True.
>>> # Fit Antiselect in parallel.
>>> rs_obj.fit(verbose=2)
       MODEL_ID                                         PARAMETERS STATUS
0  ANTISELECT_1  {'data': '"ALICE"."ml__select__170983718572642...   PASS
1  ANTISELECT_2  {'data': '"ALICE"."ml__select__170983718572642...   PASS
2  ANTISELECT_0  {'data': '"ALICE"."ml__select__170983718572642...   PASS
- View the non-model trainer function execution metadata.
>>> # Retrieve the model metadata of the "rs_obj" instance.
>>> rs_obj.models
       MODEL_ID                                         PARAMETERS STATUS
0  ANTISELECT_1  {'data': '"ALICE"."ml__select__170983718572642...   PASS
1  ANTISELECT_2  {'data': '"ALICE"."ml__select__170983718572642...   PASS
2  ANTISELECT_0  {'data': '"ALICE"."ml__select__170983718572642...   PASS
All the properties, arguments, and functions shown in the previous examples also apply here, for both model and non-model trainer functions.