This example shows time-based early stopping for the XGBoost model trainer function.
- Define hyperparameter space and GridSearch with XGBoost.
- Define model training parameters.
>>> model_params = {"input_columns": ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
...                 "response_column": 'species',
...                 "max_depth": (5, 10, 15),
...                 "lambda1": (1000.0, 0.001),
...                 "model_type": "Classification",
...                 "seed": 32,
...                 "shrinkage_factor": 0.1,
...                 "iter_num": (5, 50)}
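A minimal sketch of how a parameter space like this expands into a grid of candidate models, assuming (as the later output suggests with "4/12") that tuple-valued entries define the search dimensions while scalar and list entries stay fixed. The dictionary below repeats only the tuple-valued parameters from model_params; `grid` is a hypothetical stand-in for the candidates GridSearch enumerates.

```python
from itertools import product

# Tuple-valued parameters from model_params; 3 * 2 * 2 = 12 combinations.
search_space = {
    "max_depth": (5, 10, 15),
    "lambda1": (1000.0, 0.001),
    "iter_num": (5, 50),
}

# Cartesian product over the tuple-valued dimensions, one dict per candidate model.
grid = [dict(zip(search_space, values)) for values in product(*search_space.values())]
print(len(grid))  # 12
print(grid[0])    # {'max_depth': 5, 'lambda1': 1000.0, 'iter_num': 5}
```

This is why the progress bar later reports 12 total models for this parameter space.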
- Define evaluation parameters.
>>> eval_params = {"id_column": "id",
...                "accumulate": "species",
...                "model_type": 'Classification',
...                "object_order_column": ['task_index', 'tree_num', 'iter', 'class_num', 'tree_order']}
- Import model trainer function and optimizer.
>>> from teradataml import XGBoost, GridSearch
- Initialize the GridSearch optimizer with model trainer function and parameter space required for model training.
>>> gs_obj = GridSearch(func=XGBoost, params=model_params)
- Execute hyperparameter tuning with a maximum run time. This step fits the hyperparameters in parallel, with the max_time argument set to 30 seconds.
>>> gs_obj.fit(data=data, max_time=30, verbose=2, **eval_params)
Model_id:XGBOOST_2 - Run time:33.277s - Status:PASS - ACCURACY:0.933
Model_id:XGBOOST_3 - Run time:33.276s - Status:PASS - ACCURACY:0.933
Model_id:XGBOOST_0 - Run time:33.279s - Status:PASS - ACCURACY:0.967
Model_id:XGBOOST_1 - Run time:33.278s - Status:PASS - ACCURACY:0.933
Computing: |⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾⫾| 33% - 4/12
As shown in the output, four models are trained because the maximum number of models trained in parallel is set to 4.
The remaining models are skipped because the maximum time allowed for hyperparameter tuning is reached.
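The time-budget behavior can be illustrated with a minimal sketch, assuming a simplified sequential loop: each candidate is trained only if the budget has not yet been exhausted, and the rest are marked SKIP. `timed_grid_search` and `train_fn` are hypothetical names for illustration; the real GridSearch trains models in parallel.

```python
import time

def timed_grid_search(candidates, train_fn, max_time):
    """Train candidates until the time budget runs out, then skip the rest."""
    start = time.monotonic()
    results = {}
    for model_id, params in candidates:
        if time.monotonic() - start > max_time:
            results[model_id] = ("SKIP", None)          # budget exceeded, not trained
        else:
            results[model_id] = ("PASS", train_fn(params))
    return results

# Usage: a fake trainer taking ~0.02 s per model against a ~0.03 s budget,
# so early candidates PASS and later ones are skipped.
cands = [(f"XGBOOST_{i}", {"max_depth": d}) for i, d in enumerate([5, 10, 15])]
res = timed_grid_search(cands, lambda p: time.sleep(0.02) or 0.9, max_time=0.03)
```

Skipped candidates carry no score, which is why the corresponding rows in the metadata below show NaN metrics.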
- View hyperparameter tuning model metadata using the models and model_stats properties.
- View trained models using the models property.
>>> gs_obj.models
      MODEL_ID DATA_ID                                         PARAMETERS STATUS  ACCURACY
0    XGBOOST_2    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   PASS  0.933333
1    XGBOOST_4    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
2    XGBOOST_5    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
3    XGBOOST_6    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
4    XGBOOST_7    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
5    XGBOOST_8    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
6    XGBOOST_9    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
7   XGBOOST_10    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
8   XGBOOST_11    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   SKIP       NaN
9    XGBOOST_3    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   PASS  0.933333
10   XGBOOST_0    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   PASS  0.966667
11   XGBOOST_1    DF_0  {'input_columns': ['sepal_length', 'sepal_widt...   PASS  0.933333
The status "SKIP" indicates that the model was not trained because the maximum time limit was reached.
- View additional performance metrics using the model_stats property.
>>> gs_obj.model_stats
      MODEL_ID  ACCURACY  MICRO-PRECISION  MICRO-RECALL  MICRO-F1  MACRO-PRECISION  MACRO-RECALL  MACRO-F1  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0    XGBOOST_3     1.000            1.000         1.000     1.000            1.000         1.000     1.000               1.000            1.000        1.000
1    XGBOOST_4       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
2    XGBOOST_5       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
3    XGBOOST_6       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
4    XGBOOST_7       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
5    XGBOOST_8       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
6    XGBOOST_9       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
7   XGBOOST_10       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
8   XGBOOST_11       NaN              NaN           NaN       NaN              NaN           NaN       NaN                 NaN              NaN          NaN
9    XGBOOST_2     1.000            1.000         1.000     1.000            1.000         1.000     1.000               1.000            1.000        1.000
10   XGBOOST_1     0.967            0.967         0.967     0.967            0.972         0.933     0.948               0.969            0.967        0.966
11   XGBOOST_0     0.967            0.967         0.967     0.967            0.972         0.933     0.948               0.969            0.967        0.966