This example performs hyperparameter tuning on the DecisionForest model trainer function for a classification task. It creates an optimized DecisionForest classification model using the RandomSearch optimization algorithm, and the best model identified by RandomSearch is then used to classify iris flowers.
The teradataml example iris data is used to build the DecisionForest classification model.
- Example Setup.
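- Load example data and create a teradataml DataFrame from it.
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("byom", "iris_input")
>>> iris_input = DataFrame.from_table("iris_input")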
- Create two samples of input data: sample 1 has 90% of total rows and sample 2 has 10% of total rows.
>>> iris_sample = iris_input.sample(frac=[0.9, 0.1])
- Create train dataset from sample 1 by filtering on "sampleid" and drop "sampleid" column as it is not required for training model.
>>> iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
- Create validation dataset from sample 2 by filtering on "sampleid" and drop "sampleid" column as it is not required for scoring.
>>> iris_val = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
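To make the split explicit, the following standalone sketch mimics the effect of sampling with fractions 0.9 and 0.1 and then filtering on the sample id. It uses plain Python lists rather than a teradataml DataFrame; the helper name split_by_fraction and the fixed seed are illustrative assumptions, not part of teradataml.

```python
import random


def split_by_fraction(rows, fractions, seed=0):
    """Shuffle rows and cut them into consecutive samples.

    Sample k (1-based, like "sampleid") receives round(len(rows) * fractions[k-1])
    rows. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    samples = {}
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        count = round(len(rows) * frac)
        samples[sample_id] = shuffled[start:start + count]
        start += count
    return samples


rows = list(range(1, 151))              # stand-in for the 150 iris row ids
samples = split_by_fraction(rows, [0.9, 0.1])
train_ids = samples[1]                  # plays the role of sampleid == 1
val_ids = samples[2]                    # plays the role of sampleid == 2
```

The two samples are disjoint, so no validation row is seen during training.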
- Define the parameter space and use RandomSearch for hyperparameter tuning.
- Define parameter space for model training.
>>> import random
>>> params = {"input_columns":["sepal_length", "sepal_width", "petal_length", "petal_width"],
...           "response_column":"species",
...           "tree_type":"classification",
...           "ntree":tuple(set(round(random.uniform(20, 500)) for i in range(50))),
...           "tree_size":(100, 200),
...           "nodesize":10,
...           "variance":tuple(set(round(random.random(), 2) for i in range(20))),
...           "max_depth":tuple(set(round(random.uniform(2, 20)) for i in range(6))),
...           "maxnum_categorical":20,
...           "mtry":30,
...           "mtry_seed":100,
...           "seed":100}
- Define required argument for model prediction and evaluation.
>>> eval_params = {"id_column": "id", "accumulate": "species"}
- Import trainer function and optimizer.
>>> from teradataml import DecisionForest, RandomSearch
- Initialize the RandomSearch optimizer with model trainer function and parameter space required for model training.
>>> rs_obj = RandomSearch(func=DecisionForest, params=params, n_iter=4)
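As a rough illustration of what a random search does with such a parameter space, the sketch below draws n_iter combinations, treating tuple values as candidate sets and passing every other value through unchanged. This matches what the best_params_ output of this example suggests, but it is a simplified stand-in and not the teradataml implementation. The helper name draw_combinations and the small space dictionary are assumptions.

```python
import random


def draw_combinations(params, n_iter, seed=None):
    """Draw n_iter parameter combinations from a parameter space.

    Tuple values are treated as candidates to choose from; lists and scalars
    are kept as fixed values. Hypothetical helper; a real optimizer may
    additionally avoid drawing the same combination twice.
    """
    rng = random.Random(seed)
    combos = []
    for _ in range(n_iter):
        combo = {
            name: rng.choice(value) if isinstance(value, tuple) else value
            for name, value in params.items()
        }
        combos.append(combo)
    return combos


space = {
    "input_columns": ["sepal_length", "sepal_width"],  # list: fixed value
    "tree_type": "classification",                     # scalar: fixed value
    "tree_size": (100, 200),                           # tuple: tuned
    "max_depth": (4, 6, 10),                           # tuple: illustrative candidates
    "nodesize": 10,                                    # scalar: fixed value
}
combos = draw_combinations(space, n_iter=4, seed=100)
```

Each element of combos is a complete set of keyword arguments for one training run.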
- Perform model optimization for the DecisionForest function. Model optimization is initiated using the fit method.
- Pass a single DataFrame for model training. Evaluation and prediction arguments are passed along with the training DataFrame.
Parallel execution mode is disabled, and an early-stop criterion is set.
>>> rs_obj.fit(data=iris_train, run_parallel=False, early_stop=0.93, **eval_params)
All model trainings have passed. In case of failure, use the get_error_log method to retrieve the corresponding error logs.
Model optimization stops once the early_stop criterion is met. Otherwise, hyperparameter tuning is performed for the specified number of iterations.
- View trained model metadata from hyperparameter tuning using the models property. Retrieve the model metadata of the "rs_obj" instance.
>>> rs_obj.models
           MODEL_ID  DATA_ID                                      PARAMETERS  STATUS  ACCURACY
0  DECISIONFOREST_0     DF_0  {'input_columns': ['sepal_length', 'sepal_widt    PASS  0.925926
1  DECISIONFOREST_1     DF_0  {'input_columns': ['sepal_length', 'sepal_widt    PASS  0.925926
2  DECISIONFOREST_2     DF_0  {'input_columns': ['sepal_length', 'sepal_widt    PASS  0.925926
3  DECISIONFOREST_3     DF_0  {'input_columns': ['sepal_length', 'sepal_widt    PASS  0.925926
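The early-stop behavior of a sequential run can be sketched as a loop that evaluates candidates one by one and stops as soon as a score reaches the threshold. In the run shown above, every model scored 0.925926, which is below the threshold of 0.93 and consistent with all four iterations being trained. The helper name tune_with_early_stop and the scores in fake_scores are made up for illustration.

```python
def tune_with_early_stop(candidates, score_fn, early_stop=None):
    """Evaluate candidates in order; stop once a score reaches early_stop.

    Returns a list of (model_index, candidate, score) for every candidate
    that was actually evaluated. Hypothetical helper, sequential mode only.
    """
    results = []
    for model_index, candidate in enumerate(candidates):
        score = score_fn(candidate)
        results.append((model_index, candidate, score))
        if early_stop is not None and score >= early_stop:
            break
    return results


# Made-up accuracies keyed by max_depth, for illustration only.
fake_scores = {4: 0.90, 6: 0.95, 10: 0.92}
candidates = [{"max_depth": 4}, {"max_depth": 6}, {"max_depth": 10}]


def score(candidate):
    return fake_scores[candidate["max_depth"]]


stopped = tune_with_early_stop(candidates, score, early_stop=0.93)
full = tune_with_early_stop(candidates, score)
```

With the threshold set, the third candidate is never trained; without it, all candidates are evaluated.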
- View the best model and corresponding information identified by RandomSearch.
- Retrieve the best model id identified by "rs_obj" instance.
>>> rs_obj.best_model_id
'DECISIONFOREST_0'
- Retrieve the best data id.
>>> rs_obj.best_data_id
'DF_0'
- Retrieve the best model of "rs_obj" instance.
>>> rs_obj.best_model
############ result Output ############

   task_index  tree_num  tree_order  classification_tree
0           1         0           0  {"id_":1,"size_":87,"maxDepth_":10,"responseCounts_":{"1":28,"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.664817,"scoreImprove_":0.328172,"leftNodeSize_":28,"rightNodeSize_":59},"leftChild_":{"id_":2,"size_":28,"maxDepth_":9,"label_":"1","responseCounts_":{"1":28},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":59,"maxDepth_":9,"responseCounts_":{"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.700000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.496409,"scoreImprove_":0.336645,"leftNodeSize_":32,"rightNodeSize_":27},"leftChild_":{"id_":6,"size_":32,"maxDepth_":8,"label_":"2","responseCounts_":{"2":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":27,"maxDepth_":8,"label_":"3","responseCounts_":{"3":27},"nodeType_":"CLASSIFICATION_LEAF"}}}
1           0         0           0  {"id_":1,"size_":86,"maxDepth_":10,"responseCounts_":{"1":32,"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.663332,"scoreImprove_":0.351101,"leftNodeSize_":32,"rightNodeSize_":54},"leftChild_":{"id_":2,"size_":32,"maxDepth_":9,"label_":"1","responseCounts_":{"1":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":54,"maxDepth_":9,"responseCounts_":{"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.750000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.497257,"scoreImprove_":0.312231,"leftNodeSize_":29,"rightNodeSize_":25},"leftChild_":{"id_":6,"size_":29,"maxDepth_":8,"label_":"2","responseCounts_":{"2":29},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":25,"maxDepth_":8,"label_":"3","responseCounts_":{"3":25},"nodeType_":"CLASSIFICATION_LEAF"}}}
- Retrieve the best parameters.
>>> rs_obj.best_params_
{'input_columns': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], 'response_column': 'species', 'tree_type': 'classification', 'ntree': 453, 'tree_size': 100, 'nodesize': 10, 'variance': 0.25, 'max_depth': 10, 'maxnum_categorical': 20, 'mtry': 30, 'mtry_seed': 100, 'seed': 100, 'data': '"ALICE"."ml__select__169836386746840"'}
- Retrieve the best sampled data.
>>> rs_obj.best_sampled_data_
[{'data':      sepal_length  sepal_width  petal_length  petal_width  species
id
141           6.7          3.1           5.6          2.4        3
139           6.0          3.0           4.8          1.8        3
118           7.7          3.8           6.7          2.2        3
61            5.0          2.0           3.5          1.0        2
59            6.6          2.9           4.6          1.3        2
38            4.9          3.6           1.4          0.1        1
78            6.7          3.0           5.0          1.7        2
36            5.0          3.2           1.2          0.2        1
40            5.1          3.4           1.5          0.2        1
17            5.4          3.9           1.3          0.4        1},
 {'newdata':      sepal_length  sepal_width  petal_length  petal_width  species
id
5             5.0          3.6           1.4          0.2        1
58            4.9          2.4           3.3          1.0        2
77            6.8          2.8           4.8          1.4        2
99            5.1          2.5           3.0          1.1        2
28            5.2          3.5           1.5          0.2        1
89            5.6          3.0           4.1          1.3        2
106           7.6          3.0           6.6          2.1        3
125           6.7          3.3           5.7          2.1        3
70            5.6          2.5           3.9          1.1        2
43            4.4          3.2           1.3          0.2        1}]
The identified best model is stored as the default model for future prediction and evaluation operations.
- Perform prediction on validation data using the identified best model.
>>> rs_obj.predict(newdata=iris_val, **eval_params)
############ result Output ############

   species   id  prediction  confidence_lower  confidence_upper
0        2   54           2               1.0               1.0
1        1    8           1               1.0               1.0
2        1   23           1               1.0               1.0
3        2   85           2               1.0               1.0
4        3  138           3               1.0               1.0
5        1   14           1               1.0               1.0
6        1   46           1               1.0               1.0
7        3  107           2               1.0               1.0
8        2   81           2               1.0               1.0
9        3  113           3               1.0               1.0
- Perform evaluation on the validation data using the best model. If validation data is not passed to the evaluate method, internally sampled test data is used for evaluation instead.
>>> rs_obj.evaluate(newdata=iris_val, **eval_params)
############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall          1.0
1       5     Macro-Precision          1.0
2       6        Macro-Recall          1.0
3       7            Macro-F1          1.0
4       9     Weighted-Recall          1.0
5      10         Weighted-F1          1.0
6       8  Weighted-Precision          1.0
7       4            Micro-F1          1.0
8       2     Micro-Precision          1.0
9       1            Accuracy          1.0

############ result Output ############

        Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  Precision  Recall   F1  Support
SeqNum
2                3  CLASS_3        0        0        5        1.0     1.0  1.0        5
1                2  CLASS_2        0        5        0        1.0     1.0  1.0        5
0                1  CLASS_1        5        0        0        1.0     1.0  1.0        5
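The per-class figures in the result output can be reproduced from the confusion counts with a few lines of plain Python. The sketch assumes that each row of the matrix holds the counts for one predicted class and each column the counts for one actual class; for the diagonal matrix of this example the orientation does not change the result. The helper name per_class_metrics is hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Compute precision, recall, F1 and support per class.

    confusion[i][j] is assumed to be the number of rows predicted as
    labels[i] whose actual class is labels[j].
    """
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = confusion[i][i]
        predicted = sum(confusion[i])                  # row total
        actual = sum(row[i] for row in confusion)      # column total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return metrics


labels = ["CLASS_1", "CLASS_2", "CLASS_3"]
# Confusion counts taken from the result output of this example.
confusion = [[5, 0, 0],
             [0, 5, 0],
             [0, 0, 5]]
metrics = per_class_metrics(confusion, labels)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(labels))) / total
```

With all 15 validation rows on the diagonal, every per-class metric and the overall accuracy evaluate to 1.0, matching the output above.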
- View all trained model stats report. Retrieve the model stats of "rs_obj" instance.
>>> rs_obj.model_stats
           MODEL_ID  ACCURACY  MICRO-PRECISION  WEIGHTED-PRECISION  WEIGHTED-RECALL  WEIGHTED-F1
0  DECISIONFOREST_0  0.925926         0.925926            0.939394         0.925926        0.925
1  DECISIONFOREST_1  0.925926         0.925926            0.939394         0.925926        0.925
2  DECISIONFOREST_2  0.925926         0.925926            0.939394         0.925926        0.925
3  DECISIONFOREST_3  0.925926         0.925926            0.939394         0.925926        0.925

[4 rows x 11 columns]
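One way to relate the reported best model id to these statistics is to pick the entry with the highest accuracy. All four models tie here, so the sketch below assumes that the earliest entry wins on ties. That assumption is consistent with 'DECISIONFOREST_0' being reported as best, but the source does not confirm it as the rule RandomSearch applies. The helper name pick_best is hypothetical.

```python
# Accuracy values copied from the model_stats output of this example.
model_stats = [
    ("DECISIONFOREST_0", 0.925926),
    ("DECISIONFOREST_1", 0.925926),
    ("DECISIONFOREST_2", 0.925926),
    ("DECISIONFOREST_3", 0.925926),
]


def pick_best(stats):
    """Return the model id with the highest accuracy.

    Python's max() returns the first maximal item, so the earliest entry
    wins on ties (an assumed tie-break, for illustration only).
    """
    best_id, _ = max(stats, key=lambda item: item[1])
    return best_id


best_model_id = pick_best(model_stats)
```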
- Update default model with other trained model and perform predictions.
- Find the best model which is considered as default model.
>>> rs_obj.best_model_id
'DECISIONFOREST_0'
- Update the default trained model of RandomSearch instance using set_model method.
>>> rs_obj.set_model(model_id="DECISIONFOREST_1")
- Perform prediction using "DECISIONFOREST_1" model.
>>> rs_obj.predict(newdata=iris_val.iloc[:5], **eval_params)
############ result Output ############

   species  id  prediction  confidence_lower  confidence_upper
0        1  17           1               1.0               1.0
1        1  26           1               1.0               1.0
2        1  19           1               1.0               1.0
3        1  15           1               1.0               1.0
4        1   6           1               1.0               1.0
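The default-model pattern used in this step can be pictured with a small stand-in class: the tuner keeps every trained model, starts with the best one as the default, and routes predictions through whichever model is currently the default. The class DefaultModelHolder and its toy models are hypothetical. Whether teradataml keeps best_model_id unchanged after set_model is an assumption of this sketch.

```python
class DefaultModelHolder:
    """Toy stand-in for an optimizer's model bookkeeping (hypothetical)."""

    def __init__(self, models, best_model_id):
        self._models = dict(models)        # model id -> callable model
        self.best_model_id = best_model_id
        self._default_id = best_model_id   # best model is the initial default

    def set_model(self, model_id):
        """Make another trained model the default for predictions."""
        if model_id not in self._models:
            raise KeyError(f"unknown model id: {model_id}")
        self._default_id = model_id

    def get_model(self, model_id):
        """Return any trained model without changing the default."""
        return self._models[model_id]

    def predict(self, rows):
        """Predict with whichever model is currently the default."""
        model = self._models[self._default_id]
        return [model(row) for row in rows]


# Two toy "models" that always predict a fixed class.
holder = DefaultModelHolder(
    {"MODEL_0": lambda row: 1, "MODEL_1": lambda row: 2},
    best_model_id="MODEL_0",
)
before = holder.predict([10, 20])
holder.set_model("MODEL_1")
after = holder.predict([10, 20])
```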
- Retrieve any trained model from the RandomSearch instance using get_model.
>>> rs_obj.get_model("DECISIONFOREST_3")
############ result Output ############

   task_index  tree_num  tree_order  classification_tree
0           1         0           0  {"id_":1,"size_":87,"maxDepth_":6,"responseCounts_":{"1":28,"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.664817,"scoreImprove_":0.328172,"leftNodeSize_":28,"rightNodeSize_":59},"leftChild_":{"id_":2,"size_":28,"maxDepth_":5,"label_":"1","responseCounts_":{"1":28},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":59,"maxDepth_":5,"responseCounts_":{"2":32,"3":27},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.700000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.496409,"scoreImprove_":0.336645,"leftNodeSize_":32,"rightNodeSize_":27},"leftChild_":{"id_":6,"size_":32,"maxDepth_":4,"label_":"2","responseCounts_":{"2":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":27,"maxDepth_":4,"label_":"3","responseCounts_":{"3":27},"nodeType_":"CLASSIFICATION_LEAF"}}}
1           0         0           0  {"id_":1,"size_":86,"maxDepth_":6,"responseCounts_":{"1":32,"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":2.600000,"attr_":"petal_length","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.663332,"scoreImprove_":0.351101,"leftNodeSize_":32,"rightNodeSize_":54},"leftChild_":{"id_":2,"size_":32,"maxDepth_":5,"label_":"1","responseCounts_":{"1":32},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":3,"size_":54,"maxDepth_":5,"responseCounts_":{"2":29,"3":25},"nodeType_":"CLASSIFICATION_NODE","split_":{"splitValue_":1.750000,"attr_":"petal_width","type_":"CLASSIFICATION_NUMERIC_SPLIT","score_":0.497257,"scoreImprove_":0.312231,"leftNodeSize_":29,"rightNodeSize_":25},"leftChild_":{"id_":6,"size_":29,"maxDepth_":4,"label_":"2","responseCounts_":{"2":29},"nodeType_":"CLASSIFICATION_LEAF"},"rightChild_":{"id_":7,"size_":25,"maxDepth_":4,"label_":"3","responseCounts_":{"3":25},"nodeType_":"CLASSIFICATION_LEAF"}}}