This section uses the housing data for illustration. The dataset contains 492 samples, each with 12 features describing a home, and a 13th column (homestyle) that indicates the home style.
In this example, you build a Decision Forest model based on the training dataset and apply the model to the test dataset to evaluate the performance of the model.
- Import the required modules.
from teradataml.analytics.mle.DecisionForest import DecisionForest
from teradataml.analytics.sqle.DecisionForestPredict import DecisionForestPredict
from teradataml.dataframe.dataframe import DataFrame
from teradataml.data.load_example_data import load_example_data
- If the input tables housing_train and housing_test do not already exist, create them and load the datasets into them.
load_example_data("decisionforestpredict", ["housing_train","housing_test"])
- Create a teradataml DataFrame "housing_train" containing the rows of the training dataset.
housing_train = DataFrame.from_table("housing_train")
- Train a new Decision Forest model on the teradataml DataFrame "housing_train", using the DecisionForest function from the teradataml package.
formula = "homestyle ~ driveway + recroom + fullbase + gashw + airco + prefarea + price + lotsize + bedrooms + bathrms + stories + garagepl"
rft_model = DecisionForest(data=housing_train,
                           formula=formula,
                           tree_type="classification",
                           ntree=50,
                           tree_size=100,
                           nodesize=1,
                           variance=0.0,
                           max_depth=12,
                           maxnum_categorical=20,
                           mtry=3,
                           mtry_seed=100,
                           seed=100)
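The formula string used above follows the pattern "target ~ predictor + predictor + ...". As a minimal sketch, it could also be assembled from a list of column names, which may be easier to maintain when the feature set changes. The helper build_formula below is hypothetical and not part of teradataml; only the column names come from this example.

```python
# Feature columns taken from the formula used in this example.
features = [
    "driveway", "recroom", "fullbase", "gashw", "airco", "prefarea",
    "price", "lotsize", "bedrooms", "bathrms", "stories", "garagepl",
]


def build_formula(target, predictors):
    """Return a formula string of the form 'target ~ p1 + p2 + ...'.

    Hypothetical helper for illustration; not a teradataml function.
    """
    if not predictors:
        raise ValueError("at least one predictor is required")
    return f"{target} ~ " + " + ".join(predictors)


formula = build_formula("homestyle", features)
print(formula)
```

The resulting string can be passed as the formula argument in the same way as the hand-written one.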
Once the model is created, you can apply it to the test dataset.
- Create a teradataml DataFrame "housing_test" containing the rows of the test dataset.
housing_test = DataFrame.from_table("housing_test")
- Predict the home styles by applying the Decision Forest model to the teradataml DataFrame "housing_test", using the DecisionForestPredict function.
decision_forest_predict_out = DecisionForestPredict(object=rft_model,
                                                    newdata=housing_test,
                                                    id_column="sn",
                                                    detailed=False,
                                                    terms=["homestyle"])
- Inspect the results.
decision_forest_predict_out
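To evaluate the model, the predicted home styles can be compared with the actual ones. The sketch below is plain Python and assumes the actual and predicted labels have already been extracted as (actual, predicted) pairs, for example from a local copy of the prediction output. The function names and the sample rows are illustrative assumptions, not output from this example and not part of teradataml.

```python
from collections import Counter


def accuracy(pairs):
    """Fraction of (actual, predicted) pairs where the two labels match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions to evaluate")
    correct = sum(1 for actual, predicted in pairs if actual == predicted)
    return correct / len(pairs)


def confusion_counts(pairs):
    """Count how often each (actual, predicted) combination occurs."""
    return Counter(pairs)


# Illustrative placeholder rows only; real values come from the test run.
sample = [
    ("Classic", "Classic"),
    ("Eclectic", "Eclectic"),
    ("bungalow", "Eclectic"),
    ("Classic", "Classic"),
]

acc = accuracy(sample)
counts = confusion_counts(sample)
print(f"accuracy = {acc:.2f}")
for (actual, predicted), n in sorted(counts.items()):
    print(f"actual={actual} predicted={predicted} count={n}")
```

Accuracy gives a single summary number, while the per-combination counts show which home styles the model confuses with each other.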