Use the Decision Forest Model with Teradata Package for Python

Teradata® Package for Python User Guide

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2022-01-14
Product Category: Teradata Vantage

This section uses the housing dataset for illustration. The dataset contains 492 samples, each with 12 features describing a home, and a 13th column that indicates the home style.

In this example, you build a Decision Forest model based on the training dataset and apply the model to the test dataset to evaluate the performance of the model.
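As a quick illustration of the column layout described above, the following sketch assembles the model formula from a list of the 12 feature columns and the target column. The column names are taken from the formula used in step 4; the helper name build_formula is hypothetical and is not part of teradataml.

```python
# Column names are taken from the formula in step 4 of this section.
TARGET = "homestyle"
FEATURES = [
    "driveway", "recroom", "fullbase", "gashw", "airco", "prefarea",
    "price", "lotsize", "bedrooms", "bathrms", "stories", "garagepl",
]


def build_formula(target, features):
    """Return an R-style formula string: 'target ~ f1 + f2 + ...'.

    Hypothetical helper for illustration only; not a teradataml function.
    """
    if not features:
        raise ValueError("at least one feature column is required")
    return f"{target} ~ " + " + ".join(features)


formula = build_formula(TARGET, FEATURES)

# Sanity check: 12 predictors plus 1 target matches the 13 columns described.
print(len(FEATURES) + 1)
print(formula)
```

Keeping the feature names in a list makes it easy to confirm the predictor count and to drop or add a column without editing a long string by hand.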

  1. Import the required modules.
    from teradataml.analytics.mle.DecisionForest import DecisionForest
    from teradataml.analytics.sqle.DecisionForestPredict import DecisionForestPredict
    from teradataml.dataframe.dataframe import DataFrame
    from teradataml.data.load_example_data import load_example_data
  2. If the input tables housing_train and housing_test do not already exist, create them and load the datasets into them.
    load_example_data("decisionforestpredict", ["housing_train","housing_test"])
  3. Create a teradataml DataFrame "housing_train" from the training dataset.
    housing_train = DataFrame.from_table("housing_train")
  4. Train a new Decision Forest model on the teradataml DataFrame "housing_train", using the DecisionForest function from the teradataml package.
    formula = "homestyle ~ driveway + recroom + fullbase + gashw + airco + prefarea + price + lotsize + bedrooms + bathrms + stories + garagepl"
    
    rft_model = DecisionForest(
        data=housing_train,
        formula=formula,
        tree_type="classification",
        ntree=50,
        tree_size=100,
        nodesize=1,
        variance=0.0,
        max_depth=12,
        maxnum_categorical=20,
        mtry=3,
        mtry_seed=100,
        seed=100,
    )
    Once the model is created, you can apply the model to the test dataset.
  5. Create a teradataml DataFrame "housing_test" from the test dataset.
    housing_test = DataFrame.from_table("housing_test")
  6. Predict the home styles by applying the Decision Forest model to the teradataml DataFrame "housing_test", using the DecisionForestPredict function.
    decision_forest_predict_out = DecisionForestPredict(
        object=rft_model,
        newdata=housing_test,
        id_column="sn",
        detailed=False,
        terms=["homestyle"],
    )
  7. Inspect the results.
    decision_forest_predict_out
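
The introduction mentions evaluating the model, while the steps above end at inspecting the predictions. One simple way to quantify performance is classification accuracy together with a count of each (actual, predicted) pair. The following is a minimal sketch, assuming the actual and predicted home styles have already been brought to the client as two Python lists of equal length; how they are joined and fetched is left out. The function names and the label values are illustrative placeholders, not teradataml APIs and not output from the model trained above.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of rows where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Placeholder labels for illustration only; not real model output.
actual = ["classic", "eclectic", "bungalow", "eclectic", "classic"]
predicted = ["classic", "eclectic", "eclectic", "eclectic", "bungalow"]

acc = accuracy(actual, predicted)
counts = confusion_counts(actual, predicted)

print(f"accuracy = {acc:.2f}")
for (actual_label, predicted_label), n in sorted(counts.items()):
    print(f"actual={actual_label:<10} predicted={predicted_label:<10} count={n}")
```

The pair counts show which home styles are confused with each other, which a single accuracy figure hides.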