This section uses the iris dataset, which has three classes. The dataset contains 150 samples, each with four features describing flower measurements, a species column that indicates the flower species, and an id column that identifies each sample.
In this example, you build a Decision Forest model on the training dataset and then apply it to the test dataset to evaluate its performance.
- Import the required modules.
from teradataml import DecisionForest
from teradataml import DecisionForestPredict
from teradataml import load_example_data
from teradataml.dataframe.dataframe import DataFrame
- If the input table "iris_input" does not already exist, create it and load the dataset.
load_example_data("byom", "iris_input")
- Create a teradataml DataFrame "iris_input" from the loaded table.
iris_input = DataFrame("iris_input")
- Create two samples of the input data: sample 1 holds 80% of the rows for training the model ("iris_train"), and sample 2 holds 20% of the rows for testing it ("iris_test"). First, sample the "iris_input" DataFrame. The result has an added "sampleid" column whose value identifies the sample each row belongs to.
iris_sample = iris_input.sample(frac=[0.8, 0.2])
- Create the training dataset from sample 1 by filtering on "sampleid", then drop the "sampleid" column because it is not needed to train the model.
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
- Create the test dataset from sample 2 by filtering on "sampleid", then drop the "sampleid" column because it is not needed for scoring.
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
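The sampling and filtering steps above can be mimicked locally to see what the "sampleid" column does. This is a plain-Python sketch under stated assumptions, not how teradataml or Vantage performs sampling: the helpers sample_fractions and select_sample are hypothetical, the rows are made up, and shuffle-and-cut is just one simple way to produce disjoint samples of the requested sizes.

```python
import random

def sample_fractions(rows, fracs, seed=None):
    """Hypothetical helper: tag each row with a 1-based 'sampleid'.

    Shuffles the rows, then cuts them into disjoint groups whose sizes
    follow the requested fractions (for example [0.8, 0.2]).
    """
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    tagged = []
    start = 0
    for sample_id, frac in enumerate(fracs, start=1):
        # The last group takes the remainder so no row is lost to rounding.
        end = n if sample_id == len(fracs) else start + round(frac * n)
        for row in shuffled[start:end]:
            tagged.append({**row, "sampleid": sample_id})
        start = end
    return tagged

def select_sample(tagged_rows, sample_id):
    """Hypothetical helper: keep one sample and drop the 'sampleid' column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged_rows
        if row["sampleid"] == sample_id
    ]

# Made-up stand-in for the 150-row input table.
rows = [{"id": i, "species": ["setosa", "versicolor", "virginica"][i % 3]}
        for i in range(1, 151)]
sampled = sample_fractions(rows, [0.8, 0.2], seed=100)
train = select_sample(sampled, 1)
test = select_sample(sampled, 2)
print(len(train), len(test))  # 120 30
```

Because every row lands in exactly one sample, filtering on the sample id yields training and test sets that do not overlap.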
- Train a new Decision Forest model on the teradataml DataFrame "iris_train" using the DecisionForest function from the teradataml package. You can do this with or without the formula argument.
Example 1: Train the Decision Forest classification model on the input teradataml DataFrame, using the formula argument.
formula = "species ~ sepal_length + sepal_width + petal_length + petal_width"
# Train the Decision Forest model.
rft_model = DecisionForest(data=iris_train, formula=formula,
                           tree_type="classification", ntree=50, tree_size=100,
                           nodesize=1, variance=0.0, max_depth=12,
                           maxnum_categorical=20, mtry=3, mtry_seed=100, seed=100)
Example 2: Train the same Decision Forest classification model (rft_model) without the formula argument, passing the predictor columns in input_columns and the target column in response_column.
rft_model = DecisionForest(data=iris_train,
                           input_columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
                           response_column="species",
                           tree_type="classification", ntree=50, tree_size=100,
                           nodesize=1, variance=0.0, max_depth=12,
                           maxnum_categorical=20, mtry=3, mtry_seed=100, seed=100)
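The two examples describe the same model because the formula "species ~ sepal_length + ..." names the response on the left of "~" and the predictors on the right. The sketch below shows that correspondence with a hypothetical helper, parse_formula; it handles only plain column names joined by "+" and is not teradataml's formula parser, which may accept more syntax.

```python
def parse_formula(formula):
    """Hypothetical helper: split 'response ~ x1 + x2' into its parts.

    Handles only plain column names joined by '+'; interaction terms,
    '.', and other formula operators are not supported here.
    """
    lhs, sep, rhs = formula.partition("~")
    if not sep:
        raise ValueError("formula must contain '~'")
    response_column = lhs.strip()
    input_columns = [term.strip() for term in rhs.split("+") if term.strip()]
    return response_column, input_columns

formula = "species ~ sepal_length + sepal_width + petal_length + petal_width"
response_column, input_columns = parse_formula(formula)
print(response_column)  # species
print(input_columns)    # ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
```

The two values printed match the response_column and input_columns arguments used in Example 2.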
Once the model is created, you can apply it to the test dataset.
- Predict the iris species by applying the Decision Forest model to the teradataml DataFrame "iris_test", using the DecisionForestPredict function.
# terms copies the actual "species" column into the output for comparison.
decision_forest_predict_out = DecisionForestPredict(object=rft_model,
                                                    newdata=iris_test,
                                                    id_column="id",
                                                    detailed=False,
                                                    terms=["species"])
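Conceptually, a classification forest combines its trees by voting: each tree predicts a class and the most common class wins. The sketch below illustrates only that generic idea and is not the DecisionForestPredict implementation; forest_predict is a hypothetical helper, and the three one-split "trees" and their thresholds are made up for illustration, not learned from the iris data.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Hypothetical helper: majority vote over the trees' predicted labels.

    Each tree is any callable mapping a feature dict to a class label.
    Returns the winning label and the share of trees that voted for it.
    On a tie, Counter.most_common keeps the label first encountered.
    """
    if not trees:
        raise ValueError("need at least one tree")
    votes = Counter(tree(sample) for tree in trees)
    label, count = votes.most_common(1)[0]
    return label, count / len(trees)

# Made-up one-split trees; thresholds are illustrative only.
trees = [
    lambda s: "setosa" if s["petal_length"] < 2.5 else "versicolor",
    lambda s: "setosa" if s["petal_width"] < 0.8 else "virginica",
    lambda s: "versicolor" if s["sepal_length"] < 6.0 else "virginica",
]
sample = {"sepal_length": 5.1, "sepal_width": 3.5,
          "petal_length": 1.4, "petal_width": 0.2}
label, share = forest_predict(trees, sample)
print(label, round(share, 2))  # setosa 0.67
```

Two of the three toy trees vote "setosa", so that label wins with a two-thirds share.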
- Inspect the results.
decision_forest_predict_out.result
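To turn the inspected results into a performance number, one option is to bring the rows to the client (for example with the teradataml DataFrame to_pandas() method) and compare each predicted label with the actual species value copied through the terms argument. The sketch below assumes you already have those two label lists; the name of the output column holding the prediction is not shown in this section, so check it against your own result. The helper evaluate and the sample labels are hypothetical.

```python
from collections import Counter

def evaluate(actual, predicted):
    """Hypothetical helper: accuracy plus (actual, predicted) -> count table."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    # Count each (actual, predicted) pair; the diagonal pairs are correct.
    pairs = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in pairs.items() if a == p)
    return correct / len(actual), pairs

# Made-up labels standing in for the species and prediction columns.
actual = ["setosa", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "versicolor", "versicolor", "virginica"]
accuracy, confusion = evaluate(actual, predicted)
print(accuracy)  # 0.75
```

The confusion counts show which species are mistaken for which, something a single accuracy figure hides.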