Using Decision Forest Model with teradataml Package

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

This section uses the iris dataset, which has three classes. The dataset contains 150 samples, each with four features describing flower properties; a fifth column indicates the flower species.

In this example, you build a Decision Forest model based on the training dataset and apply the model to the test dataset to evaluate the performance of the model.

Import the required modules.

```
from teradataml import DecisionForest
from teradataml import DecisionForestPredict
from teradataml import load_example_data
from teradataml.dataframe.dataframe import DataFrame
```

If the input table "iris_input" does not already exist, create it and load the dataset.
```
load_example_data("byom", "iris_input")
```
Prepare the training and test datasets.
1. Create a teradataml DataFrame "iris_input" from the loaded "iris_input" table.
```
iris_input = DataFrame("iris_input")
```
2. Split the input data into two samples: sample 1 holds 80% of the rows for training the model ("iris_train"), and sample 2 holds 20% of the rows for testing it ("iris_test").
  Start by sampling the "iris_input" DataFrame; the sampled rows carry a "sampleid" column that identifies the sample each row belongs to.
```
iris_sample = iris_input.sample(frac=[0.8, 0.2])
```
3. Create the training dataset from sample 1 by filtering on "sampleid", then drop the "sampleid" column because it is not required for training the model.
```
iris_train = iris_sample[iris_sample.sampleid == "1"].drop("sampleid", axis = 1)
```
4. Create the test dataset from sample 2 by filtering on "sampleid", then drop the "sampleid" column because it is not required for scoring.
```
iris_test = iris_sample[iris_sample.sampleid == "2"].drop("sampleid", axis = 1)
```
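
The split in steps 2 through 4 can be pictured with a small plain-Python sketch. This is only a conceptual illustration of tagging rows with a sample id and then filtering on it; it does not use teradataml and is not how Vantage implements sampling. The helper names `assign_sample_ids` and `select_sample` are hypothetical, and the rows are placeholders rather than the real iris data.

```python
import random


def assign_sample_ids(rows, fractions, seed=None):
    """Shuffle the rows and tag each with a 1-based "sampleid".

    Hypothetical helper: sample sizes follow the given fractions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    tagged = []
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        count = round(len(shuffled) * frac)
        for row in shuffled[start:start + count]:
            tagged.append({**row, "sampleid": sample_id})
        start += count
    return tagged


def select_sample(tagged, sample_id):
    """Keep the rows of one sample and drop the helper "sampleid" column."""
    return [
        {key: value for key, value in row.items() if key != "sampleid"}
        for row in tagged
        if row["sampleid"] == sample_id
    ]


# Placeholder rows standing in for the 150 iris samples.
rows = [{"id": i} for i in range(1, 151)]
tagged = assign_sample_ids(rows, [0.8, 0.2], seed=100)

train = select_sample(tagged, 1)
test = select_sample(tagged, 2)
print(len(train), len(test))  # prints: 120 30
```

The two samples are disjoint, so no row used for training is later used for scoring.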

Train a new Decision Forest model on the teradataml DataFrame "iris_train", using the DecisionForest function from the teradataml package.

This can be done with or without using the formula argument.

Example 1: Train the Decision Forest classification model on the input teradataml DataFrame, using the formula argument.
```
formula = "species ~ sepal_length + sepal_width + petal_length + petal_width"

# Train the Decision Forest model.
rft_model = DecisionForest(data=iris_train,
                           formula=formula,
                           tree_type="classification",
                           ntree=50,
                           tree_size=100,
                           nodesize=1,
                           variance=0.0,
                           max_depth=12,
                           maxnum_categorical=20,
                           mtry=3,
                           mtry_seed=100,
                           seed=100)
```

Example 2: Train the same decision forest Classification model (rft_model) without using the formula argument.

```
rft_model = DecisionForest(data=iris_train,
                           input_columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
                           response_column="species",
                           tree_type="classification",
                           ntree=50,
                           tree_size=100,
                           nodesize=1,
                           variance=0.0,
                           max_depth=12,
                           maxnum_categorical=20,
                           mtry=3,
                           mtry_seed=100,
                           seed=100)
```
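
The two examples describe the same model: the left side of the formula names the response column and the right side lists the input columns. The sketch below makes that correspondence explicit for simple additive formulas. The `parse_formula` helper is hypothetical and written only for illustration; it is not part of teradataml and makes no claim about how the package parses formulas internally.

```python
def parse_formula(formula):
    """Split "response ~ col1 + col2" into (response, [col1, col2]).

    Hypothetical helper: handles only simple additive formulas.
    """
    response, separator, predictors = formula.partition("~")
    if not separator or not response.strip():
        raise ValueError("formula must look like 'response ~ col1 + col2'")

    input_columns = [name.strip() for name in predictors.split("+") if name.strip()]
    if not input_columns:
        raise ValueError("formula must name at least one input column")
    return response.strip(), input_columns


formula = "species ~ sepal_length + sepal_width + petal_length + petal_width"
response_column, input_columns = parse_formula(formula)

print(response_column)  # prints: species
print(input_columns)
```

The resulting values match the `response_column` and `input_columns` arguments passed in Example 2.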

Once the model is created, you can apply the model to the test dataset.

Predict the iris species by applying the Decision Forest model to the teradataml DataFrame "iris_test" from the test dataset, using the DecisionForestPredict function.

```
decision_forest_predict_out = DecisionForestPredict(object=rft_model,
                                                    newdata=iris_test,
                                                    id_column="id",
                                                    detailed=False,
                                                    terms=["species"])
```

Inspect the results.
```
decision_forest_predict_out.result
```
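
To evaluate the model, one option is to compare the predicted species with the actual species once the result has been brought to the client (for example, with the teradataml DataFrame `to_pandas()` method). The sketch below is a minimal, plain-Python illustration of that comparison. It assumes you have already extracted two equally ordered lists of labels; the names of the result columns depend on your teradataml version and are not assumed here. The helpers `accuracy` and `confusion_counts` are hypothetical, and the label values are made-up placeholders, not output from the model.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Made-up placeholder labels, not results from the Decision Forest model.
actual = [1, 1, 2, 2, 3, 3]
predicted = [1, 1, 2, 3, 3, 3]

print(f"accuracy: {accuracy(actual, predicted):.3f}")  # prints: accuracy: 0.833
for (actual_label, predicted_label), count in sorted(confusion_counts(actual, predicted).items()):
    print(actual_label, predicted_label, count)
```

Pairs where the two labels differ show which species the model confuses with one another.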