TD_DecisionForestPredict Usage Notes - Analytics Database

Decision forests, also called random forests, are a type of machine learning algorithm used for both classification and regression tasks. A forest is composed of multiple decision trees, each trained on a random subset of the training data and a random subset of the available variables. During training, each decision tree in the forest learns to predict the target variable from the input variables.

To predict unseen examples, the input data is fed to each tree in the forest, and the predictions from each tree are combined to produce a final output. This aggregation of predictions helps to reduce overfitting and improve the accuracy of the overall model.
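
As a minimal sketch of this aggregation step (plain Python with made-up tree outputs, not the Vantage interface): a regression forest averages the individual tree predictions, while a classification forest takes the most common predicted label.

    # Illustrative tree outputs; a real forest computes these internally.
    regression_outputs = [3.1, 2.9, 3.4]    # per-tree predicted values
    class_outputs = [1, 0, 1]               # per-tree predicted labels

    # Regression: average the tree predictions.
    print(sum(regression_outputs) / len(regression_outputs))   # 3.133...

    # Classification: take the most common predicted label.
    print(max(set(class_outputs), key=class_outputs.count))    # 1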

Example: How to Predict Using Decision Forests

Suppose you have a dataset containing information about houses, including size, number of bedrooms, location, and price. You want to build a model that predicts the price of a house from these variables. To do this, use a decision forest algorithm:

  1. Split the dataset into a training set and a test set. Use the training set to train the decision forest model and the test set to evaluate its performance.
  2. Build a decision forest model by training multiple decision trees on different subsets of the training data. Each decision tree is trained to predict the price of a house based on a random subset of variables.
  3. During testing, feed the variables of a new house to each decision tree in the forest. Each tree produces a prediction of the house price; take the average of all the predictions to get the final predicted price. (A runnable sketch of these steps follows this list.)
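
The following sketch walks through these three steps in Python using scikit-learn and synthetic data. The feature columns, value ranges, and model settings are illustrative assumptions; the snippet demonstrates the general technique, not the Vantage SQL interface.

    # A minimal sketch of steps 1-3 on synthetic housing data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 500
    size = rng.uniform(500, 4000, n)       # square feet
    bedrooms = rng.integers(1, 6, n)       # bedroom count
    X = np.column_stack([size, bedrooms])
    y = 100 * size + 25_000 * bedrooms + rng.normal(0, 10_000, n)  # price

    # Step 1: split the dataset into a training set and a test set.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Step 2: train many trees, each on a bootstrap sample of the rows,
    # considering a random subset of the features at each split.
    model = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                                  random_state=0)
    model.fit(X_train, y_train)

    # Step 3: each tree predicts and the forest averages the results.
    print(model.score(X_test, y_test))     # R^2 on the held-out test set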

For example, suppose a new house has the following variables:

  • size=2000 sq. ft.
  • bedrooms=3
  • location=New York

Feed this data to each tree in the decision forest; each tree produces a prediction of the house price based on the random subset of variables it was trained on.

Here's what the predictions might look like for the first five trees in the forest:
  • Tree 1: $500,000
  • Tree 2: $450,000
  • Tree 3: $480,000
  • Tree 4: $510,000
  • Tree 5: $490,000

Take the average of all the predictions to get the final prediction:

Final prediction: ($500,000 + $450,000 + $480,000 + $510,000 + $490,000) / 5 = $486,000

The final predicted price for the house is $486,000.
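
Continuing the illustrative scikit-learn sketch above, you can reproduce this per-tree view directly: a fitted RandomForestRegressor exposes its individual trees through the estimators_ attribute, and the forest prediction is the mean of their outputs. (The new house is represented here by only the two numeric features the sketch model was trained on; the location variable is omitted.)

    # Inspect per-tree predictions for one new house, reusing `model`
    # from the sketch above. Feature values are illustrative.
    new_house = [[2000, 3]]                # size=2000 sq. ft., bedrooms=3

    tree_preds = [tree.predict(new_house)[0] for tree in model.estimators_]
    print(tree_preds[:5])                  # the first five trees' prices

    # The forest's prediction is the average over all trees.
    print(sum(tree_preds) / len(tree_preds))
    print(model.predict(new_house)[0])     # same value, computed internally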

To illustrate with a different (classification) example, suppose you have a decision forest with four trees.

Each of these trees is different from the others and can produce a different prediction for the same input data.

For example, consider the following row of data:

x0    x1    x2    x3    x4
2.7   6.7   4.2   5.3   4.8

Each tree then produces a prediction (0 or 1) for this row.

What will the decision forest predict? In this example, more trees predict 1 than 0, so the decision forest predicts 1 as the target value for this row of data. If there are an equal number of 1s and 0s, the decision forest either predicts the outcome randomly (0 or 1), or you can change the number of trees to an odd number so that the forest always has a definite prediction.
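
A short sketch of this voting rule in plain Python; the individual tree votes below are assumed values chosen to match the outcome described above:

    from collections import Counter

    # Hypothetical votes from a four-tree classification forest.
    votes = [1, 0, 1, 1]

    counts = Counter(votes)               # Counter({1: 3, 0: 1})
    print(counts.most_common(1)[0][0])    # majority class: 1

    # With an even number of trees, a tie such as [1, 0, 1, 0] is
    # possible; an odd number of trees guarantees a strict majority
    # for binary classification.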