TD_DecisionForest Function | DecisionForest | Teradata Vantage - TD_DecisionForest - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Product
Analytics Database
Release Number
17.20
Published
June 2022
ft:locale
en-US
ft:lastEdition
2025-11-06
Product Category
Teradata Vantage™

Decision forest functions create predictive models based on the algorithm for decision tree training and prediction.

The TD_DecisionForest function is an ensemble algorithm used for classification and regression predictive modeling. It extends bootstrap aggregation (bagging) of decision trees. The function supports regression, binary classification, and multiclass classification.

Constructing a decision tree involves evaluating the values of each input variable in the data to select a split point. The function restricts the variables considered at each split point to a random subset. This forces each decision tree in the forest to be different, which can improve prediction accuracy.
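The two sources of randomness described above, bootstrap sampling of rows and a random subset of variables per split point, can be sketched in a few lines of Python. This is only an illustration of the general bagging idea, not the function's internal implementation; the helper names, the column names, and the subset size of 2 are all hypothetical.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k distinct candidate variables for one split point."""
    return rng.sample(features, k)

rng = random.Random(42)                            # fixed seed for repeatability
rows = list(range(10))                             # stand-in row identifiers
features = ["age", "income", "tenure", "balance"]  # hypothetical input columns

sample = bootstrap_sample(rows, rng)               # some rows repeat, some are left out
candidates = random_feature_subset(features, 2, rng)
```

Because every tree sees a different sample and different candidate variables, the trees in the forest differ from one another.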

Each node in the tree represents a decision based on the value of a single variable, and the tree is grown by iteratively splitting the data into smaller and smaller subsets based on these decisions. At each level of the tree, the algorithm searches for the best variable on which to split the data, and it repeats this search level by level until the stopping criterion is met.
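As a rough illustration of what "finding the best split" means for one numeric variable, the sketch below scores every candidate threshold and keeps the one with the lowest weighted impurity. It assumes Gini impurity as the criterion and midpoints between sorted values as candidate thresholds; the source does not state which criterion or threshold rule TD_DecisionForest uses, so both are assumptions, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split
    of one numeric variable, or None if no split is possible."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
result = best_split(values, labels)  # the threshold falls between 3 and 10
```

A tree builder would run this search over each candidate variable, split on the winner, and recurse into the two subsets until a stopping rule such as maximum depth or minimum node size applies.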

Consider the following points when using the TD_DecisionForest function:
  • All input variables must be numeric. Convert categorical columns to numeric columns as a preprocessing step.
  • For classification, class labels (ResponseColumn values) must be integers. The function supports a maximum of 500 classes.
  • The function skips any observation with a missing value in an input column; such observations are not used for training. Use the TD_SimpleImpute function to fill in missing values.
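The effect of these rules on a data set can be pictured with a small Python sketch. It is not how the preprocessing is done in the database; it only shows, on made-up rows, one possible way to map categories to integer codes and which rows would be left out of training because of a missing input value. The helper name, the column names, and the choice of simple integer coding are assumptions.

```python
def encode_categories(values):
    """Map each distinct non-missing category to an integer code,
    in order of first appearance."""
    codes = {}
    for v in values:
        if v is not None and v not in codes:
            codes[v] = len(codes)
    return codes

# Hypothetical rows: one categorical input, one numeric input, an integer label.
rows = [
    {"color": "red", "size": 1.5, "label": 0},
    {"color": "blue", "size": None, "label": 1},   # missing input value
    {"color": "red", "size": 2.0, "label": 1},
]

codes = encode_categories(r["color"] for r in rows)
numeric_rows = [dict(r, color=codes.get(r["color"])) for r in rows]

# Rows with any missing value would be skipped during training.
usable = [r for r in numeric_rows if all(v is not None for v in r.values())]
```

Imputing the missing value instead of dropping the row keeps that observation available for training.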
TD_DecisionForest has several parameters that you can tune to optimize performance, including the number of trees, the maximum depth of each tree, and the minimum number of samples required to split a node. The trees are constructed in parallel by all AMPs that have a non-empty partition of data.
  • When you specify the NumTrees value, TD_DecisionForest adjusts the number of trees built as:
    Number_of_trees = Num_AMPs_with_data * (NumTrees/Num_AMPs_with_data)
  • To get the number of AMPs in the system, use the SQL command SELECT HASHAMP()+1;. This equals Num_AMPs_with_data when every AMP holds at least one row of the input.
  • When you do not specify the NumTrees value, TD_DecisionForest calculates the number of trees built by an AMP as:
    Number_of_AMP_trees = CoverageFactor * Num_Rows_AMP / TreeSize

    The number of trees built by the function is the sum of Number_of_AMP_trees over all AMPs that have data.

    When a data set is small, the best practice is to place all of the data on one AMP. To do this, create an identifier column that holds the same value in every row, and use it as the primary index.
  • The TreeSize value determines the sample size used to build a tree in the forest and depends on the memory available to the AMP. By default, TD_DecisionForest computes this value internally. TD_DecisionForest reserves approximately 40% of its available memory to store the input sample, and uses the rest to build the tree.
The number of trees controls the processing time and the complexity of the model. For example, changing CoverageFactor from 1.0 to 2.0 doubles the number of trees and increases the processing time of the query.
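The tree-count formulas above can be worked through with a short Python sketch. The source does not state how fractional results are rounded, so two assumptions are made here: the NumTrees division is floored to a whole number of trees per AMP, and the CoverageFactor result is left unrounded. The function names, row counts, and TreeSize value are hypothetical.

```python
def trees_with_numtrees(num_trees, num_amps_with_data):
    """Num_AMPs_with_data * (NumTrees / Num_AMPs_with_data), with the
    division floored (an assumption; the rounding rule is not stated)."""
    return num_amps_with_data * (num_trees // num_amps_with_data)

def trees_per_amp(coverage_factor, num_rows_amp, tree_size):
    """CoverageFactor * Num_Rows_AMP / TreeSize for one AMP, unrounded."""
    return coverage_factor * num_rows_amp / tree_size

def total_trees(coverage_factor, rows_per_amp, tree_size):
    """Sum of the per-AMP counts over every AMP that holds data."""
    return sum(trees_per_amp(coverage_factor, n, tree_size)
               for n in rows_per_amp if n > 0)

rows_per_amp = [1000, 1500, 0, 500]   # hypothetical row counts; one AMP is empty
amps_with_data = sum(1 for n in rows_per_amp if n > 0)

# NumTrees specified: the request is spread evenly over the AMPs with data.
requested = trees_with_numtrees(10, amps_with_data)

# NumTrees not specified: the count follows CoverageFactor and TreeSize.
base = total_trees(1.0, rows_per_amp, tree_size=500)
doubled = total_trees(2.0, rows_per_amp, tree_size=500)
```

Under the flooring assumption, a NumTrees value that is not a multiple of the number of AMPs with data yields fewer trees than requested, and doubling CoverageFactor doubles the total in the second formula.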

TD_DecisionForest uses a training dataset to create a predictive model. The TD_DecisionForestPredict function uses the model created by TD_DecisionForest to make predictions. See TD_DecisionForestPredict.

The following is a typical workflow for using TD_DecisionForest:
  1. Convert the categorical columns to numeric columns.
  2. Determine the parameters to use with the function, such as tree depth, model type, and number of trees.
  3. Use TD_DecisionForest on a training dataset to create a predictive model.
  4. Use the TD_DecisionForestPredict function with the model created by TD_DecisionForest to make predictions.