DecisionForest Output - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22
dita:id: B700-4003
lifecycle: previous
Product Category: Teradata Vantage™

The function outputs a message and a model table.

Output Message (MonitorTable) Schema

The function saves this message to the table named by the MonitorTable argument (monitor_table). The default table name is default_dt_monitor_table in the current schema.

Column | Data Type | Description
message | VARCHAR | Reports this information:
  • Total number of trees
  • Number of trees created by each worker
  • Approximate number of points in each tree
  • Poisson sampling parameter
  • For OutOfBag ('true'), information that depends on the problem type:
    • Classification: Out-of-bag estimate of error rate
    • Regression: Mean of squared residuals and percent of total variance explained
  • For OutOfBag ('false'): Query time in seconds
  • For DisplayNumProcessedRows ('true'): Total number of input rows processed and number of input rows processed after excluding rows with NULL values
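
The two regression figures reported for OutOfBag ('true') are closely related. The following is a minimal sketch of how they are conventionally computed, not a statement of the function's internals: it assumes that the mean of squared residuals is the average squared difference between observed responses and out-of-bag predictions, and that percent of variance explained is 100 × (1 − mean of squared residuals / variance of the response). The function name and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def oob_regression_summary(observed, oob_predicted):
    """Return (mean of squared residuals, percent of variance explained).

    Both definitions are assumptions; the reference page does not
    state the exact formulas that DecisionForest uses.
    """
    squared_residuals = [(y - p) ** 2 for y, p in zip(observed, oob_predicted)]
    mse = fmean(squared_residuals)
    total_variance = pvariance(observed)  # population variance of the response
    pct_explained = 100.0 * (1.0 - mse / total_variance)
    return mse, pct_explained


# Hypothetical observed responses and out-of-bag predictions.
observed = [1.0, 2.0, 3.0, 4.0]
oob_predicted = [1.0, 2.0, 3.0, 2.0]
mse, pct = oob_regression_summary(observed, oob_predicted)
```

Under these assumptions, a model whose mean of squared residuals equals the variance of the response explains 0 percent of the variance, and a perfect out-of-bag fit explains 100 percent.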

OutputTable Schema

This is the model table that you pass as input to Forest_Predict.

Column | Data Type | Description
worker_ip | VARCHAR | IP address of worker that produced decision tree.
task_index | INTEGER | Identifier of worker that produced decision tree.
tree_num | INTEGER | Decision tree identifier.
tree | CLOB | JSON representation of decision tree. For JSON types that can appear in the representation, see the following table.
num_processed_rows | INTEGER | Number of input rows processed by the worker (excluding rows skipped because they contained NULL values).
num_total_rows | INTEGER | Number of input rows allocated to the worker.
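
As an illustration of how these columns relate, the sketch below counts trees per worker and derives the number of rows each worker skipped because of NULL values (num_total_rows minus num_processed_rows). The rows are hypothetical, the tree column is omitted for brevity, and the sketch assumes that the per-worker row counts repeat on every row that the worker produced; the reference page does not confirm this.

```python
from collections import defaultdict

# Hypothetical model-table rows, shaped like the OutputTable schema.
rows = [
    {"worker_ip": "10.0.0.1", "task_index": 0, "tree_num": 0,
     "num_processed_rows": 480, "num_total_rows": 500},
    {"worker_ip": "10.0.0.1", "task_index": 0, "tree_num": 1,
     "num_processed_rows": 480, "num_total_rows": 500},
    {"worker_ip": "10.0.0.2", "task_index": 1, "tree_num": 0,
     "num_processed_rows": 495, "num_total_rows": 495},
]


def summarize_workers(rows):
    """Return {task_index: {"trees": n, "rows_skipped": m}}."""
    trees = defaultdict(int)
    skipped = {}
    for row in rows:
        worker = row["task_index"]
        trees[worker] += 1
        # Assumption: these counts are per worker, so any row of the
        # worker gives the same difference.
        skipped[worker] = row["num_total_rows"] - row["num_processed_rows"]
    return {w: {"trees": trees[w], "rows_skipped": skipped[w]} for w in trees}


summary = summarize_workers(rows)
```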

JSON Types in JSON Representation of Decision Tree

JSON Type | Description
responseCounts | Appears only for classification trees. Number of observations in each class at node identified by id.
sum | Appears only for regression trees. Sum of values of response variable at node identified by id.
sumSq | Appears only for regression trees. Sum of squared values of response variable at node identified by id.
size | Total number of observations at node identified by id.
id | Node identifier.
maxDepth | Maximum possible depth of tree, starting from node identified by id. For root node, value is max_depth; for leaf nodes, 0.
split | Start of JSON item describing a split at node identified by id.
attr | Attribute (predictor) on which algorithm split at node identified by id.
leftCategories | Appears only for splits where attribute (identified by attr) is categorical. Categories assigned to left child of split.
rightCategories | Appears only for splits where attribute (identified by attr) is categorical. Categories assigned to right child of split.
score | Gini score of node identified by id.
type | Type of tree and split. Possible values:
  • CLASSIFICATION_CATEGORICAL_SPLIT
  • CLASSIFICATION_NUMERIC_SPLIT
  • REGRESSION_CATEGORICAL_SPLIT
  • REGRESSION_NUMERIC_SPLIT
leftNodeSize | Number of observations assigned to left node of split.
rightNodeSize | Number of observations assigned to right node of split.
scoreImprove | Score improvement at node identified by id.
leftChild | Start of JSON item describing left child of node identified by id.
rightChild | Start of JSON item describing right child of node identified by id.
nodeType | Type of node identified by id. Possible values:
  • CLASSIFICATION_NODE
  • CLASSIFICATION_LEAF
  • REGRESSION_NODE
  • REGRESSION_LEAF
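
To show how these JSON types fit together, the following sketch walks a small regression tree and derives the mean and variance of the response at each leaf from size, sum, and sumSq. The sample document is hypothetical: it uses only key names from the table above, and it assumes that split, leftChild, and rightChild are nested objects inside their parent node. The exact layout of a real tree value may differ, and the sketch does not reproduce what Forest_Predict does.

```python
import json

# Hypothetical tree value; the nesting is an assumption.
tree_json = """
{
  "id": 1, "size": 6, "sum": 30.0, "sumSq": 204.0, "maxDepth": 1,
  "nodeType": "REGRESSION_NODE",
  "split": {"attr": "x1", "type": "REGRESSION_NUMERIC_SPLIT",
            "leftNodeSize": 4, "rightNodeSize": 2},
  "leftChild": {"id": 2, "size": 4, "sum": 12.0, "sumSq": 40.0,
                "maxDepth": 0, "nodeType": "REGRESSION_LEAF"},
  "rightChild": {"id": 3, "size": 2, "sum": 18.0, "sumSq": 164.0,
                 "maxDepth": 0, "nodeType": "REGRESSION_LEAF"}
}
"""


def leaf_summaries(node):
    """Yield (id, mean, variance) for each regression leaf under node."""
    if node["nodeType"] == "REGRESSION_LEAF":
        mean = node["sum"] / node["size"]
        # Population variance: E[y^2] - (E[y])^2.
        variance = node["sumSq"] / node["size"] - mean ** 2
        yield node["id"], mean, variance
        return
    # Interior node: descend into whichever children are present.
    for child_key in ("leftChild", "rightChild"):
        if child_key in node:
            yield from leaf_summaries(node[child_key])


leaves = list(leaf_summaries(json.loads(tree_json)))
```

For a classification tree, the same traversal would read responseCounts at each leaf instead of sum and sumSq; the format of that value is not described here, so the sketch leaves it out.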