1.1 - 8.10 - DecisionForest Output - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)

The function outputs a message and a model table.

OutputMessageTable Schema

The function saves this message to the table specified by output_message_table; the default is default_dt_monitor_table in the current schema.

Column | Data Type | Description
message | VARCHAR | Reports this information:
  • Total number of trees
  • Number of trees created by each worker
  • Approximate number of points in each tree
  • Poisson sampling parameter
  • For OutOfBag ('true'), depending on the problem type:
    • Classification: Out-of-bag estimate of error rate
    • Regression:
      • Mean of squared residuals
      • Percent of total variance explained
  • For OutOfBag ('false'): Query time in seconds
  • For DisplayNumProcessedRows ('true'): Total number of InputTable rows processed and number of InputTable rows processed after excluding rows with NULL values

OutputTable Schema

This is the model table to input to DecisionForestPredict_MLE.

Column | Data Type | Description
id_column | VARCHAR | [Appears only with IDColumn.] Unique row identifier from InputTable.
worker_ip | VARCHAR | IP address of worker that produced decision tree.
task_index | INTEGER | Identifier of worker that produced decision tree.
tree_num | INTEGER | Decision tree identifier.
tree | CLOB | JSON representation of decision tree. For the JSON types that can appear in the representation, see the table "JSON Types in JSON Representation of Decision Tree".
num_processed_rows | INTEGER | Number of InputTable rows processed by the worker (excluding rows skipped because they contained NULL values).
num_total_rows | INTEGER | Number of InputTable rows allocated to the worker.
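
The two row-count columns can be combined to see how many InputTable rows were skipped because of NULL values. The following is a minimal Python sketch, not part of the function itself. It assumes the model table rows have already been fetched into a list of dictionaries keyed by the column names above; the helper name null_skip_summary is hypothetical. It also assumes that, because the counts are described as per worker, a worker that produced several trees repeats the same counts on each of its rows, so only one pair of counts is kept per (worker_ip, task_index).

```python
def null_skip_summary(model_rows):
    """Summarize how many InputTable rows were skipped for NULLs.

    model_rows: iterable of dicts with the OutputTable columns
    worker_ip, task_index, num_processed_rows, num_total_rows.
    """
    per_worker = {}
    for row in model_rows:
        # Assumption: counts are per worker, so rows for several trees
        # from the same worker carry the same counts; keep one pair.
        key = (row["worker_ip"], row["task_index"])
        per_worker[key] = (row["num_processed_rows"], row["num_total_rows"])

    processed = sum(p for p, _ in per_worker.values())
    total = sum(t for _, t in per_worker.values())
    skipped = total - processed
    return {
        "workers": len(per_worker),
        "total": total,
        "processed": processed,
        "skipped": skipped,
        "skipped_fraction": skipped / total if total else 0.0,
    }
```

If the counts turn out to differ between rows of the same worker, the deduplication step above would need to be revisited.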

JSON Types in JSON Representation of Decision Tree

JSON Type | Description
responseCounts | Appears only for classification trees. Number of observations in each class at node identified by id.
sum | Appears only for regression trees. Sum of values of response variable at node identified by id.
sumSq | Appears only for regression trees. Sum of squared values of response variable at node identified by id.
size | Total number of observations at node identified by id.
id | Node identifier.
maxDepth | Maximum possible depth of the subtree that starts at node identified by id. For root node, value is max_depth; for leaf nodes, 0.
split | Start of JSON item describing a split at node identified by id.
attr | Attribute (predictor) on which algorithm split at node identified by id.
leftCategories | Appears only for splits where attribute (identified by attr) is categorical. Categories assigned to left child of split.
rightCategories | Appears only for splits where attribute (identified by attr) is categorical. Categories assigned to right child of split.
score | Gini score of node identified by id.
type | Type of tree and split. Possible values:
  • CLASSIFICATION_CATEGORICAL_SPLIT
  • CLASSIFICATION_NUMERIC_SPLIT
  • REGRESSION_CATEGORICAL_SPLIT
  • REGRESSION_NUMERIC_SPLIT
leftNodeSize | Number of observations assigned to left node of split.
rightNodeSize | Number of observations assigned to right node of split.
scoreImprove | Score improvement at node identified by id.
leftChild | Start of JSON item describing left child of node identified by id.
rightChild | Start of JSON item describing right child of node identified by id.
nodeType | Type of node identified by id. Possible values:
  • CLASSIFICATION_NODE
  • CLASSIFICATION_LEAF
  • REGRESSION_NODE
  • REGRESSION_LEAF
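
To illustrate how these JSON types fit together, the following Python sketch parses one value of the tree column and walks it. It is an illustration under stated assumptions, not the exact format emitted by the function: the sample tree is hand-written, the key spellings follow the names in the table above, leftChild and rightChild are assumed to nest directly inside their parent node, and responseCounts is assumed to be an object that maps each class label to a count. The helper names (walk, depth, node_summary) are hypothetical. For regression nodes, the mean and variance of the response follow arithmetically from sum, sumSq, and size.

```python
import json
from collections import Counter

# Hand-written sample (assumed layout), standing in for one "tree" CLOB value.
SAMPLE_TREE_JSON = """
{"id": 1, "size": 4, "sum": 10.0, "sumSq": 30.0, "maxDepth": 1,
 "nodeType": "REGRESSION_NODE",
 "split": {"attr": "x", "type": "REGRESSION_NUMERIC_SPLIT",
           "leftNodeSize": 2, "rightNodeSize": 2},
 "leftChild":  {"id": 2, "size": 2, "sum": 3.0, "sumSq": 5.0,
                "maxDepth": 0, "nodeType": "REGRESSION_LEAF"},
 "rightChild": {"id": 3, "size": 2, "sum": 7.0, "sumSq": 25.0,
                "maxDepth": 0, "nodeType": "REGRESSION_LEAF"}}
"""


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for key in ("leftChild", "rightChild"):
        child = node.get(key)
        if child:
            yield from walk(child)


def depth(node):
    """Actual depth below this node: 0 for a leaf."""
    children = [node[k] for k in ("leftChild", "rightChild") if node.get(k)]
    return 0 if not children else 1 + max(depth(c) for c in children)


def node_summary(node):
    """Summarize the response at a node.

    Classification: the class with the most observations
    (assumes responseCounts maps class label -> count).
    Regression: (mean, variance) derived from sum, sumSq, and size.
    """
    if "responseCounts" in node:
        counts = node["responseCounts"]
        return max(counts, key=counts.get)
    n = node["size"]
    mean = node["sum"] / n
    variance = node["sumSq"] / n - mean ** 2
    return mean, variance


tree = json.loads(SAMPLE_TREE_JSON)
node_types = Counter(n["nodeType"] for n in walk(tree))
leaf_stats = {n["id"]: node_summary(n)
              for n in walk(tree) if n["nodeType"].endswith("_LEAF")}
```

Whether DecisionForestPredict_MLE derives its predictions from these same per-node statistics is not stated here; the summary above only shows what the stored fields allow a reader to compute.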