DecisionTree Output

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Release Date: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)
| Table | Description |
| --- | --- |
| OutputTable | Contains final decision tree (model table). |
| FinalResponseTable | [Optional] Contains final PID and response pair from ResponseTable and node_id from final decision tree. |
| IntermediateSplitsTable | [Disallowed with SplitsTable, optional otherwise] Contains intermediate splits. |

Output Message Schema

| Column | Data Type | Description |
| --- | --- | --- |
| message | VARCHAR | Reports either that model table was stored in table specified by OutputTable syntax element, together with depth of tree, or "The splitting conditions are not satisfied and no tree is built". |

If no tree was built, confirm following:
  • Every column in SplitsTable and CategoricalAttributesTable is also in InputTable.
  • Values of numeric columns in SplitsTable are not outside range of InputTable values.
  • Values of categorical columns in SplitsTable appear in InputTable.
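
The checks on SplitsTable above can be sketched as a quick pre-flight test run on data already fetched from the database. This is only an illustration in Python, not part of the function: `check_splits`, `input_columns`, and `splits` are hypothetical names, the in-memory shapes are assumptions, and the CategoricalAttributesTable check is left out for brevity.

```python
def check_splits(input_columns, splits):
    """Return a list of problems that could prevent a tree from being built.

    input_columns: dict mapping column name -> non-empty list of values seen
                   in the input (hypothetical stand-in for InputTable).
    splits:        list of (column name, split value) pairs
                   (hypothetical stand-in for SplitsTable).
    """
    problems = []
    for column, value in splits:
        if column not in input_columns:
            problems.append(f"{column}: not present in input")
            continue
        seen = input_columns[column]
        if isinstance(value, str):
            # Categorical split: the value must appear in the input.
            if value not in seen:
                problems.append(f"{column}: category {value!r} not in input")
        elif not (min(seen) <= value <= max(seen)):
            # Numeric split: the value must fall inside the observed range.
            problems.append(f"{column}: split {value} outside input range")
    return problems


# Made-up sample data for illustration only.
input_columns = {"age": [21, 35, 48, 62], "color": ["red", "blue"]}
splits = [("age", 40), ("age", 90), ("color", "green"), ("height", 1.8)]
for problem in check_splits(input_columns, splits):
    print(problem)
```

An empty result means none of the three conditions was violated; the message can still appear for the reason given next.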

"The splitting conditions are not satisfied and no tree is built" also appears if function cannot find split that improves impurity measurement of full input data set.

OutputTable Schema

This model table has a row for each node in the model.

| Column | Data Type | Description |
| --- | --- | --- |
| node_id | INTEGER | Node identifier. |
| node_size | INTEGER | Number of objects in node. |
| node_gini[_p] | DOUBLE PRECISION | GINI impurity value for information in node. For ImpurityMeasurement ('gini'), column name is node_gini_p; otherwise, it is node_gini. |
| node_entropy[_p] | DOUBLE PRECISION | Entropy impurity value for information in node. For ImpurityMeasurement ('entropy'), column name is node_entropy_p; otherwise, it is node_entropy. |
| node_chisq_pv[_p] | DOUBLE PRECISION | Chi-square impurity value for information in node. For ImpurityMeasurement ('chisquare'), column name is node_chisq_pv_p; otherwise, it is node_chisq_pv. |
| node_label | VARCHAR | Output category for node. |
| node_majorvotes | INTEGER | Number of objects that belong to category identified by node_label. |
| split_value | DOUBLE PRECISION | Numeric split value. |
| split_gini[_p] | DOUBLE PRECISION | GINI impurity measurement for information in node after splitting. For ImpurityMeasurement ('gini'), column name is split_gini_p; otherwise, it is split_gini. |
| split_entropy[_p] | DOUBLE PRECISION | Entropy impurity measurement for information in node after splitting. For ImpurityMeasurement ('entropy'), column name is split_entropy_p; otherwise, it is split_entropy. |
| split_chisq_pv[_p] | DOUBLE PRECISION | Chi-square impurity measurement for information in node after splitting. For ImpurityMeasurement ('chisquare'), column name is split_chisq_pv_p; otherwise, it is split_chisq_pv. |
| left_id | INTEGER | Identifier of left child of node. |
| left_size | INTEGER | Number of objects in left child of node. |
| left_label | VARCHAR | Output category for left child of node. |
| left_majorvotes | INTEGER | Number of objects that belong to category identified by left_label. |
| right_id | INTEGER | Identifier of right child of node. |
| right_size | INTEGER | Number of objects in right child of node. |
| right_label | VARCHAR | Output category for right child of node. |
| right_majorvotes | INTEGER | Number of objects that belong to category identified by right_label. |
| left_bucket | VARCHAR | When split value is categorical attribute, value in left child of node. |
| right_bucket | VARCHAR | When split value is categorical attribute, value in right child of node. |
| attribute | VARCHAR | Split attribute. |
| node_majorfreq | DOUBLE PRECISION | [Column appears only with Weighted ('true').] Weighted objects that belong to category identified by node_label. |
| left_majorfreq | DOUBLE PRECISION | [Column appears only with Weighted ('true').] Weighted objects that belong to category identified by left_label. |
| right_majorfreq | DOUBLE PRECISION | [Column appears only with Weighted ('true').] Weighted objects that belong to category identified by right_label. |
| left_label_probdist | VARCHAR | [Column appears only with OutputProb ('true').] Probability of each label for left child of node. |
| right_label_probdist | VARCHAR | [Column appears only with OutputProb ('true').] Probability of each label for right child of node. |
| prob_label_order | VARCHAR | [Column appears only with OutputProb ('true').] Order of probability of labels for left and right children of node. |
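
As an illustration of how these columns relate, the following Python sketch works on model rows that have already been fetched into dictionaries. It computes the share of objects in each node that carry node_label (node_majorvotes divided by node_size) and locates the root as the node that no other row names as a child. The sample rows, the node numbering, and the use of `None` for the children of leaf nodes are assumptions made for this example, not documented behavior.

```python
def majority_share(row):
    """Fraction of the node's objects that carry node_label."""
    return row["node_majorvotes"] / row["node_size"]


def find_roots(rows):
    """Return node_ids that no row lists as left_id or right_id."""
    child_ids = set()
    for row in rows:
        for key in ("left_id", "right_id"):
            if row.get(key) is not None:
                child_ids.add(row[key])
    return [row["node_id"] for row in rows if row["node_id"] not in child_ids]


# Hypothetical model rows; only the columns used here are shown.
# Leaf rows are assumed to carry no child identifiers (None).
rows = [
    {"node_id": 0, "node_size": 12, "node_label": "yes", "node_majorvotes": 7,
     "left_id": 1, "right_id": 2},
    {"node_id": 1, "node_size": 5, "node_label": "no", "node_majorvotes": 4,
     "left_id": None, "right_id": None},
    {"node_id": 2, "node_size": 7, "node_label": "yes", "node_majorvotes": 6,
     "left_id": None, "right_id": None},
]

print("root candidates:", find_roots(rows))
for row in rows:
    print(row["node_id"], row["node_label"], round(majority_share(row), 3))
```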

IntermediateSplitsTable Schema

| Column | Data Type | Description |
| --- | --- | --- |
| attribute | VARCHAR | Attribute name (from the attribute table in DecisionTree Input). For each attribute, the table has the number of rows specified by the NumSplits syntax element. |
| percentile | INTEGER | Percentage of values in the split. For example, if attribute A has 100 different values, then percentile = 10 and value = 1 means that the 10th value of attribute A (100 * 10% = 10) is 1, and 1 is the split value. |
| value | NUMERIC, INTEGER, BIGINT, or DOUBLE PRECISION | Split value (from the attribute table in DecisionTree Input). |
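
The percentile example can be reproduced with a short Python sketch. It assumes that positions are 1-based over the sorted distinct values and that fractional positions round up; that rounding rule, the function name `value_at_percentile`, and the sample data are assumptions made for illustration, not a description of the function's internals.

```python
def value_at_percentile(distinct_values, percentile):
    """Return the distinct value found at the given percentile position.

    Positions are 1-based and rounded up, so 100 values at percentile 10
    yield the 10th smallest value (assumed rule for this sketch).
    """
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    ordered = sorted(distinct_values)
    if not ordered:
        raise ValueError("at least one value is required")
    position = -(-len(ordered) * percentile // 100)  # ceiling division
    return ordered[position - 1]


# Attribute A with 100 different values: 0.1, 0.2, ..., 10.0 (made-up data).
values_of_a = [i / 10 for i in range(1, 101)]
print(value_at_percentile(values_of_a, 10))
```

With this data the 10th value is 1.0, which corresponds to a row with percentile = 10 and value = 1 for attribute A.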

FinalResponseTable Schema

| Column | Data Type | Description |
| --- | --- | --- |
| node_id | INTEGER | Node identifier. |
| pid | Any | Data point identifier. |
| response | NUMERIC, INTEGER, BIGINT, or DOUBLE PRECISION | Response value for the data point. |
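
Because each row pairs a data point with the node it ended up in, the table lends itself to per-node summaries. The Python sketch below counts response values per node_id for rows that have already been fetched; the function name `responses_by_node` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def responses_by_node(final_rows):
    """Map each node_id to a Counter of the response values it received."""
    summary = defaultdict(Counter)
    for row in final_rows:
        summary[row["node_id"]][row["response"]] += 1
    return dict(summary)


# Hypothetical rows; pid values and node numbering are made up.
final_rows = [
    {"node_id": 1, "pid": 101, "response": 0},
    {"node_id": 1, "pid": 102, "response": 0},
    {"node_id": 2, "pid": 103, "response": 1},
    {"node_id": 2, "pid": 104, "response": 1},
    {"node_id": 2, "pid": 105, "response": 0},
]

for node_id, counts in sorted(responses_by_node(final_rows).items()):
    print(node_id, dict(counts))
```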