DecisionForestEvaluator Output - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003

Output Table Schema

Column        Data Type         Description
worker_ip     VARCHAR           IP address of worker that produced decision tree.
task_index    INTEGER           Identifier of worker that produced decision tree.
tree_num      INTEGER           Decision tree identifier.
variable_col  VARCHAR           Name of variable used in decision tree.
level         INTEGER           Highest level of decision tree at which variable appears.
cnt           INTEGER           Number of times variable is used as split node in decision tree.
importance    DOUBLE PRECISION  Importance statistics for each decision tree.

To find overall importance of each variable, use this query, where n is number of trees:
SELECT variable_col, SUM(importance)/n
  FROM DecisionForestEvaluator (
    ON { table | view | (query) }
    [ NumLevels (number_of_levels) ]
  ) GROUP BY variable_col;
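
The averaging that this query performs can be illustrated outside the database. The following is a minimal Python sketch, not part of the function itself: it assumes the output rows have already been fetched into dictionaries keyed by the column names in the schema above, and the helper name overall_importance and the sample values are hypothetical.

```python
from collections import defaultdict


def overall_importance(rows, n_trees):
    """Sum per-tree importance for each variable and divide by the number of trees."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["variable_col"]] += row["importance"]
    return {variable: total / n_trees for variable, total in totals.items()}


# Hypothetical output rows for a forest of two trees.
rows = [
    {"tree_num": 0, "variable_col": "age", "importance": 0.25},
    {"tree_num": 0, "variable_col": "income", "importance": 0.5},
    {"tree_num": 1, "variable_col": "age", "importance": 0.75},
]

result = overall_importance(rows, n_trees=2)
print(result)
```

Dividing by the total number of trees, rather than by the number of trees in which a variable appears, matches the sum(importance)/n form of the query: a variable that is absent from a tree contributes zero for that tree.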

For classification tree:

Function measures importance by Gini impurity decrease. For each split, this is the formula for decrease in Gini impurity:

parent_node_Gini - left_node_Gini - right_node_Gini

Function records decrease in Gini impurity resulting from each split and accumulates these values over all nodes in all trees in forest, separately for each variable. Specific algorithm for calculating importance is described in "A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data," Bjoern H Menze, B Michael Kelm, Ralf Masuch, Uwe Himmelreich, Peter Bachert, Wolfgang Petrich and Fred A Hamprecht, 2009 (http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-213).
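
As an illustration of the accumulation described above, here is a minimal Python sketch. It is not the function's implementation. It assumes that each child node's Gini impurity is weighted by the fraction of the parent's rows that the child receives, as in the cited paper; the formula above lists the terms without weights, so the weighting is an assumption. The split data and helper names are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity, 1 - sum(p_k^2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease for one split; children weighted by row share (assumption)."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))


# Hypothetical splits gathered from all trees:
# (split variable, parent labels, left-child labels, right-child labels)
splits = [
    ("age", ["a", "a", "b", "b"], ["a", "a"], ["b", "b"]),
    ("income", ["a", "b", "b", "b"], ["a", "b"], ["b", "b"]),
    ("age", ["a", "b"], ["a"], ["b"]),
]

# Accumulate the decrease separately for each variable.
totals = defaultdict(float)
for variable, parent, left, right in splits:
    totals[variable] += gini_decrease(parent, left, right)

print(dict(totals))
```

A split that separates the classes cleanly yields a large decrease, so variables that produce such splits often, and near the top of trees where nodes are mixed, accumulate the highest totals.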

For regression tree:

Function calculates importance using mean squared error, as described in "Variable Importance Assessment in Regression: Linear Regression versus Random Forest," Ulrike Grömping, 2009 (https://prof.beuth-hochschule.de/fileadmin/prof/groemp/downloads/tast_2E2009_2E08199.pdf).
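
The text does not spell out how mean squared error is turned into an importance value. The Python sketch below shows one possible reading, by analogy with the classification case: the reduction in mean squared error achieved by a single split, with each child weighted by its share of the parent's rows. Both the weighting and the helper names are assumptions made for illustration, and the function may compute this differently.

```python
def mse(values):
    """Mean squared error of values around their own mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def mse_decrease(parent, left, right):
    """MSE reduction for one split; children weighted by row share (assumption)."""
    n = len(parent)
    return (mse(parent)
            - (len(left) / n) * mse(left)
            - (len(right) / n) * mse(right))


# Hypothetical response values at a node and at its two children.
parent = [1.0, 1.0, 3.0, 3.0]
left, right = [1.0, 1.0], [3.0, 3.0]

decrease = mse_decrease(parent, left, right)
print(decrease)
```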