DecisionForestEvaluator Output

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

9.02

9.01

2.0

1.3

Published

February 2022

Language

English (United States)

Last Update

2022-02-10

dita:id

B700-4003

Product Category

Teradata Vantage™

Output Table Schema

Column	Data Type	Description
worker_ip	VARCHAR	IP address of worker that produced decision tree.
task_index	INTEGER	Identifier of worker that produced decision tree.
tree_num	INTEGER	Decision tree identifier.
variable_col	VARCHAR	Variable name.
level	INTEGER	Highest level of decision tree at which variable appears.
cnt	INTEGER	Number of times variable is used as split node in decision tree.
importance	DOUBLE PRECISION	Importance statistics for each decision tree. Details follow table.

The importance column holds importance statistics for each decision tree. To calculate overall importance of each variable, group by variable and take the average over all trees. Use this query, where n is number of trees:

SELECT variable_col, SUM(importance)/n
FROM DecisionForestEvaluator (
  ON { table | view | (query) }
  [ NumLevels (number_of_levels) ]
)
GROUP BY variable_col;

For classification tree: Function measures importance by Gini impurity decrease. For each split, this is the formula for decrease in Gini impurity:

parent_node_Gini - left_node_Gini - right_node_Gini

Function records decrease in Gini impurity resulting from each split and accumulates these values over all nodes in all trees in forest, separately for each variable. Specific algorithm for calculating importance is described in "A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data," Bjoern H Menze, B Michael Kelm, Ralf Masuch, Uwe Himmelreich, Peter Bachert, Wolfgang Petrich and Fred A Hamprecht, 2009 (http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-213).

For regression tree: Function calculates importance using mean squared error, described in "Variable Importance Assessment in Regression: Linear Regression versus Random Forest," Ulrike Grömping, 2009 (https://prof.beuth-hochschule.de/fileadmin/prof/groemp/downloads/tast_2E2009_2E08199.pdf).
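
The averaging step described for the importance column (sum of per-tree importance divided by n, grouped by variable) can be sketched in Python. This is a minimal illustration under stated assumptions, not the function's implementation: overall_importance is a hypothetical helper name, the rows are made-up sample values, and each row is reduced to the three columns the calculation needs (tree_num, variable_col, importance).

```python
from collections import defaultdict


def overall_importance(rows, n_trees):
    """Average per-tree importance for each variable over n_trees trees.

    rows: iterable of (tree_num, variable_col, importance) tuples, a
    simplified stand-in for the function's output table (assumption).
    n_trees: number of trees in the forest (n in the reference query).
    """
    if n_trees <= 0:
        raise ValueError("n_trees must be positive")
    totals = defaultdict(float)
    for _tree_num, variable, importance in rows:
        # Accumulate importance per variable across all trees.
        totals[variable] += importance
    # Divide by the number of trees, mirroring sum(importance)/n.
    return {variable: total / n_trees for variable, total in totals.items()}


# Made-up sample rows for a forest of 2 trees (illustrative values only).
sample_rows = [
    (0, "age", 0.5),
    (0, "income", 0.25),
    (1, "age", 0.25),
]

result = overall_importance(sample_rows, n_trees=2)
print(result)  # {'age': 0.375, 'income': 0.125}
```

Because the divisor is the number of trees and not the number of rows per variable, a variable that is missing from a tree effectively contributes zero for that tree, which matches the query as it is written.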