Output - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product

Aster Analytics

Release Number

6.21

Published

November 2016

Language

English (United States)

Last Update

2018-04-14

Product Category

Software

The output of the Forest_Analyze function is a table of model analysis data. The following table shows its schema.

Forest_Analyze Output Table Schema
Column	Data Type	Description
worker_ip	VARCHAR	The IP address of the worker that produced the decision tree.
task_index	INTEGER	The ID of the worker that produced the decision tree.
tree_num	INTEGER	The ID of the decision tree.
variable	VARCHAR	The name of the variable that the row describes.
level	INTEGER	The highest level of the decision tree at which the variable appears.
cnt	INTEGER	The number of times that the variable is used as a split node in the decision tree.
importance	DOUBLE PRECISION	The importance statistics for each decision tree in the random forest. The notes after this table explain how to aggregate and interpret this value.

To find the overall importance of each variable, use this query, where n is the number of trees:

SELECT variable, sum(importance)/n
FROM Forest_Analyze (
  ON { table | view | (query) }
  [ NumLevels (number_of_levels) ]
)
GROUP BY variable;

For a classification tree, the function measures importance by the decrease in Gini impurity. For each split, the decrease in Gini impurity is:

parent_node_Gini - left_node_Gini - right_node_Gini

The function records the decrease in Gini impurity resulting from each split and accumulates these values for all nodes in all trees in the forest, individually for each variable. The specific algorithm for calculating importance is described in the paper "A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data," Bjoern H Menze, B Michael Kelm, Ralf Masuch, Uwe Himmelreich, Peter Bachert, Wolfgang Petrich and Fred A Hamprecht, 2009 (http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-213).

For regression, the function calculates importance using the mean squared error, as described in the paper "Variable Importance Assessment in Regression: Linear Regression versus Random Forest," Ulrike Grömping, 2009 (https://prof.beuth-hochschule.de/fileadmin/user/groemping/downloads/tast_2E2009_2E08199.pdf).
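
The Gini-based importance measure and the aggregation query described above can be illustrated with a small Python sketch. It is not the Forest_Analyze implementation. The function names (gini, gini_decrease, overall_importance) and the sample data are hypothetical. The guide's formula does not say how the child-node values are weighted, so the sketch assumes each child's impurity is weighted by its share of the parent's rows.

```python
from collections import defaultdict


def gini(class_counts):
    """Gini impurity of a node, given the number of rows in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def gini_decrease(parent, left, right):
    """Decrease in Gini impurity for one split.

    Assumption: each child's impurity is weighted by its share of the
    parent's rows; the guide's formula does not spell out the weighting.
    """
    n = sum(parent)
    w_left = sum(left) / n
    w_right = sum(right) / n
    return gini(parent) - w_left * gini(left) - w_right * gini(right)


def overall_importance(rows, n_trees):
    """Mirror of: SELECT variable, sum(importance)/n ... GROUP BY variable.

    rows: iterable of (tree_num, variable, importance) tuples.
    """
    totals = defaultdict(float)
    for _tree_num, variable, importance in rows:
        totals[variable] += importance
    return {variable: total / n_trees for variable, total in totals.items()}


# Hypothetical node: 10 rows per class, separated perfectly by the split.
print(gini_decrease([10, 10], [10, 0], [0, 10]))  # 0.5

# Hypothetical per-tree importances for a forest of 2 trees.
rows = [(0, "age", 0.5), (1, "age", 0.25), (0, "income", 0.25)]
print(overall_importance(rows, n_trees=2))
```

As in the query, the sum is divided by the total number of trees, so a variable that appears in only some trees (here, "income") is averaged over the whole forest.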