The output of the Forest_Analyze function is a table of model analysis data. The following table shows its schema.
Column | Data Type | Description |
---|---|---|
worker_ip | VARCHAR | The IP address of the worker that produced the decision tree. |
task_index | INTEGER | The ID of the worker that produced the decision tree. |
tree_num | INTEGER | The ID of the decision tree. |
variable | VARCHAR | The name of the variable used in the decision tree. |
level | INTEGER | The highest level of the decision tree at which the variable appears. |
cnt | INTEGER | The number of times that the variable is used as a split node in the decision tree. |
importance | DOUBLE PRECISION | The importance of the variable in the decision tree. To find the overall importance of each variable across the random forest, use this query, where n is the number of trees: SELECT variable, sum(importance)/n FROM Forest_Analyze ( ON { table \| view \| (query) } [ NumLevels (number_of_levels) ] ) GROUP BY variable; For a classification tree, the function measures importance by the decrease in Gini impurity. For each split, the decrease in Gini impurity is: parent_node_Gini - left_node_Gini - right_node_Gini. The function records the decrease in Gini impurity resulting from each split and accumulates these values, separately for each variable, over all nodes in all trees in the forest. The algorithm for calculating importance is described in the paper "A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data," Bjoern H. Menze, B. Michael Kelm, Ralf Masuch, Uwe Himmelreich, Peter Bachert, Wolfgang Petrich, and Fred A. Hamprecht, 2009 (http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-213). For a regression tree, the function calculates importance using the mean squared error, as described in the paper "Variable Importance Assessment in Regression: Linear Regression versus Random Forest," Ulrike Grömping, 2009 (https://prof.beuth-hochschule.de/fileadmin/user/groemping/downloads/tast_2E2009_2E08199.pdf). |
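
The arithmetic behind the importance column can be sketched outside the database. The Python below is only an illustration, not the function's implementation: the sample rows, the variable names `age` and `income`, and the helper names `overall_importance`, `gini`, and `gini_decrease` are all hypothetical. It averages per-tree importance by variable, as the query in the table does, and evaluates the Gini decrease exactly as the expression is written above (parent minus left minus right). Many random forest implementations additionally weight each child's impurity by its share of the parent's samples; that weighting is not shown here.

```python
from collections import defaultdict


def overall_importance(rows, n_trees):
    """Sum importance per variable over all rows, then divide by the tree count.

    Mirrors: SELECT variable, sum(importance)/n ... GROUP BY variable
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["variable"]] += row["importance"]
    return {variable: total / n_trees for variable, total in totals.items()}


def gini(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def gini_decrease(parent_counts, left_counts, right_counts):
    """Decrease as written in the table: parent - left - right (unweighted)."""
    return gini(parent_counts) - gini(left_counts) - gini(right_counts)


# Hypothetical Forest_Analyze output rows for a forest of two trees.
rows = [
    {"tree_num": 0, "variable": "age", "importance": 0.5},
    {"tree_num": 0, "variable": "income", "importance": 0.25},
    {"tree_num": 1, "variable": "age", "importance": 0.25},
    {"tree_num": 1, "variable": "income", "importance": 0.75},
]

result = overall_importance(rows, n_trees=2)
print(result)

# A split that separates two classes of four samples each perfectly.
print(gini_decrease([4, 4], [4, 0], [0, 4]))
```

Dividing by the number of trees rather than by the number of rows per variable matters when a variable is absent from some trees: those trees contribute zero to the sum but still count toward the average.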