DecisionForestEvaluator Example: Variable Importance - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

To calculate the overall importance of each variable, group the importance values reported in the output table by variable, and then sum them over all trees. The query in this example shows how to make that calculation.

Input

SQL Call

The DecisionForest call that created rft_model specified a NumTrees value of 50. To calculate the average importance over all trees, this SQL call therefore divides the summed importance by 50.

SELECT variable_col, SUM(importance)/50
  FROM DecisionForestEvaluator (
   ON rft_model
  ) AS dt GROUP BY variable_col ORDER BY 2 DESC;
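The arithmetic behind this query can also be sketched outside the database. The following Python snippet is only an illustration of the group, sum, and divide steps. The function name average_importance, the rows list, and the two-tree forest are hypothetical, and the numbers are made up rather than taken from rft_model.

```python
from collections import defaultdict


def average_importance(rows, num_trees):
    """Sum importance per variable, divide by the tree count, sort descending.

    rows is an iterable of (variable, importance) pairs, mirroring the
    variable_col and importance columns used in the SQL call.
    """
    totals = defaultdict(float)
    for variable, importance in rows:
        totals[variable] += importance  # GROUP BY variable_col, SUM(importance)
    averaged = {name: total / num_trees for name, total in totals.items()}
    # ORDER BY 2 DESC: the largest average importance comes first
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Made-up importance rows for a hypothetical forest of 2 trees
rows = [
    ("price", 1.5),
    ("lotsize", 0.25),
    ("price", 0.5),
    ("lotsize", 0.25),
    ("driveway", -0.5),
]

ranking = average_importance(rows, num_trees=2)
for name, value in ranking:
    print(name, value)
```

As in the SQL call, the divisor must match the number of trees used to build the model. A variable that appears in only some trees is still divided by the full tree count.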

Output

The output lists the variables in descending order of importance. The top three variables for modeling and prediction are price, lotsize, and stories.

 variable_col IMPORTANCE            
 ------------ --------------------- 
 price           1.1017219750593588
 lotsize        0.19450014830967055
 stories        0.07803449640626983
 garagepl        0.0707099673003008
 bathrms        0.05317959159771956
 bedrooms       0.03751879285236206
 fullbase      0.016919484930740327
 recroom       0.013867229822064479
 prefarea      0.013447400310153186
 gashw         0.006672653654901577
 airco        -0.016537859271588122
 driveway     -0.048091065686165085
