Decision Trees - Teradata Warehouse Miner

Purpose

Teradata Warehouse Miner (TWM) includes decision tree algorithms such as gain ratio, Gini index, and CHAID, as well as one regression algorithm. The only algorithm that currently resides in-database is gain ratio, which is available in TWM as a decision tree splitting method called Gain Ratio Extreme. Decision Tree is a stand-alone external stored procedure run directly in the Teradata database.
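For readers unfamiliar with the gain ratio criterion, the following Python sketch computes it in its usual textbook form (information gain divided by split information) for a categorical split. It is illustrative only: the function names and sample rows are hypothetical, and the in-database Gain Ratio Extreme implementation may differ in detail.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, split_col, target_col):
    """Textbook gain ratio of splitting `rows` (a list of dicts) on a categorical column."""
    total = len(rows)
    # Group the dependent values by the value of the candidate split column.
    branches = {}
    for row in rows:
        branches.setdefault(row[split_col], []).append(row[target_col])
    parent_entropy = entropy([row[target_col] for row in rows])
    # Weighted entropy that remains after the split.
    remainder = sum(len(b) / total * entropy(b) for b in branches.values())
    info_gain = parent_entropy - remainder
    # Split information penalizes splits that produce many small branches.
    split_info = -sum(
        len(b) / total * math.log2(len(b) / total) for b in branches.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical sample rows, not taken from the TWM tutorial data.
rows = [
    {"has_children": "y", "region": "east", "gender": "F"},
    {"has_children": "y", "region": "west", "gender": "F"},
    {"has_children": "n", "region": "east", "gender": "M"},
    {"has_children": "n", "region": "west", "gender": "M"},
]
informative = gain_ratio(rows, "has_children", "gender")
uninformative = gain_ratio(rows, "region", "gender")
```

In this toy data, has_children separates the two gender values completely while region carries no information about gender, so a gain ratio splitter would prefer has_children.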

To run the in-database decision tree algorithm, the td_analyze stored procedure and the tda_dt_calc table operator must be installed on the Teradata system, with appropriate permissions such as Execute Procedure granted to the user. The in-database decision tree feature is dependent on Release 15.00 of the Teradata RDBMS.

Each call to td_analyze builds one decision tree. The first parameter is the function name, decisiontree; it is followed by the decision tree parameters.

A Gain Ratio Extreme Decision Tree returns a data set that can be viewed as a result set. The result set contains one row with two columns. The second column contains an XML string representing the resulting decision tree model, described in Predictive Model Markup Language (PMML).
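Once the XML string from the second column has been fetched, it can be inspected with any XML parser. The sketch below uses Python's standard library and assumes the model follows the standard PMML TreeModel layout of nested Node elements. The SAMPLE string is a stripped-down, hypothetical fragment written for illustration; it is not actual td_analyze output, and the helper names are not part of TWM.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal PMML fragment; real output contains more elements.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <TreeModel functionName="classification">
    <Node score="F">
      <True/>
      <Node score="M">
        <SimplePredicate field="income" operator="lessOrEqual" value="30000"/>
      </Node>
      <Node score="F">
        <SimplePredicate field="income" operator="greaterThan" value="30000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""


def local(tag):
    """Strip the XML namespace prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def child_nodes(node):
    """Return the direct Node children of a PMML Node element."""
    return [child for child in node if local(child.tag) == "Node"]


def tree_depth(node):
    """Number of levels at and below this node, counting the node itself as 1."""
    return 1 + max((tree_depth(child) for child in child_nodes(node)), default=0)


def summarize_pmml(xml_text):
    """Count nodes, leaves, and levels of the first TreeModel in a PMML string."""
    root = ET.fromstring(xml_text)
    tree_model = next(e for e in root.iter() if local(e.tag) == "TreeModel")
    top = child_nodes(tree_model)[0]
    nodes = [e for e in top.iter() if local(e.tag) == "Node"]
    leaves = [n for n in nodes if not child_nodes(n)]
    return {"nodes": len(nodes), "leaves": len(leaves), "depth": tree_depth(top)}


summary = summarize_pmml(SAMPLE)
```

Matching on local tag names keeps the sketch independent of the PMML version namespace, which is not stated in this section.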

Syntax

call twm.td_analyze('decisiontree','database=twm_source;tablename=twm_customer_analysis;columns=col names;dependent=column;General Parameters');

Required Parameters

columns: The independent input columns used to build the decision tree. These columns must reside in the table named by the tablename parameter, in the database named by the database parameter. For example: columns=column1,column2,column3
database: The database containing the input table.
decisiontree: Identifies the type of function being performed.
dependent: The name of the column whose values are being predicted. The dependent column is selected from the columns of the table specified by the database and tablename parameters.
tablename: The input table from which the predictive model is built.
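The parameter string is a list of name=value pairs separated by semicolons, with column lists separated by commas. As a minimal sketch, a client script could assemble the CALL statement as shown below. The helper name build_decisiontree_call and its procedure_database argument are hypothetical and not part of TWM; the trailing semicolon inside the parameter string follows the example at the end of this section.

```python
# Parameters this section lists as required, besides the function name itself.
REQUIRED = ("database", "tablename", "columns", "dependent")


def build_decisiontree_call(params, procedure_database="twm"):
    """Return a CALL statement for td_analyze built from a dict of parameters."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):
            # The documented examples use lowercase true/false.
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Column lists are comma-separated inside one name=value pair.
            value = ",".join(value)
        parts.append(f"{name}={value}")
    param_string = ";".join(parts) + ";"
    return f"call {procedure_database}.td_analyze('decisiontree','{param_string}');"


statement = build_decisiontree_call(
    {
        "database": "twm_source",
        "tablename": "twm_customer_analysis",
        "columns": ["income", "age", "nbr_children"],
        "dependent": "gender",
    }
)
```

Checking for the required names before sending the statement surfaces a missing parameter on the client side rather than as a database error. The sketch does no quoting or escaping, so it is only suitable for trusted, hand-written parameter values.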

Optional Parameters

algorithm: The algorithm used to build the decision tree. Currently the only allowed value is gainratio.
binning: Option to automatically Bincode the continuous independent variables. Continuous data is separated into one hundred bins when this option is selected. If the variable has fewer than one hundred distinct values, this option is ignored. Default is false.
max_depth: Specifies the maximum number of levels the tree can grow. The default is 100.
min_records: Limits how far the decision tree can split. Unless a node is pure (meaning all of its observations have the same dependent value), it splits if each branch that can come off the node contains at least this many observations. The default is a minimum of two cases for each branch.
operatordatabase: The database where the table operators called by td_analyze reside. If not specified, the database software searches the standard search path for table operators, including the current user database. For example: operatordatabase=twm
outputdatabase: The database containing the resulting output table when outputstyle=table or view.
outputtablename: The name of the output table representing the decision tree model.
overwrite: When overwrite is set to true (default), the output tables are dropped before creating new ones.
pruning: Determines the style of pruning to use after the tree is fully built. The default option is gainratio. The only other option at this time is none, which performs no pruning of the tree.
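The interaction of the min_records and max_depth stopping rules described above can be sketched in Python as follows. The function name may_split is hypothetical, and the way levels are counted (the root is level 1, and a node at level max_depth does not split) is an assumption, since this section does not define it.

```python
def may_split(node_labels, branch_sizes, level, min_records=2, max_depth=100):
    """Return True if a node may split under the documented stopping rules.

    node_labels  -- dependent values of the observations in the node
    branch_sizes -- number of observations each candidate branch would receive
    level        -- level of the node, counting the root as 1 (assumption)
    """
    # A pure node holds only one dependent value and is not split further.
    if len(set(node_labels)) <= 1:
        return False
    # Assumed reading of max_depth: children may not exceed the maximum level.
    if level >= max_depth:
        return False
    # Every branch coming off the node needs at least min_records observations.
    return all(size >= min_records for size in branch_sizes)


mixed = ["F", "F", "M", "M", "M"]
ok = may_split(mixed, branch_sizes=[2, 3], level=1)
too_small = may_split(mixed, branch_sizes=[1, 4], level=1)
pure = may_split(["F", "F", "F"], branch_sizes=[2, 1], level=1)
```

With the default min_records of two, the split into branches of 2 and 3 observations is allowed, while the split into 1 and 4 is rejected because one branch is too small.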

Example

To run the provided examples, the td_analyze stored procedure must be installed in a database called twm and the TWM tutorial data must be installed in the twm_source database.

This example shows how to invoke the td_analyze stored procedure and the tda_dt_calc table operator to build a decision tree. The resulting model is returned from the td_analyze stored procedure or placed in the chosen output database and output table.

call twm.td_analyze('decisiontree','database=twm_source;tablename=twm_customer_analysis;columns=income,age,nbr_children;dependent=gender;min_records=2;max_depth=5;binning=false;algorithm=gainratio;pruning=gainratio;outputdatabase=twm;outputtablename=cust_analysis_dt;operatordatabase=twm;');