Purpose
Miner includes several decision tree algorithms, such as gain ratio, Gini index, and CHAID, as well as one regression algorithm. Currently, the only algorithm that runs in-database is gain ratio, which is available in Miner as a decision tree splitting method called Gain Ratio Extreme. It runs as a stand-alone external stored procedure executed directly in the Teradata database.
To execute the in-database decision tree algorithm, the td_analyze stored procedure and the tda_dt_calc table operator must be installed on the Teradata system, with appropriate permissions such as Execute Procedure granted to the user. The in-database decision tree feature is dependent on Release 15.00 of the Teradata RDBMS.
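As a rough sketch of the permissions mentioned above, the statements below grant a user the right to run the procedure and the table operator. They assume both objects are installed in a database named twm, and analyst_user is a hypothetical user name. The exact privileges your site requires may differ.

```sql
-- Assumption: td_analyze and tda_dt_calc are installed in database twm.
-- Assumption: analyst_user is a placeholder for the user who runs the analysis.
grant execute procedure on twm.td_analyze to analyst_user;
grant execute function on twm.tda_dt_calc to analyst_user;
```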
Each call to td_analyze builds one decision tree. The first argument is the function name, decisiontree. The second argument is a string of decision tree parameters, written as name=value pairs separated by semicolons.
A Gain Ratio Extreme decision tree returns a data set that can be viewed as a result set. The result set contains one row with two columns. The second column contains an XML string representing the resulting decision tree model, described in Predictive Model Markup Language (PMML).
Syntax
call twm.td_analyze('decisiontree','database=twm_source;tablename=twm_customer_analysis;columns=col names;dependent=column;General Parameters');
Required Parameters
- columns
- The independent input columns used to build the decision tree. These columns must exist in the table named by the tablename parameter, in the database named by the database parameter.
- database
- The database containing the input table.
- decisiontree
- Identifies the type of function being performed.
- dependent
- The name of the column whose values are being predicted. It must be one of the columns in the table specified by the database and tablename parameters.
- tablename
- The name of the input table containing the columns to analyze.
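The required parameters alone are enough to form a call. The sketch below reuses the sample database, table, and column names from the example later in this section and assumes that every optional parameter falls back to its documented default. With no output table named, the model is expected to come back in the result set.

```sql
-- Minimal sketch: required parameters only.
-- Assumption: td_analyze is installed in twm and the tda_dt_calc table operator
-- can be found on the standard search path.
call twm.td_analyze('decisiontree','database=twm_source;tablename=twm_customer_analysis;columns=income,age,nbr_children;dependent=gender');
```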
Optional Parameters
- algorithm
- The algorithm the decision tree uses during building. Currently this option only allows gainratio.
- binning
- Option to automatically Bincode the continuous independent variables. When this option is selected, continuous data is separated into one hundred bins. If a variable has fewer than one hundred distinct values, this option is ignored. The default is false.
- max_depth
- Specifies the maximum number of levels the tree can grow. The default is 100.
- min_records
- Specifies how far the decision tree can split. Unless a node is pure (meaning all of its observations have the same dependent value), it splits if each branch that can come off the node contains at least this many observations. The default is a minimum of two observations for each branch.
- operatordatabase
- The database where the tda_dt_calc table operator called by td_analyze resides. If not specified, the database software searches the standard search path for table operators, including the current user database.
- outputdatabase
- The database containing the resulting output table when outputstyle=table or view.
- outputtablename
- The name of the output table representing the decision tree model.
- pruning
- Determines the style of pruning to use after the tree is fully built. The default is gainratio. The only other option at this time is none, which leaves the tree unpruned.
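To illustrate how the optional parameters combine, the following sketch turns on automatic binning, raises the minimum branch size, limits the tree depth, and disables pruning. The values 10 and 10 are arbitrary illustrations, and binning=true is assumed to be the counterpart of the documented default of false.

```sql
-- Sketch: override several optional parameters in one call.
-- Assumption: binning accepts true as the opposite of its default, false.
-- min_records=10 and max_depth=10 are illustrative values, not recommendations.
call twm.td_analyze('decisiontree','database=twm_source;tablename=twm_customer_analysis;columns=income,age,nbr_children;dependent=gender;binning=true;min_records=10;max_depth=10;pruning=none;operatordatabase=twm');
```

Because no outputtablename is given here, the model would be returned in the result set rather than stored in a table.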
Example
This example assumes the td_analyze stored procedure is installed in a database named twm.
call twm.td_analyze('decisiontree','database=twm_source;tablename=twm_customer_analysis;columns=income,age,nbr_children;dependent=gender;min_records=2;max_depth=5;binning=false;algorithm=gainratio;pruning=gainratio;outputdatabase=twm;outputtablename=cust_analysis_dt;operatordatabase=twm;');
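After the call completes, the model should be available in the output table named in the call. The layout of that table is not described in this section, so the sketch below simply selects every column.

```sql
-- Sketch: inspect the stored decision tree model.
-- Assumption: the call above created twm.cust_analysis_dt; its column layout
-- is not documented here, so all columns are selected.
select * from twm.cust_analysis_dt;
```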