5.4.5 - Decision Tree - INPUT - Analysis Parameters - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-04
  1. On the Decision Tree dialog box, click INPUT.
  2. Click analysis parameters.
    Decision Tree > Input > Analysis Parameters

  3. On this screen, select:
    • Splitting Options
      • Splitting Method
        • Gain Ratio — Option to use the Gain Ratio splitting criterion.
        • Gini Index — Option to use the Gini Index splitting criterion.
        • Chaid — Option to use the Chaid splitting criterion. When this option is selected, you can also change the merging and splitting Chaid Significance Levels.
        • Regression Trees — Option to use the Regression splitting criterion as outlined above.
        • Gain Ratio Extreme — Option to use the Gain Ratio splitting criterion, computed by a stored procedure and a table operator that process the data more directly in the database for better resource utilization.
          When using this option, confirm that the td_analyze external stored procedure and the tda_dt_calc table operator are installed in the database where the TWM metadata tables reside. To install them, run the Install or Uninstall UDFs option under the Teradata Warehouse Miner Start program item and select Install TD_Analyze UDFs.
      • Minimum Split Count — Determines how far the splitting of the decision tree goes. Unless a node is pure (that is, all of its observations have the same dependent value), it is split if each branch that would come off the node contains at least this many observations. The default is a minimum of 2 observations for each branch.
      • Maximum Nodes — (This option is not available when using the Gain Ratio Extreme splitting method.) If the number of nodes in the tree reaches or exceeds this value while a level of the tree is being split, the algorithm stops growing the tree after completing that level and returns the tree built so far. The default is 10000 nodes.
      • Maximum Depth — Another way to stop tree growth is to specify the maximum depth the tree may reach. The algorithm stops when the tree being built has this many levels. The default is 100 levels.
      • Chaid Significance Levels — (These options are only available when using the Chaid splitting method.)
        • Merging — Each independent variable is tested by looping through its values and merging the categories that have the least significant difference from one another and are still below this merging significance level (default 0.05).
        • Splitting — Once all independent variables have been optimally merged, the one with the highest significance is chosen for the split, the data is subdivided, and the process is repeated on each subset of the data. Splitting stops when the significance rises above this splitting significance level (default 0.05).
      • Bin Numeric Variables — Option to automatically Bincode the continuous independent variables. When this option is selected, continuous data is separated into one hundred bins. If a variable has fewer than one hundred distinct values, this option is ignored for that variable.
      • Include Validation Table — (This option is not available when using the Gain Ratio Extreme splitting method.) A supplementary table can be used in the modeling process to validate the effectiveness of the model on a separate set of observations. If specified, this table is used to calculate a second set of confidence or targeted confidence factors. These recalculated confidence factors can be viewed in the tree browser, added to the scored table when the resulting model is scored, or both. When Include Validation Table is selected, a separate validation table is required.
        • Database — The name of the database that contains the validation table. By default, this is the source database.
        • Table — The name of the validation table to use for recalculating confidence or targeted confidence factors.
      • Include Lift Table — (This option is not available when using the Gain Ratio Extreme splitting method.) Option to generate a Cumulative Lift Table in the report to demonstrate how effective the model is in estimating the dependent variable. Valid for binary dependent variables only.
      • Response Value — Optionally, the value of the dependent variable that represents a response. All other dependent variable values are considered non-response values.
      • Values — Opens the Decision Tree values wizard to help specify the response value.
    • Pruning Options
      • Pruning Method — Pull-down list with the following values:
        • Gain Ratio — Option to use the Gain Ratio pruning criterion as outlined above.
        • Gini Index — (This option is not available when using the Gain Ratio Extreme splitting method.) Option to use the Gini Index pruning criterion as outlined above.
        • None — Option to leave the resulting decision tree unpruned.
      • Gini Test Table — (This option does not apply when using the Gain Ratio Extreme splitting method.) When Gini Index pruning is selected as the pruning method, a separate Test table is required.
        • Database — The name of the database that contains the Test table. By default, this is the source database.
        • Table — The name of the table to use for test purposes during the Gini Pruning process.
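The Gain Ratio and Gini Index options listed above name standard decision tree splitting criteria. The following is a minimal sketch using the common textbook definitions (Gini impurity as 1 minus the sum of squared class proportions, and the C4.5 gain ratio as information gain divided by split information). This section does not say how TWM computes these values internally, so the function names and formulas are illustrative assumptions, not the product's implementation.

```python
import math
from collections import Counter


def gini_index(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (base 2) of a node's class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def weighted_gini(branches):
    """Size-weighted Gini impurity of the branches of a candidate split (lower is better)."""
    total = sum(len(b) for b in branches)
    return sum(len(b) / total * gini_index(b) for b in branches)


def gain_ratio(parent, branches):
    """C4.5-style gain ratio: information gain divided by split information (higher is better)."""
    total = len(parent)
    info_gain = entropy(parent) - sum(len(b) / total * entropy(b) for b in branches)
    split_info = -sum((len(b) / total) * math.log2(len(b) / total) for b in branches if b)
    return info_gain / split_info if split_info > 0 else 0.0


# Example: a split that perfectly separates two classes.
parent = ["yes", "yes", "no", "no"]
branches = [["yes", "yes"], ["no", "no"]]
print(gini_index(parent), weighted_gini(branches), gain_ratio(parent, branches))
```

In this example the parent node has a Gini impurity of 0.5, while the perfectly separating split has a weighted Gini of 0 and a gain ratio of 1. A "split" that produces only one branch has zero split information, so the sketch scores it as 0 rather than dividing by zero.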
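The Minimum Split Count, Maximum Nodes, and Maximum Depth options described above act as stopping rules. The sketch below restates them as two predicates using the documented defaults (2, 10000, and 100). The helper names node_can_split and grow_next_level are hypothetical, and the assumption that both growth limits are checked once per completed level follows the Maximum Nodes description rather than any documented TWM interface.

```python
def node_can_split(node_labels, branch_sizes, min_split_count=2):
    """Minimum Split Count rule (hypothetical helper): split an impure node only
    if every branch the split would create holds at least min_split_count rows."""
    if len(set(node_labels)) <= 1:  # pure node: all rows share one dependent value
        return False
    return all(size >= min_split_count for size in branch_sizes)


def grow_next_level(node_count, levels, max_nodes=10000, max_depth=100):
    """Assumed to be checked after a level is completed (hypothetical helper):
    stop once the node count reaches Maximum Nodes or the tree has Maximum Depth levels."""
    return node_count < max_nodes and levels < max_depth


print(node_can_split(["a", "b", "a", "b"], [2, 2]))  # impure, both branches have 2 rows
print(node_can_split(["a", "b", "a"], [2, 1]))       # one branch is below the minimum
print(grow_next_level(node_count=10000, levels=12))  # Maximum Nodes has been reached
```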
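The Bin Numeric Variables option above separates a continuous variable into one hundred bins unless it has fewer than one hundred distinct values. This section does not state whether Bincode uses equal-width or equal-count bins, so the sketch below assumes equal-width bins between the variable's minimum and maximum; the function name bin_numeric is hypothetical.

```python
def bin_numeric(values, num_bins=100):
    """Equal-width binning sketch (assumption: TWM's Bincode method may differ).

    Returns bin numbers 1..num_bins, or the values unchanged when the variable
    has fewer than num_bins distinct values, mirroring the Bin Numeric Variables rule.
    """
    if len(set(values)) < num_bins:
        return list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    # min() keeps the maximum value in the last bin instead of a bin num_bins + 1.
    return [min(int((v - lo) / width) + 1, num_bins) for v in values]


binned = bin_numeric([float(x) for x in range(1000)])
print(binned[0], binned[-1], len(set(binned)))
```

Because the distinct-value check runs first, the range is guaranteed to be non-zero when the width is computed, so the division is safe.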
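The Include Validation Table option above recalculates confidence factors on a separate set of observations. As an illustration only, the sketch assumes that a leaf's confidence factor is the share of observations in that leaf whose actual dependent value equals the value the leaf predicts; targeted confidence is not shown. The leaf identifiers and the helper leaf_confidence are hypothetical.

```python
from collections import Counter


def leaf_confidence(leaf_ids, actual, leaf_prediction):
    """Per-leaf confidence, assumed here to be the share of rows in a leaf whose
    actual dependent value equals the value the leaf predicts."""
    totals = Counter(leaf_ids)
    hits = Counter(leaf for leaf, y in zip(leaf_ids, actual) if y == leaf_prediction[leaf])
    return {leaf: hits[leaf] / totals[leaf] for leaf in totals}


prediction = {"L1": "yes", "L2": "no"}  # hypothetical leaves built from the training table
training = leaf_confidence(["L1", "L1", "L2", "L2"], ["yes", "yes", "no", "no"], prediction)
validation = leaf_confidence(["L1", "L1", "L2", "L2"], ["yes", "no", "no", "no"], prediction)
print(training, validation)
```

The tree structure and leaf predictions stay fixed; only the observations change, so a drop from 1.0 to 0.5 for leaf L1 signals that the leaf does not generalize well to the validation rows.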
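The Include Lift Table and Response Value options above describe a Cumulative Lift Table for a binary dependent variable, where every value other than the response value counts as a non-response. The following is a minimal sketch of the usual cumulative lift calculation; the decile grouping, the use of a per-row model score for ranking, and the function name cumulative_lift are assumptions, not details taken from TWM's report.

```python
def cumulative_lift(scores, actual, response_value, num_groups=10):
    """Cumulative lift sketch: rank rows by descending model score, then compare the
    response rate of the top g/num_groups of rows with the overall response rate.
    Any dependent value other than response_value counts as a non-response."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    is_response = [1 if y == response_value else 0 for _, y in ranked]
    n = len(is_response)
    overall_rate = sum(is_response) / n
    if overall_rate == 0:
        raise ValueError("no rows carry the response value")
    table = []
    for g in range(1, num_groups + 1):
        cutoff = max(1, n * g // num_groups)  # rows in the top g groups
        group_rate = sum(is_response[:cutoff]) / cutoff
        table.append((g, cutoff, group_rate / overall_rate))
    return table


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = ["y", "y", "n", "y", "n", "n", "n", "n", "n", "n"]
for group, rows, lift in cumulative_lift(scores, actual, response_value="y"):
    print(group, rows, round(lift, 2))
```

A lift above 1 in the first groups means the model concentrates responses among its highest-scored rows; the last group always covers every row, so its cumulative lift is 1.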