Decision Tree - INPUT - Data Selection - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

February 2018
  1. On the Decision Tree dialog box, click INPUT.
  2. Click Data Selection.
    Decision Tree > Input > Data Selection

  3. On this screen, select:
    • Select Input Source — Choose the source of input. With Input Source set to Table, select from the available databases, tables (or views), and columns in the usual manner. With Input Source set to Analysis, select directly from the output of another analysis of qualifying type in the current project. Qualifying analyses include all of the Analytic Data Set (ADS) and Reorganization analyses (except Refresh). In this case, Available Analyses takes the place of Available Databases, and Available Tables lists all the output tables that the selected analysis will eventually produce.
      Since this analysis cannot select from a volatile input table, Available Analyses contains only those qualifying analyses that create an output table or view.
    • Select Columns From a Single Table
      • Available Databases (or Analyses) — All the databases (or analyses) that are available for the Decision Tree analysis.
      • Available Tables — All the tables that are available for the Decision Tree analysis.
      • Available Columns — Within the selected table or matrix, all columns that are available for the Decision Tree analysis.
      • Selected Columns — Select columns by highlighting them and then either dragging and dropping them into the Selected Columns window or clicking the arrow button to move them there.
        The Selected Columns window is a split window; you can insert columns as either Dependent or Independent columns. Make sure the correct portion of the window is highlighted before inserting.
        • Independent — These may be of numeric or character type.
        • Dependent — The dependent variable column is the column whose value is being predicted. It is selected from the Available Variables in the selected table. When Gain Ratio or Gini Index is selected as the Tree Splitting criterion, the column is treated as a categorical variable with distinct values, in keeping with the nature of classification trees. An error occurs if the Dependent Variable has more than 50 distinct values.
          When Regression Trees is selected as the Tree Splitting criterion, the column is treated as a continuous variable. In this case, it must contain only numeric values.
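
The Dependent column rules above can be expressed as a pre-check on the data before the analysis is run. The following Python sketch is illustrative only and is not part of Teradata Warehouse Miner. The function name `check_dependent_column`, the choice to ignore nulls, and the reading that the 50-value limit applies when Gain Ratio or Gini Index is selected are all assumptions made for this example.

```python
from numbers import Number

# Limit on distinct dependent values stated in the text.
MAX_DISTINCT_VALUES = 50

# Tree Splitting options named in the text.
CLASSIFICATION_CRITERIA = {"Gain Ratio", "Gini Index"}
REGRESSION_CRITERION = "Regression Trees"


def check_dependent_column(values, criterion):
    """Hypothetical pre-check; returns a list of problems (empty if none).

    Assumption: None stands for a SQL NULL and is ignored by the check.
    """
    non_null = [v for v in values if v is not None]
    problems = []

    if criterion in CLASSIFICATION_CRITERIA:
        # Classification trees: the column is categorical, so count
        # its distinct values against the stated limit.
        distinct = len(set(non_null))
        if distinct > MAX_DISTINCT_VALUES:
            problems.append(
                f"{distinct} distinct values exceeds the limit "
                f"of {MAX_DISTINCT_VALUES}"
            )
    elif criterion == REGRESSION_CRITERION:
        # Regression trees: the column is continuous, so every value
        # must be numeric. bool is excluded because it subclasses int.
        bad = [
            v for v in non_null
            if isinstance(v, bool) or not isinstance(v, Number)
        ]
        if bad:
            problems.append(f"{len(bad)} non-numeric value(s) found")
    else:
        raise ValueError(f"unknown Tree Splitting criterion: {criterion!r}")

    return problems
```

Calling `check_dependent_column(column_values, "Gini Index")` on values fetched from the chosen column returns an empty list when the column is acceptable under these assumptions, so the check can run before the analysis is submitted.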