Data Explorer - INPUT - Expert Options

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

  1. On the Data Explorer dialog box, click INPUT.
  2. Click Expert Options.
    Data Explorer > Input > Expert Options

  3. On this screen, set the following options:
    • Number of Tables to Process in Parallel — The total number of threads to use in order to process tables in parallel. This threading is described above. Defaults to 3 tables.
    • Maximum Unique Character Values for Unrestricted Frequency Analysis — The maximum number of unique values a character type column may have for an unrestricted frequency analysis to be performed on it (by default, 100 unique values). Increasing this value adds to the processing time of the Data Explorer Frequency analysis, because a complete frequency is computed against more unique values. If a column's total number of unique values exceeds the number given here, a restricted frequency is performed automatically. See Restricted Frequency Processing below.
    • Maximum Unique Numeric/Date Values for Frequency Analysis — The maximum number of unique values a numeric or date type column may have for a frequency analysis to be performed on it (by default, 20 unique values). If a numeric or date column has more unique values than this, a histogram analysis is performed instead.
    • Minimum Rows Before Frequency/Histogram Combining Attempted — The minimum number of rows required before the combining strategy is attempted in frequency and histogram analysis. This strategy is defined for both analyses above. Defaults to 25000 rows. Note that combining columns for these analyses has shown no performance improvement with fewer rows than that.
    • Maximum Number of Combined Values for Frequency/Histogram Analysis — The maximum number of possible combined values to allow when combining columns in frequency and histogram analysis. Performance problems and/or SQL errors may result when this is increased. Defaults to 10000 combined values.
    • Restricted Frequency Processing (Include Prominent Values) — Select this check box to enable restricted frequency analysis. For character columns with more unique values than the limit set by the Maximum Unique Character Values for Unrestricted Frequency Analysis parameter, a restricted frequency includes in the results only those values that occur in at least a minimum percentage of rows. Defaults to enabled.
      • Minimum Fraction of Rows Frequency Value Must Occur In — The minimum percentage of rows in which a value must occur to be included in the results. If the ratio of unique values to rows, expressed as a percentage, is greater than 100 minus this percentage (with the default, 100 - 1 = 99%), the restricted frequency analysis is skipped. Otherwise, the restricted frequency analysis is executed. Defaults to 1 percent.
    • Auto-Calculate the Number of Select List Items — When checked, an attempt is made to determine the number of select list items that should be included in the SQL generated for the Values and Statistics analyses. In some cases, however, the SQL for the Statistics analysis may fail because too many select list items are generated, depending on the number of input columns and the Basic Statistics Options requested. In this case, uncheck the Auto-Calculate option and provide a value in the Maximum Number... text box below it.
      Tip: When processing more than 300 input columns with the first five statistics requested, try setting the maximum items to 1000 or less in the text box below.
      • Maximum Number of Select List Items — An integer greater than 0 representing the maximum number of items that will appear in any given SELECT statement generated by the Data Explorer Values and Statistics analyses.
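
The thresholds described above can be read as a small decision procedure. The following Python sketch is illustrative only and is not Teradata Warehouse Miner code: the names ExpertOptions, choose_analysis and split_select_items are hypothetical, the result for a disabled restricted frequency check box is an assumption (the guide does not state it), and splitting select list items into consecutive batches is one plausible reading of the Maximum Number of Select List Items option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertOptions:
    """Hypothetical container holding the documented default values."""
    max_unique_char_values: int = 100     # unrestricted frequency limit
    max_unique_num_date_values: int = 20  # frequency vs. histogram limit
    restricted_frequency: bool = True     # Restricted Frequency Processing
    min_percent_of_rows: float = 1.0      # minimum percent of rows per value


DEFAULTS = ExpertOptions()


def choose_analysis(column_kind, unique_values, row_count, opts=DEFAULTS):
    """Return the analysis the documented thresholds would select."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")

    if column_kind in ("numeric", "date"):
        # More unique values than the limit: histogram instead of frequency.
        if unique_values <= opts.max_unique_num_date_values:
            return "frequency"
        return "histogram"

    if column_kind != "character":
        raise ValueError("column_kind must be character, numeric or date")

    if unique_values <= opts.max_unique_char_values:
        return "unrestricted frequency"

    if not opts.restricted_frequency:
        # Assumption: the guide does not say what happens in this case.
        return "skipped"

    # Skip when nearly every row holds a distinct value
    # (default: unique/rows above 100 - 1 = 99 percent).
    unique_percent = 100.0 * unique_values / row_count
    if unique_percent > 100.0 - opts.min_percent_of_rows:
        return "skipped"
    return "restricted frequency"


def split_select_items(items, max_items):
    """Split select list items into batches of at most max_items each."""
    if max_items <= 0:
        raise ValueError("max_items must be an integer greater than 0")
    return [items[i:i + max_items] for i in range(0, len(items), max_items)]
```

For example, assuming one select list item per column per statistic, 300 input columns with five statistics give 1500 items. With a maximum of 1000, split_select_items would produce two batches, of 1000 and 500 items.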