Data Explorer - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 1 Introduction and Profiling

Product: Teradata Warehouse Miner
Release Number: 5.4.4
Published: July 2017
Language: English (United States)
Last Update: 2018-05-03
dita:id: B035-2300
Product Category: Software
The Data Explorer performs basic statistical analysis on a set of selected tables or on selected columns from selected tables in one or more databases. It stores results from four fundamental types of analysis based on simplified versions of the Descriptive Statistics analyses:
  • Values
  • Statistics
  • Frequency
  • Histogram

An answer table is produced for each requested type of analysis. The output includes the requested table names and column names, so that results from multiple tables can be included in each answer table.
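The idea of one answer table holding results for several input tables can be sketched as follows. This is only an illustration, not the product's implementation: SQLite stands in for the database, the `customers` and `orders` tables and the `collect_statistics` helper are hypothetical, and only four statistics are computed.

```python
import sqlite3

def collect_statistics(conn, columns):
    """Return one combined answer: (table, column, count, min, max, mean) per column."""
    answer = []
    for table, column in columns:
        # Identifiers are interpolated directly; this is acceptable here only
        # because they are hard-coded in this sketch, not taken from user input.
        row = conn.execute(
            f'SELECT COUNT("{column}"), MIN("{column}"), '
            f'MAX("{column}"), AVG("{column}") FROM "{table}"'
        ).fetchone()
        # Prefix each result with its table and column name so rows from
        # different input tables can live side by side in one answer table.
        answer.append((table, column) + tuple(row))
    return answer

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER)")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?)", [(30,), (40,), (50,)])
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,)])

stats_answer = collect_statistics(conn, [("customers", "age"), ("orders", "amount")])
for row in stats_answer:
    print(row)
```

Because every row carries its table and column name, a single result set can be filtered or grouped by input table afterwards.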

Each analysis can be selected individually, with the following exceptions:
  1. If Frequency is selected, Values must also be selected.
  2. If Histogram is selected, Values must also be selected, and Statistics must be selected with the Count, Minimum, Maximum, Mean and Standard Deviation options included.
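The two exceptions above amount to a small set of dependency checks. The following sketch expresses them as a validation function; the function name, the message wording and the way selections are represented (sets of strings) are assumptions made for illustration and do not reflect how the product stores these options.

```python
# Statistics options that the Histogram analysis depends on, per the rules above.
REQUIRED_FOR_HISTOGRAM = {"Count", "Minimum", "Maximum", "Mean", "Standard Deviation"}

def check_selection(analyses, statistics_options=frozenset()):
    """Return a list of rule violations; an empty list means the selection is valid."""
    problems = []
    # Rule 1: Frequency depends on Values.
    if "Frequency" in analyses and "Values" not in analyses:
        problems.append("Frequency requires Values")
    # Rule 2: Histogram depends on Values and on specific Statistics options.
    if "Histogram" in analyses:
        if "Values" not in analyses:
            problems.append("Histogram requires Values")
        if "Statistics" not in analyses:
            problems.append("Histogram requires Statistics")
        else:
            missing = REQUIRED_FOR_HISTOGRAM - set(statistics_options)
            if missing:
                problems.append(
                    "Histogram requires Statistics options: " + ", ".join(sorted(missing))
                )
    return problems

# Example: Histogram requested, but Standard Deviation was left out.
print(check_selection(
    {"Values", "Statistics", "Histogram"},
    {"Count", "Minimum", "Maximum", "Mean"},
))
```

Returning every violation at once, rather than stopping at the first, lets a caller report all missing selections together.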

Data Explorer includes intelligence about which functions should be performed on which columns, with decisions based partly on column type and partly on the results obtained. It also includes performance enhancements that minimize the number of passes over the input data. You may also specify a separate SQL WHERE clause to apply to each of the input tables selected for analysis.
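As an illustration of a separate WHERE clause per input table, the sketch below builds a simple counting query for each table and appends that table's filter when one is given. The `count_query` helper, the table and column names, and the filter text are all hypothetical; the SQL that Data Explorer actually generates is not shown in this topic.

```python
def count_query(table, column, where=None):
    """Build a query counting all rows and non-null values of one column."""
    sql = (
        f"SELECT COUNT(*) AS row_count, COUNT({column}) AS non_null_count "
        f"FROM {table}"
    )
    # The filter is optional and specific to this one input table.
    if where:
        sql += f" WHERE {where}"
    return sql

# Hypothetical per-table filters: only 'orders' has one.
where_clauses = {"orders": "order_date >= DATE '2017-01-01'"}

queries = [
    count_query(table, column, where_clauses.get(table))
    for table, column in [("orders", "amount"), ("customers", "age")]
]
for q in queries:
    print(q)
```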

The normal Data Explorer processing scheme is outlined below. Note that underlined values given in the following topics are threshold values that can be set by the user. The program first builds up to four output tables, then applies the steps below to each requested input table, one at a time. If parallel processing is requested, however, the tables are in effect processed n at a time, where n is the number of tables to process in parallel. That is, the program establishes n threads and performs the steps below for each input table in a separate thread until all tables are processed.
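The parallel scheme can be pictured with a pool of n worker threads, each taking the next unprocessed table. The sketch below uses Python's standard thread pool; `explore_table` is a placeholder for the per-table steps, and the use of a pool to hand out tables is an assumption, since this topic states only that n threads are established.

```python
from concurrent.futures import ThreadPoolExecutor

def explore_table(table):
    """Placeholder for the per-table steps; here it only reports the table name."""
    return f"processed {table}"

def explore_all(tables, n=1):
    """Process every table, with at most n tables in progress at any moment."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(explore_table, tables))

# Hypothetical input tables, processed two at a time.
results = explore_all(["customers", "orders", "products"], n=2)
print(results)
```

With n=1 the same code reduces to the one-table-at-a-time scheme.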

For general information about output, see OUTPUT Tab.