Data Explorer - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

Product

Teradata Warehouse Miner

Release Number

5.4.4

Published

July 2017

Language

English (United States)

Last Update

2018-05-03

dita:id

B035-2300

Product Category

Software

The Data Explorer performs basic statistical analysis on a set of selected tables or on selected columns from selected tables in one or more databases. It stores results from four fundamental types of analysis based on simplified versions of the Descriptive Statistics analyses:

Values
Statistics
Frequency
Histogram

One answer table is produced for each requested type of analysis. Each answer table records the table name and column name alongside every result, so that results from multiple input tables can be stored in the same answer table.

Each analysis can be selected individually, with the following exceptions:

If Frequency is selected, Values must be selected.
If Histogram is selected, Values must also be selected, and Statistics must be selected with the Count, Minimum, Maximum, Mean, and Standard Deviation options included.
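
The selection rules above can be expressed as a small validation routine. The following Python sketch is illustrative only and is not part of Teradata Warehouse Miner; the function name `check_selection` and the string labels are assumptions chosen to mirror the option names in this topic.

```python
# Illustrative sketch only: not a Teradata Warehouse Miner API.
# Option labels mirror the names used in this topic.
REQUIRED_HISTOGRAM_STATS = frozenset(
    {"Count", "Minimum", "Maximum", "Mean", "Standard Deviation"}
)


def check_selection(analyses, statistics_options=()):
    """Return a list of rule violations; an empty list means the selection is valid."""
    analyses = set(analyses)
    stats = set(statistics_options)
    problems = []

    # Rule 1: Frequency depends on Values.
    if "Frequency" in analyses and "Values" not in analyses:
        problems.append("Frequency requires Values")

    # Rule 2: Histogram depends on Values and on specific Statistics options.
    if "Histogram" in analyses:
        if "Values" not in analyses:
            problems.append("Histogram requires Values")
        if "Statistics" not in analyses:
            problems.append("Histogram requires Statistics")
        else:
            missing = REQUIRED_HISTOGRAM_STATS - stats
            if missing:
                problems.append(
                    "Histogram requires Statistics options: "
                    + ", ".join(sorted(missing))
                )
    return problems
```

For example, `check_selection({"Frequency"})` reports that Values is missing, while `check_selection({"Values", "Frequency"})` returns an empty list.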

Data Explorer includes built-in logic that decides which functions to perform on which columns, based partly on column type and partly on the results obtained. It also includes performance enhancements that minimize the number of passes over the input data. You can also specify a separate SQL WHERE clause to apply to each of the input tables selected for analysis.

The normal Data Explorer processing scheme is outlined below. Note that underlined values in the following topics are threshold values that the user can set. The program first builds up to four output tables, one for each requested type of analysis, and then applies the steps below to each requested input table, one at a time. If parallel processing is requested, the tables are instead processed n at a time, where n is the number of tables to process in parallel. That is, the program establishes n threads and performs the steps below for each input table in a separate thread until all tables have been processed.
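
The threading scheme described above resembles a fixed-size worker pool. The following Python sketch shows the general pattern only and makes no claim about how Teradata Warehouse Miner is implemented; `explore_table` is a hypothetical stand-in for the per-table steps, and `explore_all` and its `parallel` parameter are names assumed for illustration.

```python
# Illustrative sketch only: a generic worker-pool pattern, not the product's code.
from concurrent.futures import ThreadPoolExecutor


def explore_table(table_name):
    """Hypothetical placeholder for the per-table processing steps."""
    return f"processed {table_name}"


def explore_all(table_names, parallel=1):
    """Process every table, running at most `parallel` tables at a time."""
    # With parallel=1 the tables are handled one at a time, as in the
    # default scheme; a larger value allows that many concurrent tables.
    with ThreadPoolExecutor(max_workers=max(1, parallel)) as pool:
        # map() returns results in the same order as the input names.
        return list(pool.map(explore_table, table_names))
```

Calling `explore_all(["sales", "customers", "orders"], parallel=2)` would handle at most two tables concurrently and return one result per table in input order.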

For general information about output, see OUTPUT Tab.