Data Explorer - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.4
Published
August 2017
Language
English (United States)
Last Update
2018-05-04
dita:mapPath
guj1484331868727.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

The Data Explorer performs basic statistical analysis on a set of selected tables, or on selected columns from tables, in one or more databases. It stores results from four fundamental types of analysis based on simplified versions of the Descriptive Statistics analyses:

  • Values
  • Statistics
  • Frequency
  • Histogram

An answer table is produced for each requested type of analysis, the output including requested database, table, and column names to allow results from multiple tables to be included in each answer table. Each analysis is selected individually, with the following exceptions:

  • If Frequency is selected, Values must be selected.
  • If Histogram is selected, Values and Statistics must be selected, including the Count, Minimum, Maximum, Mean and Standard Deviation.

The Data Explorer includes intelligence about which functions should be performed on which columns, with decisions based partly on column type and partly on obtained results. It also includes performance enhancements in the generated SQL to help minimize passes through the data. A separate SQL Where Clause can be specified for each of the input tables selected for analysis.

A Values analysis can be performed on columns of any data type, though the measures displayed vary according to the column type.

A Statistics analysis can be performed on columns of numeric or date data type. For columns of type DATE, statistics other than count, minimum, maximum and mean are calculated by first converting to the number of days since 1900.

A Frequency analysis can be performed on columns of any data type except types that hold byte data (for example, BYTE and VARBYTE).

A Histogram analysis can be performed on columns of numeric or date data type.

Syntax

call twm. td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;Optional Parameters;');

Required Parameters

database
The database or databases containing the table or tables to be analyzed.
The number of databases in the database parameter and the number of tables in the tablename parameter must be the same. They are matched by position, and the database names can be repeated. For example, database=d1,d1,d2 and tablename=t1,t2,t3 indicates d1.t1, d1.t2, and d2.t3 are the tables to analyze, with tables t1 and t2 both residing in database d1.
DataExplorer
The DataExplorer parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
outputdatabase
The results of the analysis are stored in one or more tables in this database.
There are four output table name parameters. These affect not only the names of the output tables but whether the indicated types of analysis are performed. If none of these parameters is supplied, all four fundamental types are performed. The output table names are:
  • TwmExploreValues
  • TwmExploreStatistics
  • TwmExploreFrequency
  • TwmExploreHistogram
If one or more of the output table name parameters below are specified, then only the indicated types of analysis are performed. The output tables are:
  • valuesoutputtablename=
  • statisticsoutputtablename=
  • frequencyoutputtablename=
  • histogramoutputtablename=
Not all combinations of the output table name parameters are valid. The valid combinations are:
  • values
  • statistics
  • values + statistics
  • values + frequency
  • values + statistics + histogram
  • values + statistics + frequency + histogram
For example, a combination of Values and Statistics analyses, creating output tables v1 and s1, can be requested as shown. The Examples section provides additional samples.

...valuesoutputtablename=v1;statisticsoutputtablename=s1;

tablename
The table or tables containing the columns to analyze. For example, tablename=t1,t2,t3 indicates three tables (t1, t2, and t3) are to be analyzed. The number of databases in the database parameter and the number of tables in the tablename parameter must be the same.

Optional Parameters

See the Teradata Warehouse Miner User Guide, Volume 1, Introduction and Profiling for additional information.

bins
The number of bins to create if the Histogram analysis is performed. The default is 10.
binstyle
The style of binning to perform if the Histogram analysis is performed. The options are binstyle=bins (default) and binstyle=quantiles.
columns
The columns to analyze. For example, columns=c1,c2,c3. Enter columns=all is entered, all columns in the input table are analyzed.
maxnumcombvalues
The maximum number of combined values for frequency or histogram analysis. The default is 10000.
maxuniquecharvalues
The maximum number of unique character values for unrestricted frequency analysis. The default value is 100.
maxuniquenumvalues
The maximum number of unique numeric or date values for frequency analysis. The default is 20.
minrowsforcomb
The minimum number of rows before frequency or histogram combining attempted. The default is 25000.
restrictedfreqproc
The restricted frequency processing including prominent values. The default is true.
restrictedthreshold
The minimum percentage of rows a value must occur in, for inclusion in results. The default is 1.
statisticalmethod
The calcuated statistic as specified by either population or sample. For example, statisicalmethod=population or statisticalmethod=sample can be specified. Population statistics is the default if neither is specified.
statsoptions
The basic statistics to be calculated if the Statistics analysis is performed. Shortened aliases can be used instead of the lengthier names (for example, statsoptions=all or statsoptions=cnt,min,max,mean,std). Available statistics include:
  • all
  • none
  • count (cnt)
  • minimum (min)
  • maximum (max)
  • mean
  • standarddeviation (std)
  • skewness (skew)
  • kurtosis (kurt)
  • standarderror (ste)
  • coefficientofvariance (cv)
  • variance (var)
  • sum
  • uncorrectedsumofsquares (uss)
  • correctedsumofsquares (css)
uniques
The unique values count for each selected column when true. By default, unique values are set to true and not counted.
where

The optional WHERE clause to filter the data to process. For example:

where=cust_id > 0

Examples

These examples demonstrate the performance of the valid combinations of analyses with a single input table. These Overlap examples assume the td_analyze function is installed in a database named twm.

This example performs all analyses (Values, Statistics, Frequency, and Histogram) and creates the output tables for each of these analyses with default names.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;');

This example performs only the Values analysis, naming the Values output table v1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;');

This example performs only the Statistics analysis, naming the Statistics output table s1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;statisticsoutputtablename=s1;');

This example performs a Values and Statistics analysis, naming the output tables v1 and s1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;statisticsoutputtablename=s1;');

This example performs both a Values and Frequency analysis, naming the output tables v1 and f1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;frequencyoutputtablename=f1;');

This example performs a Values, Statistics, and Histogram analysis, naming the output tables v1, s1 and h1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;statisticsoutputtablename=s1;histogramoutputtablename=h1;');

This example performs one of each of the four analyses, Values, Statistics, Frequency and Histogram, and names the output tables v1, s1, f1 and h1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;statisticsoutputtablename=s1;frequencyoutputtablename=f1;histogramoutputtablename=h1;');