Data Explorer - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.5
Published
February 2018
Language
English (United States)
Last Update
2018-05-04
dita:mapPath
dfw1503087325991.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

The Data Explorer performs basic statistical analysis on a set of selected tables, or on selected columns from tables, in one or more databases. It stores results from four fundamental types of analysis based on simplified versions of the Descriptive Statistics analyses:

  • Values
  • Statistics
  • Frequency
  • Histogram

An answer table is produced for each requested type of analysis, the output including requested database, table, and column names to allow results from multiple tables to be included in each answer table. Each analysis is selected individually, with the following exceptions:

  • If Frequency is selected, Values must be selected.
  • If Histogram is selected, Values and Statistics must be selected, including the Count, Minimum, Maximum, Mean and Standard Deviation.

The Data Explorer includes intelligence about which functions should be performed on which columns, with decisions based partly on column type and partly on obtained results. It also includes performance enhancements in the generated SQL to help minimize passes through the data. A separate SQL Where Clause can be specified for each of the input tables selected for analysis.

A Values analysis can be performed on columns of any data type, though the measures displayed vary according to the column type.

A Statistics analysis can be performed on columns of numeric or date data type. For columns of type DATE, statistics other than count, minimum, maximum and mean are calculated by first converting to the number of days since 1900.

A Frequency analysis can be performed on columns of any data type except types that hold byte data (for example, BYTE and VARBYTE).

A Histogram analysis can be performed on columns of numeric or date data type.

Syntax

call twm. td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;Optional Parameters;');

Required Parameters

database
The database or databases containing the table or tables to be analyzed.
The number of databases in the database parameter and the number of tables in the tablename parameter must be the same. They are matched by position, and the database names can be repeated. For example, database=d1,d1,d2 and tablename=t1,t2,t3 indicates d1.t1, d1.t2, and d2.t3 are the tables to analyze, with tables t1 and t2 both residing in database d1.
DataExplorer
The DataExplorer parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
outputdatabase
The results of the analysis are stored in one or more tables in this database.
tablename
The table or tables containing the columns to analyze. For example, tablename=t1,t2,t3 indicates three tables (t1, t2, and t3) are to be analyzed. The number of databases in the database parameter and the number of tables in the tablename parameter must be the same.

Optional Parameters

See the Teradata Warehouse Miner User Guide, Volume 1, Introduction and Profiling for additional information.

bins
The number of bins to create if the Histogram analysis is performed. The default is 10.
binstyle
The style of binning to perform if the Histogram analysis is performed. The options are binstyle=bins (default) and binstyle=quantiles.
columns
The columns to analyze. For example, columns=c1,c2,c3. Enter columns=all is entered, all columns in the input table are analyzed.
frequencyoutputtablename
The frequency analysis function is run and an output table is created with the results. Specify a unique name to avoid overwriting previous results (for example,frequencyoutputtablename=FrequencyExampleOutputTableName). If no outputtablename parameters are specified when Data Explorer is performed, all analysis functions are run and all output tables are created with a default table name. The default table name for frequencyoutputtablename is TwmExploreFrequency.
histogramoutputtablename
The histogram analysis function is run and an output table is created with the results. Specify a unique name to avoid overwriting previous results (for example, histogramoutputtablename=HistogramExampleOutputTableName). If no outputtablename parameters are specified when Data Explorer is performed, all analysis functions are run and all output tables are created with a default table name. The default table name for histogramoutputtablename is TwmExploreHistogram.
maxnumcombvalues
The maximum number of combined values for frequency or histogram analysis. The default is 10000.
maxuniquecharvalues
The maximum number of unique character values for unrestricted frequency analysis. The default value is 100.
maxuniquenumvalues
The maximum number of unique numeric or date values for frequency analysis. The default is 20.
minrowsforcomb
The minimum number of rows before frequency or histogram combining attempted. The default is 25000.
overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

restrictedfreqproc
The restricted frequency processing including prominent values. The default is true.
restrictedthreshold
The minimum percentage of rows a value must occur in, for inclusion in results. The default is 1.
statisticalmethod
The calcuated statistic as specified by either population or sample. For example, statisicalmethod=population or statisticalmethod=sample can be specified. Population statistics is the default if neither is specified.
statisticsoutputtablename
The statistics analysis function is run and an output table is created with the results. Specify a unique name to avoid overwriting previous results (for example, statisticsoutputtablename=StatisticsExampleOutputTableName). If no outputtablename parameters are specified when Data Explorer is performed, all analysis functions are run and all output tables are created with a default table name. The default table name for statisticsoutputtablename is TwmExploreStatistics.
statsoptions
The basic statistics to be calculated if the Statistics analysis is performed. Shortened aliases can be used instead of the lengthier names (for example, statsoptions=all or statsoptions=cnt,min,max,mean,std). Available statistics include:
  • all
  • none
  • count (cnt)
  • minimum (min)
  • maximum (max)
  • mean
  • standarddeviation (std)
  • skewness (skew)
  • kurtosis (kurt)
  • standarderror (ste)
  • coefficientofvariance (cv)
  • variance (var)
  • sum
  • uncorrectedsumofsquares (uss)
  • correctedsumofsquares (css)
uniques
The unique values count for each selected column when true. By default, unique values are set to true and not counted.
valueoutputtablename
The value analysis function is run and an output table is created with the results. Specify a unique name to avoid overwriting previous results (for example, valuesoutputtablename=ValuesExampleOutputTableName). If no outputtablename parameters are specified when Data Explorer is performed, all analysis functions are run and all output tables are created with a default table name. The default table name for valuesoutputtablename is TwmExploreValues.
where

The optional WHERE clause to filter the data to process. For example:

where=cust_id > 0

Examples

These examples demonstrate the performance of the valid combinations of analyses with a single input table. To execute the provided examples, the td_analyze function must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.

This example performs all analyses (Values, Statistics, Frequency, and Histogram) and creates the output tables for each of these analyses with default names.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;');

This example performs only the Values analysis, naming the Values output table v1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;');

This example performs only the Statistics analysis, naming the Statistics output table s1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;statisticsoutputtablename=s1;');

This example performs a Values and Statistics analysis, naming the output tables v1 and s1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;statisticsoutputtablename=s1;');

This example performs both a Values and Frequency analysis, naming the output tables v1 and f1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;frequencyoutputtablename=f1;');

This example performs a Values, Statistics, and Histogram analysis, naming the output tables v1, s1 and h1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;statisticsoutputtablename=s1;histogramoutputtablename=h1;');

This example performs one of each of the four analyses, Values, Statistics, Frequency and Histogram, and names the output tables v1, s1, f1 and h1.

call twm.td_analyze('DataExplorer','database=twm_source;tablename=twm_customer;outputdatabase=twm_results;valuesoutputtablename=v1;statisticsoutputtablename=s1;frequencyoutputtablename=f1;histogramoutputtablename=h1;');