5.4.6 - Frequency - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

prodname
Teradata Warehouse Miner
vrm_release
5.4.6
created_date
November 2018
category
User Guide
featnum
B035-2306-118K

Purpose

A Frequency analysis counts the occurences of individual data values in columns that contain categorical data. Frequency analysis is useful in understanding the meaning of a particular data element, and can point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function is also useful in analyzing combinations of values occurring in two or more columns.

A Frequency analysis calculates the number of occurrences of each value of the column or columns individually or in combination. Additionally, for each value, the percentage of rows in the selected table is provided in descending order starting with the most frequently occurring value.

You can perform Frequency analysis on columns of any data type except types that hold byte data (for example, BYTE and VARBYTE).

Syntax

call twm. td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;Optional Parameters;');

Required Parameters

columns
The columns to analyze. For example, columns=c1,c2,c3. When columns=all is entered, all columns in the input table are analyzed.
database
The database containing the table to analyze.
frequency
The frequency parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
tablename
The table containing the columns to analyze.

Optional Parameters

cumulativeoption
When true, includes rank, cumulative count, and cumulative percent information for each frequency value. This option is not available if the pairwise style is selected.
gensqlonly
When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.
having
Causes the specified SQL HAVING clause, generated within the Frequency SQL, to restrict returned aggregations. For example: having=xpct > 1. This option is valid only if the minimumpercentage option is not used.
minimumpercentage
If specified, the analysis includes only frequency values that occur a minimum percentage of the time, determined by the value of this parameter. Setting this to 0 or 0.0 is equivalent to not including the parameter at all.
outputdatabase
Specifies the name of the database to contain the analysis results table.
outputtablename
Specifies the name of the table to store the analysis results. If not supplied, the results are returned as a result set.
overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

pairwisecolumns
The columns to be paired up with the frequency columns. Only use this option when the style parameter is set to pairwise.
statisticscolumns
If statistics columns are specified with this option, the minimum, maximum, mean value, and standard deviation of each column are included in the result set or answer table with the values computed over the rows corresponding to the individual values of the frequency columns. This option is available only with the basic frequency style and not with pairwise or crosstab.
style
This option sets the frequency style to one of three values.
Style Description
basic Counts frequencies of individual column values (default).
pairwise Counts frequencies of pair-wise combinations of values of selected columns rather than individually. Not available if the cumulative option is selected.
crosstab Counts frequencies of combinations of values of selected columns rather than individually.
topvalues
Number of frequency values to include. Shows frequency only for the number of top occurring values entered. For example, topvalues=10. This option is enabled only if cumulativeoption is selected.
where
Causes the specified SQL WHERE clause, generated within the Frequency SQL, to filter rows selected for analysis. For example: where=cust_id > 0.

Examples

These examples demonstrate how to use the Frequency analysis. To run the provided examples, the td_analyze function must be installed in a database called twm and the TWM tutorial data must be installed in the twm_source database.

The following case has no special options.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;');

This case only includes frequency values that occur a minimum 10% of the time.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;minimumpercentage=10;');

This case includes cumulative measures and only the top five frequency values.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;cumulativeoption=true;topvalues=5;');

This case uses the crosstab option on two columns.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender,marital_status;style=crosstab;');

This case uses the pairwise option, combining one pairwise column with two frequency columns.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender,marital_status;style=pairwise;pairwisecolumns=years_with_bank;');

This case uses a statistics column and also creates a permanent result table.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender;statisticscolumns=years_with_bank;outputdatabase=twm_results;outputtablename=twm_frequency;');

This case uses the where and having options to introduce a custom where clause and having clause.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;cumulativeoption=true;where=cust_id < 0;having=xpct > 1;');