5.4.2 - Frequency - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

prodname
Teradata Warehouse Miner
vrm_release
5.4.2
created_date
October 2016
category
User Guide
featnum
B035-2306-106K

Purpose

A Frequency analysis is designed to count the occurrence of individual data values in columns that contain categorical data. It can be useful in understanding the meaning of a particular data element, and it may point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function can also be useful in analyzing combinations of values occurring in two or more columns.

A Frequency analysis calculates the number of occurrences of each value of the column or columns individually or in combination. Additionally, for each value, the percentage of rows in the selected table is provided in descending order starting with the most frequently occurring value.

Generally, a Frequency analysis can be performed on columns of any data type except types that hold byte data (for example, BYTE and VARBYTE type data).

Syntax

call twm. td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;Optional Parameters;');

Required Parameters

columns
The columns to be analyzed. For example, columns=c1,c2,c3. If columns=all is entered, all columns in the input table are analyzed.
database
The database containing the table to be analyzed.
frequency
The frequency parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
tablename
The table containing the columns to be analyzed.

Optional Parameters

cumulativeoption
When set to true, includes rank, cumulative count and cumulative percent information for each frequency value. This option is not available if the pairwise style is selected.
having
Option to have the specified SQL HAVING clause generated within the Frequency SQL to restrict returned aggregations. For example: having=xpct > 1. This option is valid only if the minimumpercentage option is not used.
minimumpercentage
If specified, the analysis will include only frequency values that occur a minimum percentage of the time, determined by the value of this parameter. Setting this parameter to a value of 0 or 0.0 is equivalent to not including the parameter at all.
outputdatabase
This parameter specifies the name of the database that will contain the analysis results table.
outputtablename
Specifies the name of the table that will store the analysis results. If this parameter is not supplied, the results are returned as a result set.
pairwisecolumns
The columns to be paired up with the frequency columns. This option is only used when the style parameter is set to pairwise.
statisticscolumns
If statistics columns are specified with this option, the minimum, maximum, mean value and standard deviation of each column are included in the result set or answer table with the values computed over the rows corresponding to the individual values of the frequency columns. This option is available only with the basic frequency style and not with pairwise or crosstab.
style
This option sets the frequency style to one of three possible values.
Style Description
basic Option to count frequencies of individual column values (default).
pairwise Option to count frequencies of pair-wise combinations of values of selected columns rather than individually. Not available if the cumulative option has been selected.
crosstab Option to count frequencies of combinations of values of selected columns rather than individually.
topvalues
Number of frequency values to include. If supplied, show frequency only for the number of top occurring values entered. For example, topvalues=10. This option is enabled only if cumulativeoption is selected.
where
Option to have the specified SQL WHERE clause generated within the Frequency SQL to filter rows selected for analysis.For example: where=cust_id > 0.

Examples

The examples included in this section demonstrate how to use the Frequency analysis. These examples assume that the td_analyze function has been installed in a database named twm.

The following is a simple case with no special options.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;');

This case only includes frequency values that occur a minimum 10% of the time.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;minimumpercentage=10;');

This case includes cumulative measures and only the top five frequency values.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;cumulativeoption=true;topvalues=5;');

This case uses the crosstab option on two columns.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender,marital_status;style=crosstab;');

This case uses the pairwise option, combining one pairwise column with two frequency columns.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender,marital_status;style=pairwise;pairwisecolumns=years_with_bank;');

This case uses a statistics column and also creates a permanent result table.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender;statisticscolumns=years_with_bank;outputdatabase=twm_results;outputtablename=twm_frequency;');

This case uses the where and having options to introduce a custom where clause and having clause.

call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;cumulativeoption=true;where=cust_id < 0;having=xpct > 1;');