Frequency

Teradata Warehouse Miner User Guide - Volume 1Introduction and Profiling

brand
Software
prodname
Teradata Warehouse Miner
vrm_release
5.4.4
category
User Guide
featnum
B035-2300-077K

Frequency analysis is designed to count the occurrence of individual data values in columns that contain categorical data. It can be useful in understanding the meaning of a particular data element, and it may point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function can also be useful in analyzing combinations of values occurring in two or more columns.

Given a table name and the name of one or more columns, the Frequency analysis calculates the number of occurrences of each value of the column or columns individually or in combination. Additionally, the percentage of rows in the selected table are listed in descending order starting with the most frequently occurring value. Optionally, you may request:
  • Whether to calculate frequencies of column values individually or in combination, using the Compute Cross-Tabulation option.
  • Whether to calculate pair-wise frequencies from one or two lists of column values, using the Compute Pairwise Frequencies option.
  • Whether to calculate basic statistics (min, max, mean, standard deviation) on one or more columns.
  • Whether to provide additional cumulative sums over the frequencies and percents, in addition to the associated rank, using the “Cumulative Options”. This feature is not available when the Compute Pairwise Frequencies option is selected.
  • A different sort order, such as by the selected column(s).
  • A WHERE clause, reducing the rows before aggregation.
  • A HAVING clause, reducing the answer set after aggregating, which must refer to the requested column(s), xcnt or xpct. In the case of Compute Pairwise Frequencies, reference to col1, col2, xcnt or xpct is required (this is implemented as a final WHERE clause).
  • A QUALIFY clause, reducing the answer set after aggregating, which may refer to any returned column, but is most useful in conjunction with xrank to specify the maximum number of rows to return, for instance “xrank ≤ 50” (requires setting the “cumulative” option).
The following rules apply to the Frequency analysis:
  1. If the Compute Cross-Tabulation option is not requested (the default case) and multiple columns are requested, the analysis is repeated individually for each requested column. In this case the CREATE VIEW option may not be requested, and if the CREATE TABLE option is requested, the create occurs only once with subsequent INSERT/SELECT statements generated.
  2. If the Compute Cross-Tabulation option is requested, one select is generated for the entire column list taken together.
  3. If the Cumulative Options is requested, cumulative sums for the column(s) being analyzed are provided. Specifically, cumulative sums of the frequencies and percents are provided, as well as the associated rank. This is not available when the Compute Pairwise Frequencies option is selected.
  4. If the Compute Pairwise Frequencies option is requested, a count of the number of occurrences in the table for each pair-wise combination of values in the selected columns is given along with the percentage of the total number of rows. Alternatively, two lists of columns can be given, with each column in the first list combined with each column in the second list to count the number of occurrences of each combination of values as above.
  5. If multiple column lists are given, the Compute Cross-Tabulation option is not allowed.
  6. Explain and Create View options are not allowed when multiple columns are selected.
  7. Statistics column(s) can only be specified with basic frequency, not in combination with cumulative, crosstab or pairwise options.
  8. When multiple columns are requested with Select result option, columns are combined within a single query using a volatile table with a final select at the end to format the results.
  9. BYTE types are supported only with the Select result option and individual column requests.

The Frequency analysis is parameterized by specifying the table and column(s) to analyze, options unique to the Frequency analysis, as well as specifying the desired results and SQL or Expert Options.

For general information about output, see OUTPUT Tab.