Purpose
A Frequency analysis counts the occurrence of individual data values in columns that contain categorical data. It is useful in understanding the meaning of a particular data element, and it can point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function can also be useful in analyzing combinations of values occurring in two or more columns.
A Frequency analysis calculates the number of occurrences of each value of the column or columns individually or in combination. Additionally, for each value, the percentage of rows in the selected table is provided in descending order starting with the most frequently occurring value.
A Frequency analysis can be performed on columns of any data type except types that hold byte data (for example, BYTE and VARBYTE).
Syntax
call twm. td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;Optional Parameters;');Required Parameters
- columns
- The columns to analyze. For example, columns=c1,c2,c3. Enter columns=all is entered, all columns in the input table are analyzed.
- database
- The database containing the table to analyze.
- frequency
- The frequency parameter:
- Is required
- Must be the first parameter
- Is always enclosed in single quotes
- tablename
- The table containing the columns to analyze.
Optional Parameters
- cumulativeoption
- When true, includes rank, cumulative count, and cumulative percent information for each frequency value. This option is not available if the pairwise style is selected.
- having
- Causes the specified SQL HAVING clause, generated within the Frequency SQL, to restrict returned aggregations. For example: having=xpct > 1. This option is valid only if the minimumpercentage option is not used.
- minimumpercentage
- If specified, the analysis includes only frequency values that occur a minimum percentage of the time, determined by the value of this parameter. Setting this to 0 or 0.0 is equivalent to not including the parameter at all.
- outputdatabase
- Specifies the name of the database to contain the analysis results table.
- outputtablename
- Specifies the name of the table to store the analysis results. If not supplied, the results are returned as a result set.
- pairwisecolumns
- The columns to be paired up with the frequency columns. Only use this option when the style parameter is set to pairwise.
- statisticscolumns
- If statistics columns are specified with this option, the minimum, maximum, mean value, and standard deviation of each column are included in the result set or answer table with the values computed over the rows corresponding to the individual values of the frequency columns. This option is available only with the basic frequency style and not with pairwise or crosstab.
- style
- This option sets the frequency style to one of three values.
Style Description basic Counts frequencies of individual column values (default). pairwise Counts frequencies of pair-wise combinations of values of selected columns rather than individually. Not available if the cumulative option is selected. crosstab Counts frequencies of combinations of values of selected columns rather than individually. - topvalues
- Number of frequency values to include. Shows frequency only for the number of top occurring values entered. For example, topvalues=10. This option is enabled only if cumulativeoption is selected.
- where
- Causes the specified SQL WHERE clause, generated within the Frequency SQL, to filter rows selected for analysis. For example: where=cust_id > 0.
Examples
These examples demonstrate how to use the Frequency analysis. These Overlap examples assume the td_analyze function is installed in a database named twm.
The following case has no special options.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;');
This case only includes frequency values that occur a minimum 10% of the time.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;minimumpercentage=10;');
This case includes cumulative measures and only the top five frequency values.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;cumulativeoption=true;topvalues=5;');
This case uses the crosstab option on two columns.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender,marital_status;style=crosstab;');
This case uses the pairwise option, combining one pairwise column with two frequency columns.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender,marital_status;style=pairwise;pairwisecolumns=years_with_bank;');
This case uses a statistics column and also creates a permanent result table.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=gender;statisticscolumns=years_with_bank;outputdatabase=twm_results;outputtablename=twm_frequency;');
This case uses the where and having options to introduce a custom where clause and having clause.
call twm.td_analyze('frequency','database=twm_source;tablename=twm_customer;columns=years_with_bank;cumulativeoption=true;where=cust_id < 0;having=xpct > 1;');