If a Frequency analysis is requested, and the Values analysis is requested with the option “Compute unique values for each column selected”, a Frequency analysis is performed on every requested numeric and date type column that has no more than a user-specified number of unique values (by default 20), and on every requested character type column that has no more than a user-specified number of unique values (by default 100). Character type columns with more unique values can be analyzed with a restricted Frequency analysis, which returns only 'prominent' values, that is, values that occur in at least a user-determined x% of rows (by default 1%), provided the ratio of unique values to rows is less than (100 - x)% (by default 99%). The option to perform a restricted Frequency analysis, as well as the threshold values given above, can be set on the expert options tab.
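The selection rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds as described, not the product's implementation: the function names, parameter names, and the "skip" outcome for columns that qualify for neither analysis are assumptions made for the example.

```python
from collections import Counter


def choose_frequency_mode(col_type, n_unique, n_rows,
                          max_unique_numeric_date=20,
                          max_unique_char=100,
                          restricted_enabled=True,
                          min_pct=1.0):
    """Return 'regular', 'restricted', or 'skip' for one column.

    All names here are hypothetical; the defaults mirror the text
    (20, 100, and x = 1%).
    """
    if col_type in ("numeric", "date"):
        return "regular" if n_unique <= max_unique_numeric_date else "skip"
    if col_type != "character":
        raise ValueError(f"unsupported column type: {col_type!r}")
    if n_unique <= max_unique_char:
        return "regular"
    # Restricted analysis requires unique/rows < (100 - x)%.
    # Cross-multiplied to avoid a division.
    if restricted_enabled and n_unique * 100 < (100 - min_pct) * n_rows:
        return "restricted"
    return "skip"


def prominent_values(values, min_pct=1.0):
    """Keep only values that occur in at least min_pct percent of rows."""
    counts = Counter(values)
    threshold = len(values) * min_pct / 100.0
    return {value: n for value, n in counts.items() if n >= threshold}
```

For example, a character column with 500 unique values in 1,000 rows has a ratio of 50%, which is below 99%, so it would receive a restricted analysis; with 995 unique values it would not.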
If both restricted and regular frequency processing are to be performed, restricted frequency processing is performed first in order to facilitate restart processing, should it become necessary. Once restricted frequency processing is complete, a strategy for efficiently calculating regular frequencies must be determined. One strategy is simply to calculate each frequency individually (i.e., one column at a time). The other strategy is to combine columns into an intermediate table of counts and then select individual column frequencies from the intermediate table. Combining can enhance performance dramatically when there are not too many combinations of values and there are enough rows to make the effort worthwhile. Too many combined values can, however, greatly degrade performance. The combining strategy is therefore governed by two thresholds:
- The minimum number of rows required to use the combining strategy (by default 25,000).
- The maximum number of possible combined values across the combined columns (by default 10,000).
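The combining strategy and its two thresholds can be illustrated with a small Python sketch. How columns are actually grouped is not stated here, so the greedy grouping below is an assumption, as is the reading of "possible combined values" as the product of the columns' unique-value counts. All function and parameter names are hypothetical.

```python
from collections import Counter


def plan_groups(unique_counts, n_rows, min_rows=25_000, max_combined=10_000):
    """Group columns for combined counting (greedy; an assumed heuristic).

    unique_counts maps column name -> number of unique values.
    Below min_rows, every column is processed individually.
    """
    if n_rows < min_rows:
        return [[name] for name in unique_counts]
    groups, current, product = [], [], 1
    for name, n_unique in unique_counts.items():
        # Start a new group if adding this column would exceed the limit.
        if current and product * n_unique > max_combined:
            groups.append(current)
            current, product = [], 1
        current.append(name)
        product *= n_unique
    if current:
        groups.append(current)
    return groups


def frequencies_from_combined(rows, columns):
    """Count value combinations once, then derive per-column frequencies."""
    # Intermediate table of counts, keyed by the tuple of column values.
    combined = Counter(tuple(row[c] for c in columns) for row in rows)
    per_column = {c: Counter() for c in columns}
    for combo, count in combined.items():
        for c, value in zip(columns, combo):
            per_column[c][value] += count
    return per_column
```

The benefit comes from scanning the data once per group rather than once per column: the intermediate table holds at most as many entries as there are distinct combinations, so summing over it is cheap as long as that number stays small.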