| |
- Frequency(data, columns=None, exclude_columns=None, cumulative_option=False, agg_filter=None, min_percentage=None, pairwise_columns=None, stats_columns=None, style='basic', top_n=None, filter=None, gen_sql_only=False, charset=None)
- DESCRIPTION:
Frequency analysis counts the occurrences of individual data values in columns that
contain categorical data. Frequency analysis is useful in understanding the meaning
of a particular data element and can point out the need to recode some of the data
values found, either permanently or in the course of building an analytic data set.
This function is also useful in analyzing combinations of values occurring in two
or more columns.
A Frequency analysis calculates the number of occurrences of each value of the column
or columns individually or in combination. Additionally, for each value, the percentage
of rows in the selected DataFrame is provided in descending order starting with the
most frequently occurring value.
PARAMETERS:
data:
Required Argument.
Specifies the input data to perform frequency analysis.
Types: teradataml DataFrame
columns:
Required Argument.
Specifies the name(s) of the column(s) to analyze. Occasionally, it can also accept
permitted strings to specify all columns, or all numeric columns, or all character
columns.
Permitted Values:
* Name(s) of the column(s) in "data".
* Pre-defined strings:
* 'all' - all columns
* 'allnumeric' - all numeric columns
* 'allcharacter' - all numeric and date columns
Types: str OR list of Strings (str)
exclude_columns:
Optional Argument.
Specifies the name(s) of the column(s) to exclude from the analysis, if a column
specifier such as 'all', 'allnumeric', 'allcharacter' is used in the "columns"
argument.
Types: str OR list of Strings (str)
cumulative_option:
Optional Argument.
Specifies whether to include rank, cumulative count, and cumulative percent
information for each frequency value. When set to 'True', this information is
included otherwise not.
Note:
This argument should not be set to 'True' when style is 'pairwise'.
Default Value: False
Types: bool
agg_filter:
Optional Argument.
Specifies the clause to restrict returned aggregations.
For example,
agg_filter="xpct > 1"
Note:
This argument should not be used when "min_percentage" argument is used.
Types: str
min_percentage:
Optional Argument.
Specifies a value to determine whether to include only frequency values that occur
a minimum percentage of the time. Setting this to 0 or 0.0 is equivalent to not
including the argument at all.
Types: float, int
pairwise_columns:
Optional Argument.
Specified the columns to be paired up with the frequency columns.
Note:
Use only when "style" is set to 'pariwise'.
Types: str OR list of Strings (str)
stats_columns:
Optional Argument.
Specifies the name(s) of column(s) for which the minimum, maximum, mean value,
and standard deviation are included in the result with the values computed over
the rows corresponding to the individual values of the frequency columns.
Note:
This argument can be used only when "style" is 'basic'.
Types: str OR list of Strings (str)
style:
Optional Argument.
Specifies the frequency style for the analysis.
Permitted Values:
* 'basic' - Counts frequencies of individual column values.
* 'pairwise' - Counts frequencies of pair-wise combinations of values of
selected columns rather than individually. This should not be used
when "cumulative_option" is set to 'True'.
* 'crosstab' - Counts frequencies of combinations of values of selected columns
rather than individually.
Default Value: 'basic'
Types: str
top_n:
Optional Argument.
Specifies the number of frequency values to include. Using this argument shows
frequency only for the number of top occurring values entered.
Note:
This argument is enabled only if "cumulative_option" is set to 'True'.
Types: int
filter:
Optional Argument.
Specifies the clause to filter rows selected for analysis within Frequency.
For example,
filter = "cust_id > 0"
Types: str
gen_sql_only:
Optional Argument.
Specifies whether to generate only SQL for the function.
When set to True, function SQL is generated, not executed, which can be accessed
using show_query() method, otherwise SQL is just executed but not returned.
Default Value: False
Types: bool
charset:
Optional Argument.
Specifies the character set for the table name and column names.
If this argument is not set, the function takes default value set by
VAL library.
Permitted Values:
* 'UTF8'
* 'ASCII'
Types: str
RETURNS:
An instance of Frequency.
Output teradataml DataFrames can be accessed using attribute references, such as
FrequencyObj.<attribute_name>.
Output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create required teradataml DataFrame.
df = DataFrame("customer")
print(df)
# Example 1: Perform frequency analysis using default values.
obj = valib.Frequency(data=df,
columns="years_with_bank")
# Print the results.
print(obj.result)
# Example 2: Calculate frequency values that occur a minimum 10% of the time for
# the column 'years_with_bank'.
obj = valib.Frequency(data=df,
columns="years_with_bank",
min_percentage=10)
# Print the results.
print(obj.result)
# Example 3: Calculate the frequency values for the column 'years_with_bank'. Include
# cumulative measures and get only the top five frequency values.
obj = valib.Frequency(data=df,
columns="years_with_bank",
cumulative_option=True,
top_n=5)
# Print the results.
print(obj.result)
# Example 4: Calculate the frequencies using 'crosstab' style on two columns 'income'
# and 'age'.
obj = valib.Frequency(data=df,
columns=["income", "age"],
style="crosstab")
# Print the results.
print(obj.result)
# Example 5: Calculate frequencies using 'pairwise' style by combining one pairwise
# column with two frequency columns.
obj = valib.Frequency(data=df,
columns=["gender", "marital_status"],
style="pairwise",
pairwise_columns="years_with_bank")
# Print the results.
print(obj.result)
# Example 6: Calculate frequencies with inclusion of "stats_columns".
obj = valib.Frequency(data=df,
columns="gender",
stats_columns="years_with_bank")
# Print the results.
print(obj.result)
# Example 7: Calculate frequencies by applying filtering using "agg_filter" and
# "filter" to introduce a custom where clause and having clause.
obj = valib.Frequency(data=df,
columns="years_with_bank",
filter="cust_id > 0",
agg_filter="xpct > 1")
# Print the results.
print(obj.result)
# Example 8: Generate only SQL for the function, but do not execute the same.
obj = valib.Frequency(data=df,
columns=["income", "age"],
gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call.
print(obj.show_query())
print(obj.show_query("sp"))
|