Teradata Package for Python Function Reference - Frequency - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2021-11-19
lifecycle: previous
Product Category: Teradata Vantage

Frequency

Functions
		Frequency(data, columns=None, exclude_columns=None, cumulative_option=False, agg_filter=None, min_percentage=None, pairwise_columns=None, stats_columns=None, style='basic', top_n=None, filter=None) DESCRIPTION: Frequency analysis counts the occurrences of individual data values in columns that contain categorical data. Frequency analysis is useful in understanding the meaning of a particular data element and can point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function is also useful in analyzing combinations of values occurring in two or more columns. A Frequency analysis calculates the number of occurrences of each value of the column or columns individually or in combination. Additionally, for each value, the percentage of rows in the selected DataFrame is provided in descending order starting with the most frequently occurring value. PARAMETERS: data: Required Argument. Specifies the input data to perform frequency analysis. Types: teradataml DataFrame columns: Required Argument. Specifies the name(s) of the column(s) to analyze. Occasionally, it can also accept permitted strings to specify all columns, or all numeric columns, or all character columns. Permitted Values: * Name(s) of the column(s) in "data". * Pre-defined strings: * 'all' - all columns * 'allnumeric' - all numeric columns * 'allcharacter' - all numeric and date columns Types: str OR list of Strings (str) exclude_columns: Optional Argument. Specifies the name(s) of the column(s) to exclude from the analysis, if a column specifier such as 'all', 'allnumeric', 'allcharacter' is used in the "columns" argument. Types: str OR list of Strings (str) cumulative_option: Optional Argument. Specifies whether to include rank, cumulative count, and cumulative percent information for each frequency value. When set to 'True', this information is included otherwise not. Note: This argument should not be set to 'True' when style is 'pairwise'. Default Value: False Types: bool agg_filter: Optional Argument. Specifies the clause to restrict returned aggregations. For example, agg_filter="xpct > 1" Note: This argument should not be used when "min_percentage" argument is used. Types: str min_percentage: Optional Argument. Specifies a value to determine whether to include only frequency values that occur a minimum percentage of the time. Setting this to 0 or 0.0 is equivalent to not including the argument at all. Types: float, int pairwise_columns: Optional Argument. Specified the columns to be paired up with the frequency columns. Note: Use only when "style" is set to 'pariwise'. Types: str OR list of Strings (str) stats_columns: Optional Argument. Specifies the name(s) of column(s) for which the minimum, maximum, mean value, and standard deviation are included in the result with the values computed over the rows corresponding to the individual values of the frequency columns. Note: This argument can be used only when "style" is 'basic'. Types: str OR list of Strings (str) style: Optional Argument. Specifies the frequency style for the analysis. Permitted Values: * 'basic' - Counts frequencies of individual column values. * 'pairwise' - Counts frequencies of pair-wise combinations of values of selected columns rather than individually. This should not be used when "cumulative_option" is set to 'True'. * 'crosstab' - Counts frequencies of combinations of values of selected columns rather than individually. Default Value: 'basic' Types: str top_n: Optional Argument. Specifies the number of frequency values to include. Using this argument shows frequency only for the number of top occurring values entered. Note: This argument is enabled only if "cumulative_option" is set to 'True'. Types: int filter: Optional Argument. Specifies the clause to filter rows selected for analysis within Frequency. For example, filter = "cust_id > 0" Types: str RETURNS: An instance of Frequency. Output teradataml DataFrames can be accessed using attribute references, such as FrequencyObj.<attribute_name>. Output teradataml DataFrame attribute name is: result. RAISES: TeradataMlException, TypeError, ValueError EXAMPLES: # Notes: # 1. To execute Vantage Analytic Library functions, # a. import "valib" object from teradataml. # b. set 'configure.val_install_location' to the database name where Vantage # analytic library functions are installed. # 2. Datasets used in these examples can be loaded using Vantage Analytic Library # installer. # Import valib object from teradataml to execute this function. from teradataml import valib # Set the 'configure.val_install_location' variable. from teradataml import configure configure.val_install_location = "SYSLIB" # Create required teradataml DataFrame. df = DataFrame("customer") print(df) # Example 1: Perform frequency analysis using default values. obj = valib.Frequency(data=df, columns="years_with_bank") # Print the results. print(obj.result) # Example 2: Calculate frequency values that occur a minimum 10% of the time for # the column 'years_with_bank'. obj = valib.Frequency(data=df, columns="years_with_bank", min_percentage=10) # Print the results. print(obj.result) # Example 3: Calculate the frequency values for the column 'years_with_bank'. Include # cumulative measures and get only the top five frequency values. obj = valib.Frequency(data=df, columns="years_with_bank", cumulative_option=True, top_n=5) # Print the results. print(obj.result) # Example 4: Calculate the frequencies using 'crosstab' style on two columns 'income' # and 'age'. obj = valib.Frequency(data=df, columns=["income", "age"], style="crosstab") # Print the results. print(obj.result) # Example 5: Calculate frequencies using 'pairwise' style by combining one pairwise # column with two frequency columns. obj = valib.Frequency(data=df, columns=["gender", "marital_status"], style="pairwise", pairwise_columns="years_with_bank") # Print the results. print(obj.result) # Example 6: Calculate frequencies with inclusion of "stats_columns". obj = valib.Frequency(data=df, columns="gender", stats_columns="years_with_bank") # Print the results. print(obj.result) # Example 7: Calculate frequencies by applying filtering using "agg_filter" and # "filter" to introduce a custom where clause and having clause. obj = valib.Frequency(data=df, columns="years_with_bank", filter="cust_id > 0", agg_filter="xpct > 1") # Print the results. print(obj.result)