Teradata Package for R Function Reference | 17.00 - 17.00 - td_frequency_valib - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

Frequency analysis counts the occurrences of individual data values in columns that contain categorical data. Frequency analysis is useful in understanding the meaning of a particular data element and can point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function is also useful in analyzing combinations of values occurring in two or more columns.

A Frequency analysis calculates the number of occurrences of each value of the column or columns individually or in combination. Additionally, for each value, the percentage of rows in the selected tbl_teradata is provided in descending order starting with the most frequently occurring value.

Usage

td_frequency_valib(data, columns, ...)

Arguments

data

Required Argument.
Specifies the input data to perform frequency analysis.
Types: tbl_teradata

columns

Required Argument.
Specifies the name(s) of the column(s) to analyze. Occasionally, it can also accept permitted strings to specify all columns, or all numeric columns, or all character columns.
Permitted Values:

  1. Name(s) of the column(s) in "data".

  2. Pre-defined strings:

    1. 'all' - all columns

    2. 'allnumeric' - all numeric columns

    3. 'allcharacter' - all numeric and date columns

Types: character OR vector of Strings (character)

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

Function returns an object of class "td_frequency_valib" which is a named list containing object of class "tbl_teradata".
Named list member can be referenced directly with the "$" operator using name: result.

Other Arguments

exclude.columns

Optional Argument.
Specifies the name(s) of the column(s) to exclude from the analysis, if a column specifier such as 'all', 'allnumeric', 'allcharacter' is used in the "columns" argument.
Types: character OR vector of Strings (character)

cumulative.option

Optional Argument.
Specifies whether to include rank, cumulative count, and cumulative percent information for each frequency value.
When set to TRUE, this information is included otherwise not.
Note:

  • This argument should not be set to TRUE when style is 'pairwise'.

Default Value: FALSE
Types: logical

agg.filter

Optional Argument.
Specifies the clause to restrict returned aggregations.
For example,
agg.filter="xpct > 1"
Note:

  • This argument should not be used when "min.percentage" argument is used.

Types: character

min.percentage

Optional Argument.
Specifies a value to determine whether to include only frequency values that occur a minimum percentage of the time. Setting this to 0 or 0.0 is equivalent to not including the argument at all.
Types: numeric

pairwise.columns

Optional Argument.
Specified the columns to be paired up with the frequency columns.
Note:

  • Use only when "style" is set to 'pariwise'.

Types: character OR vector of Strings (character)

stats.columns

Optional Argument.
Specifies the name(s) of column(s) for which the minimum, maximum, mean value, and standard deviation are included in the result with the values computed over the rows corresponding to the individual values of the frequency columns.
Note:

  • This argument can be used only when "style" is 'basic'.

Types: character OR vector of Strings (character)

style

Optional Argument.
Specifies the frequency style for the analysis.
Permitted Values:

  1. 'basic' - Counts frequencies of individual column values.

  2. 'pairwise' - Counts frequencies of pair-wise combinations of values of selected columns rather than individually. This should not be used when "cumulative.option" is set to TRUE.

  3. 'crosstab' - Counts frequencies of combinations of values of selected columns rather than individually.

Default Value: 'basic'
Types: character

top.n

Optional Argument.
Specifies the number of frequency values to include. Using this argument shows frequency only for the number of top occurring values entered.
Note:

  • This argument is enabled only if "cumulative.option" is set to TRUE.

Types: integer

filter

Optional Argument.
Specifies the clause to filter rows selected for analysis within Frequency.
For example,
filter = "cust_id > 0"
Types: character

Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set options 'val.install.location' to
#      the database name where Vantage analytic library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)

# Example 1: Perform frequency analysis using default values.
obj <- td_frequency_valib(data=df,
                          columns="years_with_bank")

# Print the results.
print(obj$result)

# Example 2: Calculate frequency values that occur a minimum 10% of the time for
#            the column 'years_with_bank'.
obj <- td_frequency_valib(data=df,
                          columns="years_with_bank",
                          min.percentage=10)

# Print the results.
print(obj$result)

# Example 3: Calculate the frequency values for the column 'years_with_bank'. Include
#            cumulative measures and get only the top five frequency values.
obj <- td_frequency_valib(data=df,
                          columns="years_with_bank",
                          cumulative.option=TRUE,
                          top.n=5)

# Print the results.
print(obj$result)

# Example 4: Calculate the frequencies using 'crosstab' style on two columns 'income'
#            and 'age'.
obj <- td_frequency_valib(data=df,
                          columns=c("income", "age"),
                          style="crosstab")

# Print the results.
print(obj$result)

# Example 5: Calculate frequencies using 'pairwise' style by combining one pairwise
#            column with two frequency columns.
obj <- td_frequency_valib(data=df,
                          columns=c("gender", "marital_status"),
                          style="pairwise",
                          pairwise.columns="years_with_bank")

# Print the results.
print(obj$result)

# Example 6: Calculate frequencies with inclusion of "stats.columns".
obj <- td_frequency_valib(data=df,
                          columns="gender",
                          stats.columns="years_with_bank")

# Print the results.
print(obj$result)

# Example 7: Calculate frequencies by applying filtering using "agg.filter" and
#            "filter" to introduce a custom where clause and having clause.
obj <- td_frequency_valib(data=df,
                          columns="years_with_bank",
                          filter="cust_id > 0",
                          agg.filter="xpct > 1")

# Print the results.
print(obj$result)