Teradata Package for R Function Reference | 17.00 - 17.00 - td_association_valib - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

Association Rules provide various measures concerning items residing in groups. The measures, support, confidence, lift and Z Score, help to determine the likelihood that one or more items exist in a group, given that another one or more items exist in the same group. The classic example of this type of study is market basket analysis, in which the groups are shopping carts and the items are the products purchased in the shopping carts. An association rule might indicate the likelihood that a given shopping cart contains oranges, given that it also contains apples.
Association rules consist of a left part and a right part. The left part consists of one or more items that are given to reside in a group, and the right part is the consequence that one or more items also reside in the given group. The measures are defined as follows:

  • Support-Percentage of groups containing the items on the left (left-side support), on the right (right-side support), or on both sides of a rule (rule support).

  • Confidence-Percentage of groups containing the left-side items that also contain the right-side items.

  • Lift-A measure of how much the probability is raised that the right-side items occur in a group given that the left-side items occur in the group.

  • Z Score-A statistical measure of how much the expected and actual values of the number of groups containing all the items in the rule varies. (Zero means expected and actual are the same.)

A sequence analysis may be optionally requested, wherein there is a sequence of items defined by a "sequence.column" argument, ordering the items on each side of each rule, with left-side items preceding the rights-side items. An option called "relaxed.order" is provided, that can be set to TRUE so that items on the left side and the right side can be in any order provided that all left-side items precede all right-side items. An output tbl_teradata is created for each requested rule combination (1-to-1, 2-to-1, and so on).

Usage

td_association_valib(data, group.column, item.column, ...)

Arguments

data

Required Argument.
Specifies the input data to perform Association analysis.
Types: tbl_teradata

group.column

Required Argument.
Specifies the name of the column representing groups in the association rules.
Types: character

item.column

Required Argument.
Specifies the name of the column representing items in the association rules.
Types: character

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

Function returns an object of class "td_association_valib" which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using names as described in the note below.
Note:

  1. Output objects of class tbl_teradata generated can be categorized as follows:

    1. Affinity outputs.

    2. Support outputs.

  2. Attribute names of these output objects of class tbl_teradata can be found out using two attributes:

    1. affinity.outputs:
      This prints the attribute names of the output objects of class tbl_teradata containing the affinity results.
      For example, this can be accessed as
      AssociationObj$affinity.outputs

    2. support.outputs:
      This prints the attribute names of the output objects of class tbl_teradata containing the support results.
      For example, this can be accessed as
      AssociationObj$support.outputs

  3. Output objects of class tbl_teradata generated by the function depend upon following:

    1. When "process.type" is set to 'support':

      1. No affinity outputs are generated.

      2. Function generates only two support output objects of class tbl_teradata named as:

        1. support.1.item

        2. group.count

    2. When "no.support.results" is set to TRUE and "process.type" is set to values other than 'support':

      1. Only affinity outputs are generated, based on the number of combinations requested by the user.

      2. No support outputs are generated.

    3. When "no.support.results" is set to FALSE and "process.type" is set to values other than 'support':

      1. Affinity outputs are generated, based on the number of combinations requested.

      2. Support outputs are also generated depending on the combination(s) requested.

Other Arguments

combinations

Optional Argument.
Specifies the combinations of number of items on left side and number of items on right side of requested association rules. More than one combination can also be requested. For each combination specified, one output tbl_teradata is generated, i.e., number of outputs objects of class tbl_teradata generated depends on the number of combinations. Corresponding output tbl_teradata is named as "result.<combination>".
For example,
combinations=c(11, 21)
above combinations produces an analysis of 1-to-1 and 2-to-1 rules. This will result in two output objects of class tbl_teradata 'result.11' and 'result.21'.
Note:

  • If you add the sizes of the left and right sides of a combination, the sum must be less than or equal to 5.

Default Value: 11
Types: integer OR list of Integers (int)

description.data

Optional Argument.
Specifies the tbl_teradata containing description data, which is joined with the result.
Note:

  • Arguments "description.data", "description.identifier" and "description.column", if used, must be used together.

Types: tbl_teradata

description.identifier

Optional Argument.
Specifies the name of the column in the "description.data" that is joined with the result.
Note:

  • Arguments "description.data", "description.identifier" and "description.column", if used, must be used together.

Types: character

description.column

Optional Argument.
Specifies the name of the column in the "description.data" that contains descriptive item names.
Note:

  • Arguments "description.data", "description.identifier" and "description.column", if used, must be used together.

Types: character

group.count

Optional Argument.
Specifies the count of the number of groups in the input data. By default it is calculated by the function. This is useful when you are processing a reduced input set saved in a previous run, so that calculations can be based on the number of groups in the original input set and not the reduced set.
Types: integer

hierarchy.data

Optional Argument.
Specifies the hierarchy data that can be joined with the input data in order to reduce the amount of input data and compute association rules at a different hierarchical level. Use of this argument has an impact on the data saved in the reduced input when the argument "reduced.data" is requested. When this option is utilized, listwise deletion is automatically performed, ignoring rows that contain a null group, item, or sequence column value.
Note:

  • Arguments "hierarchy.data", "low.level.column" and "high.level.column", if used, must be used together.

Types: tbl_teradata

low.level.column

Optional Argument.
Specifies the lowest level item column in the "hierarchy.data" to be matched with the item column in the input data.
Note:

  • Arguments "hierarchy.data", "low.level.column" and "high.level.column", if used, must be used together.

Types: character

high.level.column

Optional Argument.
Specifies the higher-level item column in the "hierarchy.data".
Note:

  • Arguments "hierarchy.data", "low.level.column" and "high.level.column", if used, must be used together.

Types: character

left.lookup.data

Optional Argument.
Specifies a left-side lookup data that can be specified to reduce the rules reported to only those with left-side items that appear in the lookup dataset.
Note:

  • Arguments "left.lookup.data" and "left.lookup.column", if used, must be used together.

Types: tbl_teradata

left.lookup.column

Optional Argument.
Specifies the name of the column to match with left-side items in rules.
Note:

  • Arguments "left.lookup.data" and "left.lookup.column", if used, must be used together.

Types: character

right.lookup.data

Optional Argument.
Specifies a right-side lookup data that can be specified to reduce the rules reported to only those with right-side items that appear in the lookup dataset.
Note:

  • Arguments "right.lookup.data" and "right.lookup.column", if used, must be used together.

Types: tbl_teradata

right.lookup.column

Optional Argument.
Specifies the name of the column to match with right-side items in rules.
Note:

  • Arguments "right.lookup.data" and "right.lookup.column", if used, must be used together.

Types: character

min.confidence

Optional Argument.
Specifies the minimum value that the confidence measure of an association rule must have before it is included in a result. The range of valid values is 0 to 1 inclusive.
Types: numeric

min.lift

Optional Argument.
Specifies the minimum value that the lift measure of an association rule must have before it is included in a result. The range of valid values is 0 to any positive numeric value.
Types: numeric

min.support

Optional Argument.
Specifies the minimum value that the support measure of an association rule must have before it is included in a result. When this argument is utilized, the size of the input data is reduced, potentially impacting the use of the "reduced.data" argument. Use of this argument also causes listwise deletion to be performed, skipping any input rows that have a null group, item or sequence column value. The range of valid values is 0 to 1 inclusive.
Types: numeric

min.zscore

Optional Argument.
Specifies the minimum value that the Z Score measure of an association rule must have before it is included in a result.
Types: numeric

order.prob

Optional Argument.
Specifies the probability of correct ordering for sequential analysis. When sequence analysis is being performed, by default the algorithm determines ordering probabilities. Value should be non-zero between 0 and 1 (Setting it to 1 effectively ignores this principle in lift and Z Score calculations).
Types: numeric

process.type

Optional Argument.
Specifies the type of processing.
Permitted Values:

  • 'all' - All processing is performed, from building support tables to calculating final affinities.

  • 'support' - The single item support result tbl_teradata is built and then processing is halted. This allows user to view the support result and decide what the minimum support value should be, thus reducing the amount of processing performed. The single item support output tbl_teradata is named as 'support.1.item' and underlying output table is named as 'ml__valib_association_1_ITEM_SUPPORT'. If "support.result.prefix" is specified, it replaces 'ml__valib_association' with the provided value in the support table name.

  • 'recalculate' - The final affinity tables are calculated based on support tables already present. This requires that the "no.support.results" parameter was set to FALSE in a previous run so that the support tables are available for recalculating the final affinities.

Default Value: 'all'
Types: character

reduced.data

Optional Argument.
Specified the reduced input data. If input to the analysis is reduced by using the "min.support", a "hierarchy.data", or a "filter", the resulting reduced input data can be saved for further analysis.
Notes:

  • This is not affected by the use of a left-side or right-side lookup argument or a "min.confidence", "min.lift", or "min.zscore" arguments.

  • If further analysis is performed on this data, it may be appropriate to use the "group.count" argument.

Types: tbl_teradata

relaxed.order

Optional Argument.
Use this option in conjunction with sequence analysis, that is, when a sequence column is specified. Relaxed ordering occurs when the items on the left side of an association rule may occur in any order (via the sequence column), and the same is the case with the right-side items, provided that all left-side items precede all right-side items.
Types: logical

sequence.column

Optional Argument.
Specifies the name of the column providing sequencing of input items if sequence analysis is desired. This might typically be a column of type date or timestamp. By default, sequence analysis is not performed.
Types: character

filter

Optional Argument.
Specifies the clause to filter rows selected for analysis within Association Rules.
For example,
filter = "cust_id > 0"
Note:

  • Single quotes within the parameter value must be doubled, such as in where=channel <> ” ”. (Ordinarily, the expression would be where=channel <> ' '. Instead, the expression ends with quote-quote-blank-quote-quote).

Types: character

no.support.results

Optional Argument.
Specifies whether the intermediate support results are required or not. By default, support results are not presented in the output.
Notes:

  • If "no.support.results" is FALSE and "support.result.prefix" is not used, then the generated underlying support tables are overwritten, if they already exist.

  • When set to TRUE, support results generated depends on "combinations" parameter.

Default Value: TRUE
Types: logical

support.result.prefix

Optional Argument.
Specifies a string that should be used to as prefix for the underlying table name for the support tables which can be accessed using output objects of class tbl_teradata.
Notes:

  • Teradata recommends using this when function is to be executed with "process.type" as 'recalculate'. Make sure to use the same value for both the function calls.

  • If "no.support.results" is FALSE and this is not used, then the generated underlying support tables are overwritten, if they already exist.

Default Value: 'ml__valib_association'
Types: character

Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set option
#      'val.install.location' to the database name where Vantage analytic
#      library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic
#      Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "credit_tran")
print(df)


# Example 1: Perform Association analysis using default values.
obj <- td_association_valib(data=df, group.column="cust_id",
                            item.column="channel")

# Print the affinity result. Only affinity result for default combination 11
# is produced.
print(obj$result.11)

# Example 2: Requests a 1-to-1 and a 2-to-1 analysis, while also requesting
#            0.1 minimum support. Rows with blank channel column are also
#            filtered.
# Note: The blank channel value requires double single quotes,
#       that is quote-quote-blank-quote-quote.
obj <- td_association_valib(data=df,
                            group.column=c("cust_id"),
                            item.column="channel",
                            min.support=0.1,
                            filter="channel <>''''",
                            combinations=c(11, 21))

# Executing above call will return two affinity results. Let's check the
# names of the output objects of class tbl_teradata.
print(obj$affinity.outputs)

# Print the results.
print(obj$result.11)
print(obj$result.21)

# Example 3: Request sequence analysis by specifying a "sequence.column"
#            parameter. Also include optional parameters "min.support" and
#            "filter".
obj <- td_association_valib(data=df,
                            group.column=c("cust_id"),
                            item.column="channel",
                            min.support=0.1,
                            filter="channel <>''''",
                            sequence.column="tran_date")

# Executing above call will return one affinity result. Let's check the name
# of the output tbl_teradata.
print(obj$affinity.outputs)

# Print the results.
print(obj$result.11)

# Example 4: Let's generate the support results first and then calculate the
#            final affinity results.
# To generate the support results, we set "no.support.results" to FALSE and
# use 'test_prefix' as a prefix for the table names for the support tables
# generated.
obj <- td_association_valib(data=df,
                            group.column=c("cust_id"),
                            item.column="channel",
                            no.support.results=FALSE,
                            support.result.prefix="test_prefix")

# Print the results.
print(obj)

# Let's look at the attribute names of the support results generated.
print(obj$support.outputs)

# Print the individual support results.
print(obj$support.result.01)
print(obj$support.result.11)

# Let's look at the attribute names of the affinity results generated.
print(obj$affinity.outputs)

# Print the affinity results.
print(obj$result.11)

# Re-run the Association function with "process.type" set to 'recalculate'
# for recalculation of affinity results.
# To recalculate we shall use the support results generated by the previous
# function call.
obj <- td_association_valib(data=df,
                            group.column=c("cust_id"),
                            item.column="channel",
                            process.type="recalculate",
                            support.result.prefix="test_prefix")

# Print the affinity results.
print(obj$result.11)