Teradata Package for R Function Reference | 17.00 - td_overlap_valib - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

The function performs Overlap analysis, which supports combining information from multiple inputs into an analytic data set by providing counts of overlapping key fields among pairs of inputs. For example, if an analytic data set is being built to describe customers, it is useful to know whether the customer, account, and transaction tables that provide information about customers refer to the same customers.

Given a set of inputs and the corresponding column names, Overlap analysis determines the number of instances of that column that each pairwise combination of inputs has in common. The same analysis can also be performed on multiple columns taken together.

Overlap analysis can process columns of any comparable data type, except those containing byte data.
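
The pairwise counting described above can be illustrated locally in base R. This is only a conceptual sketch, not how the Vantage Analytic Library computes its result: the data frames are made-up stand-ins for remote tables, the helpers count_common_keys() and pairwise_overlap() are hypothetical names that are not part of the package, and the sketch assumes that distinct key values are counted.

```r
# Made-up local stand-ins for the remote inputs (not real Vantage tables).
customer          <- data.frame(cust_id = c(1, 2, 3, 4))
customer_analysis <- data.frame(cust_id = c(2, 3, 4, 5))
checking_tran <- data.frame(cust_id = c(1, 2, 2), tran_id = c(10, 11, 12))
credit_tran   <- data.frame(cust_id = c(2, 2, 3), tran_id = c(11, 13, 10))
savings_tran  <- data.frame(cust_id = c(2, 1),    tran_id = c(11, 10))

# Hypothetical helper: number of distinct key combinations shared by two inputs.
# Assumption: distinct values are counted; the real function may differ.
count_common_keys <- function(x, y, columns) {
  keys_x <- unique(x[, columns, drop = FALSE])
  keys_y <- unique(y[, columns, drop = FALSE])
  nrow(merge(keys_x, keys_y, by = columns))  # inner join on the key columns
}

# Hypothetical helper: apply the count to every pairwise combination of inputs.
pairwise_overlap <- function(inputs, columns) {
  pairs <- combn(names(inputs), 2)  # one column per pair of input names
  data.frame(
    input_a = pairs[1, ],
    input_b = pairs[2, ],
    common  = apply(pairs, 2, function(p) {
      count_common_keys(inputs[[p[1]]], inputs[[p[2]]], columns)
    })
  )
}

# Single column: which customers appear in both inputs?
single <- pairwise_overlap(
  list(customer = customer, customer_analysis = customer_analysis),
  "cust_id"
)
print(single)

# Multiple columns taken together: the (cust_id, tran_id) pair is the key.
multi <- pairwise_overlap(
  list(checking_tran = checking_tran,
       credit_tran   = credit_tran,
       savings_tran  = savings_tran),
  c("cust_id", "tran_id")
)
print(multi)
```

With three inputs the sketch reports one count per pair (three pairs in total), which mirrors the pairwise nature of the analysis. The layout of the actual result returned by td_overlap_valib may differ from this data frame.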

Usage

td_overlap_valib(data1, columns1, ...)

Arguments

data1

Required Argument.
Specifies the input data containing the columns on which Overlap analysis is to be performed.
Types: tbl_teradata

columns1

Required Argument.
Specifies the name(s) of the column(s) in the "data1" argument to be used in Overlap analysis.
Types: character OR vector of Strings (character)

...

Specifies the additional data and columns arguments that can be used with "data1" and "columns1" for Overlap analysis.

  • data2, ..., dataN:
    Optional Arguments.
    Specifies the additional inputs containing the columns on which Overlap analysis is to be performed along with "data1" and "columns1".
    Types: tbl_teradata

  • columns2, ..., columnsN:
    Optional Arguments.
    Specifies the name(s) of the column(s) of the additional inputs to be used in the Overlap analysis along with "data1" and "columns1".
    Types: character OR vector of Strings (character)

Note:

  1. The data-related and columns-related arguments must be numbered in sequence, starting from "data2" and "columns2" respectively.

  2. For each data argument (datai), the corresponding columns argument (columnsi) must be specified, and vice versa.

  3. The number of columns in each of the columns-related arguments (including the "columns1" argument) should be the same.
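
The three rules in the note above can be sketched as a small pre-flight check in base R. The helper check_overlap_args() is hypothetical and not part of the package. It only illustrates the rules; the validation and error messages of td_overlap_valib itself may differ, and the sketch treats rule 3 as a hard error, which is an assumption.

```r
# Hypothetical helper (not part of the package): checks the naming rules for
# data/columns arguments before they are passed on to an Overlap analysis.
check_overlap_args <- function(...) {
  args <- list(...)
  arg_names <- names(args)

  # Extract the numeric suffixes of dataN and columnsN argument names.
  data_idx <- as.integer(sub("^data", "",
                             grep("^data[0-9]+$", arg_names, value = TRUE)))
  cols_idx <- as.integer(sub("^columns", "",
                             grep("^columns[0-9]+$", arg_names, value = TRUE)))

  # Rule 1: arguments must be numbered 1, 2, ..., N with no gaps.
  if (!identical(sort(data_idx), seq_along(data_idx))) {
    stop("data arguments must be in sequence (data1, data2, ...)")
  }

  # Rule 2: every datai needs a matching columnsi, and vice versa.
  if (!setequal(data_idx, cols_idx)) {
    stop("each data argument needs a matching columns argument")
  }

  # Rule 3: every columns argument must name the same number of columns.
  lens <- vapply(args[paste0("columns", cols_idx)], length, integer(1))
  if (length(unique(lens)) > 1) {
    stop("all columns arguments must have the same length")
  }

  invisible(TRUE)
}

# Placeholder inputs; the helper never inspects the data itself.
a <- data.frame(cust_id = 1)
b <- data.frame(cust_id = 2)

# Valid: sequential, paired, and equal-length columns arguments.
check_overlap_args(data1 = a, columns1 = "cust_id",
                   data2 = b, columns2 = "cust_id")

# Invalid: "data3" is used without "data2", so rule 1 fails.
gap_error <- tryCatch(
  check_overlap_args(data1 = a, columns1 = "cust_id",
                     data3 = b, columns3 = "cust_id"),
  error = function(e) conditionMessage(e)
)
print(gap_error)
```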

Value

Function returns an object of class "td_overlap_valib", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set option
#      'val.install.location' to the database name where Vantage Analytic
#      Library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic
#      Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection (assumes a context has already been
# created, for example with td_create_context()).
con <- td_get_context()$connection

# Create and print objects of class "tbl_teradata".
customer <- tbl(con, "customer")
customer_analysis <- tbl(con, "customer_analysis")
checking_tran <- tbl(con, "checking_tran")
credit_tran <- tbl(con, "credit_tran")
savings_tran <- tbl(con, "savings_tran")

print(customer)
print(customer_analysis)
print(checking_tran)
print(credit_tran)
print(savings_tran)

# Example 1: Run Overlap analysis on 'cust_id' column present in the
#            input objects 'customer' and 'customer_analysis' of class
#            "tbl_teradata".
overlap_obj <- td_overlap_valib(data1=customer,
                                data2=customer_analysis,
                                columns1=c("cust_id"),
                                columns2="cust_id")
# Print the results.
print(overlap_obj$result)

# Example 2: Run Overlap analysis on columns 'cust_id' and 'tran_id' present
#            in the input objects 'checking_tran', 'credit_tran' and
#            'savings_tran' of class "tbl_teradata".
overlap_obj <- td_overlap_valib(data1=checking_tran,
                                data2=credit_tran,
                                data3=savings_tran,
                                columns1=c("cust_id", "tran_id"),
                                columns2=c("cust_id", "tran_id"),
                                columns3=c("cust_id", "tran_id"))
# Print the results.
print(overlap_obj$result)