Overlap - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.4
Published
August 2017
Language
English (United States)
Last Update
2018-05-04
dita:mapPath
guj1484331868727.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

Overlap analysis makes it safer to combine information in multiple tables into an analytic data set by providing counts of overlapping key fields amongst pairs of tables. For example, if an analytic data set is being built to describe customers, it is useful to know whether the customer, account, and transaction tables that provide information about customers refer to the same customers.

Given a column name and a list of table names matched to a list of database names, the Overlap analysis determines the number of instances of that column which each pair-wise combination of tables has in common. The same can also be performed for multiple columns taken together.

An Overlap analysis can process any data type that can be compared except those containing byte data.

Syntax

call twm. td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

Required Parameters

columns
The key columns to match and count. For each table, one or more column names should appear in braces. All pairs of braces should, however, have the same number of columns in them. For example, if tablename=twm_customer,twm_customer_analysis, the columns parameter would be columns={cust_id},{cust_id}. If the key contained two columns, the parameter might look like columns={cust_id,tran_id},{cust_id,tran_id}. The key columns might have different names in different tables, so the column names in the braces could differ.
database
The database or databases containing the table or tables to analyze.
The number of databases in the database parameter and the number of tables in the tablename parameter must be the same. They are matched by position, and the database names can be repeated. For example, database=d1,d1,d2 and tablename=t1,t2,t3 indicate d1.t1, d1.t2 and d2.t3 are the tables to analyze, with tables t1 and t2 both residing in database d1.
Overlap
The Overlap parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
tablename
The table or tables containing the columns to analyze. For example, tablename=t1,t2,t3 indicates three tables (t1, t2, and t3) are to be analyzed. The number of databases in the database parameter and the number of tables in the tablename parameter must be the same.

Examples

These Overlap examples assume the td_analyze function is installed in a database named twm.

These examples demonstrate calling Overlap with two input tables, twm_customer and twm_customer_analysis, each having a single-part key consisting of the column named cust_id.

call twm.td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

The following example is from the Teradata Warehouse Miner User Guide, Volume 1. It uses a single part key (cust_id) that is a partial key for some of the seven tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source,twm_source,twm_source,twm_source,twm_source;tablename=twm_customer,twm_checking_acct,twm_checking_tran,twm_credit_acct,twm_credit_tran,twm_savings_acct,twm_savings_tran;columns={cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id};');

The following example uses a 2-part key, cust_id and tran_id, for 3 transaction tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source;tablename=twm_checking_tran,twm_credit_tran,twm_savings_tran;columns={cust_id,tran_id},{cust_id,tran_id},{cust_id,tran_id};');