5.4.5 - Overlap - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

prodname
Teradata Warehouse Miner
vrm_release
5.4.5
created_date
February 2018
category
User Guide
featnum
B035-2306-028K

Purpose

Overlap analysis makes it safer to combine information in multiple tables into an analytic data set by providing counts of overlapping key fields amongst pairs of tables. For example, if an analytic data set is being built to describe customers, it is useful to know whether the customer, account, and transaction tables that provide information about customers refer to the same customers.

Given a column name and a list of table names matched to a list of database names, the Overlap analysis determines the number of instances of that column which each pair-wise combination of tables has in common. The same can also be performed for multiple columns taken together.

An Overlap analysis can process any data type that can be compared except those containing byte data.

Syntax

call twm. td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

Required Parameters

columns
The key columns to match and count. For each table, one or more column names should appear in braces. All pairs of braces should, however, have the same number of columns in them. For example, if tablename=twm_customer,twm_customer_analysis, the columns parameter would be columns={cust_id},{cust_id}. If the key contained two columns, the parameter might look like columns={cust_id,tran_id},{cust_id,tran_id}. The key columns might have different names in different tables, so the column names in the braces could differ.
database
The database or databases containing the table or tables to analyze.
The number of databases in the database parameter and the number of tables in the tablename parameter must be the same. They are matched by position, and the database names can be repeated. For example, database=d1,d1,d2 and tablename=t1,t2,t3 indicate d1.t1, d1.t2 and d2.t3 are the tables to analyze, with tables t1 and t2 both residing in database d1.
Overlap
The Overlap parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
tablename
The table or tables containing the columns to analyze. For example, tablename=t1,t2,t3 indicates three tables (t1, t2, and t3) are to be analyzed. The number of databases in the database parameter and the number of tables in the tablename parameter must be the same.

Optional Parameters

overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

Examples

To execute the provided examples, the td_analyze function must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.

These examples demonstrate calling Overlap with two input tables, twm_customer and twm_customer_analysis, each having a single-part key consisting of the column named cust_id.

call twm.td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

The following example is from the Teradata Warehouse Miner User Guide, Volume 1. It uses a single part key (cust_id) that is a partial key for some of the seven tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source,twm_source,twm_source,twm_source,twm_source;tablename=twm_customer,twm_checking_acct,twm_checking_tran,twm_credit_acct,twm_credit_tran,twm_savings_acct,twm_savings_tran;columns={cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id};');

The following example uses a 2-part key, cust_id and tran_id, for 3 transaction tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source;tablename=twm_checking_tran,twm_credit_tran,twm_savings_tran;columns={cust_id,tran_id},{cust_id,tran_id},{cust_id,tran_id};');