5.4.2 - Overlap - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

prodname
Teradata Warehouse Miner
vrm_release
5.4.2
created_date
October 2016
category
User Guide
featnum
B035-2306-106K

Purpose

Overlap analysis is designed to make it safer to combine information in multiple tables into an analytic data set by providing counts of overlapping key fields amongst pairs of tables. For example, if an analytic data set is being built to describe customers, it is useful to know whether the customer, account and transaction tables that provide information about customers actually refer to the same customers.

Given a column name and a list of table names matched to a list of database names, the Overlap analysis determines the number of instances of that column which each pair-wise combination of tables has in common. The same can also be performed for multiple columns taken together.

An Overlap analysis can process any data type that can be compared except those containing byte data.

Syntax

call twm. td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

Required Parameters

columns
The key columns to be matched and counted. For each table, one or more column names should appear in braces. All pairs of braces should, however, have the same number of columns in them. For example, if tablename=twm_customer,twm_customer_analysis, the columns parameter would be columns={cust_id},{cust_id}. If the key contained two columns, the parameter might look like columns={cust_id,tran_id},{cust_id,tran_id}. The key columns might have different names in different tables, so that the column names in the braces could differ.
database
The database or databases containing the table or tables to be analyzed.
The number of databases in the database parameter and the number of tables in the tablename parameter must be the same. They are matched by position, and the database names may be repeated. For example, database=d1,d1,d2 and tablename=t1,t2,t3 indicate that d1.t1, d1.t2 and d2.t3 are the tables to be analyzed, with tables t1 and t2 both residing in database d1.
Overlap
The Overlap parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
tablename
The table or tables containing the columns to be analyzed. For example, tablename=t1,t2,t3 indicates that three tables (t1, t2 and t3) are to be analyzed. Note that the number of databases in the database parameter and the number of tables in the tablename parameter must be the same, as described for the database parameter.

Examples

Examples of using Overlap are included in this section. These examples assume that the td_analyze function has been installed in a database named twm.

The following example demonstrates calling Overlap with two input tables, twm_customer and twm_customer_analysis, each having a single-part key consisting of the column named cust_id.

call twm.td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

The following example is the tutorial example from the Teradata Warehouse Miner User Guide, Volume 1. It uses a single part key (cust_id) that is a partial key for some of the 7 tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source,twm_source,twm_source,twm_source,twm_source;tablename=twm_customer,twm_checking_acct,twm_checking_tran,twm_credit_acct,twm_credit_tran,twm_savings_acct,twm_savings_tran;columns={cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id};');

The following example uses a 2-part key, cust_id and tran_id, for 3 transaction tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source;tablename=twm_checking_tran,twm_credit_tran,twm_savings_tran;columns={cust_id,tran_id},{cust_id,tran_id},{cust_id,tran_id};');