Purpose
Overlap analysis makes it safer to combine information in multiple tables into an analytic data set by providing counts of overlapping key fields amongst pairs of tables. For example, if an analytic data set is being built to describe customers, it is useful to know whether the customer, account, and transaction tables that provide information about customers refer to the same customers.
Given a column name and a list of table names matched to a list of database names, the Overlap analysis determines the number of instances of that column which each pair-wise combination of tables has in common. The same can also be performed for multiple columns taken together.
An Overlap analysis can process any data type that can be compared except those containing byte data.
Syntax
call twm. td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');Required Parameters
- columns
- The key columns to match and count. For each table, one or more column names should appear in braces. All pairs of braces should, however, have the same number of columns in them. For example, if tablename=twm_customer,twm_customer_analysis, the columns parameter would be columns={cust_id},{cust_id}. If the key contained two columns, the parameter might look like columns={cust_id,tran_id},{cust_id,tran_id}. The key columns might have different names in different tables, so the column names in the braces could differ.
- database
- The database or databases containing the table or tables to analyze.
- Overlap
- The Overlap parameter:
- Is required
- Must be the first parameter
- Is always enclosed in single quotes
- tablename
- The table or tables containing the columns to analyze. For example, tablename=t1,t2,t3 indicates three tables (t1, t2, and t3) are to be analyzed. The number of databases in the database parameter and the number of tables in the tablename parameter must be the same.
Examples
These Overlap examples assume the td_analyze function is installed in a database named twm.
These examples demonstrate calling Overlap with two input tables, twm_customer and twm_customer_analysis, each having a single-part key consisting of the column named cust_id.
call twm.td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');
The following example is from the Teradata Warehouse Miner User Guide, Volume 1. It uses a single part key (cust_id) that is a partial key for some of the seven tables.
call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source,twm_source,twm_source,twm_source,twm_source;tablename=twm_customer,twm_checking_acct,twm_checking_tran,twm_credit_acct,twm_credit_tran,twm_savings_acct,twm_savings_tran;columns={cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id};');
The following example uses a 2-part key, cust_id and tran_id, for 3 transaction tables.
call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source;tablename=twm_checking_tran,twm_credit_tran,twm_savings_tran;columns={cust_id,tran_id},{cust_id,tran_id},{cust_id,tran_id};');