Overlap - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.6
Published
November 2018
Language
English (United States)
Last Update
2018-12-07
dita:mapPath
akh1538171534882.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

Overlap analysis combines information from multiple tables into an analytic data set by providing counts of overlapping key fields amongst pairs of tables. For example, if an analytic data set is being built to describe customers, it is useful to know whether the customer, account, and transaction tables that provide information about customers refer to the same customers.

Given a column name and a list of table names matched to a list of database names, the Overlap analysis determines the number of instances of that column which each pair-wise combination of tables has in common. The same can also be performed for multiple columns taken together.

You can use Overlap analysis to process any data type that is comparable except those containing byte data.

Syntax

call twm. td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

Required Parameters

columns
The key columns to match and count. For each table, one or more column names appears in braces. All pairs of braces, however, have the same number of columns in them. For example, if tablename=twm_customer,twm_customer_analysis, the columns parameter is columns={cust_id},{cust_id}. If the key contained two columns, the parameter is columns={cust_id,tran_id},{cust_id,tran_id}. If the key columns have different names in different tables, the column names in the braces may differ.
database
The database or databases containing the table or tables to analyze.
The number of databases in the database parameter and the number of tables in the tablename parameter must be the same. They are matched by position, and you can repeat the database names. For example, database=d1,d1,d2 and tablename=t1,t2,t3 indicate d1.t1, d1.t2 and d2.t3 are the tables to analyze, with tables t1 and t2 both residing in database d1.
gensqlonly
When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.
outputdatabase
Specifies the name of the database to contain the analysis results table.
Overlap
The Overlap parameter:
  • Is required
  • Must be the first parameter
  • Is always enclosed in single quotes
tablename
The table or tables containing the columns to analyze. For example, tablename=t1,t2,t3 indicates three tables (t1, t2, and t3) are to be analyzed. The number of databases in the database parameter and the number of tables in the tablename parameter must be the same.

Optional Parameters

outputtablename
The name of the output table when outputstyle=table or view.
overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

Examples

To run the provided examples, the td_analyze function must be installed in a database called twm and the TWM tutorial data must be installed in the twm_source database.

These examples demonstrate calling Overlap with two input tables, twm_customer and twm_customer_analysis, each having a single-part key consisting of the column named cust_id.

call twm.td_analyze('Overlap','database=twm_source,twm_source;tablename=twm_customer,twm_customer_analysis;columns={cust_id},{cust_id};');

The following example uses a single part key (cust_id) that is a partial key for some of the seven tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source,twm_source,twm_source,twm_source,twm_source;tablename=twm_customer,twm_checking_acct,twm_checking_tran,twm_credit_acct,twm_credit_tran,twm_savings_acct,twm_savings_tran;columns={cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id},{cust_id};');

The following example uses a 2-part key, cust_id and tran_id, for 3 transaction tables.

call twm.td_analyze('Overlap','database=twm_source,twm_source,twm_source;tablename=twm_checking_tran,twm_credit_tran,twm_savings_tran;columns={cust_id,tran_id},{cust_id,tran_id},{cust_id,tran_id};');