Duplication By Dimension - Teradata Warehouse Miner

Teradata® Warehouse Miner™ User Guide - Volume 2ADS Generation

Teradata Warehouse Miner
Release Number
November 2018
English (United States)
Last Update
Product Category

Sometimes, it is desirable to generate a number of similar variables at one time using data constraints involving specific values or combinations of values from one or more columns in the input table. These other columns can be thought of as dimensions upon which the new variable is expanded or duplicated. For example, instead of creating a single variable containing a customer’s average transaction amount, it may be desirable to create separate variables for average transaction amount during each of the last 6 months, yielding 6 variables.

Duplication by dimension is performed at the time a variable is created with the Variable Creation analysis. The user may dimension a variable on all or a subset of the dimension values they define. Ordinarily, both the dimensioned and dimensioning variable reside in the same input table. For example, both the transaction amount (variable being dimensioned) and the transaction date (dimensioning variable) reside in the transaction table that is used as input.

It is possible, however, to dimension a variable via a column in another table such as a hierarchy table. This requires that the table containing the dimensioning variable also contains a column that can be matched to a column in the table that contains the column to be dimensioned. For example, you can dimension the average transaction amount by department where the table containing the transaction amount also contains a product code, and the hierarchy table used for dimensioning contains both a product code and department code. In this case, the product code must be used in the “join path” between the transaction and hierarchy table.

Although variables duplicated by dimension are always implemented as aggregates by necessity, the variables may or may not be summarized values. The example previously given of average transaction amount is a summarized value where the individual dimension values apply to multiple rows or observations. However, if the dimension values apply to specific rows for each anchor key (see Join Paths, Anchor Table and Anchor Keys), then duplicating by dimension amounts to picking out specific values rather than summarizing over dimension values. An example of this might be dimensioning, by month, the values in a table that summarizes transaction amounts by customer and month. In this case, dimensioning by month simply selects the individual monthly sums or averages, creating a separate variable for each. To do this, the default aggregate function MIN is used.

Depending on the nature of the variable being dimensioned, the user may want to treat values not applying to a particular dimension value as either NULL or 0. The use of NULL in this case results in the possibility of the dimensioned variable being NULL if no data applies; it simply gives a total of 0 if no data applies. An option is therefore provided to the user to indicate that either NULL or 0 should be used when no data applies.