Design Code - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

Product
Teradata Warehouse Miner
Release Number
5.4.4
Published
August 2017
Language
English (United States)
Last Update
2018-05-04
dita:mapPath
guj1484331868727.ditamap
dita:ditavalPath
ft:empty
dita:id
B035-2306
lifecycle
previous
Product Category
Teradata® Warehouse Miner

Purpose

Design Coding is useful when a categorical data element must be re-expressed as one or more meaningful numeric data elements, creating a binary numeric field for each categorical data value. Design coding is offered in two forms: dummy-coding and contrast-coding.

In dummy-coding, a new column is produced for each listed value, with a value of 0 or 1 depending on whether that value is assumed by the original column. Alternately, given a list of values to contrast-code along with a reference value, a new column is produced for each listed value, with a value of 0 or 1 depending on whether that value is assumed by the original column, or a value of -1 if that original value is equal to the reference value.

When using dummy-coding, if a column assumes n values, new columns can be created for all n values, (or for only n-1 values, because the nth column is perfectly correlated with the first n-1 columns). When using contrast-coding, only n-1 or fewer new columns can be created from a categorical column with n values.

Design Coding supports character, numeric, and date type columns. If you enter date values, the keyword DATE must precede the date value, which should not be enclosed in single quotes.

The following parameters are required for a Design Code transformation:
  • designstyle
  • designvalues
  • columns

Syntax

call twm .td_analyze('vartran','database=database name;tablename=table name;General Parameters;designcode={designstyle (dummycode),designvalues (value,value),columns (value,)};');

Required Parameters

columns
Controls the name of the output (transformed) column and its data type. The columns parameter is required by all transformations except Derive. A separate transformation is performed for each column in the list. If a column name is followed by a forward slash and a name, the name after the slash becomes the name of the transformed column in the resultant output table. Otherwise the column name is used as the output column name.
For the Derive transformation, the outputname parameter controls the naming of the transformed output column.
database
The database containing the input table.
designcode
Identifies the type of transformation being performed.
designstyle
The designstyle parameter can assume one of two different styles: dummycode or contrastcode.
Style Description
designstyle (dummycode)

In dummy-coding, a new column is produced for each listed value (see designvalues below), with a value of 0 or 1 depending on whether that value is assumed by the original column.

designstyle (contrastcode, <reference value>)

Given a list of values to contrast-code, along with a reference value, a new column is produced for each listed value, with a value of 0 or 1 depending on whether that value is assumed by the original column, or a value of -1 if that original value is equal to the reference value. For example, with a reference value of 0:

designstyle (contrastcode,0)

designvalues

Lists the values to code and optionally the name of the resulting output column. For example:

designvalues (M/Males,F/Females)
tablename
The name of the table to transform.
vartran
Required to run a variable transformation. Enclose the 'vartran' parameter in single quotes.

General Parameters (separated by a semi-colon)

datatype
For all transformation types, the datatype casts the column to a desired database data type provided it is compatible with the transformed data.
Allowed output types include:
  • byteint
  • char
  • date
  • decimal
  • float
  • integer
  • smallint
  • time
  • timestamp
  • varchar
  • bigint
  • number
fallback
When true, requests a mirrored copy of the output table in the Teradata Database when outputstyle=table.
gensqlonly
When true, the SQL for the requested transformations is returned as a result set but not executed. When not specified or set to false, the SQL is executed but not returned.
indexcolumns
When true, requests the output table contain the index columns when outputstyle=table.
indexunique
When true, requests the output table contain a unique primary index when outputstyle=table.
keycolumns
When null replacement is requested, either via a Null Replacement transformation or in combination with a Bin Code, Derive, Design Code, Recode, Rescale, Sigmoid, or Z Score transformation, the keycolumns parameter must be specified. The column or columns listed must form a unique key into the input and output table of the transformation.
lockingclause

Requests the generated SQL contain the given locking clause in the appropriate location depending on the output style.

An example of a locking clause when the output style defaults to select is:

LOCKING mydb.mytable FOR ACCESS;

multiset
When true, requests an output table that can contain duplicate rows when outputstyle=table.
noindex
When true, requests the output table contain no index columns when outputstyle=table.
nullstyle
Data types supported by various nullstyle parameters are:
Data Type Description Example
literal,value numeric, character, and date nullstyle (literal,value)
mean numeric and date nullstyle (mean)
median numeric and date nullstyle (median)
medianwithoutaveraging any supported data type nullstyle (medianwithoutaveraging)
mode any supported date type nullstyle (mode)
imputed,table any supported data type nullstyle (imputed,tablename)

If date values are entered, the keyword DATE must precede the date value, which should not be enclosed in single quotes.

outputdatabase
The database containing the resulting output table when outputstyle=table or view.
outputstyle
Allowed output styles are:
  • select
  • table
  • view
If outputstyle is not specified, the function generates a SELECT statement and does not create a table or view.
outputstyle={select|table|view}
outputtablename
The name of the output table when outputstyle=table or view.
whereclause
Requests the generated SQL containing the given WHERE clause in appropriate places in the generated SQL. This is independent of the output style requested.

Examples

These examples show how to use Design Code. These Overlap examples assume the td_analyze function is installed in a database named twm.

The following examples demonstrate the Design Code transformation.

call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;designcode={designstyle(dummycode),designvalues(M,F),columns(gender)}{designstyle(contrastcode,M),designvalues(F/female),columns(gender/gen)}{designstyle(contrastcode,M),designvalues(F),columns(gender/gen2)}{designstyle(contrastcode,M),designvalues(F),columns(gender/gen3)};');
This example shows how date literals can be used.
call twm.td_analyze('vartran','database=twm_source;tablename=twm_credit_acct;designcode={designstyle(dummycode),designvalues(DATE 1995-12-21),columns(acct_start_date)}{designstyle(contrastcode,DATE 1995-12-21),designvalues(DATE 1995-12-23/d2,DATE 1995-12-26/d3),columns(acct_start_date/sdate)};');

The following example demonstrates combined null replacement. Note that keycolumns must be included as a general parameter when null value replacement is performed. The letter U for Unknown is enclosed in two pair of single quotes.

call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;keycolumns=cust_id;designcode=designstyle(dummycode),designvalues(M,F),nullstyle(literal,''U''),columns(gender);');