Design Code - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 2 ADS Generation

Product
Teradata Warehouse Miner
Release Number
5.4.5
Published
February 2018
Language
English (United States)
Last Update
2018-05-03
dita:id
B035-2301
Product Category
Software

Design coding is useful when a categorical data element must be re-expressed as one or more meaningful numeric data elements. Many classes of analytical algorithms from the statistical and artificial intelligence communities require their variables, inputs, or outputs to be numeric and numerically meaningful. Roughly speaking, design coding meets this requirement by creating a binary numeric field for each categorical data value. It is offered in two forms: dummy-coding and contrast-coding. A “Values” function is provided to select the possible values from the input table.

In “dummy-coding”, a new column is produced for each listed value, holding 1 if the original column assumes that value and 0 otherwise. Alternatively, given a list of values to “contrast-code” along with a “reference value”, a new column is produced for each listed value, holding 1 if the original column assumes that value, 0 if it does not, and -1 if the original value is equal to the reference value.
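The two coding schemes can be illustrated with a minimal Python sketch. This is not how Teradata Warehouse Miner implements the function; the helper names `dummy_code` and `contrast_code` and the sample color values are hypothetical, and the sketch assumes the reference value is not itself one of the listed values.

```python
def dummy_code(value, listed_values):
    """Return one indicator per listed value: 1 if `value` matches it, else 0."""
    return [1 if value == v else 0 for v in listed_values]


def contrast_code(value, listed_values, reference_value):
    """Like dummy_code, but every new column is -1 when `value` is the reference value.

    Assumption: `reference_value` is not a member of `listed_values`.
    """
    if value == reference_value:
        return [-1] * len(listed_values)
    return [1 if value == v else 0 for v in listed_values]


# Hypothetical categorical column with three values; "blue" is the reference value.
colors = ["red", "green", "blue"]
listed = ["red", "green"]

dummy_rows = {c: dummy_code(c, listed) for c in colors}
contrast_rows = {c: contrast_code(c, listed, "blue") for c in colors}

for c in colors:
    print(c, dummy_rows[c], contrast_rows[c])
```

In this sketch the two schemes differ only on rows that hold the reference value: dummy-coding gives 0 in every new column, while contrast-coding gives -1 in every new column.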

When using “Dummy Coding”, if a column assumes n values, new columns may be created for all n values or for only n-1 values, because the nth column is fully determined by (perfectly correlated with) the first n-1 columns. When using “Contrast Coding”, only n-1 or fewer new columns may be created from a categorical column with n values, since one of the values serves as the reference value.
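The redundancy of the nth dummy column can be checked with a short Python sketch. The sample data and variable names are hypothetical, and the sketch assumes every row holds exactly one of the n listed values.

```python
# Hypothetical categorical column that assumes n = 3 values.
values = ["A", "B", "C"]
data = ["A", "C", "B", "B", "A"]

# Dummy-code all n values: one 0/1 column per value.
full = [[1 if x == v else 0 for v in values] for x in data]

# The nth column can be rebuilt from the first n-1 columns,
# because exactly one indicator in each row equals 1.
last = [row[-1] for row in full]
rebuilt_last = [1 - sum(row[:-1]) for row in full]

redundant = rebuilt_last == last
print(redundant)
```

Because the last column carries no information beyond the first n-1, dropping it loses nothing and avoids feeding a linearly dependent input to algorithms that are sensitive to such dependence.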