Design coding re-expresses a categorical data element as one or more meaningful numeric data elements. This is useful because many classes of analytical algorithms from the statistical and artificial intelligence communities require variables, inputs, or outputs to be numeric and numerically meaningful. Roughly speaking, design coding creates a binary numeric field for each categorical data value. It is offered in two forms, one known as dummy-coding and the other as contrast-coding. A “Values” function is provided to select the possible values from the input table.
In “dummy-coding”, a new column is produced for each listed value, holding 1 where the original column assumes that value and 0 otherwise. Alternatively, in “contrast-coding”, a list of values is given along with a “reference value”. A new column is again produced for each listed value, holding 1 where the original column assumes that value, -1 where the original column equals the reference value, and 0 otherwise.
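As a minimal sketch of the two schemes described above, the following Python functions build the new columns from a list of category values. The function names `dummy_code` and `contrast_code`, the dictionary output layout, and the sample data are illustrative assumptions and are not part of the product. The sketch also assumes that the reference value is not itself one of the coded values, and that a row holding a value that is neither listed nor the reference value gets 0 in every new column.

```python
def dummy_code(column, values):
    """Return {value: [0/1, ...]}: one indicator column per listed value."""
    return {v: [1 if x == v else 0 for x in column] for v in values}


def contrast_code(column, values, reference):
    """Like dummy_code, but rows equal to the reference value get -1
    in every new column (assumption: reference is not a coded value)."""
    if reference in values:
        raise ValueError("the reference value must not be one of the coded values")
    coded = {}
    for v in values:
        coded[v] = [-1 if x == reference else (1 if x == v else 0) for x in column]
    return coded


# Hypothetical sample column with three distinct values.
colors = ["red", "green", "blue", "green"]

dummy = dummy_code(colors, ["red", "green", "blue"])
contrast = contrast_code(colors, ["red", "green"], reference="blue")
```

Here `dummy["green"]` marks the second and fourth rows with 1, while in `contrast` the third row (the reference value "blue") is -1 in both new columns.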
When using “dummy-coding”, if a column assumes n values, new columns may be created for all n values or for only n-1 of them, because the nth column is completely determined by the first n-1 columns. When using “contrast-coding”, only n-1 or fewer new columns may be created from a categorical column with n values.
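The redundancy of the nth dummy-coded column can be checked directly. In the sketch below, which uses made-up sample data and hypothetical variable names, the last indicator column is rebuilt from the first n-1 as 1 minus their row-wise sum. This holds whenever every row takes exactly one of the n values.

```python
column = ["a", "b", "c", "a", "c"]
values = sorted(set(column))  # n = 3 distinct values: a, b, c

# Full dummy coding: one 0/1 indicator column per distinct value.
indicators = {v: [1 if x == v else 0 for x in column] for v in values}

# Rebuild the last column from the first n-1 columns:
# each row has exactly one 1 across all n columns, so
# last = 1 - (sum of the others).
*first, last = values
rebuilt = [1 - sum(indicators[v][i] for v in first) for i in range(len(column))]

redundant = rebuilt == indicators[last]
```

Because `redundant` evaluates to true for such data, dropping one column loses no information, which is why n-1 columns suffice.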