Recoding a categorical data column is often done to re-express existing values of a column (variable) into some new coding scheme or to correct data quality problems and focus an analysis on a particular value. It allows for mapping individual values, NULL values or any number of remaining values (ELSE option) to a new value, a NULL value or the same value.
Recoding supports character, numeric and date type columns. If date values are entered, the keyword DATE must precede the date value, which should not be enclosed in single quotes.
Syntax
call twm. td_analyze('vartran','database=twm_source;tablename=twm_customer;General Parameters;recode={recodevalues (value pairs),recodeother (n),columns (value)};');Required Parameters
- columns
- Controls the name of the output (transformed) column and its data type. The columns parameter is required by all transformations except Derive. A separate transformation is performed for each column in the list. If a column name is followed by a forward slash and a name, the name after the slash becomes the name of the transformed column in the resultant output table. Otherwise the column name is used as the output column name.
- database
- The database containing the input table.
- recode
- The parameter that identifies the type of transformation being performed.
- recodeother
-
The recodeother parameter specifies the value to be assumed for all other cases.
For example: recodeother (SAME)
Values allowed are:
- SAME
- NULL
- A literal value
- recodevalues
-
The recodevalues parameter lists the values to be recoded, alternating the old and new column values separated by a slash.
For example: recodevalues (F/f,M/SAME,NULL/0)
The keyword SAME is used for the same value.
- tablename
- The name of the table to be transformed.
- vartran
- This parameter is required to run a variable transformation. The vartran parameter is always enclosed in single quotes.
General Parameters (separated by a semi-colon)
- datatype
- For all transformation types, the datatype parameter is used to cast the column to a desired database data type provided it is compatible with the transformed data. The allowed output types include:
- byteint
- char
- date
- decimal
- float
- integer
- smallint
- time
- timestamp
- varchar
- bigint
- number
- fallback
- When set to true, this parameter requests a mirrored copy of the output table in the Teradata Database when outputstyle=table.
- gensqlonly
- When set to true, the SQL for the requested transformations is returned as a result set but not executed. When this parameter is not specified or is set to false, the SQL is executed but not returned.
- indexcolumns
- When set to true, requests the output table contain the index columns when outputstyle=table.
- indexunique
- When set to true, requests the output table contain a unique primary index when outputstyle=table.
- keycolumns
- When null replacement is requested, either via a Null Replacement transformation or in combination with a Bin Code, Derive, Design Code, Recode, Rescale, Sigmoid or Z Score transformation, the keycolumns parameter must be specified. The column or columns listed must form a unique key into the input and output table of the transformation.
- lockingclause
-
Requests the generated SQL contain the given locking clause in the appropriate location depending on the output style.
An example of a locking clause when the output style defaults to select is:
LOCKING mydb.mytable FOR ACCESS;
- multiset
- When set to true, requests an output table that may contain duplicate rows when outputstyle=table.
- noindex
- When set to true, requests the output table contain no index columns when outputstyle=table.
- nullstyle
-
Data types supported by various nullstyle parameters are:
Data Type Description Example literal,value numeric, character, and date nullstyle (literal,value) mean numeric and date nullstyle (mean) median numeric and date nullstyle (median) medianwithoutaveraging any supported data type nullstyle (medianwithoutaveraging) mode any supported date type nullstyle (mode) imputed,table any supported data type nullstyle (imputed,tablename) If date values are entered, the keyword DATE must precede the date value, which should not be enclosed in single quotes.
- outputdatabase
- The database that will contain the resulting output table when outputstyle=table or view.
- outputtablename
- The name of the output table when outputstyle=table or view.
- whereclause
- Requests the generated SQL contain the given WHERE clause in appropriate places in the generated SQL. This is independent of the output style requested.
Examples
Examples in this section show how to use Recode. These examples assume that the td_analyze function has been installed in a database named twm.
The following example demonstrate the Recode transformation.
call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;recode={recodevalues(M/SAME,F/f),recodeother(NULL),columns(gender)}{recodevalues(1/SAME,2/NULL,3/6,4/4,NULL/NULL),recodeother(NULL),columns(marital_status)}{recodevalues(F/f,null/0),recodeother(same),columns(gender/gender2)}{recodevalues(0/0,1/1,2/1,3/1,4/1,5/1),recodeother(0),columns(nbr_children,years_with_bank)};');
This example shows how date literals can be used.
call twm.td_analyze('vartran',database=twm_source;tablename=twm_credit_acct;recode={recodevalues(DATE 1995-01-05/DATE 2016-06-15,DATE 1995-12-26/SAME),recodeother(DATE 2016-01-01),columns(acct_start_date)}{recodevalues(DATE 1995-01-05/DATE 2016-06-15,DATE 1995-12-26/SAME),recodeother(NULL),columns(acct_start_date/start2)}{recodevalues(DATE 1995-01-05/DATE 2016-06-15,DATE 1995-12-26/SAME),recodeother(SAME),columns(acct_start_date/start3)};');
The following example demonstrates combined null replacement. Note that keycolumns must be included as a general parameter when null value replacement is performed.
call twm.td_analyze('vartran',database=twm_source;tablename=twm_customer;keycolumns=cust_id;recode=recodevalues(0/0,1/1,2/1,3/1,4/1,5/1),recodeother(0),nullstyle(literal,0),columns(nbr_children,years_with_bank);');