Purpose
Rescaling limits the upper and lower boundaries of the data in a continuous numeric column using a linear rescaling function based on maximum and minimum data values. It is useful with algorithms that require or work better with data within a certain range. Rescale is only valid on numeric columns, and not columns of type date.
You can supply new minimum and maximum values to form new variable boundaries. If only the lower boundary is supplied, the variable is shifted so that its minimum equals that value; if only the upper boundary is supplied, the variable is shifted so that its maximum equals that value. If a requested column has a constant value (its maximum and minimum are the same), the transformation fails with an SQL error.
The rescale transformation formulas are shown below. Here l (the letter, not the numeral 1) denotes the left (lower) bound and r denotes the right (upper) bound.
When both the lower and upper bounds are specified:
f(x,l,r) = l + (x-min(x))(r-l)/(max(x)-min(x))
When only the lower bound is specified:
f(x,l) = x-min(x)+l
When only the upper bound is specified:
f(x,r) = x-max(x)+r
Rescaling supports only numeric type columns.
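As a rough illustration of the three cases, the following Python sketch applies them to a small list of numbers. It is not the SQL that td_analyze generates; the function name rescale and the sample values are hypothetical. The both-bounds case is computed as l + (x-min(x))(r-l)/(max(x)-min(x)), so that min(x) maps to l and max(x) maps to r. The fallback to bounds 0 and 1 when neither bound is given is an assumption based on the documented default for rescalebounds.

```python
from typing import List, Optional, Sequence


def rescale(values: Sequence[float],
            lower: Optional[float] = None,
            upper: Optional[float] = None) -> List[float]:
    """Hypothetical helper: apply the rescale formulas to a list of numbers."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # The documentation says a constant column makes the transformation
        # fail with an SQL error; this sketch raises ValueError instead.
        raise ValueError("constant column: max(x) equals min(x)")
    if lower is not None and upper is not None:
        # Both bounds: l + (x - min(x)) * (r - l) / (max(x) - min(x))
        return [lower + (x - lo) * (upper - lower) / (hi - lo) for x in values]
    if lower is not None:
        # Lower bound only: x - min(x) + l
        return [x - lo + lower for x in values]
    if upper is not None:
        # Upper bound only: x - max(x) + r
        return [x - hi + upper for x in values]
    # Assumption: with no bounds given, use lowerbound/0 and upperbound/1.
    return rescale(values, 0.0, 1.0)


ages = [20, 30, 60]
print(rescale(ages, 0, 1))     # [0.0, 0.25, 1.0]
print(rescale(ages, 10, 20))   # [10.0, 12.5, 20.0]
print(rescale(ages, lower=0))  # [0, 10, 40]
print(rescale(ages, upper=1))  # [-39, -29, 1]
```

Note that the single-bound cases only shift the values; the spread of the data (here 40) is unchanged.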
Syntax
call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;General Parameters;rescale={rescalebounds (lowerbound/0),columns (colvalues)};');
Required Parameters
- columns
- Controls the name of the output (transformed) column and its data type. The columns parameter is required by all transformations except Derive. A separate transformation is performed for each column in the list. If a column name is followed by a forward slash and a name, the name after the slash becomes the name of the transformed column in the resultant output table. Otherwise the column name is used as the output column name.
- database
- The database containing the input table.
- rescale
- Identifies the type of transformation being performed.
- rescalebounds
- Specifies the new lower bound, the new upper bound, or both. Use a slash to separate the keyword lowerbound or upperbound from its designated value. The third of the following forms is the default.
- rescalebounds (lowerbound/0)
- rescalebounds (upperbound/1)
- rescalebounds (lowerbound/0,upperbound/1)
- tablename
- The input table containing the columns to transform.
- vartran
- Required to run a variable transformation. Enclose the 'vartran' parameter in single quotes.
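The required parameters combine into a single parameter string. The following Python sketch shows one way a client program might assemble that string before submitting it; the helper names rescale_group and rescale_call are hypothetical and are not part of td_analyze. The layout (one braced group per set of bounds, groups written back to back) is copied from the Rescale example in this document.

```python
from typing import Iterable, Optional


def rescale_group(columns: Iterable[str],
                  lower: Optional[float] = None,
                  upper: Optional[float] = None) -> str:
    """Hypothetical helper: format one {rescalebounds(...),columns(...)} group."""
    if lower is None and upper is None:
        # Documented default form: lowerbound/0,upperbound/1
        lower, upper = 0, 1
    bounds = []
    if lower is not None:
        bounds.append(f"lowerbound/{lower}")
    if upper is not None:
        bounds.append(f"upperbound/{upper}")
    # A column written as name/alias is output under the alias.
    return ("{rescalebounds(" + ",".join(bounds) + "),"
            "columns(" + ",".join(columns) + ")}")


def rescale_call(database: str, tablename: str, groups: Iterable[str]) -> str:
    """Hypothetical helper: assemble the text of the CALL statement."""
    group_text = "".join(groups)
    params = f"database={database};tablename={tablename};rescale={group_text};"
    return f"call twm.td_analyze('vartran','{params}');"


sql = rescale_call(
    "twm_source",
    "twm_customer",
    [
        rescale_group(["income/inc", "age"], lower=0, upper=1),
        rescale_group(["income/income1", "age/age1"], upper=1),
        rescale_group(["income/income2", "age/age2"], lower=0),
    ],
)
print(sql)
```

The helpers only build text. They do not validate column names or quote values, so they should be used only with trusted input.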
Optional Parameters
- datatype
- For all transformation types, the datatype parameter casts the transformed column to the desired database data type, provided that type is compatible with the transformed data. Allowed output types include:
- byteint
- char
- date
- decimal
- float
- integer
- smallint
- time
- timestamp
- varchar
- bigint
- number
- fallback
- When true, requests a mirrored copy of the output table in the Teradata Database when outputstyle=table.
- gensqlonly
- When true, the SQL for the requested transformations is returned as a result set but not executed. When not specified or set to false, the SQL is executed but not returned.
- indexcolumns
- When true, requests the output table contain the index columns when outputstyle=table.
- indexunique
- When true, requests the output table contain a unique primary index when outputstyle=table.
- keycolumns
- When null replacement is requested, either via a Null Replacement transformation or in combination with a Bin Code, Derive, Design Code, Recode, Rescale, Sigmoid, or Z Score transformation, the keycolumns parameter must be specified. The column or columns listed must form a unique key into the input and output table of the transformation.
- lockingclause
- Requests that the generated SQL contain the given locking clause in the appropriate location, depending on the output style.
An example of a locking clause when the output style defaults to select is:
LOCKING mydb.mytable FOR ACCESS;
- multiset
- When true, requests an output table that can contain duplicate rows when outputstyle=table.
- noindex
- When true, requests the output table contain no index columns when outputstyle=table.
- nullstyle
- Data types supported by the various nullstyle options are:
- literal,value: numeric, character, and date. Example: nullstyle (literal,value)
- mean: numeric and date. Example: nullstyle (mean)
- median: numeric and date. Example: nullstyle (median)
- medianwithoutaveraging: any supported data type. Example: nullstyle (medianwithoutaveraging)
- mode: any supported data type. Example: nullstyle (mode)
- imputed,table: any supported data type. Example: nullstyle (imputed,tablename)
If date values are entered, the keyword DATE must precede the date value, which should not be enclosed in single quotes.
- outputdatabase
- The database containing the resulting output table when outputstyle=table or view.
- outputstyle
- Allowed output styles are:
- select
- table
- view
- outputtablename
- The name of the output table when outputstyle=table or view.
- overwrite
- When overwrite is set to true (the default), existing output tables are dropped before new ones are created.
- whereclause
- Requests that the generated SQL contain the given WHERE clause in the appropriate places. This is independent of the output style requested.
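Optional parameters are general parameters, written as name=value pairs separated by semicolons in the same way as keycolumns=cust_id in the examples in this document. The following Python sketch formats a set of such pairs. The helper name general_params, the database name my_db, and the table name customer_rescaled are hypothetical, and spelling boolean values as the literals true and false is an assumption drawn from the "When true" wording above.

```python
def general_params(**options) -> str:
    """Hypothetical helper: format general parameters as name=value; pairs."""
    parts = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: booleans are written as the literals true / false.
            value = "true" if value else "false"
        parts.append(f"{name}={value};")
    return "".join(parts)


params = general_params(
    outputstyle="table",
    outputdatabase="my_db",               # hypothetical database name
    outputtablename="customer_rescaled",  # hypothetical table name
    overwrite=True,
)
print(params)
# outputstyle=table;outputdatabase=my_db;outputtablename=customer_rescaled;overwrite=true;
```

In the syntax shown earlier, the resulting text would take the place of the General Parameters placeholder, between the tablename and rescale entries.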
Examples
These examples show how to use Rescale. To execute the provided examples, the td_analyze function must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.
The following example demonstrates the Rescale transformation.
call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;rescale={rescalebounds(lowerbound/0,upperbound/1),columns(income/inc,age)}{rescalebounds(upperbound/1),columns(income/income1,age/age1)}{rescalebounds(lowerbound/0),columns(income/income2,age/age2)};');
The following example demonstrates combined null replacement. The keycolumns parameter must be included as a general parameter when null value replacement is performed.
call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;keycolumns=cust_id;rescale={rescalebounds(lowerbound/0,upperbound/1),nullstyle(literal,0),columns(age,income)};');