5.4.5 - Derive - Teradata Warehouse Miner

In-Database Analytic Functions User Guide

prodname
Teradata Warehouse Miner
vrm_release
5.4.5
created_date
February 2018
category
User Guide
featnum
B035-2306-028K

Purpose

The Derive transformation requires the free-form transformation be specified as a formula using the following operators, arguments, and functions:

+, -, **, *, /, %, (, ), x, y, z, abs, exp, ln, log, sqrt

The arguments x, y, and z can only assume the value of an input column, unlike in the classic version of this transformation. Enclose the formula in pairs of single quotes for correct processing (see example below). An implied multiply operator is automatically inserted when a number, argument (x, y, z), or parenthesis is immediately followed by an argument or parenthesis. For example, 4x means 4*x, xy means x*y, and x(x+1) is equivalent to x*(x+1).

An example formula for the quadratic equation is below.

formula (''(-y+sqrt(y**2-4xz))/(2x)'')

Syntax

call twm. td_analyze('vartran','database=database name;tablename=table name;General Parameters;derive={formula (arithmetic formula),arguments (values),outputname (output name)};');

Required Parameters

arguments
Specifies the columns is associated with the arguments x, y, and z. If the formula parameter includes only the argument x, then the arguments parameter should include only one column. If it includes the arguments x and y, then the arguments parameter should include two column names. And finally, if the formula includes the arguments x, y, and z, the arguments parameter should include three column names.
In the unusual case where the formula defines a constant expression, the arguments parameter is not required. However, at least one column must be included in the overall transformation, perhaps in a Retain transformation.
database
The database containing the input table.
derive
Identifies the type of transformation being performed.
formula

The Derive transformation requires the free-form transformation is specified as a formula using the following operators, arguments, and functions:

+, -, **, *, /, %, (, ), x, y, z, abs, exp, ln, log, sqrt

The arguments x, y, and z can only assume the value of an input column, unlike in the classic version of this transformation. Enclose the formula in pairs of single quotes. An implied multiply operator is automatically inserted when a number, argument (x, y, z) or parenthesis is immediately followed by an argument or parenthesis. For example, 4x means 4*x, xy means x*y, and x(x+1) is equivalent to x*(x+1).

An example formula for the quadratic equation is below.

formula ("(-y+sqrt(y**2-4xz))/(2x)")

outputname
Controls the naming of the transformed output column for the Derive transformation.
tablename
The input table to build a predictive model from.
vartran
Required to run a variable transformation. Enclose the 'vartran' parameter in single quotes.

Optional Parameters

datatype
For all transformation types, the datatype casts the column to a desired database data type provided it is compatible with the transformed data.
Allowed output types include:
  • byteint
  • char
  • date
  • decimal
  • float
  • integer
  • smallint
  • time
  • timestamp
  • varchar
  • bigint
  • number
fallback
When true, requests a mirrored copy of the output table in the Teradata Database when outputstyle=table.
gensqlonly
When true, the SQL for the requested transformations is returned as a result set but not executed. When not specified or set to false, the SQL is executed but not returned.
indexcolumns
When true, requests the output table contain the index columns when outputstyle=table.
indexunique
When true, requests the output table contain a unique primary index when outputstyle=table.
keycolumns
When null replacement is requested, either via a Null Replacement transformation or in combination with a Bin Code, Derive, Design Code, Recode, Rescale, Sigmoid, or Z Score transformation, the keycolumns parameter must be specified. The column or columns listed must form a unique key into the input and output table of the transformation.
lockingclause

Requests the generated SQL contain the given locking clause in the appropriate location depending on the output style.

An example of a locking clause when the output style defaults to select is:

LOCKING mydb.mytable FOR ACCESS;

multiset
When true, requests an output table that can contain duplicate rows when outputstyle=table.
noindex
When true, requests the output table contain no index columns when outputstyle=table.
nullstyle
Data types supported by various nullstyle parameters are:
Data Type Description Example
literal,value numeric, character, and date nullstyle (literal,value)
mean numeric and date nullstyle (mean)
median numeric and date nullstyle (median)
medianwithoutaveraging any supported data type nullstyle (medianwithoutaveraging)
mode any supported date type nullstyle (mode)
imputed,table any supported data type nullstyle (imputed,tablename)

If date values are entered, the keyword DATE must precede the date value, which should not be enclosed in single quotes.

outputdatabase
The database containing the resulting output table when outputstyle=table or view.
outputstyle
Allowed output styles are:
  • select
  • table
  • view
If outputstyle is not specified, the function generates a SELECT statement and does not create a table or view.
outputstyle={select|table|view}
outputtablename
The name of the output table when outputstyle=table or view.
overwrite

When overwrite is set to true (default), the output tables are dropped before creating new ones.

whereclause
Requests the generated SQL containing the given WHERE clause in appropriate places in the generated SQL. This is independent of the output style requested.

Examples

These examples describe how to use Derive. To execute the provided examples, the td_analyze function must be installed in a database called twm and the Teradata Warehouse Miner tutorial data must be installed in the twm_source database.

The following example includes two formulas: one with 3 variables (x, y, and z), and the other with one (x). The second formula, x(x+1), is enclosed in pairs of single quotes, necessitated by the presence of parentheses in the formula. Doubling the single quotes is necessary because the parameters are already contained within a string literal. Include the pairs of single quotes around the formula to avoid confusion.

call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;derive={formula(x+y+z),arguments(income,age,years_with_bank),outputname(myoutput)}{formula(''x(x+1)''),arguments(age),outputname(myoutput2)};');

The following example demonstrates combined null replacement. keycolumns must be included as a general parameter when null value replacement is performed.

call twm.td_analyze('vartran','database=twm_source;tablename=twm_customer;keycolumns=cust_id;derive=formula(''x(x+1)''),arguments(age),outputname(myoutput),nullstyle(literal,0);');