Teradata Package for Python Function Reference on VantageCloud Lake - Derive - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment
- VantageCloud
- Edition
- Lake
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Lake_2000
- Product Category
- Teradata Vantage
- teradataml.analytics.Transformations.Derive.__init__ = __init__(self, formula, columns, out_column, datatype=None, fillna=None)
- DESCRIPTION:
The Derive transformation requires the free-form transformation be specified
as a formula using the following operators, arguments, and functions:
+, -, **, *, /, %, (, ), x, y, z, abs, exp, ln, log, sqrt
The arguments x, y, and z can only assume the value of an input column.
An implied multiply operator is automatically inserted when a number, argument
(x, y, z), or parenthesis is immediately followed by an argument or parenthesis.
For example,
4x means 4*x, xy means x*y, and x(x+1) is equivalent to x*(x+1).
An example formula for the quadratic equation is below.
formula="(-y+sqrt(y**2-4xz))/(2x)"
Note:
Output of this function is passed to "derive" argument of "Transform"
function from Vantage Analytic Library.
PARAMETERS:
formula:
Required Argument.
Specifies the free-form transformation required for Derive.
Arithmetic formula can be specified as string using following operators,
arguments, and functions:
+, -, **, *, /, %, (, ), x, y, z, abs, exp, ln, log, sqrt
Types: str
columns:
Required Argument.
Specifies the names of the columns to use for formula.
Types: str or list of str
out_column:
Required Argument.
Specifies the name of the output column.
Types: str
datatype:
Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using either the
teradatasqlalchemy types or the permitted strings mentioned below:
-------------------------------------------------------------------
| If intended SQL Data Type is | Permitted Value to be passed is |
|-------------------------------------------------------------------|
| bigint | bigint |
| byteint | byteint |
| char(n) | char,n |
| date | date |
| decimal(m,n) | decimal,m,n |
| float | float |
| integer | integer |
| number(*) | number |
| number(n) | number,n |
| number(*,n) | number,*,n |
| number(n,n) | number,n,n |
| smallint | smallint |
| time(p) | time,p |
| timestamp(p) | timestamp,p |
| varchar(n) | varchar,n |
--------------------------------------------------------------------
Notes:
1. Argument is ignored if "columns" argument is not used.
2. char without a size is not supported.
3. number(*) does not include the * in its datatype format.
Examples:
1. If intended datatype for the output column is "bigint", then
pass string "bigint" to the argument as shown below:
datatype="bigint"
2. If intended datatype for the output column is "decimal(3,5)", then
pass string "decimal,3,5" to the argument as shown below:
datatype="decimal,3,5"
Types: str, BIGINT, BYTEINT, CHAR, DATE, DECIMAL, FLOAT, INTEGER, NUMBER, SMALLINT, TIME,
TIMESTAMP, VARCHAR.
fillna:
Optional Argument.
Specifies whether the null replacement/missing value treatment should
be performed with derive or not. Output of FillNa() can be passed to
this argument.
Note:
If the FillNa object is created with its arguments "columns",
"out_columns" and "datatype", then values passed in FillNa() arguments
are ignored. Only nullstyle information is captured from the same.
Types: FillNa
RETURNS:
An instance of Derive class.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLE:
# Note:
# To run any transformation, user needs to use Transform() function from
# Vantage Analytic Library.
# To do so import valib first and set the "val_install_location".
>>> from teradataml import configure, DataFrame, Derive, FillNa, load_example_data, valib
>>> configure.val_install_location = "SYSLIB"
>>>
# Load example data.
>>> load_example_data("dataframe", "sales")
>>>
# Create the required DataFrame.
>>> sales = DataFrame("sales")
>>> sales
Feb Jan Mar Apr datetime
accounts
Blue Inc 90.0 50.0 95.0 101.0 04/01/2017
Alpha Co 210.0 200.0 215.0 250.0 04/01/2017
Jones LLC 200.0 150.0 140.0 180.0 04/01/2017
Yellow Inc 90.0 NaN NaN NaN 04/01/2017
Orange Inc 210.0 NaN NaN 250.0 04/01/2017
Red Inc 200.0 150.0 140.0 NaN 04/01/2017
>>>
# Example: Includes multiple derive transformations.
# Derive transformation 1 is done with 3 variables, x, y, z, to calculate
# the total sales for the first quarter for each account.
>>> fn_1 = FillNa(style='literal', value=0)
>>> dr_1 = Derive(formula="x+y+z", columns=["Jan", "Feb", "Mar"],
... out_column="q1_sales", fillna=fn_1)
>>>
# Derive transformation 2 is done with 2 variables, x, y, to calculate
# the sale growth from the month of Jan to Feb.
>>> fn_2 = FillNa(style='median')
>>> dr_2 = Derive(formula="((y-x)/x)*100", columns=["Jan", "Feb"],
... out_column="feb_growth", fillna=fn_2, datatype='bigint')
>>>
# Execute Transform() function.
>>> obj = valib.Transform(data=sales, derive=[dr_1, dr_2], key_columns="accounts")
>>> obj.result
accounts q1_sales feb_growth
0 Alpha Co 625.0 4
1 Red Inc 490.0 33
2 Orange Inc NaN 40
3 Jones LLC 490.0 33
4 Yellow Inc NaN -40
5 Blue Inc 235.0 79
>>>