Teradata® Package for Python Function Reference | 20.00 - ZScore
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Last edition: 2024-12-19
- Product Category: Teradata Vantage
- teradataml.analytics.Transformations.ZScore.__init__ = __init__(self, columns, out_columns=None, datatype=None, fillna=None)
- DESCRIPTION:
ZScore rescales continuous numeric data in a more
sophisticated way than a Rescaling transformation. In a Z-Score
transformation, a numeric column is transformed into its Z-score based
on the mean value and standard deviation of the data in the column.
Each column value is replaced by the number of standard
deviations it lies from the mean value of the column. This transformation
is often more useful in data mining than a linear Rescaling transformation.
The Z-Score transformation supports both numeric and date type input data.
Note:
The output of this function is passed to the "zscore" argument of the
"Transform" function from the Vantage Analytic Library.
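The arithmetic described above can be sketched in plain Python, outside Vantage. This is only an illustration and not the Vantage Analytic Library implementation: the helper name zscore is hypothetical, and the use of the population standard deviation (dividing by N) is an assumption inferred from the "Feb" output shown in the EXAMPLE section.

```python
import math


def zscore(values):
    """Return (x - mean) / standard deviation for each value.

    Assumption: the population standard deviation (divide by N) is used.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std for x in values]


# "Feb" column of the sales example table, in the order the rows are listed.
feb = [210.0, 90.0, 90.0, 200.0, 200.0, 210.0]
scores = zscore(feb)
print([round(z, 6) for z in scores])
```

Under that assumption, the values computed for 210.0, 200.0 and 90.0 agree closely with the "Feb" results in Example 1.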
PARAMETERS:
columns:
Required Argument.
Specifies the name(s) of the column(s) to perform transformation on.
Types: str or list of str
out_columns:
Optional Argument.
Specifies the names of the output columns.
Note:
The number of elements in "columns" and "out_columns" must be the same.
Types: str or list of str
datatype:
Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using either the
teradatasqlalchemy types or the permitted strings mentioned below:
------------------------------------------------------------------
| If intended SQL Data Type is | Permitted Value to be passed is |
|------------------------------|---------------------------------|
| bigint                       | bigint                          |
| byteint                      | byteint                         |
| char(n)                      | char,n                          |
| date                         | date                            |
| decimal(m,n)                 | decimal,m,n                     |
| float                        | float                           |
| integer                      | integer                         |
| number(*)                    | number                          |
| number(n)                    | number,n                        |
| number(*,n)                  | number,*,n                      |
| number(n,n)                  | number,n,n                      |
| smallint                     | smallint                        |
| time(p)                      | time,p                          |
| timestamp(p)                 | timestamp,p                     |
| varchar(n)                   | varchar,n                       |
------------------------------------------------------------------
Notes:
1. This argument is ignored if the "columns" argument is not used.
2. char without a size is not supported.
3. For number(*), the "*" is omitted from the permitted string: pass "number".
Examples:
1. If intended datatype for the output column is "bigint", then
pass string "bigint" to the argument as shown below:
datatype="bigint"
2. If intended datatype for the output column is "decimal(5,3)", then
pass string "decimal,5,3" to the argument as shown below:
datatype="decimal,5,3"
Types: str, BIGINT, BYTEINT, CHAR, DATE, DECIMAL, FLOAT, INTEGER, NUMBER, SMALLINT, TIME,
TIMESTAMP, VARCHAR.
fillna:
Optional Argument.
Specifies whether null replacement (missing value treatment) should
be performed along with the Z-Score transformation. The output of 'FillNa()'
can be passed to this argument.
Note:
If the FillNa object is created with its arguments "columns",
"out_columns" and "datatype", then the values passed to those arguments
are ignored. Only the nullstyle information is captured from the object.
Types: FillNa
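The mapping in the "datatype" table above, from an SQL type to the permitted string, can be sketched as a small conversion function. The helper to_permitted_string is hypothetical and is not part of teradataml; it only mirrors the table and Notes 2 and 3, and does not check that a type name is one of those listed.

```python
import re


def to_permitted_string(sql_type):
    """Convert e.g. "decimal(5,3)" to the string form "decimal,5,3".

    Hypothetical helper for illustration; it is not part of teradataml.
    """
    text = sql_type.strip().lower()
    match = re.fullmatch(r"(\w+)\s*(?:\((.*)\))?", text)
    if match is None:
        raise ValueError(f"Unrecognized type: {sql_type!r}")
    name, args = match.group(1), match.group(2)
    if args is None or not args.strip():
        if name == "char":
            # Note 2: char without a size is not supported.
            raise ValueError("char without a size is not supported")
        return name
    parts = [part.strip() for part in args.split(",")]
    if name == "number" and parts == ["*"]:
        # Note 3: number(*) is passed without the "*".
        return "number"
    return ",".join([name] + parts)


print(to_permitted_string("decimal(5,3)"))
print(to_permitted_string("number(*)"))
print(to_permitted_string("number(*,2)"))
```

The returned string is what would be passed as the "datatype" argument, for example datatype="decimal,5,3".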
RETURNS:
An instance of ZScore class.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLE:
# Note:
# To run any transformation, use the Transform() function from the
# Vantage Analytic Library.
# To do so, import valib first and set "val_install_location".
>>> from teradataml import configure, DataFrame, FillNa, load_example_data, valib, ZScore
>>> configure.val_install_location = "SYSLIB"
>>>
# Load example data.
>>> load_example_data("dataframe", "sales")
>>>
# Create the required DataFrames.
>>> sales = DataFrame("sales")
>>> sales
              Feb    Jan    Mar    Apr    datetime
accounts
Alpha Co    210.0  200.0  215.0  250.0  04/01/2017
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017
Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017
Jones LLC   200.0  150.0  140.0  180.0  04/01/2017
Red Inc     200.0  150.0  140.0    NaN  04/01/2017
Orange Inc  210.0    NaN    NaN  250.0  04/01/2017
>>>
# Example 1: Rescaling with ZScore is carried out on the "Feb" column.
>>> zs = ZScore(columns="Feb")
# Execute Transform() function.
>>> obj = valib.Transform(data=sales, zscore=zs)
>>> obj.result
     accounts       Feb
0    Blue Inc -1.410220
1    Alpha Co  0.797081
2   Jones LLC  0.613139
3  Yellow Inc -1.410220
4  Orange Inc  0.797081
5     Red Inc  0.613139
>>>
# Example 2: Rescaling with ZScore is carried out on multiple columns, "Jan"
# and "Apr", with null replacement using the "mode" style.
>>> fn = FillNa(style="mode")
>>> zs = ZScore(columns=["Jan", "Apr"], out_columns=["january", "april"], fillna=fn)
# Execute Transform() function.
>>> obj = valib.Transform(data=sales, zscore=zs, key_columns="accounts")
>>> obj.result
     accounts   january     april
0    Blue Inc -2.042649 -1.993546
1    Alpha Co  1.299867  0.646795
2   Jones LLC  0.185695 -0.593634
3  Yellow Inc  0.185695  0.646795
4  Orange Inc  0.185695  0.646795
5     Red Inc  0.185695  0.646795
>>>
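As a cross-check of Example 2, the "january" column can be approximated in plain Python without a Vantage connection. This sketch rests on two assumptions inferred from the output rather than stated in this reference: that the "mode" style replaces nulls with the most frequent non-null value, and that the Z-score is computed after replacement using the population standard deviation. The helper names fill_with_mode and zscore are hypothetical.

```python
import math
from collections import Counter


def fill_with_mode(values):
    """Replace None with the most frequent non-null value (assumed "mode" style)."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def zscore(values):
    """Return (x - mean) / population standard deviation for each value."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std for x in values]


# "Jan" column in the row order of the sales table:
# Alpha Co, Blue Inc, Yellow Inc, Jones LLC, Red Inc, Orange Inc.
jan = [200.0, 50.0, None, 150.0, 150.0, None]
january = zscore(fill_with_mode(jan))
print([round(z, 6) for z in january])
```

Under these assumptions the two nulls become 150.0, and the resulting scores for Alpha Co, Blue Inc and the remaining accounts agree closely with the "january" values shown in Example 2.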