Teradata® Package for Python Function Reference - 20.00 - MinMaxScalar
Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00.00.03
Published: December 2024
Last edition: 2024-12-19
Product Category: Teradata Vantage
teradataml.analytics.Transformations.MinMaxScalar.__init__ = __init__(self, columns, lbound=0, ubound=1, out_columns=None, datatype=None, fillna=None)
DESCRIPTION:
MinMaxScalar rescales the data in a continuous numeric column to specified lower
and/or upper boundaries, using a linear rescaling function based on the column's
minimum and maximum values. MinMaxScalar is useful with algorithms that require,
or work better with, data within a certain range. MinMaxScalar is valid only on
numeric columns, not on columns of type date.
The rescale transformation formulas are shown below, where l denotes the left
(lower) bound and r denotes the right (upper) bound.
* When both the lower and upper bounds are specified:
f(x,l,r) = l + (x-min(x))(r-l)/(max(x)-min(x))
* When only the lower bound is specified:
f(x,l) = x-min(x)+l
* When only the upper bound is specified:
f(x,r) = x-max(x)+r
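The three formulas can be checked locally with a small pure-Python sketch. It is only an illustration: `rescale` is a hypothetical helper, not part of teradataml, and the actual transformation runs in-database through valib.Transform(). The None handling mirrors the "lbound"/"ubound" semantics described under PARAMETERS; raising an error when both bounds are None is an assumption of this sketch.

```python
def rescale(values, lbound=0, ubound=1):
    """Hypothetical local sketch of the MinMaxScalar rescaling formulas.

    Pass ubound=None to keep only the lower bound, or lbound=None to keep
    only the upper bound. Raising ValueError when both are None is an
    assumption of this sketch, not documented teradataml behavior.
    """
    lo, hi = min(values), max(values)
    if lbound is not None and ubound is not None:
        # f(x,l,r) = l + (x-min(x))(r-l)/(max(x)-min(x))
        # A constant column (max == min) would divide by zero in this sketch.
        return [lbound + (x - lo) * (ubound - lbound) / (hi - lo) for x in values]
    if lbound is not None:
        # f(x,l) = x-min(x)+l: the column minimum is mapped to l.
        return [x - lo + lbound for x in values]
    if ubound is not None:
        # f(x,r) = x-max(x)+r: the column maximum is mapped to r.
        return [x - hi + ubound for x in values]
    raise ValueError("at least one of lbound or ubound must be specified")


# "Feb" values from the sales example data, in the order
# Alpha Co, Blue Inc, Yellow Inc, Jones LLC, Red Inc, Orange Inc.
feb = [210.0, 90.0, 90.0, 200.0, 200.0, 210.0]
print([round(v, 6) for v in rescale(feb)])
# [1.0, 0.0, 0.0, 0.916667, 0.916667, 1.0]
print(rescale(feb, lbound=-1, ubound=None))
# [119.0, -1.0, -1.0, 109.0, 109.0, 119.0]
```

These values agree with the "Feb" results of Examples 1 and 2 below, which is a useful sanity check that the corrected two-bound formula adds l after scaling rather than dividing it.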
Rescaling supports only numeric type columns.
Note:
The MinMaxScalar object is passed to the "rescale" argument of the "Transform"
function from the Vantage Analytic Library.
PARAMETERS:
columns:
Required Argument.
Specifies the names of the columns on which to perform the transformation.
Types: str or list of str
lbound:
Optional Argument.
Specifies the lower bound value used for rescaling the numeric data.
If only the lower bound is supplied, the variable is aligned to this value,
that is, its minimum is mapped to "lbound". To supply only the lower bound,
pass None to the "ubound" argument.
Default Value: 0
Types: float, int
ubound:
Optional Argument.
Specifies the upper bound value used for rescaling the numeric data.
If only the upper bound is supplied, the variable is aligned to this value,
that is, its maximum is mapped to "ubound". To supply only the upper bound,
pass None to the "lbound" argument.
Default Value: 1
Types: float, int
out_columns:
Optional Argument.
Specifies the names of the output columns.
Note:
The number of elements in "columns" and "out_columns" must be the same.
Types: str or list of str
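A hypothetical sketch of how "columns" and "out_columns" pair up, assuming each argument may be a single string or a list of strings. Judging from Examples 1 to 3, when "out_columns" is omitted the output column keeps the input name. `pair_columns` is illustrative only and is not a teradataml function.

```python
def pair_columns(columns, out_columns=None):
    """Pair each input column with its output column name (hypothetical helper)."""
    # Accept either a single column name or a list of names.
    inputs = [columns] if isinstance(columns, str) else list(columns)
    if out_columns is None:
        # Assumption based on Examples 1 to 3: outputs keep the input names.
        return list(zip(inputs, inputs))
    outputs = [out_columns] if isinstance(out_columns, str) else list(out_columns)
    if len(inputs) != len(outputs):
        # Mirrors the rule that both arguments must have the same number of elements.
        raise ValueError(
            f'"columns" has {len(inputs)} element(s) but "out_columns" has {len(outputs)}'
        )
    return list(zip(inputs, outputs))


print(pair_columns(["Jan", "Apr"], ["Jan1", "Apr1"]))
# [('Jan', 'Jan1'), ('Apr', 'Apr1')]
```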
datatype:
Optional Argument.
Specifies the intended data type of the output column.
The data type can be specified using either teradatasqlalchemy types or the
permitted strings listed below:
------------------------------------------------------------------
| If intended SQL Data Type is | Permitted Value to be passed is |
|----------------------------------------------------------------|
| bigint                       | bigint                          |
| byteint                      | byteint                         |
| char(n)                      | char,n                          |
| date                         | date                            |
| decimal(m,n)                 | decimal,m,n                     |
| float                        | float                           |
| integer                      | integer                         |
| number(*)                    | number                          |
| number(n)                    | number,n                        |
| number(*,n)                  | number,*,n                      |
| number(n,n)                  | number,n,n                      |
| smallint                     | smallint                        |
| time(p)                      | time,p                          |
| timestamp(p)                 | timestamp,p                     |
| varchar(n)                   | varchar,n                       |
------------------------------------------------------------------
Notes:
1. This argument is ignored if the "columns" argument is not used.
2. char without a size is not supported.
3. For number(*), do not include the * in the permitted value; pass "number".
Examples:
1. If the intended data type of the output column is "bigint", pass the
string "bigint" to the argument as shown below:
datatype="bigint"
2. If the intended data type of the output column is "decimal(5,3)", pass
the string "decimal,5,3" to the argument as shown below:
datatype="decimal,5,3"
Types: str, BIGINT, BYTEINT, CHAR, DATE, DECIMAL, FLOAT, INTEGER, NUMBER, SMALLINT, TIME,
TIMESTAMP, VARCHAR.
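The table follows a simple rule: drop the parentheses and separate the type name and its parameters with commas, except that number(*) becomes plain "number". The sketch below applies that rule. `to_permitted_value` is a hypothetical helper written for illustration, not part of teradataml, and it covers only the spellings shown in the table.

```python
import re


def to_permitted_value(sql_type: str) -> str:
    """Convert an SQL type such as "decimal(5,3)" to its permitted string (hypothetical)."""
    text = sql_type.strip().lower().replace(" ", "")
    # A type name, optionally followed by a parenthesized parameter list.
    match = re.fullmatch(r"([a-z]+)(?:\((.*)\))?", text)
    if match is None:
        raise ValueError(f"unrecognized type: {sql_type!r}")
    name, args = match.group(1), match.group(2)
    if args is None:
        if name == "char":
            # Note 2: char without a size is not supported.
            raise ValueError("char requires a size, for example char(10)")
        return name
    parts = args.split(",")
    if parts == ["*"]:
        # Note 3: number(*) is passed as "number", without the *.
        return name
    return ",".join([name] + parts)


print(to_permitted_value("decimal(5,3)"))  # decimal,5,3
print(to_permitted_value("number(*)"))     # number
```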
fillna:
Optional Argument.
Specifies the null replacement (missing value treatment) to perform along
with rescaling. Pass the output of FillNa() to this argument.
Note:
If the FillNa object is created with any of its "columns", "out_columns"
and "datatype" arguments, the values passed to those arguments are ignored.
Only the null style information is used.
Types: FillNa
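To see how null replacement interacts with rescaling, the following pure-Python sketch fills missing values first and then applies the default 0 to 1 rescaling. It is an approximation under assumptions: `fill_missing` and `min_max` are hypothetical helpers, and "mode" and "median" here follow Python's statistics module, which may differ from Vantage in edge cases such as ties.

```python
import statistics


def fill_missing(values, style, value=None):
    """Hypothetical helper: replace None entries using a simple null style."""
    present = [v for v in values if v is not None]
    if style == "literal":
        replacement = value
    elif style == "mode":
        replacement = statistics.mode(present)
    elif style == "median":
        # For an even count this averages the two middle values.
        replacement = statistics.median(present)
    else:
        raise ValueError(f"style not covered by this sketch: {style!r}")
    return [replacement if v is None else v for v in values]


def min_max(values, lbound=0, ubound=1):
    """Hypothetical helper: rescale with both bounds, as in the DESCRIPTION formula."""
    lo, hi = min(values), max(values)
    return [lbound + (x - lo) * (ubound - lbound) / (hi - lo) for x in values]


# "Apr" values in the row order of Example 4:
# Blue Inc, Alpha Co, Jones LLC, Yellow Inc, Orange Inc, Red Inc.
apr = [101.0, 250.0, 180.0, None, 250.0, None]
print([round(v, 6) for v in min_max(fill_missing(apr, "median"))])
# [0.0, 1.0, 0.530201, 0.765101, 1.0, 0.765101]
```

Here the median of the non-missing "Apr" values is 215.0, and the rescaled results agree with the "Apr2" column of Example 4 below.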
RETURNS:
An instance of MinMaxScalar class.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLE:
# Note:
# To run any transformation, use the Transform() function from the
# Vantage Analytic Library. To do so, import valib first and set
# "val_install_location".
>>> from teradataml import configure, DataFrame, MinMaxScalar, FillNa, load_example_data, valib
>>> configure.val_install_location = "SYSLIB"
>>>
# Load example data.
>>> load_example_data("dataframe", "sales")
>>>
# Create the required DataFrames.
>>> df = DataFrame("sales")
>>> df
              Feb    Jan    Mar    Apr    datetime
accounts
Alpha Co    210.0  200.0  215.0  250.0  04/01/2017
Blue Inc     90.0   50.0   95.0  101.0  04/01/2017
Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017
Jones LLC   200.0  150.0  140.0  180.0  04/01/2017
Red Inc     200.0  150.0  140.0    NaN  04/01/2017
Orange Inc  210.0    NaN    NaN  250.0  04/01/2017
>>>
# Example 1: Rescale values in column "Feb" using the default bounds:
# lower bound 0 and upper bound 1.
>>> rs = MinMaxScalar(columns="Feb")
# Execute Transform() function.
>>> obj = valib.Transform(data=df, rescale=rs)
>>> obj.result
     accounts       Feb
0    Blue Inc  0.000000
1    Alpha Co  1.000000
2   Jones LLC  0.916667
3  Yellow Inc  0.000000
4  Orange Inc  1.000000
5     Red Inc  0.916667
>>>
# Example 2: Rescale values in column "Feb" using only a lower bound of -1.
# To use only the lower bound, pass None to "ubound".
>>> rs = MinMaxScalar(columns="Feb", lbound=-1, ubound=None)
# Execute Transform() function.
>>> obj = valib.Transform(data=df, rescale=rs)
>>> obj.result
     accounts    Feb
0   Jones LLC  109.0
1  Yellow Inc   -1.0
2     Red Inc  109.0
3    Blue Inc   -1.0
4    Alpha Co  119.0
5  Orange Inc  119.0
>>>
# Example 3: Rescale values in columns "Jan" and "Apr" using only an upper bound of 10.
# To use only the upper bound, pass None to "lbound".
# Combine this with missing value treatment: replace missing values using the
# "mode" null style.
>>> fn = FillNa(style="mode")
>>> rs = MinMaxScalar(columns=["Jan", "Apr"], lbound=None, ubound=10, fillna=fn)
# Execute Transform() function.
>>> obj = valib.Transform(data=df, rescale=rs, key_columns="accounts")
>>> obj.result
     accounts     Jan     Apr
0    Alpha Co    10.0    10.0
1    Blue Inc  -140.0  -139.0
2  Yellow Inc   -40.0    10.0
3   Jones LLC   -40.0   -60.0
4     Red Inc   -40.0    10.0
5  Orange Inc   -40.0    10.0
>>>
# Example 4: Combine multiple ways of rescaling in one Transform() call.
# Rescale values in column "Feb" using a lower bound of -1 and an upper bound of 1.
# Name the output column "Feb1".
>>> rs_1 = MinMaxScalar(columns="Feb", lbound=-1, ubound=1, out_columns="Feb1")
>>>
# Rescale values in column "Feb" using only an upper bound of 1.
# Name the output column "FebU".
>>> rs_2 = MinMaxScalar(columns="Feb", lbound=None, ubound=1, out_columns="FebU")
>>>
# Rescale values in column "Feb" using only a lower bound of 0 (the default value).
# Name the output column "FebL".
>>> rs_3 = MinMaxScalar(columns="Feb", ubound=None, out_columns="FebL")
>>>
# Rescale values in columns "Jan" and "Apr" using the default bounds.
# Name the output columns "Jan1" and "Apr1".
# Combine with missing value treatment, using literal null replacement with 0.
>>> fn_1 = FillNa(style="literal", value=0)
>>> rs_4 = MinMaxScalar(columns=["Jan", "Apr"], out_columns=["Jan1", "Apr1"], fillna=fn_1)
>>>
# Rescale values in columns "Jan" and "Apr" using the default bounds.
# Name the output columns "Jan2" and "Apr2".
# Combine with missing value treatment, using median null replacement.
>>> fn_2 = FillNa(style="median")
>>> rs_5 = MinMaxScalar(columns=["Jan", "Apr"], out_columns=["Jan2", "Apr2"], fillna=fn_2)
>>>
# Execute Transform() function.
>>> obj = valib.Transform(data=df, rescale=[rs_1, rs_2, rs_3, rs_4, rs_5],
...                       key_columns="accounts")
>>> obj.result
     accounts       Feb1    FebU   FebL  Jan1   Apr1      Jan2      Apr2
0    Blue Inc  -1.000000  -119.0    0.0  0.25  0.404  0.000000  0.000000
1    Alpha Co   1.000000     1.0  120.0  1.00  1.000  1.000000  1.000000
2   Jones LLC   0.833333    -9.0  110.0  0.75  0.720  0.666667  0.530201
3  Yellow Inc  -1.000000  -119.0    0.0  0.00  0.000  0.666667  0.765101
4  Orange Inc   1.000000     1.0  120.0  0.00  1.000  0.666667  1.000000
5     Red Inc   0.833333    -9.0  110.0  0.75  0.000  0.666667  0.765101
>>>