Teradata Package for R Function Reference | 17.00 - 17.00 - tdMinMaxScalar - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

tdMinMaxScalar() allows rescaling to limit the upper and lower boundaries of the data in a continuous numeric column using a linear rescaling function based on maximum and minimum data values. Rescaling is useful with algorithms that require or work better with data within a certain range. tdMinMaxScalar() is only valid on numeric columns, and not columns of type date.
The rescale transformation formulas are shown in the following examples.
The l denotes the left bound and r denotes the right bound.

  1. When both the lower and upper bounds are specified:
    f(x,l,r) = (l+(x-min(x))(r-l))/(max(x)-min(x))

  2. When only the lower bound is specified:
    f(x,l) = x-min(x)+l

  3. When only the upper bound is specified:
    f(x,r) = x-max(x)+r

Note:

  • Output of this function is passed to "rescale" argument of td_transform_valib().

Usage

tdMinMaxScalar(columns, lbound=NULL,ubound=NULL, datatype=NULL,
               fillna=NULL)

Arguments

columns

Required Argument.
Specifies name(s) of column(s) containing the input and output column names, where key is the name of the column to perform transformation on and value contains the name of the transformed output column. When only key is specified then output column name is the name of input column.
Types: character OR list of Strings (character)

lbound

Optional Argument.
Specifies the lower bound value required for rescaling the numeric data. If only the lower boundary value is specified, the variable is aligned to this value. This can be achieved by passing NULL to "ubound" argument. Default Value: 0 Types: numeric

ubound

Optional Argument.
Specifies the upper bound value required for rescaling the numeric data. If only an upper boundary value is specified, the variable is aligned to this value. This can be achieved by passing NULL to "lbound" argument. Default Value: 1 Types: numeric

datatype

Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using the permitted strings below:

------------------------------------ ---------------------------------------
If intended SQL Data Type is Permitted Value to be passed is
------------------------------------ ---------------------------------------
bigint bigint
byteint byteint
char(n) char,n
date date
decimal(m,n) decimal,m,n
float float
integer integer
number(*) number
number(n) number,n
number(*,n) number,*,n
number(n,n) number,n,n
smallint smallint
time(p) time,p
timestamp(p) timestamp,p
varchar(n) varchar,n

Notes:

  1. Argument is ignored if "columns" argument is not used.

  2. char without a size is not supported.

  3. number(*) does not include the * in its datatype format.

Examples:

  1. If intended datatype for the output column is "bigint", then pass string "bigint" to the argument as shown below:
    datatype="bigint"

  2. If intended datatype for the output column is "decimal(3,5)", then pass string "decimal,3,5" to the argument as shown below:
    datatype="decimal,3,5"

Types: character

fillna

Optional Argument.
Specifies whether the null replacement/missing value treatment should be performed with rescale or not. Output of tdFillNa() can be passed to this argument.
Note:

  • If the tdFillNa object is created with its arguments "columns", and "datatype", then values passed in tdFillNa() arguments are ignored. Only nullstyle information is captured from the same.

Types: tdFillNa

Value

An object of tdMinMaxScalar class.

Examples

Notes:
# 1. To run any transformation, user needs to use td_transform_valib()
#    function.
# 2. To do so set option 'val.install.location' to the database name where
#    Vantage analytic library functions are installed.

# Get the current context/connection
con <- td_get_context()$connection

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Load example data.
loadExampleData("val_example", "sales")

# Create object(s) of class "tbl_teradata".
sales <- tbl(con, "sales")
sales

# Example 1: Rescale values in column 'Feb', using the default bounds, which
#            is with "lbound" as 0 and "ubound" as 1.
# Create tdMinMaxScalar object.
rs <- tdMinMaxScalar(columns="Feb")

# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=rs)
obj$result

# Example 2: Rescale values in column 'Feb', using only "lbound" as -1.
# Create tdMinMaxScalar object.
rs <- tdMinMaxScalar(columns="Feb", lbound=-1, ubound=NULL)

# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=rs)
obj$result

# Example 3: Rescale values in columns 'Jan' and 'Apr', using only
#            "ubound" as 10. To use only "ubound", one must pass NULL to
#            "lbound". Rescaling is combined with missing value
#            treatment. Missing values are replaced with "mode" null
#            style replacement.
# Create tdFillNa object.
fn <- tdFillNa(style="mode")

# Create tdMinMaxScalar object.
rs <- tdMinMaxScalar(columns=list("Jan", "Apr"), lbound=NULL, ubound=10,
                    fillna=fn)

# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=rs, key.columns="accounts")
obj$result

# Example 4: Combine multiple ways of rescaling in one td_transform_valib()
#            call.
# Rescale values in column 'Feb' using "lbound" as -1 and "ubound" as 1.
# Name the output column as 'Feb1'.
rs_1 <- tdMinMaxScalar(columns=list("Feb"="Feb1"), lbound=-1, ubound=1)

# Rescale values in column 'Feb' using only "ubound" as 1 and default value
# of "lbound" which is 0.
# Name the output column as 'FebU'.
rs_2 <- tdMinMaxScalar(columns=list("Feb"="FebU"), ubound=1)

# Rescale values in column 'Feb' using only "lbound" as 0 (default value).
# Name the output column as 'FebL'.
rs_3 <- tdMinMaxScalar(columns=list("Feb"="FebL"), ubound=NULL)

# Rescale values in columns 'Jan' and 'Apr' using default bounds.
# Name the output columns as 'Jan1' and 'Apr1'.
# Combine with missing value treatment, with literal null replacement.
fn_1 <- tdFillNa(style="literal", value=0)
rs_4 <- tdMinMaxScalar(columns=list("Jan"="Jan1", "Apr"="Apr1"), fillna=fn_1)

# Rescale values in columns 'Jan' and 'Apr' using default bounds.
# Name the output columns as 'Jan2' and 'Apr2.'
# Combine with missing value treatment, with median null replacement.
fn_2 <- tdFillNa(style="median")
rs_5 <- tdMinMaxScalar(columns=list("Jan"="Jan2", "Apr"="Apr2"), fillna=fn_2)

# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=c(rs_1, rs_2, rs_3, rs_4, rs_5),
                         key.columns="accounts")
obj$result