Description
tdMinMaxScalar()
allows rescaling to limit the upper and lower
boundaries of the data in a continuous numeric column using a linear
rescaling function based on maximum and minimum data values. Rescaling is
useful with algorithms that require or work better with data within a
certain range. tdMinMaxScalar()
is only valid on numeric columns,
and not columns of type date.
The rescale transformation formulas are shown in the following examples.
The l denotes the left bound and r denotes the right bound.
When both the lower and upper bounds are specified:
f(x,l,r) = (l+(x-min(x))(r-l))/(max(x)-min(x))
When only the lower bound is specified:
f(x,l) = x-min(x)+l
When only the upper bound is specified:
f(x,r) = x-max(x)+r
Note:
Output of this function is passed to "rescale" argument of
td_transform_valib()
.
Usage
tdMinMaxScalar(columns, lbound=NULL,ubound=NULL, datatype=NULL,
fillna=NULL)
Arguments
columns |
Required Argument. | ||||||||||||||||||||||||||||||||||||
lbound |
Optional Argument. | ||||||||||||||||||||||||||||||||||||
ubound |
Optional Argument. | ||||||||||||||||||||||||||||||||||||
datatype |
Optional Argument.
Notes:
Examples:
Types: character | ||||||||||||||||||||||||||||||||||||
fillna |
Optional Argument.
Types: tdFillNa |
Value
An object of tdMinMaxScalar class.
Examples
Notes:
# 1. To run any transformation, user needs to use td_transform_valib()
# function.
# 2. To do so set option 'val.install.location' to the database name where
# Vantage analytic library functions are installed.
# Get the current context/connection
con <- td_get_context()$connection
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Load example data.
loadExampleData("val_example", "sales")
# Create object(s) of class "tbl_teradata".
sales <- tbl(con, "sales")
sales
# Example 1: Rescale values in column 'Feb', using the default bounds, which
# is with "lbound" as 0 and "ubound" as 1.
# Create tdMinMaxScalar object.
rs <- tdMinMaxScalar(columns="Feb")
# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=rs)
obj$result
# Example 2: Rescale values in column 'Feb', using only "lbound" as -1.
# Create tdMinMaxScalar object.
rs <- tdMinMaxScalar(columns="Feb", lbound=-1, ubound=NULL)
# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=rs)
obj$result
# Example 3: Rescale values in columns 'Jan' and 'Apr', using only
# "ubound" as 10. To use only "ubound", one must pass NULL to
# "lbound". Rescaling is combined with missing value
# treatment. Missing values are replaced with "mode" null
# style replacement.
# Create tdFillNa object.
fn <- tdFillNa(style="mode")
# Create tdMinMaxScalar object.
rs <- tdMinMaxScalar(columns=list("Jan", "Apr"), lbound=NULL, ubound=10,
fillna=fn)
# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=rs, key.columns="accounts")
obj$result
# Example 4: Combine multiple ways of rescaling in one td_transform_valib()
# call.
# Rescale values in column 'Feb' using "lbound" as -1 and "ubound" as 1.
# Name the output column as 'Feb1'.
rs_1 <- tdMinMaxScalar(columns=list("Feb"="Feb1"), lbound=-1, ubound=1)
# Rescale values in column 'Feb' using only "ubound" as 1 and default value
# of "lbound" which is 0.
# Name the output column as 'FebU'.
rs_2 <- tdMinMaxScalar(columns=list("Feb"="FebU"), ubound=1)
# Rescale values in column 'Feb' using only "lbound" as 0 (default value).
# Name the output column as 'FebL'.
rs_3 <- tdMinMaxScalar(columns=list("Feb"="FebL"), ubound=NULL)
# Rescale values in columns 'Jan' and 'Apr' using default bounds.
# Name the output columns as 'Jan1' and 'Apr1'.
# Combine with missing value treatment, with literal null replacement.
fn_1 <- tdFillNa(style="literal", value=0)
rs_4 <- tdMinMaxScalar(columns=list("Jan"="Jan1", "Apr"="Apr1"), fillna=fn_1)
# Rescale values in columns 'Jan' and 'Apr' using default bounds.
# Name the output columns as 'Jan2' and 'Apr2.'
# Combine with missing value treatment, with median null replacement.
fn_2 <- tdFillNa(style="median")
rs_5 <- tdMinMaxScalar(columns=list("Jan"="Jan2", "Apr"="Apr2"), fillna=fn_2)
# Perform the rescale transformation using td_transform_valib() function.
obj <- td_transform_valib(data=sales, rescale=c(rs_1, rs_2, rs_3, rs_4, rs_5),
key.columns="accounts")
obj$result