Teradata Package for R Function Reference | 17.00 - 17.00 - td_transform_valib - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

The Variable Transformation analysis reads the input data in tbl_teradata object and produces an output containing transformed columns. This is useful when preparing data for input to an analytic algorithm. For example, a K-Means Clustering algorithm typically produces better results when the input columns are first converted to their Z-Score values to put all input variables on an equal footing, regardless of their magnitude.
Function supports the following transformations:

  1. Binning - replaces a continuous numeric column with a categorical one to produce ordinal values (for example, numeric categorical values where order is meaningful).

  2. Derive - allows you to enter free-form formulas based on columns within a table. It requires the free-form transformation to be specified as a formula using operators, arguments, and functions.

  3. One Hot Encoding - re-expresses a categorical data element as one or more numeric data elements and creates a binary numeric field for each categorical data value.

  4. Null Replacement - defines how missing values are treated.

  5. Label Encoding - allows to re-express existing values of a categorical data column (variable) into a new coding scheme or to correct data quality problems and focus an analysis on a value.

  6. Min-Max Scaling - limits the upper and lower boundaries of the data in a continuous numeric column using a linear rescaling function based on maximum and minimum data values.

  7. Retain - allows copying of one or more columns into the final analytic data set.

  8. Sigmoid - provides rescaling of continuous numeric data using a type of sigmoid or s-shaped function.

  9. ZScore - provides rescaling of continuous numeric data using Z-Scores.

Usage

td_transform_valib(data, bins=NULL, derive=NULL, one.hot.encode=NULL,
                   fillna=NULL, label.encode=NULL, rescale=NULL,
                   retain=NULL, sigmoid=NULL, zscore=NULL, ...)

Arguments

data

Required Argument.
Specifies the input data to perform variable transformations.
Types: tbl_teradata

bins

Optional Argument.
Specifies one or more object(s) of class tdBinning. Binning replaces a continuous numeric column with a categorical one to produce ordinal values (for example, numeric categorical values where order is meaningful).
Types: tdBinning or vector of tdBinning

derive

Optional Argument.
Specifies one or more object(s) of class tdDerive. This argument allows user to perform a free form transformation using arithmetic formula.
Types: tdDerive or vector of tdDerive

one.hot.encode

Optional Argument.
Specifies one or more object(s) of class tdOneHotEncoder. One hot encoding allows user to re-express categorical data as one or more numeric data elements, creating a binary numeric field for each categorical data value.
Types: tdOneHotEncoder or vector of tdOneHotEncoder

fillna

Optional Argument.
Specifies one or more object(s) of class tdFillNa. This argument allows user to perform a missing value/null replacement transformation.
Types: tdFillNa or vector of tdFillNa

label.encode

Optional Argument.
Specifies one or more object(s) of class tdLabelEncoder. This allows to re-express existing values of a categorical data column (variable) into a new coding scheme.
Types: tdLabelEncoder or vector of tdLabelEncoder

rescale

Optional Argument.
Specifies one or more object(s) of class tdMinMaxScalar. This limits the upper and lower boundaries of the data in a continuous numeric column using a linear rescaling function based on maximum and minimum data values.
Types: tdMinMaxScalar or vector of tdMinMaxScalar

retain

Optional Argument.
Specifies one or more object(s) of class tdRetain. This argument allows user to retain columns from input to output.
Types: tdRetain or vector of tdRetain

sigmoid

Optional Argument.
Specifies one or more object(s) of class tdSigmoid. This argument allows user to perform a rescaling using sigmoid transformation.
Types: tdSigmoid or vector of tdSigmoid

zscore

Optional Argument.
Specifies one or more object(s) of class tdZScore. This argument allows user to perform a rescaling using Z-Score transformation.
Types: tdZScore or vector of tdZScore

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

Function returns result based on transformation object passed. Named list member can be referenced directly with the "$" operator using name: result.

Other Arguments

fallback

Optional Argument.
Specifies whether a mirrored copy of underlying table of output tbl_teradata is required or not.
Default Value: FALSE
Types: logical

index.columns

Optional Argument.
Specifies the name(s) of the output column(s) to be used as index in output tbl_teradata.
Types: character OR vector of Strings (character)

unique.index

Optional Argument.
Specifies whether the underlying output table should contain a unique primary index or not.
Default Value: FALSE
Types: logical

key.columns

Optional Argument.
Specifies the name(s) of the column(s) that can be unique key in input and output tbl_teradata. When null replacement is requested, i.e., "fillna" argument is used either in tdFillNa transformation or in combination with a tdBinning, tdDerive, tdOneHotEncoder, td_LabelEncoder, tdMinMaxScalar, tdSigmoid, td_ZScore transformation, the "key.columns" argument must be specified.
Types: character OR vector of Strings (character)

allow.duplicates

Optional Argument.
Specifies whether output should contain duplicate rows or not.
Types: logical

nopi

Optional Argument.
Specifies whether the underlying output table should contain no index columns. When TRUE, output table does not contain index columns.
Note:

  • When this argument is set to TRUE, "allow.duplicates" must also be set to TRUE.

Types: logical

filter

Optional Argument.
Specifies the clause to filter rows selected for transformation.
For example,
filter = "cust_id > 0"
Types: character

See Also

tdFillNa, tdBinning, tdDerive, tdOneHotEncoder, tdLabelEncoder, tdMinMaxScalar, tdSigmoid, tdZScore

Examples

Notes:
# 1. To run any transformation, user needs to use td_transform_valib()
#    function.
# 2. To do so set option 'val.install.location' to the database name
#    where Vantage analytic library functions are installed.
# 3. Datasets used in these examples can be loaded using Vantage Analytic
#    Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create object(s) of class "tbl_teradata".
customer <- tbl(con, "customer")

# Example of running individual transformations can be found in the example
# sections for each Transformation.

# Binning performed on column 'income' with different number of bins.
bin_1 <- tdBinning(style="bins", value=3, columns=list("income"="inc1"))
bin_2 <- tdBinning(style="bins", value=4, columns=list("income"="inc2"))
bin_3 <- tdBinning(style="bins", value=5, columns=list("income"="inc3"))

# Derive transformation done on 'age' column with different arithmetic
# formula.
derive_1 <- tdDerive(formula="x/10", columns="age", out.column="age1")
derive_2 <- tdDerive(formula="x*10", columns="age", out.column="age2")
derive_3 <- tdDerive(formula="x+10", columns="age", out.column="age3")

# One hot encoding done for values in columns 'gender' and 'marital_status'.
dc_1 <- tdOneHotEncoder(values=list("M", "F"), column="gender")
dc_2 <- tdOneHotEncoder(values=list(1, 2, 3, 4), column="marital_status")

# Replace null values in column 'street_nbr', by using 'literal' and 'mean'
# null style.
fillna_1 <- tdFillNa(style="literal", value=0, columns="street_nbr")
fillna_2 <- tdFillNa(columns=list("street_nbr"="street_nbr2"))

# Recode values in column 'marital_status', using different coding schemes.
rc_1 <- tdLabelEncoder(values=list("1"="S", "2"="M", "3"="S", "4"="S"),
                       default="SAME", columns=list("marital_status"="mar1"))
rc_2 <- tdLabelEncoder(values=list("1"="B", "2"="A", "3"="B", "4"="B"),
                       default=NULL, columns=list("marital_status"="mar2"))

# Values in column 'age' are rescaled using MinMaxScalar using
#        1. Both lower bound and upper bound,
#        2. Only lower bound,
#        3. Only upper bound.
rs_1 <- tdMinMaxScalar(lbound=0, ubound=100, columns=list("age"="age4"))
rs_2 <- tdMinMaxScalar(lbound=0, ubound=None, columns=list("age"="age5"))
rs_3 <- tdMinMaxScalar(lbound=None, ubound=100, columns=list("age"="age6"))

# Rescale values in column "nbr_children" using various sigmoid functions.
sig_1 <- tdSigmoid(style="logit", columns=list("nbr_children"="nbr1"))
sig_2 <- tdSigmoid(style="modifiedlogit", columns=list("nbr_children"="nbr2"))
sig_3 <- tdSigmoid(style="tanh", columns=list("nbr_children"="nbr3"))

# Rescale values in column 'years_with_bank' using Z-Score values.
zscore <- tdZScore(columns=list("years_with_bank"="ywb"))

# Retain columns 'income', 'age', 'years_with_bank' and 'nbr_children' in the
# transformed output.
retain <- tdRetain(columns=list("income", "age", "years_with_bank",
                                "nbr_children"))

# Execute the td_transform_valib() function.
obj <- td_transform_valib(data=customer,
                          bins=c(bin_1, bin_2, bin_3),
                          derive=c(derive_1, derive_2, derive_3),
                          one.hot.encode=c(dc_1, dc_2),
                          fillna=c(fillna_1, fillna_2),
                          label.encode=c(rc_1, rc_2),
                          rescale=c(rs_1, rs_2),
                          sigmoid=c(sig_1, sig_2, sig_3),
                          retain=retain,
                          zscore=zscore,
                          key.columns="cust_id")

# Print the results.
print(obj$result)