Description
tdLabelEncoder() lets the user recode a categorical data column, either to
re-express the existing values of a column (variable) in a new coding scheme
or to correct data quality problems and focus the analysis on a particular
value. Individual values, NULL values, or any number of remaining values (the
ELSE option) can each be mapped to a new value, to NULL, or to the same value.
Label encoding supports character, numeric, and date type columns.
Note:
An object of this class is passed to the "label.encode" argument of
td_transform_valib().
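To make the mapping rules above concrete, here is a minimal local sketch in base R. `recode_sketch()` is a hypothetical helper written only for illustration: it is not part of tdplyr, does not call td_transform_valib(), and runs on an in-memory vector rather than in Vantage. It assumes that NA stands in for SQL NULL, that the string "same" means "keep the existing value" (as in Example 3), and that `default` plays the ELSE role for values not listed in `values`.

```r
# recode_sketch() is a HYPOTHETICAL helper for illustration only; it is not
# part of tdplyr and does not call td_transform_valib(). It applies the
# recoding rules described above to an in-memory vector, using NA as a
# stand-in for SQL NULL.
#   values  : named list, old value (name) -> new value (element);
#             "same" keeps the old value, NULL maps to NA
#   default : ELSE rule for values not named in `values`;
#             "same" keeps them, NULL maps them to NA
recode_sketch <- function(x, values, default = NULL) {
  # Turn one mapping target into the output for the old value `old`.
  resolve <- function(old, new) {
    if (is.null(new)) return(NA)              # map to NULL
    if (identical(new, "same")) return(old)   # keep the existing value
    new                                       # map to a new value
  }
  out <- lapply(x, function(v) {
    key <- as.character(v)
    # %in% returns FALSE for NA keys, so missing inputs fall to `default`.
    if (key %in% names(values)) resolve(v, values[[key]]) else resolve(v, default)
  })
  # unlist() coerces mixed results to a common type (e.g. character).
  unlist(out)
}

programming <- c("Novice", "Advanced", "Beginner", "Novice")
recode_sketch(programming, list("Novice" = 1), default = 0)
# [1] 1 0 0 1
```

Because the helper coerces mixed results with unlist(), a mapping such as "Novice"=0 combined with "Advanced"="same" yields a character vector; how Vantage types the output column is governed by the library, not by this sketch.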
Usage
tdLabelEncoder(values, columns, default=NULL, datatype=NULL,
               fillna=NULL)
Arguments
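values
Required Argument.
Specifies the values to recode, as a named list in which each name is an
existing column value and each element is its new value. The new value can
also be NULL or "same" (keep the existing value), as in Example 3.
Notes:
Types: named list of integer, numeric, logical or character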
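columns
Required Argument.
Specifies the column(s) whose values are recoded, as a character string or a
list of column names. In a named list such as list("masters"="masters_yes"),
the name is the input column and the value is the name of the output column
(see Example 3).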
default
Optional Argument.
Specifies the value assigned to all values not listed in "values" (the ELSE
option), as in Example 2.
datatype
Optional Argument.
Notes:
Examples:
Types: character
fillna
Optional Argument.
Types: tdFillNa
Value
An object of tdLabelEncoder class.
Examples
# Notes:
# 1. To run any transformation, use the td_transform_valib() function.
# 2. Before doing so, set the option 'val.install.location' to the name of
#    the database where the Vantage Analytic Library functions are installed.
# 3. The datasets used in these examples can be loaded using the Vantage
#    Analytic Library installer.
# Get the current context/connection
con <- td_get_context()$connection
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Create object(s) of class "tbl_teradata".
admissions_train <- tbl(con, "admissions_train")
admissions_train
# Example 1: Recode the values 'Novice', 'Advanced', and 'Beginner' in the
# 'programming' and 'stats' columns as 1, 2, and 3, respectively.
# The "values" argument takes a named list that maps old column values to
# new column values.
rc <- tdLabelEncoder(values=list("Novice"=1, "Advanced"=2, "Beginner"=3),
                     columns=list("stats", "programming"))
obj <- td_transform_valib(data=admissions_train, label.encode=rc)
obj$result
# Example 2: Recode the value 'Novice' as 1 (passed as a named list to the
# "values" argument) and recode all other values in the 'programming' and
# 'stats' columns as 0 (passed to the "default" argument).
rc <- tdLabelEncoder(values=list("Novice"=1),
                     columns=list("stats", "programming"), default=0)
obj <- td_transform_valib(data=admissions_train, label.encode=rc)
obj$result
# Example 3: Recode values differently for different columns.
# For values in the 'programming' column, recoding is done as follows:
#   Novice     --> 0
#   Advanced   --> 1
#   all others --> NULL
rc_prog <- tdLabelEncoder(values=list("Novice"=0, "Advanced"=1),
                          columns="programming", default=NULL)
# For values in the 'stats' column, recoding is done as follows:
#   Novice   --> 0
#   Advanced --> kept as is
#   Beginner --> NULL
rc_stats <- tdLabelEncoder(values=list("Novice"=0, "Advanced"="same",
                                       "Beginner"=NULL),
                           columns="stats")
# For values in the 'masters' column, recoding is done as follows:
#   yes        --> 1
#   all others --> 0
rc_yes <- tdLabelEncoder(values=list("yes"=1),
                         columns=list("masters"="masters_yes"),
                         default=0)
# For values in the 'masters' column, recoding is done as follows:
#   no         --> 1
#   all others --> 0
rc_no <- tdLabelEncoder(values=list("no"=1),
                        columns=list("masters"="masters_no"), default=0)
obj <- td_transform_valib(data=admissions_train,
                          label.encode=c(rc_prog, rc_stats, rc_yes, rc_no))
obj$result