Teradata Package for R Function Reference | 17.00 - 17.00 - tdLabelEncoder - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

tdLabelEncoder() allows user to recode a categorical data column to re-express existing values of a column (variable) into a new coding scheme or to correct data quality problems and focus on analysis of a particular value. It allows for mapping individual values, NULL values, or any number of remaining values (ELSE option) to a new value, a NULL value or the same value.
Label encoding supports character, numeric, and date type columns.
Note:

  • Object of this class is passed to "label.encode" argument of td_transform_valib().

Usage

tdLabelEncoder(values, columns, default=NULL, datatype=NULL,
               fillna=NULL)

Arguments

values

Required Argument.
Specifies the values to be label encoded. Values can be specified in following format:

  • A named list, where each key is the old value and value is the new value.
    For example,
    values=list(old_val1=new_val2, old_val2=new_val2)

Notes:

  1. If date values are entered, the keyword 'DATE' must precede the date value, and do not enclose in single quotes.
    For example,
    value='DATE 1987-06-09'

  2. To keep the old value as is, one can pass 'same' as it's new value.

  3. To use NULL values for old or new value, one can pass NULL.

  4. Old value should be enclosed in quotes irrespective of the actual datatype.
    For example,
    If old value is an integer, numeric or logical: value=list("1"=3.5, "2.5"=3, "TRUE"="yes")

Types: named list of integer, numeric, logical or character

columns

Required Argument.
Specifies name(s) of column(s) containing the input and output column names, where key is the name of the column containing values to be label encoded and value contains the name of the transformed output column. When only key is specified then output column name is the name of input column.
Types: character OR list of Strings (character)

default

Optional Argument.
Specifies the value assumed for all other cases.
Permitted Values: 'SAME', 'NULL', a literal
Default Value: NULL
Types: logical, numeric, integer, character

datatype

Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using the permitted strings below:

------------------------------------ ---------------------------------------
If intended SQL Data Type is Permitted Value to be passed is
------------------------------------ ---------------------------------------
bigint bigint
byteint byteint
char(n) char,n
date date
decimal(m,n) decimal,m,n
float float
integer integer
number(*) number
number(n) number,n
number(*,n) number,*,n
number(n,n) number,n,n
smallint smallint
time(p) time,p
timestamp(p) timestamp,p
varchar(n) varchar,n

Notes:

  1. Argument is ignored if "columns" argument is not used.

  2. char without a size is not supported.

  3. number(*) does not include the * in its datatype format.

Examples:

  1. If intended datatype for the output column is 'bigint', then pass string 'bigint' to the argument as shown below:
    datatype="bigint"

  2. If intended datatype for the output column is 'decimal(3,5)', then pass string 'decimal,3,5' to the argument as shown below:
    datatype="decimal,3,5"

Types: character

fillna

Optional Argument.
Specifies whether the null replacement/missing value treatment should be performed with recoding or not. Output of tdFillNa() can be passed to this argument.
Note:

  • If the tdFillNa object is created with its arguments "columns" and "datatype", then values passed in tdFillNa() arguments are ignored. Only nullstyle information is captured from the same.

Types: tdFillNa

Value

An object of tdLabelEncoder class.

Examples

# Notes:
# 1. To run any transformation, user needs to use td_transform_valib()
#    function.
# 2. To do so set option 'val.install.location' to the database name
#    where Vantage analytic library functions are installed.
# 3. Datasets used in these examples can be loaded using Vantage Analytic
#    Library installer.

# Get the current context/connection
con <- td_get_context()$connection

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Create object(s) of class "tbl_teradata".
admissions_train <- tbl(con, "admissions_train")
admissions_train

# Example 1: Recode all values 'Novice', 'Advanced', and 'Beginner'
#            in 'programming' and 'stats' columns.
#            "values" argument takes named list of old column values to new
#            column values.
rc <- tdLabelEncoder(values=list("Novice"=1, "Advanced"=2, "Beginner"=3),
                     columns=list("stats", "programming"))
obj <- td_transform_valib(data=admissions_train, label.encode=rc)
obj$result

# Example 2: Recode value 'Novice' as 1 which is passed as named list to
#            "values" argument and recode other values in 'programming' and
#            'stats' columns as 0 by passing it to "default" argument.
rc <- tdLabelEncoder(values=list("Novice"=1),
                     columns=list("stats", "programming"), default=0)
obj <- td_transform_valib(data=admissions_train, label.encode=rc)
obj$result

# Example 3: Recode values differently for different columns.
# For values in 'programming' column, recoding is done as follows:
#   Novice --> 0
#   Advanced --> 1 and
#   Rest of the values as --> NULL
rc_prog <- tdLabelEncoder(values=list("Novice"=0, "Advanced"=1),
                          columns="programming", default=NULL)

# For values in 'stats' column, recoding is done as follows:
#   Novice --> N
#   Advanced --> keep it as is and
#   Beginner --> NULL
rc_stats <- tdLabelEncoder(values=list("Novice"=0, "Advanced"="same",
                                       "Beginner"=NULL),
                           columns="stats")

# For values in 'masters' column, recoding is done as follows:
#   yes --> 1 and other as 0
rc_yes <- tdLabelEncoder(values=list("yes"=1),
                         columns=list("masters"="masters_yes"),
                         default=0)

# For values in 'masters' column, label encoding is as follows:
#   no --> 1 and other as 0
rc_no <- tdLabelEncoder(values=list("no"=1),
                        columns=list("masters"="masters_no"), default=0)

obj <- td_transform_valib(data=admissions_train,
                          label.encode=c(rc_prog, rc_stats, rc_yes,rc_no))
obj$result