Teradata Package for R Function Reference | 17.20 - OrdinalEncodingFit - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

Teradata® Package for R Function Reference

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
Language
English (United States)
Last Update
2024-05-03
dita:id
TeradataR_FxRef_Enterprise_1720
Product Category
Teradata Vantage

OrdinalEncodingFit

Description

td_ordinal_encoding_fit_sqle() function identifies distinct categorical values from the input data or a user-defined list and generates the distinct categorical values along with the ordinal value for each category.

Notes:

  • Function requires the UTF8 client character set for UNICODE data.

  • Function does not support Pass Through Characters (PTCs).

  • Function does not support KanjiSJIS or Graphic data types.

  • The maximum number of unique categories in a particular column is 4000.

  • The maximum category length is 128 characters.

  • NULL categories are not encoded.

Usage

  td_ordinal_encoding_fit_sqle (
      data = NULL,
      category.data = NULL,
      target.column = NULL,
      approach = "AUTO",
      categories = NULL,
      ordinal.values = NULL,
      target.column.names = NULL,
      categories.column = NULL,
      ordinal.values.column = NULL,
      start.value = 0,
      default.value = NULL,
      ...
  )

Arguments

data

Required Argument.
Specifies the input data containing the categorical target column.
Types: tbl_teradata

category.data

Optional Argument.
Specifies the data containing the input categories for 'LIST' approach.
Types: tbl_teradata

target.column

Required Argument.
Specifies the name of the categorical input target column.
Note:
The maximum number of unique columns in the "target.column" argument is 2018.
Types: character OR vector of Strings (character)

approach

Optional Argument.
Specifies whether to determine categories automatically from the
input data ('AUTO' approach) or determine categories from the list provided by user ('LIST' approach).
Default Value: 'AUTO'
Permitted Values: 'AUTO', 'LIST'
Types: character

categories

Optional Argument.
Specifies the list of categories that need to be encoded in the desired order.
Notes:

  • If only one target column is provided, category values read from this argument. Otherwise, they read from the "category.data".

  • Required, when user use the 'LIST' approach and a single target column.

Types: character OR vector of Strings (character)

ordinal.values

Optional Argument.
Specifies the custom ordinal values to replace the categories,
when user use the 'LIST' approach for encoding the categorical values.
If user does not provide the "ordinal.values" and the "start.value", then by default, the first category contains the default start value '0', and the last category is assigned a value that is one lesser than the total number of categories.
For example, if there are three categories, then the categories contain the values 0, 1, 2 respectively.

However, if user only specify the ordinal values, then each ordinal value is associated with a categorical value. For example, if there are three categories and the ordinal values are 3, 4, 5 then the ordinal values are assigned to the respective categories.
The td_ordinal_encoding_fit_sqle() function returns an error when the ordinal value count does not match the categorical value count or if both the ordinal values and the start value are provided.
Notes:

  • User can either use the "ordinal.values" or the "start.value" argument in the syntax.

  • If only one target column is provided, ordinal values are read from this argument. Otherwise, they are read from the "category.data".

  • If omitted, ordinal values are generated based on the "start.value" argument.

Types: integer OR vector of integers

target.column.names

Required when "category.data" is used, optional otherwise.
Specifies the "category.data" column which contains the names of the
target columns.
Types: character

categories.column

Required when "category.data" is used, optional otherwise.
Specifies the "category.data" column which contains the category values.
Types: character

ordinal.values.column

Required when "category.data" is used, optional otherwise.
Specifies the "category.data" column which contains the ordinal values.
If omitted, ordinal values will be generated based on the "start.value" argument.
Types: character

start.value

Optional Argument.
Specifies the starting value for ordinal values list.
Default Value: 0
Types: integer

default.value

Optional Argument.
Specifies the ordinal value to use when category is not found.
Types: integer

...

Specifies the generic keyword arguments SQLE functions accept. Below
are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the
function in a table or not. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the
function in a volatile table or not. When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

Function allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr if the underlying SQL Engine function supports, else an exception is raised.

Value

Function returns an object of class "td_ordinal_encoding_fit_sqle" which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):

  1. result

  2. output.data

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("tdplyr_example", "titanic","cat_table")
    
    # Create tbl_teradata object.
    titanic <- tbl(con, "titanic")
    cat_data <- tbl(con, "cat_table")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1: identifying distinct categorical values from the input.
    ordinal_encodingfit_res_1 <- td_ordinal_encoding_fit_sqle(target.column='sex',
                                                              data=titanic)
    
    # Print the result.
    print(ordinal_encodingfit_res_1$result)
    
    # Example 2: Identifying distinct categorical values from the input and
    #            returns the distinct categorical values along with the ordinal
    #            value for each category.
    ordinal_encodingfit_res_2 <- td_ordinal_encoding_fit_sqle(
                                                   target.column='sex',
                                                   approach='LIST',
                                                   categories=c('category0',
                                                                'category1'),
                                                   ordinal.values=c(1, 2),
                                                   start.value=0,
                                                   default.value=-1,
                                                   data=titanic)
    
    # Print the result.
    print(ordinal_encodingfit_res_2$result)
    
    # Example 3: Provide ordinal values to "target.column" using
    #            dataset by "category.data".
    ordinal_encodingfit_res_3 <- td_ordinal_encoding_fit_sqle(
                                   target.column=c('name',
                                                   'sex',
                                                   'ticket',
                                                   'cabin',
                                                   'embarked'),
                                   category.data=cat_data,
                                   approach='LIST',
                                   target.column.names="column_name",
                                   categories.column="category",
                                   ordinal.values.column="ordinal_value",
                                   default.value=c(-1, -10, -15, 20, 0),
                                   data=titanic)
    
    # Print the result.
    print(ordinal_encodingfit_res_3$result)