Teradata Package for R Function Reference | 17.20 - OneHotEncodingFit

Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Language: English (United States)
Last Update: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

OneHotEncodingFit

Description

The td_one_hot_encoding_fit_sqle() function outputs a tbl_teradata of attributes and categorical values. Pass this output to the td_one_hot_encoding_transform_sqle() function, which encodes the values as one-hot numeric vectors.
Notes:

  • This function requires the UTF8 client character set for UNICODE data.

  • This function does not support Pass Through Characters (PTCs).

  • This function does not support KanjiSJIS or Graphic data types.
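
To make "one-hot numeric vectors" concrete, the following base-R sketch encodes a small character vector against an explicit list of categories. It does not call tdplyr or Vantage. The helper one_hot() and the "<column>_<category>" output naming are illustrative assumptions, not the function's documented output layout.

```r
# Base-R illustration of one-hot encoding; hypothetical helper, not part of tdplyr.
one_hot <- function(values, categories, prefix) {
  # Build one 0/1 integer column per category, in the order given.
  cols <- lapply(categories, function(lvl) {
    as.integer(!is.na(values) & values == lvl)
  })
  # Column naming is an assumption made for this sketch.
  names(cols) <- paste(prefix, categories, sep = "_")
  as.data.frame(cols)
}

sex <- c("male", "female", "female", "male")
encoded <- one_hot(sex, categories = c("male", "female"), prefix = "sex")
print(encoded)
```

Each row has exactly one 1 across the category columns when its value is among the listed categories.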

Usage

  td_one_hot_encoding_fit_sqle (
      data = NULL,
      category.data = NULL,
      target.column = NULL,
      attribute.column = NULL,
      value.column = NULL,
      is.input.dense = NULL,
      approach = "LIST",
      categorical.values = NULL,
      target.column.names = NULL,
      categories.column = NULL,
      other.column = "other",
      category.counts = NULL,
      target.attributes = NULL,
      other.attributes = NULL,
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

category.data

Optional Argument.
Specifies the data containing the input categories for the 'LIST' approach.
Types: tbl_teradata

target.column

Required when "is.input.dense" is set to TRUE, disallowed otherwise.
Specifies the name(s) of the column(s) in "data" to be encoded.
Note:

  • The maximum number of unique columns in the "target.column" argument is 2018.

Types: character OR vector of Strings (character)

attribute.column

Required when "is.input.dense" is set to FALSE, disallowed otherwise.
Specifies the name of the column in "data" which contains attribute names.
Types: character

value.column

Required when "is.input.dense" is set to FALSE, disallowed otherwise.
Specifies the name of the column in "data" which contains attribute values.
Types: character

is.input.dense

Required Argument.
Specifies whether input is in dense format or sparse format.
Note:
"category.data" is meant for dense input format and not for sparse format.
Types: logical
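
The difference between the two input formats can be sketched in base R. This is an illustration only: the data values and the column names "passenger", "attribute" and "value" are assumptions chosen for the sketch. In the sparse form, the columns correspond to what "attribute.column" and "value.column" would point at.

```r
# Dense format: one row per observation, one column per attribute.
dense <- data.frame(
  passenger = c(1L, 2L),
  sex       = c("male", "female"),
  embarked  = c("S", "C"),
  stringsAsFactors = FALSE
)

# Sparse format: one row per (observation, attribute) pair.
attrs <- c("sex", "embarked")
sparse <- data.frame(
  passenger = rep(dense$passenger, each = length(attrs)),
  attribute = rep(attrs, times = nrow(dense)),
  # Transpose so values are read row by row (all attributes of passenger 1 first).
  value     = as.vector(t(as.matrix(dense[attrs]))),
  stringsAsFactors = FALSE
)
print(sparse)
```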

approach

Optional Argument.
Specifies whether to determine categories automatically from the input data ('AUTO' approach) or from a user-provided list ('LIST' approach).
Default Value: "LIST"
Permitted Values: "AUTO", "LIST"
Types: character

categorical.values

Required when "approach" is set to 'LIST' and a single value is present in "target.column", optional otherwise.
Specifies the list of categories that need to be encoded in the desired order.
When only one target column is provided, category values are read from this argument. Otherwise, they are read from "category.data".
Notes:

  • The number of characters in "target.column.names" plus the number of characters in the category specified in the "categorical.values" argument must be less than 128 characters.

  • The maximum number of categories in the "categorical.values" argument is 2018.

Types: character OR vector of Strings (character)
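
The two limits in the notes above can be checked on the client before calling the function. The helper below is a hypothetical base-R sketch, not part of tdplyr. It assumes the 128-character rule applies to the target column name combined with each category value individually.

```r
# Hypothetical pre-flight check for the documented limits; not part of tdplyr.
check_categorical_values <- function(target.column, categorical.values) {
  stopifnot(is.character(target.column), length(target.column) == 1L,
            is.character(categorical.values))
  # At most 2018 categories are allowed.
  too_many <- length(categorical.values) > 2018L
  # Column name plus category must be shorter than 128 characters (assumed per category).
  combined <- nchar(target.column) + nchar(categorical.values)
  too_long <- categorical.values[combined >= 128L]
  list(ok = !too_many && length(too_long) == 0L,
       too_many = too_many,
       too_long = too_long)
}

res <- check_categorical_values("sex", c("male", "female"))
print(res$ok)
```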

target.column.names

Required when "category.data" is used, optional otherwise.
Specifies the "category.data" column which contains the names of the target columns.
Types: character

categories.column

Required when "category.data" is used, optional otherwise.
Specifies the "category.data" column which contains the category values.
Types: character

other.column

Optional when "is.input.dense" is set to TRUE, disallowed otherwise.
Specifies the name of the column that holds the one-hot encoding for values other than those specified in the "categorical.values" argument, listed in "category.data", or found through the 'AUTO' approach.
Default Value: 'other'
Types: character
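
The role of the "other" column can be illustrated in base R. This sketch does not call tdplyr. The sample values and the "embarked_<category>" column names are assumptions made for illustration.

```r
# Values to encode and the categories that get their own column.
embarked   <- c("S", "C", "Q", "S")
categories <- c("S", "C")

encoded <- data.frame(
  embarked_S = as.integer(embarked == "S"),
  embarked_C = as.integer(embarked == "C"),
  # 1 when the value is not one of the listed categories, 0 otherwise.
  other      = as.integer(!(embarked %in% categories))
)
print(encoded)
```

Here the value "Q" is not in the category list, so only its "other" indicator is set.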

category.counts

Required when "category.data" is used or "approach" is set to 'AUTO', optional otherwise.
Specifies the category count for each column in "target.column".
The number of values in "category.counts" must equal the number of columns in "target.column".
Types: character OR vector of Strings (character)

target.attributes

Required when "is.input.dense" is set to FALSE, disallowed otherwise.
Specifies one or more attributes to encode in one-hot form. Every target attribute must be in "attribute.column".
Types: character OR vector of Strings (character)

other.attributes

Optional when "is.input.dense" is set to FALSE, disallowed otherwise.
For each target attribute, specifies a category name for attributes that "target.attributes" does not specify. The nth value in "other.attributes" corresponds to the nth value in "target.attributes".
Notes:

  • The number of characters in values specified in the "target.attributes" argument plus the number of characters in values specified in the "other.attributes" argument must be less than 128 characters.

  • The number of values passed to the "target.attributes" argument and "other.attributes" argument must be equal.

Types: character OR vector of Strings (character)
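
Because the two arguments are matched by position, it can help to pair them explicitly before the call. The base-R sketch below uses hypothetical attribute names. It assumes the 128-character rule applies to each positional pair.

```r
# Hypothetical values; the nth element of one vector pairs with the nth of the other.
target.attributes <- c("sex", "embarked")
other.attributes  <- c("other_sex", "other_embarked")

# Both vectors must have the same number of values.
stopifnot(length(target.attributes) == length(other.attributes))
# Combined length of each pair must be under 128 characters (assumed per pair).
stopifnot(all(nchar(target.attributes) + nchar(other.attributes) < 128))

# Named lookup makes the positional pairing visible.
other_for <- setNames(other.attributes, target.attributes)
print(other_for)
```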

...

Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character OR vector of Strings (character)

  • "<input.data.arg.name>.hash.column" accepts character OR vector of Strings (character)

  • "<input.data.arg.name>.order.column" accepts character OR vector of Strings (character)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
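
The name templates above expand mechanically for each tbl_teradata argument. The base-R helper below is hypothetical and only builds the argument names. Whether a given name is accepted still depends on the underlying SQL Engine function.

```r
# Hypothetical helper: expands the generic argument name templates for one input argument.
generic_arg_names <- function(input.arg) {
  c(partition   = paste0(input.arg, ".partition.column"),
    hash        = paste0(input.arg, ".hash.column"),
    order       = paste0(input.arg, ".order.column"),
    local.order = paste0("local.order.", input.arg))
}

data_args <- generic_arg_names("data")
category_args <- generic_arg_names("category.data")
print(data_args)
print(category_args)
```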

Value

The function returns an object of class "td_one_hot_encoding_fit_sqle", which is a named list containing an object of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using the name: result

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("tdplyr_example", "titanic", "cat_table")
    
    # Create tbl_teradata object.
    titanic_data <- tbl(con, "titanic")
    cat_data <- tbl(con, "cat_table")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1: Generate fit object to encode 'male' and 'female' values of column 'sex'.
    fit_obj1 <- td_one_hot_encoding_fit_sqle(data=titanic_data,
                                             is.input.dense=TRUE,
                                             target.column="sex",
                                             categorical.values=c("male", "female"),
                                             other.column="other")
    
    # Print the result.
    print(fit_obj1$result)
    
    # Example 2: Generate fit object to encode column 'sex' and 'embarked' in dataset.
    fit_obj2 <- td_one_hot_encoding_fit_sqle(data=titanic_data,
                                             is.input.dense=TRUE,
                                             approach="auto",
                                             target.column=c("sex", "embarked"),
                                             category.counts=c(2, 3),
                                             other.column="other")
    # Print the result.
    print(fit_obj2$result)
    
    # Example 3: Generate fit object when "category.data" is used.
    fit_obj3 <- td_one_hot_encoding_fit_sqle(data=titanic_data,
                                             category.data=cat_data,
                                             target.column.names="column_name",
                                             categories.column="category",
                                             is.input.dense=TRUE,
                                             target.column=c("sex", "embarked", "name"),
                                             category.counts=c(2, 4, 6),
                                             other.column="other")
    # Print the result.
    print(fit_obj3$result)
    
    # Example 4: Generate fit object when "approach" is set to 'LIST'.
    fit_obj4 <- td_one_hot_encoding_fit_sqle(data=titanic_data,
                                             is.input.dense=TRUE,
                                             approach="list",
                                             categorical.values=c('male','female'),
                                             target.column=c("sex"),
                                             other.column="other")
    # Print the result.
    print(fit_obj4$result)