
Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
ft:locale: en-US
ft:lastEdition: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

TargetEncodingTransform

Description

The td_target_encoding_transform_sqle() function takes the input data and the fit data generated by the td_target_encoding_fit_sqle() function, and encodes the categorical values.

Notes:

  • This function requires the UTF8 client character set.

  • This function does not support Pass-Through Characters (PTCs).

  • This function does not support KanjiSJIS or Graphic data types.

Usage considerations for td_target_encoding_transform_sqle():

  • Errors are generated in these cases:

      ◦ When the td_fit_sqle data does not meet the criteria.

      ◦ When a category from the input data is not found in the td_fit_sqle data and the "default.values" argument was not specified in the td_target_encoding_fit_sqle() function.
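A caller can guard against the error cases above with base R's tryCatch(). The following is a minimal sketch, not a documented tdplyr pattern: safe_target_encode is a hypothetical helper name, and the sketch assumes that tdplyr signals these failures as ordinary R errors.

```r
library(tdplyr)

# Hypothetical helper (not part of tdplyr): returns the encoded tbl_teradata,
# or NULL if the transform raises an error.
safe_target_encode <- function(data, fit, accumulate = NULL) {
  tryCatch(
    td_target_encoding_transform_sqle(data = data,
                                      object = fit,
                                      accumulate = accumulate)$result,
    error = function(e) {
      # Assumption: failures surface as standard R error conditions.
      message("Target encoding transform failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage, with objects created as in the Examples section:
# encoded <- safe_target_encode(data_input, TargetEncodingFit_out, "passenger")
```

Returning NULL on failure lets the calling code decide whether to refit with "default.values" or to stop.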

Usage

  td_target_encoding_transform_sqle (
      data = NULL,
      object = NULL,
      accumulate = NULL,
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

object

Required Argument.
Specifies the tbl_teradata containing the fit parameters generated by the td_target_encoding_fit_sqle() function, or an instance of td_target_encoding_fit_sqle.
Types: tbl_teradata or td_target_encoding_fit_sqle

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to be copied to the output.
Notes:

  • The maximum supported length of a column name is 128 characters.

  • The maximum list length is 2047.

  • Column names specified in "accumulate" are not case sensitive.

Types: character OR vector of Strings (character)
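To copy more than one column to the output, pass a character vector. This is a short sketch under two assumptions: data_input and TargetEncodingFit_out were created as in the Examples section, and the titanic example table has a 'pclass' column (only 'passenger' is used on this page).

```r
library(tdplyr)

# Assumes data_input and TargetEncodingFit_out exist (see the Examples section).
# 'pclass' is an assumed column name in the titanic example table.
out <- td_target_encoding_transform_sqle(data = data_input,
                                         object = TargetEncodingFit_out,
                                         accumulate = c("passenger", "pclass"))

# Print the encoded columns together with the accumulated columns.
print(out$result)
```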

...

Specifies the generic keyword arguments SQLE functions accept. Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not. When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
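The generic keyword arguments are passed alongside the function's own arguments. The sketch below illustrates "persist" and "volatile" only, assuming data_input and TargetEncodingFit_out were created as in the Examples section. The partition, hash, and order arguments are left as a comment because this page does not state whether the underlying SQL Engine function supports them.

```r
library(tdplyr)

# Persist the results in a table, so they are not garbage collected
# at the end of the session.
persisted_out <- td_target_encoding_transform_sqle(data = data_input,
                                                   object = TargetEncodingFit_out,
                                                   accumulate = "passenger",
                                                   persist = TRUE)

# Alternatively, store the results in a volatile table.
volatile_out <- td_target_encoding_transform_sqle(data = data_input,
                                                  object = TargetEncodingFit_out,
                                                  accumulate = "passenger",
                                                  volatile = TRUE)

# Following the naming pattern above, a partition column for the "data"
# argument would be spelled data.partition.column. It is not called here,
# since support by the underlying function is not confirmed on this page.
```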

Value

The function returns an object of class "td_target_encoding_transform_sqle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result
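As a brief illustration of working with the return value, the sketch below extracts the "result" member and pulls it into a local data frame. It assumes data_input and TargetEncodingFit_out were created as in the Examples section, and that a tbl_teradata can be converted with as.data.frame().

```r
library(tdplyr)

out <- td_target_encoding_transform_sqle(data = data_input,
                                         object = TargetEncodingFit_out,
                                         accumulate = "passenger")

# List the members of the returned named list.
names(out)

# Reference the tbl_teradata member with the "$" operator.
encoded_tbl <- out$result

# Assumption: as.data.frame() retrieves the remote rows into local memory.
encoded_df <- as.data.frame(encoded_tbl)
head(encoded_df)
```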

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("tdplyr_example", "titanic")
    
    # Create tbl_teradata object.
    data_input <- tbl(con, "titanic")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Find the distinct values and counts for the columns 'sex' and 'embarked'.
    categorical_summ <- td_categorical_summary_sqle(data=data_input,
                                                    target.columns = c("sex", "embarked"))
    
    # Get the distinct counts for 'sex' and 'embarked'. Only two columns should be present,
    # named 'ColumnName' and 'CategoryCount'.
    category_data <- categorical_summ$result
    
    # Generate the required hyperparameters when "encoder.method" is 'CBM_BETA'.
    TargetEncodingFit_out <- td_target_encoding_fit_sqle(data=data_input,
                                                         category.data=category_data,
                                                         encoder.method='CBM_BETA',
                                                         target.columns=c('sex', 'embarked'),
                                                         response.column='survived',
                                                         default.values=c(-1, -2))
    
    # Example 1: Encode the columns 'sex' and 'embarked'.
    TargetEncodingTransform_out <- td_target_encoding_transform_sqle(data=data_input,
                                                                     object=TargetEncodingFit_out,
                                                                     accumulate="passenger")
    
    # Print the result.
    print(TargetEncodingTransform_out$result)
    
    # Alternatively, use the S3 transform() function to run the transform on the output of
    # the td_target_encoding_fit_sqle() function.
    TargetEncodingTransform_out <- transform(TargetEncodingFit_out,
                                             data=data_input,
                                             accumulate="passenger")
    
    # Print the result.
    print(TargetEncodingTransform_out$result)