Teradata Package for R Function Reference | 17.20 - ColumnTransformer - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

Teradata® Package for R Function Reference

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
ft:locale
en-US
ft:lastEdition
2024-05-03
dita:id
TeradataR_FxRef_Enterprise_1720
lifecycle
latest
Product Category
Teradata Vantage

ColumnTransformer

Description

The td_column_transformer_sqle() function transforms the input data columns in a single operation. Provide only the FIT tbl_teradata objects generated by the td_fit_sqle analytic functions, and the function runs all transformations that user require in a single operation.
The function performs the following transformations:

  • Scale Transform

  • Bincode Transform

  • Function Transform

  • NonLinearCombine Transform

  • OutlierFilter Transform

  • PolynomialFeatures Transform

  • RowNormalize Transform

  • OrdinalEncoding Transform

  • OneHotEncoding Transform

  • SimpleImpute Transform

User must create the FIT tbl_teradata before using the function and must be provided in the same order as in the training data sequence to transform the dataset. The FIT tbl_teradata can have maximum of 128 columns.

Usage

  td_column_transformer_sqle (
      input.data = NULL,
      bincode.fit.data = NULL,
      function.fit.data = NULL,
      nonlinearcombine.fit.data = NULL,
      onehotencoding.fit.data = NULL,
      ordinalencoding.fit.data = NULL,
      outlierfilter.fit.data = NULL,
      polynomialfeatures.fit.data = NULL,
      rownormalize.fit.data = NULL,
      scale.fit.data = NULL,
      simpleimpute.fit.data = NULL,
      fillrowid.column.name = NULL,
      ...
  )

Arguments

input.data

Required Argument.
Specifies the tbl_teradata that contains input data.
Types: tbl_teradata

bincode.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_bincode_fit_sqle() function.
Types: tbl_teradata

function.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_fit_sqle() function.
Types: tbl_teradata

nonlinearcombine.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_non_linear_combine_fit_sqle() function.
One can pass multiple td_non_linear_combine_fit_sqle() function's output using argument names as nonlinearcombine_fit_data1, nonlinearcombine_fit_data2 and so on.
Notes:

  • Function considers the arguments which are in sequence and ignores the arguments

which are out of sequence.
Example - If arguments are nonlinearcombine.fit.data, nonlinearcombine_fit_data1, nonlinearcombine_fit_data2, nonlinearcombine_fit_data4, the function considers only nonlinearcombine.fit.data, nonlinearcombine_fit_data1 and nonlinearcombine_fit_data2 for processing and ignores nonlinearcombine_fit_data4 since it is out of sequence.

  • Argument sequence starts from "nonlinearcombine.fit.data", followed by

"nonlinearcombine_fit_data1", "nonlinearcombine_fit_data2", so on..
Types: tbl_teradata

onehotencoding.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_one_hot_encoding_fit_sqle() function.
Types: tbl_teradata

ordinalencoding.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_ordinal_encoding_fit_sqle() function.
Types: tbl_teradata

outlierfilter.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_outlier_filter_fit_sqle() function.
Types: tbl_teradata

polynomialfeatures.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_polynomial_features_fit_sqle() function.
Types: tbl_teradata

rownormalize.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_row_normalize_fit_sqle() function.
Types: tbl_teradata

scale.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_scale_fit_sqle() function.
Types: tbl_teradata

simpleimpute.fit.data

Optional Argument.
Specifies the tbl_teradata generated by the td_simple_impute_fit_sqle() function.
Types: tbl_teradata

fillrowid.column.name

Optional Argument.
Specifies the name for the output column in which unique identifiers for each row are
populated.
Types: character

...

Specifies the generic keyword arguments SQLE functions accept. Below
are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the
function in a table or not. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the
function in a volatile table or not. When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

Function allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr if the underlying SQLE Engine function supports, else an exception is raised.

Value

Function returns an object of class "td_column_transformer_sqle" which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):result

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("tdplyr_example", "titanic")
    
    # Create tbl_teradata object.
    titanic <- tbl(con, "titanic")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1: Perform Bincode transformation and OneHotEncoding 
    #            transformation using td_bincode_fit_sqle() and
    #            td_column_transformer_sqle() function.
    bin_code <- td_bincode_fit_sqle(data=titanic,
                          target.columns='age',
                          method.type='Equal-Width',
                          nbins=2,
                          label.prefix='label_prefix')
    
    one_hot_encoding <- td_one_hot_encoding_fit_sqle(
                         data=titanic,
                         is.input.dense=TRUE,
                         target.column="sex",
                         categorical.values=c("male", "female"),
                         other.column="other")
    
    ColumnTransformer_out <- td_column_transformer_sqle(
                              fillrowid.column.name="output_value",
                              input.data=titanic,
                              bincode.fit.data=bin_code$output,
                              onehotencoding.fit.data=one_hot_encoding$result)
    
    # Print the result.
    print(ColumnTransformer_out$result)
    
    # Example 2: Perform multiple NonLinearCombineFit transformation 
    #            using td_bincode_fit_sqle() function and 
    #            td_column_transformer_sqle() function.
    # Create multiple td_non_linear_combine_fit_sqle objects and 
    # pass those to td_column_transformer_sqle.
    NLC_o1 <- td_non_linear_combine_fit_sqle(
               data = titanic,
               target.columns = c("sibsp", "parch"),
               formula = "Y=(X0+X1+1)",
               result.column = "total_cost")
    
    NLC_o2 <- td_non_linear_combine_fit_sqle(
               data = titanic,
               target.columns = c( "fare"),
               formula = "Y=(X0+1)",
               result.column = "total_fare_cost2")
    
    NLC_o3 <- td_non_linear_combine_fit_sqle(
               data = titanic,
               target.columns = c( "fare"),
               formula = "Y=(X0+3)",
               result.column = "total_fare_cost3")
    
    ColumnTransformer_out2 <- 
      td_column_transformer_sqle(
       input.data=titanic,
       nonlinearcombine.fit.data=NLC_o1$result,
       nonlinearcombine.fit.data.order.column=c('parch'),
       nonlinearcombine.fit.data1=NLC_o2$result,
       nonlinearcombine.fit.data1.order.column=c('fare'),
       nonlinearcombine.fit.data2=NLC_o3$result)
    
    # Print the result.
    print(ColumnTransformer_out2$result)
    
    # Example 3: Perform Bincode Fit transformation along with Oridnal Encoding 
    #            Fit and One Hot Encoding Fit transformation using 
    #            td_bincode_fit_sqle() and td_column_transformer_sqle() function.
    # Create td_bincode_fit_sqle object along with td_ordinal_encoding_fit_sqle
    # and td_one_hot_encoding_fit_sqle objects, and pass 
    # those to td_column_transformer_sqle.
    bin_code <- td_bincode_fit_sqle(
                          data=titanic,
                          target.columns='age',
                          method.type='Equal-Width',
                          nbins=2,
                          label.prefix='label_prefix')
    
    one_hot_encoding <- td_one_hot_encoding_fit_sqle(
                          data=titanic,
                          is.input.dense=TRUE,
                          target.column="embarked",
                          categorical.values=c("c", "s"),
                          other.column="other")
    
    ord_enc_fit <- td_ordinal_encoding_fit_sqle(target.column='sex',
                                                data=titanic)
    
    ColumnTransformer_out3 <- td_column_transformer_sqle(
                               fillrowid.column.name="output_value",
                               input.data=titanic,
                               bincode.fit.data=bin_code$output,
                               onehotencoding.fit.data=one_hot_encoding$result,
                               ordinalencoding.fit.data=ord_enc_fit$result)
    
    # Print the result.
    print(ColumnTransformer_out3$result)