Teradata Package for R Function Reference | 17.20 - Unpack - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

Teradata® Package for R Function Reference

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
ft:locale
en-US
ft:lastEdition
2024-05-03
dita:id
TeradataR_FxRef_Enterprise_1720
Product Category
Teradata Vantage

Unpack

Description

The td_unpack_sqle() function unpacks data from a single packed column into multiple columns. The packed column is composed of multiple virtual columns, which become the output columns. To determine the virtual columns, the function must have either the delimiter that separates them in the packed column or their lengths.

Usage

  td_unpack_sqle (
      data = NULL,
      input.column = NULL,
      output.columns = NULL,
      output.datatypes = NULL,
      delimiter = ",",
      column.length = NULL,
      regex = "(.*)",
      regex.set = 1,
      exception = FALSE,
      accumulate = NULL,
      ...
  )

Arguments

data

Required Argument.
Specifies the tbl_teradata containing the input attributes.
Types: tbl_teradata

input.column

Required Argument.
Specifies the name of the input column that contains the packed data.
Types: character

output.columns

Required Argument.
Specifies the names to give to the output columns, in the order in
which the corresponding virtual columns appear in "input.column". If you specify fewer output column names than there are in virtual input columns, the function ignores the extra virtual input columns. That is, if the packed data contains x+y virtual columns and the "output.columns" argument specifies x output column names, the function assigns the names to the first x virtual columns and ignores the remaining y virtual columns.
Types: character OR vector of Strings (character)

output.datatypes

Required Argument.
Specifies the datatypes of the unpacked output columns.Supported output.datatypes are VARCHAR, integer, float, TIME, DATE, and TIMESTAMP. If "output.datatypes" specifies only one value and "output.columns" specifies multiple columns, then the specified value applies to every output_column. If "output.datatypes" specifies multiple values, then it must specify a value for each output_column.
The nth datatype corresponds to the nth output_column.The function can output only 16 VARCHAR columns.
Types: character OR vector of Strings (character)

delimiter

Optional Argument.
Specifies the delimiter (a string) that separates the virtual columns in the packed data. If the virtual columns are separated by a delimiter, then specify the delimiter with this argument; otherwise, specify the "column.length" argument. Do not specify both - this argument and the "column.length" argument.
Default Value: ","
Types: character

column.length

Optional Argument.
Specifies the lengths of the virtual columns; therefore, to use this argument, you must know the length of each virtual column. If "column.length" specifies only one value and "output.columns" specifies multiple columns, then the specified value applies to every output_column. If "column.length" specifies multiple values, then it must specify a value for each output_column. The nth datatype corresponds to the nth output_column. However, the last column_name can be an asterisk (*), which represents a single virtual column that contains the remaining data. For example, if the first three virtual columns have the lengths 2, 1, and 3, and all remaining data belongs to the fourth virtual column, you can specify "column.length" as ("2", "1", "3", *). If you specify this argument, you must omit the delimiter argument.
Types: character OR vector of Strings (character)

regex

Optional Argument.
Specifies a regular expression that describes a row of packed data, enabling the function to find the data values. A row of packed data contains one data value for each virtual column, but the row might also contain other information (such as the virtual column name). In the "regex", each data value is enclosed in parentheses.
For example, suppose that the packed data has two virtual columns, age and sex, and that one row of packed data is: age:34,sex:male The "regex" that describes the row is ".*:(.*)". The ".*:" matches the virtual column names, age and sex, and the "(.*)" matches the values, 34 and male. The default "regex" is "(.*)" which matches the whole string (between delimiters, if any). When applied to the preceding sample row, the default "regex" causes the function to return "age:34" and "sex:male" as data values. To represent multiple data groups in "regex", use multiple pairs of parentheses. By default, the last data group in "regex" represents the data value (other data groups are assumed to be virtual column names or unwanted data). If a different data group represents the data value, specify its group number with the "regex.set" argument.
Default Value: "(.*)"
Types: character

regex.set

Optional Argument.
Specifies the ordinal number of the data group in "regex" that represents the data value in a virtual column. By default, the last data group in "regex" represents the data value.
For example, suppose that "regex" is: "([a-zA-Z]*):(.*)". If group number is "1", then "([a-zA-Z]*)" represents the data value. If group number is "2", then "(.*)" represents the data value.
Default Value: 1
Types: integer

exception

Optional Argument.
Specifies whether the function ignores rows that contain invalid data;
By default the function to fails if it encounters a row with invalid data.
Default Value: FALSE
Types: logical

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to copy to the output. By default, the function copies no input columns to the output.
Types: character OR vector of Strings (character)

...

Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

Function allows the user to partition, hash, order or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character OR vector of Strings (character) (Strings)

  • "<input.data.arg.name>.hash.column" accepts character OR vector of Strings (character) (Strings)

  • "<input.data.arg.name>.order.column" accepts character OR vector of Strings (character) (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr if the underlying SQLE function supports it, else an exception is raised.

Value

Function returns an object of class "td_unpack_sqle" which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):result

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("unpack_example","ville_tempdata","ville_tempdata1")
    
    # Create tbl_teradata object.
    ville_tempdata1 <- tbl(con, "ville_tempdata1")
    ville_tempdata <- tbl(con, "ville_tempdata")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1: Delimiter separates Virtual Columns.
    #           The input table, ville_tempdata, is a collection of temperature readings
    #           for two cities, Nashville and Knoxville, in the state of Tennessee.
    #           In the column of packed data, the delimiter comma (,) separates the virtual
    #           columns.
    unpack_out1 <- td_unpack_sqle(data=ville_tempdata,
                                  input.column='packed_temp_data',
                                  output.columns=c('city','state','temp_f'),
                                  output.datatypes=c('varchar','varchar','real'),
                                  delimiter=',',
                                  regex='(.*)',
                                  regex.set=1,
                                  exception=TRUE)

    # Print the results.
    print(unpack_out1$result)

    # Example 2: No Delimiter separates Virtual Columns.
    #            The input, ville_tempdata1, contains same data as the previous example,
    #            except that no delimiter separates the virtual columns in the packed data.
    #            To enable the function to determine the virtual columns, the function call
    #            specifies the column lengths.
    unpack_out2 <- td_unpack_sqle(data=ville_tempdata1,
                                  input.column='packed_temp_data',
                                  output.columns=c('city','state','temp_f'),
                                  output.datatypes=c('varchar','varchar','real'),
                                  column.length=c('9','9','4'),
                                  regex='(.*)',
                                  regex.set=1,
                                  exception=TRUE)
    
    # Print the results.
    print(unpack_out2$result)