Teradata Package for R Function Reference | 17.00 - Unpack - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Published
July 2021
Language
English (United States)
Last Update
2023-08-08
dita:id
B700-4007
NMT
no
Product Category
Teradata Vantage
Unpack

Description

The Unpack function unpacks data from a single packed column into multiple columns. The packed column is composed of multiple virtual columns, which become the output columns. To determine the virtual columns, the function must have either the delimiter that separates them in the packed column or their lengths.

Usage

  td_unpack_mle (
    data = NULL,
    input.column = NULL,
    output.columns = NULL,
    output.datatypes = NULL,
    delimiter = ",",
    column.length = NULL,
    regex = "(.*)",
    regex.set = 1,
    exception = FALSE,
    data.sequence.column = NULL,
    data.order.column = NULL
  )

Arguments

data

Required Argument.
Specifies the tbl_teradata containing the input attributes.

data.order.column

Optional Argument.
Specifies Order By columns for "data".
Values to this argument can be provided as a vector, if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

input.column

Required Argument.
Specifies the name of the input column that contains the packed data.
Types: character

output.columns

Required Argument.
Specifies the names to give to the output columns, in the order in which the corresponding virtual columns appear in "input.column".
Types: character OR vector of characters

output.datatypes

Required Argument.
Specifies the datatypes of the unpacked output columns. If "output.datatypes" specifies only one value and "output.columns" specifies multiple columns, then the specified value applies to every output column. If "output.datatypes" specifies multiple values, then it must specify a value for each "output.column". The nth datatype corresponds to the nth output column.
Types: character OR vector of characters

delimiter

Optional Argument.
Specifies the delimiter (a string) that separates the virtual columns in the packed data. If "delimiter" contains a character that is a symbol in a regular expression, such as, an asterisk (*) or pipe character (|), precede it with two escape characters. For example, if the delimiter is the pipe character, specify "\|". The default "delimiter" is comma (,). If the virtual columns are separated by a delimiter, then specify the delimiter with this argument; otherwise, specify the "column.length" argument. Do not specify this and the "column.length" argument together.
Default Value: "," Types: character

column.length

Optional Argument.
Specifies the lengths of the virtual columns. To use this argument, you must know the length of each virtual column. If "column.length" specifies only one value and "output.columns" specifies multiple columns, then the specified value applies to every output column. If "column.length" specifies multiple values, then it must specify a value for each "output.column". The nth datatype corresponds to the nth output column. However, the last column name can be an asterisk (*), which represents a single virtual column that contains the remaining data. For example, if the first three virtual columns have the lengths 2, 1, and 3, and all remaining data belongs to the fourth virtual column, you can specify "column.length" as c("2", "1", "3", *). If you specify this argument, you must omit the "delimiter" argument.
Types: character OR vector of characters

regex

Optional Argument.
Specifies a regular expression that describes a row of packed data, enabling the function to find the data values. A row of packed data contains one data value for each virtual column, but the row might also contain other information (such as the virtual column name). In the regular_expression, each data value is enclosed in parentheses. For example, suppose that the packed data has two virtual columns, age and sex, and that one row of packed data is: age:34,sex:male The regular_expression that describes the row is ".*:(.*)". The ".*:" matches the virtual column names, age and sex, and the "(.*)" matches the values, 34 and male. Default regular expression matches the whole string (between delimiters, if any). When applied to the preceding sample row, the default regular expression causes the function to return "age:34" and "sex:male" as data values. To represent multiple data groups in regular_expression, use multiple pairs of parentheses. By default, the last data group in regular expression represents the data value (other data groups are assumed to be virtual column names or unwanted data). If a different data group represents the data value, specify its group number with the regex.set argument.
Default Value: "(.*)"
Types: character

regex.set

Optional Argument.
Specifies the ordinal number of the data group in regular expression that represents the data value in a virtual column. By default, the last data group in regular expression represents the data value. For example, suppose that regular expression is: "([a-zA-Z]*):(.*)" If group number is "1", then "([a-zA-Z]*)" represents the data value. If group number is "2", then "(.*)" represents the data value.
Default Value: 1
Types: integer

exception

Optional Argument.
Specifies whether the function ignores rows that contain invalid data, that is, continues without outputting them, which causes the function to fail if it encounters a row with invalid data.
Default Value: FALSE
Types: logical

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

Function returns an object of class "td_unpack_mle" which is a named list containing object of class "tbl_teradata".
Named list member can be referenced directly with the "$" operator using name: result.

Examples

  
    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("unpack_example", "ville_tempdata", "ville_tempdata1")

    # Create object(s) of class "tbl_teradata".
    ville_tempdata <- tbl(con, "ville_tempdata")
    ville_tempdata1 <- tbl(con, "ville_tempdata1")

    # Example 1 -
    td_unpack_out1 <- td_unpack_mle(data = ville_tempdata,
                                    input.column = "packed_temp_data",
                                    output.columns = c("city","state","temp_F"),
                                    output.datatypes = c("varchar","varchar","real"),
                                    delimiter = ",",
                                    regex = '(.*)',
                                    regex.set = 1,
                                    exception = TRUE
                                    )

    # Example 2 -
    td_unpack_out2 <- td_unpack_mle(data = ville_tempdata1,
                                    input.column = "packed_temp_data",
                                    output.columns = c("city","state","temp_F"),
                                    output.datatypes = c("varchar","varchar","real"),
                                    column.length = c("9","9","4"),
                                    regex = '(.*)',
                                    regex.set = 1,
                                    exception = TRUE
                                    )