Description
The Unpack function unpacks data from a single packed column into
multiple columns. The packed column is composed of multiple virtual
columns, which become the output columns. To determine the virtual
columns, the function must have either the delimiter that separates
them in the packed column or their lengths.
Usage
td_unpack_mle (
data = NULL,
input.column = NULL,
output.columns = NULL,
output.datatypes = NULL,
delimiter = ",",
column.length = NULL,
regex = "(.*)",
regex.set = 1,
exception = FALSE
)
Arguments
data |
Required Argument.
Specifies the tbl_teradata containing the input attributes.
|
input.column |
Required Argument.
Specifies the name of the input column that contains the packed data.
|
output.columns |
Required Argument.
Specifies the names to give to the output columns, in the order in
which the corresponding virtual columns appear in input_column.
|
output.datatypes |
Required Argument.
Specifies the datatypes of the unpacked output columns. If
output.datatypes specifies only one value and output.columns
specifies multiple columns, then the specified value applies to every
output_column. If output.datatypes specifies multiple values, then it
must specify a value for each output_column. The nth datatype
corresponds to the nth output_column.
|
delimiter |
Optional Argument.
Specifies the delimiter (a string) that separates the virtual
columns in the packed data. If the delimiter contains a character that is
a symbol in a regular expression, such as, an asterisk (*) or pipe
character (|) then precede it with two escape characters. For example, if
the delimiter is the pipe character, specify "\|". The default
delimiter is comma (,). If the virtual columns are separated by a
delimiter, then specify the delimiter with this argument; otherwise,
specify the column.length argument. Do not specify both of these arguments
and the column.length argument.
Default Value: ","
|
column.length |
Optional Argument.
Specifies the lengths of the virtual columns. To use this
argument, you must know the length of each virtual column. If
column.length specifies only one value and output.columns specifies
multiple columns, then the specified value applies to every
output_column. If column.length specifies multiple values, then it
must specify a value for each output_column. The nth datatype
corresponds to the nth output_column. However, the last column_name
can be an asterisk (*), which represents a single virtual column that
contains the remaining data. For example, if the first three virtual
columns have the lengths 2, 1, and 3, and all remaining data belongs
to the fourth virtual column, you can specify column.length ("2",
"1", "3", *). If you specify this argument, you must omit the
delimiter argument.
|
regex |
Optional Argument.
Specifies a regular expression that describes a row of packed data,
enabling the function to find the data values. A row of packed data
contains one data value for each virtual column, but the row might
also contain other information (such as the virtual column name). In
the regular_expression, each data value is enclosed in parentheses.
For example, suppose that the packed data has two virtual columns,
age and sex, and that one row of packed data is: age:34,sex:male The
regular_expression that describes the row is ".*:(.*)". The ".*:"
matches the virtual column names, age and sex, and the "(.*)" matches
the values, 34 and male. The default regular_expression is "(.*)"
which matches the whole string (between delimiters, if any). When
applied to the preceding sample row, the default regular_expression
causes the function to return "age:34" and "sex:male" as data values.
To represent multiple data groups in regular_expression, use multiple
pairs of parentheses. By default, the last data group in
regular_expression represents the data value (other data groups are
assumed to be virtual column names or unwanted data). If a different
data group represents the data value, specify its group number with
the regex.set argument.
Default Value: "(.*)"
|
regex.set |
Optional Argument.
Specifies the ordinal number of the data group in regular_expression
that represents the data value in a virtual column. By default, the
last data group in regular_expression represents the data value. For
example, suppose that regular_expression is: "([a-zA-Z]*):(.*)" If
group_number is "1", then "([a-zA-Z]*)" represents the data value. If
group_number is "2", then "(.*)" represents the data value.
Default Value: 1
|
exception |
Optional Argument.
Specifies whether the function ignores rows that contain invalid
data, that is, continues without outputting them. which causes the
function to fail if it encounters a row with invalid data.
Default Value: FALSE
|
Value
Function returns a named list containing Teradata tbl object.
Named list member can be referenced directly with the "$" operator
using name: result
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("unpack_example", "ville_tempdata", "ville_tempdata1")
# Create remote tibble objects.
ville_tempdata <- tbl(con, "ville_tempdata")
ville_tempdata1 <- tbl(con, "ville_tempdata1")
# Example 1 -
td_unpack_out1 <- td_unpack_mle(data = ville_tempdata,
input.column = "packed_temp_data",
output.columns = c("city","state","temp_F"),
output.datatypes = c("varchar","varchar","real"),
delimiter = ",",
regex = '(.*)',
regex.set = 1,
exception = TRUE
)
# Example 2 -
td_unpack_out2 <- td_unpack_mle(data = ville_tempdata1,
input.column = "packed_temp_data",
output.columns = c("city","state","temp_F"),
output.datatypes = c("varchar","varchar","real"),
column.length = c("9","9","4"),
regex = '(.*)',
regex.set = 1,
exception = TRUE
)