Teradata R Package Function Reference | 17.00 - Burst - Teradata R Package

Teradata® R Package Function Reference

prodname: Teradata R Package
vrm_release: 17.00
created_date: September 2020
category: Programming Reference
featnum: B700-4007-090K

Description

The Burst function bursts (splits) a time interval into a series of shorter "burst" intervals and allocates the values from the original interval to the new, shorter subintervals. It is useful for reallocating values from overlapping time intervals into user-defined time intervals. For example, a cable company that has customer data recorded over overlapping time intervals can use it to analyze the data in uniform time intervals. The Burst function supports several allocation methods, selected with the "split.criteria" argument.
The burst intervals can have either the same length (specified by the "time.interval" argument), the same number of data points (specified by the "num.points" argument), or specific start and end times (specified by "time.data").
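As a rough mental model of bursting with equal-length intervals, the following base R sketch splits a single interval into sub-intervals and allocates its value in proportion to each sub-interval's duration. It is a conceptual illustration only: burst_interval() is a hypothetical helper that is not part of the Teradata R package, and the in-database function may align burst boundaries and allocate values differently.

```r
# Conceptual illustration only; not the Burst function's implementation.
# burst_interval() is a hypothetical helper, not part of the Teradata R package.
burst_interval <- function(start, end, value, interval) {
  stopifnot(end > start, interval > 0)
  # Burst boundaries fall every `interval` units from `start`;
  # the last burst is shorter if the interval does not divide evenly.
  starts <- seq(from = start, to = end, by = interval)
  starts <- starts[starts < end]
  ends <- pmin(starts + interval, end)
  data.frame(
    burst_start = starts,
    burst_end = ends,
    # Proportional allocation: each burst's share of the value
    # equals its share of the original duration.
    value = value * (ends - starts) / (end - start)
  )
}

# One interval of 2.5 days (216000 seconds) carrying a value of 100,
# burst into 1-day (86400-second) sub-intervals.
bursts <- burst_interval(start = 0, end = 216000, value = 100, interval = 86400)
print(bursts)
```

In this sketch the two full-day bursts each receive 40 and the final half-day burst receives 20, so the allocated values still sum to the original 100.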

Usage

  td_burst_mle (
      data = NULL,
      time.data = NULL,
      time.column = NULL,
      value.columns = NULL,
      time.interval = NULL,
      time.datatype = NULL,
      value.datatype = NULL,
      start.time = NULL,
      end.time = NULL,
      num.points = NULL,
      values.before.first = NULL,
      values.after.last = NULL,
      split.criteria = "nosplit",
      seed = NULL,
      accumulate = NULL,
      data.sequence.column = NULL,
      time.data.sequence.column = NULL,
      data.partition.column = NULL,
      time.data.partition.column = NULL,
      data.order.column = NULL,
      time.data.order.column = NULL
  )

Arguments

data

Required Argument.
Specifies the tbl_teradata containing the time series data.

data.partition.column

Required Argument.
Specifies Partition By columns for "data".
Values to this argument can be provided as a vector, if multiple columns are used for partition.
Types: character OR vector of Strings (character)

data.order.column

Optional Argument.
Specifies Order By columns for "data".
Values to this argument can be provided as a vector, if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

time.data

Optional Argument.
Specifies the tbl_teradata containing the time data.
Note: Specify exactly one of the following arguments:
"time.data", "time.interval", or "num.points".

time.data.partition.column

Optional Argument. Required if "time.data" is specified.
Specifies Partition By columns for "time.data".
Values to this argument can be provided as a vector, if multiple columns are used for partition.
Types: character OR vector of Strings (character)

time.data.order.column

Optional Argument.
Specifies Order By columns for "time.data".
Values to this argument can be provided as a vector, if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

time.column

Required Argument.
Specifies the names of the input tbl_teradata columns that contain the start and end times of the time interval to be burst. This argument is specified as a vector of the form c("<start_time_column>", "<end_time_column>").
Types: character OR vector of Strings (character)

value.columns

Required Argument.
Specifies the names of input tbl_teradata columns to copy to the output tbl_teradata.
Types: character OR vector of Strings (character)

time.interval

Optional Argument.
Specifies the length of each burst time interval.
Note: Specify exactly one of the following arguments:
"time.data", "time.interval", or "num.points".
Types: numeric

time.datatype

Optional Argument.
Specifies the data type of the output columns that correspond to the input tbl_teradata columns specified by the argument "time.column" (start_time_column and end_time_column).
If you omit this argument, the function infers the data type of start_time_column and end_time_column from the input tbl_teradata and uses the inferred data type for the corresponding output tbl_teradata columns.
If you specify this argument, the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list: INTEGER, BIGINT, SMALLINT, DOUBLE PRECISION, DECIMAL(n,n), DECIMAL, NUMERIC, NUMERIC(n,n).
Types: character

value.datatype

Optional Argument.
Specifies the data types of the output columns that correspond to the input tbl_teradata columns specified by the argument "value.columns".
If you omit this argument, the function infers the data type of each value column from the input tbl_teradata and uses the inferred data type for the corresponding output tbl_teradata column.
If you specify "value.datatype", it must have the same number of elements as "value.columns". That is, if "value.columns" specifies n columns, then "value.datatype" must specify n data types. For i in [1, n], value_column_i has value_type_i. However, value_type_i can be empty. For example: value.columns (c1, c2, c3), value.datatype (integer, ,VARCHAR).
If you specify this argument, the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list: INTEGER, BIGINT, SMALLINT, DOUBLE PRECISION, DECIMAL(n,n), DECIMAL, NUMERIC, NUMERIC(n,n).
Types: character OR vector of characters

start.time

Optional Argument.
Specifies the start time for the time interval to be burst. The default value is the start time of the start column in "time.column".
Types: character

end.time

Optional Argument.
Specifies the end time for the time interval to be burst. The default value is the end time of the end column in "time.column".
Types: character

num.points

Optional Argument.
Specifies the number of data points in each burst time interval.
Note: Specify exactly one of the following arguments:
"time.data", "time.interval", or "num.points".
Types: integer
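The "exactly one of" rule for "time.data", "time.interval", and "num.points" can be checked on the client before calling the function. The following is a minimal sketch in base R; check_burst_mode() is a hypothetical helper, not part of the Teradata R package, and the function may perform its own validation with different messages.

```r
# check_burst_mode() is a hypothetical helper, not part of the Teradata R package.
# It stops unless exactly one of the three mutually exclusive arguments is
# non-NULL, and otherwise returns the name of the argument that was supplied.
check_burst_mode <- function(time.data = NULL, time.interval = NULL,
                             num.points = NULL) {
  supplied <- c(
    time.data = !is.null(time.data),
    time.interval = !is.null(time.interval),
    num.points = !is.null(num.points)
  )
  if (sum(supplied) != 1) {
    stop("Specify exactly one of time.data, time.interval, or num.points; got ",
         sum(supplied), ".")
  }
  names(supplied)[supplied]
}

# A call that supplies only "time.interval" passes the check.
mode <- check_burst_mode(time.interval = 86400)
print(mode)
```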

values.before.first

Optional Argument.
Specifies the values to use if "start.time" is earlier than the value in start_time_column. Each of these values must have the same data type as its corresponding value column. Values of data type VARCHAR are case-insensitive.
If you specify "values.before.first", it must have the same number of elements as "value.columns". That is, if "value.columns" specifies n columns, then "values.before.first" must specify n values. For i in [1, n], value_column_i has the value before_first_value_i. However, before_first_value_i can be empty. For example: value.columns (c1, c2, c3), values.before.first (1, ,"abc"). If before_first_value_i is empty, then value_column_i has the value "NULL".
Note: NULL must be enclosed in double quotes.
If you do not specify "values.before.first", then value_column_i has the value NULL for i in [1, n].
Types: character OR vector of characters

values.after.last

Optional Argument.
Specifies the values to use if "end.time" is later than the value in end_time_column. Each of these values must have the same data type as its corresponding value column. Values of data type VARCHAR are case-insensitive.
If you specify "values.after.last", it must have the same number of elements as "value.columns". That is, if "value.columns" specifies n columns, then "values.after.last" must specify n values. For i in [1, n], value_column_i has the value after_last_value_i. However, after_last_value_i can be empty. For example: value.columns (c1, c2, c3), values.after.last (1, ,"abc"). If after_last_value_i is empty, then value_column_i has the value "NULL".
Note: NULL must be enclosed in double quotes.
If you do not specify "values.after.last", then value_column_i has the value NULL for i in [1, n].
Types: character OR vector of characters
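Because "values.before.first" and "values.after.last" must each supply one entry per value column, it can help to build them from "value.columns" rather than typing them by hand. The sketch below is one way to do that in base R; fill_values() is a hypothetical helper, not part of the Teradata R package, and it assumes that fill values are passed as character strings with the quoted string "NULL" standing in for columns without a fill value.

```r
# fill_values() is a hypothetical helper, not part of the Teradata R package.
# It returns one fill value per value column, using the string "NULL"
# for every column that has no explicit fill value.
fill_values <- function(value.columns, fills = list()) {
  unknown <- setdiff(names(fills), value.columns)
  if (length(unknown) > 0) {
    stop("Not in value.columns: ", paste(unknown, collapse = ", "))
  }
  out <- rep("NULL", length(value.columns))
  names(out) <- value.columns
  # Overwrite the default only for columns with an explicit fill value.
  out[names(fills)] <- vapply(fills, as.character, character(1))
  unname(out)
}

value.columns <- c("expenditure", "income", "investment")

# Explicit fill for "income" only; the other columns fall back to "NULL".
before_first <- fill_values(value.columns, list(income = 0))
print(before_first)
```

The resulting vector always has the same length as "value.columns", which is the constraint both arguments share.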

split.criteria

Optional Argument.
Specifies the split criteria of the value columns.
Default Value: "nosplit"
Permitted Values: nosplit, proportional, random, gaussian, poisson
Types: character

seed

Optional Argument.
Specifies the seed for the random number generator.
Types: integer

accumulate

Optional Argument.
Specifies the names of input tbl_teradata columns (other than those specified by "time.column" and "value.columns") to copy to the output tbl_teradata. By default, the function copies to the output tbl_teradata only the columns specified by "time.column" and "value.columns".
Types: character OR vector of Strings (character)

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

time.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "time.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_burst_mle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("burst_example", "burst_data", "finance_data", "time_table2")

    # Create object(s) of class "tbl_teradata".
    burst_data <- tbl(con, "burst_data")
    finance_data <- tbl(con, "finance_data")
    time_table2 <- tbl(con, "time_table2")

    # Example 1: Use "time.interval" argument to burst the data for
    # a duration of 1 day (86400 seconds).
    td_burst_out1 <- td_burst_mle(data = burst_data,
                                  data.partition.column = c("id"),
                                  data.order.column = "id",
                                  time.column = c("start_time_column", "end_time_column"),
                                  value.columns = c("num_custs"),
                                  time.interval = 86400,
                                  start.time = "08/01/2010",
                                  end.time = "08/10/2010",
                                  split.criteria = "nosplit",
                                  accumulate = c("id")
                                  )

    # Example 2: "split.criteria" = proportional.
    td_burst_out2 <- td_burst_mle(data = burst_data,
                                  data.partition.column = c("id"),
                                  data.order.column = "id",
                                  time.column = c("start_time_column", "end_time_column"),
                                  value.columns = c("num_custs"),
                                  time.interval = 86400,
                                  start.time = "08/01/2010",
                                  end.time = "08/10/2010",
                                  split.criteria = "proportional",
                                  accumulate = c("id")
                                  )

    # Example 3: "split.criteria" = gaussian.
    td_burst_out3 <- td_burst_mle(data = burst_data,
                                  data.partition.column = c("id"),
                                  data.order.column = "id",
                                  time.column = c("start_time_column", "end_time_column"),
                                  value.columns = c("num_custs"),
                                  time.interval = 86400,
                                  start.time = "08/01/2010",
                                  end.time = "08/10/2010",
                                  split.criteria = "gaussian",
                                  accumulate = c("id")
                                  )

    # Example 4: Use the "time.data", "values.before.first", and "values.after.last" arguments.
    # The "time.data" option allows bursts of different lengths and partitions the
    # data accordingly.
    td_burst_out4 <- td_burst_mle(data = finance_data,
                                  data.partition.column = c("id"),
                                  data.order.column = "id",
                                  time.data = time_table2,
                                  time.data.partition.column = c("id"),
                                  time.data.order.column = "burst_start",
                                  time.column = c("start_time_column", "end_time_column"),
                                  value.columns = c("expenditure", "income", "investment"),
                                  start.time = "06/30/1967",
                                  end.time = "07/10/1967",
                                  values.before.first = c("NULL","NULL","NULL"),
                                  values.after.last = c("NULL","NULL","NULL"),
                                  accumulate = c("id")
                                  )