Teradata R Package Function Reference - 16.20 - Burst - Teradata R Package

Teradata® R Package Function Reference

prodname: Teradata R Package
vrm_release: 16.20
created_date: February 2020
category: Programming Reference
featnum: B700-4007-098K

Description

The Burst function bursts (splits) a time interval into a series of shorter burst intervals that can be analyzed independently.

Usage

  td_burst_mle (
      data = NULL,
      time.data = NULL,
      time.column = NULL,
      value.columns = NULL,
      time.interval = NULL,
      time.datatype = NULL,
      value.datatype = NULL,
      start.time = NULL,
      end.time = NULL,
      num.points = NULL,
      values.before.first = NULL,
      values.after.last = NULL,
      split.criteria = "nosplit",
      seed = NULL,
      accumulate = NULL,
      data.sequence.column = NULL,
      time.data.sequence.column = NULL,
      data.partition.column = NULL,
      time.data.partition.column = NULL,
      data.order.column = NULL,
      time.data.order.column = NULL
  )

Arguments

data

Required Argument.
Specifies the tbl object of class "tbl_teradata" containing the time series.

data.partition.column

Required Argument.
Specifies the partition columns for the "data" argument.
Values for this argument can be provided as a vector if multiple columns are used for partitioning.

data.order.column

Optional Argument.
Specifies the order-by columns for the "data" argument.
Values for this argument can be provided as a vector if multiple columns are used for ordering.

time.data

Optional Argument.
Specifies the tbl object of class "tbl_teradata" that contains the burst time intervals.

time.data.partition.column

Required if "time.data" is specified.
Specifies the partition columns for the "time.data" argument.
Values for this argument can be provided as a vector if multiple columns are used for partitioning.

time.data.order.column

Optional Argument.
Specifies the order-by columns for the "time.data" argument.
Values for this argument can be provided as a vector if multiple columns are used for ordering.

time.column

Required Argument.
Specifies the names of the input table columns that contain the start and end times of the time interval to be burst. This argument is specified as a vector of the form c("<start_time_column>", "<end_time_column>").

value.columns

Required Argument.
Specifies the names of the input table columns to copy to the output table.

time.interval

Optional Argument.
Specifies the length of each burst time interval. This value must be numeric.
Note: Specify exactly one of the following arguments: time.data, time.interval, or num.points.
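
To make the effect of "time.interval" concrete, the following base R sketch splits one interval into consecutive bursts of a fixed length in seconds. It is only an illustration: burst_interval is a hypothetical helper, not part of the Teradata R package, and the half-open interval boundaries and the shortened final burst are assumptions rather than documented behavior of td_burst_mle.

```r
# Hypothetical helper (not part of the Teradata R package): split the
# interval [start, end) into consecutive bursts of `interval_sec` seconds.
# The last burst is truncated at `end` (an assumption for this sketch).
burst_interval <- function(start, end, interval_sec) {
  s <- as.numeric(start)
  e <- as.numeric(end)
  n <- ceiling((e - s) / interval_sec)          # number of bursts needed
  starts <- s + (seq_len(n) - 1) * interval_sec # start of each burst, in seconds
  ends <- pmin(starts + interval_sec, e)        # clip the final burst
  data.frame(
    burst_start = as.POSIXct(starts, origin = "1970-01-01", tz = "UTC"),
    burst_end   = as.POSIXct(ends,   origin = "1970-01-01", tz = "UTC")
  )
}

start <- as.POSIXct("2010-08-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2010-08-04 00:00:00", tz = "UTC")

# One-day bursts (86400 seconds) over a three-day interval.
bursts <- burst_interval(start, end, 86400)
print(bursts)
```

With a three-day input interval and a one-day burst length, the sketch yields three rows, one per day.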

time.datatype

Optional Argument.
Specifies the data type of the output columns that correspond to the input tbl_teradata columns specified by the argument "time.column" (start_time_column and end_time_column). If you omit this argument, then the function infers the data type of start_time_column and end_time_column from the input tbl_teradata and uses the inferred data type for the corresponding output tbl_teradata columns.
If you specify this argument, then the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list: INTEGER, BIGINT, SMALLINT, DOUBLE PRECISION, DECIMAL(n,m), DECIMAL, NUMERIC, NUMERIC(n,m).

value.datatype

Optional Argument.
Specifies the data types of the output columns that correspond to the input tbl_teradata columns specified by the argument "value.columns". If you omit this argument, then the function infers the data type of each value column from the input tbl_teradata and uses the inferred data type for the corresponding output column.
If you specify "value.datatype", then it must be the same size as "value.columns". That is, if "value.columns" specifies n columns, then "value.datatype" must specify n data types. For i in [1, n], value_column_i has value_type_i. However, value_type_i can be empty; for example: value.columns (c1, c2, c3), value.datatype (INTEGER, ,VARCHAR).
If you specify this argument, then the function can transform the input data to the specified output data type only if both the input column data type and the specified output column data type are in this list: INTEGER, BIGINT, SMALLINT, DOUBLE PRECISION, DECIMAL(n,m), DECIMAL, NUMERIC, NUMERIC(n,m).

start.time

Optional Argument.
Specifies the start time for the time interval to be burst. This argument is specified as a string. The default value is the start time of the start column in "time.column".

end.time

Optional Argument.
Specifies the end time for the time interval to be burst. This argument is specified as a string. The default value is the end time of the end column in "time.column".

num.points

Optional Argument.
Specifies the number of data points in each burst time interval. This value must be an integer or double precision.
Note: Specify exactly one of the following arguments: time.data, time.interval, or num.points.
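
The "exactly one of" rule can be checked on the client side before calling the function. The sketch below is a minimal illustration in base R; check_burst_mode is a hypothetical helper that is not part of the Teradata R package, and td_burst_mle may perform its own validation with different messages.

```r
# Hypothetical pre-flight check (not part of the Teradata R package):
# confirm that exactly one of time.data, time.interval, and num.points
# is supplied, and return the name of the one that was.
check_burst_mode <- function(time.data = NULL, time.interval = NULL,
                             num.points = NULL) {
  supplied <- c(
    time.data     = !is.null(time.data),
    time.interval = !is.null(time.interval),
    num.points    = !is.null(num.points)
  )
  if (sum(supplied) != 1) {
    got <- if (any(supplied)) {
      paste(names(supplied)[supplied], collapse = ", ")
    } else {
      "none"
    }
    stop("Specify exactly one of time.data, time.interval, or num.points; got: ",
         got)
  }
  names(supplied)[supplied]
}

mode <- check_burst_mode(time.interval = 86400)
print(mode)
```

Calling the helper with no arguments, or with two of the three, stops with an error instead of returning a name.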

values.before.first

Optional Argument.
Specifies the values to use if "start.time" is before start_time_column. Each of these values must have the same data type as its corresponding column in "value.columns". Values of data type VARCHAR are case-insensitive.
If you specify "values.before.first", then it must be the same size as "value.columns". That is, if "value.columns" specifies n columns, then "values.before.first" must specify n values. For i in [1, n], value_column_i has the value before_first_value_i. However, before_first_value_i can be empty; for example: value.columns (c1, c2, c3), values.before.first (1, ,"abc"). If before_first_value_i is empty, then value_column_i has the value NULL.
Note: To specify NULL explicitly, pass it as the string "NULL" (in double quotes). If you do not specify "values.before.first", then value_column_i has the value NULL for i in [1, n].

values.after.last

Optional Argument.
Specifies the values to use if "end.time" is after end_time_column. Each of these values must have the same data type as its corresponding column in "value.columns". Values of data type VARCHAR are case-insensitive.
If you specify "values.after.last", then it must be the same size as "value.columns". That is, if "value.columns" specifies n columns, then "values.after.last" must specify n values. For i in [1, n], value_column_i has the value after_last_value_i. However, after_last_value_i can be empty; for example: value.columns (c1, c2, c3), values.after.last (1, ,"abc"). If after_last_value_i is empty, then value_column_i has the value NULL.
Note: To specify NULL explicitly, pass it as the string "NULL" (in double quotes). If you do not specify "values.after.last", then value_column_i has the value NULL for i in [1, n].
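
Because the fill vectors must line up position by position with "value.columns", it can help to build them from the column names rather than by hand. The following base R sketch shows one way to do that. make_fill is a hypothetical helper, not part of the Teradata R package, and passing non-NULL fill values as strings is an assumption of this sketch.

```r
# Hypothetical helper (not part of the Teradata R package): build a fill
# vector with one entry per value column, defaulting to the string "NULL".
# `fills` is a named list of overrides keyed by column name.
make_fill <- function(value.columns, fills = list()) {
  out <- rep("NULL", length(value.columns))
  names(out) <- value.columns
  for (nm in names(fills)) {
    if (!nm %in% value.columns) {
      stop("Unknown value column: ", nm)
    }
    out[[nm]] <- as.character(fills[[nm]])
  }
  unname(out)
}

value.columns <- c("expenditure", "income", "investment")

# All columns filled with "NULL", as in Example 4.
values.after.last <- make_fill(value.columns)

# Override a single column; the others stay "NULL".
values.before.first <- make_fill(value.columns, list(income = 0))
print(values.before.first)
```

The resulting vectors always have the same length as value.columns, which is the size requirement stated above.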

split.criteria

Optional Argument.
Specifies the split criteria for the value columns.
Default Value: "nosplit"
Permitted Values: nosplit, proportional, random, gaussian, poisson
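
This page does not define the individual criteria, so the base R sketch below rests on an assumption: that "nosplit" gives every burst interval the original value, and "proportional" divides the value in proportion to the length of each burst interval. split_value is a hypothetical helper for illustration only, not the package's implementation, and the random, gaussian, and poisson criteria are not covered.

```r
# Hypothetical illustration (not the package's implementation) of two
# split criteria, under the assumptions stated in the text above.
split_value <- function(value, burst_lengths,
                        criteria = c("nosplit", "proportional")) {
  criteria <- match.arg(criteria)
  switch(criteria,
    # Assumed: every burst keeps the full original value.
    nosplit = rep(value, length(burst_lengths)),
    # Assumed: the value is shared out by relative burst length.
    proportional = value * burst_lengths / sum(burst_lengths)
  )
}

# Two full days and one half day, in seconds.
lengths_sec <- c(86400, 86400, 43200)

no_split <- split_value(100, lengths_sec, "nosplit")
prop     <- split_value(100, lengths_sec, "proportional")
print(no_split)
print(prop)
```

Under these assumptions the proportional shares sum to the original value, while the nosplit values do not.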

seed

Optional Argument.
Specifies the seed for the random number generator.

accumulate

Optional Argument.
Specifies the names of input table columns to copy to the output table. By default, the function copies to the output tbl_teradata object only the columns specified by time.column and value.columns.

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

time.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "time.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

Value

The function returns an object of class "td_burst_mle", which is a named list containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("burst_example", "burst_data", "finance_data", "time_table2")
    
    # Create remote tibble objects.
    burst_data <- tbl(con, "burst_data")
    finance_data <- tbl(con, "finance_data")
    time_table2 <- tbl(con, "time_table2")
    
    # Example 1 - Use time.interval argument to burst the data for 
    # a duration of 1 day (86400 seconds). 
    td_burst_out1 <- td_burst_mle(data = burst_data,
                             data.partition.column = c("id"),
                             data.order.column = "id",
                             time.column = c("start_time_column", "end_time_column"),
                             value.columns = c("num_custs"),
                             time.interval = 86400,
                             start.time = "08/01/2010",
                             end.time = "08/10/2010",
                             split.criteria = "nosplit",
                             accumulate = c("id")
                             )
    
    # Example 2 - split.criteria = proportional.
    td_burst_out2 <- td_burst_mle(data = burst_data,
                             data.partition.column = c("id"),
                             data.order.column = "id",
                             time.column = c("start_time_column", "end_time_column"),
                             value.columns = c("num_custs"),
                             time.interval = 86400,
                             start.time = "08/01/2010",
                             end.time = "08/10/2010",
                             split.criteria = "proportional",
                             accumulate = c("id")
                             )
    
    # Example 3 - split.criteria = gaussian.
    td_burst_out3 <- td_burst_mle(data = burst_data,
                             data.partition.column = c("id"),
                             data.order.column = "id",
                             time.column = c("start_time_column", "end_time_column"),
                             value.columns = c("num_custs"),
                             time.interval = 86400,
                             start.time = "08/01/2010",
                             end.time = "08/10/2010",
                             split.criteria = "gaussian",
                             accumulate = c("id")
                             )
    
    # Example 4 - Use the time.data argument with values.before.first and values.after.last.
    # The time.data argument allows the use of different time intervals and partitions the data accordingly.
    td_burst_out4 <- td_burst_mle(data = finance_data,
                             data.partition.column = c("id"),
                             data.order.column = "id",
                             time.data = time_table2,
                             time.data.partition.column = c("id"),
                             time.data.order.column = "burst_start",
                             time.column = c("start_time_column", "end_time_column"),
                             value.columns = c("expenditure", "income", "investment"),
                             start.time = "06/30/1967",
                             end.time = "07/10/1967",
                             values.before.first = c("NULL","NULL","NULL"),
                             values.after.last = c("NULL","NULL","NULL"),
                             accumulate = c("id")
                             )