Description
The Burst function bursts (splits) a time interval into a series of
shorter "burst" intervals and allocates the values from the original
interval to the new, shorter subintervals. The Burst function is useful for
allocating values from overlapping time intervals into user-defined time
intervals (for example, when a cable company has customer data recorded
over overlapping time intervals and wants to analyze it in
uniform time intervals). The function supports several allocation
methods (see the "split.criteria" argument).
The burst intervals can have either the same length (specified by the
"time.interval" argument), the same number of data points (specified
by the "num.points" argument), or specific start and end
times (specified by "time.data").
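The fixed-length case can be pictured with a small base R sketch. It is illustrative only and does not call Teradata: burst_interval is a hypothetical helper, times are plain numbers of seconds, and the two allocation rules shown (copying the value to every piece for "nosplit", dividing it by piece length for "proportional") are assumptions about how those methods behave, not the function's implementation.

```r
# Hypothetical helper (not part of tdplyr): burst one interval [start, end)
# into pieces of at most `interval` seconds and allocate `value` to the pieces.
burst_interval <- function(start, end, value, interval,
                           split = c("nosplit", "proportional")) {
  split <- match.arg(split)
  stopifnot(end > start, interval > 0)

  # Number of pieces; the last piece may be shorter than `interval`.
  n <- ceiling((end - start) / interval)
  starts <- start + interval * (seq_len(n) - 1)
  ends <- pmin(starts + interval, end)
  lengths <- ends - starts

  values <- if (split == "nosplit") {
    rep(value, n)                    # assumed: every piece keeps the full value
  } else {
    value * lengths / sum(lengths)   # assumed: value divided by piece length
  }
  data.frame(burst_start = starts, burst_end = ends, value = values)
}

# A 2.5-day interval carrying a value of 100, burst into 1-day pieces.
day <- 86400
nosplit <- burst_interval(0, 2.5 * day, 100, day, "nosplit")
proportional <- burst_interval(0, 2.5 * day, 100, day, "proportional")
print(proportional)
```

Under these assumptions, the proportional values always sum to the original value, while the "nosplit" values repeat it once per piece.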
Usage
td_burst_mle (
data = NULL,
time.data = NULL,
time.column = NULL,
value.columns = NULL,
time.interval = NULL,
time.datatype = NULL,
value.datatype = NULL,
start.time = NULL,
end.time = NULL,
num.points = NULL,
values.before.first = NULL,
values.after.last = NULL,
split.criteria = "nosplit",
seed = NULL,
accumulate = NULL,
data.sequence.column = NULL,
time.data.sequence.column = NULL,
data.partition.column = NULL,
time.data.partition.column = NULL,
data.order.column = NULL,
time.data.order.column = NULL
)
Arguments
data |
Required Argument.
Specifies the tbl_teradata containing the time series data.
|
data.partition.column |
Required Argument.
Specifies Partition By columns for "data".
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
data.order.column |
Optional Argument.
Specifies Order By columns for "data".
Values to this argument can be provided as a vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
time.data |
Optional Argument.
Specifies the tbl_teradata containing the time data.
Note: Specify exactly one of the following arguments:
"time.data", "time.interval", or "num.points".
|
time.data.partition.column |
Optional Argument. Required if "time.data" is specified.
Specifies Partition By columns for "time.data".
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
time.data.order.column |
Optional Argument.
Specifies Order By columns for "time.data".
Values to this argument can be provided as a vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
time.column |
Required Argument.
Specifies the names of the input tbl_teradata columns that contain the start
and end times of the time interval to be burst. This argument is specified
as a vector of the form c("<start_time_column>", "<end_time_column>").
Types: character OR vector of Strings (character)
|
value.columns |
Required Argument.
Specifies the names of input tbl_teradata columns to copy to the output
tbl_teradata.
Types: character OR vector of Strings (character)
|
time.interval |
Optional Argument.
Specifies the length of each burst time interval.
Note: Specify exactly one of the following arguments:
"time.data", "time.interval" or "num.points".
Types: numeric
|
time.datatype |
Optional Argument.
Specifies the data type of the output columns that correspond to the
input tbl_teradata columns specified by the argument "time.column"
(start_time_column and end_time_column). If you omit this argument,
then the function infers the data type of start_time_column and
end_time_column from the input tbl_teradata and uses the inferred
data type for the corresponding output tbl_teradata columns. If you
specify this argument, then the function can transform the input data
to the specified output data type only if both the input column data
type and the specified output column data type are in this list:
INTEGER, BIGINT, SMALLINT, DOUBLE PRECISION, DECIMAL, DECIMAL(n,m), NUMERIC,
NUMERIC(n,m).
Types: character
|
value.datatype |
Optional Argument.
Specifies the data types of the output columns that correspond to the
input tbl_teradata columns specified by the argument "value.columns". If you omit
this argument, then the function infers the data type of each
value column from the input tbl_teradata and uses the inferred data
type for the corresponding output tbl_teradata column. If you specify
"value.datatype", then it must be the same size as "value.columns". That
is, if "value.columns" specifies n columns, then "value.datatype" must
specify n data types. For i in [1, n], value_column_i has
value_type_i. However, value_type_i can be empty. For example:
value.columns (c1, c2, c3), value.datatype (integer, ,VARCHAR). If you
specify this argument, then the function can transform the input data
to the specified output data type only if both the input column data
type and the specified output column data type are in this list:
INTEGER, BIGINT, SMALLINT, DOUBLE PRECISION, DECIMAL, DECIMAL(n,m), NUMERIC,
NUMERIC(n,m).
Types: character OR vector of characters
|
start.time |
Optional Argument.
Specifies the start time for the time interval to be burst. The default
value is the start time of the start column in "time.column".
Types: character
|
end.time |
Optional Argument.
Specifies the end time for the time interval to be burst. The default
value is the end time of the end column in "time.column".
Types: character
|
num.points |
Optional Argument.
Specifies the number of data points in each burst time interval.
Note: Specify exactly one of the following arguments:
"time.data", "time.interval", or "num.points".
Types: integer
|
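The rule that exactly one of "time.data", "time.interval", and "num.points" may be specified can be checked on the client before the call is made. The helper below, check_burst_mode, is hypothetical and not part of tdplyr; it is a minimal sketch of such a pre-flight check.

```r
# Hypothetical pre-flight check (not part of tdplyr): make sure exactly one
# of the three mutually exclusive arguments has been supplied.
check_burst_mode <- function(time.data = NULL, time.interval = NULL,
                             num.points = NULL) {
  supplied <- c(
    time.data = !is.null(time.data),
    time.interval = !is.null(time.interval),
    num.points = !is.null(num.points)
  )
  if (sum(supplied) != 1) {
    stop("Specify exactly one of 'time.data', 'time.interval', or ",
         "'num.points'; got ", sum(supplied), ".", call. = FALSE)
  }
  # Return the name of the argument that selects the burst mode.
  names(supplied)[supplied]
}

mode <- check_burst_mode(time.interval = 86400)
```

Calling the helper with none of the three arguments, or with more than one, stops with an error instead of returning a mode name.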
values.before.first |
Optional Argument.
Specifies the values to use if "start.time" is before
start_time_column. Each of these values must have the same data type
as its corresponding value column. Values of data type VARCHAR are
case-insensitive. If you specify "values.before.first", then it must be
the same size as "value.columns". That is, if "value.columns" specifies n
columns, then "values.before.first" must specify n values. For i in [1,
n], value_column_i has the value before_first_value_i. However,
before_first_value_i can be empty. For example: value.columns (c1,
c2, c3), values.before.first (1, ,"abc"). If before_first_value_i is
empty, then value_column_i has the value "NULL".
Note: NULL should be put in double quotes.
If you do not specify "values.before.first", then value_column_i has
the value NULL for i in [1, n].
Types: character OR vector of characters
|
values.after.last |
Optional Argument.
Specifies the values to use if "end.time" is after end_time_column.
Each of these values must have the same data type as its
corresponding value column. Values of data type VARCHAR are
case-insensitive. If you specify "values.after.last", then it must be the
same size as "value.columns". That is, if "value.columns" specifies n
columns, then "values.after.last" must specify n values. For i in [1, n],
value_column_i has the value after_last_value_i. However,
after_last_value_i can be empty. For example: value.columns (c1, c2,
c3), values.after.last (1, ,"abc"). If after_last_value_i is empty,
then value_column_i has the value "NULL".
Note: NULL should be put in double quotes.
If you do not specify "values.after.last", then value_column_i has
the value NULL for i in [1, n].
Types: character OR vector of characters
|
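Because "values.before.first" and "values.after.last" must line up one-to-one with "value.columns", it can help to build them from named fill values. The helper below, make_fill_values, is hypothetical and not part of tdplyr; it assumes that every value is passed as a character string and that a column without a fill value gets the quoted string "NULL".

```r
# Hypothetical helper (not part of tdplyr): build a fill vector with one
# entry per value column, using "NULL" where no fill value is given.
make_fill_values <- function(value.columns, fills = list()) {
  unknown <- setdiff(names(fills), value.columns)
  if (length(unknown) > 0) {
    stop("Not a value column: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  out <- rep("NULL", length(value.columns))
  names(out) <- value.columns
  # Overwrite only the columns that have an explicit fill value.
  out[names(fills)] <- vapply(fills, as.character, character(1))
  unname(out)
}

value_columns <- c("expenditure", "income", "investment")
before_first <- make_fill_values(value_columns,
                                 list(expenditure = 0, investment = 100))
after_last <- make_fill_values(value_columns)
```

The resulting vectors have the same length as value_columns, so they could be passed as "values.before.first" and "values.after.last" alongside value.columns = value_columns.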
split.criteria |
Optional Argument.
Specifies how the function allocates the values in the value columns
among the burst intervals.
Default Value: "nosplit"
Permitted Values: nosplit, proportional, random, gaussian, poisson
Types: character
|
seed |
Optional Argument.
Specifies the seed for the random number generator.
Types: integer
|
accumulate |
Optional Argument.
Specifies the names of input tbl_teradata columns (other than those
specified by "time.column" and "value.columns") to copy to the output
tbl_teradata. By default, the function copies to the output tbl_teradata
only the columns specified by "time.column" and "value.columns".
Types: character OR vector of Strings (character)
|
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
time.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "time.data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
Value
The function returns an object of class "td_burst_mle", which is a named
list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
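The shape of the return value can be illustrated without a database connection. The object below is a mock built by hand: its class and member name follow the description above, but the data frame and its contents are made up, and a real call returns a tbl_teradata in "result" rather than a data.frame.

```r
# Mock of the returned object (assumption: a real td_burst_mle() result is a
# classed named list; the rows below are invented for illustration).
mock_out <- structure(
  list(result = data.frame(id = c(1, 1), num_custs = c(20, 20))),
  class = "td_burst_mle"
)

# The "$" operator extracts the member by name, as with any named list.
burst_rows <- mock_out$result
print(names(mock_out))
```

With a real result such as td_burst_out1 from the examples below, the same access pattern would be td_burst_out1$result.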
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("burst_example", "burst_data", "finance_data", "time_table2")
# Create object(s) of class "tbl_teradata".
burst_data <- tbl(con, "burst_data")
finance_data <- tbl(con, "finance_data")
time_table2 <- tbl(con, "time_table2")
# Example 1: Use "time.interval" argument to burst the data for
# a duration of 1 day (86400 seconds).
td_burst_out1 <- td_burst_mle(data = burst_data,
data.partition.column = c("id"),
data.order.column = "id",
time.column = c("start_time_column", "end_time_column"),
value.columns = c("num_custs"),
time.interval = 86400,
start.time = "08/01/2010",
end.time = "08/10/2010",
split.criteria = "nosplit",
accumulate = c("id")
)
# Example 2: "split.criteria" = proportional.
td_burst_out2 <- td_burst_mle(data = burst_data,
data.partition.column = c("id"),
data.order.column = "id",
time.column = c("start_time_column", "end_time_column"),
value.columns = c("num_custs"),
time.interval = 86400,
start.time = "08/01/2010",
end.time = "08/10/2010",
split.criteria = "proportional",
accumulate = c("id")
)
# Example 3: "split.criteria" = gaussian.
td_burst_out3 <- td_burst_mle(data = burst_data,
data.partition.column = c("id"),
data.order.column = "id",
time.column = c("start_time_column", "end_time_column"),
value.columns = c("num_custs"),
time.interval = 86400,
start.time = "08/01/2010",
end.time = "08/10/2010",
split.criteria = "gaussian",
accumulate = c("id")
)
# Example 4: Use the "time.data" argument together with "values.before.first"
# and "values.after.last". The "time.data" argument allows the use of different
# time intervals and partitions the data accordingly.
td_burst_out4 <- td_burst_mle(data = finance_data,
data.partition.column = c("id"),
data.order.column = "id",
time.data = time_table2,
time.data.partition.column = c("id"),
time.data.order.column = "burst_start",
time.column = c("start_time_column", "end_time_column"),
value.columns = c("expenditure", "income", "investment"),
start.time = "06/30/1967",
end.time = "07/10/1967",
values.before.first = c("NULL","NULL","NULL"),
values.after.last = c("NULL","NULL","NULL"),
accumulate = c("id")
)