Description
The Symbolic Aggregate approXimation (td_sax_mle) function
transforms original time series data into symbolic strings, which
are more suitable for additional types of manipulation because
of their smaller size and the relative ease with which patterns can
be identified and compared. Its input and output formats allow it to
supply data to the Shapelet Functions.
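As an illustration of the idea only, the following base R sketch encodes a
numeric vector as a SAX string using common SAX conventions: standardize the
series, average each block of consecutive points, and map each average to a
letter using equiprobable breakpoints of the standard normal distribution.
The name sax_encode and its arguments are hypothetical, and the breakpoint
choice and the dropping of leftover points are assumptions. td_sax_mle runs
in the database and may differ in detail.

```r
# Illustrative SAX encoding in base R (NOT the td_sax_mle implementation).
# sax_encode, mu, sigma, and points_per_symbol are hypothetical names.
sax_encode <- function(x, alphabet_size = 4, points_per_symbol = 1,
                       mu = mean(x), sigma = sd(x)) {
  stopifnot(alphabet_size >= 2, alphabet_size <= 20,
            points_per_symbol >= 1, sigma > 0)
  # Standardize with a single (global) mean and standard deviation.
  z <- (x - mu) / sigma
  # Average each block of points_per_symbol points.
  # Assumption: points that do not fill a whole block are dropped.
  n_sym <- length(z) %/% points_per_symbol
  z <- z[seq_len(n_sym * points_per_symbol)]
  block_means <- colMeans(matrix(z, nrow = points_per_symbol))
  # Assumption: breakpoints split the standard normal into equiprobable bins.
  breaks <- qnorm(seq_len(alphabet_size - 1) / alphabet_size)
  # Map each block mean to a letter: lowest bin is "a", next is "b", and so on.
  paste(letters[findInterval(block_means, breaks) + 1], collapse = "")
}

# With mean 0 and standard deviation 1, each value falls in a different bin.
sax_encode(c(-2, -0.5, 0.5, 2), mu = 0, sigma = 1)  # "abcd"
```

A larger alphabet_size gives finer amplitude resolution, and a larger
points_per_symbol gives a shorter string.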
Usage
td_sax_mle (
data = NULL,
data.partition.column = NULL,
data.order.column = NULL,
meanstats.data = NULL,
stdevstats.data = NULL,
value.columns = NULL,
time.column = NULL,
window.type = "global",
output = "string",
mean = NULL,
st.dev = NULL,
window.size = NULL,
output.frequency = 1,
points.persymbol = 1,
symbols.perwindow = NULL,
alphabet.size = 4,
bitmap.level = 2,
print.stats = FALSE,
accumulate = NULL,
data.sequence.column = NULL,
meanstats.data.sequence.column = NULL,
stdevstats.data.sequence.column = NULL,
meanstats.data.partition.column = NULL,
stdevstats.data.partition.column = NULL,
meanstats.data.order.column = NULL,
stdevstats.data.order.column = NULL
)
Arguments
data |
Required Argument.
Specifies the input tbl_teradata that contains the time series data.
|
data.partition.column |
Required Argument.
Specifies the Partition By Columns for data.
Values to this argument can be provided as vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
data.order.column |
Required Argument.
Specifies the Order By Columns for data.
Values to this argument can be provided as vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
meanstats.data |
Optional Argument.
Specifies the tbl_teradata that contains the global mean of
each input column named in the "value.columns" argument.
|
meanstats.data.partition.column |
Required Argument when meanstats.data is specified.
Specifies the Partition By Columns for meanstats.data.
Values to this argument can be provided as vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
meanstats.data.order.column |
Optional Argument.
Specifies the Order By Columns for meanstats.data.
Values to this argument can be provided as vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
stdevstats.data |
Optional Argument.
Specifies the tbl_teradata that contains the global standard
deviation of each input column named in the "value.columns" argument.
|
stdevstats.data.partition.column |
Required Argument when stdevstats.data is specified.
Specifies the Partition By Columns for stdevstats.data.
Values to this argument can be provided as vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
stdevstats.data.order.column |
Optional Argument.
Specifies the Order By Columns for stdevstats.data.
Values to this argument can be provided as vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
value.columns |
Required Argument.
Specifies the names of the input tbl_teradata columns that contain
the time series data to be transformed.
Types: character OR vector of Strings (character)
|
time.column |
Optional Argument.
Specifies the name of the input tbl_teradata column that contains the
time axis of the data.
Types: character OR vector of Strings (character)
|
window.type |
Optional Argument.
Determines how much data the function processes at one time:
"global" (default): The function computes the SAX code using
a single mean and standard deviation for the entire data set.
"sliding": The function recomputes the mean and standard
deviation for a sliding window of the data set.
Default Value: "global"
Permitted Values: sliding, global
Types: character
|
output |
Optional Argument.
Determines how the function outputs the results:
"string" (default): The function outputs a list of SAX codes
for each window.
"bytes": The function outputs the list of SAX codes as compact
byte arrays (which are not "human-readable").
"bitmap": The function outputs a JSON representation of a SAX
bitmap.
"characters": The function outputs one character for each line.
Default Value: "string"
Permitted Values: string, bitmap, bytes, characters
Types: character
|
mean |
Optional Argument.
Specifies the global mean values that the function uses to calculate
the SAX code for every partition. A mean value has the data type
numeric. If "mean" specifies only one value and "value.columns" specifies
multiple columns, then the specified "mean" value applies to every item in
"value.columns". If "mean" specifies multiple values, then it must specify
one value for each item in "value.columns". The nth mean value in "mean"
corresponds to the nth item in "value.columns".
Tip: To specify a different global mean value for
each partition, use the multiple-input syntax and supply the values in
the tbl_teradata passed as "meanstats.data".
Default Value: NULL
Types: numeric
|
st.dev |
Optional Argument.
Specifies the global standard deviation values that the function uses
to calculate the SAX code for every partition. A standard deviation
value has the data type numeric and its value must be greater than 0.
If it specifies only one value and "value.columns" specifies multiple
columns, then the specified "st.dev" value applies to every item in "value.columns".
If it specifies multiple values, then it must specify one value for
each item in "value.columns". The nth standard deviation value corresponds
to the nth item in "value.columns" argument.
Tip: To specify a different global standard deviation
value for each partition, use the multiple-input syntax and supply the
values in the tbl_teradata passed as "stdevstats.data".
Default Value: NULL
Types: numeric
|
window.size |
Optional Argument.
Specifies the size of the sliding window. The value must be an
integer greater than 0.
Note: Specify this argument when "window.type" is "sliding".
Types: numeric
|
output.frequency |
Optional Argument.
Specifies the number of data points that the window slides between
successive outputs. The value must be an integer greater than 0.
Note: This argument applies only when "window.type" is "sliding" and
"output" is not "characters". If "window.type" is "sliding" and "output"
is "characters", then the function sets "output.frequency" to the value
of "window.size", to ensure that a single character is assigned to each
time point. If the number of data points in the time series is not an
integer multiple of the window size, then the function ignores the
leftover parts.
Default Value: 1
Types: numeric
|
points.persymbol |
Optional Argument.
Specifies the number of data points to be converted into one SAX
symbol. The value must be an integer greater than 0.
Note: "window.type" value must be "global".
Default Value: 1
Types: numeric
|
symbols.perwindow |
Optional Argument.
Specifies the number of SAX symbols to be generated for each window.
The value must be an integer greater than 0. The default value is
the value of "window.size".
Note: "window.type" value must be "sliding".
Types: numeric
|
alphabet.size |
Optional Argument.
Specifies the number of symbols in the SAX alphabet. The value must
be an integer in the range [2, 20].
Default Value: 4
Types: numeric
|
bitmap.level |
Optional Argument.
Specifies the number of consecutive symbols to be converted to one
symbol on a bitmap. For bitmap level 1, the bitmap contains the
symbols "a", "b", "c", and so on; for bitmap level 2, the bitmap
contains the symbols "aa", "ab", "ac", and so on. The input value
must be an integer in the range [1, 4].
Note: "output" value must be "bitmap".
Default Value: 2
Types: numeric
|
print.stats |
Optional Argument.
Specifies whether the function prints the mean and standard
deviation.
Note: "output" value must be "string".
Default Value: FALSE
Types: logical
|
accumulate |
Optional Argument.
Specifies the names of the input tbl_teradata columns to copy to the
output table. For each sequence in the input table, the td_sax_mle function
outputs, as the "accumulate" value, the value that corresponds to the first
time point in the sequence.
Types: character OR vector of Strings (character)
|
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
meanstats.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "meanstats.data". The argument is used to
ensure deterministic results for functions which produce results that
vary from run to run.
Types: character OR vector of Strings (character)
|
stdevstats.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "stdevstats.data". The argument is used to
ensure deterministic results for functions which produce results that
vary from run to run.
Types: character OR vector of Strings (character)
|
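To make the interaction of "window.size", "output.frequency", and
"symbols.perwindow" concrete, here is a base R sketch of sliding-window
encoding. It is an illustration under stated assumptions, not the td_sax_mle
implementation: the name sliding_sax is hypothetical, each window is
standardized with its own mean and standard deviation, symbols_per_window is
assumed to divide window_size evenly, and points that do not fill a final
window are ignored.

```r
# Illustrative sliding-window SAX in base R (NOT the td_sax_mle implementation).
# sliding_sax and its argument names are hypothetical.
sliding_sax <- function(x, window_size, output_frequency = 1,
                        symbols_per_window = window_size,
                        alphabet_size = 4) {
  stopifnot(window_size > 0, output_frequency > 0, symbols_per_window > 0,
            window_size %% symbols_per_window == 0,  # simplifying assumption
            alphabet_size >= 2, alphabet_size <= 20)
  if (length(x) < window_size) return(character(0))
  # Assumption: equiprobable breakpoints of the standard normal distribution.
  breaks <- qnorm(seq_len(alphabet_size - 1) / alphabet_size)
  # The window advances output_frequency points between successive outputs;
  # points that do not fill a complete window are ignored.
  starts <- seq(1, length(x) - window_size + 1, by = output_frequency)
  vapply(starts, function(s) {
    w <- x[s:(s + window_size - 1)]
    # Recompute the statistics per window (assumes sd(w) > 0).
    z <- (w - mean(w)) / sd(w)
    # Average equal-sized blocks so the window yields symbols_per_window symbols.
    block_means <- colMeans(matrix(z, nrow = window_size / symbols_per_window))
    paste(letters[findInterval(block_means, breaks) + 1], collapse = "")
  }, character(1))
}

# Non-overlapping windows: output_frequency equal to window_size.
sliding_sax(1:8, window_size = 4, output_frequency = 4)  # "abcd" "abcd"
```

With output_frequency = 1 the windows overlap and the same call returns one
code per starting position, five codes for eight points.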
Value
The function returns an object of class "td_sax_mle", which is a named list
containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator
using the name: result.
Examples
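library(tdplyr)  # td_sax_mle(), td_get_context(), loadExampleData()
library(dplyr)   # tbl(), group_by(), summarize(), and the %>% pipe
# Get the current context/connection.
con <- td_get_context()$connection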
# Load example data.
loadExampleData("sax_example", "finance_data3")
# Create remote tibble objects.
finance_data3 <- tbl(con, "finance_data3")
# Example 1 - This example uses window.type as global and default output value.
td_sax_out <- td_sax_mle(data = finance_data3,
data.partition.column = c("id"),
data.order.column = c("period"),
value.columns = c("expenditure","income","investment"),
time.column = "period",
window.type = "global",
print.stats = TRUE,
accumulate = c("id")
)
# Example 2 - This example uses window.type as sliding and default output value.
# window.size should also be specified when window.type is set as sliding.
td_sax_out2 <- td_sax_mle(data = finance_data3,
data.partition.column = c("id"),
data.order.column = c("period"),
value.columns = c("expenditure"),
time.column = "period",
window.type = "sliding",
window.size = 20,
print.stats = TRUE,
accumulate = c("id")
)
# Example 3 - This example uses the multiple-input version, where the
# mean and standard deviation statistics are applied globally with
# the meanstats and stdevstats tables.
meanstats <- tbl(con, "finance_data3") %>% group_by(id) %>%
summarize(expenditure = mean(expenditure, na.rm = TRUE),
income = mean(income, na.rm = TRUE),
investment = mean(investment, na.rm = TRUE))
stdevstats <- tbl(con, "finance_data3") %>% group_by(id) %>%
summarize(expenditure = sd(expenditure, na.rm = TRUE),
income = sd(income, na.rm = TRUE),
investment = sd(investment, na.rm = TRUE))
td_sax_out3 <- td_sax_mle(data = finance_data3,
data.partition.column = c("id"),
data.order.column = c("period"),
meanstats.data = meanstats,
meanstats.data.partition.column = c("id"),
stdevstats.data = stdevstats,
stdevstats.data.partition.column = c("id"),
value.columns = c("expenditure","income","investment"),
time.column = "period",
window.type = "global",
accumulate = c("id")
)