Description
The Symbolic Aggregate approXimation (SAX) function transforms original
time series data into symbolic strings, which are more suitable
for additional types of manipulation because of their smaller
size and the relative ease with which patterns can be identified
and compared. Its input and output formats allow it to supply data
to the Shapelet functions.
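As a rough illustration of what a SAX transformation does, the base R sketch below normalizes a series, averages consecutive points, and maps each average to a letter using breakpoints that split the standard normal distribution into equally probable regions. The helper name sax_sketch and its parameters are hypothetical; this is the textbook SAX procedure, not the implementation behind td_sax_mle, and dropping an incomplete trailing chunk is an assumption made here for simplicity.

```r
# Illustrative SAX sketch in base R (hypothetical helper, not td_sax_mle itself).
sax_sketch <- function(x, points_per_symbol = 1L, alphabet_size = 4L,
                       mu = mean(x), sigma = sd(x)) {
  stopifnot(sigma > 0, alphabet_size >= 2, alphabet_size <= 20)

  # 1. Normalize with the supplied (or computed) mean and standard deviation.
  z <- (x - mu) / sigma

  # 2. Average consecutive chunks of points_per_symbol values.
  #    Assumption: an incomplete trailing chunk is dropped.
  n_symbols <- length(z) %/% points_per_symbol
  z <- z[seq_len(n_symbols * points_per_symbol)]
  chunk_means <- colMeans(matrix(z, nrow = points_per_symbol))

  # 3. Breakpoints that cut the standard normal into equiprobable regions.
  breakpoints <- qnorm(seq_len(alphabet_size - 1) / alphabet_size)

  # 4. Map each chunk mean to a letter: lowest region is "a", next is "b", ...
  symbols <- letters[findInterval(chunk_means, breakpoints) + 1]
  paste(symbols, collapse = "")
}

x <- c(-2, -0.5, 0.5, 2)
sax_sketch(x, mu = 0, sigma = 1)                         # "abcd"
sax_sketch(x, points_per_symbol = 2, mu = 0, sigma = 1)  # "ad"
```

In this sketch, mu and sigma play the role that "mean" and "st.dev" play for a global window, and points_per_symbol and alphabet_size mirror "points.persymbol" and "alphabet.size".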
Usage
td_sax_mle (
  data = NULL,
  data.partition.column = NULL,
  data.order.column = NULL,
  meanstats.data = NULL,
  stdevstats.data = NULL,
  value.columns = NULL,
  time.column = NULL,
  window.type = "global",
  output = "string",
  mean = NULL,
  st.dev = NULL,
  window.size = NULL,
  output.frequency = 1,
  points.persymbol = 1,
  symbols.perwindow = NULL,
  alphabet.size = 4,
  bitmap.level = 2,
  print.stats = FALSE,
  accumulate = NULL,
  data.sequence.column = NULL,
  meanstats.data.sequence.column = NULL,
  stdevstats.data.sequence.column = NULL,
  meanstats.data.partition.column = NULL,
  stdevstats.data.partition.column = NULL,
  meanstats.data.order.column = NULL,
  stdevstats.data.order.column = NULL
)
Arguments
data |
Required Argument.
Specifies the input tbl_teradata.
|
data.partition.column |
Required Argument.
Specifies Partition By columns for "data".
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
data.order.column |
Required Argument.
Specifies Order By columns for "data".
Values to this argument can be provided as a vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
meanstats.data |
Optional Argument.
Specifies the tbl_teradata that contains the global means of
each input tbl_teradata column listed in the "value.columns" argument.
|
meanstats.data.partition.column |
Optional Argument. Required if "meanstats.data" is specified.
Specifies Partition By columns for "meanstats.data".
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
meanstats.data.order.column |
Optional Argument.
Specifies Order By columns for "meanstats.data".
Values to this argument can be provided as a vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
stdevstats.data |
Optional Argument.
Specifies the tbl_teradata that contains the global standard
deviations of each input tbl_teradata column listed in the "value.columns" argument.
|
stdevstats.data.partition.column |
Optional Argument. Required if "stdevstats.data" is specified.
Specifies Partition By columns for "stdevstats.data".
Values to this argument can be provided as a vector, if multiple
columns are used for partition.
Types: character OR vector of Strings (character)
|
stdevstats.data.order.column |
Optional Argument.
Specifies Order By columns for "stdevstats.data".
Values to this argument can be provided as a vector, if multiple
columns are used for ordering.
Types: character OR vector of Strings (character)
|
value.columns |
Required Argument.
Specifies the names of the input tbl_teradata columns that contain
the time series data to be transformed.
Types: character OR vector of Strings (character)
|
time.column |
Optional Argument.
Specifies the name of the input tbl_teradata column that contains the
time axis of the data.
Types: character
|
window.type |
Optional Argument.
Determines how much data the function processes at one time:
"global": The function computes the SAX code using a single
mean and standard deviation for the entire data set.
"sliding": The function recomputes the mean and standard
deviation for a sliding window of the data set.
Default Value: "global"
Permitted Values: sliding, global
Types: character
|
output |
Optional Argument.
Determines how the function outputs the results:
"string": The function outputs a list of SAX codes for each
window.
"bytes": The function outputs the list of SAX codes as compact
byte arrays (which are not "human-readable").
"bitmap": The function outputs a JSON representation of a SAX
bitmap.
"characters": The function outputs one character for each line.
Default Value: "string"
Permitted Values: string, bitmap, bytes, characters
Types: character
|
mean |
Optional Argument.
Specifies the global mean values that the function uses to calculate
the SAX code for every partition. A mean value has the data type
numeric. If "mean" specifies only one value and "value.columns" specifies
multiple columns, then the specified value applies to every item in
"value.columns". If "mean" specifies multiple values, then it must specify
one value for each item in "value.columns". The nth mean value corresponds to the
nth value column.
Note: To specify a different global mean value for each partition, use the
multiple-input syntax and put the values in the "meanstats.data" tbl_teradata.
Default Value: NULL
Types: numeric OR vector of numerics
|
st.dev |
Optional Argument.
Specifies the global standard deviation values that the function uses
to calculate the SAX code for every partition. A standard deviation
value has the data type numeric and its value must be greater than 0.
If it specifies only one value and "value.columns" specifies multiple
columns, then the specified "st.dev" value applies to every item in "value.columns".
If it specifies multiple values, then it must specify one value for
each item in "value.columns". The nth standard deviation value corresponds
to the nth item in "value.columns" argument.
Note: To specify a different global standard deviation value for each
partition, use the multiple-input syntax and put the values in the
"stdevstats.data" tbl_teradata.
Default Value: NULL
Types: numeric OR vector of numerics
|
window.size |
Optional Argument.
Specifies the size of the sliding window. The value must be an
integer greater than 0.
Note: Specify this argument when "window.type" is "sliding".
Types: integer
|
output.frequency |
Optional Argument.
Specifies the number of data points that the window slides between
successive outputs. The value must be an integer greater than 0.
Note: This argument applies only when "window.type" is "sliding" and
"output" is not "characters". If "window.type" is "sliding" and "output"
is "characters", then "output.frequency" is automatically set to the value
of "window.size", to ensure that a single character is assigned to each
time point. If the number of data points in the time series is not an
integer multiple of the window size, then the function ignores the
leftover data points.
Default Value: 1
Types: integer
|
points.persymbol |
Optional Argument.
Specifies the number of data points to be converted into one SAX
symbol. The value must be an integer greater than 0.
Note: This argument applies only when "window.type" is "global".
Default Value: 1
Types: integer
|
symbols.perwindow |
Optional Argument.
Specifies the number of SAX symbols to be generated for each window.
The value must be an integer greater than 0. The default value is
the value of "window.size".
Note: This argument applies only when "window.type" is "sliding".
Types: integer
|
alphabet.size |
Optional Argument.
Specifies the number of symbols in the SAX alphabet. The value must
be an integer in the range [2, 20].
Default Value: 4
Types: integer
|
bitmap.level |
Optional Argument.
Specifies the number of consecutive symbols to be converted to one
symbol on a bitmap. For bitmap level 1, the bitmap contains the
symbols "a", "b", "c", and so on; for bitmap level 2, the bitmap
contains the symbols "aa", "ab", "ac", and so on. The input value
must be an integer in the range [1, 4].
Note: This argument applies only when "output" is "bitmap".
Default Value: 2
Types: integer
|
print.stats |
Optional Argument.
Specifies whether the function prints the mean and standard
deviation.
Note: This argument applies only when "output" is "string".
Default Value: FALSE
Types: logical
|
accumulate |
Optional Argument.
Specifies the names of the input tbl_teradata columns that are to appear in the
output tbl_teradata. For each sequence in the input tbl_teradata, the function chooses
the value corresponding to the first time point in the sequence to
output as the "accumulate" value.
Types: character OR vector of Strings (character)
|
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
meanstats.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "meanstats.data". The argument is used to
ensure deterministic results for functions which produce results that
vary from run to run.
Types: character OR vector of Strings (character)
|
stdevstats.data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "stdevstats.data". The argument is used to
ensure deterministic results for functions which produce results that
vary from run to run.
Types: character OR vector of Strings (character)
|
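To make the interaction between "window.size" and "output.frequency" concrete, the base R sketch below lists the positions at which sliding windows would start. The helper window_starts is hypothetical, and it assumes that the first window begins at the first ordered row and that only complete windows produce output; this is one reading of the argument descriptions above, not confirmed behavior of the function.

```r
# Hypothetical helper: start positions of complete sliding windows.
window_starts <- function(n_points, window_size, output_frequency = 1L) {
  stopifnot(window_size > 0, output_frequency > 0)
  # No complete window fits in the series.
  if (n_points < window_size) return(integer(0))
  # Slide by output_frequency points; stop at the last complete window.
  seq.int(from = 1L, to = n_points - window_size + 1L, by = output_frequency)
}

# 25 points, window of 20, sliding one point at a time: 6 windows.
window_starts(25L, 20L, 1L)    # 1 2 3 4 5 6

# With "characters" output the step equals the window size, so windows
# do not overlap and the 5 leftover points are ignored.
window_starts(25L, 10L, 10L)   # 1 11
```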
Value
The function returns an object of class "td_sax_mle", which is a named
list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("sax_example", "finance_data3")
# Create object(s) of class "tbl_teradata".
finance_data3 <- tbl(con, "finance_data3")
# Example 1: This example uses "window.type" as global and default output value.
td_sax_out <- td_sax_mle(data = finance_data3,
                         data.partition.column = c("id"),
                         data.order.column = c("period"),
                         value.columns = c("expenditure","income","investment"),
                         time.column = "period",
                         window.type = "global",
                         print.stats = TRUE,
                         accumulate = c("id")
                         )
# Example 2: This example uses "window.type" as sliding and default output value.
# "window.size" should also be specified when "window.type" is set as sliding.
td_sax_out2 <- td_sax_mle(data = finance_data3,
                          data.partition.column = c("id"),
                          data.order.column = c("period"),
                          value.columns = c("expenditure"),
                          time.column = "period",
                          window.type = "sliding",
                          window.size = 20,
                          print.stats = TRUE,
                          accumulate = c("id")
                          )
# Example 3: This example uses the multiple-input version, where the
# mean and standard deviation statistics are applied globally with the
# meanstats tbl_teradata and the stdevstats tbl_teradata.
meanstats <- tbl(con, "finance_data3") %>% group_by(id) %>%
  summarize(expenditure = mean(expenditure, na.rm = TRUE),
            income = mean(income, na.rm = TRUE),
            investment = mean(investment, na.rm = TRUE))
stdevstats <- tbl(con, "finance_data3") %>% group_by(id) %>%
  summarize(expenditure = sd(expenditure, na.rm = TRUE),
            income = sd(income, na.rm = TRUE),
            investment = sd(investment, na.rm = TRUE))
td_sax_out3 <- td_sax_mle(data = finance_data3,
                          data.partition.column = c("id"),
                          # Order each partition by time, as in Examples 1 and 2.
                          data.order.column = c("period"),
                          meanstats.data = meanstats,
                          meanstats.data.partition.column = c("id"),
                          stdevstats.data = stdevstats,
                          stdevstats.data.partition.column = c("id"),
                          value.columns = c("expenditure","income","investment"),
                          time.column = "period",
                          window.type = "global",
                          accumulate = c("id")
                          )
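The examples above all use the default "string" output. To illustrate what "bitmap.level" refers to, the base R sketch below counts the runs of consecutive symbols in a SAX string. The helper ngram_counts is hypothetical, and counting overlapping runs is an assumption; the exact JSON layout that the function returns for "bitmap" output is not reproduced here.

```r
# Hypothetical helper: count runs of `level` consecutive symbols in a SAX string.
ngram_counts <- function(sax_string, level = 2L) {
  stopifnot(level >= 1, level <= 4)
  n <- nchar(sax_string)
  # String shorter than one run: nothing to count.
  if (n < level) return(table(character(0)))
  # Assumption: runs overlap, so a run starts at every position.
  starts <- seq_len(n - level + 1L)
  grams <- substring(sax_string, starts, starts + level - 1L)
  table(grams)
}

ngram_counts("abab", level = 1)  # a: 2, b: 2
ngram_counts("abab", level = 2)  # ab: 2, ba: 1
```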