Teradata® R Package Function Reference - SAX

prodname: Teradata R Package
vrm_release: 17.00
created_date: September 2020
category: Programming Reference
featnum: B700-4007-090K

Description

The Symbolic Aggregate approXimation (SAX) function transforms original time series data into symbolic strings. These strings are better suited to further manipulation because they are smaller and make patterns easier to identify and compare. The function's input and output formats allow it to supply data to the Shapelet Functions.
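
The transformation described above can be pictured with a small base R sketch of the general SAX technique: normalize the series, average runs of points, and map each average to a letter using equiprobable regions of the standard normal distribution. The helper sax_sketch() and its internals are assumptions made for illustration only; they are not part of the Teradata R Package and do not claim to reproduce how td_sax_mle computes its codes.

```r
# Illustrative sketch of the general SAX technique in base R.
# sax_sketch() is a hypothetical helper, not a Teradata R Package function.
sax_sketch <- function(x, points.persymbol = 1, alphabet.size = 4) {
  # 1. Z-normalize the series with its own mean and standard deviation.
  z <- (x - mean(x)) / sd(x)
  # 2. Piecewise aggregate: average each run of points.persymbol values,
  #    dropping any incomplete trailing run.
  n.segments <- length(z) %/% points.persymbol
  z <- z[seq_len(n.segments * points.persymbol)]
  paa <- colMeans(matrix(z, nrow = points.persymbol))
  # 3. Split the standard normal distribution into alphabet.size regions of
  #    equal probability and map each average to the letter of its region.
  breakpoints <- qnorm(seq_len(alphabet.size - 1) / alphabet.size)
  symbols <- letters[findInterval(paa, breakpoints) + 1]
  paste(symbols, collapse = "")
}

series <- c(1, 2, 3, 4, 5, 6, 7, 8)
sax_sketch(series, points.persymbol = 2, alphabet.size = 4)  # "abcd"
```

Eight numeric values collapse to a four-character string here, which is the size reduction the description refers to.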

Usage

  td_sax_mle (
      data = NULL,
      data.partition.column = NULL,
      data.order.column = NULL,
      meanstats.data = NULL,
      stdevstats.data = NULL,
      value.columns = NULL,
      time.column = NULL,
      window.type = "global",
      output = "string",
      mean = NULL,
      st.dev = NULL,
      window.size = NULL,
      output.frequency = 1,
      points.persymbol = 1,
      symbols.perwindow = NULL,
      alphabet.size = 4,
      bitmap.level = 2,
      print.stats = FALSE,
      accumulate = NULL,
      data.sequence.column = NULL,
      meanstats.data.sequence.column = NULL,
      stdevstats.data.sequence.column = NULL,
      meanstats.data.partition.column = NULL,
      stdevstats.data.partition.column = NULL,
      meanstats.data.order.column = NULL,
      stdevstats.data.order.column = NULL
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.

data.partition.column

Required Argument.
Specifies Partition By columns for "data".
To partition by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

data.order.column

Required Argument.
Specifies Order By columns for "data".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

meanstats.data

Optional Argument.
Specifies the tbl_teradata that contains the global means of each column in "value.columns" argument of the input tbl_teradata.

meanstats.data.partition.column

Optional Argument. Required if "meanstats.data" is specified.
Specifies Partition By columns for "meanstats.data".
To partition by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

meanstats.data.order.column

Optional Argument.
Specifies Order By columns for "meanstats.data".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

stdevstats.data

Optional Argument.
Specifies the tbl_teradata that contains the global standard deviations of each column in "value.columns" argument of the input tbl_teradata.

stdevstats.data.partition.column

Optional Argument. Required if "stdevstats.data" is specified.
Specifies Partition By columns for "stdevstats.data".
To partition by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

stdevstats.data.order.column

Optional Argument.
Specifies Order By columns for "stdevstats.data".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

value.columns

Required Argument.
Specifies the names of the input tbl_teradata columns that contain the time series data to be transformed.
Types: character OR vector of Strings (character)

time.column

Optional Argument.
Specifies the name of the input tbl_teradata column that contains the time axis of the data.
Types: character

window.type

Optional Argument.
Determines how much data the function processes at one time:

  1. "global": The function computes the SAX code using a single mean and standard deviation for the entire data set.

  2. "sliding": The function recomputes the mean and standard deviation for a sliding window of the data set.


Default Value: "global"
Permitted Values: sliding, global
Types: character
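
The difference between the two window types can be sketched in base R as follows. Both helpers are hypothetical and exist only to contrast one set of statistics for the whole series with statistics recomputed per window; they assume a window that advances one point at a time and are not the function's actual implementation.

```r
# Hypothetical helpers contrasting the two normalization scopes.
# "global": one mean and one standard deviation for the whole series.
normalize_global <- function(x) {
  (x - mean(x)) / sd(x)
}

# "sliding": mean and standard deviation recomputed for every window.
normalize_sliding <- function(x, window.size) {
  stopifnot(window.size <= length(x))
  starts <- seq_len(length(x) - window.size + 1)
  lapply(starts, function(s) {
    w <- x[s:(s + window.size - 1)]
    (w - mean(w)) / sd(w)
  })
}

x <- c(1, 2, 3, 10, 11, 12)
normalize_global(x)                    # one normalized series of length 6
normalize_sliding(x, window.size = 3)  # list of 4 normalized windows
```

With sliding windows, the first and last windows (1, 2, 3 and 10, 11, 12) normalize to the same shape even though their levels differ, which a single global mean and standard deviation would not show.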

output

Optional Argument.
Determines how the function outputs the results:

  1. "string": The function outputs a list of SAX codes for each window.

  2. "bytes": The function outputs the list of SAX codes as compact byte arrays (which are not "human-readable").

  3. "bitmap": The function outputs a JSON representation of a SAX bitmap.

  4. "characters": The function outputs one character for each line.


Default Value: "string"
Permitted Values: string, bitmap, bytes, characters
Types: character

mean

Optional Argument.
Specifies the global mean values that the function uses to calculate the SAX code for every partition. A mean value has the data type numeric. If "mean" specifies only one value and "value.columns" specifies multiple columns, then the specified value applies to every item in "value.columns". If "mean" specifies multiple values, then it must specify one value for each item in "value.columns". The nth mean value corresponds to the nth value column.
Note: To specify a different global mean value for each partition, use the multiple-input syntax and put the values in the "meanstats.data" tbl_teradata.
Default Value: NULL
Types: numeric OR vector of numerics

st.dev

Optional Argument.
Specifies the global standard deviation values that the function uses to calculate the SAX code for every partition. A standard deviation value has the data type numeric and its value must be greater than 0. If it specifies only one value and "value.columns" specifies multiple columns, then the specified "st.dev" value applies to every item in "value.columns". If it specifies multiple values, then it must specify one value for each item in "value.columns". The nth standard deviation value corresponds to the nth item in "value.columns" argument.
Note: To specify a different global standard deviation value for each partition, use the multiple-input syntax and put the values in the "stdevstats.data" tbl_teradata.
Default Value: NULL
Types: numeric OR vector of numerics
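
The matching rule shared by "mean" and "st.dev" (one value for all value columns, or exactly one value per column, matched by position) can be sketched in base R. The helper expand_stat() is hypothetical and is shown only to make the rule concrete; the package performs its own validation.

```r
# expand_stat() is a hypothetical helper mirroring the rule described for
# "mean" and "st.dev"; it is not a Teradata R Package function.
expand_stat <- function(values, value.columns) {
  if (length(values) == 1) {
    # A single value applies to every item in value.columns.
    values <- rep(values, length(value.columns))
  } else if (length(values) != length(value.columns)) {
    stop("supply one value, or one value per item in value.columns")
  }
  # The nth value corresponds to the nth value column.
  setNames(values, value.columns)
}

cols <- c("expenditure", "income", "investment")
expand_stat(0, cols)              # 0 for all three columns
expand_stat(c(10, 20, 30), cols)  # 10, 20, 30 matched by position
```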

window.size

Optional Argument.
Specifies the size of the sliding window. The value must be an integer greater than 0.
Note: Specify this argument when "window.type" is "sliding".
Types: integer

output.frequency

Optional Argument.
Specifies the number of data points that the window slides between successive outputs. The value must be an integer greater than 0.
Note: This argument applies only when "window.type" is "sliding" and "output" is not "characters". If "window.type" is "sliding" and "output" is "characters", the function sets "output.frequency" to the value of "window.size" so that each time point is assigned a single character. If the number of data points in the time series is not an integer multiple of the window size, the function ignores the leftover points.
Default Value: 1
Types: integer
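
How the window size and the output frequency interact can be illustrated by listing where each window would start. The helper window_starts() is a hypothetical base R sketch that assumes windows are taken from the first data point onward; it is not part of the Teradata R Package.

```r
# window_starts() is a hypothetical helper, shown only to illustrate the
# relationship between window size and output frequency.
window_starts <- function(n.points, window.size, output.frequency = 1) {
  last.start <- n.points - window.size + 1
  # No complete window fits in the series.
  if (last.start < 1) return(integer(0))
  seq(from = 1, to = last.start, by = output.frequency)
}

# The window slides one point at a time: starts at 1, 2, ..., 7.
window_starts(10, window.size = 4, output.frequency = 1)

# The window slides by its own size: starts at 1 and 5.
# Points 9 and 10 do not fill a complete window and are left over.
window_starts(10, window.size = 4, output.frequency = 4)
```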

points.persymbol

Optional Argument.
Specifies the number of data points to be converted into one SAX symbol. The value must be an integer greater than 0.
Note: This argument applies only when "window.type" is "global".
Default Value: 1
Types: integer

symbols.perwindow

Optional Argument.
Specifies the number of SAX symbols to be generated for each window. The value must be an integer greater than 0. The default value is the value of "window.size".
Note: This argument applies only when "window.type" is "sliding".
Types: integer

alphabet.size

Optional Argument.
Specifies the number of symbols in the SAX alphabet. The value must be an integer in the range [2, 20].
Default Value: 4
Types: integer

bitmap.level

Optional Argument.
Specifies the number of consecutive symbols to be converted to one symbol on a bitmap. For bitmap level 1, the bitmap contains the symbols "a", "b", "c", and so on; for bitmap level 2, the bitmap contains the symbols "aa", "ab", "ac", and so on. The input value must be an integer in the range [1, 4].
Note: This argument applies only when "output" is "bitmap".
Default Value: 2
Types: integer
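
The effect of the bitmap level can be sketched by counting groups of consecutive symbols in a SAX string. The helper sax_bitmap_counts() is hypothetical, and it assumes that overlapping groups are counted, as in commonly described time series bitmap techniques; the text above does not state how the function counts them or how the JSON output is laid out.

```r
# sax_bitmap_counts() is a hypothetical helper, not a Teradata function.
# Assumption: overlapping groups of bitmap.level symbols are counted.
sax_bitmap_counts <- function(sax.string, bitmap.level = 2) {
  chars <- strsplit(sax.string, "")[[1]]
  n <- length(chars) - bitmap.level + 1
  # The string is too short to hold even one group.
  if (n < 1) return(setNames(integer(0), character(0)))
  grams <- vapply(seq_len(n), function(i) {
    paste(chars[i:(i + bitmap.level - 1)], collapse = "")
  }, character(1))
  counts <- table(grams)
  setNames(as.integer(counts), names(counts))
}

counts <- sax_bitmap_counts("abab", bitmap.level = 2)
counts[["ab"]]  # 2
counts[["ba"]]  # 1
```

At level 1 the same helper counts single symbols, so the number of possible groups grows as the alphabet size raised to the bitmap level.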

print.stats

Optional Argument.
Specifies whether the function prints the mean and standard deviation.
Note: This argument applies only when "output" is "string".
Default Value: FALSE
Types: logical

accumulate

Optional Argument.
Specifies the names of the input tbl_teradata columns that are to appear in the output tbl_teradata. For each sequence in the input tbl_teradata, the function chooses the value corresponding to the first time point in the sequence to output as the "accumulate" value.
Types: character OR vector of Strings (character)

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

meanstats.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "meanstats.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

stdevstats.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "stdevstats.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_sax_mle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name "result".

Examples

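    # Load the packages the examples rely on: tdplyr provides td_sax_mle(),
    # dplyr provides tbl(), group_by(), summarize() and the pipe operator.
    library(tdplyr)
    library(dplyr)

    # Get the current context/connection
    con <- td_get_context()$connection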
    
    # Load example data.
    loadExampleData("sax_example", "finance_data3")

    # Create object(s) of class "tbl_teradata".
    finance_data3 <- tbl(con, "finance_data3")

    # Example 1: This example uses "window.type" as global and default output value.
    td_sax_out <- td_sax_mle(data = finance_data3,
                             data.partition.column = c("id"),
                             data.order.column = c("period"),
                             value.columns = c("expenditure","income","investment"),
                             time.column = "period",
                             window.type = "global",
                             print.stats = TRUE,
                             accumulate = c("id")
                            )

    # Example 2: This example uses "window.type" as sliding and default output value.
    # "window.size" should also be specified when "window.type" is set as sliding.
    td_sax_out2 <- td_sax_mle(data = finance_data3,
                              data.partition.column = c("id"),
                              data.order.column = c("period"),
                              value.columns = c("expenditure"),
                              time.column = "period",
                              window.type = "sliding",
                              window.size = 20,
                              print.stats = TRUE,
                              accumulate = c("id")
                             )

    # Example 3: This example uses the multiple-input version, where the
    # mean and standard deviation statistics are supplied for each partition
    # through the meanstats tbl_teradata and the stdevstats tbl_teradata.
    meanstats <- tbl(con, "finance_data3") %>% group_by(id) %>%
                    summarize(expenditure = mean(expenditure, na.rm = TRUE),
                    income =  mean(income, na.rm = TRUE),
                    investment =  mean(investment, na.rm = TRUE))
    stdevstats <- tbl(con, "finance_data3") %>% group_by(id) %>%
                    summarize(expenditure = sd(expenditure, na.rm = TRUE),
                    income =  sd(income, na.rm = TRUE),
                    investment =  sd(investment, na.rm = TRUE))

    td_sax_out3 <- td_sax_mle(data = finance_data3,
                              data.partition.column = c("id"),
                              data.order.column = c("period"),
                              meanstats.data = meanstats,
                              meanstats.data.partition.column = c("id"),
                              stdevstats.data = stdevstats,
                              stdevstats.data.partition.column = c("id"),
                              value.columns = c("expenditure","income","investment"),
                              time.column = "period",
                              window.type = "global",
                              accumulate = c("id")
                             )