Teradata R Package Function Reference - 16.20 - SAX - Teradata R Package

Product: Teradata R Package
Release: 16.20
Created: February 2020
Category: Programming Reference
Feature number: B700-4007-098K

Description

The td_sax_mle function applies Symbolic Aggregate approXimation (SAX) to transform time series data into symbolic strings. The strings are smaller than the original data and make patterns easier to identify and compare, so they are better suited to further manipulation. The input and output formats of the function allow it to supply data to the Shapelet functions.
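
As a rough illustration of the idea, the following base R sketch converts a short numeric series into SAX symbols. It assumes the classic SAX formulation (z-normalization followed by equiprobable breakpoints under a standard normal distribution). The helper name sax_sketch and its arguments mu and sigma are hypothetical, and the internals of td_sax_mle may differ.

```r
# Hypothetical helper: a minimal sketch of classic SAX,
# not the td_sax_mle implementation.
sax_sketch <- function(x, alphabet.size = 4, mu = mean(x), sigma = sd(x)) {
  stopifnot(alphabet.size >= 2, alphabet.size <= 20, sigma > 0)
  # Normalize the series with the supplied (or computed) statistics.
  z <- (x - mu) / sigma
  # Breakpoints split the standard normal distribution into
  # alphabet.size regions of equal probability (an assumption here).
  breakpoints <- qnorm(seq_len(alphabet.size - 1) / alphabet.size)
  # findInterval() returns 0 .. alphabet.size - 1; shift to a letter index.
  symbols <- letters[findInterval(z, breakpoints) + 1]
  paste(symbols, collapse = "")
}

series <- c(1, 2, 3, 4, 10, 9, 8, 7)
print(sax_sketch(series))
```

Passing mu and sigma explicitly corresponds loosely to supplying global statistics instead of letting them be computed from the data.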

Usage

  td_sax_mle (
      data = NULL,
      data.partition.column = NULL,
      data.order.column = NULL,
      meanstats.data = NULL,
      stdevstats.data = NULL,
      value.columns = NULL,
      time.column = NULL,
      window.type = "global",
      output = "string",
      mean = NULL,
      st.dev = NULL,
      window.size = NULL,
      output.frequency = 1,
      points.persymbol = 1,
      symbols.perwindow = NULL,
      alphabet.size = 4,
      bitmap.level = 2,
      print.stats = FALSE,
      accumulate = NULL,
      data.sequence.column = NULL,
      meanstats.data.sequence.column = NULL,
      stdevstats.data.sequence.column = NULL,
      meanstats.data.partition.column = NULL,
      stdevstats.data.partition.column = NULL,
      meanstats.data.order.column = NULL,
      stdevstats.data.order.column = NULL
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata that contains the time series data.

data.partition.column

Required Argument.
Specifies the Partition By Columns for data.
If multiple columns are used for partitioning, provide them as a vector.
Types: character OR vector of Strings (character)

data.order.column

Required Argument.
Specifies the Order By Columns for data.
If multiple columns are used for ordering, provide them as a vector.
Types: character OR vector of Strings (character)

meanstats.data

Optional Argument.
Specifies the tbl_teradata that contains the global means of each column in "value.columns" argument of the input table.

meanstats.data.partition.column

Required Argument when meanstats.data is specified.
Specifies the Partition By Columns for meanstats.data.
If multiple columns are used for partitioning, provide them as a vector.
Types: character OR vector of Strings (character)

meanstats.data.order.column

Optional Argument.
Specifies the Order By Columns for meanstats.data.
If multiple columns are used for ordering, provide them as a vector.
Types: character OR vector of Strings (character)

stdevstats.data

Optional Argument.
Specifies the tbl_teradata that contains the global standard deviations of each column in "value.columns" argument of the input table.

stdevstats.data.partition.column

Required Argument when stdevstats.data is specified.
Specifies the Partition By Columns for stdevstats.data.
If multiple columns are used for partitioning, provide them as a vector.
Types: character OR vector of Strings (character)

stdevstats.data.order.column

Optional Argument.
Specifies the Order By Columns for stdevstats.data.
If multiple columns are used for ordering, provide them as a vector.
Types: character OR vector of Strings (character)

value.columns

Required Argument.
Specifies the names of the input tbl_teradata columns that contain the time series data to be transformed.
Types: character OR vector of Strings (character)

time.column

Optional Argument.
Specifies the name of the input tbl_teradata column that contains the time axis of the data.
Types: character OR vector of Strings (character)

window.type

Optional Argument.
Determines how much data the function processes at one time:

  1. "global" (default): The function computes the SAX code using a single mean and standard deviation for the entire data set.

  2. "sliding": The function recomputes the mean and standard deviation for a sliding window of the data set.

Default Value: "global"
Permitted Values: sliding, global
Types: character

output

Optional Argument.
Determines how the function outputs the results:

  1. "string" (default): The function outputs a list of SAX codes for each window.

  2. "bytes": The function outputs the list of SAX codes as compact byte arrays (which are not "human-readable").

  3. "bitmap": The function outputs a JSON representation of a SAX bitmap.

  4. "characters": The function outputs one character for each line.

Default Value: "string"
Permitted Values: string, bitmap, bytes, characters
Types: character

mean

Optional Argument.
Specifies the global mean values that the function uses to calculate the SAX code for every partition. A mean value has the data type numeric. If "mean" specifies only one value and "value.columns" specifies multiple columns, then the specified "mean" value applies to every item in "value.columns". If "mean" specifies multiple values, then it must specify one value for each item in "value.columns". The nth mean value in "mean" corresponds to the nth item in "value.columns".
Tip: To specify a different global mean value for each partition, use the multiple-input form of the function and pass the values in the "meanstats.data" argument.
Default Value: NULL
Types: numeric
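
The rule for matching "mean" values to "value.columns" (the same rule is described for "st.dev") can be sketched in base R as follows. The helper expand_stat is hypothetical and only illustrates the documented rule; it is not part of the package.

```r
# Hypothetical helper: one value applies to every column,
# otherwise one value is needed per item in value.columns.
expand_stat <- function(stat, value.columns) {
  if (length(stat) == 1) {
    stat <- rep(stat, length(value.columns))
  }
  if (length(stat) != length(value.columns)) {
    stop("Provide one value, or one value per item in value.columns")
  }
  # The nth value corresponds to the nth column.
  setNames(stat, value.columns)
}

cols <- c("expenditure", "income", "investment")
print(expand_stat(5, cols))
print(expand_stat(c(1, 2, 3), cols))
```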

st.dev

Optional Argument.
Specifies the global standard deviation values that the function uses to calculate the SAX code for every partition. A standard deviation value has the data type numeric and its value must be greater than 0. If it specifies only one value and "value.columns" specifies multiple columns, then the specified "st.dev" value applies to every item in "value.columns". If it specifies multiple values, then it must specify one value for each item in "value.columns". The nth standard deviation value corresponds to the nth item in "value.columns" argument.
Tip: To specify a different global standard deviation value for each partition, use the multiple-input form of the function and pass the values in the "stdevstats.data" argument.
Default Value: NULL
Types: numeric

window.size

Optional Argument.
Specifies the size of the sliding window. The value must be an integer greater than 0.
Note: Specify this argument when "window.type" value is "sliding".
Types: numeric

output.frequency

Optional Argument.
Specifies the number of data points that the window slides between successive outputs. The value must be an integer greater than 0.
Note: This argument applies only when "window.type" value is "sliding" and "output" value is not "characters". If "window.type" value is "sliding" and "output" value is "characters", then the function automatically sets "output.frequency" to the value of "window.size", so that a single character is assigned to each time point. If the number of data points in the time series is not an integer multiple of the window size, then the function ignores the leftover points.
Default Value: 1
Types: numeric
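
To make the interaction of "window.size" and "output.frequency" concrete, the base R sketch below lists the positions at which a sliding window would start. It assumes that only complete windows produce output; the helper window_starts is hypothetical and the exact behavior of td_sax_mle may differ.

```r
# Hypothetical helper: start index of each complete sliding window.
window_starts <- function(n.points, window.size, output.frequency = 1) {
  stopifnot(window.size > 0, output.frequency > 0)
  if (n.points < window.size) {
    return(numeric(0))
  }
  # The window advances output.frequency points between outputs.
  seq(1, n.points - window.size + 1, by = output.frequency)
}

# Overlapping windows: the window slides 2 points at a time.
print(window_starts(10, window.size = 4, output.frequency = 2))

# output.frequency equal to window.size gives non-overlapping windows;
# the last 2 of the 10 points do not fill a window and are left over.
print(window_starts(10, window.size = 4, output.frequency = 4))
```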

points.persymbol

Optional Argument.
Specifies the number of data points to be converted into one SAX symbol. The value must be an integer greater than 0.
Note: "window.type" value must be "global".
Default Value: 1
Types: numeric
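
One common way to convert several data points into one symbol is piecewise aggregate approximation, which averages each group of consecutive points before the symbol is assigned. The base R sketch below assumes that approach and drops any trailing incomplete group; both are assumptions, and paa_sketch is a hypothetical helper rather than a package function.

```r
# Hypothetical helper: average each run of points.persymbol values.
paa_sketch <- function(z, points.persymbol = 1) {
  stopifnot(points.persymbol > 0)
  n.symbols <- length(z) %/% points.persymbol
  # Keep only the points that fill complete groups (assumption).
  z <- z[seq_len(n.symbols * points.persymbol)]
  # Each matrix column holds the points that feed one symbol.
  colMeans(matrix(z, nrow = points.persymbol))
}

print(paa_sketch(c(1, 3, 5, 7, 9, 11), points.persymbol = 2))
```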

symbols.perwindow

Optional Argument.
Specifies the number of SAX symbols to be generated for each window. The value must be an integer greater than 0.
Note: "window.type" value must be "sliding".
Default Value: the value of "window.size"
Types: numeric

alphabet.size

Optional Argument.
Specifies the number of symbols in the SAX alphabet. The value must be an integer in the range [2, 20].
Default Value: 4
Types: numeric

bitmap.level

Optional Argument.
Specifies the number of consecutive symbols to be converted to one symbol on a bitmap. For bitmap level 1, the bitmap contains the symbols "a", "b", "c", and so on; for bitmap level 2, the bitmap contains the symbols "aa", "ab", "ac", and so on. The input value must be an integer in the range [1, 4].
Note: "output" value must be "bitmap".
Default Value: 2
Types: numeric
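
The grouping of consecutive symbols can be sketched in base R by counting the overlapping subwords of a SAX string. Whether td_sax_mle counts overlapping subwords in this way, and how it normalizes the counts in its JSON output, are assumptions; bitmap_counts is a hypothetical helper.

```r
# Hypothetical helper: frequency of each run of bitmap.level symbols.
bitmap_counts <- function(sax.string, bitmap.level = 2) {
  stopifnot(bitmap.level >= 1, bitmap.level <= 4)
  n <- nchar(sax.string)
  if (n < bitmap.level) {
    return(table(character(0)))
  }
  # Extract every overlapping subword of length bitmap.level.
  starts <- seq_len(n - bitmap.level + 1)
  table(substring(sax.string, starts, starts + bitmap.level - 1))
}

# Level 1 counts single symbols; level 2 counts pairs such as "ab".
print(bitmap_counts("abcab", bitmap.level = 1))
print(bitmap_counts("abcab", bitmap.level = 2))
```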

print.stats

Optional Argument.
Specifies whether the function prints the mean and standard deviation.
Note: "output" value must be "string".
Default Value: FALSE
Types: logical

accumulate

Optional Argument.
Specifies the names of the input tbl_teradata columns that are to appear in the output table. For each sequence in the input table, the td_sax_mle function outputs the "accumulate" value that corresponds to the first time point in the sequence.
Types: character OR vector of Strings (character)

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

meanstats.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "meanstats.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

stdevstats.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "stdevstats.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_sax_mle", which is a named list containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

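    # Load the packages that provide td_sax_mle, tbl and the pipe operator.
    library(tdplyr)
    library(dplyr)
    
    # Get the current context/connection
    con <- td_get_context()$connection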
    
    # Load example data.
    loadExampleData("sax_example", "finance_data3")
    
    # Create remote tibble objects.
    finance_data3 <- tbl(con, "finance_data3")
    
    # Example 1 - This example uses window.type as global and default output value.
    td_sax_out <- td_sax_mle(data = finance_data3,
                         data.partition.column = c("id"),
                         data.order.column = c("period"),
                         value.columns = c("expenditure","income","investment"),
                         time.column = "period",
                         window.type = "global",
                         print.stats = TRUE,
                         accumulate = c("id")
                        )
    
    # Example 2 - This example uses window.type as sliding and default output value.
    # window.size should also be specified when window.type is set as sliding.
    td_sax_out2 <- td_sax_mle(data = finance_data3,
                          data.partition.column = c("id"),
                          data.order.column = c("period"),
                          value.columns = c("expenditure"),
                          time.column = "period",
                          window.type = "sliding",
                          window.size = 20,
                          print.stats = TRUE,
                          accumulate = c("id")
                         )
    
    # Example 3 - This example uses the multiple-input version, where the
    # mean and standard deviation statistics are applied globally with
    # the meanstats and stdevstats tables.
    meanstats <- tbl(con, "finance_data3") %>%
        group_by(id) %>%
        summarize(expenditure = mean(expenditure, na.rm = TRUE),
                  income = mean(income, na.rm = TRUE),
                  investment = mean(investment, na.rm = TRUE))
    stdevstats <- tbl(con, "finance_data3") %>%
        group_by(id) %>%
        summarize(expenditure = sd(expenditure, na.rm = TRUE),
                  income = sd(income, na.rm = TRUE),
                  investment = sd(investment, na.rm = TRUE))
    
    td_sax_out3 <- td_sax_mle(data = finance_data3,
                          data.partition.column = c("id"),
                          data.order.column = c("period"),
                          meanstats.data = meanstats,
                          meanstats.data.partition.column = c("id"),
                          stdevstats.data = stdevstats,
                          stdevstats.data.partition.column = c("id"),
                          value.columns = c("expenditure","income","investment"),
                          time.column = "period",
                          window.type = "global",
                          accumulate = c("id")
                         )