Teradata R Package Function Reference | 17.00 - CCM - Teradata R Package

Teradata® R Package Function Reference

prodname: Teradata R Package
vrm_release: 17.00
created_date: September 2020
category: Programming Reference
featnum: B700-4007-090K

Description

The CCM function takes two or more time series as input and evaluates potential cause-effect relationships between them. Each time series column can be a single, long time series or a set of shorter subsequences that represent the same process. The function returns an effect size for each cause-effect pair.

Usage

  td_ccm_mle (
      data = NULL,
      sequence.id.column = NULL,
      time.column = NULL,
      cause.columns = NULL,
      effect.columns = NULL,
      library.size = 100,
      embedding.dimension = 2,
      time.step = 1,
      bootstrap.iterations = 100,
      predict.step = 1,
      self.predict = FALSE,
      seed = NULL,
      point.select.rule = "DistanceOnly",
      mode = "Single",
      data.sequence.column = NULL
  )

Arguments

data

Required Argument.
Specifies the tbl_teradata containing the input data.

sequence.id.column

Required Argument.
Specifies the column containing the sequence IDs. A sequence is a sample of the time series.
Types: character

time.column

Required Argument.
Specifies the column containing the timestamps.
Types: character

cause.columns

Required Argument.
Specifies the columns to be evaluated as potential causes.
Types: character OR vector of Strings (character)

effect.columns

Required Argument.
Specifies the columns to be evaluated as potential effects.
Types: character OR vector of Strings (character)

library.size

Optional Argument.
Specifies the library sizes to use. The CCM algorithm uses "libraries" of randomly selected points along the potential effect time series to predict values of the cause time series. A causal relationship is said to exist if the correlation between the predicted and actual values of the cause time series increases as the library size increases. Each input value must be greater than 0.
Default Value: 100
Types: integer OR vector of integers
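
The convergence idea described above can be illustrated with a minimal base R sketch. It is not the td_ccm_mle implementation and does not run on Teradata: the simulated series, the helper cross_map_skill(), the neighbor weighting, and the choice of 20 repetitions are all assumptions made for illustration.

```r
# Illustrative data only: two coupled logistic maps in which x drives y.
n <- 400
x <- numeric(n)
y <- numeric(n)
x[1] <- 0.4
y[1] <- 0.2
for (t in 1:(n - 1)) {
  x[t + 1] <- x[t] * (3.8 - 3.8 * x[t])
  y[t + 1] <- y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
}

# Hypothetical helper: predict the cause from a random library of points on
# the delay-embedded effect series, then correlate predictions with actuals.
cross_map_skill <- function(cause, effect, lib_size, E = 2, tau = 1) {
  first <- 1 + (E - 1) * tau
  idx <- first:length(effect)
  # Row i of M is (effect[t], effect[t - tau], ..., effect[t - (E - 1) * tau]).
  M <- do.call(cbind, lapply(0:(E - 1), function(k) effect[idx - k * tau]))
  lib <- sample(seq_along(idx), lib_size)       # random library
  pred <- vapply(seq_along(idx), function(i) {
    cand <- setdiff(lib, i)                     # leave the target point out
    d <- sqrt(colSums((t(M[cand, , drop = FALSE]) - M[i, ])^2))
    nn <- order(d)[seq_len(E + 1)]              # E + 1 nearest neighbors
    w <- exp(-d[nn] / max(d[nn[1]], 1e-12))     # closer neighbors weigh more
    sum(w * cause[idx[cand[nn]]]) / sum(w)
  }, numeric(1))
  cor(pred, cause[idx])
}

set.seed(1)
sizes <- c(10, 50, 200)
# Average the skill over several random libraries per size.
skill <- sapply(sizes, function(L) mean(replicate(20, cross_map_skill(x, y, L))))
print(round(skill, 3))
```

If x really drives y, the mean skill would be expected to rise as the library grows. Passing several values to library.size lets you look for the same pattern in the function's output.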

embedding.dimension

Optional Argument.
Specifies an estimate of the number of past values to use when predicting a given value of the time series. The input value must be greater than 0.
Default Value: 2
Types: integer OR vector of integers

time.step

Optional Argument.
Specifies the number of time steps between past values to use when predicting a given value of the time series. The input value must be greater than 0.
Default Value: 1
Types: integer
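
To make embedding.dimension and time.step concrete, the following base R sketch builds lagged vectors from a short series. The helper delay_embed() is hypothetical, and the lag convention it uses (current value first, then values time.step apart) is an assumption; this page does not state how td_ccm_mle orders or constructs its embedding.

```r
# Hypothetical helper: E past values spaced tau steps apart for each time t.
delay_embed <- function(x, E = 2, tau = 1) {
  first <- 1 + (E - 1) * tau                # earliest t with a full history
  stopifnot(E > 0, tau > 0, length(x) >= first)
  idx <- first:length(x)
  M <- do.call(cbind, lapply(0:(E - 1), function(k) x[idx - k * tau]))
  colnames(M) <- paste0("lag", (0:(E - 1)) * tau)
  M
}

# E = 3 values per point, taken 2 steps apart.
emb <- delay_embed(c(10, 20, 30, 40, 50, 60), E = 3, tau = 2)
print(emb)
# Row 1 is (50, 30, 10) and row 2 is (60, 40, 20).
```

Larger values of either argument shorten the usable part of each sequence, because the first (embedding.dimension - 1) * time.step observations have no complete history under this convention.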

bootstrap.iterations

Optional Argument.
Specifies the number of bootstrap iterations used for prediction. The bootstrap process estimates the uncertainty associated with the predicted values. The input value must be greater than 0.
Default Value: 100
Types: integer
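
As a general illustration of how bootstrapping estimates uncertainty, the base R sketch below resamples simulated predicted/actual pairs and summarizes the spread of their correlation. How td_ccm_mle resamples internally is not described on this page, so the data, the pair-resampling scheme, and the percentile interval are assumptions for illustration only.

```r
set.seed(42)
n <- 200
actual <- rnorm(n)                          # simulated "actual" values
predicted <- actual + rnorm(n, sd = 0.5)    # simulated noisy predictions

B <- 100                                    # number of bootstrap iterations
boot_cor <- replicate(B, {
  i <- sample(n, n, replace = TRUE)         # resample pairs with replacement
  cor(predicted[i], actual[i])
})

# The spread of the resampled correlations approximates their uncertainty.
boot_se <- sd(boot_cor)
boot_ci <- quantile(boot_cor, c(0.025, 0.975))
print(boot_se)
print(boot_ci)
```

More iterations give a steadier estimate of the spread at the cost of more computation, which is the trade-off the bootstrap.iterations argument controls.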

predict.step

Optional Argument.
Specifies the number of time steps into the future for which predictions are made from past observations. Use this argument when the best embedding dimension needs to be chosen.
Default Value: 1
Types: integer

self.predict

Optional Argument.
Specifies whether the function attempts to predict each attribute using the attribute itself. If an attribute can predict its own time series well, the signal-to-noise ratio is too low for the CCM algorithm to work effectively.
Default Value: FALSE
Types: logical

seed

Optional Argument.
Specifies the random seed used to initialize the algorithm.
Types: numeric

point.select.rule

Optional Argument.
Specifies the rule for selecting nearest points when the best embedding dimension needs to be chosen.
Default Value: "DistanceOnly"
Permitted Values: DistanceAndTime, DistanceOnly
Types: character

mode

Optional Argument.
Specifies the execution mode. CCM can be executed in either single mode or distribute mode.
Default Value: "Single"
Permitted Values: Single, Distribute
Types: character

data.sequence.column

Optional Argument.
Specifies the column or vector of columns that uniquely identifies each row of the input argument "data". This argument ensures deterministic results for functions whose results otherwise vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_ccm_mle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("ccmprepare_example", "ccmprepare_input")
    loadExampleData("ccm_example", "ccm_input", "ccm_input2")

    # Load the time series datasets.
    ccm_input <- tbl(con, "ccm_input")
    ccm_input2 <- tbl(con, "ccm_input2")
    ccmprepare_input <- tbl(con, "ccmprepare_input")

    # Example 1: Find cause-effect relationships between the income, expenditure,
    # and investment fields.
    td_ccm_out <- td_ccm_mle(data = ccm_input,
                             sequence.id.column = "id",
                             time.column = "period",
                             cause.columns = c("income"),
                             effect.columns = c("expenditure","investment"),
                             seed = 0
                             )
                             
    # Example 2: Produce the same output as Example 1 by first running
    # td_ccm_prepare_mle() and then using its output as input for td_ccm_mle().
    td_ccm_prepare_out <- td_ccm_prepare_mle(data = ccmprepare_input,
                                             data.partition.column = "id"
                                            )
    td_ccm_out1 <- td_ccm_mle(data = td_ccm_prepare_out$result,
                              sequence.id.column = "id",
                              time.column = "period",
                              cause.columns = c("income"),
                              effect.columns = c("expenditure","investment"),
                              seed = 0
                              )


    # Example 3: Find cause-effect relationships in sample market time series data.
    td_ccm_out2 <- td_ccm_mle(data = ccm_input2,
                              sequence.id.column = "id",
                              time.column = "period",
                              cause.columns = c("marketindex","indexval"),
                              effect.columns = c("indexdate","indexchange"),
                              library.size = 10,
                              seed = 0
                              )