ChangePointDetection

Teradata® R Package Function Reference

Product: Teradata R Package
Release Number: 16.20
Published: February 2020
Language: English (United States)
Last Update: 2020-02-28
dita:id: B700-4007
lifecycle: previous
Product Category: Teradata Vantage

Description

The ChangePointDetection function detects change points in a stochastic process or time series, using retrospective change-point detection, implemented with these algorithms:

  1. Search algorithm: binary search

  2. Segmentation algorithm: normal distribution and linear regression

The function takes sorted time series data as input and generates change points or data segments as output.
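To make the idea of binary search with a normal-distribution segment model concrete, here is a minimal base R sketch. It is not the Teradata implementation. The helper names (seg_loglik, best_split, binary_segment), the minimum segment length, and the choice to report the last index of the left segment as the change point are all assumptions made for illustration, and the sketch does not enforce a maximum number of change points.

```r
# Gaussian log-likelihood of a segment, evaluated at its maximum-likelihood
# mean and variance (hypothetical helper, not part of the Teradata R Package).
seg_loglik <- function(x) {
  n <- length(x)
  v <- max(mean((x - mean(x))^2), 1e-8)  # MLE variance, floored to avoid log(0)
  -0.5 * n * (log(2 * pi * v) + 1)
}

# Best single split of x: the split index and the log-likelihood gain
# ln(L1) - ln(L0) of "two segments" over "one segment".
best_split <- function(x, min_len = 5) {
  n <- length(x)
  if (n < 2 * min_len) return(NULL)
  cand <- min_len:(n - min_len)
  gain <- vapply(cand, function(k) {
    seg_loglik(x[1:k]) + seg_loglik(x[(k + 1):n]) - seg_loglik(x)
  }, numeric(1))
  list(k = cand[which.max(gain)], gain = max(gain))
}

# Recursive binary segmentation: keep splitting while the gain exceeds the penalty.
binary_segment <- function(x, penalty, offset = 0, min_len = 5) {
  s <- best_split(x, min_len)
  if (is.null(s) || s$gain <= penalty) return(integer(0))
  left  <- binary_segment(x[1:s$k], penalty, offset, min_len)
  right <- binary_segment(x[(s$k + 1):length(x)], penalty, offset + s$k, min_len)
  c(left, offset + s$k, right)
}

# Deterministic toy series: the level jumps by 10 after the 40th observation.
x <- c(rep(c(0, 1), 20), rep(c(10, 11), 20))
cps <- binary_segment(x, penalty = log(length(x)))  # BIC-style penalty ln(n)
print(cps)
```

On this toy series the only split whose gain exceeds the penalty is the one after observation 40, so the sketch reports a single change point there.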

Usage

  td_changepoint_detection_mle (
      data = NULL,
      data.partition.column = NULL,
      data.order.column = NULL,
      value.column = NULL,
      accumulate = NULL,
      segmentation.method = "normal_distribution",
      search.method = "binary",
      max.change.num = 10,
      penalty = "BIC",
      output.option = "changepoint",
      data.sequence.column = NULL
  )

Arguments

data

Required Argument.
Specifies input table containing the input time series data.

data.partition.column

Required Argument.
Specifies partition by columns for data.
Values to this argument can be provided as vector, if multiple columns are used for partition.

data.order.column

Required Argument.
Specifies order by columns for data.
Values to this argument can be provided as vector, if multiple columns are used for ordering.

value.column

Required Argument.
Specifies the name of the input table column that contains the time series data.

accumulate

Required Argument.
Specifies the names of the input table columns to copy to the output table.
Tip: To identify change points in the output table, specify the columns that appear in 'data.partition.column' and 'data.order.column'.

segmentation.method

Optional Argument.
Specifies one of these segmentation methods:

  1. normal_distribution (default): In each segment, the data is assumed to follow a normal distribution.

  2. linear_regression: In each segment, the data is assumed to follow a linear regression model.


Default Value: "normal_distribution"
Permitted Values: normal_distribution, linear_regression
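The difference between the two segment models can be illustrated locally with base R. This sketch only shows the modeling assumption behind each option; it does not call the Teradata function, and the toy data and variable names are assumptions made for illustration.

```r
# A single segment with a steady upward trend plus a small deterministic wiggle.
t <- 1:20
y <- 2 * t + rep(c(-0.5, 0.5), 10)

# normal_distribution-style model: constant mean, constant variance.
fit_const <- lm(y ~ 1)

# linear_regression-style model: straight line in t, constant variance.
fit_line <- lm(y ~ t)

# Maximized Gaussian log-likelihood of each model on this segment.
ll_const <- as.numeric(logLik(fit_const))
ll_line  <- as.numeric(logLik(fit_line))
print(c(constant = ll_const, line = ll_line))
```

On trending data the line model attains a much higher log-likelihood. Under the constant-mean model, the same data tends to be cut into several short segments, which is why linear_regression may be the better fit for series with trends.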

search.method

Optional Argument.
Specifies the search method, binary search. This is the default and only possible value.
Permitted Values: binary

max.change.num

Optional Argument.
Specifies the maximum number of change points to detect.
Default Value: 10

penalty

Optional Argument.
Specifies the penalty function, which is used to avoid over-fitting. Possible values are:

  1. BIC - A change point exists if: ln(L1)-ln(L0) > (p1-p0)*ln(n)/2.
    For normal distribution and linear regression, p1-p0 = 2, so the threshold is: (p1-p0)*ln(n)/2 = ln(n).

  2. AIC - A change point exists if: ln(L1)-ln(L0) > p1-p0.
    For normal distribution and linear regression, the threshold is: p1-p0 = 2.

  3. threshold (numeric value) - The specified value is compared to: ln(L1)-ln(L0).

L1 and L0 are the maximum likelihood estimates under hypotheses H1 and H0. p is the number of additional parameters introduced by adding a change point, and is used in the information criterion BIC or AIC. p1 and p0 represent this parameter in hypotheses H1 and H0, respectively.

Default Value: "BIC"
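As a worked illustration of the three penalty options, the base R sketch below computes the threshold that ln(L1)-ln(L0) must exceed. The helper penalty_threshold is hypothetical and not part of the package. It assumes that n is the number of observations being tested (the text above does not define n) and that p1-p0 = 2, as stated for both segmentation methods.

```r
# Threshold on ln(L1) - ln(L0) for each documented penalty option
# (hypothetical helper; p_diff stands for p1 - p0).
penalty_threshold <- function(penalty, n, p_diff = 2) {
  if (is.numeric(penalty)) return(penalty)   # explicit numeric threshold
  switch(penalty,
         BIC = p_diff * log(n) / 2,          # equals ln(n) when p_diff = 2
         AIC = p_diff,
         as.numeric(penalty))                # numeric string such as "10"
}

n <- 100                                     # assumed number of observations
bic <- penalty_threshold("BIC", n)
aic <- penalty_threshold("AIC", n)
thr <- penalty_threshold("10", n)

gain <- 6                                    # hypothetical ln(L1) - ln(L0)
decision <- c(BIC = gain > bic, AIC = gain > aic, threshold = gain > thr)
print(decision)
```

With n = 100, the BIC threshold is ln(100), about 4.61, and the AIC threshold is 2. A gain of 6 would therefore be accepted as a change point under BIC and AIC, but rejected under a threshold of 10.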

output.option

Optional Argument.
Specifies the output table columns.
Default Value: "changepoint"
Permitted Values: changepoint, segment, verbose

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

Value

The function returns an object of class "td_changepoint_detection_mle", which is a named list containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator, using the name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("changepointdetection_example", "finance_data2" , "cpt")
    
    # Create remote tibble objects.
    # The input signal is like a clock signal whose values can represent a
    # cyclic recurrence of an event (for example, electric power consumption
    # at certain periods or in sequence, pulse rate, and so on).
    cpt <- tbl(con, "cpt")
    
    # Input contains two time series of finance data.
    finance_data2 <- tbl(con,"finance_data2")
    
    # Example 1 - Two Series, Default Options
    td_changepoint_detection_out1 <- td_changepoint_detection_mle(data=finance_data2,
                                                             data.partition.column='sid',
                                                             data.order.column='id',
                                                             value.column='expenditure',
                                                             accumulate=c('sid','id','expenditure'))
    
    # Example 2 - One Series, Default Options
    td_changepoint_detection_out2 <- td_changepoint_detection_mle(data=cpt,
                                                             data.partition.column="sid",
                                                             data.order.column="id",
                                                             value.column = "val",
                                                             accumulate = c("sid", "id")
                                                             )
    
    # Example 3 - One Series, VERBOSE Output
    td_changepoint_detection_out3 <- td_changepoint_detection_mle(data=cpt,
                                                             value.column = "val",
                                                             data.partition.column="sid",
                                                             data.order.column="id",
                                                             accumulate = c("sid","id"),
                                                             output.option = "verbose"
                                                             )
    
    # Example 4 - One Series, Penalty 10
    td_changepoint_detection_out4 <- td_changepoint_detection_mle(data=cpt,
                                                             data.partition.column="sid",
                                                             data.order.column="id",
                                                             value.column = "val",
                                                             accumulate = c("sid","id"),
                                                             penalty = "10"
                                                             )
    
    # Example 5 - One Series, SEGMENT Output, Penalty 10
    td_changepoint_detection_out5 <- td_changepoint_detection_mle(data=cpt, 
                                                             data.partition.column="sid",
                                                             data.order.column="id",
                                                             value.column = "val",
                                                             accumulate = c("sid","id"),
                                                             penalty = "10",
                                                             output.option = "segment"
                                                             )
    
    # Example 6 - One Series, Penalty 20, Linear Regression
    td_changepoint_detection_out6 <- td_changepoint_detection_mle(data=cpt,
                                                             data.partition.column="sid",
                                                             data.order.column="id",
                                                             value.column = "val",
                                                             accumulate = c("sid","id"),
                                                             segmentation.method = "linear_regression",
                                                             penalty = "20"
                                                             )