Change-Point Detection Functions (ML Engine)

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

9.02

9.01

2.0

1.3

Published

February 2022

Language

English (United States)

Last Update

2022-02-10

dita:id

B700-4003

Product Category

Teradata Vantage™

Change-point detection functions detect the change points in a stochastic process or time series. These functions take sorted time series data as input and output change points or data segments.

In statistical analysis, change detection (or change-point detection) attempts to identify abrupt changes in a stochastic process or time series.

Consider the following ordered time series data sequence, where t is a time variable:

y(t), t=1, 2, ..., n

Change-point detection tries to find a segmented model M, given by the following equation:

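\[
y(t) =
\begin{cases}
f_1(t, w_1) + e_1(t), & 1 \le t \le \tau_1 \\
f_2(t, w_2) + e_2(t), & \tau_1 < t \le \tau_2 \\
\quad\vdots \\
f_k(t, w_k) + e_k(t), & \tau_{k-1} < t \le \tau_k \\
f_{k+1}(t, w_{k+1}) + e_{k+1}(t), & \tau_k < t \le n
\end{cases}
\]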

where:

fi(t, wi) is the function (with its parameter vector wi) that is fitted to segment i.
Each τi is the change point between successive segments.
Each ei(t) is an error term.
n is the length of the data series and k is the number of change points.
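The segmented model above can be illustrated with a minimal sketch. It assumes the simplest possible choice of fi, a constant level per segment, and leaves out the error terms. The function name segmented_model and its arguments are hypothetical and are not part of any Teradata function.

```python
def segmented_model(t, change_points, segment_means):
    """Return the noise-free model value for the 1-based time index t.

    change_points: sorted list [tau_1, ..., tau_k]
    segment_means: k + 1 constants, one per segment (an assumed, simple f_i)
    """
    if len(segment_means) != len(change_points) + 1:
        raise ValueError("need exactly one mean per segment")
    for i, tau in enumerate(change_points):
        # Segment i covers tau_(i-1) < t <= tau_i
        if t <= tau:
            return segment_means[i]
    # t lies after the last change point: final segment
    return segment_means[-1]


# Two change points (tau_1 = 3, tau_2 = 6) split n = 8 points into 3 segments.
values = [segmented_model(t, [3, 6], [0.0, 5.0, 2.0]) for t in range(1, 9)]
print(values)  # [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 2.0, 2.0]
```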

Segmentation model selection aims to find the function fi(t, wi) that best approximates the data in each segment. Various segment models have been proposed. According to the literature, the most commonly used one is the normal distribution.
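As an illustration of a normal-distribution segment model, the following sketch scores one segment by twice its negative maximized Gaussian log-likelihood, with the mean and variance estimated from the segment itself. The name normal_cost and the variance floor of 1e-12 are assumptions made for this example. The sketch does not describe how the ML Engine functions compute their cost.

```python
import math


def normal_cost(segment):
    """Twice the negative maximized Gaussian log-likelihood of one segment."""
    m = len(segment)
    mean = sum(segment) / m
    var = sum((y - mean) ** 2 for y in segment) / m
    var = max(var, 1e-12)  # assumed floor: avoids log(0) for constant segments
    # -2 log L = m * log(2 * pi * var) + sum((y - mean)^2) / var
    #          = m * (log(2 * pi * var) + 1)
    return m * (math.log(2 * math.pi * var) + 1)


data = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
whole = normal_cost(data)
split = normal_cost(data[:4]) + normal_cost(data[4:])
print(split < whole)  # True: two segments fit this series better than one
```

A lower cost means a better fit, so splitting at the level shift is preferred here before any penalty is applied.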

Search method selection aims to find the change points from a global perspective.

With τ0 = 0 and τk+1 = n, one common method of identifying the change points, used by the Machine Learning Engine change-point detection functions, is to minimize this value:

\[
\sum_{i=1}^{k+1} C\left(y_{(\tau_{i-1}+1):\tau_i}\right) + \beta f(k)
\]

C is a cost function that measures, for one segment, the difference between fi(t, wi) and the original data. βf(k) is a penalty that guards against overfitting. A common choice is a penalty that is linear in the number of change points k; that is, βf(k) = βk. Information criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be used to set β.
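To show how the penalized objective trades fit against the number of change points, here is a minimal sketch that minimizes the sum of segment costs plus βk by exhaustive dynamic programming. It assumes a squared-error cost around the segment mean. The names sq_error_cost and detect_change_points are hypothetical, and the search strategy is chosen for clarity. It is not a description of the algorithm inside the ML Engine functions.

```python
def sq_error_cost(segment):
    """Sum of squared deviations from the segment mean (assumed cost C)."""
    mean = sum(segment) / len(segment)
    return sum((y - mean) ** 2 for y in segment)


def detect_change_points(y, beta):
    """Return change points tau_1..tau_k (1-based last index of each segment)."""
    n = len(y)
    # best[t] is the minimal penalized cost of the first t points.
    # Starting at -beta cancels the penalty added for the first segment,
    # so the total penalty is beta * k rather than beta * (k + 1).
    best = [0.0] * (n + 1)
    best[0] = -beta
    last = [0] * (n + 1)  # last[t]: start offset of the final segment of y[:t]
    for t in range(1, n + 1):
        candidates = [
            (best[s] + sq_error_cost(y[s:t]) + beta, s) for s in range(t)
        ]
        best[t], last[t] = min(candidates)
    # Walk back through the stored segment starts to recover the change points.
    taus = []
    t = n
    while last[t] > 0:
        t = last[t]
        taus.append(t)
    return sorted(taus)


series = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
print(detect_change_points(series, beta=1.0))     # [3, 6]
print(detect_change_points(series, beta=1000.0))  # []
```

With a small β the two level shifts are reported. With a very large β the penalty outweighs the improvement in fit, and no change point is returned.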

For AIC, β=2p, where p is the number of additional parameters introduced by adding a change point.

For BIC (also called SBIC), β = p log(n).
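The two penalties can be compared numerically. The sketch below assumes that log is the natural logarithm, which is the usual convention for BIC. The function names are hypothetical.

```python
import math


def aic_penalty(p):
    """Per-change-point penalty beta under AIC."""
    return 2 * p


def bic_penalty(p, n):
    """Per-change-point penalty beta under BIC (natural log assumed)."""
    return p * math.log(n)


p, n = 2, 1000
print(aic_penalty(p))                        # 4
print(bic_penalty(p, n) > aic_penalty(p))    # True
```

Under this assumption BIC penalizes each change point more heavily than AIC whenever log(n) > 2, that is, for series of 8 or more points, so BIC tends to report fewer change points on long series.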

Function	Description
ChangePointDetection (ML Engine)	Use when the input data can be stored in memory.
ChangePointDetectionRT (ML Engine)	Use when the input data cannot be stored in memory or the application needs a real-time response.