ChangePointDetection Function | Teradata Vantage - ChangePointDetection (ML Engine)


Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10

dita:id: B700-4003

Product Category: Teradata Vantage™

Binary segmentation, the change-point search method used for retrospective change-point detection, uses this procedure:

1. Search the data for the first change point.
2. At that change point, split the data into two parts.
3. In each part, select the change point with the minimum loss.
4. Repeat this procedure until no new change point is found or the maximum number of change points is reached.
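The procedure above can be sketched in Python as follows. This is a minimal sketch, not the function's implementation: it assumes a simple mean-shift loss (the sum of squared deviations from each segment's mean) in place of the function's normal-distribution and linear-regression models, and it treats a split as significant only when its loss reduction exceeds a fixed `penalty`. The names `sse`, `best_split`, `binary_segmentation`, `penalty`, and `max_cps` are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(x: List[float]) -> float:
    """Sum of squared deviations from the mean (stand-in segment loss)."""
    if not x:
        return 0.0
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x)


def best_split(x: List[float], start: int, end: int) -> Optional[Tuple[float, int]]:
    """Best single split of x[start:end] as (loss reduction, k).

    The split puts x[start:k] and x[k:end] into separate segments.
    Returns None if the segment is too short to split.
    """
    whole = sse(x[start:end])
    best = None
    for k in range(start + 1, end):
        gain = whole - sse(x[start:k]) - sse(x[k:end])
        if best is None or gain > best[0]:
            best = (gain, k)
    return best


def binary_segmentation(x: List[float], penalty: float, max_cps: int) -> List[int]:
    """Return 0-based indices where a new segment starts."""
    segments = [(0, len(x))]
    change_points: List[int] = []
    while len(change_points) < max_cps:
        # Find the best split inside every current segment (part).
        candidates = []
        for s, e in segments:
            res = best_split(x, s, e)
            if res is not None:
                candidates.append((res[0], res[1], (s, e)))
        if not candidates:
            break
        # Largest loss reduction = smallest total loss after splitting.
        gain, k, (s, e) = max(candidates)
        if gain <= penalty:
            break  # no new change point is significant enough
        change_points.append(k)
        segments.remove((s, e))
        segments.extend([(s, k), (k, e)])
    return sorted(change_points)
```

For example, `binary_segmentation([0, 0, 0, 0, 5, 5, 5, 5, 1, 1, 1, 1], penalty=1.0, max_cps=5)` returns `[4, 8]`. Recomputing every segment's best split on each pass keeps the sketch short; a practical version would cache each segment's candidate, because only the two new parts change after a split.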

Binary segmentation is an approximate method because each change point after the first is chosen using only part of the data. However, the method is efficient, with an O(n log n) computational cost, where n is the number of data points.
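The O(n log n) figure relies on scanning all split points of a segment in time linear in the segment's length. One common way to achieve this for a mean-shift loss, shown here as a hypothetical illustration rather than the function's code, is to precompute running sums of x and x², so that the loss of any segment takes O(1) time. If the splits are roughly balanced, the recursion is about log n levels deep and each level scans O(n) points in total; a run of very unbalanced splits costs more. The names `prefix_sums`, `segment_sse`, and `best_split_linear` are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def prefix_sums(x: List[float]) -> Tuple[List[float], List[float]]:
    """Running sums of x and of x**2, each with a leading 0."""
    s = [0.0] + list(accumulate(x))
    s2 = [0.0] + list(accumulate(v * v for v in x))
    return s, s2


def segment_sse(s: List[float], s2: List[float], i: int, j: int) -> float:
    """Sum of squared deviations of x[i:j] in O(1) time (requires j > i)."""
    n = j - i
    total = s[j] - s[i]
    return (s2[j] - s2[i]) - total * total / n


def best_split_linear(s: List[float], s2: List[float],
                      start: int, end: int) -> Tuple[float, Optional[int]]:
    """Scan every split point of x[start:end] in O(end - start) time.

    Returns (loss reduction, k); k is None if the segment has length 1.
    """
    whole = segment_sse(s, s2, start, end)
    best_gain, best_k = float("-inf"), None
    for k in range(start + 1, end):
        gain = whole - segment_sse(s, s2, start, k) - segment_sse(s, s2, k, end)
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_gain, best_k
```

The running-sum formula can lose floating-point precision when values are large relative to their spread; it is used here only to show why each scan is linear.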

[Figure: binary segmentation, as used by the ChangePointDetection function]

Taking the normal distribution as an example, the change-point problem is to test the following null hypothesis:

$$H_0:\ \mu = \mu_1 = \mu_2 = \cdots = \mu_n \quad\text{and}\quad \sigma^2 = \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2$$

against the alternative hypothesis:

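$$H_1:\ \mu_1 = \cdots = \mu_{k_1} \neq \mu_{k_1+1} = \cdots = \mu_{k_2} \neq \cdots \neq \mu_{k_q+1} = \cdots = \mu_n$$

and

$$\sigma_1^2 = \cdots = \sigma_{k_1}^2 \neq \sigma_{k_1+1}^2 = \cdots = \sigma_{k_2}^2 \neq \cdots \neq \sigma_{k_q+1}^2 = \cdots = \sigma_n^2$$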

In each iteration, binary segmentation tests H0 against an alternative with a single change point k1:

$$H_1:\ \mu_1 = \cdots = \mu_{k_1} \neq \mu_{k_1+1} = \cdots = \mu_n$$

and

$$\sigma_1^2 = \cdots = \sigma_{k_1}^2 \neq \sigma_{k_1+1}^2 = \cdots = \sigma_n^2$$

These are the formulas for the log-likelihood functions under H0 and H1:

[Figure: log-likelihood formulas LogL0 (under H0) and LogL1 (under H1), as used by the ChangePointDetection function]
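In the original page these formulas are figures. For reference, the standard log-likelihoods for n independent normal observations $x_1, \dots, x_n$ are written out below: under H0 with common mean $\mu$ and variance $\sigma^2$, and under the single-change alternative with change point $k$, where the first segment $x_1, \dots, x_k$ has mean $\mu_A$ and variance $\sigma_A^2$ and the second segment $x_{k+1}, \dots, x_n$ has mean $\mu_B$ and variance $\sigma_B^2$. The labels A and B are hypothetical notation, and these standard forms are assumed, not confirmed by the source, to match the function's formulas up to notation.

```latex
\begin{aligned}
\log L_0 &= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
            - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
\log L_1(k) &= -\frac{n}{2}\log(2\pi)
            - \frac{k}{2}\log\sigma_A^2 - \frac{1}{2\sigma_A^2}\sum_{i=1}^{k}(x_i - \mu_A)^2 \\
          &\quad - \frac{n-k}{2}\log\sigma_B^2 - \frac{1}{2\sigma_B^2}\sum_{i=k+1}^{n}(x_i - \mu_B)^2
\end{aligned}
```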

These are the formulas for the maximum likelihood estimates of μ and σ²:

[Figure: maximum likelihood estimates of μ and σ², as used by the ChangePointDetection function]
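For a normal sample, the maximum likelihood estimates are the sample mean and the sample variance with divisor n (not n - 1). Substituting them back into the normal log-likelihood gives a closed form, which reduces the comparison of H1 with H0 to a comparison of log-variances. The derivation below is standard normal-theory algebra, assumed rather than confirmed to match the function's figures; the labels A (for $x_1, \dots, x_k$) and B (for $x_{k+1}, \dots, x_n$) are hypothetical notation, with $\hat\sigma_A^2$ and $\hat\sigma_B^2$ computed over each segment with divisors $k$ and $n-k$.

```latex
\begin{aligned}
\hat\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu)^2 \\
\max_{\mu,\,\sigma^2} \log L_0 &= -\frac{n}{2}\left(\log(2\pi) + \log\hat\sigma^2 + 1\right) \\
\max \log L_1(k) &= -\frac{n}{2}\log(2\pi)
   - \frac{k}{2}\left(\log\hat\sigma_A^2 + 1\right)
   - \frac{n-k}{2}\left(\log\hat\sigma_B^2 + 1\right) \\
\max \log L_1(k) - \max \log L_0 &= \frac{n}{2}\log\hat\sigma^2
   - \frac{k}{2}\log\hat\sigma_A^2 - \frac{n-k}{2}\log\hat\sigma_B^2
\end{aligned}
```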

Using the preceding formulas, the binary segmentation algorithm evaluates LogL1 for each candidate value of k and takes the maximum, max LogL1. To check for a change point, it then compares the difference between max LogL1 and LogL0 with the penalty value; a change point is detected when the difference exceeds the penalty.
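The comparison described above can be sketched in Python for a single split, assuming normal segments and the standard maximum-likelihood log-likelihoods. `penalty` is a plain number here because the source does not say how the function sets it, and the minimum segment length `min_size` (2 by default, so that each segment's variance can be estimated) is a hypothetical choice; `mle_var`, `max_loglik`, and `gaussian_change_test` are likewise hypothetical names.

```python
import math
from typing import List, Optional, Tuple


def mle_var(x: List[float]) -> float:
    """Maximum likelihood variance: divides by len(x), not len(x) - 1."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def max_loglik(x: List[float]) -> float:
    """Normal log-likelihood of x evaluated at the MLEs of mu and sigma^2."""
    n = len(x)
    return -0.5 * n * (math.log(2 * math.pi) + math.log(mle_var(x)) + 1)


def gaussian_change_test(x: List[float], penalty: float,
                         min_size: int = 2) -> Optional[Tuple[int, float]]:
    """Test for one change in mean and variance.

    Returns (k, max LogL1 - LogL0) if that difference exceeds `penalty`,
    where x[:k] and x[k:] are the two segments; otherwise returns None.
    """
    if len(x) < 2 * min_size or mle_var(x) == 0:
        return None  # too short, or constant data: nothing to split
    log_l0 = max_loglik(x)
    best_k, best_l1 = None, float("-inf")
    for k in range(min_size, len(x) - min_size + 1):
        left, right = x[:k], x[k:]
        if mle_var(left) == 0 or mle_var(right) == 0:
            continue  # log(0) is undefined; skip zero-variance segments
        # LogL1 for this k is the sum of the two segments' log-likelihoods.
        l1 = max_loglik(left) + max_loglik(right)
        if l1 > best_l1:
            best_k, best_l1 = k, l1
    if best_k is not None and best_l1 - log_l0 > penalty:
        return best_k, best_l1 - log_l0
    return None
```

With `x = [0.1, -0.2, 0.0, 0.2, -0.1, 5.1, 4.8, 5.0, 5.2, 4.9]`, `gaussian_change_test(x, penalty=10.0)` reports the change at index 5, where the level shifts from about 0 to about 5. Inside a binary segmentation loop, the same test would be applied to each part produced by earlier splits.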

If the algorithm detects a change point, it adds that change point to its list of candidate change points and splits the data into two parts. From the candidate change points that the algorithm finds in the two parts, it selects the one with the minimum loss.

The ChangePointDetection function detects change points in a stochastic process or time series using retrospective change-point detection, which it implements with these algorithms:

Search algorithm: binary segmentation
Segmentation algorithm: normal distribution and linear regression

Use this function when the input data can be stored in memory and the application does not require a real-time response. If the input data cannot be stored in memory, or the application requires a real-time response, use the function ChangePointDetectionRT (ML Engine).