In statistical analysis, change detection or change-point detection attempts to identify abrupt changes in a stochastic process or time series.
Consider the following ordered time series data sequence:
y(t), t=1, 2, ..., n
where t is a time variable.
Change-point detection tries to find a segmented model M, given by the following equation:
Y = f_1(t, w_1) + e_1(t),                (0 < t <= τ_1)
  = f_2(t, w_2) + e_2(t),                (τ_1 < t <= τ_2)
  ...
  = f_k(t, w_k) + e_k(t),                (τ_(k-1) < t <= τ_k)
  = f_(k+1)(t, w_(k+1)) + e_(k+1)(t),    (τ_k < t <= n)
- f_i(t, w_i) is the function (with its vector of parameters w_i) fitted to segment i.
- Each τ_i is the change point between successive segments.
- Each e_i(t) is an error term.
- n is the size of the data series and k is the number of change points.
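To make the notation concrete, the following sketch (all numbers are illustrative, not from the text) simulates data from such a segmented model with k = 2 change points, constant segment functions f_i(t, w_i) = w_i, and Gaussian error terms e_i(t):

```python
import numpy as np

# Simulate a segmented model: n = 100 points, change points τ_1 = 40 and
# τ_2 = 70, and a constant mean w_i per segment (values chosen arbitrarily).
rng = np.random.default_rng(0)
n = 100
tau = [0, 40, 70, n]          # τ_0 = 0, τ_1, τ_2, τ_3 = n
means = [0.0, 3.0, -1.0]      # f_i(t, w_i) = w_i for each of the 3 segments

y = np.empty(n)
for i in range(len(means)):
    lo, hi = tau[i], tau[i + 1]
    # segment i: its mean function plus Gaussian noise e_i(t)
    y[lo:hi] = means[i] + rng.normal(0.0, 0.5, hi - lo)

print(y.shape)  # (100,)
```

A change-point detector is then asked to recover τ_1 and τ_2 from y alone.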
Segmentation model selection aims to find the function f_i(t, w_i) that best approximates the data in each segment. Various model selection methods have been proposed. According to the literature, the most commonly used model assumes that the data in each segment follow a normal distribution.
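As a sketch of what the normal-distribution model implies in practice (the function name and the small clipping constant are my own choices), it leads to a per-segment cost based on the segment's maximum-likelihood variance:

```python
import numpy as np

def normal_cost(segment):
    # Twice the negative log-likelihood of a segment under a normal model,
    # up to additive constants: m * log(sigma_hat^2), where sigma_hat^2 is
    # the segment's maximum-likelihood variance.
    m = len(segment)
    var = np.var(segment)                # ML variance (divides by m)
    return m * np.log(max(var, 1e-12))   # clip to avoid log(0)

flat = np.array([1.0, 1.0, 1.0, 1.0])
mixed = np.array([0.0, 2.0, 0.0, 2.0])
print(normal_cost(flat) < normal_cost(mixed))  # True: homogeneous data cost less
```

A segment whose points cluster tightly around one value receives a lower cost, which is what makes this quantity usable for comparing candidate segmentations.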
Search method selection aims to find the change points from a global perspective.
If τ_0 = 0 and τ_(k+1) = n, one common method of identifying the change points is to minimize

sum_{i=1}^{k+1} C(y_(τ_(i-1)+1 : τ_i)) + βf(k)

where C is a cost function that measures, for each segment, the difference between f_i(t, w_i) and the original data, and βf(k) is a penalty to guard against overfitting. A common choice is linear in the number of change points k; that is, βf(k) = βk. Information criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used to set the penalty.
For AIC, β = 2p, where p is the number of additional parameters introduced by adding a change point.
For BIC (also called SBIC), β = p log(n).
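Putting the penalty together with a cost function, a brute-force sketch (the squared-error cost and the value p = 2 are illustrative assumptions, not prescriptions from the text) searches for a single change point by minimizing C(segment 1) + C(segment 2) + βk, and compares the result against the no-change baseline:

```python
import numpy as np

def sse_cost(seg):
    # Squared-error cost around the segment mean; equivalent to a normal
    # model with fixed variance.
    return float(np.sum((seg - seg.mean()) ** 2))

def best_single_change_point(y, beta):
    # Exhaustive search over one candidate change point τ, minimizing
    # C(y[:τ]) + C(y[τ:]) + β·k with k = 1, against the k = 0 baseline.
    n = len(y)
    best_tau, best_val = None, sse_cost(y)   # k = 0: no penalty term
    for tau in range(2, n - 1):
        val = sse_cost(y[:tau]) + sse_cost(y[tau:]) + beta
        if val < best_val:
            best_tau, best_val = tau, val
    return best_tau, best_val

y = np.concatenate([np.zeros(50), np.full(50, 2.0)])
p = 2  # assumed number of parameters added per change point
tau_aic, _ = best_single_change_point(y, beta=2 * p)             # AIC: β = 2p
tau_bic, _ = best_single_change_point(y, beta=p * np.log(len(y)))  # BIC: β = p log(n)
print(tau_aic, tau_bic)  # 50 50 on this noiseless example
```

With noisy data the two criteria can disagree: BIC's log(n) penalty grows with the series length, so it is more conservative about declaring a change point than AIC.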
Change-point detection methods are classified into two categories based on the speed of detection: