Binary segmentation, a change-point search method for retrospective change-point detection, uses the following procedure:
- Search the data for the first change point.
- At that change point, split the data into two parts.
- In each part, select the change point with the minimum loss.
- Repeat this procedure until either no new change points are found or the maximum number of change points is reached.
Binary segmentation is an approximation method, because each change point is decided using only part of the data. However, the method is efficient, with an O(n log n) computational cost, where n is the number of data points.
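The procedure above can be sketched in Python. This is a minimal illustration, not the text's reference implementation: the function names, the variance-based segment cost, and the penalty value are all illustrative choices, and an efficient implementation would cache segment costs rather than recompute them.

```python
import numpy as np

def segment_cost(x):
    """Loss of fitting one segment with a single normal distribution:
    up to additive constants, (len(x)/2) * log(within-segment variance)."""
    var = max(np.var(x), 1e-12)          # guard against zero variance
    return 0.5 * len(x) * np.log(var)

def best_split(x, min_size=2):
    """Best single split of a segment: (loss reduction, split index)."""
    n = len(x)
    best_gain, best_k = -np.inf, None
    for k in range(min_size, n - min_size + 1):
        gain = segment_cost(x) - segment_cost(x[:k]) - segment_cost(x[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_gain, best_k

def binary_segmentation(x, penalty=3.0, max_cps=10):
    """Greedy binary segmentation: repeatedly accept the best candidate
    split across all current segments while its gain beats the penalty."""
    x = np.asarray(x, dtype=float)
    segments, change_points = [(0, len(x))], []
    while len(change_points) < max_cps:
        best = None                       # (gain, absolute index, segment)
        for seg in segments:
            s, e = seg
            if e - s < 4:                 # too short to split further
                continue
            gain, k = best_split(x[s:e])
            if k is not None and (best is None or gain > best[0]):
                best = (gain, s + k, seg)
        if best is None or best[0] <= penalty:
            break                         # no split is worth the penalty
        gain, cp, (s, e) = best
        segments.remove((s, e))
        segments += [(s, cp), (cp, e)]    # search both halves next
        change_points.append(cp)
    return sorted(change_points)
```

On a series with one clear mean shift, such as 50 zeros followed by 50 fives, the sketch finds the single change point at index 50 and then stops, because no further split reduces the loss by more than the penalty.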
Taking the normal distribution as an example, the change-point problem is to test the following null hypothesis:
$$H_0:\; \mu = \mu_1 = \mu_2 = \dots = \mu_n \quad\text{and}\quad \sigma^2 = \sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2$$
as opposed to the alternatives,
$$H_1:\; \mu_1 = \dots = \mu_{k_1} \neq \mu_{k_1+1} = \dots = \mu_{k_2} \neq \dots \neq \mu_{k_q+1} = \dots = \mu_n$$
$$\sigma_1^2 = \dots = \sigma_{k_1}^2 \neq \sigma_{k_1+1}^2 = \dots = \sigma_{k_2}^2 \neq \dots \neq \sigma_{k_q+1}^2 = \dots = \sigma_n^2$$
where $1 \le k_1 < k_2 < \dots < k_q < n$ are the unknown change points.
In each iteration, binary segmentation performs the following single-change-point test:
$$H_1:\; \mu_1 = \dots = \mu_{k_1} \neq \mu_{k_1+1} = \dots = \mu_n$$
$$\sigma_1^2 = \dots = \sigma_{k_1}^2 \neq \sigma_{k_1+1}^2 = \dots = \sigma_n^2$$
For observations $x_1, \dots, x_n$, the log-likelihood functions under $H_0$ and $H_1$ are:
$$\log L_0 = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2$$
$$\log L_1(k_1) = -\frac{k_1}{2}\log\left(2\pi\sigma_1^2\right) - \frac{1}{2\sigma_1^2}\sum_{i=1}^{k_1}\left(x_i - \mu_1\right)^2 - \frac{n-k_1}{2}\log\left(2\pi\sigma_2^2\right) - \frac{1}{2\sigma_2^2}\sum_{i=k_1+1}^{n}\left(x_i - \mu_2\right)^2$$
The maximum likelihood estimates of $\mu$ and $\sigma^2$ are the sample mean and the (biased) sample variance:
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat\mu\right)^2$$
and analogously within each segment: $\hat\mu_1, \hat\sigma_1^2$ over $x_1, \dots, x_{k_1}$ and $\hat\mu_2, \hat\sigma_2^2$ over $x_{k_1+1}, \dots, x_n$.
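Substituting these estimates back into the Gaussian log-likelihood collapses it to the closed form $-\tfrac{n}{2}\log(2\pi\hat\sigma^2) - \tfrac{n}{2}$, which is what the algorithm actually evaluates. A quick numerical check (the data values are arbitrary):

```python
import math

x = [1.2, 0.7, 3.4, 2.8, 1.9, 2.2]
n = len(x)

# Maximum likelihood estimates for a single normal distribution.
mu_hat = sum(x) / n
var_hat = sum((xi - mu_hat) ** 2 for xi in x) / n   # biased (1/n) estimator

# Log-likelihood summed pointwise with the MLEs plugged in...
log_l = sum(
    -0.5 * math.log(2 * math.pi * var_hat) - (xi - mu_hat) ** 2 / (2 * var_hat)
    for xi in x
)

# ...equals the closed form -(n/2) log(2*pi*var_hat) - n/2, because the
# quadratic terms sum to exactly n/2 at the MLE.
closed_form = -0.5 * n * math.log(2 * math.pi * var_hat) - 0.5 * n
assert abs(log_l - closed_form) < 1e-9
```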
Using the preceding formulas, the binary segmentation algorithm computes $\max_{k_1} \log L_1(k_1)$ by trying different values of $k_1$. Then, to decide whether a change point exists, the algorithm compares the difference $\max_{k_1} \log L_1(k_1) - \log L_0$ to the penalty value.
If the algorithm detects a change point, it adds that change point to its list of candidate change points and splits the data into two parts. From the candidate change points that the algorithm finds in the two parts, it selects the one with the minimum loss.
The algorithm repeats the preceding process until it finds all change points or reaches the maximum number of change points.