Convergent Cross-Mapping Functions - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Convergent cross-mapping (CCM) is a method for evaluating whether one time series variable in a system has a causal influence on another. Unlike the symmetric relationship of correlation, a causality relationship detected by CCM can be unidirectional: while (A is correlated with B) always implies that (B is correlated with A), the relationship found by CCM can simultaneously satisfy (A causes B) and (B does not cause A).

The intuition behind the CCM algorithm is that if variable A is a cause of variable B, information about time series A is reflected in time series B. Therefore, you can estimate A from B (that is, the estimate runs in the opposite direction from cause and effect). If the predictability of time series A improves with increasing information from time series B, A has a causal influence on B. This counter-intuitive definition is described in detail in the references listed at the end of this topic.

The mathematical justification for this approach depends on Takens' theorem, a result from dynamical systems theory, which demonstrates that a complex dynamical system can be embedded into a low-dimensional space. This approach is designed for short time series (fewer than 30 points) for which multiple samples are available.
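In practice, such an embedding is commonly built from time-lagged copies of a single observed series. The following Python sketch illustrates that idea only; the function name delay_embed and the parameters dim and lag are hypothetical and are not part of the CCM function syntax.

```python
def delay_embed(series, dim, lag=1):
    """Build delay vectors (x[t], x[t-lag], ..., x[t-(dim-1)*lag]).

    Hypothetical helper for illustration; `dim` is the embedding
    dimension and `lag` is the spacing between coordinates.
    """
    start = (dim - 1) * lag  # first index with a full history available
    return [
        tuple(series[t - j * lag] for j in range(dim))
        for t in range(start, len(series))
    ]


series = [1, 2, 3, 4, 5, 6]
vectors = delay_embed(series, dim=3, lag=1)
# Each vector holds the current value followed by its two predecessors.
```

Each delay vector is treated as one point in the reconstructed low-dimensional space. Points that lie close together in this space correspond to similar states of the underlying system.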

To test for causality, the CCM function:

  1. Chooses a library of short time series from the effect variable.
  2. Uses this library to predict values of the cause variable.

    The function uses a k-nearest neighbors algorithm to predict the cause variable from the effect variable and a bootstrapping process to estimate the uncertainty associated with the predicted values.

  3. Uses this library to evaluate the goodness-of-fit of the predictions.

    For numerical variables, the function determines goodness-of-fit using the correlation between the predictions and the true values. For categorical variables, the function determines goodness-of-fit using the Jaccard Index.

  4. Repeats this procedure for libraries of different sizes.

    If increasing the library size results in a significant improvement of the goodness-of-fit, and the correlation is significantly greater than zero, the result indicates a causal relationship. You have considered large enough libraries when the goodness-of-fit converges (that is, stops improving) as you continue to increase library size.
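The procedure above can be sketched in Python for numerical variables. This is a simplified illustration under stated assumptions, not the implementation used by the CCM function: it draws libraries from the start of one long series rather than from multiple short samples, it omits the bootstrapping step, and it weights neighbors by exponentially decaying distance. All names (delay_embed, pearson, cross_map_skill) and the simulated data are hypothetical.

```python
import math


def delay_embed(series, dim, lag=1):
    """Delay vectors (x[t], x[t-lag], ...) built from one series."""
    start = (dim - 1) * lag
    return [tuple(series[t - j * lag] for j in range(dim))
            for t in range(start, len(series))]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(cause, effect, lib_size, dim=2, k=None):
    """Estimate `cause` from the delay vectors of `effect`.

    The library is the first `lib_size` delay vectors of `effect`.
    Returns the correlation between estimated and true `cause` values.
    """
    k = k or dim + 1
    vectors = delay_embed(effect, dim)
    targets = cause[dim - 1:]  # cause value aligned with each vector
    predicted, actual = [], []
    for i, v in enumerate(vectors):
        # k nearest library vectors, excluding the query point itself
        nearest = sorted((math.dist(v, vectors[j]), j)
                         for j in range(lib_size) if j != i)[:k]
        d_min = max(nearest[0][0], 1e-12)  # guard against zero distance
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimate = sum(w * targets[j]
                       for w, (_, j) in zip(weights, nearest)) / sum(weights)
        predicted.append(estimate)
        actual.append(targets[i])
    return pearson(predicted, actual)


# Simulated data (an assumption for illustration): x drives y, not vice versa.
x, y = [0.4], [0.2]
for _ in range(149):
    x_t, y_t = x[-1], y[-1]
    x.append(x_t * (3.8 - 3.8 * x_t))
    y.append(y_t * (3.5 - 3.5 * y_t - 0.1 * x_t))

# Goodness-of-fit of estimating the cause (x) from the effect (y)
# for libraries of increasing size.
skills = {size: cross_map_skill(x, y, size) for size in (10, 25, 50, 100)}
```

If x influences y, the values in skills would be expected to rise with library size and then level off. Running the same call with the arguments swapped, cross_map_skill(y, x, size), tests the opposite direction.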

The following references provide more detail:

  • Sugihara et al. Detecting Causality in Complex Ecosystems, Science, 2012.
  • Clark et al. Spatial convergent cross mapping to detect causal relationships from short time series, Ecology, 2015.