1.0 - 8.00 - CCM Arguments - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)
Tip: Read the description of the EmbeddingDimensions argument first.
SequenceIDColumn
Specify the name of the input table column that contains the time sequence identifiers. A time sequence is a sample of the time series.
TimeColumn
Specify the name of the input table column that contains the time stamps.
CauseColumns
Specify the names of the input table columns that contain values to evaluate as potential causes.

For the function to select the best embedding dimension, CauseColumns and EffectColumns must specify the same single column.

EffectColumns
Specify the names of the input table columns that contain values to evaluate as potential effects.

For the function to select the best embedding dimension, CauseColumns and EffectColumns must specify the same single column.

LibrarySize
[Optional] If the function selects the best embedding dimension, you must omit this argument.

Specify the sizes of the libraries that the function uses. Each library contains randomly selected points along the potential effect time series. The function uses the libraries to predict values of the cause time series. If the correlation between the predicted values of the cause time series and the actual values increases as the size of the library increases, a causal relationship is said to exist.

Each library_size must be a positive INTEGER.

If the function does not select the best embedding dimension, and you omit this argument, the function uses two library sizes: 100 and dimension * time_step + 1 (dimension and time_step are specified by the arguments EmbeddingDimensions and TimeStep, respectively).

If you specify a single library_size, the function uses two library sizes: library_size and dimension * time_step + 1.
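
The rules above for omitted and single-value LibrarySize can be sketched as follows. This is a minimal illustration, not the function's implementation: the helper name effective_library_sizes is hypothetical, and the behavior for two or more values (used as given) is an assumption.

```python
from typing import List, Optional, Sequence


def effective_library_sizes(
    library_sizes: Optional[Sequence[int]],
    dimension: int,
    time_step: int = 1,
) -> List[int]:
    """Return the library sizes implied by the LibrarySize rules.

    Hypothetical helper for illustration; not part of the CCM function.
    """
    # Library size derived from EmbeddingDimensions and TimeStep.
    derived = dimension * time_step + 1

    if not library_sizes:
        # Argument omitted: the documented defaults are 100 and the derived size.
        return [100, derived]

    if any(not isinstance(s, int) or s <= 0 for s in library_sizes):
        raise ValueError("each library_size must be a positive INTEGER")

    if len(library_sizes) == 1:
        # A single value is paired with the derived size.
        return [library_sizes[0], derived]

    # Assumption: two or more values are used exactly as specified.
    return list(library_sizes)
```

For example, with EmbeddingDimensions(3) and TimeStep(2), a single LibrarySize of 50 would yield the sizes 50 and 7 under these rules.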

EmbeddingDimensions
[Optional] Specify the number of past values to use when predicting a given value of the time series. If you specify only one dimension, the function uses it. If you specify multiple dimensions, the function selects the best one.
Each dimension must be a positive INTEGER. Default: 2
PredictStep
[Optional] Specify the number of time steps into the future for which the function makes predictions from past observations. The function uses this argument only when it selects the best embedding dimension.
The predict_step must be a positive INTEGER. Default: 1
TimeStep
[Optional] Specify the number of time steps between past values to use when predicting a given value of the time series.

The time_step must be a positive INTEGER. Default: 1
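
To show how EmbeddingDimensions and TimeStep interact, here is a sketch of a standard time-delay embedding, in which each point of the series is represented by its own value and earlier values spaced time_step apart. The function name delay_embed is hypothetical, and whether the CCM function counts the current value as one of the dimension coordinates is an assumption.

```python
from typing import List, Sequence, Tuple


def delay_embed(
    series: Sequence[float], dimension: int = 2, time_step: int = 1
) -> List[Tuple[float, ...]]:
    """Build lagged vectors (x[t], x[t - step], ..., x[t - (dim - 1) * step]).

    Illustrative only; the CCM function's internal layout may differ.
    """
    if dimension < 1 or time_step < 1:
        raise ValueError("dimension and time_step must be positive integers")

    # First index that has enough history for a full lagged vector.
    first = (dimension - 1) * time_step

    return [
        tuple(series[t - k * time_step] for k in range(dimension))
        for t in range(first, len(series))
    ]


# Five observations, two coordinates per vector, lag of two time steps.
vectors = delay_embed([1, 2, 3, 4, 5], dimension=2, time_step=2)
# vectors == [(3, 1), (4, 2), (5, 3)]
```

A larger dimension or time_step consumes more history per vector, so fewer vectors can be formed from the same series.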

BootstrapIterations
[Optional] Specify the number of bootstrap iterations to use when predicting a given value of the time series. The function uses the bootstrap process to estimate the uncertainty associated with the predicted values. If the function selects the best embedding dimension, it uses 1 iteration and ignores this argument if you specify it.
The iterations must be a positive INTEGER. Default: 100
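As a generic illustration of how bootstrap iterations yield an uncertainty estimate, the sketch below resamples a set of values with replacement, recomputes a statistic on each resample, and reports the mean and standard deviation of the results. The names bootstrap_estimate, statistic, and seed are hypothetical, and this is not the resampling scheme of the CCM function itself, which the text above does not describe in detail.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def bootstrap_estimate(
    values: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    iterations: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean, standard deviation) of a statistic over resamples."""
    if iterations < 1:
        raise ValueError("iterations must be a positive integer")

    rng = random.Random(seed)  # private generator, so runs are repeatable
    n = len(values)
    estimates: List[float] = []

    for _ in range(iterations):
        # Draw n values with replacement from the original sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))

    # With a single iteration there is no spread to measure.
    spread = statistics.stdev(estimates) if iterations > 1 else 0.0
    return statistics.mean(estimates), spread
```

With one iteration, as in the best-dimension case described above, such a procedure produces a point estimate but no measure of spread.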
PointSelectRule
[Optional] Specify the rule for selecting the nearest points if the function is to select the best embedding dimension:
Option | Description
'DistanceOnly' (Default) | The function determines the nearest points based only on the computed distance.
'DistanceAndTime' | The function determines the nearest points based on both the computed distance and time, which matches the procedure described in the documentation for the R package multispatialCCM (version 1.0).
ExecutionMode
[Optional] Specify the execution mode:
Option | Description
'single' (Default) | The function runs on a single node and outputs results similar to those produced by the R package multispatialCCM (version 1.0). Use 'single' unless the input table is large or 'single' is slow.
'distribute' | The function distributes the data and runs across multiple vworkers. To parallelize the run, the function uses an algorithm different from that of the R package, so its results can differ from those of R.
SelfPredict
[Optional] Specify whether the function tries to predict each attribute using the attribute itself. If an attribute cannot predict its own time series well, the signal-to-noise ratio is too low for the CCM algorithm to work effectively.
Default: 'false'
If the function selects the best embedding dimension, this argument must specify 'true'.
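The constraints that apply when the function selects the best embedding dimension are spread across several arguments (CauseColumns, EffectColumns, LibrarySize, SelfPredict). The sketch below gathers them into one check. The helper best_dimension_problems and the dictionary layout are hypothetical, intended only as a pre-flight check before writing a call; it assumes that specifying more than one value in EmbeddingDimensions is what triggers selection.

```python
from typing import Any, Dict, List


def best_dimension_problems(args: Dict[str, Any]) -> List[str]:
    """List violated constraints for best-embedding-dimension selection."""
    dims = args.get("EmbeddingDimensions", [2])  # documented default: 2
    if len(dims) < 2:
        # A single dimension is used directly, so nothing to check here.
        return []

    problems: List[str] = []

    cause = list(args.get("CauseColumns", []))
    effect = list(args.get("EffectColumns", []))
    if len(cause) != 1 or cause != effect:
        problems.append(
            "CauseColumns and EffectColumns must specify the same single column"
        )

    if args.get("LibrarySize"):
        problems.append("LibrarySize must be omitted")

    if str(args.get("SelfPredict", "false")).lower() != "true":
        problems.append("SelfPredict must be 'true'")

    return problems
```

BootstrapIterations is not checked because the function ignores it in this mode rather than rejecting it.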
Seed
[Optional] Specify the random seed the algorithm uses for repeatable results (for more information, see Nondeterministic Results). The seed must be a LONG value.
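To illustrate why a fixed seed makes results repeatable, the sketch below draws a library of randomly selected time indices from a private random generator. The function sample_library is hypothetical, and sampling without replacement is an assumption; the generator that the CCM function actually uses is not described here.

```python
import random
from typing import List


def sample_library(series_length: int, library_size: int, seed: int) -> List[int]:
    """Pick library_size distinct time indices, repeatably for a given seed."""
    rng = random.Random(seed)  # seeding fixes the sequence of random draws
    return sorted(rng.sample(range(series_length), library_size))


# The same seed always selects the same library points.
first = sample_library(series_length=50, library_size=10, seed=123)
second = sample_library(series_length=50, library_size=10, seed=123)
```

Here first and second are identical lists; without a fixed seed, repeated runs would generally select different points.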