1.1 - 8.10 - ShapeletSupervised Syntax Elements - Teradata Vantage

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
1.1
8.10
Published
October 2019
Content Type
Programming Reference
Publication ID
B700-4003-079K
Language
English (United States)
Last Update
2019-12-31
ModelTable
[Optional] Specify the name for the model table that the function outputs.
Default: shapelet_model in the current schema
IDColumn
Specify the name of the column in InputTable and ResponseTable that contains the unique identity of a time series.
TimeColumn
Specify the name of the InputTable column that contains the time axis of the data.
TargetColumn
Specify the name of the InputTable column that contains the data points.
ResponseColumn
Specify the name of the ResponseTable column that contains the category (class) of the time series.
SAXSymbolsPerWindow
[Optional] Specify the SAX syntax element SymbolsPerWindow, which specifies the number of SAX code symbols to create from a window. The symbols_per_window must be an INTEGER in the range [1, 1000000]. If symbols_per_window is greater than the length of the shortest time series in the input data set (d), its value becomes d.
Default: 10
SAXMinWindowSize
[Optional] Specify the SAX syntax element WindowSize, which specifies the size of the sliding window. The min_window_size defines the length (number of data points) of the shortest shapelet, that is, the minimum span (time series length) used to distinguish two time series from each other.
The min_window_size must be an INTEGER in the range [1, 1000000]. If min_window_size is greater than the length of the shortest time series in the input data set (d), its value becomes d. If min_window_size is smaller than symbols_per_window, its value becomes symbols_per_window.
Default: 10
SAXMaxWindowSize
[Optional] Specify the SAX syntax element WindowSize, which specifies the size of the sliding window. The max_window_size defines the length (number of data points) of the longest shapelet, that is, the maximum span used to distinguish two time series from each other. The max_window_size must be an INTEGER in the range [1, 1000000] that is greater than or equal to min_window_size.
If max_window_size is greater than the length of the shortest time series in the input data set (d), its value becomes d.
A greater difference between min_window_size and max_window_size increases the probability of identifying better shapelets at the cost of higher execution time. The function uses this formula to compute the number of sliding windows, n:

n = ((max_window_size - min_window_size) / symbols_per_window) + 1

The maximum value of n is 20.

Default: 70
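
The adjustment rules and the window-count formula above can be sketched in a few lines of Python. This is an illustration only, not the function's implementation. The helper name effective_sax_windows is hypothetical, and three details are assumptions that the reference does not state: the order in which the adjustments are applied, the use of integer division, and the handling of a max_window_size that ends up below the adjusted min_window_size.

```python
def effective_sax_windows(symbols_per_window=10, min_window_size=10,
                          max_window_size=70, shortest_series_len=1000):
    """Return (symbols_per_window, min_window_size, max_window_size, n)
    after applying the documented adjustments.

    Hypothetical helper. Assumptions: adjustments run in the order shown,
    the division in the formula is integer division, and n is capped at 20.
    """
    d = shortest_series_len

    # Values larger than the shortest time series length d become d.
    symbols = min(symbols_per_window, d)
    min_w = min(min_window_size, d)
    max_w = min(max_window_size, d)

    # min_window_size smaller than symbols_per_window becomes symbols_per_window.
    min_w = max(min_w, symbols)

    # Assumption: reject a maximum that is below the adjusted minimum.
    if max_w < min_w:
        raise ValueError("max_window_size must be >= min_window_size")

    # n = ((max_window_size - min_window_size) / symbols_per_window) + 1
    n = (max_w - min_w) // symbols + 1

    # The maximum value of n is 20.
    return symbols, min_w, max_w, min(n, 20)
```

Under these assumptions, the default values with a shortest series of 200 points give n = 7, and a shortest series of 8 points collapses all three sizes to 8 and gives n = 1.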
SAXOutputFrequency
[Optional] Specify the SAX syntax element OutputFrequency, which specifies the number of data points to skip between successive sliding windows. The gap_between_windows must be an integer in the range [1, 1000]. A smaller value increases accuracy (the chance of distinguishing time series from each other) at the cost of higher execution time.
Default: 10
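
To make the accuracy and execution-time trade-off concrete, the following Python sketch lists the start positions of successive sliding windows. It assumes that gap_between_windows acts as the step between window start positions, which is one reading of "data points to skip" and is not confirmed by the reference. The helper name window_starts is hypothetical.

```python
def window_starts(series_len, window_size, gap_between_windows=10):
    """Start indices of successive sliding windows over one time series.

    Hypothetical helper. Assumption: each window starts
    gap_between_windows data points after the previous one.
    """
    if window_size > series_len:
        return []  # no complete window fits in the series
    last_start = series_len - window_size
    return list(range(0, last_start + 1, gap_between_windows))
```

For a series of 100 points and a window of 70 points, a step of 10 yields 4 windows under this assumption, while a step of 1 yields 31, which is why smaller values cost more execution time.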
RandomProjections
[Optional] Specify the number of iterations required for random masking of SAX words during shapelet training. The projections must be an INTEGER in the range [1, 40].
Default: 10

Specifying a larger projections value for longer input time series increases the probability of identifying better shapelets at the cost of higher execution time.

ShapeletCount
[Optional] Specify the maximum number of shapelets in the output model table. The num_shapelets must be an INTEGER in the range [1, 100000].
Default: 20
TimeInterval
[Optional] Specify the number of data points in a time series to skip between consecutive time series windows when calculating the distance of a shapelet from a time series.

The function builds a shapelet classification tree based on the distance of a shapelet from the time series data. Because a shapelet is typically much smaller than a complete time series, the function calculates the distance of a shapelet from a time series by sliding the shapelet across time series windows of shapelet length, calculating the distance between the shapelet and each window, and then selecting the smallest distance.

The num_data_points is the number of data points to skip when sliding from one time series window to the next. The num_data_points must be an INTEGER in the range [1, 1000000]. The value 1 gives optimal results at the cost of higher execution time.

Default: 10
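
The sliding-distance calculation described above can be sketched in Python as follows. This is a minimal illustration, not the function's implementation. The reference does not name the distance measure, so Euclidean distance is an assumption here, as is treating num_data_points as the step between window start positions. The helper name shapelet_distance is hypothetical.

```python
import math


def shapelet_distance(shapelet, series, num_data_points=10):
    """Smallest distance between a shapelet and the windows of a series.

    Hypothetical helper. Assumptions: Euclidean distance, and windows
    that start num_data_points apart.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")

    best = math.inf
    # Slide the shapelet across windows of shapelet length.
    for start in range(0, len(series) - m + 1, num_data_points):
        window = series[start:start + m]
        dist = math.sqrt(sum((s - w) ** 2 for s, w in zip(shapelet, window)))
        best = min(best, dist)  # keep the smallest distance seen so far
    return best
```

With the series [0, 0, 1, 2, 1, 0, 0, 0] and the shapelet [1, 2, 1], a step of 1 finds the exact match and returns 0.0. A step of 3 only examines the windows starting at positions 0 and 3, misses the match, and returns a larger distance, which illustrates why the value 1 gives the best results at the cost of more work.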

Seed
[Optional] Specify the random seed the algorithm uses for repeatable results. The seed must be an INTEGER in the range [1, 100000].
For repeatable results, use both the Seed and UniqueID syntax elements. For more information, see Nondeterministic Results and UniqueID Syntax Element.
Default: 23