7.00.02 - SupervisedShapeletTrainer Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)
Last Update: 2018-04-17
InputTable
Specifies the name of the table that contains the input data.
CategoryTable
[Optional] Specifies the name of the table that contains the categories (classes) for the time series in input_data_table. Default: input_data_table.

If input_categories_table is different from input_data_table, the function ignores any time series that is not in both input_categories_table and input_data_table. If a time series is represented by multiple rows in input_categories_table, these rows must contain the same category; otherwise, the function might not select the correct category.
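
The category rules above can be checked before training. The following is a minimal sketch in Python, not part of the function itself: resolve_categories is a hypothetical helper, and it assumes the IDs from input_data_table and the (ID, category) rows from input_categories_table have already been loaded into memory. It keeps only time series that appear in both tables and raises an error where the function would silently risk picking the wrong category.

```python
def resolve_categories(data_ids, category_rows):
    """Map each time series ID to its single category.

    data_ids: set of IDs found in the input data table (assumed preloaded).
    category_rows: iterable of (id, category) pairs from the categories table.
    Hypothetical pre-check helper; the trainer does not expose this.
    """
    seen = {}
    for series_id, category in category_rows:
        # Series absent from the data table are ignored, as the text describes.
        if series_id in data_ids:
            seen.setdefault(series_id, set()).add(category)

    # Multiple rows for one series must agree on the category.
    conflicts = sorted(i for i, cats in seen.items() if len(cats) > 1)
    if conflicts:
        raise ValueError(f"conflicting categories for series: {conflicts}")

    # Series in the data table with no category row are simply not returned.
    return {i: next(iter(cats)) for i, cats in seen.items()}
```

For example, with data IDs {1, 2, 3} and category rows (1, 'a'), (1, 'a'), (2, 'b'), (4, 'c'), only series 1 and 2 are retained: series 3 has no category and series 4 has no data.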

IDColumn
Specifies the name of the column in input_data_table and input_categories_table that contains the unique identity of a time series.
TimeColumn
Specifies the name of the input_data_table column that contains the time axis of the data.
ValueColumn
Specifies the name of the input_data_table column that contains the data points.
CategoryColumn
Specifies the name of the input_categories_table column that contains the category (class) of the time series.
SaxSymbolsPerWindow
[Optional] Specifies the SAX2 argument SymbolsPerWindow, which specifies the number of SAX code symbols to generate from a window. The symbols_per_window must be an INTEGER in the range [1, 1000000]. Default: 10.

If symbols_per_window is greater than the length of the shortest time series in the input data set (d), its value becomes d.

SaxMinWindowSize
[Optional] Specifies the SAX2 argument WindowSize, which specifies the size of the sliding window. The min_window_size defines the length (number of data points) of the shortest shapelet, that is, the minimum span (time series length) used to distinguish two time series from each other. The min_window_size must be an INTEGER in the range [1, 1000000]. Default: 10.

If min_window_size is greater than the length of the shortest time series in the input data set (d), its value becomes d. If min_window_size is smaller than symbols_per_window, its value becomes symbols_per_window.

SaxMaxWindowSize
[Optional] Specifies the SAX2 argument WindowSize, which specifies the size of the sliding window. The max_window_size defines the length (number of data points) of the longest shapelet, that is, the maximum span used to distinguish two time series from each other. The max_window_size must be an INTEGER in the range [1, 1000000] that is greater than or equal to min_window_size. Default: 70.

If max_window_size is greater than the length of the shortest time series in the input data set (d), its value becomes d.
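
The adjustment rules for symbols_per_window, min_window_size, and max_window_size can be summarized in a short sketch. This is an illustration in Python, not the function's implementation: effective_sax_settings is a hypothetical name, and the order in which the adjustments are applied is an assumption, because the text states each rule separately.

```python
def effective_sax_settings(symbols_per_window=10, min_window_size=10,
                           max_window_size=70, shortest_series_length=100):
    """Return (symbols_per_window, min_window_size, max_window_size) after
    the documented adjustments. Application order is an assumption."""
    d = shortest_series_length

    # symbols_per_window greater than d becomes d.
    symbols = min(symbols_per_window, d)

    # min_window_size greater than d becomes d; smaller than
    # symbols_per_window becomes symbols_per_window.
    min_w = max(min(min_window_size, d), symbols)

    # max_window_size greater than d becomes d.
    max_w = min(max_window_size, d)

    return symbols, min_w, max_w
```

With the defaults and a shortest time series of 50 data points, the result is (10, 10, 50): only max_window_size is reduced.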

A greater difference between min_window_size and max_window_size increases the probability of identifying better shapelets at the cost of higher execution time. The function uses this formula to compute the number of sliding windows, n:

n = ((max_window_size - min_window_size) / symbols_per_window) + 1

The maximum value of n is 20.
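
As a worked example, the number of sliding windows, n = ((max_window_size - min_window_size) / symbols_per_window) + 1, can be computed as follows. This Python sketch makes two assumptions that the text does not confirm: that the division is integer division, and that values above 20 are truncated to 20. The name sliding_window_count is hypothetical.

```python
def sliding_window_count(min_window_size=10, max_window_size=70,
                         symbols_per_window=10, max_windows=20):
    """Number of sliding windows, n, per the documented formula.

    Assumes integer division and a hard cap at max_windows (20).
    """
    n = (max_window_size - min_window_size) // symbols_per_window + 1
    return min(n, max_windows)
```

With the default values, n = ((70 - 10) / 10) + 1 = 7. Widening the range to min_window_size 10 and max_window_size 1000 would give 100 by the formula, which this sketch caps at 20.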

SaxOutputFrequency
[Optional] Specifies the SAX2 argument OutputFrequency, which specifies the number of data points to skip between successive sliding windows. The gap_between_windows must be an INTEGER in the range [1, 1000]. Default: 10. A smaller value increases accuracy (the chance of distinguishing time series from each other) at the cost of higher execution time.
ModelTable
[Optional] Specifies the name of the output model table that contains trained shapelets. Default: "shapelet_model".
OverwriteOutput
[Optional] Specifies whether to overwrite output_model_table, if it exists. Default: 'false'.
RandomProjections
[Optional] Specifies the number of iterations required for random masking of SAX words during shapelet training. The projections must be an INTEGER in the range [1, 40]. Default: 10.

Specifying a larger projections value for a longer input time series increases the probability of identifying better shapelets at the cost of higher execution time.

ShapeletCount
[Optional] Specifies the maximum number of shapelets in the output model table. The num_shapelets must be an INTEGER in the range [1, 100000]. Default: 20.
TimeInterval
[Optional] Specifies the number of data points in a time series to skip between consecutive time series windows when calculating the distance of a shapelet from a time series.

The function builds a shapelet classification tree based on the distance of a shapelet from the time series data. Because a shapelet is typically much smaller than a complete time series, the function calculates the distance of a shapelet from a time series by sliding the shapelet across time series windows of shapelet length, calculating the distance between the shapelet and each window, and then selecting the smallest distance.

The num_data_points is the number of data points to skip when sliding from one time series window to the next. The num_data_points must be an INTEGER in the range [1, 1000000]. The value 1 gives optimal results at the cost of higher execution time. Default: 10.
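
The sliding-distance calculation described above can be sketched as follows. This is a minimal Python illustration, not the function's implementation: shapelet_distance is a hypothetical name, Euclidean distance is an assumption (the text does not name the distance measure), and num_data_points is interpreted as the stride between window start positions.

```python
import math


def shapelet_distance(series, shapelet, num_data_points=10):
    """Smallest distance between a shapelet and any window of the series.

    Windows have the shapelet's length and start every num_data_points
    data points. Euclidean distance is an assumption of this sketch.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")

    best = math.inf
    for start in range(0, len(series) - m + 1, num_data_points):
        window = series[start:start + m]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        best = min(best, dist)
    return best
```

For the series [0, 0, 1, 2, 1, 0, 0, 0] and the shapelet [1, 2, 1], a stride of 1 examines every window and finds the exact match (distance 0). A stride of 3 examines only the windows starting at positions 0 and 3, misses the match, and returns a larger distance, which illustrates why the value 1 gives the best results at a higher cost.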

Seed
[Optional] Specifies the seed that the function uses to generate random numbers internally. The seed must be an INTEGER in the range [1, 100000]. Default: 23.