DTW Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:mapPath
ima1540829771750.ditamap
dita:ditavalPath
jsj1481748799576.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

This example compares multiple time series to both a common template and each other. Each time series represents stock prices and the template represents a series of stock index prices.

Input

InputTable: timeseriesdata
timeseriesid timestamp1 stockprice
1 0 24.2019
1 0.025063 27.8701
1 0.050125 31.4969
1 0.075188 35.083
1 0.100251 38.6286
1 0.125313 42.1343
1 0.150376 45.6005
1 0.175439 49.0276
1 0.200501 52.4162
1 0.225564 55.7666
1 0.250627 59.0792
... ... ...
TemplateTable: templatedata
templateid timestamp2 index_price
1 0 0
1 0.025063 0
1 0.050125 0
1 0.075188 0
1 0.100251 0
1 0.125313 0
1 0.150376 0
1 0.175439 0
1 0.200501 0
1 0.225564 0
1 0.250627 0
... ... ...
MappingTable: mappingdata
timeseriesid templateid
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
4 1
4 2
4 3

SQL Call

SELECT * FROM DTW (
  ON timeseriesdata AS InputTable PARTITION BY timeseriesid ORDER BY timestamp1
  ON templatedata AS TemplateTable DIMENSION ORDER BY timestamp2
  ON mappingdata AS MappingTable PARTITION BY timeseriesid
  USING
  TargetColumns ('stockprice', 'timestamp1')
  TemplateColumns ('indexprice', 'timestamp2')
  TimeSeriesId ('timeseriesid')
  TemplateId ('templateid')
) AS dt ORDER BY 1, 2;

Output

 timeseries_id template_id warp_distance      
 ------------- ----------- ------------------ 
             1           1      25163.9453125
             1           2      7547.69140625
             1           3    19577.634765625
             2           1 132.66847229003906
             2           2  1904.084716796875
             2           3  71.78057861328125
             3           1  351.6783142089844
             3           2   3614.21240234375
             3           3  75.77669525146484
             4           1   4927.62451171875
             4           2  914.2536010742188
             4           3          16641.625

Plot and Interpretation of Results



The warping distance is an unnormalized measure of how dissimilar two time series are. The warp_distance column in the output table has the warping distance for all pairs in the mapping table; that is, for every timeseries_id and template_id number.

The figure shows that input 2 is more similar to templates 1 and 3 than to template 2. The warp distances also show this:
Template Warp Distance
1 131.588
2 106.131
3 ~540

Because the dissimilarity of two time series is not based on whether they are temporarily close (the time is stretched and the two time series that are offset by a constant time interval are effectively the same), input 3 is not very dissimilar to templates 1 and 3. However, input 4 has the largest warping distance measure from templates 1 and 3, as the curvature of the latter 2 is far from input 4. Time stretching brings input 4 closer to templates 1 and 3, but with a larger warping path (not output above) and therefore, a larger warping distance.

Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.