SeriesSplitter Troubleshooting

SeriesSplitter Troubleshooting - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.10

1.1

Published

October 2019

Language

English (United States)

Last Update

2019-12-31

dita:mapPath

ima1540829771750.ditamap

dita:ditavalPath

jsj1481748799576.ditaval

dita:id

B700-4003

lifecycle

Product Category

Teradata Vantage™

Problem: Invoking a function using SeriesSplitter does not improve execution time.

Before trying workaround, verify that the data is skewed and that the function that uses SeriesSplitter does not exploit full parallelism. If the data is not skewed and the function exploits full parallelism, SeriesSplitter cannot improve its execution time.

Workaround:

Invoke SeriesSplitter and the subsequent function in separate SQL calls. For example, in this SQL call, company1_stock1_split is the output from SeriesSplitter Example: Partition Splitter:

SELECT * FROM Interpolator (
  ON company1_stock1_split AS InputTable
  PARTITION BY id ORDER BY "period"
  USING
  TimeColumn ('period')
  TimeInterval (86400)
  InterpolationType ('linear')
  ValueColumns ('stockprice')
  Accumulate ('id')
  DuplicateRowsCount (2)
) AS dt;

The preceding SQL call is likely to perform better than the following call, which uses SeriesSplitter in the ON clause of the subsequent function.

SELECT * FROM Interpolator (
  ON (
    SELECT * FROM SeriesSplitter (
      ON company1_stock1 AS InputTable
      USING
      PartitionByColumns ('id')
      OrderByColumns ('period')
      SplitCount (50)
      Accumulate ('stockprice')
      ReturnStatsTable ('false')
    ) AS dt1
  ) AS input_table PARTITION BY id ORDER BY period
  USING
  TimeColumn ('period')
  TimeInterval (86400)
  InterpolationType ('linear')
  ValueColumns ('stockprice')
  Accumulate ('id')
  DuplicateRowsCount (2)
) AS dt2;

Adjust these syntax elements as follows:

Syntax Element	Value
DuplicateRowsCount	As low as possible.
SplitCount	A smaller multiple (for example, 1) of the number of vworkers in the cluster.
RowsPerSplit	As high as possible (you want the resulting number of splits to be a smaller multiple of the number of vworkers in the cluster).
Accumulate	Specify as few columns as possible.
DuplicateColumn	Omit this syntax element.
PartialSplitID	'true'
ReturnStatsTable	'true'