Problem: Invoking a function with SeriesSplitter does not improve its execution time.
Before trying the workarounds, verify two things: the data is skewed, and the function that uses SeriesSplitter does not already exploit full parallelism. If the data is not skewed and the function already exploits full parallelism, SeriesSplitter cannot improve its execution time.
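One way to check for skew is to count rows per partition, for example with a `GROUP BY` on the partitioning column and `COUNT(*)`, and then compare the largest partition with the average. The sketch below does that comparison. The helper names and the 2.0 threshold are hypothetical and chosen for illustration. They are not documented SeriesSplitter behavior.

```python
from statistics import mean


def skew_ratio(partition_row_counts):
    """Return the largest partition size divided by the mean partition size.

    A ratio near 1.0 means rows are spread evenly across partitions.
    A large ratio means one or a few partitions hold most of the rows.
    """
    if not partition_row_counts:
        raise ValueError("need at least one partition")
    avg = mean(partition_row_counts)
    if avg == 0:
        return 1.0  # all partitions empty: treat as balanced
    return max(partition_row_counts) / avg


def is_skewed(partition_row_counts, threshold=2.0):
    # The threshold is an illustrative assumption, not a documented cutoff.
    return skew_ratio(partition_row_counts) >= threshold


if __name__ == "__main__":
    balanced = [1000, 980, 1020, 1000]   # hypothetical per-partition row counts
    skewed = [10000, 50, 40, 60]
    print(skew_ratio(balanced), is_skewed(balanced))
    print(skew_ratio(skewed), is_skewed(skewed))
```

If the ratio is close to 1, the data is probably not skewed, and SeriesSplitter is unlikely to help.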
Workaround:
- Invoke SeriesSplitter and the subsequent function in separate SQL-MapReduce calls (as in the first choice in Example 2: Using SeriesSplitter with Interpolator), instead of invoking SeriesSplitter in the ON clause of the subsequent function (as in the second choice in the same example).
- Adjust the SeriesSplitter arguments as follows:
  - DuplicateRowsCount: as low as possible
  - SplitCount: a small multiple (for example, 1) of the number of vworkers in the cluster
  - RowsPerSplit: as high as possible, so that the resulting number of splits is a small multiple of the number of vworkers in the cluster
  - Accumulate: specify as few columns as possible
  - DuplicateColumn: omit this argument
  - PartialSplitID: 'true'
  - ReturnStatsTable: 'true'
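The SplitCount and RowsPerSplit guidance comes down to simple arithmetic. You pick a target number of splits equal to a small multiple of the vworker count. You then set the rows per split high enough that the rows fill exactly that many splits. The sketch below is a minimal illustration. It assumes you know how many rows are being split and how many vworkers the cluster has. `plan_splits` is a hypothetical helper and is not part of SeriesSplitter.

```python
def plan_splits(total_rows, vworkers, multiple=1):
    """Suggest (split_count, rows_per_split, resulting_splits).

    split_count is a small multiple of the vworker count. rows_per_split
    is the smallest value that fits all rows into that many splits, which
    keeps each split as large as possible.
    """
    if total_rows <= 0 or vworkers <= 0 or multiple <= 0:
        raise ValueError("all inputs must be positive")
    split_count = multiple * vworkers
    # Integer ceiling division avoids floating-point rounding issues.
    rows_per_split = -(-total_rows // split_count)
    # Number of splits actually produced with that rows-per-split value.
    resulting_splits = -(-total_rows // rows_per_split)
    return split_count, rows_per_split, resulting_splits


if __name__ == "__main__":
    # Hypothetical cluster with 8 vworkers and 1,000,001 rows to split.
    print(plan_splits(1_000_001, vworkers=8, multiple=1))
```

Rounding up keeps the number of splits from exceeding the target multiple. Rounding down would spill a few leftover rows into an extra, nearly empty split.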