Eliminating Redundant Distribution - Aster Execution Engine

Teradata Aster® Developer Guide

Product

Aster Execution Engine

Release Number

7.00.02

Published

July 2017

Language

English (United States)

Last Update

2018-04-13

dita:mapPath

xnl1494366523182.ditamap

dita:ditavalPath

Generic_no_ie_no_tempfilter.ditaval

dita:id

ffu1489104705746

lifecycle

Product Category

Software

With SQL-MapReduce Collaborative Planning, you can add logic to your SQL-MapReduce function that describes the distribution of the output data to the Planner.

For example, consider this query:

SELECT userid, max(sessionid)
FROM Sessionize(ON clickstream
  PARTITION BY userid
  ORDER BY ts
  TIMECOLUMN ('ts')
  TIMEOUT (60))
group by userid;

Without SQL-MapReduce Collaborative Planning, the plan consists of the following steps:

Distribute clickstream on userid (if not already distributed).
Execute the following query and store the output data in the temporary table tmp1:
```
Sessionize (select * from clickstream order by userid, ts)
```
Redistribute the data in tmp1.
After executing the query in step 2, the Planner does not know how the data is distributed. As a result, even though the function does not change the distribution, the Planner needs to run the distribute operator again to ensure that the data is distributed on userid.
- Source: All workers
  Partitioning columns: unknown
- Destination: All workers
  Partitioning columns: userid
Hash-aggregate:
Group on userid.

Compute max(sessionid).

Transfer the groups to the Queen.
Return the output data to the application from the Queen.

However, with SQL-MapReduce Collaborative Planning logic added to the Sessionize function, it can tell the Planner that there was no change in data distribution. This allows the Planner to eliminate the expensive redistribution step (see step 3 above), resulting in a significant optimization, as shown in the following figure.

Distribution example