With SQL-MapReduce Collaborative Planning, you can add logic to your SQL-MapReduce function that checks whether the function can apply the predicate sent by the Planner and advises the Planner accordingly.
For example, consider the following query, which retrieves the first 10 sessions of each user:
SELECT userid, ts, sessionid FROM Sessionize(ON clickstream PARTITION BY userid ORDER BY ts TIMECOLUMN ('ts') TIMEOUT (60)) WHERE sessionid < 10;
Without SQL-MapReduce Collaborative Planning, the plan consists of the following steps:
- Execute the following query and store the output data in the temporary table tmp1:
SESSIONIZE (SELECT * FROM clickstream ORDER BY userid, ts)
- Filter the data in tmp1:
SELECT * FROM tmp1 WHERE sessionid < 10;
- Send the filtered data to the Queen.
- Return the output data to the application from the Queen.
With SQL-MapReduce Collaborative Planning, however, your function can apply the predicate itself for each userid: after it has output the first 10 sessions for a userid, it skips to the next userid. The Planner can then eliminate the now-redundant step that filters tmp1 on sessionid < 10, which is a significant optimization, as shown in the following figure.
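The difference between the two plans can be illustrated with a small simulation. The following is a minimal sketch in plain Python, not the SQL-MapReduce SDK: the sessionize function, its max_sessions parameter, and the sample data are hypothetical names introduced here for illustration. It assumes that session IDs start at 0 for each userid and that a new session begins when the gap between consecutive clicks exceeds the timeout.

```python
from itertools import groupby
from operator import itemgetter

TIMEOUT = 60  # seconds; mirrors TIMEOUT (60) in the example query


def sessionize(rows, max_sessions=None):
    """Yield (userid, ts, sessionid) for rows sorted by (userid, ts).

    Assumption: sessionid starts at 0 per userid and increases when the gap
    between consecutive clicks exceeds TIMEOUT.

    If max_sessions is given, the predicate sessionid < max_sessions is
    applied inside the function: once a userid reaches that many sessions,
    no further rows are emitted for it.
    """
    for userid, user_clicks in groupby(rows, key=itemgetter(0)):
        sessionid, prev_ts = 0, None
        for _, ts in user_clicks:
            if prev_ts is not None and ts - prev_ts > TIMEOUT:
                sessionid += 1
            if max_sessions is not None and sessionid >= max_sessions:
                break  # stop emitting for this userid; move on to the next
            yield (userid, ts, sessionid)
            prev_ts = ts


# Hypothetical sample data: user "a" has 12 one-click sessions,
# user "b" has 3 clicks in 2 sessions.
clicks = sorted(
    [("a", t) for t in range(0, 1200, 100)] + [("b", 0), ("b", 10), ("b", 200)]
)

# Plan without predicate pushdown: materialize everything, then filter.
tmp1 = list(sessionize(clicks))
filtered = [row for row in tmp1 if row[2] < 10]

# Plan with the predicate applied inside the function: no separate filter.
pushed = list(sessionize(clicks, max_sessions=10))

print(len(tmp1), len(pushed))  # 15 13
```

Both plans return the same rows, but the second never emits the rows for sessions 10 and 11 of user "a", so there is nothing left for a later filtering step to discard.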