With SQL-MapReduce Collaborative Planning, you can add logic to your SQL-MapReduce function that checks whether it can push the predicate sent by the Planner to the function’s input and advises the Planner accordingly.
For example, consider this query, which retrieves the session information for a particular user:
SELECT userid, ts, sessionid FROM Sessionize(ON clickstream PARTITION BY userid ORDER BY ts TIMECOLUMN ('ts') TIMEOUT (60)) WHERE userid = 333;
Without SQL-MapReduce Collaborative Planning, the plan consists of the following steps:
1. Execute the following query and store the output data in the temporary table tmp1:
SESSIONIZE (SELECT * FROM clickstream ORDER BY userid, ts)
2. Filter the data in tmp1:
SELECT * FROM tmp1 WHERE userid = 333;
3. Send the filtered data to the Queen.
4. Return the output data to the application from the Queen.
However, with SQL-MapReduce Collaborative Planning, if your code determines that it can push the predicate down to the function's input, it notifies the Planner. The Planner then sends the following query to the function instead of the query sent in step 1 above:
SESSIONIZE (SELECT * FROM clickstream WHERE userid = 333 ORDER BY userid, ts)
In addition, the Planner eliminates the filtering step (see step 2 above), resulting in a significant optimization, as shown in the following figure.
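The equivalence that makes this optimization safe can be illustrated with a small Python simulation. This is only a sketch and not the SQL-MapReduce API: the `sessionize` function is a toy stand-in for Sessionize, `can_push_predicate` is a hypothetical helper, and the rule it encodes (a predicate may be pushed when it references only the partitioning columns) is an assumption about the kind of check your function might perform. The sample data assumes the input is partitioned by userid.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(rows, timeout=60):
    """Toy stand-in for Sessionize; rows are (userid, ts) tuples.

    Rows are partitioned by userid and ordered by ts. A new session starts
    whenever the gap to the previous click exceeds `timeout` seconds.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for userid, clicks in groupby(ordered, key=itemgetter(0)):
        sessionid, prev_ts = 0, None
        for _, ts in clicks:
            if prev_ts is not None and ts - prev_ts > timeout:
                sessionid += 1
            out.append((userid, ts, sessionid))
            prev_ts = ts
    return out


def can_push_predicate(predicate_columns, partition_columns):
    """Hypothetical check (an assumption, not the product's rule):
    pushing is treated as safe when the predicate uses only partition columns."""
    return set(predicate_columns) <= set(partition_columns)


# Made-up sample data: (userid, ts)
clickstream = [(333, 0), (333, 30), (333, 200), (111, 10), (111, 500), (222, 5)]

# Without pushdown: run the function on all rows, then filter the output.
filter_after = [r for r in sessionize(clickstream) if r[0] == 333]

# With pushdown: filter the input first, then run the function.
filter_before = sessionize([r for r in clickstream if r[0] == 333])

assert filter_after == filter_before
```

Because each userid partition is processed independently, filtering on userid before or after the function yields the same rows, while the pushed-down version processes far less input. A predicate on ts would not qualify under this assumed rule, since removing rows from a partition can change the session boundaries.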