The example uses a fictional SQL-MapReduce function, attribute_sales, which accepts two partitioned inputs and two arguments.
The inputs are:
- weblog, which contains web store logs and is the source of purchase information
- adlog, which contains ad server logs
Both inputs are partitioned on the user's browser cookie.
The arguments are:
- Clicks, which specifies the percentage of sales to attribute to clicked ads that lead to a purchase
- Impressions, which specifies the percentage of sales to attribute to web page views (impressions) that lead to a purchase
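The source does not say how attribute_sales divides a sale when several ads were clicked or viewed. To make the two arguments concrete, the following minimal sketch assumes one plausible rule: the Clicks fraction of a purchase is split evenly among clicked ads, the Impressions fraction is split evenly among viewed ads, and a category with no ads attributes nothing. It reads Clicks(.8) and Impressions(.2) from the example call as 80% and 20%. The function attribute_purchase and its parameters are hypothetical.

```python
from collections import defaultdict


def attribute_purchase(cart_amt, clicked_ads, viewed_ads, clicks=0.8, impressions=0.2):
    """Split one purchase amount across ads (hypothetical attribution rule).

    clicks and impressions mirror the Clicks(.8) and Impressions(.2)
    arguments, expressed as fractions of the sale. Within each category
    the share is divided evenly among the ads (an assumption); a category
    with no ads attributes nothing.
    """
    revenue = defaultdict(float)
    for ads, weight in ((clicked_ads, clicks), (viewed_ads, impressions)):
        if ads:
            share = cart_amt * weight / len(ads)
            for ad in ads:
                revenue[ad] += share
    return dict(revenue)


# A 100.00 purchase after clicking ad "A" and viewing ads "B" and "C".
print(attribute_purchase(100.0, ["A"], ["B", "C"]))
```

Under this assumed rule, ad "A" receives 80% of the sale and "B" and "C" share the remaining 20%.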
Consider the following function call:
```sql
SELECT adname, attr_revenue
FROM attribute_sales (
  ON (SELECT cookie, cart_amt, adname, action
      FROM weblog
      WHERE page = 'thankyou') AS W PARTITION BY cookie
  ON adlog AS S PARTITION BY cookie
  Clicks(.8)
  Impressions(.2)
);
```
The following figure shows how SQL-MapReduce runs the preceding function call.
SQL-MapReduce cogroups the two inputs before the function operates on them. Conceptually, cogrouping and the function invocation that follows it proceed in these steps:
- Group each input data set according to the cookie attribute specified in the PARTITION BY clauses.
- Form a cogroup tuple for each distinct cookie value among the resulting groups. The tuple consists of the cookie value that identifies the group and a nested relation that contains all rows from both the weblog and adlog inputs that belong to the group. The middle box in the preceding figure shows the result of the cogroup operation.
- Invoke the attribute_sales function once for each cogroup tuple. Each invocation processes the nested relation, treating it as a single row, and attributes the sales revenue to the appropriate advertisements. The bottom box in the preceding figure shows the output of the attribute_sales function.
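The steps above can be simulated in plain Python to make the data flow concrete. This is a conceptual sketch, not the SQL-MapReduce implementation: rows are dicts, the helpers cogroup, invoke_per_tuple, and count_rows are hypothetical, and the sample rows are invented rather than taken from the figure.

```python
from collections import defaultdict


def cogroup(inputs, key):
    """Conceptual cogroup (hypothetical helper, not an SQL-MapReduce API).

    inputs maps an input alias such as "W" or "S" to a list of row dicts.
    Returns a list of (key_value, nested) tuples, one per distinct key
    value, where nested maps each alias to that input's rows for the key.
    """
    # Step 1: group each input by the partitioning key.
    grouped = {alias: defaultdict(list) for alias in inputs}
    for alias, rows in inputs.items():
        for row in rows:
            grouped[alias][row[key]].append(row)

    # Step 2: form one cogroup tuple per distinct key value seen in any input;
    # an input with no rows for that key contributes an empty list.
    all_keys = set()
    for groups in grouped.values():
        all_keys.update(groups.keys())
    return [
        (k, {alias: list(grouped[alias].get(k, [])) for alias in inputs})
        for k in sorted(all_keys)
    ]


def invoke_per_tuple(func, cogrouped):
    """Step 3: call func once per cogroup tuple and concatenate its output rows."""
    output = []
    for key_value, nested in cogrouped:
        output.extend(func(key_value, nested))
    return output


# Made-up sample rows; the figure's data is not reproduced here.
weblog = [
    {"cookie": "AAAA", "cart_amt": 40.0},
    {"cookie": "BBBB", "cart_amt": 25.0},
]
adlog = [
    {"cookie": "AAAA", "adname": "ad1"},
    {"cookie": "AAAA", "adname": "ad2"},
    {"cookie": "BBBB", "adname": "ad1"},
]


def count_rows(cookie, nested):
    """Toy per-tuple function: report how many rows each input contributed."""
    return [(cookie, len(nested["W"]), len(nested["S"]))]


result = invoke_per_tuple(count_rows, cogroup({"W": weblog, "S": adlog}, "cookie"))
print(result)
```

Here result is [('AAAA', 1, 2), ('BBBB', 1, 1)]: each cookie yields one tuple, and the per-tuple function sees all of that cookie's rows from both inputs at once.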
The cogroup result includes a tuple for the DDDD cookie even though the adlog data set has no group for that cookie. This happens because the cogroup operation behaves like an outer join: a cookie that is present in one input but absent from another still produces a cogroup tuple, and the nested relation simply holds an empty group for the input that lacks it.
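The outer-join behavior amounts to keeping the union of the inputs' key sets rather than their intersection. The following sketch illustrates this with Python sets; apart from DDDD, the cookie values are hypothetical, and the sketch covers only which keys survive, not the rows inside each group.

```python
# Hypothetical cookie sets; only DDDD comes from the example above.
weblog_cookies = {"AAAA", "BBBB", "DDDD"}
adlog_cookies = {"AAAA", "BBBB"}

# Cogroup keeps every key found in any input, like an outer join...
cogroup_keys = weblog_cookies | adlog_cookies
# ...whereas an inner join would keep only keys present in both inputs.
inner_join_keys = weblog_cookies & adlog_cookies

# Keys whose cogroup tuple has an empty adlog group.
empty_adlog_groups = sorted(cogroup_keys - adlog_cookies)
print(sorted(cogroup_keys), sorted(inner_join_keys), empty_adlog_groups)
```

With an inner join, DDDD and its purchase would disappear before attribute_sales ever ran; with cogroup semantics the function still receives it, paired with an empty adlog group.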