For functions with partitioned inputs (that is, inputs whose ON clauses include a PARTITION BY phrase), SQL-MapReduce performs these steps:
- Form a new cogroup tuple for every distinct p_attribute_set.
The distinct p_attribute_set is the first attribute of its new cogroup tuple.
- For each partitioned input, add a new attribute to the cogroup tuple.
This new attribute contains all attributes of each tuple in that input whose p_attribute_set matches that of the cogroup tuple.
- For each dimensional input, add a new attribute to the cogroup tuple.
This new attribute contains all tuples of the dimensional input.
Now there is one cogroup tuple for each distinct p_attribute_set. Each cogroup tuple has:
- One attribute that is p_attribute_set
- One attribute for each partitioned input, which contains a nested array of all matching tuples of that input
- One attribute for each dimensional input, which contains an array of all tuples of that input
- Invoke the SQL-MapReduce function on each cogroup tuple.
SQL-MapReduce uses comparison semantics for this grouping operation; therefore, NULL values are equivalent to one another and fall into the same group. Grouped tuples that have empty groups for certain attributes (that is, inputs with no tuples for a particular group) are included in the grouped output by default.
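
The steps above can be sketched in a few lines of Python. This is only an illustrative model of the grouping logic, not how SQL-MapReduce is implemented: the names `build_cogroups`, `invoke`, and `key_columns`, the dictionary layout, and the sample inputs are all assumptions made for the example. Python's `None` stands in for NULL; because `None == None` is true, NULL keys fall into the same group, mirroring the comparison semantics described above.

```python
def build_cogroups(partitioned, dimensional, key_columns):
    """Model of the cogroup steps (hypothetical helper, not a product API).

    partitioned: dict of input name -> list of row dicts
    dimensional: dict of input name -> list of row dicts
    key_columns: dict of partitioned input name -> tuple of PARTITION BY columns
    """
    groups = {}  # p_attribute_set -> cogroup tuple (dicts keep insertion order)

    # One cogroup tuple per distinct p_attribute_set; each partitioned input
    # contributes the rows whose key matches that of the cogroup tuple.
    for name, rows in partitioned.items():
        for row in rows:
            key = tuple(row[col] for col in key_columns[name])
            group = groups.setdefault(key, {"p_attribute_set": key})
            group.setdefault(name, []).append(row)

    for group in groups.values():
        # Inputs with no rows for this key still appear, as empty groups.
        for name in partitioned:
            group.setdefault(name, [])
        # Every cogroup tuple receives all tuples of each dimensional input.
        for name, rows in dimensional.items():
            group[name] = list(rows)

    return list(groups.values())


def invoke(function, cogroups):
    """Call the function once per cogroup tuple."""
    return [function(group) for group in cogroups]


# Hypothetical sample inputs: two partitioned by "user", one dimensional.
clicks = [
    {"user": 1, "url": "a"},
    {"user": None, "url": "b"},
    {"user": None, "url": "c"},
]
orders = [{"user": 1, "amount": 10}, {"user": 2, "amount": 5}]
regions = [{"region": "EU"}]

cogroups = build_cogroups(
    partitioned={"clicks": clicks, "orders": orders},
    dimensional={"regions": regions},
    key_columns={"clicks": ("user",), "orders": ("user",)},
)
counts = invoke(
    lambda g: (g["p_attribute_set"], len(g["clicks"]), len(g["orders"])),
    cogroups,
)
```

In this sketch the two click rows with a NULL user end up in a single cogroup tuple, that tuple keeps an empty `orders` group rather than being dropped, and every cogroup tuple carries the full `regions` input.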