This example demonstrates the grouping operation used with multiple inputs to a table operator, called a cogroup. All of the inputs that satisfy the condition are grouped together in a single group. Cogroup matches the partition keys of multiple inputs, thus ensuring that the results are correct. Using a cogroup enhances the ability of table operators to work with multiple input streams and dimension tables.
Suppose that you have the following two tables.
WebLog | ||
---|---|---|
cookie | cart_amount | page |
AAAA | $60 | Thankyou |
AAAA | $140 | Thankyou |
BBBB | $100 | Thankyou |
CCCC | Intro | |
CCCC | $200 | Thankyou |
DDDD | $100 | Thankyou |
AdLog | ||
---|---|---|
cookie | ad_name | action |
AAAA | Champs | Impression |
AAAA | Puppies | Click |
BBBB | Apples | Click |
CCCC | Baseball | Impression |
CCCC | Apples | Click |
You run the following query:
SELECT cookie, cart_amt, adname, action FROM attribute_sales ( ON (SELECT cookie,cart_amt FROM weblog WHERE page = 'thankyou' ) as W PARTITION BY cookie ON adlog as S PARTITION BY cookie) as result1 ;
The inputs are grouped together similar to the following table.
Grouped and Nested Relations With the Cogroup on Cookie | ||
---|---|---|
Cookie | WebLog | AdLog |
AAAA | AAAA,$60, thankyou AAAA,$140,thankyou |
AAAA,champs,impression AAAA,puppies,click |
BBBB | BBBB,$100,thankyou | BBBB,apples,click |
CCCC | CCCC,$200,thankyou | CCCC,baseball,impression CCCC,apples,click |
DDDD | DDDD,$100,thankyou |
The output of the query is similar to the following.
adname | attr_revenue |
champs | $40 |
puppies | $160 |
apples | $240 |
baseball | $40 |
Each time you invoke the table operator, only the inputs that have the same partition key values participate in the invocation.
If there are no rows of the PARTITION BY ANY and PARTITION BY key inputs on an AMP and the query involves DIMENSION input, the table operator is not invoked even though the DIMENSION table has rows on that AMP.