Cogroup Example - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17
dita:id: B700-1022

The example uses a fictional SQL-MapReduce function, attribute_sales, which accepts two partitioned inputs and two arguments.

The inputs are:
  • weblog, which contains web store logs, the source of purchase information
  • adlog, which contains ad server logs

Both inputs are partitioned on the user's browser cookie, stored in the cookie column.

Cogroup Example Tables

The function arguments are:
  • Clicks, which specifies the share of sales, expressed as a decimal fraction, to attribute to clicked ads that lead to a purchase
  • Impressions, which specifies the share of sales, expressed as a decimal fraction, to attribute to web page views that lead to a purchase
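
As a rough illustration of the arithmetic these two arguments imply, the Python sketch below splits one purchase between clicked and viewed ads. The document does not define how attribute_sales divides revenue among several ads, so the equal split, the function name split_revenue, and the ad names are all assumptions made for this example.

```python
def split_revenue(cart_amt, clicked_ads, viewed_ads, clicks=0.8, impressions=0.2):
    """Hypothetical rule: each fraction of the sale is shared equally among its ads."""
    attributed = {}
    for ads, fraction in ((clicked_ads, clicks), (viewed_ads, impressions)):
        if not ads:
            continue  # no ads of this kind, so nothing is credited for it
        share = cart_amt * fraction / len(ads)
        for ad in ads:
            attributed[ad] = attributed.get(ad, 0.0) + share
    return attributed


# Made-up purchase: one clicked ad and two ads that were only viewed.
result = split_revenue(100.0, clicked_ads=["ad_a"], viewed_ads=["ad_b", "ad_c"])
for ad, revenue in sorted(result.items()):
    print(ad, round(revenue, 2))
```

With a cart amount of 100 and fractions of 0.8 and 0.2, the clicked ad receives the click share and the two viewed ads divide the impression share between them.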

Consider the following function call:

SELECT adname, attr_revenue FROM attribute_sales (
  ON (SELECT cookie, cart_amt, adname, action
      FROM weblog
      WHERE page = 'thankyou') AS W PARTITION BY cookie
  ON adlog AS S PARTITION BY cookie
  Clicks(.8)
  Impressions(.2)
);

The following figure shows how SQL-MapReduce runs the preceding function call.

Figure: How a SQL-MapReduce Function Performs a Cogroup

SQL-MapReduce cogroups the two inputs before the function operates on them. Conceptually, the cogroup operation has these steps:

  1. Group each input data set according to the cookie attribute specified in the PARTITION BY clauses.

    Form a cogroup tuple for each distinct cookie value produced by the grouping. Each tuple consists of the cookie value that identifies the group and a nested relation that contains all rows from both the weblog and adlog inputs that belong to the group.

    The middle box in the preceding figure shows the result of the cogroup operation.

  2. Invoke the attribute_sales function for each cogroup tuple.

    Each invocation processes the nested relation, treating it as a single row, and then attributes the sales revenue to the appropriate advertisements.

    The bottom box in the preceding figure shows the output of the attribute_sales function.
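
The conceptual steps above can be simulated in a few lines of Python. This is only a sketch of the semantics, not how SQL-MapReduce is implemented: the helper names cogroup, run_per_cogroup, and count_rows are hypothetical, the sample rows are made up rather than taken from the figure, and count_rows stands in for attribute_sales.

```python
from collections import defaultdict


def cogroup(weblog_rows, adlog_rows, key="cookie"):
    """Step 1: group both inputs by key and build one cogroup tuple per key value."""
    nested = defaultdict(lambda: {"weblog": [], "adlog": []})
    for row in weblog_rows:
        nested[row[key]]["weblog"].append(row)
    for row in adlog_rows:
        nested[row[key]]["adlog"].append(row)
    # Each item is (key value, nested relation holding the rows from both inputs).
    return sorted(nested.items(), key=lambda item: item[0])


def run_per_cogroup(func, weblog_rows, adlog_rows):
    """Step 2: invoke func once per cogroup tuple and collect its output rows."""
    output = []
    for cookie, relation in cogroup(weblog_rows, adlog_rows):
        output.extend(func(cookie, relation["weblog"], relation["adlog"]))
    return output


def count_rows(cookie, weblog_group, adlog_group):
    # Stand-in for attribute_sales: report the size of each nested group.
    return [(cookie, len(weblog_group), len(adlog_group))]


# Made-up sample rows (assumed column names), not the data from the figure.
weblog = [
    {"cookie": "AAAA", "cart_amt": 100.0},
    {"cookie": "BBBB", "cart_amt": 50.0},
]
adlog = [
    {"cookie": "AAAA", "adname": "ad_a", "action": "click"},
    {"cookie": "AAAA", "adname": "ad_b", "action": "impression"},
    {"cookie": "BBBB", "adname": "ad_a", "action": "impression"},
]

summary = run_per_cogroup(count_rows, weblog, adlog)
print(summary)
```

The function passed to run_per_cogroup sees all rows for one cookie from both inputs at once, which is what allows it to relate purchases to ad events.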

The cogroup result includes a tuple for the DDDD cookie even though the adlog data set has no group for that cookie. This is because the grouping operation performs an outer join, so a cogroup tuple is included in the output even when one of the inputs contributes an empty group for that cookie.
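
A small Python sketch can make the outer-join behavior concrete. It assumes that the cogroup keeps every cookie value found in either input (a full outer join on the partition key); the DDDD cookie comes from the text above, while the row contents and variable names are invented for illustration.

```python
# Already-grouped inputs, keyed by cookie. Row contents are made up.
weblog_groups = {
    "AAAA": [{"cart_amt": 100.0}],
    "DDDD": [{"cart_amt": 25.0}],
}
adlog_groups = {
    "AAAA": [{"adname": "ad_a", "action": "click"}],
}

# Outer join on the key: keep cookies that appear in either input.
outer_keys = sorted(weblog_groups.keys() | adlog_groups.keys())
# Inner join on the key, for contrast: keep cookies that appear in both inputs.
inner_keys = sorted(weblog_groups.keys() & adlog_groups.keys())

# A missing side becomes an empty group instead of dropping the cookie.
cogroups = {
    cookie: (weblog_groups.get(cookie, []), adlog_groups.get(cookie, []))
    for cookie in outer_keys
}

print(outer_keys)
print(inner_keys)
print(cogroups["DDDD"])
```

Under an inner join the DDDD purchase would disappear before the function ever ran; with the outer join, the function is still invoked for DDDD and receives an empty adlog group.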