SQL-MapReduce Example 1: Word Count - Aster Execution Engine

Teradata Aster® Developer Guide

Product
Aster Execution Engine
Release Number
7.00.02
Published
July 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
xnl1494366523182.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
ffu1489104705746
lifecycle
previous
Product Category
Software

Consider a hypothetical SQL-MapReduce function that splits strings into individual words:

SELECT word
FROM SplitIntoWords(
      ON documents
)

In this example, the SplitIntoWords function is invoked once for every row in the documents table. It is Java procedural code that takes each document and emits a row for each word. The function itself defines the columns that appear in its output rows; in this case, SplitIntoWords emits rows with a single column named word.

We might want to use such a function to compute the 10 most-frequently occurring words in a body of text. One approach would be to write another SQL-MapReduce function that counts the number of times a given word appears. We might have a CountInput function for this purpose:

SELECT word, count AS frequency
FROM CountInput(
      ON SplitIntoWords( ON documents )
      PARTITION BY word
      )
ORDER BY frequency DESC
LIMIT 10

In this example, the rows that are output by SplitIntoWords are formed into groups of distinct words using the PARTITION BY clause. Then, the CountInput function counts the number of words in each partition.

We don't need a special CountInput function. We can just use normal SQL:

SELECT word, COUNT(*) as frequency
FROM SplitIntoWords(
      ON documents
      )
GROUP BY word
ORDER BY frequency DESC
LIMIT 10

This is a simple example of the flexibility provided by the integration of SQL and MapReduce-style computation in SQL-MapReduce functions.