SQL-MapReduce Example 1: Word Count - Aster Execution Engine

Teradata Aster® Developer Guide

Product

Aster Execution Engine

Release Number

7.00.02

Published

July 2017

Language

English (United States)

Last Update

2018-04-13

dita:mapPath

xnl1494366523182.ditamap

dita:ditavalPath

Generic_no_ie_no_tempfilter.ditaval

dita:id

ffu1489104705746

lifecycle

Product Category

Software

Consider a hypothetical SQL-MapReduce function that splits strings into individual words:

SELECT word
FROM SplitIntoWords(
      ON documents
)

In this example, the SplitIntoWords function is invoked once for every row in the documents table. It is Java procedural code that takes each document and emits a row for each word. The function itself defines the columns that appear in its output rows; in this case, SplitIntoWords emits rows with a single column named word.

We might want to use such a function to compute the 10 most-frequently occurring words in a body of text. One approach would be to write another SQL-MapReduce function that counts the number of times a given word appears. We might have a CountInput function for this purpose:

SELECT word, count AS frequency
FROM CountInput(
      ON SplitIntoWords( ON documents )
      PARTITION BY word
      )
ORDER BY frequency DESC
LIMIT 10

In this example, the rows that are output by SplitIntoWords are formed into groups of distinct words using the PARTITION BY clause. Then, the CountInput function counts the number of words in each partition.

We don't need a special CountInput function. We can just use normal SQL:

SELECT word, COUNT(*) as frequency
FROM SplitIntoWords(
      ON documents
      )
GROUP BY word
ORDER BY frequency DESC
LIMIT 10

This is a simple example of the flexibility provided by the integration of SQL and MapReduce-style computation in SQL-MapReduce functions.