Write a SQL-MapReduce Function in Java - Aster Execution Engine

Teradata Aster® Developer Guide

Product: Aster Execution Engine
Release Number: 7.00.02
Published: July 2017
Language: English (United States)
Last Update: 2018-04-13
Product Category: Software

This section guides you through writing and invoking a simple SQL-MapReduce function. It assumes you have downloaded the Aster SQL-MapReduce SDK.

To write a SQL-MapReduce function in Java, you create a Java class that implements one of the following interfaces:
  • com.asterdata.ncluster.sqlmr.RowFunction
  • com.asterdata.ncluster.sqlmr.PartitionFunction
  • com.asterdata.ncluster.sqlmr.MultipleInputFunction

The class must declare a public constructor that takes a com.asterdata.ncluster.sqlmr.RuntimeContract.
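The constructor requirement can be illustrated with a small, self-contained sketch. The RuntimeContract and RowFunction types below are empty, hypothetical stand-ins declared inside the example so that it compiles without the SDK; they are not the real com.asterdata.ncluster.sqlmr types, and EchoInput and NoContractConstructor are made-up class names. The reflective lookup is only an assumption about how a framework could locate such a constructor, not a description of what Aster does internally.

```java
import java.lang.reflect.Constructor;

public final class ConstructorSketch {
    /** Hypothetical stand-in for the SDK's RuntimeContract; not the real type. */
    public static final class RuntimeContract { }

    /** Hypothetical stand-in for one of the SDK function interfaces. */
    public interface RowFunction { }

    /** Satisfies the rule: implements the interface and has a public (RuntimeContract) constructor. */
    public static final class EchoInput implements RowFunction {
        private final RuntimeContract contract;

        public EchoInput(RuntimeContract contract) {
            this.contract = contract;
        }

        public RuntimeContract contract() {
            return contract;
        }
    }

    /** Breaks the rule: only a no-argument constructor is available. */
    public static final class NoContractConstructor implements RowFunction {
        public NoContractConstructor() { }
    }

    /** Builds a function instance through its public one-argument constructor, or fails loudly. */
    public static RowFunction instantiate(Class<? extends RowFunction> functionClass,
                                          RuntimeContract contract) {
        try {
            Constructor<? extends RowFunction> ctor =
                    functionClass.getConstructor(RuntimeContract.class);
            return ctor.newInstance(contract);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "No usable public (RuntimeContract) constructor on " + functionClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        RowFunction ok = instantiate(EchoInput.class, new RuntimeContract());
        System.out.println("Created " + ok.getClass().getSimpleName());
        try {
            instantiate(NoContractConstructor.class, new RuntimeContract());
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In a real function, the constructor would receive the SDK's own RuntimeContract rather than this placeholder.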

The name of your SQL-MapReduce function is the name of the Java class, ignoring case. For example, a function named splitintowords might be implemented by a Java class com.mycompany.SplitIntoWords.
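The naming rule can be sketched as a case-insensitive comparison against the class name. This is only an illustration of the rule as stated here: it assumes the package prefix is ignored (as the com.mycompany.SplitIntoWords example suggests), and the helper names sqlNameFor and resolve, as well as the class com.mycompany.CountTokens, are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public final class FunctionNameSketch {
    /** Derives the SQL-facing name: the simple class name, lower-cased (assumes the package is ignored). */
    public static String sqlNameFor(String fullyQualifiedClassName) {
        int lastDot = fullyQualifiedClassName.lastIndexOf('.');
        String simpleName = fullyQualifiedClassName.substring(lastDot + 1);
        return simpleName.toLowerCase(Locale.ROOT);
    }

    /** Finds the first class whose simple name matches the function name, ignoring case. */
    public static Optional<String> resolve(String functionName, List<String> classNames) {
        String wanted = functionName.toLowerCase(Locale.ROOT);
        return classNames.stream()
                .filter(name -> sqlNameFor(name).equals(wanted))
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical set of installed function classes.
        List<String> installed = Arrays.asList(
                "com.mycompany.SplitIntoWords",
                "com.mycompany.CountTokens");

        System.out.println(resolve("splitintowords", installed));
        System.out.println(resolve("SPLITINTOWORDS", installed));
        System.out.println(resolve("nosuchfunction", installed));
    }
}
```

The first two lookups resolve to the same class because only the letter case differs; the third finds nothing.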

The Teradata Aster SQL-MapReduce framework supports three types of functions, one for each of these interfaces:

  • RowFunction

    A RowFunction corresponds to a map function with a single input, and must be invoked without a PARTITION BY clause. From an interface perspective, the function is passed an iterator over an arbitrary set of rows. A RowFunction consists of two methods: the constructor and the operate method, operateOnSomeRows().

  • PartitionFunction

    A PartitionFunction corresponds to a reduce function with a single input, and must be invoked with a PARTITION BY clause. Rows with the same values for the PARTITION BY expressions are brought together onto the same logical worker, and each invocation of the function is passed all the rows in that partition. A PartitionFunction consists of two methods: the constructor and the operate method, operateOnPartition().

  • MultipleInputFunction

    A MultipleInputFunction is a new row-producing interface, provided for functions that require multiple inputs. It declares a row-emitting method, operateOnMultipleInputs(), which receives one or more partitioned inputs and zero or more dimensional inputs per invocation.
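The difference between the map-style and reduce-style contracts can be sketched in plain Java. This is a simulation under stated assumptions, not SDK code: the methods borrow the names operateOnSomeRows and operateOnPartition, but their signatures here (plain iterators in, lists or strings out) are simplified placeholders, and runPartitioned is a hypothetical helper that mimics what a PARTITION BY clause arranges. MultipleInputFunction is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class RowVersusPartitionSketch {
    /** Map-style: handed an arbitrary batch of rows; emits one output row per word. */
    public static List<String> operateOnSomeRows(Iterator<String> rows) {
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            for (String word : rows.next().trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(word);
                }
            }
        }
        return out;
    }

    /** Reduce-style: handed every row of one partition; emits a single "key:count" row. */
    public static String operateOnPartition(String key, Iterator<String> partitionRows) {
        int count = 0;
        while (partitionRows.hasNext()) {
            partitionRows.next();
            count++;
        }
        return key + ":" + count;
    }

    /** Simulates PARTITION BY: groups equal rows together, then invokes the function once per group. */
    public static List<String> runPartitioned(List<String> rows) {
        Map<String, List<String>> partitions = new LinkedHashMap<>();
        for (String row : rows) {
            partitions.computeIfAbsent(row, k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> p : partitions.entrySet()) {
            out.add(operateOnPartition(p.getKey(), p.getValue().iterator()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("the quick fox", "the lazy dog");
        List<String> words = operateOnSomeRows(documents.iterator());
        System.out.println(words);                 // [the, quick, fox, the, lazy, dog]
        System.out.println(runPartitioned(words)); // [the:2, quick:1, fox:1, lazy:1, dog:1]
    }
}
```

The row-style method never needs to see related rows together, which is why it can run on any subset of the input. The partition-style method depends on receiving all rows for a key in one call, which is the guarantee the PARTITION BY clause provides.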