Write a SQL-MapReduce Function in Java

Teradata Aster® Developer Guide

Product

Aster Execution Engine

Release Number

7.00.02

Published

July 2017

Language

English (United States)

Last Update

2018-04-13

Product Category

Software

This section guides you through writing and invoking a simple SQL-MapReduce function. It assumes you have downloaded the Aster SQL-MapReduce SDK.

To write a SQL-MapReduce function in Java, you create a Java class that implements one of the following interfaces:

com.asterdata.ncluster.sqlmr.RowFunction
com.asterdata.ncluster.sqlmr.PartitionFunction
com.asterdata.ncluster.sqlmr.MultipleInputFunction

The class must also provide a public constructor that takes a com.asterdata.ncluster.sqlmr.RuntimeContract argument.
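The constructor requirement can be checked with plain Java reflection. The sketch below is only an illustration: `ContractSketch` is a hypothetical stand-in for `RuntimeContract` so that the example compiles without the SDK, and `hasContractConstructor`, `GoodFunction`, and `BadFunction` are invented names, not part of the Aster API. How the engine itself validates a class is not described here.

```java
final class ConstructorCheckSketch {
    /** Hypothetical stand-in for com.asterdata.ncluster.sqlmr.RuntimeContract (not the real class). */
    static final class ContractSketch { }

    /** Follows the rule: a public constructor that takes the contract. */
    static final class GoodFunction {
        public GoodFunction(ContractSketch contract) { }
    }

    /** Breaks the rule: only a no-argument constructor. */
    static final class BadFunction {
        public BadFunction() { }
    }

    /** Hypothetical helper: true if the class has a public constructor taking the contract. */
    static boolean hasContractConstructor(Class<?> candidate) {
        try {
            // getConstructor only finds public constructors with exactly these parameter types.
            candidate.getConstructor(ContractSketch.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasContractConstructor(GoodFunction.class));
        System.out.println(hasContractConstructor(BadFunction.class));
    }
}
```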

The name of your SQL-MapReduce function is the name of the Java class, ignoring case differences. For example, a function named splitintowords might be implemented by a Java class com.mycompany.SplitIntoWords.
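The case-insensitive naming rule can be illustrated with a small comparison. This is a sketch under assumptions: `matchesFunctionName` is a hypothetical helper and the empty `SplitIntoWords` class is a placeholder. It shows only the name comparison, not how the engine actually locates installed classes.

```java
final class FunctionNameSketch {
    /**
     * Hypothetical helper: true if the SQL function name equals the
     * class's simple name (package ignored), ignoring case differences.
     */
    static boolean matchesFunctionName(String sqlName, Class<?> implementation) {
        return implementation.getSimpleName().equalsIgnoreCase(sqlName);
    }

    public static void main(String[] args) {
        System.out.println(matchesFunctionName("splitintowords", SplitIntoWords.class));
        System.out.println(matchesFunctionName("SPLITINTOWORDS", SplitIntoWords.class));
        // Underscores are not a case difference, so this name does not match.
        System.out.println(matchesFunctionName("split_into_words", SplitIntoWords.class));
    }
}

/** Placeholder for a user class such as com.mycompany.SplitIntoWords. */
final class SplitIntoWords {
}
```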

Teradata Aster’s SQL-MapReduce framework supports three types of functions, one for each of these interfaces:

RowFunction
A RowFunction corresponds to a map function with a single input, and must be invoked without a PARTITION BY clause. From an interface perspective, the function is passed an iterator over an arbitrary set of rows. A RowFunction consists of two methods: the constructor and the operate method, operateOnSomeRows().
PartitionFunction
A PartitionFunction corresponds to a reduce function with a single input, and must be invoked with a PARTITION BY clause. Rows with the same values for the PARTITION BY expressions are brought together onto the same logical worker, and each invocation of the function is passed all the rows in that partition. A PartitionFunction consists of two methods: the constructor and the operate method, operateOnPartition().
MultipleInputFunction
A MultipleInputFunction is a new row-producing interface, provided for functions that require multiple inputs. It declares a row-emitting method, operateOnMultipleInputs(), which receives one or more partitioned inputs and zero or more dimensional inputs per invocation.
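The difference between the row (map) and partition (reduce) shapes can be sketched without the SDK. Everything below is a simplified stand-in: `RowFunctionSketch`, `PartitionFunctionSketch`, `ContractSketch`, `CountRows`, and the `run` driver are hypothetical, rows are reduced to plain strings, and only the method names operateOnSomeRows and operateOnPartition come from the text above. The real SDK interfaces use their own iterator, emitter, and contract types, and MultipleInputFunction is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

final class FunctionShapesSketch {
    /** Hypothetical stand-in for RuntimeContract; the real class is not modeled. */
    static final class ContractSketch { }

    /** Map-style shape: sees an arbitrary set of rows, no PARTITION BY. */
    interface RowFunctionSketch {
        void operateOnSomeRows(Iterator<String> rows, Consumer<String> emitter);
    }

    /** Reduce-style shape: each call sees all rows of exactly one partition. */
    interface PartitionFunctionSketch {
        void operateOnPartition(String partitionKey, Iterator<String> rows, Consumer<String> emitter);
    }

    /** Emits one lowercase output row per word of each input row. */
    static final class SplitIntoWords implements RowFunctionSketch {
        public SplitIntoWords(ContractSketch contract) { }

        @Override
        public void operateOnSomeRows(Iterator<String> rows, Consumer<String> emitter) {
            while (rows.hasNext()) {
                for (String word : rows.next().trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        emitter.accept(word.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
    }

    /** Emits a single "key=count" row for the partition it is given. */
    static final class CountRows implements PartitionFunctionSketch {
        public CountRows(ContractSketch contract) { }

        @Override
        public void operateOnPartition(String partitionKey, Iterator<String> rows, Consumer<String> emitter) {
            int count = 0;
            while (rows.hasNext()) {
                rows.next();
                count++;
            }
            emitter.accept(partitionKey + "=" + count);
        }
    }

    /** Toy driver standing in for the engine: map, group by word, then reduce. */
    static List<String> run(List<String> lines) {
        ContractSketch contract = new ContractSketch();

        List<String> words = new ArrayList<>();
        new SplitIntoWords(contract).operateOnSomeRows(lines.iterator(), words::add);

        // Emulates PARTITION BY word: equal values end up in the same group.
        Map<String, List<String>> partitions = new TreeMap<>();
        for (String word : words) {
            partitions.computeIfAbsent(word, k -> new ArrayList<>()).add(word);
        }

        List<String> output = new ArrayList<>();
        CountRows counter = new CountRows(contract);
        for (Map.Entry<String, List<String>> partition : partitions.entrySet()) {
            counter.operateOnPartition(partition.getKey(), partition.getValue().iterator(), output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("the quick fox", "The lazy dog")));
    }
}
```

In this toy driver the row function never needs to see related rows together, while the partition function relies on the grouping step to deliver every row for one key in a single call. That grouping is what the PARTITION BY requirement provides.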