This section guides you through writing and invoking a simple SQL-MapReduce function. It assumes you have downloaded the Aster SQL-MapReduce SDK.
A SQL-MapReduce function is a Java class that implements one of the following interfaces:
- com.asterdata.ncluster.sqlmr.RowFunction
- com.asterdata.ncluster.sqlmr.PartitionFunction
- com.asterdata.ncluster.sqlmr.MultipleInputFunction
The class must declare a public constructor that takes a com.asterdata.ncluster.sqlmr.RuntimeContract as its argument.
The name of your SQL-MapReduce function is the name of the Java class, ignoring case differences. For example, a function named splitintowords might be implemented by the Java class com.mycompany.SplitIntoWords.
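The naming rule above can be illustrated with a plain string comparison. This is only a sketch of the rule as stated, not how the framework itself resolves names: FunctionNameRule and its matches() helper are hypothetical, and the nested SplitIntoWords class is an empty placeholder for com.mycompany.SplitIntoWords.

```java
// Hypothetical illustration of the naming rule; not part of the Aster SDK.
final class FunctionNameRule {

    // Empty placeholder standing in for com.mycompany.SplitIntoWords.
    static final class SplitIntoWords { }

    // True when the SQL function name equals the class's simple name
    // (package excluded), ignoring case differences.
    static boolean matches(String sqlName, Class<?> implementation) {
        return sqlName.equalsIgnoreCase(implementation.getSimpleName());
    }
}
```

Under this sketch, both splitintowords and SPLITINTOWORDS match the class, while split_into_words does not, because only case differences are ignored.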
Teradata Aster’s SQL-MapReduce framework supports three types of functions. A SQL-MapReduce function must implement one of these three interfaces:
- RowFunction
A RowFunction corresponds to a map function with a single input, and must be invoked without a PARTITION BY clause. From an interface perspective, the function is passed an iterator over an arbitrary subset of the input rows. A RowFunction consists of two methods: the constructor and the operate method, operateOnSomeRows().
- PartitionFunction
A PartitionFunction corresponds to a reduce function with a single input, and must be invoked with a PARTITION BY clause. Rows with the same values for the PARTITION BY expressions are brought together onto the same logical worker, and each invocation of the function is passed all the rows in that partition. A PartitionFunction consists of two methods: the constructor and the operate method, operateOnPartition().
- MultipleInputFunction
A MultipleInputFunction is a new row-producing interface, provided for functions that require multiple inputs. It implements a row-emitting method, operateOnMultipleInputs(), which receives one or more partitioned inputs and zero or more dimensional inputs per invocation.
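The difference between the row (map) and partition (reduce) styles can be sketched in plain Java. This sketch does not use the SDK: SketchContract, SketchRowFunction, SketchPartitionFunction, CountPerKey and SketchDriver are hypothetical stand-ins, rows are simplified to strings, and the method parameters are assumptions. Only the method names operateOnSomeRows() and operateOnPartition() and the contract-taking constructor mirror the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for RuntimeContract; NOT the real com.asterdata.ncluster.sqlmr type.
final class SketchContract { }

interface SketchRowFunction {
    // Receives an iterator over an arbitrary subset of the input rows.
    void operateOnSomeRows(Iterator<String> rows, List<String> out);
}

interface SketchPartitionFunction {
    // Receives every row belonging to exactly one partition.
    void operateOnPartition(String key, Iterator<String> rows, List<String> out);
}

// Map style: each row is handled independently of all other rows.
final class SplitIntoWords implements SketchRowFunction {
    SplitIntoWords(SketchContract contract) { }

    @Override
    public void operateOnSomeRows(Iterator<String> rows, List<String> out) {
        while (rows.hasNext()) {
            for (String word : rows.next().trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(word);
                }
            }
        }
    }
}

// Reduce style: one invocation sees all rows that share a partition key.
final class CountPerKey implements SketchPartitionFunction {
    CountPerKey(SketchContract contract) { }

    @Override
    public void operateOnPartition(String key, Iterator<String> rows, List<String> out) {
        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        out.add(key + "=" + count);
    }
}

final class SketchDriver {
    // No PARTITION BY: hand the rows to the function as they come.
    static List<String> runRows(SketchRowFunction f, List<String> rows) {
        List<String> out = new ArrayList<>();
        f.operateOnSomeRows(rows.iterator(), out);
        return out;
    }

    // Simulated PARTITION BY: group equal rows, then invoke once per group.
    static List<String> runPartitioned(SketchPartitionFunction f, List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            groups.computeIfAbsent(row, k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            f.operateOnPartition(e.getKey(), e.getValue().iterator(), out);
        }
        return out;
    }
}
```

Feeding the two rows "to be or" and "not to be" through runRows with SplitIntoWords yields six words; passing those words to runPartitioned with CountPerKey yields one output row per distinct word. In the real framework the grouping is performed by the database according to the PARTITION BY clause, not by the function author.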