The Aster Database In-Database MapReduce framework, SQL-MapReduce, lets you write functions in Java or C, save these functions in the cluster, and allow analysts to run them in a parallel fashion on Aster Database for efficient data analysis.
Analysts invoke a SQL-MapReduce function in a SELECT query and receive function output as if the function were a table. A SQL-MapReduce function takes as input one or more sets of rows from tables or views (for example, the contents of a table in the database, the output of a SQL SELECT statement, or the output of another SQL-MapReduce function) and produces a set of rows as output.
SQL-MapReduce functions can accept multiple inputs. For more information, see SQL-MapReduce with Multiple Inputs.
Because a call to a SQL-MapReduce function results in a set of parallel tasks being run across the cluster, the input data provided to a SQL-MapReduce function must be divided across the parallel tasks.
SQL-MapReduce Function Types
The following table summarizes the SQL-MapReduce function types.
|Function Type||How Function Takes Input||Aster Database SQL-MapReduce API Interface for Function||SQL Statement That Calls Function Includes||Further Description|
|Single-input row||One row at a time, in any order.||RowFunction||ON input_table||Operates on individual rows.
Corresponds to a map function in traditional map-reduce systems.
|Single-input partition||One partition at a time.
In a partition, rows are grouped by a specified key of one or more columns.
|PartitionFunction||ON input_table PARTITION BY attributes||Operates on rows that share a partition.
Has simultaneous access to all rows in a partition, enabling more complex processing than possible with row-wise input. Within each partition, you can sort rows with an ORDER BY clause.
Corresponds to a reduce function in traditional map-reduce systems.
|Multiple-input||From multiple sources.
Inputs can include a cogroup operation in which inputs from multiple sources are partitioned and combined before being processed, a dimension operation where all rows of one or more inputs are replicated to each vworker, or a combination of both.
|MultipleInputFunction||A combination of the following, to specify each input and how to distribute its rows:
||See Rules for Number of Inputs by Type.|
A SQL-MapReduce function:
- Uses the Aster Database API (which supports the languages Java and C).
- Is compiled outside the database.
- Is installed (uploaded to the cluster) using Aster Database ACT.
- Is invoked with a SQL statement, whose ON clauses specify inputs.
- Receives, as input, rows of one or more database tables or views, pre-existing trained models, or the result of another SQL-MapReduce function.
- Receives, as arguments, zero or more argument clauses (parameters), which can modify function behavior.
- Returns output rows to the database.
- Is polymorphic.
During initialization, the function gets its input schema (for example, (key, value)) and instructions for returning its output schema.
- Is designed to run on an MPP system by allowing the user to specify the slice of the data (partition) that a particular instance of the function can access.