An Aster instance uses an invocation request-reply architecture to communicate with SQL-MapReduce functions. The Queen Planner (referred to below as the Planner) fills in the invocation request data structure and passes it to the function. The function, in turn, updates the reply data structure to pass information back to the Planner, with the guarantee that the resulting Runtime Contract will be honored at execution.
In this architecture:
- The Planner sends the input columns, their data types, and function arguments to the function.
- The function describes the output columns and their data types.
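The exchange above can be sketched as follows, in Python for brevity. This is only an illustration: the names `Column`, `InvocationRequest`, `InvocationReply`, and `describe_output` are hypothetical and are not the Aster SDK API. The sketch shows only which information flows in each direction.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str


@dataclass
class InvocationRequest:
    """Hypothetical shape of what the Planner sends to the function."""
    input_columns: List[Column]
    arguments: Dict[str, str] = field(default_factory=dict)


@dataclass
class InvocationReply:
    """Hypothetical shape of what the function fills in for the Planner."""
    output_columns: List[Column] = field(default_factory=list)


def describe_output(request: InvocationRequest, reply: InvocationReply) -> None:
    """Hypothetical function-side hook: pass the input columns through
    and append one integer column named by a function argument."""
    reply.output_columns = list(request.input_columns)
    new_name = request.arguments.get("output_column", "result")
    reply.output_columns.append(Column(new_name, "integer"))


# Planner side: fill in the request and hand it to the function.
request = InvocationRequest(
    input_columns=[Column("user_id", "integer"), Column("url", "varchar")],
    arguments={"output_column": "session_id"},
)
reply = InvocationReply()
describe_output(request, reply)

# The reply now describes the output schema the function commits to.
print([(c.name, c.data_type) for c in reply.output_columns])
```

In this sketch the reply carries only the output schema, which matches the two bullet points above.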
Each operator in the plan either produces a data stream or affects the properties of the data stream input to it. The properties of the data stream can be classified as follows:
- Logical properties
Two comparable plans have the same logical (relational) properties, such as keys, functional dependencies, and schema.
- Physical properties
Physical properties include the order of data in the stream and its distribution. Two comparable plans can have different physical properties.
- Estimated properties
Estimated properties include cardinality and cost.
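One way to picture this classification is as three separate records attached to a data stream. The sketch below is a simplification with hypothetical names (`LogicalProperties`, `PhysicalProperties`, `EstimatedProperties`, `comparable`); it does not reflect how the Planner stores properties internally. It assumes, for illustration, that two plans are treated as comparable exactly when their logical properties match.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LogicalProperties:
    schema: Tuple[str, ...]
    keys: Tuple[str, ...] = ()


@dataclass(frozen=True)
class PhysicalProperties:
    order_by: Tuple[str, ...] = ()
    distributed_on: Optional[str] = None


@dataclass(frozen=True)
class EstimatedProperties:
    cardinality: int
    cost: float


@dataclass(frozen=True)
class StreamProperties:
    logical: LogicalProperties
    physical: PhysicalProperties
    estimated: EstimatedProperties


def comparable(a: StreamProperties, b: StreamProperties) -> bool:
    """Assumption for this sketch: plans are comparable when their
    logical properties are equal; physical and estimated may differ."""
    return a.logical == b.logical


logical = LogicalProperties(schema=("user_id", "url"), keys=("user_id",))

# Same result, produced with different ordering, distribution, and cost.
plan_a = StreamProperties(
    logical,
    PhysicalProperties(order_by=("user_id",), distributed_on="user_id"),
    EstimatedProperties(cardinality=1000, cost=12.5),
)
plan_b = StreamProperties(
    logical,
    PhysicalProperties(),
    EstimatedProperties(cardinality=1000, cost=8.0),
)

print(comparable(plan_a, plan_b))
```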
A SQL-MapReduce function is not aware of all the properties of its input data stream. Reduce functions are aware of how the data is partitioned and, optionally, how it is ordered. Mapping functions are optionally aware of the order, but not of how the input data stream is distributed. No SQL-MapReduce function is aware of whether the data stream has keys or whether any functional dependencies exist.
Moreover, the SQL-MapReduce function appears as a black box to the Planner, because the Planner does not know the properties of the data stream that the function outputs. This lack of information leads the Planner to perform redundant operations such as data redistribution and sorting, even when the function's output is already correctly distributed and sorted.
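The cost of the black box can be illustrated with a small sketch of the decision a planner faces. The function `enforcers_needed` and its parameters are hypothetical and are not part of any Aster interface. The sketch also assumes, as a simplification, that a redistribution does not preserve order, so a sort is needed again after it.

```python
from typing import List, Optional, Tuple


def enforcers_needed(
    required_distribution: str,
    required_order: Tuple[str, ...],
    known_distribution: Optional[str],
    known_order: Optional[Tuple[str, ...]],
) -> List[str]:
    """Return the extra steps a planner would add after a function.

    A value of None means the property is unknown (the black-box case),
    so the planner must assume the requirement is not met.
    """
    steps: List[str] = []

    redistributed = known_distribution != required_distribution
    if redistributed:
        steps.append("redistribute on " + required_distribution)

    # Assumption: redistribution destroys any existing order.
    order_ok = (
        not redistributed
        and known_order is not None
        and known_order[: len(required_order)] == required_order
    )
    if required_order and not order_ok:
        steps.append("sort by " + ", ".join(required_order))

    return steps


# Black box: nothing is known about the function's output.
black_box = enforcers_needed("user_id", ("user_id", "ts"), None, None)
print(black_box)  # ['redistribute on user_id', 'sort by user_id, ts']

# Output properties are known and already satisfy the requirement.
declared = enforcers_needed(
    "user_id", ("user_id", "ts"), "user_id", ("user_id", "ts")
)
print(declared)  # []
```

In the black-box case, both steps are added even if the data happens to be correctly distributed and sorted already, which is the redundancy described above.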
To eliminate this redundancy and improve performance, the Aster Execution Engine provides collaborative planning, which is described in the next section.