Semantically, a SQL-MapReduce input is either partitioned or dimensional.
Partitioned Input
- PARTITION BY ANY
The input has no required partitioning attributes, so its rows can be distributed among the vworkers in any way. If the data for that input is already partitioned, PARTITION BY ANY preserves the existing partitioning. A function can have at most one PARTITION BY ANY input.
- PARTITION BY p_attribute_set
The input is sorted and partitioned on the columns specified by p_attribute_set.
A function can have multiple PARTITION BY p_attribute_set inputs. All PARTITION BY p_attribute_set clauses must specify the same number of attributes, and corresponding attributes must be equijoin-compatible (that is, either of the same data type or of data types that can be implicitly cast to match). This casting is partition safe, which means that it does not cause redistribution of data on the vworkers.
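The co-location that results from these rules can be illustrated with a small simulation. This is a conceptual sketch in Python, not SQL-MapReduce syntax and not a description of how the system is implemented. The names vworker_for, partition_input, orders, and clicks are hypothetical, and hash-based placement is an assumption, since the text above does not say how partition keys map to vworkers.

```python
import zlib
from collections import defaultdict

NUM_VWORKERS = 3  # hypothetical number of vworkers


def vworker_for(key, num_vworkers=NUM_VWORKERS):
    """Assign a partition key (a tuple of attribute values) to a vworker."""
    # Assumption: deterministic hash-based placement, chosen for illustration.
    return zlib.crc32(repr(key).encode("utf-8")) % num_vworkers


def partition_input(rows, key_columns, num_vworkers=NUM_VWORKERS):
    """Group rows by partition key and place each group on one vworker."""
    placement = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        placement[vworker_for(key, num_vworkers)][key].append(row)
    return placement


# Two inputs partitioned on one attribute each; the column names differ,
# but the attributes have the same data type (integer).
orders = [
    {"cust_id": 1, "amount": 30},
    {"cust_id": 2, "amount": 15},
    {"cust_id": 1, "amount": 12},
]
clicks = [
    {"customer": 1, "page": "home"},
    {"customer": 2, "page": "cart"},
]

orders_by_worker = partition_input(orders, ["cust_id"])
clicks_by_worker = partition_input(clicks, ["customer"])
```

Because both inputs produce keys of the same shape and type, rows with equal key values land on the same simulated vworker. In this sketch, values of different types (for example 1 and 1.0) would hash differently, which is why the attributes need to be brought to a common type before they are compared.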
Dimensional Input
A dimensional input, identified by the keyword DIMENSION, is copied in full to each vworker. Every vworker needs the complete dimensional input because, like function arguments, it provides information that the function needs regardless of which partition is being processed. The most common dimensional inputs are lookup tables and trained models.
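The contrast with a partitioned input can be sketched with a small simulation. This is a conceptual model in Python, not SQL-MapReduce syntax. The names distribute_dimension, region_lookup, partitioned_rows, and run_on_vworker are hypothetical, and the placement of the partitioned rows is made up for illustration.

```python
import copy

NUM_VWORKERS = 3  # hypothetical number of vworkers


def distribute_dimension(rows, num_vworkers=NUM_VWORKERS):
    """Give every vworker its own full copy of the dimensional input."""
    return {w: copy.deepcopy(rows) for w in range(num_vworkers)}


# A small lookup table used as the dimensional input.
region_lookup = [
    {"code": "N", "region": "North"},
    {"code": "S", "region": "South"},
]
dimension_by_worker = distribute_dimension(region_lookup)

# Rows of a partitioned input, already placed on vworkers (assumed placement).
partitioned_rows = {
    0: [{"store": "A", "code": "N"}],
    1: [{"store": "B", "code": "S"}],
    2: [],
}


def run_on_vworker(worker):
    """Enrich the local partitioned rows using the local copy of the lookup table."""
    lookup = {r["code"]: r["region"] for r in dimension_by_worker[worker]}
    return [{**row, "region": lookup[row["code"]]} for row in partitioned_rows[worker]]


results = {w: run_on_vworker(w) for w in range(NUM_VWORKERS)}
```

Each simulated vworker resolves its own rows without consulting any other vworker, which is the reason a lookup table or trained model is supplied as a dimensional input.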