The distribution property describes how rows are distributed across vWorkers. The PARTITION BY clause specifies how the input rows are grouped by value. The planner implements the PARTITION BY requirement by treating the PARTITION BY column as the ‘distribution column’ and distributing the data across vWorkers on that column. Distributing the data on a column also ensures that all rows with the same value in that column land on the same vWorker and are seen together.
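To make the co-location guarantee concrete, here is a minimal Python sketch of hash distribution on a column. It is a simulation only, not the planner's implementation: the `distribute` helper, the `crc32` hash, the worker count, and the sample rows are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def distribute(rows, column, num_workers):
    """Assign each row to a worker by hashing the value in `column`.

    Hypothetical helper: crc32 stands in for whatever hash the planner uses.
    """
    workers = defaultdict(list)
    for row in rows:
        key = str(row[column]).encode("utf-8")
        workers[zlib.crc32(key) % num_workers].append(row)
    return dict(workers)


# Sample input rows (made up for the example).
rows = [
    {"user_id": 1, "page": "home"},
    {"user_id": 2, "page": "cart"},
    {"user_id": 1, "page": "search"},
    {"user_id": 3, "page": "home"},
    {"user_id": 2, "page": "checkout"},
]

# Treat user_id as the distribution column across 3 simulated vWorkers.
placement = distribute(rows, "user_id", num_workers=3)

for worker, local_rows in sorted(placement.items()):
    print(worker, local_rows)
```

Because the worker is chosen from the column value alone, every row with a given `user_id` ends up in the same worker's list, which is what lets a function see those rows together.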
The input distribution survives in the output if two conditions hold: the function does not destroy the distribution (that is, it does not move data from one vWorker to another, which a function is unlikely to do), and, in the case of hash distribution, the distribution column is in the output column list.
The same argument holds for order. So when the function executes operateOnSomeRows() or operateOnPartition(), it does not need to do anything to enforce the distribution and order properties of the output; they are inherent in the output.
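The reasoning about surviving properties can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the SQL-MR API: `worker_for`, `run_locally`, `is_hash_distributed`, `is_ordered`, the `crc32` hash, and the sample columns are hypothetical names introduced here. The function is applied row by row on each worker, so no row changes worker and the per-worker row order is unchanged.

```python
import zlib

NUM_WORKERS = 3  # hypothetical number of vWorkers


def worker_for(value):
    """Stand-in hash placement: which worker a distribution-column value maps to."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_WORKERS


def run_locally(placement, row_function):
    """Apply row_function to each worker's rows in place; rows never change worker."""
    return {w: [row_function(r) for r in rows] for w, rows in placement.items()}


def is_hash_distributed(placement, column):
    """True if every row carries `column` and sits on the worker its hash selects."""
    return all(
        column in row and worker_for(row[column]) == w
        for w, rows in placement.items()
        for row in rows
    )


def is_ordered(placement, column):
    """True if each worker's rows are in non-decreasing order of `column`."""
    return all(
        rows[i][column] <= rows[i + 1][column]
        for rows in placement.values()
        for i in range(len(rows) - 1)
    )


# Input: hash-distributed on user_id and ordered by ts within each worker.
input_rows = [{"user_id": u, "ts": t} for u in (1, 2, 3, 4) for t in (10, 20, 30)]
placement = {}
for row in sorted(input_rows, key=lambda r: r["ts"]):
    placement.setdefault(worker_for(row["user_id"]), []).append(row)

# One function keeps the distribution column in its output, the other drops it.
keeps_key = run_locally(placement, lambda r: {**r, "late": r["ts"] > 15})
drops_key = run_locally(placement, lambda r: {"ts": r["ts"]})

print(is_hash_distributed(keeps_key, "user_id"), is_ordered(keeps_key, "ts"))
print(is_hash_distributed(drops_key, "user_id"))
```

In this sketch the function that keeps `user_id` in its output inherits both properties without any extra work, while the one that drops `user_id` can no longer be described as hash-distributed on that column, even though no row moved.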
The input distribution is:
- 'Distributed' if the function operates on a fact table.
- 'Replicated' if the function operates on a replicated dimension table.
- 'Any' if the data is distributed on all nodes.