You invoke the RunOnSpark function in a SQL-MapReduce query:
SELECT select_list FROM RunOnSpark (ON ...);
The query runs in parallel on the vworkers of the Aster cluster. The vworkers communicate with Spark, which they treat as an external processing engine. The query executes as follows:
- The SQL-MapReduce instances synchronize themselves.
- Aster Database starts the Spark job at runtime by submitting a request to the Spark manager.
- Aster Database transfers the data for the job to the Spark manager.
- Spark processes the data.
- Spark sends the results to the SQL-MapReduce instances.
- Aster Database processes the results.
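The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch of the control flow, not Aster or Spark code: threads stand in for the SQL-MapReduce instances, and `FakeSparkJob` and `run_on_spark_sketch` are hypothetical names invented for this example. The "job" simply upper-cases each row so that the result is easy to check.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSparkJob:
    """Hypothetical stand-in for a Spark job; not a real Spark API."""

    def process(self, rows):
        # Step 4: "Spark" processes the data (here: upper-case each row).
        return [row.upper() for row in rows]


def run_on_spark_sketch(partitions):
    """Simulate the six steps with one thread per data partition."""
    if not partitions:
        return []

    barrier = threading.Barrier(len(partitions))
    job_ready = threading.Event()
    state = {}

    def instance(rows):
        # Step 1: the instances synchronize themselves.
        arrival = barrier.wait()
        # Step 2: the job is started once, at runtime (assumption: the
        # thread that receives index 0 from the barrier submits it).
        if arrival == 0:
            state["job"] = FakeSparkJob()
            job_ready.set()
        job_ready.wait()
        # Steps 3 to 5: transfer the rows, let the job process them,
        # and receive the results back in this instance.
        return state["job"].process(rows)

    # One thread per partition, so that every party reaches the barrier.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_instance = list(pool.map(instance, partitions))

    # Step 6: the database side processes the results (here: flatten).
    return [row for part in per_instance for row in part]


if __name__ == "__main__":
    print(run_on_spark_sketch([["a", "b"], ["c"]]))
```

In the real system the job request, data transfer, and result transfer cross process and possibly cluster boundaries; the sketch keeps everything in one process only to make the ordering of the steps visible.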
The following figure shows how Aster Database and Spark communicate. Aster Database and Spark can be on the same cluster or on different clusters.