Open-source R is designed to run in a single-threaded environment on data held in the local system’s memory. Because of this design, R fails when the data is too large to fit in memory. How much data R can handle depends on the specific system configuration and on the memory actually available at a given point in time. The call-by-value semantics of R exacerbate this limitation, because multiple copies of the data can be created in memory as it flows from one function to another.
Data scientists and statisticians using R routinely analyze large data sets stored in relational databases. In most cases, their only option is to download the data into an R environment. This approach causes several problems. Extracting data from a relational database (and exporting it back) is time-consuming, which typically prohibits interactive data analysis. The downloaded copy unnecessarily duplicates data storage in the organization. Finally, processing large amounts of data in R requires either sampling or a system with large amounts of memory and storage.
The Aster R product addresses these challenges by making in-database execution of R possible. Executing R within the Aster Database eliminates the need to transfer data between the database and the R client. It also allows Aster R users to take advantage of parallel computation across many nodes and automatic scaling made possible by the Aster Database.