Aster Database lets you efficiently perform analytical tasks on your full data set, in the database, rather than using samples or bulk-exporting data to a dedicated computing cluster.
- Accurate, reproducible results
- Increased iteration speed
Pushing down analytics into a massively parallel processing (MPP) system decreases the iteration cycle time. Teradata is working with partners, including the SAS Institute, Inc., to make this process straightforward, and is developing functions where appropriate (for example, functions that take advantage of the MapReduce paradigm).
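To illustrate the MapReduce paradigm mentioned above, here is a minimal sketch in plain Python (the function and variable names are illustrative, not Aster Database APIs): each worker computes a partial aggregate over its own partition (map), and the partials are combined into a single global result (reduce), so the answer matches a single-node computation without moving the raw data.

```python
# Minimal MapReduce-style aggregation sketch (illustrative names only).
from functools import reduce

def map_partition(rows):
    """Per-partition partial aggregate: (sum, count)."""
    return (sum(rows), len(rows))

def reduce_partials(a, b):
    """Combine two partial aggregates into one."""
    return (a[0] + b[0], a[1] + b[1])

# Data as it might be spread across three workers.
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]

total, count = reduce(reduce_partials, map(map_partition, partitions))
mean = total / count   # identical to computing the mean on one machine
```

Because only the small (sum, count) pairs travel between nodes, the pattern scales to data sets far larger than any single machine's memory.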
- "Needle in a haystack," "false
negative," and "exceptional cases" searches
Very rare events can only be found (and defined) against the background of the entire data set (consider defining "elite baseball player" by looking at the 2008 SF Giants, as opposed to every player in Major League Baseball history).
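A small sketch of why samples miss rare events (plain Python, with synthetic data; the names are illustrative): a single extreme value planted in 100,000 rows is found by a full scan, but a 0.1% sample almost certainly never sees it.

```python
# A needle-in-a-haystack search on synthetic data.
import random

random.seed(0)
population = [random.gauss(0, 1) for _ in range(100_000)]
population[12345] = 9.0        # plant one "needle": a genuine rare event

sample = population[::1000]    # a 0.1% systematic sample (100 rows)

full_max = max(population)     # the needle is visible only in the full scan
sample_max = max(sample)       # a typical extreme, nowhere near the needle
```

The sample's maximum is a perfectly ordinary value; only the full data set reveals (and lets you define) the exceptional case.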
- Statistical significance
Reliable analytics may require using a large portion of the data, which cannot fit on a typical single database machine.
- Model tuning
The parameters of predictive models depend on aggregate statistics of the entire data set (for example, residuals from the mean).
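As a concrete sketch of that dependency (plain Python, illustrative data): the residuals, and any parameter tuned from them such as the variance, are defined relative to the full-data mean, so changing which rows participate changes every residual.

```python
# Residuals and variance depend on the mean of the *entire* data set.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)
residuals = [x - mean for x in data]
variance = sum(r * r for r in residuals) / len(data)
```

Compute the same statistics on a subset and the mean shifts, which shifts every residual; a model tuned on sampled residuals is tuned to a different distribution than the one it will score.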
- No meaningful way to sample
Sampling a graph is not straightforward, especially if one is interested in critical behavior that only appears when a certain threshold of connections is reached.
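A small sketch of how sampling destroys exactly the structure of interest (plain Python with a union-find component count; all names are illustrative): a ring of ten nodes is a single connected component, but keeping every other node, and only the edges between kept nodes, shatters it into isolated vertices.

```python
# Sampling a connected graph can eliminate its connectivity entirely.
def component_count(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)           # union the endpoints
    return len({find(n) for n in nodes})

nodes = list(range(10))
edges = [(i, (i + 1) % 10) for i in nodes]     # a ring: one component

sampled = set(range(0, 10, 2))                 # keep every other node
sampled_edges = [(a, b) for a, b in edges
                 if a in sampled and b in sampled]   # no edge survives

full_components = component_count(nodes, edges)
sampled_components = component_count(sampled, sampled_edges)
```

Any analysis of connectivity run on the sample would conclude the graph has no structure at all, which is why threshold-dependent behavior must be studied on the full graph.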
- Larger data sets are just different
The resulting analytics are applied to the entire data set in the cluster. Algorithms developed on smaller data sets may not scale appropriately to the full data set, requiring redevelopment.