7.00.02 - Analytics at Scale: Full Data Set Analysis - Aster Analytics

Teradata Aster® Analytics Foundation User Guide, Update 2

Product: Aster Analytics
Release Number: 7.00.02
Release Date: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)

Aster Database lets you efficiently perform analytical tasks on your full data set, in the database, rather than using samples or bulk-exporting data to a dedicated computing cluster.

Advantages of full data set analysis are:
  • Accurate, reproducible results
  • Increased iteration speed

    Pushing analytics down into a massively parallel processing (MPP) system shortens the iteration cycle. Teradata is working with partners, including SAS Institute Inc., to make this process straightforward, and is developing functions where appropriate (for example, functions that take advantage of the MapReduce paradigm).

  • "Needle in a haystack," "false negative," and "exceptional cases" searches

    Very rare events can be found (and defined) only against the background of the entire data set. For example, defining an "elite baseball player" by looking only at the 2008 SF Giants gives a very different answer than defining it against every player in Major League Baseball history.

  • Statistical significance

    Reliable analytics may require a large portion of the data, often more than fits on a single typical database machine.

  • Model tuning

    The parameters of predictive models depend on aggregate statistics of the entire data set (for example, each value's residual from the mean).

  • No meaningful way to sample

    Sampling a graph is not straightforward, especially if you are interested in critical behavior that appears only when the number of connections reaches a certain threshold.

  • Larger data sets are just different

    The resulting analytics are applied to the entire data set in the cluster. Algorithms developed on smaller data sets may not scale to the full data set and may need to be redeveloped.
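Several of the points above (iteration speed in an MPP system, statistical significance, and model tuning on aggregate statistics) rest on one pattern: a statistic over the full data set can be computed partition by partition, in parallel, and the partial results combined. The following Python sketch illustrates that pattern under stated assumptions. It does not use any Aster Database API; the names `PartialStats`, `summarize`, `merge`, `global_stats`, and `residuals` are hypothetical, and each inner list merely stands in for the rows held by one worker. Each partition is summarized with Welford's online algorithm (the "map" step), and the summaries are combined with the pairwise mean/variance update of Chan, Golub, and LeVeque (the "reduce" step).

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class PartialStats:
    """Mergeable summary of one partition: count, mean, and sum of squared deviations (m2)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0


def summarize(values: Iterable[float]) -> PartialStats:
    """'Map' step (hypothetical): run locally on one partition using Welford's update."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return PartialStats(count, mean, m2)


def merge(a: PartialStats, b: PartialStats) -> PartialStats:
    """'Reduce' step (hypothetical): combine two partial summaries exactly."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return PartialStats(count, mean, m2)


def global_stats(partitions: List[List[float]]) -> PartialStats:
    """Summarize every partition independently, then fold the summaries together."""
    return reduce(merge, (summarize(p) for p in partitions), PartialStats())


def residuals(values: Iterable[float], stats: PartialStats) -> List[float]:
    """Deviation of each value from the global mean, in units of the global (population) std dev."""
    std = math.sqrt(stats.m2 / stats.count) if stats.count else 0.0
    if std == 0.0:
        # All values identical (or no data): every residual is zero.
        return [0.0 for _ in values]
    return [(x - stats.mean) / std for x in values]


# Usage: three "workers" each hold part of the data set.
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
stats = global_stats(partitions)
print(stats.count, stats.mean, stats.m2 / stats.count)
print(residuals([9.0, 2.0], stats))
```

Only the small `PartialStats` summaries move between workers, never the raw rows, so the cost of combining does not grow with the data size. The merged mean and variance match what a single pass over all the data would produce (up to floating-point rounding), which is why residuals and similar tuning statistics can be computed from the full data set rather than from a sample.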