7.00.02 - Background - Aster Analytics

Teradata Aster® Analytics Foundation User GuideUpdate 2

Product
Aster Analytics
Release Number
7.00.02
Published
September 2017
Content Type
Programming Reference
User Guide
Publication ID
B700-1022-700K
Language
English (United States)
Last Update
2018-04-17

When the column or combination of columns has a large cardinality, the function uses the Flajolet-Martin algorithm, which approximates the number of distinct elements in a large set of numbers with a single pass, by counting some bitmap functions of the hashed values of the large set of numbers.

The value nmap/φ*2 (S/nmap) asymptotically converges to the number of distinct values in the set, where:

  • S is the calculated sum of the bitmap function.
  • nmap is the number of hash map function used, determined by specified error tolerance.
  • φ is the constant with approximate value 0.77.

When the number of distinct values in the set is small, the function counts them, rather than using the Flajolet-Martin algorithm. To understand why, consider the case where the distinct count is 5: The value nmap/φ*2 (S/nmap) is approximately 85 when the error is 10% and approximately 10590 when error is 1%.

For more information about probabilistic counting algorithms, see Probabilistic Counting Algorithms for Data Base Applications, by Philippe Flajolet and G. Nigel Martin (http://portal.acm.org/citation.cfm?id=5215).