Background - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:mapPath
kiu1466024880662.ditamap
dita:ditavalPath
AA-notempfilter_pdf_output.ditaval
dita:id
B700-1021
lifecycle
previous
Product Category
Software

When the column or combination of columns has a large cardinality, the function uses the Flajolet-Martin algorithm, which approximates the number of distinct elements in a large set of numbers with a single pass, by counting some bitmap functions of the hashed values of the large set of numbers.

The value nmap/φ*2 (S/nmap) asymptotically converges to the number of distinct values in the set, where:

  • S is the calculated sum of the bitmap function.
  • nmap is the number of hash map function used, determined by specified error tolerance.
  • φ is the constant with approximate value 0.77.

When the number of distinct values in the set is small, the function counts them, rather than using the Flajolet-Martin algorithm. To understand why, consider the case where the distinct count is 5: The value nmap/φ*2 (S/nmap) is approximately 85 when the error is 10% and approximately 10590 when error is 1%.

For more information about probabilistic counting algorithms, see Probabilistic Counting Algorithms for Data Base Applications, by Philippe Flajolet and G. Nigel Martin (http://portal.acm.org/citation.cfm?id=5215).