When the column or combination of columns has a large cardinality, the function uses the Flajolet-Martin algorithm, which approximates the number of distinct elements in a large set of numbers with a single pass, by counting some bitmap functions of the hashed values of the large set of numbers.
The value nmap/φ*2 (S/nmap) asymptotically converges to the number of distinct values in the set, where:
- S is the calculated sum of the bitmap function.
- nmap is the number of hash map function used, determined by specified error tolerance.
- φ is the constant with approximate value 0.77.
When the number of distinct values in the set is small, the function counts them, rather than using the Flajolet-Martin algorithm. To understand why, consider the case where the distinct count is 5: The value nmap/φ*2 (S/nmap) is approximately 85 when the error is 10% and approximately 10590 when error is 1%.
For more information about probabilistic counting algorithms, see Probabilistic Counting Algorithms for Data Base Applications, by Philippe Flajolet and G. Nigel Martin (http://portal.acm.org/citation.cfm?id=5215).