This example calculates the number of distinct values for each specified column with a 1% error rate (accuracy).
Input
The input table, cracker, has more than 3000 rows of price and advertisement information for U.S. cracker brands Sunshine, Keebler, Nabisco, and a private label (such as a store brand). Input column names have these meanings:
Column | Meaning |
---|---|
dispbrand | Seller displayed brand prominently. |
featbrand | Seller featured brand. |
pricebrand | Price of brand. |
sn | id | dispsunshine | dispkeebler | dispnabisco | dispprivate | featsunshine | featkeebler | featnabisco | featprivate | pricesunshine | pricekeebler | pricenabisco | priceprivate | choice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 98 | 88 | 120 | 71 | nabisco |
2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 109 | 99 | 71 | nabisco |
3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 109 | 109 | 78 | sunshine |
4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 103 | 109 | 89 | 78 | nabisco |
5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 109 | 119 | 64 | nabisco |
6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 89 | 109 | 119 | 84 | nabisco |
7 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 109 | 109 | 129 | 78 | sunshine |
8 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 109 | 119 | 129 | 78 | nabisco |
9 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 109 | 121 | 109 | 78 | nabisco |
10 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 79 | 121 | 109 | 78 | nabisco |
11 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 109 | 113 | 109 | 96 | nabisco |
12 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 121 | 99 | 86 | nabisco |
13 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 89 | 121 | 99 | 86 | nabisco |
14 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 109 | 109 | 129 | 96 | nabisco |
15 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
SQL Call
SELECT * FROM ApproxCardinalityReduce( ON ApproxCardinalityMap( ON cracker AS InputTable USING TargetColumns('sn', 'id', 'dispsunshine', 'dispkeebler', 'dispnabisco', 'dispprivate', 'featsunshine', 'featkeebler','featnabisco', 'featprivate', 'pricesunshine', 'pricekeebler', 'pricenabisco', 'priceprivate', 'choice') ErrorRate(1) ) PARTITION BY column_name ) AS dt ORDER BY column_name;
Output
column_name cnt method ------------- ---- --------- choice 4 nearExact dispkeebler 2 nearExact dispnabisco 2 nearExact dispprivate 2 nearExact dispsunshine 2 nearExact featkeebler 2 nearExact featnabisco 2 nearExact featprivate 2 nearExact featsunshine 2 nearExact id 136 nearExact pricekeebler 29 nearExact pricenabisco 42 nearExact priceprivate 39 nearExact pricesunshine 27 nearExact sn 3292 nearExact
Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.