This example calculates the number of distinct values for each specified column with a 1% error rate (accuracy).
Input
The input table, cracker, has more than 3000 rows of price and advertisement information for U.S. cracker brands Sunshine, Keebler, Nabisco, and a private label (such as a store brand). Input column names have these meanings:
Column |
Meaning |
dispbrand
|
Seller displayed brand prominently. |
featbrand
|
Seller featured brand. |
pricebrand
|
Price of brand. |
cracker
sn |
id |
dispsunshine |
dispkeebler |
dispnabisco |
dispprivate |
featsunshine |
featkeebler |
featnabisco |
featprivate |
pricesunshine |
pricekeebler |
pricenabisco |
priceprivate |
choice |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
98 |
88 |
120 |
71 |
nabisco |
2 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
99 |
109 |
99 |
71 |
nabisco |
3 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
49 |
109 |
109 |
78 |
sunshine |
4 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
103 |
109 |
89 |
78 |
nabisco |
5 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
109 |
109 |
119 |
64 |
nabisco |
6 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
89 |
109 |
119 |
84 |
nabisco |
7 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
109 |
109 |
129 |
78 |
sunshine |
8 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
109 |
119 |
129 |
78 |
nabisco |
9 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
109 |
121 |
109 |
78 |
nabisco |
10 |
1 |
1 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
79 |
121 |
109 |
78 |
nabisco |
11 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
109 |
113 |
109 |
96 |
nabisco |
12 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
109 |
121 |
99 |
86 |
nabisco |
13 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
89 |
121 |
99 |
86 |
nabisco |
14 |
1 |
0 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
109 |
109 |
129 |
96 |
nabisco |
15 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
|
|
|
|
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
SQL Call
SELECT * FROM ApproxCardinalityReduce (
ON (
SELECT * FROM ApproxCardinalityMap (
ON cracker AS InputTable
USING
InputColumns ('sn','id',
'dispsunshine','dispkeebler','dispnabisco','dispprivate',
'featsunshine','featkeebler','featnabisco','featprivate',
'pricesunshine','pricekeebler','pricenabisco','priceprivate',
'choice')
ErrorRate (1)
) as dt1
) PARTITION BY column_name
) AS dt ;
Output
column_name |
cnt |
method |
choice |
4 |
nearExact |
dispkeebler |
2 |
nearExact |
dispnabisco |
2 |
nearExact |
dispprivate |
2 |
nearExact |
dispsunshine |
2 |
nearExact |
featkeebler |
2 |
nearExact |
featnabisco |
2 |
nearExact |
featprivate |
2 |
nearExact |
featsunshine |
2 |
nearExact |
id |
136 |
nearExact |
pricekeebler |
29 |
nearExact |
pricenabisco |
42 |
nearExact |
priceprivate |
39 |
nearExact |
pricesunshine |
27 |
nearExact |
sn |
3292 |
nearExact |