Input
The InputTable, salesdata, lists 341 distinct users and the identifiers of the items that each user purchased at an office supplies store.
userid | itemid |
---|---|
1 | 1 |
2 | 2 3 |
3 | 4 |
4 | 2 |
5 | 3 1 |
6 | 1 |
7 | 5 |
8 | 5 6 |
9 | 2 1 |
10 | 3 |
11 | 8 |
12 | 10 |
13 | 11 4 |
... | ... |
Each itemid corresponds to one of the following items:
itemid | Items |
---|---|
1 | Storage |
2 | Appliances |
3 | Binders |
4 | Telephones |
5 | Paper |
6 | Rubber Bands |
7 | Computer Peripherals |
8 | Office Furnishings |
9 | Office Machines |
10 | Envelopes |
11 | Bookcases |
12 | Tables |
13 | Pens & Art Supplies |
14 | Chairs & Chairmats |
15 | Scissors |
16 | Rulers & Trimmers |
17 | Copiers & Fax Storage |
18 | Labels |
SQL Call
SELECT * FROM MinHash (
  ON salesdata AS InputTable
  OUT TABLE OutputTable (minhashoutput)
  USING
  UserIDColumn ('userid')
  ItemIDColumn ('itemid')
  HashNum (1002)
  KeyGroups (3)
  InputFormat ('integer')
  MinClusterSize (3)
  MaxClusterSize (5)
) AS dt;
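To give a feel for what HashNum, KeyGroups, MinClusterSize, and MaxClusterSize control, here is a minimal Python sketch of min-hash clustering. It is not the function's implementation: the hash family (seeded BLAKE2b), the helper names, and the reading of KeyGroups as the number of min-hash values combined into one cluster key are all assumptions made for illustration. The baskets are the first 13 rows of the input table above, and the argument values are scaled down so that the small sample produces clusters.

```python
import hashlib
from collections import defaultdict

def seeded_hash(seed: int, item: int) -> int:
    """Stand-in hash family (assumption): 32-bit BLAKE2b digest of 'seed:item'."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=4).digest()
    return int.from_bytes(digest, "big")

def signature(items, hash_num):
    """Min-hash signature: the smallest hash of the basket under each hash function."""
    return [min(seeded_hash(seed, item) for item in items) for seed in range(hash_num)]

def minhash_clusters(baskets, hash_num, key_groups, min_size, max_size):
    """Group users whose signatures agree on a whole band of key_groups min-hashes."""
    if hash_num % key_groups != 0:
        raise ValueError("hash_num must be a multiple of key_groups")
    clusters = defaultdict(set)
    for user, items in baskets.items():
        sig = signature(items, hash_num)
        for start in range(0, hash_num, key_groups):
            # The band index keeps keys from different bands apart.
            key = (start // key_groups, tuple(sig[start:start + key_groups]))
            clusters[key].add(user)
    # Keep only clusters whose size falls inside the requested range.
    return {key: sorted(users) for key, users in clusters.items()
            if min_size <= len(users) <= max_size}

# First 13 users of the input table (userid -> set of itemids).
baskets = {
    1: {1}, 2: {2, 3}, 3: {4}, 4: {2}, 5: {3, 1}, 6: {1}, 7: {5},
    8: {5, 6}, 9: {2, 1}, 10: {3}, 11: {8}, 12: {10}, 13: {11, 4},
}

clusters = minhash_clusters(baskets, hash_num=6, key_groups=3, min_size=2, max_size=5)
for (band, key), users in sorted(clusters.items()):
    print(band, key, users)
```

In this sketch, users with identical baskets (such as users 1 and 6) always share a cluster key, while users with partly overlapping baskets share a key only when every min-hash in a band happens to agree. Under the stated assumption, combining more hashes per key makes clusters stricter, and using more hash functions gives each pair of users more chances to meet.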
Output
message |
---|
Result has been stored in the table specified in the argument OutputTable |
This query returns the following table:
SELECT * FROM minhashoutput ORDER BY clusterid;
clusterid | userid |
---|---|
1002732123681872942919652130 | 142 153 22 229 273 |
10191305779223184216324476 | 106 65 94 |
102623915513963258275858860 | 15 154 200 219 227 |
10521510524181490254808958 | 106 162 41 76 |
1057328301636481327290076924 | 145 336 64 73 |
111640426347546462487275395 | 159 199 329 |
111640426379300784959427683 | 172 201 8 |
1145291930783954549119382258 | 116 16 255 |
11574213171254045121408249132 | 116 126 264 |
1174195802405410071547744710 | 220 323 336 64 73 |
1178104602478564384799399977 | 233 336 64 73 |
12111042574047172271448914 | 105 233 336 64 73 |
... | ... |
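Because a user can belong to several clusters, it can help to count how often two users land in the same cluster. The following Python sketch does that for the twelve output rows shown above, assuming the userid column is a space-separated list as displayed. The function name `co_cluster_counts` is hypothetical, and the rows are copied from the table rather than fetched from a database.

```python
from collections import Counter
from itertools import combinations

# (clusterid, userid) rows copied from the output table above.
# clusterid is kept as a string because the values exceed 64-bit integers.
rows = [
    ("1002732123681872942919652130", "142 153 22 229 273"),
    ("10191305779223184216324476", "106 65 94"),
    ("102623915513963258275858860", "15 154 200 219 227"),
    ("10521510524181490254808958", "106 162 41 76"),
    ("1057328301636481327290076924", "145 336 64 73"),
    ("111640426347546462487275395", "159 199 329"),
    ("111640426379300784959427683", "172 201 8"),
    ("1145291930783954549119382258", "116 16 255"),
    ("11574213171254045121408249132", "116 126 264"),
    ("1174195802405410071547744710", "220 323 336 64 73"),
    ("1178104602478564384799399977", "233 336 64 73"),
    ("12111042574047172271448914", "105 233 336 64 73"),
]

def co_cluster_counts(rows):
    """Count, for each pair of users, the number of clusters that contain both."""
    counts = Counter()
    for _clusterid, members in rows:
        users = sorted(int(u) for u in members.split())
        counts.update(combinations(users, 2))
    return counts

counts = co_cluster_counts(rows)
print(counts[(64, 73)])    # 4: users 64 and 73 share four of the listed clusters
print(counts[(233, 336)])  # 2
```

Pairs that recur across many clusters, such as users 64, 73, and 336 here, are the ones the clustering treats as most similar.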