MinHash Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Input

The InputTable, salesdata, contains 341 distinct users and the identifiers of the items that each user purchased in an office supplies store.

InputTable: salesdata
userid  itemid
1       1
2       2 3
3       4
4       2
5       3 1
6       1
7       5
8       5 6
9       2 1
10      3
11      8
12      10
13      11 4
...     ...
Itemids and Items
itemid  Items
1       Storage
2       Appliances
3       Binders
4       Telephones
5       Paper
6       Rubber Bands
7       Computer Peripherals
8       Office Furnishings
9       Office Machines
10      Envelopes
11      Bookcases
12      Tables
13      Pens & Art Supplies
14      Chairs & Chairmats
15      Scissors
16      Rulers & Trimmers
17      Copiers & Fax Storage
18      Labels
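
MinHash-style clustering generally groups users whose item sets overlap, and that overlap is commonly measured by Jaccard similarity. The following is a minimal Python sketch of the measure, using only the sample rows shown in the salesdata table. The names sample_purchases and jaccard are illustrative and are not part of the MinHash function.

```python
# Sample rows copied from the salesdata table (userid -> purchased itemids).
sample_purchases = {
    1: {1}, 2: {2, 3}, 3: {4}, 4: {2}, 5: {3, 1}, 6: {1}, 7: {5},
    8: {5, 6}, 9: {2, 1}, 10: {3}, 11: {8}, 12: {10}, 13: {11, 4},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(jaccard(sample_purchases[1], sample_purchases[6]))  # 1.0 (both bought only item 1)
print(jaccard(sample_purchases[2], sample_purchases[4]))  # 0.5 ({2, 3} versus {2})
print(jaccard(sample_purchases[1], sample_purchases[3]))  # 0.0 (no items in common)
```

Users 1 and 6 have identical purchases, so they are the most likely pair in this sample to share a cluster; users 1 and 3 have nothing in common.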

SQL Call

SELECT * FROM MinHash (
  ON salesdata AS InputTable
  OUT TABLE OutputTable (minhashoutput)
  USING
  UserIDColumn ('userid')
  ItemIDColumn ('itemid')
  HashNum (1002)
  KeyGroups (3)
  InputFormat ('integer')
  MinClusterSize (3)
  MaxClusterSize (5)
) AS dt;
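
The HashNum and KeyGroups arguments can be read against the general MinHash technique. The sketch below is a generic Python illustration, not Teradata's implementation. It assumes that each of hash_num hash functions contributes one minimum value per user, and that every key_groups consecutive minimum values are combined into one candidate cluster key. The function names, the linear hash family, and the small hash_num of 12 (standing in for 1002) are all assumptions made for illustration.

```python
import random

PRIME = 2_147_483_647  # modulus for the illustrative linear hash family


def make_hash_funcs(hash_num, seed=0):
    """Return hash_num (a, b) pairs defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(hash_num)]


def signature(items, hash_funcs):
    """MinHash signature: the minimum hash value of the item set per function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_funcs]


def cluster_keys(items, hash_funcs, key_groups):
    """Combine every key_groups consecutive signature values into one key."""
    if len(hash_funcs) % key_groups != 0:
        raise ValueError("hash_num must be a multiple of key_groups")
    sig = signature(items, hash_funcs)
    return {(start, tuple(sig[start:start + key_groups]))
            for start in range(0, len(sig), key_groups)}


funcs = make_hash_funcs(hash_num=12)   # 12 hashes, 3 per key -> 4 keys per user
keys_user1 = cluster_keys({1}, funcs, key_groups=3)   # user 1 bought item 1
keys_user6 = cluster_keys({1}, funcs, key_groups=3)   # user 6 bought item 1
keys_user3 = cluster_keys({4}, funcs, key_groups=3)   # user 3 bought item 4

print(len(keys_user1 & keys_user6))  # 4: identical item sets share every key
print(len(keys_user1 & keys_user3))  # 0: disjoint item sets share no key
```

Under these assumptions, users who share a key would fall into the same candidate cluster. Combining more hash values into each key makes a match between partially overlapping users less likely.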

Output

message
Result has been stored in the table specified in the argument OutputTable

The following query displays the output table, minhashoutput, which lists the users assigned to each cluster:

SELECT * FROM minhashoutput ORDER BY clusterid;
clusterid                      userid
1002732123681872942919652130   142 153 22 229 273
10191305779223184216324476     106 65 94
102623915513963258275858860    15 154 200 219 227
10521510524181490254808958     106 162 41 76
1057328301636481327290076924   145 336 64 73
111640426347546462487275395    159 199 329
111640426379300784959427683    172 201 8
1145291930783954549119382258   116 16 255
11574213171254045121408249132  116 126 264
1174195802405410071547744710   220 323 336 64 73
1178104602478564384799399977   233 336 64 73
12111042574047172271448914     105 233 336 64 73
...                            ...
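
The sample output can be checked against the MinClusterSize (3) and MaxClusterSize (5) arguments of the call. The Python sketch below parses only the rows shown above. It assumes that those arguments bound the number of users per cluster, as their names suggest, and it also reports users that appear in more than one cluster. The variable names are illustrative.

```python
from collections import defaultdict

# Rows copied from the minhashoutput sample: clusterid, then member userids.
SAMPLE_OUTPUT = """\
1002732123681872942919652130 142 153 22 229 273
10191305779223184216324476 106 65 94
102623915513963258275858860 15 154 200 219 227
10521510524181490254808958 106 162 41 76
1057328301636481327290076924 145 336 64 73
111640426347546462487275395 159 199 329
111640426379300784959427683 172 201 8
1145291930783954549119382258 116 16 255
11574213171254045121408249132 116 126 264
1174195802405410071547744710 220 323 336 64 73
1178104602478564384799399977 233 336 64 73
12111042574047172271448914 105 233 336 64 73
"""

# Split each row into its cluster identifier and its list of user identifiers.
clusters = {}
for line in SAMPLE_OUTPUT.splitlines():
    cluster_id, *user_ids = line.split()
    clusters[cluster_id] = [int(u) for u in user_ids]

# Check the assumed size bounds: MinClusterSize (3) and MaxClusterSize (5).
sizes_ok = all(3 <= len(users) <= 5 for users in clusters.values())

# Collect the clusters that each user belongs to.
memberships = defaultdict(list)
for cluster_id, users in clusters.items():
    for user in users:
        memberships[user].append(cluster_id)

shared = {u: len(c) for u, c in memberships.items() if len(c) > 1}

print(sizes_ok)                # True
print(sorted(shared.items()))  # [(64, 4), (73, 4), (106, 2), (116, 2), (233, 2), (336, 4)]
```

In the rows shown, every cluster has between three and five users, and users 64, 73, and 336 each appear in four clusters, so cluster membership in this output is not exclusive.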