MinHash Example | Teradata Vantage - MinHash Example - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:mapPath
rnn1580259159235.ditamap
dita:ditavalPath
ybt1582220416951.ditaval
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantageā„¢

Input

The InputTable has 341 distinct users and the identifiers of the items they purchased in an office supplies store.

InputTable: salesdata
userid itemid
1 1
2 2 3
3 4
4 2
5 3 1
6 1
7 5
8 5 6
9 2 1
10 3
11 8
12 10
13 11 4
... ...
Itemids and Items
itemid Items
1 Storage
2 Appliances
3 Binders
4 Telephones
5 Paper
6 Rubber Bands
7 Computer Peripherals
8 Office Furnishings
9 Office Machines
10 Envelopes
11 Bookcases
12 Tables
13 Pens & Art Supplies
14 Chairs & Chairmats
15 Scissors
16 Rulers & Trimmers
17 Copiers & Fax Storage
18 Labels

SQL Call

SELECT * FROM MinHash (
  ON salesdata AS InputTable
  OUT TABLE OutputTable (minhashoutput)
  USING
  UserIdColumn ('userid')
  ItemIdColumn ('itemid')
  HashNum (1002)
  KeyGroups (3)
  InputFormat ('integer')
  MinClusterSize (3)
  MaxClusterSize (5)
) AS dt ;

Output

 message                                                                   
 ------------------------------------------------------------------------- 
 Result has been stored in the table specified in the argument OutputTable
SELECT clusterid, CAST(userid AS varchar(50)) 
  FROM minhashoutput ORDER BY clusterid;
 clusterid                     userid               
 ----------------------------- -------------------- 
 10001632024477502928557112    115 143 33          
 1000181818172837758261472850  188 19 82           
 1000672928162506536467975178  172 201 8           
 1000672928162506536745194009  154 200 219 227     
 1000888321111525381245325420  231 278 279 292 43  
 10011398202195032931017789807 336 64 73           
 100200255478713962251455059   116 16 255          
 100200255691628539183060642   155 193 331 61      
 1002732123681872942919652130  142 153 22 229 273  
 100330557752858893780017760   106 65 94           
 10049170655836044391306314718 234 306 341         
 101328785814130588977849993   162 41 76           
 1015945565890613405523781488  155 193 331         
 10162828715827479001108091687 298 300 75          
 10175358509427418841426707515 154 219 227         
 1018466901986895211628049263  281 299 335 83      
 10220852810629588885324013    116 126 162 264     
 10220852817106841885324013    106 41 76           
 10226181701140395735173738032 115 159 199         
 10226181701254897345173738032 101 168 85          
 1022618170911392515173738032  165 210 229         
 102470444162678584793019606   142 153 22 273      
 102573341535757391533154664   116 126 264         
 102582206635867650161809733   162 234 306 341 56  
 102623915513963258275858860   15 154 200 219 227  
 1026873304546698627309800948  105 336 64 73       
 10293886931561094382367763    106 162 41 76       
 102938869926494095286052740   234 236 306 341     
 10313304113308841641346899945 15 154 219 227      
 1032014682918918621081479027  289 336 64 73       
 1034189323961653552588687202  116 16 162 255      
 103496462682875859105446942   131 319 329         
 103496462935523058105446942   101 115 168 85      
 103515721179174677712162623   154 200 219 227 277 
 1038787405450816405695980004  101 165 168 85      
 1047626411340449545590476206  220 336 64 73       
 105052222736915977534827861   334 49 99           
 1051524537188532351351736     162 234 306 341     
 1057328301636481327290076924  145 336 64 73       
 1062215657510129744623106336  159 199 329         
 1063074461401270656411541334  233 336 64 73       
 10681161985692751835793351    202 303 94          
 107352482206748926533638949   324 56 89           
 1077178555125838598220450673  142 153 213 22 273  
 108438676963330722908898142   323 336 64 73       
 1087091491051791057376376151  188 19 218 82       
 109565221148920194151646465   115 159 199 229 329 
 1104425083111007915632360455  137 324 33          
 11055405515522574861231316557 220 323 336 64 73   
 111384661784027554142923817   234 306 341 56      
 111878725277983383626020309   115 159 199 229     
 112000660881277414161330932   159 199 229 329     
 1135787282425554248855512     122 184 265         
 1135787282425554263955699     200 277 56          
 11364216137098692168568745    101 115 165 168 85  
 1163182571989967097544757011  105 233 336 64 73   
 118532648453415513812155324   162 334 99          
 1189747526909957268830076     329 56 98           
 12825749399292881511221643896 105 220 336 64 73   
 1303860321293646116676660877  154 219 227 303     
 130386032436060809270744284   172 201 56 8        
 13374366813446055233013996    162 334 49 99       
 141470682104264009341272583   159 199 229         
 143046875878802443317379604   215 23 33           
 155750144284072563472764756   115 159 199 329     
 16820489119808089544814447    162 234 236 306 341 
 213210460572199928292570925   105 323 336 64 73   
 2307301181223266695850700     234 236 306 341 56  
 23086115553832635553266216    145 323 336 64 73   
 246998243209606930167297798   116 162 165         
 3763388498647598191062219618  233 289 336 64 73   
 416520801194525772520154227   15 154 219 227 303  
 5010953953499458161174109310  145 233 336 64 73   
 503532392896216635761870770   289 323 336 64 73   
 607403481884103145578369013   220 233 336 64 73   
 68523913398609725423998883    220 289 336 64 73   
 7474160351045857577413872845  145 220 336 64 73

Download a zip file of all examples and a SQL script file that creates their input tables.