SELECT * FROM minhash(
  ON (SELECT 1) PARTITION BY 1
  InputTable ('salesdata')
  OutputTable ('minhashoutput')
  IDColumn ('userid')
  ItemsColumn ('itemid')
  HashNum ('1002')
  KeyGroups ('3')
  InputType ('integer')
  MinClusterSize ('3')
  MaxClusterSize ('5')
);
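The number of hash functions (HashNum) must be an integer multiple of the number of key groups (KeyGroups); in the query above, 1002 = 3 × 334. Each cluster ID is generated by concatenating KeyGroups hash codes. The larger the KeyGroups value, the fewer clusters are obtained, because users must agree on more hash codes to share a cluster ID.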
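The relationship between HashNum and KeyGroups can be illustrated with a minimal Python sketch. This is not the function's actual implementation: the SHA-256 based hash family, the "-" separator, and the names minhash_signature and cluster_ids are assumptions made only for illustration. The sketch computes HashNum min-hash codes per user and concatenates them KeyGroups at a time, so each user receives HashNum / KeyGroups cluster IDs.

```python
import hashlib


def _hash(seed: int, item: int) -> int:
    """Hypothetical seeded hash: one 'hash function' per seed value."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, hash_num):
    """Return hash_num min-hash codes for a non-empty collection of item IDs."""
    return [min(_hash(seed, item) for item in items) for seed in range(hash_num)]


def cluster_ids(items, hash_num, key_groups):
    """Concatenate key_groups hash codes at a time into cluster IDs."""
    if hash_num % key_groups != 0:
        raise ValueError("hash_num must be an integer multiple of key_groups")
    signature = minhash_signature(items, hash_num)
    return [
        "-".join(str(code) for code in signature[start:start + key_groups])
        for start in range(0, hash_num, key_groups)
    ]
```

For example, cluster_ids({1, 2, 3}, hash_num=6, key_groups=3) returns two cluster IDs, each built from three hash codes. Two users share a cluster ID only if all the hash codes in that group match, which is why a larger KeyGroups value makes shared clusters less likely.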