MinHash Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
OutputTable
Specify the name of the output table.
SaveSeedTo
[Optional. Disallowed with the SeedTable table.] Specify the name of the table in which to save the random seeds that the function generates when you omit the SeedTable table. If you omit both the SeedTable table and SaveSeedTo, the function discards the random seeds when execution ends.
Typically, you specify this argument in the first MinHash call, which creates seed_table_to_save, and then specify seed_table_to_save as the SeedTable in subsequent MinHash calls.
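The reason for saving seeds can be illustrated with a minimal Python sketch. It is not the function's implementation: the helper names make_seeds and signature, and the simple modular hash, are assumptions made for illustration only. It shows that hashing the same item set with the same saved seeds reproduces the same MinHash values, which is what makes results from separate calls comparable.

```python
import random

PRIME = 2**31 - 1  # modulus for the illustrative hash (an assumption, not Teradata's)

def make_seeds(hash_num, rng_seed=None):
    """Draw one random integer seed per hash function (hypothetical helper)."""
    rng = random.Random(rng_seed)
    return [rng.randrange(1, PRIME) for _ in range(hash_num)]

def signature(items, seeds):
    """For each seed, keep the minimum hashed value over all items."""
    return [min((seed * item + 1) % PRIME for item in items) for seed in seeds]

# First call: no seeds supplied, so they are generated and kept (the SaveSeedTo role).
saved_seeds = make_seeds(hash_num=4)

# Later call: the saved seeds are supplied again (the SeedTable role).
basket = {101, 205, 309}
first_run = signature(basket, saved_seeds)
later_run = signature(basket, list(saved_seeds))

# Freshly generated seeds would, in general, give a different signature.
fresh_run = signature(basket, make_seeds(hash_num=4))
```

With the saved seeds, first_run and later_run are identical; fresh_run depends on the newly drawn seeds and generally is not.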
UserIDColumn
Specify the name of the input table column that contains the IDs to cluster. Typically these values are customer identifiers.
ItemIDColumn
Specify the name of the input table column that contains the values to hash.
HashNum
Specify the number of hash functions to create. The number_of_hash_functions determines the number and size of clusters created.
KeyGroups
Specify the number of key groups to create. The number_of_key_groups must be a divisor of number_of_hash_functions. A large number_of_key_groups decreases the probability that multiple users are assigned to the same cluster identifier.
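One reading consistent with the two constraints above is that each cluster key combines number_of_key_groups hash values, so the hash values must split evenly into keys, and longer keys match less often. The Python sketch below assumes that reading; it is not confirmed by this reference, and the names cluster_keys and shared_keys are hypothetical.

```python
def cluster_keys(signature, key_groups):
    """Split a signature into keys of `key_groups` hash values each (assumed reading)."""
    hash_num = len(signature)
    if key_groups < 1 or hash_num % key_groups != 0:
        raise ValueError("key_groups must be a divisor of the number of hash functions")
    return [tuple(signature[i:i + key_groups]) for i in range(0, hash_num, key_groups)]

def shared_keys(sig_a, sig_b, key_groups):
    """Count key positions at which two users would receive the same cluster key."""
    keys_a = cluster_keys(sig_a, key_groups)
    keys_b = cluster_keys(sig_b, key_groups)
    return sum(ka == kb for ka, kb in zip(keys_a, keys_b))

# Two made-up signatures (6 hash functions) that differ in a single hash value.
sig_a = [3, 7, 2, 9, 4, 1]
sig_b = [3, 7, 5, 9, 4, 1]

matches = {k: shared_keys(sig_a, sig_b, k) for k in (1, 2, 3, 6)}
# matches == {1: 5, 2: 2, 3: 1, 6: 0}
```

Under this assumption, the number of shared keys falls as the key-group value rises, and a value such as 4 is rejected because it does not divide 6.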
InputFormat
[Optional] Specify the format of the values to hash (the values in item_id_column).
Default: 'integer'
MinClusterSize
[Optional] Specify the minimum cluster size.
Default: 3
MaxClusterSize
[Optional] Specify the maximum cluster size.
Default: 5
Delimiter
[Optional] Specify the delimiter used between hashed values (typically customer identifiers) in the output.
Default: ' ' (space)
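How the three output-related arguments might interact can be sketched in Python. This is an illustration under stated assumptions, not the function's documented behavior: it assumes that clusters whose member count falls outside the minimum and maximum are omitted from the output, which this reference does not state, and the name format_clusters and the sample data are hypothetical.

```python
def format_clusters(clusters, min_size=3, max_size=5, delimiter=" "):
    """Return (cluster_id, joined_user_ids) rows for clusters within the size bounds.

    Assumption: out-of-range clusters are dropped rather than split or truncated.
    """
    rows = []
    for cluster_id, user_ids in sorted(clusters.items()):
        if min_size <= len(user_ids) <= max_size:
            joined = delimiter.join(str(user_id) for user_id in sorted(user_ids))
            rows.append((cluster_id, joined))
    return rows

# Made-up clusters: cluster key -> user IDs assigned to it.
clusters = {
    "k1": [12, 7, 30],               # size 3: kept with the defaults
    "k2": [4, 9],                    # size 2: below the default minimum
    "k3": [1, 2, 3, 4, 5, 6],        # size 6: above the default maximum
}

default_rows = format_clusters(clusters)
comma_rows = format_clusters(clusters, min_size=2, max_size=6, delimiter=",")
```

With the defaults only k1 is returned, with its IDs separated by spaces; widening the bounds to 2 and 6 and passing a comma returns all three clusters as comma-separated lists.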