MinHash Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
9.02
9.01
2.0
1.3
Published
February 2022
Language
English (United States)
Last Update
2022-02-10
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
OutputTable
Specify the name of the output table.
OutputSeedsTable
[Optional. Disallowed with the SeedTable input table.] Specify the name of the table in which to save the random seeds that the function generates when you omit the SeedTable table. If you omit both SeedTable and OutputSeedsTable, the function discards the random seeds when it finishes executing.
Typically, you specify this syntax element in the first MinHash call, which creates seed_table_to_save, and then specify seed_table_to_save as the SeedTable in subsequent MinHash calls.
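The save-then-reuse pattern can be illustrated outside the database with a minimal Python sketch. It is not the function's actual implementation: the helpers make_seeds and min_hash_signature, the multiplier, and the prime modulus are all hypothetical and chosen only for illustration. It shows why reusing the same seeds keeps results comparable across calls.

```python
import random

def make_seeds(hash_num, rng_seed=None):
    """Hypothetical helper: draw one random 32-bit seed per hash function."""
    rng = random.Random(rng_seed)
    return [rng.getrandbits(32) for _ in range(hash_num)]

def min_hash_signature(items, seeds, prime=2_147_483_647):
    """Hypothetical helper: for each seed, keep the minimum hashed item value."""
    return [
        min(((seed ^ item) * 2654435761) % prime for item in items)
        for seed in seeds
    ]

# First call: no seeds are supplied, so they are generated and saved
# (the role that OutputSeedsTable plays).
saved_seeds = make_seeds(hash_num=4)

# Later calls: the saved seeds are passed back in (the role that SeedTable
# plays), so the same items always produce the same signature.
first_run = min_hash_signature({1, 2, 3}, saved_seeds)
second_run = min_hash_signature({1, 2, 3}, saved_seeds)
print(first_run == second_run)
```

If the seeds were regenerated on every call instead, signatures from different runs would generally not be comparable.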
UserIDColumn
Specify the name of the InputTable column that contains the IDs to cluster. Typically these values are customer identifiers.
ItemIDColumn
Specify the name of the InputTable column that contains the values to hash.
HashNum
Specify the number of hash functions to create. The number_of_hash_functions determines the number and size of clusters created.
KeyGroups
Specify the number of key groups to create. The number_of_key_groups must be a divisor of number_of_hash_functions. A large number_of_key_groups decreases the probability that multiple users are assigned to the same cluster identifier.
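The divisor constraint between KeyGroups and HashNum can be checked before calling the function. The Python sketch below uses a hypothetical helper, valid_key_groups, that lists every value of number_of_key_groups allowed for a given number_of_hash_functions. It covers only the arithmetic rule stated above and makes no assumption about how the function forms key groups internally.

```python
def valid_key_groups(hash_num):
    """Hypothetical helper: list every divisor of hash_num."""
    if hash_num < 1:
        raise ValueError("hash_num must be a positive integer")
    return [k for k in range(1, hash_num + 1) if hash_num % k == 0]

def check_key_groups(hash_num, key_groups):
    """Raise an error if key_groups is not a divisor of hash_num."""
    if key_groups not in valid_key_groups(hash_num):
        raise ValueError(
            f"KeyGroups ({key_groups}) must be a divisor of HashNum ({hash_num})"
        )

print(valid_key_groups(12))  # [1, 2, 3, 4, 6, 12]
check_key_groups(12, 4)      # accepted: 12 is divisible by 4
```

For example, with HashNum set to 12, a KeyGroups value of 5 would be rejected by this check.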
InputFormat
[Optional] Specify the format of the values to hash (the values in item_id_column).
Default: 'integer'
MinClusterSize
[Optional] Specify the minimum cluster size.
Default: 3
MaxClusterSize
[Optional] Specify the maximum cluster size.
Default: 5
Delimiter
[Optional] Specify the delimiter used between hashed values (typically customer identifiers) in the output.
Default: ' ' (space)
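As a rough illustration of how MinClusterSize, MaxClusterSize, and Delimiter shape the output, the Python sketch below formats a set of already-computed clusters. The helper format_clusters and the sample data are hypothetical. Dropping clusters whose size falls outside the bounds is an assumption made for this sketch; the descriptions above do not state what the function does with such clusters.

```python
def format_clusters(clusters, min_size=3, max_size=5, delimiter=" "):
    """Hypothetical helper: keep clusters within the size bounds and join IDs.

    Assumption: clusters outside [min_size, max_size] are simply dropped.
    """
    rows = []
    for cluster_id, user_ids in sorted(clusters.items()):
        if min_size <= len(user_ids) <= max_size:
            joined = delimiter.join(str(u) for u in sorted(user_ids))
            rows.append((cluster_id, joined))
    return rows

# Made-up sample data: cluster identifier -> user IDs assigned to it.
clusters = {
    "a": [101, 102],
    "b": [103, 104, 105],
    "c": [106, 107, 108, 109, 110, 111],
}

rows = format_clusters(clusters)                       # default bounds 3 and 5
comma_rows = format_clusters(clusters, delimiter=",")  # custom delimiter
print(rows)
print(comma_rows)
```

With the default bounds of 3 and 5, only cluster "b" is kept in this sample, and its IDs are separated by the chosen delimiter.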