Description
The Minhash (td_minhash_mle) function uses transaction history to cluster
similar items or users together. For example, the function
can cluster items that are frequently bought together
or users that bought the same items.
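As a rough illustration of the general MinHash idea (not the function's actual implementation, which is not described here), the sketch below computes MinHash signatures for users in base R and groups users whose signatures agree on at least one band. The helper names minhash_signatures, band_keys and same_bucket, the parameters n_hashes and n_bands, and the toy purchases data are all hypothetical. They are not claimed to correspond to hash.num or key.groups.

```r
# Toy transaction history: one row per (user, item) purchase.
# Users 1 and 2 bought the same items; user 3 bought different ones.
purchases <- data.frame(
  userid = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  itemid = c(10, 11, 12, 10, 11, 12, 20, 21, 22)
)

# Hypothetical helper: one MinHash signature per user.
# Each hash is h(x) = (a * x + b) mod prime, and the signature entry
# is the minimum hash value over the user's items.
minhash_signatures <- function(df, n_hashes, seed = 42, prime = 10007) {
  set.seed(seed)
  a <- sample.int(prime - 1, n_hashes)    # multipliers in 1..prime-1
  b <- sample.int(prime, n_hashes) - 1    # offsets in 0..prime-1
  item_sets <- split(df$itemid, df$userid)
  sigs <- vapply(item_sets, function(items) {
    vapply(seq_len(n_hashes),
           function(k) min((a[k] * items + b[k]) %% prime),
           numeric(1))
  }, numeric(n_hashes))
  sigs <- matrix(sigs, nrow = n_hashes,
                 dimnames = list(NULL, names(item_sets)))
  t(sigs)                                 # rows = users, columns = hashes
}

# Hypothetical helper: split each signature into n_bands equal bands
# and build one string key per band.
band_keys <- function(sigs, n_bands) {
  stopifnot(ncol(sigs) %% n_bands == 0)
  band_of <- rep(seq_len(n_bands), each = ncol(sigs) / n_bands)
  keys <- vapply(seq_len(n_bands), function(g) {
    apply(sigs[, band_of == g, drop = FALSE], 1, paste, collapse = "-")
  }, character(nrow(sigs)))
  matrix(keys, nrow = nrow(sigs), dimnames = list(rownames(sigs), NULL))
}

sigs <- minhash_signatures(purchases, n_hashes = 6)
keys <- band_keys(sigs, n_bands = 3)

# Two users are cluster candidates if they share a key in any band.
same_bucket <- function(u, v) any(keys[u, ] == keys[v, ])

same_bucket("1", "2")   # identical item sets always share every band
same_bucket("1", "3")   # disjoint item sets
```

In this sketch, more hash functions give a finer similarity estimate, and the number of bands controls how easily two users end up in the same bucket.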
Usage
td_minhash_mle (
    data = NULL,
    id.column = NULL,
    items.column = NULL,
    hash.num = NULL,
    key.groups = NULL,
    seed.table = NULL,
    input.format = "integer",
    mincluster.size = 3,
    maxcluster.size = 5,
    delimiter = " ",
    data.sequence.column = NULL,
    seed.table.sequence.column = NULL
)
Arguments
data | Required Argument.
id.column | Required Argument.
items.column | Required Argument.
hash.num | Required Argument.
key.groups | Required Argument.
seed.table | Optional Argument.
input.format | Optional Argument. Default Value: "integer"
mincluster.size | Optional Argument. Default Value: 3
maxcluster.size | Optional Argument. Default Value: 5
delimiter | Optional Argument. Default Value: " "
data.sequence.column | Optional Argument.
seed.table.sequence.column | Optional Argument.
Value
The function returns an object of class "td_minhash_mle", which is a named
list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator
using the following names:
output.table
save.seed.to
output
Examples
# Load the packages used below. A Teradata connection context is assumed
# to have been created already.
library(tdplyr)
library(dplyr)

# Get the current context/connection
con <- td_get_context()$connection

# Load example data.
loadExampleData("minhash_example", "salesdata")

# Create remote tibble objects.
salesdata <- tbl(con, "salesdata")

# Example 1 - Create clusters of users based on items purchased.
td_minhash_out <- td_minhash_mle(data = salesdata,
                                 id.column = "userid",
                                 items.column = "itemid",
                                 hash.num = 1002,
                                 key.groups = 3
                                 )

# Example 2 - Use the previously generated seed table as input.
# Select a subset of the seed table to restrict the number of clusters.
td_minhash_out1 <- td_minhash_mle(data = salesdata,
                                  id.column = "userid",
                                  items.column = "itemid",
                                  hash.num = 99,
                                  key.groups = 3,
                                  seed.table = td_minhash_out$save.seed.to %>% filter(index < 99)
                                  )