Description
The MinHash function uses transaction history to cluster similar items or users together. For example, the function can cluster items that are frequently bought together, or users who bought the same items.
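As a rough illustration of the idea behind MinHash (a textbook sketch, not Teradata's implementation), the base-R code below treats each hash function as a random permutation of the item universe. A user's signature records, for each permutation, the smallest rank among that user's items. Two signatures agree at a given position with probability equal to the Jaccard similarity of the two item sets, so users with overlapping purchases tend to receive matching values and can be grouped together. The toy baskets and the helper names (`minhash_signature`, `est_jaccard`, `true_jaccard`) are hypothetical; `n_hash` only loosely corresponds to `hash.num`.

```r
# Minimal MinHash sketch in base R (illustrative only; not the td_minhash_mle implementation).
set.seed(42)

# Hypothetical transaction history: user id -> ids of the items that user bought.
baskets <- list(
  u1 = c(1, 2, 3, 4),
  u2 = c(1, 2, 3, 5),
  u3 = c(7, 8, 9)
)
universe <- sort(unique(unlist(baskets)))  # every item seen in the data
n_hash <- 200                              # number of hash functions (loosely, hash.num)

# Each "hash function" is a random permutation of the item universe:
# perms[i, j] is the rank that hash function i assigns to universe[j].
perms <- t(replicate(n_hash, sample(length(universe))))

# Signature: for each hash function, the smallest rank among the user's items.
minhash_signature <- function(items) {
  cols <- match(items, universe)
  apply(perms[, cols, drop = FALSE], 1, min)
}
sigs <- lapply(baskets, minhash_signature)

# P(two signatures agree at a position) equals the Jaccard similarity of the sets,
# so the share of matching positions is an estimate of it.
est_jaccard <- function(s1, s2) mean(s1 == s2)
true_jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))

est_jaccard(sigs$u1, sigs$u2)  # should be near true_jaccard(baskets$u1, baskets$u2), i.e. 3/5
est_jaccard(sigs$u1, sigs$u3)  # exactly 0: the baskets share no items
```

Random permutations are used here instead of arithmetic hash functions because they make the agreement probability exactly equal to the Jaccard similarity, which keeps the sketch easy to reason about.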
Usage
td_minhash_mle(
    data = NULL,
    id.column = NULL,
    items.column = NULL,
    hash.num = NULL,
    key.groups = NULL,
    seed.table = NULL,
    input.format = "integer",
    mincluster.size = 3,
    maxcluster.size = 5,
    delimiter = " ",
    data.sequence.column = NULL,
    seed.table.sequence.column = NULL
)
Arguments
data: Required argument.
id.column: Required argument.
items.column: Required argument.
hash.num: Required argument.
key.groups: Required argument.
seed.table: Optional argument. For example, the "save.seed.to" member of a previous td_minhash_mle result.
input.format: Optional argument. Default value: "integer".
mincluster.size: Optional argument. Default value: 3.
maxcluster.size: Optional argument. Default value: 5.
delimiter: Optional argument. Default value: " " (a single space).
data.sequence.column: Optional argument.
seed.table.sequence.column: Optional argument.
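The argument list does not describe how hash.num, key.groups, mincluster.size and maxcluster.size interact. One plausible reading, borrowed from standard locality-sensitive hashing (LSH) "banding", is sketched below in base R. Every part of that mapping is an assumption, not confirmed behavior of td_minhash_mle: the hash.num signature values are cut into consecutive groups of key.groups values, users who share an entire group land in the same cluster, and clusters outside the size bounds are dropped. The examples below (1002 and 99 hash functions with key.groups = 3) are consistent with hash.num being a multiple of key.groups, but that constraint is also an assumption here. All variable names are hypothetical.

```r
# Hypothetical sketch: turning MinHash signatures into clusters with LSH-style banding.
# ASSUMPTIONS (not confirmed for td_minhash_mle): signatures are cut into groups of
# key_groups values, users sharing a whole group form a cluster, and clusters outside
# [min_size, max_size] are dropped.
set.seed(1)

baskets <- list(
  u1 = c(1, 2, 3, 4),
  u2 = c(1, 2, 3, 5),
  u3 = c(1, 2, 3, 4),  # same basket as u1
  u4 = c(7, 8, 9)      # shares nothing with the others
)
universe <- sort(unique(unlist(baskets)))

hash_num <- 12    # analogue of hash.num
key_groups <- 3   # analogue of key.groups; assumed to divide hash_num
stopifnot(hash_num %% key_groups == 0)
min_size <- 2     # analogue of mincluster.size
max_size <- 5     # analogue of maxcluster.size

# Random-permutation MinHash signatures (one row of perms per hash function).
perms <- t(replicate(hash_num, sample(length(universe))))
sigs <- lapply(baskets, function(items)
  apply(perms[, match(items, universe), drop = FALSE], 1, min))

# Split the hash_num positions into hash_num / key_groups groups of key_groups each.
groups <- split(seq_len(hash_num), rep(seq_len(hash_num / key_groups), each = key_groups))

# One cluster key per (group, user): the group number plus that group's signature values.
keys <- unlist(lapply(sigs, function(s)
  vapply(seq_along(groups), function(g)
    paste(g, paste(s[groups[[g]]], collapse = "-"), sep = ":"), character(1))))
users <- rep(names(sigs), each = length(groups))

# Users with the same key form a cluster; keep only clusters within the size bounds.
clusters <- split(users, keys)
clusters <- Filter(function(m) length(m) >= min_size && length(m) <= max_size, clusters)
clusters
```

Under these assumptions, larger groups make a shared key stricter (fewer, tighter clusters), while more groups give similar users more chances to collide in at least one of them.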
Value
The function returns an object of class "td_minhash_mle", which is a named
list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator,
using the following names:
output.table
save.seed.to
output
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load example data.
loadExampleData("minhash_example", "salesdata")
# Create object(s) of class "tbl_teradata".
salesdata <- tbl(con, "salesdata")
# Example 1 - Create clusters of users based on items purchased.
td_minhash_out1 <- td_minhash_mle(data = salesdata,
                                  id.column = "userid",
                                  items.column = "itemid",
                                  hash.num = 1002,
                                  key.groups = 3
                                  )
# Example 2 - Use the previously generated seed table as input.
# Select a subset of the seed table to restrict the number of clusters.
td_minhash_out2 <- td_minhash_mle(data = salesdata,
                                  id.column = "userid",
                                  items.column = "itemid",
                                  hash.num = 99,
                                  key.groups = 3,
                                  seed.table = td_minhash_out1$save.seed.to %>% filter(index < 99)
                                  )