Description
The LDATrainer function builds a topic model from training data and parameters. It uses an unsupervised method to estimate the correlation between topics and words, given the topic number and other parameters. Optionally, the function also generates the topic distribution for each training document.
Usage
td_lda_mle (
    data = NULL,
    topic.num = NULL,
    docid.column = NULL,
    word.column = NULL,
    alpha = 0.1,
    eta = 0.1,
    count.column = NULL,
    maxiter = 50,
    convergence.delta = 1.0E-4,
    seed = NULL,
    out.topicnum = "all",
    out.topicwordnum = "none",
    initmodeltaskcount = NULL,
    data.sequence.column = NULL
)
Arguments
data |
Required Argument. |
topic.num |
Required Argument. |
docid.column |
Required Argument. |
word.column |
Required Argument. |
alpha |
Optional Argument. |
eta |
Optional Argument. |
count.column |
Optional Argument. |
maxiter |
Optional Argument. |
convergence.delta |
Optional Argument. |
seed |
Optional Argument. |
out.topicnum |
Optional Argument. Specifies the number of top-weighted topics and their weights to
include in the output tbl_teradata for each training document. The
value must be a positive integer enclosed in quotes, or "all". Default Value: "all" |
out.topicwordnum |
Optional Argument. |
initmodeltaskcount |
Optional Argument. |
data.sequence.column |
Optional Argument. |
Value
Function returns an object of class "td_lda_mle" which is a named list
containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the following names:
model.table
doc.distribution.data
output
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("lda_example", "complaints_traintoken")
# Create object(s) of class "tbl_teradata".
complaints_traintoken <- tbl(con, "complaints_traintoken")
# Example 1 - This function uses training data and parameters from 'complaints_traintoken'
# tbl_teradata to build a topic model.
td_lda_out <- td_lda_mle(data = complaints_traintoken,
                         topic.num = 5,
                         docid.column = "doc_id",
                         word.column = "token",
                         count.column = "frequency",
                         maxiter = 30,
                         convergence.delta = 1e-3,
                         seed = 2
                         )
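# Sketch: inspect the result of Example 1, assuming the call above succeeded and
# 'td_lda_out' holds the returned object of class "td_lda_mle". The member names
# are the ones listed in the Value section; the contents of each tbl_teradata
# are not shown here.
td_lda_out$model.table
td_lda_out$doc.distribution.data
td_lda_out$output
# Example 2 (illustrative sketch) - Same training call, but keep only the two
# top-weighted topics for each training document. The value "2" is an assumption
# chosen for illustration; 'out.topicnum' takes a positive integer in quotes or "all".
td_lda_out2 <- td_lda_mle(data = complaints_traintoken,
                          topic.num = 5,
                          docid.column = "doc_id",
                          word.column = "token",
                          count.column = "frequency",
                          out.topicnum = "2",
                          seed = 2
                          )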