Argument | Category | Description |
---|---|---|
InputTable | Required | Specifies the name of the table or view that contains the training documents. |
ModelTable | Required | Specifies the name for the model table that the function creates in the database. This table must not already exist. |
OutputTable | Optional | Specifies the name of the output table that the function creates in the database. This table contains the topic distribution of each document in the input table, and it must not already exist. If you omit this argument, the function does not generate this table. |
TopicNum | Required | Specifies the number of topics for all the documents in the input table, an INTEGER value in the range [2, 1000]. |
Alpha | Optional | Specifies a hyperparameter of the model, the smoothing parameter of the prior on the topic distribution over documents. As alpha decreases, fewer topics are associated with each document. The default value is 0.1. |
Eta | Optional | Specifies a hyperparameter of the model, the smoothing parameter of the prior on the word distribution over topics. As eta decreases, fewer words are associated with each topic. The default value is 0.1. |
DocIDColumn | Required | Specifies the name of the input column that contains the document identifiers. |
WordColumn | Required | Specifies the name of the input column that contains the words (one word in each row). |
CountColumn | Optional | Specifies the name of the input column that contains the count of the corresponding word in the row, a positive value. By default, the count of each word is 1. |
MaxIterate | Optional | Specifies the maximum number of iterations to perform if the model does not converge, a positive INTEGER value. The default value is 50. |
ConvergenceDelta | Optional | Specifies the convergence delta of log perplexity, a NUMERIC value in the range [0.0, 1.0]. The default value is 1e-4. |
Seed | Optional | Specifies the seed with which to initialize the model, a LONG value. Given the same seed, cluster configuration, and input table, the function generates the same model. By default, the function initializes the model randomly. |
OutputTopicNum | Optional | Ignored unless OutputTable is specified. Specifies the number of top-weighted topics, with their weights, to include in the output table for each training document. The value must be either a positive INTEGER or 'all'. The default value, 'all', includes all topics and their weights. |
OutputTopicWordNum | Optional | Ignored unless OutputTable is specified. Specifies the number of top topic words, with their topic identifiers, to include in the output table for each training document. The value must be a positive INTEGER, 'all', or 'none'. The value 'all' includes all topic words and their topic identifiers. The default value, 'none', includes no topic words or topic identifiers. |

The function might produce different results with different Seed settings and cluster configurations.
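
The DocIDColumn, WordColumn, and CountColumn arguments imply an input table with one row per distinct word in each document. As a minimal sketch of that layout, the Python below turns raw text into `(doc_id, word, count)` rows. The helper name `to_rows`, the sample documents, and the lowercase whitespace tokenization are illustrative assumptions, not part of the function.

```python
from collections import Counter


def to_rows(docs):
    """Convert {doc_id: text} into (doc_id, word, count) rows.

    Each row holds one distinct word of one document, matching the
    DocIDColumn / WordColumn / CountColumn layout described above.
    Tokenization (lowercase, split on whitespace) is an assumption.
    """
    rows = []
    for doc_id, text in sorted(docs.items()):
        counts = Counter(text.lower().split())
        for word, count in sorted(counts.items()):
            rows.append((doc_id, word, count))
    return rows


# Hypothetical sample documents keyed by document identifier.
docs = {1: "apple banana apple", 2: "banana cherry"}
rows = to_rows(docs)
for row in rows:
    print(row)
```

If the count column is left out, each row counts as a single occurrence, so a word that appears twice in a document would need two rows instead of one row with a count of 2.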