LDA Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.10
1.1
Published
October 2019
Language
English (United States)
Last Update
2019-12-31
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
ModelTable
Specify the name for the model table that the function creates in the database. This table must not already exist.
OutputTable
[Optional] Specify the name of the output table that the function creates in the database. The table contains the topic distribution of each document in the InputTable and must not already exist. If you omit this syntax element, the function does not create this table.
TopicNum
Specify the number of topics for all the documents in the InputTable, an INTEGER value in the range [2, 1000].
Alpha
[Optional] Specify a hyperparameter of the model: the prior smoothing parameter for the topic distribution over documents. As alpha decreases, fewer topics are associated with each document.
Default: 0.1
Eta
[Optional] Specify a hyperparameter of the model: the prior smoothing parameter for the word distribution over topics. As eta decreases, fewer words are associated with each topic.
Default: 0.1
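The effect of a smaller Alpha can be illustrated outside the database. The following is a minimal sketch, not the function's implementation: it assumes a symmetric Dirichlet prior and samples from it with the Python standard library. The helper names (sample_dirichlet, mean_active_topics) and the 0.05 weight cutoff are hypothetical choices made for this illustration only.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    while True:
        draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
        total = sum(draws)
        if total > 0.0:  # guard against underflow for very small concentrations
            return [d / total for d in draws]

def mean_active_topics(concentration, topic_num=10, docs=500, cutoff=0.05, seed=7):
    """Average number of topics whose weight exceeds `cutoff` per simulated document."""
    rng = random.Random(seed)
    active = 0
    for _ in range(docs):
        weights = sample_dirichlet(concentration, topic_num, rng)
        active += sum(1 for w in weights if w > cutoff)
    return active / docs

# Illustrative values only: a small and a large concentration parameter.
sparse = mean_active_topics(0.1)
dense = mean_active_topics(10.0)
print(sparse, dense)
```

With the smaller concentration, each simulated document puts most of its weight on a few topics, which matches the described behavior of Alpha. The same reasoning applies to Eta and the words of each topic.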
DocIDColumn
Specify the name of the InputTable column that contains the document identifiers.
WordColumn
Specify the name of the InputTable column that contains the words (one word in each row).
CountColumn
[Optional] Specify the name of the InputTable column that contains the count of the corresponding word in the row, a positive value.
Default behavior: The count of each word is 1.
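The row layout implied by DocIDColumn, WordColumn, and CountColumn (one word per row, with its count) can be sketched as follows. This is an illustration only: the to_input_rows helper, the whitespace tokenization, and the lowercasing are assumptions, not part of the function.

```python
from collections import Counter

def to_input_rows(documents):
    """Turn {doc_id: text} into (docid, word, count) rows, one word per row."""
    rows = []
    for doc_id, text in documents.items():
        # Assumed tokenization: lowercase and split on whitespace.
        counts = Counter(text.lower().split())
        for word, count in sorted(counts.items()):
            rows.append((doc_id, word, count))
    return rows

docs = {1: "apple banana apple", 2: "banana cherry"}
rows = to_input_rows(docs)
for row in rows:
    print(row)
```

If the count column is left out, each row counts as one occurrence of its word.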
MaxIterNum
[Optional] Specify the maximum number of iterations to perform if the model does not converge, a positive INTEGER value.
Default: 50
StopThreshold
[Optional] Specify the convergence delta of log perplexity, a NUMERIC value in the range [0.0, 1.0].
Default: 1e-4
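The interaction between MaxIterNum and StopThreshold can be sketched as a stopping rule. This is a minimal sketch under stated assumptions: it treats the convergence delta as the absolute change in log perplexity between consecutive iterations and stops when that change falls below the threshold. The reference does not state whether the comparison is absolute or relative, strict or inclusive. The function name and the synthetic perplexity trace are hypothetical.

```python
def iterations_until_stop(log_perplexities, max_iter_num=50, stop_threshold=1e-4):
    """Return how many iterations run before the stopping rule fires.

    log_perplexities: iterable of per-iteration log perplexity values
    (a hypothetical stand-in for what training would compute).
    """
    previous = None
    done = 0
    for value in log_perplexities:
        if done >= max_iter_num:
            break  # iteration budget exhausted without convergence
        done += 1
        # Assumed rule: stop once the absolute change drops below the threshold.
        if previous is not None and abs(previous - value) < stop_threshold:
            break
        previous = value
    return done

# Synthetic trace whose change halves at every iteration.
trace = [5.0 + 0.5 ** k for k in range(100)]
print(iterations_until_stop(trace))
print(iterations_until_stop(trace, max_iter_num=10))
```

A larger StopThreshold ends training earlier. A smaller one makes MaxIterNum more likely to be the limit that applies.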
Seed
[Optional] Specify the random seed the algorithm uses for repeatable results. The seed must be a LONG value.
For repeatable results, use both the Seed and UniqueID syntax elements. For more information, see Nondeterministic Results and UniqueID Syntax Element.
OutputTopicNum
[Optional] Ignored unless OutputTable is specified. Specify the number of top-weighted topics to include, with their weights, in the output table for each training document:
Option        Description
'all'         (Default) All topics and their weights.
num_topics    The num_topics top-weighted topics and their weights, where num_topics is a positive integer.
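The selection that OutputTopicNum describes can be sketched as a top-N filter over one document's topic weights. The dictionary structure and the top_topics helper are hypothetical and used only for illustration. The actual output table layout is not shown here.

```python
def top_topics(topic_weights, output_topic_num="all"):
    """Return (topic_id, weight) pairs sorted by descending weight.

    topic_weights: {topic_id: weight} for one document (hypothetical structure).
    """
    ranked = sorted(topic_weights.items(), key=lambda item: item[1], reverse=True)
    if output_topic_num == "all":
        return ranked
    if not isinstance(output_topic_num, int) or output_topic_num < 1:
        raise ValueError("output_topic_num must be 'all' or a positive integer")
    return ranked[:output_topic_num]

weights = {0: 0.05, 1: 0.70, 2: 0.25}
print(top_topics(weights))      # all topics, heaviest first
print(top_topics(weights, 2))   # only the two top-weighted topics
```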
OutputTopicWordNum
[Optional] Ignored unless OutputTable is specified. Specify the number of top topic words to include, with their topic identifiers, in the output table for each training document:
Option             Description
'none'             (Default) No topic words or identifiers.
'all'              All topic words and their identifiers.
num_topic_words    The top num_topic_words topic words and their identifiers, where num_topic_words is a positive integer.
InitModelTaskCount
[Optional] Specify the number of vworkers (an INTEGER value) to use to initialize the model. Use InitModelTaskCount with UniqueID to ensure the same results across cluster configurations.
Default behavior: Function uses all available vworkers to initialize the model.
Seed settings and cluster configurations affect function results.
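Why the number of initializing workers matters for repeatability can be shown with a toy model. The sketch below is not how the function initializes its model. It assumes, purely for illustration, that tokens are dealt round-robin to workers and that each worker draws initial topics from its own random generator derived from the seed. The init_topics name and the seeding scheme are hypothetical.

```python
import random

def init_topics(token_count, topic_num, seed, worker_count):
    """Assign a random initial topic to each token, split round-robin over workers."""
    # One generator per worker, derived from the shared seed (assumed scheme).
    rngs = [random.Random(seed * 1_000_003 + w) for w in range(worker_count)]
    return [rngs[i % worker_count].randrange(topic_num) for i in range(token_count)]

a = init_topics(200, 5, seed=42, worker_count=4)
b = init_topics(200, 5, seed=42, worker_count=4)
c = init_topics(200, 5, seed=42, worker_count=8)
print(a == b)  # same seed and same worker count
print(a == c)  # same seed, different worker count
```

In this toy model, the seed alone does not determine the initial assignment. The worker count must also match, which is the role InitModelTaskCount plays across cluster configurations.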