LDA Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
Product Category
Teradata Vantage™
ModelTable
Specify the name for the model table that the function creates in the database. This table must not already exist.
OutputTable
[Optional] Specify the name of the output table that the function creates in the database. This table contains the topic distribution of each document in the input table, and it must not already exist. If you omit this argument, the function does not create this table.
TopicNum
Specify the number of topics for all the documents in the input table, an INTEGER value in the range [2, 1000].
Alpha
[Optional] Specify a hyperparameter of the model: the smoothing parameter of the prior on the topic distribution of each document. As alpha decreases, fewer topics are associated with each document.
Default: 0.1
Eta
[Optional] Specify a hyperparameter of the model: the smoothing parameter of the prior on the word distribution of each topic. As eta decreases, fewer words are associated with each topic.
Default: 0.1
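The effect of decreasing Alpha or Eta can be illustrated outside the database. The following Python sketch assumes the standard LDA setup, in which these hyperparameters are concentrations of a symmetric Dirichlet prior; the source does not state this explicitly. It draws samples at two concentrations and counts how many components receive a noticeable weight. The function names, the threshold of 0.1, and the trial count are hypothetical choices for illustration, not part of the function.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numeric underflow for very small concentrations.
        return [1.0 / size] * size
    return [d / total for d in draws]

def mean_active(concentration, size=10, trials=500, threshold=0.1, seed=0):
    """Average number of components whose weight exceeds the threshold."""
    rng = random.Random(seed)
    active = 0
    for _ in range(trials):
        sample = sample_dirichlet(concentration, size, rng)
        active += sum(1 for weight in sample if weight > threshold)
    return active / trials

sparse = mean_active(0.1)   # the documented default for Alpha and Eta
dense = mean_active(10.0)   # a much larger value, chosen only for contrast
print(sparse, dense)
```

Under these assumptions, the smaller concentration yields fewer components above the threshold on average, which matches the description that lower values associate fewer topics with each document (Alpha) or fewer words with each topic (Eta).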
DocIDColumn
Specify the name of the input column that contains the document identifiers.
WordColumn
Specify the name of the input column that contains the words (one word in each row).
CountColumn
[Optional] Specify the name of the input column that contains the count of the corresponding word in the row, a positive value.
Default behavior: The count of each word is 1.
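As a rough illustration of the input layout that DocIDColumn, WordColumn, and CountColumn describe, the following Python sketch groups (document, word, count) rows into per-document word counts and treats a missing count as 1. The function name and sample rows are hypothetical; this only mirrors the documented default and does not show how the function itself reads the input table.

```python
from collections import Counter, defaultdict

def build_bags(rows):
    """Group (doc_id, word[, count]) rows into per-document word counts."""
    bags = defaultdict(Counter)
    for row in rows:
        doc_id, word = row[0], row[1]
        # Mirror the documented default: a row without a count contributes 1.
        count = row[2] if len(row) > 2 else 1
        if count <= 0:
            raise ValueError("count must be a positive value")
        bags[doc_id][word] += count
    return {doc_id: dict(counter) for doc_id, counter in bags.items()}

# Hypothetical rows: one word per row, count optional.
rows = [(1, "apple", 3), (1, "pie"), (2, "apple"), (2, "tree", 2)]
bags = build_bags(rows)
print(bags)
```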
MaxIterNum
[Optional] Specify the maximum number of iterations to perform if the model does not converge, a positive INTEGER value.
Default: 50
ConvergenceDelta
[Optional] Specify the convergence delta of log perplexity, a NUMERIC value in the range [0.0, 1.0].
Default: 1e-4
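MaxIterNum and ConvergenceDelta together define when training stops. A minimal Python sketch of that interaction follows, assuming the function compares the absolute change in log perplexity between consecutive iterations against the delta; the source does not say whether the comparison is absolute or relative. The `train` function and its `step` callable are hypothetical stand-ins for one training iteration.

```python
def train(step, max_iter_num=50, convergence_delta=1e-4):
    """Run step() until log perplexity stabilizes or the iteration cap is hit.

    step is a hypothetical callable that performs one training iteration
    and returns the current log perplexity.
    Returns (iterations_performed, converged).
    """
    if max_iter_num <= 0:
        raise ValueError("max_iter_num must be a positive integer")
    previous = None
    for iteration in range(1, max_iter_num + 1):
        current = step()
        # Assumed rule: stop once the change falls below the delta.
        if previous is not None and abs(previous - current) < convergence_delta:
            return iteration, True
        previous = current
    return max_iter_num, False

# Hypothetical log perplexity values, one per iteration.
values = iter([5.0, 4.0, 3.5, 3.49995])
result = train(lambda: next(values))
print(result)
```

A smaller ConvergenceDelta demands a more stable log perplexity before stopping, so MaxIterNum becomes the effective limit more often.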
Seed
[Optional] Specify the random seed the algorithm uses for repeatable results (for more information, see Nondeterministic Results). The seed must be a LONG value.
OutputTopicNum
[Optional] Ignored unless OutputTable is specified. Specify the number of top-weighted topics to include, with their weights, in the output table for each training document:
Option          Description
'all'           (Default) All topics and their weights.
num_topics      Positive integer.
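The two OutputTopicNum options can be pictured as a simple ranking step. The following Python sketch is an illustration only: the function name, the zero-based topic identifiers, and the sample weights are hypothetical, and the actual layout of the output table is not shown here.

```python
def select_top_topics(weights, output_topic_num="all"):
    """Return (topic_id, weight) pairs, highest weight first."""
    ranked = sorted(enumerate(weights), key=lambda pair: pair[1], reverse=True)
    if output_topic_num == "all":
        return ranked
    if not isinstance(output_topic_num, int) or output_topic_num <= 0:
        raise ValueError("output_topic_num must be 'all' or a positive integer")
    # Keep only the requested number of top-weighted topics.
    return ranked[:output_topic_num]

# Hypothetical topic distribution of one document over four topics.
doc_weights = [0.05, 0.60, 0.10, 0.25]
top_two = select_top_topics(doc_weights, 2)
everything = select_top_topics(doc_weights)
print(top_two)
```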
OutputTopicWordNum
[Optional] Ignored unless OutputTable is specified. Specify the number of top topic words to include, with their topic identifiers, in the output table for each training document:
Option            Description
'none'            (Default) No topic words or identifiers.
'all'             All topic words and their identifiers.
num_topic_words   Positive integer.
Seed settings and cluster configurations affect function results.