LDA Arguments - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

dita:id

B700-4003

ModelTable

Specify the name for the model table that the function creates in the database. This table must not already exist.

OutputTable

[Optional] Specify the name for the output table that the function creates in the database. This table contains the topic distribution of each document in the input table and must not already exist. If you omit this argument, the function does not create this table.

TopicNum

Specify the number of topics to model across all documents in the input table. The value must be an INTEGER in the range [2, 1000].

Alpha

[Optional] Specify the alpha hyperparameter of the model: the prior smoothing parameter for the distribution of topics within each document. As alpha decreases, fewer topics are associated with each document.

Default: 0.1
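
To illustrate why a smaller alpha associates fewer topics with each document, the sketch below draws document-topic distributions from a symmetric Dirichlet prior using only the Python standard library. It is a toy illustration of the prior, not the function's implementation; `sample_dirichlet` and `mean_largest_weight` are hypothetical helper names.

```python
import random

def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic distribution from a symmetric Dirichlet(alpha) prior."""
    while True:
        # Normalized independent Gamma(alpha, 1) draws follow a Dirichlet distribution.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
        total = sum(draws)
        if total > 0.0:  # guard against numerical underflow for very small alpha
            return [d / total for d in draws]

def mean_largest_weight(alpha, num_topics=10, samples=2000, seed=7):
    """Average weight of the dominant topic across many sampled documents."""
    rng = random.Random(seed)
    total = sum(max(sample_dirichlet(alpha, num_topics, rng)) for _ in range(samples))
    return total / samples

sparse = mean_largest_weight(alpha=0.1)   # the documented default
dense = mean_largest_weight(alpha=10.0)   # a much larger, smoother prior (illustrative value)
print(f"alpha=0.1  -> mean dominant-topic weight {sparse:.2f}")
print(f"alpha=10.0 -> mean dominant-topic weight {dense:.2f}")
```

With the smaller alpha, a single topic tends to carry most of each sampled document's weight; with the larger alpha, the weight spreads more evenly across topics.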

Eta

[Optional] Specify the eta hyperparameter of the model: the prior smoothing parameter for the distribution of words within each topic. As eta decreases, fewer words are associated with each topic.

Default: 0.1

DocIDColumn

Specify the name of the input column that contains the document identifiers.

WordColumn

Specify the name of the input column that contains the words (one word in each row).

CountColumn

[Optional] Specify the name of the input column that contains the count of the corresponding word in the row, a positive value.

Default behavior: The count of each word is 1.
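
DocIDColumn, WordColumn, and CountColumn together describe a long-format input: one row per document and word, with an optional count. As a rough sketch of how raw text could be shaped into that layout before loading, the Python below tokenizes on whitespace. The helper `to_rows` and the sample documents are hypothetical, and real preprocessing would typically also handle punctuation and stop words.

```python
from collections import Counter

def to_rows(documents):
    """Turn {doc_id: text} into (docid, word, count) rows, one row per distinct word."""
    rows = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        for word, count in sorted(counts.items()):
            rows.append((doc_id, word, count))
    return rows

docs = {1: "apple banana apple", 2: "banana cherry"}  # hypothetical sample documents
for row in to_rows(docs):
    print(row)
```

If the count column were omitted, each row would count as a single occurrence, so document 1 would presumably need two separate `apple` rows to carry the same information.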

MaxIterNum

[Optional] Specify the maximum number of iterations to perform. Training stops at this limit even if the model has not converged. The value must be a positive INTEGER.

Default: 50

ConvergenceDelta

[Optional] Specify the convergence delta: the change in log perplexity between iterations below which the function considers the model converged. The value must be NUMERIC in the range [0.0, 1.0].

Default: 1e-4
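
MaxIterNum and ConvergenceDelta act together as stopping rules. The sketch below shows one plausible reading, in which training stops once the absolute change in log perplexity between consecutive iterations is at most the delta, or when the iteration cap is reached. Whether the function compares absolute or relative change is an assumption here, and `iterations_run` and the perplexity values are made up for illustration.

```python
def iterations_run(log_perplexities, max_iter_num=50, convergence_delta=1e-4):
    """Return (iterations performed, converged?) for per-iteration log perplexities."""
    previous = None
    for iteration, current in enumerate(log_perplexities[:max_iter_num], start=1):
        # Assumed rule: converge when the change between iterations is at most the delta.
        if previous is not None and abs(previous - current) <= convergence_delta:
            return iteration, True
        previous = current
    # No convergence: every available iteration up to the cap was performed.
    return min(len(log_perplexities), max_iter_num), False

# Hypothetical log-perplexity trace that flattens out.
trace = [7.20, 6.90, 6.75, 6.74, 6.73995, 6.73994]
print(iterations_run(trace))                   # stops early once the change is small enough
print(iterations_run(trace, max_iter_num=3))   # hits the iteration cap first
```

A looser delta ends training sooner at the cost of a less settled model; a lower iteration cap bounds run time regardless of convergence.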

Seed

[Optional] Specify the random seed the algorithm uses for repeatable results (for more information, see Nondeterministic Results). The seed must be a LONG value.

OutputTopicNum

[Optional] Ignored unless OutputTable is specified. Specify the number of top-weighted topics to include, with their weights, in the output table for each training document:

Option	Description
'all' (Default)	All topics and their weights.
num_topics	Positive integer.
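
The effect of OutputTopicNum can be pictured as a top-N selection over each document's topic weights. The following is a sketch under that reading; `top_topics` and the sample weights are hypothetical, and the actual layout of the output table is not shown here.

```python
def top_topics(topic_weights, output_topic_num="all"):
    """Return (topic_id, weight) pairs, heaviest first, truncated to the requested count."""
    ranked = sorted(topic_weights.items(), key=lambda item: item[1], reverse=True)
    if output_topic_num == "all":
        return ranked
    if not isinstance(output_topic_num, int) or output_topic_num < 1:
        raise ValueError("output_topic_num must be 'all' or a positive integer")
    return ranked[:output_topic_num]

weights = {0: 0.05, 1: 0.70, 2: 0.25}  # hypothetical topic distribution for one document
print(top_topics(weights))      # all three topics, heaviest first
print(top_topics(weights, 2))   # only the two heaviest topics
```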

OutputTopicWordNum

[Optional] Ignored unless OutputTable is specified. Specify the number of top topic words to include, with their topic identifiers, in the output table for each training document:

Option	Description
'none' (Default)	No topic words or identifiers.
'all'	All topic words and their identifiers.
num_topic_words	Positive integer.

Seed settings and cluster configurations affect function results.
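
A toy model can show why both the seed and the cluster configuration matter. The sketch below assumes, purely for illustration, that tokens are dealt round-robin to workers and that each worker draws initial topic assignments from its own generator derived from the seed. This is not a description of how the function is implemented, and `initial_assignments` is a hypothetical name.

```python
import random

def initial_assignments(num_tokens, num_topics, seed, num_workers):
    """Assign a random starting topic to each token, with one seeded generator per worker."""
    # Assumption: each worker's generator is derived from the seed and the worker index.
    rngs = [random.Random(seed * 1_000_003 + worker) for worker in range(num_workers)]
    return [rngs[token % num_workers].randrange(num_topics) for token in range(num_tokens)]

same_a = initial_assignments(200, 5, seed=42, num_workers=2)
same_b = initial_assignments(200, 5, seed=42, num_workers=2)
other_seed = initial_assignments(200, 5, seed=43, num_workers=2)
other_cluster = initial_assignments(200, 5, seed=42, num_workers=3)

print(same_a == same_b)          # same seed, same worker count
print(same_a == other_seed)      # different seed
print(same_a == other_cluster)   # same seed, different worker count
```

In this toy setup, only the run that repeats both the seed and the worker count reproduces the same starting point, which is consistent with fixing both when repeatable results are needed.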