LDA Output - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

9.02

9.01

2.0

1.3

Published

February 2022

Language

English (United States)

Last Update

2022-02-10

dita:id

B700-4003

Product Category

Teradata Vantage™

Output Message Schema

Column	Data Type	Description
message	VARCHAR	Reports iteration steps and perplexity of model.

Perplexity formula:

$$\text{perplexity} = 2^{H(p)} = 2^{-\sum_{x} p(x) \log_2 p(x)}$$

where H(p) is the entropy of the distribution p.

Although perplexity varies with training documents, you can use perplexity to find the best model for a specified set of training documents: Create models for several subsets of the training documents and then choose the model with the lowest perplexity.
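
The perplexity formula can be checked by hand on a small discrete distribution. The following Python sketch is only an illustration of the formula, not of how the LDA function computes perplexity internally; the function names `perplexity` and `best_model`, and the subset names and perplexity values, are hypothetical.

```python
import math


def perplexity(probs):
    """Return 2 ** H(p), where H(p) is the Shannon entropy of p in bits."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p(x) == 0 are skipped: p * log2(p) tends to 0 as p -> 0.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def best_model(perplexities):
    """Return the key (model name) with the lowest perplexity value."""
    return min(perplexities, key=perplexities.get)


# A uniform distribution over 4 outcomes has entropy 2 bits.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A skewed distribution has lower entropy (1.5 bits), so lower perplexity.
skewed = perplexity([0.5, 0.25, 0.25])

# Hypothetical perplexities read from the message column of several runs.
chosen = best_model({"subset_a": 41.2, "subset_b": 37.9, "subset_c": 44.0})
```

A lower value means the distribution is less "surprised" by the data, which is why the model with the lowest perplexity is preferred when comparing models.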

ModelTable Schema

Column	Data Type	Description
topicid	INTEGER	Internally created topic identifier.
value_col	BLOB	Model in binary format, which is not human-readable. To view the model contents, use the LDATopicSummary (ML Engine) function.

OutputTable Schema

This table is created only if you specify the OutputTable syntax element.

Column	Data Type	Description
docid	Same as doc_id_column in input table	Document identifier from input table.
topicid	INTEGER	Topic identifier from ModelTable.
topicweight	DOUBLE PRECISION	[Column appears the number of times specified by the OutputTopicNum syntax element.] Topic weight.
topicwords	VARCHAR	[Column appears the number of times specified by the OutputTopicWordNum syntax element.] Topic words in the document, separated by commas.
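
Because topicwords holds several words in a single comma-separated VARCHAR value, client code usually needs to split it before use. The following Python sketch assumes one OutputTable row has already been fetched into a dictionary; the helper name `split_topic_words` and all row values are hypothetical and shown for illustration only.

```python
def split_topic_words(topicwords):
    """Split a comma-separated topicwords value into a list of words."""
    # Strip surrounding whitespace and drop empty entries.
    return [word.strip() for word in topicwords.split(",") if word.strip()]


# Hypothetical row; the values are invented for illustration.
row = {
    "docid": 1,
    "topicid": 3,
    "topicweight": 0.62,
    "topicwords": "price, market,stock",
}

words = split_topic_words(row["topicwords"])
```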