LDA Input - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.10

1.1

Published

October 2019

Language

English (United States)

Last Update

2019-12-31

dita:id

B700-4003

InputTable Schema

Column	Data Type	Description
doc_id_column	INTEGER, SMALLINT, BIGINT, NUMERIC, VARCHAR, VARBYTE(n), or BLOB	Document identifier.
word_column	INTEGER, SMALLINT, BIGINT, or VARCHAR	Word.
count_column	INTEGER, SMALLINT, BIGINT, NUMERIC, or DOUBLE PRECISION	[Column appears only with CountColumn syntax element.] Number of times the word appears in the document.

You can use TextParser output as input to the LDA function. Teradata recommends filtering out words with very low or very high frequency, because such words degrade the topic model. For example, very common words can produce topics that consist of words that are not meaningful.
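The following Python sketch illustrates the recommended frequency filtering on data shaped like the InputTable schema (doc_id_column, word_column, count_column). It assumes that tokenized text arrives as (document identifier, word) pairs and aggregates them into (doc_id, word, count) rows; this is a simplification, not necessarily the exact shape of TextParser output. The helper name `build_lda_input` and the thresholds `min_doc_freq` and `max_doc_ratio` are hypothetical and chosen only for illustration: they are not LDA or TextParser syntax elements, and suitable values depend on your corpus. The sketch measures "frequency" as document frequency (the number of documents that contain a word), which is one common choice; the source does not specify a measure.

```python
from collections import Counter


def build_lda_input(tokens, min_doc_freq=2, max_doc_ratio=0.5):
    """Aggregate (doc_id, word) token pairs into (doc_id, word, count) rows,
    dropping words that are too rare or too common.

    Hypothetical helper: min_doc_freq and max_doc_ratio are illustrative
    thresholds, not Teradata defaults or syntax elements.
    """
    # Count occurrences of each word within each document (count_column).
    counts = Counter(tokens)  # key: (doc_id, word), value: count
    docs = {doc_id for doc_id, _ in counts}

    # Document frequency: number of distinct documents containing each word.
    doc_freq = Counter(word for _, word in counts)
    max_docs = max_doc_ratio * len(docs)

    # Keep only words whose document frequency lies inside [min_doc_freq, max_docs].
    rows = [
        (doc_id, word, n)
        for (doc_id, word), n in counts.items()
        if min_doc_freq <= doc_freq[word] <= max_docs
    ]
    return sorted(rows)


# Example: "the" occurs in every document and single-document words are rare,
# so only "engine" and "recipe" survive the default thresholds.
tokens = [
    (1, "the"), (1, "engine"), (1, "engine"), (1, "torque"),
    (2, "the"), (2, "engine"), (2, "piston"),
    (3, "the"), (3, "recipe"), (3, "flour"),
    (4, "the"), (4, "recipe"), (4, "oven"),
]
print(build_lda_input(tokens))
# [(1, 'engine', 2), (2, 'engine', 1), (3, 'recipe', 1), (4, 'recipe', 1)]
```

In practice, the same aggregation and filtering could be expressed in SQL against the input table before calling LDA; the Python version only makes the logic explicit.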