Column Name | Data Type | Description |
---|---|---|
doc_column | INTEGER, SMALLINT, BIGINT, NUMERIC, NUMERIC(p), NUMERIC(p,a), TEXT, VARCHAR, VARCHAR(n), UUID, or BYTEA | Contains the document identifiers. |
word_column | INTEGER, SMALLINT, BIGINT, TEXT, VARCHAR, or VARCHAR(n) | Contains the words (one word in each row). |
count_column | INTEGER, SMALLINT, BIGINT, NUMERIC, NUMERIC(p), NUMERIC(p,a), or DOUBLE PRECISION | Optional. Contains the counts of the words. The default value is 1. |

You can use the output of the TextTokenizer function, called with the argument OutputByWord('true'), as input to the LDATrainer function. Teradata recommends filtering out words with very low or very high frequency before training, because they can lead to topics that consist of common words that are not meaningful in a topic model.
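
The frequency filtering recommended above can be sketched outside the database as follows. This is a minimal illustration in Python, not part of the LDATrainer function: the helper name `filter_by_document_frequency` and the thresholds `min_docs` and `max_doc_fraction` are hypothetical, and the default values are arbitrary assumptions rather than Teradata recommendations. Rows follow the input table layout (document identifier, word, count).

```python
from collections import defaultdict


def filter_by_document_frequency(rows, min_docs=2, max_doc_fraction=0.5):
    """Drop words that appear in too few or too many documents.

    rows: list of (doc_id, word, count) tuples, one word per row.
    min_docs and max_doc_fraction are illustrative thresholds (assumptions).
    """
    docs_per_word = defaultdict(set)
    all_docs = set()
    for doc_id, word, _count in rows:
        docs_per_word[word].add(doc_id)
        all_docs.add(doc_id)

    n_docs = len(all_docs)
    # Keep a word only if it occurs in at least min_docs documents
    # and in at most max_doc_fraction of all documents.
    keep = {
        word
        for word, docs in docs_per_word.items()
        if len(docs) >= min_docs and len(docs) / n_docs <= max_doc_fraction
    }
    return [row for row in rows if row[1] in keep]


# Made-up sample data: "the" occurs in every document, "zebra" in only one.
rows = [
    (1, "the", 3), (2, "the", 2), (3, "the", 4), (4, "the", 1),
    (1, "topic", 1), (2, "topic", 2),
    (3, "zebra", 1),
]
filtered = filter_by_document_frequency(rows)
```

With these assumed thresholds, only the rows for "topic" remain. Suitable cutoffs depend on the corpus and are usually chosen by inspecting the word frequency distribution.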