Column Name | Data Type | Description |
---|---|---|
doc_column | INTEGER, SMALLINT, BIGINT, NUMERIC, NUMERIC(p), NUMERIC(p,a), TEXT, VARCHAR, VARCHAR(n), UUID, or BYTEA | Contains the document identifiers. |
word_column | INTEGER, SMALLINT, BIGINT, TEXT, VARCHAR, or VARCHAR(n) | Contains the words (one word in each row). |
count_column | INTEGER, SMALLINT, BIGINT, NUMERIC, NUMERIC(p), NUMERIC(p,a), or DOUBLE PRECISION | Optional. Contains the counts of the words. The default value is 1. |

You can use the output of the TextTokenizer function with the argument
OutputByWord('true') as input to the LDATrainer function. Teradata recommends that
you filter out words with very low or very high frequency before training. Such words
can produce topics that consist mostly of common words and are not meaningful in the
topic model.
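
The frequency filtering recommended above can be sketched outside the database as follows. This is only an illustration in Python, not part of the LDATrainer or TextTokenizer API: the function name `filter_by_document_frequency`, the thresholds `min_docs` and `max_doc_ratio`, and the sample rows are all assumptions chosen for the example. It treats each input row as a (doc_column, word_column, count_column) tuple and keeps only words that appear in at least `min_docs` documents and in at most a `max_doc_ratio` fraction of all documents.

```python
from collections import defaultdict


def filter_by_document_frequency(rows, min_docs=2, max_doc_ratio=0.5):
    """Drop rows whose word is too rare or too common across documents.

    rows: iterable of (doc_id, word, count) tuples, mirroring the
    doc_column, word_column, and count_column of the input table.
    min_docs and max_doc_ratio are hypothetical thresholds for this sketch.
    """
    rows = list(rows)

    # Collect the set of documents in which each word occurs.
    docs_per_word = defaultdict(set)
    all_docs = set()
    for doc_id, word, _count in rows:
        docs_per_word[word].add(doc_id)
        all_docs.add(doc_id)

    # A word is kept only if its document frequency lies within both bounds.
    max_docs = max_doc_ratio * len(all_docs)
    keep = {
        word
        for word, docs in docs_per_word.items()
        if min_docs <= len(docs) <= max_docs
    }
    return [row for row in rows if row[1] in keep]


# Made-up sample data: "the" occurs in every document, while "zebra" and
# "corpus" each occur in only one.
rows = [
    (1, "the", 5), (1, "topic", 2), (1, "model", 1),
    (2, "the", 4), (2, "topic", 1), (2, "zebra", 1),
    (3, "the", 6), (3, "model", 2),
    (4, "the", 3), (4, "corpus", 1),
]
filtered = filter_by_document_frequency(rows, min_docs=2, max_doc_ratio=0.75)
print(filtered)
```

With these sample thresholds, only the rows for "topic" and "model" remain. Suitable cutoffs depend on the corpus, so treat the values here as placeholders to tune.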