The LDATrainer function outputs:
- Message
- Model table
- [Optional] Output table

Message:

Column Name | Data Type | Description |
---|---|---|
message | TEXT, VARCHAR, or VARCHAR(n) | Reports the iteration steps and the perplexity of the model. The perplexity formula is $\text{perplexity} = 2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$, where $H(p)$ is the entropy of the distribution. Perplexity varies with the training documents. However, you can use perplexity to find the best model for a specified set of training documents: Generate models for several subsets of the training documents and then choose the model with the lowest perplexity. |
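
The perplexity formula can be illustrated with a minimal Python sketch. This is not how LDATrainer computes the value it reports; it only evaluates 2^H(p) for a discrete distribution given as a list of probabilities. The function name `perplexity` and the toy distributions are assumptions made for illustration.

```python
import math

def perplexity(probs):
    """Return 2 ** H(p), where H(p) is the entropy (in bits) of probs."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p(x) = 0 contribute nothing to the entropy, so skip them.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

# Toy distributions (hypothetical, not LDATrainer output).
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "skewed": [0.7, 0.1, 0.1, 0.1],
}
scores = {name: perplexity(p) for name, p in candidates.items()}

# Lower perplexity is better, so select the minimum.
best = min(scores, key=scores.get)
```

A uniform distribution over four outcomes has an entropy of 2 bits and therefore a perplexity of 4; the more concentrated distribution scores lower, which is why the lowest-perplexity model is the one to choose.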

Model table:

Column Name | Data Type | Description |
---|---|---|
topicid | INTEGER | Internally generated topic identifier. |
value | BYTEA | Model in binary format. |

Output table:

Column Name | Data Type | Description |
---|---|---|
docid | Same as data type of doc_column in input table | Contains document identifiers from the input table. |
topicid | INTEGER | Contains topic identifiers from the model table. |
topicweight | DOUBLE PRECISION | Contains topic weights. |
topicwords | TEXT, VARCHAR, or VARCHAR(n) | Optional. Contains topic words, separated by commas. |

The model table stores the model in BYTEA (binary) format, which is not human-readable. To view the model contents, use the LDATopicPrinter function.