Table | Description |
---|---|
tokens | Contains classified training tokens. Usually output by a tokenizing function, such as TextTokenizer or TextParser. |
[Optional] categories | Contains prediction categories to use in model, which you can also specify with Categories argument. If you omit both this table and Categories argument, function uses all categories specified by DocCategoryColumn argument. |
[Optional] stop_words | Contains stop words (a, an, the, and so on). If you omit this table, you must specify stop words with StopWords argument. |
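
The category fallback described for the categories table can be sketched in a few lines of Python. This is a minimal illustration, not the function's implementation: `resolve_categories` and the sample rows are hypothetical, and the precedence when both the table and the Categories argument are supplied is not stated above, so the sketch arbitrarily prefers the table.

```python
from typing import Iterable, List, Optional


def resolve_categories(
    categories_table: Optional[Iterable[str]],
    categories_arg: Optional[Iterable[str]],
    doc_categories: Iterable[str],
) -> List[str]:
    """Return the prediction categories to use (illustrative only)."""
    if categories_table is not None:
        # Assumption: the table wins when both sources are supplied.
        return sorted(set(categories_table))
    if categories_arg is not None:
        return sorted(set(categories_arg))
    # Neither was supplied: use every category found in the
    # document category column of the tokens table.
    return sorted(set(doc_categories))


# Hypothetical tokens rows: (doc_id, token, doc_category)
token_rows = [
    ("doc1", "goal", "sports"),
    ("doc2", "vote", "politics"),
    ("doc3", "match", "sports"),
]
doc_categories = [row[2] for row in token_rows]

print(resolve_categories(None, None, doc_categories))        # ['politics', 'sports']
print(resolve_categories(None, ["sports"], doc_categories))  # ['sports']
```
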
tokens Schema
Column | Data Type | Description |
---|---|---|
doc_id_column | CHARACTER, VARCHAR, INTEGER, or SMALLINT | [Column appears once for each specified doc_id_column.] Identifier of document that contains classified training tokens. |
token_column | CHARACTER or VARCHAR | Classified training token. |
doc_category_column | CHARACTER or VARCHAR | Category of document. Partition table by this column. |
categories Schema
Column | Data Type | Description |
---|---|---|
category_column | CHARACTER or VARCHAR | Prediction category. |
stop_words Schema
Column | Data Type | Description |
---|---|---|
stop_words_column | CHARACTER or VARCHAR | Stop word. |
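
To make the three schemas concrete, the following sketch builds toy versions of the tables in an in-memory SQLite database. All table names, column names, and rows are hypothetical, and SQLite only stands in for the target database. The final query, which drops stop words and groups by category, only loosely mirrors the partitioning note above and is not how the function itself processes its input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical column names; each maps to a schema column described above.
conn.executescript(
    """
    CREATE TABLE tokens (
        doc_id   VARCHAR,  -- doc_id_column
        token    VARCHAR,  -- token_column
        category VARCHAR   -- doc_category_column
    );
    CREATE TABLE categories (category VARCHAR);  -- category_column
    CREATE TABLE stop_words (word VARCHAR);      -- stop_words_column
    """
)

conn.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?)",
    [
        ("doc1", "the", "sports"),
        ("doc1", "goal", "sports"),
        ("doc2", "vote", "politics"),
        ("doc2", "a", "politics"),
        ("doc3", "match", "sports"),
    ],
)
conn.executemany("INSERT INTO categories VALUES (?)", [("sports",), ("politics",)])
conn.executemany("INSERT INTO stop_words VALUES (?)", [("the",), ("a",)])

# Count the non-stop-word training tokens in each document category.
token_counts = conn.execute(
    """
    SELECT category, COUNT(*)
    FROM tokens
    WHERE token NOT IN (SELECT word FROM stop_words)
    GROUP BY category
    ORDER BY category
    """
).fetchall()

print(token_counts)  # [('politics', 1), ('sports', 2)]
```
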