The NaiveBayesTextClassifierTrainer function has these input tables:
- token
- categories [Optional]
- stop_words [Optional]
The token table, which contains the classified training tokens, is usually generated by a tokenizing function, such as TextTokenizer or Text_Parser. The following table describes its schema.
Column Name | Data Type | Description |
---|---|---|
doc_id_column | CHARACTER, VARCHAR, text, INTEGER, or SMALLINT | Contains the identifiers of the documents that contain the classified training tokens. The table can have more than one such column. |
token_column | CHARACTER, VARCHAR, or text | Contains the classified training tokens. |
doc_category_column | CHARACTER, VARCHAR, or text | Contains the categories of the documents that contain the classified training tokens. Partition the table by this column. |
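To make the shape of the token table concrete, the following Python sketch builds rows with one document identifier, one token, and one category per row, then groups the rows by category to mirror the partitioning requirement. It is an illustration only: the sample documents, the helper names `to_token_rows` and `partition_by_category`, and the lowercase whitespace split are assumptions, and the split is a stand-in for a real tokenizing function such as TextTokenizer.

```python
from collections import defaultdict

# Hypothetical labeled training documents: (doc_id, category, text).
documents = [
    (1, "sports", "The team won the match"),
    (2, "finance", "The market fell again"),
]


def to_token_rows(docs):
    """Return one (doc_id, token, category) row per token occurrence.

    The lowercase whitespace split is a simplification, not the
    behavior of any particular tokenizing function.
    """
    rows = []
    for doc_id, category, text in docs:
        for token in text.lower().split():
            rows.append((doc_id, token, category))
    return rows


def partition_by_category(rows):
    """Group token rows by their category (third field of each row)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[2]].append(row)
    return dict(parts)


token_rows = to_token_rows(documents)
partitions = partition_by_category(token_rows)
```

Each tuple corresponds to one row of the token table, in the order doc_id_column, token_column, doc_category_column.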
The categories table contains all possible prediction categories. If you omit this table, then you must specify all possible prediction categories with the Categories argument.
Column Name | Data Type | Description |
---|---|---|
category_column | CHARACTER, VARCHAR, or text | Contains all possible prediction categories. |
The stop_words table contains all possible stop words (a, an, the, and so on). If you omit this table, then you must specify all possible stop words with the Stop_Words argument.
Column Name | Data Type | Description |
---|---|---|
stop_words_column | CHARACTER, VARCHAR, or text | Contains all possible stop words. |
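The rule shared by both optional tables (supply the table, or else supply the corresponding argument) can be sketched in Python as follows. The helper `resolve_values` and the sample values are hypothetical. The text above does not say what happens when both a table and an argument are supplied, so preferring the table here is an arbitrary choice of the sketch, not documented behavior.

```python
def resolve_values(table_rows, argument_values, name):
    """Return the set of values from the optional table or the argument.

    table_rows: rows of a single-column table (1-tuples), or None if omitted.
    argument_values: values passed through the argument, or None if omitted.
    """
    if table_rows is not None:
        # Assumption of this sketch: the table wins when both are given.
        return {row[0] for row in table_rows}
    if argument_values is not None:
        return set(argument_values)
    raise ValueError(f"supply either the {name} table or the matching argument")


# Hypothetical inputs: categories come from a table, stop words from an argument.
categories_table = [("sports",), ("finance",)]
stop_words_argument = ["a", "an", "the"]

categories = resolve_values(categories_table, None, "categories")
stop_words = resolve_values(None, stop_words_argument, "stop_words")
```

Omitting both sources raises an error in the sketch, which reflects the requirement that one of the two must be provided.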