- TokenColumn
- Specify the name of the InputTable column that contains the tokens to classify.
- ModelType
- [Optional] Specify the model type of the text classifier: 'Multinomial' (default) or 'Bernoulli'.
- DocIDColumn
- [Required if ModelType is 'Bernoulli', unnecessary otherwise.] Specify the names of the token table columns that contain the document identifier.
- DocCategoryColumn
- Specify the name of the InputTable column that contains the document category.
- CategoryColumn
- [Optional] Use only if you specify CategoriesTable. Specify the name of the CategoriesTable column that contains the prediction categories to use in the model.
- Categories
- [Optional] Specify the prediction categories to use in the model.
- StopWordsColumn
- [Optional] Specify the name of the StopWords table column that contains the stop words.
- StopWordsList
- [Optional] Specify words to ignore (such as a, an, and the). Specify either this syntax element or the StopWords table, but not both.
Multinomial (default) Model Formula
Expression | Description |
---|---|
$p(C_i \mid D)$ | Probability that new document D is classified to category i |
TC | Total token count (including duplicate tokens) |
$T_j$ | Token j in document D |
$TC_i$ | Token count in category i (including duplicate tokens) |
$TC_{ji}$ | Count of token j in category i (including duplicate tokens) |
$\lvert V \rvert$ | Number of unique tokens in training set V |
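
One way the symbols above could combine, assuming the usual add-one (Laplace) smoothed multinomial Naive Bayes form with a token-count prior, is $p(C_i \mid D) \propto \frac{TC_i}{TC} \prod_{T_j \in D} \frac{TC_{ji} + 1}{TC_i + \lvert V \rvert}$. The Python sketch below illustrates that assumed form on a toy token table. The function and variable names are hypothetical and are not part of the function's API, and the score is unnormalized.

```python
import math
from collections import Counter, defaultdict


def train_multinomial(rows):
    """Build counts from (category, token) pairs, one pair per token occurrence."""
    tc_ji = defaultdict(Counter)  # TC_ji: count of token j in category i
    for category, token in rows:
        tc_ji[category][token] += 1
    tc_i = {c: sum(cnt.values()) for c, cnt in tc_ji.items()}  # TC_i
    tc = sum(tc_i.values())                                    # TC
    vocab = {t for cnt in tc_ji.values() for t in cnt}         # V
    return tc_ji, tc_i, tc, vocab


def multinomial_log_score(doc_tokens, category, model):
    """Unnormalized log p(C_i | D) under the assumed add-one smoothing."""
    tc_ji, tc_i, tc, vocab = model
    score = math.log(tc_i[category] / tc)  # assumed prior: TC_i / TC
    for token in doc_tokens:               # duplicate tokens count repeatedly
        numerator = tc_ji[category][token] + 1
        denominator = tc_i[category] + len(vocab)
        score += math.log(numerator / denominator)
    return score


# Toy training data (made up for illustration).
rows = [
    ("sports", "ball"), ("sports", "goal"), ("sports", "ball"),
    ("politics", "vote"), ("politics", "ball"),
]
model = train_multinomial(rows)
scores = {c: multinomial_log_score(["ball"], c, model) for c in model[1]}
best = max(scores, key=scores.get)  # category with the highest score
```

Working in log space avoids underflow when a document has many tokens; exponentiate only if the raw product is needed.
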
Bernoulli Model Formula
Expression | Description |
---|---|
$p(C_i \mid D)$ | Probability that new document D is classified to category i |
DC | Total document count |
$DC_i$ | Document count in category i |
$\lvert V \rvert$ | Number of unique tokens in training set V |
$T_k$ | Token in V that is not in document D |
$DC_{ji}$ | Document count in category i that contains token j |
$DC_{ki}$ | Document count in category i that contains token k |
$\lvert C \rvert$ | Number of unique categories in category set C |
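
One plausible reading of the symbols above, assuming a smoothed Bernoulli Naive Bayes form in which $\lvert C \rvert$ is the smoothing term in the denominator, is $p(C_i \mid D) \propto \frac{DC_i}{DC} \prod_{T_j \in D} \frac{DC_{ji} + 1}{DC_i + \lvert C \rvert} \prod_{T_k \notin D} \left(1 - \frac{DC_{ki} + 1}{DC_i + \lvert C \rvert}\right)$. The Python sketch below illustrates that assumed form; the names are hypothetical and the score is unnormalized. It also shows why a document identifier is needed for this model type: the counts are per document, not per token occurrence.

```python
import math
from collections import defaultdict


def train_bernoulli(rows):
    """Build document counts from (doc_id, category, token) triples."""
    docs_in_cat = defaultdict(set)                           # category -> doc ids
    docs_with_token = defaultdict(lambda: defaultdict(set))  # category -> token -> doc ids
    for doc_id, category, token in rows:
        docs_in_cat[category].add(doc_id)
        docs_with_token[category][token].add(doc_id)
    dc_i = {c: len(ids) for c, ids in docs_in_cat.items()}   # DC_i
    dc = sum(dc_i.values())  # DC, assuming each document has exactly one category
    dc_ji = {c: {t: len(ids) for t, ids in toks.items()}
             for c, toks in docs_with_token.items()}         # DC_ji
    vocab = {t for toks in dc_ji.values() for t in toks}     # V
    return dc_ji, dc_i, dc, vocab


def bernoulli_log_score(doc_tokens, category, model):
    """Unnormalized log p(C_i | D) under the assumed smoothing."""
    dc_ji, dc_i, dc, vocab = model
    n_categories = len(dc_i)               # |C|; needs at least 2 so that p < 1
    present = set(doc_tokens)              # only presence matters, not frequency
    score = math.log(dc_i[category] / dc)  # prior: DC_i / DC
    for token in vocab:                    # tokens outside V are ignored
        p = (dc_ji[category].get(token, 0) + 1) / (dc_i[category] + n_categories)
        score += math.log(p if token in present else 1.0 - p)
    return score


# Toy training data (made up for illustration): three documents.
rows_b = [
    (1, "sports", "ball"), (1, "sports", "goal"), (2, "sports", "ball"),
    (3, "politics", "vote"), (3, "politics", "ball"),
]
model_b = train_bernoulli(rows_b)
scores_b = {c: bernoulli_log_score(["ball"], c, model_b) for c in model_b[1]}
```

Unlike the multinomial sketch, every token in V contributes a factor here, whether or not it appears in the document.
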