NaiveBayesTextClassifierTrainer Arguments


Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

Document ID: B700-4003

TokenColumn: Specify the name of the token_table column that contains the tokens to classify.
ModelType: [Optional] Specify the model type of the text classifier, 'Multinomial' or 'Bernoulli'. Default: 'Multinomial'. See the sections that follow this table.
DocIDColumn: [Required if ModelType is 'Bernoulli', unnecessary otherwise.] Specify the names of the token_table columns that contain the document identifier.
DocCategoryColumn: Specify the name of the token_table column that contains the document category.
CategoryColumn: [Optional] Use only if you specify categories_table. Specify the name of the categories_table column that contains the prediction categories to use in the model. Default: First column of categories_table. If you omit both categories_table and CategoryColumn, the function uses all categories in the DocCategoryColumn column.
Categories: [Optional] Specify the prediction categories to use in the model. Default: All categories in the DocCategoryColumn column.
StopWordsColumn: [Optional] Specify the name of the stop_words_table column that contains the stop words. Default: First column of stop_words_table.
StopWords: [Optional] Specify words to ignore (such as a, an, and the). Specify either this argument or stop_words_table, but not both.
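The category and stop-word rules above can be mimicked in plain Python to see which training rows remain. This is a minimal sketch, not the function's implementation. It assumes each token_table row is a dict with the hypothetical keys `token` and `category`, standing in for the TokenColumn and DocCategoryColumn columns. The helper name `select_training_tokens` is also hypothetical, and stop words are compared by exact match.

```python
from typing import Iterable, Optional


def select_training_tokens(
    rows: list[dict],
    categories: Optional[Iterable[str]] = None,        # stands in for Categories / categories_table
    stop_words: Optional[Iterable[str]] = None,        # stands in for the StopWords argument
    stop_words_table: Optional[Iterable[str]] = None,  # stands in for stop_words_table
) -> list[dict]:
    """Hypothetical sketch: keep rows whose category is selected and whose token is not a stop word."""
    # StopWords and stop_words_table are mutually exclusive.
    if stop_words is not None and stop_words_table is not None:
        raise ValueError("specify StopWords or stop_words_table, not both")
    ignored = set(stop_words if stop_words is not None else stop_words_table or [])

    # Default: every category that appears in the DocCategoryColumn column.
    allowed = set(categories) if categories is not None else {r["category"] for r in rows}

    return [r for r in rows if r["category"] in allowed and r["token"] not in ignored]
```

With no categories and no stop words, every row is kept. Passing both a stop-word list and a stop-word table raises an error, mirroring the "not both" rule.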

Multinomial Model

Variable	Description
p(C_i | D)	Probability that a new document D is classified to category i
TC	Total token count (including duplicate tokens)
T_j	Count of token j in category i (including duplicate tokens)
TC_i	Token count in category i (including duplicate tokens)
TC_ij	Count of token j in category i (including duplicate tokens)
|V|	Number of unique tokens in training set V
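These token counts are the ingredients of a multinomial naive Bayes model. The sketch below assumes the textbook form with add-one (Laplace) smoothing: p(C_i | D) ∝ (TC_i / TC) × ∏ (TC_ij + 1) / (TC_i + |V|). The product runs over every token occurrence j in D, and tokens outside V are skipped. Scores are summed in log space to avoid underflow. This is an illustration under those assumptions, not Teradata's implementation. The function names and the (tokens, category) input format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial(docs):
    """docs: list of (tokens, category) pairs. Returns (TC, TC_i, TC_ij, V)."""
    tc_i = Counter()              # TC_i: token count per category (duplicates included)
    tc_ij = defaultdict(Counter)  # TC_ij: count of token j in category i
    vocab = set()                 # V: unique tokens in the training set
    for tokens, category in docs:
        if not tokens:            # skip empty documents so no category gets a zero prior
            continue
        tc_i[category] += len(tokens)
        tc_ij[category].update(tokens)
        vocab.update(tokens)
    tc = sum(tc_i.values())       # TC: total token count
    return tc, tc_i, tc_ij, vocab


def log_scores_multinomial(model, tokens):
    """Unnormalized log p(C_i | D) for each category, using add-one smoothing."""
    tc, tc_i, tc_ij, vocab = model
    scores = {}
    for cat in tc_i:
        score = math.log(tc_i[cat] / tc)  # prior TC_i / TC
        for tok in tokens:
            if tok in vocab:              # tokens not seen in training are ignored
                score += math.log((tc_ij[cat][tok] + 1) / (tc_i[cat] + len(vocab)))
        scores[cat] = score
    return scores


def classify(scores):
    """Pick the category with the highest score."""
    return max(scores, key=scores.get)
```

Suppose the model is trained on `(["ball", "goal", "goal"], "sports")`, `(["vote", "tax"], "politics")` and `(["goal", "team"], "sports")`. Here TC = 7 and |V| = 5. A document containing only `goal` then scores highest for `sports`.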

Bernoulli Model

Variable	Description
p(C_i | D)	Probability that a new document D is classified to category i
DC	Total document count
DC_i	Document count in category i
|V|	Number of unique tokens in training set V
T_k	Token in V that is not in document D
DC_ij	Number of documents in category i that contain token j
|C|	Number of unique categories in category set C
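These document counts match a Bernoulli naive Bayes model, which records only whether a token occurs in a document. The minimal sketch below assumes the textbook formulation. The prior is DC_i / DC, and P(j | C_i) = (DC_ij + 1) / (DC_i + 2) with add-one smoothing. Each token in D contributes P(j | C_i), and each token T_k in V but not in D contributes 1 − P(k | C_i). The table's |C| is not used here, because this section does not state how the function applies it. The function names and the (tokens, category) input format are hypothetical, with one pair per document ID.

```python
import math
from collections import Counter, defaultdict


def train_bernoulli(docs):
    """docs: list of (tokens, category) pairs, one per document. Returns (DC, DC_i, DC_ij, V)."""
    dc_i = Counter()              # DC_i: document count per category
    dc_ij = defaultdict(Counter)  # DC_ij: documents in category i that contain token j
    vocab = set()                 # V: unique tokens in the training set
    for tokens, category in docs:
        present = set(tokens)     # repeated tokens within one document count once
        dc_i[category] += 1
        dc_ij[category].update(present)
        vocab |= present
    dc = sum(dc_i.values())       # DC: total document count
    return dc, dc_i, dc_ij, vocab


def log_scores_bernoulli(model, tokens):
    """Unnormalized log p(C_i | D) for each category."""
    dc, dc_i, dc_ij, vocab = model
    present = set(tokens) & vocab  # tokens not seen in training are ignored
    scores = {}
    for cat in dc_i:
        score = math.log(dc_i[cat] / dc)  # prior DC_i / DC
        for tok in vocab:
            p = (dc_ij[cat][tok] + 1) / (dc_i[cat] + 2)  # smoothed P(token | C_i), always < 1
            # Tokens in D use p; tokens T_k absent from D use 1 - p.
            score += math.log(p if tok in present else 1 - p)
        scores[cat] = score
    return scores
```

Unlike the multinomial case, every vocabulary token affects the score, including tokens absent from D. This is why the model needs document identifiers (DocIDColumn) rather than raw token counts.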