Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
dita:id: B700-1021
lifecycle: previous
Product Category: Software

Argument	Category	Description
TokenColumn	Required	Specifies the name of the token_table column that contains the tokens to be classified.
ModelType	Optional	Specifies the model type of the text classifier: 'Multinomial' or 'Bernoulli'. The default value is 'Multinomial'. The formulas for the two model types follow this table.
DocIDColumn	Required if ModelType is 'Bernoulli', unnecessary otherwise	Specifies the names of the token_table columns that contain the document identifier.
DocCategoryColumn	Required	Specifies the name of the token_table column that contains the document category.
CategoryColumn	Optional	Specifies the name of the categories_table column that contains the prediction categories. The default value is the first column of categories_table.
Categories	Optional	Specifies the prediction categories. Specify either this argument or the categories_table, but not both.
StopWordsColumn	Optional	Specifies the name of the stop_words_table column that contains the stop words. The default value is the first column of stop_words_table.
StopWords	Optional	Specifies words to ignore (such as 'a', 'an', and 'the'). Specify either this argument or the stop_words_table, but not both.
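The rules in this table (two required column arguments, DocIDColumn required only for the Bernoulli model, and an either-or choice between an inline list and a table for both categories and stop words) can be summarized as a small validation routine. The following Python sketch is illustrative only: `check_arguments` and its parameters are hypothetical names, not part of Aster Analytics, and the routine checks only the rules stated above.

```python
from typing import List, Optional, Sequence

def check_arguments(
    token_column: Optional[str],
    doc_category_column: Optional[str],
    model_type: str = "Multinomial",
    doc_id_columns: Sequence[str] = (),
    categories: Sequence[str] = (),
    has_categories_table: bool = False,
    stop_words: Sequence[str] = (),
    has_stop_words_table: bool = False,
) -> List[str]:
    """Hypothetical helper: return rule violations (empty list if valid)."""
    errors = []
    if not token_column:
        errors.append("TokenColumn is required")
    if not doc_category_column:
        errors.append("DocCategoryColumn is required")
    # Exact spellings from the ModelType row; case handling is not specified.
    if model_type not in ("Multinomial", "Bernoulli"):
        errors.append("ModelType must be 'Multinomial' or 'Bernoulli'")
    if model_type == "Bernoulli" and not doc_id_columns:
        errors.append("DocIDColumn is required when ModelType is 'Bernoulli'")
    # Inline list and table are mutually exclusive.
    if categories and has_categories_table:
        errors.append("specify Categories or categories_table, not both")
    if stop_words and has_stop_words_table:
        errors.append("specify StopWords or stop_words_table, not both")
    return errors
```

For example, `check_arguments("token", "category", model_type="Bernoulli")` reports only the missing DocIDColumn. Returning a list rather than raising on the first problem lets a caller see every violated rule at once.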

The Multinomial (default) model formula uses these variables:

Symbol	Meaning
p(C_i | D)	Probability that a new document D is assigned to category i
TC	Total token count (including duplicate tokens)
T_j	Count of token j in category i (including duplicate tokens)
TC_i	Token count in category i (including duplicate tokens)
TC_ij	Count of token j in category i (including duplicate tokens)
|V|	Number of unique tokens in training set V
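Assuming the usual Laplace-smoothed multinomial naive Bayes form, the variables above combine as p(C_i | D) ∝ (TC_i / TC) × ∏ (TC_ij + 1) / (TC_i + |V|), with the product taken over every token j in D, duplicates included. The Python sketch below computes that score in log space to avoid floating-point underflow on long documents. It is a reconstruction under that assumption, not Aster's documented implementation; the function names and the (category, tokens) input layout are hypothetical.

```python
import math
from collections import Counter

def train_multinomial(docs):
    """docs: list of (category, tokens) pairs; returns (TC, TC_i, TC_ij, V)."""
    tc_i = Counter()   # TC_i: token count in category i, duplicates included
    tc_ij = Counter()  # TC_ij: count of token j in category i, duplicates included
    vocab = set()      # V: unique tokens in the training set
    for category, tokens in docs:
        tc_i[category] += len(tokens)
        for tok in tokens:
            tc_ij[(category, tok)] += 1
            vocab.add(tok)
    tc = sum(tc_i.values())  # TC: total token count, duplicates included
    return tc, tc_i, tc_ij, vocab

def log_score_multinomial(tokens, category, model):
    """log[(TC_i / TC) * prod over tokens j in D of (TC_ij + 1) / (TC_i + |V|)].
    Assumes every category has at least one training token (TC_i > 0)."""
    tc, tc_i, tc_ij, vocab = model
    score = math.log(tc_i[category] / tc)  # ASSUMPTION: token-based prior TC_i / TC
    denom = tc_i[category] + len(vocab)    # TC_i + |V|
    for tok in tokens:  # each occurrence in D contributes, so duplicates count
        score += math.log((tc_ij[(category, tok)] + 1) / denom)
    return score

def classify_multinomial(tokens, model):
    """Return the category with the highest log score for document tokens D."""
    return max(model[1], key=lambda c: log_score_multinomial(tokens, c, model))

model = train_multinomial([
    ("sports", ["ball", "goal", "ball"]),
    ("tech", ["cpu", "code"]),
    ("sports", ["team"]),
])
print(classify_multinomial(["cpu", "code"], model))  # tech
```

Because the logarithm is monotonic, summing logs picks the same category as multiplying the raw factors. The +1 in the numerator keeps a token that never appeared in category i from forcing that category's probability to zero.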

The Bernoulli model formula uses these variables:

Symbol	Meaning
p(C_i | D)	Probability that a new document D is assigned to category i
DC	Total document count
DC_i	Document count in category i
|V|	Number of unique tokens in training set V
T_k	Token in V that is not in document D
DC_ij	Number of documents in category i that contain token j
|C|	Number of unique categories in category set C
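A comparable sketch for the Bernoulli model follows. Unlike the multinomial model, it scores every token in V: a token present in D contributes the smoothed fraction of category-i documents that contain it, and a token T_k absent from D contributes one minus that fraction. The table lists |C| without saying where it enters the formula, so this sketch assumes it appears in a Laplace-smoothed prior (DC_i + 1) / (DC + |C|), and it uses textbook add-one smoothing (DC_ij + 1) / (DC_i + 2) for the per-token probabilities. Both choices, and all function names, are assumptions rather than Aster's documented behavior.

```python
import math
from collections import Counter

def train_bernoulli(docs):
    """docs: list of (category, tokens) pairs, one per document."""
    dc_i = Counter()   # DC_i: document count in category i
    dc_ij = Counter()  # DC_ij: documents in category i that contain token j
    vocab = set()      # V: unique tokens in the training set
    for category, tokens in docs:
        dc_i[category] += 1
        for tok in set(tokens):  # presence only; duplicates are ignored
            dc_ij[(category, tok)] += 1
            vocab.add(tok)
    dc = sum(dc_i.values())  # DC: total document count
    return dc, dc_i, dc_ij, vocab

def log_score_bernoulli(tokens, category, model):
    """Log score of document tokens D for one category; tokens outside V are ignored."""
    dc, dc_i, dc_ij, vocab = model
    n_categories = len(dc_i)  # |C|
    # ASSUMPTION: Laplace-smoothed prior (DC_i + 1) / (DC + |C|)
    score = math.log((dc_i[category] + 1) / (dc + n_categories))
    present = set(tokens)
    for tok in vocab:
        # ASSUMPTION: add-one smoothing over present/absent, (DC_ij + 1) / (DC_i + 2)
        p = (dc_ij[(category, tok)] + 1) / (dc_i[category] + 2)
        # Tokens in D contribute p; tokens T_k in V but not in D contribute 1 - p.
        score += math.log(p if tok in present else 1 - p)
    return score

def classify_bernoulli(tokens, model):
    """Return the category with the highest log score for document tokens D."""
    return max(model[1], key=lambda c: log_score_bernoulli(tokens, c, model))

model_b = train_bernoulli([
    ("spam", ["win", "cash"]),
    ("spam", ["win", "now"]),
    ("ham", ["meeting", "now"]),
])
print(classify_bernoulli(["meeting"], model_b))  # ham
```

Counting each document at most once per token is what requires a document identifier, which is consistent with DocIDColumn being required for this model type. With add-one smoothing, p always lies strictly between 0 and 1, so neither log(p) nor log(1 - p) is ever taken of zero.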