Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
dita:id: B700-1021
lifecycle: previous
Product Category: Software
| Argument | Category | Description |
|---|---|---|
| TokenColumn | Required | Specifies the name of the token_table column that contains the tokens to be classified. |
| ModelType | Optional | Specifies the model type of the text classifier. The default value is 'Multinomial'. The formulas for the two model types follow this table. |
| DocIDColumn | Required if ModelType is 'Bernoulli', unnecessary otherwise | Specifies the names of the token_table columns that contain the document identifier. |
| DocCategoryColumn | Required | Specifies the name of the token_table column that contains the document category. |
| CategoryColumn | Optional | Specifies the name of the categories_table column that contains the prediction categories. The default value is the first column of categories_table. |
| Categories | Optional | Specifies the prediction categories. Specify either this argument or the categories_table, but not both. |
| StopWordsColumn | Optional | Specifies the name of the stop_words_table column that contains the stop words. The default value is the first column of stop_words_table. |
| StopWords | Optional | Specifies words to ignore (such as a, an, and the). Specify either this argument or the stop_words_table, but not both. |
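
The constraints in the argument table can be summarized as a small consistency check. The following Python sketch is illustrative only and is not part of Aster Analytics: the function name `check_trainer_args`, its parameters, and the column names used in the example calls are hypothetical. It assumes that ModelType accepts exactly the two values named in this topic, and it rejects only the case where both forms of a category or stop-word specification are given, because both are listed as optional.

```python
from typing import List, Optional, Sequence

# Assumption: these are the only two model types, as named in this topic.
MODEL_TYPES = ("Multinomial", "Bernoulli")


def check_trainer_args(
    token_column: Optional[str],
    doc_category_column: Optional[str],
    model_type: str = "Multinomial",
    doc_id_column: Optional[Sequence[str]] = None,
    categories: Optional[Sequence[str]] = None,
    has_categories_table: bool = False,
    stop_words: Optional[Sequence[str]] = None,
    has_stop_words_table: bool = False,
) -> List[str]:
    """Return a list of problems; an empty list means the arguments are consistent."""
    problems: List[str] = []
    # TokenColumn and DocCategoryColumn are always required.
    if not token_column:
        problems.append("TokenColumn is required")
    if not doc_category_column:
        problems.append("DocCategoryColumn is required")
    if model_type not in MODEL_TYPES:
        problems.append("ModelType must be 'Multinomial' or 'Bernoulli'")
    # DocIDColumn is required only for the Bernoulli model.
    if model_type == "Bernoulli" and not doc_id_column:
        problems.append("DocIDColumn is required when ModelType is 'Bernoulli'")
    # Each list argument excludes its corresponding table.
    if categories and has_categories_table:
        problems.append("Specify Categories or categories_table, not both")
    if stop_words and has_stop_words_table:
        problems.append("Specify StopWords or stop_words_table, not both")
    return problems


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    print(check_trainer_args("token", "category"))
    print(check_trainer_args("token", "category", model_type="Bernoulli"))
```

In this sketch, a Bernoulli call without a document identifier column is reported as a problem, while the default Multinomial call with only the two required columns passes.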

The Multinomial (default) model formula is:

[Formula image not reproduced in this text.]

p(C_i | D): Probability that new document is classified to category i
TC: Total token count (including duplicate tokens)
T_j: Count of token j in category i (including duplicate tokens)
TC_i: Token count in category i (including duplicate tokens)
TC_ij: Count of token j in category i (including duplicate tokens)
|V|: Number of unique tokens in training set V
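
The Multinomial formula itself does not survive in this text. As a hedged sketch, a Laplace-smoothed multinomial Naive Bayes posterior that is consistent with the symbols listed above would take the following form. The proportionality sign, the add-one smoothing, and the product over token occurrences j in document D are assumptions taken from the standard model, not confirmed by this guide. How T_j enters the original formula cannot be recovered here.

```latex
p(C_i \mid D) \;\propto\; \frac{TC_i}{TC} \prod_{j \in D} \frac{TC_{ij} + 1}{TC_i + |V|}
```

Under this form, the predicted category is the i that maximizes the right-hand side. Because the product runs over token occurrences, a token that appears several times in D contributes one factor per occurrence. In practice the product is usually evaluated as a sum of logarithms to avoid numeric underflow.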

The Bernoulli model formula is:

[Formula image not reproduced in this text.]

p(C_i | D): Probability that new document is classified to category i
DC: Total document count
DC_i: Document count in category i
|V|: Number of unique tokens in training set V
T_k: Token in V that is not in document D
DC_ij: Document count in category i that contains token j
|C|: Number of unique categories in category set C
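
The Bernoulli formula itself does not survive in this text. As a hedged sketch, a smoothed Bernoulli Naive Bayes posterior built from the symbols listed above could take the following form. It assumes that the category prior is smoothed with |C|, that each per-token document frequency is smoothed with add-one over the two outcomes (present or absent), and that T_j denotes a token that occurs in D. None of these choices is confirmed by this guide, and the sketch does not use |V| directly, so the original formula may differ.

```latex
p(C_i \mid D) \;\propto\; \frac{DC_i + 1}{DC + |C|}
\prod_{T_j \in D} \frac{DC_{ij} + 1}{DC_i + 2}
\prod_{T_k \in V,\; T_k \notin D} \left( 1 - \frac{DC_{ik} + 1}{DC_i + 2} \right)
```

Unlike the Multinomial model, this form counts each token at most once per document and also penalizes a category for vocabulary tokens T_k that are absent from D. That is why the Bernoulli model needs a document identifier (DocIDColumn) to count documents rather than token occurrences.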