Input - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Document ID: B700-1021
Product Category: Software

The NaiveBayesTextClassifierTrainer function has these input tables:
  • token (the input table)
  • categories [Optional]
  • stop_words [Optional]

The token table, which contains the classified training tokens, is usually generated by a tokenizing function, such as TextTokenizer or Text_Parser. The following table describes its schema.

NaiveBayesTextClassifierTrainer Token Table Schema

| Column Name | Data Type | Description |
|---|---|---|
| doc_id_column | CHARACTER, VARCHAR, text, INTEGER, or SMALLINT | Contains the identifiers of the documents that contain the classified training tokens. The table can have more than one such column. |
| token_column | CHARACTER, VARCHAR, or text | Contains the classified training tokens. |
| doc_category_column | CHARACTER, VARCHAR, or text | Contains the categories of the documents that contain the classified training tokens. Partition the table by this column. |
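
To make the shape of the token table concrete, the following Python sketch builds rows with the three schema columns and groups them by category. It is only an illustration and does not call Aster: the sample documents, the whitespace tokenizer, and the helper names `to_token_rows` and `partition_by_category` are assumptions made for this example, standing in for the output of a tokenizing function and for partitioning the table by doc_category_column.

```python
from collections import defaultdict

# Hypothetical classified training documents: (doc_id, category, text).
documents = [
    (1, "sports", "the team won the match"),
    (2, "finance", "the bank raised rates"),
    (3, "sports", "a late goal"),
]


def to_token_rows(docs):
    """Return one (doc_id, token, category) row per token.

    A plain lowercase whitespace split stands in for a real tokenizer.
    """
    rows = []
    for doc_id, category, text in docs:
        for token in text.lower().split():
            rows.append((doc_id, token, category))
    return rows


def partition_by_category(rows):
    """Group token rows by their category value (the third field)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[2]].append(row)
    return dict(partitions)


token_rows = to_token_rows(documents)
partitions = partition_by_category(token_rows)

for category, rows in sorted(partitions.items()):
    print(category, len(rows))
```

Each row carries its document identifier, one token, and the category of the document it came from, so all rows of one category end up in the same partition.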

The categories table contains all possible prediction categories. If you omit this table, then you must specify all possible prediction categories with the Categories argument.

NaiveBayesTextClassifierTrainer Categories Table Schema

| Column Name | Data Type | Description |
|---|---|---|
| category_column | CHARACTER, VARCHAR, or text | Contains all possible prediction categories. |

The stop_words table contains all possible stop words (a, an, the, and so on). If you omit this table, then you must specify all possible stop words with the Stop_Words argument.

NaiveBayesTextClassifierTrainer Stop_Words Table Schema

| Column Name | Data Type | Description |
|---|---|---|
| stop_words_column | CHARACTER, VARCHAR, or text | Contains all possible stop words. |
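
The table-or-argument rule for categories and stop words can be sketched in Python as follows. This is a hedged illustration, not the function's implementation: the helper `resolve_values` is hypothetical, the choice to prefer the table when both sources are present is an assumption (the text above does not say what happens in that case), and dropping stop words from a token list reflects their usual role rather than documented behavior.

```python
def resolve_values(table_rows=None, argument_values=None, name="values"):
    """Return the values from a one-column table or, if it is omitted,
    from the matching argument.

    Assumption: the table wins when both are supplied.
    """
    if table_rows is not None:
        return {row[0] for row in table_rows}
    if argument_values is not None:
        return set(argument_values)
    raise ValueError(f"supply either the {name} table or the matching argument")


# Stop words come from a one-column table (hypothetical contents).
stop_words_table = [("a",), ("an",), ("the",)]
stop_words = resolve_values(table_rows=stop_words_table, name="stop_words")

# The categories table is omitted, so the values come from the argument.
categories = resolve_values(argument_values=["sports", "finance"],
                            name="categories")

# Typical use of a stop-word list: drop those tokens before training.
tokens = ["the", "team", "won", "a", "match"]
kept = [t for t in tokens if t not in stop_words]
print(kept)
```

Omitting both the table and the argument raises an error in this sketch, which mirrors the requirement that one of the two must provide the values.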