7.00.02 - Input - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)
Last Update: 2018-04-17

The function has these input tables:
  • Input table
  • Dictionary table [Optional]

TextTokenizer Input Table Schema
Column Name        Data Type  Description
text_column        VARCHAR    Text to tokenize.
accumulate_column  Any        Column to copy to the output table.

TextTokenizer Dictionary Table Schema
Column Name        Data Type  Description
entry              VARCHAR    Dictionary entry.

The following table describes the format of both the dictionary table (dict) and the user dictionary file (specified by the UserDictionaryFile argument).

TextTokenizer Dictionary Table and User Dictionary File Format
Language             Format
Chinese and English  One dictionary word on each line.
Japanese             A dictionary entry consists of the following comma-separated fields:
                       • word: The original word.
                       • tokenized_word: The tokenized form of the word.
                       • reading: The reading of the word in Katakana.
                       • pos: The part of speech of the word.
                     For example:
                       成田空港,成田空港,ナリタクウコウ,カスタム名詞
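
The four-field Japanese entry format can be checked before a dictionary is loaded. The following Python sketch is illustrative only and is not part of Aster Analytics: the names JapaneseEntry and parse_japanese_entry are hypothetical, and the choices to trim surrounding whitespace and to reject empty fields are assumptions, since the format description does not say how the function treats either case.

```python
from typing import NamedTuple


class JapaneseEntry(NamedTuple):
    """One Japanese dictionary entry (hypothetical helper type)."""
    word: str            # the original word
    tokenized_word: str  # the tokenized form of the word
    reading: str         # the reading of the word in Katakana
    pos: str             # the part of speech of the word


def parse_japanese_entry(line: str) -> JapaneseEntry:
    """Split one comma-separated dictionary entry into its four fields.

    Assumption: whitespace around each field is trimmed and empty
    fields are rejected; the documented format does not specify this.
    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4 or not all(fields):
        raise ValueError(
            f"expected 4 non-empty comma-separated fields, got: {line!r}"
        )
    return JapaneseEntry(*fields)


# The example entry from the format table above.
entry = parse_japanese_entry("成田空港,成田空港,ナリタクウコウ,カスタム名詞")
print(entry.word, entry.tokenized_word, entry.reading, entry.pos)
```

A check like this could be run over each line of a candidate user dictionary file, so that malformed entries are caught before the file is passed through the UserDictionaryFile argument.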