Input - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14

dita:id: B700-1021
Product Category: Software

The function has these input tables:

Input table
Dictionary table [Optional]

TextTokenizer Input Table Schema
Column Name	Data Type	Description
text_column	VARCHAR	Text to tokenize.
accumulate_column	Any	Column to copy to the output table.

TextTokenizer Dictionary Table Schema
Column Name	Data Type	Description
entry	VARCHAR	Dictionary entry.

The following table describes the format of both the dictionary table (dict) and the user dictionary file (specified by the UserDictionaryFile argument).

TextTokenizer Dictionary Table and User Dictionary File Format
Language	Format
Chinese and English	One dictionary word on each line.
Japanese	One dictionary entry on each line. Each entry consists of four comma-separated fields, in this order: word (the original word), tokenized_word (the tokenized form of the word), reading (the reading of the word in Katakana), and pos (the part of speech of the word). For example: 成田空港,成田空港,ナリタクウコウ,カスタム名詞
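
The two formats above can be checked before a dictionary is loaded. The following Python sketch is illustrative only and is not part of Aster Analytics: the names JapaneseEntry, parse_japanese_entry, and parse_word_list are hypothetical, and the handling of surrounding whitespace and blank lines is an assumption, since the guide does not describe how TextTokenizer treats them.

```python
from typing import List, NamedTuple


class JapaneseEntry(NamedTuple):
    """One Japanese dictionary entry (hypothetical helper type)."""
    word: str            # the original word
    tokenized_word: str  # the tokenized form of the word
    reading: str         # the reading of the word in Katakana
    pos: str             # the part of speech of the word


def parse_japanese_entry(line: str) -> JapaneseEntry:
    """Split one Japanese dictionary line into its four fields."""
    # Assumption: whitespace around each field is not significant.
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4 or not all(fields):
        raise ValueError(
            f"expected 4 non-empty comma-separated fields: {line!r}"
        )
    return JapaneseEntry(*fields)


def parse_word_list(text: str) -> List[str]:
    """Read a Chinese or English dictionary: one word on each line."""
    # Assumption: blank lines are skipped rather than treated as entries.
    return [line.strip() for line in text.splitlines() if line.strip()]


# The example entry from the format table.
entry = parse_japanese_entry("成田空港,成田空港,ナリタクウコウ,カスタム名詞")
print(entry.reading)  # ナリタクウコウ
```

A line with too few or too many fields raises ValueError, which makes a malformed entry visible before the dictionary is passed to the function.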