Input Table Schema
Column | Data Type | Description |
---|---|---|
text_column | VARCHAR | Text to tokenize. |
accumulate_column | Any | [Column appears once for each specified accumulate_column.] Column to copy to output table. |
Dict Schema
This table is optional.
Column | Data Type | Description |
---|---|---|
entry | VARCHAR | Dictionary entry. |
Dictionary Table and User Dictionary File Format
This table describes the format of both the dictionary table (Dict) and the user dictionary file (specified by the UserDictionaryFile syntax element).
Language | Format |
---|---|
Chinese and English | One dictionary word on each line. |
Japanese | A dictionary entry consists of four comma-separated fields, in this order: word (the original word), tokenized_word (the tokenized form of the word), reading (the reading of word in Katakana), and pos (the part of speech of the word). For example: 成田空港,成田空港,ナリタクウコウ,カスタム名詞 |
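
The two formats in the table above can be checked before a dictionary is loaded. The following is a minimal sketch in Python, not part of the function itself. The helper names (`parse_simple_entries`, `parse_japanese_entry`, `JapaneseEntry`) are hypothetical, and the decisions to skip blank lines and to reject entries with empty fields are assumptions rather than documented behavior.

```python
from typing import List, NamedTuple


class JapaneseEntry(NamedTuple):
    """One Japanese dictionary entry, in the field order given in the table."""
    word: str            # the original word
    tokenized_word: str  # the tokenized form of the word
    reading: str         # the reading of word in Katakana
    pos: str             # the part of speech of the word


def parse_simple_entries(text: str) -> List[str]:
    """Chinese and English format: one dictionary word on each line.

    Assumption: blank lines are ignored and surrounding whitespace is trimmed.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_japanese_entry(line: str) -> JapaneseEntry:
    """Japanese format: word,tokenized_word,reading,pos.

    Assumption: exactly four non-empty fields are required.
    """
    fields = [field.strip() for field in line.rstrip("\r\n").split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}")
    if any(not field for field in fields):
        raise ValueError("dictionary entry contains an empty field")
    return JapaneseEntry(*fields)


# The example entry from the table above.
entry = parse_japanese_entry("成田空港,成田空港,ナリタクウコウ,カスタム名詞")
print(entry.word, entry.reading, entry.pos)
```

A check of this kind catches entries with a missing or extra comma before they reach the dictionary table or the user dictionary file.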