The function has these input tables:
- Input table
- Dictionary table [Optional]

Input table schema:

Column Name | Data Type | Description |
---|---|---|
text_column | VARCHAR | Text to tokenize. |
accumulate_column | Any | Column to copy to the output table. |

Dictionary table schema:

Column Name | Data Type | Description |
---|---|---|
entry | VARCHAR | Dictionary entry. |

The following table describes, for each language, the entry format of both the dictionary table (dict) and the user dictionary file (specified by the UserDictionaryFile argument).

Language | Format |
---|---|
Chinese and English | One dictionary word on each line. |
Japanese | A dictionary entry consists of four comma-separated fields: word (the original word), tokenized_word (the tokenized form of the word), reading (the reading of the word in Katakana), and pos (the part of speech of the word). For example: 成田空港,成田空港,ナリタクウコウ,カスタム名詞 |
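
To make the two entry formats concrete, here is a minimal Python sketch that parses dictionary lines the way the table describes them. It is not part of the function; the names parse_japanese_entry, parse_word_list, and JapaneseEntry are hypothetical. Stripping surrounding whitespace and skipping blank lines are also assumptions, not documented behavior of the function.

```python
from typing import List, NamedTuple


class JapaneseEntry(NamedTuple):
    """One Japanese dictionary entry (hypothetical helper type)."""
    word: str            # the original word
    tokenized_word: str  # the tokenized form of the word
    reading: str         # the reading of the word in Katakana
    pos: str             # the part of speech of the word


def parse_japanese_entry(line: str) -> JapaneseEntry:
    """Split one Japanese dictionary line into its four comma-separated fields."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}: {line!r}")
    return JapaneseEntry(*fields)


def parse_word_list(text: str) -> List[str]:
    """Chinese and English format: one dictionary word per line.

    Assumption: surrounding whitespace is stripped and blank lines are skipped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    entry = parse_japanese_entry("成田空港,成田空港,ナリタクウコウ,カスタム名詞")
    print(entry.reading)                        # ナリタクウコウ
    print(parse_word_list("hello\n\nworld\n"))  # ['hello', 'world']
```

Checking the field count up front rejects malformed Japanese lines early rather than silently misassigning fields.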