The input text table, dictionary table, and regex rules table can be specified in any order.
- The input text table must be aliased as InputTable.
- The rules table must be aliased as Rules.
- The dictionary table must be aliased as Dict.
- The Rules and Dict tables should be specified as DIMENSION. If aliases are not specified, an error is returned.
- If incorrect aliases are passed, an error is returned.
Both the UNICODE and LATIN character sets are allowed; however, all input tables must share the same character set. For example, if TextColumn is UNICODE, the contents of the Dict and Rules tables must also be UNICODE.
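
The alias, DIMENSION, and character-set rules above can be checked on the client side before a query is submitted. The following Python sketch is only an illustration: `check_inputs` and the dictionary layout it takes are hypothetical and not part of the function. It compares alias names exactly as written and does not assume that all three tables are required, because neither point is stated here.

```python
# Hypothetical pre-flight check; names and data layout are assumptions.
ALLOWED_ALIASES = {"InputTable", "Rules", "Dict"}
DIMENSION_ALIASES = {"Rules", "Dict"}


def check_inputs(tables):
    """Return a list of problems found in the described inputs.

    tables maps an alias to a dict with two keys:
      "dimension": True if the table is passed as DIMENSION
      "charset":   "UNICODE" or "LATIN"
    """
    problems = []
    for alias, info in tables.items():
        if alias not in ALLOWED_ALIASES:
            problems.append(f"unexpected alias: {alias}")
        if alias in DIMENSION_ALIASES and not info["dimension"]:
            problems.append(f"{alias} is expected to be DIMENSION")
    # All input tables must share one character set.
    charsets = {info["charset"] for info in tables.values()}
    if len(charsets) > 1:
        problems.append("character sets differ: " + ", ".join(sorted(charsets)))
    return problems


good = {
    "InputTable": {"dimension": False, "charset": "UNICODE"},
    "Rules": {"dimension": True, "charset": "UNICODE"},
    "Dict": {"dimension": True, "charset": "UNICODE"},
}
bad = {
    "InputTable": {"dimension": False, "charset": "UNICODE"},
    "Rules": {"dimension": False, "charset": "LATIN"},
}
good_problems = check_inputs(good)
bad_problems = check_inputs(bad)
```

Here `good_problems` is empty, while `bad_problems` reports both the missing DIMENSION on Rules and the mixed character sets.
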
InputTable Schema
Column Name | Data Type | Description |
---|---|---|
text_column | VARCHAR | Column that contains the input text. |
accumulate_column | ANY | Column to copy to the output table. CLOB and BLOB columns cannot be used as Accumulate columns. |
Rules Table Schema
Column names for the Rules table must be type_ner and regex; otherwise, an error will be thrown.
Column Name | Data Type | Description |
---|---|---|
type_ner | VARCHAR | Name of the entity. |
regex | VARCHAR | Regular expression pattern of the entity. No escape characters are needed for some special characters. For example, to find the '$' character, a valid regular expression would be '\$', and not '\\$'.
The following characters need to be escaped with one backslash for a literal match:
|
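
The difference between '\$' and '\\$' can be seen in a short experiment. The sketch below uses Python's `re` module purely as a stand-in; it is an assumption that the function's regular expression engine treats these two patterns the same way, and the sample text is made up.

```python
import re

# Pattern as it would be stored in the regex column: ONE backslash before '$'.
pattern = r"\$"        # backslash, dollar: matches a literal '$'
# Over-escaped form: backslash, backslash, dollar.
over_escaped = r"\\$"  # matches a literal backslash at the end of the text

text = "Total due: $45"  # hypothetical input text

single = re.search(pattern, text)
double = re.search(over_escaped, text)

print(single.group() if single else None)
print(double.group() if double else None)
```

With a single backslash the dollar sign is found. The doubled form instead looks for a backslash character followed by the end of the text, so it finds nothing in this sample.
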
Dict Table Schema
Column names for the Dict table must be type_ner and dict; otherwise, an error will be thrown.