- TextColumn
- Specifies the name of the input table column that contains the text to analyze.
- Model
- [Optional] Specifies the model items to load. This argument is required if you do not specify configuration_table, and in that case you cannot specify 'all'.
Each model item has the form 'entity_type:model_type:model_file' or, for a regular expression, 'entity_type:model_type:regular_expression'.
If you specify both configuration_table and this argument, the function loads the specified model items from configuration_table.
Default: 'all' (if you specify configuration_table but omit this argument).
The entity_type is the name of an entity type (for example, PERSON, LOCATION, or EMAIL), which appears in the output table.
The model_type is one of these model types:
  - 'max entropy': Maximum entropy language model generated by training.
  - 'rule': Rule-based model, a plain text file with one regular expression on each line.
  - 'dictionary': Dictionary-based model, a plain text file with one word on each line.
  - 'reg exp': Regular expression that describes entity_type.
If model_type is 'reg exp', specify regular_expression (a regular expression that describes entity_type); otherwise, specify model_file (the name of the model file). Before calling the function, add the location of every specified model_file to the user/session default search path.
If you specify configuration_table, you can use entity_type as a shortcut. For example, if configuration_table has the row 'organization, max entropy, en-ner-organization.bin', you can specify Model('organization') as a shortcut for Model('organization:max entropy:en-ner-organization.bin').
For model_type 'max entropy', if you specify configuration_table and omit this argument, the JVM of the worker node needs more than 2 GB of memory.
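
The Model item rules above can be illustrated with a small Python sketch. This is not the function's actual implementation. The names `ModelItem` and `resolve_model_item` are hypothetical, and the configuration table is assumed to be available as a list of (entity_type, model_type, model_file) rows.

```python
from typing import List, NamedTuple, Optional, Tuple

# Model types listed in the Model argument description.
MODEL_TYPES = {"max entropy", "rule", "dictionary", "reg exp"}


class ModelItem(NamedTuple):
    """Hypothetical container for one resolved model item."""
    entity_type: str
    model_type: str
    source: str  # model_file, or regular_expression when model_type is 'reg exp'


def resolve_model_item(
    spec: str,
    config_rows: Optional[List[Tuple[str, str, str]]] = None,
) -> ModelItem:
    """Resolve a full 'entity_type:model_type:source' item or an entity_type shortcut."""
    # Split on the first two colons only, so a regular expression may contain ':'.
    parts = spec.split(":", 2)
    if len(parts) == 3:
        item = ModelItem(parts[0].strip(), parts[1].strip(), parts[2])
    elif len(parts) == 1:
        # Shortcut form: only valid when a configuration table is supplied.
        if not config_rows:
            raise ValueError("shortcut %r requires a configuration table" % spec)
        matches = [row for row in config_rows if row[0] == spec.strip()]
        if not matches:
            raise ValueError("entity type %r not found in configuration table" % spec)
        item = ModelItem(*matches[0])
    else:
        raise ValueError("malformed model item: %r" % spec)
    if item.model_type not in MODEL_TYPES:
        raise ValueError("unknown model_type: %r" % item.model_type)
    return item
```

With a configuration row ('organization', 'max entropy', 'en-ner-organization.bin'), the shortcut 'organization' and the full item 'organization:max entropy:en-ner-organization.bin' resolve to the same value in this sketch.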
- ShowEntityContext
- [Optional] Specifies the number of context words to output. If context_words is n (which must be a positive integer), the function outputs the n words that precede the entity, the entity, and the n words that follow the entity. Default: 0.
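
As a rough illustration of the context window described for ShowEntityContext, the following Python sketch returns each occurrence of an entity together with up to n words on either side. It assumes simple whitespace tokenization, which may differ from how the function itself splits text, and `entity_with_context` is a hypothetical helper name.

```python
from typing import List


def entity_with_context(text: str, entity: str, n: int = 0) -> List[str]:
    """Return each occurrence of entity with up to n words before and after it."""
    words = text.split()      # assumption: words are separated by whitespace
    target = entity.split()
    if not target:
        return []
    results = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            # Clamp the window at the start and end of the text.
            start = max(0, i - n)
            end = min(len(words), i + len(target) + n)
            results.append(" ".join(words[start:end]))
    return results
```

With n left at 0, the sketch returns only the entity itself, which matches the documented default.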
- EntityColumn
- [Optional] Specifies the name of the output table column that contains the entity names. Default: 'entity'.
- Accumulate
- [Optional] Specifies the names of the input columns to copy to the output table. An accumulate_column cannot be the entity_column. Default: All input columns.