Argument | Category | Description |
---|---|---|
TextColumn | Required | Specifies the name of the input table column that contains the text to analyze. |
Model | Optional | Specifies the model items to load. Each item has the form 'entity_type:model_type:regular_expression' or 'entity_type:model_type:model_file'. This argument is optional if you specify configuration_table. Otherwise it is required, and you cannot specify 'all'. If you specify both configuration_table and this argument, the function loads the specified model items from configuration_table. If you specify configuration_table but omit this argument, the default value is 'all' (every model item in configuration_table). The entity_type is the name of an entity type (for example, PERSON, LOCATION, or EMAIL), which appears in the output table. The model_type is the type of model, such as 'max entropy' or 'reg exp'. If model_type is 'reg exp', specify regular_expression (a regular expression that describes entity_type). Otherwise, specify model_file (the name of the model file). Before calling the function, add the location of every specified model_file to the user/session default search path. If you specify configuration_table, you can use entity_type as a shortcut. For example, if configuration_table has the row 'organization, max entropy, en-ner-organization.bin', you can specify Model('organization') as a shortcut for Model('organization:max entropy:en-ner-organization.bin'). For model_type 'max entropy', if you specify configuration_table and omit this argument, the JVM of the worker node needs more than 2GB of memory. |
ShowEntityContext | Optional | Specifies the number of context words to output with each entity. If context_words is n (a positive integer), the function outputs the n words that precede the entity, the entity itself, and the n words that follow it. The default value is 0, which outputs no context words. |
EntityColumn | Optional | Specifies the name of the output table column that contains the entity names. The default value is 'entity'. |
Accumulate | Optional | Specifies the names of input columns to copy to the output table. No accumulate_column can have the same name as entity_column. By default, the function copies all input columns to the output table. |
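The rules for the Model argument combine in several ways: full specifications, entity_type shortcuts, the implicit 'all' default, and the requirement that 'all' comes with configuration_table. The following Python sketch models how those rules might resolve into a list of model items. It is illustrative only. The names `ModelItem` and `resolve_models` are hypothetical. The sketch assumes that configuration_table rows are (entity_type, model_type, model_file-or-regex) triples, as in the 'organization, max entropy, en-ner-organization.bin' example. It also assumes that only the first two colons separate fields, so a regular expression may itself contain colons. None of this is the function's actual implementation.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class ModelItem(NamedTuple):
    """One model item (hypothetical representation)."""
    entity_type: str
    model_type: str
    source: str  # regular_expression if model_type is 'reg exp', else model_file


def resolve_models(
    specs: Optional[Sequence[str]] = None,
    config_rows: Optional[Sequence[Tuple[str, str, str]]] = None,
) -> List[ModelItem]:
    """Resolve Model() values against an optional configuration table.

    Assumption: a spec without a colon is an entity_type shortcut that is
    looked up in the configuration table.
    """
    config = [ModelItem(*row) for row in (config_rows or [])]

    if not specs:
        # Model is optional only when a configuration table is supplied;
        # in that case it defaults to 'all'.
        if config_rows is None:
            raise ValueError("Model is required when configuration_table is omitted")
        specs = ["all"]

    items: List[ModelItem] = []
    for spec in specs:
        if spec == "all":
            if config_rows is None:
                raise ValueError("'all' requires configuration_table")
            items.extend(config)
        elif ":" not in spec:
            # Shortcut: an entity_type alone selects its configuration rows.
            if config_rows is None:
                raise ValueError("entity_type shortcuts require configuration_table")
            matches = [m for m in config if m.entity_type == spec]
            if not matches:
                raise ValueError(f"no configuration row for entity type {spec!r}")
            items.extend(matches)
        else:
            # Split on the first two colons only, so a regex may contain ':'.
            parts = spec.split(":", 2)
            if len(parts) != 3:
                raise ValueError(f"malformed model spec {spec!r}")
            items.append(ModelItem(*parts))
    return items
```

For example, `resolve_models(["organization"], [("organization", "max entropy", "en-ner-organization.bin")])` expands the shortcut into the full 'organization:max entropy:en-ner-organization.bin' item. Calling `resolve_models()` with neither argument raises an error, because Model is required when there is no configuration table.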
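ShowEntityContext outputs a window of n words before an entity, the entity itself, and n words after it. The following Python sketch shows that windowing over a list of words. It is a hypothetical illustration, not the function's code. The name `entity_context` is invented, and the sketch assumes simple whitespace tokenization and an entity given as a half-open word span [start, end). The real tokenizer may differ.

```python
from typing import List, Tuple


def entity_context(
    words: List[str], start: int, end: int, n: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Return (preceding words, entity words, following words).

    n = 0 (the documented default) yields no context words. The window is
    clipped at the start and end of the text.
    """
    if n < 0:
        raise ValueError("context_words must be non-negative")
    before = words[max(0, start - n):start]
    entity = words[start:end]
    after = words[end:end + n]
    return before, entity, after


words = "Alice met Bob at the Paris office on Monday".split()
# 'Paris' is word 5; with n = 2 the window is ['at', 'the'] + ['Paris'] + ['office', 'on'].
print(entity_context(words, 5, 6, 2))
```

Near the edges of the text, the window simply contains fewer than n words on the clipped side.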