Teradata Package for Python Function Reference | 17.10 - NamedEntityFinder

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

teradataml.analytics.mle.NamedEntityFinder = class NamedEntityFinder(builtins.object)

Methods defined here:

__init__(self, newdata=None, configure_table_data=None, text_column=None, model=None, show_entity_context=0, entity_column='entity', accumulate=None, newdata_sequence_column=None, configure_table_data_sequence_column=None, newdata_order_column=None, configure_table_data_order_column=None)
    DESCRIPTION:
        The NamedEntityFinder function evaluates the input text, identifies tokens based on the specified model, and outputs the tokens with detailed information. The function does not identify sentences; it only tokenizes. Token identification is not case-sensitive.

    PARAMETERS:
        newdata:
            Required Argument.
            Specifies the input teradataml DataFrame containing the column with the text in which to find named entities.
            Types: teradataml DataFrame

        newdata_order_column:
            Optional Argument.
            Specifies Order By columns for newdata.
            If multiple columns are used for ordering, provide them as a list.
            Types: str OR list of Strings (str)

        configure_table_data:
            Optional Argument.
            Specifies the teradataml DataFrame containing the configuration data.
            Types: teradataml DataFrame

        configure_table_data_order_column:
            Optional Argument.
            Specifies Order By columns for configure_table_data.
            If multiple columns are used for ordering, provide them as a list.
            Types: str OR list of Strings (str)

        text_column:
            Required Argument.
            Specifies the name of the input teradataml DataFrame column that contains the text to analyze.
            Types: str

        model:
            Optional Argument.
            Specifies the model items to load. This argument is optional if you specify configure_table_data; otherwise it is required, and you cannot specify "all".
            Each model item has the form "entity_type:model_type:model_file" or, for model_type "reg exp", "entity_type:model_type:regular_expression".
            If you specify both configure_table_data and this argument, the function loads the specified model items from configure_table_data. If you specify configure_table_data but omit this argument, the argument defaults to "all" (every model item in configure_table_data).
            The entity_type is the name of an entity type (for example, PERSON, LOCATION, or EMAIL), which appears in the output teradataml DataFrame.
            The model_type is one of these model types:
                • max entropy: Maximum entropy language model generated by training.
                • rule: Rule-based model, a plain text file with one regular expression on each line.
                • dictionary: Dictionary-based model, a plain text file with one word on each line.
                • reg exp: Regular expression that describes entity_type.
            If model_type is "reg exp", specify regular_expression (a regular expression that describes entity_type); otherwise, specify model_file (the name of the model file).
            If you specify configure_table_data, you can use entity_type as a shortcut. For example, if configure_table_data has the row "organization, max entropy, en-ner-organization.bin", you can specify model = "organization" as a shortcut for model = "organization:max entropy:en-ner-organization.bin".
            Note: For model_type "max entropy", if you specify configure_table_data and omit this argument, the Java virtual machine (JVM) of the worker node needs more than 2GB of memory.
            Types: str

        show_entity_context:
            Optional Argument.
            Specifies the number of context words to output. If the number of context words is n (which must be a positive integer), the function outputs the n words that precede the entity, the entity itself, and the n words that follow the entity.
            Default Value: 0
            Types: int

        entity_column:
            Optional Argument.
            Specifies the name of the output teradataml DataFrame column that contains the entity names.
            Default Value: "entity"
            Types: str

        accumulate:
            Optional Argument.
            Specifies the names of input teradataml DataFrame columns to copy to the output teradataml DataFrame. No column specified in accumulate can have the same name as entity_column. By default, the function copies all input teradataml DataFrame columns to the output teradataml DataFrame.
            Types: str OR list of Strings (str)

        newdata_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of the input argument "newdata".
            The argument is used to ensure deterministic results for functions that produce results that vary from run to run.
            Types: str OR list of Strings (str)

        configure_table_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identify each row of the input argument "configure_table_data".
            The argument is used to ensure deterministic results for functions that produce results that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of NamedEntityFinder.
        Output teradataml DataFrames can be accessed using attribute references, such as NamedEntityFinderObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # These examples assume that a connection to Vantage has already been created, for example with create_context().
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import NamedEntityFinder, NamedEntityFinderTrainer

        # Load example data.
        load_example_data("namedentityfinder", ['assortedtext_input', 'name_Find_configure'])

        # The example tables are 'assortedtext_input' and 'name_Find_configure'.
        # 'assortedtext_input' contains the text column 'content', which is analyzed to find
        # named entities. 'name_Find_configure' is the configuration table, which contains
        # the columns 'model_name', 'model_type' and 'model_file'.

        # Create teradataml DataFrame objects.
        nameFind_configure = DataFrame.from_table("name_Find_configure")
        assortedtext_input = DataFrame.from_table("assortedtext_input")

        # Example 1: Find entities using a configuration table containing model items.
        NamedEntityFinder_out = NamedEntityFinder(
            newdata = assortedtext_input,
            configure_table_data = nameFind_configure,
            text_column = 'content',
            accumulate = ['id', 'source'],
            entity_column = 'entity',
            model = 'all',
            show_entity_context = 0,
            newdata_sequence_column = 'id',
            configure_table_data_sequence_column = 'model_file'
        )

        # Print the results.
        print(NamedEntityFinder_out.result)

        # Example 2: Use a custom trained model to find the entities.
        # Load example data.
        load_example_data('namedentityfindertrainer', 'nermem_sports_train')

        # Create a teradataml DataFrame object.
        nermem_sports_train = DataFrame.from_table('nermem_sports_train')

        # Train a NamedEntityFinder model on entity type "LOCATION".
        # The trained model is stored in 'location.sports'.
        NamedEntityFinderTrainer_out = NamedEntityFinderTrainer(
            data = nermem_sports_train,
            text_column = 'content',
            entity_type = 'LOCATION',
            model = 'location.sports'
        )

        # Select a subset of the training data to use as "newdata" in NamedEntityFinder.
        nermem_sports_test = nermem_sports_train[nermem_sports_train.id < 20]

        # Find entities using the custom trained model.
        NamedEntityFinder_out1 = NamedEntityFinder(
            newdata = nermem_sports_test,
            text_column = 'content',
            model = "LOCATION:max entropy:location.sports"
        )

        # Print the results.
        print(NamedEntityFinder_out1.result)
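The model strings used in the examples, such as "LOCATION:max entropy:location.sports" in Example 2, pack an entity type, a model type, and either a model file name or a regular expression into one colon-separated value. The following minimal sketch assembles such a string in plain Python so that the field order and the model_file versus regular_expression choice are explicit. The helper name model_item and the MODEL_TYPES set are hypothetical illustrations, not part of teradataml, and the sketch does not check whether a model file actually exists on Vantage.

```python
from typing import Optional

# Hypothetical helper for illustration only; it is not part of teradataml.
# The four model types listed under the model argument.
MODEL_TYPES = {"max entropy", "rule", "dictionary", "reg exp"}


def model_item(entity_type: str, model_type: str,
               model_file: Optional[str] = None,
               regular_expression: Optional[str] = None) -> str:
    """Return a model item string of the form entity_type:model_type:<file or regex>."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    if model_type == "reg exp":
        # A "reg exp" item carries a regular expression instead of a file name.
        if regular_expression is None or model_file is not None:
            raise ValueError('model_type "reg exp" needs regular_expression only')
        last_field = regular_expression
    else:
        # Every other model type names a model file.
        if model_file is None or regular_expression is not None:
            raise ValueError(f"model_type {model_type!r} needs model_file only")
        last_field = model_file
    return f"{entity_type}:{model_type}:{last_field}"


# Rebuild the model string passed in Example 2.
print(model_item("LOCATION", "max entropy", model_file="location.sports"))
# prints LOCATION:max entropy:location.sports
```

The returned string would then be passed as model = ... to NamedEntityFinder. This reference does not say how regular expressions that themselves contain a colon are handled, so the sketch performs no escaping.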
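The show_entity_context argument asks for n words on each side of a found entity. As a rough illustration of that window only, the sketch below slices a whitespace-tokenized sentence. The function name entity_context, the whitespace tokenization, and the truncation of the window at the start or end of the text are assumptions made for this sketch; they are not documented behavior of NamedEntityFinder.

```python
from typing import List, Tuple


def entity_context(tokens: List[str], start: int, end: int,
                   n: int) -> Tuple[List[str], List[str], List[str]]:
    """Split tokens into (n preceding words, entity words, n following words).

    The entity occupies tokens[start:end]. Hypothetical illustration only.
    """
    if n < 0:
        raise ValueError("n must not be negative")
    # Fewer than n words come back when the entity is near either end of the text.
    preceding = tokens[max(0, start - n):start]
    entity = tokens[start:end]
    following = tokens[end:end + n]
    return preceding, entity, following


tokens = "the match was played in New York last night".split()
print(entity_context(tokens, 5, 7, 2))
# prints (['played', 'in'], ['New', 'York'], ['last', 'night'])
```

With n = 0 the sketch returns only the entity words, which mirrors the default value of show_entity_context.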

__repr__(self): Returns the string representation for a NamedEntityFinder class instance.

get_build_time(self): Returns the build time of the algorithm, in seconds. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.