Teradata Package for Python Function Reference - TextTagger - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2021-11-19

Product Category

Teradata Vantage

teradataml.analytics.mle.TextTagger = class TextTagger(builtins.object)

Methods defined here:

__init__(self, data=None, rules_data=None, language='en', rules=None, tokenize=False, outputby_tag=False, tag_delimiter=',', accumulate=None, data_sequence_column=None, rules_data_sequence_column=None, data_order_column=None, rules_data_order_column=None)

    DESCRIPTION:
        The TextTagger function tags text documents according to
        user-defined rules that use text-processing and logical operators.

    PARAMETERS:
        data:
            Required Argument.
            The input teradataml DataFrame that contains the texts.

        data_order_column:
            Optional Argument.
            Specifies Order By columns for data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        rules_data:
            Optional Argument.
            The input teradataml DataFrame that contains the rules.

        rules_data_order_column:
            Optional Argument.
            Specifies Order By columns for rules_data.
            Values to this argument can be provided as a list, if multiple
            columns are used for ordering.
            Types: str OR list of Strings (str)

        language:
            Optional Argument.
            Specifies the language of the input text:
                "en": English (default)
                "zh_CN": Simplified Chinese
                "zh_TW": Traditional Chinese
            If tokenize is True, the function uses the value of language
            to create the word tokenizer.
            Default Value: "en"
            Permitted Values: en, zh_CN, zh_TW
            Types: str

        rules:
            Optional Argument.
            Specifies the tag names and tagging rules. Use this argument if
            and only if you do not specify rules_data. For information about
            defining tagging rules, refer to "Defining Tagging Rules" in the
            function documentation.
            Types: str OR list of Strings (str)

        tokenize:
            Optional Argument.
            Specifies whether the function tokenizes the input text before
            evaluating the rules, and whether it tokenizes the text string
            parameter in the rule definition when parsing a rule. If you
            specify True, then you must also specify the language argument.
            Default Value: False
            Types: bool

        outputby_tag:
            Optional Argument.
            Specifies whether the function outputs a tuple when a text
            document matches multiple tags. The default value, False, means
            that one tuple in the output stands for one document and the
            matched tags are listed in the output column tag.
            Default Value: False
            Types: bool

        tag_delimiter:
            Optional Argument.
            Specifies the delimiter that separates multiple tags in the
            output column tag if outputby_tag has the value False (the
            default). If outputby_tag has the value True, specifying this
            argument causes an error.
            Default Value: ","
            Types: str

        accumulate:
            Optional Argument.
            Specifies the names of the columns of the text teradataml
            DataFrame to copy to the output table.
            Note: Do not use the name "tag" for an accumulate column,
                  because the function uses that name for the output
                  teradataml DataFrame column that contains the tags.
            Types: str OR list of Strings (str)

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

        rules_data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "rules_data". The argument is used to
            ensure deterministic results for functions which produce results
            that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of TextTagger.
        Output teradataml DataFrames can be accessed using attribute
        references, such as TextTaggerObj.<attribute_name>.
        Output teradataml DataFrame attribute name is:
            result

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Load the data to run the example.
        load_example_data("TextTagger", ["text_inputs", "rule_inputs"])

        # Create teradataml DataFrame objects.
        text_inputs = DataFrame("text_inputs")
        rule_inputs = DataFrame("rule_inputs")

        # Example 1:
        # Passing rules through the 'rules_data' argument as a teradataml DataFrame.
        result = TextTagger(data=text_inputs,
                            rules_data=rule_inputs,
                            accumulate='id',
                            language='en',
                            tokenize=False,
                            outputby_tag=False,
                            tag_delimiter=',',
                            data_sequence_column='id',
                            rules_data_sequence_column='tagname')

        # Print the result
        print(result.result)

        # Example 2:
        # Passing rules through the 'rules' argument as a list of strings.
        result = TextTagger(data=text_inputs,
                            accumulate='id',
                            rules=[
                                'contain(content,"floods",1,) or contain(content,"tsunamis",1,) AS Natural-Disaster',
                                'contain(content,"Roger",1,) and contain(content,"Nadal",1,) AS Tennis-Rivalry',
                                'contain(titles,"Tennis",1,) and contain(content,"Roger",1,) AS Tennis-Greats',
                                'contain(content,"India",1,) and contain(content,"Pakistan",1,) AS Cricket-Rivalry',
                                'contain(content,"Australia",1,) and contain(content,"England",1,) AS The-Ashes'],
                            language='en',
                            tokenize=False,
                            outputby_tag=False,
                            tag_delimiter=',',
                            data_sequence_column='id')

        # Print the result
        print(result.result)

        # Example 3 - Specify dictionary file in rules argument
        result = TextTagger(data=text_inputs,
                            rules=[
                                'dict(content, "keywords.txt", 1,) and equal(titles, "Chennai Floods") AS Natural-Disaster',
                                'dict(content, "keywords.txt", 2,) and equal(catalog, "sports") AS Great-Sports-Rivalry '],
                            accumulate=["id"])

        # Print the result
        print(result.result)

        # Example 4 - Specify superdist in rules argument
        result = TextTagger(data=text_inputs,
                            rules=[
                                'superdist(content,"Chennai","floods",sent,,) AS Chennai-Flood-Disaster',
                                'superdist(content,"Roger","titles",para, "Nadal",para) AS Roger-Champion',
                                'superdist(content,"Roger","Nadal",para,,) AS Tennis-Rivalry',
                                'contain(content,regex"[A|a]shes",2,) AS Aus-Eng-Cricket',
                                'superdist(content,"Australia","won",nw5,,) AS Aus-victory'],
                            accumulate=["id"])

        # Print the result
        print(result.result)
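The argument constraints described in PARAMETERS can be checked on the client before calling TextTagger. The following is a minimal sketch in plain Python, not part of teradataml: the helper names contain_rule and check_texttagger_args are hypothetical, the rule string simply mirrors the shape of the Example 2 rules, and the third contain() argument is passed through unchanged (its meaning is defined in "Defining Tagging Rules"). The sketch uses None for tag_delimiter to mean "not specified", which is an assumption made only so the conflict with outputby_tag can be detected.

```python
from typing import List, Optional, Sequence, Union

# Permitted values as listed for the 'language' argument.
PERMITTED_LANGUAGES = ("en", "zh_CN", "zh_TW")


def contain_rule(tag: str, column: str, keywords: Sequence[str],
                 operator: str = "and", count_arg: int = 1) -> str:
    """Build a rule string shaped like the rules in Example 2 (hypothetical helper)."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    # One contain(...) clause per keyword, e.g. contain(content,"floods",1,)
    clauses = [f'contain({column},"{kw}",{count_arg},)' for kw in keywords]
    return f" {operator} ".join(clauses) + f" AS {tag}"


def check_texttagger_args(rules=None, rules_data=None, language: str = "en",
                          outputby_tag: bool = False,
                          tag_delimiter: Optional[str] = None,
                          accumulate: Union[str, Sequence[str], None] = None
                          ) -> List[str]:
    """Return a list of conflicts with the documented constraints (empty if none)."""
    problems = []
    # 'rules' is used if and only if no rules table is given.
    if (rules is None) == (rules_data is None):
        problems.append("specify exactly one of rules or rules_data")
    if language not in PERMITTED_LANGUAGES:
        problems.append(f"language must be one of {PERMITTED_LANGUAGES}")
    # Specifying tag_delimiter together with outputby_tag=True is documented as an error.
    if outputby_tag and tag_delimiter is not None:
        problems.append("tag_delimiter cannot be specified when outputby_tag is True")
    # The output column that holds the tags is named 'tag'.
    names = [accumulate] if isinstance(accumulate, str) else list(accumulate or [])
    if any(name.lower() == "tag" for name in names):
        problems.append('accumulate must not contain a column named "tag"')
    return problems


rule = contain_rule("Natural-Disaster", "content", ["floods", "tsunamis"], "or")
print(rule)
print(check_texttagger_args(rules=[rule], accumulate="id"))
```

A rule built this way can be passed in the list given to the rules argument, and an empty list from check_texttagger_args suggests the arguments do not conflict with the constraints stated above. The function itself remains the authority and may still raise TeradataMlException.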

__repr__(self): Returns the string representation for a TextTagger class instance.

get_build_time(self): Returns the build time of the algorithm in seconds. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.
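The accessor methods above can be collected into a single summary. The sketch below is an illustration only: StubTagger is a hypothetical stand-in that exposes the same method names so the code runs without a Vantage connection, and summarize_model is a hypothetical helper. It handles the documented case where show_query() returns None for an object created with retrieve_model().

```python
class StubTagger:
    """Hypothetical stand-in with the same accessor names as TextTagger."""

    def __init__(self, query, build_time, prediction_type=None, target_column=None):
        self._query = query
        self._build_time = build_time
        self._prediction_type = prediction_type
        self._target_column = target_column

    def get_build_time(self):
        return self._build_time

    def get_prediction_type(self):
        return self._prediction_type

    def get_target_column(self):
        return self._target_column

    def show_query(self):
        return self._query


def summarize_model(model) -> dict:
    """Gather accessor values from any object exposing the methods listed above."""
    query = model.show_query()
    return {
        "build_time": model.get_build_time(),
        "prediction_type": model.get_prediction_type(),
        "target_column": model.get_target_column(),
        # show_query() is documented to return None for retrieved models.
        "query": query if query is not None else "<no query available>",
    }


print(summarize_model(StubTagger(query=None, build_time=1.5)))
```

With a real TextTagger instance in place of the stub, the same helper would call the methods documented above. The values they return depend on the session and on whether the object came from retrieve_model().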