Methods defined here:
- __init__(self, data=None, text_column=None, entity_type=None, model=None, iter_num=100, cutoff=5, data_sequence_column=None)
- DESCRIPTION:
The NamedEntityFinderTrainer function takes training data and outputs
a maximum entropy model. The function is based on OpenNLP and
follows its annotation conventions. For more information on OpenNLP, see
https://opennlp.apache.org/docs/1.8.4/manual/opennlp.html.
The trainer supports only English.
PARAMETERS:
data:
Required Argument.
Specifies the input teradataml DataFrame containing the text column
to train on.
text_column:
Required Argument.
Specifies the name of the input teradataml DataFrame column that
contains the text to analyze.
Types: str
entity_type:
Required Argument.
Specifies the entity type to be trained (for example, PERSON). The
input training documents must contain the same tag.
Types: str
model:
Required Argument.
Specifies the name of the data model file to be generated.
Types: str
iter_num:
Optional Argument.
Specifies the number of training iterations (an OpenNLP training
parameter).
Default Value: 100
Types: int
cutoff:
Optional Argument.
Specifies the cutoff value for training (an OpenNLP training
parameter).
Default Value: 5
Types: int
data_sequence_column:
Optional Argument.
Specifies the column or list of columns that uniquely identify each row of
the input argument "data". The argument is used to ensure
deterministic results for functions whose results would otherwise vary
from run to run.
Types: str OR list of Strings (str)
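The effect of the "cutoff" argument can be illustrated with a small, self-contained sketch. It assumes the usual OpenNLP meaning of the parameter, namely that a feature is kept only if it occurs at least "cutoff" times in the training events. The function name apply_cutoff and the feature strings are hypothetical and are not part of teradataml or OpenNLP.

```python
from collections import Counter


def apply_cutoff(feature_events, cutoff=5):
    """Return the set of features seen at least `cutoff` times.

    Assumption: this mirrors how an OpenNLP-style trainer discards
    rare features before fitting; it is not the trainer's actual code.
    """
    counts = Counter(feature for event in feature_events for feature in event)
    return {feature for feature, n in counts.items() if n >= cutoff}


# Hypothetical training events: each event is a list of feature strings.
events = [["w=london"]] * 5 + [["w=rare"]] * 2
kept = apply_cutoff(events, cutoff=5)
print(sorted(kept))
```

With the default cutoff of 5, the feature seen only twice is dropped, which is why very small training sets may need a lower value.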
RETURNS:
Instance of NamedEntityFinderTrainer.
Output teradataml DataFrames can be accessed using attribute
references, such as NamedEntityFinderTrainerObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data('namedentityfindertrainer', 'nermem_sports_train')
# The provided example table is 'nermem_sports_train'. It contains two columns, 'id' and
# 'content'. The 'content' column contains the training text data.
# Create teradataml DataFrame objects.
nermem_sports_train = DataFrame.from_table('nermem_sports_train')
# Example 1: Train a NamedEntityFinder model on entity type: "LOCATION".
# The trained model is stored in a binary file: "location.sports"
NamedEntityFinderTrainer_out = NamedEntityFinderTrainer(data=nermem_sports_train,
                                                        text_column='content',
                                                        entity_type='LOCATION',
                                                        model='location.sports',
                                                        cutoff=5,
                                                        data_sequence_column='id')
# Print the results
print(NamedEntityFinderTrainer_out.result)
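Because the training documents must contain the same tag as "entity_type", it can help to check the text before training. The following is a minimal sketch that assumes the OpenNLP name-finder markup of the form <START:TYPE> ... <END>. The helper find_entities and the sample sentence are hypothetical, and the check runs locally on plain strings rather than on a teradataml DataFrame.

```python
import re

# Assumption: training text uses OpenNLP name-finder markup,
# for example "<START:LOCATION> London <END>".
TAG_PATTERN = re.compile(r"<START:(?P<type>[^>\s]+)>\s+(?P<span>.+?)\s+<END>")


def find_entities(text, entity_type):
    """Return the annotated spans in `text` tagged exactly as `entity_type`."""
    return [
        match.group("span")
        for match in TAG_PATTERN.finditer(text)
        if match.group("type") == entity_type
    ]


# Hypothetical training sentence, not taken from 'nermem_sports_train'.
sample = (
    "The match in <START:LOCATION> London <END> drew fans "
    "from <START:LOCATION> New York <END> ."
)
print(find_entities(sample, "LOCATION"))
```

An empty result for the chosen entity type suggests that the text is not annotated with that tag.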
- __repr__(self)
- Returns the string representation for a NamedEntityFinderTrainer class instance.
- get_build_time(self)
- Returns the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- get_prediction_type(self)
- Returns the prediction type of the algorithm.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- get_target_column(self)
- Returns the target column of the algorithm.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- show_query(self)
- Returns the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.
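The accessor methods listed above can be collected into a single summary. This is a sketch only: summarize_model and _StubModel are hypothetical names, and the stub stands in for a trained NamedEntityFinderTrainer object so that the example runs without a Vantage connection. The stub's return values are placeholders, not real results.

```python
def summarize_model(model_obj):
    """Collect the metadata exposed by the accessor methods into a dict."""
    return {
        "build_time_seconds": model_obj.get_build_time(),
        "prediction_type": model_obj.get_prediction_type(),
        "target_column": model_obj.get_target_column(),
        # show_query() returns None for objects created via retrieve_model().
        "sql": model_obj.show_query(),
    }


class _StubModel:
    """Hypothetical stand-in exposing the same four accessor methods."""

    def get_build_time(self):
        return 12.5

    def get_prediction_type(self):
        return "placeholder"

    def get_target_column(self):
        return None

    def show_query(self):
        return None


summary = summarize_model(_StubModel())
print(summary["build_time_seconds"])
```

In practice the argument would be the object returned by NamedEntityFinderTrainer(...), for example NamedEntityFinderTrainer_out.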