Methods defined here:
- __init__(self, data=None, text_coloumn=None, extractor_jar=None, feature_template=None, model_file=None, language='en', max_iter_num=1000, eta=0.0001, min_occur_num=0, data_sequence_column=None)
- DESCRIPTION:
The NERTrainer function takes training data and outputs a conditional
random field (CRF) model (a binary file) that can be specified in the
functions NERExtractor and NEREvaluator.
PARAMETERS:
data:
Required Argument.
Specifies an input teradataml DataFrame containing training data.
Types: teradataml DataFrame
text_coloumn:
Required Argument.
Specifies the name of the input teradataml DataFrame column that
contains the text to analyze.
Types: str
extractor_jar:
Optional Argument.
Specifies the name of the JAR file that contains the Java classes
that extract features. The function includes the predefined extractor
classes described in the file specified by the argument feature_template.
Note:
1. The name of the JAR file is case-sensitive.
2. The ML Engine does not support the creation of new extractor classes.
However, it does support existing JAR files; for installation instructions,
see the Teradata Vantage User Guide.
Types: str
feature_template:
Required Argument.
Specifies the name of the file that defines how to generate
features when training the model. A template file is pre-installed
in ML Engine under the name "template_1.txt".
Types: str
model_file:
Required Argument.
Specifies the name of the model file that is generated and installed
in the ML Engine by the function.
Types: str
language:
Optional Argument.
Specifies the language of the input text:
* en - English
* zh_CN - Simplified Chinese
* zh_TW - Traditional Chinese
Default Value: "en"
Permitted Values: en, zh_CN, zh_TW
Types: str
max_iter_num:
Optional Argument.
Specifies the maximum number of iterations.
Default Value: 1000
Types: int
eta:
Optional Argument.
Specifies the tolerance of the termination criterion. Defines the
differences of the values of the loss function between two sequential
epochs. When training a model, the function performs n-times
iterations. At the end of each epoch, the function calculates the
loss or cost function on the training samples. If the loss function
value change is very small between two sequential epochs, the
function considers the training process to have converged.
The function defines eta as:
Eta=(f(n)-f(n-1))/f(n-1), where f(n) is the loss function value of the nth epoch.
Default Value: 1.0E-4
Types: float
min_occur_num:
Optional Argument.
Specifies the minimum number of times that a feature must occur in the
input text before the function uses the feature to construct the
model.
Default Value: 0
Types: int
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identify each row of
the input argument "data". The argument is used to ensure
deterministic results for functions whose results would otherwise
vary from run to run.
Types: str OR list of Strings (str)
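The eta termination criterion described above can be illustrated with a minimal Python sketch. The sketch assumes that the absolute value of the relative change is compared against eta and that a list of per-epoch loss values is already available. The helper names `has_converged` and `stopping_epoch` and the loss values are hypothetical; this is not the ML Engine implementation. It also shows max_iter_num acting as a hard cap on the number of epochs.

```python
from typing import Sequence


def has_converged(prev_loss: float, curr_loss: float, eta: float = 1.0e-4) -> bool:
    """Return True when the relative change in loss between two epochs is below eta.

    Follows eta = (f(n) - f(n-1)) / f(n-1); taking the absolute value is an
    assumption so that a shrinking loss is compared by magnitude.
    """
    if prev_loss == 0:
        # Avoid division by zero: treat an unchanged zero loss as converged.
        return curr_loss == 0
    relative_change = (curr_loss - prev_loss) / prev_loss
    return abs(relative_change) < eta


def stopping_epoch(losses: Sequence[float], eta: float = 1.0e-4, max_iter_num: int = 1000) -> int:
    """Return the epoch (1-based) at which training would stop.

    `losses` is a hypothetical precomputed list of per-epoch loss values f(1), f(2), ...
    Training stops at the first converged epoch, or after max_iter_num epochs.
    """
    prev_loss = None
    for epoch, loss in enumerate(losses[:max_iter_num], start=1):
        if prev_loss is not None and has_converged(prev_loss, loss, eta):
            return epoch
        prev_loss = loss
    # No convergence detected: training ends when the data or the iteration cap runs out.
    return min(len(losses), max_iter_num)


# Relative change from 40.0 to 39.999 is 2.5e-5, which is below the default eta of 1e-4.
print(stopping_epoch([100.0, 50.0, 40.0, 39.999, 39.998]))  # → 4
```

Raising eta makes training stop earlier, at the cost of a less fully optimized model. Lowering it lets training run longer, up to the max_iter_num limit.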
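The min_occur_num argument acts as a frequency cutoff on features. The following Python sketch assumes that each observed feature occurrence is a plain string. The `name=value` feature naming and the helper `select_features` are hypothetical. Because the argument is described as a minimum, a feature that occurs exactly min_occur_num times is kept. With the default of 0, every feature is kept.

```python
from collections import Counter
from typing import Iterable, Set


def select_features(feature_occurrences: Iterable[str], min_occur_num: int = 0) -> Set[str]:
    """Return the features that occur at least min_occur_num times.

    Each element of feature_occurrences is one observed occurrence of a
    feature string; the feature naming scheme used here is hypothetical.
    """
    counts = Counter(feature_occurrences)
    return {feature for feature, count in counts.items() if count >= min_occur_num}


occurrences = [
    "word=the", "word=Federer", "suffix=ing",
    "word=the", "suffix=ing", "word=the",
]
print(sorted(select_features(occurrences, min_occur_num=2)))  # → ['suffix=ing', 'word=the']
```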
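The Python sketch below illustrates why data_sequence_column needs columns that uniquely identify each row. If the rows are ordered by such a key before processing, the result no longer depends on the order in which the rows happened to arrive. It does not describe how ML Engine uses the argument internally. The column name `doc_id`, the sample rows, and the helper `order_rows` are hypothetical.

```python
import random
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def order_rows(rows: List[Row], sequence_columns: Sequence[str]) -> List[Row]:
    """Sort rows by the values in sequence_columns.

    When those columns uniquely identify each row, the result does not depend
    on the order in which the rows arrived.
    """
    return sorted(rows, key=lambda row: tuple(row[col] for col in sequence_columns))


# Hypothetical training rows; "doc_id" stands in for a unique key column.
rows = [
    {"doc_id": 2, "content": "Messi scored twice."},
    {"doc_id": 1, "content": "Federer won in Basel."},
    {"doc_id": 3, "content": "The Lakers lost at home."},
]

# Simulate two runs in which the rows arrive in different orders.
arrival_a = random.Random(1).sample(rows, k=len(rows))
arrival_b = random.Random(2).sample(rows, k=len(rows))

print(order_rows(arrival_a, ["doc_id"]) == order_rows(arrival_b, ["doc_id"]))  # → True
```

If the chosen columns were not unique, rows with equal keys could still appear in arrival order. This is why the argument asks for columns that uniquely identify each row.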
RETURNS:
Instance of NERTrainer.
Output teradataml DataFrames can be accessed using attribute
references, such as NERTrainerObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
result
RAISES:
TeradataMlException
EXAMPLES:
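# Import the required modules (assumes a connection was already created with create_context()).
from teradataml import DataFrame, load_example_data
from teradataml.analytics.mle import NERTrainer
# Load the data to run the example
load_example_data("nertrainer", "ner_sports_train")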
# Create teradataml DataFrame object.
ner_sports_train = DataFrame.from_table("ner_sports_train")
# Run the NERTrainer function to generate a trained model file, which is used in NERExtractor or NEREvaluator
nertrainer_train = NERTrainer(data=ner_sports_train,
text_coloumn='content',
model_file='ner_model.bin',
feature_template='template_1.txt',
language='en',
eta=0.0001,
max_iter_num=1000,
min_occur_num=0,
extractor_jar=' ')
# Print the result DataFrame
print(nertrainer_train.result)
- __repr__(self)
- Returns the string representation for a NERTrainer class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the prediction type of the algorithm.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- get_target_column(self)
- Function to return the target column of the algorithm.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.