| |
Methods defined here:
- __init__(self, object=None, newdata=None, dict_data=None, text_column=None, language='en', level='DOCUMENT', high_priority='NONE', filter='ALL', accumulate=None, newdata_sequence_column=None, dict_data_sequence_column=None, newdata_order_column=None, dict_data_order_column=None)
- DESCRIPTION:
The SentimentExtractor function extracts the sentiment (positive,
negative, or neutral) of each input document or sentence, using
either a classification model output by the function SentimentTrainer
or a dictionary model.
PARAMETERS:
object:
Optional Argument.
Specifies the model type and file. The default model type is
dictionary. If you omit this argument or specify dictionary without
a dictionary file, then you must specify a dictionary teradataml DataFrame
using the dict_data argument. If you specify both dict_data and a dictionary file, then
whenever their words conflict, dict_data has higher priority. The
dictionary file must be a text file in which each line contains only a
sentiment word, a space, and the opinion score of the sentiment word.
If you specify classification:model_file, model_file must be the name
of a model file generated and installed on the database by the
function SentimentTrainer.
Note: Before running the function, add the location of the dictionary file or
model_file to the user/session default search path.
Types: str
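The dictionary file format and the priority rule described above can be sketched in plain Python. This is a minimal illustration and not part of teradataml: the helper names, the file name, and the sample words are hypothetical, and the sketch assumes integer opinion scores, which the description does not state.

```python
import os
import tempfile


def write_dictionary_file(path, scores):
    """Write one 'word score' pair per line, separated by a single space."""
    with open(path, "w", encoding="utf-8") as fh:
        for word, score in scores.items():
            if " " in word:
                raise ValueError(f"word must not contain spaces: {word!r}")
            fh.write(f"{word} {score}\n")


def read_dictionary_file(path):
    """Parse 'word score' lines back into a dict (integer scores assumed)."""
    scores = {}
    with open(path, encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate a trailing blank line
            parts = line.split(" ")
            if len(parts) != 2:
                raise ValueError(f"line {line_number}: expected 'word score'")
            word, score = parts
            scores[word] = int(score)
    return scores


def merge_with_priority(file_scores, dict_data_scores):
    """On conflicting words, the dict_data entry overrides the file entry."""
    merged = dict(file_scores)
    merged.update(dict_data_scores)
    return merged


# Hypothetical sample entries, for illustration only.
file_scores = {"good": 1, "terrible": -2, "fine": 1}
path = os.path.join(tempfile.mkdtemp(), "my_sentiment_dict.txt")
write_dictionary_file(path, file_scores)

parsed = read_dictionary_file(path)
merged = merge_with_priority(parsed, {"fine": 0})
```

Here `merged` keeps the file's scores for "good" and "terrible" but takes the score for "fine" from the stand-in for dict_data, mirroring the stated conflict rule.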
newdata:
Required Argument.
Specifies the teradataml DataFrame defining the input text.
newdata_order_column:
Optional Argument.
Specifies Order By columns for newdata.
Values to this argument can be provided as list, if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
dict_data:
Optional Argument.
Specifies the teradataml DataFrame defining the dictionary.
dict_data_order_column:
Optional Argument.
Specifies Order By columns for dict_data.
Values to this argument can be provided as list, if multiple columns
are used for ordering.
Types: str OR list of Strings (str)
text_column:
Required Argument.
Specifies the name of the input column that contains text from which
to extract sentiments.
Types: str
language:
Optional Argument.
Specifies the language of the input text:
- en (English)
- zh_CN (Simplified Chinese)
- zh_TW (Traditional Chinese)
Default Value: "en"
Permitted Values: en, zh_CN, zh_TW
Types: str
level:
Optional Argument.
Specifies the level of analysis: whether to analyze each document or
each sentence.
Default Value: "DOCUMENT"
Permitted Values: DOCUMENT, SENTENCE
Types: str
high_priority:
Optional Argument.
Specifies the highest priority when returning results:
- NEGATIVE_RECALL: Give highest priority to negative results, including
those with lower confidence sentiment classifications
(maximizes the number of negative results returned).
- NEGATIVE_PRECISION: Give highest priority to negative results with
high-confidence sentiment classifications.
- POSITIVE_RECALL: Give highest priority to positive results, including
those with lower confidence sentiment classifications
(maximizes the number of positive results returned).
- POSITIVE_PRECISION: Give highest priority to positive results with
high-confidence sentiment classifications.
- NONE: Give all results the same priority.
Default Value: "NONE"
Permitted Values: NEGATIVE_RECALL, NEGATIVE_PRECISION,
POSITIVE_RECALL, POSITIVE_PRECISION, NONE
Types: str
filter:
Optional Argument.
Specifies the kind of results to return:
- POSITIVE: Return only results with positive sentiments.
- NEGATIVE: Return only results with negative sentiments.
- ALL: Return all results.
Default Value: "ALL"
Permitted Values: POSITIVE, NEGATIVE, ALL
Types: str
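The permitted values and defaults listed for language, level, high_priority, and filter can be checked on the client side before a call is issued. The following is a minimal sketch, not teradataml's own validation: the `check_options` helper is hypothetical, and the case-insensitive matching is an assumption made only because the examples pass lowercase values such as "document".

```python
# Permitted values and defaults as documented for SentimentExtractor.
PERMITTED = {
    "language": ("en", "zh_CN", "zh_TW"),
    "level": ("DOCUMENT", "SENTENCE"),
    "high_priority": (
        "NEGATIVE_RECALL",
        "NEGATIVE_PRECISION",
        "POSITIVE_RECALL",
        "POSITIVE_PRECISION",
        "NONE",
    ),
    "filter": ("POSITIVE", "NEGATIVE", "ALL"),
}
DEFAULTS = {
    "language": "en",
    "level": "DOCUMENT",
    "high_priority": "NONE",
    "filter": "ALL",
}


def check_options(**options):
    """Return the options with defaults filled in and canonical spelling.

    Hypothetical helper: matching is case-insensitive by assumption.
    """
    checked = dict(DEFAULTS)
    for name, value in options.items():
        if name not in PERMITTED:
            raise KeyError(f"unknown option: {name}")
        canonical = {v.lower(): v for v in PERMITTED[name]}
        key = str(value).lower()
        if key not in canonical:
            allowed = ", ".join(PERMITTED[name])
            raise ValueError(f"{name}={value!r} is not one of: {allowed}")
        checked[name] = canonical[key]
    return checked


options = check_options(level="sentence", filter="NEGATIVE")
```

The returned dictionary could then be unpacked into the constructor call alongside newdata and text_column.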
accumulate:
Optional Argument.
Specifies the names of the input columns to copy to the output teradataml DataFrame.
Types: str OR list of Strings (str)
newdata_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "newdata". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
dict_data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "dict_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
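Several arguments above accept either a single column name or a list of names, and the sequence-column arguments expect columns that uniquely identify each row. A minimal plain-Python sketch of both ideas follows. The helper names and the sample rows are hypothetical, and rows are modeled as dictionaries purely for illustration, not as teradataml DataFrames.

```python
def as_column_list(value):
    """Normalize a 'str OR list of str' argument to a list of names."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    columns = list(value)
    for column in columns:
        if not isinstance(column, str):
            raise TypeError(f"column names must be str, got {column!r}")
    return columns


def is_unique_key(rows, columns):
    """Return True if the given columns take a distinct value in every row."""
    names = as_column_list(columns)
    keys = [tuple(row[name] for name in names) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical sample rows, for illustration only.
rows = [
    {"id": 1, "product": "camera"},
    {"id": 2, "product": "camera"},
]
```

With these rows, "id" would be a suitable sequence column, while "product" alone would not, because it repeats.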
RETURNS:
Instance of SentimentExtractor.
Output teradataml DataFrames can be accessed using attribute
references, such as SentimentExtractorObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("sentimenttrainer", "sentiment_train")
load_example_data("sentimentextractor", ["sentiment_extract_input", "sentiment_word"])
# Create teradataml DataFrame objects.
sentiment_train = DataFrame.from_table("sentiment_train")
sentiment_extract_input = DataFrame.from_table("sentiment_extract_input")
sentiment_word = DataFrame.from_table("sentiment_word")
# Example 1 - This example uses the dictionary model file to analyze each document.
SentimentExtractor_out1 = SentimentExtractor(
    object = "dictionary",
    newdata = sentiment_extract_input,
    text_column = "review",
    level = "document",
    accumulate = ["id","product"]
)
# Print the results
print(SentimentExtractor_out1)
# Example 2 - This example uses the dictionary model file to analyze each sentence.
SentimentExtractor_out2 = SentimentExtractor(
    object = "dictionary",
    newdata = sentiment_extract_input,
    text_column = "review",
    level = "sentence",
    accumulate = ["id","product"]
)
# Print the results
print(SentimentExtractor_out2)
# Example 3 - This example uses a maximum entropy classification model file.
SentimentExtractor_out3 = SentimentExtractor(
    object = "classification:default_sentiment_classification_model.bin",
    newdata = sentiment_extract_input,
    text_column = "review",
    level = "document",
    accumulate = ["id"]
)
# Print the results
print(SentimentExtractor_out3)
# Example 4 - This example uses a model file output by the SentimentTrainer function.
SentimentTrainer_out = SentimentTrainer(
    data = sentiment_train,
    text_column = "review",
    sentiment_column = "category",
    model_file = "sentimentmodel1.bin"
)
SentimentExtractor_out4 = SentimentExtractor(
    object = "classification:sentimentmodel1.bin",
    newdata = sentiment_extract_input,
    text_column = "review",
    level = "document",
    accumulate = ["id"]
)
# Print the results
print(SentimentExtractor_out4)
# Example 5 - This example uses a dictionary instead of a model file.
SentimentExtractor_out5 = SentimentExtractor(
    dict_data = sentiment_word,
    newdata = sentiment_extract_input,
    text_column = "review",
    level = "document",
    accumulate = ["id", "product"]
)
# Print the results
print(SentimentExtractor_out5)
- __repr__(self)
- Returns the string representation for a SentimentExtractor class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the prediction type of the algorithm.
When the model object is created using retrieve_model(), the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the target column of the algorithm.
When the model object is created using retrieve_model(), the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.
|
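The accessor methods listed above, together with the result attribute, can be gathered into a single summary. The sketch below is an illustration under stated assumptions: `describe_model` is a hypothetical helper, and `_StubModel` is a stand-in with made-up return values so that the code runs without a database connection. With a live session, a real SentimentExtractor instance would be passed instead.

```python
def describe_model(model):
    """Collect the documented accessors of a SentimentExtractor-like object."""
    return {
        "repr": repr(model),
        "result": model.result,
        "build_time_seconds": model.get_build_time(),
        "prediction_type": model.get_prediction_type(),
        "target_column": model.get_target_column(),
        # Documented to be None when the model came from retrieve_model().
        "query": model.show_query(),
    }


class _StubModel:
    """Hypothetical stand-in; all values below are invented for the sketch."""

    result = "<teradataml DataFrame placeholder>"

    def __repr__(self):
        return "StubSentimentExtractor()"

    def get_build_time(self):
        return 1.5

    def get_prediction_type(self):
        return "CLASSIFICATION"

    def get_target_column(self):
        return "category"

    def show_query(self):
        return None  # mimics a model restored with retrieve_model()


info = describe_model(_StubModel())
```

Checking `info["query"] is None` is one way to tell whether the underlying SQL is available for the object at hand.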