Teradata R Package Function Reference | 17.00 - SentimentExtractor - Teradata R Package

prodname: Teradata R Package
vrm_release: 17.00
created_date: September 2020
category: Programming Reference
featnum: B700-4007-090K

Description

The SentimentExtractor function (td_sentiment_extractor_mle) extracts the sentiment (positive, negative, or neutral) of each input document or sentence, using either a classification model output by the SentimentTrainer function (td_sentiment_trainer_mle) or a dictionary model.

Usage

  td_sentiment_extractor_mle (
      object = NULL,
      newdata = NULL,
      dict.data = NULL,
      text.column = NULL,
      language = "en",
      level = "DOCUMENT",
      high.priority = "NONE",
      filter = "ALL",
      accumulate = NULL,
      newdata.sequence.column = NULL,
      dict.data.sequence.column = NULL,
      newdata.order.column = NULL,
      dict.data.order.column = NULL
  )

Arguments

object

Optional Argument.
Specifies the model type and file, in one of these forms: "dictionary", "dictionary:dict_file", or "classification:model_file". The default model type is dictionary.
If you omit this argument, or specify "dictionary" without a dict_file, then you must supply a dictionary tbl_teradata through the "dict.data" argument. If you specify both "dict.data" and a dict_file, then "dict.data" takes priority wherever their words conflict.
The dict_file must be a text file in which each line contains only a sentiment word, a space, and the opinion score of that word.
For "classification:model_file", model_file must be the name of a model file generated and installed on the ML Engine by the td_sentiment_trainer_mle function.
Note: Before running the function, add the location of dict_file or model_file to the user/session default search path.
Types: character
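
The dict_file layout and the accepted forms of the "object" argument described above can be sketched in plain R. This is only an illustration: the helper names write_sentiment_dict and sentiment_object are hypothetical and not part of the Teradata R package, the words and scores are made-up sample values, and installing the file on the ML Engine is a separate step that is not shown.

```r
# Hypothetical helper: write a dictionary file in the documented layout,
# one "<word> <score>" pair per line, separated by a single space.
write_sentiment_dict <- function(words, scores, path) {
  stopifnot(length(words) == length(scores), is.numeric(scores))
  # A word containing whitespace would break the "word space score" layout.
  stopifnot(!any(grepl("[[:space:]]", words)))
  writeLines(paste(words, scores), path)
  invisible(path)
}

# Hypothetical helper: build one of the three documented values of "object".
sentiment_object <- function(type = c("dictionary", "classification"),
                             file = NULL) {
  type <- match.arg(type)
  if (is.null(file)) {
    if (type == "classification") {
      stop("a classification model requires a model file name")
    }
    return("dictionary")
  }
  paste0(type, ":", file)
}

# Sample values for illustration only.
dict_path <- file.path(tempdir(), "my_dict.txt")
write_sentiment_dict(c("good", "terrible"), c(2, -2), dict_path)
readLines(dict_path)

sentiment_object("dictionary", "my_dict.txt")
sentiment_object("classification", "sentimentmodel1.bin")
```

The strings returned by sentiment_object are what would be passed as object = ... to td_sentiment_extractor_mle.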

newdata

Required Argument.
Specifies the tbl_teradata defining the input text.

newdata.order.column

Optional Argument.
Specifies Order By columns for "newdata".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

dict.data

Optional Argument.
Specifies the tbl_teradata defining the dictionary.

dict.data.order.column

Optional Argument.
Specifies Order By columns for "dict.data".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

text.column

Required Argument.
Specifies the name of the input column that contains text from which to extract sentiments.
Types: character

language

Optional Argument.
Specifies the language of the input text: en (English), zh_CN (Simplified Chinese), or zh_TW (Traditional Chinese).
Default Value: "en"
Permitted Values: en, zh_CN, zh_TW
Types: character

level

Optional Argument.
Specifies the level of analysis - whether to analyze each document or each sentence.
Default Value: "DOCUMENT"
Permitted Values: DOCUMENT, SENTENCE
Types: character

high.priority

Optional Argument.
Specifies the highest priority when returning results:

  1. NEGATIVE_RECALL: Give highest priority to negative results, including those with lower-confidence sentiment classifications (maximizes the number of negative results returned).

  2. NEGATIVE_PRECISION: Give highest priority to negative results with high-confidence sentiment classifications.

  3. POSITIVE_RECALL: Give highest priority to positive results, including those with lower-confidence sentiment classifications (maximizes the number of positive results returned).

  4. POSITIVE_PRECISION: Give highest priority to positive results with high-confidence sentiment classifications.

  5. NONE: Give all results the same priority.

Default Value: "NONE"
Permitted Values: NEGATIVE_RECALL, NEGATIVE_PRECISION, POSITIVE_RECALL, POSITIVE_PRECISION, NONE
Types: character

filter

Optional Argument.
Specifies the kind of results to return:

  1. POSITIVE: Return only results with positive sentiments.

  2. NEGATIVE: Return only results with negative sentiments.

  3. ALL: Return all results (default).

Default Value: "ALL"
Permitted Values: POSITIVE, NEGATIVE, ALL
Types: character
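
The permitted values listed for "language", "level", "high.priority", and "filter" can be checked on the client before the function is called. The sketch below is a minimal illustration under stated assumptions: check_sentiment_args is a hypothetical helper, not part of the Teradata R package, and it compares values exactly as they are spelled in this reference, which may be stricter than the function itself.

```r
# Permitted values as listed in this reference.
permitted <- list(
  language      = c("en", "zh_CN", "zh_TW"),
  level         = c("DOCUMENT", "SENTENCE"),
  high.priority = c("NEGATIVE_RECALL", "NEGATIVE_PRECISION",
                    "POSITIVE_RECALL", "POSITIVE_PRECISION", "NONE"),
  filter        = c("POSITIVE", "NEGATIVE", "ALL")
)

# Hypothetical helper: return the arguments as a named list if every value
# is permitted, otherwise stop with a message naming the offending argument.
check_sentiment_args <- function(language = "en", level = "DOCUMENT",
                                 high.priority = "NONE", filter = "ALL") {
  args <- list(language = language, level = level,
               high.priority = high.priority, filter = filter)
  for (name in names(args)) {
    if (!args[[name]] %in% permitted[[name]]) {
      stop(sprintf("'%s' must be one of: %s", name,
                   paste(permitted[[name]], collapse = ", ")))
    }
  }
  args
}

# Ask for negative results only, favoring recall over precision,
# analyzed sentence by sentence.
opts <- check_sentiment_args(level = "SENTENCE",
                             high.priority = "NEGATIVE_RECALL",
                             filter = "NEGATIVE")
```

One way to use the returned list is to combine it with the required arguments through do.call, for example do.call(td_sentiment_extractor_mle, c(list(newdata = sentiment_extract_input, text.column = "review"), opts)).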

accumulate

Optional Argument.
Specifies the names of the input columns to copy to the output tbl_teradata.
Types: character OR vector of Strings (character)

newdata.sequence.column

Optional Argument.
Specifies the column or columns that uniquely identify each row of the input argument "newdata". Use this argument to ensure deterministic results for functions whose results can vary from run to run.
Types: character OR vector of Strings (character)

dict.data.sequence.column

Optional Argument.
Specifies the column or columns that uniquely identify each row of the input argument "dict.data". Use this argument to ensure deterministic results for functions whose results can vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_sentiment_extractor_mle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator, using the name: result.
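
As a small illustration of the "$" access described above, the following sketch wraps it in a helper that first checks the class of the returned object. The name get_sentiment_result is hypothetical and not part of the Teradata R package.

```r
# Hypothetical helper: return the "result" member of the function's output.
get_sentiment_result <- function(out) {
  # The reference states that the return value has this class.
  stopifnot(inherits(out, "td_sentiment_extractor_mle"))
  out$result  # equivalent to out[["result"]]
}
```

With the output of Example 1 below, get_sentiment_result(td_sentiment_extractor_out1) would return the same tbl_teradata as td_sentiment_extractor_out1$result.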

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("sentimenttrainer_example", "sentiment_train")
    loadExampleData("sentimentextractor_example", "sentiment_extract_input", "sentiment_word")
    
    # Create object(s) of class "tbl_teradata".
    sentiment_train <- tbl(con, "sentiment_train")
    sentiment_extract_input <- tbl(con, "sentiment_extract_input")
    sentiment_word <- tbl(con, "sentiment_word")
    
    # Example 1 - Use the dictionary model and analyze at the document level.
    td_sentiment_extractor_out1 <- td_sentiment_extractor_mle(
        object = "dictionary",
        newdata = sentiment_extract_input,
        text.column = "review",
        level = "DOCUMENT",
        accumulate = c("id", "product")
    )
    
    # Example 2 - Use the dictionary model and analyze at the sentence level.
    td_sentiment_extractor_out2 <- td_sentiment_extractor_mle(
        object = "dictionary",
        newdata = sentiment_extract_input,
        text.column = "review",
        level = "SENTENCE",
        accumulate = c("id", "product")
    )
    
    # Example 3 - Use a maximum entropy classification model file.
    td_sentiment_extractor_out3 <- td_sentiment_extractor_mle(
        object = "classification:default_sentiment_classification_model.bin",
        newdata = sentiment_extract_input,
        text.column = "review",
        level = "DOCUMENT",
        accumulate = c("id")
    )
    
    # Example 4 - Use a model file output by the td_sentiment_trainer_mle() function.
    # First train the model; the trainer writes the model file "sentimentmodel1.bin".
    td_sentiment_trainer_out <- td_sentiment_trainer_mle(
        data = sentiment_train,
        text.column = "review",
        sentiment.column = "category",
        model.file = "sentimentmodel1.bin"
    )
    
    # Then reference that model file by name in the "object" argument.
    td_sentiment_extractor_out4 <- td_sentiment_extractor_mle(
        object = "classification:sentimentmodel1.bin",
        newdata = sentiment_extract_input,
        text.column = "review",
        level = "DOCUMENT",
        accumulate = c("id")
    )
    
    # Example 5 - Use a dictionary tbl_teradata ("dict.data") instead of a model file.
    td_sentiment_extractor_out5 <- td_sentiment_extractor_mle(
        newdata = sentiment_extract_input,
        dict.data = sentiment_word,
        text.column = "review",
        level = "DOCUMENT",
        accumulate = c("id", "product")
    )