Teradata R Package Function Reference - 16.20 - SentimentExtractor - Teradata R Package

Product: Teradata R Package
Release: 16.20
Created: February 2020
Category: Programming Reference
Feature Number: B700-4007-098K

Description

The SentimentExtractor (td_sentiment_extractor_mle) function extracts the sentiment (positive, negative, or neutral) of each input document or sentence, using either a classification model output by the SentimentTrainer function or a dictionary model.

Usage

  td_sentiment_extractor_mle (
      object = NULL,
      newdata = NULL,
      dict.data = NULL,
      text.column = NULL,
      language = "en",
      level = "DOCUMENT",
      high.priority = "NONE",
      filter = "ALL",
      accumulate = NULL,
      newdata.sequence.column = NULL,
      dict.data.sequence.column = NULL
  )

Arguments

object

Optional Argument.
Specifies the model type and file. The default model type is dictionary. If you omit this argument, or specify "dictionary" without a dict_file, you must supply a dictionary tbl_teradata through the "dict.data" argument. If you specify both "dict.data" and a dict_file, "dict.data" takes priority wherever their words conflict. The dict_file must be a text file in which each line contains only a sentiment word, a space, and the opinion score of that word. If you specify "classification:model_file", model_file must be the name of a model file generated and installed on the ML Engine by the SentimentTrainer (td_sentiment_trainer_mle) function.
Note: Before running the function, add the location of dict_file or model_file to the user/session default search path. The permitted forms of this argument are "dictionary", "dictionary:dict_file", and "classification:model_file".
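
The dict_file layout described above (one sentiment word, a space, and its opinion score per line) can be sketched in base R as follows. The words, the integer scores, and the file name my_sentiment_dict.txt are hypothetical and chosen only for illustration. The sketch writes the file locally and does not cover installing it on the ML Engine.

```r
# Hypothetical sentiment words and opinion scores, for illustration only.
# Integer scores are an assumption; the reference only says "opinion score".
dict_entries <- data.frame(
    word  = c("excellent", "good", "poor", "terrible"),
    score = c(2L, 1L, -1L, -2L),
    stringsAsFactors = FALSE
)

# One entry per line: the word, a single space, then the score.
dict_lines <- paste(dict_entries$word, dict_entries$score)

# Write the file to a temporary directory; the file name is hypothetical.
dict_path <- file.path(tempdir(), "my_sentiment_dict.txt")
writeLines(dict_lines, dict_path)

# Build the "dictionary:dict_file" form of the "object" argument.
object_arg <- paste0("dictionary:", basename(dict_path))
```

The resulting object_arg string only becomes usable once the file has been installed and its location added to the search path, as the note above requires.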

newdata

Required Argument.
Specifies the tbl_teradata defining the input text.

dict.data

Optional Argument.
Specifies the tbl_teradata defining the dictionary.

text.column

Required Argument.
Specifies the name of the input column that contains text from which to extract sentiments.

language

Optional Argument.
Specifies the language of the input text: en (English), zh_CN (Simplified Chinese), or zh_TW (Traditional Chinese).
Default Value: "en"
Permitted Values: en, zh_CN, zh_TW

level

Optional Argument.
Specifies the level of analysis - whether to analyze each document or each sentence.
Default Value: "DOCUMENT"
Permitted Values: DOCUMENT, SENTENCE

high.priority

Optional Argument.
Specifies the highest priority when returning results:

  1. NEGATIVE_RECALL: Give highest priority to negative results, including those with lower confidence sentiment classifications (maximizes the number of negative results returned).

  2. NEGATIVE_PRECISION: Give highest priority to negative results with high-confidence sentiment classifications.

  3. POSITIVE_RECALL: Give highest priority to positive results, including those with lower confidence sentiment classifications (maximizes the number of positive results returned).

  4. POSITIVE_PRECISION: Give highest priority to positive results with high-confidence sentiment classifications.

  5. NONE: Give all results the same priority.

Default Value: "NONE"
Permitted Values: NEGATIVE_RECALL, NEGATIVE_PRECISION, POSITIVE_RECALL, POSITIVE_PRECISION, NONE

filter

Optional Argument.
Specifies the kind of results to return:

  1. POSITIVE: Return only results with positive sentiments.

  2. NEGATIVE: Return only results with negative sentiments.

  3. ALL: Return all results (default).

Default Value: "ALL"
Permitted Values: POSITIVE, NEGATIVE, ALL
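
A minimal sketch of combining "high.priority" and "filter" to return only negative results while favoring recall. It assumes an active Teradata context and the sentiment_extract_input tbl_teradata created in the Examples section. Whether "high.priority" changes the outcome for a dictionary model is not stated in this reference, so treat the combination as illustrative.

```r
# Assumes sentiment_extract_input exists as created in the Examples section
# and that a Teradata context is active.
negative_reviews <- td_sentiment_extractor_mle(
    object = "dictionary",
    newdata = sentiment_extract_input,
    text.column = "review",
    level = "document",
    high.priority = "NEGATIVE_RECALL",  # favor returning more negative results
    filter = "NEGATIVE",                # return only negative results
    accumulate = c("id", "product")
)

# Print the output table.
negative_reviews$result
```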

accumulate

Optional Argument.
Specifies the names of the input columns to copy to the output table.

newdata.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "newdata". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

dict.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "dict.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
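
A minimal sketch of passing a sequence column. It assumes an active Teradata context, the sentiment_extract_input tbl_teradata from the Examples section, and that its "id" column uniquely identifies each row (an assumption, not something this reference states).

```r
# Assumes sentiment_extract_input exists as created in the Examples section
# and that "id" uniquely identifies each input row.
sequenced_out <- td_sentiment_extractor_mle(
    object = "dictionary",
    newdata = sentiment_extract_input,
    text.column = "review",
    level = "document",
    accumulate = c("id", "product"),
    newdata.sequence.column = "id"
)
```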

Value

The function returns an object of class "td_sentiment_extractor_mle", which is a named list containing a Teradata tbl object.
The list member is named "result" and can be referenced directly with the "$" operator.
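
A short sketch of inspecting the returned object. It assumes td_sentiment_extractor_out1 has already been created as in the first example of the Examples section and that the Teradata context is still active.

```r
# Assumes td_sentiment_extractor_out1 was created as in the Examples section.
# Check the class of the returned object.
class(td_sentiment_extractor_out1)

# Reference the output table by name with the "$" operator and print it.
td_sentiment_extractor_out1$result
```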

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("sentimenttrainer_example", "sentiment_train")
    loadExampleData("sentimentextractor_example", "sentiment_extract_input", "sentiment_word")
    
    # Create remote tibble objects.
    sentiment_train <- tbl(con, "sentiment_train")
    sentiment_extract_input <- tbl(con, "sentiment_extract_input")
    sentiment_word <- tbl(con, "sentiment_word")
    
    # This example uses the dictionary model file and analysis level is document
    td_sentiment_extractor_out1 <- td_sentiment_extractor_mle(object = "dictionary",
                                                         newdata = sentiment_extract_input,
                                                         text.column = "review",
                                                         level = "document",
                                                         accumulate = c("id","product")
                                                         )
    
    # This example uses the dictionary model file and analysis level is sentence
    td_sentiment_extractor_out2 <- td_sentiment_extractor_mle(object = "dictionary",
                                                         newdata = sentiment_extract_input,
                                                         text.column = "review",
                                                         level = "sentence",
                                                         accumulate = c("id","product")
                                                         )
    
    # This example uses a maximum entropy classification model file
    td_sentiment_extractor_out3 <- td_sentiment_extractor_mle(object = "classification:default_sentiment_classification_model.bin",
                                                         newdata = sentiment_extract_input,
                                                         text.column = "review",
                                                         level = "document",
                                                         accumulate = c("id")
                                                         )
    
    # This example uses a model file output by the SentimentTrainer function
    td_sentiment_trainer_out <- td_sentiment_trainer_mle(data = sentiment_train,
                                                     text.column = "review",
                                                     sentiment.column = "category",
                                                     model.file = "sentimentmodel1.bin"
                                                     )
                                                     
    td_sentiment_extractor_out4 <- td_sentiment_extractor_mle(object = "classification:sentimentmodel1.bin",
                                                         newdata = sentiment_extract_input,
                                                         text.column = "review",
                                                         level = "document",
                                                         accumulate = c("id")
                                                         )
    
    # This example uses a dictionary instead of a model file
    td_sentiment_extractor_out5 <- td_sentiment_extractor_mle(newdata = sentiment_extract_input,
                                                         dict.data = sentiment_word,
                                                         text.column = "review",
                                                         level = "document",
                                                         accumulate = c("id", "product")
                                                         )