Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Language: English (United States)
Last Update: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

SentimentExtractor

Description

The td_sentiment_extractor_sqle() function uses a dictionary model to extract the sentiment (positive, negative, or neutral) of each input document or sentence.

The dictionary model consists of WordNet, a lexical database of the English language, and negation words (no, not, neither, never, and similar words).

The function handles negated sentiments as follows:

  • -1 if the sentiment is negated (for example, "I am not happy")

  • -1 if the sentiment and a negation word are separated by one word (for example, "I am not very happy")

  • +1 if the sentiment and a negation word are separated by two or more words (for example, "I am not saying I am happy")
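The negation rule above can be illustrated with a small, self-contained sketch in plain R. This is only a toy model of the three cases listed, not the SQL Engine implementation. The function name score_positive_word and the short negation list are hypothetical, and the sketch assumes a single positive sentiment word that appears after the negation word.

```r
# Toy illustration of the negation rule; NOT the SQL Engine implementation.
# `score_positive_word` and `negations` are hypothetical names, and the
# negation list is only a sample of the words the dictionary model uses.
negations <- c("no", "not", "neither", "never")

score_positive_word <- function(sentence, sentiment_word) {
  # Lowercase the text and split on anything that is not a letter or apostrophe.
  tokens <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]

  # Position of the first occurrence of the sentiment word.
  word_pos <- which(tokens == sentiment_word)
  if (length(word_pos) == 0) return(NA_integer_)
  word_pos <- word_pos[1]

  # Negation words that precede the sentiment word.
  neg_pos <- which(tokens %in% negations)
  neg_pos <- neg_pos[neg_pos < word_pos]
  if (length(neg_pos) == 0) return(1L)  # no negation: sentiment stays positive

  # Number of words between the nearest negation and the sentiment word.
  gap <- word_pos - max(neg_pos) - 1L
  if (gap <= 1L) -1L else 1L
}

score_positive_word("I am not happy", "happy")              # -1
score_positive_word("I am not very happy", "happy")         # -1
score_positive_word("I am not saying I am happy", "happy")  # 1
```
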

Notes:

  • This function requires the UTF8 client character set for UNICODE data.

  • This function does not support Pass Through Characters (PTCs). For information about PTCs, see Teradata Vantage™ - Analytics Database International Character Set Support.

  • This function does not support KanjiSJIS or Graphic data types.

  • Only the English language is supported.

  • The maximum supported length of a sentiment word in the dictionary data is 128 characters.

  • The maximum length of the sentiment_words output column is 32000 characters. If the sentiment_words output column value exceeds this limit, an ellipsis (...) is displayed at the end of the string.

  • The maximum length of the content output column is 32000 characters; that is, the maximum supported length of a sentence is 32000 characters.

  • A sentiment phrase can contain up to 10 words.
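The dictionary limits in the notes above can be checked locally before custom dictionary data is loaded. The following is a minimal sketch in plain R. The helper name check_dictionary_entries is hypothetical, and the sketch assumes that the 128-character limit applies to the whole dictionary entry and that words are separated by whitespace.

```r
# Hypothetical pre-check for custom dictionary entries, based on the limits
# listed in the notes: at most 128 characters and at most 10 words per entry.
# Assumption: the 128-character limit is applied to the entire entry.
check_dictionary_entries <- function(entries) {
  n_chars <- nchar(entries, type = "chars")
  # Count whitespace-separated words in each entry.
  n_words <- lengths(strsplit(trimws(entries), "[[:space:]]+"))
  data.frame(
    entry = entries,
    n_chars = n_chars,
    n_words = n_words,
    within_limits = n_chars <= 128 & n_words <= 10,
    stringsAsFactors = FALSE
  )
}

# Example: one valid entry, one phrase with 11 words, one overlong word.
entries <- c("good",
             paste(rep("very", 11), collapse = " "),
             strrep("a", 129))
check_dictionary_entries(entries)[, c("n_chars", "n_words", "within_limits")]
```

Entries flagged with within_limits = FALSE would need to be shortened or dropped before they are used as dictionary data.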

Usage

  td_sentiment_extractor_sqle (
      data = NULL,
      cust.dict = NULL,
      add.dict = NULL,
      text.column = NULL,
      accumulate = NULL,
      analysis.type = "DOCUMENT",
      priority = "NONE",
      output.type = "ALL",
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

cust.dict

Optional Argument.
Specifies the input tbl_teradata containing custom dictionary data to use instead of the default dictionary.
Types: tbl_teradata

add.dict

Optional Argument.
Specifies the input tbl_teradata containing additional entries to add to either the "cust.dict" dictionary or the default dictionary.
Types: tbl_teradata

text.column

Required Argument.
Specifies the "data" column that contains the text data for sentiment analysis.
Types: character

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to copy to the output. By default, the function copies no input tbl_teradata columns to the output.
Types: character OR vector of Strings (character)

analysis.type

Optional Argument.
Specifies the level of analysis, whether to analyze each document or each sentence in a document.
Permitted Values:

  • DOCUMENT - Analyzes each document.

  • SENTENCE - Analyzes each sentence in a document.

Default Value: "DOCUMENT"
Types: character

priority

Optional Argument.
Specifies which results the function gives the highest priority when returning results.
Permitted Values:

  • NONE - Gives all results the same priority.

  • NEGATIVE_RECALL - Gives the highest priority to negative results, including those with lower-confidence sentiment classifications (maximizes the number of negative results returned).

  • NEGATIVE_PRECISION - Gives the highest priority to negative results with high-confidence sentiment classifications.

  • POSITIVE_RECALL - Gives the highest priority to positive results, including those with lower-confidence sentiment classifications (maximizes the number of positive results returned).

  • POSITIVE_PRECISION - Gives the highest priority to positive results with high-confidence sentiment classifications.

Default Value: "NONE"
Types: character

output.type

Optional Argument.
Specifies the kind of results to return.
Permitted Values:

  • ALL - Returns all results.

  • POS - Returns only results with positive sentiments.

  • NEG - Returns only results with negative sentiments.

  • NEU - Returns only results with neutral sentiments.

Default Value: "ALL"
Types: character
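
As a sketch of how "analysis.type", "priority", and "output.type" combine, the wrapper below requests sentence-level analysis that favors recall of negative results and returns only negative sentiments. The wrapper name extract_negative_sentences is hypothetical; the argument names and values are taken from the descriptions above. Running it assumes that tdplyr is installed and that a Vantage connection context already exists.

```r
# Sketch only: calling this function requires tdplyr and an active Vantage
# connection context. `extract_negative_sentences` is a hypothetical wrapper;
# argument names and permitted values come from the Arguments section.
extract_negative_sentences <- function(data, text.column, accumulate = NULL) {
  tdplyr::td_sentiment_extractor_sqle(
    data = data,
    text.column = text.column,
    accumulate = accumulate,
    analysis.type = "SENTENCE",    # analyze each sentence, not each document
    priority = "NEGATIVE_RECALL",  # favor returning more negative results
    output.type = "NEG"            # return only negative sentiments
  )
}

# Possible usage, with a tbl_teradata such as the one built in the Examples:
# out <- extract_negative_sentences(sentiment_extract_input, "review",
#                                   accumulate = c("id", "product"))
# print(out$result)
```
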

...

Specifies the generic keyword arguments that SQLE functions accept. The generic keyword arguments are:
persist:
Optional Argument.
Specifies whether to persist the results of the function in a table. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to store the results of the function in a volatile table. When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or locally order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
tdplyr supports these generic arguments only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
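
The naming pattern for these generic arguments can be spelled out with a small helper in plain R. The helper name generic_arg_names is hypothetical and is not part of tdplyr; it only expands the "<input.data.arg.name>" placeholder for a given input argument. Whether a particular generic argument is accepted still depends on the underlying SQL Engine function.

```r
# Hypothetical helper (not part of tdplyr): expands the documented naming
# pattern for the generic partition/hash/order arguments.
generic_arg_names <- function(input_arg) {
  c(
    paste0(input_arg, ".partition.column"),
    paste0(input_arg, ".hash.column"),
    paste0(input_arg, ".order.column"),
    paste0("local.order.", input_arg)
  )
}

generic_arg_names("data")
# "data.partition.column" "data.hash.column" "data.order.column" "local.order.data"

generic_arg_names("cust.dict")
# "cust.dict.partition.column" "cust.dict.hash.column"
# "cust.dict.order.column" "local.order.cust.dict"
```
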

Value

The function returns an object of class "td_sentiment_extractor_sqle", which is a named list containing objects of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):

  1. result

  2. output.dictionary.data

Examples

  
    
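    # Load the packages (dplyr provides tbl()).
    library(tdplyr)
    library(dplyr)
    
    # Get the current context/connection.
    # Assumes a connection context has already been created in this session.
    con <- td_get_context()$connection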
    
    # Load the example data.
    loadExampleData("sentimentextractor_example", "sentiment_extract_input",
                                                  "sentiment_word_input",
                                                  "additional_table")
    
    # Create tbl_teradata object.
    sentiment_extract_input <- tbl(con, "sentiment_extract_input")
    sentiment_word_input <- tbl(con, "sentiment_word_input")
    additional_table <- tbl(con, "additional_table")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1 : Extracting the sentiment (positive, negative, or neutral)
    #             of each input document or sentence.
    sentimentextractor_out <- td_sentiment_extractor_sqle(text.column="review",
                                                          data=sentiment_extract_input)
    
    # Print the result.
    print(sentimentextractor_out$result)
    print(sentimentextractor_out$output.dictionary.data)
    
    # Example 2 : Extracting the sentiment (positive, negative, or neutral)
    #             of each input document by specifying custom dictionary data
    #             and adding additional entries to custom dictionary data.
    sentimentextractor_out_1 <- td_sentiment_extractor_sqle(
                                  text.column="review",
                                  accumulate=c('id', 'product'),
                                  analysis.type="DOCUMENT",
                                  priority="NONE",
                                  output.type="ALL",
                                  data=sentiment_extract_input,
                                  cust.dict=sentiment_word_input,
                                  add.dict=additional_table)
    
    # Print the result.
    print(sentimentextractor_out_1$result)
    print(sentimentextractor_out_1$output.dictionary.data)