Teradata R Package Function Reference | 17.00 - NaiveBayesTextClassifierPredict - Teradata R Package

Teradata® R Package Function Reference

Product Name: Teradata R Package
Release: 17.00
Created Date: September 2020
Category: Programming Reference
Feature Number: B700-4007-090K

Description

The NaiveBayesTextClassifierPredict function uses the model tbl_teradata generated by the NaiveBayesTextClassifierTrainer function (td_naivebayes_textclassifier_mle) to predict outcomes for test data.

Usage

  td_naivebayes_textclassifier_predict_sqle (
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = "MULTINOMIAL",
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL,
      object.order.column = NULL
  )
## S3 method for class 'td_naivebayes_textclassifier_mle'
predict(
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = "MULTINOMIAL",
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL,
      object.order.column = NULL)

Arguments

object

Required Argument.
Specifies the model tbl_teradata generated by td_naivebayes_textclassifier_mle.
This argument can accept either a tbl_teradata or an object of "td_naivebayes_textclassifier_mle" class.

object.order.column

Optional Argument.
Specifies Order By columns for "object".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

newdata

Required Argument.
Specifies the tbl_teradata containing the input test data.

newdata.partition.column

Required Argument.
Specifies Partition By columns for "newdata".
To partition by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

newdata.order.column

Optional Argument.
Specifies Order By columns for "newdata".
To order by multiple columns, provide the column names as a vector.
Types: character OR vector of Strings (character)

input.token.column

Required Argument.
Specifies the name of the "newdata" column that contains the tokens.
Types: character

doc.id.columns

Required Argument.
Specifies the names of the "newdata" columns that contain the document identifier.
Types: character OR vector of Strings (character)
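
The two arguments above imply that "newdata" holds one token per row, next to the identifier of the document the token came from. The following base R sketch illustrates that layout. The documents and the whitespace tokenizer are assumptions made for illustration only; the column names doc_id and token mirror those used in the Examples section.

```r
# Hypothetical raw documents; not part of the package's example data.
docs <- data.frame(
  doc_id = c(1L, 2L),
  text   = c("late fee charged twice", "card was declined"),
  stringsAsFactors = FALSE
)

# Naive whitespace tokenizer (an assumption; any tokenizer could be used).
token_list <- strsplit(tolower(docs$text), "\\s+")

# One output row per (doc_id, token) pair.
tokens_long <- data.frame(
  doc_id = rep(docs$doc_id, lengths(token_list)),
  token  = unlist(token_list),
  stringsAsFactors = FALSE
)

print(tokens_long)
```

A table shaped like tokens_long would then use input.token.column = "token" and doc.id.columns = "doc_id".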

model.type

Optional Argument.
Specifies the model type of the text classifier.
Default Value: "MULTINOMIAL"
Permitted Values: MULTINOMIAL, BERNOULLI
Types: character
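
To make the difference between the two model types concrete, here is a minimal base R sketch of the textbook multinomial and Bernoulli naive Bayes scoring rules. It is not the package's implementation: the probabilities, priors, category names, and helper functions are all made up for illustration, and the same toy probability matrix is reused for both rules only to keep the sketch short.

```r
# Toy token probabilities per category (made-up numbers, not package output).
# Rows are tokens, columns are categories.
vocab <- c("fee", "card", "late")
prob <- matrix(c(0.5, 0.2,
                 0.2, 0.6,
                 0.3, 0.2),
               nrow = 3, byrow = TRUE,
               dimnames = list(vocab, c("billing", "fraud")))
prior <- c(billing = 0.5, fraud = 0.5)

# Tokens of one hypothetical document; "fee" occurs twice.
doc_tokens <- c("fee", "fee", "late")

# Multinomial: every occurrence of a token adds log P(token | category).
multinomial_score <- function(tokens, prob, prior) {
  log(prior) + colSums(log(prob[tokens, , drop = FALSE]))
}

# Bernoulli: each vocabulary token counts once, by presence or absence.
bernoulli_score <- function(tokens, prob, prior) {
  present <- rownames(prob) %in% tokens
  log(prior) +
    colSums(log(prob[present, , drop = FALSE])) +
    colSums(log(1 - prob[!present, , drop = FALSE]))
}

m_scores <- multinomial_score(doc_tokens, prob, prior)
b_scores <- bernoulli_score(doc_tokens, prob, prior)

# The predicted category is the one with the highest log-likelihood.
names(which.max(m_scores))
names(which.max(b_scores))
```

In this sketch, the repeated "fee" token affects only the multinomial score, while the absent "card" token affects only the Bernoulli score.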

top.k

Optional Argument.
Specifies how many of the most likely prediction categories to output, along with their log-likelihood values (for example, a value of 10 outputs the 10 most likely prediction categories). By default, all prediction categories are output.
Types: integer
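
The effect of top.k can be pictured as keeping, for each document, only the k categories with the highest log-likelihood. The base R sketch below shows that selection on made-up scores. The column names, the values, and the helper top_k_per_doc are hypothetical and do not describe the actual output schema of the function.

```r
# Hypothetical per-category log-likelihoods for two documents (made-up values).
scores <- data.frame(
  doc_id   = c(1, 1, 1, 2, 2, 2),
  category = c("billing", "fraud", "service", "billing", "fraud", "service"),
  loglik   = c(-3.2, -5.1, -4.0, -6.3, -2.7, -2.9),
  stringsAsFactors = FALSE
)

# Keep the k rows with the highest log-likelihood within each document.
top_k_per_doc <- function(scores, k) {
  # Sort by document, then by descending log-likelihood.
  ordered <- scores[order(scores$doc_id, -scores$loglik), ]
  # Rank rows inside each document: 1 = most likely category.
  rank_in_doc <- ave(ordered$loglik, ordered$doc_id, FUN = seq_along)
  ordered[rank_in_doc <= k, ]
}

top1 <- top_k_per_doc(scores, 1)
top2 <- top_k_per_doc(scores, 2)
print(top1)
print(top2)
```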

model.token.column

Optional Argument.
Specifies the name of the "object" column that contains the tokens. The default value is the first column of "object".
Types: character

model.category.column

Optional Argument.
Specifies the name of the "object" column that contains the prediction categories. The default value is the second column of "object".
Types: character

model.prob.column

Optional Argument.
Specifies the name of the "object" column that contains the token counts. The default value is the third column of "object".
Types: character

Value

The function returns an object of class "td_naivebayes_textclassifier_predict_sqle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.
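
As a rough illustration of that structure, the sketch below builds a stand-in object and reads its result member. A plain data.frame takes the place of the tbl_teradata so that the snippet runs without a database connection, and the column names doc_id and prediction are hypothetical.

```r
# Stand-in for the returned object (assumption: a local data.frame replaces
# the tbl_teradata, and the columns shown are invented for illustration).
mock_out <- structure(
  list(result = data.frame(doc_id = 1L, prediction = "billing",
                           stringsAsFactors = FALSE)),
  class = "td_naivebayes_textclassifier_predict_sqle"
)

# The list has a single named member, "result".
names(mock_out)

# Reference the member with the "$" operator.
mock_out$result
```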

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("naivebayes_textclassifier_predict_example", "token_table",
                    "complaints_tokens_test")
    
    # Create object(s) of class "tbl_teradata".
    token_table <- tbl(con, "token_table")
    complaints_tokens_test <- tbl(con, "complaints_tokens_test")
    
    # Example: train a Bernoulli model, then use it to predict.
    # Create the model
    textclassifier_out <- td_naivebayes_textclassifier_mle(data = token_table,
                                           data.partition.column = c("category"),
                                           token.column = "token",
                                           doc.id.columns = c("doc_id"),
                                           doc.category.column = "category",
                                           model.type = "Bernoulli"
                                           )
    
    # Predict the output
    predict_out <- td_naivebayes_textclassifier_predict_sqle(newdata = complaints_tokens_test,
                                                   object = textclassifier_out,
                                                   newdata.partition.column = "doc_id",
                                                   input.token.column = "token",
                                                   doc.id.columns = c("doc_id"),
                                                   model.type = "Bernoulli",
                                                   top.k = 1
                                                   )
                                          
    # Alternatively, use the S3 predict method to obtain the predictions.
    predict_result <- predict(textclassifier_out,
                               newdata = complaints_tokens_test,
                               newdata.partition.column = "doc_id",
                               input.token.column = "token",
                               doc.id.columns = c("doc_id"),
                               model.type = "Bernoulli",
                               top.k = 1)