Teradata® Package for R Function Reference

Product: Teradata Package for R
Release Number: 17.00
Published: July 2021
Language: English (United States)
Last Update: 2023-08-08
dita:id: B700-4007
Product Category: Teradata Vantage

NaiveBayesTextClassifierPredict

Description

The NaiveBayesTextClassifierPredict function uses the model tbl_teradata generated by the NaiveBayesTextClassifierTrainer function (`td_naivebayes_textclassifier_mle`) to predict the categories of the documents in the test data.

Usage

  td_naivebayes_textclassifier_predict_sqle(
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = "MULTINOMIAL",
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL,
      object.order.column = NULL
  )

  ## S3 method for class 'td_naivebayes_textclassifier_mle'
  predict(
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = "MULTINOMIAL",
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      newdata.partition.column = NULL,
      newdata.order.column = NULL,
      object.order.column = NULL
  )

Arguments

`object`	Required Argument. Specifies the model tbl_teradata generated by `td_naivebayes_textclassifier_mle`. This argument can accept either a tbl_teradata or an object of "td_naivebayes_textclassifier_mle" class.
`object.order.column`	Optional Argument. Specifies the Order By columns for "object". If multiple columns are used for ordering, provide them as a vector. Types: character OR vector of Strings (character)
`newdata`	Required Argument. Specifies the tbl_teradata containing the input test data.
`newdata.partition.column`	Required Argument. Specifies the Partition By columns for "newdata". If multiple columns are used for partitioning, provide them as a vector. Types: character OR vector of Strings (character)
`newdata.order.column`	Optional Argument. Specifies the Order By columns for "newdata". If multiple columns are used for ordering, provide them as a vector. Types: character OR vector of Strings (character)
`input.token.column`	Required Argument. Specifies the name of the "newdata" column that contains the tokens. Types: character
`doc.id.columns`	Required Argument. Specifies the names of the "newdata" columns that contain the document identifier. Types: character OR vector of Strings (character)
`model.type`	Optional Argument. Specifies the model type of the text classifier. Default Value: "MULTINOMIAL" Permitted Values: MULTINOMIAL, BERNOULLI Types: character
`top.k`	Optional Argument. Specifies the number of most likely prediction categories to output with their log-likelihood values (for example, the top 10 most likely prediction categories). The default is all prediction categories. Types: integer
`model.token.column`	Optional Argument. Specifies the name of the "object" column that contains the tokens. The default value is the first column of "object". Types: character
`model.category.column`	Optional Argument. Specifies the name of the "object" column that contains the prediction categories. The default value is the second column of "object". Types: character
`model.prob.column`	Optional Argument. Specifies the name of the "object" column that contains the token counts. The default value is the third column of "object". Types: character
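To make the role of `top.k` and the log-likelihood values more concrete, the following base R sketch scores one tokenized document against a small token-count model using multinomial Naive Bayes and keeps the `top_k` most likely categories. It does not call Vantage and is not the NaiveBayesTextClassifierPredict implementation. The function name `nb_multinomial_predict`, the output column names `prediction` and `loglik`, add-one (Laplace) smoothing, and uniform category priors are all assumptions made for illustration. The model columns are read by position (token, category, count), mirroring the documented defaults of `model.token.column`, `model.category.column`, and `model.prob.column`.

```r
# Hypothetical sketch (not the Vantage implementation): multinomial
# Naive Bayes scoring of one tokenized document in base R.
# Assumptions: one model row per (token, category) pair, add-one
# (Laplace) smoothing, uniform category priors, and hypothetical
# output column names "prediction" and "loglik".
nb_multinomial_predict <- function(model, doc_tokens, top_k = NULL) {
  # Read columns by position, like the defaults of model.token.column (1st),
  # model.category.column (2nd) and model.prob.column (3rd).
  token <- as.character(model[[1]])
  category <- as.character(model[[2]])
  count <- as.numeric(model[[3]])

  categories <- sort(unique(category))
  vocab_size <- length(unique(token))

  # Log-likelihood of the document under each category.
  scores <- vapply(categories, function(cat) {
    in_cat <- category == cat
    total <- sum(count[in_cat])
    # Count of each document token in this category (0 if unseen).
    n <- count[in_cat][match(doc_tokens, token[in_cat])]
    n[is.na(n)] <- 0
    sum(log((n + 1) / (total + vocab_size)))
  }, numeric(1))

  # Rank categories from most to least likely, then apply the top_k cutoff.
  ranked <- sort(scores, decreasing = TRUE)
  if (!is.null(top_k)) ranked <- head(ranked, top_k)
  data.frame(prediction = names(ranked), loglik = unname(ranked),
             stringsAsFactors = FALSE)
}

# Usage with a toy model table (token, category, count).
model <- data.frame(
  token    = c("late", "refund", "crash", "slow", "refund"),
  category = c("billing", "billing", "app", "app", "app"),
  count    = c(3, 5, 4, 2, 1),
  stringsAsFactors = FALSE
)
print(nb_multinomial_predict(model, c("refund", "late"), top_k = 1))
```

With this toy table, "billing" ranks first for the document c("refund", "late") because both tokens occur frequently in that category. A BERNOULLI model scores documents differently, since it also accounts for vocabulary tokens that are absent from the document; the sketch covers only the multinomial case.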

Value

The function returns an object of class "td_naivebayes_textclassifier_predict_sqle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name `result`.

Examples

    # Get the current context/connection.
    con <- td_get_context()$connection

    # Load example data.
    loadExampleData("naivebayes_textclassifier_predict_example", "token_table",
                    "complaints_tokens_test")

    # Create object(s) of class "tbl_teradata".
    token_table <- tbl(con, "token_table")
    complaints_tokens_test <- tbl(con, "complaints_tokens_test")

    # Example: Train a Bernoulli model, then predict the most likely
    # category (top.k = 1) for each test document.

    # Create the model.
    textclassifier_out <- td_naivebayes_textclassifier_mle(
        data = token_table,
        data.partition.column = c("category"),
        token.column = "token",
        doc.id.columns = c("doc_id"),
        doc.category.column = "category",
        model.type = "Bernoulli"
    )

    # Predict the output.
    predict_out <- td_naivebayes_textclassifier_predict_sqle(
        newdata = complaints_tokens_test,
        object = textclassifier_out,
        newdata.partition.column = "doc_id",
        input.token.column = "token",
        doc.id.columns = c("doc_id"),
        model.type = "Bernoulli",
        top.k = 1
    )

    # Display the predictions stored in the "result" member.
    print(predict_out$result)

    # Alternatively, use the S3 predict method to find the predictions.
    predict_result <- predict(
        textclassifier_out,
        newdata = complaints_tokens_test,
        newdata.partition.column = "doc_id",
        input.token.column = "token",
        doc.id.columns = c("doc_id"),
        model.type = "Bernoulli",
        top.k = 1
    )