Teradata R Package Function Reference - 16.20 - NaiveBayesTextClassifierPredict - Teradata R Package

Teradata® R Package Function Reference

Product Name: Teradata R Package
Release: 16.20
Created: February 2020
Category: Programming Reference
Feature Number: B700-4007-098K

Description

The NaiveBayesTextClassifierPredict function uses the model table generated by the NaiveBayesTextClassifierTrainer function (td_naivebayes_textclassifier_mle in this package) to predict outcomes for test data.

Usage

  td_naivebayes_textclassifier_predict_sqle (
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = "MULTINOMIAL",
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      newdata.partition.column = NULL)
      
## S3 method for class 'td_naivebayes_textclassifier_mle'
predict(
      object = NULL,
      newdata = NULL, 
      input.token.column = NULL, 
      doc.id.columns = NULL,
      model.type = "MULTINOMIAL", 
      top.k = NULL, 
      model.token.column = NULL,
      model.category.column = NULL, 
      model.prob.column = NULL,
      newdata.partition.column = NULL)

Arguments

object

Required Argument.
Specifies the object that contains the model, which is the output of the td_naivebayes_textclassifier_mle function. For td_naivebayes_textclassifier_predict_sqle, this can also be a tibble containing the Naive Bayes text classifier model.

newdata

Required Argument.
Specifies the table that contains the test-data tokens for which to predict outcomes.

newdata.partition.column

Specifies the Partition By columns for newdata.
Values for this argument can be provided as a list if multiple columns are used for partitioning.

input.token.column

Required Argument.
Specifies the name of the newdata column that contains the tokens.

doc.id.columns

Required Argument.
Specifies the names of the newdata columns that contain the document identifier.

model.type

Optional Argument.
Specifies the model type of the text classifier.
Default Value: "MULTINOMIAL"
Permitted Values: MULTINOMIAL, BERNOULLI

top.k

Optional Argument.
Specifies the number of most likely prediction categories to output with their log-likelihood values (for example, the top 10 most likely prediction categories). The default is all prediction categories.
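
As a rough local illustration of what limiting the output to the top k categories means, the sketch below keeps the k highest log-likelihood categories for each document in an ordinary data frame. It is not the package's implementation: the `scores` data, its column names, and the `top_k_categories` helper are all assumptions made up for this example.

```r
# Hypothetical per-document scores; the data and column names are assumptions
# for illustration and do not come from the Teradata function's output.
scores <- data.frame(
  doc_id   = c(1, 1, 1, 2, 2, 2),
  category = c("crash", "no_crash", "other", "crash", "no_crash", "other"),
  loglik   = c(-12.4, -9.8, -15.1, -7.2, -11.0, -8.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the k most likely categories for each document.
top_k_categories <- function(scores, k) {
  per_doc <- split(scores, scores$doc_id)
  kept <- lapply(per_doc, function(d) {
    # Highest log-likelihood first, then keep the first k rows.
    d <- d[order(d$loglik, decreasing = TRUE), ]
    head(d, k)
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

top1 <- top_k_categories(scores, k = 1)
print(top1)
```

With k = 1, each document keeps only its single most likely category. Leaving top.k unset corresponds to keeping every category row.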

model.token.column

Optional Argument.
Specifies the name of the model table column that contains the tokens. The default value is the first column of the model table.

model.category.column

Optional Argument.
Specifies the name of the model table column that contains the prediction categories. The default value is the second column of the model table.

model.prob.column

Optional Argument.
Specifies the name of the model table column that contains the token probabilities. The default value is the third column of the model table.
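
The positional defaults of the three model column arguments can be pictured with a small local sketch. The `resolve_model_columns` helper below is hypothetical and not part of the Teradata R package. It only mirrors the documented rule that an unset argument falls back to the first, second, or third column of the model table. The column names passed to it are likewise assumed for illustration.

```r
# Hypothetical helper (not part of the Teradata R package): resolve the three
# model column names, falling back to column positions 1, 2 and 3.
resolve_model_columns <- function(model_colnames,
                                  model.token.column = NULL,
                                  model.category.column = NULL,
                                  model.prob.column = NULL) {
  stopifnot(length(model_colnames) >= 3)
  # Use the supplied name if given, otherwise the column at `position`.
  pick <- function(value, position) {
    if (is.null(value)) model_colnames[position] else value
  }
  list(
    token    = pick(model.token.column, 1),
    category = pick(model.category.column, 2),
    prob     = pick(model.prob.column, 3)
  )
}

# Assumed column layout of a model table, for illustration only.
cols <- resolve_model_columns(c("token", "category", "prob"))
str(cols)
```

If the model table has been reordered or carries extra leading columns, naming the three columns explicitly avoids relying on position.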

Value

The function returns an object of class "td_naivebayes_textclassifier_predict_sqle", which is a named list containing a Teradata tbl object. The list member can be referenced directly with the "$" operator using the name: result
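
The access pattern can be tried locally with a stand-in object. In the sketch below, `mock_out` is a hand-built classed list holding a plain data frame. The real function returns a remote Teradata tbl in the `result` member, and the column names used here are assumptions for illustration only.

```r
# Stand-in for the returned object: a classed named list with one member,
# `result`. The real `result` is a remote Teradata tbl, not a data frame.
mock_out <- structure(
  list(
    result = data.frame(
      doc_id     = c(1, 2),                 # assumed column name
      prediction = c("no_crash", "crash"),  # assumed column name
      stringsAsFactors = FALSE
    )
  ),
  class = "td_naivebayes_textclassifier_predict_sqle"
)

# List the available members, then reference `result` with the "$" operator.
member_names <- names(mock_out)
res <- mock_out$result
print(member_names)
print(res)
```

With a real return value, the same `$result` reference yields the tbl holding the predictions.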

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("naivebayes_textclassifier_predict_example", "token_table", "complaints_tokens_test")
    
    # Create remote tibble objects.
    token_table <- tbl(con, "token_table")
    complaints_tokens_test <- tbl(con, "complaints_tokens_test")
    
    # Example: train a Bernoulli model, then predict with it.
    # Create the model.
    textclassifier_out <- td_naivebayes_textclassifier_mle(data = token_table,
                                           data.partition.column = c("category"),
                                           token.column = "token",
                                           doc.id.columns = c("doc_id"),
                                           doc.category.column = "category",
                                           model.type = "Bernoulli"
                                           )
    
    # Predict the output
    predict_out <- td_naivebayes_textclassifier_predict_sqle(newdata = complaints_tokens_test,
                                                   object = textclassifier_out,
                                                   newdata.partition.column = "doc_id",
                                                   input.token.column = "token",
                                                   doc.id.columns = c("doc_id"),
                                                   model.type = "Bernoulli",
                                                   top.k = 1
                                                   )
                                          
    # Alternatively, use the S3 predict method to make the predictions.
    predict_result <- predict(textclassifier_out,
                               newdata = complaints_tokens_test,
                               newdata.partition.column = "doc_id",
                               input.token.column = "token",
                               doc.id.columns = c("doc_id"),
                               model.type = "Bernoulli",
                               top.k = 1)