Teradata Package for R Function Reference | 17.20 - NaiveBayesTextClassifierPredict

Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Last Edition: 2024-05-03
Product Category: Teradata Vantage

NaiveBayesTextClassifierPredict

Description

The td_naivebayes_textclassifier_predict_sqle() function uses the model generated by the td_naivebayes_textclassifier_trainer_sqle() function to predict the outcomes for a test set of data.

Usage

  td_naivebayes_textclassifier_predict_sqle (
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = 'MULTINOMIAL',
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      output.prob = FALSE,
      responses = NULL,
      accumulate = NULL,
      ...
  )

Arguments

object

Required Argument.
Specifies the tbl_teradata that contains the model data generated by the td_naivebayes_textclassifier_trainer_sqle() function, or an instance of class td_naivebayes_textclassifier_trainer_sqle.
Types: tbl_teradata or td_naivebayes_textclassifier_trainer_sqle

newdata

Required Argument.
Specifies the tbl_teradata containing the input test data.
Types: tbl_teradata

input.token.column

Required Argument.
Specifies the name of the newdata column that contains the tokens.
Types: character

doc.id.columns

Required Argument.
Specifies the names of the newdata columns that contain the document identifier.
Types: character OR vector of Strings (character)

model.type

Optional Argument.
Specifies the model type of the text classifier.
Permitted Values: 'MULTINOMIAL', 'BERNOULLI'
Default Value: 'MULTINOMIAL'
Types: character
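
The difference between the two model types can be sketched in base R. This is a textbook illustration of multinomial versus Bernoulli Naive Bayes scoring, not a description of how the SQL Engine function computes its output. The token probabilities, priors, category names, and helper functions below are all hypothetical.

```r
# Toy model: P(token | category). All numbers are made up for illustration.
token_prob <- rbind(
  spam = c(free = 0.6, meeting = 0.1, offer = 0.3),
  ham  = c(free = 0.1, meeting = 0.7, offer = 0.2)
)
prior <- c(spam = 0.5, ham = 0.5)

# Tokens of one document; "free" occurs twice.
doc_tokens <- c("free", "offer", "free")

# Multinomial: every occurrence of a token adds log P(token | category).
multinomial_loglik <- function(tokens, token_prob, prior) {
  counts <- table(factor(tokens, levels = colnames(token_prob)))
  log(prior) + as.vector(log(token_prob) %*% as.numeric(counts))
}

# Bernoulli: only presence or absence matters. Present tokens add log(p),
# absent vocabulary tokens add log(1 - p).
bernoulli_loglik <- function(tokens, token_prob, prior) {
  present <- colnames(token_prob) %in% tokens
  log(prior) +
    as.vector(log(token_prob) %*% as.numeric(present)) +
    as.vector(log(1 - token_prob) %*% as.numeric(!present))
}

multinomial_scores <- multinomial_loglik(doc_tokens, token_prob, prior)
bernoulli_scores   <- bernoulli_loglik(doc_tokens, token_prob, prior)

# Both variants pick "spam" for this toy document.
names(which.max(multinomial_scores))
names(which.max(bernoulli_scores))
```

In this sketch the repeated token "free" counts twice under the multinomial scoring and once under the Bernoulli scoring, which is the practical difference between the two settings.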

top.k

Optional Argument.
Specifies the number of most likely prediction categories to output with their log-likelihood values (for example, the top 10 most likely prediction categories). The default is all prediction categories.
Types: integer
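
As a rough illustration of what keeping the top k categories per document means, the following base R sketch ranks categories by log-likelihood within each document and keeps the first k. The data frame, its column names (doc_id, category, loglik), and the helper top_k_categories() are hypothetical and only mimic the idea; they are not the function's actual output schema.

```r
# Hypothetical per-document scores, one row per (document, category).
scores <- data.frame(
  doc_id   = c(1, 1, 1, 2, 2, 2),
  category = c("billing", "delivery", "quality",
               "billing", "delivery", "quality"),
  loglik   = c(-4.2, -1.3, -2.8, -0.9, -3.5, -3.1),
  stringsAsFactors = FALSE
)

# Keep the k highest log-likelihood categories for each document.
top_k_categories <- function(scores, k) {
  # Sort by document, then by descending log-likelihood.
  ordered <- scores[order(scores$doc_id, -scores$loglik), ]
  # Rank rows within each document (1 = most likely).
  rank_in_doc <- ave(ordered$loglik, ordered$doc_id, FUN = seq_along)
  out <- ordered[rank_in_doc <= k, ]
  rownames(out) <- NULL
  out
}

top2 <- top_k_categories(scores, k = 2)
print(top2)
```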

model.token.column

Optional Argument.
Specifies the name of the object column that contains the tokens. The default value is the first column of object.
Types: character

model.category.column

Optional Argument.
Specifies the name of the object column that contains the prediction categories. The default value is the second column of object.
Types: character

model.prob.column

Optional Argument.
Specifies the name of the object column that contains the token probability values (the 'prob' column of the trainer output in the Examples below). The default value is the third column of object.
Types: character

output.prob

Optional Argument.
Specifies whether to output probabilities.
Default Value: FALSE
Types: logical
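
One common way to turn per-category log-likelihoods into probabilities is to exponentiate and normalize them. The base R sketch below shows that conversion under the assumption that the scores are unnormalized log-likelihoods; whether the SQL Engine function computes its probabilities in exactly this way is not stated in this reference. The helper loglik_to_prob() and the sample values are hypothetical.

```r
# Convert log-likelihoods to probabilities that sum to 1.
# Subtracting the maximum first avoids numeric underflow in exp().
loglik_to_prob <- function(loglik) {
  shifted <- loglik - max(loglik)
  exp(shifted) / sum(exp(shifted))
}

# Hypothetical log-likelihoods for one document.
loglik <- c(billing = -4.2, delivery = -1.3, quality = -2.8)
prob <- loglik_to_prob(loglik)
print(round(prob, 3))
```

The ordering of the categories is unchanged by this conversion, so the most likely category is the same whether log-likelihoods or probabilities are reported.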

responses

Optional Argument.
Specifies a list of responses to output.
Types: character OR vector of Strings (character)

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to copy to the output. By default, the function copies no input columns to the output.
Types: character OR vector of Strings (character)

...

Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local-order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character OR vector of Strings (character)

  • "<input.data.arg.name>.hash.column" accepts character OR vector of Strings (character)

  • "<input.data.arg.name>.order.column" accepts character OR vector of Strings (character)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
tdplyr supports these generic arguments only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
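
The generic arguments described above are passed alongside the regular arguments. The sketch below wraps such a call in a hypothetical helper, predict_and_persist(), so that the naming pattern is visible: newdata.partition.column is the "<input.data.arg.name>.partition.column" form for the newdata argument. It assumes tdplyr is loaded, a Vantage context exists, and the column names 'token' and 'doc_id' match those used in the Examples section.

```r
# Hypothetical helper. Assumes library(tdplyr) has been loaded and a
# Vantage connection context has been created before it is called.
predict_and_persist <- function(model, test_tbl, model.type = "MULTINOMIAL") {
  td_naivebayes_textclassifier_predict_sqle(
    object = model,
    newdata = test_tbl,
    input.token.column = "token",
    doc.id.columns = "doc_id",
    model.type = model.type,
    # Generic argument: partition the newdata input by document identifier.
    newdata.partition.column = "doc_id",
    # Generic argument: keep the result table after the session ends.
    persist = TRUE
  )
}
```

With the objects from the Examples section, a call would look like predict_and_persist(nbt_out$result, complaints_tokens_test, model.type = "Bernoulli"), and the prediction table would be available as the result member of the returned object.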

Value

The function returns an object of class "td_naivebayes_textclassifier_predict_sqle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("naivebayes_textclassifier_predict_example", "token_table",
                    "complaints_tokens_test")
    
    # Create tbl_teradata object.
    token_table <- tbl(con, "token_table")
    complaints_tokens_test <- tbl(con, "complaints_tokens_test")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Create a model which is output of 
    # td_naivebayes_textclassifier_trainer_sqle() function.
    nbt_out <- td_naivebayes_textclassifier_trainer_sqle(
                data = token_table,
                token.column = 'token',
                doc.id.column = 'doc_id',
                doc.category.column = 'category',
                model.type = "Bernoulli",
                data.partition.column = 'category')
    
    # Example: Run td_naivebayes_textclassifier_predict_sqle() on the model
    #          generated by td_naivebayes_textclassifier_trainer_sqle(),
    #          where model.type is "Bernoulli".
    nbt_predict_out <- td_naivebayes_textclassifier_predict_sqle(
                        object = nbt_out$result,
                        newdata = complaints_tokens_test,
                        input.token.column = 'token',
                        doc.id.columns = 'doc_id',
                        model.type = "Bernoulli",
                        model.token.column = 'token',
                        model.category.column = 'category',
                        model.prob.column = 'prob',
                        newdata.partition.column = 'doc_id')
    
    # Print the result.
    print(nbt_predict_out$result)
    
    # Alternatively, use the S3 predict() function to run the prediction on
    # the output of the td_naivebayes_textclassifier_trainer_sqle() function.
    
    nbt_predict_out <- predict(
                         nbt_out,
                         newdata = complaints_tokens_test,
                         input.token.column = 'token',
                         doc.id.columns = 'doc_id',
                         model.type = "Bernoulli",
                         model.token.column = 'token',
                         model.category.column = 'category',
                         model.prob.column = 'prob',
                         newdata.partition.column = 'doc_id')
    
    # Print the result.
    print(nbt_predict_out$result)