Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Language: English (United States)
Last Update: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

NaiveBayesTextClassifierPredict

Description

The td_naivebayes_textclassifier_predict_mle_sqle() function uses the model generated by the td_naivebayes_textclassifier_mle() function to predict the outcomes for a test set of data.

Usage

  td_naivebayes_textclassifier_predict_mle_sqle (
      object = NULL,
      newdata = NULL,
      input.token.column = NULL,
      doc.id.columns = NULL,
      model.type = 'MULTINOMIAL',
      top.k = NULL,
      model.token.column = NULL,
      model.category.column = NULL,
      model.prob.column = NULL,
      output.prob = FALSE,
      responses = NULL,
      accumulate = NULL,
      ...
  )

Arguments

object

Required Argument.
Specifies either the tbl_teradata that contains the model data generated by the td_naivebayes_textclassifier_mle() function, or an object of class "td_naivebayes_textclassifier_mle" returned by that function.
Types: tbl_teradata or td_naivebayes_textclassifier_mle

newdata

Required Argument.
Specifies the tbl_teradata containing the input test data.
Types: tbl_teradata

input.token.column

Required Argument.
Specifies the name of the newdata column that contains the tokens.
Types: character

doc.id.columns

Required Argument.
Specifies the names of the newdata columns that contain the document identifier.
Types: character OR vector of Strings (character)

model.type

Optional Argument.
Specifies the model type of the text classifier.
Permitted Values: 'MULTINOMIAL', 'BERNOULLI'
Default Value: 'MULTINOMIAL'
Types: character

top.k

Optional Argument.
Specifies the number of most likely prediction categories to output with their log-likelihood values (for example, the top 10 most likely prediction categories). The default is all prediction categories.
Types: integer
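
As an illustration of what "most likely prediction categories with their log-likelihood values" means, the following is a minimal sketch in plain R of multinomial naive Bayes scoring followed by a top-k cut. It is not the SQL Engine implementation: the toy model table, its probabilities, the equal class priors, and the helper name score_category are all assumptions made for the example.

```r
# Toy model: per-category token probabilities (hypothetical values).
model <- data.frame(
  token    = c("refund", "late", "refund", "late"),
  category = c("billing", "billing", "delivery", "delivery"),
  prob     = c(0.7, 0.3, 0.2, 0.8),
  stringsAsFactors = FALSE
)

# Tokens of a single test document.
doc_tokens <- c("late", "late", "refund")

# Sum the log-probabilities of the document's tokens for one category.
# Equal class priors are assumed, so the prior term is omitted.
score_category <- function(cat) {
  rows <- model$category == cat
  p <- model$prob[rows]
  names(p) <- model$token[rows]
  sum(log(p[doc_tokens]))
}

categories <- unique(model$category)
loglik <- vapply(categories, score_category, numeric(1))

# Keep only the top_k categories, ranked by log-likelihood.
top_k <- 1
top <- head(sort(loglik, decreasing = TRUE), top_k)
print(top)
```

With top_k set to the number of categories, every category is returned, which mirrors the documented default of top.k.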

model.token.column

Optional Argument.
Specifies the name of the object column that contains the tokens. The default value is the first column of object.
Types: character

model.category.column

Optional Argument.
Specifies the name of the object column that contains the prediction categories. The default value is the second column of object.
Types: character

model.prob.column

Optional Argument.
Specifies the name of the object column that contains the token counts. The default value is the third column of object.
Types: character

output.prob

Optional Argument.
Specifies whether to output probabilities.
Default Value: FALSE
Types: logical

responses

Optional Argument.
Specifies a list of responses to output.
Types: character OR vector of Strings (character)

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to copy to the output. By default, the function copies no input columns to the output.
Types: character OR vector of Strings (character)

...

Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character OR vector of Strings (character)

  • "<input.data.arg.name>.hash.column" accepts character OR vector of Strings (character)

  • "<input.data.arg.name>.order.column" accepts character OR vector of Strings (character)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
tdplyr supports these generic arguments only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
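
The generic arguments are passed alongside the function's own arguments. The following is a minimal sketch, assuming a test table with doc_id and token columns like the one used in the Examples section. The wrapper name predict_persisted and its parameters are hypothetical, and the call itself needs an active Vantage connection.

```r
# Hypothetical wrapper: `model` is the object returned by
# td_naivebayes_textclassifier_mle(), `test_tokens` is a tbl_teradata.
predict_persisted <- function(model, test_tokens, model.type = "MULTINOMIAL") {
  tdplyr::td_naivebayes_textclassifier_predict_mle_sqle(
    object = model,
    newdata = test_tokens,
    # Generic argument: "<input.data.arg.name>.partition.column" for newdata.
    newdata.partition.column = "doc_id",
    input.token.column = "token",
    doc.id.columns = "doc_id",
    model.type = model.type,
    # Generic keyword argument: keep the result table after the session ends.
    persist = TRUE
  )
}
```

Setting volatile = TRUE instead of persist = TRUE would, per the descriptions above, store the result in a volatile table.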

Value

The function returns an object of class "td_naivebayes_textclassifier_predict_mle_sqle", which is a named list containing an object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s): result
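
A short sketch of working with the returned list follows. It assumes the result member supports head() and as.data.frame() in the way other dbplyr-style lazy tables do. The helper name preview_predictions is hypothetical.

```r
# Sketch only: `predict_out` is assumed to be the value returned by
# td_naivebayes_textclassifier_predict_mle_sqle().
preview_predictions <- function(predict_out, n = 5) {
  # Reference the named list member with the "$" operator.
  result_tbl <- predict_out$result
  # Limit to the first n rows, then bring them into a local data.frame.
  as.data.frame(head(result_tbl, n))
}
```

Calling preview_predictions(predict_out) would then return at most five rows of the prediction table as a local data.frame.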

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("naivebayes_textclassifier_predict_example", "token_table",
                    "complaints_tokens_test")

    # Create object(s) of class "tbl_teradata".
    token_table <- tbl(con, "token_table")
    complaints_tokens_test <- tbl(con, "complaints_tokens_test")

    # Example -
    # Create the model
    textclassifier_out <- td_naivebayes_textclassifier_mle(
                            data = token_table,
                            data.partition.column = c("category"),
                            token.column = "token",
                            doc.id.columns = c("doc_id"),
                            doc.category.column = "category",
                            model.type = "Bernoulli"
                            )

    # Predict the output
    predict_out <- td_naivebayes_textclassifier_predict_mle_sqle(
                    newdata = complaints_tokens_test,
                    object = textclassifier_out,
                    newdata.partition.column = "doc_id",
                    input.token.column = "token",
                    doc.id.columns = c("doc_id"),
                    model.type = "Bernoulli",
                    top.k = 1
                    )
    
    # Print the result.
    print(predict_out$result)