Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Last Edition: 2024-05-03
Product Category: Teradata Vantage

ClassificationEvaluator

Description

In classification problems, a confusion matrix is used to visualize the performance of a classifier. The confusion matrix represents predicted labels along the row axis and actual labels along the column axis.
Each cell in the confusion matrix holds the number of test rows with the corresponding combination of predicted and actual label. The td_classification_evaluator_sqle() function evaluates a classification model based on its predictions on the data and emits various metrics.
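As a minimal local illustration of the orientation described above (predicted labels on rows, actual labels on columns), the base R sketch below builds a confusion matrix with table(). The label vectors are made-up toy data, not output of td_classification_evaluator_sqle(), and the computation runs in the R session rather than in Vantage.

```r
# Hypothetical toy labels for illustration; not taken from the example data.
observed  <- c("crash", "crash", "no_crash", "no_crash", "no_crash", "crash")
predicted <- c("crash", "no_crash", "no_crash", "no_crash", "crash", "crash")

# Give both vectors the same levels so the matrix is square and every
# class appears even if it is never predicted.
lvls <- c("crash", "no_crash")

# Rows = predicted labels, columns = observed (actual) labels.
cm <- table(Predicted = factor(predicted, levels = lvls),
            Observed  = factor(observed,  levels = lvls))
print(cm)

# Accuracy: share of rows on the diagonal (predicted == observed).
accuracy <- sum(diag(cm)) / sum(cm)
```

Each diagonal cell counts correct predictions for one class; off-diagonal cells count rows where the predicted label (row) differs from the actual label (column).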
In addition to accuracy, the secondary output data contains micro-, macro-, and weighted-averaged precision, recall, and F1-score values.
Notes:

  • The function also works for multi-class problems. In all cases, the primary output data contains class-level metrics, and the secondary output data contains metrics that apply across all classes.

  • The function works only when the columns specified in 'observation.column' and 'prediction.column' have the same Teradata data type.
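The class-level and averaged metrics can be sketched locally in base R. The helper classification_metrics() below is hypothetical and is not part of tdplyr. It assumes the common definitions (macro: unweighted mean over classes; weighted: mean weighted by each class's observed count; micro: pooled counts, which equals accuracy for single-label data). The SQLE function may differ in details such as how it treats zero denominators.

```r
# Hypothetical helper; not a tdplyr function. Uses common definitions that
# may differ in detail from td_classification_evaluator_sqle().
classification_metrics <- function(observed, predicted, labels) {
  # Rows = predicted labels, columns = observed labels.
  cm <- table(Predicted = factor(predicted, levels = labels),
              Observed  = factor(observed,  levels = labels))
  tp      <- diag(cm)        # correct predictions per class
  n_pred  <- rowSums(cm)     # how often each class was predicted
  support <- colSums(cm)     # how often each class was observed

  # Return 0 instead of NaN when a class is never predicted or observed.
  safe_div <- function(num, den) ifelse(den == 0, 0, num / den)

  # Class-level metrics (one value per label).
  precision <- safe_div(tp, n_pred)
  recall    <- safe_div(tp, support)
  f1        <- safe_div(2 * precision * recall, precision + recall)

  w     <- support / sum(support)  # weights for the weighted average
  micro <- sum(tp) / sum(cm)       # pooled counts; equals accuracy here

  list(
    class    = data.frame(label = labels, precision = unname(precision),
                          recall = unname(recall), f1 = unname(f1)),
    accuracy = sum(tp) / sum(cm),
    micro    = c(precision = micro, recall = micro, f1 = micro),
    macro    = c(precision = mean(precision), recall = mean(recall),
                 f1 = mean(f1)),
    weighted = c(precision = sum(w * precision), recall = sum(w * recall),
                 f1 = sum(w * f1))
  )
}

m <- classification_metrics(
  observed  = c("a", "a", "a", "b", "b", "c"),
  predicted = c("a", "a", "b", "b", "c", "c"),
  labels    = c("a", "b", "c")
)
print(m$weighted)
```

For this toy input, accuracy and micro-averaged values are 4/6, macro precision is 2/3, and weighted precision is 0.75. Weighted recall always equals accuracy under these definitions, which is a quick sanity check when comparing against the secondary output data.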

Usage

  td_classification_evaluator_sqle (
      data = NULL,
      observation.column = NULL,
      prediction.column = NULL,
      num.labels = NULL,
      labels = NULL,
      ...
  )

Arguments

data

Required Argument.
Specifies the tbl_teradata containing the observed (actual) and predicted labels.
Types: tbl_teradata

observation.column

Required Argument.
Specifies the column name in "data" containing observation labels.
Types: character

prediction.column

Required Argument.
Specifies the column name in "data" containing predicted labels.
Types: character

num.labels

Optional Argument.
Specifies the number of labels in the dataset.
Note:
This argument is ignored if the "labels" argument is used.
Allowed Values: 1 <= num.labels <= 512
Types: integer

labels

Optional Argument.
Specifies the list of all predicted labels in the input.
Provide either the "num.labels" argument or the "labels" argument.
Types: character OR vector of Strings (character)
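If the label values are not known in advance, one option is to take the union of the observed and predicted values before calling the function, so that no class is left out. The sketch below assumes both columns have already been brought into R as character vectors (for example, with dplyr's collect()); the vectors shown are hypothetical.

```r
# Hypothetical local vectors; in practice these might be collected from the
# observation and prediction columns of the input tbl_teradata.
observed  <- c("crash", "no_crash", "no_crash", "crash")
predicted <- c("crash", "crash", "no_crash", "no_crash")

# Union of observed and predicted values, sorted for a stable order.
labels <- sort(unique(c(observed, predicted)))
num_labels <- length(labels)

# Check the documented range for "num.labels": 1 <= num.labels <= 512.
stopifnot(num_labels >= 1, num_labels <= 512)
print(labels)
```

The resulting vector could then be passed as the "labels" argument; passing only its length as "num.labels" is the alternative the function documents.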

...

Specifies the generic keyword arguments that SQLE functions accept. The generic keyword arguments are:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to store the results of the function in a volatile table. When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or locally order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
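For this function, the only tbl_teradata input is "data", so the name patterns listed above expand as shown in the base R sketch below. The sketch only builds the names; whether each argument is actually accepted depends on the underlying SQL Engine function, as the note states.

```r
# Illustrative only: expand the documented name patterns for the single
# tbl_teradata input of td_classification_evaluator_sqle(), named "data".
input_arg <- "data"
generic_args <- c(
  paste0(input_arg, ".partition.column"),  # character or character vector
  paste0(input_arg, ".hash.column"),       # character or character vector
  paste0(input_arg, ".order.column"),      # character or character vector
  paste0("local.order.", input_arg)        # logical
)
print(generic_args)
```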

Value

The function returns an object of class "td_classification_evaluator_sqle", which is a named list containing objects of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):

  1. result

  2. output.data

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Example 1: Evaluate a classification model that predicts the labels
    #            'crash' and 'no_crash', using its predictions on the data.
    
    # Load the example data.
    loadExampleData("textparser_example", "complaints", "stop_words")
    
    # Create tbl_teradata object.
    complaints <- tbl(con, "complaints")
    stop_words <- tbl(con, "stop_words")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Tokenize the "text_data" column and accumulate the "doc_id" and "category" columns.
    complaints_tokenized <- td_text_parser_sqle(data=complaints,
                                                text.column="text_data",
                                                object=stop_words,
                                                remove.stopwords=TRUE,
                                                accumulate=c("doc_id",
                                                             "category"))
    
    # Calculate the conditional probabilities for token-category pairs.
    NBTCTrainer_out <- td_naive_bayes_text_classifier_trainer_sqle(
                        data=complaints_tokenized$result,
                        token.column="token",
                        doc.category.column="category")
    
    # Print the result tbl_teradata objects.
    print(NBTCTrainer_out$result)
    print(NBTCTrainer_out$model.data)
    
    # Score the data using td_naive_bayes_text_classifier_predict_sqle()
    # on the model generated by td_naive_bayes_text_classifier_trainer_sqle(),
    # where model_type is "MULTINOMIAL".
    nbt_predict_out <- td_naive_bayes_text_classifier_predict_sqle(
                        object = NBTCTrainer_out$model.data,
                        newdata = complaints_tokenized$result,
                        input.token.column = 'token',
                        accumulate="category",
                        doc.id.columns = 'doc_id')
    
    # Print the result.
    print(nbt_predict_out$result)
    
    # Convert the "prediction" and "category" columns to the same data type.
    # Keep the data type string on one line so that it contains no embedded
    # newline or extra spaces.
    predicted_data <- td_convert_to_sqle(data = nbt_predict_out$result,
                                         target.columns = c("category",
                                                            "prediction"),
                                         target.datatype =
                                           c("VARCHAR(charlen=20,charset=UNICODE,casespecific=NO)"))
    
    # Evaluate classification.
    ClassificationEvaluator_obj <- td_classification_evaluator_sqle(
                                    data=predicted_data$result,
                                    observation.column='category',
                                    prediction.column='prediction',
                                    labels=c('no_crash','crash'))
    
    # Print the result tbl_teradata objects.
    print(ClassificationEvaluator_obj$result)
    print(ClassificationEvaluator_obj$output.data)