Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Last Edition: 2024-05-03
Product Category: Teradata Vantage

ROC

Description

The td_roc_sqle function accepts a set of prediction-actual pairs for a binary classification model and calculates the following values for a range of discrimination thresholds:

  • True positive rate (TPR)

  • False positive rate (FPR)

  • The area under the ROC curve (AUC)

  • Gini coefficient

A ROC curve shows the performance of a binary classification model as its discrimination threshold varies. For a range of thresholds, the curve plots the true positive rate against the false positive rate.
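
The quantities listed above can be illustrated locally in base R. The following is a minimal sketch, not the SQL Engine implementation: the helper name roc_points, the toy vectors, the ">= threshold" decision rule, the trapezoidal AUC, and the relation Gini = 2 * AUC - 1 (a common convention for classifiers) are all assumptions made for illustration.

```r
# Hypothetical helper (not part of tdplyr): TPR and FPR for each threshold.
roc_points <- function(prob, actual, positive = "1", num_thresholds = 50) {
  # Evenly spaced thresholds between 0 and 1 (end points included by assumption).
  thresholds <- seq(0, 1, length.out = num_thresholds)
  is_pos <- as.character(actual) == positive
  # A row is predicted positive when its probability is >= the threshold.
  tpr <- sapply(thresholds, function(t) sum(prob >= t & is_pos) / sum(is_pos))
  fpr <- sapply(thresholds, function(t) sum(prob >= t & !is_pos) / sum(!is_pos))
  data.frame(threshold = thresholds, tpr = tpr, fpr = fpr)
}

# Toy prediction-actual pairs, made up for illustration.
prob   <- c(0.95, 0.85, 0.75, 0.65, 0.45, 0.35, 0.25, 0.15)
actual <- c("1",  "1",  "0",  "1",  "0",  "1",  "0",  "0")

pts <- roc_points(prob, actual, positive = "1", num_thresholds = 11)

# Trapezoidal area under the (FPR, TPR) curve, with points sorted by FPR.
ord <- order(pts$fpr, pts$tpr)
x <- pts$fpr[ord]
y <- pts$tpr[ord]
auc <- sum(diff(x) * (y[-length(y)] + y[-1]) / 2)

# Assumed relation between Gini and AUC for a classifier.
gini <- 2 * auc - 1

print(auc)   # 0.8125 for this toy data
print(gini)  # 0.625 for this toy data
```

With threshold 0 every row is predicted positive (TPR = FPR = 1), and with threshold 1 none is (TPR = FPR = 0), so the sketch covers both ends of the curve without adding extra points.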
Notes:

  • This function requires the UTF8 client character set for UNICODE data.

  • This function does not support Pass Through Characters (PTCs). For information about PTCs, see Teradata Vantage™ - Analytics Database International Character Set Support.

  • This function does not support KanjiSJIS or Graphic data types.

Usage

  td_roc_sqle (
      data = NULL,
      probability.column = NULL,
      observation.column = NULL,
      model.id.column = NULL,
      positive.label = NULL,
      num.thresholds = 50,
      auc = TRUE,
      gini = TRUE,
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata that contains the prediction-actual pairs for a binary classifier.
Types: tbl_teradata

probability.column

Required Argument.
Specifies the input column in "data" that contains the predictions.
Types: character

observation.column

Required Argument.
Specifies the input column in "data" that contains the actual classes.
Types: character

model.id.column

Optional Argument.
Specifies the input column in "data" that contains the model or partition identifiers for the ROC curves.
Types: character

positive.label

Required Argument.
Specifies the label of the positive class.
Types: character

num.thresholds

Optional Argument.
Specifies the number of thresholds for the function to use. The value of "num.thresholds" must be in the range [1, 10000]. The function distributes the thresholds uniformly between 0 and 1.
Default Value: 50
Types: integer
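
As a rough illustration of thresholds distributed uniformly between 0 and 1, an evenly spaced grid can be generated locally in base R. Whether the SQL Engine function includes the end points 0 and 1 is not stated here, so the end-point handling below is an assumption.

```r
# Evenly spaced thresholds between 0 and 1 (end points included by assumption).
num_thresholds <- 50L

# Enforce the documented range [1, 10000] before building the grid.
stopifnot(num_thresholds >= 1L, num_thresholds <= 10000L)

thresholds <- seq(0, 1, length.out = num_thresholds)

print(length(thresholds))  # 50
print(range(thresholds))   # 0 1
```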

auc

Optional Argument.
Specifies whether the function displays the AUC calculated from the ROC values (thresholds, false positive rates, and true positive rates).
Default Value: TRUE
Types: logical

gini

Optional Argument.
Specifies whether the function displays the Gini coefficient calculated from the ROC values.
The Gini coefficient measures inequality among the values of a frequency distribution. A Gini coefficient of 0 indicates that all values are the same. The closer the Gini coefficient is to 1, the more unequal the values in the distribution are.
Default Value: TRUE
Types: logical
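
The description of the Gini coefficient as an inequality measure can be checked with a small base R sketch. The helper name gini_coefficient and the mean-absolute-difference formula are assumptions chosen for illustration; this is not a statement of how td_roc_sqle derives its Gini value from the ROC values.

```r
# Hypothetical helper: Gini coefficient of non-negative values, computed as the
# sum of absolute differences over all ordered pairs divided by 2 * n^2 * mean.
gini_coefficient <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

equal_values   <- gini_coefficient(c(5, 5, 5, 5))   # 0: all values are the same
unequal_values <- gini_coefficient(c(0, 0, 0, 10))  # 0.75: one value holds everything

print(equal_values)
print(unequal_values)
```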

...

Specifies the generic keyword arguments that SQLE functions accept. The generic keyword arguments are:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table. When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
tdplyr supports these generic arguments only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
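
For td_roc_sqle, the only argument in the Usage section that accepts a tbl_teradata is "data", so the naming pattern above expands to the names built below. This is a sketch of the naming pattern only: the helper generic_arg_names is hypothetical, and whether the underlying SQL Engine function accepts each of these arguments is not confirmed here.

```r
# Hypothetical helper: expand the documented naming pattern for one input argument.
generic_arg_names <- function(input_arg) {
  c(
    paste0(input_arg, ".partition.column"),
    paste0(input_arg, ".hash.column"),
    paste0(input_arg, ".order.column"),
    paste0("local.order.", input_arg)
  )
}

# Names that the pattern yields for the "data" argument of td_roc_sqle.
roc_generic_args <- generic_arg_names("data")
print(roc_generic_args)
```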

Value

The function returns an object of class "td_roc_sqle", which is a named list containing objects of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):

  1. result

  2. output.data

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("tdplyr_example", "roc_input")
    
    # Create tbl_teradata object.
    roc_input <- tbl(con, "roc_input")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1 : Calculating True Positive Rate (TPR), False Positive Rate (FPR),
    #             Area Under the ROC Curve (AUC), and Gini Coefficient for a range
    #             of discrimination thresholds.
    roc_out <- td_roc_sqle(
                probability.column="probability",
                observation.column="observation",
                model.id.column="model_id",
                positive.label="1",
                data=roc_input)
    
    
    # Print the result.
    print(roc_out$result)
    print(roc_out$output.data)