ClassificationEvaluator
Description
In classification problems, a confusion matrix is used to visualize the
performance of a classifier. The confusion matrix contains predicted labels
along the row axis and actual labels along the column axis.
Each cell holds the number of rows in the test data that have a given
combination of predicted and actual label. The td_classification_evaluator_sqle()
function evaluates a classification model based on its predictions on the data
and emits various metrics.
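The layout described above can be reproduced locally with base R's table(). This is only an illustration with made-up label vectors (the vectors and the level order are assumptions); it does not call td_classification_evaluator_sqle() or touch Vantage.

```r
# Hypothetical actual (observed) and predicted labels for six test rows.
actual    <- c("crash", "crash", "no_crash", "no_crash", "no_crash", "crash")
predicted <- c("crash", "no_crash", "no_crash", "crash", "no_crash", "crash")

# Fix the level order so that rows and columns list the classes identically.
lv <- c("crash", "no_crash")

# Rows = predicted labels, columns = actual labels, cells = row counts.
cm <- table(Predicted = factor(predicted, levels = lv),
            Actual    = factor(actual, levels = lv))
print(cm)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Reading down a column shows how rows of one actual class were predicted; reading across a row shows what the rows predicted as one class actually were.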
In addition to accuracy, the secondary output data contains micro-, macro-,
and weighted-averaged precision, recall, and F1-score.
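As a rough illustration of what these averages mean, the base-R sketch below computes class-level and averaged metrics from a hypothetical three-class confusion matrix using the common textbook definitions. It is an assumption that td_classification_evaluator_sqle() uses exactly these formulas (in particular, macro F1 here is taken as the mean of the class-level F1 scores); compare against the function's output before relying on the correspondence.

```r
# Hypothetical 3-class confusion matrix (rows = predicted, columns = actual),
# filled row by row.
lv <- c("a", "b", "c")
cm <- matrix(c(5, 1, 0,
               2, 3, 1,
               0, 1, 4),
             nrow = 3, byrow = TRUE,
             dimnames = list(Predicted = lv, Actual = lv))

tp      <- diag(cm)      # correctly predicted rows per class
pred_n  <- rowSums(cm)   # number of rows predicted as each class
support <- colSums(cm)   # number of rows that actually belong to each class

# Class-level metrics (NaN if a class is never predicted or never occurs).
precision <- tp / pred_n
recall    <- tp / support
f1        <- 2 * precision * recall / (precision + recall)

accuracy <- sum(tp) / sum(cm)

# Micro averaging pools the counts over all classes before dividing.
micro_precision <- sum(tp) / sum(pred_n)
micro_recall    <- sum(tp) / sum(support)
micro_f1 <- 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

# Macro averaging takes the unweighted mean of the class-level values.
macro <- c(precision = mean(precision), recall = mean(recall), f1 = mean(f1))

# Weighted averaging weights each class by its share of the actual labels.
w <- support / sum(support)
weighted <- c(precision = sum(w * precision),
              recall    = sum(w * recall),
              f1        = sum(w * f1))

print(round(rbind(precision, recall, f1), 3))
print(round(c(accuracy = accuracy, micro_f1 = micro_f1), 3))
print(round(rbind(macro, weighted), 3))
```

For single-label data, micro-averaged precision, recall, and F1 all equal accuracy, and so does weighted recall; the macro and weighted precision and F1 figures are usually the more informative ones on imbalanced data.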
Notes:
The function also works for multi-class classification. In either case, the primary output data contains class-level metrics, whereas the secondary output data contains metrics computed across classes.
The function works only when the columns specified in 'observation.column' and 'prediction.column' have the same Teradata data type.
Usage
td_classification_evaluator_sqle (
data = NULL,
observation.column = NULL,
prediction.column = NULL,
num.labels = NULL,
labels = NULL,
...
)
Arguments
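data | Required Argument. Specifies the input tbl_teradata containing the actual and predicted labels.
observation.column | Required Argument. Specifies the name of the column in 'data' that contains the actual (observed) labels.
prediction.column | Required Argument. Specifies the name of the column in 'data' that contains the predicted labels.
num.labels | Optional Argument. Specifies the total number of labels.
labels | Optional Argument. Specifies the labels to evaluate, for example c('no_crash', 'crash').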
... | Specifies the generic keyword arguments that SQLE functions accept (for example, volatile). The function also allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:
Value
Function returns an object of class "td_classification_evaluator_sqle",
which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the names:
result
output.data
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Example 1: Evaluate the classification model generated to predict the labels
# 'crash' and 'no_crash' using the predicted data.
# Load the example data.
loadExampleData("textparser_example", "complaints", "stop_words")
# Create tbl_teradata object.
complaints <- tbl(con, "complaints")
stop_words <- tbl(con, "stop_words")
# Check the list of available analytic functions.
display_analytic_functions()
# Tokenize the "text_data" column and accumulate the result by "doc_id" and "category".
complaints_tokenized <- td_text_parser_sqle(data=complaints,
text.column="text_data",
object=stop_words,
remove.stopwords=TRUE,
accumulate=c("doc_id",
"category"))
# Calculate the conditional probabilities for token-category pairs.
NBTCTrainer_out <- td_naive_bayes_text_classifier_trainer_sqle(
data=complaints_tokenized$result,
token.column="token",
doc.category.column="category")
# Print the result tbl_teradata objects.
print(NBTCTrainer_out$result)
print(NBTCTrainer_out$model.data)
# Score the data using td_naive_bayes_text_classifier_predict_sqle()
# on the model generated by td_naive_bayes_text_classifier_trainer_sqle(),
# whose model type is "MULTINOMIAL".
nbt_predict_out <- td_naive_bayes_text_classifier_predict_sqle(
object = NBTCTrainer_out$model.data,
newdata = complaints_tokenized$result,
input.token.column = 'token',
accumulate="category",
doc.id.columns = 'doc_id')
# Print the result.
print(nbt_predict_out$result)
# Convert the "category" and "prediction" columns to the same data type.
# Keep the type string on one line so that it contains no embedded newline.
predicted_data <- td_convert_to_sqle(data = nbt_predict_out$result,
target.columns = c("category",
"prediction"),
target.datatype =
c("VARCHAR(charlen=20,charset=UNICODE,casespecific=NO)"))
# Evaluate classification.
ClassificationEvaluator_obj <- td_classification_evaluator_sqle(
data=predicted_data$result,
observation.column='category',
prediction.column='prediction',
labels=c('no_crash','crash'))
# Print the result tbl_teradata objects.
print(ClassificationEvaluator_obj$result)
print(ClassificationEvaluator_obj$output.data)