TD_ClassificationEvaluator Function | ClassificationEvaluator - TD_ClassificationEvaluator - Analytics Database

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-10-04
Product Category: Teradata Vantage™

In classification problems, a confusion matrix summarizes the performance of a classifier. In the confusion matrix, predicted labels run along the row axis and actual labels run along the column axis. Each cell holds the number of test observations that have the corresponding combination of predicted and actual labels.
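The layout described above can be sketched in a few lines of Python. This is an illustration only, not the function's implementation: the helper name confusion_matrix, the sample labels, and the nested-dictionary representation are all assumptions made for the example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested dict where matrix[predicted_label][actual_label] is a count."""
    counts = Counter(zip(predicted, actual))
    return {p: {a: counts[(p, a)] for a in labels} for p in labels}

# Hypothetical test data, for illustration only.
actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "bird"]
labels = ["bird", "cat", "dog"]

matrix = confusion_matrix(actual, predicted, labels)

# Print with predicted labels as rows and actual labels as columns.
print("predicted/actual", *labels, sep="\t")
for p in labels:
    print(p, *(matrix[p][a] for a in labels), sep="\t")
```

The diagonal cells count correct predictions, and the cell counts sum to the number of test observations.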

The TD_ClassificationEvaluator function also supports multi-class scenarios. In every case, the primary output table contains class-level metrics, and the secondary output table contains metrics that apply across all classes.

In addition to accuracy, the secondary output table returns micro-averaged, macro-averaged, and weighted-averaged precision, recall, and F1-score.
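To show how class-level metrics relate to the averaged ones, here is a minimal Python sketch using the standard textbook definitions. It is not the function's implementation. The helper names, the sample data, and the convention of returning 0.0 when a denominator is zero are assumptions made for the example.

```python
def per_class_counts(actual, predicted, labels):
    """Count true positives, false positives, and false negatives per class."""
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in labels}
    for a, p in zip(actual, predicted):
        if a == p:
            counts[a]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gains a false positive
            counts[a]["fn"] += 1  # actual class gains a false negative
    return counts

def prf(tp, fp, fn):
    """Precision, recall, F1; a zero denominator yields 0.0 (assumed convention)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical test data, for illustration only.
actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "bird"]
labels = ["bird", "cat", "dog"]

counts = per_class_counts(actual, predicted, labels)

# Class-level metrics: one (precision, recall, F1) triple per class.
class_metrics = {c: prf(**counts[c]) for c in labels}
# Support: number of actual observations of each class.
support = {c: counts[c]["tp"] + counts[c]["fn"] for c in labels}
total = len(actual)

# Cross-class metrics.
accuracy = sum(counts[c]["tp"] for c in labels) / total
# Micro: pool the counts over all classes, then compute each metric once.
micro = prf(sum(counts[c]["tp"] for c in labels),
            sum(counts[c]["fp"] for c in labels),
            sum(counts[c]["fn"] for c in labels))
# Macro: unweighted mean of the class-level metrics.
macro = tuple(sum(class_metrics[c][i] for c in labels) / len(labels)
              for i in range(3))
# Weighted: mean of the class-level metrics, weighted by class support.
weighted = tuple(sum(class_metrics[c][i] * support[c] for c in labels) / total
                 for i in range(3))

print("accuracy:", accuracy)
print("micro   :", micro)
print("macro   :", macro)
print("weighted:", weighted)
```

When every observation has exactly one label, as here, micro-averaged precision, recall, and F1-score all equal accuracy. Macro averaging treats every class equally, while weighted averaging lets frequent classes dominate.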

Classification is a type of machine learning task where the goal is to predict a categorical variable, or class label, from a set of input features. The algorithm learns to classify new observations by training on a labeled dataset, where the class labels are already known. A common classification algorithm is logistic regression, which models the probability of an event by expressing the log-odds of the event as a linear combination of one or more independent variables.
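The log-odds relationship in logistic regression can be illustrated with a short Python sketch. The weights, intercept, and feature values below are made-up numbers, and the 0.5 decision threshold is an assumption chosen for the example rather than anything this section specifies.

```python
import math

def log_odds(features, weights, intercept):
    """Linear combination of the independent variables."""
    return intercept + sum(w * x for w, x in zip(weights, features))

def probability(features, weights, intercept):
    """Map the log-odds to a probability with the logistic (sigmoid) function."""
    z = log_odds(features, weights, intercept)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients and one observation.
weights = [0.8, -1.5]
intercept = 0.2
x = [1.0, 0.5]

z = log_odds(x, weights, intercept)
p = probability(x, weights, intercept)

# Taking log(p / (1 - p)) recovers the linear combination.
recovered = math.log(p / (1.0 - p))

# Assumed decision rule: predict the event when p >= 0.5.
predicted_label = 1 if p >= 0.5 else 0

print(z, p, recovered, predicted_label)
```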

In the classification process, a model is trained on a dataset consisting of input features and their corresponding categorical labels. The model estimates the probability of an event occurring based on a given set of input features. Classification algorithms include decision trees, logistic regression, and naïve Bayes, among others.

Various metrics can be used to evaluate the performance of a classification model. The most commonly used are accuracy, precision, recall, F1-score, and the ROC curve.

One crucial aspect of classification is selecting the appropriate features. Too many features can lead to overfitting, where the model performs well on the training data but poorly on the test data. On the other hand, too few features can lead to underfitting, where the model fails to capture the underlying patterns in the data.