TD_ClassificationEvaluator Function

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

In classification problems, a confusion matrix summarizes the performance of a classifier. Predicted labels run along the row axis and actual labels run along the column axis. Each cell holds the number of observations in the test data that have the corresponding combination of predicted and actual labels.
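
The layout described above can be illustrated with a minimal Python sketch. This is not the function's implementation; the helper name `confusion_matrix` and the sample labels are hypothetical, chosen only to show predicted labels on rows and actual labels on columns.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return matrix[predicted_label][actual_label] -> count (hypothetical helper)."""
    # Count every (predicted, actual) pair seen in the test data.
    pair_counts = Counter(zip(predicted, actual))
    # Rows are predicted labels, columns are actual labels.
    return {p: {a: pair_counts[(p, a)] for a in labels} for p in labels}

# Illustrative test data, not taken from the documentation.
actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels = ["bird", "cat", "dog"]

matrix = confusion_matrix(actual, predicted, labels)
for p in labels:
    print(p, [matrix[p][a] for a in labels])
```

Cells on the diagonal count correct predictions; every other cell counts one kind of misclassification.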

TD_ClassificationEvaluator also handles multi-class problems. In every case, the primary output table contains class-level metrics, and the secondary output table contains metrics that apply across classes.
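
As a rough illustration of what class-level metrics look like, the sketch below computes precision, recall, F1-score, and support for each class using the standard definitions. The helper name `class_metrics`, the sample data, and the choice of columns are assumptions for illustration; the exact columns of the primary output table may differ.

```python
def class_metrics(actual, predicted, labels):
    """Compute per-class precision, recall, F1 and support (hypothetical helper)."""
    rows = {}
    pairs = list(zip(actual, predicted))
    for label in labels:
        tp = sum(1 for a, p in pairs if a == label and p == label)  # true positives
        fp = sum(1 for a, p in pairs if a != label and p == label)  # false positives
        fn = sum(1 for a, p in pairs if a == label and p != label)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows[label] = {"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn}
    return rows

# Illustrative three-class test data, not taken from the documentation.
actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]

per_class = class_metrics(actual, predicted, ["bird", "cat", "dog"])
for label, row in per_class.items():
    print(label, row)
```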

In addition to accuracy, the secondary output table returns micro-, macro-, and weighted-averaged values of precision, recall, and F1-score.
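
The three averaging schemes differ only in how per-class results are combined. The sketch below uses the conventional definitions: micro pools the counts of all classes, macro takes the unweighted mean of the per-class values, and weighted takes the mean weighted by each class's support. The helper names and the per-class counts are hypothetical, and the sketch covers precision and recall only; it assumes, without confirmation from the source, that the function follows these conventional definitions.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def averaged_metrics(counts):
    """counts maps label -> (tp, fp, fn); returns averaged precision and recall."""
    support = {l: tp + fn for l, (tp, fp, fn) in counts.items()}
    total = sum(support.values())
    precision = {l: safe_div(tp, tp + fp) for l, (tp, fp, fn) in counts.items()}
    recall = {l: safe_div(tp, tp + fn) for l, (tp, fp, fn) in counts.items()}
    tp_all = sum(c[0] for c in counts.values())
    fp_all = sum(c[1] for c in counts.values())
    fn_all = sum(c[2] for c in counts.values())
    result = {}
    for name, per_class, micro in (
        ("precision", precision, safe_div(tp_all, tp_all + fp_all)),
        ("recall", recall, safe_div(tp_all, tp_all + fn_all)),
    ):
        result[name] = {
            "micro": micro,  # pooled counts across classes
            "macro": safe_div(sum(per_class.values()), len(per_class)),
            "weighted": safe_div(
                sum(per_class[l] * support[l] for l in counts), total),
        }
    return result

# Illustrative (tp, fp, fn) counts per class, not taken from the documentation.
counts = {"bird": (1, 0, 0), "cat": (1, 1, 1), "dog": (2, 1, 1)}
averages = averaged_metrics(counts)
print(averages)
```

Macro averaging treats every class equally, so a small class such as `bird` influences it as much as a large one; micro and weighted averaging are dominated by the larger classes.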

Classification is a supervised machine learning task in which the goal is to predict a categorical variable, or class label, from a set of input features. The algorithm learns to classify new observations by training on a labeled dataset in which the class labels are already known. A widely used classification algorithm is logistic regression, which models the probability of an event by expressing the log-odds of the event as a linear combination of one or more independent variables.
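
The log-odds relationship in logistic regression can be sketched in a few lines of Python. The function name `predict_probability` and the weights, intercept, and feature values are made up for illustration; in practice the weights come from training.

```python
import math

def predict_probability(features, weights, intercept):
    """Logistic regression: log-odds are linear in the features (hypothetical helper)."""
    log_odds = intercept + sum(w * x for w, x in zip(weights, features))
    # The logistic (sigmoid) function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative values only: log-odds = -0.3 + 0.8 * 2.0 + (-0.5) * 1.0 = 0.8
p = predict_probability([2.0, 1.0], [0.8, -0.5], -0.3)
print(round(p, 3))
# Inverting the sigmoid recovers the log-odds.
print(round(math.log(p / (1.0 - p)), 3))
```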

In the classification process, a model is trained on a dataset consisting of input features and their corresponding categorical labels. The model estimates the probability of an event, such as membership in a class, from a given set of input features. Classification algorithms include decision trees, logistic regression, and naïve Bayes, among others.

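Several metrics can be used to evaluate a classification model. The most common are accuracy (the fraction of all predictions that are correct), precision (the fraction of predictions for a class that are correct), recall (the fraction of actual members of a class that are predicted correctly), F1-score (the harmonic mean of precision and recall), and the ROC curve (the true positive rate plotted against the false positive rate across decision thresholds).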
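
Of these metrics, the ROC curve is the only one that depends on predicted scores rather than predicted labels. The sketch below shows one simple way to trace its points for a binary problem by sweeping the decision threshold over the observed scores. The helper name `roc_points` and the sample data are hypothetical, and the sketch assumes that both classes appear in the data.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs (hypothetical helper).

    actual: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class.
    Assumes at least one positive and one negative observation.
    """
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Illustrative labels and scores, not taken from the documentation.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(actual, scores)
for fpr, tpr in points:
    print(round(fpr, 2), round(tpr, 2))
```

The curve always starts at (0, 0) and ends at (1, 1); a classifier that ranks positives above negatives climbs toward the top-left corner first.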

One crucial aspect of classification is selecting the appropriate features. Too many features can lead to overfitting, where the model performs well on the training data but poorly on the test data. On the other hand, too few features can lead to underfitting, where the model fails to capture the underlying patterns in the data.