TD_ClassificationEvaluator Usage Notes

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

Classification is a type of statistical analysis that assigns data to groups (classes) based on their characteristics. It predicts a categorical response from one or more independent variables and is commonly used in fields such as finance, medicine, and marketing. One of the most widely used classification methods is logistic regression, which models the probability of an event by expressing the log-odds of the event as a linear combination of one or more independent variables.
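The relationship between log-odds and probability can be sketched in a few lines of Python. This is a minimal illustration, not part of any Teradata function: the function name predict_probability and the coefficient values are hypothetical and chosen only to show that the log-odds of the predicted probability equal the linear combination of the inputs.

```python
import math

def predict_probability(coefficients, intercept, features):
    """Return P(event) for one observation under a logistic model."""
    # The log-odds are a linear combination of the independent variables.
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients and a single observation, for illustration only.
p = predict_probability([0.8, -0.5], 0.1, [2.0, 1.0])

# Taking log(p / (1 - p)) recovers the linear combination 0.1 + 0.8*2.0 - 0.5*1.0.
recovered_log_odds = math.log(p / (1.0 - p))
```

Here the linear combination is 1.2, so the predicted probability is a little under 0.77.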

In classification algorithms, the goal is to find a decision boundary that separates the different classes in the data space.

Model-based classification algorithms, such as logistic regression, find this boundary by estimating the coefficient values that maximize the likelihood of the observed data. The likelihood function measures the probability of observing the actual values of the dependent variable, given the predicted probabilities from the model. The maximum likelihood estimates of the coefficients are obtained using numerical optimization techniques such as gradient descent, which iteratively adjusts the coefficient values to minimize the negative log-likelihood (equivalently, to maximize the likelihood).
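As a rough sketch of this estimation procedure, the following Python code fits a one-variable logistic regression with batch gradient descent on the average negative log-likelihood. The function names, the learning rate, the iteration count, and the toy data are all assumptions made for illustration; production implementations typically add convergence checks and regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def negative_log_likelihood(weights, bias, rows, labels):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)

def fit_logistic(rows, labels, learning_rate=0.1, iterations=500):
    """Estimate coefficients by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(iterations):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Gradient of the negative log-likelihood: (predicted - actual) * input.
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        # Step against the gradient to reduce the negative log-likelihood.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Toy data with overlapping classes (hypothetical values).
rows = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 1, 0, 1, 1]
weights, bias = fit_logistic(rows, labels)
```

Comparing negative_log_likelihood(weights, bias, rows, labels) with the value at all-zero coefficients shows whether the fit improved on the uninformed starting point.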

Various metrics can be used to evaluate the performance of a classification model. The most commonly used are accuracy, precision, recall, F1-score, and the ROC curve. Accuracy is the fraction of all predictions that the model got right. Precision is the fraction of instances predicted as positive that are actually positive. Recall is the fraction of actual positive instances that the model correctly predicted as positive. The F1-score is the harmonic mean of precision and recall. The ROC curve plots the true positive rate against the false positive rate at different classification thresholds.
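These definitions can be made concrete with a short Python sketch for the binary case. The helper names binary_metrics and roc_points, the convention that undefined ratios are reported as 0.0, and the sample labels and scores are assumptions for illustration only.

```python
def binary_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, recall, and F1-score for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_points(actual, scores, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(1 for a in actual if a == 1)
    negatives = len(actual) - positives
    points = []
    for t in thresholds:
        # An observation is predicted positive when its score reaches the threshold.
        tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= t)
        fp = sum(1 for a, s in zip(actual, scores) if a != 1 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical labels, hard predictions, and model scores.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]

metrics = binary_metrics(actual, predicted)
curve = roc_points(actual, scores, [0.0, 0.5, 1.0])
```

In this sample there are three true positives, one false positive, and one false negative, so accuracy, precision, recall, and F1-score all come out to 0.75. Lowering the threshold moves the ROC point toward (1, 1), and raising it moves the point toward (0, 0).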

Classification Evaluator in Teradata

The TD_ClassificationEvaluator function computes metrics that you can use to evaluate and compare classification models by summarizing how close predictions are to their expected values. It takes the actual and predicted values of the dependent variable and calculates the specified metrics. In addition to accuracy, the secondary output table returns micro-averaged, macro-averaged, and weighted-averaged precision, recall, and F1-score values.

This function also works for multi-class scenarios. In all cases, the primary output table contains class-level metrics, whereas the secondary output table contains metrics that apply across classes.
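The difference between class-level metrics and metrics that apply across classes can be sketched in Python using the conventional definitions of micro, macro, and weighted averaging. This is an illustrative approximation only: the helper names and sample labels are hypothetical, and the source does not confirm that TD_ClassificationEvaluator computes its output in exactly this way.

```python
def safe_div(numerator, denominator):
    # Assumption: report 0.0 when a denominator is zero.
    return numerator / denominator if denominator else 0.0

def per_class_metrics(actual, predicted):
    """Class-level precision, recall, F1-score, and support."""
    result = {}
    for label in sorted(set(actual) | set(predicted)):
        tp = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
        fp = sum(1 for a, p in zip(actual, predicted) if a != label and p == label)
        fn = sum(1 for a, p in zip(actual, predicted) if a == label and p != label)
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        f1 = safe_div(2 * precision * recall, precision + recall)
        result[label] = {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
                         "recall": recall, "f1": f1, "support": tp + fn}
    return result

def averaged_metrics(per_class):
    """Micro, macro, and weighted averages across classes."""
    rows = list(per_class.values())
    tp = sum(r["tp"] for r in rows)
    fp = sum(r["fp"] for r in rows)
    fn = sum(r["fn"] for r in rows)
    total = sum(r["support"] for r in rows)
    # Micro: pool the counts of all classes, then compute each metric once.
    micro_p = safe_div(tp, tp + fp)
    micro_r = safe_div(tp, tp + fn)
    out = {"micro": {"precision": micro_p, "recall": micro_r,
                     "f1": safe_div(2 * micro_p * micro_r, micro_p + micro_r)}}
    # Macro: unweighted mean of class-level values.
    # Weighted: mean of class-level values weighted by class support.
    out["macro"] = {k: sum(r[k] for r in rows) / len(rows)
                    for k in ("precision", "recall", "f1")}
    out["weighted"] = {k: sum(r[k] * r["support"] for r in rows) / total
                       for k in ("precision", "recall", "f1")}
    return out

# Hypothetical three-class example.
actual = ["a", "a", "a", "b", "b", "c"]
predicted = ["a", "a", "b", "b", "c", "c"]
class_level = per_class_metrics(actual, predicted)
across_classes = averaged_metrics(class_level)
```

With these sample labels, class a has precision 1.0 and recall 2/3, while the weighted-averaged precision is 0.75 because class a contributes half of the observations. Macro averaging treats every class equally, so it is more sensitive to performance on rare classes.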