Classification is a type of statistical analysis used to classify or categorize data into different groups based on their characteristics. It involves predicting a categorical response based on one or more independent variables. You can use it in finance, medicine, marketing, and so on. Logistic regression is a type of classification where the algorithm models the probability of an event taking place by having the log odds for the event to be linear combination of one or more independent variables.
In classification algorithms, the goal is to find a decision boundary that separates the different classes in the data space.
The goal of classification algorithms, such as logistic regression, is to estimate the values of coefficients that maximize the likelihood of the observed data. The likelihood function measures the probability of observing the actual values of the dependent variable, given the predicted probabilities from the model. The maximum likelihood of the coefficients is obtained using numerical optimization techniques such as gradient descent. Gradient descent is an optimization algorithm that iteratively adjusts the values of coefficients to maximize the likelihood.
You can use the following metrics to evaluate the performance of a classification model:.
- Accuracy is the fraction of predictions our model got right.
- Precision refers to the fraction of relevant instances among the total retrieved instances.
- Recall refers to the fraction of relevant instances retrieved over the total amount of relevant instances.
- F1 score is defined as the harmonic mean of precision and recall.
- ROC curve plots True positive rate vs False positive rate at different classification thresholds.