Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, which states that the probability of a hypothesis given the data is proportional to the likelihood of the data given the hypothesis multiplied by the prior probability of the hypothesis. In Naive Bayes, you compute the posterior probability of each class given the observed input data and choose the class with the highest posterior probability as the predicted class.
To compute the posterior probability of each class, you need the likelihood of the input data given the class and the prior probability of the class. Under the naive assumption that features are conditionally independent given the class, the likelihood is the product of the conditional probabilities of each feature given the class. Multiplying many small probabilities can underflow, so work in log space instead: the log-likelihood is the sum of the logarithms of the conditional probabilities.
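The underflow problem described above can be illustrated with a minimal Python sketch. The probability values and the names `cond_probs` and `prior` are made up for illustration and do not come from any real model.

```python
import math

# Hypothetical values: 100 features, each with P(feature | class) = 1e-5,
# and a class prior of 0.5. These numbers are illustrative only.
cond_probs = [1e-5] * 100
prior = 0.5

# Direct multiplication: the true value (0.5 * 1e-500) is far below the
# smallest positive double, so the running product collapses to 0.0.
direct = prior
for p in cond_probs:
    direct *= p

# Log space: add the logarithms instead of multiplying the probabilities.
log_score = math.log(prior) + sum(math.log(p) for p in cond_probs)

print(direct)     # the product has underflowed
print(log_score)  # a finite negative number
```

Because the logarithm is monotonic, comparing log scores across classes selects the same class as comparing the original products would.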
Bayes' theorem gives the posterior probability of each class given the observed input data. By comparing the posterior probabilities across classes, you can choose the class with the highest probability as the predicted class for the input data.
For each document, the TD_NaiveBayesTextClassifierPredict function calculates the log-likelihood and the posterior conditional probabilities for token-category pairs.
- TopK
- Responses
This function uses the model output by the TD_NaiveBayesTextClassifierTrainer function to analyze the input data and make predictions.
The TD_NaiveBayesTextClassifierTrainer function works as follows:
Training:
- Starts by pre-processing the training data, which involves cleaning and tokenizing the text data.
- Calculates the prior probability of each class by counting the number of data points in each class and dividing it by the total number of data points.
- Calculates the likelihood of each feature given the class by counting the number of times each feature appears in the data points belonging to each class and dividing it by the total number of data points in that class.
- Calculates the posterior probability of each class given the input data using Bayes' theorem: P(class | data) = P(data | class) * P(class) / P(data).
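The counting steps in the training list above could be sketched in Python roughly as follows. This is not the implementation of TD_NaiveBayesTextClassifierTrainer. The `tokenize` and `train` helpers and the sample documents are hypothetical, and the sketch assumes that a token is counted once per document, so each likelihood is the fraction of documents in a class that contain the token. No smoothing is applied, because the description above does not mention any.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical pre-processing: lowercase and split on whitespace.
    return text.lower().split()

def train(docs):
    """docs is a list of (text, label) pairs. Returns (priors, likelihoods)."""
    class_counts = Counter(label for _, label in docs)
    total = len(docs)

    # Prior: number of documents in the class divided by all documents.
    priors = {c: n / total for c, n in class_counts.items()}

    # Count, per class, how many documents contain each token.
    doc_freq = defaultdict(Counter)
    for text, label in docs:
        doc_freq[label].update(set(tokenize(text)))  # set(): once per document

    # Likelihood: documents in the class containing the token,
    # divided by the number of documents in that class.
    likelihoods = {
        c: {tok: n / class_counts[c] for tok, n in doc_freq[c].items()}
        for c in class_counts
    }
    return priors, likelihoods

# Made-up training data for illustration.
training_docs = [
    ("cheap pills cheap offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("cheap meeting room", "ham"),
    ("offer expires today", "spam"),
]
priors, likelihoods = train(training_docs)
print(priors)
print(likelihoods["spam"])
```

A token that never appears in a class gets no entry here, which corresponds to a probability of zero. A practical classifier usually adds some form of smoothing to avoid that.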
Testing:
- Starts by pre-processing the test data in the same way as the training data.
- Calculates the posterior probability of each class given the test data using the likelihood and prior probability calculated during training.
- Chooses the class with the highest probability as the predicted class for the test data.
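The testing steps above could be sketched in Python as follows, scoring each class in log space and picking the highest score. The model values, the `predict` helper, and the choice to ignore tokens the model has not seen are assumptions made for illustration. They do not describe how TD_NaiveBayesTextClassifierPredict behaves.

```python
import math

# Hypothetical model values of the kind a training step might produce.
priors = {"spam": 0.5, "ham": 0.5}
likelihoods = {
    "spam": {"cheap": 0.6, "offer": 0.5, "meeting": 0.1},
    "ham": {"cheap": 0.2, "offer": 0.1, "meeting": 0.7},
}

def predict(text):
    """Return (predicted_class, log_scores) for one document."""
    tokens = text.lower().split()  # same pre-processing as in training
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for tok in tokens:
            # Assumption: tokens missing from the model are ignored.
            if tok in likelihoods[c]:
                score += math.log(likelihoods[c][tok])
        scores[c] = score
    # The class with the highest log score also has the highest posterior,
    # because P(data) is the same for every class.
    best = max(scores, key=scores.get)
    return best, scores

label, scores = predict("Cheap offer inside")
print(label, scores)
```

The scores are unnormalized log posteriors. Dividing by P(data) would change every class by the same amount and therefore would not change which class wins.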