Boosting is a technique that builds a strong classifier from a collection of weak classifiers. A classifier is weak if its accuracy is only slightly better than random guessing (50% for binary classification). The intuition behind boosting is that combining a set of predictions, each of which is correct with probability greater than 50%, can produce an arbitrarily accurate predictor.
Without boosting, a decision tree has these inherent problems:
- It grows by greedy binary splitting, which can introduce classification errors.
- An incorrect decision at one tree level propagates to the next level.
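One widely used boosting algorithm is AdaBoost. The following is a minimal sketch of the idea, using decision stumps (one-level binary trees) as the weak classifiers; the dataset, function names, and the fixed number of rounds are illustrative assumptions, not part of the text above.

```python
import math

def stump_predict(threshold, polarity, x):
    # Weak classifier: a one-level tree that predicts +1 or -1
    # based on a single threshold comparison.
    return polarity if x >= threshold else -polarity

def train_stump(X, y, w):
    # Pick the (threshold, polarity) pair with the lowest weighted error.
    best = None
    for threshold in X:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(threshold, polarity, xi) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n          # example weights start uniform
    ensemble = []              # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(X, y, w)
        err = max(err, 1e-10)  # guard against division by zero
        # alpha grows as the weak classifier's error shrinks below 50%
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight misclassified points, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(threshold, polarity, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    # The strong classifier: a weighted vote of all weak classifiers.
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    X = [1, 2, 3, 4, 5, 6]
    y = [1, 1, -1, -1, 1, 1]   # no single stump can separate this labeling
    ensemble = adaboost(X, y, rounds=3)
    print([predict(ensemble, xi) for xi in X])
```

No individual stump can label the middle interval correctly on its own, but the weighted vote of three stumps reproduces all six labels, which is the arbitrarily-accurate-combination idea in miniature.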
Boosting is sensitive to noise in the data. Because weak classifiers are likely to misclassify outliers, the algorithm weights outliers more heavily with each iteration, increasing their influence on the final result.
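A small numeric sketch of that runaway reweighting, using the standard AdaBoost update rule; the dataset size (10 points), the fixed per-round error rate, and the assumption that the same outlier is misclassified every round are illustrative, not taken from the text:

```python
import math

# Assumed setup: 10 training points, one outlier that every weak
# classifier misclassifies, weighted error fixed at eps = 0.2 per round.
eps = 0.2
alpha = 0.5 * math.log((1 - eps) / eps)  # per-round classifier weight

w_outlier = 0.1   # weights start uniform: 1/10 each
w_other = 0.1
for rnd in range(1, 6):
    # Misclassified points are scaled by e^alpha, correct ones by e^-alpha,
    # then all weights are renormalized to sum to 1.
    w_outlier *= math.exp(alpha)
    w_other *= math.exp(-alpha)
    total = w_outlier + 9 * w_other
    w_outlier /= total
    w_other /= total
    print(f"round {rnd}: outlier weight = {w_outlier:.3f}")
```

Under these assumptions the outlier's weight climbs from 0.1 toward 1 within a handful of rounds, so later weak classifiers are fit almost entirely to the noise point.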