Many real-world classification datasets are imbalanced: observations belonging to one class (the minority class) are far fewer than observations belonging to the other class (the majority class). The challenge with imbalanced datasets is that most machine learning techniques fit the majority class well but perform poorly on the minority class, even though in many situations the minority class is the more important one.
One approach to addressing imbalanced datasets is to oversample the minority class. The simplest method duplicates examples of the minority class, but the duplicates add no new information to the model. Instead, new examples can be synthesized from the existing ones using the Synthetic Minority Oversampling Technique (SMOTE).
- SMOTE generates each synthetic sample by choosing one of a minority sample's nearest neighbors at random and interpolating linearly, at a random point, between the original sample and that neighbor.
- Adaptive Synthetic Sampling (ADASYN) generates more synthetic samples for minority observations whose neighborhoods contain a higher density of majority-class samples. See He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (pp. 1322-1328).
- Borderline-SMOTE generates samples from the border group, that is, the minority samples that lie close to the boundary with the majority class. See Han, H., Wang, W. Y., & Mao, B. H. (2005, August). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing (pp. 878-887). Berlin, Heidelberg: Springer Berlin Heidelberg. TD_SMOTE implements the Borderline-2 variant described in that paper.
- Synthetic Minority Over-sampling Technique-Nominal Continuous (SMOTE-NC) is a generalization of SMOTE that handles mixed datasets containing both continuous and nominal features. See Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
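The core SMOTE interpolation step can be illustrated with a short, standard-library-only Python sketch. It is not TD_SMOTE's implementation and does not use its syntax; the function names `k_nearest` and `smote_sample`, the Euclidean distance, and the choice of picking base samples uniformly at random are all assumptions made for illustration. It covers only plain SMOTE, not the ADASYN, Borderline, or SMOTE-NC variants.

```python
import math
import random


def k_nearest(points, idx, k):
    """Hypothetical helper: indices of the k nearest neighbors of points[idx] (Euclidean), excluding itself."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote_sample(minority, n_new, k=5, seed=None):
    """Hypothetical sketch: synthesize n_new points from minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each minority sample's k nearest minority-class neighbors.
    neighbors = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # assumption: base sample drawn at random
        j = rng.choice(neighbors[i])      # random one of its k nearest neighbors
        gap = rng.random()                # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        # New point lies on the line segment from x toward its neighbor nn.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote_sample(minority, n_new=3, k=2, seed=0)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region the minority class already occupies. The original SMOTE paper visits each minority sample in turn rather than drawing base samples at random; the random draw here is a simplification.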
TD_SMOTE can handle multiclass datasets. However, it oversamples only one minority class at a time and treats all classes other than that minority class as the majority class.
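The one-minority-class-at-a-time behavior amounts to viewing a multiclass target as a binary problem. The following Python sketch illustrates that view; it is not TD_SMOTE's API. The helper names `binarize_for_minority` and `samples_needed`, and the assumption that the goal is to match the combined size of all other classes, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def binarize_for_minority(labels, minority_class):
    """Hypothetical helper: 1 for the chosen minority class, 0 for every other class."""
    return [1 if y == minority_class else 0 for y in labels]


def samples_needed(labels, minority_class):
    """Hypothetical helper: synthetic rows needed so the minority class matches
    the combined size of all other classes (assumed balancing target)."""
    counts = Counter(labels)
    minority_size = counts[minority_class]
    majority_size = len(labels) - minority_size  # every other class counts as majority
    return max(0, majority_size - minority_size)


labels = ["a", "b", "b", "c", "c", "c"]
print(binarize_for_minority(labels, "a"))  # [1, 0, 0, 0, 0, 0]
print(samples_needed(labels, "a"))         # 4
```

To oversample several minority classes, the same step would be repeated once per class, each time with a different class treated as the minority.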