Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of previously unseen data against which the model is tested (the testing dataset). The goal of cross-validation is to define a dataset to "test" the model during the training phase (the validation dataset), providing insight into how the model will generalize to an independent dataset. Cross-validation helps detect overfitting, i.e., a model that fits its training data well but fails to generalize.
The most common variant, k-fold cross-validation, works as follows: the data are randomly partitioned into k equal-sized subsamples, or folds. One fold is held out as a validation set, and the model is trained on the remaining k − 1 folds. The trained model is then evaluated on the validation set and its error rate recorded. The process is repeated k times, with each of the k folds used exactly once as the validation set, and the k error estimates are typically averaged to produce a single performance estimate.
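The procedure above can be sketched in plain Python. The names `train_fn` and `error_fn` are hypothetical stand-ins for a model's training and evaluation routines, not part of any particular library; this is a minimal illustration, not a production implementation.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Slicing with a stride of k deals out the shuffled indices into k folds.
    return [idx[i::k] for i in range(k)]

def cross_validate(data, labels, train_fn, error_fn, k=5):
    """Train on k-1 folds, evaluate on the held-out fold; return per-fold errors."""
    folds = k_fold_indices(len(data), k)
    errors = []
    for fold in folds:
        held_out = set(fold)
        # Training set: everything not in the held-out fold.
        train_X = [data[j] for j in range(len(data)) if j not in held_out]
        train_y = [labels[j] for j in range(len(data)) if j not in held_out]
        model = train_fn(train_X, train_y)
        # Validation set: the held-out fold only.
        test_X = [data[j] for j in fold]
        test_y = [labels[j] for j in fold]
        errors.append(error_fn(model, test_X, test_y))
    return errors

# Toy example: a "model" that always predicts the majority training label.
def train_majority(X, y):
    return max(set(y), key=y.count)

def misclassification_rate(model, X, y):
    return sum(1 for label in y if label != model) / len(y)

data = list(range(20))
labels = [0] * 12 + [1] * 8
errors = cross_validate(data, labels, train_majority, misclassification_rate, k=5)
avg_error = sum(errors) / len(errors)  # the averaged cross-validation estimate
```

Each index appears in exactly one fold, so every observation is used for validation exactly once; the final averaged error is the cross-validation estimate of generalization performance.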