At the IEEE International Conference on Data Mining (ICDM) in December 2006, the k-Nearest Neighbors (kNN) classification algorithm was named one of the top 10 data-mining algorithms.
The kNN algorithm classifies a data object based on its proximity to other data objects whose classifications are already known. The objects with known classifications serve as training data.
kNN classifies data based on the following parameters:
- Training data
- A metric that measures distance between objects
- The number of nearest neighbors (k)
The following figure shows an example of data classification using kNN. The red and blue dots represent training data objects—the red dots are classified as cancerous tissue and the blue dots are classified as normal tissue. The gray dot represents a test data object.
The inner circle represents k=4 and the outer circle represents k=10. When k=4, most of the nearest neighbors of the gray dot are red, so the algorithm classifies the gray dot as cancerous tissue. When k=10, most of the nearest neighbors of the gray dot are blue, so the algorithm classifies the gray dot as normal tissue.
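The scenario in the figure can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the coordinates below are hypothetical points chosen so that four "cancerous" training objects lie closest to the test object while six "normal" objects sit slightly farther away, mirroring the inner (k=4) and outer (k=10) circles.

```python
from collections import Counter
import math

def knn_classify(test_point, training_data, k):
    """Classify test_point by majority vote among its k nearest training objects.

    training_data is a list of (point, label) pairs; the distance metric here
    is Euclidean, one of the common choices for kNN.
    """
    neighbors = sorted(
        training_data,
        key=lambda item: math.dist(test_point, item[0]),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data echoing the figure: four red (cancerous) points
# near the test object and six blue (normal) points farther away.
training = [
    ((1.0, 1.0), "cancerous"), ((1.2, 0.8), "cancerous"),
    ((0.8, 1.1), "cancerous"), ((1.1, 1.2), "cancerous"),
    ((2.0, 2.0), "normal"), ((2.1, 1.9), "normal"),
    ((1.9, 2.2), "normal"), ((2.2, 2.1), "normal"),
    ((2.0, 2.3), "normal"), ((2.3, 2.0), "normal"),
]
test = (1.05, 1.05)  # the gray dot

print(knn_classify(test, training, k=4))   # the 4 nearest neighbors are all red
print(knn_classify(test, training, k=10))  # among 10 neighbors, blue outnumbers red
```

Running the sketch classifies the test object as cancerous for k=4 and normal for k=10, showing how the choice of k alone can change the result.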