Background - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14

At the IEEE International Conference on Data Mining (ICDM) in December 2006, the k-Nearest Neighbor (kNN) classification algorithm was identified as one of the top 10 data-mining algorithms.

The kNN algorithm classifies data objects based on proximity to other data objects with known classification. The objects with known classification serve as training data.

kNN classifies data based on the following parameters:

  • Training data
  • A metric that measures distance between objects
  • The number of nearest neighbors (k)
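
As a rough illustration of how these three inputs fit together (this is only a sketch, not the Aster Analytics KNN function), the following Python example assumes numeric features, Euclidean distance as the metric, and a majority vote among the k nearest training objects:

```python
# Minimal kNN classification sketch (illustrative only; not the Aster
# Analytics KNN function). Assumes numeric feature tuples, Euclidean
# distance as the metric, and a majority vote among the k nearest
# training objects.
import math
from collections import Counter

def knn_classify(training_data, test_point, k):
    """Return the majority label among the k training objects closest to test_point."""
    # Compute the distance from the test object to every training object.
    distances = [
        (math.dist(test_point, features), label)
        for features, label in training_data
    ]
    # Keep the k nearest neighbors and take a majority vote on their labels.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]
```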

The following figure shows an example of data classification using kNN. The red and blue dots represent training data objects—the red dots are classified as cancerous tissue and the blue dots are classified as normal tissue. The gray dot represents a test data object.

The inner circle represents k=4 and the outer circle represents k=10. When k=4, most of the nearest neighbors of the gray dot are red, so the algorithm classifies the gray dot as cancerous tissue. When k=10, most of the nearest neighbors of the gray dot are blue, so the algorithm classifies the gray dot as normal tissue.
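
To show how the choice of k changes the result, the following sketch uses hypothetical coordinates that mirror the figure, together with the knn_classify function from the earlier sketch: the same test object is classified as cancerous with k=4 and as normal with k=10.

```python
# Hypothetical coordinates mirroring the figure: three "cancerous"
# training objects lie closest to the test object, but most of the
# wider neighborhood is "normal". Reuses knn_classify from the sketch above.
training_data = [
    ((0.5, 0.0), "cancerous"), ((0.0, 0.5), "cancerous"), ((-0.5, 0.0), "cancerous"),
    ((0.0, 0.6), "normal"),  ((0.8, 0.0), "normal"),  ((0.0, 0.8), "normal"),
    ((-0.8, 0.0), "normal"), ((0.0, -0.8), "normal"), ((0.9, 0.3), "normal"),
    ((0.3, 0.9), "normal"),  ((-0.9, 0.3), "normal"),
]
test_point = (0.0, 0.0)

print(knn_classify(training_data, test_point, k=4))   # cancerous (3 of the 4 nearest neighbors)
print(knn_classify(training_data, test_point, k=10))  # normal (7 of the 10 nearest neighbors)
```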

Figure: KNN Example