7.00.02 - Background - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Release Date: September 2017
Content Type: Programming Reference, User Guide
Publication ID: B700-1022-700K
Language: English (United States)

K-means clustering is a simple unsupervised learning algorithm that is popular for cluster analysis in data mining. The algorithm aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the centroid of the cluster).

The algorithm aims to minimize an objective function (in this case, a squared error function). The objective function is built from a chosen distance measure between a data point and its cluster center, and it indicates the total distance of the n data points from their respective centroids.
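For illustration, the squared error objective can be written out as follows, assuming squared Euclidean distance as the chosen distance measure. The symbols are introduced here and do not appear in the original text: x_i is a data point, S_j is the set of points assigned to cluster j, and \mu_j is the centroid of that cluster.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i
```

Every one of the n data points belongs to exactly one S_j, so the double sum visits each point once.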

The algorithm has these steps:

  1. Place k points into the space represented by the objects that are being clustered.

    These points represent initial group centroids.

  2. Assign each object to the group that has the closest centroid.
  3. Recalculate the positions of the k centroids.
  4. Repeat steps 2 and 3 until the centroids no longer move.

    At this point, the objects are separated into groups, and the metric to be minimized can be calculated from those groups.
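The four steps above can be sketched in plain Python. This is a minimal single-machine illustration, not the Aster implementation: the function names (`kmeans`, `nearest`, `mean`), the choice of squared Euclidean distance, the seeding strategy (sampling k of the input points), and the `max_iter` safety cap are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    # Index of the centroid closest to the given point.
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k of the input points as initial centroids (an assumption).
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to the group with the closest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 3: recalculate each centroid; an empty group keeps its old one.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean(members) if members else centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Called as `kmeans(points, k=2)` with `points` given as a list of numeric tuples, the function returns the final centroids and one cluster label per input point.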

Although the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration, that is, the one corresponding to the global minimum of the objective function. The result is also sensitive to the randomly selected initial cluster centers. To reduce the effect of these limitations, the k-means algorithm can be run multiple times.
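One common way to run the algorithm multiple times is to repeat it with different random initial centers and keep the run with the lowest squared error. The sketch below shows that idea under stated assumptions: `kmeans_once` and `best_of` are hypothetical names, distance is squared Euclidean, and the selection rule (lowest sum of squared errors) is a choice made for this example rather than something the guide prescribes.

```python
import random

def sq_dist(p, q):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    # One k-means run from randomly sampled initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # New centroid is the mean of its members; empty clusters stay put.
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Sum of squared errors: the objective value reached by this run.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

def best_of(points, k, restarts=10, seed=0):
    # Repeat with different random starts and keep the lowest-error run.
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])
```

Because every restart draws its initial centers from the same seeded generator, the comparison is reproducible, and adding restarts can only keep or lower the best error found.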

In map-reduce form, the k-means algorithm repeats a map step and a reduce step until convergence. The map step assigns each point to a cluster. The reduce step takes all the points in each cluster and calculates the new centroid of the cluster.
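The map and reduce steps can be mimicked in a single Python process to show how the work divides. This is only a sketch of the pattern, not how Aster distributes the computation: `map_step`, `reduce_step`, `iterate`, and `run_until_converged` are hypothetical names, the grouping dictionary stands in for the shuffle phase of a real map-reduce framework, and squared Euclidean distance is assumed.

```python
from collections import defaultdict

def sq_dist(p, q):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_step(point, centroids):
    # Map: emit (cluster_id, point) for the centroid nearest to the point.
    cluster_id = min(range(len(centroids)),
                     key=lambda j: sq_dist(point, centroids[j]))
    return cluster_id, point

def reduce_step(cluster_points):
    # Reduce: the new centroid is the mean of all points in one cluster.
    n = len(cluster_points)
    return tuple(sum(coords) / n for coords in zip(*cluster_points))

def iterate(points, centroids):
    # Group the mapped pairs by cluster id (stand-in for the shuffle phase).
    groups = defaultdict(list)
    for cluster_id, point in (map_step(p, centroids) for p in points):
        groups[cluster_id].append(point)
    # A cluster that received no points keeps its previous centroid.
    return [reduce_step(groups[j]) if j in groups else centroids[j]
            for j in range(len(centroids))]

def run_until_converged(points, centroids, max_iter=100):
    # Repeat map + reduce until the centroids stop changing.
    for _ in range(max_iter):
        updated = iterate(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Each call to `map_step` depends only on one point and the current centroids, which is what lets a real framework spread the map work across workers; only the per-cluster reduce needs all points of a cluster together.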