Background - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17

dita:id: B700-1022

K-means clustering is a simple unsupervised learning algorithm that is popular for cluster analysis in data mining. The algorithm aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean—the centroid for the cluster.

The algorithm aims to minimize an objective function, in this case a squared error function. The objective function is the sum, over all n data points, of the squared distance between each data point and the centroid of its cluster, where the distance is a chosen distance measure. It indicates how far the n data points are from their respective centroids.
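As a small worked illustration of the squared error objective, the following sketch computes it for four points and two centroids. It assumes squared Euclidean distance as the chosen distance measure; the function names are illustrative and are not part of Aster Analytics.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def squared_error(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))


points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]
centroids = [(1.0, 2.0), (9.0, 2.0)]
labels = [0, 0, 1, 1]  # index of the centroid each point belongs to

print(squared_error(points, centroids, labels))  # 4.0
```

Each point lies at distance 1 from its centroid, so the objective is 1 + 1 + 1 + 1 = 4. Moving a centroid away from the mean of its points can only increase this value.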

The algorithm has these steps:

1. Place k points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. Recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move. The objects are then in groups from which the metric to be minimized can be calculated.
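The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Aster Analytics implementation: it uses squared Euclidean distance, picks k of the input objects at random as the initial centroids (one common choice), and keeps the old centroid when a group becomes empty. The names kmeans, closest, and mean are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(point, centroids):
    """Index of the centroid nearest to point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Place k initial centroids (assumption: sample k of the objects).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each object to the group with the closest centroid.
        labels = [closest(p, centroids) for p in points]
        # Recalculate each centroid; an empty group keeps its old centroid.
        new_centroids = []
        for i in range(k):
            members = [p for p, c in zip(points, labels) if c == i]
            new_centroids.append(mean(members) if members else centroids[i])
        # Stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment, consistent with the returned centroids.
    labels = [closest(p, centroids) for p in points]
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
print(labels)
```

The max_iter cap is a safety limit for the sketch; on this small, well-separated data set the loop stops after a few iterations with one centroid near each group of three points.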

Although the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration, that is, the one corresponding to the global minimum of the objective function. It can stop at a local minimum instead. The result is also sensitive to the randomly selected initial cluster centers. To reduce the effect of these limitations, run the k-means algorithm multiple times with different initial centers and compare the resulting values of the objective function.
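The sensitivity to initial centers can be seen on four points at the corners of a wide rectangle. The sketch below is an illustration only, not the Aster Analytics implementation: it assumes squared Euclidean distance, and the helper best_of_restarts (a hypothetical name) simply keeps the run with the lowest squared error.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Run k-means from given initial centroids.

    Returns (centroids, labels, squared_error).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, c in zip(points, labels) if c == i]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members))
                if members else old)
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    error = sum(squared_distance(p, centroids[c])
                for p, c in zip(points, labels))
    return centroids, labels, error


def best_of_restarts(points, k, restarts=10, seed=0):
    """Run k-means from several random starts; keep the lowest error."""
    rng = random.Random(seed)
    runs = [kmeans(points, rng.sample(points, k)) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
print(kmeans(points, [(0.0, 0.0), (0.0, 1.0)])[2])  # 16.0, a local minimum
print(kmeans(points, [(0.0, 0.0), (4.0, 0.0)])[2])  # 1.0, the global minimum
print(best_of_restarts(points, k=2)[2])
```

Starting from the two left-hand corners, the algorithm settles on a top/bottom split and stops there, even though the left/right split has a much lower squared error. Repeating the run from several starting points makes it more likely that at least one run reaches the better configuration.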

In map-reduce, the k-means algorithm consists of a map step and a reduce step that are iterated until convergence. The map step assigns each point to the cluster with the nearest centroid. The reduce step takes all the points in each cluster and calculates the new centroid of the cluster.
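The map and reduce steps can be simulated in a single Python process as shown below. This is a sketch of the general pattern only and does not use or represent any Aster Analytics API: map_step, reduce_step, and kmeans_mapreduce are hypothetical names, the grouping by key stands in for the shuffle that a real map-reduce framework performs, and squared Euclidean distance is assumed.

```python
from collections import defaultdict


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_step(point, centroids):
    """Map: emit (cluster index, point) for the nearest centroid."""
    key = min(range(len(centroids)),
              key=lambda i: squared_distance(point, centroids[i]))
    return key, point


def reduce_step(cluster_points):
    """Reduce: new centroid of one cluster, the mean of its points."""
    n = len(cluster_points)
    return tuple(sum(coords) / n for coords in zip(*cluster_points))


def kmeans_mapreduce(points, centroids, max_iter=100):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Map every point, then group the emitted pairs by cluster key.
        groups = defaultdict(list)
        for point in points:
            key, value = map_step(point, centroids)
            groups[key].append(value)
        # Reduce each group; a cluster with no points keeps its centroid.
        new_centroids = list(centroids)
        for key, cluster_points in groups.items():
            new_centroids[key] = reduce_step(cluster_points)
        # Converged when no centroid moved in this iteration.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
print(kmeans_mapreduce(points, [(0.0, 0.0), (4.0, 0.0)]))
# [(0.0, 0.5), (4.0, 0.5)]
```

Because each call to map_step depends only on one point and the current centroids, and each call to reduce_step depends only on the points of one cluster, both steps can in principle run in parallel across workers. Only the centroids need to be shared between iterations.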