The KMeans function takes a data set and outputs the centroids of its clusters and, optionally, the clusters themselves. The k-means algorithm partitions a set of observations into k clusters, assigning each observation to the cluster with the nearest centroid (mean). The algorithm minimizes an objective function; in the KMeans function, the objective function is the total Euclidean distance from each data point to the centroid of the cluster to which it is assigned. The algorithm proceeds as follows:
1. Specify or randomly select k initial cluster centroids.
2. Assign each data point to the cluster that has the closest centroid.
3. Recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move.
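The steps above can be sketched in plain Python. This is a minimal illustration, not the KMeans function's actual implementation: the helper name `kmeans`, its parameters, and the choice to leave an empty cluster's centroid where it is are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=None):
    """Hypothetical helper: return (centroids, assignments) for a list of point tuples."""
    rng = random.Random(seed)

    # Step 1: specify or randomly select k initial cluster centroids.
    if initial_centroids is None:
        centroids = [tuple(p) for p in rng.sample(points, k)]
    else:
        centroids = [tuple(c) for c in initial_centroids]

    assignments = []
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest centroid.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, assignments


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, assignments = kmeans(points, k=2, initial_centroids=[(0, 0), (10, 10)])
print(centroids)    # [(0.0, 0.5), (10.0, 10.5)]
print(assignments)  # [0, 0, 1, 1]
```

The `max_iter` cap is only a safeguard for the sketch; on this small data set the loop stops on its own after the second pass, when the recalculated centroids match the previous ones.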
Although the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration, that is, the one corresponding to the global minimum of the objective function. The result is highly sensitive to the randomly selected initial cluster centroids. To reduce the effect of these limitations, the k-means algorithm can be run multiple times with different initial centroids, keeping the run with the lowest objective function value.
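One way to picture the multiple-run approach is the sketch below, which restarts a compact k-means loop from several random initializations and keeps the run with the smallest total Euclidean distance. The names `lloyd`, `total_distance`, and `best_of_restarts`, and the `n_restarts` and `seed` parameters, are hypothetical and chosen for illustration only; they are not part of the KMeans function.

```python
import math
import random


def lloyd(points, centroids, max_iter=100):
    """Hypothetical helper: run the assign/recalculate loop from given starting centroids."""
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recalculate centroids; an empty cluster keeps its old centroid (assumption).
        updated = []
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def total_distance(points, centroids, labels):
    """Objective: total Euclidean distance of points from their assigned centroids."""
    return sum(math.dist(p, centroids[a]) for p, a in zip(points, labels))


def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times and return (score, centroids, labels) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        start = [tuple(p) for p in rng.sample(points, k)]
        centroids, labels = lloyd(points, start)
        score = total_distance(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
score, centroids, labels = best_of_restarts(points, k=2, n_restarts=5)
print(score, sorted(centroids))
```

Restarting does not guarantee the global minimum; it only makes a poor local minimum less likely to be the one that is reported.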