KMeans - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

The KMeans function takes a data set and outputs the centroids of its clusters and, optionally, the clusters themselves. The k-means algorithm groups a set of observations into k clusters with each observation assigned to the cluster with the nearest centroid, or mean. The algorithm minimizes an objective function; in the KMeans function, the objective function is the total Euclidean distance of all data points from the center of the cluster to which they are assigned.
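As a reading aid, the objective described above can be written out as follows. The symbols are notation assumed here, not taken from the function reference: x_i is the i-th of n data points, c(i) is the index of the cluster to which it is assigned, and \mu_j is the centroid of cluster j.

```latex
J = \sum_{i=1}^{n} \left\lVert x_i - \mu_{c(i)} \right\rVert_2
  = \sum_{i=1}^{n} \sqrt{\sum_{d=1}^{D} \left( x_{i,d} - \mu_{c(i),d} \right)^2}
```

Here D is the number of dimensions of each data point. Many textbook statements of k-means use the squared distance instead; the form shown follows the wording of the paragraph above.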

The algorithm proceeds as follows:
  1. Specify or randomly select k initial cluster centroids.
  2. Assign each data point to the cluster that has the closest centroid.
  3. Recalculate the positions of the k centroids.
  4. Repeat steps 2 and 3 until the centroids no longer move.
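
The four steps above can be sketched in plain Python. This is a minimal illustration of the generic k-means procedure, not the implementation behind the KMeans function; the names `kmeans`, `euclidean`, `initial_centroids`, and `max_iterations` are hypothetical, and the handling of empty clusters is an assumption made for the example.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, initial_centroids=None, max_iterations=100, seed=None):
    """Illustrative k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)

    # Step 1: use the specified centroids, or pick k data points at random.
    if initial_centroids is not None:
        centroids = [tuple(c) for c in initial_centroids]
    else:
        centroids = [tuple(p) for p in rng.sample(points, k)]

    assignments = []
    for _ in range(max_iterations):
        # Step 2: assign each point to the cluster with the closest centroid.
        assignments = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]

        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, assignments
```

Calling `kmeans(points, 2, initial_centroids=[(0, 0), (10, 10)])` on two well-separated groups of points returns one centroid per group together with a list giving the cluster index of each input point. The `max_iterations` cap is a safety limit added for the sketch.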

Although the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration, that is, the global minimum of the objective function. The result is highly sensitive to the initial, randomly selected cluster centers. To reduce the effect of these limitations, the k-means algorithm can be run multiple times with different initial centroids.
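
One common way to run the algorithm multiple times is to restart it from several random initializations and keep the run with the smallest objective value. The sketch below assumes that selection rule, which the text above does not specify; `kmeans_once`, `total_distance`, and `best_of_runs` are hypothetical names, and the code is plain Python rather than anything provided by the KMeans function.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans_once(points, k, rng, max_iterations=100):
    """One k-means run from k randomly chosen data points."""
    centroids = rng.sample(points, k)
    for _ in range(max_iterations):
        # Group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid
        # (an assumption made for this sketch).
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def total_distance(points, centroids):
    """Objective: total Euclidean distance of points to their nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) for p in points)


def best_of_runs(points, k, runs=10, seed=0):
    """Run k-means `runs` times; return (objective, centroids) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centroids = kmeans_once(points, k, rng)
        score = total_distance(points, centroids)
        if best is None or score < best[0]:
            best = (score, centroids)
    return best
```

Restarting does not guarantee the global minimum either; it only makes a poor local minimum less likely to be the one reported. The number of runs trades computation time against that likelihood.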