KMeans - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.00

1.0

Published

May 2019

Language

English (United States)

Last Update

2019-11-22

dita:id

B700-4003

Product Category

Teradata Vantage™

The KMeans function takes a data set and outputs the centroids of its clusters and, optionally, the clusters themselves. The k-means algorithm groups a set of observations into k clusters, assigning each observation to the cluster with the nearest centroid, or mean. The algorithm minimizes an objective function; in the KMeans function, the objective function is the sum of the squared Euclidean distances of all data points from the centroids of the clusters to which they are assigned.
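
The assignment rule and the objective can be illustrated with a small pure-Python sketch. The helper names (nearest_centroid, objective) are hypothetical and are not part of the KMeans function's interface. The sketch reports both the total Euclidean distance and the total squared Euclidean distance. The quantity that the standard mean-based centroid update is guaranteed not to increase is the squared one.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def objective(points, centroids):
    """Return (total distance, total squared distance) from each point to its nearest centroid."""
    total = 0.0
    total_sq = 0.0
    for p in points:
        d = math.dist(p, centroids[nearest_centroid(p, centroids)])
        total += d
        total_sq += d * d
    return total, total_sq

# Hypothetical example: two well-separated groups, centroids at the group means.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centroids = [(0.0, 1.0), (10.0, 2.0)]
print([nearest_centroid(p, centroids) for p in points])  # [0, 0, 1, 1]
print(objective(points, centroids))  # (6.0, 10.0)
```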

This is the algorithm:

1. Specify or randomly select k initial cluster centroids.
2. Assign each data point to the cluster that has the closest centroid.
3. Recalculate each of the k centroids as the mean of the data points assigned to its cluster.
4. Repeat steps 2 and 3 until the centroids no longer move.
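
The four steps can be sketched in pure Python as Lloyd-style k-means. This is a minimal illustration, assuming points are equal-length tuples of floats; the function name kmeans and its parameters (initial, max_iter, seed) are hypothetical and do not reflect the KMeans function's SQL syntax. Random initialization here picks k distinct data points, which is one common choice and not necessarily the one Vantage uses. The max_iter cap is an added safeguard.

```python
import math
import random

def kmeans(points, k, initial=None, max_iter=100, seed=None):
    """Lloyd-style k-means on a list of equal-length float tuples.

    Returns (centroids, labels), where labels[i] is the cluster index that the
    last assignment step gave to points[i].
    """
    rng = random.Random(seed)
    # Step 1: use the given centroids, or pick k distinct data points at random.
    start = initial if initial is not None else rng.sample(points, k)
    centroids = [tuple(c) for c in start]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Step 3: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: leave its centroid in place
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical example with explicitly specified initial centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2, initial=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Exact equality is a workable stopping test here because, once the assignments stop changing, step 3 recomputes the same means from the same members.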

Although the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration, which corresponds to the global minimum of the objective function. The algorithm is also highly sensitive to the randomly selected initial cluster centers. To reduce the effect of these limitations, the k-means algorithm can be run multiple times from different initial centroids, keeping the result with the lowest objective value.
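
Running k-means several times can be sketched as follows, again in pure Python with hypothetical names (lloyd, kmeans_restarts). Each run starts from k randomly chosen data points, and the run with the lowest within-cluster sum of squared distances (SSE) is kept. This is the common textbook approach; it does not describe how, or whether, the KMeans function itself repeats runs.

```python
import math
import random

def lloyd(points, k, rng, max_iter=100):
    """One k-means run from k randomly chosen data points; returns (centroids, sse)."""
    centroids = [tuple(c) for c in rng.sample(points, k)]
    for _ in range(max_iter):
        # Group the points by their closest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, centroids[j]))].append(p)
        # Recompute centroids as group means; an empty group keeps its old centroid.
        new = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    # SSE: squared distance from each point to its nearest final centroid.
    sse = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, sse

def kmeans_restarts(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times from different random starts; keep the lowest-SSE result."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[1])

# Hypothetical data: three vertical pairs. A single run can stop in a poor local
# minimum (for example, two centroids inside one pair); the best SSE possible is 1.5.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
centroids, sse = kmeans_restarts(points, k=3, n_runs=20)
print(sorted(centroids), sse)
```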