KMeans Output - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Teradata Vantage
May 2019
English (United States)
Output           Description
Results message  Contains information about each cluster.
OutputTable      Contains cluster centroids. Schema depends on the UnpackColumns argument.
ClusteredOutput  [Optional] Contains the cluster assignments: the centroid assigned to each input point.

Results Message Schema

Column       Data Type  Description
clusterid    INTEGER    Cluster identifier of centroid.
feature_set  VARCHAR    Concatenation of means in centroid. For example, means 3, 5, and 6 are represented as '3 5 6'.
                        The column name is the concatenation of the feature names. For example, for feature names 'p1', 'p2', and 'p3', the column name is 'p1 p2 p3'.
                        The UnpackColumns argument does not affect this column.
size         INTEGER    Number of points in cluster.
withinss     INTEGER    Within-cluster sum of squares: sum of squared differences of each point from its cluster centroid.
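
As an illustration of how the size and withinss columns relate to the data, the following Python sketch computes both values for one cluster. It assumes squared Euclidean distance; the function name cluster_stats and the sample points are hypothetical and are not part of the KMeans function.

```python
from math import fsum


def cluster_stats(points, centroid):
    """Return (size, withinss) for one cluster.

    points   -- list of equal-length numeric tuples assigned to the cluster
    centroid -- tuple of per-feature means for the cluster
    """
    size = len(points)
    # Sum the squared per-feature differences between each point and the centroid.
    withinss = fsum(
        (x - c) ** 2
        for point in points
        for x, c in zip(point, centroid)
    )
    return size, withinss


# Hypothetical cluster: three 2-D points whose mean is (3.0, 4.0).
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
centroid = (3.0, 4.0)
size, withinss = cluster_stats(points, centroid)
print(size, withinss)  # prints: 3 16.0
```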

After the information described by the preceding schema, the results message has the following information:

Label Value
Converged : 'True' if algorithm converged, 'False' otherwise.
Number of iterations : Number of iterations algorithm performed.
Number of clusters : Number of clusters.
Successfully created Output table  
Successfully created Clustered Output table [Label appears only with ClusteredOutput.]
Total_WithinSS : Sum of withinss values in preceding table.
Between_SS : Between-cluster sum of squares: the sum, over all centroids, of the squared distance from the centroid to the global mean, multiplied by the number of data points in that centroid's cluster.
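
The two summary values can be reproduced from the per-cluster rows. The Python sketch below is one way to do so, under two assumptions: distances are Euclidean, and the global mean equals the size-weighted mean of the centroids (which holds when each centroid is the exact mean of its points). The function names and sample values are hypothetical.

```python
from math import fsum


def total_withinss(withinss_values):
    """Sum the per-cluster withinss values."""
    return fsum(withinss_values)


def between_ss(centroids, sizes):
    """Size-weighted sum of squared distances from each centroid to the global mean."""
    n = sum(sizes)
    dims = len(centroids[0])
    # Assumption: global mean = size-weighted mean of the centroids.
    global_mean = [
        fsum(s * c[d] for c, s in zip(centroids, sizes)) / n
        for d in range(dims)
    ]
    return fsum(
        s * fsum((c[d] - global_mean[d]) ** 2 for d in range(dims))
        for c, s in zip(centroids, sizes)
    )


# Hypothetical result: two clusters with 1 and 3 points.
centroids = [(0.0, 0.0), (4.0, 0.0)]
sizes = [1, 3]
print(total_withinss([2.0, 6.0]))    # prints: 8.0
print(between_ss(centroids, sizes))  # prints: 12.0
```

In the sample, the global mean is (3.0, 0.0), so the first cluster contributes 1 × 9 and the second contributes 3 × 1.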

OutputTable Schema, UnpackColumns ('false') (Default)

Column       Data Type  Description
clusterid    INTEGER    Cluster identifier of centroid.
feature_set  VARCHAR    Concatenation of means in centroid. For example, means 3, 5, and 6 are represented as '3 5 6'.
                        The column name is the concatenation of the feature names. For example, for feature names 'p1', 'p2', and 'p3', the column name is 'p1 p2 p3'.
                        With UnpackColumns ('true'), this column is replaced by one column for each feature.
size         INTEGER    Number of points in cluster.
withinss     INTEGER    Within-cluster sum of squares: sum of squared differences of each point from its cluster centroid.
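
Client code that reads the packed form typically needs to split it back into per-feature values. The following Python sketch shows one way, assuming that both the column name and the value are space-delimited, as in the examples above, and that feature names contain no spaces. The function name parse_centroid is hypothetical.

```python
def parse_centroid(column_name, feature_set):
    """Map each feature name to its mean, given the packed column name and value."""
    names = column_name.split()
    means = [float(value) for value in feature_set.split()]
    if len(names) != len(means):
        raise ValueError(
            f"expected {len(names)} means, found {len(means)}"
        )
    return dict(zip(names, means))


# Column name 'p1 p2 p3' with the packed value '3 5 6'.
print(parse_centroid("p1 p2 p3", "3 5 6"))
# prints: {'p1': 3.0, 'p2': 5.0, 'p3': 6.0}
```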

OutputTable Schema, UnpackColumns ('true')

Column     Data Type           Description
clusterid  INTEGER             Cluster identifier of centroid.
feature_i  INTEGER or VARCHAR  [Column appears once for each feature.] Mean for feature i.
size       INTEGER             Number of points in cluster.
withinss   INTEGER             Within-cluster sum of squares: sum of squared differences of each point from its cluster centroid.

ClusteredOutput Table Schema

Column      Data Type  Description
pointid     INTEGER    Identifier of user or item (id from InputTable).
centroidid  INTEGER    Identifier of the centroid to which pointid is assigned.
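
As a consistency check, the rows of the ClusteredOutput table can be grouped by centroidid to recover the number of points per cluster. The Python sketch below assumes the rows have been fetched as (pointid, centroidid) pairs and that centroidid corresponds to clusterid in the OutputTable; the function name and sample rows are hypothetical.

```python
from collections import Counter


def cluster_sizes(rows):
    """Count the points assigned to each centroid.

    rows -- iterable of (pointid, centroidid) pairs
    """
    return Counter(centroidid for _pointid, centroidid in rows)


# Hypothetical ClusteredOutput rows.
rows = [(101, 0), (102, 1), (103, 0), (104, 1), (105, 1)]
sizes = cluster_sizes(rows)
print(dict(sizes))  # prints: {0: 2, 1: 3}
```

Under the stated assumption, each count should match the size value of the corresponding cluster.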