Output - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product

Aster Analytics

Release Number

6.21

Published

November 2016

Language

English (United States)

Last Update

2018-04-14

dita:id

B700-1021

The KMeans function has two required outputs and one optional output. The required outputs are the results message (output to the screen) and the table of cluster centroids (specified by the OutputTable argument). The optional output is a table of the clusters themselves (specified by the ClusteredOutput argument).

The results message starts with a table of information about each cluster, followed by a list of summary messages. The following two tables describe them.

KMeans Results Message Table Schema
Column Name	Data Type	Description
clusterid	INTEGER	Contains the cluster identifiers of the centroids.
feature_set	VARCHAR	Column name is the concatenation of the feature names. For example, if the feature names are 'p1', 'p2', and 'p3', then the column name is 'p1 p2 p3'. Contains the concatenation of the means of the features in the centroid. For example, means 3, 5, and 6 are represented as '3 5 6'. The UnpackColumns argument does not affect this column.
size	INTEGER	Contains the number of points in the cluster.
withinss	INTEGER	Contains the within-cluster sum of squares: the sum of squared distances from each point in the cluster to the cluster centroid.
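
The withinss definition can be illustrated with a small sketch. This is not the KMeans function's implementation; it assumes Euclidean distance and a centroid equal to the per-feature mean of the cluster, and the names withinss and centroid_of are hypothetical helpers introduced only for this example.

```python
def centroid_of(points):
    """Per-feature mean of the points (assumed definition of the centroid)."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def withinss(points, centroid):
    """Sum of squared Euclidean distances from each point to the centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroid))
        for point in points
    )


# Three two-dimensional points that form one cluster (made-up data).
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
center = centroid_of(cluster)                # [3.0, 4.0]
cluster_withinss = withinss(cluster, center)  # 8 + 0 + 8 = 16.0
```

A smaller withinss value means the points of the cluster lie closer to its centroid.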

KMeans Results Messages
Label	Value
Converged :	'True' if the algorithm converged, 'False' otherwise.
Number of iterations :	Number of iterations that the algorithm performed.
Number of clusters :	Number of clusters.
Output table :	Name of the output table specified by the OutputTable argument.
Total_WithinSS :	Sum of withinss values in the preceding table.
Between_SS :	Between-cluster sum of squares: the sum, over all clusters, of the squared distance from the cluster centroid to the global mean, multiplied by the number of data points in the cluster.
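
As a rough illustration of the two summary statistics, the following sketch recomputes them from per-cluster rows. It is an assumption-laden example, not the function's code: the dictionary key "centroid" is hypothetical (the results message stores the means as a packed string), the rows are made-up data, and the global mean is taken to be the size-weighted mean of the centroids, which matches the mean of all points only when each centroid is the mean of its cluster.

```python
def total_withinss(rows):
    """Sum of the per-cluster withinss values."""
    return sum(row["withinss"] for row in rows)


def between_ss(rows):
    """Size-weighted sum of squared distances from centroids to the global mean."""
    total_points = sum(row["size"] for row in rows)
    dims = len(rows[0]["centroid"])
    # Global mean, assumed to equal the size-weighted mean of the centroids.
    global_mean = [
        sum(row["size"] * row["centroid"][d] for row in rows) / total_points
        for d in range(dims)
    ]
    return sum(
        row["size"] * sum((c - g) ** 2 for c, g in zip(row["centroid"], global_mean))
        for row in rows
    )


# Two made-up clusters of two points each.
rows = [
    {"clusterid": 0, "centroid": [0.0, 0.0], "size": 2, "withinss": 1.0},
    {"clusterid": 1, "centroid": [4.0, 0.0], "size": 2, "withinss": 3.0},
]

total = total_withinss(rows)  # 1.0 + 3.0 = 4.0
between = between_ss(rows)    # 2 * 4.0 + 2 * 4.0 = 16.0
```

A large Between_SS relative to Total_WithinSS suggests clusters that are well separated compared with their internal spread.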

The schema of the table of cluster centroids is affected by the UnpackColumns argument.

KMeans Output Table Schema, UnpackColumns('false') (Default)
Column Name	Data Type	Description
clusterid	INTEGER	Contains the cluster identifiers of the centroids.
feature_set	VARCHAR	Column name is the concatenation of the feature names. For example, if the feature names are 'p1', 'p2', and 'p3', then the column name is 'p1 p2 p3'. Contains the concatenation of the means of the features in the centroid. For example, means 3, 5, and 6 are represented as '3 5 6'.
size	INTEGER	Contains the number of points in the cluster.
withinss	INTEGER	Contains the within-cluster sum of squares: the sum of squared distances from each point in the cluster to the cluster centroid.

KMeans Output Table Schema, UnpackColumns('true')
Column Name	Data Type	Description
clusterid	INTEGER	Contains the cluster identifiers of the centroids.
feature_i	INTEGER or VARCHAR	Contains the means for feature i. The table has one such column for each feature.
size	INTEGER	Contains the number of points in the cluster.
withinss	INTEGER	Contains the within-cluster sum of squares: the sum of squared distances from each point in the cluster to the cluster centroid.
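
The relationship between the two schemas can be sketched as a client-side conversion from a packed row to an unpacked row. This is only an illustration: it assumes that names and means are separated by single spaces, as in the 'p1 p2 p3' and '3 5 6' examples, and that feature names contain no spaces. The helper unpack_row and the sample row are hypothetical.

```python
def unpack_row(packed_column, row):
    """Split a packed centroid row into one column per feature."""
    features = packed_column.split()
    means = [float(value) for value in row[packed_column].split()]
    if len(features) != len(means):
        raise ValueError("feature names and means differ in length")
    unpacked = {"clusterid": row["clusterid"]}
    unpacked.update(zip(features, means))
    unpacked["size"] = row["size"]
    unpacked["withinss"] = row["withinss"]
    return unpacked


# A made-up row in the UnpackColumns('false') layout.
packed = {"clusterid": 0, "p1 p2 p3": "3 5 6", "size": 10, "withinss": 42}
unpacked = unpack_row("p1 p2 p3", packed)
# {'clusterid': 0, 'p1': 3.0, 'p2': 5.0, 'p3': 6.0, 'size': 10, 'withinss': 42}
```

Specifying UnpackColumns('true') makes this kind of post-processing unnecessary, because the function writes one column per feature itself.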

The following table describes the optional clustered output table (specified by the ClusteredOutput argument), which assigns each point to a centroid.

KMeans Clustered Output Table Schema
Column Name	Data Type	Description
pointid	INTEGER	Contains the identifier of the user or item (from input_table).
centroidid	INTEGER	Contains the identifier of the centroid for pointid.
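
One way to sanity-check the clustered output is to count the points assigned to each centroid and compare the counts with the size column of the centroid table. The sketch below does this on made-up rows. It assumes that centroidid values correspond to clusterid values in the centroid table, which the schemas suggest but do not state explicitly. The helper cluster_sizes is hypothetical.

```python
from collections import Counter


def cluster_sizes(clustered_rows):
    """Count how many points are assigned to each centroid."""
    return Counter(row["centroidid"] for row in clustered_rows)


# Made-up rows in the clustered output layout.
clustered = [
    {"pointid": 1, "centroidid": 0},
    {"pointid": 2, "centroidid": 1},
    {"pointid": 3, "centroidid": 0},
    {"pointid": 4, "centroidid": 1},
    {"pointid": 5, "centroidid": 1},
]

sizes = cluster_sizes(clustered)  # Counter({1: 3, 0: 2})
```

If the counts differ from the size values in the centroid table, the two tables probably come from different runs of the function.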