Description
The KMeans function takes a data set and outputs the centroids of its clusters and, optionally, the clusters themselves. The algorithm groups a set of observations into k clusters, assigning each observation to the cluster with the nearest centroid (mean). The algorithm minimizes an objective function; in the KMeans function, the objective function is the total Euclidean distance of all data points from the center of the cluster to which they are assigned.
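The assign-and-update loop described above can be sketched in base R. This is only an illustration and not the td_kmeans_mle implementation: the helper names `dist_to_centers` and `simple_kmeans` are hypothetical, and the stopping rule (stop when no centroid moves by more than `threshold`, or after `iter.max` passes) is an assumption about how those two arguments are used.

```r
# Distance from every observation (rows of x) to every centroid (rows of centers).
# Hypothetical helper, for illustration only.
dist_to_centers <- function(x, centers) {
  d <- sapply(seq_len(nrow(centers)), function(j) {
    sqrt(rowSums(sweep(x, 2, centers[j, ])^2))
  })
  matrix(d, nrow = nrow(x))
}

# Minimal k-means sketch; the stopping rule is an assumption.
simple_kmeans <- function(x, centers, iter.max = 10, threshold = 0.0395) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  iterations <- 0
  for (iter in seq_len(iter.max)) {
    iterations <- iter
    # Assignment step: nearest centroid for each observation.
    cluster <- max.col(-dist_to_centers(x, centers), ties.method = "first")
    # Update step: each centroid becomes the mean of its members.
    new_centers <- centers
    for (j in seq_len(nrow(centers))) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) new_centers[j, ] <- colMeans(members)
    }
    # Largest distance moved by any centroid in this pass.
    shift <- max(sqrt(rowSums((new_centers - centers)^2)))
    centers <- new_centers
    if (shift < threshold) break
  }
  # Final assignment and objective: total Euclidean distance to assigned centroid.
  d <- dist_to_centers(x, centers)
  cluster <- max.col(-d, ties.method = "first")
  total.distance <- sum(d[cbind(seq_len(nrow(x)), cluster)])
  list(centers = centers, cluster = cluster,
       total.distance = total.distance, iterations = iterations)
}

# Two well-separated groups of three points, seeded with one point from each.
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
fit <- simple_kmeans(x, centers = x[c(1, 4), ])
print(fit$cluster)
print(fit$centers)
```

The sketch reports the plain sum of Euclidean distances to match the wording of the description; base R's own `kmeans()` reports within-cluster sums of squared distances instead.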
Usage
td_kmeans_mle (
  data = NULL,
  centers = NULL,
  iter.max = 10,
  initial.seeds = NULL,
  seed = NULL,
  unpack.columns = FALSE,
  centroids.table = NULL,
  threshold = 0.0395,
  data.sequence.column = NULL,
  centroids.table.sequence.column = NULL
)
Arguments
data 
Required Argument. 
centers 
Optional Argument. 
iter.max 
Optional Argument. 
initial.seeds 
Optional Argument. 
seed 
Optional Argument. 
unpack.columns 
Optional Argument. 
centroids.table 
Optional Argument. 
threshold 
Optional Argument. 
data.sequence.column 
Optional Argument. 
centroids.table.sequence.column 
Optional Argument. 
Value
Function returns an object of class "td_kmeans_mle" which is a named
list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the following names:
clusters.centroids
clustered.output
output
Examples
# Get the current context/connection.
con <- td_get_context()$connection

# Load example data.
loadExampleData("kmeans_example", "computers_train1")

# Create object(s) of class "tbl_teradata".
computers_train1 <- tbl(con, "computers_train1")

# These examples use different arguments to find clusters based on the five
# attributes of personal computers data in the input tbl_teradata.

# Example 1: Using "centers" to specify the number of clusters to generate.
td_kmeans_out1 <- td_kmeans_mle(data = computers_train1,
                                centers = 8,
                                iter.max = 10,
                                threshold = 0.05
                                )

# Example 2: Using "centers" to specify the number of clusters to generate, and
# setting "unpack.columns" to TRUE to make sure the centroids appear unpacked in
# the "clusters.centroids" output tbl_teradata.
td_kmeans_out2 <- td_kmeans_mle(data = computers_train1,
                                centers = 8,
                                iter.max = 10,
                                unpack.columns = TRUE,
                                threshold = 0.05
                                )

# Example 3: Using "initial.seeds" to specify the initial seed means.
td_kmeans_out3 <- td_kmeans_mle(data = computers_train1,
                                initial.seeds = c("2249_51_408_8_14",
                                                  "2165_51_398_7_14.6",
                                                  "2182_51_404_7_14.6",
                                                  "2204_55_372_7.19_14.6",
                                                  "2419_44_222_6.6_14.3",
                                                  "2394_44.3_277_7.3_14.5",
                                                  "2326_43.6_301_7.11_14.3",
                                                  "2288_44_325_7_14.4")
                                )
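
The "initial.seeds" strings in Example 3 appear to encode one starting centroid each, with one underscore-separated value per input attribute (five here). Under that assumption, the following base R sketch turns the strings into a numeric matrix so the seed coordinates can be inspected before they are passed to the function. The name `seed_matrix` is illustrative only, and the snippet needs no Teradata connection.

```r
# Seed strings copied from Example 3; assumed format: one value per attribute,
# separated by underscores.
initial.seeds <- c("2249_51_408_8_14",
                   "2165_51_398_7_14.6",
                   "2182_51_404_7_14.6",
                   "2204_55_372_7.19_14.6",
                   "2419_44_222_6.6_14.3",
                   "2394_44.3_277_7.3_14.5",
                   "2326_43.6_301_7.11_14.3",
                   "2288_44_325_7_14.4")

# Split each string on "_" and convert the pieces to numbers.
parts <- lapply(strsplit(initial.seeds, "_", fixed = TRUE), as.numeric)

# Every seed should carry the same number of coordinates.
stopifnot(length(unique(lengths(parts))) == 1)

# One row per seed, one column per attribute.
seed_matrix <- do.call(rbind, parts)
print(dim(seed_matrix))
print(seed_matrix)
```

A check like this catches a seed with a missing or extra coordinate before it reaches the database.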