Description
The KMeans function takes a data set and outputs the centroids of its clusters and, optionally, the clusters themselves. The algorithm groups a set of observations into k clusters with each observation assigned to the cluster with the nearest centroid, or mean. The algorithm minimizes an objective function; in the KMeans function, the objective function is the total Euclidean distance of all data points from the center of the cluster to which they are assigned.
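The assign-and-update idea described above can be sketched in base R. This is only an illustration under simple assumptions (a small numeric matrix, a fixed number of iterations, no convergence threshold); it is not the Teradata implementation, and the helper names assign_to_nearest(), update_centroids() and total_distance() are hypothetical.

```r
# Illustrative sketch in base R; not how td_kmeans_mle is implemented.
# All helper names below are hypothetical.

# Assign each row of `x` to the index of its nearest centroid (Euclidean distance).
assign_to_nearest <- function(x, centroids) {
  apply(x, 1, function(p) {
    p_rows <- matrix(p, nrow(centroids), length(p), byrow = TRUE)
    d <- sqrt(rowSums((centroids - p_rows)^2))
    which.min(d)
  })
}

# Recompute each centroid as the mean of the points assigned to it.
update_centroids <- function(x, centroids, cluster) {
  for (k in seq_len(nrow(centroids))) {
    members <- x[cluster == k, , drop = FALSE]
    if (nrow(members) > 0) centroids[k, ] <- colMeans(members)
  }
  centroids
}

# Objective as described above: total Euclidean distance of all points
# from the centroid of the cluster to which they are assigned.
total_distance <- function(x, centroids, cluster) {
  sum(sqrt(rowSums((x - centroids[cluster, , drop = FALSE])^2)))
}

x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
centroids <- rbind(c(0, 0), c(10, 10))  # starting centroids, chosen by hand

for (i in 1:10) {                        # fixed iteration cap, like iter.max
  cluster <- assign_to_nearest(x, centroids)
  centroids <- update_centroids(x, centroids, cluster)
}
cluster <- assign_to_nearest(x, centroids)
objective <- total_distance(x, centroids, cluster)
```

With this toy input the first two points end up in one cluster and the last two in the other. A real run would also stop early once the centroids move less than a threshold, which this sketch leaves out.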
Usage
td_kmeans_mle(
  data = NULL,
  centers = NULL,
  iter.max = 10,
  initial.seeds = NULL,
  seed = NULL,
  unpack.columns = FALSE,
  centroids.table = NULL,
  threshold = 0.0395,
  data.sequence.column = NULL,
  centroids.table.sequence.column = NULL
)
Arguments
data | Required Argument.
centers | Optional Argument.
iter.max | Optional Argument.
initial.seeds | Optional Argument.
seed | Optional Argument.
unpack.columns | Optional Argument.
centroids.table | Optional Argument.
threshold | Optional Argument.
data.sequence.column | Optional Argument.
centroids.table.sequence.column | Optional Argument.
Value
Function returns an object of class "td_kmeans_mle" which is a named
list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the following names:
clusters.centroids
clustered.output
output
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load example data.
loadExampleData("kmeans_example", "computers_train1")
# Create object(s) of class "tbl_teradata".
computers_train1 <- tbl(con, "computers_train1")
# These examples use different arguments to find clusters based on the five
# attributes of personal computers data in the input tbl_teradata.
# Example 1 - Using "centers" to specify the number of clusters to generate.
td_kmeans_out1 <- td_kmeans_mle(data = computers_train1,
                                centers = 8,
                                iter.max = 10,
                                threshold = 0.05
                                )
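# The members of the returned list can then be read with the "$" operator,
# using the names listed under Value. This is a usage sketch; the rows
# returned depend on the data and on the session.
td_kmeans_out1$clusters.centroids
td_kmeans_out1$clustered.output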
# Example 2 - Using "centers" to specify the number of clusters to generate, and
# setting "unpack.columns" to TRUE so that the centroids appear unpacked in
# the "clusters.centroids" output tbl_teradata.
td_kmeans_out2 <- td_kmeans_mle(data = computers_train1,
                                centers = 8,
                                iter.max = 10,
                                unpack.columns = TRUE,
                                threshold = 0.05
                                )
# Example 3 - Using "initial.seeds" to specify the initial seed means.
td_kmeans_out3 <- td_kmeans_mle(data = computers_train1,
                                initial.seeds = c("2249_51_408_8_14",
                                                  "2165_51_398_7_14.6",
                                                  "2182_51_404_7_14.6",
                                                  "2204_55_372_7.19_14.6",
                                                  "2419_44_222_6.6_14.3",
                                                  "2394_44.3_277_7.3_14.5",
                                                  "2326_43.6_301_7.11_14.3",
                                                  "2288_44_325_7_14.4")
                                )