Description
The KMeans function takes a data set and outputs the centroids of its
clusters and, optionally, the clusters themselves.
Usage
td_kmeans_mle (
data = NULL,
centers = NULL,
iter.max = 10,
initial.seeds = NULL,
seed = NULL,
unpack.columns = FALSE,
centroids.table = NULL,
threshold = 0.0395,
data.sequence.column = NULL,
centroids.table.sequence.column = NULL
)
Arguments
data |
Required Argument.
Specifies the input dataset containing the features by which to
cluster the data.
|
centers |
Optional Argument.
Specifies the number of clusters to generate from the data.
Note: With centers, the function uses a nondeterministic algorithm and
supports up to 1543 dimensions.
|
iter.max |
Optional Argument.
Specifies the maximum number of iterations that the algorithm runs
before quitting if the convergence threshold has not been met.
Default Value: 10
|
initial.seeds |
Optional Argument.
Specifies the initial seed means as strings of underscore-delimited
DOUBLE PRECISION values. For example, this vector initializes eight
clusters in eight-dimensional space:
c("50_50_50_50_50_50_50_50", "150_150_150_150_150_150_150_150",
"250_250_250_250_250_250_250_250", "350_350_350_350_350_350_350_350",
"450_450_450_450_450_450_450_450", "550_550_550_550_550_550_550_550",
"650_650_650_650_650_650_650_650", "750_750_750_750_750_750_750_750")
The dimensionality of the means must match the dimensionality of the
data (that is, each mean must have n numbers in it, where n is the
number of input columns minus one). By default, the algorithm chooses
the initial seed means randomly.
Note: With initial.seeds, the function uses a deterministic algorithm and
supports up to 1596 dimensions.
|
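The underscore-delimited format described for initial.seeds can be built from numeric values rather than typed by hand. The following is a minimal sketch in base R, assuming the seed coordinates are held in a matrix with one row per cluster and one column per feature; the names seed_means and seed_strings are hypothetical and not part of the function's interface.

```r
# Hypothetical seed coordinates: one row per cluster, one column per feature.
seed_means <- matrix(
  c(2249, 51, 408, 8, 14,
    2165, 51, 398, 7, 14.6,
    2182, 51, 404, 7, 14.6),
  ncol = 5, byrow = TRUE
)

# Join the values of each row with underscores, giving one string per cluster.
seed_strings <- apply(seed_means, 1, function(row) paste(row, collapse = "_"))

print(seed_strings)
```

A vector built this way could then be supplied as initial.seeds = seed_strings. Very large or very small values may be rendered in scientific notation by paste, so it is worth inspecting the strings before passing them on.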
seed |
Optional Argument.
Sets a random seed for the algorithm.
|
unpack.columns |
Optional Argument.
Specifies whether the means for each centroid appear unpacked (that
is, in separate columns) in output_table. By default, the function
concatenates the means for the centroids and outputs the result in a
single VARCHAR column.
Default Value: FALSE
|
centroids.table |
Optional Argument.
Specifies the input dataset that contains the initial seed means for
the clusters. The schema of the centroids table depends on the value
of the unpack.columns argument.
Note: With centroids.table, the function uses a deterministic algorithm
and supports up to 1596 dimensions.
|
threshold |
Optional Argument.
Specifies the convergence threshold. When the centroids move by less
than this amount, the algorithm has converged.
Default Value: 0.0395
|
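To illustrate what the convergence threshold means, the sketch below checks whether every centroid moved by less than the threshold between two iterations. It is an illustration only, not the function's internal code: the distance measure is not stated in this page, so Euclidean distance is an assumption, and has_converged is a hypothetical helper.

```r
# Illustrative only. Assumption: centroid movement is measured as
# Euclidean distance; this page does not state the measure actually used.
has_converged <- function(old_centroids, new_centroids, threshold = 0.0395) {
  # Row-wise distance that each centroid moved in this iteration.
  movement <- sqrt(rowSums((new_centroids - old_centroids)^2))
  all(movement < threshold)
}

# Two hypothetical centroids in two-dimensional space.
old <- matrix(c(1, 1,
                5, 5), ncol = 2, byrow = TRUE)
new_small <- old + 0.01  # each centroid moves about 0.014
new_large <- old + 0.10  # each centroid moves about 0.141

print(has_converged(old, new_small))
print(has_converged(old, new_large))
```

Under this reading, a larger threshold stops the algorithm earlier, while iter.max caps the number of iterations when the threshold is never met.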
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
|
centroids.table.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "centroids.table". The argument is used to
ensure deterministic results for functions which produce results that
vary from run to run.
|
Value
The function returns an object of class "td_kmeans_mle", which is a named
list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator
using the following names:
clusters.centroids
clustered.output
output
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("kmeans_example", "computers_train1")
# Create remote tibble objects.
computers_train1 <- tbl(con, "computers_train1")
# Example 1 - Generate 8 clusters; the initial seed means are chosen randomly.
td_kmeans_out1 <- td_kmeans_mle(data = computers_train1,
                                centers = 8,
                                iter.max = 10,
                                threshold = 0.05
                                )
# Example 2 - Same as Example 1, with the centroid means unpacked into
# separate columns.
td_kmeans_out2 <- td_kmeans_mle(data = computers_train1,
                                centers = 8,
                                iter.max = 10,
                                unpack.columns = TRUE,
                                threshold = 0.05
                                )
# Example 3 - Supply the initial seed means explicitly with initial.seeds.
td_kmeans_out3 <- td_kmeans_mle(data = computers_train1,
                                initial.seeds = c("2249_51_408_8_14", "2165_51_398_7_14.6",
                                                  "2182_51_404_7_14.6", "2204_55_372_7.19_14.6",
                                                  "2419_44_222_6.6_14.3", "2394_44.3_277_7.3_14.5",
                                                  "2326_43.6_301_7.11_14.3", "2288_44_325_7_14.4")
                                )