The KmeansTrain class defines a wrapper function that uses the Aster Spark API to implement the training phase of the Spark MLlib K-means clustering algorithm. The model that the function generates is typically used as input to the KMeansRun function.
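To show what "training" produces, here is a minimal plain-Scala sketch of K-means training using Lloyd's algorithm (assign each point to its nearest center, then move each center to the mean of its points). It is not the Spark MLlib implementation and does not use Spark: it runs on in-memory `Vector[Double]` points instead of `RDD[DataRow]`, and it uses random initialization only (no k-means||). The object name `KMeansTrainSketch` and its methods are hypothetical. The `k`, `maxIterations`, `runs`, and `seed` arguments mirror the parameters described below, and several runs are compared by their sum of squared distances.

```scala
import scala.util.Random

// Hypothetical, Spark-free sketch of K-means training (Lloyd's algorithm).
object KMeansTrainSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center nearest to p.
  def closest(centers: Vector[Point], p: Point): Int =
    centers.indices.minBy(i => sqDist(centers(i), p))

  // Sum of squared distances from each point to its nearest center.
  def cost(centers: Vector[Point], data: Vector[Point]): Double =
    data.map(p => sqDist(centers(closest(centers, p)), p)).sum

  // Component-wise mean of a non-empty group of points.
  private def mean(ps: Vector[Point]): Point =
    ps.transpose.map(_.sum / ps.size)

  // One run: pick k random input points as centers, then alternate
  // assignment and update steps until nothing changes or maxIterations is hit.
  def trainOnce(data: Vector[Point], k: Int, maxIterations: Int, rng: Random): Vector[Point] = {
    var centers = rng.shuffle(data).take(k)
    var iter = 0
    var changed = true
    while (iter < maxIterations && changed) {
      val groups = data.groupBy(p => closest(centers, p))
      // If a cluster ends up empty, keep its previous center.
      val updated = centers.indices
        .map(i => groups.get(i).map(mean).getOrElse(centers(i)))
        .toVector
      changed = updated != centers
      centers = updated
      iter += 1
    }
    centers
  }

  // Several runs from different random starts; keep the lowest-cost centers.
  def train(data: Vector[Point], k: Int, maxIterations: Int,
            runs: Int = 1, seed: Long = 42L): Vector[Point] = {
    require(k >= 1 && data.size >= k, "need at least k input points")
    require(runs >= 1, "runs must be at least 1")
    val rng = new Random(seed)
    (1 to runs).map(_ => trainOnce(data, k, maxIterations, rng)).minBy(c => cost(c, data))
  }
}
```

The returned centers play the role of the saved model: any later prediction step only needs them to assign new points to clusters.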
Run Method Signature
run(input: RDD[DataRow], sparkFunctParams: String): RDD[DataRow]
Parameters
input
The input data, as an RDD of DataRow objects.
sparkFunctParams
String that specifies the parameters specific to this function. The string has this syntax:
'--option_value_pair [,...]'
option_value_pair is one of the following:
- initializationMode { "random" | "k-means||" }
  Method for choosing the initial cluster centers: random selection, or k-means||, a parallel variant of k-means++. Default: "k-means||"
- k clusters
  Number of clusters.
- maxIterations max_iterations
  Maximum number of iterations.
- modelLocation model_location
  Required. HDFS path where the function saves the model.
- runs runs
  Number of runs of the algorithm to execute in parallel. Spark returns the best clustering found across all runs. Default: 1.
- seed seed
  Random seed value for cluster initialization.
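The rules in the list above (two allowed initialization modes, a required model location, and the documented defaults) can be captured in a small validation step. The following is a hypothetical sketch, assuming the option/value pairs have already been split out of the parameter string into a `Map[String, String]`; it does not reproduce how the Aster wrapper actually parses the string. The names `KmeansTrainOptions` and `fromPairs` are illustrative only.

```scala
// Hypothetical holder for the documented KmeansTrain options.
final case class KmeansTrainOptions(
  initializationMode: String,
  k: Option[Int],
  maxIterations: Option[Int],
  modelLocation: String,
  runs: Int,
  seed: Option[Long]
)

object KmeansTrainOptions {
  private val allowedModes = Set("random", "k-means||")

  // Applies the documented defaults and checks the documented constraints.
  // Numeric values are converted with toInt/toLong, which throw
  // NumberFormatException on malformed input.
  def fromPairs(pairs: Map[String, String]): Either[String, KmeansTrainOptions] = {
    val mode = pairs.getOrElse("initializationMode", "k-means||") // documented default
    if (!allowedModes.contains(mode)) Left(s"unknown initializationMode: $mode")
    else pairs.get("modelLocation") match {
      case None => Left("modelLocation is required")
      case Some(location) =>
        Right(KmeansTrainOptions(
          initializationMode = mode,
          k = pairs.get("k").map(_.toInt),
          maxIterations = pairs.get("maxIterations").map(_.toInt),
          modelLocation = location,
          runs = pairs.get("runs").map(_.toInt).getOrElse(1), // documented default
          seed = pairs.get("seed").map(_.toLong)
        ))
    }
  }
}
```

Returning `Either` keeps a missing `modelLocation` or an unknown mode as an ordinary error value rather than an exception; the path used in any call (for example `/tmp/model`) is a placeholder.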
Returns
The input data and, for each row, the predicted value (that is, the cluster number).
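The returned "predicted value" is the cluster each input row falls into. A minimal sketch of that assignment, assuming the model is just a list of cluster centers, pairs each input point with the index of its nearest center. It uses plain Scala collections instead of `RDD[DataRow]`, and the object name `KmeansPredictSketch` is hypothetical. Clusters are numbered from 0 here, as Spark MLlib's `KMeansModel.predict` does; whether the Aster wrapper uses the same numbering is an assumption.

```scala
// Hypothetical sketch: attach the nearest-center index to each input point.
object KmeansPredictSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  private def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // 0-based index of the nearest center.
  def predict(centers: Vector[Point], p: Point): Int =
    centers.indices.minBy(i => sqDist(centers(i), p))

  // Each input row paired with its predicted cluster number.
  def withClusters(centers: Vector[Point], rows: Vector[Point]): Vector[(Point, Int)] =
    rows.map(p => (p, predict(centers, p)))
}
```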
Side Effects
The function saves the model in model_location.
Version
Spark 1.4 and later.