The MLPCTrainDF class defines a wrapper function that uses the Aster Spark API and implements the training phase of the Spark MLlib MultilayerPerceptronClassifier (MLP), both by itself and in a pipeline with Principal Component Analysis (PCA). The function generates a model that is typically used by the MLPCRunDF function.
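For orientation, the following is a minimal sketch of what the two training modes might correspond to in plain Spark MLlib. It is not the MLPCTrainDF implementation. The object name MlpTrainingSketch, the method buildPipeline, and the column names "features" and "pcaFeatures" are assumptions made for illustration. Only the Spark MLlib classes and setters are real.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.feature.PCA

// Hypothetical helper, not part of the Aster Spark API.
object MlpTrainingSketch {
  // Builds an MLP-only pipeline when pcaK is None, or a PCA -> MLP pipeline
  // when pcaK is defined. Column names are assumptions for illustration.
  def buildPipeline(layers: Array[Int], labelCol: String, pcaK: Option[Int]): Pipeline = {
    // The classifier reads the PCA output column only when PCA is enabled.
    val featuresCol = if (pcaK.isDefined) "pcaFeatures" else "features"

    val mlp = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setLabelCol(labelCol)
      .setFeaturesCol(featuresCol)

    // Zero or one PCA stage, depending on whether pcaK was given.
    val pcaStage: List[PipelineStage] = pcaK.toList.map { k =>
      new PCA().setInputCol("features").setOutputCol("pcaFeatures").setK(k)
    }

    val stages: Array[PipelineStage] = (pcaStage :+ mlp).toArray
    new Pipeline().setStages(stages)
  }
}
```

Calling MlpTrainingSketch.buildPipeline(Array(2, 5, 3), "labels", Some(2)) would yield a two-stage pipeline, and passing None would yield the classifier alone. In plain Spark MLlib, the first entry of layers has to match the length of the feature vector, which after PCA is the number of components. Whether MLPCTrainDF adjusts this automatically is not stated in this description.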
Run Method Signature
run(inputDF: DataFrame, functParams: String): DataFrame
Parameters
'--option_value_pair [,...]'
-
blockSize block_size
[Optional] Specifies the block size for stacking input data in matrices to accelerate the computation.
-
labelCol label_column
[Optional] Specifies the name of the column that contains labels. Default: 'labels'.
-
layers n[:...]
Required. Specifies the number of elements in each layer. The first n applies to the input layer, the last n applies to the output layer, and each n in between applies to a hidden layer. For example, 700:300:200:10 specifies 700 elements in the input layer, 300 elements in the first hidden layer, 200 elements in the second hidden layer, and 10 elements in the output layer.
-
maxIter maximum_iterations
[Optional] Specifies the maximum number of iterations. The maximum_iterations must be nonnegative.
-
modelLocation model_location
Required. Specifies the HDFS path to the location where the function is to save the model.
-
pcaK pcak_value
[Optional] If specified, the function uses PCA and pcak_value is the number of components for the PCA algorithm. If omitted, the function does not use PCA.
-
seed seed
[Optional] Specifies a random seed.
-
tol tol
[Optional] Specifies the convergence tolerance for iterative algorithms.
-
ignoreCols column[,...]
[Optional] Specifies the names of input columns to copy to the output table.
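As an illustration of the layers format described above, the following sketch splits a value such as 700:300:200:10 into its input, hidden, and output sizes. The LayersSpec object and its parse method are hypothetical helpers, not part of the Aster Spark API, and the two sanity checks are assumptions about what a reasonable layers value looks like.

```scala
// Hypothetical helper, not part of the Aster Spark API: parses a layers
// value such as "700:300:200:10" into per-layer element counts.
object LayersSpec {
  final case class Layers(input: Int, hidden: Seq[Int], output: Int) {
    // Same order as the original string: input, hidden layers, output.
    def toArray: Array[Int] = (input +: hidden :+ output).toArray
  }

  def parse(spec: String): Layers = {
    // A non-numeric entry makes toInt throw NumberFormatException.
    val sizes = spec.split(":").map(_.trim.toInt)
    require(sizes.length >= 2, "layers needs at least an input and an output layer")
    require(sizes.forall(_ > 0), "every layer needs a positive number of elements")
    // Everything between the first and last entry is a hidden layer.
    Layers(sizes.head, sizes.slice(1, sizes.length - 1).toSeq, sizes.last)
  }
}
```

With this sketch, LayersSpec.parse("700:300:200:10") gives an input size of 700, hidden sizes 300 and 200, and an output size of 10. The toArray result has the shape that the setLayers method of the Spark MLlib MultilayerPerceptronClassifier expects.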
Returns
A DataFrame that contains the labels in label_column (if specified), the predicted values, and the input columns.
Side Effects
The function saves the model in model_location.
Version
Spark 1.5 and later.