MLPCTrainDF - Aster Analytics

Teradata Aster® Spark Connector User Guide

Product: Aster Analytics
Release Number: 7.00.00.01
Published: May 2017
Language: English (United States)
Last Update: 2018-04-13

The MLPCTrainDF class defines a wrapper function that uses the Aster Spark API and implements the training phase of the Spark MLlib MultilayerPerceptronClassifier (MLP), both by itself and in a pipeline with Principal Component Analysis (PCA). The function generates a model that is typically used by the MLPCRunDF function.

Run Method Signature

run(inputDF: DataFrame, functParams: String): DataFrame

Parameters

functParams: String that specifies the parameters specific to this function. The string has this syntax:
'--option_value_pair [,...]'
option_value_pair is one of the following:
  • blockSize block_size

    [Optional] Specifies the block size for stacking input data in matrices to accelerate the computation.

  • labelCol label_column

    [Optional] Specifies the name of the column that contains labels. Default: 'labels'.

  • layers n[:...]

    Required. Specifies the number of elements in each layer. The first n applies to the input layer, the last n applies to the output layer, and each n in between applies to a hidden layer. For example, 700:300:200:10 specifies 700 elements in the input layer, 300 elements in the first hidden layer, 200 elements in the second hidden layer, and 10 elements in the output layer.

  • maxIter maximum_iterations

    [Optional] Specifies the maximum number of iterations. The maximum_iterations must be nonnegative.

  • modelLocation model_location

    Required. Specifies the HDFS path to the location where the function is to save the model.

  • pcaK pcak_value

    [Optional] If specified, the function uses PCA and pcak_value is the number of components for the PCA algorithm. If omitted, the function does not use PCA.

  • seed seed

    [Optional] Specifies a random seed.

  • tol tol

    Specifies the convergence tolerance for iterative algorithms.

  • ignoreCols column[,...]

    [Optional] Specifies the names of input columns to copy to the output table.
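
The option list above can be illustrated with a minimal Scala sketch that splits a layers value into its input, hidden, and output sizes and assembles a functParams string. The helper names (parseLayers, buildFunctParams) are hypothetical and are not part of the Aster Spark API. The space-separated "--option value" layout, the requirement that each layer size be positive, and the example HDFS path are assumptions made for illustration; check them against your installation.

```scala
// Sketch only. MlpcParams, parseLayers, and buildFunctParams are hypothetical
// helpers, not part of the Aster Spark API.
object MlpcParams {

  /** Splits a layers value such as "700:300:200:10" into
    * (input layer size, hidden layer sizes, output layer size). */
  def parseLayers(spec: String): (Int, Seq[Int], Int) = {
    val sizes = spec.split(":").map(_.trim.toInt).toList
    require(sizes.length >= 2, "layers needs at least an input and an output layer")
    // Assumption: a layer with zero or negative elements is not meaningful.
    require(sizes.forall(_ > 0), "every layer size must be positive")
    (sizes.head, sizes.slice(1, sizes.length - 1), sizes.last)
  }

  /** Joins option/value pairs into one string.
    * Assumption: pairs are written as "--option value" and separated by spaces. */
  def buildFunctParams(options: Seq[(String, String)]): String =
    options.map { case (name, value) => s"--$name $value" }.mkString(" ")

  def main(args: Array[String]): Unit = {
    val (inputSize, hiddenSizes, outputSize) = parseLayers("700:300:200:10")
    println(s"input=$inputSize hidden=${hiddenSizes.mkString(",")} output=$outputSize")

    val functParams = buildFunctParams(Seq(
      "layers"        -> "700:300:200:10",
      "modelLocation" -> "/tmp/mlpc_model", // hypothetical HDFS path
      "maxIter"       -> "100",
      "pcaK"          -> "50"               // omit this pair to skip PCA
    ))
    println(functParams)
  }
}
```

Validating the layers value before building the string surfaces a malformed specification (for example, a single number with no output layer) on the client side rather than after the function has started.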

Returns

A DataFrame that contains the labels in label_column (if specified), the predicted values, and the input columns.

Side Effects

The function saves the model in model_location.

Version

Spark 1.5 and later.