DenseSVMTrainer Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

Product Category

Software

SampleIdColumn

Specifies the name of the column in the input table that contains the identifiers of the training samples.

AttributeColumns

Specifies the attribute columns, which must have numeric data types.

KernelFunction

[Optional] Specifies the kernel function that the DenseSVMTrainer function uses to compute the hash function:

'linear' (Default)
DenseSVMTrainer uses a Pegasos algorithm to solve the linear SVM.
'polynomial'
DenseSVMTrainer uses a Hash-SVM algorithm.

The formula for a polynomial is: $\gamma (u^{T} v + c)^{d}$
'RBF'
DenseSVMTrainer uses a Hash-SVM algorithm.

The formula for RBF is: $\exp(-\gamma \lVert x - x' \rVert^{2})$
'sigmoid'
DenseSVMTrainer uses a Hash-SVM algorithm.

The formula for sigmoid is: $\tanh(\gamma u^{T} v + c)$

When DenseSVMTrainer uses a Hash-SVM algorithm, each sample is represented by compact hash bits, over which an inner product is defined to serve as the surrogate of the original nonlinear kernels.
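
The three nonlinear kernel formulas can be written out as a minimal Python sketch. It follows the formulas as this section states them; the function names and the helper `dot` are hypothetical and are not part of DenseSVMTrainer, which approximates these kernels with hash bits rather than evaluating them directly.

```python
import math

def dot(u, v):
    """Inner product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, gamma=1.0, c=1.0, d=2):
    # Polynomial kernel as written in this section: gamma * (u.v + c)^d
    return gamma * (dot(u, v) + c) ** d

def rbf_kernel(x, x_prime, gamma=1.0):
    # RBF kernel: exp(-gamma * squared Euclidean distance)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, c=1.0):
    # Sigmoid kernel: tanh(gamma * u.v + c)
    return math.tanh(gamma * dot(u, v) + c)
```

The default values of `gamma`, `c`, and `d` mirror the defaults of the Gamma, Constant, and Degree arguments.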

Gamma

[Optional] Use only when KernelFunction is 'polynomial', 'RBF', or 'sigmoid'. Specifies γ in the formula. The gamma must be a positive DOUBLE value. Default: 1.0.

Constant

[Optional] Use only when KernelFunction is 'polynomial' or 'sigmoid'. Specifies c in the formula. The c must be a DOUBLE value. If KernelFunction is polynomial, the minimum c value is 0.0. Default: 1.0.

Degree

[Optional] Use only when KernelFunction is 'polynomial'. Specifies d in the formula. The d must be a positive INTEGER. Default: 2.

SubspaceDimension

[Optional] Use only when KernelFunction is 'polynomial', 'RBF', or 'sigmoid'. Specifies the random subspace dimension of the basis matrix V obtained by the Gram-Schmidt process. The subspace_dimension must be in the range [1, 2048]. Because the Gram-Schmidt process cannot be parallelized, this dimension cannot be too large. Accuracy increases with higher subspace_dimension values, but computation costs also increase. Default: 256.
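
To show why the Gram-Schmidt process is inherently sequential, here is a small Python sketch of the general technique. It is not the function's internal implementation: the random Gaussian starting vectors, the tiny dimensions, and all names are assumptions chosen for illustration.

```python
import math
import random

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of `vectors` (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        # Each new vector must be orthogonalized against every basis vector
        # already produced, so the outer loop cannot run in parallel.
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:  # skip vectors that are numerically dependent
            basis.append([wi / norm for wi in w])
    return basis

rng = random.Random(0)
subspace_dimension = 4  # hypothetical small value; the argument default is 256
n_features = 6          # hypothetical number of attribute columns
random_vectors = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                  for _ in range(subspace_dimension)]
V = gram_schmidt(random_vectors)
```

The work grows roughly with the square of the subspace dimension, which is consistent with the upper bound on this argument.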

HashBits

[Optional] Use only when KernelFunction is 'polynomial', 'RBF', or 'sigmoid'. Specifies the number of compact hash bits that represent a data point. The hash_bits must be in the range [8, 8192]. Accuracy increases with higher hash_bits values, but computation costs also increase. Default: 256.

InputTable

Specifies the name of the table that contains the training samples. Each row consists of a sample identifier, a set of attribute values, and a label.

ModelTable

Specifies the name for the model table that the function creates.

LabelColumn

Specifies the name of the input table column that contains the class identifiers of the samples. The label_column must have an integer or string data type.

Cost

[Optional] Specifies the regularization parameter λ in the SVM soft-margin loss function:

The cost must be greater than 0.0. Default: 1.0.
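
For orientation, the soft-margin objective in the Pegasos formulation is commonly written as shown here, where $m$ is the number of training samples, $x_i$ is a sample, and $y_i \in \{-1, +1\}$ is its label. That DenseSVMTrainer minimizes exactly this expression is an assumption; the symbols $m$, $w$, $x_i$, and $y_i$ do not appear in this section.

```latex
\min_{w} \; \frac{\lambda}{2}\lVert w \rVert^{2}
  + \frac{1}{m}\sum_{i=1}^{m} \max\bigl(0,\; 1 - y_i \langle w, x_i \rangle\bigr)
```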

Bias

[Optional] Specifies whether to add another dimension containing the bias value b. The bias must be nonnegative. If bias is greater than 0, the function converts each sample x in the training set to (x, b). Use this argument when not all samples center at 0. Default: 0.0.
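
The effect of a positive bias can be sketched in a few lines of Python: each sample x gains one extra dimension holding b. The function name `add_bias_dimension` is hypothetical, and the sketch only illustrates the transformation described above, not how DenseSVMTrainer stores samples.

```python
def add_bias_dimension(samples, bias=0.0):
    """Append the constant `bias` to every sample when bias > 0."""
    if bias < 0:
        raise ValueError("bias must be nonnegative")
    if bias == 0:
        # Default behavior: samples are left unchanged.
        return [list(x) for x in samples]
    return [list(x) + [bias] for x in samples]
```

For example, with `bias=1.0` the sample `[1.0, 2.0]` becomes `[1.0, 2.0, 1.0]`, so the learned weight for the added dimension acts as an intercept.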

ClassWeights

[Optional] Specifies the weights for different classes. If you specify a weight for a class, the cost parameter for that class is weight * cost. A weight larger than 1 often increases the accuracy for that class; however, it may decrease global accuracy. Default behavior: The function assigns weight 1.0 to any class not assigned a weight in this argument.
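
The weight * cost rule and the default weight of 1.0 can be illustrated with a short Python sketch. The function name `per_class_cost` and the dictionary representation of the weights are assumptions for illustration; they do not reflect the argument's actual syntax.

```python
def per_class_cost(classes, cost=1.0, class_weights=None):
    """Return the effective cost per class: weight * cost, with weight 1.0 by default."""
    class_weights = class_weights or {}
    return {label: class_weights.get(label, 1.0) * cost for label in classes}
```

With `cost=2.0` and a weight of 3.0 on class `"a"` only, class `"a"` gets an effective cost of 6.0 while every other class keeps 2.0.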

MaxStep

[Optional] Specifies the maximum number of steps of the training process. One step means that the trainer sees each sample once. The max_step must be in the range (0, 10000]. Default: 100.

Epsilon

[Optional] Specifies the termination criterion: When the difference between the values of the loss function in two sequential iterations is less than this epsilon, the function stops. The epsilon must be greater than 0.0. Default: 0.01.
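
How MaxStep and Epsilon interact can be sketched as a simple loop: training stops at the first step whose loss differs from the previous step's loss by less than epsilon, or after max_step steps, whichever comes first. The callable `loss_per_step` is a hypothetical stand-in for one pass over the training set; the loop structure is an assumption based on the descriptions above.

```python
def run_training(loss_per_step, max_step=100, epsilon=0.01):
    """Return (steps_run, final_loss) under the two stopping rules.

    loss_per_step: hypothetical callable mapping a step number (1-based)
    to the loss value after that step.
    """
    previous = None
    for step in range(1, max_step + 1):
        current = loss_per_step(step)
        # Stop when two sequential loss values differ by less than epsilon.
        if previous is not None and abs(previous - current) < epsilon:
            return step, current
        previous = current
    # The step limit was reached before the loss stabilized.
    return max_step, previous
```

A smaller epsilon generally means more steps before convergence, so it may need a larger max_step to take effect.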

Seed

[Optional] Specifies the seed used to order the training set randomly and consistently. Use this value to cause the function to generate the same model if it is run multiple times in the same database with the same argument values. The seed must be in the range [0, 9223372036854775807]. Default: 0.
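
The idea of a seed producing a random but repeatable ordering can be shown with Python's standard `random` module. This is an analogy only: the generator DenseSVMTrainer uses is not documented here, and `shuffled_order` is a hypothetical name.

```python
import random

def shuffled_order(sample_ids, seed=0):
    """Return the sample identifiers in a pseudo-random order fixed by `seed`."""
    order = list(sample_ids)
    random.Random(seed).shuffle(order)  # private generator, global state untouched
    return order
```

Calling `shuffled_order` twice with the same identifiers and the same seed yields the same order, which is the property that makes repeated runs produce the same model.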

OverwriteOutput

[Optional] Specifies whether to overwrite model_table. Default: 'false'.
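
The numeric ranges and defaults listed in this section can be gathered into a small pre-flight check, sketched below in Python. The function `validate_arguments` and its snake_case parameter names are hypothetical; the sketch only restates the documented constraints and is not a Teradata-provided utility.

```python
def validate_arguments(kernel_function="linear", gamma=1.0, constant=1.0, degree=2,
                       subspace_dimension=256, hash_bits=256, cost=1.0, bias=0.0,
                       max_step=100, epsilon=0.01, seed=0):
    """Return a list of messages for values outside the documented ranges."""
    errors = []
    kernel = kernel_function.lower()
    if kernel not in {"linear", "polynomial", "rbf", "sigmoid"}:
        errors.append("KernelFunction must be linear, polynomial, RBF, or sigmoid")
    if gamma <= 0:
        errors.append("Gamma must be positive")
    if kernel == "polynomial" and constant < 0:
        errors.append("Constant must be at least 0.0 for the polynomial kernel")
    if not isinstance(degree, int) or degree <= 0:
        errors.append("Degree must be a positive integer")
    if not 1 <= subspace_dimension <= 2048:
        errors.append("SubspaceDimension must be in [1, 2048]")
    if not 8 <= hash_bits <= 8192:
        errors.append("HashBits must be in [8, 8192]")
    if cost <= 0:
        errors.append("Cost must be greater than 0.0")
    if bias < 0:
        errors.append("Bias must be nonnegative")
    if not 0 < max_step <= 10000:
        errors.append("MaxStep must be in (0, 10000]")
    if epsilon <= 0:
        errors.append("Epsilon must be greater than 0.0")
    if not 0 <= seed <= 9223372036854775807:
        errors.append("Seed must be in [0, 9223372036854775807]")
    return errors
```

Calling `validate_arguments()` with no arguments checks the documented defaults, which all fall inside their ranges.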