DenseSVMTrainer Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17
dita:id: B700-1022
Lifecycle: previous
Product Category: Software
SampleIdColumn
Specifies the name of the column in the input table that contains the identifiers of the training samples.
AttributeColumns
Specifies the attribute columns, which must have numeric data types.
KernelFunction
[Optional] Specifies the kernel function that DenseSVMTrainer uses. The kernel function also determines the training algorithm:
  • 'linear' (Default)

    DenseSVMTrainer uses a Pegasos algorithm to solve the linear SVM.

  • 'polynomial'

    DenseSVMTrainer uses a Hash-SVM algorithm.

    The formula for the polynomial kernel is: $\gamma(u^{T}v + c)^{d}$

  • 'rbf'

    DenseSVMTrainer uses a Hash-SVM algorithm.

    The formula for the RBF kernel is: $\exp(-\gamma \lVert u - v \rVert^{2})$

  • 'sigmoid'

    DenseSVMTrainer uses a Hash-SVM algorithm.

    The formula for the sigmoid kernel is: $\tanh(\gamma u^{T}v + c)$

When DenseSVMTrainer uses a Hash-SVM algorithm, it represents each sample by compact hash bits and uses an inner product defined over those bits as a surrogate for the original nonlinear kernel.
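To make the kernel formulas above concrete, the following Python sketch evaluates each kernel on two plain lists of floats. It is an illustration only, not DenseSVMTrainer code: the function names are hypothetical, and gamma, c, and d stand in for the Gamma, Constant, and Degree arguments with the defaults listed in this guide. The polynomial kernel follows the formula as printed here, with γ outside the parentheses; some libraries, such as LIBSVM, instead compute $(\gamma u^{T}v + c)^{d}$.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Sequence[float], v: Sequence[float]) -> float:
    # Plain inner product; DenseSVMTrainer trains this case with Pegasos.
    return dot(u, v)


def polynomial_kernel(u, v, gamma: float = 1.0, c: float = 1.0, d: int = 2) -> float:
    # gamma * (u^T v + c)^d, with gamma outside the parentheses as printed in this guide.
    return gamma * (dot(u, v) + c) ** d


def rbf_kernel(u, v, gamma: float = 1.0) -> float:
    # exp(-gamma * ||u - v||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma: float = 1.0, c: float = 1.0) -> float:
    # tanh(gamma * u^T v + c)
    return math.tanh(gamma * dot(u, v) + c)
```

For example, with the default arguments, polynomial_kernel([1, 2], [3, 4]) computes 1.0 * (11 + 1.0) ** 2, which is 144.0.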

Gamma
[Optional] Use only when KernelFunction is 'polynomial', 'rbf', or 'sigmoid'. Specifies γ in the kernel formula. The gamma value must be a positive DOUBLE. Default: 1.0.
Constant
[Optional] Use only when KernelFunction is 'polynomial' or 'sigmoid'. Specifies c in the kernel formula. The c value must be a DOUBLE; if KernelFunction is 'polynomial', c must be at least 0.0. Default: 1.0.
Degree
[Optional] Use only when KernelFunction is 'polynomial'. Specifies d in the kernel formula. The d value must be a positive INTEGER. Default: 2.
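The constraints on Gamma, Constant, and Degree, and which kernels each one applies to, can be summarized as a small check. This is a hypothetical client-side helper, not part of Aster; it assumes kernel names are compared case-insensitively, which this guide does not state.

```python
from typing import List, Optional

KERNELS = {"linear", "polynomial", "rbf", "sigmoid"}

# Which optional kernel arguments apply to which kernels, per this guide.
APPLIES_TO = {
    "gamma": {"polynomial", "rbf", "sigmoid"},
    "constant": {"polynomial", "sigmoid"},
    "degree": {"polynomial"},
}


def check_kernel_args(kernel: str,
                      gamma: Optional[float] = None,
                      constant: Optional[float] = None,
                      degree: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    kernel = kernel.lower()  # assumption: kernel names are case-insensitive
    if kernel not in KERNELS:
        return [f"unknown kernel {kernel!r}"]
    problems = []
    given = {"gamma": gamma, "constant": constant, "degree": degree}
    for name, value in given.items():
        if value is not None and kernel not in APPLIES_TO[name]:
            problems.append(f"{name} does not apply to the {kernel} kernel")
    if gamma is not None and gamma <= 0:
        problems.append("gamma must be positive")
    if constant is not None and kernel == "polynomial" and constant < 0.0:
        problems.append("constant must be >= 0.0 for the polynomial kernel")
    if degree is not None and (not isinstance(degree, int) or degree <= 0):
        problems.append("degree must be a positive integer")
    return problems
```

For example, check_kernel_args("linear", gamma=1.0) reports that gamma does not apply, because the linear kernel takes none of these arguments.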
SubspaceDimension
[Optional] Use only when KernelFunction is 'polynomial' or 'sigmoid'. Specifies the dimension of the random subspace whose basis matrix V is obtained by the Gram-Schmidt process. The subspace_dimension must be in the range [1, 2048]. Because the Gram-Schmidt process cannot be parallelized, large values are costly to compute. Higher subspace_dimension values increase accuracy but also increase computation cost. Default: 256.
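The following sketch shows why subspace_dimension drives cost: it orthonormalizes subspace_dimension random Gaussian vectors with modified Gram-Schmidt. How DenseSVMTrainer draws its random vectors is not described in this guide, so the Gaussian draw and the helper name random_subspace_basis are assumptions for illustration.

```python
import random
from typing import List

Vector = List[float]


def gram_schmidt(vectors: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormalize the vectors in order."""
    basis: List[Vector] = []
    # Each vector is orthogonalized against all earlier basis vectors,
    # so this outer loop is inherently sequential.
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of the running vector w along q.
            proj = sum(a * b for a, b in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([x / norm for x in w])
    return basis


def random_subspace_basis(dim: int, subspace_dimension: int, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: orthonormal basis V of a random subspace of R^dim."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(subspace_dimension)]
    return gram_schmidt(raw)
```

The work grows roughly with subspace_dimension squared times the input dimension, and each new basis vector waits on all previous ones, which is consistent with the upper bound of 2048.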
HashBits
[Optional] Use only when KernelFunction is 'polynomial', 'rbf', or 'sigmoid'. Specifies the number of compact hash bits that represent a data point. The hash_bits must be in the range [8, 8192]. Higher hash_bits values increase accuracy but also increase computation cost. Default: 256.
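To illustrate how an inner product over hash bits can stand in for a similarity between the original vectors, the sketch below uses sign random projections (SimHash-style hashing). This is a swapped-in, generic technique chosen for illustration; the specific hash functions that Hash-SVM builds from each kernel are not described in this guide. With Gaussian hyperplanes, the normalized inner product of two codes has expected value 1 - 2θ/π, where θ is the angle between the inputs, and its variance shrinks as hash_bits grows.

```python
import random
from typing import List


def make_hash_planes(dim: int, hash_bits: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: draw hash_bits random Gaussian hyperplanes in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(hash_bits)]


def hash_code(x: List[float], planes: List[List[float]]) -> List[int]:
    """One bit per hyperplane, stored as +1/-1 so inner products are easy to form."""
    return [1 if sum(p * xi for p, xi in zip(plane, x)) >= 0 else -1 for plane in planes]


def hashed_similarity(hx: List[int], hy: List[int]) -> float:
    """Normalized inner product of two codes, in [-1, 1]."""
    return sum(a * b for a, b in zip(hx, hy)) / len(hx)
```

Identical inputs always give a similarity of 1.0 and opposite inputs give -1.0; inputs at other angles give noisy estimates whose noise falls as more bits are used.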
InputTable
Specifies the name of the table that contains the training samples. Each row consists of a sample identifier, a set of attribute values, and a label.
ModelTable
Specifies the name for the model table that the function creates.
LabelColumn
Specifies the name of the input table column that contains the class identifiers of the samples. The label_column must have an integer or string data type.
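The following sketch shows how one input row breaks down into the parts that SampleIdColumn, AttributeColumns, and LabelColumn refer to, and which type rules apply. Rows are modeled as Python dictionaries, and TrainingSample and to_sample are hypothetical names; they do not describe how DenseSVMTrainer reads its input.

```python
from dataclasses import dataclass
from numbers import Real
from typing import Dict, List, Sequence, Union

Label = Union[int, str]


@dataclass
class TrainingSample:
    sample_id: object
    attributes: List[float]
    label: Label


def to_sample(row: Dict[str, object], sample_id_column: str,
              attribute_columns: Sequence[str], label_column: str) -> TrainingSample:
    """Split one input row into identifier, numeric attributes, and label."""
    attrs = []
    for col in attribute_columns:
        value = row[col]
        # Attribute columns must be numeric (bool is excluded even though it subclasses int).
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"attribute column {col!r} must be numeric")
        attrs.append(float(value))
    label = row[label_column]
    # The label column must hold an integer or a string.
    if isinstance(label, bool) or not isinstance(label, (int, str)):
        raise TypeError(f"label column {label_column!r} must be an integer or string")
    return TrainingSample(row[sample_id_column], attrs, label)
```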
Cost
[Optional] Specifies the regularization parameter λ in the SVM soft-margin loss function:


The cost must be greater than 0.0. Default: 1.0.

Bias
[Optional] Specifies whether to add another dimension containing the bias value b. The bias must be nonnegative. If bias is greater than 0, the function converts each sample x in the training set to (x, b). Use this argument when the samples are not centered at 0. Default: 0.0.
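The bias transformation can be sketched in a few lines. The helper name augment_with_bias is hypothetical; only the mapping from x to (x, b) comes from this guide.

```python
from typing import List, Sequence


def augment_with_bias(samples: Sequence[Sequence[float]], bias: float) -> List[List[float]]:
    """Map each sample x to (x, b) when bias > 0; otherwise leave samples unchanged."""
    if bias < 0:
        raise ValueError("bias must be nonnegative")
    if bias == 0:
        return [list(x) for x in samples]
    return [list(x) + [bias] for x in samples]
```

The weight that the model learns for the extra dimension, multiplied by b, acts as an intercept. This is why the option helps when the samples are not centered at 0: a decision function of the form w·x alone can only describe hyperplanes through the origin.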
ClassWeights
[Optional] Specifies the weights for different classes. If you specify a weight for a class, the cost parameter for that class is weight * cost. A weight larger than 1 often increases the accuracy of that class, but it may decrease overall accuracy. Default behavior: The function assigns weight 1.0 to any class not assigned a weight in this argument.
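The per-class cost rule can be written as a small function. The dictionary stands in for the ClassWeights argument; the exact syntax for writing class weights in the function call is not shown on this page, and per_class_cost is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable, Optional


def per_class_cost(classes: Iterable[Hashable], cost: float,
                   class_weights: Optional[Dict[Hashable, float]] = None) -> Dict[Hashable, float]:
    """Effective cost per class: weight * cost, with weight 1.0 for unlisted classes."""
    weights = class_weights or {}
    return {cls: weights.get(cls, 1.0) * cost for cls in classes}
```

For example, with Cost 2.0 and a weight of 3.0 for class 'spam' only, 'spam' gets cost 6.0 and every other class keeps cost 2.0.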
MaxStep
[Optional] Specifies the maximum number of steps of the training process. In one step, the trainer sees each sample once. The max_step must be in the range (0, 10000]. Default: 100.
Epsilon
[Optional] Specifies the termination criterion: When the difference between the values of the loss function in two sequential iterations is less than this epsilon, the function stops. The epsilon must be greater than 0.0. Default: 0.01.
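For the linear kernel, the interaction among Cost, MaxStep, Epsilon, and Seed can be sketched with a textbook Pegasos loop: the regularization parameter λ comes from Cost, each step is one pass over a seeded shuffle of the samples, and training stops early when the loss changes by less than epsilon between passes. This is a sketch under stated assumptions, not the DenseSVMTrainer implementation. It assumes labels already mapped to +1/-1 and no projection step, and it omits class weights and multiclass handling, none of which this guide specifies.

```python
import random
from typing import List, Sequence


def svm_loss(w: List[float], X: Sequence[Sequence[float]], y: Sequence[int], lam: float) -> float:
    """Soft-margin objective: lam/2 * ||w||^2 + mean hinge loss."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - yi * sum(a * b for a, b in zip(w, xi))) for xi, yi in zip(X, y))
    return reg + hinge / len(X)


def pegasos_train(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 1.0,
                  max_step: int = 100, epsilon: float = 0.01, seed: int = 0) -> List[float]:
    """Linear SVM by Pegasos-style stochastic subgradient descent (labels must be +1/-1)."""
    rng = random.Random(seed)            # the seed fixes the order in which samples are visited
    order = list(range(len(X)))
    w = [0.0] * len(X[0])
    t = 0
    prev_loss = svm_loss(w, X, y, lam)
    for _ in range(max_step):            # one step = one pass over the shuffled samples
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)        # Pegasos learning rate 1 / (lambda * t)
            margin = y[i] * sum(a * b for a, b in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink w (regularization term)
            if margin < 1.0:             # hinge loss is active: move w toward y_i * x_i
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
        loss = svm_loss(w, X, y, lam)
        if abs(prev_loss - loss) < epsilon:  # termination criterion described for Epsilon
            break
        prev_loss = loss
    return w
```

Because the shuffle is driven only by the seed, calling pegasos_train twice with the same data and arguments returns the same weights, which mirrors the reproducibility that the Seed argument provides.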
Seed
[Optional] Specifies the seed used to order the training set randomly and consistently. Use this argument to make the function generate the same model when it runs multiple times in the same database with the same argument values. The seed must be in the range [0, 9223372036854775807]. Default: 0.
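The documented ranges for the training arguments can be collected into one pre-flight check. The helper check_training_args is hypothetical and only restates the limits given above for Cost, Bias, MaxStep, Epsilon, and Seed. The Seed upper bound, 9223372036854775807, equals 2**63 - 1, the largest signed 64-bit integer.

```python
from typing import List

MAX_SEED = 9223372036854775807  # 2**63 - 1, the upper bound listed for Seed


def check_training_args(cost: float = 1.0, bias: float = 0.0, max_step: int = 100,
                        epsilon: float = 0.01, seed: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the arguments are in range."""
    problems = []
    if cost <= 0.0:
        problems.append("cost must be greater than 0.0")
    if bias < 0.0:
        problems.append("bias must be nonnegative")
    if not 0 < max_step <= 10000:
        problems.append("max_step must be in (0, 10000]")
    if epsilon <= 0.0:
        problems.append("epsilon must be greater than 0.0")
    if not 0 <= seed <= MAX_SEED:
        problems.append("seed must be in [0, 9223372036854775807]")
    return problems
```

The defaults in the signature are the defaults listed in this guide, so check_training_args() with no arguments reports no problems.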
OverwriteOutput
[Optional] Specifies whether to overwrite model_table. Default: 'false'.