Teradata R Package Function Reference | 17.00 - SVMDense - Teradata R Package

Teradata® R Package Function Reference

prodname: Teradata R Package
vrm_release: 17.00
created_date: September 2020
category: Programming Reference
featnum: B700-4007-090K

Description

The DenseSVMTrainer function takes training data in dense format and outputs a predictive model in binary format, which is the input to the functions DenseSVMPredictor (td_svm_dense_predict_mle) and DenseSVMModelPrinter (td_svm_dense_summary_mle).

Usage

  td_svm_dense_mle (
      data = NULL,
      sample.id.column = NULL,
      attribute.columns = NULL,
      kernel.function = "LINEAR",
      gamma = 1.0,
      constant = 1.0,
      degree = 2,
      subspace.dimension = 256,
      hash.bits = 256,
      label.column = NULL,
      cost = 1.0,
      bias = 0.0,
      class.weights = NULL,
      max.step = 100,
      epsilon = 0.01,
      seed = 0,
      data.sequence.column = NULL
  )

Arguments

data

Required Argument.
Specifies the tbl_teradata object containing the training samples. Each row consists of a sample id, a set of attribute values, and a corresponding label.

sample.id.column

Required Argument.
Specifies the name of the column in the data that contains the identifier of the training samples.
Types: character

attribute.columns

Required Argument.
Specifies the names of all the attribute columns. Attribute columns must contain numeric values.
Types: character OR vector of Strings (character)

kernel.function

Optional Argument.
Specifies the kernel function used to train the model. With the linear kernel, a Pegasos algorithm is used to solve the linear SVM. With the polynomial, RBF, or sigmoid kernel, a Hash-SVM algorithm is used: each sample is represented by compact hash bits, over which an inner product is defined to serve as a surrogate for the original nonlinear kernel.
Default Value: "LINEAR"
Permitted Values: LINEAR, POLYNOMIAL, RBF, SIGMOID
Types: character

gamma

Optional Argument.
Specifies a double value for the kernel parameter gamma. This argument is used only when "kernel.function" is polynomial, RBF, or sigmoid.
Default Value: 1.0
Types: numeric

constant

Optional Argument.
Specifies a double value for the kernel constant. This argument is used only when "kernel.function" is polynomial or sigmoid. If "kernel.function" is polynomial, the minimum value is 0.0.
Default Value: 1.0
Types: numeric

degree

Optional Argument.
Specifies the degree (d) of the polynomial kernel. This argument is used only when "kernel.function" is polynomial. The input value must be greater than 0.
Default Value: 2
Types: integer
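
The roles of "gamma", "constant", and "degree" can be illustrated with the common textbook forms of the four kernels. This is a minimal sketch, assuming the arguments play their usual roles. This reference does not state the exact formulas, and the nonlinear kernels are approximated through hash bits, so values computed by the function may differ. The helper names below are hypothetical and are not part of the Teradata R package.

```r
# Textbook kernel forms (assumed, not taken from this reference).
# All function names here are illustrative helpers, not package functions.
linear_kernel <- function(x, y) sum(x * y)

polynomial_kernel <- function(x, y, gamma = 1.0, constant = 1.0, degree = 2) {
  (gamma * sum(x * y) + constant)^degree
}

rbf_kernel <- function(x, y, gamma = 1.0) {
  exp(-gamma * sum((x - y)^2))
}

sigmoid_kernel <- function(x, y, gamma = 1.0, constant = 1.0) {
  tanh(gamma * sum(x * y) + constant)
}

x <- c(1, 2)
y <- c(3, 4)
linear_kernel(x, y)                    # 1*3 + 2*4 = 11
polynomial_kernel(x, y, gamma = 0.1)   # (0.1 * 11 + 1)^2 = 4.41
rbf_kernel(x, x, gamma = 0.1)          # identical points give 1
sigmoid_kernel(x, y, gamma = 0.1)      # tanh(0.1 * 11 + 1)
```

In these forms, "gamma" scales the similarity between two samples, "constant" shifts it, and "degree" applies only to the polynomial kernel, which matches the "used only when" notes on each argument.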

subspace.dimension

Optional Argument.
Specifies the random subspace dimension of the basis matrix V obtained by the Gram-Schmidt process. Since the Gram-Schmidt process cannot be parallelized, this dimension cannot be too large. Accuracy will increase with higher values of this number, but computation costs will also increase. The input value must be in the range [1, 2048].
Only valid if "kernel.function" is polynomial, RBF, or sigmoid.
Default Value: 256
Types: integer

hash.bits

Optional Argument.
Specifies the number of compact hash bits used to represent a data point. Accuracy will increase with higher values of this number, but computation costs will also increase. The input value must be in the range [8, 8192].
Only valid if "kernel.function" is polynomial, RBF, or sigmoid.
Default Value: 256
Types: integer

label.column

Required Argument.
Specifies the name of the column that identifies the class of the corresponding sample. The column values must be integers or strings.
Types: character

cost

Optional Argument.
Specifies the regularization parameter in the SVM soft-margin loss function. Cost must be greater than 0.0.
Default Value: 1.0
Types: numeric

bias

Optional Argument.
Specifies a non-negative value. If the value is greater than zero, each sample (x) in the training set will be converted to (x, b); that is, it will add another dimension containing the bias value b. This argument addresses situations where not all samples center at 0.
Default Value: 0.0
Types: numeric
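
The conversion of each sample x to (x, b) can be pictured as appending a constant column to the attribute matrix. The sketch below is a local illustration in plain R with made-up sample values. The helper add_bias is hypothetical, and the function performs the equivalent step inside the database.

```r
# Hypothetical helper: append a constant bias column b to every sample x.
add_bias <- function(x, bias = 0.0) {
  if (bias > 0) cbind(x, bias = bias) else x
}

# Two made-up samples with two attributes each.
samples <- matrix(c(5.1, 3.5,
                    4.9, 3.0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("sepal_length", "sepal_width")))

augmented <- add_bias(samples, bias = 1.0)
dim(augmented)   # 2 rows, 3 columns: the two attributes plus the bias
```

With bias = 0.0 (the default), the samples are left unchanged.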

class.weights

Optional Argument.
Specifies the weights for different classes. The format is: "classlabel m:weight m".
For a single class, the weight can be specified as the value of type character:
"classlabel m:weight m"
For multiple classes, the weights can be specified as a vector of characters:
c("classlabel m:weight m", "classlabel n:weight n")
If weight for a class is given, the cost parameter for this class will be weight * cost. A weight larger than 1 often increases the accuracy of the corresponding class; however, it may decrease global accuracy. Classes not assigned a weight in this argument will be assigned a weight of 1.0.
Types: character OR vector of characters
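
One way to build the "classlabel:weight" strings is from a named numeric vector, as sketched below. The labels and weights are hypothetical, so substitute the classes present in the label column.

```r
# Hypothetical class labels and weights.
weights <- c(setosa = 1.0, versicolor = 2.0)

# Build the "classlabel:weight" strings for the class.weights argument.
class_weights <- paste0(names(weights), ":", weights)
class_weights   # "setosa:1" "versicolor:2"

# The effective cost for a weighted class is weight * cost.
cost <- 1.0
effective_cost <- weights * cost
```

The resulting character vector can then be passed as class.weights = class_weights.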

max.step

Optional Argument.
Specifies the maximum number of iterations of the training process as a positive integer. One step means that each sample is seen once by the trainer. The input value must be in the range (0, 10000].
Default Value: 100
Types: integer

epsilon

Optional Argument.
Specifies the termination criterion. When the difference between the values of the loss function in two sequential iterations is less than this number, the function stops. Must be greater than 0.0.
Default Value: 0.01
Types: numeric
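
The interaction between "max.step" and "epsilon" can be sketched as a loop that stops at whichever limit is reached first. This is an illustration of the stopping rule as described above, not the function's actual implementation. Both loss_at and toy_loss are hypothetical stand-ins for the loss after one pass over the training data.

```r
# Illustrative stopping rule; loss_at(step) is a hypothetical stand-in for
# the value of the loss function after the given training step.
run_until_converged <- function(loss_at, max.step = 100, epsilon = 0.01) {
  previous <- loss_at(1)
  steps <- 1
  while (steps < max.step) {
    steps <- steps + 1
    current <- loss_at(steps)
    # Stop when the loss changes by less than epsilon between iterations.
    if (abs(previous - current) < epsilon) break
    previous <- current
  }
  steps
}

# A toy loss that shrinks as 1 / step.
toy_loss <- function(step) 1 / step

run_until_converged(toy_loss, max.step = 100, epsilon = 0.01)  # stops at step 11
run_until_converged(toy_loss, max.step = 5, epsilon = 0.01)    # capped at step 5
```

A smaller "epsilon" lets training run longer before the loss is considered stable, while "max.step" caps the run regardless of the loss.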

seed

Optional Argument.
Specifies an integer value used to order the training set randomly and consistently. Use this value to ensure that the same model is generated when the function is run multiple times in a given database with the same arguments. The input value must be in the range [0, 9223372036854775807].
Default Value: 0
Types: numeric
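
The idea of a random but repeatable ordering can be shown locally with R's own random number generator. This is only an analogy: the function orders the training set inside the database, and the helper shuffle_ids is hypothetical.

```r
# Hypothetical helper: shuffle sample ids reproducibly for a given seed.
shuffle_ids <- function(ids, seed = 0) {
  set.seed(seed)
  sample(ids)
}

first_run  <- shuffle_ids(1:10, seed = 1)
second_run <- shuffle_ids(1:10, seed = 1)

# The same seed yields the same ordering on every run.
identical(first_run, second_run)   # TRUE
```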

data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_svm_dense_mle", which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using the following names:

  1. model.table

  2. output

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("svmdense_example", "svm_iris_train")

    # Create object(s) of class "tbl_teradata".
    svm_iris_train <- tbl(con, "svm_iris_train")

    # Example 1 - Linear Model
    td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                         sample.id.column = "id",
                                         attribute.columns = c('sepal_length', 'sepal_width', 
                                                               'petal_length', 'petal_width'),
                                         kernel.function = "linear",
                                         label.column = "species",
                                         cost = 1,
                                         bias = 0,
                                         max.step = 100,
                                         seed = 1
                                         )

    # Example 2 - Polynomial Model
    td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                         sample.id.column = "id",
                                         attribute.columns = c('sepal_length', 'sepal_width', 
                                                               'petal_length', 'petal_width'),
                                         kernel.function = "polynomial",
                                         gamma = 0.1,
                                         degree = 2,
                                         subspace.dimension = 120,
                                         hash.bits = 512,
                                         label.column = "species",
                                         cost = 1,
                                         bias = 0,
                                         max.step = 100,
                                         seed = 1
                                         )

    # Example 3 - Radial Basis Function (RBF) Model
    td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                         sample.id.column = "id",
                                         attribute.columns = c('sepal_length', 'sepal_width', 
                                                               'petal_length', 'petal_width'),
                                         kernel.function = "rbf",
                                         gamma = 0.1,
                                         subspace.dimension = 120,
                                         hash.bits = 512,
                                         label.column = "species",
                                         cost = 1,
                                         bias = 0,
                                         max.step = 100,
                                         seed = 1
                                         )

    # Example 4 - Sigmoid Model
    td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                         sample.id.column = "id",
                                         attribute.columns = c('sepal_length', 'sepal_width', 
                                                               'petal_length', 'petal_width'),
                                         kernel.function = "sigmoid",
                                         gamma = 0.1,
                                         subspace.dimension = 120,
                                         hash.bits = 512,
                                         label.column = "species",
                                         cost = 1,
                                         bias = 0,
                                         max.step = 30,
                                         seed = 1
                                         )