Description
The DenseSVMTrainer (td_svm_dense_mle) function takes training data
in dense format and outputs a predictive model in binary format, which is
the input to the functions DenseSVMPredictor (td_svm_dense_predict_mle)
and DenseSVMModelPrinter (td_svm_dense_summary_mle).
Usage
td_svm_dense_mle (
    data = NULL,
    sample.id.column = NULL,
    attribute.columns = NULL,
    kernel.function = "LINEAR",
    gamma = 1.0,
    constant = 1.0,
    degree = 2,
    subspace.dimension = 256,
    hash.bits = 256,
    label.column = NULL,
    cost = 1,
    bias = 0,
    class.weights = NULL,
    max.step = 100,
    epsilon = 0.01,
    seed = 0,
    data.sequence.column = NULL
)
Arguments
data |
Required Argument.
Specifies the tbl_teradata containing the training samples. Each row
consists of a sample id, a set of attribute values, and a
corresponding label.
|
sample.id.column |
Required Argument.
Name of the column in the data that contains the identifier of the
training samples.
|
attribute.columns |
Required Argument.
Specifies the names of the columns in the data argument that contain
the attributes, which must have numeric data types.
|
kernel.function |
Optional Argument.
Specifies the kernel function. For polynomial, RBF, or sigmoid, it also
selects the distribution (exponential family) used to compute the hash function.
For linear, the Pegasos algorithm is used to solve the linear SVM.
For polynomial, RBF, or sigmoid, a Hash-SVM algorithm is used: each
sample is represented by compact hash bits, over which an inner
product is defined to serve as a surrogate for the original
nonlinear kernel.
Default Value: "LINEAR"
Permitted Values: LINEAR, POLYNOMIAL, RBF, SIGMOID
|
gamma |
Optional Argument.
Only used when kernel.function is polynomial, RBF, or sigmoid. Must be a
positive double. The minimum value is 0.0.
Default Value: 1.0
|
constant |
Optional Argument.
Specifies the constant term of the kernel as a double. This argument is
used only when kernel.function is polynomial or sigmoid. If
kernel.function is polynomial, the minimum value is 0.0.
Default Value: 1.0
|
degree |
Optional Argument.
Only used when kernel.function is polynomial. A positive integer that
specifies the degree (d) of the polynomial kernel. The input value
must be greater than 0.
Default Value: 2
|
subspace.dimension |
Optional Argument.
Used only when kernel.function is polynomial, RBF, or sigmoid. A positive
integer that specifies the random subspace dimension of the basis
matrix V obtained by the Gram-Schmidt process. Because the Gram-Schmidt
process cannot be parallelized, this dimension should be kept moderate.
Higher values generally increase accuracy but also increase
computation cost. The input value must be in the range [1, 2048].
Default Value: 256
|
hash.bits |
Optional Argument.
Used only when kernel.function is polynomial, RBF, or sigmoid. A positive
integer that specifies the number of compact hash bits used to represent
each sample. Higher values generally increase accuracy but also
increase computation cost. The input value must be in the range
[8, 8192].
Default Value: 256
|
label.column |
Required Argument.
Specifies the name of the column that contains the class label of each
sample. The column must have an integer or string data type.
|
cost |
Optional Argument.
Specifies the regularization parameter in the SVM soft-margin loss function.
Cost must be greater than 0.0.
Default Value: 1
|
bias |
Optional Argument.
Specifies a nonnegative bias value b. If b is greater than 0, each sample
x in the training set is converted to (x, b); that is, an extra
dimension containing the bias value b is appended. This lets the model
fit an intercept when the samples are not centered at 0.
Default Value: 0
|
class.weights |
Optional Argument.
Specifies the weights for different classes, in the format
"classlabel_m:weight_m, classlabel_n:weight_n". If a weight is given
for a class, the cost parameter for that class is weight * cost. A
weight larger than 1 often increases the accuracy of the
corresponding class; however, it may decrease global accuracy.
Classes not assigned a weight in this argument are assigned a
weight of 1.0.
|
max.step |
Optional Argument.
A positive integer value that specifies the maximum number of
iterations of the training process. One step means that each sample
is seen once by the trainer. The input value must be in the range (0,
10000].
Default Value: 100
|
epsilon |
Optional Argument.
Termination criterion. When the difference between the values of the
loss function in two sequential iterations is less than this number,
the function stops. Must be greater than 0.0.
Default Value: 0.01
|
seed |
Optional Argument.
Specifies an integer seed used to order the training set randomly but
reproducibly. Running the function multiple times in the same
database with the same arguments, including the same seed, generates
the same model.
Default Value: 0
|
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
|
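For reference, the kernels selected by kernel.function, together with their gamma, constant, and degree parameters, can be sketched in base R. This is a minimal sketch assuming the common LIBSVM-style formulas; the exact forms used by td_svm_dense_mle are not stated on this page, and the helper names are hypothetical. Per the kernel.function description, Hash-SVM approximates the nonlinear kernels with an inner product over hash bits rather than evaluating them directly.

```r
# Hypothetical helpers, NOT part of tdplyr: assumed LIBSVM-style kernel forms.
# Defaults mirror the gamma, constant, and degree defaults shown in Usage.
linear_kernel <- function(x, y) sum(x * y)

polynomial_kernel <- function(x, y, gamma = 1.0, constant = 1.0, degree = 2) {
  (gamma * sum(x * y) + constant)^degree
}

rbf_kernel <- function(x, y, gamma = 1.0) {
  exp(-gamma * sum((x - y)^2))          # depends only on the distance x - y
}

sigmoid_kernel <- function(x, y, gamma = 1.0, constant = 1.0) {
  tanh(gamma * sum(x * y) + constant)   # bounded in (-1, 1)
}

# Two iris-like samples (sepal_length, sepal_width, petal_length, petal_width)
x <- c(5.1, 3.5, 1.4, 0.2)
y <- c(6.2, 2.9, 4.3, 1.3)
polynomial_kernel(x, y, gamma = 0.1, degree = 2)
rbf_kernel(x, y, gamma = 0.1)
sigmoid_kernel(x, y, gamma = 0.1)
```

Under these assumed forms, constant has no effect on the RBF kernel, which is consistent with the constant argument being documented only for polynomial and sigmoid.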
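The linear kernel is solved with Pegasos. The following is a minimal single-machine sketch of Pegasos for a binary problem that also shows how bias, max.step, epsilon, and seed could interact. It assumes labels coded as -1/+1, lambda = 1 / (n * cost) as the mapping from cost to the Pegasos regularization parameter, one step = one pass over the shuffled data (matching the max.step description), and epsilon applied to the change in the objective between passes. pegasos_train is hypothetical; multiclass handling, class weights, and Pegasos's optional projection step are omitted.

```r
# Hypothetical sketch of Pegasos for a binary linear SVM (labels -1/+1).
# Assumptions: lambda = 1 / (n * cost); one step = one pass over the data;
# bias > 0 appends a constant column b to every sample.
pegasos_train <- function(X, y, cost = 1, bias = 0, max.step = 100,
                          epsilon = 0.01, seed = 0) {
  X <- as.matrix(X)
  if (bias > 0) X <- cbind(X, bias)              # (x) -> (x, b)
  n <- nrow(X)
  lambda <- 1 / (n * cost)
  w <- numeric(ncol(X))
  # Regularized hinge-loss objective that Pegasos minimizes
  objective <- function(w) {
    lambda / 2 * sum(w^2) + mean(pmax(0, 1 - y * drop(X %*% w)))
  }
  set.seed(seed)                                 # reproducible sample order
  iter <- 0
  prev_obj <- Inf
  for (step in seq_len(max.step)) {
    for (i in sample.int(n)) {                   # each sample seen once per step
      iter <- iter + 1
      eta <- 1 / (lambda * iter)                 # decreasing step size
      margin <- y[i] * sum(w * X[i, ])           # margin under the current w
      w <- (1 - eta * lambda) * w                # shrink (regularization term)
      if (margin < 1) w <- w + eta * y[i] * X[i, ]  # hinge-loss subgradient
    }
    obj <- objective(w)
    if (abs(prev_obj - obj) < epsilon) break     # epsilon termination rule
    prev_obj <- obj
  }
  list(w = w, steps = step, objective = obj)
}

X <- rbind(c(2, 2), c(3, 3), c(2, 3), c(-2, -2), c(-3, -3), c(-3, -2))
y <- c(1, 1, 1, -1, -1, -1)
fit <- pegasos_train(X, y, cost = 1, bias = 1, seed = 1)
pred <- sign(drop(cbind(X, 1) %*% fit$w))        # predict on (x, b) with b = 1
```

Because the order of samples is driven by set.seed(seed), rerunning with the same arguments reproduces the same weights, which mirrors the reproducibility promise of the seed argument.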
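The subspace.dimension description attributes the basis matrix V to the Gram-Schmidt process. The sketch below applies modified Gram-Schmidt to a random d x k matrix, with k standing in for subspace.dimension. Which vectors the function actually orthogonalizes, and in what space, is not documented here, so the random input is an assumption and gram_schmidt is a hypothetical helper. The inner loop makes the sequential dependency visible: column j cannot be finished until columns 1 to j-1 are, which is the reason given for keeping the dimension bounded.

```r
# Hypothetical helper: modified Gram-Schmidt on the k columns of a d x k
# matrix A (requires k <= d and linearly independent columns).
gram_schmidt <- function(A) {
  V <- matrix(0, nrow(A), ncol(A))
  for (j in seq_len(ncol(A))) {
    v <- A[, j]
    # Remove components along every previously finished column (sequential)
    for (m in seq_len(j - 1)) {
      v <- v - sum(V[, m] * v) * V[, m]
    }
    V[, j] <- v / sqrt(sum(v^2))                 # normalize to unit length
  }
  V
}

set.seed(0)
d <- 10                     # ambient dimension (illustrative)
k <- 3                      # stands in for subspace.dimension
V <- gram_schmidt(matrix(rnorm(d * k), d, k))
round(crossprod(V), 10)     # approximately the k x k identity matrix
```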
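The class.weights string format and the weight * cost rule can be illustrated with a small parser. class_costs is a hypothetical helper, not part of tdplyr, and it assumes class labels contain no commas or colons; classes missing from the string keep the documented default weight of 1.0.

```r
# Hypothetical helper: parse "label:weight, label:weight" and return the
# effective per-class cost (weight * cost); unlisted classes get weight 1.0.
class_costs <- function(class.weights, labels, cost = 1) {
  weights <- setNames(rep(1.0, length(labels)), labels)
  if (!is.null(class.weights) && nzchar(class.weights)) {
    pairs <- strsplit(trimws(strsplit(class.weights, ",")[[1]]), ":")
    for (p in pairs) {
      weights[[trimws(p[1])]] <- as.numeric(trimws(p[2]))
    }
  }
  weights * cost
}

class_costs("setosa:2, virginica:0.5",
            c("setosa", "versicolor", "virginica"), cost = 2)
```

With cost = 2, setosa gets an effective cost of 4, versicolor (unlisted) keeps 2, and virginica drops to 1.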
Value
The function returns an object of class "td_svm_dense_mle", which is a named
list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator
using the following names:
model.table
output
Examples
# Get the current context/connection (assumes a context was already created, e.g. with td_create_context())
con <- td_get_context()$connection
# Load example data.
loadExampleData("svmdense_example", "svm_iris_train")
# Create remote tibble objects.
svm_iris_train <- tbl(con, "svm_iris_train")
# Example 1 - Linear Model
td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                     sample.id.column = "id",
                                     attribute.columns = c('sepal_length', 'sepal_width', 'petal_length', 'petal_width'),
                                     kernel.function = "linear",
                                     label.column = "species",
                                     cost = 1,
                                     bias = 0,
                                     max.step = 100,
                                     seed = 1
                                     )
# Example 2 - Polynomial Model
td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                     sample.id.column = "id",
                                     attribute.columns = c('sepal_length', 'sepal_width', 'petal_length', 'petal_width'),
                                     kernel.function = "polynomial",
                                     gamma = 0.1,
                                     degree = 2,
                                     subspace.dimension = 120,
                                     hash.bits = 512,
                                     label.column = "species",
                                     cost = 1,
                                     bias = 0,
                                     max.step = 100,
                                     seed = 1
                                     )
# Example 3 - Radial Basis Function (RBF) Model
td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                     sample.id.column = "id",
                                     attribute.columns = c('sepal_length', 'sepal_width', 'petal_length', 'petal_width'),
                                     kernel.function = "rbf",
                                     gamma = 0.1,
                                     subspace.dimension = 120,
                                     hash.bits = 512,
                                     label.column = "species",
                                     cost = 1,
                                     bias = 0,
                                     max.step = 100,
                                     seed = 1
                                     )
# Example 4 - Sigmoid Model
td_svm_dense_out <- td_svm_dense_mle(data = svm_iris_train,
                                     sample.id.column = "id",
                                     attribute.columns = c('sepal_length', 'sepal_width', 'petal_length', 'petal_width'),
                                     kernel.function = "sigmoid",
                                     gamma = 0.1,
                                     subspace.dimension = 120,
                                     hash.bits = 512,
                                     label.column = "species",
                                     cost = 1,
                                     bias = 0,
                                     max.step = 30,
                                     seed = 1
                                     )