Description
The SVMSparse (td_svm_sparse_mle) function takes training data
(in sparse format) and outputs a predictive model in binary format,
which is input to the functions td_svm_sparse_predict_mle and
td_svm_sparse_summary_mle.
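As a rough illustration of what "sparse format" means here, the sketch below reshapes a small dense data frame into one row per (sample, attribute) pair. It uses base R only. The data values and the `sepal_length`/`petal_length` feature names are made up for illustration; the column names `id`, `attribute`, `value1`, and `species` mirror the ones used in the Examples section. Dropping zero-valued rows is an assumption about how sparse input is usually prepared, not documented behavior of this function.

```r
# Hypothetical dense input: one row per sample, one column per feature.
dense <- data.frame(
  id = 1:3,
  sepal_length = c(5.1, 7.0, 6.3),
  petal_length = c(1.4, 4.7, 0.0),
  species = c("setosa", "versicolor", "virginica"),
  stringsAsFactors = FALSE
)

feature_cols <- c("sepal_length", "petal_length")

# Stack the feature columns into (id, attribute, value1, species) rows.
sparse <- do.call(rbind, lapply(feature_cols, function(col) {
  data.frame(
    id = dense$id,
    attribute = col,
    value1 = dense[[col]],
    species = dense$species,
    stringsAsFactors = FALSE
  )
}))

# Assumption: zero-valued entries can be left out of a sparse layout.
sparse <- sparse[sparse$value1 != 0, ]

# Sort for readability and reset row names.
sparse <- sparse[order(sparse$id, sparse$attribute), ]
rownames(sparse) <- NULL
print(sparse)
```

A table shaped like `sparse`, once stored in the database, is the kind of input that sample.id.column, attribute.column, value.column, and label.column describe.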
Usage
td_svm_sparse_mle (
data = NULL,
sample.id.column = NULL,
attribute.column = NULL,
value.column = NULL,
label.column = NULL,
cost = 1,
bias = 0,
hash = FALSE,
hash.buckets = NULL,
class.weights = NULL,
max.step = 100,
epsilon = 0.01,
seed = 0,
data.sequence.column = NULL
)
Arguments
data |
Required Argument.
Specifies the name of the tbl_teradata that contains the training
samples.
|
sample.id.column |
Required Argument.
Specifies the name of the input column that contains the
identifiers of the training samples.
Types: character
|
attribute.column |
Required Argument.
Specifies the name of the input column that contains the attributes
of the samples.
Types: character
|
value.column |
Required Argument.
Specifies the name of the input column that contains the attribute
values.
Types: character
|
label.column |
Required Argument.
Specifies the name of the input column that contains the
classes of the samples.
Types: character
|
cost |
Optional Argument.
Specifies the regularization parameter in the SVM soft-margin loss
function. The cost must be greater than 0.0.
Default Value: 1
Types: numeric
|
bias |
Optional Argument.
Specifies a non-negative value. If the value is greater than zero,
each sample x in the training set is converted to (x, b); that is,
another dimension containing the bias value b is added to each sample.
This argument addresses situations where not all samples center at 0.
Default Value: 0
Types: numeric
|
hash |
Optional Argument.
Specifies whether to use hash projection on attributes. Hash
projection can accelerate processing speed but can slightly decrease
accuracy.
Note: You must use hash projection if the dataset has more features
than can fit into memory.
Default Value: FALSE
Types: logical
|
hash.buckets |
Optional Argument.
Valid only if hash is TRUE. Specifies the number of buckets for
hash projection. In most cases, the function can determine the
appropriate number of buckets from the scale of the input data set.
However, if the dataset has a very large number of features, you
might have to specify the number of buckets to accelerate the function.
Types: numeric
|
class.weights |
Optional Argument.
Specifies the weights for different classes. The format is:
"classlabel m:weight m, classlabel n:weight n". If weight for a class
is given, the cost parameter for this class is weight * cost. A
weight larger than 1 often increases the accuracy of the
corresponding class; however, it may decrease global accuracy.
Classes not assigned a weight in this argument are assigned a weight
of 1.0.
Types: character OR vector of characters
|
max.step |
Optional Argument.
A positive integer value that specifies the maximum number of
iterations of the training process. One step means that each sample
is seen once by the trainer. The input value must be in the range (0,
10000].
Default Value: 100
Types: numeric
|
epsilon |
Optional Argument.
Specifies the termination criterion.
When the difference between the values of the loss function in two
sequential iterations is less than this number, the function stops.
epsilon must be greater than 0.0.
Default Value: 0.01
Types: numeric
|
seed |
Optional Argument.
Specifies a long integer value used to order the training set randomly
and consistently. This value can be used to ensure that the same model
will be generated if the function is run multiple times in a given
database with the same arguments. The input value must be in the
range [0, 9223372036854775807].
Default Value: 0
Types: numeric
|
data.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
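The class.weights format described above can be assembled from a named numeric vector. The following is a minimal sketch in base R. The class labels and weight values are hypothetical, and the exact spacing the function accepts around the comma is an assumption based on the format string shown in the argument description.

```r
# Hypothetical per-class weights; the labels are illustrative only.
weights <- c(setosa = 1, versicolor = 2.5, virginica = 1)
cost <- 1

# Assumed literal format: "label:weight, label:weight".
class_weights <- paste0(names(weights), ":", weights, collapse = ", ")
print(class_weights)

# Effective per-class cost, following the documented rule weight * cost.
effective_cost <- weights * cost
print(effective_cost)
```

The resulting string could then be passed as class.weights = class_weights, together with the same cost value.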
Value
The function returns an object of class "td_svm_sparse_mle", which is a
named list containing Teradata tbl objects.
Named list members can be referenced directly with the "$" operator
using the following names:
model.table
output
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("svmsparse_example", "svm_iris_input_train")
# Create remote tibble objects.
svm_iris_input_train <- tbl(con, "svm_iris_input_train")
# Example - Train a sparse SVM model on the training data.
td_svm_sparse_out <- td_svm_sparse_mle(data = svm_iris_input_train,
                                       sample.id.column = "id",
                                       attribute.column = "attribute",
                                       value.column = "value1",
                                       label.column = "species",
                                       max.step = 150,
                                       seed = 0
                                       )