OneClassSVM
Description
The td_one_class_svm_sqle() function is a linear support vector machine (SVM) that performs
classification analysis on datasets to identify outliers or novelties.
This function supports the following model:
Classification (loss: hinge). During training, all the data is assumed to belong to a single class (value 1); therefore, "response_column" is not needed by the model. For td_one_class_svm_predict_sqle(), output values are 0 or 1: a value of 0 corresponds to an outlier, and 1 to a normal observation.
td_one_class_svm_sqle() is implemented using the minibatch stochastic gradient descent (SGD) algorithm,
which is highly scalable for large datasets.
The function output is a trained one-class SVM model, which can be passed as input to
td_one_class_svm_predict_sqle() for prediction. The model also contains the model statistics MSE,
log-likelihood, AIC, and BIC.
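To make the minibatch SGD description concrete, here is a minimal base-R sketch of a linear one-class SVM trained by minibatch SGD with optional momentum. It assumes the standard nu-formulation of the linear one-class SVM, with objective 0.5 * ||w||^2 - rho + (1 / nu) * mean(max(0, rho - w.x)) (the formulation used, for example, by scikit-learn's SGDOneClassSVM). It has not been confirmed that td_one_class_svm_sqle() optimizes this exact objective. The helpers one_class_sgd() and predict_one_class() and the parameter nu are hypothetical and not part of tdplyr; names such as iter.max, batch.size, and momentum only mirror the function's arguments.

```r
# Hypothetical sketch, NOT the tdplyr implementation.
# Assumed objective (nu-formulation of a linear one-class SVM):
#   J(w, rho) = 0.5 * sum(w^2) - rho + (1 / nu) * mean(pmax(0, rho - x %*% w))
one_class_sgd <- function(x, nu = 0.1, iter.max = 500, batch.size = 10,
                          eta = 0.01, momentum = 0.0) {
  n <- nrow(x)
  w <- numeric(ncol(x))
  rho <- 0
  vw <- numeric(ncol(x))  # momentum buffer for w
  vrho <- 0               # momentum buffer for rho
  for (epoch in seq_len(iter.max)) {
    idx <- sample.int(n)  # reshuffle rows once per pass over the data
    for (start in seq(1, n, by = batch.size)) {
      rows <- idx[start:min(start + batch.size - 1, n)]
      xb <- x[rows, , drop = FALSE]
      # 1 for rows on the wrong side of the margin (hinge is active), else 0
      viol <- as.numeric(rho - xb %*% w > 0)
      # Subgradients of J on this minibatch
      gw <- w - colMeans(xb * viol) / nu
      grho <- -1 + mean(viol) / nu
      # Classical momentum update (plain SGD when momentum = 0)
      vw <- momentum * vw - eta * gw
      vrho <- momentum * vrho - eta * grho
      w <- w + vw
      rho <- rho + vrho
    }
  }
  list(w = w, rho = rho)
}

# 1 = normal (score >= 0), 0 = outlier, matching the 0/1 convention above
predict_one_class <- function(model, x) {
  as.integer(x %*% model$w - model$rho >= 0)
}

# Usage on synthetic data clustered away from the origin
set.seed(42)
x <- matrix(rnorm(400, mean = 5, sd = 0.5), ncol = 2)
model <- one_class_sgd(x)
table(predict_one_class(model, x))
predict_one_class(model, matrix(c(-5, -5), nrow = 1))
```

Because this assumed formulation separates the data from the origin, the demo data is deliberately not centered. The sketch illustrates the SGD mechanics (shuffled minibatches, hinge subgradient steps, a 0/1 decision) rather than reproducing the in-database model.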
Notes:
Convert categorical columns to numerical columns as a preprocessing step (for example, using td_one_hot_encoding_sqle() or td_ordinal_encoding_sqle()). td_one_class_svm_sqle() takes all features as numeric input.
For a good model, standardize the dataset before passing it to td_one_class_svm_sqle() as a preprocessing step (for example, using td_scale_fit_sqle() and td_scale_transform_sqle()).
Rows with missing values are ignored during training by td_one_class_svm_sqle() and during prediction by td_one_class_svm_predict_sqle(). To train on those rows, fill in the missing values using imputation (td_simple_impute_fit_sqle() and td_simple_impute_transform_sqle()) or another mechanism.
The function supports linear SVMs only.
td_one_class_svm_sqle() supports a maximum of 2046 features, due to the limit of 2048 columns in a database table.
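The preprocessing notes above follow a fit/transform pattern: statistics are learned once from training data and then applied unchanged to any data. As an in-memory illustration only, the sketch below uses two hypothetical base-R helpers, fit_preprocess() and transform_preprocess() (not tdplyr functions), that perform mean imputation followed by z-score standardization on a data.frame. The in-database functions named above may use different imputation strategies or scaling formulas.

```r
# Hypothetical helpers, NOT tdplyr functions. They mimic on an in-memory
# data.frame the fit/transform split of the in-database impute and scale steps.

# Learn per-column statistics from training data, ignoring missing values
fit_preprocess <- function(df) {
  list(
    means = vapply(df, function(col) mean(col, na.rm = TRUE), numeric(1)),
    sds   = vapply(df, function(col) sd(col, na.rm = TRUE), numeric(1))
  )
}

# Apply the learned statistics: mean imputation, then z-score scaling
transform_preprocess <- function(df, fit) {
  for (nm in names(df)) {
    col <- df[[nm]]
    col[is.na(col)] <- fit$means[[nm]]                    # impute missing values
    df[[nm]] <- (col - fit$means[[nm]]) / fit$sds[[nm]]   # standardize
  }
  df
}

# Usage
train <- data.frame(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40))
fit <- fit_preprocess(train)
out <- transform_preprocess(train, fit)
out
```

Keeping fit and transform separate means the same means and standard deviations can be reused on the scoring data passed to prediction.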
Usage
td_one_class_svm_sqle (
data = NULL,
input.columns = NULL,
iter.max = 300,
batch.size = 10,
lambda1 = 0.02,
alpha = 0.15,
iter.num.no.change = 50,
tolerance = 0.001,
intercept = TRUE,
learning.rate = "OPTIMAL",
initial.eta = 0.05,
decay.rate = 0.25,
decay.steps = 5,
momentum = 0.0,
nesterov = FALSE,
local.sgd.iterations = 0,
...
)
Arguments
data
Required Argument.
input.columns
Required Argument.
iter.max
Optional Argument.
Default Value: 300
batch.size
Optional Argument.
Default Value: 10
lambda1
Optional Argument.
Default Value: 0.02
alpha
Optional Argument.
Default Value: 0.15
Types: float OR integer
iter.num.no.change
Optional Argument.
Default Value: 50
tolerance
Optional Argument.
Default Value: 0.001
intercept
Optional Argument.
Default Value: TRUE
learning.rate
Optional Argument.
Default Value: "OPTIMAL"
initial.eta
Optional Argument.
Default Value: 0.05
decay.rate
Optional Argument.
Default Value: 0.25
decay.steps
Optional Argument.
Default Value: 5
momentum
Optional Argument.
Default Value: 0.0
nesterov
Optional Argument.
Default Value: FALSE
local.sgd.iterations
Optional Argument.
Default Value: 0
...
Specifies the generic keyword arguments that SQLE functions accept. Below
are the generic keyword arguments:
volatile
The function also allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:
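The learning.rate, initial.eta, decay.rate, decay.steps, momentum, and nesterov arguments control the SGD step size and update rule. As a sketch under stated assumptions, the hypothetical base-R helpers below show one common interpretation: a step-decay schedule in which the rate is multiplied by decay.rate once every decay.steps iterations, and a momentum update that optionally takes the gradient at the Nesterov look-ahead point. Whether 'ADAPTIVE' in td_one_class_svm_sqle() follows exactly this rule is an assumption; step_decay_eta() and momentum_step() are not tdplyr functions.

```r
# Hypothetical helpers illustrating ASSUMED semantics; not tdplyr code.

# Step decay: multiply initial.eta by decay.rate once every decay.steps
# iterations (iter counts from 0).
step_decay_eta <- function(iter, initial.eta = 0.05, decay.rate = 0.25,
                           decay.steps = 5) {
  initial.eta * decay.rate^(iter %/% decay.steps)
}

# One momentum update for parameters w with velocity v.
# With nesterov = TRUE the gradient is taken at the look-ahead point
# w + momentum * v instead of at w.
momentum_step <- function(w, v, grad_fn, eta, momentum = 0.0,
                          nesterov = FALSE) {
  g <- if (nesterov) grad_fn(w + momentum * v) else grad_fn(w)
  v <- momentum * v - eta * g
  list(w = w + v, v = v)
}

# Usage: the first 12 step sizes with the defaults shown under Usage
step_decay_eta(0:11)

# Minimize f(w) = w^2 (gradient 2 * w) starting from w = 1
state <- list(w = 1, v = 0)
for (i in 1:200) {
  state <- momentum_step(state$w, state$v, function(w) 2 * w,
                         eta = 0.1, momentum = 0.6, nesterov = TRUE)
}
state$w
```

With the Usage defaults, this assumed schedule shrinks the step size by a factor of 4 every 5 iterations.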
Value
Function returns an object of class "td_one_class_svm_sqle",
which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the names:
result
output.data
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load the example data.
loadExampleData("tdplyr_example", "diabetes")
# Create tbl_teradata object.
data_input <- tbl(con, "diabetes")
# Check the list of available analytic functions.
display_analytic_functions()
# Example 1: Train a OneClassSVM model using "input.columns".
# The result of td_one_class_svm_sqle() can be passed to
# td_one_class_svm_predict_sqle() to identify whether each
# input row is normal or a novelty.
one_class_svm1 <- td_one_class_svm_sqle(
data=data_input,
input.columns=c('age', 'sex', 'bmi',
'map1', 'tc', 'ldl',
'hdl', 'tch', 'ltg',
'glu', 'y'),
local.sgd.iterations=537,
batch.size=1,
learning.rate='CONSTANT',
initial.eta=0.01,
lambda1=0.1,
alpha=0.0,
momentum=0.0,
iter.max=1
)
# Print the result.
print(one_class_svm1$result)
print(one_class_svm1$output.data)
# Example 2: Train a OneClassSVM model using "input.columns",
# with "learning.rate" set to 'ADAPTIVE' and "momentum"
# set to 0.6 for better results.
one_class_svm2 <- td_one_class_svm_sqle(
data=data_input,
input.columns=c('age', 'sex', 'bmi',
'map1', 'tc', 'ldl',
'hdl', 'tch', 'ltg',
'glu', 'y'),
local.sgd.iterations=537,
batch.size=1,
learning.rate='ADAPTIVE',
initial.eta=0.01,
lambda1=0.1,
alpha=0.0,
momentum=0.6,
iter.max=100)
# Print the result.
print(one_class_svm2$result)
print(one_class_svm2$output.data)