
Teradata® Package for R Function Reference

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for R
Release Number
17.20
Published
March 2024
Product Category
Teradata Vantage

OneClassSVM

Description

td_one_class_svm_sqle() is a linear support vector machine (SVM) function that performs classification analysis on datasets to identify outliers or novelties.

This function supports these models:

  • Classification (loss: hinge). During training, all the data is assumed to belong to a single class (value 1); therefore, "response_column" is not needed by the model. For td_one_class_svm_predict_sqle(), output values are 0 or 1: a value of 0 corresponds to an outlier, and 1 to a normal observation.

td_one_class_svm_sqle() is implemented using the Minibatch Stochastic Gradient Descent (SGD) algorithm, which is highly scalable for large datasets.

The function output is a trained one-class SVM model, which can be used as input to td_one_class_svm_predict_sqle() for prediction. The model also contains the model statistics MSE, Loglikelihood, AIC, and BIC.
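
The trained model can then be scored with td_one_class_svm_predict_sqle(). A minimal train-and-predict sketch is shown below; the predict function's argument names ("object", "newdata") and the tbl_teradata objects "train_data" and "test_data" are assumptions for illustration only, so check the td_one_class_svm_predict_sqle() reference page for the exact signature.

  # Sketch only: train a one-class SVM, then score new rows.
  # The "object" and "newdata" argument names are assumed here.
  svm_model <- td_one_class_svm_sqle(
                   data = train_data,
                   input.columns = c("age", "bmi", "glu")
               )
  predictions <- td_one_class_svm_predict_sqle(
                     object = svm_model,
                     newdata = test_data
                 )
  # Output values are 0 (outlier) or 1 (normal observation).
  print(predictions$result)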

Notes:

  • Categorical columns should be converted to numerical columns as a preprocessing step (for example, using td_one_hot_encoding_sqle() or td_ordinal_encoding_sqle()), because td_one_class_svm_sqle() takes all features as numeric input.

  • For a good model, the dataset should be standardized before it is fed to td_one_class_svm_sqle(), as a preprocessing step (for example, using td_scale_fit_sqle() and td_scale_transform_sqle()); see the sketch following these notes.

  • Rows with missing values are ignored during training and prediction by td_one_class_svm_sqle() and td_one_class_svm_predict_sqle(). Consider filling those rows using imputation (td_simple_impute_fit_sqle() and td_simple_impute_transform_sqle()) or another mechanism if you need to train on rows with missing values.

  • The function supports linear SVMs only.

  • A maximum of 2046 features is supported by td_one_class_svm_sqle(), due to the limitation imposed by the maximum number of columns (2048) in a database table.
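
The following sketch illustrates the standardization note above. It assumes td_scale_fit_sqle() accepts "target.columns" and "scale.method" arguments, that td_scale_transform_sqle() accepts the fit result via "object", and that the member names used below exist; verify against those functions' reference pages for the exact signatures.

  # Sketch only: standardize features, then train the one-class SVM.
  # The argument and member names below are assumptions for illustration.
  fit_obj <- td_scale_fit_sqle(
                 data = data_input,
                 target.columns = c("age", "bmi", "glu"),
                 scale.method = "STD"
             )
  scaled <- td_scale_transform_sqle(
                data = data_input,
                object = fit_obj$output    # member name assumed
            )
  model <- td_one_class_svm_sqle(
               data = scaled$result,
               input.columns = c("age", "bmi", "glu")
           )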

Usage

  td_one_class_svm_sqle (
      data = NULL,
      input.columns = NULL,
      iter.max = 300,
      batch.size = 10,
      lambda1 = 0.02,
      alpha = 0.15,
      iter.num.no.change = 50,
      tolerance = 0.001,
      intercept = TRUE,
      learning.rate = "OPTIMAL",
      initial.eta = 0.05,
      decay.rate = 0.25,
      decay.steps = 5,
      momentum = 0.0,
      nesterov = FALSE,
      local.sgd.iterations = 0,
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

input.columns

Required Argument.
Specifies the name(s) of the column(s) in "data" to be used for training the model (predictors, features, or independent variables).
Types: character OR vector of Strings (character)

iter.max

Optional Argument.
Specifies the maximum number of iterations (mini-batches) over the training data batches.
Note:

  • It must be a positive value less than 10,000,000.

Default Value: 300
Types: integer

batch.size

Optional Argument.
Specifies the number of observations (training samples) processed in a single mini-batch per AMP. A value of '0' indicates no mini-batches: the entire dataset is processed in each iteration, and the algorithm becomes Gradient Descent. A value higher than the number of rows on any AMP also defaults to Gradient Descent.
Notes:

  • It must be a non-negative integer value.

  • It must be in the range [0, 2147483647]

Default Value: 10
Types: integer

lambda1

Optional Argument.
Specifies the amount of regularization to be added. The higher the value, the stronger the regularization. It is also used to compute the learning rate when the "learning.rate" is set to 'OPTIMAL'.
A value of '0' means no regularization.
Note:

  • It must be a non-negative float value.

Default Value: 0.02
Types: float OR integer

alpha

Optional Argument.
Specifies the Elasticnet parameter for penalty computation. It only becomes effective when "lambda1" is greater than 0. The value represents the contribution ratio of L1 in the penalty. A value '1.0' indicates L1 (LASSO) only, a value '0' indicates L2 (Ridge) only, and a value in between is a combination of L1 and L2.
Note:

  • It must be a float value between 0 and 1.

Default Value: 0.15 (15% L1, 85% L2)
Types: float OR integer

iter.num.no.change

Optional Argument.
Specifies the number of iterations (mini-batches) with no improvement in loss (within "tolerance") after which training stops. A value of '0' indicates no early stopping; the algorithm continues until "iter.max" iterations are reached.
Notes:

  • It must be a non-negative integer value.

  • It must be in the range [0, 2147483647]

Default Value: 50
Types: integer

tolerance

Optional Argument.
Specifies the stopping criteria in terms of loss function improvement.
Notes:

  • Applicable when "iter.num.no.change" is greater than '0'.

  • It must be a positive value.

Default Value: 0.001
Types: float OR integer

intercept

Optional Argument.
Specifies whether "intercept" should be estimated or not based on whether "data" is already centered or not.
Default Value: TRUE
Types: logical

learning.rate

Optional Argument.
Specifies the learning rate algorithm for SGD iterations.
Permitted Values: 'CONSTANT', 'OPTIMAL', 'INVTIME', 'ADAPTIVE'
Default Value: 'OPTIMAL'
Types: character

initial.eta

Optional Argument.
Specifies the initial value of eta for the learning rate. When the "learning.rate" is 'CONSTANT', this value is applicable for all iterations.
Default Value: 0.05
Types: float OR integer

decay.rate

Optional Argument.
Specifies the decay rate for the learning rate.
Note:

  • Only applicable for 'INVTIME' and 'ADAPTIVE' learning rates.

Default Value: 0.25
Types: float OR integer

decay.steps

Optional Argument.
Specifies the decay steps (number of iterations) for the 'ADAPTIVE' learning rate. The learning rate changes by the decay rate after the specified number of iterations is completed.
Note:

  • It must be in the range [0, 2147483647]

Default Value: 5
Types: integer

momentum

Optional Argument.
Specifies the value to use for the momentum learning rate optimizer. A larger value indicates a higher momentum contribution.
A value of '0' means the momentum optimizer is disabled. For a good momentum contribution, a value between 0.6 and 0.95 is recommended.
Note:

  • It must be a non-negative float value between 0 and 1.

Default Value: 0.0
Types: float OR integer

nesterov

Optional Argument.
Specifies whether Nesterov optimization should be applied to the momentum optimizer or not.
Note:

  • Applicable when "momentum" is greater than 0.

Default Value: FALSE
Types: logical

local.sgd.iterations

Optional Argument.
Specifies the number of local iterations to be used for the Local SGD algorithm. A value of '0' implies Local SGD is disabled. A value higher than '0' enables Local SGD, and that many local iterations are performed before updating the weights for the global model. With the Local SGD algorithm, the recommended values for related arguments are as follows (see the sketch following this argument's description):

  • local.sgd.iterations: 10

  • iter.max: 100

  • batch.size: 50

  • iter.num.no.change: 5

Note:

  • It must be a positive integer value.

Default Value: 0
Types: integer
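
A minimal sketch using the recommended Local SGD settings listed above; the feature columns shown are taken from the diabetes example data used in the Examples section.

  # Sketch only: enable Local SGD with the recommended argument values.
  model_local_sgd <- td_one_class_svm_sqle(
                         data = data_input,
                         input.columns = c("age", "bmi", "glu"),
                         local.sgd.iterations = 10,
                         iter.max = 100,
                         batch.size = 50,
                         iter.num.no.change = 5
                     )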

...

Specifies the generic keyword arguments that SQLE functions accept, described below:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not. When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not. When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
These generic arguments are supported by tdplyr if the underlying SQL Engine function supports them; otherwise, an exception is raised.
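
For example, with the input argument named "data", the pattern above yields generic arguments such as "data.partition.column" and "data.order.column". A minimal sketch, assuming the underlying SQL Engine function supports ordering and using the "persist" argument described earlier:

  # Sketch only: order the input by "age" and persist the model table
  # beyond the current session. Support for these generic arguments
  # depends on the underlying SQL Engine function.
  model_persisted <- td_one_class_svm_sqle(
                         data = data_input,
                         input.columns = c("age", "bmi", "glu"),
                         data.order.column = "age",
                         persist = TRUE
                     )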

Value

The function returns an object of class "td_one_class_svm_sqle", which is a named list containing objects of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s):

  1. result

  2. output.data

Examples

  
    
    # Get the current context/connection.
    con <- td_get_context()$connection

    # Load the example data.
    loadExampleData("tdplyr_example", "diabetes")
    
    # Create tbl_teradata object.
    data_input <- tbl(con, "diabetes")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Example 1 :  Train a OneClassSVM model using "input.columns".
    #              The resulting model, when passed to
    #              td_one_class_svm_predict_sqle(), helps identify
    #              whether the input data is normal or a novelty.
    one_class_svm1 <- td_one_class_svm_sqle(
                       data=data_input,
                       input.columns=c('age', 'sex', 'bmi',
                                       'map1', 'tc', 'ldl',
                                       'hdl', 'tch', 'ltg',
                                       'glu', 'y'),
                       local.sgd.iterations=537,
                       batch.size=1,
                       learning.rate='CONSTANT',
                       initial.eta=0.01,
                       lambda1=0.1,
                       alpha=0.0,
                       momentum=0.0,
                       iter.max=1)
    
    # Print the result.
    print(one_class_svm1$result)
    print(one_class_svm1$output.data)
    
    # Example 2 :  Train a OneClassSVM model using "input.columns",
    #              with "learning.rate" set to 'ADAPTIVE' and "momentum"
    #              set to '0.6' for better results.
    one_class_svm2 <- td_one_class_svm_sqle(
                       data=data_input,
                       input.columns=c('age', 'sex', 'bmi',
                                      'map1', 'tc', 'ldl',
                                      'hdl', 'tch', 'ltg',
                                      'glu', 'y'),
                       local.sgd.iterations=537,
                       batch.size=1,
                       learning.rate='ADAPTIVE',
                       initial.eta=0.01,
                       lambda1=0.1,
                       alpha=0.0,
                       momentum=0.6,
                       iter.max=100)
    
    # Print the result.
    print(one_class_svm2$result)
    print(one_class_svm2$output.data)