SVM
Description
The td_svm_sqle() function is a linear support vector machine that performs
classification and regression analysis on data sets.
This function supports these models:
Regression (loss: epsilon_insensitive).
Classification (loss: hinge). Only binary classification is supported; the response values must be 0 or 1.
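The two losses named above can be sketched in base R as follows. This is only an illustration of the textbook definitions, not code taken from td_svm_sqle(). The helper names are hypothetical, and it is an assumption that the width of the insensitive band corresponds to the function's "epsilon" argument.

```r
# Illustrative textbook definitions; not the td_svm_sqle() implementation.

# Hinge loss for binary classification. Labels arrive as 0/1 and are mapped
# to -1/+1, the usual convention for the hinge loss.
hinge_loss <- function(y01, score) {
  y <- ifelse(y01 == 1, 1, -1)
  pmax(0, 1 - y * score)
}

# Epsilon-insensitive loss for regression: absolute errors smaller than
# epsilon cost nothing, larger errors are penalized linearly.
epsilon_insensitive_loss <- function(y, pred, epsilon = 0.1) {
  pmax(0, abs(y - pred) - epsilon)
}

# Correct side of the margin costs 0; violations cost 1 - y * score.
hinge_values <- hinge_loss(c(1, 0, 1), c(2.0, 0.5, -0.3))

# An error of 0.05 falls inside the band; an error of 0.5 does not.
eps_values <- epsilon_insensitive_loss(c(3.0, 3.0), c(3.05, 3.5))
```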
td_svm_sqle() is implemented using the Minibatch Stochastic Gradient Descent (SGD) algorithm,
which is highly scalable for large datasets.
Because learning is gradient-based, the function is highly sensitive to feature scaling.
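To make the idea of minibatch SGD concrete, here is a generic base R sketch of a linear SVM trained with hinge loss and an L2 penalty. It is an assumption-laden illustration, not the in-database algorithm. The function sgd_linear_svm and its arguments are hypothetical, and the sketch uses a constant learning rate with no momentum.

```r
# Generic minibatch SGD for a linear SVM (illustration only).
sgd_linear_svm <- function(X, y01, batch_size = 10, iter_max = 300,
                           eta = 0.05, lambda = 0.02) {
  y <- ifelse(y01 == 1, 1, -1)        # map 0/1 labels to -1/+1
  n <- nrow(X)
  w <- rep(0, ncol(X))
  b <- 0
  for (iter in seq_len(iter_max)) {
    idx <- sample.int(n)              # reshuffle rows on every pass
    for (start in seq(1, n, by = batch_size)) {
      rows <- idx[start:min(start + batch_size - 1, n)]
      Xb <- X[rows, , drop = FALSE]
      yb <- y[rows]
      margin <- yb * (as.vector(Xb %*% w) + b)
      active <- margin < 1            # rows that violate the margin
      # Subgradient of the mean hinge loss plus an L2 penalty on w.
      grad_w <- lambda * w -
        colSums(Xb[active, , drop = FALSE] * yb[active]) / length(rows)
      grad_b <- -sum(yb[active]) / length(rows)
      w <- w - eta * grad_w
      b <- b - eta * grad_b
    }
  }
  list(weights = w, intercept = b)
}

# Usage on synthetic, well-separated data (made up for this sketch).
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = -2), ncol = 2),
           matrix(rnorm(100, mean = 2), ncol = 2))
y01 <- rep(c(0, 1), each = 50)
fit <- sgd_linear_svm(X, y01, iter_max = 20)
pred <- as.integer(as.vector(X %*% fit$weights) + fit$intercept > 0)
accuracy <- mean(pred == y01)         # training accuracy
```

Each update touches only one small batch of rows, so the cost per step does not grow with the size of the data set.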
Before using the features in the function, you must standardize the input features
using the td_scale_fit_sqle() and td_scale_transform_sqle() functions.
The function accepts only numeric features, so you must convert categorical features to
numeric values before training. The function skips rows with missing (null) values during training.
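The three preprocessing requirements above can be pictured with a small in-memory base R sketch. The data frame and its column names are made up. With real tbl_teradata data, scaling is done in the database with td_scale_fit_sqle() and td_scale_transform_sqle(), as described above.

```r
# Local illustration of the preprocessing requirements (hypothetical data).
df <- data.frame(
  income = c(8.3, 7.2, NA, 5.6, 3.8),
  age    = c(41, 21, 52, 52, 30),
  region = c("north", "south", "south", "north", "south"),
  stringsAsFactors = FALSE
)

# 1. Rows with missing values are skipped during training; drop them here
#    to see which rows would actually be used.
df <- df[complete.cases(df), ]

# 2. Convert the categorical feature to a numeric 0/1 indicator.
df$region_south <- as.numeric(df$region == "south")
df$region <- NULL

# 3. Standardize the numeric features to zero mean and unit standard deviation.
num_cols <- c("income", "age")
df[num_cols] <- lapply(df[num_cols], function(x) (x - mean(x)) / sd(x))
```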
The function output is a trained td_svm_sqle model, which can be passed to td_svm_predict_sqle()
for prediction. The model also contains the model statistics mean squared error (MSE),
Loglikelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC).
Further model evaluation can be done as a post-processing step using functions such as
td_regression_evaluator_sqle(), td_classification_evaluator_sqle(), and td_roc_sqle().
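For orientation, the MSE, AIC, and BIC statistics named above have standard textbook definitions, sketched here in base R. How td_svm_sqle() counts parameters or computes the log-likelihood is not stated on this page, so the helper names and input values below are purely illustrative assumptions.

```r
# Textbook definitions (illustration only).
# k = number of estimated parameters, n = number of training rows.
mse <- function(y, pred) mean((y - pred)^2)
aic <- function(loglik, k) 2 * k - 2 * loglik
bic <- function(loglik, k, n) k * log(n) - 2 * loglik

# Made-up inputs: 8 weights plus an intercept, 100 rows.
example_aic <- aic(loglik = -120.5, k = 9)
example_bic <- bic(loglik = -120.5, k = 9, n = 100)
example_mse <- mse(c(1, 2, 3), c(1, 2, 5))
```

Lower AIC and BIC indicate a better trade-off between fit and model size. BIC penalizes extra parameters more heavily once n exceeds about 7, because log(n) is then greater than 2.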
Usage
td_svm_sqle (
    formula = NULL,
    data = NULL,
    input.columns = NULL,
    response.column = NULL,
    model.type = "Classification",
    iter.max = 300,
    epsilon = 0.1,
    batch.size = 10,
    lambda1 = 0.02,
    alpha = 0.15,
    iter.num.no.change = 50,
    tolerance = 0.001,
    intercept = TRUE,
    class.weights = "0:1.0, 1:1.0",
    learning.rate = NULL,
    initial.eta = 0.05,
    decay.rate = 0.25,
    decay.steps = 5,
    momentum = 0.0,
    nesterov = FALSE,
    local.sgd.iterations = 0,
    ...
)
Arguments
formula |
Required Argument when "input.columns" and "response.column" are not provided,
optional otherwise.
Types: character |
data |
Required Argument. |
input.columns |
Required Argument when "formula" is not provided, optional otherwise.
Types: character OR vector of Strings (character) |
response.column |
Required Argument when "formula" is not provided, optional otherwise.
Types: character |
model.type |
Optional Argument. |
iter.max |
Optional Argument.
Default Value: 300 |
epsilon |
Optional Argument. |
batch.size |
Optional Argument.
Default Value: 10 |
lambda1 |
Optional Argument.
Default Value: 0.02 |
alpha |
Optional Argument.
Default Value: 0.15
Types: float OR integer |
iter.num.no.change |
Optional Argument.
Default Value: 50 |
tolerance |
Optional Argument.
Default Value: 0.001 |
intercept |
Optional Argument. |
class.weights |
Optional Argument.
Default Value: "0:1.0, 1:1.0" |
learning.rate |
Optional Argument.
Types: character |
initial.eta |
Optional Argument. |
decay.rate |
Optional Argument.
Default Value: 0.25 |
decay.steps |
Optional Argument. |
momentum |
Optional Argument.
Default Value: 0.0 |
nesterov |
Optional Argument.
Default Value: FALSE |
local.sgd.iterations |
Optional Argument.
Note:
Default Value: 0 |
... |
Specifies the generic keyword arguments that SQLE functions accept. Below
are the generic keyword arguments: volatile. The function also allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:
Note: |
Value
Function returns an object of class "td_svm_sqle",
which is a named list containing objects of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator
using the name(s):
result
output.data
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load the example data.
loadExampleData("tdplyr_example", "cal_housing_ex_raw")
# Create tbl_teradata object.
data_input <- tbl(con, "cal_housing_ex_raw")
# Check the list of available analytic functions.
display_analytic_functions()
# Scale the "target.columns" with respect to the 'STD' value of each column.
fit_obj <- td_scale_fit_sqle(
    data=data_input,
    target.columns=c('MedInc', 'HouseAge', 'AveRooms',
                     'AveBedrms', 'Population', 'AveOccup',
                     'Latitude', 'Longitude'),
    scale.method="STD")
# Transform the data.
transform_obj <- td_scale_transform_sqle(data=data_input,
                                         object=fit_obj$output,
                                         accumulate=c("id", "MedHouseVal"))
# Example 1 : Train on the transformed data using td_svm_sqle() when
#             "model.type" is 'Regression' and all other arguments
#             use their default values.
obj1 <- td_svm_sqle(
    data=transform_obj$result,
    input.columns=c('MedInc', 'HouseAge', 'AveRooms',
                    'AveBedrms', 'Population', 'AveOccup',
                    'Latitude', 'Longitude'),
    response.column="MedHouseVal",
    model.type="Regression"
)
# Print the result.
print(obj1$result)
print(obj1$output.data)
# Example 2 : Train on the transformed data using td_svm_sqle()
#             when "model.type" is 'Classification'
#             and "learning.rate" is 'INVTIME'.
obj2 <- td_svm_sqle(
    data=transform_obj$result,
    input.columns=c('MedInc', 'HouseAge', 'AveRooms',
                    'AveBedrms', 'Population', 'AveOccup',
                    'Latitude', 'Longitude'),
    response.column="MedHouseVal",
    model.type="Classification",
    batch.size=12,
    iter.max=301,
    lambda1=0.1,
    alpha=0.5,
    iter.num.no.change=60,
    tolerance=0.01,
    intercept=FALSE,
    class.weights="0:1.0,1:0.5",
    learning.rate="INVTIME",
    # The argument is "initial.eta" (see Usage), not "initial.data".
    initial.eta=0.5,
    decay.rate=0.5,
    momentum=0.6,
    nesterov=TRUE,
    local.sgd.iterations=1
)
# Print the result.
print(obj2$result)
print(obj2$output.data)
# Example 3 : Generate a linear support vector machine (SVM) when
#             "learning.rate" is 'ADAPTIVE' and "class.weights" is
#             '0:1.0,1:0.5'.
obj3 <- td_svm_sqle(
    data=transform_obj$result,
    input.columns=c('MedInc', 'HouseAge', 'AveRooms',
                    'AveBedrms', 'Population', 'AveOccup',
                    'Latitude', 'Longitude'),
    response.column="MedHouseVal",
    model.type="Classification",
    batch.size=1,
    iter.max=1,
    lambda1=0.0,
    iter.num.no.change=60,
    tolerance=0.01,
    intercept=FALSE,
    class.weights="0:1.0,1:0.5",
    learning.rate="ADAPTIVE",
    # The argument is "initial.eta" (see Usage), not "initial.data".
    initial.eta=0.1,
    decay.rate=0.5,
    momentum=0.7,
    nesterov=TRUE,
    local.sgd.iterations=1
)
# Print the result.
print(obj3$result)
print(obj3$output.data)
# Example 4 : Generate a linear support vector machine (SVM) when
#             "decay.rate" is 0.5 and "model.type" is 'Regression'.
obj4 <- td_svm_sqle(
    data=transform_obj$result,
    input.columns=c('MedInc', 'HouseAge', 'AveRooms',
                    'AveBedrms', 'Population'),
    response.column="MedHouseVal",
    model.type="Regression",
    decay.rate=0.5,
    momentum=0.7,
    nesterov=TRUE,
    local.sgd.iterations=1
)
# Print the result.
print(obj4$result)
print(obj4$output.data)
# Example 5 : Generate a linear support vector machine (SVM) using an
#             input tbl_teradata and a formula, with "model.type" set
#             to 'Regression'.
formula <- MedHouseVal ~ MedInc + HouseAge + AveRooms +
    AveBedrms + Population + AveOccup + Latitude + Longitude
obj5 <- td_svm_sqle(
    data=transform_obj$result,
    formula=formula,
    model.type="Regression"
)
# Print the result.
print(obj5$result)
print(obj5$output.data)