The TD_SVM function is a linear support vector machine (SVM) that performs classification and regression analysis on datasets.
- Regression (loss: epsilon_insensitive).
- Classification (loss: hinge). Only binary classification is supported; response values must be 0 or 1. (Both loss functions are sketched after this list.)
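For reference, here is a minimal Python sketch of the two loss functions named above. The epsilon value and the 0/1-to-(-1/+1) label mapping are illustrative assumptions, not TD_SVM defaults.

```python
import numpy as np

def hinge_loss(y, score):
    # Classification loss. Labels y are assumed to be in {-1, +1};
    # 0/1 responses would be mapped to -1/+1 before this call.
    return np.maximum(0.0, 1.0 - y * score)

def epsilon_insensitive_loss(y, prediction, epsilon=0.1):
    # Regression loss: absolute errors smaller than epsilon are ignored.
    return np.maximum(0.0, np.abs(y - prediction) - epsilon)

# Example: one classification point and one regression point
print(hinge_loss(np.array([1.0]), np.array([0.3])))                  # 0.7
print(epsilon_insensitive_loss(np.array([2.0]), np.array([2.05])))   # 0.0
```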
TD_SVM is implemented using the Minibatch Stochastic Gradient Descent (SGD) algorithm, which is highly scalable for large datasets. See TD_GLM for details on SGD.
Support Vector Machines (SVM) is a type of supervised machine learning algorithm used for classification and regression analysis. The goal of SVM is to find a hyperplane that best separates the data points into different classes while maximizing the margin, that is, the distance between the hyperplane and the nearest data points from each class.
SVM has proven effective in solving various real-world problems. Its ability to handle high-dimensional data, non-linearly separable data, and outliers makes it a popular choice for many applications.
- Standardize the input features using the TD_ScaleFit and TD_ScaleTransform functions. The function accepts only numeric features.
- Before training, convert the categorical features to numeric values; a sketch of both preprocessing steps follows this list. The function skips rows with missing (null) values during training.
The function output is a trained SVM model, which can be used as input to TD_SVMPredict for prediction. The model output also contains model statistics: mean squared error (MSE), log-likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC).
- Perform model evaluation as a post-processing step using functions such as TD_RegressionEvaluator, TD_ClassificationEvaluator, and TD_ROC.
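As a rough, non-SQL illustration of the preprocessing described above (standardizing numeric features and converting a categorical feature to numeric values), here is a NumPy sketch; the feature names are made up, and in Vantage these steps are performed with TD_ScaleFit, TD_ScaleTransform, and SQL rather than Python.

```python
import numpy as np

# Hypothetical raw training features: two numeric columns and one categorical column
age = np.array([25.0, 40.0, 31.0, 58.0])
income = np.array([30000.0, 82000.0, 54000.0, 91000.0])
segment = np.array(["basic", "premium", "basic", "premium"])

# Convert the categorical feature to a numeric value (simple binary encoding)
segment_code = (segment == "premium").astype(float)

# Standardize every feature to zero mean and unit variance,
# mirroring what TD_ScaleFit/TD_ScaleTransform do on the database side
features = np.column_stack([age, income, segment_code])
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
print(standardized)
```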
The optimization problem of SVM can be formulated as:
minimize: (1/2) * ||w||^2 + C * sum(ξi)
subject to: yi * (w * xi + b) >= 1 - ξi, with ξi >= 0
where:
- ||w|| is the Euclidean norm of the weight vector w
- C is a parameter that controls the trade-off between maximizing the margin and minimizing the classification error
- ξi is the slack variable that measures the degree of misclassification
- sum(ξi) is the total misclassification error
- yi is the class label (either -1 or 1)
- xi is the feature vector
- w is the weight vector
- b is the bias term
- w * xi + b = 0 is the decision boundary
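As a minimal illustration of this objective (not part of TD_SVM itself), the following NumPy sketch evaluates the soft-margin objective for a fixed w, b, and C, using the fact that the optimal slack values for fixed (w, b) are max(0, 1 - yi * (w * xi + b)):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate (1/2) * ||w||^2 + C * sum(slack) for labels y in {-1, +1}."""
    margins = y * (X @ w + b)                # yi * (w * xi + b) for every point
    slack = np.maximum(0.0, 1.0 - margins)   # slack is zero for points outside the margin
    return 0.5 * np.dot(w, w) + C * slack.sum()

# Tiny usage example with made-up data
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0])
print(soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))
```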
When the data is not linearly separable, SVM can map it into a higher-dimensional space using a kernel function. Commonly used kernel functions include:
- Linear kernel:
K(xi,xj) = xi*xj
- Polynomial kernel:
K(xi,xj) = (gamma*xi*xj + r)^d
where gamma, r, and d are parameters.
- Radial basis function (RBF) kernel:
K(xi,xj) = exp(-gamma*||xi - xj||^2)
where gamma is a parameter.
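The following Python sketch shows the three kernel functions exactly as defined above; it is illustrative only, since TD_SVM itself fits a linear SVM (the linear-kernel case), and the default gamma, r, and d values shown are arbitrary.

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi * xj
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    # K(xi, xj) = (gamma * xi * xj + r)^d
    return (gamma * np.dot(xi, xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    diff = xi - xj
    return np.exp(-gamma * np.dot(diff, diff))

xi = np.array([1.0, 0.0])
xj = np.array([0.5, 0.5])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj), rbf_kernel(xi, xj))
```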
Once the data is mapped into a higher-dimensional space, the optimization problem can be solved to find the optimal hyperplane. The decision boundary is then given by:
w*phi(xi) + b = 0
where phi(xi) is the feature vector in the higher-dimensional space.
SVM can be trained using the Minibatch Stochastic Gradient Descent (SGD) algorithm, which is a popular optimization algorithm for large-scale machine learning problems. Minibatch SGD updates the weight vector incrementally using small batches of data, making it more computationally efficient than batch SGD.
The objective function for SVM using Minibatch SGD can be formulated as follows:
minimize:
(1/2) * ||w||^2 + C * (1/B) * sum(max(0, 1 - yi * (w * xi + b)))
- ||w|| is the Euclidean norm of the weight vector w
- C is a parameter that controls the trade-off between maximizing the margin and minimizing the classification error
- yi is the class label (either -1 or 1)
- xi is the feature vector
- b is the bias term
- B is the size of the minibatch, and the sum is taken over the data points in the minibatch
At each iteration, a minibatch of data points is randomly selected from the training set, and the weight vector is updated using the following equation:
w = w - eta * (lambda * w - (1/B) * sum(I[yi * (w * xi + b) < 1] * yi * xi))
- eta is the learning rate
- lambda is the regularization parameter
- xi and yi are the feature vectors and class labels in the minibatch
- I[.] is 1 when the condition inside the brackets holds and 0 otherwise, so only the data points that violate the margin contribute to the update
The weight vector is then updated iteratively until convergence.
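To make the update rule concrete, here is a hedged NumPy sketch of minibatch SGD for a linear SVM with hinge loss. It illustrates the formula above rather than TD_SVM's actual implementation; all hyperparameter values are arbitrary, and the bias term is updated alongside w for completeness.

```python
import numpy as np

def train_linear_svm_sgd(X, y, eta=0.01, lam=0.01, batch_size=2, epochs=200, seed=0):
    """Minibatch SGD for a linear SVM. X: (n, d) features, y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        # Randomly select a minibatch of data points from the training set
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        Xb, yb = X[idx], y[idx]
        # Only points that violate the margin (yi * (w * xi + b) < 1) contribute
        violating = yb * (Xb @ w + b) < 1.0
        grad_w = lam * w - (yb[violating][:, None] * Xb[violating]).sum(axis=0) / len(idx)
        grad_b = -yb[violating].sum() / len(idx)
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Tiny usage example with made-up, linearly separable data
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm_sgd(X, y)
print(np.sign(X @ w + b))  # predicted class labels in {-1, +1}
```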
Minibatch SGD has several advantages over batch SGD, including faster convergence and better generalization performance. However, it also has some disadvantages, such as sensitivity to the choice of the learning rate and the minibatch size.