TD_OneClassSVMPredict (TD_OneClassSVM Predict Function)

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

TD_OneClassSVMPredict predicts target class labels (classification) for test data using a one-class SVM model trained by TD_OneClassSVM. The output values are 0 and 1: a value of 1 marks a normal observation, and a value of 0 marks an outlier.

Assumptions

As with TD_OneClassSVM, input features must be standardized (for example, with TD_ScaleFit and TD_ScaleTransform) before they are passed to the function. The function accepts only numeric features, so categorical features must be converted to numeric values before prediction. The function skips rows that contain missing (null) values. To evaluate the prediction results, you can use TD_ClassificationEvaluator or TD_ROC as a postprocessing step.
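A common practice, assumed here, is to scale test rows with statistics learned from the training data rather than recomputing them on the test data. The following Python sketch illustrates that idea, together with the null-skipping behavior described above, outside the database. It assumes z-score standardization as one possible scaling method; the helpers `fit_scaler` and `transform` are hypothetical and do not reproduce the TD_ScaleFit or TD_ScaleTransform interfaces.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def fit_scaler(train_rows: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Learn per-feature mean and population standard deviation from complete training rows."""
    complete = [r for r in train_rows if all(v is not None for v in r)]
    columns = list(zip(*complete))  # transpose rows into feature columns
    means = [mean(col) for col in columns]
    # Use 1.0 for constant features so the division below never hits zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows: Sequence[Row], means: Sequence[float], stds: Sequence[float]) -> List[List[float]]:
    """Z-score rows with the training statistics, skipping any row that contains a null."""
    scaled = []
    for row in rows:
        if any(v is None for v in row):
            continue  # mirrors the documented behavior: rows with nulls are skipped
        scaled.append([(v - m) / s for v, m, s in zip(row, means, stds)])
    return scaled


# Hypothetical data: the third training row has a null and is ignored when fitting.
means, stds = fit_scaler([[1.0, 10.0], [3.0, 30.0], [None, 20.0]])
print(means, stds)                                          # [2.0, 20.0] [1.0, 10.0]
print(transform([[2.0, 40.0], [1.0, None]], means, stds))   # [[0.0, 2.0]]
```

Fitting once and reusing the same statistics keeps training and test features on the same scale, which is what lets a boundary learned during training be applied to new rows.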

OneClassSVM is a support vector machine (SVM) algorithm used for outlier detection and novelty detection. Unlike traditional SVMs, which are supervised methods for classification and regression, OneClassSVM is an unsupervised learning algorithm: it learns from unlabeled data to identify anomalies or outliers in a given dataset.

OneClassSVM works in two phases. During training, the algorithm learns the normal behavior of the dataset. During testing, it evaluates new data points to determine whether they fit that normal behavior or are outliers.

OneClassSVM is commonly used in various fields such as fraud detection, intrusion detection, and fault detection in industrial systems, where detecting anomalies is critical for maintaining the safety and reliability of the systems.

During training, the algorithm learns the normal behavior of the dataset by constructing a hyperplane that separates the majority of the data points from the rest. In the standard one-class SVM formulation, the data is mapped into a feature space (typically through a kernel function), and the hyperplane is placed so that it separates most of the training points from the origin with the largest possible margin. Training points that fall on the wrong side of this hyperplane are treated as outliers or anomalies.
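For reference, the textbook one-class SVM of Schölkopf et al. can be written as the optimization problem below, where φ is the kernel feature map, n is the number of training rows, ξᵢ are slack variables, and ν ∈ (0, 1] is a user parameter. This is the standard formulation, shown as an assumption; the document does not state which variant or solver TD_OneClassSVM uses.

```latex
\min_{w,\,\xi,\,\rho}\quad \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\qquad \text{subject to}\quad \langle w, \varphi(x_i)\rangle \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \langle w, \varphi(x)\rangle - \rho,
\qquad
\text{label}(x) =
\begin{cases}
1 & \text{if } f(x) \ge 0 \ (\text{normal})\\
0 & \text{if } f(x) < 0 \ (\text{outlier})
\end{cases}
```

The hyperplane is ⟨w, φ(x)⟩ = ρ, whose distance from the origin is ρ/‖w‖; the objective trades off a large margin against the slack of points on the origin side, and ν is an upper bound on the fraction of training points that end up there.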

During testing, the algorithm predicts whether a new observation belongs to the normal behavior of the dataset. To make this prediction, it computes the signed distance (decision value) of the observation from the hyperplane. If the observation lies on the same side as the majority of the training data, it is considered normal (output 1). If it lies on the opposite side, it is considered an outlier (output 0).
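The prediction rule can be sketched in Python for a kernelized one-class SVM. This minimal illustration assumes an RBF kernel and a model described by support vectors, coefficients `alphas`, an offset `rho`, and a kernel width `gamma`; these names and the toy values are hypothetical and do not describe the model table that TD_OneClassSVM produces. The sign of the decision value is mapped to the function's 0/1 output convention.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_value(x, support_vectors, alphas, rho, gamma) -> float:
    """f(x) = sum_i alpha_i * K(sv_i, x) - rho.

    f(x) divided by ||w|| is the signed distance from the hyperplane in feature
    space, so the sign of f(x) tells which side of the hyperplane x lies on.
    """
    return sum(a * rbf_kernel(sv, x, gamma) for sv, a in zip(support_vectors, alphas)) - rho


def predict_label(x, support_vectors, alphas, rho, gamma) -> int:
    """Return 1 (normal, same side as most training data) or 0 (outlier)."""
    return 1 if decision_value(x, support_vectors, alphas, rho, gamma) >= 0 else 0


# Hypothetical toy model: one support vector at the origin of the scaled feature space.
svs, alphas, rho, gamma = [[0.0, 0.0]], [1.0], 0.5, 1.0
print(predict_label([0.1, -0.2], svs, alphas, rho, gamma))  # 1
print(predict_label([2.0, 2.0], svs, alphas, rho, gamma))   # 0
```

The point close to the support vector gets a kernel value near 1 and a positive decision value, while the distant point's kernel value is near 0, so its decision value falls below zero and it is labeled an outlier.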

Here's how it works:
  1. Data Preprocessing: The first step is to prepare the dataset for training. This typically involves handling rows with missing values, converting categorical features to numeric values, scaling the features, and removing noise or irrelevant information.
  2. Training: The next step is to train the OneClassSVM model on the preprocessed data. During this step, the model learns the normal behavior of the dataset by finding a hyperplane that separates the majority of the data points from the rest, as described above.
  3. Prediction: Once trained, the model can predict whether new observations belong to the normal behavior of the dataset. Each new observation is evaluated by its position relative to the hyperplane: observations on the same side as the majority of the training data are labeled normal, and observations on the opposite side are labeled outliers.
  4. Parameter Tuning: One of the challenges of using OneClassSVM is choosing good values for its hyperparameters. Hyperparameters control the behavior of the algorithm, such as how tightly the boundary fits the training data and which kernel is used. To select good values, you can compare candidate settings on held-out data, for example with cross-validation.
  5. Model Evaluation: Once the model has been trained and tuned, it is important to evaluate its performance. This typically involves using metrics such as precision, recall, and F1-score to assess how well the model can identify outliers and normal data points.
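When some labeled validation data is available, the evaluation step can be illustrated with a small Python sketch that computes precision, recall, and F1-score for one class of the 0/1 output, defaulting to the outlier class (label 0). In Vantage, TD_ClassificationEvaluator serves this purpose; the `class_metrics` helper and the sample labels below are hypothetical.

```python
from typing import Sequence, Tuple


def class_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 0) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class of 0/1 labels (default: 0 = outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # outliers caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # normal rows flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # outliers missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical validation labels: true labels vs. predicted labels.
p, r, f = class_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Treating outliers as the positive class is a design choice: in anomaly detection the rare class is usually the one of interest, and metrics computed on the majority (normal) class can look good even when most outliers are missed.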

One benefit of OneClassSVM is that it can handle high-dimensional datasets. It can also work with a relatively small number of observations, provided they are representative of the normal behavior of the dataset. Its main application is anomaly detection. For example, in fraud detection, OneClassSVM can flag credit card transactions that deviate from the normal behavior of the dataset. In intrusion detection, it can identify network traffic that is anomalous or potentially malicious. In industrial systems, it can identify faulty or abnormal machinery behavior, which can help prevent equipment failure and downtime.