Business Data Mining Week 6
A support vector machine (SVM) is a machine learning algorithm that uses supervised
learning models to solve complex classification, regression, and outlier detection
problems by performing optimal data transformations that determine boundaries
between data points based on predefined classes, labels, or outputs. SVMs are widely
adopted across disciplines such as healthcare, natural language processing, signal
processing applications, and speech & image recognition fields.
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms, used for both classification and regression problems. However, it is primarily used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
Consider the below diagram in which there are two different categories that are classified using
a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of a dog. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train the model with many images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of these support vectors, it will classify the creature as a cat. Consider the below diagram:
The SVM algorithm can be used for face detection, image classification, text categorization, and so on.
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Since it is a 2-D space, we can easily separate these two classes by just using a straight line. But there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the points of both classes that are closest to the line. These points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear
data, we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions, x and y, so for non-linear data, we will add a third dimension, z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as shown in the below image:
So now, SVM will divide the datasets into classes in the following way. Consider the below image:
Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes a circle of radius 1:
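A minimal sketch of this lifting in NumPy; the circular toy data below is an assumption made for illustration, not part of the original example.

import numpy as np

# Toy 2-D data: class 0 clusters near the origin, class 1 lies on a ring around it,
# so no single straight line in the (x, y) plane can separate the two classes.
rng = np.random.default_rng(0)
inner = rng.normal(scale=0.4, size=(50, 2))                          # class 0
theta = rng.uniform(0, 2 * np.pi, size=50)
outer = np.column_stack([3 * np.cos(theta), 3 * np.sin(theta)])      # class 1

# Lift each point into 3-D by adding z = x² + y²
def lift(points):
    z = points[:, 0] ** 2 + points[:, 1] ** 2
    return np.column_stack([points, z])

inner3d, outer3d = lift(inner), lift(outer)

# In the lifted space the ring sits at z = 9 while the inner cluster stays near z = 0,
# so a flat plane such as z = 4 separates the two classes for this data.
print("inner z max:", inner3d[:, 2].max(), "outer z min:", outer3d[:, 2].min())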
Technically, the primary objective of the SVM algorithm is to identify a hyperplane that cleanly separates the data points of different classes. The hyperplane is positioned in such a manner that the largest possible margin separates the classes under consideration.
As seen in the above figure, the margin refers to the maximum width of the slab that runs parallel to the hyperplane without containing any data points inside it. Such hyperplanes are easier to define for linearly separable problems; for real-life problems or scenarios, however, the SVM algorithm still tries to maximize the margin between the support vectors while tolerating incorrect classifications for a small fraction of the data points (a soft margin).
SVMs are inherently designed for binary classification problems. However, with the rise in computationally intensive multiclass problems, several binary classifiers can be constructed and combined (for example, one-vs-one or one-vs-rest) to formulate SVMs that implement such multiclass classifications through binary means.
In the mathematical context, an SVM refers to a set of ML algorithms that use kernel methods
to transform data features by employing kernel functions. Kernel functions rely on the process
of mapping complex datasets to higher dimensions in a manner that makes data point separation
easier. The function simplifies the data boundaries for non-linear problems by adding higher
dimensions to map complex data points.
When introducing additional dimensions, the data is not explicitly transformed, as doing so can be a computationally taxing process. Instead, this technique, usually referred to as the kernel trick, achieves the effect of a transformation into higher dimensions efficiently and inexpensively.
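As a hedged illustration, scikit-learn's SVC applies this kernel trick internally; the toy dataset below (make_circles) and the chosen parameters are assumptions made for illustration, not part of the original example.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not separable by a straight line in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The RBF kernel implicitly maps the data to a higher-dimensional space,
# so no explicit z = x² + y² feature has to be constructed by hand
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))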
The idea behind the SVM algorithm was first captured in 1963 by Vladimir N. Vapnik and
Alexey Ya. Chervonenkis. Since then, SVMs have gained enough popularity as they have
continued to have wide-scale implications across several areas, including the protein sorting
process, text categorization, facial recognition, autonomous cars, robotic systems, and so on.
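The code block that the next paragraph refers to is not reproduced in this handout. Below is a minimal sketch of what such a linear-kernel classifier could look like; the file name Social_Network_Ads.csv, the chosen columns (Age, EstimatedSalary, Purchased), and the variable names x_train and y_train are assumptions based on the purchased/not-purchased example discussed later, not the original code.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed dataset: user age, estimated salary, and whether an SUV was purchased
dataset = pd.read_csv("Social_Network_Ads.csv")
x = dataset[["Age", "EstimatedSalary"]].values
y = dataset["Purchased"].values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling, since SVM is sensitive to feature magnitudes
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Linear kernel, since the data is assumed to be (roughly) linearly separable
classifier = SVC(kernel="linear", random_state=0)
classifier.fit(x_train, y_train)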
In the above code, we have used kernel='linear', as here we are creating an SVM for linearly separable data. However, we can change it for non-linear data. We then fitted the classifier to the training dataset (x_train, y_train).
Output:
The model performance can be altered by changing the value of C (the regularization factor), gamma, and the kernel.
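A hedged sketch of how one might tune these hyperparameters with a grid search; the candidate values below are illustrative choices, not prescribed by the text.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values for the regularization factor C, the RBF width gamma, and the kernel
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.01, 0.1, 1],
    "kernel": ["linear", "rbf"],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(x_train, y_train)  # x_train, y_train from the sketch above
print("best parameters:", search.best_params_)
print("best cross-validation score:", search.best_score_)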
algorithm#"
After getting the y_pred vector, we can compare the result of y_pred and y_test to check the
difference between the actual value and predicted value.
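One way to make this comparison concrete (assuming the classifier and test split from the sketch above) is a confusion matrix together with the accuracy score:

from sklearn.metrics import accuracy_score, confusion_matrix

# Predict the test set and compare against the true labels
y_pred = classifier.predict(x_test)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))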
Output: Below is the output for the prediction of the test set:
Output:
As we can see in the above output image, there are 66 + 24 = 90 correct predictions and 8 + 2 = 10 incorrect predictions. Therefore, we can say that our SVM model improved compared to the Logistic Regression model.
o Visualizing the training set result:
Now we will visualize the training set result, below is the code for it:
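The original plotting code is not reproduced in this handout; below is a sketch of the usual decision-region plot, assuming the scaled features and linear-kernel classifier from the earlier sketch.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Plot the decision regions of the classifier over the (scaled) training set
x_set, y_set = x_train, y_train
x1, x2 = np.meshgrid(
    np.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
    np.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01),
)
grid = np.c_[x1.ravel(), x2.ravel()]
plt.contourf(x1, x2, classifier.predict(grid).reshape(x1.shape),
             alpha=0.4, cmap=ListedColormap(("red", "green")))
for idx, cls in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set == cls, 0], x_set[y_set == cls, 1],
                color=("red", "green")[idx], label=cls, edgecolors="k")
plt.title("SVM (Training set)")
plt.xlabel("Age (scaled)")
plt.ylabel("Estimated Salary (scaled)")
plt.legend()
plt.show()

Replacing x_train and y_train with x_test and y_test in the snippet above gives the test-set visualization discussed further below.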
Output:
By executing the above code, we will get the output as:
As we can see, the above output appears similar to the Logistic Regression output. In the output, we got a straight line as the hyperplane because we used a linear kernel in the classifier. And as discussed above, for 2-D space the hyperplane in SVM is a straight line.
o Visualizing the test set result:
Output:
By executing the above code, we will get the output as:
As we can see in the above output image, the SVM classifier has divided the users into two regions (Purchased or Not Purchased). Users who purchased the SUV are in the red region with the red scatter points, and users who did not purchase the SUV are in the green region with the green scatter points. The hyperplane has divided the users into the two classes, Purchased and Not Purchased.
The vector w represents the normal vector to the hyperplane, i.e. the direction perpendicular to the hyperplane. The parameter b in the hyperplane equation w·x + b = 0 represents the offset, or the distance of the hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = (w·x_i + b) / ||w||
where ||w|| represents the Euclidean norm of the weight vector w, i.e. the length of the normal vector.
For a Linear SVM classifier, the predicted label is:
ŷ = 1 if w·x + b ≥ 0, and ŷ = 0 otherwise.
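As a small worked example (the numbers below are made up for illustration):

import numpy as np

# Hypothetical hyperplane parameters
w = np.array([2.0, 1.0])   # normal vector
b = -4.0                   # offset

x_i = np.array([3.0, 1.0])  # a data point to classify

score = np.dot(w, x_i) + b                 # 2*3 + 1*1 - 4 = 3
distance = abs(score) / np.linalg.norm(w)  # 3 / sqrt(5) ≈ 1.34
label = 1 if score >= 0 else 0             # linear SVM decision rule
print(score, distance, label)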
Optimization:
Dual Problem: The SVM optimization problem can also be solved through its dual, which requires finding the Lagrange multipliers associated with the support vectors. The optimal Lagrange multipliers α_i are those that maximize the following dual objective function:
maximize over α:  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
subject to α_i ≥ 0 (and α_i ≤ C in the soft-margin case) and Σ_i α_i y_i = 0,
where:
o α_i is the Lagrange multiplier associated with the ith training sample.
o K(x_i, x_j) is the kernel function that computes the similarity between two samples x_i and x_j. It allows SVM to handle nonlinear classification problems by implicitly mapping the samples into a higher-dimensional feature space.
o The term Σ_i α_i represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers have been found, the SVM decision boundary can be described in terms of these optimal Lagrange multipliers and the support vectors. The training samples with α_i > 0 are the support vectors, and the decision function is given by:
f(x) = sign( Σ_{i in SV} α_i y_i K(x_i, x) + b )
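In scikit-learn, the solved dual is exposed through the fitted model. A small sketch, reusing the linear-kernel classifier from the earlier sketch (an assumption), shows how the support vectors and the α_i · y_i terms can be inspected:

# Support vectors: training samples with alpha_i > 0
print("number of support vectors per class:", classifier.n_support_)
print("support vectors:\n", classifier.support_vectors_)

# dual_coef_ holds alpha_i * y_i for each support vector; intercept_ is b
print("alpha_i * y_i:", classifier.dual_coef_)
print("b:", classifier.intercept_)

# For a linear kernel, w can be recovered as the weighted sum of the support vectors
w = classifier.dual_coef_ @ classifier.support_vectors_
print("w:", w)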
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# X, y are assumed here to come from the breast cancer dataset used in the next section
X, y = load_breast_cancer(return_X_y=True)

# Scatter plot of the first two features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output:
Breast Cancer Classifications with SVM RBF kernel
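A hedged sketch of such a classifier on scikit-learn's built-in breast cancer dataset; the preprocessing steps and parameter values are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the data and hold out a test set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features; the RBF kernel is sensitive to feature scale
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# RBF-kernel SVM
model = SVC(kernel="rbf", gamma="scale", C=1.0)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))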
Advantages of SVM:
o Generalization: SVMs have good generalization performance, which means that they are able to classify new, unseen data well.
o Versatility: SVMs can be used for both classification and regression tasks, and they can be applied to a wide range of applications such as natural language processing, computer vision, and bioinformatics.
o Sparse solution: SVMs have sparse solutions, which means that they only use a subset of the training data (the support vectors) to make predictions. This makes the algorithm more efficient and less prone to overfitting.
o Regularization: SVMs can be regularized via the C parameter, which means that the algorithm can be tuned to avoid overfitting.
Disadvantages of SVM:
o Not suitable for large datasets with many features: SVMs can be very slow and can consume a lot of memory when the dataset has many features or many samples.
o Not suitable for datasets with missing values: SVMs require complete datasets with no missing values; they cannot handle missing values directly.