Business Data Mining

Week 6 - LAQ's

Explain in detail the types of support vector machines.


A support vector machine (SVM) is a machine learning algorithm that uses supervised
learning models to solve complex classification, regression, and outlier detection
problems by performing optimal data transformations that determine boundaries
between data points based on predefined classes, labels, or outputs. SVMs are widely
adopted across disciplines such as healthcare, natural language processing, signal
processing, and speech and image recognition.
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms,
used for both classification and regression problems. Primarily, however, it is used for
classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes, so that a new data point can easily be placed in the correct
category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Consider the diagram below, in which two different categories are classified using
a decision boundary or hyperplane:

Example: SVM can be understood with the example used for the KNN classifier.
Suppose we see a strange cat that also has some features of dogs, and we want a model that
can accurately identify whether it is a cat or a dog. Such a model can be created using the
SVM algorithm. We first train the model with many images of cats and dogs so that it
learns the different features of cats and dogs, and then we test it on this strange creature.
The support vector machine creates a decision boundary between the two classes (cat and dog)
and chooses the extreme cases (support vectors) of each; on the basis of these support vectors,
it classifies the creature as a cat. Consider the diagram below:

The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes using a single straight line, it is termed linearly separable
data, and the classifier used is called a Linear SVM classifier.

o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
cannot be classified using a straight line, it is termed non-linear data, and the classifier
used is called a Non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:


Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-
dimensional space, but we need to find the best decision boundary that helps to classify the
data points. This best boundary is known as the hyperplane of SVM.
The dimension of the hyperplane depends on the number of features in the dataset: if there
are 2 features (as shown in the image), the hyperplane is a straight line, and if there are
3 features, the hyperplane is a 2-dimensional plane.
We always create the hyperplane with the maximum margin, i.e. the maximum distance between
the hyperplane and the nearest data points of each class.

Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position
are termed support vectors. Since these vectors support the hyperplane, they are called
support vectors.

How does SVM work?


Linear SVM:
The working of the SVM algorithm can be understood with an example. Suppose we have
a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier
that can classify a pair (x1, x2) of coordinates as either green or blue.
Consider the image below:

Since it is a 2-D space, we can easily separate these two classes using just a straight line.
However, there can be multiple lines that separate these classes. Consider the image below:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary
or region is called a hyperplane. The SVM algorithm finds the points of each class that are
closest to the line. These points are called support vectors. The distance between the support
vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.

Non-Linear SVM:
If data is linearly arranged, we can separate it using a straight line, but non-linear data
cannot be separated by a single straight line. Consider the image below:

To separate these data points, we need to add one more dimension. For linear data we used
two dimensions, x and y, so for non-linear data we add a third dimension z, calculated as:
z = x² + y²
By adding the third dimension, the sample space becomes as shown in the image below.
SVM now divides the dataset into classes in the following way. Consider the image below:

Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-axis.
If we convert it back to 2-D space with z = 1, it becomes a circle:

Hence, in the case of non-linear data, we get a circular boundary of radius 1.
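A minimal sketch of this idea in Python (the ring-shaped sample data and the explicit z = x² + y² feature are illustrative assumptions, not part of the original example):

import numpy as np
from sklearn.svm import SVC

# Illustrative non-linear data: an inner cluster (class 0) surrounded by a ring (class 1)
rng = np.random.default_rng(0)
r = np.concatenate([rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)])
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Add the third dimension z = x^2 + y^2; in this lifted space the two classes
# can be separated by a flat plane (z = constant)
z = X[:, 0] ** 2 + X[:, 1] ** 2
X_lifted = np.column_stack([X, z])

clf = SVC(kernel="linear").fit(X_lifted, y)
print(clf.score(X_lifted, y))   # close to 1.0 on this toy data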

Technically, the primary objective of the SVM algorithm is to identify a hyperplane that
cleanly separates the data points of different classes. The hyperplane is positioned in
such a manner that the largest possible margin separates the classes under consideration.

The support vector representation is shown in the figure below:

SVMs Optimize Margin Between Support Vectors or Classes

As seen in the figure above, the margin refers to the maximum width of the slab that runs
parallel to the hyperplane without containing any support vectors. Such hyperplanes are easier
to define for linearly separable problems; for real-life problems or scenarios, however, the SVM
algorithm maximizes the margin between the support vectors while tolerating incorrect
classifications for small subsets of data points.

SVMs are inherently designed for binary classification problems. However, with the rise of
computationally intensive multiclass problems, several binary classifiers can be constructed and
combined to form an SVM that implements such multiclass classification through
binary means, as sketched below.
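As a rough illustration (the three-class Iris data here is only an assumed example), scikit-learn's one-vs-rest wrapper builds one binary SVM per class and combines their scores into a multiclass prediction:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # three classes of iris flowers

# One binary SVM is trained per class (that class vs. the rest)
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)
print(ovr.predict(X[:5]))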

In the mathematical context, an SVM refers to a set of ML algorithms that use kernel methods
to transform data features by employing kernel functions. Kernel functions map complex datasets
into higher dimensions in a manner that makes data point separation easier. The function
simplifies the data boundaries for non-linear problems by adding higher dimensions to map
complex data points.

When introducing these additional dimensions, the data is never explicitly transformed, as that
would be computationally expensive. Instead, the inner products in the higher-dimensional space
are computed directly. This technique is usually referred to as the kernel trick, through which
the effect of a higher-dimensional transformation is achieved efficiently and inexpensively.
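A small numerical sketch of the kernel trick (the degree-2 polynomial kernel and the two sample points are arbitrary choices for illustration): the kernel value computed in the original 2-D space equals the inner product of the explicitly lifted 3-D feature vectors, so the lifting never has to be carried out.

import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

def phi(x):
    # Explicit degree-2 feature map: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

explicit = phi(a) @ phi(b)    # inner product in the lifted 3-D space
kernel = (a @ b) ** 2         # polynomial kernel K(a, b) = (a . b)^2

print(explicit, kernel)       # both equal 121.0: same result, no explicit mapping needed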

The idea behind the SVM algorithm was first proposed in 1963 by Vladimir N. Vapnik and
Alexey Ya. Chervonenkis. Since then, SVMs have gained considerable popularity, as they have
continued to find wide-scale applications across several areas, including the protein sorting
process, text categorization, facial recognition, autonomous cars, robotic systems, and so on.

Python Implementation of Support Vector Machine


Now we will implement the SVM algorithm using Python. Here we will use the same dataset,
user_data, that we used in the Logistic Regression and KNN classification chapters.

o Data Pre-processing step


Up to the data pre-processing step, the code remains the same as in the earlier chapters. Below is the code:
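(A sketch of the usual pre-processing code; the file name user_data.csv and its column layout, with Age and EstimatedSalary as features and Purchased as the label, are assumptions carried over from the earlier chapters.)

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Importing the dataset (file name and column positions are assumed)
dataset = pd.read_csv("user_data.csv")
x = dataset.iloc[:, [2, 3]].values   # Age, EstimatedSalary
y = dataset.iloc[:, 4].values        # Purchased (0/1)

# Splitting the dataset into the training set and test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# Feature scaling
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)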
After executing the above code, the data is pre-processed. The code gives the dataset as:

The scaled output for the test set will be:


Fitting the SVM classifier to the training set:
Now the training set will be fitted to the SVM classifier. To create the SVM classifier, we
import the SVC class from the sklearn.svm library. Below is the code for it:
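(A sketch of the fitting step, continuing from the pre-processing variables above.)

from sklearn.svm import SVC

# kernel='linear' since the data is assumed to be linearly separable here
classifier = SVC(kernel="linear", random_state=0)
classifier.fit(x_train, y_train)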

In the above code, we have used kernel='linear', since here we are creating an SVM for linearly
separable data; for non-linear data we could change the kernel. We then fit the classifier
to the training dataset (x_train, y_train).
Output:

The model performance can be altered by changing the value of C (the regularization parameter),
gamma, and the kernel.

o Predicting the test set result:


Now, we will predict the output for the test set. For this, we create a new vector y_pred.
Below is the code for it:
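(A minimal sketch of the prediction step, using the classifier fitted above.)

# Predicting the test set results
y_pred = classifier.predict(x_test)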
After getting the y_pred vector, we can compare y_pred with y_test to check the difference
between the actual and predicted values.
Output: Below is the output for the prediction of the test set:

o Creating the confusion matrix:


Now we will check the performance of the SVM classifier: how many incorrect
predictions it makes compared with the Logistic Regression classifier. To create the
confusion matrix, we need to import the confusion_matrix function from the sklearn.metrics
module. After importing the function, we call it and store the result in a new variable cm.
The function takes two main parameters, y_true (the actual values) and y_pred (the values
predicted by the classifier). Below is the code for it:
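(A sketch of the confusion-matrix step, using the y_test and y_pred vectors from above.)

from sklearn.metrics import confusion_matrix

# Rows are the actual classes (y_test), columns are the predicted classes (y_pred)
cm = confusion_matrix(y_test, y_pred)
print(cm)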

Output:

As we can see in the above output image, there are 66 + 24 = 90 correct predictions and 8 + 2 = 10
incorrect predictions. Therefore we can say that our SVM model improved compared with the
Logistic Regression model.
o Visualizing the training set result:
Now we will visualize the training set result; below is the code for it:
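(A sketch of the visualization code; the axis labels assume the Age and EstimatedSalary features used above, and the red/green colour scheme matches the description of the output.)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_regions(x_set, y_set, title):
    # Shade the regions predicted by the classifier and overlay the actual points
    x1, x2 = np.meshgrid(
        np.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
        np.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01))
    grid_pred = classifier.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape)
    plt.contourf(x1, x2, grid_pred, alpha=0.75, cmap=ListedColormap(("red", "green")))
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                    color=ListedColormap(("red", "green"))(i), label=j)
    plt.title(title)
    plt.xlabel("Age (scaled)")
    plt.ylabel("Estimated Salary (scaled)")
    plt.legend()
    plt.show()

plot_decision_regions(x_train, y_train, "SVM classifier (Training set)")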
Output:
By executing the above code, we will get the output as:

As we can see, the above output appears similar to the Logistic Regression output. In the
output, we get a straight line as the hyperplane because we used a linear kernel in the
classifier. As discussed above, for 2-D space the hyperplane in SVM is a straight line.
o Visualizing the test set result:
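A sketch of the corresponding code, which simply reuses the plotting helper defined for the training set with the test set variables:

plot_decision_regions(x_test, y_test, "SVM classifier (Test set)")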

Output:
By executing the above code, we will get the output as:
As we can see in the above output image, the SVM classifier has divided the users into two
regions (Purchased or Not Purchased). Users who purchased the SUV are in the red region with
red scatter points, and users who did not purchase the SUV are in the green region with
green scatter points. The hyperplane has thus divided the users into the Purchased and Not
Purchased classes.

Support Vector Machine Terminology


1. Hyperplane: The hyperplane is the decision boundary used to separate the data points
of different classes in a feature space. In the case of linear classification, it is a
linear equation, i.e. wx + b = 0.
2. Support Vectors: Support vectors are the data points closest to the hyperplane; they
play a critical role in deciding the hyperplane and the margin.
3. Margin: The margin is the distance between the support vectors and the hyperplane. The main
objective of the support vector machine algorithm is to maximize the margin; a wider
margin indicates better classification performance.
4. Kernel: The kernel is the mathematical function used in SVM to map the original
input data points into high-dimensional feature spaces, so that the hyperplane can be
found easily even when the data points are not linearly separable in the original input
space. Some common kernel functions are linear, polynomial, radial basis
function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a
hyperplane that properly separates the data points of different categories without any
misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits
a soft-margin technique. The soft-margin SVM formulation introduces a slack variable for
each data point, which relaxes the strict margin requirement and permits certain
misclassifications or violations. It finds a compromise between increasing the margin
and reducing violations.
7. C: The regularization parameter C in SVM balances margin maximization against
misclassification penalties. It decides the penalty imposed for violating the margin or
misclassifying data items. A greater value of C imposes a stricter penalty, which results in
a smaller margin and perhaps fewer misclassifications.
8. Hinge Loss: Hinge loss is a typical loss function in SVMs. It penalizes incorrect
classifications and margin violations, and is frequently combined with the regularization
term to form the SVM objective function.
9. Dual Problem: The SVM can be solved via the dual of the optimization problem, which
requires finding the Lagrange multipliers associated with the support vectors. The dual
formulation enables the use of kernel tricks and more efficient computation.

Mathematical intuition of Support Vector Machine


Consider a binary classification problem with two classes, labeled as +1 and -1. We have a
training dataset consisting of input feature vectors X and their corresponding class labels Y.
The equation for the linear hyperplane can be written as:
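w^T x + b = 0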

The vector w represents the normal vector to the hyperplane, i.e. the direction perpendicular
to the hyperplane. The parameter b represents the offset, or distance, of the
hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
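d_i = \frac{w^T x_i + b}{\|w\|}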

where ||w|| represents the Euclidean norm of the weight (normal) vector w.
For a linear SVM classifier:
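\hat{y} = \begin{cases} 1 & \text{if } w^T x + b \ge 0 \\ 0 & \text{if } w^T x + b < 0 \end{cases}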

Optimization:

 For Hard margin linear SVM classifier:
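\min_{w,\,b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad t_i\,(w^T x_i + b) \ge 1 \ \text{for all } i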


The target variable or label for the i-th training instance is denoted by the symbol t_i in this
statement, with t_i = -1 for negative instances (when y_i = 0) and t_i = 1 for positive instances
(when y_i = 1). We require the decision boundary to satisfy the constraint:
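t_i\,(w^T x_i + b) \ge 1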

 For Soft margin linear SVM classifier:
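\min_{w,\,b,\,\zeta} \ \frac{1}{2}\|w\|^2 + C \sum_{i} \zeta_i \quad \text{subject to} \quad t_i\,(w^T x_i + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0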

 Dual Problem: The SVM can also be solved via the dual of the optimization problem, which
requires finding the Lagrange multipliers associated with the support vectors. The
optimal Lagrange multipliers α_i maximize the following dual objective function:
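\max_{\alpha} \ \sum_{i} \alpha_i \;-\; \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j t_i t_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i t_i = 0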

where,
 αi is the Lagrange multiplier associated with the ith training sample.
 K(xi, xj) is the kernel function that computes the similarity between two samples
xi and xj. It allows SVM to handle nonlinear classification problems by
implicitly mapping the samples into a higher-dimensional feature space.
 The term ∑αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers have been found,
the SVM decision boundary can be described in terms of these optimal Lagrange multipliers
and the support vectors. The training samples with α_i > 0 are the support vectors, and the
decision boundary is given by:
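f(x) = \operatorname{sign}\!\left( \sum_{i \in SV} \alpha_i t_i K(x_i, x) + b \right)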

Popular kernel functions in SVM


The SVM kernel is a function that takes a low-dimensional input space and transforms it into a
higher-dimensional space, i.e. it converts non-separable problems into separable problems. It is
mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely
complex data transformations and then finds the procedure to separate the data based on the
labels or outputs defined.
# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the dataset
cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

# Build the model
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
# Train the model
svm.fit(X, y)

# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    cmap=plt.cm.Spectral,
    alpha=0.8,
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

# Scatter plot of the two input features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output:
Breast Cancer Classifications with SVM RBF kernel

Advantages of support vector machine:


 Support vector machines work comparatively well when there is a clear margin of separation
between classes.
 They are effective in high-dimensional spaces.
 They are effective in cases where the number of dimensions is larger than the number of
samples.
 Support vector machines are comparatively memory efficient.

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm with several
advantages. Some of the main advantages of SVM include:
 Handling high-dimensional data: SVMs are effective in handling high-dimensional data,
which is common in many applications such as image and text classification.
 Handling small datasets: SVMs can perform well with small datasets, as they only require
a small number of support vectors to define the boundary.
 Modeling non-linear decision boundaries: SVMs can model non-linear decision
boundaries by using the kernel trick, which maps the data into a higher-dimensional space
where the data becomes linearly separable.
 Robustness to noise: SVMs are robust to noise in the data, as the decision boundary is
determined by the support vectors, which are the closest data points to the boundary.

 Generalization: SVMs have good generalization performance, which means that they are
able to classify new, unseen data well.

 Versatility: SVMs can be used for both classification and regression tasks, and they can be
applied to a wide range of applications such as natural language processing, computer
vision, and bioinformatics.

 Sparse solution: SVMs have sparse solutions, which means that they only use a subset of
the training data to make predictions. This makes the algorithm more efficient and less
prone to overfitting.
 Regularization: SVMs can be regularized, which means that the algorithm can be
modified to avoid overfitting.

Disadvantages of support vector machine:


 The support vector machine algorithm is not suitable for large datasets.
 It does not perform very well when the dataset has more noise, i.e. when the target classes
overlap.
 In cases where the number of features for each data point exceeds the number of
training samples, the support vector machine will underperform.
 As the support vector classifier works by placing data points above and below the
classifying hyperplane, there is no probabilistic explanation for the classification.

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm, but it also
has some limitations and disadvantages. Some of the main disadvantages of SVM include:
 Computationally expensive: SVMs can be computationally expensive for large datasets,
as the algorithm requires solving a quadratic optimization problem.
 Choice of kernel: The choice of kernel can greatly affect the performance of an SVM,
and it can be difficult to determine the best kernel for a given dataset.
 Sensitivity to the choice of parameters: SVMs can be sensitive to the choice of
parameters, such as the regularization parameter, and it can be difficult to determine the
optimal parameter values for a given dataset.
 Memory-intensive: SVMs can be memory-intensive, as the algorithm requires storing the
kernel matrix, which can be large for large datasets.
 Limited to two-class problems: SVMs are primarily used for two-class problems,
although multi-class problems can be solved by using one-versus-one or one-versus-all
strategies.
 Lack of probabilistic interpretation: SVMs do not provide a probabilistic interpretation
of the decision boundary, which can be a disadvantage in some applications.

 Not suitable for large datasets with many features: SVMs can be very slow and can
consume a lot of memory when the dataset has many features.

 Not suitable for datasets with missing values: SVMs require complete datasets with no
missing values; they cannot handle missing values.

Applications of support vector machine:


1. Face detection – SVMs are used to detect faces according to the trained classifier and model.
2. Text and hypertext categorization – The classification technique is used to find
important (required) information for organizing text.
3. Classification of images – SVMs are also used to group images by comparing pieces of
information and acting accordingly.
4. Bioinformatics – SVMs are also used in medical science, for example in laboratory work, DNA
analysis, research, etc.
5. Handwriting recognition – SVMs are used for handwriting recognition.
6. Protein fold and remote homology detection – SVMs are used to classify proteins into
functional and structural classes given their amino acid sequences. This is one of the
well-known problems in bioinformatics.
7. Generalized predictive control (GPC) – SVMs are also used for generalized predictive
control, which relies on predictive control using a multilayer feed-forward network as the
plant's linear model.
8. Support Vector Machine (SVM) is a type of supervised machine learning algorithm that
can be used for classification and regression tasks. The idea behind SVM is to find the
best boundary (or hyperplane) that separates the different classes of data. This boundary
is chosen in such a way that it maximizes the margin, which is the distance between the
boundary and the closest data points from each class, also known as support vectors.
9. In the case of classification, the goal is to find a boundary that separates the different
classes of data as well as possible. The input data is plotted in a high-dimensional space
(with as many dimensions as the number of features), and the SVM algorithm finds the
best boundary that separates the classes.
10. In the case of regression, the goal is to find the best hyperplane that fits the data. Similar
to classification, the data is plotted in a high-dimensional space, but instead of trying to
separate the classes, the algorithm is trying to fit the data with the best hyperplane.
11. One of the main advantages of SVM is that it works well in high-dimensional spaces and
is relatively memory efficient. It is also able to handle non-linearly separable data by
transforming it into a higher-dimensional space where it becomes linearly separable;
this is done using the kernel trick.
12. SVMs are not limited to linear boundaries; they can also handle non-linear boundaries
by using kernel functions that map the input data into a higher-dimensional
space where it becomes linearly separable. The most commonly used kernel functions are
linear, polynomial, and radial basis function (RBF).
13. SVMs are popular in various applications such as image classification, natural language
processing, bioinformatics, and more.
14. Facial expression classification – Support vector machines (SVMs) are a binary
classification technique. A facial expression classification model determines the precise
facial expression by modeling differences between two facial images. Validation
techniques include the leave-one-out method and the K-fold test method.
15. Speech recognition – The transcription of speech into text is called speech recognition.
Mel Frequency Cepstral Coefficient (MFCC) based features are used to train support
vector machines for recognizing speech. Speech recognition is a challenging classification
problem that is tackled using a variety of mathematical techniques, including support
vector machines, pattern recognition techniques, etc.
