Support Vector Machine (SVM)
• Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms, used for both classification and regression problems. Primarily, however, it is used for classification problems in machine learning.

What is SVM?
• An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
• In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
• A support vector machine constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks such as outlier detection.

How does SVM work?
• Suppose we have a dataset with two classes (green and blue) and two features, x1 and x2. We want a classifier that can classify any pair (x1, x2) of coordinates as either green or blue.
• Since this is a 2-D space, we can separate the two classes with a straight line; but there can be multiple lines that separate these classes. A minimal sketch of this setup follows.
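The following Python sketch sets up exactly this situation, assuming scikit-learn is available; the synthetic blobs stand in for the green and blue classes, and the variable names are illustrative.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated 2-D clusters standing in for the green and blue classes,
# with features x1 and x2.
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

clf = SVC(kernel="linear")   # a straight-line decision boundary in 2-D
clf.fit(X, y)

# A new (x1, x2) pair is assigned to whichever side of the boundary it falls on.
print(clf.predict(np.array([[0.5, 3.0]])))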
Identify the right hyperplane
• Scenario 1: Here, we have three hyperplanes (A, B, and C). Now, identify the right hyperplane to classify the stars and circles.
• Remember this rule of thumb for identifying the right hyperplane: select the hyperplane which segregates the two classes better.
• Scenario 2: Here, we have three hyperplanes (A, B, and C), and all of them segregate the classes well. How can we identify the right hyperplane now?
• Here, maximizing the distance between the nearest data point (of either class) and the hyperplane will help us decide the right hyperplane. This distance is called the margin.
• Hence, the SVM algorithm helps find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of each class to the boundary; these points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
• In Scenario 2, the margin for hyperplane C is high compared to both A and B; hence, C is the right hyperplane. Another compelling reason for selecting the hyperplane with the higher margin is robustness: if we select a hyperplane with a low margin, there is a high chance of misclassification.
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps classify the data points. This best boundary is known as the hyperplane of SVM.
• Support Vectors: The data points or vectors that are closest to the hyperplane, and which affect its position, are termed support vectors. Since these vectors support the hyperplane, they are called support vectors. The sketch below shows how to read these quantities off a fitted model.
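A small sketch, again assuming scikit-learn, that reads the support vectors off a fitted linear SVM and computes the margin width as 2 / ||w||; the large C is used here to approximate a hard margin.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)
clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin

# The closest points to the hyperplane are exposed as support vectors.
print("support vectors:\n", clf.support_vectors_)

# For a linear SVM, the decision boundary is w.x + b = 0 and the
# margin width equals 2 / ||w||.
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))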
Types of SVM
• SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.

Advantages of the Support Vector algorithm
• Support vector machines are very effective even with high-dimensional data.
• Even when the number of features exceeds the number of rows of data, SVM can still perform well.
• When the classes in the data are well separated, SVM works really well.
• SVM can be used for both regression and classification problems.

Disadvantages of Support Vector Machine (SVM)
• 1. Choosing an appropriate kernel function is difficult: Choosing an appropriate kernel function (to handle non-linear data) is not an easy task; it can be tricky and complex. With a high-dimensional kernel, you might generate too many support vectors, which reduces the training speed drastically.
• 2. Extensive memory requirements: The algorithmic complexity and memory requirements of SVM are very high. You need a lot of memory, since all the support vectors must be stored, and their number grows quickly with the size of the training dataset.
• 3. Requires feature scaling: One must scale the features before applying SVM; a sketch of the usual pattern follows this list.
• 4. Long training time: SVM takes a long time to train on large datasets.
• 5. Difficult to interpret: An SVM model is difficult for humans to understand and interpret, unlike decision trees.
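Because of point 3, a common pattern is to pipeline a scaler in front of the classifier, so the scaling statistics are learned from the training data only. A sketch assuming scikit-learn and one of its bundled datasets:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# StandardScaler is fit on the training split inside the pipeline,
# then the same transformation is applied to the test split.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))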
Applications of Support Vector Machine
• Face detection: SVMs classify parts of an image as face or non-face and create a square boundary around the face.
• Text and hypertext categorization: SVMs support text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, assigning each document a score that is compared with a threshold value.
• Classification of images: SVMs provide better search accuracy for image classification than traditional query-based searching techniques.
• Bioinformatics: This includes protein classification and cancer classification. SVMs are used to classify genes, to classify patients on the basis of their genes, and for other biological problems.
• Protein fold and remote homology detection: SVM algorithms are applied to protein remote homology detection.
• Handwriting recognition: SVMs are widely used to recognize handwritten characters.

SVM Kernel
• The kernel plays a vital role in classification and is used to analyse patterns in the given dataset.
• Kernels are very helpful for solving non-linear problems with the help of linear classifiers; this is known as the kernel trick.
• Kernels help us deal with high-dimensional data in a very efficient manner.
• A kernel function is a method that takes data as input and transforms it into the form required for processing.
• Kernel functions are passed as parameters in SVM code. They help determine the shape of the hyperplane and the decision boundary.
• The kernel can be of any type, from linear to polynomial. If the kernel is linear, the decision boundary is a straight line (a flat hyperplane in higher dimensions).
• Kernel functions also give decision boundaries for higher dimensions.

Types of kernel
• Kernel functions come in different kinds: for instance, linear, non-linear, polynomial, radial basis function (RBF), and sigmoid.
• The most commonly preferred kernel function is RBF, because it is localized and has a finite response along the complete x-axis.
• A kernel function returns the scalar product between two points in a suitable feature space.

Linear Kernel
• It is the most basic type of kernel, usually one-dimensional in nature. It proves to be the best choice when there are lots of features.
• The linear kernel is mostly preferred for text-classification problems, as most of these problems can be linearly separated.
• Linear kernel functions are faster than other functions.
• Linear Kernel Formula
• K(xi, xj) = xi · xj, i.e. the sum of the element-wise products of xi and xj.
• Here, xi and xj represent the two data points being compared.
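In code the linear kernel is just a dot product; a minimal NumPy sketch with illustrative vectors:

import numpy as np

def linear_kernel(xi, xj):
    # The sum of the element-wise products, i.e. the dot product xi . xj.
    return np.dot(xi, xj)

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([0.5, -1.0, 2.0])
print(linear_kernel(xi, xj))  # 1*0.5 + 2*(-1) + 3*2 = 4.5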
Polynomial Kernel
• It is a more generalized representation of the linear kernel. It is not as widely preferred as other kernel functions, as it is often less efficient and less accurate.
• Polynomial Kernel Formula
• k(xi, xj) = (xi · xj + 1)^d
• Here, '·' denotes the dot product of the two vectors, and d denotes the degree of the polynomial.
• k(xi, xj) is the kernel value from which the decision boundary separating the given classes is built; a short sketch follows.
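A sketch of this kernel, both computed directly with NumPy and selected by name in scikit-learn (where coef0 plays the role of the "+1" term, degree is d, and the dot product is additionally scaled by gamma):

import numpy as np
from sklearn.svm import SVC

def polynomial_kernel(xi, xj, d=3):
    # (xi . xj + 1)^d, matching the formula above.
    return (np.dot(xi, xj) + 1) ** d

# The equivalent scikit-learn classifier; its poly kernel is
# (gamma * xi.xj + coef0)^degree.
clf = SVC(kernel="poly", degree=3, coef0=1.0, gamma=1.0)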
Gaussian Radial Basis Function (RBF) Kernel
• It is one of the most preferred and widely used kernel functions in SVM.
• It is usually chosen for non-linear data, and it helps make a proper separation when there is no prior knowledge of the data.
• Gaussian Radial Basis Formula
• k(xi, xj) = exp(-gamma * ||xi - xj||^2)
• Gamma must be positive; in practice it is typically set between 0 and 1.
• You have to provide the value of gamma in the code yourself. A common starting value is 0.1, but it should be tuned (for example, by cross-validation), as the sketch below suggests.
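A sketch showing how gamma is passed and how it changes the model's flexibility, assuming scikit-learn and a synthetic non-linear dataset:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Smaller gamma -> smoother, wider-reaching decision boundary;
# larger gamma -> a tighter fit around individual points (risk of overfitting).
for gamma in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma}: training accuracy = {clf.score(X, y):.2f}")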
Sigmoid Kernel
• It is mostly preferred for neural networks. This kernel function is similar to a two-layer perceptron model of a neural network, where it works as an activation function for neurons.
• It can be written as:
• Sigmoid Kernel Function
• k(xi, xj) = tanh(α (xi · xj) + c), where α is a scale parameter and c is an offset.
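In scikit-learn, the sigmoid kernel is tanh(gamma * xi · xj + coef0), so gamma plays the role of α and coef0 the role of c; a minimal sketch:

from sklearn.svm import SVC

# gamma corresponds to alpha and coef0 to c in the formula above.
clf = SVC(kernel="sigmoid", gamma=0.5, coef0=1.0)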
Gaussian Kernel
• It is a commonly used kernel, chosen when there is no prior knowledge of a given dataset.
• Gaussian Kernel Formula (the standard form; the RBF formula above is the same kernel with gamma = 1 / (2σ^2)):
• k(xi, xj) = exp(-||xi - xj||^2 / (2σ^2))

Choosing a Kernel
• The linear kernel is mostly preferred for text-classification problems, as it performs well on large datasets.
• Gaussian kernels tend to give good results when no additional information about the data is available.
• The RBF kernel is a kind of Gaussian kernel; it implicitly projects the data into a high-dimensional space and then searches for a linear separation there.
• Polynomial kernels give good results for problems where all the training data is normalized. A quick cross-validated comparison of these kernels is sketched below.
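To choose among these kernels in practice, a cross-validated comparison is often the simplest guide. A sketch assuming scikit-learn and one of its bundled datasets:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Compare the four common kernels with 5-fold cross-validation,
# scaling features first, as SVMs require.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{kernel}: mean accuracy = {scores.mean():.3f}")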