Intro to Support Vector Machines

This chapter discusses Support Vector Machines (SVM), a popular classification method developed in the 1990s, which includes the maximal margin classifier and its extensions for broader applications. It explains the concept of hyperplanes and their role in defining optimal separating boundaries in classification tasks. The chapter also highlights the distinctions between the maximal margin classifier, support vector classifier, and support vector machine to avoid confusion in terminology.


9 Support Vector Machines

In this chapter, we discuss the support vector machine (SVM), an approach for classification that was developed in the computer science community in the 1990s and that has grown in popularity since then. SVMs have been shown to perform well in a variety of settings, and are often considered one of the best “out of the box” classifiers.
The support vector machine is a generalization of a simple and intuitive classifier called the maximal margin classifier, which we introduce in Section 9.1. Though it is elegant and simple, we will see that this classifier unfortunately cannot be applied to most data sets, since it requires that the classes be separable by a linear boundary. In Section 9.2, we introduce the support vector classifier, an extension of the maximal margin classifier that can be applied in a broader range of cases. Section 9.3 introduces the support vector machine, which is a further extension of the support vector classifier in order to accommodate non-linear class boundaries. Support vector machines are intended for the binary classification setting in which there are two classes; in Section 9.4 we discuss extensions of support vector machines to the case of more than two classes. In Section 9.5 we discuss the close connections between support vector machines and other statistical methods such as logistic regression.

People often loosely refer to the maximal margin classifier, the support vector classifier, and the support vector machine as “support vector machines”. To avoid confusion, we will carefully distinguish between these three notions in this chapter.


9.1 Maximal Margin Classifier


In this section, we define a hyperplane and introduce the concept of an
optimal separating hyperplane.

9.1.1 What Is a Hyperplane?


In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p − 1.¹ For instance, in two dimensions, a hyperplane is a flat one-dimensional subspace—in other words, a line. In three dimensions, a hyperplane is a flat two-dimensional subspace—that is, a plane. In p > 3 dimensions, it can be hard to visualize a hyperplane, but the notion of a (p − 1)-dimensional flat subspace still applies.
The mathematical definition of a hyperplane is quite simple. In two dimensions, a hyperplane is defined by the equation

β0 + β1 X1 + β2 X2 = 0 (9.1)

for parameters β0, β1, and β2. When we say that (9.1) “defines” the hyperplane, we mean that any X = (X1, X2)T for which (9.1) holds is a point on the hyperplane. Note that (9.1) is simply the equation of a line, since indeed in two dimensions a hyperplane is a line.
Equation 9.1 can be easily extended to the p-dimensional setting:

β0 + β1 X1 + β2 X2 + · · · + βp Xp = 0 (9.2)

defines a p-dimensional hyperplane, again in the sense that if a point X = (X1, X2, . . . , Xp)T in p-dimensional space (i.e. a vector of length p) satisfies (9.2), then X lies on the hyperplane.
Now, suppose that X does not satisfy (9.2); rather,

β0 + β1 X1 + β2 X2 + · · · + βp Xp > 0. (9.3)

Then this tells us that X lies to one side of the hyperplane. On the other
hand, if
β0 + β1 X1 + β2 X2 + · · · + βp Xp < 0, (9.4)

then X lies on the other side of the hyperplane. So we can think of the
hyperplane as dividing p-dimensional space into two halves. One can easily
determine on which side of the hyperplane a point lies by simply calculating
the sign of the left hand side of (9.2). A hyperplane in two-dimensional
space is shown in Figure 9.1.
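To make the sign computation concrete, here is a minimal sketch in Python (the function name and the example coefficients are illustrative, not from the text): it evaluates f(X) = β0 + β1 X1 + · · · + βp Xp for a point X and reports which side of the hyperplane the point falls on.

```python
import numpy as np

def hyperplane_side(beta0, beta, x):
    """Report which side of the hyperplane (9.2) the point x lies on.

    beta0 : intercept (scalar)
    beta  : coefficients beta_1, ..., beta_p (length-p array)
    x     : a point in p-dimensional space (length-p array)
    """
    value = beta0 + np.dot(beta, x)  # left-hand side of (9.2)
    if value > 0:
        return "one side of the hyperplane (f(x) > 0, as in (9.3))"
    elif value < 0:
        return "other side of the hyperplane (f(x) < 0, as in (9.4))"
    return "on the hyperplane (f(x) = 0, satisfies (9.2))"

# Example: the line 1 + 2*X1 + 3*X2 = 0 in two dimensions (p = 2).
print(hyperplane_side(1.0, np.array([2.0, 3.0]), np.array([1.0, 1.0])))   # f = 6 > 0
print(hyperplane_side(1.0, np.array([2.0, 3.0]), np.array([-2.0, 1.0])))  # f = 0
```

Only the sign of f(X) matters for this determination; multiplying β0, β1, . . . , βp by any positive constant describes the same hyperplane and the same two halves.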

¹ The word affine indicates that the subspace need not pass through the origin.
