Kernel Methods in Machine Learning

This document provides an overview of high dimensional representation and kernel methods. It discusses the motivation for moving beyond PCA to kernel-based learning methods. It introduces kernel methods, including how kernels transform between distance and similarity, calculate similarity in transformed feature spaces, and define reproducing kernel Hilbert spaces. The kernel trick is described as a way to efficiently calculate inner products in high dimensional spaces. Common kernel functions like linear, polynomial, sigmoid, and Gaussian kernels are also outlined. Finally, support vector machines are presented as a main application of kernels, allowing nonlinear classification through kernels mapping data to high dimensional feature spaces.


Lecture 6

High dimensional representation


Outline

1. Motivation
2. Kernel method
• Transformation between distance and similarity measures
• Kernel similarity measurement
• Kernel and reproducing kernel Hilbert space (RKHS)
• Kernel trick
• Kernel functions
Motivation
• The performance of machine learning methods is heavily influenced by the form of data representation on which they are applied.

• PCA keeps only the eigenvectors that encode the most variation in the data.

• However, PCA is not always good enough for learning a representation of the data.

• PCA is a purely second-order representation, whereas much of the information in natural images lies in higher-order statistics of the data.

• A revolution in pattern analysis has occurred with the introduction of kernel-based learning methods.
Kernel Method
• Transformation between distance and similarity measures

• The measurement of distance is an important routine in data processing and analysis.

• One of the most commonly used dissimilarity measures is the Euclidean distance. It is defined as the L2 norm (the square root of the inner product of the difference vector with itself) of the difference between two vectors or points.

• If similarity is interpreted as a covariance, then the Euclidean distance can be expressed in terms of a similarity matrix.

• This gives rise to the concept of a kernel, which can be regarded as a transformation between the distance matrix and the similarity matrix.
• If the covariance (similarity) is of the form $K_{ij} = \langle x_i, x_j \rangle$, the squared Euclidean distance can be written as $d^2(x_i, x_j) = K_{ii} + K_{jj} - 2K_{ij}$.

• The kernel concept has become the basis for a number of algorithms in machine learning.
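
As an illustration, here is a minimal sketch in Python (not from the slides; the toy data and the choice of a linear kernel are assumptions for demonstration) showing that the squared Euclidean distance can be recovered directly from a similarity (Gram) matrix:

import numpy as np

# Toy data: 5 points in 3 dimensions (illustrative values only)
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Linear-kernel similarity matrix: K[i, j] = <x_i, x_j>
K = X @ X.T

# Squared Euclidean distances recovered from the similarity matrix:
# d^2(x_i, x_j) = K_ii + K_jj - 2 * K_ij
diag = np.diag(K)
D2 = diag[:, None] + diag[None, :] - 2 * K

# Cross-check against the direct pairwise computation
D2_direct = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(D2, D2_direct))   # True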
• Kernel similarity measurement

• The kernel method brings a different idea to similarity measurement: the kernel computes the distance in the space of transformed features.

• Consider a transformation φ that maps the data from the original feature space to some higher-dimensional feature space.

• As shown in the figure, φ takes the points x_i and x_j and maps them to Gaussian functions centered on x_i and x_j, respectively.

[Figure: Graphical illustration of the feature space of the Gaussian kernel]


• The kernel is the same as a dot product of the mapped features: $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.

• The kernel gives a large value if the two inputs are similar and, in contrast, a low value if the inputs are dissimilar.

• The distance in the transformed feature space is computed as follows:
$\|\phi(x_i) - \phi(x_j)\|^2 = k(x_i, x_i) + k(x_j, x_j) - 2\,k(x_i, x_j)$
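
As a small sketch (assuming a Gaussian kernel with bandwidth sigma = 1; the sample points are made up for illustration), the feature-space distance can be computed from kernel evaluations alone:

import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    # k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

xi = np.array([1.0, 2.0])
xj = np.array([2.0, 0.5])

# Squared distance in the (implicit) feature space:
# ||phi(x_i) - phi(x_j)||^2 = k(x_i, x_i) + k(x_j, x_j) - 2 * k(x_i, x_j)
d2 = gaussian_kernel(xi, xi) + gaussian_kernel(xj, xj) - 2 * gaussian_kernel(xi, xj)
print(d2)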
• Kernel and reproducing kernel Hilbert space (RKHS)

Riesz representation theorem
• The Riesz representation theorem tells us that whenever there is a continuous linear functional f on a Hilbert space H, it can be represented as a dot product with some element of H.

• H is defined as a complete inner-product space.

• Concretely, the theorem states that there is an element g in H such that the functional can be written as $f(x) = \langle x, g \rangle_H$ for all x in H.

• Using the Riesz representation theorem, a Hilbert space H of functions is a reproducing kernel Hilbert space (RKHS) with kernel k if evaluation of any f in H is reproduced by an inner product with the kernel: $f(x) = \langle f, k(\cdot, x) \rangle_H$.

• Given a kernel k, one can construct the RKHS as the completion of the space of functions spanned by the set {k(·, x) : x in the input space}, with an inner product defined as in the sketch below.

• Note that $\langle k(\cdot, x_i), k(\cdot, x_j) \rangle_H = k(x_i, x_j)$.
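
A minimal sketch of that inner product in standard RKHS notation (this is the textbook construction, reconstructed here rather than copied from the slides):

% For f = sum_i alpha_i k(., x_i) and g = sum_j beta_j k(., x'_j):
\left\langle \sum_i \alpha_i\, k(\cdot, x_i),\ \sum_j \beta_j\, k(\cdot, x'_j) \right\rangle_{\mathcal{H}}
  = \sum_i \sum_j \alpha_i\, \beta_j\, k(x_i, x'_j)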
• Testing that this is a valid inner product amounts to checking the following conditions:

1. Symmetry: $k(x_i, x_j) = k(x_j, x_i)$.

2. Positive definiteness: for any coefficients $c_1, \dots, c_n$, $\sum_i \sum_j c_i c_j\, k(x_i, x_j) \ge 0$, since this quantity is the dot product of a vector (in the feature space) with itself.
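
Both conditions can be checked numerically on a kernel matrix; below is a minimal sketch (the Gaussian kernel and the random toy data are assumptions for illustration):

import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))     # toy data, illustrative only
K = gaussian_kernel_matrix(X)

# 1. Symmetry: K equals its transpose
print(np.allclose(K, K.T))                      # True

# 2. Positive (semi-)definiteness: all eigenvalues are non-negative
print(np.linalg.eigvalsh(K).min() >= -1e-10)    # True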
Summary
• As long as we define a kernel function and the kernel matrix constructed from it is positive definite, we have a valid positive definite kernel.

• This means there exists a mapping φ such that the kernel function can be rewritten in terms of an inner product of the mapped features, $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.

• Conversely, for every RKHS there exists an associated reproducing kernel which is symmetric and positive definite (PD).
• Kernel trick

If φ maps to an extremely high-dimensional space, constructing the kernel directly would require:

• computing the extremely high-dimensional feature vectors, and
• then computing the inner products in that feature space, which is computationally inefficient and very expensive.

However, by using the kernel trick, all that is needed is:

• evaluating the kernel, while knowing that a corresponding map and inner product exist.

• Evaluating the kernel function is much easier than computing the feature transformation followed by the inner product.
Example
• The basic idea of the kernel trick is given in the following example, which shows that inner products in the feature space can be evaluated implicitly in the input space.

• Assume there is a transformation φ mapping the original 2D features to a higher, three-dimensional set of features, $\phi(x) = (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)$.

• Computing φ explicitly and then taking the inner product is more work than needed; only O(n) operations are required to compute the kernel, which equals the inner product in the feature space:

$\langle \phi(x_i), \phi(x_j) \rangle = x_{i1}^2 x_{j1}^2 + 2\, x_{i1} x_{i2}\, x_{j1} x_{j2} + x_{i2}^2 x_{j2}^2 = (x_{i1} x_{j1})^2 + 2\,(x_{i1} x_{j1})(x_{i2} x_{j2}) + (x_{i2} x_{j2})^2 = \langle x_i, x_j \rangle^2$

• The terms $x_{i1} x_{j1}$ and $x_{i2} x_{j2}$ are the components of the dot product taken in the input space.

• The whole expression is that input-space dot product raised to the power of 2.

• So, just take the inner product between x_i and x_j, which is O(n), then square it, and the kernel function is computed.

• The above example examined only the 2D case, but the n-dimensional case is just a generalization of the 2D case.

• So if x_i and x_j are n-dimensional vectors, calculate their dot product (a single number) and raise it to the power of r. This is a simple summation followed by a power, and it does not matter whether r is small or large: the computational complexity stays the same.
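
The example above can be verified with a short sketch (the explicit map φ and the sample points are just for illustration): both routes give the same number, but only the kernel route avoids constructing the feature vector.

import numpy as np

def phi(x):
    # Explicit quadratic feature map for 2D input: (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(xi, xj):
    # Kernel trick: k(x_i, x_j) = <x_i, x_j>^2, evaluated in the input space
    return np.dot(xi, xj) ** 2

xi = np.array([1.0, 3.0])
xj = np.array([2.0, -1.0])

print(np.dot(phi(xi), phi(xj)))   # inner product in the feature space
print(poly2_kernel(xi, xj))       # same value via the kernel trick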
• Kernel functions
• The typical kernel functions that can express the similarity between x_i and x_j are:

• Linear: $k(x_i, x_j) = \langle x_i, x_j \rangle$ (i.e. there is no transformation; it just computes the inner product of the two input vectors).

• Polynomial: $k(x_i, x_j) = \langle x_i, x_j \rangle^{r}$.
Just compute the inner product of the two vectors and raise it to the exponent r, where r is the parameter that specifies the maximum degree of the polynomial function.

• Sigmoid kernel: $k(x_i, x_j) = \tanh(\hbar\, \langle x_i, x_j \rangle + \theta)$, where ћ and θ are the steepness and offset parameters, respectively.

• Laplacian radial basis function: $k(x_i, x_j) = \exp\left(-\|x_i - x_j\| / \sigma\right)$.

• Gaussian radial basis function: $k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$.

• It is considered one of the preferred kernel functions; it computes a Gaussian of the squared distance between x_i and x_j.

• It takes the points and maps them onto Gaussian functions centered on the points x_i and x_j.
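
For concreteness, minimal Python sketches of the kernels listed above (the parameter names r, h for ћ, theta for θ, and sigma for σ follow the slide's notation; the default values are arbitrary assumptions):

import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, r=2):
    return np.dot(xi, xj) ** r

def sigmoid_kernel(xi, xj, h=1.0, theta=0.0):
    return np.tanh(h * np.dot(xi, xj) + theta)

def laplacian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) / sigma)

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))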
Application of kernel

Support Vector Machine

• This section addresses one of the main applications of the kernel method and explains why kernel methods can be useful. The SVM is one of the most powerful classification algorithms; it looks for a decision surface that separates the two groups of data points.

• Kernel methods first appeared prominently in the form of the SVM, a powerful binary classification algorithm.

• For non-linearly separable data, the kernel method helps researchers build an efficient linear SVM classifier in a high-dimensional feature space.
[Figure 1: The idea of a kernel-based SVM classifier (panels: linearly separable vs. non-linearly separable data).]
• Figure 1 shows a training data set for which it is impossible to use a linear separator, so a more complicated nonlinear classifier (a curve instead of a line) is needed.

• Applying the kernel trick is a way to create kernel-based SVM classifiers: it allows the algorithm to separate the data points with a hyperplane in a transformed, higher-dimensional (e.g. 3D) feature space.
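
A minimal sketch of this idea, assuming scikit-learn is available (the dataset and parameters are illustrative, not from the lecture): a linear SVM fails on concentric-circle data, while an SVM with a Gaussian (RBF) kernel separates it easily.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric circles: not linearly separable in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # roughly chance level
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0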
Projects
• Group 6

Handwriting Recognition
