Lecture 6
High dimensional representation
Outline
1. Motivation
2. Kernel method
Transformation between distance and similarity measure
Kernel similarity measurement
Kernel and reproducing kernel Hilbert space (RKHS)
Kernel trick
Kernel functions
Motivation
• The performance of machine learning methods is heavily influenced by the form of data representation on which they are applied.
• The PCA method is based on keeping only the eigenvectors that encode the most variation in the data.
• However, PCA is not always good enough for learning a representation of the data.
• PCA is a purely second-order representation, whereas much of the information in natural images lies in higher-order statistics of the data.
• A revolution in pattern analysis occurred with the introduction of kernel-based learning methods.
Kernel Method
• Transformation between distance and similarity measure
• The measure of distance is an important routine in data processing and analysis.
• One of the most widely used dissimilarity measures is the Euclidean distance. It is defined as the L2 norm (square root of the vector inner product) of the difference of two vectors or two points:
  $d(x_i, x_j) = \|x_i - x_j\|_2 = \sqrt{\langle x_i - x_j,\, x_i - x_j \rangle}$
• If the similarity is interpreted as a covariance, then the Euclidean distance can be rewritten in terms of a similarity matrix:
  $d^2(x_i, x_j) = \langle x_i, x_i \rangle - 2\langle x_i, x_j \rangle + \langle x_j, x_j \rangle = s_{ii} - 2 s_{ij} + s_{jj}$
• So a concept called the kernel arises, which can be considered as a transformation between a distance matrix and a similarity matrix (a small numerical sketch follows below).
• If the covariance (similarity) is of the form $s_{ij} = k(x_i, x_j)$ for some kernel function $k$, the kernel concept becomes a basis for a number of algorithms in machine learning.
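As a minimal illustration (my own sketch, not from the lecture itself, assuming NumPy is available), the following Python snippet converts a similarity (Gram) matrix built from inner products into the corresponding matrix of squared Euclidean distances:

```python
import numpy as np

def similarity_to_distance(S):
    """Convert a similarity (Gram) matrix S, with S[i, j] = <x_i, x_j>,
    into the matrix of squared Euclidean distances:
    d^2(x_i, x_j) = S[i, i] - 2 * S[i, j] + S[j, j]."""
    diag = np.diag(S)
    return diag[:, None] - 2.0 * S + diag[None, :]

# Example: build the Gram matrix from raw points and recover the distances.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
S = X @ X.T                      # similarity as plain inner products
D2 = similarity_to_distance(S)   # squared Euclidean distances
print(np.allclose(D2, ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)))  # True
```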
• Kernel similarity measurement
The kernel method comes up with a different idea for similarity measurement: the kernel calculates distance in the space of transformed features.
Given a transformation $\phi$ that maps the data from the original feature space to some higher-dimensional feature space, the kernel is defined on the mapped features.
As shown in the figure, $\phi$ takes points $x_i$ and $x_j$ and maps them into Gaussians centered on $x_i$ and $x_j$, respectively.
Figure: Graphical illustration of the feature space of the Gaussian kernel.
• The kernel is the same as a dot product of the mapped features: $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
• The kernel gives a large value if the two inputs are similar and, in contrast, a low value if the inputs are dissimilar.
• Distance in the transformed feature space is computed as follows (see the sketch below):
  $d_\phi^2(x_i, x_j) = \|\phi(x_i) - \phi(x_j)\|^2 = k(x_i, x_i) - 2\, k(x_i, x_j) + k(x_j, x_j)$
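A minimal sketch (my own illustration, assuming a Gaussian kernel with bandwidth sigma) of how this feature-space distance can be evaluated purely through kernel calls:

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def kernel_distance_sq(xi, xj, kernel):
    """Squared distance between phi(xi) and phi(xj) in the (implicit) feature
    space, computed only from kernel evaluations:
    k(xi, xi) - 2 * k(xi, xj) + k(xj, xj)."""
    return kernel(xi, xi) - 2.0 * kernel(xi, xj) + kernel(xj, xj)

xi = np.array([0.0, 1.0])
xj = np.array([2.0, 0.5])
print(kernel_distance_sq(xi, xj, gaussian_kernel))  # lies in [0, 2] for the Gaussian kernel
```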
• Kernel and reproducing kernel Hilbert space (RKHS)
Riesz’ representation theorem
• Riesz’ representation theorem tells us that whenever there is a continuous linear functional $f$ on a Hilbert space $H$, it can be represented as a dot product with some element of $H$.
• $H$ can be defined as a complete inner product space.
• Formally, the theorem states that there is an element $g \in H$ such that $f$ can be written as:
  $f(x) = \langle x, g \rangle_H \quad \text{for all } x \in H$
• Using Riesz’ representation theorem, a Hilbert space $H$ of functions is a reproducing kernel Hilbert space (RKHS) with kernel $k$ if the reproducing property holds:
  $f(x) = \langle f, k(\cdot, x) \rangle_H \quad \text{for all } f \in H$
• Given a kernel $k$, one can construct the RKHS as the completion of the space of functions spanned by the set $\{k(\cdot, x) : x \in \mathcal{X}\}$, with an inner product defined as follows:
  $\left\langle \sum_i \alpha_i k(\cdot, x_i),\ \sum_j \beta_j k(\cdot, x_j) \right\rangle = \sum_{i,j} \alpha_i \beta_j\, k(x_i, x_j)$
• Note that $\langle k(\cdot, x_i), k(\cdot, x_j) \rangle = k(x_i, x_j)$, i.e., the feature map can be taken to be $\phi(x) = k(\cdot, x)$.
• Testing that this defines an inner product amounts to checking the following conditions:
1. Symmetry: $k(x_i, x_j) = k(x_j, x_i)$
2. Positive definiteness: $\sum_{i,j} c_i c_j\, k(x_i, x_j) \geq 0$, since it is a dot product of a vector with itself, which is $\geq 0$.
Summary
• So as long as we define a kernel function and construct the kernel matrix, and that kernel is positive definite,
• it means we can find a mapping $\phi$ such that it is possible to rewrite the kernel function in terms of an inner product of the mapped features.
• Conversely, for every RKHS there exists an associated reproducing kernel which is symmetric and positive definite (PD).
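As an illustration (my own sketch, not part of the lecture), these two conditions can be checked numerically for a given kernel matrix, assuming NumPy; positive semi-definiteness is tested here via the eigenvalues of the symmetric Gram matrix:

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-10):
    """Check that a Gram matrix K is symmetric and positive semi-definite."""
    symmetric = np.allclose(K, K.T)
    # Eigenvalues of a symmetric matrix are real; PSD means none is (significantly) negative.
    psd = symmetric and np.all(np.linalg.eigvalsh(K) >= -tol)
    return symmetric and psd

# Gram matrix of a Gaussian kernel on a few random points.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)
print(is_valid_kernel_matrix(K))  # True
```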
• Kernel trick
If $\phi(x)$ is extremely high dimensional, constructing the kernel explicitly would need:
• computing the extremely high-dimensional feature vectors $\phi(x_i)$ and $\phi(x_j)$,
• then computing the inner products in the feature space, which seems computationally inefficient and very expensive.
However, by using the kernel trick, all that is needed is:
• evaluating the kernel $k(x_i, x_j)$ and knowing that there exists a map $\phi$ and an inner product such that $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
• The evaluation of the kernel function is much easier than the computation of the feature transformation followed by the inner product computation.
Example
• The basic idea of the kernel trick is given in the following example. The example shows that the inner products in the feature space can be evaluated implicitly in the input space.
• Assume there is a transformation mapping the original 2D features to a higher, three-dimensional set of features:
  $\phi(x) = (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)$
• Explicitly, on the order of $n^2$ operations are needed just to compute the mapped features of a degree-2 map on n-dimensional inputs; then only O(n) is needed to compute the kernel, which equals the inner product in the feature space:
  $\langle \phi(x_i), \phi(x_j) \rangle = x_{i1}^2 x_{j1}^2 + 2\, x_{i1} x_{i2} x_{j1} x_{j2} + x_{i2}^2 x_{j2}^2 = (x_{i1} x_{j1})^2 + 2 (x_{i1} x_{j1})(x_{i2} x_{j2}) + (x_{i2} x_{j2})^2 = \langle x_i, x_j \rangle^2$
• The terms $x_{i1} x_{j1}$ and $x_{i2} x_{j2}$ are the components of the dot product taken in the input space.
• The expression $\langle x_i, x_j \rangle^2$ is the dot product taken in the input space raised to the power of 2.
• So, just take the inner product between $x_i$ and $x_j$, which is O(n), then square it, and the kernel function is computed (as verified in the sketch below).
• The above example examined only the 2D case, but the n-dimensional case is just a generalization of the 2D case.
• So you have n-dimensional vectors $x_i$ and $x_j$; calculate the dot product (a single number) and raise it to the power of r. This is a simple summation operation, and it does not matter whether r is small or large, since the computational complexity stays the same.
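A small Python check (my own sketch, assuming the degree-2 map written above) confirming that the kernel trick and the explicit feature map agree:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(xi, xj):
    """Kernel trick: square the input-space inner product, O(n) work."""
    return float(np.dot(xi, xj)) ** 2

xi = np.array([1.0, 3.0])
xj = np.array([2.0, -1.0])
print(np.dot(phi(xi), phi(xj)))  # explicit inner product in the feature space
print(poly2_kernel(xi, xj))      # same value via the kernel trick
```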
• Kernel functions
• The typical kernel functions that express the similarity between $x_i$ and $x_j$ are:
• Linear: $k(x_i, x_j) = \langle x_i, x_j \rangle$ (i.e., there is no transformation; it just computes the inner product of the two input vectors).
• Polynomial: $k(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^r$
• Just compute the inner product of the two vectors and raise it to the power r, where r is the parameter that specifies the maximum degree of the polynomial function.
• Sigmoid kernel: $k(x_i, x_j) = \tanh(\hbar \langle x_i, x_j \rangle + \theta)$, where ћ and θ are the steepness and offset parameters, respectively.
• Laplacian radial basis function: $k(x_i, x_j) = \exp(-\|x_i - x_j\| / \sigma)$
• Gaussian radial basis function: $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$
• The Gaussian RBF is considered one of the preferred kernel functions; it computes a Gaussian of the squared distance between $x_i$ and $x_j$.
• It takes the points and maps them into Gaussian functions centered on the points $x_i$ and $x_j$.
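The following Python sketch (my own, assuming NumPy and the parameter names r, h for ћ, theta for θ, and sigma for σ used above) implements these kernel functions:

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, r=2):
    # Inner product raised to the power r (plus 1 so that r is the maximum degree).
    return (np.dot(xi, xj) + 1.0) ** r

def sigmoid_kernel(xi, xj, h=1.0, theta=0.0):
    # h is the steepness parameter, theta the offset.
    return np.tanh(h * np.dot(xi, xj) + theta)

def laplacian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) / sigma)

def gaussian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in (linear_kernel, polynomial_kernel, sigmoid_kernel, laplacian_kernel, gaussian_kernel):
    print(k.__name__, k(xi, xj))
```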
Application of kernel
Support Vector Machine
• This section addresses one of the main applications of the kernel method, which explains why the kernel method can be useful. The SVM is a powerful classification algorithm that looks for a decision surface separating two groups of data points.
• The kernel method first appeared in the form of the SVM, one of the most powerful binary classification algorithms.
• For non-linearly separable data, the kernel method helps researchers build an efficient linear SVM classifier in a high-dimensional feature space.
Figure 1: The idea of the kernel-based SVM classifier (left: linearly separable data; right: non-linearly separable data).
• Figure 1 shows training data sets for which it seems impossible to use a linear separator; thus a more complicated nonlinear classifier (a curve instead of a line) is needed.
• Applying the kernel trick is a way to create kernel-based SVM classifiers, and this allows the algorithm to separate the data points using a hyperplane in a transformed, higher-dimensional (e.g., 3D) feature space.
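As a minimal illustration (my own sketch, assuming scikit-learn is available), an RBF-kernel SVM can separate data that is not linearly separable in the input space, such as two concentric circles:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles here, while the Gaussian (RBF) kernel separates the classes
# by implicitly working in a higher-dimensional feature space.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))
```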
Projects
• Group 6
Handwriting Recognition