COMP 5523 Lecture 512
Computer Vision and Image Processing
Local Invariant Feature
Fall, 2024
Instructor: WU, Xiao-Ming
For internal use only,
please do not distribute!
Outline
• Local invariant features
• Harris corner detection
• Blob detection with LoG
• SIFT Algorithm
Local invariant feature detection
Previously: Features and filters
Transforming and describing images; colors, edges
Slide credit: Kristen Grauman
Now: Multiple views
Matching, invariant features, instance recognition
Lowe
Fei-Fei Li
Slide credit: Kristen Grauman
Panorama Stitching
• We have two images – how do we combine them?
Step 1: extract key points
Step 2: match key point features
Step 3: align images
Where are the corresponding points?
NASA Mars Rover images
Image matching
NASA Mars Rover images
with SIFT feature matches
Figure by Noah Snavely
https://www.youtube.com/watch?v=82jjFq303UY
https://www.youtube.com/watch?v=rDVW2_NgyAs
https://www.youtube.com/watch?v=KgsHoJYJ4S8&list=PL2zRqk16wsdqXEMpHrc4Qnb5rA1Cylrhx&index=12
Important tool for multiple views: Local features
Multi-view matching relies on local feature correspondences.
How to detect which local features to match?
Local features: main components
1) Detection: Identify the interest points
2) Description: Extract a feature descriptor vector surrounding each interest point:
x1 = [x1^(1), …, xd^(1)],  x2 = [x1^(2), …, xd^(2)]
3) Matching: Determine correspondence between descriptors in two views
Slide credit: Kristen Grauman
Local features: desired properties
• Repeatability
• The same feature can be found in several images despite geometric and
photometric transformations
• Distinctiveness
• Each feature has a distinctive description
• Compactness and efficiency
• Many fewer features than image pixels
• Locality
• A feature occupies a relatively small area of the image; robust to clutter
and occlusion
Geometric transformations
e.g., scale,
translation,
rotation
Photometric transformations
e.g., illumination
change, shadows,
highlights.
Figure from T. Tuytelaars ECCV 2006 tutorial
Goal: interest operator repeatability
• We want to detect (at least some of) the same points in both images.
No chance to find true matches!
• Yet we have to be able to run the detection procedure independently per image and still find some of the same points.
Goal: descriptor distinctiveness
• We want to be able to reliably determine which
point goes with which.
?
• Must provide some invariance to geometric and
photometric differences between the two views.
What is an interest point?
Some patches are not interesting
Are lines/edges interesting?
Pick a point in the image.
Find it again in the next image.
What type of feature would you select?
A corner!
What points would you choose?
Slide credit: Kristen Grauman
Detecting Corners
9300 Harris Corners Pkwy, Charlotte, NC
Slide credit: Kristen Grauman
Are Blobs Interesting?
Blob Detection
• For a Blob-like Feature to be
useful, we need to:
• Locate the blob
• Determine its size
• Determine its orientation
• Formulate a description or
signature that is independent of
size and orientation
https://www.youtube.com/watch?v=wcqbiHonfbo&list=PL2zRqk16wsdqXEMpHrc4Qnb5rA1Cylrhx&index=13
Outline
• Local invariant features
• Harris corner detection
• Blob detection with LoG
• SIFT Algorithm
Corners as distinctive interest points
• We should easily recognize the point by looking through a small window
• Shifting a window in any direction should give a large change in intensity
• “flat” region: no change in all directions
• “edge”: no change along the edge direction
• “corner”: significant change in all directions
Slide credit: Alyosha Efros, Darya Frolova, Denis Simakov
Corners as distinctive interest points
Corner Detection: Derivation
Change in appearance of window W for the shift [u, v]:
E(u, v) = Σ_{(x,y)∈W} [I(x + u, y + v) − I(x, y)]²
(figure: the image I(x, y) and the error surface E(u, v), shown here at the shift E(3, 2))
Note that E(0, 0) = 0: with no shift there is no change. For a distinctive point we want E(u, v) to grow quickly for every direction of shift.
Corner Detection: Derivation
We want to find out how E(u, v) behaves for small shifts [u, v].
Corner Detection: Derivation
First-order Taylor approximation for small motions [u, v]:
I(x + u, y + v) ≈ I(x, y) + I_x u + I_y v
Plugging this into E(u, v):
E(u, v) ≈ Σ_{(x,y)∈W} [I_x u + I_y v]²
Notation: I_x = ∂I/∂x, I_y = ∂I/∂y, so that
Σ_{x,y} I_x I_y = sum(array of x gradients .* array of y gradients)
Corner Detection: Derivation
E(u, v) can be locally approximated by a quadratic surface:
E(u, v) ≈ [u v] M [u; v], where M = Σ_{(x,y)∈W} [I_x²  I_x I_y; I_x I_y  I_y²] is the second moment matrix.
What does this matrix reveal?
First, consider an axis-aligned corner:
M = [Σ I_x²  Σ I_x I_y; Σ I_x I_y  Σ I_y²] = [λ1  0; 0  λ2]
This means the dominant gradient directions align with the x or y axis.
Look for locations where both λ’s are large.
If either λ is close to 0, then this is not corner-like.
What if we have a corner that is not aligned with the image axes?
What does this matrix reveal?
Since M is symmetric, we have M = X [λ1  0; 0  λ2] Xᵀ, where the columns of X are the eigenvectors of M: M xᵢ = λᵢ xᵢ.
The eigenvalues of M reveal the amount of intensity change in the
two principal orthogonal gradient directions in the window.
Note: please refer to Singular Value Decomposition (SVD) for detailed derivation.
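As a tiny numeric illustration of the eigenvalue interpretation (a sketch in NumPy; the gradient samples are made-up values standing in for a real window), a patch with strong gradients in two orthogonal directions yields an M with two large eigenvalues:

```python
import numpy as np

# Toy gradient samples from a hypothetical window: strong gradients in
# both the x and y directions, as at a corner
Ix = np.array([1.0, 0.0, 1.0, 0.0])
Iy = np.array([0.0, 1.0, 0.0, 1.0])

# Second moment matrix M = sum over the window of [Ix^2, IxIy; IxIy, Iy^2]
M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
              [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

lam = np.linalg.eigvalsh(M)  # both eigenvalues are large -> corner-like
```

If instead all gradients pointed along one axis, one eigenvalue would be near zero: edge-like, not corner-like.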
Corner response function
• “edge”: λ1 >> λ2 or λ2 >> λ1
• “corner”: λ1 and λ2 are large, λ1 ~ λ2
• “flat” region: λ1 and λ2 are small
Cornerness score: f = λ1 λ2 − α (λ1 + λ2)² = det(M) − α·trace(M)² (other variants possible)
Corner Detection: Derivation
E(u,v) can be locally approximated by a quadratic surface:
E(u, v)
In which directions does this surface
have the fastest/slowest change?
Interpreting the second moment matrix
A horizontal “slice” of E(u, v) is given by the equation of an ellipse:
[u v] M [u; v] = const
v
Interpreting the second moment matrix
Consider the axis-aligned case (gradients are either horizontal or vertical):
M = [Σ I_x²  Σ I_x I_y; Σ I_x I_y  Σ I_y²] = [a  0; 0  b]
The ellipse then has a minor axis of half-length a^(−1/2) and a major axis of half-length b^(−1/2) (for a ≥ b): the larger eigenvalue corresponds to the direction of faster intensity change.
Which surface indicates a good image feature?
What kind of image patch do these surfaces represent?
Which surface indicates a good image feature?
flat edge corner
‘line’ ‘dot’
Harris corner detector
1) Compute the matrix M for each image window to get its cornerness score.
2) Find points whose surrounding window gave a large corner response (f > threshold).
3) Take the points of local maxima, i.e., perform non-maximum suppression.
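The three steps above can be sketched in NumPy/SciPy. This is a minimal sketch: the smoothing sigma, α = 0.05, the relative threshold, and the NMS window size are illustrative choices, not values prescribed by the slides.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel, maximum_filter

def harris_response(img, sigma=1.0, alpha=0.05):
    # Image gradients
    Ix = sobel(img, axis=1, mode="reflect")
    Iy = sobel(img, axis=0, mode="reflect")
    # Entries of the second moment matrix M, smoothed over the window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Cornerness: f = det(M) - alpha * trace(M)^2
    return Sxx * Syy - Sxy**2 - alpha * (Sxx + Syy)**2

def harris_corners(img, sigma=1.0, alpha=0.05, thresh_rel=0.1, nms_size=5):
    f = harris_response(img, sigma, alpha)
    # Threshold relative to the strongest response, then non-maximum suppression
    mask = (f > thresh_rel * f.max()) & (f == maximum_filter(f, size=nms_size))
    return np.argwhere(mask)  # (row, col) corner locations
```

On a synthetic white square over a black background, the detected points cluster at the square's four corners, while edges (negative f) and flat regions (f near 0) are rejected.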
Harris Detector: Steps
Harris Detector: Steps
Compute corner response f
Harris Detector: Steps
Find points with large corner response: f > threshold
Harris Detector: Steps
Take only the points of local maxima of f
https://www.youtube.com/watch?v=Z_HwkG90Yvw&list=PL2zRqk16wsdqXEMpHrc4Qnb5rA1Cylrhx&index=7
Properties of the Harris corner detector
• Rotation invariant? Yes. Since M = X [λ1  0; 0  λ2] Xᵀ, rotating the image rotates the eigenvectors X but leaves the eigenvalues λ1, λ2 (and hence the cornerness score) unchanged.
• Scale invariant? No. Zoom in on a corner far enough and every window sees a smooth edge: all points will be classified as edges, not a corner!
Automatic Scale Selection
How to find corresponding patch sizes?
K. Grauman, B. Leibe
Automatic Scale Selection
Intuition:
• Find scale that gives local maxima of some function f in both position and scale.
(figure: f plotted against region size for Image 1 and Image 2; the responses peak at corresponding sizes s1 and s2)
Automatic Scale Selection
• Function responses for increasing scale (scale signature)
f(I(x, σ)) is evaluated over a range of σ in each image.
K. Grauman, B. Leibe
Outline
• Local invariant features
• Harris corner detection
• Blob detection with LoG
• SIFT Algorithm
Recall: First Derivative Filters
• Sharp changes in gray level of the input
image correspond to “peaks or valleys”
of the first-derivative of the input signal.
(1D example: the signal F(x) and its first derivative F’(x))
Slide from Robert Collins CSE486
Second-Derivative Filters
• Peaks or valleys of the first-derivative of
the input signal, correspond to “zero-
crossings” of the second-derivative of the
input signal.
(1D example: F(x), F’(x), and the second derivative F’’(x))
Slide from Robert Collins CSE486
1D Gaussian and Derivatives
g(x) = (1/(√(2π)·σ)) · e^(−x²/(2σ²))
g’(x) = −(x/σ²) · g(x)
g’’(x) = (x²/σ⁴ − 1/σ²) · g(x)
Slide from Robert Collins CSE486
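These formulas can be checked with a quick finite-difference comparison (NumPy; the choice σ = 1.5 and the sampling grid are arbitrary test values). Writing g’ and g’’ as multiples of g carries the normalization constant along automatically:

```python
import numpy as np

sigma = 1.5
x = np.linspace(-5.0, 5.0, 1001)

g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
g1 = -(x / sigma**2) * g                    # analytic g'(x)
g2 = (x**2 / sigma**4 - 1 / sigma**2) * g   # analytic g''(x)

dx = x[1] - x[0]
num_g1 = np.gradient(g, dx)    # numeric derivative of g
num_g2 = np.gradient(g1, dx)   # numeric derivative of g'
```

The numeric and analytic derivatives agree to well within finite-difference error.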
Laplacian of Gaussian
• Laplacian of Gaussian (LoG): Circularly symmetric operator for
blob detection in 2D
∇²g = ∂²g/∂x² + ∂²g/∂y²
Examples of Laplacian and LoG Filters
Laplacian of Gaussian Filter
Laplacian Filter
∇²g = ∂²g/∂x² + ∂²g/∂y²
Blob detection in 2D
• At what scale does the Laplacian achieve a maximum response to a
binary circle of radius r?
• To get maximum response, the zeros of the Laplacian have to be aligned with the circle.
• The Laplacian is given by (up to scale): ∇²g ∝ (x² + y² − 2σ²) · e^(−(x²+y²)/(2σ²)), which is zero on the circle x² + y² = 2σ².
• Therefore, the maximum response occurs at σ = r/√2.
(figure: the binary circle of radius r and the Laplacian aligned with it)
Example: LoG Extrema
LoG maxima
sigma = 2
minima
Slide from Robert Collins CSE486
LoG Extrema, Detail
maxima
LoG sigma = 2
Slide from Robert Collins CSE486
LoG Blob Finding
LoG filter extrema locates “blobs”
maxima = dark blobs on light background
minima = light blobs on dark background
The scale of a blob (its size, i.e., radius in pixels) is determined by the sigma parameter of the LoG filter.
LoG sigma = 2 LoG sigma = 10
Slide from Robert Collins CSE486
Observe and Generalize
convolve result
with LoG
maxima
Slide from Robert Collins CSE486
Observe and Generalize
The LoG looks a bit like an eye, and it responds maximally in the eye region!
Slide from Robert Collins CSE486
Blob detection in 2D: scale selection
• Laplacian-of-Gaussian = “blob” detector: ∇²g = ∂²g/∂x² + ∂²g/∂y²
filter scales
img1 img2 img3
Blob detection in 2D
• We define the characteristic scale as the scale that
produces peak of Laplacian response
characteristic scale
Slide credit: Lana Lazebnik
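The characteristic-scale rule σ = r/√2 can be verified numerically (SciPy; the disk radius and σ grid are arbitrary test values). The scale-normalized response σ²·|∇²(G * I)| at the blob centre should peak near that scale:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

r = 8  # radius of a binary circle (disk)
yy, xx = np.mgrid[-32:33, -32:33]
disk = (xx**2 + yy**2 <= r**2).astype(float)

# Scale-normalized LoG response at the disk centre over a range of sigmas;
# the sigma**2 factor makes responses comparable across scales
sigmas = np.arange(1.0, 12.0, 0.25)
resp = [s**2 * abs(gaussian_laplace(disk, s)[32, 32]) for s in sigmas]

char_scale = sigmas[int(np.argmax(resp))]  # expected near r / sqrt(2)
```

The peak lands close to 8/√2 ≈ 5.66, up to discretization of the disk boundary.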
Example
Original image at
¾ the size
Slide credit: Kristen Grauman
Scaled down image
Original image at
¾ the size
Original image
Slide credit: Kristen Grauman
http://www.cs.utexas.edu/~grauman/courses/spring2011/slides/lecture14_localfeats.pdf
Scale invariant interest points
Interest points are local maxima in both position and scale.
(figure: squared filter response maps L_xx(σ) + L_yy(σ) computed over scales s1 … s5; the output is a list of (x, y, σ))
Slide credit: Kristen Grauman
Scale-space blob detector: Example
Robert Collins
CSE486
Lindeberg: blobs are detected
as local extrema in space and
scale, within the LoG (or DoG)
scale-space volume.
T. Lindeberg. Feature detection with automatic scale selection. IJCV 1998.
Image credit: Lana Lazebnik
Technical detail
• We can approximate the Laplacian with a difference of Gaussians;
more efficient to implement.
σ²∇²G = σ² (G_xx(x, y, σ) + G_yy(x, y, σ))   (scale-normalized Laplacian)
DoG = G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ²∇²G   (Difference of Gaussians)
Difference of Gaussians Filtering
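A small numeric sanity check of the approximation (NumPy/SciPy; the random test image and k = 2^(1/3) are illustrative choices): the DoG response map should be nearly proportional to the scale-normalized Laplacian, up to an error of order (k − 1).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

rng = np.random.default_rng(0)
img = gaussian_filter(rng.standard_normal((64, 64)), 1.0)  # smooth test image

sigma = 2.0
k = 2 ** (1.0 / 3.0)  # typical SIFT factor for s = 3 scales per octave

dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
norm_log = (k - 1) * sigma**2 * gaussian_laplace(img, sigma)

# The two response maps should be highly correlated
corr = np.corrcoef(dog.ravel(), norm_log.ravel())[0, 1]
```

The correlation is close to 1, which is why the cheaper DoG (two blurs and a subtraction) can stand in for the Laplacian in practice.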
Outline
• Local invariant features
• Harris corner detection
• Blob detection with LoG
• SIFT Algorithm
SIFT
(Scale Invariant Feature Transform)
SIFT describes both a detector and descriptor
1. Multi-scale extrema
detection
2. Refine location and scale
3. Orientation assignment
4. Keypoint descriptor
Steps of SIFT algorithm
• 1. Determine approximate location and scale of salient
feature points (also called keypoints)
• 2. Refine their location and scale
• 3. Determine orientation(s) for each keypoint.
• 4. Determine descriptors for each keypoint.
Step 1: Approximate keypoint location
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
L(x, y, σ) = G(x, y, σ) * I(x, y)
(adjacent Gaussian-blurred images are subtracted; the stack is down-sampled between octaves)
Octave = doubling of σ0. Within an octave, adjacent scales differ by a constant factor k. If an octave contains s + 1 images, then k = 2^(1/s). The first image has scale σ0, the second image has scale kσ0, the third image has scale k²σ0, and the last image has scale k^s σ0 = 2σ0. Such a sequence of images convolved with Gaussians of increasing σ constitutes a so-called scale space.
(figure: the same image shown at scales 0, 1, 4, 16, 64, and 256)
http://en.wikipedia.org/wiki/Scale_space
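Following the convention above (s + 1 images per octave, k = 2^(1/s); note that Lowe's actual implementation keeps s + 3 images so extrema can be found at every scale), one octave can be sketched as:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_octave(img, sigma0=1.6, s=3):
    """One octave: s + 1 Gaussian images at scales sigma0 * k**i, their
    s DoG images, and the half-resolution base image for the next octave."""
    k = 2.0 ** (1.0 / s)
    gaussians = [gaussian_filter(img, sigma0 * k**i) for i in range(s + 1)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    next_base = gaussians[-1][::2, ::2]  # scale 2*sigma0, down-sampled by 2
    return gaussians, dogs, next_base
```

Here σ0 = 1.6 is Lowe's base scale; blurring the original image with the total σ at each level is equivalent (up to numerics) to the incremental blurring used in real implementations.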
1. Scale-space extrema detection
• Detect maxima and minima of the difference-of-Gaussian in scale space.
• Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below; it is selected if it is larger (or smaller) than all 26 neighbors.
• For each max or min found, the output is the location and the scale.
(figure: repeatedly blur, subtract adjacent images, and resample)
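The 26-neighbour test can be written directly on a 3-D (scale, row, col) DoG stack. A sketch; strict inequalities are used, so flat plateaus are discarded:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def dog_extrema(dog_stack):
    """Return (scale, row, col) of points strictly larger (or smaller) than
    all 26 neighbours: 8 in their own scale, 9 in each adjacent scale."""
    footprint = np.ones((3, 3, 3), dtype=bool)
    footprint[1, 1, 1] = False  # exclude the centre point itself
    mx = maximum_filter(dog_stack, footprint=footprint, mode="nearest")
    mn = minimum_filter(dog_stack, footprint=footprint, mode="nearest")
    is_ext = (dog_stack > mx) | (dog_stack < mn)
    is_ext[0] = is_ext[-1] = False  # need a full scale above and below
    return np.argwhere(is_ext)
```

A single spike and a single dip planted in the middle scale are recovered exactly; the outermost scales are excluded because they lack a neighbour above or below.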
Keypoint computations
• Why do we look for extrema of the DoG
function?
Maxima of the DoG indicate dark points (blobs)
on a bright background.
Minima of the DoG indicate bright points (blobs) on
a dark background.
• Why do we look for extrema in a spatial as well as
scale sense?
It helps us pick the “scale” associated with the
keypoint!
Initial detection of keypoints
http://upload.wikimedia.org/wikipedia/commons/4/44/Sift_keypoints_filtering.jpg
Step 2 Refine Keypoint Location & Scale
Removal of low-contrast keypoints
http://upload.wikimedia.org/wikipedia/commons/4/44/Sift_keypoints_filtering.jpg
Step 2: Refine Keypoint Location & Scale
Removal of keypoints residing on edges (high contrast, but poorly localized along the edge direction)
Step 3: Assigning orientations
• Compute the gradient magnitudes and orientations in a small
window around the keypoint – at the appropriate scale.
(figure: histogram of gradient orientation; the bin counts are weighted by gradient magnitudes and a Gaussian weighting function)
Step 3: Assigning orientations
• Assign the dominant orientation as the
orientation of the keypoint.
• If there are multiple peaks (histogram entries above 0.8 × the peak value), create a separate descriptor for each such orientation (they will all have the same scale and location).
(figure: histogram of gradient orientation; the bin counts are weighted by gradient magnitudes and a Gaussian weighting function)
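The orientation histogram can be sketched as follows (NumPy; the 36-bin resolution and Gaussian width are illustrative choices — only the 0.8 × peak rule comes from the slides):

```python
import numpy as np

def dominant_orientations(patch, n_bins=36, peak_ratio=0.8):
    """Magnitude-weighted histogram of gradient orientations over a patch;
    every bin at or above peak_ratio * max yields a keypoint orientation."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Gaussian weighting centred on the keypoint
    h, w = patch.shape
    yy, xx = np.mgrid[:h, :w]
    g = np.exp(-((xx - w / 2)**2 + (yy - h / 2)**2) / (2 * (0.5 * w)**2))
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                           weights=mag * g)
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (peaks + 0.5) * (2 * np.pi / n_bins)  # bin-centre orientations
```

A patch whose intensity increases left to right has all gradients pointing along +x, so a single orientation near 0 is returned.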
Step 4. Keypoint Descriptors
• At this point, each keypoint has
• location
• scale
• orientation
• Next is to compute a descriptor for the local image region
about each keypoint that is
• highly distinctive
• as invariant as possible to variations such as changes in viewpoint and illumination
Normalization
• Scale the window size based on the scale at which the point was found.
• Rotate the patch according to its dominant gradient orientation (relative orientations never change!). This puts the patches into a canonical orientation.
Image taken from slides by George Bebis (UNR).
Scale and Rotation Invariant
Step 4: Descriptors for each keypoint
• Use histograms to bin pixels within sub-patches according
to their orientation.
(figure: gradients binned by orientation over 0 … 2π; the local patch is subdivided into grid cells, with one orientation histogram per cell. Final descriptor = concatenation of all histograms, normalized)
Step 4: Descriptors for each keypoint
• use the normalized region about the keypoint
• compute gradient magnitude and orientation at each point in the region
• weight them by a Gaussian window overlaid on the circle
• create an orientation histogram over each of the 4 × 4 subregions of the window
• 4 × 4 histograms over a 16 × 16 sample array were used in practice; 4 × 4 cells times 8 directions gives a vector of 128 values.
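The 4 × 4 × 8 layout can be sketched for a single 16 × 16 normalized patch (a simplification: Lowe's trilinear interpolation, Gaussian weighting, and clip-renormalize step are omitted):

```python
import numpy as np

def sift_like_descriptor(patch):
    """Descriptor sketch for a 16x16 normalized patch: a 4x4 grid of cells,
    one 8-bin orientation histogram per cell, concatenated and normalized
    into 128 values."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    desc = []
    for i in range(4):
        for j in range(4):
            sl = (slice(4 * i, 4 * i + 4), slice(4 * j, 4 * j + 4))
            hist, _ = np.histogram(ang[sl], bins=8, range=(0, 2 * np.pi),
                                   weights=mag[sl])
            desc.append(hist)
    desc = np.concatenate(desc)
    return desc / (np.linalg.norm(desc) + 1e-12)  # unit-length 128-vector
```

The final normalization is what gives the descriptor its (partial) invariance to affine illumination changes.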
Keypoint descriptor
Image Gradients (4 × 4 pixels per cell, 4 × 4 cells) → SIFT descriptor (16 cells × 8 directions = 128 dims)
SIFT descriptor [Lowe 2004]
• Extraordinarily robust matching technique
• Can handle changes in viewpoint
• Up to about 60 degrees of out-of-plane rotation
• Can handle significant changes in illumination
• Sometimes even day vs. night (below)
• Fast and efficient—can run in real time
• Lots of code available, e.g.
http://www.vlfeat.org/overview/sift.html
Steve Seitz
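Matching such descriptors is typically done with nearest-neighbour search plus Lowe's ratio test (a sketch in NumPy; the 0.8 ratio follows Lowe 2004, and the brute-force distance matrix is for illustration — real systems use approximate nearest-neighbour search):

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    """For each descriptor in desc1, accept its nearest neighbour in desc2
    only if the best distance is below ratio * (second-best distance)."""
    # Pairwise Euclidean distances between the two descriptor sets
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))  # (index in desc1, index in desc2)
    return matches
```

The ratio test discards ambiguous matches: a descriptor whose nearest and second-nearest neighbours are almost equally far away is likely matching clutter rather than the same physical point.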
SIFT properties
• Invariant to
• Scale
• Rotation
• Partially invariant to
• Illumination changes
• Camera viewpoint
• Occlusion, clutter
Value of local (invariant) features
• Complexity reduction via selection of distinctive points
• Describe images, objects, parts without requiring
segmentation
• Local character means robustness to clutter, occlusion
• Robustness: similar descriptors in spite of noise, blur, etc.
Automatic mosaicing
Matthew Brown
http://matthewalunbrown.com/autostitch/autostitch.html
Recognition of specific objects, scenes
Scale Viewpoint
Lighting Occlusion
Slide credit: J. Sivic
https://www.youtube.com/watch?v=ram-jbLJjFg&list=PL2zRqk16wsdqXEMpHrc4Qnb5rA1Cylrhx&index=15
https://www.youtube.com/watch?v=IBcsS8_gPzE&list=PL2zRqk16wsdqXEMpHrc4Qnb5rA1Cylrhx&index=16
Summary
• Desirable properties for local features for
correspondence
• Basic matching pipeline
• Interest point detection
• Harris corner detector
• Laplacian of Gaussian and difference of Gaussians,
automatic scale selection
• SIFT descriptor
References
Many of the slides, images, and contents of this lecture are adapted from:
• CS 376: Computer Vision
http://vision.cs.utexas.edu/376-spring2018/#Syllabus
• 16-385: Computer Vision
http://www.cs.cmu.edu/~16385/