Object Recognition from
Local Scale-Invariant
Features (SIFT)
David G. Lowe
Presented by David Lee
3/20/2006
Introduction
Well engineered local descriptor
Introduction
Image content is transformed into local feature
coordinates that are invariant to translation,
rotation, scale, and other imaging parameters
SIFT Features
Introduction
Initially proposed for correspondence
matching
Found to be the most effective in such cases in a recent
performance study by Mikolajczyk & Schmid (ICCV ’03)
Introduction
Automatic Mosaicing
[Link]
Introduction
Now being used for general object class
recognition (e.g. 2005 Pascal challenge)
Histogram of gradients
Human detection, Dalal & Triggs CVPR ’05
Introduction
SIFT in one sentence
Histogram of gradients at Harris-corner-like keypoints
Extract features
Find keypoints
Scale, Location
Orientation
Create signature
Match features
Finding Keypoints – Scale,
Location
How do we choose scale?
Finding Keypoints – Scale,
Location
Scale selection principle (T. Lindeberg ’94)
In the absence of other evidence, assume that a scale level, at
which some (possibly non-linear) combination of normalized derivatives
assumes a local maximum over scales, can be treated as
reflecting a characteristic length of a corresponding structure in
the data.
Maxima/minima of Difference of Gaussian
Finding Keypoints – Scale,
Location
Convolve with Gaussian
Downsample
Find extrema in 3D DoG space
# of scales/octave => chosen empirically
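The steps on this slide can be sketched for a single octave as follows. This is a minimal NumPy sketch, not Lowe's implementation: the function names, the base scale sigma0 = 1.6, three scales per octave, and the contrast threshold are assumptions chosen for illustration. Each level is blurred directly from the input instead of incrementally, and the downsampling between octaves is left out.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at about 3 sigma and normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders (along rows, then columns)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(float), r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_octave(image, sigma0=1.6, scales_per_octave=3):
    """Return (sigmas, DoG stack) for one octave; DoG level i is G(sigma[i+1]) - G(sigma[i])."""
    k = 2.0 ** (1.0 / scales_per_octave)
    sigmas = [sigma0 * k ** i for i in range(scales_per_octave + 3)]
    blurred = np.stack([gaussian_blur(image, s) for s in sigmas])
    return np.array(sigmas[:-1]), blurred[1:] - blurred[:-1]

def local_extrema(dog, threshold=0.01):
    """Return (scale, y, x) indices whose value is the max or min of its 3x3x3 cube.

    Ties with a neighbour are accepted; low-contrast samples are skipped.
    """
    found = []
    n_s, n_y, n_x = dog.shape
    for s in range(1, n_s - 1):
        for y in range(1, n_y - 1):
            for x in range(1, n_x - 1):
                v = dog[s, y, x]
                if abs(v) < threshold:
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if v == cube.max() or v == cube.min():
                    found.append((s, y, x))
    return found
```

Calling `dog_octave` on a grayscale float image and passing the resulting stack to `local_extrema` gives candidate keypoints as (scale index, row, column) triples. The triple loop is kept for readability; a practical version would vectorize it.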
Finding Keypoints – Scale,
Location
Sub-pixel Localization
Fit Trivariate quadratic to
find sub-pixel extrema
Eliminating edges
Similar to Harris corner detector
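Both steps on this slide can be sketched with finite differences on a DoG stack indexed as (scale, y, x). This is an illustrative sketch under assumptions: the function names are hypothetical, and the curvature-ratio limit r = 10 is the value suggested in Lowe's IJCV 2004 paper, not something stated on the slide.

```python
import numpy as np

def refine_extremum(dog, s, y, x):
    """Fit a 3-D quadratic around sample (s, y, x) of a DoG stack.

    Returns (offset, value): the sub-sample offset in (scale, y, x) order
    and the interpolated DoG value at the fitted extremum.
    """
    d = np.asarray(dog, dtype=float)
    c = d[s, y, x]
    # Gradient by central differences.
    g = 0.5 * np.array([
        d[s + 1, y, x] - d[s - 1, y, x],
        d[s, y + 1, x] - d[s, y - 1, x],
        d[s, y, x + 1] - d[s, y, x - 1],
    ])
    # Hessian by second differences.
    H = np.empty((3, 3))
    H[0, 0] = d[s + 1, y, x] - 2 * c + d[s - 1, y, x]
    H[1, 1] = d[s, y + 1, x] - 2 * c + d[s, y - 1, x]
    H[2, 2] = d[s, y, x + 1] - 2 * c + d[s, y, x - 1]
    H[0, 1] = H[1, 0] = 0.25 * (d[s + 1, y + 1, x] - d[s + 1, y - 1, x]
                                - d[s - 1, y + 1, x] + d[s - 1, y - 1, x])
    H[0, 2] = H[2, 0] = 0.25 * (d[s + 1, y, x + 1] - d[s + 1, y, x - 1]
                                - d[s - 1, y, x + 1] + d[s - 1, y, x - 1])
    H[1, 2] = H[2, 1] = 0.25 * (d[s, y + 1, x + 1] - d[s, y + 1, x - 1]
                                - d[s, y - 1, x + 1] + d[s, y - 1, x - 1])
    # Setting the derivative of the quadratic to zero gives offset = -H^-1 g.
    offset = -np.linalg.solve(H, g)
    value = c + 0.5 * g.dot(offset)
    return offset, value

def is_edge_like(dog, s, y, x, r=10.0):
    """True if the 2x2 spatial Hessian has a principal-curvature ratio above r.

    Like the Harris detector, this uses only the trace and determinant,
    so no eigenvalues are computed explicitly.
    """
    d = np.asarray(dog[s], dtype=float)
    dxx = d[y, x + 1] - 2 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2 * d[y, x] + d[y - 1, x]
    dxy = 0.25 * (d[y + 1, x + 1] - d[y + 1, x - 1]
                  - d[y - 1, x + 1] + d[y - 1, x - 1])
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:  # curvatures of opposite sign: not a stable extremum
        return True
    return trace * trace / det >= (r + 1) ** 2 / r
```

A candidate would typically be kept only if every component of the offset is below 0.5 in magnitude and `is_edge_like` returns False; those acceptance rules are also assumptions here.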
Finding Keypoints – Scale,
Location
Key issue: Stability (Repeatability)
Alternatives
Multi-scale Harris corner detector
Harris-Laplacian
Kadir & Brady Saliency Detector
Recall Fei-Fei’s pLSA paper
…
Uniform grid sampling
Random sampling
** Important Note ** Their application was scene classification
NOT correspondence matching
Finding Keypoints – Scale,
Location
• Harris-Laplacian [1]
Find local maximum of:
– Laplacian in scale
– Harris corner detector in space (image coordinates)
• SIFT [2]
Find local maximum of:
– Difference of Gaussians in space and scale
[Figure: two (x, y, scale) volumes. Harris-Laplacian: Harris along x and y, Laplacian along scale. SIFT: DoG along x, y, and scale.]
[1] [Link], [Link]. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
[2] [Link]. “Distinctive Image Features from Scale-Invariant Keypoints”. IJCV 2004
Finding Keypoints –
Orientation
Create histogram of local
gradient directions
computed at selected scale
Assign canonical
orientation at peak of
smoothed histogram
Each key specifies stable
2D coordinates (x, y, scale,
orientation)
[Figure: histogram of gradient orientations, axis from 0 to 2π]
Finding Keypoints –
Orientation
Assign dominant
orientation as the
orientation of the
keypoint
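Orientation assignment as described on these two slides can be sketched as below. It is a rough sketch under assumptions: the function name, the 36-bin histogram, the Gaussian weighting width, and the [1, 2, 1] smoothing filter are illustrative choices, and the patch is assumed to be already cut out of the blurred image at the keypoint's scale.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36):
    """Return (angle, histogram) for a square patch around a keypoint.

    angle is the centre of the peak bin, in radians in [0, 2*pi).
    Angles are measured from the +x axis towards the +y axis (y points down
    in image coordinates).
    """
    patch = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(patch)
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)

    # Gaussian weight so that gradients near the keypoint count more.
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = max(h, w) / 4.0  # assumed width, not taken from the slides
    weight = np.exp(-((yy - (h - 1) / 2.0) ** 2 + (xx - (w - 1) / 2.0) ** 2)
                    / (2 * sigma ** 2))

    # Accumulate weighted gradient magnitudes into orientation bins.
    bins = np.floor(angle / (2 * np.pi) * num_bins).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(magnitude * weight).ravel(),
                       minlength=num_bins)

    # Circular smoothing with a [1, 2, 1] / 4 filter.
    smoothed = (np.roll(hist, 1) + 2 * hist + np.roll(hist, -1)) / 4.0

    peak = int(np.argmax(smoothed))
    return (peak + 0.5) * 2 * np.pi / num_bins, smoothed
```

Rotating the patch by this angle before building the signature is what makes the later descriptor comparable across in-plane rotations.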
Finding Keypoints
So far, we found…
Where interesting things are happening
And their orientations
With the hope that
The same keypoints are found, even under some
scale, rotation, and illumination variation.
Extract features
Find keypoints
Scale, Location
Orientation
Create signature
Match features
Creating Signature
Thresholded image gradients are sampled over
16x16 array of locations in scale space
Create array of orientation histograms
8 orientations x 4x4 histogram array = 128
dimensions
# of dimensions
=> chosen empirically
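The 4x4 array of 8-bin histograms can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: the function name is hypothetical, the patch is assumed to be a 16x16 region already resampled at the keypoint's scale and rotated to its canonical orientation, interpolation between neighbouring bins is omitted, and the thresholding is done by clipping the normalized vector at 0.2 (an assumption borrowed from Lowe's IJCV 2004 paper).

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, num_orientations=8, clip=0.2):
    """Build a grid * grid * num_orientations vector (128-D by default)."""
    patch = np.asarray(patch, dtype=float)
    size = patch.shape[0]
    assert patch.shape == (size, size) and size % grid == 0
    cell = size // grid  # 4 pixels per cell for a 16x16 patch

    dy, dx = np.gradient(patch)
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)

    # Gaussian weighting centred on the patch (width is an assumed value).
    yy, xx = np.mgrid[0:size, 0:size] - (size - 1) / 2.0
    weight = np.exp(-(xx ** 2 + yy ** 2) / (2 * (size / 2.0) ** 2))

    obin = (np.floor(angle / (2 * np.pi) * num_orientations).astype(int)
            % num_orientations)

    # One orientation histogram per cell of the grid.
    hist = np.zeros((grid, grid, num_orientations))
    for y in range(size):
        for x in range(size):
            hist[y // cell, x // cell, obin[y, x]] += magnitude[y, x] * weight[y, x]

    # Normalize, clip large entries, and normalize again.
    v = hist.ravel()
    norm = np.linalg.norm(v)
    if norm > 0:
        v = v / norm
    v = np.minimum(v, clip)
    norm = np.linalg.norm(v)
    if norm > 0:
        v = v / norm
    return v
```

Because each cell pools gradients over a 4x4 pixel area, small shifts of an edge within a cell leave the vector nearly unchanged, which is one way to read the question on the next slide about what information the signature captures.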
Creating Signature
What kind of information does this capture?
Comparison with HOG (Dalal
’05)
Histogram of Oriented Gradients
General object class recognition (Human)
Engineered for a different goal
Uniform sampling
Larger cell (6-8 pixels)
Fine orientation binning
9 bins/180° vs. 8 bins/360°
Both are well engineered
Comparison with MOPS
(Brown ’05)
Multi-Image Matching using Multi-Scale
Oriented Patches (CVPR ’05)
Simplified SIFT
Multi-scale Harris corner
No Histogram in orientation selection
Smoothed image patch as descriptor
Good performance for panorama stitching
Extract features
Find keypoints
Scale, Location
Orientation
Create signature
Match features
Nearest neighbor, Hough voting, least-squares affine
parameter fit
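Two of the three matching steps named above, nearest-neighbor search and the least-squares affine fit, can be sketched as below. The function names are hypothetical, the brute-force search stands in for whatever index structure a real system would use, and the distance-ratio check (threshold 0.8) is an added assumption that is not on the slide. Hough voting, which would group matches by consistent pose before the fit, is not shown.

```python
import numpy as np

def match_nearest_neighbor(desc_a, desc_b, max_ratio=0.8):
    """Match each row of desc_a to its nearest row of desc_b (Euclidean).

    A match is kept only if the nearest distance is clearly smaller than
    the second-nearest one. desc_b needs at least two rows.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dist)
        best, second = order[0], order[1]
        if dist[best] < max_ratio * dist[second]:
            matches.append((i, int(best)))
    return matches

def fit_affine(src, dst):
    """Least-squares affine fit: find A (2x2) and t (2,) with dst ~ src @ A.T + t.

    src and dst are (n, 2) arrays of matched keypoint locations, n >= 3.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])      # n x 3 design matrix
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # 3 x 2 solution
    return params[:2].T, params[2]
```

In a full pipeline the affine fit would be run on each cluster of matches that agree on pose, and matches with large residuals would be discarded before refitting.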
Conclusion
A novel method for detecting interest points
Histograms of oriented gradients are
becoming more popular
SIFT may not be optimal for general object
classification