
CCAI-433: Computer Vision

Chapter 3-Part2: Feature Descriptors

AI Program
College of Computer Science and Engineering
University of Jeddah
Slides are based on the material of COMP425/6341 Computer Vision
at Concordia University by Charalambos Poullis, Ph.D.
http://vision.stanford.edu/teaching/cs131_fall1819/syllabus.html
Compiled and modified by Dr. Nuha Zamzami, CSAI, University of Jeddah
How can we find corresponding points?
How do we describe an image patch?
Patches with similar content should have similar descriptors.
Raw patches as local descriptors
The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. But this is very sensitive to even small shifts and rotations.
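As a rough sketch of this idea (Python with NumPy assumed; the image, coordinates, and patch size are illustrative):

```python
import numpy as np

def raw_patch_descriptor(image, x, y, size=8):
    """Describe an interest point by the raw intensities around it."""
    half = size // 2
    patch = image[y - half:y + half, x - half:x + half]
    return patch.flatten().astype(np.float32)  # e.g. a 64-dim vector for size=8

# A one-pixel shift of the interest point already changes every entry,
# which is why raw patches make fragile descriptors.
img = np.random.rand(32, 32)
d1 = raw_patch_descriptor(img, 16, 16)
d2 = raw_patch_descriptor(img, 17, 16)  # shifted by one pixel
print(np.sum((d1 - d2) ** 2))           # typically large relative to the signal
```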
Scale Invariant Detectors
Experimental evaluation of detectors w.r.t. scale change.

Repeatability rate = (# correspondences) / (# possible correspondences)

K. Mikolajczyk, C. Schmid. "Indexing Based on Scale Invariant Interest Points". ICCV 2001
Scale Invariant Detection: Summary
Given: two images of the same scene with a large scale difference between them.
Goal: find the same interest points independently in each image.
Solution: search for maxima of suitable functions in scale and in space (over the image).

Methods:
1. Harris-Laplacian [Mikolajczyk, Schmid]: maximize the Laplacian over scale and Harris' measure of corner response over the image
2. SIFT (Scale-Invariant Feature Transform) [Lowe]: maximize the Difference of Gaussians over scale and space
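A minimal sketch of the Difference-of-Gaussians idea, assuming NumPy and SciPy; the sigma schedule and level count are illustrative, not Lowe's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Difference of Gaussians across scales: DoG_i = G(sigma*k) - G(sigma)."""
    sigmas = [sigma0 * k ** i for i in range(levels + 1)]
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels)])

# Interest points are local extrema of this stack in both space and scale;
# a full implementation compares each sample against its 26 neighbours.
```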
Advantages of invariant local features
Locality: features are local, so robust to occlusion and clutter (no prior segmentation).
Distinctiveness: individual features can be matched to a large database of objects.
Quantity: many features can be generated for even small objects.
Efficiency: close to real-time performance.
Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness.
SIFT descriptor
Full version (Local)
• Divide the 16x16 window into a 4x4 grid of cells
• Compute an 8-bin orientation histogram for each cell
• 16 cells * 8 orientations = 128-dimensional descriptor

Adapted from slide by David Lowe
Numeric Example

[Figure (by Yao Lu): an example patch of intensity values and the gradient orientations in each of the 16 pixels of one cell. The orientations all ended up in two bins: 11 in one bin, 5 in the other (rough count).]
SIFT descriptor
Full version
• Start with a 16x16 window (256 pixels)
• Divide the 16x16 window into a 4x4 grid of cells (16 cells)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128-dimensional descriptor
• Threshold-normalize the descriptor: scale it to unit length, clamp every entry at 0.2, then renormalize to unit length

Adapted from slide by David Lowe
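A simplified sketch of the descriptor construction and threshold normalization, assuming NumPy and that gradient magnitudes and orientations for the 16x16 window are already available; real SIFT also applies Gaussian weighting and trilinear interpolation, which are omitted here:

```python
import numpy as np

def sift_like_descriptor(mag, ang, eps=1e-7):
    """mag, ang: 16x16 arrays of gradient magnitude and orientation (radians).
    Returns a 128-d vector: 4x4 cells x 8 orientation bins (simplified)."""
    desc = []
    for cy in range(4):
        for cx in range(4):
            m = mag[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4].ravel()
            a = ang[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4].ravel() % (2 * np.pi)
            hist, _ = np.histogram(a, bins=8, range=(0, 2 * np.pi), weights=m)
            desc.append(hist)
    d = np.concatenate(desc).astype(float)
    d /= np.linalg.norm(d) + eps   # normalize to unit length
    d = np.minimum(d, 0.2)         # clamp: reduces the influence of large gradients
    d /= np.linalg.norm(d) + eps   # renormalize
    return d
```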


When does the SIFT descriptor fail?
Patches SIFT thought were the same but aren’t:
HOG (Histograms of Oriented Gradients): General Approach

In practice, the effect is very small (about 1%) while some computational time is required*

*Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 2005. Vol. II, pp. 886-893.
Computing gradients

Gradient masks evaluated by Dalal & Triggs, with miss rates at 10^-4 false positives per window (FPPW); lower is better:

• 1D centered, [-1, 0, 1]: 11%
• 1D uncentered, [-1, 1]: 12.5%
• 1D cubic-corrected, [1, -8, 0, 8, -1]: 12%
• 2x2 diagonal, [[0, 1], [-1, 0]] and [[-1, 0], [0, 1]]: 12.5%
• 3x3 Sobel, [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]] and [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]: 14%

The simple centered [-1, 0, 1] mask performs best.
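A minimal sketch of gradient computation with the centered mask, assuming NumPy and SciPy; the function and variable names are illustrative:

```python
import numpy as np
from scipy.ndimage import correlate1d

def gradients(image):
    """Gradient magnitude and orientation with the centered [-1, 0, 1] mask,
    the best-performing choice in Dalal & Triggs' evaluation."""
    image = image.astype(float)               # avoid integer wrap-around
    gx = correlate1d(image, [-1, 0, 1], axis=1)
    gy = correlate1d(image, [-1, 0, 1], axis=0)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation in [0, 180)
    return mag, ang
```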
Accumulate weighted votes over spatial cells
• How many bins should the histogram have?
• Should we use oriented or non-oriented gradients?
• How should we select the weights?
• Should we use overlapping blocks or not? If yes, how big should the overlap be?
• What block size should we use?
Contrast normalization

L1-norm:  v → v / (||v||_1 + ε)
L1-sqrt:  v → sqrt( v / (||v||_1 + ε) )
L2-norm:  v → v / sqrt( ||v||_2^2 + ε^2 )
L2-Hys:   L2-norm followed by clipping (limiting the maximum values of v to 0.2) and renormalizing
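A sketch of these normalization schemes in NumPy; the epsilon value and function name are illustrative:

```python
import numpy as np

def normalize_block(v, scheme="L2-Hys", eps=1e-5):
    """Block normalization schemes from the HOG paper (sketch)."""
    if scheme == "L1-norm":
        return v / (np.abs(v).sum() + eps)
    if scheme == "L1-sqrt":
        return np.sqrt(v / (np.abs(v).sum() + eps))
    if scheme == "L2-norm":
        return v / np.sqrt((v ** 2).sum() + eps ** 2)
    if scheme == "L2-Hys":
        v = v / np.sqrt((v ** 2).sum() + eps ** 2)
        v = np.minimum(v, 0.2)                         # clip at 0.2
        return v / np.sqrt((v ** 2).sum() + eps ** 2)  # renormalize
    raise ValueError(scheme)
```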
Making the feature vector

Variants of HOG descriptors. (a) A rectangular HOG (R-HOG) descriptor with 3 × 3 blocks of cells. (b) A circular HOG (C-HOG) descriptor with the central cell divided into angular sectors, as in shape contexts. (c) A C-HOG descriptor with a single central cell.
HOG feature vector for one block

f = (h11, ..., h91, h12, ..., h92, h13, ..., h93, h14, ..., h94)

where hij is the vote total in orientation bin i of cell j (9 bins per cell, 4 cells per block, so 36 values per block).
[Figure: a 4x4 example of gradient angles with the corresponding magnitudes, and two histograms over nine 20-degree orientation bins (0-19, 20-39, ..., 160-179) comparing binary voting with magnitude voting.]
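A minimal sketch of magnitude voting for one cell, assuming NumPy and unsigned gradient angles in [0, 180); binary voting would simply replace the magnitudes with ones:

```python
import numpy as np

def cell_histogram(ang, mag, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell.
    ang: unsigned angles in [0, 180); mag: gradient magnitudes."""
    bins = (ang // (180 // n_bins)).astype(int)  # 20-degree bins: 0-19, 20-39, ...
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())   # each pixel votes with its magnitude
    return hist
```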

The feature vector extends as the block window moves over the image: each block position appends its normalized histograms to the vector.


Other methods: SURF
For computational efficiency, SURF computes a gradient histogram with only 4 bins:

SURF: Speeded Up Robust Features
Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. ECCV 2006
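A rough sketch of the four sums SURF accumulates per subregion, assuming NumPy; real SURF computes Haar-wavelet responses on an integral image, which is omitted here:

```python
import numpy as np

def surf_like_bins(dx, dy):
    """The four per-subregion sums used by SURF (sketch):
    sum(dx), sum(|dx|), sum(dy), sum(|dy|)."""
    return np.array([dx.sum(), np.abs(dx).sum(), dy.sum(), np.abs(dy).sum()])
```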
Other methods: BRIEF
Randomly sample pairs of pixels a and b; record 1 if a > b, else 0, and store the resulting binary vector.

M. Calonder, V. Lepetit, C. Strecha, P. Fua. BRIEF: Binary Robust Independent Elementary Features. ECCV 2010
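A minimal BRIEF-style sketch in NumPy; the number of bits and the uniform sampling over the patch are illustrative simplifications of the paper's sampling strategies:

```python
import numpy as np

def brief_descriptor(patch, n_bits=256, seed=0):
    """BRIEF sketch: compare intensities at random pixel pairs (a, b);
    bit i is 1 if patch[a] > patch[b]."""
    rng = np.random.default_rng(seed)    # same seed => same test pattern every call
    flat = patch.ravel()
    a = rng.integers(0, flat.size, n_bits)
    b = rng.integers(0, flat.size, n_bits)
    return (flat[a] > flat[b]).astype(np.uint8)

# Matching uses Hamming distance: np.count_nonzero(d1 != d2)
```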
Descriptors and Matching
• The SIFT descriptor and its variants are used to describe an image patch, so that we can match two image patches.
• In addition to the descriptors, we need a distance measure to calculate how different two patches are.
Feature distance
How do we define the difference between two features f1, f2?
• A simple approach is SSD(f1, f2)
  – the sum of squared differences between the entries of the two descriptors: SSD(f1, f2) = Σ_i (f1i − f2i)²
  – but it can give good scores to very ambiguous (bad) matches

[Figure: feature f1 in image I1 and its match f2 in image I2]
Feature distance in practice
How do we define the difference between two features f1, f2?
• Better approach: ratio distance = SSD(f1, f2) / SSD(f1, f2′)
  – f2 is the best SSD match to f1 in I2
  – f2′ is the 2nd-best SSD match to f1 in I2
  – gives large values (~1) for ambiguous matches. WHY? Because an ambiguous f1 matches its best and second-best candidates almost equally well, so the two SSDs are nearly equal and their ratio approaches 1.

[Figure: f1 in image I1 with best match f2 and second-best match f2′ in image I2]
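A sketch of both distances in NumPy; names are illustrative:

```python
import numpy as np

def ssd(f1, f2):
    """Sum of squared differences between two descriptors."""
    return np.sum((f1 - f2) ** 2)

def ratio_distance(f1, candidates):
    """Lowe's ratio test: SSD to the best match over SSD to the 2nd best.
    Values near 1 mean the two best candidates are equally good (ambiguous)."""
    d = np.sum((candidates - f1) ** 2, axis=1)  # SSD to every candidate in I2
    best, second = np.partition(d, 1)[:2]       # the two smallest distances
    return best / (second + 1e-12)
```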
Other kinds of descriptors

• There are descriptors for other purposes:
• Describing shapes (e.g., HOG)
• Describing textures
• Describing features for image classification
• Describing features for a code book
Texture
• The texture features of a patch can be considered a descriptor.
• E.g. the LBP (Local Binary Patterns) histogram is a texture descriptor for a patch.

Varma, M., & Zisserman, A. (2008). A statistical approach to material classification using image patch exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2032-2047.
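A minimal sketch of the basic 3x3 LBP histogram in NumPy; the uniform-pattern and multi-scale variants used in practice are omitted:

```python
import numpy as np

def lbp_histogram(patch):
    """Basic 3x3 LBP: threshold the 8 neighbours of each pixel at the
    centre value, read the bits as a byte, and histogram the codes."""
    c = patch[1:-1, 1:-1]                       # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    h, w = patch.shape
    for bit, (dy, dx) in enumerate(offsets):
        n = patch[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]   # shifted neighbour view
        code |= (n >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()                    # normalized 256-bin texture descriptor
```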
Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
What is a bag-of-words representation?
• For a text document:
• Have a dictionary of non-common words
• Count the occurrence of each word in that document
• Make a histogram of the counts
• Normalize the histogram by dividing each count by the sum of all the counts
• The histogram is the representation.

Example dictionary: apple, worm, tree, dog, joint, leaf, grass, bush, fence
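A minimal sketch of this recipe in Python, using the example dictionary above:

```python
from collections import Counter

def bow_histogram(document, dictionary):
    """Normalized bag-of-words histogram over a fixed dictionary."""
    counts = Counter(w for w in document.lower().split() if w in dictionary)
    total = sum(counts.values()) or 1          # avoid division by zero
    return [counts[w] / total for w in dictionary]

dictionary = ["apple", "worm", "tree", "dog", "joint", "leaf", "grass", "bush", "fence"]
print(bow_histogram("the worm in the apple fell from the tree onto the grass", dictionary))
```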


Bags of features for image classification
1. Extract features
2. Learn "visual vocabulary"
3. Quantize features using visual vocabulary
4. Represent images by frequencies of "visual words"
A possible texture representation

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002,
2003; Lazebnik, Schmid & Ponce, 2003
1. Feature extraction

• Regular grid: every grid square is a feature
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005
• Interest point detector: the region around each point
  • Csurka et al., 2004
  • Fei-Fei & Perona, 2005
  • Sivic et al., 2005
1. Feature extraction

1. Detect patches
   [Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla '02] [Sivic & Zisserman '03]
2. Normalize patch
3. Compute SIFT descriptor [Lowe '99]

Slide credit: Josef Sivic
1. Feature extraction

The result: lots of feature descriptors for the whole image or set of images.
2. Discovering the visual vocabulary

[Figure: feature descriptors plotted as points in feature vector space]

What is the dimensionality?
2. Discovering the visual vocabulary

[Figure: clustering the points in feature space; the cluster centers form the visual vocabulary]

Slide credit: Josef Sivic
Clustering and vector quantization
• Clustering is a common method for learning a visual vocabulary or codebook
  • Each cluster center produced by k-means becomes a codevector
  • The codebook can be learned on a separate training set
• The codebook is used for quantizing features
  • A vector quantizer takes a feature vector and maps it to the index of the nearest code vector in the codebook
• Codebook = visual vocabulary
• Code vector = visual word
[Figure: feature vectors 1, 2, 3, ..., N each mapped to the index of the nearest code vector in the codebook]
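A sketch of codebook learning and vector quantization, assuming SciPy's k-means; the descriptors here are random stand-in data:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

# One 128-d SIFT descriptor per row, pooled from a training set (stand-in data)
descriptors = np.random.rand(1000, 128)

k = 50                                                  # vocabulary size
codebook, _ = kmeans2(descriptors, k, minit="points")   # cluster centers = visual words

# Vector quantization: map each new descriptor to its nearest code vector
new_descriptors = np.random.rand(200, 128)
words, _ = vq(new_descriptors, codebook)                # indices into the codebook

# The image representation is the normalized histogram of visual words
hist = np.bincount(words, minlength=k).astype(float)
hist /= hist.sum()
```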
Another example visual vocabulary

Fei-Fei et al. 2005


Example codebook

[Figure: excerpt of an appearance codebook] Source: B. Leibe

Another codebook

[Figure: excerpt of another appearance codebook] Source: B. Leibe
Visual vocabularies: Issues

• How to choose vocabulary size?
  • Too small: visual words not representative of all patches
  • Too large: quantization artifacts, overfitting
• Computational efficiency
  • Vocabulary trees (Nister & Stewenius, 2006)
3. Image representation: histogram of codewords

[Figure: bar chart with codewords on the x-axis and frequency on the y-axis]
Image classification
• Given the bag-of-features representations of images from
different classes, learn a classifier using machine learning
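A minimal sketch assuming scikit-learn, with random stand-in histograms and labels; any classifier could be substituted for the linear SVM:

```python
import numpy as np
from sklearn.svm import LinearSVC

# X: one normalized bag-of-words histogram per training image (stand-in data)
k = 50
X = np.random.rand(100, k)
y = np.random.randint(0, 2, 100)   # e.g. 0 = "no bike", 1 = "bike"

clf = LinearSVC().fit(X, y)        # linear SVMs are a common choice for BoW features
print(clf.predict(X[:5]))
```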
Review
• Describing images or image patches is very important for matching and recognition.
• The SIFT descriptor was invented in 1999 and is still very heavily used.
• Other descriptors are also available; some are much simpler, but less powerful.
• Texture and shape descriptors are also useful.
• Bag-of-words is a handy technique borrowed from text retrieval. Lots of people use it to compare images or regions.
