
Computer vision (unit-2)

In computer vision, gradient-based techniques help identify edges, corners, and textures in images by analyzing changes in intensity. These techniques are vital for feature detection.

Key Points about Difference of Gaussians (DoG):

1. Edge Detection:
DoG enhances edges by subtracting two smoothed images (with different blur levels). This suppresses low-frequency details (like flat regions) and highlights high-frequency details (like edges).

2. Noise Reduction:
Gaussian blurring reduces noise. DoG uses two blurs to remove fine noise while preserving important structure like edges or blobs.

3. Efficient Approximation of Laplacian of Gaussian (LoG):
DoG approximates the Laplacian of Gaussian, which is more computationally expensive. It gives similar results but is faster and easier to compute.

4. Blob Detection:
DoG is widely used in blob detection algorithms like SIFT (Scale-Invariant Feature Transform) to find keypoints in an image at different scales.



Drawback:
Reduction in overall image contrast. (Contrast refers to the difference in brightness or color between different parts of an image; it's what makes objects in an image stand out from the background and from each other.)
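A minimal sketch of DoG on a grayscale image (assuming OpenCV; the file name and the two sigma values are placeholders, chosen only to give two different blur levels):

```python
import cv2
import numpy as np

# Load a grayscale image (the path is a placeholder).
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Blur the same image with two different Gaussian sigmas.
blur_small = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)   # less smoothing
blur_large = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)   # more smoothing

# Difference of Gaussians: keeps structures between the two scales
# (edges, blobs) and suppresses flat, low-frequency regions.
dog = blur_small - blur_large
```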

HOG (Histogram of Oriented Gradients)
Unlike simple edge detectors, which just mark where the edges are, HOG summarizes in every patch both how strong the edges are and which way they point.

That extra directional information helps a computer recognize shapes (like the outline of a person) much more reliably than just knowing where edges lie.

The Histogram of Oriented Gradients (HOG) is a powerful feature descriptor used in computer vision, especially for object detection.

Interest points (keypoints)



A point in the image where the local appearance (intensity pattern) is distinctive
enough to be recognized again — even if the image is rotated, scaled, or
changed a little.
Keypoints (or interest points) are distinctive, unique points in an image—like
corners, edges, or blobs—that:

Stand out from their surroundings, and

Can be reliably detected even if the image is scaled, rotated, or slightly changed.

Think of an image like this:

A plain white wall: Every pixel has a location, but they all look the same —
there's no texture or change, so it's hard to pick a "special" spot.

A chessboard corner: That junction of black and white squares stands out
sharply from its neighbors. It’s visually unique and easy to find again in other
images.

Characteristics of a keypoint:

One or more image properties (color, texture, intensity, etc.) may change; interest points should be robust to these changes.

A corner point is an example of an interest point.

Key Descriptors
Key descriptors: vectors that describe the appearance around each keypoint.
Key descriptors (also called feature descriptors) are used to uniquely describe
and compare interest points (keypoints) in images. They encode the local image
appearance around each keypoint into a numeric vector so that keypoints from
different images can be matched, tracked, or used in recognition tasks.

Process of Calculating HOG (Step 1)

Resize the image to an integer multiple of 8 (the multiple of 8 nearest to the original size).



Process of Calculating HOG (Step 2)
To calculate the gradients of a patch:

Divide the image patch into cells of the same size (e.g., 8x8).

The cell size can also be 16x16 or larger.

Determine the gradient of each pixel.

Process of Calculating HOG (Steps 3 and 4)

Step-4 (continuation)

Step-5
Block Formation:
A block is made of 2×2 cells, i.e., 16×16 pixels (since each cell is 8×8).

So:
📦 1 block = 4 cells = 4 histograms = 4 × 9 = 36 values



• The 16×16 block is moved in steps of 8 pixels (i.e., 50% overlap with the previous block).

Step-6



Step-7
Classify images

Each image is represented by a descriptor (feature vector) of length 3,780.

Train a classifier (e.g., an SVM) using the descriptors of the images.
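A minimal end-to-end sketch using scikit-image's HOG implementation (an assumption of this example, not named in the notes); the 64x128 window is the standard pedestrian-detection setup, which is also where the length of 3,780 comes from:

```python
import numpy as np
from skimage.feature import hog

# One 64x128 grayscale detection window (placeholder values).
window = np.random.rand(128, 64)          # rows x cols

descriptor = hog(window,
                 orientations=9,          # 9 bins per cell histogram
                 pixels_per_cell=(8, 8),  # 8x8 cells
                 cells_per_block=(2, 2),  # 2x2 cells per block, 50% overlap
                 block_norm='L2-Hys')

# Blocks: (64/8 - 1) x (128/8 - 1) = 7 x 15 = 105
# Length: 105 blocks x 36 values per block = 3780
print(descriptor.shape)                   # (3780,)
```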

Feature Extraction
Interest Points (Keypoints)
As defined above: distinctive, unique points in an image (corners, blobs, edges) that stand out from their surroundings and can be reliably detected even if the image is scaled, rotated, or slightly changed.

Examples of Keypoints:
1. Corners

2. Blobs

3. Edges with intersections or curvature

Key Descriptors
Key descriptors (also called feature descriptors): vectors that describe the appearance around each keypoint so that keypoints can be compared mathematically. They encode the local image appearance around each keypoint into a numeric vector, so that keypoints from different images can be matched, tracked, or used in recognition tasks.

Global Descriptors:
Scope: They describe the entire image (or a large region of it).



Relation to Keypoints: Global descriptors don’t focus on a specific keypoint.
Instead, they provide a summary of the whole image or scene.

Example: A color histogram, texture descriptor, or shape descriptor like HOG.

Use case: Image classification — you compare the overall visual content
of one image with another.

Characteristics:

Sensitive to transformations like rotation, scaling, or partial occlusion.

Not as robust to changes in viewpoint because they treat the image as a whole.

Local Descriptors:
Scope: They describe small patches of the image around specific keypoints
(interest points).

Relation to Keypoints: Local descriptors focus on the appearance around keypoints (like a corner or blob) and encode the local texture and structure of the image around them.

Example: SIFT, ORB, SURF.

Use case: Feature matching — you compare local regions between two
images (e.g., finding corresponding points for image stitching).

Characteristics:

Robust to transformations like rotation, scaling, and lighting.

Helps match specific keypoints even if the rest of the image has changed.
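A minimal matching sketch that makes the local-descriptor use case concrete (assuming OpenCV's ORB, with placeholder file names; any of SIFT, SURF or ORB could be substituted):

```python
import cv2

img1 = cv2.imread("scene1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene2.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute a local (binary) descriptor around each one.
orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between the two images (Hamming distance suits ORB).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
# The best matches give corresponding keypoints, e.g. for image stitching.
```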

Characteristics of good features


Repeatability
The same features should be detected across different images of the same scene,
even if there are changes in angle, scale, or lighting.

Saliency



Features should be unique and easily distinguishable from their surroundings,
making them reliable for matching.

Compactness
A small number of meaningful features should be enough to represent the whole
image—no need to process every pixel.

Efficiency
The detection process should be fast enough to work in real-time applications like
video, robotics, or AR.

Locality
Features should be based on small, local regions in the image, so they’re easier to
track and match.

Robustness to Clutter and Occlusion


Features should still be detectable even when parts of the image are hidden or
surrounded by irrelevant objects.

Covariance
If the image is transformed (rotated, scaled, or lit differently), the detected features should transform along with it, so corresponding features still appear at the same relative positions.

Corner points



A corner is a point in an image where intensity changes significantly in multiple
directions.

Corner points are a special case of interest points


Interest points are distinctive and identifiable locations in an image (e.g.,
blobs, edges, corners).

Corners are a specific type of interest point that are particularly stable and
useful for matching and tracking.

Corner – Intersection of two or more edge segments


A corner is formed where two edges meet, creating a region with a sharp
intensity change in multiple directions.

Well localized in image space


Corners have a precise location; they don’t shift easily with small image
changes, making them reliable for matching.

Stable against scaling and rotation



Corners remain consistent when the image is zoomed in/out or rotated,
making them suitable for comparing different views.

Can compute interest points accurately


Algorithms like Harris or Shi-Tomasi can detect corners with high accuracy,
identifying their exact location in the image.

Corner detector
Locate interest points where the surrounding neighborhood shows edges in more than one direction.

These points are image corners.

The change in intensity is tested for a shift (u, v) of a window around each point.



There are two detectors:
1. Harris corner detector

2. Hessian corner detector

The rest of the process is the same as above.
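A minimal sketch using OpenCV's built-in Harris detector (the blockSize, ksize, k and threshold values are common defaults, not values prescribed by these notes):

```python
import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# cornerHarris builds the structure matrix from Sobel gradients (ksize) over a
# blockSize window and returns R = det(M) - k * trace(M)^2 for every pixel.
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Corners are the pixels whose response is a large positive value.
corners = response > 0.01 * response.max()
```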

Hessian Corner Detector

Characteristics of Harris and Hessian detectors:
Corner points should be detected at corresponding locations in another image even if the image is rotated.

Harris and Hessian detectors are rotation invariant.
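For comparison, a rough sketch of the Hessian detector's response map (assuming SciPy for the Gaussian second derivatives; the sigma value is a placeholder):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_determinant(gray, sigma=2.0):
    # Second-order Gaussian derivatives Ixx, Iyy, Ixy (axis 1 = x, axis 0 = y).
    Ixx = gaussian_filter(gray, sigma, order=(0, 2))
    Iyy = gaussian_filter(gray, sigma, order=(2, 0))
    Ixy = gaussian_filter(gray, sigma, order=(1, 1))
    # Interest points are local maxima of det(H) = Ixx * Iyy - Ixy^2.
    return Ixx * Iyy - Ixy ** 2
```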

SIFT and SURF

🧠 SIFT Algorithm (Orientation Assignment)
🔷 Goal:
Assign a consistent orientation to each keypoint to ensure rotation invariance.

🔹 Orientation Assignment Steps:


1. Window Selection: Around each detected keypoint, a 16×16 pixel window is
selected.

2. Subdivision into Cells:

This window is divided into 16 cells of 4×4 pixels each (i.e., 4×4 grid of
cells).



3. Orientation Histogram (per cell):

Each 4×4 cell builds a histogram of gradient orientations.

Each histogram has 8 bins, covering the entire 360° range.



4. Aggregate all histograms:

The next step is to combine these histograms into a single, global orientation histogram. This is done by concatenating the individual histograms from the 16 cells into a single long vector of 128 values (16 cells × 8 bins per cell).

This combined vector represents the distribution of gradient orientations over the entire local window around the keypoint.

5. Find the dominant orientation:

To find the dominant orientation, look for the largest peak in the combined global histogram. This peak corresponds to the most frequent gradient orientation in the local neighborhood of the keypoint.

The dominant orientation is the angle that corresponds to the peak in this aggregated histogram. It represents the primary direction in which the image gradients are oriented in the keypoint's region.

6. Assign the dominant orientation to the keypoint:

Once the dominant orientation is found, it is assigned to the keypoint. This ensures that the keypoint descriptor is rotation-invariant, because the local window around the keypoint is now oriented according to this dominant direction.

7. Rotation of the descriptor:

After the keypoint's orientation is set, the entire local descriptor (the 16×16 window of cells) is rotated so that the dominant orientation aligns with a fixed reference direction (usually 0°).

This rotation of the window ensures that the descriptor is invariant to rotation, because the local feature vectors (gradients) are now aligned relative to this fixed reference.
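A minimal sketch of the per-cell orientation histogram described in steps 3 and 4 (assuming NumPy; dx and dy hold the gradient components of one 4×4 cell, and the 8 bins span 0 to 360 degrees):

```python
import numpy as np

def cell_orientation_histogram(dx, dy, bins=8):
    """dx, dy: 4x4 arrays of gradient components for one cell."""
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0   # 0..360 degrees
    # Each pixel votes into one of 8 bins (45 degrees wide), weighted by its magnitude.
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 360.0), weights=magnitude)
    return hist

# Concatenating the histograms of all 16 cells gives the 128-value vector;
# the largest peak of the aggregated histogram gives the dominant orientation.
```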

Keypoint Descriptor:
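In practice the full 128-dimensional descriptor is computed by a library rather than by hand. A minimal usage sketch, assuming OpenCV 4.4 or newer (where SIFT is exposed as cv2.SIFT_create) and a placeholder image path:

```python
import cv2

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
# Each keypoint carries its location, scale and dominant orientation;
# descriptors is an N x 128 array, one row per keypoint.
keypoints, descriptors = sift.detectAndCompute(gray, None)
```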

SURF (refined)
Integral Image
The first step in the SURF algorithm is to compute the integral image, which
allows for fast computation of box filters used in keypoint detection, orientation
assignment, and descriptor extraction.
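A minimal sketch of an integral image and of a constant-time box sum computed from it (assuming NumPy; names are illustrative):

```python
import numpy as np

def integral_image(img):
    # Each entry holds the sum of all pixels above and to the left of it (inclusive).
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] using at most four lookups in the integral image ii."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

Because every box sum costs the same regardless of its size, box filters of any scale (9x9, 15x15, 27x27, ...) can be evaluated equally fast, which is what makes the SURF scale space cheap to build.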

Scale space creation:


Instead of serial downsampling, each pyramid level is built by upscaling the
image in parallel.

Each scale corresponds to the response of the image after it has been
convolved with a box filter of a specific size (e.g., 9x9, 15x15, 27x27).

The scale space is organized into octaves, where each octave contains a set
of responses from the filters applied at different scales.

Hessian matrix calculation:


Response Map: For each filter size (i.e., each scale level), we compute the
determinant of the Hessian matrix at every pixel, using the formula:
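In the standard SURF formulation this determinant is approximated as det(H_approx) = Dxx * Dyy - (w * Dxy)^2, where Dxx, Dyy and Dxy are the box-filter responses at that pixel and w is a weighting factor (approximately 0.9 in the original paper).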



w balances the relative weights of the components.

Octaves & Levels: These different filter sizes (scales) are organized into
octaves, with typically 4 levels per octave. Each octave uses a progressively
larger filter to detect larger features across different image scales.

Non-maximum suppression (NMS)


To localize interest points in the image and over scales, a non-maximum
suppression (non-maximum pixels are set to 0) in a 3 x 3 x 3 neighborhood is
applied.

Orientation Assignment:
For each keypoint (interest point) in the image, we calculate the Haar wavelet
responses in the x and y directions.



The x-direction captures horizontal changes in intensity (e.g., left to right).

The y-direction captures vertical changes (e.g., top to bottom).

These responses are calculated within a circular neighborhood around the keypoint.

The radius of this neighborhood is 6s, where s is the scale of the keypoint.

The Haar wavelet responses in the x-direction (dx) and y-direction (dy) are treated as vectors at their respective locations in the neighborhood.

The response in the x-direction (dx) gives the horizontal gradient.

The response in the y-direction (dy) gives the vertical gradient.

These vectors are plotted, not in the literal sense, but conceptually.

Each Haar response at a pixel becomes a small vector represented by (dx, dy).



For each keypoint, SURF samples 100 points (sub-regions) around it, regardless of the scale (sigma value, s).

These points are distributed uniformly across the region of interest, and gradients (Gx, Gy) are computed at each of these 100 locations.

📌 Example: Horizontal Haar Wavelet (x-direction with filter)

Suppose we have 6 pixel intensities across a row:

[ 20 22 24 | 40 42 44 ]
   Left        Right

We use the Haar wavelet filter for the x-direction:

Haar x-filter: [ -1 -1 -1 +1 +1 +1 ]

This acts like a mask applied over the pixel window.

🔢 Step-by-step Calculation (Filter Multiplication)

Now we multiply each pixel by its corresponding filter value:

= (20 × -1) + (22 × -1) + (24 × -1) + (40 × +1) + (42 × +1) + (44 × +1)
= -20 - 22 - 24 + 40 + 42 + 44
= (-66) + (126)
= 60
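The same calculation as a minimal NumPy sketch (pixel values taken from the example above):

```python
import numpy as np

pixels = np.array([20, 22, 24, 40, 42, 44], dtype=float)
haar_x = np.array([-1, -1, -1, +1, +1, +1], dtype=float)

# Haar x-response = (sum of the right half) - (sum of the left half).
dx = np.dot(pixels, haar_x)
print(dx)  # 60.0
```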

What is a sliding window?



A sliding orientation window is a sector that moves around the keypoint in
angular increments.

The window has a size of π/3 (60°), meaning that it covers 60° of the full
360° circle around the keypoint.

What happens here:


The window slides around the full 360° circle, and for each window:

The vectors (dx, dy) that fall inside the angular range of the window are
summed.

Note:

The Haar wavelet responses are weighted using a Gaussian function to give more importance to pixels closer to the keypoint.

The Gaussian's standard deviation is σ = 2s, where s is the scale. This means larger features have a larger Gaussian window and more spread-out weights.

This gives a total x-strength and a total y-strength for that window:

Sum all the dx values → this gives the x-strength.

Sum all the dy values → this gives the y-strength.

The result is a local orientation vector for each angular window.

Dominant orientation
After the sliding window completes the full circle (360°), we look at the local orientation vectors produced in each window.

The window with the largest resulting vector (in terms of magnitude) is selected as the dominant orientation of the keypoint.

This dominant orientation helps make the descriptor (calculated later) rotation invariant.
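A minimal sketch of this sliding-window search (assuming NumPy; angles holds the angular position of each sampled point around the keypoint, dx and dy its Gaussian-weighted Haar responses, and the 5-degree step is an illustrative choice, not fixed by these notes):

```python
import numpy as np

def dominant_orientation(angles, dx, dy, window=np.pi / 3, step=np.radians(5)):
    """Return the dominant orientation found by a sliding 60-degree sector."""
    best_angle, best_norm = 0.0, -1.0
    for start in np.arange(0.0, 2 * np.pi, step):
        # Select the samples whose angle falls inside the current 60-degree sector.
        inside = (angles - start) % (2 * np.pi) < window
        sx, sy = dx[inside].sum(), dy[inside].sum()   # total x-strength and y-strength
        norm = np.hypot(sx, sy)
        if norm > best_norm:
            best_norm, best_angle = norm, np.arctan2(sy, sx)
    return best_angle
```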



Keypoint Descriptor
1. Define a square window around the interest point:

Around each keypoint (interest point), imagine a square window of size 20 times the scale (20s). This window is centered on the keypoint.

2. Divide the window into smaller sections:

Inside that square window, divide it into a 4 x 4 grid, so you have 16 smaller boxes.

3. Calculate Haar wavelet responses:

For each of the 16 smaller boxes, calculate the Haar wavelet responses.
This basically tells you how the image intensity changes in the horizontal
and vertical directions within each box.

4. Extract metrics:

For each box, you calculate four metrics (numbers) based on the Haar
wavelet responses. These metrics are taken from 5 x 5 equally spaced
points inside each box.

Metrics:
Sum of horizontal responses (dx)

Sum of vertical responses (dy)

Sum of squared horizontal responses (dx²)

Sum of squared vertical responses (dy²)

5. Create the local feature vector:

Once you've calculated the metrics over the sample points of a box, the four sums form a small feature vector that represents that box.

6. Combine all the metrics:

After calculating these values for all 16 boxes, you combine (concatenate) all of them together to form a 64-element feature vector. This vector describes the keypoint and its local neighborhood.
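A minimal sketch of assembling the 64-element vector from the four per-box metrics listed above (assuming NumPy, with dx_boxes and dy_boxes holding the Haar responses at the 5x5 sample points of each of the 16 boxes; note that the original SURF paper sums |dx| and |dy| rather than squared responses, but this sketch follows these notes):

```python
import numpy as np

def surf_descriptor(dx_boxes, dy_boxes):
    """dx_boxes, dy_boxes: lists of 16 arrays (5x5 Haar responses per box)."""
    features = []
    for dx, dy in zip(dx_boxes, dy_boxes):
        # Four metrics per box, following the notes above.
        features.extend([dx.sum(), dy.sum(), (dx ** 2).sum(), (dy ** 2).sum()])
    return np.array(features)   # 16 boxes x 4 metrics = 64 values
```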

SURF (summary)

Step 1)
Compute the integral image.
Build the scale space using box filters of different sizes (keep increasing the box filter size).
Compute the Hessian matrix over the scale space.
Out of the 3*3*3 = 27 neighbouring values, pick the dominant (maximum) one (non-maximum suppression).

Step 2) For rotation invariance:

Around each keypoint, take a circular neighbourhood of radius 6s.

Compute dx/dy at 100 sample points using Haar wavelets (there will be many such groups of 100).
Form sectors of 60° each, covering the whole 360° of the circle of radius 6s.
Within each 60° sector, sum the dx/dy responses that fall inside it (adding the dx/dy from the different groups of 100).
Out of the 6 sectors, pick the most dominant orientation (this makes it rotation invariant).

Step 3) Keypoint descriptor

After the dominant value has been picked from the 3*3*3 = 27 values (i.e., after Step 1), take a square window of size 20s around each keypoint.
Divide it into a 4*4 grid of 16 boxes.
From each box of the 4*4 grid, extract 4 metrics (dx, dy, dx^2, dy^2) using Haar wavelets.
So 4 metrics per box and 16 boxes, therefore a 16*4 = 64-element feature vector.

