
Unit 3

A Method for Image Segmentation in


Computer Vision
Active contours, Snakes, Dynamic snakes and
CONDENSATION, Scissors, Level Sets, Split and merge,
Mean shift and mode finding, Normalized cuts, Graph
cuts and energy-based methods, 2D and 3D feature-
based alignment, Pose estimation
What is Image segmentation?
• Partitioning the input image by clustering pixel values of the image.
• It is mainly used for identifying various surfaces or living or non-living objects in an image.
• Image segmentation is the task of finding groups of pixels that “go
together”.

• In statistics, this problem is known as cluster analysis and is a widely


studied area with hundreds of different algorithms.

• In computer vision, image segmentation is one of the oldest and most


widely studied problems.

• Early techniques tend to use region splitting or merging which


correspond to divisive and agglomerative algorithms in the clustering
literature.

• More recent algorithms often optimize some global criterion, such as


intra-region consistency and inter-region boundary lengths or
dissimilarity.
Image segmentation techniques
• Active contours,
• Split & merge,
• Watershed,
• Region splitting,
• Region merging,
• Graph-based segmentation,
• Mean shift and mode finding, and
• Normalized cut.
Active contours
https://www.analyticsvidhya.com/blog/2021/09/active-contours-a-
method-for-image-segmentation-in-computer-vision/

https://youtu.be/FROJUMk9P3Y
What are Active Contours?
• Defined as an active model for the segmentation process.
• Contours are the boundaries that define the region of
interest in an image.
• A contour is a collection of points that have been
interpolated.
• The interpolation procedure might be linear, splines, or
polynomial, depending on how the curve in the image is
described.
• A segmentation method that uses energy forces and constraints to separate the pixels of interest from a picture for further processing and analysis.
Why are Active Contours needed?
• To define smooth shapes in images and to construct
closed contours for regions.
• It is mainly used to identify uneven shapes in images.
• Used in a variety of medical image segmentation
applications.
• Ex:
• The separation of desired regions from a variety of medical
images.
• A slice of a brain CT scan, for example, is examined for
segmentation using active contour models.
How does Active Contour work?
• Technique of obtaining deformable models or structures in an
image with constraints and forces for segmentation.
• Contour models define the object borders or other picture
features to generate a parametric curve or contour.
• The curvature of the models is determined using several
contour techniques that employ external and internal
forces.
• The energy function is always related to the image’s curve.
• External energy is described as the sum of forces caused by the picture, which is specifically used to control the placement of the contour on the image; internal energy is used to govern deformable changes.
How does Active Contour work?
• The contour segmentation constraints for a certain
image are determined by the needs.
• The desired shape is obtained by defining the
energy function.
• A collection of points that locate a contour is used to
describe contour deformation.
• This shape corresponds to the desired image contour,
which was defined by minimizing the energy
function.
Active Contour Models - 1.
Snake Model
• Has the ability to solve a broad range of
segmentation problems.
• The model’s primary function is to identify and
outline the target object for segmentation.
• It requires some prior knowledge of the target
object’s shape, especially for complicated things.
• Active snake models, often known as snakes, are
generally configured by the use of spline focused on
minimizing energy, followed by various forces
governing the image.
Active Contour Models - 1.
Snake Model
• A simple snake model can be denoted by a set of n points v_i for i = 0, …, n−1, an internal elastic energy term E_internal, and an external edge-based energy term E_external.
• The internal energy term’s aim is to regulate the
snake’s deformations, while the exterior energy term’s
function is to control the contour’s fitting onto the
image.
• The external energy is typically a combination of forces caused by the picture, E_image, and constraint forces imposed by the user, E_con.
snake’s energy function

The snake’s energy function is the total of its exterior and internal energy, which can be written as below.
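A standard way to write this (the Kass–Witkin–Terzopoulos formulation, assumed here to match the slide's equation) is

E_{snake} = \int_0^1 \big[ E_{internal}(v(s)) + E_{image}(v(s)) + E_{con}(v(s)) \big] \, ds, \qquad E_{external} = E_{image} + E_{con}.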
ADVANTAGE & DISADVANTAGE
ADVANTAGES:
• The applications of the active snake model are expanding rapidly, particularly in the many imaging domains.
• In the field of medical imaging, the snake model is used to segment one portion of an image that has unique characteristics when compared to other regions of the picture.
• Traditional snake model applications in medical imaging include optic disc and cup segmentation to identify glaucoma, cell image segmentation, vascular region segmentation, and segmentation of several other regions for the diagnosis and study of disorders or anomalies.
DISADVANTAGE:
• The conventional active snake model approach has various inefficiencies, such as noise sensitivity and erroneous contour detection in high-complexity objects, which are addressed in advanced contour methods.
Gradient Vector Flow Model
• It is a more developed and well-defined version of the snake or
active contour models.
• The traditional snake model has two limitations: inadequate contour convergence for concave boundaries, and poor convergence when the snake curve is initialized at a great distance from the minimum.
• As an extension, the gradient vector flow model uses the
gradient vector flow field as an energy constraint to determine
the contour flow.
Equation
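A commonly cited form of the GVF energy (Xu and Prince's formulation, assumed here to match the slide) defines the field v(x, y) = (u(x, y), v(x, y)) that minimizes

E_{GVF} = \iint \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) + |\nabla f|^2 \, |\mathbf{v} - \nabla f|^2 \, dx \, dy,

where f is an edge map of the image and μ is the smoothing (regularization) parameter discussed below.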
ADVANTAGE & DISADVANTAGE
ADVANTAGES:
• Is an advanced version of the snake model that is used for various image processing applications, especially in medical image processing.
• The segmentation of regions with particular parameters in medical imaging is done with the help of active contour models.
• Because these models create a contour around the target object, it is separated from the image.
DISADVANTAGE:
• The main difficulty with utilizing GVF is that the smoothing term μ causes the contour’s edges to round. Reducing the value of μ minimizes rounding but increases the amount of smoothing.
Balloon Model
• A snake model isn’t drawn to far-off edges.
• If no significant image forces apply to the snake model, its
inner side will shrink. A snake that is larger than the minima
contour will eventually shrink into it, whereas a snake that is
smaller than the minima contour will not discover the
minima and will instead continue to shrink.
• To address the constraints of the snake model, the balloon
model was developed, in which an inflation factor is
incorporated into the forces acting on the snake.
• The inflation force can overwhelm forces from weak edges,
exacerbating the problem with first guess localization.
Equation
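A commonly cited form of the balloon force (Cohen's formulation, assumed here to match the slide) adds an inflation term along the curve normal:

F = k_1 \, \mathbf{n}(s) - k \, \frac{\nabla P}{|\nabla P|},

where \mathbf{n}(s) is the unit normal to the curve at v(s), k_1 controls the inflation ("balloon") pressure, and P is the image potential (e.g. P = -|\nabla I|^2).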
ADVANTAGE & DISADVANTAGE
ADVANTAGES:
• Used to segment various medical pictures.
• The application’s primary purpose is to propose a novel technique for segmenting 2-D images and reconstructing 3-D meshes that ensures a watertight mesh.
DISADVANTAGE:
• The balloon model’s biggest problem is its slow processing, which makes it difficult to manage sharp edges and requires careful object placement. The balloon model is commonly used in analyzing picture contour extraction.
Geometric or geodesic active
contour models
• Geometric active contour (GAC) is a form of contour model that adjusts a smooth curve established in the Euclidean plane by moving the curve’s points perpendicular to the curve.
• The points move at a rate proportionate to the curvature of the image’s
region.
• The geometric flow of the curve and the recognition of items in the
image are used to characterize contours.
• Geometric flow encompasses both internal and external geometric
measures in the region of interest.
• In the process of detecting items in an image, a geometric replacement
for snakes is utilized.
• These contour models rely heavily on the level set functions that
specify the image’s unique regions for segmentation.
Equation
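A representative curve-evolution form of the geodesic active contour (assumed here; the exact equation on the slide may differ) is

\frac{\partial C}{\partial t} = g(I) \, \kappa \, \mathbf{N} - (\nabla g \cdot \mathbf{N}) \, \mathbf{N},

where C is the evolving curve, κ its curvature, \mathbf{N} the inward unit normal, and g(I) an edge-stopping function that becomes small on strong image edges.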
ADVANTAGE & DISADVANTAGE
ADVANTAGES:
• Mostly used in medical image computing, particularly image-based segmentation.
• In this case, any imaging modality’s picture is examined for segmentation in order to research, process, and analyze the regions of interest.
• These regions include any aberration that forms in the interior regions or organs of the human body, such as blood clots, traumas, lesions, cell abnormalities, metabolic interruptions, biomolecule disruptions, and so on.
DISADVANTAGE:
• Mostly it has no such inefficiencies, but these models are difficult to implement as they are complex in nature.
Intelligent Scissors
• Active contours allow users to roughly specify a boundary of interest and have the system evolve the contour towards a more accurate location as well as track it over time.

• The results of this curve evolution,


• May be unpredictable
• May require additional user-based hints to achieve the
desired result.

• An alternative approach is to have the system optimize the contour in real time as the user is drawing.
• This is the approach of the intelligent scissors system developed by Mortensen and Barrett (1995).

• As the user draws a rough outline (the white curve in


Figure 5.9a), the system computes and draws a better
curve that clings (adhere) to high-contrast edges (the
orange curve).

• To compute the optimal curve path (live-wire), the image


is first pre-processed to associate low costs with edges
(links between neighboring horizontal, vertical, and
diagonal, i.e., N8 neighbors) that are likely to be
boundary elements.
1. Take a 7×7 image and convert it into its pixel values.
2. Find Ix and Iy for every pixel using the following kernels (Sobel filters):
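The standard 3×3 Sobel kernels (assumed here to be the kernels referred to above) are:

Kx = [ -1  0 +1 ;  -2  0 +2 ;  -1  0 +1 ]   (gives Ix, the horizontal gradient)
Ky = [ -1 -2 -1 ;   0  0  0 ;  +1 +2 +1 ]   (gives Iy, the vertical gradient)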
3. Find the value of G for every pixel.
• For the image matrices Ix and Iy, take the corresponding entries: if you superimpose Iy over Ix, the values that fall in the same cell are the ones plugged into the distance formula G = sqrt(Ix^2 + Iy^2).
• For example, the first entry of Ix (−9) is combined with the first entry of Iy (13) to compute G at that point; the entries just below them (−1 and −5) give G at the next point, and so on.
• The highlighted value is the maximum value of G; we will use it in the next calculation.
4. Find the value of every node
• Here G_max will be 22, as we saw in the last step.
• The values of the matrix now become:

• Seed point is the starting point of our boundary selection. We select a


starting point (just like clicking the boundary of an object that you want
to select) and then select another point known as the free point which is
where we want the selection to end.
• It is important to note that the selection doesn’t have to be a closed one.
This means that the starting and ending points can be different.
5. Assign weights to each vertex
• Compute the weights using the filters.
• Start from the seed point and expand to all the neighbors.
6. Apply the following formula to calculate the cost:

cost = (max − |filter response|) × length

Here, like G_max, the max filter response will be the max value obtained after the filters are applied and the values are calculated.
7. Apply the Dijkstra Algorithm to determine the shortest path
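A minimal Python sketch of this final step, assuming a per-pixel link-cost array `cost` computed as in steps 1–6 (low cost on strong edges); the function and variable names are illustrative, not from the original slides.

import heapq
import numpy as np

def live_wire_path(cost, seed, free):
    """Find the minimum-cost path from the seed point to the free point
    over an 8-connected (N8) pixel graph using Dijkstra's algorithm."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[seed] = 0.0
    pq = [(0.0, seed)]                            # (accumulated cost, (y, x))
    neighbors = [(-1, -1), (-1, 0), (-1, 1),
                 (0, -1),           (0, 1),
                 (1, -1),  (1, 0),  (1, 1)]
    while pq:
        d, (y, x) = heapq.heappop(pq)
        if (y, x) == free:                        # stop at the free (mouse) point
            break
        if d > dist[y, x]:
            continue
        for dy, dx in neighbors:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                length = 1.41421356 if dy and dx else 1.0   # diagonal links are longer
                nd = d + cost[ny, nx] * length
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    prev[(ny, nx)] = (y, x)
                    heapq.heappush(pq, (nd, (ny, nx)))
    path, node = [], free                         # backtrack free -> seed
    while node != seed:
        path.append(node)
        node = prev[node]
    path.append(seed)
    return path[::-1]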
Figure 5.9 Intelligent scissors: (a) as the mouse traces the white path, the scissors
follow the orange path along the object boundary (the green curves show intermediate
positions) (Mortensen and Barrett 1995) © 1995 ACM; (b) regular scissors can
sometimes jump to a stronger (incorrect) boundary; (c) after training to the previous
segment, similar edge profiles are preferred (Mortensen and Barrett 1998) © 1995
Elsevier.
• Their system uses a combination of zero crossings, gradient magnitudes, and gradient orientations to compute these costs.

• Next, as the user traces a rough curve, the system


continuously recomputes the lowest cost path between
the starting seed point and the current mouse location
using Dijkstra’s algorithm,

• A breadth-first dynamic programming algorithm that


terminates at the current target location.
• In order to keep the system from jumping around
unpredictably, the system will “freeze” the curve to date
(reset the seed point) after a period of inactivity.

• To prevent the live wire from jumping onto adjacent higher-contrast contours, the system also “learns” the intensity profile under the current optimized curve, and uses this to preferentially keep the wire moving along the same (or a similar-looking) boundary (Figure 5.9b–c).
• Several extensions have been proposed to the basic
algorithm, which works remarkably well even in its
original form.

• Mortensen and Barrett (1999) use tobogganing, which is


a simple form of watershed region segmentation, to pre-
segment the image into regions whose boundaries
become candidates for optimized curve paths.

• The resulting region boundaries are turned into a much


smaller graph, where nodes are located wherever three
or four regions meet.
• The Dijkstra algorithm is run on this reduced graph,
resulting in much faster (and often more stable)
performance.

• Another extension to intelligent scissors is to use a


probabilistic framework that takes into account the
current trajectory (Path/Line) of the boundary, resulting
in a system called JetStream.

• Instead of re-computing an optimal curve at each time


instant, a simpler system can be developed by simply
“snapping” the current mouse position to the nearest
likely boundary point.
Level Sets
• A limitation of active contours based on parametric curves of the form f(s), e.g., snakes, B-snakes, and CONDENSATION, is that:
• It is challenging to change the topology of the curve as it evolves.
• If the shape changes dramatically, curve reparameterization may also be required.

• An alternative representation for such closed contours is


to use a level set, where the zero crossing(s) of a
characteristic (or signed distance) function define the
curve.
• Level sets evolve to fit and track objects of interest by modifying the underlying embedding function φ(x, y) instead of the curve f(s).

• To reduce the amount of computation required, only a small strip (frontier) around the locations of the current zero-crossing needs to be updated at each step, which results in what are called fast marching methods.

• An example of an evolution equation is the geodesic


active contour proposed by Caselles, Kimmel, and Sapiro
(1997) and Yezzi, Kichenassamy, Kumar et al. (1997),
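which, in the level-set form cited in Szeliski's text (assumed here to match the slide), reads

\frac{d\phi}{dt} = |\nabla \phi| \, \operatorname{div}\!\left( g(I) \, \frac{\nabla \phi}{|\nabla \phi|} \right) = g(I) \, |\nabla \phi| \, \operatorname{div}\!\left( \frac{\nabla \phi}{|\nabla \phi|} \right) + \nabla g(I) \cdot \nabla \phi, \qquad (5.19)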
• where g(I) is a generalized version of the snake edge potential
(5.5).

• To get an intuitive sense of the curve’s behavior, assume that the embedding function is a signed distance function away from the curve, in which case |∇φ| = 1.
• The first term in Equation (5.19) moves the curve in the direction of its curvature,
Figure 5.10 Level set evolution for a geodesic active contour. The embedding function φ is updated based on the curvature of the underlying surface modulated by the edge/speed function g(I), as well as the gradient of g(I), thereby attracting it to strong edges.
• i.e., it acts to straighten the curve, under the influence
of the modulation function g(I).
• The second term
• Moves the curve down the gradient of g(I), encouraging
the curve to migrate towards minima of g(I).
• Although this level-set formulation can readily change topology, it is still susceptible to local minima, since it is based on local measurements such as image gradients.
• An alternative approach is to re-cast the problem in a
segmentation framework, where the energy measures the
consistency of the image statistics (e.g., color, texture,
motion) inside and outside the segmented regions.

• These approaches build on earlier energy-based


segmentation frameworks introduced by Leclerc (1989),
Mumford and Shah (1989), and Chan and Vese (2001),

• Examples of such level-set segmentations are shown in


Figure 5.11,
• Which shows the evolution of the level sets from a series of
distributed circles towards the final binary segmentation.
Figure 5.11 Level set segmentation (Cremers, Rousson, and Deriche 2007) © 2007
Springer: (a) grayscale image segmentation and (b) color image segmentation. Uni-
variate and multi-variate Gaussians are used to model the foreground and background
pixel distributions. The initial circles evolve towards an accurate segmentation of
foreground and background, adapting their topology as they evolve.
Mean Shift and Mode Finding
• 2D representation of the pixel feature distribution; select a sample pixel
• Hill climbing using mean shift (sketched below)
• Mean shift results
• Advantages and disadvantages
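A minimal Python sketch of the hill-climbing step, assuming a Gaussian kernel; the function name, bandwidth, and stopping tolerance are illustrative choices, not from the slides.

import numpy as np

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-3, max_iter=100):
    """Starting from a sample point, repeatedly move to the kernel-weighted
    mean of nearby feature points (hill climbing) until the shift is tiny.
    The returned point approximates a mode of the feature density."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((points - x) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))      # Gaussian kernel weights
        new_x = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new_x - x) < tol:           # converged to a mode
            return new_x
        x = new_x
    return x

# Pixels whose feature vectors (e.g. color, or color + position) converge to
# the same mode are grouped into the same segment.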
Mean shift tutorial
• https://www.youtube.com/watch?v=PCNz_zttmtA
Pose Estimation
Pose Estimation - Types
Pose Estimation – Key point based
• Pose estimation is a computer vision task that infers the pose of a person or object in an image or video. We can also think of pose estimation as the problem of determining the position and orientation of a camera relative to a given person or object (a small pose-recovery sketch appears after these bullets).

• This is typically done by identifying, locating, and tracking a


number of keypoints on a given object or person. For objects, this
could be corners or other significant features. And for humans,
these keypoints represent major joints like an elbow or knee.

• The goal of our machine learning models is to track these keypoints in images and videos.
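For the rigid-object case described above (keypoints such as corners on a known object), a minimal sketch of recovering the camera/object pose with OpenCV's PnP solver follows; all keypoint coordinates and intrinsics here are made-up, purely illustrative values.

import numpy as np
import cv2

# 3D keypoint positions on the object model (illustrative values)
object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],
                          [0, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.float64)
# Corresponding 2D keypoints detected in the image (illustrative values)
image_points = np.array([[320, 240], [400, 238], [405, 320],
                         [322, 318], [330, 180], [410, 182]], dtype=np.float64)
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float64)   # assumed intrinsics
dist_coeffs = np.zeros((4, 1))                            # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
# rvec (a Rodrigues rotation vector) and tvec give the object's pose relative
# to the camera, i.e. its 3D position and orientation.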
Categories of pose estimation
• When working with people, these keypoints represent major joints like elbows, knees, wrists, etc. This is
referred to as human pose estimation.
• Humans fall into a particular category of objects that are flexible. By bending our arms or legs, keypoints
will be in different positions relative to others. Most inanimate objects are rigid. For instance, the corners of
a brick are always the same distance apart regardless of the brick’s orientation. Predicting the position of
these objects is known as rigid pose estimation.
• There’s also a key distinction to be made between 2D and 3D pose estimation. 2D pose estimation simply
estimates the location of keypoints in 2D space relative to an image or video frame. The model estimates an
X and Y coordinate for each keypoint. 3D pose estimation works to transform an object in a 2D image into a
3D object by adding a z-dimension to the prediction.
• 3D pose estimation allows us to predict the actual spatial positioning of a depicted person or object. As you
might expect, 3D pose estimation is a more challenging problem for machine learners, given the complexity
required in creating datasets and algorithms that take into account a variety of factors – such as an image’s
or video’s background scene, lighting conditions, and more.
• Finally, there is a distinction between detecting one or multiple objects in an image or video. These two
approaches can be referred to as single and multi pose estimation, and are largely self-explanatory: Single
pose estimation approaches detect and track one person or object, while multi pose estimation approaches
detect and track multiple people or objects.
Primary techniques for pose
estimation
• In general, deep learning architectures suitable for pose estimation are
based on variations of convolutional neural networks (CNNs).
• There are two overarching approaches: a bottom-up approach, and a
top-down approach.
• With a bottom-up approach, the model detects every instance of a
particular keypoint (e.g. all left hands) in a given image and then
attempts to assemble groups of keypoints into skeletons for distinct
objects.
• A top-down approach is the inverse – the network first uses an object
detector to draw a box around each instance of an object, and then
estimates the keypoints within each cropped region.
Use cases and applications
• Human activity and movement
• Augmented reality experiences
• Animation & Gaming
• Robotics
Human activity and movement
• One of the clearest areas in which pose estimation is applicable is tracking and measuring human movement. Consider an AI-powered personal trainer that works simply by pointing a camera at a person completing a workout, and having a human pose estimation model (trained on a number of specific poses related to a workout regimen) indicate whether or not a given exercise has been completed properly.
• This kind of application could enable safer and more inspirational home workout routines, while also
increasing the accessibility and decreasing the costs associated with professional physical trainers.
• And given that pose estimation models can now run on mobile devices and without internet access,
this kind of application could easily extend the access of this kind of expertise to remote or otherwise
hard-to-reach places.
• Using human pose estimation to track human movement could also power a number of other
experiences, including but not limited to:
• AI-powered sports coaches
• Workplace activity monitoring
• Crowd counting and tracking (e.g. for retail outlets measuring foot traffic)
Augmented reality experiences
• While not immediately evident, pose estimation also presents an
opportunity to create more realistic and responsive augmented reality
(AR) experiences.
• AR allows us to place digital objects in real-world scenes. This could
be testing out a piece of furniture in your living room by placing a 3D
rendering of it in the space, or trying on a pair of digitally-rendered
shoes.
• If pose estimation is used to locate and accurately track a physical object in real-world space, then we can also overlay a digital AR object onto the real object that’s being tracked.
Animation & Gaming
• Traditionally, character animation has been a manual process that’s relied on bulky and
expensive motion capture systems. However, with the advent of deep learning approaches
to pose estimation, there’s the distinct potential that these systems can be streamlined
and in many ways automated.

• Recent advances in both pose estimation and motion capture technology are making this
shift possible, allowing for character animation that doesn’t rely on markers or specialized
suits, while still being able to capture motion in real-time.

• Similarly, capturing animations for immersive video game experiences can also potentially
be automated by deep learning-based pose estimation. This kind of gaming experience
was popularized with Microsoft’s Kinect depth camera, and advances in gesture
recognition promise to fulfill the real-time requirement that these systems demand.
Robotics
• Traditionally, industrial robotics have employed 2D vision systems to
enable robots to perform their various tasks. However, this 2D
approach presents a number of limitations. Namely, computing the position to which a robot should move, given this 2D representation of space, requires intensive calibration processes, and such systems become inflexible to environmental changes unless reprogrammed.

• With the advent of 3D pose estimation, however, the opportunity


exists to create more responsive, flexible, and accurate robotics
systems.
Region based Segmentation

• Edge-based and threshold-based methods alone often do not give good segmentation results.
• Region-based segmentation is based on the connectivity of similar pixels in a region.
• Two main approaches in region based segmentation:
• Region growing method
• Region splitting and merging method
Region Growing
• A procedure that groups pixels or sub-regions into larger regions.
• Simplest method, pixel aggregation: start with a set of seed points and grow regions by appending to each seed point those neighboring pixels that have similar properties (such as gray level, texture, color, or shape).
• Region-growing-based techniques perform better than edge-based techniques in noisy images, where edges are difficult to detect.
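A minimal Python sketch of pixel aggregation from a single seed point, assuming gray-level similarity to the seed as the homogeneity criterion; the names and threshold are illustrative.

from collections import deque
import numpy as np

def region_grow(image, seed, threshold=10):
    """Grow a region from `seed` (a (row, col) tuple) by appending
    4-connected neighbors whose gray level is within `threshold`
    of the seed's value. Returns a boolean mask of the region."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(image[ny, nx]) - seed_val) < threshold:
                    mask[ny, nx] = True
                    queue.append((ny, nx))
    return mask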
Procedure – Region Growing
Split a region into four quadrants when Zmax − Zmin > threshold (i.e., when the region is not sufficiently uniform).
Region splitting (divisive clustering)
• Splitting the image into successively finer regions is one of
the oldest techniques in computer vision.

• Ohlander, Price, and Reddy (1978) present such a


technique, which
• First computes a histogram for the whole image and then
• Finds a threshold that best separates the large peaks in the
histogram.

• This process is repeated until regions are either fairly


uniform or below a certain size.

• More recent splitting algorithms often optimize some


metric of intra-region similarity and inter-region
dissimilarity.
Region merging (agglomerative clustering)
• Region merging techniques also date back to the
beginnings of computer vision.

• Brice and Fennema (1970) use a dual grid for representing boundaries between pixels.

• They merge regions based on their relative boundary lengths and the strength of the visible edges at these boundaries.

• In data clustering, algorithms can link clusters together


based on the distance between their closest points
(single-link clustering), their farthest points (complete-
link clustering), or something in between.
Region merging (agglomerative clustering)
• Kamvar, Klein, and Manning (2002) provide a probabilistic
interpretation of these algorithms and show how
additional models can be incorporated within this
framework.

• A very simple version of pixel-based merging combines


adjacent regions whose average color difference is
below a threshold or whose regions are too small.
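A minimal sketch of this simple pixel-based merging, assuming a color image and an initial integer label map (e.g. superpixels); the union-find bookkeeping and the color threshold are illustrative choices.

import numpy as np

def merge_similar_regions(labels, image, color_thresh=15.0):
    """Merge adjacent regions whose mean colors differ by less than
    `color_thresh`. `labels` is an integer label image from an initial
    over-segmentation; `image` is an H x W x 3 color image."""
    ids = np.unique(labels)
    means = {i: image[labels == i].mean(axis=0) for i in ids}   # mean color per region
    parent = {i: i for i in ids}

    def find(i):                      # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # collect pairs of horizontally and vertically adjacent labels
    pairs = set(zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()))
    pairs |= set(zip(labels[:-1, :].ravel(), labels[1:, :].ravel()))
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb and np.linalg.norm(means[ra] - means[rb]) < color_thresh:
            parent[rb] = ra           # merge the two regions
    return np.vectorize(find)(labels) # relabel every pixel by its merged root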

• Segmenting the image into such superpixels, which are not semantically meaningful, can be a useful pre-processing stage to make higher-level algorithms such as stereo matching, optical flow, and recognition both faster and more robust.
Region based segmentation
• https://www.youtube.com/watch?v=DD52pKezdgY

• https://www.youtube.com/watch?v=V5VTJEKRcVs&t=589s
Thank You
