Digital Image Processing
Javier Montenegro Joo
www.VirtualDynamicsSoft.com
[email protected]
This document contains the class notes of the course on Digital Image Processing given by
Prof. Montenegro Joo to Science and Engineering graduate-level students.
Experiments based on the theory in this document may be executed with Imagery, the Digital
Image Processing EduVirtualLab authored by Prof. Montenegro Joo
Classroom lectures make use of the Imagery EduVirtualLab as a visual aid. Out of class,
students practice on their own images with Imagery, installed on the university Computer
Room machines.
Images, image transformations and operations on images shown in this document have all
been generated with the Imagery EduVirtualLab.
This is a computer-assisted course. When the Imagery icon appears next to a subject heading,
the reader will find experimental support about that theme in the Imagery EduVirtualLab.
The student taking this course on Digital Image Processing must have knowledge of, and experience with, a Computer Programming Language, Matrices, Analytic Geometry, Derivatives, Integrals, Gradients and the Laplacian.
Contents.-
1. Introduction.
2. Applications of DIP.
3. Two examples of situations that demand an efficient DIP.
4. Images from the point of view of DIP (Binary, Grey-level and Colour images).
5. Conversion of Colour to Grey-level images.
6. Digitization.
7. Pixels and their Neighbourhoods.
8. Geometric Transformations on Images (Translation, Rotation, Scaling, Shearing).
9. Problems generated by discretization.
10. Straight lines in DIP.
11. Variation of Darkness and brightness in an image
12. Image Colour Inversion.
13. Rotating and Flipping images.
14. Image subtraction.
15. Segmentation of images
16. Histograms. (Grey-leveled images and color images).
17. Histogram-based Binarization of grey-leveled images
18. Boundary (Edge) Detection of Binary Images.
19. Histogram thresholding (Detection of edges in grey-levelled images)
20. Spatial Operators, Box filters, Windows, Templates and Masks.
21. User defined convolution filters
22. Smoothing filters.
23. Noise-Reduction Median Filters.
24. Unsharp Masking Filter.
25. Detection of Discontinuities in Digital Images (Points, Lines, Edges).
26. The Gradient.
27. Edge Detection by First Derivatives and Gradient.
28. Edge Enhancement by Gradient. The Sobel Operators.
29. Generalised Sobel Operators.
30. Edge Detection with the Laplacian.
31. High-boost Filter.
Acknowledgement.-
Introduction
Optics, a branch of Physics, deals with the radiation emitted by objects. When this radiation lies in the visible spectrum it is called (visible) light and, when it encounters an opaque surface, it renders an image characteristic of the object it comes from. Optical instruments such as lenses, prisms and mirrors are used by physicists to visualize and to study this radiation. Consequently, Digital Image Processing (DIP) may be regarded as just another tool of Optics, because the algorithms of DIP are simply virtual instruments for manipulating and studying the images produced by the radiation generated by objects.

DIP deals with the algorithms used to transform images. Images may need to be transformed merely for aesthetic reasons, but also in order to extract information from them, as is the case with images used in medical applications and in Pattern Recognition.
Applications of DIP
DIP finds its main applications in image reconstruction and in pattern recognition, especially autonomous (machine-based) recognition. It is highly probable that many successful pattern recognition applications are kept secret for commercial or security reasons.

Common applications of DIP include automatic industrial inspection (quality control), radar and detection systems, autonomous robots, optical character recognition (text recognition), geophysical data analysis, chromosome classification, electrocardiogram analysis, radiography, fingerprint recognition and military target recognition.

Obviously, industrial and medical applications are much easier to perform than military ones, because in the former there is controlled illumination and no camouflage.

In automatic industrial quality control, hundreds of products must be checked in a short time, and those presenting imperfections must be automatically identified and separated.

In automatic defence systems, a dot in the sky approaching a vessel at sea must be identified in a very short time so that the ship can activate its defences and shoot it down if it is identified as an enemy aircraft.

In the past, the two situations just mentioned used to be handled manually and therefore demanded a long time; nowadays, thanks to the research and development in DIP, they have become much more manageable.

In medical applications there is also the need to improve the quality of some images or to reduce the level of noise present in them; here, however, the time factor is not as crucial as in the examples mentioned above.

An image distorted by camera shake may be corrected to some degree by means of DIP techniques.
Images from the point of view of DIP

DIP regards an image as a function z = f(x,y), where the value of z at the point (x,y) represents the intensity of the light there, that is, the colour. The values of x and y are limited by the image width and height.

In the field of DIP an image -which may be a photograph- is discretized as a two-dimensional light-intensity function f(x,y), where (x,y) are the coordinates of every point and the value of the function f at (x,y) is the colour at that point. In this way an image is a matrix whose rows and columns are indicated by x and y, and whose elements store the colour.
Digital image researchers have developed mathematical operations on functions like f(x,y) so as to transform them. A few examples of these transformations are (1) cleaning a noisy image, (2) detecting straight lines (illegal airports) in aerial photographs taken on a cloudy day, and (3) detecting contours in poor-quality images. DIP deals with the transformations that can be applied to images in order to extract information from them.
Binary images
In a Binary Image (one in strict black and white) each point (x,y) has one of two values, 1 or 0; usually 0 represents the white background and 1 the black silhouette, although the opposite convention is also possible.
Grey-level images
In a grey-level image the light-intensity values go from 0 through 255, making a total of 256 grey levels at each point (x,y).
Two versions of the same image: the one at the left is in (strict) black and white and is called a Binary Image; the image at the right is in grey levels. Obviously the grey-level image has many more details (more information) than the binary image, but it demands much more storage space and much more memory when displayed on a computer screen. A colour image requires even more memory and storage.
Colour images
Colour images are generally represented in the RGB system (Red, Green, Blue); in this case every point (x,y) of the image is associated with three values R, G and B, each varying from 0 (0%) to 255 (100%). The consequence is that colour images can contain up to 256 x 256 x 256 = 16,777,216 different colours. When R = G = B = 0 the resulting colour is black, and when R = G = B = 255 the resulting colour is white.
Conversion of Colour to Grey-level Images

In colour images each colour has three components, R, G and B, each varying from 0 to 255. In grey-level images the colour has only a single component, which varies from 0 to 255. There are several algorithms to carry out the conversion from colour to grey levels; the one proposed by the NTSC (National Television System Committee) states that each grey level has 56% Red, 33% Green and 11% Blue; notice that 56% + 33% + 11% = 100%.

Under this criterion the colour RGB(240, 36, 128) becomes the grey level given by 56% of 240 plus 33% of 36 plus 11% of 128, which is 134.4 + 11.88 + 14.08 = 160.36 ≈ 160. The white colour RGB(255,255,255) becomes the grey level 142.8 + 84.15 + 28.05 = 255, and the black colour RGB(0,0,0) becomes the grey level 0.

There are other proposals to carry out the conversion from colour to grey level, like the 3-6-1 rule, which proposes 30% Red, 60% Green and 10% Blue. In general, anyone may propose his or her own conversion rule. Notice that if the blue component is assigned a high percentage, the grey-level image may turn out rather dark; for this reason the red and green components are given the high percentages in the usual conversions.
The figure above shows three colour to grey-level transformations achieved with Imagery. Image (A) is the colour input image; images B, C and D are the corresponding grey-level transformations. Transformations B and C are standard, while D is user-defined. The RGB percentages used in the transformations are, respectively: image B: RGB(0.56, 0.33, 0.11), image C: RGB(0.30, 0.60, 0.10) and image D: RGB(0.20, 0.30, 0.50).

Obviously colour is a matter of individual taste; hence everyone may define his or her own colour to grey-level transformation rule.
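As an illustration, the following Python sketch applies a conversion rule of this kind to a single RGB pixel; the weights shown follow the 0.56 / 0.33 / 0.11 rule quoted above, and the sample pixels are just examples.

# Convert one RGB pixel to a grey level using user-defined weights.
def rgb_to_grey(r, g, b, wr=0.56, wg=0.33, wb=0.11):
    # any weight triple adding up to 1.0 may be used instead
    return int(round(wr * r + wg * g + wb * b))

print(rgb_to_grey(240, 36, 128))    # -> 160, as in the worked example above
print(rgb_to_grey(255, 255, 255))   # -> 255 (white stays white)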
Return to Index
Digitization
The digitization of an image generates a mapping of the image onto a grid of discrete coordinates x and y. This has a huge influence on the digital processing of the image, because operations like the derivative, which in ordinary Calculus are carried out over a space of continuous coordinates X-Y, must here be executed over a discrete set of coordinates, where X and Y take only discrete values.

Every point of the two-dimensional matrix or grid representing an image is called a Pixel (or Pel). It is very important to bear in mind that the coordinates x and y of a pixel are discrete, that is, their values can only be x, y = 0, 1, 2, 3, … The value stored in the pixel (x,y) is the image colour at that position.
Part (a) of the figure above shows a (central) pixel C and its four nearest neighbours,
identified as North, South, East and West, or Top, Bottom, Left and Right. Part (b) of the
figure shows the coordinates of a (central) Pixel (x,y) and its eight neighbours.
Neighbourhoods of a pixel

Every pixel (x,y) has four nearest neighbours (Top, Bottom, Right and Left); these are at unit distance. There are also four next-nearest neighbours along the diagonals; these are slightly farther away, at a distance of √2 ≈ 1.4142. Some Digital Image Processing applications demand the full 9-pixel (3x3) neighbourhood, others only the 4-pixel neighbourhood.
Return to Index
Geometric Transformations on Images

The most common geometric transformations are Translation, Rotation, Scaling and Shearing; these are accomplished by operating on the pixel coordinates of the image.

In the two-dimensional cases shown below, after a geometric transformation the position (x,y) of a pixel is changed to (xnew, ynew). The coefficients Dx, Dy, Sx and Sy do not necessarily have integer values.

In Rotation, the pixel (x,y) is rotated by an angle with respect to the origin of coordinates (0,0). Slightly different equations in x and y can generate rotations with respect to a different reference point.

In Scaling, the pixel (x,y) is displaced to the position (x·Sx, y·Sy), measured from (0,0). When this operation is carried out over the pixels of a polygon, the polygon becomes larger or smaller, depending on the values of Sx and Sy. If Sx = Sy the change of size is uniform (the same in all directions); if they are different, the size change is not uniform.

In Shearing, the coefficients in x and y are not necessarily equal; this means that the shearing may not be uniform in x and y.
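The transformation equations themselves were given in figures that are not reproduced here; the Python sketch below shows the standard 2-D forms they normally take (translation, rotation about the origin, scaling and shearing). The shear coefficients Shx and Shy are names chosen for this example.

import math

def translate(x, y, Dx, Dy):
    return x + Dx, y + Dy                        # shift by (Dx, Dy)

def rotate(x, y, angle_deg):
    a = math.radians(angle_deg)                  # rotation about the origin (0,0)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

def scale(x, y, Sx, Sy):
    return x * Sx, y * Sy                        # uniform only when Sx == Sy

def shear(x, y, Shx, Shy):
    return x + Shx * y, y + Shy * x              # shearing along x and along y

In a digital image the resulting coordinates must finally be rounded to integer pixel positions, which is the source of the discretization problems discussed next.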
Return to Index
Problems Generated by Discretization

The main problem in DIP is that, because pixels have only discrete (integer) coordinates, some geometric transformations produce an image that is only an approximation of the original one. This can be visualized by rotating a point P by an angle A and then rotating the resulting point by -A. Mathematically this operation recovers the original point P; in DIP, however, the original point is not always recovered.

For example, rotate a point P by 45°; since images have only discrete coordinates, the rotated coordinates are rounded and the resulting point is (-1,16). As an attempt to recover the original point, rotate (-1,16) by -45°: this generates the image point (11,12), and it can be seen that the original point is not recovered.
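A minimal Python sketch of this round trip is shown below. The starting point P = (10, 12) is an assumption made here because it reproduces the intermediate values quoted above ((-1,16) and then (11,12)); the original notes do not state which point was used.

import math

def rotate_discrete(x, y, angle_deg):
    # rotate about the origin and round to the nearest pixel position
    a = math.radians(angle_deg)
    xn = x * math.cos(a) - y * math.sin(a)
    yn = x * math.sin(a) + y * math.cos(a)
    return round(xn), round(yn)

p = (10, 12)                         # assumed starting point
q = rotate_discrete(*p, 45)          # -> (-1, 16)
r = rotate_discrete(*q, -45)         # -> (11, 12), not the original (10, 12)
print(p, q, r)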
Straight Lines in DIP

Mathematically, a straight line is a succession of connected points, all with the same slope. In DIP this is not necessarily true: because of the discretization of pixel coordinates, a line may appear as a succession of line segments that are not necessarily connected.

In some applications it is necessary to have a full line, one without gaps. If there is a chance that the line appears as a set of aligned line segments separated by gaps, like objects G and H in the figure, then an algorithm is needed to automatically detect the gaps, if they exist, and fill them. This is the case, for example, when it is essential to know the exact number of pixels in a line, a situation that arises in several pattern recognition algorithms.
Return to Index
Variation of Darkness and Brightness in an Image

The grey levels in a grey-levelled image range from 0 through 255, and the darker the image, the lower its grey levels. Colour images have R, G and B components, each ranging from 0 through 255, and here also the brighter the image, the higher its R, G and B components.
The algorithm to change the degree of brightness or darkness in a colour image is:
Darkness:
New_Red = Red - DeltaDarkness
New_Green = Green - DeltaDarkness
New_Blue = Blue - DeltaDarkness
Brightness:
New_Red = Red + DeltaBrightness
New_Green = Green + DeltaBrightness
New_Blue = Blue + DeltaBrightness
In a grey-levelled image, simply increase (or reduce) its grey levels in order to make it brighter (or darker).
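A small Python sketch of this adjustment is given below; the clamping to the 0-255 range is an implementation detail added here (components cannot leave that range), not something spelled out in the notes.

def adjust_brightness(r, g, b, delta):
    # delta > 0 brightens, delta < 0 darkens; results are clamped to 0..255
    clamp = lambda v: max(0, min(255, v))
    return clamp(r + delta), clamp(g + delta), clamp(b + delta)

print(adjust_brightness(200, 120, 40, 80))    # -> (255, 200, 120)
print(adjust_brightness(200, 120, 40, -60))   # -> (140, 60, 0)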
Return to Index
Image Colour Inversion

The colours of an image can be "inverted" by means of the following operation on its Red, Green and Blue colour components. The figure shows the original colour image, its grey-level version and the inverted image.
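The inversion rule itself appeared only in a figure; the usual form, assumed here, replaces each colour component v by 255 - v:

def invert_pixel(r, g, b):
    # each colour component is replaced by its complement with respect to 255
    return 255 - r, 255 - g, 255 - b

print(invert_pixel(240, 36, 128))   # -> (15, 219, 127)
print(invert_pixel(0, 0, 0))        # black becomes white: (255, 255, 255)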
Rotating and Flipping Images

By operating on every image pixel (x,y), images may be rotated and flipped with the transformations shown next; additional transformations can easily be devised. For all of these transformations the origin of coordinates is placed at the top-left corner of the image.

Original:           (x,y)
Horizontal Mirror:  (x,y) >>> (-x,y)
Upside down:        (x,y) >>> (x,-y)
90° Rotation:       (x,y) >>> (y,-x)
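Because pixel indices cannot be negative, an implementation folds the sign changes back into the valid index range. A Python sketch with the image stored as a list of rows (an assumed layout) is:

def mirror_horizontal(img):
    # (x,y) >>> (-x,y): reverse each row
    return [row[::-1] for row in img]

def upside_down(img):
    # (x,y) >>> (x,-y): reverse the order of the rows
    return img[::-1]

def rotate_90(img):
    # (x,y) >>> (y,-x), with the result folded back into non-negative indices
    h, w = len(img), len(img[0])
    return [[img[y][x] for y in range(h)] for x in range(w - 1, -1, -1)]

print(rotate_90([[1, 2, 3],
                 [4, 5, 6]]))   # -> [[3, 6], [2, 5], [1, 4]]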
Image Subtraction
[ Imagery module: Transformations – Image Subtraction]
Given two images f(x,y) and g(x,y), the difference image h(x,y) is given by

h(x,y) = f(x,y) - g(x,y)

and it is computed by taking the difference between pairs of corresponding pixels in images f(x,y) and g(x,y). The resulting image h(x,y) contains only the regions in which f(x,y) and g(x,y) differ.
Image subtraction has important applications in image enhancement and in image
segmentation. Image subtraction is commonly used in radiography enhancement, a medical
imaging application.
Return to Index
Image Segmentation
[ Imagery module: Segmentation – Binary image segmentation through wrapping ]
Image segmentation, that is, the process of identifying individual pixels of an image matrix as members of different objects or regions in a scene, is an essential constituent of Machine Vision, a topic that Artificial Intelligence deals with.

In simple words, image segmentation is the process of dividing an image into regions. For instance, if an image shows an apple, an orange, a book and a pen, segmenting it may generate four new images, each showing one of the mentioned objects. Once an image has been segmented, the generated images may be used to accomplish tasks in pattern recognition.

The algorithm introduced here to segment binary images consists in surrounding (encapsulating) each object of the image with a capsule or wrapping and then extracting each capsule. The figure below shows some limitations of the algorithm.
In order to wrap every object of the image in a capsule, a top-down, left-to-right sweep of the primary (original) image containing the objects is executed. As soon as a pixel different from the background is detected, the algorithm walks along the border of the object and marks the surrounding pixels; these marked pixels -once the first and the last are joined- become the capsule wrapping the object. The process is repeated for as many pixels different from the background as are detected, using a different mark (capsule or wrapping) for every object. Subsequently the capsules are detected by simply scanning the image for the marked pixels, and the contents of each capsule are extracted.

The proposed algorithm performs rather well; however, there are some restrictions, which arise especially if the algorithm is to be applied to automatic pattern recognition:

(1) The elements in the image must be separated, that is, the objects must be well individualized. The proposed algorithm does not operate well with overlapping objects, because these may be regarded as a single object.

(2) The borders (frontiers) of the elements of the image must be well defined.

(3) Inside the rectangle that tightly surrounds each object there must be only one object, even though objects are not wrapped in rectangular capsules. This problem may be appreciated in frame 2-2 above, which includes a pistol and a rectangle; notice that the rectangle appears in frames 2-2 and 2-3.
In order to read more about this algorithm a published paper is included in the annex.
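The wrapping algorithm itself is detailed in the paper mentioned above. As a simpler stand-in that illustrates the same goal of isolating each object of a binary image, the following Python sketch uses plain connected-component labelling (it is not the author's encapsulation method). The image is assumed to be a list of rows with 0 for background and 1 for object pixels.

from collections import deque

def label_objects(img):
    # assign a different label (2, 3, 4, ...) to each 8-connected object
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    current = 1
    for y0 in range(h):
        for x0 in range(w):
            if img[y0][x0] == 1 and labels[y0][x0] == 0:
                current += 1
                labels[y0][x0] = current
                queue = deque([(x0, y0)])
                while queue:
                    x, y = queue.popleft()
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = x + dx, y + dy
                            if (0 <= nx < w and 0 <= ny < h and
                                    img[ny][nx] == 1 and labels[ny][nx] == 0):
                                labels[ny][nx] = current
                                queue.append((nx, ny))
    return labels, current - 1     # label matrix and number of objects found

Each label can then be copied into its own output image, which plays the role of the extracted capsules.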
Return to Index
Histograms
[ Imagery module: Histograms – Histograms (Grey-levelled images) ]
The histogram of a grey-level digital image gives the number of pixels per grey level in the
image.
The histogram also gives information about the probability of finding a given grey level in the image: the larger the number of pixels with a given grey level, the higher the probability of finding that grey level in the image, and vice versa.
Histogram equalization: A histogram has been equalized when it has been normalized
between 0 and 1, with 0 representing black and 1 representing white. In this way the grey
levels may be regarded as random quantities in the interval [0, 1].
Imagery allows simultaneously visualizing and comparing the histograms of three grey-leveled images.
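A grey-level histogram is simply a count of pixels per level; a minimal Python sketch (image assumed to be a list of rows of grey values 0-255) is:

def grey_histogram(img):
    # hist[k] = number of pixels whose grey level is k
    hist = [0] * 256
    for row in img:
        for grey in row:
            hist[grey] += 1
    return hist

def normalized_histogram(img):
    # fraction of pixels per grey level, i.e. the probability of each level
    hist = grey_histogram(img)
    total = sum(hist)
    return [count / total for count in hist]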
Colour images in the RGB (red, green and blue) representation build each pixel colour as a combination of red, green and blue intensities, each ranging from 0 through 255.

The histogram of a colour image gives the number of pixels for each intensity of Red, Green and Blue in the image.

A colour image generates three histograms, corresponding to the red, green and blue colour components. In general these histograms differ from one another, unless the image has the same amount and distribution of red, green and blue.

When the histogram of a colour image has been equalized, it has been normalized between 0 and 1, with 0 representing the darkest intensity of the colour it is associated with and 1 the brightest intensity of that colour.
The figure shows Lenna’s color image and the associated red, green and blue histograms obtained
with Imagery.
Since the pixels of a grey-levelled image have the same intensity of red, green and blue, the R, G and B histograms generated by a grey-levelled image are all equal. Hence a grey-levelled image may be regarded as a colour image whose pixels have equal quantities of red, green and blue.
Return to Index
Histogram-based Binarization of Grey-levelled Images

Sometimes it is necessary to work with a binary (strict black and white) version of a grey-levelled image; this happens especially in pattern recognition applications, which usually operate on binary images because in those cases only the silhouette of an object is needed. The histogram of a grey-levelled image may be used to set a binarization threshold. When binarizing a grey-levelled image, those pixels whose grey levels are above the chosen threshold are highlighted, for example by showing them in white on a black background.
In the image above the input grey-levelled image to be binarized is the side view of a head,
which resulted from Nuclear Magnetic Resonance (NMR) scanning. The associated
histogram appears in green color, and at the bottom are three binarization instances,
obtained with thresholds of 75, 120 and 150, respectively.
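Once a threshold has been chosen from the histogram, the binarization itself is a one-line test per pixel, as in this Python sketch (the threshold value and the tiny test image are illustrative):

def binarize(img, threshold):
    # pixels above the threshold become 1 (white), the rest become 0 (black)
    return [[1 if grey > threshold else 0 for grey in row] for row in img]

print(binarize([[10, 200], [90, 130]], threshold=120))   # -> [[0, 1], [0, 1]]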
Return to Index
Histogram Thresholding (Detection of Edges in Grey-levelled Images)

The figure below shows how the histogram of a grey-levelled image can be used to set a threshold so that the image is binarized (turned into strict black and white); the edge pixels are then detected by means of the algorithm that looks for incomplete neighbourhoods around pixels.

The binarized image displays (in white) only those pixels whose grey level is above the binarization threshold; pixels whose grey levels are below this threshold are discarded.

At the top of the figure the grey-levelled input image is shown along with its histogram (in green). At the bottom, the input image binarized with a threshold of 89 is shown, accompanied by the edge images obtained with the four- and eight-neighbourhood techniques, respectively. When the four-neighbour technique is used the edge image has 1528 pixels; when the eight-pixel neighbourhood is considered, the edge image contains 2035 pixels.
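The incomplete-neighbourhood criterion can be sketched as follows: an object pixel is an edge pixel when at least one of its neighbours (4 or 8, depending on the chosen neighbourhood) is background. The Python function below is an illustration of that idea, not the exact Imagery implementation.

def edge_pixels(binary, use_eight=False):
    # binary: list of rows with 1 = object, 0 = background
    h, w = len(binary), len(binary[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if use_eight:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1:
                for dx, dy in offsets:
                    nx, ny = x + dx, y + dy
                    # outside the image or a background neighbour -> edge pixel
                    if not (0 <= nx < w and 0 <= ny < h) or binary[ny][nx] == 0:
                        edges[y][x] = 1
                        break
    return edges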
Return to Index
Spatial Operators, Box Filters, Windows, Templates and Masks

Spatial Operators are also known as Box Filters, Windows, Masks and Templates. When a spatial operator T operates on an image f(x,y) it generates the image g(x,y):

g(x,y) = T[ f(x,y) ]

These operators are usually 3x3 matrices (they may be smaller or larger) containing a weight factor in each cell. By means of a discrete convolution, the centre of the filter matrix is placed on each image pixel, and the new value of that pixel is the weighted sum of the pixels in its 3x3 neighbourhood.
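A minimal Python sketch of this discrete convolution with a 3x3 mask is shown below; copying the border pixels unchanged is just one possible border policy, chosen here for simplicity.

def convolve3x3(img, mask):
    # img: list of rows of grey values; mask: 3x3 list of weights
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]            # border pixels keep their values
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = 0.0
            for j in range(3):
                for i in range(3):
                    total += mask[j][i] * img[y + j - 1][x + i - 1]
            out[y][x] = total
    return out

Any of the 3x3 masks discussed in the following sections can be run through a routine of this kind.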
Return to Index
User-defined Convolution Filters

The image below shows a module of the Imagery EduVirtualLab that allows the user to operate on grey-levelled images with his or her own 3x3 filters; the module makes it possible to investigate the effect of user-defined filters and to visualize and compare, on screen, the effect of three different user-defined filters.
Return to Index
Smoothing Filters
[ Imagery module: Convolution – Spatial Operators ]
This filter is used for noise removal by means of neighbourhood averaging; it is given by the mask shown next.
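The averaging mask itself appeared as a figure; the usual 3x3 neighbourhood-averaging (box) mask, assumed here, is

          1  1  1
(1/9)  x  1  1  1
          1  1  1

so that each pixel is replaced by the average of the nine pixels in its 3x3 neighbourhood (itself included).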
Noise-Reduction Median Filters

Median filters are classified as nonlinear filters. They achieve noise reduction with little blurring: the grey level of each pixel in the image is replaced by the median of the grey levels in the neighbourhood of that pixel, and not by the neighbourhood average, as smoothing filters do. The size and shape of the neighbourhood depend on the application; the filter may be applied to an image in any of its four versions: Square, Cross, Vertical Strip and Horizontal Strip.

Given a set of values, the median m is such that half of the values are less than m and half are greater than m. A median filter forces pixels with distinct grey levels to be more like their neighbours.

To apply a median filter, order from minimum to maximum the grey levels of the pixels in the neighbourhood of each pixel p (including p itself), take the median of this set, and replace the grey level of p with that median.

The shape of the median filter, that is, its type of neighbourhood, strongly affects its filtering effect. The most common shapes are Square, Cross, Horizontal Strip and Vertical Strip.
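A Python sketch of a median filter with a square 3x3 neighbourhood (the other shapes only change which offsets are visited) is:

def median_filter_3x3(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]            # border pixels are left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbourhood = sorted(img[y + j][x + i]
                                   for j in (-1, 0, 1) for i in (-1, 0, 1))
            out[y][x] = neighbourhood[4]     # the median of the 9 sorted values
    return out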
Return to Index
Unsharp Masking Filter

This filter is used for image enhancement and it is based on subtracting a blurred (smoothed) image from the original.

Let G(x,y) be the enhanced image obtained from f(x,y) by means of

G(x,y) = f(x,y) - Smooth[ f(x,y) ]

where Smooth[f(x,y)] is the smoothed version of f(x,y), obtained as the local average of the eight neighbouring pixels surrounding, but not including, each pixel (x,y).
The drawback of the Unsharp Masking filter is that it enhances noise and introduces some
ringing around noisy dots.
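A self-contained Python sketch of the operation G = f - Smooth[f], with the eight-neighbour average described above, is:

def unsharp_mask(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]            # border pixels are left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # average of the eight neighbours, the central pixel excluded
            smooth = (sum(img[y + j][x + i]
                          for j in (-1, 0, 1) for i in (-1, 0, 1))
                      - img[y][x]) / 8.0
            out[y][x] = img[y][x] - smooth   # G(x,y) = f(x,y) - Smooth[f(x,y)]
    return out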
Detection of Discontinuities in Digital Images

The three basic discontinuities in a digital image are Points, Lines and Edges. The easiest way of detecting a discontinuity in an image is by convolving the image with a 3x3 mask or filter. With the mask centred on a given image pixel, the result of the operation at that pixel is

R = Σ (n = 1..9) Wn Zn

where Zn is the grey level of pixel n in the 3x3 neighbourhood and Wn is the weight of cell n in the mask. The result R is assigned to the position of the central pixel of the mask.
Detection of Points
[ Imagery module: Masks – Point & Small hole detection mask ]
The grey level of an isolated point is quite different from the grey levels of its neighbours; a filter to detect such points is shown next.
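The mask itself was given in a figure; the standard point-detection mask, assumed here, weights the central pixel against its eight neighbours:

-1  -1  -1
-1   8  -1
-1  -1  -1

With this mask, R is large only where the grey level of the central pixel differs markedly from that of its surroundings.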
and the result R of evaluating the filter at every pixel position must satisfy

| R | > Threshold

where Threshold is non-negative; obviously, the detection results depend on the value of Threshold. This filter detects points and small holes.

Return to Index
Detection of Lines
[ Imagery module: Masks – Line detection masks ]
In order to detect horizontal, vertical and diagonal (45° and 135°) lines, the following filters may be used; each of them is evaluated at every pixel position as

R = Σ (n = 1..9) Wn Zn
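The four masks appeared as figures; the standard line-detection masks, assumed here, are the following (horizontal, vertical, 45° and 135°, respectively):

Horizontal:
-1 -1 -1
 2  2  2
-1 -1 -1

Vertical:
-1  2 -1
-1  2 -1
-1  2 -1

45°:
-1 -1  2
-1  2 -1
 2 -1 -1

135°:
 2 -1 -1
-1  2 -1
-1 -1  2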
Suppose the four line-detection masks are run through an image. If at a given pixel (xo,yo) it happens that |Result m| > |Result n|, this means that the pixel (xo,yo) is more likely to belong to a line in the direction of mask m than in the direction of mask n. For example, if for a given pixel (xo,yo) the horizontal mask gives the largest of the four absolute responses, that pixel most likely belongs to a horizontal line.
Return to Index
Detection of Edges
The Gradient
The gradient of a function f(x,y) is a vector that points in the direction of the maximum rate of change of f(x,y). The magnitude G of the gradient gives that maximum rate of change of f(x,y), measured along the direction of the gradient vector.

First-derivative operators Dx (with respect to x) and Dy (with respect to y), and the gradient Dx + Dy, are used to detect edges. Dx and Dy detect mainly edges perpendicular to their own directions, that is, Dx detects mainly vertical edges and Dy detects mainly horizontal edges. The gradient Dx + Dy, however, is an isotropic edge detector: it detects edges independently of their orientation.

The magnitude of the digital gradient (first derivative) can be used to detect edges in an image, and the sign of the Laplacian (second derivative) can be used to determine whether an edge pixel lies on the dark or on the light side of the edge. The second derivative at an edge pixel is positive when the pixel lies on the dark side of the edge and negative when it lies on the light side.
The figure above shows edge detection by first derivatives (the gradient). The input image (frames.bmp), containing lines with different orientations, has been processed with the Imagery module that computes the gradient of an image. The other three images are output images, in which the detected pixels have been highlighted. Image Dx displays the derivative with respect to x, detecting mainly verticals; image Dy shows the derivative with respect to y, detecting mainly horizontals; in image Dx+Dy, edges of all orientations have been detected.
In the figure the Gradient threshold has been set to 0.50. The lower the threshold, the higher
the number of detected pixels. Only image pixels whose gradient is above the gradient
threshold are detected. When the threshold is too high, no pixel is detected and the output
image is blank.
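A first-derivative edge detector along these lines can be sketched in Python with simple pixel differences; using |Dx| + |Dy| as the gradient-magnitude estimate and the particular threshold handling are choices made for this example.

def gradient_edges(img, threshold):
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            dx = img[y][x + 1] - img[y][x]      # derivative along x
            dy = img[y + 1][x] - img[y][x]      # derivative along y
            if abs(dx) + abs(dy) > threshold:   # gradient-magnitude estimate
                edges[y][x] = 1
    return edges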
Return to Index
Edge Enhancement by Gradient: The Sobel Operators

The gradient is used to detect edges, but it also enhances noise. Edge enhancement by gradient is optimal when the image is noise-free; if the image contains noise, the noise is enhanced as well, and in such cases taking the derivative of the image in order to boost the object borders may not produce the desired results.

The Sobel edge detectors operate not only on binary images; they also operate on grey-scale images.

The image above displays the Sobel edge enhancement by gradient (first derivative). The input (grey-levelled) image is a cut of a human brain. The output (binary) images are the corresponding edge profiles under three different thresholds. It can be seen that the lower the threshold, the higher the number of detected edge pixels; only those image pixels whose gradients are above the threshold are detected and highlighted.
The figure above shows the noise-enhancing effect of the Sobel derivative operator. It can be seen that when the threshold is high, object edges may become imperceptible, while at the other extreme, when the threshold is low, noise is enhanced.

Notice that finding the right threshold so that only object edges are enhanced in a noisy image may be relatively easy when done manually; however, in computer vision applications -where most image processing is performed automatically by a computer- and in a case like the one shown in the image, finding the correct threshold may not be so easy.
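The Sobel operators are the familiar pair of 3x3 masks shown in the sketch below; thresholding |Gx| + |Gy| as an estimate of the gradient magnitude is one common choice, assumed here.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges

def sobel_edges(img, threshold):
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if abs(gx) + abs(gy) > threshold:    # gradient-magnitude estimate
                edges[y][x] = 1
    return edges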
Return to Index
Edge Detection with the Laplacian

The Laplacian is a second-derivative operator used to detect edges; like the gradient, it also enhances noise.
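A commonly used 3x3 Laplacian mask (assumed here, since the notes do not reproduce one) is

 0 -1  0
-1  4 -1
 0 -1  0

The image is convolved with it exactly as with the other 3x3 masks, and the sign of the response tells on which side of an edge a pixel lies, as discussed in the Gradient section.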
High-boost Filter
[ Imagery module: Edges – High boost filter ]
An original image f(x,y) may be represented as the sum of two components,

f(x,y) = Highpass + Lowpass

so that Highpass = f(x,y) - Lowpass, which is a kind of sharpening mask.

Let A be an image amplification factor; then

High boost = A f(x,y) - Lowpass

Adding and subtracting f(x,y):

High boost = A f(x,y) - f(x,y) + f(x,y) - Lowpass
High boost = (A-1) f(x,y) + f(x,y) - Lowpass
High boost = (A-1) f(x,y) + Highpass
Notice that when A = 1, then the standard Highpass image is obtained. When A > 1, part of
the original image is added back to the Highpass result, then the High boost image looks
more like the original image and includes some degree of edge-enhancement that depends
on the value of A.
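A self-contained Python sketch of the high-boost filter, with the 3x3 neighbourhood average playing the role of the Lowpass image (an assumption made for this example), is:

def high_boost(img, A):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]              # border pixels are left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lowpass = sum(img[y + j][x + i]
                          for j in (-1, 0, 1) for i in (-1, 0, 1)) / 9.0
            out[y][x] = A * img[y][x] - lowpass   # High boost = A*f - Lowpass
    return out

With A = 1 this reduces to the standard Highpass image, as noted above.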
Return to Index
Dilation expands a region whereas Erosion erodes (shrinks) it. Dilation and Erosion
masks may be either 4-pixel or 8-pixel neighbourhoods around a central pixel.
Dilation
[ Imagery module: Transformations – Dilation of binary images ]
In Dilation, a pixel becomes (or remains) an object pixel if there is at least one object pixel within its predefined (4- or 8-pixel) neighbourhood; all other pixels are left as background, so object regions grow outwards.

With Dilation, small holes or cracks become filled and contour lines become smoother. Shapes or objects dilated with an 8-pixel mask come out much more dilated than those dilated with a 4-pixel mask.
Erosion
[ Imagery module: Transformations – Erosion of binary images ]
Erosion removes boundary pixels: only pixels having a full object neighbourhood survive an Erosion operation, and all other pixels vanish. Eroded objects (shapes) are thinner than their originals.

With Erosion some noise may be eroded away and disappear. Groups of pixels connected by a small bridge become disconnected after erosion, and objects smaller than the mask disappear completely.

Erosion with an 8-pixel mask erodes much more than erosion with a 4-pixel mask.
The Opening operation removes small objects in an image; it is achieved by Erosion followed
by Dilation.
Erosion eliminates small objects in an image but it also shrinks all the remaining objects. In
order to avoid this shrinking, the image may be dilated after erosion.
The Closing operation can be used to refill holes and cracks; it is achieved by Dilation
followed by Erosion.
Dilation refills small holes and cracks, but enlarges the objects in an image. This
enlargement may be reversed by eroding the image after it has been dilated.
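Dilation and Erosion of a binary image can be sketched in Python as follows (8-pixel neighbourhood; the 4-pixel version only changes the list of offsets). Opening and Closing then follow directly as the compositions described above.

OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def dilate(binary, offsets=OFFSETS_8):
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # a pixel is set if it, or any neighbour in the mask, is an object pixel
            if binary[y][x] == 1 or any(
                    0 <= y + dy < h and 0 <= x + dx < w and binary[y + dy][x + dx] == 1
                    for dy, dx in offsets):
                out[y][x] = 1
    return out

def erode(binary, offsets=OFFSETS_8):
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # only object pixels with a full object neighbourhood survive
            if binary[y][x] == 1 and all(
                    0 <= y + dy < h and 0 <= x + dx < w and binary[y + dy][x + dx] == 1
                    for dy, dx in offsets):
                out[y][x] = 1
    return out

def opening(binary):
    return dilate(erode(binary))     # Erosion followed by Dilation

def closing(binary):
    return erode(dilate(binary))     # Dilation followed by Erosion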
Return to Index
The (absolute) difference between two images f and g is given by d = | f – g |; the absolute
value of the difference is considered because the difference may be positive or negative,
depending on the grey-level of the corresponding pixels on both images.
When the difference d = | f – g | = 0, both f and g are exactly equal images, without the
slightest difference between them.
Since the grey levels vary from 0 through 255, a given pixel at (xo,yo) in two images f and g may differ by some value between 0 and 255, and a controlled difference between the images may be obtained. This controlled difference detects only those pixels satisfying

| f(xo,yo) - g(xo,yo) | ≥ T

where T is a threshold. When T = 1, the minimum (absolute) difference between the grey levels of corresponding pixels at (xo,yo) in f and g is detected; when T is greater than the difference between the grey levels at (xo,yo), no difference is detected at that pixel.

Once the differing pixels have been detected in images f and g, they may be reproduced on the image that does not include them; in this way a new image h, being the fusion of images f and g, is generated. Notice that the new image h is a controlled fusion of f and g.
Return to Index
The weighted average of two images generates an image that contains a prescribed amount
of information from both input images. In this case, user-defined percentages of the two input
images are combined according to the following algorithm:
h(x,y) = P1 · f(x,y) + P2 · g(x,y) ,      with P1 + P2 = 100%

where P1 and P2 are the user-defined contribution percentages of the two input images, and the same weighted average is applied to each of the R, G and B components of every pixel.

As usual, the devil hides in the details: although the resulting image has information from both input images, the drawback of this algorithm is that the output image has -with regard to colour- less information than either of the two constituent images, because every pixel colour has been averaged.

In the Imagery Virtual Lab the drawback mentioned above has to some extent been avoided by adding the option of ignoring (disregarding) a colour during image fusion. When a colour to disregard is selected, that colour is assigned a contribution percentage of 0 during fusion; in this way, any colour that fuses with the selected colour contributes 100% to the image fusion.
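A Python sketch of this weighted fusion for one pair of corresponding RGB pixels is given below; the percentages and the optional colour to disregard follow the description above, although the exact Imagery behaviour may differ.

def fuse_pixels(p1, p2, percent1, ignore=None):
    # p1, p2: (R, G, B) tuples; percent1 is the contribution of p1 in percent
    if ignore is not None:
        if p1 == ignore:                  # a disregarded colour contributes 0%
            percent1 = 0
        elif p2 == ignore:
            percent1 = 100
    w1 = percent1 / 100.0
    return tuple(int(round(w1 * c1 + (1 - w1) * c2)) for c1, c2 in zip(p1, p2))

print(fuse_pixels((200, 0, 0), (0, 0, 200), 50))   # -> (100, 0, 100)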
Return to Index
Pattern Recognition
Pattern Recognition is the collection of techniques that allow classifying objects or signals into a group of predefined categories.
Animals constantly carry out pattern recognition when they identify objects and persons (by
sight) and when they identify sounds (by ear).
Invariant Pattern Recognition is the collection of techniques (algorithms and strategies) that allow classifying or recognizing an object within an image independently of its position, orientation and size. Pattern recognition makes use of digital image processing.
Computer Vision
Computer Vision, also known as Cybernetic Vision, is the area of Artificial Intelligence aimed
at recognition and classification of objects within images. The goal of Computer Vision is
the autonomous and automatic application of algorithms and techniques belonging to digital
image processing, so as to replace the human eye and accomplish the functions of the
human visual system.
Return to Index
The Pattern Centroidal Profile (Signature)

The Pattern Centroidal Profile reduces the 2-D representation of the pattern (object) boundary to a much simpler 1-D functional representation, the "Signature" of the pattern.

Given a pattern (object) in the input image, its centroid (geometric centre) is detected by means of the Physics equation for the centre of mass which, with every object pixel given the same unit mass, reduces to

xc = (1/N) Σ xi ,      yc = (1/N) Σ yi

and then the whole pattern is displaced so as to put its centroid at the origin of coordinates (0,0).

Since neither rotation nor scaling is carried out, the orientation and size of the translated pattern are the same as the original ones. Next, the angle and the distance of every border point (x,y) with respect to (0,0) are computed; this constitutes the Centroidal Profile Representation (Signature) of the pattern.

The centroidal-profile representation is possible only as long as the object is not solid but edged (only its boundary is used), has no holes, and the image is noiseless.
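A Python sketch of the signature computation is shown below; representing the boundary as a list of (x, y) pairs is an assumption made for this example.

import math

def centroidal_profile(boundary):
    # boundary: list of (x, y) border points of the pattern
    n = len(boundary)
    xc = sum(x for x, _ in boundary) / n          # centroid (centre of mass)
    yc = sum(y for _, y in boundary) / n
    signature = []
    for x, y in boundary:
        dx, dy = x - xc, y - yc                   # point referred to the centroid
        angle = math.degrees(math.atan2(dy, dx))
        distance = math.hypot(dx, dy)
        signature.append((angle, distance))
    return signature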
Return to Index
Invariant Moments

There are two types of invariant moments: the Invariant Moments created by M. K. Hu in 1961, which operate on all the pixels of the object to be recognized, and the Improved Invariant Moments created by C. C. Chen in 1993, which operate only on the boundary (edge) pixels of the object. Here the former are referred to as Massive Invariant Moments and the latter as Boundary Invariant Moments.

The geometrical moments of an object f(x,y) are defined as

m_pq = ∫∫ x^p y^q f(x,y) dx dy ,      p, q = 0, 1, 2, …      (1)

These geometrical moments are not invariant. The double integrals are to be considered over the whole area of the object, including its boundary; this implies a computational complexity of order O(N²). The density-distribution function f(x,y) gives the intensity (colour) of the point (x,y) in image space; in simpler words, f(x,y) is the colour of point (x,y) in the image. In practical pattern recognition applications the image space is reduced to a binary version, and in such a case f(x,y) takes the value 1 when the pixel (x,y) represents an object (or even noise) and 0 when it is part of the background.

Notice that

m_00 = ∫∫ f(x,y) dx dy

is the total area of the object f(x,y), that is, its total number of pixels.

When the geometrical moments m_pq of equation (1) are referred to the object centroid or centre of mass (x_c, y_c), they become the Central Moments μ_pq, and they are invariant to translation:

μ_pq = ∫∫ (x - x_c)^p (y - y_c)^q f(x,y) dx dy      (2)

The Central Moments may be normalized so as to become invariant also to area scaling (change of size) through the relation
η_pq = μ_pq / (μ_00)^γ ,      with γ = (p + q)/2 + 1      (3)
The set of the seven lowest-order Rotation, Translation and Scale (RTS) invariant functions φ_i includes invariants up to the third order; it is given by:

φ_1 = η_20 + η_02

φ_2 = (η_20 - η_02)² + 4 η_11²

φ_3 = (η_30 - 3η_12)² + (3η_21 - η_03)²

φ_4 = (η_30 + η_12)² + (η_21 + η_03)²

φ_5 = (η_30 - 3η_12)(η_30 + η_12)[(η_30 + η_12)² - 3(η_21 + η_03)²]
      + (3η_21 - η_03)(η_21 + η_03)[3(η_30 + η_12)² - (η_21 + η_03)²]

φ_6 = (η_20 - η_02)[(η_30 + η_12)² - (η_21 + η_03)²] + 4 η_11 (η_30 + η_12)(η_21 + η_03)

φ_7 = (3η_21 - η_03)(η_30 + η_12)[(η_30 + η_12)² - 3(η_21 + η_03)²]
      - (η_30 - 3η_12)(η_21 + η_03)[3(η_30 + η_12)² - (η_21 + η_03)²]      (4)

In practical pattern recognition applications, equations (1) and (2) are discretized for binary images according to

m_pq = Σ_x Σ_y x^p y^q f(x,y)      (5)

μ_pq = Σ_x Σ_y (x - x_c)^p (y - y_c)^q f(x,y)      (6)

In practice, when the set of equations (4) is applied to a group of n images containing different (rotation, translation and scale) instances of the same object, seven numbers φ_1 … φ_7 are obtained from each image (instance). These numbers are, if not exactly equal, at least close to each other for every instance:

φ_i(image 1) ≈ φ_i(image 2) ≈ … ≈ φ_i(image n) ,      i = 1, …, 7
As an example of the application of the Invariant Moments, the following six different RTS (Rotation, Translation and Size) instances of a holder were submitted to the Massive Invariant Moments module in the Imagery Virtual Lab.

The following table shows the seven massive invariant moments for the six instances of the holder. In order to avoid dealing with huge numbers, their logarithms were used.

As can be seen, the invariant moments are not exactly equal for different instances of the same object; there exists a range of variation, and in pattern recognition applications the range of variation of the invariant moments must be taken into account.
The table below shows the range of variation of the invariant moments for the holders used
in this example.
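As a compact illustration of equations (5), (6), (3) and the first two functions of (4), the following Python sketch computes φ_1 and φ_2 for a binary image stored as a list of rows (1 = object pixel, 0 = background); it is a minimal sketch, not the Imagery implementation.

def moments_phi1_phi2(binary):
    # raw moments m_pq over all object pixels, equation (5)
    def m(p, q):
        return sum((x ** p) * (y ** q)
                   for y, row in enumerate(binary)
                   for x, v in enumerate(row) if v == 1)

    m00 = m(0, 0)                                 # object area (number of pixels)
    xc, yc = m(1, 0) / m00, m(0, 1) / m00

    # central moments mu_pq, equation (6)
    def mu(p, q):
        return sum(((x - xc) ** p) * ((y - yc) ** q)
                   for y, row in enumerate(binary)
                   for x, v in enumerate(row) if v == 1)

    # normalized central moments, equation (3): gamma = (p + q)/2 + 1
    def eta(p, q):
        return mu(p, q) / (m00 ** ((p + q) / 2 + 1))

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2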
Boundary Invariant Moments

C. C. Chen introduced a method to compute a set of slightly different invariant functions based on computations carried out only along the boundary of the object. In this way the computational complexity of the problem, and the computer time, are reduced from O(N²) to O(N). The computations needed to obtain the Boundary Moments are much simpler than those needed for the Massive Moments; however, they require pre-processing in order to extract the boundaries of the objects.

C. C. Chen uses the same RTS invariant functions (4) originally deduced by Hu; however, he introduces a new scaling factor, used instead of the one of equation (3), to achieve invariance to boundary-length scaling:

η_pq = μ_pq / (μ_00)^(p+q+1)      (12)

Notice that in this case m_00 is the length of the curve C, the edge of the object.

For the Boundary Moments to become invariant to translation they must be referred to the object centroid:

μ_pq = ∮_C f(x,y) (x - x_c)^p (y - y_c)^q dl      (10)

These are the Boundary Central Moments, and the integral must be evaluated along the edge C of the object. After discretization this becomes

μ_pq = Σ_{(x,y) ∈ C} f(x,y) (x - x_c)^p (y - y_c)^q      (11)

and it can be seen that, after discretization, it is not necessary to carry out the sum in any particular order; this means that the points (x,y) ∈ C can be taken in any order, for example as they are met when sweeping the image space top-down and left-to-right.
Return to Index
The Hough Transform

In the Hough Transform for straight lines, every point (x,y) of the input space is mapped onto the sinusoid

ρ = x cos θ + y sin θ

in the ρ-θ parameter space. The Accumulator Space ρ-θ is discretized in cells of coordinates (ρ, θ), and the sinusoid associated with a point (x,y) contributes votes to the accumulator cells it passes through. Even noise dots in the input space generate a sinusoid in the Accumulator.

Aligned points in the X-Y space generate sinusoids in the Accumulator that intersect in at least one point (ρ, θ), whose coordinates may be used to identify and reconstruct that set of aligned points, that is, the line.

As a sinusoid passes through different cells (ρ, θ) of the accumulator, the vote count stored in each of those cells is incremented by one; in this way, the value stored in each cell of the Accumulator is the number of sinusoids crossing that cell.

Since all the sinusoids corresponding to a set of aligned points (a line) pass through a given cell, the count stored in that cell is the number of dots in the line. After Hough-transforming an input space (input image), the highest counts in the accumulator correspond to aligned points (lines), and the lowest counts are associated with noise or with sets of only a few aligned dots.

In Pattern Recognition applications, after Hough-transforming the input space X-Y, the accumulator cells containing the highest values are identified, cropped and processed so as to extract information from them; this automatically discards noise and also short lines.

For instance, after Hough-transforming a Cartesian input space X-Y containing a rectangle, many cells of the accumulator will store different integer values, but the four highest stored values will correspond to the four sides of the rectangle. The cells containing these highest values are easily detected; consequently the rectangle is represented by only four accumulator pairs (ρ, θ). Noise dots, if present in the input space, will produce only low values in the accumulator cells.
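A compact Python sketch of this voting scheme is given below; the accumulator resolution (1° in θ and 1 pixel in ρ) is an arbitrary choice made for the example.

import math

def hough_lines(points, width, height):
    # points: list of (x, y) object pixels in the input space
    rho_max = int(math.hypot(width, height))          # largest possible |rho|
    # acc[theta][rho + rho_max] = number of sinusoids passing through that cell
    acc = [[0] * (2 * rho_max + 1) for _ in range(180)]
    for x, y in points:
        for theta in range(180):                      # one sinusoid per point
            t = math.radians(theta)
            rho = int(round(x * math.cos(t) + y * math.sin(t)))
            acc[theta][rho + rho_max] += 1
    return acc

acc = hough_lines([(i, i) for i in range(50)], 100, 100)
votes, theta, rho_cell = max((v, t, r) for t, row in enumerate(acc)
                             for r, v in enumerate(row))
print(votes, theta, rho_cell - int(math.hypot(100, 100)))
# the 50 aligned points give a single cell with 50 votes, at theta = 135°, rho = 0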
Return to Index
References
This Digital Image Processing course is based on the following material:
[1] J. Montenegro Joo. Geometric-Transformations Invariant Pattern Recognition in the Hough Space,
Doctoral Degree Project. Cybernetic Vision Research Group, Instituto de Física de Sao Carlos
(IFSC), Dpto. de Física e Informática, Universidade de Sao Paulo (USP), Sao Carlos, SP, Brazil.
(August 1994)
[2] J. Montenegro Joo. Invariant Boundary moments in Pattern Recognition. The method of C.C.
Chen. Doctoral Qualification Exam (April 1994). Cybernetic Vision Research Group, Instituto de
Física de Sao Carlos (IFSC), Dpto. de Física e Informática, Univ. de Sao Paulo (USP), Brazil.
[4] J. Montenegro Joo. Invariant Recognition of Rectangular Biscuits through an Algorithm Operating
exclusively in Hough Space. Flawed Pieces Detection. RIF-UNMSM, Vol 5 (2002)
[6] J. Montenegro Joo. Improved Moment Invariants Know How, Why and When,
RIF-UNMSM., Vol. 8, No 2, 2005
[8] J. Montenegro Joo. Boundary Geometric Moments and its application to automatic quality control
in the Industry. JMJ, Industrial Data, Vol. 9, No 1, 2006
[9] Javier Montenegro Joo, Hough-Transform based algorithm for the automatic invariant recognition of
rectangular chocolates. Detection of defective pieces. Industrial Data Vol 9, No 2, 2006.
[10] J. Montenegro Joo. Hough-Transform based Automatic Invariant Recognition of Metallic Corner-
Fasteners. Industrial Data, Vol. 10 - No 1 – 2007
[11] Javier Montenegro Joo. Automatic Classification of Products in the Industry via Invariant
Boundary Moments. Industrial Data, Vol 10, No 2, 2007
[12] Javier Montenegro Joo. Image Segmentation through Encapsulation of its Constituents.
Industrial Data, Vol 13, No 1, 2010