UNIT 3
Geometric transformations (or operations) are used in image processing for a variety of
reasons.
They allow us to bring multiple images into the same frame of reference so that
they can be combined (e.g. when forming a mosaic of two or more images) or
compared (e.g. when comparing images taken at different times to see what
changes have occurred).
They can be used to eliminate distortion (e.g. barrel distortion from a wide-angle
lens) in order to create images with evenly spaced pixels.
Geometric transformations can also simplify further processing (e.g. by bringing
the image of a planar object into alignment with the image axes).
Given a distorted image f(i, j) and a corrected image f′(i′, j′), we can model the
geometric transformation between their coordinates as i = Ti(i′, j′) and j = Tj(i′, j′);
that is, given the coordinates (i′, j′) in the corrected image, the functions Ti() and Tj()
compute the corresponding coordinates (i, j) in the distorted image.
The transformation functions are typically determined in one of two ways:
1. Obtaining the distorted image by imaging a known pattern (such as in Figure 5.2)
so that the corrected image can be produced directly from the known pattern.
2. Obtaining two images of the same object, where one image (referred to as the
distorted image) is to be mapped into the frame of reference of the other image
(referred to as the corrected image).
Once sufficient correspondences are determined between the sample distorted and
corrected images, it is relatively straightforward to compute the geometric transformation
function. Once the transformation is determined it can be applied to the distorted image
and to any other images that require the same ‘correction’.
To apply the transformation, for each point (i′, j′) in the output image we compute the
corresponding input coordinates (i, j) using Ti() and Tj(), and then:
◦ Interpolate a value for the output point from close neighbouring points in the input
image, bearing in mind that Ti() and Tj() will compute real values and hence the point
(i, j) is likely to lie between pixels.
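A minimal sketch of this backward-mapping procedure is given below. The transformation functions Ti() and Tj() are hypothetical placeholders here, and using cv2.remap for the sampling step is an implementation choice of mine, not something specified in the text.

import numpy as np
import cv2

def correct_image(distorted, Ti, Tj, out_shape):
    """Backward-map every output pixel through Ti()/Tj() and sample the
    distorted image with bilinear interpolation (cv2.remap does the sampling)."""
    rows, cols = out_shape
    jj, ii = np.meshgrid(np.arange(cols), np.arange(rows))   # output coordinates (i', j')
    map_i = Ti(ii, jj).astype(np.float32)                     # real-valued row coords in the input
    map_j = Tj(ii, jj).astype(np.float32)                     # real-valued column coords in the input
    # cv2.remap expects (x, y) = (column, row) coordinate maps
    return cv2.remap(distorted, map_j, map_i, interpolation=cv2.INTER_LINEAR)

# Illustrative use: an identity "correction" shifted by half a pixel
# corrected = correct_image(img, lambda i, j: i + 0.5, lambda i, j: j + 0.5, img.shape[:2])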
AFFINE TRANSFORMATIONS
In Affine Transformation, all parallel lines in the original image will still be parallel in
the output image.
To apply an affine transformation to an image, we need three points in the input image
and the corresponding three points in the output image.
M = cv2.getAffineTransform(pts1,pts2)
Here pts1 is an array of three points in the input image and pts2 is an array of the
corresponding three points in the output image. The returned affine transformation
matrix M is a 2×3 numpy array of type np.float64.
dst = cv2.warpAffine(img, M, (cols, rows))
where (cols, rows) is the desired width and height of the output image.
Steps
To perform an image affine transformation, you can follow the steps given below −
1. Import the required libraries. In all the following Python examples, the required
Python libraries are OpenCV and NumPy. Make sure you have already installed them.
o import cv2
o import numpy as np
2. Read the input image using cv2.imread() function. Pass the full path of the input
image.
o img = cv2.imread('lines.jpg')
3. Define pts1 and pts2. We need three points from the input image and their
corresponding locations in the output image.
o pts1 = np.float32([[50,50],[200,50],[50,200]])
o pts2 = np.float32([[10,100],[200,50],[100,250]])
4. Compute the affine transform matrix M using cv2.getAffineTransform(pts1,
pts2) function.
o M = cv2.getAffineTransform(pts1, pts2)
5. Transform the image using cv2.warpAffine() method. cols and rows are the
desired width and height of the image after transformation.
o dst = cv2.warpAffine(img,M,(cols,rows))
6. Display the affine transformed image.
o cv2.imshow("Affine Transform", dst)
o cv2.waitKey(0)
1. Translation
A translation is a function that moves every point by a constant distance in a specified
direction. It is specified by tx and ty, which give the direction and the distance of the
shift (see the sketch after this list).
tx: Width (horizontal) shift.
ty: Height (vertical) shift.
2. Rotation
Rotation is a circular transformation around a point or an axis. We can specify the angle of
rotation to rotate our image around a point or an axis.
3. Scaling
Scaling is a linear transformation that enlarges or shrinks objects by a scale factor. We
can specify the values sx and sy to enlarge or shrink the image along each axis; it is
essentially zooming in or zooming out of the image.
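The following sketch shows how the 2×3 affine matrix M can be built directly for each of the three transformations above. The file name and the parameter values (tx, ty, the angle, sx, sy) are illustrative assumptions, not values from the text.

import cv2
import numpy as np

img = cv2.imread('food.jpeg')          # illustrative file name
rows, cols = img.shape[:2]

# 1. Translation: shift by (tx, ty)
tx, ty = 100, 50
M_trans = np.float32([[1, 0, tx],
                      [0, 1, ty]])
translated = cv2.warpAffine(img, M_trans, (cols, rows))

# 2. Rotation: rotate by 45 degrees about the image centre (scale = 1)
M_rot = cv2.getRotationMatrix2D((cols / 2, rows / 2), 45, 1)
rotated = cv2.warpAffine(img, M_rot, (cols, rows))

# 3. Scaling: scale by sx along the width and sy along the height
sx, sy = 1.5, 0.75
M_scale = np.float32([[sx, 0, 0],
                      [0, sy, 0]])
scaled = cv2.warpAffine(img, M_scale, (cols, rows))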
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('food.jpeg')
rows, cols, ch = img.shape
# Three points in the input image and their corresponding locations in the output image
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])
M = cv2.getAffineTransform(pts1, pts2)
dst = cv2.warpAffine(img, M, (cols, rows))
plt.subplot(121)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title('Input')
plt.subplot(122)
plt.imshow(cv2.cvtColor(dst, cv2.COLOR_BGR2RGB))
plt.title('Output')
plt.show()
cv2.imshow('image', dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
PERSPECTIVE TRANSFORMATION
When the human eye sees nearby objects, they look bigger than objects that are far away.
This is called perspective in a general sense, whereas a transformation is the transfer of
an object from one state to another.
Overall, the perspective transformation deals with the conversion of the 3D world into a
2D image. This is the same principle on which human vision works and the same
principle on which a camera works.
Frames of reference:
1. Object
2. World
3. Camera
4. Image
5. Pixel
Object coordinate frame
Object coordinate frame is used for modeling objects. For example, checking if a
particular object is in a proper place with respect to the other object. It is a 3d coordinate
system.
Camera coordinate frame
The camera coordinate frame is used to relate objects with respect to the camera. It is a
3D coordinate system.
Y = size of the object in the 3D scene
y = size of the object in the 2D image
f = focal length of the camera
Z = distance of the object from the camera
Two angles Q are formed in this projection, one on the image side and one on the object
side. On the image side,
tan(Q) = −y / f
where the minus sign denotes that the image is inverted. On the object side,
tan(Q) = Y / Z
Equating the two expressions gives
−y / f = Y / Z, and hence y = −(f · Y) / Z
From this equation we can see that when the rays of light reflected from the object pass
through the camera, an inverted image is formed.
For example
Calculating the size of the image formed
Suppose a photograph has been taken of a person 5 m tall, standing at a distance of 50 m
from the camera, and we have to find the size of the image of the person when the focal
length of the camera is 50 mm.
Solution:
Since the focal length is given in millimetres, we have to convert everything to
millimetres in order to calculate it.
So,
Y = 5000 mm
f = 50 mm
Z = 50000 mm
Substituting into y = −(f · Y) / Z:
y = −(50 × 5000) / 50000 = −5 mm
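A one-line check of this arithmetic; the variable names are chosen here just for illustration.

# Pinhole projection: y = -(f * Y) / Z, all lengths in millimetres
Y, f, Z = 5000, 50, 50000
y = -(f * Y) / Z
print(y)   # -5.0 mm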
INTERPOLATION
As each point in the output image will map to real coordinates in the input (distorted)
image, it is unlikely that the coordinates selected in the input image will correspond
precisely to any one pixel. Hence we need to interpolate the value for the output point
from the nearby surrounding pixels in the input image. There are a variety of possible
interpolation schemes and three of these will be detailed here.
Interpolation is generally used by statisticians to estimate new data points from known
data points in order to better understand the underlying data. It can also be used to
approximate complex functions for efficient experimentation, or even to scale images.
Image Resizing — Basic Idea:
A 2-D image is basically represented as a 2-dimensional matrix, with each cell in the
matrix containing a pixel value. So when we talk about scaling up this matrix, we mean
creating a bigger matrix than the original one and filling in the missing pixel values in
this bigger matrix.
1. Consider an input matrix of size 3x3 with each cell containing a pixel value in the
range 0–255. Row and column indices run from 0 to n-1, where n is the row/column length.
2. Suppose this matrix is to be scaled up by a factor of 2, so the output matrix will be of
size 6x6.
3. In order to start filling in the pixel values for the new matrix, we first have to express
the output coordinate space in terms of the input coordinate space, i.e. for every (row,
col) in the output matrix, what is the corresponding (row, col) in the input matrix? This
is just the scaling factor, which is 1/2 in our case.
4. To make it clearer, the row scaling factor is 1/2 and the column scaling factor is 1/2
(row and col will have separate scaling factors, but since our example considers a square
matrix, both are the same here).
5. row 0, col 0 in output is mapped to row 0, col 0 in input, whereas row 1, col 1 in
output is mapped to row 0.5, col 0.5 in input and so on.
The transformed (row, col) coordinates for the output matrix therefore contain known
cells such as (0, 0) or (2, 2) (integer coordinates) and also unknown cells such as (1.5, 2)
or (0, 2.5) (floating-point coordinates).
It is easier to fill in the pixel values for the known cells by simply picking the
corresponding pixel values of those (row, col) cells from the original matrix but how do
we find the pixel values for the unknown cells?
This is the same as estimating new data points given a set of input data points, where the
given data points are the pixel intensities from our input 3x3 matrix and the unknown
pixel intensities (the new data points) are computed from them. Hence interpolation is
used to find those unknown pixel values.
1. Nearest Neighbour Interpolation:
A simple way to find the values for unknown cells such as (0.5, 0.5) or (2, 2.5) is to
simply round them off to the nearest integer, e.g. (2, 2.5) to (2, 2) or (1.5, 2.5) to (1, 2).
This is called nearest neighbour interpolation.
In nearest neighbour interpolation we simply round the real coordinates so that we use the
nearest pixel value. This scheme results in very distinct blocky effects that are frequently
visible.
1. Consider an input image I of dimensions (hi, wi, c) where c=3. Let’s assume the image
I has to be resized to twice its size. i.e. output image R of dimensions (hr, wr, c).
2. For every x, y coordinates in the output image, find its corresponding x, y coordinates
in the input image.
3. Now each of the remapped x and y output coordinates can either correspond to the
actual input pixel coordinates or coordinates in-between them (i.e. floating point
coordinates).
4. For floating-point coordinates, we could simply round them off and pick the
corresponding integer coordinate — hence called nearest neighbour interpolation.
PYTHON CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
# Loading the image
image = cv2.imread("insta.png")
# Resize using nearest neighbour interpolation
stretch_near = cv2.resize(image, (1920, 1920), interpolation=cv2.INTER_NEAREST)
cv2_imshow(stretch_near)
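For completeness, here is a from-scratch sketch of the nearest-neighbour procedure described in the steps above. The function and variable names are my own, not from the text, and only integer scale factors are handled.

import numpy as np

def nearest_neighbour_resize(img, scale=2):
    """Resize img (H, W) or (H, W, C) by 'scale' using nearest-neighbour interpolation."""
    hi, wi = img.shape[:2]
    hr, wr = hi * scale, wi * scale
    out = np.zeros((hr, wr) + img.shape[2:], dtype=img.dtype)
    for r in range(hr):
        for c in range(wr):
            # Map output coordinates back into the input image, then round and clamp
            src_r = min(int(round(r / scale)), hi - 1)
            src_c = min(int(round(c / scale)), wi - 1)
            out[r, c] = img[src_r, src_c]
    return out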
2. Bilinear Interpolation:
Bilinear interpolation makes use of linear interpolation to compute the pixel values for a
new image.
1. For any unknown (row, col) cell in the upsampled matrix, pick the 4 nearest pixels.
These nearest pixels can be obtained by taking int(row), int(col), int(row) + 1 and
int(col) + 1. Let's call them row1, col1, row2 and col2.
2. Perform linear interpolation at (row1, col) using (row1, col1) and (row1, col2) and
similarly linear interpolation at (row2, col) using (row2, col1) and (row2, col2). Both
are along x-directions.
3. Do one final linear interpolation at (row, col) using (row1, col) and (row2, col).
4. The above steps are repeated for every unknown (row, col) cell in the new matrix.
5. Note that the same pixel value can be obtained by doing two linear interpolations
along the y-direction first and then one along the x-direction.
Let us take an example and see how this method works in practice:
1. For any unknown (row, col) cell in the upsampled matrix, pick the 4 nearest pixels. For
eg: for cell (0.5, 0.5), the 4 nearest pixels are (0, 0), (0, 1), (1, 0), (1, 1).
2. Finding the pixel value at (0.5, 0.5) means first finding the pixels values at (0, 0.5) and
(1, 0.5) and then using them to find the value at (0.5, 0.5)
3. Do linear interpolation twice along x-direction — one at (0, 0.5) using <(0, 0), (0,
1)> and another at (1, 0.5) using <(1, 0), (1, 1)>
4. Then another interpolation along y-direction — at (0.5, 0.5) using (0, 0.5) and (1, 0.5)
Let (x1, y) and (x2, y) be pixel coordinates in the new matrix with intensities I1 and I2,
where x2 > x1. The new pixel intensity at (x, y), with x1 ≤ x ≤ x2, is simply the weighted
sum of the intensities of the two nearest pixels, where the weights are determined by the
distance from those pixels:
I(x, y) = I1 · (x2 − x) / (x2 − x1) + I2 · (x − x1) / (x2 − x1)
1. Consider an input image I of dimensions (hi, wi, c) where c=3. Let’s assume the image
I has to be resized to twice its size. i.e. output image R of dimensions (hr, wr, c).
2. For every x, y coordinates in the output image, find its corresponding x, y coordinates
in the input image.
3. Now each of the remapped x and y output coordinates can either correspond to the
actual input pixel coordinates or coordinates in-between them (i.e. floating point
coordinates).
4. For floating-point coordinates, get the nearest four pixels by taking int(row),
int(col), int(row) + 1 and int(col) + 1.
5. For every unknown cell, perform linear interpolation twice along x-direction and one
more along y-direction to compute the pixel intensity. Note that there can also be only
one coordinate which is float while the other is integer — for eg: (0, 0.5). In such
cases, it is enough to do linear interpolation once along either x or y direction to
compute the unknown cell.
PYTHON CODE
import cv2
import numpy as np
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
# Loading the image
image = cv2.imread("insta.png")
# Resize using bilinear interpolation
stretch_near = cv2.resize(image, (1920, 1920), interpolation=cv2.INTER_LINEAR)
cv2_imshow(stretch_near)
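As with nearest neighbour, a from-scratch sketch of the bilinear procedure described in the steps above may help; the names and the integer scale factor are assumptions of mine.

import numpy as np

def bilinear_resize(img, scale=2):
    """Resize img (H, W) or (H, W, C) by 'scale' using bilinear interpolation."""
    hi, wi = img.shape[:2]
    hr, wr = hi * scale, wi * scale
    out = np.zeros((hr, wr) + img.shape[2:], dtype=np.float64)
    for r in range(hr):
        for c in range(wr):
            # Map output coordinates back into the input image (floating point)
            row, col = r / scale, c / scale
            row1, col1 = int(row), int(col)
            row2, col2 = min(row1 + 1, hi - 1), min(col1 + 1, wi - 1)
            dr, dc = row - row1, col - col1
            # Two interpolations along the column direction ...
            top = img[row1, col1] * (1 - dc) + img[row1, col2] * dc
            bottom = img[row2, col1] * (1 - dc) + img[row2, col2] * dc
            # ... then one along the row direction
            out[r, c] = top * (1 - dr) + bottom * dr
    return out.astype(img.dtype)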
3. Bi-Cubic Interpolation
Bicubic interpolation extends the same idea, fitting cubic polynomials over the 4x4
neighbourhood of the 16 nearest pixels, which generally produces smoother results than
nearest-neighbour or bilinear interpolation.
PYTHON CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
image = cv2.imread("insta.png")
# Loading the image
stretch_near = cv2.resize(image, (1920, 1920), interpolation=cv2.INTER_CUBIC)
cv2.imshow("Bicubic", stretch_near)
cv2.waitKey(0)
cv2.destroyAllWindows()
EDGE DETECTION
Two types:
Gradient-based operators, which compute first-order derivatives of a digital image, e.g. the
Sobel operator, Prewitt operator and Robert operator.
Gaussian-based operators, which compute second-order derivatives of a digital image, e.g.
the Canny edge detector and the Laplacian of Gaussian.
Sobel Operator: It is a discrete differentiation operator. It computes an approximation of the
gradient of the image intensity function for edge detection. At each pixel of the image, the
Sobel operator produces either the corresponding gradient vector or the norm of this vector.
It uses two 3 x 3 kernels or masks which are convolved with the input image to calculate the
vertical and horizontal derivative approximations respectively (shown in the sketch below).
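The standard Sobel kernels, added here in code form because the kernel figures are not reproduced in the text:

import numpy as np

# Horizontal derivative approximation (responds to vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Vertical derivative approximation (responds to horizontal edges)
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])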
Advantages:
Prewitt Operator: This operator is very similar to the Sobel operator. It also detects vertical
and horizontal edges in an image, and it is one of the best ways to estimate the orientation and
magnitude of edges in an image. It uses the kernels or masks shown in the sketch below.
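The standard Prewitt kernels, again given in code form since the kernel figures are not reproduced here:

import numpy as np

# Horizontal derivative approximation
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])

# Vertical derivative approximation
prewitt_y = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]])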
Advantages:
3. Robert Operator: This gradient-based operator computes the sum of squares of the
differences between diagonally adjacent pixels in an image through discrete
differentiation. Then the gradient approximation is made. It uses the following 2 x 2
kernels or masks –
import cv2
import numpy as np
from scipy import ndimage

# The two 2 x 2 Roberts cross kernels
roberts_cross_v = np.array([[ 1, 0 ],
                            [ 0, -1 ]])
roberts_cross_h = np.array([[ 0, 1 ],
                            [ -1, 0 ]])

img = cv2.imread("input.webp", 0).astype('float64')
img /= 255.0
vertical = ndimage.convolve(img, roberts_cross_v)
horizontal = ndimage.convolve(img, roberts_cross_h)
# Gradient magnitude from the two directional responses
edged_img = np.sqrt(np.square(horizontal) + np.square(vertical))
Laplacian of Gaussian (LoG): The image is first smoothed with a Gaussian kernel
G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
where σ (sigma) is the standard deviation. The LoG operator is computed from this as
LoG(x, y) = −(1 / (πσ⁴)) [1 − (x² + y²) / (2σ²)] exp(−(x² + y²) / (2σ²))
#OPENCV implementation
import cv2
import matplotlib.pyplot as plt
image = cv2.imread(r"E:\eye.png", cv2.IMREAD_COLOR)
image = cv2.GaussianBlur(image, (3, 3), 0)
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
filtered_image = cv2.Laplacian(image_gray, cv2.CV_16S, ksize=3)
# Plot the original and filtered images
plt.figure(figsize=(10, 5))
plt.subplot(121)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.title('Original Image')
plt.subplot(122)
plt.imshow(filtered_image, cmap='gray')
plt.title('LoG Filtered Image')
plt.show()
Advantages:
1. It has good localization.
2. It extracts image features without altering them.
3. It is less sensitive to noise.
Limitations:
1. It can produce false zero crossings.
2. The computation is complex and time-consuming.
Some Real-world Applications of Image Edge Detection:
CONTOUR SEGMENTATION
Unfortunately, extracting an edge image is not sufficient for most applications. We need to
extract the information contained in the edge image and represent it more explicitly so that it can
be reasoned with more easily. The extraction of edge data involves firstly deciding on which
points are edges (as most have a non-zero gradient). This is typically achieved by edge image
thresholding and non-maxima suppression (or through the use of Canny). The edge data then
needs to be extracted from the image domain (e.g. using graph searching, border refining and so
on) and represented in some fashion (e.g. BCCs, graphs, sequences of straight line segments and
other geometric representations).
Basic Representations of Edge Data
There are many ways in which edge image data can be represented outside of the image domain.
Here we will present two of the simplest: boundary chain codes and directed graphs.
1. Boundary Chain Codes
A boundary chain code (BCC) consists of a start point and a list of orientations to other
connected edge points. The start point is specified by a (row, column) pair, and then the
direction to the next point is repeatedly specified simply as a value from 0 to 7 (i.e. the eight
possible directions to a neighbouring pixel); a small sketch of decoding such a code is given
after this paragraph. If considering this as a shape representation then we have to consider
how useful it will be for further processing (e.g. shape/object recognition). BCCs are
orientation dependent in that if the orientation of the object/region changes then the
representation will change significantly. The representation will also change somewhat with
object scale. It is only position dependent to the extent of the start point, although it should be
noted that the start point is somewhat arbitrary for a closed contour.
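A minimal sketch of decoding a BCC back into pixel coordinates. The particular numbering convention (direction 0 = right, proceeding counter-clockwise) is an assumption of mine and is not specified in the text.

# Offsets (d_row, d_col) for the eight directions, 0 = right, counter-clockwise
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
           (0, -1), (1, -1), (1, 0), (1, 1)]

def decode_bcc(start, directions):
    """Return the list of (row, col) points traced by a boundary chain code."""
    points = [start]
    r, c = start
    for d in directions:
        dr, dc = OFFSETS[d]
        r, c = r + dr, c + dc
        points.append((r, c))
    return points

# Example: start at (10, 10), then move right, right, down-right, down
# print(decode_bcc((10, 10), [0, 0, 7, 6]))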
2. Directed Graphs
A directed graph is a general structure consisting of nodes (which correspond to edge pixels in
this case) and oriented arcs (connections between bordering pixels/nodes). To create this type of
graph, we add all edge pixels to the graph as nodes if their gradient values s(xi) are greater than
some threshold (T). To decide which nodes are then connected by arcs, we look at the
orientation φ(xi) associated with each node ni to determine which neighbouring pixels xj could
be connected. The orientation vectors, which have been quantised to eight possible directions,
are orthogonal to the edge contours and hence the most likely next pixel will be to the side of
the point (and hence is a possible arc). We also allow pixels on either side of this pixel (i.e.
±45°) to be considered to be connected via an arc. If any of these pixels have an associated
node nj then we add in a directed arc from ni to nj if the difference in orientation between the
two corresponding pixels is less than π/2. A sketch of this construction is given below.
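A rough sketch of this graph construction under the rules above. The arrays s and phi (edge magnitude and orientation), the threshold T, the dictionary-based graph representation, and the way the contour direction is derived from the orientation are all illustrative assumptions of mine.

import numpy as np

def build_edge_graph(s, phi, T):
    """Build a directed graph over edge pixels.
    s:   2-D array of gradient magnitudes
    phi: 2-D array of edge orientations in radians
    T:   magnitude threshold for accepting a pixel as a node
    Returns a dict mapping each node (r, c) to the list of nodes it points to."""
    rows, cols = s.shape
    nodes = {(r, c) for r in range(rows) for c in range(cols) if s[r, c] > T}
    # Offsets for the eight quantised directions, 0 = right, counter-clockwise
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    graph = {n: [] for n in nodes}
    for (r, c) in nodes:
        # The contour direction is taken perpendicular to the gradient orientation,
        # then quantised to one of the eight directions
        d = int(np.round((phi[r, c] + np.pi / 2) / (np.pi / 4))) % 8
        # Candidate successors: the quantised direction and its two ±45° neighbours
        for k in (d - 1, d, d + 1):
            dr, dc = offsets[k % 8]
            nb = (r + dr, c + dc)
            if nb in graph:
                # Connect only if the orientations differ by less than pi/2
                diff = abs(phi[r, c] - phi[nb])
                diff = min(diff, 2 * np.pi - diff)
                if diff < np.pi / 2:
                    graph[(r, c)].append(nb)
    return graph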