2-Mark Questions
1. Define Computer Vision.
Computer vision is a field of artificial intelligence that trains computers to interpret and
understand the visual world from images and videos.
2. Explain the different components of a vision system.
A vision system typically includes an image sensor to capture the image, a processor to
analyze the data, and software that contains algorithms for tasks like object recognition or
tracking.
3. List out the applications of Computer Vision.
Applications of computer vision include autonomous vehicles, medical image analysis, quality
control in manufacturing, facial recognition, and surveillance.
4. Define DIP.
DIP stands for Digital Image Processing, which is the use of computer algorithms to perform
processing on digital images.
5. List the elements of a DIP system.
The elements of a DIP system are:
● Image Sensor
● Digitizer
● Image Processor
● Computer
● Mass Storage
● Image Display
● Hardcopy Device
● Network
6. Summarize the purposes of gray scaling.
Grayscaling is the process of converting a color image into a range of grays, where the
intensity of each pixel represents the brightness of the original color. The purpose is to
simplify image processing tasks, reduce data storage requirements, and prepare images for
algorithms that only work with intensity values.
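As an illustration, here is a minimal NumPy sketch of grayscale conversion using the common
BT.601 luma weights (the function name and weights are illustrative, not part of the original
answer):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to a single-channel grayscale image."""
    weights = np.array([0.299, 0.587, 0.114])  # standard luma weights (BT.601)
    return (rgb[..., :3] @ weights).astype(np.uint8)
```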
7. Define Frequency Domain.
The frequency domain refers to an image representation based on the rate of change of pixel
intensity values. It is used to analyze the spatial variations in an image, with high
frequencies corresponding to sharp transitions like edges and low frequencies corresponding
to smooth regions.
8. List any two types of image filters used in preprocessing.
Two types of image filters used in preprocessing are Gaussian filters (for smoothing and
noise reduction) and Median filters (effective at removing salt-and-pepper noise while
preserving edges).
9. Describe the concept of a color model and list hardware-oriented color models.
A color model is a system for representing colors as a set of numerical values. For example,
the RGB model uses red, green, and blue values to create a color. Hardware-oriented color
models include RGB (for displays) and CMY(K) (for printing).
10. State the expression to find the number of bits to store a digital image.
The number of bits needed to store a digital image can be found using the expression:
Number of bits = M × N × k
where M and N are the image dimensions (rows and columns) and k is the number of bits per
pixel. For example, a 1024 × 1024 image with 8 bits per pixel requires
1024 × 1024 × 8 = 8,388,608 bits.
11. Explain, with notation, how pixel coordinates map to entries in an M×N image matrix.
In an M×N image matrix, the pixel at coordinate (x, y) corresponds to the matrix entry f(y, x).
The coordinate (x, y) gives the column and row, respectively, while the matrix entry (y, x)
follows the usual matrix convention of row first, then column.
12. Define Histogram.
An image histogram is a graphical representation of the tonal distribution in a digital image.
It plots the number of pixels for each intensity value in the image.
13. List the types of Histogram.
The types of histograms include:
● Intensity Histogram: Shows the distribution of pixel values across the image.
● Color Histogram: Shows the distribution of colors in a color image.
● Local Histogram: Calculated for a specific region of the image.
● Global Histogram: Calculated for the entire image.
14. Define Radiometry.
Radiometry is the science of measuring electromagnetic radiation, including light. In
computer vision, it helps in understanding how light interacts with surfaces to form an
image.
15. Illustrate the word "pseudo coloring".
Pseudo coloring is the process of assigning colors to grayscale image intensities based on a
specific rule or map. This technique helps to highlight details and patterns that might not be
visible in the original grayscale image. For example, low intensities could be mapped to blue
and high intensities to red.
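A minimal sketch of pseudo coloring with OpenCV's built-in colormaps (the file names are
hypothetical); the JET colormap maps low intensities toward blue and high intensities toward
red, matching the example above:

```python
import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
colored = cv2.applyColorMap(gray, cv2.COLORMAP_JET)   # grayscale -> false color
cv2.imwrite("scan_pseudocolor.png", colored)
```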
16. List the types of thresholding and explain.
Types of thresholding include:
● Simple Thresholding: A single global threshold value is used to segment the image.
Pixels with intensity values above the threshold are assigned one value (e.g., white), and
pixels below are assigned another (e.g., black).
● Adaptive Thresholding: The threshold value is not fixed but changes based on the local
characteristics of the image. This is useful for images with varying illumination.
● Otsu's Thresholding: An automatic thresholding method that finds the optimal global
threshold by minimizing the intra-class variance of the two classes (foreground and
background).
17. Describe Image Thresholding.
Image thresholding is a segmentation technique used to convert a grayscale image into a
binary image. It works by selecting a threshold value and classifying pixels as either
foreground or background based on whether their intensity is above or below that value.
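A minimal OpenCV sketch of the thresholding variants described above (the input file name is
hypothetical):

```python
import cv2

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input file

# Simple (global) thresholding with a fixed threshold of 127.
_, simple = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Otsu's method: the threshold is chosen automatically from the histogram.
_, otsu = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding: the threshold varies with each 11x11 local neighborhood.
adaptive = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
```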
4-Mark Questions
1. Illustrate the function of each element in a DIP system with a diagram.
A typical DIP system consists of several components working together.
● Image Sensor: Captures the initial visual data from a real-world scene, like a camera or a
scanner.
● Digitizer: Converts the analog output from the sensor into a digital format.
● Image Processor: A specialized hardware unit that performs various image processing
functions on the digital data at high speed.
● Computer: A general-purpose computer that controls the system and performs more
complex processing tasks.
● Mass Storage: Stores the original images and processed results.
● Image Display: A monitor or screen to visualize the images.
● Hardcopy Device: Creates a physical copy of the image, such as a printer.
● Network: Allows the system to share and receive images from other systems.
2. Explain the concept and purpose of edge detection in image processing.
Edge detection is a fundamental technique in image processing used to identify points in a
digital image where the brightness changes sharply. These sharp changes typically
represent boundaries of objects or regions. The purpose of edge detection is to simplify
the image data, preserving important structural properties while significantly reducing the
amount of data to be processed. It is a crucial preprocessing step for tasks such as object
recognition, segmentation, and feature extraction.
3. Define PMF and CDF.
● Probability Mass Function (PMF): In the context of images, the PMF for a discrete
random variable (like pixel intensity) gives the probability that a pixel in the image will
have a specific intensity value.
● Cumulative Distribution Function (CDF): The CDF gives the cumulative probability up
to a certain intensity level. It is the sum of all PMF values up to that intensity. In image
processing, the CDF is used in techniques like histogram equalization to redistribute pixel
intensities.
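A minimal NumPy sketch that computes the PMF and CDF of an 8-bit grayscale image (the
function name is illustrative):

```python
import numpy as np

def pmf_cdf(gray):
    """Return the PMF and CDF of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)  # count of each intensity
    pmf = hist / gray.size                           # normalize to probabilities
    cdf = np.cumsum(pmf)                             # running sum of the PMF
    return pmf, cdf
```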
4. Explain the relationship between pixels, providing an example for clarification.
The relationship between pixels can be defined by their neighborhood and connectivity.
● 4-Neighborhood: A pixel's 4-neighbors are the pixels directly above, below, to the left,
and to the right of it.
● 8-Neighborhood: An 8-neighborhood includes the 4-neighbors plus the four diagonal
pixels.
● Adjacency/Connectivity: Pixels are considered adjacent if they are neighbors and have
the same intensity value.
For example, if a pixel at coordinate (2, 2) has a value of 150, its 4-neighbors are the pixels at
(2, 1), (2, 3), (1, 2), and (3, 2). If these 4-neighbors and the diagonal neighbors at (1, 1), (1, 3),
(3, 1), and (3, 3) all have a value of 150, then all nine pixels are connected in an
8-neighborhood.
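A small illustrative Python helper listing the neighbor coordinates used above (boundary
checks at the image edges are omitted for brevity):

```python
def neighbors(x, y):
    """Return the 4-neighbors, diagonal neighbors, and 8-neighbors of (x, y)."""
    n4 = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    nd = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]
    return n4, nd, n4 + nd
```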
5. Define terms like gradient magnitude, non-maximum suppression, and hysteresis
thresholding in the context of Canny edge detection.
The Canny edge detector is a multi-stage algorithm for finding edges.
● Gradient Magnitude: This is a measure of the change in pixel intensity at a point. A
high gradient magnitude indicates a strong edge.
● Non-Maximum Suppression: This process thins the edges by retaining only the pixel
with the highest gradient magnitude in the direction of the gradient. All other pixels
along the edge are suppressed, resulting in a single-pixel-wide edge.
● Hysteresis Thresholding: This final step uses two thresholds, a high and a low one, to
determine which edges are valid. Pixels with a gradient magnitude above the high
threshold are immediately classified as edges. Pixels between the two thresholds are
classified as edges only if they are connected to a pixel that is already a confirmed
edge.
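A minimal OpenCV sketch of Canny edge detection (the file name and threshold values are
illustrative); the two arguments of cv2.Canny are the low and high hysteresis thresholds:

```python
import cv2

img = cv2.imread("street.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)           # smooth before detection
edges = cv2.Canny(blurred, 50, 150)                    # low = 50, high = 150
```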
6. Describe the mathematical expression for 2-D convolution in linear filtering, including the
meaning of each term.
Two-dimensional convolution for a linear filter is a mathematical operation that applies a
kernel (a small matrix) to an image. The expression is:
g(x, y) = Σ (j = −a to a) Σ (k = −b to b) h(j, k) f(x − j, y − k)
Where:
● f(x, y) is the input image.
● h(j, k) is the filter kernel or mask, with dimensions (2a + 1) × (2b + 1).
● g(x, y) is the output (filtered) image.
The expression calculates the new value of each pixel in the output image by taking a
weighted sum of its neighboring pixels in the input image, with the weights provided by
the filter kernel.
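A direct NumPy implementation of the expression above, as a sketch (zero padding at the
borders, not optimized for speed):

```python
import numpy as np

def convolve2d(f, h):
    """2-D convolution of image f with kernel h, using zero padding."""
    a, b = h.shape[0] // 2, h.shape[1] // 2
    padded = np.pad(f, ((a, a), (b, b)), mode="constant")
    h_flipped = h[::-1, ::-1]                # convolution flips the kernel
    g = np.zeros(f.shape, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            window = padded[x:x + h.shape[0], y:y + h.shape[1]]
            g[x, y] = np.sum(h_flipped * window)   # weighted sum of neighbors
    return g
```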
7. Explain the Image Restoration process with an example.
Image restoration is the process of recovering a degraded image by applying a prior model of
the degradation process. Unlike image enhancement, which is subjective and aims to
improve visual appearance, image restoration is objective and tries to reverse the damage
done to the image.
For example, an image might be degraded by motion blur caused by camera shake. To
restore it, one would model the blur as a convolution with a specific kernel. The restoration
process then involves deconvolution to try to remove the blur and recover the original sharp
image.
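As a rough sketch of the idea (not a robust restoration method), here is a naive inverse
filter in the frequency domain, assuming the blur kernel (PSF) is known and the degradation
behaves like a circular convolution:

```python
import numpy as np

def inverse_filter(blurred, psf, eps=1e-3):
    """Naive frequency-domain deconvolution with a known blur kernel (PSF)."""
    H = np.fft.fft2(psf, s=blurred.shape)   # transfer function of the blur
    G = np.fft.fft2(blurred)                # spectrum of the degraded image
    F_hat = G / (H + eps)                   # eps avoids division by near-zero
    return np.real(np.fft.ifft2(F_hat))
```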
8. List the difference between RGB and grayscale images with an example.
| Feature | RGB Image | Grayscale Image |
| :--- | :--- | :--- |
| Color Information | Stores color information using three channels: Red, Green, and Blue. | Represents colors as shades of gray, from black to white; it has only one channel. |
| Data Size | Requires three times the data storage compared to a grayscale image of the same size. | Requires less data storage as it only needs to store one value per pixel. |
| Representation | Each pixel is represented by three values (R, G, B). | Each pixel is represented by a single intensity value, usually from 0 (black) to 255 (white). |

Example: A vibrant photograph of a red flower would be an RGB image, with pixels storing
values for red, green, and blue. A black-and-white version of that same photo, where the red
flower appears as a light gray, would be a grayscale image.
9. Explain the working principle of a vision-based autonomous vehicle perception system.
A vision-based perception system in an autonomous vehicle uses cameras to gather visual
information about the environment. The system uses computer vision algorithms to
perform tasks like:
● Object Detection: Identifying other vehicles, pedestrians, and cyclists.
● Lane Detection: Recognizing road markings to stay within a lane.
● Sign Recognition: Reading traffic signs and signals.
● Depth Estimation: Using multiple cameras (stereo vision) to determine the distance to
objects.
This information is then processed to create a 3D map of the surroundings, which is used
by the vehicle's control system to make decisions like braking, accelerating, or
steering.
6-Mark Questions
1. Discuss in detail about Sampling and quantization.
Sampling and quantization are two fundamental processes in converting a continuous analog
image into a digital one.
● Sampling: This process digitizes the spatial coordinates of the image. It involves taking
discrete measurements of the image's intensity values at regular intervals in both the
horizontal and vertical directions. The result is a grid of samples, where each sample
corresponds to a pixel in the digital image. The resolution of the resulting image is
determined by the sampling frequency; a higher sampling rate means more pixels and a
more detailed image.
● Quantization: This process digitizes the amplitude (intensity) of the sampled image.
Each sample is assigned a discrete intensity value from a predefined range of levels.
For example, if an image is quantized with 8 bits, there are 2^8 = 256 possible intensity
levels (0 to 255). The number of quantization levels determines the number of gray levels
or colors in the final image. Insufficient quantization can lead to "false contours" where
smooth intensity changes appear as abrupt steps.
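A minimal NumPy sketch of both operations on an 8-bit grayscale image (function names and
the sampling factor are illustrative):

```python
import numpy as np

def requantize(gray, bits):
    """Reduce an 8-bit image to 2**bits intensity levels."""
    step = 256 // (2 ** bits)        # e.g. bits = 3 -> 8 levels, step = 32
    return (gray // step) * step     # map each pixel to its quantization level

def subsample(gray, factor=4):
    """Coarser spatial sampling: keep every 'factor'-th pixel in each direction."""
    return gray[::factor, ::factor]
```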
2. Explain the basic steps of Image Processing.
The basic steps of image processing are:
1. Image Acquisition: The process of capturing the image from a sensor (like a camera)
and converting it into a digital format.
2. Image Enhancement: Aims to improve the visual appearance of the image or make it
more suitable for a specific task. Techniques include adjusting contrast and
brightness, noise reduction, and sharpening.
3. Image Restoration: An objective process that attempts to recover a degraded image by
modeling the degradation and applying an inverse filter.
4. Color Image Processing: Deals with images that have multiple color channels (e.g.,
RGB).
5. Wavelets and Multi-resolution Processing: Techniques that analyze the image at
different levels of detail, useful for tasks like compression and feature extraction.
6. Image Compression: Reduces the amount of data required to store or transmit an image
without significant loss of quality.
7. Morphological Processing: A set of operations based on shape, used for tasks like
noise removal, skeletonization, and object thinning.
8. Segmentation: The process of partitioning an image into multiple segments to simplify
or change the representation of an image into something more meaningful and easier to
analyze.
9. Object Recognition: The final step, where objects are identified and classified based on
their features.
3. Explain the difference between 4-neighbor, diagonal neighbor, and 8-neighbor
relationships in images.
In an image, a pixel's relationship to its surrounding pixels is defined by its neighborhood.
Let's consider a pixel at coordinates (x, y).
● 4-Neighbor (N4): These are the four pixels directly adjacent to the center pixel, sharing
a common edge. Their coordinates are (x+1, y), (x−1, y), (x, y+1), and (x, y−1).
● Diagonal Neighbor (ND): These are the four pixels that are diagonally adjacent to the
center pixel, sharing only a corner. Their coordinates are (x+1, y+1), (x+1, y−1),
(x−1, y+1), and (x−1, y−1).
● 8-Neighbor (N8): This is the combination of the 4-neighbors and the diagonal
neighbors. An 8-neighbor relationship includes all eight pixels surrounding the center
pixel.
4. Explain how the various filter masks are generated to sharpen images in spatial filters.
Image sharpening aims to enhance fine details and edges in an image. This is typically
achieved with spatial filter masks that respond strongly to sharp intensity changes. These
filters emphasize the high-frequency components (edges), for example by subtracting a
blurred version of the image from the original or by adding a derivative response back to
the image.
● Unsharp Masking: A simple sharpening method where a blurred version of the image is
created and then subtracted from the original. The resulting "mask" is then added back
to the original image to sharpen it.
● Laplacian Filter: The Laplacian is a second-order derivative operator. A common
Laplacian filter mask is:
   0  −1   0
  −1   4  −1
   0  −1   0
Applying this mask enhances points where there is a sudden change in intensity,
effectively highlighting edges.
● High-Boost Filter: This is a generalization of the unsharp mask. It is represented by
the formula:
HighBoost(x, y) = A · Original(x, y) − Blurred(x, y), where A > 1. Increasing the value of A
controls the degree of sharpening.
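A minimal sketch of the Laplacian and high-boost approaches described above, using OpenCV
and NumPy (the file name and the value of A are illustrative; the high-boost line follows the
simplified formula given in this answer):

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Laplacian sharpening: add the Laplacian response back to the original image.
lap_kernel = np.array([[0, -1, 0],
                       [-1, 4, -1],
                       [0, -1, 0]], dtype=np.float32)
laplacian = cv2.filter2D(img, -1, lap_kernel)
sharpened = np.clip(img + laplacian, 0, 255).astype(np.uint8)

# High-boost filtering with A > 1, per the formula above.
A = 1.5
blurred = cv2.GaussianBlur(img, (5, 5), 1.0)
high_boost = np.clip(A * img - blurred, 0, 255).astype(np.uint8)
```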
5. Discuss in detail about Histogram Equalization.
Histogram equalization is a technique used to improve the contrast of an image by stretching
out the range of intensity values. The goal is to make the distribution of pixel intensities as
uniform as possible.
The process involves these steps:
1. Calculate the Histogram: First, the histogram of the input image is computed, which
shows the frequency of each intensity level.
2. Calculate the Probability Mass Function (PMF): This is done by normalizing the
histogram (dividing the frequency of each intensity by the total number of pixels).
3. Calculate the Cumulative Distribution Function (CDF): The CDF is calculated by
summing the PMF values.
4. Perform Transformation: The CDF is then used as a transformation function to map the
original intensity levels to new, equalized intensity levels. The formula is:
s_k = T(r_k) = (L − 1) Σ (j = 0 to k) p_r(r_j)
where r_k are the original intensity levels, s_k are the new ones, and L is the number of
intensity levels.
The result of this process is an output image with a flatter, more uniform histogram,
which often leads to better visual quality and contrast.
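A minimal NumPy sketch of these four steps for an 8-bit grayscale image (OpenCV's
cv2.equalizeHist performs the same operation in one call):

```python
import numpy as np

def equalize(gray):
    """Histogram equalization of an 8-bit grayscale image via its CDF."""
    hist = np.bincount(gray.ravel(), minlength=256)   # step 1: histogram
    pmf = hist / gray.size                            # step 2: PMF
    cdf = np.cumsum(pmf)                              # step 3: CDF
    mapping = np.round(255 * cdf).astype(np.uint8)    # step 4: s_k = (L-1)*CDF
    return mapping[gray]                              # apply the mapping
```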
6. Define Computer Vision. Why is vision so difficult? Provide any six real-world
applications of computer vision and explain.
● Definition: Computer vision is a field of artificial intelligence that trains computers to
interpret and understand the visual world from images and videos.
● Why Vision is Difficult: Human vision is complex and effortless, but replicating it in a
computer is a major challenge due to several factors:
○ Scale and Orientation: An object can be viewed from different distances and
angles, and a computer must be able to recognize it in any of these variations.
○ Deformation: The shape of objects can change (e.g., a person walking or an animal
running).
○ Illumination and Shadows: Lighting conditions vary greatly, and shadows can
obscure parts of an object.
○ Intra-class Variation: There can be significant differences between objects of the
same class (e.g., different breeds of dogs).
○ Occlusion: Objects can be partially hidden by others, making it difficult to recognize
them.
○ Background Clutter: An object may blend in with its background, making it hard to
distinguish.
● Six Real-World Applications:
○ Autonomous Vehicles: Computer vision helps cars "see" the road, pedestrians,
traffic lights, and other vehicles to navigate safely.
○ Medical Imaging: It is used to analyze X-rays, CT scans, and MRIs to detect diseases
like cancer or to assist in surgery.
○ Facial Recognition: Used for security systems, unlocking smartphones, and
identifying individuals in crowds.
○ Industrial Automation: For quality control, where a camera system can inspect
products on an assembly line for defects.
○ Retail Analytics: Tracking customer behavior in stores, analyzing product
engagement, and managing inventory.
○ Agriculture: Drones with vision systems can monitor crop health, detect weeds, and
even guide automated harvesting equipment.
7. Explain the concepts of Frequency and Spatial Domain with example.
● Spatial Domain: This is the traditional representation of an image as a 2D grid of
pixels. Processing in the spatial domain involves directly manipulating the pixel values.
Techniques like brightness adjustment, contrast enhancement, and noise reduction with
filters operate in this domain.
○ Example: Applying a blur filter by averaging the pixel values of a neighborhood. You
are directly changing the values of the pixels based on their spatial location.
● Frequency Domain: This domain represents an image based on the rate of change of
pixel intensity. It is accessed through a mathematical transformation, most commonly
the Fourier Transform. High frequencies correspond to sharp intensity changes (edges
and textures), while low frequencies correspond to smooth regions (the overall
brightness and color). Processing in this domain involves modifying the frequency
components and then transforming the image back to the spatial domain.
○ Example: Applying a low-pass filter to remove noise. In the frequency domain, you
would suppress or remove the high-frequency components that represent noise.
The resulting image will be smoother with less detail.
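A minimal NumPy sketch of the low-pass example: transform to the frequency domain, keep
only the low frequencies inside a circular mask, and transform back (the cut-off radius is
illustrative):

```python
import numpy as np

def ideal_lowpass(gray, radius=30):
    """Suppress high frequencies with an ideal circular low-pass filter."""
    F = np.fft.fftshift(np.fft.fft2(gray))       # spectrum with DC at the center
    rows, cols = gray.shape
    y, x = np.ogrid[:rows, :cols]
    mask = (y - rows / 2) ** 2 + (x - cols / 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```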
8. Explain the purpose and functioning of image filtering techniques. Provide examples
for types of filters, such as Gaussian and Median filters used in image processing.
● Purpose: Image filtering is used to modify or enhance an image by applying a filter mask
or kernel to each pixel and its neighborhood. The main purposes are to suppress noise,
enhance features like edges, and perform image restoration or compression.
● Functioning: A filter works by convolving a filter kernel (a small matrix of numbers) with
the image. The kernel is slid over the image, and at each pixel a new value is calculated
by taking a weighted sum of the pixels in the neighborhood, with the weights determined
by the kernel.
● Examples of Filters:
○ Gaussian Filter: This is a linear, low-pass filter used for blurring and noise
reduction. It uses a kernel whose values are weighted by a Gaussian distribution.
This gives more weight to the center pixel and less to its neighbors, resulting in a
smooth blur that is effective at removing random noise.
○ Median Filter: This is a non-linear, order-statistic filter that is highly effective at
removing "salt-and-pepper" noise (random bright or dark pixels). Instead of a
weighted average, it replaces each pixel's value with the median value of its
neighborhood. Because the median is not affected by outliers, it can remove the
noise while preserving the edges.
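A minimal OpenCV sketch applying both filters (the file name, kernel sizes, and sigma are
illustrative):

```python
import cv2

noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
smoothed = cv2.GaussianBlur(noisy, (5, 5), 1.0)        # linear low-pass smoothing
denoised = cv2.medianBlur(noisy, 5)   # 5x5 median, good for salt-and-pepper noise
```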
9. Compare the performance of the Canny edge detector, Sobel operator, and Laplacian of
Gaussian (LoG) filter.
| Feature | Canny Edge Detector | Sobel Operator | Laplacian of Gaussian (LoG) |
| :--- | :--- | :--- | :--- |
| Method | Multi-stage algorithm (smoothing, gradient, non-maximum suppression, hysteresis). | Simple first-order derivative operator. | Second-order derivative operator. |
| Noise Sensitivity | Highly resistant to noise due to the initial Gaussian smoothing step. | Very sensitive to noise. | Sensitive to noise, as derivatives amplify it. |
| Edge Quality | Produces thin, continuous, and well-defined edges. | Produces thick edges that are often not single-pixel wide. | Can produce double edges and is poor at localization. |
| Computational Cost | Relatively complex and computationally expensive due to multiple stages. | Computationally fast and simple to implement. | More computationally expensive than Sobel but less than Canny. |
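For a side-by-side comparison on the same image, a short OpenCV sketch running all three
detectors (the file name and parameters are illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input file

# Sobel: first-order gradient magnitude.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
sobel_edges = np.uint8(np.clip(np.hypot(gx, gy), 0, 255))

# Laplacian of Gaussian: smooth first, then apply the second-order derivative.
log_edges = cv2.Laplacian(cv2.GaussianBlur(img, (5, 5), 1.0), cv2.CV_64F)

# Canny: gradient, non-maximum suppression, and hysteresis thresholding.
canny_edges = cv2.Canny(cv2.GaussianBlur(img, (5, 5), 1.4), 50, 150)
```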
10. Under what conditions might one algorithm be more suitable than the others?
Justify your Answer.
● Canny Edge Detector: This is the most suitable algorithm when high-quality, continuous,
and single-pixel-wide edges are required. It is the preferred choice for applications like
autonomous vehicles or robotics, where precise edge detection is critical for object
recognition and navigation. Its noise robustness makes it ideal for real-world images
that are often noisy.
● Sobel Operator: The Sobel operator is best suited for applications where speed is more
important than edge quality. Because it is computationally simple, it can be used for
real-time applications or as a quick method for finding prominent edges. It is
also useful as a first step in more complex algorithms.
● Laplacian of Gaussian (LoG): The LoG is useful for detecting edges in images where
the intensity changes are more gradual and the location of the edge is less important
than its existence. It can be a good choice for tasks like blob detection. However, its
poor noise performance and tendency to create double edges make it less suitable for
precise edge detection than the Canny operator.
11. Find the convolution of the two sequences x[k]=[3,1,2] and h[k]=[3,2,1].
The convolution of two sequences x[k] and h[k] is given by the formula:
y[n] = Σ (k = −∞ to ∞) x[k] h[n − k]
In this case, we have:
x[k] = [3, 1, 2]
h[k] = [3, 2, 1]
The length of the output sequence is the sum of the lengths of the input sequences
minus one: (3 + 3) − 1 = 5.
y[n] = [y[0], y[1], y[2], y[3], y[4]]
● y[0] = x[0]h[0] = 3 × 3 = 9
● y[1] = x[0]h[1] + x[1]h[0] = (3 × 2) + (1 × 3) = 6 + 3 = 9
● y[2] = x[0]h[2] + x[1]h[1] + x[2]h[0] = (3 × 1) + (1 × 2) + (2 × 3) = 3 + 2 + 6 = 11
● y[3] = x[1]h[2] + x[2]h[1] = (1 × 1) + (2 × 2) = 1 + 4 = 5
● y[4] = x[2]h[2] = 2 × 1 = 2
So, the convolution of the two sequences is y[n] = [9, 9, 11, 5, 2].
12. Find the convolution of the two sequences x[k]=[1,2,4] and h[k]=[1,1,1,1,1].
The convolution of two sequences x[k] and h[k] is given by the formula:
y[n] = Σ (k = −∞ to ∞) x[k] h[n − k]
In this case, we have:
x[k] = [1, 2, 4]
h[k] = [1, 1, 1, 1, 1]
The length of the output sequence is the sum of the lengths of the input sequences
minus one: (3 + 5) − 1 = 7.
y[n] = [y[0], y[1], y[2], y[3], y[4], y[5], y[6]]
● y[0] = x[0]h[0] = 1 × 1 = 1
● y[1] = x[0]h[1] + x[1]h[0] = (1 × 1) + (2 × 1) = 1 + 2 = 3
● y[2] = x[0]h[2] + x[1]h[1] + x[2]h[0] = (1 × 1) + (2 × 1) + (4 × 1) = 1 + 2 + 4 = 7
● y[3] = x[0]h[3] + x[1]h[2] + x[2]h[1] = (1 × 1) + (2 × 1) + (4 × 1) = 1 + 2 + 4 = 7
● y[4] = x[0]h[4] + x[1]h[3] + x[2]h[2] = (1 × 1) + (2 × 1) + (4 × 1) = 1 + 2 + 4 = 7
● y[5] = x[1]h[4] + x[2]h[3] = (2 × 1) + (4 × 1) = 2 + 4 = 6
● y[6] = x[2]h[4] = 4 × 1 = 4
So, the convolution of the two sequences is y[n] = [1, 3, 7, 7, 7, 6, 4].
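Both results can be checked with NumPy's built-in 1-D convolution:

```python
import numpy as np

print(np.convolve([3, 1, 2], [3, 2, 1]))        # [ 9  9 11  5  2]
print(np.convolve([1, 2, 4], [1, 1, 1, 1, 1]))  # [1 3 7 7 7 6 4]
```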