UNIT - I
1. Define computer vision.
Computer vision is a field of artificial intelligence (AI) that allows computers to understand and identify objects and
people in images and videos. It works by using machine learning and neural networks to process visual data and make
sense of it.
2. What is meant by digitization? What are its properties?
Digitization is the process of converting information from a physical or analog format into a digital format. It
involves capturing data or media (such as documents, images, audio, or video) and translating it into binary code so
that it can be stored, processed, and transmitted through digital systems. For instance, scanning a paper document to
create a digital copy or converting audio into an MP3 file are forms of digitization.
➢ Data Storage and Retrieval
➢ Accessibility
➢ Reproducibility
➢ Efficiency
➢ Integration and Processing
➢ Security
3. What are the levels of image representation?
The levels of image representation in digital imaging describe how images are captured, stored, processed, and
displayed. These levels form a hierarchy from raw, low-level data to more complex, high-level interpretations of an
image's content. Here’s an overview of each level:
1. Pixel Level (Low Level)
● Definition: The pixel level represents the most basic form of an image, consisting of tiny, square elements
called pixels.
2. Feature Level
● Definition: At the feature level, an image is represented by distinct features such as edges, corners, shapes,
textures, and colors.
3. Object Level
● Definition: At the object level, an image is represented in terms of distinct objects or regions.
4. Scene Level (High Level)
● Definition: The scene level provides a high-level understanding of the entire image, identifying the context
or setting within which objects are located.
5. Semantic Level
● Definition: The semantic level involves interpreting an image in terms of its meaning or significance, often
in relation to specific human knowledge or understanding.
5. Differentiate between image representation and image analysis tasks.
Image representation is the process of converting visual data into a digital format that computers can interpret.
Image analysis is the process of extracting useful information from images, usually digital images, using digital
image processing techniques.
● Image analysis is the process of breaking down an image into its fundamental components to extract
meaningful information. Some image analysis tasks include:
● Image segmentation: A key task in image analysis that involves isolating regions and objects of interest. Image
segmentation is a major part of object recognition and categorization in computer vision.
● Object recognition: A major task in image mining that involves finding objects in the real world from an
image using known object models.
● Region analysis: Involves extracting statistical data from an image.
7. What is the difference between image analysis and computer graphics?
Image analysis is the process of extracting information from an image, while computer graphics is the process of
creating images:
Computer graphics is a technology that uses computers to generate images, videos, and animations. It's used in many
applications, including:
Film and television: Computer graphics are used to create images and effects for movies and TV shows. For
example, the 2009 film Avatar used facial motion capture technology.
Video games: Computer graphics are used to create the images and animations in video games.
8. List the data structures used for image analysis.
● Image Maps
● Chains
● Run-Length Encoding
● Hierarchical Image Structures
● Pyramids
● Trees
● Relation Graphs
9. What is meant by image acquisition?
Image acquisition is the action of retrieving an image from some source, typically using a single sensor, a line
sensor, or an array sensor.
10. Define the neighbours of a pixel.
A pixel's neighbors are the pixels that are adjacent to it within a specified distance and threshold. The collection of
pixels around a pixel is called its neighborhood.
UNIT - II
1. What do you understand by the term aliasing?
“Aliasing is basically an artifact that makes an image look particularly digital in a disagreeable way,” says
photographer Philip Heying. “It's caused when digital information is broken down into pixels and bits; little tiny
particles that are laid out in a grid.”
2. List the image analysis tasks.
Image Classification - Assigning labels or categories to entire images.
Object Detection - Identifying and localizing objects within an image.
Image Segmentation - Dividing an image into meaningful parts, often into individual objects or regions.
Edge Detection - Identifying and highlighting the edges or boundaries within an image.
Feature Extraction - Detecting specific features, such as corners, textures, or patterns, that are useful for further
analysis.
Image Registration - Aligning two or more images, often from different sources or viewpoints.
Optical Character Recognition (OCR) - Recognizing and extracting text from images.
Motion Analysis - Tracking and analyzing motion in a sequence of images, useful in video processing.
Facial Recognition - Identifying and verifying individuals based on facial features.
Anomaly Detection - Detecting unusual patterns or objects within images, often used in security or quality control.
3D Reconstruction - Creating 3D models from multiple 2D images.
Image Denoising - Reducing noise in images to enhance visual quality.
Image Enhancement - Improving image quality, such as adjusting contrast, sharpness, or brightness.
4. What are the main goals of image preprocessing?
Image preprocessing tackles various distortions and improves key image qualities such as contrast and resolution
while reducing noise. These adjustments are essential for computer vision and machine learning applications and fall
under foundational techniques in image processing.
5. Define convolution mask.
In image processing, a convolution mask is a small matrix with a set of weightings which is applied to pixel values in
order to create a new effect such as blurring, sharpening, embossing, edge detection, and more.
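As a minimal sketch of how a mask is applied (assuming NumPy and SciPy are available and the image is a 2-D grayscale array; the image here is random placeholder data):

import numpy as np
from scipy.ndimage import convolve

# Hypothetical grayscale image as a 2-D float array.
image = np.random.rand(128, 128) * 255

# 3x3 averaging (smoothing) mask: every weight is 1/9.
blur_mask = np.full((3, 3), 1.0 / 9.0)

# 3x3 sharpening mask: boosts the centre pixel relative to its neighbours.
sharpen_mask = np.array([[ 0, -1,  0],
                         [-1,  5, -1],
                         [ 0, -1,  0]], dtype=float)

blurred = convolve(image, blur_mask, mode="nearest")
sharpened = convolve(image, sharpen_mask, mode="nearest")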
6. Define smoothing.
Smoothing aims to suppress noise or other small fluctuations in the image; it is equivalent to the suppression of high
frequencies in the Fourier transform domain. Unfortunately, smoothing also blurs all sharp edges that bear important
information about the image.
7. Give an algorithm for smoothing using a rotating mask.
Averaging using a rotating mask is a non-linear method that avoids edge blurring; the resulting image is in fact
sharpened. For each pixel, several masks rotated around it are considered; a brightness dispersion σ² is used as the
region homogeneity measure, and the brightness average is calculated only within the most homogeneous region.
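A minimal, unoptimized sketch (assuming a NumPy grayscale array; the nine shifted 3x3 windows that contain each pixel stand in for the rotated masks of the classical formulation):

import numpy as np

def rotating_mask_average(image):
    # For every pixel, examine the nine 3x3 windows that contain it,
    # pick the window with the smallest brightness dispersion (variance),
    # and output that window's mean brightness.
    img = image.astype(float)
    out = img.copy()
    h, w = img.shape
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            best_var, best_mean = np.inf, img[r, c]
            # The top-left corner of each candidate window ranges over
            # (r-2..r, c-2..c), so every window contains pixel (r, c).
            for dr in range(-2, 1):
                for dc in range(-2, 1):
                    win = img[r + dr : r + dr + 3, c + dc : c + dc + 3]
                    v = win.var()          # homogeneity measure (sigma squared)
                    if v < best_var:
                        best_var, best_mean = v, win.mean()
            out[r, c] = best_mean
    return out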
8. Define edge detectors.
Edge detectors are image processing algorithms used to identify significant transitions (or "edges") in intensity
within an image, which often correspond to the boundaries of objects. Edges are typically areas of high contrast
between neighboring pixels. Edge detection is a fundamental step in computer vision and image processing, as it
simplifies an image, making it easier to analyze object boundaries, shapes, and textures. Here are some common
edge detection techniques:
1. Sobel Edge Detector
● The Sobel operator uses two 3x3 kernels (one for horizontal and one for vertical gradients) to calculate the
gradient of image intensity. These kernels approximate the gradient in the x and y directions.
● It combines the two gradient images to determine the edges.
● This detector emphasizes edges in diagonal and horizontal directions and is sensitive to noise.
2. Prewitt Edge Detector
● Similar to Sobel, the Prewitt operator also uses 3x3 kernels to calculate horizontal and vertical gradients.
3. Canny Edge Detector
● The Canny edge detector is a multi-step process that includes Gaussian smoothing, gradient computation,
non-maximum suppression, and edge tracking by hysteresis.
4. Laplacian of Gaussian (LoG)
5. Roberts Cross Edge Detector
6. Difference of Gaussians (DoG)
7. Directional Derivative Edge Detectors (e.g., Scharr Filter)
8. Wavelet Edge Detectors
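As an illustration of the gradient-based detectors above, here is a minimal Sobel sketch (assuming NumPy/SciPy and a 2-D grayscale array; the threshold value is an arbitrary example):

import numpy as np
from scipy.ndimage import convolve

# Standard Sobel kernels for the x (horizontal) and y (vertical) gradients.
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_edges(image, threshold=100.0):
    # Combine the two gradient images into a magnitude map,
    # then threshold it to obtain a binary edge map.
    gx = convolve(image.astype(float), KX, mode="nearest")
    gy = convolve(image.astype(float), KY, mode="nearest")
    magnitude = np.hypot(gx, gy)    # sqrt(gx^2 + gy^2)
    return magnitude, magnitude > threshold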
9. What is meant by zero-crossing edge detectors?
● Zero-crossing detectors identify edges by finding places where the second derivative of the image intensity
function changes sign, which often indicates an edge.
● The Laplacian of Gaussian (LoG) is a common example, detecting zero-crossings to localize edges.
● These detectors are sensitive to fine detail but may require pre-smoothing to avoid noise artifacts.
10. What is meant by image sharpening?
Image sharpening is a digital technique that enhances the edges and details of an image to make it appear clearer and
more vibrant. It's a common step in digital image processing, and can be used to fix blurry images, compensate for
camera shake, and more.
List some operators used for image smoothing.
Define the Sobel operator.
Define homomorphic filtering.
What is the difference between image restoration and image enhancement?
What is the difference between Wiener and inverse filtering?
Define wrong lens focus.
UNIT - III
1. What is meant by object detection?
Object detection is a computer vision technique to locate objects in an image or in a video. Organizations and
researchers are spending a huge amount of time and resources to uncover this capability. When we humans look at a
picture, we can quickly identify the objects and their respective positions in the image.
2. Compare object classification, localization, and detection.
Classification assigns a class label to the main object in an image. Classification with localization not only
classifies the main object but also localizes it in the image, determining its bounding box (position and size).
Detection tries to find all objects of the previously trained (known) classes in the image and localize them.
3. What are the use cases of object detection?
1. Object detection is the key intelligence behind autonomous driving technology. It allows a vehicle to detect
cars, pedestrians, the background, motorbikes, and so on to improve road safety.
2. We can detect objects in the hands of people, and the solution can be used for security and monitoring purposes.
Surveillance systems can be made much more intelligent and accurate. Crowd control systems can be made more
sophisticated, and the reaction time will be reduced.
3. A solution might be used for detecting objects in a shopping basket, and it can be used by the retailers for the
automated transactions. This will speed up the overall process with less manual intervention.
4. Object detection is also used in the testing of mechanical systems and on manufacturing lines. We can detect
objects present on the products which might be compromising product quality.
5. In the medical world, the identification of diseases by analyzing the images of a body part will help in faster
treatment of the diseases.
5. Define sliding window.
A very simple approach to detecting objects is to divide the image into regions or specific areas and then classify
each one of them. This approach to object detection is the sliding window. As the name suggests, it is a rectangular
box of fixed length and width that slides over the entire image with a given stride.
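A minimal generator-style sketch (the window size, stride, and the classify helper are hypothetical placeholders):

import numpy as np

def sliding_windows(image, win_h=64, win_w=64, stride=32):
    # Yield (top, left, patch) for every window position; each patch
    # would be passed to a classifier to decide whether it contains
    # an object of interest.
    H, W = image.shape[:2]
    for top in range(0, H - win_h + 1, stride):
        for left in range(0, W - win_w + 1, stride):
            yield top, left, image[top:top + win_h, left:left + win_w]

# Usage sketch, with a hypothetical classifier classify(patch) -> label:
# for top, left, patch in sliding_windows(img):
#     if classify(patch) == "car":
#         boxes.append((top, left, win_h, win_w))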
6. Define IoU.
Intersection over Union (IoU) is a test to ascertain how close our prediction is to the ground truth. IoU calculates
the intersection over the union of two bounding boxes: the ground-truth bounding box and the predicted bounding box.
The numerator is the common area, while the denominator is the complete union of the two areas:
IoU = Overlapping region / Combined entire region (Equation 5-1)
The higher the value of IoU, the better the overlap, and hence the more accurate the prediction. An example is
depicted in Figure 5-5.
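A direct translation of Equation 5-1 for axis-aligned boxes (the corner format (x1, y1, x2, y2) and the example coordinates are illustrative assumptions):

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    x1 = max(box_a[0], box_b[0])          # intersection rectangle
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter       # combined entire region
    return inter / union if union > 0 else 0.0

# Example: iou((0, 0, 10, 10), (5, 5, 15, 15)) = 25 / 175, about 0.14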
7. Define YOLO. What are its salient features?
YOLO (You Only Look Once) is a single-stage object detection network. Its salient features are:
1. YOLO divides the input image into an S×S grid. Each grid cell is responsible for predicting only one object: if the
center of an object falls in a grid cell, that grid cell is responsible for detecting that object.
2. For each of the grid cells, it predicts boundary boxes (B). Each of the boundary boxes has five attributes – the x
coordinate, y coordinate, width, height, and a confidence score. In other words, it has (x, y, w, h) and a score. This
confidence score is the confidence of having an object inside the box. It also reflects the accuracy of the
boundary box.
3. The width w and height h are normalized by the image's width and height. The x and y coordinates represent the
center of the box relative to the bounds of the grid cell.
4. The confidence is defined as Pr(Object) times IoU. If there is no object, the confidence is zero; otherwise, the
confidence is equal to the IoU between the predicted box and the ground truth.
5. Each grid cell predicts C conditional class probabilities – Pr(Class_i | Object). These probabilities are
conditioned on the grid cell containing an object. Only one set of class probabilities is predicted per grid cell,
regardless of the number of boxes B.
6. At test time, the conditional class probabilities are multiplied by the individual box confidence predictions.
8. What is the loss function in YOLO?
YOLO uses a sum of squared errors between the predictions and the ground truth to calculate the loss. The loss
function is composed of (the full expression is given after the list):
• The Classification loss.
• The Localization loss (errors between the predicted boundary box and the ground
truth).
• The Confidence loss (the objectness of the box).
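For reference, the complete sum-of-squared-error loss from the original YOLO paper (Redmon et al., 2016) combines these three parts; here 1_ij^obj is 1 if box j of cell i is responsible for an object, 1_ij^noobj is its complement, and the paper sets λ_coord = 5 and λ_noobj = 0.5. In LaTeX form:

\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (\sqrt{w_i}-\sqrt{\hat{w}_i})^2 + (\sqrt{h_i}-\sqrt{\hat{h}_i})^2 \right] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i-\hat{C}_i)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i-\hat{C}_i)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c\in\text{classes}} (p_i(c)-\hat{p}_i(c))^2
\end{aligned}

The first two rows are the localization loss, the next two the confidence loss, and the last row the classification loss.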
9. What is the advantage of two-stage methods?
Two-stage methods like R-CNN first predict a few candidate object locations and then use a convolutional neural
network to classify each of these candidate locations as one of the classes or as background. Because the second stage
operates on a small set of refined proposals, two-stage methods are typically more accurate, especially in
localization, at the cost of speed.
10. What is FPN?
Feature Pyramid Network (FPN) is a feature extractor designed around a feature-pyramid concept to improve accuracy
and speed. Images first pass through the CNN pathway, yielding semantically rich final layers. Then, to regain better
resolution, FPN creates a top-down pathway by upsampling this feature map. While the top-down pathway helps detect
objects of varying sizes, spatial positions may be skewed, so lateral connections are added between the original
feature maps and the corresponding reconstructed layers to improve object localization. It is currently one of the
leading ways to detect objects at multiple scales, and YOLOv3 and Faster R-CNN were built with this technique.
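A shape-level sketch of the top-down pathway with lateral connections (NumPy only; the lateral argument is a hypothetical stand-in for the 1x1 convolution that maps each backbone level to a common channel width):

import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_top_down(c3, c4, c5, lateral):
    # c3, c4, c5: backbone feature maps where each level is half the
    # spatial size of the previous one (e.g. strides 8, 16, 32).
    p5 = lateral(c5)
    p4 = lateral(c4) + upsample2x(p5)   # lateral + top-down merge
    p3 = lateral(c3) + upsample2x(p4)
    return p3, p4, p5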
UNIT - IV
1. What is face detection?
Face detection, also called facial detection, is an artificial intelligence (AI)-based computer technology used to
find and identify human faces in digital images and video.
2. What is face recognition?
Facial recognition is a way of identifying or confirming an individual’s identity using their face. Facial recognition
systems can be used to identify people in photos, videos, or in real-time.
3. What are the applications of face recognition?
● Security management: It is used by retailers to know when individuals with a not-so-good history have entered the
premises. When shoplifters, criminals, or fraudsters enter the stores, they act as a threat.
● Identity verification.
● Marketing: Consumer experience is improved when the consumer-product interaction is analyzed.
● Access control: access to offices, airports, buildings, and warehouses.
4. What is the concept of triplet loss?
The triplet loss is a distance-based loss function that aims to learn embeddings that are closer for similar input
data and farther apart for dissimilar ones. First, we have to form triplets of data that consist of the following
(a minimal computation is sketched after this list):
• an anchor input sample
• a positive example that has the same label as the anchor
• a negative example that has a different label from the anchor (and, of course, from the positive)
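A minimal sketch on NumPy embedding vectors (the default margin of 0.2 follows the FaceNet paper; the inputs are assumed to be same-length 1-D arrays):

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances anchor->positive and anchor->negative.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    # Penalize triplets where the positive is not at least `margin`
    # closer to the anchor than the negative is.
    return max(d_pos - d_neg + margin, 0.0)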
5. What is the concept of facial alignment?
Face alignment is a computer vision technology for identifying the geometric structure of human faces in digital
images. Given the location and size of a face, it automatically determines the shape of the face components, such as
the eyes and nose. A face alignment program typically operates by iteratively adjusting a deformable model, which
encodes prior knowledge of face shape or appearance, to take into account the low-level image evidence and find the
face that is present in the image.
6. What are the various use cases of gesture recognition?
Gesture recognition can be used to control devices or interfaces, such as a computer or a smartphone, through
movements or actions, such as hand or body movements, facial expressions, or even voice commands.
Gesture recognition has a variety of uses, including:
• Human-computer interaction: Gesture recognition can be used to control computers, smartphones, and other
devices through gestures, such as swiping, tapping, and pinching.
• Gaming: Gesture recognition can be used to control characters and objects in video games, making the gaming
experience more immersive and interactive.
• Virtual and augmented reality: Gesture recognition can be used to interact with virtual and augmented reality
environments, allowing users to control and manipulate objects in those environments.
• Robotics: Gesture recognition can be used to control robots, allowing them to perform tasks based on the user’s
gestures.
• Sign language recognition: Gesture recognition can be used to recognize and translate sign language into spoken or
written language, helping people who are deaf or hard of hearing communicate with others.
7. What are fiducial points?
A fiducial marker, or fiducial, is an object placed in the field of view of an imaging system that appears in the
image produced, for use as a point of reference or a measure.
8. What is the concept of FaceNet?
FaceNet is a face recognition system developed in 2015 by Google researchers Florian Schroff, Dmitry
Kalenichenko, and James Philbin in a paper titled FaceNet: A Unified Embedding for Face Recognition and
Clustering.
UNIT - V
1. What are the use cases of video analytics?
Videos are a rich source of knowledge and information. We can utilize Deep Learning–based capabilities across domains
and business functions. Some of them are listed as follows:
1. Real-time face detection can be done using video analytics, allowing us to detect and recognize faces. It has huge
benefits and applications across multiple domains. We have discussed the application in detail in the last chapter.
2. In disaster management, video analytics can play a significant role. Consider this. In a flood-like situation, by
analyzing videos of the actual area, the rescue team can identify the zones they should concentrate on. It will help
reduce the time to action which directly leads to more lives saved.
3. For crowd management, video analytics plays an important role. We can identify the concentration of the population
and the imminent dangers in that situation. The respective team can analyze the videos or a real-time video stream
from cameras, and suitable action can be taken to prevent any mishap.
4. By analyzing social media videos, marketing teams can improve their content. The marketing teams can even analyze
the content of competitors and tweak their business plan as per the business needs.
2. What is the purpose of skip connections and how are they useful?
Skip connections can help the network learn more complex and diverse patterns from the data and reduce the number of
parameters and operations needed by the network. Additionally, skip connections can help alleviate the problem of
vanishing gradients by providing alternative paths for the gradients to flow.
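A minimal NumPy sketch of an identity skip connection, y = F(x) + x (the layer widths and the ReLU choice are illustrative assumptions):

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # F(x): two linear layers with a non-linearity in between.
    f = relu(x @ w1)
    f = f @ w2                # w2 must map back to x's width
    # Skip connection: the identity path gives gradients a direct
    # route around F during backpropagation.
    return relu(f + x)

# Usage sketch: x of shape (d,), w1 of shape (d, h), w2 of shape (h, d).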
3. What is the problem of vanishing gradients and how can we rectify it?
The vanishing gradient problem is a challenge in deep learning that occurs when the gradients used to update
weights during backpropagation become very small. This makes it difficult for the network to learn, and can slow
down or stop training.
4. What is the improvement between Inception v1 and Inception v3 networks?
Inception V1 has a simple network structure and fewer parameters, so it is easier to implement and deploy when
computing resources are limited; it also has higher computational efficiency and faster learning speed than Inception
V3. Inception V3, in turn, improves accuracy by factorizing large convolutions into smaller ones (e.g., replacing a
5×5 convolution with two 3×3 convolutions), adding batch normalization, and using label smoothing.
5. Define video analytics.
Video analytics, also known as Video Content Analysis (VCA), is a technology that uses algorithms to analyze video in
real time and extract useful information.
6. List the uses of skip connections.
Skip connections can help to preserve information and gradients that might otherwise be lost or diluted by passing
through multiple layers. They can also help to combine features from different levels of abstraction and resolution,
which can enhance the representation power of the network.
8. What is Inception v2?
Inception-ResNet-v2 is a convolutional neural network that is trained on more than a million images from the
ImageNet database [1]. The network is 164 layers deep and can classify images into 1000 object categories, such as
keyboard, mouse, pencil, and many animals.
9. Define gradient clipping.
Gradient clipping is one of the methods that can be used: we limit the size of the gradients during the process of
training. We set a threshold for the error gradients, and an error gradient is set to that limit, or clipped, if it
exceeds the threshold.
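A minimal sketch of norm-based clipping on a NumPy gradient vector (the threshold of 1.0 is an arbitrary example value):

import numpy as np

def clip_gradient(grad, threshold=1.0):
    # If the L2 norm of the gradient exceeds the threshold,
    # rescale the gradient so its norm equals the threshold.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# Element-wise alternative: np.clip(grad, -threshold, threshold)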
10. Define video processing.
Video processing is the process of manipulating and managing video data using computational tasks. It involves a
variety of activities, including: Image processing, Compression, Color conversion, Encryption, Video stabilization,
and MPEG-4 algorithm execution. Video processing is a type of signal processing that uses video filters and
statistical analysis to extract information or alter video. Some basic video processing techniques include: Trimming,
Image resizing, Brightness and contrast adjustment, and Fade in and fade out.