
UNIT-3

Making Machines See


How do machines see?
Computer vision, often abbreviated as CV, is a field of artificial intelligence that aims to work
in a way similar to human vision. CV derives meaningful information from digital images,
videos, and other visual input and uses it to make recommendations or take appropriate
actions. Computer Vision is sometimes called Machine Vision.
Definition: Computer Vision is a field of artificial intelligence (AI) that uses sensing devices
and deep learning models to help systems understand and interpret the visual world.

Working of Computer Vision


Computer vision is the field of study that focuses on processing and analysing digital images
and videos to comprehend their content. A fundamental aspect of computer vision lies in
understanding the basics of digital images.
Basics of digital images
A digital image is a picture that is stored on a computer in the form of a sequence of
numbers that computers can understand. Digital images can be created in several ways,
such as using design software (like Paint or Photoshop), taking one with a digital camera, or
scanning one with a scanner.
Interpretation of an image in digital form
When a computer processes an image, the image is divided into tiny square boxes known as
pixels, each of which represents a specific colour value. During digitization, the image
becomes a grid of pixels. The resolution of an image is determined by the number of pixels:
higher-resolution images have more pixels. Every pixel in the image is assigned a number.
For monochrome images, such as black and white pictures, the values range from 0 to 255,
where 0 corresponds to black and 255 represents white.
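To make this concrete, here is a minimal sketch (using Python and NumPy; the pixel values
are invented purely for illustration) of how a monochrome image is simply a grid of numbers:

import numpy as np

# A tiny 3x3 grayscale "image": each entry is one pixel's intensity,
# where 0 is black and 255 is white (the 8-bit monochrome range).
image = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 100],
], dtype=np.uint8)

print(image.shape)    # (3, 3) -> height x width in pixels
print(image[0, 2])    # 255 -> the top-right pixel is white

A higher-resolution image is simply a larger grid of such numbers, and a colour image stores
three values (red, green, and blue) for each pixel.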
Computer Vision – Process
The Computer Vision process often involves five stages: Image Acquisition, Preprocessing,
Feature Extraction, Detection/Segmentation, and High-Level Processing. They are explained
below.
1. Image Acquisition:
Image acquisition is the initial stage of the computer vision process, in which digital images
or videos are captured as raw data. A high-resolution camera produces clearer images than a
lower-resolution one, while low-light conditions may result in poor image quality. In
scientific and medical fields, MRI (Magnetic Resonance Imaging) or CT (Computed
Tomography) scanners can capture highly detailed images of biological tissues and structures.
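As a rough illustration of this stage, the snippet below reads raw image data from a file or a
camera using the OpenCV library (the filename sample.jpg and camera index 0 are
placeholder assumptions, not part of this unit):

import cv2

# Acquire an image from a file on disk.
frame = cv2.imread("sample.jpg")   # a NumPy array of pixel values, or None on failure
if frame is None:
    raise FileNotFoundError("sample.jpg could not be read")

# Alternatively, grab a single raw frame from the default camera (device 0).
camera = cv2.VideoCapture(0)
ok, live_frame = camera.read()     # ok is False if no frame was captured
camera.release()

print(frame.shape)                 # e.g. (height, width, 3 colour channels)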
2. Preprocessing:
Preprocessing in computer vision aims to improve the quality of the acquired image. Some
of the common techniques are –
a. Noise Reduction: Removes unwanted elements such as blurriness, random spots, or
distortions. This makes the image clearer and reduces distractions for algorithms.

b. Image Normalization: Standardizes pixel values across images for consistency. Adjusts the
pixel values of an image so they fall within a consistent range (e.g., 0–1 or -1 to 1).
c. Resizing/Cropping: Changes the size or aspect ratio of the image to make it uniform.
Ensures all images have the same dimensions for analysis.

d. Histogram Equalization: Adjusts the brightness and contrast of an image. Spreads out the
pixel intensity values evenly, enhancing details in dark or bright areas. Example: Making a
low-contrast image look sharper and more detailed.

The main goal of preprocessing is to prepare images for computer vision tasks by:
• Removing noise (disturbances).
• Highlighting important features.
• Ensuring consistency and uniformity across the dataset.
A short code sketch of steps a–d is given below.
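The sketch uses OpenCV; the filename and the 224 x 224 target size are example
assumptions, not values prescribed by this unit:

import cv2
import numpy as np

img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder filename

# a. Noise reduction: a Gaussian blur smooths out random speckles.
denoised = cv2.GaussianBlur(img, (5, 5), 0)

# b. Image normalization: scale pixel values from 0-255 into the 0-1 range.
normalized = denoised.astype(np.float32) / 255.0

# c. Resizing: force every image to the same dimensions (here 224 x 224).
resized = cv2.resize(denoised, (224, 224))

# d. Histogram equalization: spread intensity values out to boost contrast.
equalized = cv2.equalizeHist(resized)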
3. Feature Extraction:
Feature extraction involves identifying and extracting relevant visual patterns or attributes
from the pre-processed image. Feature extraction algorithms vary depending on the specific
application and the types of features relevant to the task.
• Edge detection: Edge detection identifies the boundaries between different regions in
an image where there is a significant change in intensity.
• Corner detection: Corner detection identifies points where two or more edges meet.
These points are areas of high curvature in an image; the detector looks for sharp
changes in image gradients, which often correspond to corners or junctions in
objects.
• Texture analysis: Texture analysis extracts features such as smoothness, roughness, or
repetition in an image.
• Colour-based features: Colour-based feature extraction quantifies colour distributions
within the image, enabling discrimination between different objects or regions based
on their colour characteristics.
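The snippet below illustrates a few of these classic feature extraction techniques with
OpenCV; the filename, thresholds, and parameter values are only example assumptions:

import cv2
import numpy as np

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder filename

# Edge detection: Canny marks pixels where intensity changes sharply.
edges = cv2.Canny(gray, threshold1=100, threshold2=200)

# Corner detection: the Harris detector responds strongly where edges meet.
corners = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)

# Colour-based features: a histogram of hue values from the colour image.
bgr = cv2.imread("sample.jpg")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
hue_hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])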

4. Detection/Segmentation:
Detection and segmentation are fundamental tasks in computer vision that focus on
identifying objects or regions of interest within an image. These tasks play an important role
in applications like autonomous driving, medical imaging, and object tracking. This crucial
stage is categorized into two primary tasks:
• Single Object Tasks
• Multiple Object Tasks
a. Single Object Tasks: Single object tasks focus on analysing or describing individual objects
within an image, with two main objectives:


• Classification: Classification is the process of using algorithms to sort data into
different categories. For example, the KNN (K-Nearest Neighbour) classification
algorithm is used in supervised learning, while the K-means clustering algorithm is
used in unsupervised learning (a minimal KNN sketch follows this list).
• Classification + Localization: Classification with localisation combines classifying an
object in an image with identifying its location. Localisation involves precisely locating
the object within the image by predicting a bounding box that tightly encloses it.
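Here is that minimal KNN classification sketch, written with scikit-learn; the two feature
values per image and the cat/dog labels are invented purely to show the idea:

from sklearn.neighbors import KNeighborsClassifier

# Toy example: each "image" is reduced to two invented feature values
# (say, average brightness and edge density), labelled cat or dog.
features = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
labels = ["cat", "cat", "dog", "dog"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(features, labels)

print(knn.predict([[0.85, 0.25]]))   # -> ['cat'], the majority label of the 3 nearest samples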
b. Multiple Object Tasks: Multiple object tasks deal with scenarios where a single image
contains multiple instances of objects or several different object classes. These tasks aim to
identify and distinguish between the various objects within the image, and they include:
• Object Detection: Object detection focuses on identifying and locating multiple
objects within the image. It draws bounding boxes around the detected objects and
assigns class labels to these boxes.
• Image segmentation: Image segmentation creates a mask around pixels with similar
characteristics and identifies their class, which helps to understand the image in
detail. Each pixel is assigned a class, which helps to identify each object separately
from the others. Techniques like edge detection, which works by detecting
discontinuities in brightness, are used in image segmentation. There are different
types of image segmentation; two popular ones are described here (see the sketch
after this list):
• Semantic Segmentation: It classifies pixels as belonging to a particular class. For
example, in an image that contains multiple objects like animals, plants, and
humans, the pixels of every animal are labelled with the animal class, but the
type of animal is not identified.
• Instance Segmentation: All the objects in the image are differentiated even if they
belong to the same class. In the animal example, instance segmentation would
separate each individual animal, for instance each dog, as its own instance.
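Modern detection and segmentation systems rely on deep learning, but the classical OpenCV
sketch below conveys the basic ideas: a mask separates object pixels from the background,
connected components give each separate region its own label (loosely the instance idea),
and bounding boxes give a detection-style output. The filename and the thresholding choice
are assumptions for the example:

import cv2

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder filename

# Simple segmentation: Otsu thresholding splits pixels into object vs background.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Connected components assign each separate region its own label.
num_labels, label_map = cv2.connectedComponents(mask)
print(num_labels - 1, "object regions found (excluding the background)")

# Detection-style output: a bounding box drawn around each region.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(gray, (x, y), (x + w, y + h), 255, 2)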


5. High-Level Processing:
In the final stage of computer vision, high-level processing interprets and extracts
meaningful information from the detected objects or regions within digital images.
High-level processing enables computers to gain a deeper understanding of visual content
and helps them make decisions based on the visual data. It includes recognising objects,
understanding scenes, and analysing the context of the visual content using machine
learning algorithms.
Application of Computer Vision
Computer vision is integrated into the majority of the applications that we use in day-to-day
life. Some of the applications are listed below, which you might have already learnt in lower
classes.
• Facial recognition: Face recognition is used on social media platforms like Facebook to
identify and tag users in photos.
• Healthcare: In the healthcare sector, computer vision can identify diseases or
abnormalities in patients, and object detection can also be applied to medical images.
• Self-driving vehicles: In a self-driving vehicle, computer vision captures views from
different angles around the car and its surroundings. Computer vision can also read
traffic signals, detect other vehicles, detect pedestrian paths, and so on.
• Optical character recognition (OCR): OCR helps to extract text from visual data, for
example, images of invoices, articles, bills, etc. (a short example follows this list).
• Machine inspection: Computer vision can inspect machines and detect any defects or
functional flaws in them.
• 3D model building: Computer vision can help to construct 3D computer models from
existing objects, which are used in areas such as robotics, autonomous driving, 3D
tracking, 3D scene reconstruction, and AR/VR.
• Surveillance: Live footage from CCTV cameras in public places helps to identify
suspicious behaviour, detect dangerous objects, and prevent crimes, thereby helping to
maintain law and order.
• Fingerprint recognition and biometrics: Detects fingerprints and biometrics to
validate a user’s identity.
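As a small example of the OCR application mentioned above, the snippet below uses the
pytesseract wrapper around the Tesseract OCR engine (the invoice.png filename is a
placeholder, and Tesseract itself must be installed separately):

from PIL import Image
import pytesseract   # requires the Tesseract OCR engine to be installed

# Extract the printed text from an invoice image.
text = pytesseract.image_to_string(Image.open("invoice.png"))
print(text)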
Challenges of Computer Vision
Computer vision plays an important role in artificial intelligence, but to make some sense of
the visual data, computer vision faces multiple challenges. These challenges include:
• Reasoning and Analytical Issues: Computer vision has the capability to extract
meaningful insights from images, but it also requires accurate interpretation. Robust
reasoning and analytical capabilities are important for defining attributes within visual
content, which helps to extract meaningful and accurate information from visuals.
• Difficulty in Image Acquisition: Every image is different: some images are high
resolution and some are not, some are well lit and some are not, and different images
are captured at different scales. Analysing such varied data and interpreting it
accurately is a key challenge for computer vision.
• Privacy and Security Concerns: Surveillance through security cameras raises privacy
concerns and can infringe on individuals' privacy rights. Face recognition technology
in computer vision can create dilemmas regarding privacy and security.
• Duplicate and False Content: Computer vision introduces challenges related to the
proliferation of duplicate and false content. Malicious actors can exploit
vulnerabilities in image and video processing algorithms to create misleading or
fraudulent content.
The future of Computer Vision
There have been significant improvements in computer vision technology; images can now
be processed by complex systems that understand and interpret visual data much like a
human would. Deep learning is capable of analysing vast amounts of labelled data. The
possibilities of computer vision are awe-inspiring, from personalized healthcare diagnostics
to immersive AR experiences, and we can unlock its full potential and harness its
transformative power for the benefit of humanity.
