AI For IA Unit 2

Unit 2

Introduction to Image Processing: Images, Pixels, Image Resolution, PPI and DPI,
Bitmap Images, Lossless Compression, Lossy Compression, Image File Formats, Color
Spaces: RGB, XYZ, HSV/HSL, LAB, LCH, YPbPr, YUV, YIQ, Advanced Image Concepts:
Bezier Curve, Ellipsoid, Gamma Correction, Structural Similarity Index, Deconvolution,
Homography, Convolution
Images
Images are digital representations of visual information stored on computers. They are typically
composed of pixels arranged in a grid, where each pixel represents a single color value. Images
can be created by cameras, scanners, or generated by software. They play a crucial role in
various fields, including communication, entertainment, education, and science. Images can be of
various types, such as photographs, illustrations, graphics, or medical scans.

Pixels
A pixel, short for "picture element," is the smallest unit of a digital image. It is a tiny, square-shaped
element that contains color information. Pixels are arranged in a grid, and the combination of
these pixels creates the visual content of an image. Each pixel is assigned a specific color value,
and the resolution of an image is determined by the number of pixels it contains.
In a grayscale image, each pixel is represented by a single value indicating its brightness. In a
color image, each pixel is usually represented by three values (Red, Green, Blue - RGB) or four
values (Red, Green, Blue, Alpha - RGBA) that together define its color.
The resolution of an image is closely tied to pixels, and it is a critical factor in determining the
clarity and detail of the visual content.
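
As a minimal sketch (Python with NumPy, assuming that library is available), a tiny image can be represented as an array of pixel values:

import numpy as np

# A 2x2 grayscale image: one brightness value (0-255) per pixel.
gray = np.array([[0, 128],
                 [192, 255]], dtype=np.uint8)

# A 2x2 RGB image: three values (Red, Green, Blue) per pixel.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]      # top-left pixel is pure red
rgb[1, 1] = [255, 255, 255]  # bottom-right pixel is white

print(gray.shape)  # (2, 2)    -> height x width
print(rgb.shape)   # (2, 2, 3) -> height x width x channels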

Image Resolution
Image resolution refers to the number of pixels present in an image, usually expressed as the
number of pixels in the horizontal and vertical dimensions. It is often measured in pixels per inch
(PPI) or dots per inch (DPI). The resolution provides an indication of the image's level of detail and
sharpness.
 High Resolution: An image with a high resolution has more pixels per inch, resulting in a finer
level of detail. High-resolution images are often preferred for tasks requiring clarity, such as
printing large posters or displaying images on high-density screens.
 Low Resolution: An image with a low resolution has fewer pixels per inch, and it may appear
pixelated or blurry when enlarged. Low-resolution images are suitable for online use, such as
websites or social media, where smaller file sizes are desirable for faster loading times.

PPI and DPI


PPI (Pixels Per Inch):
PPI is a measure of the pixel density in a digital image. It indicates how many pixels are present
per inch in both the horizontal and vertical directions. PPI is primarily associated with digital
images that are viewed on screens, such as computer monitors, smartphones, or tablets.
Here's how PPI works:
 High PPI: An image with a high PPI has more pixels per inch, resulting in a finer level of detail.
High PPI is desirable for images that will be viewed on high-resolution screens, as it ensures a
crisp and clear display.
 Low PPI: An image with a low PPI has fewer pixels per inch, and it may appear pixelated or
less detailed, especially when displayed on high-resolution screens.
 In the context of printing, PPI is also relevant when determining the appropriate image
resolution for a specific print size. For example, an image with a resolution of 300 PPI is
considered suitable for high-quality printing.
DPI (Dots Per Inch):
DPI, on the other hand, is a measure of the printing resolution and refers to the number of ink dots
a printer can place in a linear inch. DPI is relevant in the context of printed documents, such as
photographs, graphics, or text.
Here's how DPI works:
 High DPI: A printer with a high DPI capability can produce prints with more dots per inch,
resulting in a higher level of detail and sharper images. Printers with higher DPI are generally
capable of producing more detailed and vibrant prints.
 Low DPI: A printer with a lower DPI capability may produce prints with less detail, potentially
leading to a loss of sharpness and clarity.
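
The relationship between pixel dimensions, PPI, and print size is simple division; a short illustrative calculation (with hypothetical numbers):

# Print size in inches = pixel dimensions / PPI
width_px, height_px = 3000, 2400   # example image dimensions
ppi = 300                          # target print resolution

print(width_px / ppi, "x", height_px / ppi, "inches")  # 10.0 x 8.0 inches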

Bitmap Images
A bitmap image, also known as a raster image, is a type of digital image composed of a grid
of individual pixels, where each pixel represents a single color value. Each pixel in a bitmap
image contains specific color information, and when these pixels are arranged in a grid, they
collectively form the visual content of the image. Bitmap images are contrasted with vector
images, which are composed of mathematical formulas describing shapes rather than individual
pixels.
Characteristics:
 Pixel-based: Unlike vector graphics, which use mathematical formulas to represent shapes and
lines, bitmaps are based on individual pixels. This provides a more realistic representation of
continuous tones and gradients.
 Compression: Because a value must be stored for every pixel, bitmap images require a large
amount of data and are therefore usually compressed, often with lossy techniques (e.g., JPEG),
though lossless options (e.g., PNG) also exist. Lossy compression reduces file size but can
result in some loss of image quality, especially at high compression levels.
 Rasterization: Vector graphics can be converted to bitmaps by a process called rasterization.
This involves determining the color of each pixel based on the underlying vector elements.
Advantages:
 Photorealistic representation: Bitmaps excel at capturing realistic details and subtle color
variations, making them ideal for photographs and other images with intricate textures and
shading.
 Simple editing: Editing bitmap images is straightforward, as each pixel can be manipulated
directly. This allows for basic adjustments like cropping, resizing, and color correction.
Lossless Compression
Lossless compression is a data compression method that reduces the size of a file or dataset
without any loss of information or quality. In other words, when data is compressed using a
lossless compression algorithm and then decompressed, the result is bit-for-bit identical to the
original: no information is lost during the compression process, and the data is perfectly
reconstructed. This is in contrast to lossy compression, where some data is discarded during
compression, leading to a loss of quality.

Here's how it works:

1. Identifying Redundancy: Most real-world data contains redundancy. This means that the same
information is repeated multiple times throughout the data. Lossless compression algorithms
identify and exploit this redundancy to reduce the file size.

2. Replacing Redundant Data: Once redundant data is identified, the algorithm replaces it with
shorter representations. This can be done using various techniques such as dictionary coding,
Huffman coding, or run-length encoding.

3. Storing Decompression Instructions: In addition to the compressed data, the algorithm also
stores instructions on how to decompress the data back to its original form. These instructions
typically involve a set of rules and tables that the decompression software uses to reverse the
compression process.
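
A minimal sketch of this round trip using Python's standard zlib module (which combines LZ77 dictionary matching with Huffman coding):

import zlib

original = b"AAAAABBBBBCCCCC" * 100   # highly redundant data

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
assert restored == original   # bit-for-bit identical: no information lost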

Benefits of Lossless Compression:

 No loss of information: The original data is perfectly preserved after being compressed and
decompressed.
 Wide range of applications: Lossless compression can be used for various types of
data, including text, images, audio, and video.
 Improved storage efficiency: Compressed data takes up less storage space than the
original data, making it ideal for archiving and transferring large files.

Examples of Lossless Compression Algorithms:

 GZIP: Commonly used for compressing text files.
 PNG: A lossless image format that maintains original image quality.
 ZIP: A popular archive format that can contain multiple compressed files.

Limitations of Lossless Compression:

 Compression ratio: Lossless compression typically achieves lower compression ratios
compared to lossy compression.
 Computational cost: The process of compressing and decompressing data can be
computationally expensive for some algorithms.
Lossy Compression
Lossy compression is a data compression technique used to reduce the size of an image or file. It
achieves compression by eliminating some of the data that the compression algorithm deems less
important. As a result, lossy compression typically results in a reduction in image quality, although
the loss may not be perceptible to the human eye in many cases. Common lossy compression
formats include JPEG for images and MPEG for video.

How does it work?

Lossy compression algorithms work by analyzing the data and identifying patterns that can be
discarded. This can be achieved through various techniques, including:

 Quantization: This involves reducing the precision of data representation, discarding less
important details. For example, in an image, subtle differences in color might be ignored.
 Transform coding: This involves transforming the data into a different representation where
redundant information is more easily identified and removed.
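
A small sketch of the resulting quality/size trade-off, assuming Pillow and NumPy are installed; the same image is encoded as JPEG at two quality settings:

import io
import numpy as np
from PIL import Image

# Build a smooth 256x256 gradient image.
ramp = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
img = Image.fromarray(ramp, mode="L").convert("RGB")

for quality in (90, 20):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # lossy encode
    print(f"quality={quality}: {len(buf.getvalue())} bytes")
# Lower quality -> stronger quantization -> smaller file, more artifacts.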

Key characteristics of lossy compression include:

 Information Loss: Some data is permanently discarded during compression, which can lead
to a loss in image quality.
 Smaller File Size: Lossy compression typically results in significantly smaller file sizes
compared to the original, which is useful for storage and transmission.
 Faster transmission: Smaller files can be transmitted faster over networks, improving
streaming and download times.
 Suitable for Natural Images: Lossy compression is often used for photographs, graphics,
and multimedia content where some loss of quality is acceptable.

Examples of Lossy Compression Algorithms:

 JPEG: A widely used image format that offers a good balance between compression and
quality.
 MP3: A popular audio format that reduces file size while maintaining acceptable sound
quality.
 WebP: A modern image format offering smaller file sizes than JPEG with comparable
quality.

Image File Formats


Image file formats are standardized ways of organizing and storing digital images. Each file format
has its own specifications, features, and characteristics, making it suitable for particular use cases.
Here are some common image file formats:
JPEG (Joint Photographic Experts Group):
 Compression: JPEG is a lossy compression format, meaning that it achieves smaller file
sizes by discarding some image information. The degree of compression can often be
adjusted.
 Use Cases: Suitable for photographs and images with complex color gradients. Commonly
used for web images and digital photography.
 Drawbacks: Lossy compression may result in a loss of image quality, especially at higher
compression levels.
PNG (Portable Network Graphics):
 Compression: PNG uses lossless compression, preserving all image information. It also
supports transparency.
 Use Cases: Suitable for images with text, line art, or images requiring transparency (alpha
channels). Commonly used for web graphics and images with a transparent background.
 Drawbacks: File sizes are typically larger than JPEG for photographic images.
GIF (Graphics Interchange Format):
 Compression: GIF uses lossless compression and supports animations. It is limited to 256
colors.
 Use Cases: Suitable for simple graphics, icons, and images with limited colors. Often used
for simple animations.
 Drawbacks: Limited color support compared to other formats.
TIFF (Tagged Image File Format):
 Compression: TIFF supports both lossless and lossy compression. It is a flexible format that
can include multiple layers and pages.
 Use Cases: Commonly used in professional settings for high-quality printing, archiving, and
exchanging raster graphics (photographs, scanned images).
 Drawbacks: Larger file sizes compared to some other formats.
BMP (Bitmap):
 Compression: BMP files use uncompressed image data, resulting in large file sizes.
 Use Cases: Suitable for storing images in their raw, uncompressed form. Commonly used
in Windows environments.
 Drawbacks: Large file sizes make BMP less practical for web or storage constraints.
WEBP:
 Compression: WEBP is a modern image format developed by Google that supports both
lossy and lossless compression. It often provides smaller file sizes compared to JPEG and
PNG.
 Use Cases: Suitable for web images, providing a balance between compression efficiency
and image quality.
 Drawbacks: Not as widely supported as other formats, but support is increasing.
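
In practice, converting between these formats is usually a one-line operation; a brief sketch with Pillow (the file names are hypothetical):

from PIL import Image

img = Image.open("logo.png")          # lossless source, may have transparency

# JPEG has no alpha channel, so flatten to RGB before saving.
img.convert("RGB").save("logo.jpg", quality=85)

# PNG keeps all pixel data (lossless) and the alpha channel.
img.save("logo_copy.png")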

Color Spaces: RGB, XYZ, HSV/HSL, LAB, LCH, YPbPr, YUV, YIQ
Color spaces are mathematical models that represent colors in a way that allows for the accurate
and consistent description of colors across different devices and applications. Each color space
has its own set of coordinates and parameters to define colors.
1) RGB (Red, Green, Blue): Description: RGB is a color model where colors are represented as
combinations of red, green, and blue light. It is widely used in electronic displays, such as
computer monitors, television screens, and digital cameras.
 Representation: Colors are defined by intensity values for each of the three primary colors
(Red, Green, Blue) usually on a scale from 0 to 255.
2) XYZ (CIE 1931 Color Space):
 Description: XYZ is a color space defined by the International Commission on Illumination
(CIE). It is derived from measurements of human color perception (the CIE color matching
functions) and serves as a device-independent reference. XYZ is not perceptually uniform
and is rarely used directly, but it serves as a foundation for other color spaces.
 Representation: It uses three components: X, Y, and Z, where Y represents luminance
(brightness), and X and Z define the chromaticity.
3) HSV (Hue, Saturation, Value) / HSL (Hue, Saturation, Lightness):
 Description: HSV/HSL are representations of colors based on their perceptual attributes.
Hue represents the color itself, saturation is the intensity or purity of the color, and
value/lightness determines the brightness. HSV and HSL are more intuitive for human
color selection than RGB.
 Representation: Hue is typically represented as an angle (0 to 360 degrees), while
saturation and lightness are represented as percentages.
4) LAB (CIELAB):
 Description: LAB is another color space defined by the CIE, designed to be perceptually
uniform. It separates color information (chromaticity) from luminance, making it well-suited
for color correction and comparisons. It is widely used in industries like printing and graphic
design.
 Representation: LAB has three components: L* (lightness), a* (green to red), and b* (blue
to yellow). The L* component represents the brightness, while a* and b* define the color
information.
5) LCH (CIELCH):
 Description: LCH is a cylindrical representation of the CIELAB color space. It is an
alternative to the more commonly used CIELAB color space. LCH separates color
information into three components: L* (lightness), C* (chroma or colorfulness), and h* (hue
angle).
 Representation: L* represents the brightness, C* represents the colorfulness, and h*
represents the hue angle.
6) YPbPr:
 Description: YPbPr is a component video color space used in analog and digital television
broadcasting. It represents color information separately from brightness. Y (luminance or
brightness) represents the black and white information, while Pb and Pr represent the
chrominance or color information.
 Representation: Y represents luminance, and Pb and Pr represent color difference signals.
7) YUV:
 Description: YUV is another component video color space widely used in video
compression and broadcast television. Similar to YPbPr, it separates brightness (luma or Y)
from color information (chroma or UV). Y represents the brightness, and U and V represent
color information.
 Representation: Y represents luminance, and U and V represent color information.
8) YIQ:
 Description: YIQ is a color space used in the NTSC television standard. It separates the
luminance (Y) from the chrominance (IQ), where I represents the in-phase component
(color information along the axis of the color signal), and Q represents the quadrature or
quadrature-phase component (color information orthogonal to the color signal).
 Representation: Y represents luminance, while I and Q represent chrominance
information.
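
Python's standard colorsys module can illustrate a few of these conversions; it expects RGB values normalized to the 0-1 range:

import colorsys

r, g, b = 255 / 255, 128 / 255, 0 / 255   # orange, normalized to 0-1

h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(f"HSV: hue={h * 360:.0f} deg, sat={s:.2f}, value={v:.2f}")

y, i, q = colorsys.rgb_to_yiq(r, g, b)
print(f"YIQ: luma={y:.2f}, I={i:.2f}, Q={q:.2f}")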

Advanced Image Concepts


Bezier Curve
Bezier curves are parametric curves widely used in graphics and animation for creating smooth
and aesthetically pleasing shapes. They are defined by a set of control points, and the curve itself
is generated by interpolating between these points.
Here's how they work:
 Control points: Bézier curves are defined by a set of control points, which determine the
overall shape and direction of the curve. These points can be positioned in any
configuration, allowing for a wide variety of shapes.
 Interpolation: The curve is generated by calculating points along a path that interpolates
between the control points. This is done using a mathematical formula based on the degree
of the curve and the positions of the control points.
 Degree: The degree of a Bézier curve determines the number of control points needed to
define it. For example, a linear Bézier curve requires only two control points, while a cubic
Bézier curve requires four control points.
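
A minimal sketch (Python/NumPy, with hypothetical control points) of evaluating a cubic Bézier curve from its four control points using the Bernstein form:

import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=100):
    """Sample n points on a cubic Bezier curve via the Bernstein polynomials."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Hypothetical control points: endpoints p0/p3, handles p1/p2.
pts = cubic_bezier(np.array([0.0, 0.0]), np.array([1.0, 2.0]),
                   np.array([3.0, 2.0]), np.array([4.0, 0.0]))
print(pts[0], pts[-1])   # the curve starts at p0 and ends at p3
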
Benefits of Bézier Curves:
 Smooth and continuous: Bézier curves offer smooth and continuous transitions between
control points, resulting in aesthetically pleasing shapes.
 Scalable: These curves can be scaled to any size without losing their smoothness or detail.
 Flexible: By manipulating the control points, you can create a wide variety of shapes, from
simple lines and arcs to complex curves and organic forms.

Ellipsoid

An ellipsoid is a three-dimensional, closed geometric shape where all planar cross-sections are
either ellipses or circles. It's essentially a stretched or squashed sphere. Imagine taking a soft
sphere and pushing or pulling it from different directions – the resulting shape would be an
ellipsoid.

Key characteristics:

 Three axes: An ellipsoid has three independent axes (usually labeled a, b, and c) that
intersect at the center of the shape. These axes determine the size and shape of the
ellipsoid.
 Symmetrical: The ellipsoid is symmetrical about these three axes, meaning it looks the
same from different angles.
 Planar sections: As mentioned earlier, all cross-sections made by a plane through the
ellipsoid will be either ellipses or circles. The size and shape of these cross-sections will
depend on the angle of the cut.
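
In Cartesian coordinates, an ellipsoid centered at the origin with its axes aligned to the coordinate axes satisfies the standard equation:

(x/a)^2 + (y/b)^2 + (z/c)^2 = 1

where a, b, and c are the semi-axis lengths; a point lies inside the ellipsoid when the left-hand side is less than 1.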

Types of Ellipsoids:

 Oblate spheroid: This type of ellipsoid has two equal equatorial semi-axes (a = b) and a
shorter polar semi-axis (c < a). It resembles a flattened sphere. Earth is approximately an
oblate spheroid due to its slight bulge at the equator.
 Sphere: When all three axes are equal (a = b = c), the ellipsoid becomes a perfect sphere.

Gamma Correction
Gamma correction is a technique used in image processing and computer graphics to adjust the
brightness and contrast of an image. Gamma correction compensates for the nonlinear
relationship between the intensity of light in a scene and the way it is displayed on a monitor or
captured by a camera.

In simple terms, gamma correction adjusts the mid-tones of an image, making it visually more
accurate and natural to the human eye. It is particularly important when images are displayed on
computer monitors, television screens, or other digital devices.

How does it work?

Gamma correction involves applying a non-linear power function to the image data. Encoding
gamma (an exponent less than 1) expands the darker tones, where human vision is most
sensitive, while decoding gamma (an exponent greater than 1) reverses this for display.

The formula for gamma correction is:

Vout = Vin^γ

where:

 Vout is the output luminance
 Vin is the input luminance
 γ is the gamma value (a typical display/decoding gamma is between 2.2 and 2.4; the
corresponding encoding gamma is its reciprocal, about 1/2.2)
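
A minimal sketch in Python/NumPy, applying an encoding gamma of 1/2.2 and then the matching decoding gamma of 2.2 to normalized pixel values:

import numpy as np

pixels = np.linspace(0.0, 1.0, 5)   # normalized luminance values

encoded = pixels ** (1 / 2.2)       # encoding gamma: lifts the dark tones
decoded = encoded ** 2.2            # decoding gamma: Vout = Vin^gamma

print(np.round(encoded, 3))
print(np.round(decoded, 3))         # recovers the original values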

Types of Gamma Correction:

 Encoding gamma: This is applied to the image data before it is stored or transmitted. It is
typically applied by the camera or software that created the image.
 Decoding gamma: This is applied to the image data when it is displayed on a device. It is
typically applied by the display device itself.

Benefits of Gamma Correction:


 Improved image quality: Gamma correction helps to display images with more natural and
realistic-looking brightness and contrast.
 Efficient use of bit depth: By devoting more code values to the darker tones, where human
vision is most sensitive, gamma encoding reduces visible banding at a given bit depth.

Structural Similarity Index

The Structural Similarity Index (SSIM) is a metric used to measure the similarity between two
images. Unlike traditional metrics like Peak Signal-to-Noise Ratio (PSNR) that focus solely on
pixel-level differences, SSIM takes into account the structural information of an image, such as
luminance, contrast, and texture. This makes it a more reliable and accurate measure of perceived
image quality.

Here's a breakdown of the key aspects of SSIM:

How it works:

SSIM operates by comparing three key components of two images:

1. Luminance: Luminance represents the overall brightness of an image. The SSIM index
compares the mean luminance of the two images (μx and μy) and computes their similarity
in terms of brightness.
2. Contrast: Contrast refers to the difference in brightness between objects and their
background in an image. The SSIM index compares the standard deviations of the two
images (σx and σy) and computes their similarity in terms of contrast.
3. Structure: Structure represents the organization of pixel intensities in an image. The SSIM
index compares the covariance of the two images (σxy) and computes their similarity in
terms of structure.

For each component, a comparison function is applied to calculate the similarity between the two
images. These individual component scores are then combined into a single SSIM score, which
ranges from -1 to 1 (in practice usually between 0 and 1), where:

 1 indicates identical images
 values near 0 (or below) indicate structurally very different images
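
Combining the three comparisons (using the standard simplification from Wang et al.) gives the usual closed form, where C1 and C2 are small constants that stabilize the divisions (commonly C1 = (0.01·L)^2 and C2 = (0.03·L)^2 for dynamic range L):

SSIM(x, y) = ((2·μx·μy + C1) · (2·σxy + C2)) / ((μx^2 + μy^2 + C1) · (σx^2 + σy^2 + C2))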

Applications of SSIM:

 Image compression: SSIM can be used to optimize image compression algorithms, ensuring
that compressed images maintain good visual quality.
 Image denoising: SSIM can be used to guide image denoising algorithms, helping to
remove noise while preserving important image details.

Deconvolution
Deconvolution is a mathematical operation that reverses the process of convolution. Convolution
is a common operation in signal processing and image processing, used for operations such as
blurring, sharpening, and feature extraction. Deconvolution, on the other hand, is often employed
for tasks like image restoration, image deblurring, and inverse problems.

Basic Idea:

Imagine you have a signal (e.g., an image) that has been blurred or distorted by some process
(e.g., a filter or camera lens). Deconvolution aims to reverse this process and recover the original
signal as accurately as possible.

Understanding the Math:

Deconvolution involves solving a mathematical equation where the observed signal is the result of
convolving the original signal with a known "blurring function" (also known as the point spread
function). The goal is to find the original signal by mathematically "undoing" the blurring effect.

Common Deconvolution Techniques:

 Wiener deconvolution: One of the most widely used methods, balancing noise suppression
with detail preservation.
 Blind deconvolution: When the blurring function is unknown, additional information or
assumptions are needed.
 Regularized deconvolution: Incorporates additional constraints into the deconvolution
process to avoid overfitting and noise amplification.
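
As a rough illustration of the first technique, a frequency-domain sketch of Wiener deconvolution in Python/NumPy (K is an assumed constant noise-to-signal ratio, and the PSF is a hypothetical box blur):

import numpy as np

def wiener_deconvolve(blurred, psf, K=0.01):
    """Wiener deconvolution with a constant noise-to-signal ratio K."""
    H = np.fft.fft2(psf, s=blurred.shape)        # transfer function of the blur
    G = np.fft.fft2(blurred)                     # spectrum of the observed image
    F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
    return np.real(np.fft.ifft2(F_hat))

# Hypothetical example: blur a random image with a 5x5 box PSF, then recover it.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
psf = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf, s=image.shape)))

restored = wiener_deconvolve(blurred, psf)
print(np.abs(restored - image).mean())   # small residual error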

Challenges of Deconvolution:

 Noise amplification: Deconvolution can amplify noise present in the observed signal,
leading to artifacts in the recovered signal.
 Computational complexity: Deconvolution algorithms can be computationally
expensive, especially for large datasets.

Applications of Deconvolution:

 Image deblurring: Sharpening blurry images, particularly in microscopy or astronomy.
 Signal processing: Removing noise or interference from signals.
 Inverse filtering: Recovering the original signal from its filtered version.

Homography

Homography refers to a transformation or mapping between two sets of points in an image or
between two images. A homography can represent the geometric relationship between two
images, such as the transformation between two different camera views of the same scene or the
mapping between an image and a planar surface.

The procedure for estimating a homography involves the following steps:

1. Corresponding Points:
Identify corresponding points between two images or scenes. These points may be selected
manually or detected automatically, and they represent features that are common to both images.

For each set of corresponding points, set up equations that represent the relationship between the
coordinates in one image and the coordinates in the other image. This forms a system of linear
equations.

2. Homogeneous Coordinates:

Convert the coordinates to homogeneous coordinates by adding a third coordinate with a value of
1. This allows the linear equations to be represented in matrix form.

Arrange the homogeneous coordinates into matrices and use linear algebra techniques to solve
the system of equations. The solution is the homography matrix (H).

3. Singular Value Decomposition (SVD):

The linear system is typically solved using Singular Value Decomposition (SVD). The singular
vector corresponding to the smallest singular value of the equation matrix provides the solution
for the homography matrix.

4. Scaling:

The homography matrix is initially determined up to scale. To remove this ambiguity, the matrix is
typically normalized by dividing all its elements by a suitable scale factor.

5. Validation (Optional):

Depending on the application, it might be necessary to validate the estimated homography. This
can be done by checking how well the transformation aligns with additional corresponding points
not used in the initial estimation. RANSAC (Random Sample Consensus) is a common method for
dealing with outliers (mismatched points).

6. Applying Homography:

Once the homography matrix is estimated and validated, it can be used to transform points or
entire images. For example, it can be used to align images, stitch panoramas, or map virtual
objects onto a real-world scene.
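
A hedged sketch using OpenCV, whose cv2.findHomography call wraps the DLT and RANSAC steps described above (the point coordinates are hypothetical):

import numpy as np
import cv2

# Four or more corresponding points (x, y) in each image (hypothetical values).
src_pts = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=np.float32)
dst_pts = np.array([[10, 5], [115, 10], [110, 120], [5, 110]], dtype=np.float32)

# Estimate the 3x3 homography; RANSAC rejects mismatched points (outliers).
H, inlier_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
print(H)

# Apply H to warp one image into the other's coordinate frame, e.g.:
# warped = cv2.warpPerspective(image, H, (width, height))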

Convolution

Convolution is a mathematical operation that combines two functions to produce a third function.
The convolution operation is denoted by the symbol "*".

In convolution we have a signal and a kernel:

 Input signal: This can be anything, like a sound wave or an image.
 Kernel: This is a smaller function that acts as a filter or template.
Convolution essentially slides the kernel across the input signal, performing element-wise
multiplication and summing the results at each position. This process is similar to taking a
weighted average of the input signal, where the weights are determined by the values of the
kernel.
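
A minimal sketch of this sliding-window process in Python/NumPy, computing only "valid" positions with a hypothetical 3x3 averaging kernel:

import numpy as np

def convolve2d(signal, kernel):
    """Naive 2D convolution: slide the (flipped) kernel and sum the products."""
    k = np.flipud(np.fliplr(kernel))          # true convolution flips the kernel
    kh, kw = k.shape
    sh, sw = signal.shape
    out = np.zeros((sh - kh + 1, sw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(signal[i:i + kh, j:j + kw] * k)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
box = np.ones((3, 3)) / 9.0                        # averaging (blur) kernel
print(convolve2d(image, box))                      # 3x3 smoothed output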

Key features of convolution:

 Commutative: The order of convolution doesn't matter, meaning that convolving A with B is
the same as convolving B with A.

A*B=B*A

 Associative: Multiple convolutions can be grouped in any order:

A*(B*C) = (A*B)*C

 Linear: Convolution is a linear operation; scaling the input or the kernel scales the output
proportionally:

k·(A*B) = (k·A)*B = A*(k·B)

Examples of convolution in action:

 Blurring an image: This can be achieved by using a kernel with all positive values, which
essentially averages the neighboring pixels.
 Edge detection: This can be achieved by using a kernel that responds to changes in
intensity, such as the Sobel filter.
 Extracting features from an image: Convolutional neural networks (CNNs) use a series of
convolution layers to extract features from images, which are then used to identify objects
or classify images.
