AI For IA Unit 2
Introduction to Image Processing: Images, Pixels, Image Resolution, PPI and DPI,
Bitmap Images, Lossless Compression, Lossy Compression, Image File Formats, Color
Spaces: RGB, XYZ, HSV/HSL, LAB, LCH, YPbPr, YUV, YIQ, Advanced Image Concepts:
Bézier Curve, Ellipsoid, Gamma Correction, Structural Similarity Index, Deconvolution,
Homography, Convolution
Images
Images are digital representations of visual information stored on computers. They are typically
composed of pixels arranged in a grid, where each pixel represents a single color value. Images
can be created by cameras, scanners, or generated by software. They play a crucial role in
various fields, including communication, entertainment, education, and science. Images can be of
various types, such as photographs, illustrations, graphics, or medical scans.
Pixels
A pixel, short for "picture element," is the smallest unit of a digital image. It is a tiny, square-shaped
element that contains color information. Pixels are arranged in a grid, and the combination of
these pixels creates the visual content of an image. Each pixel is assigned a specific color value,
and the resolution of an image is determined by the number of pixels it contains.
In a grayscale image, each pixel is represented by a single value indicating its brightness. In a
color image, each pixel is usually represented by three values (Red, Green, Blue - RGB) or four
values (Red, Green, Blue, Alpha - RGBA) that together define its color.
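As an illustrative sketch (using NumPy, with made-up pixel values), a color image can be stored as a height × width × 3 array of 8-bit RGB values:

import numpy as np

# A 2 x 2 RGB image: each pixel holds three 0-255 intensity values.
image = np.array([[[255, 0, 0], [0, 255, 0]],        # red, green
                  [[0, 0, 255], [255, 255, 255]]],   # blue, white
                 dtype=np.uint8)
print(image.shape)  # (2, 2, 3): height, width, channels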
The resolution of an image is closely tied to pixels, and it is a critical factor in determining the
clarity and detail of the visual content.
Image Resolution
Image resolution refers to the amount of detail an image holds. It is usually expressed as the
number of pixels in the horizontal and vertical dimensions (e.g., 1920 × 1080), while pixel
density is measured in pixels per inch (PPI) for displays and dots per inch (DPI) for print.
Together they indicate the image's level of detail and sharpness.
High Resolution: An image with a high resolution has more pixels per inch, resulting in a finer
level of detail. High-resolution images are often preferred for tasks requiring clarity, such as
printing large posters or displaying images on high-density screens.
Low Resolution: An image with a low resolution has fewer pixels per inch, and it may appear
pixelated or blurry when enlarged. Low-resolution images are suitable for online use, such as
websites or social media, where smaller file sizes are desirable for faster loading times.
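For example, a 3000 × 2000 pixel photograph printed at 300 PPI yields a 10 × 6.7 inch print
(3000/300 by 2000/300 inches); displayed at 100 PPI, the same pixels would span 30 × 20 inches
at correspondingly lower sharpness.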
Bitmap Images
A bitmap image, also known as a raster image, is a type of digital image that is composed of a grid
of individual pixels, where each pixel represents a single color value. Each pixel in a bitmap
image contains specific color information, and when these pixels are arranged in a grid, they
collectively form the visual content of the image. Bitmap images are contrasted with vector
images, which are composed of mathematical formulas describing shapes rather than individual
pixels.
Characteristics:
Pixel-based: Unlike vector graphics, which use mathematical formulas to represent shapes and
lines, bitmaps are based on individual pixels. This provides a more realistic representation of
continuous tones and gradients.
Compression: Due to the large amount of data required to store information for each
pixel, bitmap images are usually compressed. Lossless formats preserve every pixel exactly,
while lossy compression reduces file size further but can result in some loss of image
quality, especially at high compression levels.
Rasterization: Vector graphics can be converted to bitmaps by a process called rasterization.
This involves determining the color of each pixel based on the underlying vector elements.
Advantages:
Photorealistic representation: Bitmaps excel at capturing realistic details and subtle color
variations, making them ideal for photographs and other images with intricate textures and
shading.
Simple editing: Editing bitmap images is straightforward, as each pixel can be manipulated
directly. This allows for basic adjustments like cropping, resizing, and color correction.
Lossless Compression
Lossless compression is a data compression method that reduces the size of a file or dataset
without any loss of information or quality. When data is compressed using a lossless
compression algorithm and then decompressed, the original data is perfectly reconstructed;
no information is lost during the compression process. This is in contrast to lossy
compression, where some data is discarded during compression, leading to a loss of
quality.
1. Identifying Redundancy: Most real-world data contains redundancy. This means that the same
information is repeated multiple times throughout the data. Lossless compression algorithms
identify and exploit this redundancy to reduce the file size.
2. Replacing Redundant Data: Once redundant data is identified, the algorithm replaces it with
shorter representations. This can be done using various techniques such as dictionary coding,
Huffman coding, or run-length encoding (a short sketch of run-length encoding follows these
steps).
3. Storing Decompression Instructions: In addition to the compressed data, the algorithm also
stores instructions on how to decompress the data back to its original form. These instructions
typically involve a set of rules and tables that the decompression software uses to reverse the
compression process.
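As a minimal sketch of one such technique, run-length encoding replaces runs of repeated
values with (value, count) pairs, and decoding reconstructs the input exactly (the function
names here are illustrative):

def run_length_encode(data: bytes) -> list:
    # Replace each run of identical bytes with a (value, count) pair.
    encoded = []
    prev, count = None, 0
    for b in data:
        if b == prev:
            count += 1
        else:
            if prev is not None:
                encoded.append((prev, count))
            prev, count = b, 1
    if prev is not None:
        encoded.append((prev, count))
    return encoded

def run_length_decode(pairs) -> bytes:
    # Expanding every pair reproduces the original data byte for byte.
    return bytes(value for value, count in pairs for _ in range(count))

data = b"aaaabbbcc"
assert run_length_decode(run_length_encode(data)) == data  # lossless round trip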
Advantages:
No loss of information: The original data is perfectly preserved after being compressed and
decompressed.
Wide range of applications: Lossless compression can be used for various types of
data, including text, images, audio, and video.
Improved storage efficiency: Compressed data takes up less storage space than the
original data, making it ideal for archiving and transferring large files.
Lossy Compression
Lossy compression algorithms work by analysing the data and identifying patterns or details
that can be discarded. This can be achieved through various techniques, including:
Quantization: This involves reducing the precision of data representation, discarding less
important details. For example, in an image, subtle differences in color might be ignored
(a short sketch follows this list).
Transform coding: This involves transforming the data into a different representation where
redundant information is more easily identified and removed.
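A minimal sketch of quantization with NumPy (the function name and the choice of 8 output
levels are illustrative): collapsing 256 grey levels into a few coarse bins shrinks the
information content, and the discarded precision cannot be recovered.

import numpy as np

def quantize(image: np.ndarray, levels: int) -> np.ndarray:
    # Map each 0-255 value to the midpoint of one of `levels` coarse bins;
    # the fine detail inside each bin is permanently discarded.
    step = 256 // levels
    return ((image // step) * step + step // 2).astype(np.uint8)

original = np.arange(256, dtype=np.uint8)
coarse = quantize(original, 8)               # only 8 distinct values survive
assert not np.array_equal(coarse, original)  # information has been lost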
Characteristics:
Information Loss: Some data is permanently discarded during compression, which can lead
to a loss in image quality.
Smaller File Size: Lossy compression typically results in significantly smaller file sizes
compared to the original, which is useful for storage and transmission.
Faster transmission: Smaller files can be transmitted faster over networks, improving
streaming and download times.
Suitable for Natural Images: Lossy compression is often used for photographs, graphics,
and multimedia content where some loss of quality is acceptable.
Examples:
JPEG: A widely used image format that offers a good balance between compression and
quality.
MP3: A popular audio format that reduces file size while maintaining acceptable sound
quality.
WebP: A modern image format offering smaller file sizes than JPEG with comparable
quality.
Color Spaces: RGB, XYZ, HSV/HSL, LAB, LCH, YPbPr, YUV, YIQ
Color spaces are mathematical models that represent colors in a way that allows for the accurate
and consistent description of colors across different devices and applications. Each color space
has its own set of coordinates and parameters to define colors.
1) RGB (Red, Green, Blue): Description: RGB is a color model where colors are represented as
combinations of red, green, and blue light. It is widely used in electronic displays, such as
computer monitors, television screens, and digital cameras.
Representation: Colors are defined by intensity values for each of the three primary colors
(Red, Green, Blue) usually on a scale from 0 to 255.
2) XYZ (CIE 1931 Color Space):
Description: XYZ is a color space defined in 1931 by the International Commission on
Illumination (CIE). It is derived from measurements of human color perception, but it is
not perceptually uniform; it is rarely used directly and instead serves as a
device-independent foundation for other color spaces.
Representation: It uses three components: X, Y, and Z, where Y represents luminance
(brightness), and X and Z define the chromaticity.
3) HSV (Hue, Saturation, Value) / HSL (Hue, Saturation, Lightness):
Description: HSV and HSL are representations of colors based on their perceptual attributes.
Hue represents the color itself, saturation is the intensity or purity of the color, and
value/lightness determines the brightness. Both are more intuitive for human reasoning
about color than RGB.
Representation: Hue is typically represented as an angle (0 to 360 degrees), while
saturation and value/lightness are represented as percentages.
4) LAB (CIELAB):
Description: LAB is another color space defined by the CIE, designed to be perceptually
uniform. It separates color information (chromaticity) from luminance, making it well-suited
for color correction and comparisons. It is widely used in industries like printing and graphic
design.
Representation: LAB has three components: L* (lightness), a* (green to red), and b* (blue
to yellow). The L* component represents the brightness, while a* and b* define the color
information.
5) LCH (CIELCH):
Description: LCH is a cylindrical representation of the CIELAB color space: the same color
information expressed as L* (lightness), C* (chroma or colorfulness), and h (hue angle).
The cylindrical coordinates are often easier to reason about than CIELAB's Cartesian
a* and b* axes.
Representation: L* represents the brightness, C* represents the colorfulness, and h*
represents the hue angle.
6) YPbPr:
Description: YPbPr is a component video color space used in analog and digital television
broadcasting. It represents color information separately from brightness. Y (luminance or
brightness) represents the black and white information, while Pb and Pr represent the
chrominance or color information.
Representation: Y represents luminance, and Pb and Pr represent color difference signals.
7) YUV:
Description: YUV is another component video color space widely used in video
compression and broadcast television. Similar to YPbPr, it separates brightness (luma or Y)
from color information (chroma or UV). Y represents the brightness, and U and V represent
color information.
Representation: Y represents luminance, and U and V represent color information.
8) YIQ:
Description: YIQ is a color space used in the NTSC television standard. It separates the
luminance (Y) from the chrominance (I and Q), where I is the in-phase component
(color information along the axis of the color signal) and Q is the quadrature component
(color information orthogonal to that axis).
Representation: Y represents luminance, while I and Q represent chrominance
information.
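As a brief sketch of moving between these spaces (the pixel values are hypothetical),
Python's standard colorsys module converts RGB to HSV, and the luma component Y used by
YUV/YIQ can be computed directly from the standard BT.601 channel weights:

import colorsys

r, g, b = 0.8, 0.4, 0.1  # an orange-ish pixel, channels in [0, 1]

# RGB -> HSV: colorsys returns hue in [0, 1], so scale it to degrees.
h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(f"hue = {h * 360:.0f} deg, saturation = {s:.2f}, value = {v:.2f}")

# RGB -> luma (the Y of YUV/YIQ) using the BT.601 weighting of the channels.
y = 0.299 * r + 0.587 * g + 0.114 * b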
Ellipsoid
An ellipsoid is a three-dimensional, closed geometric shape where all planar cross-sections are
either ellipses or circles. It's essentially a stretched or squashed sphere. Imagine taking a soft
sphere and pushing or pulling it from different directions – the resulting shape would be an
ellipsoid.
Key characteristics:
Three axes: An ellipsoid has three independent axes (usually labeled a, b, and c) that
intersect at the center of the shape. These axes determine the size and shape of the
ellipsoid (see the standard equation after this list).
Symmetrical: The ellipsoid is symmetrical about these three axes; each pair of axes
defines a plane of mirror symmetry.
Planar sections: As mentioned earlier, all cross-sections made by a plane through the
ellipsoid will be either ellipses or circles. The size and shape of these cross-sections will
depend on the angle of the cut.
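Algebraically, an axis-aligned ellipsoid centered at the origin is the set of points satisfying
the standard equation
x^2/a^2 + y^2/b^2 + z^2/c^2 = 1
where a, b, and c are the semi-axis lengths along the x, y, and z directions.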
Types of Ellipsoids:
Oblate spheroid: This type of ellipsoid has two equal semi-major axes (a = b) and a shorter
semi-minor axis (c), giving it the appearance of a flattened sphere. Earth is approximately
an oblate spheroid due to its slight bulge at the equator.
Sphere: When all three axes are equal (a = b = c), the ellipsoid becomes a perfect sphere.
Gamma Correction
Gamma correction is a technique used in image processing and computer graphics to adjust the
brightness and contrast of an image. It compensates for the nonlinear relationship between
the intensity of light in a scene and the way it is displayed on a monitor or captured by a
camera.
In simple terms, gamma correction adjusts the mid-tones of an image, making it visually more
accurate and natural to the human eye. It is particularly important when images are displayed on
computer monitors, television screens, or other digital devices.
Gamma correction involves applying a non-linear power function to the image data. This function
compresses the brighter tones and expands the darker tones, bringing them closer to how we
perceive them.
Vout = Vin^γ
where Vout is the output pixel value, Vin is the input pixel value (both typically normalized
to the range 0 to 1), and γ is the gamma value: γ < 1 brightens the mid-tones, while γ > 1
darkens them. Two gammas are commonly distinguished:
Encoding gamma: This is applied to the image data before it is stored or transmitted. It is
typically applied by the camera or software that created the image.
Decoding gamma: This is applied to the image data when it is displayed on a device. It is
typically applied by the display device itself.
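A minimal sketch of applying Vout = Vin^γ to an 8-bit image with NumPy (the function name
and test values are illustrative):

import numpy as np

def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    # Normalize to [0, 1], apply the power function, and rescale to 0-255.
    normalized = image.astype(np.float64) / 255.0
    corrected = np.power(normalized, gamma)
    return np.round(corrected * 255.0).astype(np.uint8)

ramp = np.arange(256, dtype=np.uint8)
encoded = gamma_correct(ramp, 1 / 2.2)   # encoding gamma brightens mid-tones
decoded = gamma_correct(encoded, 2.2)    # decoding gamma roughly inverts it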
Structural Similarity Index
The Structural Similarity Index (SSIM) is a metric used to measure the similarity between two
images. Unlike traditional metrics like Peak Signal-to-Noise Ratio (PSNR) that focus solely on
pixel-level differences, SSIM takes into account the structural information of an image, such as
luminance, contrast, and texture. This makes it a more reliable and accurate measure of perceived
image quality.
How it works:
1. Luminance: Luminance represents the overall brightness of an image. The SSIM index
compares the mean luminance of the two images (μx and μy) and computes their similarity
in terms of brightness.
2. Contrast: Contrast refers to the difference in brightness between objects and their
background in an image. The SSIM index compares the standard deviations of the two
images (σx and σy) and computes their similarity in terms of contrast.
3. Structure: Structure represents the organization of pixel intensities in an image. The SSIM
index compares the covariance of the two images (σxy) and computes their similarity in
terms of structure.
For each component, a comparison function is applied to calculate the similarity between the two
images. These individual component scores are then combined into a single SSIM score, typically
between 0 and 1, where a value of 1 indicates that the two images are structurally identical and
lower values indicate decreasing similarity.
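These three comparisons are commonly combined into the standard SSIM formula, where C1 and
C2 are small constants that stabilize the division:
SSIM(x, y) = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx^2 + μy^2 + C1)(σx^2 + σy^2 + C2))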
Applications of SSIM:
Assessing the perceived quality of compressed or transmitted images and video.
Evaluating image processing algorithms such as denoising, restoration, and super-resolution.
Deconvolution
Deconvolution is a mathematical operation that reverses the process of convolution. Convolution
is a common operation in signal processing and image processing, used for operations such as
blurring, sharpening, and feature extraction. Deconvolution, on the other hand, is often employed
for tasks like image restoration, image deblurring, and inverse problems.
Basic Idea:
Imagine you have a signal (e.g., an image) that has been blurred or distorted by some process
(e.g., a filter or camera lens). Deconvolution aims to reverse this process and recover the original
signal as accurately as possible.
Deconvolution involves solving a mathematical equation where the observed signal is the result of
convolving the original signal with a known "blurring function" (also known as the point spread
function). The goal is to find the original signal by mathematically "undoing" the blurring effect.
Common approaches include:
Wiener deconvolution: One of the most widely used methods, balancing noise suppression
with detail preservation.
Blind deconvolution: When the blurring function is unknown, additional information or
assumptions are needed.
Regularized deconvolution: Incorporates additional constraints into the deconvolution
process to avoid overfitting and noise amplification.
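As a minimal sketch of Wiener deconvolution using NumPy's FFT (K is a hand-tuned constant
approximating the noise-to-signal ratio, and the PSF is assumed to be smaller than the image):

import numpy as np

def wiener_deconvolve(blurred: np.ndarray, psf: np.ndarray, K: float = 0.01) -> np.ndarray:
    # Zero-pad the point spread function to the image size.
    psf_padded = np.zeros_like(blurred, dtype=np.float64)
    psf_padded[:psf.shape[0], :psf.shape[1]] = psf
    # Work in the frequency domain, where convolution becomes multiplication.
    H = np.fft.fft2(psf_padded)
    G = np.fft.fft2(blurred)
    # Wiener filter: conj(H) / (|H|^2 + K) suppresses noise where H is weak.
    F_hat = (np.conj(H) / (np.abs(H) ** 2 + K)) * G
    return np.real(np.fft.ifft2(F_hat))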
Challenges of Deconvolution:
Deconvolution is an ill-posed problem: noise in the observed signal can be strongly amplified,
and the blurring function is often known only approximately, so exact recovery of the original
signal is generally impossible.
Applications of Deconvolution:
Typical uses include image restoration, deblurring of out-of-focus or motion-blurred
photographs, and other inverse problems such as those arising in microscopy and astronomical
imaging.
Homography
A homography is a 3×3 matrix describing a projective transformation that relates two images of
the same planar scene (or two views from a camera rotating about its center). It is typically
estimated as follows:
1. Corresponding Points:
Identify corresponding points between two images or scenes. These points may be manually
selected or automatically detected, and they represent features that are common to both images.
For each pair of corresponding points, set up equations that relate the coordinates in one image
to the coordinates in the other image. This forms a system of linear equations.
2. Homogeneous Coordinates:
Convert the coordinates to homogeneous coordinates by adding a third coordinate with a value of
1. This allows the linear equations to be represented in matrix form.
3. Solving the System of Equations:
Arrange the homogeneous coordinates into matrices and use linear algebra techniques to solve
the system of equations. The solution is the homography matrix (H).
The linear system is typically solved using Singular Value Decomposition (SVD); the singular
vector corresponding to the smallest singular value of the coefficient matrix provides the
solution for the homography matrix.
4. Scaling:
The homography matrix is initially determined only up to scale. To remove this ambiguity, the
matrix is typically normalized by dividing all its elements by a suitable scale factor (commonly
chosen so that its bottom-right element equals 1).
5. Validation (Optional):
Depending on the application, it might be necessary to validate the estimated homography. This
can be done by checking how well the transformation aligns with additional corresponding points
not used in the initial estimation. RANSAC (Random Sample Consensus) is a common method for
dealing with outliers (mismatched points).
6. Applying Homography:
Once the homography matrix is estimated and validated, it can be used to transform points or
entire images. For example, it can be used to align images, stitch panoramas, or map virtual
objects onto a real-world scene.
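A brief sketch of this pipeline using OpenCV (the point coordinates and the input file name
are hypothetical):

import cv2
import numpy as np

# Four (or more) corresponding points picked from two overlapping images.
src_pts = np.array([[10, 10], [200, 15], [205, 180], [12, 190]], dtype=np.float32)
dst_pts = np.array([[30, 40], [220, 38], [230, 210], [28, 215]], dtype=np.float32)

# Estimate the 3x3 homography; RANSAC rejects mismatched point pairs.
H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

# Warp the first image into the second image's coordinate frame.
img = cv2.imread("left.jpg")  # hypothetical input file
aligned = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))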
Convolution
Convolution is a mathematical operation that combines two functions to produce a third function.
In image processing, one of the functions is the image and the other is a small matrix called a
kernel (or filter); the kernel is slid across the image, and at each position a weighted sum of
the neighbouring pixels is computed. The convolution operation is denoted by the symbol "*".
Commutative: The order of convolution doesn't matter, meaning that convolving A with B is
the same as convolving B with A.
A*B=B*A
Associative: Multiple convolutions can be grouped in any order; convolving A with (B*C)
gives the same result as convolving (A*B) with C.
A*(B*C) = (A*B)*C
Linear: Convolution is a linear operation, meaning that scaling the input or the kernel
scales the output proportionally.
(kA)*B = k(A*B)
Common applications in image processing (a short sketch follows this list):
Blurring an image: This can be achieved by using a kernel with all positive values, which
essentially averages the neighbouring pixels.
Edge detection: This can be achieved by using a kernel that responds to changes in
intensity, such as the Sobel filter.
Extracting features from an image: Convolutional neural networks (CNNs) use a series of
convolution layers to extract features from images, which are then used to identify objects
or classify images.
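A minimal sketch of the first two uses with SciPy (the random image stands in for real data):

import numpy as np
from scipy.ndimage import convolve

image = np.random.rand(64, 64)  # hypothetical grayscale image in [0, 1]

# Blurring: a kernel of equal positive weights averages neighbouring pixels.
box_blur = np.ones((3, 3)) / 9.0
blurred = convolve(image, box_blur)

# Edge detection: the Sobel kernel responds to horizontal intensity changes.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
edges = convolve(image, sobel_x)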