Unit 2 AIIA 2
Image Processing
Introduction to Image Processing:
Images, Pixels, Image Resolution, PPI and DPI, Bitmap Images, Lossless Compression, Lossy
Compression, Image File Formats, Color Spaces: RGB, XYZ, HSV/HSL, LAB, LCH, YPbPr,
YUV, YIQ, Advanced Image Concepts: Bézier Curve, Ellipsoid, Gamma Correction, Structural
Similarity Index, Deconvolution, Homography, Convolution
Digital Image Processing means processing a digital image by means of a digital computer. We can also
say that it is the use of computer algorithms in order to obtain an enhanced image or to extract some
useful information from it.
Digital image processing is the use of algorithms and mathematical models to process and analyze
digital images. The goal of digital image processing is to enhance the quality of images, extract
meaningful information from images, and automate image-based tasks. The main stages of digital image
processing are as follows:
1. Image acquisition: This involves capturing an image using a digital camera or scanner, or
importing an existing image into a computer.
2. Image enhancement: This involves improving the visual quality of an image, such as increasing
contrast, reducing noise, and removing artifacts.
3. Image restoration: This involves removing degradation from an image, such as blurring, noise,
and distortion.
4. Image segmentation: This involves dividing an image into regions or segments, each of which
corresponds to a specific object or feature in the image.
5. Image representation and description: This involves representing an image in a way that can be
analyzed and manipulated by a computer, and describing the features of an image in a compact and
meaningful way.
6. Image analysis: This involves using algorithms and mathematical models to extract information
from an image, such as recognizing objects, detecting patterns, and quantifying features.
7. Image synthesis and compression: This involves generating new images or compressing existing
images to reduce storage and transmission requirements.
Digital image processing is widely used in a variety of applications, including medical imaging,
remote sensing, computer vision, and multimedia.
Types of an image
1. BINARY IMAGE – The binary image, as its name suggests, contains only two pixel values, i.e.
0 and 1, where 0 refers to black and 1 refers to white. This image is also known as a monochrome image.
2. BLACK AND WHITE IMAGE – The image which consists of only black and white color is
called a BLACK AND WHITE IMAGE.
3. 8-bit COLOR FORMAT – It is the most famous image format. It has 256 different shades of
colors in it and is commonly known as a Grayscale Image. In this format, 0 stands for black, 255
stands for white, and 127 stands for gray.
4. 16-bit COLOR FORMAT – It is a color image format. It has 65,536 different colors in it and is
also known as the High Color Format. In this format the distribution of color is not the same as in a
grayscale image.
A 16-bit format is actually divided into three further channels, Red, Green and Blue: the famous RGB format.
Image as a Matrix
As we know, an image is represented in rows and columns, and the following matrix notation is used to
represent an image:
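In the standard matrix form used in most image-processing texts, an M x N digital image is written as:

f(x, y) =
\begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,N-1) \\
f(1,0) & f(1,1) & \cdots & f(1,N-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1)
\end{bmatrix}

where f(x, y) is the intensity (gray level or color value) at row x and column y.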
The right-hand side of this equation is a digital image by definition. Every element of this matrix is
called an image element, picture element, or pixel.
A common block diagram distinguishes these related fields by their inputs and outputs:
According to block 1, if the input is an image and we get an image as output, then it is termed Digital
Image Processing.
According to block 2, if the input is an image and we get some kind of information or description as
output, then it is termed Computer Vision.
According to block 3, if the input is some description or code and we get an image as output, then it is
termed Computer Graphics.
According to block 4, if the input is a description, some keywords or some code and we get a description
or some keywords as output, then it is termed Artificial Intelligence.
Advantages of Digital Image Processing:
1. Improved image quality: Digital image processing algorithms can improve the visual quality of
images, making them clearer, sharper, and more informative.
2. Automated image-based tasks: Digital image processing can automate many image-based tasks,
such as object recognition, pattern detection, and measurement.
3. Increased efficiency: Digital image processing algorithms can process images much faster than
humans, making it possible to analyze large amounts of data in a short amount of time.
4. Increased accuracy: Digital image processing algorithms can provide more accurate results than
humans, especially for tasks that require precise measurements or quantitative analysis.
Disadvantages of Digital Image Processing:
1. High computational cost: Some digital image processing algorithms are computationally
intensive and require significant computational resources.
2. Limited interpretability: Some digital image processing algorithms may produce results that are
difficult for humans to interpret, especially for complex or sophisticated algorithms.
3. Dependence on quality of input: The quality of the output of digital image processing algorithms
is highly dependent on the quality of the input images. Poor quality input images can result in poor
quality output.
4. Limitations of algorithms: Digital image processing algorithms have limitations, such as the
difficulty of recognizing objects in cluttered or poorly lit scenes, or the inability to recognize
objects with significant deformations or occlusions.
5. Dependence on good training data: The performance of many digital image processing
algorithms is dependent on the quality of the training data used to develop the algorithms. Poor
quality training data can result in poor performance of the algorithm.
2.2 Pixels
An image is an array of pixels (picture elements) arranged in columns and rows. An image is therefore a
collection of discrete (and usually small) cells, each of which is known as a pixel in image processing.
In an 8-bit greyscale image each picture element has an assigned intensity that ranges from 0 to 255. A
greyscale image will include many shades of grey, as shown below.
Grayscale Image showing pixel value
In the above image, each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel
values depends on the colour depth of the image. Here each pixel is represented by 8 bits, giving 256 (2^8)
tones or greyscales. Some greyscale images have more greyscale levels, for instance 16 bits = 65,536
greyscales.
Pixel
A digital image is divided into a rectangular grid of pixels, so that each pixel is itself a small rectangle.
Once this has been done, each pixel is given a pixel value that represents the color of that pixel. It is
assumed that the whole pixel is the same color, and so any color variation that did exist within the area of
the pixel before the image was discretized is lost. If the area of each pixel is very small, then the discrete
nature of the image is often not visible to the human eye.
Pixel Values
Each of the pixels that represents an image stored inside a computer has a pixel value which describes
its brightness and/or color. For a grayscale image, the pixel value is a single number that represents the
brightness of the pixel. Pixel value zero is taken to be black, and 255 is taken to be white. Values in
between make up the different shades of gray.
To represent color images, separate red, green and blue components must be specified for each pixel
(assuming an RGB colorspace). Different components are stored as three separate grayscale images
known as color planes (one for each of red, green and blue).
The image below shows an image and its associated coordinate system. The (0,0) coordinate starts from
the top-left corner of the image, as shown below. A pixel is located using its x and y coordinates. At
coordinate (1,1), the pixel value of the cell is (255,255,0), so the Red channel value is 255, the Green
channel value is 255 and the Blue channel value is 0.
Multi-spectral images can contain even more than three components for each pixel, and by extension
these are stored in the same kind of way, as a vector pixel value, or as separate color planes.
8-bit integers are the most common type of pixel value. Some image formats support other types of value,
for instance 32-bit signed integers or floating-point values. Such values are useful in image processing
as they allow processing to be carried out on the image where the resulting pixel values are not
necessarily 8-bit integers.
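To make the idea of pixel values concrete, here is a minimal sketch in Python, assuming the Pillow and NumPy libraries are available and using a hypothetical file name photo.jpg; it reads an image and prints the RGB and grayscale values of individual pixels.

```python
# Sketch: inspecting pixel values of an image (assumes Pillow and NumPy are installed).
import numpy as np
from PIL import Image

img = Image.open("photo.jpg")          # hypothetical file name
rgb = np.array(img.convert("RGB"))     # shape: (height, width, 3), dtype uint8

print(rgb.shape)                       # e.g. (1536, 2048, 3)
print(rgb[1, 1])                       # R, G, B values of the pixel at row 1, column 1

gray = np.array(img.convert("L"))      # 8-bit grayscale: one value per pixel, 0 (black) to 255 (white)
print(gray[0, 0])
```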
2.3 Image Resolution
Image resolution is the level of detail an image holds. The term applies to digital images, film images,
and other types of images. "Higher resolution" means more image detail. Image resolution can be
measured in various ways. Resolution quantifies how close lines can be to each other and still be
visibly resolved. Resolution units can be tied to physical sizes (e.g. lines per mm, lines per inch), to the
overall size of a picture (lines per picture height, also known simply as lines, TV lines, or TVL), or to
angular subtense. Instead of single lines, line pairs are often used, composed of a dark line and an
adjacent light line; for example, a resolution of 10 lines per millimeter means 5 dark lines alternating with
5 light lines, or 5 line pairs per millimeter (5 LP/mm). Photographic lens and film resolutions are most
often quoted in line pairs per millimeter.
Types
The resolution of digital cameras can be described in many different ways.
Pixel count
The term resolution is often considered equivalent to pixel count in digital imaging, though international
standards in the digital camera field specify it should instead be called "Number of Total Pixels" in
relation to image sensors, and as "Number of Recorded Pixels" for what is fully captured. Hence, CIPA
DCG-001 calls for notation such as "Number of Recorded Pixels 1000 × 1500". [1][2] According to the
same standards, the "Number of Effective Pixels" that an image sensor or digital camera has is the count
of pixel sensors that contribute to the final image (including pixels not in said image but nevertheless
support the image filtering process), as opposed to the number of total pixels, which includes unused or
light-shielded pixels around the edges.
An image of N pixels height by M pixels wide can have any resolution less than N lines per picture
height, or N TV lines. But when the pixel counts are referred to as "resolution", the convention is to
describe the pixel resolution with the set of two positive integer numbers, where the first number is the
number of pixel columns (width) and the second is the number of pixel rows (height), for example
as 7680 × 6876. Another popular convention is to cite resolution as the total number of pixels in the
image, typically given as number of megapixels, which can be calculated by multiplying pixel columns
by pixel rows and dividing by one million. Other conventions include describing pixels per length unit or
pixels per area unit, such as pixels per inch or per square inch. None of these pixel resolutions are true
resolutions, but they are widely referred to as such; they serve as upper bounds on image resolution.
Below is an illustration of how the same image might appear at different pixel resolutions, if the pixels
were poorly rendered as sharp squares (normally, a smooth image reconstruction from pixels would be
preferred, but for illustration of pixels, the sharp squares make the point better).
An image that is 2048 pixels in width and 1536 pixels in height has a total of 2048×1536 = 3,145,728
pixels or 3.1 megapixels. One could refer to it as 2048 by 1536 or a 3.1-megapixel image. The image
would be a very low quality image (72ppi) if printed at about 28.5 inches wide, but a very good quality
(300ppi) image if printed at about 7 inches wide.
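The same arithmetic can be written as a short sketch (plain Python, nothing assumed beyond the pixel dimensions above) that computes the megapixel count and the approximate print width at 72 and 300 PPI.

```python
# Sketch: megapixels and print size from pixel dimensions.
width_px, height_px = 2048, 1536

megapixels = width_px * height_px / 1_000_000
print(f"{megapixels:.1f} MP")                       # 3.1 MP

for ppi in (72, 300):
    # Physical print width in inches = pixel width / pixels per inch.
    print(f"at {ppi} ppi: {width_px / ppi:.1f} in wide")
# 72 ppi  -> about 28.4 in wide (low quality)
# 300 ppi -> about 6.8 in wide (good quality)
```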
The number of photodiodes in a color digital camera image sensor is often a multiple of the number of
pixels in the image it produces, because information from an array of color image sensors is used to
reconstruct the color of a single pixel. The image has to be interpolated or demosaiced to produce all three
colors for each output pixel.
Spatial resolution
Main article: Spatial resolution
The terms blurriness and sharpness are used for digital images but other descriptors are used to reference
the hardware capturing and displaying the images.
Spatial resolution in radiology refers to the ability of the imaging modality to differentiate two objects.
Low spatial resolution techniques will be unable to differentiate between two objects that are relatively
close together.
The 1951 USAF resolution test target is a classic test target used to determine spatial resolution of
imaging sensors and imaging systems.
Image at left has a higher pixel count than the one to the right, but is still of worse spatial resolution.
The measure of how closely lines can be resolved in an image is called spatial resolution, and it depends
on properties of the system creating the image, not just the pixel resolution in pixels per inch (ppi). For
practical purposes the clarity of the image is decided by its spatial resolution, not the number of pixels in
an image. In effect, spatial resolution refers to the number of independent pixel values per unit length.
The spatial resolution of consumer displays ranges from 50 to 800 pixel lines per inch. With
scanners, optical resolution is sometimes used to distinguish spatial resolution from the number of pixels
per inch.
In remote sensing, spatial resolution is typically limited by diffraction, as well as by aberrations,
imperfect focus, and atmospheric distortion. The ground sample distance (GSD) of an image, the pixel
spacing on the Earth's surface, is typically considerably smaller than the resolvable spot size.
In astronomy, one often measures spatial resolution in data points per arcsecond subtended at the point of
observation, because the physical distance between objects in the image depends on their distance away
and this varies widely with the object of interest. On the other hand, in electron microscopy, line or fringe
resolution refers to the minimum separation detectable between adjacent parallel lines (e.g. between
planes of atoms), whereas point resolution instead refers to the minimum separation between adjacent
points that can be both detected and interpreted e.g. as adjacent columns of atoms, for instance. The
former often helps one detect periodicity in specimens, whereas the latter (although more difficult to
achieve) is key to visualizing how individual atoms interact.
In Stereoscopic 3D images, spatial resolution could be defined as the spatial information recorded or
captured by two viewpoints of a stereo camera (left and right camera).
Spectral resolution
Main article: ICC profile
Pixel encoding limits the information stored in a digital image, and the term color profile is used for
digital images but other descriptors are used to reference the hardware capturing and displaying the
images.
Spectral resolution is the ability to resolve spectral features and bands into their separate
components. Color images distinguish light of different spectra. Multispectral images can resolve even
finer differences of spectrum or wavelength by measuring and storing more than the traditional 3 channels
of common RGB color images.
Temporal resolution
Main article: Frame rate
Temporal resolution (TR) refers to the precision of a measurement with respect to time.
Movie cameras and high-speed cameras can resolve events at different points in time. The time resolution
used for movies is usually 24 to 48 frames per second (frames/s), whereas high-speed cameras may
resolve 50 to 300 frames/s, or even more.
The Heisenberg uncertainty principle describes the fundamental limit on the maximum spatial resolution
of information about a particle's coordinates imposed by the measurement or existence of information
regarding its momentum to any degree of precision.
This fundamental limitation can, in turn, be a factor in the maximum imaging resolution at subatomic
scales, as can be encountered using scanning electron microscopes.
Radiometric resolution
Main article: Color depth
Radiometric resolution determines how finely a system can represent or distinguish differences
of intensity, and is usually expressed as a number of levels or a number of bits, for example 8 bits or 256
levels that is typical of computer image files. The higher the radiometric resolution, the better subtle
differences of intensity or reflectivity can be represented, at least in theory. In practice, the effective
radiometric resolution is typically limited by the noise level, rather than by the number of bits of
representation.
PPI resolution
What PPI means
PPI, or pixels per inch, refers both to the fixed number of pixels that a screen can
display and to the density of pixels within a digital image. Pixel count, on the other hand,
refers to the number of pixels across the length and width of a digital image, that is,
the image dimensions in pixels. Pixels, or "picture elements", are the smallest building
blocks of a digital image. Zoom in to any image on your screen and you will see it break up
into colored squares; these are pixels.
Each pixel is made up of RGB subpixels.
• Pixel count describes an image's dimensions based on the number of pixels.
• PPI, or pixel density, describes the amount of detail in an image based on the concentration of pixels.
Within pixels are sub-pixels, red, green and blue light elements that the human eye
cannot see because additive color processing blends them into a single hue which
appears on the pixel level. This is why PPI utilizes the RGB (red, green and
blue) color model, also known as the additive color model. This does not exist in
print—only in the electronic display of images, like television screens, computer
monitors and digital photography.
A higher PPI resolution results in more detail and a sharper image.
Because increasing the PPI increases the size of your file, you will want to use a high PPI only when
necessary. For example, when printing involves many fine details on a glossy surface, it’s best to consider
using a higher resolution. Printing an image on canvas does not require as high a resolution because
details get lost in the texture of the material. PPI does not really matter for distribution on the web
because the pixel density of your monitor is fixed. A 72 PPI image and a 3,000 PPI image will appear the
same on your screen. It is the pixel dimensions (the amount of pixels from left to right, top to bottom) that
will determine the size and detail of your image.
The New Document window in Photoshop has you set your Pixels Per Inch resolution in the beginning
Raster programs (software that work with pixel-based media) like Photoshop have you set up the PPI
resolution right at the beginning when you create a document. You will find Resolution listed with other
parameters in the New Document window.
If you need to increase the resolution on an image that has already been created, you can resample it.
Resampling is the process of changing the amount of pixels in an image, in which the software will create
or delete pixels to preserve image quality.
In Photoshop, you can do this by navigating to Image > Image Size. In the Image Size window, you will
have options for changing the width, height and PPI resolution of your image. Select the “Resample”
checkbox and set it to “Preserve Details” to choose how Photoshop fills in the new pixels.
The Image Size window gives you options for adjusting your resolution in Photoshop
You can decrease the resolution if you set the PPI to a lower value. As the pixel count decreases, the
image size and dimensions decrease as well. You increase the resolution when you set PPI to a higher
value. This allows the image to be printed at a larger print size.
That said, it is best to avoid changing the PPI on an existing image whenever possible. The resampling
process requires Photoshop to generate new pixels from scratch. While Photoshop is able to read the
surrounding pixels and make a somewhat accurate guess as to what color each new pixel should be,
computers are notoriously bad at “seeing” images the way humans can. Thus, computer generated pixels
can create unintentional results on your image.
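For illustration, resampling can also be done programmatically. This is a minimal sketch assuming the Pillow library and a hypothetical file photo.jpg; the choice of resampling filter (LANCZOS here) controls how the new pixel values are interpolated from the surrounding pixels.

```python
# Sketch: resampling an image to new pixel dimensions (assumes Pillow is installed).
from PIL import Image

img = Image.open("photo.jpg")                      # hypothetical file name
upsampled = img.resize((img.width * 2, img.height * 2), Image.LANCZOS)
downsampled = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)

# The interpolated pixels are guesses based on neighbouring pixels,
# which is why heavy upsampling can look soft or produce artifacts.
upsampled.save("photo_2x.jpg")
downsampled.save("photo_half.jpg")
```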
DPI printing
DPI, or dots per inch, refers to the resolution value of a physical printer. Printers reproduce an image by
spitting out tiny dots, and the number of dots per inch affects the amount of detail and overall quality of
the print.
Printer dots mix CMYK inks
DPI describes the amount of detail in an image based on the concentration of printer dots
DPI uses the CMYK (cyan, magenta, yellow and key/black) color model to control the amount of red,
green, and blue light that is reflected from white paper. This is also known as the subtractive color model.
Dots of each color are printed in patterns, enabling the human eye to perceive a specific color made from
this combination. DPI is a measurement of this density. These dots are a fixed size and resolution is only
affected by how many dots appear per inch.
When your design is going to be physically printed, the printer will use DPI. Each model and style of
printer produces its own unique DPI based on its settings. Inkjet printers produce a resolution around 300
to 720 DPI, while laser printers produce images anywhere from 600 to 2,400 DPI.
Higher DPI can mean higher resolution, but dot sizes vary by printer
There is no standard dot size or shape, so higher DPI does not always equate to a higher quality print. One
manufacturer’s dots might look as good at 1200 DPI as another manufacturer’s dots do at 700 DPI. Books
and magazines often use 150 DPI for photographic reproduction, and newspapers often use 85 DPI. Ask
the printshop or consult the printer specifications to find the appropriate DPI for your project.
Knowing how to use PPI will empower you to produce high quality images every time. And knowing
how to navigate DPI will help you to effectively communicate with printing machines and professionals
in the printing industry. Unless you are a printer, your main focus will be on PPI. But it is important to
understand the process of physical printing if your work requires it on a regular basis.
In the end, even the best design can be ruined by poor image resolution.
2.5 Bitmap Images
What Is Bitmap?
"Bitmap" images are created by arranging a grid of differently colored pixels. When viewed from a
distance or at a small scale, the images appear natural. But, if viewed up close or when the image is
enlarged, they appear blurry and "pixelated."
This method can create any 2D rectangular image. More than that, a rectangular image created using
bitmap can be copied and pasted repeatedly to quickly and easily cover a vast area with a similar
repeating pattern, known as a "tilemap."
The only real limitation to bitmap design is file size. Creating crisp and highly detailed images requires a
higher number of "bits." This can mean that these images take up a great deal of computing space.
Furthermore, an image cannot effectively have a higher resolution than the screen that it appears on.
To understand the limitations of display and bitmaps, put your eye very close to your screen to see "the
screendoor effect." This is the grid pattern that appears on digital images due to the space between pixels.
It's a big topic in virtual reality because of how close the display is to your face, but it's actually a factor
in any digital display.
"8-bit" videogames and graphics are good illustrations of bitmap design. Be careful, though. 8-bit doesn't
refer to the resolution. It refers to the memory that each pixel requires.
More "bits" really just means more color options. This comes into play with "retro" or "8-bit-style" games
made with modern designs for modern displays, like Minecraft.
While bitmap images are only as old as digital displays, the same way of constructing images from
discrete points has been used for decades. The print version of a bitmap, called a "dot matrix," was used
in image printing for decades. Just like some videogames deliberately replicate 8-bit graphics, some
comics deliberately maintain dot matrix.
Vector images, by contrast, describe a picture mathematically as curves and shapes rather than as a fixed
grid of pixels. The ability to scale the image is a huge benefit over a bitmap, but that's more or less
where the benefit ends. Vector images are harder to create from scratch, and a lot is lost in the design
process. Further, it is hard to make a vector image that can be stylistically replicated in the same way
that a bitmap can be used for a tilemap.
2.6 Lossless Compression
Lossless compression reduces an image's file size without discarding any image data: no details are lost
along the way, hence the name. As it's a reversible process, you can also easily switch back to the
original file if you need to.
Lossless compression uses an algorithm to shrink the image without losing any important data.
Essentially, it rewrites the file to make it more efficient - resulting in a smaller storage or transfer size.
One way the algorithm does this is by replacing non-essential information and storing it in an index
file.
Compression algorithms reconstruct the compressed data to look exactly like the original. The indexed
data then allows you to return the images to their original form if you wish.
The point of this kind of compression is to preserve as much of the image quality as possible, meaning
lossless images can be hard to spot! In some cases, you may see a slight difference — but the images are
often close to identical.
However, as you look closely, you may notice a slight change in texture where some pixelation has
begun. This would be fine for storing digital images — ready to restore them back to their original size
and make any further edits — but for high-definition perfectionists, you may need to tweak the
compression rate to get the desired result.
In any case, it’s easy to see how lossless compression can shrink the size of an image, while losing little
to no graphic detail.
Lossless compression is used to make it easier to store, transfer and upload high-quality digital images, as
the definition remains the same while occupying less space.
This makes it ideal for enthusiasts of high-quality sports photography, fine art photography, nature
photography and more, as you can then handle large numbers of files, without compromising on quality.
As a reversible compression method, lossless compression can also be used as an alternative to lossy
compression, after which the original image cannot be recovered.
Lossless compression advantages and disadvantages.
Lossless compression isn’t the only way to reduce the size of digital images. To understand which method
is right for you, it helps to weigh up the pros and cons of using a lossless image type.
Advantages:
• No loss of quality. It typically works by removing non-essential metadata from image files,
leaving you with a smaller but identical high-quality image. This is ideal for when you need to showcase
images and photography for a digital portfolio or when delivering images to a client.
• Images can be restored. Lossless compression is reversible, should you need to restore an image
to its former glory for editing or printing.
• Faster loading time. Image compression can help to improve a website's performance.
• Faster transfers. Whether you're sending image files for a job or simply transferring from one
folder to another, lossless compression can make transfers much quicker by shrinking the file size.
Disadvantages:
• File sizes cannot be reduced by a large amount. The drawback of retaining high quality is that
lossless algorithms cannot reduce file size as much as those that use lossy compression. This can be
limiting when working with large volumes of images.
• Web use. Although detailed, lossless images typically have a larger file size than their lossy
counterparts. This can affect the responsiveness of a web page, which is something you may want to
consider if you need to display lots of images.
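The reversibility described above can be checked directly. Below is a minimal sketch, assuming Pillow and NumPy and a hypothetical input file, that saves an image as PNG (a lossless format discussed later in this unit) and confirms every pixel value is preserved exactly.

```python
# Sketch: verifying that PNG (lossless) compression preserves every pixel exactly.
import numpy as np
from PIL import Image

original = np.array(Image.open("photo.jpg").convert("RGB"))   # hypothetical input file

Image.fromarray(original).save("photo.png")                    # PNG applies lossless compression
restored = np.array(Image.open("photo.png"))

print(np.array_equal(original, restored))                      # True: no pixel was changed
```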
2.7 Lossy Compression
Lossy compression algorithms are techniques that reduce file size by discarding the less important
information.
Nobody likes losing information, but some types of files are so large that there's just not enough space to
keep all the original data, plus we didn't need all that data in the first place. That's often the case with
photos, videos, and audio recordings; files that try to capture the beautiful complexity of the world we
live in.
Computers can capture an incredible amount of detail in a photo—but how much of that detail can
humans actually perceive? As it turns out, there's a lot of detail that we can remove. Lossy compression
algorithms are all about figuring out clever ways to remove detail without humans noticing (too much).
Let's explore some of the clever ways that lossy compression algorithms remove detail to reduce file size.
The human eye is better at perceiving differences in brightness than differences in color. A compression
algorithm can take advantage of that fact by keeping the brightness while reducing the amount of color
information, a process known as chroma subsampling.
Let's try it on the photo of the cat in the hat. The first step is to separate the brightness information from
the chroma (color).
Let's zoom into an 8x2 block of chroma from the left eye:
Each of those pixels has a slightly different chroma value, so there are 16 values in total. We can
average each 2x2 block and set its color to that average, so that there are only 4 values in total:
The result looks quite similar to the original, but it has a quarter of the original color information. If we
apply that to the entire image, we can save a lot of space without affecting perception much.
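As a rough sketch of the 2x2 averaging just described (using NumPy, which is assumed to be available): real JPEG/MPEG encoders do this on the chroma channels of a luma/chroma color space such as YCbCr, but the averaging step itself is the same idea.

```python
# Sketch: average every 2x2 block of a chroma channel (4:2:0-style subsampling).
import numpy as np

chroma = np.arange(16, dtype=float).reshape(2, 8)   # stand-in for an 8x2 block of chroma values

h, w = chroma.shape
blocks = chroma.reshape(h // 2, 2, w // 2, 2)        # group pixels into 2x2 blocks
averaged = blocks.mean(axis=(1, 3))                  # one average per block: 16 values -> 4

print(averaged.shape)   # (1, 4)
print(averaged)
```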
Chroma subsampling is a process used in many compression algorithms that deal with images, including
the popular file formats JPEG and MPEG.
Those algorithms also use a process called discrete cosine transform (DCT) to simplify the details in the
brightness layer. If you'd like to learn more, here's a nice walk-through.
The human ear has limitations to what it can hear. Audio compression algorithms can analyze an audio
file and discard the sounds that are outside our hearing capacity, a process known as perceptual audio
coding.
One interesting limitation of our ears is known as temporal masking. That's when a sudden sound can
mask (hide) other sounds for a period after it occurs—and even a bit before!
For example, imagine a song with a loud drum beat. The beat masks sounds for about 20 milliseconds
before it happens, and for up to 200 milliseconds after it happens. This graph illustrates the masking
effect:
The graph plots time in milliseconds on the x axis against loudness in decibels on the y axis: the drum
beat occurs from 140-200 ms, the pre-masking effect is a tight curve leading up to the beat (100-140 ms),
and the post-masking effect is a more gradual curve (200-350 ms). The grey striped areas show the
pre-masking and post-masking periods.
The computer sees those hidden sounds in the recorded audio file, but our ears can't hear them, so audio
compression algorithms can safely discard that information or represent it with fewer bits.
Compression algorithms can also use other limitations of our natural auditory process, like the high
frequency limit and simultaneous masking. If you'd like to learn more, you can research the fascinating
field of psychoacoustics.
Compression quality
When we use a lossless compression algorithm, we can always reconstruct 100% of the original data.
With lossy compression algorithms, we're losing some % of the original data; maybe 5%, maybe 10%,
maybe 70%. How much do we lose? We can decide that based on our use case for the data.
Consider the earlier photo of the cat in the hat. If the plan is to put that photo in a presentation about cat
fashion and project it on a large screen, we probably want to keep as much detail as possible. In that case,
we can either use a lossless compression algorithm (like PNG) or we can use a lossy compression
algorithm (like JPEG) and specify a high quality (like 100%). Photo editing applications often give you
those options.
Screenshot of File settings from Export menu for Adobe Photoshop. For "Format" option, "JPG" is
selected. For "Quality" option, slider is set to "100%".
What if we want to use that photo in a website, and our target users for the website are in another country
on low-bandwidth connections? The smaller the file, the faster the download. We want them to see the
photo, but for this use case, high quality isn't as important as download speed. We'll definitely want to use
a lossy compression algorithm, and we can specify a lower quality when exporting. The algorithm can
tweak its internal dials to simplify details even more.
Screenshot of File settings from Export menu for Adobe Photoshop. For "Format" option, "JPG" is
selected. For "Quality" option, slider is set to "60%".
That still looks pretty good - definitely usable for a low-bandwidth website. How low can we go? Here's
the photo at 1% quality:
Photo of a gray cat with green eyes sitting in a blue hat on a gray couch.
It's definitely not perfect—the green of the eyes seems to be smearing into the fur, and there are artifacts
where the hat meets the chair. But it's also only 12 KB, less than a tenth of the original size. It's impressive
how much information we can lose but still convey so much detail.
We can also specify quality for lossy compression algorithms that transform audio and video files. You've
probably seen low quality, highly compressed videos around the Web; videos are the most likely to get
compressed since they're so large to begin with.
Whenever we use lossy compression, we're always making a trade-off between quality and size, and it's
up to us to find the settings that work best for our use case.
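The quality/size trade-off can be explored programmatically as well. This is a minimal sketch assuming Pillow and a hypothetical file cat_in_hat.jpg; it re-encodes the same image at several JPEG quality settings and prints the resulting sizes.

```python
# Sketch: comparing file sizes of the same image at different JPEG quality settings.
import io
from PIL import Image

img = Image.open("cat_in_hat.jpg").convert("RGB")      # hypothetical file name

for quality in (90, 60, 10):
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=quality)   # lossy: lower quality discards more detail
    print(f"quality={quality:3d}: {buffer.tell() / 1024:.0f} KB")
```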
2.8 Image File Formats
Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are
most often used to display images on the Internet. Some of these graphic formats are listed and briefly
described below, separated into the two main families of graphics: raster and vector. Raster images are
further divided into formats primarily aimed at (web) delivery (i.e. supporting relatively strong
compression) versus formats primarily aimed at authoring or interchange (uncompressed or only
relatively weak compression).
In addition to straight image formats, Metafile formats are portable formats which can include both raster
and vector information. Examples are application-independent formats such as WMF and EMF. The
metafile format is an intermediate format. Most applications open metafiles and then save them in their
own native format. Page description language refers to formats used to describe the layout of a printed
page containing text, objects and images. Examples are PostScript, PDF and PCL.
Raster formats (2D)
Further information: Raster graphics
Delivery formats
JPEG
JPEG (Joint Photographic Experts Group) is a lossy compression method; JPEG-compressed images are
usually stored in the JFIF (JPEG File Interchange Format) or the Exif (Exchangeable image file format)
file format. The JPEG filename extension is JPG or JPEG. Nearly every digital camera can save images
in the JPEG format, which supports eight-bit grayscale images and 24-bit color images (eight bits each for
red, green, and blue). JPEG applies lossy compression to images, which can result in a significant
reduction of the file size. Applications can determine the degree of compression to apply, and the amount
of compression affects the visual quality of the result. When not too great, the compression does not
noticeably affect or detract from the image's quality, but JPEG files suffer generational degradation when
repeatedly edited and saved. (JPEG also provides lossless image storage, but the lossless version is not
widely supported.)
GIF
The GIF (Graphics Interchange Format) is in normal use limited to an 8-bit palette, or 256 colors (while
24-bit color depth is technically possible). [1][2] GIF is most suitable for storing graphics with few colors,
such as simple diagrams, shapes, logos, and cartoon style images, as it uses LZW lossless compression,
which is more effective when large areas have a single color, and less effective for photographic
or dithered images. Due to GIF's simplicity and age, it achieved almost universal software support. Due to
its animation capabilities, it is still widely used to provide image animation effects, despite its low
compression ratio compared to modern video formats.
PNG
The PNG (Portable Network Graphics) file format was created as a free, open-source alternative to GIF.
The PNG file format supports 8-bit (256 colors) paletted images (with optional transparency for all palette
colors) and 24-bit truecolor (16 million colors) or 48-bit truecolor with and without alpha channel – while
GIF supports only 8-bit palettes with a single transparent color.
Compared to JPEG, PNG excels when the image has large, uniformly colored areas. Even for
photographs – where JPEG is often the choice for final distribution since its lossy compression typically
yields smaller file sizes – PNG is still well-suited to storing images during the editing process because of
its lossless compression.
PNG provides a patent-free replacement for GIF (though GIF is itself now patent-free) and can also
replace many common uses of TIFF. Indexed-color, grayscale, and truecolor images are supported, plus
an optional alpha channel. The Adam7 interlacing allows an early preview, even when only a small
percentage of the image data has been transmitted — useful in online viewing applications like web
browsers. PNG can store gamma and chromaticity data, as well as ICC profiles, for accurate color
matching on heterogeneous platforms.
Animated formats derived from PNG are MNG and APNG, which is backwards compatible with PNG
and supported by most browsers.
JPEG 2000
JPEG 2000 is a compression standard enabling both lossless and lossy storage. The compression methods
used are different from the ones in standard JFIF/JPEG; they improve quality and compression ratios, but
also require more computational power to process. JPEG 2000 also adds features that are missing in
JPEG. It is not nearly as common as JPEG, but it is used currently in professional movie editing and
distribution (some digital cinemas, for example, use JPEG 2000 for individual movie frames).
WebP
WebP is an open image format released in 2010 that uses both lossless and lossy compression. It was
designed by Google to reduce image file size to speed up web page loading: its principal purpose is to
supersede JPEG as the primary format for photographs on the web. WebP is based on VP8's intra-frame
coding and uses a container based on RIFF.
In 2011,[3] Google added an "Extended File Format" allowing WebP support for animation, ICC
profile, XMP and Exif metadata, and tiling.
The support for animation allowed for converting older animated GIF to animated WebP.
The WebP container (i.e., RIFF container for WebP) allows feature support over and above the basic use
case of WebP (i.e., a file containing a single image encoded as a VP8 key frame). The WebP container
provides additional support for:
Lossless compression – An image can be losslessly compressed, using the WebP Lossless Format.
Metadata – An image may have metadata stored in EXIF or XMP formats.
Transparency – An image may have transparency, i.e., an alpha channel.
Color Profile – An image may have an embedded ICC profile as described by the International
Color Consortium.
Animation – An image may have multiple frames with pauses between them, making it an
animation.[4]
HDR raster formats
Most typical raster formats cannot store HDR data (32 bit floating point values per pixel component),
which is why some relatively old or complex formats are still predominant here, and worth mentioning
separately. Newer alternatives are showing up, though. RGBE is the format for HDR images originating
from Radiance and also supported by Adobe Photoshop. JPEG-HDR is a file format from Dolby Labs
similar to RGBE encoding, standardized as JPEG XT Part 2.
JPEG XT Part 7 includes support for encoding floating point HDR images in the base 8-bit JPEG file
using enhancement layers encoded with four profiles (A-D); Profile A is based on the RGBE format and
Profile B on the XDepth format from Trellis Management.
HEIF
The High Efficiency Image File Format (HEIF) is an image container format that was standardized
by MPEG on the basis of the ISO base media file format. While HEIF can be used with any image
compression format, the HEIF standard specifies the storage of HEVC intra-coded images and HEVC-
coded image sequences taking advantage of inter-picture prediction.
AVIF
AV1 Image File Format (AVIF) is an image format standardized by the Alliance for Open Media (AOMedia),
the video consortium that created the AV1 video format, to take advantage of modern compression
algorithms in a completely royalty-free image format. It stores AV1-coded image data and recommends
using the HEIF container; see AV1 in HEIF.
JPEG XL
JPEG XL is a royalty-free raster-graphics file format that supports both lossy and lossless compression. It
supports reversible recompression of existing JPEG files, as well as high-precision HDR (up to 32-bit
floating point values per pixel component). It is designed to be usable for both delivery and authoring use
cases.
Authoring / Interchange formats
TIFF
The TIFF (Tagged Image File Format) format is a flexible format usually using either
the TIFF or TIF filename extension. The tagged structure was designed to be easily extendible, and
many vendors have introduced proprietary special-purpose tags – with the result that no one reader
handles every flavor of TIFF file. TIFFs can be lossy or lossless, depending on the technique chosen for
storing the pixel data. Some offer relatively good lossless compression for bi-level (black&white) images.
Some digital cameras can save images in TIFF format, using the LZW compression algorithm for lossless
storage. TIFF image format is not widely supported by web browsers, but it remains widely accepted as a
photograph file standard in the printing business. TIFF can handle device-specific color spaces, such as
the CMYK defined by a particular set of printing press inks. OCR (Optical Character Recognition)
software packages commonly generate some form of TIFF image (often monochromatic) for scanned text
pages.
BMP
The BMP file format (Windows bitmap) is a raster-based device-independent file type designed in the
early days of computer graphics. It handles graphic files within the Microsoft Windows OS. Typically,
BMP files are uncompressed, and therefore large and lossless; their advantage is their simple structure
and wide acceptance in Windows programs.
PPM, PGM, PBM, and PNM
Netpbm format is a family including the portable pixmap file format (PPM), the portable graymap file
format (PGM) and the portable bitmap file format (PBM). These are either pure ASCII files or raw
binary files with an ASCII header that provide very basic functionality and serve as a lowest common
denominator for converting pixmap, graymap, or bitmap files between different platforms. Several
applications refer to them collectively as PNM ("Portable aNy Map").
Other raster formats
Container formats of raster graphics editors (such as PSD and XCF) contain various images, layers and
objects, out of which the final image is to be composed. The formats below are other raster formats:
BPG (Better Portable Graphics) — an image format from 2014. Its purpose is to replace JPEG
when quality or file size is an issue. To that end, it features a high data compression ratio, based on a
subset of the HEVC video compression standard, including lossless compression. In addition, it
supports various meta data (such as EXIF).
DEEP — IFF-style format used by TVPaint
DRW (Drawn File)
ECW (Enhanced Compression Wavelet)
FITS (Flexible Image Transport System)
FLIF (Free Lossless Image Format) — a discontinued lossless image format which claims to
outperform PNG, lossless WebP, lossless BPG and lossless JPEG 2000 in terms of compression ratio.
It uses the MANIAC (Meta-Adaptive Near-zero Integer Arithmetic Coding) entropy encoding
algorithm, a variant of the CABAC (context-adaptive binary arithmetic coding) entropy encoding
algorithm.
ICO — container for one or more icons (subsets of BMP and/or PNG)
ILBM — IFF-style format for up to 32 bit in planar representation, plus optional 64 bit extensions
IMG (ERDAS IMAGINE Image)
IMG (Graphics Environment Manager (GEM) image file) — planar, run-length encoded
JPEG XR — JPEG standard based on Microsoft HD Photo
Layered Image File Format — for microscope image processing
Nrrd (Nearly raw raster data)
PAM (Portable Arbitrary Map) — late addition to the Netpbm family
PCX (PiCture eXchange) — obsolete
PGF (Progressive Graphics File)
PLBM (Planar Bitmap) — proprietary Amiga format
SGI (Silicon Graphics Image) — native raster graphics file format for Silicon Graphics
workstations
SID (multiresolution seamless image database, MrSID)
Sun Raster — obsolete
TGA (TARGA) — obsolete
VICAR file format — NASA/JPL image transport format
XISF (Extensible Image Serialization Format)
Vector formats
Further information: Vector graphics
As opposed to the raster image formats above (where the data describes the characteristics of each
individual pixel), vector image formats contain a geometric description which can be rendered smoothly
at any desired display size.
At some point, all vector graphics must be rasterized in order to be displayed on digital monitors. Vector
images may also be displayed with analog CRT technology such as that used in some electronic test
equipment, medical monitors, radar displays, laser shows and early video games. Plotters are printers that
use vector data rather than pixel data to draw graphics.
CGM
CGM (Computer Graphics Metafile) is a file format for 2D vector graphics, raster graphics, and text, and
is defined by ISO/IEC 8632. All graphical elements can be specified in a textual source file that can be
compiled into a binary file or one of two text representations. CGM provides a means of graphics data
interchange for computer representation of 2D graphical information independent from any particular
application, system, platform, or device. It has been adopted to some extent in the areas of technical
illustration and professional design, but has largely been superseded by formats such as SVG and DXF.
Gerber format (RS-274X)
The Gerber format (aka Extended Gerber, RS-274X) is a 2D bi-level image description format developed
by Ucamco. It is the de facto standard format for printed circuit board or PCB software.[5]
SVG
SVG (Scalable Vector Graphics) is an open standard created and developed by the World Wide Web
Consortium to address the need (and attempts of several corporations) for a versatile, scriptable and all-
purpose vector format for the web and otherwise. The SVG format does not have a compression scheme
of its own, but due to the textual nature of XML, an SVG graphic can be compressed using a program
such as gzip. Because of its scripting potential, SVG is a key component in web applications: interactive
web pages that look and act like applications.
Compound and stereo formats
Compound formats contain both pixel and vector data, and possibly other data, e.g. the interactive
features of PDF. The stereo formats below combine two views in a single file:
MPO The Multi Picture Object (.mpo) format consists of multiple JPEG images (Camera &
Imaging Products Association) (CIPA).
PNS The PNG Stereo (.pns) format consists of a side-by-side image based on PNG (Portable
Network Graphics).
JPS The JPEG Stereo (.jps) format consists of a side-by-side image format based on JPEG.
2.9 Color Spaces
Color Models (Color Spaces)
• The purpose of a color model is to serve as a method of representing color.
• Some color models are oriented towards hardware (e.g. monitors, printers), others towards applications
involving color manipulation.
• Monitors: RGB; printers: CMY; human perception: HSI; efficient compression and transmission: YCbCr.
2.9.1 RGB
The RGB color model is an additive color model[1] in which the red, green and blue primary colors of
light are added together in various ways to reproduce a broad array of colors. The name of the model
comes from the initials of the three additive primary colors, red, green, and blue.[2]
The main purpose of the RGB color model is for the sensing, representation, and display of images in
electronic systems, such as televisions and computers, though it has also been used in
conventional photography. Before the electronic age, the RGB color model already had a solid theory
behind it, based in human perception of colors.
RGB is a device-dependent color model: different devices detect or reproduce a given RGB value
differently, since the color elements (such as phosphors or dyes) and their response to the individual red,
green, and blue levels vary from manufacturer to manufacturer, or even in the same device over time.
Thus an RGB value does not define the same color across devices without some kind of color
management.[3][4]
Typical RGB input devices are color TV and video cameras, image scanners, and digital cameras. Typical
RGB output devices are TV sets of various technologies (CRT, LCD, plasma, OLED, quantum dots,
etc.), computer and mobile phone displays, video projectors, multicolor LED displays and large screens
such as the Jumbotron. Color printers, on the other hand, are not RGB devices, but subtractive
color devices typically using the CMYK color model.
Additive colors
Additive color mixing: projecting primary color lights on a white surface shows secondary colors where
two overlap; the combination of all three primaries in equal intensities makes white.
To form a color with RGB, three light beams (one red, one green, and one blue) must be superimposed
(for example by emission from a black screen or by reflection from a white screen). Each of the three
beams is called a component of that color, and each of them can have an arbitrary intensity, from fully off
to fully on, in the mixture.
The RGB color model is additive in the sense that if light beams of differing color (frequency) are
superposed in space, their light spectra add up, wavelength for wavelength, to make up a resulting total
spectrum.[5][6] This is essentially opposite to the subtractive color model, particularly the CMY color
model, which applies to paints, inks, dyes and other substances whose color depends on reflecting certain
components (frequencies) of the light under which we see them. In the additive model, if the resulting
spectrum, e.g. of superposing three colors, is flat, white color is perceived by the human eye upon direct
incidence on the retina. This is in stark contrast to the subtractive model, where the perceived resulting
spectrum is what reflecting surfaces, such as dyed surfaces, emit. Simply put, a dye filters out all colors
but its own; two blended dyes filter out all colors but the common color component between them, e.g.
green as the common component between yellow and cyan, red as the common component between
magenta and yellow, and blue-violet as the common component between magenta and cyan. It so happens
that there is no color component among magenta, cyan and yellow, thus rendering a spectrum of zero
intensity, black.
Zero intensity for each component gives the darkest color (no light, considered the black), and full
intensity of each gives a white; the quality of this white depends on the nature of the primary light
sources, but if they are properly balanced, the result is a neutral white matching the system's white point.
When the intensities for all the components are the same, the result is a shade of gray, darker or lighter
depending on the intensity. When the intensities are different, the result is a colorized hue, more or
less saturated depending on the difference of the strongest and weakest of the intensities of the primary
colors employed.
When one of the components has the strongest intensity, the color is a hue near this primary color (red-
ish, green-ish, or blue-ish), and when two components have the same strongest intensity, then the color is
a hue of a secondary color (a shade of cyan, magenta or yellow). A secondary color is formed by the sum
of two primary colors of equal intensity: cyan is green+blue, magenta is blue+red, and yellow is
red+green. Every secondary color is the complement of one primary color: cyan complements red,
magenta complements green, and yellow complements blue. When all the primary colors are mixed in
equal intensities, the result is white.
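These mixing rules are easy to verify numerically. The short sketch below (plain Python, treating each channel as an intensity from 0 to 255) adds primaries channel by channel and prints the resulting secondary colors and white.

```python
# Sketch: additive mixing of RGB primaries at full intensity.
def mix(*colors):
    # Add the components channel by channel, clipping at the maximum intensity 255.
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

print(mix(GREEN, BLUE))        # (0, 255, 255)   -> cyan
print(mix(BLUE, RED))          # (255, 0, 255)   -> magenta
print(mix(RED, GREEN))         # (255, 255, 0)   -> yellow
print(mix(RED, GREEN, BLUE))   # (255, 255, 255) -> white
```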
The RGB color model itself does not define what is meant by red, green, and blue colorimetrically, and
so the results of mixing them are not specified as absolute, but relative to the primary colors. When the
exact chromaticities of the red, green, and blue primaries are defined, the color model then becomes
an absolute color space, such as sRGB or Adobe RGB; see RGB color space for more details.
The choice of primary colors is related to the physiology of the human eye; good primaries are stimuli
that maximize the difference between the responses of the cone cells of the human retina to light of
different wavelengths, and that thereby make a large color triangle.[7]
The normal three kinds of light-sensitive photoreceptor cells in the human eye (cone cells) respond most
to yellow (long wavelength or L), green (medium or M), and violet (short or S) light (peak wavelengths
near 570 nm, 540 nm and 440 nm, respectively[7]). The difference in the signals received from the three
kinds allows the brain to differentiate a wide gamut of different colors, while being most sensitive
(overall) to yellowish-green light and to differences between hues in the green-to-orange region.
As an example, suppose that light in the orange range of wavelengths (approximately 577 nm to 597 nm)
enters the eye and strikes the retina. Light of these wavelengths would activate both the medium and long
wavelength cones of the retina, but not equally—the long-wavelength cells will respond more. The
difference in the response can be detected by the brain, and this difference is the basis of our perception
of orange. Thus, the orange appearance of an object results from light from the object entering our eye
and stimulating the different cones simultaneously but to different degrees.
Use of the three primary colors is not sufficient to reproduce all colors; only colors within the color
triangle defined by the chromaticities of the primaries can be reproduced by additive mixing of non-
negative amounts of those colors of light.[7]
History of RGB color model theory and usage
The RGB color model is based on the Young–Helmholtz theory of trichromatic color vision, developed
by Thomas Young and Hermann von Helmholtz in the early to mid-nineteenth century, and on James
Clerk Maxwell's color triangle that elaborated that theory (c. 1860).
Early color photographs
The first permanent color photograph, taken by Thomas Sutton in 1861 using James Clerk
Maxwell's proposed method of three filters, specifically red, green, and violet-blue.
A photograph of Mohammed Alim Khan (1880–1944), Emir of Bukhara, taken in 1911 by Sergey
Prokudin-Gorsky using three exposures with blue, green, and red filters.
Photography
The first experiments with RGB in early color photography were made in 1861 by Maxwell himself, and
involved the process of combining three color-filtered separate takes.[1] To reproduce the color
photograph, three matching projections over a screen in a dark room were necessary.
The additive RGB model and variants such as orange–green–violet were also used in the Autochrome
Lumière color plates and other screen-plate technologies such as the Joly color screen and the Paget
process in the early twentieth century. Color photography by taking three separate plates was used by
other pioneers, such as the Russian Sergey Prokudin-Gorsky in the period 1909 through 1915.[8] Such
methods lasted until about 1960 using the expensive and extremely complex tri-color
carbro Autotype process.[9]
When employed, the reproduction of prints from three-plate photos was done by dyes or pigments using
the complementary CMY model, by simply using the negative plates of the filtered takes: reverse red
gives the cyan plate, and so on.
Television
Before the development of practical electronic TV, there were patents on mechanically scanned color
systems as early as 1889 in Russia. The color TV pioneer John Logie Baird demonstrated the world's first
RGB color transmission in 1928, and also the world's first color broadcast in 1938, in London. In his
experiments, scanning and display were done mechanically by spinning colorized wheels. [10][11]
The Columbia Broadcasting System (CBS) began an experimental RGB field-sequential color system in
1940. Images were scanned electrically, but the system still used a moving part: the transparent RGB
color wheel rotating at above 1,200 rpm in synchronism with the vertical scan. The camera and
the cathode-ray tube (CRT) were both monochromatic. Color was provided by color wheels in the camera
and the receiver.[12][13][14] More recently, color wheels have been used in field-sequential projection TV
receivers based on the Texas Instruments monochrome DLP imager.
The modern RGB shadow mask technology for color CRT displays was patented by Werner Flechsig in
Germany in 1938.[15]
Personal computers
Personal computers of the late 1970s and early 1980s, such as the Apple II and VIC-20, used composite
video. The Commodore 64 and the Atari 8-bit family used S-Video derivatives. IBM introduced a 16-
color scheme (four bits—one bit each for red, green, blue, and intensity) with the Color Graphics
Adapter (CGA) for its IBM PC in 1981, later improved with the Enhanced Graphics Adapter (EGA) in
1984. The first manufacturer of a truecolor graphics card for PCs (the TARGA) was Truevision in 1987,
but it was not until the arrival of the Video Graphics Array (VGA) in 1987 that RGB became popular,
mainly due to the analog signals in the connection between the adapter and the monitor which allowed a
very wide range of RGB colors. In practice, truecolor took a few more years to arrive: the original VGA cards were palette-driven just like EGA, although with more freedom than EGA. Because the VGA connectors were analog, however, later variants of VGA (made by various manufacturers under the informal name Super VGA) were eventually able to add true color. In 1992, magazines heavily advertised true-color Super VGA hardware.
RGB devices
RGB and displays
[Figure: cutaway rendering of a color CRT: 1. electron guns; 2. electron beams; 3. focusing coils; 4. deflection coils; 5. anode connection; 6. mask for separating the beams for the red, green, and blue parts of the displayed image; 7. phosphor layer with red, green, and blue zones; 8. close-up of the phosphor-coated inner side of the screen.]
[Figure: RGB sub-pixels in an LCD TV (on the right: an orange and a blue color; on the left: a close-up).]
One common application of the RGB color model is the display of colors on a cathode-ray
tube (CRT), liquid-crystal display (LCD), plasma display, or organic light emitting diode (OLED) display
such as a television, a computer's monitor, or a large scale screen. Each pixel on the screen is built by
driving three small and very close but still separated RGB light sources. At common viewing distance, the
separate sources are indistinguishable, which tricks the eye into seeing a given solid color. All the pixels arranged together on the rectangular screen surface form the color image.
During digital image processing each pixel can be represented in the computer memory or interface
hardware (for example, a graphics card) as binary values for the red, green, and blue color components.
When properly managed, these values are converted into intensities or voltages via gamma correction to
correct the inherent nonlinearity of some devices, such that the intended intensities are reproduced on the
display.
The Quattron released by Sharp uses RGB color and adds yellow as a sub-pixel, supposedly allowing an
increase in the number of available colors.
Video electronics
Main article: Component_video § RGB_analog_component_video
RGB is also the term referring to a type of component video signal used in the video electronics industry.
It consists of three signals—red, green, and blue—carried on three separate cables/pins. RGB signal
formats are often based on modified versions of the RS-170 and RS-343 standards for monochrome
video. This type of video signal is widely used in Europe since it is the best quality signal that can be
carried on the standard SCART connector.[16][17] This signal is known as RGBS (4 BNC/RCA terminated
cables exist as well), but it is directly compatible with RGBHV used for computer monitors (usually
carried on 15-pin cables terminated with 15-pin D-sub or 5 BNC connectors), which carries separate
horizontal and vertical sync signals.
Outside Europe, RGB is not very popular as a video signal format; S-Video takes that spot in most non-
European regions. However, almost all computer monitors around the world use RGB.
Video framebuffer
A framebuffer is a digital device for computers which stores data in the so-called video
memory (comprising an array of Video RAM or similar chips). This data goes either to three digital-to-
analog converters (DACs) (for analog monitors), one per primary color or directly to digital monitors.
Driven by software, the CPU (or other specialized chips) writes the appropriate bytes into the video
memory to define the image. Modern systems encode pixel color values by devoting eight bits to each of
the R, G, and B components. RGB information can be either carried directly by the pixel bits themselves
or provided by a separate color look-up table (CLUT) if indexed color graphic modes are used.
A CLUT is a specialized RAM that stores R, G, and B values that define specific colors. Each color has
its own address (index)—consider it as a descriptive reference number that provides that specific color
when the image needs it. The content of the CLUT is much like a palette of colors. Image data that uses
indexed color specifies addresses within the CLUT to provide the required R, G, and B values for each
specific pixel, one pixel at a time. Of course, before displaying, the CLUT has to be loaded with R, G,
and B values that define the palette of colors required for each image to be rendered. Some video
applications store such palettes in PAL files (Age of Empires game, for example, uses over half-a-
dozen[18]) and can combine CLUTs on screen.
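As a rough sketch of the indexed-color lookup just described (the palette and index values below are made up for illustration, not tied to any particular file format):

```python
# Minimal sketch of indexed-color lookup through a CLUT-style palette.
# The palette and indices below are made-up illustrative values.

palette = [            # CLUT: each entry is an (R, G, B) triple, 0-255
    (0, 0, 0),         # index 0: black
    (255, 0, 0),       # index 1: red
    (0, 255, 0),       # index 2: green
    (0, 0, 255),       # index 3: blue
]

indexed_image = [      # 2x3 image stored as palette indices (1 byte each)
    [0, 1, 2],
    [3, 1, 0],
]

# Expand indices into full RGB triplets, one pixel at a time
rgb_image = [[palette[i] for i in row] for row in indexed_image]
print(rgb_image[0][1])   # -> (255, 0, 0)
```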
RGB24 and RGB32
This indirect scheme restricts the number of available colors in an image to the number of CLUT entries (typically 256), although each entry in an RGB24 CLUT holds 8 bits for each of the R, G, and B primaries (256 codes per channel), so the entries themselves can be chosen from among 16,777,216 possible colors. The advantage is that an indexed-color image file can be significantly smaller than one storing a full 8 bits per pixel for each primary.
Modern storage, however, is far less costly, greatly reducing the need to minimize image file size. By
using an appropriate combination of red, green, and blue intensities, many colors can be displayed.
Current typical display adapters use up to 24 bits of information for each pixel: 8 bits per component multiplied by three components (see the Numeric representations section below). With this system, 16,777,216 (256³ or 2²⁴) discrete combinations of R, G, and B values are allowed, each primary taking an 8-bit value from 0 to 255, providing millions of different (though not necessarily distinguishable) hue, saturation, and lightness shades. Increased shading has been implemented in various ways; some formats, such as .png and .tga files among others, use a fourth greyscale channel as a masking layer, a layout often called RGB32.
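As an illustrative sketch (not any particular graphics API, and ignoring byte-order variations such as BGRA used by real hardware), the 24-bit and 32-bit layouts described above can be reproduced by packing the 8-bit components into a single integer:

```python
def pack_rgb24(r, g, b):
    """Pack three 8-bit components into one 24-bit integer (0xRRGGBB)."""
    return (r << 16) | (g << 8) | b

def pack_rgba32(r, g, b, a=255):
    """Pack four 8-bit components into one 32-bit integer (0xRRGGBBAA)."""
    return (pack_rgb24(r, g, b) << 8) | a

white = pack_rgb24(255, 255, 255)
print(hex(white))                      # 0xffffff = 16,777,215, the largest of 2**24 codes
print(hex(pack_rgba32(255, 128, 0)))   # an orange with full opacity: 0xff8000ff
```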
For images with a modest range of brightnesses from the darkest to the lightest, eight bits per primary
color provides good-quality images, but extreme images require more bits per primary color as well as the
advanced display technology. For more information see High Dynamic Range (HDR) imaging.
Nonlinearity
Main article: Gamma correction
In classic CRT devices, the brightness of a given point over the fluorescent screen due to the impact of
accelerated electrons is not proportional to the voltages applied to the electron gun control grids, but to an
expansive function of that voltage. The amount of this deviation is known as its gamma value (γ), the exponent of a power-law function, which closely describes this behavior. A linear response is given by a
gamma value of 1.0, but actual CRT nonlinearities have a gamma value around 2.0 to 2.5.
Similarly, the intensity of the output on TV and computer display devices is not directly proportional to
the R, G, and B applied electric signals (or file data values which drive them through digital-to-analog
converters). On a typical standard 2.2-gamma CRT display, an input intensity RGB value of
(0.5, 0.5, 0.5) only outputs about 22% of full brightness (1.0, 1.0, 1.0), instead of 50%.[19] To obtain the
correct response, a gamma correction is used in encoding the image data, and possibly further corrections
as part of the color calibration process of the device. Gamma affects black-and-white TV as well as color.
In standard color TV, broadcast signals are gamma corrected.
RGB and cameras
The Bayer filter arrangement of color filters on the pixel array of a digital image sensor
In color television and video cameras manufactured before the 1990s, the incoming light was separated
by prisms and filters into the three RGB primary colors feeding each color into a separate video camera
tube (or pickup tube). These tubes are a type of cathode-ray tube, not to be confused with that of CRT
displays.
With the arrival of commercially viable charge-coupled device (CCD) technology in the 1980s, first, the
pickup tubes were replaced with this kind of sensor. Later, higher scale integration electronics was
applied (mainly by Sony), simplifying and even removing the intermediate optics, thereby reducing the
size of home video cameras and eventually leading to the development of full camcorders.
Current webcams and mobile phones with cameras are the most miniaturized commercial forms of such
technology.
Photographic digital cameras that use a CMOS or CCD image sensor often operate with some variation of
the RGB model. In a Bayer filter arrangement, green is given twice as many detectors as red and blue
(ratio 1:2:1) in order to achieve higher luminance resolution than chrominance resolution. The sensor has
a grid of red, green, and blue detectors arranged so that the first row is RGRGRGRG, the next is
GBGBGBGB, and that sequence is repeated in subsequent rows. For every channel, missing pixels are
obtained by interpolation in the demosaicing process to build up the complete image. Other processing steps are also applied to map the camera's RGB measurements into a standard RGB color space such as sRGB.
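The RGGB layout described above can be sketched in a few lines (illustrative only; real demosaicing pipelines are considerably more involved):

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an H x W x 3 RGB image onto a single-channel RGGB Bayer mosaic."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R on even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G on even rows, odd columns
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G on odd rows, even columns
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B on odd rows, odd columns
    return mosaic

rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
mosaic = bayer_mosaic(rgb)
# Green occupies half of the sensor sites, red and blue a quarter each (ratio 1:2:1)
```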
RGB and scanners
In computing, an image scanner is a device that optically scans images (printed text, handwriting, or an object) and converts them to a digital image, which is transferred to a computer. Flatbed, drum, and film scanners, among other types, exist, and most of them support RGB color. They can be considered the
successors of early telephotography input devices, which were able to send consecutive scan
lines as analog amplitude modulation signals through standard telephonic lines to appropriate receivers;
such systems were in use in the press from the 1920s to the mid-1990s. Color telephotographs were sent as
three separated RGB filtered images consecutively.
Currently available scanners typically use CCD or contact image sensor (CIS) as the image sensor,
whereas older drum scanners use a photomultiplier tube as the image sensor. Early color film scanners
used a halogen lamp and a three-color filter wheel, so three exposures were needed to scan a single color
image. Due to heating problems, the worst of them being the potential destruction of the scanned film,
this technology was later replaced by non-heating light sources such as color LEDs.
Numeric representations
A color in the RGB model is described by how much of each of red, green, and blue is included, and each component value can be quantified in several different ways:
1. From 0 to 1, with any fractional value in between. This representation is used in theoretical analyses and in systems that use floating-point representations.
2. As a percentage, from 0% to 100%.
3. In computers, the component values are often stored as unsigned integers in the range 0 to 255, the range that a single 8-bit byte can offer. These are often represented as either decimal or hexadecimal numbers.
4. High-end digital image equipment is often able to deal with larger integer ranges for each primary color, such as 0..1023 (10 bits), 0..65535 (16 bits) or even larger, by extending the 24 bits (three 8-bit values) to 32-bit, 48-bit, or 64-bit units (more or less independent of the particular computer's word size).
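A small sketch tying these representations together (the helper function is hypothetical, not part of any standard library):

```python
def rgb_representations(r8, g8, b8):
    """Show one color as 8-bit integers, fractions, percentages, and a hex code."""
    as_float = tuple(c / 255 for c in (r8, g8, b8))        # 0.0 .. 1.0
    as_percent = tuple(100 * c for c in as_float)          # 0% .. 100%
    as_hex = "#{:02X}{:02X}{:02X}".format(r8, g8, b8)      # e.g. "#FF8000"
    return as_float, as_percent, as_hex

print(rgb_representations(255, 128, 0))
# ((1.0, 0.5019..., 0.0), (100.0, 50.19..., 0.0), '#FF8000')
```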
Geometric representation
See also: RGB color spaces
[Figure: the RGB color model mapped to a cube. The horizontal x-axis shows red values increasing to the left, the y-axis shows blue increasing to the lower right, and the vertical z-axis shows green increasing towards the top. The origin, black, is the vertex hidden from view.]
Since colors are usually defined by three components, not only in the RGB model, but also in other color
models such as CIELAB and Y'UV, among others, a three-dimensional volume can be described by
treating the component values as ordinary Cartesian coordinates in a Euclidean space. For the RGB
model, this is represented by a cube using non-negative values within a 0–1 range, assigning black to the
origin at the vertex (0, 0, 0), and with increasing intensity values running along the three axes up to white
at the vertex (1, 1, 1), diagonally opposite black.
An RGB triplet (r,g,b) represents the three-dimensional coordinate of the point of the given color within
the cube or its faces or along its edges. This approach allows computations of the color similarity of two
given RGB colors by simply calculating the distance between them: the shorter the distance, the higher
the similarity. Out-of-gamut computations can also be performed this way.
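Treating RGB triplets as Cartesian coordinates, the similarity computation mentioned above reduces to a Euclidean distance; the sketch below uses the 0–1 cube (a crude perceptual proxy, since perceptually uniform spaces such as L*a*b* give better estimates):

```python
import math

def rgb_distance(c1, c2):
    """Euclidean distance between two RGB triplets in the unit color cube."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

red    = (1.0, 0.0, 0.0)
orange = (1.0, 0.5, 0.0)
blue   = (0.0, 0.0, 1.0)
print(rgb_distance(red, orange))  # 0.5    -> more similar
print(rgb_distance(red, blue))    # ~1.414 -> less similar
```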
Attribute Description
L* Luminance or brightness of the image. Values are in the range [0, 100], where 0 specifies black and 100 specifies white.
a* Amount of red or green tones in the image. A large positive a* value corresponds to red/magenta. A large negative a* value corresponds to green. Although there is no single range for a*, values commonly fall in the range [-100, 100] or [-128, 127].
b* Amount of yellow or blue tones in the image. A large positive b* value corresponds to yellow. A large negative b* value corresponds to blue. Although there is no single range for b*, values commonly fall in the range [-100, 100] or [-128, 127].
Device-independent color spaces include the effect of the illumination source, called the reference white
point. The source imparts a color hue to the raw image data according to the color temperature of the
illuminant. For example, sunlight during sunrise or sunset imparts a yellow hue to an image, whereas
sunlight around noontime imparts a blue hue.
Use the rgb2xyz and xyz2rgb functions to convert between the RGB and XYZ color spaces. Use
the rgb2lab and lab2rgb functions to convert between the RGB and L*a*b* color spaces.
The toolbox supports several related color space specifications that are better suited to some purposes
than XYZ. For more information see Device-Independent Color Spaces.
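The MATLAB functions named above have open-source counterparts; for example, scikit-image provides rgb2xyz/xyz2rgb and rgb2lab/lab2rgb. A minimal sketch, assuming scikit-image is installed and that the sRGB input is given as floats in [0, 1]:

```python
import numpy as np
from skimage import color

# A single sRGB pixel given as a 1 x 1 x 3 float image in [0, 1]
srgb = np.array([[[1.0, 0.5, 0.0]]])   # an orange

xyz = color.rgb2xyz(srgb)    # device-independent CIE XYZ tristimulus values
lab = color.rgb2lab(srgb)    # CIE L*a*b*: L* in [0, 100], a* and b* signed

print(xyz[0, 0])             # X, Y, Z
print(lab[0, 0])             # large positive a* and b* (red and yellow tones)

back = color.lab2rgb(lab)    # round-trip back to sRGB
```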
2.9.3 HSV/HSL
The HSV (Hue, Saturation, Value) color space corresponds better to how people experience color than the
RGB color space does. For example, this color space is often used by people who are selecting colors,
such as paint or ink color, from a color wheel or palette.
Attribute Description
H Hue, which corresponds to the color's position on a color wheel. H is in the range [0, 1]. As H increases, colors transition from red to orange, yellow, green, cyan, blue, magenta, and finally back to red. Both 0 and 1 indicate red.
S Saturation, which is the amount of hue or departure from neutral. S is in the range [0, 1]. As S increases, colors vary from unsaturated (shades of gray) to fully saturated (no white component).
V Value, which is the maximum value among the red, green, and blue components of a specific color. V is in the range [0, 1]. As V increases, the corresponding colors become increasingly brighter.
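Python's standard library exposes this model directly through colorsys, which works on floats in [0, 1], matching the ranges in the table above (a minimal sketch):

```python
import colorsys

r, g, b = 1.0, 0.5, 0.0                  # an orange, as RGB fractions
h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(h, s, v)                           # h ~ 0.083 (30 degrees of 360), s = 1.0, v = 1.0

# Round-trip back to RGB
print(colorsys.hsv_to_rgb(h, s, v))      # (1.0, 0.5, 0.0)
```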
2.9.4 LAB
The L*a*b* color space provides a more perceptually uniform color space than the XYZ model. Colors in
the L*a*b* color space can exist outside the RGB gamut (the valid set of RGB colors). For example,
when you convert the L*a*b* value [100, 100, 100] to the RGB color space, the returned value is
[1.7682, 0.5746, 0.1940], which is not a valid RGB color. For more information, see Determine If L*a*b*
Value Is in RGB Gamut.
YCbCr
The YCbCr color space is widely used for digital video. In this format, luminance information is stored as
a single component (Y) and chrominance information is stored as two color-difference components
(Cb and Cr). Cb and Cr represent the difference between a reference value and the blue or red component,
respectively. (YUV, another color space widely used for digital video, is very similar to YCbCr but not
identical.)
Attribute Description
Y Luminance or brightness of the image.
Colors increase in brightness as Y increases.
Cb Chrominance value that indicates the difference between the blue component and
a reference value.
Cr Chrominance value that indicates the difference between the red component and
a reference value.
The range of numeric values depends on the data type of the image. YCbCr does not use the full range of
the image data type so that the video stream can include additional (non-image) information.
For single or double arrays, Y is in the range [16/255, 235/255] and Cb and Cr are in the range
[16/255, 240/255].
For uint8 arrays, Y is in the range [16, 235] and Cb and Cr are in the range [16, 240].
For uint16, Y is in the range [4112, 60395] and Cb and Cr are in the range [4112, 61680].
Use the rgb2ycbcr and ycbcr2rgb functions to convert between the RGB and YCbCr color spaces.
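A sketch of the conversion for 8-bit data using the commonly cited BT.601 scaling (the exact constants and rounding used by rgb2ycbcr may differ slightly):

```python
def rgb_to_ycbcr_uint8(r, g, b):
    """Convert R, G, B in [0, 1] to 8-bit Y, Cb, Cr using BT.601-style scaling.

    Y lands in [16, 235]; Cb and Cr land in [16, 240], leaving headroom for
    non-image information as described above.
    """
    y  = 16  +  65.481 * r + 128.553 * g +  24.966 * b
    cb = 128 -  37.797 * r -  74.203 * g + 112.000 * b
    cr = 128 + 112.000 * r -  93.786 * g -  18.214 * b
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr_uint8(0.0, 0.0, 0.0))  # black -> (16, 128, 128)
print(rgb_to_ycbcr_uint8(1.0, 1.0, 1.0))  # white -> (235, 128, 128)
```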
YIQ
The National Television Systems Committee (NTSC) defines a color space known as YIQ. This color
space is used in televisions in the United States. This color space separates grayscale information from
color data, so the same signal can be used for both color and black and white television sets.
Attribute Description
Y Luma, or brightness of the image. Values are in the range [0, 1], where 0 specifies black and
1 specifies white. Colors increase in brightness as Y increases.
I In-phase, which is approximately the amount of blue or orange tones in the image.
I is in the range [-0.5959, 0.5959], where negative numbers indicate blue tones and
positive numbers indicate orange tones. As the magnitude of I increases,
the saturation of the color increases.
Q Quadrature, which is approximately the amount of green or purple tones in the image.
Q is in the range [-0.5229, 0.5229], where negative numbers indicate green tones and
positive numbers indicate purple tones. As the magnitude of Q increases,
the saturation of the color increases.
Use the rgb2ntsc and ntsc2rgb functions to convert between the RGB and YIQ color spaces.
Because luminance is one of the components of the NTSC format, the RGB to NTSC conversion is also
useful for isolating the gray level information in an image. In fact, the toolbox
functions rgb2gray and ind2gray use the rgb2ntsc function to extract the grayscale information from a
color image.
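Python's colorsys module implements an RGB/YIQ conversion of this kind; its matrix constants differ slightly from MATLAB's rgb2ntsc, so treat the numbers as illustrative:

```python
import colorsys

r, g, b = 0.2, 0.6, 0.4                 # a greenish color, RGB in [0, 1]
y, i, q = colorsys.rgb_to_yiq(r, g, b)
print(y)                                 # luma: the grayscale value of the pixel
print(i, q)                              # chrominance: negative I and Q lean blue/green

# Dropping I and Q and converting back gives the grayscale version of the color
print(colorsys.yiq_to_rgb(y, 0.0, 0.0))
```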
2.10.1 Bézier Curve
Invention
The mathematical basis for Bézier curves—the Bernstein polynomials—was established in 1912, but
the polynomials were not applied to graphics until some 50 years later when mathematician Paul de
Casteljau in 1959 developed de Casteljau's algorithm, a numerically stable method for evaluating the
curves, and became the first to apply them to computer-aided design at French automaker Citroën.[6] Yet,
de Casteljau's method was patented in France but not published until the 1980s [7] while the Bézier
polynomials were widely publicised in the 1960s by the French engineer Pierre Bézier, who discovered
them independently and used them to design automobile bodies at Renault.
Specific cases
A Bézier curve is defined by a set of control points P0 through Pn, where n is called the order of the curve
(n = 1 for linear, 2 for quadratic, 3 for cubic, etc.). The first and last control points are always the
endpoints of the curve; however, the intermediate control points generally do not lie on the curve. The
sums in the following sections are to be understood as affine combinations – that is, the coefficients sum
to 1.
Linear Bézier curves
Given distinct points P0 and P1, a linear Bézier curve is simply a line between those two points. The curve is given by
B(t) = P0 + t(P1 - P0) = (1 - t)P0 + t P1, 0 ≤ t ≤ 1,
and is equivalent to linear interpolation. The quantity P1 - P0 represents the displacement vector from the start point to the end point.
Quadratic Bézier curves
Quadratic Béziers in string art: The end points (•) and control point (×)
define the quadratic Bézier curve (⋯).
A quadratic Bézier curve is the path traced by the function B(t), given points P0, P1, and P2:
B(t) = (1 - t)[(1 - t)P0 + t P1] + t[(1 - t)P1 + t P2], 0 ≤ t ≤ 1,
which can be interpreted as the linear interpolant of corresponding points on the linear Bézier curves from P0 to P1 and from P1 to P2 respectively. Rearranging the preceding equation yields:
B(t) = (1 - t)²P0 + 2(1 - t)t P1 + t²P2, 0 ≤ t ≤ 1.
This can be written in a way that highlights the symmetry with respect to P1:
B(t) = P1 + (1 - t)²(P0 - P1) + t²(P2 - P1),
which immediately gives the derivative of the Bézier curve with respect to t:
B'(t) = 2(1 - t)(P1 - P0) + 2t(P2 - P1),
from which it can be concluded that the tangents to the curve at P0 and P2 intersect at P1. As t increases from 0 to 1, the curve departs from P0 in the direction of P1, then bends to arrive at P2 from the direction of P1.
The second derivative of the Bézier curve with respect to t is
B''(t) = 2(P2 - 2P1 + P0).
For some choices of P1 and P2 the curve may intersect itself, or contain a cusp.
Cubic Bézier curves
Four points P0, P1, P2 and P3 define a cubic Bézier curve:
B(t) = (1 - t)³P0 + 3(1 - t)²t P1 + 3(1 - t)t²P2 + t³P3, 0 ≤ t ≤ 1.
Any series of 4 distinct points can be converted to a cubic Bézier curve that goes through all 4 points in order. Given the starting and ending point of some cubic Bézier curve, and the points along the curve corresponding to t = 1/3 and t = 2/3, the control points for the original Bézier curve can be recovered.[8]
The derivative of the cubic Bézier curve with respect to t is
B'(t) = 3(1 - t)²(P1 - P0) + 6(1 - t)t (P2 - P1) + 3t²(P3 - P2).
General definition
Bézier curves can be defined for any degree n. Given control points P0, P1, ..., Pn, the curve is
B(t) = Σ (i = 0 to n) C(n, i) (1 - t)^(n - i) t^i Pi, 0 ≤ t ≤ 1,
where C(n, i) = n! / (i!(n - i)!) are the binomial coefficients.
Recursive definition
A recursive definition for the Bézier curve of degree n expresses it as a point-to-point linear combination (linear interpolation) of a pair of corresponding points in two Bézier curves of degree n - 1.
Let B[P0, P1, ..., Pk] denote the Bézier curve determined by any selection of points P0, P1, ..., Pk. Then to start,
B[P0](t) = P0,
B[P0, P1, ..., Pn](t) = (1 - t) B[P0, P1, ..., Pn-1](t) + t B[P1, P2, ..., Pn](t).
Terminology
Some terminology is associated with these parametric curves. We have
B(t) = Σ (i = 0 to n) b(i,n)(t) Pi, 0 ≤ t ≤ 1,
where the polynomials b(i,n)(t) = C(n, i) t^i (1 - t)^(n - i), i = 0, ..., n, are known as the Bernstein basis polynomials of degree n.
The points Pi are called control points for the Bézier curve. The polygon formed by connecting the Bézier points with lines, starting with P0 and finishing with Pn, is called the Bézier polygon (or control polygon). The convex hull of the Bézier polygon contains the Bézier curve.
Polynomial form
Sometimes it is desirable to express the Bézier curve as a polynomial instead of a sum of less
straightforward Bernstein polynomials. Application of the binomial theorem to the definition of the curve
followed by some rearrangement will yield
B(t) = Σ (j = 0 to n) t^j Cj,
where
Cj = (n! / (n - j)!) Σ (i = 0 to j) ((-1)^(i + j) Pi) / (i!(j - i)!).
This could be practical if the coefficients Cj can be computed prior to many evaluations of B(t); however, one should use caution as high-order curves may lack numeric stability (de Casteljau's algorithm should be used if this occurs). Note that the empty product is 1.
Properties
[Figure: a cubic Bézier curve (yellow) can be made identical to a quadratic one (black) by 1. choosing the same end points, and 2. placing its 2 middle control points (yellow circles) 2/3 along the line segments from the end points to the quadratic curve's middle control point (black rectangle).]
The curve begins at P0 and ends at Pn; this is the so-called endpoint interpolation property.
The curve is a line if and only if all the control points are collinear.
The start and end of the curve is tangent to the first and last section of the Bézier polygon,
respectively.
A curve can be split at any point into two subcurves, or into arbitrarily many subcurves, each of
which is also a Bézier curve.
Some curves that seem simple, such as the circle, cannot be described exactly by a Bézier or piecewise Bézier curve, though a four-piece cubic Bézier curve can approximate a circle (see composite Bézier curve), with a maximum radial error of less than one part in a thousand, when each inner control point (or offline point) lies at the distance 4(√2 - 1)/3 ≈ 0.5523 horizontally or vertically from an outer control point on a unit circle. More generally, an n-piece cubic Bézier curve can approximate a circle when each inner control point lies at the distance (4/3) tan(π/(2n)) from an outer control point on a unit circle.
Every quadratic Bézier curve is also a cubic Bézier curve, and more generally, every degree n Bézier curve is also a degree m curve for any m > n. In detail, a degree n curve with control points P0, ..., Pn is equivalent (including the parametrization) to the degree n + 1 curve with control points P'0, ..., P'(n+1), where P'k = (k/(n + 1)) P(k-1) + (1 - k/(n + 1)) Pk.
A quadratic Bézier curve is also a segment of a parabola. As a parabola is a conic section, some sources
refer to quadratic Béziers as "conic arcs". [11] With reference to the figure on the right, the important
features of the parabola can be derived as follows:[12]
1. Tangents to the parabola at the endpoints of the curve (A and B) intersect at its control point (C).
2. If D is the midpoint of AB, the tangent to the curve which is perpendicular to CD (dashed cyan
line) defines its vertex (V). Its axis of symmetry (dash-dot cyan) passes through V and is perpendicular to
the tangent.
3. E is either point on the curve with a tangent at 45° to CD (dashed green). If G is the intersection of
this tangent and the axis, the line passing through G and perpendicular to CD is the directrix (solid green).
4. The focus (F) is at the intersection of the axis and a line passing through E and perpendicular to
CD (dotted yellow). The latus rectum is the line segment within the curve (solid yellow).
Derivative
The derivative for a curve of order n is
B'(t) = n Σ (i = 0 to n - 1) b(i,n-1)(t) (P(i+1) - Pi).
Quadratic curves
For quadratic Bézier curves one can construct intermediate points Q0 and Q1 such that as t varies from 0 to 1:
Q0(t) = (1 - t)P0 + t P1 traces the segment from P0 to P1,
Q1(t) = (1 - t)P1 + t P2 traces the segment from P1 to P2, and
B(t) = (1 - t)Q0(t) + t Q1(t) traces the quadratic Bézier curve.
Higher-order curves
For higher-order curves one needs correspondingly more intermediate points. For cubic curves one can construct intermediate points Q0, Q1, and Q2 that describe linear Bézier curves, and points R0 and R1 that describe quadratic Bézier curves:
Qi(t) = (1 - t)Pi + t P(i+1),  Ri(t) = (1 - t)Qi(t) + t Q(i+1)(t),  B(t) = (1 - t)R0(t) + t R1(t).
For fourth-order curves one can construct intermediate points Q0, Q1, Q2 and Q3 that describe linear Bézier curves, points R0, R1 and R2 that describe quadratic Bézier curves, and points S0 and S1 that describe cubic Bézier curves; the curve point is then B(t) = (1 - t)S0(t) + t S1(t).
These representations rest on the process used in De Casteljau's algorithm to calculate Bézier curves.[13]
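A minimal sketch of de Casteljau's algorithm as used above: repeated pairwise linear interpolation of the control points until a single point, B(t), remains.

```python
def lerp(p, q, t):
    """Linear interpolation between points p and q (tuples of coordinates)."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def de_casteljau(points, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]."""
    pts = list(points)
    while len(pts) > 1:
        # Each pass builds the next level of intermediate points (Q, R, S, ...)
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

# Cubic example: endpoints (0, 0) and (3, 0) with two interior control points
control = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0)]
print(de_casteljau(control, 0.0))   # (0.0, 0.0): endpoint interpolation
print(de_casteljau(control, 0.5))   # (1.5, 1.5): the curve's midpoint
print(de_casteljau(control, 1.0))   # (3.0, 0.0)
```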
Offsets (or stroking) of Bézier curves
The curve at a fixed offset from a given Bézier curve, called an offset or parallel curve in mathematics
(lying "parallel" to the original curve, like the offset between rails in a railroad track), cannot be exactly
formed by a Bézier curve (except in some trivial cases). In general, the two-sided offset curve of a cubic
Bézier is a 10th-order algebraic curve[14] and more generally for a Bézier of degree n the two-sided offset
curve is an algebraic curve of degree 4n − 2.[15] However, there are heuristic methods that usually give an
adequate approximation for practical purposes.[16]
In the field of vector graphics, painting two symmetrically distanced offset curves is called stroking (the
Bézier curve or in general a path of several Bézier segments). [14] The conversion from offset curves to
filled Bézier contours is of practical importance in converting fonts defined in Metafont, which require
stroking of Bézier curves, to the more widely used PostScript type 1 fonts, which only require (for
efficiency purposes) the mathematically simpler operation of filling a contour defined by (non-self-
intersecting) Bézier curves.[17]
Degree elevation
A Bézier curve of degree n can be converted into a Bézier curve of degree n + 1 with the same shape.
This is useful if software supports Bézier curves only of specific degree. For example, systems that can
only work with cubic Bézier curves can implicitly work with quadratic curves by using their equivalent
cubic representation.
To do degree elevation, we use the equality B(t) = (1 - t)B(t) + t B(t). Each component is multiplied by (1 - t) and t, thus increasing a degree by one, without changing the value. Here is an example of increasing the degree from 2 to 3:
(1 - t)²P0 + 2(1 - t)t P1 + t²P2
= (1 - t)³P0 + (1 - t)²t P0 + 2(1 - t)²t P1 + 2(1 - t)t²P1 + (1 - t)t²P2 + t³P2
= (1 - t)³P0 + 3(1 - t)²t [(1/3)P0 + (2/3)P1] + 3(1 - t)t² [(2/3)P1 + (1/3)P2] + t³P2.
In other words, the new cubic control points are P'0 = P0, P'1 = (1/3)P0 + (2/3)P1, P'2 = (2/3)P1 + (1/3)P2, and P'3 = P2. For arbitrary degree n, introducing arbitrary points P(-1) and P(n+1) (whose coefficients turn out to be zero), the new control points are
P'i = (i/(n + 1)) P(i-1) + (1 - i/(n + 1)) Pi, i = 0, ..., n + 1.
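A sketch of the degree-elevation formula above, returning n + 2 control points that trace exactly the same curve (an illustrative helper, not a library routine):

```python
def elevate_degree(points):
    """Raise a Bezier curve of degree n to degree n + 1 with identical shape.

    New control points: P'_k = (k/(n+1)) * P_{k-1} + (1 - k/(n+1)) * P_k,
    with P'_0 = P_0 and P'_{n+1} = P_n.
    """
    n = len(points) - 1
    elevated = [points[0]]
    for k in range(1, n + 1):
        w = k / (n + 1)
        elevated.append(tuple(w * a + (1 - w) * b
                              for a, b in zip(points[k - 1], points[k])))
    elevated.append(points[-1])
    return elevated

quad = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
print(elevate_degree(quad))
# [(0.0, 0.0), (2/3, 4/3), (4/3, 4/3), (2.0, 0.0)]: the same parabolic arc, now cubic
```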
Rational Bézier curves
The rational Bézier curve adds adjustable weights to provide closer approximations to arbitrary shapes.
The numerator is a weighted Bernstein-form Bézier curve and the denominator is a weighted sum
of Bernstein polynomials. Rational Bézier curves can, among other uses, be used to represent segments
of conic sections exactly, including circular arcs.[19]
Given n + 1 control points P0, ..., Pn with weights w0, ..., wn, the rational Bézier curve can be described by
B(t) = [Σ (i = 0 to n) C(n, i) t^i (1 - t)^(n - i) wi Pi] / [Σ (i = 0 to n) C(n, i) t^i (1 - t)^(n - i) wi],
or simply
B(t) = [Σ (i = 0 to n) b(i,n)(t) wi Pi] / [Σ (i = 0 to n) b(i,n)(t) wi].
The expression can be extended by using number systems besides reals for the weights. In the complex
plane the points {1}, {-1}, and {1} with weights { }, {1}, and { } generate a full circle with
radius one. For curves with points and weights on a circle, the weights can be scaled without changing the
curve's shape.[20] Scaling the central weight of the above curve by 1.35508 gives a more uniform
parameterization.
2.10.2 Ellipsoid, Gamma Correction
Gamma correction, or simply gamma, is a nonlinear operation used to encode and decode luminance or tristimulus values, defined by the power-law expression
Vout = A · Vin^γ,
where the non-negative real input value Vin is raised to the power γ and multiplied by the constant A to get the output value Vout. In the common case of A = 1, inputs and outputs are typically in the range 0–1.
A gamma value γ < 1 is sometimes called an encoding gamma, and the process of encoding with this compressive power-law nonlinearity is called gamma compression; conversely, a gamma value γ > 1 is called a decoding gamma, and the application of the expansive power-law nonlinearity is called gamma expansion.
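A minimal sketch of the power-law relation above, assuming A = 1 and signals normalized to [0, 1]:

```python
def gamma_encode(v, gamma=1/2.2):
    """Gamma compression: V_out = V_in ** gamma with gamma < 1 (A = 1)."""
    return v ** gamma

def gamma_decode(v, gamma=2.2):
    """Gamma expansion: the inverse operation, with gamma > 1."""
    return v ** gamma

linear = 0.218                          # a dark linear intensity
encoded = gamma_encode(linear)          # ~0.5: dark tones get more code values
print(encoded, gamma_decode(encoded))   # round-trips back to ~0.218
```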
Explanation
Gamma encoding of images is used to optimize the usage of bits when encoding an image, or bandwidth
used to transport an image, by taking advantage of the non-linear manner in which humans perceive light
and color.[1] The human perception of brightness (lightness), under common illumination conditions
(neither pitch black nor blindingly bright), follows an approximate power function (which has no relation
to the gamma function), with greater sensitivity to relative differences between darker tones than between
lighter tones, consistent with the Stevens power law for brightness perception. If images are not gamma-
encoded, they allocate too many bits or too much bandwidth to highlights that humans cannot
differentiate, and too few bits or too little bandwidth to shadow values that humans are sensitive to and
would require more bits/bandwidth to maintain the same visual quality. [2][1][3] Gamma encoding
of floating-point images is not required (and may be counterproductive), because the floating-point
format already provides a piecewise linear approximation of a logarithmic curve. [4]
Although gamma encoding was developed originally to compensate for the brightness characteristics
of cathode ray tube (CRT) displays, that is not its main purpose or advantage in modern systems. In CRT
displays, the light intensity varies nonlinearly with the electron-gun voltage. Altering the input signal by
gamma compression can cancel this nonlinearity, such that the output picture has the intended luminance.
However, the gamma characteristics of the display device do not play a factor in the gamma encoding of
images and video. They need gamma encoding to maximize the visual quality of the signal, regardless of
the gamma characteristics of the display device. [1][3] The similarity of CRT physics to the inverse of
gamma encoding needed for video transmission was a combination of coincidence and engineering,
which simplified the electronics in early television sets. [5]
Photographic film has a much greater ability to record fine differences in shade than can be reproduced
on photographic paper. Similarly, most video screens are not capable of displaying the range of
brightnesses (dynamic range) that can be captured by typical electronic cameras. [6] For this reason,
considerable artistic effort is invested in choosing the reduced form in which the original image should be
presented. The gamma correction, or contrast selection, is part of the photographic repertoire used to
adjust the reproduced image.
Analogously, digital cameras record light using electronic sensors that usually respond linearly. In the
process of rendering linear raw data to conventional RGB data (e.g. for storage into JPEG image format),
color space transformations and rendering transformations will be performed. In particular, almost all
standard RGB color spaces and file formats use a non-linear encoding (a gamma compression) of the
intended intensities of the primary colors of the photographic reproduction. In addition, the intended
reproduction is almost always nonlinearly related to the measured scene intensities, via a tone
reproduction nonlinearity.
Generalized gamma
The concept of gamma can be applied to any nonlinear relationship. For the power-law relationship Vout = Vin^γ, the curve on a log–log plot is a straight line, with slope everywhere equal to gamma:
γ = d log(Vout) / d log(Vin).
That is, gamma can be visualized as the slope of the input–output curve when plotted on logarithmic axes.
For a power-law curve, this slope is constant, but the idea can be extended to any type of curve, in which
case gamma (strictly speaking, "point gamma"[7]) is defined as the slope of the curve in any particular
region.
Film photography
Main article: Sensitometry
When a photographic film is exposed to light, the result of the exposure can be represented on a graph
showing log of exposure on the horizontal axis, and density, or negative log of transmittance, on the
vertical axis. For a given film formulation and processing method, this curve is its characteristic or
Hurter–Driffield curve.[8][9] Since both axes use logarithmic units, the slope of the linear section of the
curve is called the gamma of the film. Negative film typically has a gamma less than 1;[9][10] positive film
(slide film, reversal film) typically has a gamma with absolute value greater than 1. [11]
The sRGB color space standard used with most cameras, PCs, and printers does not use a simple power-law nonlinearity as above, but has a decoding gamma value near 2.2 over much of its range. Below a compressed value of 0.04045, or a linear intensity of 0.00313, the curve is linear (encoded value proportional to intensity), so γ = 1 there; over the rest of its range the encoding closely resembles a standard γ = 2.2 power-law curve.
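A sketch of the sRGB encoding just described, with its linear segment below a linear intensity of 0.0031308 and a 2.4 exponent elsewhere (constants as commonly quoted for sRGB; check the standard before relying on them):

```python
def srgb_encode(linear):
    """Convert a linear-light value in [0, 1] to an sRGB-encoded value."""
    if linear <= 0.0031308:
        return 12.92 * linear                    # linear segment near black
    return 1.055 * linear ** (1 / 2.4) - 0.055   # power-law segment

def srgb_decode(encoded):
    """Invert srgb_encode."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4

print(srgb_encode(0.5))    # ~0.735, close to 0.5 ** (1/2.2) ~ 0.73
print(srgb_decode(0.5))    # ~0.214, close to 0.5 ** 2.2 ~ 0.218
```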
Gamma correction in computers is used, for example, to display a gamma = 1.8 Apple picture correctly
on a gamma = 2.2 PC monitor by changing the image gamma. Another usage is equalizing of the
individual color-channel gammas to correct for monitor discrepancies.
Some picture formats allow an image's intended gamma (of transformations between encoded image
samples and light output) to be stored as metadata, facilitating automatic gamma correction.
The PNG specification includes the gAMA chunk for this purpose[14] and with formats such
as JPEG and TIFF the Exif Gamma tag can be used. Some formats can specify the ICC profile which
includes a transfer function.
These features have historically caused problems, especially on the web. For HTML and CSS colors and
JPG or GIF images without attached color profile metadata, popular browsers passed numerical color
values to the display without color management, resulting in substantially different appearance between
devices; however those same browsers sent images with gamma explicitly set in metadata through color
management, and also applied a default gamma to PNG images with metadata omitted. This made it
impossible for PNG images to simultaneously match HTML or untagged JPG colors on every
device.[15] This situation has since improved, as most major browsers now support the gamma setting (or
lack of it).[16][17]
[Figure: two gradient scales, one of gamma-encoded values Vs = 0.0, 0.1, ..., 1.0 and one of linear intensities I = 0.0, 0.1, ..., 1.0, comparing linear encoding with linear intensity.]
On most displays (those with gamma of about 2.2), one can observe that the linear-intensity scale has a
large jump in perceived brightness between the intensity values 0.0 and 0.1, while the steps at the higher
end of the scale are hardly perceptible. The gamma-encoded scale, which has a nonlinearly-increasing
intensity, will show much more even steps in perceived brightness.
A cathode ray tube (CRT), for example, converts a video signal to light in a nonlinear way, because the
electron gun's intensity (brightness) as a function of applied video voltage is nonlinear. The light
intensity I is related to the source voltage Vs according to
I ∝ Vs^γ,
where γ is the Greek letter gamma. For a CRT, the gamma that relates brightness to voltage is usually in
the range 2.35 to 2.55; video look-up tables in computers usually adjust the system gamma to the range
1.8 to 2.2,[1] which is in the region that makes a uniform encoding difference give approximately uniform
perceptual brightness difference, as illustrated in the diagram at the top of this section.
For simplicity, consider the example of a monochrome CRT. In this case, when a video signal of 0.5
(representing a mid-gray) is fed to the display, the intensity or brightness is about 0.22 (a dark gray, about 22% of the intensity of white). Pure black (0.0) and pure white (1.0) are the only shades that are
unaffected by gamma.
To compensate for this effect, the inverse transfer function (gamma correction) is sometimes applied to
the video signal so that the end-to-end response is linear. In other words, the transmitted signal is
deliberately distorted so that, after it has been distorted again by the display device, the viewer sees the
correct brightness. The inverse of the function above is
Vc ∝ Vs^(1/γ),
where Vc is the corrected voltage and Vs is the source voltage, for example, from an image sensor that converts photocharge linearly to a voltage. In our CRT example, 1/γ is 1/2.2 ≈ 0.45.
A color CRT receives three video signals (red, green, and blue) and in general each color has its own
value of gamma, denoted γR, γG or γB. However, in simple display systems, a single value of γ is used for
all three colors.
Other display devices have different values of gamma: for example, a Game Boy Advance display has a
gamma between 3 and 4 depending on lighting conditions. In LCDs such as those on laptop computers,
the relation between the signal voltage Vs and the intensity I is very nonlinear and cannot be described
with gamma value. However, such displays apply a correction onto the signal voltage in order to
approximately get a standard γ = 2.5 behavior. In NTSC television recording, γ = 2.2.
The power-law function, or its inverse, has a slope of infinity at zero. This leads to problems in
converting from and to a gamma colorspace. For this reason most formally defined colorspaces such
as sRGB will define a straight-line segment near zero and add raising x + K (where K is a constant) to a
power so the curve has continuous slope. This straight line does not represent what the CRT does, but
does make the rest of the curve more closely match the effect of ambient light on the CRT. In such
expressions the exponent is not the gamma; for instance, the sRGB function uses a power of 2.4 in it, but
more closely resembles a power-law function with an exponent of 2.2, without a linear portion.
Pixel intensity values in a given image file are commonly gamma-encoded; that is, the binary pixel values are stored in the file in such a way that they represent the light intensity via gamma-compressed values instead of a linear encoding. This is done systematically with digital video files (such as those in a DVD movie), in order to minimize the gamma-decoding step while playing and to maximize image quality for the given storage.
Similarly, pixel values in standard image file formats are usually gamma-compensated, either for sRGB gamma (or equivalent, an approximation of the γ ≈ 2.2 typical of legacy monitors), or according to some
gamma specified by metadata such as an ICC profile. If the encoding gamma does not match the
reproduction system's gamma, further correction may be done, either on display or to create a modified
image file with a different profile.
The rendering software writes gamma-encoded pixel binary values directly to the video memory
(when highcolor/truecolor modes are used) or in the CLUT hardware registers (when indexed
color modes are used) of the display adapter. They drive digital-to-analog converters (DAC) which output
the proportional voltages to the display. For example, when using 24-bit RGB color (8 bits per channel), writing a value of 128 (the rounded midpoint of the 0–255 byte range) to video memory outputs a voltage proportional to ≈ 0.5 to the display, which appears darker than mid-gray because of the monitor's nonlinear behavior. Alternatively, to achieve ≈ 50% displayed intensity, the rendering software can apply a gamma-encoded look-up table and write a value near 187 instead of 128.
Modern display adapters have dedicated calibrating CLUTs, which can be loaded once with the
appropriate gamma-correction look-up table in order to modify the encoded signals digitally before the
DACs that output voltages to the monitor. [19] Setting up these tables to be correct is called hardware
calibration.[20]
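A sketch of such a gamma-correction look-up table for one 8-bit channel and an assumed display gamma of 2.2 (illustrative; real calibration LUTs are built from measured display responses):

```python
# Build a 256-entry LUT that pre-compensates for a display gamma of 2.2.
display_gamma = 2.2
lut = [round(255 * (v / 255) ** (1 / display_gamma)) for v in range(256)]

print(lut[0], lut[255])   # the endpoints stay at 0 and 255
print(lut[128])           # ~186: the value written so mid-gray displays near 50% intensity
```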
Some modern monitors allow the user to manipulate their gamma behavior (as if it were merely
another brightness/contrast-like setting), encoding the input signals by themselves before they are
displayed on screen. This is also a calibration by hardware technique but it is performed on the analog
electric signals instead of remapping the digital values, as in the previous cases.
In a correctly calibrated system, each component will have a specified gamma for its input and/or output
encodings.[20] Stages may change the gamma to correct for different requirements, and finally the output
device will do gamma decoding or correction as needed, to get to a linear intensity domain. All the
encoding and correction methods can be arbitrarily superimposed, without mutual knowledge of this fact among the different elements; if done incorrectly, these conversions can lead to highly distorted results, but if done correctly as dictated by standards and conventions they will lead to a properly functioning system.
In a typical system, for example from camera through JPEG file to display, the role of gamma correction
will involve several cooperating parts. The camera encodes its rendered image into the JPEG file using
one of the standard gamma values such as 2.2, for storage and transmission. The display computer may
use a color management engine to convert to a different color space (such as older Macintosh's γ =
1.8 color space) before putting pixel values into its video memory. The monitor may do its own gamma
correction to match the CRT gamma to that used by the video system. Coordinating the components via
standard interfaces with default standard gamma values makes it possible to get such a system properly configured.
2.10.3 Structural Similarity Index
The structural similarity index measure (SSIM) is a method for predicting the perceived quality of
digital television and cinematic pictures, as well as other kinds of digital images and videos. SSIM is used
for measuring the similarity between two images. The SSIM index is a full reference metric; in other
words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-
free image as reference.
SSIM is a perception-based model that considers image degradation as perceived change in structural
information, while also incorporating important perceptual phenomena, including both luminance
masking and contrast masking terms. The difference with other techniques such as MSE or PSNR is that
these approaches estimate absolute errors. Structural information is the idea that the pixels have strong
inter-dependencies especially when they are spatially close. These dependencies carry important
information about the structure of the objects in the visual scene. Luminance masking is a phenomenon
whereby image distortions (in this context) tend to be less visible in bright regions, while contrast
masking is a phenomenon whereby distortions become less visible where there is significant activity or
"texture" in the image.
History
The predecessor of SSIM was called Universal Quality Index (UQI), or Wang–Bovik Index, which was
developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid
Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in
the IEEE Transactions on Image Processing.[1] In addition to defining the SSIM quality index, the paper
provides a general context for developing and evaluating perceptual quality measures, including
connections to human visual neurobiology and perception, and direct validation of the index against
human subject ratings.
The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The
University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision
(LCV) at New York University. Further variants of the model have been developed in the Image and
Visual Computing Laboratory at University of Waterloo and have been commercially marketed.
SSIM subsequently found strong adoption in the image processing community and in the television and
social media industries. The 2004 SSIM paper has been cited over 40,000 times according to Google
Scholar,[2] making it one of the highest cited papers in the image processing and video engineering fields.
It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009.[3] It also received
the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an
unusually high impact for at least 10 years following its publication. Because of its high adoption by the
television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering
Emmy Award in 2015 by the Television Academy.
Algorithm
The SSIM index is calculated on various windows of an image. The measure between two windows x and y of common size N×N is
SSIM(x, y) = [(2 μx μy + c1)(2 σxy + c2)] / [(μx² + μy² + c1)(σx² + σy² + c2)],
with:
μx the mean of x and μy the mean of y;
σx² the variance of x;
σy² the variance of y;
σxy the covariance of x and y;
c1 = (k1 L)² and c2 = (k2 L)², two variables that stabilize the division with a weak denominator, where L is the dynamic range of the pixel values;
k1 = 0.01 and k2 = 0.03 by default.
Formula components
The SSIM formula is based on three comparison measurements between the samples of x and y: luminance (l), contrast (c), and structure (s):
l(x, y) = (2 μx μy + c1) / (μx² + μy² + c1),
c(x, y) = (2 σx σy + c2) / (σx² + σy² + c2),
s(x, y) = (σxy + c3) / (σx σy + c3), with c3 = c2/2.
SSIM is then a weighted combination of those comparative measures:
SSIM(x, y) = l(x, y)^α · c(x, y)^β · s(x, y)^γ.
Setting the weights α, β, and γ to 1, the formula can be reduced to the form shown above.
Mathematical Properties
SSIM satisfies the identity of indiscernibles, and symmetry properties, but not the triangle inequality or
non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be
converted to a normalized root MSE measure, which is a distance function. [5] The square of such a
function is not convex, but is locally convex and quasiconvex,[5] making SSIM a feasible target for
optimization.
Application of the formula
In order to evaluate the image quality, this formula is usually applied only on luma, although it may also
be applied on color (e.g., RGB) values or chromatic (e.g. YCbCr) values. The resultant SSIM index is a
decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1
indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window
of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel over the image to create an SSIM quality map of the image. In the case of video quality assessment, the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation.
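In practice the index is rarely coded by hand; scikit-image, for instance, exposes it directly. A sketch assuming scikit-image is installed, using synthetic grayscale test images:

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, (64, 64)).astype(np.float64)   # pristine image
distorted = reference + rng.normal(0, 10, reference.shape)      # add Gaussian noise

score, ssim_map = structural_similarity(
    reference, distorted,
    data_range=255,       # dynamic range L of the pixel values
    full=True,            # also return the per-window SSIM quality map
)
print(score)              # 1.0 only if the two images are identical
```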
Variants
Multi-Scale SSIM
A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM)[4] is conducted over multiple scales
through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early
vision system. It has been shown to perform equally well or better than SSIM on different subjective
image and video databases.
Multi-component SSIM
Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye
can see differences more precisely on textured or edge regions than on smooth regions. [9] The resulting
metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and
smooth regions. The proposed weighting is 0.5 for edges, 0.25 for the textured and smooth regions. The
authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are
closer to subjective ratings. This suggests that edge regions play a dominant role in image quality
perception.
The authors of 3-SSIM have also extended the model into four-component SSIM (4-SSIM). The edge
types are further subdivided into preserved and changed edges by their distortion status. The proposed
weighting is 0.25 for all four components.
Structural Dissimilarity
Structural dissimilarity (DSSIM) may be derived from SSIM, commonly as DSSIM(x, y) = (1 - SSIM(x, y)) / 2, though it does not constitute a distance function as the triangle inequality is not necessarily satisfied.
Complex Wavelet SSIM
The complex wavelet SSIM (CW-SSIM) is an extension of SSIM computed on the coefficients of the complex wavelet transform for the signal x and the complex wavelet transform for the signal y. Additionally, K is a small positive number used for the purposes of function stability. Ideally, it should be zero. Like the SSIM, the CW-SSIM has a maximum value of 1. The maximum value of 1 indicates that the two signals are perfectly structurally similar while a value of 0 indicates no structural similarity.
SSIMPLUS
The SSIMPLUS index is based on SSIM and is a commercially available tool. It extends SSIM's capabilities, mainly to target video applications. It provides scores in the range of 0–100, linearly matched to human subjective ratings. It also allows adapting the scores to the intended viewing device, comparing video across different resolutions and contents.
According to its authors, SSIMPLUS achieves higher accuracy and higher speed than other image and
video quality metrics. However, no independent evaluation of SSIMPLUS has been performed, as the
algorithm itself is not publicly available.
cSSIM
In order to further investigate the standard discrete SSIM from a theoretical perspective,
the continuous SSIM (cSSIM) has been introduced and studied in the context of Radial basis function
interpolation.
SSIMULACRA
SSIMULACRA and SSIMULACRA2 are variants of SSIM developed by Cloudinary and fitted to subjective opinion data. The variants operate in the XYB color space and combine MS-SSIM with two types of asymmetric error maps for blockiness/ringing and smoothing/blur, common compression artifacts. SSIMULACRA2 is part of libjxl, the reference implementation of JPEG XL.
Other simple modifications
The r* cross-correlation metric is based on the variance metrics of SSIM. It is defined as r*(x, y) = σxy / (σx σy) when σx σy ≠ 0, as 1 when both standard deviations are zero, and as 0 when only one is zero. It has
found use in analyzing human response to contrast-detail phantoms.
SSIM has also been used on the gradient of images, making it "G-SSIM". G-SSIM is especially useful on
blurred images.
The modifications above can be combined. For example, 4-G-r* is a combination of 4-SSIM, G-SSIM,
and r*. It is able to reflect radiologist preference for images much better than other SSIM variants tested.
2.11 Deconvolution
Deconvolution is a computationally intensive image processing technique used to improve the contrast
and sharpness of images captured using a light microscope. Light microscopes are diffraction limited,
which means they are unable to resolve individual structures unless they are more than half the
wavelength of light away from one another. Each point source below this diffraction limit is blurred by
the microscope into what is known as a point spread function (PSF). With traditional widefield
fluorescence microscopy, out-of-focus light from areas above or below the focal plane causes additional
blurring in the captured image. Deconvolution removes or reverses this degradation by using the optical
system’s point spread function and reconstructing an ideal image made from a collection of smaller point
sources.
A light microscope’s point spread function varies based on the optical properties of both the microscope
and the sample, making it difficult to experimentally determine the exact point spread function of the
complete system. For this reason, mathematical algorithms have been developed to determine the point
spread function and to make the best possible reconstruction of the ideal image using deconvolution.
Nearly any image acquired with a fluorescence microscope can be deconvolved, including those that are
not three dimensional.
Commercial software brings these algorithms together into cost-effective, user-friendly packages. Each
deconvolution algorithm differs in how the point spread and noise functions of the convolution operations
are determined. The basic imaging formula is
g(x) = (f * h)(x) + n(x),
where:
x: spatial coordinate
g(x): observed image
f(x): object
h(x): point spread function
n(x): noise function
*: convolution
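The formula can be simulated directly; the sketch below blurs a synthetic "object" with a Gaussian stand-in for the point spread function and adds noise (all arrays are made up for illustration):

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_psf(size=15, sigma=2.0):
    """A normalized 2-D Gaussian as a stand-in point spread function h(x)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

f = np.zeros((64, 64))          # the "object": two point sources
f[20, 20] = 1.0
f[40, 45] = 0.5

h = gaussian_psf()
n = 0.01 * np.random.default_rng(0).normal(size=f.shape)   # noise term n(x)

g = fftconvolve(f, h, mode="same") + n   # observed image: g = f * h + n
```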
Deblurring Algorithms
Deblurring algorithms apply an operation to each two-dimensional plane of a three-dimensional image
stack. A common deblurring technique, nearest neighbor, operates on each z-plane by blurring the
neighboring planes (z + 1 and z - 1, using a digital blurring filter), then subtracting the blurred planes
from the z-plane. Multi-neighbor techniques extend this concept to a user-selectable number of planes. A
three-dimensional stack is processed by applying the algorithm to every plane in the stack.
This class of deblurring algorithms is computationally economical because it involves relatively simple
calculations performed on a small number of image planes. However, there are several disadvantages to
these approaches. For example, structures whose point spread functions overlap each other in nearby z-
planes may be localized in planes where they do not belong, altering the apparent position of the object.
This problem is particularly severe when deblurring a single two-dimensional image because it often
contains diffraction rings or light from out-of-focus structures that will then be sharpened as if they were
in the correct focal plane.
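A rough sketch of the nearest-neighbor idea for one z-plane of a stack (the blur filter and subtraction weight are illustrative parameters, not values taken from any particular package):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nearest_neighbor_deblur(stack, z, weight=0.45, blur_sigma=2.0):
    """Deblur plane z of a 3-D stack by subtracting blurred copies of planes z-1 and z+1."""
    above = gaussian_filter(stack[z - 1], blur_sigma)   # digitally blurred neighbors
    below = gaussian_filter(stack[z + 1], blur_sigma)
    estimate = stack[z] - weight * 0.5 * (above + below)
    return np.clip(estimate, 0, None)                   # keep intensities non-negative

# Synthetic 5-plane stack; real data would come from a widefield z-series
stack = np.random.default_rng(1).random((5, 64, 64))
sharpened = nearest_neighbor_deblur(stack, z=2)
```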
Inverse Filter Algorithms
An inverse filter functions by taking the Fourier transform of an image and dividing it by the Fourier
transform of the point spread function. Division in Fourier space is equivalent to deconvolution in real
space, making inverse filtering the simplest method to reverse the convolution in the image. The
calculation is fast, with a similar speed as two-dimensional deblurring methods. However, the method’s
utility is limited by noise amplification. During division in Fourier space, small noise variations in the
Fourier transform are amplified by the division operation. The result is that blur removal is compromised
as a tradeoff against a gain in noise. This technique can also introduce an artifact known as ringing.
Additional noise and ringing can be reduced by making some assumptions about the structure of the
object that gave rise to the image. For instance, if the object is assumed to be relatively smooth, noisy
solutions with rough edges can be eliminated. Regularization can be applied in one step within an inverse
filter, or it can be applied iteratively. The result is an image stripped of high Fourier frequencies, resulting
in a smoother appearance. Much of the "roughness" removed in the image resides at Fourier frequencies
well beyond the resolution limit and, therefore, the process does not eliminate structures recorded by the
microscope. However, because there is a potential for loss of detail, software implementations of inverse
filters typically include an adjustable parameter that enables the user to control the tradeoff between
smoothing and noise amplification. In most image-processing software programs, these algorithms have a
variety of names including Wiener deconvolution, Regularized Least Squares, Linear Least Squares, and
Tikhonov-Miller regularization.
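A sketch of a regularized inverse (Wiener-style) filter in Fourier space, intended to reuse the g and h arrays from the forward-model sketch above; the constant k trades blur removal against noise amplification:

```python
import numpy as np

def pad_psf(psf, shape):
    """Zero-pad the PSF to the image shape with its peak wrapped to index (0, 0)."""
    padded = np.zeros(shape)
    padded[:psf.shape[0], :psf.shape[1]] = psf
    # Shift the PSF center to the corner so the filter introduces no spatial offset
    return np.roll(padded, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))

def regularized_inverse_filter(g, h, k=0.01):
    """Estimate f from g = f * h + n by division in Fourier space.

    The small constant k keeps noise at frequencies where H is tiny
    from being amplified without bound (a Wiener-style regularization).
    """
    G = np.fft.fft2(g)
    H = np.fft.fft2(pad_psf(h, g.shape))
    F_hat = G * np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F_hat))

# Reusing g (observed image) and h (PSF) from the forward-model sketch:
# f_estimate = regularized_inverse_filter(g, h, k=0.01)
```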
Constrained Iterative Algorithms
A typical constrained iterative algorithm improves the performance of inverse filters by applying
additional algorithms to restore photons to the correct position. These methods operate in successive
cycles based on results from previous cycles, hence the term iterative. An initial estimate of the object is
performed and convolved with the point spread function. The resulting "blurred estimate" is compared
with the raw image to compute an error criterion that represents how similar the blurred estimate is to the
raw image. Using the information contained in the error criterion, a new iteration takes place—the new
estimate is convolved with the point spread function, a new error criterion is computed, and so on. The
best estimate is the one that minimizes the error criterion. As the algorithm progresses, each time the
software determines that the error criterion has not been minimized, a new estimate is blurred again, and
the error criterion recomputed. The cycle is repeated until the error criterion is minimized or reaches a
defined threshold. The final restored image is the object estimate at the last iteration.
The constrained iterative algorithms offer good results, but they are not suitable for all imaging setups.
They require long calculation times and place a high demand on computer processors. This can be
overcome with modern technologies, such as GPU-based processing which significantly improves speed.
To take full advantage of the algorithms, three-dimensional images are required, though two-dimensional
images can be used with limited performance.
Confocal, Multiphoton, and Super Resolution
Some recommend deconvolution as an alternative technique to using a confocal microscope [1]. This is not
strictly true because deconvolution techniques can also be applied to the images acquired using the
pinhole aperture in a confocal microscope. In fact, it is possible to restore images acquired with a
confocal, multiphoton, or super resolution light microscope.
The combination of optical image improvement through confocal or super resolution microscopy and
deconvolution techniques improves sharpness beyond what is generally attainable with either technique
alone. However, the major benefit of deconvolving images from these specialized microscopes is
decreased noise in the final image. This is particularly helpful for low-light applications like live cell
super resolution or confocal imaging. Deconvolution of multiphoton images has also been successfully
utilized to remove noise and improve contrast. In all cases, care must be taken to apply an appropriate
point spread function, especially if the confocal pinhole aperture is adjustable.
[1] Shaw, Peter J., and David J. Rawlins. "The point-spread function of a confocal microscope: its measurement and use in deconvolution of 3-D data." Journal of Microscopy 163, no. 2 (1991): 151–165.
Deconvolution in Practice
Processing speed and quality are dramatically affected by how software implements the deconvolution
algorithm. The algorithm can be optimized to reduce the number of iterations and accelerate convergence
to produce a stable estimate. For example, an unoptimized Jansson-Van Cittert algorithm usually requires
between 50 and 100 iterations to converge to an optimal estimate. By prefiltering the raw image to
suppress noise and correcting with an additional error criterion on the first two iterations, the algorithm
converges in only 5 to 10 iterations.
When using an empirical point spread function, it is critical to use a high-quality point spread function
with minimal noise. To achieve this, commercial software packages contain preprocessing routines that
reduce noise and enforce radial symmetry by averaging the Fourier transform of the point spread
function. Many software packages also enforce axial symmetry in the point spread function and assume
the absence of spherical aberration. These steps reduce the empirical point spread function’s noise and
aberrations and make a significant difference in the restoration’s quality.
Preprocessing can also be applied to raw images using routines such as background subtraction and
flatfield correction. These operations can improve the signal-to-noise ratio and remove certain artifacts
that are detrimental to the final image.
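For illustration, a minimal background-subtraction and flatfield-correction step might look like the sketch below, assuming a dark (background/offset) frame and a flatfield frame were acquired alongside the raw image; the names and the normalization are placeholders rather than any specific package's routine.

import numpy as np

def flatfield_correct(raw, dark, flat, eps=1e-6):
    # Remove the camera offset/background, then divide out uneven illumination.
    raw = raw.astype(float)
    gain = flat.astype(float) - dark.astype(float)   # illumination pattern
    gain = gain / max(float(gain.mean()), eps)       # normalize to mean 1
    corrected = (raw - dark) / np.clip(gain, eps, None)
    return np.clip(corrected, 0.0, None)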
In general, the more faithful the data representation, the more computer memory and processor time are
required to deconvolve an image. Previously, images would be divided into subvolumes to accommodate
processing power, but modern technologies have reduced this barrier and expanded into larger data sets.
Olympus Deconvolution Solutions
Olympus’ cellSens imaging software features TruSight deconvolution, which combines commonly used
deconvolution algorithms with new techniques designed for use on images acquired with Olympus
FV3000 and SpinSR10 microscopes, delivering a full portfolio of tools for image processing and analysis.
2.12 Homography
Homography, also referred to as planar homography, is a transformation between two planes. In other
words, it is a mapping between two planar projections of an image. It is represented by a 3x3
transformation matrix in homogeneous coordinates. Mathematically, the homography matrix is
represented as:

H = | h11  h12  h13 |
    | h21  h22  h23 |
    | h31  h32  h33 |

A point (x, y) in one plane is mapped to the point (x', y') in the other plane by [x', y', 1]^T ~ H [x, y, 1]^T,
where the result is rescaled so that its third homogeneous coordinate equals 1.
Each point in one image therefore has a projection onto the plane of the other image, retaining the same
information but seen from a transformed perspective.
Suppose a planar object such as a board is photographed at an angle. Can we still recover the
information from the image? Fortunately, yes! All you have to do in this scenario is locate the corners of
the board and set them as your source coordinates. After that, choose the destination coordinates where
you want the homography projection to appear; warping with the resulting matrix re-projects the board as
if it were viewed head-on.
But what if the destination points come from another image entirely? Let us go through this example.
Say we are interested in transforming half of the court through homography. Can this be done? Let us see.
First things first, identify the source coordinates from the original image (i.e. the corners of the half court).
After that, locate the destination coordinates in another image, entirely different from the first. Applying
the homography re-projects the half court and shows the players in another perspective while still
retaining the relevant information from the original image.
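A minimal OpenCV sketch of this workflow is shown below. The file names and corner coordinates are placeholders you would replace with the points picked in your own images; the actual work is done by cv2.findHomography and cv2.warpPerspective.

import cv2
import numpy as np

# Load the image that contains the planar region (placeholder file name).
img = cv2.imread("court.jpg")

# Source coordinates: the four corners of the planar region in the original
# image (placeholder pixel values).
src_pts = np.float32([[420, 130], [880, 150],
                      [950, 600], [350, 580]])

# Destination coordinates: where those corners should land in the new plane.
dst_pts = np.float32([[0, 0], [600, 0],
                      [600, 400], [0, 400]])

# Estimate the 3x3 homography matrix H that maps src_pts onto dst_pts.
H, status = cv2.findHomography(src_pts, dst_pts)

# Warp the image with H; the output size matches the destination rectangle.
warped = cv2.warpPerspective(img, H, (600, 400))

cv2.imwrite("court_warped.jpg", warped)   # placeholder output name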
Very cool, indeed! You are now able to transform images between planes using the homography matrix.
2.13 Convolution
This tutorial is about one of the very important concepts of signals and systems. We will discuss
convolution completely: What is it? Why is it needed? What can we achieve with it?
As we have discussed in the introduction to image processing and in signals and systems, image
processing is more or less the study of signals and systems, because an image is nothing but a two-
dimensional signal.
We have also discussed that in image processing we develop a system whose input is an image and
whose output is also an image. This "Digital Image Processing system" can be thought of as a black box.
So far we have discussed two important methods of manipulating images; in other words, our black box
has worked in two different ways so far.
Graphs (Histograms)
This method is known as histogram processing. We discussed it in detail in previous tutorials for
increasing contrast, image enhancement, brightness, etc.
Transformation functions
This method is known as transformations, in which we discussed different types of transformations and
some gray-level transformations.
Here we are going to discuss another method of dealing with images, known as convolution. Usually the
black box (system) used for image processing is an LTI system, or linear time-invariant system. By linear
we mean a system whose output depends linearly on its input (not on its logarithm, exponent, or any other
nonlinear function), and by time invariant we mean a system whose behavior does not change over time.
This third method can be represented as

g(x, y) = h(x, y) * f(x, y)

or

g(x, y) = f(x, y) * h(x, y)

There are two ways to represent this because the convolution operator (*) is commutative. Here f(x, y) is
the input image, g(x, y) is the output image, and h(x, y) is the mask or filter.
What is a mask?
A mask is also a signal. It can be represented by a two-dimensional matrix, usually of size 1x1, 3x3, 5x5,
or 7x7. A mask should always have an odd size, because otherwise you cannot find the middle of the
mask. Why do we need to find the middle of the mask? The answer lies below, in the topic of how to
perform convolution.
Example of convolution
Mask
1 2 3
4 5 6
7 8 9
Flipping the mask horizontally
3 2 1
6 5 4
9 8 7
Flipping the mask vertically
9 8 7
6 5 4
3 2 1
Image
2 4 6
8 10 12
14 16 18
Convolution
Convolve the flipped mask over the image as follows. Place the center of the flipped mask on each
element of the image, multiply the corresponding (overlapping) elements, add the products, and write the
result at the position of the image element under the center of the mask; mask elements that fall outside
the image are treated as zeros. For the first pixel of the image (value 2), the value is calculated as
= (5)(2) + (4)(4) + (2)(8) + (1)(10)
= 10 + 16 + 16 + 10
= 52
Place 52 in the output image at the first index and repeat this procedure for each pixel of the image.
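This hand calculation can be checked with SciPy, whose convolve2d flips the mask internally (a true convolution), so the original, unflipped mask is passed in; zero padding at the border matches the treatment above.

import numpy as np
from scipy.signal import convolve2d

image = np.array([[ 2,  4,  6],
                  [ 8, 10, 12],
                  [14, 16, 18]])

mask = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# convolve2d flips the mask and slides it over the zero-padded image,
# exactly as done by hand above.
result = convolve2d(image, mask, mode="same", boundary="fill", fillvalue=0)

print(result[0, 0])   # 52, matching the hand calculation
print(result)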
Why Convolution
Convolution can achieve things that the previous two methods of manipulating images cannot, including
blurring, sharpening, edge detection, noise reduction, etc. Different effects are obtained simply by
choosing different masks, as illustrated in the sketch below.
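For instance, the sketch below lists a few commonly used 3x3 masks (an averaging blur, a sharpening mask, and a Laplacian edge detector) and applies them with the same convolution as above; the masks are standard textbook choices, and the random test image is just a placeholder.

import numpy as np
from scipy.signal import convolve2d

# Averaging (box blur) mask: each output pixel is the mean of its 3x3 neighborhood.
blur = np.ones((3, 3)) / 9.0

# Sharpening mask: boosts the center pixel relative to its neighbors.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

# Laplacian mask: responds strongly at edges and is near zero in flat regions.
edge = np.array([[0,  1, 0],
                 [1, -4, 1],
                 [0,  1, 0]])

def apply_mask(image, mask):
    # Convolve a grayscale image with a mask, zero-padding at the borders.
    return convolve2d(image, mask, mode="same", boundary="fill", fillvalue=0)

# Placeholder grayscale image just to demonstrate the calls.
img = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
blurred = apply_mask(img, blur)
sharpened = apply_mask(img, sharpen)
edges = apply_mask(img, edge)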