
UNIT IV

Wavelets and Image Compression


Introduction
This chapter presents a brief introduction to the theory of wavelets and multiresolution theory in the first part; image compression concepts and techniques are discussed in the second part.

4.1 Wavelets
Unlike the Fourier transform, whose basis functions are sinusoids, wavelet transforms are based on small waves, called wavelets, of varying frequency and limited duration. In 1987, Mallat showed that wavelets are the foundation of a powerful new approach to signal processing and analysis called multiresolution theory. Multiresolution theory is concerned with the representation and analysis of images at more than one resolution. Its main advantage is that features that might go undetected at one resolution may be easily detected at another.

4.1.1 Image Pyramids


Image pyramids are a powerful structure for representing images at more than one resolution. An image pyramid is a collection of decreasing-resolution images arranged in the shape of a pyramid. The base of the pyramid contains a high-resolution representation of the image being processed and the apex contains a low-resolution approximation, as shown in figure 8.1(a). As we move up the pyramid, both size and resolution decrease.

Figure (a) Pyramidal image structure (b) Block diagram for Generation

Figure 8.1(b) shows a simple system for constructing image pyramids. The level j−1 approximation output is used to create approximation pyramids, which contain one or more approximations of the original image. Both the original image, which is at the base of the pyramid, and its P reduced-resolution approximations can be accessed and manipulated directly. The level j prediction residual output is used to build prediction residual pyramids. The information at level j is the difference between the level j approximation of the corresponding approximation pyramid and an estimate of that approximation derived from the level j−1 approximation. This difference can be coded compactly and therefore stored and transmitted efficiently.
From the block diagram it is clear that the approximation and prediction residual pyramids are computed in an iterative fashion. A (P+1)-level pyramid is built by executing the operations in the block diagram P times. During the first iteration, or pass, j = J and the original 2^J × 2^J image is applied as input. This produces the level J−1 approximation and level J prediction residual results. Each pass is composed of three sequential steps:

Step 1: In the first step, a reduced-resolution approximation of the input image is computed using an approximation filter followed by a downsampler. A variety of filters, such as a neighborhood-averaging (mean) filter or a low-pass Gaussian filter, can be used. The approximation filter output is downsampled by a factor of 2. The output of the downsampler is called the level j−1 approximation, and the quality of the generated approximation is a function of the filter used. Without filtering, the effects of aliasing become more pronounced in the upper levels of the pyramid.

Step 2: The output obtained in step 1 is upsampled by a factor of 2 and then filtered using an interpolation filter to obtain a prediction image. The resolution of the prediction image is the same as that of the input image.

Step 3: The difference between the step 2 output (i.e., the prediction image) and the input image is computed to obtain the level j prediction residual. The prediction residual can be used to progressively reconstruct the original image.

Executing this procedure P times produces two intimately related (P+1)-level pyramids: the level j−1 approximation outputs populate the approximation pyramid, and the level j prediction residual outputs are placed in the prediction residual pyramid.
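The three steps map almost directly onto a few lines of NumPy/SciPy. The sketch below is illustrative only, assuming a square 2^J × 2^J input, a Gaussian approximation filter and bilinear interpolation; the function name build_pyramids and the sigma value are our own choices, not part of any standard.

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    def build_pyramids(image, P):
        """Build (P+1)-level approximation and prediction residual pyramids."""
        approx = [image.astype(float)]        # level J: the base image
        residual = []
        for _ in range(P):
            # Step 1: approximation filter + downsample by 2 -> level j-1 approximation
            smooth = gaussian_filter(approx[-1], sigma=1.0)
            down = smooth[::2, ::2]
            # Step 2: upsample by 2 + interpolation filter -> prediction image
            pred = zoom(down, 2, order=1)
            # Step 3: prediction residual at level j
            residual.append(approx[-1] - pred)
            approx.append(down)
        return approx, residual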

4.2 Subband Coding

In subband coding an image is decomposed into a set of bandlimited components, called


subbands. The subbands can be reassembled to reconstruct the original image without error. The
subbands are generated by bandpass filtering the input image. Since the bandwidth of the subbands is
smaller than that of the original image, the subbands can be downsampled without loss of information.
Reconstruction of the original image is accomplished by upsampling, filtering and summing the
individual subbands.
Figure 8.2(a) shows the principal components of a two-band subband coding and decoding system. The input to the system is a one-dimensional, band-limited, discrete-time signal x(n) for n = 0, 1, 2, …. In the coding (analysis) section, the input signal is decomposed using the analysis filters h0(n) and h1(n) and then downsampled to obtain the subbands y0(n) and y1(n), as shown in the figure. In the decoding (synthesis) section, the subbands y0(n) and y1(n) are first upsampled and then filtered using the synthesis filters g0(n) and g1(n). The resulting signals are summed to obtain the output sequence x̂(n).
The analysis filters h0(n) and h1(n) are half-band digital filters whose idealized transfer functions
H0 and H1 are shown in figure 8.2(b). H0 is a low pass filter whose output is an approximation of the input
x(n) and H1 is a high pass filter whose output is the high frequency or detail part of x(n).

(a) Subband coding and decoding filter banks (b) Frequency spectrum
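For a concrete feel of analysis and synthesis, the sketch below implements the two-band system in its simplest possible form, assuming Haar half-band filters written in direct polyphase form (filtering and downsampling fused into one step, for an even-length signal); it is a toy illustration, not a general filter-bank implementation.

    import numpy as np

    def analyze(x):
        """Two-band analysis: h0/h1 filtering fused with downsampling by 2."""
        x = np.asarray(x, float)
        y0 = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass subband (approximation)
        y1 = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass subband (detail)
        return y0, y1

    def synthesize(y0, y1):
        """Upsampling fused with g0/g1 filtering, then summation."""
        x = np.empty(2 * len(y0))
        x[0::2] = (y0 + y1) / np.sqrt(2)
        x[1::2] = (y0 - y1) / np.sqrt(2)
        return x

    x = np.arange(8, dtype=float)                    # even-length test signal
    assert np.allclose(synthesize(*analyze(x)), x)   # error-free reconstruction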

4.2.1 Subband Image coding

One-dimensional filters can be used as two-dimensional separable filters for the processing of images. Figure 8.3 shows the block diagram of a two-dimensional, four-band filter bank for subband image coding. In this technique the separable filters are first applied in one dimension (e.g., vertically) and then in the other (e.g., horizontally). Downsampling is performed in two stages: once after the first filtering stage and again after the second. Downsampling is done to reduce the overall number of computations. The resulting filtered outputs, denoted a(m,n), dV(m,n), dH(m,n) and dD(m,n), are called the approximation, vertical detail, horizontal detail and diagonal detail subbands of the image, respectively. One or more of these subbands can be split into four smaller subbands, which can be split again, and so on.
Two Dimensional four band filter bank for subband image coding

Multiresolution Analysis

Unlike the Fourier transform, whose basis functions are sinusoids, wavelet transforms are based on small waves, called wavelets, of varying frequency and limited duration. In 1987, wavelets were shown to be the foundation of a powerful new approach to signal processing and analysis called multiresolution theory. Multiresolution theory incorporates and unifies techniques from a variety of disciplines, including subband coding from signal processing, quadrature mirror filtering from digital speech recognition, and pyramidal image processing. Multiresolution theory is concerned with the representation and analysis of signals or images at more than one resolution. Features that might go unnoticed at one resolution may be easy to spot at another.

It is a common observation that the level of detail within an image varies from location to location. Some locations contain significant detail, where we require finer resolution for analysis, and there are other locations where a coarser resolution representation suffices. A multiresolution representation of an image gives a complete idea of the extent of the detail existing at different locations, from which we can choose our desired level of detail. The wavelet transform is a popular approach for multiresolution image analysis. In Multiresolution Analysis (MRA), a scaling function is used to create a series of approximations of a function or image, each differing by a factor of 2 from its nearest neighboring approximations. Additional functions, called wavelets, are then used to encode the difference in information between adjacent approximations. When digital images are to be viewed or processed at multiple resolutions, the discrete wavelet transform (DWT) is the mathematical tool of choice. The DWT is a highly efficient, intuitive framework for the representation and storage of multiresolution images. The Fourier transform reveals only an image's frequency attributes, whereas the DWT provides insight into both its spatial and frequency attributes.

4.3. Series Expansions


A signal or function f(x) can often be better analyzed as a linear combination of expansion functions,

    f(x) = Σ_k α_k φ_k(x)    (1)

where k is an integer index of the finite or infinite sum, the α_k are real-valued expansion coefficients, and the φ_k(x) are real-valued expansion functions. If the expansion is unique, that is, if there is only one set of α_k for any given f(x), then the φ_k(x) are called basis functions and the expansion set {φ_k(x)} is called a basis for the class of functions that can be so expressed. The expressible functions form a function space referred to as the closed span of the expansion set, denoted

    V = Span_k {φ_k(x)}    (2)

The condition f(x) ∈ V means that f(x) is in the closed span of {φ_k(x)} and can be written in the form of equation (1).
For any function space V and corresponding expansion set {φ_k(x)}, there is a set of dual functions, denoted {φ̃_k(x)}, that can be used to compute the α_k coefficients of equation (1). That is,

    α_k = ⟨φ̃_k(x), f(x)⟩ = ∫ φ̃_k*(x) f(x) dx    (3)

where * denotes the complex conjugate operation.

4.3.1 Scaling Functions


Consider the set of expansion functions composed of integer translations and binary scalings of the real, square-integrable function φ(x), that is, the set {φ_{j,k}(x)} where

    φ_{j,k}(x) = 2^{j/2} φ(2^j x − k)    (4)

for all j, k ∈ Z and φ(x) ∈ L²(R). Here, k determines the position of φ_{j,k}(x) along the x-axis and j determines its width; the factor 2^{j/2} controls its height or amplitude. Since the shape of φ_{j,k}(x) changes with j, φ(x) is called a scaling function.

Requirements of MRA
The scaling function should obey the following four fundamental requirements of MRA.

MRA requirement 1: The scaling function is orthogonal to its integer translates.

MRA requirement 2: The subspaces spanned by the scaling function at low scales are nested within those spanned at higher scales, as shown in the figure. That is,

    V_{−∞} ⊂ ⋯ ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ ⋯ ⊂ V_∞

Figure The nested function spaces spanned by a scaling function

MRA requirement 3: The only function that is common to all V_j is f(x) = 0.

If we consider the coarsest possible expansion functions (i.e., j = −∞), the only representable function is the function of no information. That is,

    V_{−∞} = {0}

MRA requirement 4: Any function can be represented with arbitrary precision.

Though it is not possible to expand a particular f(x) at an arbitrarily coarse resolution, all measurable, square-integrable functions can be represented in the limit as j → ∞. That is,

    V_∞ = {L²(R)}

Under these conditions, the expansion functions of subspace V_j can be expressed as a weighted sum of the expansion functions of subspace V_{j+1}. Let

    φ_{j,k}(x) = Σ_n α_n φ_{j+1,n}(x)

Substituting from equation (4) for φ_{j+1,n}(x) and writing the weights as h_φ(n),

    φ_{j,k}(x) = Σ_n h_φ(n) 2^{(j+1)/2} φ(2^{j+1} x − n)

Since φ(x) = φ_{0,0}(x), both j and k can be set to 0 to obtain the simpler non-subscripted expression

    φ(x) = Σ_n h_φ(n) √2 φ(2x − n)    (a)

The h_φ(n) coefficients in this recursive equation are called scaling function coefficients, and h_φ is referred to as the scaling vector. Equation (a) is fundamental to multiresolution analysis and is called the refinement equation, the MRA equation or the dilation equation. It states that the expansion functions of any subspace can be built from double-resolution copies of themselves, that is, from the expansion functions of the next higher resolution space. The choice of a reference subspace, V_0, is arbitrary.

4.3.2 Wavelet Functions


Given a scaling function that meets the MRA requirements of the previous section, we can define a wavelet function ψ(x) that, together with its integer translates and binary scalings, spans the difference between any two adjacent scaling subspaces V_j and V_{j+1}. The situation is illustrated graphically in the figure. We define the set {ψ_{j,k}(x)} of wavelets

    ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k)

for all k ∈ Z that spans the W_j spaces in the figure. As with scaling functions, we write

    W_j = Span_k {ψ_{j,k}(x)}

and note that if f(x) ∈ W_j, then

    f(x) = Σ_k α_k ψ_{j,k}(x)

The scaling and wavelet function subspaces in the figure are related by

    V_{j+1} = V_j ⊕ W_j

where ⊕ denotes the union of spaces. The orthogonal complement of V_j in V_{j+1} is W_j, and all members of V_j are orthogonal to the members of W_j. Thus,

    ⟨φ_{j,k}(x), ψ_{j,l}(x)⟩ = 0

for all appropriate j, k, l ∈ Z.


We can now express the space of all measurable, square-integrable functions as

    L²(R) = V_0 ⊕ W_0 ⊕ W_1 ⊕ ⋯    (a)

or

    L²(R) = V_1 ⊕ W_1 ⊕ W_2 ⊕ ⋯    (b)

or even

    L²(R) = ⋯ ⊕ W_{−2} ⊕ W_{−1} ⊕ W_0 ⊕ W_1 ⊕ W_2 ⊕ ⋯    (c)

which eliminates the scaling function and represents a function in terms of wavelets alone. Note that if f(x) is an element of V_1 but not of V_0, an expansion using equation (a) contains an approximation of f(x) using V_0 scaling functions; wavelets from W_0 would encode the difference between this approximation and the actual function. Equations (a), (b) and (c) can be generalized to yield

    L²(R) = V_{j0} ⊕ W_{j0} ⊕ W_{j0+1} ⊕ ⋯

where j0 is an arbitrary starting scale.

Any wavelet function, like its scaling function counterpart, can be expressed as a weighted sum of shifted, double-resolution scaling functions. That is,

    ψ(x) = Σ_n h_ψ(n) √2 φ(2x − n)

where the h_ψ(n) are called the wavelet function coefficients and h_ψ is the wavelet vector. Since the integer wavelet translates are orthogonal, it can be shown that h_ψ(n) is related to h_φ(n) by

    h_ψ(n) = (−1)^n h_φ(1 − n)

4.3.3 Discrete Wavelet Transforms and its inverse


Using a wavelet series, any function f(x) ∈ L²(R) can be expressed as a series summation of scaling functions and wavelet functions as

    f(x) = Σ_k c_{j0}(k) φ_{j0,k}(x) + Σ_{j=j0}^{∞} Σ_k d_j(k) ψ_{j,k}(x)    (3)

where c_{j0}(k) and d_j(k) are the corresponding expansion coefficients. In the above equation, the first term of the expansion, involving the scaling functions, provides an approximation of f(x) at scale j0, and the second term, involving the wavelet functions, adds detail to the approximation at scale j0 and its higher scales. If the expansion functions form an orthonormal basis, which is often the case, the coefficients can be calculated as the inner products

    c_{j0}(k) = ⟨f(x), φ_{j0,k}(x)⟩    and    d_j(k) = ⟨f(x), ψ_{j,k}(x)⟩

Using the wavelet series, a continuous one-dimensional signal f(x) is represented in terms of a series expansion consisting of continuous scaling and wavelet functions, as given in equation (3). Now, instead of a continuous signal, if we consider a discrete sequence s(n), defined for n = 0, 1, …, M−1, the resulting coefficients in the series expansion are called the Discrete Wavelet Transform (DWT) of s(n). The coefficients of the series expansion are modified for the discrete signal to obtain the DWT coefficients, given by

    W_φ(j0, k) = (1/√M) Σ_n s(n) φ_{j0,k}(n)

    W_ψ(j, k) = (1/√M) Σ_n s(n) ψ_{j,k}(n),    j ≥ j0

where φ_{j0,k}(n) and ψ_{j,k}(n) are functions of the discrete variable n = 0, 1, …, M−1. The first equation computes the approximation coefficients and the second computes the detail coefficients. The corresponding Inverse Discrete Wavelet Transform (IDWT), which expresses the discrete signal in terms of the wavelet coefficients, can be written as

    s(n) = (1/√M) Σ_k W_φ(j0, k) φ_{j0,k}(n) + (1/√M) Σ_{j=j0}^{J−1} Σ_k W_ψ(j, k) ψ_{j,k}(n)

Normally j0 = 0 and M is set to be a power of 2 (i.e., M = 2^J), so that the summations are performed over j = 0, 1, …, J−1 and k = 0, 1, 2, …, 2^j − 1.
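In practice the DWT and IDWT are rarely computed from the basis functions directly; a filter-bank implementation is used. The following sketch uses the third-party PyWavelets package (pywt) with Haar filters purely as an illustration; the choice of signal and wavelet is arbitrary.

    import numpy as np
    import pywt

    s = np.array([3, 7, 1, 1, -2, 5, 4, 6], dtype=float)   # M = 8 = 2^3
    cA, cD = pywt.dwt(s, 'haar')        # one level: approximation + detail
    s_rec = pywt.idwt(cA, cD, 'haar')   # IDWT recovers s exactly
    assert np.allclose(s_rec, s)

    # full J-level decomposition, j = 0, 1, ..., J-1
    coeffs = pywt.wavedec(s, 'haar', level=3)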

4.3.4 Two-dimensional DWT


The concepts of the one-dimensional DWT and its implementation through subband coding can be easily extended to two-dimensional signals such as digital images. In subband analysis of images, we require the extraction of approximations in both the horizontal and vertical directions, details in the horizontal direction alone (detection of horizontal edges), details in the vertical direction alone (detection of vertical edges), and details in both directions (detection of diagonal edges). This analysis of 2-D signals requires the following two-dimensional filter functions, formed through the multiplication of separable scaling and wavelet functions in the n1 (horizontal) and n2 (vertical) directions:
ϕ(n1,n2)=ϕ(n1)ϕ(n2)

ψ H (n1,n2 ) =ψ(n1 )ϕ (n2 )

ψ V (n1,n2 ) =ϕ(n1 )ψ (n2 )

ψ D (n1,n2 ) =ψ(n1 )ψ (n2 )

In the above equations, φ(n1,n2), ψH(n1,n2), ψV(n1,n2) and ψD(n1,n2) represent the approximated signal, the signal with horizontal details, the signal with vertical details and the signal with diagonal details, respectively. The 2-D analysis filter bank implemented through separable scaling and wavelet functions is shown in the figure.
2-D analysis filtering through separable scaling and wavelet functions.

The filtering in each direction is followed by subsampling by a factor of two, so that each of the subbands corresponding to the filter outputs contains one-fourth of the number of samples of the original 2-D signal. The outputs of the analysis filter banks are the Discrete Wavelet Transform (DWT) coefficients.
The bands are also referred to as LL, LH, HL and HH, where the first letter indicates whether the signal is low-pass (L) or high-pass (H) filtered along the columns (vertical direction) and the second letter indicates whether low-pass or high-pass filtering is applied along the rows (horizontal direction). It is possible to apply the 2-D subband decomposition iteratively on any of the subbands. Commonly, it is the LL subband (the approximated signal) that is analyzed for further details. If the LL subband is iteratively decomposed in this way, the resulting subband partitioning is called dyadic partitioning. Note that every level of decomposition subsamples the newly created subbands by a factor of two along the rows and columns (that is, by a factor of four) as compared to the previous level of decomposition. Note, however, that the total number of DWT coefficients over all the subbands always remains the same as the total number of pixels in the image. As we go further up in the levels of decomposition, we suffer a loss of resolution in the newly created subbands. That is, the first level of decomposition extracts the finest resolution of details; the subbands created in the second level extract coarser details than the first, and so on.
It may be noted that the process of analysis filtering is lossless. It is therefore possible to have a perfect
reconstruction of the original 2-D signal (image) by a reverse process of synthesis filtering, as shown in
fig.8.4, which is just the mirror of fig.8.3
2-D synthesis filtering for image reconstruction

It may be noted that the synthesis filter banks along the rows and columns are associated with an
upsampling by a factor of two so that the reconstructed image can be shown at the original resolution.
The synthesis filter banks therefore perform the IDWT, which is also lossless, like the DWT.

4.3.5 Applying DWT and IDWT on images


Figure 8.5 shows the original image, and figure 8.6(a) shows the result of the DWT after one level of decomposition.

Original Lena Image


(a) Result of DWT after one level of decomposition (b) Result of DWT after two levels of
decomposition

It may be noted that the LL, LH, HL and HH subband partitions contain the approximated image and the images with horizontal edges, vertical edges and diagonal edges, respectively. Decomposing the LL subband further to obtain a second level of DWT decomposition results in fig. (b). The pixels within each subband are the DWT coefficients of that subband. By applying the IDWT on the DWT coefficients, we obtain the reconstructed image, as shown in the figure below.

Reconstructed Lena image

In this case, the reconstruction is lossless, since the filter coefficients are exactly represented and the
DWT coefficients are not quantized. However, in practice, we have to transmit the images in limited
bandwidth situations, necessitating compression and hence, the DWT coefficients are to be quantized and
efficiently encoded.
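The one-level 2-D decomposition and its lossless inverse can be demonstrated with PyWavelets; the sketch below uses a random array as a stand-in for the Lena image and Haar filters, both arbitrary choices.

    import numpy as np
    import pywt

    img = np.random.rand(256, 256)               # stand-in for the test image
    LL, (LH, HL, HH) = pywt.dwt2(img, 'haar')    # one decomposition level
    # each subband is 128 x 128, so the coefficient count equals the pixel count
    rec = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
    assert np.allclose(rec, img)                 # lossless: no quantization yet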
IMAGE COMPRESSION

4.4 Image Compression Fundamentals


Image compression is an application of data compression that encodes the original image with fewer bits. The objective of image compression is to reduce the redundancy of the image and to store or transmit data in an efficient form. With the advancement of the internet, teleconferencing, multimedia and HDTV technologies, the amount of information handled by computers has grown exponentially. For example, for a color image of size 1600 × 1200,

    storage space required = 1600 × 1200 × 8 × 3 bits = 5.76 Mbytes ≈ 4 floppy disks

For video at 25 frames per second, even one second of color film requires approximately 19 megabytes of memory, so a typical PC hard disk of the era (540 MB) could store only about 30 seconds of film. Thus, the necessity for image and video compression is obvious.

4.4.1 Image compression (Definition)


Image compression is the process of reducing the size of an image for easy storage and transmission. It is achieved by removing redundant data.
Data compression refers to the process of reducing the amount of data required to represent a given quantity of information. Note that data and information are not the same. Data are the means by which information is conveyed, and various amounts of data can represent the same amount of information. Sometimes the given data contain portions that carry no relevant information, or that restate or repeat known information; such data are said to contain data redundancy. Data redundancy is the central concept in image compression, because compression is achieved by removing this redundancy. Figure 8.8 illustrates that an image comprises both redundant data and information. The higher the redundant data, the higher the compression that can be achieved.

Information versus data


Need for Image Compression

Image compression is needed to


i) reduce the storage space required to store multimedia content
ii) increase the transmission rate and hence the speed of transmission
iii) minimize the bandwidth requirements and hence the cost.

4.4.2 Redundancy
Various amounts of data may be used to represent the same amount of information. The data that
provides no relevant information or simply restates which is already known is called redundant data. This
redundant data can be removed to achieve compression. There are three main data redundancies used in
image compression,
•Coding redundancy
•Interpixel redundancy
•Psychovisual redundancy

Coding Redundancy:

A code is a system of symbols (e.g., bits or bytes) used to represent information. Each piece of information is represented by a set of code symbols. If the code fails to minimize the average number of bits required to represent each pixel, it leads to coding redundancy. Coding redundancy can be removed by using optimal codes.

Interpixel redundancy [also called geometric, interframe or spatial redundancy]

Interpixel redundancy arises from the correlations between the pixels of an image. Because the value of any given pixel can be reasonably predicted from the values of its neighbors, the information carried by an individual pixel is partly redundant. This is known as interpixel redundancy. Interpixel redundancy within an image is known as spatial redundancy; interpixel redundancy in video, i.e., between frames, is known as temporal redundancy.

Psycho visual Redundancy


Certain information simply has less relative importance than other information in normal visual processing. Such information is said to be psychovisually redundant. Unlike coding and interpixel redundancies, psychovisual redundancy is associated with real, quantifiable visual information. Its elimination results in a loss of quantitative information, but psychovisually the loss is negligible. Removing this type of redundancy is a lossy process and the lost information cannot be recovered. Since the elimination of psychovisually redundant data results in a loss of quantitative information, it is commonly referred to as quantization.
4.4.3 Compression Metrics

Bit Rate

The most obvious measure of compression efficiency is the bit rate, which gives the average number of bits per stored pixel of the image:

    bit rate = C / N    (bits per pixel)

where C is the number of bits in the compressed file and N is the number of pixels in the image.

Compression efficiency

The most commonly used compression parameter is the compression ratio, defined as

    compression ratio C_R = n1 / n2

where n1 and n2 denote the number of information-carrying units (e.g., bits) in the original file and the compressed file, respectively.

If n1 = n2, then C_R = 1 and R_D = 0, indicating that the first representation of the information (i.e., the original image) contains no redundant data. A typical compression ratio of around 10 (or 10:1) indicates that 90% (R_D = 0.9) of the data in the first data set is redundant.

Data redundancy

The relative data redundancy R_D of the first data set, n1, is defined as

    R_D = 1 − 1/C_R
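As a quick check of these formulas, consider the 1600 × 1200 color image from section 4.4 compressed at an assumed 10:1 ratio:

    original_bits   = 1600 * 1200 * 8 * 3      # 24-bit colour image
    compressed_bits = original_bits // 10      # assume 10:1 compression

    bit_rate = compressed_bits / (1600 * 1200)   # = 2.4 bits per pixel
    CR = original_bits / compressed_bits         # = 10.0
    RD = 1 - 1 / CR                              # = 0.9, i.e. 90% redundant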

4.5 Image Compression Model:

Figure 8.9 shows a general image compression model. An image compression system consists of two distinct structural blocks: an encoder and a decoder. The storage or transmission medium is also referred to as the channel, as shown in the figure.
Image Compression Model

Encoder:
The encoder consists of two relatively independent functional blocks: a source encoder and a channel encoder. The source encoder reduces or eliminates any redundancies in the input image, which usually leads to bit savings (i.e., compression). The channel encoder is used to increase the noise immunity of the source encoder's output: redundant bits are added to the data to achieve error control and correction. If the channel is noise-free, the channel encoder and decoder may be omitted. At the receiver's side, the channel and source decoders perform the opposite functions and ultimately recover the original image. If the compression technique used is lossless, the recovered image will be identical to the original image; if the compression technique is lossy, the recovered image will be an approximation of the original image.

Source Encoder:
Figure 8.10 shows the blocks of a source encoder. The source encoder is responsible for reducing or eliminating any coding, interpixel or psychovisual redundancies in the input image. The mapper is used to reduce the interpixel redundancy. The quantizer is used to reduce the psychovisual redundancies; the quantization operation is irreversible, so it must be omitted when error-free compression is desired. The symbol encoder encodes the quantizer output using a fixed- or variable-length code, which is given to the channel encoder. The symbol encoder aims at reducing the coding redundancy; it does so by assigning the shortest code words to the most probable symbols.

(a) Source Encoder (b) Source decoder


Channel Encoder:
The channel used to transmit the encoded data is noisy or prone to error. The channel encoder is designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data. One of the most useful channel encoding techniques is the Hamming code. The channel-encoded data is transmitted through the channel.

Decoder:
The decoder consists of two main structural blocks: i) Channel decoder and ii) Source decoder

Channel Decoder:
The channel decoder is used to decode the Hamming-encoded data. It checks the parity bits, and if a nonzero value is found, the decoder simply complements the code word bit position indicated by the parity word.

Source decoder:
The channel-decoded data is given to the source decoder. The source decoder consists of a symbol decoder and an inverse mapper, as shown in 8.10(b). The source decoder performs the inverse operations of the source encoder in the reverse order. Since quantization results in an irreversible loss of information, an inverse quantization block is not included in the model. The symbol decoder performs the inverse operation of the symbol encoder; for example, if Huffman encoding is used in the symbol encoder, then Huffman decoding is done in the symbol decoder. Similarly, the inverse mapper performs the inverse operation of the mapper.

Note: Since quantization results in irreversible information loss, an inverse quantizer block is not
included in the general source decoder model. Thus the output of the inverse mapper gives the
decompressed image f(x,y). As stated earlier the decompressed image will be an approximation to the
original image if the compression is lossy.

4.5.1 Types of compression


Based on the type of redundancy removed and the techniques used, compression is classified in to
two types : i) Lossless compression ii) Lossy compression.

Lossless compression, also called error-free compression, is achieved by removing interpixel and coding redundancy, whereas the removal of psychovisual redundancy leads to a loss of information and is called lossy compression. The differences between the two are stated in Table 4.1.

Table 4.1 Lossless Compression vs. Lossy Compression

S.No | Lossless (error-free) compression | Lossy compression
1. | There is no loss of data | There is loss of data
2. | It is reversible | It is irreversible
3. | Achieves compression by reducing interpixel and coding redundancy | Achieves compression by reducing interpixel, psychovisual and coding redundancy
4. | Achieves compression in the range of 2 to 10 | Achieves higher compression ratios
5. | Used in satellite communication, medical imaging, legal documents etc., where loss of data is not acceptable | Used in broadcasting, where some loss of data is acceptable
4.6 Error-free compression

For numerous applications, such as the archival of medical and legal documents, error-free compression is the only acceptable means of data reduction. Similarly, for applications such as satellite imagery and diagnostic radiography, lossy compression is prohibited. For all these applications, error-free compression techniques are required. As stated earlier, error-free compression is also known as lossless compression. It is generally achieved using two relatively independent operations: (i) reducing the interpixel redundancies and (ii) using an efficient coding method to reduce the coding redundancies. Coding redundancy can be minimized by using a variable-length coding method in which the shortest codes are assigned to the most probable gray levels.

4.7 Variable length coding


Variable-length coding is used to reduce coding redundancy, which is present in any natural binary coding technique. Coding redundancy can be removed by using variable-length codes, which assign the shortest possible code words to the most probable symbols (or pixels). Several optimal and near-optimal coding techniques can be used to construct such a code.

The most commonly used variable-length coding techniques are i) Huffman coding, ii) shift codes and iii) arithmetic coding.

4.7.1 Huffman coding

The Huffman code is the most popular variable-length code used to remove coding redundancy. When the information source symbols are to be coded individually, the Huffman code is the most appropriate solution. The Huffman code is optimal for a fixed number of source symbols, subject to the constraint that the source symbols be coded one at a time.

Principle: The Huffman code assigns the shortest possible code word to the most probable gray level, and vice versa. It yields the smallest possible number of code symbols per source symbol.

Procedure:
Step 1: (Source Reduction)
 Arrange the symbols in descending order of their probability
 Combine the two lowest probability symbols into a single compound symbol with compound
probability.
 Move the compound symbol along with its probability to the first source reduction step such that
the probabilities are in the descending order.
 Repeat the procedure until reduced source with two symbols is reached.

Step 2: (Assigning code and codeword formation)


 Assign symbols ‘0’ and ‘1’ for the last two symbols in the extreme right and work it back to the
original source.
 Repeat the procedure to each and every step of source reduction by appending it with the
combined symbols’ code.
Properties of Huffman code:
1. It is a block code: each source symbol is mapped into a fixed sequence of code symbols.
2. It is instantaneous: each code word in a string of code symbols can be decoded without reference to succeeding symbols.
3. It is uniquely decodable: each string of code symbols can be decoded in only one way. This follows from the prefix property, i.e., a shorter code word will never form the start of a longer code word.
4. It is an optimal code: the average code word length is the minimum possible for the given number of symbols.

Disadvantages:
i) When a large number of symbols are to be coded, the construction of an optimal binary Huffman code is a nontrivial task. For J source symbols, J−2 source reductions and J−2 code assignments must be made.
ii) The Huffman code is not adaptive; that is, it cannot adapt to varying symbol probabilities.

Example:
Encode the following using Huffman coding
Symbol Probability
a1 0.1
a2 0.4
a3 0.06
a4 0.1
a5 0.04
a6 0.3

Solution:
Step 1: Source Reduction
i) Arrange the symbols in descending order of their probability:
Symbol Probability
a2 0.4
a6 0.3
a1 0.1
a4 0.1
a3 0.06
a5 0.04
ii) Reduce the source by combining the bottom two symbols to form a compound symbol. In this example the bottom two probabilities are 0.06 and 0.04; they are added, so the probability of the compound symbol is 0.1. It is moved into the first source reduction column such that the probabilities remain in descending order.
iii) Repeat the procedure, that is, combine the bottom two probabilities (0.1 and 0.1) to obtain 0.2 and place the result such that the probabilities remain in descending order.
Step2: Code Assignment.
i) The codes 0 and 1 are assigned to the last two symbols in the extreme right
ii) The combined symbol code is appended with 0 and 1 for the bottom two symbols and the
procedure is repeated as shown in the figure.
iii) The procedure is repeated for all stages of source reduction to obtain the codewords.

The symbols along with their unique codeword is displayed in Table 4.2

Table 4.2 Symbol with codeword


Symbol Final unique code word
a2 1
a6 00
a1 011
a4 0100
a3 01010
a5 01011
Step 3: Compute li

Symbol Probability Code word Length of code (li)
a1 0.1 011 3
a2 0.4 1 1
a3 0.06 01010 5
a4 0.1 0100 4
a5 0.04 01011 5
a6 0.3 00 2

Average length of code word Lavg=Σ pili

=0.1(3) + 0.4(1) + 0.06(5) + 0.1(4) + 0.04(5) + 0.3(2)

=2.2 bits/symbol
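The source-reduction and code-assignment steps can be automated with a priority queue. The sketch below is a minimal illustration using Python's heapq; tie-breaking may produce code words different from the hand-worked table, but any Huffman code for these probabilities has the same average length of 2.2 bits/symbol.

    import heapq

    def huffman_code(probs):
        """Build a Huffman code from a {symbol: probability} mapping."""
        heap = [[p, i, {s: ''}] for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)                      # tie-breaker for equal probabilities
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)    # the two least probable nodes
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: '0' + w for s, w in c0.items()}
            merged.update({s: '1' + w for s, w in c1.items()})
            heapq.heappush(heap, [p0 + p1, count, merged])
            count += 1
        return heap[0][2]

    probs = {'a1': 0.1, 'a2': 0.4, 'a3': 0.06, 'a4': 0.1, 'a5': 0.04, 'a6': 0.3}
    code = huffman_code(probs)
    L_avg = sum(probs[s] * len(code[s]) for s in probs)   # 2.2 bits/symbol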

4.7.2 Shift codes

Because generating a Huffman code becomes difficult for a large number of source symbols, other near-optimal variable-length codes have been devised.

4.7.2.1 Binary Shift Code


Procedure
1. Arrange the source symbols in the decreasing order of probability
2. Divide the total number of source symbols into symbol blocks of equal size.
3. Select the first block as reference block and code the symbols in the reference block using
binary code.
4. The next symbol code which is not used in the reference block is used to identify the
remaining blocks
5. The remaining blocks are coded in the same way as the reference block, and one or more shift-up (or shift-down) symbols are used to identify the block:
i.e., one shift symbol is prefixed to the codes of block 2, two shift symbols to block 3, and so on.

4.7.2.2 Huffman shift codes


 Arrange the source symbols in the decreasing order of probability
 Divide the total number of source symbols into symbol blocks of equal size
 Select the first block as reference block
 Use Huffman coding to code the reference block as described below,
1. add the probability of symbols in the remaining blocks and take that as the shift
symbols.
2. arrange the symbols in the reference block along with shift symbol in decreasing order
of probability.
3. code them using Huffman coding
4. use the code obtained for the shift symbol as the shift-up or shift-down code to identify the remaining blocks:
one shift symbol is prefixed to the codes of block 2, two shift symbols to block 3, and so on.
Problem
Encode the given symbols using binary and Huffman codes
Table 4.3
Source symbol Probability Binary shift code Huffman shift code
Block 1
A1 0.2 000 10
A2 0.1 001 011
A3 0.1 010 110
A4 0.06 011 0100
A5 0.05 100 0101
A6 0.05 101 1110
A7 0.05 110 1111

Block 2
A8 0.04 111 000 00 10
A9 0.04 111 001 00 011
A10 0.04 111 010 00 110
A11 0.04 111 011 00 0100
A12 0.03 111 100 00 0101
A13 0.03 111 101 00 1110
A14 0.03 111 110 00 1111
Block 3
A15 0.03 111 111 000 00 00 10
A16 0.02 111 111 001 00 00 011
A17 0.02 111 111 010 00 00 110
A18 0.02 111 111 011 00 00 0100
A19 0.02 111 111 100 00 00 0101
A20 0.02 111 111 101 00 00 1110
A21 0.01 111 111 110 00 00 1111
Average Length 4.59 4.2
Solution:

Procedure for binary shift code

1. Divide the symbols into blocks: since there are 21 symbols, split them into 3 blocks of 7.
2. Encode the reference block, i.e., block 1, using binary coding.
3. The unused code '111' is used as the shift code.
4. Prefix one shift code to the symbols of block 2.
5. Prefix two shift codes to the symbols of block 3.

Huffman shift code solution

Step 1: Divide the symbols into 3 blocks of 7 each.
Step 2: Sum the probabilities of all the source symbols outside the reference block, i.e., the probabilities of A8 to A21, giving 0.39; this is taken as the probability of the shift symbol.
Step 3: Encode the reference-block symbols together with the shift symbol using binary Huffman coding (refer to the Huffman coding procedure).
Step 4: The Huffman code generated for the shift symbol is 00.
Step 5: Prefix one shift code (00) to the codes of block 2.
Step 6: Prefix two shift codes (00 00) to the codes of block 3.

Source reduction (shift symbol plus block 1; columns 1–6 are the successive reductions, with code bits 0 and 1 assigned to the two probabilities combined at each step):

Symbol | Probability | 1 | 2 | 3 | 4 | 5 | 6
Shift  | 0.39 | 0.39 | 0.39 | 0.39 | 0.39 | 0.40 | 0.60
A1     | 0.20 | 0.20 | 0.20 | 0.20 | 0.21 | 0.39 | 0.40
A2     | 0.10 | 0.10 | 0.11 | 0.20 | 0.20 | 0.21 |
A3     | 0.10 | 0.10 | 0.10 | 0.11 | 0.20 | |
A4     | 0.06 | 0.10 | 0.10 | 0.10 | | |
A5     | 0.05 | 0.06 | 0.10 | | | |
A6     | 0.05 | 0.05 | | | | |
A7     | 0.05 | | | | | |

Symbol | Probability | Traced word | Code word
Shift | 0.39 | 00 | 00
A1 | 0.2 | 01 | 10
A2 | 0.1 | 110 | 011
A3 | 0.1 | 011 | 110
A4 | 0.06 | 0010 | 0100
A5 | 0.05 | 1010 | 0101
A6 | 0.05 | 0111 | 1110
A7 | 0.05 | 1111 | 1111

(The code word is the traced word read in reverse, since tracing proceeds from the final reduction back to the original source.)

4.7.3 Arithmetic coding

Arithmetic coding is a non-block code, i.e., a one-to-one correspondence between source symbols and code words does not exist. Instead, an entire sequence of source symbols is assigned a single arithmetic code word. Arithmetic coding maps a string of data (source) symbols to a code string in such a way that the original data can be recovered from the code string. The encoding and decoding algorithms perform arithmetic operations on the code string. The following example illustrates the procedure.

Example
Consider the transmission of a message comprising a string of characters with probability e→0.3,
n→0.3, t→0.2, w→0.1, ·→0.1. Encode the word ‘went.’ using arithmetic coding.

Procedure:
Step 1: Divide the numeric range 0 to 1 according to the number of different symbols present in the message.
In the example, the number of symbols = 5, so the range 0 to 1 is divided into 5 intervals whose widths equal the symbol probabilities: the interval [0, 0.3) is assigned to 'e', [0.3, 0.6) to 'n', [0.6, 0.8) to 't', [0.8, 0.9) to 'w' and [0.9, 1) to '·'.

(The figure shows the successive subdivision of the interval while encoding 'went·': [0, 1] → w: [0.8, 0.9] → e: [0.8, 0.83] → n: [0.809, 0.818] → t: [0.8144, 0.8162] → ·: [0.81602, 0.8162].)

Step 2: Expand the interval of the first letter to be coded. The first letter is 'w', so the expanded range is [0.8, 0.9].
This range is further subdivided into 5 as follows.
Compute d = upper bound − lower bound = 0.9 − 0.8 = 0.1
So, range of 'e' = lower limit : lower limit + d × (probability of 'e')
  = 0.8 : [0.8 + 0.1(0.3)]
  = 0.8 : 0.83
Range of 'n' = 0.83 : 0.83 + d(0.3) = 0.83 : 0.86
Range of 't' = 0.86 : 0.86 + d(0.2) = 0.86 : 0.88
Range of 'w' = 0.88 : 0.88 + d(0.1) = 0.88 : 0.89
Range of '·' = 0.89 : 0.9

Step 3: Expand the interval of letter 'e' and repeat the procedure. Here d = 0.83 − 0.8 = 0.03.
Range of 'e' = 0.8 : [0.8 + d(0.3)] = 0.8 : [0.8 + 0.03(0.3)] = 0.8 : 0.809
Similarly determine the ranges of all the other symbols.

Step 4: Expand the interval of letter 'n' and repeat the procedure.

Step 5: Continue this procedure until the termination character '·' is encoded.

Hence the completed code word satisfies 0.81602 < code word < 0.8162.

Example 2: Encode the word a1a2a3a3a4 using arithmetic code and generate the tag for the given
source symbol and their probabilities.

Symbol Probability
a1 0.2
a2 0.2
a3 0.4
a4 0.2

Solution:
Successively narrowing the interval for a1, a2, a3, a3, a4 gives

    0.06752 ≤ code word ≤ 0.0688

The code word can have any value between 0.06752 and 0.0688. The tag is generated by taking the average of the two values,

    tag = (0.06752 + 0.0688) / 2 = 0.06816
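The interval-narrowing procedure is mechanical enough to express directly in code. The sketch below is a floating-point illustration only (practical coders use scaled integer arithmetic and a termination convention); it reproduces the interval and tag of Example 2.

    def arithmetic_encode(message, probs, order):
        """Narrow [low, high) once per symbol; any value inside is a valid tag."""
        cum, c = {}, 0.0
        for s in order:                      # cumulative lower bound per symbol
            cum[s] = c
            c += probs[s]
        low, high = 0.0, 1.0
        for s in message:
            width = high - low
            high = low + width * (cum[s] + probs[s])
            low = low + width * cum[s]
        return low, high, (low + high) / 2   # final interval and midpoint tag

    probs = {'a1': 0.2, 'a2': 0.2, 'a3': 0.4, 'a4': 0.2}
    low, high, tag = arithmetic_encode(['a1', 'a2', 'a3', 'a3', 'a4'],
                                       probs, ['a1', 'a2', 'a3', 'a4'])
    # low ≈ 0.06752, high ≈ 0.0688, tag ≈ 0.06816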

4.8 Bit plane coding:


Another effective technique for reducing an image's interpixel redundancies is to process the image's bit planes individually. Bit-plane coding is based on the concept of decomposing a multilevel image into a series of binary images and compressing each binary image using one of the existing binary image compression methods.

Bit plane decomposition:


The gray levels of an m-bit gray-scale image can be represented in the form of the base-2 polynomial

    a_{m−1} 2^{m−1} + a_{m−2} 2^{m−2} + ⋯ + a_1 2^1 + a_0 2^0

Based on this property, the image can be decomposed into a collection of binary images by separating the m coefficients of the polynomial into m one-bit bit planes. The zeroth-order bit plane is generated by collecting the a_0 bits of each pixel, the (m−1)st-order bit plane contains the a_{m−1} bits, and so on. The problem with this method is that even small changes in gray level can have a significant impact on the complexity of the bit planes. For example, if two pixels with values 127 and 128 are adjacent, there is a 0-to-1 (or 1-to-0) transition in every bit plane, as shown below:

    127 vs. 128  →  0111 1111 vs. 1000 0000

An alternative approach that overcomes this problem is to use an m-bit Gray code instead of the binary code. The m-bit Gray code g_{m−1} … g_1 g_0 corresponding to the polynomial above can be computed from

    g_i = a_i ⊕ a_{i+1},  0 ≤ i ≤ m−2
    g_{m−1} = a_{m−1}

Here ⊕ denotes the exclusive-OR operation. This code has the unique property that successive code words differ in only one bit position, so small changes in gray level are less likely to affect all m bit planes. For instance, when the gray level shifts from 127 to 128, the Gray code shifts from 0100 0000 to 1100 0000, so only the single highest-order bit plane changes. Gray-coded bit planes are less complex than the corresponding binary bit planes.
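The binary-to-Gray conversion is a one-liner on integers, since g_i = a_i ⊕ a_{i+1} is exactly an XOR of the value with itself shifted right by one bit:

    def to_gray(pixel):
        """m-bit binary to Gray code: g = a XOR (a >> 1)."""
        return pixel ^ (pixel >> 1)

    assert to_gray(127) == 0b01000000    # 0100 0000
    assert to_gray(128) == 0b11000000    # 1100 0000: one bit plane differs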

4.8.1 Run Length Encoding[1-D]

Run-length encoding was developed in the 1950s and, with its extensions, is the standard compression approach in facsimile (FAX) coding. The basic concept is to code each contiguous group of 0's or 1's by its length.

A run is defined as a group of contiguous 1's or 0's, and the run length is the number of symbols in it.

In addition to RLE, variable-length codes can be used to code the run lengths themselves, which results in additional compression. In this case, an approximate run-length entropy of the image is

    H_RL = (H_0 + H_1) / (L_0 + L_1)

where L_0 and L_1 denote the average values of the black and white run lengths, respectively, and H_0 and H_1 denote the entropies of the black and white run lengths, respectively.
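A minimal 1-D run-length encoder for a binary row only needs the starting value and the lengths of the alternating runs; the sketch below is illustrative, not the FAX-standard code.

    def run_lengths(row):
        """Code a binary row as (first value, lengths of the alternating runs)."""
        runs, count = [], 1
        for prev, cur in zip(row, row[1:]):
            if cur == prev:
                count += 1
            else:
                runs.append(count)
                count = 1
        runs.append(count)
        return row[0], runs

    assert run_lengths([0, 0, 0, 1, 1, 0, 1, 1, 1, 1]) == (0, [3, 2, 1, 4])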

4.8.2 2-D Run Length Encoding

The 1-D RLE concepts can be extended to create a variety of 2-D coding procedures. One of the most commonly used 2-D RLE techniques is Relative Address Coding (RAC). RAC is based on the principle of tracking the binary transitions that begin and end each black and white run, coding each line with reference to the previous line.
Illustration
2-D RLE is explained with the example shown below. Let
c - the current transition,
e - the previous transition in the current line,
c' - the similar transition in the previous line after e.

If c is directly below c', the distance d = 0. The shorter of the two distances ec and cc' is coded: if ec < cc', then d = ec is coded; otherwise the distance cc' is coded.

Based on d, the appropriate code is chosen from the table shown below.
(Figure: the previous row and the current row of a binary image, with the transitions e, c and c' marked.)

Distance measured | Distance | Code
c directly under c' | 0 | 0
ec, or cc' (c' to the left) | 1 | 100
cc' (c' to the right) | 1 | 101
ec | d (d > 1) | 111 h(d)
cc' (c' to the left) | d (d > 1) | 1100 h(d)
cc' (c' to the right) | d (d > 1) | 1101 h(d)

Distance range | Code h(d)
1-4 | 0XX
5-20 | 10XXXX
21-84 | 110XXXXXX
85-340 | 1110XXXXXXXX
341-1364 | 11110XXXXXXXXXX
1365-5460 | 111110XXXXXXXXXXXX

where the X's are binary digits giving the offset within the range.

For the example shown:

ec > cc', so cc' is to be coded.
cc' = 4, i.e., d > 1, and c' is to the left of c.
Then, from the table, the code is 1100 h(d).

Determine h(d):
Since cc' = 4 falls in the distance range 1-4, the corresponding code is 0XX, where XX is the binary offset. Since cc' = 4, from the table below, h(d) = 011.

Value | XX
1 | 00
2 | 01
3 | 10
4 | 11

The code for the example shown is therefore 1100 011.
4.9 Lossless predictive coding
Predictive coding is a compression technique based on eliminating interpixel redundancies. Predictive
coding techniques are broadly classified into two types: Lossless Predictive coding and Lossy Predictive
coding.

Lossless predictive coding predicts the value of each pixel from the values of its neighboring pixels. The new information of a pixel is defined as the difference between the actual pixel value and the predicted value of that pixel. Therefore, every pixel is encoded with a prediction error rather than its original value. Typically, the errors are much smaller than the original values, so fewer bits are required to store them.

The figure shows the elements of a lossless predictive coding system. The system consists of an encoder and a decoder, each containing an identical predictor. Let fn be the input pixel value introduced to the encoder. The predictor generates the anticipated value of that pixel based on some number of past inputs. The output of the predictor is then rounded to the nearest integer, denoted f̂n.

The prediction error is then computed as

    en = fn − f̂n

This difference is coded using a variable-length code by the symbol encoder to generate the next element of the compressed data stream. The decoder reconstructs en from the received variable-length code words and performs the inverse operation

    fn = en + f̂n

The prediction is formed as a linear combination of m previous pixels, that is,

    f̂n = round[ Σ_{i=1}^{m} αi f_{n−i} ]

where m is the order of the linear predictor, round is a function denoting the rounding or nearest-integer operation, and the αi are prediction coefficients. In raster-scan applications, the subscript n indexes the predictor outputs in accordance with their time of occurrence. In 1-D linear prediction, f̂n is a function of the previous pixels on the current line alone. In 2-D predictive coding, the prediction is a function of the previous pixels in a left-to-right, top-to-bottom scan of the image. In the 3-D case, it is based on these pixels and on the previous pixels of preceding frames.
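A first-order (m = 1) predictor already shows the idea: each pixel is replaced by its difference from the previous pixel, and the decoder inverts the recursion exactly. The sketch below is a 1-D toy example with α1 = 1; the function names are our own.

    import numpy as np

    def encode_row(f):
        """Prediction error e(n) = f(n) - f(n-1); e(0) = f(0), no past input."""
        f = f.astype(int)
        e = f.copy()
        e[1:] = f[1:] - f[:-1]          # first-order predictor, alpha = 1
        return e

    def decode_row(e):
        f = e.copy()
        for n in range(1, len(e)):
            f[n] = e[n] + f[n - 1]      # f(n) = e(n) + f^(n)
        return f

    row = np.array([100, 102, 101, 105, 110])
    assert np.array_equal(decode_row(encode_row(row)), row)   # lossless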

4.10 Lossy Compression:


Lossy compression is based on the concept of compromising the accuracy of the reconstructed image in
exchange for increased compression.

4.11 Lossy Predictive Coding


Like lossless predictive coding schemes, the basic principle of lossy predictive coding is also the prediction of the current sample from past samples, usually picked from the neighborhood of the current pixel. In lossy predictive coding, a quantizer block is added to the encoder, as shown in the figure. The error in prediction is given by

    en = fn − f̂n

The prediction error is quantized to obtain ėn, which is further encoded using a symbol encoder and transmitted.
At the decoder, the quantized error signal ėn is added to the predicted sample f̂n to generate the reconstructed sample ḟn:

    ḟn = ėn + f̂n

The reconstructed samples are not equal to the original samples because of the introduction of the quantizer. The reconstructed sample (and a set of past reconstructed samples) is used to generate the next predicted sample. Identical predictors must exist at both the encoder and the decoder to prevent error accumulation; the encoder must therefore also derive the reconstructed sample in accordance with the equation above. The encoder thus contains a feedback path to derive the predicted sample, as shown in the figure.

4.11.1 Vector Quantization


VQ is a block coding technique that quantizes blocks of data instead of single samples. VQ exploits the correlation existing between neighboring samples by quantizing them together.
Encoder
At the encoder, the input image is partitioned into a set of non-overlapping image blocks. The encoder consists of a code book with corresponding indices and a search engine. Each image block is fed to the search engine, which finds the closest code word for the given block in the code book. The closest code word for a given block is the one in the code book that has the minimum squared Euclidean distance from the input block. The index corresponding to the closest code word is transmitted. Thus a high rate of compression is achieved, since instead of transmitting the image block, only the index is transmitted.

Decoder
 At the decoding end, the decoder has the same code book as the transmitter. As the decoder receives the index of a code word, it retrieves the code word corresponding to that index from the code book.

Note: The performance of VQ depends directly on the code book size and the vector size. The code book used should be optimal to reduce compression error.
(Block diagram: Encoder — input vector → search engine + code book → indices; Decoder — indices → code book → output vector.)
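Encoding is just a nearest-neighbor search over the code book. A vectorized sketch follows (the function names are our own; in practice the code book would be trained, e.g., with the LBG algorithm):

    import numpy as np

    def vq_encode(blocks, codebook):
        """Index of the code word with minimum squared Euclidean distance."""
        # blocks: (N, k) flattened image blocks; codebook: (C, k) code words
        d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)            # only these indices are transmitted

    def vq_decode(indices, codebook):
        return codebook[indices]           # simple table look-up at the receiver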

Types of Vector Quantization Techniques

1. Tree-search vector quantization
2. Multistage vector quantization
3. Mean-removed vector quantization
4. Gain-shape vector quantization
5. Classified vector quantization
6. Hierarchical vector quantization
7. Interpolative vector quantization
8. Lapped vector quantization
9. Lattice vector quantization
4.11.2 Transform Coding
Transform coding is a frequency domain method which modifies the transform of an image to
achieve compression.
Encoder
Input image → Construct n×n sub images → Forward transform → Quantizer → Symbol encoder → Compressed image

Step 1: Construct n × n sub images

It is very difficult to apply a transform to the entire image at once. Hence, for ease of operation, the image is subdivided into manageable blocks of size 8 × 8 or 16 × 16; thus the N × N image is subdivided into n × n sub images.

Step 2: Forward transform
The sub images are then converted into the frequency domain using any reversible linear transform.

Transform selection: Generally the DFT, DCT, Walsh-Hadamard transform (WHT) or KLT can be used. The goal of the transform is to pack as much information as possible into the smallest number of transform coefficients. The choice of a particular transform in a given application depends on the amount of reconstruction error that can be tolerated and the computational resources available.

DFT - Reasonable energy compaction, but leads to visible blocking artifacts.
DCT - Good energy compaction with reduced blocking artifacts; the usual practical choice.
WHT - Simple computation, but poor energy compaction.
KLT - Best energy compaction (better than the DCT), but data dependent and hence computationally complex; rarely used.

Thus, using one of these transforms, each sub image is transformed to the frequency domain and its energy is packed into a few coefficients.

Quantizer:
Compression is achieved in this stage. The quantizer quantizes the transformed coefficients: it selectively eliminates, or more coarsely quantizes, the coefficients that carry the least information.
Symbol Encoder:
The symbol encoder encodes the quantized values using a variable-length code.
Decoder
The decoder consists of following blocks

Compressed image → Symbol decoder → Inverse transform → Merge n×n sub images → Decompressed image

Symbol decoder:
It decodes the variable length coded data to obtain the transform coefficients.

Inverse transform
The transform coefficients are then subjected to inverse transform to obtain their pixel values.
Block Merger
The nxn sub images are decompressed individually and finally the merger merges them to obtain the
entire image [i.e. decompressed image output].

Blocking Artifacts
It is the block like appearance that results when the boundaries between sub images become visible
after merging the decompressed subimages to get back the entire image.

Types of transform coding


Transform coding is broadly classified into two types
i) Adaptive transform coding and ii) Non-adaptive transform coding.

If the transform coding steps adapt to the local image content, the scheme is known as adaptive transform coding; if the steps are fixed for all images, it is known as non-adaptive transform coding.

Bit Allocation
The process of truncating, quantizing and coding the coefficients of a transformed sub image is commonly called bit allocation. There are two types: zonal coding and threshold coding.

Zonal coding
In this method the retained coefficients are selected on the basis of maximum variance. A zonal
mask is constructed by placing a ‘1’ in the locations of max variance and a ‘0’ in all other locations.
Coefficients of maximum variance are usually located around the origin of an image transform, resulting
in a typical mask as shown below.
1 1 1 1 1 0 0 0
1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Threshold coding

Threshold coding is adaptive in the sense that the locations of the transform coefficients retained vary from one sub image to another. For any sub image, the transform coefficients of largest magnitude make the most significant contribution to the reconstructed image.

There are three types of threshold coding:
i) A single global threshold is applied to all sub images. The level of compression differs from image to image, so a variable code rate is achieved.
ii) A different threshold is used for each sub image, chosen so that the N largest coefficients are retained (N-largest coding). The code rate is constant.
iii) The threshold is varied as a function of the location of each coefficient within the sub image. This also achieves a variable code rate.

A typical threshold coding mask:

1 1 1 0 0 0 0 0

1 1 1 0 0 0 0 0

0 0 1 0 1 0 0 0

0 0 0 1 1 0 0 0

0 1 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0
4.12 Image Compression Standards (or) Continuous-Tone Image Compression Standards [JPEG]
 Used to compress monochrome and color images.
 Based on lossy transform coding.

There are three types: JPEG (DCT based), JPEG-LS (adaptive prediction scheme) and JPEG 2000 (wavelet based).

4.12.1 JPEG [Joint Photographic Experts Group]


JPEG is one of the most popular and comprehensive compression standards. It defines three different coding systems:
i) Lossy baseline coding system (sequential baseline coding), used in almost all compression applications.
ii) Extended coding system, used for greater compression and higher precision.
iii) Lossless independent coding system, used where reversible compression is required.

JPEG Encoder:
A typical JPEG encoder is shown in the figure below

Input image → 8×8 block extractor → Forward DCT → Normalizer/Quantizer → Symbol encoder → Compressed image

Step 1: Block Extraction

The first step is to subdivide the input image into non-overlapping pixel blocks of size 8x8, which are processed left to right, top to bottom. Each 8x8 block or sub image contains 64 pixel values. These 64 pixel values are level shifted by subtracting 2^{m-1} from each value, where 2^m is the maximum number of gray levels. Level shifting is done so that, for m = 8, the pixel values lie between -128 and +127.

Step 2: Compute forward DCT

In the second step, the 2-D DCT of each sub image is calculated using the formula

T(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]

where

\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u \neq 0 \end{cases} \qquad \alpha(v) = \begin{cases} \sqrt{1/N}, & v = 0 \\ \sqrt{2/N}, & v \neq 0 \end{cases}

Step 3: Normalizer/Quantizer

The resulting DCT coefficients of step 2 are then simultaneously normalized and quantized using the relation

\hat{T}(u,v) = \text{round}\left[\frac{T(u,v)}{Z(u,v)}\right]

where \hat{T}(u,v) is the normalized, quantized coefficient array and Z(u,v) is the normalization array. A sketch of steps 1-3 is given below.
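
The sketch below illustrates steps 1-3 for a single block. It is a minimal illustration, assuming SciPy's orthonormal DCT (which matches the formula above) and treating the normalization array Z as a supplied input rather than reproducing the standard's table:

```python
import numpy as np
from scipy.fftpack import dct

def encode_block(block, Z, m=8):
    """Steps 1-3 for one 8x8 block: level shift, 2-D DCT, normalize/quantize."""
    shifted = block.astype(float) - 2 ** (m - 1)        # step 1: level shift
    T = dct(dct(shifted, axis=0, norm='ortho'),         # step 2: separable
            axis=1, norm='ortho')                       #   2-D DCT
    return np.round(T / Z).astype(int)                  # step 3: round(T / Z)
```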

Step 4: Symbol Encoder


 In this step the elements of \hat{T}(u,v) are encoded using a variable length code. To encode, the elements are first reordered in accordance with the zig-zag pattern, as sketched below.
 \hat{T}(0,0) is called the DC coefficient and the other elements are called the AC coefficients.
 The DC coefficient is difference coded relative to the DC coefficient of the previous sub image.
 The non-zero AC coefficients are coded using a variable length code that defines each coefficient's value and number of preceding zeros. The codes are generated using a Huffman code look-up table.
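
The zig-zag reordering can be expressed compactly by sorting the (u, v) indices along anti-diagonals, alternating direction. This is a sketch only; real codecs use a fixed lookup table:

```python
import numpy as np

def zigzag(block):
    """Reorder an n x n block along anti-diagonals, alternating direction.
    Element 0 of the result is the DC coefficient."""
    n = block.shape[0]
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda t: (t[0] + t[1],
                                  -t[1] if (t[0] + t[1]) % 2 else t[1]))
    return np.array([block[u, v] for u, v in order])
```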

Decoder

Symbol decoder:
Since the compressed bit stream is a Huffman coded binary sequence, it is instantaneous and uniquely decodable. The symbol decoder decodes the binary sequence using a look-up table similar to the one used in the encoder. The recovered data is denoted \hat{T}(u,v).

Denormalizer:
Since quantization is irreversible, only denormalization is done in this step. The denormalized matrix is obtained by multiplying \hat{T}(u,v) by the normalization array Z(u,v):

T(u,v) = \hat{T}(u,v) \cdot Z(u,v)

The denormalized array is subjected to the inverse DCT to obtain the reconstructed sub image in the spatial domain. To the pixel values obtained, 2^{m-1} is added to reverse the level shifting. Thus in this stage the 8x8 sub image blocks are reconstructed, as sketched below.
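
A matching decoder-side sketch for one block, under the same assumptions as the encoder sketch above (SciPy's orthonormal DCT, normalization array Z supplied):

```python
from scipy.fftpack import idct

def decode_block(That, Z, m=8):
    """Denormalize, inverse 2-D DCT, and undo the level shift for one block."""
    T = That * Z                                        # denormalize
    f = idct(idct(T.astype(float), axis=0, norm='ortho'),
             axis=1, norm='ortho')                      # inverse 2-D DCT
    return f + 2 ** (m - 1)                             # reverse level shift
```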
8x8 block merger:
All the reconstructed 8x8 sub image blocks are merged to obtain the complete reconstructed
image.

4.12.2 JPEG 2000


Extends the initial JPEG standard to provide increased flexibility in both the compression of still
images and access to the compressed data.

It is based on wavelet coding. Since the wavelet transform can be applied directly to the entire image, the block extraction step is not required here.

JPEG 2000 Encoder

The first step of the encoding process is to level shift the pixels of the image by subtracting 2^{m-1} from each pixel value, where 2^m is the number of gray levels in the image.

In the second step, the one-dimensional DWT of the rows and columns of the image is computed, using either the fast wavelet transform (FWT) or a lifting-based approach. The first level of decomposition splits the image into four sub bands: LL, LH, HL and HH. Further decompositions are obtained by transforming the LL sub band successively, splitting the image into multiple bands. As the number of decomposition levels increases, the compression ratio increases, and so does the reconstruction error. The standard does not specify the number of decomposition levels.

Quantization:
The LL sub band coefficients carry the most important visual information and are therefore quantized with the finest step size. Any transform coefficient a_b(u,v) of sub band b is quantized to the value q_b(u,v) using

q_b(u,v) = \operatorname{sign}\left[a_b(u,v)\right] \cdot \left\lfloor \frac{|a_b(u,v)|}{\Delta_b} \right\rfloor

where \Delta_b is the quantization step size of sub band b.
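
As a minimal sketch, this dead-zone quantizer is a single NumPy expression; the step size delta is assumed to be chosen per sub band:

```python
import numpy as np

def quantize(a, delta):
    """Dead-zone scalar quantizer: q = sign(a) * floor(|a| / delta)."""
    return np.sign(a) * np.floor(np.abs(a) / delta)
```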

Symbol Encoder:

 Coefficients of each transformed sub band components are arranged into rectangular
blocks called code blocks.
 Starting from the most significant bit plane with a non zero element, each bit plane is
processed in three passes.
 The outputs are then arithmetically coded
 Code blocks with similar passes are grouped together to form layers.
 The resulting layers are finally partitioned into packets, which are the fundamental unit of the encoded code stream.

JPEG 2000 Decoder:

The received packetized code stream is decoded using the inverse of the operations described in the encoder. The layers are extracted from the packets and are arithmetically decoded.
Let Mb → the number of bit planes encoded, and
Nb → the number of bit planes selected by the user to be decoded.
When Nb < Mb, only the Nb most significant bit planes of each coefficient are reconstructed, which is equivalent to decoding with a coarser quantization step.

Dequantization:
The decoded coefficients \bar{q}_b(u,v) are dequantized using the relation

R_b(u,v) = \begin{cases} \left(\bar{q}_b(u,v) + 2^{\,M_b - N_b(u,v)}\right)\Delta_b, & \bar{q}_b(u,v) > 0 \\ \left(\bar{q}_b(u,v) - 2^{\,M_b - N_b(u,v)}\right)\Delta_b, & \bar{q}_b(u,v) < 0 \\ 0, & \bar{q}_b(u,v) = 0 \end{cases}

where R_b(u,v) is the dequantized transform coefficient and N_b(u,v) is the number of bit planes of that coefficient that were actually decoded.
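
A sketch of this reconstruction rule, assuming for simplicity that Mb and Nb are constant over a sub band:

```python
import numpy as np

def dequantize(q, delta, Mb, Nb):
    """Reconstruct sub band coefficients from quantizer indices when only
    the Nb most significant of the Mb encoded bit planes were decoded."""
    offset = 2.0 ** (Mb - Nb)
    return np.where(q > 0, (q + offset) * delta,
           np.where(q < 0, (q - offset) * delta, 0.0))
```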

The dequantized transform coefficients are then inverse transformed using the IFWT filter bank.

Finally, the output of the IFWT filter bank is level shifted by adding 2^{m-1} to obtain the decompressed image.

4.12.3 MPEG [Video Compression Standards]


 Includes methods to reduce temporal or frame to frame redundancies.
 MPEG (Motion Picture Experts Group) developed the standards for multimedia video
compression and is known as MPEG standards.

MPEG has recommended four main standards: MPEG-1, MPEG-2, MPEG-4 and MPEG-7.

MPEG- 1
 It is an entertainment quality coding standard.
 Developed for storage and retrieval of video on digital media such as CDROM.
 It supports bit rates of about 1.5 Mbit/s.
MPEG-2
 This standard addresses applications involving video quality in the range between NTSC/PAL
and CCIR 601 standards.
 It supports bit rates in the range of 2-10 Mbit/s.
 It is used in cable TV distribution and narrow channel satellite broadcasting.

MPEG -4
 Both MPEG-1 & MPEG-2 aim to provide an efficient storage and transmission of digital audio
and video materials.
 Whereas MPEG-4 provides
 Improved video compression efficiency
 Content-based interactivity such as AV object based access
 Efficient integration of natural and synthetic data
 Increased robustness in error-prone environments
 Universal access
 Ability to add or drop AV objects, and
 Object resolution capability

MPEG-4 provides a bit rate of 5-64 kbit/s for mobile and PSTN [Public Switched Telephone Network] applications and up to 4 Mbit/s for TV and film applications.
 It supports both constant and variable bit rate coding.
 It also supports internet and various multimedia applications.

MPEG-7
 It supports various multimedia applications, particularly search engines: a description of the structure and content of the compressed multimedia information is stored.

MPEG Encoder
 Figure shows a typical MPEG encoder.
 It is a hybrid block-based DPCM/DCT coding scheme.
 It exploits redundancies within and between adjacent video frames, motion uniformity between frames and the psychovisual properties of the HVS.
 The input of the encoder is an 8×8 array of pixels called an image block.

A macroblock is defined as a 2×2 array of image blocks. A slice is a row of non-overlapping macroblocks. For color video the macroblock is composed of four luminance blocks y1, y2, y3, y4 and two chrominance blocks Cb and Cr. Because the eye has far less spatial acuity for color than for luminance, the chrominance components are subsampled: four luminance blocks are coded for each Cb and Cr block.
Operation
The input given to the encoder can be of two types: either a conventional image block, or the difference between a conventional image block and its prediction based on previous and/or future video frames. The difference block is transformed using the DCT, the transform coefficients are quantized, and variable length coding is done to get the output. The encoded output frames can be of three types:

1. I-frame [intra frame or independent frame]
2. P-frame [predictive frame]
3. B-frame [bidirectional frame]

[Figure: block diagram of a typical MPEG encoder]

I frame: [Intra frame or Independent frame]


 An I-frame is compressed independently of all previous and future video frames.
 It highly resembles a JPEG encoded image.
 I frame is the reference for the motion estimation needed to generate subsequent P and B frames.
 It provides the highest degree of random access, ease of editing and greatest resistance to the propagation
of error.
 Thus all standards require periodic insertion of I frames into the compressed code stream.

P-frame [Predictive frame]


 A P-frame is the compressed difference between the current frame and a prediction of it based on the
previous I or P frame.
 The difference is formed in the leftmost summer of the figure.
 The prediction is motion compensated and the motion vector is computed.
 The computed motion vector is variable length coded and transmitted as an integral part of the encoded
data stream.
 Motion estimation is carried out on the macro block level.

B-frame [Bidirectional frame]
 A B-frame is the compressed difference between the current frame and a prediction of it based on
the previous I or P frame and next P frame.
 Since both past and future frames are required to predict ‘B’ frame there is increase in encoding
and decoding delay.
 Compression achieved is very high.
The decoder must have access to both past and future reference frames. The encoded frames are
therefore reordered before transmission. The decoder reconstructs and displays them in the proper
sequence.
A rate controller is used to adjust the quantization parameters so that the generated bit stream matches the video channel capacity. The rate controller works based on the content of the output buffer: as the buffer becomes full, the quantization is made coarser, so that fewer bits stream into the buffer.
Example problems

1. Decode the message 0.23355 given the coding model.

Symbol A E I O U !
Probability 0.2 0.3 0.1 0.2 0.1 0.1

Solution: Since the given message is a decimal number in [0, 1), the coding method is arithmetic coding. From the cumulative probabilities, the initial intervals are a: [0, 0.2), e: [0.2, 0.5), i: [0.5, 0.6), o: [0.6, 0.8), u: [0.8, 0.9) and !: [0.9, 1.0).

Step 1: Since the message value 0.23355 lies in the range 0.2 to 0.5, the first symbol is 'e'. Subdivide this interval; each line below gives the upper limit of the corresponding symbol's sub range.
d = 0.5 - 0.2 = 0.3
Range of 'a' = 0.2 + 0.2(0.3) = 0.26
'e' = 0.26 + 0.3(0.3) = 0.35
'i' = 0.35 + 0.1(0.3) = 0.38
'o' = 0.38 + 0.2(0.3) = 0.44
'u' = 0.44 + 0.1(0.3) = 0.47
'!' = 0.47 + 0.1(0.3) = 0.5
Step 2:
Since 0.23355 lies between 0.2 and 0.26, the next symbol is 'a'.
d = 0.26 - 0.2 = 0.06
Range of 'a' = 0.2 + 0.2(0.06) = 0.212
'e' = 0.212 + 0.3(0.06) = 0.230
'i' = 0.230 + 0.1(0.06) = 0.236
'o' = 0.236 + 0.2(0.06) = 0.248
'u' = 0.248 + 0.1(0.06) = 0.254
'!' = 0.254 + 0.1(0.06) = 0.26

Step 3:
Since 0.23355 lies between 0.230 and 0.236, the next symbol is 'i'.
d = 0.236 - 0.230 = 0.006
Range of 'a' = 0.230 + 0.2(0.006) = 0.2312
'e' = 0.2312 + 0.3(0.006) = 0.2330
'i' = 0.2330 + 0.1(0.006) = 0.2336
'o' = 0.2336 + 0.2(0.006) = 0.2348
'u' = 0.2348 + 0.1(0.006) = 0.2354
'!' = 0.2354 + 0.1(0.006) = 0.236

Step 4:
Since 0.23355 lies between 0.2330 and 0.2336, the next symbol is 'i'.
d = 0.2336 - 0.2330 = 0.0006
Range of 'a' = 0.2330 + 0.2(0.0006) = 0.23312
'e' = 0.23312 + 0.3(0.0006) = 0.23330
'i' = 0.23330 + 0.1(0.0006) = 0.23336
'o' = 0.23336 + 0.2(0.0006) = 0.23348
'u' = 0.23348 + 0.1(0.0006) = 0.23354
'!' = 0.23354 + 0.1(0.0006) = 0.2336
Step 5:
Since 0.23355 lies between 0.23354 and 0.2336, and the symbol in that range is '!', the end of the message is indicated.
The decoded message is eaii!
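
The whole procedure can be verified with a short sketch. The interval table comes from the given probabilities; the function itself is only an illustrative decoder, not a production arithmetic coder:

```python
def arithmetic_decode(value, intervals, length):
    """Decode `length` symbols from a single number in [0, 1)."""
    low, high = 0.0, 1.0
    message = []
    for _ in range(length):
        d = high - low
        for sym, (s_lo, s_hi) in intervals.items():
            if low + s_lo * d <= value < low + s_hi * d:
                low, high = low + s_lo * d, low + s_hi * d
                message.append(sym)
                break
    return ''.join(message)

intervals = {'a': (0.0, 0.2), 'e': (0.2, 0.5), 'i': (0.5, 0.6),
             'o': (0.6, 0.8), 'u': (0.8, 0.9), '!': (0.9, 1.0)}
print(arithmetic_decode(0.23355, intervals, 5))   # -> eaii!
```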

2. Decode the message 00010110011011000 for the given symbols and probability.

Symbol a l m y
Probability 4/9 2/9 2/9 1/9

Solution: Since the message is a binary string, first construct the Huffman code for the given source. One valid assignment is a = 1, l = 01, y = 001, m = 000.

Since a Huffman code is instantaneous and uniquely decodable, scan the bit stream from left to right and match the code words:

000 | 1 | 01 | 1 | 001 | 1 | 01 | 1 | 000
m     a   l    a   y     a   l    a   m

The decoded message is 'malayalam'.
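
The table-walk decoding can be verified with a few lines; the code table is the one constructed above:

```python
def huffman_decode(bits, codes):
    """Scan left to right; a prefix code makes every match unambiguous."""
    inverse = {v: k for k, v in codes.items()}
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ''
    return ''.join(out)

codes = {'a': '1', 'l': '01', 'y': '001', 'm': '000'}
print(huffman_decode('00010110011011000', codes))   # -> malayalam
```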

3. Construct Huffman code for the word ‘committee’. Construct the code word for committee

Solution
Total number of letters in the word = 9.
Determine the probability of each letter in the word, e.g. c = 1/9, e = 2/9, and so on; then build the Huffman tree and read off a code word for each letter.

Answer: 00100011010000011110101
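
Because ties in the Huffman construction can be broken in more than one way, the exact bit pattern is not unique. The sketch below builds one valid code for 'committee' and encodes the word, so the total code length (23 bits, matching the answer above) can be checked even if individual bits differ:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build one valid Huffman code for the symbols of `text`."""
    tiebreak = count()                    # keeps heap comparisons on numbers
    heap = [(f, next(tiebreak), {s: ''}) for s, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes('committee')
print(codes)                                        # one valid code table
print(''.join(codes[ch] for ch in 'committee'))     # 23 bits in total
```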
