Image Compression Based On Compressive Sensing: End-to-End Comparison With JPEG
Abstract—We present an end-to-end image compression system based on compressive sensing. The presented system integrates the conventional scheme of compressive sampling and reconstruction with quantization and entropy coding. The compression performance, in terms of decoded image quality versus data rate, is shown to be comparable with JPEG and significantly better at the low rate range. We study the parameters that influence the system performance, including (i) the choice of sensing matrix, (ii) the trade-off between quantization and compression ratio, and (iii) the reconstruction algorithms. We propose an effective method to jointly control the quantization step and compression ratio in order to achieve near optimal quality at any given bit rate. Furthermore, our proposed image compression system can be directly used in the compressive sensing camera, e.g. the single pixel camera, to construct a hardware compressive sampling system.

Index Terms—Compressive sensing, image compression, quantization, entropy coding, sparse coding, reconstruction, JPEG, JPEG2000.

The authors are with Nokia Bell Labs, 600 Mountain Avenue, Murray Hill, NJ, 07974, USA, [email protected]. The MATLAB code used to generate the results in this paper can be downloaded at the CS vs. JPEG Demo, and more results are available at [1].

I. INTRODUCTION

Compressive Sensing (CS) [2], [3], [4] was proposed more than a decade ago as a method for dimensionality reduction of signals which are known to be sparse or compressible in a specific basis representation. By “sparse” we mean that only a relatively small number of the coefficients of the representation are non-zero, whereas “compressible” indicates that the magnitude of the coefficients decays quickly, according to a power law, hence the signal can be well approximated by a sparse signal. In the CS paradigm, the signal is projected onto a low-dimension space, resulting in a measurements vector. If the signal is sparse, it is possible to exactly reconstruct it from the measurements vector. If the measurements are noisy or the signal is not sparse but compressible, the reconstruction yields an approximation to the original signal. Natural images are inherently compressible in the frequency or wavelet domain and therefore suitable for CS. The past five years saw impressive progress in this field, with new reconstruction algorithms [5], [6], [7] achieving better reconstructed image quality at a lower compression ratio (the ratio of the dimension of the measurements vector to the number of pixels in the original image). These algorithms go beyond sparsity and leverage other properties of natural images, such as having low rank [6], or being capable of denoising [7].

Encouraged by these achievements, we set out to create an end-to-end image compression system based on CS. In itself, CS is not a complete signal compression system because its “compressed signal”, the measurements vector, is an array of real numbers rather than a sequence of bits or bytes. Thus, in order to build a complete system we added a source coding stage, in which the measurements are quantized, and a channel coding stage, which converts the quantized measurements into a byte sequence using entropy coding.

A. Related Work

Goyal et al. [8] applied an information-theoretic approach to assess the effectiveness of a CS based compression system for sparse signals x ∈ R^N with only K non-zero entries, a.k.a. K-sparse. Their benchmark was the “baseline” method, where the coded bit sequence consisted of the sparsity pattern (i.e. a specification of the indices of the non-zero entries in x) and the quantized and coded non-zero entries. They showed that the rate-distortion functions of a CS-based compression system are considerably worse than those of the baseline method, for two reasons. First, the number of measurements M required by CS to recover x is several times larger than K, M/K ≥ log(N/K), and the number of bits needed to represent the additional M − K quantized variables exceeds the number of bits needed to specify the sparsity pattern, especially when the number of bits per measurement is high. Second, the quantization noise in the baseline method is proportional to K, whereas in CS with a random sensing matrix it is proportional to M. Goyal et al. suggest that the use of distributed lossless coding and entropy-coded dithered quantization might potentially alleviate those problems, but the complexity added by those methods would probably make them impractical.

Despite this pessimistic outlook, the effect of quantization on CS measurements received significant attention in the last few years. Laska and Baraniuk studied the trade-off between the number of measurements and quantization accuracy [9] and showed that the sensitivity of the reconstruction accuracy to the quantization noise varies dramatically with the compression ratio. Laska et al. showed that even a small number of saturated measurements, i.e. measurements whose quantization error is not bounded, may cause a considerable degradation in reconstruction accuracy [10]. Methods for accounting for the quantization effect in the reconstruction algorithm were studied in [11]. The extreme case of a 1-bit quantizer was investigated in [9], [12], [11]. The asymptotic normality and approximate independence of measurements generated by various sensing matrices were shown in [13]. Dai et al. compared various quantizer designs and studied their effect on reconstruction accuracy [14], [15]. The distribution of measurements generated by video and image CS systems, which included quantization, was also described in [16], [17]. However, this significant body of research was of limited value for our purposes. First, these works assume a random, or a
Fig. 2. Image compression architecture comparison between proposed CSbIC (top) and JPEG (bottom).
ternative to the conventional methods (refer to Figs. 4–6 for comparison with JPEG).

The rest of this paper is organized as follows. Sec. II describes the system architecture and discusses the design choices in each component. Sec. III provides the general framework of reconstruction algorithms. Sec. IV presents the results of our performance testing and Sec. V discusses the implications of our results.

II. SYSTEM ARCHITECTURE

A diagram of the system architecture is given in Fig. 2. Each of the encoding steps is matched by a corresponding decoding step (in reverse order), with the exception of the bit rate/quality control block, which appears only in the encoder. In the following we present a detailed description of each of those processing steps.

A. Sensing Matrix and Measurements Generation

We consider monochromatic images of Nv × Nh pixels. The pixels of the input image are organized column by column as a pixel vector x ∈ R^N (N = Nv·Nh). The pixel vector x is multiplied by a sensing matrix Φ ∈ R^{M×N}, M ≪ N, yielding the measurements vector

y = Φx. (1)

Φ is quite a large matrix, even for small images. Therefore, for practical reasons, Φ is never stored, and the operation (1) is implemented as a fast transform. It is well known that a 2-dimensional discrete cosine transform (2D-DCT) is very effective in decorrelating an image, and most of the energy of the image is concentrated in the low frequency transform coefficients. Recently, new sensing matrices were introduced which leverage this property [22], [29]. These matrices are not incoherent with the common sparsity bases of natural images, as classical CS theory would require for guaranteeing robust reconstruction [2], [4], but generally they perform better than the classical sensing matrices in our application, i.e. image compression. For example, (1) can be implemented by performing a 2D-DCT on the image pixels, reordering the resulting 2D-DCT coefficients in a “zig-zag” order (similar to the one used in JPEG encoding) and selecting the first M low-frequency coefficients [22], as sketched below.
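The following MATLAB sketch makes this measurement operator concrete. It is a minimal illustration under our own conventions, not the code of [1]: the helper names (sense_dct, dct_matrix, zigzag_indices) are ours, and the traversal direction within each anti-diagonal is only “similar to” the JPEG scan.

function y = sense_dct(X, M)
% Measurements y = Phi*x of (1) for the deterministic 2D-DCT sensing
% matrix: 2D-DCT of the image, zig-zag reordering, first M coefficients.
[nv, nh] = size(X);
T = dct_matrix(nv) * double(X) * dct_matrix(nh)';  % 2D-DCT of the image
idx = zigzag_indices(nv, nh);                      % zig-zag scan order
t = T(idx);
y = t(1:M);                                        % keep the low frequencies
end

function C = dct_matrix(n)
% Orthonormal DCT-II matrix, so that C*v is the 1D DCT of a column v.
k = (0:n-1)'; j = 0:n-1;
C = sqrt(2/n) * cos(pi * (2*j + 1) .* k / (2*n));
C(1,:) = C(1,:) / sqrt(2);
end

function idx = zigzag_indices(nv, nh)
% Linear indices of an nv-by-nh matrix in zig-zag (anti-diagonal) order;
% the first index is the DC coefficient.
[r, c] = ndgrid(1:nv, 1:nh);
d = r + c;                                   % anti-diagonal number
t = r;
t(mod(d, 2) == 0) = -t(mod(d, 2) == 0);      % alternate scan direction
[~, idx] = sortrows([d(:), t(:)]);
end

For the 256 × 256 test images of Sec. IV, y = sense_dct(X, round(R*numel(X))) realizes a compression ratio of R.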
In some applications, such as the single-pixel camera and the lensless camera [20], [21], [22], a binary-valued matrix, e.g. a matrix whose entries are only ±1 (or {0, 1}), is more suitable for hardware implementation. In this case we approximate the 2D frequency decomposition by using a 2D Walsh-Hadamard transform (2D-WHT) [30]. Let

W = Wh ⊗ Wv, (2)

where ⊗ denotes the Kronecker product [31] and Wh, Wv are Walsh-Hadamard matrices in sequency order [32], that is, the k-th row of Wh, Wv has k − 1 zero crossings (if Nv or Nh is not a power of two, the image is padded by zero pixels). Similar to the 2D-DCT case, the selected measurements are the first M coefficients of the zig-zag ordered entries of Wx [29]. Note that Wx can also be computed numerically in an efficient way, using the Fast Walsh-Hadamard Transform.

We can get the CS theoretical guarantee for successful reconstruction, w.h.p. (with high probability), by replacing the deterministic matrices described above with random matrices whose entries are independent, identically distributed (IID) random variables (RVs), with Gaussian or Rademacher distributions [33], [34]. These matrices are universal, i.e. w.h.p. they are incoherent with any given sparsity basis.
more, the measurements generated by those random matri- If Q(y) = c and |c| < L, we define the dequantizer by
ces are mutually independent and asymptotically normally
Q−1 (c) = cs + µ, (5)
distributed, which is helpful in the quantization and coding
design. Such fully random matrices do not allow fast transform hence the quantization error is bounded by
implementation of (1), but similar desired properties and
performance guarantees were shown for structurally random |y − Q−1 (Q(y))| ≤ 0.5s. (6)
matrices (SRM) [13], [35], [36], where Φx is obtained by On the other hand, if |c| = L, the quantized measurement
applying a randomly selected permutation to x, computing a is saturated and the quantization error cannot be bounded.
fast transform, such as DCT or WHT, on the permuted vector, Even a small number of saturated measurements can cause
and randomly selecting M of the transform coefficients as severe quality degradation unless they are specially handled
measurements (the DC coefficient is always selected). We de- in the reconstruction [10]. The simplest way to do it is by
note these matrices SRM-DCT and SRM-WHT, respectively. not using the saturated measurements at all; attempts to mod-
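An SRM measurement can be sketched in a few lines of MATLAB. This is our own minimal illustration of the SRM-DCT construction, not the implementation of [35], [36]; dct is the 1D DCT of the Signal Processing Toolbox.

% SRM-DCT measurement: random permutation, fast transform, random
% selection of M coefficients (the DC coefficient is always kept).
% x is the pixel vector of length N.
perm = randperm(N);                     % randomly selected permutation
t = dct(x(perm));                       % fast transform on the permuted vector
sel = [1, 1 + randperm(N - 1, M - 1)];  % M coefficients, DC always selected
y = t(sel);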
B. Quantization

The quantizer maps the measurements vector y into a finite sequence of codewords taken from a finite codebook C, and the dequantizer maps the codewords sequence into a measurements vector which is an approximation of the original measurements vector. If the sensing matrix is deterministic the measurements are highly uncorrelated; if it is a SRM the measurements are nearly independent. Hence the advantage of vector quantization [37] over scalar quantization is small and does not justify its added complexity [14]. Therefore, we consider a scalar quantizer which maps each measurement y_i, i = 1, . . . , M, to a codeword q_i = Q_i(y_i) ∈ C, where Q_i : R → C is the quantizer of the i-th measurement. In this work we use the same quantizer for all measurements, hence in the following we omit the subscript i from Q_i.

The simplest scalar quantizer is the uniform quantizer. We select the “mid-tread” type, defined by

Q(y) := max(−L, min(L, Q̃(y))), (3)
Q̃(y) := ⌊(y − µ)/s + 0.5⌋, (4)

where ⌊y⌋ denotes the largest integer not exceeding y, µ = (1/M) Σ_{i=1}^{M} y_i is the mean of the measurements, s is the quantizer's step, Q̃(y) is the unclipped quantized value, and L is a positive integer which determines the range s(L − 0.5) of the actual quantizer Q(y). Consequently there are 2L + 1 codewords, C = {−L, · · · , L}. Since the distribution of the measurements is highly non-uniform, the codewords distribution is also not uniform, hence in order to represent codewords effectively by a bit sequence we need to use variable length coding (VLC) in the channel coder. On the other hand, an optimal quantizer (in the mean square sense) or an entropy constrained quantizer [37] usually results in nearly equally populated quantization regions, which makes it possible to use fixed length coding (FLC) with little data rate penalty. Thus the design choice is between a simple quantizer with a sophisticated VLC, versus a sophisticated quantizer with a simple FLC. We opted for the first option because designing an optimal quantizer requires knowledge of the measurements distribution, which is difficult to estimate for deterministic sensing matrices. Another reason to use a uniform quantizer is that the reconstruction may be sensitive to the presence of even a few measurements with large errors [10], which is often the case with non-uniform quantizers.

If Q(y) = c and |c| < L, we define the dequantizer by

Q^{-1}(c) = cs + µ, (5)

hence the quantization error is bounded by

|y − Q^{-1}(Q(y))| ≤ 0.5s. (6)

On the other hand, if |c| = L, the quantized measurement is saturated and the quantization error cannot be bounded. Even a small number of saturated measurements can cause severe quality degradation unless they are specially handled in the reconstruction [10]. The simplest way to do it is by not using the saturated measurements at all; attempts to modify the reconstruction algorithm to use those measurements showed little gain over simply discarding them. Another option (not considered in [10]) is to code the value of Q̃(y) for each saturated measurement y in some ad hoc method and transmit it as additional information. In both cases saturated measurements incur a penalty, either in the form of transmitted codewords which are not used, or as ad hoc transmission of Q̃(y). Therefore, we select L large enough to make saturation a rare event. In fact, L can be set sufficiently large to eliminate saturation completely, but a very large codebook may have an adverse effect on channel coding (see Sec. II-D3). We found that a good trade-off is to select L so that the quantizer's range s(L − 0.5) is about 4 standard deviations of the measurements. However, the system is not very sensitive to this parameter — performance does not change much if the range is 3 or 6 standard deviations. We also compared ignoring saturated measurements to sending Q̃(y) for each of them. We chose the latter because it performed slightly better and had the important advantage of not requiring any change in the reconstruction algorithm.

With all the sensing matrices considered, the first measurement y_0 is the DC coefficient, which is the sum of all pixels in the image. Since the pixels are unsigned, y_0 is much larger than the other measurements and is always saturated, therefore it requires special handling: y_0 is excluded when calculating the mean (µ) and the standard deviation of the measurements, and Q(y_0) = L is not included in the quantized measurements. Instead, Q̃(y_0) is coded in an ad hoc fashion and transmitted separately.

Unless the quantization is very coarse, its effect on the measurements can be modeled as adding white noise, uniformly distributed in [−s/2, s/2], which is uncorrelated with the measurements [37]. Hence the variance of the quantization noise in each measurement is

σ_Q^2 = s^2/12. (7)

The integral pixel values are generally obtained by sampling an analog signal and rounding the sample values to the nearest integer. Hence the pixel values contain digitization noise with variance of 1/12. This noise appears in the measurement y_j with variance

σ_D^2 = ‖φ_j‖_2^2 / 12, (8)

where φ_j is the j-th row of Φ. In the sensing matrices that we consider ‖φ_j‖_2 is constant, ‖φ_j‖_2 = ‖Φ‖_2. Clearly, there is no point in spending bits to accurately represent the digitization noise, hence we need to have σ_Q ≥ σ_D and consequently s ≥ ‖Φ‖_2. Typical quantizer step sizes are between ‖Φ‖_2 and 50‖Φ‖_2.
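The quantizer and dequantizer of (3)–(5) reduce to a few lines of MATLAB; the following sketch (ours) makes the round trip explicit, assuming the DC measurement y_0 has already been excluded as described above.

% Mid-tread uniform quantizer (3)-(4) and dequantizer (5); s is the
% quantization step and L the saturation level.
mu = mean(y);                          % mean of the measurements
q_tilde = floor((y - mu) / s + 0.5);   % unclipped value, eq. (4)
q = max(-L, min(L, q_tilde));          % clip to the codebook, eq. (3)
y_hat = q * s + mu;                    % dequantized measurements, eq. (5)
% For |q| < L the error |y - y_hat| is at most 0.5*s, as in (6);
% codewords with |q| = L are saturated and handled separately.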
C. Image Quality Control

The compression ratio R and the quantizer step size s control the coded image size and the quality of the reconstructed image. One can get to the same reconstruction quality with various combinations of these parameters, but the coded image size varies greatly. Our experiments (Section IV-B) showed that at any given quality the lowest bit rate is achieved when

Rs = C‖Φ‖_2, (9)

where C is a constant. The optimal value of C depends on the type of sensing matrix and varies from image to image. However, we found that using C = 2.0 is a good general value for all pictures. Thus, in our tests s is determined by R using (9), with C = 2.0. This quantization step is sufficiently fine to allow modeling the quantization noise as uncorrelated, uniformly distributed white noise.
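In code, the control rule (9) is a one-liner. In the sketch below (ours), R = M/N is the compression ratio and normPhi stands for ‖Φ‖_2.

% Joint bit rate/quality control (Sec. II-C): derive the quantization
% step from the compression ratio using (9) with C = 2.0.
C = 2.0;
s = C * normPhi / R;   % e.g. R = 0.1 gives s = 20*normPhi,
                       % within the typical step range of Sec. II-B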
D. Lossless Coding

The lossless encoder encodes the codeword sequence generated by the quantizer, as well as some miscellaneous information (e.g. µ, s, and the ad hoc representation of saturated measurements), as a bit sequence. The lossless decoder exactly decodes the codeword sequence and the miscellaneous information from the bit sequence.

1) Coded Numbers Format: Various types of numbers are coded by the lossless encoder (and decoded by the lossless decoder). Each type is encoded in a different way:

Unbounded signed or unsigned integers are integers whose maximal possible magnitude is not known in advance. They are represented by byte sequences, where the most significant bit (MSB) in each byte is a continuation bit — it is clear in the last byte of the sequence and set in all other bytes. The rest of the bits are payload bits which represent the integer. The number of bytes is the minimal number that has enough payload bits to fully represent the signed or unsigned integer.

Real numbers, which are natively stored in single or double precision floating point format, are coded as pairs of unbounded signed integers representing the mantissas and the exponents in the floating point format.

Bit arrays are zero padded to a length which is a multiple of 8 and coded as a sequence of bytes, 8 bits per byte.

Bounded unsigned integer arrays are arrays of unsigned integers, each of which may be represented by a fixed number of bits. An array of n integers, each of which can be represented by b bits, is encoded as a bit array of bn bits.

Each of these number formats can be easily decoded. Note that these formats are byte aligned for simpler implementation.
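A minimal MATLAB sketch of the unbounded-integer format follows. The continuation-bit layout is as described above; placing the most significant payload bits first is our assumption, since the text does not specify the payload bit order.

function bytes = encode_uint(v)
% Unbounded unsigned integer -> byte sequence; the MSB of each byte is
% the continuation bit: set in all bytes except the last (Sec. II-D1).
bytes = uint8(mod(v, 128));            % last byte: continuation bit clear
v = floor(v / 128);
while v > 0
    bytes = [uint8(128 + mod(v, 128)), bytes]; % earlier bytes: bit set
    v = floor(v / 128);
end
end

function [v, n] = decode_uint(bytes)
% Inverse: accumulate payload bits until a byte with a clear
% continuation bit; n is the number of bytes consumed.
v = 0; n = 0;
while true
    n = n + 1;
    b = double(bytes(n));
    v = 128 * v + mod(b, 128);
    if b < 128, break; end
end
end

For example, encode_uint(300) yields the two bytes [130 44], and decode_uint recovers 300 from them.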
2) Entropy Coding: The codewords ±L represent saturated measurements. Whether those measurements are ignored or transmitted separately, there is no distinction between saturation from above or below, hence we merge these two labels into a single label L; thus we have 2L codewords: −L + 1, . . . , L.

Let p_c, c ∈ C, be the probability of a measurement to be quantized to c. If the codewords {Q(y_i)}_{i=1}^{M} are IID random variables, then a tight lower bound on the expected number of bits required to transmit these codewords is the entropy rate:

H := −M Σ_{c∈C} p_c log2 p_c. (10)

Arithmetic coding (AC) [38] represents the codeword sequence by a bit sequence, the length of which can get arbitrarily close to the entropy rate for a large M. We use AC to encode the codewords sequence. The AC bit sequence is coded as a bit array.

Since the probabilities p_c, c ∈ C, are not known a priori, they need to be estimated and sent to the receiver, in addition to the AC bit sequence. These probability estimates can be obtained in two ways. For SRMs, the measurements are approximately normally distributed [13], hence for |c| < L, p_c is the normal probability of the quantization interval [cs + µ − 0.5s, cs + µ + 0.5s), and p_L = 1 − Σ_{|c|<L} p_c. Thus, all that needs to be sent to the receiver is the estimated standard deviation of the measurements, which is coded as a real number. For deterministic sensing matrices, it is necessary to compute a histogram of the quantized measurements sequence, use it to determine the probabilities, and then code the histogram and send it to the receiver along with the AC bit sequence. Sending the histogram is an overhead, but it is small in comparison to the gain achieved with arithmetic coding. In fact, even with measurements generated by SRM, in many cases the total bit rate achieved when using AC with a histogram is better than the total bit rate when using the normal distribution assumption, because the gain obtained by accurately describing the actual codeword frequencies is more than the overhead of transmitting the histogram.

In natural images the magnitudes of the coefficients of 2D-DCT or 2D-WHT decay quickly, hence measurements generated using a deterministic sensing matrix are not identically distributed, which violates the assumptions under which AC is asymptotically optimal. In order to handle this problem we partition the codeword sequence into sections, and for each section we compute a histogram and an AC sequence separately. This, of course, makes the overhead of coding the histograms significant. In the following we describe how the histograms are coded efficiently and how to select a locally optimal partition of the codewords sequence.
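For a single section the bound (10) is estimated directly from the section's codeword counts; a minimal sketch, with empirical probabilities standing in for the unknown p_c:

% Estimated AC length of one section, cf. (10); h holds the counts of
% the 2L codewords in the section.
n = sum(h);                       % number of codewords in the section
p = h(h > 0) / n;                 % empirical probabilities
Hbits = -n * sum(p .* log2(p));   % lower bound on the coded bits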
3) Coding of Histograms: In order to be efficient, the code of histograms of short codeword sequences should be short as well. Fortunately, such histograms often have many zero counts, which can be used for efficient coding. A histogram is coded in one of three ways:

Full histogram: A sequence of 2L unbounded unsigned integers, containing the counts for each codeword. This method is effective when most counts are non-zero.

Flagged histogram: A bit array of 2L bits indicates for each codeword whether the corresponding count is non-zero, and a sequence of unbounded unsigned integers contains the non-zero counts. This method is effective when a significant share of the counts is zero.

Indexed histogram: A bounded integer indicates the number of non-zero counts, an array of bounded integers contains the indices of the non-zero counts, and a sequence of unbounded unsigned integers contains the non-zero counts. This method is effective when most of the counts are zero. In the extreme case of a single non-zero count the AC bit sequence is of zero length, hence this histogram coding is effectively a run length encoding (RLE).

The histogram is coded in these three ways and the shortest code is transmitted (a sketch of this selection follows). A 2-bit histogram format selector (HFS) indicates which representation was chosen. The HFSs of all sections of the codeword sequence are coded as a bit array. Thus, each section of the codeword sequence is represented by the HFS, the selected histogram representation and the AC bit sequence.
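The format selection can be made by comparing (approximate) coded lengths, as in the MATLAB sketch below. The byte counting mirrors the unbounded-integer format of Sec. II-D1, but the function name and the simplified accounting (e.g. one byte for the number of non-zero counts) are ours.

function fmt = pick_histogram_format(h, L)
% h: counts of the 2L codewords -L+1..L;
% fmt: 1 = full, 2 = flagged, 3 = indexed (the 2-bit HFS value).
ub = @(v) max(1, ceil((floor(log2(max(double(v), 1))) + 1) / 7)); % bytes of one unbounded integer
nz = h(h > 0);                                    % non-zero counts
fullLen = sum(arrayfun(ub, h));                   % all 2L counts
flagLen = ceil(2*L/8) + sum(arrayfun(ub, nz));    % flag bits + counts
b = max(1, ceil(log2(2*L)));                      % bits per bounded index
idxLen  = 1 + ceil(numel(nz)*b/8) + sum(arrayfun(ub, nz));
[~, fmt] = min([fullLen, flagLen, idxLen]);
end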
Fig. 3. Coded image structure.

4) Partitioning into AC sections: Partitioning the codeword sequence into AC sections requires estimating the number of bits in each coded section. Let {h_S(c), c ∈ C} be the histogram of section S, where h_S(c) is the count for codeword c. In order to avoid repeated AC encoding computation, the number of bits in the AC sequence of S is estimated, based on (10), by the coded-length estimate L̂(S) of (14).

Algorithm 1: Partitioning the codeword sequence into entropy coded sections.
1. Initialization:
   a. Partition the codewords sequence {Q(y_j)}_{j=1}^{M} into sections {S_k}_{k=1}^{K} such that each section contains only one codeword (RLE).
   b. Compute the histograms' counts h_k(c) := h_{S_k}(c), c ∈ C, k = 1, . . . , K.
   c. Compute the coded sections' lengths using (14): l_k := L̂(S_k), ∀k = 1, . . . , K.
2. For a fixed m > 0 let

   P = {(k, j) | 1 ≤ k < j ≤ min(K, k + m − 1)} (11)

   and for each pair (k, j) ∈ P, let S_{k,j} be the section obtained by merging sections S_k, . . . , S_j.
   a. Compute the histograms' counts of S_{k,j}: h_{k,j}(c) = Σ_{r=k}^{j} h_r(c), c ∈ C.
   b. Using the histogram and (14), compute the gain of merging S_k, . . . , S_j: g_{k,j} = Σ_{r=k}^{j} l_r − L̂(S_{k,j}).
3. If ∀(k, j) ∈ P : g_{k,j} < 0, exit.
4. Update the partition:
   a. Let (k*, j*) = argmax_{(k,j)∈P} g_{k,j}.
   b. Merge sections S_{k*}, . . . , S_{j*} into one section, with the histogram and coded length computed in step 2.
   c. Go to step 2.

… our experiments. When the algorithm starts, all the bits are spent on HFSs and histogram representation, and none on AC bit sequences. As the algorithm progresses and sections are merged, more bits are spent on AC, and the histograms become fewer in number, but having more non-zero counts. It is plausible that we could get even better compression if we used more efficient ways for histogram representation, e.g. by using more parametric models of approximated histograms (in addition to the normal distribution).
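The following MATLAB sketch implements the greedy merging of Algorithm 1. Because the coded-length estimate L̂ of (14) does not survive in this excerpt, the sketch substitutes the entropy bound (10) plus a fixed per-section overhead for the HFS and histogram; that substitution, and all the function names, are ours.

function sections = merge_sections(q, L, m, ovh)
% q: codeword sequence (values in -L+1..L); m: merge window of step 2;
% ovh: assumed per-section overhead in bits (HFS and histogram).
q = q(:)';
ends   = [find(diff(q) ~= 0), numel(q)];     % step 1a: RLE sections
starts = [1, ends(1:end-1) + 1];
sections = [starts(:), ends(:)];             % one [first, last] row per section
est = @(a, b) section_bits(q(a:b), L, ovh);  % stand-in for L-hat of (14)
len = arrayfun(@(k) est(sections(k,1), sections(k,2)), (1:size(sections,1))');
while true
    K = size(sections, 1);
    bestGain = 0; bi = 0; bj = 0;
    for k = 1:K-1                            % step 2: candidate merges
        for j = k+1:min(K, k + m - 1)
            gain = sum(len(k:j)) - est(sections(k,1), sections(j,2));
            if gain > bestGain, bestGain = gain; bi = k; bj = j; end
        end
    end
    if bestGain <= 0, break; end             % step 3: no positive gain left
    sections = [sections(1:bi-1,:); [sections(bi,1), sections(bj,2)]; sections(bj+1:end,:)];
    len = [len(1:bi-1); est(sections(bi,1), sections(bi,2)); len(bj+1:end)];  % step 4
end
end

function bits = section_bits(qs, L, ovh)
% Estimated coded size of one section: entropy bound (10) plus overhead.
h = histcounts(qs, (-L+1:L+1) - 0.5);        % counts of the 2L codewords
n = sum(h); p = h(h > 0) / n;
bits = -n * sum(p .* log2(p)) + ovh;
end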
E. Decoding

• The quantization codewords are recovered by arithmetic decoding.
• The unsaturated quantized measurements are computed using (5). For the saturated measurements (those having a codeword of L), the transmitted values of Q̃(y), if available, are used. Otherwise, the values of the quantized saturated measurements are set to zero.
• The sensing matrix is determined according to the algorithmic choices in the coded image. If there was no ad hoc transmission of Q̃(y) for saturated measurements, the rows corresponding to these measurements are set to zero. In practice this is done by replacing the original sensing matrix Φ by DΦ, where D is an M × M diagonal matrix whose diagonal elements are zero for saturated measurements and one for unsaturated measurements (a sketch of this masking follows the list).
• The image is reconstructed using the sensing matrix and the quantized measurements, as described in detail in Sec. III.
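The masking step is a small piece of bookkeeping; the sketch below (ours) shows both the explicit DΦ form and the equivalent masking of a fast-transform operator.

% Discarding saturated measurements at the decoder (Sec. II-E):
% zero the measurement and the matching row of Phi, i.e. use D*Phi.
sat = (abs(q) == L);           % saturated codewords
y_hat(sat) = 0;                % their dequantized values are set to zero
d = double(~sat);              % diagonal of D: 0 if saturated, 1 otherwise
% Explicit matrix form:        Phi_masked = diag(d) * Phi;
% Fast-operator form, with A(x) computing Phi*x:
A_masked = @(x) d(:) .* A(x);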
F. Module Comparison of CSbIC with JPEG and JPEG2000

We now compare the architectures of CSbIC, JPEG and JPEG2000 and consider the aspects which may lead to performance differences. Fig. 2 compares the architecture of CSbIC with that of JPEG side by side. The main common points and differences are:

• JPEG, JPEG2000 and CSbIC begin with a linear projection of the image onto a different space. However:
  – JPEG2000 may partition the image into tiles of varying sizes, which are processed separately. This is equivalent to using a block-diagonal projection matrix, where each block corresponds to a tile. In JPEG the tiles (referred to as blocks) are of a fixed size of 8 × 8 pixels. In CSbIC the projection is done on the whole image, which was adequate for the image sizes which we experimented with. For larger images, adding tiling should be straightforward.
  – In both JPEG and JPEG2000, each tile/block is projected on a space of the same dimension, N, hence there is no data loss in this operation, whereas in CSbIC the projection is lossy since it is on an M-dimensional space, M ≪ N.
  – In JPEG the projection is a 2D-DCT with the output organized in zig-zag order, while in JPEG2000 it is a 2D-wavelet transform. In CSbIC, a 2D-DCT based projection is one of several options.
  – JPEG uses block-to-block prediction of the DC coefficient in order to deal with the issue of the DC coefficient being much larger than the other coefficients. In JPEG2000 this is done by subtracting a fixed value from all pixels before the projection. In contrast, CSbIC takes care of this issue in the quantization stage. The effect of these different methods is similar and has little impact on performance.
• CSbIC uses a simple uniform scalar quantizer with an identical step size for all measurements. JPEG uses the same type of quantizer, but the step size is selected from a quantization table and is different for each coefficient, resulting in quantization noise shaping, which may give JPEG an advantage at higher quality/data rate (Fig. 4). JPEG2000 also performs quantization noise shaping through varying step size, and in addition, its quantizer is not exactly uniform — the quantization interval around zero (the “dead-zone”) is larger than the other quantization intervals, effectively forcing small wavelet coefficients to zero and reducing the amount of bits spent on coding them. It is plausible that using noise-shaping and slight non-uniformities in the quantization would improve the performance of CSbIC as well.
• The bit rate and quality trade-off in JPEG and JPEG2000 is controlled by tuning the operation of a single module — the quantizer. In contrast, in CSbIC this trade-off is controlled by jointly tuning two different modules: the projection, or measurement capturing, module is tuned by changing the compression ratio, and the quantizer is tuned by changing the quantization step.
• In JPEG, entropy coding is based on Huffman coding and RLE. JPEG2000 uses arithmetic coding, with a sophisticated adaptive algorithm to determine the associated probabilities. CSbIC uses arithmetic coding, which is known to be better than Huffman coding, but instead of using adaptive estimation of the probabilities, the codewords are partitioned into sections and for each section a histogram of codewords is computed and sent as side information. The overhead of the transmitted histograms may be a disadvantage of CSbIC relative to JPEG2000.
• In JPEG and JPEG2000, the decoder generates the image from the dequantized transform coefficients by an inverse 2D-DCT or wavelet transform, respectively — a simple linear operation which does not rely on any prior information not included in the received data. In contrast, the CS reconstruction in CSbIC is an iterative, non-linear optimization, which relies on prior assumptions about the structure of the image (e.g. sparsity).

III. IMAGE RECONSTRUCTION

Since the seminal works of Candès et al. [2] and Donoho [3], various reconstruction algorithms have been developed [26], [39], [40], [5], [41], [42]. The early reconstruction algorithms leveraged the property of natural images of being compressible when projected by a suitable sparsity operator D:

f = Dx, (15)

where f denotes the projected vector and it is usually forced to be sparse. D can be a pre-defined basis (DCT or wavelet), or learned on the fly [43]. Another popular sparsity operator is the Total Variation (TV), where D is a projection on a higher dimension space [5], [25].

Recently, better results were obtained by algorithms such as D-AMP [7] and NLR-CS [6], which exploit established image denoising methods or the natural images' property of having a low rank on small patches.
Fig. 4. Performance diagrams, SSIM vs. compressed file size (in bytes), comparing JPEG (black solid curves), JPEG2000 (black dash curves) with CSbIC
compression using different sensing matrices – 2D-DCT (solid) and 2D-WHT (dash), and different reconstruction algorithms — GAP-TV (blue), NLR-CS
(red) and D-AMP (green).
While using different projection operators D, most of these algorithms compute x̂, the estimated signal, by solving the minimization problem

x̂ = argmin_x ‖Dx‖_p, s.t. y = Φx, (16)

where p can be 0 (‖f‖_0 denotes the number of non-zero components in f) or 1, using ‖f‖_1 as a computationally tractable approximation to ‖f‖_0 [44]. Alternatively, ‖·‖_p can stand for the nuclear norm ‖·‖_* to impose a low rank assumption [6].

Problem (16) is usually solved iteratively, where each iteration consists of two steps [24]. Beginning with an initial guess, the first step in each iteration projects the current guess onto the subspace of images which are consistent with the measurements, and the second step denoises the results obtained in the first step. Various projection and denoising algorithms [43], [45] can be employed in this general framework in order to achieve excellent results.
ments, and the second step denoises the results obtained in the
of those standards (the “imwrite” function). All the results
first step. Various projection and denoising algorithms [43],
presented in this section and more results can be found at [1],
[45] can be employed in this general framework in order to
and these results can be reproduced by the code downloadable
achieve excellent results.
from [1].
In our experiments we have used an improved variant of TV
known as Generalized Alternating Projection Total Variation
(GAP-TV) [28], as well as D-AMP and NLR-CS. A. Effect of The Choice of Sensing Matrix
The performance with the deterministic sensing matrices,
2D-DCT and 2D-WHT, was always significantly better than
IV. P ERFORMANCE T ESTING
with the SRMs with the same fast transforms, SRM-DCT
Our test material consisted of 8 monochrome images of and SRM-WHT, respectively (Fig. 7), regardless of the re-
256x256 8-bit pixels (Barbara, Boats, Cameraman, Foreman, construction algorithm which was used. Within each of those
House, Lenna, Monarch, Parrots — see Fig. 4). These images groups (deterministic matrices and SRMs), the sensing ma-
were processed in a variety of test conditions, specified by trices based on DCT generally yielded better performance
encoder and reconstruction parameters. The outcome of pro- than the those based on WHT (Fig. 7 right). However, (i) the
cessing an image in a particular test condition is the data rate, difference, in terms of SSIM for the same file size, is generally
as expressed by the size of the coded image file (in bytes), smaller than the performance difference between deterministic
Fig. 5. PSNR vs. compressed file size (in bytes), comparing JPEG (black solid curves), JPEG2000 (black dash curves) with CSbIC compression using different sensing matrices – 2D-DCT (solid) and 2D-WHT (dash), and different reconstruction algorithms — GAP-TV (blue), NLR-CS (red) and D-AMP (green). Results with 8 images are available at [1].

Fig. 7. Performance plots (SSIM vs. compressed file size in bytes) for two images of CS with different sensing matrices. D-AMP (left) and NLR-CS (right) are used for reconstruction. Results with 8 images and 3 algorithms are available at [1].

Fig. 11. SSIM vs. compression ratio (CSr), comparing the reconstructed image with quantization (dash lines) and without quantization (solid lines) for different reconstruction algorithms.
REFERENCES

[13] R. Haimi-Cohen and Y. M. Lai, “Compressive measurements generated by structurally random matrices: Asymptotic normality and quantization,” Signal Processing, vol. 120, pp. 71–87, 2016.
[14] W. Dai, H. V. Pham, and O. Milenkovic, “A comparative study of quantized compressive sensing schemes,” in 2009 IEEE International Symposium on Information Theory, June 2009, pp. 11–15.
[15] ——, “Distortion-rate functions for quantized compressive sensing,” in Networking and Information Theory, 2009. ITW 2009. IEEE Information Theory Workshop on, June 2009, pp. 171–175.
[16] Y. Baig, E. M. K. Lai, and J. P. Lewis, “Quantization effects on compressed sensing video,” in Telecommunications (ICT), 2010 IEEE 17th International Conference on, April 2010, pp. 935–940.
[17] D. Venkatraman and A. Makur, “A compressive sensing approach to object-based surveillance video coding,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, April 2009, pp. 3513–3516.
[18] Information Technology – Digital Compression And Coding Of Continuous Tone Still Images – Requirements And Guidelines, “CCITT Recommendation T.81,” 1992.
[19] Information technology – JPEG 2000 image coding system: Core coding system, “ITU-T Recommendation T.800,” 2002.
[20] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, 2008.
[21] G. Huang, H. Jiang, K. Matthews, and P. Wilford, “Lensless imaging by compressive sensing,” pp. 2101–2105, Sept 2013.
[22] J. Romberg, “Imaging via compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 14–20, 2008.
[23] X. Yuan, H. Jiang, G. Huang, and P. Wilford, “Lensless compressive imaging,” arXiv:1508.03498, 2015.
[24] ——, “SLOPE: Shrinkage of local overlapping patches estimator for lensless compressive imaging,” IEEE Sensors Journal, vol. 16, no. 22, pp. 8091–8102, November 2016.
[25] J. Bioucas-Dias and M. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2992–3004, December 2007.
[26] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, pp. 586–597, Dec. 2007.
[27] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, June 2008.
[28] X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in 2016 IEEE International Conference on Image Processing (ICIP), Sept 2016, pp. 2539–2543.
[29] J.-H. Ahn, “Compressive sensing and recovery for binary images,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4796–4802, 2016.
[30] W. K. Pratt, J. Kane, and H. C. Andrews, “Hadamard transform image coding,” Proceedings of the IEEE, vol. 57, no. 1, pp. 58–68, Jan 1969.
[31] M. F. Duarte and R. G. Baraniuk, “Kronecker compressive sensing,” IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 494–504, Feb 2012.
[32] B. J. Fino and V. R. Algazi, “Unified matrix treatment of the fast Walsh-Hadamard transform,” IEEE Transactions on Computers, vol. C-25, no. 11, pp. 1142–1146, Nov 1976.
[33] E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inf. Theor., vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[34] ——, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, Dec 2005.
[35] T. T. Do, L. Gan, N. H. Nguyen, and T. D. Tran, “Fast and efficient compressive sensing using structurally random matrices,” IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 139–154, Jan 2012.
[36] T. T. Do, T. D. Tran, and L. Gan, “Fast compressive sampling with structurally random matrices,” in 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2008, pp. 3369–3372.
[37] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA, USA: Kluwer Academic Publishers, 1991.
[38] G. G. Langdon, “Arithmetic coding,” IBM J. Res. Develop, vol. 23, pp. 149–162, 1979.
[39] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, January 2011.
[40] M. A. Figueiredo, J. M. Bioucas-Dias, and R. D. Nowak, “Majorization–minimization algorithms for wavelet-based image restoration,” IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2980–2991, 2007.
[41] Y. Wang, J. Yang, W. Yin, and Y. Zhang, “A new alternating minimization algorithm for total variation image reconstruction,” SIAM Journal on Imaging Sciences, vol. 1, no. 3, pp. 248–272, 2008.
[42] X. Liao, H. Li, and L. Carin, “Generalized alternating projection for weighted-ℓ2,1 minimization with applications to model-based compressive sensing,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 797–823, 2014.
[43] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[44] E. Candes, M. Wakin, and S. Boyd, “Enhancing sparsity by reweighted ℓ1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
[45] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, August 2007.
[46] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.