Compressed Sensing
Collection Editors:
Richard Baraniuk
Mark A. Davenport
Marco F. Duarte
Chinmay Hegde
Authors:
Richard Baraniuk
Mark A. Davenport
Marco F. Duarte
Chinmay Hegde
Jason Laska
Mona Sheikh
Wotao Yin
Online:
< http://cnx.org/content/col11133/1.5/ >
CONNEXIONS
Rice University, Houston, Texas
This selection and arrangement of content as a collection is copyrighted by Richard Baraniuk, Mark A. Davenport, Marco F. Duarte, and Chinmay Hegde. It is licensed under the Creative Commons Attribution License 3.0 (http://creativecommons.org/licenses/by/3.0/).
Collection structure revised: April 2, 2011
PDF generated: September 23, 2011
For copyright and attribution information for the modules contained in this collection, see p. 107.
Table of Contents
1 Introduction
1.1 Introduction to compressive sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Sparse and Compressible Signal Models
2.1 Introduction to vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Bases and frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Sparse representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Compressible signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Sensing Matrices
3.1 Sensing matrix design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Null space conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 The restricted isometry property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 The RIP and the NSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 Matrices that satisfy the RIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Sparse Signal Recovery via ℓ1 Minimization
4.1 Signal recovery via `_1 minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Noise-free signal recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Signal recovery in noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Instance-optimal guarantees revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5 The cross-polytope and phase transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Algorithms for Sparse Recovery
5.1 Sparse recovery algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Convex optimization-based methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Greedy algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4 Combinatorial algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Bayesian methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Applications of Compressive Sensing
6.1 Linear regression and model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Sparse error correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3 Group testing and data stream algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.4 Compressive medical imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.5 Analog-to-information conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.6 Single-pixel camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.7 Hyperspectral imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.8 Compressive processing of manifold-modeled data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.9 Inference using compressive measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.10 Compressive sensor networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.11 Genomic sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7 Appendices
7.1 Sub-Gaussian random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2 Concentration of measure for sub-Gaussian random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.3 Proof of the RIP for sub-Gaussian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 ℓ1 minimization proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Attributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Chapter 1
Introduction
1.1 Introduction to compressive sensing
We are in the midst of a digital revolution that is driving the development and deployment of new kinds of sensing systems with ever-increasing fidelity and resolution. The theoretical foundation of this revolution is the pioneering work of Kotelnikov, Nyquist, Shannon, and Whittaker on sampling continuous-time bandlimited signals [126], [157], [173], [211]. Their results demonstrate that signals, images, videos, and other data can be exactly recovered from a set of uniformly spaced samples taken at the so-called Nyquist rate of twice the highest frequency present in the signal of interest. Capitalizing on this discovery, much of signal processing has moved from the analog to the digital domain and ridden the wave of Moore's law. Digitization has enabled the creation of sensing and processing systems that are more robust, flexible, cheaper and, consequently, more widely-used than their analog counterparts.
As a result of this success, the amount of data generated by sensing systems has grown from a trickle to a torrent. Unfortunately, in many important and emerging applications, the resulting Nyquist rate is so high that we end up with far too many samples. Alternatively, it may simply be too costly, or even physically impossible, to build devices capable of acquiring samples at the necessary rate. Thus, despite extraordinary advances in computational power, the acquisition and processing of signals in application areas such as imaging, video, medical imaging, remote surveillance, spectroscopy, and genomic data analysis continues to pose a tremendous challenge.
To address the logistical and computational challenges involved in dealing with such high-dimensional data, we often depend on compression, which aims at finding the most concise representation of a signal that is able to achieve a target level of acceptable distortion. One of the most popular techniques for signal compression is known as transform coding, and typically relies on finding a basis or frame that provides sparse or compressible representations for signals in a class of interest. By a sparse representation, we mean that for a signal of length N we can represent it with K << N nonzero coefficients; by a compressible representation, we mean that the signal is well-approximated by a signal with only K nonzero coefficients.
Both sparse and compressible signals can be represented with high fidelity by preserving only the values and locations of the largest coefficients of the signal. This process is called sparse approximation, and forms the foundation of transform coding schemes that exploit signal sparsity and compressibility, including the JPEG, JPEG2000, MPEG, and MP3 standards.
Leveraging the concept of transform coding, compressive sensing (CS) has emerged as a new framework for signal acquisition and sensor design that enables a potentially large reduction in the sampling and computation costs for sensing signals that have a sparse or compressible representation. While the Nyquist-Shannon sampling theorem states that a certain minimum number of samples is required in order to perfectly capture an arbitrary bandlimited signal, when the signal is sparse in a known basis we can vastly reduce the number of measurements that need to be stored. Consequently, when sensing sparse signals we might be able to do better than suggested by classical results: rather than first sampling at a high rate and then compressing the sampled data, we would like to find ways to directly sense the data in a compressed form, i.e., at a lower sampling rate. The field of CS grew out of the work of Emmanuel Candès, Justin Romberg, and Terence Tao and of David Donoho, who showed that a finite-dimensional signal having a sparse or compressible representation can be recovered from a small set of linear, nonadaptive measurements [6], [24], [66]. The design of these measurement schemes and their extensions to practical data models and acquisition schemes are one of the most central challenges in the field of CS.
Although this idea has only recently gained significant attention in the signal processing community, there have been hints in this direction dating back as far as the eighteenth century. In 1795, Prony proposed an algorithm for the estimation of the parameters associated with a small number of complex exponentials sampled in the presence of noise [162]. The next theoretical leap came in the early 1900's, when Carathéodory showed that a positive linear combination of any K sinusoids is uniquely determined by its value at t = 0 and at any other 2K points in time. This work was later generalized by George, Gorodnitsky, and Rao, who studied sparsity in the context of biomagnetic imaging and other contexts [109], [164], and by Bresler and Feng, who proposed a sampling scheme for acquiring certain classes of signals consisting of K components with nonzero bandwidth (as opposed to pure sinusoids) [94], [92]. In the early 2000's Vetterli, Marziliano, and Blu proposed a sampling scheme for non-bandlimited signals that are governed by only K parameters, showing that these signals can be sampled and recovered from just 2K samples [203].
A related problem focuses on recovery of a signal from partial observation of its Fourier transform. Beurling proposed a method for extrapolating these observations to determine the entire Fourier transform [15]. One can show that if the signal consists of a finite number of impulses, then Beurling's approach will correctly recover the entire Fourier transform (of this non-bandlimited signal) from any sufficiently large piece of its Fourier transform. His approach, to find the signal with smallest ℓ1 norm among all signals agreeing with the acquired Fourier measurements, bears a remarkable resemblance to some of the algorithms used in CS.
More recently, Candès, Romberg, and Tao [24], [30], [32], [33], [37], and Donoho [66] showed that a signal having a sparse representation can be recovered exactly from a small set of linear, nonadaptive measurements. This result suggests that it may be possible to sense sparse signals by taking far fewer measurements, hence the name compressive sensing. Note, however, that CS differs from classical sampling in two important respects. First, rather than sampling the signal at specific points in time, CS systems typically acquire measurements in the form of inner products between the signal and more general test functions. We will see throughout this course that randomness often plays a key role in the design of these test functions. Second, the two frameworks differ in the manner in which they deal with signal recovery, i.e., the problem of recovering the original signal from the compressive measurements. In the Nyquist-Shannon framework, signal recovery is achieved through cardinal sine (sinc) interpolation, a linear process that requires little computation and has a simple interpretation. In CS, however, signal recovery is typically achieved using highly nonlinear methods.
CS has already had notable impact on several applications. One example is medical imaging (Section 6.4),
where it has enabled speedups by a factor of seven in pediatric MRI while preserving diagnostic quality [201].
Moreover, the broad applicability of this framework has inspired research that extends the CS framework
by proposing practical implementations for numerous applications, including sub-Nyquist analog-to-digital
converters (Section 6.5) (ADCs), compressive imaging architectures (Section 6.6), and compressive sensor
networks (Section 6.10).
This course introduces the basic concepts in compressive sensing. We overview the concepts of sparsity (Section 2.3), compressibility (Section 2.4), and transform coding. We then discuss the main theoretical results of the field, beginning by focusing primarily on the theory of sensing matrix design (Section 3.1), ℓ1 minimization (Section 4.1), and alternative algorithms for sparse recovery (Section 5.1). We then review applications of
sparsity in several signal processing problems such as sparse regression and model selection (Section 6.1), error
correction (Section 6.2), group testing (Section 6.3), and compressive inference (Section 6.9). We also discuss
applications of compressive sensing in analog-to-digital conversion (Section 6.5), biosensing (Section 6.11),
conventional (Section 6.6) and hyperspectral (Section 6.7) imaging, medical imaging (Section 6.4), and sensor
networks (Section 6.10).
1.1.1 Acknowledgments
The authors would like to thank Ewout van den Berg, Yonina Eldar, Piotr Indyk, Gitta Kutyniok, and Yaniv Plan for their feedback regarding some portions of this course which now also appear in the introductory chapter of Compressed Sensing: Theory and Applications, available at http://www-stat.stanford.edu/markad/publications/ddek-chapter1-2011.pdf.
Chapter 2
Sparse and Compressible Signal Models

2.1 Introduction to vector spaces
For much of its history, signal processing has focused on signals produced by physical systems. Many natural and man-made systems can be modeled as linear. Thus, it is natural to consider signal models that complement this kind of linear structure. This notion has been incorporated into modern signal processing by modeling signals as vectors living in an appropriate vector space. This captures the linear structure that we often desire, namely that if we add two signals together then we obtain a new, physically meaningful signal. Moreover, vector spaces allow us to apply intuitions and tools from geometry in R^3, such as lengths, distances, and angles, to describe and compare signals of interest. This is useful even when our signals live in high-dimensional or infinite-dimensional spaces.
Throughout this course, we will treat signals as real-valued functions having domains that are either continuous or discrete, and either infinite or finite. These assumptions will be made clear as necessary in each chapter. In this course, we will assume that the reader is relatively comfortable with the key concepts in vector spaces. We now provide only a brief review of some of the key concepts in vector spaces that will be required in developing the theory of compressive sensing (Section 1.1) (CS). For a more thorough review, the reader may consult an introductory course in digital signal processing.
We will typically be concerned with normed vector spaces, i.e., vector spaces endowed with a norm. In the case of a discrete, finite domain, we can view our signals as vectors in an N-dimensional Euclidean space, denoted by R^N. When dealing with vectors in R^N, we will make frequent use of the ℓp norms, which are defined for p ∈ [1, ∞] as

‖x‖_p = ( Σ_{i=1}^N |x_i|^p )^{1/p} for p ∈ [1, ∞),  and  ‖x‖_∞ = max_{i=1,2,...,N} |x_i|.   (2.1)

In Euclidean space we can also consider the standard inner product in R^N, which we denote

⟨x, z⟩ = z^T x = Σ_{i=1}^N x_i z_i.   (2.2)

This inner product leads to the ℓ2 norm: ‖x‖_2 = √⟨x, x⟩.

In some contexts it is useful to extend the notion of ℓp norms to the case where p < 1. In this case, the "norm" defined in (2.1) fails to satisfy the triangle inequality, so it is actually a quasinorm. We will also make frequent use of the notation ‖x‖_0 := |supp(x)|, where supp(x) = {i : x_i ≠ 0} denotes the support of x and |supp(x)| denotes its cardinality. Note that ‖·‖_0 is not even a quasinorm, but one can easily show that

lim_{p→0} ‖x‖_p^p = |supp(x)|,   (2.3)

justifying this choice of notation. The ℓp (quasi-)norms have notably different properties for different values of p. To illustrate this, Figure 2.1 shows the unit sphere, i.e., {x : ‖x‖_p = 1}, induced by each of these norms in R^2.
Figure 2.1: Unit spheres in R^2 for the ℓp norms with p = 1, 2, ∞, and for the ℓp quasinorm with p = 1/2. (a) Unit sphere for ℓ1 norm (b) Unit sphere for ℓ2 norm (c) Unit sphere for ℓ∞ norm (d) Unit sphere for ℓp quasinorm with p = 1/2.
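As a quick numerical companion to these definitions, the following sketch computes the ℓp norms and quasinorms defined above. The function name lp_norm and the use of NumPy are illustrative choices made here, not part of the original text.

    import numpy as np

    def lp_norm(x, p):
        # l_p norm for p in [1, inf), quasinorm for 0 < p < 1,
        # max magnitude for p = inf, and support size for p = 0.
        x = np.asarray(x, dtype=float)
        if p == 0:
            return float(np.count_nonzero(x))
        if np.isinf(p):
            return float(np.max(np.abs(x)))
        return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

    # Example: the same vector measured in several (quasi-)norms.
    x = np.array([3.0, -4.0, 0.0, 1.0])
    print([lp_norm(x, p) for p in (0, 0.5, 1, 2, np.inf)])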
We typically use norms as a measure of the strength of a signal, or the size of an error. For example, suppose we are given a signal x ∈ R^2 and wish to approximate it using a point in a one-dimensional affine space A. If we measure the approximation error using an ℓp norm, then our task is to find the point x̂ ∈ A that minimizes ‖x − x̂‖_p. The choice of p will have a significant effect on the properties of the resulting approximation error. An example is illustrated in Figure 2.2. To compute the closest point in A to x using each ℓp norm, we can imagine growing an ℓp sphere centered on x until it intersects with A. This will be the point x̂ ∈ A that is closest to x in the corresponding ℓp norm. We observe that larger p tends to spread the error more evenly among the two coefficients, while smaller p leads to an error that is more unevenly distributed and tends to be sparse. This intuition generalizes to higher dimensions, and plays an important role in the development of CS theory.
Figure 2.2: Best approximation of a point in R^2 by a one-dimensional subspace using the ℓp norms for p = 1, 2, ∞, and the ℓp quasinorm with p = 1/2.
2.2 Bases and frames

A set Ψ = {ψ_i}_{i∈I} is called a basis for a finite-dimensional vector space (Section 2.1) V if the vectors in the set span V and are linearly independent. This implies that each vector in the space can be represented as a linear combination of this (smaller, except in the trivial case) set of basis vectors in a unique fashion. Furthermore, the coefficients of this linear combination can be found by the inner product of the signal and a dual set of vectors. In discrete settings, we will only consider real finite-dimensional Hilbert spaces where V = R^N and I = {1, ..., N}. Mathematically, any signal x ∈ R^N may be expressed as

x = Σ_{i∈I} a_i ψ̃_i,   (2.4)

where our coefficients are computed as a_i = ⟨x, ψ_i⟩ and {ψ̃_i}_{i∈I} are the vectors that constitute our dual basis. Another way to denote our basis and our dual basis is by how they operate on x: here the dual basis Ψ̃ is called our synthesis basis (it is used to reconstruct the signal via (2.4)) and Ψ our analysis basis.
It is often useful to generalize the concept of a basis to allow for sets of possibly linearly dependent vectors, resulting in what is known as a frame. More formally, a frame is a set of vectors {ψ_i}_{i=1}^n in R^d, d < n, corresponding to a matrix Ψ ∈ R^{d×n}, such that for all vectors x ∈ R^d,

A ‖x‖_2^2 ≤ ‖Ψ^T x‖_2^2 ≤ B ‖x‖_2^2   (2.5)

with 0 < A ≤ B < ∞. Note that the condition A > 0 implies that the rows of Ψ must be linearly independent. When A is chosen as the largest possible value and B as the smallest for these inequalities to hold, then we call them the (optimal) frame bounds. If A and B can be chosen as A = B, then the frame is called A-tight, and if A = B = 1, then Ψ is a Parseval frame. A frame is called equal-norm if there exists some λ > 0 such that ‖ψ_i‖_2 = λ for all i = 1, ..., N, and it is unit-norm if λ = 1. Note also that while the concept of a frame is very general and can be defined in infinite-dimensional spaces, in the case where Ψ is a d × N matrix, A and B simply correspond to the smallest and largest eigenvalues of ΨΨ^T, respectively.

Frames can provide richer representations of data due to their redundancy: for a given signal x, there exist infinitely many coefficient vectors α such that

x = Ψα.   (2.6)

In order to obtain a set of feasible coefficients we exploit the dual frame Ψ̃, i.e., any frame satisfying

Ψ Ψ̃^T = Ψ̃ Ψ^T = I.

The particular choice Ψ̃ = (ΨΨ^T)^{-1} Ψ is referred to as the canonical dual frame; it is well-defined because A > 0 requires Ψ to have linearly independent rows, so that ΨΨ^T is invertible. Using the canonical dual frame, one choice of coefficients is

α_d = Ψ^T (ΨΨ^T)^{-1} x.   (2.7)

One can show that this sequence is the smallest coefficient sequence in ℓ2 norm, i.e., ‖α_d‖_2 ≤ ‖α‖_2 for all α such that x = Ψα.

Finally, note that in the sparse approximation (Section 2.3) literature, it is also common for a basis or frame to be referred to as a dictionary or overcomplete dictionary, respectively, with the dictionary elements being called atoms.
2.3 Sparse representations

Transforming a signal to a new basis or frame (Section 2.2) may allow us to represent a signal more concisely. The resulting compression is useful for reducing data storage and data transmission, which can be quite expensive in some applications. Hence, one might wish to simply transmit the analysis coefficients obtained in our basis or frame expansion instead of its high-dimensional correlate. In cases where the number of non-zero coefficients is small, we say that we have a sparse representation. Sparse signal models allow us to achieve high rates of compression, and in the case of compressive sensing (Section 1.1), we may use the knowledge that our signal is sparse in a known basis or frame to recover our original signal from a small number of measurements. For sparse data, only the non-zero coefficients need to be stored or transmitted in many cases; the rest can be assumed to be zero.

Mathematically, we say that a signal x is K-sparse when it has at most K nonzeros, i.e., ‖x‖_0 ≤ K. We let

Σ_K = {x : ‖x‖_0 ≤ K}   (2.8)

denote the set of all K-sparse signals. Typically, we will be dealing with signals that are not themselves sparse, but which admit a sparse representation in some basis Ψ. In this case we will still refer to x as being K-sparse, with the understanding that we can express x as x = Ψα where ‖α‖_0 ≤ K.
Sparsity has long been exploited in signal processing and approximation theory for tasks such as compression [60], [161], [183] and denoising [64], and in statistics and learning theory as a method for avoiding overfitting [199]. Sparsity also figures prominently in the theory of statistical estimation and model selection [111], [186], in the study of the human visual system [158], and has been exploited heavily in image processing tasks, since the multiscale wavelet transform [139] provides nearly sparse representations for natural images. Below, we briefly describe some one-dimensional (1-D) and two-dimensional (2-D) examples.
A simple periodic signal is sampled and represented as a periodic train of weighted impulses (see Figure 2.3). One can interpret sampling as a basis expansion where the elements of our basis are impulses placed at periodic points along the time axis. We know that in this case, our dual basis consists of sinc functions used to reconstruct our signal from discrete-time samples. This representation contains many non-zero coefficients, and due to the signal's periodicity, there are many redundant measurements. Representing the signal in the Fourier basis, on the other hand, requires only two non-zero basis vectors, scaled appropriately at the positive and negative frequencies (see Figure 2.3). Driving the number of coefficients needed even lower, we may apply the discrete cosine transform (DCT) to our signal, thereby requiring only a single non-zero coefficient in our expansion (see Figure 2.3). The DCT equation is

X_k = Σ_{n=0}^{N−1} x_n cos( (π/N)(n + 1/2) k ),

with k = 0, ..., N−1, where x_n is the input signal of length N.

Figure 2.3: Cosine signal in three representations: (a) Train of impulses (b) Fourier basis (c) DCT basis.
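The sparsifying effect of the DCT on a pure cosine can be checked numerically. The sketch below assumes SciPy's scipy.fft.dct is available (a library choice made for illustration, not specified by the course); it synthesizes a single DCT-II basis vector and verifies that its transform has exactly one significant coefficient.

    import numpy as np
    from scipy.fft import dct

    N = 256
    n = np.arange(N)
    k0 = 10                                   # hypothetical frequency index
    x = np.cos(np.pi / N * (n + 0.5) * k0)    # a single DCT-II basis vector

    X = dct(x, type=2, norm='ortho')          # DCT of the cosine signal
    print(np.sum(np.abs(X) > 1e-8))           # prints 1: one non-zero coefficient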
If a signal is only approximately sparse, we may still obtain a good sparse approximation by thresholding the coefficients, i.e., by keeping only the K largest-magnitude coefficients and setting the rest to zero, to obtain a K-sparse representation. When measuring the approximation error using an ℓp norm, this procedure yields the best K-term approximation of the original signal, i.e., the best approximation of the signal using only K basis elements.
Figure 2.4: Sparse representation of an image via a multiscale wavelet transform. (a) Original image. (b) Wavelet representation. Large coefficients are represented by light pixels, while small coefficients are represented by dark pixels. Observe that most of the wavelet coefficients are close to zero.
Sparsity results through this decomposition because in most natural images most pixel values vary little from their neighbors. Areas with little contrast difference can be represented with low-frequency wavelets. Low-frequency wavelets are created through stretching a mother wavelet and thus expanding it in space. On the other hand, discontinuities, or edges in the picture, require high-frequency wavelets, which are created through compacting a mother wavelet; the edges are captured by relatively few coefficients mimicking the properties of the high-frequency compacted wavelet. See "Compressible signals" (Section 2.4) for an example.
Note that thresholding yields the best K-term approximation of a signal with respect to an orthonormal basis. When redundant frames are used, we must rely on sparse approximation algorithms like those described later in this course [86], [139].
2.4 Compressible signals

Recall that a signal is considered sparse if it has only a few nonzero values in comparison with its overall length. Few structured signals are truly sparse; rather they are compressible. A signal is compressible if its sorted coefficient magnitudes in a basis Ψ decay rapidly. To consider this mathematically, let x be a signal which is compressible in the basis Ψ:

x = Ψα,   (2.9)

where α are the coefficients of x in the basis Ψ. If x is compressible, then the magnitudes of the sorted coefficients α_s observe a power law decay:

|α_s| ≤ C_1 s^{-q},  s = 1, 2, ....   (2.10)

We define a signal as being compressible if it obeys this power law decay. The larger q is, the faster the magnitudes decay, and the more compressible a signal is. Figure 2.5 shows images that are compressible in different bases.
Figure 2.5: Images that are compressible in different bases. When the pixel values of the image in the upper left are sorted from largest to smallest, there is a sharp descent. The image in the lower left is not compressible in space, but it is compressible in wavelets since its wavelet coefficients exhibit a power law decay.
Because the magnitudes of their coefficients decay so rapidly, compressible signals can be represented well by K << N coefficients. The best K-term approximation of a signal is the one in which the K largest coefficients are kept, with the rest being zero. The error between the true signal and its K-term approximation is denoted the K-term approximation error σ_K(x), defined as

σ_K(x) = min_{x̃ ∈ Σ_K} ‖x − x̃‖_2.   (2.11)

For compressible signals, we can establish a bound with power law decay as follows:

σ_K(x) ≤ C_2 K^{1/2−q}.   (2.12)

In fact, one can show that σ_K(x)_2 will decay as K^{−r} if and only if the sorted coefficients α_i decay as i^{−(r+1/2)} [61]. Figure 2.6 shows an image and its K-term approximation.
Figure 2.6: Sparse approximation of a natural image. (a) Original image. (b) Approximation of the image obtained by keeping only the largest 10% of the wavelet coefficients. Because natural images are compressible in a wavelet domain, approximating this image in terms of its largest wavelet coefficients maintains good fidelity.
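The best K-term approximation and the error σ_K(x) in (2.11) are simple to compute once the coefficients are available. The sketch below is a minimal NumPy illustration; the function names and the synthetic power-law coefficients are assumptions made for this example, not data from the text.

    import numpy as np

    def best_k_term(alpha, K):
        # Keep the K largest-magnitude coefficients, zero out the rest.
        approx = np.zeros_like(alpha)
        idx = np.argsort(np.abs(alpha))[-K:]
        approx[idx] = alpha[idx]
        return approx

    def sigma_K(alpha, K):
        # K-term approximation error in the l2 norm, as in (2.11).
        return np.linalg.norm(alpha - best_k_term(alpha, K))

    # Synthetic compressible coefficients obeying |alpha_s| <= C1 * s**(-q).
    N, q = 1000, 1.5
    alpha = np.arange(1, N + 1, dtype=float) ** (-q)
    print([round(sigma_K(alpha, K), 4) for K in (10, 50, 200)])  # error decays rapidly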
A signal's compressibility is related to the ℓp space to which the signal belongs. An infinite sequence x(n) is an element of an ℓp space for a particular value of p if and only if its ℓp norm is finite:

‖x‖_p = ( Σ_i |x_i|^p )^{1/p} < ∞.   (2.13)

The smaller p is, the faster the sequence's values must decay in order to converge so that the norm is bounded. In the limiting case of p = 0, the "norm" actually counts the number of non-zero values. As p decreases, the size of the corresponding ℓp space also decreases; this can be seen visually by comparing the ℓp unit spheres (the sets of signals whose ℓp norm is 1) in three dimensions. Suppose that a signal is sampled infinitely finely, and call it x[n]. In order for this sequence to have a bounded ℓp norm, its coefficients must have a power-law rate of decay with q > 1/p. Therefore a signal which is in an ℓp space with p ≤ 1 obeys a power law decay, and is therefore compressible.
Chapter 3
Sensing Matrices
3.1 Sensing matrix design
In order to make the discussion more concrete, we will restrict our attention to the standard finite-dimensional compressive sensing (Section 1.1) (CS) model. Specifically, given a signal x ∈ R^N, we consider measurement systems that acquire M linear measurements. We can represent this process mathematically as

y = Φx,   (3.1)

where Φ is an M × N matrix and y ∈ R^M. The matrix Φ represents a dimensionality reduction, i.e., it maps R^N, where N is generally large, into R^M, where M is typically much smaller than N. Note that in the standard CS framework we assume that the measurements are non-adaptive, meaning that the rows of Φ are fixed in advance and do not depend on the previously acquired measurements. In certain settings adaptive measurement schemes can lead to significant performance gains.

Note that although the standard CS framework assumes that x is a finite-length vector with a discrete-valued index (such as time or space), in practice we will often be interested in designing measurement systems for acquiring continuously-indexed signals such as continuous-time signals or images. For now we will simply think of x as a finite-length window of Nyquist-rate samples, and we temporarily ignore the issue of how to directly acquire compressive measurements without first sampling at the Nyquist rate.
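As a concrete, purely illustrative instance of the measurement model (3.1), the sketch below draws a random Gaussian Φ and acquires M nonadaptive measurements of a K-sparse signal; the specific sizes N, M, and K are arbitrary choices for the example, not values from the text.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, K = 512, 128, 10                       # illustrative problem sizes

    # A K-sparse signal x in R^N.
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)

    # An M x N Gaussian sensing matrix with entrywise variance 1/M.
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    y = Phi @ x                                  # M nonadaptive linear measurements
    print(y.shape)                               # (128,)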
There are two main theoretical questions in CS. First, how should we design the sensing matrix Φ to ensure that it preserves the information in the signal x? Second, how can we recover the original signal x from the measurements y? In the case where our data is sparse (Section 2.3) or compressible (Section 2.4), we will see that we can design matrices Φ with M << N that ensure that we will be able to recover the original signal accurately and efficiently using a variety of practical algorithms (Section 5.1).

We begin by first addressing the question of how to design the sensing matrix Φ. Rather than directly proposing a design procedure, we instead consider a number of desirable properties that we might wish Φ to have (including the null space property (Section 3.2), the restricted isometry property (Section 3.3), and bounded coherence (Section 3.6)). We then provide some important examples of matrix constructions (Section 3.5) that satisfy these properties.
3.2 Null space conditions

A natural place to begin in establishing conditions on Φ is by considering its null space, denoted

N(Φ) = {z : Φz = 0}.   (3.2)

If we wish to be able to recover all sparse signals x from the measurements Φx, then it is immediately clear that for any pair of distinct vectors x, x' ∈ Σ_K we must have Φx ≠ Φx', since otherwise it would be impossible to distinguish x from x' based solely on the measurements y. One of the most common ways of characterizing this property is in terms of the spark of Φ [70].

Definition 3.1:
The spark of a given matrix Φ is the smallest number of columns of Φ that are linearly dependent.
Theorem 3.1: (Corollary 1 of [70])
For any vector y ∈ R^M, there exists at most one signal x ∈ Σ_K such that y = Φx if and only if spark(Φ) > 2K.

Proof:
We first assume that, for any y ∈ R^M, there exists at most one signal x ∈ Σ_K such that y = Φx. Now suppose for the sake of a contradiction that spark(Φ) ≤ 2K. This means that there exists some set of at most 2K columns that are linearly dependent, which in turn implies that there exists an h ∈ N(Φ) such that h ∈ Σ_2K. In this case, since h ∈ Σ_2K we can write h = x − x', where x, x' ∈ Σ_K. Thus, since h ∈ N(Φ) we have that Φ(x − x') = 0 and hence Φx = Φx'. But this contradicts our assumption that there exists at most one signal x ∈ Σ_K such that y = Φx. Therefore, we must have that spark(Φ) > 2K.

Now suppose that spark(Φ) > 2K. Assume that for some y there exist x, x' ∈ Σ_K such that y = Φx = Φx'. We therefore have that Φ(x − x') = 0. Letting h = x − x', we can write this as Φh = 0. Since spark(Φ) > 2K, all sets of up to 2K columns of Φ are linearly independent, and therefore h = 0. This in turn implies x = x', proving the theorem.
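Since the spark is defined combinatorially, it can only be computed by brute force, and even then only for very small matrices. The following sketch (an illustrative helper, not an algorithm prescribed by the course) makes the definition concrete.

    import numpy as np
    from itertools import combinations

    def spark(Phi):
        # Smallest number of columns of Phi that are linearly dependent.
        # Brute force over column subsets; feasible only for tiny matrices.
        Phi = np.asarray(Phi, dtype=float)
        M, N = Phi.shape
        for k in range(1, N + 1):
            for cols in combinations(range(N), k):
                if np.linalg.matrix_rank(Phi[:, cols]) < k:
                    return k
        return N + 1   # no dependent subset: all columns linearly independent

    # Per Theorem 3.1, unique recovery of all K-sparse signals needs spark(Phi) > 2K.
    Phi = np.array([[1.0, 0.0, 1.0, -1.0],
                    [0.0, 1.0, 1.0,  1.0]])
    print(spark(Phi))   # 3 for this 2 x 4 example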
When dealing with exactly sparse vectors, the spark provides a complete characterization of when sparse recovery is possible. However, when dealing with approximately sparse signals we must consider somewhat more restrictive conditions on the null space of Φ [46]. Roughly speaking, we must also ensure that N(Φ) does not contain any vectors that are too compressible in addition to vectors that are sparse. In order to state the formal definition we define the following notation that will prove to be useful throughout much of this course. Suppose that Λ ⊂ {1, 2, ..., N} is a subset of indices and let Λ^c = {1, 2, ..., N} \ Λ. By x_Λ we typically mean the length N vector obtained by setting the entries of x indexed by Λ^c to zero. Similarly, by Φ_Λ we typically mean the M × N matrix obtained by setting the columns of Φ indexed by Λ^c to zero.
Definition 3.2:
A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that

‖h_Λ‖_2 ≤ C ‖h_{Λ^c}‖_1 / √K   (3.3)

holds for all h ∈ N(Φ) and for all Λ such that |Λ| ≤ K.

The NSP quantifies the notion that vectors in the null space of Φ should not be too concentrated on a small subset of indices. For example, if a vector h is exactly K-sparse, then there exists a Λ such that ‖h_{Λ^c}‖_1 = 0, and hence (3.3) implies that h_Λ = 0 as well. Thus, if a matrix satisfies the NSP then the only K-sparse vector in N(Φ) is h = 0.
To fully illustrate the implications of the NSP in the context of sparse recovery, we now briefly discuss how we will measure the performance of sparse recovery algorithms when dealing with general non-sparse x. Towards this end, let Δ : R^M → R^N represent our specific recovery method. We will focus primarily on guarantees of the form

‖Δ(Φx) − x‖_2 ≤ C σ_K(x)_1 / √K   (3.4)

for all x, where we recall that

σ_K(x)_p = min_{x̃ ∈ Σ_K} ‖x − x̃‖_p.   (3.5)

This guarantees exact recovery of all possible K-sparse signals, but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by K-sparse vectors. Such guarantees are called instance-optimal since they guarantee optimal performance for each instance of x. This distinguishes them from guarantees that only hold for some subset of possible signals, such as sparse or compressible signals; the quality of the guarantee adapts to the particular choice of x. These are also commonly referred to as uniform guarantees since they hold uniformly for all x.

Our choice of norms in (3.4) is somewhat arbitrary. We could easily measure the reconstruction error using other ℓp norms. The choice of p, however, will limit what kinds of guarantees are possible, and will also potentially lead to alternative formulations of the NSP. See, for instance, [46]. Moreover, the form of the right-hand side of (3.4) might seem somewhat unusual in that we measure the approximation error as σ_K(x)_1/√K rather than simply as σ_K(x)_2. However, we will see later in this course that such a guarantee is actually not possible without taking a prohibitively large number of measurements, and that (3.4) represents the best possible guarantee we can hope to obtain (see "Instance-optimal guarantees revisited" (Section 4.4)).
Later in this course, we will show that the NSP of order 2K is sufficient to establish a guarantee of the form (3.4) for a practical recovery algorithm (see "Noise-free signal recovery" (Section 4.2)). Moreover, the following adaptation of a theorem in [46] demonstrates that if there exists any recovery algorithm satisfying (3.4), then Φ must necessarily satisfy the NSP of order 2K. (We note that the notation x_Λ will occasionally be abused to refer to the length |Λ| vector obtained by keeping only the entries corresponding to Λ, or the M × |Λ| matrix obtained by keeping only the columns corresponding to Λ. The usage should be clear from the context, and typically there is no substantive difference between the two.)

Theorem 3.2:
Let Φ : R^N → R^M denote a sensing matrix and Δ : R^M → R^N denote an arbitrary recovery algorithm. If the pair (Φ, Δ) satisfies (3.4), then Φ satisfies the NSP of order 2K.
Proof:
Suppose h ∈ N(Φ) and let Λ be the indices corresponding to the 2K largest entries of h. We next split Λ into Λ_0 and Λ_1, where |Λ_0| = |Λ_1| = K. Set x = h_{Λ_1} + h_{Λ^c} and x' = −h_{Λ_0}, so that h = x − x'. Since by construction x' ∈ Σ_K, we can apply (3.4) to obtain x' = Δ(Φx'). Moreover, since h ∈ N(Φ), we have

Φh = Φ(x − x') = 0   (3.6)

so that Φx' = Φx. Thus, x' = Δ(Φx). Finally, we have that

‖h_Λ‖_2 ≤ ‖h‖_2 = ‖x − x'‖_2 = ‖x − Δ(Φx)‖_2 ≤ C σ_K(x)_1 / √K = √2 C ‖h_{Λ^c}‖_1 / √(2K),   (3.7)

where the last equality follows since σ_K(x)_1 = ‖h_{Λ^c}‖_1 by our construction of x. This establishes the NSP of order 2K with constant √2 C.
3.3 The restricted isometry property

The null space property (Section 3.2) (NSP) is both necessary and sufficient for establishing guarantees of the form

‖Δ(Φx) − x‖_2 ≤ C σ_K(x)_1 / √K,   (3.8)

but these guarantees do not account for noise. When the measurements are contaminated with noise or have been corrupted by some error such as quantization, it will be useful to consider somewhat stronger conditions. In [36], Candès and Tao introduced the following isometry condition on matrices Φ and established its important role in CS.
Definition 3.3:
A matrix Φ satisfies the restricted isometry property (RIP) of order K if there exists a δ_K ∈ (0, 1) such that

(1 − δ_K) ‖x‖_2^2 ≤ ‖Φx‖_2^2 ≤ (1 + δ_K) ‖x‖_2^2   (3.9)

holds for all x ∈ Σ_K = {x : ‖x‖_0 ≤ K}.

If a matrix Φ satisfies the RIP of order 2K, then we can interpret (3.9) as saying that Φ approximately preserves the distance between any pair of K-sparse vectors. This will clearly have fundamental implications concerning robustness to noise.

It is important to note that in our definition of the RIP we assume bounds that are symmetric about 1, but this is merely for notational convenience. In practice, one could instead consider arbitrary bounds

α ‖x‖_2^2 ≤ ‖Φx‖_2^2 ≤ β ‖x‖_2^2   (3.10)

where 0 < α ≤ β < ∞. Given any such bounds, one can always scale Φ so that it satisfies the symmetric bound about 1 in (3.9): multiplying Φ by √(2/(α + β)) will result in a matrix that satisfies (3.9) with constant δ_K = (β − α)/(β + α). We will not explicitly show this, but one can check that all of the results in this course based on the assumption that Φ satisfies the RIP continue to hold as long as there exists some scaling of Φ satisfying this symmetric bound.

Note also that if Φ satisfies the RIP of order K with constant δ_K, then for any K' < K we automatically have that Φ satisfies the RIP of order K' with constant δ_{K'} ≤ δ_K. Moreover, in [151] it is shown that if Φ satisfies the RIP of order K with a sufficiently small constant, then it will also automatically satisfy the RIP of order γK for certain γ, albeit with a somewhat worse constant.
Lemma 3.1:
Suppose that Φ satisfies the RIP of order K with constant δ_K. Let γ be a positive integer. Then Φ satisfies the RIP of order K' = γ ⌊K/2⌋ with constant δ_{K'} < γ δ_K, where ⌊·⌋ denotes the floor operator.

This lemma is trivial for γ = 1, 2, but for γ ≥ 3 (and K ≥ 4) it allows us to extend the RIP from order K to higher orders, which can be quite useful.
We will see later in this course that if a matrix Φ satisfies the RIP, then this is sufficient for a variety of algorithms (Section 5.1) to be able to successfully recover a sparse signal from noisy measurements. First, however, we will take a closer look at whether the RIP is actually necessary. It should be clear that the lower bound in the RIP is a necessary condition if we wish to be able to recover all sparse signals x from the measurements Φx, for the same reasons that the NSP is necessary. To see that the lower bound is also connected to stability in the presence of noise, consider the following definition.

Definition 3.4:
Let Φ : R^N → R^M denote a sensing matrix and Δ : R^M → R^N denote a recovery algorithm. We say that the pair (Φ, Δ) is C-stable if for any x ∈ Σ_K and any e ∈ R^M we have that

‖Δ(Φx + e) − x‖_2 ≤ C ‖e‖_2.   (3.11)

This definition simply says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large. Theorem 3.3 below demonstrates that the existence of any decoding algorithm (potentially impractical) that can stably recover from noisy measurements requires that Φ satisfy the lower bound of (3.9) with a constant determined by C.
Theorem 3.3:
If the pair (Φ, Δ) is C-stable, then

(1/C) ‖x‖_2 ≤ ‖Φx‖_2   (3.12)

for all x ∈ Σ_2K.
Pick any
x, z K .
Dene
ex =
(z x)
2
and
ez =
(x z)
,
2
(3.13)
x + ex = z + ez =
8
(x + z)
.
2
(3.14)
20
Let
x= (x + ex ) = (z + ez ).
C -stability,
we have that
^
kx zk2
= kx x + x zk2
^
kx x k2 + k x zk2
(3.15)
Ckex k + Ckez k2
= Ckx zk2 .
Since this holds for any
Note that as
C 1,
x, z K ,
we have that
K = 1 1/C 2 0.
must adjust so that it
Thus, if we desire to reduce the impact of noise in our recovered signal then we
satises the lower bound of (3.9) with a tighter constant.
One might respond to this result by arguing that since the upper bound is not necessary, we can avoid redesigning Φ simply by rescaling it, so that as long as Φ satisfies the RIP with δ_{2K} < 1, the rescaled version cΦ will satisfy (3.12) for any constant C. In settings where the size of the noise is independent of our choice of Φ, this is a valid point: by rescaling Φ we are essentially adjusting the gain on the signal part of our measurements, and if increasing this gain does not impact the noise, then we can achieve arbitrarily high signal-to-noise ratios, so that eventually the noise is negligible compared to the signal.

However, in practice we will typically not be able to rescale Φ to be arbitrarily large. Moreover, in many practical settings the noise is not independent of Φ. For example, suppose that the noise vector e represents quantization noise produced by a finite dynamic range quantizer with range [−T, T], and that the measurements lie in the interval [−T, T]. If we rescale Φ by a factor c, then the measurements now lie in [−cT, cT] and we must scale the dynamic range of the quantizer accordingly; the resulting quantization error is simply ce, and we have achieved no reduction in the reconstruction error.
We can also consider how many measurements are necessary to achieve the RIP. If we ignore the impact of δ and focus only on the dimensions of the problem (N, M, and K), then we can establish a simple lower bound. We first provide a preliminary lemma that we will need in the proof of the main theorem.
Lemma 3.2:
Let K and N satisfying K < N/2 be given. There exists a set X ⊂ Σ_K such that for any x ∈ X we have ‖x‖_2 ≤ √K, for any x, z ∈ X with x ≠ z,

‖x − z‖_2 ≥ √(K/2),   (3.16)

and

log|X| ≥ (K/2) log(N/K).   (3.17)

Proof:
We will begin by considering the set

U = {x ∈ {0, +1, −1}^N : ‖x‖_0 = K}.   (3.18)

By construction, ‖x‖_2^2 = K for all x ∈ U. Thus if we construct X by picking elements from U then we automatically have ‖x‖_2 ≤ √K.

Next, observe that |U| = (N choose K) 2^K. Note also that ‖x − z‖_0 ≤ ‖x − z‖_2^2, and thus if ‖x − z‖_2^2 ≤ K/2 then ‖x − z‖_0 ≤ K/2. From this we observe that for any fixed x ∈ U,

|{z ∈ U : ‖x − z‖_2^2 ≤ K/2}| ≤ |{z ∈ U : ‖x − z‖_0 ≤ K/2}| ≤ (N choose K/2) 3^{K/2}.   (3.19)

Thus, suppose we construct the set X by iteratively choosing points from U that satisfy (3.16). After adding j points to the set, there are at least

(N choose K) 2^K − j (N choose K/2) 3^{K/2}   (3.20)

points left to pick from. Thus, we can construct a set of size |X| provided that

|X| (N choose K/2) 3^{K/2} ≤ (N choose K) 2^K.   (3.21)

Next, observe that

(N choose K) / (N choose K/2) = ( (K/2)! (N − K/2)! ) / ( K! (N − K)! ) = Π_{i=1}^{K/2} (N − K + i) / (K/2 + i),   (3.22)

and note that the ratio (N − K + i)/(K/2 + i) is decreasing as a function of i, so that it is minimized at i = K/2, where it equals (N − K/2)/K. Thus, if we set |X| = (N/K)^{K/2}, then we have

(3^{K/2} / 2^K) |X| = (3/4)^{K/2} (N/K)^{K/2} = ( 3N/(4K) )^{K/2} ≤ ( (N − K/2)/K )^{K/2} ≤ (N choose K) / (N choose K/2),   (3.23)

where the first inequality follows since K < N/2 implies N − K/2 > 3N/4. Hence, (3.21) holds for |X| = (N/K)^{K/2}, which establishes the lemma since log|X| = (K/2) log(N/K).
Using this lemma, we can establish the following bound on the required number of measurements to satisfy the RIP.

Theorem 3.4:
Let Φ be an M × N matrix that satisfies the RIP of order 2K with constant δ ∈ (0, 1/2]. Then

M ≥ C K log(N/K)   (3.24)

where C = 1/(2 log(√24 + 1)) ≈ 0.28.

Proof:
We first note that since Φ satisfies the RIP, then for the set of points X in Lemma 3.2 we have

‖Φx − Φz‖_2 ≥ √(1 − δ) ‖x − z‖_2 ≥ √(K/4)   (3.25)

for all x, z ∈ X, since x − z ∈ Σ_2K and δ ≤ 1/2. Similarly, we also have

‖Φx‖_2 ≤ √(1 + δ) ‖x‖_2 ≤ √(3K/2)   (3.26)

for all x ∈ X.

From the lower bound we can say that for any pair of points x, z ∈ X, if we center balls of radius √(K/4)/2 = √(K/16) at Φx and Φz, then these balls will be disjoint. In turn, the upper bound tells us that the entire set of balls is itself contained within a larger ball of radius √(3K/2) + √(K/16). Comparing the volume of this larger ball in R^M with the total volume of the |X| disjoint smaller balls, we obtain

( √(3K/2) + √(K/16) )^M ≥ |X| ( √(K/16) )^M,

or equivalently (√24 + 1)^M ≥ |X|, and hence

M ≥ log|X| / log(√24 + 1).   (3.27)

Combining this with the bound on |X| from Lemma 3.2 yields (3.24).

Note that the restriction to δ ≤ 1/2 is arbitrary and is made merely for convenience; minor modifications to the argument establish bounds for δ ≤ δ_max for any δ_max < 1. Moreover, although we have made no effort to optimize the constants, it is worth noting that they are already quite reasonable.
Although the proof is somewhat less direct, one can establish a similar result (in terms of the dependence on N and K) by examining the Gelfand width of the ℓ1 ball [98]. However, both this result and Theorem 3.4 fail to capture the precise dependence of M on the desired RIP constant δ. In order to quantify this dependence, we can exploit recent results concerning the Johnson-Lindenstrauss lemma, which concerns embeddings of finite sets of points in low-dimensional spaces [120]. Specifically, it is shown in [118] that if we are given a point cloud with p points and wish to embed these points in R^M such that the squared ℓ2 distance between any pair of points is preserved up to a factor of 1 ± ε, then we must have that

M ≥ c_0 log(p) / ε²,   (3.28)

where c_0 > 0 is a constant.

The Johnson-Lindenstrauss lemma is closely related to the RIP. We will see in "Matrices that satisfy the RIP" (Section 3.5) that any procedure that can be used for generating a linear, distance-preserving embedding for a point cloud can also be used to construct a matrix that satisfies the RIP. Moreover, in [127] it is shown that if a matrix Φ satisfies the RIP of order K = c_1 log(p) with constant δ, then Φ can be used to construct a distance-preserving embedding for p points with ε = δ/4. Combining these two results, we obtain

M ≥ c_0 log(p) / ε² = 16 c_0 K / (c_1 δ²).   (3.29)

Thus, for small δ the number of measurements required to ensure that Φ satisfies the RIP of order K will be proportional to K/δ², which may be significantly higher than K log(N/K). See [127] for further details.
3.4 The RIP and the NSP

Next we will show that if a matrix satisfies the restricted isometry property (Section 3.3) (RIP), then it also satisfies the null space property (Section 3.2) (NSP). Thus, the RIP is strictly stronger than the NSP.

Theorem 3.5:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Then Φ satisfies the NSP of order 2K with constant

C = √2 δ_{2K} / ( 1 − (1 + √2) δ_{2K} ).   (3.30)
Proof:
The proof of this theorem involves two useful lemmas. The first follows directly from standard norm inequalities by relating a K-sparse vector to a vector in R^K. We include a simple proof for completeness.

Lemma 3.3:
Suppose u ∈ Σ_K. Then

‖u‖_1 / √K ≤ ‖u‖_2 ≤ √K ‖u‖_∞.   (3.31)

Proof:
For any u, ‖u‖_1 = |⟨u, sgn(u)⟩|. By applying the Cauchy-Schwarz inequality we obtain ‖u‖_1 ≤ ‖u‖_2 ‖sgn(u)‖_2. The lower bound follows since sgn(u) has exactly K nonzero entries all equal to ±1 (since u ∈ Σ_K) and thus ‖sgn(u)‖_2 = √K. The upper bound is obtained by observing that each of the K nonzero entries of u can be upper bounded by ‖u‖_∞.
Below we state the second key lemma that we will need in order to prove Theorem 3.5. This result holds for arbitrary h, not just vectors h ∈ N(Φ). It should be clear that when we do have h ∈ N(Φ), the argument could be simplified considerably. However, this lemma will prove immensely useful when we turn to the problem of sparse recovery from noisy measurements later in this course, and thus we establish it now in its full generality. The lemma is proven in "ℓ1 minimization proof" (Section 7.4).

Lemma 3.4:
Suppose that Φ satisfies the RIP of order 2K, and let h ∈ R^N, h ≠ 0, be arbitrary. Let Λ_0 be any subset of {1, 2, ..., N} with |Λ_0| ≤ K. Define Λ_1 as the index set corresponding to the K entries of h_{Λ_0^c} with largest magnitude, and set Λ = Λ_0 ∪ Λ_1. Then

‖h_Λ‖_2 ≤ α ‖h_{Λ_0^c}‖_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2,   (3.32)

where

α = √2 δ_{2K} / (1 − δ_{2K}),  β = 1 / (1 − δ_{2K}).   (3.33)

Again, note that Lemma 3.4 holds for arbitrary h. In order to prove Theorem 3.5, we merely need to apply Lemma 3.4 to the case where h ∈ N(Φ). Towards this end, suppose that h ∈ N(Φ). It is sufficient to show that

‖h_Λ‖_2 ≤ C ‖h_{Λ^c}‖_1 / √K   (3.34)

holds for the case where Λ is the index set corresponding to the 2K largest entries of h. Thus, we can take Λ_0 to be the index set corresponding to the K largest entries of h and apply Lemma 3.4.

The second term in Lemma 3.4 vanishes since Φh = 0, and thus we have

‖h_Λ‖_2 ≤ α ‖h_{Λ_0^c}‖_1 / √K.   (3.35)

Using Lemma 3.3,

‖h_{Λ_0^c}‖_1 = ‖h_{Λ_1}‖_1 + ‖h_{Λ^c}‖_1 ≤ √K ‖h_{Λ_1}‖_2 + ‖h_{Λ^c}‖_1,   (3.36)

resulting in

‖h_Λ‖_2 ≤ α ( ‖h_{Λ_1}‖_2 + ‖h_{Λ^c}‖_1 / √K ).   (3.37)

Since ‖h_{Λ_1}‖_2 ≤ ‖h_Λ‖_2, we have that

(1 − α) ‖h_Λ‖_2 ≤ α ‖h_{Λ^c}‖_1 / √K.   (3.38)

The assumption δ_{2K} < √2 − 1 ensures that α < 1, and thus we may divide by 1 − α without changing the direction of the inequality to establish (3.34) with constant

C = α / (1 − α) = √2 δ_{2K} / ( 1 − (1 + √2) δ_{2K} ),   (3.39)

as desired.
3.5 Matrices that satisfy the RIP

We now turn to the question of how to construct matrices that satisfy the restricted isometry property (Section 3.3) (RIP). It is possible to deterministically construct matrices of size M × N that satisfy the RIP of order K, but known constructions require M to be relatively large, e.g., M = O(K² log N) or M = O(KN^α) for some constant α. In many real-world settings, these results would lead to an unacceptably large requirement on M.

Fortunately, these limitations can be overcome by randomizing the matrix construction. Given M and N, we generate random matrices Φ by choosing the entries Φ_ij as independent realizations from some probability distribution. We begin by observing that if all we require is that δ_{2K} > 0, then a random draw essentially suffices; the challenge is to achieve the RIP of order 2K with a prescribed constant δ_{2K} while keeping M small. Our treatment is based on the simple approach first described in [7] and later generalized to a larger class of random matrices in [144].
To ensure that the matrix will satisfy the RIP, we will impose two conditions on the random distribution. First, we require that the distribution yields a matrix that is norm-preserving, which requires that

E(Φ_ij²) = 1/M,   (3.40)

and hence the variance of the distribution is 1/M. Second, we require that the distribution is a sub-Gaussian distribution (Section 7.1), meaning that there exists a constant c > 0 such that

E( exp(Φ_ij t) ) ≤ exp( c² t² / 2 )   (3.41)

for all t ∈ R. This says that the moment-generating function of our distribution is dominated by that of a Gaussian distribution, which is also equivalent to requiring that the tails of our distribution decay at least as fast as the tails of a Gaussian distribution. Examples of sub-Gaussian distributions include the Gaussian distribution, the Bernoulli distribution taking values ±1/√M, and more generally any distribution with bounded support. See "Sub-Gaussian random variables" (Section 7.1) for more details.

For the moment, we will actually assume a bit more than sub-Gaussianity. Specifically, we will assume that the entries of Φ are strictly sub-Gaussian, which means that they satisfy (3.41) with c² = E(Φ_ij²) = 1/M. Similar results to the following would hold for general sub-Gaussian distributions, but to simplify the constants we restrict our present attention to the strictly sub-Gaussian Φ. In this case we have the following useful result, which is proven in "Concentration of measure for sub-Gaussian random variables" (Section 7.2).
Corollary 3.1:
Suppose that Φ is an M × N matrix whose entries Φ_ij are i.i.d., with Φ_ij drawn according to a strictly sub-Gaussian distribution with c² = 1/M. Let Y = Φx for x ∈ R^N. Then for any ε > 0, and any x ∈ R^N,

E( ‖Y‖_2² ) = ‖x‖_2²   (3.42)

and

P( | ‖Y‖_2² − ‖x‖_2² | ≥ ε ‖x‖_2² ) ≤ 2 exp( −M ε² / κ* ),   (3.43)

where κ* = 2/(1 − log 2) ≈ 6.52.

This tells us that the norm of a sub-Gaussian random vector strongly concentrates about its mean. Using this result, in "Proof of the RIP for sub-Gaussian matrices" (Section 7.3) we provide a simple proof based on that in [7] that sub-Gaussian matrices satisfy the RIP.
Theorem 3.6:
Fix δ ∈ (0, 1). Let Φ be an M × N random matrix whose entries Φ_ij are i.i.d., with Φ_ij drawn according to a strictly sub-Gaussian distribution with c² = 1/M. If

M ≥ κ_1 K log(N/K),   (3.44)

then Φ satisfies the RIP of order K with the prescribed δ with probability exceeding 1 − 2e^{−κ_2 M}, where κ_1 and κ_2 are positive constants depending only on δ.

Note that in light of the measurement bounds in "The restricted isometry property" (Section 3.3), we see that (3.44) achieves the optimal number of measurements (up to a constant).
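A minimal numerical illustration of such random constructions, assuming NumPy and with arbitrary sizes and seed: both Gaussian and Rademacher (±1/√M) entries are strictly sub-Gaussian with variance 1/M, and the norm of Φx concentrates around the norm of x, as Corollary 3.1 predicts.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 128, 512                                      # illustrative sizes

    # Two strictly sub-Gaussian constructions with entrywise variance 1/M.
    Phi_gauss = rng.standard_normal((M, N)) / np.sqrt(M)
    Phi_rademacher = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)

    # Norm preservation: ||Phi x||_2 is close to ||x||_2 for a fixed x.
    x = rng.standard_normal(N)
    for Phi in (Phi_gauss, Phi_rademacher):
        print(np.linalg.norm(Phi @ x) / np.linalg.norm(x))   # both ratios near 1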
Using random matrices to construct Φ has a number of additional benefits. First, one can show that for random constructions the measurements are democratic, meaning that it is possible to recover a signal using any sufficiently large subset of the measurements [58], [129]. Thus, by using random Φ one can be robust to the loss or corruption of a small fraction of the measurements. Second, and perhaps more significantly, in practice we are often more interested in the setting where x is sparse with respect to some basis Ψ. In this case what we actually require is that the product ΦΨ satisfies the RIP. If we were to use a deterministic construction then we would need to explicitly take Ψ into account in our construction of Φ, but when Φ is chosen randomly we can avoid this consideration. For example, if Φ is chosen according to a Gaussian distribution and Ψ is an orthonormal basis, then one can easily show that ΦΨ will also have a Gaussian distribution, and so provided that M is sufficiently high ΦΨ will satisfy the RIP with high probability, just as before. Although less obvious, similar results hold for sub-Gaussian distributions as well [7]. This property, sometimes referred to as universality, constitutes a significant advantage of using random matrices to construct Φ.

Finally, we note that since the fully random matrix approach is sometimes impractical to build in hardware, several hardware architectures have been implemented and/or proposed that enable random measurements to be acquired in practical settings. Examples include the random demodulator (Section 6.5) [192], random filtering [194], the modulated wideband converter [147], random convolution [2], [166], and the compressive multiplexer [179]. These architectures typically use a reduced amount of randomness and are modeled via matrices Φ that have significantly more structure than a fully random matrix. Perhaps somewhat surprisingly, while it is typically not quite as easy as in the fully random case, one can prove that many of these constructions also satisfy the RIP.
3.6 Coherence
While the spark (Section 3.2), null space property (Section 3.2) (NSP), and restricted isometry property (Section 3.3) (RIP) all provide guarantees for the recovery of sparse (Section 2.3) signals, verifying that a general matrix Φ satisfies any of these properties has a combinatorial computational complexity, since in each case one must essentially consider all (N choose K) submatrices. In many settings it is preferable to use properties of Φ that are easily computable to provide more concrete recovery guarantees. The coherence of a matrix is one such property.

Definition 3.5:
The coherence of a matrix Φ, μ(Φ), is the largest absolute inner product between any two columns φ_i, φ_j of Φ:

μ(Φ) = max_{1 ≤ i < j ≤ N} |⟨φ_i, φ_j⟩| / ( ‖φ_i‖_2 ‖φ_j‖_2 ).   (3.45)

It is possible to show that the coherence of a matrix is always in the range μ(Φ) ∈ [ √( (N − M)/(M(N − 1)) ), 1 ]; the lower bound is known as the Welch bound. Note that when N >> M, the lower bound is approximately μ(Φ) ≥ 1/√M.
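Unlike the spark, NSP, or RIP, the coherence is cheap to compute directly from the definition. A short sketch (the function name and the sizes are illustrative choices):

    import numpy as np

    def coherence(Phi):
        # Largest absolute inner product between distinct normalized columns, as in (3.45).
        cols = Phi / np.linalg.norm(Phi, axis=0)
        G = np.abs(cols.T @ cols)          # Gram matrix of the normalized columns
        np.fill_diagonal(G, 0.0)
        return G.max()

    rng = np.random.default_rng(0)
    M, N = 64, 256
    Phi = rng.standard_normal((M, N))
    welch = np.sqrt((N - M) / (M * (N - 1)))     # Welch lower bound
    print(coherence(Phi), welch)                 # coherence is always >= the Welch bound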
One can sometimes relate coherence to the spark, NSP, and RIP. For example, the coherence and spark properties of a matrix can be related by employing the Gershgorin circle theorem [100], [200].

Theorem 3.7: (Theorem 2 of [100])
The eigenvalues of an N × N matrix M with entries m_ij, 1 ≤ i, j ≤ N, lie in the union of N discs d_i = d_i(c_i, r_i), 1 ≤ i ≤ N, centered at c_i = m_ii and with radius r_i = Σ_{j≠i} |m_ij|.

Applying this theorem to the Gram matrix G = Φ_Λ^T Φ_Λ leads to the following straightforward result.

Lemma 3.5:
For any matrix Φ,

spark(Φ) ≥ 1 + 1/μ(Φ).   (3.46)

Proof:
Since spark(Φ) does not depend on the scaling of the columns, we can assume without loss of generality that Φ has unit-norm columns. Let Λ ⊆ {1, ..., N} with |Λ| = p determine a set of indices. We consider the restricted Gram matrix G = Φ_Λ^T Φ_Λ, which satisfies g_ii = 1 for 1 ≤ i ≤ p and |g_ij| ≤ μ(Φ) for 1 ≤ i, j ≤ p, i ≠ j. From Theorem 3.7, if Σ_{j≠i} |g_ij| < |g_ii| then the matrix G is positive definite, so that the columns of Φ_Λ are linearly independent. This holds whenever (p − 1) μ(Φ) < 1, or equivalently p < 1 + 1/μ(Φ). Thus any fewer than 1 + 1/μ(Φ) columns are linearly independent, which implies spark(Φ) ≥ 1 + 1/μ(Φ).
By merging Theorem 1 from "Null space conditions" (Section 3.2) with Lemma 3.5, we can pose the following condition on Φ that guarantees uniqueness.

Theorem 3.8: (Theorem 12 of [71])
If

K < (1/2) ( 1 + 1/μ(Φ) ),   (3.47)

then for each measurement vector y ∈ R^M there exists at most one signal x ∈ Σ_K such that y = Φx.

Theorem 3.8, together with the Welch bound, provides an upper bound on the level of sparsity K that guarantees uniqueness using coherence: K = O(√M). Another straightforward application of the Gershgorin circle theorem (Theorem 3.7) connects the RIP to the coherence property.

Lemma 3.6:
If Φ has unit-norm columns and coherence μ = μ(Φ), then Φ satisfies the RIP of order K with δ_K = (K − 1)μ for all K < 1/μ.
()
herence bounds have been studied both for deterministic and randomized matrices. For example, there are
known matrices
of size M M 2
frame generated from the Alltop sequence [114] and more general equiangular tight frames [180]. These constructions restrict the number of measurements needed to recover a
Furthermore, it can be shown that when the distribution used has zero mean and nite variance, then in
the asymptotic regime (as
and
p
() = (2logN ) /M [23], [29],
M = O K 2 logN , matching the known
to grow asymptotically as
nite-dimensional bounds.
The measurement bounds dependent on coherence are handicapped by the squared dependence on the
sparsity
K,
but it is possible to overcome this bottleneck by shifting the types of guarantees from worst-
()
K = O (M logN ),
which returns to the linear dependence of the measurement bound on the signal
28
Chapter 4
Sparse Signal Recovery via ℓ1 Minimization

4.1 Signal recovery via ℓ1 minimization
As we will see later in this course, there now exist a wide variety of approaches to recover a sparse (Section 2.3) signal x from a small number of linear measurements. We begin by considering a natural first approach to the problem of sparse recovery. Given measurements y = Φx and the knowledge that our original signal x is sparse or compressible (Section 2.4), it is natural to attempt to recover x by solving an optimization problem of the form

x̂ = argmin_z ‖z‖_0 subject to z ∈ B(y),   (4.1)

where B(y) ensures that x̂ is consistent with the measurements y. Recall that ‖z‖_0 = |supp(z)| simply counts the number of nonzero entries in z, so (4.1) seeks out the sparsest signal consistent with the observed measurements. For example, if our measurements are exact and noise-free, then we can set B(y) = {z : Φz = y}. When the measurements have been contaminated with a small amount of bounded noise, we could instead consider B(y) = {z : ‖Φz − y‖_2 ≤ ε}. In both cases, (4.1) finds the sparsest x that is consistent with the measurements y.

Note that in (4.1) we are inherently assuming that x itself is sparse. In the more common setting where x = Ψα, we can easily modify the approach and instead consider

α̂ = argmin_z ‖z‖_0 subject to z ∈ B(y),   (4.2)

where B(y) = {z : ΦΨz = y} or B(y) = {z : ‖ΦΨz − y‖_2 ≤ ε}. By setting Φ̃ = ΦΨ we see that (4.1) and (4.2) are essentially identical. Moreover, as noted in "Matrices that satisfy the RIP" (Section 3.5), in many cases the introduction of Ψ does not significantly complicate the construction of matrices Φ such that ΦΨ will satisfy the desired properties. Thus, for most of the remainder of this course we will restrict our attention to the case where Ψ = I. It is important to note, however, that this restriction does impose certain limits in our analysis when Ψ is a general dictionary and not an orthonormal basis. For example, in this case ‖x̂ − x‖_2 = ‖Ψĉ − Ψc‖_2 ≠ ‖ĉ − c‖_2, and thus a bound on ‖ĉ − c‖_2 cannot directly be translated into a bound on ‖x̂ − x‖_2, which is often the metric of interest.
Although it is possible to analyze the performance of (4.1) under the appropriate assumptions on Φ, we do not pursue this strategy since the objective function ‖·‖_0 is nonconvex, and hence (4.1) is potentially very difficult to solve. In fact, one can show that for a general matrix Φ, even finding a solution that approximates the true minimum is NP-hard. One avenue for translating this problem into something more tractable is to replace ‖·‖_0 with its convex approximation ‖·‖_1. Specifically, we consider

x̂ = argmin_z ‖z‖_1 subject to z ∈ B(y).   (4.3)

Provided that B(y) is convex, (4.3) is computationally feasible. In fact, when B(y) = {z : Φz = y}, the resulting problem can be posed as a linear program [43].
Figure 4.1: Best approximation of a point in R^2 by a one-dimensional subspace using the ℓ1 norm and the ℓp quasinorm with p = 1/2. (a) Approximation in ℓ1 norm (b) Approximation in ℓp quasinorm.
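When B(y) = {z : Φz = y}, the problem (4.3) can be recast as a linear program by splitting z into its positive and negative parts. The sketch below does this with SciPy's linprog; the solver choice and the helper name basis_pursuit are illustrative assumptions, not part of the course material.

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(Phi, y):
        # Solve min ||z||_1 subject to Phi z = y via the standard LP reformulation
        # z = u - v with u, v >= 0 and objective sum(u) + sum(v).
        M, N = Phi.shape
        c = np.ones(2 * N)
        A_eq = np.hstack([Phi, -Phi])
        res = linprog(c, A_eq=A_eq, b_eq=y,
                      bounds=[(0, None)] * (2 * N), method="highs")
        u, v = res.x[:N], res.x[N:]
        return u - v

    # Recover a sparse vector from M < N noise-free measurements.
    rng = np.random.default_rng(1)
    N, M, K = 128, 48, 5
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x_hat = basis_pursuit(Phi, Phi @ x)
    print(np.linalg.norm(x_hat - x))    # near zero when recovery succeeds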
It is clear that replacing (4.1) with (4.3) transforms a computationally intractable problem into a tractable one, but it may not be immediately obvious that the solution to (4.3) will be at all similar to the solution to (4.1). However, there are certainly intuitive reasons to expect that the use of ℓ1 minimization will indeed promote sparsity. As an example, recall the example shown in Figure 4.1. In this case the solution to the ℓ1 minimization problem coincides exactly with the solution to the ℓp minimization problem for any p < 1, and notably, it is sparse. Moreover, the use of ℓ1 minimization to promote or exploit sparsity has a long history, dating back at least to the work of Beurling on Fourier transform extrapolation from partial observations [16].

Additionally, in a somewhat different context, in 1965 Logan [133] showed that a bandlimited signal can be perfectly recovered in the presence of arbitrary corruptions on a small interval. Again, the recovery method consists of searching for the bandlimited signal that is closest to the observed signal in the ℓ1 norm. This can be viewed as further validation of the intuition gained from Figure 4.1: the ℓ1 norm is well-suited to sparse errors.

Historically, the use of ℓ1 minimization on large problems finally became practical with the explosion of computing power in the late 1970's and early 1980's. In one of its first applications, it was demonstrated that geophysical signals consisting of spike trains could be recovered from only the high-frequency components of these signals by exploiting ℓ1 minimization [132], [184], [207]. Finally, in the 1990's there was renewed interest in these approaches within the signal processing community for the purpose of finding sparse approximations
31
(Section 2.4) to signals and images when represented in overcomplete dictionaries or unions of bases [43],
[140]. Separately,
`1
variable selection in linear regression (Section 6.1), known as the Lasso [187].
Thus, there are a variety of reasons to suspect that
for sparse signal recovery.
`1
`1
(Section 4.2) and noisy (Section 4.3) settings from a theoretical perspective. We will then further discuss
`1
x= argmin kzk1
subject to
B (y).
z B (y) .
(4.4)
on Lemma 4 from "`1 minimization proof" (Section 7.4). The key ideas in this proof follow from [25].
Lemma 4.1:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Let x, x̂ ∈ R^N be given, and define h = x̂ − x. Let Λ_0 denote the index set corresponding to the K entries of x with largest magnitude and Λ_1 the index set corresponding to the K entries of h_{Λ_0^c} with largest magnitude. Set Λ = Λ_0 ∪ Λ_1. If ‖x̂‖_1 ≤ ‖x‖_1, then

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2,   (4.5)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_1 = 2 / ( 1 − (1 + √2) δ_{2K} ).   (4.6)
Proof:
We begin by observing that h = h_Λ + h_{Λ^c}, so that from the triangle inequality

‖h‖_2 ≤ ‖h_Λ‖_2 + ‖h_{Λ^c}‖_2.   (4.7)

We first aim to bound ‖h_{Λ^c}‖_2. From the results in "ℓ1 minimization proof" (Section 7.4) we have

‖h_{Λ^c}‖_2 = ‖ Σ_{j≥2} h_{Λ_j} ‖_2 ≤ Σ_{j≥2} ‖h_{Λ_j}‖_2 ≤ ‖h_{Λ_0^c}‖_1 / √K,   (4.8)

where Λ_1 is the index set corresponding to the K largest entries of h_{Λ_0^c}, Λ_2 the index set corresponding to the next K largest entries, and so on.

We now wish to bound ‖h_{Λ_0^c}‖_1. Since ‖x‖_1 ≥ ‖x̂‖_1, by applying the triangle inequality we obtain

‖x‖_1 ≥ ‖x + h‖_1 = ‖x_{Λ_0} + h_{Λ_0}‖_1 + ‖x_{Λ_0^c} + h_{Λ_0^c}‖_1 ≥ ‖x_{Λ_0}‖_1 − ‖h_{Λ_0}‖_1 + ‖h_{Λ_0^c}‖_1 − ‖x_{Λ_0^c}‖_1.   (4.9)

Rearranging and again applying the triangle inequality,

‖h_{Λ_0^c}‖_1 ≤ ‖x‖_1 − ‖x_{Λ_0}‖_1 + ‖h_{Λ_0}‖_1 + ‖x_{Λ_0^c}‖_1 ≤ ‖x − x_{Λ_0}‖_1 + ‖h_{Λ_0}‖_1 + ‖x_{Λ_0^c}‖_1.   (4.10)

Recalling that σ_K(x)_1 = ‖x_{Λ_0^c}‖_1 = ‖x − x_{Λ_0}‖_1, this yields

‖h_{Λ_0^c}‖_1 ≤ ‖h_{Λ_0}‖_1 + 2 σ_K(x)_1.   (4.11)

Combining this with (4.8) we obtain

‖h_{Λ^c}‖_2 ≤ ( ‖h_{Λ_0}‖_1 + 2 σ_K(x)_1 ) / √K ≤ ‖h_{Λ_0}‖_2 + 2 σ_K(x)_1 / √K,   (4.12)

where the last inequality follows from standard bounds on ℓp norms (Lemma 3.3). By observing that ‖h_{Λ_0}‖_2 ≤ ‖h_Λ‖_2, this combines with (4.7) to yield

‖h‖_2 ≤ 2 ‖h_Λ‖_2 + 2 σ_K(x)_1 / √K.   (4.13)

We now turn to establishing a bound for ‖h_Λ‖_2. Combining Lemma 3.4 (proven in "ℓ1 minimization proof" (Section 7.4)) with (4.11) and again applying standard bounds on ℓp norms we obtain

‖h_Λ‖_2 ≤ α ‖h_{Λ_0^c}‖_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2
       ≤ α ( ‖h_{Λ_0}‖_1 + 2 σ_K(x)_1 ) / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2
       ≤ α ‖h_{Λ_0}‖_2 + 2α σ_K(x)_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.14)

Since ‖h_{Λ_0}‖_2 ≤ ‖h_Λ‖_2,

(1 − α) ‖h_Λ‖_2 ≤ 2α σ_K(x)_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.15)

The assumption δ_{2K} < √2 − 1 ensures that α < 1. Dividing by (1 − α) and combining with (4.13) results in

‖h‖_2 ≤ ( 4α/(1 − α) + 2 ) σ_K(x)_1 / √K + ( 2β/(1 − α) ) |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.16)

Plugging in for α and β yields the desired constants.
In order to apply Lemma 4.1, we must consider how the choice of B(y) affects the term |⟨Φh_Λ, Φh⟩|. As an example, in the case of noise-free measurements we obtain the following theorem.

Theorem 4.1: (Theorem 1.1 of [25])
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1 and we obtain measurements of the form y = Φx. Then when B(y) = {z : Φz = y}, the solution x̂ to (4.4) obeys

‖x̂ − x‖_2 ≤ C_0 σ_K(x)_1 / √K.   (4.17)
Proof:
Since x ∈ B(y), we have ‖x̂‖_1 ≤ ‖x‖_1 and can apply Lemma 4.1 to obtain, for h = x̂ − x,

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.18)

Furthermore, since x, x̂ ∈ B(y) we also have that y = Φx = Φx̂, and hence Φh = 0. Therefore the second term vanishes, and we obtain the desired result.

Theorem 4.1 is rather remarkable. By considering the case where x ∈ Σ_K = {x : ‖x‖_0 ≤ K} we can see that provided Φ satisfies the RIP (which, as shown earlier, allows for as few as O(K log(N/K)) measurements) we can recover any K-sparse x exactly. This result seems improbable on its own, and so one might expect that the procedure would be highly sensitive to noise, but we will see next that Lemma 4.1 can also be used to demonstrate that this approach is actually stable.

Note that Theorem 4.1 assumes that Φ satisfies the RIP. One could easily modify the argument to replace this with the assumption that Φ satisfies the NSP instead: in the noiseless setting h lies in the null space of Φ, the RIP implies the NSP (see "The RIP and the NSP" (Section 3.4)), and the NSP implies the simplified version of Lemma 4.1. This proof directly mirrors that of Lemma 4.1. Thus, by the same argument as in the proof of Theorem 4.1, it is straightforward to show that if Φ satisfies the NSP then it will obey the same error bound.
4.3 Signal recovery in noise

The ability to perfectly reconstruct a sparse (Section 2.3) signal from noise-free (Section 4.2) measurements represents a promising result. However, in most real-world systems the measurements are contaminated by some form of noise. For instance, in order to process data in a computer we must be able to represent it using a finite number of bits, and hence the measurements will typically be subject to quantization error. Perhaps somewhat surprisingly, one can show that it is possible to modify

x̂ = argmin_z ‖z‖_1 subject to z ∈ B(y)   (4.19)

to stably recover sparse signals under a variety of common noise models [34], [39], [112]. As might be expected, the restricted isometry property (Section 3.3) (RIP) is extremely useful in establishing performance guarantees in noise.

In our analysis we will make repeated use of Lemma 1 from "Noise-free signal recovery" (Section 4.2), so we repeat it here for convenience.
Lemma 4.2:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Let x, x̂ ∈ R^N be given, and define h = x̂ − x. Let Λ_0 denote the index set corresponding to the K entries of x with largest magnitude and Λ_1 the index set corresponding to the K entries of h_{Λ_0^c} with largest magnitude. Set Λ = Λ_0 ∪ Λ_1. If ‖x̂‖_1 ≤ ‖x‖_1, then

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2,   (4.20)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_1 = 2 / ( 1 − (1 + √2) δ_{2K} ).   (4.21)
Theorem 4.2: (Theorem 1.2 of [26])
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1 and let y = Φx + e, where ‖e‖_2 ≤ ε. Then when B(y) = {z : ‖Φz − y‖_2 ≤ ε}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ C_0 σ_K(x)_1 / √K + C_2 ε,   (4.22)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_2 = 4 √(1 + δ_{2K}) / ( 1 − (1 + √2) δ_{2K} ).   (4.23)

Proof:
We are interested in bounding ‖h‖_2 = ‖x̂ − x‖_2. Since ‖e‖_2 ≤ ε, x ∈ B(y), and therefore we know that ‖x̂‖_1 ≤ ‖x‖_1. Thus we may apply Lemma 4.2, and it remains to bound |⟨Φh_Λ, Φh⟩|. To do this, we observe that

‖Φh‖_2 = ‖Φ(x̂ − x)‖_2 = ‖Φx̂ − y + y − Φx‖_2 ≤ ‖Φx̂ − y‖_2 + ‖y − Φx‖_2 ≤ 2ε,   (4.24)

where the last inequality follows since x, x̂ ∈ B(y). Combining this with the RIP and the Cauchy-Schwarz inequality we obtain

|⟨Φh_Λ, Φh⟩| ≤ ‖Φh_Λ‖_2 ‖Φh‖_2 ≤ 2ε √(1 + δ_{2K}) ‖h_Λ‖_2.   (4.25)

Thus,

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 ( 2ε √(1 + δ_{2K}) ) = C_0 σ_K(x)_1 / √K + C_2 ε,   (4.26)

completing the proof.
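In practice, the quadratically constrained problem (4.19) with B(y) = {z : ‖Φz − y‖_2 ≤ ε} can be handed to a general convex solver. A minimal sketch using the third-party cvxpy package (one possible choice of solver, not one mandated by the course):

    import cvxpy as cp

    def l1_recover_noisy(Phi, y, eps):
        # Solve min ||z||_1 subject to ||Phi z - y||_2 <= eps.
        N = Phi.shape[1]
        z = cp.Variable(N)
        problem = cp.Problem(cp.Minimize(cp.norm(z, 1)),
                             [cp.norm(Phi @ z - y, 2) <= eps])
        problem.solve()
        return z.value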
In order to place this result in context, consider how we would recover a sparse vector x if we happened to already know the K locations of the nonzero coefficients, which we denote by Λ_0. This is referred to as the oracle estimator. In this case a natural approach is to reconstruct the signal using a simple pseudoinverse:

x̂_{Λ_0} = Φ_{Λ_0}^† y = ( Φ_{Λ_0}^T Φ_{Λ_0} )^{-1} Φ_{Λ_0}^T y,
x̂_{Λ_0^c} = 0.   (4.27)

The implicit assumption in (4.27) is that Φ_{Λ_0} has full column-rank (here Φ_{Λ_0} denotes the M × K submatrix whose columns are indexed by Λ_0), so that there is a unique solution to the equation y = Φ_{Λ_0} x_{Λ_0}. With this choice, the recovery error is given by

‖x̂ − x‖_2 = ‖ ( Φ_{Λ_0}^T Φ_{Λ_0} )^{-1} Φ_{Λ_0}^T (Φx + e) − x ‖_2 = ‖ ( Φ_{Λ_0}^T Φ_{Λ_0} )^{-1} Φ_{Λ_0}^T e ‖_2.   (4.28)

Therefore, if Φ satisfies the RIP of order 2K (with constant δ_{2K}), then the singular values of Φ_{Λ_0} lie between √(1 − δ_{2K}) and √(1 + δ_{2K}), and so, considering the worst case over all e with ‖e‖_2 ≤ ε, the recovery error satisfies

ε / √(1 + δ_{2K}) ≤ ‖x̂ − x‖_2 ≤ ε / √(1 − δ_{2K}).   (4.29)

Thus, if x is exactly K-sparse, then the guarantee for the pseudoinverse recovery method, which is given perfect knowledge of the true support of x, cannot improve upon the bound in Theorem 4.2 by more than a constant value.
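A direct rendering of the oracle estimator (4.27), using a least-squares solve in place of forming the pseudoinverse explicitly (an equivalent and numerically preferable choice; the function name is an assumption for this sketch):

    import numpy as np

    def oracle_estimate(Phi, y, support):
        # Least-squares fit on the known support Lambda_0; zeros elsewhere, as in (4.27).
        x_hat = np.zeros(Phi.shape[1])
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat[support] = coeffs
        return x_hat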
We can also consider a slightly different noise model in which the noise is bounded through the quantity ‖Φ^T e‖_∞ rather than ‖e‖_2. The resulting program, obtained by taking B(y) = {z : ‖Φ^T (Φz − y)‖_∞ ≤ λ} in (4.19), is known as the Dantzig selector [39].

Theorem 4.3:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1 and we obtain measurements of the form y = Φx + e where ‖Φ^T e‖_∞ ≤ λ. Then when B(y) = {z : ‖Φ^T (Φz − y)‖_∞ ≤ λ}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ C_0 σ_K(x)_1 / √K + C_3 √K λ,   (4.30)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_3 = 4√2 / ( 1 − (1 + √2) δ_{2K} ).   (4.31)
Proof:
The proof mirrors that of Theorem 4.2. Since ‖Φ^T e‖_∞ ≤ λ, we again have that x ∈ B(y), so ‖x̂‖_1 ≤ ‖x‖_1 and thus Lemma 4.2 applies. We follow a similar approach as in Theorem 4.2 to bound |⟨Φh_Λ, Φh⟩|. We first note that

‖Φ^T Φh‖_∞ ≤ ‖Φ^T (Φx̂ − y)‖_∞ + ‖Φ^T (y − Φx)‖_∞ ≤ 2λ,   (4.32)

where the last inequality again follows since x, x̂ ∈ B(y). Next, note that Φh_Λ = Φ_Λ h_Λ. Using this we can write

|⟨Φh_Λ, Φh⟩| = |⟨h_Λ, Φ_Λ^T Φh⟩| ≤ ‖h_Λ‖_2 ‖Φ_Λ^T Φh‖_2.   (4.33)

Finally, since ‖Φ^T Φh‖_∞ ≤ 2λ, every coefficient of Φ^T Φh is at most 2λ, and since |Λ| ≤ 2K we have ‖Φ_Λ^T Φh‖_2 ≤ √(2K) (2λ). Thus,

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 ( 2λ √(2K) ) = C_0 σ_K(x)_1 / √K + C_3 √K λ,   (4.34)

as desired.
Finally, we also examine the performance of these approaches in the presence of Gaussian noise. The case of Gaussian noise was first considered in [112], which examined the performance of ℓ0 minimization with noisy measurements. We now see that Theorem 4.2 and Theorem 4.3 can be leveraged to provide similar guarantees for ℓ1 minimization. To simplify our discussion, we will restrict our attention to the case where x ∈ Σ_K = {x : ‖x‖_0 ≤ K}, so that σ_K(x)_1 = 0 and the error bounds in Theorem 4.2 and Theorem 4.3 depend only on the noise e.

To begin, suppose that the coefficients of e ∈ R^M are i.i.d. according to a Gaussian distribution with mean zero and variance σ². Since the Gaussian distribution is itself sub-Gaussian, we can apply results such as Corollary 1 from "Concentration of measure for sub-Gaussian random variables" (Section 7.2) to show that there exists a constant c_0 > 0 such that for any ε > 0,

P( ‖e‖_2 ≥ (1 + ε) √M σ ) ≤ exp( −c_0 ε² M ).   (4.35)

Applying this result to Theorem 4.2 with ε = 1, we obtain the following result for the special case of Gaussian noise.

Corollary 4.1:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Furthermore, suppose that x ∈ Σ_K and that we obtain measurements of the form y = Φx + e where the entries of e are i.i.d. N(0, σ²). Then when B(y) = {z : ‖Φz − y‖_2 ≤ 2√M σ}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ 8 ( √(1 + δ_{2K}) / ( 1 − (1 + √2) δ_{2K} ) ) √M σ   (4.36)

with probability at least 1 − exp(−c_0 M).
We can similarly consider Theorem 4.3 in the context of Gaussian noise. If we assume that the columns of Φ have unit norm, then each coefficient of Φ^T e is a Gaussian random variable with mean zero and variance σ². Using standard tail bounds for the Gaussian distribution (see Theorem 1 from "Sub-Gaussian random variables" (Section 7.1)), we have

P( |(Φ^T e)_i| ≥ t σ ) ≤ exp( −t²/2 )   (4.37)

for i = 1, 2, ..., N. Thus, using the union bound over the bounds for different i, we obtain

P( ‖Φ^T e‖_∞ ≥ 2 σ √(log N) ) ≤ N exp( −2 log N ) = 1/N.   (4.38)

Applying this to Theorem 4.3, we obtain the following result, which is a simplified version of Theorem 1.1 of [39].

Corollary 4.2:
Suppose that Φ has unit-norm columns and satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Furthermore, suppose that x ∈ Σ_K and that we obtain measurements of the form y = Φx + e where the entries of e are i.i.d. N(0, σ²). Then when B(y) = {z : ‖Φ^T (Φz − y)‖_∞ ≤ 2 σ √(log N)}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ 4√2 ( √(1 + δ_{2K}) / ( 1 − (1 + √2) δ_{2K} ) ) √(K log N) σ   (4.39)

with probability at least 1 − 1/N.
Ignoring the precise constants and the probabilities with which the bounds hold (which we have made no effort to optimize), we observe that if $M = O(K\log N)$ then these results appear to be essentially the same. However, there is a subtle difference. Specifically, if $M$ and $N$ are fixed and we consider the effect of varying $K$, we can see that Corollary 4.2, p. 36 yields a bound that is adaptive to this change, providing a stronger guarantee when $K$ is small, whereas the bound in Corollary 4.1 does not improve as $K$ is reduced. Thus, while they provide very similar guarantees, there are certain circumstances where the Dantzig selector is preferable. See [39] for further discussion of the comparative advantages of these approaches.
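To make this difference concrete, the following sketch (with illustrative values of $N$, $M$, and $\sigma$ that are not taken from the text) compares how the two error bounds scale as the sparsity $K$ varies, ignoring constants as above:

```python
import numpy as np

# Illustrative comparison of the scaling of the two Gaussian-noise bounds (constants ignored)
N, M, sigma = 10**6, 5000, 1.0
for K in (10, 100, 1000):
    bound_l2 = sigma * np.sqrt(M)              # scaling of Corollary 4.1 (does not depend on K)
    bound_ds = sigma * np.sqrt(K * np.log(N))  # scaling of Corollary 4.2 (shrinks with K)
    print(K, bound_l2, bound_ds)
```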
We now briefly return to the noise-free (Section 4.2) setting to take a closer look at instance-optimal guarantees for recovering non-sparse signals. To begin, recall that in Theorem 1 from "Noise-free signal recovery" (Section 4.2) we bounded the $\ell_2$-norm of the reconstruction error of

$\hat{x} = \underset{z}{\mathrm{argmin}} \; \|z\|_1 \quad \text{subject to} \quad z \in \mathcal{B}(y) \qquad (4.40)$

as

$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_K(x)_1}{\sqrt{K}} \qquad (4.41)$

when $\mathcal{B}(y) = \{z : \Phi z = y\}$. One can generalize this result to measure the reconstruction error using the $\ell_p$-norm for any $p \in [1, 2]$. For example, by a slight modification of these arguments, one can also show that $\|\hat{x} - x\|_1 \le C_0 \sigma_K(x)_1$ (see [27]). This leads us to ask whether we might replace the bound for the $\ell_1$ error with a result of the form $\|\hat{x} - x\|_2 \le C \sigma_K(x)_2$. Unfortunately, obtaining such a result requires an unreasonably large number of measurements, as quantified by the following theorem of [45].
Theorem 4.4: (Theorem 5.1 of [45])
Suppose that $\Phi$ is an $M \times N$ matrix and that $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ is a recovery algorithm that satisfies

$\|x - \Delta(\Phi x)\|_2 \le C \sigma_K(x)_2 \qquad (4.42)$

for some $K \ge 1$. Then $M > \left(1 - \sqrt{1 - 1/C^2}\right) N$.

Proof:
We begin by letting $h \in \mathbb{R}^N$ denote any vector in $\mathcal{N}(\Phi)$, the null space of $\Phi$. We write $h = h_\Lambda + h_{\Lambda^c}$ where $\Lambda$ is an arbitrary set of indices satisfying $|\Lambda| \le K$. Set $x = h_{\Lambda^c}$, and note that $\Phi x = \Phi h_{\Lambda^c} = \Phi h - \Phi h_\Lambda = -\Phi h_\Lambda$ since $h \in \mathcal{N}(\Phi)$. Since $h_\Lambda \in \Sigma_K$, (4.42) implies that $\Delta(\Phi x) = \Delta(-\Phi h_\Lambda) = -h_\Lambda$. Hence, $\|x - \Delta(\Phi x)\|_2 = \|h_{\Lambda^c} + h_\Lambda\|_2 = \|h\|_2$. Furthermore, we observe that $\sigma_K(x)_2 \le \|x\|_2$, since by definition $\sigma_K(x)_2 \le \|x - \tilde{x}\|_2$ for all $\tilde{x} \in \Sigma_K$, including $\tilde{x} = 0$. Thus $\|h\|_2 \le C\|h_{\Lambda^c}\|_2$. Since $\|h\|_2^2 = \|h_\Lambda\|_2^2 + \|h_{\Lambda^c}\|_2^2$, this yields

$\|h_\Lambda\|_2^2 = \|h\|_2^2 - \|h_{\Lambda^c}\|_2^2 \le \|h\|_2^2 - \frac{1}{C^2}\|h\|_2^2 = \left(1 - \frac{1}{C^2}\right)\|h\|_2^2. \qquad (4.43)$

This must hold for any vector $h \in \mathcal{N}(\Phi)$ and for any set of indices $\Lambda$ such that $|\Lambda| \le K$. In particular, let $\{v_i\}_{i=1}^{N-M}$ be an orthonormal basis for $\mathcal{N}(\Phi)$, and define the vectors $\{h_j\}_{j=1}^{N}$ as follows:

$h_j = \sum_{i=1}^{N-M} v_i(j)\, v_i. \qquad (4.44)$

We note that $h_j = \sum_{i=1}^{N-M} \langle e_j, v_i \rangle v_i$ where $e_j$ denotes the vector of all zeros except for a 1 in the $j$-th entry, so that $h_j = P_{\mathcal{N}} e_j$ where $P_{\mathcal{N}}$ denotes an orthogonal projection onto $\mathcal{N}(\Phi)$. Since $\|P_{\mathcal{N}} e_j\|_2^2 + \|P_{\mathcal{N}}^{\perp} e_j\|_2^2 = \|e_j\|_2^2 = 1$, we have that $\|h_j\|_2 \le 1$. Thus, by setting $\Lambda = \{j\}$ for $h_j$ we observe that

$\sum_{i=1}^{N-M} |v_i(j)|^2 = |h_j(j)| \le \sqrt{1 - \frac{1}{C^2}}\, \|h_j\|_2 \le \sqrt{1 - \frac{1}{C^2}}. \qquad (4.45)$

Summing over $j = 1, 2, ..., N$, we obtain

$N\sqrt{1 - \frac{1}{C^2}} \ge \sum_{j=1}^{N} \sum_{i=1}^{N-M} |v_i(j)|^2 = \sum_{i=1}^{N-M} \sum_{j=1}^{N} |v_i(j)|^2 = \sum_{i=1}^{N-M} \|v_i\|_2^2 = N - M, \qquad (4.46)$

and thus $M \ge \left(1 - \sqrt{1 - 1/C^2}\right) N$, as desired.
Thus, if we want a bound of the form (4.42) that holds for all signals $x$ with a constant $C \approx 1$, then regardless of what recovery algorithm we use we will need to take $M \approx N$ measurements. However, in a sense this result is overly pessimistic, and we will now see that the results we just established for signal recovery in noise can actually allow us to overcome this limitation by essentially treating the approximation error as noise.

Towards this end, notice that all the results concerning $\ell_1$ minimization stated thus far are deterministic guarantees that hold for all $x$ given any matrix satisfying the restricted isometry property (Section 3.5) (RIP). This is an important theoretical property, but as noted in "Matrices that satisfy the RIP" (Section 3.5), in practice it is very difficult to obtain a deterministic guarantee that the matrix $\Phi$ satisfies the RIP. In particular, constructions that rely on randomness are only known to satisfy the RIP with high probability (recall Theorem 1 from "Matrices that satisfy the RIP" (Section 3.5)), which opens the door to slightly weaker results that hold only with high probability.
Theorem 4.5:
Fix $\delta \in (0, 1)$. Let $\Phi$ be an $M \times N$ random matrix whose entries $\phi_{ij}$ are i.i.d. with $\phi_{ij}$ drawn according to a strictly sub-Gaussian distribution with $c^2 = 1/M$. If

$M \ge \kappa_1 K \log\left(\frac{N}{K}\right), \qquad (4.47)$

then $\Phi$ satisfies the RIP of order $K$ with the prescribed $\delta$ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_2$ depends only on $\delta$ and $\kappa_1$.

Even within the class of probabilistic results, there are two distinct flavors. The typical approach is to combine a probabilistic construction of a matrix that will satisfy the RIP with high probability with the previous results in this chapter. This yields a procedure that, with high probability, satisfies a deterministic guarantee applying to all possible signals $x$. A weaker kind of result states that, given a signal $x$, we can draw a random matrix $\Phi$ and with high probability expect certain performance for that signal $x$. This type of guarantee is sometimes called instance-optimal in probability. The distinction is essentially whether or not we need to draw a new matrix $\Phi$ for each signal $x$. This may be an important distinction in practice, but if we assume for the moment that it is permissible to draw a new matrix $\Phi$ for each $x$, then we can see that Theorem 4.4, (Theorem 5.1 of [45]), p. 37 may be somewhat pessimistic. In order to establish our main result we will rely on the fact, previously used in "Matrices that satisfy the RIP" (Section 3.5), that sub-Gaussian matrices preserve the norm of an arbitrary vector with high probability. Specifically, a slight modification of Corollary 1 from "Matrices that satisfy the RIP" (Section 3.5) shows that for any $x \in \mathbb{R}^N$, if we choose $\Phi$ according to Theorem 4.5, then we also have that

$P\left(\|\Phi x\|_2^2 \ge 2\|x\|_2^2\right) \le \exp\left(-\kappa_3 M\right) \qquad (4.48)$

with $\kappa_3 = 4/\kappa^*$, where $\kappa^*$ is the constant in the concentration of measure inequality for sub-Gaussian matrices.
Theorem 4.6:
Let $x \in \mathbb{R}^N$ be fixed. Set $\delta_{2K} < \sqrt{2} - 1$. Suppose that $\Phi$ is an $M \times N$ sub-Gaussian random matrix with $M \ge \kappa_1 K \log(N/K)$, and suppose we obtain measurements of the form $y = \Phi x$. Set $\epsilon = 2\sigma_K(x)_2$. Then with probability exceeding $1 - 2\exp(-\kappa_2 M) - \exp(-\kappa_3 M)$, when $\mathcal{B}(y) = \{z : \|\Phi z - y\|_2 \le \epsilon\}$, the solution $\hat{x}$ to (4.40) obeys

$\|\hat{x} - x\|_2 \le \frac{8\sqrt{1+\delta_{2K}} + 1 - (1+\sqrt{2})\delta_{2K}}{1 - (1+\sqrt{2})\delta_{2K}}\, \sigma_K(x)_2. \qquad (4.49)$

Proof:
First we recall that, as noted above, from Theorem 4.5, p. 38 we have that $\Phi$ will satisfy the RIP of order $2K$ with probability at least $1 - 2\exp(-\kappa_2 M)$. Next, let $\Lambda$ denote the index set corresponding to the $K$ entries of $x$ with largest magnitude and write $x = x_\Lambda + x_{\Lambda^c}$. Since $x_\Lambda \in \Sigma_K$, we can write $\Phi x = \Phi x_\Lambda + \Phi x_{\Lambda^c} = \Phi x_\Lambda + e$. If $\Phi$ is sub-Gaussian then from Lemma 2 from "Sub-Gaussian random variables" (Section 7.1) we have that $\Phi x_{\Lambda^c}$ is also sub-Gaussian, and one can apply (4.48) to obtain that with probability at least $1 - \exp(-\kappa_3 M)$, $\|\Phi x_{\Lambda^c}\|_2 \le 2\|x_{\Lambda^c}\|_2 = 2\sigma_K(x)_2$. Thus, applying the union bound we have that with probability exceeding $1 - 2\exp(-\kappa_2 M) - \exp(-\kappa_3 M)$, we satisfy the necessary conditions to apply Theorem 1 from "Signal recovery in noise" (Section 4.3) to $x_\Lambda$, in which case $\sigma_K(x_\Lambda)_1 = 0$ and hence

$\|\hat{x} - x_\Lambda\|_2 \le 2C_2\, \sigma_K(x)_2. \qquad (4.50)$

From the triangle inequality we thus obtain

$\|\hat{x} - x\|_2 = \|\hat{x} - x_\Lambda + x_\Lambda - x\|_2 \le \|\hat{x} - x_\Lambda\|_2 + \|x_\Lambda - x\|_2 \le (2C_2 + 1)\,\sigma_K(x)_2, \qquad (4.51)$

which establishes the theorem.
Thus, although it is not possible to achieve a deterministic guarantee of the form in (4.42) without taking a prohibitively large number of measurements, it is possible to show that such guarantees can hold with high probability while simultaneously taking far fewer measurements than would be suggested by Theorem 4.4, (Theorem 5.1 of [45]), p. 37. Note that the above result applies only to the case where the parameter $\epsilon$ is selected correctly, which requires some limited knowledge of $x$, namely $\sigma_K(x)_2$. In practice this limitation can easily be overcome through a parameter selection technique such as cross-validation [209], but there also exist more intricate analyses of $\ell_1$ minimization that show it is possible to obtain similar performance without requiring an oracle for parameter selection [212]. Note that Theorem 4.6, p. 39 can also be generalized to handle other measurement matrices and to the case where $x$ is compressible rather than sparse. Moreover, this proof technique is applicable to a variety of the greedy algorithms described later in this course that do not require knowledge of the noise level to establish similar results [44], [152].
The analysis of $\ell_1$ minimization based on the restricted isometry property (Section 3.3) (RIP) described in "Signal recovery in noise" (Section 4.3) allows us to establish a variety of guarantees under different noise settings, but one drawback is that the analysis of how many measurements are actually required for a matrix to satisfy the RIP is relatively loose. An alternative approach to analyzing $\ell_1$ minimization algorithms is to examine them from a more geometric perspective. Towards this end, we define the closed $\ell_1$ ball, also known as the cross-polytope:

$C^N = \{x \in \mathbb{R}^N : \|x\|_1 \le 1\}. \qquad (4.52)$

Note that $C^N$ is the convex hull of $2N$ points $\{p_i\}_{i=1}^{2N}$. Let $\Phi C^N \subseteq \mathbb{R}^M$ denote the convex polytope defined as either the convex hull of $\{\Phi p_i\}_{i=1}^{2N}$ or equivalently as

$\Phi C^N = \{y \in \mathbb{R}^M : y = \Phi x,\ x \in C^N\}. \qquad (4.53)$

For any $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$, we can associate a $K$-face of $C^N$ with the support and sign pattern of $x$. One can show that the number of $K$-faces of $\Phi C^N$ is precisely the number of sparsity patterns that can be recovered by

$\hat{x} = \underset{z}{\mathrm{argmin}} \; \|z\|_1 \quad \text{subject to} \quad z \in \mathcal{B}(y) \qquad (4.54)$

with $\mathcal{B}(y) = \{z : \Phi z = y\}$. Thus, by counting the number of $K$-faces of $\Phi C^N$ that are preserved, one can quantify exactly which sparse vectors can be recovered via $\ell_1$ minimization with $\Phi$ as the sensing matrix. Note also that by replacing the cross-polytope with certain other polytopes (the simplex and the hypercube), one can apply the same technique to obtain results concerning the recovery of more limited signal classes, such as sparse signals with nonnegative or bounded entries [77].

Given this result, one can then study random matrix constructions from this perspective to obtain probabilistic bounds on the number of $K$-faces of $\Phi C^N$ when $\Phi$ is generated at random, for example with i.i.d. entries drawn from a Gaussian distribution. Under the assumptions that $K = \rho M$ and $M = \gamma N$, one can obtain asymptotic results as $N \to \infty$. This analysis leads to the phase transition phenomenon, in which, for large problem sizes, the fraction of preserved $K$-faces tends sharply to either one or zero depending on $\rho$ and $\gamma$ [77].
These results provide sharp bounds on the minimum number of measurements required in the noiseless
setting.
In general, these bounds are significantly stronger than the corresponding measurement bounds
obtained within the RIP-based framework given in "Noise-free signal recovery" (Section 4.2), which tend to
be extremely loose in terms of the constants involved. However, these sharper bounds also require somewhat
more intricate analysis and typically more restrictive assumptions on $\Phi$ (such as it being Gaussian). Thus, one of the main strengths of the RIP-based analysis presented in "Noise-free signal recovery" (Section 4.2)
and "Signal recovery in noise" (Section 4.3) is that it gives results for a broad class of matrices that can also
be extended to noisy settings.
Chapter 5
Given noisy compressive measurements $y = \Phi x + e$ of a signal $x$, a core problem in compressive sensing (Section 1.1) (CS) is to recover the signal $x$ from the measurements $y$. Considerable efforts have been directed towards developing algorithms that perform fast, accurate, and stable reconstruction of $x$ from $y$. As we have seen in previous chapters, a "good" CS matrix $\Phi$ typically satisfies certain geometric conditions, such as the restricted isometry property (Section 3.3) (RIP). Practical algorithms exploit this fact in various ways in order to drive down the number of measurements, enable faster reconstruction, and ensure robustness to both numerical and stochastic errors.

The design of sparse recovery algorithms is guided by various criteria. Some important ones are listed as follows.

Minimal number of measurements. Sparse recovery algorithms should make do with as few measurements as possible; ideally, with the same number of measurements (up to a small constant) required for the stable embedding of $K$-sparse signals.

Robustness to noise. Sparse recovery algorithms must be stable with regards to perturbations of the input signal, as well as noise added to the measurements; both types of errors arise naturally in practical systems.

Speed. Sparse recovery algorithms must strive towards expending minimal computational resources, keeping in mind that many applications in CS deal with very high-dimensional signals.

Performance guarantees. In evaluating recovery algorithms, we will have the same considerations that arose for $\ell_1$ minimization. For example, we can choose to design algorithms that possess instance-optimal or probabilistic guarantees (Section 4.4). We can also choose to focus on algorithm performance for the recovery of exactly $K$-sparse signals $x$, or consider performance for the recovery of general signals $x$. Alternately, we can also consider algorithms that are accompanied by performance guarantees in either the noise-free (Section 4.2) or noisy (Section 4.3) settings.

A multitude of algorithms satisfying some (or even all) of the above have been proposed in the literature. While it is impossible to describe all of them in this chapter, we refer the interested reader to the DSP resources webpage for a more complete listing. Broadly speaking, recovery methods tend to fall under three categories: convex optimization-based approaches (Section 5.2), greedy methods (Section 5.3), and combinatorial techniques (Section 5.4). The rest of the chapter discusses several properties and example algorithms of each flavor of CS reconstruction.
An important class of sparse recovery algorithms (Section 5.1) fall under the purview of convex optimization. Algorithms in this category seek to optimize a convex function $J(x)$ of the unknown variable $x$ over a (possibly unbounded) convex subset of $\mathbb{R}^N$.
5.2.1 Setup

Let $J(x)$ be a convex sparsity-promoting cost function (i.e., $J(x)$ is small for sparse $x$.) To recover a sparse signal representation $\hat{x}$ from measurements $y = \Phi x$, $\Phi \in \mathbb{R}^{M \times N}$, we may either solve

$\min_{x}\ J(x) \quad \text{subject to} \quad y = \Phi x \qquad (5.1)$

when there is no noise, or solve

$\min_{x}\ J(x) \quad \text{subject to} \quad H(\Phi x, y) \le \epsilon \qquad (5.2)$

when there is noise in the measurements, where $H$ is a cost function that penalizes the distance between the vectors $\Phi x$ and $y$. For an appropriate penalty parameter $\lambda$, (5.2) is equivalent to the unconstrained formulation:

$\min_{x}\ J(x) + \lambda\, H(\Phi x, y) \qquad (5.3)$

for some $\lambda > 0$. The parameter $\lambda$ may be chosen by trial and error, or by statistical techniques such as cross-validation [18].

For convex programming algorithms, the most common choices are $J(x) = \|x\|_1$, the $\ell_1$-norm of $x$, and $H(\Phi x, y) = \frac{1}{2}\|\Phi x - y\|_2^2$; with these choices, the resulting problem is known as the Lasso problem. More generally, $J(\cdot)$ acts as a regularization term and can be replaced by other, more complex, functions; for example, the desired signal may be piecewise constant, and simultaneously have a sparse representation under a known basis transform $\Psi$. In this case, we may use a mixed regularization term:

$J(x) = TV(x) + \lambda \|\Psi x\|_1 \qquad (5.4)$
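As a concrete illustration of the constrained formulation with $J(x) = \|x\|_1$, the following sketch solves (5.2) with an $\ell_2$ residual bound using the cvxpy modeling package; the problem sizes, the Gaussian matrix, and the noise level eps are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np
import cvxpy as cp

# Illustrative problem instance (assumed sizes)
rng = np.random.default_rng(0)
M, N, K = 64, 256, 8
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
e = rng.standard_normal(M)
eps = 1e-3
y = Phi @ x_true + eps * e / np.linalg.norm(e)   # noise with norm at most eps

# l1 minimization subject to an l2 constraint on the residual, as in (5.2)
z = cp.Variable(N)
prob = cp.Problem(cp.Minimize(cp.norm1(z)), [cp.norm2(Phi @ z - y) <= eps])
prob.solve()
x_hat = z.value
```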
It might be tempting to use conventional convex optimization packages for the above formulations ((5.1), (5.2), and (5.3)). Nevertheless, the above problems pose two key challenges which are specific to practical problems encountered in CS (Section 1.1): (i) real-world applications are invariably large-scale (an image of even moderate resolution leads to optimization over hundreds of thousands or millions of variables, well beyond the reach of any standard optimization software package); (ii) the objective function is nonsmooth, and standard smoothing techniques do not yield very good results. Hence, conventional algorithms (such as those involving matrix factorizations) are not effective or even applicable. These unique challenges encountered in the context of CS have led to considerable interest in developing improved sparse recovery algorithms in the optimization community.

In the noiseless case, the $\ell_1$-minimization problem (obtained by substituting $J(x) = \|x\|_1$ in (5.1)) can be recast as a linear program (LP). These can be solved in polynomial time (O($N^3$)) using standard interior-point methods [19]. This was the first feasible reconstruction algorithm used for CS recovery and has strong theoretical guarantees, as shown earlier in this course.
In the noisy case, the problem can be recast as a second-order cone program (SOCP) with quadratic constraints. Solving LPs and SOCPs is a principal thrust in optimization research; nevertheless, their application in practical CS problems is limited due to the fact that both the signal dimension $N$ and the number of measurements $M$ can be very large. Note that both LPs and SOCPs correspond to the constrained formulations in (5.1) and (5.2) and are solved using first order interior-point methods.

A newer algorithm called l1_ls [124] is based on an interior-point algorithm that uses a preconditioned conjugate gradient (PCG) method to approximately solve linear systems in a truncated-Newton framework. The algorithm exploits the structure of the Hessian to construct its preconditioner; thus, this is a second order method. Computational results show that about a hundred PCG steps are sufficient for obtaining accurate reconstruction. This method has been typically shown to be slower than first-order methods, but could be faster in cases where the true target signal is highly sparse.
As opposed to interior-point type methods, there exist algorithms that directly solve the unconstrained $\ell_1$-minimization problem

$\min_{x}\ \|x\|_1 + \lambda\, H(x), \qquad (5.5)$

as first proposed and analyzed in [12], [96], [148], [156], and then further studied or extended in [48], [54], [85], [87], [110], [213]. Shrinkage is a classic method used in wavelet-based image denoising. The shrinkage operator on any scalar component can be defined as follows:

$\mathrm{shrink}(t, \alpha) = \begin{cases} t - \alpha, & \text{if } t > \alpha, \\ 0, & \text{if } -\alpha \le t \le \alpha, \\ t + \alpha, & \text{if } t < -\alpha. \end{cases} \qquad (5.6)$
An iterative shrinkage algorithm solves (5.5) by alternating gradient-descent and shrinkage steps: for $i = 1, ..., N$, the $i$th coefficient of $x$ at the $(k+1)$th iteration is given by

$x_i^{k+1} = \mathrm{shrink}\left(\left(x^k - \tau \nabla H\left(x^k\right)\right)_i,\ \tau\lambda\right), \qquad (5.7)$

where $\tau > 0$ serves as a step size for gradient descent (which may vary with the iteration $k$) and $\nabla H$ is the gradient of $H(\cdot)$ evaluated at $x^k$. The work required to pass from $x^k$ to $x^{k+1}$ is dominated by the evaluation of $\nabla H$ at $x^k$; thus each iteration of (5.7) essentially boils down to a small number of matrix-vector multiplications.
The simplicity of the iterative approach is quite appealing, both from a computational as well as a code-design standpoint. Various modifications, enhancements, and generalizations to this approach have been proposed, both to improve the efficiency of the basic iteration in (5.7), and to extend its applicability to various kinds of $J$ [88], [97], [213]. In principle, the basic iteration in (5.7) would not be practically effective without a continuation (or path-following) strategy [110], [213] in which we choose a gradually decreasing sequence of values for the parameter $\lambda$. This procedure is known as Fixed-Point Continuation (FPC) and has been compared favorably with another similar method known as Gradient Projection for Sparse Reconstruction (GPSR) [97] and with l1_ls [124]. A key aspect to solving the unconstrained optimization problem is the choice of the parameter $\lambda$. For noisy measurements, $\lambda$ may be chosen by trial and error; for the noiseless constrained formulation, we may approach the solution of (5.1) by solving the corresponding unconstrained minimization with a large value for $\lambda$.
In the case of recovery from noisy compressive measurements, a commonly used choice for the convex cost function $H(x)$ is the squared $\ell_2$-norm of the residual. Thus we have:

$H(x) = \|y - \Phi x\|_2^2, \qquad \nabla H(x) = -2\Phi^T(y - \Phi x). \qquad (5.8)$

For this particular choice of penalty function, (5.7) reduces to the following iteration:

$x_i^{k+1} = \mathrm{shrink}\left(\left(x^k + 2\tau\, \Phi^T\left(y - \Phi x^k\right)\right)_i,\ \tau\lambda\right), \qquad (5.9)$

which is run until convergence to a fixed point. The algorithm is detailed in pseudocode form below.
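The following is a minimal sketch of the iteration (5.9) in Python; the fixed step size tau and the iteration count are illustrative choices (in practice tau must be small enough, e.g. below $1/(2\|\Phi\|^2)$, for the iteration to converge), and a continuation strategy over lam would be layered on top as described above.

```python
import numpy as np

def shrink(t, alpha):
    # soft-thresholding operator of (5.6), applied componentwise
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

def iterative_shrinkage(Phi, y, lam, tau, n_iter=200):
    # iterative shrinkage for H(x) = ||y - Phi x||_2^2, i.e. the iteration (5.9)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * Phi.T @ (y - Phi @ x)     # gradient of H at the current iterate
        x = shrink(x - tau * grad, tau * lam)   # gradient step followed by shrinkage
    return x
```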
An alternative approach to solving the constrained problem (5.1) uses a sequence of unconstrained subproblems known as Bregman iterations. Each outer iteration updates the "measurements" and then solves an unconstrained problem of the form (5.3):

$y^{k+1} = y^k + \left(y - \Phi x^k\right), \qquad x^{k+1} = \underset{x}{\mathrm{argmin}}\ J(x) + \lambda\, H\left(\Phi x, y^{k+1}\right). \qquad (5.10)$

The problem in the second step can be solved by the algorithms reviewed above. Bregman iterations were introduced in [159] for constrained total variation minimization problems, and were proved to converge for closed, convex functions $J(x)$. For moderate $\lambda > 0$ and $J(x) = \|x\|_1$, the iterations converge to a solution of (5.1) in a small number of outer steps, typically fewer than 5. Compared to the alternate approach that solves (5.1) through directly solving the unconstrained problem in (5.3) with a very large $\lambda$, Bregman iterations are often more stable and sometimes much faster.
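A minimal sketch of the outer loop in (5.10); solve_unconstrained is a hypothetical stand-in for any solver of the unconstrained problem (5.3), for instance the shrinkage iteration sketched above.

```python
import numpy as np

def bregman(Phi, y, lam, solve_unconstrained, n_outer=5):
    # Bregman iterations for the constrained problem (5.1), following (5.10)
    y_k = y.copy()
    x_k = np.zeros(Phi.shape[1])
    for _ in range(n_outer):                 # typically very few outer iterations are needed
        x_k = solve_unconstrained(Phi, y_k, lam)
        y_k = y_k + (y - Phi @ x_k)          # "add back the residual" update of the measurements
    return x_k
```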
5.2.5 Discussion

All the methods discussed in this section optimize a convex function (usually the $\ell_1$-norm) over a convex (possibly unbounded) set. This implies guaranteed convergence to the global optimum. In other words, given that the sampling matrix satisfies the conditions specified in "Signal recovery via $\ell_1$ minimization" (Section 4.1), convex optimization methods will recover the underlying signal $x$. In addition, convex relaxation methods also guarantee stable recovery, whether the problem is posed as the constrained formulation (e.g., the SOCP) or the unconstrained formulation.
5.3.1 Setup

As opposed to solving a (possibly computationally expensive) convex optimization (Section 5.2) program, an alternate flavor to sparse recovery (Section 5.1) is to apply methods of sparse approximation. Recall that the goal of sparse recovery is to recover the sparsest vector $x$ which explains the linear measurements $y$. In other words, we aim to solve the (nonconvex) problem:

$\min_{I}\ \left\{|I| : y = \sum_{i \in I} \phi_i x_i\right\}, \qquad (5.11)$

where $I \subseteq \{1, ..., N\}$ denotes a particular subset of the indices $i = 1, ..., N$, and $\phi_i$ denotes the $i$th column of $\Phi$. It is well known that searching over the power set formed by the columns of $\Phi$ for the optimal subset $I^*$ with smallest cardinality is NP-hard. Instead, classical sparse approximation methods tackle this problem by greedily selecting columns of $\Phi$ and forming successively better approximations to $y$.
5.3.2 Matching Pursuit

Matching Pursuit (MP) is an iterative greedy algorithm that decomposes a signal into a linear combination of elements from a dictionary. In sparse recovery, this dictionary is merely the sampling matrix $\Phi \in \mathbb{R}^{M \times N}$, and we seek a sparse representation $\hat{x}$ of our signal $y$.

MP is conceptually very simple. A key quantity in MP is the residual $r \in \mathbb{R}^M$, which represents the as-yet unexplained portion of the measurements. At each iteration of the algorithm, we select a vector from the dictionary that is maximally correlated with the residual $r$:

$\lambda_k = \underset{\lambda}{\mathrm{argmax}}\ \frac{\langle r_k, \phi_\lambda \rangle}{\|\phi_\lambda\|}. \qquad (5.12)$

Once this column is selected, we possess a better representation of the signal, since a new coefficient indexed by $\lambda_k$ has been added to our signal approximation. Thus, we update both the residual and the approximation as follows:

$r_k = r_{k-1} - \frac{\langle r_{k-1}, \phi_{\lambda_k} \rangle\, \phi_{\lambda_k}}{\|\phi_{\lambda_k}\|^2}, \qquad \hat{x}_{\lambda_k} = \hat{x}_{\lambda_k} + \frac{\langle r_{k-1}, \phi_{\lambda_k} \rangle}{\|\phi_{\lambda_k}\|^2}, \qquad (5.13)$

and repeat the iteration. A suitable stopping criterion is when the norm of $r$ becomes smaller than some specified quantity. MP is described in pseudocode form below.
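A minimal Python sketch of the MP iteration, assuming for simplicity that the columns of Phi have been normalized to unit norm (the stopping tolerance and iteration cap are illustrative):

```python
import numpy as np

def matching_pursuit(Phi, y, n_iter=100, tol=1e-6):
    # Matching Pursuit with unit-norm columns: repeatedly pick the column most
    # correlated with the residual and peel off its contribution, as in (5.12)-(5.13).
    N = Phi.shape[1]
    x_hat = np.zeros(N)
    r = y.astype(float).copy()
    for _ in range(n_iter):
        corr = Phi.T @ r                  # correlations of the residual with every column
        k = int(np.argmax(np.abs(corr)))  # index of the most correlated column
        x_hat[k] += corr[k]               # update the corresponding coefficient
        r -= corr[k] * Phi[:, k]          # remove that column's contribution from the residual
        if np.linalg.norm(r) < tol:       # stop once the residual is small
            break
    return x_hat
```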
Although MP is intuitive and can find an accurate approximation of the signal, it possesses two major drawbacks: (i) it offers no guarantees in terms of recovery error; indeed, it does not exploit the special structure present in the dictionary $\Phi$; (ii) the required number of iterations can be quite large. The complexity of MP is $O(MNT)$ [83], where $T$ is the number of MP iterations.

5.3.3 Orthogonal Matching Pursuit

A variant of MP eliminates much of this redundancy by ensuring that the residual is always orthogonal to every previously selected column. At each iteration, instead of merely subtracting the contribution of the most recently selected column, we project the residual $r$ onto the orthogonal complement of the subspace spanned by all currently selected columns. This quantity thus better represents the unexplained portion of the residual, and is subtracted from $r$ to form a new residual, and the process is repeated. If $\Phi_t$ denotes the submatrix formed by the columns of $\Phi$ selected at time step $t$, the update rules can be written as

$\hat{x}_t = \underset{x}{\mathrm{argmin}}\ \|y - \Phi_t x\|_2, \qquad \alpha_t = \Phi_t \hat{x}_t, \qquad r_t = y - \alpha_t. \qquad (5.14)$

These steps are repeated until convergence. This is known as Orthogonal Matching Pursuit (OMP) [160]. Tropp and Gilbert [191] proved that OMP can be used to recover a sparse signal with high probability using compressive measurements. The algorithm converges in at most $K$ iterations, where $K$ is the sparsity, but requires the added computational cost of orthogonalization at each iteration. Indeed, the total complexity of OMP can be shown to be $O(MNK)$.
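A minimal sketch of OMP in Python; the least-squares solve plays the role of the orthogonal projection in (5.14), and the loop runs for exactly K iterations under the assumption that the target sparsity K is known:

```python
import numpy as np

def omp(Phi, y, K):
    # Orthogonal Matching Pursuit: greedily grow a support set and
    # re-fit all selected coefficients by least squares at every step.
    M, N = Phi.shape
    r = y.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(K):
        k = int(np.argmax(np.abs(Phi.T @ r)))    # column most correlated with the residual
        if k not in support:
            support.append(k)
        coeffs = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
        r = y - Phi[:, support] @ coeffs         # residual is orthogonal to the selected columns
    x_hat = np.zeros(N)
    x_hat[support] = coeffs
    return x_hat
```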
While OMP is provably fast and can be shown to lead to exact recovery, the guarantees accompanying OMP for sparse recovery are weaker than those associated with optimization techniques (Section 4.1). In particular, the reconstruction guarantees are not uniform: it cannot be shown that a single measurement matrix with $M = CK\log N$ measurements can be used to recover every possible $K$-sparse signal. (Uniform guarantees for OMP are possible, but only at the cost of taking more measurements. For example, see [59].) Another issue with OMP is robustness to noise; it is unknown whether the solution obtained by OMP will only be perturbed slightly by the addition of a small amount of noise in the measurements. Nevertheless, OMP is an efficient method for CS recovery, especially when the signal sparsity $K$ is low.
5.3.4 Stagewise Orthogonal Matching Pursuit

Stagewise Orthogonal Matching Pursuit (StOMP) [69] is a better choice than OMP for approximately sparse signals in a large-scale setting. StOMP offers considerable computational advantages over $\ell_1$ minimization and Orthogonal Matching Pursuit for large scale problems with sparse solutions. The algorithm starts with an initial residual $r_0 = y$ and, at the $k$th stage, computes all of the correlations $\Phi^T r_{k-1}$ as in OMP. However, instead of selecting a single column, it selects every column whose correlation with the residual exceeds a specified threshold, and then forms a least-squares estimate of the signal using this expanded set of columns, just as before.

Unlike OMP, the number of iterations in StOMP is fixed and chosen before hand; $S = 10$ is recommended in [69]. In general, the complexity of StOMP is $O(KN\log N)$, a significant improvement over OMP. However, StOMP does not bring in its wake any reconstruction guarantees. StOMP also has moderate memory requirements compared to OMP, where the orthogonalization requires the maintenance of a Cholesky factorization of the dictionary elements.
5.3.5 Compressive Sampling Matching Pursuit (CoSaMP)

The reconstruction guarantees for OMP are weaker than those for $\ell_1$ minimization. Several variations of OMP narrow this gap by assuming a priori knowledge of the sparsity $K$ and selecting multiple columns of the matrix $\Phi$ per iteration; such algorithms are sometimes dubbed greedy-like methods. One variant of such an approach is employed by the CoSaMP algorithm. An interesting feature of CoSaMP is that unlike MP, OMP and StOMP, new indices in a signal estimate can be added as well as removed from the current set of chosen indices, rather than being retained until the end. A brief description of CoSaMP is as follows: at the start of a given iteration $i$, suppose the signal estimate is $\hat{x}_{i-1}$.

- Form signal residual estimate: $e \leftarrow \Phi^T r$.
- Identify the $2K$ largest entries of $e$ and merge their support with that of the current estimate: $T \leftarrow \mathrm{supp}(\mathcal{T}(e, 2K)) \cup \mathrm{supp}(\hat{x}_{i-1})$.
- Form a signal estimate $b$ by subspace projection: $b|_T \leftarrow \Phi_T^{\dagger} y$, $b|_{T^C} \leftarrow 0$.
- Prune: obtain the new signal estimate $\hat{x}_i$ by retaining the $K$ largest entries of $b$.
- Update the measurement residual: $r \leftarrow y - \Phi \hat{x}_i$.
Initialize: x̂_0 ← 0, r ← y, i ← 0
while halting criterion is not satisfied do
1. i ← i + 1
2. e ← Φ^T r {form signal residual estimate}
3. Ω ← supp(T(e, 2K)) {prune signal residual estimate}
4. T ← Ω ∪ supp(x̂_{i−1}) {merge supports}
5. b|_T ← Φ_T^† y, b|_{T^C} ← 0 {form signal estimate}
6. x̂_i ← T(b, K) {prune signal estimate}
7. r ← y − Φ x̂_i {update measurement residual}
end while
return x̂ ← x̂_i
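A minimal Python sketch of the same steps; the least-squares solve stands in for the pseudoinverse $\Phi_T^{\dagger} y$, and the fixed iteration count is an illustrative halting criterion.

```python
import numpy as np

def cosamp(Phi, y, K, n_iter=20):
    # CoSaMP: identify 2K promising columns, merge with the current support,
    # solve least squares on the merged support, then prune back to K entries.
    M, N = Phi.shape
    x_hat = np.zeros(N)
    r = y.astype(float).copy()
    for _ in range(n_iter):
        e = Phi.T @ r                                       # signal residual estimate (proxy)
        Omega = np.argsort(np.abs(e))[-2 * K:]              # support of the 2K largest proxy entries
        T = np.union1d(Omega, np.nonzero(x_hat)[0]).astype(int)
        b = np.zeros(N)
        b[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0] # subspace projection onto merged support
        x_hat = np.zeros(N)
        keep = np.argsort(np.abs(b))[-K:]                   # prune: keep the K largest entries
        x_hat[keep] = b[keep]
        r = y - Phi @ x_hat                                 # update the measurement residual
    return x_hat
```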
As discussed in [153], the key computational issues for CoSaMP are the formation of the signal residual, and
the method used for subspace projection in the signal estimation step. Under certain general assumptions,
the computational cost of CoSaMP can be shown to be $O(MN)$, which is independent of the sparsity of the original signal. This represents an improvement over both greedy algorithms as well as convex methods. While CoSaMP arguably represents the state of the art in sparse recovery algorithm performance, it possesses one drawback: the algorithm requires prior knowledge of the sparsity $K$ of the target signal.
An incorrect choice of input sparsity may lead to a worse guarantee than the actual error incurred by a
weaker algorithm such as OMP. The stability bounds accompanying CoSaMP ensure that the error due to
an incorrect parameter choice is bounded, but it is not yet known how these bounds translate into practice.
5.3.6 Iterative Hard Thresholding

Iterative Hard Thresholding (IHT) is a well-known algorithm for solving nonlinear inverse problems. Starting from an initial estimate $\hat{x}_0$, the algorithm iterates a gradient descent step followed by hard thresholding:

$\hat{x}_{i+1} = \mathcal{T}\left(\hat{x}_i + \Phi^T\left(y - \Phi \hat{x}_i\right),\ K\right), \qquad (5.15)$

where $\mathcal{T}(\cdot, K)$ retains the $K$ largest entries of its argument and sets the rest to zero. In [17], Blumensath and Davies proved that this sequence of iterations converges to a fixed point $\hat{x}$; further, if the matrix $\Phi$ satisfies the RIP, then $\hat{x}$ satisfies an instance-optimality guarantee of the type described earlier (Section 4.1). The guarantees (as well as the proof technique) are reminiscent of the ones that are derived in the development of other algorithms such as ROMP and CoSaMP.
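A minimal sketch of the iteration (5.15); note that convergence requires the operator norm of Phi to be modest (e.g. below 1), which is an assumption of this sketch rather than something the code enforces.

```python
import numpy as np

def iht(Phi, y, K, n_iter=100):
    # Iterative Hard Thresholding: gradient step on ||y - Phi x||_2^2, then keep the K largest entries.
    N = Phi.shape[1]
    x_hat = np.zeros(N)
    for _ in range(n_iter):
        b = x_hat + Phi.T @ (y - Phi @ x_hat)   # gradient step
        x_hat = np.zeros(N)
        keep = np.argsort(np.abs(b))[-K:]       # hard threshold: K largest magnitudes
        x_hat[keep] = b[keep]
    return x_hat
```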
5.3.7 Discussion
While convex optimization techniques are powerful methods for computing sparse representations, there are
also a variety of greedy/iterative methods for solving such problems. Greedy algorithms rely on iterative
approximation of the signal coefficients and support, either by iteratively identifying the support of the
signal until a convergence criterion is met, or alternatively by obtaining an improved estimate of the sparse
signal at each iteration by accounting for the mismatch to the measured data.
Some greedy algorithms can actually be shown to have performance guarantees that match those obtained for convex optimization approaches. In fact, some of the more sophisticated greedy algorithms are remarkably similar to those used for $\ell_1$ minimization described previously (Section 5.2), although the techniques required to prove their performance guarantees are substantially different. There also exist iterative techniques for sparse recovery based on message passing schemes for sparse graphical models; indeed, some greedy algorithms (such as those in [14], [116]) can be directly interpreted as message passing methods [73].
In addition to convex optimization (Section 5.2) and greedy pursuit (Section 5.3) approaches, there is another
important class of sparse recovery algorithms that we will refer to as
combinatorial algorithms.
These
algorithms, mostly developed by the theoretical computer science community, in many cases pre-date the
compressive sensing (Section 1.1) literature but are highly relevant to the sparse signal recovery problem
(Section 5.1).
5.4.1 Setup

The oldest combinatorial algorithms were developed in the context of group testing. In the group testing problem, we suppose that there are $N$ total items, of which an unknown subset of $K$ elements are anomalous and need to be identified. For example, we might wish to identify defective products in an industrial setting, or identify a subset of diseased tissue samples in a medical context. In both of these cases the vector $x$ indicates which elements are anomalous, i.e., $x_i \ne 0$ for the $K$ anomalous elements and $x_i = 0$ otherwise. Our goal is to design a collection of tests that allow us to identify the support (and possibly the values of the nonzeros) of $x$ while also minimizing the number of tests performed. In the simplest practical setting these tests are represented by a binary matrix $\Phi$ whose entries $\phi_{ij}$ are equal to 1 if and only if the $j$th item is used in the $i$th test. If the output of the test is linear with respect to the inputs, then the problem of recovering the vector $x$ is essentially the same as the standard sparse recovery problem.

Another application area in which combinatorial algorithms have proven useful is computation on data streams [49], [149]. Suppose, for example, that $x_i$ represents the number of packets passing through a network router with destination $i$. Rather than storing the enormous vector $x$ directly, one can store a much shorter sketch of linear measurements that is updated as packets stream by; recovering the dominant entries of $x$ from the sketch is again a sparse recovery problem.

Several combinatorial algorithms for sparse recovery have been proposed in the literature. A non-exhaustive list includes Random Fourier Sampling [106], HHS Pursuit [106], and Sparse Sequential Matching Pursuit [13]. We do not provide a full discussion of each of these algorithms; instead, we describe two simple methods that highlight the flavors of combinatorial sparse recovery: count-min and count-median.
5.4.2 The count-min sketch

Define $\mathcal{H}$ as the set of all discrete-valued functions $h : \{1, ..., N\} \to \{1, ..., m\}$. Note that $\mathcal{H}$ is a finite set of size $m^N$. Each function $h \in \mathcal{H}$ can be specified by a binary characteristic matrix $\phi(h)$ of size $m \times N$, with each column being a binary vector with exactly one 1 at the location $j = h(i)$. To construct the overall sampling matrix $\Phi$, we choose $d$ functions $h_1, ..., h_d$ independently from the uniform distribution defined on $\mathcal{H}$, and vertically concatenate their characteristic matrices. Thus, if $M = md$, $\Phi$ is a binary matrix of size $M \times N$ with each column containing exactly $d$ ones.

Now given any signal $x$, we acquire linear measurements $y = \Phi x$. It is easy to visualize the measurements via the following two properties. First, the coefficients of the measurement vector $y$ are naturally grouped according to the "mother" binary functions $\{h_1, ..., h_d\}$. Second, consider the $i$th coefficient of the measurement vector $y$, which corresponds to the mother binary function $h$. Then, the expression for $y_i$ is simply given by:

$y_i = \sum_{j : h(j) = i} x_j. \qquad (5.16)$

In other words, for a fixed signal coefficient index $j$, each measurement $y_i$ as expressed above consists of an observation of $x_j$ corrupted by the other signal coefficients mapped to the same $i$ by the function $h$. Signal recovery essentially consists of estimating the signal values from these "corrupted" observations.
The count-min algorithm is useful in the special case where the entries of the original signal are positive. Given measurements $y$ using the sampling matrix $\Phi$ as constructed above, the estimate of the $j$th signal entry is given by:

$\hat{x}_j = \min_{l}\ \{y_i : h_l(j) = i\}. \qquad (5.17)$

Intuitively, the estimate of $x_j$ is formed by looking at all measurements that consist of $x_j$ corrupted by other signal values, and picking the one with the lowest magnitude. Despite the simplicity of this algorithm, it is accompanied by an arguably powerful instance-optimality guarantee: if $d = C\log N$ and $m = \frac{4}{\alpha}K$, then with high probability the recovered signal $\hat{x}$ satisfies:

$\|x - \hat{x}\|_\infty \le \frac{\alpha}{K}\, \|x - x^*\|_1, \qquad (5.18)$

where $x^*$ represents the best $K$-term approximation of $x$ in the $\ell_1$ sense.
5.4.3 The count-median sketch

For the general setting in which the signal entries may have arbitrary signs, the count-median algorithm is used instead: for each signal coefficient index $j$, we compute the median of all those measurements that are comprised of a corrupted version of $x_j$ and declare it as the signal coefficient estimate, i.e.,

$\hat{x}_j = \mathrm{median}_{l}\ \{y_i : h_l(j) = i\}. \qquad (5.19)$

The recovery guarantees for count-median are similar to those for count-min, with a different value of the failure probability constant. Note that both algorithms were derived under the assumption that the measurements are perfectly noiseless,
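A minimal sketch of the measurement and decoding steps; the sizes N, m, d and the use of explicit hash tables H[l, j] = h_l(j) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, d = 1000, 50, 8
H = rng.integers(0, m, size=(d, N))          # H[l, j] = h_l(j), one hash function per row

def measure(x):
    # y[l, i] = sum of x[j] over all j with h_l(j) = i, as in (5.16)
    y = np.zeros((d, m))
    for l in range(d):
        np.add.at(y[l], H[l], x)
    return y

def count_min(y):
    # valid when the entries of x are nonnegative, as in (5.17)
    return np.min(y[np.arange(d)[:, None], H], axis=0)

def count_median(y):
    # general-purpose estimate, as in (5.19)
    return np.median(y[np.arange(d)[:, None], H], axis=0)
```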
5.4.4 Summary

Although we ultimately wish to recover a sparse signal from a small number of linear measurements in both of these settings, there are some important differences between such settings and the compressive sensing setting studied in this course. First, in these settings it is natural to assume that the designer of the reconstruction algorithm also has full control over $\Phi$, and is thus free to choose $\Phi$ in a manner that reduces the amount of computation required to perform recovery. For example, it is often useful to design $\Phi$ so that it has few nonzeros, i.e., the sensing matrix itself is also sparse [11], [102], [117]. In general, most methods involve careful construction of the sensing matrix (Section 3.1), which is in contrast with the optimization and greedy methods that work with any matrix satisfying a generic condition such as the restricted isometry property (Section 3.3). This additional degree of freedom can lead to significantly faster algorithms [42].

Second, note that the computational complexity of all the convex methods and greedy algorithms described above is always at least linear in $N$, since in order to recover $x$ we must at least incur the computational cost of reading out all $N$ entries of $x$. This may be acceptable in many typical compressive sensing applications, but it becomes impractical when $N$ is extremely large, as in the network monitoring example. In this context, one may seek to develop algorithms whose complexity is linear only in the length of the representation of the signal, i.e., its sparsity $K$. In this case the algorithm does not return a complete reconstruction of $x$ but instead returns only its $K$ largest elements (and their indices). As surprising as it may seem, such algorithms are indeed possible. See [103], [105] for examples.
5.5.1 Setup

Throughout this course, we have almost exclusively worked within a deterministic signal framework. In other words, our signal $x$ is fixed and belongs to a known set of signals. In this section, we depart from this framework and assume that the sparse (Section 2.3) (or compressible (Section 2.4)) signal of interest arises from a known probability distribution, i.e., we assume sparsity-promoting priors on the elements of $x$, and use the measurements $y = \Phi x$ to infer $x$ in a Bayesian fashion.

The algorithms discussed in this section demonstrate a digression from the conventional sparse recovery techniques typically used in compressive sensing (Section 1.1) (CS). We note that none of these algorithms are accompanied by guarantees on the number of measurements required, or the fidelity of signal reconstruction; indeed, in a Bayesian signal modeling framework, there is no well-defined notion of "reconstruction error". However, such methods do provide insight into developing recovery algorithms for rich classes of signals, and may be of considerable practical interest.

5.5.2 Sparse recovery via belief propagation

In channel coding, sparse codes such as LDPC codes have had grand success. The advantage that sparse coding matrices may have in efficient encoding of signals and their low complexity decoding algorithms is transferable to CS encoding and decoding with the use of sparse sensing matrices (Section 3.1) $\Phi$.
Figure 5.1: Factor graph depicting the relationship between the variables involved in CS decoding using
BP. Variable nodes are black and the constraint nodes are white.
A sensing matrix $\Phi$ that defines the relation between the signal $x$ and measurements $y$ can be represented as a bipartite graph of signal coefficient nodes $x(i)$ and measurement nodes $y(i)$ [170], [175]. The factor graph in Figure 5.1 represents the relationship between the signal coefficients and measurements in the CS decoding problem.
The choice of signal probability density is of practical interest. In many applications, the signals of interest need to be modeled as being compressible (as opposed to being strictly sparse). This behavior is modeled by a two-state Gaussian mixture distribution, with each signal coefficient taking either a "large" or "small" coefficient value state. Assuming that the elements of $x$ are i.i.d., it can be shown that small coefficients occur more frequently than the large coefficients. Other distributions besides the two-state Gaussian may also be used to model the coefficients, e.g., the i.i.d. Laplace prior on the coefficients of $x$.

The ultimate goal is to estimate (i.e., decode) $x$, given $y$ and $\Phi$. The decoding problem takes the form of a Bayesian inference problem in which we want to approximate the marginal distributions of each of the $x(i)$ coefficients conditioned on the observed measurements $y(i)$. We can then compute the Maximum Likelihood Estimate (MLE), or the Maximum a Posteriori (MAP) estimates of the coefficients from their distributions. This sort of inference can be solved using a variety of methods; for example, the popular belief propagation method (BP) [170] can be applied to solve for the coefficients approximately. Although exact inference in arbitrary graphical models is an NP-hard problem, inference using BP can be employed when $\Phi$ is sparse enough, i.e., when most of the entries in the matrix are equal to zero.
Another approach to the sparse recovery problem is via Relevance Vector Machines (RVMs). An RVM is essentially a Bayesian learning method that produces sparse classification by linearly weighting a small number of fixed basis functions from a large dictionary of potential candidates (for more details the interested reader may refer to [189], [188]). From the CS perspective, we may view the RVM as a method to determine the elements of a sparse $x$ which linearly weight the basis functions comprising the columns of $\Phi$.

The RVM setup employs a hierarchy of priors; first, a Gaussian prior is assigned to each of the $N$ elements of $x$, and subsequently a Gamma prior is assigned to the inverse-variance $\alpha_i$ of the Gaussian prior on each $x_i$. If $\alpha = (\alpha_1, ..., \alpha_N)$, the Gaussian prior on $x$ is written as

$p(x|\alpha) = \prod_{i=1}^{N} \mathcal{N}\left(x_i\,|\,0, \alpha_i^{-1}\right) \qquad (5.20)$

and the Gamma prior on $\alpha$ is written as:

$p(\alpha|a, b) = \prod_{i=1}^{N} \Gamma\left(\alpha_i\,|\,a, b\right) \qquad (5.21)$
The overall prior on $x$ can be analytically evaluated to be a Student-t distribution, which can be designed to peak sharply at $x_i = 0$ with appropriate choice of the hyperparameters $a$ and $b$; this encourages the desired solution $x$ to be sparse. The RVM approach can be visualized using a graphical model similar to the one in "Sparse recovery via belief propagation" (Section 5.5.2: Sparse recovery via belief propagation). Using the observed measurements $y$, the posterior density on each $x_i$ can then be estimated by an iterative algorithm (e.g., Markov chain Monte Carlo (MCMC) methods). For a detailed analysis of the RVM with a measurement noise prior, refer to [119], [188].

Alternatively, we can eliminate the need to set the hyperparameters $a$ and $b$ as follows. Assuming Gaussian measurement noise with mean 0 and variance $\sigma^2$, we can directly find the marginal log-likelihood for $\alpha$ and maximize it by the EM algorithm (or directly differentiate) to find estimates for $\alpha$:

$L(\alpha) = \log p\left(y\,|\,\alpha, \sigma^2\right) = \log \int p\left(y\,|\,x, \sigma^2\right)\, p(x|\alpha)\, dx. \qquad (5.22)$

A straightforward implementation requires the inversion of an $N \times N$ matrix, which costs $O(N^3)$ operations. However, a constructive (greedy) algorithm for the RVM is available which monotonically maximizes the marginal likelihoods of the priors by a gradient ascent, resulting in an algorithm with complexity $O(NM^2)$. Here, basis functions are sequentially added and deleted, thus building the model up constructively, and the true sparsity of the signal $x$ is exploited to minimize model complexity. This is known as Fast Marginal Likelihood Maximization, and is employed by the Bayesian Compressive Sensing (BCS) algorithm [119] to efficiently evaluate the posterior densities of $x_i$.

A key advantage of the BCS algorithm is that it enables evaluation of "error bars" on each estimated coefficient of $x$; these give us an idea of the (in)accuracies of these estimates. These error bars could be used to adaptively select the linear projections (i.e., the rows of the matrix $\Phi$) to reduce uncertainty in the signal. This provides an intriguing connection between CS and machine learning techniques such as experimental design and active learning [91], [138].
Chapter 6
Many of the sparse recovery algorithms (Section 5.1) we have described so far in this course
were originally
developed to address the problem of sparse linear regression and model selection in statistics. In this setting
we are given some data consisting of a set of input variables and response variables. We will suppose that
there are a total of $N$ input variables, and we observe a total of $M$ input and response pairs. We can represent the set of input variable observations as an $M \times N$ matrix $\Phi$, and the set of response variable observations as an $M \times 1$ vector $y$.

In linear regression, it is assumed that $y$ can be approximated as a linear function of the input variables, i.e., there exists an $x$ such that $y \approx \Phi x$. However, when the number of input variables is large compared to the number of observations ($M \ll N$), this problem becomes ill-posed. In practice it is common that only a few input variables are actually necessary to predict the response variable. In this case the $x$ that we wish to estimate is sparse, and we can apply all of the techniques that we have discussed so far for sparse recovery to estimate $x$. In this setting, not only does sparsity aid us in our recovery of the signal $x$, but the locations of its nonzeros also perform model selection by identifying the most relevant variables in predicting the response.
In communications, error correction refers to mechanisms that can detect and correct errors in the data that appear due to distortion in the transmission channel. Standard approaches for error correction rely on repetition schemes, redundancy checks, or nearest neighbor code search. We consider the particular case in which a signal $x$ with $M$ entries is coded by taking length-$N$ linearly independent codewords $\{\phi_1, ..., \phi_M\}$, with $N > M$, and summing them using the entries of $x$ as coefficients. The transmitted message is therefore $\Phi x$, where $\Phi$ is the matrix that has the codewords for columns, and the channel introduces a sparse error vector $e$, so that the received message is $y = \Phi x + e$.

The techniques developed for sparse recovery (Section 5.1) in the context of compressive sensing (Section 1.1) provide a number of methods to estimate the error vector $e$, and therefore to correct it and recover $x$, when $e$ is sufficiently sparse. To estimate the error, we build a matrix $\Theta$ whose rows span the orthogonal complement of the column span of $\Phi$, i.e., a matrix $\Theta$ that holds $\Theta\Phi = 0$. When such a matrix is obtained, we can modify the received message to obtain

$\tilde{y} = \Theta y = \Theta\Phi x + \Theta e = \Theta e.$

If the matrix $\Theta$ is well-suited for CS (i.e., it satisfies a condition such as the restricted isometry property (Section 3.3)) and $e$ is sufficiently sparse, then the error vector can be accurately estimated from $\tilde{y}$ as $\hat{e}$, and the signal can be recovered as $\hat{x} = \Phi^{\dagger}\hat{y}$ with $\hat{y} = y - \hat{e}$.

As an example, when the codewords $\phi_m$ have random independent and identically distributed sub-Gaussian (Section 7.1) entries, then a $K$-sparse error can be corrected if $M < N - CK\log(N/K)$ for a fixed constant $C$.
Another scenario where compressive sensing (Section 1.1) and sparse recovery algorithms (Section 5.1) can be potentially useful is the context of group testing and the related problem of computation on data streams.

6.3.1 Group testing

Among the historically oldest of all sparse recovery algorithms were those developed in the context of combinatorial group testing. In this problem we suppose that there are $N$ total items and $K$ anomalous elements that we wish to find. For example, we might wish to identify defective products in an industrial setting, or identify a subset of diseased tissue samples in a medical context. In both of these cases the vector $x$ indicates which elements are anomalous, i.e., $x_i \ne 0$ for the $K$ anomalous elements and $x_i = 0$ otherwise. Our goal is to design a collection of tests that allow us to identify the support (and possibly the values of the nonzeros) of $x$ while also minimizing the number of tests performed. In the simplest practical setting these tests are represented by a binary matrix $\Phi$ whose entries $\phi_{ij}$ are equal to 1 if and only if the $j$th item is used in the $i$th test. If the output of the test is linear with respect to the inputs, then the problem of recovering the vector $x$ is essentially the same as the standard sparse recovery problem in compressive sensing.

6.3.2 Computation on data streams

Another application area in which ideas related to compressive sensing have proven useful is computation on data streams [49], [149]. As an example of a typical data streaming problem, suppose that $x_i$ represents the number of packets passing through a network router with destination $i$. Storing $x$ directly is typically infeasible since the total number of possible destinations (represented by a 32-bit IP address) is $N = 2^{32}$. Thus, instead of attempting to store $x$ directly, one can store $y = \Phi x$ where $\Phi$ is an $M \times N$ matrix with $M \ll N$. In this context the vector $y$ is often called a sketch. Note that in this problem $y$ is computed in a different manner than in the compressive sensing context. Specifically, in the network traffic example we do not ever observe $x_i$ directly; rather, we observe increments to $x_i$ (when a packet with destination $i$ passes through the router). Thus we construct $y$ iteratively by adding the $i$th column of $\Phi$ to $y$ each time such an increment occurs, which we can do since $y = \Phi x$ is linear. When the traffic is dominated by a small number of destinations, the vector $x$ is compressible, and recovering $x$ from the sketch $y$ is again essentially the standard sparse recovery problem.
6.4.1 Magnetic resonance imaging

Magnetic resonance imaging (MRI) is a medical imaging technique based on the core principle that protons in water molecules in the human body align themselves in a magnetic field. MRI machines repeatedly pulse magnetic fields to cause water molecules in the human body to disorient and then reorient themselves, which causes a release of detectable radiofrequencies. We assume that the object to be imaged is a collection of voxels. The MRI's magnetic pulses are sent incrementally along a gradient leading to a different phase and frequency encoding for each column and row of voxels respectively. Abstracting away from the technicalities of the physical process, the magnetic field measured in MRI acquisition corresponds to a Fourier coefficient of the imaged object; the object can then be recovered by an inverse Fourier transform. Thus, we can view the MRI as measuring Fourier samples.
A major limitation of the MRI process is the linear relation between the number of measured data
samples and scan times. Long-duration MRI scans are more susceptible to physiological motion artifacts, add
discomfort to the patient, and are expensive [134]. Therefore, minimizing scan time without compromising
image quality is of direct benefit to the medical community.
The theory of compressive sensing (Section 1.1) (CS) can be applied to MR image reconstruction by
exploiting the transform-domain sparsity of MR images [135], [136], [137], [198]. In standard MRI reconstruction, undersampling in the Fourier domain results in aliasing artifacts when the image is reconstructed.
However, when a known transform renders the object image sparse (Section 2.3) or compressible (Section 2.4),
the image can be reconstructed using sparse recovery (Section 5.1) methods. While the discrete cosine and
wavelet transforms are commonly used in CS to reconstruct these images, the use of total variation norm
minimization also provides high-quality reconstruction.
6.4.2 Electroencephalography
Electroencephalography (EEG) and Magnetoencephalography (MEG) are two popular noninvasive methods
to characterize brain function by measuring scalp electric potential distributions and magnetic fields due to neuronal firing. EEG and MEG provide temporal resolution on the millisecond timescale characteristic of
neural population activity and can also help to estimate the current sources inside the brain by solving an
inverse problem [107].
Models for neuromagnetic sources suggest that the underlying activity is often limited in spatial extent.
Based on this idea, algorithms like FOCUSS (Focal Underdetermined System Solution) are used to identify
highly localized sources by assuming a sparse model to solve an underdetermined problem [108].
FOCUSS is a recursive linear estimation procedure, based on a weighted pseudo-inverse solution. The
algorithm assigns a current (with nonlinear current location parameters) to each element within a region so
that the unknown current values can be related linearly to the measurements. The weights at each step are
derived from the solution of the previous iterative step. The algorithm converges to a source distribution
in which the number of parameters required to describe source currents does not exceed the number of
measurements. The initialization determines which of the localized solutions the algorithm converges to.
We now consider the application of compressive sensing (Section 1.1) (CS) to the problem of designing a system that can acquire a continuous-time signal $x(t)$. Specifically, we would like to build an analog-to-digital converter (ADC) that avoids having to sample $x(t)$ at its Nyquist rate when $x(t)$ is sparse. In this context, we will assume that $x(t)$ has some kind of sparse (Section 2.3) structure in the Fourier domain, meaning that it is still bandlimited but that much of the spectrum is empty. We will discuss the different possible signal models for mathematically capturing this structure in greater detail below. For now, the challenge is that our measurement system (Section 3.1) must be built using analog hardware. This imposes severe restrictions on the kinds of operations we can perform.
To be more concrete, since we are dealing with a continuous-time signal $x(t)$, we must also consider continuous-time test functions $\{\phi_j(t)\}_{j=1}^{M}$. We then consider a finite window of time, say $t \in [0, T]$, and would like to collect $M$ measurements of the form

$y[j] = \int_{0}^{T} x(t)\,\phi_j(t)\, dt. \qquad (6.1)$

Building an analog system to collect such measurements will require three main components:

1. hardware for generating the test signals $\phi_j(t)$;
2. $M$ correlators that multiply the signal $x(t)$ with each respective $\phi_j(t)$;
3. $M$ integrators with a zero-valued initial state.

We could then sample and quantize the output of each of the integrators to collect the measurements $y[j]$. Of course, even in this somewhat idealized setting, it should be clear that what we can build in hardware will constrain our choice of $\phi_j(t)$, since we cannot reliably and accurately produce arbitrarily complex $\phi_j(t)$ in analog hardware. Moreover, the architecture described above requires $M$ correlator/integrator pairs operating in parallel, which will be potentially prohibitively expensive both in dollar cost as well as costs such as size, weight, and power (SWAP).
As a result, there have been a number of efforts to design simpler architectures, chiefly by carefully designing structured test functions $\phi_j(t)$. The simplest to describe and historically earliest idea is to choose $\phi_j(t) = \delta(t - t_j)$, where $\{t_j\}_{j=1}^{M}$ denotes a sequence of $M$ locations in time at which we would like to sample the signal $x(t)$. Typically, if the number of measurements we are acquiring is lower than the Nyquist-rate, then these locations cannot simply be uniformly spaced in the interval $[0, T]$, but must be carefully chosen. Note
that this approach simply requires a single traditional ADC with the ability to sample on a non-uniform grid, avoiding the requirement for $M$ parallel correlator/integrator pairs. Such non-uniform sampling systems have been studied in other contexts outside of the CS framework. For example, there exist specialized fast
algorithms for the recovery of extremely large Fourier-sparse signals. The algorithm uses samples at a nonuniform sequence of locations that are highly structured, but where the initial location is chosen using a
(pseudo)random seed. This literature provides guarantees similar to those available from standard CS [101],
[104]. Additionally, there exist frameworks for the sampling and recovery of multi-band signals, whose Fourier
transforms are mostly zero except for a few frequency bands. These schemes again use non-uniform sampling
patterns based on coset sampling [21], [20], [95], [93], [146], [202]. Unfortunately, these approaches are often highly sensitive to jitter, or error in the timing of when the samples are taken.
An architecture that addresses these challenges is the random demodulator, depicted in Figure 6.1. The analog input $x(t)$ is correlated with a pseudorandom square pulse of $\pm 1$'s, called the chipping sequence $p_c(t)$, which alternates between values at a rate of $N_a$ Hz, where $N_a$ is at least the Nyquist rate of $x(t)$. The mixed signal is integrated over a time period $1/M_a$ and sampled by a traditional integrate-and-dump back-end ADC at $M_a$ Hz, with $M_a \ll N_a$. In this case our measurements are given by

$y[j] = \int_{(j-1)/M_a}^{j/M_a} p_c(t)\, x(t)\, dt. \qquad (6.2)$

In practice, data is processed in time blocks of period $T$, and we define $N = N_a T$ as the number of elements in the chipping sequence and $M = M_a T$ as the number of measurements. We will discuss the discretization of this model below, but the key observation is that the correlator and chipping sequence operate at a fast rate, while the back-end ADC operates at a low rate. In hardware it is easier to build a high-rate modulator/chipping sequence combination than a high-rate ADC [130]. In fact, many systems already use components of this front end for binary phase shift keying demodulation, as well as for other conventional communication schemes such as CDMA. (A correlator is also known as a demodulator due to its most common application: demodulating radio signals.)
To understand the discrete-time equivalent of this system, consider the first measurement:

$y[1] = \int_{0}^{1/M_a} p_c(t)\, x(t)\, dt = \sum_{n=1}^{N_a/M_a} p_c[n] \int_{(n-1)/N_a}^{n/N_a} x(t)\, dt. \qquad (6.3)$

But since $N_a$ is the Nyquist-rate of $x(t)$, the integral $\int_{(n-1)/N_a}^{n/N_a} x(t)\, dt$ essentially captures all of the information about $x(t)$ on the interval $[(n-1)/N_a, n/N_a]$; call this value $x[n]$. Thus, we obtain

$y[1] = \sum_{n=1}^{N_a/M_a} p_c[n]\, x[n]. \qquad (6.4)$
In general, our measurement process is equivalent to multiplying the vector of Nyquist-rate samples $x[n]$ by the random sequence of $\pm 1$'s in $p_c[n]$ and then summing every sequential block of $N_a/M_a$ coefficients. We can represent this as a banded matrix $\Phi$ containing $N_a/M_a$ pseudorandom $\pm 1$'s per row. For example, with $N = 12$, $M = 4$, and $T = 1$, such a $\Phi$ has the form

$\Phi = \begin{bmatrix} \pm 1\ \pm 1\ \pm 1 & & & \\ & \pm 1\ \pm 1\ \pm 1 & & \\ & & \pm 1\ \pm 1\ \pm 1 & \\ & & & \pm 1\ \pm 1\ \pm 1 \end{bmatrix}, \qquad (6.5)$

where the signs of the nonzero entries are determined by the chipping sequence and all other entries are zero. In general, $\Phi$ will have $M$ rows, and each row will contain $N_a/M_a$ nonzeros. Note that matrices satisfying this structure are extremely efficient to apply, requiring only $O(N)$ computations, which is useful during recovery.
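A minimal sketch of how such a banded matrix could be generated and applied; the sizes match the N = 12, M = 4 example above, and the random seed is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 12, 4
L = N // M                               # N_a / M_a samples integrated per measurement
pc = rng.choice([-1.0, 1.0], size=N)     # pseudorandom chipping sequence of +/-1's

Phi = np.zeros((M, N))
for j in range(M):
    Phi[j, j * L:(j + 1) * L] = pc[j * L:(j + 1) * L]   # each row: a block of L chipped samples

# y = Phi @ x applies chipping followed by the low-rate integrate-and-dump in one step
```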
A detailed analysis of the random demodulator in [193] studied the properties of these matrices applied to a particular signal model. Specifically, it is shown that if $\Psi$ represents the $N \times N$ normalized discrete Fourier transform (DFT) matrix, then the matrix $\Phi\Psi$ will satisfy the RIP with high probability, provided that

$M = O\left(K \log^2(N/K)\right), \qquad (6.6)$

where the probability is taken with respect to the random choice of $p_c[n]$. This means that if $x(t)$ is a periodic (or finite-length) signal such that once it is sampled it is sparse or compressible in the basis $\Psi$, then it should be possible to recover $x(t)$ from these measurements. Moreover, it is empirically demonstrated that combining $\ell_1$ minimization with the random demodulator can recover $K$-sparse (in $\Psi$) signals with

$M \ge C K \log(N/K + 1) \qquad (6.7)$

measurements where $C \approx 1.7$ [193].

Note that the signal model considered in [193] is somewhat restrictive, since even a pure tone will not yield a sparse DFT unless the frequency happens to be equal to $k/N_a$ for some integer $k$. Perhaps a more realistic signal model is the multi-band signal model of [21], [20], [95], [93], [146], [202], where the signal is assumed to be bandlimited outside of $K$ bands each of bandwidth $B$, where $KB$ is much less than the total possible bandwidth. It remains unknown whether the random demodulator can be exploited to recover such signals. Moreover, there also exist other CS-inspired architectures that we have not explored in this course [3], [167], [195], and this remains an active area of research. We have simply provided an overview of one of the more promising approaches in order to illustrate the potential applicability of the ideas of this course to the problem of analog-to-digital conversion.
6.6.1 Architecture

Several hardware architectures have been proposed that apply the theory of compressive sensing (Section 1.1) (CS) in an imaging setting [80], [143], [165]. We will focus on the so-called single-pixel camera [80], [181], [182], [205], [206]. The single-pixel camera is an optical computer that sequentially measures the inner products $y[j] = \langle x, \phi_j \rangle$ between an $N$-pixel sampled version of the incident light-field from the scene under view (denoted by $x$) and a set of $N$-pixel test functions $\{\phi_j\}_{j=1}^{M}$. The architecture is illustrated in Figure 6.2, and an aerial view of the camera in the lab is shown in Figure 6.3. As shown in these figures, the light-field is focused by a lens (Lens 1 in Figure 6.3) not onto a CCD or CMOS sampling array but rather onto a spatial light modulator (SLM). An SLM modulates the intensity of a light beam according to a control signal. A simple example of a transmissive SLM that either passes or blocks parts of the beam is an overhead transparency. Another example is a liquid crystal display (LCD) projector.
Figure 6.2: Single-pixel camera block diagram. Incident light-field (corresponding to the desired image $x$) is reflected off a digital micromirror device (DMD) array whose mirror orientations are modulated according to the pseudorandom pattern $\phi_j$ supplied by a random number generator. Each different mirror pattern produces a voltage at the single photodiode that corresponds to one measurement $y[j]$.
The Texas Instruments (TI) digital micromirror device (DMD) is a reflective SLM that selectively redirects parts of the light beam. The DMD consists of an array of bacterium-sized, electrostatically actuated micro-mirrors, where each mirror in the array is suspended above an individual static random access memory (SRAM) cell. Each mirror rotates about a hinge and can be positioned in one of two states (±10 degrees from horizontal) according to which bit is loaded into the SRAM cell; thus light falling on the DMD can be reflected in two directions depending on the orientation of the mirrors.
Each element of the SLM corresponds to a particular element of $\phi_j$ (and its corresponding pixel in $x$). For a given $\phi_j$, we can orient the corresponding element of the SLM either towards (corresponding to a 1 at that element of $\phi_j$) or away from (corresponding to a 0 at that element of $\phi_j$) a second lens (Lens 2 in Figure 6.3). This second lens collects the reflected light and focuses it onto a single photon detector (the single pixel) that integrates the product of $x$ and $\phi_j$ to compute the measurement $y[j] = \langle x, \phi_j \rangle$ as its output voltage. Values of $\phi_j$ between 0 and 1 can be obtained by dithering the mirrors back and forth during the photodiode integration time. By reshaping $x$ into a column vector and the $\phi_j$ into row vectors, we can thus model this system as computing the product $y = \Phi x$, where each row of $\Phi$ corresponds to a $\phi_j$. To compute randomized measurements, we set the mirror orientations $\phi_j$ randomly using a pseudorandom number generator, measure $y[j]$, and then repeat the process $M$ times to obtain the measurement vector $y$.
The single-pixel design reduces the required size, complexity, and cost of the photon detector array down to a single unit, which enables the use of exotic detectors that would be impossible in a conventional digital camera. Example detectors include a photomultiplier tube or an avalanche photodiode for low-light (photon-limited) imaging, a sandwich of several photodiodes sensitive to different light wavelengths for multimodal sensing, a spectrometer for hyperspectral imaging, and so on.

In addition to sensing flexibility, the practical advantages of the single-pixel design include the facts that the quantum efficiency of a photodiode is higher than that of the pixel sensors in a typical CCD or CMOS array and that the fill factor of a DMD can reach 90% whereas that of a CCD/CMOS array is only about 50%. An important advantage to highlight is that each CS measurement receives about $N/2$ times more photons than an average pixel sensor, which significantly reduces image distortion from dark noise and read-out noise.

The single-pixel design falls into the class of multiplex cameras. The baseline standard for multiplexing is classical raster scanning, where the test functions $\{\phi_j\}$ turn on each mirror in turn. There are substantial advantages to operating in a CS rather than raster scan mode, including fewer total measurements ($M$ for CS rather than $N$ for raster scan) and significantly reduced dark noise. See [80] for a more detailed discussion of these issues.

Figure 6.4 (a) and (b) illustrates a target object $x$ and a reconstructed image $\hat{x}$ taken by the single-pixel camera using $N = 256 \times 256$ and $M = N/50$ [80]. Figure 6.4 (c) illustrates an $N = 256 \times 256$ color single-pixel photograph of a printout taken under low-light conditions using RGB color filters and a photomultiplier tube with $M = N/10$. In both cases, the images were reconstructed using total variation minimization, which is closely related to wavelet coefficient $\ell_1$ minimization.
63
(a)
(b)
(c)
256 256
M = 1300
a low-light setting using a single photomultiplier tube sensor, RGB color lters, and
M = 6500
random
measurements.
The DMD is programmable, so in principle we could employ arbitrary test functions $\phi_j$. However, even when we restrict the $\phi_j$ to be $\{0,1\}$-valued, storing these patterns for large values of $N$ is impractical. Furthermore, a purely pseudorandom $\Phi$ can be computationally expensive to apply during recovery. Thus, rather than purely random $\Phi$, we can also consider $\Phi$ that admit a fast transform-based implementation, such as randomly subsampled Walsh (Hadamard) matrices. We will suppose that $N$ is a power of 2 and let $W_{\log_2 N}$ denote the $N \times N$ Walsh matrix, defined by setting $W_0 = 1$ and applying the recursion

$W_j = \frac{1}{\sqrt{2}} \begin{bmatrix} W_{j-1} & W_{j-1} \\ W_{j-1} & -W_{j-1} \end{bmatrix}. \qquad (6.8)$

This construction produces an orthonormal $N \times N$ matrix with entries of $\pm 1/\sqrt{N}$ that admits a fast implementation requiring $O(N\log N)$ computations to apply. As examples, note that

$W_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \qquad (6.9)$

and

$W_2 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}. \qquad (6.10)$

We can exploit these constructions as follows. Suppose that $N = 2^B$ and generate $W_B$. Let $I_\Gamma$ denote an $M \times N$ submatrix of the identity obtained by picking a random set of $M$ rows, so that $I_\Gamma W_B$ is the submatrix of $W_B$ consisting of the rows of $W_B$ indexed by $\Gamma$. Furthermore, let $D$ denote a random $N \times N$ permutation matrix. We can generate $\Phi$ as

$\Phi = \frac{1}{2}\left(\sqrt{N}\, I_\Gamma W_B + 1\right) D. \qquad (6.11)$

Note that $\frac{1}{2}\left(\sqrt{N}\, I_\Gamma W_B + 1\right)$ merely rescales and shifts $I_\Gamma W_B$ to have $\{0,1\}$-valued entries, and recall that each row of $\Phi$ will be reshaped into a 2-D matrix of numbers that is then displayed on the DMD array. Furthermore, $D$ can be thought of as either permuting the pixels or permuting the columns of $W_B$. This step adds some additional randomness since some of the rows of the Walsh matrix are highly correlated with coarse scale wavelet basis functions, but permuting the pixels eliminates this structure. Note that at this point we do not have any strict guarantees that such a $\Phi$ combined with a wavelet basis will yield a product satisfying the restricted isometry property (Section 3.3), but this approach seems to work well in practice.
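A minimal sketch of the construction in (6.11); scipy's hadamard routine is used for the Walsh/Hadamard matrix (N must be a power of 2), and the sizes and random seed are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
N, M = 256, 64                                   # N must be a power of 2
W = hadamard(N) / np.sqrt(N)                     # orthonormal Walsh/Hadamard matrix, entries +/- 1/sqrt(N)
rows = rng.choice(N, size=M, replace=False)      # random row subset (the I_Gamma selection)
perm = rng.permutation(N)                        # random pixel permutation (the matrix D)

Phi = 0.5 * (np.sqrt(N) * W[rows] + 1.0)         # rescale and shift to {0,1}-valued mirror patterns
Phi = Phi[:, perm]                               # apply the permutation to the columns
```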
Standard digital color images of a scene of interest consist of three components — red, green and blue — which contain the intensity level for each of the pixels in three different groups of wavelengths. This concept has been extended in the hyperspectral and multispectral imaging sensing modalities, where the image of interest is a three-dimensional datacube with two spatial dimensions $x$ and $y$ and one spectral dimension $\lambda$, giving the intensity of the light observed at each point of the scene at different wavelengths. An example datacube is shown in Figure 6.5. Each of its entries is called a voxel. We also define a pixel's spectral signature as the stacking of its voxels along the spectral dimension, $f_{x,y} = \{f(x, y, \lambda)\}_{\lambda}$. The spectral signature of a pixel can give a wealth of information about the corresponding point in the scene that is not captured by its color. For example, using spectral signatures, it is possible to identify the type of material observed (for example, vegetation vs. ground vs. water), or its chemical composition.

Datacubes are high-dimensional, since the standard number of pixels present in a digitized image is multiplied by the number of spectral bands desired. However, considerable structure is present in the observed data. The spatial structure common in natural images is also observed in hyperspectral imaging, while each pixel's spectral signature is usually smooth.
Compressive sensing (Section 1.1) (CS) architectures for hyperspectral imaging perform lower-dimensional projections that multiplex in the spatial domain, the spectral domain, or both. One example extends the single-pixel camera of the previous section: the same digital micromirror device (DMD) provides reflectivity for wavelengths from near infrared to near ultraviolet, while the single photodiode is replaced by a sensor that records the desired spectral bands separately. Thus, by converting the datacube into a vector sorted by spectral band, the matrix that operates on the data to obtain the CS measurements is represented as

$\Phi = \begin{bmatrix} \Phi_{x,y} & & & \\ & \Phi_{x,y} & & \\ & & \ddots & \\ & & & \Phi_{x,y} \end{bmatrix}. \qquad (6.12)$

This architecture performs multiplexing only in the spatial domain, i.e. dimensions $x$ and $y$, since there is no mixing of the different spectral bands along the dimension $\lambda$.
Figure 6.6: Single-pixel hyperspectral camera, in which the photodiode of the single-pixel camera is replaced by a spectrometer that captures the modulated light intensity for all spectral bands, for each of the CS measurements.
An alternative family of architectures, the coded aperture snapshot spectral imagers (Figure 6.7 and Figure 6.8), first shears the datacube with a dispersive element and then masks it with a coded aperture, whose effect is to "punch holes" in the sheared datacube by blocking certain pixels of light. Subsequently, a second dispersive element acts on the masked, sheared datacube; however, this element shears in the opposite direction, effectively inverting the shearing of the first dispersive element. The resulting datacube is upright, but features "sheared" holes of datacube voxels that have been masked out. The resulting modified datacube is then received by a sensor array, which flattens the spectral dimension by measuring the sum of all the wavelengths received; the received light field resembles the target image, allowing for optical adjustments such as focusing. In this way, the measurements consist of full sampling in the spatial $x$ and $y$ dimensions, with multiplexing along the spectral dimension $\lambda$.
Figure 6.7: Dual disperser coded aperture snapshot spectral imager (DD-CASSI). (a) Schematic of the DD-CASSI components. (b) Illustration of the datacube processing performed by the components.
Figure 6.8: Single disperser coded aperture snapshot spectral imager (SD-CASSI). (a) Schematic of the SD-CASSI components. (b) Illustration of the datacube processing performed by the components.
A reconstruction
algorithm then searches for the signal of lowest complexity (i.e., with the fewest dyadic squares) that generates
compressive measurements close to those observed [99].
Figure 6.9: Example dyadic square partition for piecewise spatially constant datacube.
A sparsifying basis for the datacube that accounts only for the spatial structure within each spectral band applies a standard 2-D sparsifying basis $\Psi_{x,y}$ separately to each band:

$\Psi = \begin{bmatrix} \Psi_{x,y} & & & \\ & \Psi_{x,y} & & \\ & & \ddots & \\ & & & \Psi_{x,y} \end{bmatrix}. \qquad (6.13)$

To additionally exploit the smoothness of the spectral signatures, one can instead use a Kronecker product of a spectral sparsifying basis $\Psi_\lambda$ and the spatial basis $\Psi_{x,y}$, whose blocks are scaled copies of $\Psi_{x,y}$:

$\Psi = \Psi_\lambda \otimes \Psi_{x,y} = \begin{bmatrix} \Psi_\lambda[1,1]\,\Psi_{x,y} & \Psi_\lambda[1,2]\,\Psi_{x,y} & \cdots \\ \Psi_\lambda[2,1]\,\Psi_{x,y} & \Psi_\lambda[2,2]\,\Psi_{x,y} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}. \qquad (6.14)$
In this manner, the datacube sparsity basis simultaneously enforces both spatial and spectral structure, potentially achieving a sparsity level lower than the sum of the spatial sparsities for the separate spectral slices, depending on the level of structure between them and how well this structure can be captured through sparsity.
6.7.3 Summary
Compressive sensing (Section 1.1) will make the largest impact in applications with very large, high-dimensional datasets that exhibit considerable amounts of structure. Hyperspectral imaging is a leading example of such applications; the sensor architectures and data structure models surveyed in this module show initial promising work in this new direction, enabling new ways of simultaneously sensing and compressing such data. For standard sensing architectures, the data structures surveyed also enable new transform coding-based compression schemes.
A powerful data model for many applications is the geometric notion of a low-dimensional manifold. Data that possesses merely $K$ intrinsic degrees of freedom can be assumed to lie on a $K$-dimensional manifold in the high-dimensional ambient space. Once the manifold model is identified, any point on it can be represented using essentially $K$ pieces of information. For instance, suppose a stationary camera of resolution $N$ observes a truck moving down along a straight line on a highway. Then, the set of images captured by the camera forms a 1-dimensional manifold in the image space $\mathbb{R}^N$.
Figure 6.10: (a) A rotating cube has 3 degrees of freedom, thus giving rise to a 3-dimensional manifold in image space. (b) Illustration of a manifold parametrized by a $K$-dimensional vector $\theta$.
In many applications, it is beneficial to explicitly characterize the structure (alternately, identify the parameters) of the manifold formed by a set of observed signals. This is known as manifold learning and has been the subject of considerable study over the last several years; well-known manifold learning algorithms include Isomap [185], LLE [169], and Hessian eigenmaps [72]. As an informal example, if a 2-dimensional manifold were to be imagined as the surface of a twisted sheet of rubber, manifold learning can be described as the process of unraveling the sheet and stretching it out on a 2D flat surface. Figure 6.11 indicates the performance of Isomap on a simple 2-dimensional dataset comprising images of a translating disk.
Figure 6.11: (a) Input data: images of a disk translating in $K = 2$ dimensions, parametrized by $\theta = (\theta_1, \theta_2)$. (b) True values of the underlying parameters. (c) Isomap embedding learned from the original data in $\mathbb{R}^N$.

Remarkably, the geometric structure of such an image ensemble is approximately preserved when the images are mapped to a lower-dimensional space by a small number of linear, nonadaptive random projections. This is analogous to the stable embedding of sparse (Section 2.3) signals (see "The restricted isometry property" (Section 3.3)); however, the difference is that the number of projections required to preserve the ensemble structure does not depend on the sparsity of the individual images, but rather on the dimension of the underlying manifold.
This result has far reaching implications; it suggests that a wide variety of signal processing tasks can be performed directly on the random projections, with significant reductions in storage and processing costs. In particular, this enables provably efficient manifold learning in the projected domain [113]. Figure 6.12 illustrates the performance of Isomap on the translating disk dataset under varying numbers of random projections.
Figure 6.12: Isomap embeddings learned from random projections of the 625 images of shifting squares. (a) 25 random projections; (b) 50 random projections; (c) 25 random projections; (d) full data.
The advantages of random projections extend even to cases where the original data is available in the
ambient space
RN .
For example, consider a wireless network of cameras observing a static scene. The set
of images captured by the cameras can be visualized as living on a low-dimensional manifold in the image
space. To perform joint image analysis, the following steps might be executed:
1. Collate: Each camera node transmits its respective captured image (of size $N$) to a central processing unit.
2. Preprocess: The central processor estimates the intrinsic dimension $K$ of the underlying image manifold.
3. Learn: The central processor performs a nonlinear embedding of the data points — for instance, using Isomap [185] — into a $K$-dimensional Euclidean space, using the estimate of $K$ from the previous step.

In situations where $N$ is large, the dimensionality of the images becomes a communication and computation bottleneck. One way to reduce this burden is to perform nonlinear image compression (such as JPEG) at each node before transmitting to the central processing unit. However, this requires a good deal of processing power at each sensor, and the compression would have to be undone during the learning step, thus adding to overall computational costs.
As an alternative, every camera could encode its image by computing (either directly or indirectly) a small
number of random projections to communicate to the central processor [57]. These random projections are
obtained by linear operations on the data, and thus are cheaply computed. Clearly, in many situations it will
be less expensive to store, transmit, and process such randomly projected versions of the sensed images. The
method of random projections is thus a powerful tool for ensuring the stable embedding of low-dimensional
manifolds into an intermediate space of reasonable size. It is now possible to think of settings involving a
huge number of low-power devices that inexpensively capture, store, and transmit a very small number of
measurements of high-dimensional data.
6.9 Inference using compressive measurements
While the compressive sensing (Section 1.1) (CS) literature has focused almost exclusively on problems in signal reconstruction/approximation (Section 5.1), this is frequently not necessary. For instance, in many signal processing applications (including computer vision, digital communications and radar systems), signals are acquired only for the purpose of making a detection or classification decision. Tasks such as detection do not require a reconstruction of the signal, but only require estimates of the relevant sufficient statistics for the problem at hand.

There are two ways of performing inference from compressive measurements:

1. Reconstruct the full data using standard sparse recovery (Section 5.1) techniques and apply standard computer vision/inference algorithms on the reconstructed images.
2. Develop algorithms that operate directly on the compressive measurements, without ever reconstructing the full data.

A key property that enables the second approach is the information scalability property of compressive measurements. This property arises from the following two observations: random measurements provide a coarse, global summary of the observed signal, and the number of measurements required depends on the nature of the inference task. Informally, we observe that more sophisticated tasks require more measurements.

We examine three possible inference problems for which algorithms that operate directly on the compressive measurements can be developed: detection (deciding whether a signal of interest is present in the observed signal), classification (assigning the observed signal to one of two (or more) signal classes), and parameter estimation (calculating a function of the observed signal).
6.9.1 Detection
In detection one simply wishes to answer the question: is a (known) signal present in the observations? To solve this problem, it suffices to estimate a relevant sufficient statistic. Based on a concentration of measure inequality, it is possible to show that such sufficient statistics for a detection problem can be accurately estimated from random projections, where the quality of this estimate depends on the signal-to-noise ratio (SNR) [55]. We make no assumptions on the signal of interest s, and hence we can build systems capable of detecting s even when it is not known in advance. Thus, we can use random projections for dimensionality reduction in the detection setting.

Figure 6.13: (a)-(c) Probability of error to reconstruct and detect chirp signals embedded in strong narrowband interference (top) and in strong sinusoidal interference (bottom), as a function of the number of measurements.
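As a concrete illustration of detection from compressive measurements, the following Python sketch (not from the original module; the signal model, matrix scaling, and threshold are illustrative assumptions) estimates the matched-filter statistic ⟨x, s⟩ directly from y = Φx and compares it against a threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 1000, 50                        # ambient dimension, number of measurements
s = np.sin(2 * np.pi * 0.05 * np.arange(N) ** 1.2)   # known chirp-like signal
s /= np.linalg.norm(s)

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N)) # random sensing matrix

def detect(y, Phi, s, threshold):
    """Compressive matched filter: estimate the sufficient statistic <x, s>
    from y = Phi x as <y, Phi s> and compare against a threshold."""
    stat = y @ (Phi @ s)               # proxy for <x, s>, accurate up to JL-type distortion
    return stat > threshold

# Monte Carlo estimate of detection and false-alarm rates under additive noise.
sigma, trials, thr = 0.1, 2000, 0.5
hits = sum(detect(Phi @ (s + sigma * rng.normal(size=N)), Phi, s, thr) for _ in range(trials))
false_alarms = sum(detect(Phi @ (sigma * rng.normal(size=N)), Phi, s, thr) for _ in range(trials))
print(hits / trials, false_alarms / trials)
```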
6.9.2 Classification
Similarly, random projections have long been used for a variety of classification and clustering problems. The Johnson-Lindenstrauss Lemma is often exploited in this setting to compute approximate nearest neighbors, which is naturally related to classification. The key result that random projections result in an isometric embedding allows us to generalize this work to several new classification algorithms and settings [55].

Classification can also be performed when more elaborate models are used for the different classes. Suppose the signal/image class of interest can be modeled as a low-dimensional manifold (Section 6.8) in the ambient space. In such a case it can be shown that, even under random projections, certain geometric properties of the signal class are preserved up to a small distortion; for example, interpoint Euclidean (ℓ2) distances are preserved [10]. This enables the design of classification algorithms in the projected domain. One such algorithm is known as the smashed filter [56]. As an example, under equal distribution among classes and a Gaussian noise setting, the smashed filter is equivalent to building a nearest-neighbor (NN) classifier in the measurement domain. Further, it has been shown that for a K-dimensional manifold, M = O(K log N) measurements are sufficient to perform reliable compressive classification. Thus, the number of measurements scales as the dimension of the signal class, as opposed to the sparsity of the individual signals.
Figure 6.14: Results for smashed filter image classification and parameter estimation experiments. (a) Classification rates and (b) average estimation error for varying number of measurements M and noise levels.

As the number of measurements M increases, the distances between the projected manifolds increase as well, thus increasing the noise tolerance and enabling more accurate estimation and classification. Thus, the classification and estimation performances improve as the noise level decreases and M increases.
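The nearest-neighbor interpretation of the smashed filter can be sketched as follows. This is illustrative Python only; the per-class manifolds are replaced here by noisy copies of fixed class templates, which is a simplifying assumption. Training examples are compressively measured once, and a test signal is classified by its nearest neighbor in the measurement domain.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n_classes, n_train = 400, 30, 3, 50

# Class templates; training/test signals are noisy copies (a crude stand-in
# for points sampled from per-class manifolds).
templates = rng.normal(size=(n_classes, N))
def sample(c, sigma=0.3):
    return templates[c] + sigma * rng.normal(size=N)

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))

# Nearest-neighbor rule in the measurement domain: compare Phi x against
# compressively measured training examples.
train_X = np.array([sample(c) for c in range(n_classes) for _ in range(n_train)])
train_y = np.repeat(np.arange(n_classes), n_train)
train_meas = train_X @ Phi.T

def classify(x):
    y = Phi @ x
    dists = np.linalg.norm(train_meas - y, axis=1)
    return train_y[np.argmin(dists)]

test = [(sample(c), c) for c in range(n_classes) for _ in range(100)]
acc = np.mean([classify(x) == c for x, c in test])
print("compressive NN accuracy:", acc)
```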
6.9.3 Estimation
Consider a signal x ∈ R^N, and suppose that we wish to estimate some function f(x) of the signal, but we only observe the measurements y = Φx, where Φ is again an M × N matrix. The data streaming literature has previously analyzed this problem for many common functions, such as linear functions, ℓ_p norms, and histograms. These estimates are often based on so-called sketches, which can be interpreted as random projections. As an example, in the case where f is a linear function, one can show that the estimation error (relative to the norms of x and f) can be bounded by a constant determined by M. This result holds for a wide class of random matrices, and can be viewed as a straightforward consequence of the same concentration of measure inequality (Section 7.2) that has proven useful for CS and in proving the JL Lemma [55].
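A minimal sketch of this idea for a linear function f(x) = ⟨w, x⟩ is shown below (illustrative Python, assuming Gaussian sketching with entry variance 1/M): since E[ΦᵀΦ] = I, the inner product ⟨Φw, Φx⟩ is an unbiased estimate of ⟨w, x⟩ whose relative error shrinks on the order of 1/√M.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 2000, 100
x = rng.normal(size=N)                       # signal
w = rng.normal(size=N)                       # linear functional f(x) = <w, x>

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
y = Phi @ x                                  # sketch / compressive measurements

estimate = (Phi @ w) @ y                     # <Phi w, Phi x> approximates <w, x>
print(estimate, w @ x)                       # close up to O(1/sqrt(M)) relative error
```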
Parameter estimation can also be performed when the signal class is modeled as a low-dimensional manifold. Suppose an observed signal x can be parameterized by a K-dimensional parameter vector θ, where K ≪ N. Then, it can be shown that with O(K log N) measurements, the parameter vector can be accurately estimated directly from the compressive measurements; results obtained via this approach are shown in Figure 6.14(b).
6.10 Compressive sensor networks
Sparse (Section 2.3) and compressible (Section 2.4) signals are present in many sensor network applications, such as environmental monitoring, signal field recording and vehicle surveillance. Compressive sensing (Section 1.1) (CS) has many properties that make it attractive in these settings, such as its low-complexity sensing and compression, its universality and its graceful degradation. CS is robust to noise, and allows querying more nodes to obtain further detail on signals as they become interesting. Packet drops also do not harm the network nearly as much as in many other protocols, causing only a marginal loss for each measurement not obtained by the receiver. As the network becomes more congested, data can be scaled back smoothly. Thus CS can enable the design of generic compressive sensors that perform random or incoherent projections.
Several methods for using CS in sensor networks have been proposed. Decentralized methods pass data
throughout the network, from neighbor to neighbor, and allow the decoder to probe any subset of nodes.
In contrast, centralized methods require all information to be transmitted to a centralized data center, but
reduce either the amount of information that must be transmitted or the power required to do so. We briefly summarize each class below.
In randomized gossiping, each sensor communicates its data to a random set of nodes, in each stage aggregating and forwarding the observations received to a new set of random nodes. In essence, a spatial dot product is being performed as each node collects and aggregates information, compiling a sum of the weighted samples to obtain CS measurements that become more accurate as more rounds of random gossiping occur. To recover the data, a basis that provides data sparsity (or at least compressibility) is required, as well as the random projections used. However, this information does not need to be known while the data is being passed.
The method can also be applied when each sensor observes a compressible signal. In this case, each sensor computes multiple random projections of the data and transmits them using randomized gossiping to the rest of the network. A potential drawback of this technique is the amount of storage required per sensor, as it could be considerable for large networks. An alternative is to have the decoder query only a subset of the sensors, where each group of sensors of a certain size will be known to contain CS measurements for all the data in the network. To maintain a constant error as the network size grows when the data is partitioned in this fashion, the number of transmissions becomes of order kMn², where k is the number of values desired from each sensor and n is the number of nodes in the network. The results can be improved by using geographic gossiping algorithms [63].
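The aggregation step of randomized gossip can be simulated as below. This is illustrative Python only; the pairwise-averaging protocol and round count are assumptions and not the specific protocols cited above. Each node starts with its weighted contribution φ_i x_i, and repeated pairwise averaging drives every node's state toward (1/n)Φx, from which any node can report the CS measurements.

```python
import numpy as np

rng = np.random.default_rng(4)
n, M = 100, 20                          # number of nodes, number of CS measurements
x = rng.normal(size=n)                  # one scalar reading per node
Phi = rng.normal(size=(M, n))           # random projection coefficients

# Node i starts with its weighted contribution Phi[:, i] * x[i]; randomized pairwise
# averaging drives every node's state toward the network average (1/n) * Phi @ x.
state = Phi * x                         # M x n, column i is node i's current estimate
for _ in range(20000):                  # gossip rounds (pairwise exchanges)
    i, j = rng.integers(n, size=2)
    state[:, i] = state[:, j] = 0.5 * (state[:, i] + state[:, j])

y_node0 = n * state[:, 0]               # any single node can now report ~ Phi @ x
print(np.linalg.norm(y_node0 - Phi @ x) / np.linalg.norm(Phi @ x))
```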
Centralized methods reduce the amount of computation that each node must perform, in order to reduce overall power consumption [208]. In one such scheme, each node computes a small number of sparse random projections of its data, passing along information to only a small set of other nodes; the resulting CS measurement matrix is sparse, since only L ≪ N nodes contribute to each measurement. Such sparse random projections can still be used as CS measurements with quality similar to that of full random projections. Since the CS measurement matrix formed by the data nodes is sparse, a relatively small amount of communication is performed by each encoding node and the overall power required for transmission is reduced.
Another centralized approach is compressive wireless sensing (CWS), in which the nodes simultaneously transmit analog-modulated versions of their readings so that the projections are aggregated over the air at the central location by the receiving antenna, with further noise being added. In this way, the fusion center receives the CS measurements, from which it can perform reconstruction using knowledge of the random projections.

A drawback of this method is the required accurate synchronization. Although CWS constrains the power of each node, it also relies on constructive interference to increase the power received by the data center. The nodes themselves must be accurately synchronized to know when to transmit their data. In addition, CWS assumes that the nodes are all at approximately equal distances from the fusion center, an assumption that is acceptable only when the receiver is far away from the sensor network. Mobile nodes could also increase the complexity of the transmission protocols. Interference or path issues also would have a large effect on CWS, limiting its applicability.

If these limitations are addressed for a suitable application, CWS does offer great power benefits when very little is known about the data beyond sparsity in a fixed basis, achieving a distortion that scales as M^(-2α/(2α+1)), where α is some positive constant based on the network structure. With much more a priori information about the sensed data, other methods will achieve distortions proportional to M^(-2α).
6.11 Genomic sensing
Biosensing of pathogens is a research area of high consequence. An accurate and rapid biosensing paradigm has the potential to impact several fields, including healthcare, defense and environmental monitoring. In this module we address the concept of biosensing based on compressive sensing (Section 1.1) (CS) via the Compressive Sensing DNA Microarray (CSM).

DNA microarrays are a frequently applied solution for microbe sensing; they have a significant edge over competitors due to their ability to sense many organisms in parallel [128], [171]. A DNA microarray consists of genetic sensors, or spots, each containing DNA sequences termed probes. On a microarray, each DNA sequence can be viewed as a sequence of four DNA bases {A, T, G, C} that tend to bind with one another in complementary pairs: A with T and G with C. Consequently, a DNA subsequence in a target organism's genetic sample will tend to bind or hybridize with its complementary subsequence on a microarray to form a stable structure. The target DNA sample to be identified is fluorescently tagged before it is flushed over the microarray. The extraneous DNA is washed away so that only the bound DNA is left on the array. The array is then scanned using laser light of a wavelength designed to trigger fluorescence in the spots where binding has occurred. A specific pattern of array spots will fluoresce, which is then used to infer the genetic makeup in the test sample.
Figure 6.15: Cartoon of a traditional DNA microarray showing strong and weak hybridization of the target DNA at different probe spots; each probe is designed to uniquely identify only one target of interest (each spot contains multiple copies of a probe for robustness).
The first concern with this design is that very often the targets in a test sample have similar base sequences, causing them to hybridize with the wrong probe (see Figure 6.15). These cross-hybridization events lead to errors in the array readout. Current microarray design methods do not address cross-matches between similar DNA sequences.

The second concern in choosing unique identifier based DNA probes is its restriction on the number of organisms that can be identified. In typical biosensing applications multiple organisms must be identified; therefore a large number of DNA targets requires a microarray with a large number of spots. In fact, there are over 1000 known harmful microbes, many with more than 100 strains. The processing speed of microarray data is directly related to its number of spots, representing a significant problem for commercial deployment of microarray-based biosensors. As a consequence, readout systems for traditional DNA arrays cannot be miniaturized or implemented using electronic components and require complicated fluorescent tagging.

The third concern is the inefficient utilization of the large number of array spots in traditional microarrays. Although the number of potential agents in a sample is very large, not all are expected to be present in significant concentrations at any given time.
Therefore, in a traditionally designed microarray only a small fraction of spots will be active at a given time,
corresponding to the few targets present.
To combat these problems, a Compressive Sensing DNA Microarray (CSM) uses combinatorial testing
sensors in order to reduce the number of sensor spots [145], [174], [176]. Each spot in the CSM identifies a group of target organisms, and several spots together generate a unique pattern identifier for a single target.
(See also "Group testing and data stream algorithms" (Section 6.3).) Designing the probes that perform
this combinatorial sensing is the essence of the microarray design process, and what we aim to describe in
this module.
To obtain a CS-type measurement scheme, we can choose each probe in a CSM to be a group identifier such that the readout of each probe is a probabilistic combination of all the targets in its group. The probabilities are representative of each probe's hybridization affinity (or stickiness) to those targets in its group; the targets that are not in its group have low affinity to the probe. The readout signal at each spot of the microarray is a linear combination of hybridization affinities between its probe sequence and each of the target agents.

Figure 6.16: The CSM sensing process, with M spots identifying N targets.

Figure 6.16 illustrates the sensing process. To formalize, we assume a microarray with M spots identifying N targets. Let x_j denote the concentration of target j, and let φ_{i,j} denote the affinity between probe i and target j. The readout at spot i, for 1 ≤ i ≤ M, is

y_i = Σ_{j=1}^{N} φ_{i,j} x_j = φ_i x,

so that collectively y = Φx.
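A toy simulation of this measurement model is sketched below. This is illustrative Python only: the affinity matrix Phi is random rather than a biologically designed probe set, and recovery is performed with a nonnegative ℓ1 (linear programming) decoder rather than the belief propagation decoder discussed in this module.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
M, N, K = 40, 150, 3                    # spots, candidate targets, targets present

# Hypothetical affinity matrix: each probe has nonzero affinity to a small group of targets.
Phi = (rng.random((M, N)) < 0.1) * rng.uniform(0.5, 1.0, size=(M, N))

x = np.zeros(N)                         # target concentrations (sparse, nonnegative)
x[rng.choice(N, K, replace=False)] = rng.uniform(1.0, 5.0, size=K)

y = Phi @ x                             # array readout

# Recover concentrations: minimize sum(x) subject to Phi x = y, x >= 0 (a linear program).
res = linprog(c=np.ones(N), A_eq=Phi, b_eq=y, bounds=(0, None))
print("true support     :", np.flatnonzero(x))
print("recovered support:", np.flatnonzero(res.x > 1e-6))
```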
While group testing has previously been proposed for microarrays [172], the sparsity in the target signal is key in applying CS. The chief advantage of a CS-based approach over regular group testing is in its information scalability: we are able not only to detect which targets are present, but also to estimate their concentrations. This matters because there may exist minute quantities of certain pathogens in the environment, but it is only their large concentrations that may be harmful to us. Furthermore, we are able to use CS recovery methods such as Belief Propagation (Section 5.5) that decode the concentration vector x while accounting for noise in the measurements.
Chapter 7
Appendices
7.1 Sub-Gaussian random variables
A number of distributions, notably Gaussian and Bernoulli, are known to satisfy certain concentration of
measure (Section 7.2) inequalities. We will analyze this phenomenon from a more general perspective by
considering the class of sub-Gaussian distributions [22].
Definition 7.1:
A random variable X is called sub-Gaussian if there exists a constant c > 0 such that

E(exp(Xt)) ≤ exp(c²t²/2)     (7.1)

holds for all t ∈ R. We use the notation X ∼ Sub(c²) to denote that X satisfies (7.1).

The function E(exp(Xt)) is the moment-generating function of X, while the upper bound in (7.1) is the moment-generating function of a Gaussian random variable. Thus, a sub-Gaussian distribution is one whose moment-generating function is bounded by that of a Gaussian. There are a tremendous number of sub-Gaussian distributions, but there are two particularly important examples:

Example 7.1
If X ∼ N(0, σ²), i.e., X is a zero-mean Gaussian random variable with variance σ², then X ∼ Sub(σ²). Indeed, as mentioned above, the moment-generating function of a Gaussian is given by E(exp(Xt)) = exp(σ²t²/2), and thus (7.1) is trivially satisfied.

Example 7.2
If X is a zero-mean, bounded random variable, i.e., one such that there exists a constant B with |X| ≤ B with probability 1, then X ∼ Sub(B²).
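The bound (7.1) is easy to check numerically. The snippet below (illustrative Python) compares the empirical moment-generating function of a Rademacher variable (±1 with equal probability, a bounded and hence sub-Gaussian variable with c = 1) against the Gaussian bound exp(t²/2).

```python
import numpy as np

# Empirically check the sub-Gaussian MGF bound E[exp(Xt)] <= exp(c^2 t^2 / 2)
# for a Rademacher variable, which satisfies it with c = 1.
rng = np.random.default_rng(6)
samples = rng.choice([-1.0, 1.0], size=200000)
for t in [0.1, 0.5, 1.0, 2.0]:
    mgf = np.mean(np.exp(samples * t))      # empirical E[exp(Xt)] (equals cosh(t))
    bound = np.exp(t ** 2 / 2)
    print(t, mgf, bound, mgf <= bound)
```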
A common way to characterize sub-Gaussian random variables is through analyzing their moments.
We
consider only the mean and variance in the following elementary lemma, proven in [22].
Lemma 7.1: (Buldygin-Kozachenko [22])
If X ∼ Sub(c²), then

E(X) = 0     (7.2)

and

E(X²) ≤ c².     (7.3)
In particular, if X ∼ Sub(c²) then E(X²) ≤ c². In some settings it will be useful to consider a more restrictive class of random variables for which this inequality becomes an equality.
Definition 7.2:
A random variable X is called strictly sub-Gaussian if X ∼ Sub(σ²) where σ² = E(X²), i.e., the inequality

E(exp(Xt)) ≤ exp(σ²t²/2)     (7.4)

holds for all t ∈ R. To denote that X is strictly sub-Gaussian with variance σ², we will use the notation X ∼ SSub(σ²).

Example 7.3
If X ∼ N(0, σ²), then X ∼ SSub(σ²).
Example 7.4
If X ∼ U(−1, 1), i.e., X is uniformly distributed on [−1, 1], then X ∼ SSub(1/3).

Example 7.5
Now consider the random variable with distribution such that

P(X = 1) = P(X = −1) = (1 − s)/2,   P(X = 0) = s,   s ∈ [0, 1).     (7.5)

For any s ∈ [0, 2/3], X ∼ SSub(1 − s). For s ∈ (2/3, 1), X is not strictly sub-Gaussian.
We now provide an equivalent characterization for sub-Gaussian and strictly sub-Gaussian random variables, proven in [22], that illustrates their concentration of measure behavior.

Theorem 7.1: (Buldygin-Kozachenko [22])
A random variable X ∼ Sub(c²) if and only if there exist a t₀ ≥ 0 and a constant a > 0 such that

P(|X| ≥ t) ≤ 2 exp(−t²/(2a²))     (7.6)

for all t ≥ t₀. Moreover, if X ∼ SSub(σ²), then (7.6) holds for all t > 0 with a = σ.
Finally, sub-Gaussian distributions also satisfy one of the fundamental properties of a Gaussian distribution: the sum of two sub-Gaussian random variables is itself a sub-Gaussian random variable. This result is established in more generality in the following lemma.

Lemma 7.2:
Suppose that X = [X_1, X_2, ..., X_N], where each X_i is independent and identically distributed (i.i.d.) with X_i ∼ Sub(c²). Then for any α ∈ R^N, ⟨X, α⟩ ∼ Sub(c² ‖α‖₂²). Similarly, if each X_i ∼ SSub(σ²), then for any α ∈ R^N, ⟨X, α⟩ ∼ SSub(σ² ‖α‖₂²).

Proof:
Since the X_i are independent, we obtain

E(exp(t Σ_{i=1}^N α_i X_i)) = E(Π_{i=1}^N exp(t α_i X_i)) = Π_{i=1}^N E(exp(t α_i X_i)) ≤ Π_{i=1}^N exp(c² (α_i t)²/2) = exp((Σ_{i=1}^N α_i²) c² t²/2),     (7.7)

and hence ⟨X, α⟩ ∼ Sub(c² ‖α‖₂²). In the strictly sub-Gaussian case we have c² = σ², and since E(⟨X, α⟩²) = σ² ‖α‖₂², it follows that ⟨X, α⟩ ∼ SSub(σ² ‖α‖₂²).
7.2 Concentration of measure for sub-Gaussian random variables
Sub-Gaussian distributions (Section 7.1) have a close relationship to the concentration of measure phenomenon [131]. To illustrate this, we note that we can combine Lemma 2 and Theorem 1 from "Sub-Gaussian
random variables" (Section 7.1) to obtain deviation bounds for weighted sums of sub-Gaussian random variables. For our purposes, however, it will be more enlightening to study the
norm of a vector of sub-Gaussian random variables. In particular, if X is a vector whose entries X_i are i.i.d. with X_i ∼ Sub(c²), then we would like to know how ‖X‖₂ deviates from its mean.
In order to establish the result, we will make use of Markov's inequality for nonnegative random variables.

Lemma 7.3: (Markov's Inequality)
For any nonnegative random variable X and t > 0,

P(X ≥ t) ≤ E(X)/t.

Proof:
Let f(x) denote the probability density function for X. Then

E(X) = ∫₀^∞ x f(x) dx ≥ ∫_t^∞ x f(x) dx     (7.8)

≥ ∫_t^∞ t f(x) dx = t P(X ≥ t).     (7.9)
In addition, we will require the following bound on the exponential moment of a sub-Gaussian random variable.

Lemma 7.4:
Suppose X ∼ Sub(c²). Then

E(exp(λX²/(2c²))) ≤ 1/√(1 − λ)     (7.10)

for any λ ∈ [0, 1).

Proof:
First, observe that if λ = 0, then the lemma holds trivially. Thus, suppose that λ ∈ (0, 1). Let f(x) denote the probability density function for X. Since X is sub-Gaussian, we have by definition that

∫ exp(tx) f(x) dx ≤ exp(c²t²/2)     (7.11)

for any t ∈ R. If we multiply by exp(−c²t²/(2λ)), then we obtain

∫ exp(tx − c²t²/(2λ)) f(x) dx ≤ exp(c²t²(λ − 1)/(2λ)).     (7.12)

Now, integrating both sides with respect to t, we obtain

∫ (∫ exp(tx − c²t²/(2λ)) dt) f(x) dx ≤ ∫ exp(c²t²(λ − 1)/(2λ)) dt,     (7.13)

which reduces to

(1/c) √(2πλ) ∫ exp(λx²/(2c²)) f(x) dx ≤ (1/c) √(2πλ/(1 − λ)).     (7.14)

Dividing both sides by (1/c)√(2πλ) yields the desired result.
We now state our main theorem, which generalizes the results of [53] and uses substantially the same proof technique.

Theorem 7.2:
Suppose that X = [X_1, X_2, ..., X_M], where each X_i is i.i.d. with X_i ∼ Sub(c²) and E(X_i²) = σ². Then

E(‖X‖₂²) = Mσ².     (7.15)

Moreover, for any α ∈ (0, 1) and for any β ∈ [c²/σ², β_max], there exists a constant κ* ≥ 4 depending only on β_max and the ratio σ²/c² such that

P(‖X‖₂² ≤ αMσ²) ≤ exp(−M(1 − α)²/κ*)     (7.16)

and

P(‖X‖₂² ≥ βMσ²) ≤ exp(−M(β − 1)²/κ*).     (7.17)
Proof:
Since the X_i are i.i.d., we obtain

E(‖X‖₂²) = Σ_{i=1}^M E(X_i²) = Σ_{i=1}^M σ² = Mσ²,     (7.18)

and hence (7.15) holds. We now turn to (7.16) and (7.17). Let us first consider (7.17). We begin by applying Markov's inequality:

P(‖X‖₂² ≥ βMσ²) = P(exp(λ‖X‖₂²) ≥ exp(λβMσ²))
≤ E(exp(λ‖X‖₂²)) / exp(λβMσ²)
= Π_{i=1}^M E(exp(λX_i²)) / exp(λβMσ²).     (7.19)

Since X_i ∼ Sub(c²), we have from Lemma 7.4 that

E(exp(λX_i²)) ≤ 1/√(1 − 2λc²).     (7.20)

Thus,

Π_{i=1}^M E(exp(λX_i²)) ≤ (1/(1 − 2λc²))^{M/2},     (7.21)

and hence

P(‖X‖₂² ≥ βMσ²) ≤ (exp(−2λβσ²)/(1 − 2λc²))^{M/2}.     (7.22)

The value of λ that minimizes the right-hand side of (7.22) is

λ = (βσ² − c²)/(2c²βσ²).     (7.23)
Plugging (7.23) into (7.22), we obtain

P(‖X‖₂² ≥ βMσ²) ≤ ((βσ²/c²) exp(1 − βσ²/c²))^{M/2}.     (7.24)

Similarly, one can show that

P(‖X‖₂² ≤ αMσ²) ≤ ((ασ²/c²) exp(1 − ασ²/c²))^{M/2}.     (7.25)

Now set

κ* = max(4, 2(β_max σ²/c² − 1)² / ((β_max σ²/c² − 1) − log(β_max σ²/c²))).     (7.26)

Then for any γ ∈ (0, β_max σ²/c²] we have the bound

log(γ) ≤ (γ − 1) − 2(γ − 1)²/κ*,     (7.27)

and hence

γ exp(−(γ − 1)) ≤ exp(−2(γ − 1)²/κ*).     (7.28)

By setting γ = βσ²/c² and combining (7.28) with (7.24), one establishes (7.17); similarly, setting γ = ασ²/c² and combining (7.28) with (7.25) establishes (7.16).
This result tells us that given a vector with entries drawn according to a sub-Gaussian distribution, we can expect the norm of the vector to concentrate around its expected value of Mσ² with exponentially high probability as M grows. Note, however, that the range of allowable β in (7.17) is restricted to β ≥ c²/σ², so for a general sub-Gaussian distribution we may not be able to achieve an arbitrarily tight concentration. However, recall that for strictly sub-Gaussian distributions we have that c² = σ², in which case there is no such restriction. Moreover, for strictly sub-Gaussian distributions we also have the following useful corollary.
Corollary 7.1:
Suppose that X = [X_1, X_2, ..., X_M], where each X_i is i.i.d. with X_i ∼ SSub(σ²). Then

E(‖X‖₂²) = Mσ²     (7.29)

and for any ε > 0,

P(| ‖X‖₂² − Mσ² | ≥ εMσ²) ≤ 2 exp(−Mε²/κ*)     (7.30)

with κ* = 2/(1 − log(2)) ≈ 6.52.

Proof:
Since each X_i ∼ SSub(σ²), we have that X_i ∼ Sub(σ²) and E(X_i²) = σ², in which case we may apply Theorem 7.2 with α = 1 − ε and β = 1 + ε. This allows us to simplify and combine the bounds (7.16) and (7.17) into (7.30). The value of κ* follows from the observation that 1 + ε ≤ 2, so that we may set β_max = 2.

Note that Corollary 7.1, p. 85 exploits the strictness in the strictly sub-Gaussian distribution twice: first to ensure that β ∈ (1, 2] is an admissible range, and second to keep κ* independent of the ratio σ²/c². One could easily establish a similar corollary for non-strictly sub-Gaussian vectors but for a more restricted range of ε, provided that c²/σ² < 2. However, since most of the distributions of interest here are indeed strictly sub-Gaussian, we do not pursue this route.

This result generalizes the main results of [1] to the broader family of general strictly sub-Gaussian distributions via a much simpler proof.
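A quick numerical check of Corollary 7.1 is given below (illustrative Python; the constant κ* = 2/(1 − log 2) is taken from (7.30) as reconstructed above). It draws vectors with i.i.d. N(0, 1) entries, which are strictly sub-Gaussian with σ² = 1, and compares the empirical deviation probability of ‖X‖₂² from Mσ² with the bound.

```python
import numpy as np

rng = np.random.default_rng(7)
M, trials, eps = 200, 20000, 0.2
X = rng.normal(size=(trials, M))            # i.i.d. N(0,1) entries: SSub(1)
sq_norms = np.sum(X ** 2, axis=1)

empirical = np.mean(np.abs(sq_norms - M) >= eps * M)
kappa_star = 2.0 / (1.0 - np.log(2.0))      # ~ 6.52, as in (7.30)
bound = 2.0 * np.exp(-M * eps ** 2 / kappa_star)
print(empirical, bound)                     # empirical frequency should not exceed the bound
```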
Corollary 7.2:
Suppose that Φ is an M × N matrix whose entries φ_ij are i.i.d. with φ_ij ∼ SSub(1/M). Let Y = Φx for x ∈ R^N. Then for any ε > 0, and any x ∈ R^N,

E(‖Y‖₂²) = ‖x‖₂²     (7.31)

and

P(| ‖Y‖₂² − ‖x‖₂² | ≥ ε‖x‖₂²) ≤ 2 exp(−Mε²/κ*)     (7.32)

with κ* = 2/(1 − log(2)) ≈ 6.52.

Proof:
Let φ_i denote the i-th row of Φ. Observe that if Y_i denotes the i-th element of Y, then Y_i = ⟨φ_i, x⟩, and by Lemma 7.2, Y_i ∼ SSub(‖x‖₂²/M). The results then follow by applying Corollary 7.1 to the vector Y.
7.3 Proof of the RIP for sub-Gaussian matrices

We now show how to exploit the concentration of measure (Section 7.2) properties of sub-Gaussian distributions (Section 7.1) to provide a simple proof that sub-Gaussian matrices satisfy the restricted isometry property (Section 3.3) (RIP). Specifically, we wish to show that by constructing an M × N matrix Φ at random with M sufficiently large, then with high probability there exists a δ_K ∈ (0, 1) such that

(1 − δ_K) ‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ_K) ‖x‖₂²     (7.33)

holds for all x ∈ Σ_K (the set of all signals with at most K nonzeros).

We begin by observing that if all we require is that δ_{2K} > 0, then we may set M = 2K and draw a Φ according to a Gaussian distribution, or indeed any continuous univariate distribution. In this case, with probability 1, any set of 2K columns of Φ will be linearly independent, so that δ_{2K} < 1; however, this argument gives no control over how close δ_{2K} may be to 1, and more measurements are needed to guarantee a prescribed constant. To ensure that the matrix will satisfy the RIP, we will impose two conditions on the random distribution. First, we require that the distribution is sub-Gaussian. In order to simplify our argument, we will use the simpler results stated in Corollary 2 from "Concentration of measure for sub-Gaussian random variables" (Section 7.2), which we briefly recall.
Corollary 7.3:
Suppose that Φ is an M × N matrix whose entries φ_ij are i.i.d. with φ_ij ∼ SSub(1/M). Let Y = Φx for x ∈ R^N. Then for any ε > 0, and any x ∈ R^N,

E(‖Y‖₂²) = ‖x‖₂²     (7.34)

and

P(| ‖Y‖₂² − ‖x‖₂² | ≥ ε‖x‖₂²) ≤ 2 exp(−Mε²/κ*)     (7.35)

with κ* = 2/(1 − log(2)) ≈ 6.52.

Note that we restrict our attention here to the case where the distribution of the entries of Φ is strictly sub-Gaussian. This is done simply to yield more concrete constants. The argument could easily be modified to establish a similar result for general sub-Gaussian distributions by instead using Theorem 2 from "Concentration of measure for sub-Gaussian random variables" (Section 7.2).
Our second condition is that we require that the distribution yield a matrix that is approximately norm-preserving, which will require that

E(φ_ij²) = 1/M,     (7.36)

and hence the variance is 1/M.
We shall now show how the concentration of measure inequality in Corollary 7.3, p.
86 can be used
together with covering arguments to prove the RIP for sub-Gaussian random matrices. Our general approach
will be to construct nets of points in each K-dimensional subspace, apply the concentration of measure inequality to these points via a union bound, and then extend the result from our finite set of points to all possible K-dimensional signals.
Thus, in order to prove the result, we will require the following upper bound on the number of points required
to construct the nets of points. (For an overview of results similar to Lemma 7.5, p. 87 and of various related
concentration of measure results, we refer the reader to the excellent introduction of [5].)
Lemma 7.5:
Let ε ∈ (0, 1) be given. There exists a set of points Q with ‖q‖₂ ≤ 1 for all q ∈ Q and |Q| ≤ (3/ε)^K such that for any x ∈ R^K with ‖x‖₂ ≤ 1 there is a point q ∈ Q satisfying ‖x − q‖₂ ≤ ε.

Proof:
We construct Q in a greedy fashion. We first select an arbitrary point q_1 ∈ R^K with ‖q_1‖₂ ≤ 1. We then continue adding points to Q so that at step i we add a point q_i ∈ R^K with ‖q_i‖₂ ≤ 1 which satisfies ‖q_i − q_j‖₂ > ε for all j < i. This continues until we can add no more points (and hence for any x ∈ R^K with ‖x‖₂ ≤ 1 there is a point q ∈ Q satisfying ‖x − q‖₂ ≤ ε). Now we wish to bound |Q|. Observe that if we center balls of radius ε/2 at each point in Q, then these balls are disjoint and lie within a ball of radius 1 + ε/2. Thus, if B^K(r) denotes a ball of radius r in R^K, then

|Q| · Vol(B^K(ε/2)) ≤ Vol(B^K(1 + ε/2))     (7.37)

and hence

|Q| ≤ Vol(B^K(1 + ε/2)) / Vol(B^K(ε/2)) = (1 + ε/2)^K / (ε/2)^K ≤ (3/ε)^K.     (7.38)

We now turn to our main theorem, which is based on the proof given in [8].
Theorem 7.3:
Fix δ ∈ (0, 1). Let Φ be an M × N random matrix whose entries φ_ij are i.i.d. with φ_ij ∼ SSub(1/M). If

M ≥ κ₁ K log(N/K),     (7.39)

then Φ satisfies the RIP of order K with the prescribed δ with probability exceeding 1 − 2e^{−κ₂M}, where κ₁ is arbitrary and κ₂ = δ²/(2κ*) − log(42e/δ)/κ₁.
Proof:
First note that it is enough to prove (7.33) in the case ‖x‖₂ = 1, since Φ is linear. Next, fix an index set T ⊆ {1, 2, ..., N} with |T| = K, and denote by X_T the K-dimensional subspace of signals supported on T. Choose a finite set of points Q_T ⊆ X_T with ‖q‖₂ ≤ 1 for all q ∈ Q_T such that, for all x ∈ X_T with ‖x‖₂ ≤ 1,

min_{q ∈ Q_T} ‖x − q‖₂ ≤ δ/14.     (7.40)

From Lemma 7.5, we can choose such a set Q_T with |Q_T| ≤ (42/δ)^K. We then repeat this process for each possible index set T, and collect all the sets Q_T together:

Q = ∪_{T : |T|=K} Q_T.     (7.41)

There are (N choose K) possible index sets T. We can bound this number by

(N choose K) = N(N−1)(N−2)···(N−K+1)/K! ≤ N^K/K! ≤ (eN/K)^K,     (7.42)

where the last inequality follows since from Stirling's approximation we have K! ≥ (K/e)^K. Hence |Q| ≤ (42eN/(δK))^K. Since the entries of Φ are drawn according to a strictly sub-Gaussian distribution, from Corollary 7.3, p. 86 we have (7.35). We next use the union bound to apply (7.35) to this set of points with ε = δ/√2, with the result that, with probability exceeding

1 − 2(42eN/(δK))^K e^{−Mδ²/(2κ*)},     (7.43)

we have

(1 − δ/√2) ‖q‖₂² ≤ ‖Φq‖₂² ≤ (1 + δ/√2) ‖q‖₂²,  for all q ∈ Q.     (7.44)

We observe that if M satisfies (7.39), then

K log(42eN/(δK)) = K log(N/K) + K log(42e/δ) ≤ M log(42e/δ)/κ₁,     (7.45)

and thus (7.43) exceeds 1 − 2e^{−κ₂M} as desired.

We now define A as the smallest number such that

‖Φx‖₂ ≤ √(1 + A) ‖x‖₂,  for all x ∈ Σ_K, ‖x‖₂ ≤ 1.     (7.46)

Our goal is to show that A ≤ δ. For any x ∈ Σ_K with ‖x‖₂ ≤ 1, we can pick a q ∈ Q satisfying (7.40) and such that x − q ∈ Σ_K (since if x ∈ X_T, we can pick q ∈ Q_T ⊆ X_T). In this case we have

‖Φx‖₂ ≤ ‖Φq‖₂ + ‖Φ(x − q)‖₂ ≤ √(1 + δ/√2) + √(1 + A) · δ/14.     (7.47)

Since by definition A is the smallest number for which (7.46) holds, we obtain √(1 + A) ≤ √(1 + δ/√2) + √(1 + A) · δ/14. Therefore

√(1 + A) ≤ √(1 + δ/√2) / (1 − δ/14) ≤ √(1 + δ),     (7.48)

as desired. We have proved the upper inequality in (7.33). The lower inequality follows from this since

‖Φx‖₂ ≥ ‖Φq‖₂ − ‖Φ(x − q)‖₂ ≥ √(1 − δ/√2) − √(1 + δ) · δ/14 ≥ √(1 − δ),     (7.49)

which completes the proof.
Above we proved that the RIP holds with high probability when the matrix Φ is drawn according to a strictly sub-Gaussian distribution. However, we are often interested in signals that are sparse or compressible in some orthonormal basis Ψ ≠ I, in which case we would like the matrix ΦΨ to satisfy the RIP. In this setting it is easy to see that by choosing our nets of points in the K-dimensional subspaces spanned by sets of K columns of Ψ, Theorem 7.3 will establish the RIP for ΦΨ for Φ drawn from a sub-Gaussian distribution. This universality of Φ with respect to the sparsity-inducing basis was initially observed for the Gaussian distribution (based on symmetry arguments), but we can now see is a property of more general sub-Gaussian distributions. Indeed, it follows that with high probability such a Φ will simultaneously satisfy the RIP with respect to an exponential number of fixed bases.
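The following Monte Carlo sketch (illustrative Python; sampling random sparse vectors only gives a lower estimate of the true restricted isometry constant) illustrates both the norm preservation of Φ on K-sparse vectors and the universality property: the same Φ is tested against signals that are sparse in the canonical basis and in a DCT basis Ψ.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(8)
N, M, K, trials = 256, 80, 5, 2000

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))   # SSub(1/M) Gaussian entries
Psi = dct(np.eye(N), norm='ortho', axis=0)               # an orthonormal basis (DCT)

def worst_case_distortion(A):
    """Empirical estimate of max | ||A x||_2^2 / ||x||_2^2 - 1 | over random K-sparse x."""
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        idx = rng.choice(N, K, replace=False)
        x[idx] = rng.normal(size=K)
        ratio = np.sum((A @ x) ** 2) / np.sum(x ** 2)
        worst = max(worst, abs(ratio - 1.0))
    return worst

print("canonical basis :", worst_case_distortion(Phi))
print("DCT-sparse basis:", worst_case_distortion(Phi @ Psi))
```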
7.4 `_1 minimization proof

We now establish one of the core lemmas that we will use throughout this course. Specifically, Lemma 7.9, p.
90 is used in establishing the relationship between the RIP and the NSP (Section 3.4) as well as establishing
results on
`1
minimization (Section 4.1) in the context of sparse recovery in both the noise-free (Section 4.2)
and noisy (Section 4.3) settings. In order to establish Lemma 7.9, p. 90, we establish the following preliminary
lemmas.
Lemma 7.6:
Suppose u, v are orthogonal vectors. Then

‖u‖₂ + ‖v‖₂ ≤ √2 ‖u + v‖₂.     (7.50)

Proof:
We begin by defining the 2 × 1 vector w = [‖u‖₂, ‖v‖₂]^T. By applying standard bounds on ℓ_p norms (Lemma 1 from "The RIP and the NSP" (Section 3.4)) with K = 2, we have ‖w‖₁ ≤ √2 ‖w‖₂. From this we obtain

‖u‖₂ + ‖v‖₂ ≤ √2 √(‖u‖₂² + ‖v‖₂²).     (7.51)

Since u and v are orthogonal, ‖u‖₂² + ‖v‖₂² = ‖u + v‖₂², which yields the desired result.
Lemma 7.7:
If Φ satisfies the RIP of order 2K, then for any pair of vectors u, v ∈ Σ_K with disjoint supports,

|⟨Φu, Φv⟩| ≤ δ_{2K} ‖u‖₂ ‖v‖₂.     (7.52)

Proof:
Suppose u, v ∈ Σ_K with disjoint support and that ‖u‖₂ = ‖v‖₂ = 1. Then, u ± v ∈ Σ_{2K} and ‖u ± v‖₂² = 2. Using the RIP we have

2(1 − δ_{2K}) ≤ ‖Φ(u ± v)‖₂² ≤ 2(1 + δ_{2K}).     (7.53)

Finally, applying the parallelogram identity,

|⟨Φu, Φv⟩| ≤ (1/4) |‖Φ(u + v)‖₂² − ‖Φ(u − v)‖₂²| ≤ δ_{2K},     (7.54)

which establishes the lemma for unit-norm u, v; the general case follows by the bilinearity of the inner product.
Lemma 7.8:
Let Λ₀ be an arbitrary subset of {1, 2, ..., N} such that |Λ₀| ≤ K. For any vector u ∈ R^N, define Λ₁ as the index set corresponding to the K entries of u_{Λ₀ᶜ} with largest magnitude, Λ₂ as the index set corresponding to the next K largest entries, and so on. Then

Σ_{j≥2} ‖u_{Λ_j}‖₂ ≤ ‖u_{Λ₀ᶜ}‖₁ / √K.     (7.55)

Proof:
We begin by observing that for j ≥ 2,

‖u_{Λ_j}‖_∞ ≤ ‖u_{Λ_{j−1}}‖₁ / K,     (7.56)

since the Λ_j sort u to have decreasing magnitude. Applying standard bounds on ℓ_p norms (Lemma 1 from "The RIP and the NSP" (Section 3.4)), we have

Σ_{j≥2} ‖u_{Λ_j}‖₂ ≤ √K Σ_{j≥2} ‖u_{Λ_j}‖_∞ ≤ (1/√K) Σ_{j≥1} ‖u_{Λ_j}‖₁ = ‖u_{Λ₀ᶜ}‖₁ / √K,     (7.57)

proving the lemma.
We are now in a position to prove our main result. The key ideas in this proof follow from [28].

Lemma 7.9:
Suppose that Φ satisfies the RIP of order 2K. Let Λ₀ be an arbitrary subset of {1, 2, ..., N} such that |Λ₀| ≤ K, and let h ∈ R^N be given. Define Λ₁ as the index set corresponding to the K entries of h_{Λ₀ᶜ} with largest magnitude, and set Λ = Λ₀ ∪ Λ₁. Then

‖h_Λ‖₂ ≤ α ‖h_{Λ₀ᶜ}‖₁ / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖₂,     (7.58)

where

α = √2 δ_{2K} / (1 − δ_{2K}),   β = 1 / (1 − δ_{2K}).     (7.59)

Proof:
Since h_Λ ∈ Σ_{2K}, the lower bound of the RIP immediately yields

(1 − δ_{2K}) ‖h_Λ‖₂² ≤ ‖Φh_Λ‖₂².     (7.60)
Define Λ_j as in Lemma 7.8, so that h_{Λᶜ} = Σ_{j≥2} h_{Λ_j}. Since Φh_Λ = Φh − Σ_{j≥2} Φh_{Λ_j}, we can write

‖Φh_Λ‖₂² = ⟨Φh_Λ, Φh⟩ − ⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩.     (7.61)

In order to bound the second term of (7.61), we use Lemma 7.7, p. 89, which implies that

|⟨Φh_{Λ_i}, Φh_{Λ_j}⟩| ≤ δ_{2K} ‖h_{Λ_i}‖₂ ‖h_{Λ_j}‖₂     (7.62)

for any i, j. Furthermore, Lemma 7.6 yields ‖h_{Λ₀}‖₂ + ‖h_{Λ₁}‖₂ ≤ √2 ‖h_Λ‖₂. Substituting into (7.62) we obtain

|⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩| = |Σ_{j≥2} ⟨Φh_{Λ₀}, Φh_{Λ_j}⟩ + Σ_{j≥2} ⟨Φh_{Λ₁}, Φh_{Λ_j}⟩|
≤ Σ_{j≥2} |⟨Φh_{Λ₀}, Φh_{Λ_j}⟩| + Σ_{j≥2} |⟨Φh_{Λ₁}, Φh_{Λ_j}⟩|
≤ δ_{2K} ‖h_{Λ₀}‖₂ Σ_{j≥2} ‖h_{Λ_j}‖₂ + δ_{2K} ‖h_{Λ₁}‖₂ Σ_{j≥2} ‖h_{Λ_j}‖₂
≤ √2 δ_{2K} ‖h_Λ‖₂ Σ_{j≥2} ‖h_{Λ_j}‖₂.     (7.63)

From Lemma 7.8, this reduces to

|⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩| ≤ √2 δ_{2K} ‖h_Λ‖₂ ‖h_{Λ₀ᶜ}‖₁ / √K.     (7.64)

Combining (7.60), (7.61), and (7.64), we obtain

(1 − δ_{2K}) ‖h_Λ‖₂² ≤ ⟨Φh_Λ, Φh⟩ − ⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩
≤ |⟨Φh_Λ, Φh⟩| + |⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩|
≤ |⟨Φh_Λ, Φh⟩| + √2 δ_{2K} ‖h_Λ‖₂ ‖h_{Λ₀ᶜ}‖₁ / √K.     (7.65)

Dividing both sides by (1 − δ_{2K}) ‖h_Λ‖₂ yields (7.58), which completes the proof.
Glossary

A
A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that,

‖h_Λ‖₂ ≤ C ‖h_{Λᶜ}‖₁ / √K     (3.3)

holds for all h ∈ N(Φ) and for all Λ such that |Λ| ≤ K.

A matrix Φ satisfies the restricted isometry property (RIP) of order K if there exists a δ_K ∈ (0, 1) such that

(1 − δ_K) ‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ_K) ‖x‖₂²     (3.9)

holds for all x ∈ Σ_K = {x : ‖x‖₀ ≤ K}.

A random variable X is called strictly sub-Gaussian if X ∼ Sub(σ²) where σ² = E(X²), i.e., the inequality

E(exp(Xt)) ≤ exp(σ²t²/2)     (7.4)

holds for all t ∈ R. To denote that X is strictly sub-Gaussian with variance σ², we will use the notation X ∼ SSub(σ²).

A random variable X is called sub-Gaussian if there exists a constant c > 0 such that

E(exp(Xt)) ≤ exp(c²t²/2)     (7.1)

holds for all t ∈ R. We use the notation X ∼ Sub(c²) to denote that X satisfies (7.1).

L
Let Φ : R^N → R^M denote a sensing matrix and Δ : R^M → R^N denote a recovery algorithm. We say that the pair (Φ, Δ) is C-stable if for any x ∈ Σ_K and any e ∈ R^M we have that

‖Δ(Φx + e) − x‖₂ ≤ C ‖e‖₂.     (3.11)

T
The coherence of a matrix Φ, μ(Φ), is the largest absolute inner product between any two columns φ_i, φ_j of Φ:

μ(Φ) = max_{1 ≤ i < j ≤ N} |⟨φ_i, φ_j⟩| / (‖φ_i‖₂ ‖φ_j‖₂).     (3.45)
Bibliography
[1] D. Achlioptas. Database-friendly random projections. In
Proc. IEEE Work. Stat. Signal Processing, Madison, WI, Aug. 2007.
Proc. IEEE Work. Stat. Signal Processing, Madison, WI, Aug. 2007.
[5] K. Ball.
9(1):518211;77, 2009.
[10] R. Baraniuk and M. Wakin.
9(1):518211;77, 2009.
[11] D. Baron, S. Sarvotham, and R. Baraniuk. Sudocodes - fast measurement and reconstruction of sparse
signals. In
Proc. IEEE Int. Symp. Inform. Theory (ISIT), Seattle, WA, Jul. 2006.
[12] J. Bect, L. Blanc-Feraud, G. Aubert, and A. Chambolle. A -unied variational framework for image
restoration. In
Proc. European Conf. Comp. Vision (ECCV), Prague, Czech Republic, May 2004.
[14] R. Berinde, P. Indyk, and M. Ruzic. Practical near-optimal sparse recovery in the l1 norm. In
Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sept. 2008.
[15] A. Beurling. Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionelle. In Proc.
[16] A. Beurling. Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionelle. In Proc.
[17] T. Blumensath and M. Davies. Iterative hard thresholding for compressive sensing.
In
Appl. Comput.
Madison, WI,
Aug. 2007.
[19] S. Boyd and L. Vanderberghe.
Convex Optimization.
2004.
[20] Y. Bresler. Spectrum-blind sampling and compressive sensing for continuous-index signals. In
Work. Inform. Theory and Applications (ITA), San Diego, CA, Jan. 2008.
Proc.
[21] Y. Bresler and P. Feng. Spectrum-blind minimum-rate sampling and reconstruction of 2-d multiband
signals. In
Proc. IEEE Int. Conf. Image Processing (ICIP), Zurich, Switzerland, Sept. 1996.
[25] E. Candès. The restricted isometry property and its implications for compressed sensing.
[26] E. Candès. The restricted isometry property and its implications for compressed sensing.
[27] E. Candès. The restricted isometry property and its implications for compressed sensing.
[28] E. Candès. The restricted isometry property and its implications for compressed sensing.
[29] E. Candès and Y. Plan. Near-ideal model selection by l1 minimization. Ann. Stat., 37(5A):2145–2177, 2009.
[30] E. Candès and J. Romberg. decompositions. Inverse Problems, 23(3):969–985, 2007.
[32] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information.
[33] E. Candès, J. Romberg, and T. Tao. measurements.
[35] E. Candès, M. Rudelson, T. Tao, and R. Vershynin. Error correction via linear programming. In Proc. IEEE Symp. Found. Comp. Science (FOCS), Pittsburgh, PA, Oct. 2005.
[37] E. Candès and T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies?
[38] E. Candès and T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies?
[39] E. Candès and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n.
[40] C. Carathéodory. Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen.
[41] C. Carathéodory. Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen.
[42] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In
Proc. Int.
20(1):338211;61, 1998.
[44] A. Cohen, W. Dahmen, and R. DeVore.
sensing.
In
2008.
[45] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best -term approximation.
J. Amer.
[46] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation.
J. Amer.
Noiselets.
10:278211;44,
2001.
[48] P. Combettes and J. Pesquet.
bases.
[49] G. Cormode and M. Hadjieleftheriou. Finding the frequent items in streams of data.
Comm. ACM,
52(10):978211;105, 2009.
[50] G. Cormode and M. Hadjieleftheriou. Finding the frequent items in streams of data.
Comm. ACM,
52(10):978211;105, 2009.
[51] G. Cormode and S. Muthukrishnan. Improved data stream summaries: The count-min sketch and its
applications.
IEEE
[53] S. Dasgupta and A. Gupta. An elementary proof of the johnson-lindenstrauss lemma. Technical report
TR-99-006, Univ. of Cal. Berkeley, Comput. Science Division, Mar. 1999.
[54] I. Daubechies, M. Defrise, and C. De Mol.
problems with a sparsity constraint.
[55] M. Davenport, P. Boufounos, M. Wakin, and R. Baraniuk. Signal processing with compressive measurements.
[56] M. Davenport, M. Duarte, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk. The smashed
lter for compressive classication and target recognition. In
[57] M. Davenport, C. Hegde, M. Duarte, and R. Baraniuk. Joint manifolds for data fusion.
IEEE Trans.
[58] M. Davenport, J. Laska, P. Boufouons, and R. Baraniuk. A simple proof that random matrices are
democratic. Technical report TREE 0906, Rice Univ., ECE Dept., Nov. 2009.
[59] M. Davenport and M. Wakin. Analysis of orthogonal matching pursuit using the restricted isometry
property.
[62] R. DeVore. Deterministic constructions of compressed sensing matrices. J. Complex., 23(4):918–925, 2007.
[63] A. Dimakis, A. Sarwate, and M. Wainwright. Geographic gossip: Efficient aggregation for sensor networks. In Proc. Int. Symp. Inform. Processing in Sensor Networks (IPSN), 2006.
[64] D. Donoho. Denoising by soft-thresholding.
[65] D. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical
report 2005-04, Stanford Univ., Stat. Dept., Jan. 2005.
[66] D. Donoho. Compressed sensing.
[67] D. Donoho. For most large underdetermined systems of linear equations, the minimal -norm solution
is also the sparsest solution.
[68] D. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension.
[69] D. Donoho, I. Drori, Y. Tsaig, and J.-L. Stark. Sparse solution of underdetermined linear equations
by stagewise orthogonal matching pursuit. Preprint, 2006.
[70] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via
l1 minimization.
[71] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via
minimization.
Hessian eigenmaps:
[73] D. Donoho, A. Maleki, and A. Montanari. Message passing algorithms for compressed sensing.
Proc.
[74] D. Donoho and J. Tanner. Neighborliness of randomly projected simplices in high dimensions.
Proc.
[75] D. Donoho and J. Tanner. Sparse nonnegative solutions of undetermined linear equations by linear
programming.
[76] D. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension.
In
Single-pixel
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
Toulouse,
Multiscale
Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS), Rennes, France, Nov. 2005.
[84] M. F. Duarte, M. B. Wakin, D. Baron, and R. G. Baraniuk. Universal distributed sensing via random
projections. In
page 1778211;185,
Sparse and Redundant Representations: From Theory to Applications in Signal and Image
Processing. Springer, New York, NY, 2010.
[86] M. Elad.
[87] M. Elad, B. Matalon, J. Shtok, and M. Zibulevsky. A wide-angle view at iterated shrinkage algorithms.
In
Proc. SPIE Optics Photonics: Wavelets, San Diego, CA, Apr. 2007.
[88] M. Elad, B. Matalon, and M. Zibulevsky. Coordinate and subspace optimization methods for linear
least squares with non-quadratic regularization.
23(3):3468211;367,
2007.
[89] Y. Erlich, N. Shental, A. Amir, and O. Zuk. Compressed sensing approach for high throughput carrier
Proc. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sept. 2009.
screen. In
[90] Y. Erlich, N. Shental, A. Amir, and O. Zuk. Compressed sensing approach for high throughput carrier
Proc. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sept. 2009.
screen. In
[91] V. Fedorov.
[92] P. Feng.
Universal spectrum blind minimum rate sampling and reconstruction of multiband signals.
Universal spectrum blind minimum rate sampling and reconstruction of multiband signals.
In
May 1996.
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
Atlanta, GA,
In
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
Atlanta, GA,
May 1996.
[96] M. Figueiredo and R. Nowak.
IEEE Trans.
Appli-
1(4):5868211;597, 2007.
[98] A. Garnaev and E. Gluskin. The widths of euclidean balls.
[99] M. Gehm, R. John, D. Brady, R. Willett, and T. Schultz. Single-shot compressive spectral imaging
with a dual disperser architecture.
[100] S. Geršgorin. Über die Abgrenzung der Eigenwerte einer Matrix.
[101] A. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Near-optimal sparse fourier representations via sampling. In
[104] A. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal sparse fourier
representations. In
Proc. SPIE Optics Photonics: Wavelets, San Diego, CA, Aug. 2005.
[105] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin. One sketch for all: Fast algorithms for compressed
sensing. In
Proc. ACM Symp. Theory of Comput., San Diego, CA, Jun. 2007.
[106] AC Gilbert, MJ Strauss, JA Tropp, and R. Vershynin. One sketch for all: fast algorithms for compressed
sensing.
In
page
95(4):2318211;251,
1995.
[108] I. Gorodnitsky and B. Rao. Convergence analysis of a class of adaptive weighted norm extrapolation
algorithms. In
Proc. Asilomar Conf. Signals, Systems, and Computers, Pacic Grove, CA, Nov. 1993.
[109] I. Gorodnitsky, B. Rao, and J. George. Source localization in magnetoencephalagraphy using an iterative weighted minimum norm algorithm. In
NY, 2001.
[112] J. Haupt and R. Nowak. Signal reconstruction from noisy random projections.
[113] C. Hegde, M. Wakin, and R. Baraniuk. Random projections for manifold learning. In
Proc. Adv. in
[117] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank. Ecient and robust compressed sensing using
optimized expander graphs.
56(6):23468211;2356, 2008.
[120] W. Johnson and J. Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. In
Proc. Conf.
[121] R. Kainkaryam, A. Breux, A. Gilbert, P. Woolf, and J. Schiefelbein. poolmc: Smart pooling of mrna
samples in microarray experiments.
[122] R. Kainkaryam, A. Breux, A. Gilbert, P. Woolf, and J. Schiefelbein. poolmc: Smart pooling of mrna
samples in microarray experiments.
2007.
[124] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior point method for large-scale
-regularized least squares.
[125] S. Kirolos, J. Laska, M. Wakin, M. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. Baraniuk.
Analog-to-information conversion via random demodulation. In
[126] V. Kotelnikov. On the carrying capacity of the ether and wire in telecommunications. In
Izd. Red.
[129] J. Laska, P. Boufounos, M. Davenport, and R. Baraniuk. Democracy in action: Quantization, saturation, and compressive sensing. Preprint, 2009.
[130] J. Laska, S. Kirolos, M. Duarte, T. Ragheb, R. Baraniuk, and Y. Massoud. Theory and implementation
of an analog-to-information convertor using random demodulation. In
[131] M. Ledoux.
RI, 2001.
[132] S. Levy and P. Fullagar. Reconstruction of a sparse spike train from a portion of its spectrum and
application to high-resolution deconvolution.
[133] B. Logan.
[134] M. Lustig, D. Donoho, and J. Pauly. Rapid mr imaging with compressed sensing and randomly undersampled 3dft trajectories. In
[135] M. Lustig, J. Lee, D. Donoho, and J. Pauly. Faster imaging with randomly perturbed, under-sampled
spirals and reconstruction. In
k-t sparse:
[137] M. Lustig, J. Santos, J. Lee, D. Donoho, and J. Pauly. Application of compressed sensing for rapid mr
imaging. In
Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS), Rennes, France, Nov. 2005.
[138] D. MacKay.
Neural Comput.,
4:5908211;604, 1992.
[139] S. Mallat.
[140] S. Mallat.
[141] S. Mallat.
Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2009.
Proc. IS&T/SPIE
[144] S. Mendelson, A. Pajor, and N. Tomczack-Jaegermann. Uniform uncertainty principle for bernoulli
and subgaussian ensembles.
Proc. Work. Inform. Theory and Applications (ITA), San Diego, CA,
Jan. 2007.
[146] M. Mishali and Y. C. Eldar. Blind multi-band signal reconstruction: Compressed sensing for analog
signals.
[147] M. Mishali and Y. C. Eldar. From theory to practice: Sub-nyquist sampling of sparse wideband analog
signals.
Contemporary Math.,
313:858211;96, 2002.
Data Streams: Algorithms and Applications, volume 1 of Found. Trends in Theoretical Comput. Science. Now Publishers, Boston, MA, 2005.
[149] S. Muthukrishnan.
Data Streams: Algorithms and Applications, volume 1 of Found. Trends in Theoretical Comput. Science. Now Publishers, Boston, MA, 2005.
[150] S. Muthukrishnan.
[151] D. Needell and J. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate samples.
[152] D. Needell and J. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate samples.
[153] D. Needell and J. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate samples.
[154] D. Needell and R. Vershynin. Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit.
4(2):3108211;316,
2010.
[156] R. Nowak and M. Figueiredo.
Proc. Asilomar Conf. Signals, Systems, and Computers, Pacic Grove, CA, Nov. 2001.
In
[158] B. Olshausen and D. Field. Emergence of simple-cell receptive eld properties by learning a sparse
representation.
[159] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total
variation-based image restoration.
4(2):4608211;489,
2005.
[160] Y. Pati, R. Rezaifar, and P. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In
hold, 1993.
[162] R. Prony.
[165] R. Robucci, L. Chiu, J. Gray, J. Romberg, P. Hasler, and D. Anderson. Compressive sensing on a cmos
separable transform image sensor. In
2(4):10988211;1128,
2(4):10988211;1128,
2009.
[167] J. Romberg. Compressive sensing by random convolution.
2009.
[168] M. Rosenfeld.
Science,
290(5500):23238211;2326, 2000.
[170] S. Sarvotham, D. Baron, and R. Baraniuk. Compressed sensing reconstruction via belief propagation.
Technical report TREE-0601, Rice Univ., ECE Dept., 2006.
[171] M. Schena, D. Shalon, R. Davis, and P. Brown. Quantitative monitoring of gene expression patterns
with a complementary dna microarray.
37(1):108211;21, 1949.
[174] M. Sheikh, O. Milenkovic, S. Sarvotham, and R. Baraniuk.
In
2007.
[176] M. Sheikh, S. Sarvotham, O. Milenkovic, and R. Baraniuk. Dna array decoding from nonlinear measurements by belief propagation.
In
2007.
[177] N. Shental, A. Amir, and O. Zuk.
se(que)nsing.
In
[180] T. Strohmer and R. Heath. Grassmanian frames with applications to coding and communication.
Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2006.
Appl.
A compressed
Proc. IS&T/SPIE
[182] D. Takhar, J. Laska, M. Wakin, M. Duarte, D. Baron, S. Sarvotham, K. Kelly, and R. Baraniuk. A
new compressive imaging camera architecture using optical-domain compression. In
Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2006.
tice.
Proc. IS&T/SPIE
Kluwer, 2001.
[185] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction.
[186] R. Tibshirani. Regression shrinkage and selection via the lasso. 58(1):267–288, 1996.
[187] R. Tibshirani. Regression shrinkage and selection via the lasso. 58(1):267–288, 1996.
[188] M. Tipping. Sparse bayesian learning and the relevance vector machine.
1:2118211;244, 2001.
[189] M. Tipping and A. Faul. Fast marginal likelihood maximization for sparse bayesian models. In
Int. Conf. Art. Intell. Stat. (AISTATS), Key West, FL, Jan. 2003.
Proc.
[190] J. Tropp and A. Gilbert. Signal recovery from partial information via orthogonal matching pursuit.
[191] J. Tropp and A. Gilbert. Signal recovery from partial information via orthogonal matching pursuit.
[192] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk. Beyond nyquist: Ecient sampling of
sparse, bandlimited signals.
[193] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk. Beyond nyquist: Ecient sampling of
sparse, bandlimited signals.
[194] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk. Random lters for compressive sampling
and reconstruction.
In
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
In
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
346(238211;24):12718211;1274, 2008.
[197] J. A. Tropp. On the conditioning of random subdictionaries.
2008.
[198] J. Trzasko and A. Manduca. Highly undersampled magnetic resonance image reconstruction via homotopic -minimization.
[199] V. Vapnik.
[200] R. Varga.
[201] S. Vasanawala, M. Alley, R. Barth, B. Hargreaves, J. Pauly, and M. Lustig. Faster pediatric mri via
compressed sensing.
In
2009.
[202] R. Venkataramani and Y. Bresler. Further results on spectrum blind sampling of 2-d signals. In
IEEE Int. Conf. Image Processing (ICIP), Chicago, IL, Oct. 1998.
[203] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with nite rate of innovation.
Proc.
IEEE Trans.
[204] A. Wagadarikar, R. John, R. Willett, and D. Brady. Single disperser design for coded aperture snapshot
spectral imaging.
[205] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk. An
architecture for compressive imaging.
GA, Oct. 2006.
In
Atlanta,
[206] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk.
Compressive imaging for video representation and coding.
In
Beijing,
[207] Walker and T. Ulrych. Autoregressive recovery of the acoustic impedance. Geophysics, 48(10):1338–1350, 1983.
[208] W. Wang, M. Garofalakis, and K. Ramchandran. Distributed sparse random projections for renable
approximation. In
Proc. Int. Symp. Inform. Processing in Sensor Networks (IPSN), Cambridge, MA,
Apr. 2007.
[209] R. Ward. Compressive sensing with cross validation. 55(12):5773–5782, 2009.
[210] L. Welch. Lower bounds on the maximum cross correlation of signals.
20(3):397–399, 1974.
[211] E. Whittaker. On the functions which are represented by the expansions of the interpolation theory.
[212] P. Wojtaszczyk. Stability and instance optimality for gaussian measurements in compressed sensing.
IEEE
[214] W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for -minimization with
applications to compressed sensing.
Index
Keywords are listed by the section with that keyword (page numbers are in parentheses).
Keywords
do not necessarily appear in the text of the page. They are merely associated with that section.
apples, 1.1 (1)
A
B
Approximation, 2.4(11)
Atoms, 2.2(7)
Basis, 2.2(7)
Belief propagation, 5.5(51), 6.11(78)
Best K-term approximation, 2.4(11)
Biosensing, 6.11(78)
F
G
Coherence, 3.6(26)
Instance-optimality, 4.4(37)
Frame, 2.2(7)
CoSaMP, 5.3(45)
Lasso, 5.2(42)
Count-median, 5.4(49)
Count-min, 5.4(49)
Cross-polytope, 4.5(39)
Electroencephalography (EEG)
Analysis, 2.2(7)
Ex.
apples, 1
Democracy, 3.5(24)
Detection, 6.9(73)
Norms, 2.1(5)
p norms, 2.1(5)
7.4(89)
5.3(45)
7.3(86)
7.3(86)
Synthesis, 2.2(7)
Sensing, 1.1(1)
Sensing matrices, 3.1(15)
Sensing matrix design, 3.1(15)
Shrinkage, 5.2(42)
Universality, 3.5(24)
`_0
`_1
minimization, 4.1(29)
minimization, 4.1(29), 7.4(89)
Attributions
Collection: Compressed Sensing
Edited by: Richard Baraniuk, Mark A. Davenport, Marco F. Duarte, Chinmay Hegde
URL: http://cnx.org/content/col11133/1.5/
License: http://creativecommons.org/licenses/by/3.0/
Module: "Introduction to compressive sensing"
By: Mark A. Davenport, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
URL: http://cnx.org/content/m37172/1.7/
Pages: 1-3
Copyright: Mark A. Davenport, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
License: http://creativecommons.org/licenses/by/3.0/
Module: "Introduction to vector spaces"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37167/1.6/
Pages: 5-7
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Bases and frames"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37165/1.6/
Pages: 7-8
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sparse representations"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37168/1.5/
Pages: 8-10
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressible signals"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37166/1.5/
Pages: 11-14
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sensing matrix design"
By: Mark A. Davenport
URL: http://cnx.org/content/m37169/1.6/
Page: 15
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Null space conditions"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37170/1.6/
Pages: 16-18
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "The restricted isometry property"
By: Mark A. Davenport
URL: http://cnx.org/content/m37171/1.6/
Pages: 18-22
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "The RIP and the NSP"
By: Mark A. Davenport
URL: http://cnx.org/content/m37176/1.5/
Pages: 22-24
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Matrices that satisfy the RIP"
By: Mark A. Davenport
URL: http://cnx.org/content/m37177/1.5/
Pages: 24-25
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Coherence"
By: Marco F. Duarte
URL: http://cnx.org/content/m37178/1.5/
Pages: 26-27
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Signal recovery via
`_1
minimization"
ATTRIBUTIONS
Module: "Instance-optimal guarantees revisited"
By: Mark A. Davenport
URL: http://cnx.org/content/m37183/1.6/
Pages: 37-39
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "The cross-polytope and phase transitions"
By: Mark A. Davenport
URL: http://cnx.org/content/m37184/1.5/
Pages: 39-40
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sparse recovery algorithms"
By: Chinmay Hegde
URL: http://cnx.org/content/m37292/1.3/
Page: 41
Copyright: Chinmay Hegde
License: http://creativecommons.org/licenses/by/3.0/
Module: "Convex optimization-based methods"
By: Wotao Yin
URL: http://cnx.org/content/m37293/1.5/
Pages: 42-44
Copyright: Wotao Yin
License: http://creativecommons.org/licenses/by/3.0/
Module: "Greedy algorithms"
By: Chinmay Hegde
URL: http://cnx.org/content/m37294/1.4/
Pages: 45-48
Copyright: Chinmay Hegde
License: http://creativecommons.org/licenses/by/3.0/
Module: "Combinatorial algorithms"
By: Mark A. Davenport, Chinmay Hegde
URL: http://cnx.org/content/m37295/1.3/
Pages: 49-51
Copyright: Mark A. Davenport, Chinmay Hegde
License: http://creativecommons.org/licenses/by/3.0/
Module: "Bayesian methods"
By: Chinmay Hegde, Mona Sheikh
URL: http://cnx.org/content/m37359/1.4/
Pages: 51-53
Copyright: Chinmay Hegde, Mona Sheikh
License: http://creativecommons.org/licenses/by/3.0/
Module: "Linear regression and model selection"
By: Mark A. Davenport
URL: http://cnx.org/content/m37360/1.3/
Page: 55
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sparse error correction"
By: Mark A. Davenport
URL: http://cnx.org/content/m37361/1.3/
Pages: 55-56
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Group testing and data stream algorithms"
By: Mark A. Davenport
URL: http://cnx.org/content/m37362/1.4/
Page: 56
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressive medical imaging"
By: Mona Sheikh
URL: http://cnx.org/content/m37363/1.4/
Pages: 56-57
Copyright: Mona Sheikh
License: http://creativecommons.org/licenses/by/3.0/
Module: "Analog-to-information conversion"
By: Mark A. Davenport, Jason Laska
URL: http://cnx.org/content/m37375/1.4/
Pages: 57-60
Copyright: Mark A. Davenport, Jason Laska
License: http://creativecommons.org/licenses/by/3.0/
Module: "Single-pixel camera"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37369/1.4/
Pages: 60-64
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Hyperspectral imaging"
By: Marco F. Duarte
URL: http://cnx.org/content/m37370/1.4/
Pages: 64-69
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressive processing of manifold-modeled data"
By: Marco F. Duarte
URL: http://cnx.org/content/m37371/1.6/
Pages: 69-73
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Inference using compressive measurements"
By: Marco F. Duarte
URL: http://cnx.org/content/m37372/1.4/
Pages: 73-76
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressive sensor networks"
By: Marco F. Duarte
URL: http://cnx.org/content/m37373/1.3/
Pages: 76-77
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Genomic sensing"
By: Mona Sheikh
URL: http://cnx.org/content/m37374/1.3/
Pages: 78-79
Copyright: Mona Sheikh
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sub-Gaussian random variables"
By: Mark A. Davenport
URL: http://cnx.org/content/m37185/1.6/
Pages: 81-82
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Concentration of measure for sub-Gaussian random variables"
By: Mark A. Davenport
URL: http://cnx.org/content/m32583/1.7/
Pages: 83-86
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Proof of the RIP for sub-Gaussian matrices"
By: Mark A. Davenport
URL: http://cnx.org/content/m37186/1.4/
Pages: 86-89
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "`_1 minimization proof"
By: Mark A. Davenport
URL: http://cnx.org/content/m37187/1.4/
Pages: 89-91
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
About Connexions
Since 1999, Connexions has been pioneering a global system where anyone can create course materials and
make them fully accessible and easily reusable free of charge. We are a Web-based authoring, teaching and
learning environment open to anyone interested in education, including students, teachers, professors and
lifelong learners. We connect ideas and facilitate educational communities.
Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12
schools, distance learners, and lifelong learners.
Connexions materials are in many languages, including English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part of an exciting new information distribution system that allows for Print on Demand Books.
Connexions
has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course
materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.