Compressed Sensing
Collection Editors:
Richard Baraniuk
Mark A. Davenport
Marco F. Duarte
Chinmay Hegde
Authors:
Richard Baraniuk
Mark A. Davenport
Marco F. Duarte
Chinmay Hegde
Jason Laska
Mona Sheikh
Wotao Yin
Online:
< http://cnx.org/content/col11133/1.5/ >
CONNEXIONS
Rice University, Houston, Texas
This selection and arrangement of content as a collection is copyrighted by Richard Baraniuk, Mark A. Davenport, Marco F. Duarte, and Chinmay Hegde. It is licensed under the Creative Commons Attribution License 3.0 (http://creativecommons.org/licenses/by/3.0/).
Collection structure revised: April 2, 2011
PDF generated: September 23, 2011
For copyright and attribution information for the modules contained in this collection, see p. 107.
Table of Contents
1 Introduction
1.1 Introduction to compressive sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Sparse and Compressible Signal Models
2.1 Introduction to vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Bases and frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Sparse representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Compressible signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Sensing Matrices
3.1 Sensing matrix design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Null space conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 The restricted isometry property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 The RIP and the NSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 Matrices that satisfy the RIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Sparse Signal Recovery via ℓ1 Minimization
4.1 Signal recovery via `_1 minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Noise-free signal recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Signal recovery in noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Instance-optimal guarantees revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5 The cross-polytope and phase transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Algorithms for Sparse Recovery
5.1 Sparse recovery algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Convex optimization-based methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Greedy algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4 Combinatorial algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Bayesian methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Applications of Compressive Sensing
6.1 Linear regression and model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Sparse error correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3 Group testing and data stream algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.4 Compressive medical imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.5 Analog-to-information conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.6 Single-pixel camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.7 Hyperspectral imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.8 Compressive processing of manifold-modeled data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.9 Inference using compressive measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.10 Compressive sensor networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.11 Genomic sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7 Appendices
7.1 Sub-Gaussian random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2 Concentration of measure for sub-Gaussian random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.3 Proof of the RIP for sub-Gaussian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 ℓ1 minimization proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Attributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Chapter 1
Introduction
1.1 Introduction to compressive sensing
We are in the midst of a digital revolution that is driving the development and deployment of new kinds of sensing systems with ever-increasing fidelity and resolution. The theoretical foundation of this revolution is the pioneering work of Kotelnikov, Nyquist, Shannon, and Whittaker on sampling continuous-time bandlimited signals [126], [157], [173], [211]. Their results demonstrate that signals, images, videos, and other data can be exactly recovered from a set of uniformly spaced samples taken at the so-called Nyquist rate of twice the highest frequency present in the signal of interest. Capitalizing on this discovery, much of signal processing has moved from the analog to the digital domain and ridden the wave of Moore's law. Digitization has enabled the creation of sensing and processing systems that are more robust, flexible, cheaper and, consequently, more widely-used than their analog counterparts.
As a result of this success, the amount of data generated by sensing systems has grown from a trickle to a torrent. Unfortunately, in many important and emerging applications, the resulting Nyquist rate is so high that we end up with far too many samples. Alternatively, it may simply be too costly, or even physically impossible, to build devices capable of acquiring samples at the necessary rate. Thus, despite extraordinary advances in computational power, the acquisition and processing of signals in application areas such as imaging, video, medical imaging, remote surveillance, spectroscopy, and genomic data analysis continues to pose a tremendous challenge.
To address the logistical and computational challenges involved in dealing with such high-dimensional data, we often depend on compression, which aims at finding the most concise representation of a signal that is able to achieve a target level of acceptable distortion. One of the most popular techniques for signal compression is known as transform coding, and typically relies on finding a basis or frame that provides sparse or compressible representations for signals in a class of interest. By a sparse representation, we mean that for a signal of length N we can represent it with K << N nonzero coefficients; by a compressible representation, we mean that the signal is well-approximated by a signal with only K nonzero coefficients.
Both sparse and compressible signals can be represented with high fidelity by preserving only the values and locations of the largest coefficients of the signal. This process is called sparse approximation, and forms the foundation of transform coding schemes that exploit signal sparsity and compressibility, including the JPEG, JPEG2000, MPEG, and MP3 standards.
Leveraging the concept of transform coding, compressive sensing (CS) has emerged as a new framework for signal acquisition and sensor design that enables a potentially large reduction in the sampling and computation costs for sensing signals that have a sparse or compressible representation. While the Nyquist-Shannon sampling theorem states that a certain minimum number of samples is required in order to perfectly capture an arbitrary bandlimited signal, when the signal is sparse in a known basis we can vastly reduce the number of measurements that need to be stored. Consequently, when sensing sparse signals we might be able to do better than suggested by classical results: rather than first sampling at a high rate and then compressing the sampled data, we would like to find ways to directly sense the data in a compressed form, i.e., at a lower sampling rate. The field of CS grew out of the work of Emmanuel Candès, Justin Romberg, and Terence Tao and of David Donoho, who showed that a finite-dimensional signal having a sparse or compressible representation can be recovered from a small set of linear, nonadaptive measurements [6], [24], [66]. The design of these measurement schemes and their extensions to practical data models and acquisition schemes are one of the most central challenges in the field of CS.
Although this idea has only recently gained significant attention in the signal processing community, there have been hints in this direction dating back as far as the eighteenth century. In 1795, Prony proposed an algorithm for the estimation of the parameters associated with a small number of complex exponentials sampled in the presence of noise [162]. The next theoretical leap came in the early 1900's, when Carathéodory showed that a positive linear combination of any K sinusoids is uniquely determined by its value at t = 0 and at any other 2K points in time. This work was later generalized by George, Gorodnitsky, and Rao, who studied sparsity in the context of biomagnetic imaging and other contexts [109], [164], and by Bresler and Feng, who proposed a sampling scheme for acquiring certain classes of signals consisting of K components with nonzero bandwidth (as opposed to pure sinusoids) [94], [92]. In the early 2000's Vetterli, Marziliano, and Blu proposed a sampling scheme for non-bandlimited signals that are governed by only K parameters, showing that these signals can be sampled and recovered from just 2K samples [203].
A related problem focuses on recovery of a signal from partial observation of its Fourier transform. Beurling proposed a method for extrapolating these observations to determine the entire Fourier transform [15]. One can show that if the signal consists of a finite number of impulses, then Beurling's approach will correctly recover the entire Fourier transform (of this non-bandlimited signal) from any sufficiently large piece of its Fourier transform. His approach, to find the signal with smallest ℓ1 norm among all signals agreeing with the acquired Fourier measurements, bears a remarkable resemblance to some of the algorithms used in CS.
More recently, Candès, Romberg, and Tao [24], [30], [32], [33], [37], and Donoho [66] showed that a signal having a sparse representation can be recovered exactly from a small set of linear, nonadaptive measurements. This result suggests that it may be possible to sense sparse signals by taking far fewer measurements, hence the name compressive sensing. Note, however, that CS differs from classical sampling in two important respects. First, rather than sampling the signal at specific points in time, CS systems typically acquire measurements in the form of inner products between the signal and more general test functions. We will see throughout this course that randomness often plays a key role in the design of these test functions. Second, the two frameworks differ in the manner in which they deal with signal recovery, i.e., the problem of recovering the original signal from the compressive measurements. In the Nyquist-Shannon framework, signal recovery is achieved through cardinal sine (sinc) interpolation, a linear process that requires little computation and has a simple interpretation. In CS, however, signal recovery is typically achieved using highly nonlinear methods.
CS has already had notable impact on several applications. One example is medical imaging (Section 6.4),
where it has enabled speedups by a factor of seven in pediatric MRI while preserving diagnostic quality [201].
Moreover, the broad applicability of this framework has inspired research that extends the CS framework
by proposing practical implementations for numerous applications, including sub-Nyquist analog-to-digital
converters (Section 6.5) (ADCs), compressive imaging architectures (Section 6.6), and compressive sensor
networks (Section 6.10).
This course introduces the basic concepts in compressive sensing. We overview the concepts of sparsity (Section 2.3), compressibility (Section 2.4), and transform coding. We then discuss the main theoretical results of the field, beginning by focusing primarily on the theory of sensing matrix design (Section 3.1), ℓ1 minimization (Section 4.1), and alternative algorithms for sparse recovery (Section 5.1). We then review applications of
sparsity in several signal processing problems such as sparse regression and model selection (Section 6.1), error
correction (Section 6.2), group testing (Section 6.3), and compressive inference (Section 6.9). We also discuss
applications of compressive sensing in analog-to-digital conversion (Section 6.5), biosensing (Section 6.11),
conventional (Section 6.6) and hyperspectral (Section 6.7) imaging, medical imaging (Section 6.4), and sensor
networks (Section 6.10).
1.1.1 Acknowledgments
The authors would like to thank Ewout van den Berg, Yonina Eldar, Piotr Indyk, Gitta Kutyniok, and Yaniv Plan for their feedback regarding some portions of this course which now also appear in the introductory chapter of Compressed Sensing: Theory and Applications, available at http://www-stat.stanford.edu/markad/publications/ddek-chapter1-2011.pdf.
Chapter 2
Sparse and Compressible Signal Models

2.1 Introduction to vector spaces
For much of its history, signal processing has focused on signals produced by physical systems. Many natural and man-made systems can be modeled as linear. Thus, it is natural to consider signal models that complement this kind of linear structure. This notion has been incorporated into modern signal processing by modeling signals as vectors living in an appropriate vector space. This captures the linear structure that we often desire, namely that if we add two signals together then we obtain a new, physically meaningful signal. Moreover, vector spaces allow us to apply intuitions and tools from geometry in R^3, such as lengths, distances, and angles, to describe and compare signals of interest. This is useful even when our signals live in high-dimensional or infinite-dimensional spaces.
Throughout this course, we will treat signals as real-valued functions having domains that are either continuous or discrete, and either infinite or finite. These assumptions will be made clear as necessary in each chapter. In this course, we will assume that the reader is relatively comfortable with the key concepts in vector spaces. We now provide only a brief review of some of the key concepts in vector spaces that will be required in developing the theory of compressive sensing (Section 1.1) (CS). For a more thorough review, the reader may consult an introductory course in digital signal processing.
We will typically be concerned with normed vector spaces, i.e., vector spaces endowed with a norm. In the case of a discrete, finite domain, we can view our signals as vectors in an N-dimensional Euclidean space, denoted by R^N. When dealing with vectors in R^N, we will make frequent use of the ℓp norms, which are defined for p ∈ [1, ∞] as

‖x‖_p = ( Σ_{i=1}^N |x_i|^p )^{1/p} for p ∈ [1, ∞),  and  ‖x‖_∞ = max_{i=1,2,...,N} |x_i|.   (2.1)

In Euclidean space we can also consider the standard inner product in R^N, which we denote

⟨x, z⟩ = z^T x = Σ_{i=1}^N x_i z_i.   (2.2)

This inner product leads to the ℓ2 norm: ‖x‖_2 = √⟨x, x⟩.

In some contexts it is useful to extend the notion of ℓp norms to the case where p < 1. In this case, the "norm" defined in (2.1) fails to satisfy the triangle inequality, so it is actually a quasinorm. We will also make frequent use of the notation ‖x‖_0 := |supp(x)|, where supp(x) = {i : x_i ≠ 0} denotes the support of x and |supp(x)| denotes its cardinality. Note that ‖·‖_0 is not even a quasinorm, but one can easily show that

lim_{p→0} ‖x‖_p^p = |supp(x)|,   (2.3)

justifying this choice of notation. The ℓp (quasi-)norms have notably different properties for different values of p. To illustrate this, Figure 2.1 shows the unit sphere, i.e., {x : ‖x‖_p = 1}, induced by each of these norms in R^2.
Figure 2.1: Unit spheres in R^2 for the ℓp norms with p = 1, 2, ∞, and for the ℓp quasinorm with p = 1/2. (a) Unit sphere for ℓ1 norm (b) Unit sphere for ℓ2 norm (c) Unit sphere for ℓ∞ norm (d) Unit sphere for ℓp quasinorm with p = 1/2.
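As a quick numerical companion to these definitions, the following sketch computes the ℓp norms and quasinorms defined above. The function name lp_norm and the use of NumPy are illustrative choices made here, not part of the original text.

    import numpy as np

    def lp_norm(x, p):
        # l_p norm for p in [1, inf), quasinorm for 0 < p < 1,
        # max magnitude for p = inf, and support size for p = 0.
        x = np.asarray(x, dtype=float)
        if p == 0:
            return float(np.count_nonzero(x))
        if np.isinf(p):
            return float(np.max(np.abs(x)))
        return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

    # Example: the same vector measured in several (quasi-)norms.
    x = np.array([3.0, -4.0, 0.0, 1.0])
    print([lp_norm(x, p) for p in (0, 0.5, 1, 2, np.inf)])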
We typically use norms as a measure of the strength of a signal, or the size of an error. For example, suppose we are given a signal x ∈ R^2 and wish to approximate it using a point in a one-dimensional affine space A. If we measure the approximation error using an ℓp norm, then our task is to find the point x̂ ∈ A that minimizes ‖x − x̂‖_p. The choice of p will have a significant effect on the properties of the resulting approximation error. An example is illustrated in Figure 2.2. To compute the closest point in A to x using each ℓp norm, we can imagine growing an ℓp sphere centered on x until it intersects with A. This will be the point x̂ ∈ A that is closest to x in the corresponding ℓp norm. We observe that larger p tends to spread the error more evenly among the two coefficients, while smaller p leads to an error that is more unevenly distributed and tends to be sparse. This intuition generalizes to higher dimensions, and plays an important role in the development of CS theory.
Figure 2.2: Best approximation of a point in R^2 by a one-dimensional subspace using the ℓp norms for p = 1, 2, ∞, and the ℓp quasinorm with p = 1/2.
2.2 Bases and frames

A set Ψ = {ψ_i}_{i∈I} is called a basis for a finite-dimensional vector space (Section 2.1) V if the vectors in the set span V and are linearly independent. This implies that each vector in the space can be represented as a linear combination of this (smaller, except in the trivial case) set of basis vectors in a unique fashion. Furthermore, the coefficients of this linear combination can be found by the inner product of the signal and a dual set of vectors. In discrete settings, we will only consider real finite-dimensional Hilbert spaces where V = R^N and I = {1, ..., N}. Mathematically, any signal x ∈ R^N may be expressed as

x = Σ_{i∈I} a_i ψ̃_i,   (2.4)

where our coefficients are computed as a_i = ⟨x, ψ_i⟩ and {ψ̃_i}_{i∈I} are the vectors that constitute our dual basis. Another way to denote our basis and our dual basis is by how they operate on x: here the dual basis Ψ̃ is called our synthesis basis (it is used to reconstruct the signal via (2.4)) and Ψ our analysis basis.
It is often useful to generalize the concept of a basis to allow for sets of possibly linearly dependent vectors, resulting in what is known as a frame. More formally, a frame is a set of vectors {ψ_i}_{i=1}^n in R^d, d < n, corresponding to a matrix Ψ ∈ R^{d×n}, such that for all vectors x ∈ R^d,

A ‖x‖_2^2 ≤ ‖Ψ^T x‖_2^2 ≤ B ‖x‖_2^2   (2.5)

with 0 < A ≤ B < ∞. Note that the condition A > 0 implies that the rows of Ψ must be linearly independent. When A is chosen as the largest possible value and B as the smallest for these inequalities to hold, then we call them the (optimal) frame bounds. If A and B can be chosen as A = B, then the frame is called A-tight, and if A = B = 1, then Ψ is a Parseval frame. A frame is called equal-norm if there exists some λ > 0 such that ‖ψ_i‖_2 = λ for all i = 1, ..., N, and it is unit-norm if λ = 1. Note also that while the concept of a frame is very general and can be defined in infinite-dimensional spaces, in the case where Ψ is a d × N matrix, A and B simply correspond to the smallest and largest eigenvalues of ΨΨ^T, respectively.

Frames can provide richer representations of data due to their redundancy: for a given signal x, there exist infinitely many coefficient vectors α such that

x = Ψα.   (2.6)

In order to obtain a set of feasible coefficients we exploit the dual frame Ψ̃, i.e., any frame satisfying

Ψ Ψ̃^T = Ψ̃ Ψ^T = I.

The particular choice Ψ̃ = (ΨΨ^T)^{-1} Ψ is referred to as the canonical dual frame; it is well-defined because A > 0 requires Ψ to have linearly independent rows, so that ΨΨ^T is invertible. Using the canonical dual frame, one choice of coefficients is

α_d = Ψ^T (ΨΨ^T)^{-1} x.   (2.7)

One can show that this sequence is the smallest coefficient sequence in ℓ2 norm, i.e., ‖α_d‖_2 ≤ ‖α‖_2 for all α such that x = Ψα.

Finally, note that in the sparse approximation (Section 2.3) literature, it is also common for a basis or frame to be referred to as a dictionary or overcomplete dictionary, respectively, with the dictionary elements being called atoms.
2.3 Sparse representations

Transforming a signal to a new basis or frame (Section 2.2) may allow us to represent a signal more concisely. The resulting compression is useful for reducing data storage and data transmission, which can be quite expensive in some applications. Hence, one might wish to simply transmit the analysis coefficients obtained in our basis or frame expansion instead of its high-dimensional correlate. In cases where the number of non-zero coefficients is small, we say that we have a sparse representation. Sparse signal models allow us to achieve high rates of compression, and in the case of compressive sensing (Section 1.1), we may use the knowledge that our signal is sparse in a known basis or frame to recover our original signal from a small number of measurements. For sparse data, only the non-zero coefficients need to be stored or transmitted in many cases; the rest can be assumed to be zero.

Mathematically, we say that a signal x is K-sparse when it has at most K nonzeros, i.e., ‖x‖_0 ≤ K. We let

Σ_K = {x : ‖x‖_0 ≤ K}   (2.8)

denote the set of all K-sparse signals. Typically, we will be dealing with signals that are not themselves sparse, but which admit a sparse representation in some basis Ψ. In this case we will still refer to x as being K-sparse, with the understanding that we can express x as x = Ψα where ‖α‖_0 ≤ K.
Sparsity has long been exploited in signal processing and approximation theory for tasks such as compression [60], [161], [183] and denoising [64], and in statistics and learning theory as a method for avoiding overfitting [199]. Sparsity also figures prominently in the theory of statistical estimation and model selection [111], [186], in the study of the human visual system [158], and has been exploited heavily in image processing tasks, since the multiscale wavelet transform [139] provides nearly sparse representations for natural images. Below, we briefly describe some one-dimensional (1-D) and two-dimensional (2-D) examples.
A simple periodic signal is sampled and represented as a periodic train of weighted impulses (see Figure 2.3). One can interpret sampling as a basis expansion where the elements of our basis are impulses placed at periodic points along the time axis. We know that in this case, our dual basis consists of sinc functions used to reconstruct our signal from discrete-time samples. This representation contains many non-zero coefficients, and due to the signal's periodicity, there are many redundant measurements. Representing the signal in the Fourier basis, on the other hand, requires only two non-zero basis vectors, scaled appropriately at the positive and negative frequencies (see Figure 2.3). Driving the number of coefficients needed even lower, we may apply the discrete cosine transform (DCT) to our signal, thereby requiring only a single non-zero coefficient in our expansion (see Figure 2.3). The DCT equation is

X_k = Σ_{n=0}^{N−1} x_n cos( (π/N)(n + 1/2) k ),

with k = 0, ..., N−1, where x_n is the input signal of length N.

Figure 2.3: Cosine signal in three representations: (a) Train of impulses (b) Fourier basis (c) DCT basis.
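The sparsifying effect of the DCT on a pure cosine can be checked numerically. The sketch below assumes SciPy's scipy.fft.dct is available (a library choice made for illustration, not specified by the course); it synthesizes a single DCT-II basis vector and verifies that its transform has exactly one significant coefficient.

    import numpy as np
    from scipy.fft import dct

    N = 256
    n = np.arange(N)
    k0 = 10                                   # hypothetical frequency index
    x = np.cos(np.pi / N * (n + 0.5) * k0)    # a single DCT-II basis vector

    X = dct(x, type=2, norm='ortho')          # DCT of the cosine signal
    print(np.sum(np.abs(X) > 1e-8))           # prints 1: one non-zero coefficient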
If a signal is only approximately sparse, we may still obtain a good sparse approximation by thresholding the coefficients, i.e., by keeping only the K largest-magnitude coefficients and setting the rest to zero, to obtain a K-sparse representation. When measuring the approximation error using an ℓp norm, this procedure yields the best K-term approximation of the original signal, i.e., the best approximation of the signal using only K basis elements.
Figure 2.4: Sparse representation of an image via a multiscale wavelet transform. (a) Original image. (b) Wavelet representation. Large coefficients are represented by light pixels, while small coefficients are represented by dark pixels. Observe that most of the wavelet coefficients are close to zero.
Sparsity results through this decomposition because in most natural images most pixel values vary little from their neighbors. Areas with little contrast difference can be represented with low-frequency wavelets. Low-frequency wavelets are created through stretching a mother wavelet and thus expanding it in space. On the other hand, discontinuities, or edges in the picture, require high-frequency wavelets, which are created through compacting a mother wavelet; the edges are captured by relatively few coefficients mimicking the properties of the high-frequency compacted wavelet. See "Compressible signals" (Section 2.4) for an example.
Note that thresholding yields the best K-term approximation of a signal with respect to an orthonormal basis. When redundant frames are used, we must rely on sparse approximation algorithms like those described later in this course [86], [139].
2.4 Compressible signals

Recall that a signal is considered sparse if it has only a few nonzero values in comparison with its overall length. Few structured signals are truly sparse; rather they are compressible. A signal is compressible if its sorted coefficient magnitudes in a basis Ψ decay rapidly. To consider this mathematically, let x be a signal which is compressible in the basis Ψ:

x = Ψα,   (2.9)

where α are the coefficients of x in the basis Ψ. If x is compressible, then the magnitudes of the sorted coefficients α_s observe a power law decay:

|α_s| ≤ C_1 s^{-q},  s = 1, 2, ....   (2.10)

We define a signal as being compressible if it obeys this power law decay. The larger q is, the faster the magnitudes decay, and the more compressible a signal is. Figure 2.5 shows images that are compressible in different bases.
Figure 2.5: Images that are compressible in different bases. When the pixel values of the image in the upper left are sorted from largest to smallest, there is a sharp descent. The image in the lower left is not compressible in space, but it is compressible in wavelets since its wavelet coefficients exhibit a power law decay.
Because the magnitudes of their coefficients decay so rapidly, compressible signals can be represented well by K << N coefficients. The best K-term approximation of a signal is the one in which the K largest coefficients are kept, with the rest being zero. The error between the true signal and its K-term approximation is denoted the K-term approximation error σ_K(x), defined as

σ_K(x) = min_{x̃ ∈ Σ_K} ‖x − x̃‖_2.   (2.11)

For compressible signals, we can establish a bound with power law decay as follows:

σ_K(x) ≤ C_2 K^{1/2−q}.   (2.12)

In fact, one can show that σ_K(x)_2 will decay as K^{−r} if and only if the sorted coefficients α_i decay as i^{−(r+1/2)} [61]. Figure 2.6 shows an image and its K-term approximation.
Figure 2.6: Sparse approximation of a natural image. (a) Original image. (b) Approximation of the image obtained by keeping only the largest 10% of the wavelet coefficients. Because natural images are compressible in a wavelet domain, approximating this image in terms of its largest wavelet coefficients maintains good fidelity.
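The best K-term approximation and the error σ_K(x) in (2.11) are simple to compute once the coefficients are available. The sketch below is a minimal NumPy illustration; the function names and the synthetic power-law coefficients are assumptions made for this example, not data from the text.

    import numpy as np

    def best_k_term(alpha, K):
        # Keep the K largest-magnitude coefficients, zero out the rest.
        approx = np.zeros_like(alpha)
        idx = np.argsort(np.abs(alpha))[-K:]
        approx[idx] = alpha[idx]
        return approx

    def sigma_K(alpha, K):
        # K-term approximation error in the l2 norm, as in (2.11).
        return np.linalg.norm(alpha - best_k_term(alpha, K))

    # Synthetic compressible coefficients obeying |alpha_s| <= C1 * s**(-q).
    N, q = 1000, 1.5
    alpha = np.arange(1, N + 1, dtype=float) ** (-q)
    print([round(sigma_K(alpha, K), 4) for K in (10, 50, 200)])  # error decays rapidly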
A signal's compressibility is related to the ℓp space to which the signal belongs. An infinite sequence x(n) is an element of an ℓp space for a particular value of p if and only if its ℓp norm is finite:

‖x‖_p = ( Σ_i |x_i|^p )^{1/p} < ∞.   (2.13)

The smaller p is, the faster the sequence's values must decay in order to converge so that the norm is bounded. In the limiting case of p = 0, the "norm" actually counts the number of non-zero values. As p decreases, the size of the corresponding ℓp space also decreases; this can be seen visually by comparing the ℓp unit spheres (the sets of signals whose ℓp norm is 1) in three dimensions. Suppose that a signal is sampled infinitely finely, and call it x[n]. In order for this sequence to have a bounded ℓp norm, its coefficients must have a power-law rate of decay with q > 1/p. Therefore a signal which is in an ℓp space with p ≤ 1 obeys a power law decay, and is therefore compressible.
Chapter 3
Sensing Matrices
3.1 Sensing matrix design
In order to make the discussion more concrete, we will restrict our attention to the standard finite-dimensional compressive sensing (Section 1.1) (CS) model. Specifically, given a signal x ∈ R^N, we consider measurement systems that acquire M linear measurements. We can represent this process mathematically as

y = Φx,   (3.1)

where Φ is an M × N matrix and y ∈ R^M. The matrix Φ represents a dimensionality reduction, i.e., it maps R^N, where N is generally large, into R^M, where M is typically much smaller than N. Note that in the standard CS framework we assume that the measurements are non-adaptive, meaning that the rows of Φ are fixed in advance and do not depend on the previously acquired measurements. In certain settings adaptive measurement schemes can lead to significant performance gains.

Note that although the standard CS framework assumes that x is a finite-length vector with a discrete-valued index (such as time or space), in practice we will often be interested in designing measurement systems for acquiring continuously-indexed signals such as continuous-time signals or images. For now we will simply think of x as a finite-length window of Nyquist-rate samples, and we temporarily ignore the issue of how to directly acquire compressive measurements without first sampling at the Nyquist rate.
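As a concrete, purely illustrative instance of the measurement model (3.1), the sketch below draws a random Gaussian Φ and acquires M nonadaptive measurements of a K-sparse signal; the specific sizes N, M, and K are arbitrary choices for the example, not values from the text.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, K = 512, 128, 10                       # illustrative problem sizes

    # A K-sparse signal x in R^N.
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)

    # An M x N Gaussian sensing matrix with entrywise variance 1/M.
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    y = Phi @ x                                  # M nonadaptive linear measurements
    print(y.shape)                               # (128,)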
There are two main theoretical questions in CS. First, how should we design the sensing matrix Φ to ensure that it preserves the information in the signal x? Second, how can we recover the original signal x from the measurements y? In the case where our data is sparse (Section 2.3) or compressible (Section 2.4), we will see that we can design matrices Φ with M << N that ensure that we will be able to recover the original signal accurately and efficiently using a variety of practical algorithms (Section 5.1).

We begin by first addressing the question of how to design the sensing matrix Φ. Rather than directly proposing a design procedure, we instead consider a number of desirable properties that we might wish Φ to have (including the null space property (Section 3.2), the restricted isometry property (Section 3.3), and bounded coherence (Section 3.6)). We then provide some important examples of matrix constructions (Section 3.5) that satisfy these properties.
3.2 Null space conditions

A natural place to begin in establishing conditions on Φ is by considering its null space, denoted

N(Φ) = {z : Φz = 0}.   (3.2)

If we wish to be able to recover all sparse signals x from the measurements Φx, then it is immediately clear that for any pair of distinct vectors x, x' ∈ Σ_K we must have Φx ≠ Φx', since otherwise it would be impossible to distinguish x from x' based solely on the measurements y. One of the most common ways of characterizing this property is in terms of the spark of Φ [70].

Definition 3.1:
The spark of a given matrix Φ is the smallest number of columns of Φ that are linearly dependent.
Theorem 3.1: (Corollary 1 of [70])
For any vector y ∈ R^M, there exists at most one signal x ∈ Σ_K such that y = Φx if and only if spark(Φ) > 2K.

Proof:
We first assume that, for any y ∈ R^M, there exists at most one signal x ∈ Σ_K such that y = Φx. Now suppose for the sake of a contradiction that spark(Φ) ≤ 2K. This means that there exists some set of at most 2K columns that are linearly dependent, which in turn implies that there exists an h ∈ N(Φ) such that h ∈ Σ_2K. In this case, since h ∈ Σ_2K we can write h = x − x', where x, x' ∈ Σ_K. Thus, since h ∈ N(Φ) we have that Φ(x − x') = 0 and hence Φx = Φx'. But this contradicts our assumption that there exists at most one signal x ∈ Σ_K such that y = Φx. Therefore, we must have that spark(Φ) > 2K.

Now suppose that spark(Φ) > 2K. Assume that for some y there exist x, x' ∈ Σ_K such that y = Φx = Φx'. We therefore have that Φ(x − x') = 0. Letting h = x − x', we can write this as Φh = 0. Since spark(Φ) > 2K, all sets of up to 2K columns of Φ are linearly independent, and therefore h = 0. This in turn implies x = x', proving the theorem.
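Since the spark is defined combinatorially, it can only be computed by brute force, and even then only for very small matrices. The following sketch (an illustrative helper, not an algorithm prescribed by the course) makes the definition concrete.

    import numpy as np
    from itertools import combinations

    def spark(Phi):
        # Smallest number of columns of Phi that are linearly dependent.
        # Brute force over column subsets; feasible only for tiny matrices.
        Phi = np.asarray(Phi, dtype=float)
        M, N = Phi.shape
        for k in range(1, N + 1):
            for cols in combinations(range(N), k):
                if np.linalg.matrix_rank(Phi[:, cols]) < k:
                    return k
        return N + 1   # no dependent subset: all columns linearly independent

    # Per Theorem 3.1, unique recovery of all K-sparse signals needs spark(Phi) > 2K.
    Phi = np.array([[1.0, 0.0, 1.0, -1.0],
                    [0.0, 1.0, 1.0,  1.0]])
    print(spark(Phi))   # 3 for this 2 x 4 example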
When dealing with exactly sparse vectors, the spark provides a complete characterization of when sparse recovery is possible. However, when dealing with approximately sparse signals we must consider somewhat more restrictive conditions on the null space of Φ [46]. Roughly speaking, we must also ensure that N(Φ) does not contain any vectors that are too compressible in addition to vectors that are sparse. In order to state the formal definition we define the following notation that will prove to be useful throughout much of this course. Suppose that Λ ⊂ {1, 2, ..., N} is a subset of indices and let Λ^c = {1, 2, ..., N} \ Λ. By x_Λ we typically mean the length N vector obtained by setting the entries of x indexed by Λ^c to zero. Similarly, by Φ_Λ we typically mean the M × N matrix obtained by setting the columns of Φ indexed by Λ^c to zero.
Definition 3.2:
A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that

‖h_Λ‖_2 ≤ C ‖h_{Λ^c}‖_1 / √K   (3.3)

holds for all h ∈ N(Φ) and for all Λ such that |Λ| ≤ K.

The NSP quantifies the notion that vectors in the null space of Φ should not be too concentrated on a small subset of indices. For example, if a vector h is exactly K-sparse, then there exists a Λ such that ‖h_{Λ^c}‖_1 = 0, and hence (3.3) implies that h_Λ = 0 as well. Thus, if a matrix satisfies the NSP then the only K-sparse vector in N(Φ) is h = 0.
To fully illustrate the implications of the NSP in the context of sparse recovery, we now briefly discuss how we will measure the performance of sparse recovery algorithms when dealing with general non-sparse x. Towards this end, let Δ : R^M → R^N represent our specific recovery method. We will focus primarily on guarantees of the form

‖Δ(Φx) − x‖_2 ≤ C σ_K(x)_1 / √K   (3.4)

for all x, where we recall that

σ_K(x)_p = min_{x̃ ∈ Σ_K} ‖x − x̃‖_p.   (3.5)

This guarantees exact recovery of all possible K-sparse signals, but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by K-sparse vectors. Such guarantees are called instance-optimal since they guarantee optimal performance for each instance of x. This distinguishes them from guarantees that only hold for some subset of possible signals, such as sparse or compressible signals; the quality of the guarantee adapts to the particular choice of x. These are also commonly referred to as uniform guarantees since they hold uniformly for all x.

Our choice of norms in (3.4) is somewhat arbitrary. We could easily measure the reconstruction error using other ℓp norms. The choice of p, however, will limit what kinds of guarantees are possible, and will also potentially lead to alternative formulations of the NSP. See, for instance, [46]. Moreover, the form of the right-hand side of (3.4) might seem somewhat unusual in that we measure the approximation error as σ_K(x)_1/√K rather than simply as σ_K(x)_2. However, we will see later in this course that such a guarantee is actually not possible without taking a prohibitively large number of measurements, and that (3.4) represents the best possible guarantee we can hope to obtain (see "Instance-optimal guarantees revisited" (Section 4.4)).
Later in this course, we will show that the NSP of order 2K is sufficient to establish a guarantee of the form (3.4) for a practical recovery algorithm (see "Noise-free signal recovery" (Section 4.2)). Moreover, the following adaptation of a theorem in [46] demonstrates that if there exists any recovery algorithm satisfying (3.4), then Φ must necessarily satisfy the NSP of order 2K. (We note that the notation x_Λ will occasionally be abused to refer to the length |Λ| vector obtained by keeping only the entries corresponding to Λ, or the M × |Λ| matrix obtained by keeping only the columns corresponding to Λ. The usage should be clear from the context, and typically there is no substantive difference between the two.)

Theorem 3.2:
Let Φ : R^N → R^M denote a sensing matrix and Δ : R^M → R^N denote an arbitrary recovery algorithm. If the pair (Φ, Δ) satisfies (3.4), then Φ satisfies the NSP of order 2K.
Proof:
Suppose h ∈ N(Φ) and let Λ be the indices corresponding to the 2K largest entries of h. We next split Λ into Λ_0 and Λ_1, where |Λ_0| = |Λ_1| = K. Set x = h_{Λ_1} + h_{Λ^c} and x' = −h_{Λ_0}, so that h = x − x'. Since by construction x' ∈ Σ_K, we can apply (3.4) to obtain x' = Δ(Φx'). Moreover, since h ∈ N(Φ), we have

Φh = Φ(x − x') = 0   (3.6)

so that Φx' = Φx. Thus, x' = Δ(Φx). Finally, we have that

‖h_Λ‖_2 ≤ ‖h‖_2 = ‖x − x'‖_2 = ‖x − Δ(Φx)‖_2 ≤ C σ_K(x)_1 / √K = √2 C ‖h_{Λ^c}‖_1 / √(2K),   (3.7)

where the last equality follows since σ_K(x)_1 = ‖h_{Λ^c}‖_1 by our construction of x. This establishes the NSP of order 2K with constant √2 C.
3.3 The restricted isometry property

The null space property (Section 3.2) (NSP) is both necessary and sufficient for establishing guarantees of the form

‖Δ(Φx) − x‖_2 ≤ C σ_K(x)_1 / √K,   (3.8)

but these guarantees do not account for noise. When the measurements are contaminated with noise or have been corrupted by some error such as quantization, it will be useful to consider somewhat stronger conditions. In [36], Candès and Tao introduced the following isometry condition on matrices Φ and established its important role in CS.
Definition 3.3:
A matrix Φ satisfies the restricted isometry property (RIP) of order K if there exists a δ_K ∈ (0, 1) such that

(1 − δ_K) ‖x‖_2^2 ≤ ‖Φx‖_2^2 ≤ (1 + δ_K) ‖x‖_2^2   (3.9)

holds for all x ∈ Σ_K = {x : ‖x‖_0 ≤ K}.

If a matrix Φ satisfies the RIP of order 2K, then we can interpret (3.9) as saying that Φ approximately preserves the distance between any pair of K-sparse vectors. This will clearly have fundamental implications concerning robustness to noise.

It is important to note that in our definition of the RIP we assume bounds that are symmetric about 1, but this is merely for notational convenience. In practice, one could instead consider arbitrary bounds

α ‖x‖_2^2 ≤ ‖Φx‖_2^2 ≤ β ‖x‖_2^2   (3.10)

where 0 < α ≤ β < ∞. Given any such bounds, one can always scale Φ so that it satisfies the symmetric bound about 1 in (3.9): multiplying Φ by √(2/(α + β)) will result in a matrix that satisfies (3.9) with constant δ_K = (β − α)/(β + α). We will not explicitly show this, but one can check that all of the results in this course based on the assumption that Φ satisfies the RIP continue to hold as long as there exists some scaling of Φ satisfying this symmetric bound.

Note also that if Φ satisfies the RIP of order K with constant δ_K, then for any K' < K we automatically have that Φ satisfies the RIP of order K' with constant δ_{K'} ≤ δ_K. Moreover, in [151] it is shown that if Φ satisfies the RIP of order K with a sufficiently small constant, then it will also automatically satisfy the RIP of order γK for certain γ, albeit with a somewhat worse constant.
Lemma 3.1:
Suppose that Φ satisfies the RIP of order K with constant δ_K. Let γ be a positive integer. Then Φ satisfies the RIP of order K' = γ ⌊K/2⌋ with constant δ_{K'} < γ δ_K, where ⌊·⌋ denotes the floor operator.

This lemma is trivial for γ = 1, 2, but for γ ≥ 3 (and K ≥ 4) it allows us to extend the RIP from order K to higher orders, which can be quite useful.
We will see later in this course that if a matrix Φ satisfies the RIP, then this is sufficient for a variety of algorithms (Section 5.1) to be able to successfully recover a sparse signal from noisy measurements. First, however, we will take a closer look at whether the RIP is actually necessary. It should be clear that the lower bound in the RIP is a necessary condition if we wish to be able to recover all sparse signals x from the measurements Φx, for the same reasons that the NSP is necessary. To see that the lower bound is also connected to stability in the presence of noise, consider the following definition.

Definition 3.4:
Let Φ : R^N → R^M denote a sensing matrix and Δ : R^M → R^N denote a recovery algorithm. We say that the pair (Φ, Δ) is C-stable if for any x ∈ Σ_K and any e ∈ R^M we have that

‖Δ(Φx + e) − x‖_2 ≤ C ‖e‖_2.   (3.11)

This definition simply says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large. Theorem 3.3 below demonstrates that the existence of any decoding algorithm (potentially impractical) that can stably recover from noisy measurements requires that Φ satisfy the lower bound of (3.9) with a constant determined by C.
Theorem 3.3:
If the pair (Φ, Δ) is C-stable, then

(1/C) ‖x‖_2 ≤ ‖Φx‖_2   (3.12)

for all x ∈ Σ_2K.
Pick any
x, z K .
Dene
ex =
(z x)
2
and
ez =
(x z)
,
2
(3.13)
x + ex = z + ez =
8
(x + z)
.
2
(3.14)
20
Let
x= (x + ex ) = (z + ez ).
C -stability,
we have that
^
kx zk2
= kx x + x zk2
^
kx x k2 + k x zk2
(3.15)
Ckex k + Ckez k2
= Ckx zk2 .
Since this holds for any
Note that as
C 1,
x, z K ,
we have that
K = 1 1/C 2 0.
must adjust so that it
Thus, if we desire to reduce the impact of noise in our recovered signal then we
satises the lower bound of (3.9) with a tighter constant.
One might respond to this result by arguing that since the upper bound is not necessary, we can avoid redesigning Φ simply by rescaling it, so that as long as Φ satisfies the RIP with δ_{2K} < 1, the rescaled version cΦ will satisfy (3.12) for any constant C. In settings where the size of the noise is independent of our choice of Φ, this is a valid point: by rescaling Φ we are essentially adjusting the gain on the signal part of our measurements, and if increasing this gain does not impact the noise, then we can achieve arbitrarily high signal-to-noise ratios, so that eventually the noise is negligible compared to the signal.

However, in practice we will typically not be able to rescale Φ to be arbitrarily large. Moreover, in many practical settings the noise is not independent of Φ. For example, suppose that the noise vector e represents quantization noise produced by a finite dynamic range quantizer with range [−T, T], and that the measurements lie in the interval [−T, T]. If we rescale Φ by a factor c, then the measurements now lie in [−cT, cT] and we must scale the dynamic range of the quantizer accordingly; the resulting quantization error is simply ce, and we have achieved no reduction in the reconstruction error.
We can also consider how many measurements are necessary to achieve the RIP. If we ignore the impact of δ and focus only on the dimensions of the problem (N, M, and K), then we can establish a simple lower bound. We first provide a preliminary lemma that we will need in the proof of the main theorem.
Lemma 3.2:
Let K and N satisfying K < N/2 be given. There exists a set X ⊂ Σ_K such that for any x ∈ X we have ‖x‖_2 ≤ √K, for any x, z ∈ X with x ≠ z,

‖x − z‖_2 ≥ √(K/2),   (3.16)

and

log|X| ≥ (K/2) log(N/K).   (3.17)

Proof:
We will begin by considering the set

U = {x ∈ {0, +1, −1}^N : ‖x‖_0 = K}.   (3.18)

By construction, ‖x‖_2^2 = K for all x ∈ U. Thus if we construct X by picking elements from U then we automatically have ‖x‖_2 ≤ √K.

Next, observe that |U| = (N choose K) 2^K. Note also that ‖x − z‖_0 ≤ ‖x − z‖_2^2, and thus if ‖x − z‖_2^2 ≤ K/2 then ‖x − z‖_0 ≤ K/2. From this we observe that for any fixed x ∈ U,

|{z ∈ U : ‖x − z‖_2^2 ≤ K/2}| ≤ |{z ∈ U : ‖x − z‖_0 ≤ K/2}| ≤ (N choose K/2) 3^{K/2}.   (3.19)

Thus, suppose we construct the set X by iteratively choosing points from U that satisfy (3.16). After adding j points to the set, there are at least

(N choose K) 2^K − j (N choose K/2) 3^{K/2}   (3.20)

points left to pick from. Thus, we can construct a set of size |X| provided that

|X| (N choose K/2) 3^{K/2} ≤ (N choose K) 2^K.   (3.21)

Next, observe that

(N choose K) / (N choose K/2) = ( (K/2)! (N − K/2)! ) / ( K! (N − K)! ) = Π_{i=1}^{K/2} (N − K + i) / (K/2 + i),   (3.22)

and note that the ratio (N − K + i)/(K/2 + i) is decreasing as a function of i, so that it is minimized at i = K/2, where it equals (N − K/2)/K. Thus, if we set |X| = (N/K)^{K/2}, then we have

(3^{K/2} / 2^K) |X| = (3/4)^{K/2} (N/K)^{K/2} = ( 3N/(4K) )^{K/2} ≤ ( (N − K/2)/K )^{K/2} ≤ (N choose K) / (N choose K/2),   (3.23)

where the first inequality follows since K < N/2 implies N − K/2 > 3N/4. Hence, (3.21) holds for |X| = (N/K)^{K/2}, which establishes the lemma since log|X| = (K/2) log(N/K).
Using this lemma, we can establish the following bound on the required number of measurements to satisfy the RIP.

Theorem 3.4:
Let Φ be an M × N matrix that satisfies the RIP of order 2K with constant δ ∈ (0, 1/2]. Then

M ≥ C K log(N/K)   (3.24)

where C = 1/(2 log(√24 + 1)) ≈ 0.28.

Proof:
We first note that since Φ satisfies the RIP, then for the set of points X in Lemma 3.2 we have

‖Φx − Φz‖_2 ≥ √(1 − δ) ‖x − z‖_2 ≥ √(K/4)   (3.25)

for all x, z ∈ X, since x − z ∈ Σ_2K and δ ≤ 1/2. Similarly, we also have

‖Φx‖_2 ≤ √(1 + δ) ‖x‖_2 ≤ √(3K/2)   (3.26)

for all x ∈ X.

From the lower bound we can say that for any pair of points x, z ∈ X, if we center balls of radius √(K/4)/2 = √(K/16) at Φx and Φz, then these balls will be disjoint. In turn, the upper bound tells us that the entire set of balls is itself contained within a larger ball of radius √(3K/2) + √(K/16). Comparing the volume of this larger ball in R^M with the total volume of the |X| disjoint smaller balls, we obtain

( √(3K/2) + √(K/16) )^M ≥ |X| ( √(K/16) )^M,

or equivalently (√24 + 1)^M ≥ |X|, and hence

M ≥ log|X| / log(√24 + 1).   (3.27)

Combining this with the bound on |X| from Lemma 3.2 yields (3.24).

Note that the restriction to δ ≤ 1/2 is arbitrary and is made merely for convenience; minor modifications to the argument establish bounds for δ ≤ δ_max for any δ_max < 1. Moreover, although we have made no effort to optimize the constants, it is worth noting that they are already quite reasonable.
Although the proof is somewhat less direct, one can establish a similar result (in terms of the dependence on N and K) by examining the Gelfand width of the ℓ1 ball [98]. However, both this result and Theorem 3.4 fail to capture the precise dependence of M on the desired RIP constant δ. In order to quantify this dependence, we can exploit recent results concerning the Johnson-Lindenstrauss lemma, which concerns embeddings of finite sets of points in low-dimensional spaces [120]. Specifically, it is shown in [118] that if we are given a point cloud with p points and wish to embed these points in R^M such that the squared ℓ2 distance between any pair of points is preserved up to a factor of 1 ± ε, then we must have that

M ≥ c_0 log(p) / ε²,   (3.28)

where c_0 > 0 is a constant.

The Johnson-Lindenstrauss lemma is closely related to the RIP. We will see in "Matrices that satisfy the RIP" (Section 3.5) that any procedure that can be used for generating a linear, distance-preserving embedding for a point cloud can also be used to construct a matrix that satisfies the RIP. Moreover, in [127] it is shown that if a matrix Φ satisfies the RIP of order K = c_1 log(p) with constant δ, then Φ can be used to construct a distance-preserving embedding for p points with ε = δ/4. Combining these two results, we obtain

M ≥ c_0 log(p) / ε² = 16 c_0 K / (c_1 δ²).   (3.29)

Thus, for small δ the number of measurements required to ensure that Φ satisfies the RIP of order K will be proportional to K/δ², which may be significantly higher than K log(N/K). See [127] for further details.
3.4 The RIP and the NSP

Next we will show that if a matrix satisfies the restricted isometry property (Section 3.3) (RIP), then it also satisfies the null space property (Section 3.2) (NSP). Thus, the RIP is strictly stronger than the NSP.

Theorem 3.5:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Then Φ satisfies the NSP of order 2K with constant

C = √2 δ_{2K} / ( 1 − (1 + √2) δ_{2K} ).   (3.30)
Proof:
The proof of this theorem involves two useful lemmas. The first follows directly from standard norm inequalities by relating a K-sparse vector to a vector in R^K. We include a simple proof for completeness.

Lemma 3.3:
Suppose u ∈ Σ_K. Then

‖u‖_1 / √K ≤ ‖u‖_2 ≤ √K ‖u‖_∞.   (3.31)

Proof:
For any u, ‖u‖_1 = |⟨u, sgn(u)⟩|. By applying the Cauchy-Schwarz inequality we obtain ‖u‖_1 ≤ ‖u‖_2 ‖sgn(u)‖_2. The lower bound follows since sgn(u) has exactly K nonzero entries all equal to ±1 (since u ∈ Σ_K) and thus ‖sgn(u)‖_2 = √K. The upper bound is obtained by observing that each of the K nonzero entries of u can be upper bounded by ‖u‖_∞.
Below we state the second key lemma that we will need in order to prove Theorem 3.5. This result holds for arbitrary h, not just vectors h ∈ N(Φ). It should be clear that when we do have h ∈ N(Φ), the argument could be simplified considerably. However, this lemma will prove immensely useful when we turn to the problem of sparse recovery from noisy measurements later in this course, and thus we establish it now in its full generality. The lemma is proven in "ℓ1 minimization proof" (Section 7.4).

Lemma 3.4:
Suppose that Φ satisfies the RIP of order 2K, and let h ∈ R^N, h ≠ 0, be arbitrary. Let Λ_0 be any subset of {1, 2, ..., N} with |Λ_0| ≤ K. Define Λ_1 as the index set corresponding to the K entries of h_{Λ_0^c} with largest magnitude, and set Λ = Λ_0 ∪ Λ_1. Then

‖h_Λ‖_2 ≤ α ‖h_{Λ_0^c}‖_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2,   (3.32)

where

α = √2 δ_{2K} / (1 − δ_{2K}),  β = 1 / (1 − δ_{2K}).   (3.33)

Again, note that Lemma 3.4 holds for arbitrary h. In order to prove Theorem 3.5, we merely need to apply Lemma 3.4 to the case where h ∈ N(Φ). Towards this end, suppose that h ∈ N(Φ). It is sufficient to show that

‖h_Λ‖_2 ≤ C ‖h_{Λ^c}‖_1 / √K   (3.34)

holds for the case where Λ is the index set corresponding to the 2K largest entries of h. Thus, we can take Λ_0 to be the index set corresponding to the K largest entries of h and apply Lemma 3.4.

The second term in Lemma 3.4 vanishes since Φh = 0, and thus we have

‖h_Λ‖_2 ≤ α ‖h_{Λ_0^c}‖_1 / √K.   (3.35)

Using Lemma 3.3,

‖h_{Λ_0^c}‖_1 = ‖h_{Λ_1}‖_1 + ‖h_{Λ^c}‖_1 ≤ √K ‖h_{Λ_1}‖_2 + ‖h_{Λ^c}‖_1,   (3.36)

resulting in

‖h_Λ‖_2 ≤ α ( ‖h_{Λ_1}‖_2 + ‖h_{Λ^c}‖_1 / √K ).   (3.37)

Since ‖h_{Λ_1}‖_2 ≤ ‖h_Λ‖_2, we have that

(1 − α) ‖h_Λ‖_2 ≤ α ‖h_{Λ^c}‖_1 / √K.   (3.38)

The assumption δ_{2K} < √2 − 1 ensures that α < 1, and thus we may divide by 1 − α without changing the direction of the inequality to establish (3.34) with constant

C = α / (1 − α) = √2 δ_{2K} / ( 1 − (1 + √2) δ_{2K} ),   (3.39)

as desired.
3.5 Matrices that satisfy the RIP

We now turn to the question of how to construct matrices that satisfy the restricted isometry property (Section 3.3) (RIP). It is possible to deterministically construct matrices of size M × N that satisfy the RIP of order K, but known constructions require M to be relatively large, e.g., M = O(K² log N) or M = O(KN^α) for some constant α. In many real-world settings, these results would lead to an unacceptably large requirement on M.

Fortunately, these limitations can be overcome by randomizing the matrix construction. Given M and N, we generate random matrices Φ by choosing the entries Φ_ij as independent realizations from some probability distribution. We begin by observing that if all we require is that δ_{2K} > 0, then a random draw essentially suffices; the challenge is to achieve the RIP of order 2K with a prescribed constant δ_{2K} while keeping M small. Our treatment is based on the simple approach first described in [7] and later generalized to a larger class of random matrices in [144].
To ensure that the matrix will satisfy the RIP, we will impose two conditions on the random distribution. First, we require that the distribution yields a matrix that is norm-preserving, which requires that

E(Φ_ij²) = 1/M,   (3.40)

and hence the variance of the distribution is 1/M. Second, we require that the distribution is a sub-Gaussian distribution (Section 7.1), meaning that there exists a constant c > 0 such that

E( exp(Φ_ij t) ) ≤ exp( c² t² / 2 )   (3.41)

for all t ∈ R. This says that the moment-generating function of our distribution is dominated by that of a Gaussian distribution, which is also equivalent to requiring that the tails of our distribution decay at least as fast as the tails of a Gaussian distribution. Examples of sub-Gaussian distributions include the Gaussian distribution, the Bernoulli distribution taking values ±1/√M, and more generally any distribution with bounded support. See "Sub-Gaussian random variables" (Section 7.1) for more details.

For the moment, we will actually assume a bit more than sub-Gaussianity. Specifically, we will assume that the entries of Φ are strictly sub-Gaussian, which means that they satisfy (3.41) with c² = E(Φ_ij²) = 1/M. Similar results to the following would hold for general sub-Gaussian distributions, but to simplify the constants we restrict our present attention to the strictly sub-Gaussian Φ. In this case we have the following useful result, which is proven in "Concentration of measure for sub-Gaussian random variables" (Section 7.2).
Corollary 3.1:
Suppose that Φ is an M × N matrix whose entries Φ_ij are i.i.d., with Φ_ij drawn according to a strictly sub-Gaussian distribution with c² = 1/M. Let Y = Φx for x ∈ R^N. Then for any ε > 0, and any x ∈ R^N,

E( ‖Y‖_2² ) = ‖x‖_2²   (3.42)

and

P( | ‖Y‖_2² − ‖x‖_2² | ≥ ε ‖x‖_2² ) ≤ 2 exp( −M ε² / κ* ),   (3.43)

where κ* = 2/(1 − log 2) ≈ 6.52.

This tells us that the norm of a sub-Gaussian random vector strongly concentrates about its mean. Using this result, in "Proof of the RIP for sub-Gaussian matrices" (Section 7.3) we provide a simple proof based on that in [7] that sub-Gaussian matrices satisfy the RIP.
Theorem 3.6:
Fix δ ∈ (0, 1). Let Φ be an M × N random matrix whose entries Φ_ij are i.i.d., with Φ_ij drawn according to a strictly sub-Gaussian distribution with c² = 1/M. If

M ≥ κ_1 K log(N/K),   (3.44)

then Φ satisfies the RIP of order K with the prescribed δ with probability exceeding 1 − 2e^{−κ_2 M}, where κ_1 and κ_2 are positive constants depending only on δ.

Note that in light of the measurement bounds in "The restricted isometry property" (Section 3.3), we see that (3.44) achieves the optimal number of measurements (up to a constant).
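A minimal numerical illustration of such random constructions, assuming NumPy and with arbitrary sizes and seed: both Gaussian and Rademacher (±1/√M) entries are strictly sub-Gaussian with variance 1/M, and the norm of Φx concentrates around the norm of x, as Corollary 3.1 predicts.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 128, 512                                      # illustrative sizes

    # Two strictly sub-Gaussian constructions with entrywise variance 1/M.
    Phi_gauss = rng.standard_normal((M, N)) / np.sqrt(M)
    Phi_rademacher = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)

    # Norm preservation: ||Phi x||_2 is close to ||x||_2 for a fixed x.
    x = rng.standard_normal(N)
    for Phi in (Phi_gauss, Phi_rademacher):
        print(np.linalg.norm(Phi @ x) / np.linalg.norm(x))   # both ratios near 1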
Using random matrices to construct Φ has a number of additional benefits. First, one can show that for random constructions the measurements are democratic, meaning that it is possible to recover a signal using any sufficiently large subset of the measurements [58], [129]. Thus, by using random Φ one can be robust to the loss or corruption of a small fraction of the measurements. Second, and perhaps more significantly, in practice we are often more interested in the setting where x is sparse with respect to some basis Ψ. In this case what we actually require is that the product ΦΨ satisfies the RIP. If we were to use a deterministic construction then we would need to explicitly take Ψ into account in our construction of Φ, but when Φ is chosen randomly we can avoid this consideration. For example, if Φ is chosen according to a Gaussian distribution and Ψ is an orthonormal basis, then one can easily show that ΦΨ will also have a Gaussian distribution, and so provided that M is sufficiently high ΦΨ will satisfy the RIP with high probability, just as before. Although less obvious, similar results hold for sub-Gaussian distributions as well [7]. This property, sometimes referred to as universality, constitutes a significant advantage of using random matrices to construct Φ.

Finally, we note that since the fully random matrix approach is sometimes impractical to build in hardware, several hardware architectures have been implemented and/or proposed that enable random measurements to be acquired in practical settings. Examples include the random demodulator (Section 6.5) [192], random filtering [194], the modulated wideband converter [147], random convolution [2], [166], and the compressive multiplexer [179]. These architectures typically use a reduced amount of randomness and are modeled via matrices Φ that have significantly more structure than a fully random matrix. Perhaps somewhat surprisingly, while it is typically not quite as easy as in the fully random case, one can prove that many of these constructions also satisfy the RIP.
3.6 Coherence
While the spark (Section 3.2), null space property (Section 3.2) (NSP), and restricted isometry property (Section 3.3) (RIP) all provide guarantees for the recovery of sparse (Section 2.3) signals, verifying that a general matrix Φ satisfies any of these properties has a combinatorial computational complexity, since in each case one must essentially consider all (N choose K) submatrices. In many settings it is preferable to use properties of Φ that are easily computable to provide more concrete recovery guarantees. The coherence of a matrix is one such property.

Definition 3.5:
The coherence of a matrix Φ, μ(Φ), is the largest absolute inner product between any two columns φ_i, φ_j of Φ:

μ(Φ) = max_{1 ≤ i < j ≤ N} |⟨φ_i, φ_j⟩| / ( ‖φ_i‖_2 ‖φ_j‖_2 ).   (3.45)

It is possible to show that the coherence of a matrix is always in the range μ(Φ) ∈ [ √( (N − M)/(M(N − 1)) ), 1 ]; the lower bound is known as the Welch bound. Note that when N >> M, the lower bound is approximately μ(Φ) ≥ 1/√M.
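Unlike the spark, NSP, or RIP, the coherence is cheap to compute directly from the definition. A short sketch (the function name and the sizes are illustrative choices):

    import numpy as np

    def coherence(Phi):
        # Largest absolute inner product between distinct normalized columns, as in (3.45).
        cols = Phi / np.linalg.norm(Phi, axis=0)
        G = np.abs(cols.T @ cols)          # Gram matrix of the normalized columns
        np.fill_diagonal(G, 0.0)
        return G.max()

    rng = np.random.default_rng(0)
    M, N = 64, 256
    Phi = rng.standard_normal((M, N))
    welch = np.sqrt((N - M) / (M * (N - 1)))     # Welch lower bound
    print(coherence(Phi), welch)                 # coherence is always >= the Welch bound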
One can sometimes relate coherence to the spark, NSP, and RIP. For example, the coherence and spark properties of a matrix can be related by employing the Gershgorin circle theorem [100], [200].

Theorem 3.7: (Theorem 2 of [100])
The eigenvalues of an N × N matrix M with entries m_ij, 1 ≤ i, j ≤ N, lie in the union of N discs d_i = d_i(c_i, r_i), 1 ≤ i ≤ N, centered at c_i = m_ii and with radius r_i = Σ_{j≠i} |m_ij|.

Applying this theorem to the Gram matrix G = Φ_Λ^T Φ_Λ leads to the following straightforward result.

Lemma 3.5:
For any matrix Φ,

spark(Φ) ≥ 1 + 1/μ(Φ).   (3.46)

Proof:
Since spark(Φ) does not depend on the scaling of the columns, we can assume without loss of generality that Φ has unit-norm columns. Let Λ ⊆ {1, ..., N} with |Λ| = p determine a set of indices. We consider the restricted Gram matrix G = Φ_Λ^T Φ_Λ, which satisfies g_ii = 1 for 1 ≤ i ≤ p and |g_ij| ≤ μ(Φ) for 1 ≤ i, j ≤ p, i ≠ j. From Theorem 3.7, if Σ_{j≠i} |g_ij| < |g_ii| then the matrix G is positive definite, so that the columns of Φ_Λ are linearly independent. This holds whenever (p − 1) μ(Φ) < 1, or equivalently p < 1 + 1/μ(Φ). Thus any fewer than 1 + 1/μ(Φ) columns are linearly independent, which implies spark(Φ) ≥ 1 + 1/μ(Φ).
By merging Theorem 1 from "Null space conditions" (Section 3.2) with Lemma 3.5, we can pose the following condition on Φ that guarantees uniqueness.

Theorem 3.8: (Theorem 12 of [71])
If

K < (1/2) ( 1 + 1/μ(Φ) ),   (3.47)

then for each measurement vector y ∈ R^M there exists at most one signal x ∈ Σ_K such that y = Φx.

Theorem 3.8, together with the Welch bound, provides an upper bound on the level of sparsity K that guarantees uniqueness using coherence: K = O(√M). Another straightforward application of the Gershgorin circle theorem (Theorem 3.7) connects the RIP to the coherence property.

Lemma 3.6:
If Φ has unit-norm columns and coherence μ = μ(Φ), then Φ satisfies the RIP of order K with δ_K = (K − 1)μ for all K < 1/μ.
()
herence bounds have been studied both for deterministic and randomized matrices. For example, there are
known matrices
of size M M 2
frame generated from the Alltop sequence [114] and more general equiangular tight frames [180]. These constructions restrict the number of measurements needed to recover a
Furthermore, it can be shown that when the distribution used has zero mean and nite variance, then in
the asymptotic regime (as
and
p
() = (2logN ) /M [23], [29],
M = O K 2 logN , matching the known
to grow asymptotically as
nite-dimensional bounds.
The measurement bounds dependent on coherence are handicapped by the squared dependence on the
sparsity
K,
but it is possible to overcome this bottleneck by shifting the types of guarantees from worst-
()
K = O (M logN ),
which returns to the linear dependence of the measurement bound on the signal
28
Chapter 4
Sparse Signal Recovery via ℓ1 Minimization

4.1 Signal recovery via ℓ1 minimization
As we will see later in this course, there now exist a wide variety of approaches to recover a sparse (Section 2.3) signal x from a small number of linear measurements. We begin by considering a natural first approach to the problem of sparse recovery. Given measurements y = Φx and the knowledge that our original signal x is sparse or compressible (Section 2.4), it is natural to attempt to recover x by solving an optimization problem of the form

x̂ = argmin_z ‖z‖_0 subject to z ∈ B(y),   (4.1)

where B(y) ensures that x̂ is consistent with the measurements y. Recall that ‖z‖_0 = |supp(z)| simply counts the number of nonzero entries in z, so (4.1) seeks out the sparsest signal consistent with the observed measurements. For example, if our measurements are exact and noise-free, then we can set B(y) = {z : Φz = y}. When the measurements have been contaminated with a small amount of bounded noise, we could instead consider B(y) = {z : ‖Φz − y‖_2 ≤ ε}. In both cases, (4.1) finds the sparsest x that is consistent with the measurements y.

Note that in (4.1) we are inherently assuming that x itself is sparse. In the more common setting where x = Ψα, we can easily modify the approach and instead consider

α̂ = argmin_z ‖z‖_0 subject to z ∈ B(y),   (4.2)

where B(y) = {z : ΦΨz = y} or B(y) = {z : ‖ΦΨz − y‖_2 ≤ ε}. By setting Φ̃ = ΦΨ we see that (4.1) and (4.2) are essentially identical. Moreover, as noted in "Matrices that satisfy the RIP" (Section 3.5), in many cases the introduction of Ψ does not significantly complicate the construction of matrices Φ such that ΦΨ will satisfy the desired properties. Thus, for most of the remainder of this course we will restrict our attention to the case where Ψ = I. It is important to note, however, that this restriction does impose certain limits in our analysis when Ψ is a general dictionary and not an orthonormal basis. For example, in this case ‖x̂ − x‖_2 = ‖Ψĉ − Ψc‖_2 ≠ ‖ĉ − c‖_2, and thus a bound on ‖ĉ − c‖_2 cannot directly be translated into a bound on ‖x̂ − x‖_2, which is often the metric of interest.
Although it is possible to analyze the performance of (4.1) under the appropriate assumptions on Φ, we do not pursue this strategy since the objective function ‖·‖_0 is nonconvex, and hence (4.1) is potentially very difficult to solve. In fact, one can show that for a general matrix Φ, even finding a solution that approximates the true minimum is NP-hard. One avenue for translating this problem into something more tractable is to replace ‖·‖_0 with its convex approximation ‖·‖_1. Specifically, we consider

x̂ = argmin_z ‖z‖_1 subject to z ∈ B(y).   (4.3)

Provided that B(y) is convex, (4.3) is computationally feasible. In fact, when B(y) = {z : Φz = y}, the resulting problem can be posed as a linear program [43].
Figure 4.1: Best approximation of a point in R^2 by a one-dimensional subspace using the ℓ1 norm and the ℓp quasinorm with p = 1/2. (a) Approximation in ℓ1 norm (b) Approximation in ℓp quasinorm.
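When B(y) = {z : Φz = y}, the problem (4.3) can be recast as a linear program by splitting z into its positive and negative parts. The sketch below does this with SciPy's linprog; the solver choice and the helper name basis_pursuit are illustrative assumptions, not part of the course material.

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(Phi, y):
        # Solve min ||z||_1 subject to Phi z = y via the standard LP reformulation
        # z = u - v with u, v >= 0 and objective sum(u) + sum(v).
        M, N = Phi.shape
        c = np.ones(2 * N)
        A_eq = np.hstack([Phi, -Phi])
        res = linprog(c, A_eq=A_eq, b_eq=y,
                      bounds=[(0, None)] * (2 * N), method="highs")
        u, v = res.x[:N], res.x[N:]
        return u - v

    # Recover a sparse vector from M < N noise-free measurements.
    rng = np.random.default_rng(1)
    N, M, K = 128, 48, 5
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x_hat = basis_pursuit(Phi, Phi @ x)
    print(np.linalg.norm(x_hat - x))    # near zero when recovery succeeds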
It is clear that replacing (4.1) with (4.3) transforms a computationally intractable problem into a tractable one, but it may not be immediately obvious that the solution to (4.3) will be at all similar to the solution to (4.1). However, there are certainly intuitive reasons to expect that the use of ℓ1 minimization will indeed promote sparsity. As an example, recall the example shown in Figure 4.1. In this case the solution to the ℓ1 minimization problem coincides exactly with the solution to the ℓp minimization problem for any p < 1, and notably, it is sparse. Moreover, the use of ℓ1 minimization to promote or exploit sparsity has a long history, dating back at least to the work of Beurling on Fourier transform extrapolation from partial observations [16].

Additionally, in a somewhat different context, in 1965 Logan [133] showed that a bandlimited signal can be perfectly recovered in the presence of arbitrary corruptions on a small interval. Again, the recovery method consists of searching for the bandlimited signal that is closest to the observed signal in the ℓ1 norm. This can be viewed as further validation of the intuition gained from Figure 4.1: the ℓ1 norm is well-suited to sparse errors.

Historically, the use of ℓ1 minimization on large problems finally became practical with the explosion of computing power in the late 1970's and early 1980's. In one of its first applications, it was demonstrated that geophysical signals consisting of spike trains could be recovered from only the high-frequency components of these signals by exploiting ℓ1 minimization [132], [184], [207]. Finally, in the 1990's there was renewed interest in these approaches within the signal processing community for the purpose of finding sparse approximations
31
(Section 2.4) to signals and images when represented in overcomplete dictionaries or unions of bases [43],
[140]. Separately,
`1
variable selection in linear regression (Section 6.1), known as the Lasso [187].
Thus, there are a variety of reasons to suspect that
for sparse signal recovery.
`1
`1
(Section 4.2) and noisy (Section 4.3) settings from a theoretical perspective. We will then further discuss
`1
x= argmin kzk1
subject to
B (y).
z B (y) .
(4.4)
on Lemma 4 from "`1 minimization proof" (Section 7.4). The key ideas in this proof follow from [25].
Lemma 4.1:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Let x, x̂ ∈ R^N be given, and define h = x̂ − x. Let Λ_0 denote the index set corresponding to the K entries of x with largest magnitude and Λ_1 the index set corresponding to the K entries of h_{Λ_0^c} with largest magnitude. Set Λ = Λ_0 ∪ Λ_1. If ‖x̂‖_1 ≤ ‖x‖_1, then

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2,   (4.5)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_1 = 2 / ( 1 − (1 + √2) δ_{2K} ).   (4.6)
Proof:
We begin by observing that h = h_Λ + h_{Λ^c}, so that from the triangle inequality

‖h‖_2 ≤ ‖h_Λ‖_2 + ‖h_{Λ^c}‖_2.   (4.7)

We first aim to bound ‖h_{Λ^c}‖_2. From the results in "ℓ1 minimization proof" (Section 7.4) we have

‖h_{Λ^c}‖_2 = ‖ Σ_{j≥2} h_{Λ_j} ‖_2 ≤ Σ_{j≥2} ‖h_{Λ_j}‖_2 ≤ ‖h_{Λ_0^c}‖_1 / √K,   (4.8)

where Λ_1 is the index set corresponding to the K largest entries of h_{Λ_0^c}, Λ_2 the index set corresponding to the next K largest entries, and so on.

We now wish to bound ‖h_{Λ_0^c}‖_1. Since ‖x‖_1 ≥ ‖x̂‖_1, by applying the triangle inequality we obtain

‖x‖_1 ≥ ‖x + h‖_1 = ‖x_{Λ_0} + h_{Λ_0}‖_1 + ‖x_{Λ_0^c} + h_{Λ_0^c}‖_1 ≥ ‖x_{Λ_0}‖_1 − ‖h_{Λ_0}‖_1 + ‖h_{Λ_0^c}‖_1 − ‖x_{Λ_0^c}‖_1.   (4.9)

Rearranging and again applying the triangle inequality,

‖h_{Λ_0^c}‖_1 ≤ ‖x‖_1 − ‖x_{Λ_0}‖_1 + ‖h_{Λ_0}‖_1 + ‖x_{Λ_0^c}‖_1 ≤ ‖x − x_{Λ_0}‖_1 + ‖h_{Λ_0}‖_1 + ‖x_{Λ_0^c}‖_1.   (4.10)

Recalling that σ_K(x)_1 = ‖x_{Λ_0^c}‖_1 = ‖x − x_{Λ_0}‖_1, this yields

‖h_{Λ_0^c}‖_1 ≤ ‖h_{Λ_0}‖_1 + 2 σ_K(x)_1.   (4.11)

Combining this with (4.8) we obtain

‖h_{Λ^c}‖_2 ≤ ( ‖h_{Λ_0}‖_1 + 2 σ_K(x)_1 ) / √K ≤ ‖h_{Λ_0}‖_2 + 2 σ_K(x)_1 / √K,   (4.12)

where the last inequality follows from standard bounds on ℓp norms (Lemma 3.3). By observing that ‖h_{Λ_0}‖_2 ≤ ‖h_Λ‖_2, this combines with (4.7) to yield

‖h‖_2 ≤ 2 ‖h_Λ‖_2 + 2 σ_K(x)_1 / √K.   (4.13)

We now turn to establishing a bound for ‖h_Λ‖_2. Combining Lemma 3.4 (proven in "ℓ1 minimization proof" (Section 7.4)) with (4.11) and again applying standard bounds on ℓp norms we obtain

‖h_Λ‖_2 ≤ α ‖h_{Λ_0^c}‖_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2
       ≤ α ( ‖h_{Λ_0}‖_1 + 2 σ_K(x)_1 ) / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2
       ≤ α ‖h_{Λ_0}‖_2 + 2α σ_K(x)_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.14)

Since ‖h_{Λ_0}‖_2 ≤ ‖h_Λ‖_2,

(1 − α) ‖h_Λ‖_2 ≤ 2α σ_K(x)_1 / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.15)

The assumption δ_{2K} < √2 − 1 ensures that α < 1. Dividing by (1 − α) and combining with (4.13) results in

‖h‖_2 ≤ ( 4α/(1 − α) + 2 ) σ_K(x)_1 / √K + ( 2β/(1 − α) ) |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.16)

Plugging in for α and β yields the desired constants.
In order to apply Lemma 4.1, we must consider how the choice of B(y) affects the term |⟨Φh_Λ, Φh⟩|. As an example, in the case of noise-free measurements we obtain the following theorem.

Theorem 4.1: (Theorem 1.1 of [25])
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1 and we obtain measurements of the form y = Φx. Then when B(y) = {z : Φz = y}, the solution x̂ to (4.4) obeys

‖x̂ − x‖_2 ≤ C_0 σ_K(x)_1 / √K.   (4.17)
Proof:
Since x ∈ B(y), we have ‖x̂‖_1 ≤ ‖x‖_1 and can apply Lemma 4.1 to obtain, for h = x̂ − x,

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2.   (4.18)

Furthermore, since x, x̂ ∈ B(y) we also have that y = Φx = Φx̂, and hence Φh = 0. Therefore the second term vanishes, and we obtain the desired result.

Theorem 4.1 is rather remarkable. By considering the case where x ∈ Σ_K = {x : ‖x‖_0 ≤ K} we can see that provided Φ satisfies the RIP (which, as shown earlier, allows for as few as O(K log(N/K)) measurements) we can recover any K-sparse x exactly. This result seems improbable on its own, and so one might expect that the procedure would be highly sensitive to noise, but we will see next that Lemma 4.1 can also be used to demonstrate that this approach is actually stable.

Note that Theorem 4.1 assumes that Φ satisfies the RIP. One could easily modify the argument to replace this with the assumption that Φ satisfies the NSP instead: in the noiseless setting h lies in the null space of Φ, the RIP implies the NSP (see "The RIP and the NSP" (Section 3.4)), and the NSP implies the simplified version of Lemma 4.1. This proof directly mirrors that of Lemma 4.1. Thus, by the same argument as in the proof of Theorem 4.1, it is straightforward to show that if Φ satisfies the NSP then it will obey the same error bound.
4.3 Signal recovery in noise

The ability to perfectly reconstruct a sparse (Section 2.3) signal from noise-free (Section 4.2) measurements represents a promising result. However, in most real-world systems the measurements are contaminated by some form of noise. For instance, in order to process data in a computer we must be able to represent it using a finite number of bits, and hence the measurements will typically be subject to quantization error. Perhaps somewhat surprisingly, one can show that it is possible to modify

x̂ = argmin_z ‖z‖_1 subject to z ∈ B(y)   (4.19)

to stably recover sparse signals under a variety of common noise models [34], [39], [112]. As might be expected, the restricted isometry property (Section 3.3) (RIP) is extremely useful in establishing performance guarantees in noise.

In our analysis we will make repeated use of Lemma 1 from "Noise-free signal recovery" (Section 4.2), so we repeat it here for convenience.
Lemma 4.2:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Let x, x̂ ∈ R^N be given, and define h = x̂ − x. Let Λ_0 denote the index set corresponding to the K entries of x with largest magnitude and Λ_1 the index set corresponding to the K entries of h_{Λ_0^c} with largest magnitude. Set Λ = Λ_0 ∪ Λ_1. If ‖x̂‖_1 ≤ ‖x‖_1, then

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖_2,   (4.20)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_1 = 2 / ( 1 − (1 + √2) δ_{2K} ).   (4.21)
Theorem 4.2: (Theorem 1.2 of [26])
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1 and let y = Φx + e, where ‖e‖_2 ≤ ε. Then when B(y) = {z : ‖Φz − y‖_2 ≤ ε}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ C_0 σ_K(x)_1 / √K + C_2 ε,   (4.22)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_2 = 4 √(1 + δ_{2K}) / ( 1 − (1 + √2) δ_{2K} ).   (4.23)

Proof:
We are interested in bounding ‖h‖_2 = ‖x̂ − x‖_2. Since ‖e‖_2 ≤ ε, x ∈ B(y), and therefore we know that ‖x̂‖_1 ≤ ‖x‖_1. Thus we may apply Lemma 4.2, and it remains to bound |⟨Φh_Λ, Φh⟩|. To do this, we observe that

‖Φh‖_2 = ‖Φ(x̂ − x)‖_2 = ‖Φx̂ − y + y − Φx‖_2 ≤ ‖Φx̂ − y‖_2 + ‖y − Φx‖_2 ≤ 2ε,   (4.24)

where the last inequality follows since x, x̂ ∈ B(y). Combining this with the RIP and the Cauchy-Schwarz inequality we obtain

|⟨Φh_Λ, Φh⟩| ≤ ‖Φh_Λ‖_2 ‖Φh‖_2 ≤ 2ε √(1 + δ_{2K}) ‖h_Λ‖_2.   (4.25)

Thus,

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 ( 2ε √(1 + δ_{2K}) ) = C_0 σ_K(x)_1 / √K + C_2 ε,   (4.26)

completing the proof.
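In practice, the quadratically constrained problem (4.19) with B(y) = {z : ‖Φz − y‖_2 ≤ ε} can be handed to a general convex solver. A minimal sketch using the third-party cvxpy package (one possible choice of solver, not one mandated by the course):

    import cvxpy as cp

    def l1_recover_noisy(Phi, y, eps):
        # Solve min ||z||_1 subject to ||Phi z - y||_2 <= eps.
        N = Phi.shape[1]
        z = cp.Variable(N)
        problem = cp.Problem(cp.Minimize(cp.norm(z, 1)),
                             [cp.norm(Phi @ z - y, 2) <= eps])
        problem.solve()
        return z.value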
In order to place this result in context, consider how we would recover a sparse vector x if we happened to already know the K locations of the nonzero coefficients, which we denote by Λ_0. This is referred to as the oracle estimator. In this case a natural approach is to reconstruct the signal using a simple pseudoinverse:

x̂_{Λ_0} = Φ_{Λ_0}^† y = ( Φ_{Λ_0}^T Φ_{Λ_0} )^{-1} Φ_{Λ_0}^T y,
x̂_{Λ_0^c} = 0.   (4.27)

The implicit assumption in (4.27) is that Φ_{Λ_0} has full column-rank (here Φ_{Λ_0} denotes the M × K submatrix whose columns are indexed by Λ_0), so that there is a unique solution to the equation y = Φ_{Λ_0} x_{Λ_0}. With this choice, the recovery error is given by

‖x̂ − x‖_2 = ‖ ( Φ_{Λ_0}^T Φ_{Λ_0} )^{-1} Φ_{Λ_0}^T (Φx + e) − x ‖_2 = ‖ ( Φ_{Λ_0}^T Φ_{Λ_0} )^{-1} Φ_{Λ_0}^T e ‖_2.   (4.28)

Therefore, if Φ satisfies the RIP of order 2K (with constant δ_{2K}), then the singular values of Φ_{Λ_0} lie between √(1 − δ_{2K}) and √(1 + δ_{2K}), and so, considering the worst case over all e with ‖e‖_2 ≤ ε, the recovery error satisfies

ε / √(1 + δ_{2K}) ≤ ‖x̂ − x‖_2 ≤ ε / √(1 − δ_{2K}).   (4.29)

Thus, if x is exactly K-sparse, then the guarantee for the pseudoinverse recovery method, which is given perfect knowledge of the true support of x, cannot improve upon the bound in Theorem 4.2 by more than a constant value.
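A direct rendering of the oracle estimator (4.27), using a least-squares solve in place of forming the pseudoinverse explicitly (an equivalent and numerically preferable choice; the function name is an assumption for this sketch):

    import numpy as np

    def oracle_estimate(Phi, y, support):
        # Least-squares fit on the known support Lambda_0; zeros elsewhere, as in (4.27).
        x_hat = np.zeros(Phi.shape[1])
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat[support] = coeffs
        return x_hat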
We can also consider a slightly different noise model in which the noise is bounded through the quantity ‖Φ^T e‖_∞ rather than ‖e‖_2. The resulting program, obtained by taking B(y) = {z : ‖Φ^T (Φz − y)‖_∞ ≤ λ} in (4.19), is known as the Dantzig selector [39].

Theorem 4.3:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1 and we obtain measurements of the form y = Φx + e where ‖Φ^T e‖_∞ ≤ λ. Then when B(y) = {z : ‖Φ^T (Φz − y)‖_∞ ≤ λ}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ C_0 σ_K(x)_1 / √K + C_3 √K λ,   (4.30)

where

C_0 = 2 ( 1 − (1 − √2) δ_{2K} ) / ( 1 − (1 + √2) δ_{2K} ),  C_3 = 4√2 / ( 1 − (1 + √2) δ_{2K} ).   (4.31)
Proof:
The proof mirrors that of Theorem 4.2. Since ‖Φ^T e‖_∞ ≤ λ, we again have that x ∈ B(y), so ‖x̂‖_1 ≤ ‖x‖_1 and thus Lemma 4.2 applies. We follow a similar approach as in Theorem 4.2 to bound |⟨Φh_Λ, Φh⟩|. We first note that

‖Φ^T Φh‖_∞ ≤ ‖Φ^T (Φx̂ − y)‖_∞ + ‖Φ^T (y − Φx)‖_∞ ≤ 2λ,   (4.32)

where the last inequality again follows since x, x̂ ∈ B(y). Next, note that Φh_Λ = Φ_Λ h_Λ. Using this we can write

|⟨Φh_Λ, Φh⟩| = |⟨h_Λ, Φ_Λ^T Φh⟩| ≤ ‖h_Λ‖_2 ‖Φ_Λ^T Φh‖_2.   (4.33)

Finally, since ‖Φ^T Φh‖_∞ ≤ 2λ, every coefficient of Φ^T Φh is at most 2λ, and since |Λ| ≤ 2K we have ‖Φ_Λ^T Φh‖_2 ≤ √(2K) (2λ). Thus,

‖h‖_2 ≤ C_0 σ_K(x)_1 / √K + C_1 ( 2λ √(2K) ) = C_0 σ_K(x)_1 / √K + C_3 √K λ,   (4.34)

as desired.
Finally, we also examine the performance of these approaches in the presence of Gaussian noise. The case of Gaussian noise was first considered in [112], which examined the performance of ℓ0 minimization with noisy measurements. We now see that Theorem 4.2 and Theorem 4.3 can be leveraged to provide similar guarantees for ℓ1 minimization. To simplify our discussion, we will restrict our attention to the case where x ∈ Σ_K = {x : ‖x‖_0 ≤ K}, so that σ_K(x)_1 = 0 and the error bounds in Theorem 4.2 and Theorem 4.3 depend only on the noise e.

To begin, suppose that the coefficients of e ∈ R^M are i.i.d. according to a Gaussian distribution with mean zero and variance σ². Since the Gaussian distribution is itself sub-Gaussian, we can apply results such as Corollary 1 from "Concentration of measure for sub-Gaussian random variables" (Section 7.2) to show that there exists a constant c_0 > 0 such that for any ε > 0,

P( ‖e‖_2 ≥ (1 + ε) √M σ ) ≤ exp( −c_0 ε² M ).   (4.35)

Applying this result to Theorem 4.2 with ε = 1, we obtain the following result for the special case of Gaussian noise.

Corollary 4.1:
Suppose that Φ satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Furthermore, suppose that x ∈ Σ_K and that we obtain measurements of the form y = Φx + e where the entries of e are i.i.d. N(0, σ²). Then when B(y) = {z : ‖Φz − y‖_2 ≤ 2√M σ}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ 8 ( √(1 + δ_{2K}) / ( 1 − (1 + √2) δ_{2K} ) ) √M σ   (4.36)

with probability at least 1 − exp(−c_0 M).
We can similarly consider Theorem 4.3 in the context of Gaussian noise. If we assume that the columns of Φ have unit norm, then each coefficient of Φ^T e is a Gaussian random variable with mean zero and variance σ². Using standard tail bounds for the Gaussian distribution (see Theorem 1 from "Sub-Gaussian random variables" (Section 7.1)), we have

P( |(Φ^T e)_i| ≥ t σ ) ≤ exp( −t²/2 )   (4.37)

for i = 1, 2, ..., N. Thus, using the union bound over the bounds for different i, we obtain

P( ‖Φ^T e‖_∞ ≥ 2 σ √(log N) ) ≤ N exp( −2 log N ) = 1/N.   (4.38)

Applying this to Theorem 4.3, we obtain the following result, which is a simplified version of Theorem 1.1 of [39].

Corollary 4.2:
Suppose that Φ has unit-norm columns and satisfies the RIP of order 2K with δ_{2K} < √2 − 1. Furthermore, suppose that x ∈ Σ_K and that we obtain measurements of the form y = Φx + e where the entries of e are i.i.d. N(0, σ²). Then when B(y) = {z : ‖Φ^T (Φz − y)‖_∞ ≤ 2 σ √(log N)}, the solution x̂ to (4.19) obeys

‖x̂ − x‖_2 ≤ 4√2 ( √(1 + δ_{2K}) / ( 1 − (1 + √2) δ_{2K} ) ) √(K log N) σ   (4.39)

with probability at least 1 − 1/N.
Ignoring the precise constants and the probabilities with which the bounds hold (which we have made no effort to optimize), we observe that if $M = O(K\log N)$ then these results appear to be essentially the same. However, there is a subtle difference. Specifically, if $M$ and $N$ are fixed and we consider the effect of varying $K$, we can see that Corollary 4.2, p. 36 yields a bound that is adaptive to this change, providing a stronger guarantee when $K$ is small, whereas the bound in Corollary 4.1 does not improve as $K$ is reduced. Thus, while they provide very similar guarantees, there are certain circumstances where the Dantzig selector is preferable. See [39] for further discussion of the comparative advantages of these approaches.
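To make this difference concrete, the following sketch (with illustrative values of $N$, $M$, and $\sigma$ that are not taken from the text) compares how the two error bounds scale as the sparsity $K$ varies, ignoring constants as above:

```python
import numpy as np

# Illustrative comparison of the scaling of the two Gaussian-noise bounds (constants ignored)
N, M, sigma = 10**6, 5000, 1.0
for K in (10, 100, 1000):
    bound_l2 = sigma * np.sqrt(M)              # scaling of Corollary 4.1 (does not depend on K)
    bound_ds = sigma * np.sqrt(K * np.log(N))  # scaling of Corollary 4.2 (shrinks with K)
    print(K, bound_l2, bound_ds)
```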
We now briefly return to the noise-free (Section 4.2) setting to take a closer look at instance-optimal guarantees for recovering non-sparse signals. To begin, recall that in Theorem 1 from "Noise-free signal recovery" (Section 4.2) we bounded the $\ell_2$-norm of the reconstruction error of

$\hat{x} = \underset{z}{\mathrm{argmin}} \; \|z\|_1 \quad \text{subject to} \quad z \in \mathcal{B}(y) \qquad (4.40)$

as

$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_K(x)_1}{\sqrt{K}} \qquad (4.41)$

when $\mathcal{B}(y) = \{z : \Phi z = y\}$. One can generalize this result to measure the reconstruction error using the $\ell_p$-norm for any $p \in [1, 2]$. For example, by a slight modification of these arguments, one can also show that $\|\hat{x} - x\|_1 \le C_0 \sigma_K(x)_1$ (see [27]). This leads us to ask whether we might replace the bound for the $\ell_1$ error with a result of the form $\|\hat{x} - x\|_2 \le C \sigma_K(x)_2$. Unfortunately, obtaining such a result requires an unreasonably large number of measurements, as quantified by the following theorem of [45].
Theorem 4.4: (Theorem 5.1 of [45])
Suppose that $\Phi$ is an $M \times N$ matrix and that $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ is a recovery algorithm that satisfies

$\|x - \Delta(\Phi x)\|_2 \le C \sigma_K(x)_2 \qquad (4.42)$

for some $K \ge 1$. Then $M > \left(1 - \sqrt{1 - 1/C^2}\right) N$.

Proof:
We begin by letting $h \in \mathbb{R}^N$ denote any vector in $\mathcal{N}(\Phi)$, the null space of $\Phi$. We write $h = h_\Lambda + h_{\Lambda^c}$ where $\Lambda$ is an arbitrary set of indices satisfying $|\Lambda| \le K$. Set $x = h_{\Lambda^c}$, and note that $\Phi x = \Phi h_{\Lambda^c} = \Phi h - \Phi h_\Lambda = -\Phi h_\Lambda$ since $h \in \mathcal{N}(\Phi)$. Since $h_\Lambda \in \Sigma_K$, (4.42) implies that $\Delta(\Phi x) = \Delta(-\Phi h_\Lambda) = -h_\Lambda$. Hence, $\|x - \Delta(\Phi x)\|_2 = \|h_{\Lambda^c} + h_\Lambda\|_2 = \|h\|_2$. Furthermore, we observe that $\sigma_K(x)_2 \le \|x\|_2$, since by definition $\sigma_K(x)_2 \le \|x - \tilde{x}\|_2$ for all $\tilde{x} \in \Sigma_K$, including $\tilde{x} = 0$. Thus $\|h\|_2 \le C\|h_{\Lambda^c}\|_2$. Since $\|h\|_2^2 = \|h_\Lambda\|_2^2 + \|h_{\Lambda^c}\|_2^2$, this yields

$\|h_\Lambda\|_2^2 = \|h\|_2^2 - \|h_{\Lambda^c}\|_2^2 \le \|h\|_2^2 - \frac{1}{C^2}\|h\|_2^2 = \left(1 - \frac{1}{C^2}\right)\|h\|_2^2. \qquad (4.43)$

This must hold for any vector $h \in \mathcal{N}(\Phi)$ and for any set of indices $\Lambda$ such that $|\Lambda| \le K$. In particular, let $\{v_i\}_{i=1}^{N-M}$ be an orthonormal basis for $\mathcal{N}(\Phi)$, and define the vectors $\{h_j\}_{j=1}^{N}$ as follows:

$h_j = \sum_{i=1}^{N-M} v_i(j)\, v_i. \qquad (4.44)$

We note that $h_j = \sum_{i=1}^{N-M} \langle e_j, v_i \rangle v_i$ where $e_j$ denotes the vector of all zeros except for a 1 in the $j$-th entry, so that $h_j = P_{\mathcal{N}} e_j$ where $P_{\mathcal{N}}$ denotes an orthogonal projection onto $\mathcal{N}(\Phi)$. Since $\|P_{\mathcal{N}} e_j\|_2^2 + \|P_{\mathcal{N}}^{\perp} e_j\|_2^2 = \|e_j\|_2^2 = 1$, we have that $\|h_j\|_2 \le 1$. Thus, by setting $\Lambda = \{j\}$ for $h_j$ we observe that

$\sum_{i=1}^{N-M} |v_i(j)|^2 = |h_j(j)| \le \sqrt{1 - \frac{1}{C^2}}\, \|h_j\|_2 \le \sqrt{1 - \frac{1}{C^2}}. \qquad (4.45)$

Summing over $j = 1, 2, ..., N$, we obtain

$N\sqrt{1 - \frac{1}{C^2}} \ge \sum_{j=1}^{N} \sum_{i=1}^{N-M} |v_i(j)|^2 = \sum_{i=1}^{N-M} \sum_{j=1}^{N} |v_i(j)|^2 = \sum_{i=1}^{N-M} \|v_i\|_2^2 = N - M, \qquad (4.46)$

and thus $M \ge \left(1 - \sqrt{1 - 1/C^2}\right) N$, as desired.
Thus, if we want a bound of the form (4.42) that holds for all signals $x$ with a constant $C \approx 1$, then regardless of what recovery algorithm we use we will need to take $M \approx N$ measurements. However, in a sense this result is overly pessimistic, and we will now see that the results we just established for signal recovery in noise can actually allow us to overcome this limitation by essentially treating the approximation error as noise.

Towards this end, notice that all the results concerning $\ell_1$ minimization stated thus far are deterministic guarantees that hold for all $x$ given any matrix satisfying the restricted isometry property (Section 3.5) (RIP). This is an important theoretical property, but as noted in "Matrices that satisfy the RIP" (Section 3.5), in practice it is very difficult to obtain a deterministic guarantee that the matrix $\Phi$ satisfies the RIP. In particular, constructions that rely on randomness are only known to satisfy the RIP with high probability (recall Theorem 1 from "Matrices that satisfy the RIP" (Section 3.5)), which opens the door to slightly weaker results that hold only with high probability.
Theorem 4.5:
Fix $\delta \in (0, 1)$. Let $\Phi$ be an $M \times N$ random matrix whose entries $\phi_{ij}$ are i.i.d. with $\phi_{ij}$ drawn according to a strictly sub-Gaussian distribution with $c^2 = 1/M$. If

$M \ge \kappa_1 K \log\left(\frac{N}{K}\right), \qquad (4.47)$

then $\Phi$ satisfies the RIP of order $K$ with the prescribed $\delta$ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_2$ depends only on $\delta$ and $\kappa_1$.

Even within the class of probabilistic results, there are two distinct flavors. The typical approach is to combine a probabilistic construction of a matrix that will satisfy the RIP with high probability with the previous results in this chapter. This yields a procedure that, with high probability, satisfies a deterministic guarantee applying to all possible signals $x$. A weaker kind of result states that, given a signal $x$, we can draw a random matrix $\Phi$ and with high probability expect certain performance for that signal $x$. This type of guarantee is sometimes called instance-optimal in probability. The distinction is essentially whether or not we need to draw a new matrix $\Phi$ for each signal $x$. This may be an important distinction in practice, but if we assume for the moment that it is permissible to draw a new matrix $\Phi$ for each $x$, then we can see that Theorem 4.4, (Theorem 5.1 of [45]), p. 37 may be somewhat pessimistic. In order to establish our main result we will rely on the fact, previously used in "Matrices that satisfy the RIP" (Section 3.5), that sub-Gaussian matrices preserve the norm of an arbitrary vector with high probability. Specifically, a slight modification of Corollary 1 from "Matrices that satisfy the RIP" (Section 3.5) shows that for any $x \in \mathbb{R}^N$, if we choose $\Phi$ according to Theorem 4.5, then we also have that

$P\left(\|\Phi x\|_2^2 \ge 2\|x\|_2^2\right) \le \exp\left(-\kappa_3 M\right) \qquad (4.48)$

with $\kappa_3 = 4/\kappa^*$, where $\kappa^*$ is the constant in the concentration of measure inequality for sub-Gaussian matrices.
Theorem 4.6:
Let $x \in \mathbb{R}^N$ be fixed. Set $\delta_{2K} < \sqrt{2} - 1$. Suppose that $\Phi$ is an $M \times N$ sub-Gaussian random matrix with $M \ge \kappa_1 K \log(N/K)$, and suppose we obtain measurements of the form $y = \Phi x$. Set $\epsilon = 2\sigma_K(x)_2$. Then with probability exceeding $1 - 2\exp(-\kappa_2 M) - \exp(-\kappa_3 M)$, when $\mathcal{B}(y) = \{z : \|\Phi z - y\|_2 \le \epsilon\}$, the solution $\hat{x}$ to (4.40) obeys

$\|\hat{x} - x\|_2 \le \frac{8\sqrt{1+\delta_{2K}} + 1 - (1+\sqrt{2})\delta_{2K}}{1 - (1+\sqrt{2})\delta_{2K}}\, \sigma_K(x)_2. \qquad (4.49)$

Proof:
First we recall that, as noted above, from Theorem 4.5, p. 38 we have that $\Phi$ will satisfy the RIP of order $2K$ with probability at least $1 - 2\exp(-\kappa_2 M)$. Next, let $\Lambda$ denote the index set corresponding to the $K$ entries of $x$ with largest magnitude and write $x = x_\Lambda + x_{\Lambda^c}$. Since $x_\Lambda \in \Sigma_K$, we can write $\Phi x = \Phi x_\Lambda + \Phi x_{\Lambda^c} = \Phi x_\Lambda + e$. If $\Phi$ is sub-Gaussian then from Lemma 2 from "Sub-Gaussian random variables" (Section 7.1) we have that $\Phi x_{\Lambda^c}$ is also sub-Gaussian, and one can apply (4.48) to obtain that with probability at least $1 - \exp(-\kappa_3 M)$, $\|\Phi x_{\Lambda^c}\|_2 \le 2\|x_{\Lambda^c}\|_2 = 2\sigma_K(x)_2$. Thus, applying the union bound we have that with probability exceeding $1 - 2\exp(-\kappa_2 M) - \exp(-\kappa_3 M)$, we satisfy the necessary conditions to apply Theorem 1 from "Signal recovery in noise" (Section 4.3) to $x_\Lambda$, in which case $\sigma_K(x_\Lambda)_1 = 0$ and hence

$\|\hat{x} - x_\Lambda\|_2 \le 2C_2\, \sigma_K(x)_2. \qquad (4.50)$

From the triangle inequality we thus obtain

$\|\hat{x} - x\|_2 = \|\hat{x} - x_\Lambda + x_\Lambda - x\|_2 \le \|\hat{x} - x_\Lambda\|_2 + \|x_\Lambda - x\|_2 \le (2C_2 + 1)\,\sigma_K(x)_2, \qquad (4.51)$

which establishes the theorem.
Thus, although it is not possible to achieve a deterministic guarantee of the form in (4.42) without taking a prohibitively large number of measurements, it is possible to show that such guarantees can hold with high probability while simultaneously taking far fewer measurements than would be suggested by Theorem 4.4, (Theorem 5.1 of [45]), p. 37. Note that the above result applies only to the case where the parameter $\epsilon$ is selected correctly, which requires some limited knowledge of $x$, namely $\sigma_K(x)_2$. In practice this limitation can easily be overcome through a parameter selection technique such as cross-validation [209], but there also exist more intricate analyses of $\ell_1$ minimization that show it is possible to obtain similar performance without requiring an oracle for parameter selection [212]. Note that Theorem 4.6, p. 39 can also be generalized to handle other measurement matrices and to the case where $x$ is compressible rather than sparse. Moreover, this proof technique is applicable to a variety of the greedy algorithms described later in this course that do not require knowledge of the noise level to establish similar results [44], [152].
The analysis of $\ell_1$ minimization based on the restricted isometry property (Section 3.3) (RIP) described in "Signal recovery in noise" (Section 4.3) allows us to establish a variety of guarantees under different noise settings, but one drawback is that the analysis of how many measurements are actually required for a matrix to satisfy the RIP is relatively loose. An alternative approach to analyzing $\ell_1$ minimization algorithms is to examine them from a more geometric perspective. Towards this end, we define the closed $\ell_1$ ball, also known as the cross-polytope:

$C^N = \{x \in \mathbb{R}^N : \|x\|_1 \le 1\}. \qquad (4.52)$

Note that $C^N$ is the convex hull of $2N$ points $\{p_i\}_{i=1}^{2N}$. Let $\Phi C^N \subseteq \mathbb{R}^M$ denote the convex polytope defined as either the convex hull of $\{\Phi p_i\}_{i=1}^{2N}$ or equivalently as

$\Phi C^N = \{y \in \mathbb{R}^M : y = \Phi x,\ x \in C^N\}. \qquad (4.53)$

For any $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$, we can associate a $K$-face of $C^N$ with the support and sign pattern of $x$. One can show that the number of $K$-faces of $\Phi C^N$ is precisely the number of sparsity patterns that can be recovered by

$\hat{x} = \underset{z}{\mathrm{argmin}} \; \|z\|_1 \quad \text{subject to} \quad z \in \mathcal{B}(y) \qquad (4.54)$

with $\mathcal{B}(y) = \{z : \Phi z = y\}$. Thus, by counting the number of $K$-faces of $\Phi C^N$ that are preserved, one can quantify exactly which sparse vectors can be recovered via $\ell_1$ minimization with $\Phi$ as the sensing matrix. Note also that by replacing the cross-polytope with certain other polytopes (the simplex and the hypercube), one can apply the same technique to obtain results concerning the recovery of more limited signal classes, such as sparse signals with nonnegative or bounded entries [77].

Given this result, one can then study random matrix constructions from this perspective to obtain probabilistic bounds on the number of $K$-faces of $\Phi C^N$ when $\Phi$ is generated at random, for example with i.i.d. entries drawn from a Gaussian distribution. Under the assumptions that $K = \rho M$ and $M = \gamma N$, one can obtain asymptotic results as $N \to \infty$. This analysis leads to the phase transition phenomenon, in which, for large problem sizes, the fraction of preserved $K$-faces tends sharply to either one or zero depending on $\rho$ and $\gamma$ [77].
These results provide sharp bounds on the minimum number of measurements required in the noiseless
setting.
In general, these bounds are significantly stronger than the corresponding measurement bounds
obtained within the RIP-based framework given in "Noise-free signal recovery" (Section 4.2), which tend to
be extremely loose in terms of the constants involved. However, these sharper bounds also require somewhat
more intricate analysis and typically more restrictive assumptions on $\Phi$ (such as it being Gaussian). Thus, one of the main strengths of the RIP-based analysis presented in "Noise-free signal recovery" (Section 4.2)
and "Signal recovery in noise" (Section 4.3) is that it gives results for a broad class of matrices that can also
be extended to noisy settings.
Chapter 5
Given noisy compressive measurements $y = \Phi x + e$ of a signal $x$, a core problem in compressive sensing (Section 1.1) (CS) is to recover the signal $x$ from the measurements $y$. Considerable efforts have been directed towards developing algorithms that perform fast, accurate, and stable reconstruction of $x$ from $y$. As we have seen in previous chapters, a "good" CS matrix $\Phi$ typically satisfies certain geometric conditions, such as the restricted isometry property (Section 3.3) (RIP). Practical algorithms exploit this fact in various ways in order to drive down the number of measurements, enable faster reconstruction, and ensure robustness to both numerical and stochastic errors.

The design of sparse recovery algorithms is guided by various criteria. Some important ones are listed as follows.

Minimal number of measurements. Sparse recovery algorithms should make do with as few measurements as possible; ideally, with the same number of measurements (up to a small constant) required for the stable embedding of $K$-sparse signals.

Robustness to noise. Sparse recovery algorithms must be stable with regards to perturbations of the input signal, as well as noise added to the measurements; both types of errors arise naturally in practical systems.

Speed. Sparse recovery algorithms must strive towards expending minimal computational resources, keeping in mind that many applications in CS deal with very high-dimensional signals.

Performance guarantees. In evaluating recovery algorithms, we will have the same considerations that arose for $\ell_1$ minimization. For example, we can choose to design algorithms that possess instance-optimal or probabilistic guarantees (Section 4.4). We can also choose to focus on algorithm performance for the recovery of exactly $K$-sparse signals $x$, or consider performance for the recovery of general signals $x$. Alternately, we can also consider algorithms that are accompanied by performance guarantees in either the noise-free (Section 4.2) or noisy (Section 4.3) settings.

A multitude of algorithms satisfying some (or even all) of the above have been proposed in the literature. While it is impossible to describe all of them in this chapter, we refer the interested reader to the DSP resources webpage for a more complete listing. Broadly speaking, recovery methods tend to fall under three categories: convex optimization-based approaches (Section 5.2), greedy methods (Section 5.3), and combinatorial techniques (Section 5.4). The rest of the chapter discusses several properties and example algorithms of each flavor of CS reconstruction.
An important class of sparse recovery algorithms (Section 5.1) fall under the purview of convex optimization. Algorithms in this category seek to optimize a convex function $J(x)$ of the unknown variable $x$ over a (possibly unbounded) convex subset of $\mathbb{R}^N$.
5.2.1 Setup

Let $J(x)$ be a convex sparsity-promoting cost function (i.e., $J(x)$ is small for sparse $x$.) To recover a sparse signal representation $\hat{x}$ from measurements $y = \Phi x$, $\Phi \in \mathbb{R}^{M \times N}$, we may either solve

$\min_{x}\ J(x) \quad \text{subject to} \quad y = \Phi x \qquad (5.1)$

when there is no noise, or solve

$\min_{x}\ J(x) \quad \text{subject to} \quad H(\Phi x, y) \le \epsilon \qquad (5.2)$

when there is noise in the measurements, where $H$ is a cost function that penalizes the distance between the vectors $\Phi x$ and $y$. For an appropriate penalty parameter $\lambda$, (5.2) is equivalent to the unconstrained formulation:

$\min_{x}\ J(x) + \lambda\, H(\Phi x, y) \qquad (5.3)$

for some $\lambda > 0$. The parameter $\lambda$ may be chosen by trial and error, or by statistical techniques such as cross-validation [18].

For convex programming algorithms, the most common choices are $J(x) = \|x\|_1$, the $\ell_1$-norm of $x$, and $H(\Phi x, y) = \frac{1}{2}\|\Phi x - y\|_2^2$; with these choices, the resulting problem is known as the Lasso problem. More generally, $J(\cdot)$ acts as a regularization term and can be replaced by other, more complex, functions; for example, the desired signal may be piecewise constant, and simultaneously have a sparse representation under a known basis transform $\Psi$. In this case, we may use a mixed regularization term:

$J(x) = TV(x) + \lambda \|\Psi x\|_1 \qquad (5.4)$
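As a concrete illustration of the constrained formulation with $J(x) = \|x\|_1$, the following sketch solves (5.2) with an $\ell_2$ residual bound using the cvxpy modeling package; the problem sizes, the Gaussian matrix, and the noise level eps are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np
import cvxpy as cp

# Illustrative problem instance (assumed sizes)
rng = np.random.default_rng(0)
M, N, K = 64, 256, 8
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
e = rng.standard_normal(M)
eps = 1e-3
y = Phi @ x_true + eps * e / np.linalg.norm(e)   # noise with norm at most eps

# l1 minimization subject to an l2 constraint on the residual, as in (5.2)
z = cp.Variable(N)
prob = cp.Problem(cp.Minimize(cp.norm1(z)), [cp.norm2(Phi @ z - y) <= eps])
prob.solve()
x_hat = z.value
```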
It might be tempting to use conventional convex optimization packages for the above formulations ((5.1), (5.2), and (5.3)). Nevertheless, the above problems pose two key challenges which are specific to practical problems encountered in CS (Section 1.1): (i) real-world applications are invariably large-scale (an image of even moderate resolution leads to optimization over hundreds of thousands or millions of variables, well beyond the reach of any standard optimization software package); (ii) the objective function is nonsmooth, and standard smoothing techniques do not yield very good results. Hence, conventional algorithms (such as those involving matrix factorizations) are not effective or even applicable. These unique challenges encountered in the context of CS have led to considerable interest in developing improved sparse recovery algorithms in the optimization community.

In the noiseless case, the $\ell_1$-minimization problem (obtained by substituting $J(x) = \|x\|_1$ in (5.1)) can be recast as a linear program (LP). These can be solved in polynomial time (O($N^3$)) using standard interior-point methods [19]. This was the first feasible reconstruction algorithm used for CS recovery and has strong theoretical guarantees, as shown earlier in this course.
In the noisy case, the problem can be recast as a second-order cone program (SOCP) with quadratic constraints. Solving LPs and SOCPs is a principal thrust in optimization research; nevertheless, their application in practical CS problems is limited due to the fact that both the signal dimension $N$ and the number of measurements $M$ can be very large. Note that both LPs and SOCPs correspond to the constrained formulations in (5.1) and (5.2) and are solved using first order interior-point methods.

A newer algorithm called l1_ls [124] is based on an interior-point algorithm that uses a preconditioned conjugate gradient (PCG) method to approximately solve linear systems in a truncated-Newton framework. The algorithm exploits the structure of the Hessian to construct its preconditioner; thus, this is a second order method. Computational results show that about a hundred PCG steps are sufficient for obtaining accurate reconstruction. This method has been typically shown to be slower than first-order methods, but could be faster in cases where the true target signal is highly sparse.
As opposed to interior-point type methods, there exist algorithms that directly solve the unconstrained $\ell_1$-minimization problem

$\min_{x}\ \|x\|_1 + \lambda\, H(x), \qquad (5.5)$

as first proposed and analyzed in [12], [96], [148], [156], and then further studied or extended in [48], [54], [85], [87], [110], [213]. Shrinkage is a classic method used in wavelet-based image denoising. The shrinkage operator on any scalar component can be defined as follows:

$\mathrm{shrink}(t, \alpha) = \begin{cases} t - \alpha, & \text{if } t > \alpha, \\ 0, & \text{if } -\alpha \le t \le \alpha, \\ t + \alpha, & \text{if } t < -\alpha. \end{cases} \qquad (5.6)$
An iterative shrinkage algorithm solves (5.5) by alternating gradient-descent and shrinkage steps: for $i = 1, ..., N$, the $i$th coefficient of $x$ at the $(k+1)$th iteration is given by

$x_i^{k+1} = \mathrm{shrink}\left(\left(x^k - \tau \nabla H\left(x^k\right)\right)_i,\ \tau\lambda\right), \qquad (5.7)$

where $\tau > 0$ serves as a step size for gradient descent (which may vary with the iteration $k$) and $\nabla H$ is the gradient of $H(\cdot)$ evaluated at $x^k$. The work required to pass from $x^k$ to $x^{k+1}$ is dominated by the evaluation of $\nabla H$ at $x^k$; thus each iteration of (5.7) essentially boils down to a small number of matrix-vector multiplications.
The simplicity of the iterative approach is quite appealing, both from a computational as well as a code-design standpoint. Various modifications, enhancements, and generalizations to this approach have been proposed, both to improve the efficiency of the basic iteration in (5.7), and to extend its applicability to various kinds of $J$ [88], [97], [213]. In principle, the basic iteration in (5.7) would not be practically effective without a continuation (or path-following) strategy [110], [213] in which we choose a gradually decreasing sequence of values for the parameter $\lambda$. This procedure is known as Fixed-Point Continuation (FPC) and has been compared favorably with another similar method known as Gradient Projection for Sparse Reconstruction (GPSR) [97] and with l1_ls [124]. A key aspect to solving the unconstrained optimization problem is the choice of the parameter $\lambda$. For noisy measurements, $\lambda$ may be chosen by trial and error; for the noiseless constrained formulation, we may approach the solution of (5.1) by solving the corresponding unconstrained minimization with a large value for $\lambda$.
In the case of recovery from noisy compressive measurements, a commonly used choice for the convex cost function $H(x)$ is the squared $\ell_2$-norm of the residual. Thus we have:

$H(x) = \|y - \Phi x\|_2^2, \qquad \nabla H(x) = -2\Phi^T(y - \Phi x). \qquad (5.8)$

For this particular choice of penalty function, (5.7) reduces to the following iteration:

$x_i^{k+1} = \mathrm{shrink}\left(\left(x^k + 2\tau\, \Phi^T\left(y - \Phi x^k\right)\right)_i,\ \tau\lambda\right), \qquad (5.9)$

which is run until convergence to a fixed point. The algorithm is detailed in pseudocode form below.
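The following is a minimal sketch of the iteration (5.9) in Python; the fixed step size tau and the iteration count are illustrative choices (in practice tau must be small enough, e.g. below $1/(2\|\Phi\|^2)$, for the iteration to converge), and a continuation strategy over lam would be layered on top as described above.

```python
import numpy as np

def shrink(t, alpha):
    # soft-thresholding operator of (5.6), applied componentwise
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

def iterative_shrinkage(Phi, y, lam, tau, n_iter=200):
    # iterative shrinkage for H(x) = ||y - Phi x||_2^2, i.e. the iteration (5.9)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * Phi.T @ (y - Phi @ x)     # gradient of H at the current iterate
        x = shrink(x - tau * grad, tau * lam)   # gradient step followed by shrinkage
    return x
```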
An alternative approach to solving the constrained problem (5.1) uses a sequence of unconstrained subproblems known as Bregman iterations. Each outer iteration updates the "measurements" and then solves an unconstrained problem of the form (5.3):

$y^{k+1} = y^k + \left(y - \Phi x^k\right), \qquad x^{k+1} = \underset{x}{\mathrm{argmin}}\ J(x) + \lambda\, H\left(\Phi x, y^{k+1}\right). \qquad (5.10)$

The problem in the second step can be solved by the algorithms reviewed above. Bregman iterations were introduced in [159] for constrained total variation minimization problems, and were proved to converge for closed, convex functions $J(x)$. For moderate $\lambda > 0$ and $J(x) = \|x\|_1$, the iterations converge to a solution of (5.1) in a small number of outer steps, typically fewer than 5. Compared to the alternate approach that solves (5.1) through directly solving the unconstrained problem in (5.3) with a very large $\lambda$, Bregman iterations are often more stable and sometimes much faster.
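A minimal sketch of the outer loop in (5.10); solve_unconstrained is a hypothetical stand-in for any solver of the unconstrained problem (5.3), for instance the shrinkage iteration sketched above.

```python
import numpy as np

def bregman(Phi, y, lam, solve_unconstrained, n_outer=5):
    # Bregman iterations for the constrained problem (5.1), following (5.10)
    y_k = y.copy()
    x_k = np.zeros(Phi.shape[1])
    for _ in range(n_outer):                 # typically very few outer iterations are needed
        x_k = solve_unconstrained(Phi, y_k, lam)
        y_k = y_k + (y - Phi @ x_k)          # "add back the residual" update of the measurements
    return x_k
```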
5.2.5 Discussion

All the methods discussed in this section optimize a convex function (usually the $\ell_1$-norm) over a convex (possibly unbounded) set. This implies guaranteed convergence to the global optimum. In other words, given that the sampling matrix satisfies the conditions specified in "Signal recovery via $\ell_1$ minimization" (Section 4.1), convex optimization methods will recover the underlying signal $x$. In addition, convex relaxation methods also guarantee stable recovery, whether the problem is posed as the constrained formulation (e.g., the SOCP) or the unconstrained formulation.
5.3.1 Setup

As opposed to solving a (possibly computationally expensive) convex optimization (Section 5.2) program, an alternate flavor to sparse recovery (Section 5.1) is to apply methods of sparse approximation. Recall that the goal of sparse recovery is to recover the sparsest vector $x$ which explains the linear measurements $y$. In other words, we aim to solve the (nonconvex) problem:

$\min_{I}\ \left\{|I| : y = \sum_{i \in I} \phi_i x_i\right\}, \qquad (5.11)$

where $I \subseteq \{1, ..., N\}$ denotes a particular subset of the indices $i = 1, ..., N$, and $\phi_i$ denotes the $i$th column of $\Phi$. It is well known that searching over the power set formed by the columns of $\Phi$ for the optimal subset $I^*$ with smallest cardinality is NP-hard. Instead, classical sparse approximation methods tackle this problem by greedily selecting columns of $\Phi$ and forming successively better approximations to $y$.
5.3.2 Matching Pursuit

Matching Pursuit (MP) is an iterative greedy algorithm that decomposes a signal into a linear combination of elements from a dictionary. In sparse recovery, this dictionary is merely the sampling matrix $\Phi \in \mathbb{R}^{M \times N}$, and we seek a sparse representation $\hat{x}$ of our signal $y$.

MP is conceptually very simple. A key quantity in MP is the residual $r \in \mathbb{R}^M$, which represents the as-yet unexplained portion of the measurements. At each iteration of the algorithm, we select a vector from the dictionary that is maximally correlated with the residual $r$:

$\lambda_k = \underset{\lambda}{\mathrm{argmax}}\ \frac{\langle r_k, \phi_\lambda \rangle}{\|\phi_\lambda\|}. \qquad (5.12)$

Once this column is selected, we possess a better representation of the signal, since a new coefficient indexed by $\lambda_k$ has been added to our signal approximation. Thus, we update both the residual and the approximation as follows:

$r_k = r_{k-1} - \frac{\langle r_{k-1}, \phi_{\lambda_k} \rangle\, \phi_{\lambda_k}}{\|\phi_{\lambda_k}\|^2}, \qquad \hat{x}_{\lambda_k} = \hat{x}_{\lambda_k} + \frac{\langle r_{k-1}, \phi_{\lambda_k} \rangle}{\|\phi_{\lambda_k}\|^2}, \qquad (5.13)$

and repeat the iteration. A suitable stopping criterion is when the norm of $r$ becomes smaller than some specified quantity. MP is described in pseudocode form below.
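A minimal Python sketch of the MP iteration, assuming for simplicity that the columns of Phi have been normalized to unit norm (the stopping tolerance and iteration cap are illustrative):

```python
import numpy as np

def matching_pursuit(Phi, y, n_iter=100, tol=1e-6):
    # Matching Pursuit with unit-norm columns: repeatedly pick the column most
    # correlated with the residual and peel off its contribution, as in (5.12)-(5.13).
    N = Phi.shape[1]
    x_hat = np.zeros(N)
    r = y.astype(float).copy()
    for _ in range(n_iter):
        corr = Phi.T @ r                  # correlations of the residual with every column
        k = int(np.argmax(np.abs(corr)))  # index of the most correlated column
        x_hat[k] += corr[k]               # update the corresponding coefficient
        r -= corr[k] * Phi[:, k]          # remove that column's contribution from the residual
        if np.linalg.norm(r) < tol:       # stop once the residual is small
            break
    return x_hat
```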
Although MP is intuitive and can find an accurate approximation of the signal, it possesses two major drawbacks: (i) it offers no guarantees in terms of recovery error; indeed, it does not exploit the special structure present in the dictionary $\Phi$; (ii) the required number of iterations can be quite large. The complexity of MP is $O(MNT)$ [83], where $T$ is the number of MP iterations.

5.3.3 Orthogonal Matching Pursuit

A variant of MP eliminates much of this redundancy by ensuring that the residual is always orthogonal to every previously selected column. At each iteration, instead of merely subtracting the contribution of the most recently selected column, we project the residual $r$ onto the orthogonal complement of the subspace spanned by all currently selected columns. This quantity thus better represents the unexplained portion of the residual, and is subtracted from $r$ to form a new residual, and the process is repeated. If $\Phi_t$ denotes the submatrix formed by the columns of $\Phi$ selected at time step $t$, the update rules can be written as

$\hat{x}_t = \underset{x}{\mathrm{argmin}}\ \|y - \Phi_t x\|_2, \qquad \alpha_t = \Phi_t \hat{x}_t, \qquad r_t = y - \alpha_t. \qquad (5.14)$

These steps are repeated until convergence. This is known as Orthogonal Matching Pursuit (OMP) [160]. Tropp and Gilbert [191] proved that OMP can be used to recover a sparse signal with high probability using compressive measurements. The algorithm converges in at most $K$ iterations, where $K$ is the sparsity, but requires the added computational cost of orthogonalization at each iteration. Indeed, the total complexity of OMP can be shown to be $O(MNK)$.
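A minimal sketch of OMP in Python; the least-squares solve plays the role of the orthogonal projection in (5.14), and the loop runs for exactly K iterations under the assumption that the target sparsity K is known:

```python
import numpy as np

def omp(Phi, y, K):
    # Orthogonal Matching Pursuit: greedily grow a support set and
    # re-fit all selected coefficients by least squares at every step.
    M, N = Phi.shape
    r = y.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(K):
        k = int(np.argmax(np.abs(Phi.T @ r)))    # column most correlated with the residual
        if k not in support:
            support.append(k)
        coeffs = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
        r = y - Phi[:, support] @ coeffs         # residual is orthogonal to the selected columns
    x_hat = np.zeros(N)
    x_hat[support] = coeffs
    return x_hat
```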
While OMP is provably fast and can be shown to lead to exact recovery, the guarantees accompanying OMP for sparse recovery are weaker than those associated with optimization techniques (Section 4.1). In particular, the reconstruction guarantees are not uniform: it cannot be shown that a single measurement matrix with $M = CK\log N$ measurements can be used to recover every possible $K$-sparse signal. (Uniform guarantees for OMP are possible, but only at the cost of taking more measurements. For example, see [59].) Another issue with OMP is robustness to noise; it is unknown whether the solution obtained by OMP will only be perturbed slightly by the addition of a small amount of noise in the measurements. Nevertheless, OMP is an efficient method for CS recovery, especially when the signal sparsity $K$ is low.
5.3.4 Stagewise Orthogonal Matching Pursuit

Stagewise Orthogonal Matching Pursuit (StOMP) [69] is a better choice than OMP for approximately sparse signals in a large-scale setting. StOMP offers considerable computational advantages over $\ell_1$ minimization and Orthogonal Matching Pursuit for large scale problems with sparse solutions. The algorithm starts with an initial residual $r_0 = y$ and, at the $k$th stage, computes all of the correlations $\Phi^T r_{k-1}$ as in OMP. However, instead of selecting a single column, it selects every column whose correlation with the residual exceeds a specified threshold, and then forms a least-squares estimate of the signal using this expanded set of columns, just as before.

Unlike OMP, the number of iterations in StOMP is fixed and chosen before hand; $S = 10$ is recommended in [69]. In general, the complexity of StOMP is $O(KN\log N)$, a significant improvement over OMP. However, StOMP does not bring in its wake any reconstruction guarantees. StOMP also has moderate memory requirements compared to OMP, where the orthogonalization requires the maintenance of a Cholesky factorization of the dictionary elements.
5.3.5 Compressive Sampling Matching Pursuit (CoSaMP)

The reconstruction guarantees for OMP are weaker than those for $\ell_1$ minimization. Several variations of OMP narrow this gap by assuming a priori knowledge of the sparsity $K$ and selecting multiple columns of the matrix $\Phi$ per iteration; such algorithms are sometimes dubbed greedy-like methods. One variant of such an approach is employed by the CoSaMP algorithm. An interesting feature of CoSaMP is that unlike MP, OMP and StOMP, new indices in a signal estimate can be added as well as removed from the current set of chosen indices, rather than being retained until the end. A brief description of CoSaMP is as follows: at the start of a given iteration $i$, suppose the signal estimate is $\hat{x}_{i-1}$.

- Form signal residual estimate: $e \leftarrow \Phi^T r$.
- Identify the $2K$ largest entries of $e$ and merge their support with that of the current estimate: $T \leftarrow \mathrm{supp}(\mathcal{T}(e, 2K)) \cup \mathrm{supp}(\hat{x}_{i-1})$.
- Form a signal estimate $b$ by subspace projection: $b|_T \leftarrow \Phi_T^{\dagger} y$, $b|_{T^C} \leftarrow 0$.
- Prune: obtain the new signal estimate $\hat{x}_i$ by retaining the $K$ largest entries of $b$.
- Update the measurement residual: $r \leftarrow y - \Phi \hat{x}_i$.
Initialize: x̂_0 ← 0, r ← y, i ← 0
while halting criterion is not satisfied do
1. i ← i + 1
2. e ← Φ^T r {form signal residual estimate}
3. Ω ← supp(T(e, 2K)) {prune signal residual estimate}
4. T ← Ω ∪ supp(x̂_{i−1}) {merge supports}
5. b|_T ← Φ_T^† y, b|_{T^C} ← 0 {form signal estimate}
6. x̂_i ← T(b, K) {prune signal estimate}
7. r ← y − Φ x̂_i {update measurement residual}
end while
return x̂ ← x̂_i
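A minimal Python sketch of the same steps; the least-squares solve stands in for the pseudoinverse $\Phi_T^{\dagger} y$, and the fixed iteration count is an illustrative halting criterion.

```python
import numpy as np

def cosamp(Phi, y, K, n_iter=20):
    # CoSaMP: identify 2K promising columns, merge with the current support,
    # solve least squares on the merged support, then prune back to K entries.
    M, N = Phi.shape
    x_hat = np.zeros(N)
    r = y.astype(float).copy()
    for _ in range(n_iter):
        e = Phi.T @ r                                       # signal residual estimate (proxy)
        Omega = np.argsort(np.abs(e))[-2 * K:]              # support of the 2K largest proxy entries
        T = np.union1d(Omega, np.nonzero(x_hat)[0]).astype(int)
        b = np.zeros(N)
        b[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0] # subspace projection onto merged support
        x_hat = np.zeros(N)
        keep = np.argsort(np.abs(b))[-K:]                   # prune: keep the K largest entries
        x_hat[keep] = b[keep]
        r = y - Phi @ x_hat                                 # update the measurement residual
    return x_hat
```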
As discussed in [153], the key computational issues for CoSaMP are the formation of the signal residual, and
the method used for subspace projection in the signal estimation step. Under certain general assumptions,
the computational cost of CoSaMP can be shown to be $O(MN)$, which is independent of the sparsity of the original signal. This represents an improvement over both greedy algorithms as well as convex methods. While CoSaMP arguably represents the state of the art in sparse recovery algorithm performance, it possesses one drawback: the algorithm requires prior knowledge of the sparsity $K$ of the target signal.
An incorrect choice of input sparsity may lead to a worse guarantee than the actual error incurred by a
weaker algorithm such as OMP. The stability bounds accompanying CoSaMP ensure that the error due to
an incorrect parameter choice is bounded, but it is not yet known how these bounds translate into practice.
5.3.6 Iterative Hard Thresholding

Iterative Hard Thresholding (IHT) is a well-known algorithm for solving nonlinear inverse problems. Starting from an initial estimate $\hat{x}_0$, the algorithm iterates a gradient descent step followed by hard thresholding:

$\hat{x}_{i+1} = \mathcal{T}\left(\hat{x}_i + \Phi^T\left(y - \Phi \hat{x}_i\right),\ K\right), \qquad (5.15)$

where $\mathcal{T}(\cdot, K)$ retains the $K$ largest entries of its argument and sets the rest to zero. In [17], Blumensath and Davies proved that this sequence of iterations converges to a fixed point $\hat{x}$; further, if the matrix $\Phi$ satisfies the RIP, then $\hat{x}$ satisfies an instance-optimality guarantee of the type described earlier (Section 4.1). The guarantees (as well as the proof technique) are reminiscent of the ones that are derived in the development of other algorithms such as ROMP and CoSaMP.
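A minimal sketch of the iteration (5.15); note that convergence requires the operator norm of Phi to be modest (e.g. below 1), which is an assumption of this sketch rather than something the code enforces.

```python
import numpy as np

def iht(Phi, y, K, n_iter=100):
    # Iterative Hard Thresholding: gradient step on ||y - Phi x||_2^2, then keep the K largest entries.
    N = Phi.shape[1]
    x_hat = np.zeros(N)
    for _ in range(n_iter):
        b = x_hat + Phi.T @ (y - Phi @ x_hat)   # gradient step
        x_hat = np.zeros(N)
        keep = np.argsort(np.abs(b))[-K:]       # hard threshold: K largest magnitudes
        x_hat[keep] = b[keep]
    return x_hat
```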
5.3.7 Discussion
While convex optimization techniques are powerful methods for computing sparse representations, there are
also a variety of greedy/iterative methods for solving such problems. Greedy algorithms rely on iterative
approximation of the signal coefficients and support, either by iteratively identifying the support of the
signal until a convergence criterion is met, or alternatively by obtaining an improved estimate of the sparse
signal at each iteration by accounting for the mismatch to the measured data.
Some greedy algorithms can actually be shown to have performance guarantees that match those obtained for convex optimization approaches. In fact, some of the more sophisticated greedy algorithms are remarkably similar to those used for $\ell_1$ minimization described previously (Section 5.2), although the techniques required to prove their performance guarantees are substantially different. There also exist iterative techniques for sparse recovery based on message passing schemes for sparse graphical models; indeed, some greedy algorithms (such as those in [14], [116]) can be directly interpreted as message passing methods [73].
In addition to convex optimization (Section 5.2) and greedy pursuit (Section 5.3) approaches, there is another
important class of sparse recovery algorithms that we will refer to as
combinatorial algorithms.
These
algorithms, mostly developed by the theoretical computer science community, in many cases pre-date the
compressive sensing (Section 1.1) literature but are highly relevant to the sparse signal recovery problem
(Section 5.1).
5.4.1 Setup

The oldest combinatorial algorithms were developed in the context of group testing. In the group testing problem, we suppose that there are $N$ total items, of which an unknown subset of $K$ elements are anomalous and need to be identified. For example, we might wish to identify defective products in an industrial setting, or identify a subset of diseased tissue samples in a medical context. In both of these cases the vector $x$ indicates which elements are anomalous, i.e., $x_i \ne 0$ for the $K$ anomalous elements and $x_i = 0$ otherwise. Our goal is to design a collection of tests that allow us to identify the support (and possibly the values of the nonzeros) of $x$ while also minimizing the number of tests performed. In the simplest practical setting these tests are represented by a binary matrix $\Phi$ whose entries $\phi_{ij}$ are equal to 1 if and only if the $j$th item is used in the $i$th test. If the output of the test is linear with respect to the inputs, then the problem of recovering the vector $x$ is essentially the same as the standard sparse recovery problem.

Another application area in which combinatorial algorithms have proven useful is computation on data streams [49], [149]. Suppose, for example, that $x_i$ represents the number of packets passing through a network router with destination $i$. Rather than storing the enormous vector $x$ directly, one can store a much shorter sketch of linear measurements that is updated as packets stream by; recovering the dominant entries of $x$ from the sketch is again a sparse recovery problem.

Several combinatorial algorithms for sparse recovery have been proposed in the literature. A non-exhaustive list includes Random Fourier Sampling [106], HHS Pursuit [106], and Sparse Sequential Matching Pursuit [13]. We do not provide a full discussion of each of these algorithms; instead, we describe two simple methods that highlight the flavors of combinatorial sparse recovery: count-min and count-median.
5.4.2 The count-min sketch

Define $\mathcal{H}$ as the set of all discrete-valued functions $h : \{1, ..., N\} \to \{1, ..., m\}$. Note that $\mathcal{H}$ is a finite set of size $m^N$. Each function $h \in \mathcal{H}$ can be specified by a binary characteristic matrix $\phi(h)$ of size $m \times N$, with each column being a binary vector with exactly one 1 at the location $j = h(i)$. To construct the overall sampling matrix $\Phi$, we choose $d$ functions $h_1, ..., h_d$ independently from the uniform distribution defined on $\mathcal{H}$, and vertically concatenate their characteristic matrices. Thus, if $M = md$, $\Phi$ is a binary matrix of size $M \times N$ with each column containing exactly $d$ ones.

Now given any signal $x$, we acquire linear measurements $y = \Phi x$. It is easy to visualize the measurements via the following two properties. First, the coefficients of the measurement vector $y$ are naturally grouped according to the "mother" binary functions $\{h_1, ..., h_d\}$. Second, consider the $i$th coefficient of the measurement vector $y$, which corresponds to the mother binary function $h$. Then, the expression for $y_i$ is simply given by:

$y_i = \sum_{j : h(j) = i} x_j. \qquad (5.16)$

In other words, for a fixed signal coefficient index $j$, each measurement $y_i$ as expressed above consists of an observation of $x_j$ corrupted by the other signal coefficients mapped to the same $i$ by the function $h$. Signal recovery essentially consists of estimating the signal values from these "corrupted" observations.
The count-min algorithm is useful in the special case where the entries of the original signal are positive. Given measurements $y$ using the sampling matrix $\Phi$ as constructed above, the estimate of the $j$th signal entry is given by:

$\hat{x}_j = \min_{l}\ \{y_i : h_l(j) = i\}. \qquad (5.17)$

Intuitively, the estimate of $x_j$ is formed by looking at all measurements that consist of $x_j$ corrupted by other signal values, and picking the one with the lowest magnitude. Despite the simplicity of this algorithm, it is accompanied by an arguably powerful instance-optimality guarantee: if $d = C\log N$ and $m = \frac{4}{\alpha}K$, then with high probability the recovered signal $\hat{x}$ satisfies:

$\|x - \hat{x}\|_\infty \le \frac{\alpha}{K}\, \|x - x^*\|_1, \qquad (5.18)$

where $x^*$ represents the best $K$-term approximation of $x$ in the $\ell_1$ sense.
5.4.3 The count-median sketch

For the general setting in which the signal entries may have arbitrary signs, the count-median algorithm is used instead: for each signal coefficient index $j$, we compute the median of all those measurements that are comprised of a corrupted version of $x_j$ and declare it as the signal coefficient estimate, i.e.,

$\hat{x}_j = \mathrm{median}_{l}\ \{y_i : h_l(j) = i\}. \qquad (5.19)$

The recovery guarantees for count-median are similar to those for count-min, with a different value of the failure probability constant. Note that both algorithms were derived under the assumption that the measurements are perfectly noiseless,
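A minimal sketch of the measurement and decoding steps; the sizes N, m, d and the use of explicit hash tables H[l, j] = h_l(j) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, d = 1000, 50, 8
H = rng.integers(0, m, size=(d, N))          # H[l, j] = h_l(j), one hash function per row

def measure(x):
    # y[l, i] = sum of x[j] over all j with h_l(j) = i, as in (5.16)
    y = np.zeros((d, m))
    for l in range(d):
        np.add.at(y[l], H[l], x)
    return y

def count_min(y):
    # valid when the entries of x are nonnegative, as in (5.17)
    return np.min(y[np.arange(d)[:, None], H], axis=0)

def count_median(y):
    # general-purpose estimate, as in (5.19)
    return np.median(y[np.arange(d)[:, None], H], axis=0)
```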
5.4.4 Summary

Although we ultimately wish to recover a sparse signal from a small number of linear measurements in both of these settings, there are some important differences between such settings and the compressive sensing setting studied in this course. First, in these settings it is natural to assume that the designer of the reconstruction algorithm also has full control over $\Phi$, and is thus free to choose $\Phi$ in a manner that reduces the amount of computation required to perform recovery. For example, it is often useful to design $\Phi$ so that it has few nonzeros, i.e., the sensing matrix itself is also sparse [11], [102], [117]. In general, most methods involve careful construction of the sensing matrix (Section 3.1), which is in contrast with the optimization and greedy methods that work with any matrix satisfying a generic condition such as the restricted isometry property (Section 3.3). This additional degree of freedom can lead to significantly faster algorithms [42].

Second, note that the computational complexity of all the convex methods and greedy algorithms described above is always at least linear in $N$, since in order to recover $x$ we must at least incur the computational cost of reading out all $N$ entries of $x$. This may be acceptable in many typical compressive sensing applications, but it becomes impractical when $N$ is extremely large, as in the network monitoring example. In this context, one may seek to develop algorithms whose complexity is linear only in the length of the representation of the signal, i.e., its sparsity $K$. In this case the algorithm does not return a complete reconstruction of $x$ but instead returns only its $K$ largest elements (and their indices). As surprising as it may seem, such algorithms are indeed possible. See [103], [105] for examples.
5.5.1 Setup

Throughout this course, we have almost exclusively worked within a deterministic signal framework. In other words, our signal $x$ is fixed and belongs to a known set of signals. In this section, we depart from this framework and assume that the sparse (Section 2.3) (or compressible (Section 2.4)) signal of interest arises from a known probability distribution, i.e., we assume sparsity-promoting priors on the elements of $x$, and use the measurements $y = \Phi x$ to infer $x$ in a Bayesian fashion.

The algorithms discussed in this section demonstrate a digression from the conventional sparse recovery techniques typically used in compressive sensing (Section 1.1) (CS). We note that none of these algorithms are accompanied by guarantees on the number of measurements required, or the fidelity of signal reconstruction; indeed, in a Bayesian signal modeling framework, there is no well-defined notion of "reconstruction error". However, such methods do provide insight into developing recovery algorithms for rich classes of signals, and may be of considerable practical interest.

5.5.2 Sparse recovery via belief propagation

In channel coding, sparse codes such as LDPC codes have had grand success. The advantage that sparse coding matrices may have in efficient encoding of signals and their low complexity decoding algorithms is transferable to CS encoding and decoding with the use of sparse sensing matrices (Section 3.1) $\Phi$.
Figure 5.1: Factor graph depicting the relationship between the variables involved in CS decoding using
BP. Variable nodes are black and the constraint nodes are white.
A sensing matrix $\Phi$ that defines the relation between the signal $x$ and measurements $y$ can be represented as a bipartite graph of signal coefficient nodes $x(i)$ and measurement nodes $y(i)$ [170], [175]. The factor graph in Figure 5.1 represents the relationship between the signal coefficients and measurements in the CS decoding problem.
The choice of signal probability density is of practical interest. In many applications, the signals of interest need to be modeled as being compressible (as opposed to being strictly sparse). This behavior is modeled by a two-state Gaussian mixture distribution, with each signal coefficient taking either a "large" or "small" coefficient value state. Assuming that the elements of $x$ are i.i.d., it can be shown that small coefficients occur more frequently than the large coefficients. Other distributions besides the two-state Gaussian may also be used to model the coefficients, e.g., the i.i.d. Laplace prior on the coefficients of $x$.

The ultimate goal is to estimate (i.e., decode) $x$, given $y$ and $\Phi$. The decoding problem takes the form of a Bayesian inference problem in which we want to approximate the marginal distributions of each of the $x(i)$ coefficients conditioned on the observed measurements $y(i)$. We can then compute the Maximum Likelihood Estimate (MLE), or the Maximum a Posteriori (MAP) estimates of the coefficients from their distributions. This sort of inference can be solved using a variety of methods; for example, the popular belief propagation method (BP) [170] can be applied to solve for the coefficients approximately. Although exact inference in arbitrary graphical models is an NP-hard problem, inference using BP can be employed when $\Phi$ is sparse enough, i.e., when most of the entries in the matrix are equal to zero.
Another approach to the sparse recovery problem is via Relevance Vector Machines (RVMs). An RVM is essentially a Bayesian learning method that produces sparse classification by linearly weighting a small number of fixed basis functions from a large dictionary of potential candidates (for more details the interested reader may refer to [189], [188]). From the CS perspective, we may view the RVM as a method to determine the elements of a sparse $x$ which linearly weight the basis functions comprising the columns of $\Phi$.

The RVM setup employs a hierarchy of priors; first, a Gaussian prior is assigned to each of the $N$ elements of $x$, and subsequently a Gamma prior is assigned to the inverse-variance $\alpha_i$ of the Gaussian prior on each $x_i$. If $\alpha = (\alpha_1, ..., \alpha_N)$, the Gaussian prior on $x$ is written as

$p(x|\alpha) = \prod_{i=1}^{N} \mathcal{N}\left(x_i\,|\,0, \alpha_i^{-1}\right) \qquad (5.20)$

and the Gamma prior on $\alpha$ is written as:

$p(\alpha|a, b) = \prod_{i=1}^{N} \Gamma\left(\alpha_i\,|\,a, b\right) \qquad (5.21)$
The overall prior on $x$ can be analytically evaluated to be a Student-t distribution, which can be designed to peak sharply at $x_i = 0$ with appropriate choice of the hyperparameters $a$ and $b$; this encourages the desired solution $x$ to be sparse. The RVM approach can be visualized using a graphical model similar to the one in "Sparse recovery via belief propagation" (Section 5.5.2: Sparse recovery via belief propagation). Using the observed measurements $y$, the posterior density on each $x_i$ can then be estimated by an iterative algorithm (e.g., Markov chain Monte Carlo (MCMC) methods). For a detailed analysis of the RVM with a measurement noise prior, refer to [119], [188].

Alternatively, we can eliminate the need to set the hyperparameters $a$ and $b$ as follows. Assuming Gaussian measurement noise with mean 0 and variance $\sigma^2$, we can directly find the marginal log-likelihood for $\alpha$ and maximize it by the EM algorithm (or directly differentiate) to find estimates for $\alpha$:

$L(\alpha) = \log p\left(y\,|\,\alpha, \sigma^2\right) = \log \int p\left(y\,|\,x, \sigma^2\right)\, p(x|\alpha)\, dx. \qquad (5.22)$

A straightforward implementation requires the inversion of an $N \times N$ matrix, which costs $O(N^3)$ operations. However, a constructive (greedy) algorithm for the RVM is available which monotonically maximizes the marginal likelihoods of the priors by a gradient ascent, resulting in an algorithm with complexity $O(NM^2)$. Here, basis functions are sequentially added and deleted, thus building the model up constructively, and the true sparsity of the signal $x$ is exploited to minimize model complexity. This is known as Fast Marginal Likelihood Maximization, and is employed by the Bayesian Compressive Sensing (BCS) algorithm [119] to efficiently evaluate the posterior densities of $x_i$.

A key advantage of the BCS algorithm is that it enables evaluation of "error bars" on each estimated coefficient of $x$; these give us an idea of the (in)accuracies of these estimates. These error bars could be used to adaptively select the linear projections (i.e., the rows of the matrix $\Phi$) to reduce uncertainty in the signal. This provides an intriguing connection between CS and machine learning techniques such as experimental design and active learning [91], [138].
Chapter 6
Many of the sparse recovery algorithms (Section 5.1) we have described so far in this course
were originally
developed to address the problem of sparse linear regression and model selection in statistics. In this setting
we are given some data consisting of a set of input variables and response variables. We will suppose that
there are a total of $N$ input variables, and we observe a total of $M$ input and response pairs. We can represent the set of input variable observations as an $M \times N$ matrix $\Phi$, and the set of response variable observations as an $M \times 1$ vector $y$.

In linear regression, it is assumed that $y$ can be approximated as a linear function of the input variables, i.e., there exists an $x$ such that $y \approx \Phi x$. However, when the number of input variables is large compared to the number of observations ($M \ll N$), this problem becomes ill-posed. In practice it is common that only a few input variables are actually necessary to predict the response variable. In this case the $x$ that we wish to estimate is sparse, and we can apply all of the techniques that we have discussed so far for sparse recovery to estimate $x$. In this setting, not only does sparsity aid us in our recovery of the signal $x$, but the locations of its nonzeros also perform model selection by identifying the most relevant variables in predicting the response.
In communications, error correction refers to mechanisms that can detect and correct errors in the data that appear due to distortion in the transmission channel. Standard approaches for error correction rely on repetition schemes, redundancy checks, or nearest neighbor code search. We consider the particular case in which a signal $x$ with $M$ entries is coded by taking length-$N$ linearly independent codewords $\{\phi_1, ..., \phi_M\}$, with $N > M$, and summing them using the entries of $x$ as coefficients. The transmitted message is therefore $\Phi x$, where $\Phi$ is the matrix that has the codewords for columns, and the channel introduces a sparse error vector $e$, so that the received message is $y = \Phi x + e$.

The techniques developed for sparse recovery (Section 5.1) in the context of compressive sensing (Section 1.1) provide a number of methods to estimate the error vector $e$, and therefore to correct it and recover $x$, when $e$ is sufficiently sparse. To estimate the error, we build a matrix $\Theta$ whose rows span the orthogonal complement of the column span of $\Phi$, i.e., a matrix $\Theta$ that holds $\Theta\Phi = 0$. When such a matrix is obtained, we can modify the received message to obtain

$\tilde{y} = \Theta y = \Theta\Phi x + \Theta e = \Theta e.$

If the matrix $\Theta$ is well-suited for CS (i.e., it satisfies a condition such as the restricted isometry property (Section 3.3)) and $e$ is sufficiently sparse, then the error vector can be accurately estimated from $\tilde{y}$ as $\hat{e}$, and the signal can be recovered as $\hat{x} = \Phi^{\dagger}\hat{y}$ with $\hat{y} = y - \hat{e}$.

As an example, when the codewords $\phi_m$ have random independent and identically distributed sub-Gaussian (Section 7.1) entries, then a $K$-sparse error can be corrected if $M < N - CK\log(N/K)$ for a fixed constant $C$.
Another scenario where compressive sensing (Section 1.1) and sparse recovery algorithms (Section 5.1) can be potentially useful is the context of group testing and the related problem of computation on data streams.

6.3.1 Group testing

Among the historically oldest of all sparse recovery algorithms were those developed in the context of combinatorial group testing. In this problem we suppose that there are $N$ total items and $K$ anomalous elements that we wish to find. For example, we might wish to identify defective products in an industrial setting, or identify a subset of diseased tissue samples in a medical context. In both of these cases the vector $x$ indicates which elements are anomalous, i.e., $x_i \ne 0$ for the $K$ anomalous elements and $x_i = 0$ otherwise. Our goal is to design a collection of tests that allow us to identify the support (and possibly the values of the nonzeros) of $x$ while also minimizing the number of tests performed. In the simplest practical setting these tests are represented by a binary matrix $\Phi$ whose entries $\phi_{ij}$ are equal to 1 if and only if the $j$th item is used in the $i$th test. If the output of the test is linear with respect to the inputs, then the problem of recovering the vector $x$ is essentially the same as the standard sparse recovery problem in compressive sensing.

6.3.2 Computation on data streams

Another application area in which ideas related to compressive sensing have proven useful is computation on data streams [49], [149]. As an example of a typical data streaming problem, suppose that $x_i$ represents the number of packets passing through a network router with destination $i$. Storing $x$ directly is typically infeasible since the total number of possible destinations (represented by a 32-bit IP address) is $N = 2^{32}$. Thus, instead of attempting to store $x$ directly, one can store $y = \Phi x$ where $\Phi$ is an $M \times N$ matrix with $M \ll N$. In this context the vector $y$ is often called a sketch. Note that in this problem $y$ is computed in a different manner than in the compressive sensing context. Specifically, in the network traffic example we do not ever observe $x_i$ directly; rather, we observe increments to $x_i$ (when a packet with destination $i$ passes through the router). Thus we construct $y$ iteratively by adding the $i$th column of $\Phi$ to $y$ each time such an increment occurs, which we can do since $y = \Phi x$ is linear. When the traffic is dominated by a small number of destinations, the vector $x$ is compressible, and recovering $x$ from the sketch $y$ is again essentially the standard sparse recovery problem.
6.4.1 Magnetic resonance imaging

Magnetic resonance imaging (MRI) is a medical imaging technique based on the core principle that protons in water molecules in the human body align themselves in a magnetic field. MRI machines repeatedly pulse magnetic fields to cause water molecules in the human body to disorient and then reorient themselves, which causes a release of detectable radiofrequencies. We assume that the object to be imaged is a collection of voxels. The MRI's magnetic pulses are sent incrementally along a gradient leading to a different phase and frequency encoding for each column and row of voxels respectively. Abstracting away from the technicalities of the physical process, the magnetic field measured in MRI acquisition corresponds to a Fourier coefficient of the imaged object; the object can then be recovered by an inverse Fourier transform. Thus, we can view the MRI as measuring Fourier samples.
A major limitation of the MRI process is the linear relation between the number of measured data
samples and scan times. Long-duration MRI scans are more susceptible to physiological motion artifacts, add
discomfort to the patient, and are expensive [134]. Therefore, minimizing scan time without compromising
image quality is of direct benefit to the medical community.
The theory of compressive sensing (Section 1.1) (CS) can be applied to MR image reconstruction by
exploiting the transform-domain sparsity of MR images [135], [136], [137], [198]. In standard MRI reconstruction, undersampling in the Fourier domain results in aliasing artifacts when the image is reconstructed.
However, when a known transform renders the object image sparse (Section 2.3) or compressible (Section 2.4),
the image can be reconstructed using sparse recovery (Section 5.1) methods. While the discrete cosine and
wavelet transforms are commonly used in CS to reconstruct these images, the use of total variation norm
minimization also provides high-quality reconstruction.
6.4.2 Electroencephalography
Electroencephalography (EEG) and Magnetoencephalography (MEG) are two popular noninvasive methods
to characterize brain function by measuring scalp electric potential distributions and magnetic fields due to neuronal firing. EEG and MEG provide temporal resolution on the millisecond timescale characteristic of
neural population activity and can also help to estimate the current sources inside the brain by solving an
inverse problem [107].
Models for neuromagnetic sources suggest that the underlying activity is often limited in spatial extent.
Based on this idea, algorithms like FOCUSS (Focal Underdetermined System Solution) are used to identify
highly localized sources by assuming a sparse model to solve an underdetermined problem [108].
FOCUSS is a recursive linear estimation procedure, based on a weighted pseudo-inverse solution. The
algorithm assigns a current (with nonlinear current location parameters) to each element within a region so
that the unknown current values can be related linearly to the measurements. The weights at each step are
derived from the solution of the previous iterative step. The algorithm converges to a source distribution
in which the number of parameters required to describe source currents does not exceed the number of
measurements. The initialization determines which of the localized solutions the algorithm converges to.
We now consider the application of compressive sensing (Section 1.1) (CS) to the problem of designing a system that can acquire a continuous-time signal $x(t)$. Specifically, we would like to build an analog-to-digital converter (ADC) that avoids having to sample $x(t)$ at its Nyquist rate when $x(t)$ is sparse. In this context, we will assume that $x(t)$ has some kind of sparse (Section 2.3) structure in the Fourier domain, meaning that it is still bandlimited but that much of the spectrum is empty. We will discuss the different possible signal models for mathematically capturing this structure in greater detail below. For now, the challenge is that our measurement system (Section 3.1) must be built using analog hardware. This imposes severe restrictions on the kinds of operations we can perform.
To be more concrete, since we are dealing with a continuous-time signal $x(t)$, we must also consider continuous-time test functions $\{\phi_j(t)\}_{j=1}^{M}$. We then consider a finite window of time, say $t \in [0, T]$, and would like to collect $M$ measurements of the form

$y[j] = \int_{0}^{T} x(t)\,\phi_j(t)\, dt. \qquad (6.1)$

Building an analog system to collect such measurements will require three main components:

1. hardware for generating the test signals $\phi_j(t)$;
2. $M$ correlators that multiply the signal $x(t)$ with each respective $\phi_j(t)$;
3. $M$ integrators with a zero-valued initial state.

We could then sample and quantize the output of each of the integrators to collect the measurements $y[j]$. Of course, even in this somewhat idealized setting, it should be clear that what we can build in hardware will constrain our choice of $\phi_j(t)$, since we cannot reliably and accurately produce arbitrarily complex $\phi_j(t)$ in analog hardware. Moreover, the architecture described above requires $M$ correlator/integrator pairs operating in parallel, which will be potentially prohibitively expensive both in dollar cost as well as costs such as size, weight, and power (SWAP).
As a result, there have been a number of efforts to design simpler architectures, chiefly by carefully designing structured test functions $\phi_j(t)$. The simplest to describe and historically earliest idea is to choose $\phi_j(t) = \delta(t - t_j)$, where $\{t_j\}_{j=1}^{M}$ denotes a sequence of $M$ locations in time at which we would like to sample the signal $x(t)$. Typically, if the number of measurements we are acquiring is lower than the Nyquist-rate, then these locations cannot simply be uniformly spaced in the interval $[0, T]$, but must be carefully chosen. Note
that this approach simply requires a single traditional ADC with the ability to sample on a non-uniform grid, avoiding the requirement for $M$ parallel correlator/integrator pairs. Such non-uniform sampling systems have been studied in other contexts outside of the CS framework. For example, there exist specialized fast
algorithms for the recovery of extremely large Fourier-sparse signals. The algorithm uses samples at a nonuniform sequence of locations that are highly structured, but where the initial location is chosen using a
(pseudo)random seed. This literature provides guarantees similar to those available from standard CS [101],
[104]. Additionally, there exist frameworks for the sampling and recovery of multi-band signals, whose Fourier
transforms are mostly zero except for a few frequency bands. These schemes again use non-uniform sampling
patterns based on coset sampling [21], [20], [95], [93], [146], [202]. Unfortunately, these approaches are often highly sensitive to jitter, or error in the timing of when the samples are taken.
An architecture that addresses these challenges is the random demodulator, depicted in Figure 6.1. The analog input $x(t)$ is correlated with a pseudorandom square pulse of $\pm 1$'s, called the chipping sequence $p_c(t)$, which alternates between values at a rate of $N_a$ Hz, where $N_a$ is at least the Nyquist rate of $x(t)$. The mixed signal is integrated over a time period $1/M_a$ and sampled by a traditional integrate-and-dump back-end ADC at $M_a$ Hz, with $M_a \ll N_a$. In this case our measurements are given by

$y[j] = \int_{(j-1)/M_a}^{j/M_a} p_c(t)\, x(t)\, dt. \qquad (6.2)$

In practice, data is processed in time blocks of period $T$, and we define $N = N_a T$ as the number of elements in the chipping sequence and $M = M_a T$ as the number of measurements. We will discuss the discretization of this model below, but the key observation is that the correlator and chipping sequence operate at a fast rate, while the back-end ADC operates at a low rate. In hardware it is easier to build a high-rate modulator/chipping sequence combination than a high-rate ADC [130]. In fact, many systems already use components of this front end for binary phase shift keying demodulation, as well as for other conventional communication schemes such as CDMA. (A correlator is also known as a demodulator due to its most common application: demodulating radio signals.)
To understand the discrete-time equivalent of this system, consider the first measurement:

$y[1] = \int_{0}^{1/M_a} p_c(t)\, x(t)\, dt = \sum_{n=1}^{N_a/M_a} p_c[n] \int_{(n-1)/N_a}^{n/N_a} x(t)\, dt. \qquad (6.3)$

But since $N_a$ is the Nyquist-rate of $x(t)$, the integral $\int_{(n-1)/N_a}^{n/N_a} x(t)\, dt$ essentially captures all of the information about $x(t)$ on the interval $[(n-1)/N_a, n/N_a]$; call this value $x[n]$. Thus, we obtain

$y[1] = \sum_{n=1}^{N_a/M_a} p_c[n]\, x[n]. \qquad (6.4)$
In general, our measurement process is equivalent to multiplying the vector of Nyquist-rate samples $x[n]$ by the random sequence of $\pm 1$'s in $p_c[n]$ and then summing every sequential block of $N_a/M_a$ coefficients. We can represent this as a banded matrix $\Phi$ containing $N_a/M_a$ pseudorandom $\pm 1$'s per row. For example, with $N = 12$, $M = 4$, and $T = 1$, such a $\Phi$ has the form

$\Phi = \begin{bmatrix} \pm 1\ \pm 1\ \pm 1 & & & \\ & \pm 1\ \pm 1\ \pm 1 & & \\ & & \pm 1\ \pm 1\ \pm 1 & \\ & & & \pm 1\ \pm 1\ \pm 1 \end{bmatrix}, \qquad (6.5)$

where the signs of the nonzero entries are determined by the chipping sequence and all other entries are zero. In general, $\Phi$ will have $M$ rows, and each row will contain $N_a/M_a$ nonzeros. Note that matrices satisfying this structure are extremely efficient to apply, requiring only $O(N)$ computations, which is useful during recovery.
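A minimal sketch of how such a banded matrix could be generated and applied; the sizes match the N = 12, M = 4 example above, and the random seed is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 12, 4
L = N // M                               # N_a / M_a samples integrated per measurement
pc = rng.choice([-1.0, 1.0], size=N)     # pseudorandom chipping sequence of +/-1's

Phi = np.zeros((M, N))
for j in range(M):
    Phi[j, j * L:(j + 1) * L] = pc[j * L:(j + 1) * L]   # each row: a block of L chipped samples

# y = Phi @ x applies chipping followed by the low-rate integrate-and-dump in one step
```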
A detailed analysis of the random demodulator in [193] studied the properties of these matrices applied to a particular signal model. Specifically, it is shown that if $\Psi$ represents the $N \times N$ normalized discrete Fourier transform (DFT) matrix, then the matrix $\Phi\Psi$ will satisfy the RIP with high probability, provided that

$M = O\left(K \log^2(N/K)\right), \qquad (6.6)$

where the probability is taken with respect to the random choice of $p_c[n]$. This means that if $x(t)$ is a periodic (or finite-length) signal such that once it is sampled it is sparse or compressible in the basis $\Psi$, then it should be possible to recover $x(t)$ from these measurements. Moreover, it is empirically demonstrated that combining $\ell_1$ minimization with the random demodulator can recover $K$-sparse (in $\Psi$) signals with

$M \ge C K \log(N/K + 1) \qquad (6.7)$

measurements where $C \approx 1.7$ [193].

Note that the signal model considered in [193] is somewhat restrictive, since even a pure tone will not yield a sparse DFT unless the frequency happens to be equal to $k/N_a$ for some integer $k$. Perhaps a more realistic signal model is the multi-band signal model of [21], [20], [95], [93], [146], [202], where the signal is assumed to be bandlimited outside of $K$ bands each of bandwidth $B$, where $KB$ is much less than the total possible bandwidth. It remains unknown whether the random demodulator can be exploited to recover such signals. Moreover, there also exist other CS-inspired architectures that we have not explored in this course [3], [167], [195], and this remains an active area of research. We have simply provided an overview of one of the more promising approaches in order to illustrate the potential applicability of the ideas of this course to the problem of analog-to-digital conversion.
6.6.1 Architecture

Several hardware architectures have been proposed that apply the theory of compressive sensing (Section 1.1) (CS) in an imaging setting [80], [143], [165]. We will focus on the so-called single-pixel camera [80], [181], [182], [205], [206]. The single-pixel camera is an optical computer that sequentially measures the inner products $y[j] = \langle x, \phi_j \rangle$ between an $N$-pixel sampled version of the incident light-field from the scene under view (denoted by $x$) and a set of $N$-pixel test functions $\{\phi_j\}_{j=1}^{M}$. The architecture is illustrated in Figure 6.2, and an aerial view of the camera in the lab is shown in Figure 6.3. As shown in these figures, the light-field is focused by a lens (Lens 1 in Figure 6.3) not onto a CCD or CMOS sampling array but rather onto a spatial light modulator (SLM). An SLM modulates the intensity of a light beam according to a control signal. A simple example of a transmissive SLM that either passes or blocks parts of the beam is an overhead transparency. Another example is a liquid crystal display (LCD) projector.
Figure 6.2: Single-pixel camera block diagram. Incident light-field (corresponding to the desired image $x$) is reflected off a digital micromirror device (DMD) array whose mirror orientations are modulated according to the pseudorandom pattern $\phi_j$ supplied by a random number generator. Each different mirror pattern produces a voltage at the single photodiode that corresponds to one measurement $y[j]$.
The Texas Instruments (TI) digital micromirror device (DMD) is a reflective SLM that selectively redirects parts of the light beam. The DMD consists of an array of bacterium-sized, electrostatically actuated micro-mirrors, where each mirror in the array is suspended above an individual static random access memory (SRAM) cell. Each mirror rotates about a hinge and can be positioned in one of two states (±10 degrees from horizontal) according to which bit is loaded into the SRAM cell; thus light falling on the DMD can be reflected in two directions depending on the orientation of the mirrors.
Each element of the SLM corresponds to a particular element of $\phi_j$ (and its corresponding pixel in $x$). For a given $\phi_j$, we can orient the corresponding element of the SLM either towards (corresponding to a 1 at that element of $\phi_j$) or away from (corresponding to a 0 at that element of $\phi_j$) a second lens (Lens 2 in Figure 6.3). This second lens collects the reflected light and focuses it onto a single photon detector (the single pixel) that integrates the product of $x$ and $\phi_j$ to compute the measurement $y[j] = \langle x, \phi_j \rangle$ as its output voltage. Values of $\phi_j$ between 0 and 1 can be obtained by dithering the mirrors back and forth during the photodiode integration time. By reshaping $x$ into a column vector and the $\phi_j$ into row vectors, we can thus model this system as computing the product $y = \Phi x$, where each row of $\Phi$ corresponds to a $\phi_j$. To compute randomized measurements, we set the mirror orientations $\phi_j$ randomly using a pseudorandom number generator, measure $y[j]$, and then repeat the process $M$ times to obtain the measurement vector $y$.
The single-pixel design reduces the required size, complexity, and cost of the photon detector array down to a single unit, which enables the use of exotic detectors that would be impossible in a conventional digital camera. Example detectors include a photomultiplier tube or an avalanche photodiode for low-light (photon-limited) imaging, a sandwich of several photodiodes sensitive to different light wavelengths for multimodal sensing, a spectrometer for hyperspectral imaging, and so on.

In addition to sensing flexibility, the practical advantages of the single-pixel design include the facts that the quantum efficiency of a photodiode is higher than that of the pixel sensors in a typical CCD or CMOS array and that the fill factor of a DMD can reach 90% whereas that of a CCD/CMOS array is only about 50%. An important advantage to highlight is that each CS measurement receives about $N/2$ times more photons than an average pixel sensor, which significantly reduces image distortion from dark noise and read-out noise.

The single-pixel design falls into the class of multiplex cameras. The baseline standard for multiplexing is classical raster scanning, where the test functions $\{\phi_j\}$ turn on each mirror in turn. There are substantial advantages to operating in a CS rather than raster scan mode, including fewer total measurements ($M$ for CS rather than $N$ for raster scan) and significantly reduced dark noise. See [80] for a more detailed discussion of these issues.

Figure 6.4 (a) and (b) illustrates a target object $x$ and a reconstructed image $\hat{x}$ taken by the single-pixel camera using $N = 256 \times 256$ and $M = N/50$ [80]. Figure 6.4 (c) illustrates an $N = 256 \times 256$ color single-pixel photograph of a printout taken under low-light conditions using RGB color filters and a photomultiplier tube with $M = N/10$. In both cases, the images were reconstructed using total variation minimization, which is closely related to wavelet coefficient $\ell_1$ minimization.
63
(a)
(b)
(c)
256 256
M = 1300
a low-light setting using a single photomultiplier tube sensor, RGB color lters, and
M = 6500
random
measurements.
The DMD is programmable, so in principle we could employ arbitrary test functions $\phi_j$. However, even when we restrict the $\phi_j$ to be $\{0,1\}$-valued, storing these patterns for large values of $N$ is impractical. Furthermore, a purely pseudorandom $\Phi$ can be computationally expensive to apply during recovery. Thus, rather than purely random $\Phi$, we can also consider $\Phi$ that admit a fast transform-based implementation, such as randomly subsampled Walsh (Hadamard) matrices. We will suppose that $N$ is a power of 2 and let $W_{\log_2 N}$ denote the $N \times N$ Walsh matrix, defined by setting $W_0 = 1$ and applying the recursion

$W_j = \frac{1}{\sqrt{2}} \begin{bmatrix} W_{j-1} & W_{j-1} \\ W_{j-1} & -W_{j-1} \end{bmatrix}. \qquad (6.8)$

This construction produces an orthonormal $N \times N$ matrix with entries of $\pm 1/\sqrt{N}$ that admits a fast implementation requiring $O(N\log N)$ computations to apply. As examples, note that

$W_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \qquad (6.9)$

and

$W_2 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}. \qquad (6.10)$

We can exploit these constructions as follows. Suppose that $N = 2^B$ and generate $W_B$. Let $I_\Gamma$ denote an $M \times N$ submatrix of the identity obtained by picking a random set of $M$ rows, so that $I_\Gamma W_B$ is the submatrix of $W_B$ consisting of the rows of $W_B$ indexed by $\Gamma$. Furthermore, let $D$ denote a random $N \times N$ permutation matrix. We can generate $\Phi$ as

$\Phi = \frac{1}{2}\left(\sqrt{N}\, I_\Gamma W_B + 1\right) D. \qquad (6.11)$

Note that $\frac{1}{2}\left(\sqrt{N}\, I_\Gamma W_B + 1\right)$ merely rescales and shifts $I_\Gamma W_B$ to have $\{0,1\}$-valued entries, and recall that each row of $\Phi$ will be reshaped into a 2-D matrix of numbers that is then displayed on the DMD array. Furthermore, $D$ can be thought of as either permuting the pixels or permuting the columns of $W_B$. This step adds some additional randomness since some of the rows of the Walsh matrix are highly correlated with coarse scale wavelet basis functions, but permuting the pixels eliminates this structure. Note that at this point we do not have any strict guarantees that such a $\Phi$ combined with a wavelet basis will yield a product satisfying the restricted isometry property (Section 3.3), but this approach seems to work well in practice.
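A minimal sketch of the construction in (6.11); scipy's hadamard routine is used for the Walsh/Hadamard matrix (N must be a power of 2), and the sizes and random seed are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
N, M = 256, 64                                   # N must be a power of 2
W = hadamard(N) / np.sqrt(N)                     # orthonormal Walsh/Hadamard matrix, entries +/- 1/sqrt(N)
rows = rng.choice(N, size=M, replace=False)      # random row subset (the I_Gamma selection)
perm = rng.permutation(N)                        # random pixel permutation (the matrix D)

Phi = 0.5 * (np.sqrt(N) * W[rows] + 1.0)         # rescale and shift to {0,1}-valued mirror patterns
Phi = Phi[:, perm]                               # apply the permutation to the columns
```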
Standard digital color images of a scene of interest consist of three components — red, green and blue — which contain the intensity level for each of the pixels in three different groups of wavelengths. This concept has been extended in the hyperspectral and multispectral imaging sensing modalities, where the image of interest is a three-dimensional datacube with two spatial dimensions $x$ and $y$ and one spectral dimension $\lambda$, giving the intensity of the light observed at each point of the scene at different wavelengths. An example datacube is shown in Figure 6.5. Each of its entries is called a voxel. We also define a pixel's spectral signature as the stacking of its voxels along the spectral dimension, $f_{x,y} = \{f(x, y, \lambda)\}_{\lambda}$. The spectral signature of a pixel can give a wealth of information about the corresponding point in the scene that is not captured by its color. For example, using spectral signatures, it is possible to identify the type of material observed (for example, vegetation vs. ground vs. water), or its chemical composition.

Datacubes are high-dimensional, since the standard number of pixels present in a digitized image is multiplied by the number of spectral bands desired. However, considerable structure is present in the observed data. The spatial structure common in natural images is also observed in hyperspectral imaging, while each pixel's spectral signature is usually smooth.
Compressive sensing (Section 1.1) (CS) architectures for hyperspectral imaging perform lower-dimensional projections that multiplex in the spatial domain, the spectral domain, or both. One example extends the single-pixel camera of the previous section: the same digital micromirror device (DMD) provides reflectivity for wavelengths from near infrared to near ultraviolet, while the single photodiode is replaced by a sensor that records the desired spectral bands separately. Thus, by converting the datacube into a vector sorted by spectral band, the matrix that operates on the data to obtain the CS measurements is represented as

$\Phi = \begin{bmatrix} \Phi_{x,y} & & & \\ & \Phi_{x,y} & & \\ & & \ddots & \\ & & & \Phi_{x,y} \end{bmatrix}. \qquad (6.12)$

This architecture performs multiplexing only in the spatial domain, i.e. dimensions $x$ and $y$, since there is no mixing of the different spectral bands along the dimension $\lambda$.
Figure 6.6: Single-pixel hyperspectral camera, in which the photodiode of the single-pixel camera is replaced by a spectrometer that captures the modulated light intensity for all spectral bands, for each of the CS measurements.
An alternative family of architectures, the coded aperture snapshot spectral imagers (Figure 6.7 and Figure 6.8), first shears the datacube with a dispersive element and then masks it with a coded aperture, whose effect is to "punch holes" in the sheared datacube by blocking certain pixels of light. Subsequently, a second dispersive element acts on the masked, sheared datacube; however, this element shears in the opposite direction, effectively inverting the shearing of the first dispersive element. The resulting datacube is upright, but features "sheared" holes of datacube voxels that have been masked out. The resulting modified datacube is then received by a sensor array, which flattens the spectral dimension by measuring the sum of all the wavelengths received; the received light field resembles the target image, allowing for optical adjustments such as focusing. In this way, the measurements consist of full sampling in the spatial $x$ and $y$ dimensions, with multiplexing along the spectral dimension $\lambda$.
Figure 6.7: Dual disperser coded aperture snapshot spectral imager (DD-CASSI). (a) Schematic of the DD-CASSI components. (b) Illustration of the datacube processing performed by the components.
Figure 6.8: Single disperser coded aperture snapshot spectral imager (SD-CASSI). (a) Schematic of the SD-CASSI components. (b) Illustration of the datacube processing performed by the components.
A reconstruction
algorithm then searches for the signal of lowest complexity (i.e., with the fewest dyadic squares) that generates
compressive measurements close to those observed [99].
Figure 6.9: Example dyadic square partition for piecewise spatially constant datacube.
A sparsifying basis for the datacube that accounts only for the spatial structure within each spectral band applies a standard 2-D sparsifying basis $\Psi_{x,y}$ separately to each band:

$\Psi = \begin{bmatrix} \Psi_{x,y} & & & \\ & \Psi_{x,y} & & \\ & & \ddots & \\ & & & \Psi_{x,y} \end{bmatrix}. \qquad (6.13)$

To additionally exploit the smoothness of the spectral signatures, one can instead use a Kronecker product of a spectral sparsifying basis $\Psi_\lambda$ and the spatial basis $\Psi_{x,y}$, whose blocks are scaled copies of $\Psi_{x,y}$:

$\Psi = \Psi_\lambda \otimes \Psi_{x,y} = \begin{bmatrix} \Psi_\lambda[1,1]\,\Psi_{x,y} & \Psi_\lambda[1,2]\,\Psi_{x,y} & \cdots \\ \Psi_\lambda[2,1]\,\Psi_{x,y} & \Psi_\lambda[2,2]\,\Psi_{x,y} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}. \qquad (6.14)$
In this manner, the datacube sparsity basis simultaneously enforces both spatial and spectral structure, potentially achieving a sparsity level lower than the sum of the spatial sparsities for the separate spectral slices, depending on the level of structure between them and how well this structure can be captured through sparsity.
6.7.3 Summary
Compressive sensing (Section 1.1) will make the largest impact in applications with very large, high-dimensional datasets that exhibit considerable amounts of structure. Hyperspectral imaging is a leading example of such applications; the sensor architectures and data structure models surveyed in this module show initial promising work in this new direction, enabling new ways of simultaneously sensing and compressing such data. For standard sensing architectures, the data structures surveyed also enable new transform coding-based compression schemes.
A powerful data model for many applications is the geometric notion of a low-dimensional manifold. Data that possesses merely $K$ intrinsic degrees of freedom can be assumed to lie on a $K$-dimensional manifold in the high-dimensional ambient space. Once the manifold model is identified, any point on it can be represented using essentially $K$ pieces of information. For instance, suppose a stationary camera of resolution $N$ observes a truck moving down along a straight line on a highway. Then, the set of images captured by the camera forms a 1-dimensional manifold in the image space $\mathbb{R}^N$.
Figure 6.10: (a) A rotating cube has 3 degrees of freedom, thus giving rise to a 3-dimensional manifold in image space. (b) Illustration of a manifold parametrized by a $K$-dimensional vector $\theta$.
In many applications, it is beneficial to explicitly characterize the structure (alternately, identify the parameters) of the manifold formed by a set of observed signals. This is known as manifold learning and has been the subject of considerable study over the last several years; well-known manifold learning algorithms include Isomap [185], LLE [169], and Hessian eigenmaps [72]. As an informal example, if a 2-dimensional manifold were to be imagined as the surface of a twisted sheet of rubber, manifold learning can be described as the process of unraveling the sheet and stretching it out on a 2D flat surface. Figure 6.11 indicates the performance of Isomap on a simple 2-dimensional dataset comprising images of a translating disk.
Figure 6.11: (a) Input data: images of a disk translating in $K = 2$ dimensions, parametrized by $\theta = (\theta_1, \theta_2)$. (b) True values of the underlying parameters. (c) Isomap embedding learned from the original data in $\mathbb{R}^N$.

Remarkably, the geometric structure of such an image ensemble is approximately preserved when the images are mapped to a lower-dimensional space by a small number of linear, nonadaptive random projections. This is analogous to the stable embedding of sparse (Section 2.3) signals (see "The restricted isometry property" (Section 3.3)); however, the difference is that the number of projections required to preserve the ensemble structure does not depend on the sparsity of the individual images, but rather on the dimension of the underlying manifold.
This result has far reaching implications; it suggests that a wide variety of signal processing tasks can be performed directly on the random projections, with significant reductions in storage and processing costs. In particular, this enables provably efficient manifold learning in the projected domain [113]. Figure 6.12 illustrates the performance of Isomap on the translating disk dataset under varying numbers of random projections.
Figure 6.12: Isomap embeddings learned from random projections of the 625 images of shifting squares. (a) 25 random projections; (b) 50 random projections; (c) 25 random projections; (d) full data.
The advantages of random projections extend even to cases where the original data is available in the
ambient space
RN .
For example, consider a wireless network of cameras observing a static scene. The set
of images captured by the cameras can be visualized as living on a low-dimensional manifold in the image
space. To perform joint image analysis, the following steps might be executed:
1. Collate: Each camera node transmits its respective captured image (of size $N$) to a central processing unit.
2. Preprocess: The central processor estimates the intrinsic dimension $K$ of the underlying image manifold.
3. Learn: The central processor performs a nonlinear embedding of the data points — for instance, using Isomap [185] — into a $K$-dimensional Euclidean space, using the estimate of $K$ from the previous step.

In situations where $N$ is large, the dimensionality of the images becomes a communication and computation bottleneck. One way to reduce this burden is to perform nonlinear image compression (such as JPEG) at each node before transmitting to the central processing unit. However, this requires a good deal of processing power at each sensor, and the compression would have to be undone during the learning step, thus adding to overall computational costs.
As an alternative, every camera could encode its image by computing (either directly or indirectly) a small
number of random projections to communicate to the central processor [57]. These random projections are
obtained by linear operations on the data, and thus are cheaply computed. Clearly, in many situations it will
be less expensive to store, transmit, and process such randomly projected versions of the sensed images. The
method of random projections is thus a powerful tool for ensuring the stable embedding of low-dimensional
manifolds into an intermediate space of reasonable size. It is now possible to think of settings involving a
huge number of low-power devices that inexpensively capture, store, and transmit a very small number of
measurements of high-dimensional data.
6.9 Inference using compressive measurements
While the compressive sensing (Section 1.1) (CS) literature has focused almost exclusively on problems in signal reconstruction/approximation (Section 5.1), this is frequently not necessary. For instance, in many signal processing applications (including computer vision, digital communications and radar systems), signals are acquired only for the purpose of making a detection or classification decision. Tasks such as detection do not require a reconstruction of the signal, but only require estimates of the relevant sufficient statistics for the problem at hand.

There are two ways of performing inference from compressive measurements:

1. Reconstruct the full data using standard sparse recovery (Section 5.1) techniques and apply standard computer vision/inference algorithms on the reconstructed images.
2. Develop algorithms that operate directly on the compressive measurements, without ever reconstructing the full data.

A key property that enables the second approach is the information scalability property of compressive measurements. This property arises from the following two observations: random measurements provide a coarse, global summary of the observed signal, and the number of measurements required depends on the nature of the inference task. Informally, we observe that more sophisticated tasks require more measurements.

We examine three possible inference problems for which algorithms that operate directly on the compressive measurements can be developed: detection (deciding whether a signal of interest is present in the observed signal), classification (assigning the observed signal to one of two (or more) signal classes), and parameter estimation (calculating a function of the observed signal).
6.9.1 Detection
In detection one simply wishes to answer the question: is a (known) signal present in the observations? To solve this problem, it suffices to estimate a relevant sufficient statistic. Based on a concentration of measure inequality, it is possible to show that such sufficient statistics for a detection problem can be accurately estimated from random projections, where the quality of this estimate depends on the signal-to-noise ratio (SNR) [55]. We make no assumptions on the signal of interest s, and hence we can build systems capable of detecting s even when it is not known in advance. Thus, we can use random projections for dimensionality reduction in the detection setting.

Figure 6.13: (a)-(c) Probability of error to reconstruct and detect chirp signals embedded in strong narrowband interference (top) and in strong sinusoidal interference (bottom), as a function of the number of measurements.
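As a concrete illustration of detection from compressive measurements, the following Python sketch (not from the original module; the signal model, matrix scaling, and threshold are illustrative assumptions) estimates the matched-filter statistic ⟨x, s⟩ directly from y = Φx and compares it against a threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 1000, 50                        # ambient dimension, number of measurements
s = np.sin(2 * np.pi * 0.05 * np.arange(N) ** 1.2)   # known chirp-like signal
s /= np.linalg.norm(s)

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N)) # random sensing matrix

def detect(y, Phi, s, threshold):
    """Compressive matched filter: estimate the sufficient statistic <x, s>
    from y = Phi x as <y, Phi s> and compare against a threshold."""
    stat = y @ (Phi @ s)               # proxy for <x, s>, accurate up to JL-type distortion
    return stat > threshold

# Monte Carlo estimate of detection and false-alarm rates under additive noise.
sigma, trials, thr = 0.1, 2000, 0.5
hits = sum(detect(Phi @ (s + sigma * rng.normal(size=N)), Phi, s, thr) for _ in range(trials))
false_alarms = sum(detect(Phi @ (sigma * rng.normal(size=N)), Phi, s, thr) for _ in range(trials))
print(hits / trials, false_alarms / trials)
```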
6.9.2 Classification
Similarly, random projections have long been used for a variety of classification and clustering problems. The Johnson-Lindenstrauss Lemma is often exploited in this setting to compute approximate nearest neighbors, which is naturally related to classification. The key result that random projections result in an isometric embedding allows us to generalize this work to several new classification algorithms and settings [55].

Classification can also be performed when more elaborate models are used for the different classes. Suppose the signal/image class of interest can be modeled as a low-dimensional manifold (Section 6.8) in the ambient space. In such a case it can be shown that, even under random projections, certain geometric properties of the signal class are preserved up to a small distortion; for example, interpoint Euclidean (ℓ2) distances are preserved [10]. This enables the design of classification algorithms in the projected domain. One such algorithm is known as the smashed filter [56]. As an example, under equal distribution among classes and a Gaussian noise setting, the smashed filter is equivalent to building a nearest-neighbor (NN) classifier in the measurement domain. Further, it has been shown that for a K-dimensional manifold, M = O(K log N) measurements are sufficient to perform reliable compressive classification. Thus, the number of measurements scales as the dimension of the signal class, as opposed to the sparsity of the individual signals.
Figure 6.14: Results for smashed filter image classification and parameter estimation experiments. (a) Classification rates and (b) average estimation error for varying number of measurements M and noise levels.

As the number of measurements M increases, the distances between the projected manifolds increase as well, thus increasing the noise tolerance and enabling more accurate estimation and classification. Thus, the classification and estimation performances improve as the noise level decreases and M increases.
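The nearest-neighbor interpretation of the smashed filter can be sketched as follows. This is illustrative Python only; the per-class manifolds are replaced here by noisy copies of fixed class templates, which is a simplifying assumption. Training examples are compressively measured once, and a test signal is classified by its nearest neighbor in the measurement domain.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n_classes, n_train = 400, 30, 3, 50

# Class templates; training/test signals are noisy copies (a crude stand-in
# for points sampled from per-class manifolds).
templates = rng.normal(size=(n_classes, N))
def sample(c, sigma=0.3):
    return templates[c] + sigma * rng.normal(size=N)

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))

# Nearest-neighbor rule in the measurement domain: compare Phi x against
# compressively measured training examples.
train_X = np.array([sample(c) for c in range(n_classes) for _ in range(n_train)])
train_y = np.repeat(np.arange(n_classes), n_train)
train_meas = train_X @ Phi.T

def classify(x):
    y = Phi @ x
    dists = np.linalg.norm(train_meas - y, axis=1)
    return train_y[np.argmin(dists)]

test = [(sample(c), c) for c in range(n_classes) for _ in range(100)]
acc = np.mean([classify(x) == c for x, c in test])
print("compressive NN accuracy:", acc)
```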
6.9.3 Estimation
Consider a signal x ∈ R^N, and suppose that we wish to estimate some function f(x) of the signal, but we only observe the measurements y = Φx, where Φ is again an M × N matrix. The data streaming literature has previously analyzed this problem for many common functions, such as linear functions, ℓ_p norms, and histograms. These estimates are often based on so-called sketches, which can be interpreted as random projections. As an example, in the case where f is a linear function, one can show that the estimation error (relative to the norms of x and f) can be bounded by a constant determined by M. This result holds for a wide class of random matrices, and can be viewed as a straightforward consequence of the same concentration of measure inequality (Section 7.2) that has proven useful for CS and in proving the JL Lemma [55].
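A minimal sketch of this idea for a linear function f(x) = ⟨w, x⟩ is shown below (illustrative Python, assuming Gaussian sketching with entry variance 1/M): since E[ΦᵀΦ] = I, the inner product ⟨Φw, Φx⟩ is an unbiased estimate of ⟨w, x⟩ whose relative error shrinks on the order of 1/√M.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 2000, 100
x = rng.normal(size=N)                       # signal
w = rng.normal(size=N)                       # linear functional f(x) = <w, x>

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
y = Phi @ x                                  # sketch / compressive measurements

estimate = (Phi @ w) @ y                     # <Phi w, Phi x> approximates <w, x>
print(estimate, w @ x)                       # close up to O(1/sqrt(M)) relative error
```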
Parameter estimation can also be performed when the signal class is modeled as a low-dimensional manifold. Suppose an observed signal x can be parameterized by a K-dimensional parameter vector θ, where K ≪ N. Then, it can be shown that with O(K log N) measurements, the parameter vector can be accurately estimated directly from the compressive measurements; results obtained via this approach are shown in Figure 6.14(b).
6.10 Compressive sensor networks
Sparse (Section 2.3) and compressible (Section 2.4) signals are present in many sensor network applications, such as environmental monitoring, signal field recording and vehicle surveillance. Compressive sensing (Section 1.1) (CS) has many properties that make it attractive in these settings, such as its low-complexity sensing and compression, its universality and its graceful degradation. CS is robust to noise, and allows querying more nodes to obtain further detail on signals as they become interesting. Packet drops also do not harm the network nearly as much as in many other protocols, causing only a marginal loss for each measurement not obtained by the receiver. As the network becomes more congested, data can be scaled back smoothly. Thus CS can enable the design of generic compressive sensors that perform random or incoherent projections.
Several methods for using CS in sensor networks have been proposed. Decentralized methods pass data
throughout the network, from neighbor to neighbor, and allow the decoder to probe any subset of nodes.
In contrast, centralized methods require all information to be transmitted to a centralized data center, but
reduce either the amount of information that must be transmitted or the power required to do so. We briefly summarize each class below.
In randomized gossiping, each sensor communicates its data to a random set of nodes, in each stage aggregating and forwarding the observations received to a new set of random nodes. In essence, a spatial dot product is being performed as each node collects and aggregates information, compiling a sum of the weighted samples to obtain CS measurements that become more accurate as more rounds of random gossiping occur. To recover the data, a basis that provides data sparsity (or at least compressibility) is required, as well as the random projections used. However, this information does not need to be known while the data is being passed.
The method can also be applied when each sensor observes a compressible signal. In this case, each sensor computes multiple random projections of the data and transmits them using randomized gossiping to the rest of the network. A potential drawback of this technique is the amount of storage required per sensor, as it could be considerable for large networks. An alternative is to have the decoder query only a subset of the sensors, where each group of sensors of a certain size will be known to contain CS measurements for all the data in the network. To maintain a constant error as the network size grows when the data is partitioned in this fashion, the number of transmissions becomes of order kMn², where k is the number of values desired from each sensor and n is the number of nodes in the network. The results can be improved by using geographic gossiping algorithms [63].
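The aggregation step of randomized gossip can be simulated as below. This is illustrative Python only; the pairwise-averaging protocol and round count are assumptions and not the specific protocols cited above. Each node starts with its weighted contribution φ_i x_i, and repeated pairwise averaging drives every node's state toward (1/n)Φx, from which any node can report the CS measurements.

```python
import numpy as np

rng = np.random.default_rng(4)
n, M = 100, 20                          # number of nodes, number of CS measurements
x = rng.normal(size=n)                  # one scalar reading per node
Phi = rng.normal(size=(M, n))           # random projection coefficients

# Node i starts with its weighted contribution Phi[:, i] * x[i]; randomized pairwise
# averaging drives every node's state toward the network average (1/n) * Phi @ x.
state = Phi * x                         # M x n, column i is node i's current estimate
for _ in range(20000):                  # gossip rounds (pairwise exchanges)
    i, j = rng.integers(n, size=2)
    state[:, i] = state[:, j] = 0.5 * (state[:, i] + state[:, j])

y_node0 = n * state[:, 0]               # any single node can now report ~ Phi @ x
print(np.linalg.norm(y_node0 - Phi @ x) / np.linalg.norm(Phi @ x))
```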
Centralized methods reduce the amount of computation that each node must perform, in order to reduce overall power consumption [208]. In one such scheme, each node computes a small number of sparse random projections of its data, passing along information to only a small set of other nodes; the resulting CS measurement matrix is sparse, since only L ≪ N nodes contribute to each measurement. Such sparse random projections can still be used as CS measurements with quality similar to that of full random projections. Since the CS measurement matrix formed by the data nodes is sparse, a relatively small amount of communication is performed by each encoding node and the overall power required for transmission is reduced.
Another centralized approach is compressive wireless sensing (CWS), in which the nodes simultaneously transmit analog-modulated versions of their readings so that the projections are aggregated over the air at the central location by the receiving antenna, with further noise being added. In this way, the fusion center receives the CS measurements, from which it can perform reconstruction using knowledge of the random projections.

A drawback of this method is the required accurate synchronization. Although CWS constrains the power of each node, it also relies on constructive interference to increase the power received by the data center. The nodes themselves must be accurately synchronized to know when to transmit their data. In addition, CWS assumes that the nodes are all at approximately equal distances from the fusion center, an assumption that is acceptable only when the receiver is far away from the sensor network. Mobile nodes could also increase the complexity of the transmission protocols. Interference or path issues also would have a large effect on CWS, limiting its applicability.

If these limitations are addressed for a suitable application, CWS does offer great power benefits when very little is known about the data beyond sparsity in a fixed basis, achieving a distortion that scales as M^(-2α/(2α+1)), where α is some positive constant based on the network structure. With much more a priori information about the sensed data, other methods will achieve distortions proportional to M^(-2α).
6.11 Genomic sensing
Biosensing of pathogens is a research area of high consequence. An accurate and rapid biosensing paradigm has the potential to impact several fields, including healthcare, defense and environmental monitoring. In this module we address the concept of biosensing based on compressive sensing (Section 1.1) (CS) via the Compressive Sensing DNA Microarray (CSM).

DNA microarrays are a frequently applied solution for microbe sensing; they have a significant edge over competitors due to their ability to sense many organisms in parallel [128], [171]. A DNA microarray consists of genetic sensors, or spots, each containing DNA sequences termed probes. On a microarray, each DNA sequence can be viewed as a sequence of four DNA bases {A, T, G, C} that tend to bind with one another in complementary pairs: A with T and G with C. Consequently, a DNA subsequence in a target organism's genetic sample will tend to bind or hybridize with its complementary subsequence on a microarray to form a stable structure. The target DNA sample to be identified is fluorescently tagged before it is flushed over the microarray. The extraneous DNA is washed away so that only the bound DNA is left on the array. The array is then scanned using laser light of a wavelength designed to trigger fluorescence in the spots where binding has occurred. A specific pattern of array spots will fluoresce, which is then used to infer the genetic makeup in the test sample.
Figure 6.15: Cartoon of a traditional DNA microarray showing strong and weak hybridization of the target DNA at different probe spots; each probe is designed to uniquely identify only one target of interest (each spot contains multiple copies of a probe for robustness).
The first concern with this design is that very often the targets in a test sample have similar base sequences, causing them to hybridize with the wrong probe (see Figure 6.15). These cross-hybridization events lead to errors in the array readout. Current microarray design methods do not address cross-matches between similar DNA sequences.

The second concern in choosing unique identifier based DNA probes is its restriction on the number of organisms that can be identified. In typical biosensing applications multiple organisms must be identified; therefore a large number of DNA targets requires a microarray with a large number of spots. In fact, there are over 1000 known harmful microbes, many with more than 100 strains. The processing speed of microarray data is directly related to its number of spots, representing a significant problem for commercial deployment of microarray-based biosensors. As a consequence, readout systems for traditional DNA arrays cannot be miniaturized or implemented using electronic components and require complicated fluorescent tagging.

The third concern is the inefficient utilization of the large number of array spots in traditional microarrays. Although the number of potential agents in a sample is very large, not all are expected to be present in significant concentrations at any given time.
Therefore, in a traditionally designed microarray only a small fraction of spots will be active at a given time,
corresponding to the few targets present.
To combat these problems, a Compressive Sensing DNA Microarray (CSM) uses combinatorial testing
sensors in order to reduce the number of sensor spots [145], [174], [176]. Each spot in the CSM identifies a group of target organisms, and several spots together generate a unique pattern identifier for a single target.
(See also "Group testing and data stream algorithms" (Section 6.3).) Designing the probes that perform
this combinatorial sensing is the essence of the microarray design process, and what we aim to describe in
this module.
To obtain a CS-type measurement scheme, we can choose each probe in a CSM to be a group identifier such that the readout of each probe is a probabilistic combination of all the targets in its group. The probabilities are representative of each probe's hybridization affinity (or stickiness) to those targets in its group; the targets that are not in its group have low affinity to the probe. The readout signal at each spot of the microarray is a linear combination of hybridization affinities between its probe sequence and each of the target agents.

Figure 6.16: The CSM sensing process, with M spots identifying N targets.

Figure 6.16 illustrates the sensing process. To formalize, we assume a microarray with M spots identifying N targets. Let x_j denote the concentration of target j, and let φ_{i,j} denote the affinity between probe i and target j. The readout at spot i, for 1 ≤ i ≤ M, is

y_i = Σ_{j=1}^{N} φ_{i,j} x_j = φ_i x,

so that collectively y = Φx.
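A toy simulation of this measurement model is sketched below. This is illustrative Python only: the affinity matrix Phi is random rather than a biologically designed probe set, and recovery is performed with a nonnegative ℓ1 (linear programming) decoder rather than the belief propagation decoder discussed in this module.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
M, N, K = 40, 150, 3                    # spots, candidate targets, targets present

# Hypothetical affinity matrix: each probe has nonzero affinity to a small group of targets.
Phi = (rng.random((M, N)) < 0.1) * rng.uniform(0.5, 1.0, size=(M, N))

x = np.zeros(N)                         # target concentrations (sparse, nonnegative)
x[rng.choice(N, K, replace=False)] = rng.uniform(1.0, 5.0, size=K)

y = Phi @ x                             # array readout

# Recover concentrations: minimize sum(x) subject to Phi x = y, x >= 0 (a linear program).
res = linprog(c=np.ones(N), A_eq=Phi, b_eq=y, bounds=(0, None))
print("true support     :", np.flatnonzero(x))
print("recovered support:", np.flatnonzero(res.x > 1e-6))
```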
While group testing has previously been proposed for microarrays [172], the sparsity in the target signal is key in applying CS. The chief advantage of a CS-based approach over regular group testing is in its information scalability: we are able not only to detect which targets are present, but also to estimate their concentrations. This matters because there may exist minute quantities of certain pathogens in the environment, but it is only their large concentrations that may be harmful to us. Furthermore, we are able to use CS recovery methods such as Belief Propagation (Section 5.5) that decode the concentration vector x while accounting for noise in the measurements.
Chapter 7
Appendices
7.1 Sub-Gaussian random variables
A number of distributions, notably Gaussian and Bernoulli, are known to satisfy certain concentration of
measure (Section 7.2) inequalities. We will analyze this phenomenon from a more general perspective by
considering the class of sub-Gaussian distributions [22].
Definition 7.1:
A random variable X is called sub-Gaussian if there exists a constant c > 0 such that

E(exp(Xt)) ≤ exp(c²t²/2)     (7.1)

holds for all t ∈ R. We use the notation X ∼ Sub(c²) to denote that X satisfies (7.1).

The function E(exp(Xt)) is the moment-generating function of X, while the upper bound in (7.1) is the moment-generating function of a Gaussian random variable. Thus, a sub-Gaussian distribution is one whose moment-generating function is bounded by that of a Gaussian. There are a tremendous number of sub-Gaussian distributions, but there are two particularly important examples:

Example 7.1
If X ∼ N(0, σ²), i.e., X is a zero-mean Gaussian random variable with variance σ², then X ∼ Sub(σ²). Indeed, as mentioned above, the moment-generating function of a Gaussian is given by E(exp(Xt)) = exp(σ²t²/2), and thus (7.1) is trivially satisfied.

Example 7.2
If X is a zero-mean, bounded random variable, i.e., one such that there exists a constant B with |X| ≤ B with probability 1, then X ∼ Sub(B²).
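The bound (7.1) is easy to check numerically. The snippet below (illustrative Python) compares the empirical moment-generating function of a Rademacher variable (±1 with equal probability, a bounded and hence sub-Gaussian variable with c = 1) against the Gaussian bound exp(t²/2).

```python
import numpy as np

# Empirically check the sub-Gaussian MGF bound E[exp(Xt)] <= exp(c^2 t^2 / 2)
# for a Rademacher variable, which satisfies it with c = 1.
rng = np.random.default_rng(6)
samples = rng.choice([-1.0, 1.0], size=200000)
for t in [0.1, 0.5, 1.0, 2.0]:
    mgf = np.mean(np.exp(samples * t))      # empirical E[exp(Xt)] (equals cosh(t))
    bound = np.exp(t ** 2 / 2)
    print(t, mgf, bound, mgf <= bound)
```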
A common way to characterize sub-Gaussian random variables is through analyzing their moments.
We
consider only the mean and variance in the following elementary lemma, proven in [22].
Lemma 7.1: (Buldygin-Kozachenko [22])
If X ∼ Sub(c²), then

E(X) = 0     (7.2)

and

E(X²) ≤ c².     (7.3)
In particular, if X ∼ Sub(c²) then E(X²) ≤ c². In some settings it will be useful to consider a more restrictive class of random variables for which this inequality becomes an equality.
Definition 7.2:
A random variable X is called strictly sub-Gaussian if X ∼ Sub(σ²) where σ² = E(X²), i.e., the inequality

E(exp(Xt)) ≤ exp(σ²t²/2)     (7.4)

holds for all t ∈ R. To denote that X is strictly sub-Gaussian with variance σ², we will use the notation X ∼ SSub(σ²).

Example 7.3
If X ∼ N(0, σ²), then X ∼ SSub(σ²).
Example 7.4
If X ∼ U(−1, 1), i.e., X is uniformly distributed on [−1, 1], then X ∼ SSub(1/3).

Example 7.5
Now consider the random variable with distribution such that

P(X = 1) = P(X = −1) = (1 − s)/2,   P(X = 0) = s,   s ∈ [0, 1).     (7.5)

For any s ∈ [0, 2/3], X ∼ SSub(1 − s). For s ∈ (2/3, 1), X is not strictly sub-Gaussian.
We now provide an equivalent characterization for sub-Gaussian and strictly sub-Gaussian random variables, proven in [22], that illustrates their concentration of measure behavior.

Theorem 7.1: (Buldygin-Kozachenko [22])
A random variable X ∼ Sub(c²) if and only if there exist a t₀ ≥ 0 and a constant a > 0 such that

P(|X| ≥ t) ≤ 2 exp(−t²/(2a²))     (7.6)

for all t ≥ t₀. Moreover, if X ∼ SSub(σ²), then (7.6) holds for all t > 0 with a = σ.
Finally, sub-Gaussian distributions also satisfy one of the fundamental properties of a Gaussian distribution: the sum of two sub-Gaussian random variables is itself a sub-Gaussian random variable. This result is established in more generality in the following lemma.

Lemma 7.2:
Suppose that X = [X_1, X_2, ..., X_N], where each X_i is independent and identically distributed (i.i.d.) with X_i ∼ Sub(c²). Then for any α ∈ R^N, ⟨X, α⟩ ∼ Sub(c² ‖α‖₂²). Similarly, if each X_i ∼ SSub(σ²), then for any α ∈ R^N, ⟨X, α⟩ ∼ SSub(σ² ‖α‖₂²).

Proof:
Since the X_i are independent, we obtain

E(exp(t Σ_{i=1}^N α_i X_i)) = E(Π_{i=1}^N exp(t α_i X_i)) = Π_{i=1}^N E(exp(t α_i X_i)) ≤ Π_{i=1}^N exp(c² (α_i t)²/2) = exp((Σ_{i=1}^N α_i²) c² t²/2),     (7.7)

and hence ⟨X, α⟩ ∼ Sub(c² ‖α‖₂²). In the strictly sub-Gaussian case we have c² = σ², and since E(⟨X, α⟩²) = σ² ‖α‖₂², it follows that ⟨X, α⟩ ∼ SSub(σ² ‖α‖₂²).
7.2 Concentration of measure for sub-Gaussian random variables
Sub-Gaussian distributions (Section 7.1) have a close relationship to the concentration of measure phenomenon [131]. To illustrate this, we note that we can combine Lemma 2 and Theorem 1 from "Sub-Gaussian
random variables" (Section 7.1) to obtain deviation bounds for weighted sums of sub-Gaussian random variables. For our purposes, however, it will be more enlightening to study the
norm of a vector of sub-Gaussian random variables. In particular, if X is a vector whose entries X_i are i.i.d. with X_i ∼ Sub(c²), then we would like to know how ‖X‖₂ deviates from its mean.
In order to establish the result, we will make use of Markov's inequality for nonnegative random variables.

Lemma 7.3: (Markov's Inequality)
For any nonnegative random variable X and t > 0,

P(X ≥ t) ≤ E(X)/t.

Proof:
Let f(x) denote the probability density function for X. Then

E(X) = ∫₀^∞ x f(x) dx ≥ ∫_t^∞ x f(x) dx     (7.8)

≥ ∫_t^∞ t f(x) dx = t P(X ≥ t).     (7.9)
In addition, we will require the following bound on the exponential moment of a sub-Gaussian random variable.

Lemma 7.4:
Suppose X ∼ Sub(c²). Then

E(exp(λX²/(2c²))) ≤ 1/√(1 − λ)     (7.10)

for any λ ∈ [0, 1).

Proof:
First, observe that if λ = 0, then the lemma holds trivially. Thus, suppose that λ ∈ (0, 1). Let f(x) denote the probability density function for X. Since X is sub-Gaussian, we have by definition that

∫ exp(tx) f(x) dx ≤ exp(c²t²/2)     (7.11)

for any t ∈ R. If we multiply by exp(−c²t²/(2λ)), then we obtain

∫ exp(tx − c²t²/(2λ)) f(x) dx ≤ exp(c²t²(λ − 1)/(2λ)).     (7.12)

Now, integrating both sides with respect to t, we obtain

∫ (∫ exp(tx − c²t²/(2λ)) dt) f(x) dx ≤ ∫ exp(c²t²(λ − 1)/(2λ)) dt,     (7.13)

which reduces to

(1/c) √(2πλ) ∫ exp(λx²/(2c²)) f(x) dx ≤ (1/c) √(2πλ/(1 − λ)).     (7.14)

Dividing both sides by (1/c)√(2πλ) yields the desired result.
We now state our main theorem, which generalizes the results of [53] and uses substantially the same proof technique.

Theorem 7.2:
Suppose that X = [X_1, X_2, ..., X_M], where each X_i is i.i.d. with X_i ∼ Sub(c²) and E(X_i²) = σ². Then

E(‖X‖₂²) = Mσ².     (7.15)

Moreover, for any α ∈ (0, 1) and for any β ∈ [c²/σ², β_max], there exists a constant κ* ≥ 4 depending only on β_max and the ratio σ²/c² such that

P(‖X‖₂² ≤ αMσ²) ≤ exp(−M(1 − α)²/κ*)     (7.16)

and

P(‖X‖₂² ≥ βMσ²) ≤ exp(−M(β − 1)²/κ*).     (7.17)
Proof:
Since the X_i are i.i.d., we obtain

E(‖X‖₂²) = Σ_{i=1}^M E(X_i²) = Σ_{i=1}^M σ² = Mσ²,     (7.18)

and hence (7.15) holds. We now turn to (7.16) and (7.17). Let us first consider (7.17). We begin by applying Markov's inequality:

P(‖X‖₂² ≥ βMσ²) = P(exp(λ‖X‖₂²) ≥ exp(λβMσ²))
≤ E(exp(λ‖X‖₂²)) / exp(λβMσ²)
= Π_{i=1}^M E(exp(λX_i²)) / exp(λβMσ²).     (7.19)

Since X_i ∼ Sub(c²), we have from Lemma 7.4 that

E(exp(λX_i²)) ≤ 1/√(1 − 2λc²).     (7.20)

Thus,

Π_{i=1}^M E(exp(λX_i²)) ≤ (1/(1 − 2λc²))^{M/2},     (7.21)

and hence

P(‖X‖₂² ≥ βMσ²) ≤ (exp(−2λβσ²)/(1 − 2λc²))^{M/2}.     (7.22)

The value of λ that minimizes the right-hand side of (7.22) is

λ = (βσ² − c²)/(2c²βσ²).     (7.23)
Plugging (7.23) into (7.22), we obtain

P(‖X‖₂² ≥ βMσ²) ≤ ((βσ²/c²) exp(1 − βσ²/c²))^{M/2}.     (7.24)

Similarly, one can show that

P(‖X‖₂² ≤ αMσ²) ≤ ((ασ²/c²) exp(1 − ασ²/c²))^{M/2}.     (7.25)

Now set

κ* = max(4, 2(β_max σ²/c² − 1)² / ((β_max σ²/c² − 1) − log(β_max σ²/c²))).     (7.26)

Then for any γ ∈ (0, β_max σ²/c²] we have the bound

log(γ) ≤ (γ − 1) − 2(γ − 1)²/κ*,     (7.27)

and hence

γ exp(−(γ − 1)) ≤ exp(−2(γ − 1)²/κ*).     (7.28)

By setting γ = βσ²/c² and combining (7.28) with (7.24), one establishes (7.17); similarly, setting γ = ασ²/c² and combining (7.28) with (7.25) establishes (7.16).
This result tells us that given a vector with entries drawn according to a sub-Gaussian distribution, we can expect the norm of the vector to concentrate around its expected value of Mσ² with exponentially high probability as M grows. Note, however, that the range of allowable β in (7.17) is restricted to β ≥ c²/σ², so for a general sub-Gaussian distribution we may not be able to achieve an arbitrarily tight concentration. However, recall that for strictly sub-Gaussian distributions we have that c² = σ², in which case there is no such restriction. Moreover, for strictly sub-Gaussian distributions we also have the following useful corollary.
Corollary 7.1:
Suppose that X = [X_1, X_2, ..., X_M], where each X_i is i.i.d. with X_i ∼ SSub(σ²). Then

E(‖X‖₂²) = Mσ²     (7.29)

and for any ε > 0,

P(| ‖X‖₂² − Mσ² | ≥ εMσ²) ≤ 2 exp(−Mε²/κ*)     (7.30)

with κ* = 2/(1 − log(2)) ≈ 6.52.

Proof:
Since each X_i ∼ SSub(σ²), we have that X_i ∼ Sub(σ²) and E(X_i²) = σ², in which case we may apply Theorem 7.2 with α = 1 − ε and β = 1 + ε. This allows us to simplify and combine the bounds (7.16) and (7.17) into (7.30). The value of κ* follows from the observation that 1 + ε ≤ 2, so that we may set β_max = 2.

Note that Corollary 7.1, p. 85 exploits the strictness in the strictly sub-Gaussian distribution twice: first to ensure that β ∈ (1, 2] is an admissible range, and second to keep κ* independent of the ratio σ²/c². One could easily establish a similar corollary for non-strictly sub-Gaussian vectors but for a more restricted range of ε, provided that c²/σ² < 2. However, since most of the distributions of interest here are indeed strictly sub-Gaussian, we do not pursue this route.

This result generalizes the main results of [1] to the broader family of general strictly sub-Gaussian distributions via a much simpler proof.
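A quick numerical check of Corollary 7.1 is given below (illustrative Python; the constant κ* = 2/(1 − log 2) is taken from (7.30) as reconstructed above). It draws vectors with i.i.d. N(0, 1) entries, which are strictly sub-Gaussian with σ² = 1, and compares the empirical deviation probability of ‖X‖₂² from Mσ² with the bound.

```python
import numpy as np

rng = np.random.default_rng(7)
M, trials, eps = 200, 20000, 0.2
X = rng.normal(size=(trials, M))            # i.i.d. N(0,1) entries: SSub(1)
sq_norms = np.sum(X ** 2, axis=1)

empirical = np.mean(np.abs(sq_norms - M) >= eps * M)
kappa_star = 2.0 / (1.0 - np.log(2.0))      # ~ 6.52, as in (7.30)
bound = 2.0 * np.exp(-M * eps ** 2 / kappa_star)
print(empirical, bound)                     # empirical frequency should not exceed the bound
```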
Corollary 7.2:
Suppose that Φ is an M × N matrix whose entries φ_ij are i.i.d. with φ_ij ∼ SSub(1/M). Let Y = Φx for x ∈ R^N. Then for any ε > 0, and any x ∈ R^N,

E(‖Y‖₂²) = ‖x‖₂²     (7.31)

and

P(| ‖Y‖₂² − ‖x‖₂² | ≥ ε‖x‖₂²) ≤ 2 exp(−Mε²/κ*)     (7.32)

with κ* = 2/(1 − log(2)) ≈ 6.52.

Proof:
Let φ_i denote the i-th row of Φ. Observe that if Y_i denotes the i-th element of Y, then Y_i = ⟨φ_i, x⟩, and by Lemma 7.2, Y_i ∼ SSub(‖x‖₂²/M). The results then follow by applying Corollary 7.1 to the vector Y.
7.3 Proof of the RIP for sub-Gaussian matrices

We now show how to exploit the concentration of measure (Section 7.2) properties of sub-Gaussian distributions (Section 7.1) to provide a simple proof that sub-Gaussian matrices satisfy the restricted isometry property (Section 3.3) (RIP). Specifically, we wish to show that by constructing an M × N matrix Φ at random with M sufficiently large, then with high probability there exists a δ_K ∈ (0, 1) such that

(1 − δ_K) ‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ_K) ‖x‖₂²     (7.33)

holds for all x ∈ Σ_K (the set of all signals with at most K nonzeros).

We begin by observing that if all we require is that δ_{2K} > 0, then we may set M = 2K and draw a Φ according to a Gaussian distribution, or indeed any continuous univariate distribution. In this case, with probability 1, any set of 2K columns of Φ will be linearly independent, so that δ_{2K} < 1; however, this argument gives no control over how close δ_{2K} may be to 1, and more measurements are needed to guarantee a prescribed constant. To ensure that the matrix will satisfy the RIP, we will impose two conditions on the random distribution. First, we require that the distribution is sub-Gaussian. In order to simplify our argument, we will use the simpler results stated in Corollary 2 from "Concentration of measure for sub-Gaussian random variables" (Section 7.2), which we briefly recall.
Corollary 7.3:
Suppose that Φ is an M × N matrix whose entries φ_ij are i.i.d. with φ_ij ∼ SSub(1/M). Let Y = Φx for x ∈ R^N. Then for any ε > 0, and any x ∈ R^N,

E(‖Y‖₂²) = ‖x‖₂²     (7.34)

and

P(| ‖Y‖₂² − ‖x‖₂² | ≥ ε‖x‖₂²) ≤ 2 exp(−Mε²/κ*)     (7.35)

with κ* = 2/(1 − log(2)) ≈ 6.52.

Note that we restrict our attention here to the case where the distribution of the entries of Φ is strictly sub-Gaussian. This is done simply to yield more concrete constants. The argument could easily be modified to establish a similar result for general sub-Gaussian distributions by instead using Theorem 2 from "Concentration of measure for sub-Gaussian random variables" (Section 7.2).
Our second condition is that we require that the distribution yield a matrix that is approximately norm-preserving, which will require that

E(φ_ij²) = 1/M,     (7.36)

and hence the variance is 1/M.
We shall now show how the concentration of measure inequality in Corollary 7.3, p.
86 can be used
together with covering arguments to prove the RIP for sub-Gaussian random matrices. Our general approach
will be to construct nets of points in each K-dimensional subspace, apply the concentration of measure inequality to these points via a union bound, and then extend the result from our finite set of points to all possible K-dimensional signals.
Thus, in order to prove the result, we will require the following upper bound on the number of points required
to construct the nets of points. (For an overview of results similar to Lemma 7.5, p. 87 and of various related
concentration of measure results, we refer the reader to the excellent introduction of [5].)
Lemma 7.5:
Let ε ∈ (0, 1) be given. There exists a set of points Q with ‖q‖₂ ≤ 1 for all q ∈ Q and |Q| ≤ (3/ε)^K such that for any x ∈ R^K with ‖x‖₂ ≤ 1 there is a point q ∈ Q satisfying ‖x − q‖₂ ≤ ε.

Proof:
We construct Q in a greedy fashion. We first select an arbitrary point q_1 ∈ R^K with ‖q_1‖₂ ≤ 1. We then continue adding points to Q so that at step i we add a point q_i ∈ R^K with ‖q_i‖₂ ≤ 1 which satisfies ‖q_i − q_j‖₂ > ε for all j < i. This continues until we can add no more points (and hence for any x ∈ R^K with ‖x‖₂ ≤ 1 there is a point q ∈ Q satisfying ‖x − q‖₂ ≤ ε). Now we wish to bound |Q|. Observe that if we center balls of radius ε/2 at each point in Q, then these balls are disjoint and lie within a ball of radius 1 + ε/2. Thus, if B^K(r) denotes a ball of radius r in R^K, then

|Q| · Vol(B^K(ε/2)) ≤ Vol(B^K(1 + ε/2))     (7.37)

and hence

|Q| ≤ Vol(B^K(1 + ε/2)) / Vol(B^K(ε/2)) = (1 + ε/2)^K / (ε/2)^K ≤ (3/ε)^K.     (7.38)

We now turn to our main theorem, which is based on the proof given in [8].
Theorem 7.3:
Fix δ ∈ (0, 1). Let Φ be an M × N random matrix whose entries φ_ij are i.i.d. with φ_ij ∼ SSub(1/M). If

M ≥ κ₁ K log(N/K),     (7.39)

then Φ satisfies the RIP of order K with the prescribed δ with probability exceeding 1 − 2e^{−κ₂M}, where κ₁ is arbitrary and κ₂ = δ²/(2κ*) − log(42e/δ)/κ₁.
Proof:
First note that it is enough to prove (7.33) in the case ‖x‖₂ = 1, since Φ is linear. Next, fix an index set T ⊆ {1, 2, ..., N} with |T| = K, and denote by X_T the K-dimensional subspace of signals supported on T. Choose a finite set of points Q_T ⊆ X_T with ‖q‖₂ ≤ 1 for all q ∈ Q_T such that, for all x ∈ X_T with ‖x‖₂ ≤ 1,

min_{q ∈ Q_T} ‖x − q‖₂ ≤ δ/14.     (7.40)

From Lemma 7.5, we can choose such a set Q_T with |Q_T| ≤ (42/δ)^K. We then repeat this process for each possible index set T, and collect all the sets Q_T together:

Q = ∪_{T : |T|=K} Q_T.     (7.41)

There are (N choose K) possible index sets T. We can bound this number by

(N choose K) = N(N−1)(N−2)···(N−K+1)/K! ≤ N^K/K! ≤ (eN/K)^K,     (7.42)

where the last inequality follows since from Stirling's approximation we have K! ≥ (K/e)^K. Hence |Q| ≤ (42eN/(δK))^K. Since the entries of Φ are drawn according to a strictly sub-Gaussian distribution, from Corollary 7.3, p. 86 we have (7.35). We next use the union bound to apply (7.35) to this set of points with ε = δ/√2, with the result that, with probability exceeding

1 − 2(42eN/(δK))^K e^{−Mδ²/(2κ*)},     (7.43)

we have

(1 − δ/√2) ‖q‖₂² ≤ ‖Φq‖₂² ≤ (1 + δ/√2) ‖q‖₂²,  for all q ∈ Q.     (7.44)

We observe that if M satisfies (7.39), then

K log(42eN/(δK)) = K log(N/K) + K log(42e/δ) ≤ M log(42e/δ)/κ₁,     (7.45)

and thus (7.43) exceeds 1 − 2e^{−κ₂M} as desired.

We now define A as the smallest number such that

‖Φx‖₂ ≤ √(1 + A) ‖x‖₂,  for all x ∈ Σ_K, ‖x‖₂ ≤ 1.     (7.46)

Our goal is to show that A ≤ δ. For any x ∈ Σ_K with ‖x‖₂ ≤ 1, we can pick a q ∈ Q satisfying (7.40) and such that x − q ∈ Σ_K (since if x ∈ X_T, we can pick q ∈ Q_T ⊆ X_T). In this case we have

‖Φx‖₂ ≤ ‖Φq‖₂ + ‖Φ(x − q)‖₂ ≤ √(1 + δ/√2) + √(1 + A) · δ/14.     (7.47)

Since by definition A is the smallest number for which (7.46) holds, we obtain √(1 + A) ≤ √(1 + δ/√2) + √(1 + A) · δ/14. Therefore

√(1 + A) ≤ √(1 + δ/√2) / (1 − δ/14) ≤ √(1 + δ),     (7.48)

as desired. We have proved the upper inequality in (7.33). The lower inequality follows from this since

‖Φx‖₂ ≥ ‖Φq‖₂ − ‖Φ(x − q)‖₂ ≥ √(1 − δ/√2) − √(1 + δ) · δ/14 ≥ √(1 − δ),     (7.49)

which completes the proof.
Above we proved that the RIP holds with high probability when the matrix Φ is drawn according to a strictly sub-Gaussian distribution. However, we are often interested in signals that are sparse or compressible in some orthonormal basis Ψ ≠ I, in which case we would like the matrix ΦΨ to satisfy the RIP. In this setting it is easy to see that by choosing our nets of points in the K-dimensional subspaces spanned by sets of K columns of Ψ, Theorem 7.3 will establish the RIP for ΦΨ for Φ drawn from a sub-Gaussian distribution. This universality of Φ with respect to the sparsity-inducing basis was initially observed for the Gaussian distribution (based on symmetry arguments), but we can now see is a property of more general sub-Gaussian distributions. Indeed, it follows that with high probability such a Φ will simultaneously satisfy the RIP with respect to an exponential number of fixed bases.
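The following Monte Carlo sketch (illustrative Python; sampling random sparse vectors only gives a lower estimate of the true restricted isometry constant) illustrates both the norm preservation of Φ on K-sparse vectors and the universality property: the same Φ is tested against signals that are sparse in the canonical basis and in a DCT basis Ψ.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(8)
N, M, K, trials = 256, 80, 5, 2000

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))   # SSub(1/M) Gaussian entries
Psi = dct(np.eye(N), norm='ortho', axis=0)               # an orthonormal basis (DCT)

def worst_case_distortion(A):
    """Empirical estimate of max | ||A x||_2^2 / ||x||_2^2 - 1 | over random K-sparse x."""
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        idx = rng.choice(N, K, replace=False)
        x[idx] = rng.normal(size=K)
        ratio = np.sum((A @ x) ** 2) / np.sum(x ** 2)
        worst = max(worst, abs(ratio - 1.0))
    return worst

print("canonical basis :", worst_case_distortion(Phi))
print("DCT-sparse basis:", worst_case_distortion(Phi @ Psi))
```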
7.4 `_1 minimization proof

We now establish one of the core lemmas that we will use throughout this course. Specifically, Lemma 7.9, p.
90 is used in establishing the relationship between the RIP and the NSP (Section 3.4) as well as establishing
results on
`1
minimization (Section 4.1) in the context of sparse recovery in both the noise-free (Section 4.2)
and noisy (Section 4.3) settings. In order to establish Lemma 7.9, p. 90, we establish the following preliminary
lemmas.
Lemma 7.6:
Suppose u, v are orthogonal vectors. Then

‖u‖₂ + ‖v‖₂ ≤ √2 ‖u + v‖₂.     (7.50)

Proof:
We begin by defining the 2 × 1 vector w = [‖u‖₂, ‖v‖₂]^T. By applying standard bounds on ℓ_p norms (Lemma 1 from "The RIP and the NSP" (Section 3.4)) with K = 2, we have ‖w‖₁ ≤ √2 ‖w‖₂. From this we obtain

‖u‖₂ + ‖v‖₂ ≤ √2 √(‖u‖₂² + ‖v‖₂²).     (7.51)

Since u and v are orthogonal, ‖u‖₂² + ‖v‖₂² = ‖u + v‖₂², which yields the desired result.
Lemma 7.7:
If Φ satisfies the RIP of order 2K, then for any pair of vectors u, v ∈ Σ_K with disjoint supports,

|⟨Φu, Φv⟩| ≤ δ_{2K} ‖u‖₂ ‖v‖₂.     (7.52)

Proof:
Suppose u, v ∈ Σ_K with disjoint support and that ‖u‖₂ = ‖v‖₂ = 1. Then, u ± v ∈ Σ_{2K} and ‖u ± v‖₂² = 2. Using the RIP we have

2(1 − δ_{2K}) ≤ ‖Φ(u ± v)‖₂² ≤ 2(1 + δ_{2K}).     (7.53)

Finally, applying the parallelogram identity,

|⟨Φu, Φv⟩| ≤ (1/4) |‖Φ(u + v)‖₂² − ‖Φ(u − v)‖₂²| ≤ δ_{2K},     (7.54)

which establishes the lemma for unit-norm u, v; the general case follows by the bilinearity of the inner product.
Lemma 7.8:
Let Λ₀ be an arbitrary subset of {1, 2, ..., N} such that |Λ₀| ≤ K. For any vector u ∈ R^N, define Λ₁ as the index set corresponding to the K entries of u_{Λ₀ᶜ} with largest magnitude, Λ₂ as the index set corresponding to the next K largest entries, and so on. Then

Σ_{j≥2} ‖u_{Λ_j}‖₂ ≤ ‖u_{Λ₀ᶜ}‖₁ / √K.     (7.55)

Proof:
We begin by observing that for j ≥ 2,

‖u_{Λ_j}‖_∞ ≤ ‖u_{Λ_{j−1}}‖₁ / K,     (7.56)

since the Λ_j sort u to have decreasing magnitude. Applying standard bounds on ℓ_p norms (Lemma 1 from "The RIP and the NSP" (Section 3.4)), we have

Σ_{j≥2} ‖u_{Λ_j}‖₂ ≤ √K Σ_{j≥2} ‖u_{Λ_j}‖_∞ ≤ (1/√K) Σ_{j≥1} ‖u_{Λ_j}‖₁ = ‖u_{Λ₀ᶜ}‖₁ / √K,     (7.57)

proving the lemma.
We are now in a position to prove our main result. The key ideas in this proof follow from [28].

Lemma 7.9:
Suppose that Φ satisfies the RIP of order 2K. Let Λ₀ be an arbitrary subset of {1, 2, ..., N} such that |Λ₀| ≤ K, and let h ∈ R^N be given. Define Λ₁ as the index set corresponding to the K entries of h_{Λ₀ᶜ} with largest magnitude, and set Λ = Λ₀ ∪ Λ₁. Then

‖h_Λ‖₂ ≤ α ‖h_{Λ₀ᶜ}‖₁ / √K + β |⟨Φh_Λ, Φh⟩| / ‖h_Λ‖₂,     (7.58)

where

α = √2 δ_{2K} / (1 − δ_{2K}),   β = 1 / (1 − δ_{2K}).     (7.59)

Proof:
Since h_Λ ∈ Σ_{2K}, the lower bound of the RIP immediately yields

(1 − δ_{2K}) ‖h_Λ‖₂² ≤ ‖Φh_Λ‖₂².     (7.60)
Define Λ_j as in Lemma 7.8, so that h_{Λᶜ} = Σ_{j≥2} h_{Λ_j}. Since Φh_Λ = Φh − Σ_{j≥2} Φh_{Λ_j}, we can write

‖Φh_Λ‖₂² = ⟨Φh_Λ, Φh⟩ − ⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩.     (7.61)

In order to bound the second term of (7.61), we use Lemma 7.7, p. 89, which implies that

|⟨Φh_{Λ_i}, Φh_{Λ_j}⟩| ≤ δ_{2K} ‖h_{Λ_i}‖₂ ‖h_{Λ_j}‖₂     (7.62)

for any i, j. Furthermore, Lemma 7.6 yields ‖h_{Λ₀}‖₂ + ‖h_{Λ₁}‖₂ ≤ √2 ‖h_Λ‖₂. Substituting into (7.62) we obtain

|⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩| = |Σ_{j≥2} ⟨Φh_{Λ₀}, Φh_{Λ_j}⟩ + Σ_{j≥2} ⟨Φh_{Λ₁}, Φh_{Λ_j}⟩|
≤ Σ_{j≥2} |⟨Φh_{Λ₀}, Φh_{Λ_j}⟩| + Σ_{j≥2} |⟨Φh_{Λ₁}, Φh_{Λ_j}⟩|
≤ δ_{2K} ‖h_{Λ₀}‖₂ Σ_{j≥2} ‖h_{Λ_j}‖₂ + δ_{2K} ‖h_{Λ₁}‖₂ Σ_{j≥2} ‖h_{Λ_j}‖₂
≤ √2 δ_{2K} ‖h_Λ‖₂ Σ_{j≥2} ‖h_{Λ_j}‖₂.     (7.63)

From Lemma 7.8, this reduces to

|⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩| ≤ √2 δ_{2K} ‖h_Λ‖₂ ‖h_{Λ₀ᶜ}‖₁ / √K.     (7.64)

Combining (7.60), (7.61), and (7.64), we obtain

(1 − δ_{2K}) ‖h_Λ‖₂² ≤ ⟨Φh_Λ, Φh⟩ − ⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩
≤ |⟨Φh_Λ, Φh⟩| + |⟨Φh_Λ, Σ_{j≥2} Φh_{Λ_j}⟩|
≤ |⟨Φh_Λ, Φh⟩| + √2 δ_{2K} ‖h_Λ‖₂ ‖h_{Λ₀ᶜ}‖₁ / √K.     (7.65)

Dividing both sides by (1 − δ_{2K}) ‖h_Λ‖₂ yields (7.58), which completes the proof.
Glossary

A
A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that,

‖h_Λ‖₂ ≤ C ‖h_{Λᶜ}‖₁ / √K     (3.3)

holds for all h ∈ N(Φ) and for all Λ such that |Λ| ≤ K.

A matrix Φ satisfies the restricted isometry property (RIP) of order K if there exists a δ_K ∈ (0, 1) such that

(1 − δ_K) ‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ_K) ‖x‖₂²     (3.9)

holds for all x ∈ Σ_K = {x : ‖x‖₀ ≤ K}.

A random variable X is called strictly sub-Gaussian if X ∼ Sub(σ²) where σ² = E(X²), i.e., the inequality

E(exp(Xt)) ≤ exp(σ²t²/2)     (7.4)

holds for all t ∈ R. To denote that X is strictly sub-Gaussian with variance σ², we will use the notation X ∼ SSub(σ²).

A random variable X is called sub-Gaussian if there exists a constant c > 0 such that

E(exp(Xt)) ≤ exp(c²t²/2)     (7.1)

holds for all t ∈ R. We use the notation X ∼ Sub(c²) to denote that X satisfies (7.1).

L
Let Φ : R^N → R^M denote a sensing matrix and Δ : R^M → R^N denote a recovery algorithm. We say that the pair (Φ, Δ) is C-stable if for any x ∈ Σ_K and any e ∈ R^M we have that

‖Δ(Φx + e) − x‖₂ ≤ C ‖e‖₂.     (3.11)

T
The coherence of a matrix Φ, μ(Φ), is the largest absolute inner product between any two columns φ_i, φ_j of Φ:

μ(Φ) = max_{1 ≤ i < j ≤ N} |⟨φ_i, φ_j⟩| / (‖φ_i‖₂ ‖φ_j‖₂).     (3.45)
Bibliography
[1] D. Achlioptas. Database-friendly random projections. In
Proc. IEEE Work. Stat. Signal Processing, Madison, WI, Aug. 2007.
Proc. IEEE Work. Stat. Signal Processing, Madison, WI, Aug. 2007.
[5] K. Ball.
9(1):518211;77, 2009.
[10] R. Baraniuk and M. Wakin.
9(1):518211;77, 2009.
[11] D. Baron, S. Sarvotham, and R. Baraniuk. Sudocodes - fast measurement and reconstruction of sparse
signals. In
Proc. IEEE Int. Symp. Inform. Theory (ISIT), Seattle, WA, Jul. 2006.
[12] J. Bect, L. Blanc-Feraud, G. Aubert, and A. Chambolle. A -unied variational framework for image
restoration. In
Proc. European Conf. Comp. Vision (ECCV), Prague, Czech Republic, May 2004.
[14] R. Berinde, P. Indyk, and M. Ruzic. Practical near-optimal sparse recovery in the l1 norm. In
Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sept. 2008.
[15] A. Beurling. Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionelle. In Proc.
[16] A. Beurling. Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionelle. In Proc.
[17] T. Blumensath and M. Davies. Iterative hard thresholding for compressive sensing.
In
Appl. Comput.
Madison, WI,
Aug. 2007.
[19] S. Boyd and L. Vanderberghe.
Convex Optimization.
2004.
[20] Y. Bresler. Spectrum-blind sampling and compressive sensing for continuous-index signals. In
Work. Inform. Theory and Applications (ITA), San Diego, CA, Jan. 2008.
Proc.
[21] Y. Bresler and P. Feng. Spectrum-blind minimum-rate sampling and reconstruction of 2-d multiband
signals. In
Proc. IEEE Int. Conf. Image Processing (ICIP), Zurich, Switzerland, Sept. 1996.
[25] E. Candès. The restricted isometry property and its implications for compressed sensing.
[26] E. Candès. The restricted isometry property and its implications for compressed sensing.
[27] E. Candès. The restricted isometry property and its implications for compressed sensing.
[28] E. Candès. The restricted isometry property and its implications for compressed sensing.
[29] E. Candès and Y. Plan. Near-ideal model selection by l1 minimization. Ann. Stat., 37(5A):2145–2177, 2009.
[30] E. Candès and J. Romberg. decompositions. Inverse Problems, 23(3):969–985, 2007.
[32] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information.
[33] E. Candès, J. Romberg, and T. Tao. measurements.
[35] E. Candès, M. Rudelson, T. Tao, and R. Vershynin. Error correction via linear programming. In Proc. IEEE Symp. Found. Comp. Science (FOCS), Pittsburgh, PA, Oct. 2005.
[37] E. Candès and T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies?
[38] E. Candès and T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies?
[39] E. Candès and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n.
[40] C. Carathéodory. Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen.
[41] C. Carathéodory. Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen.
[42] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In
Proc. Int.
20(1):338211;61, 1998.
[44] A. Cohen, W. Dahmen, and R. DeVore.
sensing.
In
2008.
[45] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best -term approximation.
J. Amer.
[46] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation.
J. Amer.
Noiselets.
10:278211;44,
2001.
[48] P. Combettes and J. Pesquet.
bases.
[49] G. Cormode and M. Hadjieleftheriou. Finding the frequent items in streams of data.
Comm. ACM,
52(10):978211;105, 2009.
[50] G. Cormode and M. Hadjieleftheriou. Finding the frequent items in streams of data.
Comm. ACM,
52(10):978211;105, 2009.
[51] G. Cormode and S. Muthukrishnan. Improved data stream summaries: The count-min sketch and its
applications.
IEEE
[53] S. Dasgupta and A. Gupta. An elementary proof of the johnson-lindenstrauss lemma. Technical report
TR-99-006, Univ. of Cal. Berkeley, Comput. Science Division, Mar. 1999.
[54] I. Daubechies, M. Defrise, and C. De Mol.
problems with a sparsity constraint.
[55] M. Davenport, P. Boufounos, M. Wakin, and R. Baraniuk. Signal processing with compressive measurements.
[56] M. Davenport, M. Duarte, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk. The smashed
lter for compressive classication and target recognition. In
[57] M. Davenport, C. Hegde, M. Duarte, and R. Baraniuk. Joint manifolds for data fusion.
IEEE Trans.
[58] M. Davenport, J. Laska, P. Boufouons, and R. Baraniuk. A simple proof that random matrices are
democratic. Technical report TREE 0906, Rice Univ., ECE Dept., Nov. 2009.
[59] M. Davenport and M. Wakin. Analysis of orthogonal matching pursuit using the restricted isometry
property.
[62] R. DeVore. Deterministic constructions of compressed sensing matrices. J. Complex., 23(4):918–925, 2007.
[63] A. Dimakis, A. Sarwate, and M. Wainwright. Geographic gossip: Efficient aggregation for sensor networks. In Proc. Int. Symp. Inform. Processing in Sensor Networks (IPSN), 2006.
[64] D. Donoho. Denoising by soft-thresholding.
[65] D. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical
report 2005-04, Stanford Univ., Stat. Dept., Jan. 2005.
[66] D. Donoho. Compressed sensing.
[67] D. Donoho. For most large underdetermined systems of linear equations, the minimal -norm solution
is also the sparsest solution.
[68] D. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension.
[69] D. Donoho, I. Drori, Y. Tsaig, and J.-L. Stark. Sparse solution of underdetermined linear equations
by stagewise orthogonal matching pursuit. Preprint, 2006.
[70] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via
l1 minimization.
[71] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via
minimization.
Hessian eigenmaps:
[73] D. Donoho, A. Maleki, and A. Montanari. Message passing algorithms for compressed sensing.
Proc.
[74] D. Donoho and J. Tanner. Neighborliness of randomly projected simplices in high dimensions.
Proc.
[75] D. Donoho and J. Tanner. Sparse nonnegative solutions of undetermined linear equations by linear
programming.
[76] D. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension.
In
Single-pixel
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
Toulouse,
Multiscale
Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS), Rennes, France, Nov. 2005.
[84] M. F. Duarte, M. B. Wakin, D. Baron, and R. G. Baraniuk. Universal distributed sensing via random
projections. In
page 1778211;185,
Sparse and Redundant Representations: From Theory to Applications in Signal and Image
Processing. Springer, New York, NY, 2010.
[86] M. Elad.
[87] M. Elad, B. Matalon, J. Shtok, and M. Zibulevsky. A wide-angle view at iterated shrinkage algorithms.
In
Proc. SPIE Optics Photonics: Wavelets, San Diego, CA, Apr. 2007.
[88] M. Elad, B. Matalon, and M. Zibulevsky. Coordinate and subspace optimization methods for linear
least squares with non-quadratic regularization.
23(3):3468211;367,
2007.
[89] Y. Erlich, N. Shental, A. Amir, and O. Zuk. Compressed sensing approach for high throughput carrier
Proc. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sept. 2009.
screen. In
[90] Y. Erlich, N. Shental, A. Amir, and O. Zuk. Compressed sensing approach for high throughput carrier
Proc. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sept. 2009.
screen. In
[91] V. Fedorov.
[92] P. Feng.
Universal spectrum blind minimum rate sampling and reconstruction of multiband signals.
Universal spectrum blind minimum rate sampling and reconstruction of multiband signals.
In
May 1996.
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
Atlanta, GA,
In
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
Atlanta, GA,
May 1996.
[96] M. Figueiredo and R. Nowak.
IEEE Trans.
Appli-
1(4):5868211;597, 2007.
[98] A. Garnaev and E. Gluskin. The widths of euclidean balls.
[99] M. Gehm, R. John, D. Brady, R. Willett, and T. Schultz. Single-shot compressive spectral imaging
with a dual disperser architecture.
[100] S. Geršgorin. Über die Abgrenzung der Eigenwerte einer Matrix.
[101] A. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Near-optimal sparse fourier representations via sampling. In
[104] A. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal sparse fourier
representations. In
Proc. SPIE Optics Photonics: Wavelets, San Diego, CA, Aug. 2005.
[105] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin. One sketch for all: Fast algorithms for compressed
sensing. In
Proc. ACM Symp. Theory of Comput., San Diego, CA, Jun. 2007.
[106] AC Gilbert, MJ Strauss, JA Tropp, and R. Vershynin. One sketch for all: fast algorithms for compressed
sensing.
In
page
95(4):2318211;251,
1995.
[108] I. Gorodnitsky and B. Rao. Convergence analysis of a class of adaptive weighted norm extrapolation
algorithms. In
Proc. Asilomar Conf. Signals, Systems, and Computers, Pacic Grove, CA, Nov. 1993.
[109] I. Gorodnitsky, B. Rao, and J. George. Source localization in magnetoencephalagraphy using an iterative weighted minimum norm algorithm. In
NY, 2001.
[112] J. Haupt and R. Nowak. Signal reconstruction from noisy random projections.
[113] C. Hegde, M. Wakin, and R. Baraniuk. Random projections for manifold learning. In
Proc. Adv. in
[117] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank. Ecient and robust compressed sensing using
optimized expander graphs.
56(6):23468211;2356, 2008.
[120] W. Johnson and J. Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. In
Proc. Conf.
[121] R. Kainkaryam, A. Breux, A. Gilbert, P. Woolf, and J. Schiefelbein. poolmc: Smart pooling of mrna
samples in microarray experiments.
[122] R. Kainkaryam, A. Breux, A. Gilbert, P. Woolf, and J. Schiefelbein. poolmc: Smart pooling of mrna
samples in microarray experiments.
2007.
[124] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior point method for large-scale
-regularized least squares.
[125] S. Kirolos, J. Laska, M. Wakin, M. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. Baraniuk.
Analog-to-information conversion via random demodulation. In
[126] V. Kotelnikov. On the carrying capacity of the ether and wire in telecommunications. In
Izd. Red.
[129] J. Laska, P. Boufounos, M. Davenport, and R. Baraniuk. Democracy in action: Quantization, saturation, and compressive sensing. Preprint, 2009.
[130] J. Laska, S. Kirolos, M. Duarte, T. Ragheb, R. Baraniuk, and Y. Massoud. Theory and implementation
of an analog-to-information convertor using random demodulation. In
[131] M. Ledoux.
RI, 2001.
[132] S. Levy and P. Fullagar. Reconstruction of a sparse spike train from a portion of its spectrum and
application to high-resolution deconvolution.
[133] B. Logan.
[134] M. Lustig, D. Donoho, and J. Pauly. Rapid mr imaging with compressed sensing and randomly undersampled 3dft trajectories. In
[135] M. Lustig, J. Lee, D. Donoho, and J. Pauly. Faster imaging with randomly perturbed, under-sampled
spirals and reconstruction. In
k-t sparse:
[137] M. Lustig, J. Santos, J. Lee, D. Donoho, and J. Pauly. Application of compressed sensing for rapid mr
imaging. In
Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS), Rennes, France, Nov. 2005.
[138] D. MacKay.
Neural Comput.,
4:5908211;604, 1992.
[139] S. Mallat.
[140] S. Mallat.
[141] S. Mallat.
Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2009.
Proc. IS&T/SPIE
[144] S. Mendelson, A. Pajor, and N. Tomczack-Jaegermann. Uniform uncertainty principle for bernoulli
and subgaussian ensembles.
Proc. Work. Inform. Theory and Applications (ITA), San Diego, CA,
Jan. 2007.
[146] M. Mishali and Y. C. Eldar. Blind multi-band signal reconstruction: Compressed sensing for analog
signals.
[147] M. Mishali and Y. C. Eldar. From theory to practice: Sub-nyquist sampling of sparse wideband analog
signals.
Contemporary Math.,
313:858211;96, 2002.
Data Streams: Algorithms and Applications, volume 1 of Found. Trends in Theoretical Comput. Science. Now Publishers, Boston, MA, 2005.
[149] S. Muthukrishnan.
Data Streams: Algorithms and Applications, volume 1 of Found. Trends in Theoretical Comput. Science. Now Publishers, Boston, MA, 2005.
[150] S. Muthukrishnan.
[151] D. Needell and J. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate samples.
[152] D. Needell and J. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate samples.
[153] D. Needell and J. Tropp. Cosamp: Iterative signal recovery from incomplete and inaccurate samples.
[154] D. Needell and R. Vershynin. Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit.
4(2):3108211;316,
2010.
[156] R. Nowak and M. Figueiredo.
Proc. Asilomar Conf. Signals, Systems, and Computers, Pacic Grove, CA, Nov. 2001.
In
[158] B. Olshausen and D. Field. Emergence of simple-cell receptive eld properties by learning a sparse
representation.
[159] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total
variation-based image restoration.
4(2):4608211;489,
2005.
[160] Y. Pati, R. Rezaifar, and P. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In
hold, 1993.
[162] R. Prony.
[165] R. Robucci, L. Chiu, J. Gray, J. Romberg, P. Hasler, and D. Anderson. Compressive sensing on a cmos
separable transform image sensor. In
2(4):10988211;1128,
2(4):10988211;1128,
2009.
[167] J. Romberg. Compressive sensing by random convolution.
2009.
[168] M. Rosenfeld.
Science,
290(5500):23238211;2326, 2000.
[170] S. Sarvotham, D. Baron, and R. Baraniuk. Compressed sensing reconstruction via belief propagation.
Technical report TREE-0601, Rice Univ., ECE Dept., 2006.
[171] M. Schena, D. Shalon, R. Davis, and P. Brown. Quantitative monitoring of gene expression patterns
with a complementary dna microarray.
37(1):108211;21, 1949.
[174] M. Sheikh, O. Milenkovic, S. Sarvotham, and R. Baraniuk.
In
2007.
[176] M. Sheikh, S. Sarvotham, O. Milenkovic, and R. Baraniuk. Dna array decoding from nonlinear measurements by belief propagation.
In
2007.
[177] N. Shental, A. Amir, and O. Zuk.
se(que)nsing.
In
[180] T. Strohmer and R. Heath. Grassmanian frames with applications to coding and communication.
Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2006.
Appl.
A compressed
Proc. IS&T/SPIE
[182] D. Takhar, J. Laska, M. Wakin, M. Duarte, D. Baron, S. Sarvotham, K. Kelly, and R. Baraniuk. A
new compressive imaging camera architecture using optical-domain compression. In
Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2006.
tice.
Proc. IS&T/SPIE
Kluwer, 2001.
[185] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction.
[186] R. Tibshirani. Regression shrinkage and selection via the lasso. 58(1):267–288, 1996.
[187] R. Tibshirani. Regression shrinkage and selection via the lasso. 58(1):267–288, 1996.
[188] M. Tipping. Sparse bayesian learning and the relevance vector machine.
1:2118211;244, 2001.
[189] M. Tipping and A. Faul. Fast marginal likelihood maximization for sparse bayesian models. In
Int. Conf. Art. Intell. Stat. (AISTATS), Key West, FL, Jan. 2003.
Proc.
[190] J. Tropp and A. Gilbert. Signal recovery from partial information via orthogonal matching pursuit.
[191] J. Tropp and A. Gilbert. Signal recovery from partial information via orthogonal matching pursuit.
[192] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk. Beyond nyquist: Ecient sampling of
sparse, bandlimited signals.
[193] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk. Beyond nyquist: Ecient sampling of
sparse, bandlimited signals.
[194] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk. Random lters for compressive sampling
and reconstruction.
In
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
In
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP),
346(238211;24):12718211;1274, 2008.
[197] J. A. Tropp. On the conditioning of random subdictionaries.
2008.
[198] J. Trzasko and A. Manduca. Highly undersampled magnetic resonance image reconstruction via homotopic -minimization.
[199] V. Vapnik.
[200] R. Varga.
[201] S. Vasanawala, M. Alley, R. Barth, B. Hargreaves, J. Pauly, and M. Lustig. Faster pediatric mri via
compressed sensing.
In
2009.
[202] R. Venkataramani and Y. Bresler. Further results on spectrum blind sampling of 2-d signals. In
IEEE Int. Conf. Image Processing (ICIP), Chicago, IL, Oct. 1998.
[203] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with nite rate of innovation.
Proc.
IEEE Trans.
[204] A. Wagadarikar, R. John, R. Willett, and D. Brady. Single disperser design for coded aperture snapshot
spectral imaging.
[205] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk. An
architecture for compressive imaging.
GA, Oct. 2006.
In
Atlanta,
[206] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk.
Compressive imaging for video representation and coding.
In
Beijing,
[207] Walker and T. Ulrych. Autoregressive recovery of the acoustic impedance. Geophysics, 48(10):1338–1350, 1983.
[208] W. Wang, M. Garofalakis, and K. Ramchandran. Distributed sparse random projections for renable
approximation. In
Proc. Int. Symp. Inform. Processing in Sensor Networks (IPSN), Cambridge, MA,
Apr. 2007.
[209] R. Ward. Compressive sensing with cross validation. 55(12):5773–5782, 2009.
[210] L. Welch. Lower bounds on the maximum cross correlation of signals.
20(3):397–399, 1974.
[211] E. Whittaker. On the functions which are represented by the expansions of the interpolation theory.
[212] P. Wojtaszczyk. Stability and instance optimality for gaussian measurements in compressed sensing.
IEEE
[214] W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for -minimization with
applications to compressed sensing.
Index
Keywords are listed by the section with that keyword (page numbers are in parentheses).
Keywords
do not necessarily appear in the text of the page. They are merely associated with that section.
apples, 1.1 (1)
A
B
Approximation, 2.4(11)
Atoms, 2.2(7)
Basis, 2.2(7)
Belief propagation, 5.5(51), 6.11(78)
Best K-term approximation, 2.4(11)
Biosensing, 6.11(78)
F
G
Coherence, 3.6(26)
Instance-optimality, 4.4(37)
Frame, 2.2(7)
CoSaMP, 5.3(45)
Lasso, 5.2(42)
Count-median, 5.4(49)
Count-min, 5.4(49)
Cross-polytope, 4.5(39)
Electroencephalography (EEG)
Analysis, 2.2(7)
Ex.
apples, 1
Democracy, 3.5(24)
Detection, 6.9(73)
Norms, 2.1(5)
p norms, 2.1(5)
7.4(89)
5.3(45)
7.3(86)
7.3(86)
Synthesis, 2.2(7)
Sensing, 1.1(1)
Sensing matrices, 3.1(15)
Sensing matrix design, 3.1(15)
Shrinkage, 5.2(42)
Universality, 3.5(24)
`_0
`_1
minimization, 4.1(29)
minimization, 4.1(29), 7.4(89)
Attributions
Collection: Compressed Sensing
Edited by: Richard Baraniuk, Mark A. Davenport, Marco F. Duarte, Chinmay Hegde
URL: http://cnx.org/content/col11133/1.5/
License: http://creativecommons.org/licenses/by/3.0/
Module: "Introduction to compressive sensing"
By: Mark A. Davenport, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
URL: http://cnx.org/content/m37172/1.7/
Pages: 1-3
Copyright: Mark A. Davenport, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
License: http://creativecommons.org/licenses/by/3.0/
Module: "Introduction to vector spaces"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37167/1.6/
Pages: 5-7
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Bases and frames"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37165/1.6/
Pages: 7-8
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sparse representations"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37168/1.5/
Pages: 8-10
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressible signals"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37166/1.5/
Pages: 11-14
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sensing matrix design"
By: Mark A. Davenport
URL: http://cnx.org/content/m37169/1.6/
Page: 15
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Null space conditions"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37170/1.6/
Pages: 16-18
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "The restricted isometry property"
By: Mark A. Davenport
URL: http://cnx.org/content/m37171/1.6/
Pages: 18-22
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "The RIP and the NSP"
By: Mark A. Davenport
URL: http://cnx.org/content/m37176/1.5/
Pages: 22-24
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Matrices that satisfy the RIP"
By: Mark A. Davenport
URL: http://cnx.org/content/m37177/1.5/
Pages: 24-25
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Coherence"
By: Marco F. Duarte
URL: http://cnx.org/content/m37178/1.5/
Pages: 26-27
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Signal recovery via
`_1
minimization"
ATTRIBUTIONS
Module: "Instance-optimal guarantees revisited"
By: Mark A. Davenport
URL: http://cnx.org/content/m37183/1.6/
Pages: 37-39
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "The cross-polytope and phase transitions"
By: Mark A. Davenport
URL: http://cnx.org/content/m37184/1.5/
Pages: 39-40
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sparse recovery algorithms"
By: Chinmay Hegde
URL: http://cnx.org/content/m37292/1.3/
Page: 41
Copyright: Chinmay Hegde
License: http://creativecommons.org/licenses/by/3.0/
Module: "Convex optimization-based methods"
By: Wotao Yin
URL: http://cnx.org/content/m37293/1.5/
Pages: 42-44
Copyright: Wotao Yin
License: http://creativecommons.org/licenses/by/3.0/
Module: "Greedy algorithms"
By: Chinmay Hegde
URL: http://cnx.org/content/m37294/1.4/
Pages: 45-48
Copyright: Chinmay Hegde
License: http://creativecommons.org/licenses/by/3.0/
Module: "Combinatorial algorithms"
By: Mark A. Davenport, Chinmay Hegde
URL: http://cnx.org/content/m37295/1.3/
Pages: 49-51
Copyright: Mark A. Davenport, Chinmay Hegde
License: http://creativecommons.org/licenses/by/3.0/
Module: "Bayesian methods"
By: Chinmay Hegde, Mona Sheikh
URL: http://cnx.org/content/m37359/1.4/
Pages: 51-53
Copyright: Chinmay Hegde, Mona Sheikh
License: http://creativecommons.org/licenses/by/3.0/
Module: "Linear regression and model selection"
By: Mark A. Davenport
URL: http://cnx.org/content/m37360/1.3/
Page: 55
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sparse error correction"
By: Mark A. Davenport
URL: http://cnx.org/content/m37361/1.3/
Pages: 55-56
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Group testing and data stream algorithms"
By: Mark A. Davenport
URL: http://cnx.org/content/m37362/1.4/
Page: 56
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressive medical imaging"
By: Mona Sheikh
URL: http://cnx.org/content/m37363/1.4/
Pages: 56-57
Copyright: Mona Sheikh
License: http://creativecommons.org/licenses/by/3.0/
Module: "Analog-to-information conversion"
By: Mark A. Davenport, Jason Laska
URL: http://cnx.org/content/m37375/1.4/
Pages: 57-60
Copyright: Mark A. Davenport, Jason Laska
License: http://creativecommons.org/licenses/by/3.0/
Module: "Single-pixel camera"
By: Marco F. Duarte, Mark A. Davenport
URL: http://cnx.org/content/m37369/1.4/
Pages: 60-64
Copyright: Marco F. Duarte, Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Hyperspectral imaging"
By: Marco F. Duarte
URL: http://cnx.org/content/m37370/1.4/
Pages: 64-69
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressive processing of manifold-modeled data"
By: Marco F. Duarte
URL: http://cnx.org/content/m37371/1.6/
Pages: 69-73
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Inference using compressive measurements"
By: Marco F. Duarte
URL: http://cnx.org/content/m37372/1.4/
Pages: 73-76
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Compressive sensor networks"
By: Marco F. Duarte
URL: http://cnx.org/content/m37373/1.3/
Pages: 76-77
Copyright: Marco F. Duarte
License: http://creativecommons.org/licenses/by/3.0/
Module: "Genomic sensing"
By: Mona Sheikh
URL: http://cnx.org/content/m37374/1.3/
Pages: 78-79
Copyright: Mona Sheikh
License: http://creativecommons.org/licenses/by/3.0/
Module: "Sub-Gaussian random variables"
By: Mark A. Davenport
URL: http://cnx.org/content/m37185/1.6/
Pages: 81-82
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Concentration of measure for sub-Gaussian random variables"
By: Mark A. Davenport
URL: http://cnx.org/content/m32583/1.7/
Pages: 83-86
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "Proof of the RIP for sub-Gaussian matrices"
By: Mark A. Davenport
URL: http://cnx.org/content/m37186/1.4/
Pages: 86-89
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
Module: "`_1 minimization proof"
By: Mark A. Davenport
URL: http://cnx.org/content/m37187/1.4/
Pages: 89-91
Copyright: Mark A. Davenport
License: http://creativecommons.org/licenses/by/3.0/
About Connexions
Since 1999, Connexions has been pioneering a global system where anyone can create course materials and
make them fully accessible and easily reusable free of charge. We are a Web-based authoring, teaching and
learning environment open to anyone interested in education, including students, teachers, professors and
lifelong learners. We connect ideas and facilitate educational communities.
Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12
schools, distance learners, and lifelong learners.
Connexions materials are in many languages, including English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part of an exciting new information distribution system that allows for Print on Demand Books.
Connexions
has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course
materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.