Sparse Binary Matrices in Compressed Sensing
M. A. IWEN
DEPARTMENT OF MATHEMATICS
MICHIGAN STATE UNIVERSITY
619 RED CEDAR ROAD
EAST LANSING, MI 48824
EMAIL: MARKIWEN@[Link]
1. Introduction
Noisy group testing problems generally involve designing pooling schemes which use as few
expensive tests as possible in order to identify a small number of important elements from a large
universe, U, of items (see, e.g., [13]). In this setting each one of the expensive tests in question
corresponds to observing the result of an experiment, or calculation, performed on a different subset
of U. If each test is sufficiently sensitive to the small number of hidden items in U that must be
identified, one might hope that testing a correspondingly small number of subsets of U in bulk
would still allow the hidden elements to be found. Thus, designing a pooling scheme corresponds
to choosing a good collection of subsets of U to observe so that tests performed on these subsets
will always allow one to discover a small number of important elements hidden within U.
Many data mining tasks can be cast in a similar framework – that is – as problems concerned with
identifying a small number of interesting items from a tremendously large group without exceeding
certain resource constraints (e.g., without using too much memory, communication power, runtime,
etc.). Specific examples include the sketching and monitoring of heavy-hitters in high-volume
data streams [5, 7], source localization in sensor networks [25], and the design of high throughput
sequencing schemes for biological specimen analysis [14]. Note that pooling schemes in many such
group testing related applications naturally correspond to binary matrices (i.e., because each row of the binary matrix, r ∈ {0, 1}^N, selects a subset for testing/observation). Furthermore, it is generally better for these binary matrices to have a small number of nonzero entries in each column (i.e., because this reduces the number of times each item in U must be tested/observed). Thus, we focus on designing sparse binary measurement matrices herein.
This research was supported in part by NSA grant H98230-13-1-0275. The majority of the work reported on herein was completed while the author was a visiting assistant professor at Duke University.
Roughly speaking, one can cast many such applications as a type of compressed sensing [12]
problem. The large set containing the small number of important elements we want to identify is
modeled as a vector, x ∈ R^N. The nth entry in the vector, x_n, is a real number which indicates the “importance” of the nth set element (the larger the magnitude, the more important). Our goal is now to locate k ≪ N of the largest magnitude entries of x (i.e., the important elements). Unfortunately, for reasons that vary with the specific problem at hand (e.g., because only o(N) memory is available in the massive data stream context), we are allowed to store just m ≪ N linear measurements of x which we must compute during a single pass over its entries. The m linear measurement operators are represented as a measurement matrix, M ∈ R^{m×N}. Having access to only Mx ∈ R^m, we seek to identify, and then estimate, the k largest magnitude entries of x. This identification and estimation is performed by a sparse recovery algorithm, A : R^m → R^N, which (implicitly) returns a vector in R^N having O(k) nonzero entries. We prefer A to be fast, especially for applications involving massive data sets.
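The single-pass measurement model above can be sketched in a few lines. The construction below (d ones placed uniformly at random in each column) is an illustrative stand-in, not the specific matrix family developed in this paper; all names and parameters are hypothetical.

```python
import numpy as np

# Illustrative toy sketch (not this paper's construction): a sparse binary
# measurement matrix M with d ones per column, applied in one pass over x.
rng = np.random.default_rng(0)
N, m, d = 1000, 60, 5          # universe size, number of measurements, ones per column

M = np.zeros((m, N), dtype=np.uint8)
for n in range(N):             # place d ones in each column, uniformly at random
    M[rng.choice(m, size=d, replace=False), n] = 1

x = np.zeros(N)                # a vector with 4 "heavy" entries
x[rng.choice(N, size=4, replace=False)] = rng.standard_normal(4) * 10

# Streaming pass: each entry x_n updates only the d sketch coordinates
# selected by its column, so the sketch costs O(d) work per nonzero entry.
y = np.zeros(m)
for n in range(N):
    if x[n] != 0.0:
        y[M[:, n] == 1] += x[n]

assert np.allclose(y, M @ x)   # identical to the full matrix-vector product
```

The point of the sketch is that column sparsity of M directly bounds the per-item update cost in the streaming setting.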
In this paper we consider the design of sparse matrices M ∈ {0, 1}^{m×N}, with m ≪ N, together with associated nonlinear functions, A : R^m → R^N, which have the property that A(Mx) ≈ x for all vectors x ∈ R^N that are well approximated by their best k-term approximation,

(1)    x_k^opt := argmin_{y ∈ R^N, ‖y‖_0 ≤ k} ‖x − y‖_2.¹

More specifically, we will focus on designing (M, A)-pairs which achieve error guarantees of the form

(2)    ‖x − A(Mx)‖_p ≤ min_{y ∈ R^N, ‖y‖_0 ≤ k} ( ‖x − y‖_p + C_{p,q} · k^{1/p − 1/q} ‖x − y‖_q )

for constants 1 ≤ q ≤ p ≤ 2, and C_{p,q} ∈ R^+ (e.g., see [6, 15]). We will refer to such an error guarantee as an “ℓ_p, ℓ_q” error guarantee below.
Over the past several years this type of design problem has received a considerable amount of attention under the moniker of “compressed sensing” (e.g., see [15, 12, 4, 10, 3, 2, 20], and references therein). Most compressed sensing papers – this one included – generate their measurement matrices, M, randomly. This leads to two different probabilistic models in which the aforementioned “ℓ_p, ℓ_q” error guarantees may hold. In the first model, a single randomly generated measurement matrix, M, is shown to satisfy (2) for all x ∈ R^N with high probability. We will refer to this as the “for all” model. In the second model, a randomly generated measurement matrix is shown to satisfy (2) for each given x ∈ R^N with high probability (assuming that M is generated independently of x). We will refer to this second model as the “for each” model. All results proven herein are proven in the second, “for each”, model.
1.1. Results and Related Work. Any sparse recovery algorithm, A, that achieves either an “ℓ_1, ℓ_1”, “ℓ_2, ℓ_1”, or “ℓ_2, ℓ_2” error guarantee in the “for each” model must use an associated measurement matrix, M, having at least m ≥ Ck log(N/k) rows [11, 23].² Note that this implies an Ω(k log(N/k)) lower runtime complexity bound for the recovery algorithm, A. It remains an open problem to prove (or disprove) the existence of an O(k log N)-time recovery algorithm achieving any such error guarantee. In this paper we present a compressed sensing matrix/recovery algorithm pair, (M, A), with an “ℓ_2, ℓ_1” guarantee, where A runs in O((k log k) log N)-time, a single O(log k)-factor from the known lower bound. We also present two other compressed sensing results

¹Here ‖y‖_0 denotes the number of nonzero entries in y ∈ R^N, while ‖y‖_p denotes the standard ℓ_p-norm for all p ≥ 1, i.e., ‖y‖_p = ( Σ_{n=0}^{N−1} |y_n|^p )^{1/p} for all y ∈ R^N.
²C will always represent an absolute constant.
Paper      | Measurements, m    | Runtime of A       | Error Guarantee
[17]       | k log^{≥2} N       | k² log^{≥2} N      | ℓ_2, ℓ_1 ✓
[22]       | k log N            | N^{<1}             | ℓ_1, ℓ_1 ✓
[9]        | k log³ N           | k log³ N           | ℓ_2, ℓ_2
[16]       | k log N            | k log^{≥2} N       | ℓ_2, ℓ_2
Herein*    | k log² N           | k log² N           | ℓ_2, ℓ_1
Herein     | (k log k) log N    | (k log k) log N    | ℓ_2, ℓ_1

✓ Error guarantees hold in the “for all” model.
* Requires only O(log² k) random bits.

Figure 1. Summary of previous sub-linear time results, and the results obtained herein.
which can be obtained using the same methods: one which uses an optimal number (up to constant factors) of randomly selected rows from an incoherent binary matrix as measurements, and another O(k log² N)-time recovery result which requires fewer random bits³ than previous algorithms (i.e., less randomness).
Previous work involving the development of compressed sensing methods having both sub-linear time reconstruction algorithms, and the type of “ℓ_p, ℓ_q” error guarantees considered herein, began with [9]. In [9] Cormode et al. built on streaming algorithm techniques with weaker error guarantees (e.g., see [5, 7, 8]) in order to develop O(k log³ N)-time recovery algorithms, A, with associated “ℓ_2, ℓ_2” error guarantees in the “for each” model. Similar techniques were later utilized by Gilbert et al. in [16] to create sub-linear time algorithms with the same error guarantees, but whose associated measurement matrices, M ∈ R^{m×N}, have a near-optimal number of rows up to constant factors (i.e., m = O(k log N)). Other related compressed sensing methods with fast runtimes and “ℓ_2, ℓ_1” error guarantees in the “for all” model were also considered in [17]. Unlike these previous methods, the compressed sensing methods developed herein utilize the combinatorial properties of a new class of sparse binary measurement matrices formed by randomly selecting sub-matrices from larger incoherent matrices.
Perhaps the measurement matrices considered herein are most similar to previous compressed
sensing matrices based on unbalanced expander graphs (see, e.g., [2, 18, 20]). Indeed, the mea-
surement matrices used in this paper are created by randomly sampling rows from larger binary
matrices that are, in fact, the adjacency matrices of a subclass of unbalanced expander graphs.
However, unlike previous approaches which use the properties of general unbalanced expanders,
we use different combinatorial techniques which allow us to develop O(k polylog N )-time recovery
algorithms. To the best of our knowledge, the runtimes we obtain by doing so are the best known
for any such method having “ℓ_p, ℓ_q” error guarantees.
See Figure 1 for a comparison of the sub-linear time compressed sensing results proven herein
(last two rows) with previous sub-linear time compressed sensing results discussed above (first
four rows). The columns of Figure 1 list the following characteristics of each compressed sensing
method: (i) the number of measurement matrix rows, m, (ii) the runtime complexity of the recovery
algorithm, and (iii) the “ℓ_p, ℓ_q” error guarantee achieved by the method. All error guarantees hold in the “for each” model unless indicated otherwise by a ✓.
1.2. Techniques and Organization. It has been shown that all binary matrices satisfying easily verifiable coherence conditions⁴ have strong combinatorial properties capable of producing entirely deterministic compressed sensing algorithms requiring Ω(k² log N) runtime and measurements [1]. In this paper we demonstrate a general means for utilizing these same types of matrices to construct compressed sensing approximation schemes with near-optimal runtime and sampling complexities.

³More precisely, the number of random bits is O(log² k). To the best of our knowledge this represents the first fast recovery result which requires a number of random bits that is entirely independent of N, the length of x.
⁴Any matrix whose maximum inner product between all pairs of columns is small compared to the minimal number of ones in each column satisfies the required coherence conditions. See Section 2 for details.
Our new compressed sensing matrices are formed by randomly sampling a small number of rows
from any sufficiently incoherent binary matrix. The resulting random sub-matrices are then shown
to still satisfy sufficiently strong combinatorial properties with respect to any given input vector, x,
in order to allow standard fast compressed sensing techniques (i.e., similar to those utilized in [9])
to produce accurate results. Furthermore, the theory is developed in a modular fashion, making it
easy to utilize different binary incoherent matrix constructions in order to generate new results. We
take advantage of this modularity in order to generate the two new results listed in Figure 1, as well
as to show that our new measurement matrices also allow for compressive sensing with an optimal
number of measurements (up to constant factors) in O(N log N)-time.⁵ Each result is produced
by utilizing a different combination of two incoherent binary matrix constructions: deterministic
algebraic constructions due to DeVore [10], and randomly constructed incoherent binary matrices
with fewer rows constructed below in Section 3.
The remainder of this paper is organized as follows: In Section 2 we fix notation and review
existing results that are needed for later sections. In Section 3 we construct incoherent binary
matrices with a near optimal number of rows. These new binary matrices ultimately allow the
development of our O ((k log k) log N )-time recovery result via the techniques developed later in
Section 4. Section 4 constructs our compressed sensing measurement matrices by randomly sam-
pling rows from the previously discussed binary incoherent matrices (i.e., from both the matrices
reviewed in Section 2 as well as the matrices constructed in Section 3). Our main results are then
proven in Section 5. Finally, we conclude with a short discussion in Section 6.
2. Preliminaries
Let [N] = {0, . . . , N − 1} for any N ∈ N. We consider the elements of any given x ∈ R^N to be ordered according to magnitude by the sequence j_0, j_1, . . . , j_{N−1} so that |x_{j_0}| ≥ |x_{j_1}| ≥ · · · ≥ |x_{j_{N−1}}|. We set S_k^opt = {j_0, j_1, . . . , j_{k−1}} ⊂ [N] for a given x, and let x_{S_k^opt} = x_k^opt ∈ R^N denote the associated vector with exactly k nonzero entries:

    (x_k^opt)_{j_0} = x_{j_0}, (x_k^opt)_{j_1} = x_{j_1}, . . . , (x_k^opt)_{j_{k−1}} = x_{j_{k−1}}.
All results below deal with randomly sampling rows from a rectangular binary matrix whose columns
are all nearly pairwise orthogonal.
Definition 1. Let K, α ∈ [N]. An m × N matrix, M ∈ {0, 1}^{m×N}, is called (K, α)-coherent if both of the following properties hold:
(1) Every column of M contains at least K ones.
(2) For all j, l ∈ [N] with j ≠ l, the associated columns, M_{·,j} and M_{·,l} ∈ {0, 1}^m, have ⟨M_{·,j}, M_{·,l}⟩ ≤ α.
These matrices are closely related to nonadaptive group testing matrices, unbalanced expander
graphs, binary matrices with the restricted isometry property, and codebook design problems in
signal processing. Several (implicit) constructions of (K, α)-coherent matrices exist (e.g., the number theoretic and algebraic constructions of [9] and [10], respectively). In addition, every (K, α)-coherent matrix must have Ω( min{ (K²/α²) log_{K/α} N, N } ) rows. See [1] for details.
Given any binary matrix M ∈ {0, 1}^{m×N} with at least K ∈ [m] ones in column n ∈ [N], let M(K, n) denote a K × N submatrix of M created by selecting the first K rows of M with nonzero entries in the nth column. The following useful fact concerning (K, α)-coherent matrices is proven in [1].

⁵See Theorem 5 for details.
Lemma 1. Suppose M is a (K, α)-coherent matrix. Let n ∈ [N], k ∈ [K/α], ε ∈ (0, 1], c ∈ [2, ∞) ∩ N, and x ∈ R^N. If K > c · (kα/ε) then (M(K, n) · x)_j will be contained in the interval

    ( x_n − ε‖x − x_{(k/ε)}^opt‖_1 / k , x_n + ε‖x − x_{(k/ε)}^opt‖_1 / k )

for more than ((c − 2)/c) · K values of j ∈ [K].
In addition to (K, α)-coherent matrices, we will also utilize a bit-test matrix, B_N ∈ {0, 1}^{(1+⌈log₂ N⌉)×N}, whose nth column is a one followed by n ∈ [N] written in base 2. These bit-test matrices will allow us to quickly identify large elements of a vector x using techniques from [9]. The row tensor product of two matrices, A ∈ R^{m₁×N} and B ∈ R^{m₂×N}, denoted A ⊛ B, is defined to be the (m₁ · m₂) × N matrix with entries given by

    (A ⊛ B)_{i,j} = A_{i mod m₁, j} · B_{(i − (i mod m₁))/m₁, j}.
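The two constructions above can be made concrete with a short sketch. The helper names below are hypothetical, and the bit ordering inside `bit_test_matrix` (least significant bit first) is an arbitrary choice not fixed by the text:

```python
import numpy as np

def row_tensor(A, B):
    """Row tensor product A ⊛ B: the (m1*m2) x N matrix with
    (A ⊛ B)[i, j] = A[i % m1, j] * B[i // m1, j]."""
    m1, N = A.shape
    m2, N2 = B.shape
    assert N == N2, "A and B must have the same number of columns"
    out = np.empty((m1 * m2, N))
    for i in range(m1 * m2):
        out[i] = A[i % m1] * B[i // m1]
    return out

def bit_test_matrix(N):
    """B_N: a (1 + ceil(log2 N)) x N binary matrix whose nth column is a
    one followed by n written in base 2 (here, least significant bit first)."""
    bits = 1 + int(np.ceil(np.log2(N)))
    B = np.zeros((bits, N), dtype=np.uint8)
    B[0, :] = 1                      # leading row of ones
    for n in range(N):
        for b in range(bits - 1):
            B[1 + b, n] = (n >> b) & 1
    return B
```

For example, `bit_test_matrix(8)` is a 4 × 8 matrix whose column n encodes n in binary beneath a row of ones.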
for more than K/2 values of j ∈ [K] for all n ∈ [N]. Then, there exists an algorithm that takes M and (M ⊛ B_N) x as input, and outputs a vector z ∈ R^N satisfying

    ‖x − z‖_2 ≤ ‖x − x_k^opt‖_2 + 22ε‖x − x_{(k/ε)}^opt‖_1 / √k.

Furthermore, the algorithm can be implemented to run in O(m log N) time.
Pseudocode for a faster randomized variant of the algorithm referred to by Theorem 1 can be found in the Appendix. This randomized variant and its associated measurement matrices are the focus of this paper. Briefly put, both algorithms operate in two phases. During the first phase all heavy entries of the input vector, x, are identified using standard bit testing techniques [9]. These heavy vector elements are then estimated during the second round using an approach from the computer science streaming literature [8, 9]. The approximations provided by the binary matrices in Lemma 1 guarantee that taking the median of all K entries of (M(K, n) · x) will provide a good estimate of each important entry, x_n.
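The two phases can be illustrated in miniature. The sketch below is a simplification under strong assumptions (a single dominant entry per bucket, no bucketing shown); the helper names `decode_bits` and `estimate` are hypothetical:

```python
import numpy as np

def decode_bits(y):
    """Bit-testing (Phase 1 idea): recover the index of a single dominant
    entry from bit-test measurements y, where y[0] is the bucket sum and
    y[1 + b] is the sum over entries whose index has bit b set."""
    n = 0
    for b in range(len(y) - 1):
        # most of the mass carries bit b = 1 iff |y[1+b]| exceeds the rest
        if abs(y[1 + b]) > abs(y[0] - y[1 + b]):
            n |= 1 << b
    return n

def estimate(row_sums):
    """Median estimation (Phase 2 idea): each row touching column n yields
    x_n plus interference; by Lemma 1 most values lie in the error band,
    so the median of the row sums is a robust estimate of x_n."""
    return float(np.median(row_sums))

# One dominant entry at index 13 plus a small interfering entry.
x = np.zeros(32)
x[13] = 5.0
x[2] = 0.1
y = np.zeros(1 + 5)               # 1 + log2(32) bit-test measurements
y[0] = x.sum()
for b in range(5):
    y[1 + b] = sum(x[n] for n in range(32) if (n >> b) & 1)
assert decode_bits(y) == 13
```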
as long as K < mp. Thus, we can bound the probability that S_j < K above by (1 − σ)/(3N) for any desired σ ∈ [0, 1) by ensuring that

(5)    P[S_j < K] < e^{−mp(1 − K/(mp))²/2} ≤ (1 − σ)/(3N).

Simplifying Equation 5 above, we obtain

    mp(1 − K/(mp))² ≥ 2 ln( 3N/(1 − σ) ).

Solving for mp in terms of K and N, we learn that

    (mp)² − 2( K + ln(3N/(1 − σ)) )(mp) + K² ≥ 0.

This will hold whenever

    mp ≥ K + ln(3N/(1 − σ)) + √( 2K ln(3N/(1 − σ)) + ln²(3N/(1 − σ)) ) > K.

Applying Equation 5 together with the union bound over all N choices of S_j yields the desired lower bound. A similar argument guarantees that every row will also have fewer than

    eK + e ln(3N/(1 − σ)) + e √( 2K ln(3N/(1 − σ)) + ln²(3N/(1 − σ)) )

ones with probability at least 1 − (1 − σ)/3.  □
Proof: Let I_{i,j} be the inner product of the jth column of M with the ith column of M for a given i, j ∈ [N] with i ≠ j. We want I_{i,j} ≤ α. Since I_{i,j} is binomial with E[I_{i,j}] = mp² the Chernoff bound implies that

    P[I_{i,j} > α] = P[ I_{i,j} > (α/(mp²)) · E[I_{i,j}] ] < ( e^{α/(mp²) − 1} / (α/(mp²))^{α/(mp²)} )^{mp²}

as long as α > mp². For the sake of simplicity, suppose that α = 2mp² = 2 log_{4/e}( 3N²/(1 − σ) ). Then, P[I_{i,j} > α] < (e/4)^{mp²} = (1 − σ)/(3N²), and the union bound over all pairs of columns finishes the proof.  □
Lemma 2 guarantees that a randomly constructed binary matrix will satisfy the first (K, α)-coherent property with high probability. Similarly, Lemma 3 guarantees the second (K, α)-coherent property. Solving for p in light of Equation 4 and Lemma 3 we get that we can set

(6)    p = mp²/(mp) = log_{4/e}( 3N²/(1 − σ) ) / ( K + ln(3N/(1 − σ)) + √( 2K ln(3N/(1 − σ)) + ln²(3N/(1 − σ)) ) )

and

(7)    m = mp/p = ( K + ln(3N/(1 − σ)) + √( 2K ln(3N/(1 − σ)) + ln²(3N/(1 − σ)) ) )² / log_{4/e}( 3N²/(1 − σ) ).

Note these equations both make sense whenever K ≥ α ≥ 2 log_{4/e}( 3N²/(1 − σ) ). We have the following.

Theorem 2. Fix σ ∈ [0, 1). Let m, K, α ∈ [N] be such that K ≥ α ≥ 2 log_{4/e}( 3N²/(1 − σ) ), and let m = Θ(K²/α) as per Equation 7. Randomly generate a matrix, M ∈ {0, 1}^{m×N}, each of whose entries is an i.i.d. Bernoulli random variable which is 1 with the probability, p, given in Equation 6. Then, M will be both (K, α)-coherent and have Θ(K) ones per column with probability at least σ.
Although the matrices developed above have fewer rows than DeVore’s, we hasten to point out
that they are generally less structured. This ultimately means that they will be difficult to store
in compact form, and, therefore, of limited use when space complexity is a dominant concern.
4.1. Identification Matrix. The following corollary to Lemma 1 will be used to construct matrices for the identification of the largest magnitude entries in x. Note that the corollary is essentially a coupon collection result (i.e., we want to collect, for each element in S_{2k/ε}^opt, a “good row” satisfying Equation 9).

Corollary 1. Suppose M is an m × N (K, α)-coherent matrix. Let ε⁻¹ ∈ N⁺, k ∈ [K/α], c ∈ [14, ∞) ∩ N, σ ∈ [2/3, 1), and x ∈ R^N. Select a subset of the rows of M, s′ ⊂ [m], by independently choosing

(8)    γ ≥ (7/6) · (m/K) · ln( (2k/ε)/(1 − σ) )

values from [m] uniformly at random with replacement. If K > c · (kα/ε) then with probability at least σ every n ∈ S_{2k/ε}^opt ⊂ [N] will have an associated row of M_{s′}, i_n ∈ [γ], for which

(9)    |(M_{s′} · x)_{i_n} − x_n| ≤ ε‖x − x_{(k/ε)}^opt‖_1 / k.
Proof: Fix n ∈ S_{2k/ε}^opt. Lemma 1 implies that each randomly selected row of M, j ∈ [m], will satisfy Equation 9 with probability at least (6/7) · (K/m). Hence, the probability that none of the γ selected rows will satisfy Equation 9 is at most

    ( 1 − (6/7) · (K/m) )^γ.

Let x = (7/6) · (m/K). If γ satisfies Equation 8 we have that

    γ ( 1 + Σ_{h=2}^∞ 1/(h x^{h−1}) ) ≥ x · ln( (2k/ε)/(1 − σ) ).

This in turn implies that

    γ Σ_{h=1}^∞ 1/(h x^h) = −γ · ln(1 − 1/x) ≥ ln( (2k/ε)/(1 − σ) ).

Thus,

    ( 1 − (6/7) · (K/m) )^γ ≤ (1 − σ)/(2k/ε)

whenever γ satisfies Equation 8. Taking the union bound over all 2k/ε elements of S_{2k/ε}^opt finishes the proof.  □
It is straightforward to show that a random sub-matrix, M_{s′}, will have O(log N) ones in every column with high probability when it is constructed as per Corollary 1 from a (K, α)-coherent matrix having Θ(K) ones per column.⁸ It is also important to note that an analogous variant of Corollary 1 can be proven for DeVore's (Θ(K), Θ(log_K N))-coherent matrices by randomly selecting O( ln((2k/ε)/(1 − σ)) ) blocks of Θ(K) rows (see Section 2.1). Randomly selecting rows from a DeVore matrix in blocks both (i) guarantees that every column of the resulting sub-matrix will have O( ln((2k/ε)/(1 − σ)) ) ones, and (ii) requires only O( ln((2k/ε)/(1 − σ)) ln K ) random bits. The following theorem is proven via standard bit testing techniques (see, e.g., [9, 1]).
Theorem 3. Suppose M is an m × N (K, α)-coherent matrix. Let ε ∈ (0, 1], σ ∈ [2/3, 1), k ∈ [Kε/(14α)], and x ∈ R^N. Construct M_{s′} as per Corollary 1. Then, with probability at least σ, (M_{s′} ⊛ B_N) x will allow Phase 1 (i.e., lines 4 through 14) of Algorithm 1 in the appendix to recover all n ∈ [N] for which

(10)    |x_n| ≥ 4ε‖x − x_{(k/ε)}^opt‖_1 / k.

The required Phase 1 runtime is O( (m/K) ln((2k/ε)/(1 − σ)) ln N ).
The only new observation required for the proof of Theorem 3 beyond those used to prove the analogous results in [9, 1, 19] involves noting that any n satisfying Equation 10 also belongs to S_{2k/ε}^opt.
To finish, we note that applying (a variant of) Corollary 1 to a (Θ(K), Θ(log_K N))-coherent matrix from Section 2.1 produces a random matrix, M_{s′}, having

    O( K ln((2k/ε)/(1 − σ)) ) = O( (k/ε) · log_{k/ε} N · ln((2k/ε)/(1 − σ)) )

rows. For σ fixed, this reduces to O((k/ε) log N) rows. Furthermore, M_{s′} will have O(log(k/ε)) ones in all columns. Applying Corollary 1 to a (Θ(K), Θ(log N))-coherent matrix from Section 3 produces a random matrix, M_{s′}, having

    O( (m/K) ln((2k/ε)/(1 − σ)) ) = O( (K/log N) · ln((2k/ε)/(1 − σ)) ) = O( (k/ε) · ln((2k/ε)/(1 − σ)) )

rows. For σ fixed, this reduces to O((k/ε) log(k/ε)) rows. Furthermore, M_{s′} will have O(log N) ones in all columns with high probability.

⁸This result follows via techniques analogous to those utilized in Section 3 in order to establish Theorem 2 (i.e., via the Chernoff and union bounds).
4.2. Estimation Matrix. The following corollary constructs measurements capable of estimating every entry of x that is identified as large in magnitude during Phase 1 of Algorithm 1. Furthermore, the estimation procedure is simple, requiring only median operations (see Phase 2 of Algorithm 1).

Corollary 2. Suppose M is an m × N (K, α)-coherent matrix. Let ε⁻¹ ∈ N⁺, k ∈ [K/α], c ∈ [14, ∞) ∩ N, σ ∈ [2/3, 1), S ⊆ [N], and x ∈ R^N. Select a multiset of the rows of M, s̃ ⊂ [m], by independently choosing

(11)    β ≥ 28.56 · (m/K) · ln( 2|S|/(1 − σ) )

values from [m] uniformly at random with replacement. If K > c · (kα/ε) then M_{s̃} will have both of the following properties with probability at least σ:

(1) There will be at least l̃ = 21 · ln( 2|S|/(1 − σ) ) nonzero values in every column of M_{s̃} indexed by S. Hence, M_{s̃}(l̃, n) will be well defined for all n ∈ S.
(2) For all n ∈ S more than l_n/2 of the entries in M_{s̃}(l_n, n) · x (i.e., more than half of the values j ∈ [l_n], counted with multiplicity) will have

    |(M_{s̃}(l_n, n) · x)_j − x_n| ≤ ε‖x − x_{(k/ε)}^opt‖_1 / k.
Proof: Fix n ∈ S. We select our multiset, s̃ ⊂ [m], of the rows of M by independently choosing β elements of [m] uniformly at random with replacement. Denote the jth element chosen for s̃ by s̃_j. Finally, let P_j^n be the random variable indicating whether M_{s̃_j, n} > 0, and let Q_j^n be the random variable indicating whether s̃_j satisfies

(12)    |(M · x)_{s̃_j} − x_n| ≤ ε‖x − x_{(k/ε)}^opt‖_1 / k

conditioned on P_j^n. Thus, P_j^n = 1 if M_{s̃_j, n} > 0, and 0 otherwise. Similarly, Q_j^n = 1 if s̃_j satisfies Equation 12 and P_j^n = 1, and Q_j^n = 0 otherwise. Lemma 1 then implies that

    μ = E[ Σ_{j=1}^β Q_j^n | P_1^n, . . . , P_β^n ] ≥ (6/7) Σ_{j=1}^β P_j^n.

Let l_n = Σ_{j=1}^β P_j^n. The Chernoff bound (see, e.g., [24]) guarantees that

    P[ Σ_{j=1}^β Q_j^n < (4/7) · l_n ] ≤ e^{−μ/18} ≤ e^{−l_n/21}.

Thus, if l_n > 21 we can see that Σ_{j=1}^β Q_j^n will be less than (l_n + 1)/2 with probability less than e^{−l_n/21}. Hence, if l_n ≥ 21 ln( 2|S|/(1 − σ) ) then Property 2 will fail to be satisfied for n with probability less than (1 − σ)/(2|S|). Focusing now on l_n, we note that P[P_j^n = 1] ≥ K/m so that μ̃ = E[l_n] ≥ (K/m)β. Let l̃ = 21 ln( 2|S|/(1 − σ) ). Applying the Chernoff bound one additional time reveals that P[l_n < l̃] < e^{−μ̃(1 − l̃/μ̃)²/2}. Hence, if we wish to bound P[l_n < l̃] from above by (1 − σ)/(2|S|) it suffices to have μ̃² − (44/21)μ̃l̃ + l̃² ≥ 0. Setting β ≥ 1.36 · (m/K) · l̃ = 28.56 · (m/K) ln( 2|S|/(1 − σ) ) achieves this goal. The end result is that M_{s̃} will fail to satisfy both Properties 1 and 2 for any n ∈ S with probability less than (1 − σ)/|S|. Applying the union bound over all n ∈ S finishes the proof.  □
Note that Corollary 2 considers selecting a multiset of rows from a (K, α)-coherent matrix. Hence, some rows may be selected more than once. If this occurs, rows should be considered to be selected multiple times for counting purposes only. That is, all computations involving a row which is selected several times should still be carried out only once. However, the results of these computations should be considered with greater weight during subsequent reconstruction efforts (e.g., multiply selected rows should be considered as generating multiple duplicate entries in M_{s̃} · x).
As above, it is straightforward to show that a random sub-matrix, M_{s̃}, will have O(log N) ones in every column with high probability when it is constructed as per Corollary 2 from a (K, α)-coherent matrix having Θ(K) ones per column. In addition, an analogous variant of Corollary 2 can be proven for DeVore's (Θ(K), Θ(log_K N))-coherent matrices by randomly selecting O( ln(2|S|/(1 − σ)) ) blocks of Θ(K) rows. Randomly selecting rows from a DeVore matrix in blocks this way both guarantees that all columns of the resulting sub-matrix will have O( ln(2|S|/(1 − σ)) ) ones, and also requires only O( ln(2|S|/(1 − σ)) ln K ) random bits. Note that we must be able to quickly construct arbitrary columns of M_{s̃} in order to execute Phase 2 of Algorithm 1 in the low memory setting (i.e., when we can not explicitly store either the entire matrix M, or the randomly selected sub-matrix M_{s̃}, in memory). In this setting DeVore's (Θ(K), Θ(log_K N))-coherent matrices allow us to reconstruct any column of a random sub-matrix containing O( ln(2|S|/(1 − σ)) ) blocks of rows, M_{s̃}, in just O( ln(2|S|/(1 − σ)) · log_K N )-time (see Section 2.1 for details).
Corollary 2 will generally be applied with S ⊂ [N ] set to the subset discovered by Phase 1 of
Algorithm 1.9 Hence, we will generally have |S| equal to the number of rows in a matrix Ms0
constructed via Corollary 1. In more extreme settings, where we want to be able to estimate all
entries of x with high probability, we will set S = [N ]. Corollary 2 implies the following theorem.
Theorem 4. Suppose M is an m × N (K, α)-coherent matrix. Let ε ∈ (0, 1], σ ∈ [2/3, 1), k ∈ [Kε/(14α)], and x ∈ R^N. Construct M_{s̃} as per Corollary 2. Then, with probability at least σ, M_{s̃} x will allow Phase 2 (i.e., lines 15 through 19) of Algorithm 1 in the appendix to estimate all x_n with n ∈ S with a z_n satisfying

(13)    |z_n − x_n| ≤ ε‖x − x_{(k/ε)}^opt‖_1 / k.

The required Phase 2 runtime (and memory complexity) is O( |S| ln(2|S|/(1 − σ)) · log_K N ) when M is a (Θ(K), Θ(log_K N))-coherent DeVore matrix. Phase 2 requires O(|S| · log N)-time if M is a (Θ(K), Θ(log N))-coherent matrix from Section 3.

⁹In fact, we select the rows for M_{s̃} independently of the subset, S, found during Phase 1 of Algorithm 1, before the subset has been identified. Note that we only require an upper bound on the size of S before selecting rows from M for our estimation matrix. Such an upper bound is supplied in advance by Corollary 1.
Proof: Equation 13 follows from the second property of M_{s̃} guaranteed by Corollary 2. Lines 15 through 17 can be accomplished in O( |S| · ln(2|S|/(1 − σ)) · log_K N )-time using a median-of-medians algorithm when M is a (Θ(K), Θ(log_K N))-coherent DeVore matrix. When M is a (Θ(K), Θ(log N))-coherent matrix from Section 3, lines 15 through 17 can be accomplished in O(|S| · log N)-time.¹⁰ Lines 18 and 19 can always be accomplished in O(|S| log |S|)-time.  □
We conclude this section by noting that applying (a variant of) Corollary 2 to a (Θ(K), Θ(log_K N))-coherent matrix from Section 2.1 produces a random matrix, M_{s̃}, having

    O( K ln(2|S|/(1 − σ)) ) = O( (k/ε) · log_{k/ε} N · ln(2|S|/(1 − σ)) )

rows. Applying Corollary 2 to a (Θ(K), Θ(log N))-coherent matrix from Section 3 produces a random matrix, M_{s̃}, having

    O( (m/K) ln(2|S|/(1 − σ)) ) = O( (K/log N) · ln(2|S|/(1 − σ)) ) = O( (k/ε) · ln(2|S|/(1 − σ)) )

rows. For σ fixed, this reduces to O( (k/ε) log(|S|) ) rows. Furthermore, M_{s̃} will have O(log N) ones in all columns with high probability.
5. Main Results
We may now prove the three new results mentioned in Section 1.1. We have the following
theorem.
Theorem 5. Let ε ∈ (0, 1], σ ∈ [2/3, 1), x ∈ R^N, and k ∈ [N].¹¹ With probability at least σ Algorithm 1 will output a vector z ∈ R^N satisfying

(14)    ‖x − z‖_2 ≤ ‖x − x_k^opt‖_2 + 22ε‖x − x_{(k/ε)}^opt‖_1 / √k

when executed using any of the following identification and estimation matrices:

(1) A (Θ(k log N/ε), Θ(log N))-coherent matrix from Section 3 used for estimation via Corollary 2 with S = [N]. Only Phase 2 of Algorithm 1 need be applied (i.e., no identification will be performed). The resulting number of measurements is O( (k/ε) · ln(N/(1 − σ)) ). The required runtime is O(N log N).

(2) A (Θ((k/ε) log_{k/ε} N), Θ(log_{k/ε} N))-coherent matrix from Section 2.1 used for both identification (via Corollary 1 variant) and estimation (via Corollary 2 variant with |S| = O((k/ε) ln(N/(1 − σ)))). The resulting number of measurements is O( (k/ε) · ln(N/(1 − σ)) ln N ). The required runtime is O( (k/ε) · ln²(N/(1 − σ)) ).

(3) A (Θ(k log N/ε), Θ(log N))-coherent matrix from Section 3 used for identification (via Corollary 1), and a (Θ((k/ε) log_{k/ε} N), Θ(log_{k/ε} N))-coherent matrix from Section 2.1 used for estimation (via Corollary 2 variant with |S| = O((k/ε) ln(k/(1 − σ)))). The resulting number of measurements is O( (k/ε) · ln((k/ε)/(1 − σ)) ln N ). The required runtime is O( (k/ε) · ln((k/ε)/(1 − σ)) ln(N/(1 − σ)) ).

¹⁰However, using the matrices from Section 3 requires O(N log N)-memory since their columns contain ones in random locations that must be remembered.
¹¹For the sake of simplicity, we assume k = Ω(log N) when stating the measurement and runtime bounds below.

Proof: The runtime and measurement bounds follow from Theorem 3, Theorem 4, and the subsequent Section 4 discussions. The error guarantee for z follows from Theorem 3, Theorem 4, and the proof of Theorem 7 in [19].  □
It is interesting to consider the possibility of improving the runtime bounds obtained in Theo-
rem 5 by using iterative recovery techniques akin to those employed in [16]. This appears to be
difficult. In particular, such iterative recovery methods generally require the contributions of par-
tial solutions to be subtracted from the input measurements of the original vector, x, after each of
O(log k) rounds. Assuming that one must subtract some partial solution containing at least Ω(k)
nonzero entries from a large (constant fraction) of the initial measurements of x at some point
during reconstruction, it becomes clear that updating our measurements will not be O(k log N )-
time unless our measurement matrix contains O(log N ) nonzero entries per column. Unfortunately,
fast nonadaptive identification of previously undiscovered heavy elements of x (e.g., via bit-testing
methods) requires the use of matrices having Ω(log N ) nonzero entries in many columns during
each new round of iterative approximation. Hence, it appears as if only O(1) rounds of identification may be performed using the techniques considered herein before the required measurement matrices have too many ones per column to allow O(k log N)-time recovery. The author considers this (a weak) justification for utilizing only one round of identification in Algorithm 1.
6. Conclusion
In this paper we present a compressed sensing recovery algorithm with an “ℓ_2, ℓ_1” error guarantee that runs in only O((k log k) log N)-time. This runtime is within an O(log k) factor of the known Ω(k log N) runtime lower bound. Demonstrating (or refuting) the existence of an O(k log N)-time (i.e., linear-time in its required input size) compressed sensing recovery algorithm with similar error guarantees remains an open problem.
References
[1] J. Bailey, M. A. Iwen, and C. V. Spencer. On the design of deterministic matrices for fast recovery of fourier
compressible functions. SIAM J. Matrix Anal. Appl., 33(1):263 – 289, 2012.
[2] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss. Combining geometry and combinatorics: A unified
approach to sparse signal recovery. In Communication, Control, and Computing, 2008 46th Annual Allerton
Conference on, pages 798–805. IEEE, 2008.
[3] T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational
Harmonic Analysis, 27(3):265 – 274, 2009.
[4] E. J. Candes, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1208–1223, 2006.
[5] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Automata, Languages
and Programming, pages 784–784, 2002.
[6] A. Cohen, W. Dahmen, and R. DeVore. Compressed Sensing and Best k-term Approximation. Journal of the
American Mathematical Society, 22(1):211–231, January 2008.
[7] G. Cormode and S. Muthukrishnan. What’s hot and what’s not: tracking most frequent items dynamically.
In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database
systems, pages 296–306. ACM, 2003.
[8] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applica-
tions. Journal of Algorithms, 55(1):58–75, 2005.
[9] G. Cormode and S. Muthukrishnan. Combinatorial algorithms for compressed sensing. Structural Information
and Communication Complexity, pages 280–294, 2006.
[10] R. DeVore. Deterministic constructions of compressed sensing matrices. Journal of Complexity, 23(4-6):918–925,
2007.
[11] K. Do Ba, P. Indyk, E. Price, and D. Woodruff. Lower bounds for sparse recovery. In Proceedings of the Twenty-
First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1190–1197. Society for Industrial and Ap-
plied Mathematics, 2010.
[12] D. L. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4):1289 – 1306, 2006.
[13] D. Du and F. Hwang. Combinatorial group testing and its applications. World Scientific Pub Co Inc, 2000.
[14] Y. Erlich, K. Chang, A. Gordon, R. Ronen, O. Navon, M. Rooks, and G. Hannon. DNA Sudoku: harnessing high-throughput sequencing for multiplexed specimen analysis. Genome research, 19(7):1243–1253, 2009.
[15] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Springer, to appear.
[16] A. Gilbert, Y. Li, E. Porat, and M. Strauss. Approximate sparse recovery: optimizing time and measurements.
In Proceedings of the 42nd ACM symposium on Theory of computing, pages 475–484. ACM, 2010.
[17] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin. One sketch for all: fast algorithms for compressed
sensing. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, STOC ’07, pages
237–246, New York, NY, USA, 2007. ACM.
[18] P. Indyk and M. Ruzic. Near-optimal sparse recovery in the l1 norm. In Foundations of Computer Science, 2008.
FOCS’08. IEEE 49th Annual IEEE Symposium on, pages 199–207. IEEE, 2008.
[19] M. A. Iwen. Improved approximation guarantees for sublinear-time fourier algorithms. Applied and Computa-
tional Harmonic Analysis, 34(1):57 – 82, 2013.
[20] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank. Efficient and robust compressed sensing using optimized
expander graphs. Information Theory, IEEE Transactions on, 55(9):4299–4308, 2009.
[21] B. S. Kashin. The diameters of octahedra. Uspekhi Mat. Nauk, 30(4):251–252, 1975.
[22] E. Porat and M. J. Strauss. Sublinear time, measurement-optimal, sparse recovery for all. In Proceedings of the
Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1215–1227, 2012.
[23] E. Price and D. Woodruff. (1 + ε)-approximate sparse recovery. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 295–304. IEEE, 2011.
[24] P. Raghavan and R. Motwani. Randomized Algorithms. Cambridge Univ. Press, 1995.
[25] Y. Zheng, N. Pitsianis, and D. Brady. Nonadaptive group testing based fiber sensor deployment for multiperson
tracking. Sensors Journal, IEEE, 6(2):490–494, 2006.
Algorithm 1 Approximate x
1: Input: M_{s̃} and M_{s̃} x for estimation, and (M_{s′} ⊛ B_N) x for identification
2: Output: z, an approximation to x_k^opt
3: Initialize multiset S ← ∅, z ← 0 ∈ R^N, b ← 0 ∈ R^{⌈log₂ N⌉}
Phase 1: Identify All Heavy n ∈ [0, N) ∩ N
4: for j from 1 to |s′| do