IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 37, NO. 6, NOVEMBER 1991

Information Theoretic Inequalities
Abstract—The role of inequalities in information theory is reviewed and the relationship of these inequalities to inequalities in other branches of mathematics is developed.

Index Terms—Information inequalities, entropy power, Fisher information, uncertainty principles.

Manuscript received February 1, 1991. This work was supported in part by the National Science Foundation under Grant NCR-89-14538 and in part by JSEP Contract DAAL 03-91-C-0010. A. Dembo was supported in part by the SDIO/IST, managed by the Army Research Office under Contract DAAL 03-90-G-0108, and in part by the Air Force Office of Scientific Research, Air Force Systems Command, under Contract AF88-0327. Sections III and IV are based on material presented at the IEEE/CAM Workshop on Information Theory, 1989.
A. Dembo is with the Statistics Department, Stanford University, Stanford, CA 94305.
T. M. Cover is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305.
J. Thomas was with Stanford University. He is now with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598.
IEEE Log Number 9103368.
0018-9448/91/$01.00 © 1991 IEEE

I. PREFACE: INEQUALITIES IN INFORMATION THEORY

INEQUALITIES in information theory have been driven by a desire to solve communication theoretic problems. To solve such problems, especially to prove converses for channel capacity theorems, the algebra of information was developed and chain rules for entropy and mutual information were derived. Fano's inequality, for example, bounds the probability of error by the conditional entropy. Some deeper inequalities were developed as early as Shannon's 1948 paper. For example, Shannon stated the entropy power inequality in order to bound the capacity of non-Gaussian additive noise channels.

Information theory is no longer restricted to the domain of communication theory. For this reason it is interesting to consider the set of known inequalities in information theory and search for other inequalities of the same type. Thus motivated, we will look for natural families of information theoretic inequalities.

For example, the entropy power inequality, which says that the entropy of the sum of two independent random vectors is no less than the entropy of the sum of their independent normal counterparts, has a strong formal resemblance to the Brunn-Minkowski inequality, which says that the volume of the set sum of two sets is greater than or equal to the volume of the set sum of their spherical counterparts. Similarly, since the exponentiated entropy is a measure of volume, it makes sense to consider the surface area of the volume of the typical set associated with a given probability density. Happily, this turns out to be another information quantity, the Fisher information.

A large number of inequalities can be derived from a strengthened Young's inequality. These inequalities include the entropy power inequality, the Brunn-Minkowski inequality and the Heisenberg uncertainty inequality. These inequalities are extreme points of the set of inequalities derivable from a central idea. Logically independent derivations of these inequalities exist and are based on Fisher information inequalities such as the Cramer-Rao inequality.

Turning our attention to simple inequalities for differential entropy, we apply them to the standard multivariate normal to furnish new and simpler proofs of the major determinant inequalities in classical mathematics. In particular, Hadamard's inequality, Ky Fan's inequality and others can be derived this way. Indeed, we find some new matrix inequalities by this method. Moreover, the entropy power inequality, when specialized to matrices, turns out to yield Minkowski's determinant inequality, yet another tangency with the Minkowski of Brunn-Minkowski.

In the process of finding determinant inequalities we derive some new differential entropy inequalities. We restate one of them as follows. Suppose one is looking at ocean waves at a certain subset of points. Then the average entropy per sample of a random subset of samples can be shown to increase as the number of sampling points increases. On the other hand, the per sample conditional entropy of the samples, conditioned on the values of the remaining samples, monotonically decreases. Once again, using these entropy inequalities on the standard multivariate normal leads to associated matrix inequalities and in particular to an extension of the sequence of inequalities found by Hadamard and Szasz.

By turning our attention from the historically necessary inequalities to the natural set of inequalities suggested by information theory itself, we find, full circle, that these inequalities turn out to be useful as well. They improve determinant inequalities, lead to overlooked inequalities for the entropy rate of random subsets, and demonstrate the unity between physics, mathematics, information theory and statistics (through unified proofs of the Heisenberg, entropy power, Fisher information and Brunn-Minkowski inequalities).

The next section is devoted to differential entropy inequalities for random subsets of samples. These inequalities, when specialized to multivariate normal vari-
ables provide the determinant inequalities presented in Section V. Section III focuses on the entropy power inequality (including the related Brunn-Minkowski, Young's and Fisher information inequalities), while Section IV deals with various uncertainty principles and their interrelations.

We now observe certain natural properties of these information quantities.

Lemma 1: D(f||g) ≥ 0, with equality iff f = g a.e.

Proof: Let A be the support set of f. Then, by Jensen's inequality,

    -D(f||g) = ∫_A f ln(g/f) ≤ ln ∫_A g ≤ ln 1 = 0,

with equality only if g/f = 1 a.e., by the strict concavity of the logarithm (see [18], [29]). □

Lemma 2: If (X, Y) have a joint density, then h(X|Y) = h(X,Y) - h(Y).

Proof:

    h(X|Y) = -∫∫ f(x,y) ln f(x|y) dx dy
           = -∫∫ f(x,y) ln f(x,y) dx dy + ∫∫ f(x,y) ln f(y) dx dy
           = h(X,Y) - h(Y). □

Proof (of the maximum-entropy property of the normal, h(g) ≤ h(φ_K)): Let g(x) be any density satisfying ∫ g(x) x_i x_j dx = K_ij, for all i, j. Then,

    0 ≤ D(g||φ_K) = ∫ g ln(g/φ_K)
                  = -h(g) - ∫ g ln φ_K
                  = -h(g) - ∫ φ_K ln φ_K
                  = -h(g) + h(φ_K),    (1)

where the substitution ∫ g ln φ_K = ∫ φ_K ln φ_K follows from the fact that ln φ_K is a quadratic form in x, so its expectation depends only on the second moments K_ij, which g and φ_K share.
DEMBO et al.: INFORMATION THEORETIC INEQUALITIES 1503
theorem that is a consequence of the general formalism developed by Han [27].

Theorem 2:

    g_1^(n) ≤ g_2^(n) ≤ ··· ≤ g_n^(n).

Proof: The proof proceeds on lines very similar to the proof of the theorem for the unconditional entropy per element for a random subset. We will first prove that g_n^(n) ≥ g_{n-1}^(n), and then use this to prove the rest of the inequalities.

By the chain rule, the entropy of a collection of random variables is less than the sum of the entropies, which gives

    h(X_1, X_2, ..., X_n) / n ≥ (1/n) Σ_{i=1}^n h(X_1, ..., X_{i-1}, X_{i+1}, ..., X_n | X_i) / (n - 1),

which is equivalent to g_n^(n) ≥ g_{n-1}^(n).

We now prove that g_k^(k) ≥ g_{k-1}^(k) for all k ≤ n by first conditioning on a k-element subset, then taking a uniform choice over its (k-1)-element subsets. For each k-element subset, g_k^(k) ≥ g_{k-1}^(k), and hence the inequality remains true after taking the expectation over all k-element subsets chosen uniformly from the n elements. □

C. Inequalities for Average Mutual Information between Subsets

The previous two theorems can be used to prove the following statement about mutual information.

Corollary 2: Let

    f_k^(n) = (1 / (n choose k)) Σ_{S: |S|=k} I(X(S); X(S^c)) / k.

Then,

    f_1^(n) ≥ f_2^(n) ≥ ··· ≥ f_n^(n).

Proof: This result follows from the identity I(X(S); X(S^c)) = h(X(S)) - h(X(S) | X(S^c)) and Theorems 1 and 2. □

We now prove an inequality for the average mutual information between a subset and its complement, averaged over all subsets of size k in a set of random variables. This inequality will be used to prove yet another determinant inequality along the lines of Szasz's theorem; however, unlike the inequalities in the previous section, there is no normalization by the number of elements in the subset. Let

    i_k^(n) = (1 / (n choose k)) Σ_{S: |S|=k} I(X(S); X(S^c))

be the average mutual information between a subset and its complement, averaged over all subsets of size k. By the symmetry of mutual information and the definition of i_k^(n), it is clear that i_k^(n) = i_{n-k}^(n).

Writing S_j = S \ {j} for j ∈ S, we have

    k I(X(S); X(S^c)) - Σ_{j∈S} I(X(S_j); X(S_j^c))
      = Σ_{j∈S} [ I(X(S_j); X(S^c)) + I(X_j; X(S^c) | X(S_j))
                  - I(X(S_j); X(S^c)) - I(X(S_j); X_j | X(S^c)) ]
      = Σ_{j∈S} [ h(X_j | X(S_j)) - h(X_j | X(S_j), X(S^c))
                  - h(X_j | X(S^c)) + h(X_j | X(S^c), X(S_j)) ]
      = Σ_{j∈S} [ h(X_j | X(S_j)) - h(X_j | X(S^c)) ].

Summing this over all subsets of size k, we obtain

    Σ_{S: |S|=k} [ k I(X(S); X(S^c)) - Σ_{j∈S} I(X(S_j); X(S_j^c)) ]
      = Σ_{S: |S|=k} Σ_{j∈S} [ h(X_j | X(S_j)) - h(X_j | X(S^c)) ].

Reversing the order of summation, we obtain

    Σ_{S: |S|=k} [ k I(X(S); X(S^c)) - Σ_{j∈S} I(X(S_j); X(S_j^c)) ]
      = Σ_{j=1}^n [ Σ_{S': S' ⊂ {j}^c, |S'| = k-1} h(X_j | X(S'))
                    - Σ_{S'': S'' ⊂ {j}^c, |S''| = n-k} h(X_j | X(S'')) ].    (6)
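For a multivariate normal vector, every term above reduces to a log-determinant, so both the symmetry i_k^(n) = i_{n-k}^(n) and the monotonicity of Corollary 2 can be checked by brute force. A sketch under an arbitrary choice of positive definite covariance K (not a matrix from the paper):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)          # random, well-conditioned covariance

def mi(S):
    """I(X(S); X(S^c)) = 0.5 ln(|K_S||K_{S^c}|/|K|) for Gaussian X (nats)."""
    Sc = [i for i in range(n) if i not in S]
    ld = lambda M: np.linalg.slogdet(M)[1]
    return 0.5 * (ld(K[np.ix_(S, S)]) + ld(K[np.ix_(Sc, Sc)]) - ld(K))

def avg(k):
    subsets = list(itertools.combinations(range(n), k))
    return sum(mi(list(S)) for S in subsets) / len(subsets)

i_k = [avg(k) for k in range(1, n)]          # i_k^{(n)}, k = 1..n-1
f_k = [i_k[k - 1] / k for k in range(1, n)]  # f_k^{(n)} = i_k^{(n)} / k

# Symmetry of mutual information: i_k = i_{n-k}.
for k in range(1, n):
    assert abs(i_k[k - 1] - i_k[n - k - 1]) < 1e-9
# Corollary 2: f_1 >= f_2 >= ...
assert all(f_k[j] >= f_k[j + 1] - 1e-9 for j in range(len(f_k) - 1))
```

The Gaussian mutual-information formula used in `mi` follows from h(X(S)) = 0.5 ln((2πe)^|S| |K_S|) and the identity I(X(S); X(S^c)) = h(X(S)) + h(X(S^c)) - h(X).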
Since k ≤ ⌊n/2⌋, k - 1 < n - k. So we would expect the second sum in (6) to be less than the first sum, since both sums have the same number of terms but the second sum corresponds to entropies with more conditioning. We will prove this by using a simple symmetry argument.

The set S'' with n - k elements has (n-k choose k-1) subsets of size k - 1. For each such subset S' of size k - 1, we have

    h(X_j | X(S'')) ≤ h(X_j | X(S')),    (7)

since conditioning reduces entropy. Since (7) is true for each subset S' ⊂ S'', it is true of the average over subsets. Hence,

    h(X_j | X(S'')) ≤ (1 / (n-k choose k-1)) Σ_{S': S' ⊂ S'', |S'| = k-1} h(X_j | X(S')).    (8)

Summing (8) over all subsets S'' of size n - k, we get

    Σ_{S'': |S''| = n-k} h(X_j | X(S''))
      ≤ (1 / (n-k choose k-1)) Σ_{S'': |S''| = n-k} Σ_{S': S' ⊂ S'', |S'| = k-1} h(X_j | X(S'))
      = Σ_{S': |S'| = k-1} h(X_j | X(S')),    (9)

since, by symmetry, each subset S' occurs in (n-k choose n-2k+1) = (n-k choose k-1) sets S''.

Combining (6) and (9), we get

    Σ_{S: |S|=k} [ k I(X(S); X(S^c)) - Σ_{j∈S} I(X(S_j); X(S_j^c)) ] ≥ 0.

Since each set of size k - 1 occurs n - k + 1 times in the second sum, we have

    Σ_{S: |S|=k} k I(X(S); X(S^c)) ≥ Σ_{S: |S|=k} Σ_{j∈S} I(X(S_j); X(S_j^c))
                                   = (n - k + 1) Σ_{S': |S'| = k-1} I(X(S'); X(S'^c)).

Dividing this equation by k (n choose k), we have the theorem. □

The entropy power inequality, which says that the entropy of the sum of two independent random vectors is no less than the entropy of the sum of their independent normal counterparts, has a strong formal resemblance to the Brunn-Minkowski inequality, which says that the volume of the set sum of two sets is greater than or equal to the volume of the set sum of their spherical counterparts. Both are interpreted here as convexity inequalities for Rényi entropies that measure the uncertainty associated with a random variable X via the pth norm of its density (see Section III-A). A strengthened version of Young's inequality about the norms of convolutions of functions, due to Beckner [3] and Brascamp and Lieb [8], is equivalent to a more general convexity inequality, with both the entropy power and the Brunn-Minkowski inequality being extreme points (see Section III-B).

This proof of the entropy power inequality (due to Lieb [30]) is different from Stam's [38] proof, which relies upon a convexity inequality for Fisher information. Nevertheless, the interpretation of the entropy power inequality as a convexity inequality for entropy allows for a new, simpler version of Stam's proof, presented here in Section III-C.

Isoperimetric versions of the entropy power and the Fisher information inequalities have derivations that parallel the classical derivation of the isoperimetric inequality as a consequence of the Brunn-Minkowski inequality (see Section III-D, following Costa and Cover [14] and Dembo [19]).

A. Entropy Power and Brunn-Minkowski Inequalities

The definition of the entropy power and the associated entropy power inequality stated next are due to Shannon [37]. The entropy power inequality is instrumental in establishing the capacity region of the Gaussian broadcast channel ([5]) and in proving convergence in relative entropy for the central limit theorem ([2]).

Definition: The entropy power of a random vector X ∈ R^n with a density is

    N(X) = (1/(2πe)) exp( (2/n) h(X) ).

In particular, N(X) = |K|^(1/n) when X ~ φ_K.

Theorem 4 (Entropy Power Inequality): If X, Y are two independent random vectors with densities in R^n and both h(X) and h(Y) exist, then

    N(X + Y) ≥ N(X) + N(Y).    (10)

Equality holds iff X and Y are both multivariate normal with proportional covariances.
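Since N(X) = |K|^(1/n) for X ~ φ_K, the Gaussian case of (10), including its equality condition, can be exercised numerically. A sketch (the covariances below are random choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def entropy_power(K):
    """N(X) = (1/2πe) exp(2h(X)/n) for X ~ N(0, K); equals |K|^{1/n}."""
    h = 0.5 * np.linalg.slogdet(2 * np.pi * np.e * K)[1]
    return np.exp(2 * h / n) / (2 * np.pi * np.e)

def rand_cov():
    A = rng.standard_normal((n, n))
    return A @ A.T + 0.1 * np.eye(n)

K1, K2 = rand_cov(), rand_cov()
N1, N2 = entropy_power(K1), entropy_power(K2)
assert abs(N1 - np.linalg.det(K1) ** (1 / n)) < 1e-8   # N(X) = |K|^{1/n}

# EPI for independent Gaussians: N(X+Y) >= N(X) + N(Y) ...
assert entropy_power(K1 + K2) >= N1 + N2 - 1e-9
# ... with equality when the covariances are proportional (K2 = 3 K1).
assert abs(entropy_power(K1 + 3.0 * K1) - (N1 + entropy_power(3.0 * K1))) < 1e-8
```

For Gaussian inputs the inequality checked here is exactly Minkowski's determinant inequality of the next section, which is how the two results are linked in the text.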
Theorem 5 (Minkowski's Inequality [34]): For any two nonnegative definite matrices K_1, K_2,

    |K_1 + K_2|^(1/n) ≥ |K_1|^(1/n) + |K_2|^(1/n),

with equality iff K_1 is proportional to K_2.

Proof: Let X_1, X_2 be independent with X_i ~ φ_{K_i}. Noting that X_1 + X_2 ~ φ_{K_1 + K_2} and using the entropy power inequality yields

    |K_1 + K_2|^(1/n) = N(X_1 + X_2)
                      ≥ N(X_1) + N(X_2)
                      = |K_1|^(1/n) + |K_2|^(1/n). □

The following alternative statement of the entropy power inequality is given in Costa and Cover [14].

Theorem 6: For any two independent random vectors X, Y such that both h(X) and h(Y) exist,

    h(X + Y) ≥ h(X̂ + Ŷ),    (11)

where X̂, Ŷ are two independent multivariate normals with proportional covariances, chosen so that h(X̂) = h(X) and h(Ŷ) = h(Y).

Proof: For X and Y multivariate normal, Minkowski's inequality and the entropy power inequality (10) hold with equality. Furthermore, X̂ and Ŷ are chosen so that

    N(X̂ + Ŷ) = N(X̂) + N(Ŷ) = N(X) + N(Y) ≤ N(X + Y),

where the last inequality follows from (10). Thus (10) and (11) are equivalent. □

Alternatively, the entropy power inequality also amounts to the convexity of the entropy under the "covariance preserving transformation" √λ X + √(1-λ) Y, as follows.

Theorem 7: For any 0 ≤ λ ≤ 1,

    h(√λ X + √(1-λ) Y) - λ h(X) - (1-λ) h(Y) ≥ 0.    (12)

Proof: For X̂ and Ŷ the inequality (12) holds trivially with equality. Therefore, (12) is equivalent to

    h(√λ X + √(1-λ) Y) ≥ h(√λ X̂ + √(1-λ) Ŷ).

The latter inequality is merely (11) with √λ X substituted for X and √(1-λ) Y substituted for Y. □

Remark: Theorem 7 parallels part of Lieb's proof of Theorem 4 (in [30]).

In parallel with the above derivation of Minkowski's inequality, the following theorem due to Ky Fan [22] results from specializing (12) to the multivariate normal.

Theorem 8 (Ky Fan [22]): ln |K| is concave.

Proof: Consider (12) for X ~ φ_{K_1} and Y ~ φ_{K_2}. Then √λ X + √(1-λ) Y is also multivariate normal with covariance λK_1 + (1-λ)K_2, and (12) becomes

    ln |λK_1 + (1-λ)K_2| ≥ λ ln |K_1| + (1-λ) ln |K_2|. □    (13)

Remark: See Section V-A for an alternative information theoretic proof of both Theorems 5 and 8, which avoids the entropy power inequality.

The entropy power inequality has a strong formal resemblance to the Brunn-Minkowski inequality. For defining the latter, let μ denote Lebesgue measure in R^n (i.e., set volume in R^n) and A + B denote the Minkowski sum (in R^n) of the (measurable) sets A and B, that is,

    A + B = { x + y : x ∈ A, y ∈ B }.

Theorem 9 (Brunn-Minkowski Inequality [24]):

    μ(A + B)^(1/n) ≥ μ(A)^(1/n) + μ(B)^(1/n).    (14)

Proof: For a very simple geometric proof, see [24]. An alternative proof of this inequality as an extreme point of Young's inequality (which is due to Brascamp and Lieb, see [7] and [9]) is presented in Section III-B.

The entropy power is a measure of the effective variance of a random vector, while μ(A)^(1/n) measures the effective radius of a set A. Thus, the entropy power inequality, which says that the effective variance of the sum of two independent random vectors is no less than the sum of the effective variances of these vectors, is the dual of the Brunn-Minkowski inequality, which says that the effective radius of the set sum of two sets is no less than the sum of the effective radii of these sets. In this formal duality, normal random variables are the analog of balls (being the equality cases for the previously mentioned inequalities), and the sum of two independent random vectors is the analog of the Minkowski sum of sets. This analogy is suggested in [14], where the existence of a family of intermediate inequalities is conjectured. We shall further develop this issue here and show in Section III-B that Young's inequality is the bridge between the entropy power and the Brunn-Minkowski inequalities. The following family of Rényi entropies helps in illustrating these relationships.

Definition: The pth Rényi entropy h_p(X) of a random variable X with density f in R^n is defined by

    h_p(X) = (1/(1-p)) ln E[ f(X)^(p-1) ] = -p' ln ||f||_p,    (15)

for 0 < p < ∞, p ≠ 1, where ||f||_p = [ ∫ f(x)^p dx ]^(1/p) and p' is the Hölder conjugate of p. The Rényi entropies for p = 0 and p = 1 are defined as the limits of h_p(X) as p → 0 and p → 1, respectively. It follows directly from the previous definition that

    h_0(X) = lim_{p→0} h_p(X) = ln μ({ x : f(x) > 0 }),    (16)

and

    h_1(X) = lim_{p→1} h_p(X) = h(X).    (17)

Therefore, the (Shannon) entropy is identified with the Rényi entropy of index p = 1, while the logarithm of the essential support of the density is identified with the Rényi entropy of index p = 0.
A convexity inequality for Rényi entropies of index p = 0, which is the dual of (12), is the following.

Theorem 10: For any 0 ≤ λ ≤ 1 and any two independent random vectors X, Y,

    h_0(λX + (1-λ)Y) - λ h_0(X) - (1-λ) h_0(Y) ≥ 0.    (18)

Remarks:

a) While Theorem 7 deals with convexity under the "variance preserving transformation" √λ X + √(1-λ) Y, this theorem deals with convexity under the "support size preserving transformation" λX + (1-λ)Y.

b) The proof of Theorem 10 is deferred to Section III-B. A family of convexity inequalities for Rényi entropies is derived there as consequences of Young's inequality, and both Theorems 7 and 10 are obtained as extreme (limit) points. Here we derive only the Brunn-Minkowski inequality as a consequence of Theorem 10.

Proof of Theorem 9: Choose a pair of independent random vectors X and Y in R^n such that the support of the density of λX is the set A and the support of the density of (1-λ)Y is B. Clearly, the support of the density of λX + (1-λ)Y is the (essential) Minkowski sum A + B, while (1/λ)A and (1/(1-λ))B are the support sets of the densities of X and Y, respectively. Therefore, taking (16) into account, the inequality (18) specializes for these random vectors to

    ln μ(A + B) ≥ λ ln μ((1/λ)A) + (1-λ) ln μ((1/(1-λ))B).    (19)

Observing that ln μ((1/λ)A) = ln μ(A) - n ln λ and ln μ((1/(1-λ))B) = ln μ(B) - n ln(1-λ), the Brunn-Minkowski inequality results when rearranging the above inequality for the particular choice of

    λ = μ(A)^(1/n) / ( μ(A)^(1/n) + μ(B)^(1/n) ). □

B. Young's Inequality and Its Consequences

There is a strong formal resemblance between the convexity inequalities (12) and (18) (where the former yields the entropy power inequality while the latter results in the Brunn-Minkowski inequality). This resemblance suggests the existence of a family of intermediate inequalities. Young's inequality, which is presented in the sequel, results after a few manipulations with these inequalities (see (21)). In particular, we follow Lieb's (in [30]) and Brascamp and Lieb's (in [9]) approach in regarding and proving Theorems 7 and 10 (respectively) as limits of (21).

For that purpose, let L_p(R^n) denote the space of complex valued measurable functions on R^n with ||f||_p < ∞, and let f*g(x) = ∫ f(x-y) g(y) dy denote the convolution operation. The following sharp version of Young's inequality is due to Beckner [3] and Brascamp and Lieb [8].

Theorem 11 (Young's Inequality): If 1/r + 1 = 1/q + 1/p, then for 1 ≤ r, p, q ≤ ∞,

    sup_{f ∈ L_p(R^n), g ∈ L_q(R^n)} { ||f*g||_r / ( ||f||_p ||g||_q ) } ≤ (c_p c_q / c_r)^(n/2).    (20)

Here,

    c_p = p^(1/p) / |p'|^(1/p'),

where p' is the Hölder conjugate of p (i.e., 1/p + 1/p' = 1), and c_q and c_r are likewise defined. The converse inequality holds for the infimum of ||f*g||_r / (||f||_p ||g||_q) when 0 < r, p, q < 1.

Remark: For the multivariate normal densities f = φ_{λK} and g = φ_{(1-λ)K} (where λ = (1/p')/(1/r'), and consequently 1-λ = (1/q')/(1/r')), Young's inequality reduces to Ky Fan's matrix Theorem 8. Actually, (20) is established in [8] by showing that the supremum is achieved by multivariate normal densities, where the constants on the right side of (20) are determined by applying Ky Fan's matrix Theorem 8. For a detailed study of cases of equality in this and related inequalities see [31].

The following convexity inequality for Rényi entropies (which is the natural extension of Theorem 7) is a direct consequence of Young's inequality.

Theorem 12: For any 0 < r < ∞, r ≠ 1, and any 0 ≤ λ ≤ 1, let p, q be such that 1/p' = λ/r' and 1/q' = (1-λ)/r'. Then for any two independent random vectors X, Y with densities in R^n,

    h_r(√λ X + √(1-λ) Y) - λ h_p(X) - (1-λ) h_q(Y)
      ≥ h_r(φ_I) - λ h_p(φ_I) - (1-λ) h_q(φ_I),    (21)

provided that both h_p(X) and h_q(Y) exist. Here, φ_I stands for the standard normal density in R^n. In establishing the inequality (21) we use the well-known scaling property of Rényi entropies

    h_p(aX) = h_p(X) + n ln |a|.    (22)

This identity follows from the definition in (15) by a change of variable argument.

Proof: Fix r and λ. We plan to apply Young's inequality for f the density of √λ X and g the density of √(1-λ) Y. Since h_p(X) and h_q(Y) are well defined, so are

    h_p(√λ X) = -p' ln ||f||_p = h_p(X) + (n/2) ln λ

and

    h_q(√(1-λ) Y) = -q' ln ||g||_q = h_q(Y) + (n/2) ln(1-λ).

These identities are applications of (15) and (22), and in particular they imply that f ∈ L_p(R^n) and g ∈ L_q(R^n). Further, since X and Y are assumed independent,

    -r' ln ||f*g||_r = h_r(√λ X + √(1-λ) Y).

Observe that p, q in Theorem 12 are such that 1/p' + 1/q' = 1/r' (so that 1/r + 1 = 1/q + 1/p), and 1/r' < 0
implies 0 < r, p, q < 1, while 1/r' > 0 implies 1 < r, p, q. Therefore, Theorem 11 is applicable for f and g, resulting in

    -r' ln { ||f*g||_r / ( ||f||_p ||g||_q ) } ≥ -r' (n/2) ln( c_p c_q / c_r ).    (23)

This inequality holds with equality for f = φ_{λI} and g = φ_{(1-λ)I} (i.e., X ~ φ_I, Y ~ φ_I), since for any p ≠ 0, p ≠ 1, ...

where H(λ) ≜ -λ ln λ - (1-λ) ln(1-λ). Combining this limit with (25) yields

    h_0(√λ X + √(1-λ) Y) ≥ λ h_0(X) + (1-λ) h_0(Y) + (n/2) H(λ).

Inequality (18) is now obtained by the rescaling X → √λ X and Y → √(1-λ) Y (using the scaling property (22)). This completes the proof of Theorem 10. □
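For Gaussian inputs every norm in (20) is available in closed form, so the sharp constant can be verified directly. A sketch in one dimension with the exponent choice p = q = 4/3, r = 2 (our choice; any triple with 1/r + 1 = 1/p + 1/q would do):

```python
import numpy as np

def gauss_norm(var, p):
    """||φ_var||_p for the N(0, var) density, from the closed form
    ∫ φ^p = p^{-1/2} (2π var)^{(1-p)/2}."""
    return (p ** -0.5 * (2 * np.pi * var) ** ((1 - p) / 2)) ** (1 / p)

def c(p):
    pp = p / (p - 1)                          # Hölder conjugate p'
    return p ** (1 / p) / abs(pp) ** (1 / pp)

p = q = 4 / 3
r = 2.0
bound = (c(p) * c(q) / c(r)) ** 0.5           # (c_p c_q / c_r)^{n/2}, n = 1

def ratio(a, b):
    """||f*g||_r / (||f||_p ||g||_q) for f = φ_a, g = φ_b (f*g = φ_{a+b})."""
    return gauss_norm(a + b, r) / (gauss_norm(a, p) * gauss_norm(b, q))

# Theorem 11: the ratio never exceeds the sharp constant ...
for a, b in [(0.3, 1.7), (2.0, 0.1), (1.0, 1.0)]:
    assert ratio(a, b) <= bound + 1e-12
# ... and variance-matched Gaussians (λ = (1/p')/(1/r') = 1/2 here) attain it.
assert abs(ratio(0.5, 0.5) - bound) < 1e-12
```

Because the ratio is invariant to a common rescaling of both variances, only the proportion a : b matters, which is exactly the role of λ in the remark following Theorem 11.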
Theorem 13 (Fisher Information Inequality): For any two independent random vectors X, Y and any 0 ≤ λ ≤ 1,

    λ J(X) + (1-λ) J(Y) - J(√λ X + √(1-λ) Y) ≥ 0.    (28)

This is the first instrumental tool for the proof of the entropy power inequality presented in the sequel. The second tool is DeBruijn's identity, the link between entropy and Fisher information (for proofs consider [6], [14], [38]).

Theorem 14 (DeBruijn's Identity [38]): Let X be any random vector in R^n such that J(X) exists, and let Z ~ φ_I be a standard normal which is independent of X. Then

    (d/dt) h(X + √t Z) |_{t=0} = (1/2) J(X).    (29)

We are now ready to present the simplified version of Stam's proof.

Proof of Theorem 7 (by normal perturbations): Consider the continuous family of pairs of independent random vectors

    X_t = √t X + √(1-t) X_0,    0 ≤ t ≤ 1,
    Y_t = √t Y + √(1-t) Y_0,    0 ≤ t ≤ 1,

where the standard multivariate normals X_0 ~ φ_I and Y_0 ~ φ_I are independent of X, Y and of each other. Fix 0 ≤ λ ≤ 1 and let V_t = √λ X_t + √(1-λ) Y_t. Clearly, V_0 ~ φ_I is also a standard normal, and V_t = √t V_1 + √(1-t) V_0 for all 0 ≤ t ≤ 1. We now consider the function

    s(t) = h(V_t) - λ h(X_t) - (1-λ) h(Y_t),    for 0 ≤ t ≤ 1.

Theorem 7 (i.e., inequality (12)) amounts to s(1) ≥ 0, and since V_0, X_0, and Y_0 are identically distributed, s(0) = 0. Therefore, our goal is to establish the differential inequality

    (d/dt) s(t) ≥ 0,    0 ≤ t ≤ 1,    (30)

which clearly implies inequality (12) and thus completes the proof. By virtue of the scaling property (22) (applied here for p = 1, a = √(1/t), and for the variables X_t, Y_t, and V_t), the function s(t) may also be expressed as

    s(t) = h(V_1 + √ε_t V_0) - λ h(X + √ε_t X_0) - (1-λ) h(Y + √ε_t Y_0),    ε_t = (1-t)/t.

Since dε_t/dt = -1/t², we obtain, by an application of DeBruijn's identity and the well-known scaling property J(X) = a² J(aX),

    2t (d/dt) s(t) = λ J(X_t) + (1-λ) J(Y_t) - J(V_t).    (31)

Since V_t = √λ X_t + √(1-λ) Y_t, the Fisher information inequality (28) applies to (31) and thus establishes the differential inequality (30). □

Remarks:

a) The representation (31) is very similar to the one in [1]. Such a representation was also used in [2] for proving a strong version of the central limit theorem.

b) Two independent proofs of the entropy power inequality via the equivalent convexity inequality (12) have been presented. In the first proof, the underlying tool is Young's inequality from mathematical analysis, and results about (Shannon's) entropy are the limit as r → 1 of analogous results about Rényi entropies (i.e., about norms of operators in L_r(R^n)). In the second proof, the underlying tool is a sufficient statistic inequality for Fisher information, and results about entropy are obtained by integration over the path of a continuous normal perturbation. This proof also settles the cases of equality that are not determined in the first proof. We will encounter this duality again in Section IV, where uncertainty principles are derived by similar arguments.

c) The strong formal resemblance between the convexity inequalities (12) and (18), dealing with entropies and the Minkowski sum of sets, suggests the following inequality:

    μ(A+B)/S(A+B) ≥ μ(A)/S(A) + μ(B)/S(B),    (32)

as the dual of the Fisher information inequality (37). Here, S(C) denotes the outer Minkowski content of the boundary of a set C, which is defined as

    S(C) = liminf_{ε↓0} (1/ε) [ μ(C + εB_1) - μ(C) ].

Does (32) hold when both A and B are compact, convex and nonempty sets? Alternatively, is the ratio of volume to surface area increased by Minkowski sums for such sets? When in addition A (or B) is a ball, the inequality (32) indeed holds as a direct consequence of the Alexandrov-Fenchel inequality (see [10], p. 143).

d) Consider the functions

    f̃(v) = ... ,    g̃(y) = ... .    (33)

Note that for r = 1, V = X + Y and (33) is merely the Fisher information inequality (28). In conclusion, if (33) holds for r ≠ 1, then this remark is the skeleton of a new proof of Young's inequality for these values of r, a proof which is orthogonal to the existing proofs of [3] and [8].

D. Isoperimetric Inequalities

The classical isoperimetric inequality states that balls have the smallest surface area per given volume. Recall that S(A) is the surface area of a set A and that B_1 is the unit ball. So, an alternative statement of the isoperimetric inequality is as follows.

Theorem 15 (The Classical Isoperimetric Inequality):

    S(A) ≥ n μ(A)^((n-1)/n) μ(B_1)^(1/n),

with equality iff A is a ball in R^n.

Proof: Consider the nth power of the Brunn-Minkowski inequality (14) for B = εB_1 (so that μ(B)^(1/n) = ε μ(B_1)^(1/n)). The isoperimetric inequality results by subtracting μ(A), dividing by ε, and considering the limit as ε ↓ 0. □

A dual "isoperimetric inequality" was derived by such an approach out of the entropy power inequality (see [14] following [38]).

Theorem 16 (Isoperimetric Inequality for Entropies): For any random vector X in R^n for which the Fisher information J(X) exists,

    (1/n) J(X) N(X) ≥ 1.    (34)

Proof (following [14]): For Y = √ε Z, where Z is a standard multivariate normal (so N(Y) = ε), the entropy power inequality (10) reduces to

    (1/ε) [ N(X + √ε Z) - N(X) ] ≥ 1.    (35)

Letting ε ↓ 0 and applying DeBruijn's identity (29), the left side of (35) converges to (d/dε) N(X + √ε Z)|_{ε=0} = (1/n) J(X) N(X), which yields (34). □

Theorem 17 (Fisher Information Isoperimetric Inequality): When the Fisher information J(X) of a random vector X in R^n exists and is differentiable with respect to a small independent normal perturbation, then

    (d/dε) { [ (1/n) J(X + √ε Z) ]^(-1) } |_{ε=0} ≥ 1.    (36)

Proof (following [19]): While the Fisher information inequality (28) is the dual of the convexity inequality (12), the inequality

    J(X+Y)^(-1) - J(X)^(-1) - J(Y)^(-1) ≥ 0,    (37)

where X, Y are any two independent random vectors, is the dual of the entropy power inequality (10). This equivalent statement of the Fisher information inequality is proved, for example, in [6] (for n = 1) and [20] (for n ≠ 1). For Y = √ε Z (so that J(Y)^(-1) = ε/n) and in the limit ε ↓ 0, this inequality yields

    lim_{ε→0} (1/ε) { J(X + √ε Z)^(-1) - J(X)^(-1) } ≥ 1/n.

Since this is the same inequality as (36), the proof is completed. □

Remark: Inequality (36) is equivalent to the "Γ₂ inequality" of Bakry and Emery (see [1]).

The Fisher information isoperimetric inequality suggests that the sensitivity of the inverse of the Fisher information with respect to a small independent normal perturbation is minimal when the unperturbed variable already possesses a multivariate normal distribution. Note that the inverse of the Fisher information is exactly the
DEMBO ef al.: INFORMATION THEORETIC INEQUALITIES 1511
Cramer-Rao lower bound for the error of the estimate of a translation parameter (see also Section IV-B).

The concavity of the entropy power, which is proved directly at great length in [15], is the following immediate corollary of the Fisher information isoperimetric inequality (36).

Corollary 3 (Concavity of the Entropy Power): When the Fisher information of X exists and is differentiable with respect to a small independent normal perturbation, then

    (d²/dε²) { N(X + √ε Z) } |_{ε=0} ≤ 0.

Proof (following [19]): Two applications of DeBruijn's identity (29) yield

    (d²/dε²) { N(X + √ε Z) } |_{ε=0}
      = N(X) ( (1/n) J(X) )² [ 1 - (d/dε) { [ (1/n) J(X + √ε Z) ]^(-1) } |_{ε=0} ],

and by (36) the bracketed term is nonpositive. □

The isoperimetric Fisher information inequality is clearly equivalent to ...

While Lieb's proof of this conjecture (in [30]) is based on Hausdorff-Young and Young inequalities, here a stronger "incremental" result is derived as a direct consequence of the isoperimetric inequality for entropies. This demonstrates once again the close relationship between Fisher information and entropy.

A. Stam's Uncertainty Principle

We adopt the following definition of conjugate variables in quantum mechanics.

Definition: Associate with any complex wave amplitude function ψ in L_2(R^n) a probability density

    f_ψ(x) = |ψ(x)|² / ||ψ||₂².

Let φ(y) ∈ L_2(R^n) be the Fourier transform of ψ(x), and g_φ(y) the density similarly associated with φ. Then the random vectors X ~ f_ψ and Y ~ g_φ are called conjugate variables.

Stam's uncertainty principle relates the Fisher information matrix associated with a random vector (defined next) with the covariance of its conjugate variable.

Definition: The Fisher information matrix J(X) of a random vector X with a density f is

    J(X) = ∫ f(x) ∇ln f(x) ∇ln f(x)^T dx.
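The two scalar tools that the preceding proofs lean on, DeBruijn's identity and the Fisher information J(X) = ∫ (f')²/f, can be checked numerically for a non-Gaussian density. A sketch using an arbitrary two-component Gaussian mixture (the parameters, grid, and perturbation size are our choices); the scalar Cramer-Rao bound J(X) Var(X) ≥ 1 is checked as well:

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]

def phi(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def density(t):
    """Density of X + sqrt(t) Z for X ~ 0.5 N(-2, 0.5) + 0.5 N(1, 1):
    adding sqrt(t) Z simply adds t to each component variance."""
    return 0.5 * phi(x, -2.0, 0.5 + t) + 0.5 * phi(x, 1.0, 1.0 + t)

def h(t):                                   # differential entropy (nats)
    f = density(t)
    return -np.sum(f * np.log(f)) * dx

def J(t):                                   # scalar Fisher information
    f = density(t)
    return np.sum(np.gradient(f, dx) ** 2 / f) * dx

t0, d = 0.2, 1e-4
dh_dt = (h(t0 + d) - h(t0 - d)) / (2 * d)
assert abs(dh_dt - 0.5 * J(t0)) < 1e-4      # DeBruijn: d/dt h(X+sqrt(t)Z) = J/2

f = density(t0)
mean = np.sum(x * f) * dx
var = np.sum((x - mean) ** 2 * f) * dx
assert J(t0) * var >= 1.0                   # scalar Cramer-Rao bound
```

DeBruijn's identity is stated above only at t = 0, but it holds along the whole heat-flow path, which is why the check at t0 = 0.2 is legitimate and numerically better conditioned.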
Section IV-A corresponds to a proper normalization, in where throughout this section X,Y is any pair of conju-
which h/2r is replaced by 1/4~r. This normalization gate variables. A seemingly unrelated fact, a stronger
yields the following multivariate uncertainty relationships. version of Theorem 16 (the isoperimetric inequality for
entropies) whose detailed derivation is given in [20], states
Theorem 19: The n-dimensional Weyl-Heisenberg un-
that
certainty principle may be stated in any of the following
four equivalent forms: N(X)IJ(X)1%1.
16π²K_X^{1/2} K_Y K_X^{1/2} − I ≥ 0, (43)
16π²K_Y^{1/2} K_X K_Y^{1/2} − I ≥ 0, (44)
16π²K_X − K_Y^{−1} ≥ 0, (45)
16π²K_Y − K_X^{−1} ≥ 0, (46)

where X, Y are any pair of conjugate vectors (see Section IV-A for definition).

There exists a simple and direct proof of this inequality as a consequence of an appropriate Cauchy–Schwartz inequality. Here we present an alternative proof illustrating the connection of this uncertainty principle with the Cramér–Rao inequality:

Theorem 20 (Cramér–Rao Inequality):

J(X) − K_X^{−1} ≥ 0, (47)
J(Y) − K_Y^{−1} ≥ 0. (48)

Proof of the Weyl–Heisenberg Inequality: Adding Stam's uncertainty principle (39) and the Cramér–Rao inequality (47) yields the Weyl–Heisenberg principle (45). □

We interpret this relationship by suggesting that Stam's uncertainty principle "measures" the fluctuations in the phase of the amplitude wave functions ψ(x) and φ(y), while the Cramér–Rao inequality "measures" the amount of "nonnormality" of the associated densities f_ψ(x) and f_φ(y).

Actually, Stam's identities (41), (42) establish the equivalence of the Weyl–Heisenberg principle and the specific Cramér–Rao inequality given in Theorem 20. This equivalence is established by proving the Cramér–Rao inequality as a consequence of the Weyl–Heisenberg principle.

Proof of the Cramér–Rao Inequality (47): Suppose that X is a random variable in R^n with a density f(x) for which J(X) < ∞. Let ψ(x) = √f(x) be the associated real-valued amplitude wave function. Clearly, Stam's identity (41) holds. Substituting this identity into the Weyl–Heisenberg principle (45) yields the Cramér–Rao inequality (47). □

Remark: This equivalence is generalized in [20], and shown there to hold between general families of Weyl–Heisenberg and Cramér–Rao inequalities.

C. Hausdorff–Young Inequality and Hirschman's Uncertainty Principle

An immediate consequence of Stam's uncertainty principle (39) is that

16π²|K_Y|^{1/n} ≥ J(X)/n.

By combining this with the isoperimetric inequality for entropies (34) one obtains

16π²|K_Y|^{1/n} N(X) ≥ 1.

Now, the maximum entropy inequality N(Y) ≤ |K_Y|^{1/n} (Lemma 5) suggests the following sharper uncertainty principle.

Theorem 21 (Hirschman's Uncertainty Principle):

16π²N(Y)N(X) ≥ 1. (49)

This uncertainty principle was conjectured by Hirschman (in [28]), who proved a weaker version with a smaller constant. It follows as a corollary of the following strong version of the Hausdorff–Young inequality (due to Beckner [3]).

Theorem 22 (Hausdorff–Young Inequality): Let φ(y) be the Fourier transform of ψ(x) ∈ L_p(R^n). Then for any 1 ≤ p ≤ 2,

‖φ‖_{p'} ≤ c_p^{n/2} ‖ψ‖_p, (50)

where (1/p) + (1/p') = 1, and c_p = p^{1/p}/p'^{1/p'}.

Remarks:

a) In [40], the time duration of the function ψ(x) is measured via T_p = exp{h_{p/2}(X)} and its bandwidth is measured by Ω_p = exp{h_{p'/2}(Y)}. In this terminology, Hirschman's uncertainty principle amounts to the following "time-bandwidth" uncertainty relation:

T₂Ω₂ ≥ (e/2)^n.

b) One can also establish Young's inequality in the range 1 ≤ p, q ≤ 2 ≤ r out of the Hausdorff–Young inequality (Theorem 22) and elementary properties of the Fourier transform (see [3]).

c) Cases of equality in (50) are studied in [31].

d) Carlen [12] obtains the isoperimetric inequality (34) as a consequence of Hirschman's uncertainty principle.

D. A Discrete Version of Hirschman's Uncertainty Principle

Hausdorff–Young inequalities exist for Fourier transforms on groups other than R^n. Each of these inequalities yields the corresponding Hirschman uncertainty principle by considering the limit as p → 2. As an explicit example to demonstrate this idea, we show here that any unitary square matrix U (possibly of infinite dimension) with sup_{i,j}|u_{ij}| = M < 1 yields a nontrivial Hausdorff–Young inequality and consequently the following uncertainty principle.
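The Cramér–Rao inequality of Theorem 20 reduces in the scalar case to J(X) ≥ 1/Var(X), with equality only for Gaussian densities. The following short numerical check (not part of the paper; the grid and the two-component Gaussian mixture are arbitrary illustrative choices) evaluates J(X) = ∫ f'(x)²/f(x) dx on a grid and compares J·Var against 1:

```python
import numpy as np

# Scalar Cramer-Rao check: J(X) * Var(X) >= 1, with equality only for Gaussians.
# Density: an arbitrary two-component Gaussian mixture on a fine grid.
x = np.linspace(-12.0, 12.0, 20001)
dx = x[1] - x[0]

def gauss(x, m, s):
    return np.exp(-(x - m) ** 2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

f = 0.5 * gauss(x, -2.0, 1.0) + 0.5 * gauss(x, 2.0, 1.5)
f /= f.sum() * dx                                # renormalize on the grid

mean = (x * f).sum() * dx
var = ((x - mean) ** 2 * f).sum() * dx
J = (np.gradient(f, dx) ** 2 / f).sum() * dx      # J(X) = ∫ f'^2 / f dx

print(J * var)   # strictly greater than 1 for this non-Gaussian density
assert J * var >= 1.0
```

For this bimodal mixture the product J·Var is well above 1, reflecting the "nonnormality" that the text says the Cramér–Rao inequality measures.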
Theorem 23: The integer-valued random variables X, Y with P(X = i) = |x_i|²/‖x‖₂² and P(Y = i) = |(Ux)_i|²/‖Ux‖₂² are "conjugate" variables, where x is any vector with ‖x‖₂ < ∞. For any such pair,

H(X) + H(Y) ≥ 2 ln(1/M). (51)

Proof: The unitary matrix U is an isometry on the appropriate Hilbert space, i.e., for every x, ‖Ux‖₂ = ‖x‖₂. Furthermore, clearly ‖Ux‖_∞ ≤ M‖x‖₁, where ‖x‖_p = [Σ_i |x_i|^p]^{1/p} and ‖x‖_∞ = sup_i{|x_i|}. Riesz's interpolation theorem (between the extreme bounds above for p = 1 and p = 2) yields the following "Hausdorff–Young" inequality for any vector x and any 1 ≤ p ≤ 2:

‖Ux‖_{p'} ≤ M^{(2−p)/p} ‖x‖_p, (52)

where 1/p' + 1/p = 1. Consider now a pair of conjugate variables X and Y with distribution functions as previously defined. Then (52) implies an uncertainty principle for the (discrete) Rényi entropies of X and Y. Specifically, let

H_{p/2}(X) = (1/(1 − p/2)) ln Σ_i P(X = i)^{p/2} = (p/(1 − p/2)) ln(‖x‖_p/‖x‖₂),

and similarly

H_{p'/2}(Y) = (p'/(1 − p'/2)) ln(‖Ux‖_{p'}/‖Ux‖₂).

Since ‖Ux‖₂ = ‖x‖₂, taking logarithms in (52) yields H_{p/2}(X) + H_{p'/2}(Y) ≥ 2 ln(1/M) for every 1 ≤ p < 2, and letting p → 2 (so that both Rényi entropies converge to the corresponding Shannon entropies) yields (51). □

For the n-point unitary discrete Fourier transform matrix, M = 1/√n and (51) reads H(X) + H(Y) ≥ ln n. Taking x with a single nonzero entry makes P(X = i) degenerate while P(Y = k) = 1/n for k = 1, 2, ..., n, and for this pair of distributions the previous inequality holds with equality.

The discrete entropy is bounded above by the logarithm of the size of the support set of the distribution. Therefore, the uncertainty principle (51) implies that the product of the sizes of the support sets of the vector x and its discrete Fourier transform is at least the dimension n of the Fourier transform. This is Theorem 1 of [21] (where similar support-set inequalities are derived also for x such that (1 − ε) of ‖x‖₂ is concentrated over a relatively small index set).

E. Wehrl's Conjecture

Wehrl introduced a new definition of the "classical" entropy corresponding to a quantum system in an attempt to build a bridge between quantum theory and thermodynamics (see [39]). Consider a single particle in R^n. The (quantum) state of the particle is characterized by the "density matrix" ρ, a nonnegative definite linear operator on L₂(R^n) of unit trace (i.e., whose eigenvalues are nonnegative real numbers that sum to one). The coherent states are the normalized L₂(R^n) functions

ψ(x|p, q) = (1/π)^{n/4} exp(−½(x − q)ᵀ(x − q) + ipᵀx),

indexed by p, q ∈ R^n.
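Theorem 23 can be exercised numerically. The snippet below (an illustrative addition, not from the paper) uses the unitary n-point DFT matrix, for which |u_ij| = 1/√n so M = 1/√n, checks H(X) + H(Y) ≥ 2 ln(1/M) = ln n over random complex vectors, and verifies the equality case of a single spike:

```python
import numpy as np

# Discrete Hirschman uncertainty (Theorem 23) for the unitary DFT matrix.
rng = np.random.default_rng(0)
n = 8
U = np.fft.fft(np.eye(n), norm="ortho")   # unitary n-point DFT, |u_ij| = 1/sqrt(n)
M = 1.0 / np.sqrt(n)

def entropy(p):
    p = p[p > 1e-15]
    return float(-(p * np.log(p)).sum())

for _ in range(100):
    xv = rng.normal(size=n) + 1j * rng.normal(size=n)
    Px = np.abs(xv) ** 2 / np.linalg.norm(xv) ** 2
    Py = np.abs(U @ xv) ** 2 / np.linalg.norm(U @ xv) ** 2
    assert entropy(Px) + entropy(Py) >= 2 * np.log(1 / M) - 1e-9

# Equality: a single spike has degenerate P(X) and uniform P(Y).
spike = np.zeros(n)
spike[0] = 1.0
Py = np.abs(U @ spike) ** 2
assert np.isclose(entropy(Py), np.log(n))
```

The spike example mirrors the support-set consequence: one support set has size 1, the other size n, and their product equals the dimension n.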
The Wehrl ("classical") entropy of the state ρ is the differential entropy h(X_ρ) of a random vector X_ρ in R^{2n} whose density is

f_ρ(p, q) ∝ ⟨ψ(·|p, q), ρψ(·|p, q)⟩, (54)

normalized to integrate to one. Wehrl conjectured a universal lower bound on h(X_ρ), attained only over the coherent states; i.e., in the "classical" theory there is an inherent minimal level of uncertainty (due to "quantization"), the value of which is n. Further, this minimal uncertainty is obtained iff the operator ρ is a projection operator on one of the coherent states.

Wehrl's conjecture, which is restated below as a lower bound on the entropy power of X_ρ, was proved in [30] by an application of the strong versions of the Young and Hausdorff–Young inequalities (cases of equality were later determined by Carlen [11]).

Theorem 24 (Wehrl–Lieb): For X_ρ, a random variable in R^{2n} with density f_ρ of the form of (54),

N(X_ρ) ≥ 1,

and equality holds iff ρ is of rank 1 and X_ρ has a standard normal distribution.

Remarks:

a) It is fairly easy to show that the above conditions for equality are equivalent to ρ being a projection operator on exactly one coherent state.

b) Both the previous discussion and the statement of Theorem 24 correspond to the normalization under which ℏ = h/2π is replaced by 1/4π. In the real world all levels of uncertainty are to be appropriately restated in terms of multiples of Planck's constant h.

c) Recall the isoperimetric inequality for entropies (34), which for X_ρ in R^{2n} reads

N(X_ρ)J(X_ρ) ≥ 2n,

with equality iff X_ρ has a standard normal distribution. Because of this result, the above theorem (Wehrl's conjecture) is an immediate consequence of the following stronger "incremental" version.

Theorem 25 (Carlen [11], Dembo [20]): For X_ρ, as before,

J(X_ρ) ≤ 2n,

with equality iff ρ is an operator of rank 1.

Remark: Starting with Theorem 25 and applying a perturbation argument similar to the one presented in Section III-C yields the monotonicity of N(√t X_ρ + √(1−t) X_{ρ*}) with respect to t ∈ [0, 1], where ρ* is any projection operator on a coherent state and X_ρ and X_{ρ*} are independent random vectors. The appropriate interpretation of this result is, however, unclear.

Proof: The operator ρ may be decomposed into ρ = Σ_i λ_i P_i, where λ_i ≥ 0, Σ_i λ_i = 1, and the P_i are rank-one projection operators. Therefore, by (54) and the linearity of ρ ↦ f_ρ, the density of X_ρ is the convex combination Σ_i λ_i f_i, where the f_i are the densities corresponding to the P_i. The projection operators P_i correspond to densities

f_i(p, q) ∝ |∫ e_i(x) ψ(x|p, q) dx|²,

where e_i ∈ L₂(R^n) and ‖e_i‖₂ = 1. Theorem 25 is thus the immediate consequence of the following two lemmas.

Lemma 6: For any two random vectors X, Y in R^{2n} and any 0 ≤ λ ≤ 1, let Z = B_λX + (1 − B_λ)Y, where B_λ denotes a Bernoulli(λ) random variable, independent of both X and Y. The density of Z is therefore the convex combination λf + (1 − λ)g, where f, g are the densities of X and Y, respectively. Then,

λJ(X) + (1 − λ)J(Y) − J(Z) ≥ 0.

Proof: With this notation, after some manipulations we obtain

λJ(X) + (1 − λ)J(Y) − J(Z) = λ(1 − λ) ∫ [g(p,q)f(p,q)/(λf(p,q) + (1 − λ)g(p,q))] |∇ln f(p,q) − ∇ln g(p,q)|² dp dq. (55)

Since |∇ln f(p,q) − ∇ln g(p,q)|² ≥ 0, the integral on the right side of (55) is nonnegative and the proof is complete. □

Lemma 7: For any random vector X in R^{2n} with a density of the form f(p, q) ∝ |∫ e(x) ψ(x|p, q) dx|², where e ∈ L₂(R^n) and ‖e‖₂ = 1,

J(X) = 2n.

The proof of this lemma is by direct calculation. (For details see [20].)

V. DETERMINANT INEQUALITIES

A. Basic Inequalities

Throughout we will assume that K is a nonnegative definite symmetric n × n matrix. Let |K| denote the determinant of K. In Section III, we have seen that the entropy power inequality yields the Minkowski inequality (see Theorem 5) and the concavity of ln|K| (see Theorem 8).

We now give Hadamard's inequality using the proof in [17]. See also [33] for an alternative proof.

Theorem 26 (Hadamard): |K| ≤ Π_{i=1}^n K_ii, with equality iff K_ij = 0, i ≠ j.

Proof: Let X ~ φ_K. Then

½ ln(2πe)^n |K| = h(X₁, X₂, ..., X_n) ≤ Σ_{i=1}^n h(X_i) = Σ_{i=1}^n ½ ln 2πe K_ii,

with equality iff X₁, X₂, ..., X_n are independent, i.e., K_ij = 0 for i ≠ j. □
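Hadamard's inequality (Theorem 26) is easy to probe numerically. The sketch below (an illustrative addition, not from the paper) draws random Wishart-type nonnegative definite matrices K = AAᵀ, checks |K| ≤ ΠK_ii, and confirms equality in the diagonal case:

```python
import numpy as np

# Hadamard's inequality: det(K) <= prod(K_ii) for nonnegative definite K,
# with equality iff K is diagonal.
rng = np.random.default_rng(1)
for _ in range(200):
    A = rng.normal(size=(5, 8))
    K = A @ A.T                                  # symmetric nonnegative definite
    assert np.linalg.det(K) <= np.prod(np.diag(K)) * (1 + 1e-9)

D = np.diag(rng.uniform(1.0, 2.0, size=5))       # diagonal case: equality
assert np.isclose(np.linalg.det(D), np.prod(np.diag(D)))
```

This mirrors the proof exactly: the joint normal entropy (½ ln(2πe)^n|K|) can only fall below the sum of marginal entropies (Σ ½ ln 2πe K_ii).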
We now provide a direct information theoretic proof of Fan's (see [22]) Theorem 8 (which states that ln|K| is a concave function of K). This proof does not use the entropy power inequality, and provides an alternative to the proof in Section III.

Proof of Theorem 8: Let X₁ and X₂ be normally distributed n-vectors, X_i ~ φ_{K_i}(x), i = 1, 2. Let the random variable θ have distribution Pr{θ = 1} = λ, Pr{θ = 2} = 1 − λ, 0 ≤ λ ≤ 1. Let θ, X₁, and X₂ be independent and let Z = X_θ. Then Z has covariance K_Z = λK₁ + (1 − λ)K₂. However, Z will not be multivariate normal. By first using Lemma 5, followed by Lemma 3, we have

½ ln(2πe)^n |λK₁ + (1 − λ)K₂| ≥ h(Z) ≥ h(Z|θ) = λ · ½ ln(2πe)^n |K₁| + (1 − λ) · ½ ln(2πe)^n |K₂|,

and hence ln|λK₁ + (1 − λ)K₂| ≥ λ ln|K₁| + (1 − λ) ln|K₂|. □

Remark: Since h(X_n|X_{n−1}, ..., X₁) is a decreasing sequence, it has a limit. Hence, by the Cesàro mean limit theorem,

lim_{n→∞} h(X₁, X₂, ..., X_n)/n = lim_{n→∞} (1/n) Σ_{k=1}^n h(X_k|X_{k−1}, ..., X₁) = lim_{n→∞} h(X_n|X_{n−1}, ..., X₁). (59)

Translating this to determinants, one obtains the result

lim_{n→∞} |K_n|^{1/n} = lim_{n→∞} |K_n|/|K_{n−1}|.

Minimization of σ∞² over a set of allowed covariance matrices {K_n} is aided by the following theorem.

Theorem 29: ln(|K_n|/|K_{n−p}|) is concave in K_n.

Proof: We remark that Theorem 8 is not applicable because ln(|K_n|/|K_{n−p}|) is the difference of two concave functions. Let Z = X_θ, where X₁ ~ φ_{S_n}(x), X₂ ~ φ_{T_n}(x), Pr{θ = 1} = λ = 1 − Pr{θ = 2}, and X₁, X₂, θ are independent. The covariance matrix K_n of Z is given by

K_n = λS_n + (1 − λ)T_n.

The following chain of inequalities proves the theorem:

λ · ½ ln(2πe)^p (|S_n|/|S_{n−p}|) + (1 − λ) · ½ ln(2πe)^p (|T_n|/|T_{n−p}|)
(a)= λ h(X_{1,n}, X_{1,n−1}, ..., X_{1,n−p+1} | X_{1,1}, ..., X_{1,n−p}) + (1 − λ) h(X_{2,n}, X_{2,n−1}, ..., X_{2,n−p+1} | X_{2,1}, ..., X_{2,n−p})
= h(Z_n, Z_{n−1}, ..., Z_{n−p+1} | Z₁, ..., Z_{n−p}, θ)
(b)≤ h(Z_n, Z_{n−1}, ..., Z_{n−p+1} | Z₁, ..., Z_{n−p})
(c)≤ ½ ln(2πe)^p (|K_n|/|K_{n−p}|), (64)

where a) follows from

h(X_n, X_{n−1}, ..., X_{n−p+1} | X₁, ..., X_{n−p}) = h(X₁, ..., X_n) − h(X₁, ..., X_{n−p}),

b) follows from the conditioning lemma, and c) follows from a conditional version of Lemma 5. □

Theorem 29 for the case p = 1 is due to Bergstrom [4]. However, for p = 1, we can prove an even stronger theorem, also due to Bergstrom [4].

Theorem 30: |K_n|/|K_{n−1}| is concave in K_n.

Proof: Again we use the properties of normal random variables. Let us assume that we have two independent normal random vectors, X ~ φ_{A_n} and Y ~ φ_{B_n}. Let Z = X + Y. Then

½ ln 2πe (|A_n + B_n|/|A_{n−1} + B_{n−1}|)
(a)= h(Z_n | Z_{n−1}, Z_{n−2}, ..., Z₁)
(b)≥ h(Z_n | Z_{n−1}, Z_{n−2}, ..., Z₁, X₁, X₂, ..., X_{n−1}, Y₁, Y₂, ..., Y_{n−1})
(c)= h(X_n + Y_n | X₁, X₂, ..., X_{n−1}, Y₁, Y₂, ..., Y_{n−1})
(d)= ½ ln 2πe Var(X_n + Y_n | X₁, ..., X_{n−1}, Y₁, ..., Y_{n−1})
(e)= ½ ln 2πe [Var(X_n | X₁, ..., X_{n−1}) + Var(Y_n | Y₁, ..., Y_{n−1})]
(f)= ½ ln 2πe [|A_n|/|A_{n−1}| + |B_n|/|B_{n−1}|]. (63)

In this derivation, a) follows from Lemma 8, b) from the fact that conditioning decreases entropy, and c) follows from the fact that Z is a function of X and Y. The sum X_n + Y_n is normal conditioned on X₁, X₂, ..., X_{n−1}, Y₁, Y₂, ..., Y_{n−1}, and hence we can express its entropy in terms of its variance, obtaining equality d). Then e) follows from the independence of X_n and Y_n conditioned on the past X₁, X₂, ..., X_{n−1}, Y₁, Y₂, ..., Y_{n−1}, and f) follows from the fact that for a set of jointly normal random variables, the conditional variance is constant, independent of the conditioning variables (Lemma 8).

In general, by setting A = λS and B = (1 − λ)T, we obtain

|λS_n + (1 − λ)T_n| / |λS_{n−1} + (1 − λ)T_{n−1}| ≥ λ(|S_n|/|S_{n−1}|) + (1 − λ)(|T_n|/|T_{n−1}|),

i.e., |K_n|/|K_{n−1}| is concave. □

Simple examples show that |K_n|/|K_{n−p}| is not necessarily concave for p ≥ 2.

C. Subset Inequalities for Determinants

We now prove a generalization of Hadamard's inequality due to Szász [35]. Let K(i₁, i₂, ..., i_k) be the principal submatrix of K formed by the rows and columns with indexes i₁, i₂, ..., i_k.

Theorem 31 (Szász): If K is a positive definite n × n matrix and P_k denotes the product of all the principal k-rowed minors of K, i.e.,

P_k = Π_{1≤i₁<i₂<···<i_k≤n} |K(i₁, i₂, ..., i_k)|,
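Both results proved above check out numerically. The sketch below (an illustrative addition, not from the paper) verifies Fan's concavity of ln|K| and the additive form of Bergstrom's inequality, |A+B|/|A_{n−1}+B_{n−1}| ≥ |A|/|A_{n−1}| + |B|/|B_{n−1}|, on random positive definite matrices:

```python
import numpy as np

# Fan: ln|K| is concave.  Bergstrom (additive form): the ratio of the full
# determinant to its leading (n-1)x(n-1) principal minor is superadditive.
rng = np.random.default_rng(2)

def spd(n):
    X = rng.normal(size=(n, 2 * n))
    return X @ X.T                                # positive definite a.s.

def ratio(K):
    return np.linalg.det(K) / np.linalg.det(K[:-1, :-1])

for _ in range(200):
    A, B = spd(4), spd(4)
    # Bergstrom superadditivity (chain (63) above)
    assert ratio(A + B) >= ratio(A) + ratio(B) - 1e-8
    # Fan concavity of ln|K|
    lam = rng.uniform()
    lhs = np.log(np.linalg.det(lam * A + (1 - lam) * B))
    rhs = lam * np.log(np.linalg.det(A)) + (1 - lam) * np.log(np.linalg.det(B))
    assert lhs >= rhs - 1e-8
```

Setting A = λS and B = (1 − λ)T in the superadditivity check reproduces the concavity statement of Theorem 30, exactly as in the text.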
then

P₁ ≥ P₂^{1/(n−1)} ≥ P₃^{1/C(n−1,2)} ≥ ··· ≥ P_n,

where C(n−1, k−1) denotes the binomial coefficient.

Proof: Let X ~ φ_K. Then the theorem follows directly from Theorem 1, with the identification h_k^{(n)} = (1/(2kC(n,k))) ln P_k + ½ ln 2πe. □

We can also prove a related theorem.

Theorem 32: Let K be a positive definite n × n matrix and let

Q_k = C(n,k)^{−1} Σ_{1≤i₁<i₂<···<i_k≤n} |K(i₁, i₂, ..., i_k)|^{1/k}.

Then,

Q₁ ≥ Q₂ ≥ ··· ≥ Q_n.

Let R_k denote the corresponding normalized average of the mutual informations I(X(S); X(S^c)) over all subsets S of size k.

Theorem 34:

R₁ ≥ R₂ ≥ ··· ≥ R_{n−1} ≥ R_n.

Proof: The theorem follows immediately from Corollary 2 and the identification

I(X(S); X(S^c)) = ½ ln(|K(S)||K(S^c)|/|K|). □

In particular, the outer inequality R₁ ≥ R_n yields a corresponding inequality between determinants of complementary principal submatrices.
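Szász's chain of inequalities (Theorem 31) can be verified directly by enumerating all principal minors; comparing logarithms, the claim is that ln P_k / C(n−1, k−1) is nonincreasing in k. A small check (an illustrative addition, not from the paper) on a random positive definite matrix:

```python
import numpy as np
from itertools import combinations
from math import comb, log

# Szász (Theorem 31): P_1 >= P_2^{1/(n-1)} >= P_3^{1/C(n-1,2)} >= ... >= P_n,
# with P_k the product of all principal k-rowed minors; compared here in logs.
rng = np.random.default_rng(3)
n = 5
X = rng.normal(size=(n, 2 * n))
K = X @ X.T                                      # positive definite a.s.

def log_Pk(k):
    return sum(log(np.linalg.det(K[np.ix_(S, S)]))
               for S in (list(c) for c in combinations(range(n), k)))

vals = [log_Pk(k) / comb(n - 1, k - 1) for k in range(1, n + 1)]
assert all(vals[i] >= vals[i + 1] - 1e-9 for i in range(n - 1))
```

The normalization by C(n−1, k−1) is exactly the one produced by Han's inequality (Theorem 1), since kC(n,k) = nC(n−1, k−1).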
[10] Y. D. Burago and V. A. Zalgaller, Geometric Inequalities. New York: Springer-Verlag, 1980.
[11] E. A. Carlen, "Some integral identities and inequalities for entire functions and their application to the coherent state transform," J. Functional Anal., 1991.
[12] —, "Superadditivity of Fisher's information and logarithmic Sobolev inequalities," J. Functional Anal., 1991.
[13] E. A. Carlen and A. Soffer, "Entropy production by convolution and central limit theorems with strong rate information," Commun. Math. Phys., 1991.
[14] M. Costa and T. M. Cover, "On the similarity of the entropy power inequality and the Brunn-Minkowski inequality," IEEE Trans. Inform. Theory, vol. IT-30, pp. 837-839, 1984.
[15] M. H. M. Costa, "A new entropy power inequality," IEEE Trans. Inform. Theory, vol. IT-31, pp. 751-760, 1985.
[16] T. M. Cover and J. A. Thomas, "Determinant inequalities via information theory," SIAM J. Matrix Anal. Applicat., vol. 9, no. 3, pp. 384-392, July 1988.
[17] T. M. Cover and A. El Gamal, "An information theoretic proof of Hadamard's inequality," IEEE Trans. Inform. Theory, vol. IT-29, pp. 930-931, Nov. 1983.
[18] I. Csiszár, "Informationstheoretische Konvergenzbegriffe im Raum der Wahrscheinlichkeitsverteilungen," Publ. Math. Inst. Hungarian Acad. Sci., vol. VII, ser. A, pp. 137-157, 1962.
[19] A. Dembo, "A simple proof of the concavity of the entropy power with respect to the variance of additive normal noise," IEEE Trans. Inform. Theory, vol. 35, pp. 887-888, July 1989.
[20] —, "Information inequalities and uncertainty principles," Tech. Rep., Dept. of Statist., Stanford Univ., Stanford, CA, 1990.
[21] D. L. Donoho and P. B. Stark, "Uncertainty principles and signal recovery," SIAM J. Appl. Math., vol. 49, pp. 906-931, 1989.
[22] K. Fan, "On a theorem of Weyl concerning the eigenvalues of linear transformations II," Proc. Nat. Acad. Sci. U.S., vol. 36, pp. 31-35, 1950.
[23] —, "Some inequalities concerning positive-definite matrices," Proc. Cambridge Phil. Soc., vol. 51, pp. 414-421, 1955.
[24] H. Federer, Geometric Measure Theory, vol. 153 of Grundl. Math. Wiss. Berlin: Springer-Verlag, 1969.
[25] L. Gross, "Logarithmic Sobolev inequalities," Amer. J. Math., vol. 97, pp. 1061-1083, 1975.
[26] —, "Logarithmic Sobolev inequalities for the heat kernel on a Lie group," in White Noise Analysis. Singapore: World Scientific, 1990.
[27] T. S. Han, "Nonnegative entropy measures of multivariate symmetric correlations," Inform. Contr., vol. 36, pp. 133-156, 1978.
[28] I. I. Hirschman, "A note on entropy," Amer. J. Math., vol. 79, pp. 152-156, 1957.
[29] S. Kullback, "A lower bound for discrimination information in terms of variation," IEEE Trans. Inform. Theory, vol. IT-13, pp. 126-127, 1967.
[30] E. H. Lieb, "Proof of an entropy conjecture of Wehrl," Commun. Math. Phys., vol. 62, pp. 35-41, 1978.
[31] —, "Gaussian kernels have only Gaussian maximizers," Invent. Math., vol. 102, pp. 179-208, 1990.
[32] A. Marshall and I. Olkin, Inequalities: Theory of Majorization and its Applications. New York: Academic Press, 1979.
[33] —, "A convexity proof of Hadamard's inequality," Amer. Math. Monthly, vol. 89, pp. 687-688, 1982.
[34] H. Minkowski, "Diskontinuitätsbereich für arithmetische Äquivalenz," J. für Math., vol. 129, pp. 220-274, 1905.
[35] L. Mirsky, "On a generalization of Hadamard's determinantal inequality due to Szász," Arch. Math., vol. 8, pp. 274-275, 1957.
[36] A. Oppenheim, "Inequalities connected with definite Hermitian forms," J. London Math. Soc., vol. 5, pp. 114-119, 1930.
[37] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, 1948.
[38] A. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Inform. Contr., vol. 2, pp. 101-112, 1959.
[39] A. Wehrl, "General properties of entropy," Rev. Modern Phys., vol. 50, pp. 221-260, 1978.
[40] M. Zakai, "A class of definitions of 'duration' (or 'uncertainty') and the associated uncertainty relations," Inform. Contr., vol. 3, pp. 101-115, 1960.