IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 37, NO. 6, NOVEMBER 1991

Information Theoretic Inequalities


Amir Dembo, Thomas M. Cover, Fellow, IEEE, and Joy A. Thomas, Member, IEEE
Invited Paper

Abstract—The role of inequalities in information theory is reviewed and the relationship of these inequalities to inequalities in other branches of mathematics is developed.

Index Terms—Information inequalities, entropy power, Fisher information, uncertainty principles.

I. PREFACE: INEQUALITIES IN INFORMATION THEORY

INEQUALITIES in information theory have been driven by a desire to solve communication theoretic problems. To solve such problems, especially to prove converses for channel capacity theorems, the algebra of information was developed and chain rules for entropy and mutual information were derived. Fano's inequality, for example, bounds the probability of error by the conditional entropy. Some deeper inequalities were developed as early as Shannon's 1948 paper. For example, Shannon stated the entropy power inequality in order to bound the capacity of non-Gaussian additive noise channels.

Information theory is no longer restricted to the domain of communication theory. For this reason it is interesting to consider the set of known inequalities in information theory and search for other inequalities of the same type. Thus motivated, we will look for natural families of information theoretic inequalities.

For example, the entropy power inequality, which says that the entropy of the sum of two independent random vectors is no less than the entropy of the sum of their independent normal counterparts, has a strong formal resemblance to the Brunn-Minkowski inequality, which says that the volume of the set sum of two sets is greater than or equal to the volume of the set sum of their spherical counterparts. Similarly, since the exponentiated entropy is a measure of volume, it makes sense to consider the surface area of the volume of the typical set associated with a given probability density. Happily, this turns out to be another information quantity, the Fisher information.

A large number of inequalities can be derived from a strengthened Young's inequality. These inequalities include the entropy power inequality, the Brunn-Minkowski inequality and the Heisenberg uncertainty inequality. These inequalities are extreme points of the set of inequalities derivable from a central idea. Logically independent derivations of these inequalities exist and are based on Fisher information inequalities such as the Cramér-Rao inequality.

Turning our attention to simple inequalities for differential entropy, we apply them to the standard multivariate normal to furnish new and simpler proofs of the major determinant inequalities in classical mathematics. In particular Hadamard's inequality, Ky Fan's inequality and others can be derived this way. Indeed we find some new matrix inequalities by this method. Moreover the entropy power inequality, when specialized to matrices, turns out to yield Minkowski's determinant inequality, yet another tangency with the Minkowski of Brunn-Minkowski.

In the process of finding determinant inequalities we derive some new differential entropy inequalities. We restate one of them as follows. Suppose one is looking at ocean waves at a certain subset of points. Then the average entropy per sample of a random subset of samples can be shown to increase as the number of sampling points increases. On the other hand, the per sample conditional entropy of the samples, conditioned on the values of the remaining samples, monotonically decreases. Once again using these entropy inequalities on the standard multivariate normal leads to associated matrix inequalities and in particular to an extension of the sequence of inequalities found by Hadamard and Szász.

By turning our attention from the historically necessary inequalities to the natural set of inequalities suggested by information theory itself, we find, full circle, that these inequalities turn out to be useful as well. They improve determinant inequalities, lead to overlooked inequalities for the entropy rate of random subsets and demonstrate the unity between physics, mathematics, information theory and statistics (through unified proofs of the Heisenberg, entropy power, Fisher information and Brunn-Minkowski inequalities).

The next section is devoted to differential entropy inequalities for random subsets of samples. These inequalities, when specialized to multivariate normal variables, provide the determinant inequalities presented in Section V. Section III focuses on the entropy power inequality (including the related Brunn-Minkowski, Young's and Fisher information inequalities), while Section IV deals with various uncertainty principles and their interrelations.

Manuscript received February 1, 1991. This work was supported in part by the National Science Foundation under Grant NCR-89-14538 and in part by JSEP Contract DAAL 03-91-C-0010. A. Dembo was supported in part by the SDIO/IST, managed by the Army Research Office under Contract DAAL 03-90-G-0108, and in part by the Air Force Office of Scientific Research, Air Force Systems Command, under Contract AF88-0327. Sections III and IV are based on material presented at the IEEE/CAM Workshop on Information Theory, 1989.
A. Dembo is with the Statistics Department, Stanford University, Stanford, CA 94305.
T. M. Cover is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305.
J. Thomas was with Stanford University. He is now with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598.
IEEE Log Number 9103368.
II. INFORMATION INEQUALITIES

A. Basic Inequalities

In this section, we introduce some of the basic information theoretic quantities and a few well-known simple inequalities using convexity. We assume throughout that the vector X = (X_1, X_2, ..., X_n) has a probability density f(x_1, x_2, ..., x_n). We need the following definitions.

Definition: The entropy h(X_1, X_2, ..., X_n), sometimes written h(f), is defined by

    h(X_1, X_2, ..., X_n) = -\int f \ln f = E(-\ln f(X)).

The entropy may be infinite, and it is well defined as long as either E(max{\ln f(X), 0}) or E(max{-\ln f(X), 0}) is finite.

The entropy is a measure of the number of bits required to describe a random variable to a particular accuracy. Approximately b + h(X) bits suffice to describe X to b-bit accuracy. Also, e^{(2/n)h(X)} can be interpreted as the effective support set size for the random variable X. This point is further explored in Section III.

Definition: The functional

    D(f||g) = \int f(x) \ln(f(x)/g(x)) dx

is called the relative entropy, where f and g are probability densities.

The relative entropy D(f||g) is also known as the Kullback-Leibler information number, information for discrimination, and information distance. We also note that D(f||g) is the error exponent in the hypothesis test of density f versus g.

Definition: The conditional entropy h(X|Y) of X given Y is defined by

    h(X|Y) = -\int f(x,y) \ln f(x|y) \, dx \, dy.

We now observe certain natural properties of these information quantities.

Lemma 1: D(f||g) \ge 0, with equality iff f = g a.e.

Proof: Let A be the support set of f. Then, by Jensen's inequality,

    -D(f||g) = \int_A f \ln(g/f) \le \ln \int_A f (g/f) = \ln \int_A g \le \ln 1 = 0,

with equality only if g/f = 1, a.e., by the strict concavity of the logarithm (see [18], [29]). □

Lemma 2: If (X,Y) have a joint density, then h(X|Y) = h(X,Y) - h(Y).

Proof:

    h(X|Y) = -\int f(x,y) \ln f(x|y) \, dx \, dy
           = -\int f(x,y) \ln(f(x,y)/f(y)) \, dx \, dy
           = -\int f(x,y) \ln f(x,y) \, dx \, dy + \int f(y) \ln f(y) \, dy
           = h(X,Y) - h(Y). □

Lemma 3: h(X|Y) \le h(X), with equality iff X and Y are independent.

Proof:

    h(X) - h(X|Y) = \int f(x,y) \ln(f(x|y)/f(x)) = \int f(x,y) \ln(f(x,y)/f(x)f(y)) \ge 0,

by D(f(x,y)||f(x)f(y)) \ge 0. Equality implies f(x,y) = f(x)f(y), a.e., by strict concavity of the logarithm. □

Lemma 4 (Chain Rule, Subadditivity of the Entropy):

    h(X_1, X_2, ..., X_n) = \sum_{i=1}^{n} h(X_i | X_{i-1}, X_{i-2}, ..., X_1) \le \sum_{i=1}^{n} h(X_i),

with equality iff X_1, X_2, ..., X_n are independent.

Proof: The equality is the chain rule for entropies, which we obtain by repeatedly applying Lemma 2. The inequality follows from Lemma 3, and we have equality iff X_1, X_2, ..., X_n are independent. □

We will also need the entropy maximizing property of the multivariate normal. Throughout we denote by φ_K(x) the joint density of the multivariate normal vector with zero mean and covariance K.

Lemma 5: Let the random vector X ∈ R^n have zero mean and covariance K = EXX', i.e., K_{ij} = EX_iX_j, 1 \le i, j \le n. Then h(X) \le \frac{1}{2}\ln(2\pi e)^n |K|, with equality iff f(x) = φ_K(x).

Proof: Let g(x) be any density satisfying \int g(x) x_i x_j dx = K_{ij}, for all i, j. Then,

    0 \le D(g||\phi_K) = \int g \ln(g/\phi_K) = -h(g) - \int g \ln \phi_K
      = -h(g) - \int \phi_K \ln \phi_K = -h(g) + h(\phi_K),    (1)

where the substitution \int g \ln \phi_K = \int \phi_K \ln \phi_K follows from the fact that g and φ_K yield the same expectation of the quadratic form ln φ_K(x). □
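Since every quantity above has a closed form for multivariate normals, Lemmas 1-3 are easy to check numerically. The following is a minimal Python sketch, not drawn from the references; it assumes NumPy, the helper name h_gauss and the random covariance are arbitrary choices, and it uses h(X) = (1/2) ln((2πe)^n |K|) from Lemma 5 together with the standard closed form for the relative entropy between two zero-mean normals.

import numpy as np

def h_gauss(K):
    # differential entropy (in nats) of a zero-mean normal with covariance K
    K = np.atleast_2d(K)
    n = K.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
K = A @ A.T + np.eye(2)                    # joint covariance of a pair (X, Y)
hXY, hX, hY = h_gauss(K), h_gauss(K[:1, :1]), h_gauss(K[1:, 1:])
hX_given_Y = hXY - hY                      # Lemma 2: h(X|Y) = h(X,Y) - h(Y)
assert hX_given_Y <= hX + 1e-12            # Lemma 3: conditioning reduces entropy

# Lemma 1 for two zero-mean normals: D(phi_K || phi_G) >= 0
G = np.eye(2)
D = 0.5 * (np.trace(np.linalg.inv(G) @ K) - 2 + np.log(np.linalg.det(G) / np.linalg.det(K)))
assert D >= 0
print(hX, hX_given_Y, D)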
B. Subset Inequalities for Entropy

Motivated by a desire to prove Szász's generalization of Hadamard's inequality in Section V, we develop a new inequality on the entropy rates of random subsets of random variables.

Let X_1, X_2, ..., X_n be a set of n random variables with an arbitrary joint distribution. Let S be any subset of the indices {1, 2, ..., n}. We will use X(S) to denote the subset of random variables with indices in S and S^c to denote the complement of S with respect to {1, 2, ..., n}. For example, if S = {1, 3}, then X(S) = {X_1, X_3} and X(S^c) = {X_2, X_4, X_5, ..., X_n}. Recall that the entropy h(X) of a random vector X ∈ R^k with density function f(x) is

    h(X) = -\int f(x) \ln f(x) \, dx.

If S = {i_1, i_2, ..., i_k}, let

    h(X(S)) = h(X_{i_1}, X_{i_2}, ..., X_{i_k}).

Let

    h_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S: |S|=k} \frac{h(X(S))}{k}

be the entropy rate per element for subsets of size k, averaged over all k-element subsets. Here, h_k^{(n)} is the average entropy in bits per symbol of a randomly drawn k-element subset of {X_1, X_2, ..., X_n}. This quantity is monotonically nonincreasing in k, as stated in the following theorem (due to Han [27]).

Theorem 1:

    h_1^{(n)} \ge h_2^{(n)} \ge \cdots \ge h_n^{(n)}.

Proof (following [16]): We first prove the inequality h_n^{(n)} \le h_{n-1}^{(n)}. We write

    h(X_1, X_2, ..., X_n) = h(X_1, X_2, ..., X_{n-1}) + h(X_n | X_1, X_2, ..., X_{n-1}),
    h(X_1, X_2, ..., X_n) = h(X_1, X_2, ..., X_{n-2}, X_n) + h(X_{n-1} | X_1, X_2, ..., X_{n-2}, X_n)
                          \le h(X_1, X_2, ..., X_{n-2}, X_n) + h(X_{n-1} | X_1, X_2, ..., X_{n-2}),
    \vdots
    h(X_1, X_2, ..., X_n) \le h(X_2, X_3, ..., X_n) + h(X_1).

Adding these n inequalities and using the chain rule, we obtain

    n h(X_1, X_2, ..., X_n) \le \sum_{i=1}^{n} h(X_1, X_2, ..., X_{i-1}, X_{i+1}, ..., X_n) + h(X_1, X_2, ..., X_n),

or

    \frac{1}{n} h(X_1, X_2, ..., X_n) \le \frac{1}{n} \sum_{i=1}^{n} \frac{h(X_1, X_2, ..., X_{i-1}, X_{i+1}, ..., X_n)}{n-1},    (2)

which is the desired result h_n^{(n)} \le h_{n-1}^{(n)}.

We now prove that h_k^{(n)} \le h_{k-1}^{(n)} for all k \le n, by first conditioning on a k-element subset, then taking a uniform choice over its (k-1)-element subsets. For each k-element subset, h_k^{(k)} \le h_{k-1}^{(k)}, and hence the inequality remains true after taking the expectation over all k-element subsets chosen uniformly from the n elements. □

Corollary 1: Let r > 0, and define

    s_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S: |S|=k} \exp\left(\frac{r \, h(X(S))}{k}\right).    (3)

Then,

    s_1^{(n)} \ge s_2^{(n)} \ge \cdots \ge s_n^{(n)}.    (4)

Proof: Starting from (2) in Theorem 1, we multiply both sides by r, exponentiate, and then apply the arithmetic-mean, geometric-mean inequality to obtain

    e^{(1/n) r h(X_1, X_2, ..., X_n)} \le e^{(1/n) \sum_{i=1}^{n} r h(X_1, ..., X_{i-1}, X_{i+1}, ..., X_n)/(n-1)}
        \le \frac{1}{n} \sum_{i=1}^{n} e^{r h(X_1, ..., X_{i-1}, X_{i+1}, ..., X_n)/(n-1)},    for all r \ge 0,    (5)

which is equivalent to s_n^{(n)} \le s_{n-1}^{(n)}. Now we use the same arguments as in Theorem 1, taking an average over all subsets, to prove the result that for all k \le n, s_k^{(n)} \le s_{k-1}^{(n)}. □

The conditional entropy rate per element for a k-element subset S is h(X(S)|X(S^c))/k.

Definition: The average conditional entropy rate per element for all subsets of size k is the average of the previous quantities for k-element subsets of {1, 2, ..., n}, i.e.,

    g_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S: |S|=k} \frac{h(X(S)|X(S^c))}{k}.

Here, h(X(S)|X(S^c))/k is the entropy per element of the set S conditional on the elements of the set S^c. When the size of the set S increases, one could expect a greater dependence between the elements of the set S, and expect a decrease in the entropy per element. This explains Theorem 1.
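Theorem 1 is easy to check numerically for jointly Gaussian X_1, ..., X_n, since then h(X(S)) = (1/2) ln((2πe)^{|S|} |K_S|) with K_S the corresponding principal submatrix of the covariance. The following Python sketch (an illustration only; the random covariance and helper names are arbitrary choices) enumerates all subsets and confirms that h_k^{(n)} is nonincreasing in k.

import itertools
import numpy as np

def h_gauss(K):
    return 0.5 * np.log((2 * np.pi * np.e) ** K.shape[0] * np.linalg.det(K))

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T + 0.1 * np.eye(n)                  # covariance of (X_1, ..., X_n)

def h_rate(k):
    # h_k^{(n)}: average of h(X(S))/k over all k-element subsets S
    subsets = itertools.combinations(range(n), k)
    return np.mean([h_gauss(K[np.ix_(S, S)]) / k for S in subsets])

rates = [h_rate(k) for k in range(1, n + 1)]
print(np.round(rates, 4))                      # nonincreasing in k, as Theorem 1 asserts
assert all(rates[i] >= rates[i + 1] - 1e-10 for i in range(n - 1))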
In the case of the conditional entropy per element, as k increases, the size of the conditioning set S^c decreases and the entropy of the set S increases, since conditioning reduces entropy. In the conditional case, the increase in entropy per element due to the decrease in conditioning dominates the decrease due to additional dependence between the elements, and hence we have the following theorem, which is a consequence of the general formalism developed by Han [27].

Theorem 2:

    g_1^{(n)} \le g_2^{(n)} \le \cdots \le g_n^{(n)}.

Proof: The proof proceeds on lines very similar to the proof of the theorem for the unconditional entropy per element for a random subset. We will first prove that g_n^{(n)} \ge g_{n-1}^{(n)}, and then use this to prove the rest of the inequalities.

By the chain rule, the entropy of a collection of random variables is less than the sum of the entropies, i.e.,

    h(X_1, X_2, ..., X_n) \le \sum_{i=1}^{n} h(X_i).

Subtracting both sides of this inequality from n h(X_1, X_2, ..., X_n), we have

    (n-1) h(X_1, X_2, ..., X_n) \ge \sum_{i=1}^{n} \left( h(X_1, X_2, ..., X_n) - h(X_i) \right)
        = \sum_{i=1}^{n} h(X_1, X_2, ..., X_{i-1}, X_{i+1}, ..., X_n | X_i).

Dividing this by n(n-1), we obtain

    \frac{h(X_1, X_2, ..., X_n)}{n} \ge \frac{1}{n} \sum_{i=1}^{n} \frac{h(X_1, X_2, ..., X_{i-1}, X_{i+1}, ..., X_n | X_i)}{n-1},

which is equivalent to g_n^{(n)} \ge g_{n-1}^{(n)}.

We now prove that g_k^{(n)} \ge g_{k-1}^{(n)} for all k \le n by first conditioning on a k-element subset, then taking a uniform choice over its (k-1)-element subsets. For each k-element subset, g_k^{(k)} \ge g_{k-1}^{(k)}, and hence the inequality remains true after taking the expectation over all k-element subsets chosen uniformly from the n elements. □

C. Inequalities for Average Mutual Information between Subsets

The previous two theorems can be used to prove the following statement about mutual information.

Corollary 2: Let

    f_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S: |S|=k} \frac{I(X(S); X(S^c))}{k}.

Then,

    f_1^{(n)} \ge f_2^{(n)} \ge \cdots \ge f_n^{(n)}.

Proof: This result follows from the identity I(X(S); X(S^c)) = h(X(S)) - h(X(S)|X(S^c)) and Theorems 1 and 2. □
We now prove an inequality for the average mutual information between a subset and its complement, averaged over all subsets of size k in a set of random variables. This inequality will be used to prove yet another determinant inequality along the lines of Szász's theorem; however, unlike the inequalities in the previous section, there is no normalization by the number of elements in the subset.

Let

    i_k^{(n)} = \frac{1}{\binom{n}{k}} \sum_{S: |S|=k} I(X(S); X(S^c))

be the average mutual information between a subset and its complement, averaged over all subsets of size k. By the symmetry of mutual information and the definition of i_k^{(n)}, it is clear that i_k^{(n)} = i_{n-k}^{(n)}.

Theorem 3:

    i_1^{(n)} \le i_2^{(n)} \le \cdots \le i_{\lfloor n/2 \rfloor}^{(n)}.

Remark: Note that the dependence between sets and their complements is greatest when they are of equal size.

Proof: Let k \le \lfloor n/2 \rfloor. Consider a particular subset S of size k. S has k subsets of size k-1. Let S_j denote the subset S - {j}. Then

    k I(X(S); X(S^c)) - \sum_{j \in S} I(X(S_j); X(S_j^c))
        = \sum_{j \in S} \left[ I(X(S_j), X_j; X(S^c)) - I(X(S_j); X(S_j^c)) \right]
        = \sum_{j \in S} \left[ I(X(S_j); X(S^c)) + I(X_j; X(S^c) | X(S_j)) - I(X(S_j); X(S^c)) - I(X(S_j); X_j | X(S^c)) \right]
        = \sum_{j \in S} \left[ h(X_j | X(S_j)) - h(X_j | X(S_j), X(S^c)) - h(X_j | X(S^c)) + h(X_j | X(S^c), X(S_j)) \right]
        = \sum_{j \in S} \left[ h(X_j | X(S_j)) - h(X_j | X(S^c)) \right].

Summing this over all subsets of size k, we obtain

    \sum_{S: |S|=k} \left[ k I(X(S); X(S^c)) - \sum_{j \in S} I(X(S_j); X(S_j^c)) \right]
        = \sum_{S: |S|=k} \sum_{j \in S} \left[ h(X_j | X(S_j)) - h(X_j | X(S^c)) \right].

Reversing the order of summation, we obtain

    \sum_{S: |S|=k} \left[ k I(X(S); X(S^c)) - \sum_{j \in S} I(X(S_j); X(S_j^c)) \right]
        = \sum_{j=1}^{n} \sum_{S: |S|=k, S \ni j} \left[ h(X_j | X(S_j)) - h(X_j | X(S^c)) \right]
        = \sum_{j=1}^{n} \left[ \sum_{S': S' \subset \{j\}^c, |S'|=k-1} h(X_j | X(S')) - \sum_{S'': S'' \subset \{j\}^c, |S''|=n-k} h(X_j | X(S'')) \right].    (6)

Since k \le \lfloor n/2 \rfloor, k-1 < n-k. So we would expect the second sum in (6) to be less than the first sum, since both sums have the same number of terms but the second sum corresponds to entropies with more conditioning. We will prove this by using a simple symmetry argument.

The set S'' with n-k elements has \binom{n-k}{k-1} subsets of size k-1. For each such subset S' of size k-1, we have

    h(X_j | X(S'')) \le h(X_j | X(S')),    (7)

since conditioning reduces entropy. Since (7) is true for each subset S' \subset S'', it is true of the average over subsets. Hence,

    h(X_j | X(S'')) \le \frac{1}{\binom{n-k}{k-1}} \sum_{S': S' \subset S'', |S'|=k-1} h(X_j | X(S')).    (8)

Summing (8) over all subsets S'' of size n-k, we get

    \sum_{S'': |S''|=n-k} h(X_j | X(S'')) \le \frac{1}{\binom{n-k}{k-1}} \sum_{S'': |S''|=n-k} \; \sum_{S': S' \subset S'', |S'|=k-1} h(X_j | X(S')) = \sum_{S': |S'|=k-1} h(X_j | X(S')),    (9)

since by symmetry, each subset S' occurs in \binom{n-k}{n-2k+1} = \binom{n-k}{k-1} sets S''.

Combining (6) and (9), we get

    \sum_{S: |S|=k} \left[ k I(X(S); X(S^c)) - \sum_{j \in S} I(X(S_j); X(S_j^c)) \right] \ge 0.

Since each set of size k-1 occurs n-k+1 times in the second sum, we have

    \sum_{S: |S|=k} k I(X(S); X(S^c)) \ge \sum_{S: |S|=k} \sum_{j \in S} I(X(S_j); X(S_j^c)) = (n-k+1) \sum_{S': |S'|=k-1} I(X(S'); X(S'^c)).

Dividing this equation by k\binom{n}{k}, we have the theorem. □

III. THE ENTROPY POWER AND RELATED ANALYTICAL INEQUALITIES

The entropy power inequality, which says that the entropy of the sum of two independent random vectors is no less than the entropy of the sum of their independent normal counterparts, has a strong formal resemblance to the Brunn-Minkowski inequality, which says that the volume of the set sum of two sets is greater than or equal to the volume of the set sum of their spherical counterparts. Both are interpreted here as convexity inequalities for Rényi entropies that measure the uncertainty associated with a random variable X via the pth norm of its density (see Section III-A). A strengthened version of Young's inequality about the norms of convolutions of functions, due to Beckner [3] and Brascamp and Lieb [8], is equivalent to a more general convexity inequality, with both the entropy power and the Brunn-Minkowski inequality being extreme points (see Section III-B).

This proof of the entropy power inequality (due to Lieb [30]) is different from Stam's [38] proof, which relies upon a convexity inequality for Fisher information. Nevertheless, the interpretation of the entropy power inequality as a convexity inequality for entropy allows for a new, simpler version of Stam's proof, presented here in Section III-C.

Isoperimetric versions of the entropy power and the Fisher information inequalities have derivations that parallel the classical derivation of the isoperimetric inequality as a consequence of the Brunn-Minkowski inequality (see Section III-D, following Costa and Cover [14] and Dembo [19]).

A. Entropy Power and Brunn-Minkowski Inequalities

The definition of the entropy power and the associated entropy power inequality stated next are due to Shannon [37]. The entropy power inequality is instrumental in establishing the capacity region of the Gaussian broadcast channel ([5]) and in proving convergence in relative entropy for the central limit theorem ([2]).

Definition: The entropy power of a random vector X ∈ R^n with a density is

    N(X) = \frac{1}{2\pi e} \exp\left(\frac{2}{n} h(X)\right).

In particular, N(X) = |K|^{1/n} when X \sim \phi_K.

Theorem 4 (Entropy Power Inequality): If X, Y are two independent random vectors with densities in R^n and both h(X) and h(Y) exist, then

    N(X+Y) \ge N(X) + N(Y).    (10)

Equality holds iff X and Y are both multivariate normal with proportional covariances.

In the sequel (see Section III-C), we shall present a simplified version of Stam's first proof of this inequality (in [38]) as well as a less known proof due to Lieb [30].
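For multivariate normals the entropy power is simply N(X) = |K|^{1/n}, so (10) can be exercised numerically on random covariance matrices; in this family it is exactly the determinant inequality proved next. A short Python sketch (NumPy assumed; the covariances are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
n = 4

def rand_cov():
    A = rng.standard_normal((n, n))
    return A @ A.T + 0.1 * np.eye(n)

def N_gauss(K):
    # entropy power of a zero-mean normal with covariance K: N = |K|^(1/n)
    return np.linalg.det(K) ** (1.0 / n)

K1, K2 = rand_cov(), rand_cov()
# X ~ N(0, K1) and Y ~ N(0, K2) independent, so X + Y ~ N(0, K1 + K2)
print(N_gauss(K1 + K2), N_gauss(K1) + N_gauss(K2))      # left side >= right side, as in (10)
assert N_gauss(K1 + K2) >= N_gauss(K1) + N_gauss(K2) - 1e-10
# proportional covariances give equality:
print(np.isclose(N_gauss(K1 + 2.5 * K1), N_gauss(K1) + N_gauss(2.5 * K1)))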
The next matrix inequality (Oppenheim [36], Marshall and Olkin [32, p. 475]) follows immediately from the entropy power inequality when specialized to the multivariate normal.

Theorem 5 (Minkowski's Inequality [34]): For any two nonnegative definite matrices K_1, K_2,

    |K_1 + K_2|^{1/n} \ge |K_1|^{1/n} + |K_2|^{1/n},

with equality iff K_1 is proportional to K_2.

Proof: Let X_1, X_2 be independent with X_i \sim \phi_{K_i}. Noting that X_1 + X_2 \sim \phi_{K_1+K_2} and using the entropy power inequality yields

    |K_1 + K_2|^{1/n} = N(X_1 + X_2) \ge N(X_1) + N(X_2) = |K_1|^{1/n} + |K_2|^{1/n}. □

The following alternative statement of the entropy power inequality is given in Costa and Cover [14].

Theorem 6: For any two independent random vectors X, Y such that both h(X) and h(Y) exist,

    h(X+Y) \ge h(\tilde{X} + \tilde{Y}),    (11)

where \tilde{X}, \tilde{Y} are two independent multivariate normals with proportional covariances, chosen so that h(\tilde{X}) = h(X) and h(\tilde{Y}) = h(Y).

Proof: For \tilde{X} and \tilde{Y} multivariate normal, Minkowski's inequality and the entropy power inequality (10) hold with equality. Furthermore, \tilde{X} and \tilde{Y} are chosen so that

    N(\tilde{X} + \tilde{Y}) = N(\tilde{X}) + N(\tilde{Y}) = N(X) + N(Y) \le N(X+Y),

where the last inequality follows from (10). Thus (10) and (11) are equivalent. □

Alternatively, the entropy power inequality also amounts to the convexity of the entropy under the "covariance preserving transformation" \sqrt{\lambda} X + \sqrt{1-\lambda} Y, as follows.

Theorem 7: For any 0 \le \lambda \le 1,

    h(\sqrt{\lambda} X + \sqrt{1-\lambda} Y) - \lambda h(X) - (1-\lambda) h(Y) \ge 0.    (12)

Proof: For \tilde{X} and \tilde{Y} the inequality (12) holds trivially with equality. Therefore, (12) is equivalent to

    h(\sqrt{\lambda} X + \sqrt{1-\lambda} Y) \ge h(\sqrt{\lambda} \tilde{X} + \sqrt{1-\lambda} \tilde{Y}).

The latter inequality is merely (11) with \sqrt{\lambda} X substituted for X and \sqrt{1-\lambda} Y substituted for Y. □

Remark: Theorem 7 parallels part of Lieb's proof of Theorem 4 (in [30]).

In parallel with the above derivation of Minkowski's inequality, the following theorem due to Ky Fan [22] results from specializing (12) to the multivariate normal.

Theorem 8 (Ky Fan [22]): ln |K| is concave.

Proof: Consider (12) for X \sim \phi_{K_1} and Y \sim \phi_{K_2}. Then, \sqrt{\lambda} X + \sqrt{1-\lambda} Y is also multivariate normal with covariance \lambda K_1 + (1-\lambda) K_2, and (12) becomes

    \ln|\lambda K_1 + (1-\lambda) K_2| \ge \lambda \ln|K_1| + (1-\lambda) \ln|K_2|. □    (13)

Remark: See Section V-A for an alternative information theoretic proof of both Theorems 5 and 8, which avoids the entropy power inequality.

The entropy power inequality has a strong formal resemblance to the Brunn-Minkowski inequality. For defining the latter, let μ denote Lebesgue measure in R^n (i.e., set volume in R^n) and A + B denote the Minkowski sum (in R^n) of the (measurable) sets A and B, that is,

    A + B = \{x + y : x \in A, \, y \in B\}.

Theorem 9 (Brunn-Minkowski Inequality [24]):

    \mu(A+B)^{1/n} \ge \mu(A)^{1/n} + \mu(B)^{1/n}.    (14)

Proof: For a very simple geometric proof, see [24]. An alternative proof of this inequality as an extreme point of Young's inequality (which is due to Brascamp and Lieb, see [7] and [9]) is presented in Section III-B.

The entropy power is a measure of the effective variance of a random vector while μ(A)^{1/n} measures the effective radius of a set A. Thus, the entropy power inequality, which says that the effective variance of the sum of two independent random vectors is no less than the sum of the effective variances of these vectors, is the dual of the Brunn-Minkowski inequality, which says that the effective radius of the set sum of two sets is no less than the sum of the effective radii of these sets. In this formal duality, normal random variables are the analog of balls (being the equality cases for the previously mentioned inequalities), and the sum of two independent random vectors is the analog of the Minkowski sum of sets. This analogy is suggested in [14], where the existence of a family of intermediate inequalities is conjectured. We shall further develop this issue here and show in Section III-B that Young's inequality is the bridge between the entropy power and the Brunn-Minkowski inequalities. The following family of Rényi entropies helps in illustrating these relationships.

Definition: The pth Rényi entropy h_p(X) of a random variable X with density f in R^n is defined by

    h_p(X) = \frac{1}{1-p} \ln E[f(X)^{p-1}] = \frac{p}{1-p} \ln \|f\|_p,    (15)

for 0 < p < \infty, p \ne 1, where \|f\|_p = [\int f(x)^p dx]^{1/p}. The Rényi entropies for p = 0 and p = 1 are defined as the limits of h_p(X) as p \to 0 and p \to 1, respectively. It follows directly from the previous definition that

    h_0(X) = \lim_{p \to 0} h_p(X) = \ln \mu(\{x : f(x) > 0\}),    (16)

and

    h_1(X) = \lim_{p \to 1} h_p(X) = h(X).    (17)

Therefore, the (Shannon) entropy is identified with the Rényi entropy of index p = 1, while the logarithm of the essential support of the density is identified with the Rényi entropy of index p = 0.
A convexity inequality for Rényi entropies of index p = 0, which is the dual of (12), is the following.

Theorem 10: For any 0 \le \lambda \le 1 and any two independent random vectors X, Y,

    h_0(\lambda X + (1-\lambda) Y) - \lambda h_0(X) - (1-\lambda) h_0(Y) \ge 0.    (18)

Remarks:

a) While Theorem 7 deals with convexity under the "variance preserving transformation" \sqrt{\lambda} X + \sqrt{1-\lambda} Y, this theorem deals with convexity under the "support size preserving transformation" \lambda X + (1-\lambda) Y.

b) The proof of Theorem 10 is deferred to Section III-B. A family of convexity inequalities for Rényi entropies is derived there as consequences of Young's inequality, and both Theorems 7 and 10 are obtained as extreme (limit) points. Here we derive only the Brunn-Minkowski inequality as a consequence of Theorem 10.

Proof of Theorem 9: Choose a pair of independent random vectors X and Y in R^n such that the support of the density of λX is the set A and the support of the density of (1-λ)Y is B. Clearly, the support of the density of λX + (1-λ)Y is the (essential) Minkowski sum A + B, while (1/λ)A and (1/(1-λ))B are the support sets of the densities of X and Y, respectively. Therefore, taking (16) into account, the inequality (18) specializes for these random vectors to

    \ln \mu(A+B) \ge \lambda \ln \mu((1/\lambda)A) + (1-\lambda) \ln \mu((1/(1-\lambda))B).    (19)

Observing that \ln \mu((1/\lambda)A) = \ln \mu(A) - n \ln \lambda and \ln \mu((1/(1-\lambda))B) = \ln \mu(B) - n \ln(1-\lambda), the Brunn-Minkowski inequality results when rearranging the above inequality for the particular choice of \lambda = \mu(A)^{1/n}/(\mu(A)^{1/n} + \mu(B)^{1/n}). □

B. Young's Inequality and Its Consequences

There is a strong formal resemblance between the convexity inequalities (12) and (18) (where the former yields the entropy power inequality while the latter results in the Brunn-Minkowski inequality). This resemblance suggests the existence of a family of intermediate inequalities. Young's inequality, which is presented in the sequel, results after a few manipulations with these inequalities (see (21)). In particular, we follow Lieb's (in [30]) and Brascamp and Lieb's (in [9]) approach in regarding and proving Theorems 7 and 10 (respectively) as limits of (21).

For that purpose let L_p(R^n) denote the space of complex valued measurable functions on R^n with \|f\|_p < \infty and let f*g(x) = \int f(x-y) g(y) dy denote the convolution operation.

The following sharp version of Young's inequality is due to Beckner [3] and Brascamp and Lieb [8].

Theorem 11 (Young's Inequality): If 1/r + 1 = 1/q + 1/p, then for 1 \le r, p, q \le \infty,

    \sup_{f \in L_p(R^n), \, g \in L_q(R^n)} \left\{ \|f*g\|_r / (\|f\|_p \|g\|_q) \right\} \le (c_p c_q / c_r)^{n/2}.    (20)

Here,

    c_p = p^{1/p} / |p'|^{1/p'},

where p' is the Hölder conjugate of p (i.e., 1/p + 1/p' = 1) and c_q and c_r are likewise defined. The converse inequality holds for the infimum of \|f*g\|_r / (\|f\|_p \|g\|_q) when 0 < r, p, q < 1.

Remark: For the multivariate normal densities f = \phi_{\lambda K} and g = \phi_{(1-\lambda)K} (where \lambda = (1/p')/(1/r'), and consequently 1-\lambda = (1/q')/(1/r')), Young's inequality reduces to Ky Fan's matrix Theorem 8. Actually, (20) is established in [8] by showing that the supremum is achieved by multivariate normal densities, where the constants in the right side of (20) are determined by applying Ky Fan's matrix Theorem 8. For a detailed study of cases of equality in this and related inequalities see [31].

The following convexity inequality for Rényi entropies (which is the natural extension of Theorem 7) is a direct consequence of Young's inequality.

Theorem 12: For any 0 < r < \infty, r \ne 1, and any 0 \le \lambda \le 1, let p, q be such that 1/p' = \lambda/r' and 1/q' = (1-\lambda)/r'; then for any two independent random vectors X, Y with densities in R^n,

    h_r(\sqrt{\lambda} X + \sqrt{1-\lambda} Y) - \lambda h_p(X) - (1-\lambda) h_q(Y)
        \ge h_r(\phi_I) - \lambda h_p(\phi_I) - (1-\lambda) h_q(\phi_I),    (21)

provided that both h_p(X) and h_q(Y) exist. Here, \phi_I stands for the standard normal density in R^n.

In establishing the inequality (21) we use the well-known scaling property of Rényi entropies

    h_p(aX) = h_p(X) + n \ln|a|.    (22)

This identity follows from the definition in (15) by a change of variable argument.

Proof: Fix r and λ. We plan to apply Young's inequality for f the density of \sqrt{\lambda} X and g the density of \sqrt{1-\lambda} Y. Since h_p(X) and h_q(Y) are well defined, so are

    h_p(\sqrt{\lambda} X) = -p' \ln\|f\|_p = h_p(X) + \frac{n}{2} \ln \lambda

and

    h_q(\sqrt{1-\lambda} Y) = -q' \ln\|g\|_q = h_q(Y) + \frac{n}{2} \ln(1-\lambda).

These identities are applications of (15) and (22), and in particular they imply that f \in L_p(R^n) and g \in L_q(R^n). Further, since X and Y are assumed independent,

    -r' \ln\|f*g\|_r = h_r(\sqrt{\lambda} X + \sqrt{1-\lambda} Y).

Observe that p, q in Theorem 12 are such that 1/p' + 1/q' = 1/r' (so that 1/r + 1 = 1/q + 1/p), and 1/r' < 0 implies 0 < r, p, q < 1, while 1/r' > 0 implies 1 < r, p, q.
Therefore, Theorem 11 is applicable for f and g, resulting in

    -r' \ln\left\{ \|f*g\|_r / (\|f\|_p \|g\|_q) \right\} \ge -r' \frac{n}{2} \ln(c_p c_q / c_r).    (23)

This inequality holds with equality for f = \phi_{\lambda I} and g = \phi_{(1-\lambda)I} (i.e., X \sim \phi_I, Y \sim \phi_I), since for any p \ne 0, p \ne 1,

    h_p(\phi_I) = \frac{n}{2} \ln(2\pi) + \frac{n}{2} \frac{1}{1-p} \ln\frac{1}{p}.    (24)

Combining all these identities, the inequality (23) results in (21). □

We now show that the convexity Theorems 7 and 10 (i.e., the inequalities (12) and (18), respectively) are the extreme limit points r → 1 and r → 0 of the Rényi entropy convexity Theorem 12.

Proof of Theorem 7: Fix 0 \le \lambda \le 1, and assume that h(X) and h(Y) are well defined. Further assume that (21) holds for some r_0 \ne 1. Then Theorem 12 holds for any choice of r between r_0 and 1 (i.e., the entropies h_p(X) and h_q(Y) exist for the resulting p and q). It is easily verified that r → 1 with λ fixed implies that p → 1 and q → 1. Therefore, by the continuity of entropies (17), in the limit as r → 1 the inequality (21) reduces to (12), thus completing the proof of Theorem 7. □

Proof of Theorem 10: Again fix 0 \le \lambda \le 1. Now assume that h_0(X) and h_0(Y) are well defined and that (21) holds for some r_0 < 1. Then Theorem 12 holds for any choice of r between r_0 and 0 (i.e., the entropies h_p(X) and h_q(Y) exist for the resulting p and q). Further, as r → 0, also p = 1/(1-\lambda(1-1/r)) → 0 and q = 1/(1-(1-\lambda)(1-1/r)) → 0. Thus, in the limit r → 0, the inequality (21) reduces by (16) to

    h_0(\sqrt{\lambda} X + \sqrt{1-\lambda} Y) - \lambda h_0(X) - (1-\lambda) h_0(Y)
        \ge \frac{n}{2} \lim_{r \to 0} \left[ \frac{1}{1-r} \ln\frac{1}{r} - \frac{\lambda}{1-p} \ln\frac{1}{p} - \frac{1-\lambda}{1-q} \ln\frac{1}{q} \right],    (25)

where the right-hand equality is in view of (24).

Note that \lambda/(1-p) + (1-\lambda)/(1-q) = (1+r)/(1-r), and \lim_{r \to 0} (r/p) = \lambda while \lim_{r \to 0} (r/q) = 1-\lambda. Therefore,

    \lim_{r \to 0} \left[ \frac{1}{1-r} \ln\frac{1}{r} - \frac{\lambda}{1-p} \ln\frac{1}{p} - \frac{1-\lambda}{1-q} \ln\frac{1}{q} \right]
        = \lim_{r \to 0} \left[ \frac{\lambda}{1-p} \ln\frac{p}{r} + \frac{1-\lambda}{1-q} \ln\frac{q}{r} + \frac{r}{1-r} \ln r \right]
        = H(\lambda) + \lim_{r \to 0} \frac{r}{1-r} \ln r = H(\lambda),

where H(\lambda) \triangleq -\lambda \ln \lambda - (1-\lambda)\ln(1-\lambda). Combining this limit with (25) yields

    h_0(\sqrt{\lambda} X + \sqrt{1-\lambda} Y) \ge \lambda h_0(X) + (1-\lambda) h_0(Y) + \frac{n}{2} H(\lambda).

Inequality (18) is now obtained by the rescaling X → \sqrt{\lambda} X and Y → \sqrt{1-\lambda} Y (using the scaling property (22)). This completes the proof of Theorem 10. □

Remarks:

a) The proof of Theorem 7 follows Lieb's proof of the entropy power inequality (see [30]).

b) In [9], Brascamp and Lieb prove the Prékopa-Leindler inequality

    \int \sup_{x = (1-\lambda)y + \lambda z} \left\{ f(y)^{1-\lambda} g(z)^{\lambda} \right\} dx \ge 1,    (26)

for every pair of densities f, g in R^n and any 0 < λ < 1. For g(·) a uniform density on A/λ and f(·) a uniform density on B/(1-λ), this inequality reduces to the Brunn-Minkowski inequality (19). The proof of Theorem 10 is a simplified version of Brascamp and Lieb's proof of (26).

c) Theorem 7 of [8] deals with X_1, ..., X_k independent random variables with densities in R^n, and (k-1) \ge l \ge 1 deterministic linear combinations of these variables Y_1, ..., Y_l. Let V have the density of Y_l conditional upon Y_1 = \cdots = Y_{l-1} = 0; then this theorem implies that the minimum of

    h_r(V) - \sum_{j=1}^{k} \lambda_j h_{p_j}(X_j)

is obtained for X_1, ..., X_k normal random variables with appropriate diagonal covariance matrices. This theorem holds for any 1 < r \le \infty, and any \lambda_j = r'/p_j' \ge 0 such that \sum_{j=1}^{k} \lambda_j = 1 + r'(l-1). For l = 1, \sum_{j=1}^{k} \lambda_j = 1 and V = Y_1 = X_1 + \cdots + X_k, this inequality results in Young's inequality. It seems plausible that new entropy inequalities may be derived by considering limits of this more general inequality for l > 1.

C. Fisher Information and the Entropy Power Inequality

Stam's proof of the entropy power inequality (see [38]) is based on a simple inequality about Fisher information coupled with a continuous normal perturbation argument. A simplified version of this proof is presented here, where a simple explicit normal perturbation yields the convexity inequality (12). As we have seen already, inequality (12) is equivalent to the entropy power inequality (10).

Definition: The Fisher information of X with respect to a scalar translation parameter is

    J(X) = \int f(x) \|\nabla \ln f(x)\|^2 dx = E\|\nabla \ln f(X)\|^2.    (27)
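For X \sim \phi_{\sigma^2 I} in R^n the score is \nabla \ln f(x) = -x/\sigma^2, so (27) gives J(X) = n/\sigma^2. The following Monte Carlo sketch in Python (sample size and seed are arbitrary) estimates the expectation in (27) directly:

import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 3, 2.0
X = rng.normal(scale=np.sqrt(sigma2), size=(200000, n))   # samples of X ~ N(0, sigma2 * I_n)
score = -X / sigma2                                       # gradient of ln f evaluated at the samples
J_mc = np.mean(np.sum(score ** 2, axis=1))                # Monte Carlo estimate of E ||grad ln f(X)||^2
print(J_mc, n / sigma2)                                   # both are close to 1.5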
Equivalent statements of the following convexity inequality about Fisher information are proved in [6], [14], [38]. (For matrix versions see [20].)

Theorem 13 (Fisher Information Inequality): For any two independent random vectors X, Y and any 0 \le \lambda \le 1,

    \lambda J(X) + (1-\lambda) J(Y) - J(\sqrt{\lambda} X + \sqrt{1-\lambda} Y) \ge 0.    (28)

This is the first instrumental tool for the proof of the entropy power inequality presented in the sequel. The second tool is DeBruijn's identity, the link between entropy and Fisher information (for proofs consider [6], [14], [38]).

Theorem 14 (DeBruijn's Identity [38]): Let X be any random vector in R^n such that J(X) exists and let Z \sim \phi_I be a standard normal, which is independent of X. Then

    \frac{d}{dt} h(X + \sqrt{t} Z) \Big|_{t=0} = \frac{1}{2} J(X).    (29)

We are now ready to present the simplified version of Stam's proof.¹

Proof of Theorem 7 (by normal perturbations): Consider the continuous family of pairs of independent random vectors

    X_t = \sqrt{t} X + \sqrt{1-t} X_0,    0 \le t \le 1,
    Y_t = \sqrt{t} Y + \sqrt{1-t} Y_0,    0 \le t \le 1,

where the standard multivariate normals X_0 \sim \phi_I and Y_0 \sim \phi_I are independent of X, Y and of each other. Fix 0 \le \lambda \le 1 and let V_t = \sqrt{\lambda} X_t + \sqrt{1-\lambda} Y_t. Clearly, V_0 \sim \phi_I is also a standard normal, and V_t = \sqrt{t} V_1 + \sqrt{1-t} V_0 for all 0 \le t \le 1. We now consider the function

    s(t) = h(V_t) - \lambda h(X_t) - (1-\lambda) h(Y_t),    for 0 \le t \le 1.

Theorem 7 (i.e., inequality (12)) amounts to s(1) \ge 0, and since V_0, X_0, and Y_0 are identically distributed, s(0) = 0. Therefore, our goal is to establish the differential inequality

    \frac{d}{dt} s(t) \ge 0,    0 \le t \le 1,    (30)

which clearly implies inequality (12) and thus completes the proof. By virtue of the scaling property (22) (applied here for p = 1, a = \sqrt{1/t}, and for the variables X_t, Y_t and V_t), the function s(t) may also be expressed as

    s(t) = h(V_1 + \sqrt{\epsilon_t} V_0) - \lambda h(X + \sqrt{\epsilon_t} X_0) - (1-\lambda) h(Y + \sqrt{\epsilon_t} Y_0),

where \epsilon_t = (1/t) - 1. Therefore, by DeBruijn's identity (29),

    \frac{d}{dt} s(t) = \frac{1}{2} \frac{d\epsilon_t}{dt} \left( J(V_1 + \sqrt{\epsilon_t} V_0) - \lambda J(X + \sqrt{\epsilon_t} X_0) - (1-\lambda) J(Y + \sqrt{\epsilon_t} Y_0) \right).

Since d\epsilon_t/dt = -1/t^2, we obtain by an application of the well-known scaling property J(X) = a^2 J(aX)

    2t \frac{d}{dt}\{s(t)\} = \lambda J(X_t) + (1-\lambda) J(Y_t) - J(V_t).    (31)

Since V_t = \sqrt{\lambda} X_t + \sqrt{1-\lambda} Y_t, the Fisher information inequality (28) applies to (31) and thus establishes the differential inequality (30). □

Remarks:

a) The representation (31) is very similar to the one in [1]. Such a representation was also used in [2] for proving a strong version of the central limit theorem.

b) Two independent proofs of the entropy power inequality via the equivalent convexity inequality (12) have been presented. In the first proof, the underlying tool is Young's inequality from mathematical analysis, and results about (Shannon's) entropy are the limit as r → 1 of analogous results about Rényi entropies (i.e., about norms of operators in L_p(R^n)). In the second proof, the underlying tool is a sufficient statistic inequality for Fisher information, and results about entropy are obtained by integration over the path of a continuous normal perturbation. This proof also settles the cases of equality that are not determined in the first proof. We will encounter this duality again in Section IV, where uncertainty principles are derived by similar arguments.

c) The strong formal resemblance between the convexity inequalities (12) and (18), dealing with entropies and the Minkowski sum of sets, suggests the following inequality:

    \frac{\mu(A+B)}{S(A+B)} \ge \frac{\mu(A)}{S(A)} + \frac{\mu(B)}{S(B)},    (32)

as the dual of the Fisher information inequality (37). Here, S(C) denotes the outer Minkowski content of the boundary of a set C, which is defined as

    S(C) = \liminf_{\epsilon \downarrow 0} \frac{1}{\epsilon} \left[ \mu(C + \epsilon B_1) - \mu(C) \right],

where B_1 denotes the ball of radius 1 centered at the origin (in particular, when C is a convex set or a set with piecewise smooth boundary then S(C) coincides with the usual surface area of C; see [10], p. 69).

When inequality (32) holds, the Brunn-Minkowski inequality follows by a continuous perturbation (by balls) argument paralleling Stam's proof of the entropy power inequality. However, (32) does not hold in general for nonconvex sets. For example, it is false when A is the unit ball and B is the union of two balls of distance 3 apart (so that A + B is also the union of two disjoint balls).

¹At the time of the writing of this paper, the same result was independently derived by Carlen and Soffer and will appear in [13].
Does (32) hold when both A and B are compact, convex and nonempty sets? Alternatively, is the ratio of volume-to-surface area increased by Minkowski sums for such sets? When in addition A (or B) is a ball, the inequality (32) indeed holds as a direct consequence of the Alexandrov-Fenchel inequality (see [10], p. 143).

d) Consider the functions s_r(t) obtained from s(t) by replacing the (Shannon) entropies with the Rényi entropies of indexes r, p, and q of Theorem 12, evaluated at the perturbed variables V_t, X_t, and Y_t (so that s_1(t) = s(t)). Clearly, Theorem 7 (inequality (12)) amounts to s_1(1) - s_1(0) \ge 0 and therefore is a direct consequence of 2t(ds_1(t)/dt) \ge 0. It can be shown that Young's inequality (20) is in essence equivalent to s_r(1) - s_r(0) \ge 0. Therefore, it is tempting to suggest that the stronger inequality 2t(ds_r(t)/dt) \ge 0 holds for all 0 \le t \le 1 and for some (or all) r \ne 1. The latter inequality holds iff for every X \sim f and Y \sim g,

    J(V_r) \le \lambda^2 ((1-\lambda) r + \lambda) J(X) + (1-\lambda)^2 (\lambda r + (1-\lambda)) J(Y),    (33)

where the density of V_r is proportional to

    \int f(v-y)^{(1-\lambda) + \lambda/r} \, g(y)^{\lambda + (1-\lambda)/r} \, dy.

Note that for r = 1, V_r = X + Y and (33) is merely the Fisher information inequality (28). In conclusion, if (33) holds for r \ne 1 then this remark is the skeleton of a new proof of Young's inequality for these values of r, a proof which is orthogonal to the existing proofs of [3] and [8].

D. Isoperimetric Inequalities

The classical isoperimetric inequality states that balls have the smallest surface area per given volume. Recall that S(A) is the surface area of a set A and that B_1 is the unit ball. So, an alternative statement of the isoperimetric inequality is as follows.

Theorem 15 (The Classical Isoperimetric Inequality):

    S(A) \ge n \, \mu(A)^{(n-1)/n} \mu(B_1)^{1/n},

with equality if A is a ball in R^n.

Proof: Consider the nth power of the Brunn-Minkowski inequality (14) for B = \epsilon B_1 (so that \mu(B)^{1/n} = \epsilon \mu(B_1)^{1/n}). The isoperimetric inequality results by subtracting μ(A), dividing by ε, and considering the limit as ε ↓ 0. □

A dual "isoperimetric inequality" was derived by such an approach out of the entropy power inequality (see [14] following [38]).

Theorem 16 (Isoperimetric Inequality for Entropies): For any random vector X in R^n for which the Fisher information J(X) exists,

    \frac{1}{n} J(X) N(X) \ge 1.    (34)

Proof (following [14]): For Y = \sqrt{\epsilon} Z, where Z is a standard multivariate normal (so N(Y) = ε), the entropy power inequality (10) reduces to

    \frac{1}{\epsilon} \left[ N(X + \sqrt{\epsilon} Z) - N(X) \right] \ge 1.    (35)

Clearly,

    \frac{d}{d\epsilon} \{ N(X + \sqrt{\epsilon} Z) \} \Big|_{\epsilon=0} = \frac{2}{n} N(X) \frac{d}{d\epsilon} \{ h(X + \sqrt{\epsilon} Z) \} \Big|_{\epsilon=0}.

Therefore, in the limit ε ↓ 0, inequality (35) yields the isoperimetric inequality for entropies by DeBruijn's identity (29). □

Remark: Inequality (34) is equivalent to Gross's logarithmic Sobolev inequality (see [25]). This is discussed in [12]. For more literature on this subject see [26].

The same approach is applied in [19] for deriving the following isoperimetric inequality about Fisher information.

Theorem 17 (Fisher Information Isoperimetric Inequality): When the Fisher information J(X) of a random vector X in R^n exists and is differentiable with respect to a small independent normal perturbation, then

    \frac{d}{d\epsilon} \left( \left[ \frac{1}{n} J(X + \sqrt{\epsilon} Z) \right]^{-1} \right) \Big|_{\epsilon=0} \ge 1.    (36)

Proof (following [19]): While the Fisher information inequality (28) is the dual of the convexity inequality (12), the inequality

    J(X+Y)^{-1} - J(X)^{-1} - J(Y)^{-1} \ge 0,    (37)

where X, Y are any two independent random vectors, is the dual of the entropy power inequality (10). This equivalent statement of the Fisher information inequality is proved, for example, in [6] (for n = 1) and [20] (for n \ne 1). For Y = \sqrt{\epsilon} Z (so that J(Y)^{-1} = \epsilon/n) and in the limit ε ↓ 0 this inequality yields

    \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left\{ J(X + \sqrt{\epsilon} Z)^{-1} - J(X)^{-1} \right\} \ge \frac{1}{n}.

Since this is the same inequality as (36), the proof is completed. □

Remark: Inequality (36) is equivalent to the "Γ₂ inequality" of Bakry and Emery (see [1]).
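Both sides of (34) have closed forms for simple one-dimensional families, which makes the inequality and its equality case easy to inspect. A small Python sketch (the Gaussian and Laplace examples are illustrative choices; for n = 1, N(X) = exp(2h(X))/(2πe)):

import numpy as np

def N_power(h, n=1):
    # entropy power N(X) = exp(2 h / n) / (2 pi e)
    return np.exp(2.0 * h / n) / (2.0 * np.pi * np.e)

# Gaussian with variance s2: h = 0.5 ln(2 pi e s2), J = 1/s2, so J * N = 1 (equality in (34))
s2 = 3.0
print(N_power(0.5 * np.log(2 * np.pi * np.e * s2)) / s2)

# Laplace with scale b: h = 1 + ln(2b), J = 1/b^2, so J * N = 2e/pi > 1
b = 0.7
print(N_power(1.0 + np.log(2 * b)) / b ** 2, 2 * np.e / np.pi)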
The Fisher information isoperimetric inequality suggests that the sensitivity of the inverse of the Fisher information with respect to a small independent normal perturbation is minimal when the unperturbed variable already possesses a multivariate normal distribution. Note that the inverse of the Fisher information is exactly the Cramér-Rao lower bound for the error of the estimate of a translation parameter (see also Section IV-B).

The concavity of the entropy power, which is proved directly at great length in [15], is the following immediate corollary of the Fisher information isoperimetric inequality (36).

Corollary 3 (Concavity of the Entropy Power): When the Fisher information of X exists and is differentiable with respect to a small independent normal perturbation, then

    \frac{d^2}{d\epsilon^2} \{ N(X + \sqrt{\epsilon} Z) \} \Big|_{\epsilon=0} \le 0.    (38)

Proof (following [19]): Two applications of DeBruijn's identity (29) yield

    \frac{d^2}{d\epsilon^2} \{ N(X + \sqrt{\epsilon} Z) \} \Big|_{\epsilon=0}
        = N(X) \left[ \frac{1}{n^2} J(X)^2 + \frac{1}{n} \frac{d}{d\epsilon} \{ J(X + \sqrt{\epsilon} Z) \} \Big|_{\epsilon=0} \right].

The isoperimetric Fisher information inequality (36) is clearly equivalent to

    \frac{d}{d\epsilon} \{ J(X + \sqrt{\epsilon} Z) \} \Big|_{\epsilon=0} \le -\frac{1}{n} J(X)^2,

and the proof of (38) is thus completed. □

In conclusion, the entropy power of X_t = X + \sqrt{t} Z is concave with respect to the variance t of the additive normal perturbation. Moreover, since DeBruijn's identity holds for any random vector Z whose first four moments coincide with those of the standard multivariate normal, so does the concavity inequality (38).

IV. UNCERTAINTY PRINCIPLES

In [38], the Weyl-Heisenberg uncertainty principle is derived from a specific version of the Cramér-Rao inequality. This idea is further developed here in Section IV-B, where we rederive the well-known fact that the Cramér-Rao inequality for location parameter is exactly the Weyl-Heisenberg uncertainty principle. Strong ties between Young's inequality, the entropy power and the Fisher information inequalities were explored in Section III. Similarly, Hirschman's uncertainty principle, which is presented in Section IV-C, is a consequence of the Hausdorff-Young inequality and it involves entropy powers of conjugate variables. Hausdorff-Young inequalities exist for various groups and result in the corresponding uncertainty principles. One such example, which is presented in Section IV-D, is related to bounds on the sizes of support sets of conjugate variables (see [21] for many other bounds of this type).

A new proof of Wehrl's conjecture about the minimal possible value of the classical entropy associated with certain quantum systems is presented in Section IV-E.²

²It was brought to our attention by an anonymous referee that this result was obtained independently by Carlen and will appear in [11].

While Lieb's proof of this conjecture (in [30]) is based on Hausdorff-Young and Young inequalities, here a stronger "incremental" result is derived as a direct consequence of the isoperimetric inequality for entropies. This demonstrates once again the close relationship between Fisher information and entropy.

A. Stam's Uncertainty Principle

We adopt the following definition of conjugate variables in quantum mechanics.

Definition: Associate with any complex wave amplitude function ψ in L_2(R^n) a probability density

    f_\psi(x) = |\psi(x)|^2 / \|\psi\|_2^2.

Let φ(y) ∈ L_2(R^n) be the Fourier transform of ψ(x), and g_φ(y) the density similarly associated with φ. Then, the random vectors X \sim f_\psi and Y \sim g_\phi are called conjugate variables.

Stam's uncertainty principle relates the Fisher information matrix associated with a random vector (defined next) with the covariance of its conjugate variable.

Definition: The Fisher information matrix J(X) of a random vector X with a density f is

    J(X) = \int f(x) \, \nabla \ln f(x) \, (\nabla \ln f(x))' \, dx.

Theorem 18 (Stam's Uncertainty Principle): Let K_X and K_Y be the covariance matrices of the conjugate random variables X and Y. Then

    16\pi^2 K_Y - J(X) \ge 0,    (39)

or, by the symmetrical roles of X and Y,

    16\pi^2 K_X - J(Y) \ge 0.    (40)

Proof: See [38]. □

Remark: The left side of each of the matrix inequalities above is a nonnegative definite matrix. This is the interpretation of all matrix inequalities in the sequel.

The following identities, which are important consequences of Stam's proof of Theorem 18, are derived in [20].

Stam's Identities:

    J(X) = 16\pi^2 K_Y,    if \bar{\psi}(x)/\psi(x) = \exp(i\varphi),    (41)

where \varphi is a constant independent of x. Similarly,

    J(Y) = 16\pi^2 K_X,    if \bar{\phi}(y)/\phi(y) = \exp(i\varphi).    (42)

B. Heisenberg's Principle and the Cramér-Rao Inequality

Heisenberg's uncertainty principle is often stated as

    \sigma_X \sigma_Y \ge \frac{h}{4\pi},

where h is Planck's constant and σ_X and σ_Y are the standard deviations of a pair X, Y of conjugate variables in R^1.
However, the definition of conjugate variables in Section IV-A corresponds to a proper normalization, in which h/2π is replaced by 1/(4π). This normalization yields the following multivariate uncertainty relationships.

Theorem 19: The n-dimensional Weyl-Heisenberg uncertainty principle may be stated in any of the following four equivalent forms:

    16\pi^2 K_X^{1/2} K_Y K_X^{1/2} - I \ge 0,    (43)
    16\pi^2 K_Y^{1/2} K_X K_Y^{1/2} - I \ge 0,    (44)
    16\pi^2 K_Y - K_X^{-1} \ge 0,    (45)
    16\pi^2 K_X - K_Y^{-1} \ge 0,    (46)

where X, Y are any pair of conjugate vectors (see Section IV-A for the definition).

There exists a simple and direct proof of this inequality as a consequence of an appropriate Cauchy-Schwartz inequality. Here we present an alternative proof illustrating the connection of this uncertainty principle with the Cramér-Rao inequality.

Theorem 20 (Cramér-Rao Inequality):

    J(X) - K_X^{-1} \ge 0,    (47)
    J(Y) - K_Y^{-1} \ge 0.    (48)

Proof of the Weyl-Heisenberg Inequality: Adding Stam's uncertainty principle (39) and the Cramér-Rao inequality (47) yields the Weyl-Heisenberg principle (45). □

We interpret this relationship by suggesting that Stam's uncertainty principle "measures" the fluctuations in the phase of the amplitude wave functions ψ(x) and φ(y), while the Cramér-Rao inequality "measures" the amount of "nonnormality" of the associated densities f_ψ(x) and g_φ(y).

Actually, Stam's identities (41), (42) establish the equivalence of the Weyl-Heisenberg principle and the specific Cramér-Rao inequality given in Theorem 20. This equivalence is established by proving the Cramér-Rao inequality as a consequence of the Weyl-Heisenberg principle.

Proof of the Cramér-Rao Inequality (47): Suppose that X is a random variable in R^n with a density f(x) for which J(X) < ∞. Let \psi(x) = \sqrt{f(x)} be the associated real valued amplitude wave function. Clearly, Stam's identity (41) holds. Substituting this identity into the Weyl-Heisenberg principle (45) yields the Cramér-Rao inequality (47). □

Remark: This equivalence is generalized in [20], and shown there to hold between general families of Weyl-Heisenberg and Cramér-Rao inequalities.

C. Hausdorff-Young Inequality and Hirschman's Uncertainty Principle

An immediate consequence of Stam's uncertainty principle (39) is that

    16\pi^2 |K_Y|^{1/n} \ge |J(X)|^{1/n},

where throughout this section X, Y is any pair of conjugate variables. A seemingly unrelated fact, a stronger version of Theorem 16 (the isoperimetric inequality for entropies), whose detailed derivation is given in [20], states that

    N(X) |J(X)|^{1/n} \ge 1.

By combining the above two inequalities one obtains

    16\pi^2 |K_Y|^{1/n} N(X) \ge 1.

Now, the maximum entropy inequality N(Y) \le |K_Y|^{1/n} (Lemma 5) suggests the following sharper uncertainty principle.

Theorem 21 (Hirschman's Uncertainty Principle):

    16\pi^2 N(Y) N(X) \ge 1.    (49)

This uncertainty principle was conjectured by Hirschman (in [28]), who proved a weaker version with a smaller constant. It follows as a corollary of the following strong version of the Hausdorff-Young inequality (due to Beckner [3]).

Theorem 22 (Hausdorff-Young Inequality): Let φ(y) be the Fourier transform of ψ(x) ∈ L_p(R^n). Then for any 1 \le p \le 2,

    \|\phi\|_{p'} \le c_p^{n/2} \|\psi\|_p,    (50)

where (1/p) + (1/p') = 1, and c_p = p^{1/p}/p'^{1/p'}.

Remarks:

a) In [40], the time duration of the function ψ(x) is measured via \tau_p = \exp\{h_{p/2}(X)\} and its bandwidth is measured by \Omega_p = \exp\{h_{p'/2}(Y)\}. In this terminology, Hirschman's uncertainty principle amounts to the following "time-bandwidth" uncertainty relation

    \Omega_2 \tau_2 \ge \left(\frac{e}{2}\right)^n.

b) One can also establish Young's inequality in the range 1 \le p, q \le 2 \le r out of the Hausdorff-Young inequality (Theorem 22) and elementary properties of the Fourier transform (see [3]).

c) Cases of equality in (50) are studied in [31].

d) Carlen [12] obtains the isoperimetric inequality (34) as a consequence of Hirschman's uncertainty principle.
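Hirschman's inequality (49) can be probed numerically by discretizing a wave function and its Fourier transform on a grid. The Python sketch below assumes the Fourier convention φ(y) = ∫ψ(x)e^{-2πixy}dx implicit in the constants above; the convention choice, the two-bump test amplitude, and the grid parameters are illustrative assumptions. It computes h(X) and h(Y) by Riemann sums and checks 16π²N(X)N(Y) ≥ 1.

import numpy as np

N, L = 4096, 40.0
dx = 2 * L / N
x = -L + dx * np.arange(N)
psi = np.exp(-(x - 1.0) ** 2) + np.exp(-(x + 1.0) ** 2)     # a non-Gaussian, two-bump amplitude
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)               # normalize in L_2(R)

phi_abs2 = (dx * np.abs(np.fft.fft(psi))) ** 2              # |phi|^2 at the frequencies k/(N dx)
dy = 1.0 / (N * dx)

def entropy(density, step):
    p = density / (np.sum(density) * step)                  # renormalize on the grid
    p = p[p > 1e-300]
    return -np.sum(p * np.log(p)) * step

hX = entropy(np.abs(psi) ** 2, dx)
hY = entropy(phi_abs2, dy)
NX, NY = np.exp(2 * hX) / (2 * np.pi * np.e), np.exp(2 * hY) / (2 * np.pi * np.e)
print(16 * np.pi ** 2 * NX * NY)        # >= 1; equality would require a Gaussian packet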
D. A Discrete Version of Hirschman's Uncertainty Principle

Hausdorff-Young inequalities exist for Fourier transforms on groups other than R^n. Each of these inequalities yields the corresponding Hirschman's uncertainty principle by considering the limit as p → 2. As an explicit example to demonstrate this idea, we show here that any unitary square matrix U (possibly of infinite dimension), with \sup_{ij} |u_{ij}| = M < 1, yields a nontrivial Hausdorff-Young inequality and consequently the following uncertainty principle.

Theorem 23: The integer valued random variables X, Y with P(X = i) = |x_i|^2 / \|x\|_2^2 and P(Y = i) = |(Ux)_i|^2 / \|Ux\|_2^2 are "conjugate" variables, where x is any vector with \|x\|_2 < \infty. For any such pair,

    H(X) + H(Y) \ge 2 \ln \frac{1}{M}.    (51)

Proof: The unitary matrix U is an isometry on the appropriate Hilbert space, i.e., for every x, \|Ux\|_2 = \|x\|_2. Furthermore, clearly \|Ux\|_\infty \le M \|x\|_1, where \|x\|_p = [\sum_i |x_i|^p]^{1/p} and \|x\|_\infty = \sup_i \{|x_i|\}. Riesz's interpolation theorem (between the extreme bounds above for p = 1 and p = 2) yields the following "Hausdorff-Young" inequality for any vector x and any 1 \le p \le 2,

    \|Ux\|_{p'} \le M^{(2-p)/p} \|x\|_p,    (52)

where 1/p' + 1/p = 1. Consider now a pair of conjugate variables X and Y with distribution functions as previously defined. Then (52) implies an uncertainty principle for the (discrete) Rényi entropies of X and Y. Specifically, let

    H_{p/2}(X) = \frac{1}{1-(p/2)} \ln \sum_i P(X=i)^{p/2} = \frac{p}{1-(p/2)} \ln(\|x\|_p / \|x\|_2),

and

    H_{p'/2}(Y) = \frac{1}{1-(p'/2)} \ln \sum_i P(Y=i)^{p'/2} = \frac{p'}{1-(p'/2)} \ln(\|Ux\|_{p'} / \|Ux\|_2);

then (52) reads

    \left(\frac{1}{p} - \frac{1}{2}\right) H_{p/2}(X) + \left(\frac{1}{2} - \frac{1}{p'}\right) H_{p'/2}(Y) - 2\left(\frac{1}{p} - \frac{1}{2}\right) \ln\frac{1}{M} \ge 0.

For (1/p) = (1/2) + \epsilon, (1/p') = (1/2) - \epsilon, and as ε ↓ 0 this inequality (when divided by ε) yields the uncertainty principle (51). □

Remarks: This uncertainty principle is nontrivial for M < 1. For example, consider the discrete Fourier transform of size n, which corresponds to a unitary matrix U for which M = |u_{ij}| = 1/\sqrt{n}. Here, Hirschman's uncertainty principle becomes

    \sum_{k=1}^{n} P(X=k) \log_2 \frac{1}{P(X=k)} + \sum_{k=1}^{n} P(Y=k) \log_2 \frac{1}{P(Y=k)} \ge \log_2 n,    (53)

where the vector \sqrt{P(Y=k)} is the discrete Fourier transform of the vector \sqrt{P(X=k)}.

This inequality is sharp. For example, starting with P(X = 1) = 1 results in a uniform distribution P(Y = k) = 1/n for k = 1, 2, ..., n, and for this pair of distributions the previous inequality holds with equality.

The discrete entropy is bounded above by the base 2 logarithm of the size of the support set of the distribution. Therefore, the uncertainty principle (53) implies that the product of the sizes of the support sets of the vector x and its discrete Fourier transform is at least the dimension n of the Fourier transform. This is Theorem 1 of [21] (where similar support-set inequalities are derived also for x such that (1-ε) of \|x\|_2 is concentrated over a relatively small index set).
1514 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 37, NO. 6, NOVEMBER 1991

Wehrl and others have studied the properties of the classical entropy h(X_ρ) (see, for example, [39]). One of the appealing properties they demonstrate is that the classical measure of uncertainty h(X_ρ) is an upper bound to the quantum measure of uncertainty, i.e., the discrete quantum entropy -tr(ρ ln ρ). As the quantum entropy is always nonnegative, they argue that while the differential entropy h(·) may well be negative, it is never so for the physically meaningful variables, i.e., for those of the form of X_ρ for some quantum operator ρ.

The quantum entropy is zero on any pure state (i.e., whenever the operator ρ is of rank 1). On the other hand, Wehrl conjectured that the classical entropy is never zero, i.e., in the "classical" theory there is an inherent minimal level of uncertainty (due to "quantization"), the value of which is n. Further, this minimal uncertainty is obtained iff the operator ρ is a projection operator on one of the coherent states.

Wehrl's conjecture, which is restated below as a lower bound on the entropy power of X_ρ, was proved in [30] by an application of the strong versions of the Young and Hausdorff-Young inequalities (cases of equality were later determined by Carlen [11]).

Theorem 24 (Wehrl-Lieb): For X_ρ a random variable in R^{2n} with density f_ρ of the form of (54),

N(X_\rho) \ge 1,

and equality holds iff ρ is of rank 1 and X_ρ has a standard normal distribution.

Remarks:

a) It is fairly easy to show that the above conditions for equality are equivalent to ρ being a projection operator on exactly one coherent state.

b) Both the previous discussion and the statement of Theorem 24 correspond to the normalization under which h/2π is replaced by 1/4π. In the real world, all levels of uncertainty are to be appropriately restated in terms of multiples of Planck's constant h.

Recall the isoperimetric inequality for entropies (34),

\frac{1}{2n}\,N(X_\rho)\,J(X_\rho) \ge 1,

with equality iff X_ρ has a standard normal distribution. Because of this result, the above theorem (Wehrl's conjecture) is an immediate consequence of the following stronger "incremental" version.

Theorem 25 (Carlen [11], Dembo [20]): For X_ρ as before,

\frac{1}{2n}\,J(X_\rho) \le 1,

with equality iff ρ is an operator of rank 1.

Remark: Starting with Theorem 25 and applying a perturbation argument similar to the one presented in Section III-C yields the monotonicity of N(\sqrt{t}\,X_\rho + \sqrt{1-t}\,X_{\rho^*}) with respect to t ∈ [0,1], where ρ* is any projection operator on a coherent state and X_ρ and X_{ρ*} are independent random vectors. The appropriate interpretation of this result is, however, unclear.

Proof: The operator ρ may be decomposed into ρ = \sum_i \lambda_i P_i, where λ_i ≥ 0, \sum_i \lambda_i = 1, and the P_i are rank one projection operators. Therefore, by (54) and linearity, the density of X_ρ is the convex combination f_ρ = \sum_i \lambda_i f_{P_i}. The projection operators P_i correspond to densities

f_{P_i}(p,q) = \left|\int e_i(x)\,\phi(x|p,q)\,dx\right|^2,

where e_i ∈ L_2(R^n) and ||e_i||_2 = 1. Theorem 25 is thus the immediate consequence of the following two lemmas.

Lemma 6: For any two random vectors X, Y in R^{2n} and any 0 ≤ λ ≤ 1, let Z = B_λ X + (1 - B_λ)Y, where B_λ denotes a Bernoulli(λ) random variable, independent of both X and Y. The density of Z is therefore the convex combination λf + (1 - λ)g, where f, g are the densities of X and Y, respectively. Then,

\lambda J(X) + (1-\lambda)J(Y) - J(Z) \ge 0.

Proof: With this notation, after some manipulations we obtain

\lambda J(X) + (1-\lambda)J(Y) - J(Z) = \lambda(1-\lambda)\int \frac{g(p,q)\,f(p,q)}{\lambda f(p,q) + (1-\lambda)g(p,q)}\,\left|\nabla\ln\frac{f(p,q)}{g(p,q)}\right|^2 dp\,dq. \qquad (55)

Since |\nabla\ln(f(p,q)/g(p,q))|^2 \ge 0, the integral on the right side of (55) is nonnegative and the proof is complete. □
of this result is, however, unclear.
Proofi The operator p may be decomposed into p = ~~n~2~e).lKI=h(X,,X2;..,X,)
C~=,XjPi where hi 2 0, C~=IAi = 1, and Pi are rank one
projection operators. Therefore, by (54) and the linearity
of p and Pi, I i h(Xi) = i t ln2~elKii(,
i=l i=l

with equality iff Xi, X2; * *,X, are independent, i.e.,


Kij = 0, i # j. cl
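Both Hadamard's inequality and the Minkowski determinant inequality recalled above are easy to spot-check on random covariance matrices; the snippet below is an added illustration, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def random_pd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + 0.1 * np.eye(n)          # random positive definite matrix

for _ in range(5):
    K1, K2 = random_pd(n), random_pd(n)
    # Hadamard: |K| <= product of diagonal entries
    hadamard_ok = np.linalg.det(K1) <= np.prod(np.diag(K1)) + 1e-9
    # Minkowski: |K1 + K2|^(1/n) >= |K1|^(1/n) + |K2|^(1/n)
    mink_lhs = np.linalg.det(K1 + K2) ** (1 / n)
    mink_rhs = np.linalg.det(K1) ** (1 / n) + np.linalg.det(K2) ** (1 / n)
    print(f"Hadamard holds: {hadamard_ok},  Minkowski: {mink_lhs:8.3f} >= {mink_rhs:8.3f}")
```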
We now provide a direct information theoretic proof of Fan's (see [22]) Theorem 8 (which states that ln |K| is a concave function of K). This proof does not use the entropy power inequality, and provides an alternative to the proof in Section III.

Proof of Theorem 8: Let X_1 and X_2 be normally distributed n-vectors, X_i ~ φ_{K_i}(x), i = 1, 2. Let the random variable θ have distribution Pr(θ = 1) = λ, Pr(θ = 2) = 1 - λ, 0 ≤ λ ≤ 1. Let θ, X_1, and X_2 be independent and let Z = X_θ. Then Z has covariance K_Z = λK_1 + (1 - λ)K_2. However, Z will not be multivariate normal. By first using Lemma 5, followed by Lemma 3, we have

\frac{1}{2}\ln(2\pi e)^n|\lambda K_1 + (1-\lambda)K_2| \ge h(Z) \ge h(Z|\theta) = \lambda\,\frac{1}{2}\ln(2\pi e)^n|K_1| + (1-\lambda)\,\frac{1}{2}\ln(2\pi e)^n|K_2|.

Thus,

|\lambda K_1 + (1-\lambda)K_2| \ge |K_1|^{\lambda}\,|K_2|^{1-\lambda}, \qquad (56)

as desired. □

Taking logarithms and letting λK_1 = A, (1 - λ)K_2 = B, we obtain

\log|A+B| \ge \lambda\log\frac{|A|}{\lambda^{n}} + (1-\lambda)\log\frac{|B|}{(1-\lambda)^{n}} = \lambda\log|A| + (1-\lambda)\log|B| + nH(\lambda). \qquad (57)

Maximizing the right-hand side over λ, we obtain the optimum value of λ as |A|^{1/n}/(|A|^{1/n} + |B|^{1/n}). Substituting this in (57), we obtain the Minkowski inequality (Theorem 5).

We now prove a property of Toeplitz matrices. A Toeplitz matrix K, which arises as the covariance matrix of a stationary random process, is characterized by the property that K_{ij} = K_{rs} if |i - j| = |r - s|. Let K_k denote the principal minor K(1, 2, ..., k). The following property can be proved easily from the properties of the entropy function.

Theorem 27: If the positive definite n × n matrix K is Toeplitz, then

|K_1| \ge |K_2|^{1/2} \ge \cdots \ge |K_{n-1}|^{1/(n-1)} \ge |K_n|^{1/n},

and |K_k|/|K_{k-1}| is decreasing in k.

Proof: Let (X_1, X_2, ..., X_n) ~ φ_{K_n}. Then the quantities h(X_k|X_{k-1}, ..., X_1) are decreasing in k, since

h(X_k|X_{k-1},\ldots,X_1) = h(X_{k+1}|X_k,\ldots,X_2) \ge h(X_{k+1}|X_k,\ldots,X_2,X_1), \qquad (58)

where the equality follows from the Toeplitz assumption and the inequality from the fact that conditioning reduces entropy. Thus the running averages

\frac{1}{k}\,h(X_1,\ldots,X_k) = \frac{1}{k}\sum_{i=1}^{k} h(X_i|X_{i-1},\ldots,X_1)

are decreasing in k. The theorem then follows from h(X_1, X_2, ..., X_k) = (1/2)\ln(2\pi e)^k|K_k|. □

Remark: Since h(X_n|X_{n-1}, ..., X_1) is a decreasing sequence, it has a limit. Hence, by the Cesaro mean limit theorem,

\lim_{n\to\infty}\frac{h(X_1,X_2,\ldots,X_n)}{n} = \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} h(X_k|X_{k-1},\ldots,X_1) = \lim_{n\to\infty} h(X_n|X_{n-1},\ldots,X_1). \qquad (59)

Translating this to determinants, one obtains the result

\lim_{n\to\infty}|K_n|^{1/n} = \lim_{n\to\infty}\frac{|K_n|}{|K_{n-1}|}.

B. Inequalities for Ratios of Determinants

We first prove a stronger version of Hadamard's theorem due to K. Fan [23].

Theorem 28: For all 1 ≤ p ≤ n,

\frac{|K|}{|K(p+1,p+2,\ldots,n)|} \le \prod_{i=1}^{p}\frac{|K(i,p+1,p+2,\ldots,n)|}{|K(p+1,p+2,\ldots,n)|}.

Proof: We use the same idea as in Theorem 26, except that we use the conditional form of Lemma 4, to obtain

\frac{1}{2}\ln(2\pi e)^p\frac{|K|}{|K(p+1,p+2,\ldots,n)|} = h(X_1,X_2,\ldots,X_p|X_{p+1},X_{p+2},\ldots,X_n) \le \sum_{i=1}^{p} h(X_i|X_{p+1},X_{p+2},\ldots,X_n) = \sum_{i=1}^{p}\frac{1}{2}\ln 2\pi e\,\frac{|K(i,p+1,p+2,\ldots,n)|}{|K(p+1,p+2,\ldots,n)|}. \qquad (60)

□

If (X_1, X_2, ..., X_n) ~ φ_{K_n}, we know that the conditional density of X_n given (X_1, X_2, ..., X_{n-1}) is univariate normal with mean linear in X_1, X_2, ..., X_{n-1} and conditional variance σ_n². Here σ_n² is the minimum mean-square error E(X_n - X̂_n)² over all linear estimators X̂_n based on X_1, X_2, ..., X_{n-1}.

Lemma 8: σ_n² = |K_n|/|K_{n-1}|.

Proof: Using the conditional normality of X_n, Lemma 2 results in

\frac{1}{2}\ln 2\pi e\,\sigma_n^2 = h(X_n|X_1,X_2,\ldots,X_{n-1}) = h(X_1,X_2,\ldots,X_n) - h(X_1,X_2,\ldots,X_{n-1}) = \frac{1}{2}\ln(2\pi e)^n|K_n| - \frac{1}{2}\ln(2\pi e)^{n-1}|K_{n-1}| = \frac{1}{2}\ln 2\pi e\,\frac{|K_n|}{|K_{n-1}|}. \qquad (61)

□
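Lemma 8 equates a statistical quantity (the linear prediction error) with a ratio of determinants; the short check below (added for illustration; it is not from the paper) computes both sides for a random covariance matrix, using the Schur complement formula for the linear MMSE.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
K = A @ A.T + 0.1 * np.eye(n)                        # random positive definite covariance

K_past = K[:n-1, :n-1]                               # covariance of (X_1, ..., X_{n-1})
c = K[:n-1, n-1]                                     # cross-covariances with X_n
mmse = K[n-1, n-1] - c @ np.linalg.solve(K_past, c)  # linear MMSE (Schur complement)
ratio = np.linalg.det(K) / np.linalg.det(K_past)     # |K_n| / |K_{n-1}|
print(f"linear MMSE = {mmse:.6f},   |K_n|/|K_(n-1)| = {ratio:.6f}")
```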
Minimization of σ_n² over a set of allowed covariance matrices {K_n} is aided by the following theorem.

Theorem 29: ln(|K_n|/|K_{n-p}|) is concave in K_n.

Proof: We remark that Theorem 8 is not applicable because ln(|K_n|/|K_{n-p}|) is the difference of two concave functions. Let Z = X_θ, where X_1 ~ φ_{S_n}(x), X_2 ~ φ_{T_n}(x), Pr{θ = 1} = λ = 1 - Pr{θ = 2}, and X_1, X_2, θ are independent. The covariance matrix K_n of Z is given by

K_n = \lambda S_n + (1-\lambda)T_n.

The following chain of inequalities proves the theorem:

\lambda\,\frac{1}{2}\ln(2\pi e)^p\frac{|S_n|}{|S_{n-p}|} + (1-\lambda)\,\frac{1}{2}\ln(2\pi e)^p\frac{|T_n|}{|T_{n-p}|}
\stackrel{(a)}{=} \lambda\,h(X_{1,n},X_{1,n-1},\ldots,X_{1,n-p+1}|X_{1,1},\ldots,X_{1,n-p}) + (1-\lambda)\,h(X_{2,n},X_{2,n-1},\ldots,X_{2,n-p+1}|X_{2,1},\ldots,X_{2,n-p})
= h(Z_n,Z_{n-1},\ldots,Z_{n-p+1}|Z_1,\ldots,Z_{n-p},\theta)
\stackrel{(b)}{\le} h(Z_n,Z_{n-1},\ldots,Z_{n-p+1}|Z_1,\ldots,Z_{n-p})
\stackrel{(c)}{\le} \frac{1}{2}\ln(2\pi e)^p\frac{|K_n|}{|K_{n-p}|}, \qquad (62)

where (a) follows from h(X_n,X_{n-1},\ldots,X_{n-p+1}|X_1,\ldots,X_{n-p}) = h(X_1,\ldots,X_n) - h(X_1,\ldots,X_{n-p}), (b) follows from the conditioning lemma, and (c) follows from a conditional version of Lemma 5. □

Theorem 29 for the case p = 1 is due to Bergstrom [4]. However, for p = 1, we can prove an even stronger theorem, also due to Bergstrom [4].
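Concavity of ln(|K_n|/|K_{n-p}|) can be spot-checked by a midpoint test on random positive definite matrices; the case p = n reduces to the concavity of ln |K| (Theorem 8). The snippet below is an added illustration, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 6, 0.4

def random_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def f(K, p):
    # ln(|K_n| / |K_{n-p}|); K_{n-p} is the leading principal minor of order n - p
    minor = np.linalg.det(K[:n-p, :n-p]) if p < n else 1.0
    return np.log(np.linalg.det(K)) - np.log(minor)

S, T = random_pd(n), random_pd(n)
for p in range(1, n + 1):
    lhs = f(lam * S + (1 - lam) * T, p)
    rhs = lam * f(S, p) + (1 - lam) * f(T, p)
    print(f"p = {p}:  f(mixture) = {lhs:8.4f}  >=  mixture of f = {rhs:8.4f}")
```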
Theorem 30: |K_n|/|K_{n-1}| is concave in K_n.

Proof: Again we use the properties of normal random variables. Let us assume that we have two independent normal random vectors, X ~ φ_{A_n} and Y ~ φ_{B_n}. Let Z = X + Y. Then

\frac{1}{2}\ln 2\pi e\,\frac{|A_n + B_n|}{|A_{n-1} + B_{n-1}|} \stackrel{(a)}{=} h(Z_n|Z_{n-1},Z_{n-2},\ldots,Z_1)
\stackrel{(b)}{\ge} h(Z_n|Z_{n-1},\ldots,Z_1,X_1,\ldots,X_{n-1},Y_1,\ldots,Y_{n-1})
\stackrel{(c)}{=} h(X_n + Y_n|X_1,\ldots,X_{n-1},Y_1,\ldots,Y_{n-1})
\stackrel{(d)}{=} \frac{1}{2}\ln 2\pi e\,\mathrm{Var}(X_n + Y_n|X_1,\ldots,X_{n-1},Y_1,\ldots,Y_{n-1})
\stackrel{(e)}{=} \frac{1}{2}\ln 2\pi e\,[\mathrm{Var}(X_n|X_1,\ldots,X_{n-1}) + \mathrm{Var}(Y_n|Y_1,\ldots,Y_{n-1})]
\stackrel{(f)}{=} \frac{1}{2}\ln 2\pi e\left[\frac{|A_n|}{|A_{n-1}|} + \frac{|B_n|}{|B_{n-1}|}\right]. \qquad (63)

In this derivation, (a) follows from Lemma 8, (b) from the fact that conditioning decreases entropy, and (c) follows from the fact that Z is a function of X and Y. The sum X_n + Y_n is normal conditioned on X_1, X_2, ..., X_{n-1}, Y_1, Y_2, ..., Y_{n-1}, and hence we can express its entropy in terms of its variance, obtaining equality (d). Then (e) follows from the independence of X_n and Y_n conditioned on the past X_1, X_2, ..., X_{n-1}, Y_1, Y_2, ..., Y_{n-1}, and (f) follows from the fact that for a set of jointly normal random variables, the conditional variance is constant, independent of the conditioning variables (Lemma 8). Thus (63) yields

\frac{|A_n + B_n|}{|A_{n-1} + B_{n-1}|} \ge \frac{|A_n|}{|A_{n-1}|} + \frac{|B_n|}{|B_{n-1}|}.

In general, by setting A = λS and B = (1 - λ)T, we obtain

\frac{|\lambda S_n + (1-\lambda)T_n|}{|\lambda S_{n-1} + (1-\lambda)T_{n-1}|} \ge \lambda\,\frac{|S_n|}{|S_{n-1}|} + (1-\lambda)\,\frac{|T_n|}{|T_{n-1}|},

i.e., |K_n|/|K_{n-1}| is concave. □

Simple examples show that |K_n|/|K_{n-p}| is not necessarily concave for p ≥ 2.
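Bergstrom's inequality, the outcome of (63), is again easy to test numerically; the snippet below (an added illustration, not from the paper) compares the two sides for random positive definite A and B.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5

def random_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def ratio(K):
    # |K_n| / |K_{n-1}|, with K_{n-1} the leading principal minor of order n - 1
    return np.linalg.det(K) / np.linalg.det(K[:-1, :-1])

for _ in range(5):
    A, B = random_pd(n), random_pd(n)
    print(f"|A+B|/|(A+B)_(n-1)| = {ratio(A + B):8.4f}  >=  "
          f"|A|/|A_(n-1)| + |B|/|B_(n-1)| = {ratio(A) + ratio(B):8.4f}")
```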
C. Subset Inequalities for Determinants

We now prove a generalization of Hadamard's inequality due to Szasz [35]. Let K(i_1, i_2, ..., i_k) be the principal submatrix of K formed by the rows and columns with indexes i_1, i_2, ..., i_k.

Theorem 31 (Szasz): If K is a positive definite n × n matrix and P_k denotes the product of all the principal k-rowed minors of K, i.e.,

P_k = \prod_{1\le i_1 < i_2 < \cdots < i_k \le n} |K(i_1,i_2,\ldots,i_k)|,

then

P_1 \ge P_2^{1/\binom{n-1}{1}} \ge P_3^{1/\binom{n-1}{2}} \ge \cdots \ge P_n^{1/\binom{n-1}{n-1}} = P_n.

Proof: Let X ~ φ_K. Then the theorem follows directly from Theorem 1, with the identification h_k^{(n)} = \frac{1}{2k\binom{n}{k}}\ln P_k + \frac{1}{2}\ln 2\pi e. □
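A direct numerical check of the Szasz chain (carried out on a log scale to avoid overflow) is sketched below; it is an added illustration, not part of the original text.

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(7)
n = 5
M = rng.standard_normal((n, n))
K = M @ M.T + 0.1 * np.eye(n)              # random positive definite matrix

# log P_k = sum of log-determinants of all k x k principal minors
logP = [sum(np.linalg.slogdet(K[np.ix_(S, S)])[1]
            for S in combinations(range(n), k)) for k in range(1, n + 1)]

# On a log scale the chain reads  log P_k / C(n-1, k-1)  nonincreasing in k
chain = [logP[k - 1] / comb(n - 1, k - 1) for k in range(1, n + 1)]
print([round(v, 4) for v in chain])
```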
We can also prove a related theorem.

Theorem 32: Let K be a positive definite n × n matrix and let

S_k^{(n)} = \frac{1}{\binom{n}{k}}\sum_{1\le i_1 < i_2 < \cdots < i_k \le n} |K(i_1,i_2,\ldots,i_k)|^{1/k}.

Then,

\frac{1}{n}\,\mathrm{tr}(K) = S_1^{(n)} \ge S_2^{(n)} \ge \cdots \ge S_n^{(n)} = |K|^{1/n}. \qquad (65)

Proof: This follows directly from the corollary to Theorem 1, with the identification s_k^{(n)} = (2\pi e)S_k^{(n)}, and r = 2 in (3) and (4). □
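The chain (65), which interpolates between the arithmetic mean of the diagonal and |K|^{1/n}, can be checked as follows (added here for illustration; not part of the original text):

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(8)
n = 6
M = rng.standard_normal((n, n))
K = M @ M.T + 0.1 * np.eye(n)

S = [sum(np.linalg.det(K[np.ix_(idx, idx)]) ** (1.0 / k)
         for idx in combinations(range(n), k)) / comb(n, k) for k in range(1, n + 1)]

print("tr(K)/n     =", round(np.trace(K) / n, 4))             # equals S_1
print("S_1 ... S_n =", [round(v, 4) for v in S])              # nonincreasing
print("|K|^(1/n)   =", round(np.linalg.det(K) ** (1 / n), 4))  # equals S_n
```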
Define the geometric mean of (|K|/|K(S^c)|)^{1/k} over all k-element subsets by

Q_k = \left(\prod_{S:\,|S|=k}\frac{|K|}{|K(S^c)|}\right)^{1/(k\binom{n}{k})}.

Theorem 33:

Q_1 \le Q_2 \le \cdots \le Q_{n-1} \le Q_n = |K|^{1/n}.

Proof: The theorem follows immediately from Theorem 2 and the identification

h(X(S)|X(S^c)) = \frac{1}{2}\ln(2\pi e)^k\,\frac{|K|}{|K(S^c)|}. □

The outermost inequality, Q_1 ≤ Q_n, can be rewritten as

|K| \ge \prod_{i=1}^{n}\hat{\sigma}_i^2,

where

\hat{\sigma}_i^2 = \frac{|K|}{|K(1,2,\ldots,i-1,i+1,\ldots,n)|} \qquad (66)

is the minimum mean-squared error in the linear prediction of X_i from the remaining X's. It is the conditional variance of X_i given the remaining X_j's if X_1, X_2, ..., X_n are jointly normal. Combining this with Hadamard's inequality gives upper and lower bounds on the determinant of a positive definite matrix.

Corollary 4:

\prod_{i=1}^{n} K_{ii} \ge |K| \ge \prod_{i=1}^{n}\hat{\sigma}_i^2.

Hence, the determinant of a covariance matrix lies between the product of the unconditional variances K_ii of the random variables X_i and the product of the conditional variances σ̂_i².
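Corollary 4 sandwiches the determinant between products of unconditional and conditional variances; the check below (an added illustration, not from the paper) computes all three quantities for a random covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
M = rng.standard_normal((n, n))
K = M @ M.T + 0.1 * np.eye(n)

det_K = np.linalg.det(K)
upper = np.prod(np.diag(K))                  # product of unconditional variances K_ii
# sigma_hat_i^2 = |K| / |K with row and column i removed|
sig2 = [det_K / np.linalg.det(np.delete(np.delete(K, i, 0), i, 1)) for i in range(n)]
lower = np.prod(sig2)                        # product of conditional variances
print(f"prod K_ii = {upper:.4f}  >=  |K| = {det_K:.4f}  >=  prod sigma_hat_i^2 = {lower:.4f}")
```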
Let

R_k = \left(\prod_{S:\,|S|=k}\frac{|K(S)|\,|K(S^c)|}{|K|}\right)^{1/(k\binom{n}{k})}.

Theorem 34:

R_1 \ge R_2 \ge \cdots \ge R_{n-1} \ge R_n.

Proof: The theorem follows immediately from Corollary 2 and the identification

I(X(S);X(S^c)) = \frac{1}{2}\ln\frac{|K(S)|\,|K(S^c)|}{|K|}. \qquad (64)

In particular, the outer inequality R_1 ≥ R_n = 1 results in

\prod_{i=1}^{n} K_{ii}\,|K(1,2,\ldots,i-1,i+1,\ldots,n)| \ge |K|^{n}.

Finally, we can convert Theorem 3 into a statement about determinants by considering X_1, X_2, ..., X_n to be normally distributed with covariance matrix K. Let

T_k = \left(\prod_{S:\,|S|=k}\frac{|K(S)|\,|K(S^c)|}{|K|}\right)^{1/\binom{n}{k}}.

Theorem 35:

T_1 \le T_2 \le \cdots \le T_{\lfloor n/2\rfloor}.

Proof: The theorem follows directly from Theorem 3 and (64). □

ACKNOWLEDGMENT

A. Dembo thanks S. Karlin for pointing attention to [8] and [30] and Y. Peres and G. Kalai for pointing attention to [10]. The authors also thank E. Carlen for providing preprints of [11], [12], and [13].

REFERENCES

[1] D. Bakry and M. Emery, "Séminaire de Probabilités XIX," in Lecture Notes in Mathematics, vol. 1123. New York: Springer, 1985, pp. 179-206.
[2] A. Barron, "Entropy and the central limit theorem," Ann. Probab., vol. 14, no. 1, pp. 336-342, 1986.
[3] W. Beckner, "Inequalities in Fourier analysis," Ann. Math., vol. 102, pp. 159-182, 1975.
[4] R. Bellman, "Notes on matrix theory-IV: An inequality due to Bergstrom," Amer. Math. Monthly, vol. 62, pp. 172-173, 1955.
[5] P. P. Bergmans, "A simple converse for broadcast channels with additive white normal noise," IEEE Trans. Inform. Theory, vol. IT-20, pp. 279-280, 1974.
[6] N. Blachman, "The convolution inequality for entropy powers," IEEE Trans. Inform. Theory, vol. IT-11, pp. 267-271, Apr. 1965.
[7] H. J. Brascamp and E. J. Lieb, "Some inequalities for Gaussian measures and the long range order of the one dimensional plasma," in Functional Integration and Its Applications, A. M. Arthurs, Ed. Oxford: Clarendon Press, 1975.
[8] H. J. Brascamp and E. J. Lieb, "Best constants in Young's inequality, its converse and its generalization to more than three functions," Adv. Math., vol. 20, pp. 151-173, 1976.
[9] H. J. Brascamp and E. J. Lieb, "On extensions of the Brunn-Minkowski and Prekopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation," J. Functional Anal., vol. 22, pp. 366-389, 1976.
[10] Y. D. Burago and V. A. Zalgaller, Geometric Inequalities. New York: Springer-Verlag, 1980.
[11] E. A. Carlen, "Some integral identities and inequalities for entire functions and their application to the coherent state transform," J. Functional Anal., 1991.
[12] E. A. Carlen, "Superadditivity of Fisher's information and logarithmic Sobolev inequalities," J. Functional Anal., 1991.
[13] E. A. Carlen and A. Soffer, "Entropy production by convolution and central limit theorems with strong rate information," Commun. Math. Phys., 1991.
[14] M. Costa and T. M. Cover, "On the similarity of the entropy power inequality and the Brunn-Minkowski inequality," IEEE Trans. Inform. Theory, vol. IT-30, pp. 837-839, 1984.
[15] M. H. M. Costa, "A new entropy power inequality," IEEE Trans. Inform. Theory, vol. IT-31, pp. 751-760, 1985.
[16] T. M. Cover and J. A. Thomas, "Determinant inequalities via information theory," SIAM J. Matrix Anal. Applicat., vol. 9, no. 3, pp. 384-392, July 1988.
[17] T. M. Cover and A. El Gamal, "An information theoretic proof of Hadamard's inequality," IEEE Trans. Inform. Theory, vol. IT-29, pp. 930-931, Nov. 1983.
[18] I. Csiszár, "Informationstheoretische Konvergenzbegriffe im Raum der Wahrscheinlichkeitsverteilungen," Publ. Math. Inst. Hungarian Acad. Sci., vol. VII, ser. A, pp. 137-157, 1962.
[19] A. Dembo, "A simple proof of the concavity of the entropy power with respect to the variance of additive normal noise," IEEE Trans. Inform. Theory, vol. 35, pp. 887-888, July 1989.
[20] A. Dembo, "Information inequalities and uncertainty principles," Tech. Rep., Dept. of Statistics, Stanford Univ., Stanford, CA, 1990.
[21] D. L. Donoho and P. B. Stark, "Uncertainty principles and signal recovery," SIAM J. Appl. Math., vol. 49, pp. 906-931, 1989.
[22] K. Fan, "On a theorem of Weyl concerning the eigenvalues of linear transformations II," Proc. Nat. Acad. Sci. U.S., vol. 36, pp. 31-35, 1950.
[23] K. Fan, "Some inequalities concerning positive-definite matrices," Proc. Cambridge Phil. Soc., vol. 51, pp. 414-421, 1955.
[24] H. Federer, Geometric Measure Theory, vol. 153 of Grundl. Math. Wiss. Berlin: Springer-Verlag, 1969.
[25] L. Gross, "Logarithmic Sobolev inequalities," Amer. J. Math., vol. 97, pp. 1061-1083, 1975.
[26] L. Gross, "Logarithmic Sobolev inequalities for the heat kernel on a Lie group," in White Noise Analysis. Singapore: World Scientific, 1990.
[27] T. S. Han, "Nonnegative entropy measures of multivariate symmetric correlations," Inform. Contr., vol. 36, pp. 133-156, 1978.
[28] I. I. Hirschman, "A note on entropy," Amer. J. Math., vol. 79, pp. 152-156, 1957.
[29] S. Kullback, "A lower bound for discrimination information in terms of variation," IEEE Trans. Inform. Theory, vol. IT-4, pp. 126-127, 1967.
[30] E. H. Lieb, "Proof of an entropy conjecture of Wehrl," Commun. Math. Phys., vol. 62, pp. 35-41, 1978.
[31] E. H. Lieb, "Gaussian kernels have Gaussian maximizers," Inventiones Math., vol. 102, pp. 179-208, 1990.
[32] A. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications. New York: Academic Press, 1979.
[33] A. Marshall and I. Olkin, "A convexity proof of Hadamard's inequality," Amer. Math. Monthly, vol. 89, pp. 687-688, 1982.
[34] H. Minkowski, "Diskontinuitätsbereich für arithmetische Äquivalenz," J. für Math., vol. 129, pp. 220-274, 1950.
[35] L. Mirsky, "On a generalization of Hadamard's determinantal inequality due to Szasz," Arch. Math., vol. 8, pp. 274-275, 1957.
[36] A. Oppenheim, "Inequalities connected with definite Hermitian forms," J. London Math. Soc., vol. 5, pp. 114-119, 1930.
[37] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, 1948.
[38] A. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Inform. Contr., vol. 2, pp. 101-112, 1959.
[39] A. Wehrl, "General properties of entropy," Rev. Modern Phys., vol. 50, pp. 221-260, 1978.
[40] M. Zakai, "A class of definitions of 'duration' (or 'uncertainty') and the associated uncertainty relations," Inform. Contr., vol. 3, pp. 101-115, 1960.
