Chapter 16
Information and Asymptotic Equipartition Property
Consider a random variable X with the probability distribution P(x)
$$X \in \{a, b, c, d, e, f, g, h\}$$
Suppose we measure X exactly N times (N is very large) and we plan to send the results
to a friend
$$\lim_{N\to\infty}\; -\frac{1}{N}\sum_{i=1}^{N}\log_2 P(x_i) \;=\; -\sum_x P(x)\log_2 P(x) \;=\; H(X) \qquad \text{Asymptotic Equipartition Property (AEP)}$$
This means that as N→∞, every probabilistically likely result sequence satisfies:
$$P(x_1, x_2, x_3, \ldots, x_N) \approx 2^{-NH(X)}$$
● This means that as N→∞ there can only be $2^{NH(X)}$ different result sequences that are probabilistically likely (and each one of them has the same a-priori probability)
● Therefore, we only need $NH(X)$ bits to encode any result sequence that is likely to occur
● This means on average we need only $H(X)$ bits per result to encode it
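A quick numerical illustration of the AEP (a sketch; the eight-symbol distribution below is an illustrative assumption, not the one used in the lecture): the empirical average of $-\log_2 P(x_i)$ over N draws converges to $H(X)$ as N grows.

```python
# Sketch: empirical -(1/N) sum_i log2 P(x_i) converges to H(X) (AEP).
# The alphabet and probabilities below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
symbols = np.arange(8)                  # stands in for {a, b, c, d, e, f, g, h}
p = np.array([0.5, 0.25, 0.125, 0.0625,
              0.03125, 0.015625, 0.0078125, 0.0078125])   # sums to 1

H = -np.sum(p * np.log2(p))             # H(X) ≈ 1.984 bits

for N in (100, 10_000, 1_000_000):
    x = rng.choice(symbols, size=N, p=p)
    empirical = -np.mean(np.log2(p[x]))
    print(f"N = {N:>9}: {empirical:.4f} bits  (H(X) = {H:.4f})")
```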
Entropy and Information
Entropy:
The amount of information (in bits) that is gained by learning about the outcome of a
measurement of a random variable is given by the entropy function:
$$H(X) = -\sum_x p(x)\log_2 p(x)$$
Equivalently, entropy is the minimum number of bits required on average to reliably transmit the outcome of a measurement of the random variable
$$H(X) \ge 0$$
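The definition translates directly into code; a minimal sketch (the helper name is mine, not from the notes):

```python
# Sketch: H(X) = -sum_x p(x) log2 p(x); zero-probability terms contribute 0.
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

print(entropy_bits([0.5, 0.5]))        # 1.0 bit (fair coin)
print(entropy_bits([1.0, 0.0]))        # 0.0 bits (certain outcome)
print(entropy_bits([1/8] * 8))         # 3.0 bits (8 equally likely outcomes)
```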
Example (Case 2): for a particular non-uniform distribution over an alphabet $A = \{a_1, a_2, a_3, \ldots, a_k\}$:
$$H(X) = -\sum_x p(x)\log_2 p(x) = 2.69 \text{ bits!}$$

Continuous random variable:
What probability distribution maximizes $H(X)$ subject to the constraints $\langle x\rangle = x_o$ and $\langle (x - x_o)^2\rangle = \sigma^2$?

Answer: A Gaussian (or Normal) distribution:
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x - x_o)^2}{2\sigma^2}} \qquad\Rightarrow\qquad H(X) = \frac{1}{2}\log_2\!\left(2\pi e\,\sigma^2\right)$$
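A numerical sanity check of this closed form (a sketch; the values of $x_o$ and $\sigma$ are illustrative):

```python
# Sketch: the differential entropy of a Gaussian equals (1/2) log2(2*pi*e*sigma^2).
import numpy as np

x_o, sigma = 1.0, 2.0
x = np.linspace(x_o - 10 * sigma, x_o + 10 * sigma, 200_001)
dx = x[1] - x[0]
P = np.exp(-(x - x_o) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

H_numeric = -np.sum(P * np.log2(P)) * dx      # -∫ P(x) log2 P(x) dx
H_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)
print(H_numeric, H_closed)                    # both ≈ 3.047 bits
```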
Discrete random variable:
What probability distribution maximizes $H(N)$ subject to the constraints $\langle n\rangle = n_o$ and $n = 0, 1, 2, 3, \ldots$?
Answer: A Thermal (or Bose-Einstein) distribution
$$P(n) = \frac{1}{1 + n_o}\left(\frac{n_o}{1 + n_o}\right)^{n} \qquad\Rightarrow\qquad H(N) = \log_2(1 + n_o) + n_o\log_2\!\left(1 + \frac{1}{n_o}\right)$$
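A sketch checking the closed form for the thermal entropy numerically (the mean $n_o$ and the truncation point of the infinite sum are illustrative):

```python
# Sketch: the Bose-Einstein distribution with mean n_o has entropy
# H = log2(1 + n_o) + n_o log2(1 + 1/n_o).
import numpy as np

n_o = 2.5
n = np.arange(0, 400)                              # truncate the infinite sum
P = (1 / (1 + n_o)) * (n_o / (1 + n_o)) ** n

H_numeric = -np.sum(P * np.log2(P))
H_closed = np.log2(1 + n_o) + n_o * np.log2(1 + 1 / n_o)
print(H_numeric, H_closed)                         # both ≈ 3.021 bits
```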
How much information can be obtained on average from learning about the outcome of
a measurement of a random variable Y if the outcome of the measurement of another
random variable X is known?
$$H(Y|X) = -\sum_x p(x)\sum_y p(y|x)\log_2 p(y|x) = \sum_x p(x)\,H(Y|X=x) = -\sum_{x,y} p(x,y)\log_2 p(y|x)$$
Mutual Information
Mutual information is the difference between the information obtained on average from learning the outcome of a measurement of a random variable Y, and the information obtained on average from learning that outcome when the outcome of a measurement of another random variable X is already known:
$$I(Y:X) = H(Y) - H(Y|X) = I(X:Y) = H(X) - H(X|Y) = \sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}$$
Mutual information quantifies how much information one random variable conveys
about another random variable
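A small numerical example (the joint distribution below is an illustrative assumption): $H(Y) - H(Y|X)$ computed from the definitions agrees with the double-sum form above.

```python
# Sketch: compute H(Y), H(Y|X), and I(X;Y) from a joint distribution p(x, y).
import numpy as np

p_xy = np.array([[0.25, 0.25],         # rows index x, columns index y
                 [0.40, 0.10]])

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_Y = H(p_y)
H_Y_given_X = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(len(p_x)))
I_XY = H_Y - H_Y_given_X
print(I_XY)    # = sum_{x,y} p(x,y) log2[p(x,y)/(p(x)p(y))] ≈ 0.073 bits
```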
Question: how many real degrees of freedom per second does a signal occupying bands of width $2\pi B$ (rad/s) centered at $\pm\omega_o$ carry?
Answer: 2B real degrees of freedom per second (Nyquist Theorem), where B is the single-sided signal bandwidth in Hertz (not radians)
Recall from Chapter 5 that real narrowband signals can always be written as:
$$x(t) = \mathrm{Re}\!\left[a(t)\,e^{i\omega_o t}\right] \qquad a(t) = x_1(t) + i\,x_2(t)$$
$$\Rightarrow\quad x(t) = x_1(t)\cos(\omega_o t) - x_2(t)\sin(\omega_o t)$$
So each time-domain sample of the signal carries information on two real degrees
of freedom
One can sample the signal as follows and then reconstruct it from these samples: multiplying $x(t)$ by $2\cos(\omega_o t)$ and low-pass filtering (bandwidth B) yields $x_1(t)$, and multiplying by $-2\sin(\omega_o t)$ and low-pass filtering yields $x_2(t)$. Each quadrature is then sampled at the rate B to give the samples:
$$x_1\!\left(\frac{n}{B}\right) \qquad\text{and}\qquad x_2\!\left(\frac{n}{B}\right)$$
The signal is reconstructed from these samples as:
$$x(t) = \sum_n x_1\!\left(\frac{n}{B}\right)\frac{\sin\!\left[\pi B\left(t - \frac{n}{B}\right)\right]}{\pi B\left(t - \frac{n}{B}\right)}\cos(\omega_o t) \;-\; \sum_n x_2\!\left(\frac{n}{B}\right)\frac{\sin\!\left[\pi B\left(t - \frac{n}{B}\right)\right]}{\pi B\left(t - \frac{n}{B}\right)}\sin(\omega_o t)$$
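A sketch of the sampling-and-reconstruction claim (the bandwidth, carrier, and test quadratures below are illustrative): sample the two quadratures at rate B and rebuild $x(t)$ from the sinc expansion above.

```python
# Sketch: reconstruct a narrowband signal from quadrature samples taken at rate B.
import numpy as np

B, f_o = 4.0, 40.0                        # bandwidth and carrier (Hz), illustrative
w_o = 2 * np.pi * f_o
t = np.linspace(0.0, 2.0, 4001)

# Quadratures band-limited to |f| < B/2, so x(t) fits in a band of width B
x1 = np.cos(2 * np.pi * 1.0 * t)
x2 = 0.5 * np.sin(2 * np.pi * 1.5 * t)
x = x1 * np.cos(w_o * t) - x2 * np.sin(w_o * t)

n = np.arange(-200, 209)                  # sample indices, t_n = n / B
# np.sinc(u) = sin(pi u)/(pi u), so this is sin[pi B (t - n/B)] / [pi B (t - n/B)]
sinc = np.sinc(B * t[None, :] - n[:, None])
x1_rec = np.cos(2 * np.pi * 1.0 * n / B) @ sinc      # from samples x1(n/B)
x2_rec = 0.5 * np.sin(2 * np.pi * 1.5 * n / B) @ sinc
x_rec = x1_rec * np.cos(w_o * t) - x2_rec * np.sin(w_o * t)

print(np.max(np.abs(x - x_rec)))          # small; limited by truncating the sums
```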
Time Domain Basis
Note that the signal can be expanded in an orthogonal time-domain basis set:
$$x(t) = \sum_n x_1\!\left(\frac{n}{B}\right)\frac{\sin\!\left[\pi B\left(t - \frac{n}{B}\right)\right]}{\pi B\left(t - \frac{n}{B}\right)}\cos(\omega_o t) \;-\; \sum_n x_2\!\left(\frac{n}{B}\right)\frac{\sin\!\left[\pi B\left(t - \frac{n}{B}\right)\right]}{\pi B\left(t - \frac{n}{B}\right)}\sin(\omega_o t)$$
The functions
$$\frac{\sin\!\left[\pi B\left(t - \frac{n}{B}\right)\right]}{\pi B\left(t - \frac{n}{B}\right)}\cos(\omega_o t) \qquad\text{and}\qquad \frac{\sin\!\left[\pi B\left(t - \frac{n}{B}\right)\right]}{\pi B\left(t - \frac{n}{B}\right)}\sin(\omega_o t)$$
form a complete orthogonal set that can be used to expand any band-limited signal centered at frequencies $\pm\omega_o$
The instantaneous power of a narrowband signal:
$$P(t) = \frac{1}{2}x_1^2(t) + \frac{1}{2}x_2^2(t) = \frac{1}{2}\left|a(t)\right|^2$$
Total energy of a narrowband signal:
$$E = \int dt\,P(t) = \frac{1}{2}\int dt\,x_1^2(t) + \frac{1}{2}\int dt\,x_2^2(t) = \frac{1}{2}\,\frac{1}{B}\sum_n x_1^2\!\left(\frac{n}{B}\right) + \frac{1}{2}\,\frac{1}{B}\sum_n x_2^2\!\left(\frac{n}{B}\right)$$
Total energy is just half the energy of all the orthogonal sinc pulses in the signal.
● One can map the message to the amplitudes of the two quadratures
● Note that one can send only 2B different quadrature values per second
Suppose one sends N different quadrature amplitudes through the channel in time N/(2B):
$$y_1,\; y_2,\; y_3,\; y_4, \ldots, y_N$$
The data to be transmitted and the mapping process will impart an a-priori probability distribution P(y) for the quadrature amplitudes
AWGN channel: each received quadrature is the transmitted quadrature plus additive white Gaussian noise $f(n)$ of variance M:
$$z(n) = y(n) + f(n)$$
Question: how much information (in bits) can be communicated over this channel
using these N quadratures?
By a sphere-packing argument, with signal power P and noise power M per quadrature, the number of output sequences that can be reliably distinguished is:
$$\frac{(P + M)^{N/2}}{M^{N/2}} = \left(1 + \frac{P}{M}\right)^{N/2}$$
So the information in bits that can be transferred using N quadratures is:
$$\log_2\!\left(1 + \frac{P}{M}\right)^{N/2} = \frac{N}{2}\log_2\!\left(1 + \frac{P}{M}\right)$$
Y → [Channel] → Z
The channel capacity (per usage) with input Y and output Z is defined as the maximum of the mutual information over all possible input distributions, taking into account all realistic constraints (such as the power/energy constraint):
$$C = \max_{p_{in}(y)} I(Z:Y) = \max_{p_{in}(y)}\left[H(Z) - H(Z|Y)\right]$$
Any amount of information (in bits) less than or equal to C can be reliably transmitted
and recovered per usage of a noisy channel with an error probability that approaches
zero as the number of uses of the channel becomes large
The channel capacity (per usage) is more formally defined as the maximum of the mutual information over all possible input distributions, taking into account the power/energy constraint:
$$C = \max_{p_{in}(y):\;\int y^2\,p_{in}(y)\,dy\,\le\,P} I(Z:Y) = \max_{p_{in}(y)}\left[H(Z) - H(Z|Y)\right]$$
Mutual information will be maximized if the output Z is Gaussian, and Z will be Gaussian if the input Y is Gaussian:
$$C = \max_{p_{in}(y)} H(Z) - \frac{1}{2}\log_2\!\left(2\pi e M\right)$$
$$z(n) = y(n) + f(n)$$
If: $\;p(f) = N(0, M)$
Then: $\;p_{out}(z|y) = N(y, M)$
And then if: $\;p_{in}(y) = N(0, P)$
Then:
$$p_{out}(z) = \int dy\; p_{out}(z|y)\,p_{in}(y) = N(0, P + M)$$
$$C = \frac{1}{2}\log_2\!\left[2\pi e\,(P + M)\right] - \frac{1}{2}\log_2\!\left(2\pi e\,M\right) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{M}\right)$$
Same as before!!
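A one-line numerical restatement of this derivation (the values of P and M are illustrative):

```python
# Sketch: AWGN capacity per quadrature, C = (1/2) log2(1 + P/M),
# checked against H(Z) - H(Z|Y) for a Gaussian input.
import numpy as np

P, M = 4.0, 1.0                                  # signal and noise variances
h = lambda var: 0.5 * np.log2(2 * np.pi * np.e * var)   # Gaussian diff. entropy

H_Z = h(P + M)          # Z = Y + F is N(0, P + M) for Gaussian input
H_Z_given_Y = h(M)      # conditioned on Y, Z is N(y, M)
print(H_Z - H_Z_given_Y, 0.5 * np.log2(1 + P / M))      # both ≈ 1.161 bits
```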
Quantum Information: The Basics
The unit of quantum information is a “qubit” (not a bit):
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$
Unlike the classical bit, a qubit can be in a superposition of the two logical states at
the same time
The Von Neumann entropy of a quantum state $\hat\rho$ is:
$$S(\hat\rho) = -\mathrm{Tr}\!\left[\hat\rho\,\log_2\hat\rho\right]$$
The Von Neumann entropy plays three roles (that we know of so far):
1) It quantifies the quantum information content in qubits of a quantum state (i.e. the minimum number of qubits needed to reliably encode the quantum state)
2) It also quantifies the classical information in bits that can be gained about the quantum state by making the best possible measurement
3) It quantifies the entanglement of bipartite pure states (discussed later in this chapter)
As you will see, the Von Neumann entropy will not always give the answer to the
question we will ask!
1) Suppose $\hat\rho = |\psi\rangle\langle\psi|$ is a pure state. Then:
$$S(\hat\rho) = 0$$
2) Suppose $\hat\rho = \sum_i p_i\,|\psi_i\rangle\langle\psi_i|$. If the states in the ensemble were not all completely orthogonal then: $S(\hat\rho) < H(p) = -\sum_i p_i\log_2 p_i$ (with equality when they are all orthogonal)
3) Suppose $\hat\rho = \sum_i p_i\,\hat\rho_i$. Then:
$$S(\hat\rho) \;\le\; -\sum_i p_i\log_2 p_i + \sum_i p_i\,S(\hat\rho_i)$$
4) Change of basis:
$$S\!\left(\hat{U}\hat\rho\,\hat{U}^\dagger\right) = S(\hat\rho)$$
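A sketch that evaluates $S(\hat\rho)$ from the eigenvalues of $\hat\rho$ and spot-checks properties 1) and 4) (the states and unitary below are illustrative):

```python
# Sketch: S(rho) = -Tr[rho log2 rho], computed from eigenvalues.
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                # zero eigenvalues contribute nothing
    return -np.sum(ev * np.log2(ev))

psi = np.array([1, 1j]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())
print(S(rho_pure))                     # 0 for a pure state (property 1)

rho_mixed = np.diag([0.5, 0.5])
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)         # a unitary change of basis
print(S(rho_mixed), S(U @ rho_mixed @ U.conj().T))   # both 1.0 (property 4)
```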
For two classical random variables x and y with joint distribution P(x, y), a common measure of the closeness between them is the mean squared difference:
$$\left\langle (x - y)^2\right\rangle = \int dx\,dy\;(x - y)^2\,P(x, y)$$
This is not the only measure used
Quantum Fidelity:
Given two quantum states, $\hat\rho$ and $\hat\sigma$, the fidelity F, a measure of the closeness between them, is generally defined as the quantity:
$$F(\hat\rho, \hat\sigma) = \left[\mathrm{Tr}\sqrt{\sqrt{\hat\rho}\;\hat\sigma\;\sqrt{\hat\rho}}\right]^2 \qquad 0 \le F(\hat\rho, \hat\sigma) \le 1$$
This is not the only measure used
Example: Suppose $\hat\rho = |\psi\rangle\langle\psi|$ and $\hat\sigma = |\phi\rangle\langle\phi|$ are both pure states. Then:
$$F(\hat\rho, \hat\sigma) = \left|\langle\psi|\phi\rangle\right|^2$$
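A numerical check of the pure-state case against the general trace formula (a sketch using scipy's matrix square root; the two states are illustrative):

```python
# Sketch: for pure states the general fidelity reduces to |<psi|phi>|^2.
import numpy as np
from scipy.linalg import sqrtm

psi = np.array([1, 0], dtype=complex)
phi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho, sigma = np.outer(psi, psi.conj()), np.outer(phi, phi.conj())

F_general = np.real(np.trace(sqrtm(sqrtm(rho) @ sigma @ sqrtm(rho)))) ** 2
print(F_general, abs(psi.conj() @ phi) ** 2)     # both 0.5
```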
Quantum Messages and Quantum Data Compression
The alphabet of letters is a set of quantum states, $A = \{\hat\rho_1, \hat\rho_2, \hat\rho_3, \ldots, \hat\rho_k\}$, in which letter $\hat\rho_i$ occurs with a-priori probability $p_i$, so that $\hat\rho = \sum_{i=1}^{k} p_i\,\hat\rho_i$.
A quantum message M consisting of a very long sequence of N letters (or quantum states) $\hat\rho_i$:
$$M = \left(\hat\rho_{i_1}, \hat\rho_{i_2}, \hat\rho_{i_3}, \ldots, \hat\rho_{i_N}\right)$$
can be compressed to NC qubits, in the limit of large N, without loss of fidelity, where:
$$I(\hat\rho) \;\le\; C \;\le\; S(\hat\rho) \qquad\qquad I(\hat\rho) \equiv S(\hat\rho) - \sum_{i=1}^{k} p_i\,S(\hat\rho_i)$$
The lower limit is achievable if the alphabet A represents pure states (not necessarily orthogonal), or if the different letters in the alphabet commute
The most general measurements to obtain classical information from quantum states can be described in terms of a complete set of positive Hermitian operators $\hat{F}_j$ which provide a resolution of the identity operator:
$$\sum_j \hat{F}_j = \hat{1}$$
These generalized measurements constitute a positive operator valued measure
(POVM). The probability $p_k$ that the outcome of a measurement on a quantum state $\hat\rho$ will be $k$ is given as:
$$p_k = \mathrm{Tr}\!\left(\hat\rho\,\hat{F}_k\right)$$
Example: For a photon number measurement on a quantum state of light in a cavity, the POVM is formed by the operators $|n\rangle\langle n|$ and the probabilities are given as:
$$p(n) = \mathrm{Tr}\!\left(\hat\rho\,|n\rangle\langle n|\right)$$
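A sketch of this POVM applied to a coherent state $|\alpha\rangle$ (coherent states appear later in these notes; the value of $\alpha$ and the truncation are illustrative assumptions): the photon-number probabilities come out Poissonian, and they sum to 1 because the POVM resolves the identity.

```python
# Sketch: p(n) = Tr(rho |n><n|) for rho = |alpha><alpha| gives a Poisson distribution.
import numpy as np
from math import factorial

alpha, nmax = 1.5, 30
# Coherent-state amplitudes in the number basis: <n|alpha>
c = np.array([np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / np.sqrt(factorial(n))
              for n in range(nmax)])
rho = np.outer(c, c.conj())

p = np.real(np.diag(rho))            # Tr(rho |n><n|) = rho_{nn}
print(p.sum())                       # ≈ 1 (the POVM resolves the identity)
print(p[:4])                         # e^{-|a|^2} |a|^{2n} / n!
```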
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$
The quantum state is specified by two complex numbers, $\alpha$ and $\beta$, and each can take any value
But the classical information that can be extracted from the above qubit is just one bit!!
Suppose the sender sends the following two states with a-priori probability 1/2 each:
$$|\psi_0\rangle = |0\rangle \qquad |\psi_1\rangle = |1\rangle \qquad\Rightarrow\qquad \hat\rho = \frac{|0\rangle\langle 0| + |1\rangle\langle 1|}{2}$$
One can use the following POVM:
$$\hat{F}_0 = |0\rangle\langle 0| \qquad \hat{F}_1 = |1\rangle\langle 1| \qquad \sum_j \hat{F}_j = \hat{1}$$
And:
$$S(\hat\rho) = 1 \text{ bit}$$
$$\hat\rho^{\otimes N} = \underbrace{\hat\rho \otimes \hat\rho \otimes \hat\rho \otimes \cdots \otimes \hat\rho}_{N\ \text{terms}}$$
For the above quantum message, the obtained classical information $I(M:P)$ per letter about the preparation of the message, using the optimal measurement scheme, is bounded as follows:
$$\max_{\hat{F}} I(M:P) \;\le\; I(\hat\rho) = S(\hat\rho) - \sum_{i=1}^{k} p_i\,S(\hat\rho_i)$$
The upper limit in Holevo's theorem can be achieved if and only if the quantum states of all the letters in the alphabet A commute, i.e. $[\hat\rho_i, \hat\rho_k] = 0$
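A numeric evaluation of the Holevo quantity for a simple ensemble (a sketch; the two non-orthogonal letters below are illustrative):

```python
# Sketch: Holevo quantity chi = S(rho) - sum_i p_i S(rho_i) for two
# non-orthogonal pure letters |0> and (|0>+|1>)/sqrt(2), each with p = 1/2.
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return -np.sum(ev * np.log2(ev))

kets = [np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)]
p = [0.5, 0.5]
rhos = [np.outer(k, k.conj()) for k in kets]
rho = sum(pi * ri for pi, ri in zip(p, rhos))

chi = S(rho) - sum(pi * S(ri) for pi, ri in zip(p, rhos))
print(chi)   # ≈ 0.601 < 1 bit: non-orthogonal letters convey less than 1 bit
```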
If all the letters in the alphabet commute, they can be diagonalized in a common basis $\{|\alpha\rangle\}$:
$$\hat\rho_i = \sum_\alpha f_{i,\alpha}\,|\alpha\rangle\langle\alpha| \qquad\Rightarrow\qquad S(\hat\rho_i) = -\sum_\alpha f_{i,\alpha}\log_2 f_{i,\alpha}$$
(each letter yields probabilistic outcomes of measurements performed on it since it is not a pure state)
So if we choose the POVM to be:
$$\hat{F}_\alpha = |\alpha\rangle\langle\alpha|$$
then:
$$p(\alpha) = \mathrm{Tr}\!\left(\hat\rho\,\hat{F}_\alpha\right) = \mathrm{Tr}\!\left(\sum_i p_i\,\hat\rho_i\,\hat{F}_\alpha\right) = \sum_i p_i\,f_{i,\alpha}$$
and, since $\hat\rho$ is diagonal in the same basis:
$$S(\hat\rho) = -\sum_\alpha p(\alpha)\log_2 p(\alpha) = H(M)$$
Therefore:
$$\max_{\hat{F}} I(M:P) = H(M) - H(M|P) = -\sum_\alpha p(\alpha)\log_2 p(\alpha) + \sum_i p_i\sum_\alpha f_{i,\alpha}\log_2 f_{i,\alpha} = S(\hat\rho) - \sum_{i=1}^{k} p_i\,S(\hat\rho_i)$$
$$\hat\rho^{\otimes N} = \underbrace{\hat\rho \otimes \hat\rho \otimes \hat\rho \otimes \cdots \otimes \hat\rho}_{N\ \text{terms}}$$
The channel is described by a trace preserving linear quantum operation E such that
the density operator of each letter at the output of the channel is related to the density
operator at the input of the channel by the relation:
$$\hat\rho \to E(\hat\rho) = \sum_j \hat{E}_j\,\hat\rho\,\hat{E}_j^\dagger \qquad \sum_j \hat{E}_j^\dagger\hat{E}_j = \hat{1}$$
$$A = \{\hat\rho_1, \hat\rho_2, \hat\rho_3, \ldots, \hat\rho_k\}$$
In the message, each letter $\hat\rho_i$ occurs with an a-priori probability $p_i$.
The density operators for each letter in the message and of the full message are then:
$$\hat\rho = \sum_{i=1}^{k} p_i\,\hat\rho_i \qquad\qquad \hat\rho^{\otimes N} = \hat\rho \otimes \hat\rho \otimes \hat\rho \otimes \cdots \otimes \hat\rho$$
Question: How much classical information in bits can be communicated over the
channel per letter?
Classical Information Over Quantum Channel: Channel Capacity
$$C = \max_{\{p_i\}}\left[S\!\left(E(\hat\rho)\right) - \sum_{i=1}^{k} p_i\,S\!\left(E(\hat\rho_i)\right)\right]$$
The classical capacity of the quantum channel is achievable (even for non-commuting
letters in the message) if the receiver is allowed to make block measurements on all
received letters
Note:
This capacity is also called the fixed-alphabet product-state capacity, since 1) the
optimization is not performed over the choice of input letters ˆ i , and 2) the input
letters are not assumed to be entangled over multiple uses of the channel and
therefore the input density operator is in a tensor product form
Classical Information Over a Photonic Channel: Photon Number States and Photon Number Detection
Power Constraint:
$$\sum_{n=0}^{\infty} p_{in}(n)\,n\,\hbar\omega_o B \;\le\; P \qquad\Rightarrow\qquad \langle n\rangle = \sum_{n=0}^{\infty} p_{in}(n)\,n = n_o = \frac{P}{B\hbar\omega_o}$$
The capacity (in bits per use) is obtained with the thermal (Bose-Einstein) input distribution of mean $n_o$, which maximizes the entropy under this constraint:
$$C = \log_2\!\left(1 + \frac{P}{B\hbar\omega_o}\right) + \frac{P}{B\hbar\omega_o}\log_2\!\left(1 + \frac{B\hbar\omega_o}{P}\right) \quad\text{(bits)}$$
The capacity in bits per second is:
$$C = B\left[\log_2\!\left(1 + \frac{P}{B\hbar\omega_o}\right) + \frac{P}{B\hbar\omega_o}\log_2\!\left(1 + \frac{B\hbar\omega_o}{P}\right)\right]$$
In the limit $P \gg B\hbar\omega_o$:
$$C \approx B\log_2\!\left(1 + \frac{P}{B\hbar\omega_o}\right)$$
and the quantum channel result is as if it were a classical AWGN channel with added white noise with a noise power spectral density of $\hbar\omega_o$
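A sketch evaluating this capacity for illustrative numbers (1 GHz of bandwidth at an optical carrier near 193 THz), showing the approach to the AWGN-like form at high power:

```python
# Sketch: C = B [log2(1 + n_o) + n_o log2(1 + 1/n_o)] with n_o = P/(B hbar w_o),
# compared with B log2(1 + n_o). All parameter values are illustrative.
import numpy as np

hbar = 1.054571817e-34
B = 1e9                                   # 1 GHz bandwidth
w_o = 2 * np.pi * 193e12                  # optical carrier, ~193 THz

for P in (1e-12, 1e-9, 1e-6):             # watts
    n_o = P / (B * hbar * w_o)
    C = B * (np.log2(1 + n_o) + n_o * np.log2(1 + 1 / n_o))
    C_awgn = B * np.log2(1 + n_o)
    print(f"P = {P:.0e} W: n_o = {n_o:10.4f}, C = {C:.3e} b/s, "
          f"B log2(1+n_o) = {C_awgn:.3e} b/s")
```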
In the opposite limit, $P \ll B\hbar\omega_o$, the capacity becomes $C \approx \frac{P}{\hbar\omega_o}\log_2\!\left(\frac{B\hbar\omega_o}{P}\right)$. WHY???
On average one photon arrives every $T = \hbar\omega_o/P$ seconds, and each time slot has duration 1/B, so there are BT distinguishable time slots per photon. The transmitted photon can occupy any one of these time slots. The information in bits transmitted per second by that one photon, and therefore the channel capacity, becomes:
$$C \approx \frac{\log_2(BT)}{T} = \frac{P}{\hbar\omega_o}\log_2\!\left(\frac{B\hbar\omega_o}{P}\right)$$
Classical Information Over a Photonic Channel: Coherent States
and Photon Number Detection
$$\hat\rho = \int d^2\alpha\; p_{in}(\alpha)\,|\alpha\rangle\langle\alpha| \;\longrightarrow\; \text{Channel} \;\longrightarrow\; \hat\rho = \int d^2\alpha\; p_{in}(\alpha)\,|\alpha\rangle\langle\alpha|$$
The transmitted message is:
$$\hat\rho^{\otimes N} = \underbrace{\hat\rho \otimes \hat\rho \otimes \cdots \otimes \hat\rho}_{N\ \text{terms}}$$
● The channel is bandlimited
● Power Constraint:
$$B\hbar\omega_o \int d^2\alpha\; p_{in}(\alpha)\,|\alpha|^2 \;\le\; P$$
The chosen POVM, given below, for detection is (possibly) not the optimal POVM for the coherent state alphabet (as we will see later):
$$\hat{F}_n = |n\rangle\langle n| \qquad \sum_n \hat{F}_n = \hat{1}$$
Since the channel capacity definition includes use of the optimal POVM, which we are (possibly) not using, we just calculate the mutual information between the channel input and the detector output:
$$\max_{p_{in}} I(O:I) = \max_{p_{in}}\left[H(O) - H(O|I)\right] = \max_{p_{in}}\left[-\sum_{n=0}^{\infty} p_{out}(n)\log_2 p_{out}(n) + \int d^2\alpha\; p_{in}(\alpha)\sum_{n=0}^{\infty} p_{out}(n|\alpha)\log_2 p_{out}(n|\alpha)\right]$$
The above needs to be maximized over $p_{in}(\alpha)$ under the power constraint:
$$B\hbar\omega_o \int d^2\alpha\; p_{in}(\alpha)\,|\alpha|^2 \;\le\; P$$
Classical Information Over a Photonic Channel: Coherent States and Heterodyne Detection
● The channel is bandlimited
● Power Constraint:
$$B\hbar\omega_o \int d^2\alpha\; p_{in}(\alpha)\,|\alpha|^2 \;\le\; P$$
Since the channel capacity definition includes use of the optimal POVM, which we are (possibly) not using, we just calculate the mutual information between the channel input and the detector output. For heterodyne detection the POVM is formed by (scaled) coherent-state projectors:
$$\hat{F}_\beta = \frac{1}{\pi}\,|\beta\rangle\langle\beta| \qquad \int d^2\beta\;\hat{F}_\beta = \hat{1}$$
$$p_{out}(\beta|\alpha) = \mathrm{Tr}\!\left(|\alpha\rangle\langle\alpha|\,\hat{F}_\beta\right) = \frac{1}{\pi}\left|\langle\beta|\alpha\rangle\right|^2 \qquad\quad p_{out}(\beta) = \mathrm{Tr}\!\left(\hat\rho\,\hat{F}_\beta\right) = \int d^2\alpha\; p_{in}(\alpha)\,p_{out}(\beta|\alpha)$$
The optimal mutual information between the channel input I and the heterodyne detector
output O is:
$$\max_{p_{in}} I(O:I) = \max_{p_{in}}\left[H(O) - H(O|I)\right] = \max_{p_{in}}\left[-\int d^2\beta\; p_{out}(\beta)\log_2 p_{out}(\beta) + \int d^2\alpha\; p_{in}(\alpha)\int d^2\beta\; p_{out}(\beta|\alpha)\log_2 p_{out}(\beta|\alpha)\right]$$
$$p_{out}(\beta|\alpha) = \frac{1}{\pi}\,e^{-|\beta - \alpha|^2} \qquad \text{Gaussian!!}$$
$$= \frac{1}{\sqrt{2\pi(1/2)}}\,e^{-\frac{(\beta_r - \alpha_r)^2}{2(1/2)}} \;\times\; \frac{1}{\sqrt{2\pi(1/2)}}\,e^{-\frac{(\beta_i - \alpha_i)^2}{2(1/2)}}$$
The detected amplitude is the transmitted amplitude with added noise having a variance of 1/2 in each quadrature!!
$$C = \log_2\!\left(1 + \frac{P}{B\hbar\omega_o}\right) \quad\text{(bits per use)}$$
And since one can send B letters per second, the capacity in bits/s is:
$$C = B\log_2\!\left(1 + \frac{P}{B\hbar\omega_o}\right)$$
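A sketch comparing the two photonic-channel results per use, with $n_o = P/(B\hbar\omega_o)$ (values illustrative): at high power they differ by at most about $1/\ln 2 \approx 1.44$ bits per use, while at low power photon counting does much better.

```python
# Sketch: number-state capacity vs heterodyne capacity per channel use.
import numpy as np

def C_number(n_o):        # photon number states + photon number detection
    return np.log2(1 + n_o) + n_o * np.log2(1 + 1 / n_o)

def C_heterodyne(n_o):    # coherent states + heterodyne detection
    return np.log2(1 + n_o)

for n_o in (0.01, 1.0, 100.0):
    print(n_o, C_number(n_o), C_heterodyne(n_o))
# At n_o >> 1 the difference n_o log2(1 + 1/n_o) tends to 1/ln 2 ≈ 1.44 bits;
# at n_o << 1 the number-state channel is far better.
```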
The result, although identical to the one obtained using photon number states and photon number detection (in the high power limit), has more similarities with the classical AWGN result if one identifies $\hbar\omega_o$ with the noise power spectral density: $C = B\log_2\!\left(1 + P/(N_o B)\right)$ with $N_o = \hbar\omega_o$.

Classical Information Over a Lossy Photonic Channel: Photon Number States and Photon Number Detection
A channel with power transmission $T < 1$ takes an input number state $|n\rangle$ to a binomial mixture of number states at the output:
$$E\!\left(|n\rangle\langle n|\right) = \sum_{m=0}^{n} \frac{n!}{m!\,(n-m)!}\;T^m\,(1 - T)^{n-m}\;|m\rangle\langle m|$$
For large n, the entropy of this output state approaches that of a Gaussian of variance $nT(1-T)$:
$$S\!\left(E(|n\rangle\langle n|)\right) = -\sum_{m=0}^{n} p_{out}(m|n)\log_2 p_{out}(m|n) \;\approx\; \frac{1}{2}\log_2\!\left[2\pi e\,nT(1 - T)\right]$$
$$C = \max_{p_{in}(n)}\left[-\sum_{n=0}^{\infty} p_{out}(n)\log_2 p_{out}(n) - \sum_{n=0}^{\infty} p_{in}(n)\,\frac{1}{2}\log_2\!\left[2\pi e\,nT(1 - T)\right]\right]$$
The output distribution can be thermal if the input distribution is also thermal
$$p_{in}(n) = \frac{1}{1 + n_{in}}\left(\frac{n_{in}}{1 + n_{in}}\right)^n \qquad\Rightarrow\qquad p_{out}(n) = \frac{1}{1 + n_{in}T}\left(\frac{n_{in}T}{1 + n_{in}T}\right)^n$$
The optimal mutual information (bits per use) between the input I and the output O is:
$$\max_{p_{in}(n)} I(O:I) = -\sum_{n=0}^{\infty} p_{out}(n)\log_2 p_{out}(n) - \frac{1}{2}\log_2\!\left[2\pi e\,\gamma\,n_{in}T(1 - T)\right]$$
$$= \log_2\!\left(1 + n_{in}T\right) + n_{in}T\,\log_2\!\left(1 + \frac{1}{n_{in}T}\right) - \frac{1}{2}\log_2\!\left[2\pi e\,\gamma\,n_{in}T(1 - T)\right]$$
$$= \log_2\!\left(1 + \frac{P_{out}}{B\hbar\omega_o}\right) + \frac{P_{out}}{B\hbar\omega_o}\log_2\!\left(1 + \frac{B\hbar\omega_o}{P_{out}}\right) - \frac{1}{2}\log_2\!\left[2\pi e\,\gamma\,\frac{P_{out}}{B\hbar\omega_o}(1 - T)\right]$$
Here, $P_{out} = n_{in}T\,B\hbar\omega_o$, and $\gamma$ is a number with values between 1 and 2 that depends on the value of T.
The Low Power Limit ($P_{out} \ll B\hbar\omega_o$):
$$I(O:I) \approx \frac{P_{out}}{\hbar\omega_o}\log_2\!\left(\frac{B\hbar\omega_o}{P_{out}}\right) \quad\text{(now bits/s)}$$
The High Power Limit ($P_{out} \gg B\hbar\omega_o$, $T \to 1$):
$$I(O:I) \approx \frac{B}{2}\log_2\!\left(\frac{P_{out}}{B\hbar\omega_o}\right) \quad\text{(now bits/s)}$$
WHY?!?
Classical Information Over a Lossy Photonic Channel: Capacity
$$\hat\rho = \sum_{i=1}^{k} p_i\,\hat\rho_i \;\longrightarrow\; \text{Channel (loss into a bath)} \;\longrightarrow\; E(\hat\rho) \;\longrightarrow\; \text{POVM: } \hat{F}_j = ?$$
The capacity of this quantum channel per letter is:
$$C = \max_{\{p_i\}}\left[S\!\left(E(\hat\rho)\right) - \sum_{i=1}^{k} p_i\,S\!\left(E(\hat\rho_i)\right)\right] = \log_2\!\left(1 + \frac{P_{out}}{B\hbar\omega_o}\right) + \frac{P_{out}}{B\hbar\omega_o}\log_2\!\left(1 + \frac{B\hbar\omega_o}{P_{out}}\right)$$
Quantifying Entanglement of Bipartite Pure States
Question: how much entanglement do the following two bipartite states possess?
$$|\psi_a\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B - |1\rangle_A|0\rangle_B\right) \qquad\quad |\psi_b\rangle = \frac{1}{2}\left(\sqrt{3}\,|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B\right)$$
The answer, at least for pure states of bipartite systems, is available. Entanglement is a resource shared between Alice and Bob.
Bell States:
$$|a\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B + |1\rangle_A|0\rangle_B\right) \qquad\quad |c\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B\right)$$
$$|b\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B - |1\rangle_A|0\rangle_B\right) \qquad\quad |d\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B - |1\rangle_A|1\rangle_B\right)$$
Suppose Alice and Bob would like to prepare n copies of an entangled two-qubit state $|\psi\rangle$ (one qubit with Alice, the other with Bob), written in the basis $\{|0\rangle_A|0\rangle_B,\; |0\rangle_A|1\rangle_B,\; |1\rangle_A|0\rangle_B,\; |1\rangle_A|1\rangle_B\}$.
But what they already have in their possession are multiple copies of a Bell state
(doesn’t matter which one)
Suppose Alice and Bob use a minimum of kmin Bell states in their possession, and lots
of local operations on their respective qubits and classical communication between
each other (LOCC), and are able to generate n copies of the desired state
Then can we use the ratio $k_{min}/n$ as a measure of entanglement in the state $|\psi\rangle$?
i.e. how many Bell states does one need to use to generate one copy?
Now suppose instead that Alice and Bob already share n copies of the entangled state $|\psi\rangle$ (one qubit of each copy with Alice, the other with Bob).
But what they want are multiple copies of a Bell state (doesn’t matter which one)
Suppose Alice and Bob are able to prepare a maximum of kmax Bell states from the n
copies of the state in their possession, with only local operations on their
respective qubits and classical communication between each other (LOCC)
Then can we use the ratio $k_{max}/n$ as a measure of entanglement in the state $|\psi\rangle$?
i.e. how many Bell states does one generate per one copy?
Answer: yes. In the limit of large n, both ratios converge to the same quantity, the entanglement $E(\psi)$ of the bipartite pure state, given by the Von Neumann entropy of either reduced density operator:
$$E(\psi) = S(\hat\rho_A) = S(\hat\rho_B) \qquad \hat\rho_A = \mathrm{Tr}_B\,|\psi\rangle\langle\psi| \qquad \hat\rho_B = \mathrm{Tr}_A\,|\psi\rangle\langle\psi|$$
The above expression for bipartite entanglement works even when the subsystems involved are not 2-level systems (qubits) but arbitrary multilevel systems.
Example: for the two states considered earlier:
$$E(\psi_a) = 1.0 \qquad \text{(all four Bell states are maximally entangled)}$$
$$E(\psi_b) = 0.81$$
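A sketch computing $E(\psi) = S\!\left(\mathrm{Tr}_B|\psi\rangle\langle\psi|\right)$ for the two example states (basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$), reproducing 1.0 and 0.81:

```python
# Sketch: entanglement of a two-qubit pure state via the reduced density matrix.
import numpy as np

def S(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return -np.sum(ev * np.log2(ev))

def entanglement(psi):
    m = psi.reshape(2, 2)                 # amplitude matrix, indices (A, B)
    rho_A = m @ m.conj().T                # Tr_B |psi><psi|
    return S(rho_A)

psi_a = np.array([0, 1, -1, 0]) / np.sqrt(2)      # (|01> - |10>)/sqrt(2)
psi_b = np.array([np.sqrt(3), 0, 0, 1]) / 2       # (sqrt(3)|00> + |11>)/2
print(entanglement(psi_a), entanglement(psi_b))   # 1.0 and ≈ 0.811
```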
Quantifying Entanglement of Bipartite Mixed States
Alice and Bob now share many copies of a mixed state $\hat\rho_{AB}$.
How many Bell states can Alice and Bob distill from $\hat\rho_{AB}$?