Approaching Shannon
Ruediger Urbanke, EPFL
Summer School @ USC, August 6th, 2010
Many thanks to Dan Costello, Shrinivas Kudekar, Alon Orlitsky, and Thomas Riedel
for their help with these slides.
Storing Shannon
Networking Shannon
Completing Shannon
Compressing Shannon
Reading Shannon
Coding
Disclaimer
Technical slides do not contain references.
These are all summarized at the end of each section.
Classes of Codes
(linear) block codes
sparse graph codes
convolutional codes
polar codes
How Do We Compare?
capacity C
rate R
block length N
block error probability P_N(R, C)
complexity
How We Compare: Error Exponent
E(R, C) = lim_{N→∞} -(1/N) log P_N(R, C)    (error exponent)
How We Compare: Finite-Length Scaling
[Figure: block error probability P_N(R, ε) versus channel quality ε for increasing N; the curves steepen around the threshold ε* and collapse onto a single mother curve f(z), the scaling function, when plotted against z = N^{1/μ}(ε* - ε); μ is the scaling exponent.]
How We Compare: Finite-Length Scaling
[Figure: the same picture with the rate on the horizontal axis; P_N(R, C) versus R for increasing N, collapsing onto the mother curve f(z) when plotted against z = N^{1/μ}(C - R); μ is the scaling exponent.]
Finite-Length Scaling
lim_{N→∞, N^{1/μ}(C - R) = z} P_N(R, C) = f(z)
f(z): scaling function ("mother curve")
μ > 0: scaling exponent
Finite-Length Scaling -- References
V. Privman, Finite-size scaling theory, in Finite Size Scaling and Numerical Simulation of Statistical Systems, V. Privman, ed., World Scientific Publ., Singapore, 1990, pp. 1-98.
Complexity
exponential versus polynomial
δ = C - R: gap to capacity
linear -- but look at the prefactor
Block Codes
Error Exponent of Block Codes under MAP
Figure "borrowed" from ...
E(R, C) = lim_{N→∞} -(1/N) log P_N(R, C)    (error exponent)
near capacity the exponent vanishes quadratically in the gap C - R
Error Exponent -- References
R. Gallager, Information Theory and Reliable Communication, Wiley 1968.
A. Barg and G. D. Forney, Jr., Random codes: Minimum distances and error exponents, IEEE
Transactions on Information Theory, Sept 2002.
Scaling of Block Codes under MAP -- BEC
A "perfect" code of rate R: P_N = 0 if the erasure fraction ε is below 1 - R, and P_N = 1 if it is above 1 - R.
Distribution of the number of erasures E: E[E] = Nε, E[(E - E[E])^2] = Nε(1 - ε), so E ≈ N(Nε, Nε(1 - ε)).
P_N ≈ Q( (N(1 - R) - Nε) / sqrt(Nε(1 - ε)) ) = Q( sqrt(N)((1 - R) - ε) / sqrt(ε(1 - ε)) ) = Q( z / sqrt(ε(1 - ε)) ),
z = sqrt(N)((1 - R) - ε)
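As a sanity check of this Gaussian approximation, here is a minimal sketch that compares the exact binomial tail for a hypothetical "perfect" rate-R code on BEC(ε) with the Q-function expression above; the rate, block lengths, and channel parameters are illustrative, not from the slides.

```python
# Sketch: block error probability of a hypothetical "perfect" rate-R code on BEC(eps):
# exact binomial tail vs. the Gaussian (scaling) approximation. Illustrative only.
from math import sqrt, erfc, exp, log, lgamma

def Q(x):
    # Gaussian tail function Q(x) = P(N(0,1) > x)
    return 0.5 * erfc(x / sqrt(2.0))

def log_binom_pmf(N, e, eps):
    # log of P(E = e) for E ~ Binomial(N, eps), computed in the log domain
    return (lgamma(N + 1) - lgamma(e + 1) - lgamma(N - e + 1)
            + e * log(eps) + (N - e) * log(1 - eps))

def p_exact(N, R, eps):
    # A perfect code fails iff the number of erasures exceeds N*(1-R).
    t = int(N * (1 - R))
    return sum(exp(log_binom_pmf(N, e, eps)) for e in range(t + 1, N + 1))

def p_gauss(N, R, eps):
    z = sqrt(N) * ((1 - R) - eps)
    return Q(z / sqrt(eps * (1 - eps)))

if __name__ == "__main__":
    R = 0.5
    for N in (128, 512, 2048):
        for eps in (0.40, 0.45, 0.48):
            print(N, eps, p_exact(N, R, eps), p_gauss(N, R, eps))
```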
Scaling of Block Codes under MAP -- BEC
random linear block codes are almost perfect
00101010001
01110010010
10101010101
01000101001
01011100010
probability that a square binary random matrix of dimension n has full rank:
prod_{i=0}^{n-1} (2^n - 2^i) / 2^n = prod_{i=1}^{n} (1 - 2^{-i}) → 0.28878809508...   (n → ∞)
00101010001
01110010010
10101010101
01000101001
if we have k rows less (than columns), the probability that the matrix is not full rank decays roughly like 2^{-(k+1)}
hence for random linear block codes the transition is of constant (on an absolute scale) width
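A quick numerical check of the limit 0.28878809508...: the sketch below compares the product formula with a Monte Carlo estimate obtained by Gaussian elimination over GF(2); matrix sizes and trial counts are illustrative.

```python
# Sketch: probability that a random n x n binary matrix is invertible over GF(2).
# Compares the product formula prod_{i=1..n}(1 - 2^{-i}) with a Monte Carlo estimate.
import random

def product_formula(n):
    p = 1.0
    for i in range(1, n + 1):
        p *= 1.0 - 2.0 ** (-i)
    return p

def rank_gf2(rows, ncols):
    # Gaussian elimination over GF(2); each row is stored as an integer bit mask.
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r] >> col & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col & 1):
                rows[r] ^= rows[rank]
        rank += 1
    return rank

def monte_carlo(n, trials=2000):
    full = sum(rank_gf2([random.getrandbits(n) for _ in range(n)], n) == n
               for _ in range(trials))
    return full / trials

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, product_formula(n), monte_carlo(n))
```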
Scaling of Block Codes under MAP
log A(N, P) = N C - sqrt(N V) Q^{-1}(P) + O(log N)
N: block length; P: block error probability; A(N, P): size of the largest such code
C = E[i(x, y)],  V = Var[i(x, y)],  i(x, y) = log dp(y | x) / dp(y)
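A minimal sketch that evaluates this normal approximation for the BEC, where i(x; y) equals 1 bit with probability 1 - ε and 0 with probability ε, so that C = 1 - ε and V = ε(1 - ε); target error probability and block lengths are illustrative.

```python
# Sketch: the normal approximation (1/N) log2 A(N, P) ≈ C - sqrt(V/N) * Q^{-1}(P),
# evaluated for the BEC with C = 1 - eps and V = eps*(1 - eps) (in bits).
from math import sqrt
from statistics import NormalDist

def rate_normal_approx(N, P, eps):
    C = 1.0 - eps                          # capacity of BEC(eps), bits/channel use
    V = eps * (1.0 - eps)                  # channel dispersion, bits^2
    qinv = NormalDist().inv_cdf(1.0 - P)   # Q^{-1}(P)
    return C - sqrt(V / N) * qinv          # ≈ best achievable rate at length N

if __name__ == "__main__":
    for N in (100, 1000, 10000, 100000):
        print(N, rate_normal_approx(N, P=1e-3, eps=0.5))
```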
Finite-Length Scaling -- References
G. Landsberg, Über eine Anzahlbestimmung und eine damit zusammenhängende Reihe, J. Reine Angew. Math., vol. 111, pp. 87-88, 1893.
A. Feinstein, A new basic theorem of information theory, IRE Trans. Inform. Theory, vol. PGIT-4, pp. 2-22, 1954.
V. Strassen, Asymptotische Abschätzungen in Shannons Informationstheorie, Trans. Third Prague Conf. Information Theory, pp. 689-723, 1962.
Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of Gaussian Channels," 2009 IEEE Int. Symposium on Information Theory, Seoul, Korea, June 28-July 3, 2009.
Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of the Gilbert-Elliott Channel," 2009 IEEE Int. Symposium on Information Theory, Seoul, Korea, June 28-July 3, 2009.
For a very simple proof of previous result ask Thomas Riedel, UIUC
Convolutional Codes
Convolutional Codes
Figures "borrowed" from ...
[Figure: error exponent of convolutional codes (per constraint length); near capacity it is affine in the gap to capacity, in contrast to the quadratic behavior of block codes.]
Finite-Length Scaling of Convolutional Codes -- BEC
scaling behavior? (K: constraint length)
Convolutional Codes -- Some References
Big bang:
P. Elias, Coding for noisy channels, in IRE International Convention Record, Mar. 1955, pp. 37-46.
Algorithms and error exponents:
J. M. Wozencraft, Sequential decoding for reliable communication, Research Lab. of Electron. Tech. Rept. 325,
MIT, Cambridge, MA, USA, 1957.
R. M. Fano, A heuristic discussion of probabilistic decoding, IEEE Trans. Information Theory, vol. IT-9, pp.
64-74, Apr. 1963.
A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Inform. Theory, 13 (1967), pp. 260-269.
H. L. Yudkin, Channel state testing in information decoding, Sc.D. thesis, Dept. of Elec. Engg., M.I.T., 1964.
J. K. Omura, On the Viterbi decoding algorithm, IEEE Trans. Inform. Theory, 15 (1969), pp. 177-179.
G. D. Forney, Jr., The Viterbi algorithm, Proc. IEEE, 61 (1973), pp. 268-278.
K. S. Zigangirov, Time-invariant convolutional codes: Reliability function, in Proc. 2nd Joint Soviet-Swedish Workshop Information Theory, Gränna, Sweden, Apr. 1985.
N. Shulman and M. Feder, Improved error exponent for time-invariant and periodically time-variant convolutional codes, IEEE Trans. Inform. Theory, 46 (2000), pp. 97-103.
G. D. Forney, Jr., The Viterbi algorithm: A personal history. E-print: cond-mat/0104079, 2005.
Overview:
A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, McGraw-Hill, New York, NY,
USA, 1979.
S. Lin and D. J. Costello, Jr., Error Control Coding, Prent. Hall, Englewood Cliffs, NJ, USA, 2nd ed., 2004.
R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding, IEEE Press, Piscataway, NJ, USA,
1999.
Some Open Questions
Scaling behavior
digrams such as TH, ED, etc. In the second-order approximation, digram structure is introduced. After a
letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters
follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation,
trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two
letters.
3. THE SERIES OF APPROXIMATIONS TO ENGLISH
To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol
alphabet, the 26 letters and a space.
1. Zero-order approximation (symbols independent and equiprobable).
XFOML RXKHRJFFJUJ
QPAAMKBZAACIBZLUDREVOIGRES
LDPC ZLPWCFWKCYJ FFJEYVKCQSGHYD
HJQD.
2. First-order approximation (symbols independent but with frequencies of English text).
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA
NAH BRL.
3. Second-order approximation (digram structure as in English).
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.
4. Third-order approximation (trigram structure as in English).
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier
and better to jump at this point to word units. Here words are chosen independently but with their
appropriate frequencies.
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES
THE LINE MESSAGE HAD BE THESE.
6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT
THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that
these samples have reasonably good structure out to about twice the range that is taken into account in their
construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more
words can easily be placed in sentences without unusual or strained constructions. The particular sequence
of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then
that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.
The first two samples were constructed by the use of a book of random numbers in conjunction with
(for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5),
since digram, trigram and word frequency tables are available, but a simpler equivalent method was used.
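These approximations are easy to reproduce by sampling. Below is a minimal sketch of the first- and second-order letter approximations; the training file "training.txt" is a hypothetical placeholder for some English text (Shannon himself used published letter and digram frequency tables rather than a corpus).

```python
# Sketch: first- and second-order letter approximations in Shannon's sense.
# Frequencies are estimated from a user-supplied training text (an assumption;
# "training.txt" is a hypothetical placeholder).
import random
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 27 symbols: 26 letters and a space

def clean(text):
    return "".join(c if c in ALPHABET else " " for c in text.lower())

def first_order(text, length=100):
    # Symbols independent, drawn with the empirical letter frequencies.
    freq = Counter(clean(text))
    symbols, weights = zip(*freq.items())
    return "".join(random.choices(symbols, weights=weights, k=length))

def second_order(text, length=100):
    # Each symbol drawn according to the digram frequencies given the previous one.
    t = clean(text)
    digrams = defaultdict(Counter)
    for a, b in zip(t, t[1:]):
        digrams[a][b] += 1
    out = [random.choice(t)]
    for _ in range(length - 1):
        nxt = digrams.get(out[-1])
        if not nxt:
            out.append(random.choice(t))
            continue
        syms, wts = zip(*nxt.items())
        out.append(random.choices(syms, weights=wts)[0])
    return "".join(out)

if __name__ == "__main__":
    sample = open("training.txt").read()   # hypothetical training file
    print(first_order(sample))
    print(second_order(sample))
```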
Sparse Graph Codes
LDPC Ensemble
[Figure: Tanner graph of an LDPC ensemble -- variable nodes on the left, a random edge permutation in the middle, check nodes on the right; each check node enforces a constraint such as x1 + x4 + x8 = 0; overall Hx = 0 with H sparse.]
Asymptotic Analysis -- BEC
[Figure: density evolution for the BEC; the displayed update expression is (1 - (1 - x)^5)^3.]
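The displayed expression can be read as the density-evolution update of a regular ensemble with λ(x) = x^3 and ρ(x) = x^5 (an assumption read off the expression; the slide itself does not name the degrees). A minimal sketch that iterates x ← ε λ(1 - ρ(1 - x)) and locates the BP threshold by bisection:

```python
# Sketch: density evolution for an LDPC ensemble over the BEC and a bisection
# search for the BP threshold. The degree polynomials lambda(x)=x^3, rho(x)=x^5
# are an assumption based on the displayed expression (1-(1-x)^5)^3.
def de_converges(eps, lam=lambda x: x**3, rho=lambda x: x**5,
                 iters=2000, tol=1e-12):
    x = eps
    for _ in range(iters):
        x_new = eps * lam(1.0 - rho(1.0 - x))
        if x_new < tol:
            return True        # erasure fraction driven to zero: BP succeeds
        if abs(x_new - x) < 1e-15:
            return False       # stuck at a nonzero fixed point: BP fails
        x = x_new
    return x < tol

def bp_threshold(lo=0.0, hi=1.0, **kw):
    # Largest eps for which density evolution converges to zero.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if de_converges(mid, **kw):
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    print("BP threshold:", bp_threshold())
```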
MAP versus BP
Capacity Achieving -- BEC
Capacity Approaching -- BMS
Error Exponent of LDPC Codes -- BP
P{ |P_N(G, ε, ℓ) - E[P_N(G, ε, ℓ)]| > δ } ≤ e^{-αN}   for some α = α(δ) > 0
G: graph; ε: channel parameter; ℓ: number of iterations
If E[P_N(G, ε, ℓ)] converges to zero for large ℓ, and if the code has an error-correcting radius, then we can prove that the code has an error exponent under iterative decoding.
Simplest sufficient condition: the code has expansion at least 3/4, which is true w.h.p. if the left degree is at least 5 (less restrictive conditions are known but are more complicated); codes used in "practice" do not have error exponents.
Expansion
Take a set V of variable nodes and let C be the set of check nodes connected to V; |C| is at most of size dl |V|.
Expansion: take the smallest ratio |C| / (dl |V|) over all "small" sets V.
A (dl, dr)-regular ensemble cannot have expansion beyond (dl - 1)/dl.
Remarkably, random graphs essentially achieve this bound w.h.p.
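A brute-force illustration of this definition on a tiny random (dl, dr)-regular graph; the graph size, the "small set" bound, and the parameters are illustrative only (the actual statements concern sets of size linear in N).

```python
# Sketch: brute-force expansion of small variable-node sets in a tiny random
# (dl, dr)-regular bipartite graph built by the socket/permutation construction.
import random
from itertools import combinations

def random_regular_graph(n_var, dl, dr):
    # n_var*dl variable sockets matched at random to check sockets.
    n_chk = n_var * dl // dr
    sockets = [c for c in range(n_chk) for _ in range(dr)]
    random.shuffle(sockets)
    nbrs = [set(sockets[v * dl:(v + 1) * dl]) for v in range(n_var)]
    return nbrs

def expansion(nbrs, dl, max_set_size):
    # Smallest |C| / (dl * |V|) over all variable-node sets of size <= max_set_size.
    best = 1.0
    for k in range(1, max_set_size + 1):
        for subset in combinations(range(len(nbrs)), k):
            checks = set().union(*(nbrs[v] for v in subset))
            best = min(best, len(checks) / (dl * k))
    return best

if __name__ == "__main__":
    dl, dr = 3, 6
    nbrs = random_regular_graph(n_var=12, dl=dl, dr=dr)
    # Note: the slide's bound (dl-1)/dl concerns sets of linear size;
    # very small sets can expand better than that.
    print("expansion over sets of size <= 4:", expansion(nbrs, dl, 4))
    print("(dl-1)/dl =", (dl - 1) / dl)
```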
Finite-Length Scaling of LDPC Codes -- BEC
P_N = Q(z/α) (1 + O(N^{-1/3})),   z = sqrt(N) (ε^{BP} - β N^{-2/3} - ε)
scaling parameters computable (for this ensemble: 0.5791 and 0.6887)
λ(x) = (1/6) x + (5/6) x^3,  ρ(x) = x^5,  R = 3/7,  ε^{BP} = 0.4828
(we ignore the error floor here!)
[Figure: P_N versus ε for increasing N, together with the scaling-law prediction.]
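To make the scaling law concrete, here is a minimal sketch that evaluates P_N ≈ Q(sqrt(N)(ε^{BP} - βN^{-2/3} - ε)/α) with the numbers quoted above; treating 0.5791 as α and 0.6887 as β is an assumption, since the slide does not label the two constants explicitly.

```python
# Sketch: evaluate the refined finite-length scaling law
#   P_N ≈ Q( sqrt(N) * (eps_bp - beta*N**(-2/3) - eps) / alpha )
# using eps_bp = 0.4828 and assuming alpha = 0.5791, beta = 0.6887.
from math import sqrt, erfc

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def p_scaling(N, eps, eps_bp=0.4828, alpha=0.5791, beta=0.6887):
    z = sqrt(N) * (eps_bp - beta * N ** (-2.0 / 3.0) - eps)
    return Q(z / alpha)

if __name__ == "__main__":
    for N in (1024, 4096, 16384):
        for eps in (0.40, 0.44, 0.47):
            print(N, eps, p_scaling(N, eps))
```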
Optimization
[Figure: optimization of the degree distribution -- block error probability curves versus ε, the achieved rate/capacity, and the contribution to the error floor; a value of 40.58% is highlighted.]
Finite-Length Scaling of LDPC Codes -- BAWGNC
same form of scaling law; parameters are computable, but there is no proof
[Figures: (3, 6) ensemble over the BSC and (3, 4) ensemble over the BAWGNC.]
Gap To Threshold versus Length
lim_{N→∞, N^{1/μ}(C - R) = z} P_N(R, C) = f(z)
N^{1/μ} (C - R) = z   ⇒   N = (z/δ)^μ   (z fixes the error probability, δ = C - R is the additive gap)
with μ = 2, halving the gap requires increasing the length by a factor of 4
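In code, "halving the gap requires increasing length by 4" is just N = (z/δ)^μ with μ = 2; the constant z below (set by the target error probability) is an illustrative placeholder.

```python
# Sketch: required block length N = (z/delta)^mu; with mu = 2, halving the
# additive gap delta = C - R quadruples the required length. z is illustrative.
def required_length(delta, mu=2.0, z=2.0):
    return (z / delta) ** mu

if __name__ == "__main__":
    for delta in (0.1, 0.05, 0.025, 0.0125):
        print(delta, required_length(delta))
```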
Gap versus Complexity (per bit)
BEC/Threshold -- O(1); degrees are constant and we touch every edge at most once
BEC/Capacity -- O(log(1/δ)) for standard LDPC ensembles; degrees grow like log(1/δ) and we touch every edge once
BEC/Capacity -- O(1) for MN-type LDPC ensembles; degrees are constant and we touch every edge at most "once"
BMS/Threshold -- ???
BMS/Capacity -- ???
Sparse Graph Codes -- Some References
Big bang:
R. G. Gallager, Low-density parity-check codes, IRE Trans. Inform. Theory, 8 (1962).
C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon limit error-correcting coding and
decoding, in Proc. of ICC, Geneva, Switzerland, May 1993.
Analysis:
M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. A. Spielman, Analysis of low density codes and improved designs using irregular graphs, in Proc. of the 30th Annual ACM Symposium on Theory of Computing, 1998, pp. 249-258.
M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. A. Spielman, Efficient erasure correcting codes, IEEE Trans. Inform. Theory, 47 (2001), pp. 569-584.
T. Richardson, A. Shokrollahi, and R. Urbanke, Design of capacity-approaching irregular low-density parity-check codes, IEEE Trans. Inform. Theory, 47 (2001), pp. 619-637.
T. Richardson and R. Urbanke, The capacity of low-density parity-check codes under message-passing decoding, IEEE Trans. Inform. Theory, 47 (2001), pp. 599-618.
S.-Y. Chung, G. D. Forney, Jr., T. Richardson, and R. Urbanke, On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit, IEEE Commun. Lett., 5 (2001), pp. 58-60.
Error exponents:
D. Burshtein and G. Miller, Expander graph arguments for message-passing algorithms, IEEE Trans. Inform. Theory, 47 (2001), pp. 782-790.
O. Barak and D. Burshtein, Upper Bounds on the Error Exponents of LDPC Code Ensembles, The
IEEE International Symposium on Information Theory (ISIT-2006), Seattle, July 2006.
Sparse Graph Codes -- Some References
Finite-length scaling:
A. Montanari, Finite-size scaling of good codes, in Proc. of the Allerton Conf. on Commun., Control,
and Computing, Monticello, IL, USA, Oct. 2001.
A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke, Finite-length scaling for iteratively decoded
LDPC ensembles, in Proc. of the Allerton Conf. on Commun., Control, and Computing, Monticello, IL,
USA, Oct. 2003.
J. Ezri, A. Montanari, and R. Urbanke, Finite-length scaling for Gallager A, in 44th Allerton Conf. on
Communication, Control, and Computing, Monticello, IL, Oct. 2006.
A. Dembo and A. Montanari, Finite size scaling for the core of large random hyper-graphs. E-print:
math.PR/0702007, 2007.
J. Ezri, A. Montanari, S. Oh, and R. Urbanke, The slope scaling parameter for general channels, decoders, and ensembles, in Proc. of the IEEE International Symposium on Information Theory, 2008.
Complexity:
A. Khandekar and R. J. McEliece, On the complexity of reliable communication on the erasure
channel, in Proc. IEEE Int. Symp. Information Theory (ISIT2001), Washington, DC, Jun. 2001, p. 1.
H. D. Pfister, I. Sason, and R. Urbanke, Capacity-achieving ensembles for the binary erasure channel with bounded complexity, IEEE Trans. Inform. Theory, vol. 51, no. 7, 2005, pp. 2352-2379.
Overviews:
D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press,
2003.
T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge Univ. Press, 2008.
Some Open Questions
Simple design procedures?
Can you achieve capacity on general BMS channels?
Thresholds under LP decoding?
Scaling for general BMS channels?
Scaling under MAP?
Scaling under LP decoding?
Scaling under flipping decoding?
Scaling to capacity?
Polar Codes
Codes from Kronecker Product of G2
Reed-Muller Codes
choose rows of largest weight
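A small sketch of the two row-selection rules (largest weight for Reed-Muller versus smallest Bhattacharyya parameter on the BEC for polar codes); it assumes numpy is available, glosses over the bit-reversal indexing convention between rows and synthesized channels, and the parameters (n = 4, K = 8, ε = 0.5) are illustrative.

```python
# Sketch: build G_2^{(x) n} with G_2 = [[1,0],[1,1]] and compare two ways of
# picking K rows: by largest Hamming weight (Reed-Muller) and by smallest
# Bhattacharyya/erasure parameter on BEC(eps) (polar). Index ordering between
# rows and synthesized channels (bit reversal) is glossed over here.
import numpy as np

def kron_power(n):
    G2 = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, G2)
    return G

def bhattacharyya_bec(n, eps):
    # Erasure probabilities of the N = 2^n synthesized channels:
    # z -> 2z - z^2 (worse) and z -> z^2 (better) at each level.
    z = np.array([eps])
    for _ in range(n):
        z = np.concatenate([2 * z - z ** 2, z ** 2])
    return z

def choose_rows(n, K, eps=0.5):
    G = kron_power(n)
    rm = np.argsort(G.sum(axis=1))[-K:]                 # Reed-Muller: largest weight
    polar = np.argsort(bhattacharyya_bec(n, eps))[:K]   # polar: most reliable
    return sorted(rm.tolist()), sorted(polar.tolist())

if __name__ == "__main__":
    print(choose_rows(n=4, K=8))
```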
Definition of Channels
Polar Codes
W -- a BMS channel
Channel Polarization
[Figure: the synthesized channels polarize into good channels (capacity close to 1) and bad channels (capacity close to 0).]
Successive Decoding
[Animation: Stefan Meier, http://ipgdemos.epfl.ch/polarcodes/]
Channel Polarization
[Figure: a threshold separating good from bad channels; animation by Stefan Meier, http://ipgdemos.epfl.ch/polarcodes/]
How Do Channels Polarize?
Two uses of BEC(ε), inputs X1 and X2.
Set U1 = X1 + X2 and U2 = X2.
Decode U1 first, observing Y1 and Y2: U1 is seen through a parity-check (check-node) combination, i.e., a BEC with erasure probability 1 - (1 - ε)^2 -- much worse.
Then decode U2 with U1 known, observing Y1 and Y2: U2 = X2 is seen directly through Y2 and, since X1 = U1 + U2, also through Y1 -- a repetition code, i.e., a BEC with erasure probability ε^2 -- much better.
total capacity = (1 - ε)^2 + 1 - ε^2 = 2(1 - ε)
How Do Channels Polarize?
Starting from 1/2 and repeatedly applying the polarization map x → {x^2, 2x - x^2}:
0.5
0.25, 0.75
0.0625, 0.4375, 0.5625, 0.9375
0.0039, 0.1211, 0.1914, 0.3164, 0.6836, 0.8086, 0.8789, 0.9961
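The tree above is easy to reproduce; a minimal sketch:

```python
# Sketch: reproduce the polarization tree on the slide. Starting from 0.5, each
# value splits into x^2 and 2x - x^2 (for the BEC these are exactly the parameters
# of the two synthesized channels).
def polarize(levels, x0=0.5):
    vals = [x0]
    for _ in range(levels):
        vals = [v for x in vals for v in (x * x, 2 * x - x * x)]
    return vals

if __name__ == "__main__":
    for m in range(4):
        print(m, [round(v, 4) for v in sorted(polarize(m))])
    # level 3 gives 0.0039, 0.1211, 0.1914, 0.3164, 0.6836, 0.8086, 0.8789, 0.9961
```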
Polar Codes -- Some References
Big bang:
E. Arikan, Channel polarization: A method for constructing capacity-achieving codes for symmetric
binary-input memoryless channels, http://arxiv.org/pdf/0807.3917
Exponent:
E. Arikan and E. Telatar, On the Rate of Channel Polarization, http://arxiv.org/pdf/0807.3806
S. B. Korada, E. Sasoglu, and R. Urbanke, Polar Codes: Characterization of Exponent, Bounds, and
Constructions, http://arxiv.org/pdf/0901.0536
Source Coding:
N. Hussami, S. B. Korada, and R. Urbanke, Performance of Polar Codes for Channel and Source
Coding, http://arxiv.org/pdf/0901.2370
S. B. Korada, and R. Urbanke, Polar Codes are Optimal for Lossy Source Coding, http://arxiv.org/pdf/
0903.0307
E. Arikan, Source Polarization, http://arxiv.org/pdf/1001.3087
Non-symmetric and non-binary channels:
E. Sasoglu, E. Telatar, and E. Arikan, Polarization for arbitrary discrete memoryless channels, http://arxiv.org/pdf/0908.0302
R. Mori and T. Tanaka, Channel Polarization on q-ary Discrete Memoryless Channels by Arbitrary
Kernels, http://arxiv.org/pdf/1001.2662
R. Mori and T. Tanaka, Non-Binary Polar Codes using Reed-Solomon Codes and Algebraic Geometry
Codes, http://arxiv.org/pdf/1007.3661
MAC channel:
E. Sasoglu, E. Telatar and Edmund Yeh, Polar codes for the two-user multiple-access channel, http://
arxiv.org/pdf/1006.4255
E. Abbe and E. Telatar, Polar Codes for the m-User MAC and Matroids, http://arxiv.org/pdf/
1002.0777
Compound channel:
S. H. Hassani, S. B. Korada, and R. Urbanke, The Compound Capacity of Polar Codes, http://
arxiv.org/pdf/0907.3291
Wire-tap channel and security:
H. Mahdavifar and A. Vardy, Achieving the Secrecy Capacity of Wiretap Channels Using Polar Codes,
http://arxiv.org/pdf/1007.3568
E. Hof and S. Shamai, Secrecy-Achieving Polar-Coding for Binary-Input Memoryless Symmetric Wire-Tap Channels, http://arxiv.org/pdf/1005.2759
Mattias Andersson, Vishwambhar Rathi, Ragnar Thobaben, Joerg Kliewer, Mikael Skoglund, Nested
Polar Codes for Wiretap and Relay Channels, http://arxiv.org/pdf/1006.3573
O. O. Koyluoglu and H. El Gamal, Polar Coding for Secure Transmission and Key Agreement, http://
arxiv.org/pdf/1003.1422
Constructions:
R. Mori and T. Tanaka, Performance and Construction of Polar Codes on Symmetric Binary-Input
Memoryless Channels, http://arxiv.org/pdf/0901.2207
M. Bakshi, S. Jaggi, and M. Effros, Concatenated Polar Codes, http://arxiv.org/pdf/1001.2545
Scaling:
S. H. Hassani and R. Urbanke, On the scaling of Polar codes: I. The behavior of polarized channels,
http://arxiv.org/pdf/1001.2766
T. Tanaka and R. Mori, Refined rate of channel polarization, http://arxiv.org/pdf/1001.2067
S. H. Hassani, K. Alishahi and R. Urbanke, On the scaling of Polar Codes: II. The behavior of unpolarized channels, http://arxiv.org/pdf/1002.3187
Error Exponent of Polar Codes -- BEC
A First Guess
Z → Z^2, w.p. 1/2
Z → 1 - (1 - Z)^2 = 2Z - Z^2, w.p. 1/2

Let Y = -log2(Z)   (assume that Z is already small, hence Y is large):
Y → 2Y, w.p. 1/2
Y → Y - 1, w.p. 1/2

Let X = log2(Y):
X → X + 1, w.p. 1/2
X → X, w.p. 1/2
Error Exponent of Polar Codes -- BEC
A First Guess
X → X + 1 w.p. 1/2, X → X w.p. 1/2: a random walk on the lattice with drift
after m steps we expect X to have value roughly m/2
this means we expect Y to have value roughly 2^{m/2}
this means we expect Z to have value roughly 2^{-2^{m/2}} = 2^{-sqrt(N)}   (N = 2^m)
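A quick way to test this random-walk heuristic is to simulate the Bhattacharyya process in the log domain; the starting point (Z = 2^{-16}, matching the "Z already small" assumption on the slide) and the number of steps are illustrative.

```python
# Sketch: simulate the BEC Bhattacharyya process Z -> Z^2 or Z -> 2Z - Z^2, each
# with probability 1/2, in the log domain (y = -log2 Z), and check that
# X = log2(y) grows by roughly 1/2 per step, i.e. Z ≈ 2^{-2^{m/2}}.
import random
from math import log2
from statistics import mean

def final_X(m, y0=16.0):
    y = y0                                  # y0 = 16 means Z = 2^-16 ("already small")
    for _ in range(m):
        if random.random() < 0.5:
            y = 2.0 * y                     # Z -> Z^2
        else:
            # Z -> 2Z - Z^2, computed exactly in the log domain to avoid underflow
            y = max(y - 1.0 - log2(1.0 - 2.0 ** (-y - 1.0)), 1e-12)
    return log2(y)

if __name__ == "__main__":
    m, trials = 40, 20000
    xs = [final_X(m) for _ in range(trials)]
    print("mean X:", mean(xs), "  heuristic prediction ~", log2(16.0) + m / 2)
```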
Error Exponent of Polar Codes
lim_{m→∞} P( Z_m ≤ 2^{-2^{m/2 + sqrt(m) Q^{-1}(R/C)/2 + o(sqrt(m))}} ) = R
Finite-Length Scaling for Polar Codes (BEC)
Bhattacharyya process:
Z → Z^2, w.p. 1/2
Z → 1 - (1 - Z)^2, w.p. 1/2
Take ε = 1/2 (symmetry of the distribution; the general case follows in a similar manner).
Q_N(x) = (1/N) |{i : x ≤ E(W_N^{(i)})}|
[Figure: Q_N(x); axis marks at z = 0, 1/4, 1/2, 1; z = 2x.]
Finite-Length Scaling for Polar Codes -- BEC
2 Q_{2N}(x) = Q_N(1/2 - sqrt(1/4 - x/2)) + (1 - 2·1{x ≥ 1/8}) Q_N(min(x/2, 1/2 - x/2))
scaling assumption: Q(x) = lim_{N→∞} N^{1/μ} Q_N(x), which gives
2^{1 - 1/μ} Q(x) = Q(1/2 - sqrt(1/4 - x/2)) + (1 - 2·1{x ≥ 1/8}) Q(min(x/2, 1/2 - x/2))
solve this functional equation (numerically); this gives Q(x) (up to scaling) and μ:
μ ≈ 3.62 for the BEC (and a corresponding value for the BAWGNC)
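The scaling exponent can also be estimated directly from the exact multiset of BEC erasure probabilities: under the scaling assumption, the fraction of channels that are still unpolarized decays like N^{-1/μ}. A rough sketch, where the cut-off δ and the levels used for the fit are arbitrary choices:

```python
# Sketch: estimate the polar scaling exponent mu for the BEC by tracking the
# exact multiset of erasure probabilities and fitting the decay of the
# unpolarized fraction ~ N^{-1/mu}. Cut-off and level range are illustrative.
from math import log

def unpolarized_fraction(levels, eps=0.5, delta=1e-3):
    z = [eps]
    fracs = []
    for m in range(1, levels + 1):
        z = [v for e in z for v in (e * e, 2 * e - e * e)]
        fracs.append((2 ** m, sum(delta < v < 1 - delta for v in z) / len(z)))
    return fracs

if __name__ == "__main__":
    data = unpolarized_fraction(levels=20)
    (N1, f1), (N2, f2) = data[14], data[19]     # crude two-point fit
    mu = -log(N2 / N1) / log(f2 / f1)
    print("estimated mu ~", mu, " (functional-equation value on the slide: 3.62)")
```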
Finite-Length Scaling for Polar Codes -- BEC
Simulations versus Scaling
[Figure: log10 P_N(R, C) plotted versus N^{1/μ}(C - R) and versus C - R for N = 2^23, 2^24, 2^25, 2^26, compared with the scaling-law prediction.]
Gap To Capacity versus Length
0.43 bits/channel use, i.e., 86% of capacity
lim_{N→∞, N^{1/μ}(C - R) = z} P_N(R, C) = f(z)
N^{1/μ} (C - R) = z   ⇒   N = (z/δ)^μ   (z fixes the error probability, δ = C - R is the additive gap)
! roughly 10 billion to get 1% close to capacity
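Plugging numbers into N = (z/δ)^μ with μ ≈ 3.62 shows how quickly the required length grows; the constant z below is an illustrative placeholder (it is set by the target block error probability), so the exact "10 billion" figure depends on it.

```python
# Sketch: block length needed to operate within an additive gap delta of capacity,
# N = (z/delta)^mu, using the polar scaling exponent mu ~ 3.62. z is illustrative.
def required_length(delta, mu=3.62, z=1.0):
    return (z / delta) ** mu

if __name__ == "__main__":
    for delta in (0.07, 0.02, 0.01, 0.005):
        print(delta, f"{required_length(delta):.3e}")
```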
Gap to Capacity versus Complexity
complexity per bit = O(log N); with N = (z/δ)^μ this is O(log(1/δ))
Some Open Questions
Variation on the theme that performs better at small lengths?
Do RM codes achieve capacity?
Make scaling conjecture under successive decoding rigorous.
Scaling behavior under MAP decoding?
Find a reasonable channel where they do not work. :-)
Message
sparse graph codes -- the best codes in "practice"; some theory is still missing; the error floor region is tricky; constructing good codes is still somewhat of an art
polar codes -- nice for theory; not (yet) ready for applications, but the field is young; how do we improve the finite-length performance?
scaling behavior is the next best thing to an exact analysis; it is probably a more meaningful characterization for the practical case than the error exponent