
Technical University of Munich
Coding and Cryptography

Lecture Notes for
Channel Coding

Prof. Dr.-Ing. Antonia Wachter-Zeh
Marvin Xhemrishi, M.Sc.
Lorenz Welter, M.Sc.

Edition: Winter 2023/2024
Author: Antonia Wachter-Zeh (all rights reserved)
Contents

1 Motivation
  1.1 References
  1.2 Motivation of Channel Coding

2 Principles of Channel Coding
  2.1 Transmission Model
  2.2 Channel Models
    2.2.1 Error Channels: Binary/q-ary Symmetric Channel
    2.2.2 Erasure Channels: Binary/q-ary Erasure Channel
    2.2.3 The Additive White Gaussian Noise Channel
  2.3 Decoding Principles
    2.3.1 Maximum A-Posteriori Decoding
    2.3.2 Maximum Likelihood Decoding
    2.3.3 Symbol-by-Symbol Maximum A-Posteriori Decoding
  2.4 Decoding Principles in the Hamming Metric
    2.4.1 Hamming Weight and Hamming Distance
    2.4.2 Error Detection
    2.4.3 Erasure Correction
    2.4.4 Unique Error Correction
    2.4.5 Nearest Codeword Decoding
    2.4.6 List Decoding
  2.5 Decoding Results and Error Probability
    2.5.1 Decoding Results
    2.5.2 Error Probability

3 Finite Fields
  3.1 Basic Definitions
    3.1.1 Group
    3.1.2 Field
  3.2 Prime Fields
  3.3 Extension Fields
  3.4 Polynomials over Finite Fields
  3.5 Cyclotomic Cosets and Minimal Polynomials
  3.6 Vector Spaces
  3.7 Matrix Properties

4 Linear Block Codes
  4.1 Definition and Properties
    4.1.1 Definition and Parameters
    4.1.2 Maximum Possible Code Rate
    4.1.3 Encoding and Generator Matrix
    4.1.4 Dual Code
    4.1.5 Parity-Check Matrix and Syndrome
    4.1.6 The Hamming Code
  4.2 Standard Array (Coset) Decoding
  4.3 Bounds on the Cardinality and Minimum Distance
    4.3.1 Sphere-Packing Bound
    4.3.2 Singleton Bound
    4.3.3 Gilbert–Varshamov Bound
  4.4 Obtaining Longer/Shorter Codes from Existing Codes

5 Reed–Solomon Codes
  5.1 Definition and Properties
    5.1.1 Parity-Check Matrix and Generator Matrix
    5.1.2 Definition via Evaluation
    5.1.3 Primitive Reed–Solomon Codes
    5.1.4 Definition via Discrete Fourier Transform
  5.2 Syndrome-Based Unique Decoding
    5.2.1 Syndrome Computation
    5.2.2 The Key Equation and How to Solve it
    5.2.3 Finding the Error Locations
    5.2.4 Finding the Error Values
    5.2.5 Unique Decoding: Overview
  5.3 Interpolation-Based Unique Decoding
  5.4 List Decoding
    5.4.1 Sudan Algorithm
    5.4.2 Idea of Guruswami–Sudan Algorithm

6 Cyclic Codes
  6.1 Definition and Properties
    6.1.1 Definition
    6.1.2 Generator and Parity-Check Polynomials
    6.1.3 Generator and Parity-Check Matrix
  6.2 BCH Codes
    6.2.1 Definition
    6.2.2 The BCH Bound
    6.2.3 Special BCH Codes
    6.2.4 Decoding

7 Reed–Muller Codes
  7.1 First-Order Reed–Muller Codes
    7.1.1 Definition and Construction
    7.1.2 Unique Decoding
  7.2 Connection to Hamming and Simplex Codes
  7.3 Reed–Muller Codes of Higher Order

8 Code Concatenation
  8.1 Product Codes
  8.2 Concatenated Codes
  8.3 Generalized Concatenated Codes

9 Convolutional Codes
  9.1 Definition and Properties
    9.1.1 Convolutional Codes
    9.1.2 Impulse Response and Generator Matrix
    9.1.3 State Diagram
    9.1.4 Polynomial Representation
    9.1.5 Free Distance
  9.2 Termination, Truncation & Tailbiting
    9.2.1 Termination
    9.2.2 Truncation
    9.2.3 Tailbiting
  9.3 Trellis and Viterbi Decoding
    9.3.1 Trellis
    9.3.2 Viterbi Decoding
Chapter 1

Motivation

1.1 References
This lecture is mostly self-contained. However, for further reading, the main literature is given below, together with specific references for each chapter.
Main literature:
• R. M. Roth, Introduction to Coding Theory, Cambridge Univ. Press,
2006 [1]
• J. Justesen and T. Høholdt, A Course in Error-Correcting Codes,
European Mathematical Society, Jan. 2004. [2]
• M. Bossert, Kanalcodierung, 3rd ed. Oldenbourg, 2013 [3].
(M. Bossert, Channel Coding for Telecommunications, Wiley, 1999 [4])
Further literature:
• R. E. Blahut, Algebraic Codes for Data Transmission, 1st ed. Cambridge
Univ. Press, 2003 [5]
• R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional
Coding, Wiley, 1999 [6]
• J. H. van Lint, Introduction to Coding Theory, 3rd ed. Springer,
1998 [7]
References for each Chapter:
• Chapter 2 (CC Principles): Bossert, Ch. 1; Roth, Ch. 1
• Chapter 3 (Finite Fields): Justesen–Høholdt, Ch. 2; Roth, Ch. 3
• Chapter 4 (Linear Codes): Roth, Ch. 2
• Chapter 5 (Reed–Solomon Codes):
Definition & Key equation decoding: Roth, Ch. 5.1, 5.2, 6.1, 6.2, 6.3, 6.5;
List decoding: Justesen–Høholdt, Ch. 12
• Chapter 6 (Cyclic Codes): Roth, Ch. 8
• Chapter 7 (Reed–Muller Codes): Bossert, Ch. 5.1, 5.2
• Chapter 8 (Concatenation): Bossert, Ch. 9, Roth, Ch. 12
• Chapter 9 (Convolutional Codes): Bossert, Ch. 8, Johannesson, Ch. 1

1.2 Motivation of Channel Coding


We start these lecture notes with the important question “What is channel coding?” and then give some basic properties of good channel codes.
The basic task of channel coding is to transmit (or store) data over a noisy
channel while still being able to reconstruct the transmitted message.
Figure 1.1 shows an example of transmitting numbers and words. The
channel distorts certain symbols. In the first example, the number 3789 is
transmitted, however 3759 is received. If the sender is allowed to choose any
combination of four numbers, then this error is undetectable and uncorrectable
for the receiver as 3759 is also a valid combination.
If, on the other hand, the sender is only allowed to transmit words of the English language, errors might be detectable and even correctable due to the natural redundancy of language. For example, when we transmit “house” and receive “houke”, we are able to correct this single error. When we receive “mouse” or “horse”, however, we cannot detect the error, although, similar to “houke”, only a single error has occurred.


Numbers:  3789  → 3759   error undetectable
Words:    house → houke  error correctable
          house → mouse  error undetectable
          house → horse  error undetectable

Figure 1.1: Transmission of numbers and words

We can therefore say that channel coding deals with the task of encoding
information in a way that the information can be reconstructed in the
presence of errors. We usually assume that a message consists of symbols
of a certain alphabet which are encoded to a longer sequence of symbols
(i.e., a codeword) and transmitted over some channel which distorts some
symbols. To reconstruct (i.e., decode) the original message, the receiver has
to be able to remove distortions from the received word.
Figure 1.1 already provides some insight into the properties of “good” codes, which are discussed in the following.
The transmission of any four numbers has shown that not all words should
be valid codewords to enable error detection or correction. Thus, redundancy
is necessary. Language has some natural redundancy (e.g., “houke” is
no valid “codeword”), but we want to find ways to add redundancy in a
structured way. This will enable us to make statements on how many errors
can always be detected/corrected.
We have also seen that a single error in “house” can result in other words
(“mouse”, “horse”). As it is highly probable in practice that even in a good


channel some errors happen, a few corrupted symbols should not result in
another codeword. That means, any two codewords should have a large
distance.
Finally, one could simply transmit each message multiple times and hope
that at least one of them is received error-free (and can somehow be recognized
as such). This, however, results in a very inefficient scheme. We therefore
require that the redundancy should not be too large which implies that the
code rate (portion of information symbols in all transmitted symbols) is
sufficiently large.
The mentioned properties result in a trade-off between good error-correcting
capability (for which large redundancy is needed) versus high code rate (i.e.,
low redundancy) which is directly related to the efficiency of the scheme.
Some terminology that is used in this lecture should be clarified immediately
and is shown in the following high-level definition.

Definition 1 (Code, Codewords). An error-correcting code (short: code) is a set of codewords.
A codeword is a vector (of bits, symbols, letters, ...) of a certain length.
We will transmit a codeword and obtain a received word, which is (in an additive channel) the sum of the transmitted codeword and an error word. Based on the received word, we want to reconstruct the transmitted codeword.
A code has several parameters (length, size, dimension, distance, redundancy, rate, ...) which we will define later.



Chapter 2

Principles of Channel Coding


The following chapter lays the foundation of channel coding by analyzing the transmission model, channel models, and decoding principles.
The recommended literature for this chapter is [1, Chapter 1] and [3, Chapter 1].

2.1 Transmission Model

Figure 2.1 shows the transmission model that is used throughout this lecture.

[Figure 2.1: Transmission model — source → (u) → encoder → (c) → channel (adds e) → (r) → decoder → (ĉ) → sink → (û); the encoder belongs to the transmitter, the decoder and sink to the receiver]

The source generates information vectors u = (u0, u1, ..., uk−1) whose symbols u0, u1, ..., uk−1 are chosen from a finite alphabet A and are assumed
to be statistically independent and equi-probable. We will later assume
that the alphabet A is a finite field (e.g., the binary field F2 = {0, 1}).
Of course, in practice, we might have a source that emits for example
letters. In this lecture, it is assumed that a mapping from these symbols to
a fixed alphabet A has already taken place. It is also assumed that suitable
source coding techniques were applied such that the symbols u0 , . . . , uk−1
are equi-probable.
The encoder maps any message vector u of length k to a valid codeword vector c of length n over the same finite alphabet:

u ∈ A^k → c ∈ A^n and c ∈ C,

where C ⊆ A^n denotes the error-correcting code that is used. This mapping from u to c has to be one-to-one such that on the receiver side, we can reconstruct û given ĉ. Frequently, systematic encoding is used. In this case, the first k symbols of the codeword equal the information symbols, i.e., c = (u0, u1, ..., uk−1, ck, ..., cn−1), and the last n − k symbols are called redundancy.
The codeword c is then transmitted over a noisy channel. We consider only additive discrete memoryless channels (DMCs). On the one hand, this means that the error is added (and not, e.g., multiplied) and ri = ci + ei is received. On the other hand, this means that the transmission of any symbol ci is independent of the transmission of the other symbols:

P(r|c) = ∏_{i=0}^{n−1} P(ri|ci).

The additive DMC summarizes filtering, modulation, demodulation, sampling and quantization (D/A and A/D conversion).
The receiver then obtains a noisy version of the codeword, called the received word r = (r0, r1, ..., rn−1) = c + e. The task of the decoder is to reconstruct c given only r. Designing capable and efficient channel decoders is a major challenge in channel coding and large parts of this lecture will deal with decoding techniques. We denote the “estimated” version of c by ĉ. The goal of any decoder is that ĉ = c.
The second step on the receiver side, mapping ĉ to an information vector û (in the sink), is an “easy” task. Due to the one-to-one mapping of the encoding function, this is a simple mapping. In particular, when a systematic encoding was used, the first k symbols of ĉ give û.

2.2 Channel Models

In this section, frequently used DMCs are presented.

2.2.1 Error Channels: Binary/q-ary Symmetric Channel

The binary symmetric channel (BSC) and its generalization, the q-ary
symmetric channel (QSC), are channel models where symbol errors happen
with a certain probability.

Definition 2 (Binary Symmetric Channel (BSC)). The memoryless BSC has input and output alphabet A = {0, 1}. For every pair of symbols (r, c) ∈ {0, 1}^2, we have:

P(r|c) = 1 − p   if r = c,
         p       if r ≠ c,

where p, with 0 ≤ p ≤ 1, is called crossover probability.

The BSC therefore flips each bit with probability p, independently of the
previous and following bits (DMC). This is illustrated in Figure 2.2.

[Figure 2.2: Binary symmetric channel with crossover probability p — c = 0 → r = 0 and c = 1 → r = 1 with probability 1 − p; crossover to the other symbol with probability p]

Clearly, if p = 0, no errors happen and all received symbols equal the transmitted symbols. Similarly, if p = 1 and this fact is known to the receiver, all received symbols are flipped and can therefore be flipped back by the receiver. In both cases, the communication is therefore completely reliable.
The “worst” BSC is a BSC with p = 1/2: the output is statistically independent of the input and the output symbols are 0 and 1 with equal probability.
The QSC is a generalization of the BSC to q-ary alphabets.

Definition 3 (q-ary Symmetric Channel (QSC)). The memoryless QSC has an input and output alphabet A of size q. For every pair of symbols (r, c) ∈ A^2, we have:

P(r|c) = 1 − p       if r = c,
         p/(q − 1)   if r = c′, for every c′ ∈ A \ {c},

where p, with 0 ≤ p ≤ 1, is called crossover probability.

The QSC therefore changes each symbol with probability p into another
symbol of the alphabet A, where all “false” symbols occur with equal
probability. The QSC is illustrated in Figure 2.3.
Similar to the BSC, for p = 0 no errors happen and the communication is completely reliable. For p = (q − 1)/q, all output symbols are equally likely (they all occur with probability 1/q) and the output is statistically independent of the input.

[Figure 2.3: q-ary symmetric channel with crossover probability p — each symbol is received correctly with probability a = 1 − p and as each of the q − 1 other symbols with probability b = p/(q − 1)]

2.2.2 Erasure Channels: Binary/q-ary Erasure Channel

The binary erasure channel (BEC) and its generalization, the q-ary erasure
channel (QEC), are channels where so-called erasures happen. This means,
the receiver knows that at a certain position an error happened but does
not know the error value. This can be modeled as a channel where the size
of the output alphabet is one larger than that of the input alphabet and
contains an additional symbol that denotes an erasure (we use ⊛ in this
lecture).

Definition 4 (Binary Erasure Channel (BEC)). The memoryless BEC has input alphabet A = F2 = {0, 1} and output alphabet {0, 1, ⊛}, where ⊛ denotes an erasure. For every pair of symbols (r, c) with c ∈ {0, 1} and r ∈ {0, 1, ⊛}, we have:

P(r|c) = 1 − ϵ   if r = c,
         ϵ       if r = ⊛,
         0       if r ≠ c and r ≠ ⊛,

where ϵ, with 0 ≤ ϵ ≤ 1, is called erasure probability.

The BEC therefore erases each symbol with probability ϵ. This is illustrated
in Figure 2.4.


[Figure 2.4: Binary erasure channel with erasure probability ϵ — each input bit is received correctly with probability 1 − ϵ and erased to ⊛ with probability ϵ]

[Figure 2.5: q-ary erasure channel with erasure probability ϵ — each of the q input symbols is received correctly with probability 1 − ϵ and erased to ⊛ with probability ϵ]

The QEC is a generalization of the BEC to q-ary alphabets. The QEC is shown in Figure 2.5.
The models of BSC and BEC can also be combined into a binary symmetric error erasure channel as shown in Figure 2.6. The binary symmetric error erasure channel can of course also be generalized to q-ary alphabets.

[Figure 2.6: Binary symmetric error erasure channel — each bit is received correctly with probability 1 − p − ϵ, flipped with error probability p, and erased to ⊛ with erasure probability ϵ]
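Both channel models are easy to simulate. The following is a minimal Python sketch (not part of the original notes) of the BSC and the BEC; the string "*" stands in for the erasure symbol ⊛:

```python
import random

def bsc(codeword, p):
    # BSC (Definition 2): flip each bit independently with crossover probability p.
    return [c ^ 1 if random.random() < p else c for c in codeword]

def bec(codeword, eps):
    # BEC (Definition 4): erase each symbol independently with probability eps.
    return ["*" if random.random() < eps else c for c in codeword]

random.seed(1)
c = [0, 1, 1, 0, 1, 0, 0, 1]
print(bsc(c, p=0.1))     # a few bits may be flipped
print(bec(c, eps=0.2))   # a few symbols may be replaced by "*"
```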

2.2.3 The Additive White Gaussian Noise Channel


The additive white Gaussian noise (AWGN) channel is an additive memoryless
channel with infinite output alphabet (usually R).

Definition 5 (Additive White Gaussian Noise (AWGN) Channel). The AWGN channel is an additive memoryless channel that adds to each input symbol ci a symbol ei ∈ R. The error value ei follows a normal distribution with mean µ = 0 and constant power spectral density (uniformly distributed over all frequencies).

The definition of the AWGN channel is given for completeness only. In this lecture, only discrete channels are considered. Hence, channel input and channel output are from the same alphabet (potentially, erasures can happen). The decoder therefore has to reconstruct the transmitted codeword based on these discrete received symbols. This is called hard decision decoding.

However, if the decoder obtains an “analog” value (e.g., from the AWGN
channel), it can be used to improve the decoding performance compared to
hard decision decoding since some received analog values are more reliable
than others. This is called soft decision decoding and illustrated in the
following example and Figure 2.7.

Example 2.1 (Soft Information from AWGN Channel).
Assume the error symbols ei are distributed according to a normal distribution with mean zero, as shown in the upper part of Figure 2.7. The transmitter uses binary phase shift keying (BPSK) modulation and therefore transmits symbols xi ∈ {−1, 1}. The distribution of p(ri|xi) is shown in the lower part of Figure 2.7.
Assume two symbols are received: r1 = x1 + e1 = 0.75 and r2 = x2 + e2 = 2.5. For hard decision decoding, both symbols are mapped to 1, since P(r1|1) > P(r1|−1) and P(r2|1) > P(r2|−1).
However, we see that r2 is more reliable than r1 for making a decision on 1, as P(r2|1)/P(r2|−1) > P(r1|1)/P(r1|−1). A decoder that makes use of this additional soft information is called a soft decision decoder.


[Figure 2.7: AWGN channel — top: noise density p(ei); bottom: conditional densities p(ri|xi) around xi = −1 and xi = 1, with the received values r1 = 0.75 and r2 = 2.5 marked]

2.3 Decoding Principles

This section deals with basic decoding principles, i.e., with the question: what is the goal of the (channel) decoder (see the transmission model in Figure 2.1)? Recall that decoding is usually the hard task for the receiver, whereas mapping ĉ to û (the sink) is an easy task due to the bijective mapping. In the following, we analyze different decoding strategies of the decoder from a high-level point of view, without using a special code or explicit decoding algorithms.

2.3.1 Maximum A-Posteriori Decoding

The task of the decoder is to guess the channel input (i.e., a codeword) c = (c0, c1, ..., cn−1), given only the channel output r = (r0, r1, ..., rn−1). A maximum a-posteriori (MAP) decoder has the rule that given a received word r, it decides for the most likely codeword ĉ from a given code C:

ĉ = arg max_{c∈C} P(c|r) = arg max_{c∈C} P(r|c)P(c) / P(r),

where the last equality can be obtained by applying Bayes’ rule.


Since P(r) is fixed during one decoding process and therefore independent of the maximization over different c, it can be ignored in the maximization process and we obtain the following MAP decoding rule.

Definition 6 (Maximum A-Posteriori (MAP) Decoding). Given the received word r, a MAP decoder outputs:

ĉ = arg max_{c∈C} P(r|c)P(c).   (2.1)

Example 2.2 (MAP Decoding over BEC).
Assume that a memoryless source emits 0 and 1 with P(0) = 0.1 and P(1) = 0.9. These symbols are transmitted over a BEC (illustrated in Figure 2.4) with erasure probability ϵ = 0.2.
Consider an uncoded transmission of length n = 2, i.e., the code is equal to the set of all words: C = {(00), (01), (10), (11)}.
Recall that the BEC is memoryless, i.e., P(r|c) = ∏_{i=0}^{n−1} P(ri|ci). Since the source is memoryless, we have P(c) = ∏_{i=0}^{n−1} P(ci).
Assume that the received word is r = (1⊛). The decoder wants to find ĉ by the MAP rule, see (2.1). For c = (c0, c1), we have

P(r|c)P(c) = P(r0|c0) · P(c0) · P(r1|c1) · P(c1).

We analyze the expression P(r|c)P(c) for all c.
• c = (00): P((1⊛)|(00)) · P((00)) = P(1|0) · P(⊛|0) · P(0)^2 = 0, since P(1|0) = 0
• c = (01): P((1⊛)|(01)) · P((01)) = P(1|0) · P(⊛|1) · P(0) · P(1) = 0
• c = (10): P((1⊛)|(10)) · P((10)) = P(1|1) · P(⊛|0) · P(1) · P(0) = (1 − ϵ) · ϵ · P(1) · P(0) = 0.0144
• c = (11): P((1⊛)|(11)) · P((11)) = P(1|1) · P(⊛|1) · P(1)^2 = (1 − ϵ) · ϵ · P(1)^2 = 0.1296
Therefore, max_{c∈C} P((1⊛)|c)P(c) = 0.1296 and the MAP decoder decides for ĉ = (11).
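The brute-force evaluation of rule (2.1) in Example 2.2 can be reproduced in a few lines. The following Python sketch assumes the source statistics and the BEC of the example and uses "*" for ⊛; it mainly illustrates that the decoder must touch every codeword:

```python
from itertools import product

P_SRC = {0: 0.1, 1: 0.9}   # source statistics from Example 2.2
EPS = 0.2                  # BEC erasure probability

def p_channel(r, c):
    # BEC transition probability P(r|c); "*" denotes an erasure.
    if r == "*":
        return EPS
    return 1 - EPS if r == c else 0.0

def map_score(r, c):
    # P(r|c)P(c) for a memoryless channel and a memoryless source.
    s = 1.0
    for ri, ci in zip(r, c):
        s *= p_channel(ri, ci) * P_SRC[ci]
    return s

code = list(product([0, 1], repeat=2))   # uncoded transmission, n = 2
r = (1, "*")
print({c: map_score(r, c) for c in code})        # 0, 0, 0.0144, 0.1296
print(max(code, key=lambda c: map_score(r, c)))  # (1, 1), the MAP decision
```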

However, we see that using the MAP decision rule directly, as in Example 2.2, is usually not practically feasible, as we have to go through all possible codewords, which results in a large decoding complexity.

2.3.2 Maximum Likelihood Decoding

The maximum likelihood (ML) decoding rule is a special case of the MAP decoding rule (see (2.1)). For ML decoding, it is assumed that all codewords are equi-probable and therefore

ĉ = arg max_{c∈C} P(r|c)P(c) = arg max_{c∈C} P(r|c) · (1/|C|) = arg max_{c∈C} P(r|c).

In practice, the ML decoding strategy is frequently also applied when P(c) is unknown. The ML decoding rule is therefore defined as follows.

Definition 7 (Maximum Likelihood (ML) Decoding). Given the received word r, an ML decoder outputs:

ĉ = arg max_{c∈C} P(r|c).   (2.2)

Thus, MAP and ML (if P(c) is the same for all c ∈ C) decoders choose the most likely codeword and therefore minimize the block error probability.
In practice, the huge drawback of ML decoding is the large complexity of ML decoders, as we have to analyze all codewords. This is not feasible for practical parameters as an actual decoder. However, the ML decoding performance is frequently used as a comparison to evaluate the performance of efficient decoders.

Example 2.3 (ML Decoding over BEC).
Consider the same setting as in Example 2.2: The source is memoryless, but we do not know its statistics and therefore assume P(0) = P(1) = 0.5. The channel is a BEC (see Figure 2.4) with ϵ = 0.2. Consider an uncoded transmission of length n = 2, i.e., the code is equal to the set of all words: C = {(00), (01), (10), (11)}.
Assume the received word is r = (1⊛) and the decoder wants to find ĉ by the ML rule, see (2.2). For c = (c0, c1), we have

P(r|c) = P(r0|c0) · P(r1|c1).

We analyze the expression P(r|c) for all c.
• c = (00): P((1⊛)|(00)) = P(1|0) · P(⊛|0) = 0, since P(1|0) = 0
• c = (01): P((1⊛)|(01)) = P(1|0) · P(⊛|1) = 0
• c = (10): P((1⊛)|(10)) = P(1|1) · P(⊛|0) = (1 − ϵ) · ϵ = 0.16
• c = (11): P((1⊛)|(11)) = P(1|1) · P(⊛|1) = (1 − ϵ) · ϵ = 0.16
There is no unique ML decision and the decoder decides randomly between (10) and (11).

Example 2.3 also illustrates how knowledge about the statistics of the source
is helpful to make a (good) decision.


2.3.3 Symbol-by-Symbol Maximum A-Posteriori Decoding

Previously, the MAP decoding rule (2.1) was applied to the whole received
word at once. However, it can also be applied to each symbol separately.
This is called symbol-by-symbol MAP. To decide for the symbol ĉi , ∀i ∈
{0, . . . , n − 1}, we decide on the most likely value by summing up the
probabilities of all codewords where the i-th position has that value.

Definition 8 ((Binary) Symbol-by-Symbol MAP Decoding). Given the received word r, a symbol-by-symbol MAP decoder outputs ĉ = (ĉ0, ĉ1, ..., ĉn−1), where

ĉi = 0   if Σ_{c∈C: ci=0} P(r|c)P(c) ≥ Σ_{c∈C: ci=1} P(r|c)P(c),   (2.3)
     1   else,

for i = 0, ..., n − 1.

However, the output word ĉ is not necessarily a valid codeword.

Example 2.4 (Symbol-by-Symbol MAP over BEC).
Consider the same setting as in Example 2.2: The memoryless source emits 0 and 1 with P(0) = 0.1 and P(1) = 0.9. The channel is a BEC (see Figure 2.4) with ϵ = 0.2. Consider an uncoded transmission of length n = 2, i.e., the code is equal to the set of all words: C = {(00), (01), (10), (11)}.
Assume the received word is r = (1⊛) and the decoder wants to find ĉ by the symbol-by-symbol MAP rule (2.3). We therefore apply (2.3) to each symbol.
• Decision for ĉ0: using the values from Example 2.2,

Σ_{c∈C: c0=0} P(r|c)P(c) = P(r|(00))P((00)) + P(r|(01))P((01)) = 0,
Σ_{c∈C: c0=1} P(r|c)P(c) = P(r|(10))P((10)) + P(r|(11))P((11)) = 0.0144 + 0.1296 = 0.144.

The decoder therefore decides for ĉ0 = 1.
• Decision for ĉ1: again with the values from Example 2.2,

Σ_{c∈C: c1=0} P(r|c)P(c) = P(r|(00))P((00)) + P(r|(10))P((10)) = 0.0144,
Σ_{c∈C: c1=1} P(r|c)P(c) = P(r|(01))P((01)) + P(r|(11))P((11)) = 0.1296.

The decoder therefore decides for ĉ1 = 1.
The symbol-by-symbol MAP decoder then outputs ĉ = (11), which is the same as the MAP decision in this case.
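Rule (2.3) differs from the blockwise MAP rule only in that the products P(r|c)P(c) are summed per position and symbol value. A minimal Python sketch of Example 2.4, with the same assumed source and channel parameters as before:

```python
from itertools import product

P_SRC = {0: 0.1, 1: 0.9}
EPS = 0.2

def p_channel(r, c):
    if r == "*":
        return EPS
    return 1 - EPS if r == c else 0.0

def joint(r, c):
    # P(r|c)P(c) for the memoryless BEC and memoryless source.
    s = 1.0
    for ri, ci in zip(r, c):
        s *= p_channel(ri, ci) * P_SRC[ci]
    return s

def symbolwise_map(r, code):
    # Rule (2.3): per position, compare the summed probabilities for 0 and 1.
    c_hat = []
    for i in range(len(r)):
        p0 = sum(joint(r, c) for c in code if c[i] == 0)
        p1 = sum(joint(r, c) for c in code if c[i] == 1)
        c_hat.append(0 if p0 >= p1 else 1)
    return tuple(c_hat)

code = list(product([0, 1], repeat=2))
print(symbolwise_map((1, "*"), code))   # (1, 1), as in Example 2.4
```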

2.4 Decoding Principles in the Hamming Metric


While the previous section focused on probabilistic decoding principles, this section considers decoding principles in the Hamming metric. We deal with different ways of handling errors (detection, correction, list decoding) and see that in all these cases, if the minimum distance of the code is large enough, reconstruction is possible. First, the Hamming metric and the Hamming weight are introduced and then decoding principles are analyzed. Note that the principles presented here are in general not MAP or ML decoding principles, but they usually have feasible computational complexity.

2.4.1 Hamming Weight and Hamming Distance

Definition 9 (Hamming Weight). The Hamming weight of a vector a = (a0, a1, ..., an−1), denoted by wt(a), is the number of its non-zero entries:

wt(a) := |{i : ai ≠ 0, i = 0, ..., n − 1}|.

Definition 10 (Hamming Distance). The Hamming distance between two words a, b, denoted by d(a, b), is the number of coordinates where a and b are different:

d(a, b) := |{i : ai ≠ bi, i = 0, ..., n − 1}|.

This clearly means that the Hamming distance between two words is equal to the weight of the difference of the words: d(a, b) = wt(a − b).
Throughout this lecture, we say that a code has minimum Hamming distance d if it is a set of words where the Hamming distance between any two codewords is at least d and if there is a pair of words which are at distance d.

Example 2.5 (Hamming Distance).
Given two vectors a = (101101) with Hamming weight wt(a) = 4 and b = (110101) with Hamming weight wt(b) = 4, their Hamming distance is d(a, b) = 2.
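Both definitions translate directly into code. A minimal Python sketch, checked against Example 2.5:

```python
def wt(a):
    # Hamming weight (Definition 9): number of non-zero entries.
    return sum(ai != 0 for ai in a)

def hamming_distance(a, b):
    # Hamming distance (Definition 10): number of differing coordinates.
    return sum(ai != bi for ai, bi in zip(a, b))

a = (1, 0, 1, 1, 0, 1)
b = (1, 1, 0, 1, 0, 1)
print(wt(a), wt(b), hamming_distance(a, b))   # 4 4 2
```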

2.4.2 Error Detection

A principle that is applied frequently in practice is error detection. This means the error is not corrected and the correct sent codeword cannot be reconstructed, but the receiver knows that some error has happened. Due to this knowledge, the receiver can ask for retransmission or simply ignore the corresponding erroneous word.

[Figure 2.8: Error detection — codeword c(1) is transmitted and r is received; since all other codewords c(2), c(3) are at distance at least d, any received word within distance d − 1 of c(1) cannot be a codeword]

In Figure 2.8, codeword c(1) is transmitted and r is received. If the distance between c(1) and r is less than the minimum distance of the code, the error can always be detected, as proven in the following theorem.

Theorem 1 (Error Detection). If c ∈ C, where C is a code of minimum distance d, is transmitted and the channel adds an error e with 0 ≤ wt(e) ≤ d − 1, then the decoder can always detect from r = c + e whether an error occurred or not.

Proof. Since the minimum distance between any two codewords is at least d, the received word r = c + e cannot be a codeword if 1 ≤ wt(e) ≤ d − 1, and therefore the receiver knows that an error occurred. If wt(e) = 0, no error has occurred and r = c is a codeword.

2.4.3 Erasure Correction

When only erasures happen, up to d − 1 of them can be corrected.

Theorem 2 (Erasure Correction). If c ∈ C, where C is a code of minimum distance d, is transmitted and the channel erases at most d − 1 symbols (i.e., we know the locations), then the decoder can always correct the erasures.

Proof. Due to the minimum distance of the code, any two codewords differ in at least d symbols. Thus, if we erase d − 1 fixed positions in all codewords, any two codewords still differ in at least one symbol. Therefore, we can reconstruct c if at most d − 1 erasures have happened.


2.4.4 Unique Error Correction

If we want to guarantee error correction, the restriction on the maximum error weight is stricter than for error detection.

Theorem 3 (Unique Decoding). If c ∈ C, where C is a code of minimum distance d, is transmitted and the channel adds an error with wt(e) ≤ ⌊(d−1)/2⌋, then the decoder can always correct the errors.

[Figure 2.9: Unique error correction radius — decoding spheres of radius ⌊(d−1)/2⌋ around the codewords c(1), c(2), c(3) do not overlap]

Proof. The decoding spheres of radius ⌊(d−1)/2⌋ around all codewords do not overlap (see Figure 2.9). Therefore, for any received r = c + e with wt(e) ≤ ⌊(d−1)/2⌋, correct decoding is guaranteed by finding the center of the decoding sphere that r lies in.

Unique decoding is also called bounded minimum distance (BMD) decoding.

Example 2.6 (Unique Error Correction).
Consider a code of minimum distance d = 5. This code can uniquely correct ⌊(d−1)/2⌋ = 2 errors. Notice that this is the largest radius such that the decoding spheres do not overlap (see also the top of Figure 2.10).
Similarly, for d = 4, we can guarantee to correct ⌊(d−1)/2⌋ = ⌊1.5⌋ = 1 error uniquely. We see that compared to d = 3, we do not gain in the error decoding radius (but we can additionally correct one erasure). See also the bottom of Figure 2.10.


[Figure 2.10: Example for unique decoding — top: minimum distance d = 5 with non-overlapping decoding spheres of radius 2; bottom: d = 4 with non-overlapping spheres of radius 1]

2.4.5 Nearest Codeword Decoding

A nearest codeword decoder decides on the “closest” (i.e., within smallest Hamming distance) codeword. If more than one codeword is at the same distance, it randomly decides for one. Figure 2.11 illustrates the principle of nearest codeword decoding, where the different regions are separated by lines. These regions are sometimes called Voronoi regions or Voronoi cells.

[Figure 2.11: Nearest codeword decoding — the space is partitioned into regions around the codewords c(1), c(2), c(3); the received word r is decoded to the codeword of the region it lies in]

In any symmetric channel where fewer errors are more likely than more errors (e.g., in a BSC with p < 1/2), a nearest codeword decoder is an ML decoder. This can be explained as follows: an ML decoder decides for

ĉ = arg max_{c∈C} P(r|c) = arg max_{c∈C} ∏_{i=0}^{n−1} P(ri|ci),   (2.4)

where the second equality holds for any memoryless channel, in particular DMCs. If P(bi|ci) = pw < P(ci|ci) = pc (independent of i), where bi ∈ A \ {ci} (for any i and any bi ≠ ci), maximizing (2.4) is equivalent to minimizing d(r, c), and hence:

ĉ = arg min_{c∈C} d(r, c).

For the BSC, the statement P(bi|ci) < P(ci|ci) is equivalent to p < 1/2.
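Once a Hamming distance function is available, a nearest codeword decoder is a one-line minimization. A minimal Python sketch (note that min() breaks ties deterministically, whereas the text assumes a random choice):

```python
def hamming_distance(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def nearest_codeword(r, code):
    # Decide on a codeword with smallest Hamming distance to r.
    return min(code, key=lambda c: hamming_distance(r, c))

code = [(0, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print(nearest_codeword((1, 1, 1), code))   # (1, 1, 1)
print(nearest_codeword((0, 0, 1), code))   # tie between (0,0,0) and (1,0,1)
```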

2.4.6 List Decoding

The concept of list decoding can be seen as a generalization of unique decoding. Here, the radius τ of the decoding spheres is increased to τ > ⌊(d−1)/2⌋. Clearly, the decoding spheres are then overlapping. If a received word falls into more than one decoding sphere, a list decoder returns all possible codewords. Equivalently, and as shown in Figure 2.12, we can draw a sphere of radius τ around the received word r and return all codewords that lie in this sphere.
A list decoder of radius τ is therefore defined as follows, where

Bτ(r) = {a : d(a, r) ≤ τ}

denotes a sphere of radius τ around r.

Definition 11 (τ-List Decoder). Given a received word r, a τ-list decoder returns all codewords around r within radius τ, i.e., the returned list of codewords is:

L = Bτ(r) ∩ C.   (2.5)

The size of the output list is called list size and denoted by ℓ = |L|. The maximum list size clearly depends on τ. For example, for τ = ⌊(d−1)/2⌋, we have ℓ = 1. If τ exceeds a certain value, the list size can grow exponentially in the code length (such a decoder has a huge complexity and is practically not feasible).
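A τ-list decoder according to (2.5) simply collects all codewords in Bτ(r). A short Python sketch:

```python
def hamming_distance(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def list_decode(r, code, tau):
    # Definition 11: return the list L = B_tau(r) ∩ C.
    return [c for c in code if hamming_distance(r, c) <= tau]

code = [(0, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print(list_decode((0, 0, 1), code, tau=1))   # [(0, 0, 0), (1, 0, 1)]
```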

2.5 Decoding Results and Error Probability

2.5.1 Decoding Results

For all previously mentioned decoding principles, the decoder outputs a word ĉ for which one of the following three cases holds.
• Correct decoding: The decoder output ĉ equals the channel input.
Our goal is to maximize the probability of this event while still having
a decoding with small (or at least feasible) complexity.


[Figure 2.12: List decoding — a sphere of radius τ > ⌊(d−1)/2⌋ around the received word r may contain several codewords]

• Miscorrection: The decoder output ĉ is a valid codeword, but not the channel input. This case cannot be detected at the receiver side; it is therefore the worst outcome and is called a decoding error.
• Decoding failure: The decoder output is not a valid codeword. The receiver therefore detects that the output is erroneous and can, e.g., ask for a retransmission. This case is called a decoding failure.

2.5.2 Error Probability

The block error probability denotes the sum of the decoding failure probability and the decoding error probability, i.e.:

Pblock(c) = Pfail + Perr = Σ_{r : Dec(r) ≠ c} P(r|c),

where Dec(r) denotes the output of the decoder applied to the received word r.
Since a code of minimum distance d guarantees to decode ⌊(d−1)/2⌋ errors uniquely, for the BSC, we can give the following upper bound on the block error probability:

Pblock ≤ Σ_{i=⌊(d−1)/2⌋+1}^{n} (n choose i) · p^i · (1 − p)^{n−i}.
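This bound is straightforward to evaluate numerically. A small Python sketch; the parameters n = 7, d = 3, p = 0.01 are illustrative assumptions, not taken from the notes:

```python
from math import comb, floor

def block_error_bound(n, d, p):
    # Upper bound for a BMD decoder on the BSC: probability that more than
    # floor((d-1)/2) of the n transmitted bits are flipped.
    t = floor((d - 1) / 2)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

print(block_error_bound(n=7, d=3, p=0.01))   # ~0.00203
```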

Practically, some errors beyond ⌊(d−1)/2⌋ might be correctable (or even most of them, by using list decoding) and therefore this provides only an upper bound.
Sometimes, instead of block error probabilities, symbol (in the binary case: bit) error probabilities are considered. The symbol error probability can be counted in terms of wrong codeword symbols or wrong information symbols, i.e., the symbol error probability for the codeword is:

Psym(ci) = Σ_{r : ĉi ≠ ci} P(r|c).


The symbol error probability w.r.t. the information word is:

Psym(ui) = Σ_{r : ûi ≠ ui} P(r|c).

However, this symbol error probability Psym(ui) depends on the explicit mapping between information words and codewords, as shown in Example 2.7, while the block error probability does not. Therefore, to compare the performance of different algorithms, it is usually recommended to compare block error rates.

Example 2.7 (Block and Bit Error Probabilities).
Consider two different mappings from information vectors of length k = 2 to codewords of length n = 3:

Mapping 1:  m → c     Mapping 2:  m → c
  00 → 000              00 → 110
  10 → 101              10 → 000
  01 → 110              01 → 101
  11 → 111              11 → 111

From both mappings, we obtain the same code C = {000, 101, 110, 111}.
Assume we transmit c = (000) and receive r = (101). Thus, one block error has occurred for both mappings. To analyze the number of bit errors, we have to map the received word to an information word:
1. m = (00), m̂ = (10) ⇒ 1 bit error
2. m = (10), m̂ = (01) ⇒ 2 bit errors
Therefore, the number of bit errors depends on the explicit mapping from information to codewords.
Note: this is not a good code, as it is not able to detect or correct any single error.



Chapter 3

Finite Fields
As explained in the previous chapter, we consider vectors, matrices, and
elements over finite alphabets. In this chapter, we introduce finite fields
and their properties. It also gives a short summary of vector spaces. Notice
that these lecture notes do not provide a complete overview of finite fields,
but rather focus on properties that are needed in coding theory.
The recommended literature for this chapter is [1, Chapter 3] and [2, Chapter 2].

3.1 Basic Definitions

3.1.1 Group

First, we analyze which properties a set of elements has to fulfill to constitute a so-called group.

Definition 12 (Group). A non-empty set of elements A = {a, b, c, ...} together with an operation ∗ is called a group if the following axioms are fulfilled:
1. Closure: for any a, b ∈ A, the element a ∗ b is in A,
2. Associativity: for any a, b, c ∈ A, (a ∗ b) ∗ c = a ∗ (b ∗ c),
3. Identity: there is an identity element e ∈ A for which e ∗ a = a ∗ e = a
for all a ∈ A,
4. Inverse: for all a ∈ A, there is an inverse element a−1 ∈ A such that
a ∗ a−1 = e.

If additionally a ∗ b = b ∗ a holds, the set is called a commutative or Abelian group.

Example 3.1 (Group).


We show that A = {0, 1, 2, 3} is a group under addition modulo 4 and
therefore check if the axioms of Definition 12 are fulfilled.

1. Closure: (a + b) mod 4 ∈ {0, 1, 2, 3} ✓


e.g., 1 + 2 = 3, 2 + 3 = 5 ≡ 1 mod 4

2. Associativity: (a + b) + c = a + (b + c) mod 4 ✓
e.g.,(1 + 2) + 3 = 6 ≡ 2 mod 4
1 + (2 + 3) = 1 + 5 = 6 ≡ 2 mod 4
3. Identity: e = 0 =⇒ a + 0 = 0 + a = a ✓
4. Inverse: a−1 = −a = 4 − a mod 4 ✓
=⇒ a + a−1 = a + 4 − a = 4 ≡ 0 mod 4

Commutativity is also fulfilled and therefore A is an Abelian group.
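For a finite set, the axioms of Definition 12 can be verified by exhaustive enumeration, exactly as done by hand above. A minimal Python sketch for ({0, 1, 2, 3}, + mod 4):

```python
from itertools import product

A = [0, 1, 2, 3]
op = lambda a, b: (a + b) % 4   # addition modulo 4

closure = all(op(a, b) in A for a, b in product(A, A))
assoc = all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(A, A, A))
identity = [e for e in A if all(op(e, a) == a == op(a, e) for a in A)]
inverse = all(any(op(a, x) == identity[0] for x in A) for a in A)
commutative = all(op(a, b) == op(b, a) for a, b in product(A, A))

print(closure, assoc, identity, inverse, commutative)   # True True [0] True True
```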

Example 3.2 (N is No Group).


We show that the set of natural numbers N = {1, 2, 3, ...} does not form a
group under (usual) addition.
1. Closure ✓
2. Associativity ✓
3. ∄ identity element (0 ∉ N)
4. ∄ inverse elements (−1, −2, −3, ... ∉ N)
Since there is no identity element e such that a + e = a for any a ∈ N and
there are no inverse elements, the set N does not form a group under addition.

3.1.2 Field

We continue with the definition of a field which is defined using two operations,
addition + and multiplication ·.

Definition 13 (Field). A set F of at least two elements together with two operations + and · is called a field if the following axioms are satisfied:
1. F forms a commutative group (with identity element “0”) under the
operation +.
2. The operation · is associative and commutative. The set F∗ := F \
{0} forms a commutative group (with identity element “1”) under the
operation ·.
3. Distributivity: For all a, b, c ∈ F: (a + b) · c = (a · c + b · c).


Common examples for fields include the sets of rational numbers Q, real
numbers R, and complex numbers C.

Example 3.3 (Binary Field).


This example shows that F2 := {0, 1} modulo 2 is a field. (This field is
actually called the binary finite field and will be considered later in detail.)
Addition “+”:
• Closure and Commutativity: 0 + 1 = 1, 1+1=0
0 + 0 = 0, 1+0=1
• Associativity: check all possible combinations ✓
• Identity: e = 0 because 0 + 0 = 0, 0 + 1 = 1
• Inverse: a−1 = 2 − a mod 2
since a = 0 : a−1 = 2 − 0 ≡ 0 mod 2
a = 1 : a−1 = 2 − 1 ≡ 1 mod 2

Multiplication “·”: F∗2 = F2 \ {0} = {1}


• Associativity and Commutativity: ✓
• Identity: e = 1 since 1 · 1 = 1
• Inverse: 1−1 = 1 since 1 · 1 = 1
Distributivity: ✓ e.g. (1 + 0) · 1 = 1 · 1 + 0 · 1 = 1

Example 3.4 (No Field).


We show that A = {0, 1, 2, 3} modulo 4 is not a field.
“+” forms a group as shown in Example 3.1.
However, A is not a field due to the lack of a multiplicative inverse for the
element 2:

• Inverse: Let a = 2. Try to find a−1 : 2 · 1 = 2


2 · 2 = 4 ≡ 0 mod 4
2 · 3 = 6 ≡ 2 mod 4

As there has to be a multiplicative inverse for every non-zero element, A is not a field.

3.2 Prime Fields


If the number of elements in a field is finite, it is called a finite field.
Finite fields exist only if their number of elements is a power of a prime.
We start now with the definition of a prime field which is used for many
error-correcting codes and whose size is a prime.


Theorem 4 (Prime Field). Let p be a prime. Then, the set

Fp := {0, 1, . . . , p − 1}

together with the operations + mod p and · mod p is a (finite) field with
p elements, called prime field or Galois field.

Proof. The proof of this theorem will be done in the tutorial of the lecture.

Example 3.5 (F5 is a Field).


Consider F5 := {0, 1, 2, 3, 4} with the operations + mod 5 and · mod 5 and
show that this is a finite field.
Addition “+”: can be proven similar to e.g., Example 3.3.
Multiplication “·”:
• Identity: e = 1 =⇒ a · e = a
• Inverse: a = 1 : 1 · 1 = 1 =⇒ 1−1 = 1
a = 2 : 2 · 3 = 6 = 1 mod 5 =⇒ 2−1 = 3
a = 3 : 3 · 2 = 6 = 1 mod 5 =⇒ 3−1 = 2
a = 4 : 4 · 4 = 16 = 1 mod 5 =⇒ 4−1 = 4
Distributivity can be checked easily ✓.
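The inverse search of Example 3.5 is easy to automate. A minimal Python sketch; the second variant uses that a^(p−1) = 1 for a ≠ 0 (see Section 3.3), so a^(p−2) mod p is the inverse:

```python
p = 5

# Multiplicative inverses in F5 by exhaustive search, as in Example 3.5.
for a in range(1, p):
    inv = next(x for x in range(1, p) if (a * x) % p == 1)
    print(f"{a}^(-1) = {inv}")       # 1->1, 2->3, 3->2, 4->4

# Equivalent shortcut via exponentiation:
print([pow(a, p - 2, p) for a in range(1, p)])   # [1, 3, 2, 4]
```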

Some general properties of finite fields will be given in the following section.

3.3 Extension Fields


In the previous section, we have considered prime fields where the size of the
field has to be a prime. This restriction on the field size raises the question:
How can we define a field of size, e.g., four? We recall from Example 3.4 that {0, 1, 2, 3} modulo 4 is not a field. The answer to this question is to define extension fields, e.g., the extension field F_{2^2} of size four.
For this purpose, we first introduce some notation:
• If we write F for a finite field, it can be either a prime or an extension
field.
• For a field F, F[x] denotes the set of all polynomials with coefficients from F, i.e., all expressions a0 + a1 x + ··· + ad x^d with ai ∈ F.

• F_{p^m} denotes an extension field of the base field Fp, where Fp is a prime field for some prime p and the integer m ≥ 1 is called extension degree. When m = 1, then F_{p^m} = Fp is a prime field. All properties of extension fields that are given in this section therefore hold for the special case of prime fields as well.
We say that Fq is an extension field of Fp if Fp is a subfield of Fq, i.e., Fp ⊆ Fq, and addition and multiplication in Fq, when performed on elements of Fp, coincide with the respective operations in Fp. Such a field is a vector space over Fp and its extension degree m is the dimension of this space. Thus, Fq has p^m elements and is therefore also called F_{p^m}.
Extension fields will be defined via polynomials over the base field. Clearly, we can do standard addition and multiplication of polynomials in Fp[z]. Also, polynomial division exists, as stated in the following theorem.

Theorem 5 (Polynomial Division). Let a(x), b(x) ∈ F[x] with b(x) ≠ 0 and deg a(x) ≥ deg b(x). Then there exist unique polynomials q(x) and r(x) with deg r(x) < deg b(x) such that a(x) = q(x)b(x) + r(x).

The quotient q(x) and the remainder r(x) can be calculated by (standard)
polynomial division.
For the definition of extension fields, we need irreducible polynomials. A
non-constant polynomial f (x) ∈ F[x] is called irreducible in F if it cannot
be written as product of several polynomials of smaller non-zero degree in
F[x]. Irreducible polynomials play a similar role for polynomials as primes
for integers: prime numbers are irreducible integers and integers can be
factorized into their prime factors. Similarly, polynomials can be factorized
into a product of irreducible polynomials. In fact, this factorization is
unique up to permutation and scalar multiples.

Example 3.6 (Irreducible Polynomials).
f(x) = x^2 + 1 is irreducible over the reals R. However, over the complex numbers C = {a + jb : a, b ∈ R}, where j = √−1, it is reducible and equals f(x) = x^2 + 1 = (x + j)(x − j).
Note thereby that C is an extension field of R of extension degree 2.

Example 3.7 (Irreducible Polynomials).
f(x) = x^2 + 1 is irreducible over F3, but over F2 it equals f(x) = (x + 1)(x + 1).

Example 3.8 (Reducible Polynomials).
f(x) = x^n − 1 = x^n + 1 is reducible over F_{2^n}, since it can be written as f(x) = x^n − 1 = ∏_{i=0}^{n−1} (x − α^i) for some primitive element α ∈ F_{2^n}.

It therefore depends on the field whether a polynomial is irreducible or not. It can be shown that for any m > 0 there exists an irreducible polynomial of degree m with coefficients in Fp, for any prime p. This existence is crucial for constructing finite extension fields.
Based on the previous properties of polynomials over finite fields, we can now state the construction of an extension field.

Theorem 6 (Finite Extension Field). Let p be a prime and m ≥ 1 an integer. Let f(z) ∈ Fp[z] be an irreducible polynomial of degree m. Then, the set of all polynomials in Fp[z] of degree less than m (there are p^m such polynomials) defines the finite extension field:

F_{p^m} := { a(z) = a0 + a1 z + ··· + a_{m−1} z^{m−1} : ai ∈ Fp }

with the operations “addition +” and “multiplication ·”, where
• addition + is defined as the usual addition a(z) + b(z),
• multiplication · is defined by a(z) · b(z) mod f(z).

Clearly, the size of the extension field is |F_{p^m}| = p^m.


Before constructing an example of an extension field, let us give some
properties of finite fields. For all elements a ∈ F it holds that a|F| = a
(for the proof see e.g., [1, Proposition 3.1]). In particular in a prime field,
we have ap = a.
Every finite field can be represented by a primitive element whose powers
generate all non-zero elements of the field and is defined as follows.

Definition 14 (Primitive Element). An element α of a finite field F with q elements is called primitive if its powers α^i, i = 0, ..., q − 2, generate F* = F \ {0} and α^{q−1} = 1.

Every finite field (prime and extension fields) has at least one primitive element α (for the proof see, e.g., [1, Theorem 3.8]). Therefore, multiplying powers of the primitive element can be done by adding their exponents: α^i · α^j = α^{(i+j) mod (q−1)}.

Example 3.9 (Primitive Elements of F5).
We want to find the primitive elements of F5 by simply going through all elements of F5.
• α = 1: 1^i = 1 for all i ⇒ 1 does not generate F5* and is not primitive
• α = 2: 2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8 = 3, 2^4 = 16 = 1 ⇒ 2 generates F5* and is primitive
• α = 3: 3^0 = 1, 3^1 = 3, 3^2 = 9 = 4, 3^3 = 27 = 2 ⇒ 3 generates F5* and is primitive
• α = 4: 4^0 = 1, 4^1 = 4, 4^2 = 16 = 1, 4^3 = 64 = 4 ⇒ 4 does not generate F5* and is not primitive
Thus, α = 2, 3 are primitive elements of F5, whereas α = 1, 4 are not primitive.
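The search of Example 3.9 can be phrased via the order of an element (formally introduced in Definition 15 below): α is primitive iff its order equals q − 1. A minimal Python sketch for F5:

```python
p = 5

def order(a, p):
    # Smallest s > 0 with a^s = 1 in F_p (the order of a, Definition 15 below).
    s, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        s += 1
    return s

for a in range(1, p):
    s = order(a, p)
    print(a, s, "primitive" if s == p - 1 else "not primitive")
# 2 and 3 have order 4 = p - 1 and are primitive; 1 and 4 are not.
```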

A parameter that is frequently needed in the context of finite fields is the so-called order.

Definition 15 (Order of an Element). Let F be a finite field with q elements and let a ∈ F, a ≠ 0. The smallest positive integer s such that a^s = 1 is called the order of a and denoted by ord(a).

A primitive element of F, which is a field of size q, by definition has order ord(α) = q − 1. Further, any element β of order ord(β) = q − 1 is a primitive element, since the β^i for i = 0, ..., q − 2 are distinct and therefore generate the whole field (except 0). In general, for an element β with ord(β) = s, the β^i for i = 1, ..., s are all different, since otherwise β^i = β^j for some 0 < i < j ≤ s and β^{j−i} = 1 with 0 < j − i < s, contradicting the fact that ord(β) = s. Illustrations of the powers of an element β with ord(β) = s < q − 1 and of a primitive element α are given in Figure 3.1.

For any irreducible polynomial f(z), there exists an element α ∈ F_{p^m} such that f(α) = 0. If α is a primitive element, then f(z) is called a primitive polynomial.


Example 3.10 (Extension Field F_{2^4}).
We use the irreducible polynomial f(z) = 1 + z + z^4 to construct F_{2^4}. The elements of F_{2^4} are all polynomials in F2[z] of degree at most three (shown in the middle column of Table 3.1).
By writing the polynomial coefficients as vectors, the field can also be seen as the set of all binary vectors of length four (shown in the right column of Table 3.1).
If we calculate z^i mod f(z) for all i = 0, ..., 14, we obtain all non-zero polynomials in F2[z] of degree at most three. For example:

z^4 mod f(z) = z^4 mod (1 + z + z^4) ≡ 1 + z,
z^5 mod f(z) ≡ z + z^2,
...
z^14 mod f(z) ≡ z^2 · (1 + z)^3 ≡ 1 + z^3.

Thus, we can use z as primitive element and call it α. In this example, f(z) is therefore a primitive polynomial. Note that this is not always the case for irreducible polynomials and the extension field can be built from non-primitive, but irreducible, polynomials as well.
The corresponding α^i are listed in the left column of Table 3.1.

Exponent of α | Polynomial            | Binary vector
0             | 0                     | 0000
α^0           | 1                     | 1000
α^1           | z                     | 0100
α^2           | z^2                   | 0010
α^3           | z^3                   | 0001
α^4           | 1 + z                 | 1100
α^5           | z + z^2               | 0110
α^6           | z^2 + z^3             | 0011
α^7           | 1 + z + z^3           | 1101
α^8           | 1 + z^2               | 1010
α^9           | z + z^3               | 0101
α^10          | 1 + z + z^2           | 1110
α^11          | z + z^2 + z^3         | 0111
α^12          | 1 + z + z^2 + z^3     | 1111
α^13          | 1 + z^2 + z^3         | 1011
α^14          | 1 + z^3               | 1001

Table 3.1: Construction of the extension field F_{2^4}

Addition can be done by either adding the polynomials, e.g., z^3 + z^4 = 1 + z + z^3 = z^7, or by adding the corresponding binary vectors, i.e., in this case (0001) + (1100) = (1101).
Multiplication is done by a(z) · b(z) mod f(z), e.g., (1 + z^2) · z^2 = z^2 + z^4 ≡ 1 + z + z^2 mod f(z). Alternatively, the powers of α can be added, i.e., in this case α^8 · α^2 = α^10, which in turn corresponds to 1 + z + z^2.
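Table 3.1 can be regenerated mechanically: multiplying by α = z shifts the coefficient vector, and z^4 is reduced via z^4 = 1 + z. A minimal Python sketch (the bit-list representation, lowest coefficient first, is an implementation choice, not from the notes):

```python
def times_alpha(v):
    # Multiply the polynomial v = [a0, a1, a2, a3] by z, then reduce
    # modulo f(z) = 1 + z + z^4, i.e., replace z^4 by 1 + z.
    carry = v[3]
    v = [0, v[0], v[1], v[2]]
    if carry:
        v[0] ^= 1   # + 1
        v[1] ^= 1   # + z
    return v

v = [1, 0, 0, 0]   # alpha^0 = 1
for i in range(15):
    print(f"alpha^{i:<2} -> {''.join(map(str, v))}")
    v = times_alpha(v)
# The 15 powers run through all non-zero vectors of Table 3.1; alpha^15 = 1.
```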

The following theorem summarizes the most important facts for finite fields.

Theorem 7 (Existence of Finite Fields). The following holds for finite fields:
• For any prime p and integer m ≥ 1, there exists a finite field F_{p^m}.
• The finite field F_{p^m} has size p^m.
• Any two finite fields F and F′ of the same size are isomorphic (i.e., they are the same up to ordering/renaming of elements).

[Figure 3.1: The powers of an element β with ord(β) = s and a primitive element α — left: the cycle β, β^2, ..., β^s = 1 closes after s steps; right: the cycle α, α^2, ..., α^{q−1} = 1 runs through all non-zero field elements of Fq]

3.4 Polynomials over Finite Fields


We denote by F[x] the ring of polynomials with coefficients in a finite (prime or extension) field F. Calculations are done as “usual” by adding and multiplying the polynomials, where the coefficient-wise addition and multiplication are done in F.
In this lecture, we also sometimes use F[x]/x^b, i.e., the ring of residues of the polynomials in F[x] modulo x^b. Further, we sometimes calculate with polynomials in F[x] modulo (x^b − 1). The difference between these two modulo operations is shown in the following example.

Example 3.11 (Modulo Calculation for Polynomials).
Consider F2[x] and let b = 2. Let a(x) = a0 + a1 x + a2 x^2 + a3 x^3 = 1 + x^2 + x^3.
On the one hand, a(x) mod x^b ≡ a0 + a1 x + a2 x^2 + a3 x^3 mod x^2 = a0 + a1 x = 1. This is true since x^b ≡ 0 mod x^b.
On the other hand, a(x) mod (x^b − 1) ≡ a0 + a2 + (a1 + a3)x mod (x^2 − 1) = x. Here, x^b − 1 ≡ 0 mod (x^b − 1) and therefore x^b ≡ 1 mod (x^b − 1).
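The two reductions behave exactly as computed above: modulo x^b the high-degree coefficients are dropped, while modulo (x^b − 1) they wrap around. A minimal Python sketch of Example 3.11 with coefficient lists over F2:

```python
def mod_xb(a, b):
    # a(x) mod x^b: drop all coefficients of degree >= b, since x^b = 0.
    return a[:b]

def mod_xb_minus_1(a, b):
    # a(x) mod (x^b - 1): x^b = 1, so the coefficient of x^j folds onto x^(j mod b).
    res = [0] * b
    for j, aj in enumerate(a):
        res[j % b] ^= aj   # addition in F2 is XOR
    return res

a = [1, 0, 1, 1]   # a(x) = 1 + x^2 + x^3
print(mod_xb(a, 2))           # [1, 0]  ->  1
print(mod_xb_minus_1(a, 2))   # [0, 1]  ->  x
```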

3.5 Cyclotomic Cosets and Minimal Polynomials

For the definition of cyclic codes and in particular BCH codes, so-called
cyclotomic cosets and minimal polynomials are needed.
While this lecture only considers base fields of prime order, this is not necessary and it is possible to use base fields whose order itself is some prime power. That is, q denotes some prime power p^{m′} and we consider the extension F_{q^m} of the base field Fq. This general approach is taken in this section since it allows one to see that BCH codes (in Chapter 6) can be defined over a prime power field instead of just prime fields. But, as stated already, the reader can safely assume that the base field is prime in this lecture, and in particular q = p in this section.

Definition 16 (Cyclotomic Coset). Let n be an integer with gcd(n, q) = 1. The cyclotomic coset Ci with respect to n is defined by:

Ci := {i · q^j mod n, for all j = 0, 1, ..., ni − 1},

where ni is the smallest positive integer such that i · q^{ni} ≡ i mod n.

Let m denote the smallest positive integer such that n divides q^m − 1 (one can show that such an m exists iff gcd(n, q) = 1). Properties of cyclotomic cosets are as follows (given without proof); a small computational sketch follows the list.
• Their size is at most m: |Ci| ≤ m.
• Two cyclotomic cosets are either distinct or identical: Ci ∩ Cj = ∅ or Ci = Cj.
• C0 = {0}.
• ∪i Ci = {0, 1, ..., n − 1}.
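The sketch below (a minimal Python illustration; the parameters n = 15, q = 2 are an assumed example, matching binary BCH codes of length 15) computes all cyclotomic cosets by iterating i ↦ i · q mod n:

```python
def cyclotomic_cosets(n, q):
    # All cyclotomic cosets C_i = {i * q^j mod n} with respect to n (Definition 16).
    cosets, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        coset, x = [], i
        while x not in coset:
            coset.append(x)
            x = (x * q) % n
        cosets.append(coset)
        seen.update(coset)
    return cosets

for c in cyclotomic_cosets(15, 2):
    print(c)
# [0], [1, 2, 4, 8], [3, 6, 12, 9], [5, 10], [7, 14, 13, 11]
```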


Definition 17 (Minimal Polynomial). Let α be an element of F_{q^m} of order n and let Ci be the i-th cyclotomic coset with respect to n. Then,

mi(x) = ∏_{j∈Ci} (x − α^j)

is called the minimal polynomial of α^i.

The minimal polynomial of α^i is also the unique monic polynomial in Fq[x] of lowest degree that has α^i as a root.
The following lemma shows that although α lies in the extension field, the coefficients of the minimal polynomial lie in the base field.

Lemma 1 (Minimal Polynomial). Let α be an element of F_{q^m} of order n and let Ci be the i-th cyclotomic coset with respect to n. Let mi(x) = ∏_{j∈Ci} (x − α^j). Then, mi(x) ∈ Fq[x].

Proof. First, the binomial theorem with

(a + b)^q = Σ_{k=0}^{q} \binom{q}{k} a^{q−k} b^k,
(a − b)^q = Σ_{k=0}^{q} \binom{q}{k} a^{q−k} (−b)^k

yields

(x − α^l)^q = x^q − \binom{q}{1} α^l x^{q−1} + . . . ± α^{lq} = x^q − α^{lq},

since \binom{q}{i} ≡ 0 in F_q for all 0 < i < q (q is a power of the characteristic p), and since −α^{lq} = α^{lq} if q is even. Second, we have

{α^{jq}}_{j∈C_i} = {α^{iq}, α^{iq^2}, . . . , α^{iq^{n_i−1}}, α^{iq^{n_i}} = α^i} = {α^j}_{j∈C_i}.

Now, we use these two properties to obtain:

(m_i(x))^q = ∏_{j∈C_i} (x − α^j)^q = ∏_{j∈C_i} (x^q − α^{jq}) = ∏_{j∈C_i} (x^q − α^j) = m_i(x^q).

For the LHS and RHS of the previous equation, we get:

(m_i(x))^q = ( Σ_{j=0}^{d} m_{ij} x^j )^q = Σ_{j=0}^{d} m_{ij}^q x^{jq},
m_i(x^q) = Σ_{j=0}^{d} m_{ij} x^{jq}.

The two expressions are equal if and only if m_{ij}^q = m_{ij}, which is true if and only if m_{ij} ∈ F_q (for the “only if” compare [1, Problem 3.11]).


The previous lemma motivates why we can define (cyclic) codes over the base field F_q by using minimal polynomials.

Lemma 2. Given F_{q^m}, let n be relatively prime to q and let m be the smallest positive integer such that n divides q^m − 1. Let α be an element of order n. Then

x^n − 1 = ∏_{j=0}^{n−1} (x − α^j) = ∏_{m_i(x)} m_i(x),   (3.1)

where the first equality is Part (1), the second is Part (2), and m_i(x) ranges over all distinct minimal polynomials of the powers of α (i.e., all distinct polynomials in the set {m_i(x), 0 ≤ i < n}).

Proof. Part (2) follows from Part (1) since all distinct cyclotomic cosets are disjoint and since their union equals {0, . . . , n − 1}.
For proving Part (1), let us write out ∏_{j=0}^{n−1} (x − α^j) explicitly:

∏_{j=0}^{n−1} (x − α^j) = ∏_{j=0}^{n−1} (−α^j)
+ ( Σ_{i=0}^{n−1} ∏_{j=0, j≠i}^{n−1} (−α^j) ) · x
+ ( Σ_{i1=0}^{n−1} Σ_{i2>i1} ∏_{j=0, j≠i1,i2}^{n−1} (−α^j) ) · x^2
+ . . .
+ x^n.

For the coefficient of x^0, we rewrite the exponent by using the arithmetic series. First, consider n to be odd. Then,

∏_{j=0}^{n−1} (−α^j) = −α^{Σ_{j=0}^{n−1} j} = −α^{n(n−1)/2} = −(α^n)^{(n−1)/2} = −1,

where we use the fact that α^n = 1 and that (n−1)/2 is an integer.
Second, let n be even. Then,

∏_{j=0}^{n−1} (−α^j) = ∏_{j=0}^{n−1} α^j = α^{n(n−1)/2} = (α^n)^{n/2} · α^{−n/2} = α^{−n/2} = α^{n/2}.

We know that (α^{n/2})^2 = α^n = 1. The polynomial x^2 − 1 = 0 has only −1 and 1 as roots and thus α^{n/2} ∈ {−1, 1}. Assume α^{n/2} = 1; then ord(α) < n, which is a contradiction, and therefore α^{n/2} = −1 and the coefficient of x^0 equals −1 also in this case.

For the coefficient of x, we obtain by using the geometric series:

Σ_{i=0}^{n−1} ∏_{j=0, j≠i}^{n−1} (−α^j) = Σ_{i=0}^{n−1} α^{−i} = ((α^{−1})^n − 1)/(α^{−1} − 1) = 0,

where the first equality follows by dividing ∏_{j=0}^{n−1} (−α^j) = −1 by the missing factor −α^i, so that ∏_{j≠i} (−α^j) = α^{−i}.

For the coefficient of x^2, we obtain in the same way ∏_{j≠i1,i2} (−α^j) = −α^{−i1} α^{−i2} and hence:

Σ_{i1=0}^{n−1} Σ_{i2>i1} ∏_{j=0, j≠i1,i2}^{n−1} (−α^j) = − Σ_{i1=0}^{n−1} Σ_{i2>i1} α^{−i1} α^{−i2}
= − Σ_{i1=0}^{n−1} α^{−i1} ( Σ_{i2=0}^{n−1} α^{−i2} − Σ_{i2=0}^{i1} α^{−i2} )
= − Σ_{i1=0}^{n−1} α^{−i1} ( 0 − ((α^{−1})^{i1+1} − 1)/(α^{−1} − 1) )
= Σ_{i1=0}^{n−1} α^{−i1} (α^{−(i1+1)} − 1)/(α^{−1} − 1)
= α^{−1}/(α^{−1} − 1) · Σ_{i1=0}^{n−1} α^{−2 i1} − 1/(α^{−1} − 1) · Σ_{i1=0}^{n−1} α^{−i1}
= α^{−1}/(α^{−1} − 1) · ((α^{−2})^n − 1)/(α^{−2} − 1) − 1/(α^{−1} − 1) · ((α^{−1})^n − 1)/(α^{−1} − 1) = 0.

Similarly, it can be shown that the coefficients of x^3, . . . , x^{n−1} equal zero, and Part (1) of the statement follows.

Instead of the complete technical proof given above, we can explain Lemma 2 as follows. Over any field, a polynomial of degree n has at most n distinct roots. Since ord(α) = n, every element in the set {α^0, α^1, . . . , α^{n−1}} is a distinct root of x^n − 1 since (α^i)^n = (α^n)^i = 1, for all i. Therefore, these n elements must be all the n distinct roots of x^n − 1 and the statement follows.
Some properties of minimal polynomials are therefore as follows:
• deg m_i(x) = |C_i|,
• α ∈ F_{q^m}, but m_i(x) ∈ F_q[x] (see Lemma 1),
• m_i(x) is irreducible in F_q[x],
• m_i(x) | (x^n − 1) since x^n − 1 = ∏_{j=0}^{n−1} (x − α^j) (see Lemma 2).

Example 3.12 (Cyclotomic Cosets and Minimal Polynomials).


We calculate the cyclotomic cosets and the corresponding minimal polynomials
for q = 2 and n = 15.


The cyclotomic cosets are:

C0 = {0}
C1 = {1, 2, 4, 8, 16 ≡ 1 mod 15} = {1, 2, 4, 8}
C2 = C1 (since 2 ∈ C1 )
C3 = {3, 6, 12, 24 ≡ 9 mod 15} = {3, 6, 9, 12}
C4 = C1
C5 = {5, 10, 20 ≡ 5 mod 15} = {5, 10}
C6 = C3
C7 = {7, 14, 28 ≡ 13 mod 15, 56 ≡ 11 mod 15} = {7, 11, 13, 14}

Let α ∈ F_{2^4} be a primitive element. Then, the corresponding minimal polynomials are (using the construction of F_{2^4} from Table 3.1):

m_0(x) = (x − α^0) = x − 1
m_1(x) = (x − α)(x − α^2)(x − α^4)(x − α^8) = x^4 + x + 1
m_3(x) = (x − α^3)(x − α^6)(x − α^9)(x − α^{12}) = x^4 + x^3 + x^2 + x + 1
m_5(x) = (x − α^5)(x − α^{10}) = x^2 + x(−α^5 − α^{10}) + α^5 α^{10} = x^2 + x + 1
m_7(x) = (x − α^7)(x − α^{11})(x − α^{13})(x − α^{14}) = x^4 + x^3 + 1.

We see that all minimal polynomials lie in F2 [x].
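The products ∏_{j∈C_i} (x − α^j) can also be multiplied out programmatically. The following Python sketch implements minimal F_{2^4} arithmetic with the primitive polynomial x^4 + x + 1 (consistent with m_1(x) above; the helper functions are our own, not a library) and confirms that all coefficients land in F_2:

MOD = 0b10011   # x^4 + x + 1; field elements are 4-bit masks over F_2

def gf_mul(a, b):
    """Carry-less multiplication in F_{2^4} with reduction by MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def minimal_polynomial(coset, alpha=0b0010):
    """Multiply out prod_{j in coset} (x - alpha^j); note -1 = +1 over F_2.
    Polynomials are coefficient lists, lowest degree first."""
    poly = [1]
    for j in coset:
        root = gf_pow(alpha, j)
        new = [0] * (len(poly) + 1)
        for d, c in enumerate(poly):
            new[d + 1] ^= c             # c * x
            new[d] ^= gf_mul(c, root)   # c * root (addition in F_{2^4} is XOR)
        poly = new
    return poly

print(minimal_polynomial([1, 2, 4, 8]))   # [1, 1, 0, 0, 1]: x^4 + x + 1
print(minimal_polynomial([5, 10]))        # [1, 1, 1]:       x^2 + x + 1

Although the roots α^j lie in F_{2^4}, every printed coefficient is 0 or 1, exactly as Lemma 1 predicts.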

3.6 Vector Spaces

This section repeats basic properties of vector spaces, without giving any
claim of completeness. We thereby focus on vector spaces over finite fields,
but most properties hold for any field. A vector space is a set of vectors
that may be added and multiplied by scalars and still remain in the same
vector space.
Formally, a vector space over a field F is a set V together with two operations,
“+” and “·”, that satisfy certain axioms.
The first operation (addition) “+”: V × V → V, takes any two vectors
u, v ∈ V and outputs w = u+v ∈ V. The second operation (multiplication)
“·”: F × V → V, takes any scalar a ∈ F and any vector v ∈ V and outputs
w = av ∈ V.
To actually form a vector space, addition and multiplication have to fulfill
the following axioms (for u, v, w ∈ V and a, b ∈ F):
1. Associativity of addition: u + (v + w) = (u + v) + w,
2. Commutativity of addition: u + v = v + u,
3. Identity element of addition: ∃ an element, called 0, such that u+0 =
u, for all u ∈ V,


4. Inverse elements of addition: For all u ∈ V, there exists an element,


called −u, such that u + (−u) = 0,
5. Compatibility of scalar multiplication with field multiplication: a(bu) =
(ab)u,
6. Identity element of scalar multiplication: 1u = u,
7. Distributivity of scalar multiplication with respect to vector addition:
a(u + v) = au + av,
8. Distributivity of scalar multiplication with respect to field addition:
(a + b)u = au + bu.
For example, Euclidean vectors form a vector space. Also, extension fields
form a vector space over the base field, see, e.g., Table 3.1 where F24 forms a
vector space over F2 . Table 3.1 can be seen as a special case of the following
example where vectors of length n form a vector space over a field F.

Example 3.13 (Cartesian Product as Vector Space).


Let F be a field. Then, the n-fold Cartesian product F × F × · · · × F with the
operations:

(u0 , u1 , . . . , un−1 ) + (v0 , v1 , . . . , vn−1 ) = (u0 + v0 , . . . , un−1 + vn−1 )


a · (u0 , u1 , . . . , un−1 ) = (a · u0 , a · u1 , . . . , a · un−1 )

is a vector space Fn .

Throughout this lecture, we also need some more basic concepts of linear algebra such as linear (in)dependence. On the one hand, a vector v = (v_0, v_1, . . . , v_{n−1}) ∈ F^n is called linearly dependent on a set of vectors {u^{(1)}, u^{(2)}, . . . , u^{(ℓ)}} ⊂ F^n if there exist scalars a_i ∈ F such that

v = Σ_{i=1}^{ℓ} a_i u^{(i)}.

On the other hand, the vectors {u^{(1)}, u^{(2)}, . . . , u^{(ℓ)}} are linearly independent if

Σ_{i=1}^{ℓ} a_i u^{(i)} = 0

implies that a_i = 0 for all i = 1, . . . , ℓ.

Example 3.14 (Linear (In-)dependence).


Let two vectors be given:

u^{(1)} = (1001) ∈ F_2^4
u^{(2)} = (0011) ∈ F_2^4


Then, v = (1010) is linearly dependent on u^{(1)}, u^{(2)} as it can be written as v = u^{(1)} + u^{(2)}.
However, u^{(3)} = (0100) is linearly independent of u^{(1)} and u^{(2)} as it cannot be written as a_1 · u^{(1)} + a_2 · u^{(2)} for any a_1, a_2 ∈ F_2.

The vectors u(1) , u(2) , . . . , u(ℓ) form a basis of a vector space V if they are
linearly independent and generate V.

Definition 18 (Subspace). Let V be a vector space over F. A non-empty subset U ⊆ V is called a subspace if u, v ∈ U implies that

a · u + b · v ∈ U

for all a, b ∈ F. A subspace is then itself a vector space.

Let V be a vector space over F with a basis of ℓ vectors. Then, any set of ℓ
linearly independent vectors in V is also a basis. The integer ℓ is called the
dimension of V.

Example 3.15 (Subspace).


The vectors u(1) = (1001), u(2) = (0011) form a basis for a 2-dimensional
subspace V.
Note that u(1) and v = (1010) form another basis for the same subspace V.

Example 3.16 (Subspace).


A 3-dimensional subspace V can be defined by the following basis vectors:
b(1) = (1000)
b(2) = (0100)
b(3) = (0010)

Further, b(1) , b(2) form a basis for a 2-dimensional subspace U of V.

3.7 Matrix Properties


In this section, we give some basic terminology for matrices that is needed
in this lecture. It is by far no complete list of matrix properties.
Let A ∈ F_q^{m×n} denote an m × n matrix with coefficients in F_q. A row vector of length n is denoted by a = (a_0, a_1, . . . , a_{n−1}) ∈ F_q^n. A column vector is therefore a^T ∈ F_q^{n×1}. Recall that the standard matrix product A · B is non-commutative, i.e., in general it does not equal B · A.


The vector space that is spanned by the columns of A is called column space
and the space spanned by the rows of A is called row space.
The rank of a matrix A is the dimension of the vector space generated
(or spanned) by its columns (or rows). This corresponds to the maximum
number of linearly independent columns (or rows). It thereby does not
matter if we consider the row or the column space as the row and the
column rank are always equal. Throughout this lecture, we denote it by
rank(A).
An m × n matrix A is said to have full rank if rank(A) = min{m, n}.
A square m × m matrix A is called invertible (or non-singular) if there
exists an m × m matrix A−1 such that

A · A−1 = A−1 · A = Im ,

where Im denotes the m × m identity matrix. An invertible matrix clearly


has full rank.
A matrix is in row echelon form if all non-zero rows are above all-zero
rows and in a non-zero row its leftmost non-zero coefficient (leading term)
is always strictly to the right of the leading term of the row above. Any
matrix can be brought to row echelon form by Gaussian elimination (row
operations on the matrix).
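As an illustration of the last point, the following Python sketch (helper names our own) brings a matrix over a prime field F_p into row echelon form by Gaussian elimination and returns its rank:

def row_echelon(A, p):
    """Row echelon form and rank of a matrix over the prime field F_p."""
    A = [row[:] for row in A]       # work on a copy
    rows, cols = len(A), len(A[0])
    r = 0                           # index of the next pivot row
    for c in range(cols):
        # find a row with a non-zero entry in column c
        pivot = next((i for i in range(r, rows) if A[i][c] % p != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        inv = pow(A[r][c], p - 2, p)          # inverse via Fermat's little theorem
        A[r] = [(x * inv) % p for x in A[r]]  # normalize the leading term to 1
        for i in range(r + 1, rows):          # eliminate below the pivot
            f = A[i][c] % p
            A[i] = [(x - f * y) % p for x, y in zip(A[i], A[r])]
        r += 1
    return A, r   # r = rank

A = [[1, 1, 0, 1],
     [0, 0, 1, 1],
     [1, 1, 1, 0]]
E, rank = row_echelon(A, 2)
print(E, rank)   # rank 2: the third row is the sum of the first two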



Chapter 4

Linear Block Codes


This chapter introduces fundamental concepts of linear block codes. We
start with the definition of linear (block) codes as a subspace and then
continue with encoding and decoding. Their maximum size and their minimum
distance are analyzed via several bounds.
Note that the term block refers to the fact that each codeword is independent
of the previous and next codewords. In contrast to block codes, for convolutional
codes, memory between the different codewords is introduced (see Chapter 9).
Most of this lecture considers block codes; therefore, when we speak about
(linear) codes, we usually refer to (linear) block codes.
The recommended literature for this chapter is [1, Chapter 2].

4.1 Definition and Properties

4.1.1 Definition and Parameters

Definition 19 (Linear Block Code). A linear [n, k, d]_q block code is a k-dimensional linear subspace of the vector space F_q^n with minimum Hamming distance d.

A linear block code has the following determining parameters:

• n is the length of the code (i.e., the number of transmitted symbols),
• k its dimension (i.e., the dimension of the vector space but also the number of information symbols that are encoded),
• n − k its redundancy (i.e., the number of symbols that do not contain new information but are necessary to enable reconstruction of a corrupted codeword),
• q its alphabet size,
• M := q^k its cardinality (i.e., size),
• R := k/n the rate of the code.

Since a linear code over F_q is a k-dimensional subspace of F_q^n, the linear combination of any two codewords is again a codeword: If a, b ∈ C ⊆ F_q^n, then α · a + β · b ∈ C for any α, β ∈ F_q. Similarly, the all-zero word of length n, denoted by 0 := (0, . . . , 0), is a codeword of any linear code.

Example 4.1 (Linear Code).


The set {(0, 0), (1, 1)} is a one-dimensional subspace of F_2^2 and a linear [2, 1, 2]_2 block code of cardinality M = 2^1 = 2 and code rate R = 1/2.

In the following, we recall the definition of the minimum distance of a code


and prove that for linear codes, the minimum distance of a code is equal to
the minimum codeword weight.

Definition 20 (Minimum Distance). The minimum Hamming distance of a block code C is the minimum number of differing symbols in any two codewords:

d := min_{a,b∈C, a≠b} {d(a, b)}.

Lemma 3 (Minimum Distance of a Linear Code). For a linear block code C:

d = min_{a∈C, a≠0} {wt(a)}.

Proof. Since C is linear, if a, b ∈ C, also c := a − b ∈ C. If a ≠ b, then c ≠ 0. Therefore, the minimum Hamming distance of C is

d = min_{a,b∈C, a≠b} {wt(a − b)} = min_{c∈C, c≠0} wt(c).

When designing error-correcting codes, we obviously do not want to transmit or store too many additional redundancy symbols, i.e., our goal is a high code rate R, ideally approaching 1. However, in order to guarantee that error detection or error correction is possible when errors occur, a certain amount of redundancy has to be added to the information symbols. This results in a trade-off between error-correction capability and code rate.
The following two examples show two “extreme” code families: one code with very low code rate but largest possible distance (the repetition code) and one code with very high code rate but minimum distance only two, i.e., such that no error can be corrected (the single-parity check code).


Example 4.2 ([n, 1, n]2 Binary Repetition Code).


The repetition code simply copies the information symbols several times.
• Encoding: the message u_0 is repeated n times. That means, we encode u = (u_0) to c = (u_0, . . . , u_0).
• The code consists of M = 2 codewords.
• The Hamming distance between any two codewords is n since there are only two codewords: (1, . . . , 1) and (0, . . . , 0), which have distance n.
• The code rate is R = 1/n, which is very low and tends to zero for larger n.
• The code can correct ⌊(n−1)/2⌋ errors, can detect n − 1 errors, and can correct n − 1 erasures. This is the largest possible detecting and correcting capability.

For example, consider the [3, 1, 3]_2 binary repetition code of length 3:

C = {(000), (111)}.

The code rate is R = 1/3, the minimum distance is d = n = 3, and ⌊(d−1)/2⌋ = 1 errors are uniquely correctable.

Example 4.3 ([n, n − 1, 2]2 Binary Single-Parity Check (SPC) Code).


Single-parity check codes simply append a single parity bit that enables error detection but no error correction.
• Encoding: add one redundancy symbol to (u_0, u_1, . . . , u_{n−2}) such that every codeword has an even number of ones (i.e., even Hamming weight).
• The code consists of M = 2^{n−1} codewords.
• Its minimum distance is d = 2: Since any codeword has even weight and since it is a linear code, for a, b ∈ C, also c := a − b ∈ C has even weight. Thus, the minimum weight of any non-zero codeword is 2.
• The code rate is R = (n−1)/n (very high).
• The SPC code cannot correct any error, but can detect one error and can correct one erasure (very low capability).

For example, consider the [4, 3, 2]_2 binary SPC code in the following. Systematic encoding of all possible information words of length 3 is done by appending a 0 if the weight of the information word is even and a 1 if the weight of the information word is odd. This is shown in the following table.


u → c
000 → 0000
001 → 0011
010 → 0101
011 → 0110
100 → 1001
101 → 1010
110 → 1100
111 → 1111

We can see that all codewords have even weight, the cardinality is M = 2^3 = 8, the code rate is R = 3/4, and the minimum distance is d = 2.

4.1.2 Maximum Possible Code Rate


A natural question is: what is the maximum code rate such that we can still
transmit (almost) error-free over a certain communication channel? Claude
E. Shannon’s (see Figure 4.1) seminal publication from 1948 and his (noisy)
channel coding theorem (also called Shannon’s theorem or Shannon’s limit)
answered this question. Each channel has a certain channel capacity C
which depends on the physical properties of the channel.
As the focus of this lecture is on code constructions and decoding algorithms, we only briefly state the channel coding theorem without proof.

Figure 4.1: Claude E. Shannon

Theorem 8 (Channel Coding Theorem). Let R be the rate of the code


used to transmit messages over a channel. Let the capacity of the channel
be denoted as C.
If R < C, it is possible to transmit the messages over the channel in such
a way that the receiver can decode the messages with arbitrarily small error
probability as the block length n goes to infinity.


Conversely, if R ≥ C it is not possible to reach an arbitrarily small block


error probability.

However, the proof of the channel coding theorem is non-constructive and


the code length is not limited. It is therefore an important question how to
design codes that actually get close to the capacity of the respective channel.

4.1.3 Encoding and Generator Matrix

To encode information, we have to map information words to codewords.


Formally, an encoder is defined as follows.

Definition 21 (Encoding). An encoder for an [n, k, d]_q code C is a one-to-one mapping from all information words u = (u_0, u_1, . . . , u_{k−1}) over F_q of length k to codewords of length n from C over F_q:

enc : F_q^k → F_q^n
u = (u_0, u_1, . . . , u_{k−1}) ↦ c = (c_0, c_1, . . . , c_{n−1}).

This mapping is a bijection between F_q^k and C.


There are many encoders for the same code. That means, the set of
codewords remains the same, but different information words are mapped
to different codewords.
In practice, systematic encoding is frequently used. In order to encode
systematically, n − k redundancy symbols are appended to the vector of k
information symbols. Systematic encoding has the advantage that on the
receiver side after decoding, we just have to cut the first k symbols and
have restored the information symbols. In this lecture, we call an encoding
systematic, if the first k symbols of the codeword are the information
symbols. If the information symbols can be found at different positions
of the codeword (possibly in another order), we call it quasi-systematic
encoding. These two conventions are also illustrated in the following example.


Example 4.4 (Systematic and Quasi-Systematic Encoding).


Consider first systematic encoding of an SPC of length 3 (see Example 4.3):

u → c
00 → 000
01 → 011
10 → 101
11 → 110

The first two symbols in each codeword are the information bits and the last
bit is a parity check bit (can be used for error detection).
Second, a quasi-systematic encoding of the same code is the following:

u → c
00 → 000
01 → 011
10 → 110
11 → 101

Both encoding methods result in the same code. Their difference is that in
the second mapping, the two information bits can be found in the first and
the third position.

To obtain an efficient way of performing the encoding, a generator matrix


is used. Recall thereby that a linear block code is a k-dimensional subspace
of Fnq . This subspace can be generated by k linearly independent vectors of
length n over Fq (a basis). Writing this basis in the rows of a k × n matrix
provides the generator matrix of the encoding.

Definition 22 (Generator Matrix). A k × n matrix G over Fq is called a


generator matrix of a linear [n, k, d]q code C if its row space generates
the code C (i.e., the rows of G form a basis of C).

The encoding of the information vector u to a codeword c is then done by


calculating:
c = u · G.

A generator matrix is called systematic if it has the form (Ik | A), where
Ik denotes the k × k identity matrix. The codeword is then c = (u | u · A)
where the first k positions equal the information symbols.
Similarly, it is called quasi-systematic if all k unit vectors (i.e., the columns
of Ik ) are columns of the generator matrix. For every linear block code, there


is a quasi-systematic generator matrix which can be calculated by Gaussian


elimination (by doing elementary row operations). A code has a systematic
generator matrix if and only if the first k columns of any generator matrix
are linearly independent.
A generator matrix defines both, the code (the set of codewords) and the
encoding (the mapping between information words and codewords). However,
there are several generator matrices for one code, e.g., by choosing another
basis or just re-ordering the basis vectors. Elementary row operations on G
result in another generator matrix of C. That means, the set of codewords
remains the same but the encoding mapping is different.
The permutation of columns of G gives a generator matrix of an equivalent
code (a code with the same parameters, but different set of codewords).

Example 4.5 ([n, 1, n]q Repetition Code).


The generator matrix of a repetition code is G_RP = (1 1 . . . 1), consisting of n ones.

The encoding of u = (1) is therefore c = u · G_RP = (1) · (1 1 . . . 1) = (1 1 . . . 1); the encoding of u = (0) is c = u · G_RP = (0) · (1 1 . . . 1) = (0 0 . . . 0).

Example 4.6 ([n, n − 1, 2]2 Binary Single-Parity Check (SPC) Code).


The systematic generator matrix G_SPC of the binary SPC code is the following (n − 1) × n matrix:

G_SPC = ( I_{n−1} | 1 ) =
( 1 0 . . . 0 1 )
( 0 1 . . . 0 1 )
(      . . .    )
( 0 . . . 0 1 1 )

Consider for example the [4, 3, 2]_2 SPC code and the information word u = (010). The corresponding codeword is then:

c = (010) · ( 1 0 0 1 )
            ( 0 1 0 1 ) = (0101).
            ( 0 0 1 1 )

We can see that the first n − 1 columns of G_SPC equal I_{n−1}, which provides a systematic encoding. The last bit is the parity bit, which is simply the sum of all information bits. Therefore, the last column is the all-one column.

We can also illustrate that there are several generator matrices for one code. Let us therefore perform elementary row operations on the previously shown systematic generator matrix G_SPC of the [4, 3, 2]_2 SPC code:

G_SPC = ( 1 0 0 1 )  I          G′_SPC = ( 1 1 0 0 )  I + II
        ( 0 1 0 1 )  II    →             ( 0 1 0 1 )  II
        ( 0 0 1 1 )  III                 ( 1 1 1 1 )  I + II + III

The matrix G_SPC is the systematic generator matrix of the SPC code and the matrix G′_SPC is another generator matrix of the same code, constructed by elementary row operations on G_SPC.
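Encoding with a generator matrix is a plain vector–matrix product over F_q. A minimal Python sketch for the binary case, reproducing the encoding of u = (010) with the systematic generator matrix of the [4, 3, 2]_2 SPC code:

def encode(u, G):
    """Compute c = u * G over F_2."""
    n = len(G[0])
    return [sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

G_spc = [[1, 0, 0, 1],
         [0, 1, 0, 1],
         [0, 0, 1, 1]]   # systematic generator matrix of the [4,3,2] SPC code

print(encode([0, 1, 0], G_spc))   # [0, 1, 0, 1], as in Example 4.6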

Example 4.7 (Column Permutations).


We consider column permutations on the following generator matrix G:

G = ( 1 1 0 1 )   →   G′ = ( 1 1 1 0 )
    ( 0 0 1 1 )            ( 1 0 0 1 )

Both generator matrices define codes of length 4 and dimension 2. It becomes


clear that they actually define two different codes when looking at the sets
of all codewords.
The following table shows the mapping from information words of length 2 to codewords of length 4, where c denotes the corresponding codeword defined by G and c′ the corresponding codeword defined by G′.

u    c     c′
00   0000  0000
01   0011  1001
10   1101  1110
11   1110  0111

We see that the minimum distance of both codes is 2. Thus, permuting the columns of the generator matrix leads to different codes with the same length n, dimension k, and minimum distance d.
Notice that the code in this example is not a particularly “good” code, as the SPC code has the same length and the same minimum distance, but dimension 3.

4.1.4 Dual Code


In this section, we introduce the concept of the dual code. The dual code is
needed to define the so-called parity-check matrix which in turn is needed
for decoding.
Define the scalar product by ⟨a, b⟩ := Σ_{i=0}^{n−1} a_i b_i. Then, the dual code is defined as follows.

Definition 23 (Dual Code). Let C be a linear [n, k, d]_q code. Then, the set of vectors

C⊥ := { c⊥ ∈ F_q^n : ⟨c⊥, c⟩ = 0, ∀c ∈ C }


is called the dual code of C.

The dual code is equal to:

C⊥ = { c⊥ ∈ F_q^n : c⊥ · G^T = 0 }.   (4.1)

Lemma 4 (Parameters of the Dual Code). The dual code C⊥ of an [n, k, d]_q code C is a linear [n, k⊥ = n − k, d⊥]_q code.

Proof. The length is trivial. The dimension follows from (4.1): as the rank of G is k, the dimension of the right kernel (and hence the dimension of the dual code) is n − k. The distance d⊥ does not directly follow from d.

However, d⊥ is not necessarily determined by the parameters of the [n, k, d]q


code. For some classes of codes, there is a direct connection: For example,
if C is an RS(n, k) code with d = n − k + 1, then C ⊥ is an RS(n, n − k)
code with d⊥ = n⊥ − k ⊥ + 1 = k + 1 (see also Chapter 5).

Example 4.8 (Dual Code).


Consider the code C that is generated by

G = ( 1 1 0 1 )
    ( 0 0 1 1 ).

We obtain C = {(0000), (0011), (1101), (1110)} which is a [4, 2, 2]2 code.


Its dual code is C ⊥ = {(0000), (1100), (1011), (0111)}. This dual code C ⊥ is
also a linear [4, 2, 2]2 code where k ⊥ = n − k = 2.
In this example, d⊥ = 2 = d which is not true in general.

4.1.5 Parity-Check Matrix and Syndrome

The parity-check matrix is based on the dual code and needed in the decoding
process.

Definition 24 (Parity–Check Matrix). An (n − k) × n matrix H over Fq


is called a parity-check matrix of an [n, k, d]q code C if it is a generator
matrix of the [n, n − k, d⊥ ]q dual code C ⊥ .

Let us now analyze some properties of the parity-check matrix.


Since c⊥ ∈ C⊥, we have that ⟨c, c⊥⟩ = 0, and since the rows of the parity-check matrix H form a basis of C⊥ (and therefore all rows of H are codewords of C⊥), we obtain for any c ∈ C:

c · H^T = 0,

and also G · H^T = 0.


A parity-check matrix is therefore a matrix whose right kernel (i.e., all vectors which give zero when multiplied with all rows) is the code C. We can also say that H is a parity-check matrix of the code C that is generated by G if and only if the rank of H is n − k and G · H^T = 0.
If G = (I_k | A) is a systematic generator matrix of a code C, then H = (−A^T | I_{n−k}) is a parity-check matrix of C. We can verify this by multiplying these two matrices:

G · H^T = (I_k | A) · (−A^T | I_{n−k})^T
        = I_k · (−A) + A · I_{n−k}
        = −A + A = 0.

The following lemma provides a crucial property on the linear (in)dependence


of the columns of H.

Lemma 5 (Linear (In)dependence of Columns of H). Any d − 1 columns of H are linearly independent and there are d columns which are linearly dependent.

Proof. The proof of the first part can be done by contradiction: Assume that there exist δ ≤ d − 1 linearly dependent columns of H. Then there is a non-zero word c with exactly δ non-zero positions, located at these δ columns, such that c · H^T = 0.
As the product is 0, the word c is a codeword. This is a contradiction since wt(c) = δ < d, where d is the minimum distance of the code. Thus, the assumption is wrong and any d − 1 columns are linearly independent.
Further, there exist d linearly dependent columns since there is a codeword of weight d, as the minimum distance of the code is d (and not more).


Based on the parity-check matrix, we can now define the syndrome.

Definition 25 (Syndrome). For a vector a ∈ F_q^n and a parity-check matrix H of a code C, the vector s := a · H^T ∈ F_q^{n−k} is called the syndrome of a.

Clearly, the syndrome s equals 0 if and only if a ∈ C.

Example 4.9 (Parity-Check Matrices and Dual Code).


The following (n − 1) × n matrix is a parity-check matrix of the repetition code:

H_RP = ( I_{n−1} | 1 ) =
( 1 0 . . . 0 1 )
( 0 1 . . . 0 1 )
(      . . .    )
( 0 . . . 0 1 1 )

The parity check equation in the i-th row checks whether the values at the
i-th and the last position of the codeword c are equal. Intuitively, if we look
at the scalar product of the i-th row of HRP and a codeword, we see that the
i-th code symbol is added to the last code symbol. Therefore, by HRP , we
check if all symbols are equal to the last symbol (and hence, all are equal).
If this is true, the syndrome is all-zero.
Note that H_RP = G_SPC and therefore repetition and SPC codes are dual codes. Hence,

H_SPC = (1 1 . . . 1) = G_RP.

For the parameters of the dual code, we have k_SPC = n − k_RP but d_SPC ≠ d_RP.

Example 4.10 (Syndrome).


Consider the binary [5, 1, 5]2 repetition code.
Let c = (11111); then its syndrome is s = c · H_RP^T = 0.
On the other hand, let a = (10011); then its syndrome is

s = (10011) · ( 1 0 0 0 )
              ( 0 1 0 0 )
              ( 0 0 1 0 )
              ( 0 0 0 1 )
              ( 1 1 1 1 ) = (0110),

a vector of length n − k = 5 − 1 = 4.

The syndrome shows that an error occurred and can be used for error detection.
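Syndrome computation is again a plain vector–matrix product. The following Python sketch reproduces Example 4.10 for the [5, 1, 5]_2 repetition code:

def syndrome(r, H):
    """Compute s = r * H^T over F_2."""
    return [sum(ri * hi for ri, hi in zip(r, row)) % 2 for row in H]

H_rp = [[1, 0, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1]]   # parity-check matrix of the [5,1,5] repetition code

print(syndrome([1, 1, 1, 1, 1], H_rp))  # [0, 0, 0, 0]: a codeword
print(syndrome([1, 0, 0, 1, 1], H_rp))  # [0, 1, 1, 0]: an error is detected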


4.1.6 The Hamming Code

The code class defined in this section, the Hamming code, is the most famous
single-error-correcting code and defined as follows.

Definition 26 (Hamming Code). The binary Hamming code H(m) of


order m is the code defined by a parity-check matrix which has all non-zero
binary vectors of length m as columns.

Lemma 6 (Parameters of Hamming Code). The Hamming code H(m) is a [2^m − 1, 2^m − 1 − m, 3]_2 code.

Proof. There are 2^m − 1 non-zero vectors of length m, which are used as the columns of the parity-check matrix. The length of the code is the number of columns of the parity-check matrix, i.e., n = 2^m − 1.
The parity-check matrix has n − k = m rows, i.e., the dimension of H(m) is 2^m − 1 − m.
Any two columns of a parity-check matrix with pairwise different non-zero columns are linearly independent, and there are three columns which are linearly dependent. Thus, the minimum distance is d = 3.

Example 4.11 (Hamming Code of Order 3).


Consider H(3), i.e., the [7, 4, 3]2 Hamming code.
By writing all non-zero vectors of length 3 as columns, we obtain for example the following (systematic) parity-check matrix of H(3):

H = ( 1 0 0 0 1 1 1 )
    ( 0 1 0 1 0 1 1 )
    ( 0 0 1 1 1 0 1 ).

Observe that any two arbitrary columns are linearly independent and there exist three linearly dependent columns, e.g.:

(1, 0, 0)^T + (0, 0, 1)^T = (1, 0, 1)^T.

4.2 Standard Array (Coset) Decoding


In this section, we consider an ML decoding technique that works for
any linear block code. However, the complexity of this approach grows
exponentially in the length of the code and is therefore practically not


feasible for most code parameters. The decoding technique is based on


the so-called standard array. We follow the description of [1] here.

Definition 27 (Standard Array). Let C be an [n, k, d]_q code. A standard array for C is a q^{n−k} × q^k array in F_q^n (i.e., its entries are vectors) such that:
• Its first row contains all codewords of C as entries, starting with the all-zero codeword.
• Each subsequent row starts with a word e ∈ F_q^n of smallest Hamming weight which has not appeared in previous rows, followed by the words e + c, ∀c ∈ C, where c appears in the order of the first row.

The standard array is not unique as the order of the non-zero codewords in
the first row can be different. Also, when constructing the other rows, there
might be more than one word e of smallest Hamming weight that has not
appeared before and we can randomly choose one.
The rows of the standard array are called cosets. Two words a and b are
in the same coset if and only if a − b ∈ C. This in turn is equivalent to the
case that they have the same syndrome.
The first word in each row is a minimum weight word in its coset and is
called coset leader.
The actual decoding process is rather easy to explain. Given a received
word r, the task of the decoder is to find an error word ê such that r−ê ∈ C,
i.e., it is a codeword. Thus, ê must be in the same coset (i.e., row of the
standard array) as r. Since ML/nearest codeword decoding means that ê
should have minimum weight, we decide for ê being the coset leader.

Example 4.12 (Standard Array Decoding).


Let C be a [5, 2, 3]_2 code with

G = ( 1 0 1 1 0 )
    ( 0 1 0 1 1 ).

Construction of the Standard Array:

The 2^3 × 2^2 standard array is given in the following table.


00000 10110 01011 11101   ⇐ C
00001 10111 01010 11100   ⇐ C + (00001)
00010 10100 01001 11111   ⇐ C + (00010)
00100 10010 01111 11001   ⇐ C + (00100)
01000 11110 00011 10101   ⇐ C + (01000)
10000 00110 11011 01101   ⇐ C + (10000)
00101 10011 01110 11000   ⇐ C + (00101)
10001 00111 11010 01100   ⇐ C + (10001)

As mentioned before, the standard array is not unique. We can, e.g., permute
the rows where the first entry has weight one (rows 2 until 6). We may also
permute the three rightmost columns. It is also possible to use (01100) as
coset leader in the last row.
Decoding:
Given r, find the row (coset) in the standard array which contains r, and let
the decoded error word ê be the coset leader of this row.
Assume r = (01111). This vector is in the fourth row and the third column
(underlined in the above table). The coset leader of this row is (00100). The
decoding therefore decodes for ê = (00100) and outputs ĉ = r − ê = (01011).
This codeword is in the same column as r.
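Standard array decoding can be implemented compactly via syndromes: two words lie in the same coset if and only if they have the same syndrome, so it suffices to store one minimum-weight coset leader per syndrome. The following Python sketch (the parity-check matrix H is derived from G = (I_2 | A) as H = (A^T | I_3); all names are our own) reproduces the decoding of r = (01111):

from itertools import product

G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
H = [[1, 0, 1, 0, 0],   # parity-check matrix satisfying G * H^T = 0
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]

def syn(v):
    return tuple(sum(a * b for a, b in zip(v, row)) % 2 for row in H)

# enumerate words by increasing weight so the first word stored per
# syndrome is a minimum-weight coset leader
leaders = {}
for e in sorted(product([0, 1], repeat=5), key=sum):
    leaders.setdefault(syn(e), e)

r = (0, 1, 1, 1, 1)
e = leaders[syn(r)]                                  # coset leader of r's coset
c = tuple((ri - ei) % 2 for ri, ei in zip(r, e))     # decoded codeword
print(e, c)   # e = (0, 0, 1, 0, 0), c = (0, 1, 0, 1, 1), as in Example 4.12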

4.3 Bounds on the Cardinality and Minimum


Distance
In order to understand the limits of code constructions, we derive several
bounds on the maximum size and/or largest possible minimum distance in
this section. We thus consider the following questions:
• Given n and d, what is the maximum possible cardinality?
• What is the maximum dimension (for a given length and error-correcting
capability)?
• Given n and k, what is the maximum d and the maximum unique
decoding radius?
• What do n, k, and d have to fulfill such that we can always construct
a code with these parameters?
The first two bounds that we consider, the sphere-packing bound and the
Singleton bound, establish necessary conditions on the parameters of codes.
The third bound, the Gilbert–Varshamov bound, proves an existence result,
i.e., that there exists a code with certain parameters.

4.3.1 Sphere-Packing Bound

The sphere-packing bound considers the size of a code by packing spheres around each codeword. A sphere B_t(a) of radius t around a word a ∈ F_q^n is the set of all words at Hamming distance at most t, i.e., B_t(a) := {b ∈ F_q^n : d(a, b) ≤ t}.

Theorem 9 (Binary Sphere-Packing Bound). For any binary [n, k, d]_2 code:

2^k · Σ_{i=0}^{⌊(d−1)/2⌋} \binom{n}{i} ≤ 2^n.   (4.2)

Proof. There are 2^k codewords. Around each codeword, there is a decoding sphere. The decoding spheres of radius ⌊(d−1)/2⌋ around the codewords must not overlap.
There are \binom{n}{i} words at distance exactly i from a fixed word, and |B_t(a)| = Σ_{i=0}^{⌊(d−1)/2⌋} \binom{n}{i} for t = ⌊(d−1)/2⌋ (independent of a).
Thus, the total number of words in all decoding spheres (left-hand side of (4.2)) is at most the size of the whole space (right-hand side of (4.2)).

Codes which attain (fulfill it with equality) the sphere-packing bound are
called perfect codes.
We can list all linear perfect codes.

1. The set of all words, i.e., F_q^n (this is an [n, n, 1]_q code).
Since ⌊(d−1)/2⌋ = 0, it can easily be checked that this is a perfect code: The LHS of (4.2) is 2^n · \binom{n}{0} = 2^n and therefore equals the RHS.

2. The binary [n, 1, n]_2 repetition code for odd n.
For odd length n, ⌊(d−1)/2⌋ = ⌊(n−1)/2⌋ = (n−1)/2.
The LHS of (4.2) therefore equals

2^1 · Σ_{i=0}^{(n−1)/2} \binom{n}{i} =(∗) 2 · (1/2) · Σ_{i=0}^{n} \binom{n}{i} = 2^n,

where (∗) holds because of the symmetry of the binomial coefficients, i.e., \binom{N}{i} = \binom{N}{N−i}.

3. The q-ary Hamming code H(m).
As we have derived the sphere-packing bound only for binary codes, consider the [2^m − 1, 2^m − 1 − m, 3]_2 Hamming code. Here, we obtain

2^k · Σ_{i=0}^{1} \binom{n}{i} = 2^k (1 + n) = 2^{2^m − 1 − m} · (1 + 2^m − 1) = 2^{2^m − 1} = 2^n.

The equality for q-ary Hamming codes can be shown in the same way.


4. The [23, 12, 7]2 Golay code and the [11, 6, 5]3 Golay code.

These cyclic codes will be considered later in Chapter 6.

It can be shown that there are no other linear perfect codes, cf. [1, Page 96].
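The sphere-packing bound is easy to evaluate numerically. The following Python sketch checks the bound, and the perfect-code condition, for the codes discussed above:

from math import comb

def sphere_packing(n, k, d):
    """Return (LHS, RHS) of the binary sphere-packing bound (4.2)."""
    t = (d - 1) // 2
    lhs = 2**k * sum(comb(n, i) for i in range(t + 1))
    return lhs, 2**n

# Hamming codes, the binary Golay code, an odd-length repetition code,
# and (not perfect) an SPC code:
for (n, k, d) in [(7, 4, 3), (15, 11, 3), (23, 12, 7), (5, 1, 5), (4, 3, 2)]:
    lhs, rhs = sphere_packing(n, k, d)
    print(n, k, d, "perfect" if lhs == rhs else f"{lhs} < {rhs}")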

4.3.2 Singleton Bound

Theorem 10 (Singleton Bound). For any [n, k, d]q code:

d ≤ n − k + 1.

Proof. Every linear block code has a quasi-systematic generator matrix, i.e., the
columns of Ik (unit vectors of length k) are columns of G.
If two information vectors differ in only one symbol, the two codewords differ in
at most n − k redundancy symbols and one information symbol. The other k − 1
systematic positions contain the same symbols in both codewords.
Hence, d ≤ n − (k − 1) = n − k + 1.

Codes which attain the Singleton bound are called Maximum Distance
Separable (MDS) codes. The following codes are (amongst others) MDS
codes.
1. The set of all words, i.e., F_q^n (this is an [n, n, 1]_q code).
2. The [n, 1, n]_q repetition code.
3. The [n, n − 1, 2]_q single parity-check (SPC) code.
4. The [n, k, n − k + 1]_q Reed–Solomon code RS(n, k) with q ≥ n. This is the most famous class of MDS codes and can be constructed for all k and n. The only limitation of Reed–Solomon codes is that the field size has to be at least in the order of n. We will consider them in detail in Chapter 5.
Note that there are no other binary MDS codes than the ones contained in
the previous enumeration (and their cosets).
It is possible to correct d − 1 = n − k erasures with an MDS code. That
means, if we know any k symbols of a codeword, we can reconstruct the
other n − k symbols.

4.3.3 Gilbert–Varshamov Bound

While the sphere-packing and Singleton bounds are upper bounds and
provide necessary conditions for any linear code, the Gilbert–Varshamov
(GV) bound shows the existence of codes with certain parameters, i.e., a
sufficient condition. The binary GV bound is as follows.


Theorem 11 (Binary Gilbert–Varshamov Bound). There exists a binary linear [n, k]_2 code with minimum distance at least d whenever

2^{n−k} > Σ_{i=0}^{d−2} \binom{n−1}{i}.

Proof. We construct an (n − k) × n parity-check matrix iteratively. We start with the identity matrix I_{n−k} and add further columns. For each new column, we have to check that any d − 1 columns are still linearly independent.
Assume we have already selected l − 1 (≥ n − k) columns h_1, h_2, . . . , h_{l−1} ∈ F_2^{(n−k)×1}. Note that I_{n−k} is contained in these columns.
A new column h_l has to be chosen such that it cannot be written as a linear combination of any d − 2 existing columns. That means, ineligible columns can be written as h_l = (h_1, . . . , h_{l−1}) · x for some x ∈ F_2^{l−1} of Hamming weight wt(x) ≤ d − 2.
The number of vectors of length l − 1 and Hamming weight at most d − 2 equals Σ_{i=0}^{d−2} \binom{l−1}{i}. The number of ineligible columns is at most the number of such vectors (two different x might result in the same h_l and therefore equality does not always hold).
If the number of ineligible columns is strictly less than 2^{n−k}, there is still a column which can be added. Thus, if Σ_{i=0}^{d−2} \binom{l−1}{i} < 2^{n−k}, a column can still be added.
This has to hold for any l ≤ n, and the statement in the theorem is for l = n.
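The Gilbert–Varshamov condition can be evaluated just as easily; the following Python sketch (function name our own) checks the sufficient condition of Theorem 11:

from math import comb

def gv_exists(n, k, d):
    """True if a binary [n, k] code with minimum distance >= d is
    guaranteed to exist by the Gilbert-Varshamov bound."""
    return 2**(n - k) > sum(comb(n - 1, i) for i in range(d - 1))

print(gv_exists(7, 4, 3))   # True; e.g. the [7,4,3] Hamming code exists
print(gv_exists(7, 4, 4))   # False; note the condition is only sufficient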

4.4 Obtaining Longer/Shorter Codes from


Existing Codes
This section deals with constructing longer or shorter codes from existing
codes. We therefore consider the operations shortening, lengthening, puncturing
and extending.
The first operation that we consider is shortening. Given an [n, k, d]q
code C, the following procedure shortens the code by one (the first) position.
1. Choose all codewords of C with “0” at the first position.
2. Remove this “0” from all words.
Shortening a degenerate code (where all codewords are “0” in the first
position and the generator matrix has an all-zero column) clearly results
in an [n − 1, ks = k, d]q code. For any other linear code, we obtain the
following properties of shortening.

Lemma 7 (Properties of Shortening). Shortening a non-degenerate [n, k, d]q


code C as described above gives an [n−1, ks = k −1, ds ]q code Cs with ds ≥ d.


[Figure 4.2: Comparison of the binary Gilbert–Varshamov, sphere-packing and Singleton bounds for n = 1000. The plot shows the relative distance δ versus the code rate R for the sphere-packing bound (q = 2, 4, 8), the Gilbert–Varshamov bound (q = 2, 4, 8), and the Singleton bound.]

Proof. As one position is removed, the length is clearly reduced by one.
The dimension is based on the following fact: In a non-degenerate q-ary linear code, for any fixed position j, a fraction of 1/q of the codewords are “0” at this position j. Thus, for the size of the shortened code, we have |C_s| = (1/q)|C| and k_s = k − 1.
The minimum distance does not decrease as we only remove codewords, and therefore the minimum distance is at least the one of the original code.

We now want to compare the generator and parity-check matrices of C and C_s. Let G = (I_k | A) be a systematic generator matrix of a non-degenerate code C for some A ∈ F_q^{k×(n−k)}. Then, Step 1 of the shortening procedure means that we fix c_0 = u_0 = 0. This is equivalent to removing the first row of G. Step 2 of the shortening procedure means that also the first column is removed. This results in a systematic generator matrix of C_s, denoted by G_s = (I_{k−1} | A_s), where A_s ∈ F_q^{(k−1)×(n−k)} consists of the k − 1 bottom rows of A.
The parity-check matrix of C_s is then H_s = (−A_s^T | I_{n−k}). This is also illustrated in Figure 4.3.


[Figure 4.3: Generator and parity-check matrices after shortening a systematic generator matrix. Fixing u_0 = 0 corresponds to deleting the first row of G = (I_k | A); deleting the first codeword position corresponds to deleting the first column. This yields G_s = (I_{k−1} | A_s) and H_s = (−A_s^T | I_{n−k}).]

The reverse operation to shortening is lengthening. Given an [n, k, d]q


code C with a k × n systematic generator matrix G, the following procedure
lengthens the code by one position.
1. Append to all codewords a “0” as the first position. That means, add
an all-zero column as leftmost column to G.
2. Add a new “suitable” row to G, starting with “1”. Denote the new
(k + 1) × (n + 1) generator matrix by Gl and the generated code by Cl .
The second step of lengthening, i.e., which new row is suitable, depends on
the code C. The leading “1” of the new row guarantees that rank(Gl ) = k+1
and therefore that the dimension of Cl is k + 1. However, in general it is
not clear how to choose this new row of the generator matrix such that
distance does not decrease and the lengthening of the [n, k, d]q code C as
described above gives an [n + 1, k + 1, d]q code Cl . The hard task here
is to choose the first row such that the number of codewords multiplies
by q but the minimum distance does not decrease. Sometimes, appending
the all-one row works. For binary codes, this corresponds to adding all inverted words (of the words of length n + 1 after Step 1) to the set of words after Step 1. In general, when adding the all-one row, we obtain an [n + 1, k + 1, min{d, n + 1 − max_{c∈C} wt(c)}]_q code.
The following examples show how choosing the row of the generator matrix
is crucial.

Example 4.13 (Lengthening).


Let C = {(00), (11)} be the binary repetition code of length 2 with G = (11) and d = 2. The generator matrix of the lengthened code has the following structure:

G_l = ( 1 ? ? )
      ( 0 1 1 )

If we choose (111) as the first row, then (100) is a codeword and d_l = 1 only. However, with (101) as the first row, we obtain a [3, 2, 2]_2 code C_l.

Example 4.14 (Lengthening).


Let C be the [3, 2, 2]_2 SPC code with

G = ( 1 0 1 )
    ( 0 1 1 ),

minimum distance d = 2 and C = {(000), (101), (011), (110)}. The generator matrix of the lengthened code has the following structure:

G_l = ( 1 ? ? ? )
      ( 0 1 0 1 )
      ( 0 0 1 1 ).

If we choose (1111) as the first row, then

C_l = {(0000), (0101), (0011), (0110), (1111), (1010), (1100), (1001)}.

Thus, lengthening with the all-one row results in the code C_l, which is the [4, 3, 2]_2 SPC code.
Conversely, shortening the code C_l by the first position results in turn in the [3, 2, 2]_2 SPC code C.

The third operation that we consider is puncturing. Given an [n, k, d]q


code C with a k × n systematic generator matrix G, the following procedure
punctures the code by position j ∈ {0, . . . , n − 1}.
1. Puncture (= delete) position j in all codewords and obtain the code Cp .
The length of Cp is n − 1. The dimension decreases to k − 1 if the rank
of the generator matrix after removing the j-th column (denoted by Gp )
decreases to k − 1.
Consider the case when rank(Gp ) is k − 1 (note that here j must be an
information position, i.e., j ∈ {0, . . . , k −1}). Then puncturing is equivalent
to shortening and we obtain an [n − 1, k − 1, ds ]q code. While for shortening
the minimum distance can stay the same or increase, in this particular case
it remains the same, i.e., ds = d. The only exception occurs when the
original code contains a weight-one codeword whose j-th entry is 1 and all
others 0, in which case it is possible for ds to become larger than d.
Else if rank(Gp ) = k, puncturing yields an [n − 1, k, dp ]q code, where dp ∈
{d − 1, d}. Whether the distance decreases or not depends on the structure
of the generator matrix and the position that is punctured.


Puncturing a position in the redundancy part (which implies that rank(G_p) = k) removes one column of A, where G = (I_k | A) is a systematic generator matrix of C. This and the implications for the parity-check matrix are illustrated in Figure 4.4.

[Figure 4.4: Generator and parity-check matrices after puncturing. Removing a column of A in G = (I_k | A) yields G_p = (I_k | A_p) of size k × (n − 1) and H_p = (−A_p^T | I_{n−k−1}) of size (n − k − 1) × (n − 1).]

Example 4.15 (Puncturing).


Consider a [5, 2, 2]_2 code C with the following systematic generator matrix

G = ( 1 0 1 1 1 )
    ( 0 1 0 0 1 ).

Puncturing at position j ∈ {0, 2, 3} results in a [4, 2, 2]2 code, i.e., neither


dimension nor distance are reduced.
Puncturing at position j ∈ {1, 4} results in a [4, 2, 1]2 code, i.e., the distance
is reduced by one.

Note that shortening of a code is equivalent to puncturing its dual code at


the same position and vice versa.
The reverse operation to puncturing is extension. We consider it here
for the binary case only. Given a binary [n, k, d]2 code C, we obtain the
extended code Ce as follows.
1. Append 0 or 1 to each codeword such that the weight of each word of
length (n + 1) is even.
If the minimum distance d of C is odd, the extended code Ce is an [n +
1, k, d + 1]2 code. This is true as all weight-d codewords of C have odd
weight and therefore we append a “1” and the minimum-weight codewords
of Ce have weight d + 1.
If d is even, the minimum distances of C and Ce are the same since we
append a “0” to all weight-d codewords of C and therefore their weight does
not change.


The parity-check matrix H_e of C_e can be obtained from the parity-check matrix H of C as follows:

H_e = ( 1 1 . . . 1 1 )
      (             0 )
      (      H      ⋮ )
      (             0 )

i.e., an all-zero column is appended to H and an all-one row is added on top. The all-one row on top guarantees that all codewords of C_e have even weight.

Example 4.16 (Extended Hamming Code).


The extended Hamming code is an [8, 4, 4]2 code that can be obtained from
the [7, 4, 3]2 Hamming code by extension.
A parity-check matrix of the extended Hamming code is therefore

H_e = ( 1 1 1 1 1 1 1 1 )
      ( 1 0 0 0 1 1 1 0 )
      ( 0 1 0 1 0 1 1 0 )
      ( 0 0 1 1 1 0 1 0 ).



Chapter 5

Reed–Solomon Codes
Reed–Solomon (RS) codes are a powerful class of codes and probably the most extensively used class of codes in practice. This is due to multiple reasons. First, RS codes are optimal in the sense that they are MDS codes and attain the Singleton bound with minimum distance d = n − k + 1. Second, they can be efficiently encoded and decoded. They were introduced as early as 1960 by Irving Reed and Gustave Solomon (see Figure 5.1).

Figure 5.1: Irving Reed (left) and Gustave Solomon (right)

Applications of RS codes include optical data storage media (CD, DVD),


magnetic data storage media (e.g., hard disk drives), space transmission,
and QR codes.
The recommended literature for this chapter is [1, Sections 5.1, 5.2 and
Sections 6.1, 6.2, 6.3, 6.5] and for list decoding RS codes [2, Chapter 12].

5.1 Definition and Properties

5.1.1 Parity-Check Matrix and Generator Matrix

We start with defining (Generalized) RS codes by their parity-check matrix


and later derive the corresponding generator matrix.

Definition 28 (Generalized Reed–Solomon Code). Let α_0, α_1, . . . , α_{n−1} ∈ F_q be non-zero distinct elements (called code locators) and let ν_0, ν_1, . . . , ν_{n−1} ∈ F_q be non-zero elements (called column multipliers), where n ≤ q − 1. The code defined by the following (n − k) × n parity-check matrix

H_RS := ( 1            1            . . .  1            )
        ( α_0          α_1          . . .  α_{n−1}      )
        ( ⋮            ⋮                   ⋮            )
        ( α_0^{n−k−1}  α_1^{n−k−1}  . . .  α_{n−1}^{n−k−1} ) · diag(ν_0, ν_1, . . . , ν_{n−1})   (5.1)

is called RS(n, k) (Generalized) Reed–Solomon code.

Theorem 12 (Minimum Distance of GRS Codes). The minimum distance


of RS(n, k) codes is d = n − k + 1.

Proof. Since HRS is a Vandermonde matrix (multiplied by a full-rank diagonal


matrix), any (n − k) × (n − k) submatrix of HRS has full rank and thus any n − k
columns of HRS are linearly independent. That means that d ≥ n − k + 1.

Due to the Singleton bound, d ≤ n − k + 1 and therefore equality follows.

Now, we calculate the corresponding generator matrix.

Theorem 13 (Generator Matrix of GRS Codes). Given an RS(n, k) code defined by its parity-check matrix H_RS as in (5.1), there exist non-zero elements ν′_0, ν′_1, . . . , ν′_{n−1} such that

G_RS := ( 1          1          . . .  1          )
        ( α_0        α_1        . . .  α_{n−1}    )
        ( ⋮          ⋮                 ⋮          )
        ( α_0^{k−1}  α_1^{k−1}  . . .  α_{n−1}^{k−1} ) · diag(ν′_0, ν′_1, . . . , ν′_{n−1})   (5.2)

is a generator matrix with G_RS · H_RS^T = 0.

Proof. Rewrite G_RS · H_RS^T = 0 and let a indicate the a-th row of G_RS and b the b-th column of H_RS^T. Evaluating the entry at position (a, b) of the product gives:

Σ_{j=0}^{n−1} (ν′_j α_j^a)(ν_j α_j^b) = 0, ∀a = 0, . . . , k − 1, ∀b = 0, . . . , n − k − 1
⟺ Σ_{j=0}^{n−1} ν_j ν′_j α_j^{a+b} = 0
⟺ Σ_{j=0}^{n−1} ν_j ν′_j α_j^r = 0, ∀r = 0, . . . , n − 2.   (⋆)

Rewrite (⋆) with matrices:

( 1          1          . . .  1          )
( α_0        α_1        . . .  α_{n−1}    )
( ⋮          ⋮                 ⋮          )
( α_0^{n−2}  α_1^{n−2}  . . .  α_{n−1}^{n−2} ) · diag(ν_0, . . . , ν_{n−1}) · (ν′_0, ν′_1, . . . , ν′_{n−1})^T = 0.

The matrix on the left-hand side is the parity-check matrix of an RS(n, 1) code with d = n. Hence, (ν′_0, ν′_1, . . . , ν′_{n−1}) ∈ RS(n, 1) and wt(ν′_0, ν′_1, . . . , ν′_{n−1}) = n, i.e., the ν′_i are all non-zero and determined by (⋆).

The rank of G_RS is clearly k. If one code locator is chosen to be zero, n = q is also possible. However, we will not consider this case in this lecture, since some properties of our decoding algorithms require that α_i ≠ 0, ∀i.
Note that either the column multipliers of H_RS, i.e., ν_0, . . . , ν_{n−1}, or the column multipliers of G_RS, i.e., ν′_0, . . . , ν′_{n−1}, can be chosen to be arbitrary non-zero elements (e.g., all equal to 1). The second set of column multipliers is then determined by G_RS · H_RS^T = 0 as in the proof of Theorem 13.

5.1.2 Definition via Evaluation

Alternatively, we can define GRS codes by evaluating degree-restricted


polynomials as follows.

Theorem 14 (Generalized Reed–Solomon Code). Let α_0, α_1, . . . , α_{n−1} ∈ F_q be n non-zero distinct elements and let ν′_0, ν′_1, . . . , ν′_{n−1} ∈ F_q be non-zero elements, where n ≤ q − 1. Then, the code which is generated by the following set of vectors

{ eval(u(x)) := (ν′_0 u(α_0), ν′_1 u(α_1), . . . , ν′_{n−1} u(α_{n−1})) : u(x) ∈ F_q[x] and deg u(x) < k }

is an RS(n, k) code (equivalent to the definition via G_RS).


[Figure 5.2: Illustration of the minimal weight of a codeword of an RS code. The polynomial u(x) is evaluated at the code locators α_0, . . . , α_{n−1} ∈ F_q; since u(x) has at most deg(u(x)) ≤ k − 1 roots, the vector eval(u(x)) contains at most k − 1 zero positions.]

This statement becomes clear when writing the evaluation as a vector-matrix product: evaluating u(x) = u_0 + u_1 x + · · · + u_{k−1} x^{k−1} at α_i is equal to u(α_i) = (u_0, u_1, . . . , u_{k−1}) · (1, α_i, . . . , α_i^{k−1})^T. Thus, eval(u(x)) = u · G_RS, where u = (u_0, u_1, . . . , u_{k−1}) and G_RS is defined in (5.2).

This also provides an alternative proof of the minimum distance of RS


codes. Any polynomial u(x) of degree < k has at most k − 1 roots, therefore
eval(u(x)) has at most k − 1 zeros and its weight is at least n − (k − 1), as
illustrated in Figure 5.2. Due to the Singleton bound d = n − k + 1.
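The evaluation definition translates directly into an encoder. The following Python sketch (parameters chosen by us for illustration: q = 7, code locators 1, . . . , 6, and all column multipliers ν′_i = 1) encodes by Horner evaluation and checks the weight argument above:

q = 7
locators = [1, 2, 3, 4, 5, 6]    # n = 6 distinct non-zero code locators in F_7

def rs_encode(u):
    """Evaluate u(x) = u_0 + u_1 x + ... at every code locator (Horner scheme)."""
    c = []
    for a in locators:
        val = 0
        for coeff in reversed(u):
            val = (val * a + coeff) % q
        c.append(val)
    return c

u = [2, 5]                        # k = 2: u(x) = 2 + 5x
c = rs_encode(u)
print(c)                          # [0, 5, 3, 1, 6, 4]: a codeword of RS(6, 2)
assert sum(ci == 0 for ci in c) <= len(u) - 1   # at most k - 1 zero positions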

5.1.3 Primitive Reed–Solomon Codes

A special class of GRS codes are so-called primitive RS codes, defined as


follows.

Definition 29 (Primitive Reed–Solomon Code). Let n divide q − 1, let α ∈ F_q be an element of order n, let the code locators be α_i = α^i, ∀i = 0, . . . , n − 1, and let ν′_0 = · · · = ν′_{n−1} = 1 (this is sometimes also called “normalized”). Then, the GRS code from Theorem 13 is called a primitive RS code.

A primitive RS code therefore has a generator matrix of the form (see Theorem 13):

G_RS = ( 1  1        1           . . .  1              )
       ( 1  α        α^2         . . .  α^{n−1}        )
       ( ⋮  ⋮        ⋮                  ⋮              )
       ( 1  α^{k−1}  α^{2(k−1)}  . . .  α^{(n−1)(k−1)} ).   (5.3)


Lemma 8 (Primitive RS Code). A primitive RS code defined as in Definition 29 with generator matrix as in (5.3) has a parity-check matrix of the form:

H_RS = ( 1  α        α^2         . . .  α^{n−1}        )
       ( 1  α^2      α^4         . . .  α^{2(n−1)}     )
       ( ⋮  ⋮        ⋮                  ⋮              )
       ( 1  α^{n−k}  α^{2(n−k)}  . . .  α^{(n−k)(n−1)} )

That means, ν_i = α_i = α^i.

Proof. Since G_RS · H_RS^T = 0 has to hold, we use from the proof of Theorem 13:

Σ_{j=0}^{n−1} ν_j ν′_j α_j^r = 0, ∀r = 0, . . . , n − 2.

Let α_j = α^j, ν′_j = 1, and ν_j = α_j = α^j, for all j. With the help of the geometric series Σ_{j=0}^{n−1} a^j = (a^n − 1)/(a − 1), it follows:

Σ_{j=0}^{n−1} α^{j·(r+1)} = Σ_{j=0}^{n−1} (α^{r+1})^j = (α^{(r+1)n} − 1)/(α^{r+1} − 1) = 0.

The last equality follows from {α^{r+1} : r = 0, . . . , n − 2} = {α, α^2, . . . , α^{n−1}} ∌ 1, the fact that α^n = 1 (hence α^{(r+1)n} = 1), and α^{r+1} ≠ 1 for all r ∈ {0, . . . , n − 2}, since ord(α) = n and no smaller positive i exists such that α^i = 1.

5.1.4 Definition via Discrete Fourier Transform

Alternatively to generator/parity-check matrices and the evaluation, some


textbooks define GRS codes via the discrete Fourier transform (DFT). In
this section, we consider the definition of RS codes by the DFT and show
why this definition is equivalent to primitive RS codes which can of course
be encoded by their generator matrix or by the evaluation.
As a starting point we give a recapitulation of the DFT over C, which also
motivates the usage of the DFT for RS codes.

Definition 30 (DFT over Complex Numbers). The DFT of a vector a = (a_0, . . . , a_{n−1}) over C is A = (A_0, . . . , A_{n−1}), where

A_ℓ := Σ_{i=0}^{n−1} a_i · e^{−(2πj/n)·iℓ} = Σ_{i=0}^{n−1} a_i · w^{iℓ}, ∀ℓ = 0, . . . , n − 1,

with the Fourier kernel defined as w = e^{−2πj/n} and j being the imaginary unit here.


Looking at w, it can be observed that it is an element of order n, since the smallest positive integer i such that w^i = 1 is i = n. In a finite field F_q, the Fourier kernel is an element α ∈ F_q of order n.

Definition 31 (DFT over Finite Field). Let α ∈ F_q be an element of order n (hence, n | q − 1), let a = (a_0, a_1, . . . , a_{n−1}) denote a vector over F_q, and let a(x) = a_0 + a_1 x + · · · + a_{n−1} x^{n−1} be its polynomial representation. The DFT of a is denoted by A = (A_0, A_1, . . . , A_{n−1}) = F(a), where

A_j = n^{−1} Σ_{i=0}^{n−1} a_i α^{−ij} = n^{−1} a(α^{−j}), ∀j = 0, . . . , n − 1.

Theorem 15 (Inverse DFT (IDFT) over Finite Field). Consider a = (a_0, a_1, . . . , a_{n−1}) and its DFT A = F(a). Then, a = F^{−1}(A) can be calculated by the IDFT, which is

a_i = Σ_{j=0}^{n−1} A_j α^{ij} = A(α^i), ∀i = 0, . . . , n − 1.

Proof. We have to prove that A(α^i) = a_i, where A(x) = Σ_{j=0}^{n−1} A_j x^j with A_j = n^{−1} Σ_{l=0}^{n−1} a_l α^{−lj}. Therefore, we rewrite:

A(α^i) = Σ_{j=0}^{n−1} A_j · (α^i)^j = n^{−1} Σ_{j=0}^{n−1} Σ_{l=0}^{n−1} a_l α^{−lj} · α^{ij} = n^{−1} Σ_{l=0}^{n−1} a_l Σ_{j=0}^{n−1} α^{j(i−l)}.

For fixed l ≠ i, due to the geometric series, we obtain

Σ_{j=0}^{n−1} α^{j(i−l)} = ((α^{l−i})^n − 1)/(α^{l−i} − 1) = 0,

since α^n = 1 and, if l ≠ i, α^{l−i} ≠ 1.

For l = i, we obtain

Σ_{j=0}^{n−1} α^{j(i−l)} = Σ_{j=0}^{n−1} α^0 = n.

Thus,

A(α^i) = n^{−1} Σ_{l=0}^{n−1} a_l Σ_{j=0}^{n−1} α^{j(i−l)} = n^{−1} · n · a_i = a_i.
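The following sketch verifies Definition 31 and Theorem 15 numerically over the same small field F_7 used before (illustrative parameters; Python 3.8+ is assumed for modular inverses via pow(x, −1, q)):

# Minimal sketch (assumed parameters): DFT/IDFT over F_7 with n = 6
# and Fourier kernel alpha = 3 of order n.
q, n, alpha = 7, 6, 3
n_inv = pow(n, -1, q)                 # n^{-1} exists since gcd(n, q) = 1
alpha_inv = pow(alpha, -1, q)

def dft(a):
    # A_j = n^{-1} * a(alpha^{-j}), j = 0, ..., n-1  (Definition 31)
    return [n_inv * sum(a[i] * pow(alpha_inv, i * j, q) for i in range(n)) % q
            for j in range(n)]

def idft(A):
    # a_i = A(alpha^i), i = 0, ..., n-1  (Theorem 15)
    return [sum(A[j] * pow(alpha, i * j, q) for j in range(n)) % q
            for i in range(n)]

a = [3, 0, 5, 6, 2, 4]                # the RS codeword from the earlier sketch
assert idft(dft(a)) == a              # F^{-1}(F(a)) = a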

The following corollary simply follows by comparing the definitions of primitive


RS codes and the one of RS codes via the DFT.


Corollary 1 (Reed–Solomon Code). Let n | (q − 1) and let α ∈ F_q be an element of order n. Then, the code which is generated by the following set of polynomials

{c(x) := F^{−1}(u(x)) : u(x) ∈ F_q[x] and deg u(x) < k}

is a primitive RS(n, k) code.

Throughout this lecture, we use the description as polynomials or as vectors, whichever is more convenient.

For primitive normalized RS codes (where n | (q − 1), α is an element of order n, and ν′_0 = ... = ν′_{n−1} = 1), the three definitions (parity-check matrix, evaluation, DFT) are equivalent.

The definitions via evaluation and generator matrix are equivalent and more general than the definition via the DFT, as the latter only works for primitive RS codes.

5.2 Syndrome-Based Unique Decoding

In this section, we deal with unique decoding of RS codes, i.e., with the
following task. We mainly follow the description of [1, Chapter 6].

Given: received word r (or polynomial r(x)) such that r = c + e, where wt(e) ≤ ⌊(d−1)/2⌋ and c ∈ RS(n, k).

Task: find the codeword c.

We denote the set of error locations by E := supp(e) = {i : e_i ≠ 0, i = 0, ..., n − 1} with |E| = wt(e) ≤ ⌊(d−1)/2⌋.

The decoding consists of the following decoding steps:

1. Syndrome computation: s = r · H_RS^T.

2. Find the error locator polynomial: Λ(x) := Π_{i∈E} (1 − α_i x).

3. Find the error locations E.

4. Find the error values.

We use the definition via evaluation in the following (which is equivalent to the definition via the generator matrix G_RS) and go through the aforementioned steps in detail.


5.2.1 Syndrome Computation

The first step in the decoding process is the syndrome computation. This is
the easiest step as it consists only of multiplying the received word by the
parity-check matrix. The main goal of this section is to derive an expression
of the syndrome polynomial that is used for the following decoding process.
The task of this decoding step is the following.

Given: received word r.

Task: find the syndrome vector s = (s_0, s_1, ..., s_{d−2}) ∈ F_q^{d−1}.

The syndrome is calculated by s = (s_0, s_1, ..., s_{d−2}) = r · H_RS^T, where H_RS is defined in (5.1). The syndrome coefficients for i = 0, ..., d − 2 are

s_i = Σ_{j=0}^{n−1} r_j ν_j α_j^i = Σ_{j=0}^{n−1} e_j ν_j α_j^i = Σ_{j∈E} e_j ν_j α_j^i,

where we used that r_j = c_j + e_j and c · H_RS^T = 0.

The syndrome polynomial is defined as the polynomial with the syndrome coefficients as polynomial coefficients and is then given by

S(x) := Σ_{i=0}^{d−2} s_i x^i = Σ_{j∈E} e_j ν_j Σ_{i=0}^{d−2} (α_j x)^i.

Consider the ring F_q[x] modulo x^{d−1} (denoted by F_q[x]/x^{d−1}). That means, all polynomials have their coefficients in F_q (this is the polynomial ring F_q[x] that we have considered before) and are calculated modulo x^{d−1}. Calculating a polynomial modulo x^{d−1} means cutting higher powers, as shown in Example 3.11.

In F_q[x]/x^{d−1}, the following multiplicative inverse can be calculated:

(1 − α_j x)^{−1} = Σ_{i=0}^{d−2} (α_j x)^i mod x^{d−1}.

Using the previous inverse, the syndrome polynomial can be rewritten as

S(x) = Σ_{j∈E} (e_j ν_j)/(1 − α_j x) mod x^{d−1}.   (5.4)
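The following sketch continues the illustrative RS(6, 2) example over F_7. With ν_j = α_j = α^j (Lemma 8), the syndromes reduce to evaluations s_i = r(α^{i+1}) for i = 0, ..., d − 2; the chosen error vector is an arbitrary example:

# Minimal sketch (assumed running example): syndrome computation.
q, n, k, alpha = 7, 6, 2, 3
d = n - k + 1
c = [3, 0, 5, 6, 2, 4]                # the codeword encoded earlier
e = [0, 5, 0, 0, 2, 0]                # arbitrary error of weight 2 = (d-1)//2
r = [(ci + ei) % q for ci, ei in zip(c, e)]

def syndromes(v):
    # s_i = sum_j v_j * nu_j * alpha_j^i with nu_j = alpha_j = alpha^j,
    # i.e., s_i = v(alpha^{i+1})
    return [sum(vj * pow(alpha, j * (i + 1), q) for j, vj in enumerate(v)) % q
            for i in range(d - 1)]

assert syndromes(c) == [0] * (d - 1)  # codewords have all-zero syndrome
print(syndromes(r))                   # [2, 0, 4, 0] for this error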

5.2.2 The Key Equation and How to Solve it

In this section, we derive the so-called key equation. This equation provides
a relation between two polynomials that are in turn related to the error


word: the error locator polynomial (ELP) and the error evaluator polynomial
(EEP). The first is related to the error locations while the latter is related to
the error values. The task in the section is to derive a relation between these
two polynomials (defined in the following) and the syndrome polynomial.

Given: syndrome s = (s0 , s1 , . . . , sd−2 ) (or equivalently S(x)).


Task: find the ELP Λ(x) and the EEP Ω(x).

The ELP is denoted by Λ(x) and indicates where the error positions are. It is defined by

Λ(x) := Π_{i∈E} (1 − α_i x).

Its roots are the α_ℓ^{−1} where ℓ is an erroneous position, i.e., Λ(α_ℓ^{−1}) = 0 ⟺ ℓ ∈ E. Thus, the roots of the ELP tell us where the errors are.

The EEP is denoted by Ω(x) and helps us to determine the error values. It is defined by

Ω(x) := Σ_{i∈E} e_i ν_i Π_{j∈E\{i}} (1 − α_j x).

If we insert the roots of the ELP (the inverses of the code locators at the error positions), we obtain:

∀ ℓ ∈ E :   Ω(α_ℓ^{−1}) = e_ℓ ν_ℓ Π_{j∈E\{ℓ}} (1 − α_j α_ℓ^{−1}) ≠ 0.

Note that Λ(x) and Ω(x) share no common roots, i.e., gcd(Λ(x), Ω(x)) = 1. The degrees of the ELP and the EEP satisfy:

deg Ω(x) < |E| = deg Λ(x) ≤ ⌊(d−1)/2⌋.

Finally, we can establish a connection between the ELP and the EEP as follows:

Ω(x) = Λ(x) Σ_{i∈E} e_i ν_i (Π_{j∈E\{i}} (1 − α_j x)) / (Π_{j∈E} (1 − α_j x)) = Λ(x) Σ_{i∈E} (e_i ν_i)/(1 − α_i x).

Combining this with the expression of the syndrome polynomial from (5.4) yields:

Ω(x) = Λ(x) S(x) mod x^{d−1},   where deg Ω(x) < deg Λ(x) ≤ ⌊(d−1)/2⌋.   (5.5)

Equation (5.5) forms the so-called key equation for decoding (G)RS codes. The next decoding step is to solve the key equation for the ELP. Once the ELP is known, the EEP can be calculated from (5.5).


For t := wt(e) = |E| = deg Λ(x), the polynomial key equation from (5.5) is equivalent to the following linear system of equations:

⎛ s_0       0         0       ...  0        ⎞                 ⎛ Ω_0     ⎞
⎜ s_1       s_0       0       ...  0        ⎟                 ⎜ ⋮       ⎟
⎜ ⋮         ⋮         ⋱            ⋮        ⎟   ⎛ Λ_0 = 1 ⎞   ⎜ Ω_{t−1} ⎟
⎜ s_{t−1}   s_{t−2}   ...     s_0  0        ⎟   ⎜ Λ_1     ⎟   ⎜ 0       ⎟
⎜ s_t       s_{t−1}   ...     s_1  s_0      ⎟ · ⎜ ⋮       ⎟ = ⎜ 0       ⎟   (5.6)
⎜ s_{t+1}   s_t       ...     s_2  s_1      ⎟   ⎝ Λ_t     ⎠   ⎜ ⋮       ⎟
⎜ ⋮         ⋮                 ⋮    ⋮        ⎟                 ⎝ 0       ⎠
⎝ s_{d−2}   s_{d−3}   ...  s_{d−t−1}  s_{d−t−2} ⎠

where t ≤ ⌊(d−1)/2⌋, Λ(x) = Σ_{i=0}^{t} Λ_i x^i, and Ω(x) = Σ_{i=0}^{t−1} Ω_i x^i.
In principle, the last d − 1 − t ≥ t equations of (5.6) do not depend on Ω(x),
so we can solve them for the coefficients of Λ(x). Once we know Λ(x), we
can use the first t equations to determine Ω(x).
However, the problem with this strategy is that we do not know the actual
value of t. For this purpose, the following lemma is needed. Its proof is due
to Peterson1 .

Lemma 9 (Rank of Syndrome Matrix). The matrix with syndrome coefficients

       ⎛ s_{ν−1}    ...  s_1   s_0     ⎞
       ⎜ s_ν        ...  s_2   s_1     ⎟
S_ν := ⎜ ⋮          ⋱    ⋮     ⋮       ⎟   (5.7)
       ⎝ s_{2ν−2}   ...  s_ν   s_{ν−1} ⎠

is singular if ν > t and non-singular if ν = t.

Based on Lemma 9, we can determine t, i.e., the number of errors that actually happened. These considerations lead to the following algorithms to solve the key equation and obtain the ELP Λ(x) and the EEP Ω(x).

Note that (5.8) follows from (5.6) by moving the first column to the right-hand side (since Λ_0 = 1) and using the first t equations of the lower block, which does not depend on Ω(x).

The subsequent theorem follows from Lemma 9, since S_t is set up such that it has full rank. Note that for t > ⌊(d−1)/2⌋, the matrices T_t and S_t cannot be set up properly, as we know the syndrome coefficients only up to s_{d−2} and 2t − 1 > d − 2 if t > ⌊(d−1)/2⌋.

¹ W.W. Peterson, “Encoding and error-correction procedures for the Bose–Chaudhuri codes”, IEEE Trans. Inform. Theory, Sep. 1960, pp. 459–470.


Algorithm 1: Find t
Input: syndrome coefficients s
1 Set ν = ⌊(d−1)/2⌋ and set up S_ν
2 while S_ν is singular do
3   Set ν ← ν − 1
4   Set up S_ν
5 t = ν
Output: t

Algorithm 2: Solve the Key Equation
Input: syndrome coefficients s_0, s_1, ..., s_{d−2}
1 Set t = ⌊(d−1)/2⌋ and set up S_t as in (5.7)
2 while S_t is singular do
3   Set t ← t − 1 and set up S_t as in (5.7)
4 Solve the following linear system of equations for Λ_1, ..., Λ_t:

   ⎛ s_{t−1}    ...  s_1   s_0     ⎞   ⎛ Λ_1 ⎞       ⎛ s_t      ⎞
   ⎜ s_t        ...  s_2   s_1     ⎟   ⎜ Λ_2 ⎟       ⎜ s_{t+1}  ⎟
   ⎜ ⋮          ⋱    ⋮     ⋮       ⎟ · ⎜ ⋮   ⎟  = −  ⎜ ⋮        ⎟   (5.8)
   ⎝ s_{2t−2}   ...  s_t   s_{t−1} ⎠   ⎝ Λ_t ⎠       ⎝ s_{2t−1} ⎠

   where the matrix on the left-hand side is S_t and the vector on the right-hand side (without the minus sign) is T_t.

5 Set Λ(x) = 1 + Σ_{i=1}^{t} Λ_i x^i
6 Calculate Ω(x) = Λ(x) S(x) mod x^{d−1}
Output: ELP Λ(x) and EEP Ω(x)


Theorem 16 (Uniqueness of Solution). For t ≤ ⌊(d−1)/2⌋, the solution (Λ(x), Ω(x)) to the key equation (5.5) is unique.

It is important to remark that instead of solving the linear system of equations by Gaussian elimination, the key equation can be solved more efficiently by the Berlekamp–Massey algorithm or the Euclidean algorithm (see Sugiyama et al.). These algorithms also do not need to find t separately; they find t and Λ(x) simultaneously and therefore reduce the complexity of solving the key equation. This lecture, however, deals only with solving the linear system of equations.
If we apply this strategy to an error with t > ⌊(d−1)/2⌋, the matrix S_t (using only those rows for which we have enough syndrome coefficients, i.e., up to s_{d−2}) has fewer rows than columns and therefore a solution space of size larger than one. If we randomly choose a solution in this space, we might find an invalid ELP (i.e., one that does not have t roots) or one that does not indicate the correct error positions.
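The following sketch mirrors Algorithms 1 and 2 for the running F_7 example: it shrinks t until S_t is non-singular and then solves (5.8) by Gaussian elimination. It is illustrative code, not an optimized decoder:

# Minimal sketch (assumed running example): solving the key equation.
q, n, k, alpha = 7, 6, 2, 3
d = n - k + 1
r = [3, 5, 5, 6, 4, 4]                # received word from the syndrome sketch
s = [sum(rj * pow(alpha, j * (i + 1), q) for j, rj in enumerate(r)) % q
     for i in range(d - 1)]

def solve(A, b):
    # Gaussian elimination over F_q; returns x with A x = b, or None if singular
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = next((i for i in range(col, m) if M[i][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)
        M[col] = [x * inv % q for x in M[col]]
        for i in range(m):
            if i != col and M[i][col]:
                M[i] = [(x - M[i][col] * y) % q for x, y in zip(M[i], M[col])]
    return [M[i][m] for i in range(m)]

t = (d - 1) // 2
Lam = [1]                             # Lambda(x) = 1 if no errors occurred
while t > 0:
    S = [[s[t - 1 + i - j] for j in range(t)] for i in range(t)]  # S_t, (5.7)
    sol = solve(S, [-s[t + i] % q for i in range(t)])             # (5.8)
    if sol is not None:
        Lam = [1] + sol               # Lambda_0 = 1
        break
    t -= 1
print(Lam)                            # [1, 0, 5]: Lambda(x) = 1 + 5x^2

For this received word the decoder finds Λ(x) = 1 + 5x², which indeed equals (1 − α^1 x)(1 − α^4 x) = (1 − 3x)(1 − 4x) mod 7 for the two corrupted positions.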

5.2.3 Finding the Error Locations


Recall that in the previous subsection, we have found a principle to determine
the ELP and the EEP given only the syndrome S(x). In this subsection,
we show how to find the error locations given the ELP.

Given: ELP Λ(x) = Π_{i∈E} (1 − α_i x).

Task: find the set of error locations E.

We therefore want to find all ℓ such that Λ(α_ℓ^{−1}) = 0 ⟺ ℓ ∈ E. This is a simple root finding of Λ(x). Since there are at most n possible roots (the inverses of the code locators α_ℓ), we can simply try all of them.

The so-called Chien search is a more efficient way of finding the roots of a polynomial over a finite field and is shown in the following for the case α_ℓ = α^ℓ.

The goal is to find all β ∈ F_q such that Λ(β) = 0, where Λ(x) = 1 + Σ_{i=1}^{t} Λ_i x^i and β = α^{i_β} for a primitive element α.

For some fixed i, we denote the terms of the evaluation by

Λ(α^i) = 1 + Λ_1 α^i + Λ_2 (α^i)^2 + ... + Λ_t (α^i)^t =: 1 + γ_{1,i} + γ_{2,i} + ... + γ_{t,i}.

Similarly, for i + 1, the evaluation is denoted as follows:

Λ(α^{i+1}) = 1 + Λ_1 α^{i+1} + Λ_2 (α^{i+1})^2 + ... + Λ_t (α^{i+1})^t
           = 1 + Λ_1 α^i α + Λ_2 (α^i)^2 α^2 + ... + Λ_t (α^i)^t α^t
           = 1 + γ_{1,i} α + γ_{2,i} α^2 + ... + γ_{t,i} α^t
           =: 1 + γ_{1,i+1} + γ_{2,i+1} + ... + γ_{t,i+1}.


Thus, γ_{j,i+1} = γ_{j,i} · α^j and

1 + Σ_{j=1}^{t} γ_{j,i} = 0   ⟺   Λ(α^i) = 0.   (5.9)

The root-finding process can therefore be accelerated by starting with i = 0, checking whether the sum in (5.9) is zero (if true, α^i is a root), and iterating through all elements of F_q, i.e., up to i = q − 1. In each step we only have to multiply the current γ_{j,i}'s by α^j. This root-finding is more efficient than brute-force root finding since a multiplication by the constants α^j is cheaper than general variable multiplications.
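A minimal Chien search for the Λ(x) found above (all values continue the illustrative F_7 example; here ord(α) = q − 1 = 6, so the loop visits every non-zero field element exactly once):

# Minimal sketch (assumed running example): Chien search.
q, n, alpha = 7, 6, 3
Lam = [1, 0, 5]                       # Lambda(x) = 1 + 5x^2
gamma = Lam[1:]                       # gamma_{j,0} = Lambda_j * (alpha^0)^j
error_pos = []
for i in range(n):                    # test beta = alpha^i
    if (1 + sum(gamma)) % q == 0:     # (5.9): Lambda(alpha^i) = 0
        error_pos.append((-i) % n)    # alpha^i = alpha_l^{-1}  =>  l = -i mod n
    gamma = [g * pow(alpha, j, q) % q for j, g in enumerate(gamma, start=1)]
print(sorted(error_pos))              # [1, 4]: the error positions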

5.2.4 Finding the Error Values

By means of the previous subsections, we can determine the ELP, the


EEP, and the set of error locations. Based on this knowledge, we want
to reconstruct the error values in this subsection.

Given: ELP Λ(x), EEP Ω(x), error locations E.

Task: find the error values e_i for all i ∈ E.

To solve this task, we derive the so-called Forney formula for error evaluation. We use the standard derivative of a polynomial a(x) = Σ_{i=0}^{s} a_i x^i, defined by

a′(x) := Σ_{i=1}^{s} i a_i x^{i−1}.

By using the standard rule for the derivative of a product of two polynomials, (a(x)b(x))′ = a′(x)b(x) + a(x)b′(x), the derivative of the ELP is

Λ′(x) = Σ_{ℓ∈E} (−α_ℓ) Π_{j∈E\{ℓ}} (1 − α_j x).

That means, for all i ∈ E, we obtain

Λ′(α_i^{−1}) = −α_i Π_{j∈E\{i}} (1 − α_j α_i^{−1}).   (5.10)

Evaluating the EEP for all i ∈ E gives

Ω(α_i^{−1}) = e_i ν_i Π_{j∈E\{i}} (1 − α_j α_i^{−1}).   (5.11)

Plugging (5.10) into (5.11) and solving for e_i results in Forney's formula for error evaluation. For all i ∈ E, we can calculate e_i by:

e_i = −(α_i/ν_i) · Ω(α_i^{−1})/Λ′(α_i^{−1}).   (5.12)
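Continuing the same illustrative example, the following sketch computes Ω(x) = Λ(x)S(x) mod x^{d−1} and evaluates Forney's formula (5.12); since ν_i = α_i here, the prefactor α_i/ν_i equals 1:

# Minimal sketch (assumed running example): Forney's formula.
q, n, alpha, d = 7, 6, 3, 5
s = [2, 0, 4, 0]                      # syndromes from the earlier sketch
Lam = [1, 0, 5]                       # ELP from the key-equation sketch

Om = [0] * (d - 1)                    # Omega(x) = Lambda(x) S(x) mod x^{d-1}
for i, li in enumerate(Lam):
    for j, sj in enumerate(s):
        if i + j < d - 1:
            Om[i + j] = (Om[i + j] + li * sj) % q

def ev(p, x):
    return sum(c * pow(x, i, q) for i, c in enumerate(p)) % q

dLam = [i * c % q for i, c in enumerate(Lam)][1:]   # formal derivative Lambda'
for pos in [1, 4]:                    # error locations from the Chien search
    x = pow(alpha, -pos, q)           # alpha_pos^{-1}
    e = -ev(Om, x) * pow(ev(dLam, x), -1, q) % q
    print(pos, e)                     # e_1 = 5 and e_4 = 2, as inserted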


5.2.5 Unique Decoding: Overview


The following table summarizes the different steps of unique decoding and
the input and output to the different steps.

Table 5.1: Overview of unique syndrome decoding steps for RS codes


Step Input Output
Syndrome computation r(x), H S(x)
Solve the key equation S(x) Λ(x), Ω(x)
Find the error locations Λ(x) E
Find the error values Λ(x), Ω(x), E ei , ∀i ∈ E

From the set of error positions E and the error values ei , we can reconstruct
the error word e = (e0 , e1 , . . . , en−1 ) and therefore the codeword c = r − e.

5.3 Interpolation-Based Unique Decoding


The previous section deals with syndrome-based unique decoding of (G)RS
codes whose main step consists of solving a key equation. In this section, we
consider an interpolation-based method. We will see that such an approach
has some advantages compared to the syndrome-based approach. Namely,
no syndrome calculation is necessary and the approach directly yields the
evaluation polynomial u(x) of deg u(x) < k (i.e., the message polynomial).
From a didactic point of view, the main advantage is that this approach can
be generalized to list decoding which is done in the next section.
0
Note that in this section, for simplicity, we choose ν00 , ν10 , . . . , νn−1 = 1.
Interpolation-based decoders (both, unique and list decoders) rely on two
steps:
1. Interpolation of a multi-variate polynomial,
2. Factorization of this polynomial.
The task of the interpolation step of the unique decoder (introduced by
Welch and Berlekamp) is shown in the following.

Given: received word r such that r = c + e, where wt(e) ≤ τ := ⌊(d−1)/2⌋ and c ∈ RS(n, k).

Task: Find a bivariate polynomial Q(x, y) = Q_0(x) + Q_1(x) · y such that
• Condition 1: Q(α_i, r_i) = 0, ∀ i = 0, ..., n − 1,
• Condition 2: deg Q_0(x) < n − τ,
• Condition 3: deg Q_1(x) < n − τ − (k − 1).


Theorem 17 (Interpolation Step). There is at least one non-zero polynomial


Q(x, y) which satisfies the previous three conditions.

Proof. Condition 1 is a homogeneous linear system of equations with n equations. The number of unknowns in this linear system of equations is the number of coefficients of Q_0(x) = Σ_{i=0}^{n−τ−1} Q_{0,i} x^i and Q_1(x) = Σ_{i=0}^{n−τ−(k−1)−1} Q_{1,i} x^i, which is (n − τ) + (n − τ − (k − 1)) ≥ n + 1 (Conditions 2 and 3).

Since the number of coefficients is larger than the number of equations, there is a non-zero solution Q(x, y).

Thus, we can calculate a suitable non-zero bivariate polynomial Q(x, y) by


solving a linear system of equations.
The second step, the factorization step has the following task.

Given: a non-zero bivariate interpolation polynomial Q(x, y) satisfying


Conditions 1, 2, 3.
Task: find u(x).

Theorem 18 (Factorization Step). If c = eval(u(x)) and wt(e) ≤ τ = ⌊(d−1)/2⌋, then Q(x, u(x)) = 0 and u(x) = −Q_0(x)/Q_1(x).

Proof. The bivariate polynomial Q(x, y) satisfies Q(α_i, u(α_i) + e_i) = 0 (Condition 1). Since e_i = 0 for at least n − τ positions (the error-free positions), the univariate polynomial Q(x, u(x)) = Q_0(x) + Q_1(x) · u(x) has at least n − τ roots, namely the α_i where u(α_i) = r_i.

Any non-zero polynomial with at least b roots has degree at least b; hence, if Q(x, u(x)) were non-zero, it would have degree at least n − τ. However, deg Q(x, u(x)) ≤ max{deg Q_0(x), deg Q_1(x) + deg u(x)} < n − τ. This is a contradiction, and to fulfill both constraints, Q(x, u(x)) = 0.

Hence, Q_0(x) + u(x)Q_1(x) = 0 and u(x) = −Q_0(x)/Q_1(x).

Thus, given Q(x, y), we can simply divide −Q0 (x) by Q1 (x) (by standard
polynomial division) and get the message polynomial u(x).
This decoding principle was introduced by Welch and Berlekamp and is
summarized in the following algorithm.
From the previous theorems, we know that a non-zero solution for Q(x, y) of the linear system of equations in the Welch–Berlekamp decoder exists and that u(x) defines the sent codeword if wt(e) ≤ ⌊(d−1)/2⌋.


Algorithm 3: Welch–Berlekamp Unique Decoding Procedure
Input: received word r = (r_0, r_1, ..., r_{n−1})
1 Interpolation Step: solve the following homogeneous linear system of equations for the coefficient vector q = (Q_{0,0}, Q_{0,1}, ..., Q_{0,n−τ−1}, Q_{1,0}, Q_{1,1}, ..., Q_{1,n−τ−(k−1)−1})^T:

⎛ 1  α_0      ...  α_0^{n−τ−1}      r_0      r_0·α_0          ...  r_0·α_0^{n−τ−(k−1)−1}          ⎞
⎜ 1  α_1      ...  α_1^{n−τ−1}      r_1      r_1·α_1          ...  r_1·α_1^{n−τ−(k−1)−1}          ⎟ · q = 0
⎜ ⋮  ⋮        ⋱    ⋮                ⋮        ⋮                ⋱    ⋮                              ⎟
⎝ 1  α_{n−1}  ...  α_{n−1}^{n−τ−1}  r_{n−1}  r_{n−1}·α_{n−1}  ...  r_{n−1}·α_{n−1}^{n−τ−(k−1)−1}  ⎠

2 Set Q_0(x) = Σ_{i=0}^{n−τ−1} Q_{0,i} x^i and Q_1(x) = Σ_{i=0}^{n−τ−(k−1)−1} Q_{1,i} x^i.

3 Factorization Step: calculate u(x) = −Q_0(x)/Q_1(x)
Output: message word u
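The following sketch implements the Welch–Berlekamp procedure for the running F_7 example: it fills the interpolation matrix, extracts one non-zero nullspace vector by Gaussian elimination (by Theorem 18, any non-zero solution works), and divides −Q_0(x) by Q_1(x). All helper names are ad hoc; this is illustrative code, not an efficient decoder:

# Minimal sketch (assumed running example): Welch-Berlekamp decoding.
q, n, k, alpha = 7, 6, 2, 3
tau = (n - k) // 2
locs = [pow(alpha, i, q) for i in range(n)]
r = [3, 5, 5, 6, 4, 4]                     # received word with two errors
d0, d1 = n - tau, n - tau - (k - 1)        # numbers of coefficients of Q0, Q1
rows = [[pow(a, j, q) for j in range(d0)] +
        [ri * pow(a, j, q) % q for j in range(d1)]
        for a, ri in zip(locs, r)]         # Condition 1: Q(alpha_i, r_i) = 0

def null_vector(A, cols):
    # one non-zero nullspace vector over F_q via reduced row echelon form
    A = [row[:] for row in A]
    pivots, pr = [], 0
    for c in range(cols):
        piv = next((i for i in range(pr, len(A)) if A[i][c]), None)
        if piv is None:
            continue
        A[pr], A[piv] = A[piv], A[pr]
        inv = pow(A[pr][c], -1, q)
        A[pr] = [x * inv % q for x in A[pr]]
        for i in range(len(A)):
            if i != pr and A[i][c]:
                A[i] = [(x - A[i][c] * y) % q for x, y in zip(A[i], A[pr])]
        pivots.append(c); pr += 1
    free = next(c for c in range(cols) if c not in pivots)
    v = [0] * cols; v[free] = 1
    for i, c in enumerate(pivots):
        v[c] = -A[i][free] % q
    return v

sol = null_vector(rows, d0 + d1)
Q0, Q1 = sol[:d0], sol[d0:]
while Q1 and Q1[-1] == 0:                  # trim leading zeros (Q1 != 0 here)
    Q1 = Q1[:-1]
num = [(-c) % q for c in Q0]               # u(x) = -Q0(x) / Q1(x)
u = [0] * (len(num) - len(Q1) + 1)
inv = pow(Q1[-1], -1, q)
for i in range(len(u) - 1, -1, -1):        # polynomial long division
    u[i] = num[i + len(Q1) - 1] * inv % q
    for j, dc in enumerate(Q1):
        num[i + j] = (num[i + j] - u[i] * dc) % q
assert all(x == 0 for x in num)            # division is exact (Theorem 18)
print(u)                                   # 1 + 2x (higher coefficients zero)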

In terms of complexity, the syndrome-based decoder is usually faster than


the Welch–Berlekamp approach as the linear system of equations that has
to be solved is smaller (and additionally there exist faster algorithms such
as Berlekamp–Massey to solve it), but the Welch–Berlekamp principle can
be generalized to larger decoding radii as a list decoder as shown in the next
section.

5.4 List Decoding

In this section, an efficient list decoding algorithm is considered that can be seen as a generalization of the Welch–Berlekamp interpolation-based unique decoder. The goal of list decoding is to increase the decoding radius beyond ⌊(d−1)/2⌋, see also Section 2.4.6.

5.4.1 Sudan Algorithm

To generalize the Welch–Berlekamp algorithm, recall that its interpolation


step has to find a bivariate polynomial Q(x, y) = Q0 (x)+Q1 (x)·y satisfying
certain conditions. Sudan’s list decoding algorithm generalizes this approach
to a bivariate interpolation polynomial of larger y-degree.

The task of the interpolation step of Sudan’s list decoder is as follows.


Given: received word r = c + e, where wt(e) ≤ τ and c ∈ RS(n, k), and a fixed integer ℓ ≥ 1.

Task: Find a bivariate polynomial

Q(x, y) = Q_0(x) + Q_1(x) · y + Q_2(x) · y^2 + ... + Q_ℓ(x) · y^ℓ

such that:
• Condition 1: Q(α_i, r_i) = 0, ∀ i = 0, ..., n − 1,
• Condition 2: deg Q_j(x) < n − τ − j(k − 1), for all j = 0, ..., ℓ.

Note that for ℓ = 1 and τ = ⌊(d−1)/2⌋, this is the Welch–Berlekamp interpolation step.

Theorem 19 (Interpolation Step). If

τ < (ℓ/(ℓ+1)) · n − (ℓ/2) · (k − 1),   (5.13)

there is at least one non-zero polynomial Q(x, y) which satisfies the previous conditions.

Proof. Condition 1 is again a homogeneous linear system of equations with n equations. The number of unknowns in this linear system of equations is the number of coefficients of the Q_j(x), which is

(n − τ) + (n − τ − (k − 1)) + (n − τ − 2(k − 1)) + ... + (n − τ − ℓ(k − 1)) = (ℓ + 1)(n − τ) − (1/2)ℓ(ℓ + 1)(k − 1).

If the number of coefficients is larger than the number of equations, there is a non-zero Q(x, y) satisfying both conditions. This is true if τ < (ℓ/(ℓ+1)) · n − (ℓ/2) · (k − 1).

The restriction on τ from Theorem 19 results in certain rate restrictions.

For ℓ = 1, Equation (5.13) gives τ < (n − k + 1)/2 = d/2, which is equivalent to τ ≤ ⌊(d−1)/2⌋, i.e., the unique decoding radius.

For ℓ = 2, Equation (5.13) only gives τ ≥ d/2 if R < 1/3 + 1/n.

For general ℓ, Equation (5.13) only gives τ ≥ d/2 if R < 1/(ℓ+1) + 1/n.
+ n1 .
That means that Sudan's list decoder only improves upon the decoding radius for low-rate RS codes.
Note that there is another technical restriction. We should also have τ <
n − ℓ(k − 1) such that the Qj (x)’s have non-negative degree.
The factorization step of Sudan’s algorithm is similar to Welch–Berlekamp
but cannot be solved by a simple polynomial division.


Theorem 20. If c = eval(u(x)) and wt(e) ≤ τ < (ℓ/(ℓ+1)) · n − (ℓ/2) · (k − 1), then Q(x, u(x)) = 0.

Proof. Similar to Welch–Berlekamp unique decoding.

Since the y-degree of Q(x, y) is ℓ, there are at most ℓ polynomials p(x)


with deg p(x) < k such that Q(x, p(x)) = 0. One of these polynomials
corresponds to the sent codeword (due to Theorem 20). That means that
Sudan’s list decoder is a (τ, ℓ)-list decoder which returns all codewords (of
which there exist at most ℓ) in radius τ around the received word.

Note that for reasonable parameters of the RS code, the probability that
ℓ > 1 is usually very small.

Algorithm 4: Sudan List Decoding Procedure


Input: received word r = (r0 , r1 , . . . , rn−1 )
1 Interpolation Step: solve the linear system of equations
corresponding to

Q(αi , ri ) = 0, ∀i = 0, . . . , n − 1

with the mentioned degree constraints.


2 Factorization Step: Find all u(x) with deg u(x) < k such that
Q(x, u(x)) = 0.
Output: list of all u = (u0 , u1 , . . . , uk−1 )

In this lecture, we do not go into detail on how to accomplish the factorization


step but there are efficient algorithms that can factorize such a bivariate
polynomial and find all y-roots.

From the previous theorems, we know that a non-zero solution of this system of equations exists and that one of the u(x)'s defines the sent codeword if wt(e) < (ℓ/(ℓ+1)) · n − (ℓ/2) · (k − 1). However, this only works for low code rates R ≲ 1/3.

5.4.2 Idea of Guruswami–Sudan Algorithm

As we have seen in the previous section, Sudan’s list decoder only increases
the decoding radius for low-rate GRS codes. A further generalization to
higher code rates was later suggested by Guruswami and Sudan.

The idea was to generalize the Sudan and Welch–Berlekamp algorithms by


using multiple roots of the bivariate polynomial. In this lecture, we do not
go into detail of this algorithm, we only shortly state the interpolation task.


Given: received word r = c + e, where wt(e) ≤ τ and c ∈ RS(n, k). Let ℓ ≥ 1 and s < ℓ be fixed integers.

Task: Find a bivariate polynomial

Q(x, y) = Q_0(x) + Q_1(x) · y + Q_2(x) · y^2 + ... + Q_ℓ(x) · y^ℓ

such that:
• Condition 1: Q(α_i, r_i) = 0 with multiplicity s, ∀ i = 0, ..., n − 1,
• Condition 2: deg Q_j(x) < s · (n − τ) − j(k − 1), for all j = 0, ..., ℓ.

Similar to Welch–Berlekamp's and Sudan's algorithms, it can be shown that for τ < n(2ℓ − s + 1)/(2(ℓ + 1)) − ℓ(k − 1)/(2s), there exists a non-zero Q(x, y) satisfying these conditions.

For this restriction on τ, it can then also be shown that Q(x, u(x)) = 0.

A detailed analysis shows that we get an improvement in the decoding radius compared to unique decoding if R ≤ s/(ℓ+1) + 1/n and τ < n(2ℓ − s + 1)/(2(ℓ + 1)) − ℓ(k − 1)/(2s). There is always a value for s such that τ < n − √(n(n − d)) is possible, with list size ℓ ≤ nd/(τ^2 − (2τ − d)n).
Figure 5.3 compares the relative decoding radii for unique decoding, the Sudan algorithm and the Guruswami–Sudan algorithm.

[Figure 5.3: plot of the relative decoding radius τ/n over the code rate R ∈ [0, 1] for unique decoding, the Sudan algorithm with ℓ = 2 and ℓ = 3, and the Guruswami–Sudan algorithm]

Figure 5.3: Comparison of unique decoding, the Sudan algorithm and the Guruswami–Sudan algorithm



Chapter 6

Cyclic Codes
Cyclic codes provide certain practical advantages in terms of efficiency and
fast implementations as they can be described compactly (by a generator or
parity-check polynomial) and encoding and decoding can be done by means
of shift registers.
The recommended literature for this chapter is [1, Chapter 8].

6.1 Definition and Properties

6.1.1 Definition
An interesting property of codes with several practical advantages is cyclicity,
defined as follows.

Definition 32 (Cyclic Code). A code is cyclic, if any cyclic shift of a


codeword is again a codeword, i.e.,
(c0 , c1 , . . . , cn−1 ) ∈ C =⇒ (cn−1 , c0 , . . . , cn−2 ) ∈ C.

The previous definition clearly implies that a cyclic shift by any number i of positions again yields a codeword.
The following example shows some cyclic and non-cyclic codes.

Example 6.1 (Cyclic Codes).

First, consider the [3, 2, 2]_2 binary SPC code:

C_1 = {(000), (011), (101), (110)}.

This code is linear and cyclic:

(011) ∈ C_1 --shift right--> (101) ∈ C_1 --shift right--> (110) ∈ C_1,

(000) → (000).

Second, a linear, but non-cyclic code is the following code:

C_2 = {(000), (001), (010), (011)},

(001) ∈ C_2 --shift right--> (100) ∉ C_2.

Third, a non-linear, but cyclic code is:

C_3 = {(100), (010), (001)},

(100) ∈ C_3 --shift right--> (010) ∈ C_3 --shift right--> (001) ∈ C_3.

This code is non-linear since, e.g., 0 ∉ C_3 and (110) = (100) + (010) ∉ C_3.

In the rest of this chapter, we deal only with linear cyclic codes.

For cyclic codes, the polynomial description of all words frequently simplifies notations. For this purpose, we associate each vector (c_0, c_1, ..., c_{n−1}) ∈ F_q^n with a polynomial c(x) := c_0 + c_1 x + c_2 x^2 + ... + c_{n−1} x^{n−1} ∈ F_q[x].

A cyclic shift of the vector then corresponds to

c_{n−1} + c_0 x + ... + c_{n−2} x^{n−1} = x · c(x) − c_{n−1} · (x^n − 1) = x · c(x) mod (x^n − 1).

The cyclic shift of the respective vector is illustrated in Figure 6.1.

[Figure 6.1: multiplying c(x) by x shifts the coefficients (c_0, c_1, ..., c_{n−1}) one position to the right; reducing modulo x^n − 1 wraps the coefficient c_{n−1} around to the first position, yielding (c_{n−1}, c_0, c_1, ..., c_{n−2})]

Figure 6.1: Connection between the cyclic shift of a codeword c and the multiplication by x of the respective polynomial c(x).

That means, a linear code is cyclic if and only if

c(x) ∈ C   =⇒   x · c(x) mod (x^n − 1) ∈ C.

Hence, if c(x) is a codeword of C, so is x^i · c(x) mod (x^n − 1) for any i. That means that for a linear code C, also Σ_i u_i x^i · c(x) mod (x^n − 1) is a codeword, i.e., for every a(x) ∈ F_q[x]:

c(x) ∈ C   =⇒   a(x) · c(x) mod (x^n − 1) ∈ C.   (6.1)
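In code, the correspondence is just a rotation of the coefficient vector. The following minimal sketch (binary, illustrative) reproduces the shift from Example 6.1:

# Minimal sketch: a cyclic right shift is multiplication by x mod (x^n - 1);
# the last coefficient wraps around to position 0 (cf. Example 6.1).
def shift_right(c):
    return [c[-1]] + c[:-1]

c = [0, 1, 1]              # the codeword (011) of the [3, 2, 2] SPC code
print(shift_right(c))      # [1, 0, 1], i.e., (101)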


6.1.2 Generator and Parity-Check Polynomials

Similar to the generator and parity-check matrices, for cyclic codes, we can
consider a generator polynomial and a parity-check polynomial (cf. [1]).

Theorem 21 (Generator Polynomial). Let C be a cyclic [n, k, d]q code.


Then, there is a unique monic polynomial g(x) such that for every c(x) ∈
Fq [x] of degree at most n − 1:

c(x) ∈ C ⇐⇒ g(x) | c(x).

Proof. First notice that if g(x) exists, it must be a codeword of minimum degree since g(x) | g(x), and it is unique because it divides all other codewords.

First, we prove (⇐=): select g(x) to be a monic polynomial of smallest degree in C. Note that it is always possible to choose g(x) monic, as we can multiply any non-monic codeword by a scalar such that it becomes monic, and it is still a codeword due to the linearity of the code. From (6.1), we know that for all u(x) ∈ F_q[x], it holds that u(x) · g(x) mod (x^n − 1) ∈ C. In particular, for every u(x) with deg u(x) < n − deg g(x), we know u(x) · g(x) ∈ C. Hence, all polynomial multiples of g(x) are codewords of C.

Second, we prove (=⇒). Let c(x) = u(x) · g(x) + r(x) with deg r(x) < deg g(x). Since g(x) ∈ C and due to linearity, it follows that r(x) = c(x) − u(x) · g(x) ∈ C. Since g(x) is a minimum-degree codeword and deg r(x) < deg g(x), we get r(x) = 0, thus g(x) | c(x).

The polynomial from the previous theorem is called generator polynomial.


Its degree is deg g(x) = n − k. Due to Theorem 21, a cyclic code can be
defined by:

C = {u(x) · g(x) : u(x) ∈ Fq [x] and deg u(x) < k}. (6.2)

Lemma 10 (Property of the Generator Polynomial, [1, p. 246]). Let g(x)


be the generator polynomial of a cyclic [n, k, d]q code. Then, g(x)|(xn − 1).

Proof. Observe that xn − 1 = h(x) · g(x) + r(x) with deg r(x) < deg g(x), which
means that r(x) = −h(x)g(x) mod (xn − 1). Since g(x) ∈ C and due to the
linearity of C, we have r(x) ∈ C. But since deg r(x) < deg g(x) and g(x) has
minimum degree of all codewords, r(x) = 0. Therefore, g(x) divides xn − 1.

The converse also holds: if g(x) is a polynomial in Fq [x] that divides xn − 1,


then (6.2) is a cyclic code (cf. [1]).
Similar to the parity-check matrix, also a parity-check polynomial can be
defined.


Definition 33 (Parity-Check Polynomial). Let g(x) be the generator polynomial of a cyclic [n, k, d]_q code. The parity-check polynomial h(x) is the monic polynomial of degree k obtained by:

h(x) := (x^n − 1)/g(x).
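The parity-check polynomial is obtained by one polynomial division, as the following sketch shows for the cyclic [7, 4, 3]_2 Hamming code with g(x) = 1 + x + x^3 (a standard choice of generator polynomial; the helper polydiv is illustrative):

# Minimal sketch (assumed example): h(x) = (x^n - 1)/g(x) over F_2.
q, n = 2, 7
g = [1, 1, 0, 1]                          # g(x) = 1 + x + x^3, lowest degree first

def polydiv(num, den):
    # quotient and remainder of num/den over F_q (coefficient lists)
    num = num[:]
    out = [0] * (len(num) - len(den) + 1)
    inv = pow(den[-1], -1, q)
    for i in range(len(out) - 1, -1, -1):
        out[i] = num[i + len(den) - 1] * inv % q
        for j, dc in enumerate(den):
            num[i + j] = (num[i + j] - out[i] * dc) % q
    return out, num                       # num now holds the remainder

x_n_minus_1 = [(-1) % q] + [0] * (n - 1) + [1]
h, rem = polydiv(x_n_minus_1, g)
print(h, rem)    # h(x) = 1 + x + x^2 + x^4, remainder 0 (so g(x) | x^n - 1)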

6.1.3 Generator and Parity-Check Matrix


The generator and parity-check matrices for linear cyclic codes can be
calculated from the generator and parity-check polynomials as follows.
Denote the coefficients of the generator polynomial by g(x) = g_0 + g_1 x + g_2 x^2 + ... + g_{n−k} x^{n−k}. Then, calculating u(x) · g(x) is equivalent to:

                           ⎛ g_0  g_1  ...  g_{n−k}                    ⎞
                           ⎜      g_0  g_1  ...      g_{n−k}           ⎟
(u_0, u_1, ..., u_{k−1}) · ⎜           ⋱    ⋱            ⋱             ⎟ ,
                           ⎝                g_0  g_1  ...      g_{n−k} ⎠

where the empty entries are zero and each of the k rows is the previous row shifted by one position. Thus, the upper matrix, denoted G, is a generator matrix for the linear cyclic code defined by g(x).
The corresponding parity-check matrix is then given in the following theorem.

Theorem 22 (Parity-Check Matrix, [1, Proposition 8.3]). Let C be a cyclic [n, k, d]_q code with parity-check polynomial h(x) = h_0 + h_1 x + h_2 x^2 + ... + h_k x^k. The (n − k) × n matrix

    ⎛ h_k  h_{k−1}  ...  h_0                    ⎞
    ⎜      h_k      h_{k−1}  ...  h_0           ⎟
H = ⎜           ⋱        ⋱        ⋱             ⎟
    ⎝                h_k  h_{k−1}  ...  h_0     ⎠

(empty entries are zero) is a parity-check matrix of C.

Proof. We have to verify that G · H^T = 0 holds. Note that rank(H) = n − k and rank(G) = k hold. Writing out the product of the banded matrix G (rows built from g_0, ..., g_{n−k}) and H^T (columns built from h_k, h_{k−1}, ..., h_0), let i ∈ {0, ..., k − 1} indicate the i-th row of G and l ∈ {0, ..., n − k − 1} the l-th column of H^T. The matrix equation G · H^T = 0 is then equivalent to:

Σ_{j=0}^{n−1} g_{(j−i) mod n} h_{(k+l−j) mod n} = 0,   (6.3)

with g_j = 0 ∀ j ∈ {n − k + 1, ..., n − 1} and h_j = 0 ∀ j ∈ {k + 1, ..., n − 1}.

The expression in (6.3) is the coefficient of x^{k+l−i} in the product g(x) · h(x) = x^n − 1. For 1 ≤ k + l − i ≤ n − 1, the coefficient of x^{k+l−i} equals 0, and hence G · H^T = 0.

Theorem 23 (Generator Polynomial of Dual Code, [1, p. 246]). Let C be a cyclic [n, k, d]_q code and let h(x) be its parity-check polynomial. Then, the dual code of C is a cyclic [n, n − k, d^⊥]_q code with generator polynomial

g^⊥(x) = x^k · h(x^{−1}) / h(0).

Proof. We see that deg g^⊥(x) = n − k^⊥ = k and deg h^⊥(x) = k^⊥ = n − k. Further, we construct via Theorem 22 the parity-check matrix H; since h(0) = h_0 ≠ 0, it has full rank. We have:

           ⎛ h_k  h_{k−1}  ...  h_0                ⎞
G^⊥ = H =  ⎜      h_k      h_{k−1}  ...  h_0       ⎟
           ⎜           ⋱        ⋱        ⋱         ⎟
           ⎝                h_k  h_{k−1}  ...  h_0 ⎠

           ⎛ g^⊥_0  g^⊥_1  ...  g^⊥_k                ⎞
  = h(0) · ⎜        g^⊥_0  g^⊥_1  ...  g^⊥_k         ⎟ .
           ⎜             ⋱       ⋱         ⋱         ⎟
           ⎝                  g^⊥_0  g^⊥_1  ...  g^⊥_k ⎠

Reversing the order of the coefficients of h(x) is achieved by choosing g^⊥(x) as in the statement.

Second, we have to prove that g^⊥(x) actually defines a cyclic code, i.e., that it divides x^n − 1. Let h(x) = Π_{i=1}^{k} (x − α^{j_i}), where {j_1, j_2, ..., j_k} ⊂ {0, ..., n − 1} and α is an element of order n in the splitting field F_{q^s}. We know that h(x) | (x^n − 1). Further, h(0) = h_0 = (−1)^k Π_{i=1}^{k} α^{j_i}.

We obtain:

x^k · h(x^{−1})/h(0) = (x^k/h_0) Π_{i=1}^{k} (x^{−1} − α^{j_i})
                     = (1/h_0) Π_{i=1}^{k} (1 − α^{j_i} x)
                     = ((Π_{i=1}^{k} α^{j_i})/h_0) Π_{i=1}^{k} (α^{−j_i} − x)
                     = (−1)^k Π_{i=1}^{k} (α^{−j_i} − x)
                     = Π_{i=1}^{k} (x − α^{−j_i}).

Since {−j_1, −j_2, ..., −j_k} are all distinct (mod n), g^⊥(x) = Π_{i=1}^{k} (x − α^{−j_i}) | (x^n − 1).

The following examples show that some previously considered classes of


codes are cyclic and give their generator and parity-check polynomials.

Example 6.2 ([n, 1, n]_2 Repetition Code).

The repetition code is clearly cyclic as it consists only of the all-one and the all-zero word. The generator and parity-check polynomials are:
• g_RP(x) = 1 + x + x^2 + ... + x^{n−1},
• h_RP(x) = (x^n − 1)/g_RP(x) = x − 1.

Example 6.3 ([n, n − 1, 2]_2 Binary Single-Parity Check (SPC) Code).

Recall that this is the dual code to the repetition code. Thus, its generator polynomial can be calculated by Theorem 23.
• g_SPC(x) = x^k h_RP(x^{−1}) / h_RP(0) = x − 1,
• h_SPC(x) = (x^n − 1)/g_SPC(x) = Π_{i=1}^{n−1} (x − α^i) = 1 + x + x^2 + ... + x^{n−1}.

In this example, h(x) = g^⊥(x) and g(x) = h^⊥(x), but in general this is not true (see also Example 6.4).

Example 6.4 (Primitive RS Code RS(n, k) over Fq ).


Recall that for a primitive RS code, n divides q − 1 and ord(α) = n. The code locators are α_i = α^i, i = 0, ..., n − 1, and the column multipliers are ν′_0 = ... = ν′_{n−1} = 1.
From Lemma 8, we know that the parity-check matrix of a primitive RS code


is

        ⎛ 1  α        ...  α^{n−1}          ⎞
        ⎜ 1  α^2      ...  α^{2(n−1)}       ⎟
H_RS =  ⎜ ⋮  ⋮        ⋱    ⋮                ⎟ .
        ⎝ 1  α^{n−k}  ...  α^{(n−1)(n−k)}   ⎠

The relation H_RS · c^T = 0 implies that for any codeword c(x), we have c(α) = c(α^2) = ... = c(α^{n−k}) = 0.
First, we prove that a primitive RS code is cyclic. Let

c̃(x) := x · c(x) − c_{n−1} · (x^n − 1) = x · c(x) mod (x^n − 1).

Then, c̃(α^ℓ) = α^ℓ · c(α^ℓ) − c_{n−1}((α^ℓ)^n − 1) = α^ℓ · c(α^ℓ), and therefore also c̃(α^ℓ) = 0 for ℓ = 1, ..., n − k. Thus, c̃(x) is also a codeword of the RS code and the primitive RS code is cyclic.

Second, we want to determine the generator polynomial of the primitive RS code. Since any codeword satisfies c(α) = c(α^2) = ... = c(α^{n−k}) = 0, it has α^ℓ, ℓ = 1, ..., n − k, as roots and

g(x) = (x − α) · (x − α^2) ··· (x − α^{n−k}).

We can calculate the parity-check polynomial by

h(x) = (x^n − 1)/g(x) = (x − α^{n−k+1}) · (x − α^{n−k+2}) ··· (x − α^n) = (x − 1) Π_{i=n−k+1}^{n−1} (x − α^i),

where we used α^n = 1.

Similarly, from the form of the generator matrix of a primitive RS code


(see (5.3)) it follows that for any codeword c⊥ (x) of the dual code, we
have c⊥ (1) = c⊥ (α) = · · · = c⊥ (αk−1 ) = 0 and therefore the generator
polynomial of the dual code is

g ⊥ (x) = (x − 1) · (x − α) · · · (x − αk−1 ).

Notice that this is unequal to h(x) (see also Theorem 23).


The parity-check polynomial of the dual code is therefore

h^⊥(x) = (x^n − 1)/g^⊥(x) = (x − α^k) · (x − α^{k+1}) ··· (x − α^{n−1}).

6.2 BCH Codes


The second part of this chapter deals with special cyclic codes, the
Bose–Ray-Chaudhuri–Hocquenghem (BCH) codes.
For the definition of BCH codes, cyclotomic cosets and minimal polynomials
are needed, see Section 3.5.


6.2.1 Definition
For the definition of BCH codes, we need a union of cyclotomic cosets,
denoted by D and called defining set of the code.

Definition 34 (BCH Code). Let D = C_{i_1} ∪ C_{i_2} ∪ ... ∪ C_{i_ℓ} be the union of ℓ ≥ 1 distinct cyclotomic cosets with respect to n (which divides q^s − 1). Let α ∈ F_{q^s} be an element of order n. Then, an [n, k, d]_q BCH code is defined by the following generator polynomial:

g(x) = Π_{i∈D} (x − α^i).

The following lemma shows that not all lengths are possible when constructing
a BCH code, in particular the length should be co-prime with the characteristic
of the field.

Lemma 11. n | (q^s − 1) implies that gcd(n, q) = 1.

Proof. n | (q^s − 1) means that ∃ k ∈ ℕ such that k · n = q^s − 1. Moreover, assume now gcd(n, q) = c with c ∈ ℕ; then it follows that ∃ a, b ∈ ℕ such that

a · c = q,
b · c = n.

Replacing n and q with this gives

k · bc = (ac)^s − 1
k · bc = a^s c^s − 1
c · (−k · b + a^s c^{s−1}) = 1.

Since k, a, b, c ∈ ℕ, it needs to hold that c = 1.

Consequently, only certain choices of q, s, and n allow to construct BCH


codes. For example, in the binary case (q = 2) there exist no even-length
BCH codes.
Since i ranges over a union of cyclotomic cosets, g(x) is a multiple of one or more minimal polynomials and g(x) ∈ F_q[x]. Thus, the BCH code is a q-ary code.

The degree of the generator polynomial is deg g(x) = n − k = |D| = Σ_{j=1}^{ℓ} |C_{i_j}|. The dimension of the BCH code is k = n − |D|, and we note that not all values from {1, ..., n} are possible choices for k.

When n = q^s − 1, we call the code a primitive BCH code.
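Cyclotomic cosets are easy to generate programmatically. The following illustrative sketch computes C_1 for q = 2 and n = 23, the (assumed) parameters that reappear in the binary Golay code of Example 6.8:

# Minimal sketch: cyclotomic coset C_i = {i q^0, i q^1, i q^2, ...} mod n.
q, n = 2, 23

def cyclotomic_coset(i):
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = x * q % n
    return sorted(coset)

C1 = cyclotomic_coset(1)
print(C1, len(C1))   # [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18], size 11
# with D = C_1: dimension k = n - |C_1| = 12; the run 1, 2, 3, 4 in D
# gives the BCH bound d >= 5 (see the next subsection)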


6.2.2 The BCH Bound

In the previous section, we have seen that length and dimension of a BCH code directly follow from the definition via the defining set. In this subsection, we bound the minimum distance d of a BCH code constructed by g(x) = Π_{i∈D} (x − α^i).

Theorem 24 (The BCH Bound). Let C be an [n, k, d]q cyclic code (BCH
code) where n | (q s − 1) and α is an element of order n. Assume that
{b, b + 1, . . . , b + δ − 2} ⊆ D for some integers b and δ ≥ 2. Then d ≥ δ.

Proof. The polynomial g̃(x) := (x − α^b) ··· (x − α^{b+δ−2}) divides g(x), as {b, b + 1, ..., b + δ − 2} is a subset of the defining set.

The polynomial g̃(x) is the generator polynomial of an RS(n, n − δ + 1) code over F_{q^s} of minimum distance δ with α_j = α^j and ν_j = α^{bj} for j = 0, ..., n − 1. Therefore, every codeword of C is also a codeword of RS(n, n − δ + 1) and C is a subcode of RS(n, n − δ + 1).

Therefore, the minimum distance of C is at least the one of the RS code and d ≥ δ.

That means that δ − 1 consecutive roots in the defining set result in a


minimum distance of at least δ. However, the true minimum distance d
can be larger than the BCH bound δ. Note that there are also better lower
bounds on the minimum distance based on the structure of the defining set
but they are not part of this lecture.
In the proof of Theorem 24, we have seen that every codeword of an [n, k, d ≥ δ]_q BCH code C is also a codeword of an RS(n, n − δ + 1) code over F_{q^s}. Hence, C ⊆ (F_q^n ∩ RS(n, n − δ + 1)).

On the other hand, any RS codeword c(x) with coefficients in F_q has to be a multiple of one or more minimal polynomials while at the same time being a multiple of g̃(x) (defined as in the proof of Theorem 24). This is only possible if c(x) is divisible by the generator polynomial of C, i.e., g(x). That means, C ⊇ (F_q^n ∩ RS(n, n − δ + 1)).

Thus,

C = F_q^n ∩ RS(n, n − δ + 1),

i.e., all codewords of the RS code which lie in the subfield F_q are codewords of the BCH code. BCH codes are subfield subcodes of Reed–Solomon codes.
The advantage of BCH codes compared to RS codes is the small field
size. In particular, in many practical applications, binary codes are needed.
However, this comes at the cost of code rate as BCH codes usually do not
achieve the Singleton bound and for the same minimum distance have a
smaller dimension than the corresponding RS code.


6.2.3 Special BCH Codes

Some classes of cyclic codes that we have treated previously can be seen as
special BCH codes.

Example 6.5 ([n, 1, n]_2 Repetition Code).

• g(x) = 1 + x + x^2 + ... + x^{n−1} = Π_{i=1}^{n−1} (x − α^i).

This is a BCH code with defining set D = ∪_{i≠0} C_i.

Example 6.6 ([n, n − 1, 2]_2 Single-Parity Check Code).

This is the dual code to the repetition code.

• g(x) = x − 1.

This is a BCH code with defining set D = C_0.

Example 6.7 ([2^m − 1, 2^m − 1 − m, d = 3]_2 Binary Hamming Code).

The BCH code with D = C_1 = {1, 2, 4, ..., 2^{m−1}} is a Hamming code. Its BCH bound is determined by {1, 2} ⊆ D, thus d ≥ δ = 3.

Note that there is an equivalent non-cyclic Hamming code (obtained by permuting the columns of the parity-check matrix).

Recall that codes that satisfy the sphere-packing bound (Section 4.3.1) with equality are called perfect codes, as the whole space is filled when decoding spheres of radius ⌊(d−1)/2⌋ are drawn around each codeword.

The following Golay codes are among the few non-trivial perfect codes.

Example 6.8 ([23, 12, 7]2 Binary Golay Code).


The binary Golay code is defined by the following parameters:

• n = 23 | (2^{11} − 1), hence there is an element α ∈ F_{2^{11}} of order 23 which can be used to define the generator polynomial.

• The defining set is chosen to be D := C_1 = {1, 2, 4, 8, 16, 9, 18, 13, 3, 6, 12},

• which means g(x) = Π_{i∈C_1} (x − α^i) = x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1.

From the definition of the generator polynomial, we obtain the dimension: k = n − |C_1| = 12.

From the defining set, we can calculate the BCH bound: we can find four consecutive roots 1, 2, 3, 4, which results in the BCH bound d ≥ δ = 5. However, the true minimum distance is d = 7.


By the sphere-packing bound, this is a perfect code.

Example 6.9 ([11, 6, 5]_3 Ternary Golay Code).

Similar to the binary Golay code:

• n = 11 | (3^5 − 1), hence there is α ∈ F_{3^5} of order 11 which can be used to define the generator polynomial.

• C_1 = {1, 3, 9, 5, 4},

• g(x) = Π_{i∈C_1} (x − α^i) = x^5 + x^4 − x^3 + x^2 − 1.

The dimension is k = n − |C_1| = 6.

For the BCH bound, we can find three consecutive roots 3, 4, 5, which give d ≥ δ = 4. However, the true minimum distance is d = 5.

By the sphere-packing bound, this is a perfect code.

6.2.4 Decoding

Since every [n, k, d ≥ δ]_q BCH code C is a subfield subcode of an RS(n, n − δ + 1) code over F_{q^s}, we can simply decode in the RS code. That means, we treat the received word as a corrupted RS codeword and decode it with the decoders that we have considered in Chapter 5.

This way, unique decoding up to half the BCH bound, i.e., up to ⌊(δ−1)/2⌋ errors, is possible with any unique RS decoder.
Similarly, we can directly use the Sudan and Guruswami–Sudan list decoding
algorithms to list decode up to the same radius as the RS(n, n−δ +1) code.
However, since the BCH code contains fewer codewords than the overlying
RS code and has a larger true minimum distance, improvements are possible
(which are not considered in this lecture).



Chapter 7

Reed–Muller Codes
The recommended literature for this chapter is [3, Sections 5.1, 5.2].

7.1 First-Order Reed–Muller Codes


We start this chapter with first-order Reed–Muller (RM) codes and their
unique decoding. Note that in this chapter, we consider only binary codes.

7.1.1 Definition and Construction

Definition 35 (First-Order RM Code). A binary first-order Reed–Muller code RM(1, m) is defined by a generator matrix which contains all 2^m binary vectors of length m as columns and additionally the all-one row.

A code constructed by this definition has the following parameters.

Theorem 25 (Parameters of First-Order RM Codes). The RM(1, m) code is a [2^m, m + 1, 2^{m−1}]_2 code.

The length and dimension follow from the construction; the minimum distance is proved later for the recursive construction (see Theorem 26).

Example 7.1 (First-Order RM(1, 4) Code). For m = 4, we obtain a


[16, 5, 8]2 code with generator matrix
 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 
 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 
 
G= 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1  .
 
 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
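The generator matrix of Definition 35 can be built directly by listing all binary columns, as in the following illustrative sketch (column j is the binary representation of j, most significant bit first, matching Example 7.1):

# Minimal sketch: generator matrix of RM(1, m) -- an all-one row on top of
# all 2^m binary column vectors of length m.
m = 4
n = 2 ** m
G = [[1] * n] + [[(j >> (m - 1 - i)) & 1 for j in range(n)] for i in range(m)]
for row in G:
    print(row)   # reproduces the matrix of the [16, 5, 8]_2 RM(1, 4) code above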

From an RM(1, m) code, we can construct an RM(1, m+1) code recursively.

Theorem 26 (Recursive Construction). Given an RM(1, m) code, we can construct an RM(1, m + 1) code as follows:

RM(1, m + 1) = {(u, u + v) : u ∈ RM(1, m), v ∈ C_RP},

where C_RP = {(0, ..., 0), (1, ..., 1)} ⊂ F_2^{2^m} is a repetition code of length 2^m.

Proof. We have to prove that this is indeed an RM(1, m + 1) code, i.e., that it has parameters [2^{m+1}, m + 2, 2^m]_2.

• Length: n_{RM(1,m+1)} = 2 · n_{RM(1,m)} = 2^{m+1}.

• Dimension: k_{RM(1,m+1)} = k_{RM(1,m)} + 1 = m + 2. This is true as the repetition code for v doubles the number of codewords.

• Minimum distance: d = 2^m follows from the weight distribution shown in Lemma 12.

Lemma 12 (Weight Distribution of First-Order RM Codes). The RM(1, m) code has:
• one codeword of weight 0,
• one codeword of weight n = 2^m,
• 2^{m+1} − 2 codewords of weight 2^{m−1}.

Proof. Clearly, 0 := (0, 0, ..., 0) ∈ RM(1, m) and has weight 0. Also, 1 := (1, 1, ..., 1) ∈ RM(1, m) and has weight n = 2^m. We are therefore left to prove that all other codewords have weight 2^{m−1} = n/2.

This is true for m = 1, where

G_RM(1,1) = ⎛ 1 1 ⎞
            ⎝ 0 1 ⎠

and RM(1, 1) = {(0, 0), (1, 1), (0, 1), (1, 0)}.

Now, let us show that RM(1, m + 1) has 2^{m+2} − 2 codewords of weight 2^m if RM(1, m) has 2^{m+1} − 2 codewords of weight 2^{m−1}. The codewords of RM(1, m + 1) are constructed recursively by (u, u + v) as in Theorem 26.

First, consider all words that were constructed with v = 0 and u ∉ {0, 1}. (Notice that u ∈ {0, 1} gives the two cases of weight-0 and weight-2^{m+1} words as mentioned above.) There are |RM(1, m)| − 2 = 2^{m+1} − 2 such codewords of RM(1, m + 1) with structure (u, u). Clearly, their weight is 2 · wt(u) = 2 · 2^{m−1} = 2^m.

Second, consider v = 1 (and arbitrary u ∈ RM(1, m), including 0, 1). There are |RM(1, m)| = 2^{m+1} such codewords of RM(1, m + 1) with structure (u, u + 1). The weight of these words is wt(u) + length(u) − wt(u) = 2^m.

Thus, in total there are 2^{m+1} − 2 + 2^{m+1} = 2^{m+2} − 2 codewords of RM(1, m + 1) of weight 2^m, and each word of RM(1, m + 1) \ {0, 1} has weight 2^m.

7.1.2 Unique Decoding

Let us now consider unique decoding of first-order RM codes. We thereby consider an RM(1, m + 1) code which is constructed recursively from an RM(1, m) code and a length-2^m repetition code, denoted by C_rep.

Given: received word r such that r = c + e, where

wt(e) ≤ ⌊(d_{RM(1,m+1)} − 1)/2⌋ = ⌊(2^m − 1)/2⌋ = 2^{m−1} − 1 = d_{RM(1,m)} − 1

and c ∈ RM(1, m + 1).

Task: find the codeword c.

For decoding, we split the received word into two halves: r = (r_1, r_2) = (u + e_1, u + v + e_2).

Note that by simple majority decision, the repetition code C_rep of length 2^m can always correct ⌊(2^m − 1)/2⌋ = 2^{m−1} − 1 errors uniquely.

The decoding is then done in two steps and summarized in Algorithm 5.

In the first step, we calculate r_rep := r_1 + r_2 = v + e_1 + e_2, which is a corrupted codeword of the repetition code. Since wt(e_1 + e_2) ≤ wt(e_1, e_2) ≤ ⌊(2^m − 1)/2⌋, this is decodable by the repetition code C_rep and we can recover the vector v.

In the second step, first recall that we know r_1 = u + e_1 and, from the first decoding step, r′_2 := r_2 − v = u + e_2. Since wt(e_1, e_2) ≤ d_{RM(1,m)} − 1, at least one of e_1 or e_2 has weight ≤ ⌊(d_{RM(1,m)} − 1)/2⌋. Therefore, decode both r_1 and r′_2 in RM(1, m) and denote the results by u_1 and u_2.

Let us now analyze how to know which of the two decoding results is the correct word u. Assume wt(e_1) ≤ ⌊(d_{RM(1,m)} − 1)/2⌋ and therefore decoding is correct and u_1 = u. Then, wt(e_2) ≤ d_{RM(1,m)} − 1 − wt(e_1). Since both


u_1 = u and u_2 are codewords of the RM(1, m) code, their distance is d(u, u_2) ≥ d_{RM(1,m)}. Thus,

d(r′_2, u_2) ≥ d(u, u_2) − d(u, r′_2) ≥ d_{RM(1,m)} − wt(e_2) ≥ wt(e_1) + 1 = d(r_1, u) + 1,

where the first inequality follows from the triangle inequality.

That means, d(r_1, u_1 = u) < d(r′_2, u_2), and vice versa d(r_1, u_1) > d(r′_2, u_2 = u) if r′_2 was decoded correctly to u. We can therefore simply decide for the codeword of the RM(1, m) code with the smaller distance to the corresponding “received” word. In case of equality, both were decoded correctly and the results coincide.

Algorithm 5: Unique Decoding of First-Order RM Codes
Input: received word r
1 Split received word into two halves: (r_1, r_2) := r
2 Calculate r_rep := r_1 + r_2
3 Majority decision on the symbols of r_rep gives v′ ∈ C_rep
4 Calculate r′_2 := r_2 − v′
5 Decode r_1 in RM(1, m) and denote the resulting codeword by u_1
6 Decode r′_2 in RM(1, m) and denote the resulting codeword by u_2
7 if d(r_1, u_1) < d(r′_2, u_2) then
8   u′ := u_1
9 else
10  u′ := u_2
Output: Codeword (u′, u′ + v′) of RM(1, m + 1)

7.2 Connection to Hamming and Simplex Codes


In this section, we analyze the connection of first-order RM codes to some
known classes of codes.
First, recall that the Hamming code H(m) is a [2^m − 1, 2^m − m − 1, 3]_2 code and its parity-check matrix H_H(m) contains all 2^m − 1 non-zero binary vectors of length m as columns. The extended Hamming code (see Example 4.16) is a [2^m, 2^m − m − 1, 4]_2 code and defined by the following parity-check matrix:

            ⎛ 1  1 ... 1  1 ⎞
            ⎜             0 ⎟
H_EH(m) =   ⎜   H_H(m)    ⋮ ⎟  = G_RM(1,m).
            ⎝             0 ⎠

Thus, the dual of the RM(1, m) code is the extended Hamming code EH(m).
Second, we define the Simplex code and analyze its connection to first-order
RM codes.

Definition 36 (Simplex Code). A binary Simplex code S(m) is defined by its generator matrix G_S(m), which contains all 2^m − 1 binary non-zero vectors of length m as columns.

From this definition, we see that the Simplex code S(m) is a shortened first-order RM code RM(1, m), where the all-one row and the left-most column (which is all-zero after removing the all-one row, cf. Figure 7.1) were removed from the generator matrix.

[Figure 7.1: G_RM(1,m) consists of an all-one top row and, below it, all 2^m binary columns of length m; deleting the all-one row and the first column leaves G_S(m)]

Figure 7.1: Connection between the RM(1, m) and the S(m) code

Theorem 27 (Parameters of Simplex Code). The Simplex code S(m) is a [2^m − 1, m, 2^{m−1}]_2 code.

Proof. The length and dimension follow from the definition of the generator matrix.

For the minimum distance, note that from the generator matrix of RM(1, m), the all-one row and the left-most column were removed to obtain a generator matrix of the S(m) code. After removing the all-one row, only codewords of RM(1, m) remain that are 0 at the first position, and therefore removing the first column of the generator matrix does not decrease the distance.

Example 7.2 (Simplex code S(3)).

Consider the Simplex code S(3). The following matrix, with m = 3 rows and 2^3 − 1 = 7 columns, is a generator matrix:

         ⎛ 0 0 0 1 1 1 1 ⎞
G_S(3) = ⎜ 0 1 1 0 0 1 1 ⎟ .
         ⎝ 1 0 1 0 1 0 1 ⎠


The weight of all non-zero codewords is four:


Row 1 = (0001111)
Row 2 = (0110011)
Row 3 = (1010101)
Row 1 + Row 2 = (0111100)
Row 1 + Row 3 = (1011010)
Row 2 + Row 3 = (1100110)
Row 1 + Row 2 + Row 3 = (1101001)

Figure 7.2 illustrates the connections of first-order Reed–Muller codes RM(1, m),
Hamming codes H(m), and Simplex codes S(m).

[Figure 7.2: the Reed–Muller code RM(1, m) [2^m, m + 1, 2^{m−1}]_2 is connected (1. dual code) to the extended Hamming code [2^m, 2^m − m − 1, 4]_2 and further (2. puncturing) to the Hamming code H(m) [2^m − 1, 2^m − m − 1, 3]_2 (conversely: 1. extending, 2. dual code); shortening RM(1, m) by one (the first) position gives the Simplex code S(m) [2^m − 1, m, 2^{m−1}]_2 (conversely: lengthening); H(m) and S(m) are dual codes]

Figure 7.2: Illustration of the connection of RM(1, m) with H(m) and S(m)

7.3 Reed–Muller Codes of Higher Order

In the following section, we deal with higher-order Reed–Muller codes and


start with their recursive construction.

Theorem 28 (Recursive Construction of RM Codes). Given an RM(r + 1, m) code and an RM(r, m) code, we can construct an RM(r + 1, m + 1) code of order r + 1 as follows:

RM(r + 1, m + 1) = {(u, u + v) : u ∈ RM(r + 1, m), v ∈ RM(r, m)}.

This gives a [2^{m+1}, Σ_{i=0}^{r+1} (m+1 choose i), 2^{m−r}]_2 code.

The proof is obtained from the following theorem.


Theorem 29 (Recursive Construction, (u, u + v)-Construction). Given an [n, k_u, d_u]_2 code C_u and an [n, k_v, d_v]_2 code C_v, then

C := {(u, u + v) : u ∈ C_u, v ∈ C_v}

is a [2n, k_u + k_v, min{2d_u, d_v}]_2 code.

Proof. The length and the dimension are clear from the construction. We are therefore left with proving the minimum distance.

Let a, b ∈ C with a ≠ b and a = (u, u + v), b = (u′, u′ + v′), where u, u′ ∈ C_u and v, v′ ∈ C_v.

If on the one hand v = v′, we have that d(a, b) = wt(u − u′, u − u′) ≥ 2d_u, and there exist u, u′ such that wt(u − u′, u − u′) = 2d_u.

If on the other hand v ≠ v′:

d(a, b) = wt(u − u′, u + v − u′ − v′) = wt(u − u′) + wt(u + v − u′ − v′)
        ≥ wt(u − u′) + wt(v − v′) − wt(u − u′) = wt(v − v′) ≥ d_v,

where we used that in general wt(x + y) ≥ wt(y) − wt(x) holds. Further, there exists a v ∈ C_v such that wt(0, v) = d_v, which proves equality in the claim on the minimum distance.
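The (u, u+v)-construction is short enough to try exhaustively for tiny codes. The following illustrative sketch builds RM(1, 2) from RM(1, 1) and a length-2 repetition code and confirms the parameters [4, 3, 2]:

# Minimal sketch of the (u, u+v)-construction (Theorem 29) over F_2.
def u_u_plus_v(Cu, Cv):
    return [u + tuple((x + y) % 2 for x, y in zip(u, v))
            for u in Cu for v in Cv]

Cu = [(0, 0), (0, 1), (1, 0), (1, 1)]      # RM(1, 1): the [2, 2, 1] full code
Cv = [(0, 0), (1, 1)]                      # RM(0, 1): the [2, 1, 2] repetition code
rm12 = u_u_plus_v(Cu, Cv)
print(len(rm12))                           # 8 = 2^3 codewords, so k = 3
print(min(sum(c) for c in rm12 if any(c))) # minimum weight 2 = min{2*1, 2}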

Let us apply Theorem 29 to Reed–Muller codes. In particular, let C_u = RM(r + 1, m) and C_v = RM(r, m) as in Theorem 28. Using the recursive construction, this provides a code with the following parameters:

• Length: n = 2 · 2^m = 2^{m+1}.

• Dimension:

k_u + k_v = Σ_{i=0}^{r+1} (m choose i) + Σ_{i=0}^{r} (m choose i)
          = (m choose 0) + Σ_{i=0}^{r} [(m choose i+1) + (m choose i)]
          = (m+1 choose 0) + Σ_{i=0}^{r} (m+1 choose i+1)
          = Σ_{i=0}^{r+1} (m+1 choose i).

• Minimum Distance: d_u = 2^{m−r−1}, d_v = 2^{m−r}, and due to Theorem 29, we obtain d = min{2 · 2^{m−r−1}, 2^{m−r}} = 2^{m−r}.
This gives the proof idea for Theorem 28. The proper proof follows by induction starting with small r and m.

An RM(r, m) code is therefore a [2^m, Σ_{i=0}^{r} (m choose i), 2^{m−r}]_2 code with generator matrix:

G_(u,u+v) = ⎛ G_u  G_u ⎞
            ⎝ 0    G_v ⎠ ,


where G_u and G_v denote generator matrices of C_u and C_v, respectively.

We can start the recursive construction with the RM(1, 1) code (set of all words of length 2) and the RM(0, 1) code (single-parity check code of length 2). A tree showing how to build Reed–Muller codes is given in Figure 7.3.

Note that decoding up to ⌊(d−1)/2⌋ = 2^{m−r−1} − 1 errors can be done recursively, similarly to first-order RM codes, but this is not part of this lecture.

Example 7.3 (Generator Matrix for RM Codes).

We want to find the generator matrix of the RM(2, 4) code. From Theorem 28, we have that

RM(2, 4) = {(u, u + v) : u ∈ RM(2, 3), v ∈ RM(1, 3)}.

Notice that RM(2, 3) is an [8, 7, 2]_2 single-parity check code with known generator matrix.

We can also further decompose:

RM(1, 3) = {(u, u + v) : u ∈ RM(1, 2), v ∈ RM(0, 2)},

where RM(1, 2) is a [4, 3, 2]_2 single-parity check code and RM(0, 2) is a [4, 1, 4]_2 repetition code. Thus,

            ⎛ G_RM(1,2)  G_RM(1,2) ⎞   ⎛ 1 1 1 1  1 1 1 1 ⎞
G_RM(1,3) = ⎜                      ⎟ = ⎜ 0 1 0 1  0 1 0 1 ⎟ .
            ⎝ 0          G_RM(0,2) ⎠   ⎜ 0 0 1 1  0 0 1 1 ⎟
                                       ⎝ 0 0 0 0  1 1 1 1 ⎠

Finally, we obtain:

G_RM(2,4) = ⎛ G_RM(2,3)  G_RM(2,3) ⎞
            ⎝ 0          G_RM(1,3) ⎠ ,

an 11 × 16 binary matrix obtained by inserting a generator matrix of the [8, 7, 2]_2 single-parity check code for G_RM(2,3) and the matrix G_RM(1,3) from above.


[Figure 7.3: tree of Reed–Muller codes built by the recursive construction from Theorem 28. The left-most codes RM(0, m) are repetition codes, the right-most codes RM(m, m) are trivial codes (the full space), and the codes RM(m − 1, m) are single-parity check codes. The parameters per level are:

RM(0, 1) [2, 1, 2]   RM(1, 1) [2, 2, 1]
RM(0, 2) [4, 1, 4]   RM(1, 2) [4, 3, 2]   RM(2, 2) [4, 4, 1]
RM(0, 3) [8, 1, 8]   RM(1, 3) [8, 4, 4]   RM(2, 3) [8, 7, 2]   RM(3, 3) [8, 8, 1]
RM(0, 4) [16, 1, 16]   RM(1, 4) [16, 5, 8]   RM(2, 4) [16, 11, 4]   RM(3, 4) [16, 15, 2]   RM(4, 4) [16, 16, 1]
RM(0, 5) [32, 1, 32]   RM(1, 5) [32, 6, 16]   RM(2, 5) [32, 16, 8]   RM(3, 5) [32, 26, 4]   RM(4, 5) [32, 31, 2]   RM(5, 5) [32, 32, 1]
RM(0, 6) [64, 1, 64]   RM(1, 6) [64, 7, 32]   RM(2, 6) [64, 22, 16]   RM(3, 6) [64, 42, 8]   RM(4, 6) [64, 57, 4]   RM(5, 6) [64, 63, 2]   RM(6, 6) [64, 64, 1]
RM(0, 7) [128, 1, 128]   RM(1, 7) [128, 8, 64]   RM(2, 7) [128, 29, 32]   RM(3, 7) [128, 64, 16]   RM(4, 7) [128, 99, 8]   RM(5, 7) [128, 120, 4]   RM(6, 7) [128, 127, 2]   RM(7, 7) [128, 128, 1] ]

Figure 7.3: Tree of Reed–Muller codes built by the recursive construction from Theorem 28



Chapter 8

Code Concatenation
This chapter deals with concatenated codes. The goal of concatenated codes
is to build long codes (which usually have a better performance, i.e., get
closer to the capacity of the channel) from short codes, but decode the short
codes. This is usually quite efficient as decoding short codes can be done
quickly.
The encoding and decoding process is visualized in Fig. 8.1. Here, encoding
is done first with an encoder for code A and then with an encoder for B.
Decoding is then done vice versa. We will see throughout this chapter that
the encoder of B might actually take several codewords of A to encode them
to one or several codeword(s) of B.
The recommended literature for this chapter is [3, Chapter 9] and [1, Chapter 12].

[Figure 8.1: u → encoder of A → a → encoder of B → c → channel (addition of error) → r → decoder of B → â → decoder of A → û]

Figure 8.1: Encoding and decoding of concatenated codes

The code A is called the outer code and the code B is called the inner code.

8.1 Product Codes

We start with the most straight-forward type of concatenation: product


codes. Each codeword of a product code can be represented by an array
where all rows are codewords from one code and all columns are codewords
from another code.

Definition 37 (Product code A ⊗ B). Let A be an [nA , kA , dA ]q code and B


be an [nB , kB , dB ]q code. The product code A ⊗ B is obtained by encoding
the rows of a kB × kA array by the code A and afterwards each column with
B.

A product code is illustrated in Fig. 8.2. Usually, the codeword, which is encoded as an n_B × n_A matrix, is written as a vector of length n_A n_B.

[Figure 8.2: an n_B × n_A array; the top-left k_B × k_A part contains the information symbols, the top-right part the checks on rows, the bottom-left part the checks on columns, and the bottom-right part the checks on checks]

Figure 8.2: Product code

Theorem 30 (Parameters of Product Code). The product code A ⊗ B is


an [nA nB , kA kB , dA dB ]q code.

Proof. The length and dimension follow from the construction, see also Fig. 8.2.
For the minimum distance, we start by showing that d ≥ dA dB . A non-zero
codeword of the product code contains at least one non-zero row r. This row is
a codeword of A and has weight wt(r) ≥ dA . It follows that at least dA columns
of the product codeword are non-zero. Every such column is a codeword of B.
Hence, the weight of each such column is at least dB . The overall number of
non-zero entries is consequently at least dA dB and thus d ≥ dA dB .
Second, we show that a codeword of weight exactly dA dB exists, so d = dA dB .
We first show this for q = 2. Let a be a minimal-weight (non-zero) codeword of A
and b be a minimal-weight (non-zero) codeword of B. Fix the encoder of B and
let ub be the information word that is mapped to b. Now, choose the information
array such that each row in which ub is non-zero equals the information word
mapped to a. Encoding the rows with A and then each column with the encoder
for B gives wt(a) times the column vector b and thus a codeword of weight
wt(a) wt(b) = dA dB . This is also illustrated in Fig. 8.3.
Now we verify that d = dA dB for q > 2. Let ua ∈ Fq^kA and ub ∈ Fq^kB be
information words that are mapped to minimal-weight (non-zero) codewords of
A and B, respectively. To form a kB × kA information array, we use several
information words from the set {λ · ua : λ ∈ Fq } corresponding to codewords
{λ · a : λ ∈ Fq } from the code A. We choose a proper value of λ for each row such
that after encoding all rows by the encoder of A, the first non-zero column of the
obtained kB × nA matrix is exactly ub . Note that each column in this matrix can
then be represented as µ · ub for some µ ∈ Fq . Encoding each column with the
encoder for B gives wt(a) column vectors from the set {µ · b : µ ∈ F∗q }, and
wt(b) = wt(µ · b) for µ ∈ F∗q . Hence, the resulting codeword has weight
wt(a) wt(b) = dA dB .


(Illustration: a is a minimal-weight (non-zero) codeword of A and b a
minimal-weight (non-zero) codeword of B. With the encoder of B fixed, ub
denotes the information word mapped to b. The information array contains
the information word of a exactly in the rows where ub is non-zero. After
encoding the columns with B, the resulting codeword contains a in exactly
dB rows and has weight wt = dA dB .)

Figure 8.3: Illustration of minimal weight codeword of a product code

We can therefore see each codeword of the product code as an nB × nA
array or alternatively write it as a long vector of length nA · nB . In matrix
representation, every column is a codeword of B and every row is a codeword
of A. It is important to note that the order of encoding (A or B first) does
not play a role, as shown in the following lemma.

Lemma 13 (Encoding Order for Product Code). For a product code as
in Definition 37 where first kB rows of length kA are encoded by A and
then all nA columns by B, all columns are codewords of B and all rows are
codewords of A.

Proof. Let U denote the information array of size kB × kA and let GB and GA
be the generator matrices of the codes B and A, respectively. Encoding in either
order results in the codeword (as a matrix of size nB × nA ) defined by

C = GTB · U · GA . (8.1)

Since matrix multiplication is associative, C = GTB · (U · GA ) = (GTB · U ) · GA :
the first grouping shows that every column of C lies in the column space of GTB ,
i.e., is a codeword of B, and the second that every row lies in the row space of
GA , i.e., is a codeword of A.
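Eq. (8.1) is easy to try out numerically. The following minimal Python sketch
uses small illustrative generator matrices of our own choosing (a [4, 3, 2]2
single-parity check code and a [2, 1, 2]2 repetition code):

import numpy as np

G_A = np.array([[1, 0, 0, 1],
                [0, 1, 0, 1],
                [0, 0, 1, 1]])     # A: [4, 3, 2] single-parity check code
G_B = np.array([[1, 1]])           # B: [2, 1, 2] repetition code

U = np.array([[1, 0, 1]])          # k_B x k_A = 1 x 3 information array

# Eq. (8.1): C = G_B^T * U * G_A over F_2, of size n_B x n_A = 2 x 4
C = (G_B.T @ U @ G_A) % 2
print(C)   # every row is a codeword of A, every column one of B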

Example 8.1 (Product Code).
Let A be an [8, 1, 8]2 repetition code and let B be a [4, 3, 2]2 single-parity
check code. The product code A ⊗ B is a [32, 3, 16]2 code due to Theorem 30.
Encoding is then done by two encoding steps (first each row is encoded by A,
then each column by B): the kB × kA = 3 × 1 information array (u0 , u1 , u2 )T
is encoded to

( u0 u0 u0 u0 u0 u0 u0 u0 )
( u1 u1 u1 u1 u1 u1 u1 u1 )
( u2 u2 u2 u2 u2 u2 u2 u2 )
( p  p  p  p  p  p  p  p  )

with p = u0 + u1 + u2 . In vector notation, this is

(u0 u1 u2 p | . . . | u0 u1 u2 p) ∈ F2^{nA · nB} .

That means, the information vector u = (101) is encoded to the following
codeword of the product code (in matrix notation):

( 1 1 1 1 1 1 1 1 )
( 0 0 0 0 0 0 0 0 )
( 1 1 1 1 1 1 1 1 )
( 0 0 0 0 0 0 0 0 )

We see that each row is a codeword of the repetition code and each column
a codeword of the single-parity check code. In vector notation, the codeword
is the vector (10101010 . . . 1010) of length 32.
Plugging n = 32 and d = 16 into the Gilbert-Varshamov bound

2^(n−k) > Σ_{i=0}^{d−2} (n−1 choose i)

proves the existence of a code with dimension k = 2, and thus the above
product code is not so bad.
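This claim is quickly verified numerically; a minimal sketch:

from math import comb

n, d = 32, 16
for k in (2, 3):
    # GV: an [n, k, d] code exists if 2^(n-k) > sum_{i=0}^{d-2} C(n-1, i)
    exists = 2 ** (n - k) > sum(comb(n - 1, i) for i in range(d - 1))
    print(k, exists)   # k = 2: True, k = 3: False

So the GV bound only guarantees dimension 2 for these parameters, while the
product code achieves dimension 3.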

Example 8.2 (Long Product Code).
Let A and B both be [32, 26, 4]2 codes. Therefore, A ⊗ B is a [1024, 676, 16]2
code. The Gilbert-Varshamov bound for n = 1024 and k = 676 shows the
existence of a [1024, 676, 67] code, which has significantly higher minimum
distance.

From the previous example, we can see that product codes usually do not
have a good minimum distance; however, they are efficiently decodable since
we only have to use the decoders of the short component codes.


8.2 Concatenated Codes

Second, we deal with concatenated codes. Note that product codes can be
seen as a special case.
For code concatenation, we need two codes: an outer code and an inner
code. Let the outer code A be an [nA , kA , dA ]pm code and the inner code B
be an [nB , kB = m, dB ]p code.
Thus, every codeword of A is a vector of length nA over Fpm , where each
symbol can be seen as a vector of length m over Fp .
We then encode each of these nA vectors of length m with B which results
in nA codewords of B, denoted by b(1) , b(2) , . . . , b(nA ) . Figure 8.4 illustrates
this encoding process.

(A length-kA information word over F_{p^m} is encoded with A into a
length-nA codeword over F_{p^m}. Each of the nA symbols is expanded into
a column of m = kB symbols over Fp and encoded with B into a column of
length nB . The number of codewords is (p^m)^kA = p^{m·kA} = p^{kB ·kA}.)

Figure 8.4: Encoding with a concatenated code

The nB ×nA array (b(1) , b(2) , . . . , b(nA ) ) (or equivalently the vector representation
of length nA nB ) is a codeword of the concatenated code C. Similarly, the set
of all vectors (b(1) , b(2) , . . . , b(nA ) ) of length nA nB defines the concatenated
code C.
The previous construction of code concatenation can be generalized to
kB = a · m, for any fixed integer a by taking a codewords of A as information
symbols of the encoder of B. This is basically a generalization of product
codes (for m = 1 and kB = a, we obtain a product code as in the previous
section).
The following theorem states the parameters of concatenated codes.

Theorem 31 (Parameters of Code Concatenation). A concatenated code C
which is built from the outer code A with parameters [nA , kA , dA ]_{p^m} and
the inner code B with parameters [nB , kB = m, dB ]_p has:
• length n = nA nB ,
• dimension k = kA kB (over Fp ),
• minimum distance d ≥ dA dB .

The proof is similar to the one for product codes and therefore omitted here.
Concatenated codes are still “bad” in the sense that there exist codes of
larger dimension (for the same n and d), but due to their structure there are
easy and efficient decoding algorithms. Also, compared to product codes,
their parameters are usually better.

Example 8.3 (Concatenated Code).
Let the outer code A be a [15, 9, 7]_{2^4} Reed–Solomon code and the inner
code B be a [5, 4, 2]2 single-parity check code. The concatenation of A and B
gives a [75, 36, ≥ 14]2 code as shown in Theorem 31.
For the encoding, consider one codeword of the outer code A, for example

( 1 1 1 0 α α^2 α^7 1 0 0 1 α^8 0 1 0 ) ∈ F_{2^4}^{15} .

Expanding each symbol of F_{2^4} to 4 bits (using the table from Example 3.10
and reading from right to left) and encoding each with the [5, 4, 2]2
single-parity check code gives:

( 1 1 1 0 α α^2 α^7 1 0 0 1 α^8 0 1 0 ) ∈ F_{2^4}^{15}

expand:

( 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 )
( 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 )
( 1 1 1 0 0 0 1 1 0 0 1 1 0 1 0 )

encode in B (append a parity row):

( 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 )
( 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 )
( 1 1 1 0 0 0 1 1 0 0 1 1 0 1 0 )
( 1 1 1 0 1 1 1 1 0 0 1 0 0 1 0 )
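The expand-and-encode step is mechanical; a minimal Python sketch (the bit
column (1, 0, 1, 1) for α^7 is taken from the expansion above and stands in
for the table from Example 3.10):

import numpy as np

def encode_inner_spc(symbol_bits: np.ndarray) -> np.ndarray:
    # inner [5, 4, 2]_2 single-parity check code: append one parity bit
    return np.append(symbol_bits, symbol_bits.sum() % 2)

# e.g., the outer symbol alpha^7 expands to the bit column (1, 0, 1, 1)
print(encode_inner_spc(np.array([1, 0, 1, 1])))   # -> [1 0 1 1 1]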

As a comparison, we can construct the following product code: Let A be a
[15, 9, 4]2 code, which is the best known code of length 15 and dimension 9
(see www.codetables.de), and let B be the [5, 4, 2]2 single-parity check code.
This gives a product code with parameters [75, 36, 8]2 .

This shows that concatenated codes usually result in better parameters than
product codes.

8.3 Generalized Concatenated Codes

A further improvement of code concatenation is given by generalized
concatenated codes, defined as follows.


Definition 38 (Generalized Code Concatenation). Let Ai , i = 1, . . . , ℓ, be
a sequence of ℓ outer [nA , kAi , dAi ]_{p^{mi}} codes with

dA1 ≥ dA2 ≥ · · · ≥ dAℓ .

Let Bi , i = 1, . . . , ℓ, be a sequence of ℓ inner [nB , kBi , dBi ]_p codes such that

Bℓ ⊆ · · · ⊆ B2 ⊆ B1 ,

where
• B1 becomes Bi by fixing the Σ_{j=1}^{i−1} mj leftmost information symbols
to zero,
• kB1 = Σ_{j=1}^{ℓ} mj ,
• dB1 ≤ dB2 ≤ · · · ≤ dBℓ .
Encode a codeword a(i) from Ai , i = 1, . . . , ℓ. The columns of

( a(1) )   ( a(1)_0  . . .  a(1)_{nA−1} )
(  ..  ) = (   ..     ..        ..      )
( a(ℓ) )   ( a(ℓ)_0  . . .  a(ℓ)_{nA−1} )

can be seen as vectors of length Σ_{j=1}^{ℓ} mj over Fp which are encoded
with B1 to nA codewords of B1 , denoted by b(1) , b(2) , . . . , b(nA) .
The set of all nB × nA arrays (b(1) , b(2) , . . . , b(nA) ) (or equivalently
represented as vectors of length nA nB ) defines the generalized concatenated
code C.

The encoding of a codeword of a generalized concatenated code is therefore
as follows: each information word of length kAi over F_{p^{mi}} is encoded
with Ai to a codeword a(i) of length nA , which is expanded to an mi × nA
array over Fp . Stacking these arrays gives a kB1 × nA array over Fp with
kB1 = Σ_{i=1}^{ℓ} mi . Finally, each of its nA columns is encoded with B1 ,
which appends nB − kB1 redundancy symbols per column.
COD, TUM Channel Coding


112 Chapter 8 Code Concatenation

The following theorem states the parameters of the generalized concatenated
code.

Theorem 32 (Parameters of Generalized Concatenation). A generalized
concatenated code C built from the outer codes Ai , i = 1, . . . , ℓ, with
parameters [nA , kAi , dAi ]_{p^{mi}} and the inner codes Bi with parameters
[nB , kBi , dBi ]_p and the previously mentioned conditions is a code over Fp
with:
• length n = nA nB ,
• dimension k = Σ_{i=1}^{ℓ} mi kAi (over Fp ),
• minimum distance d ≥ min_{i=1,...,ℓ} dAi dBi .

The length and dimension follow trivially from the construction. The proof
idea for the minimum distance is shown in the following example.

Example 8.4 (Generalized Concatenated Code).
Let the outer codes A1 and A2 be an [8, 1, 8]_{2^3} code and an [8, 4, 4]2 code,
respectively. Thus, m1 = 3 and m2 = 1.
Let the inner code B1 be a [5, 4, 2]2 single-parity check code with generator
matrix

        ( 1 0 0 0 1 )
GB1 =   ( 0 1 0 0 1 )
        ( 0 0 1 0 1 )
        ( 1 1 1 1 0 )

We choose B2 as a [5, 1, 4]2 code with generator matrix GB2 = (11110) and
therefore dB2 = 4. Hence, when fixing the leftmost m1 = 3 information
symbols to zero, B1 becomes B2 . Notice that it is not a good idea to choose
a systematic generator matrix for B1 because in this case fixing (any) three
information symbols to zero will result in a code of minimum distance only
two.
We can therefore encode one symbol of F23 to a codeword of A1 , denoted by
a(1) , and four binary symbols to a codeword of A2 , denoted by a(2) . Writing
the first one as a 3×8 binary matrix and stacking it above the latter one gives
a 4×8 matrix (illustrated below) where we encode each column with B1 . The
resulting 5 × 8 matrix is then the corresponding codeword of the generalized
concatenated code.

Channel Coding COD, TUM


8.3 Generalized Concatenated Codes 113

The information symbol α + α^2 ∈ F_{2^3} is encoded with A1 to
a(1) = (α + α^2 , . . . , α + α^2 ), which expands (m1 = 3) to the binary array

( 1 1 1 1 1 1 1 1 )
( 1 1 1 1 1 1 1 1 )
( 0 0 0 0 0 0 0 0 )

The information word (1010) is encoded with A2 to a(2) = (10101010).
Stacking both gives a 4 × 8 array whose columns are encoded with B1 , e.g.,
(1101) · GB1 for the first column. The resulting 5 × 8 codeword matrix is

( 0 1 0 1 0 1 0 1 )
( 0 1 0 1 0 1 0 1 )
( 1 0 1 0 1 0 1 0 )
( 1 0 1 0 1 0 1 0 )
( 0 0 0 0 0 0 0 0 )

and the code is a [nA · nB = 40, m1 · kA1 + m2 · kA2 = 7,
d ≥ min{8 · 2, 4 · 4} = 16]2 code.

Here it also becomes clear why d = min{dA1 dB1 , dA2 dB2 }. If a(1) = 0, all
rows are codewords of A2 (the first three rows are the all-zero word) and all
columns of the resulting 5 × 8 codeword matrix are codewords of the code B2
(since the first three information symbols when encoding with B1 are zero).
Thus, in this case, we obtain d ≥ dA2 dB2 .
On the other hand, if a(1) ≠ 0, there are at least dA1 non-zero columns which
are all encoded to columns of weight at least dB1 and d ≥ dA1 dB1 .
Thus, this generalized concatenated code is a [40, 7, 16]2 code.
For comparison, by using standard concatenation of an [8, 1, 8]_{2^4} and a
[5, 4, 2]2 code, we obtain only a [40, 4, 16]2 code, which has smaller dimension
for the same distance. Similarly, from an [8, 1, 8]2 and a [5, 4, 2]2 code, we
obtain a [40, 4, 16]2 product code.

A generalized concatenated code often gives better parameters than a “usual”
concatenated code.



Chapter 9

Convolutional Codes
This chapter gives a very brief overview of convolutional codes. Similar
to block codes, the codewords of a convolutional code are obtained from
a mapping that is applied to the information words. However, in theory,
the codewords of a convolutional code are (semi-)infinite and are therefore
called sequences. Convolutional codes are widely used in practice, mostly
due to the Viterbi decoding algorithm which is an easy-to-understand ML
decoding principle that can be done with feasible computational effort.
The recommended literature for this chapter is [3, Chapter 8] and
[6, Chapter 1], where the latter provides an extensive treatment of
convolutional codes.

9.1 Definition and Properties

9.1.1 Convolutional Codes

A convolutional code is a set of infinite sequences of code symbols, which
means that we deal with data streams (sequences) instead of data blocks.
This is suitable for example for streaming applications. Encoding of convolu-
tional codes is very efficient and can be done by a linear shift-register circuit
(with or without feedback). The semi-infinite information sequence i is split
into blocks of length k which are then mapped to blocks of length n. This
mapping is done by a shift register that has k inputs and n outputs.
Therefore, we denote by k the number of inputs to the shift register and by
n the number of outputs of the shift register. At the same time R = k/n
is the rate of the convolutional code. Figure 9.1 shows an example of a
single-input linear shift register.
It is important to note that (in contrast to block codes), in convolutional
codes, not only the i-th code block depends on the i-th information block,
but additionally also the (i + 1)-th, ..., (i + m)-th code block. That means,
the encoder has memory m.

Example 9.1 (Rate-1/2 Encoder).
A rate R = 1/2 convolutional code with memory m = 2 can be generated by a
linear shift register as shown in Figure 9.1.
Note that memory elements are usually initialized with 0.

(The input sequence passes through two memory elements; the upper output
adds the current input and both memory contents, the lower output adds the
current input and the second memory content; a switch multiplexes the two
outputs.)

Figure 9.1: Encoding by linear shift register with k = 1, n = 2

We denote by i the input sequence and by c(1) and c(2) the output sequences.
Let i = (11001000...). Then the upper output is c(1) = (10011110...) since
c_j^(1) = i_j + i_{j−1} + i_{j−2} , and the lower output is c(2) = (11111010...)
since c_j^(2) = i_j + i_{j−2} . These two outputs can be combined (represented
by the switch on the right in Figure 9.1) into one output
c = (1101011111101100).
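This encoder is a one-liner to simulate; a minimal Python sketch (over F2,
memory initialized with zeros, function name our own):

def conv_encode(i_bits):
    # rate-1/2 encoder of Figure 9.1:
    # c1_j = i_j + i_{j-1} + i_{j-2},  c2_j = i_j + i_{j-2}
    s1, s2 = 0, 0                      # memory elements (i_{j-1}, i_{j-2})
    out = []
    for b in i_bits:
        out += [(b + s1 + s2) % 2, (b + s2) % 2]   # MUX the two outputs
        s1, s2 = b, s1                 # shift register update
    return out

print(conv_encode([1, 1, 0, 0, 1, 0, 0, 0]))
# -> [1,1,0,1,0,1,1,1,1,1,1,0,1,1,0,0], i.e., c = (1101011111101100)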

In general, a convolutional encoder maps k parallel input sequences (which
might originate from a single sequence that is split by a demultiplexer
(“DEMUX”) into k parallel sequences) to n parallel output sequences. This
is illustrated in Figure 9.2.

(k parallel input sequences i(1) , . . . , i(k) pass through the shift register
and produce n parallel output sequences c(1) , . . . , c(n) , which are combined
by a MUX.)

Figure 9.2: A linear shift register encoder for a convolutional code with k
inputs and n outputs

We use the following notation for the parameters of convolutional codes:

• Code rate: R := k/n,


• Constraint length for sequence i(j) : νj (the number of memory elements
that sequence i(j) has to pass; νj + 1 is the length of the j-th impulse
response (defined later)),
• Overall constraint length: ν := Σ_{j=1}^{k} νj (the number of memory
elements of the shift register),
• Memory: m := max_{j=1,...,k} (νj ) (the maximum number of memory
elements that any sequence has to pass).

9.1.2 Impulse Response and Generator Matrix


A term that is frequently used in the context of convolutional codes is
the impulse response. The impulse response from input j ∈ {1, . . . , k} to
output i ∈ {1, . . . , n} of a linear shift register is the sequence of length νj +1
(where νj denotes the j-th constraint length) that is obtained by choosing as
input a single one followed by zeros only. Notice that after νj + 1 possibly
non-zero output symbols, the impulse response is zero and therefore we
have to consider only the first νj + 1 output symbols. However, to obtain k
impulse responses of equal length, we fix the length of all impulse responses
to be m + 1 and pad shorter ones by zeros.
We can describe the shift register by a semi-infinite generator matrix such
that c = i · G, where both i and c are semi-infinite sequences.

Theorem 33 (Generator Matrix). Let g_i^(j) = (g_{i,0}^(j) , g_{i,1}^(j) , . . . , g_{i,m}^(j) )
of length m + 1 be the impulse response from input j ∈ {1, . . . , k} to output
i ∈ {1, . . . , n} of the linear shift register which generates the convolutional
code C.
Then, the following matrix G is a semi-infinite generator matrix of C:

    ( G0  G1  . . .  Gm                  )
    (     G0  G1  . . .  Gm              )
G = (         G0  G1  . . .  Gm          )
    (             . . .         . . .    )

where each Gi , i = 0, . . . , m, is the k × n matrix

     ( g_{1,i}^(1)  g_{2,i}^(1)  . . .  g_{n,i}^(1) )
Gi = ( g_{1,i}^(2)  g_{2,i}^(2)  . . .  g_{n,i}^(2) )
     (     ..           ..       . .        ..      )
     ( g_{1,i}^(k)  g_{2,i}^(k)  . . .  g_{n,i}^(k) )

Example 9.2 (Shift Register from Figure 9.1).
Consider Example 9.1. We obtain the impulse response when the input is a
single one, followed by zeros. The memory elements of the shift register are
initialized with zeros as usual.
The two impulse responses from Input 1 to Outputs 1 and 2 are then:
• g_1^(1) = (1, 1, 1),
• g_2^(1) = (1, 0, 1).
And therefore the semi-infinite generator matrix of the convolutional code is:

    ( 11  10  11              )
G = (     11  10  11          )
    (         11  10  11      )
    (             . . .       )

9.1.3 State Diagram

Instead of a shift register, the encoder can be represented as a state diagram
with 2^ν states, where the different states correspond to the contents of the
memory elements. All codewords of the convolutional code are paths in
this state diagram (and vice versa).
An edge between two states is drawn if we can get from one memory
state to the other with a single input step. The edges are labeled with the
length-k input word that enables the transition as well as with the length-n
output block.

Example 9.3 (Shift Register from Figure 9.1).
Consider Example 9.1. The state diagram of this shift register is shown in
Figure 9.3.

(States 00, 01, 10, 11; solid edges correspond to input ij = 1, dashed edges
to input ij = 0, and each edge is labeled with the length-2 output block,
e.g., state 00 goes to state 10 with input 1 and output 11.)

Figure 9.3: State diagram of the shift register from Figure 9.1

Instead of dashing one line, we frequently mark the edges by, e.g.,
ui /(c_i^(1) , c_i^(2) ), i.e., the input bit and then the two output bits.
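The state diagram can be tabulated directly from the update rule of the shift
register; a small sketch for the encoder of Figure 9.1 (helper name our own):

def transition(state, bit):
    # state = (i_{j-1}, i_{j-2}); returns (next state, output block)
    s1, s2 = state
    return (bit, s1), ((bit + s1 + s2) % 2, (bit + s2) % 2)

for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    for bit in (0, 1):
        nxt, out = transition(state, bit)
        print(f"{state} --{bit}/{out}--> {nxt}")   # the edges of Figure 9.3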

9.1.4 Polynomial Representation


It is frequently convenient to represent the semi-infinite sequences by
polynomials. This is done by mapping the sequences to polynomials in F2 [D]:

(c0 , c1 , . . . ) ↦ c0 + c1 D + c2 D^2 + . . . .

With this mapping, we can define the convolutional encoder as follows.
A rate R = k/n binary convolutional code is the row space of a k × n matrix
G(D) over F2 [D] of rank k and degree m. That means that the entries of
the matrix G(D) are binary polynomials of degree at most m.
Therefore, the codewords are encoded by:

c(D) = i(D) · G(D),

where i(D) is in F2^k [D] and c(D) is in F2^n [D].

Example 9.4 (Shift Register from Figure 9.1).
The generator matrix of the previous example is

G(D) = (1 + D + D^2 , 1 + D^2 ).

Given an input sequence i = (110100...) with 3 non-zero bits, its polynomial
representation is

i(D) = 1 + D + D^3 .

The encoded polynomial is calculated by

c(D) = i(D) · G(D)
     = ( (1 + D + D^3 ) · (1 + D + D^2 ), (1 + D + D^3 ) · (1 + D^2 ) )
     = ( 1 + D^4 + D^5 , 1 + D + D^2 + D^5 ).

This corresponds to the two output sequences (100011..., 111001...).
Therefore, the total encoded stream is:

c = (110101001011...).
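The polynomial products are ordinary convolutions with coefficients reduced
modulo 2; a minimal sketch:

import numpy as np

def polymul_f2(a, b):
    # polynomials over F_2 as coefficient vectors in ascending powers of D
    return np.convolve(a, b) % 2

i_poly = np.array([1, 1, 0, 1])          # i(D) = 1 + D + D^3
print(polymul_f2(i_poly, [1, 1, 1]))     # [1 0 0 0 1 1]: 1 + D^4 + D^5
print(polymul_f2(i_poly, [1, 0, 1]))     # [1 1 1 0 0 1]: 1 + D + D^2 + D^5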

9.1.5 Free Distance

For convolutional codes, there are many different distance measures; some
of them grow with the length of the sequence, others do not. In this lecture,
we only consider the so-called free distance.
The free distance df of a convolutional code is the minimum number of
differing symbols in any two (infinitely long) code sequences.
The free distance clearly depends on the memory m of the shift register.
Since we consider only linear convolutional codes, the free distance equals
the minimum weight of any non-zero codeword (sequence).
At first glance, determining the free distance of a convolutional code sounds
complicated, but it can be found by determining the loop through the state
diagram that starts and ends in the zero state and has the smallest (non-zero)
overall edge weight, as shown in the following example.

Example 9.5 (Shift Register from Figure 9.1).
Consider the state diagram from Figure 9.3. The loop with the smallest edge
weight in the state diagram traverses the following states:
• . . . 00
• 10
• 01
• 00 . . .
The output sequence when traversing these states (and therefore the codeword
of minimum weight) is then

. . . 00 11 10 11 00 . . . ,

where clearly any number of zeros can be appended in the beginning and in
the end.
Thus, the free distance is df = 5.
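The loop search can be automated as a shortest-path search over the state
diagram (a Dijkstra-style sketch with the output-block weights as edge costs;
helper names are our own):

from heapq import heappush, heappop

def transition(state, bit):
    s1, s2 = state
    return (bit, s1), ((bit + s1 + s2) % 2, (bit + s2) % 2)

def free_distance():
    heap, best = [(2, (1, 0))], {}   # leave state 00 with input 1 (output 11)
    while heap:
        w, s = heappop(heap)
        if s == (0, 0):
            return w                 # first return to the zero state
        if best.get(s, float("inf")) <= w:
            continue
        best[s] = w
        for bit in (0, 1):
            nxt, out = transition(s, bit)
            heappush(heap, (w + sum(out), nxt))

print(free_distance())   # 5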

9.2 Termination, Truncation & Tailbiting

Theoretically, the codewords of a convolutional code are semi-infinite
sequences. However, in practical applications, we clearly need finite-length
sequences. This section shows three different approaches to obtain
finite-length sequences from a convolutional code, thereby discussing the
advantages and drawbacks of each principle.

9.2.1 Termination

The first principle is termination, defined as follows. Here, the information
sequence is chosen in a special way.

Definition 9.1 (Termination after L Blocks). Add k·m zeros to the information
sequence after L blocks such that the encoder ends in the zero state.

Each terminated codeword therefore has length n(L + m).

The generator matrix of a convolutional code, terminated after L blocks, is
a kL × n(L + m) matrix:

        ( G0  G1  . . .  Gm                      )
        (     G0  G1  . . .  Gm                  )
Gterm = (         . . .           . . .          )
        (             G0  G1  . . .  Gm          )

However, termination results in a rate loss. The code rate of a terminated
convolutional code is Rterm = kL/(n(L + m)) = R · L/(L + m), in contrast
to simply R for semi-infinite convolutional codes.

9.2.2 Truncation

Alternatively, we can simply stop the output of the encoder after L blocks.
This is called truncation.

Definition 9.2 (Truncation after L Blocks). Cut the encoder output after
L output blocks.

Each truncated codeword therefore has length nL.


The generator matrix of a convolutional code, truncated after L blocks, is
a kL × nL matrix:

         ( G0  G1  . . .  Gm                  )
         (     G0  G1  . . .  Gm              )
         (         . . .          . . .       )
Gtrunc = (             G0  G1  . . .  Gm      )
         (                  . . .             )
         (                      G0  G1        )
         (                          G0        )

There is no rate loss (the rate is Rtrunc = R = k/n), but the last information
bits have a worse protection against errors. For example, the last information
block only influences the last codeword block whereas the first information
block influences m + 1 codeword blocks. Thus, if the last codeword block is
erased, we cannot do anything to recover the last information block whereas
the first information block can most likely still be recovered when the first
codeword block is erased.

9.2.3 Tailbiting

The last principle combines the advantages of the previous two approaches:
no rate loss and equal error protection of all blocks. This is called tailbiting.

Definition 9.3 (Tailbiting after L Blocks). The encoder starts and ends
after L blocks in the same state.

Each tailbited codeword has length nL.

The generator matrix of a convolutional code, tailbited after L blocks, is a
kL × nL matrix in which the block diagonals wrap around cyclically:

        ( G0  G1  . . .  Gm                      )
        (     G0  G1  . . .  Gm                  )
        (         . . .           . . .          )
Gtail = (                 G0  G1  . . .  Gm      )
        ( Gm                  G0  . . .  Gm−1    )
        (  . . .                  . . .          )
        ( G1  . . .  Gm                  G0      )

There is no rate loss (the rate is Rtail = R = k/n) and all information
bits are equally protected against errors. However, we have to know all
information bits at the beginning of the encoding (not just the first k bits).
This is in particular difficult when we use convolutional codes in streaming
applications.
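All three finite-length generator matrices can be assembled from the blocks
G0 , . . . , Gm ; a minimal Python sketch for the encoder of Figure 9.1
(k = 1, n = 2, m = 2; the mode strings are our own convention):

import numpy as np

G = [np.array([[1, 1]]), np.array([[1, 0]]), np.array([[1, 1]])]  # G0, G1, G2
k, n, m = 1, 2, len(G) - 1

def generator(L, mode):
    cols = L + m if mode == "term" else L
    M = np.zeros((k * L, n * cols), dtype=int)
    for i in range(L):
        for j, Gj in enumerate(G):
            c = i + j
            if mode == "tail":
                c %= L                       # wrap the blocks around
            elif mode == "trunc" and c >= L:
                continue                     # cut the output after L blocks
            M[i*k:(i+1)*k, c*n:(c+1)*n] ^= Gj
    return M

print(generator(4, "term"))                  # kL x n(L+m) matrix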


9.3 Trellis and Viterbi Decoding

9.3.1 Trellis
The partial trellis is an alternative representation of the state diagram.
Let us introduce the partial trellis by returning to our previous example.

Example 9.6 (Shift Register from Figure 9.1).
Figure 9.4 shows the partial trellis for our previous example. Both the nodes
on the left side and on the right side of the graph represent the four states
of the memory of the shift register. There is an edge between a left node
and a right node if we can go from the left state to the right state by either
inputting a one (solid line) or a zero (dashed line).

(One trellis segment between the four states 00, 01, 10, 11 on the left and on
the right; each edge is labeled with the corresponding length-2 output block.)

Figure 9.4: The partial trellis represents states and transitions, where solid
lines indicate a 1 and dashed lines a 0.

From the example, we can see that the nodes represent the states of the
memory elements of the shift register. The edges represent possible state
transitions (solid line: input 1, dashed line: input 0). The output blocks are
written above/below the states, corresponding to the upper and lower
incoming edge, respectively.
Based on the partial trellis, we can define the trellis. Similar to the state
diagram, all paths through the trellis are codewords and all codewords are
paths of the trellis. The trellis consists of a concatenation of several partial
trellises where each partial trellis corresponds to one information/code block.
The trellis starts in the zero state (as does the encoder, since the memory
elements are initialized with zero) and for each block the possible states on
the left are connected with the states on the right side. After m blocks, the
full partial trellis is used, i.e., every state of the partial trellis is reached.
For the previous example, the trellis is shown in Figure 9.5.

COD, TUM Channel Coding


124 Chapter 9 Convolutional Codes

(Concatenation of partial trellises over the states 00, 01, 10, 11 with the
output blocks on the edges; after m = 2 blocks every state is reached.)

Figure 9.5: Trellis of the previous example

The trellis has a constant number (= 2^ν) of nodes at each time instance.
The paths through the (semi-infinite or terminated or truncated) trellis
represent all codewords of the convolutional code.

9.3.2 Viterbi Decoding

In the following, we describe informally how the trellis can be used to
perform ML decoding of convolutional codes. Given a (usually terminated)
received sequence r, we want to find the code sequence with the smallest
Hamming distance to r. By “walking” through the trellis and discarding
paths that surely have larger Hamming distance than others, we can find
this code sequence efficiently.
This is done via the so-called Viterbi decoding algorithm.
The Viterbi algorithm consists of the following steps:
• Compare the received sequence r with all code sequences by using the
trellis.
• At each step and each node, if there are two incoming paths, decide
for the “more likely one” (called survivor path), remove the other one.
• In the Hamming metric (and hard decision decoding), at time step
i remove the one which has larger Hamming distance (in the first i
blocks) to r.
In each step, the decoding complexity is the same (it does not grow with
the length). This is an important advantage of Viterbi decoding compared
to building a code tree.
Since we compare the received sequence to all code sequences and choose
the most likely one, this algorithm provides an ML decoding strategy.


Example 9.7 (Shift Register from Figure 9.1).
Consider the convolutional code from Example 9.1. Let the received sequence
be r = (10|10|00|00|01|11). Figure 9.6 shows the decoding process of Viterbi
decoding.
The closest code sequence to the received sequence is therefore
ĉ = (11|10|00|01|01|11), which is at distance two from r.

(Trellis with the received blocks written above; at each node there are two
leaving and two incoming edges, and at every merge the survivor path with
the smaller Hamming distance to the received blocks so far is kept, the
other one discarded. The final survivor yields ĉ = (11|10|00|01|01|11).)

Figure 9.6: Viterbi decoding of Example 9.7
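A compact hard-decision sketch of this procedure for the code of Figure 9.1
(our own minimal implementation; terminated sequences are assumed, so the
result is read off at the zero state):

def viterbi(r_blocks):
    INF = float("inf")
    metric = {(0, 0): 0, (0, 1): INF, (1, 0): INF, (1, 1): INF}
    paths = {s: [] for s in metric}
    for r in r_blocks:
        new_metric = {s: INF for s in metric}
        new_paths = {}
        for s, w in metric.items():
            if w == INF:
                continue
            s1, s2 = s
            for bit in (0, 1):
                out = ((bit + s1 + s2) % 2, (bit + s2) % 2)
                nxt = (bit, s1)
                d = (out[0] != r[0]) + (out[1] != r[1])  # Hamming distance
                if w + d < new_metric[nxt]:              # keep the survivor
                    new_metric[nxt] = w + d
                    new_paths[nxt] = paths[s] + [out]
        metric, paths = new_metric, new_paths
    return paths[(0, 0)], metric[(0, 0)]

r = [(1, 0), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1)]
c_hat, dist = viterbi(r)
print(c_hat, dist)   # e.g. the ĉ of Example 9.7, at distance 2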



Bibliography
[1] R. M. Roth, Introduction to Coding Theory, 1st ed. Cambridge
University Press, 2006.
[2] J. Justesen and T. Høholdt, A Course in Error-Correcting Codes, 1st ed.
European Mathematical Society Publishing House, 2004.
[3] M. Bossert, Kanalcodierung, 3rd ed. Oldenbourg, 2013.
[4] M. Bossert, Channel Coding for Telecommunications, 1st ed. Wiley, 1999.
[5] R. E. Blahut, Algebraic Codes for Data Transmission, 2nd ed.
Cambridge University Press, 2003.
[6] R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional
Coding, 2nd ed. Wiley, 1999.
[7] J. H. van Lint, Introduction to Coding Theory, 3rd ed. Springer, 1998.
