Channel Coding

1 Motivation
1.1 References
1.2 Motivation of Channel Coding

3 Finite Fields
3.1 Basic Definitions
3.1.1 Group
3.1.2 Field
3.2 Prime Fields
3.3 Extension Fields
3.4 Polynomials over Finite Fields
3.5 Cyclotomic Cosets and Minimal Polynomials
3.6 Vector Spaces
3.7 Matrix Properties

5 Reed–Solomon Codes
5.1 Definition and Properties
5.1.1 Parity-Check Matrix and Generator Matrix
5.1.2 Definition via Evaluation
5.1.3 Primitive Reed–Solomon Codes
5.1.4 Definition via Discrete Fourier Transform
5.2 Syndrome-Based Unique Decoding
5.2.1 Syndrome Computation
5.2.2 The Key Equation and How to Solve it
5.2.3 Finding the Error Locations
5.2.4 Finding the Error Values
5.2.5 Unique Decoding: Overview
5.3 Interpolation-Based Unique Decoding
5.4 List Decoding
5.4.1 Sudan Algorithm
5.4.2 Idea of Guruswami–Sudan Algorithm

6 Cyclic Codes
6.1 Definition and Properties
6.1.1 Definition
6.1.2 Generator and Parity-Check Polynomials
6.1.3 Generator and Parity-Check Matrix
6.2 BCH Codes
6.2.1 Definition
6.2.2 The BCH Bound
6.2.3 Special BCH Codes
6.2.4 Decoding

7 Reed–Muller Codes
7.1 First-Order Reed–Muller Codes
7.1.1 Definition and Construction
7.1.2 Unique Decoding
7.2 Connection to Hamming and Simplex Codes
7.3 Reed–Muller Codes of Higher Order
Motivation
1.1 References
This lecture is mostly self-contained. For further reading, the main
literature is given below, along with specific references for each chapter.
Main literature:
• R. M. Roth, Introduction to Coding Theory, Cambridge Univ. Press,
2006 [1]
• J. Justesen and T. Høholdt, A Course in Error-Correcting Codes,
European Mathematical Society, Jan. 2004. [2]
• M. Bossert, Kanalcodierung, 3rd ed. Oldenbourg, 2013 [3].
(M. Bossert, Channel Coding for Telecommunications, Wiley, 1999 [4])
Further literature:
• R. E. Blahut, Algebraic Codes for Data Transmission, 1st ed. Cambridge
Univ. Press, 2003 [5]
• R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional
Coding, Wiley, 1999 [6]
• J. H. van Lint, Introduction to Coding Theory, 3rd ed. Springer,
1998 [7]
References for each Chapter:
• Chapter 2 (CC Principles): Bossert, Ch. 1; Roth, Ch. 1
• Chapter 3 (Finite Fields): Justesen–Høholdt, Ch. 2; Roth, Ch. 3
• Chapter 4 (Linear Codes): Roth, Ch. 2
• Chapter 5 (Reed–Solomon Codes):
Definition & Key equation decoding: Roth, Ch. 5.1, 5.2, 6.1, 6.2, 6.3, 6.5;
List decoding: Justesen–Høholdt, Ch. 12
• Chapter 6 (Cyclic Codes): Roth, Ch. 8
• Chapter 7 (Reed–Muller Codes): Bossert, Ch. 5.1, 5.2
• Chapter 8 (Concatenation): Bossert, Ch. 9, Roth, Ch. 12
• Chapter 9 (Convolutional Codes): Bossert, Ch. 8, Johannesson, Ch. 1
(Figure 1.1: Numbers: 3789 → 3759, error undetectable. Words: house → houke, error correctable; house → mouse, error undetectable; house → horse, error undetectable.)
We can therefore say that channel coding deals with the task of encoding
information in a way that the information can be reconstructed in the
presence of errors. We usually assume that a message consists of symbols
of a certain alphabet which are encoded to a longer sequence of symbols
(i.e., a codeword) and transmitted over some channel which distorts some
symbols. To reconstruct (i.e., decode) the original message, the receiver has
to be able to remove distortions from the received word.
Figure 1.1 already provides some insight into properties of “good” codes, which
are discussed in the following.
The transmission of the four-digit numbers has shown that not all words should
be valid codewords if we want to enable error detection or correction. Thus, redundancy
is necessary. Language has some natural redundancy (e.g., “houke” is
no valid “codeword”), but we want to find ways to add redundancy in a
structured way. This will enable us to make statements on how many errors
can always be detected/corrected.
We have also seen that a single error in “house” can result in other words
(“mouse”, “horse”). As it is highly probable in practice that even in a good
channel some errors happen, a few corrupted symbols should not result in
another codeword. That means, any two codewords should have a large
distance.
Finally, one could simply transmit each message multiple times and hope
that at least one of them is received error-free (and can somehow be recognized
as such). This, however, results in a very inefficient scheme. We therefore
require that the redundancy should not be too large which implies that the
code rate (portion of information symbols in all transmitted symbols) is
sufficiently large.
The mentioned properties result in a trade-off between good error-correcting
capability (for which large redundancy is needed) versus high code rate (i.e.,
low redundancy) which is directly related to the efficiency of the scheme.
Some terminology that is used in this lecture should be clarified immediately
and is shown in the following high-level definition.
Figure 2.1 shows the transmission model that is used throughout this lecture.
(Figure 2.1: source → encoder → channel → decoder → sink; the information word u ∈ A^k is encoded to a codeword c ∈ A^n with c ∈ C, the channel outputs r, and the decoder produces ĉ and the corresponding û for the sink.)
For a discrete memoryless channel (DMC), the transition probabilities factorize symbol-wise:
P(r | c) = ∏_{i=0}^{n−1} P(r_i | c_i).
The binary symmetric channel (BSC) and its generalization, the q-ary
symmetric channel (QSC), are channel models where symbol errors happen
with a certain probability.
P(r | c) = 1 − p if r = c, and P(r | c) = p if r ≠ c.
The BSC therefore flips each bit with probability p, independently of the
previous and following bits (DMC). This is illustrated in Figure 2.2.
(Figure 2.2: the BSC maps c = 0 to r = 0 and c = 1 to r = 1, each with probability 1 − p, and flips the bit with probability p.)
The QSC therefore changes each symbol with probability p into another
symbol of the alphabet A, where all “false” symbols occur with equal
probability. The QSC is illustrated in Figure 2.3.
Similar to the BSC, for p = 0 no errors happen and the communication is
completely reliable. For p = (q − 1)/q, all output symbols are equally likely
(they are independent of the channel input, so no information can be transmitted).
(Figure 2.3: the QSC maps each input symbol c = 0, 1, . . . , q − 1 to itself with probability a = 1 − p and to each of the other q − 1 symbols with probability b = p/(q − 1).)
The binary erasure channel (BEC) and its generalization, the q-ary erasure
channel (QEC), are channels where so-called erasures happen. This means,
the receiver knows that at a certain position an error happened but does
not know the error value. This can be modeled as a channel where the size
of the output alphabet is one larger than that of the input alphabet and
contains an additional symbol that denotes an erasure (we use ⊛ in this
lecture).
The BEC therefore erases each symbol with probability ϵ. This is illustrated
in Figure 2.4.
(Figure 2.4: the BEC maps 0 → 0 and 1 → 1 with probability 1 − ϵ each, and maps either input to the erasure symbol ⊛ with probability ϵ.)
(Figure 2.5: the QEC maps each of the q input symbols to itself with probability 1 − ε and to the erasure symbol ⊛ with probability ε.)
(Figure 2.6: the binary symmetric error-erasure channel receives each bit correctly with probability 1 − p − ε, flips it with probability p, and erases it (⊛) with probability ε.)
However, if the decoder obtains an “analog” value (e.g., from the AWGN
channel), it can be used to improve the decoding performance compared to
hard decision decoding since some received analog values are more reliable
than others. This is called soft decision decoding and illustrated in the
following example and Figure 2.7.
(Figure 2.7: the error density p(e_i) and the likelihoods p(r_i | x_i) for channel inputs −1 and +1; two received values r_1 and r_2 with different reliabilities.)
This section deals with basic decoding principles, i.e.: what is the goal of
the (channel) decoder (see the transmission model in Figure 2.1)? Recall
that decoding is usually the hard task for the receiver, whereas mapping ĉ to
û (for the sink) is an easy task due to the bijective mapping between
information words and codewords. In the following,
we analyze different decoding strategies of the decoder from a high level
point of view, without using a special code or explicit decoding algorithms.
The task of the decoder is to guess the channel input (i.e., a codeword)
c = (c0 , c1 , . . . , cn−1 ), given only the channel output r = (r0 , r1 , . . . , rn−1 ).
A maximum a-posteriori (MAP) decoder has the rule that given a received
word r, it decides for the most likely codeword ĉ from a given code C:
ĉ = arg max_{c∈C} P(c | r) = arg max_{c∈C} P(r | c)P(c) / P(r), (2.1)
However, using the MAP decision rule directly as in Example 2.2 is usually
not practically feasible, as we have to go through all possible codewords,
which results in a large decoding complexity.
The maximum likelihood (ML) decoding rule is a special case of the MAP
decoding rule (see (2.1)). For ML decoding, it is assumed that all codewords
are equi-probable and therefore
ĉ = arg max_{c∈C} P(r | c)P(c) = arg max_{c∈C} P(r | c) · (1/|C|) = arg max_{c∈C} P(r | c).
Thus, MAP and ML (if P (c) is the same for all c ∈ C) decoders choose the
most likely codeword and therefore minimize the block error probability.
Example 2.3 also illustrates how knowledge about the statistics of the source
is helpful to make a (good) decision.
Previously, the MAP decoding rule (2.1) was applied to the whole received
word at once. However, it can also be applied to each symbol separately.
This is called symbol-by-symbol MAP. To decide for the symbol ĉi , ∀i ∈
{0, . . . , n − 1}, we decide on the most likely value by summing up the
probabilities of all codewords where the i-th position has that value.
ĉ_i = 0 if ∑_{c∈C: c_i=0} P(r | c)P(c) ≥ ∑_{c∈C: c_i=1} P(r | c)P(c), and ĉ_i = 1 else, (2.3)
for i = 0, . . . , n − 1.
Using the probabilities from Example 2.2:
∑_{c∈C: c_1=0} P(r | c)P(c) = 0.0144 + 0.1296 = 0.144,
∑_{c∈C: c_1=1} P(r | c)P(c) = ∑_{c∈{(01),(11)}} P(r | c)P(c) = 0.1296.
This clearly means that the Hamming distance between two words is equal
to the weight of the difference of the words: d(a, b) = wt(a − b).
Throughout this lecture, we say that a code has minimum Hamming distance d
if it is a set of words where the Hamming distance between any two codewords
is at least d and if there is a pair of words which are at distance d.
Proof. Since the minimum distance between any two codewords is at least d, the
received word r = c + e cannot be a codeword if 1 ≤ wt(e) ≤ d − 1, and therefore
the receiver knows that an error occurred. If wt(e) = 0, no error has occurred
and r = c is a codeword.
Proof. Due to the minimum distance of the code, any two codewords differ by at
least d symbols. Thus, if we erase d − 1 fixed positions in all codewords, any two
codewords still differ by at least one symbol.
Therefore, we can reconstruct c if at most d − 1 erasures have happened.
Proof. The decoding spheres of radius ⌊(d−1)/2⌋ around all codewords do not overlap
(see Figure 2.9). Therefore, for any received r = c + e with wt(e) ≤ ⌊(d−1)/2⌋, correct
decoding is guaranteed by finding the center of the decoding sphere that r lies
in.
where the second equality holds for any memoryless channel, in particular
DMCs. If P(b_i | c_i) = p_w < P(c_i | c_i) = p_c (independent of i), where b_i ∈
A \ {c_i} (for any i and any b_i ≠ c_i), maximizing (2.4) is equivalent to
minimizing d(r, c), and hence:
For the BSC, the statement P(b_i | c_i) < P(c_i | c_i) is equivalent to p < 1/2.
If a received word falls into more than one decoding sphere, a list decoder returns all
possible codewords. Equivalently and as shown in Figure 2.12, we can draw
a sphere of radius τ around the received word r and return all codewords
that lie in this sphere.
A list decoder of radius τ is therefore defined as follows, where
Bτ (r) = {a : d(a, r) ≤ τ }
denotes a sphere of radius τ around r.
The size of the output list is called list size and denoted by ℓ = |L|. The
maximum list size clearly depends on τ. For example, for τ = ⌊(d−1)/2⌋, we
have ℓ = 1. If τ exceeds a certain value, the list size can grow exponentially
in the code length (such a decoder has a huge complexity and is practically
not feasible).
(Figure 2.12: a sphere of radius τ around the received word r; all codewords inside this sphere are returned by the list decoder.)
The block error probability denotes the sum of decoding failure probability
and decoding error probability, i.e.,:
P_block(c) = P_fail + P_err = ∑_{r: Dec(r)≠c} P(r | c),
where Dec(r) denotes the output of the decoder applied to the received
word r.
Since a code of minimum distance d guarantees to decode ⌊(d−1)/2⌋ errors
uniquely, for the BSC, we can give the following upper bound on the block
error probability:
P_block ≤ ∑_{i=⌊(d−1)/2⌋+1}^{n} (n choose i) · p^i · (1 − p)^{n−i}.
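This bound is easy to evaluate numerically; here is a minimal Python sketch (the function name is my own):

```python
from math import comb

def bsc_block_error_bound(n: int, d: int, p: float) -> float:
    """Upper bound on the block error probability over a BSC(p): a bounded-
    minimum-distance decoder can only fail if more than t = floor((d-1)/2)
    of the n symbols are flipped."""
    t = (d - 1) // 2
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

# Example: a [7, 4, 3] Hamming code over a BSC with p = 0.01
print(bsc_block_error_bound(7, 3, 0.01))  # about 0.002
```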
However, this symbol error probability Psym (ui ) depends on the explicit
mapping between information and codewords as shown in Example 2.7,
while the block error probability does not. Therefore, to compare the
performance of different algorithms, it is usually recommended to compare
block error rates.
Finite Fields
As explained in the previous chapter, we consider vectors, matrices, and
elements over finite alphabets. In this chapter, we introduce finite fields
and their properties. It also gives a short summary of vector spaces. Notice
that these lecture notes do not provide a complete overview of finite fields,
but rather focus on properties that are needed in coding theory.
The recommended literature for this chapter is [1, Chapter 3] and [2, Chapter 2].
3.1.1 Group
2. Associativity: (a + b) + c = a + (b + c) mod 4 ✓
   e.g., (1 + 2) + 3 = 6 ≡ 2 mod 4 and 1 + (2 + 3) = 1 + 5 = 6 ≡ 2 mod 4
3. Identity: e = 0 ⟹ a + 0 = 0 + a = a ✓
4. Inverse: a^{−1} = −a = 4 − a mod 4 ✓
   ⟹ a + a^{−1} = a + 4 − a = 4 ≡ 0 mod 4
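For such a small set, the axioms can also be verified exhaustively by machine; a minimal sketch (my own helper) for (Z_n, + mod n):

```python
def is_additive_group_mod(n: int) -> bool:
    """Exhaustively check the group axioms for ({0, ..., n-1}, + mod n)."""
    G = range(n)
    closure = all((a + b) % n in G for a in G for b in G)
    assoc = all(((a + b) % n + c) % n == (a + (b + c) % n) % n
                for a in G for b in G for c in G)
    identity = all((a + 0) % n == a for a in G)
    inverse = all(any((a + b) % n == 0 for b in G) for a in G)
    return closure and assoc and identity and inverse

print(is_additive_group_mod(4))  # True
```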
3.1.2 Field
We continue with the definition of a field which is defined using two operations,
addition + and multiplication ·.
Common examples for fields include the sets of rational numbers Q, real
numbers R, and complex numbers C.
Fp := {0, 1, . . . , p − 1}
together with the operations + mod p and · mod p is a (finite) field with
p elements, called prime field or Galois field.
Proof. The proof of this theorem will be done in the tutorial of the lecture.
Some general properties of finite fields will be given in the following section.
a_0 + a_1x + · · · + a_dx^d, with a_i ∈ F.
The quotient q(x) and the remainder r(x) can be calculated by (standard)
polynomial division.
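A small sketch of this division (my own helper; coefficient lists with the lowest power first, p prime, and b(x) with a non-zero leading coefficient):

```python
def poly_divmod(a, b, p):
    """Divide a(x) by b(x) over F_p; coefficient lists, lowest power first.
    Returns (quotient, remainder) with deg r < deg b."""
    a = a[:]  # work on a copy; it becomes the remainder
    q = [0] * max(len(a) - len(b) + 1, 1)
    inv_lead = pow(b[-1], -1, p)  # inverse of the leading coefficient of b
    for shift in range(len(a) - len(b), -1, -1):
        coef = (a[shift + len(b) - 1] * inv_lead) % p
        q[shift] = coef
        for i, bi in enumerate(b):  # subtract coef * x^shift * b(x)
            a[shift + i] = (a[shift + i] - coef * bi) % p
    return q, a[:len(b) - 1]

# (x^3 + 2x + 1) / (x + 1) over F_3: quotient x^2 + 2x, remainder 1
print(poly_divmod([1, 2, 0, 1], [1, 1], 3))  # ([0, 2, 1], [1])
```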
For the definition of extension fields, we need irreducible polynomials. A
non-constant polynomial f (x) ∈ F[x] is called irreducible in F if it cannot
be written as product of several polynomials of smaller non-zero degree in
F[x]. Irreducible polynomials play a similar role for polynomials as primes
for integers: prime numbers are irreducible integers and integers can be
factorized into their prime factors. Similarly, polynomials can be factorized
into a product of irreducible polynomials. In fact, this factorization is
unique up to permutation and scalar multiples.
f(x) = x^n − 1 = ∏_{i=0}^{n−1} (x − α^i) for some primitive element α ∈ F_{2^n}.
Every finite field (prime and extension fields) has at least one primitive
element α (for a proof see, e.g., [1, Theorem 3.8]). Therefore, multiplying powers
of the primitive element can be done by adding their exponents: α^i · α^j =
α^{(i+j) mod (q−1)}.
• α = 1: 1^i = 1, ∀i ⟹ 1 does not generate F_5^* and is not primitive
• α = 2: 2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8 = 3, 2^4 = 16 = 1
  ⟹ 2 generates F_5^* and is primitive
• α = 3: 3^0 = 1, 3^1 = 3, 3^2 = 9 = 4, 3^3 = 27 = 2
  ⟹ 3 generates F_5^* and is primitive
• α = 4: 4^0 = 1, 4^1 = 4, 4^2 = 16 = 1, 4^3 = 64 = 4
  ⟹ 4 does not generate F_5^* and is not primitive
α = 2, 3 are primitive elements of F5 , whereas α = 1, 4 are not primitive.
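This search can be automated; a minimal sketch for prime fields (my own helper):

```python
def primitive_elements(p: int):
    """All primitive elements of the prime field F_p (p prime): a is primitive
    iff its powers a^1, ..., a^{p-1} run through all p - 1 non-zero elements."""
    result = []
    for a in range(2, p):  # 1 is never primitive for p > 2
        powers = {pow(a, i, p) for i in range(1, p)}
        if len(powers) == p - 1:
            result.append(a)
    return result

print(primitive_elements(5))  # [2, 3], matching the computation above
```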
For any irreducible polynomial f (z), there exists an element α ∈ Fpm such
that f (α) = 0. If α is a primitive element, then f (z) is called a primitive
polynomial.
The following theorem summarizes the most important facts for finite fields.
(Figure: repeated multiplication by an element β cycles through β, β², β³, . . . , β^s = 1, β^{s+1} = β, where s = ord(β); for a primitive element α, the cycle α, α², . . . has length ord(α) = q − 1 and runs through all non-zero field elements.)
polynomials in F[x] modulo (x^b − 1). The difference between these two
modulo operations is shown in the following example.
For the definition of cyclic codes and in particular BCH codes, so-called
cyclotomic cosets and minimal polynomials are needed.
While this lecture only considers base fields of prime order, this is not
necessary and it is possible to use base fields whose order itself is some
prime power. That is, q denotes some prime power p^{m′} and we consider
the extension F_{q^m} of the base field F_q. This general approach is taken in
this section since it allows one to see that BCH codes (in Chapter 6) can be
defined over a prime power field instead of just prime fields. But, as stated
already, the reader can safely assume that the base field is prime in this
lecture and in particular q = p in this section.
C_i := {i · q^j mod n, ∀j = 0, 1, . . . , n_i − 1},
Let m denote the smallest positive integer such that n divides q m − 1 (one
can show that such an m exists iff gcd(n, q) = 1). Properties of cyclotomic
cosets are as follows (given without proof):
• Their size is at most m: |Ci | ≤ m.
• Two cyclotomic cosets are either distinct or identical: Ci ∩ Cj = ∅ or
C i = Cj .
• C0 = {0}.
• ⋃_i C_i = {0, 1, . . . , n − 1}.
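Cyclotomic cosets are generated by repeated multiplication by q modulo n; a minimal sketch (my own helpers):

```python
def cyclotomic_coset(i: int, q: int, n: int) -> set:
    """C_i = {i * q^j mod n : j = 0, 1, ...}; stops when the orbit closes."""
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

def all_cyclotomic_cosets(q: int, n: int):
    """The distinct cyclotomic cosets, which partition {0, ..., n - 1}."""
    seen, cosets = set(), []
    for i in range(n):
        if i not in seen:
            c = cyclotomic_coset(i, q, n)
            cosets.append(c)
            seen |= c
    return cosets

print(cyclotomic_coset(1, 2, 15))        # {1, 2, 4, 8}
print(len(all_cyclotomic_cosets(2, 15))) # 5 distinct cosets
```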
Applying the binomial theorem yields
(x − α^l)^q = x^q − (q choose 1) α^l x^{q−1} + . . . ± α^{lq} = x^q ± α^{lq},
since (q choose i) = 0 in F_{q^s} for 0 < i < q. Second, we have
{α^{jq}}_{j∈C_i} = {α^{iq}, α^{iq²}, . . . , α^{iq^{n_i−1}}, α^{iq^{n_i}} = α^i} = {α^j}_{j∈C_i}.
Third, compare
(m_i(x))^q = ∑_{j=0}^{d} m_{ij}^q x^{jq} with m_i(x^q) = ∑_{j=0}^{d} m_{ij} x^{jq}.
The two expressions are equal if and only if mqij = mij , which is true if and only
if mij ∈ Fq (for the “only if” compare [1, Problem 3.11]).
The previous lemma gives the motivation why we can define (cyclic) codes
over the base field Fq by using minimal polynomials.
where mi (x) ranges over all distinct minimal polynomials of the powers of
α (i.e., all distinct polynomials in the set {mi (x), 0 ≤ i < n}).
Proof. Part (2) follows from Part (1) since all cyclotomic cosets are distinct and
since their union equals {0, . . . , n − 1}.
For proving Part (1), let us explicitly write ∏_{j=0}^{n−1} (x − α^j):
∏_{j=0}^{n−1} (x − α^j) = ∏_{j=0}^{n−1} (−α^j)
+ ∑_{i=0}^{n−1} ∏_{j=0, j≠i}^{n−1} (−α^j) · x
+ ∑_{i_1=0}^{n−1} ∑_{i_2>i_1} ∏_{j=0, j≠i_1,i_2}^{n−1} (−α^j) · x²
+ . . .
+ x^n.
For the coefficient of x^0, we rewrite the coefficient by using the arithmetic series.
First consider n to be odd. Then,
∏_{j=0}^{n−1} (−α^j) = −α^{∑_{j=0}^{n−1} j} = −α^{n(n−1)/2} = −(α^n)^{(n−1)/2} = −1,
where we use the fact that α^n = 1 and that (n−1)/2 is an integer.
Second, let n be even. Then,
∏_{j=0}^{n−1} (−α^j) = ∏_{j=0}^{n−1} α^j = α^{∑_{j=0}^{n−1} j} = α^{n(n−1)/2} = (α^n)^{n/2} · α^{−n/2} = α^{−n/2} = α^{n/2}.
We know that (α^{n/2})² = α^n = 1. The polynomial x² − 1 = 0 has only −1 and 1
as roots and thus, α^{n/2} ∈ {−1, 1}. Assume α^{n/2} = 1, then ord(α) < n, which is a
contradiction, and therefore α^{n/2} = −1 and the coefficient of x^0 equals −1.
For the coefficient of x, we obtain by using the geometric series:
∑_{i=0}^{n−1} ∏_{j=0, j≠i}^{n−1} (−α^j) = ∑_{i=0}^{n−1} α^{−i} = ((α^{−1})^n − 1) / (α^{−1} − 1) = 0,
where the first equality follows by dividing the whole product by ∏_{j=0}^{n−1} α^j = 1 if n is
odd and by −∏_{j=0}^{n−1} α^j = 1 if n is even.
For the coefficient of x², we similarly obtain:
∑_{i_1=0}^{n−1} ∑_{i_2>i_1} ∏_{j=0, j≠i_1,i_2}^{n−1} (−α^j) = −∑_{i_1=0}^{n−1} ∑_{i_2>i_1} α^{−i_1} α^{−i_2}
= −∑_{i_1=0}^{n−1} α^{−i_1} ( ∑_{i_2=0}^{n−1} α^{−i_2} − ∑_{i_2=0}^{i_1} α^{−i_2} )
= −∑_{i_1=0}^{n−1} α^{−i_1} ( ((α^{−1})^n − 1)/(α^{−1} − 1) − ((α^{−1})^{i_1+1} − 1)/(α^{−1} − 1) ),
where the first fraction equals 0 by the geometric series, so this equals
∑_{i_1=0}^{n−1} α^{−i_1} (α^{−(i_1+1)} − 1)/(α^{−1} − 1)
= (α^{−1}/(α^{−1} − 1)) ∑_{i_1=0}^{n−1} α^{−2i_1} − (1/(α^{−1} − 1)) ∑_{i_1=0}^{n−1} α^{−i_1}
= (α^{−1}/(α^{−1} − 1)) · ((α^{−2})^n − 1)/(α^{−2} − 1) − (1/(α^{−1} − 1)) · ((α^{−1})^n − 1)/(α^{−1} − 1) = 0,
where the first equality follows as for the coefficient of x.
Similarly, it can be shown that the coefficients of x³, . . . , x^{n−1} equal zero and
Part (1) of the statement follows.
Instead of the complete technical proof given above, we can explain Lemma 2
as follows. A non-zero polynomial of degree n over a field has at most n
distinct roots. Since ord(α) = n, every element in the set {α^0, α^1, . . . , α^{n−1}}
is a distinct root of x^n − 1, since (α^i)^n = (α^n)^i = 1 for all i. Therefore,
these n elements must be all the n distinct roots of x^n − 1 and the statement
follows.
Some properties of minimal polynomials are therefore as follows:
• deg m_i(x) = |C_i|,
• α ∈ F_{q^m}, but m_i(x) ∈ F_q[x] (see Lemma 1),
• m_i(x) is irreducible in F_q[x],
• m_i(x) | (x^n − 1), since x^n − 1 = ∏_{j=0}^{n−1} (x − α^j) (see Lemma 2).
C0 = {0}
C1 = {1, 2, 4, 8, 16 ≡ 1 mod 15} = {1, 2, 4, 8}
C2 = C1 (since 2 ∈ C1 )
C3 = {3, 6, 12, 24 ≡ 9 mod 15} = {3, 6, 9, 12}
C4 = C1
C5 = {5, 10, 20 ≡ 5 mod 15} = {5, 10}
C6 = C3
C7 = {7, 14, 28 ≡ 13 mod 15, 56 ≡ 11 mod 15} = {7, 11, 13, 14}
m_0(x) = (x − α^0) = x − 1
m_1(x) = (x − α)(x − α²)(x − α⁴)(x − α⁸) = x⁴ + x + 1
m_3(x) = (x − α³)(x − α⁶)(x − α⁹)(x − α^{12}) = x⁴ + x³ + x² + x + 1
m_5(x) = (x − α⁵)(x − α^{10}) = x² + (−α⁵ − α^{10})x + α^{15} = x² + x + 1
m_7(x) = (x − α⁷)(x − α^{11})(x − α^{13})(x − α^{14}) = x⁴ + x³ + 1.
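These products can be checked by machine; a minimal sketch of F_16 arithmetic (my own helpers, representing field elements as 4-bit integers with α = 2, the class of x, modulo x⁴ + x + 1):

```python
MOD = 0b10011  # x^4 + x + 1, primitive polynomial defining F_16

def gf16_mul(a: int, b: int) -> int:
    """Multiply two F_16 elements (4-bit integers) modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def gf16_pow(a: int, e: int) -> int:
    """a^e in F_16 by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf16_mul(r, a)
    return r

def minimal_polynomial(coset, alpha=2):
    """m_i(x) = product over j in C_i of (x - alpha^j) over F_16.
    In characteristic 2, subtraction is addition (XOR). Returns the
    coefficient list, lowest power first; all entries end up in {0, 1}."""
    m = [1]
    for j in coset:
        root = gf16_pow(alpha, j)
        out = [0] * (len(m) + 1)
        for k, c in enumerate(m):        # multiply m(x) by (x + root)
            out[k + 1] ^= c              # x * m(x)
            out[k] ^= gf16_mul(root, c)  # root * m(x)
        m = out
    return m

print(minimal_polynomial({1, 2, 4, 8}))  # [1, 1, 0, 0, 1], i.e. x^4 + x + 1
print(minimal_polynomial({5, 10}))       # [1, 1, 1],       i.e. x^2 + x + 1
```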
This section repeats basic properties of vector spaces, without giving any
claim of completeness. We thereby focus on vector spaces over finite fields,
but most properties hold for any field. A vector space is a set of vectors
that may be added and multiplied by scalars and still remain in the same
vector space.
Formally, a vector space over a field F is a set V together with two operations,
“+” and “·”, that satisfy certain axioms.
The first operation (addition) “+”: V × V → V, takes any two vectors
u, v ∈ V and outputs w = u+v ∈ V. The second operation (multiplication)
“·”: F × V → V, takes any scalar a ∈ F and any vector v ∈ V and outputs
w = av ∈ V.
To actually form a vector space, addition and multiplication have to fulfill
the following axioms (for u, v, w ∈ V and a, b ∈ F):
1. Associativity of addition: u + (v + w) = (u + v) + w,
2. Commutativity of addition: u + v = v + u,
3. Identity element of addition: ∃ an element, called 0, such that u+0 =
u, for all u ∈ V,
The set of all vectors of length n over a field F, together with component-wise addition and scalar multiplication, is a vector space F^n.
Throughout this lecture, we also need some more basic concepts of linear
algebra such as linear (in)dependence. On the one hand, a vector v =
(v0 , v1 , . . . , vn−1 ) ∈ Fn is called linearly dependent of a set of vectors
{u(1) , u(2) , . . . , u(ℓ) } ⊂ Fn if there exist scalars ai ∈ F such that
X
ℓ
v= ai u(i) .
i=1
On the other hand, the vectors {u^{(1)}, u^{(2)}, . . . , u^{(ℓ)}} are linearly independent
if
∑_{i=1}^{ℓ} a_i u^{(i)} = 0
implies that a_i = 0 for all i = 1, . . . , ℓ.
The vectors u^{(1)}, u^{(2)}, . . . , u^{(ℓ)} form a basis of a vector space V if they are
linearly independent and generate V.
Let V be a vector space over F with a basis of ℓ vectors. Then, any set of ℓ
linearly independent vectors in V is also a basis. The integer ℓ is called the
dimension of V.
The vector space that is spanned by the columns of A is called column space
and the space spanned by the rows of A is called row space.
The rank of a matrix A is the dimension of the vector space generated
(or spanned) by its columns (or rows). This corresponds to the maximum
number of linearly independent columns (or rows). It thereby does not
matter if we consider the row or the column space as the row and the
column rank are always equal. Throughout this lecture, we denote it by
rank(A).
An m × n matrix A is said to have full rank if rank(A) = min{m, n}.
A square m × m matrix A is called invertible (or non-singular) if there
exists an m × m matrix A−1 such that
A · A^{−1} = A^{−1} · A = I_m.
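The rank over a finite field can be computed by Gaussian elimination with all arithmetic done mod p; a minimal sketch (my own helper):

```python
def rank_mod_p(A, p):
    """Rank of a matrix (list of rows) over F_p, via Gaussian elimination."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    rank, col = 0, 0
    while rank < rows and col < cols:
        piv = next((r for r in range(rank, rows) if A[r][col] % p), None)
        if piv is None:
            col += 1  # no pivot in this column
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][col] % p, -1, p)
        A[rank] = [x * inv % p for x in A[rank]]
        for r in range(rows):
            if r != rank and A[r][col] % p:
                f = A[r][col]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[rank])]
        rank += 1
        col += 1
    return rank

G = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]  # generator of the [4, 3, 2] SPC code
print(rank_mod_p(G, 2))  # 3, i.e., full rank
```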
For example, consider the [3, 1, 3]2 binary repetition code of length 3:
C = {(000), (111)}.
For example, consider the [4, 3, 2]2 binary SPC code in the following.
Systematic encoding of all possible information words of length 3 is done by
appending a 0 if the weight of the information word is even and a 1 if the
weight of the information word is odd. This is shown in the following table.
u → c
000 → 0000
001 → 0011
010 → 0101
011 → 0110
100 → 1001
101 → 1010
110 → 1100
111 → 1111
We can see that all codewords have even weight, the cardinality is M = 2³ = 8,
the code rate is R = 3/4, and the minimum distance is d = 2.
u → c
00 → 000
01 → 011
10 → 101
11 → 110
The first two symbols in each codeword are the information bits and the last
bit is a parity check bit (can be used for error detection).
Second, a quasi-systematic encoding of the same code is the following:
u → c
00 → 000
01 → 011
10 → 110
11 → 101
Both encoding methods result in the same code. Their difference is that in
the second mapping, the two information bits can be found in the first and
the third position.
A generator matrix is called systematic if it has the form (Ik | A), where
Ik denotes the k × k identity matrix. The codeword is then c = (u | u · A)
where the first k positions equal the information symbols.
Similarly, it is called quasi-systematic if all k unit vectors (i.e., the columns
of I_k) are columns of the generator matrix. For every linear block code, there
exists a quasi-systematic generator matrix.
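A minimal sketch of systematic encoding (assuming numpy; the matrix A is the non-identity part of G):

```python
import numpy as np

def encode_systematic(u, A):
    """Encode with a systematic generator matrix (I_k | A): c = (u | u*A) over F_2."""
    u = np.array(u) % 2
    parity = (u @ np.array(A)) % 2
    return np.concatenate([u, parity])

A = [[1], [1], [1]]  # for the [4, 3, 2] SPC code, A is the all-one column
print(encode_systematic([0, 1, 0], A))  # [0 1 0 1], as in the example below
```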
Consider for example the [4, 3, 2]2 SPC code and the information word u =
(010). The corresponding codeword is then:
c = (010) · [ 1 0 0 1
              0 1 0 1
              0 0 1 1 ] = (0101).
We can see that the first n − 1 columns of G_SPC equal I_{n−1}, which provides
a systematic encoding. The last bit is the parity bit, which is simply the sum
of all information bits. Therefore, the last column is the all-one column.
We can also illustrate that there are several generator matrices for one code.
Let us therefore perform elementary row operations on the previously shown
systematic generator matrix GSPC of the [4, 3, 2]2 SPC-code:
G_SPC = [ 1 0 0 1   (I)
          0 1 0 1   (II)
          0 0 1 1 ] (III)
→ G′_SPC = [ 1 1 0 0   (I + II)
             0 1 0 1   (II)
             1 1 1 1 ] (I + II + III)
The matrix G_SPC is the systematic generator matrix of the SPC code and
the matrix G′_SPC is another generator matrix of the same code, constructed
by elementary row operations on G_SPC.
G = [ 1 1 0 1
      0 0 1 1 ] → G′ = [ 1 1 1 0
                         1 0 0 1 ].
u c c′
00 0000 0000
01 0011 1001
10 1101 1110
11 1110 0111
Definition 23 (Dual Code). Let C be a linear [n, k, d]_q code. Then, the set
of vectors
C^⊥ := {c^⊥ ∈ F_q^n : ⟨c^⊥, c⟩ = 0, ∀ c ∈ C}
Lemma 4 (Parameters of the Dual Code). The dual code C^⊥ of an [n, k, d]_q
code C is a linear [n, k^⊥ = n − k, d^⊥]_q code.
Proof. The length is trivial. The dimension follows from (4.1) as the rank of G
is k and therefore the dimension of the right kernel (and the dimension of the
dual code) is n − k. The distance d⊥ does not necessarily depend on d.
The parity-check matrix is based on the dual code and needed in the decoding
process.
c · H^T = 0,
Proof. The proof of the first part can be done by contradiction: Assume that
there exist δ ≤ d − 1 linearly dependent columns of H. Then there is a non-zero
word c with δ non-zero positions, selecting exactly these columns, such that
c · H^T = 0. This c would be a codeword of weight δ ≤ d − 1 < d, a contradiction.
The parity check equation in the i-th row checks whether the values at the
i-th and the last position of the codeword c are equal. Intuitively, if we look
at the scalar product of the i-th row of HRP and a codeword, we see that the
i-th code symbol is added to the last code symbol. Therefore, by HRP , we
check if all symbols are equal to the last symbol (and hence, all are equal).
If this is true, the syndrome is all-zero.
Note that H_RP = G_SPC and therefore repetition and SPC codes are dual
codes. Hence,
H_SPC = (1 1 . . . 1) = G_RP.
For the parameters of the dual code we have k_SPC = n − k_RP but d_SPC ≠ d_RP.
The syndrome shows that an error occurred and can be used for error detection.
The code class defined in this section, the Hamming code, is the most famous
single-error-correcting code and defined as follows.
Proof. There are 2^m − 1 non-zero vectors of length m, which are used as the columns
of the parity-check matrix. The length of the code is the number of columns of
the parity-check matrix, i.e., n = 2^m − 1.
The parity-check matrix has n − k = m rows, i.e., the dimension of H(m) is
k = 2^m − 1 − m.
Any two columns of a parity-check matrix with pairwise different columns are
linearly independent and there are three columns which are linearly dependent.
Thus, the minimum distance is d = 3.
Observe that any two arbitrary columns are linearly independent and there
exist three linearly dependent columns, e.g.,:
(1, 0, 0)^T + (0, 0, 1)^T = (1, 0, 1)^T.
The standard array is not unique as the order of the non-zero codewords in
the first row can be different. Also, when constructing the other rows, there
might be more than one word e of smallest Hamming weight that has not
appeared before and we can randomly choose one.
The rows of the standard array are called cosets. Two words a and b are
in the same coset if and only if a − b ∈ C. This in turn is equivalent to the
case that they have the same syndrome.
The first word in each row is a minimum weight word in its coset and is
called coset leader.
The actual decoding process is rather easy to explain. Given a received
word r, the task of the decoder is to find an error word ê such that r−ê ∈ C,
i.e., it is a codeword. Thus, ê must be in the same coset (i.e., row of the
standard array) as r. Since ML/nearest codeword decoding means that ê
should have minimum weight, we decide for ê being the coset leader.
As mentioned before, the standard array is not unique. We can, e.g., permute
the rows where the first entry has weight one (rows 2 until 6). We may also
permute the three rightmost columns. It is also possible to use (01100) as
coset leader in the last row.
Decoding:
Given r, find the row (coset) in the standard array which contains r, and let
the decoded error word ê be the coset leader of this row.
Assume r = (01111). This vector is in the fourth row and the third column
(underlined in the above table). The coset leader of this row is (00100). The
decoding therefore decodes for ê = (00100) and outputs ĉ = r − ê = (01011).
This codeword is in the same column as r.
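Syndrome decoding can be sketched directly from this description: precompute a coset-leader table indexed by syndrome, then subtract the leader. The following is a brute-force sketch of my own (feasible only for short codes); the [7, 4, 3] Hamming code below is a stand-in, not the code of the example above:

```python
import itertools
import numpy as np

def syndrome_table(H):
    """Map each syndrome to a minimum-weight coset leader (binary code)."""
    H = np.array(H) % 2
    n = H.shape[1]
    table = {}
    # Enumerate words in order of increasing weight, so the first word that
    # produces a given syndrome is automatically a coset leader.
    for w in range(n + 1):
        for support in itertools.combinations(range(n), w):
            e = np.zeros(n, dtype=int)
            e[list(support)] = 1
            s = tuple((H @ e) % 2)
            table.setdefault(s, e)
    return table

def decode(r, H, table):
    """Syndrome decoding: subtract the coset leader of the coset of r."""
    r = np.array(r) % 2
    s = tuple((np.array(H) @ r) % 2)
    return (r - table[s]) % 2

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]  # parity-check matrix of the [7, 4, 3] Hamming code
tbl = syndrome_table(H)
print(decode([1, 1, 0, 0, 1, 0, 1], H, tbl))  # [0 1 0 0 1 0 1]
```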
the set of all words at Hamming distance at most t, i.e., B_t(a) := {b ∈ F_q^n :
d(a, b) ≤ t}.
Thus, the total number of words in all decoding spheres (left-hand side of (4.2))
is at most the size of the whole space (right-hand side of (4.2)).
Codes which attain (fulfill it with equality) the sphere-packing bound are
called perfect codes.
We can list all linear perfect codes.
1. The set of all words, i.e., Fnq (this is an [n, n, 1]q code).
Since ⌊(d−1)/2⌋ = 0, it can easily be checked that this is a perfect code:
the LHS of (4.2) is 2^n · (n choose 0) = 2^n and therefore equals the RHS.
2. The binary [n, 1, n]2 repetition code for odd n.
For odd length n, ⌊(d−1)/2⌋ = ⌊(n−1)/2⌋ = (n−1)/2.
The LHS of (4.2) therefore equals
2^1 · ∑_{i=0}^{(n−1)/2} (n choose i) = (∗) 2 · (1/2) · ∑_{i=0}^{n} (n choose i) = 2^n,
where (∗) holds because of the symmetry of the binomial coefficients,
i.e., (N choose i) = (N choose N−i).
3. The q-ary Hamming code H(m).
The equality for q-ary Hamming codes can be shown in the same way.
4. The [23, 12, 7]2 Golay code and the [11, 6, 5]3 Golay code.
It can be shown that there are no other linear perfect codes, cf. [1, Page 96].
d ≤ n − k + 1.
Proof. Every linear block code has a quasi-systematic generator matrix, i.e., the
columns of Ik (unit vectors of length k) are columns of G.
If two information vectors differ in only one symbol, the two codewords differ in
at most n − k redundancy symbols and one information symbol. The other k − 1
systematic positions contain the same symbols in both codewords.
Hence, d ≤ n − (k − 1) = n − k + 1.
Codes which attain the Singleton bound are called Maximum Distance
Separable (MDS) codes. The following codes are (amongst others) MDS
codes.
1. The set of all words, i.e., Fnq (this is an [n, n, 1]q code).
2. The [n, 1, n]q repetition code.
3. The [n, n − 1, 2]q single parity-check (SPC) code.
4. The [n, k, n − k + 1]q≥n Reed–Solomon code RS(n, k). This is the
most famous class of MDS codes and can be constructed for all k and
n. The only limitation of Reed–Solomon codes is that the field size
has to be at least in the order of n. We will consider them in detail
in Chapter 5.
Note that there are no other binary MDS codes than the ones contained in
the previous enumeration (and their cosets).
It is possible to correct d − 1 = n − k erasures with an MDS code. That
means, if we know any k symbols of a codeword, we can reconstruct the
other n − k symbols.
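For RS codes this erasure recovery is plain polynomial interpolation: any k evaluation points determine the message polynomial of degree < k. A Lagrange-interpolation sketch over a prime field (assumed setting and helper of my own):

```python
def lagrange_interpolate(points, p):
    """Recover the unique polynomial of degree < k through k points over F_p.
    points: list of (x_i, y_i) pairs; returns coefficients, lowest power first."""
    k = len(points)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1  # L_i(x) = prod_{j != i} (x - x_j)/(x_i - x_j)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for t, c in enumerate(basis):  # multiply basis by (x - x_j)
                new[t + 1] = (new[t + 1] + c) % p
                new[t] = (new[t] - xj * c) % p
            basis = new
            denom = (denom * (xi - xj)) % p
        scale = yi * pow(denom, -1, p) % p
        for t, c in enumerate(basis):
            coeffs[t] = (coeffs[t] + scale * c) % p
    return coeffs

# u(x) = 2 + 3x + x^2 over F_7; any 3 of its evaluations recover it:
print(lagrange_interpolate([(1, 6), (2, 5), (3, 6)], 7))  # [2, 3, 1]
```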
While the sphere-packing and Singleton bounds are upper bounds and
provide necessary conditions for any linear code, the Gilbert–Varshamov
(GV) bound shows the existence of codes with certain parameters, i.e., a
sufficient condition. The binary GV bound is as follows.
(Figure: asymptotic bounds in the (R, δ) plane: the sphere-packing bound for q = 2, 4, 8, the Gilbert–Varshamov bound for q = 2, 4, 8, and the Singleton bound.)
We now want to compare the generator and parity-check matrices of C and
C_s. Let G = (I_k | A) be a systematic generator matrix of a non-degenerate
code C for some A ∈ F_q^{k×(n−k)}. Then Step 1 of the shortening procedure
means that we fix c_0 = u_0 = 0. This is equivalent to removing the first row
of G. Step 2 of the shortening procedure means that also the first column
is removed. This results in a systematic generator matrix of C_s, denoted
by G_s = (I_{k−1} | A_s), where A_s ∈ F_q^{(k−1)×(n−k)} consists of the k − 1 bottom
rows of A.
The parity-check matrix of C_s is then H_s = (−A_s^T | I_{n−k}). This is also
illustrated in Figure 4.3.
(Figure 4.3: with a systematic G = (I_k | A), (0, u_1, u_2, . . . , u_{k−1}) · G = (0, u_1, . . . , u_{k−1}, c_k, . . . , c_{n−1}), and the parity-check matrix is H = (−A^T | I_{n−k}).)
and d = 2. The generator matrix of the lengthened code has the following
structure:
G_l = [ 1 ? ?
        0 1 1 ]
If we choose (111) as first row, then (100) is a codeword and dl = 1 only.
However, with (101) as first row, we obtain a [3, 2, 2]2 code Cl .
Thus, lengthening with the all-one row results in the code C_l, which is a
[4, 3, 2]_2 SPC code.
Conversely, shortening the code C_l by the first position results in turn in the
[3, 2, 2]_2 SPC code C.
(Figure: puncturing a redundancy position: the generator matrix G = (I_k | A) of length n with n − k redundancy columns becomes G_p = (I_k | A_p) of length n − 1 with n − k − 1 redundancy columns.)
The all-one row on top guarantees that all codewords of Ce have even weight.
Reed–Solomon Codes
Reed–Solomon (RS) codes are a powerful class of codes, probably the most
extensively used class of codes in practice. This is due to multiple reasons.
First, RS codes are optimal in the sense that they are MDS codes and attain
the Singleton bound with minimum distance d = n − k + 1. Second, they can
be efficiently encoded and decoded. They were introduced as early as 1960
by Irving Reed and Gustave Solomon (see Figure 5.1).
G_RS := [ 1           1           . . .  1
          α_0         α_1         . . .  α_{n−1}
          ⋮                               ⋮
          α_0^{k−1}   α_1^{k−1}   . . .  α_{n−1}^{k−1} ] · diag(ν′_0, ν′_1, . . . , ν′_{n−1}).   (5.2)
Proof. Rewrite G_RS · H_RS^T = 0 and let a index the a-th row of G_RS and b the
b-th column of H_RS^T. With G_RS as in (5.2) and H_RS^T having the entry ν_j α_j^b in
row j and column b, the product being zero is equivalent to:
∑_{j=0}^{n−1} α_j^a ν′_j ν_j α_j^b = 0, ∀ a = 0, . . . , k − 1 and ∀ b = 0, . . . , n − k − 1
⟺ ∑_{j=0}^{n−1} ν_j ν′_j α_j^{a+b} = 0
⟺ ∑_{j=0}^{n−1} ν_j ν′_j α_j^r = 0, ∀ r = 0, . . . , n − 2. (⋆)
Hence, (ν′_0, ν′_1, . . . , ν′_{n−1}) ∈ RS(n, 1) and wt((ν′_0, ν′_1, . . . , ν′_{n−1})) = n, i.e., the ν′_i's
are all non-zero and determined by (⋆).
(Each codeword symbol is an evaluation u(α_i), and a non-zero u(x) has at most deg u(x) roots.)
That means, ν_i = α_i = α^i.
Proof. Since G_RS · H_RS^T = 0 has to hold, we use from the proof of Theorem 13:
∑_{j=0}^{n−1} ν_j ν′_j α_j^r = 0, ∀ r = 0, . . . , n − 2.
Let α_j = α^j, ν′_j = 1, and ν_j = α_j = α^j, for all j. With the help of the geometric
series ∑_{j=0}^{n−1} a^j = (1 − a^n)/(1 − a) = (a^n − 1)/(a − 1), it follows:
∑_{j=0}^{n−1} α^{j(r+1)} = ∑_{j=0}^{n−1} (α^{r+1})^j = (α^{(r+1)n} − 1)/(α^{r+1} − 1) = 0.
A_ℓ := ∑_{i=0}^{n−1} a_i · e^{−(2πj/n)·iℓ} = ∑_{i=0}^{n−1} a_i · w^{iℓ}, ∀ ℓ = 0, . . . , n − 1,
with the Fourier kernel defined as w = e^{−2πj/n} and j being the imaginary unit
here.
A_j = n^{−1} ∑_{i=0}^{n−1} a_i α^{−ij} = n^{−1} a(α^{−j}), ∀ j = 0, . . . , n − 1.
a_i = ∑_{j=0}^{n−1} A_j α^{ij} = A(α^i), ∀ i = 0, . . . , n − 1.
Proof. We have to prove that A(α^i) = a_i, where A(x) = ∑_{j=0}^{n−1} A_j x^j with
A_j = n^{−1} ∑_{l=0}^{n−1} a_l α^{−lj}. Therefore, we rewrite:
A(α^i) = ∑_{j=0}^{n−1} A_j · (α^i)^j = n^{−1} ∑_{j=0}^{n−1} ∑_{l=0}^{n−1} a_l α^{−lj} · α^{ij} = n^{−1} ∑_{l=0}^{n−1} a_l ∑_{j=0}^{n−1} α^{j(i−l)}.
For l ≠ i, the geometric series gives
∑_{j=0}^{n−1} α^{j(i−l)} = ((α^{i−l})^n − 1)/(α^{i−l} − 1) = 0,
while for l = i the inner sum equals n. Thus,
A(α^i) = n^{−1} ∑_{l=0}^{n−1} a_l ∑_{j=0}^{n−1} α^{j(i−l)} = n^{−1} · a_i · n = a_i.
The definitions via evaluation and via the generator matrix are equivalent and
more general than the one via the DFT, as the DFT-based definition only
works for primitive RS codes.
In this section, we deal with unique decoding of RS codes, i.e., with the
following task. We mainly follow the description of [1, Chapter 6].
Given: received word r (or polynomial r(x)) such that r = c + e, where
wt(e) ≤ ⌊(d−1)/2⌋ and c ∈ RS(n, k).
The first step in the decoding process is the syndrome computation. This is
the easiest step as it consists only of multiplying the received word by the
parity-check matrix. The main goal of this section is to derive an expression
of the syndrome polynomial that is used for the following decoding process.
The task of this decoding step is the following.
s_i = ∑_{j=0}^{n−1} r_j ν_j α_j^i = ∑_{j=0}^{n−1} e_j ν_j α_j^i = ∑_{j∈E} e_j ν_j α_j^i,
S(x) := ∑_{i=0}^{d−2} s_i x^i = ∑_{j∈E} e_j ν_j ∑_{i=0}^{d−2} (α_j x)^i.
Consider the ring F_q[x] modulo x^{d−1} (denoted by F_q[x]/x^{d−1}). That means,
all polynomials have their coefficients in F_q (this is the polynomial ring
F_q[x] that we have considered before) and are calculated modulo x^{d−1}.
Calculating a polynomial modulo x^{d−1} means cutting higher powers, as
shown in Example 3.11.
In F_q[x]/x^{d−1}, the following multiplicative inverse can be calculated:
(1 − α_j x)^{−1} = ∑_{i=0}^{d−2} (α_j x)^i mod x^{d−1}.
In this section, we derive the so-called key equation. This equation provides
a relation between two polynomials that are in turn related to the error
word: the error locator polynomial (ELP) and the error evaluator polynomial
(EEP). The first is related to the error locations while the latter is related to
the error values. The task in this section is to derive a relation between these
two polynomials (defined in the following) and the syndrome polynomial.
The ELP is denoted by Λ(x) and indicates where the error positions are. It
is defined by
Λ(x) := ∏_{i∈E} (1 − α_i x).
Its roots are the α_ℓ^{−1} where ℓ is an erroneous position, i.e., Λ(α_ℓ^{−1}) = 0 ⟺
ℓ ∈ E. Thus, the roots of the ELP tell us where the errors are.
The EEP is denoted by Ω(x) and helps us to determine the error values. It
is defined by
Ω(x) := ∑_{i∈E} e_i ν_i ∏_{j∈E\{i}} (1 − α_j x).
If we insert the roots of the ELP (code locators at the error positions), we
obtain:
∀ ℓ ∈ E: Ω(α_ℓ^{−1}) = e_ℓ ν_ℓ ∏_{j∈E\{ℓ}} (1 − α_j α_ℓ^{−1}) ≠ 0.
Note that Λ(x) and Ω(x) share no common roots, i.e., gcd(Λ(x), Ω(x)) = 1.
The degrees of the ELP and the EEP satisfy:
deg Ω(x) < |E| = deg Λ(x) ≤ ⌊(d−1)/2⌋.
Finally, we can establish a connection between the ELP and the EEP as
follows:
Ω(x) = Λ(x) ∑_{i∈E} e_i ν_i ∏_{j∈E\{i}} (1 − α_j x) / ∏_{j∈E} (1 − α_j x) = Λ(x) ∑_{i∈E} e_i ν_i / (1 − α_i x).
Combining this with the expression of the syndrome polynomial from (5.4)
yields:
Ω(x) = Λ(x)S(x) mod x^{d−1}, where deg Ω(x) < deg Λ(x) ≤ ⌊(d−1)/2⌋. (5.5)
Equation (5.5) forms the so-called key equation for decoding (G)RS codes.
The next decoding step is to solve the key equation for the ELP. Once the
ELP is known, the EEP can be calculated from (5.5).
For t := wt(e) = |E| = deg Λ(x), the polynomial key equation from (5.5) is
equivalent to the following linear system of equations:

[ s_0       0         0        . . .   0         ]                 [ Ω_0     ]
[ s_1       s_0       0        . . .   0         ]   [ Λ_0 = 1 ]   [ Ω_1     ]
[ ⋮         ⋮                          ⋮         ]   [ Λ_1     ]   [ ⋮       ]
[ s_{t−1}   s_{t−2}   . . .    s_0     0         ] · [ ⋮       ] = [ Ω_{t−1} ]   (5.6)
[ s_t       s_{t−1}   . . .    s_1     s_0       ]   [ Λ_t     ]   [ 0       ]
[ s_{t+1}   s_t       . . .    s_2     s_1       ]                 [ 0       ]
[ ⋮         ⋮                          ⋮         ]                 [ ⋮       ]
[ s_{d−2}   s_{d−3}   . . .    s_{d−t−1}  s_{d−t−2} ]              [ 0       ]

where t ≤ ⌊(d−1)/2⌋, Λ(x) = ∑_{i=0}^{t} Λ_i x^i, and Ω(x) = ∑_{i=0}^{t−1} Ω_i x^i.
In principle, the last d − 1 − t ≥ t equations of (5.6) do not depend on Ω(x),
so we can solve them for the coefficients of Λ(x). Once we know Λ(x), we
can use the first t equations to determine Ω(x).
However, the problem with this strategy is that we do not know the actual
value of t. For this purpose, the following lemma is needed. Its proof is due
to Peterson¹.
¹ W. W. Peterson, “Encoding and error-correction procedures for the Bose-Chaudhuri
codes”, IEEE Trans. Inform. Theory, Sep. 1960, pp. 459–470.
Algorithm 1: Find t
Input: syndromes s_0, . . . , s_{d−2}
1 Set ν = ⌊(d−1)/2⌋ and set up S_ν
2 while S_ν is singular do
3   Set ν ← ν − 1
4   Set up S_ν
5 Set t = ν
Output: t

The algorithm for solving the key equation concludes with:
5 Set Λ(x) = 1 + ∑_{i=1}^{t} Λ_i x^i
6 Calculate Ω(x) = Λ(x)S(x) mod x^{d−1}
Output: ELP Λ(x) and EEP Ω(x)
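As a concrete illustration of this shrink-until-nonsingular strategy, here is a minimal Python sketch over a prime field F_p (a simplification of my own, not the lecture's exact algorithm): it tries t = ⌊(d−1)/2⌋, . . . , 1 and solves the homogeneous rows of (5.6) for Λ(x), with s the list of syndromes s_0, . . . , s_{d−2} reduced mod p.

```python
def solve_mod_p(M, b, p):
    """Solve the square system M x = b over F_p; returns None if singular."""
    n = len(M)
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        piv = next((r for r in range(c, n) if A[r][c] % p), None)
        if piv is None:
            return None
        A[c], A[piv] = A[piv], A[c]
        inv = pow(A[c][c], -1, p)
        A[c] = [x * inv % p for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] % p:
                f = A[r][c]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[c])]
    return [A[r][n] for r in range(n)]

def find_elp(s, d, p):
    """Try t = (d-1)//2, ..., 1: rows r = t, ..., 2t-1 of (5.6) read
    sum_{i=0}^{t} s_{r-i} Lambda_i = 0 with Lambda_0 = 1; the first t for
    which the system is non-singular gives Lambda(x)."""
    for t in range((d - 1) // 2, 0, -1):
        M = [[s[r - i] for i in range(1, t + 1)] for r in range(t, 2 * t)]
        rhs = [-s[r] % p for r in range(t, 2 * t)]
        sol = solve_mod_p(M, rhs, p)
        if sol is not None:
            return [1] + sol  # coefficients of Lambda(x), lowest power first
    return [1]  # zero syndrome: no errors located
```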
Theorem 16 (Uniqueness of Solution). For t ≤ ⌊(d−1)/2⌋, the solution (Λ(x), Ω(x))
to the key equation (5.5) is unique.
Given: ELP Λ(x) = ∏_{i∈E} (1 − α_i x).
Task: find the set of error locations E.
To solve this task, we derive the so-called Forney formula for error evaluation.
We use the standard derivative of a polynomial a(x) = ∑_{i=0}^{s} a_i x^i, defined
by
a′(x) := ∑_{i=1}^{s} i a_i x^{i−1}.
By using the standard rule for the derivative of a product of two polynomials,
(a(x)b(x))′ = a′(x)b(x) + a(x)b′(x), the derivative of the ELP is
Λ′(x) = ∑_{ℓ∈E} (−α_ℓ) ∏_{j∈E\{ℓ}} (1 − α_j x).
Plugging (5.10) into (5.11) and solving it for e_i results in Forney's formula
for error evaluation. For all i ∈ E, we can calculate e_i by:
e_i = −(α_i / ν_i) · Ω(α_i^{−1}) / Λ′(α_i^{−1}). (5.12)
From the set of error positions E and the error values ei , we can reconstruct
the error word e = (e0 , e1 , . . . , en−1 ) and therefore the codeword c = r − e.
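A small sketch of (5.12) over a prime field F_p, under the simplifying assumptions ν_i = 1 and α_i = α^i (primitive RS code); elp and eep are the coefficient lists of Λ(x) and Ω(x), lowest power first:

```python
def forney(elp, eep, error_locs, alpha, p):
    """Error values e_i = -alpha_i * Omega(alpha_i^-1) / Lambda'(alpha_i^-1)
    over F_p, assuming nu_i = 1 and code locators alpha_i = alpha^i."""
    def ev(poly, x):  # evaluate a coefficient list at x, mod p
        return sum(c * pow(x, k, p) for k, c in enumerate(poly)) % p
    dlp = [k * c % p for k, c in enumerate(elp)][1:]  # formal derivative Lambda'(x)
    values = {}
    for i in error_locs:
        ai = pow(alpha, i, p)      # code locator alpha_i
        ai_inv = pow(ai, -1, p)
        values[i] = -ai * ev(eep, ai_inv) * pow(ev(dlp, ai_inv), -1, p) % p
    return values
```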
Given: received word r such that r = c + e, where wt(e) ≤ τ := ⌊(d−1)/2⌋
and c ∈ RS(n, k).
Task: Find a bivariate polynomial Q(x, y) = Q_0(x) + Q_1(x) · y such that
Task: Find a bivariate polynomial Q(x, y) = Q0 (x)+Q1 (x)·y such that
• Condition 1: Q(αi , ri ) = 0, ∀i = 0, . . . , n − 1,
• Condition 2: deg Q0 (x) < n − τ ,
• Condition 3: deg Q1 (x) < n − τ − (k − 1).
Theorem 18 (Factorization Step). If c = eval(u(x)) and wt(e) ≤ τ = ⌊(d−1)/2⌋,
then Q(x, u(x)) = 0 and u(x) = −Q_0(x)/Q_1(x).
Proof. The bivariate polynomial Q(x, y) satisfies Q(α_i, u(α_i) + e_i) = 0 (Condition 1).
Since e_i = 0 for at least n − τ positions (the error-free positions), the univariate
polynomial Q(x, u(x)) = Q_0(x) + Q_1(x) · u(x) has at least n − τ roots, namely
the α_i where u(α_i) = r_i.
Any non-zero polynomial with at least b roots has degree at least b; hence, if
Q(x, u(x)) were non-zero, it would have degree at least n − τ.
However, deg Q(x, u(x)) ≤ max{deg Q_0(x), deg Q_1(x) + deg u(x)} < n − τ.
This is a contradiction, and to fulfill both constraints, Q(x, u(x)) = 0.
Hence, Q_0(x) + u(x)Q_1(x) = 0 and u(x) = −Q_0(x)/Q_1(x).
Thus, given Q(x, y), we can simply divide −Q0 (x) by Q1 (x) (by standard
polynomial division) and get the message polynomial u(x).
This decoding principle was introduced by Welch and Berlekamp and is
summarized in the following algorithm.
From the previous theorems, we know that a non-zero solution for Q(x, y)
of the linear system of equations in the Welch–Berlekamp decoder exists
and that u(x) defines the sent codeword if wt(e) ≤ ⌊(d−1)/2⌋.
The interpolation step solves the homogeneous linear system Q(α_i, r_i) = 0,
∀ i = 0, . . . , n − 1, i.e., for every i:
(1, α_i, α_i², . . . , α_i^{n−τ−1}, r_i, r_iα_i, . . . , r_iα_i^{n−τ−(k−1)−1}) ·
(Q_{0,0}, Q_{0,1}, . . . , Q_{0,n−τ−1}, Q_{1,0}, Q_{1,1}, . . . , Q_{1,n−τ−(k−1)−1})^T = 0.

2 Set Q_0(x) = ∑_{i=0}^{n−τ−1} Q_{0,i} x^i and Q_1(x) = ∑_{i=0}^{n−τ−(k−1)−1} Q_{1,i} x^i.
3 Factorization Step: calculate u(x) = −Q_0(x)/Q_1(x)
Output: message word u
such that:
• Condition 1: Q(αi , ri ) = 0, ∀i = 0, . . . , n − 1,
• Condition 2: deg Qj (x) < n − τ − j(k − 1), for all j = 0, . . . , ℓ.
Note that for ℓ = 1 and τ = ⌊(d−1)/2⌋, this is the Welch–Berlekamp interpolation
step.
If
τ < (ℓ/(ℓ+1)) · n − (ℓ/2) · (k − 1), (5.13)
there is at least one non-zero polynomial Q(x, y) which satisfies the previous
conditions.
Note that for reasonable parameters of the RS code, the probability that
ℓ > 1 is usually very small.
Q(αi , ri ) = 0, ∀i = 0, . . . , n − 1
From the previous theorems, we know that a non-zero solution of this system
of equations exists and that one of the u(x)'s defines the sent codeword if
wt(e) < (ℓ/(ℓ+1)) n − (ℓ/2)(k − 1). However, this only works for low code rates R ≲ 1/3.
As we have seen in the previous section, Sudan’s list decoder only increases
the decoding radius for low-rate GRS codes. A further generalization to
higher code rates was later suggested by Guruswami and Sudan.
such that:
• Condition 1: Q(αi , ri ) = 0 with multiplicity s, ∀i = 0, . . . , n − 1,
• Condition 2: deg Qj (x) < s · (n − τ ) − j(k − 1), for all j = 0, . . . , ℓ.
(Figure 5.3: Comparison of unique decoding, the Sudan algorithm with ℓ = 2 and ℓ = 3, and the Guruswami–Sudan algorithm: normalized decoding radius τ/n versus code rate R.)
Cyclic Codes
Cyclic codes provide certain practical advantages in terms of efficiency and
fast implementations as they can be described compactly (by a generator or
parity-check polynomial) and encoding and decoding can be done by means
of shift registers.
The recommended literature for this chapter is [1, Chapter 8].
6.1.1 Definition
An interesting property of codes with several practical advantages is cyclicity,
defined as follows.
In the rest of this chapter we deal only with linear cyclic codes.
For cyclic codes, the polynomial description of all words frequently simplifies
notations. For this purpose, we associate each vector (c_0, c_1, . . . , c_{n−1}) ∈ F_q^n
with a polynomial c(x) := c_0 + c_1x + c_2x² + · · · + c_{n−1}x^{n−1} ∈ F_q[x].
c_{n−1} + c_0x + · · · + c_{n−2}x^{n−1} = x · c(x) − c_{n−1} · (x^n − 1) = x · c(x) mod (x^n − 1).
(Figure 6.1: Connection between the cyclic shift of a codeword c and the multiplication of c(x) by x: multiplication by x shifts (c_0, c_1, . . . , c_{n−1}) to ( , c_0, . . . , c_{n−2}) plus the term c_{n−1}x^n; reduction mod (x^n − 1) wraps c_{n−1} around to the first position, giving (c_{n−1}, c_0, . . . , c_{n−2}).)
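This wrap-around is one line of code; a minimal sketch over F_2 (coefficient list, lowest power first):

```python
def cyclic_shift(c):
    """One cyclic shift, computed as x * c(x) mod (x^n - 1) over F_2:
    every power x^i becomes x^{(i+1) mod n}, so c_{n-1} wraps to position 0."""
    n = len(c)
    out = [0] * n
    for i, ci in enumerate(c):
        out[(i + 1) % n] ^= ci
    return out

print(cyclic_shift([1, 1, 0, 1, 0, 0, 0]))  # [0, 1, 1, 0, 1, 0, 0]
```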
Similar to the generator and parity-check matrices, for cyclic codes, we can
consider a generator polynomial and a parity-check polynomial (cf. [1]).
Proof. First notice that if g(x) exists, it must be a codeword of minimum degree
since g(x) | g(x) and it is unique because it divides all other codewords.
First, we prove (⇐=): select g(x) to be a monic polynomial of smallest degree in
C. Note that it is always possible to choose g(x) monic as we can multiply any
non-monic codeword by a scalar such that it is monic and it is still a codeword
due to linearity of the code. From (6.1), we know that for all u(x) ∈ Fq [x],
it holds that u(x) · g(x) mod (xn − 1) ∈ C. In particular, for every u(x) with
deg u(x) < n − deg g(x) we know u(x) · g(x) ∈ C. Hence, all polynomial multiples
of g(x) are codewords of C.
Second, we prove (=⇒). Let c(x) = u(x) · g(x) + r(x) with deg r(x) < deg g(x).
Since g(x) ∈ C and due to linearity, it follows that r(x) = c(x) − u(x) · g(x) ∈ C.
Since g(x) is a minimum-degree codeword and deg r(x) < deg g(x), we get r(x) =
0, thus g(x) | c(x).
C = {u(x) · g(x) : u(x) ∈ Fq [x] and deg u(x) < k}. (6.2)
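Equation (6.2) translates directly into code: enumerate all u(x) with deg u(x) < k and multiply by g(x). A minimal sketch over F_2 (the generator polynomial below is the standard g(x) = 1 + x + x³ of the cyclic [7, 4, 3] Hamming code):

```python
from itertools import product

def poly_mul_gf2(a, b):
    """Multiply two binary polynomials (coefficient lists, lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

def cyclic_code(g, n):
    """All codewords u(x) * g(x) with deg u(x) < k = n - deg g(x), as in (6.2)."""
    k = n - (len(g) - 1)
    code = []
    for u in product([0, 1], repeat=k):
        c = poly_mul_gf2(list(u), g)
        code.append(c + [0] * (n - len(c)))  # pad with zeros up to length n
    return code

C = cyclic_code([1, 1, 0, 1], 7)  # g(x) = 1 + x + x^3 divides x^7 - 1
print(len(C), min(sum(c) for c in C if any(c)))  # 16 3
```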
Proof. Observe that xn − 1 = h(x) · g(x) + r(x) with deg r(x) < deg g(x), which
means that r(x) = −h(x)g(x) mod (xn − 1). Since g(x) ∈ C and due to the
linearity of C, we have r(x) ∈ C. But since deg r(x) < deg g(x) and g(x) has
minimum degree of all codewords, r(x) = 0. Therefore, g(x) divides xn − 1.
∑_{j=0}^{n−1} g_{(j−i) mod n} · h_{(k+l−j) mod n} = 0, (6.3)
with g_j = 0 ∀ j ∈ {n − k + 1, . . . , n − 1} and h_j = 0 ∀ j ∈ {k + 1, . . . , n − 1}.
g^⊥(x) = x^k · h(x^{−1}) / h(0).
Proof. We see that deg g^⊥(x) = n − k^⊥ = k and deg h^⊥(x) = k^⊥ = n − k. Further,
we construct via Theorem 22 the parity-check matrix H. Since h(0) ≠ 0, H
has full rank. We have:

G^⊥ = H = [ h_k  h_{k−1}  . . .  h_0
                 h_k      h_{k−1}  . . .  h_0
                          ⋱
                          h_k      h_{k−1}  . . .  h_0 ]
        = h(0) · [ g^⊥_0  g^⊥_1  . . .  g^⊥_k
                         g^⊥_0   g^⊥_1  . . .  g^⊥_k
                                 ⋱
                                 g^⊥_0   g^⊥_1  . . .  g^⊥_k ]
Second, we have to prove that g^⊥(x) actually defines a cyclic code, i.e., that it
divides x^n − 1. Let h(x) = ∏_{i=1}^{k} (x − α^{j_i}), where {j_1, j_2, . . . , j_k} ⊂ {0, . . . , n − 1}
and α is an element of order n in the splitting field F_{q^s}. We know that h(x) | (x^n − 1).
Further, h(0) = h_0 = (−1)^k ∏_{i=1}^{k} α^{j_i}.
We obtain:
x^k · h(x^{−1}) / h(0) = (x^k / h_0) ∏_{i=1}^{k} (x^{−1} − α^{j_i})
= (1/h_0) ∏_{i=1}^{k} (1 − α^{j_i} x)
= (∏_{i=1}^{k} α^{j_i} / h_0) ∏_{i=1}^{k} (α^{−j_i} − x)
= (−1)^k ∏_{i=1}^{k} (α^{−j_i} − x)
= ∏_{i=1}^{k} (x − α^{−j_i}).
Since the {−j_1, −j_2, . . . , −j_k} are all distinct, g^⊥(x) = ∏_{i=1}^{k} (x − α^{−j_i}) | (x^n − 1).
• h_SPC(x) = (x^n − 1)/g_SPC(x) = ∏_{i=1}^{n−1} (x − α^i) = 1 + x + x² + · · · + x^{n−1}.
In this example, h(x) = g ⊥ (x) and g(x) = h⊥ (x), but in general this is not
true (see also Example 6.4).
The parity-check matrix of a primitive RS code is
H_RS = [ 1  α         . . .  α^{n−1}
         1  α²        . . .  α^{2(n−1)}
         ⋮                    ⋮
         1  α^{n−k}   . . .  α^{(n−1)(n−k)} ].
The relation H_RS · c^T = 0 implies that for any codeword c(x),
we have c(α) = c(α²) = · · · = c(α^{n−k}) = 0.
First, we prove that a primitive RS code is cyclic. Let c̃(x) := x · c(x) mod (x^n − 1)
denote the cyclic shift of a codeword c(x).
Then, c̃(α^ℓ) = α^ℓ · c(α^ℓ) − c_{n−1}((α^ℓ)^n − 1) = α^ℓ · c(α^ℓ), and therefore also c̃(α^ℓ) = 0
for ℓ = 1, . . . , n − k. Thus, c̃(x) is also a codeword of the RS code and the
primitive RS code is cyclic.
Second, we want to determine the generator polynomial of the primitive
RS code. Since for any codeword, we have c(α) = c(α²) = · · · = c(α^{n−k}) = 0,
it has α^ℓ, ℓ = 1, . . . , n − k, as roots and
g(x) = (x − α) · (x − α²) · · · (x − α^{n−k}).
g^⊥(x) = (x − 1) · (x − α) · · · (x − α^{k−1}).
6.2.1 Definition
For the definition of BCH codes, we need a union of cyclotomic cosets,
denoted by D and called defining set of the code.
The following lemma shows that not all lengths are possible when constructing
a BCH code, in particular the length should be co-prime with the characteristic
of the field.
Proof. n | (q^s − 1) means that there exists k ∈ ℕ such that k · n = q^s − 1.
Moreover, assume now gcd(n, q) = c with c ∈ ℕ. Then there exist a, b ∈ ℕ
such that a · c = q and b · c = n. Replacing n and q accordingly gives
k · bc = (ac)^s − 1
k · bc = a^s c^s − 1
c · (−k · b + a^s c^{s−1}) = 1.
Since k, a, b, c ∈ ℕ, it must hold that c = 1.
In the previous section, we have seen that length and dimension of a BCH
code directly follow from the definition by the defining set. In this subsection,
we bound the minimum distance d of a BCH code constructed by g(x) =
∏_{i∈D} (x − α^i).
Theorem 24 (The BCH Bound). Let C be an [n, k, d]q cyclic code (BCH
code) where n | (q s − 1) and α is an element of order n. Assume that
{b, b + 1, . . . , b + δ − 2} ⊆ D for some integers b and δ ≥ 2. Then d ≥ δ.
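The BCH bound is easy to evaluate once the defining set is known; a minimal sketch (my own helpers) for the binary length-15 code with D = C_1 ∪ C_3:

```python
def cyclotomic_coset(i, q, n):
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

def bch_designed_distance(D, n):
    """Longest run of consecutive exponents (mod n) inside the defining set D;
    by the BCH bound, the code then satisfies d >= run + 1."""
    best = 0
    for b in range(n):
        run = 0
        while (b + run) % n in D and run < n:
            run += 1
        best = max(best, run)
    return best + 1

D = cyclotomic_coset(1, 2, 15) | cyclotomic_coset(3, 2, 15)
print(sorted(D), bch_designed_distance(D, 15))  # [1, 2, 3, 4, 6, 8, 9, 12] 5
```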
Some classes of cyclic codes that we have treated previously can be seen as
special BCH codes.
Recall that codes that satisfy the sphere-packing bound (Section 4.3.1) with
equality are called perfect codes, as the whole space is filled when decoding
spheres of radius ⌊(d−1)/2⌋ are drawn around each codeword.
The following Golay codes are one of the few non-trivial perfect codes.
6.2.4 Decoding
Reed–Muller Codes
The recommended literature for this chapter is [3, Sections 5.1, 5.2].
The length and dimension follow from the construction, the minimum distance
is proved later for the recursive construction (see Theorem 26).
where C_RP = {(0, . . . , 0), (1, . . . , 1)} ⊂ F_2^{2^m} is a repetition code of length 2^m.
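The (u, u + v) recursion translates directly into a generator-matrix construction; a minimal sketch (assuming numpy):

```python
import numpy as np

def rm1_generator(m: int) -> np.ndarray:
    """Generator matrix of RM(1, m), built with the (u, u + v) recursion."""
    if m == 1:
        return np.array([[1, 1], [0, 1]])  # RM(1, 1) = F_2^2
    G = rm1_generator(m - 1)
    half = G.shape[1]
    top = np.hstack([G, G])                       # rows (u, u), u a row of G
    bottom = np.array([[0] * half + [1] * half])  # the new row (0, 1...1)
    return np.vstack([top, bottom])

G = rm1_generator(3)
print(G.shape)  # (4, 8): RM(1, 3) is a [2^3, 3 + 1, 2^2]_2 = [8, 4, 4] code
```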
Proof. We have to prove that this is indeed an RM(1, m + 1) code, i.e., it has
parameters [2m+1 , m + 2, 2m ]2 .
and RM(1, 1) = {(0, 0), (1, 1), (0, 1), (1, 0)}.
Now, let us show that RM(1, m + 1) has 2^{m+2} − 2 codewords of weight 2^m
if RM(1, m) has 2^{m+1} − 2 codewords of weight 2^{m−1}. The codewords from
RM(1, m + 1) are constructed recursively by (u, u + v) as in Theorem 26.
First consider all words that were constructed with v = 0 and u ∉ {0, 1}. (Notice
that u ∈ {0, 1} are the two cases of weight-0 and weight-2^{m+1} words mentioned
before.)
For decoding, we split the received word into two halves: r = (r1 , r2 ) =
(u + e1 , u + v + e2 ).
Note that by simple majority decision, the repetition code C_rep of length 2^m
can always correct ⌊(2^m − 1)/2⌋ = 2^{m−1} − 1 errors uniquely.
2 Calculate r_rep := r_1 + r_2
3 Majority decision on the symbols of r_rep gives v̂ ∈ C_rep
4 Calculate r′_2 := r_2 − v̂
5 Decode r_1 in RM(1, m) and denote the resulting codeword by u_1
6 Decode r′_2 in RM(1, m) and denote the resulting codeword by u_2
7 if d(r_1, u_1) < d(r′_2, u_2) then
8   û := u_1
9 else
10  û := u_2
Output: Codeword (û, û + v̂) of RM(1, m + 1)
Thus, the dual of the RM(1, m) code is the extended Hamming code EH(m).
Second, we define the Simplex code and analyze its connection to first-order
RM codes.
From this definition, we see that the Simplex code S(m) is a shortened
first-order RM code RM(1, m), where the all-one row of the generator matrix
and the left-most column (see Figure 7.1) were removed.
(Figure 7.1: the generator matrix G_RM(1,m) consists of the all-one row (1 1 . . . 1) on top of rows starting with 0; removing the all-one row and the left-most column leaves the generator matrix G_S(m) of the Simplex code.)
Proof. The length and dimension follow from the definition of the generator
matrix.
For the minimum distance, note that from the generator matrix of the RM(1, m),
the all-one row and the left-most column were removed to obtain a generator
matrix of the S(m) code. After removing the all-one row, only codewords of
RM(1, m) remain that are 0 at the first position and therefore removing the first
column of the generator matrix does not decrease the distance.
Figure 7.2 illustrates the connections of first-order Reed–Muller codes RM(1, m),
Hamming codes H(m), and Simplex codes S(m).
(Figure 7.2: Illustration of the connection of the Reed–Muller code RM(1, m) with parameters [2^m, m + 1, 2^{m−1}]_2 with H(m) and S(m) via duality.)
Proof. The length and the dimension are clear from the construction.
We are therefore left with proving the minimum distance.
Let a, b ∈ C with a ≠ b and a = (u, u + v), b = (u′, u′ + v′), where u, u′ ∈ C_u
and v, v′ ∈ C_v.
If on the one hand v = v′, we have that d(a, b) = wt(u − u′, u − u′) ≥ 2d_u, and
there exist u, u′ such that wt(u − u′, u − u′) = 2d_u.
If on the other hand v ≠ v′:
d(a, b) = wt(u − u′) + wt((u − u′) + (v − v′)) ≥ wt(u − u′) + wt(v − v′) − wt(u − u′) = wt(v − v′) ≥ d_v,
where we used that in general it holds that wt(a + b) ≥ wt(b) − wt(a). Further,
there exists a v ∈ C_v such that wt(0, v) = d_v, which proves the equality for the
claim on the minimum distance.
Notice that RM(2, 3) is an [8, 7, 2]2 single-parity check code with known
generator matrix.
We can also further decompose:
Finally, we obtain:
Finally, we obtain:
G_RM(2,4) = [ G_RM(2,3)   G_RM(2,3)
              0           G_RM(1,3) ],
an 11 × 16 binary matrix whose explicit entries follow from the generator
matrices of RM(2, 3) and RM(1, 3).
(Figure: the smallest codes of the recursion, RM(0, 1) = [2, 1, 2] and RM(1, 1) = [2, 2, 1].)
Code Concatenation
This chapter deals with concatenated codes. The goal of concatenated codes
is to build long codes (which usually have a better performance, i.e., get
closer to the capacity of the channel) from short codes, but decode the short
codes. This is usually quite efficient as decoding short codes can be done
quickly.
The encoding and decoding process is visualized in Fig. 8.1. Here, encoding
is done first with an encoder for code A and then with an encoder for B.
Decoding is then done vice versa. We will see throughout this chapter that
the encoder of B might actually take several codewords of A to encode them
to one or several codeword(s) of B.
The recommended literature for this chapter is [3, Chapter 9] and [1, Chapter 12].
The code A is called the outer code and the code B is called the inner code.
(Figure 8.2: a product codeword as an n_B × n_A array: a k_B × k_A block of information symbols, checks on rows, checks on columns, and checks on checks.)
Proof. The length and dimension follow from the construction, see also Fig. 8.2.
For the minimum distance, we start by showing that d ≥ dA dB . A non-zero
codeword of the product code contains at least one non-zero row r. This row is
a codeword of A and has weight wt(r) ≥ dA . It follows that at least dA columns
of the product codeword are non-zero. Every such column is a codeword of B.
Hence, the weight of each such column is at least dB . The overall number of
non-zero entries is consequently at least dA dB and thus d ≥ dA dB .
Second, we want to prove that d = dA dB . We first show this equality for q = 2.
Let a be a minimal-weight (non-zero) codeword of A and b be a minimal-weight
(non-zero) codeword of B. Then, fix the encoder of B and let ub be the information
word that is mapped to b. Now, choose ua such that you obtain a in each row
where ub is non-zero. Encoding each column with the encoder for B will give wt(a)
times the column vector b and thus a codeword of weight wt(a) wt(b) = dA dB .
This is also illustrated in Fig. 8.3.
Now we verify that d = d_A d_B for q > 2. Let u_a ∈ F_q^{k_A} and u_b ∈ F_q^{k_B} be
information words that are mapped to minimal-weight (non-zero) codewords of
A and B, respectively. To form a k_B × k_A information array, we will use several
information words from the set {λ · u_a : λ ∈ F_q} corresponding to codewords
{λ · a : λ ∈ F_q} from the code A. We choose proper values of λ for each row such
that after encoding all rows by the encoder of A, the first non-zero column in the
obtained kB × nA matrix is exactly ub . Note that each column in this matrix can
be represented as µ · ub for some µ ∈ Fq . Encoding each column with the encoder
for B will give wt(a) times the column vectors from the set {µ · b : µ ∈ F∗q }. Note
that wt(b) = wt(µ · b) for µ ∈ F∗q . Hence, the resulting codeword has weight
wt(a) wt(b) = dA dB .
[Figure 8.3: The information array: rows where ub is non-zero encode to (multiples of) a, the remaining rows are zero; a is a minimal-weight (non-zero) codeword of A and b is a minimal-weight (non-zero) codeword of B.]
Proof. Let U denote the information array as a kB × kA matrix, let GB be the
generator matrix of the code B, and let GA be the generator matrix of the code A.
Encoding in either order results in the codeword (as a matrix of size nB × nA)
defined by

C = GB^T · U · GA. (8.1)

Since matrix multiplication is associative, C = GB^T · (U · GA) = (GB^T · U) · GA;
hence all columns of C are codewords of B and all rows are codewords of A.
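This order-independence is easy to check numerically. A minimal sketch with hypothetical toy components (A a [3, 2, 2]_2 single-parity check code, B a [2, 1, 2]_2 repetition code):

import numpy as np

G_A = np.array([[1, 0, 1],
                [0, 1, 1]])    # outer (row) code A: [3, 2, 2] single-parity check
G_B = np.array([[1, 1]])       # inner (column) code B: [2, 1, 2] repetition

U = np.array([[1, 0]])         # k_B x k_A information array

# encode rows first vs. columns first -- Eq. (8.1) says both give the same array
C1 = (G_B.T @ (U @ G_A)) % 2   # rows with A, then columns with B
C2 = ((G_B.T @ U) @ G_A) % 2   # columns with B, then rows with A
assert (C1 == C2).all()
print(C1)                      # every row is in A, every column is in B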
Example: Let A be the [8, 1, 8]_2 repetition code and B be the [4, 3, 2]_2 single-parity
check code. The product code A ⊗ B is a [32, 3, 16]_2 code due to Theorem 30.
Encoding is then done by two encoding steps (first by A, then by B):

[Illustration: the information bits u0, u1, u2 form a kB × kA = 3 × 1 array; encoding each row with A repeats each ui eight times, and encoding each column with B appends the parity row p = u0 + u1 + u2.]
We see that each row is a codeword of the repetition code and each column
is a codeword of the single-parity check code. Reading the array column by
column (e.g., for (u0, u1, u2) = (1, 0, 1), every column equals (1, 0, 1, 0)), the
codeword in vector notation is the vector (10101010 . . . 1010) of length 32.
Plugging n = 32 and d = 16 into the Gilbert–Varshamov bound

$$2^{n-k} > \sum_{i=0}^{d-2} \binom{n-1}{i}$$
proves the existence of a code with dimension k = 2 and thus the above
product code is not so bad.
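For completeness, the arithmetic behind this claim (a throwaway check, not part of the lecture):

from math import comb

n, d = 32, 16
rhs = sum(comb(n - 1, i) for i in range(d - 1))        # sum for i = 0, ..., d-2
k = max(k for k in range(1, n) if 2 ** (n - k) > rhs)  # largest k satisfying the bound
print(k)   # 2: the GV bound only guarantees a [32, 2, 16] code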
From the previous example, we can see that product codes usually do not
have a good minimum distance; however, they are efficiently decodable, as
we just have to use the decoders for the short inner and outer codes.
[Illustration of code concatenation: a length-kA information word over F_{p^m} is encoded with A into a length-nA codeword over F_{p^m}; each symbol is then expanded into m = kB symbols over F_p, and each of the nA resulting columns is encoded with B into a column of length nB over F_p. Note that (p^m)^{kA} = p^{m·kA} = p^{kB·kA}.]
The nB × nA array (b^{(1)}, b^{(2)}, . . . , b^{(nA)}) (or equivalently the vector representation
of length nA nB) is a codeword of the concatenated code C. Similarly, the set
of all vectors (b^{(1)}, b^{(2)}, . . . , b^{(nA)}) of length nA nB defines the concatenated
code C.
The previous construction of code concatenation can be generalized to
kB = a · m, for any fixed integer a by taking a codewords of A as information
symbols of the encoder of B. This is basically a generalization of product
codes (for m = 1 and kB = a, we obtain a product code as in the previous
section).
The following theorem states the parameters of concatenated codes.
• length n = nA nB (over Fp),
• dimension k = kA kB (over Fp),
• minimum distance d ≥ dA dB.
The proof is similar to the one for product codes and therefore omitted here.
Concatenated codes are still “bad” in the sense that there exist codes of
larger dimension (for the same n and d), but due to their structure there are
easy and efficient decoding algorithms. Also, compared to product codes,
their parameters are usually better.
Expanding each symbol of F_{2^4} to 4 bits (using the table from Example 3.10
and reading from right to left) and encoding each with the [5, 4, 2]_2 single-parity
check code gives:

( 1 1 1 0 α α^2 α^7 1 0 0 1 α^8 0 1 0 ) ∈ F_{2^4}^{15}

expand:

0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
1 1 1 0 0 0 1 1 0 0 1 1 0 1 0

encode in B (append the parity row):

1 1 1 0 1 1 1 1 0 0 1 0 0 1 0
This shows that concatenated codes usually result in better parameters than
product codes.
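A minimal sketch of this expand-and-encode step in Python. We represent each F_16 symbol by a 4-bit integer; the concrete bit patterns of the α powers depend on the table from Example 3.10, so the sample input below uses placeholder integer values rather than that table:

import numpy as np

def expand_and_spc_encode(symbols, m=4):
    # expand each F_{2^m} symbol (an integer 0 .. 2^m - 1) into m bits,
    # one column per symbol, then append a parity row ([m+1, m, 2] SPC code)
    bits = np.array([[(s >> j) & 1 for s in symbols] for j in range(m)])
    parity = bits.sum(axis=0) % 2
    return np.vstack([bits, parity])

# placeholder values standing in for (1 1 1 0 α α^2 α^7 1 0 0 1 α^8 0 1 0)
cw = expand_and_spc_encode([1, 1, 1, 0, 2, 4, 11, 1, 0, 0, 1, 5, 0, 1, 0])
print(cw.shape)   # (5, 15): the n_B x n_A binary codeword array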
Bℓ ⊆ · · · ⊆ B2 ⊆ B1,

where

• B1 becomes Bi by fixing the Σ_{j=1}^{i−1} mj leftmost information symbols to
zero,
• kB1 = Σ_{j=1}^{ℓ} mj,
• dB1 ≤ dB2 ≤ · · · ≤ dBℓ.
Encode a codeword a^{(i)} from Ai, for i = 1, . . . , ℓ. The columns of

$$\begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(\ell)} \end{pmatrix} = \begin{pmatrix} a_0^{(1)} & \dots & a_{n_A-1}^{(1)} \\ \vdots & \ddots & \vdots \\ a_0^{(\ell)} & \dots & a_{n_A-1}^{(\ell)} \end{pmatrix}$$

can be seen as vectors of length Σ_{j=1}^{ℓ} mj over Fp which are encoded with B1
into nA codewords of B1, denoted by b^{(1)}, b^{(2)}, . . . , b^{(nA)}.
The set of all nB × nA arrays (b^{(1)}, b^{(2)}, . . . , b^{(nA)}) (or equivalently represented
as vectors of length nA nB) defines the generalized concatenated code C.
[Illustration of generalized code concatenation: each codeword a^{(i)} of Ai is a row over F_{p^{m_i}}; the rows are stacked, every symbol is expanded over Fp, and each of the nA columns is encoded with B1, which adds nB − kB1 check symbols per column.]
The length and dimension follow trivially from the construction. The proof
idea for the minimum distance is shown in the following example.
[Example: kA1 = 1, nA = 8: the A1 codeword is (α + α^2, α + α^2, . . . , α + α^2); kA2 = 4, nA = 8: the A2 codeword is (1 0 1 0 1 0 1 0), obtained from the information (1 0 1 0). Each column of the stacked and expanded array, e.g. (1 1 0 1), is encoded with B1 as (1 1 0 1) · GB1, giving a 5 × 8 binary codeword array. The resulting code is a [nA · nB = 40, m1 · kA1 + m2 · kA2 = 7, d ≥ min{8 · 2, 4 · 4} = 16]_2 code.]
Here it also becomes clear why d = min{dA1 dB1, dA2 dB2}. If a^{(1)} = 0, all
rows are codewords of A2 (the first three rows are the all-zero word) and all
columns of the resulting 5 × 8 codeword matrix are codewords of the code B2
(since the first three information symbols when encoding with B1 are zero).
Thus, in this case, we obtain d ≥ dA2 dB2.
On the other hand, if a^{(1)} ≠ 0, there are at least dA1 non-zero columns which
are all encoded to columns of weight at least dB1, and d ≥ dA1 dB1.
Thus, this generalized concatenated code is a [40, 7, 16]_2 code.
For comparison, by using standard concatenation of an [8, 1, 8]_{2^4} and a [5, 4, 2]_2
code, we obtain only a [40, 4, 16]_2 code, which has smaller dimension for the
same distance. Similarly, from an [8, 1, 8]_2 and a [5, 4, 2]_2 code, we obtain a
[40, 4, 16]_2 product code.
Convolutional Codes
This chapter gives a very brief overview of convolutional codes. Similar
to block codes, the codewords of a convolutional code are obtained from
a mapping that is applied to the information words. However, in theory,
the codewords of a convolutional code are (semi-)infinite and are therefore
called sequences. Convolutional codes are widely used in practice, mostly
due to the Viterbi decoding algorithm which is an easy-to-understand ML
decoding principle that can be done with feasible computational effort.
The recommended literature for this chapter is [3, Chapter 8] and
[6, Chapter 1], where the latter one provides extensive treatment of convolu-
tional codes.
Figure 9.1: Encoding by linear shift register with k = 1, n = 2
We denote by i the input sequence and by c^{(1)} and c^{(2)} the output sequences.
Let i = (11001000 . . .); then for the upper output, c^{(1)} = (10011110 . . .), since
c_j^{(1)} = i_j + i_{j−1} + i_{j−2}. For the lower output, c^{(2)} = (11111010 . . .), since
c_j^{(2)} = i_j + i_{j−2}. These two outputs can be combined (represented by the
switch on the right in Figure 9.1) into one output c = (1101011111101100).
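The shift-register encoding is easy to reproduce in a few lines of Python (a sketch with our own function name, specialized to the encoder of Figure 9.1):

def conv_encode(info):
    # rate-1/2 encoder with memory 2: c^(1)_j = i_j + i_{j-1} + i_{j-2},
    # c^(2)_j = i_j + i_{j-2}; the two outputs are multiplexed blockwise
    s1 = s2 = 0                  # memory elements, initialized with zero
    out = []
    for i in info:
        out += [i ^ s1 ^ s2, i ^ s2]
        s1, s2 = i, s1           # shift the register
    return out

c = conv_encode([1, 1, 0, 0, 1, 0, 0, 0])
print(''.join(map(str, c)))      # 1101011111101100, matching the example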
Figure 9.2: A linear shift register encoder for a convolutional code with k
inputs and n outputs
• Code rate: R := k/n.
Figure 9.3: State diagram of the shift register from Figure 9.1
Instead of dashing one line, we frequently mark the edges by, e.g., u_i/(c_i^{(1)}, c_i^{(2)}),
i.e., the input bit and then the two output bits.
G(D) = (1 + D + D^2, 1 + D^2).

Let i = (1101000 . . .), whose polynomial representation is

i(D) = 1 + D + D^3.

The encoded polynomial is calculated by

c(D) = i(D) · G(D) = (1 + D^4 + D^5, 1 + D + D^2 + D^5),

i.e., after multiplexing the two outputs, c = (110101001011 . . .).
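The same computation as polynomial multiplication over F_2 (a small self-contained sketch; the helper name is ours):

def poly_mul_f2(a, b):
    # multiply two polynomials over F_2, coefficient lists, lowest degree first
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] ^= ai & bj
    return res

i_poly = [1, 1, 0, 1]                  # i(D) = 1 + D + D^3
c1 = poly_mul_f2(i_poly, [1, 1, 1])    # i(D) (1 + D + D^2) = 1 + D^4 + D^5
c2 = poly_mul_f2(i_poly, [1, 0, 1])    # i(D) (1 + D^2)     = 1 + D + D^2 + D^5
print(c1, c2)   # interleaving the two outputs gives c = (110101001011)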
For convolutional codes, there are many different distance measures, some
of them grow with the length of the sequence, others do not. In this lecture,
we only consider the so-called free distance.
The free distance df of a convolutional code is the minimum number of
differing symbols in any two (infinitely long) code sequences.
The free distance clearly depends on the memory m of the shift register.
Since we consider only linear convolutional codes, the free distance equals
the minimum weight of any non-zero codeword (sequence).
At first glance, determining the free distance of a convolutional code sounds
complicated, but it can be found by determining the loop starting (and ending)
in the zero state with the smallest (non-zero) edge weight in the state diagram,
as shown in the following example.

. . . 0011101100 . . . ,

where clearly any number of zeros can be appended at the beginning and at
the end. This code sequence has weight 5, so the free distance of our example
code is df = 5.
9.2.1 Termination
Definition 9.1 (Termination after L Blocks). Add k·m zeros to the information
sequence after L blocks such that the encoder ends in the zero state.
This causes a small rate loss: only L out of L + m output blocks carry
information, i.e., Rterm = (L · k)/((L + m) · n) < R.
9.2.2 Truncation
Alternatively, we can simply stop the output of the encoder after L blocks.
This is called truncation.
Definition 9.2 (Truncation after L Blocks). Cut the encoder output after
L output blocks.
There is no rate loss (the rate is Rtrunc = R = k/n), but the last information
bits have a worse protection against errors. For example, the last information
block only influences the last codeword block whereas the first information
block influences m + 1 codeword blocks. Thus, if the last codeword block is
erased, we cannot do anything to recover the last information block whereas
the first information block can most likely still be recovered when the first
codeword block is erased.
9.2.3 Tailbiting
The last principle combines the advantages of the previous two approaches:
no rate loss and equal error protection of all blocks. This is called tailbiting.
Definition 9.3 (Tailbiting after L Blocks). The encoder starts and ends
after L blocks in the same state.
9.3.1 Trellis
The partial trellis is an alternative representation of the state diagram.
Let us introduce the partial trellis by returning to our previous example.
Figure 9.4: The partial trellis represents states and transitions, where solid
lines indicate a 1 and dashed lines a 0.
From the example, we can see that the nodes represent the states of the
memory elements of the shift register. The paths represent possible state
transitions (solid line: 1 as input, dashed line: 0 as input).
The output blocks are written above/below the states corresponding to the
upper and lower incoming edge, respectively.
Based on the partial trellis, we can define the trellis. Similar to the state
diagram, all paths through the trellis are codewords and all codewords are
paths of the trellis. The trellis consists of a concatenation of several partial
trellises, where each partial trellis corresponds to one information/code block.
The trellis starts in the zero state (as does the encoder, since the memory
elements are initialized with zero) and for each block the possible states on
the left are connected with the states on the right side. After m blocks, the
full partial trellis is used, i.e., every state of the partial trellis is reached.
For the previous example, the trellis is shown in Figure 9.5.
Figure 9.5: Trellis of the previous example
[Viterbi decoding on the trellis of the previous example: assume r = (10|10|00|00|01|11). The numbers at the states are the accumulated path metrics; backtracking along the surviving path yields the estimate ĉ = (11|10|00|01|01|11), which is at Hamming distance 2 from r.]
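The forward pass of the Viterbi algorithm for this example can be sketched as follows (a minimal implementation for the rate-1/2 encoder of Figure 9.1, assuming transmission is terminated in the zero state; all names are ours):

def viterbi(received_blocks):
    # states are the register contents (s1, s2); each step extends the
    # surviving path per state by the branch with the best Hamming metric
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    INF = float('inf')
    metric = {s: (0 if s == (0, 0) else INF) for s in states}  # start in zero state
    path = {s: [] for s in states}
    for r in received_blocks:
        new_metric = {s: INF for s in states}
        new_path = {s: [] for s in states}
        for s1, s2 in states:
            if metric[(s1, s2)] == INF:
                continue
            for u in (0, 1):
                out = (u ^ s1 ^ s2, u ^ s2)          # encoder outputs from this state
                nxt = (u, s1)                        # register after shifting in u
                m = metric[(s1, s2)] + (out[0] != r[0]) + (out[1] != r[1])
                if m < new_metric[nxt]:
                    new_metric[nxt], new_path[nxt] = m, path[(s1, s2)] + [out]
        metric, path = new_metric, new_path
    return path[(0, 0)], metric[(0, 0)]              # terminated: end in zero state

r = [(1, 0), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1)]
c_hat, dist = viterbi(r)
print(c_hat, dist)   # recovers c_hat = (11|10|00|01|01|11) at distance 2

Since the surviving path to each state is stored during the forward pass, no explicit backtracking step is needed in this sketch.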