Convolutional Codes I: Algebraic Structure

IEEE TRANSACTIONS ON INFORMATION THEORY, NOVEMBER 1970
Abstract: A convolutional encoder is defined as any constant linear sequential circuit. The associated code is the set of all output sequences resulting from any set of input sequences beginning at any time. Encoders are called equivalent if they generate the same code. The invariant factor theorem is used to determine when a convolutional encoder has a feedback-free inverse, and the minimum delay of any inverse. All encoders are shown to be equivalent to minimal encoders, which are feedback-free encoders with feedback-free delay-free inverses, and which can be realized in the conventional manner with as few memory elements as any equivalent encoder. Minimal encoders are shown to be immune to catastrophic error propagation and, in fact, to lead in a certain sense to the shortest decoded error sequences possible per error event. In two appendices, we introduce dual codes and syndromes, and show that a minimal encoder for a dual code has exactly the complexity of the original encoder; we show that systematic encoders with feedback form a canonical class, and compare this class to the minimal class.

Manuscript received December 18, 1969. Part of this paper was presented at the International Information Theory Symposium, Ellenville, N. Y., January 27-31, 1969. The author is with Codex Corporation, Newton, Mass. 02158.

I. INTRODUCTION

BLOCK CODES were the earliest type of codes to be investigated, and remain the subject of the overwhelming bulk of the coding literature. On the other hand, convolutional codes have proved to be equal or superior to block codes in performance in nearly every type of practical application, and are generally simpler than comparable block codes in implementation. This anomaly is due largely to the difficulty of analyzing convolutional codes, as compared to block codes. It is the intent of this series to stimulate increased theoretical interest in convolutional codes by review and clarification of known results and introduction of new ones. We hope, first, to advance the understanding of convolutional codes and tools useful in their analysis; second, to motivate further work by showing that in every case in which block codes and convolutional codes can be directly compared theoretically, the convolutional are as good or better.

Two converging lines of development have generated interest in an algebraic approach to convolutional codes. On the one hand, the success of algebraic methods in generating classes of good block codes suggests that constructive methods of generating good convolutional codes might be developed through use of algebraic structures. Correspondingly, one might expect that powerful decoding techniques based on such structures might be discovered. (Fortunately, good codes and decoding methods not relying on such constructions are already known.) On the other hand, the usefulness of regarding convolutional encoders as linear sequential circuits has begun to become evident, as in the observation of Omura [1] and others that the Viterbi [2] maximum-likelihood decoding algorithm is the dynamic programming solution to a certain control problem, and in the observation of Massey and his colleagues [3]-[5] that certain questions concerning error propagation are related to questions concerning the invertibility of linear systems. As the theory of finite-dimensional linear systems is seen increasingly as essentially algebraic, we have another motive for examining convolutional encoders in an algebraic context.

Our result is a series of structure theorems that dissect the structure of convolutional codes rather completely, mainly through use of the invariant factor theorem. We arrive at a class of canonical encoders capable of generating any convolutional code, and endowed with all the desirable properties one might wish, except that in general they are not systematic. (The alternate canonical class of systematic encoders with feedback is discussed in Appendix II.) The results do not seem to suggest any constructive methods of generating good codes, and say little new in particular about the important class of rate-1/n codes, except for putting known results in a more general context. It appears that the results obtained here for convolutional codes correspond to block-code results ([9], ch. 3).

II. PROBLEM FORMULATION

We are purposefully going to take a rather long time getting to our main results. Most of this time will be spent in definitions and statements of fundamental results in convolutional coding theory, linear system theory, and algebra. It is a truism that when dealing with fundamentals, once the problem is stated correctly, the results are easy. We feel it is important that the right formulation of the problem (like justice) not only be done, but be seen to be done, in the eyes of readers who may have backgrounds in any of the three areas noted.

After exhibiting a simple convolutional encoder for motivation, we move to a general definition of convolutional encoders, which we see amount to general finite-state time-invariant linear circuits. We discuss the decoding problem, which leads to a definition of convolutional encoder equivalence and to certain desirable code properties. Along the way certain algebraic artifacts will intrude into the discussion; in the final introductory section we collect the algebra we need, which is centered on the invariant factor theorem.

Convolutional Encoder

Fig. 1(a) shows a simple binary systematic rate-1/2 convolutional encoder of constraint length 2. The input to this encoder is a binary sequence x(D).
FORNEY: CONVOLUTIONAL CODES I
The two output sequences are

    y1(D) = g1(D)x(D),
    y2(D) = g2(D)x(D),

where the generator polynomials g1(D) and g2(D) are

    g1(D) = 1,
    g2(D) = 1 + D + D^2,

and ordinary sequence multiplication with coefficient operations modulo 2 and collection of like powers of D is implied.
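A minimal simulation of this example (our own sketch, not from the paper): GF(2) polynomials are stored as Python ints, with bit i holding the coefficient of D^i, so that sequence multiplication modulo 2 becomes a carryless product. The value g1(D) = 1 reflects the systematic form of the encoder.

```python
# Sketch (not from the paper): rate-1/2 encoding as polynomial
# multiplication over GF(2). A polynomial is a Python int whose
# bit i is the coefficient of D^i.

def gf2_mul(a: int, b: int) -> int:
    """Carryless (mod-2) product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add shifted copy of a, coefficients mod 2
        a <<= 1
        b >>= 1
    return result

# Generators of the systematic encoder: g1(D) = 1, g2(D) = 1 + D + D^2.
g1, g2 = 0b1, 0b111
x = 0b1011                              # x(D) = 1 + D + D^3
y1, y2 = gf2_mul(g1, x), gf2_mul(g2, x)
# y1 reproduces x (the systematic output); y2 = 1 + D^4 + D^5
```

Since g1 = 1, the first output reproduces the input, which is exactly what "systematic" means here.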
Fig. 1. (a) A rate-1/2 systematic convolutional encoder. (b) Alternate representation.

Similarly, we can define a general (n, k) conventional convolutional encoder by a matrix of generator polynomials gij(D), 1 ≤ i ≤ k, 1 ≤ j ≤ n, with coefficients in some finite field F. There are k input sequences xi(D) and n output sequences yj(D), each a sequence of symbols from F, with input/output relations given by

    yj(D) = Σ_{i=1}^{k} xi(D) gij(D), 1 ≤ j ≤ n,

again with all operations in F. If we define the constraint length for input i as

    νi = max_{1≤j≤n} deg gij(D),

nomials such as 1/(1 + D), without encountering the ambiguity 1/(1 + D) = D^-1 + D^-2 + ···.) If a sequence xi "starts" at time d (if the first nonzero element is xid), we say it has delay d, del xi = d, and if it ends at time d', we say it has degree d', deg xi = d', in analogy to the degree of a polynomial. Similarly, we define the delay and degree of a vector of sequences as the minimum delay or maximum degree of the component sequences: del x = min del xi, deg x = max deg xi. A finite sequence has both a beginning and an end. (Note that most authors consider only sequences starting at time 0 or later. It turns out that this assumption clutters up the analysis without compensating benefits.)

n-output: There are n output sequences yi, each with elements from F, which we write as the row vector y. The encoder is characterized by the map G, which maps any vector of input sequences x into some output vector y, which we can write in the functional form

    y = G(x).

Constant (Time-Invariant): If all input sequences are shifted in time, all output sequences are correspondingly shifted. In delay-operator notation,

    G(D^n x) = D^n G(x), n any integer.

(Note that most probabilistic analyses of convolutional codes have been forced to assume nonconstancy to obtain ensembles of encoders with enough randomness to prove theorems.)

Linear: The output resulting from the superposition of two inputs is the superposition of the two outputs that would result from the inputs separately, and the output "scales" with the input. That is,

    G(x1 + x2) = G(x1) + G(x2),
    G(a x1) = a G(x1), a ∈ F.

It is easy to see that constancy and linearity together give the broader linearity condition

    G(α x1) = α G(x1),

where α is any sequence of elements of F in the delay operator D. Furthermore, they imply a transfer-function representation for the encoder. For let ei, 1 ≤ i ≤ k, be the unit inputs in which the ith input at time 0 is 1, and all other inputs are 0. Let the generators gi be defined as the corresponding outputs (impulse responses):

    gi = G(ei), 1 ≤ i ≤ k.

Then since any input x can be written

    x = Σ_{i=1}^{k} xi ei,

we have by linearity

    y = G(x) = Σ_{i=1}^{k} xi gi.

Thus we can define a k × n transfer function matrix G, whose rows are the generators gi, such that the input/output relationship is

    y = xG.

Therefore from this point we use the matrix notation y = xG in preference to the functional notation y = G(x). Finally, this definition implies that a zero input gives a zero output, so that the encoder may not have any transient response (nonzero starting state).

Causal: If the nonzero inputs start at time t, then the nonzero outputs start at time t' ≥ t. Since the unit inputs start at time 0, this implies that all generators start at time 0 or later. As sequences, all generators must therefore have all negative coefficients equal to 0, or del gi ≥ 0. Conversely, the condition that all generators satisfy del gi ≥ 0 implies

    del y = del Σi xi gi
          ≥ min_{1≤i≤k} [del xi + del gi]
          ≥ del x,

causality.

Finite-state: The encoder shall have only a finite number of memory elements, each capable of assuming a finite number of values. The physical state of an encoder at any time is the contents of its memory elements; thus there are only a finite number of physical states. A more abstract definition of the state of an encoder at any time is the following: the state s of an encoder at time t- is the sequence of outputs at time t and later if there are no nonzero inputs at time t or later. Clearly the number of states so defined for any fixed t is less than or equal to the number of physical states, since causality implies that each physical state gives some definite sequence, perhaps not unique. Thus an encoder with a finite number of physical states must have a finite number of abstract states as well.

By studying the abstract states, we develop further restrictions on the generators gi. Let us examine the set of possible states at time 1-, which we call the state space Σ. (By constancy the state spaces at all times are isomorphic.) Let P be the projection operator that truncates sequences to end at time 0, and Q the complementary projection operator 1 - P that truncates sequences to start at time 1:

    xP = xd D^d + ··· + x_{-1} D^-1 + x0,
    xQ = x1 D + x2 D^2 + ···.

Then any input x is associated with a state at time 1- given by

    s = xPGQ;

conversely, any state in Σ can be so expressed using any input giving that state. Now the state space Σ is seen to satisfy the conditions to be a vector space over the field F, for P, G, and Q are all linear over F; that is, if
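The transfer-function relation y = xG = Σ xi gi can be exercised numerically. A minimal sketch (our own; the 2 × 2 generator matrix below is hypothetical, chosen only for illustration):

```python
# Sketch (our own): encoding via a k x n transfer-function matrix G
# whose rows are the generators g_i. GF(2) polynomials as ints
# (bit i = coefficient of D^i); addition in GF(2) is XOR.

def gf2_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def encode(x, G):
    """y = xG: x is a list of k input polynomials, G a k x n matrix."""
    n = len(G[0])
    y = [0] * n
    for i, xi in enumerate(x):
        for j in range(n):
            y[j] ^= gf2_mul(xi, G[i][j])   # y_j = sum_i x_i g_ij
    return y

# Hypothetical (2, 2) generator matrix:
G = [[0b11, 0b01],    # g_1 = (1 + D, 1)
     [0b10, 0b11]]    # g_2 = (D, 1 + D)
y = encode([0b1, 0b1], G)   # input x = (1, 1)
```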
turned on, can be made immune to ordinary error propagation simply by restarting whenever it detects it is in trouble, or even periodically.

We owe to Massey and Sain [4] the observation that if there is any infinite information sequence x0 such that the corresponding codeword y0 = x0 G is finite, then even in the absence of ordinary error propagation decoding catastrophes can occur. For there will generally be a nonzero probability for the finite error event e = y0 to occur, which will lead to infinite information errors ex = x0. Thus x̂ will differ from the original information sequence x in an infinite number of places, even if no further decoding errors are made by the codeword estimator. (In fact, the only chance to stop this propagation is to make another error.) Massey and Sain call this catastrophic error propagation. Since Ĝ^-1 must supply the infinite output x0 in response to the finite input y0, it must have internal feedback for the above situation to occur. We therefore require that any useful encoder must not only have a realizable pseudo-inverse Ĝ^-1, but one that is feedback-free. (We see later that if G has no such inverse, then there is indeed an infinite input leading to a finite output.)

We come at last to our most important observations (see also [3]). The codeword estimator dominates both complexity and performance of the system: complexity, because both G and Ĝ^-1 represent simple one-to-one (and in fact linear) maps, while the codeword estimator map from received data r to codeword ŷ is many-to-one; performance, because in the absence of catastrophic error propagation the probability of decoding error is equal to the probability of an error event multiplied by the average decoding errors per error event, with the former generally dominant and the latter only changed by small factors for reasonable pseudo-inverses Ĝ^-1 (in fact, in many practical applications the probability of error event is the significant quantity, rather than the probability of decoding error). But the performance and complexity of the codeword estimator depend only on the set of codewords y, not on the input/output relationship specified by G (assuming that all x and hence y are equally likely, or at least that the codeword estimator does not use codeword probabilities, as is true, for example, in maximum-likelihood estimation). Therefore it is natural to assert that two encoders generating the same set of codewords are essentially equivalent in a communications context. So we give the following definitions.

Definition 2: The code generated by a convolutional encoder G is the set of all codewords y = xG, where the k inputs x are any sequences.

Definition 3: Two encoders are equivalent if they generate the same code.

These definitions free us to seek out the encoder in any equivalence class of encoders that has the most desirable properties. We have already seen the desirability of having a feedback-free realizable pseudo-inverse Ĝ^-1. Our main result is that any code can be generated by an encoder G with such a Ĝ^-1, and in fact with the following properties.

1) G has a realizable feedback-free zero-delay inverse G^-1.
2) G is itself conventional (feedback-free).
3) The obvious realization of G requires as few memory elements as any equivalent encoder.
4) Short codewords are associated with short information sequences, in a sense to be made precise later.

In the context of linear system theory, the study of convolutional encoders under equivalence can be viewed as the study of those properties of linear systems that belong to the output space alone, or of the invariants over the class of all systems with the same output spaces.

Algebra

We assume that the reader has an algebraic background roughly at the level of the introductory chapters of Peterson [9]. He will therefore be familiar with the notion of a field as a collection of objects that can be added, subtracted, multiplied, and divided with the usual associative, distributive, and commutative rules; he will also know what is meant by a vector space over a field. Further, he will understand that a (commutative) ring is a collection of objects with all the properties of fields except division. He should also recall that the set of all polynomials in D with coefficients in a field F, written conventionally as F[D], is a ring.

The polynomial ring F[D] is actually an example of the very best kind of ring, a principal ideal domain. The set of integers is another such example. Without giving the technical definition of such a ring, we can describe some of its more convenient properties. In a principal ideal domain, certain elements r, including 1, have inverses r^-1 such that r r^-1 = 1; such elements are called units. The unit integers are ±1, and the unit polynomials are those of degree 0, namely, the nonzero elements of F. Those elements that are not units can be uniquely factored into products of primes, up to units; a prime is an element that has no factor but itself, up to units. (The ambiguity induced by the units is eliminated by some convention: the prime integers are taken to be the positive primes, while the prime polynomials are taken to be monic irreducible polynomials, where monic means having highest order coefficient equal to 1.) It follows that we can cancel: if ab = ac, then b = c; this is almost as good as division. Further, we have the notion of the greatest common divisor of a set of elements as the greatest product of primes that divides all elements of the set, again made unique by the conventional designation of primes.

Other principal ideal domains have already occurred in our discussions. In general, if R is any principal ideal domain and S is any multiplicative subset (namely, a set of elements containing 1 but not 0 such that if a ∈ S and b ∈ S, then ab ∈ S), then the ring of fractions S^-1 R consisting of the elements r/s, where r ∈ R and s ∈ S, is a principal ideal domain. (All elements of R that are in S are thereby given inverses and become units.) Letting R be the polynomials F[D], we have the following examples.
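The gcd just described is effectively computable in R = F[D]; for F = GF(2) the Euclidean algorithm takes a particularly simple bitwise form. A minimal sketch (our own, not from the paper):

```python
# Sketch (our own): Euclidean gcd in GF(2)[D], with polynomials as
# ints (bit i = coefficient of D^i). Over GF(2) every nonzero
# polynomial is monic, so the unit ambiguity disappears.

def gf2_divmod(a, b):
    """Quotient and remainder of a by b (b != 0) over GF(2)[D]."""
    q, db = 0, b.bit_length()
    while a.bit_length() >= db:
        shift = a.bit_length() - db
        q |= 1 << shift
        a ^= b << shift        # subtract (= add) shifted divisor
    return q, a

def gf2_gcd(a, b):
    while b:
        _, r = gf2_divmod(a, b)
        a, b = b, r
    return a

# gcd(1 + D^2, 1 + D^3) = 1 + D, since over GF(2)
# 1 + D^2 = (1 + D)^2 and 1 + D^3 = (1 + D)(1 + D + D^2):
g = gf2_gcd(0b101, 0b1001)
```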
1) Let S consist of all nonzero polynomials. Then S^-1 R consists of all ratios of polynomials with the denominator nonzero, which are called the rational functions, written conventionally F(D). Obviously, in F(D) all nonzero elements are invertible, so F(D) is actually a field, called the field of quotients of F[D].

2) Let S consist of all nonnegative powers of D, including D^0 = 1. Then S^-1 R consists of elements D^-n f(D), where f(D) is a polynomial; in other words, S^-1 R is the set of finite sequences F_f(D). Clearly all the irreducible polynomials except D remain as primes in F_f(D).

3) Let S consist of all polynomials with nonzero constant term, that is, not divisible by D. Then S^-1 R consists of ratios of polynomials in D with a nonzero constant term in the denominator. We saw earlier that these are precisely the generators realizable by causal finite-state systems; these are therefore called the realizable functions F_r(D). Note that in F_r(D), D is the only prime element.

In addition to the above, we shall be considering the ring of polynomials in D^-1, F[D^-1]. We originally obtained the realizable functions as ratios of polynomials in D^-1 with the degree of the numerator less than or equal to the degree of the denominator; the realizable functions thus form a ring of fractions of F[D^-1], but not of the type S^-1 R. In this ring the only prime, D, is expressed as 1/D^-1.

We also use the ring containing all sequences x such that del x ≥ 0, which in algebra is called the formal power series in D and denoted conventionally as F[[D]]; F[[D]] is also a principal ideal domain, whose only prime is D.

If G is a matrix of field elements, then it generates a vector space over the field. If it is a matrix of ring elements, then it generates a module over the ring. (A module is defined precisely like a vector space, except that the scalars are in a ring rather than a field. This is the difference between block and convolutional codes.) The main theorem concerning modules over a principal ideal domain (some would say the only theorem) is a structure theorem, which, when applied to matrices G, is called the invariant-factor theorem. This theorem alone, when extended and applied to different rings, yields most of our results.

Invariant-Factor Theorem: Let R be a principal ideal domain and let G be a k × n R-matrix. Then G has an invariant-factor decomposition

    G = AΓB,

where A is a square k × k R-matrix with unit determinant, hence with an R-matrix inverse A^-1; B is a square n × n R-matrix with R-matrix inverse B^-1; and Γ is a k × n diagonal matrix, whose diagonal elements γi, 1 ≤ i ≤ k, are called the invariant factors of G with respect to R. The invariant factors are unique, and are computable as follows: let Δi be the greatest common divisor of the i × i subdeterminants (minors) of G, with Δ0 = 1 by convention; then γi = Δi/Δ_{i-1}. We have that γi divides γ_{i+1} if γ_{i+1} is not zero, 1 ≤ i ≤ k - 1. The matrices A and B can be obtained by a computational algorithm (sketched below); they are not in general unique. Finally, if there is any decomposition G = AΓB such that A and B are invertible R-matrices and Γ is a diagonal matrix with γi | γ_{i+1} or γ_{i+1} = 0, then the γi are the invariant factors of G with respect to R.

Sketch of Proof [10]: G is said (in this context only) to be equivalent to G' if G = AG'B, where A and B are square k × k and n × n R-matrices with unit determinants; the assertion of the theorem is that G is equivalent to a diagonal matrix Γ that is unique under the specified conditions on the γi. Since any such A can be represented as the product of elementary row operations, and B of elementary column operations (interchange of rows (columns), multiplication of any row (column) by a unit in R, addition of any R-multiple of any row (column) to another), it can be shown that the Δi are preserved under equivalence. In particular, therefore, Δ1 divides all elements of all equivalent matrices. We will now show that there exists an equivalent matrix in which some element divides all other elements, hence is equal to Δ1 up to units. Let G not be already of this form, and let α and β be two nonzero elements in the same row or column such that α does not divide β. (If there is no element in the same row or column as α not divisible by α, there is some such β in some other column, and this column can be added to the column containing α to give an equivalent matrix for which the prescription above can be satisfied.) By row or column permutations α may be placed in the upper-left corner and β in the second entry of the first row or column; we assume column for definiteness. Now there exist x and y such that αx + βy = δ, where δ is the greatest common divisor of α and β, and has fewer prime factors than α since α does not divide β, δ ≠ α. The row transformation below then preserves equivalence while replacing α by δ:

    |  x      y   |
    | -β/δ   α/δ  |.

If δ does not now divide all elements of the equivalent matrix, the construction can be repeated and δ replaced by some δ' with fewer prime factors. This descending chain can therefore terminate only with a δ that does divide all elements of the equivalent matrix. Since δ = Δ1 = γ1, up to units, multiplication of the top row by a unit puts γ1 in the upper-left corner, and the whole first row and column can be cleared to zero by transformations of the above type (with x = 1, y = 0), giving the equivalent matrix

    | γ1   0  |
    | 0    G1 |,

where γ1 divides every element of G1. Similarly, G1 is equivalent to a matrix G1' of the same form, so
    | γ1   0    0  |
    | 0    γ2   0  |
    | 0    0    G2 |,

where γ1 divides γ2 and γ2 divides all elements of G2. Continuing in this way, we arrive at a diagonal matrix Γ meeting the conditions of the theorem. Its uniqueness and the formula for the γi are obtained from the relationship Δi = γ1 γ2 ··· γi. Q.E.D.

The invariant-factor decomposition involves a similarity transformation in some respects reminiscent of diagonalizing transformations of square matrices over a field; the invariant factors have some of the character of eigenvalues. The analogy cannot be pressed very far, however. The extension of the invariant-factor theorem to rings of fractions is immediate.

Invariant-Factor Theorem (Extension): Let R be a principal ideal domain and let Q be a ring of fractions of R. Let G be a k × n Q-matrix. Let ψ be the least common multiple of all denominators in G; then ψG is an R-matrix. Consequently ψG has an invariant-factor decomposition

    ψG = AΓ'B.

Dividing through by ψ, we obtain an invariant-factor decomposition of the Q-matrix G with respect to R

    G = AΓB,

where

    Γ = Γ'/ψ.

Here A and B are R-matrices with R-matrix inverses A^-1 and B^-1. The diagonal elements γi of Γ are elements of Q uniquely determined as γi = γi'/ψ = αi/βi, where αi and βi are obtained by canceling common factors in γi' and ψ, gcd(αi, βi) = 1. Since γi' | γ'_{i+1} if γ'_{i+1} ≠ 0, we have that αi | α_{i+1} if α_{i+1} ≠ 0 and β_{i+1} | βi, 1 ≤ i ≤ k - 1. Explicitly, if ψi is the least common multiple of the denominators of the i × i subdeterminants of G, if θi is the greatest common divisor of the numerators, and Δi = θi/ψi with Δ0 = 1 by convention, then

    γi = αi/βi = Δi/Δ_{i-1}, 1 ≤ i ≤ k.

The γi are called the invariant factors of the Q-matrix G with respect to R. Finally, if there exists any G = AΓB satisfying the above conditions, then the diagonal terms of Γ are the invariant factors of G with respect to R.

We may picture an invariant factor decomposition more concretely as follows. Let a k × k scrambler A be defined as a k × k R-matrix with an R-matrix inverse A^-1. We call it a scrambler because the map x' = xA is a one-to-one permutation of all the k-dimensional R-vectors x. For example,

    A = | D + 1   1 |         A^-1 = | 1       1   |
        | D       1 |,               | D    D + 1  |

are two inverse binary polynomial 2 × 2 scramblers. (The reader may at first be surprised, as was the author, by the existence of nontrivial pairs of scramblers that are feedback-free and thus not subject to infinite error propagation.) The only 1 × 1 scramblers are the trivial ones consisting of units of R.

Now we illustrate an invariant factor decomposition of G with respect to R by the block diagram of Fig. 5. Input sequences are scrambled in the k × k R-scrambler A; the outputs are then operated on individually by the invariant factors γi; finally, the k outputs plus n - k dummy zeroes are scrambled in an n × n R-scrambler B to give the output sequences.

The invariant-factor theorem and its extension are well known in linear system theory, particularly in the work of Kalman [11], [12], who attributes the first engineering use of the extension above to McMillan [13]. As far as the author is aware, its use has generally been confined to the rings of polynomials and rational functions. The utility of considering additional rings will become clear in the sections to follow.

III. STRUCTURAL THEOREMS

Our principal results are presented in three sections. In the first, we show how to determine whether G has inverses of various kinds. In the second, we show that every encoder G is equivalent to a so-called minimal encoder, which is a conventional convolutional encoder with a feedback-free inverse. In the final section we point out other desirable properties of minimal encoders: they require as few memory elements to realize as any equivalent encoder, they allow easy enumeration of error events by length, and they ensure that short codewords correspond to short information sequences in a way not shared by nonminimal encoders. We conclude that a minimal encoder is the natural choice to generate any desired code. In two appendices, we discuss dual codes and systematic encoders; the latter may also be taken as canonical encoders for any code.

Inverses

In this section we shall determine when a k × n Q-matrix G has an R-matrix right inverse G^-1, where Q is a ring of fractions of R and k ≤ n. The results are stated in terms of the invariant factors γi of G with respect to R. We assume γk ≠ 0; otherwise G has rank less than k and thus no inverse of any kind.

Consider the invariant-factor decomposition of G with respect to R, as illustrated in Fig. 5. The outputs of the scrambler A when its inputs range over all possible sequences are simply all possible sequences in some different
Fig. 5. Invariant-factor decomposition of (n, k) encoder.

order, since A is invertible. Moreover, if the inputs to A range over all vectors of k elements of R, then the outputs are all vectors of k elements of R in a different order, since A and A^-1 are R-matrices. In particular, there is some input R-vector xk to A such that the output xk A is ek, the kth unit input, namely, xk = ek A^-1. Now the input vector γk^-1 xk gives γk^-1 ek at the output of A, and ek at the output of Γ, hence the R-vector ek B at the matrix output. But γk^-1 ek, hence γk^-1 xk, is an R-vector if and only if γk^-1 is an element of R. Continuing with this argument, we have the following.

Lemma 1: Let R be a principal ideal domain and Q a ring of fractions of R. Let G be a Q-matrix with invariant-factor decomposition G = AΓB with respect to R, and invariant factors γi = αi/βi, 1 ≤ i ≤ k. If αk ≠ 1, then there is a vector γk^-1 ek A^-1 that is not an R-vector but which gives an R-vector output.

Proof: If αk ≠ 1, then γk^-1 = βk/αk is not an element of R, hence γk^-1 ek is not an R-vector, hence γk^-1 ek A^-1 is not an R-vector, since if it were (γk^-1 ek A^-1)A would be an R-vector. But γk^-1 ek A^-1 G = γk^-1 ek Γ B = ek B is an R-vector since ek and B are in R. Q.E.D.

We call the numerator αk of the kth invariant factor that appears in Lemma 1 the minimum factor of G with respect to R; this designation will be justified by Theorem 2. From Lemma 1 we obtain a general theorem on inverses, application of which to particular rings R will settle many questions concerning inverses. We note that if G = AΓB, then G^-1 = B^-1 Γ^-1 A^-1 is an inverse for G, where Γ^-1 is the n × k matrix with diagonal elements γi^-1. Since A^-1 and B^-1 are R-matrices, G^-1 is certainly an R-matrix if all γi^-1 are elements of R. The following theorem says that if some γi^-1 is not an element of R, then G has no R-matrix inverse.

Theorem 1: Let R be a principal ideal domain and Q a ring of fractions of R. Let G be a Q-matrix whose invariant factors with respect to R are γi = αi/βi, 1 ≤ i ≤ k. Then the following statements are equivalent.

1) G has an inverse G^-1, which is an R-matrix.
2) There is no x that is not an R-vector such that y = xG is an R-vector; that is, y ∈ R implies x ∈ R.
3) αk = 1.

Proof: We shall show 1 ⇒ 2 ⇒ 3 ⇒ 1.

(1 ⇒ 2). If y = xG is an R-vector, then x = yG^-1 is an R-vector since G^-1 is an R-matrix.

(2 ⇒ 3). By Lemma 1, if αk ≠ 1, then x = γk^-1 ek A^-1 is not an R-vector although y = xG is, contradicting 2; hence αk = 1.

(3 ⇒ 1). If αk = 1, then, since each αi divides αk, every γi^-1 = βi/αi = βi is an element of R, so that G^-1 = B^-1 Γ^-1 A^-1 is an R-matrix. Q.E.D.

We then obtain the results we need as special cases.

Corollary 1: An encoder G has a feedback-free inverse iff its minimum factor with respect to the polynomials F[D] is 1.

Corollary 2: An encoder G has a feedback-free pseudo-inverse iff its minimum factor with respect to the finite sequences F_f(D) is 1. Furthermore, in this case and only in this case is there no infinite x such that y = xG is finite.

Corollary 3: An encoder G has a realizable inverse iff its minimum factor with respect to the realizable functions F_r(D) is 1.

Corollary 4: An encoder G has a realizable pseudo-inverse iff its minimum factor with respect to the rational functions F(D) is 1; that is, αk ≠ 0, or the rank of G is k.

Here we have used the obvious facts that a feedback-free pseudo-inverse implies and is implied by a finite-sequence inverse, and similarly with a realizable pseudo-inverse and a rational inverse, where one is obtained from the other in both cases by multiplication by D^{±d}, d being the delay of the pseudo-inverse. We also note that since G is both an F_f(D)- and an F(D)-matrix, the invariant factors with respect to these rings cannot have denominator terms; further, since F(D) is a field, the only greatest common divisors are 1 and 0, and the rank of G equals the rank of Γ since A and B are invertible.

With equal ease, we can obtain a sharper result on pseudo-inverses Ĝ^-1 in the cases where G has no inverse. We make the following general definition of a pseudo-inverse.

Definition: Ĝ^-1 is an R-matrix pseudo-inverse for G with factor ψ if Ĝ^-1 is an R-matrix and G Ĝ^-1 = ψ Ik.

If G^-1 = B^-1 Γ^-1 A^-1 is not an R-matrix inverse for G, it is because Γ^-1 is not an R-matrix. Since γi^-1 = βi/αi and αi | αk, 1 ≤ i ≤ k, Ĝ^-1 = αk G^-1 = B^-1 (αk Γ^-1) A^-1 is an R-matrix pseudo-inverse for G with factor αk. Theorem 2 shows that this is the minimum such factor.

Theorem 2: Let R be a principal ideal domain and Q a ring of fractions of R. Let G be a Q-matrix whose invariant factors with respect to R are γi = αi/βi, 1 ≤ i ≤ k. Then G has an R-matrix pseudo-inverse Ĝ^-1 with factor αk; further, all R-matrix pseudo-inverses have factors ψ such that αk divides ψ.

Proof: The discussion preceding the theorem shows how to construct a pseudo-inverse with factor αk. Therefore let Ĝ^-1 be any R-matrix pseudo-inverse, and consider the input x = γk^-1 ek A^-1. By Lemma 1, xG is an R-vector, hence xG Ĝ^-1 is an R-vector, but

    xG Ĝ^-1 = ψx = ψ γk^-1 ek A^-1.
FORNEY: CONVOLUTIONAL CODES1 729
Hence (+~Y;‘QA-*)A = $y;& is an R-vector, which is In [71, Olson gives a test for the existence and minimum
to say J/y;l = $&/oI~ is an element of R. But gcd (OLD, delay of any feedforward inverse that is equivalent to
&J = 1; hence ak must divide # in R. Q.E.D. the above; although more cumbersome, Olson’s result
and proof are remarkable for being carried through success-
W e note that we could have obtained Theorem 1 as a
fully without the aid of the powerful algebraic tools used
consequence of Theorem 2 and Lemma 1.
here.
The pseudo-inverses we are interested in are realizable
pseudo-inverses with delay cl, or, in the terminology
Canonical Encoders Under Equivalence
introduced above, F,, (D)-matrix pseudo-inverses with
factor Da. Since the only prime in the ring of realizable The theorems of this section are aimed at the deter-
functions F,,(D) is D, and since G is itself realizable, mination of a canonical encoder in the equivalence class
the invariant factors of G with respect to F,,(D) are of encoders generating a given code.
yi = Ddi (or zero); and in part.icular the minimum factor The idea of the first theorem of this section is as follows.
is DdA. W e then define the delay d of a realizable matrix as Any encoder G has the invariant factor decomposition
d = d,, so c+ = Dd and di < d, 1 < i < k. Then Theorem 2 AI’B with respect to the polynomials F[D] as is illustrated
answers a problem stated by Kalman ([ll], lO.lOe) with in Fig. 5, where A is a k X k polynomial scrambler, B is
the following corollary. an n X n polynomial scrambler, and I’ is a set of generally
Corollary 1: Let G be realizable and let the minimum nonpolynomial transfer functions. If the inputs to A are
factor with respect to the realizable functions F,,(D) all the k-tuples of sequences; then since A is invertible
be C+ = Da. Then G has a realizable pseudo-inverse G-’ the outputs are also all the k-tuples in some different
with delay d, and no realizable pseudo-inverse with delay order. If none of the yi is zero, then the outputs of I’ are
less than d. also all Ic-tuples of sequences, if we allow many-to-one
Similarly, the question of the delay of a feedback-free encoders, so that G may have rank r < k and yr+l =
inverse, which was investigated in [4] and [7], is answered . . . = Yk may be zero, then the outputs of I? are all k-tuples
by Corollary 2. of sequences in which the last k - r components are zero.
Coorollary 2: Let G be realizable and let the minimum But the outputs of B with these inputs are the code gene-
factor with respect to the polynomials F[D] be CY~.Then rated by G; G is therefore equivalent to the encoder G,
G has a feedback-free pseudo-inverse G-’ with delay represented by the first r rows of B. Now since B is poly-
d’ if and only if cyk = Dd ford’ 2 d. nomial and has a polynomial inverse, G, is also polynomial
For computation, it is convenient not to have to compute and has a polynomial right inverse G;’ consisting of the
invariant factors repeatedly, so we use the following first r columns of B-‘. These observations are made precise
lemma. in the following theorem and proof.
Lemma 2: Let G have invariant factors yi with respect Theorem 3: Every encoder G is equivalent to a con-
to R; then the invariant factors with respect to Q are ventional convohrtional encoder G, that has a feedback-
r:, where 7: = yi up to units in Q and 7: is a product of free delay-free inverse G;‘.
primes in Q. Remark: In other words, G, and G;’ are polynomial
Proof: Let yi = -& y”, with 7:’ a unit in Q. Let and G,G;’ = I,.
G = AI’B be an invariant factor decomposition of G Proof: Let G have invariant-factor decomposition
with respect to R. Let B’ be the Q matrix obtained from B G = AI’B with respect to F[D]. Let G, be the first r rows
by multiplying the ith row by 7:‘; then det B’ = (det B) of B,
7ciyi’ is a unit in Q, since det B is a unit in R. Hence G =
AYB’ is an invariant-factor decomposition of G with
respect to Q, so the y: are invariant factors of G with
G, is polynomial since B is, and has a polynomial inverse
respect to Q. Q.E.D.
G;l equal to the first r columns of B-‘. To show equivalence,
Now we have the following recipe for deciding whether let y, be any codeword in the code generated by G; then
G has inverses of various types. Let G = ( hz/Gi 1; multiply
each row through by its denominator to obtain the poly- Y, = xoG
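For the rate-1/2 polynomial case, the tests of Theorem 1 and Corollary 2 reduce to a gcd computation: the single invariant factor of G = [g_1, g_2] with respect to F[D] has numerator α_1 = gcd(g_1, g_2), so a feedback-free delay-free inverse exists iff the gcd is 1, and a feedback-free pseudo-inverse exists iff the gcd is a power of D. The sketch below is not from the paper; it assumes GF(2) polynomials represented as Python ints, with bit i holding the coefficient of D^i, and the helper names are illustrative only.

```python
# Sketch (not from the paper): Theorem 1 / Corollary 2 for a rate-1/2
# polynomial encoder G = [g1, g2] over GF(2).  Polynomials are Python
# ints: bit i holds the coefficient of D^i.

def pmul(a, b):
    """Product of two GF(2) polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

def pdivmod(a, b):
    """Quotient and remainder of GF(2) polynomial division."""
    q = 0
    while a and a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def ext_gcd(a, b):
    """Return (g, s, t) with pmul(a, s) ^ pmul(b, t) == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    q, r = pdivmod(a, b)
    g, s, t = ext_gcd(b, r)
    return g, t, s ^ pmul(q, t)

# g1 = 1 + D + D^2 (0b111), g2 = 1 + D^2 (0b101): gcd = 1, so by
# Theorem 1 a feedback-free delay-free inverse [s, t]^T exists.
g, s, t = ext_gcd(0b111, 0b101)
assert g == 1 and pmul(0b111, s) ^ pmul(0b101, t) == 1

# g1 = 1 + D (0b11), g2 = 1 + D^2 (0b101): gcd = 1 + D, which is not a
# power of D, so by Corollary 2 no feedback-free pseudo-inverse exists.
assert ext_gcd(0b11, 0b101)[0] == 0b11
```

The Bezout pair [s, t]^T returned for the first encoder is exactly a polynomial right inverse with GG^{-1} = 1, i.e., a pseudo-inverse with factor 1.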
so forth; furthermore, system complexity is usually dominated by the codeword estimator, not the encoder. However, Theorem 6 does tend to discourage spending much time on looking for great encoder simplifications through unconventional approaches, such as the use of feedback.

Another property of minimal encoders, called the predictable degree property, is a useful analytical tool. Note that in general for any conventional encoder with constraint lengths ν_i, and any codeword y = xG,

deg y ≤ max_{1≤i≤k} (deg x_ig_i) = max_{1≤i≤k} (deg x_i + ν_i).

If equality holds for all x, we say G has the predictable degree property. Now we have the following.

Lemma 6: Let G be a basic encoder; then G has the predictable degree property if and only if G is minimal.

Proof: By Lemma 3, G is minimal iff its backwards encoder Ḡ has an anticausal inverse. By Theorem 1 with R = F[D^{-1}], Ḡ has an anticausal inverse iff deg xḠ ≤ 0 implies deg x ≤ 0, or equivalently iff deg x ≥ 1 implies deg xḠ ≥ 1, or by constancy iff deg x ≥ d implies deg xḠ ≥ d. But

y = xG = x'Ḡ,

where the x' corresponding to x is given by the reversal x_i'(D) = D^{-ν_i}x_i(D^{-1}). Now deg x' ≥ d iff max deg x_i' ≥ d iff max (deg x_i + ν_i) ≥ d. Hence deg x' ≥ d implies deg x'Ḡ ≥ d, iff max (deg x_i + ν_i) ≥ d implies deg y = deg x'Ḡ ≥ d, which is the same as the predictable degree property. Q.E.D.

In our earlier discussion, we asserted that the error events of interest are the finite codewords. Let us normalize all such words y to start at time zero, del y = 0. When G has a zero-delay feedback-free inverse, these are precisely the words generated by inputs x that are finite and start at zero, del x = 0. When G has the predictable degree property as well, we can easily enumerate the finite codewords by degree, since for each possible input x we can compute the degree of the output knowing only the constraint lengths ν_i. In fact the number of codewords of degree ≤ d is equal to the number of ways of choosing k polynomials x_i such that deg x_i ≤ d − ν_i or x_i = 0. For example, if an (n, 2) binary code has ν_1 = 2, ν_2 = 4, then there is one codeword of degree less than 2 (the all-zero word); there are two of degree ≤2, 4 of degree ≤3, 16 of degree ≤4, 64 of degree ≤5, and so forth. Of course, all equivalent encoders have the same codewords and hence the same distribution of codeword lengths.

The predictable degree property also guarantees that in some sense short codewords will be associated with short information sequences. Let us establish a partial ordering of information sequences such that x < x' if deg x_i ≤ deg x_i' for all i, with at least one strict inequality. Codewords y can be ordered by their degrees deg y, namely, the maximum degree of all components y_j. Now we have the following:

Lemma 7: x < x' implies deg y < deg y' if and only if G is minimal, where y = xG, y' = x'G.

Proof: If G is minimal, and x < x', then

deg y = max (deg x_i + ν_i) < max (deg x_i' + ν_i) = deg y'.

If G is not polynomial, then it has an infinite generator g_i = h_i/ψ_i, where h_i and ψ_i are polynomial. Let x = e_i, x' = ψ_ie_i; then x < x', but y = g_i is infinite whereas y' = h_i is finite; hence deg y' < deg y = ∞.

If G is polynomial but not basic, then by Lemma 1 there is an infinite input x = γ_k^{-1}e_kA^{-1} that gives a finite output y, but x' = β_ke_kA^{-1} is finite and gives the finite output y' = α_ky, where γ_k = α_k/β_k, deg α_k ≥ 1. Hence x' < x, but deg y' = deg α_k + deg y > deg y.

If G is basic but does not have the predictable degree property, then for some x

deg y < max (deg x_i + ν_i).

Let x' = x_ie_i for some i for which the maximum on the right is attained; then x' < x, but

deg y' = deg x_i + ν_i > deg y. Q.E.D.

This lemma is not quite as sharp as one would like, since the ordering of the inputs x is only partial, so that deg y > deg y' does not imply x > x' but only x ≮ x'. Also, the ordering of codewords y by degree does not take into account the lengths of the individual sequences y_i. However, Lemma 7 reassures us that by choosing a minimal encoder we will not get an excessive number of information errors out per error event.

IV. CONCLUSIONS

All our results tend to the same conclusion: regardless of what convolutional code we want to use, we can and should use a minimal encoder to generate it. Minimal encoders are therefore to be considered a canonical class, like systematic encoders for block codes. (In Appendix II we consider systematic encoders as a canonical class for convolutional codes.)

It should be noted that none of our results depends on the finiteness of F, so that all apply to sampled-data filters, with F the field of real or complex numbers. (In Lemma 4 we do need some continuity restriction, such as that the output be a linear function of the state, when F is infinite; otherwise a multidimensional abstract state space could be mapped into a single real physical memory element, for example, by a Cantor mapping.) In this context polynomial generators correspond to tapped delay line filters. The results on inverses are of clear interest here, but whether the remaining results are or not depends on whether our definition of equivalence is germane to some sampled-data problem.

There are several obvious directions for future work. First, the essential similarity in statement and proof of Theorems 3 and 5 suggests that there ought to be some way of setting up the problem so that the equivalence of any encoder to a minimal encoder could be shown without the intermediary of basic encoders. Second, there ought to be some way of treating the constraint lengths μ_j referenced to the outputs (μ_j = max_{1≤i≤k} deg g_ij) comparable in simplicity to our treatment of the constraint lengths ν_i referenced to the inputs. Third, at least on memoryless channels, permutations of the transmitted sequences or shifts of one relative to the others do not result in essentially different codes; it might be interesting …

Proof: Recall that G is the first k rows of B, while H* is the last (n − k) columns of B^{-1}, and det B is a unit in F[D]. We shall show that the upper left k × k subdeterminant of B is equal to the lower right (n − k) × (n − k) subdeterminant of B^{-1}; the same proof carries through for any other selection of columns in B and the corresponding rows in B^{-1} by transposition. Write

B = [B_11, B_12; B_21, B_22],   B^{-1} = [B'_11, B'_12; B'_21, B'_22],

where the partition lines separate rows and columns into groups of k and n − k. Consider the matrix product

[B_11, B_12; 0, I_{n−k}] [B'_11, B'_12; B'_21, B'_22] = [I_k, 0; B'_21, B'_22];

taking determinants, we have

|B_11| |B^{-1}| = |B'_22|, …

Fig. 6. Canonical realization of (n, k) encoder.
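The determinant identity just obtained is an instance of the classical relation between complementary minors of a matrix and its inverse. As a quick sanity check (not from the paper), it can be verified with exact rational arithmetic standing in for the ring F[D]; the matrix below is an arbitrary 3 × 3 example with k = 1, n = 3, and the helper names are illustrative only.

```python
# Sketch (not from the paper): numeric check of the complementary-minor
# identity |B_11| |B^{-1}| = |B'_22|, with exact rationals in place of
# the ring F[D].  Here k = 1 and n = 3.
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def inv3(m):
    """Inverse of a 3 x 3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    dt = det3(m)
    adj = [
        [e*i - f*h, c*h - b*i, b*f - c*e],
        [f*g - d*i, a*i - c*g, c*d - a*f],
        [d*h - e*g, b*g - a*h, a*e - b*d],
    ]
    return [[Fraction(x, dt) for x in row] for row in adj]

B = [[1, 2, 0],
     [0, 1, 1],
     [1, 0, 1]]
Bi = inv3(B)

# |B_11| |B^{-1}|: the upper-left 1 x 1 minor of B times det B^{-1} ...
lhs = B[0][0] * Fraction(1, det3(B))
# ... equals the complementary lower-right 2 x 2 minor of B^{-1}.
rhs = Bi[1][1] * Bi[2][2] - Bi[1][2] * Bi[2][1]
assert lhs == rhs
```

Over F[D] the same identity holds verbatim, which is what ties the subdeterminants of a minimal encoder to those of its dual in the appendix.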
with short error events; in general short error events may result in many output errors with systematic encoders.

Example: Minimal encoder = [1 + D + D^2, 1 + D^2]. Pseudo-inverse with delay D: [1, 1]^T. Output error probability when decoder is not working: 2p (for p low). Most likely error event (only codeword of weight 5): [1 + D + D^2, 1 + D^2]. Output errors per most likely error event: 1.

Equivalent systematic encoder = [1, (1 + D^2)/(1 + D + D^2)]. Inverse: [1, 0]^T. Output error probability when decoder is not working: p. Most likely error event: same as above. Output errors per most likely error event: 3.

It appears that if we expect the decoder to be working, we should select a minimal encoder, while if we expect it not to be, we should select a systematic encoder. This is not as frivolous as it sounds; in a sequential decoder, for example, actual (undetected) error events can be made extremely rare, the decoder failures instead occurring at times when the decoder has to give up decoding a certain segment because of computational exhaustion. During these times the decoder must put out something, and the best it can do is generally to put out the noisy estimates obtained directly from the received data, the errors in which will be minimized if the encoder is systematic. A systematic encoder (with feedback) might therefore be a good choice for a sequential decoder, depending on the resynchronization method and the performance criterion. On the other hand, a maximum-likelihood decoder (Viterbi algorithm) is subject only to ordinary error events and as a consequence should be used with a minimal encoder.

As a final practical consideration, the feedback in the general systematic encoder can lead to catastrophes if there is any chance of noise causing a transient error in the encoding circuit.

From a theoretical point of view, minimal encoders are particularly helpful in analyzing the set of finite codewords, as we saw in the main text. The fact that they are a basis for the F[D]-module of all such codewords means that we can operate entirely in F[D], which is convenient, although throughout this paper we have seen the utility of considering larger rings. The outstanding theoretical virtue of systematic encoders is that under some convention as to which columns shall contain the identity matrix, there is a unique systematic encoder generating any code. Thus systematic encoders are most suited to the classification and enumeration of codes. Our taste is indicated by the relative placement of minimal and systematic codes in this paper, but clearly there are virtues in each class.

V. ACKNOWLEDGMENT

The work of Prof. J. L. Massey and his colleagues at Notre Dame, particularly the result of Olson [7], was the initial stimulus for the investigation reported here, and should be considered the pioneering work in this field. The principal results were at first obtained by tedious constructive arguments; subsequently Prof. R. E. Kalman was good enough to send along some of his work, which pointed out the usefulness of the invariant factor theorem in the guise of the Smith-McMillan canonical form, and which consequently was of great value in simplifying and clarifying the development. The close attention of Dr. A. Kohlenberg and Prof. J. Massey to the final draft was also helpful.

REFERENCES
[1] J. K. Omura, "On the Viterbi decoding algorithm," IEEE Trans. Information Theory, vol. IT-15, pp. 177-179, January 1969.
[2] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Information Theory, vol. IT-13, pp. 260-269, April 1967.
[3] J. L. Massey and M. K. Sain, "Codes, automata, and continuous systems: Explicit interconnections," IEEE Trans. Automatic Control, vol. AC-12, pp. 644-650, December 1967.
[4] ——, "Inverses of linear sequential circuits," IEEE Trans. Computers, vol. C-17, pp. 330-337, April 1968.
[5] M. K. Sain and J. L. Massey, "Invertibility of linear time-invariant dynamical systems," IEEE Trans. Automatic Control, vol. AC-14, pp. 141-149, April 1969.
[6] D. D. Sullivan, "Control of error propagation in convolutional codes," University of Notre Dame, Notre Dame, Ind., Tech. Rept. EE-667, November 1966.
[7] R. R. Olson, "Note on feedforward inverses for linear sequential circuits," Dept. of Elec. Engrg., University of Notre Dame, Notre Dame, Ind., Tech. Rept. EE-684, April 1, 1968; also IEEE Trans. Computers (to be published).
[8] D. J. Costello, "Construction of convolutional codes for sequential decoding," Dept. of Elec. Engrg., University of Notre Dame, Notre Dame, Ind., Tech. Rept. EE-692, August 1969.
[9] W. W. Peterson, Error-Correcting Codes. Cambridge, Mass.: M.I.T. Press, 1961.
[10] C. W. Curtis and I. Reiner, Representation Theory of Finite Groups and Associative Algebras. New York: Interscience, 1962, pp. 94-96.
[11] R. E. Kalman, P. L. Falb, and M. A. Arbib, Topics in Mathematical System Theory. New York: McGraw-Hill, 1969.
[12] R. E. Kalman, "Irreducible realizations and the degree of a rational matrix," J. SIAM, vol. 13, pp. 520-544, 1965.
[13] B. McMillan, "Introduction to formal realizability theory," Bell Sys. Tech. J., vol. 31, pp. 217-279, 541-600, 1952.
[14] J. L. Massey, Threshold Decoding. Cambridge, Mass.: M.I.T. Press, 1963, pp. 23-24.
[15] J. J. Bussgang, "Some properties of binary convolutional code generators," IEEE Trans. Information Theory, vol. IT-11, pp. 90-100, January 1965.
[16] E. A. Bucher and J. A. Heller, "Error probability bounds for systematic convolutional codes," IEEE Trans. Information Theory, vol. IT-16, pp. 219-224, March 1970.
[17] E. A. Bucher, "Error mechanisms for convolutional codes," Ph.D. dissertation, Dept. of Elec. Engrg., Massachusetts Institute of Technology, Cambridge, September 1968.
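The arithmetic in the worked example of Appendix II can be checked mechanically. The sketch below is not from the paper; it assumes GF(2) polynomials represented as Python ints, with bit i holding the coefficient of D^i.

```python
# Sketch (not from the paper): checking the Appendix II example over
# GF(2), with polynomials as Python ints (bit i = coefficient of D^i).
g1, g2 = 0b111, 0b101   # minimal encoder [1 + D + D^2, 1 + D^2]

# [1, 1]^T is a pseudo-inverse with factor (delay) D:
# (1 + D + D^2)*1 + (1 + D^2)*1 = D.
assert g1 ^ g2 == 0b10

# The most likely error event is the codeword [1 + D + D^2, 1 + D^2]
# itself, of weight 5, generated by the minimal encoder from x = 1:
# one information error per error event.
assert bin(g1).count("1") + bin(g2).count("1") == 5

# The equivalent systematic encoder [1, (1 + D^2)/(1 + D + D^2)] copies
# its input into the first output, so producing the same codeword
# forces x = 1 + D + D^2, i.e., three information errors per event.
assert bin(g1).count("1") == 3
```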