XI Reunión de Trabajo en Procesamiento de la Información y Control, 16 al 18 de octubre de 2007
Maximum Likelihood Decoding on a Communication Channel
Cesar F. Caiafa†, Nestor R. Barraza‡, Araceli N. Proto†,⋆
† Laboratorio de Sistemas Complejos, Facultad de Ingeniería, Universidad de Buenos Aires - UBA
ccaiafa@ .uba.ar
‡ Instituto de Ingeniería Biomédica, Facultad de Ingeniería, Universidad de Buenos Aires - UBA
nbarraza@ .uba.ar
⋆ Comisión de Investigaciones Científicas de la Prov. de Buenos Aires - CIC
aproto@ .uba.ar
Keywords: Digital Communication Channel, Maximum Likelihood Decoder, Markov Chain, Logarithmic Distribution, Ising Model.
Abstract: A binary additive communication channel with different noise processes is analyzed. Several noise processes are generated according to Bernoulli, Markov, Polya, and Logarithmic distributions. A noise process based on the two-dimensional Ising model (Markov 2D) is also studied. In all cases, a maximum likelihood decoding algorithm is derived. We obtain interesting results since, in many cases, the most probable code-word is either the one closest to the received word or the one farthest from it, depending on the model parameters.
I INTRODUCTION

Maximum likelihood (ML) decoding has been applied to different kinds of channels: the Additive White Gaussian Noise (AWGN) channel (Chi-Chao Chao et al. (1992), Haykin (2001)), the Binary Symmetric Channel (BSC) (Haykin (2001)), the Binary Erasure Channel (BEC) (Khandekar and McEliece (2001)), and others. ML decoding has also been studied when a specific code is transmitted over the channel, such as Turbo Codes (Hui Jin and McEliece (2002), Moreira and Farrell (2006)), Linear Predicting Codes (LPC) (Haykin (2001), Moreira and Farrell (2006)) or Cyclic Redundancy Codes (CRC) (Haykin (2001), Moreira and Farrell (2006)). In some cases maximum likelihood decoding is equivalent to minimum Hamming distance decoding; however, this is not true for every kind of noise process (crossover probabilities). In this paper we show cases where the most probably transmitted code-word is the one farthest away from the received word. Cases which are equivalent to minimum Hamming distance decoding, as well as intermediate possibilities, are also presented. The type of channel we analyze is the BSC, where the output is produced by adding a noise process to the input code-word. The noise distributions we analyze are Bernoulli, Polya contagion, Markov chain and Logarithmic. In addition, a two-dimensional Ising noise process is also studied. New and interesting results, depending on the parameters of the noise process, are shown.

II THE BINARY ADDITIVE COMMUNICATION CHANNEL

We study a discrete communication channel with binary additive noise, as depicted in Fig. 1. The $i$-th output $Y_i \in \{0,1\}$ is the modulo-two sum of the $i$-th input $X_i \in \{0,1\}$ and the $i$-th noise symbol $Z_i \in \{0,1\}$, i.e. $Y_i = X_i \oplus Z_i$, $i = 1, 2, \ldots$ We assume independence between the input and noise processes, and the input is a finite code-word chosen from a finite code-book. This type of channel was analyzed in Alajaji and Fuja (1994), where the process $Z_i$ follows the Polya contagion model.

Figure 1: The binary additive communication channel model.
Following these assumptions, for an output vector $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_n]$, a random input code-word $\mathbf{X} = [X_1, X_2, \ldots, X_n]$ and a random noise vector $\mathbf{Z} = [Z_1, Z_2, \ldots, Z_n]$, the channel transition probabilities are given by¹:

$$P(\mathbf{Y} = \mathbf{y} \,|\, \mathbf{X} = \mathbf{x}) = P(\mathbf{Z} = \mathbf{x} \oplus \mathbf{y}) \qquad (1)$$

where $\mathbf{x} \oplus \mathbf{y} = [x_1 \oplus y_1,\ x_2 \oplus y_2,\ \ldots]$.
To clarify these concepts, a possible set of input, noise and output outcomes is:

$\mathbf{x} = [1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0]$
$\mathbf{z} = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]$
$\mathbf{y} = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]$

¹ Throughout this paper, we use capital letters for random variable names and lower case letters for denoting their realizations. Additionally, bold letters are used for vectors.
Therefore, the 1's in the noise process determine which input symbols are changed. The Hamming distance between the input and the received code-word is given by:

$$d = \sum_{i=1}^{n} z_i \qquad (2)$$

In order to simplify the notation throughout the paper, we avoid the usage of random variable names when the probability of a specific realization is written; for example, instead of writing $P(\mathbf{X} = \mathbf{x} \,|\, \mathbf{Y} = \mathbf{y})$ we write $P(\mathbf{x}|\mathbf{y})$.
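To make the channel model concrete, the short Python sketch below simulates one use of the binary additive channel of Fig. 1 and computes the Hamming distance (2). It is only an illustration of equations (1) and (2); the helper names (`binary_additive_channel`, `hamming_distance`) are ours and not part of the paper.

```python
import numpy as np

def binary_additive_channel(x, z):
    """Binary additive channel: y_i = x_i XOR z_i (modulo-two sum)."""
    return np.bitwise_xor(x, z)

def hamming_distance(a, b):
    """Hamming distance between two binary vectors; equals sum(z) when z = a XOR b."""
    return int(np.sum(np.bitwise_xor(a, b)))

# Example vectors taken from the paper (n = 13)
x = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0])
z = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1])
y = binary_additive_channel(x, z)

print("y =", y)                              # [1 0 1 0 1 0 1 1 0 1 1 0 1]
print("d(x, y) =", hamming_distance(x, y))   # 5, equal to sum(z)
```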
III MAXIMUM LIKELIHOOD DECODING

For a code-book $C$ composed of a set of $m$ code-words, i.e. $C = \{\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^m\}$, the maximum likelihood decoder chooses, as the estimated input, the most probable code-word $\mathbf{x}^k$ given a received output $\mathbf{y}$, i.e. it maximizes $P(\mathbf{x}^k|\mathbf{y})$. Following the Bayes rule we get:

$$P(\mathbf{x}^k|\mathbf{y}) = \frac{P(\mathbf{y}|\mathbf{x}^k)\,P(\mathbf{x}^k)}{P(\mathbf{y})} \qquad (3)$$

Since $P(\mathbf{y})$ is independent of the decoding rule, and considering that all code-words are equally likely, the ML algorithm results in:

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}^k \in C} P(\mathbf{y}|\mathbf{x}^k) \qquad (4)$$

Following (1) and (4), the estimated code-word is obtained by choosing the $\hat{\mathbf{x}} = \mathbf{x}^k$ which makes $P(\mathbf{z}^k)$ maximum, i.e.:

$$\hat{\mathbf{x}} = \arg\max P(\mathbf{z}^k), \qquad \mathbf{z}^k = \mathbf{y} \oplus \mathbf{x}^k,\ \mathbf{x}^k \in C \qquad (5)$$

Then, the estimated input is fully determined by the noise (crossover) characteristics and the code-book used. Following the chain rule of probability, for code-words of length $n$ the noise process can be expressed as:

$$P(\mathbf{z}) = P(z_1) \prod_{i=2}^{n} P(z_i \,|\, z_{i-1}, z_{i-2}, \ldots, z_1) \qquad (6)$$
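The decoding rule (5) can be implemented by exhaustive search over the code-book whenever the noise probability $P(\mathbf{z})$ can be evaluated. The sketch below is a generic, hypothetical implementation of (5): `noise_prob` stands for whichever noise model of the following subsections is in use, and the function name `ml_decode` is ours.

```python
import numpy as np

def ml_decode(y, codebook, noise_prob):
    """Generic ML decoder, equation (5): pick the code-word x^k in the
    code-book that maximizes P(z^k) with z^k = y XOR x^k.

    y          : received binary vector (numpy array of 0/1)
    codebook   : list of candidate code-words (numpy arrays of 0/1)
    noise_prob : callable returning P(z) for a noise vector z
    """
    best_x, best_p = None, -1.0
    for x in codebook:
        z = np.bitwise_xor(y, x)   # candidate noise realization
        p = noise_prob(z)          # P(Z = z) under the chosen noise model
        if p > best_p:
            best_x, best_p = x, p
    return best_x, best_p
```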
A ML decoder error probability

If $k_{max}$ denotes the index for which the probability $P(\mathbf{y}|\mathbf{x}^k)$ is maximized, i.e. $\hat{\mathbf{x}} = \mathbf{x}^{k_{max}}$, then the conditional error probability of the ML decoder is defined as (Barbero et al (2006)):

$$P(\mathrm{error}\,|\,\mathbf{y}) = P(\mathbf{x}^{k_{max}} \neq \mathbf{x}^k \,|\, \mathbf{y}) \qquad (7)$$

and the error probability of the ML decoder is

$$P(\mathrm{error}) = \sum_{\mathbf{y}} P(\mathrm{error}\,|\,\mathbf{y})\,P(\mathbf{y}) \qquad (8)$$

Now we obtain an expression for the error probability in terms of the code-book $C$ and the received vector $\mathbf{y}$. Equation (7) can be rewritten as

$$P(\mathrm{error}\,|\,\mathbf{y}) = \sum_{i \neq k_{max}} P(\mathbf{x}^i|\mathbf{y})$$

and, by using the Bayes rule, it is easy to see that the conditional error probability can be written in the following form:

$$P(\mathrm{error}\,|\,\mathbf{y}) = \frac{\sum_{i \neq k_{max}} P(\mathbf{y}|\mathbf{x}^i)\,P(\mathbf{x}^i)}{\sum_{i=1}^{m} P(\mathbf{y}|\mathbf{x}^i)\,P(\mathbf{x}^i)} \qquad (9)$$

$$P(\mathrm{error}\,|\,\mathbf{y}) = 1 - \frac{P(\mathbf{y}|\mathbf{x}^{k_{max}})\,P(\mathbf{x}^{k_{max}})}{\sum_{i=1}^{m} P(\mathbf{y}|\mathbf{x}^i)\,P(\mathbf{x}^i)} \qquad (10)$$

From equation (10) it is clear that the flatter the function $P(\mathbf{y}|\mathbf{x}^k)$ is as a function of $\mathbf{x}^k$, the bigger the error probability is.
In the following subsections, we analyze the decoder behavior for some specific noise distributions.
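As an illustration of (9) and (10), the conditional error probability can be evaluated directly from the per-code-word likelihoods. The helper below is a hypothetical sketch (names are ours) that assumes equally likely code-words, so the priors $P(\mathbf{x}^i) = 1/m$ cancel out.

```python
import numpy as np

def conditional_error_prob(y, codebook, noise_prob):
    """P(error | y) from equation (10), assuming equally likely code-words."""
    likelihoods = np.array([noise_prob(np.bitwise_xor(y, x)) for x in codebook])
    return 1.0 - likelihoods.max() / likelihoods.sum()
```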
B Bernoulli noise model

For this noise distribution, all the $Z_i$'s are independent and have a common parameter $p$ (the probability of a change in one bit, or crossover probability), so (6) results in:

$$P(z_i \,|\, z_{i-1}, z_{i-2}, \ldots, z_1) = P(z_i) = p^{z_i}(1-p)^{1-z_i} \qquad (11)$$

According to (1), (6) and (11), the probability $P(\mathbf{y}|\mathbf{x}^k)$ that a given code-word $\mathbf{x}^k$ was the input when the word $\mathbf{y}$ is received is given by:
$$g_B(d) = P(\mathbf{z}^k) = p^{d}\,(1-p)^{n-d} \qquad (12)$$

where $d = d(\mathbf{x}^k, \mathbf{y})$ is the Hamming distance between $\mathbf{x}^k$ and $\mathbf{y}$, as already defined in (2).
As can be seen from (12), when $p$ is less than $1-p$, the most probable input code-word (ML decoding), i.e. the one maximizing $g_B(d)$, is the one closest to the received word (minimum $d$). Conversely, when $p$ is greater than $1-p$, the ML decoded input is the one having the greatest $d$, i.e. the code-word most different from the received one. This simple case already shows the two possibilities for ML decoding: when $p < 1/2$, the noise parameter is not large enough to produce considerable changes; when $p > 1/2$, the noise parameter is large enough to consider that the input was changed as much as possible.
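Under the Bernoulli model the generic decoder of (5) therefore reduces to comparing Hamming distances: minimum distance when $p < 1/2$ and maximum distance when $p > 1/2$. A minimal sketch (helper name is ours):

```python
import numpy as np

def ml_decode_bernoulli(y, codebook, p):
    """ML decoding for i.i.d. Bernoulli(p) noise, equation (12):
    g_B(d) = p^d (1-p)^(n-d) is monotone in d, so only d matters."""
    distances = np.array([np.sum(np.bitwise_xor(y, x)) for x in codebook])
    k = distances.argmin() if p < 0.5 else distances.argmax()
    return codebook[k]
```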
C Polya contagion noise model

As analyzed in Alajaji and Fuja (1994), when the noise process is given by the Polya contagion model (see Polya and Eggenberger (1923), Feller (1950)), the conditional probabilities are given by:

$$P(z_i \,|\, z_{i-1}, z_{i-2}, \ldots, z_1) = P(z_i \,|\, s_{i-1}) \qquad (13)$$

where $s_{i-1} = \sum_{l=1}^{i-1} z_l$. The channel transition probabilities result in:

$$g_P(d) = P(\mathbf{z}^k) = \frac{\Gamma(1/\delta)\,\Gamma(\rho/\delta + d)\,\Gamma(\sigma/\delta + n - d)}{\Gamma(\rho/\delta)\,\Gamma(\sigma/\delta)\,\Gamma(1/\delta + n)} \qquad (14)$$
where $d$ is the Hamming distance as defined before, $\rho$, $\sigma = 1 - \rho$ and $\delta$ are the model parameters, and $\Gamma(t) = \int_0^{\infty} u^{t-1} e^{-u}\,du$ is the gamma function. Since $g_P(d)$ is strictly convex, it has a unique minimum $d_0$ and is symmetric about $d_0$; hence the most probable code-word is either the one having minimum or the one having maximum Hamming distance from the received code-word (Alajaji and Fuja (1994)). That is, the best estimate corresponds to the $d$ farthest away from $d_0$. For the Polya contagion model this property is independent of the parameters; in other words, the estimated input can be either the closest or the farthest code-word, depending on the received code-word. This is due to the convexity of $g_P(d)$.
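Because $g_P(d)$ depends on the noise vector only through $d$, the decoder can evaluate (14) for the distance of each candidate and pick the maximizer. The sketch below works in the log domain with `scipy.special.gammaln` to avoid overflow; the parameter names `rho`, `sigma`, `delta` follow our reconstruction of (14) and should be read as assumptions.

```python
import numpy as np
from scipy.special import gammaln

def log_g_polya(d, n, rho, delta):
    """log g_P(d) for the Polya contagion noise model, equation (14)."""
    sigma = 1.0 - rho
    return (gammaln(1.0 / delta) + gammaln(rho / delta + d) + gammaln(sigma / delta + n - d)
            - gammaln(rho / delta) - gammaln(sigma / delta) - gammaln(1.0 / delta + n))

def ml_decode_polya(y, codebook, rho, delta):
    """ML decoding under Polya contagion noise: maximize g_P(d(x^k, y))."""
    n = len(y)
    scores = [log_g_polya(int(np.sum(np.bitwise_xor(y, x))), n, rho, delta)
              for x in codebook]
    return codebook[int(np.argmax(scores))]
```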
D Markov noise model

We consider here that the noise process can be modeled by a first-order Markov chain (Feller (1950)), i.e. $P(z_i \,|\, z_{i-1}, \ldots, z_1) = P(z_i \,|\, z_{i-1})$. This model depends on three parameters: the crossover probability $p = P(z_i = 1)$ and the noise transition probabilities $\alpha = P(z_i = 1 \,|\, z_{i-1} = 0)$ (probability of a bit 1 given that the previous noise outcome is a 0) and $\beta = P(z_i = 0 \,|\, z_{i-1} = 1)$ (probability of a bit 0 given that the previous noise outcome is a 1). Using the chain rule (6) we obtain the channel transition probabilities as follows:

$$P(\mathbf{z}^k) = p^{z_1}(1-p)^{1-z_1}\,\alpha^{n_{01}}(1-\alpha)^{n_{00}}\,\beta^{n_{10}}(1-\beta)^{n_{11}} \qquad (15)$$

where the parameter $n_{st}$ ($s, t = 0, 1$) is the number of bits with value $s$ followed by a bit with value $t$, and the constraint $n_{10} + n_{11} + n_{01} + n_{00} = n - 1$ holds.
A very simple expression for the ML decoder is obtained from (15) for the particular case where the noise transitions are symmetric, i.e. $\alpha = \beta$. In the latter case, the function to be maximized (ML decoder) is, up to a constant factor:

$$g_M(z_1, q) = \left(\frac{p}{1-p}\right)^{z_1} \left(\frac{\alpha}{1-\alpha}\right)^{q} \qquad (16)$$

where $q = n_{01} + n_{10}$ is the number of transitions (0 to 1 and 1 to 0) in the noise vector $\mathbf{z} = \mathbf{y} \oplus \mathbf{x}^k$.
We conclude from (16) that the ML decoder is a non-decreasing (decreasing) function of $q$ when the noise transition probability is $\alpha > 0.5$ ($\alpha < 0.5$). In other words, when $\alpha > 0.5$ the most probable input code-word is the one corresponding to a noise vector with the highest possible number of transitions (maximum $q$).

E Logarithmic noise model

In this model we consider that the noise is composed of alternating chains of 1's and 0's, and that the length of each chain follows a logarithmic distribution (Douglas (1980)). If we denote by $U$ the length of a given 1's chain and by $V$ the length of a given 0's chain, then:

$$P(U = u) = \frac{\theta_1^{\,u}}{-u \ln(1 - \theta_1)} \qquad (17)$$

$$P(V = v) = \frac{\theta_0^{\,v}}{-v \ln(1 - \theta_0)} \qquad (18)$$

where $\theta_1$ and $\theta_0$ are the parameters of the logarithmic distributions corresponding to the 1's and the 0's respectively, and $0 < \theta_1, \theta_0 < 1$.
In order to clarify this model, a noise output example is shown below:

$$\mathbf{z} = [\underbrace{0, 0}_{v_1 = 2},\ \underbrace{1, 1}_{u_1 = 2},\ \underbrace{0, 0, 0, 0}_{v_2 = 4},\ \underbrace{1, 1}_{u_2 = 2},\ \underbrace{0, 0}_{v_3 = 2},\ \underbrace{1}_{u_3 = 1}]$$

where $u_i$ and $v_j$ are the lengths of the $i$-th chain of 1's and the $j$-th chain of 0's. Notice that high values of $\theta_1$ and $\theta_0$ produce noise configurations with long chains; on the other hand, for $\theta_1, \theta_0 \to 0$ we get configurations with alternating single 1's and 0's.
The interest in this model comes from the property that the probability of getting a 1 in a given bit, following a group of $r$ 1's, tends to a constant value $\theta_1$ as $r \to \infty$, as is shown by the conditional probability:

$$P(Z_i = 1 \,|\, Z_{i-1} = 1, \ldots, Z_{i-r} = 1) = \frac{S_{r+1} + S_{r+2} + \cdots}{S_r + S_{r+1} + \cdots} \qquad (19)$$

where $S_r = P(U = r)$. This property marks a difference with the Polya contagion model, where the conditional probability (19) tends to 1 as $r$ tends to infinity. The property (19) for the logarithmic distribution was remarked in Siromoney (1964).
Assuming independence among the $u_i$ and $v_j$ for all $i, j$, we obtain the channel transition probabilities as follows:

$$P(\mathbf{z}^k) = \prod_{i=1}^{k_1} P(u_i)\,\prod_{j=1}^{k_0} P(v_j) \qquad (20)$$

where $k_1$ is the number of 1's chains and $k_0$ is the number of 0's chains.
In order to obtain the ML decoder, we apply the natural logarithm to (20) and, dropping terms that do not depend on the code-word, we obtain the $g_L(\cdot)$ function to be maximized, which is:

$$g_L(n_1, k_1, k_0, \{u_i\}, \{v_j\}) = n_1 \ln\frac{\theta_1}{\theta_0} - k_1 \ln[\lambda_1] - k_0 \ln[\lambda_0] - \sum_{i=1}^{k_1} \ln[u_i] - \sum_{j=1}^{k_0} \ln[v_j] \qquad (21)$$

where $n_1 = \sum_{i=1}^{k_1} u_i$ is the total number of 1's and the parameters $\lambda_1$ and $\lambda_0$ are defined by $\lambda_s = -\ln(1 - \theta_s)$ (for $s = 0, 1$).
From the observation of equation (21) we conclude that there are many variables to measure ($n_1$, $k_1$, $k_0$, $\{u_i\}$ and $\{v_j\}$) for the implementation of the ML decoder, which could be a problem from the point of view of its complexity. For this reason, in this paper we propose an approximation of (21) in order to reduce its complexity.
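For reference, the exact metric (21) can be evaluated directly from the run-length decomposition of the candidate noise vector. The following sketch (function names are ours) extracts the chains of 1's and 0's, computes $g_L$, and also returns the number of transitions $q = k_1 + k_0 - 1$ used below.

```python
import numpy as np
from itertools import groupby

def run_lengths(z):
    """Decompose a binary noise vector into chains: returns ({u_i}, {v_j})."""
    ones, zeros = [], []
    for bit, run in groupby(z):
        (ones if bit == 1 else zeros).append(len(list(run)))
    return ones, zeros

def g_log(z, theta1, theta0):
    """Exact logarithmic-noise metric g_L of equation (21)."""
    u, v = run_lengths(z)
    lam1, lam0 = -np.log(1 - theta1), -np.log(1 - theta0)
    n1, k1, k0 = sum(u), len(u), len(v)
    return (n1 * np.log(theta1 / theta0)
            - k1 * np.log(lam1) - k0 * np.log(lam0)
            - sum(np.log(ui) for ui in u) - sum(np.log(vj) for vj in v))

# Number of transitions in z is q = k1 + k0 - 1
z = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1])
u, v = run_lengths(z)
print(u, v, "q =", len(u) + len(v) - 1)   # [2, 2, 1] [2, 4, 2] q = 5
```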
The idea is that the last two terms in (21) can be approximated by using the linear approximation of the logarithm ($\ln(t) \approx t - 1$ for $t$ close to 1) as follows:

$$\sum_{i=1}^{k_1} \ln u_i \approx \frac{n_1}{\mu_1} - k_1 + k_1 \ln \mu_1 \qquad (22)$$

$$\sum_{j=1}^{k_0} \ln v_j \approx \frac{n_0}{\mu_0} - k_0 + k_0 \ln \mu_0 \qquad (23)$$

where $n_0 = n - n_1$ is the total number of 0's, and $\mu_1 = E[U] = -\theta_1 / [(1 - \theta_1)\ln(1 - \theta_1)]$ and $\mu_0 = E[V] = -\theta_0 / [(1 - \theta_0)\ln(1 - \theta_0)]$ are the mean values of the logarithmic random variables $U$ and $V$ respectively. Note that in the approximations (22) and (23) the linearization is applied to $u_i / \mu_1$ and $v_j / \mu_0$, which indicates that these approximations will be valid when the chain lengths are not too far from their mean values. Finally, by putting (22) and (23) into (21), we obtain the approximated $g_L(\cdot)$ function (up to an additive constant independent of the code-word):

$$\hat{g}_L(n_1, k_1, k_0) = n_1 \left[\ln\frac{\theta_1}{\theta_0} - \frac{1}{\mu_1} + \frac{1}{\mu_0}\right] + k_1\{1 - \ln[\mu_1 \lambda_1]\} + k_0\{1 - \ln[\mu_0 \lambda_0]\} \qquad (24)$$

Let us now consider a simple case where the 1's chains and the 0's chains are identically distributed, i.e. $\theta = \theta_1 = \theta_0$. In this case $\lambda_1 = \lambda_0 = \lambda$ and $\mu_1 = \mu_0 = \mu$, and therefore the approximated ML decoder is even simpler:

$$\hat{g}_L(q) = (q + 1)\{1 - \ln[\mu\lambda]\} \qquad (25)$$

where $q = k_1 + k_0 - 1$ is the number of transitions (0 to 1 and 1 to 0) in the noise vector $\mathbf{z} = \mathbf{y} \oplus \mathbf{x}^k$.
Looking at equation (25) we see that, in this particular case, $\hat{g}_L(q)$ depends linearly on the number of transitions; so we only need to determine whether the factor $h(\theta) = 1 - \ln[\mu\lambda] = 1 - \ln[\theta/(1-\theta)]$ is positive or negative in order to assign the most probable transmitted code-word to the maximum or to the minimum number of transitions $q$. From Fig. 2 we see that $\theta \approx 0.73$ is a threshold: the most probable code-word corresponds to the one having the maximum or the minimum number of transitions $q$ according to whether $\theta < 0.73$ or $\theta > 0.73$.

Figure 2: Plot of $h(\theta) = 1 - \ln[\theta/(1-\theta)]$. The ML decoder chooses the maximum or the minimum number of transitions $q$ according to $\theta < 0.73$ or $\theta > 0.73$, respectively.

In order to test the effectiveness of our approximations (22) and (23) we conducted a large number of simulations in which noise vectors were generated according to their logarithmic distributions for the case $\theta = \theta_1 = \theta_0$, covering the complete range of the parameter $\theta$. A random code-book with $m = 16$ code-words was generated for different code-word lengths $n$ ($n = 7$, $14$, $21$ and $28$), and a minimum Hamming distance among code-words of $d(\mathbf{x}^i, \mathbf{x}^j) = 2$ was guaranteed. For each value of $\theta$, a total of 500 simulations were conducted in order to average the obtained decoder error probability and reach an estimate of (8). In Fig. 3 the decoder error probabilities obtained by using the exact ML decoder (equation (21)) and the approximated ML decoder (equation (25)) are shown. Notice that the exact ML decoder always gives a lower error probability than the approximated version, as expected. The maximum probability of error is reached at $\theta \approx 0.75$, as shown. We remark that this value of $\theta$ also gives the maximum variance of $q$, in agreement with the maximum probability of error of the ML decoder and with the transition threshold shown in Fig. 3. These results will be further studied in a future work.

Figure 3: Exact ML decoder (optimum) versus approximated ML decoder for $\theta = \theta_1 = \theta_0$, $m = 16$ and $n = 7$, $14$, $21$ and $28$.
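The approximated decoder (25) therefore reduces to counting transitions and checking the sign of $h(\theta)$. A minimal sketch, assuming $\theta = \theta_1 = \theta_0$ (names are ours):

```python
import numpy as np

def ml_decode_log_approx(y, codebook, theta):
    """Approximated ML decoder of equation (25): the decision depends only on
    the number of transitions q in z = y XOR x^k and on the sign of h(theta)."""
    h = 1.0 - np.log(theta / (1.0 - theta))   # threshold factor, zero at theta ~ 0.73

    def q(z):
        # number of 0->1 and 1->0 transitions in the noise vector
        return int(np.sum(z[1:] != z[:-1]))

    qs = np.array([q(np.bitwise_xor(y, x)) for x in codebook])
    k = qs.argmax() if h > 0 else qs.argmin()
    return codebook[k]
```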
F 2D Ising noise model

In this subsection, we extend the ML decoder to 2D binary signals transmitted over a channel with the same characteristics as shown in Fig. 1. 2D signals are useful for representing digital images. A very well known model for binary images is the Ising model, which has its roots in statistical mechanics as a model for ferromagnetic materials (Huang (1987)). The Ising model has been widely applied to model interactions between pixels in images (Geman and Geman (1984)), and motivated the development of the theory of Markov Random Fields (Greaffeath (1976)). In this paper we propose to use the Ising model to represent the 2D noise process $\{Z_{i,j}\}$ with $i, j = 1, 2, \ldots, L$ (for $L \times L$ images).
Originally, in the Ising model, the lattice variables are called spins $\{S_{i,j}\}$ and they are allowed to take only two opposite states: spin up ($s_{i,j} = +1$) or spin down ($s_{i,j} = -1$). In this case, the probability of a lattice configuration $\{s_{i,j}\}$ is provided by the Gibbs formula (Huang (1987)):

$$P(\mathbf{s}) \propto \exp\left(\beta \sum_{i,j} s_{i,j}\,(s_{i+1,j} + s_{i,j+1}) + H \sum_{i,j} s_{i,j}\right) \qquad (26)$$

where $\mathbf{s}$ is a vector containing all the variables of the lattice $\{s_{i,j}\}$, $\beta$ is called the interaction coefficient and $H$ is the external magnetic field. The effect of the parameter $\beta$ is to regulate the interaction among neighboring spins: for $\beta \to 0$ the spins tend to be independent of each other; on the other hand, if $|\beta|$ is higher than the critical value $\beta_c \approx 0.44$, then the lattice is magnetized (a majority of the spins are in the same state) (Huang (1987)). In turn, a positive (negative) parameter $H$ induces the spins to adopt the $+1$ ($-1$) state.
Since we want to model binary images, we need to apply a mapping from the lattice with spin states to a new lattice with binary values 0 and 1 ($\{S_{i,j}\} \to \{Z_{i,j}\}$). For this mapping we consider the following relationship:

$$z_{i,j} = \frac{s_{i,j} + 1}{2} \qquad (27)$$

Using the mapping (27) in equation (26), and after applying some algebraic operations, we finally reach the probability of a 2D Ising noise process, which is:

$$P(\mathbf{z}) \propto \exp\left(4\beta\,n_{11} + 2(H - 4\beta)\,n_1\right) \qquad (28)$$

where $\mathbf{z}$ is a vector containing all the variables of the lattice $\{z_{i,j}\}$ (also known as pixels in an image processing context), $n_{11}$ is the number of horizontal and vertical pixel pairs where both pixels have the value 1, and $n_1$ is the total number of pixels with the value 1.
Following the reasoning of the previous subsections, the ML decoder for this case relies on the maximization of the following $g_I(\cdot)$ function:

$$g_I(n_{11}, n_1) = 2\beta\,n_{11} + (H - 4\beta)\,n_1 \qquad (29)$$

Note that different scenarios can take place depending on the values of the parameters $H$ and $\beta$; for example, if $\beta > 0$ and $H < 4\beta$, then the ML decoder will choose the code-word which produces the maximum $n_{11}$ and, at the same time, the minimum $n_1$ within the set of vectors $\mathbf{z}^k = \mathbf{y} \oplus \mathbf{x}^k$, $\mathbf{x}^k \in C$.
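For a 2D code-book, the statistics $n_{11}$ and $n_1$ of each candidate noise image $\mathbf{z}^k = \mathbf{y} \oplus \mathbf{x}^k$ are simple pixel counts, so (29) is cheap to evaluate. A sketch under the same assumptions (free boundary conditions; helper names are ours):

```python
import numpy as np

def g_ising(z, beta, H):
    """Ising noise metric g_I of equation (29) for a 2D binary array z.
    n11 counts horizontal and vertical neighbor pairs that are both 1,
    n1 counts pixels equal to 1."""
    n11 = int(np.sum(z[:, :-1] * z[:, 1:]) + np.sum(z[:-1, :] * z[1:, :]))
    n1 = int(np.sum(z))
    return 2.0 * beta * n11 + (H - 4.0 * beta) * n1

def ml_decode_ising(y, codebook, beta, H):
    """ML decoding for 2D Ising noise: maximize g_I over the code-book."""
    scores = [g_ising(np.bitwise_xor(y, x), beta, H) for x in codebook]
    return codebook[int(np.argmax(scores))]
```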
In order to simplify equation (29), in this paper we provide an approximation based on the Bragg-Williams approximation, already used in physics for the estimation of the critical temperature of the Ising model (Huang (1987)). The Bragg-Williams approximation states that

$$\frac{n_{11}}{2n} \approx \left(\frac{n_1}{n}\right)^2 \qquad (30)$$

where $n$ is the total number of pixels. Replacing the approximation (30) in (29) we get a simpler, approximated $\hat{g}_I(\cdot)$ function:

$$\hat{g}_I(n_1) = \frac{n_1}{n}\left[4\beta\,\frac{n_1}{n} + H - 4\beta\right] \qquad (31)$$

Notice that the function $\hat{g}_I(n_1)$ is quadratic and convex (for $\beta > 0$); therefore it has a unique minimum, and the ML decoder behaves similarly to the Polya contagion case, that is, the estimated input can be either the closest or the farthest code-word, depending on the received code-word. A sketch of (31) is shown in Fig. 4.

Figure 4: Likelihood function for the 2D Ising noise model.
IV CONCLUSIONS
Several noise models for a binary communication channel were analyzed. The Bernoulli and Markov models were introduced in order to compare them with the Polya contagion model previously analyzed in the literature. A logarithmic distribution for the noise process was also specially studied, since its conditional probabilities tend to a constant value $\theta_1 < 1$, in contrast with the Polya contagion model. We showed that, for the Markov chain and Logarithmic noise models, under certain conditions the ML decoder reduces to maximizing or minimizing the number of transitions $q$. Additionally, a two-dimensional Ising model was analyzed, since it is usually applied in image processing. We showed that the ML decoding algorithm can be reduced to counting 0's and 1's when the Bragg-Williams approximation is applied to binary images.
In summary, this work provides new mathematical results that can be useful for the implementation of new decoders that take advantage of already known noise processes. One-dimensional noise models, like the Markov and logarithmic cases discussed here, can be used for modeling burst-like noise in a communication channel, where the probability of an error in a bit depends on the errors in the rest of the bits. On the other side, the 2D Ising model developed in this paper can be used directly for modeling spot-like noise in black and white images, for example in digitally scanned images or scanned photocopies, where the degradation of the image is not well modeled by an i.i.d. (independent, identically distributed) variable associated with the pixels. The Bragg-Williams approximation introduced here could also be considered for more general Markov Random Fields; this will be discussed in a future work.
Acknowledgements: C. F. Caiafa acknowledges the support of the Facultad de Ingeniería, Universidad de Buenos Aires, Argentina (Beca Doctoral Peruilh). This work was partially supported by the UBACyT I036 Project. The authors thank Eng. Facundo Caram for his useful comments on the first draft of this paper.
References

Alajaji, F. and Fuja, T., A communication channel modeled on contagion, IEEE Trans. on Inf. Theory, 40, No. 6, pp. 2035-2041 (1994).

Barbero, A., Ellingsen, P., Spisante, S. and Ytrehus, O., Maximum Likelihood Decoding of Codes on the Z-channel, IEEE International Conference on Communications (ICC '06), Istanbul, pp. 1200-1205 (2006).

Chi-Chao Chao, McEliece, R. J., Swanson, L. and Rodemich, E. R., Performance of binary block codes at low signal-to-noise ratios, IEEE Trans. Inf. Theory, 38, No. 6, pp. 1677-1687 (1992).

Douglas, J. B., Analysis with Standard Contagious Distributions, International Co. Publishing House (1980).

Feller, W., An Introduction to Probability Theory and Its Applications, Volume 1, J. Wiley (1950).

Geman, S. and Geman, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Patt. Anal. Machine Intell., 6, No. 6, pp. 721-741 (1984).

Greaffeath, D., Introduction to Random Fields, in Denumerable Markov Chains, New York: Springer-Verlag, pp. 425-458 (1976).

Haykin, S., Communication Systems, 4th Edition, J. Wiley (2001).

Huang, K., Statistical Mechanics, 2nd Edition, J. Wiley (1987).

Hui Jin and McEliece, R. J., Coding theorems for turbo code ensembles, IEEE Trans. Inf. Theory, 48, No. 6, pp. 1451-1461 (2002).

Khandekar, A. and McEliece, R. J., On the Complexity of Reliable Communication on the Erasure Channel, IEEE International Symposium on Information Theory (ISIT '01), Washington (2001).

Moreira, J. C. and Farrell, P. G., Essentials of Error-Control Coding, 1st Edition, J. Wiley (2006).

Polya, G. and Eggenberger, F., Über die Statistik verketteter Vorgänge, Z. Angew. Math. Mech., 3, pp. 279-289 (1923).

Siromoney, G., The General Dirichlet's Series Distribution, Journal of the Indian Statistical Association, 2, No. 2 (1964).