SHANNON-FANO-ELIAS CODING
In digital communication, the source encoder converts the input symbol sequence into a binary sequence of 0s and 1s. Each source symbol is represented by a sequence of coded symbols called a codeword.
Shannon-Fano-Elias coding uses the symbol probabilities to determine the codewords; more specifically, it derives each codeword from the cumulative distribution function. The result is a binary prefix code.
Given a discrete random variable X with ordered values to be encoded, let p(x) be the probability of any x in X, where we assume p(x) > 0 for all x, so that each x has a step of its own in the CDF. The cumulative distribution function F(x) is

F(x) = ∑_{a ≤ x} p(a).

We can represent the CDF as a step function, where the step at k has height p(k), so each step value is the sum of all previous probabilities. We define the modified cumulative distribution function F̄(x) as the sum of the probabilities of all symbols less than x plus half the probability of x itself (the midpoint of the step in the picture):

F̄(x) = ∑_{a < x} p(a) + p(x)/2.

In general, F̄(x) is a real number, so representing it exactly could take an infinite number of bits. To avoid that, we use it as the core of our code and keep only its first l(x) bits, where

l(x) = ⌈log(1/p(x))⌉ + 1.
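As a concrete illustration, the following is a minimal Python sketch of this construction (names like sfe_codewords are ours, not from the text). It uses plain floating point to extract the first l(x) bits of F̄(x); a careful implementation would use exact arithmetic, since floats can misround the bit extraction for some probability values.

    from math import ceil, log2

    def sfe_codewords(probs):
        # Shannon-Fano-Elias codewords for an ordered list of
        # probabilities p(1), p(2), ...; all assumed > 0.
        codewords = []
        F = 0.0                          # F(x-1): CDF up to the previous symbol
        for p in probs:
            F_bar = F + p / 2            # modified CDF: midpoint of the step of x
            l = ceil(log2(1 / p)) + 1    # codeword length l(x)
            bits, frac = "", F_bar
            for _ in range(l):           # take the first l(x) bits of the
                frac *= 2                # binary expansion of F_bar
                bits += str(int(frac))
                frac -= int(frac)
            codewords.append(bits)
            F += p
        return codewords

    # Example: probabilities (0.25, 0.5, 0.125, 0.125)
    # yield the codewords ['001', '10', '1101', '1111'].
    print(sfe_codewords([0.25, 0.5, 0.125, 0.125]))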
The resulting codeword, ⌊F̄(x)⌋_{l(x)} (that is, F̄(x) truncated to l(x) bits), must satisfy two requirements: it must lie within the step corresponding to x in the step function, and the code must be prefix-free.

Proof that it lies within the step corresponding to x: truncation to l(x) bits removes less than 2^{−l(x)}, so

F̄(x) − ⌊F̄(x)⌋_{l(x)} < 1/2^{l(x)}.

With l(x) = ⌈log(1/p(x))⌉ + 1 we have 1/2^{l(x)} ≤ p(x)/2, and therefore

F̄(x) − ⌊F̄(x)⌋_{l(x)} < 1/2^{l(x)} ≤ p(x)/2 = F̄(x) − F(x−1),

which gives ⌊F̄(x)⌋_{l(x)} > F(x−1). On the other side, truncation never increases a value, so ⌊F̄(x)⌋_{l(x)} ≤ F̄(x) < F(x). Together,

F(x−1) < ⌊F̄(x)⌋_{l(x)} < F(x),

i.e., the codeword lies within the step corresponding to x.
Proof that it is prefix-free: the code is prefix-free if and only if the intervals corresponding to the codewords are disjoint. Each codeword z_1 z_2 … z_l corresponds to the interval

[0.z_1 z_2 … z_l, 0.z_1 z_2 … z_l + 1/2^l),

the set of all numbers whose binary expansion begins with 0.z_1 z_2 … z_l. The interval corresponding to a codeword therefore has length 2^{−l(x)}, which, as shown above, is at most half the height of the step. The lower end of the interval lies in the lower half of the step, because it is F̄(x) (the midpoint) rounded down. From these two facts, the upper end of the interval does not go beyond the top of the step. Since the entire interval for each x lies within the step of that x, the intervals for different x are disjoint, and the code is prefix-free.
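To make both properties concrete, take a source with four symbols and (illustrative) probabilities 0.25, 0.5, 0.125 and 0.125:

    x   p(x)    F(x)    F̄(x)     l(x)   F̄(x) in binary   codeword
    1   0.25    0.25    0.125    3      0.001            001
    2   0.5     0.75    0.5      2      0.10             10
    3   0.125   0.875   0.8125   4      0.1101           1101
    4   0.125   1.0     0.9375   4      0.1111           1111

The interval of each codeword (e.g. [0.5, 0.75) for 10) lies within the step of its symbol (here [0.25, 0.75)), the intervals are disjoint, and indeed no codeword is a prefix of another.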
ARITHMETIC CODING
It is an algorithm for both encoding and decoding. Arithmetic coding is in principle a generalization of Shannon-Fano-Elias coding from single symbols to symbol sequences. Instead of finding a codeword for each symbol, we find a codeword for an entire sequence of symbols, for example ABCD. Compared to, say, the Huffman algorithm, it is more practical because it extends easily to longer blocks: if we append one more symbol, the codeword can be computed directly from the computation already done for the rest of the block.
Simplified version of arithmetic coding: we assume blocks of fixed length n and a binary source alphabet (the source symbols are bits, so they are easy to order). A string x is greater than a string y if x_i = 1 and y_i = 0 at the first position i where they differ; for example, 0110 > 0101, which first differ in the third position. In a tree whose right branches are labeled 1 and left branches 0, this means x lies to the right of y. Equivalently,

x > y  if  ∑_i x_i 2^{−i} > ∑_i y_i 2^{−i},

which means that the corresponding binary fractions satisfy 0.x > 0.y.
As in the previous section, a few steps need to be done:
1. Calculate p(x^n) = p(x_1 x_2 … x_n) = ∏_{i=1}^{n} p(x_i), assuming the source is i.i.d. Note that knowing p(x^n) we can calculate p(x^n x_{n+1}) = p(x^n) p(x_{n+1}) incrementally.
2. Calculate the joint cumulative distribution function F(x^n) for the source sequence x^n. As in Shannon-Fano-Elias coding, this is the sum of the probabilities of all y^n such that y^n < x^n. We can arrange the strings as the leaves of a tree of depth n, where each level of the tree corresponds to one bit. In this arrangement, the ordering x > y corresponds to x being to the right of y on the same level of the tree. To find F(x^n) we need the total probability of all leaves to the left of x^n, which is the sum of the probabilities of all subtrees hanging to the left of the path to x^n; we need not evaluate every individual p(y^n) when the probability of a higher node is known. Let T_{x_1 x_2 … x_{k−1} 0} be the subtree rooted at x_1 x_2 … x_{k−1} 0, where the final 0 is a left branch (we only look at subtrees to the left, so the path to such a root always ends in 0). The probability of that subtree is the sum over all its leaves:

p(T_{x_1 x_2 … x_{k−1} 0}) = ∑_{y_{k+1} … y_n} p(x_1 x_2 … x_{k−1} 0 y_{k+1} … y_n) = p(x_1 x_2 … x_{k−1} 0).

The subtrees to the left of x^n are exactly those rooted at x_1 … x_{k−1} 0 for the positions k with x_k = 1, so

F(x^n) = ∑_{y^n < x^n} p(y^n) = ∑_{T to the left of x^n} p(T) = ∑_{k : x_k = 1} p(x_1 … x_{k−1} 0)

(see the sketch after this list).
3. Use a number in the interval [F(x^n) − p(x^n), F(x^n)) (the step of x^n) as the code for x^n. That number is F̄(x^n), truncated as before. By the same argument as for Shannon-Fano-Elias coding, the codeword corresponding to any sequence lies within the step of the CDF corresponding to that sequence, so the codes for different sequences of length n are different.
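Steps 1 and 2 can be carried out in a single pass over the bits of x^n. A minimal Python sketch, assuming an i.i.d. binary source with P(X_i = 1) = p1 (the function name is ours), accumulates exactly the left-subtree probabilities from step 2:

    def seq_prob_and_cdf(x, p1):
        # x: the sequence x^n as a list of bits; p1 = P(X_i = 1).
        # Returns (p(x^n), F(x^n)), where F(x^n) is accumulated as the
        # sum of p(x_1 ... x_{k-1} 0) over all positions k with x_k = 1,
        # i.e. the probabilities of the subtrees to the left of x^n.
        p0 = 1 - p1
        prefix_p = 1.0               # p(x_1 ... x_{k-1})
        F = 0.0
        for bit in x:
            if bit == 1:
                F += prefix_p * p0   # whole subtree T_{x_1...x_{k-1}0} at once
            prefix_p *= p1 if bit == 1 else p0
        return prefix_p, F           # prefix_p is now p(x^n)

    # Example: for p1 = 0.5 and x = 101, p(x) = 0.125 and F(x) = 0.625
    # (the five strings 000, 001, 010, 011, 100 lie to the left of 101).
    print(seq_prob_and_cdf([1, 0, 1], 0.5))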
Suppose that we want to code a sequence x = x_1, x_2, …, x_n. Start with the whole probability interval [0, 1). In each step, divide the current interval in proportion to the cumulative distribution F(i) and choose the subinterval corresponding to the symbol that is to be coded.
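A minimal sketch of this sequential view, again in Python with floats for readability (practical arithmetic coders work on integer intervals and renormalize to avoid running out of precision):

    def narrow_interval(seq, probs):
        # seq: symbol indices 0..m-1; probs: their probabilities.
        # Returns the final subinterval [low, high); any number inside it,
        # such as the truncated midpoint, can serve as the code for seq.
        cum = [0.0]                  # cum[i] = p(0) + ... + p(i-1)
        for p in probs:
            cum.append(cum[-1] + p)
        low, high = 0.0, 1.0
        for s in seq:
            width = high - low
            low, high = low + width * cum[s], low + width * cum[s + 1]
        return low, high

    # Example: probs = [0.5, 0.5] and seq = 101 narrow the interval as
    # [0, 1) -> [0.5, 1) -> [0.5, 0.75) -> [0.625, 0.75),
    # matching F(101) = 0.625 from the sketch above.
    print(narrow_interval([1, 0, 1], [0.5, 0.5]))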