
Information & Coding Theory (ICT)

Module 1

By
Dr. Akriti Nigam
Computer Science & Engineering Department
BIT, Mesra
References

• Thomas Cover, "Elements of Information Theory"
  https://cs-114.org/wp-content/uploads/2015/01/Elements_of_Information_Theory_Elements.pdf
• Stefan M. Moser, "Information Theory (Lecture Notes)"

Text Book

• Ranjan Bose, "Information Theory, Coding and Cryptography", McGraw-Hill, 2nd Edition
Course Assessment

• Quiz 1: 10 marks
• Quiz 2: 10 marks
• Mid Sem: 25 marks
• Assessment: 5 marks
• End Sem: 50 marks
Claude Shannon
Father of Digital Communications

"Claude Shannon's creation in the 1940s of the subject of information theory is arguably one of the great intellectual achievements of the twentieth century."
(Bell Labs, Computing and Mathematical Sciences Research)

Link to the article:
https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
Quotes about Shannon

• "What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."

• "Today, Shannon's insights help shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes."

• "Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."
Basic Model of Digital Communication

The main objective of Shannon:

Find a way to "reliably" transmit information (i.e., data) from the sender through the channel to the receiver at the "maximal" possible rate.
Information Theory

• The art of quantifying and communicating information.

• Information theory, being a theory, starts from the notion of what is possible, rather than from how what has been proved possible can actually be attained.
Information Theory

Two Main Concepts:
• ENTROPY
• CAPACITY

Three Main Axes for Shannon Theory:
• Measurement of Information
• Source Coding Theory
• Channel Coding Theory
Information Sources

Information Content

SAME Information Source, DIFFERENT Information Content

• Information (at least until Shannon) was a non-tangible notion that could not be measured or counted.

• Shannon turned his attention to PROBABILITY THEORY to help him come up with a measure of information.
Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver)

• Source: any source of information.
• Source coder: change to an efficient representation, i.e., data compression.
• Channel coder: change to an efficient representation for transmission, i.e., error control coding.
• Channel: anything transmitting or storing information – a radio link, a cable, a disk, a CD, a piece of paper, …
• Channel decoder: recover from channel distortion.
• Source decoder: uncompress.
Fundamental Entities

Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver)

• H: the information content of the source.
• R: the rate from the source coder.
• C: the channel capacity.
Fundamental Theorems

Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver)

• Shannon 1: error-free transmission is possible if R ≥ H (source coding theorem, simplified) and C ≥ R (channel coding theorem, simplified).

• Shannon 2: source coding and channel coding can be optimized independently, and binary symbols can be used as the intermediate format. Assumption: arbitrarily long delays.
Stochastic sources

• A source outputs symbols X1, X2, …
• Each symbol takes its value from an alphabet A = (a1, a2, …).
• Model: P(X1, …, XN) is assumed to be known for all combinations.

Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).

Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).

Source → X1, X2, …

Two Special Cases

1. The Memoryless Source
   • Each symbol is independent of the previous ones.
   • P(X1, X2, …, Xn) = P(X1) · P(X2) · … · P(Xn)

2. The Markov Source
   • Each symbol depends on the previous one.
   • P(X1, X2, …, Xn) = P(X1) · P(X2|X1) · P(X3|X2) · … · P(Xn|Xn−1)
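To make the two source models above concrete, here is a minimal Python sketch. The binary alphabet, the symbol probabilities, and the transition probabilities are made-up values for illustration, not taken from the slides.

```python
from math import prod

# Hypothetical binary alphabet A = (0, 1) with made-up probabilities.
p = {0: 0.7, 1: 0.3}                       # memoryless symbol probabilities P(X)
p_cond = {0: {0: 0.9, 1: 0.1},             # Markov transitions P(X_n = j | X_{n-1} = i)
          1: {0: 0.4, 1: 0.6}}

def prob_memoryless(seq):
    """P(X1,...,Xn) = P(X1) * P(X2) * ... * P(Xn)."""
    return prod(p[s] for s in seq)

def prob_markov(seq):
    """P(X1,...,Xn) = P(X1) * P(X2|X1) * ... * P(Xn|Xn-1)."""
    result = p[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        result *= p_cond[prev][cur]
    return result

seq = [0, 1, 1, 0, 1, 0, 0, 0]
print(prob_memoryless(seq))   # independent symbols
print(prob_markov(seq))       # each symbol depends on the previous one
```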
Information and Entropy

• Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
• If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
• If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
• If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
• Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
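A small numerical sketch of the coin example above (the helper names are just illustrative): the information received from a single "heads" outcome for the probabilities listed, plus the average information per flip, which is the same for P(heads) = 0.9 and P(heads) = 0.1. The averaging formula used here is the entropy introduced formally later in the deck.

```python
from math import log2

def self_info(p):
    """Information (in bits) received when an outcome of probability p occurs."""
    return -log2(p)

def avg_info(p_heads):
    """Average information per coin flip, in bits."""
    p_tails = 1 - p_heads
    return p_heads * self_info(p_heads) + p_tails * self_info(p_tails)

print(self_info(0.5))                   # fair coin: 1 bit
print(self_info(1.0))                   # certain outcome: 0 bits
print(self_info(0.9))                   # likely outcome: ~0.152 bits (between 0 and 1)
print(avg_info(0.9), avg_info(0.1))     # same average information: ~0.469 bits each
```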
Uncertainty and Measure of Information

Consider the extreme cases for some event produced by an information source:

• If the event is certain (its probability is 1), there is no uncertainty in the source, and the occurrence of the event does not introduce any information.

• If the event is (nearly) impossible (its probability approaches 0) and it hypothetically occurs, we have possibly gained an infinite amount of information.
Probability-Based Measure of Information

As p_k decreases, the uncertainty increases, and the occurrence of the event corresponds to some gain in information. BUT HOW MUCH?

I(x_k) = log(1/p_k) = −log(p_k)

Depending on the base of the logarithm, the information is measured in:
• bits (base-2 logarithm)
• nats (natural logarithm)
• Hartleys (base-10 logarithm)
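A minimal sketch of the three units for one arbitrary probability value: the same self-information expressed in bits, nats, and Hartleys, differing only in the base of the logarithm.

```python
from math import log, log2, log10

p_k = 0.25   # arbitrary example probability

info_bits = -log2(p_k)        # base-2 logarithm  -> bits
info_nats = -log(p_k)         # natural logarithm -> nats
info_hartleys = -log10(p_k)   # base-10 logarithm -> Hartleys

print(info_bits)       # 2.0 bits
print(info_nats)       # ~1.386 nats   (= 2.0 * ln 2)
print(info_hartleys)   # ~0.602 Hartleys (= 2.0 * log10 2)
```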
Self Information

• So, let's look at it the way Shannon did.
• Assume a memoryless source with
  • alphabet A = (a1, …, an)
  • symbol probabilities (p1, …, pn).
• How much information do we get when finding out that the next symbol is ai?
• According to Shannon, the self information of ai is

  I(ai) = log(1/pi) = −log(pi)
Why?

• Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.
• For both events to happen, the probability is pA · pB. However, the amount of information should be added, not multiplied.
• Logarithms satisfy this: log(pA · pB) = log(pA) + log(pB).
• But we want the information to increase with decreasing probability, so we use the negative logarithm.
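A quick numerical check of the argument above, using two made-up probabilities for independent events A and B: the self-information of "both A and B occur" equals the sum of the individual self-informations.

```python
from math import log2

def self_info(p):
    return -log2(p)

p_A, p_B = 0.5, 0.125           # made-up probabilities of independent events A and B

joint = self_info(p_A * p_B)    # information of "both A and B happened"
summed = self_info(p_A) + self_info(p_B)

print(joint, summed)            # both are 4.0 bits: 1 + 3
```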
Self Information

Example 1:

Example 2:
Self Information

• On average over all the symbols, we get:

  H(X) = Σ_i p_i · log(1/p_i) = − Σ_i p_i · log(p_i)

• H(X) is called the first order entropy of the source.
• This can be regarded as the degree of uncertainty about the following symbol.
Entropy

Example: Binary Memoryless Source

BMS → 01101000…

Let P(X = 1) = p and P(X = 0) = 1 − p. Then the entropy is

  H(X) = −p · log(p) − (1 − p) · log(1 − p)

often denoted h(p), the binary entropy function.

[Plot of h(p) versus p, for 0 ≤ p ≤ 1.]

The uncertainty (information) is greatest when p = 0.5.
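A short sketch of the binary entropy function just described: it evaluates h(p) on a grid, confirms that the maximum (1 bit) is reached at p = 0.5, and that h(0.9) = h(0.1), matching the earlier coin intuition.

```python
from math import log2

def binary_entropy(p):
    """h(p) = -p*log2(p) - (1-p)*log2(1-p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

grid = [i / 100 for i in range(101)]
values = [binary_entropy(p) for p in grid]

print(max(values), grid[values.index(max(values))])   # 1.0 bit at p = 0.5
print(binary_entropy(0.9), binary_entropy(0.1))       # both ~0.469 bits
```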
Entropy: Three properties

1. It can be shown that 0 ≤ H ≤ log N.

2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.

3. The difference log N − H is called the redundancy of the source.
Entropy & Source Coding Theory

Entropy: the average information content of a source.

  H(X) = Σ_{k=1..K} p_k · log(1/p_k) = − Σ_{k=1..K} p_k · log(p_k)

Shannon's First Source Coding Theorem states that:

"To reliably store the information generated by some random source X, you need, on the average, no more and no less than H(X) bits for each outcome."
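A minimal sketch of the entropy formula above for a made-up four-symbol source; it also illustrates the three-properties slide: a uniform distribution attains the maximum H = log N, and log N − H gives the redundancy of the skewed source.

```python
from math import log2

def entropy(probs):
    """First-order entropy H(X) = -sum(p_k * log2(p_k)), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Made-up source with N = 4 symbols.
p_skewed = [0.5, 0.25, 0.125, 0.125]
p_uniform = [0.25, 0.25, 0.25, 0.25]
N = 4

print(entropy(p_skewed))                 # 1.75 bits
print(entropy(p_uniform), log2(N))       # both 2.0 bits: maximum entropy
print(log2(N) - entropy(p_skewed))       # redundancy of the skewed source: 0.25 bits
```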
Joint Entropy and Conditional Entropy

• The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with a joint distribution p(x, y) is defined as

  H(X, Y) = − Σ_x Σ_y p(x, y) · log p(x, y)
Joint Entropy and Conditional Entropy

• The naturalness of the definition of joint entropy and conditional entropy is exhibited by the fact that the entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other. This is proved in the following theorem.

Theorem (chain rule): H(X, Y) = H(X) + H(Y|X)
Joint Entropy and Conditional Entropy

• The number of bits needed to describe X and Y is the sum of the number of bits needed to describe X and the number needed to describe Y once X is known.

• The conditional entropy measures how much entropy a random variable Y has remaining if we have already learned the value of a second random variable X. Equivalently, it is the expected number of bits needed to describe Y when X is known to both the encoder and the decoder.

• It is referred to as the entropy of Y conditional on X, and is written H(Y|X).

• H(Y|X) ≤ H(Y): knowledge of X never increases entropy, and, except when it is irrelevant (X and Y independent), it always lowers entropy.
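Putting the definitions together, here is a sketch for a small made-up joint pmf: it computes H(X), H(Y), H(X, Y), and H(Y|X) directly from the definitions, then checks the chain rule H(X, Y) = H(X) + H(Y|X) and the inequality H(Y|X) ≤ H(Y).

```python
from math import log2

# Made-up joint pmf p(x, y) over X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Marginals p(x) and p(y).
p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(p_xy.values())

# Conditional entropy from its definition: H(Y|X) = -sum p(x,y) * log2 p(y|x).
H_Y_given_X = -sum(v * log2(v / p_x[x]) for (x, y), v in p_xy.items() if v > 0)

print(H_XY, H_X + H_Y_given_X)   # chain rule: both ~1.75 bits
print(H_Y_given_X, H_Y)          # conditioning reduces entropy: H(Y|X) <= H(Y)
```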
Relative Entropy and Mutual Information

• The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on the average to describe the random variable.
• The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p.
• For example, if we knew the true distribution p of the random variable, we could construct a code with average description length H(p).
• If, instead, we used the code for a distribution q, we would need H(p) + D(p||q) bits on the average to describe the random variable.
• The relative entropy or Kullback–Leibler distance between two probability mass functions p(x) and q(x) is defined as

  D(p||q) = Σ_x p(x) · log( p(x) / q(x) )
Relative Entropy and Mutual Information

• In the above definition, we use the conventions that 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = ∞. Thus, if there is any symbol x ∈ X such that p(x) > 0 and q(x) = 0, then D(p||q) = ∞.
• Relative entropy is always nonnegative and is zero if and only if p = q.

• Consider two random variables X and Y with a joint probability mass function p(x, y) and marginal probability mass functions p(x) and p(y). The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

  I(X; Y) = Σ_x Σ_y p(x, y) · log( p(x, y) / (p(x) p(y)) ) = D( p(x, y) || p(x) p(y) )
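A sketch of both quantities with made-up distributions (the helper name kl_divergence and all probability values are illustrative): the Kullback–Leibler distance between two pmfs on the same alphabet, and the mutual information computed as D(p(x, y) || p(x)p(y)).

```python
from math import log2, inf

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) * log2(p(x)/q(x)), using the 0-log conventions above."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue                    # 0 * log(0/q) = 0
        if q.get(x, 0) == 0:
            return inf                  # p(x) > 0 and q(x) = 0
        total += px * log2(px / q[x])
    return total

# Made-up pmfs over the same alphabet.
p = {'a': 0.5, 'b': 0.25, 'c': 0.25}
q = {'a': 0.25, 'b': 0.25, 'c': 0.5}
print(kl_divergence(p, q), kl_divergence(p, p))   # positive, and 0 when p = q

# Mutual information as D(p(x,y) || p(x)p(y)) for a made-up joint pmf.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1)}
product = {(x, y): p_x[x] * p_y[y] for x in (0, 1) for y in (0, 1)}
print(kl_divergence(p_xy, product))               # I(X; Y) in bits
```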
Relationship Between Entropy and Mutual Information

• We can rewrite the definition of mutual information I(X; Y) as

  I(X; Y) = H(X) − H(X|Y)

• Thus, the mutual information I(X; Y) is the reduction in the uncertainty of X due to the knowledge of Y.
• By symmetry, it also follows that I(X; Y) = H(Y) − H(Y|X).
• Thus, X says as much about Y as Y says about X.
• A qualitative diagram of the conditional entropy and mutual information quantities of two probability distributions X and Y.
• Two circles enclose the areas representing the entropies H(X) and H(Y) of the two separate distributions.
• The common (overlap) area of the two circles corresponds to the mutual information I(X;Y) in both distributions.
• The remaining parts of the two circles represent the corresponding conditional entropies H(X|Y) and H(Y|X), measuring the residual uncertainty about events in one set when one has full knowledge of the occurrence of the events in the other set of outcomes.
• The area enclosed by the envelope of the two overlapping circles then represents the entropy of the "product" (joint) distribution H(X, Y).
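A final numerical sketch of the diagram's areas, reusing the same made-up joint pmf as the earlier examples: the overlap I(X; Y) comes out the same whether computed as H(X) − H(X|Y), H(Y) − H(Y|X), or H(X) + H(Y) − H(X, Y).

```python
from math import log2

p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}   # made-up joint pmf

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(p_xy.values())
H_X_given_Y = H_XY - H_Y   # chain rule
H_Y_given_X = H_XY - H_X

# All three expressions give the same mutual information (the overlap area).
print(H_X - H_X_given_Y)
print(H_Y - H_Y_given_X)
print(H_X + H_Y - H_XY)
```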
