
Noise, Information Theory, and Entropy

CS414 – Spring 2007

By Roger Cheng, Karrie Karahalios, Brian Bailey
Communication system abstraction

[Block diagram: Information source → Encoder → Modulator (sender side) → Channel → Demodulator → Decoder → Output signal (receiver side)]

The additive noise channel
• Transmitted signal s(t) is corrupted by an additive noise source n(t); the resulting received signal is r(t) = s(t) + n(t)

• Noise can result from many sources, including electronic components and transmission interference
Random processes
• A random variable is the result of a single measurement

• A random process is an indexed collection of random variables, or equivalently a non-deterministic signal that can be described by a probability distribution

• Noise can be modeled as a random process

WGN (White Gaussian Noise)
• Properties
• At each time instant t = t₀, the value of n(t) is normally distributed with mean 0 and variance σ² (i.e. E[n(t₀)] = 0, E[n(t₀)²] = σ²)

• At any two different time instants t₀ ≠ t_k, the values of n(t) are uncorrelated (i.e. E[n(t₀)n(t_k)] = 0)

• The power spectral density of n(t) has equal power in all frequency bands
WGN continued
• When an additive noise channel has a white Gaussian noise source, we call it an AWGN channel

• Most frequently used model in communications

• Reasons why we use this model (see the sketch below)
• It's easy to understand and compute
• It applies to a broad class of physical channels
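
A minimal sketch of an AWGN channel in Python with NumPy (the test signal, noise variance, and function name are illustrative assumptions, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    def awgn_channel(s, sigma2):
        """Return r(t) = s(t) + n(t), with n white Gaussian noise
        of mean 0 and variance sigma2."""
        n = rng.normal(0.0, np.sqrt(sigma2), size=s.shape)
        return s + n

    # Example: a unit-amplitude sinusoid corrupted by noise of variance 0.1
    t = np.linspace(0, 1, 1000)
    s = np.sin(2 * np.pi * 5 * t)
    r = awgn_channel(s, sigma2=0.1)

Strictly speaking, independent samples of a discrete-time Gaussian process stand in here for continuous-time WGN.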
Signal energy and power

• Energy is defined as E_x = \int_{-\infty}^{\infty} |x(t)|^2 \, dt

• Power is defined as P_x = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} |x(t)|^2 \, dt

• Most signals are either finite energy and zero power, or infinite energy and finite power

• Noise power is hard to compute in the time domain
• Power of WGN is its variance σ²
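
A quick numerical check of the last point (NumPy assumed; sample count and σ² chosen arbitrarily): the time-average power of a zero-mean WGN realization estimates its variance.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma2 = 0.5
    n = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)

    # Average power = mean squared value; for zero-mean noise this estimates sigma^2
    print(np.mean(n ** 2))  # ~0.5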
Signal to Noise Ratio (SNR)
• Defined as the ratio of signal power to the noise power corrupting the signal: SNR = P_signal / P_noise

• Usually more practical to measure SNR on a dB scale: SNR_dB = 10·log₁₀(P_signal / P_noise)

• Obviously, want as high an SNR as possible (sketch below)
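
A minimal sketch of measuring SNR in dB from sampled signal and noise arrays (NumPy assumed; the example waveforms are illustrative):

    import numpy as np

    def snr_db(signal, noise):
        """Ratio of average signal power to average noise power, in dB."""
        return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 10_000)
    s = np.sin(2 * np.pi * 50 * t)                # average power ~0.5
    n = rng.normal(0.0, np.sqrt(0.05), t.shape)   # average power ~0.05
    print(snr_db(s, n))                           # ~10 dB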


Analog vs. Digital
• Analog system
• Any amount of noise will create distortion at the
output

• Digital system
• A relatively small amount of noise will cause no
harm at all
• Too much noise will make decoding of the received signal impossible

• Both: the goal is to limit the effects of noise to a manageable/satisfactory amount
Information theory and entropy
• Information theory tries to solve the problem of communicating as much data as possible over a noisy channel

• Measure of data is entropy

• Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (jump-started the digital age)
Review of Entropy Coding
• Alphabet: finite, non-empty set
• A = {a, b, c, d, e, …}

• Symbol (S): element from the set

• String: sequence of symbols from A

• Codeword: sequence of bits representing a coded string
• e.g. 0110010111101001010

• Probabilities of the symbols in a string sum to one: \sum_{i=1}^{N} p_i = 1

• L_i: length of the codeword of symbol i, in bits

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

- Shannon, 1948
Measure of Information
• Information content of symbol s_i
• (in bits) -log₂ p(s_i)

• Examples
• p(s_i) = 1 carries no information
• smaller p(s_i) carries more information, as the symbol is unexpected or surprising
Entropy
• Weigh information content of each source symbol by its probability of occurrence:
• value is called Entropy (H)

H = -\sum_{i=1}^{n} p(s_i) \log_2 p(s_i)

• Produces lower bound on number of bits needed to represent the information with code words
Entropy Example
• Alphabet = {A, B}
• p(A) = 0.4; p(B) = 0.6

• Compute Entropy (H)
• H = -0.4·log₂ 0.4 - 0.6·log₂ 0.6 ≈ 0.97 bits

• Maximum uncertainty (gives largest H)
• occurs when all probabilities are equal
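
A minimal sketch of this computation (the function name and example distributions are illustrative):

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum(p * log2(p))."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.4, 0.6]))  # ~0.971 bits
    print(entropy([0.5, 0.5]))  # 1.0 bit: the maximum for two symbols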
Entropy definitions
• Shannon entropy

• Binary entropy formula

• Differential entropy
Properties of entropy
• Can be defined as the expectation of -log p(x) (i.e. H(X) = E[-log p(X)])

• Is not a function of a variable's values; it is a function of the variable's probabilities

• Usually measured in "bits" (using logs of base 2) or "nats" (using logs of base e)

• Maximized when all values are equally likely (i.e. uniform distribution)

• Equal to 0 when only one value is possible
Joint and conditional entropy

• Joint entropy H(X,Y) is the entropy of the pairing (X,Y)

• Conditional entropy H(X|Y) is the entropy of X if the value of Y is known

• Relationship between the two: H(X,Y) = H(Y) + H(X|Y)
Mutual information
• Mutual information is how much information about X can be obtained by observing Y: I(X;Y) = H(X) - H(X|Y)
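
A minimal sketch computing these quantities from a joint distribution table (the 2x2 joint matrix is an illustrative assumption):

    import numpy as np

    def H(p):
        """Entropy in bits of an array of probabilities (zeros contribute 0)."""
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Joint distribution p(x, y); rows index X, columns index Y
    pxy = np.array([[0.4, 0.1],
                    [0.1, 0.4]])
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    H_XY = H(pxy)                 # joint entropy
    H_X_given_Y = H_XY - H(py)    # chain rule: H(X,Y) = H(Y) + H(X|Y)
    I_XY = H(px) - H_X_given_Y    # mutual information
    print(I_XY)                   # ~0.278 bits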
Mathematical model of a channel
• Assume that our input to the channel is X, and the output is Y

• Then the characteristics of the channel can be defined by its conditional probability distribution p(y|x)
Channel capacity and rate
• Channel capacity is defined as the maximum possible value of the mutual information: C = max over input distributions p(x) of I(X;Y)

• We choose the input distribution p(x) that maximizes the mutual information, giving C

• For any rate R < C, we can transmit information with arbitrarily small probability of error
Binary symmetric channel
• Correct bit transmitted with probability 1-p
• Wrong bit transmitted with probability p
• Sometimes called “cross-over probability”
• Capacity C = 1 - H(p,1-p)
Binary erasure channel
• Correct bit transmitted with probability 1-p
• “Erasure” transmitted with probability p
• Capacity C = 1 - p
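
A minimal sketch evaluating both capacities as a function of p (function names are illustrative):

    import math

    def binary_entropy(p):
        """H(p, 1-p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        return 1 - binary_entropy(p)  # C = 1 - H(p, 1-p)

    def bec_capacity(p):
        return 1 - p                  # C = 1 - p

    print(bsc_capacity(0.1))  # ~0.531 bits per channel use
    print(bec_capacity(0.1))  # 0.9 bits per channel use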
Coding theory
• Information theory only gives us an upper
bound on communication rate
• Need to use coding theory to find a practical
method to achieve a high rate
• 2 types
• Source coding - Compress source data to a
smaller size
• Channel coding - Adds redundancy bits to make
transmission across noisy channel more robust
Source-channel separation theorem
• Shannon showed that when dealing with one transmitter and one receiver, we can break up source coding and channel coding into separate steps without loss of optimality

• Does not apply when there are multiple transmitters and/or receivers
• Need to use network information theory principles in those cases
Coding Intro
• Assume alphabet K of {A, B, C, D, E, F, G, H}

• In general, if we want to distinguish n different symbols, we need ⌈log₂ n⌉ bits per symbol; here log₂ 8 = 3

• Can code alphabet K as:
A 000   B 001   C 010   D 011
E 100   F 101   G 110   H 111
Coding Intro
• "BACADAEAFABBAAAGAH" is encoded as the string of 54 bits:

001000010000011000100000101000001001000000000110000111

(fixed-length code)
Coding Intro
• With this coding:
A 0      B 100    C 1010   D 1011
E 1100   F 1101   G 1110   H 1111

• 100010100101101100011010100100000111001111

• 42 bits; saves more than 20% in space
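
A minimal sketch reproducing both bit counts (dictionary names are illustrative; the code tables are the ones above):

    fixed = {c: format(i, "03b") for i, c in enumerate("ABCDEFGH")}
    variable = {"A": "0", "B": "100", "C": "1010", "D": "1011",
                "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

    msg = "BACADAEAFABBAAAGAH"
    for table in (fixed, variable):
        bits = "".join(table[c] for c in msg)
        print(len(bits))  # 54 for the fixed code, 42 for the variable code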


Huffman Tree

Symbol frequencies in the string above: A (9), B (3), C (1), D (1), E (1), F (1), G (1), H (1)

Huffman Encoding
• Use probability distribution to determine
how many bits to use for each symbol
• higher-frequency assigned shorter codes
• entropy-based, block-variable coding
scheme
Huffman Encoding
• Produces a code which uses a minimum number of bits to represent each symbol
• cannot represent the same sequence using fewer bits per symbol with any other code-word scheme
• optimal among code-word schemes, but may differ slightly from the theoretical lower limit
• lossless

• Build Huffman tree to assign codes
Informal Problem Description
• Given a set of symbols from an alphabet and
their probability distribution
• assumes distribution is known and stable

• Find a prefix-free binary code with minimum weighted path length
• prefix-free means no codeword is a prefix of any other codeword
Huffman Algorithm
• Construct a binary tree of codes
• leaf nodes represent symbols to encode
• interior nodes represent cumulative probability
• edges assigned 0 or 1 output code

• Construct the tree bottom-up (see the sketch below)
• repeatedly merge the two nodes with the lowest probability until only the root remains
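
A minimal sketch of this bottom-up construction using Python's heapq (the tuple layout and tie-breaking counter are implementation choices; 0/1 assignments, and hence the exact codewords, may differ from the in-class solution while remaining optimal):

    import heapq
    from itertools import count

    def huffman_codes(probs):
        """Build a prefix-free code from a dict mapping symbol -> probability."""
        tick = count()  # tie-breaker so the heap never compares dicts
        heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)  # two lowest-probability nodes
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(tick), merged))
        return heap[0][2]

    print(huffman_codes({"A": 0.25, "B": 0.30, "C": 0.12,
                         "D": 0.15, "E": 0.18}))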
Huffman Example
• Construct the Huffman coding tree (in class)

Symbol (S)   P(S)
A            0.25
B            0.30
C            0.12
D            0.15
E            0.18
Characteristics of Solution
• Lowest probability symbol is always furthest from the root

• Assignment of 0/1 to children edges is arbitrary
• other solutions possible; lengths remain the same
• If two nodes have equal probability, can select any two

• Notes
• prefix-free code
• O(n log n) complexity

Symbol (S)   Code
A            11
B            00
C            010
D            011
E            10
Example Encoding/Decoding
• Encode "BEAD" → 001011011

• Decode "0101100"

Symbol (S)   Code
A            11
B            00
C            010
D            011
E            10
Entropy (Theoretical Limit)

H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

  = -0.25·log₂ 0.25
    - 0.30·log₂ 0.30
    - 0.12·log₂ 0.12
    - 0.15·log₂ 0.15
    - 0.18·log₂ 0.18

H = 2.24 bits

Symbol   P(S)   Code
A        0.25   11
B        0.30   00
C        0.12   010
D        0.15   011
E        0.18   10
Average Codeword Length

L = \sum_{i=1}^{N} p(s_i) · codelength(s_i)

  = 0.25(2) + 0.30(2) + 0.12(3) + 0.15(3) + 0.18(2)

L = 2.27 bits

Symbol   P(S)   Code
A        0.25   11
B        0.30   00
C        0.12   010
D        0.15   011
E        0.18   10

Code Length Relative to Entropy

L = \sum_{i=1}^{N} p(s_i) · codelength(s_i)        H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

• Huffman reaches entropy limit when all probabilities are negative powers of 2
• i.e., 1/2; 1/4; 1/8; 1/16; etc.

• H <= Code Length <= H + 1

Example

H = -0.01·log₂ 0.01 - 0.99·log₂ 0.99 ≈ 0.08

L = 0.01(1) + 0.99(1) = 1

Symbol   P(S)   Code
A        0.01   1
B        0.99   0
Exercise
• Compute Entropy (H)

• Build Huffman tree

• Compute average code length

• Code "BCCADE"

Symbol (S)   P(S)
A            0.1
B            0.2
C            0.4
D            0.2
E            0.1
Solution
• Compute Entropy (H)
• H = 2.1 bits

• Build Huffman tree

• Compute average code length
• L = 2.2 bits

• Code "BCCADE" => 10000111101110

Symbol   P(S)   Code
A        0.1    111
B        0.2    100
C        0.4    0
D        0.2    101
E        0.1    110
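
A quick check of this solution using the huffman_codes and entropy sketches from earlier (individual code lengths can differ between equally optimal trees, but the average length comes out the same):

    probs = {"A": 0.1, "B": 0.2, "C": 0.4, "D": 0.2, "E": 0.1}
    codes = huffman_codes(probs)

    H = entropy(probs.values())                           # ~2.12 bits
    L = sum(p * len(codes[s]) for s, p in probs.items())  # 2.2 bits
    print(H, L)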


Limitations
• Diverges from the lower limit when the probability of a particular symbol becomes high
• always uses an integral number of bits per symbol

• Must send the code book with the data
• lowers overall efficiency

• Must determine the frequency distribution
• it must remain stable over the data set
Error detection and correction
• Error detection is the ability to detect errors caused by noise or other impairments during transmission from the transmitter to the receiver

• Error correction additionally enables localizing the errors and correcting them

• Error detection always precedes error correction

• (more next week)
