Lecture 3
Derivation of Average Information
• Given a set of independent events (A1, A2, …, An) with probabilities pi = P(Ai),
we desire the following properties in the measure of average information H:
– We want H to be a continuous function of the probabilities pi. That is, a small
change in pi should only cause a small change in the average information.
– If all events are equally likely, that is, pi = 1/n for all i, then H should be a
monotonically increasing function of n. The more possible outcomes there are,
the more information should be contained in the occurrence of any particular
outcome.
– Suppose we divide the possible outcomes into a number of groups. We indicate
the occurrence of a particular event by first indicating the group it belongs to,
then indicating which particular member of the group it is. Thus, we get some
information first by knowing which group the event belongs to and then we get
additional information by learning which particular event (from the events in
the group) has occurred. The information associated with indicating the
outcome in multiple stages should be no different from the information
associated with indicating the outcome in a single stage.
• Suppose we have an experiment with three
outcomes A1, A2, and A3, with corresponding
probabilities p1, p2, and p3. The average
information associated with this experiment is
simply a function of the probabilities:

H = H(p1, p2, p3)
Shannon showed that the only way all these conditions
could be satisfied was if

H = -K Σ pi log(pi)

where K is a positive constant.
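This form can be checked numerically against the grouping property listed above (a minimal Python sketch; the three probabilities are invented for illustration): revealing the outcome in one stage, or first as "group {A1, A2} vs. A3" and then as a member within the group, yields the same average information.

import math

def H(probs):
    # Shannon's form with K = 1 and log base 2.
    return -sum(p * math.log2(p) for p in probs if p > 0)

p1, p2, p3 = 0.5, 0.25, 0.25            # invented example probabilities

# One stage: reveal the outcome directly.
one_stage = H([p1, p2, p3])

# Two stages: first the group {A1, A2} vs. A3, then the member within the group.
g = p1 + p2
two_stage = H([g, p3]) + g * H([p1 / g, p2 / g])

print(one_stage, two_stage)             # both print 1.5 (bits)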
• In other words, if we indicate one of k² equally likely outcomes in two
stages, first one of k groups and then one of k members within that group,
the grouping property gives

A(k²) = 2A(k)

where A(n) denotes the average information H(1/n, …, 1/n) of n equally
likely outcomes.
• We can generalize this for the case of n = k^m as

A(k^m) = m·A(k)
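A quick numerical sanity check of this relation under the logarithmic form (a Python sketch; the helper name A and the values of k and m are our own choices, not from the lecture):

import math

def A(n):
    # Average information of n equally likely outcomes under the
    # logarithmic form (K = 1, log base 2): A(n) = log2(n).
    return -sum((1 / n) * math.log2(1 / n) for _ in range(n))

k, m = 2, 3
print(A(k ** m))     # 3.0
print(m * A(k))      # 3.0, consistent with A(k^m) = m * A(k)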
By convention we pick K to be 1 and take logarithms to base 2, giving the
formula

H = -Σ pi log2(pi)

which measures the average information in bits.
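As a concrete illustration of the formula (a minimal Python sketch; the example distributions are invented):

import math

def entropy(probs):
    # H = -sum(p_i * log2(p_i)): average information in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))       # 1.0 bit: a fair coin flip
print(entropy([0.9, 0.1]))       # ~0.47 bits: a biased coin carries less information
print(entropy([0.25] * 4))       # 2.0 bits: four equally likely outcomes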
Models
Physical Models
• If we know something about the physics of the data generation process,
we can use that information to construct a model.
– For example, if residential electrical meter readings at hourly
intervals were to be coded, knowledge about the living habits of the
populace could be used to determine when electricity usage would
be high and when the usage would be low. Then, instead of the
actual readings, the difference (residual) between the actual
readings and those predicted by the model could be coded, as
sketched after this list.
– In general, however, the physics of data generation is simply too
complicated to understand, let alone use to develop a model.
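A sketch of the residual idea described above (hypothetical Python; the readings and the model's predictions are invented numbers):

# Hypothetical hourly meter readings (kWh) and a crude physical model
# that predicts high usage in the evening and low usage overnight.
readings  = [2, 1, 1, 3, 5, 8, 9, 4]
predicted = [2, 1, 1, 2, 5, 9, 9, 5]   # model output for the same hours

# Residuals are typically small and clustered near zero,
# so they are cheaper to code than the raw readings.
residuals = [r - p for r, p in zip(readings, predicted)]
print(residuals)   # [0, 0, 0, 1, 0, -1, 0, -1]

# The decoder reverses the process: reading = prediction + residual.
decoded = [p + e for p, e in zip(predicted, residuals)]
assert decoded == readings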
Probability Models
• The simplest statistical model for the source is to assume that each
letter that is generated by the source is independent of every other
letter, and each occurs with the same probability. We could call this
the ignorance model, as it would generally be useful only when we
know nothing about the source.
• The next step up in complexity is to keep the independence
assumption, but remove the equal probability assumption and assign
a probability of occurrence to each letter in the alphabet.
• For a source that generates letters from an alphabet
A = {a1, a2, …, aM}, we can have a probability model
P = {P(a1), P(a2), …, P(aM)}.
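A minimal sketch contrasting the two models (Python; the sample string and helper names are invented): under the ignorance model every letter gets probability 1/M, while the fitted model estimates P(ai) from observed letter frequencies.

import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

text = "aab aab abc"            # invented sample from a small alphabet
alphabet = sorted(set(text))
M = len(alphabet)

# Ignorance model: every letter equally likely.
h_ignorance = entropy([1 / M] * M)

# Independent-letter model with estimated probabilities P(a_i).
counts = Counter(text)
h_model = entropy([counts[a] / len(text) for a in alphabet])

print(h_ignorance, h_model)     # the fitted model assigns fewer bits per letter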
Markov Chain in Data Compression
• A special conditional probability model is the
“Markov model” (MM), also called a
“discrete-time Markov chain”.
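A small sketch of such a model (Python; the two-symbol alphabet and transition probabilities are invented): each letter depends only on the letter that immediately precedes it.

import random

# Invented first-order Markov model over the alphabet {'w', 'b'}:
# transitions[current] maps each next symbol to P(next | current).
transitions = {
    'w': {'w': 0.95, 'b': 0.05},
    'b': {'w': 0.30, 'b': 0.70},
}

def generate(model, start, length, seed=0):
    # Walk the chain: each symbol depends only on its predecessor.
    rng = random.Random(seed)
    out, state = [start], start
    for _ in range(length - 1):
        symbols, probs = zip(*model[state].items())
        state = rng.choices(symbols, weights=probs)[0]
        out.append(state)
    return ''.join(out)

print(generate(transitions, 'w', 40))  # long runs of 'w', as the model favors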
Composite Source Model
• In many applications, it is not easy to use a single model to
describe the source.
• A composite source can be viewed as a combination or
composition of several sources, with only one source being
active at any given time.
• A switch selects a source Si with probability Pi.
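A sketch of this switching behavior (Python; the component sources and the switch probabilities Pi are invented): the switch first picks a source, and the selected source emits the next symbol, so only one source is active at a time.

import random

rng = random.Random(0)

# Two invented component sources S1 and S2 over the same alphabet.
def s1():                 # mostly 'a'
    return rng.choices('ab', weights=[0.9, 0.1])[0]

def s2():                 # mostly 'b'
    return rng.choices('ab', weights=[0.1, 0.9])[0]

sources = [s1, s2]
P = [0.7, 0.3]            # switch probabilities P1, P2

def composite(n):
    # For each symbol, the switch activates one source with probability Pi.
    return ''.join(rng.choices(sources, weights=P)[0]() for _ in range(n))

print(composite(30))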