
Data Compression Lecture 3

Derivation of Average Information
• Given a set of independent events A1, A2, …, An with probabilities pi = P(Ai), we
desire the following properties in the measure of average information H:
– We want H to be a continuous function of the probabilities pi. That is, a small
change in pi should only cause a small change in the average information.
– If all events are equally likely, that is, pi = 1/n for all i, then H should be a
monotonically increasing function of n. The more possible outcomes there are,
the more information should be contained in the occurrence of any particular
outcome.
– Suppose we divide the possible outcomes into a number of groups. We indicate
the occurrence of a particular event by first indicating the group it belongs to,
then indicating which particular member of the group it is. Thus, we get some
information first by knowing which group the event belongs to and then we get
additional information by learning which particular event (from the events in
the group) has occurred. The information associated with indicating the
outcome in multiple stages should not be any different than the information
associated with indicating the outcome in a single stage.
• Suppose we have an experiment with three outcomes A1, A2, and A3, with
corresponding probabilities p1, p2, and p3. The average information associated
with this experiment is simply a function of the probabilities:

H = H(p1, p2, p3)
Shannon showed that the only way all these conditions could be satisfied was if

H = -K Σ pi log pi

where K is an arbitrary positive constant.
• In other words, writing A(n) for the average information of n equally likely
outcomes, the grouping condition applied to n = k·l such outcomes gives
A(k·l) = A(k) + A(l).
• We can generalize this for the case of n = k^m as

A(k^m) = m·A(k)

whose continuous, monotonically increasing solutions are of the form A(n) = K log n.
By convention we pick K to be 1, and we have the formula

H = - Σ pi log2 pi

where the base-2 logarithm gives the average information in bits.
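The following minimal Python sketch (not part of the original slides) implements this formula with K = 1 and base-2 logarithms, and numerically checks the grouping condition from the earlier slide: describing the outcome in two stages (first the group, then the member) carries the same average information as describing it in one stage. The probabilities are chosen only for illustration.

```python
import math

def entropy(probs):
    """Average information H = -sum(pi * log2 pi), with K = 1 (result in bits)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three outcomes A1, A2, A3 with illustrative probabilities.
p1, p2, p3 = 0.5, 0.3, 0.2

# Single-stage description of the outcome.
h_direct = entropy([p1, p2, p3])

# Two-stage description: first indicate "A1" or "the group {A2, A3}",
# then, if it was the group, indicate which member occurred.
q = p2 + p3
h_grouped = entropy([p1, q]) + q * entropy([p2 / q, p3 / q])

print(h_direct, h_grouped)  # both are about 1.485 bits, as the grouping condition requires
```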
Models

Physical Models
• If we know something about the physics of the data generation process,
we can use that information to construct a model.
– For example, if residential electrical meter readings at hourly
intervals were to be coded, knowledge about the living habits of the
populace could be used to predict when electricity usage would be
high and when it would be low. Then, instead of the actual readings,
the difference (residual) between the actual readings and those
predicted by the model could be coded (see the sketch below).
– In general, however, the physics of data generation is simply too
complicated to understand, let alone use to develop a model.
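A minimal sketch (with made-up numbers, not from the slides) of the residual idea in the meter-reading example: instead of coding the readings directly, code the differences between the readings and the values a usage model predicts. If the model is reasonable, the residuals are small values clustered around zero, which are cheaper to code, and the decoder can reverse the step exactly because it knows the same model.

```python
# Hypothetical hourly meter readings and a model's predictions (illustrative values only).
readings    = [120, 118, 95, 60, 55, 58, 90, 130]
predictions = [118, 117, 96, 62, 54, 60, 88, 128]   # from some physical/usage model

# Code the residuals instead of the raw readings.
residuals = [r - p for r, p in zip(readings, predictions)]
print(residuals)   # small values such as [2, 1, -1, -2, 1, -2, 2, 2]

# The decoder, which knows the same model, recovers the readings exactly.
recovered = [p + e for p, e in zip(predictions, residuals)]
assert recovered == readings
```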

Probability Models
• The simplest statistical model for the source is to assume that each
letter that is generated by the source is independent of every other
letter, and each occurs with the same probability. We could call this
the ignorance model, as it would generally be useful only when we
know nothing about the source.
• The next step up in complexity is to keep the independence
assumption, but remove the equal probability assumption and assign
a probability of occurrence to each letter in the alphabet.
• For a source that generates letters from an alphabet A = {a1, a2, …, aM}, we can
have a probability model P = {P(a1), P(a2), …, P(aM)}.
• Given a probability model (and the independence assumption), we can compute the
entropy of the source as

H = - Σ P(ai) log2 P(ai)

• If the assumption of independence does not fit with our observation of the data,
we can generally find better compression schemes if we discard this assumption.
When we discard the independence assumption, we have to find some way to describe
the dependence of the elements of the data sequence on each other.
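A minimal sketch of such a probability model: estimate P(ai) for each letter from a sample of source output (the sample string below is purely illustrative) and compute the entropy in bits per letter under the independence assumption.

```python
from collections import Counter
import math

def probability_model(text):
    """Relative frequency of each letter: P = {P(a1), ..., P(aM)}."""
    counts = Counter(text)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

def entropy(model):
    """H = -sum(P(ai) * log2 P(ai)), assuming independent letters."""
    return -sum(p * math.log2(p) for p in model.values())

sample = "this is just an illustrative sample of source output"
model = probability_model(sample)
print(f"{entropy(model):.3f} bits/letter")  # the ignorance model would give log2(len(model)) bits
```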
Markov Models
• What is a Markov model?
– A Markov model is a stochastic method for modeling randomly changing
systems that possess the Markov property: at any given time, the next
state depends only on the current state and is independent of anything
in the past.
• Two commonly applied types of Markov model are used when the system
being represented is autonomous, that is, when it is not influenced by
an external agent.
Markov Models: Types
 Markov chains. These are the simplest type of Markov model and are used
to represent systems where all states are observable. Markov chains show all
possible states and, between states, the transition rate, which is the
probability of moving from one state to another per unit of time. This is the
type used in data compression (see the sketch below).
 Hidden Markov models. These are used to represent
systems with some unobservable states. In addition to
showing states and transition rates, hidden Markov
models also represent observations and observation
likelihoods for each state.
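A minimal sketch of a Markov chain with fully observable states, as referenced above: a transition table gives, for each current state, the probability of each possible next state, and the next state is drawn using the current state alone (the Markov property). The states and probabilities are illustrative, not from the slides.

```python
import random

# States and a transition table P[current][next] (illustrative probabilities).
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state):
    """Pick the next state using only the current state (the Markov property)."""
    next_states = list(transitions[state].keys())
    weights = list(transitions[state].values())
    return random.choices(next_states, weights=weights)[0]

state = "sunny"
chain = [state]
for _ in range(10):
    state = step(state)
    chain.append(state)
print(chain)
```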

Markov Chain in Data Compression
• A special conditional probability model called a Markov model (MM), also
known as a discrete-time Markov chain, is used in data compression. A sequence
{xn} is said to follow a kth-order Markov model if

P(xn | xn-1, …, xn-k) = P(xn | xn-1, …, xn-k, xn-k-1, …)

• In other words, knowledge of the past k symbols is equivalent to knowledge
of the entire past history of the process. The values taken on by the set
{xn-1, xn-2, …, xn-k} are called the states of the process.
• If we assumed that the dependence was introduced in a linear manner, we
could view the data sequence as the output of a linear filter driven by white
noise. The output of such a filter can be given by the difference equation

xn = Σ ρi xn-i + εn

where {εn} is a white noise process. This model is often used when developing
coding algorithms for speech and images.
• The use of the Markov model does not require the assumption of linearity.
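A minimal sketch of the difference equation above for the first-order case (k = 1): each output sample is a scaled copy of the previous sample plus a white-noise term. The coefficient and noise level are illustrative.

```python
import random

def ar1_sequence(n, rho=0.9, noise_std=1.0, seed=0):
    """x_n = rho * x_{n-1} + eps_n, with {eps_n} white (here Gaussian) noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, noise_std)
        out.append(x)
    return out

samples = ar1_sequence(20)
print(samples[:5])   # neighbouring samples are strongly correlated when rho is near 1
```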
A two-state Markov model for binary images

• The entropy of a finite state process with states Si is simply the average
value of the entropy at each state:

H = Σ P(Si) H(Si)
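A minimal sketch of this formula for a two-state (white/black) model, assuming illustrative transition probabilities P(b|w) and P(w|b): the state probabilities consistent with those transitions weight the per-state entropies.

```python
import math

def h2(p):
    """Binary entropy of a two-outcome distribution (p, 1 - p)."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Illustrative transition probabilities for a binary (white/black pixel) source.
p_b_given_w = 0.01   # P(black | current state white)
p_w_given_b = 0.30   # P(white | current state black)

# State probabilities consistent with these transitions.
p_w = p_w_given_b / (p_w_given_b + p_b_given_w)
p_b = 1.0 - p_w

# H = P(Sw) * H(Sw) + P(Sb) * H(Sb): the average of the per-state entropies.
H = p_w * h2(p_b_given_w) + p_b * h2(p_w_given_b)
print(f"{H:.3f} bits/pixel")   # far below 1 bit because long runs of one colour dominate
```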
Markov Models in Text Compression
• Markov models are particularly useful in text compression, where the
probability of the next letter is heavily influenced by the preceding
letters.
• The kth-order Markov models are more widely known as finite context models,
with the word context being used for what we have earlier defined as state.
• Consider the word preceding. Suppose we have already processed
precedin and are going to encode the next letter. If we take no account
of the context and treat each letter as a surprise, the probability of the
letter g occurring is relatively low. If we use a first-order Markov model
or single-letter context (that is, we look at the probability model given
n), we can see that the probability of g would increase substantially. As
we increase the context size (go from n to in to din and so on), the
probability of the alphabet becomes more and more skewed, which
results in lower entropy.
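A minimal sketch of a finite context model built by counting: for each context of length k, count which letter follows it in a small sample, and observe that as the context grows from nothing to n to in, the distribution over the next letter becomes more skewed and its entropy drops. The sample text is purely illustrative.

```python
from collections import Counter, defaultdict
import math

def entropy(counts):
    """Entropy in bits of the distribution implied by a table of counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def context_counts(text, k):
    """For each k-letter context, count which letter follows it."""
    table = defaultdict(Counter)
    for i in range(k, len(text)):
        table[text[i - k:i]][text[i]] += 1
    return table

text = "preceding precedes preceded unprecedented receding proceeding"
for k, context in [(0, ""), (1, "n"), (2, "in")]:
    counts = context_counts(text, k)[context]
    print(k, repr(context), dict(counts), f"H = {entropy(counts):.2f} bits")
```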

Composite Source Model
• In many applications, it is not easy to use a single model to
describe the source.
• A composite source can be viewed as a combination or
composition of several sources, with only one source being
active at any given time.
• A switch selects a source Si with probability Pi.
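A minimal sketch of a composite source: at each step a switch selects component source Si with probability Pi, and only that source emits the next symbol. The two component sources and the switch probabilities are illustrative.

```python
import random

rng = random.Random(0)

# Two illustrative component sources with different symbol statistics.
def source_text():
    return rng.choice("etaoinshr ")        # letter-like symbols

def source_digits():
    return rng.choice("0123456789")        # numeric symbols

sources = [source_text, source_digits]
switch_probs = [0.7, 0.3]                  # Pi: probability the switch selects source Si

def composite_symbol():
    """Only one source is active at any given time."""
    source = rng.choices(sources, weights=switch_probs)[0]
    return source()

print("".join(composite_symbol() for _ in range(40)))
```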
