Review of Probability Theory
A. W. Umrani
Outline
WHAT IS PROBABILITY
ORIGINS OF PROBABILITY
WHY STUDY PROBABILITY?
PROBABILITY IN COMMUNICATIONS
SOME BASICS
WHAT IS PROBABILITY
Probability is the mathematical modelling of the phenomenon of chance/randomness, and the use of such models to predict the 'likelihood' of events happening.
Probability is the study of randomness and uncertainty.
WHY STUDY PROBABILITY?
Everything has some degree of
uncertainty/probability!
It trains you to think clearly, rationally and
logically;
It's a fundamental mathematical science that is crucial to many disciplines.
WHY STUDY PROBABILITY?
Science (e.g. quantum physics),
Engineering (e.g. communication systems),
Information theory (e.g. image coding),
Biology (e.g. genetics),
Economics (e.g. analysis of financial trends),
Social sciences (e.g. population growth),
Entertainment (e.g. gambling),
Politics (e.g. voting intentions and confidence intervals),
Manufacturing (e.g. probability of failure, reliability), etc.
PROBABILITY IN COMMUNICATIONS
Speech Recognition (Hidden Markov Models/HMM).
Wireless Communication Systems (Bit Error Rate/BER,
Coding Techniques).
Modelling Wireless Communication Systems (Fading).
Noise in Communication Systems.
Queuing Theory in Packet Switched Networks.
Mobile Phones (Viterbi Equaliser).
Probabilistic Models in Decision and Estimation Theory
(Radar/Sonar Systems).
REMEMBER
Probability is paradoxically both intuitive and counter-intuitive. For example: you toss a coin 8 times. Which are you more likely to get, three heads or four heads, and why? (See the check below.)
You have to have a ‘feel’ for the subject – i.e. the
solution to a problem sometimes involves as much ‘art’
as ‘science’.
Intuition plays an important role. You can often solve the problem with no a priori mathematical knowledge, just careful logical reasoning! You can also spectacularly fail!!
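A quick check of the coin question above (a minimal Python sketch, assuming a fair coin, so the number of heads is binomial):

from math import comb

# For a fair coin tossed n = 8 times, the number of heads is Binomial(8, 0.5):
# P(k heads) = C(8, k) / 2^8.
n = 8
for k in (3, 4):
    print(f"P({k} heads in {n} tosses) = {comb(n, k)}/{2**n} = {comb(n, k) / 2**n:.4f}")

# Four heads is the single most likely count (70/256 vs 56/256),
# even though getting exactly four heads is still more unlikely than not.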
SOME BASICS
Classical definition of probability
– If all the outcomes are equally likely, then the probability of event A is the number of outcomes in A, M(A), divided by the number of all outcomes, M:
P(A) = M(A)/M
Frequency definition of probability
– If the number of trials is m and the number of occurrences of A is m(A), then according to the frequency definition the probability of A is the limit:
P(A) = lim(m→∞) m(A)/m
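A minimal simulation sketch of the frequency definition, assuming a fair six-sided die as the experiment; the relative frequency m(A)/m settles towards the classical value 1/6 ≈ 0.1667 as m grows:

import random

# Estimate P(A) for A = "a fair die shows a six" by relative frequency m(A)/m.
random.seed(1)
for m in (100, 10_000, 1_000_000):
    m_A = sum(1 for _ in range(m) if random.randint(1, 6) == 6)
    print(f"m = {m:>9}: m(A)/m = {m_A / m:.4f}")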
SOME BASICS
Random Experiment
– The concept underlying probability is the random experiment.
– A random experiment is an experiment that can be repeated over and over, giving (possibly) different results.
– Equivalently, a random experiment is a process whose outcome is uncertain. For example:
– Tossing a coin once or several times
– Picking a card or cards from a deck
– Measuring temperature of patients
SOME BASICS
Example
Check the weather each day for 10 days; each check is a random experiment.
Sample space = {Sunny, Rainy, Snow}
Observed outcomes over the 10 days: Sunny 6, Rainy 3, Snow 1.
Event E = {Rainy, Snow} = precipitation
P(E) = P(Rainy) + P(Snow) = 0.3 + 0.1 = 0.4
SOME BASICS
Axioms of Probability
– For any event A, 0 ≤ P(A) ≤ 1.
– P(Ω) =1.
– If A1, A2, … An is a partition of A, then
P(A) = P(A1) + P(A2) + ...+ P(An)
(A1, A2, … An is called a partition of A if A1 ∪ A2 ∪ …∪
An = A and A1, A2, … An are mutually exclusive.)
SOME BASICS
Properties of Probability
– For any event A, P(Ac) = 1 - P(A).
– If A ⊂ B, then P(A) ≤ P(B).
– For any two events A and B,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
– For three events A, B, and C,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)
(The two-event formula is verified numerically below.)
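A small sketch verifying the two-event formula by direct counting on a single die roll; the events A and B here are illustrative choices, not from the slides:

# Verify P(A ∪ B) = P(A) + P(B) - P(A ∩ B) on one roll of a fair die.
# A = "even number", B = "number greater than 3"; all outcomes equally likely.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

P = lambda E: len(E) / len(S)
print(P(A | B))                # 4/6 ≈ 0.6667, counting {2, 4, 5, 6} directly
print(P(A) + P(B) - P(A & B))  # same value via inclusion-exclusion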
SOME BASICS
The probability of an event A satisfies 0 ≤ P(A) ≤ 1.
The probability of tossing a fair coin and getting heads = 0.5.
So if I toss a coin 10 times, will I then get exactly 5 heads? (See the check below.)
The probability of getting 10 heads in ten throws is (0.5)^10, or 1 in 1024 (each throw is an independent event).
If the outcome of an experiment is either event A, event
B or event C, then P(A)+P(B)+P(C) = 1 (exhaustive and
mutually exclusive events).
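A short check of the two coin questions above, assuming a fair coin:

from math import comb

# P(exactly 5 heads in 10 tosses): the single most likely count, but far from certain.
print(comb(10, 5) / 2**10)   # 252/1024 ≈ 0.246

# P(10 heads in 10 tosses), each toss independent:
print(0.5**10)               # 1/1024 ≈ 0.00098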
Sample Space and events
If all outcomes in the sample space are equally
likely, the following definition of probability applies:
The probability of an event E, which is a subset of a
finite sample space S of equally likely outcomes, is
given by P(E) = |E|/|S|.
Probability values of events range from 0 (for an event that will never happen) to 1 (for an event that will always happen whenever the experiment is carried out).
Sample Space and events
Let E be an event in a sample space S. The probability of the event Ec = S - E, the complementary event of E, is given by
P(Ec) = 1 - P(E).
This can easily be shown:
P(Ec) = (|S| - |E|)/|S| = 1 - |E|/|S| = 1 – P(E).
This rule is useful if it is easier to determine the probability of the complementary event than the probability of the event itself.
Complementary Events
Example I: A sequence of 10 bits is randomly
generated. What is the probability that at least one of
these bits is zero?
Solution: There are 2^10 = 1024 possible outcomes
of generating such a sequence. The event Ec, “none
of the bits is zero”, includes only one of these
outcomes, namely the sequence 1111111111.
Therefore, P(Ec) = 1/1024.
Now P(E) can easily be computed as
P(E) = 1 – P(Ec) = 1 – 1/1024 = 1023/1024.
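A brute-force sketch of Example I, enumerating all 1024 sequences:

from itertools import product

# E = "at least one of the 10 bits is zero".
outcomes = list(product("01", repeat=10))   # all 2^10 = 1024 sequences
E = [s for s in outcomes if "0" in s]
print(len(E) / len(outcomes))               # 1023/1024 ≈ 0.9990
print(1 - 1 / 2**10)                        # same value, via the complement rule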
If outcomes are not equally likely
What happens if the outcomes of an experiment are not equally
likely?
In that case, we assign a probability p(s) to each outcome s ∈ S,
where S is the sample space.
Two conditions have to be met:
(1): 0 ≤ p(s) ≤ 1 for each s ∈ S, and
(2): ∑s ∈ S p(s) = 1
How can we obtain these probabilities p(s) ?
The probability p(s) assigned to an outcome s equals the limit of
the number of times s occurs divided by the number of times the
experiment is performed.
We call the function p: S → [0, 1] a probability distribution on S.
Example I
A die is biased so that the number 3 appears twice
as often as each other number. What are the
probabilities of all possible outcomes?
Solution: There are 6 possible outcomes s1, …, s6.
p(s1) = p(s2) = p(s4) = p(s5) = p(s6)
p(s3) = 2p(s1)
Since the probabilities must add up to 1, we have:
5p(s1) + 2p(s1) = 1
7p(s1) = 1
p(s1) = p(s2) = p(s4) = p(s5) = p(s6) = 1/7, p(s3) = 2/7
Example II:
For the biased die from Example I, what is
the probability that an odd number appears
when we roll the die?
Solution:
Eodd = {s1, s3, s5}
Remember the formula p(E) = ∑s∈E p(s).
p(Eodd) = ∑s∈Eodd p(s) = p(s1) + p(s3) + p(s5)
p(Eodd) = 1/7 + 2/7 + 1/7 = 4/7 ≈ 57.14%
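A small sketch of the biased die from Examples I and II, using exact fractions:

from fractions import Fraction

# Biased die: face 3 is twice as likely as each other face.
p = {s: Fraction(1, 7) for s in range(1, 7)}
p[3] = Fraction(2, 7)
assert sum(p.values()) == 1                 # a valid probability distribution

# Example II: P(odd) = p(1) + p(3) + p(5)
P_odd = sum(p[s] for s in (1, 3, 5))
print(P_odd, float(P_odd))                  # 4/7 ≈ 0.5714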
Conditional probability
P(B|A) is the probability of event B given
that event A is known to have occurred.
Thus P(B|A) = P(A∩B)/P(A) = P(AB)/P(A), P(A)≠0
Thus P(A|B) = P(A∩B)/P(B) = P(AB)/P(B), P(B)≠0
(Venn diagram: overlapping events A and B.)
Conditional probability
If all outcomes are equally likely,
P(A | B) = (number of elements in A ∩ B) / (number of elements in B)
= (number of ways A and B can occur) / (number of ways B can occur)
Example I
In a college, 25% fail Maths; 15% fail Chemistry; 10%
fail both Maths and Chemistry. (1) What is the
probability that a student failed Maths, given that he
failed Chemistry? (2) What is the probability of failing
Maths or Chemistry?
Solution of Example I
Let M={students who failed Maths} and C={students
who failed Chemistry}.
Then
P(M) = 0.25, P(C) = 0.15 and P(M ∩ C) = 0.10
(1) The probability that a student failed Maths, given that he failed Chemistry, is:
P(M | C) = P(M ∩ C)/P(C) = 0.10/0.15 ≈ 0.667
(2) The probability of failing Maths or Chemistry is:
P(M ∪ C) = P(M) + P(C) - P(M ∩ C) = 0.30
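A quick check of this solution with exact fractions:

from fractions import Fraction

P_M, P_C, P_MC = Fraction(1, 4), Fraction(3, 20), Fraction(1, 10)

print(P_MC / P_C)         # P(M|C) = 0.10/0.15 = 2/3 ≈ 0.667
print(P_M + P_C - P_MC)   # P(M ∪ C) = 0.25 + 0.15 - 0.10 = 3/10 = 0.3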
Example II
What is the probability that a random bit string of length four contains at least two consecutive 0s, given that its first bit is a 0?
Solution:
E: “bit string contains at least two consecutive 0s”
F: “first bit of the string is a 0”
Example II
We know the formula p(E | F) = p(E ∩ F)/p(F).
E ∩ F = {0000, 0001, 0010, 0011, 0100}
p(E ∩ F) = 5/16
p(F) = 8/16 = 1/2
p(E | F) = (5/16)/(1/2) = 10/16 = 5/8 = 0.625
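A brute-force enumeration sketch confirming the result:

from itertools import product

# Enumerate all 4-bit strings to check Example II.
strings = ["".join(b) for b in product("01", repeat=4)]
E = [s for s in strings if "00" in s]       # at least two consecutive 0s
F = [s for s in strings if s[0] == "0"]     # first bit is 0
EF = [s for s in E if s in F]

print(sorted(EF))                           # ['0000', '0001', '0010', '0011', '0100']
print(len(EF) / len(F))                     # p(E|F) = 5/8 = 0.625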
Multiplication Rule
Let A and B be any two events in sample space Ω.
Then, by rearranging the definition of conditional probability,
Pr(A ∩ B) = Pr(A|B) Pr(B)
This can be extended to any n events. Let A, B and C be
events, then,
Pr(A ∩ B ∩ C) = Pr(A | B ∩ C) Pr(B ∩ C)
= Pr(A | B ∩ C) Pr(B | C) Pr(C)
And so on for any number n of events A1, A2, ..., An:
Pr(A1 ∩ A2 ∩ ... ∩ An) = Pr(A1) Pr(A2 | A1) Pr(A3 | A1 ∩ A2) ... Pr(An | A1 ∩ A2 ∩ ... ∩ An−1)
Law of Total Probability
If A1, A2, ..., An are mutually exclusive and exhaustive events, and B is any other event, then B = (B ∩ A1) ∪ (B ∩ A2) ∪ ... ∪ (B ∩ An). But these pieces are mutually exclusive, and so we can say (total probability):
P(B) = P(B | A1)P(A1) + P(B | A2)P(A2) + ... + P(B | An)P(An) = Σ(j=1 to n) P(B | Aj)P(Aj)
(Diagram: events A1, A2, A3 partitioning the sample space, with B overlapping each of them.)
Bayes' Rule (named after Rev. Thomas Bayes)
Let A1, A2, ..., An be mutually exclusive and exhaustive events, and let B be any other event. We know that
P(Ai | B) = P(Ai ∩ B)/P(B)
P(B | Ai) = P(Ai ∩ B)/P(Ai)
Bayes' Rule (named after Rev. Thomas Bayes)
And so we can say that:
P(Ai | B) = P(Ai ∩ B)/P(B) = P(Ai)P(B | Ai)/P(B)
= P(Ai)P(B | Ai) / [P(B | A1)P(A1) + P(B | A2)P(A2) + ... + P(B | An)P(An)]
Example I
100 bits/sec are received from transmitter A: error rate =1/10.
200 bits/sec are received from transmitter B: error rate =1/20.
300 bits/sec are received from transmitter C: error rate =1/30.
Let all three bit streams be randomly multiplexed together, and
let one bit be selected from the composite signal.
a) What is the probability that the selected bit is in error?
b) If the selected bit is in error, what is the probability that it is
from transmitter A?
Example I
Solution
a) Let D denote the event that the bit selected is in
error. Let A,B and C denote the events that the selected
bit is from transmitters A,B and C respectively.
P(D) = P(D ∩ [A ∪ B ∪ C]) = P(D ∩ A) + P(D ∩ B) + P(D ∩ C)
= P(D | A)P(A) + P(D | B)P(B) + P(D | C)P(C)
= (1/10 × 100/600) + (1/20 × 200/600) + (1/30 × 300/600) = 1/20
Example I
b) By Bayes' rule:
P(A | D) = P(A ∩ D)/P(D) = P(D | A)P(A) / [P(D | A)P(A) + P(D | B)P(B) + P(D | C)P(C)] = 1/3
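A short sketch of parts a) and b), applying total probability and Bayes' rule directly:

# Total probability and Bayes' rule for the multiplexed-transmitter example.
priors = {"A": 100/600, "B": 200/600, "C": 300/600}   # share of bits per source
error = {"A": 1/10, "B": 1/20, "C": 1/30}             # P(D | source)

P_D = sum(error[t] * priors[t] for t in priors)
print(P_D)                                  # ≈ 0.05 = 1/20

P_A_given_D = error["A"] * priors["A"] / P_D
print(P_A_given_D)                          # ≈ 0.3333 = 1/3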
Example II
(Diagram: a bit stream 1 0 1 1 0 1 0 1 1 1 passing through a comms channel with additive noise.)
Let A = 'one transmitted' and B = 'one received'. Also let
P(A) = 0.60, P(B | A) = 0.90 and P(B | Ac) = 0.05.
What is the probability that, when a 'one' is received, a 'one' has been transmitted, i.e. the a posteriori probability P(A | B)?
Example II
(Channel transition diagram: input '1' with probability P(A) and input '0' with probability P(Ac); branch probabilities P(B | A), P(Bc | A), P(B | Ac) and P(Bc | Ac); output probabilities P(B) and P(Bc).)
Example II
P(B) = P(B | A)P(A) + P(B | Ac)P(Ac) = 0.90 × 0.60 + 0.05 × 0.40 = 0.56
P(A | B) = [P(B | A)P(A)]/P(B) = 0.54/0.56 = 27/28
Example II: Comments
(Channel diagram with numbers, labelled Binary Symmetric Channel (BSC): input '1' with probability 0.6 and input '0' with probability 0.4; transition probabilities 0.9 and 0.1 from '1', and 0.05 and 0.95 from '0'; output probabilities 0.56 for '1' and 0.44 for '0'.)
Example II
Note that this result is obvious from the diagram. Firstly, it is clear that P(B), the probability of receiving a '1', is (0.6)(0.9) + (0.4)(0.05) = 0.56. Of this 0.56, the contribution due to a '1' being transmitted is (0.6)(0.9) = 0.54. So the probability that a '1' was transmitted is intuitively 0.54/0.56 = 27/28. But clearly this is also P(A | B) = [P(B | A)P(A)]/P(B).
In fact, you can sometimes forget the formulae altogether and work everything out intuitively from the figures!
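As a complement to the algebra, a Monte Carlo sketch of this channel (the sample size and seed are arbitrary choices); the empirical ratio should land near 27/28:

import random

# Monte Carlo check of P(A|B) = 27/28 for the noisy channel example.
random.seed(0)
N, ones_received, ones_sent_and_received = 10**6, 0, 0
for _ in range(N):
    sent_one = random.random() < 0.60             # P(A) = 0.6
    p_rx_one = 0.90 if sent_one else 0.05         # P(B|A) and P(B|Ac)
    if random.random() < p_rx_one:
        ones_received += 1
        ones_sent_and_received += sent_one
print(ones_sent_and_received / ones_received)     # ≈ 27/28 ≈ 0.9643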
Introduction to Random Variables
A random variable (r.v.) X can be either discrete (in amplitude) or continuous (in amplitude).
An example of a continuous random
variable would be a person’s height, or the
temperature of a room, or the direction of an
arrow.
Introduction to Random Variables
An example of a discrete random variable
would be the outcome of tossing a die, or
the output of an A/D converter with an
analogue signal input.
A continuous random variable V(t)
(Figures: sample waveforms V(t) versus time t, for 0 ≤ t ≤ 100: a continuous Gaussian r.v. and a continuous uniformly distributed r.v.)
A discrete random variable X(n)
(Figures: sample sequences X(n) versus sample number n, for 0 ≤ n ≤ 100: a uniformly distributed discrete r.v. and a Gaussian/normally distributed discrete r.v.)
Definition of a Random Variable (Discrete
or Continuous)
A r.v. is a function whose domain is the set of outcomes λ ∈ S and whose range is R1, the real line. For every outcome λ ∈ S, the random variable assigns a number X(λ) such that:
– P(x1 < X ≤ x2) = P(λ: x1 < X(λ) ≤ x2)
A discrete random variable X is characterised by a set of
allowable values x1, x2, ... ,xn and the probability that X= xi
is denoted by P(X=xi) for i=1,2, ... ,n and is called the
probability mass function.
Probability Mass Function for a
Discrete R.V
The probability mass function for tossing a fair die is P(X = xi) = 1/6 for i = 1, 2, ..., 6.
(Figures: 30 typical throws of a die, X(n) versus throw number n, alongside the PMF P(X = xi) = 1/6 plotted against xi = 1, ..., 6.)
Examples of mass functions in
communications
Uniform r.v. (all outcomes equally likely –
e.g. a die or BPSK signal)
P(X = xi ) = 1/n, i=1,2,3, ..., n
Bernoulli r.v. (only two outcomes)
P(X = x0) = p; P(X = x1) = q = 1 - p
(e.g. the value of a bit)
Examples of mass functions in
communications
Binomial r.v. (X is the number of successes, each with probability p, in n Bernoulli trials)
P(X = k) = nCk p^k (1 - p)^(n-k), k = 0, 1, 2, ..., n
where μX = np and σX² = np(1 - p)
(e.g. bits in error per word)
Poisson r.v. (e.g. probability of k telephone calls per hour at an exchange)
P(X = k) = e^(-λ) λ^k / k!, k = 0, 1, 2, ...
where μX = λ and σX² = λ
Probability Mass Function for a Poisson Random Variable (1)
(Figure: the 51 points of the PMF P(X = k) = e^(-λ) λ^k / k!, k = 0, 1, ..., 50, plotted for λ = 5; μX = λ and σX² = λ.)
Probability Mass Function for a Poisson Random Variable (2)
(Figure: the same 51-point PMF plotted for λ = 0.75.)
Probability Mass Function for a Poisson Random Variable (3)
(Figure: the same 51-point PMF plotted for λ = 20.)
Properties of Probability Mass Functions
P(X = xi) > 0, i = 1, 2, ..., n
Σ(i=1 to n) P(X = xi) = 1
P(X ≤ x) = FX(x) = Σ(all xi ≤ x) P(X = xi)
(FX(x) is the cumulative distribution function)
P(X = xi) = lim(ε→0, ε>0) [FX(xi) - FX(xi - ε)]
The (Cumulative) Distribution Function
for a Discrete R.V
Consider tossing a die. Then we have:
λ1 : Up face is a one ⇒ X(λ1 ) = 1
λ2 : Up face is a two ⇒ X(λ2 ) = 2
...
λ6 : Up face is a six ⇒ X(λ6 ) = 6
– Distribution function: P(X ≤ x) = FX(x), and so
– P(x1 < X ≤ x2) = FX(x2) - FX(x1)
– where FX(−∞) = 0 and FX(∞) = 1
(Figure: the staircase CDF FX(x) for a fair die, rising in steps of 1/6 at x = 1, 2, ..., 6, from 0 up to 1.)
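A short sketch building the PMF and staircase CDF of a fair die with exact fractions, and using FX to evaluate an interval probability:

from fractions import Fraction

# PMF and CDF of a fair die: FX(x) = sum of P(X = xi) over all xi <= x.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
cdf = {}
running = Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running
print(cdf)              # steps 1/6, 2/6, ..., 6/6
print(cdf[5] - cdf[2])  # P(2 < X <= 5) = FX(5) - FX(2) = 1/2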
Mean, Median and Mode for A Discrete
R.V
For a discrete random variable X, let P(X = xi) for i = 1, 2, ..., n. Then:
MEAN(X) = μX = E{X} = Σ(i=1 to n) xi P(X = xi)
MEDIAN(X) = the value such that P(X ≤ median) = P(X > median) = 0.5
MODE(X) = the value of X at which the probability mass function is at a maximum
Example
Consider a biased die with face numbers
1,2,3,4,5,6. Let the probabilities of
occurrence be respectively: 0.125, 0.125,
0.125, 0.125, 0.125 and 0.375. Then the
mean is 4.125; the median is 4.5; and the
mode is 6.
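A quick check of this example:

# Mean, median and mode for the biased die in the example above.
pmf = {1: 0.125, 2: 0.125, 3: 0.125, 4: 0.125, 5: 0.125, 6: 0.375}

mean = sum(x * p for x, p in pmf.items())
print(mean)                       # 4.125

mode = max(pmf, key=pmf.get)
print(mode)                       # 6

# P(X <= 4) = 0.5 = P(X > 4), so any value between 4 and 5 splits the
# distribution in half; 4.5 is the conventional choice of median.
cum = 0.0
for x in sorted(pmf):
    cum += pmf[x]
    print(x, cum)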