Elementary Probability and Statistics
Contents
1 Introduction to Probability 3
1.1 Some important terms and concepts . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Empirical or experimental Probability . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Compound Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Mutually Exclusive Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Independent Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8 EXERCISES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter One
Introduction to Probability
Probability theory originated in the theory of gambling. A large number of problems that exist today are
based on games of chance, such as coin tossing, die throwing and playing cards. Probability
is very important for statistics because it provides the rules that allow us to reason about uncertainty
and randomness, which is the basis of statistics. Independence and conditional probability are
profound ideas, and they must be understood in order to think clearly about any statistical
investigation.
Aims of the chapter: The aims of this chapter are to:
Learning outcomes: After completing this chapter, you should be able to:
• Explain the fundamental ideas of random experiments, sample spaces and events
Example 1.1.
4. The sample space, S, of an experiment is the set of all possible outcomes of one trial of
the experiment. Each outcome is then called a sample point.
Example 1.2. • Dataset 1: S = {0, 1, 2, ....}
• Dataset 2: S = {x : x ≥ 0}
• Toss of a coin: S = {H, T }
• Roll of a six sided die: S = {1, 2, 3, 4, 5, 6}
7. The total number of possible outcomes in any trial is known as the exhaustive number of events.
Definition 1.2 (Probability). The probability of an event A occurring is the ratio of the
number of favourable outcomes to the total number of all possible outcomes, all equally
likely to occur. That is, the probability of an event A is given by P (A) where
Now, Since the probability of the sample space is 1, that is, P (S) = 1, we have that
Thus, in general, if an event A can occur in n ways out of a total of N equally likely possible
ways, then the probability of occurrence of the event (its success) can be expressed as

P(A) = n/N    (1.4)

and the probability of non-occurrence of the event (its failure) can be expressed as

P(A^c) = (N − n)/N = 1 − n/N = 1 − P(A)    (1.5)

so that

P(success) + P(failure) = 1    (1.6)
Properties of probability
1. P (∅) = 0
Example 1.4. 1. In the roll of a pair of dice, find the probability that the appearances (the numbers
showing) are the same.
Solution:
1. S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2),
(5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
n(S) = 36
Let A be the set of same appearance, then
A = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
n(A) = 6
We have that P(A) = n(A)/n(S) = 6/36 = 1/6.
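As a check, this result can be confirmed by brute-force enumeration. The short Python sketch below lists the 36 equally likely outcomes and counts the doublets; the variable names are illustrative only.

```python
# Sketch: check the result by enumerating the 36 equally likely outcomes of a
# pair of dice and counting the doublets.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # the sample space S, n(S) = 36
same = [o for o in outcomes if o[0] == o[1]]      # the event A of equal appearances

print(len(same) / len(outcomes))                  # 6/36 = 1/6
```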
Remark 1.1. If the probability of an event is zero, it means the event can never occur.
If after n (n large) repetitions of an experiment an event is observed to occur h times, then the probability of the
event is h/n. This is called the empirical probability of the event.
For example, if a coin is tossed 1000 times and heads are counted 489 times, then the probability of
getting a head on the next throw is

P(H) = 489/1000 = 0.489 = 48.9%    (1.8)

The probability of getting a tail on the next throw is

P(T) = 1 − P(H) = 0.511 = 51.1%
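Empirical probabilities of this kind are easy to explore by simulation. The Python sketch below tosses a simulated fair coin n times and reports the proportion of heads; the head count will vary from run to run.

```python
# Sketch: estimate the empirical probability of heads by simulating n tosses of a
# fair coin. The count h varies from run to run, but h/n settles near 0.5.
import random

n = 1000
h = sum(random.random() < 0.5 for _ in range(n))   # number of simulated heads
print(h / n)                                       # empirical probability of a head
```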
A ∩ B′ = {x : x ∈ A and x ∉ B}    (1.10)
Definition 1.3. Two events A and B are mutually exclusive if they are disjoint. That is, if
A ∩ B = {}.
Proof: Since A and ∅ are mutually exclusive and A ∪ ∅ = A, we have

P(A ∪ ∅) = P(A) + P(∅)
=⇒ P(∅) = P(A ∪ ∅) − P(A)
= P(A) − P(A)
= 0
Proof: The sample space S can be decomposed into two mutually exclusive events A and
A′. That is, S = A ∪ A′.
We have that

P(S) = P(A ∪ A′)
1 = P(A) + P(A′)
=⇒ P(A′) = 1 − P(A)
P(A ∩ B′) = P(A) − P(A ∩ B)    (1.13)

Proof:

A = (A ∩ B) ∪ (A ∩ B′)
=⇒ P(A) = P[(A ∩ B) ∪ (A ∩ B′)]
=⇒ P(A) = P(A ∩ B) + P(A ∩ B′)
=⇒ P(A ∩ B′) = P(A) − P(A ∩ B)
Proof: The set A ∪ B can be decomposed into two mutually exclusive events A ∩ B′ and
B. Therefore

A ∪ B = (A ∩ B′) ∪ B
P(A ∪ B) = P[(A ∩ B′) ∪ B]
= P(A ∩ B′) + P(B)
= P(A) − P(A ∩ B) + P(B)

Thus,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)    (1.15)
Definition 1.4. Events A, B and C are collectively exhaustive if
P (A ∪ B ∪ C) = 1 (1.16)
That is, A ∪ B ∪ C = S.
Example 1.6. John tosses a fair coin and Daniel rolls a fair die. Find the probability of (a) obtaining a head and a 4; (b) obtaining a head or a 4.
Solution:
(a)

P(H ∩ 4) = P(H) · P(4) = (1/2) · (1/6) = 1/12

(b)

P(H ∪ 4) = P(H) + P(4) − P(H ∩ 4) = 1/2 + 1/6 − 1/12 = 7/12
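Both answers can be confirmed by listing the 12 equally likely (coin, die) outcomes, as in the Python sketch below (the names are illustrative only).

```python
# Sketch: confirm both answers by listing the 12 equally likely (coin, die) outcomes.
from itertools import product

space = list(product("HT", range(1, 7)))                    # 12 outcomes
both = [s for s in space if s == ("H", 4)]                  # head and a 4
either = [s for s in space if s[0] == "H" or s[1] == 4]     # head or a 4

print(len(both) / len(space), len(either) / len(space))     # 1/12 and 7/12
```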
Example 1.7. A fair coin is tossed 3 times. Consider the events A, B and C where
decide whether
(i) A is independent of B.
(ii) B is independent of C.
(iii) A is independent of C.
(Part (iii) is left as an assignment.)
P(A | B) = P(A ∩ B) / P(B),    P(B) ≠ 0    (1.21)
Bayes’ theorem comes into play when the conditional probability formula cannot be applied directly.
Bayes’ theorem is therefore an extension of conditional probability.
P(A_i | B) = P(A_i) · P(B | A_i) / [P(A_1) · P(B | A_1) + P(A_2) · P(B | A_2) + · · · + P(A_n) · P(B | A_n)]    (1.22)
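As an illustration of equation (1.22), the Python sketch below computes the posterior probabilities for a partition A1, A2, A3 of the sample space; the prior and conditional probabilities used are made-up, purely illustrative values.

```python
# Sketch of equation (1.22) for a partition A1, A2, A3 of the sample space.
# The prior and conditional probabilities below are made-up, illustrative values.
prior = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3)
likelihood = [0.1, 0.4, 0.7]     # P(B | A1), P(B | A2), P(B | A3)

denom = sum(p * l for p, l in zip(prior, likelihood))            # denominator of (1.22)
posterior = [p * l / denom for p, l in zip(prior, likelihood)]   # P(Ai | B)
print(posterior, sum(posterior))                                 # the posteriors sum to 1
```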
1.8 EXERCISES
2. A bag contains 8 white and 10 black balls. Two balls are drawn in succession. What is
the probability that the first is white and the second is black?
3. Two persons A and B appear in an interview for 2 vacancies for the same post. The
probability of A's selection is 1/7 and that of B's selection is 1/5. What is the probability
that
4. What is the chance of getting two 6s in two rolls of a single die?
6. From a bag containing 4 white balls and 6 black balls, two balls are drawn at random. If
the balls are drawn one after the other without replacement, find the probability that
7. Do the same exercise as above, but consider that the balls are replaced in this case.
8. If a single card is drawn from a pack of cards, what is the probability that it is either a
spade or a king?
9. A person is known to hit a target in 3 out of 4 shots, whereas another person is known
to hit it in 2 out of 3 shots. Find the probability of the target being hit at all
when they both try.
10. A bag contains 3 red and 4 white balls. Two draws are made without replacement. What
is the probability that both balls are red?
11. Find the probability of drawing a queen and a king from a pack of cards in two consecutive
draws, if the cards drawn are not replaced.
12. The Guardian newspaper publishes three columns entitled Politics (A), Books (B) and Adverts (C).
The reading habits of a randomly selected reader with respect to the three columns
are P(A) = 0.14, P(B) = 0.23, P(C) = 0.37, P(A ∩ B) = 0.08, P(A ∩ C) = 0.09, P(B ∩ C) = 0.13 and
P(A ∩ B ∩ C) = 0.05. Find
(i) P(A | B)
(ii) P(A | B ∪ C)
(iii) P(A | reads at least one)
(iv) P(A ∪ B | C)
Chapter Two
Learning outcomes: After completing this chapter, you should be able to:
• Define a random variable and distinguish it from the values that it takes.
• Find the expectation and variance of simple random variables, whether discrete or con-
tinuous.
• Demonstrate how to proceed and use simple properties of expected values and variance.
Definition 2.1. A random variable is a numerical quantity whose value is determined by the outcome of an experiment.
Suppose that to each point of a sample space we assign a number. We then have a function
defined on the sample space. This function is called a random variable (or stochastic variable)
or, more precisely, a random function (stochastic function). It is usually denoted by an upper-
case letter such as X or Y. In general, a random variable has some specified physical, geometric,
or other significance.
Example 2.1. Suppose that a coin is tossed twice so that the sample space is S = {HH, HT, TH, TT}.
Let X represent the number of heads that can come up. With each sample point we can associate
a number for X as shown in the table below. It follows that X is a random variable.
Sample Point HH HT TH TT
X 2 1 1 0
It should be noted that many other random variables could also be defined on this sample
space, for example, the square of the number of heads or the number of heads minus the number of tails.
Example 2.2. Roll two dice and call their sum X. The sample space for X is
S_X = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, and the outcomes are not equally likely. However, we know the probabilities of the events
corresponding to each of these outcomes, and we can display them in a table as follows.
Outcome       2     3     4     5     6     7     8     9     10    11    12
Probability   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
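The table above can be reproduced by enumeration, as in the short Python sketch below (illustrative only).

```python
# Sketch: reproduce the table above by enumerating all 36 equally likely rolls
# and counting how many give each sum.
from itertools import product
from collections import Counter

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for x in range(2, 13):
    print(x, f"{counts[x]}/36")   # e.g. the sum 7 occurs in 6 of the 36 rolls
```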
A random variable that takes on a finite or countably infinite number of values is called a
discrete random variable; for example, the number of days that it rains in a year.
One that takes on an uncountably infinite (continuous) set of values is called a continuous
random variable; for example, the amount of preparation time for an exam.
Definition 2.2. A random variable that takes a finite or countably infinite number of values is
called a discrete random variable.
Definition 2.3. Let X be a discrete random variable. We define the probability mass function
(PMF) to be the function which gives the probability of each x ∈ S_X. That is,

P(X = x) = ∑_{s ∈ S : X(s) = x} P(s)    (2.1)
That is, the probability of getting a particular number is the sum of the probabilities of all
those outcomes which have that number associated with them. The set of all pairs

{(x, P(X = x)) : x ∈ S_X}

is called the probability distribution of X. For the sum of two dice, the distribution is
x          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
These probabilities sum to 1, as every outcome has some number associated with it. It can often be useful to know the
probability that a random variable is no greater than some particular value. This leads us to the
cumulative distribution function.
Definition 2.4. Let X be a discrete random variable. We define the cumulative distribution
function (CDF) as

F(x) = P(X ≤ x) = ∑_{y ∈ S_X : y ≤ x} P(X = y)
Example 2.4. For the sum of two dice, the cumulative distribution function for the outcomes
can be tabulated as
x          2     3     4     5     6     7     8     9     10    11    12
P(X ≤ x)   1/36  3/36  6/36  10/36 15/36 21/36 26/36 30/36 33/36 35/36 36/36
It is important to know that the CDF is defined for all real numbers, not just the possible
values. In our example, we have

F(−3) = P(X ≤ −3) = 0
F(4.5) = P(X ≤ 4.5) = P(X ≤ 4) = 6/36
F(25) = P(X ≤ 25) = 1
Example 2.5. Consider the following probability distribution for the household size, X in
Cameroon. With x being the number of people in a household
x 1 2 3 4 5 6 7 8
P (X = x) 0.3002 0.3417 0.1551 0.1336 0.0494 0.0145 0.0034 0.0021
These probabilities are clearly non-negative and their sum ∑_{i=1}^{8} P(X = x_i) = 1. The CDF of the
household distribution is
x 1 2 3 4 5 6 7 8
F (x) = P (X ≤ x) 0.3002 0.6419 0.7970 0.9306 0.9800 0.9945 0.9979 1.0000
Definition 2.5. The expected value or (population) mean of a random variable X, denoted
µ or E(X), is the probability-weighted sum over its n possible values. That is,

µ = E(X) = ∑_{i=1}^{n} x_i P(X = x_i) = ∑_{i=1}^{n} x_i f(x_i)    (2.8)
The location measure used to summarise random quantities is known as the expectation of
the random quantity. It is the "centre of mass" of the probability distribution. Analogous to
the sample mean x̄, it represents the "average" value of X.
The variance, Var(X) = E[(X − µ)²], measures the spread of a random quantity about its expected value, over
all values with positive probability. The variance can also be written as
σ² = Var(X) = ∑_{i=1}^{n} x_i² P(X = x_i) − µ²
Definition 2.7. The standard deviation of X, denoted by σ or SD(X), is the square root of the variance.
For the sum of two dice, for example, SD(X) = √(35/6) ≈ 2.42.
Example 2.8. Find the expectation and variance of the Household example.
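A possible solution sketch in Python, applying equation (2.8) and the shortcut variance formula to the household-size distribution tabulated above (the rounded results come out at roughly E(X) ≈ 2.36, Var(X) ≈ 1.70 and SD(X) ≈ 1.30):

```python
# Sketch for Example 2.8: apply equation (2.8) and the shortcut variance formula
# to the household-size distribution tabulated above.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ps = [0.3002, 0.3417, 0.1551, 0.1336, 0.0494, 0.0145, 0.0034, 0.0021]

mu = sum(x * p for x, p in zip(xs, ps))                  # E(X), approximately 2.36
var = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2   # Var(X), approximately 1.70
sd = var ** 0.5                                          # SD(X), approximately 1.30
print(round(mu, 4), round(var, 4), round(sd, 4))
```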
Now that we have an understanding of discrete random quantities, we look next at a few
standard families of discrete random variables. One of the most commonly encountered discrete
distributions is the binomial distribution. This is the distribution of the number of "successes"
in a series of n independent trials, each of which results in a "success" (with probability p) or
a "failure" (with probability 1 − p). If the number of successes is X, we would write

X ∼ B(n, p)

to indicate that X is a binomial random quantity based on n independent trials, each with success
probability p.
Example 2.9. 1. Toss a fair coin 100 times and let X be the number of heads. Then X ∼
B(100, 0.5).
2. A certain kind of lizard lays 8 eggs, each of which will hatch independently with probability
0.7. Let Y denote the number of eggs which hatch. Then Y ∼ B(8, 0.7)
Let us now derive the PMF for the binomial distribution X ∼ B(n, p). Clearly, X can take
on any value from 0 up to n, and no other. Therefore, we simply have to calculate P(X = k) for
k = 0, 1, 2, ..., n. The probability of k successes followed by n − k failures is given by p^k (1 − p)^(n−k).
Indeed, this is the probability of any particular sequence involving k successes. There are (n choose k)
such sequences, so, multiplying the number of sequences by the probability of each, we have

P(X = k) = (n choose k) p^k (1 − p)^(n−k),    k = 0, 1, 2, ..., n    (2.11)
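Equation (2.11) is straightforward to evaluate with the binomial coefficient from the Python standard library; the sketch below tabulates the PMF of the lizard-egg example, Y ∼ B(8, 0.7) (small differences from the table that follows are due to rounding).

```python
# Sketch of equation (2.11), using the binomial coefficient from the standard library.
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# PMF of the lizard-egg example, Y ~ B(8, 0.7), rounded to 2 decimal places
print([round(binom_pmf(k, 8, 0.7), 2) for k in range(9)])
print(sum(binom_pmf(k, 8, 0.7) for k in range(9)))   # the probabilities sum to 1
```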
For the lizard-egg example, Y ∼ B(8, 0.7), the PMF and CDF (to two decimal places) are

k          0     1     2     3     4     5     6     7     8
P(Y = k)   0.00  0.00  0.01  0.04  0.14  0.25  0.30  0.20  0.06
F(k)       0.00  0.00  0.01  0.06  0.19  0.45  0.74  0.94  1.00
The expectation, variance and standard deviation of the binomial distribution X ∼ B(n, p)
are given by

E(X) = np    (2.12)
Var(X) = np(1 − p)    (2.13)
SD(X) = √(Var(X))    (2.14)
Example 2.11. For example, in the case of the coin tosses of Example 2.9, X ∼ B(100, 0.5), we have E(X) = 100 × 0.5 = 50, Var(X) = 100 × 0.5 × 0.5 = 25 and SD(X) = 5.
If X is the number of trials until a success is encountered, and each independent trial has
probability p of being a success, we write

X ∼ Geom(p)

Clearly, X can take on any positive integer value, so to deduce the PMF we need to calculate
P(X = k) for k = 1, 2, 3, ... In order to have X = k, we must have an ordered sequence of k − 1
failures followed by one success. By the multiplication rule we have that

P(X = k) = (1 − p)^(k−1) p,    k = 1, 2, 3, ...
The Poisson distribution is a very important discrete probability distribution, which arises in
many different contexts in probability and statistics. Typically, Poisson random quantities are
used in place of binomial random quantities in situations where n is large, p is small, and the
expectation np is stable.
A Poisson random variable, X with parameter λ is written as
X ∼ P (λ)
Example 2.13. Consider the number of calls made in a 1 minute interval to an internet service
provider (ISP). The ISP has thousands of subscribers, but each one will call with a very small
probability. The ISP knows that on average 5 calls will be made in the interval. The actual
number of calls will be a Poisson random variable, with mean 5.
P(X = k) = (λ^k / k!) e^(−λ),    k = 0, 1, 2, 3, ...    (2.22)
The expectation, variance and standard deviation of the Poisson distribution X ∼ P(λ)
are given by

E(X) = λ    (2.23)
Var(X) = λ    (2.24)
SD(X) = √(Var(X)) = √λ    (2.25)
Thus the Expectation and the variance of the Poisson distribution are both λ.
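For the ISP call example, X ∼ P(5), the PMF (2.22) can be tabulated with a few lines of Python; the last line checks numerically that the mean is indeed λ = 5 (illustrative sketch only).

```python
# Sketch of the Poisson PMF (2.22) for the ISP example, lambda = 5.
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ P(lambda)."""
    return lam**k * exp(-lam) / factorial(k)

print([round(poisson_pmf(k, 5), 3) for k in range(11)])   # P(0 calls), ..., P(10 calls)
print(sum(k * poisson_pmf(k, 5) for k in range(60)))      # numerical mean, approximately 5
```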
In this section, we discuss techniques for handling continuous random quantities. Continuous
random variables are random quantities with a sample space which is neither finite nor
countably infinite. Continuous probability models are appropriate if the result of an experiment
is a continuous measurement, rather than a count from a discrete set.
If X is a continuous random quantity with sample space S, then for any particular a ∈ S,
we generally have that

P(X = a) = 0

This is because the sample space is so "large" and every possible outcome so "small" that the
probability of any particular value is vanishingly small. Therefore the probability mass function
we defined for discrete random quantities is inappropriate for understanding continuous random
quantities. In order to understand continuous random quantities, we need to understand a little
calculus.
Definition 2.8. Let X be a continuous random variable. The probability density function
(PDF) of X is a function f(x) which satisfies the following:

1. f(x) ≥ 0

2. ∫_{−∞}^{∞} f(x) dx = 1

3. P(a ≤ X ≤ b) = ∫_a^b f(x) dx for any a and b.
Consequently, we have

P(x ≤ X ≤ x + δx) = ∫_x^{x+δx} f(y) dy ≈ f(x) δx

=⇒ f(x) ≈ P(x ≤ X ≤ x + δx) / δx

so that we may interpret the PDF as

f(x) = lim_{δx→0} P(x ≤ X ≤ x + δx) / δx    (2.26)
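These defining properties can also be checked numerically. The sketch below uses a simple midpoint-rule integrator on an assumed exponential density with a made-up rate λ = 0.01; this density is only an illustrative stand-in, not the PDF of any example above.

```python
# Sketch: check properties 2 and 3 of Definition 2.8 numerically, using a simple
# midpoint-rule integrator and an assumed Exp(0.01) density (illustrative only).
from math import exp

lam = 0.01
f = lambda x: lam * exp(-lam * x)          # an assumed PDF; f(x) >= 0 for x >= 0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 0, 5000))   # effectively 1 (property 2, truncating the infinite range)
print(integrate(f, 0, 150))    # P(0 <= X <= 150) = 1 - e^(-1.5), about 0.78 (property 3)
```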
Example 2.15. The manufacturer of a certain kind of light bulb claims that the lifetime of the
where c is a constant. What value must c take in order for this to define a valid PDF? What is
the probability that the bulb lasts no longer than 150 hours? Given that a bulb lasts longer than
150 hours, what is the probability that it lasts longer than 200 hours?
Remark 2.1. 1. PDFs are not probabilities. For example, the density can take values greater
than 1 in some regions as long as it still integrates to 1.
2. Because P (X = x) = 0, we have that P (X ≤ k) = P (X < k) for continuous random
quantities.
Definition 2.9. Let X be a continuous random variable. The cumulative distribution function
(CDF) of X is a function F(x) such that for all x

F(x) = P(X ≤ x)
Hence, the CDF of continuous random quantities is defined the same as the CDF of discrete
random quantities, but for continuous random quantities we have the continuous analogue
F(x) = P(X ≤ x) = P(−∞ < X ≤ x) = ∫_{−∞}^{x} f(y) dy
Just as in the discrete case, the cumulative distribution function is defined for all x ∈ R, even
if the sample space S is not the whole of the real line.
Properties of the CDF
Definition 2.10. The median of a random quantity is the value m which is the ”middle” of
the distribution. That is, it is the value m such that
P(X ≤ m) = P(X ≥ m) = 1/2

that is, F(m) = 0.5.
Similarly, the lower quartile of a random quantity is the value l such that

F(l) = 0.25

and the upper quartile is the value u such that

F(u) = 0.75
Now that we have the basic properties of continuous random variables, we can look at
some of the important standard continuous probability distribution models. We start with the
simplest of these, the uniform distribution.
Definition 2.11. A random quantity X is said to have a uniform distribution over the
range [a, b] written
X ∼ U (a, b)
if it has PDF

f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise    (2.27)

and its CDF is

F(x) = 0 for x < a,    F(x) = (x − a)/(b − a) for a ≤ x ≤ b,    F(x) = 1 for x > b    (2.28)
The lower quartile, median and upper quartile of the uniform distribution are given respectively
by

(3/4)a + (1/4)b,    (a + b)/2,    (1/4)a + (3/4)b    (2.29)

The expectation is

E(X) = (a + b)/2    (2.30)
and the variance is given by

Var(X) = (b − a)²/12    (2.31)
The uniform distribution is too simple to realistically model actual experimental data, but
is very useful for computer simulation, as random quantities from different distributions can be
obtained from U (0, 1) random quantities.
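A standard way to do this is inverse-transform sampling: if U ∼ U(0, 1) and F is an invertible CDF, then F⁻¹(U) has CDF F. The Python sketch below applies this to the exponential distribution defined next, for which F⁻¹(u) = −ln(1 − u)/λ; the rate λ = 2 is an illustrative value.

```python
# Sketch of the remark above: turn U(0,1) draws into draws from another distribution
# via the inverse CDF. For the exponential distribution (defined next),
# the inverse CDF is -ln(1 - u)/lambda.
import random
from math import log

def exp_sample(lam):
    u = random.random()          # a U(0,1) random quantity
    return -log(1 - u) / lam     # has the Exp(lam) distribution

draws = [exp_sample(2.0) for _ in range(100_000)]
print(sum(draws) / len(draws))   # sample mean, close to 1/lambda = 0.5
```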
Definition 2.12. The random variable X has an exponential distribution with parameter
λ > 0, written
X ∼ Exp(λ)
if it has PDF

f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 otherwise    (2.32)

and CDF

F(x) = 0 for x < 0, and F(x) = 1 − e^(−λx) for x ≥ 0    (2.33)
Definition 2.13. A random quantity X has a normal distribution with parameters µ and
σ², written

X ∼ N(µ, σ²)
if it has PDF

f(x) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²),    −∞ < x < ∞    (2.36)

for σ > 0, and CDF

F(x) = P(X ≤ x) = Φ((x − µ)/σ)    (2.37)

Note that P(a < X ≤ b) = Φ((b − µ)/σ) − Φ((a − µ)/σ).
E(X) = µ (2.38)
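In practice Φ has no closed form and is evaluated numerically; the sketch below uses the error function from the Python standard library, with illustrative parameter values µ = 10 and σ = 2.

```python
# Sketch: evaluate Phi with the standard-library error function and apply (2.37).
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10.0, 2.0                                 # illustrative parameter values
a, b = 8.0, 13.0
print(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))  # P(8 < X <= 13), about 0.77
```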
2.3.5 Exercises
2. Mary rolls two fair dice and observes two numbers X and Y .
4. Let X be a discrete random variable with range R_X = {1, 2, 3, ...}. Suppose the PMF of
X is given by

f(x) = (1/2)^x for x = 1, 2, 3, ...
(a) Find P (2 < X ≤ 5)
(c) Find P (X > 4)
(a) Find c
(b) Find the CDF of X, FX (x)
(c) Find P (1 < X < 3)
f_X(x) = 2x,    0 ≤ x ≤ 1

Find P(X ≤ 2/3 | X > 1/3).