Elements of Probability and Statistics
An Introduction to Probability with de Finetti's Approach and to Bayesian Statistics
Francesca Biagini · Massimo Campanino
UNITEXT - La Matematica per il 3+2
Volume 98
Editor-in-chief
A. Quarteroni
Series editors
L. Ambrosio
P. Biscari
C. Ciliberto
M. Ledoux
W.J. Runggaldier
More information about this series at http://www.springer.com/series/5418
Francesca Biagini
Department of Mathematics
Ludwig-Maximilians-Universität
Munich, Germany

Massimo Campanino
Department of Mathematics
Università di Bologna
Bologna, Italy
Translation from the Italian language edition: Elementi di Probabilità e Statistica di Francesca Biagini e
Massimo Campanino, © Springer-Verlag Italia, Milano 2006. All rights reserved.
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.
To my brother Vittorio
Massimo Campanino
Preface
This book is based on the lecture notes for the course Probability and
Mathematical Statistics, taught for many years by one of the authors (M.C.) and
then, divided into two sections, by both authors at the University of Bologna (Italy).
We follow the approach of de Finetti; see de Finetti [1] for a complete and detailed
exposition. Although de Finetti [1] was conceived as a textbook of probability for
mathematics students, it was also meant to illustrate the author's point of view
on the foundations of probability and mathematical statistics and to discuss it in
relation to prevalent approaches, which often makes it difficult for beginners to access.
This was the main reason that prompted us to arrange the lecture notes of our
courses in a more organic way and to write a textbook for an initial class on
probability and mathematical statistics.
The first five chapters are devoted to elementary probability. In the
next three chapters we develop some elements of Markov chains in discrete and
continuous time, also in connection with queueing processes, and introduce basic
concepts of mathematical statistics in the Bayesian approach. Then we propose six
chapters of exercises, which cover most of the topics treated in the theoretical
part. In the appendices we have inserted summary schemes and complementary
topics (two proofs of Stirling's formula). We also informally recall some elements of
calculus, as this has often proved useful for the students.
This book offers a comprehensive but concise introduction to probability and
mathematical statistics without requiring notions of measure theory; hence it can be
used in basic classes on probability for mathematics students and is particularly
suitable for computer science, physics and engineering students.
We are grateful to Springer for allowing us to publish the English version of the
book. We wish to thank Elisa Canova, Alessandra Cretarola, Nicola Mezzetti and
Quirin Vogel for their fundamental help with LaTeX, for both the Italian and the
English version.
Contents

Part I Probability
1 Random Numbers
1.1 Introduction
1.2 Events
1.3 Expectation
1.4 Probability of Events
1.5 Uniform Distribution on Partitions
1.6 Conditional Probability and Expectation
1.7 Formula of Composite Expectation and Probability
1.8 Formula of Total Expectation and Total Probability
1.9 Bayes Formula
1.10 Correlation Between Events
1.11 Stochastic Independence and Constituents
1.12 Covariance and Variance
1.13 Correlation Coefficient
1.14 Chebychev's Inequality
1.15 Weak Law of Large Numbers
2 Discrete Distributions
2.1 Random Numbers with Discrete Distribution
2.2 Bernoulli Scheme
2.3 Binomial Distribution
2.4 Geometric Distribution
2.5 Poisson Distribution
2.6 Hypergeometric Distribution
2.7 Independence of Partitions
2.8 Generalized Bernoulli Scheme
2.9 Multinomial Distribution
2.10 Stochastic Independence for Random Numbers with Discrete Distribution
2.11 Joint Distribution

Part II Exercises
9 Combinatorics
Exercises 9.1–9.6
10 Discrete Distributions
Exercises 10.1–10.7

References
Index
Part I
Probability
Chapter 1
Random Numbers
1.1 Introduction
Probability Theory deals with the quantification of our degree of uncertainty. Its main
objects of interest are random entities and, in particular, random numbers. What is
meant by a random number?
A random number is a well-defined number whose value is not necessarily known.
For example, we can use random numbers to describe the result of a given experiment,
the value of an option at a prefixed time, or the value of a meteorological
quantity at a given time. All these quantities have a well-defined value, but it may
not be known, either because they refer to the future and there are no means to predict
their values with certainty or because, even if they refer to the past, the relevant
information is not available at the moment.
We shall denote random numbers with capital letters. Even if the value of a random
number is in general not known, we can speak about the set of its possible values,
that will be denoted by I (X ). Certain numbers can be considered as particular cases
of random numbers, whose set of possible values consists of a single element.
Example 1.1.1 Let the random numbers X, Y represent respectively the results of
throwing a coin and a die. If we denote head and tail by 0 and 1 and the sides of the
die with the numbers from 1 to 6, we have:
I (X ) = {0, 1} ,
I (Y ) = {1, 2, 3, 4, 5, 6} .
Given two random numbers X and Y, we denote by I (X, Y ) the set of pairs of
values that (X, Y ) can attain. In general given n random numbers X 1 , . . . , X n , we
denote by I (X 1 , . . . , X n ) the set of possible values that (X 1 , . . . , X n ) can attain.
The random numbers X and Y are said to be logically independent if

I(X, Y) = I(X) × I(Y).
Example 1.1.2 In a lottery two balls are consecutively drawn without replacement
from an urn that contains 90 balls numbered from 1 to 90. Let X and Y represent
the random numbers corresponding respectively to the first and the second drawing.
The set of possible pairs is then

I(X, Y) = {(i, j) | i, j ∈ {1, …, 90}, i ≠ j}.

Clearly I(X, Y) ≠ I(X) × I(Y), as I(X, Y) does not contain pairs of the type (i, i)
with i ∈ {1, …, 90}. The random numbers X and Y are therefore not logically
independent.
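The point of Example 1.1.2 can be checked by enumeration. A minimal sketch (Python, with the 90-ball lottery described above) builds the set of attainable pairs and compares it with the full Cartesian product:

```python
from itertools import product

# possible values of the single drawings (balls numbered 1..90)
I_X = set(range(1, 91))
I_Y = set(range(1, 91))

# pairs attainable when drawing without replacement:
# the same ball cannot be drawn twice, so pairs (i, i) are excluded
I_XY = {(i, j) for i, j in product(I_X, I_Y) if i != j}

# X and Y are logically independent iff I(X, Y) = I(X) x I(Y)
logically_independent = I_XY == set(product(I_X, I_Y))
```

Here `logically_independent` comes out `False`, since the 90 diagonal pairs are missing from I(X, Y).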
The operations ∨ (logical sum), ∧ (logical product) and ˜ (negation) satisfy the following properties:
1. distributive property

x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z), (1.1)
x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z); (1.2)

2. associative property

x ∨ (y ∨ z) = (x ∨ y) ∨ z, (1.3)
x ∧ (y ∧ z) = (x ∧ y) ∧ z; (1.4)

3. commutative property

x ∨ y = y ∨ x, (1.5)
x ∧ y = y ∧ x; (1.6)

4. furthermore

(x̃)˜ = x, (1.7)
(x ∨ y)˜ = x̃ ∧ ỹ, (1.8)
(x ∧ y)˜ = x̃ ∨ ỹ. (1.9)
1.2 Events

An event E can be identified with the random number that takes value 1 if E takes place and 0 otherwise, so that I(E) ⊂ {0, 1}. With this identification the negation of E is

Ẽ = 1 − E.

For two events E and F,

(E ∨ F)˜ = Ẽ ∧ F̃ = (1 − E)(1 − F) = 1 − E − F + EF,

so that

E ∨ F = E + F − EF.

Analogously

(E ∨ F ∨ G)˜ = Ẽ F̃ G̃ = (1 − E)(1 − F)(1 − G),

so that

E ∨ F ∨ G = E + F + G − EF − EG − FG + EFG.

We write E ⊂ F for E ≤ F, and E = F for E ≡ F.
A constituent of E₁, …, Eₙ is a product

Q = E₁* ⋯ Eₙ*,

where each Eᵢ* stands for either Eᵢ or Ẽᵢ. When E₁, …, Eₙ form a partition, the constituents different from the impossible event are

E₁Ẽ₂ ⋯ Ẽₙ, Ẽ₁E₂Ẽ₃ ⋯ Ẽₙ, …, Ẽ₁ ⋯ Ẽₙ₋₁Eₙ;

in this case the constituents can be identified with the events themselves.
Let us now introduce the concept of logical dependence and independence of an
event E from n given events E 1 , . . . , E n . The constituents Q of E 1 , . . . , E n can be
classified in the following way with respect to a given event E:
(i) constituent of I type if Q ⊂ E;
(ii) constituent of II type if Q ⊂ Ẽ;
(iii) constituent of III type otherwise.
We say that the event E is:
• logically dependent from E 1 ,…,E n if all constituents of E 1 ,…,E n are of I or II
type;
• logically independent from E 1 ,…,E n if all constituents of E 1 ,…,E n are of the III
type;
• logically semidependent from E 1 ,…,E n otherwise.
If E is logically dependent from E₁, …, Eₙ, then we can write

E = Σ_{Q ⊂ E} Q,

where the sum ranges over the constituents of I type.
Example 1.2.3 Let us consider two events E 1 , E 2 . The logical sum (E 1 ∨ E 2 ) can
be written as
E₁ ∨ E₂ = E₁E₂ + Ẽ₁E₂ + E₁Ẽ₂.
Example 1.2.4 Let us throw a coin five times. Let Eᵢ be the event that we get head
at the ith throw, i.e. Eᵢ = 1. Set Y = E₁ + E₂ + E₃ + E₄ + E₅ (Y is the total number
of heads in the five throws) and consider the event
E = (Y ≥ 3).
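The classification of constituents can be made concrete by enumeration. A minimal sketch using the five coin throws of Example 1.2.4: each constituent is a 0/1 assignment to E₁, …, E₅, and since E = (Y ≥ 3) is logically dependent from them, every constituent is of I or II type:

```python
from itertools import product

n = 5
# a constituent of E_1, ..., E_5 assigns 0 or 1 to each event
constituents = list(product([0, 1], repeat=n))  # 2**5 = 32 constituents

def event_E(q):
    # the event E = (Y >= 3), with Y = E_1 + ... + E_5
    return sum(q) >= 3

# E is logically dependent from E_1, ..., E_5 (Y is a function of them),
# so every constituent is of I type (contained in E) or of II type
types = ["I" if event_E(q) else "II" for q in constituents]
first_type = types.count("I")  # number of constituents contained in E
```

The count of first-type constituents is C(5,3) + C(5,4) + C(5,5) = 16, half of the 32 constituents, as symmetry suggests.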
1.3 Expectation

Given a random number X, we look for a non-random number that expresses our
evaluation of X. We call this quantity the expectation of X. In economic terms, we can
think of the expectation of X as a non-random gain that we judge equivalent to X.
Following de Finetti [1], the expectation P(X) assigned to the random number X
can be defined in an operative way. Two equivalent operative definitions can be used:
1. Bet method: we think of X as a random gain (or loss, if it is negative). We have
to choose a value x̄ = P(X) (non-random) that we judge equivalent to X.
After this choice is made, we must accept any bet with gain (or loss) given by

λ(X − x̄),

where λ is an arbitrary constant (positive or negative).
λ(X − x̄) < 0,

i.e. a certain loss. It follows that these choices are not coherent according to the
first criterium. Similarly, coherence according to the second criterium requires

inf I(X) ≤ x̄̄ ≤ sup I(X):

if x̄̄ < inf I(X) or x̄̄ > sup I(X), then (X − inf I(X))² < (X − x̄̄)² or
(X − sup I(X))² < (X − x̄̄)² respectively, so that the penalty would be larger than
necessary with certainty. In this case these choices are not coherent according to the
second criterium.
c₁ = c₂ = −c₃

(so that the random part of G cancels), then the total gain is G =
c₃(x̄ + ȳ − z̄). If x̄ + ȳ − z̄ ≠ 0, one can choose c₃ so that G < 0. In this
case the choice is not coherent according to the first criterium. On the other side,
if we follow the penalty method, we will pay a penalty proportional to
Remark 1.3.2 For unbounded random numbers X (for which inf I(X) = −∞, or
sup I(X) = ∞, or both) an evaluation of P(X) is not necessarily finite and may not
even exist. We refer to [1] for a discussion of the definition of the expectation for
unbounded random numbers.
1.4 Probability of Events

If E is an event, i.e. a random number such that I(E) ⊂ {0, 1}, then its expectation
P(E) is also called the probability of E. From monotonicity it follows that:
1. the probability of an event E is a number between 0 and 1, 0 ≤ P(E) ≤ 1;
2. E ≡ 0 =⇒ P(E) = 0;
3. E ≡ 1 =⇒ P(E) = 1.
When E ≡ 1, E is called certain event. If E ≡ 0, E is called impossible event.
Furthermore, for any given incompatible events E₁, E₂ (i.e. with E₁E₂ ≡ 0), E₁ + E₂
is an event and

P(E₁ + E₂) = P(E₁) + P(E₂).

More generally, if the events E₁, …, Eₙ form a partition, then

Σᵢ₌₁ⁿ P(Eᵢ) = 1.
The function that assigns to the events of a partition their probabilities is called
probability distribution of the partition. If E is logically dependent from the events
{E 1 , . . . , E n } of a partition, then we can express the probability of E in terms of the
probabilities of E 1 , . . . , E n . Indeed we have
E = Σ_{Eᵢ ⊂ E} Eᵢ,

so that

P(E) = Σ_{Eᵢ ⊂ E} P(Eᵢ).
Let us now compute the expectation of a random number X with a finite number of
possible values I (X ) = {x1 , . . . , xn } in terms of the probabilities of events E i :=
(X = xi ). We use the convention that some proposition within brackets represents a
quantity which is 1 when the proposition is true and 0 when it is false. We have:
P(X) = Σᵢ₌₁ⁿ xᵢ P(X = xᵢ). (1.10)

Indeed

P(X) = P(X(E₁ + ⋯ + Eₙ)) = P(XE₁) + ⋯ + P(XEₙ) = Σᵢ₌₁ⁿ P(XEᵢ) = Σᵢ₌₁ⁿ P(xᵢEᵢ) = Σᵢ₌₁ⁿ xᵢP(Eᵢ) = Σᵢ₌₁ⁿ xᵢP(X = xᵢ),
where we have used the fact that X E i is a random number that is equal to xi when
E i = 1 and to 0 when E i = 0, i.e. X E i = xi E i .
More generally, for a function φ of X,

P(φ(X)) = Σᵢ₌₁ⁿ φ(xᵢ) P(X = xᵢ). (1.11)

The proof is completely analogous to that of (1.10), which deals with the particular
case φ(x) = x.
Example 1.4.1 Let X be a random number representing the result of throwing a
symmetric die with faces numbered from 1 to 6. By symmetry it is natural to assign
the same probability (which must be 1/6) to all possible values. In this case:

P(X) = (1/6) Σᵢ₌₁⁶ i = (6 · 7)/(6 · 2) = 7/2.

Note that in this case the expectation does not coincide with any of the possible
values of X.
Example 1.4.2 Let us throw a symmetric coin. Let X = 1 if the result is head and
X = 0 if we obtain tail. Also in this case by symmetry it is natural to assign the same
probability (which must be equal to 1/2) to both values. In this case

P(X) = (1/2) · 0 + (1/2) · 1 = 1/2.
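Formulas (1.10) and (1.11) amount to a weighted sum over the distribution. A small sketch reproducing Example 1.4.1 with exact rational arithmetic:

```python
from fractions import Fraction

# distribution of a symmetric die: P(X = i) = 1/6 for i = 1, ..., 6
dist = {i: Fraction(1, 6) for i in range(1, 7)}

def expectation(dist, phi=lambda x: x):
    # formulas (1.10)/(1.11): P(phi(X)) = sum_i phi(x_i) P(X = x_i)
    return sum(phi(x) * p for x, p in dist.items())

mean = expectation(dist)                            # 7/2, as in Example 1.4.1
second_moment = expectation(dist, lambda x: x * x)  # P(X^2) via (1.11)
```

With `phi` the identity this is exactly (1.10); any other `phi` gives (1.11), e.g. the second moment P(X²) = 91/6 of the die.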
1.5 Uniform Distribution on Partitions

In some situations, for reasons of symmetry, it is natural to assign the same probability
to all events of a partition. This is the case of games of chance. If the events
E₁, …, Eₙ are assigned the same probability, we say that the partition has the
uniform distribution. Since the probabilities of a partition add up to 1, we then have

P(Eᵢ) = 1/n.

Let E be an event which is logically dependent from the partition E₁, …, Eₙ; then the
probability of E is given by:

P(E) = P(Σ_{Eᵢ ⊂ E} Eᵢ) = |{i | Eᵢ ⊂ E}| / n.

This formula is commonly expressed by saying that the probability is given by the
number of favorable cases (i.e. the events Eᵢ contained in E) divided by the
number of possible cases (i.e. the total number of the Eᵢ):

P(E) = favorable cases / possible cases. (1.12)

This identity is valid only if the events of the partition are judged equiprobable.
Example 1.5.1 A symmetric coin is thrown n times. Let X be the random number
that counts the number of heads in the n throws and let Eᵢ be the event that the ith
throw gives head. We consider the event

E := (X = k) = Σ_{Q ⊂ E} Q,

where the sum ranges over the constituents of E₁, …, Eₙ contained in E. The 2ⁿ
constituents are equiprobable and those contained in E are as many as the ways of
choosing the k throws that give head, i.e. the binomial coefficient C(n, k) = n!/(k!(n − k)!).
Hence

P(E) = C(n, k)/2ⁿ.

It follows from the properties of binomial coefficients that when n is even, the largest
value for P(E) is obtained for k = n/2. If n is odd, the largest value for P(E) is obtained
for k = (n − 1)/2 and k = (n + 1)/2.
Let us now consider the same problem, in the case when the drawings are made
without replacement. In this case n must be less than or equal to N , as we cannot
perform more than N drawings without replacement. Also X has some extra con-
straints, as the number X of the extracted white balls must be less than or equal to H
and the number n − X of extracted black balls must be less than or equal to N − H .
Therefore
I(X) = {0 ∨ (n − (N − H)), …, n ∧ H}.

In this case the possible cases are represented by all possible sets of extracted balls.
An event corresponds to a set of extracted balls. The number of possible cases is then
the binomial coefficient C(N, n).
Also here by symmetry it is natural to assign the same probability to all events. If
we do so, we can apply formula (1.12) and get

P(X = k) = C(H, k) C(N − H, n − k) / C(N, n).
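The hypergeometric formula above can be cross-checked against a direct count of favorable and possible cases, in the spirit of (1.12). A sketch with a small urn chosen for illustration (N = 6, H = 3, n = 2):

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, H, n = 6, 3, 2  # small illustrative urn: 3 white balls, 3 black balls
balls = range(N)   # balls 0..H-1 are the white ones

def p_formula(k):
    # P(X = k) = C(H, k) C(N-H, n-k) / C(N, n)
    return Fraction(comb(H, k) * comb(N - H, n - k), comb(N, n))

def p_enumeration(k):
    # possible cases: all n-subsets of the urn, judged equiprobable
    samples = list(combinations(balls, n))
    favorable = [s for s in samples if sum(b < H for b in s) == k]
    return Fraction(len(favorable), len(samples))
```

The two functions agree for every k in I(X), and the probabilities add up to 1.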
1.6 Conditional Probability and Expectation

Let X be a random number and H a possible event. The conditional expectation of X
given H can be defined operatively, as in Sect. 1.3.
1. Bet method: we have to choose a value x̄ with the condition that we accept any
bet with gain

G = cH(X − x̄),

where c is a constant (positive or negative). Note that the gain is null when H does
not take place: the bet is called off in that case. The chosen value is then our evaluation
of the conditional expectation of X given H, denoted by P(X|H).
2. Penalty method: here we have to choose a value x̄̄ with the condition that we
accept to pay a penalty

P = λH(X − x̄̄)²,

where λ is a positive constant. Note that the penalty is null when the event H
does not take place, similarly to the definition based on bets. According to this
definition, x̄̄ is our evaluation of the conditional expectation P(X|H) of X.
It can be shown, as in the case of ordinary expectation, that the two definitions are
equivalent.
In the particular case when we consider an event E, we speak about the conditional
probability P(E|H) of E given H.
1.7 Formula of Composite Expectation and Probability

Let x = P(H), y = P(X|H) and z = P(XH), and consider the combined bets with
total gain

G = c₁(H − x) + c₂H(X − y) + c₃(XH − z) = H(c₁ + (c₂ + c₃)X − c₂y) − c₁x − c₃z,

where c₁, c₂ and c₃ are arbitrary constants. As in previous cases, let us fix c₁, c₂ and
c₃ in such a way that the random part of G cancels: c₂ = −c₃ and c₁ = c₂y. Then
G = −c₁x − c₃z = c₂(z − xy).

If z ≠ xy, the sign of c₂ can be chosen so that G < 0; coherence therefore requires
z = xy, i.e.

P(XH) = P(X|H)P(H).

Analogously this equality follows by using the definition based on penalties. If P(H) >
0, then

P(X|H) = P(XH)/P(H).
Note that the conditional probability P(E|H) = P(EH)/P(H) has a logical meaning,
as EH is the logical product of E and H, i.e. the event that both E and H take place.
In particular:
1. E ⊂ H ⇒ P(E|H) = P(E)/P(H);
2. H ⊂ E, which means I(E|H) = {1} ⇒ P(E|H) = 1;
3. H ⊂ Ẽ, which means I(E|H) = {0} ⇒ P(E|H) = 0.
1.8 Formula of Total Expectation and Total Probability

Let H₁, …, Hₙ be a partition of possible events. Then

P(X) = Σᵢ₌₁ⁿ P(X|Hᵢ)P(Hᵢ). (1.14)

We call (1.14) the formula of total expectation. If X is also an event, (1.14) is said to
be the formula of total probability. Indeed, since H₁ + ⋯ + Hₙ = 1, the formula of
composite expectation gives

P(X) = Σᵢ₌₁ⁿ P(XHᵢ) = Σᵢ₌₁ⁿ P(X|Hᵢ)P(Hᵢ).
1.9 Bayes Formula

Let E, H be events with P(H) > 0 and P(E) > 0. By applying twice the formula of
composite probability we obtain Bayes' formula:

P(H|E) = P(E|H)P(H)/P(E).
Example 1.9.1 Consider an urn containing N identical balls, of which some are white
and some are black. Let Y be the random number of white balls present in the
urn (the composition of the urn is unknown).
The events Hi = (Y = i), for i = 0, . . . , N form a partition. Let E be the event
that we obtain a white ball in a drawing from the urn. Using the formula of total
probability (1.14) we obtain:
P(E) = Σᵢ₌₀ᴺ P(E|Hᵢ)P(Hᵢ) = Σᵢ₌₀ᴺ (i/N) P(Hᵢ).

Indeed if the composition of the urn is known, i.e. if we condition with respect to Hᵢ
for some i, we can apply the usual symmetry considerations and get P(E|Hᵢ) = i/N.
If we assign to the partition H₀, …, H_N the uniform distribution

P(Hᵢ) = 1/(N + 1), i = 0, …, N,

we get

P(E) = Σᵢ₌₀ᴺ i/(N(N + 1)) = 1/2.
We now evaluate the probability that the urn contains i white balls given that we have
extracted a white ball. This question is answered by Bayes' formula:

P(Hᵢ|E) = P(E|Hᵢ)P(Hᵢ)/P(E) = [(i/N) · 1/(N + 1)] / (1/2) = 2i/(N(N + 1)).

We see that the distribution on the partition conditional on the event that a white ball is
drawn is no longer uniform, but gives higher probabilities to compositions with a
large number of white balls.
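Example 1.9.1 can be reproduced numerically. The sketch below (with the illustrative choice N = 10) applies the formula of total probability and then Bayes' formula, using exact rational arithmetic:

```python
from fractions import Fraction

N = 10                                                   # illustrative urn size
prior = {i: Fraction(1, N + 1) for i in range(N + 1)}    # uniform on H_0, ..., H_N
likelihood = {i: Fraction(i, N) for i in range(N + 1)}   # P(E | H_i) = i/N

# formula of total probability: P(E) = sum_i P(E | H_i) P(H_i)
p_E = sum(likelihood[i] * prior[i] for i in prior)

# Bayes' formula: P(H_i | E) = P(E | H_i) P(H_i) / P(E)
posterior = {i: likelihood[i] * prior[i] / p_E for i in prior}
```

As in the text, `p_E` equals 1/2 and the posterior is 2i/(N(N + 1)), which grows linearly with the number of white balls.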
1.10 Correlation Between Events

Example 1.10.1 We consider an urn with H white balls and N − H black balls. We
perform two drawings. Let Eᵢ be the event that a white ball is extracted at the ith
extraction, i = 1, 2. For drawings with replacement we have
P(E₁) = P(E₂) = H/N.

Indeed the urn composition in the two drawings is the same. In this case E₁ and E₂
are non-correlated, as by (1.12)

P(E₁E₂) = H²/N² = P(E₁)P(E₂).
Let us now consider the case of drawings without replacement. We use again formula
(1.12) to compute probabilities and conditional probabilities. We have P(E₁) = H/N
and, by the formula of total probability (1.14) applied to the event E₂ and the partition
E₁, Ẽ₁,

P(E₂) = P(E₂|E₁)P(E₁) + P(E₂|Ẽ₁)P(Ẽ₁) = (H − 1)/(N − 1) · H/N + H/(N − 1) · (N − H)/N = H/N.

Here P(E₁) and P(E₂) are both equal to H/N, and E₁, E₂ are negatively correlated, as

P(E₂|E₁) = (H − 1)/(N − 1) < H/N = P(E₂)

if 0 < H < N.
When P(E₁) > 0 and P(E₂) > 0 this definition coincides with non-correlation.
When one or both of E₁ and E₂ have probability 0, then E₁ and E₂ are stochastically
independent, as in this case P(E₁)P(E₂) = 0 and P(E₁E₂) ≤ min(P(E₁), P(E₂)) = 0.
We remark that in general n events are not stochastically independent if the events
are only pairwise stochastically independent.
We shall see that if the events E 1 , . . . , E n are stochastically independent, then
the events E 1∗ , . . . , E n∗ are stochastically independent for every possible choice of
E i∗ between E i and Ẽ i , for i = 1, . . . , n.
Definition 1.10.3 Let H = {H₁, …, Hₙ} be a partition. The events E₁, E₂ are said
to be stochastically independent conditionally to the partition H if

P(E₁E₂|Hᵢ) = P(E₁|Hᵢ)P(E₂|Hᵢ), i = 1, …, n.

As an example, consider again the urn of Example 1.9.1, with unknown composition,
and perform two drawings with replacement. Let Eᵢ be the event that a white ball is
drawn at the ith drawing and let

Hᵢ = (Y = i), i = 0, …, N.

It is easy to see that the events E₁ and E₂ are stochastically independent conditionally
to the partition H. We want to see whether E₁ and E₂ are stochastically independent,
assuming that we assign the uniform distribution to H, i.e. P(Hᵢ) = 1/(N + 1) for i =
0, 1, …, N. We compute:
1. the probability of the first drawing:

P(E₁) = Σᵢ₌₀ᴺ P(E₁|Hᵢ)P(Hᵢ) = 1/(N + 1) Σᵢ₌₀ᴺ i/N = 1/(N + 1) · N(N + 1)/(2N) = 1/2;
2. the probability of the second drawing:

P(E₂) = P(E₁) = 1/2;

3. the probability that we draw a white ball in both drawings:

P(E₁E₂) = Σᵢ₌₀ᴺ P(E₁E₂|Hᵢ)P(Hᵢ) = 1/(N + 1) Σᵢ₌₀ᴺ P(E₁|Hᵢ)P(E₂|Hᵢ) = 1/(N + 1) Σᵢ₌₀ᴺ i²/N².
Using the identity (i + 1)³ − i³ = 3i² + 3i + 1, we have

Σᵢ₌₀ᴺ i² = Σᵢ₌₀ᴺ [(i + 1)³ − i³]/3 − Σᵢ₌₀ᴺ i − (N + 1)/3 = (N + 1)³/3 − N(N + 1)/2 − (N + 1)/3,

and

P(E₁E₂) = (N + 1)²/(3N²) − 1/(2N) − 1/(3N²).

For N → +∞, P(E₁E₂) tends to 1/3, while P(E₁)P(E₂) = 1/4. Therefore, at least for
large N, E₁ and E₂ are positively correlated. This shows that stochastic independence
conditionally to a partition does not imply stochastic independence.
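The computation above can be verified exactly. The sketch below evaluates P(E₁) and P(E₁E₂) for a given N and confirms the positive correlation:

```python
from fractions import Fraction

def p_single(N):
    # P(E1) = sum_i P(E1 | H_i) P(H_i), with uniform P(H_i) = 1/(N+1)
    return sum(Fraction(i, N) for i in range(N + 1)) / (N + 1)

def p_joint(N):
    # P(E1 E2) = sum_i P(E1 | H_i) P(E2 | H_i) P(H_i), using
    # stochastic independence conditionally to the partition
    return sum(Fraction(i, N) ** 2 for i in range(N + 1)) / (N + 1)

N = 1000
```

For every N one finds P(E₁) = 1/2 and P(E₁E₂) = (2N + 1)/(6N) > 1/4 = P(E₁)P(E₂), so the drawings are positively correlated even though they are conditionally independent.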
For example, the constituent Q = Ẽ₁E₂E₃ can be expanded as a polynomial in E₁, E₂, E₃:

Q = Ẽ₁E₂E₃ = (1 − E₁)E₂E₃ = E₂E₃ − E₁E₂E₃.

In general, expanding a constituent Q = E₁* ⋯ Eₙ* as a polynomial φ(E₁, …, Eₙ) and
using stochastic independence, we get

P(Q) = P(φ(E₁, …, Eₙ)) = φ(P(E₁), …, P(Eₙ)) = P(E₁*) ⋯ P(Eₙ*),

where the last equality is obtained by collecting terms in φ and using that P(Ẽᵢ) =
1 − P(Eᵢ). In the example Q = Ẽ₁E₂E₃ we have

P(Q) = P(Ẽ₁E₂E₃) = P(E₂E₃ − E₁E₂E₃) = P(E₂)P(E₃) − P(E₁)P(E₂)P(E₃) = φ(P(E₁), P(E₂), P(E₃)) = (1 − P(E₁))P(E₂)P(E₃) = P(Ẽ₁)P(E₂)P(E₃).
⇐) We assume that (1.16) holds for all constituents of the events E₁, …, Eₙ. Let
{i₁, …, iₖ} ⊂ {1, …, n} and {j₁, …, j_{n−k}} = {1, …, n} \ {i₁, …, iₖ}. Then

P(E_{i₁} ⋯ E_{iₖ}) = P(Σ_{Q ⊂ E_{i₁}⋯E_{iₖ}} Q) = P(E_{i₁}) ⋯ P(E_{iₖ}) Σ P(E_{j₁}*) ⋯ P(E_{j_{n−k}}*),

where the sum ranges over all possible choices of E_{jₗ}* for l = 1, …, n − k. By
collecting terms we get:

P(E_{i₁} ⋯ E_{iₖ}) = P(E_{i₁}) ⋯ P(E_{iₖ}).
1.12 Covariance and Variance

Given two random numbers X and Y, the covariance between X and Y is defined by

cov(X, Y) = P((X − P(X))(Y − P(Y))) = P(XY) − P(X)P(Y). (1.18)

The variance of X is

σ²(X) = cov(X, X).

Other notations for the variance of X are var(X) and D(X). From the two expressions
for the covariance we get two expressions for the variance: σ²(X) = P(X²) − P(X)²
and σ²(X) = P((X − P(X))²). From the second expression we see that

σ²(X) ≥ 0.

The standard deviation of X is

σ(X) = √(σ²(X)) = P_Q(X − P(X)),

where P_Q(Z) = √(P(Z²)) denotes the quadratic prevision of a random number Z.
σ 2 (a X + b) = a 2 σ 2 (X ). (1.19)
2. Again from the definition of covariance and the linearity of the expectation we
have:

σ²(X₁ + ⋯ + Xₙ) = cov(X₁ + ⋯ + Xₙ, X₁ + ⋯ + Xₙ) = Σᵢ₌₁ⁿ σ²(Xᵢ) + Σ_{i≠j} cov(Xᵢ, Xⱼ).
1.13 Correlation Coefficient

Definition 1.13.1 For random numbers X, Y with σ(X) > 0, σ(Y) > 0 the correlation
coefficient of X and Y is defined by

ρ(X, Y) = cov(X, Y)/(σ(X)σ(Y)).

The correlation coefficient has the following properties:
1. For a, c ≠ 0,

ρ(aX + b, cY + d) = cov(aX + b, cY + d)/√(σ²(aX + b)σ²(cY + d)) = ac cov(X, Y)/(|ac| √(σ²(X)σ²(Y))) = sgn(ac) ρ(X, Y).
2. −1 ≤ ρ(X, Y) ≤ 1.
Let

X* = (X − P(X))/σ(X),  Y* = (Y − P(Y))/σ(Y).

These are the so-called standardized random numbers: they are obtained from
X, Y by means of suitable linear transformations such that P(X*) = 0, P(Y*) = 0
and σ²(X*) = 1, σ²(Y*) = 1, using linearity of the expectation and (1.19).
By (1.18) we get

cov(X*, Y*) = P(X*Y*) = cov(X, Y)/(σ(X)σ(Y)) = ρ(X, Y).
From

0 ≤ σ²(X* + Y*) = σ²(X*) + σ²(Y*) + 2cov(X*, Y*) = 2 + 2ρ(X, Y)

we get ρ(X, Y) ≥ −1, and from

0 ≤ σ²(X* − Y*) = σ²(X*) + σ²(−Y*) + 2cov(X*, −Y*) = 2 − 2ρ(X, Y)

we get ρ(X, Y) ≤ 1.
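Properties 1 and 2 of the correlation coefficient can be checked on a small example; the joint distribution below is an arbitrary illustration, not taken from the text:

```python
import math

# an arbitrary joint distribution of two random numbers (X, Y), for illustration
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}

def E(f):
    # expectation of f(X, Y) under the joint distribution
    return sum(f(x, y) * p for (x, y), p in joint.items())

def rho(a=1, b=0, c=1, d=0):
    # correlation coefficient of (aX + b, cY + d)
    u = lambda x, y: a * x + b
    v = lambda x, y: c * y + d
    cov = E(lambda x, y: u(x, y) * v(x, y)) - E(u) * E(v)
    sigma_u = math.sqrt(E(lambda x, y: u(x, y) ** 2) - E(u) ** 2)
    sigma_v = math.sqrt(E(lambda x, y: v(x, y) ** 2) - E(v) ** 2)
    return cov / (sigma_u * sigma_v)
```

One finds that ρ lies in [−1, 1], is unchanged by affine maps with ac > 0, and flips sign when ac < 0, in agreement with property 1.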
1.14 Chebychev's Inequality

Chebychev's inequality allows one to estimate the probability that a random number
takes values far from its expectation. It can be formulated in two ways:
1. Let X be a random number with P_Q(X) > 0, where P_Q(X) = √(P(X²)). For every t > 0,

P(|X| ≥ t P_Q(X)) ≤ 1/t².

2. Let X be a random number with expectation m = P(X) and σ(X) > 0. For every t > 0,

P(|X − m| ≥ σ(X)t) ≤ 1/t².

Proof 1. Let E be the event E = (|X| ≥ t P_Q(X)). We compute P(X²) using the
formula of total expectation with respect to the partition E, Ẽ:

P(X²) = P(X²|E)P(E) + P(X²|Ẽ)P(Ẽ) ≥ t² P_Q(X)² P(E),

so that P(E) ≤ P(X²)/(t² P_Q(X)²) = 1/t².
1.15 Weak Law of Large Numbers

Let (Xᵢ)ᵢ₌₁,₂,… be a sequence of pairwise uncorrelated random numbers with the same
expectation m and the same variance σ², and set Sₙ = X₁ + ⋯ + Xₙ. The weak law of
large numbers states that for every λ > 0, P(|Sₙ/n − m| ≥ λ) → 0 as n → ∞.

Proof The proof is based on the second form of Chebychev's inequality. First we
compute the expectation of Sₙ/n:

P(Sₙ/n) = (1/n)(P(X₁) + ⋯ + P(Xₙ)) = m,

and its variance

σ²(Sₙ/n) = (1/n²)(σ²(X₁) + ⋯ + σ²(Xₙ)) = σ²/n,

where we have used Proposition 1.12.2 and the fact that the random numbers of the
sequence are pairwise uncorrelated. From the second form of Chebychev's inequality
we get

P(|Sₙ/n − m| ≥ (σ/√n)t) ≤ 1/t².

Putting λ = (σ/√n)t, we obtain 1/t² = σ²/(nλ²). Therefore

P(|Sₙ/n − m| ≥ λ) ≤ σ²/(nλ²),

which tends to 0 as n → ∞ for every fixed λ > 0.
Example 1.15.2 In particular one can apply the weak law of large numbers to the
case of a sequence of uncorrelated events (Eᵢ)ᵢ₌₁,₂,… with the same probability
P(Eᵢ) = p. Note that for an event Eᵢ,

σ²(Eᵢ) = P(Eᵢ²) − P(Eᵢ)² = p − p² = p(1 − p),

so the Eᵢ's automatically have the same variance. Hence for all λ > 0 we have

P(|Sₙ/n − p| ≥ λ) → 0

for n → ∞.
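The weak law can be illustrated by simulation. A sketch with the hypothetical choices p = 0.5 and λ = 0.05 (these parameters are ours, chosen for illustration); the empirical frequencies of large deviations stay below the Chebychev bound σ²/(nλ²):

```python
import random

random.seed(0)      # fixed seed so the check is reproducible
p, lam = 0.5, 0.05  # hypothetical success probability and tolerance

def freq_deviation(n, trials=1000):
    # empirical estimate of P(|S_n/n - p| >= lambda)
    count = 0
    for _ in range(trials):
        s = sum(random.random() < p for _ in range(n))
        count += abs(s / n - p) >= lam
    return count / trials

def bound(n):
    # Chebychev bound sigma^2 / (n lambda^2), with sigma^2 = p(1 - p)
    return p * (1 - p) / (n * lam ** 2)

d200, d1000 = freq_deviation(200), freq_deviation(1000)
```

The deviation frequency for n = 1000 is much smaller than for n = 200, and both lie below their respective bounds, as the proof predicts.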
Chapter 2
Discrete Distributions
2.2 Bernoulli Scheme

A simple and useful model from which some discrete distributions can be derived is
the Bernoulli scheme. It can be thought of as a potentially infinite sequence of trials,
each of them with two possible outcomes called success and failure. Each trial is
performed in the same known conditions and we assume that there is no influence
between different trials. Formally a Bernoulli scheme with parameter p, 0 < p < 1,
is a sequence E 1 , E 2 , . . . of stochastically independent equiprobable events with
P(E 1 ) = p.
Example 2.2.1 A concrete example for which one can use as a model a Bernoulli
scheme with p = 21 is a sequence of throws of a symmetric coin, where E i is the
event that one gets head at the ith throw.
2.3 Binomial Distribution

Given a Bernoulli scheme (Eᵢ)ᵢ∈ℕ with P(Eᵢ) = p, let Sₙ be the random number of
successes in the first n trials. Sₙ can be written as

Sₙ = E₁ + ⋯ + Eₙ.
We must determine the probability of a constituent of I type with respect to the event
(Sₙ = k). An example of such a constituent is

Q = E₁ ⋯ Eₖ Ẽₖ₊₁ ⋯ Ẽₙ, (2.1)

that is, the event that k successes are obtained in the first k trials, whereas the remaining
n − k trials yield failures.
Analogously, any other constituent of I type will be a product of the same kind as in
(2.1). Since the events are stochastically independent, in force of Proposition 1.11.1
every constituent Q of I type has the same probability, given by

P(Q) = p ⋯ p · (1 − p) ⋯ (1 − p) = pᵏ(1 − p)ⁿ⁻ᵏ,

with k factors p and n − k factors (1 − p).
Since the constituents of I type are as many as the ways of choosing the k successful
trials among the n, i.e. the binomial coefficient C(n, k), we obtain

P(Sₙ = k) = C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ, k = 0, …, n.

Sₙ is said to have binomial distribution with parameters n and p. By Newton's binomial
formula the probabilities add up to 1:

1 = (p + 1 − p)ⁿ = Σₖ₌₀ⁿ C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ.

The simplest way to compute the expectation of Sₙ is through the linearity of expectation:

P(Sₙ) = P(E₁ + ⋯ + Eₙ) = Σᵢ₌₁ⁿ P(Eᵢ) = np.
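A quick check of the binomial distribution (probabilities summing to 1 and expectation np), using exact rational arithmetic with the illustrative parameters n = 10, p = 1/3:

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 3)  # illustrative parameters

def binom_pmf(k):
    # P(S_n = k) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

probs = [binom_pmf(k) for k in range(n + 1)]
total = sum(probs)                              # Newton's binomial formula: 1
mean = sum(k * q for k, q in enumerate(probs))  # expectation: np
```

With `Fraction` both identities hold exactly, not merely up to rounding.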
Example 2.3.1 Consider an urn containing N identical balls, of which H are white
and N − H are black. We perform a sequence of n drawings with replacement.
It is easy to check that by symmetry the sequence of events (Eᵢ)ᵢ₌₁,₂,…, where
Eᵢ = (a white ball is drawn at the ith drawing), makes up a Bernoulli scheme with
parameter p = H/N. Indeed for 1 ≤ i₁ < i₂ < … < iₖ,

P(E_{i₁} ⋯ E_{iₖ}) = Hᵏ/Nᵏ = (H/N)ᵏ,

where the possible cases correspond to the Nᵏ sequences of balls that may be drawn in
the drawings i₁, …, iₖ, whereas the favorable cases correspond to the Hᵏ sequences
where white balls are drawn.
2.4 Geometric Distribution

Given a Bernoulli scheme (Eᵢ)ᵢ∈ℕ with parameter p, let T be the random number of
trials needed to obtain the first success. Then

I(T) = (ℕ \ {0}) ∪ {∞}.

It is easy to see that P(T = ∞) = 0, since for all n > 0, (T = ∞) ⊆ Ẽ₁ ⋯ Ẽₙ,
so that P(T = ∞) ≤ P(Ẽ₁ ⋯ Ẽₙ) = (1 − p)ⁿ for every n. Let us compute the
probability distribution of T for finite values:

P(T = i) = P(Ẽ₁ ⋯ Ẽᵢ₋₁Eᵢ) = P(Ẽ₁) ⋯ P(Ẽᵢ₋₁)P(Eᵢ) = (1 − p)ⁱ⁻¹p.

T is said to have geometric distribution with parameter p. Using the formula for the
sum of the geometric series (see Appendix G.1), one verifies that

Σᵢ₌₁^∞ P(T = i) = Σᵢ₌₁^∞ (1 − p)ⁱ⁻¹p = p Σₖ₌₀^∞ (1 − p)ᵏ = p · 1/(1 − (1 − p)) = 1.
The expectation of T is

P(T) = Σᵢ₌₁^∞ i P(T = i) = Σᵢ₌₁^∞ i(1 − p)ⁱ⁻¹p = p Σᵢ₌₁^∞ i(1 − p)ⁱ⁻¹ = p/p² = 1/p,

where we have used that, for |x| < 1,

Σᵢ₌₁^∞ i xⁱ⁻¹ = Σᵢ₌₁^∞ d/dx [xⁱ] = d/dx Σᵢ₌₀^∞ xⁱ = d/dx [1/(1 − x)] = 1/(1 − x)².
The geometric distribution has the property of lack of memory: the conditional
probability of no success up to and including the (m + n)th trial, given that there was
no success up to and including the nth trial, is equal to the probability of no success
up to and including the mth trial; everything starts from scratch. Indeed, since
P(T > k) = (1 − p)ᵏ, we have

P(T > m + n | T > n) = (1 − p)^{m+n}/(1 − p)ⁿ = (1 − p)ᵐ = P(T > m).
2.5 Poisson Distribution

A random number X with I(X) = ℕ is said to have Poisson distribution with
parameter λ > 0 if

P(X = i) = e^{−λ} λⁱ/i!, i ∈ ℕ.

The probabilities add up to 1:

Σᵢ₌₀^∞ P(X = i) = Σᵢ₌₀^∞ e^{−λ} λⁱ/i! = e^{−λ} Σᵢ₌₀^∞ λⁱ/i! = e^{−λ}e^{λ} = 1.
In order to compute the expectation, we use the extension of the formula for random
numbers with a finite number of possible values to the case of an enumerable set of
possible values, as we did for the geometric distribution and as we will do in similar
cases (provided that the series converges). We obtain

P(X) = Σᵢ₌₀^∞ i P(X = i) = Σᵢ₌₀^∞ i e^{−λ} λⁱ/i! = λe^{−λ} Σᵢ₌₁^∞ λⁱ⁻¹/(i − 1)! = λe^{−λ} Σₖ₌₀^∞ λᵏ/k! = λe^{−λ}e^{λ} = λ.
Consider an urn containing N balls of which H are white and N − H black, where
0 < H < N . We perform n drawings without replacement from the urn with n ≤ N .
Let X be the random number that counts the number of white balls in the sample
that we draw.
Since we perform drawings without replacement, X is less than or equal to H and
n − X , the number of black balls in the sample, is less than or equal to N − H . From
this it follows that the set of possible values of X is given by
$$I(X) = \{\,0 \vee (n - (N - H)), \ldots, n \wedge H\,\}\,.$$
A sample with $i$ white balls contains $n - i$ black balls. The number of favorable cases that correspond to such samples is therefore given by:

$$\text{favorable cases} = \binom{H}{i}\binom{N-H}{n-i}.$$
$$X = \sum_{i=1}^{n} E_i\,,$$

where $E_i$ is the event that a white ball is chosen at the $i$th drawing. Therefore, by the linearity of the expectation,

$$P(X) = \sum_{i=1}^{n} P(E_i)\,.$$
In the evaluation of $P(E_i)$ we can still use symmetry by interchange of balls, but when defining possible cases we must take into account the order, since the event depends on the order of the drawings. Possible cases correspond to sequences of length $n$ of distinct elements from a set of $N$ elements. Their number is $D_n^N = (N)_n = N(N-1)\cdots(N-n+1)$. Favorable cases correspond to those sequences that have a white ball at the $i$th place. This ball can be chosen in $H$ ways. The remaining balls form a sequence of length $n-1$ of distinct elements from a set of $N-1$ elements. Therefore

$$P(E_i) = \frac{\text{favorable cases}}{\text{possible cases}} = \frac{H\,D_{n-1}^{N-1}}{D_n^N} = \frac{H}{N}$$
and

$$P(X) = n\,\frac{H}{N}\,.$$
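The identity $P(X) = nH/N$ can be checked by simulation. A sketch (urn size, composition, sample size and seed are our own illustrative choices):

```python
import random

N, H, n = 20, 8, 5          # N balls, H white, n drawings without replacement
rng = random.Random(42)
urn = [1] * H + [0] * (N - H)   # 1 = white ball, 0 = black ball

trials = 50_000
# rng.sample draws n distinct balls, i.e. drawings without replacement
mean_white = sum(sum(rng.sample(urn, n)) for _ in range(trials)) / trials
exact = n * H / N               # P(X) = n H / N
```

With these values `exact` is 2.0, and `mean_white` agrees with it up to Monte Carlo error.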
$$\begin{array}{ccc}
E_1^{(1)}, & \ldots, & E_r^{(1)} \\
E_1^{(2)}, & \ldots, & E_r^{(2)} \\
\vdots & & \vdots \\
E_1^{(n)}, & \ldots, & E_r^{(n)},
\end{array}$$
where the events belonging to the same column are equiprobable, whereas the events
of each row constitute stochastically independent partitions.
Starting from a generalized Bernoulli scheme, as defined in Sect. 2.2, we can now
define the multinomial distribution in the same way as the binomial distribution can
be defined starting from an ordinary Bernoulli scheme. Given n > 0, let us consider
the random numbers Y1 , . . . , Yr defined by
$$Y_l = \sum_{i=1}^{n} E_l^{(i)}\,, \qquad l = 1, \ldots, r\,.$$
In the array of the previous section, the Yl ’s are obtained by adding up the events
along the columns. We have
$$\sum_{l=1}^{r} Y_l = \sum_{l=1}^{r}\sum_{i=1}^{n} E_l^{(i)} = \sum_{i=1}^{n}\sum_{l=1}^{r} E_l^{(i)} = n\,.$$
1 The idea of constituents can be extended in a natural fashion from events to partitions. A constituent of the partitions $H_1, \ldots, H_n$ is an event of the form
$$Q = \prod_{i=1}^{n} H_i^*\,.$$
We want to compute

$$P(Y_1 = k_1, \ldots, Y_r = k_r) = \sum_{Q} P(Q)\,,$$

where $Q$ varies among the constituents of type I contained in the event $(Y_1 = k_1, \ldots, Y_r = k_r)$. In the product defining a constituent of type I there will be $k_l$ events of index $l$, with $1 \le l \le r$. Therefore, since the partitions are stochastically independent, the probability of a constituent of type I is given by:

$$P(Q) = p_1^{k_1}\cdots p_r^{k_r}\,,$$

where $p_l$ denotes the common probability of the events in the $l$th column of the array. Since the number of such constituents is $\frac{n!}{k_1!\cdots k_r!}$, we obtain

$$P(Y_1 = k_1, \ldots, Y_r = k_r) = \frac{n!}{k_1!\cdots k_r!}\;p_1^{k_1}\cdots p_r^{k_r}\,.$$
Let us consider two random numbers X and Y , that we can look at as a random vector
(X, Y ), assuming a finite number of possible values I (X, Y ). If I (X ) = {x1 , . . . , xm }
and I (Y ) = {y1 , . . . , yn } we define the joint distribution of X and Y . This is the
function
$$p(x_i, y_j) = P(X = x_i, Y = y_j)$$

for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. The marginal distribution of $X$ is the function

$$p_1(x_i) = P(X = x_i)$$

for $i = 1, \ldots, m$. The marginal distribution can be obtained from the joint distribution:

$$p_1(x_i) = P(X = x_i) = \sum_{j=1}^{n} P(X = x_i, Y = y_j) = \sum_{j=1}^{n} p(x_i, y_j),$$
i.e. adding up the elements on the rows of the matrix. It is called marginal because it
is customarily written at the margin of the matrix. Similarly the marginal distribution
of Y is defined by:
$$p_2(y_j) = P(Y = y_j) = \sum_{i=1}^{m} p(x_i, y_j)\,.$$
It follows that two random numbers X and Y are stochastically independent if and
only if
$$p(x_i, y_j) = p_1(x_i)\,p_2(y_j) \qquad (2.2)$$
$$P(Z) = P(\psi(X, Y)) = \sum_{i=1}^{m}\sum_{j=1}^{n} \psi(x_i, y_j)\,p(x_i, y_j)\,. \qquad (2.3)$$
The proof is completely analogous to that one in the case of a single random number.
For example, we can compute P(X Y ):
$$P(XY) = \sum_{i=1}^{m}\sum_{j=1}^{n} x_i y_j\,P(X = x_i, Y = y_j)\,.$$
Indeed

$$P(\phi_1(X)\,\phi_2(Y)) = \sum_{i=1}^{m}\sum_{j=1}^{n} \phi_1(x_i)\,\phi_2(y_j)\,P(X = x_i, Y = y_j)$$
$$= \sum_{(x_i, y_j)\in I(X)\times I(Y)} \phi_1(x_i)\,\phi_2(y_j)\,p_1(x_i)\,p_2(y_j)$$
$$= \Big(\sum_{x_i\in I(X)} \phi_1(x_i)\,p_1(x_i)\Big)\Big(\sum_{y_j\in I(Y)} \phi_2(y_j)\,p_2(y_j)\Big) = P(\phi_1(X))\,P(\phi_2(Y))\,.$$
where we use that for an event $E$ one has $E^2 = E$, since $E$ can take only the values 0 and 1.
2.12 Variance of Discrete Distributions 37
$$\sigma^2(E_1 + \cdots + E_n) = \sum_{i=1}^{n}\sigma^2(E_i) = np(1-p)\,.$$
$$P(X) = \sum_{i=1}^{+\infty} i\,p(1-p)^{i-1} = \frac{1}{p}\,.$$
Hence

$$P(X^2) = p\sum_{i=1}^{+\infty} i^2(1-p)^{i-1} = p\sum_{i=1}^{+\infty} i(i-1)(1-p)^{i-1} + p\sum_{i=1}^{+\infty} i(1-p)^{i-1}$$
$$= p(1-p)\sum_{i=2}^{+\infty} i(i-1)(1-p)^{i-2} + \frac{1}{p}$$
$$= p(1-p)\,\frac{d^2}{dp^2}\sum_{i=2}^{+\infty}(1-p)^i + \frac{1}{p}$$
$$= p(1-p)\,\frac{d^2}{dp^2}\left[\frac{1}{1-(1-p)} - 1 - (1-p)\right] + \frac{1}{p}$$
$$= \frac{2(1-p)}{p^2} + \frac{1}{p} = \frac{2}{p^2} - \frac{1}{p}\,.$$

Therefore

$$\sigma^2(X) = P(X^2) - P(X)^2 = \frac{2}{p^2} - \frac{1}{p} - \frac{1}{p^2} = \frac{1-p}{p^2}\,.$$
$$P(X^2) = \sum_{i=0}^{+\infty} i^2\,P(X=i) = \sum_{i=0}^{+\infty} i^2\,e^{-\lambda}\frac{\lambda^i}{i!} = e^{-\lambda}\sum_{i=0}^{+\infty} i(i-1)\frac{\lambda^i}{i!} + \lambda e^{-\lambda}\sum_{i=0}^{+\infty}\frac{\lambda^i}{i!}$$
$$= \lambda^2 e^{-\lambda}\sum_{i=2}^{+\infty}\frac{\lambda^{i-2}}{(i-2)!} + \lambda = \lambda^2 e^{-\lambda}\sum_{k=0}^{+\infty}\frac{\lambda^k}{k!} + \lambda = \lambda^2 + \lambda\,,$$
where we have used the computation of the expectation of the Poisson distribution.
We have then

$$\sigma^2(X) = P(X^2) - P(X)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda\,.$$
5. Hypergeometric distribution: with the notation of Sect. 2.6, we use the representation $X = E_1 + \cdots + E_n$. The events $E_i$ in this case are not stochastically independent and are actually pairwise negatively correlated. Indeed, for $0 < H < N$ and for every pair $i, j$ with $i \neq j$, we have:

$$\mathrm{cov}(E_i, E_j) = P(E_i E_j) - P(E_i)\,P(E_j) = \frac{H}{N^2}\,\frac{H-N}{N-1} < 0\,,$$
as

$$P(E_i E_j) = \frac{H(H-1)\,D_{n-2}^{N-2}}{D_n^N} = \frac{H(H-1)}{N(N-1)}\,,$$

since $D_n^N = N(N-1)\,D_{n-2}^{N-2}$.
Here we have used formula (1.12); possible cases are sequences with no repetition
of length n from a set of N elements, whereas in counting favorable cases we
first select two different white balls for the ith and the jth drawings and then the
remaining n − 2 balls from a set of N − 2 elements.
The variance of X is then obtained by means of the formula for the variance of
the sum of n random numbers:
$$\sigma^2(X) = \sum_{i=1}^{n}\sigma^2(E_i) + \sum_{\substack{i,j\\ i\neq j}}\mathrm{cov}(E_i, E_j) = n\frac{H}{N}\Big(1-\frac{H}{N}\Big) + n(n-1)\,\frac{H}{N^2}\,\frac{H-N}{N-1} = \frac{N-n}{N-1}\;n\frac{H}{N}\Big(1-\frac{H}{N}\Big)\,,$$

where $n(n-1)$ is the number of ordered pairs $i, j$ with $i \neq j$ which can be chosen out of $\{1, \ldots, n\}$.
Let us consider two random numbers $X$ and $Y$ with discrete joint distribution given by:

$$p(x_i, y_j) = P(X = x_i, Y = y_j) = p_{i,j}\,,$$
$$p_1(x_i) = P(X = x_i) = p_i\,, \qquad i = 1, \ldots, m\,,$$
$$p_2(y_j) = P(Y = y_j) = q_j\,, \qquad j = 1, \ldots, n\,.$$
2.13 Non-correlation and Stochastic Independence 39
i.e. if

$$\sum_{i}\sum_{j} x_i y_j\,p_{i,j} = \Big(\sum_{i} x_i p_i\Big)\Big(\sum_{j} y_j q_j\Big)\,.$$

The joint distribution must also satisfy

$$\sum_{i}\sum_{j} p_{i,j} = 1\,.$$
Assume that we want to find values $p_{i,j}$ of the joint distribution such that $X$ and $Y$ are non-correlated and have two fixed marginal distributions $\{p_i\}_{i=1,\ldots,m}$ and $\{q_j\}_{j=1,\ldots,n}$. We observe first of all that the $p_{i,j}$ must satisfy the relation $\sum_{i,j} p_{i,j} = 1$. In order to determine the marginal distributions we must verify $(m-1)+(n-1)$ additional linear relations. We have $(m-1)+(n-1)$ and not $m+n$, since once $(m-1)+(n-1)$ of these relations are satisfied, the last two follow from the fact that $\sum_{i,j} p_{i,j} = 1$, $\sum_i p_i = 1$, $\sum_j q_j = 1$. Finally, in order to impose non-correlation, an extra linear relation must be verified by the $p_{i,j}$:
$$\sum_{i}\sum_{j} p_{i,j}\,x_i y_j = m_1 m_2\,,$$

where $m_1 = \sum_{i=1}^{m} x_i p_i$ and $m_2 = \sum_{j=1}^{n} y_j q_j$. We have therefore a system of $1 + (m-1) + (n-1) + 1 = m+n$ linear equations in $mn$ unknowns. This system has the solution $p_{i,j} = p_i q_j$, for which $X$ and $Y$ are stochastically independent. This will be the only solution if the number of linearly independent equations is equal to the number of unknowns, i.e. if $m + n = mn$, or $mn - m - n = (m-1)(n-1) - 1 = 0$.
This happens only if m = n = 2. It follows that non-correlation does not imply in
general stochastic independence. If m = n = 2, then there is just one solution so
that non-correlation and stochastic independence coincide. This is the case of events:
two events are non-correlated if and only if they are stochastically independent.
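A minimal concrete example (with our own choice of values, $m = 3$, $n = 2$): take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $P(XY) = P(X^3) = 0 = P(X)P(Y)$, so $X$ and $Y$ are non-correlated, yet they are clearly not stochastically independent:

```python
from itertools import product

# X uniform on {-1, 0, 1}, Y = X^2: joint distribution p(x, y)
support = [(-1, 1), (0, 0), (1, 1)]
p = {xy: 1 / 3 for xy in support}

mean = lambda g: sum(g(x, y) * q for (x, y), q in p.items())
m1 = mean(lambda x, y: x)                   # P(X) = 0
m2 = mean(lambda x, y: y)                   # P(Y) = 2/3
cov = mean(lambda x, y: x * y) - m1 * m2    # = 0: non-correlated

# Marginal distributions
p1 = {x: sum(q for (a, b), q in p.items() if a == x) for x in (-1, 0, 1)}
p2 = {y: sum(q for (a, b), q in p.items() if b == y) for y in (0, 1)}

# Independence would require p(x, y) = p1(x) p2(y) for ALL pairs;
# it fails e.g. for (0, 1): p(0,1) = 0 but p1(0) p2(1) = 2/9.
independent = all(
    abs(p.get((x, y), 0.0) - p1[x] * p2[y]) < 1e-12
    for x, y in product(p1, p2)
)
```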
In Sect. 2.11 we have shown that stochastic independence implies non-correlation
and that in fact it implies non-correlation of any two functions of the random numbers.
$$P(X = n) = \frac{1}{n!}\left.\frac{d^n \phi_X(u)}{du^n}\right|_{u=0}\,,$$
for every n ∈ N. This shows that the probability distribution of X can be obtained
from its generating function.
Proposition 2.14.1 If $P(X) = \sum_{k\in I(X)} k\,P(X=k) < \infty$, then $P(X) = \lim_{u\to 1^-}\phi'_X(u)$. Moreover $P(X) = \sum_{k\in I(X)} k\,P(X=k) = +\infty$ if and only if $\lim_{u\to 1^-}\phi'_X(u) = \infty$.
This is a particular case of the following result.
Proposition 2.14.2 If $P(X(X-1)\cdots(X-n+1)) = \sum_{k\in I(X)} k(k-1)\cdots(k-n+1)\,P(X=k) < \infty$, then

$$P(X(X-1)\cdots(X-n+1)) = \lim_{u\to 1^-}\phi^{(n)}_X(u)\,.$$

Furthermore $\sum_{k\in I(X)} k(k-1)\cdots(k-n+1)\,P(X=k) = \infty$ if and only if $\lim_{u\to 1^-}\phi^{(n)}_X(u) = \infty$.
Previous results are easily obtained by taking the derivatives of the generating function. In particular the variance of $X$ can be obtained from the generating function:

$$\sigma^2(X) = P(X^2) - P(X)^2 = \lim_{u\to 1^-}\Big[\phi''_X(u) + \phi'_X(u) - \big(\phi'_X(u)\big)^2\Big]\,,$$

where $\phi'_X$ and $\phi''_X$ denote respectively the first and the second derivatives of $\phi_X$.
Generating functions of some common discrete distributions are easily obtained:
1. Event $E$ with $P(E) = p$:
$$\phi_E(u) = up + (1-p)\,.$$
2.14 Generating Function 41
2. Binomial distribution with parameters $n$ and $p$:
$$\phi_X(u) = \sum_{k=0}^{n}\binom{n}{k} u^k p^k(1-p)^{n-k} = \sum_{k=0}^{n}\binom{n}{k}(up)^k(1-p)^{n-k} = (up + (1-p))^n\,.$$
3. Geometric distribution with parameter $p$:
$$\phi_X(u) = \sum_{k=1}^{+\infty} u^k p(1-p)^{k-1} = \frac{pu}{1-(1-p)u}\,, \qquad |u| < \frac{1}{1-p}\,,$$
where the formula for the sum of geometric series has been used.
4. Poisson distribution with parameter $\lambda$:
$$\phi_X(u) = \sum_{k=0}^{\infty} u^k\frac{\lambda^k}{k!}e^{-\lambda} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(u\lambda)^k}{k!} = e^{-\lambda}e^{u\lambda} = e^{-\lambda(1-u)}\,.$$
If $X$ and $Y$ are two stochastically independent random numbers with values in $\mathbb{N}$, i.e. $P(X = i, Y = j) = P(X = i)\,P(Y = j)$ for all $(i, j) \in I(X) \times I(Y)$, then it is easy to show that

$$\phi_{X+Y}(u) = \phi_X(u)\,\phi_Y(u)\,.$$

Indeed:

$$\phi_{X+Y}(u) = P(u^{X+Y}) = P(u^X u^Y) = P(u^X)\,P(u^Y) = \phi_X(u)\,\phi_Y(u)\,.$$
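The product rule can be checked numerically: under independence the distribution of $X + Y$ is the convolution of the two distributions, and its generating function is the product of the two generating functions. A sketch (the two small distributions are our own arbitrary examples):

```python
# Generating function of a distribution on N: phi(u) = sum_k P(X = k) u^k
def phi(dist, u):
    return sum(pk * u ** k for k, pk in enumerate(dist))

def convolve(d1, d2):
    """Distribution of X + Y for independent X, Y (discrete convolution)."""
    out = [0.0] * (len(d1) + len(d2) - 1)
    for i, a in enumerate(d1):
        for j, b in enumerate(d2):
            out[i + j] += a * b
    return out

X = [0.2, 0.5, 0.3]          # P(X=0), P(X=1), P(X=2)
Y = [0.6, 0.4]               # P(Y=0), P(Y=1)
Z = convolve(X, Y)           # distribution of X + Y under independence

u = 0.7
lhs = phi(Z, u)              # phi_{X+Y}(u)
rhs = phi(X, u) * phi(Y, u)  # phi_X(u) phi_Y(u)
```

`lhs` and `rhs` coincide up to floating-point error, since convolving distributions corresponds exactly to multiplying the associated polynomials.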
Let $N, X_1, X_2, \ldots$ be stochastically independent random numbers with values in $\mathbb{N}$, with the $X_i$ all having the same distribution, and let

$$S_N = X_1 + \cdots + X_N\,.$$

Then

$$\phi_{S_N}(u) = \phi_N(\phi_X(u))\,,$$

where $\phi_N$ is the generating function of $N$ and we have used the fact that the random numbers $X_i$ have the same distribution and hence the same generating function $\phi_X$. See e.g. [3] or [6] for a more complete treatment of generating functions.
Chapter 3
One-Dimensional Absolutely Continuous
Distributions
3.1 Introduction
For random numbers with discrete distribution, the distribution is completely spec-
ified by the probabilities of taking single values. If we want to introduce random
numbers that take values on intervals or on the whole line, then the specification
of the probabilities of taking single values is no longer sufficient to determine their
distributions. For example for a random number corresponding to a random choice
in an interval [a, b], the probabilities of taking single values must be clearly equal
to 0, but that in no way specifies the probability of taking value in a subinterval of
[a, b]. In the following we will see how it is possible to describe the distribution of
a random number in general.
Given a random number $X$, its cumulative distribution function (c.d.f.) is defined by:

$$F(x) = P(X \le x)\,.$$

One has

$$P(X = x_0) = F(x_0) - F(x_0^-)\,,$$

where $F(x_0^-)$ denotes $\lim_{x\to x_0^-} F(x)$. This limit always exists, as $F(x)$ is bounded and non-decreasing.
Example 3.2.1 (Discrete case) In the case of a random number $X$ with discrete distribution and $I(X) = \{x_1, x_2, \ldots\}$ one has:

$$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i)\,.$$
The probability that a random number $X$ takes value in an interval $(a, b]$ can be obtained from its c.d.f. $F$ by:

$$P(a < X \le b) = F(b) - F(a)\,.$$
A distribution is called absolutely continuous if its c.d.f. can be written as

$$F(x) = \int_{-\infty}^{x} f(t)\,dt\,.$$

The function $f$ is then called a probability density function (p.d.f.) of $X$. Note that $f$ is not unique. Indeed, if the values of $f$ are changed on a finite set of points, the new function is still a density of $X$, as its integrals are the same. It follows from the fundamental theorem of calculus that if $x$ is a continuity point of $f$, then

$$f(x) = F'(x)\,.$$
3.3 Absolutely Continuous Distributions 45
If $x$ is a continuity point of $f$, then $f(x) \ge 0$. Indeed, assume that $f(x) < 0$; then by continuity there would be a neighborhood $(a, b)$ of $x$ where $f$ is still strictly negative, but then

$$F(b) = F(a) + \int_a^b f(x)\,dx < F(a)\,,$$

contradicting the fact that $F$ is non-decreasing.
Let us now see how to compute the expectation of $X$ from the p.d.f. $f$. We consider the particular case when $I(X)$ is contained in some interval $[a, b]$ and the p.d.f. $f$ is continuous (and zero outside $[a, b]$). We subdivide $[a, b]$ into $n$ intervals $I_i$, $i = 1, \ldots, n$, of length $(b-a)/n$. It is not important whether the endpoints are included: we assume that the intervals are closed on the right and open on the left, except for $I_1$, which is closed on both sides. We define two random numbers with discrete distribution, $X_-^{(n)}$ and $X_+^{(n)}$: if $X$ takes value in $I_i$, then $X_-^{(n)}$ is equal to the left endpoint of $I_i$ and $X_+^{(n)}$ is equal to the right endpoint. Since $X_-^{(n)}$ and $X_+^{(n)}$ have discrete distributions with a finite number of possible values, we can compute their expectations using formula (1.10). They are given by:
$$P(X_-^{(n)}) = \sum_{j=0}^{n-1}\Big(a + j\,\frac{b-a}{n}\Big)\int_{a+j\frac{b-a}{n}}^{a+(j+1)\frac{b-a}{n}} f(x)\,dx\,;$$
$$P(X_+^{(n)}) = \sum_{j=0}^{n-1}\Big(a + (j+1)\,\frac{b-a}{n}\Big)\int_{a+j\frac{b-a}{n}}^{a+(j+1)\frac{b-a}{n}} f(x)\,dx\,.$$
Since $X_-^{(n)} \le X \le X_+^{(n)}$, then

$$P(X_-^{(n)}) \le P(X) \le P(X_+^{(n)})\,.$$

It is easy to see, using the continuity of $f(x)$, that as $n \to \infty$ both $P(X_-^{(n)})$ and $P(X_+^{(n)})$ converge to

$$\int_a^b x f(x)\,dx = \int_{\mathbb{R}} x f(x)\,dx\,,$$
46 3 One-Dimensional Absolutely Continuous Distributions
that is hence the value of $P(X)$. Approximation arguments lead to extend this formula to the case of a general $X$ with absolutely continuous distribution with probability density $f(x)$, provided that

$$\int_{\mathbb{R}} |x|\,f(x)\,dx < \infty\,, \qquad (3.1)$$

i.e. one assumes that, when (3.1) holds true, the expectation of $X$ in the absolutely continuous case is given by:

$$P(X) = \int_{-\infty}^{+\infty} x f(x)\,dx\,.$$
$$\sigma^2(X) = P(X^2) - P(X)^2 = \int_{-\infty}^{+\infty} x^2 f(x)\,dx - \left(\int_{-\infty}^{+\infty} x f(x)\,dx\right)^2\,,$$
provided that the integrals exist. In the following sections we shall introduce some
of the most common one-dimensional absolutely continuous distributions.
A random number $X$ has uniform distribution in $[0, 1]$ if its c.d.f. is given by:

$$F(x) = \begin{cases} 0 & x \le 0, \\ x & 0 < x < 1, \\ 1 & x \ge 1. \end{cases}$$

A corresponding p.d.f. is $f(x) = 1$ for $0 < x < 1$ and $f(x) = 0$ otherwise. As in the following examples, the values of the p.d.f. at discontinuity points can be chosen in an arbitrary way. The expectation is given by

$$P(X) = \int_{\mathbb{R}} x f(x)\,dx = \int_0^1 x\,dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}\,,$$
A random number $X$ has uniform distribution in $[a, b]$ if its c.d.f. is given by:

$$F(x) = \begin{cases} 0 & x \le a, \\ c(x-a) & a < x < b, \\ 1 & x \ge b. \end{cases}$$

In order to compute the constant $c$, we impose continuity at the point $x = b$ and get $c(b-a) = 1$, that is:

$$c = \frac{1}{b-a}\,.$$
If $X$ is the time at which a certain event happens (for example, when an atom of some isotope decays), the exponential distribution has the property of absence of memory. Given $x, y \ge 0$ we have:

$$P(X > x+y \mid X > y) = P(X > x)\,, \qquad (3.3)$$

i.e. the probability that the event does not occur for an extra amount of time $x$, given that it has not occurred up to time $y$, is the same as the probability starting from the initial time. We obtain (3.3) by using the formula of composite probability:
$$P(X > x+y \mid X > y) = \frac{P(X > x+y,\; X > y)}{P(X > y)} = \frac{P(X > x+y)}{P(X > y)} = \frac{e^{-\lambda(x+y)}}{e^{-\lambda y}} = e^{-\lambda x} = P(X > x)\,.$$
In the following we shall see that the exponential distribution can be obtained as
limit of suitably rescaled geometric distributions. Geometric distribution has also the
property of absence of memory for discrete times, as we have remarked in Sect. 2.4.
The expectation of the exponential distribution with parameter $\lambda$ is equal to

$$P(X) = \int_0^{+\infty}\lambda x e^{-\lambda x}\,dx = \left[-x e^{-\lambda x}\right]_0^{+\infty} + \int_0^{+\infty} e^{-\lambda x}\,dx = \frac{1}{\lambda}\,.$$
$$\sigma^2(X) = P(X^2) - P(X)^2 = \int_0^{+\infty}\lambda x^2 e^{-\lambda x}\,dx - \frac{1}{\lambda^2} = \left[-x^2 e^{-\lambda x}\right]_0^{+\infty} + 2\int_0^{+\infty} x e^{-\lambda x}\,dx - \frac{1}{\lambda^2} = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}\,.$$
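Both moments are easy to check by simulation, using the standard inverse-transform method (a sketch; parameter value and seed are our own choices):

```python
import random
from math import log

lam = 2.0
rng = random.Random(1)

# Inverse-transform sampling: if U is uniform in (0, 1), then
# -log(1 - U) / lam has exponential distribution with parameter lam.
samples = [-log(1 - rng.random()) / lam for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

`mean` and `var` agree with $1/\lambda = 0.5$ and $1/\lambda^2 = 0.25$ up to Monte Carlo error.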
3.7 A Characterization of Exponential Distribution 49
Then

$$\lim_{h\to 0}\frac{P(x < X < x+h)}{h\,P(X > x)} = \frac{f(x)}{1 - F(x)} = -\frac{d}{dx}\log(1 - F(x))\,.$$
For the exponential distribution with parameter $\lambda$, it is easy to see that the hazard rate is equal to $\lambda$ for all $x$. Indeed:

$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda\,.$$
Since

$$h(x) = -\frac{d}{dx}\log(1 - F(x))\,,$$

we have that for $x \ge 0$

$$\log(1 - F(x)) = -\int_0^x h(y)\,dy \qquad (3.4)$$
$$\Longrightarrow\quad 1 - F(x) = \exp\Big(-\int_0^x h(y)\,dy\Big) \qquad (3.5)$$
$$\Longrightarrow\quad F(x) = 1 - \exp\Big(-\int_0^x h(y)\,dy\Big)\,. \qquad (3.6)$$
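Formula (3.6) can be tested numerically by reconstructing a c.d.f. from its hazard rate. In the sketch below the increasing hazard $h(y) = 2y$ is our own illustrative choice; it yields the Rayleigh-type c.d.f. $F(x) = 1 - e^{-x^2}$:

```python
from math import exp

def h(y):
    """Hazard rate: an illustrative increasing choice, h(y) = 2y."""
    return 2 * y

x = 1.3
steps = 100_000
dx = x / steps
# Midpoint-rule approximation of the integral of h over [0, x]
integral = sum(h((i + 0.5) * dx) * dx for i in range(steps))

F = 1 - exp(-integral)        # reconstructed c.d.f. via (3.6)
F_exact = 1 - exp(-x * x)     # closed form for this hazard
```

The midpoint rule integrates a linear function exactly, so `F` matches `F_exact` to floating-point accuracy.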
where a change to polar coordinates $x = r\cos\theta$, $y = r\sin\theta$ has been used. The Jacobian determinant of this change of variables is $r$ (see Appendix H). It follows that $\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\,dx = \sqrt{2\pi}$ and so

$$K = \frac{1}{\sqrt{2\pi}}\,.$$
Since n is an even function and its integral over the whole line is equal to 1, we
have:
N (−x) = 1 − N (x).
3.8 Normal Distribution 51
Therefore in tables of N (x), only values for positive values of x are usually tabulated.
The expectation of the standard normal distribution is

$$P(X) = \int_{-\infty}^{+\infty} x\,n(x)\,dx = 0\,,$$
We introduce now the general normal distribution which has two parameters m, σ 2
and will be denoted by N (m, σ 2 ). We start with X ∼ N (0, 1) and consider Y =
m + σ X , where σ > 0. Then Y has normal distribution N (m, σ 2 ). The c.d.f. of Y is
given by:
$$F_Y(y) = P(Y \le y) = P(m + \sigma X \le y) = P\Big(X \le \frac{y-m}{\sigma}\Big) = N\Big(\frac{y-m}{\sigma}\Big)\,.$$
The probability density function of Y is obtained by chain rule for the derivative of
a composite function:
$$f_Y(y) = \frac{d}{dy}\,N\Big(\frac{y-m}{\sigma}\Big) = \frac{1}{\sigma}\,n\Big(\frac{y-m}{\sigma}\Big) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(y-m)^2}{2\sigma^2}}\,.$$
$$\sigma^2(Y) = \sigma^2(\sigma X + m) = \sigma^2\,\sigma^2(X) = \sigma^2\,.$$
As we have said, there is no formula in terms of elementary functions for N (x) and
therefore for the probability that a random number X ∼ N (0, 1) is greater than some
x > 0. It is however possible to give asymptotic estimates for this probability as x
tends to infinity.
Let α and λ be strictly positive real numbers. The random number X is said to have
gamma distribution Γ (α, λ) if its probability density function is given by
$$g_{\alpha,\lambda}(x) = \begin{cases} K x^{\alpha-1} e^{-\lambda x} & x > 0, \\ 0 & x \le 0. \end{cases}$$
For integer $\alpha \ge 1$,

$$\Gamma(\alpha) = (\alpha-1)!\,,$$

since $\Gamma(1) = \int_0^{+\infty} e^{-x}\,dx = 1$.
Now for the p.d.f. $g_{\alpha,\lambda}$ we have

$$1 = \int_{-\infty}^{+\infty} g_{\alpha,\lambda}(x)\,dx = K\int_0^{+\infty} x^{\alpha-1}e^{-\lambda x}\,dx = \frac{K}{\lambda^{\alpha}}\int_0^{+\infty} y^{\alpha-1}e^{-y}\,dy = \frac{K}{\lambda^{\alpha}}\,\Gamma(\alpha)\,.$$

Hence

$$K = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,.$$
The expectation and the variance of the gamma distribution can be computed
using the recurrence property of gamma function:
$$P(X) = \int_{-\infty}^{+\infty} x\,g_{\alpha,\lambda}(x)\,dx = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\int_0^{+\infty} x^{\alpha}e^{-\lambda x}\,dx = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha+1)}{\lambda^{\alpha+1}} = \frac{\alpha}{\lambda}\,.$$
It follows that:

$$\sigma^2(X) = P(X^2) - P(X)^2 = \frac{\alpha(\alpha+1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2}\,.$$
3.11 χ2 -Distribution
From the normal distribution we can derive another distribution of wide use in sta-
tistics, the χ2 -distribution. In this section we introduce the χ2 -distribution with
parameter ν = 1. In Chap. 4 we shall consider general χ2 -distributions with parame-
ter ν ∈ N \ {0}.
Let X be a random number with standard normal distribution N (0, 1) and let
$Y = X^2$. We first consider the c.d.f. of $Y$. If $y < 0$,

$$F_Y(y) = P(Y \le y) = 0\,,$$

while for $y \ge 0$

$$F_Y(y) = P(-\sqrt{y} \le X \le \sqrt{y}) = 2N(\sqrt{y}) - 1\,.$$

For $y > 0$ the density is therefore

$$f_Y(y) = F_Y'(y) = 2\,n(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{y}}\,e^{-\frac{y}{2}} = \frac{1}{\sqrt{2\pi}}\,y^{\frac{1}{2}-1}e^{-\frac{1}{2}y}\,,$$
where the derivative has been computed by using the chain rule for the derivative of composite functions. The density $f_Y(y)$ is of course zero for negative $y$. It follows that $Y$ has distribution $\Gamma(\frac{1}{2}, \frac{1}{2})$. Moreover, by comparing the normalizing constants, we get

$$\frac{1}{\sqrt{2}}\,\frac{1}{\sqrt{\pi}} = \frac{\big(\frac{1}{2}\big)^{\frac{1}{2}}}{\Gamma\big(\frac{1}{2}\big)}\,,$$

so that

$$\Gamma\Big(\frac{1}{2}\Big) = \sqrt{\pi}\,.$$
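The value $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, together with the recurrence $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$ and $\Gamma(k) = (k-1)!$, can be checked directly with the standard library:

```python
from math import gamma, pi, sqrt, factorial

# Gamma(1/2) = sqrt(pi)
check = abs(gamma(0.5) - sqrt(pi)) < 1e-12

# Recurrence Gamma(a + 1) = a Gamma(a), at the arbitrary point a = 2.5
rec_ok = abs(gamma(3.5) - 2.5 * gamma(2.5)) < 1e-12

# Gamma(k) = (k - 1)! for integer k
fact_ok = abs(gamma(6) - factorial(5)) < 1e-9
```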
for k = 1, 2, . . ..
We now consider a distribution for which the expectation defined in Sect. 3.3 does not
exist. This is the Cauchy distribution. This is the distribution of a random number
$Y = \tan\Theta$, where the random number $\Theta$ has uniform distribution in the interval $\big[-\frac{\pi}{2}, \frac{\pi}{2}\big]$. We have for $y \in \mathbb{R}$ that:

$$f_Y(y) = \frac{1}{\pi}\,\frac{1}{1+y^2}\,.$$
In addition to discrete and absolutely continuous c.d.f.’s, there are continuous but
not absolutely continuous c.d.f.’s. These will be not considered in this elementary
book. Here we briefly speak about mixed c.d.f.’s that are convex linear combinations
of discrete and absolutely continuous c.d.f.’s.
For $0 < p < 1$, let $F_1$ be a discrete c.d.f. and $F_2$ an absolutely continuous c.d.f. Then we can consider the c.d.f. $F(x)$:

$$F(x) = p\,F_1(x) + (1-p)\,F_2(x)\,.$$

The expectation is then given by

$$P(X) = p\,P(X_1) + (1-p)\,P(X_2)\,,$$

where $X_1$ and $X_2$ are random numbers with c.d.f. $F_1$ and $F_2$ respectively, provided that the terms on the right-hand side both make sense. The first term is expressed by a sum or a series, the second by an integral.
An example of a random number with mixed c.d.f. is the working time $T$ of some device, for example a lamp, when there is a positive probability $p$ that the device does not work already at the initial time, and otherwise the distribution is absolutely continuous, for example exponential with parameter $\lambda$. The c.d.f. of $T$ is then given by:

$$F_T(t) = \begin{cases} 0 & \text{for } t < 0, \\ p + (1-p)\big(1 - e^{-\lambda t}\big) & \text{for } t \ge 0. \end{cases}$$

It is easy to check that $P(T) = \dfrac{1-p}{\lambda}$.
λ
Chapter 4
Multi-dimensional Absolutely Continuous
Distributions
Let $X, Y$ be two random numbers that we can consider as a random vector $(X, Y)$. The joint cumulative distribution function (j.c.d.f.) is defined as:

$$F(x, y) = P(X \le x,\; Y \le y)\,, \qquad F: \mathbb{R}^2 \longrightarrow [0, 1]\,.$$

The probability that $(X, Y)$ belongs to the rectangle $(a_1, b_1] \times (a_2, b_2]$ is given by:

$$P(a_1 < X \le b_1,\; a_2 < Y \le b_2) = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)\,. \qquad (4.1)$$
We shall always assume that the following continuity properties are verified:
1. $\lim_{x\to+\infty,\,y\to+\infty} F(x, y) = 1$;
2. $\lim_{x\to-\infty} F(x, y) = \lim_{y\to-\infty} F(x, y) = 0$;
3. $\lim_{x\to x_0^+,\,y\to y_0^+} F(x, y) = F(x_0, y_0)$;
4. $P(X = x_0, Y = y_0) = F(x_0, y_0) - F(x_0^-, y_0) - F(x_0, y_0^-) + F(x_0^-, y_0^-)$,
where $F(x_0^-, y_0) := \lim_{x\to x_0^-} F(x, y_0)$, $F(x_0, y_0^-) := \lim_{y\to y_0^-} F(x_0, y)$ and $F(x_0^-, y_0^-) := \lim_{x\to x_0^-,\,y\to y_0^-} F(x, y)$.
Other analogous properties will also be assumed. We shall quote them when they
will be needed.
Given two random numbers $X, Y$ with j.c.d.f. $F(x, y)$, the c.d.f.'s $F_1$, $F_2$ of $X$ and $Y$ are called marginal cumulative distribution functions (m.c.d.f.'s). The m.c.d.f. of $X$ is obtained from the j.c.d.f. by taking the limit:

$$F_1(x) = \lim_{y\to+\infty} F(x, y)\,.$$
The joint distribution of $(X, Y)$ is said to be absolutely continuous if there exists a function $f: \mathbb{R}^2 \longrightarrow \mathbb{R}$ such that

$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(s, t)\,ds\,dt\,.$$

Such a function $f$ is called joint probability density (j.p.d.). Applying formula (4.1) for the probability that $(X, Y)$ belongs to a rectangle $(a, b] \times (c, d]$, we get:

$$P(a < X \le b,\; c < Y \le d) = \int_a^b\int_c^d f(s, t)\,dt\,ds\,.$$
4.3 Absolutely Continuous Joint Distributions 59
By the usual limiting procedure one gets that the probability that a random vector $(X, Y)$ belongs to a sufficiently regular region $A$ of $\mathbb{R}^2$ is given by the integral of the j.p.d.f. over $A$, i.e.

$$P((X, Y) \in A) = \iint_A f(s, t)\,ds\,dt\,.$$
It is easy to check that, if $f(x, y)$ can be expressed as a product of two functions,

$$f(x, y) = u(x)\,v(y)\,,$$
60 4 Multi-dimensional Absolutely Continuous Distributions
then $X$ and $Y$ are stochastically independent and their marginal probability densities are proportional to $u(x)$ and $v(y)$. Conversely, if $X$ and $Y$ are stochastically independent and their joint distribution is absolutely continuous, then their joint probability density can be expressed as the product of their marginal probability densities:

$$f(x, y) = f_X(x)\,f_Y(y)\,.$$
Let $X$ and $Y$ be two random numbers with joint probability density $f(x, y)$. We want to determine the density of

$$Z = X + Y\,.$$

We have

$$F_Z(z) = P(X + Y \le z) = \int_{-\infty}^{+\infty}\Big(\int_{-\infty}^{z-x} f(x, y)\,dy\Big)dx = \int_{-\infty}^{+\infty}\Big(\int_{-\infty}^{z} f(x, t-x)\,dt\Big)dx = \int_{-\infty}^{z}\Big(\int_{-\infty}^{+\infty} f(x, t-x)\,dx\Big)dt\,,$$

where we have made the change of variable $t = x + y$ for fixed $x$, which then allows us to exchange the order of integration in the final equality. It follows from the last expression that

$$F_Z(z) = \int_{-\infty}^{z} f_Z(t)\,dt$$

with

$$f_Z(z) = \int_{-\infty}^{+\infty} f(x, z-x)\,dx\,,$$
where $I_A$ denotes the indicator function of the set $A$. The integral can be written as

$$\frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\,\Gamma(\beta)}\,e^{-\lambda z}\int_0^z x^{\alpha-1}(z-x)^{\beta-1}\,dx$$

and, with the change of variable $x = tz$, as $K z^{\alpha+\beta-1}e^{-\lambda z}$ with

$$K = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\,\Gamma(\beta)}\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt\,. \qquad (4.4)$$
Remark 4.4.1 Since the constant $K$ must be equal to the normalizing constant of the distribution $\Gamma(\alpha+\beta, \lambda)$, by (4.4) we obtain

$$K = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\,\Gamma(\beta)}\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha+\beta)}\,,$$

so that

$$\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}\,.$$
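The beta-integral identity just derived can be verified numerically with a simple midpoint Riemann sum (the parameter values are our own illustrative choices):

```python
from math import gamma

# Midpoint approximation of the integral of t^{a-1} (1-t)^{b-1} over [0, 1],
# compared with Gamma(a) Gamma(b) / Gamma(a + b).
a, b = 2.5, 3.0
steps = 100_000
dt = 1.0 / steps
integral = sum(((i + 0.5) * dt) ** (a - 1) * (1 - (i + 0.5) * dt) ** (b - 1) * dt
               for i in range(steps))
exact = gamma(a) * gamma(b) / gamma(a + b)
```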
Let α > 0 and β > 0. A random number X is said to have beta distribution B(α, β)
if its density f (x) is given by
$$f(x) = \begin{cases} K x^{\alpha-1}(1-x)^{\beta-1} & x \in [0, 1], \\ 0 & \text{otherwise}. \end{cases}$$

It follows from the computation at the end of the previous section that

$$K = \frac{1}{\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,. \qquad (4.5)$$
The expectation can be obtained from the recursion property of Euler's gamma function. If $X$ has $B(\alpha, \beta)$ distribution, then

$$P(X) = \int_0^1 x f(x)\,dx = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\frac{\Gamma(\alpha+1)\,\Gamma(\beta)}{\Gamma(\alpha+\beta+1)} = \frac{\alpha}{\alpha+\beta}\,.$$

Similarly, since

$$\Gamma(\alpha+2) = (\alpha+1)\,\alpha\,\Gamma(\alpha)\,, \qquad \Gamma(\alpha+\beta+2) = (\alpha+\beta+1)(\alpha+\beta)\,\Gamma(\alpha+\beta)\,,$$

we obtain

$$P(X^2) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$$
4.5 Beta Distribution B(α, β) 63
and

$$\sigma^2(X) = P(X^2) - P(X)^2 = \frac{(\alpha+1)\,\alpha}{(\alpha+\beta+1)(\alpha+\beta)} - \frac{\alpha^2}{(\alpha+\beta)^2} = \frac{\alpha\beta}{(\alpha+\beta)^2\,(\alpha+\beta+1)}\,.$$
where

$$f(z, u) = \frac{1}{2^{\frac{\nu}{2}}\sqrt{2\pi}\;\Gamma\big(\frac{\nu}{2}\big)}\;e^{-\frac{z^2}{2}}\;u^{\frac{\nu}{2}-1}e^{-\frac{u}{2}}\,.$$
By taking the derivative of $F_T(t)$ with respect to $t$, it follows from the fundamental theorem of calculus that the density of the Student distribution is given for $t > 0$ by

$$f_T(t) = F_T'(t) = \int_0^{\infty}\sqrt{\frac{u}{\nu}}\;f\Big(t\sqrt{\frac{u}{\nu}},\,u\Big)\,du = \frac{1}{2^{\frac{\nu}{2}}\sqrt{2\pi\nu}\;\Gamma\big(\frac{\nu}{2}\big)}\int_0^{\infty} u^{\frac{\nu+1}{2}-1}\,e^{-\frac{u}{2}\big(1+\frac{t^2}{\nu}\big)}\,du = \frac{\Gamma\big(\frac{\nu+1}{2}\big)}{\sqrt{\pi\nu}\;\Gamma\big(\frac{\nu}{2}\big)}\,\Big(1+\frac{t^2}{\nu}\Big)^{-\frac{\nu+1}{2}}\,,$$
where the integral has been computed by using the formula for the normalizing
constant of the gamma distribution. Note that for ν = 1 the Student distribution
coincides with the Cauchy distribution. Since
$$\int_{-\infty}^{+\infty}\frac{|t|}{\big(1+\frac{t^2}{\nu}\big)^{\frac{\nu+1}{2}}}\,dt$$
must be finite for the existence of P(T ), we have that P(T ) exists and is finite if and
only if $\nu > 1$. In that case we have that

$$P(T) = \frac{\Gamma\big(\frac{\nu+1}{2}\big)}{\sqrt{\pi\nu}\;\Gamma\big(\frac{\nu}{2}\big)}\int_{-\infty}^{+\infty} t\,\Big(1+\frac{t^2}{\nu}\Big)^{-\frac{\nu+1}{2}}\,dt = 0\,,$$
F : Rn −→ [0, 1]
defined by:
F(x1 , x2 , . . . , xn ) = P(X 1 ≤ x1 , X 2 ≤ x2 , . . . , X n ≤ xn )
Here $F_{i_1,\ldots,i_k}(x_{i_1}, \ldots, x_{i_k}) := P(X_{i_1} \le x_{i_1}, \ldots, X_{i_k} \le x_{i_k})$, $x_{i_1}, \ldots, x_{i_k} \in \mathbb{R}$, is called the marginal cumulative distribution function (m.c.d.f.) of $X_{i_1}, \ldots, X_{i_k}$. As in the two-dimensional case, the probability that $X_1, \ldots, X_n$ belong to some intervals $(a_1, b_1], \ldots, (a_n, b_n]$ can be computed using the j.c.d.f. Precisely:

$$P(a_1 < X_1 \le b_1, \ldots, a_n < X_n \le b_n) = \sum_{c}(-1)^{\nu(c)}F(c_1, \ldots, c_n)\,,$$

where the sum runs over the $2^n$ vertices $c = (c_1, \ldots, c_n)$ with $c_i \in \{a_i, b_i\}$, and $\nu(c)$ is the number of indices $i$ such that $c_i = a_i$.
4.7 Multi-dimensional Distributions 65
Moreover it can be shown that one can always choose a non-negative f . The function
f is called joint probability density ( j.p.d.) of (X 1 , . . . , X n ). What we have said
about two-dimensional joint probability density generalizes in a natural way to the
n-dimensional case.
If $A$ is a sufficiently regular region $A \subset \mathbb{R}^n$, then

$$P((X_1, \ldots, X_n) \in A) = \int\!\cdots\!\int_A f(t_1, \ldots, t_n)\,dt_1\cdots dt_n\,.$$
$$f(x_1, x_2, \ldots, x_n) = K\,e^{-\frac{1}{2}Ax\cdot x + b\cdot x}\,,$$

where $A$ is a symmetric, positive definite matrix and

$$b\cdot x = \sum_{i=1}^{n} b_i x_i\,.$$
1 Recall that a matrix $A \in \mathbb{R}^{n\times n}$ is
• symmetric if $A^t = A$, i.e. $a_{ij} = a_{ji}$,
• positive definite if $Ax\cdot x > 0$ for all $x \neq 0$, $x \in \mathbb{R}^n$.
4.9 Multi-dimensional Gaussian Distribution 67
we can always replace the matrix $B$ with a symmetric matrix $A$ such that

$$Ax\cdot x = \sum_{i,j} a_{ij}\,x_i x_j = Bx\cdot x\,,$$

where $a_{ij}$ is defined by

$$a_{ij} = \begin{cases} b_{ii} & \text{for } i = j, \\ (b_{ij} + b_{ji})/2 & \text{for } i \neq j. \end{cases}$$
and $b = 0$. We obtain2

$$f(x_1, x_2, \ldots, x_n) = K\exp\left(-\Big(\lambda_1\frac{x_1^2}{2} + \lambda_2\frac{x_2^2}{2} + \cdots + \lambda_n\frac{x_n^2}{2}\Big)\right)\,,$$

so that the density factorizes as $f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i)$, where

$$f_{X_i}(x_i) = \sqrt{\frac{\lambda_i}{2\pi}}\,\exp\Big(-\frac{\lambda_i x_i^2}{2}\Big)$$
2 Here the notation exp (x) is introduced to denote the exponential function e x .
$$\begin{pmatrix}
\frac{1}{\lambda_1} & 0 & \cdots & 0 \\
0 & \frac{1}{\lambda_2} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \frac{1}{\lambda_n}
\end{pmatrix} = A^{-1}\,.$$
It follows that the joint probability density can be similarly obtained from that of $X$:

$$f_U(u_1, u_2, \ldots, u_n) = f_X(u_1 + c_1, u_2 + c_2, \ldots, u_n + c_n)$$
$$= K\exp\Big(-\frac{1}{2}A(u+c)\cdot(u+c) + b\cdot(u+c)\Big)$$
$$= K\exp\Big(-\frac{1}{2}Au\cdot u - \frac{1}{2}Au\cdot c - \frac{1}{2}Ac\cdot u - \frac{1}{2}Ac\cdot c + b\cdot u + b\cdot c\Big)$$
$$= \underbrace{K\exp\Big(-\frac{1}{2}Ac\cdot c + b\cdot c\Big)}_{\text{constant}}\;\exp\Big(-\frac{1}{2}Au\cdot u + (b - Ac)\cdot u\Big)\,,$$

where we have used the symmetry of $A$, i.e.

$$Ac\cdot u = Au\cdot c\,.$$

The linear term in $u$ vanishes if

$$b - Ac = 0\,.$$

We choose therefore

$$c = A^{-1}b\,.$$
Note that $A$ is invertible since it is positive definite. For this choice of $c$ the density $f_U(u_1, u_2, \ldots, u_n)$ is given by:

$$f_U(u_1, u_2, \ldots, u_n) = f_X(u_1 + c_1, u_2 + c_2, \ldots, u_n + c_n)$$
$$= K\exp\Big(A^{-1}b\cdot b - \frac{A(A^{-1}b)\cdot A^{-1}b}{2}\Big)\exp\Big(-\frac{1}{2}Au\cdot u\Big)$$
$$= K\exp\Big(\frac{1}{2}A^{-1}b\cdot b\Big)\exp\Big(-\frac{1}{2}Au\cdot u\Big) = K'\exp\Big(-\frac{1}{2}Au\cdot u\Big)\,,$$

where $K' := K\exp\big(\frac{1}{2}A^{-1}b\cdot b\big)$.
$$P(X) = A^{-1}b\,,$$
where the expectation of a random vector is defined as the vector of the expectations
of its components. The normalizing constant is
$$K' = K\exp\Big(\frac{1}{2}A^{-1}b\cdot b\Big)\,,$$
where K is the normalizing constant for the case with b = 0. The covariance matrix
of X is equal to one of U , as a translation leaves variances and covariances unchanged:
$$C = \begin{pmatrix}
\sigma^2(X_1) & \mathrm{cov}(X_1, X_2) & \cdots & \mathrm{cov}(X_1, X_n) \\
\mathrm{cov}(X_2, X_1) & \sigma^2(X_2) & \ddots & \vdots \\
\vdots & \ddots & \ddots & \mathrm{cov}(X_{n-1}, X_n) \\
\mathrm{cov}(X_n, X_1) & \cdots & \mathrm{cov}(X_n, X_{n-1}) & \sigma^2(X_n)
\end{pmatrix}$$
$$= \begin{pmatrix}
\sigma^2(U_1) & \mathrm{cov}(U_1, U_2) & \cdots & \mathrm{cov}(U_1, U_n) \\
\mathrm{cov}(U_2, U_1) & \sigma^2(U_2) & \ddots & \vdots \\
\vdots & \ddots & \ddots & \mathrm{cov}(U_{n-1}, U_n) \\
\mathrm{cov}(U_n, U_1) & \cdots & \mathrm{cov}(U_n, U_{n-1}) & \sigma^2(U_n)
\end{pmatrix}\,.$$
Now for $U$ we are in the situation of a diagonal matrix that we have already considered. The covariance matrix of $X$ is then given by:

$$C = A^{-1}\,.$$
Here the expectation of a random matrix denotes a matrix whose entries are the
expectations of the corresponding entries. We have used the easily verifiable fact
that if Z is a random matrix and A, B are constant matrices, such that the product
AZ B is defined, then P(AZ B) = AP(Z )B.
We have found that in the general case
1. the normalization constant is
$$K = \sqrt{\frac{\det A}{(2\pi)^n}}\;e^{-\frac{1}{2}A^{-1}b\cdot b}\,;$$
2. the expectation is
$$P(X) = A^{-1}b\,;$$
3. the covariance matrix is
$$C = A^{-1}\,.$$
Remark 4.9.1 It is easy to check that the marginal distributions of the $X_i$ and of subsets of the $X_i$ are Gaussian. In particular, if $\mathrm{cov}(X_i, X_j) = 0$ for some $i \neq j$, then the covariance matrix of $(X_i, X_j)$ is diagonal, so that $X_i$ and $X_j$ are stochastically independent, as shown in the next remark.
Remark 4.9.2 When $n = 2$, the covariance matrix is given by:

$$C = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

and the joint probability density takes the form

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,\exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-m_1)^2}{\sigma_1^2} - 2\,\frac{\rho\,(x-m_1)(y-m_2)}{\sigma_1\sigma_2} + \frac{(y-m_2)^2}{\sigma_2^2}\right]\right)\,.$$
Chapter 5
Convergence of Distributions
However it is not true in this case that Fn (x) → F(x) for every x ∈ R. Indeed
Fn (0) = 0 for every n, whereas F(0) = 1. Therefore it is natural to introduce a
weaker definition of convergence.
Definition 5.1.1 We say that $F_n \to F$ if for every $x$ and for every $\epsilon > 0$ there exists $N$ such that for $n \ge N$

$$F_n(x) \le F(x + \epsilon) + \epsilon$$

and also

$$F_n(x) \ge F(x - \epsilon) - \epsilon\,.$$
where [x] denotes the integer part of x, then Fn → F, where F is the c.d.f. of the
uniform distribution in [0, 1]:
$$F(x) = \begin{cases} 0 & \text{for } x \le 0, \\ x & \text{for } 0 < x \le 1, \\ 1 & \text{for } x > 1. \end{cases}$$
5.2 Convergence of Geometric Distribution to Exponential Distribution 75
We have seen that geometric and exponential distributions share the property of absence of memory, the former among discrete distributions, the latter among absolutely continuous distributions. Let us now consider a sequence $(X_n)_{n\in\mathbb{N}}$ of random numbers with geometric distributions with parameters $p_n$:

$$P(X_n = k) = p_n(1-p_n)^{k-1}\,, \qquad \forall k \ge 1\,.$$
$F_{Y_n} \to F$. Indeed, for $x > 0$,

$$F_{Y_n}(x) = 1 - p_n(1-p_n)^{[nx]}\,\frac{1}{1-(1-p_n)} = 1 - (1-p_n)^{[nx]}\,,$$

where we have used the formula for the sum of geometric series. We write

$$[nx] = nx - \delta_n\,, \qquad 0 \le \delta_n < 1\,.$$

We obtain therefore

$$(1-p_n)^{\delta_n} \xrightarrow[n\to\infty]{} 1\,,$$
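The convergence of the rescaled geometric c.d.f. to the exponential one is easy to observe numerically. A sketch (taking $p_n = \lambda/n$, with our own values of $\lambda$, $x$ and $n$):

```python
from math import exp, floor

# Y_n = X_n / n with X_n geometric of parameter p_n = lam / n:
# F_{Y_n}(x) = 1 - (1 - p_n)^{[n x]} should approach 1 - e^{-lam x}.
lam, x = 1.5, 0.8
exact = 1 - exp(-lam * x)

def F_Yn(n):
    p_n = lam / n
    return 1 - (1 - p_n) ** floor(n * x)

approx = F_Yn(10**6)
```

For $n = 10^6$ the two values agree to several decimal places.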
We observe that:
• $\Big(1 - \dfrac{1}{n}\Big)\cdots\Big(1 - \dfrac{k-1}{n}\Big)$ tends to 1 as $n \to \infty$;
• $(np_n)^k$ tends to $\lambda^k$ as $n \to \infty$;
• $(1 - p_n)^{-k}$ tends to 1 as $n \to \infty$;
• $(1 - p_n)^n$ tends to $e^{-\lambda}$ as $n \to \infty$, as $n\log(1 - p_n)$ tends to $-\lambda$.
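The resulting convergence of binomial probabilities to Poisson probabilities can be checked directly (a sketch; $\lambda$, $k$ and $n$ are our own illustrative choices):

```python
from math import comb, exp, factorial

# Bn(n, p_n) with n p_n = lam fixed: P(X_n = k) -> e^{-lam} lam^k / k!
lam, k = 2.0, 3

def binom_pmf(n):
    p = lam / n
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

poisson = exp(-lam) * lam ** k / factorial(k)
approx = binom_pmf(10**6)
```

For $n = 10^6$ the binomial probability agrees with the Poisson one to several decimal places.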
5.3 Convergence of Binomial Distribution to Poisson Distribution 77
Theorem 5.4.1 Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random numbers with binomial distribution $Bn(n, p)$, with $0 < p < 1$, and let $X_n^*$ be the corresponding standardized random numbers given by

$$X_n^* = \frac{X_n - P(X_n)}{\sigma(X_n)} = \frac{X_n - np}{\sqrt{np\tilde{p}}}\,,$$

where $\tilde{p} = 1 - p$. Then

$$P(X_n^* = x) = \frac{h_n}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,e^{E_n(x)}\,,$$

where $h_n = \frac{1}{\sqrt{np\tilde{p}}}$ and the error $E_n(x)$ tends uniformly to 0 when $x$ ranges over $I(X_n^*) \cap [-K, K]$ for any fixed constant $K$.
Here $h_n = \frac{1}{\sqrt{np\tilde{p}}}$ is the spacing between the possible values of $X_n^*$.
We define $\phi_n(x) = \log P(X_n^* = x)$ for $x \in I(X_n^*)$ and consider its incremental ratio:

$$\frac{\phi_n(x + h_n) - \phi_n(x)}{h_n} = \frac{1}{h_n}\log\frac{P(X_n^* = x + h_n)}{P(X_n^* = x)}\,.$$
78 5 Convergence of Distributions
Putting $k = np + x\sqrt{np\tilde{p}}$, we obtain

$$\frac{1}{h_n}\log\frac{P(X_n^* = x + h_n)}{P(X_n^* = x)} = \frac{1}{h_n}\log\frac{P(X_n = k+1)}{P(X_n = k)} = \frac{1}{h_n}\log\frac{(n-k)\,p}{(k+1)\,\tilde{p}}$$
$$= \sqrt{np\tilde{p}}\,\log\left(\frac{n\tilde{p} - x\sqrt{np\tilde{p}}}{np + 1 + x\sqrt{np\tilde{p}}}\cdot\frac{p}{\tilde{p}}\right) = \sqrt{np\tilde{p}}\,\log\frac{1 - x\sqrt{\frac{p}{n\tilde{p}}}}{1 + \frac{1}{np} + x\sqrt{\frac{\tilde{p}}{np}}}\,.$$
The function $\phi_n(x)$ is not defined everywhere, but only for $x$ in $I(X_n^*)$. We can extend it to values between two elements of $I(X_n^*)$ by linear interpolation. In this way we can write

$$\phi_n(x) = \phi_n(0) + \int_0^x \phi_n'(y)\,dy\,.$$
If $x \le y \le x + h_n$, then

$$\phi_n'(y) = \frac{\phi_n(x + h_n) - \phi_n(x)}{h_n} = -x + O\Big(\frac{x^2+1}{\sqrt{n}}\Big) = -y + O\Big(\frac{y^2+1}{\sqrt{n}}\Big)\,,$$

so that:

$$\phi_n(x) = \phi_n(0) + \int_0^x \phi_n'(y)\,dy = \phi_n(0) + \int_0^x(-y)\,dy + O\Big(\frac{|x|^3+|x|}{\sqrt{n}}\Big) = \phi_n(0) - \frac{x^2}{2} + O\Big(\frac{|x|^3+|x|}{\sqrt{n}}\Big)\,.$$
5.4 De Moivre-Laplace Theorem 79
Taking exponentials,

$$P(X_n^* = x) = e^{\phi_n(0)}\,e^{-\frac{x^2}{2}}\,e^{E_n(x)}\,,$$

where $E_n(x) = O\Big(\dfrac{|x|^3+|x|}{\sqrt{n}}\Big)$.
We can estimate $e^{\phi_n(0)}$ in the following way: $X_n^*$ is a standardized random number, i.e. $P(X_n^*) = 0$ and $\sigma^2(X_n^*) = 1$. By the Chebychev inequality, we have that:

$$P(|X_n^*| \ge K) \le \frac{1}{K^2}\,.$$

$K$ can be chosen so that this probability is arbitrarily small, that is, for every $\epsilon > 0$ there is $K$ such that:

$$1 - \epsilon = 1 - \frac{1}{K^2} \le P(|X_n^*| < K) \le 1\,.$$

Since $P(|X_n^*| < K) = \sum_{x,\,|x|<K} P(X_n^* = x)$, it follows that:

$$1 - \epsilon \le \sum_{x,\,|x|<K} P(X_n^* = x) \le 1\,.$$
Moreover
x2
P(|X n∗ | < K ) = P(|X n∗ | = x) = h n e− 2 .
x,|x|<K x,|x|<K
x2
Since E n (X ) tends uniformly to 0 on bounded interval and x,|x|<K h n e− 2 is the
x2 K x2
Riemann sum for the function e− 2 and tends to −K e− 2 d x, we have for n suffi-
ciently large that:
eφn (0) K − x 2
1 − 2 ≤ e 2 dx ≤ 1 .
h n −K
eφn (0) √
1 − 3 ≤ 2π ≤ 1
hn
so that
eφn (0) √
2π −−−→ 1 .
hn n→∞
It follows that

$$P(X_n^* = x) = \frac{h_n}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, e^{E_n(x)},$$

where $E_n(x)$ is an error that tends uniformly to $0$ for $x$ ranging on the possible values of $X_n^*$ in a bounded interval.

The c.d.f. $F_n(x)$ of $X_n^*$ converges to $N(x)$, where $N(x)$ is the c.d.f. of the standard Gaussian distribution. Indeed, for $a < b$,

$$F_n(b) - F_n(a) = \sum_{x \in I(X_n^*),\; a < x \le b} \frac{h_n}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, e^{E_n(x)},$$

with $\lim_{n \to \infty} E_n(x) = 0$. This is the Riemann sum of $n(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, therefore it converges to

$$\int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx = N(b) - N(a),$$

where $n(x)$ is the standard Gaussian density. The Chebychev inequality states that $P(X_n^* \le -k)$ can be made arbitrarily small, uniformly in $n$, by taking $k$ large; also $N(-k)$ tends to $0$ for $k \to \infty$. Therefore the c.d.f.'s of the standardized binomial distributions tend to $N$.
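The statement can be illustrated numerically: the c.d.f. of the standardized binomial is already close to $N$ for moderate $n$. A sketch (the parameters $n = 400$, $p = 0.3$ are our choice; no continuity correction is applied, so the residual gap is of order $h_n$):

```python
from math import comb, erf, sqrt

def normal_cdf(x):
    # N(x), the c.d.f. of the standard Gaussian distribution
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p = 400, 0.3
mean, sd = n * p, sqrt(n * p * (1 - p))  # sd = sqrt(n p p~)

def std_binom_cdf(x):
    # F_n(x) = P(X_n* <= x), i.e. P(X_n <= np + x * sqrt(n p p~))
    kmax = int(mean + x * sd)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(kmax + 1))

gap = max(abs(std_binom_cdf(x) - normal_cdf(x)) for x in (-2, -1, 0, 1, 2))
print(round(gap, 4))
```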
Chapter 6
Discrete Time Markov Chains
where

1. $\rho_{s_i}$, $s_i \in S$, is called the initial distribution:
$$\rho_{s_i} = P(X_0 = s_i) \quad \text{and} \quad \sum_{s \in S} \rho_s = 1;$$
2. $P$ is the transition matrix, with entries $[P]_{s,s'} =: p_{s,s'}$.
The Markov chain $(X_i)_{i \in \mathbb{N}}$ can be seen as representing the evolution of a system that moves from one state to another in a random fashion. We have assumed that $S \subset \mathbb{R}$, but in some situations it may be convenient to consider a general finite set $S$. In this case the $X_i$ are not random numbers, but random entities. However, what follows goes through without any change.
We show now that $p_{s,s'}$ is the probability to go from state $s$ to state $s'$. Moreover we show that the probability that $X_{r+1} = s'$ conditional on all the previous history $X_0 = s_0, \ldots, X_{r-1} = s_{r-1}, X_r = s$ depends just on $s$ and $s'$ and is equal to $p_{s,s'}$ (Markov property). Indeed:

$$\begin{aligned}
P(X_{r+1} = s' \mid X_r = s, X_{r-1} = s_{r-1}, \ldots, X_0 = s_0)
&= \frac{P(X_{r+1} = s', X_r = s, X_{r-1} = s_{r-1}, \ldots, X_0 = s_0)}{P(X_r = s, X_{r-1} = s_{r-1}, \ldots, X_0 = s_0)} \\
&= \frac{\rho_{s_0}\, p_{s_0,s_1} \cdots p_{s_{r-1},s}\, p_{s,s'}}{\rho_{s_0}\, p_{s_0,s_1} \cdots p_{s_{r-1},s}} \\
&= p_{s,s'},
\end{aligned}$$
where $0 < p < 1$. Boundary conditions are determined by the transition probabilities from states $a$ and $b$. Other boundary conditions can be considered: reflecting, mixed, … In the case $p = \frac{1}{2}$ we speak of a symmetric random walk.
Example 6.1.2 (Bernoulli-Laplace chain) Let us consider two urns $A$ and $B$, each containing $N$ balls. The balls are assumed to be identical apart from their colors: among them there are $N$ white balls and $N$ black balls. At each integer time we choose one ball from each urn and exchange them.

Let $X_i$ be the random number of white balls in $A$ at time $i$. The state space is $S = \{0, 1, \ldots, N\}$, and

$$p_{k,k+1} = P(\text{1 black ball from urn } A \text{ and 1 white ball from } B) = \frac{N - k}{N} \cdot \frac{N - k}{N} = \frac{(N - k)^2}{N^2}; \tag{6.2}$$

$$p_{k,k-1} = P(\text{1 white ball from } A \text{ and 1 black ball from } B) = \frac{k}{N} \cdot \frac{k}{N} = \frac{k^2}{N^2}. \tag{6.3}$$
The transition probabilities to other states are zero. This applies also to the cases $k = 0$ and $k = N$. The transition matrix is therefore:

$$P = \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
\frac{1}{N^2} & \frac{2(N-1)}{N^2} & \left(\frac{N-1}{N}\right)^2 & 0 & \cdots & 0 \\
0 & \frac{4}{N^2} & \frac{4(N-2)}{N^2} & \left(\frac{N-2}{N}\right)^2 & \cdots & 0 \\
 & & \ddots & \ddots & \ddots & \\
0 & \cdots & 0 & \cdots & 1 & 0
\end{pmatrix}.$$
By using the composite probability formula we can compute the probability for a Markov chain to go from state $s$ to state $s'$ in $n$ steps. Let $s_0, s_1, \ldots, s_{m-1}, s$ be a sequence of states such that $\rho_{s_0}\, p_{s_0,s_1}\, p_{s_1,s_2} \cdots p_{s_{m-1},s}$ is strictly positive. We have:

$$P(X_{m+n} = s' \mid X_m = s) = (P^n)_{s,s'}.$$

This probability does not depend on $m$, but just on $n$, that is, on the number of intermediate steps. It is obtained as the element with coordinates $s, s'$ of the $n$-th power of the transition matrix $P$. In the following we will use the common notation $p^{(n)}_{s,s'}$ for this probability:

$$p^{(n)}_{s,s'} := P(X_{m+n} = s' \mid X_m = s) = (P^n)_{s,s'}.$$
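Both the Bernoulli-Laplace matrix and the $n$-step probabilities $(P^n)_{s,s'}$ are easy to reproduce numerically. A sketch with plain Python lists (the choice $N = 5$ is ours):

```python
N = 5  # number of balls per urn (our choice for illustration)

# Bernoulli-Laplace transition matrix: p_{k,k+1} = ((N-k)/N)^2,
# p_{k,k-1} = (k/N)^2, and p_{k,k} = 2k(N-k)/N^2 so that rows sum to 1.
P = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):
    if k < N:
        P[k][k + 1] = ((N - k) / N) ** 2
    if k > 0:
        P[k][k - 1] = (k / N) ** 2
    P[k][k] = 2 * k * (N - k) / N ** 2

def mat_mul(A, B):
    # plain row-times-column matrix product
    return [[sum(A[i][j] * B[j][l] for j in range(len(B)))
             for l in range(len(B[0]))] for i in range(len(A))]

# n-step transition probabilities p^{(n)}_{s,s'} = (P^n)_{s,s'}, here n = 10
Pn = P
for _ in range(9):
    Pn = mat_mul(Pn, P)
print([round(x, 4) for x in Pn[0]])
```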
Let $(X_i)_{i \in \mathbb{N}}$ be a homogeneous Markov chain. We say that the state $s$ communicates with the state $s'$ if there exists $n > 0$ such that

$$p^{(n)}_{s,s'} > 0,$$

that is, if there exists a path $s, s_1, \ldots, s_{n-1}, s'$ such that all the transition probabilities $p_{s,s_1}, p_{s_1,s_2}, \ldots, p_{s_{n-1},s'}$ are strictly positive. We will use the notation $s \prec s'$ to indicate that $s$ communicates with $s'$.

Two states $s, s'$ are said to be equivalent if $s \prec s'$ and $s' \prec s$. This is an equivalence relation, i.e. it is reflexive, symmetric and transitive. The first two properties are evident. Transitivity follows from transitivity of communication. Assume that $s \prec s'$ and $s' \prec s''$. Then there are $n_1, n_2$ such that $p^{(n_1)}_{s,s'} > 0$ and $p^{(n_2)}_{s',s''} > 0$. It follows that $s \prec s''$. Indeed:

$$p^{(n_1 + n_2)}_{s,s''} = \left(P^{n_1 + n_2}\right)_{s,s''} = \sum_{s_1} p^{(n_1)}_{s,s_1}\, p^{(n_2)}_{s_1,s''} \ge p^{(n_1)}_{s,s'}\, p^{(n_2)}_{s',s''} > 0.$$
$$A^+_s = \{\, n > 0 \mid p^{(n)}_{s,s} > 0 \,\}.$$

If $A^+_s \ne \emptyset$, we define the period of $s$ as the greatest common divisor (GCD) of the elements of $A^+_s$. If the period of $s$ is $1$, we say that $s$ is an aperiodic state. For example, in the random walk on the interval $[a, b]$ with absorbing boundary conditions all states $s$ with $a < s < b$ have period $2$.

All states of an equivalence class have the same period. Therefore one can speak of the period of an equivalence class.
Let $s, s'$ be equivalent states with periods $q$ and $q'$ respectively, and let $n_1, n_2$ be such that $p^{(n_1)}_{s,s'} > 0$ and $p^{(n_2)}_{s',s} > 0$. Then $(n_1 + n_2) \in A^+_s$, since

$$p^{(n_1 + n_2)}_{s,s} = \sum_{s_1} p^{(n_1)}_{s,s_1}\, p^{(n_2)}_{s_1,s} \ge p^{(n_1)}_{s,s'}\, p^{(n_2)}_{s',s} > 0.$$

Similarly $(n_1 + n_2) \in A^+_{s'}$; hence $q$ and $q'$ both divide $(n_1 + n_2)$. Moreover for all $n \in A^+_{s'}$, $(n + n_1 + n_2) \in A^+_s$, since

$$p^{(n + n_1 + n_2)}_{s,s} \ge p^{(n_1)}_{s,s'}\, p^{(n)}_{s',s'}\, p^{(n_2)}_{s',s} > 0.$$

Hence $q$ and $q'$ divide $(n + n_1 + n_2)$ for all $n \in A^+_s$ and for all $n \in A^+_{s'}$. Since $n_1 + n_2$ is divisible by $q$ and by $q'$, it follows that $q$ and $q'$ are both common divisors of the elements of $A^+_s$ and of $A^+_{s'}$, so that

$$q = q'.$$
$$C = C_0 \cup C_1 \cup \cdots \cup C_{q-1}$$

with the property that if $s \in C_i$, $s' \in C_j$ and $p^{(n)}_{s,s'} > 0$, then

$$n \equiv (j - i) \pmod{q}.$$
and $\forall\, s' \in S$

$$\lim_{n \to +\infty} p^{(n)}_{s,s'} = \pi_{s'}.$$

This theorem can be used also in the case when the period $q$ is strictly larger than $1$, by considering the Markov chain with transition matrix $P^q$. Indeed, the restriction of this chain to each of the subsets $C_0, C_1, \ldots, C_{q-1}$ satisfies the hypothesis of the ergodic theorem.
The probability distribution $\Pi$ that appears in the statement of the ergodic theorem is an invariant (or stationary) distribution for the Markov chain: this means that if we take it as initial distribution, so that $P(X_0 = s) = \pi_s$ for every $s \in S$, then for every $s \in S$ and for every $n \ge 0$

$$P(X_n = s) = \pi_s.$$

Indeed,

$$\pi_s = P(X_1 = s) = \sum_{s' \in S} P(X_0 = s')\, p_{s',s} = \sum_{s' \in S} \pi_{s'}\, p_{s',s}.$$
Under the hypothesis of the ergodic theorem one can show that there is one and only one solution of this system of $|S| + 1$ equations in $|S|$ unknowns; one of the equations, in this case one of the first $|S|$ equations, is a linear combination of the others and therefore it can be skipped in the solution of the system:

$$\Pi^t = \Pi^t P, \qquad \sum_{s \in S} \pi_s = 1. \tag{6.4}$$

Proof Let us assume that $(\mu_s)_{s \in S}$ is another probability distribution on the state space satisfying system (6.4). We have

$$\mu^t = \mu^t P, \qquad \sum_{s \in S} \mu_s = 1,$$

where we have represented the distribution $(\mu_s)_{s \in S}$ as the $|S|$-dimensional column vector $\mu$. We have

$$\mu^t = \mu^t P \;\Longrightarrow\; \mu^t = \mu^t P = \mu^t P^2 = \cdots = \mu^t P^n,$$

therefore for $s \in S$

$$\mu_s = \sum_{s'} \mu_{s'}\, p_{s',s} = \sum_{s'} \mu_{s'}\, p^{(n)}_{s',s}.$$

Letting $n \to +\infty$ and using the ergodic theorem, we get

$$\mu_s = \sum_{s'} \mu_{s'}\, \pi_s = \pi_s \sum_{s'} \mu_{s'} = \pi_s.$$
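The ergodic theorem and the invariance equation $\pi_s = \sum_{s'} \pi_{s'} p_{s',s}$ can be illustrated by iterating $\mu^t \mapsto \mu^t P$ on a small chain. The matrix below is our own toy example, chosen aperiodic with all states communicating:

```python
# a small ergodic chain (aperiodic, all states communicating); values are ours
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def step(mu, P):
    # one application of mu^t -> mu^t P
    return [sum(mu[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

mu = [1.0, 0.0, 0.0]  # start concentrated in state 0
for _ in range(200):
    mu = step(mu, P)

pi = mu
# invariance check: pi_s should equal sum_{s'} pi_{s'} p_{s',s}
residual = max(abs(pi[j] - step(pi, P)[j]) for j in range(3))
print([round(x, 4) for x in pi], residual)
```

Whatever the initial distribution, the iteration converges to the same limit, in agreement with the uniqueness proof above.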
Chapter 7
Continuous Time Markov Chains
7.1 Introduction
In this chapter we shall introduce some simple queueing systems. For further reading,
we refer to [7, 8].
A queueing system can be described in terms of servers and a flow of clients
who access servers and are served according to some pre-established rules. The
clients after service can either stay in the system or leave it, also according to some
established rules.
The simplest case is when there is a single set of servers and a flow of clients accessing it. If there is at least one free server, an incoming client is served right away. Otherwise, i.e. if all servers are engaged, he is put in a queue and waits for his turn. Once a client is served, he leaves the system.
Usual hypotheses are that service times are stochastically independent, identically
distributed, and moreover that they are stochastically independent from the flow of
clients’ arrivals. One would like to obtain the probabilities that, at given times, there
are some numbers of clients in the system. For this one needs to introduce a random
number for each time t; this leads us to introduce the notion of stochastic process.
1. M denotes the Poisson process for the flow of incoming clients or exponential
distribution for service times;
2. Er denotes the Erlang distribution with parameter r for the inter-arrival times
of clients (that are supposed to be stochastically independent and identically
distributed) or for service times. The Erlang distribution with parameter r is the
distribution of a sum of r stochastically independent exponential random numbers
with the same parameter;
3. D denotes deterministic (non-random) inter-arrival times or service times;
4. G indicates that one does not make any particular hypothesis on the inter-arrival
times or service times (that however are always assumed to be stochastically
independent).
A process of the type we have described will be indicated by three symbols separated
by two slashes. The first symbol refers to the distribution of inter-arrival times, always
assumed to be stochastically independent and identically distributed (i.i.d.). The
second symbol refers to the distribution of service times. The third symbol indicates
the number of servers; it can possibly take the value ∞.
We shall consider three examples of queueing systems, namely the systems M/M/1, M/M/n with $n > 1$ and M/M/∞. Before that we shall discuss continuous time Markov chains with countable state space, and in particular introduce the Poisson process $N_t$, $t \ge 0$, which for these queueing systems represents the number of clients who entered the system before time $t$.
$$P(X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_n} = s_n) = \rho_{s_0}\, p_{s_0,s_1}(t_1)\, p_{s_1,s_2}(t_2 - t_1) \cdots p_{s_{n-1},s_n}(t_n - t_{n-1}).$$

$$\Pi(t + t') = \Pi(t)\, \Pi(t') \quad \forall\, t, t' \ge 0,$$

or explicitly:

$$p_{s,s''}(t + t') = \sum_{s'} p_{s,s'}(t)\, p_{s',s''}(t').$$
In order to treat interesting examples, such as those arising from queueing theory, we need to consider the case of denumerably infinite state spaces. In this case $\Pi(t)$ is a matrix with infinitely many rows and columns, with non-negative entries, such that the sum of the series of the elements of each row is equal to 1.
The product of two matrices of this kind can be defined according to the usual row
times column rule, where the finite sum is replaced by a series. It is easy to check
that the result is still a matrix of this kind.
In the discrete time case the transition probabilities in several steps can be obtained from those in one step. Analogously, in the continuous time case the transition probabilities in a finite time $t$ can be obtained starting from their behavior as $t$ becomes infinitesimally small. The simplest case is the Poisson process.
A Poisson process is a continuous time Markov chain with state space $S = \mathbb{N}$. In the following we shall use a Poisson process as a model for the flow of clients entering a queueing system. For the quantities that we shall consider, the order in which clients are served does not matter. A Poisson process $N = (N_t)_{t \ge 0}$ with parameter $\lambda$, where $\lambda > 0$, is characterized by the following behavior for small $h > 0$: $p_{s,s+1}(h) = \lambda h + o(h)$, $p_{s,s}(h) = 1 - \lambda h + o(h)$ and $p_{s,s'}(h) = o(h)$ for $s' \ne s, s + 1$.

We put:

$$\mu_s(t) = p_{0,s}(t) \quad \text{for } s \in \mathbb{N}$$

and denote by $\mu_s'$ the first derivative of $\mu_s$. The functions $\mu_s$ verify the system of equations:

$$\begin{cases} \mu_0'(t) = -\lambda \mu_0(t) \\ \mu_s'(t) = -\lambda \mu_s(t) + \lambda \mu_{s-1}(t) & \text{for } s \ge 1, \end{cases} \tag{7.1}$$
as we now show. Consider for $s > 0$ the incremental ratio $\frac{\mu_s(t+h) - \mu_s(t)}{h}$ for $h > 0$. We have:

$$\begin{aligned}
\frac{\mu_s(t+h) - \mu_s(t)}{h}
&= \frac{p_{0,s}(t+h) - p_{0,s}(t)}{h} \\
&= \frac{\sum_j p_{0,j}(t)\, p_{j,s}(h) - p_{0,s}(t)}{h} \\
&= \frac{1}{h}\Big( (1 - \lambda h + o(h))\, p_{0,s}(t) + (\lambda h + o(h))\, p_{0,s-1}(t) - p_{0,s}(t) \Big) + \frac{1}{h} \sum_{\substack{j \\ j \ne s,\, j \ne s-1}} p_{0,j}(t)\, p_{j,s}(h) \\
&= -\lambda\, p_{0,s}(t) + \lambda\, p_{0,s-1}(t) + \frac{o(h)}{h} = -\lambda\, \mu_s(t) + \lambda\, \mu_{s-1}(t) + \frac{o(h)}{h},
\end{aligned}$$
where we have used the notation for the derivative, since it is easy to show that it exists. For $s = 0$, we obtain for $h > 0$:

$$\frac{\mu_0(t+h) - \mu_0(t)}{h} = -\lambda\, \mu_0(t) + \frac{o(h)}{h}.$$

Solving the system one finds $\mu_s(t) = e^{-\lambda t} \frac{(\lambda t)^s}{s!}$, i.e. for each $t$ the random number $N_t$ has Poisson distribution with parameter $\lambda t$.

If we take $\rho_{\bar{s}} = 1$ and $\rho_s = 0$ for $s \ne \bar{s}$, i.e. assume that $P(N_0 = \bar{s}) = 1$ for some arbitrary state $\bar{s}$, then we obtain the transition probabilities starting from $\bar{s}$:

$$\begin{cases} p_{\bar{s},s}(t) = 0 & \text{for } s < \bar{s}, \\[4pt] p_{\bar{s},s}(t) = e^{-\lambda t}\, \dfrac{(\lambda t)^{s - \bar{s}}}{(s - \bar{s})!} & \text{for } s \ge \bar{s}. \end{cases} \tag{7.2}$$
Let us prove that (7.2) provides a solution for the system with initial state $\bar{s}$. Let us consider the generating function:

$$\Phi(z, t) = \sum_s p_{\bar{s},s}(t)\, z^s.$$

We differentiate $\Phi(z, t)$ with respect to $t$. It is easy to see that the derivative can be exchanged with the series. By applying the system of equations for $\mu_s(t) = p_{\bar{s},s}(t)$, we obtain

$$\frac{\partial}{\partial t} \Phi(z, t) = \sum_s \mu_s'(t)\, z^s = -\lambda \sum_{s=0}^{\infty} \mu_s(t)\, z^s + \lambda \sum_{s=1}^{\infty} \mu_{s-1}(t)\, z^s = \lambda(z - 1)\, \Phi(z, t).$$

Therefore

$$\frac{1}{\Phi(z, t)} \frac{\partial}{\partial t} \Phi(z, t) = \frac{\partial}{\partial t} \log \Phi(z, t) = \lambda(z - 1),$$

so that

$$\log \Phi(z, t) = \lambda(z - 1)\,t + K,$$

that is, $\Phi(z, t) = e^K e^{\lambda(z-1)t}$. Since $\mu_{\bar{s}}(0) = 1$ and $\mu_s(0) = 0$ for $s \ne \bar{s}$, we have $\Phi(z, 0) = z^{\bar{s}}$. We have therefore:

$$\Phi(z, t) = z^{\bar{s}}\, e^{\lambda(z-1)t} = e^{-\lambda t}\, z^{\bar{s}}\, e^{\lambda z t} = e^{-\lambda t} \sum_k \frac{(\lambda t)^k}{k!}\, z^{\bar{s} + k}.$$
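As a cross-check of this computation, one can integrate the forward equations (7.1) numerically (a plain Euler scheme; the step size and the parameters are our choice) and compare with the Poisson probabilities of parameter $\lambda t$:

```python
from math import exp, factorial

lam, t_end, steps, S = 1.5, 2.0, 20000, 30
dt = t_end / steps
mu = [0.0] * (S + 1)
mu[0] = 1.0  # start from state 0, i.e. mu_s(0) = p_{0,s}(0)

# Euler integration of the forward equations:
# mu_0' = -lam*mu_0,  mu_s' = -lam*mu_s + lam*mu_{s-1}
for _ in range(steps):
    new = [mu[0] - dt * lam * mu[0]]
    for s in range(1, S + 1):
        new.append(mu[s] + dt * (-lam * mu[s] + lam * mu[s - 1]))
    mu = new

# the claimed solution (7.2) with initial state 0: Poisson of parameter lam*t
poisson = [exp(-lam * t_end) * (lam * t_end) ** s / factorial(s)
           for s in range(S + 1)]
gap = max(abs(a - b) for a, b in zip(mu, poisson))
print(round(gap, 6))
```

The discrepancy is of the order of the Euler step, confirming that the Poisson probabilities solve the system.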
Fig. 7.1 Scheme of the Poisson process
It follows that $p_{\bar{s},s}(t) = 0$ for $s < \bar{s}$ and $p_{\bar{s},s}(t) = \frac{(\lambda t)^{s - \bar{s}}}{(s - \bar{s})!}\, e^{-\lambda t}$ for $s \ge \bar{s}$. The Poisson process is non-decreasing with probability 1. It can be represented as in Fig. 7.1, where an arrow connecting two states with label $\lambda$ indicates that the transition intensity from one state to the other is equal to $\lambda$. We observe that an arrow enters and an arrow exits every state $s$ with $s \ge 1$. These two arrows, one incoming and one outgoing, correspond to two terms, one with plus sign and one with minus sign, on the right-hand side of the differential equation. For $s = 0$ there is just an outgoing arrow, corresponding to the single term, with minus sign, on the right-hand side of the differential equation.
If we indicate with $P_s(t) = P(N_t = s)$ the probability that the Poisson process at time $t$ is in the state $s$, then we have

$$P_s(t) = \sum_{\bar{s} \in \mathbb{N}} \rho_{\bar{s}}\, p_{\bar{s},s}(t),$$

where $\rho_{\bar{s}}$ is the initial distribution. It follows that for every initial distribution the functions $(P_s(t))_{s \in \mathbb{N}}$ satisfy the same system of differential equations:

$$\begin{cases} P_0'(t) = -\lambda P_0(t) \\ P_s'(t) = -\lambda P_s(t) + \lambda P_{s-1}(t) & \text{for } s \ge 1. \end{cases}$$

The functions $(p_{\bar{s},s}(t))_{s \in \mathbb{N}}$ can be considered as particular cases in which $\rho_{\bar{s}} = 1$ and $\rho_s = 0$ for $s \ne \bar{s}$.
We now consider some examples of continuous time Markov chains that serve as
models of queueing processes. As we have said in Sect. 7.1, in queueing theory there
is a symbolic notation to indicate the type of a queueing system. In the examples we consider, the flow of incoming clients follows a Poisson process with parameter λ.
Clients who find a free server start a service time and after service leave the system.
When an arriving client finds all servers engaged, he is put in a queue. When a server
becomes free, if there are clients waiting in queue, one of them starts its service time.
For what we are interested in, the order in which clients access the service does
not matter; we can assume, for example, that the order is randomly chosen, but
other possible choices would not change the results. We assume that service times are stochastically independent and identically distributed with exponential distribution.

7.5 M/M/∞ Queueing Systems
We consider an idealized situation in which there are infinitely many servers. The
flow of arrivals is ruled by a Poisson process with parameter λ and service times are
exponentially distributed with parameter μ.
Let X = (X t )t≥0 be the process indicating the number of clients who are in the
system at time t. As initial distribution we assume that:
P(X 0 = 0) = 1 ,
P(X 0 = i) = 0 for i > 0,
i.e. no client is present in the system at time 0. As stated in the previous section, service times are stochastically independent among themselves and from the arrivals' process. In order to compute the intensity of the service process, we compute the probability that a client is served in the time interval $(t, t+h)$, given that he has not been served up to time $t$. If $T$ is the service time of a client, we have:
$$P(T \le t + h \mid T > t) = \frac{P(t < T \le t + h)}{P(T > t)} = \frac{e^{-\mu t} - e^{-\mu(t+h)}}{e^{-\mu t}} = 1 - e^{-\mu h} = 1 - (1 - \mu h + o(h)) = \mu h + o(h),$$
where we have used the first order expansion of the exponential, $e^{-\mu h} = 1 - \mu h + o(h)$ for small $h$. Assume that there are $n$ clients in the system. If none of them has been served up to time $t$, the probability that at least one of them is served in the time interval $(t, t+h)$ is then:
$$P\big(\min(T_1, \ldots, T_n) \le t + h \mid T_i > t,\ i = 1, \ldots, n\big) = 1 - (1 - \mu h + o(h))^n = n\mu h + o(h),$$
where T1 , . . . , Tn denote the service times of the clients and we have used the fact
that they are stochastically independent and identically distributed. Therefore a client
exits the system with an intensity which is proportional to the number of clients
present in the system. The process can be represented as in Fig. 7.2.
Putting $p_{0,s}(t) = \mu_s(t)$, we can write the forward Kolmogorov equations by using the rule described in Sect. 7.3:

$$\begin{cases} \mu_0'(t) = \mu\, \mu_1(t) - \lambda\, \mu_0(t) \\ \mu_i'(t) = -(\lambda + i\mu)\, \mu_i(t) + \lambda\, \mu_{i-1}(t) + (i+1)\mu\, \mu_{i+1}(t) & \text{for } i \ge 1. \end{cases}$$

Looking for a stationary solution, i.e. setting $\mu_i'(t) = 0$ and writing $p_i$ for $\mu_i(t)$, and adding up the equations up to the $i$-th one, we obtain the recursive formula:

$$p_i = \frac{\lambda}{i\mu}\, p_{i-1} = \frac{1}{i!} \left(\frac{\lambda}{\mu}\right)^i p_0.$$

By imposing the condition $\sum_{i=0}^{+\infty} p_i = 1$, we obtain:

$$\sum_{i=0}^{+\infty} \frac{1}{i!} \left(\frac{\lambda}{\mu}\right)^i p_0 = 1.$$
Since $\sum_{i=0}^{+\infty} \frac{1}{i!} \left(\frac{\lambda}{\mu}\right)^i = e^{\lambda/\mu}$, we get

$$p_0 = e^{-\lambda/\mu} \quad \text{and} \quad p_i = e^{-\lambda/\mu}\, \frac{1}{i!} \left(\frac{\lambda}{\mu}\right)^i,$$

which is the Poisson distribution with parameter $\frac{\lambda}{\mu}$. We come to the conclusion that for the M/M/∞ queueing system a stationary distribution exists for all values of $\lambda$ and $\mu$.
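A quick numerical check that the Poisson distribution of parameter $\lambda/\mu$ indeed satisfies the stationary equations of the M/M/∞ system (the parameter values are our choice):

```python
from math import exp, factorial

lam, mu = 3.0, 2.0
# candidate stationary distribution: Poisson of parameter lam/mu
p = [exp(-lam / mu) * (lam / mu) ** i / factorial(i) for i in range(60)]

# stationary equations: 0 = mu*p1 - lam*p0 and, for i >= 1,
# 0 = -(lam + i*mu)*p_i + lam*p_{i-1} + (i+1)*mu*p_{i+1}
res0 = mu * p[1] - lam * p[0]
res = max(abs(-(lam + i * mu) * p[i] + lam * p[i - 1] + (i + 1) * mu * p[i + 1])
          for i in range(1, 58))
print(res0, res)
```

Both residuals vanish up to floating-point error, as the algebra above predicts.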
Also for M/M/1 service times are assumed to be stochastically independent and
identically distributed with exponential distribution with parameter μ. The arrival
flow of clients is ruled by a Poisson process with parameter λ which is stochastically
independent from service times.
For this system there is just one server. Therefore the intensity with which a client exits the system is equal to $\mu$, independently of the number of clients present in the system. The M/M/1 queueing system can be graphically represented as shown in Fig. 7.3.
The system of differential equations for the functions $\mu_s(t) = p_{\bar{s},s}(t)$, where $\bar{s}$ is some fixed state, is then:

$$\begin{cases} \mu_0'(t) = \mu\, \mu_1(t) - \lambda\, \mu_0(t) \\ \mu_i'(t) = -(\lambda + \mu)\, \mu_i(t) + \lambda\, \mu_{i-1}(t) + \mu\, \mu_{i+1}(t) & \text{for } i \ge 1. \end{cases}$$

Also in this case we look for a stationary solution, i.e. such that $\mu_i'(t) = 0$ for $i \in \mathbb{N}$, with $\mu_i(t) = p_i$, where $(p_i)$ is a probability distribution. We obtain then the system of linear equations:

$$\begin{cases} 0 = \mu\, p_1 - \lambda\, p_0 \\ 0 = -(\lambda + \mu)\, p_i + \lambda\, p_{i-1} + \mu\, p_{i+1} & \text{for } i \ge 1 \\ \sum_{i=0}^{+\infty} p_i = 1. \end{cases}$$
Fig. 7.3 Scheme of an M/M/1 queueing system
From this system, by adding up the first $n$ equations, we obtain the recursive relation

$$p_n = \frac{\lambda}{\mu}\, p_{n-1} = \left(\frac{\lambda}{\mu}\right)^n p_0.$$

By imposing the condition $\sum_{i=0}^{+\infty} p_i = 1$, we obtain

$$\sum_{i=0}^{\infty} \left(\frac{\lambda}{\mu}\right)^i p_0 = 1.$$

This series is convergent if $\frac{\lambda}{\mu} < 1$. In this case we get

$$p_0 = 1 - \frac{\lambda}{\mu}.$$
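Together with the recursion this gives $p_i = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^i$, a geometric distribution. A sketch verifying the balance equations and the total mass (the parameter values are ours):

```python
lam, mu = 2.0, 5.0
rho = lam / mu  # must be < 1 for a stationary distribution to exist
p = [(1 - rho) * rho ** i for i in range(200)]

# stationary equations: 0 = mu*p1 - lam*p0 and, for i >= 1,
# 0 = -(lam + mu)*p_i + lam*p_{i-1} + mu*p_{i+1}
res0 = mu * p[1] - lam * p[0]
res = max(abs(-(lam + mu) * p[i] + lam * p[i - 1] + mu * p[i + 1])
          for i in range(1, 198))
mass = sum(p)
print(res0, res, round(mass, 12))
```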
We finally consider M/M/n queueing systems with $n \ge 2$, i.e. with a finite number of servers larger than 1. From considerations similar to those developed for the other cases we obtain the following system of equations for the transition probabilities $\mu_s(t) = p_{\bar{s},s}(t)$, where $\bar{s}$ is some fixed state:

$$\begin{cases}
\mu_0'(t) = \mu\, \mu_1(t) - \lambda\, \mu_0(t) \\
\mu_1'(t) = -(\lambda + \mu)\, \mu_1(t) + \lambda\, \mu_0(t) + 2\mu\, \mu_2(t) \\
\quad \cdots \\
\mu_{n-1}'(t) = -(\lambda + (n-1)\mu)\, \mu_{n-1}(t) + \lambda\, \mu_{n-2}(t) + n\mu\, \mu_n(t) \\
\mu_n'(t) = -(\lambda + n\mu)\, \mu_n(t) + \lambda\, \mu_{n-1}(t) + n\mu\, \mu_{n+1}(t) \\
\mu_{n+1}'(t) = -(\lambda + n\mu)\, \mu_{n+1}(t) + \lambda\, \mu_n(t) + n\mu\, \mu_{n+2}(t) \\
\quad \ldots,
\end{cases}$$
Fig. 7.4 Scheme of an M/M/n queueing system with initial state in 0
where λ and μ are, as in previous cases, respectively the parameter of the Poisson
process ruling the arrival of clients and of the exponential distribution of service
times. The system is graphically represented in Fig. 7.4.
Let us now look for the stationary distribution by imposing $\mu_i'(t) = 0$ for all $i \in \mathbb{N}$. If we denote $p_i \equiv \mu_i(t)$, we obtain the following system of linear equations:

$$\begin{cases}
0 = \mu\, p_1 - \lambda\, p_0 \\
0 = 2\mu\, p_2 - \lambda\, p_1 \\
\quad \cdots \\
0 = (n-1)\mu\, p_{n-1} - \lambda\, p_{n-2} \\
0 = n\mu\, p_n - \lambda\, p_{n-1} \\
0 = n\mu\, p_{n+1} - \lambda\, p_n \\
\quad \cdots \\
\sum_{i=0}^{+\infty} p_i = 1.
\end{cases}$$
$$p_i = \frac{\lambda}{i\mu}\, p_{i-1} \quad \text{for } i = 1, \ldots, n; \qquad p_i = \frac{\lambda}{n\mu}\, p_{i-1} \quad \text{for } i \ge n + 1.$$

Therefore we have:

$$p_i = \left(\frac{\lambda}{\mu}\right)^i \frac{1}{i!}\, p_0 \quad \text{for } i = 0, \ldots, n, \qquad p_i = \left(\frac{\lambda}{\mu}\right)^i \frac{1}{n!\, n^{i-n}}\, p_0 \quad \text{for } i \ge n + 1.$$
The condition for the existence of a stationary distribution is

$$\sum_{i=0}^{n-1} \left(\frac{\lambda}{\mu}\right)^i \frac{1}{i!} + \sum_{i=n}^{\infty} \left(\frac{\lambda}{\mu}\right)^i \frac{1}{n!\, n^{i-n}} < +\infty.$$

The first term on the left-hand side is a finite sum. The series in the second term can be rewritten, by putting $j = i - n$, as

$$\frac{1}{n!} \left(\frac{\lambda}{\mu}\right)^n \sum_{j=0}^{\infty} \left(\frac{\lambda}{n\mu}\right)^j.$$
The condition of convergence is therefore $\frac{\lambda}{n\mu} < 1$, i.e. $\lambda < n\mu$. This result answers the problem of how many servers are needed, for a queueing system with some fixed Poisson flow of incoming clients, so that the queue stabilizes (i.e. so that a stationary distribution exists). For $\lambda < n\mu$ we have:

$$p_0 = \left[ \sum_{i=0}^{n-1} \left(\frac{\lambda}{\mu}\right)^i \frac{1}{i!} + \frac{1}{n!} \left(\frac{\lambda}{\mu}\right)^n \frac{1}{1 - \frac{\lambda}{n\mu}} \right]^{-1} \tag{7.3}$$

$$p_i = \left(\frac{\lambda}{\mu}\right)^i \frac{1}{i!}\, p_0 \quad \text{for } i = 1, \ldots, n, \tag{7.4}$$

$$p_i = \left(\frac{\lambda}{\mu}\right)^i \frac{1}{n!\, n^{i-n}}\, p_0 \quad \text{for } i \ge n + 1. \tag{7.5}$$
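Formulas (7.3)-(7.5) can be packaged as a small function; the check below verifies that the resulting probabilities sum to 1 and that for $n = 1$ they reduce to the M/M/1 case. The parameter values and the truncation `kmax` are our choices:

```python
from math import factorial

def mmn_stationary(lam, mu, n, kmax=200):
    # p0 from (7.3), then p_i from (7.4)-(7.5); requires lam < n*mu
    assert lam < n * mu
    a = lam / mu
    p0 = 1.0 / (sum(a ** i / factorial(i) for i in range(n))
                + a ** n / factorial(n) / (1 - lam / (n * mu)))
    return [p0 * a ** i / factorial(i) if i <= n
            else p0 * a ** i / (factorial(n) * n ** (i - n))
            for i in range(kmax)]

p = mmn_stationary(lam=4.0, mu=1.0, n=6)
print(round(sum(p), 9))
```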
For the Markov queueing systems introduced in the previous sections, the existence of an invariant distribution allows us to consider a stationary regime for the process $X$ representing the number of clients present in the system. In the stationary regime the probabilistic characteristics of the process do not vary in time. The stationary regime is obtained by taking as initial distribution the stationary distribution.
It can be shown that, when a stationary distribution exists, these queueing systems evolve towards the stationary regime and, moreover, temporal averages of observables tend, as the length of the time interval tends to infinity, to the expectations of the observables computed in the stationary regime. All this should be precisely stated and supported with proofs; we limit ourselves to accepting it and reasoning at an intuitive level. We now consider some quantities, or observables, which are relevant for the study of queueing systems and their efficiency, and establish some useful relations. From now on we shall always refer to queueing systems in stationary regime.
In order to evaluate the efficiency of a queueing system, we introduce the utilization factor $\rho$. This quantity is defined as the clients' average arrival rate $\lambda$ times the average service time $\bar{T}$, divided by the number $m$ of servers. It can be shown that the utilization factor is equal to the average percentage of utilization of the servers. For a non-deterministic system in stationary regime it is known that $\rho < 1$ (see also [12]), i.e. that with probability one the servers do not work full time. A server will be free for a positive percentage of time. Other interesting quantities are:
1. the average number L of clients present in the system;
2. the average number L q of clients waiting in queues;
3. the average time W that a client spends in the system;
4. the average time Wq that a client spends waiting in queues.
The last two quantities are related by the equation

$$W = W_q + \bar{T}.$$

We have therefore

$$L = \sum_{k=1}^{\infty} k\, p_k = \sum_{k=1}^{\infty} k \left(\frac{\lambda}{\mu}\right)^k \left(1 - \frac{\lambda}{\mu}\right) = \frac{\lambda}{\mu - \lambda}$$

and

$$L_q = \sum_{k=2}^{\infty} (k-1)\, p_k = \sum_{k=1}^{\infty} (k-1) \left(\frac{\lambda}{\mu}\right)^k \left(1 - \frac{\lambda}{\mu}\right) = \frac{\lambda^2}{\mu(\mu - \lambda)}.$$

$$W = \frac{1}{\mu - \lambda}, \qquad W_q = \frac{\lambda}{\mu(\mu - \lambda)},$$
that satisfy the equation $W = W_q + \bar{T}$, where $\bar{T} = \frac{1}{\mu}$ (the expectation of the exponential distribution with parameter $\mu$).
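The quantities above also satisfy $L = \lambda W$ and $L_q = \lambda W_q$ (Little's formulas, which give the section its title). A direct numerical check on the M/M/1 expressions (the parameter values are ours):

```python
lam, mu = 3.0, 4.0  # must satisfy lam < mu for a stationary regime

# M/M/1 stationary-regime quantities from the formulas above
L = lam / (mu - lam)
Lq = lam ** 2 / (mu * (mu - lam))
W = 1 / (mu - lam)
Wq = lam / (mu * (mu - lam))
Tbar = 1 / mu

# Little's formulas L = lam*W, Lq = lam*Wq, and the relation W = Wq + Tbar
print(L - lam * W, Lq - lam * Wq, W - (Wq + Tbar))
```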
In this case the utilization factor is $\frac{\lambda}{\mu}$. We observe that, as $\rho$ tends to 1, the average number of clients present in the system and waiting in the queue, as well as the average time spent by a client in the system, all tend to infinity. This is a general characteristic of random queueing systems. If one tries to increase the utilization factor, one has to pay the price of an increase in the number of clients in the queue and in their typical waiting times. The value 1 for the utilization factor is not reachable by a random queueing system in stationary regime, but it can be obtained by a deterministic system with one server where clients arrive at regular time intervals equal to the service time.
Chapter 8
Statistics
We now introduce some basic notions in Bayesian statistics. For further reading, we
refer to [5, 9, 10].
Assume that we know the value $x_i$ of some characteristic, for example the height, for every individual $i = 1, \ldots, N$ of a population. We can then build up a cumulative distribution function $F(x)$ defined by

$$F(x) = \frac{\#\{\, i \mid x_i \le x \,\}}{N}.$$

$F(x)$ can be interpreted as the c.d.f. of a random number $X$, where $X$ is the height of an individual randomly chosen from the population (every individual is chosen with equal probability $\frac{1}{N}$). Some relevant quantities can be extracted from $F(x)$, such as the expectation, the variance, the median and others.
$F(x)$ (called the empirical c.d.f.) will always be of discrete type, but for large $N$ it is possible that it is well approximated by an absolutely continuous c.d.f. Similarly, for two quantities $x_i, y_i$, for example the height and weight of each individual, we can obtain the joint c.d.f. $F(x, y)$ defined by

$$F(x, y) = \frac{\#\{\, i \mid x_i \le x,\ y_i \le y \,\}}{N}.$$

$F(x, y)$ is the joint c.d.f. of the random vector $(X, Y)$, where $X$ and $Y$ are respectively the height and the weight of a randomly chosen individual of the population. Also in this case relevant indices such as the covariance, the correlation coefficient, etc. can be extracted from $F(x, y)$. The study of empirical c.d.f.'s is part of descriptive statistics and is obviously related to the study of probability distributions.
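The empirical c.d.f. is straightforward to compute. A minimal sketch (the toy population values are ours):

```python
def empirical_cdf(xs):
    # F(x) = #{i : x_i <= x} / N, the empirical c.d.f. of the population
    n = len(xs)
    def F(x):
        return sum(1 for v in xs if v <= x) / n
    return F

heights = [170, 165, 180, 175, 165, 172]  # toy population, values are ours
F = empirical_cdf(heights)
print(F(164), F(165), F(200))
```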
Often the data about the entire population we are interested in are not available. In this case one tries to form an evaluation of the distributions of quantities in the whole population starting from results obtained by sampling (that is, by randomly extracting a subset of individuals of the population). These methods are part of what is called statistical inference or statistical induction. In the Bayesian approach, which we shall follow in this chapter, they are an application of Bayes' formula and are therefore part of probability theory. We deal here just with a few relevant examples in which a model based on some distribution is assumed to be fixed and one makes inference on one or a certain number of unknown parameters, which in the Bayesian approach are treated as random numbers.
$$f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}$$

and

$$f(x, y) = f_{X|Y}(x|y)\, f_Y(y),$$

we get

$$f_{Y|X}(y|x) = f_Y(y)\, \frac{f_{X|Y}(x|y)}{f_X(x)}.$$

This formula is applied to statistical inference in the Bayesian approach that will be treated in the following sections.
$$P(E_i = 1 \mid \Theta = \theta) = \theta$$

$$P(E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n \mid \Theta = \theta) = \prod_{i=1}^n P(E_i = \varepsilon_i \mid \Theta = \theta) = \theta^{\varepsilon_1 + \cdots + \varepsilon_n} (1 - \theta)^{n - \varepsilon_1 - \cdots - \varepsilon_n}.$$

Let $\Theta$ have an a priori probability density. We want to find out how the distribution of $\Theta$ changes after $n$ experiments are performed. Assume that the results are $E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n$. The conditional density of $\Theta$ given $E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n$ is denoted by

$$\pi_n(\theta \mid E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n)$$

and is called the a posteriori density. By the composite probability law, applied for $0 \le a < b \le 1$ to the event $\{a < \Theta \le b\}$, we have

$$\pi_n(\theta \mid E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n) = \frac{1}{c}\, \pi_0(\theta)\, \theta^{\varepsilon_1 + \cdots + \varepsilon_n} (1 - \theta)^{n - \varepsilon_1 - \cdots - \varepsilon_n}$$

for $0 \le \theta \le 1$, where

$$c = P(E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n) = \int_0^1 \theta^{\varepsilon_1 + \cdots + \varepsilon_n} (1 - \theta)^{n - \varepsilon_1 - \cdots - \varepsilon_n}\, \pi_0(\theta)\, d\theta.$$

If the a priori distribution of $\Theta$ is the Beta distribution with parameters $\alpha, \beta$, the a posteriori distribution is again of Beta type, with parameters

$$\alpha' = \alpha + \sum_{i=1}^n \varepsilon_i \quad \text{and} \quad \beta' = \beta + n - \sum_{i=1}^n \varepsilon_i,$$

where $\sum_{i=1}^n \varepsilon_i$ and $n - \sum_{i=1}^n \varepsilon_i$ are respectively the number of events that have and have not taken place. Therefore

$$\pi_n(\theta \mid E_1 = \varepsilon_1, \ldots, E_n = \varepsilon_n) = \begin{cases} \dfrac{\Gamma(\alpha' + \beta')}{\Gamma(\alpha')\, \Gamma(\beta')}\, \theta^{\alpha' - 1} (1 - \theta)^{\beta' - 1} & \theta \in [0, 1], \\[6pt] 0 & \text{otherwise.} \end{cases}$$
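The Beta update above amounts to adding the observed numbers of successes and failures to the prior parameters. A minimal sketch (the outcome values are ours):

```python
def beta_update(alpha, beta, outcomes):
    # a posteriori Beta parameters: alpha' = alpha + #successes,
    # beta' = beta + #failures
    k = sum(outcomes)
    return alpha + k, beta + len(outcomes) - k

# uniform a priori = Beta(1, 1); outcomes of n Bernoulli trials
alpha_p, beta_p = beta_update(1.0, 1.0, [1, 0, 1, 1, 0, 1])
post_mean = alpha_p / (alpha_p + beta_p)  # expectation of Beta(alpha', beta')
print(alpha_p, beta_p, post_mean)
```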
$$\begin{aligned}
\pi_n(\theta \mid x_1, \ldots, x_n) :&= K\, \pi_0(\theta) \prod_{i=1}^n f(x_i \mid \theta) \\
&= K \exp\left( -\frac{(\theta - \mu_0)^2}{2\sigma_0^2} \right) \exp\left( -\sum_{i=1}^n \frac{(x_i - \theta)^2}{2\sigma^2} \right) \\
&= K \exp\left( -\frac{1}{2} \left[ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right) \theta^2 - 2\theta \left( \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma^2} \right) \right] \right) \\
&= K \exp\left( -\frac{1}{2}\, \frac{(\theta - m_n)^2}{\sigma_n^2} \right),
\end{aligned}$$

where

$$m_n = \frac{\dfrac{\mu_0}{\sigma_0^2} + \dfrac{\sum_{i=1}^n x_i}{\sigma^2}}{\dfrac{1}{\sigma_0^2} + \dfrac{n}{\sigma^2}}, \qquad \sigma_n^2 = \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right)^{-1}$$

and $K$ is the normalizing constant. If $\bar{x}$ denotes the sample average $\bar{x} = \frac{x_1 + \cdots + x_n}{n}$, the a posteriori distribution of $\Theta$ is Gaussian:

$$N\left( \frac{\mu_0 \sigma_0^{-2} + \bar{x}\, n \sigma^{-2}}{\sigma_0^{-2} + n \sigma^{-2}},\ \frac{1}{\sigma_0^{-2} + n \sigma^{-2}} \right).$$
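The updating formulas for $m_n$ and $\sigma_n^2$ translate directly into code. A sketch (the parameter and data values are ours):

```python
def normal_mean_posterior(mu0, sigma0_sq, sigma_sq, xs):
    # m_n and sigma_n^2 for Gaussian data with known variance sigma_sq
    # and Gaussian a priori N(mu0, sigma0_sq) on the expectation
    n = len(xs)
    prec = 1 / sigma0_sq + n / sigma_sq          # posterior precision
    m_n = (mu0 / sigma0_sq + sum(xs) / sigma_sq) / prec
    return m_n, 1 / prec

m_n, s_n2 = normal_mean_posterior(mu0=0.0, sigma0_sq=1.0, sigma_sq=4.0,
                                  xs=[2.0, 2.0, 2.0, 2.0])
print(m_n, s_n2)
```

Note how the posterior mean is a precision-weighted average of the prior mean and the data, and the posterior variance shrinks as $n$ grows.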
$$\prod_{i=1}^n f(x_i \mid \varphi) = K\, \varphi^{\frac{n}{2}} \exp\left( -\frac{n S^2 \varphi}{2} \right),$$

where

$$S^2 := \frac{\sum_{i=1}^n (x_i - \mu)^2}{n}$$

is the average of the squares of the deviations of the $x_i$'s from $\mu$. If we assume that the a priori distribution of $\Phi$ is $\Gamma(\alpha_0, \lambda_0)$, then the a posteriori density of $\Phi$, given that $X_1 = x_1, \ldots, X_n = x_n$, is given by:

$$\pi_n(\varphi \mid x_1, \ldots, x_n) = K\, \varphi^{\frac{n}{2} + \alpha_0 - 1} \exp\left( -\varphi \left( \lambda_0 + \frac{n S^2}{2} \right) \right).$$
Let us now consider the case of statistical induction on both the expectation and the variance of a normal distribution. Assume that we are in a state of vague information which, as we have said, can be described by means of an improper distribution. We have now two unknown parameters, $\Theta$ and $\Phi$, respectively the expectation and the precision, that is, the inverse of the variance. Since $\Phi$ can take only positive values, we consider as a priori distribution an improper uniform distribution for $\Theta$ and $\log \Phi$. This corresponds to the improper density:

$$\pi_0(\theta, \varphi) = \frac{K}{\varphi}, \qquad \theta \in \mathbb{R},\ \varphi > 0.$$

Assume that we have a sequence of random numbers that are stochastically independent conditionally on the event that $\Theta$ and $\Phi$ take some definite values $\theta$ and $\varphi$, and that their conditional density is:

$$f(x \mid \theta, \varphi) = \frac{1}{\sqrt{2\pi}}\, \varphi^{\frac{1}{2}} \exp\left( -\frac{\varphi}{2} (x - \theta)^2 \right).$$
From the joint a posteriori probability density of $\Theta$ and $\Phi$ we can get their marginal densities by integrating with respect to the other variable. The integral with respect to $\varphi$ reduces to the integral of the gamma function. After collecting in the constant $K$ all factors that do not depend on $\theta$, we obtain

$$\pi_n(\theta \mid x_1, \ldots, x_n) = \int_0^{+\infty} \pi_n(\theta, \varphi \mid x_1, \ldots, x_n)\, d\varphi = \frac{K}{\left[ (\bar{x} - \theta)^2 + \nu s^2 \right]^{\frac{\nu + 1}{2}}}.$$

From this it follows that the random number $T = \frac{\bar{x} - \theta}{s}$ has Student $t$ density with $\nu$ degrees of freedom:

$$f_T(t) = K \left( 1 + \frac{t^2}{\nu} \right)^{-\frac{\nu + 1}{2}}.$$
The marginal a posteriori density of $\Phi$ is

$$\pi_n(\varphi \mid x_1, \ldots, x_n) = \int_{-\infty}^{+\infty} \pi_n(\theta, \varphi \mid x_1, \ldots, x_n)\, d\theta = K\, \varphi^{\frac{\nu}{2} - 1} \exp\left( -\frac{\nu s^2 \varphi}{2} \right),$$

with $\varphi > 0$. By making a linear change of variable we see that the random number $\nu s^2 \Phi$ has a posteriori distribution with density

$$K\, u^{\frac{\nu}{2} - 1} \exp\left( -\frac{u}{2} \right),$$

that is, a $\chi^2$ distribution with $\nu$ degrees of freedom.
Assume that we have two samples, of sizes $n_1$ and $n_2$ respectively, that, conditionally on the knowledge that the parameters $\Theta_1$ and $\Theta_2$ are equal respectively to $\theta_1$ and $\theta_2$, are stochastically independent samples with Gaussian distributions $N(\theta_1, \sigma_1^2)$ and $N(\theta_2, \sigma_2^2)$ respectively. If the a priori density of $\Theta_1$ and $\Theta_2$ is uniform improper, then $\Theta_1$ and $\Theta_2$ are stochastically independent a posteriori with Gaussian distributions $N\!\left(\bar{x}_1, \frac{\sigma_1^2}{n_1}\right)$ and $N\!\left(\bar{x}_2, \frac{\sigma_2^2}{n_2}\right)$ respectively, where $\bar{x}_1, \bar{x}_2$ are the sample averages of the two samples.

Indeed, since the samples are stochastically independent and $\Theta_1$ and $\Theta_2$ are stochastically independent in the a priori distribution, we can separately apply to the two samples the results on the induction on the expectation of a normal distribution in the case of uniform improper a priori distribution. If we define $\Theta = \Theta_2 - \Theta_1$, then the a posteriori distribution of $\Theta$ is $N\!\left( \bar{x}_2 - \bar{x}_1,\ \frac{\sigma_2^2}{n_2} + \frac{\sigma_1^2}{n_1} \right)$.
Let us now consider the case when there is an extra parameter $\Phi$ such that, conditionally on the knowledge that $\Phi = \varphi$ and $\Theta_1 = \theta_1$, $\Theta_2 = \theta_2$, the two samples are stochastically independent with distributions respectively $N(\theta_1, \varphi^{-1})$ and $N(\theta_2, \varphi^{-1})$. The conditional probability densities of the random numbers of the first and the second sample are then respectively

$$f_1(x \mid \theta_1, \theta_2, \varphi) = \frac{1}{\sqrt{2\pi}}\, \varphi^{\frac{1}{2}} \exp\left( -\frac{\varphi}{2} (x - \theta_1)^2 \right), \qquad f_2(x \mid \theta_1, \theta_2, \varphi) = \frac{1}{\sqrt{2\pi}}\, \varphi^{\frac{1}{2}} \exp\left( -\frac{\varphi}{2} (x - \theta_2)^2 \right).$$
Also here we consider the case of improper a priori distribution: precisely, we assume that $\Theta_1$, $\Theta_2$, $\log \Phi$ are stochastically independent with uniform improper distribution on $\mathbb{R}$. This corresponds for $\Theta_1, \Theta_2, \Phi$ to the a priori improper density

$$\pi_0(\theta_1, \theta_2, \varphi) = \frac{K}{\varphi}.$$

Consider first statistical induction for $\Phi$. Here we can apply, without any essential change, what we have seen about the induction for normal distributions with two unknown parameters and obtain that the a posteriori density of $\Phi$ is given by:

$$K\, \varphi^{\frac{\nu_1 + \nu_2}{2} - 1} \exp\left( -\frac{s^2 \varphi}{2} \right),$$
and xi, j is the j-th value of the i-th sample. By combining these results we can obtain
the a posteriori probability density of Θ = Θ2 − Θ1 in the case when Φ is unknown.
Indeed we have:

π(θ|x1, x2) = K ∫_{R⁺} φ^(1/2) exp( −(φ/2) (θ − (x̄2 − x̄1))² / (1/n1 + 1/n2) ) φ^((ν1+ν2)/2 − 1) exp(−s²φ/2) dφ

= K2 [ (θ − (x̄2 − x̄1))² / (1/n1 + 1/n2) + s² ]^(−(ν1+ν2+1)/2) 2^((ν1+ν2+1)/2) ∫_{R⁺} y^((ν1+ν2+1)/2 − 1) e^(−y) dy

= K2 2^((ν1+ν2+1)/2) Γ((ν1+ν2+1)/2) [ (θ − (x̄2 − x̄1))² / (1/n1 + 1/n2) + s² ]^(−(ν1+ν2+1)/2)

= K (2/s²)^((ν1+ν2+1)/2) Γ((ν1+ν2+1)/2) [ (θ − (x̄2 − x̄1))² / (s²(1/n1 + 1/n2)) + 1 ]^(−(ν1+ν2+1)/2),
where we have used the change of variable

y = (1/2) [ (θ − (x̄2 − x̄1))² / (1/n1 + 1/n2) + s² ] φ

to express the integral in terms of a Gamma function. We obtain
π(θ|x̄1, x̄2) = K [ (θ − (x̄2 − x̄1))² / (ν s̄²(1/n1 + 1/n2)) + 1 ]^(−(ν+1)/2),

where ν = ν1 + ν2 and s̄² = s²/ν. It follows that

T = (Θ − (x̄2 − x̄1)) / (s̄ √(1/n1 + 1/n2))

has, a posteriori, a Student distribution with ν degrees of freedom.
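The computation above can be turned into a small numerical sketch. The code below is an editor's illustration, not part of the original text; the two samples are hypothetical. It computes x̄1, x̄2, s², ν and the location and scale of the a posteriori Student distribution of Θ:

```python
import math

# Hypothetical samples (illustration only)
x1 = [5.1, 4.9, 5.4, 5.0]
x2 = [5.6, 5.8, 5.3, 5.7, 5.5]

n1, n2 = len(x1), len(x2)
m1, m2 = sum(x1) / n1, sum(x2) / n2

# s^2: sum of squared deviations over both samples
s2 = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
nu = (n1 - 1) + (n2 - 1)          # degrees of freedom, nu = nu1 + nu2

center = m2 - m1                  # location of the posterior of Theta
scale = math.sqrt((s2 / nu) * (1 / n1 + 1 / n2))
# (Theta - center) / scale has, a posteriori, a Student distribution
# with nu degrees of freedom.
```

With these (made-up) data, ν = 7 and the posterior of Θ is centered at x̄2 − x̄1 = 0.48.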
Solution 9.1 1. The number of different ways a player can receive a hand of
13 cards is given by the simple combinations

(52 choose 13).

Namely, one has to choose 13 elements out of 52 without repetitions and without
taking the order into account.
2. For the first player we have already computed the number of different ways she
can receive a hand of 13 cards. For the second player we can choose 13 cards
out of the 52 − 13 = 39 remaining ones. Analogously for the third player. The
fourth player receives the remaining 13 cards. The number of different ways in
which the 4 players can receive their hands is then

(52 choose 13)(39 choose 13)(26 choose 13)(13 choose 13) = 52!/(13!)⁴.

3. A single player can receive 13 cards all different in values in

4 · 4 · ⋯ · 4 (13 times) = 4¹³

different ways, since for each of the 13 values the suit can be chosen in 4 ways.
If we consider all 4 players, we have that for the second player the choices for
each card reduce to

3 · 3 · ⋯ · 3 (13 times) = 3¹³

different ways, and analogously to 2¹³ for the third player, while the fourth player
receives the cards that are left. Then the 4 players receive cards all different in
values in

4¹³ · 3¹³ · 2¹³ = (4!)¹³

different ways.
4. A player can receive 13 cards all of the same suit in 4 different ways, since
there are 4 different suits. If we consider 4 players, the number of ways of
assigning to each of them 13 cards of the same suit is given by the number of
permutations of the 4 suits, i.e.

4 · 3 · 2 · 1 = 4!.

The number of ways in which a player can obtain at least 2 cards with equal
value is equal to

(52 choose 13) − 4¹³,

that is, the number of all possible hands minus the number of hands with cards
all different in values.
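As a quick numerical cross-check of these counts (an editor's addition, not part of the original solution), one can evaluate them with Python's `math.comb`:

```python
from math import comb, factorial

hands = comb(52, 13)                                   # part 1
deals = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)
assert deals == factorial(52) // factorial(13) ** 4    # part 2

one_player = 4 ** 13                                   # part 3, one player
four_players = 4 ** 13 * 3 ** 13 * 2 ** 13             # all 4 players
assert four_players == factorial(4) ** 13              # = (4!)^13

at_least_two_equal = hands - 4 ** 13                   # complement count
assert at_least_two_equal > 0
```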
Exercise 9.2 At the ticket counter of a theatre, tickets with numbers from 1 to 100
are available. The tickets are randomly distributed among the buyers. Four friends
A, B, C, D separately buy a ticket each.
1. What is the probability that they have received the tickets with numbers
31, 32, 33 and 34?
2. What is the probability that they have received the tickets 31, 32, 33 and 34 in
this order?
3. What is the probability that they have received tickets with 4 consecutive
numbers?
4. What is the probability that A, B, C receive tickets with a number greater than
50?
Solution 9.2 1. The probability can be computed as the ratio

favorable cases / possible cases.

The possible cases are all the ways of choosing 4 numbers out of 100, i.e.

(100 choose 4).

There exists only 1 favorable case, i.e. choosing the numbers 31, 32, 33 and
34. Hence the probability that the four friends have received the tickets with
numbers 31, 32, 33, 34 is given by

p = 1 / (100 choose 4).
2. Here the order matters: the possible cases are the simple dispositions of 4
elements out of 100, i.e.

D_4^100 = 100!/96!.

The probability that the 4 friends receive the tickets 31, 32, 33, 34 in this order
is then

p = 1/D_4^100 = 96!/100!.
3. The favorable cases are the sets of the form {k, k + 1, k + 2, k + 3}, where k can
be chosen in

100 − 3 = 97

different ways; the last such set is {97, 98, 99, 100}. The probability of receiving
4 consecutive tickets is then

97 / (100 choose 4) = 97! · 4! / 100!.
4. The probability that A, B and C receive tickets with numbers greater than 50 is

p = (50/100) · (49/99) · (48/98).

For the first ticket there are 50 favorable cases (all tickets with numbers from 51
up to 100) out of 100. For the second ticket there are 49 possibilities out of the
99 tickets left. And so on.
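The four answers can be checked exactly with rational arithmetic (editor's addition):

```python
from fractions import Fraction
from math import comb, perm

p1 = Fraction(1, comb(100, 4))                 # part 1: unordered choice
p2 = Fraction(1, perm(100, 4))                 # part 2: order matters
p3 = Fraction(97, comb(100, 4))                # part 3: 97 consecutive runs
p4 = Fraction(50, 100) * Fraction(49, 99) * Fraction(48, 98)   # part 4

assert p2 == p1 / 24          # each set of 4 tickets has 4! = 24 orders
assert p3 == 97 * p1
assert p4 == Fraction(4, 33)
```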
Exercise 9.3 A credit card PIN consists of 5 digits. We assume that every
sequence of 5 digits is generated with the same probability. Compute:
1. The probability that the digits composing the PIN are all different.
2. The probability that the PIN contains at least 2 equal digits.
3. The probability that the digits composing the PIN are all different, if the first
digit is different from 0.
4. The probability that the PIN contains exactly 2 equal digits, if the first digit is
different from 0.
Solution 9.3 1. A PIN differs from another one also if the same digits appear in a
different order. The possible cases are given by

10⁵.

The favorable cases are the simple dispositions of 5 digits out of 10:

D_5^10 = 10!/5!.

The probability that the digits composing the PIN are all different is then

p1 = D_5^10 / 10⁵.
2. The probability that the PIN contains at least 2 equal digits is

p = 1 − p1 = 1 − 10!/(5! · 10⁵),

where p1 is the probability that the digits composing the PIN are all different.
3. In this case the number of possible cases is

9 · 10 · 10 · 10 · 10 = 9 · 10⁴.

For the first digit we have 9 possibilities (all digits from 1 to 9). We then need to
choose the remaining digits without repetitions and taking the order into account:
we have D_4^9 ways. The number of favorable cases is then

9 · 9 · 8 · 7 · 6 = 9 · D_4^9.

The probability that the digits composing the PIN are all different, if the first
digit is different from 0, is then

(9 · D_4^9) / (9 · 10⁴) = D_4^9 / 10⁴.
4. The number of possible cases is again

9 · 10⁴.

In order to compute the number of ways in which the PIN contains exactly 2
equal digits, if the first digit is different from 0, we can distinguish two cases:

(a) The repeated digit coincides with the first digit of the string: there are 9 ways
of choosing it (remember: 0 is now excluded), (4 choose 1) ways of choosing
the place of its second occurrence, and D_3^9 ways of filling the remaining
3 places with digits all different from each other and from the repeated one.
(b) The repeated digit appears only in the last 4 places: there are 9 ways of
choosing the first digit, 9 ways of choosing the repeated digit among the
remaining ones (0 is now allowed), (4 choose 2) ways of choosing its two
places, and D_2^8 ways of filling the remaining 2 places.

In total we have

9 · (4 choose 1) · D_3^9 + 9 · 9 · (4 choose 2) · D_2^8

favorable cases; the requested probability is this number divided by 9 · 10⁴.
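Because the two-case count in part 4 is easy to get wrong, here is a brute-force verification over all 10⁵ PINs (an editor's addition, not part of the original solution):

```python
from collections import Counter
from itertools import product
from math import comb, perm

all_distinct = 0
distinct_first_nonzero = 0
exactly_one_pair_first_nonzero = 0

for pin in product(range(10), repeat=5):
    counts = sorted(Counter(pin).values(), reverse=True)
    if counts[0] == 1:
        all_distinct += 1
        if pin[0] != 0:
            distinct_first_nonzero += 1
    # exactly one digit repeated twice, the other digits all different
    if pin[0] != 0 and counts[0] == 2 and counts.count(2) == 1:
        exactly_one_pair_first_nonzero += 1

assert all_distinct == perm(10, 5)                 # part 1 numerator
assert distinct_first_nonzero == 9 * perm(9, 4)    # part 3 numerator
assert exactly_one_pair_first_nonzero == \
    9 * comb(4, 1) * perm(9, 3) + 9 * 9 * comb(4, 2) * perm(8, 2)
```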
Exercise 9.4 Four fair dice are thrown at the same time. Their faces are numbered
from 1 to 6. Compute:
(a) The probability of obtaining four different faces.
(b) The probability of obtaining at least 2 equal faces.
(c) The probability of obtaining exactly 2 equal faces.
(d) The probability that the sum of the faces is equal to 5.
(e) We throw only 2 dice. Compute the probability that the sum of the faces is an
odd number.
Solution 9.4 (a) The probability can again be computed as

p = favorable cases / possible cases. (9.1)

The possible cases are

6 · 6 · 6 · 6 = 6⁴,

and the favorable ones are the dispositions D_4^6, so that

P(all the thrown dice show different faces) = D_4^6 / 6⁴ = 5/18.
(b) The probability of obtaining at least 2 equal faces can be computed by using
the probability obtained above:

P(at least 2 equal faces) = 1 − 5/18 = 13/18.
(c) Also in this case we use the string rule as in Exercise 9.3. The number of ways
of obtaining exactly 2 equal faces is

(4 choose 2) · 6 · D_2^5,

where

(4 choose 2) = ways of choosing the 2 dice with equal faces,
6 = ways of choosing the face which is repeated,
D_2^5 = ways of choosing the remaining faces.

Recall that the remaining faces must be different from each other and from the
one which is repeated.
(d) In order to have the sum of the faces equal to 5, the only possibility is that 3
faces show the number 1 and one the number 2, since we are dealing with 4
dice. We compute first the favorable cases. After having chosen the places for
the number 1, only one possibility remains for the number 2, i.e. we have

(4 choose 3) · 1 = 4

favorable cases. The possible cases are given by the 6⁴ configurations of 4 dice.
Hence the probability that the sum of the faces is equal to 5 is given by

p = 4/6⁴.
(e) The sum of the faces is odd if one of the faces shows an odd number and the
other one an even number. Hence

favorable cases = (2 choose 1) · 3 · 3 = 18,

where (2 choose 1) counts the choice of the die that shows the even face. Hence

P(the sum of the faces is an odd number) = (2 · 3²)/6² = 1/2.
More simply, one can consider that the sum of the faces can be either odd or
even, and by symmetry these two cases are equally likely. Hence

possible cases = 2,
favorable cases = 1,

and consequently

P(the sum of the faces is an odd number) = 1/2.
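All the counts of this exercise are small enough to verify by exhaustive enumeration (an editor's addition):

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import perm

outcomes = list(product(range(1, 7), repeat=4))
total = len(outcomes)                                   # 6^4 = 1296

all_diff = sum(1 for o in outcomes if len(set(o)) == 4)
exactly_two = sum(1 for o in outcomes
                  if sorted(Counter(o).values()) == [1, 1, 2])
sum_five = sum(1 for o in outcomes if sum(o) == 5)

assert Fraction(all_diff, total) == Fraction(5, 18)           # (a)
assert Fraction(total - all_diff, total) == Fraction(13, 18)  # (b)
assert exactly_two == 6 * 6 * perm(5, 2)                      # (c)
assert sum_five == 4                                          # (d)

odd_sum = sum(1 for o in product(range(1, 7), repeat=2) if sum(o) % 2 == 1)
assert Fraction(odd_sum, 36) == Fraction(1, 2)                # (e)
```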
Exercise 9.5 Two factories A and B produce garments for the same trademark Y.
For factory A, 5 % of the garments present some production defect; for factory B,
7 % of the garments present some production defect. Furthermore, 75 % of the
garments sold by Y come from factory A, while the remaining 25 % come from
factory B. We suppose that a garment is chosen randomly with equal probability
among all the garments on sale. Compute:
1. The probability of purchasing a garment of the trademark Y which presents some
production defect.
2. The probability that the garment comes from the factory A, subordinated to the
fact that it presents some production defect.
Solution 9.5 1. Let D be the event that the purchased garment presents some
production defect. By the formula of total probability,

P(D) = P(D|A)P(A) + P(D|B)P(B) = (5/100)(75/100) + (7/100)(25/100) = 11/200.
2. The probability that the garment comes from the factory A, if it presents some
production defect, is given by:
P(A|D) = P(D|A)P(A)/P(D) = 15/22.
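The two values above can be verified exactly with rational arithmetic (an editor's addition):

```python
from fractions import Fraction

p_A, p_B = Fraction(75, 100), Fraction(25, 100)
p_D_given_A, p_D_given_B = Fraction(5, 100), Fraction(7, 100)

p_D = p_D_given_A * p_A + p_D_given_B * p_B   # total probability
p_A_given_D = p_D_given_A * p_A / p_D         # Bayes' formula

assert p_D == Fraction(11, 200)
assert p_A_given_D == Fraction(15, 22)
```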
In this case

P(B̃) = (90/100) · (75/100) · (60/100) = 81/200,

from which

P(B) = 1 − 81/200 = 119/200.
2. Let O be the event
O = {the pupil wears glasses}.
The probability of O can be computed by using the formula of the total probability,
since we do not know which school the pupil belongs to. We set
• E = {the pupil belongs to school E};
• M = {the pupil belongs to school M};
• S = {the pupil belongs to school S}.
We then have:
Note that we have assumed that each school can be picked up with the same
probability.
3. The probability that the pupil belongs to school E, if she wears glasses, can be
computed by using Bayes’ formula:
P(E|O) = P(O|E)P(E)/P(O) = 2/15.
Chapter 10
Discrete Distributions
Exercise 10.1 Two friends A and B are playing with a deck consisting of 52 cards,
13 for each suit. At each trial a player draws 2 cards. Player A starts. In order to
win, a player has to be the first to extract the ace of spades or 2 cards of diamonds.
After each trial the 2 cards are put back in the deck, which is then shuffled.
Compute the probability that:
(a) Player A wins after 3 trials (i.e. after each player has done 2 extractions).
(b) Player A wins; player B wins; nobody wins.
(c) Let T be the random number representing the number of the trial at which one
of the players first wins. Compute the expectation of T.
(d) What is the probability distribution of T?
Solution 10.1 (a) The trials of the 2 players can be represented as a sequence of
stochastically independent and identically distributed random trials. The probability
that player A wins at her third trial (i.e. after each player has done 2 extractions) is
then equal to the probability of a first success at trial number

2 + 2 + 1 = 5.

A player wins a trial if she extracts the ace of spades or 2 cards of diamonds. The
probability of this event is given by

p = 51/(52 choose 2) + (13 choose 2)/(52 choose 2), (10.1)

where we have used the fact that the two events are incompatible and that there
are 51 pairs of cards containing the ace of spades (the choices of the second card).
The requested probability is then (1 − p)⁴ p.
(b) If A wins, the game stops at an odd trial. The probability that A wins is then

P(A wins) = Σ_{k=0}^{∞} P(T = 2k + 1) = Σ_{k=0}^{∞} p(1 − p)^(2k) = p / (1 − (1 − p)²).

If B wins, the game stops at an even trial. The probability that B wins is then

P(B wins) = Σ_{k=1}^{∞} P(T = 2k) = p Σ_{k=1}^{∞} (1 − p)^(2k−1)
= (p/(1 − p)) [ 1/(1 − (1 − p)²) − 1 ]
= (p/(1 − p)) · (1 − p)²/(1 − (1 − p)²)
= p(1 − p)/(1 − (1 − p)²) = (1 − p)/(2 − p).
(c)–(d) The random number T that represents the time when the game is decided
has a geometric distribution of parameter p, since it denotes the first time of
success in a sequence of stochastically independent and identically distributed
trials. Hence the expectation of T is given by

P(T) = 1/p = (52 choose 2) / (51 + (13 choose 2)).
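The geometric-series answers of part (b) can be cross-checked with exact rational arithmetic (an editor's addition):

```python
from fractions import Fraction
from math import comb

# single-trial winning probability, as in (10.1)
p = Fraction(51 + comb(13, 2), comb(52, 2))

p_A_wins = 1 / (2 - p)              # p / (1 - (1-p)^2)
p_B_wins = (1 - p) / (2 - p)
assert p_A_wins + p_B_wins == 1     # the game ends with probability 1
assert p_A_wins > p_B_wins          # moving first is an advantage

expected_T = 1 / p                  # mean of the geometric distribution
```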
To compute the variance, we use the formula for the variance of a sum:

σ²(X + Y) = σ²(X) + σ²(Y) + 2 cov(X, Y).

Since X and Y are stochastically independent,

cov(X, Y) = 0.

Hence

σ²(X + Y) = σ²(X) + σ²(Y) = μ + λ.
I(Z) = N = {inf I(X) + inf I(Y), …}. Moreover,

{Z = i} = {X = 0, Y = i} + {X = 1, Y = i − 1} + ⋯ + {X = i, Y = 0}
= Σ_{k=0}^{i} {X = k, Y = i − k},

and therefore

P(Z = i) = Σ_{k=0}^{i} P(X = k, Y = i − k),
so that, by stochastic independence,

P(Z = i) = Σ_{k=0}^{i} P(X = k) P(Y = i − k)
= Σ_{k=0}^{i} e^(−μ) (μ^k/k!) e^(−σ) (σ^(i−k)/(i − k)!)
= (e^(−(μ+σ))/i!) Σ_{k=0}^{i} (i!/(k!(i − k)!)) μ^k σ^(i−k)
= ((μ + σ)^i / i!) e^(−(μ+σ)),

where we have used Newton's binomial formula. Therefore Z has Poisson
distribution with parameter μ + σ.
4. In order to compute the covariance between Z and X , we proceed as follows:
We now compute P(u^X) by using the formula for the expectation of a function
of X:

P(u^X) = Σ_{i=0}^{+∞} u^i P(X = i) = Σ_{i=0}^{+∞} e^(−μ) (uμ)^i / i! = e^((u−1)μ),

where we have used the expansion

Σ_{i=0}^{+∞} x^i / i! = e^x.
It follows that

P(u^Z) = P(u^X) P(u^Y) = e^((u−1)μ) e^((u−1)σ) = e^((u−1)(μ+σ)).

Since the generating function uniquely identifies the distribution, this proves again
that Z has Poisson distribution with parameter μ + σ.
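The convolution identity proved above is easy to confirm numerically (an editor's addition; the parameter values are hypothetical):

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) for X with Poisson distribution of parameter lam
    return math.exp(-lam) * lam ** k / math.factorial(k)

mu, sigma = 1.3, 2.1    # hypothetical parameters
for i in range(12):
    conv = sum(poisson_pmf(k, mu) * poisson_pmf(i - k, sigma)
               for k in range(i + 1))
    assert abs(conv - poisson_pmf(i, mu + sigma)) < 1e-12
```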
Exercise 10.3 In a small village with 200 inhabitants, 5 inhabitants are affected by
a particular genetic disease. A sample of 3 individuals is chosen randomly among
the population (all subsets have the same probability of being chosen). Let X be the
number of individuals in the sample who are affected by the disease.
1. Determine the set I (X ) of possible values for X .
2. Determine the probability distribution of X .
3. Compute the expectation and the variance of X .
Solution 10.3 1. The possible values of X are 0, 1, 2 and 3, i.e. the minimum
number of people affected by the disease in the sample is 0 and the maximum
number is 3.
2. Consider the event {X = i}, i ∈ I (X ). To determine the probability distribution
of X , we need to compute
P(X = i), i ∈ I (X ) .
Each of these probabilities is given by the ratio

favorable cases / possible cases.

We obtain

P(X = i) = (5 choose i)(195 choose 3 − i) / (200 choose 3).

3. The expectation of X is

P(X) = Σ_{i=0}^{3} i P(X = i)
= (1/(200 choose 3)) [ 5 (195 choose 2) + 20 (195 choose 1) + 30 ]
= 3/40.
For the variance it is then sufficient to use the formula

σ²(X) = P(X²) − P(X)²

and to compute P(X²) = Σ_{i=0}^{3} i² P(X = i).
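The hypergeometric distribution of X, its expectation and its variance can be verified exactly (an editor's addition):

```python
from fractions import Fraction
from math import comb

N, K, n = 200, 5, 3
pmf = {i: Fraction(comb(K, i) * comb(N - K, n - i), comb(N, n))
       for i in range(n + 1)}

assert sum(pmf.values()) == 1
mean = sum(i * p for i, p in pmf.items())
assert mean == Fraction(3, 40)          # = n * K / N

var = sum(i * i * p for i, p in pmf.items()) - mean ** 2
# hypergeometric variance: n p (1-p) (N-n)/(N-1)
assert var == mean * Fraction(N - K, N) * Fraction(N - n, N - 1)
```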
Exercise 10.4 At a horse race there are 10 participants. Gamblers win if they
correctly predict the first 3 horses in order of arrival. We suppose that all the orders
of arrival have the same probability of occurrence and that the gamblers choose,
independently of each other and with the same probability, the 3 horses on which
to bet.
1. Compute the probability that a given gambler wins.
2. If the gamblers are 100 in total, let X be the random number counting the number
of gamblers who win. Determine I(X) and P(X = i) for i = 1, 2, 3.
3. Compute the expectation and the variance of X.
4. Suppose that the gamblers are numbered from 1 to 100. Compute the probability
that there is at least one winner and that the winner with the minimal number has
a number greater than or equal to 50.
Solution 10.4 1. The probability that a gambler wins can be computed with the
formula

favorable cases / possible cases.

In this case, the possible cases are given by the simple dispositions of 3 elements
out of 10: they represent the number of ways in which the 10 horses can fill the
first 3 positions. Only one is the winning triplet, hence the probability of winning
for a gambler is given by

p = 1/D_3^10 = 7!/10! = 1/720.
2. If X is the random number counting the number of gamblers who win, we can
write

X = E_1 + E_2 + ⋯ + E_100,

where the event E_i is verified if the i-th gambler wins. The events E_i, i =
1, …, 100, are stochastically independent and identically distributed, since the
gamblers choose independently of each other and with the same probability the 3
horses on which to bet. Hence X has binomial distribution Bn(n, p) with
parameters n = 100 and p = 1/720. The set of possible values is then

I(X) = {0, 1, …, 100}

and

P(X = i) = (100 choose i) (1/720)^i (1 − 1/720)^(100−i).
In particular, we obtain:

P(X = 1) = 100 · (1/720) · (719/720)^99,
P(X = 2) = (100 choose 2) (1/720)² (719/720)^98,
P(X = 3) = (100 choose 3) (1/720)³ (719/720)^97.
3. The expectation of X is P(X) = np = 100/720 = 5/36, while for the variance we
have

σ²(X) = σ²(E_1 + ⋯ + E_100) = Σ_{i=1}^{100} σ²(E_i) + Σ_{i≠j} cov(E_i, E_j)
= 100 · (1/720) · (1 − 1/720),

since the covariances vanish by stochastic independence.
4. Let E be the event that no gambler with a number from 1 to 49 wins and F the
event that at least one gambler with a number from 50 to 100 wins. The
probability that there is at least one winner and that the winner with the minimal
number has a number greater than or equal to 50 is then

P(EF) = P(E)P(F) = (719/720)^49 (1 − (719/720)^51),

where P(F) = 1 − P(F̃) and F̃ is the event that no gambler with a number from
50 to 100 wins.
Exercise 10.5 In an opinion poll, 100 people are asked to answer a questionnaire
with 5 questions. Each question can only be answered yes or no. For each person
the probability of every possible combination of answers is the same, and the
choices of different people are stochastically independent. Let N be the number of
interviewed people that answer yes to the first two questions or answer yes to at
least 4 questions.
1. What is the probability distribution of N?
2. Compute the expectation, the variance and the generating function of N.
Solution 10.5 1. Let E_i be the event that the i-th interviewed person answers yes
to the first two questions or yes to at least 4 questions. We can rewrite N as

N = E_1 + E_2 + ⋯ + E_100.

The events are stochastically independent and identically distributed, since every
person answers independently of the other ones. Furthermore, we have assumed
that all combinations of answers are equally probable. It is then sufficient to
compute the probability of each E_i. We put:
• F_i = {the i-th interviewed person answers yes to the first two questions};
• G_i = {the i-th interviewed person answers yes to at least 4 questions}.
We obtain that E_i = F_i ∨ G_i and

P(E_i) = P(F_i) + P(G_i) − P(F_i G_i).

In the case the two events happen at the same time, we only need to choose the
other 2, respectively 3, questions to which the person answers yes.
Finally

P(E_i) = 1/4 + (5/2⁵ + 1/2⁵) − (3/2⁵ + 1/2⁵) = 1/2² + 1/2⁴ = 5/16,
i.e. N has binomial distribution Bn(100, 5/16).
2. The expectation of N is given by

P(N) = Σ_{i=1}^{100} P(E_i) = 100 · 5/16 = 125/4.
For the variance,

σ²(N) = Σ_{i=1}^{100} σ²(E_i) + Σ_{i≠j} cov(E_i, E_j) = 100 · (5/16) · (1 − 5/16),

since the covariances vanish by stochastic independence.
The generating function is

φ_N(t) = P(t^N) = Σ_{i=0}^{100} (100 choose i) (5t/16)^i (1 − 5/16)^(100−i)
= (5t/16 + 11/16)^100,

by Newton's binomial formula,
Exercise 10.6 A box contains 8 balls: 4 white and 4 black. We draw 4 balls without
replacement. Let E_i be the event that the i-th ball extracted is white, and let
X = E_1 + E_2, Y = E_3 + E_4.
(a) Compute the joint distribution of X and Y.
(b) Compute P(X), P(Y), σ²(X), σ²(Y).
(c) Compute cov(X, Y) and the correlation coefficient ρ(X, Y). Are X and Y
stochastically independent?
Solution 10.6 (a) Consider the random vector (X, Y). The set of possible values
for (X, Y) is given by

I(X, Y) = {(i, j) : i, j ∈ {0, 1, 2}},

and we need to compute P(X = i, Y = j) for all (i, j) ∈ I(X, Y). The probability of
extracting i white balls in the first 2 extractions is given by

P(X = i) = (4 choose i)(4 choose 2 − i) / (8 choose 2).

Here the possible cases are (8 choose 2), since we consider only the first 2
extractions.
Moreover,

P(Y = j | X = i) = (4 − i choose j)(4 − (2 − i) choose 2 − j) / (6 choose 2)
= (4 − i choose j)(2 + i choose 2 − j) / (6 choose 2).
After the first 2 extractions, only 6 balls are left in the box. We have to draw 2
more balls, j among the remaining white ones (4 − i) and (2 − j) among the
remaining black ones 4 − (2 − i) = 2 + i. The joint distribution of X and Y
is then
P(X = i, Y = j) = P(Y = j | X = i) P(X = i)
= (4 − i choose j)(2 + i choose 2 − j)/(6 choose 2) · (4 choose i)(4 choose 2 − i)/(8 choose 2).
(b) To compute P(X) and P(Y) we use the fact that the events E_i have equal
probability (but they are not stochastically independent!), hence

P(X) = P(E_1) + P(E_2) = 2 · 4/8 = 1

and analogously

P(Y) = P(X) = 1.

For the variance,

σ²(X) = σ²(E_1 + E_2) = σ²(E_1) + σ²(E_2) + 2 cov(E_1, E_2)
= 1/4 + 1/4 − 2/28 = 3/7,

since cov(E_1, E_2) = P(E_1 E_2) − P(E_1)P(E_2) = (4/8)(3/7) − 1/4 = −1/28.
Also in this case σ²(Y) = σ²(X) = 3/7.
(c) We have:

cov(X, Y) = cov(E_1 + E_2, E_3 + E_4)
= cov(E_1, E_3) + cov(E_1, E_4) + cov(E_2, E_3) + cov(E_2, E_4)
= 4 · (−1/28) = −1/7.

Here we have used the fact that the covariance is bilinear. Finally, the correlation
coefficient between X and Y is equal to

ρ(X, Y) = cov(X, Y)/(σ(X)σ(Y)) = (−1/7) / √((3/7)(3/7)) = −1/3.

Since cov(X, Y) ≠ 0, X and Y are not stochastically independent.
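Since all 8 · 7 · 6 · 5 ordered draws are equally likely, every moment above can be verified by exhaustive enumeration with exact rational arithmetic (an editor's addition):

```python
from fractions import Fraction
from itertools import permutations

balls = ['W'] * 4 + ['B'] * 4
draws = list(permutations(range(8), 4))   # 1680 equally likely ordered draws
n = Fraction(len(draws))

ex = ey = exy = ex2 = 0
for d in draws:
    x = sum(1 for i in d[:2] if balls[i] == 'W')   # white balls in draws 1-2
    y = sum(1 for i in d[2:] if balls[i] == 'W')   # white balls in draws 3-4
    ex += x; ey += y; exy += x * y; ex2 += x * x

mean_x, mean_y = ex / n, ey / n
assert mean_x == mean_y == 1
cov = exy / n - mean_x * mean_y
var_x = ex2 / n - mean_x ** 2
assert cov == Fraction(-1, 7)
assert var_x == Fraction(3, 7)
assert cov / var_x == Fraction(-1, 3)     # correlation coefficient
```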
Solution 10.7 (a) Since E_1, E_2 are events, i.e. random numbers that can assume
only the values 0 and 1, the set of possible values of X is given by

I(X) = {0, 1, 2},

and analogously for Y,

I(Y) = {0, 1, 2}.

For instance,

P(X = 0) = P(E_1 + E_2 = 0) = P(E_1 = E_2 = 0) = P(Ẽ_1)P(Ẽ_2) = 9/16.

Since X is the sum of 2 stochastically independent events with the same
probability, we can immediately say that the distribution of X is binomial Bn(n, p)
with parameters n = 2 and p = 1/4. Analogously, Y has binomial distribution
Bn(2, 1/3), and we have that

P(X = i) = (2 choose i) (1/4)^i (3/4)^(2−i), i = 0, 1, 2,
P(Y = j) = (2 choose j) (1/3)^j (2/3)^(2−j), j = 0, 1, 2.
(b) For the expectation,

P(X + Y) = P(E_1 + E_2 + F_1 + F_2) = P(E_1) + P(E_2) + P(F_1) + P(F_2)
= 2 · 1/4 + 2 · 1/3 = 7/6.

For the variance, we use the formula for the variance of a sum:

σ²(X + Y) = σ²(X) + σ²(Y) + 2 cov(X, Y).
Here

σ²(X) = 2 · (1/4) · (3/4) = 3/8,
σ²(Y) = 2 · (1/3) · (2/3) = 4/9.
To compute the covariance between X and Y we use the fact that the events
E 1 , E 2 , F1 , F2 are stochastically independent in the following way:
cov(X, Y ) = cov(E 1 + E 2 , F1 + F2 )
= cov(E 1 , F1 ) + cov(E 1 , F2 ) + cov(E 2 , F1 ) + cov(E 2 , F2 )
= 0.
Hence

σ²(X + Y) = 3/8 + 4/9 = 59/72.
(c) To compute P(X = Y) we note that the event (X = Y) is given by

(X = Y) = (X = 0, Y = 0) + (X = 1, Y = 1) + (X = 2, Y = 2).

Hence, by stochastic independence,

P(X = Y) = Σ_{i=0}^{2} P(X = i, Y = i) = Σ_{i=0}^{2} P(X = i)P(Y = i)
= Σ_{i=0}^{2} (2 choose i)(1/4)^i(3/4)^(2−i) (2 choose i)(1/3)^i(2/3)^(2−i)
= Σ_{i=0}^{2} (2 choose i)² (1/12)^i (1/2)^(2−i)
= (1/4) Σ_{i=0}^{2} (2 choose i)² (1/6)^i = 61/144.
(d) Since X and Y are non-negative, the event (X = −Y) is verified only if both are
zero, i.e.

(X = −Y) = (X = 0, Y = 0).

Hence

P(X = −Y) = P(X = 0, Y = 0) = P(X = 0)P(Y = 0) = (9/16)(4/9) = 1/4.
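Parts (c) and (d) can be checked exactly by summing the two binomial distributions (an editor's addition):

```python
from fractions import Fraction
from math import comb

def binom_pmf(i, n, p):
    # P(X = i) for a binomial Bn(n, p), with exact rational p
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

pX = [binom_pmf(i, 2, Fraction(1, 4)) for i in range(3)]
pY = [binom_pmf(i, 2, Fraction(1, 3)) for i in range(3)]

p_equal = sum(pX[i] * pY[i] for i in range(3))
assert p_equal == Fraction(61, 144)

p_opposite = pX[0] * pY[0]          # X = -Y forces X = Y = 0
assert p_opposite == Fraction(1, 4)
```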
Chapter 11
One-Dimensional Absolutely Continuous
Distributions
Exercise 11.1 The random numbers X, Y and Z are stochastically independent
with exponential distribution of parameter λ = 2.
(a) Compute the probability density of X + Y and of X + Y + Z.
(b) Let E, F, G be the events E = (X ≤ 2), F = (X + Y > 2), G = (X + Y +
Z ≤ 3). Compute P(E), P(F), P(G) and P(EF).
(c) Determine if E, F and G are stochastically independent.
Solution 11.1 (a) The exponential distribution is a particular case of the Gamma
distribution, with parameters (1, λ). If X, Y and Z are stochastically independent
random numbers with exponential distribution of parameter λ = 2, i.e. Gamma
distribution Γ(1, 2), we can use the following property of the sum of stochastically
independent random numbers with Gamma distribution: if W ∼ Γ(α, λ) and
V ∼ Γ(β, λ) are stochastically independent, then W + V ∼ Γ(α + β, λ).
Hence W_1 = X + Y has distribution Γ(2, 2). We can iterate this procedure and
obtain that

W_2 = X + Y + Z = W_1 + Z

has distribution Γ(3, 2).
(b) Since X + Y has density 4xe^(−2x) for x ≥ 0, integrating by parts we get

P(F) = P(X + Y > 2) = ∫_2^{+∞} 4xe^(−2x) dx
= [−2xe^(−2x)]_2^{+∞} + 2 ∫_2^{+∞} e^(−2x) dx = 4e^(−4) + e^(−4) = 5e^(−4),
and similarly for P(E), P(G) and P(EF).
Here we have used the fact that X and Y are assumed to be stochastically
independent, as well as the fact that the product of 2 events means that both
conditions must be simultaneously satisfied.
(c) To determine if E, F, G are stochastically independent, we need to verify all of
the following conditions:

P(EF) = P(E)P(F);
P(EG) = P(E)P(G);
P(FG) = P(F)P(G);
P(EFG) = P(E)P(F)P(G).

If one of them is not verified, then the events are not stochastically independent.
By using the results above, we can immediately see that

P(EF) ≠ P(E)P(F).

Hence the three events are not stochastically independent.
Exercise 11.2 Let X be a random number with standard normal distribution. Let
Y = 3X + 2 and Z = X 2 .
1. Compute the c.d.f. and the density of Y .
2. Estimate P(Y ≥ y), where y > 0.
3. Compute the expectation and the variance of Z .
4. Compute the c.d.f. and the density of Z .
Solution 11.2 1. The c.d.f. of Y is

F_Y(y) = P(3X + 2 ≤ y) = P(X ≤ (y − 2)/3) = ∫_{−∞}^{(y−2)/3} n(t) dt
= ∫_{−∞}^{y} (1/(3√(2π))) e^(−(z−2)²/18) dz,

where we have used the change of variable t = (z − 2)/3. The density f_Y of Y is
obtained by differentiating F_Y:

f_Y(y) = (d/dy) F_Y(y) = (1/(3√(2π))) e^(−(y−2)²/(2·9)).
σ 2 (Z ) = P(Z 2 ) − P(Z )2 .
It remains to compute
Finally we get

F_Z(z) = 0 for z < 0,
F_Z(z) = ∫_{−√z}^{√z} (1/√(2π)) e^(−t²/2) dt for z ≥ 0.
To compute the density f_Z, we can take the derivative of the c.d.f. For z ≥ 0,

f_Z(z) = (d/dz) [ ∫_{−∞}^{√z} (1/√(2π)) e^(−t²/2) dt − ∫_{−∞}^{−√z} (1/√(2π)) e^(−t²/2) dt ]
= (1/2) z^(−1/2) n(√z) + (1/2) z^(−1/2) n(−√z)
= z^(−1/2) n(√z)
= z^(−1/2) (1/√(2π)) e^(−z/2).
We obtain

f_Z(z) = 0 for z < 0,
f_Z(z) = (1/√(2π)) z^(−1/2) e^(−z/2) for z ≥ 0.
Exercise 11.3 Let X be a random number with exponential distribution with
parameter λ = 2.
1. Compute the moments of order n of X, i.e. P(X^n), n ∈ N.
2. Consider the family of random numbers Z_u = e^(uX), u < λ. Given a fixed
u < λ, compute the expectation Ψ_X(u) = P(e^(uX)) of Z_u. The function Ψ_X(u)
is called the moment generating function of X.
Solution 11.3 1. The moment of order n ∈ N of X can be computed with the
formula

P(Ψ(X)) = ∫ Ψ(x) f_X(x) dx,

for a given function Ψ : R → R such that the integral above exists and is finite. In
this case Ψ(x) = x^n. We then obtain

P(X^n) = ∫_0^{+∞} x^n λ e^(−λx) dx = λ Γ(n + 1)/λ^(n+1) = n!/λ^n.

In particular, for n = 1 we have that P(X) = 1/λ.
2. We compute the expectation of Z_u = e^(uX) for u < λ:

P(Z_u) = P(e^(uX)) = λ ∫_0^{+∞} e^(ux) e^(−λx) dx = λ ∫_0^{+∞} e^((u−λ)x) dx.

Note that here u is a given parameter. The integral is well defined since u < λ.
We obtain that

P(Z_u) = [λ/(u − λ) e^((u−λ)x)]_0^{+∞} = λ/(λ − u).
Exercise 11.4 The random number X has uniform distribution on the interval
[−1, 1].
(a) Write the density of X.
Let Z = log |X|.
(b) Compute I(Z) and P(Z).
(c) Compute the c.d.f. and the density of Z.
(d) Compute P(Z < −1/2 | X > −1/2).

Solution 11.4 (a) The density of X is f_X(x) = 1/2 for x ∈ [−1, 1] and 0 otherwise;
the sets of possible values are

I(X) = [−1, 1], I(Z) = (−∞, 0].
(b) For the expectation,

P(Z) = P(log |X|) = P(log X | X > 0) · (1/2) + P(log(−X) | X < 0) · (1/2),

where we have used the fact that

P(X > 0) = P(X < 0) = 1/2.
Verify this by direct computation!
Both conditional expectations are equal to −1; for instance

P(log(−X) | X < 0) = ∫_{−1}^{0} log(−x) dx = ∫_0^1 log y dy = −1, (11.2)

and analogously P(log X | X > 0) = −1, hence

P(Z) = −1.
(c) To compute the c.d.f. of Z, we again need to exclude the value 0. For z < 0 we
have

F_Z(z) = P(Z ≤ z) = P(Z ≤ z, X > 0) + P(Z ≤ z, X < 0),

where

P(Z ≤ z, X > 0) = P(log X ≤ z, X > 0) = P(0 < X ≤ e^z) = e^z/2

and

P(Z ≤ z, X < 0) = P(log(−X) ≤ z, X < 0) = P(−e^z ≤ X < 0) = e^z/2.

Hence

F_Z(z) = e^z if z < 0,

while F_Z(z) = 1 for z ≥ 0.
(d) We evaluate P(Z < −1/2 | X > −1/2) by using the formula of conditional
probability:

P(Z < −1/2 | X > −1/2) = P(Z < −1/2, X > −1/2) / P(X > −1/2),

where

P(Z < −1/2, X > −1/2) = P(log |X| < −1/2, X > −1/2)
= P(log X < −1/2, X > 0) + P(log(−X) < −1/2, −1/2 < X < 0).

It follows that

P(log X < −1/2, X > 0) = P(0 < X < e^(−1/2)) = e^(−1/2)/2,

and furthermore

P(log(−X) < −1/2, −1/2 < X < 0) = P(X > −e^(−1/2), −1/2 < X < 0)
= P(−1/2 < X < 0) = ∫_{−1/2}^{0} (1/2) dx = 1/4,

since −e^(−1/2) < −1/2. Finally, as P(X > −1/2) = 3/4,

P(Z < −1/2 | X > −1/2) = (1/(2√e) + 1/4) / (3/4) = (2/√e + 1)/3.
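The conditional probability above can be cross-checked by simulation (an editor's addition; the exact value is (2/√e + 1)/3 ≈ 0.738):

```python
import math
import random

random.seed(0)
N = 1_000_000
hits = cond = 0
for _ in range(N):
    x = random.uniform(-1.0, 1.0)      # X uniform on [-1, 1]
    if x > -0.5:
        cond += 1
        if x != 0 and math.log(abs(x)) < -0.5:
            hits += 1

estimate = hits / cond
exact = (math.exp(-0.5) / 2 + 0.25) / 0.75
assert abs(estimate - exact) < 5e-3
```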
Chapter 12
Absolutely Continuous and Multivariate
Distributions
Exercise 12.1 The random number X has probability density

f(x) = Kx² for x ∈ [−1, 1], f(x) = 0 otherwise.

(a) Compute K.
(b) Compute the c.d.f., the expectation and the variance of X.
(c) Let Y be a random number which is stochastically independent of X and has
exponential distribution with parameter λ = 2. Write the joint density function
and the joint c.d.f. of (X, Y).
Solution 12.1 (a) Since the integral of the density must be equal to 1,

K = 1 / ∫_{−1}^{1} x² dx = 3/2.
(b) For the c.d.f. we have

F(x) = 0 for x ≤ −1,
F(x) = ∫_{−1}^{x} (3/2) t² dt = (x³ + 1)/2 for x ∈ [−1, 1],
F(x) = 1 for x ≥ 1.
The expectation vanishes by symmetry,

P(X) = (3/2) ∫_{−1}^{1} x · x² dx = 0,

so that

σ²(X) = P(X²) − P(X)² = P(X²) = (3/2) ∫_{−1}^{1} x⁴ dx = 3/5.
(c) If X and Y are stochastically independent, then the joint density is given by the
product of the marginal densities:

f(x, y) = f_X(x) g_Y(y) = (3/2) x² · 2e^(−2y) = 3x²e^(−2y) for x ∈ [−1, 1] and y ≥ 0,

and f(x, y) = 0 otherwise.
Analogously, the joint c.d.f. coincides with the product of the marginal
distribution functions:

F(x, y) = F_X(x)F_Y(y) = (1 − e^(−2y))(x³ + 1)/2 for x ∈ [−1, 1] and y ≥ 0,
F(x, y) = 1 − e^(−2y) for x > 1 and y ≥ 0,
F(x, y) = 0 otherwise.
Exercise 12.2 Let (X, Y ) be a random vector with uniform distribution on the disk
of radius 1 and center at the origin of the axes.
1. Compute the joint density function f (x, y) of (X, Y ).
2. What is the marginal density f X of X ?
3. Let Z = X² + Y²; compute P(1/4 ≤ Z ≤ 1).
4. Compute the c.d.f. and the density of Z .
Solution 12.2 1. The disk of radius 1 centered at the origin is

D_1 = {(x, y) : x² + y² ≤ 1},

and the joint density is constant on D_1, f(x, y) = c, with

c = 1 / ∫∫_{D_1} dx dy = 1 / area(D_1) = 1/π.
2. Integrating the joint density in y between −√(1 − x²) and √(1 − x²), we get

f_X(x) = (2√(1 − x²))/π for x ∈ [−1, 1],

and f_X(x) = 0 otherwise.
3. Let Z = X² + Y²; computing P(1/4 ≤ Z ≤ 1) is equivalent to calculating the
probability that the random vector (X, Y) belongs to the region A of the plane
between the disk with center O and radius 1/2 and the disk with center O and
radius 1, i.e.

P(1/4 ≤ Z ≤ 1) = P(1/4 ≤ X² + Y² ≤ 1) = ∫∫_A f(x, y) dx dy.

To compute the integral we use polar coordinates:
x = ρ cos θ, y = ρ sin θ .
Fig. 12.4 Area of the region {(x, y) : 1/4 ≤ x² + y² ≤ 1}
It follows that

∫∫_A f(x, y) dx dy = ∫∫_{A(ρ,θ)} (1/π) ρ dρ dθ
= ∫_0^{2π} (1/π) dθ ∫_{1/2}^{1} ρ dρ = ∫_{1/2}^{1} 2ρ dρ = [ρ²]_{1/2}^{1} = 3/4.
4. For 0 ≤ z ≤ 1,

F_Z(z) = P(Z ≤ z) = P(X² + Y² ≤ z) = ∫∫_{D_z} f(x, y) dx dy = (πz)/π = z,

where D_z is the disk of radius √z. Hence Z has uniform distribution on [0, 1],
with density f_Z(z) = 1 on [0, 1] and 0 otherwise.
Solution 12.3 1. The joint density is of the form f(x, y) = kxy on the triangle T
with vertices (0, 0), (2, 0) and (0, 2), and 0 outside. Since the integral of the density
must be equal to 1, we compute

∫_0^2 [y²/2]_0^{−x+2} kx dx = k ∫_0^2 (x/2)(2 − x)² dx
= (k/2) ∫_0^2 (4x − 4x² + x³) dx = (k/2) [2x² − (4/3)x³ + (1/4)x⁴]_0^2 = (2/3)k.

It follows that

k = 3/2.
2. The probability P(X > 1, Y < 1/2) is given by the integral of the joint density
over the region D given by the intersection

D = {(x, y) ∈ R² : x > 1, y < 1/2} ∩ T,

see Figs. 12.5 and 12.6.
To find the limits of integration, it is easier this time to fix y and let x vary: the
limits are given by the intersection of the border of D with the line through (0, y)
parallel to the x-axis, as we can see in Fig. 12.7.
P(X > 1, Y < 1/2) = ∫∫_D f(x, y) dx dy
= ∫_0^{1/2} (3/2) y [ ∫_1^{−y+2} x dx ] dy
= ∫_0^{1/2} (3/2) y [x²/2]_1^{−y+2} dy
= (3/4) ∫_0^{1/2} y (3 − 4y + y²) dy
= (3/4) [ (3/2) y² − (4/3) y³ + (1/4) y⁴ ]_0^{1/2}
= 43/256.
The conditional probability P(X > 1 | Y < 1/2) can be obtained as

P(X > 1 | Y < 1/2) = P(X > 1, Y < 1/2) / P(Y < 1/2),

where, integrating as above,

P(Y < 1/2) = ∫_0^{1/2} (3/4) y (2 − y)² dy = 67/256,

so that P(X > 1 | Y < 1/2) = 43/67.
3. Let now Z = X + Y. Note that in this case X and Y are both positive, hence the
condition Z > 0, i.e. X > −Y, reduces to X > 0. In Fig. 12.9 we represent the
region where the integral of the joint density of X, Y must be calculated to obtain
P(0 < Z < 1): the part of T below the line X + Y = 1.
P(0 < Z < 1) = P(0 < X < 1 − Y) = ∫_0^1 ∫_0^{1−y} (3/2) xy dx dy
= (3/4) ∫_0^1 y (1 − y)² dy
= (3/4) ∫_0^1 (y − 2y² + y³) dy
= (3/4) [ (1/2) y² − (2/3) y³ + (1/4) y⁴ ]_0^1
= 1/16.
4. For 0 ≤ z ≤ 2, integrating over the part of T below the line X + Y = z (see
Fig. 12.10), we have:
P(Z < z) = ∫_0^z ∫_0^{z−y} (3/2) xy dx dy
= (3/4) ∫_0^z y (z − y)² dy
= (3/4) ∫_0^z (z²y − 2zy² + y³) dy
= (3/4) [ (1/2) z²y² − (2/3) zy³ + (1/4) y⁴ ]_0^z
= (3/4) ( (1/2) z⁴ − (2/3) z⁴ + (1/4) z⁴ )
= z⁴/16.
Summing up:

F_Z(z) = 0 for z < 0,
F_Z(z) = z⁴/16 for 0 ≤ z ≤ 2,
F_Z(z) = 1 for z > 2.
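After integrating out x, the computation above reduces to P(X + Y < z) = ∫_0^z (3/4) y (z − y)² dy, which can be checked numerically (an editor's addition):

```python
def integral(f, a, b, n=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# inner x-integral already carried out: P(X + Y < z) = int_0^z g(y, z) dy
g = lambda y, z: 0.75 * y * (z - y) ** 2

assert abs(integral(lambda y: g(y, 2.0), 0, 2) - 1.0) < 1e-8   # normalization
for z in (0.5, 1.0, 1.7):
    assert abs(integral(lambda y: g(y, z), 0, z) - z ** 4 / 16) < 1e-8
```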
Exercise 12.4 Let X, Y be two random numbers with joint density function

f(x, y) = Kx for y ≤ x ≤ y + 1, 0 ≤ y ≤ 2,
f(x, y) = 0 otherwise.

(a) Compute K.
(b) Compute the marginal density and the expectation of X.
Solution 12.4 (a) As in previous exercises, first we draw the picture of the region
R of definition of the joint density, as shown by Fig. 12.11.
Since the integral of a density must be equal to 1, the constant of normalization is
given by

K = 1 / ∫∫ x dx dy,

where

∫∫ x dx dy = ∫_0^2 dy ∫_y^{y+1} x dx = ∫_0^2 [x²/2]_y^{y+1} dy
= ∫_0^2 ((y + 1)² − y²)/2 dy = (1/6) [(y + 1)³ − y³]_0^2 = 3.
We conclude that K = 1/3.
(b) To compute the marginal density of X we apply the formula

f_X(x) = ∫_R f(x, y) dy.

For 0 < x < 1 the integration limits are y = 0 and y = x;
for 1 < x < 2 they are y = x − 1 and y = x;
for 2 < x < 3 they are y = x − 1 and y = 2.
Summing up:

f_X(x) = (1/3) x² for 0 < x < 1,
f_X(x) = (1/3) x for 1 < x < 2,
f_X(x) = (1/3) x (3 − x) for 2 < x < 3,
f_X(x) = 0 otherwise.
Indeed

∫_0^1 (1/3) x² dx + ∫_1^2 (1/3) x dx + ∫_2^3 (1/3) x (3 − x) dx
= [x³/9]_0^1 + [x²/6]_1^2 + [x²/2 − x³/9]_2^3
= 1/9 + 1/2 + 7/18 = 1.
The expectation of X is

P(X) = ∫_0^1 (1/3) x³ dx + ∫_1^2 (1/3) x² dx + ∫_2^3 (1/3) x² (3 − x) dx
= 1/12 + 7/9 + 11/12 = 16/9.

(c) For the covariance we need P(XY) and P(Y):

P(XY) = ∫∫ xy f(x, y) dx dy = ∫_0^2 dy ∫_y^{y+1} (1/3) x² y dx
= (1/9) ∫_0^2 y ((y + 1)³ − y³) dy
= [ (1/12) y⁴ + (1/9) y³ + (1/18) y² ]_0^2 = 22/9,

and

P(Y) = ∫∫ y f(x, y) dx dy = ∫_0^2 dy ∫_y^{y+1} (1/3) xy dx
= (1/6) ∫_0^2 y ((y + 1)² − y²) dy = (1/6) ∫_0^2 (2y² + y) dy
= [ (1/9) y³ + (1/12) y² ]_0^2 = 8/9 + 1/3 = 11/9.

We obtain

cov(X, Y) = P(XY) − P(X)P(Y) = 22/9 − (16/9)(11/9) = 22/81,

i.e. X and Y are positively correlated.
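The moments above are easy to get wrong by hand, so here is an independent numerical check by midpoint integration over the strip (an editor's addition):

```python
def integral(f, a, b, n=400):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def moment(g):
    # E[g(X, Y)] for the density f(x, y) = x/3 on {y <= x <= y+1, 0 <= y <= 2}
    return integral(lambda y: integral(lambda x: g(x, y) * x / 3, y, y + 1), 0, 2)

assert abs(moment(lambda x, y: 1) - 1) < 1e-4      # normalization
EX = moment(lambda x, y: x)
EY = moment(lambda x, y: y)
EXY = moment(lambda x, y: x * y)

assert abs(EX - 16 / 9) < 1e-4
assert abs(EY - 11 / 9) < 1e-4
assert abs(EXY - 22 / 9) < 1e-4
assert abs((EXY - EX * EY) - 22 / 81) < 1e-4       # cov(X, Y) = 22/81 > 0
```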
(d) To compute P(0 < X − Y < 1), we note that the strip {(x, y) : y < x < y + 1}
entirely contains the domain of definition of the density, see Fig. 12.13. Hence

P(0 < X − Y < 1) = 1.
(a) Compute K .
(b) Compute the joint c.d.f., the expectation, the variance and the covariance of X
and Y .
(c) Let Z = X 2 . Compute the c.d.f., the expectation and the variance of Z .
(d) Compute the correlation coefficients ρ(X, Z ), ρ(X + Y, Z ).
(b) Since the random numbers X and Y are stochastically independent, their joint
c.d.f. is given by the product of the marginal c.d.f.’s:
It is sufficient to compute
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt.$$

We obtain

$$F_X(x) = \begin{cases}
0 & x < 1,\\
\dfrac{4}{11}\displaystyle\int_1^x (t^3 - 1)\,dt = \dfrac{4}{11}\left(\dfrac{x^4}{4} - x + \dfrac{3}{4}\right) & x \in [1,2],\\
1 & x \ge 2.
\end{cases}$$
Hence
$$F(x,y) = \begin{cases}
0 & \text{for } x < 1 \text{ or } y < 1,\\
\left(\frac{4}{11}\right)^2\left(\frac{x^4}{4} - x + \frac{3}{4}\right)\left(\frac{y^4}{4} - y + \frac{3}{4}\right) & \text{for } (x,y) \in [1,2]\times[1,2],\\
\frac{4}{11}\left(\frac{x^4}{4} - x + \frac{3}{4}\right) & \text{for } x \in [1,2],\ y > 2,\\
\frac{4}{11}\left(\frac{y^4}{4} - y + \frac{3}{4}\right) & \text{for } x > 2,\ y \in [1,2],\\
1 & \text{for } x > 2,\ y > 2.
\end{cases}$$
Since X and Y are stochastically independent, $\operatorname{cov}(X, Y) = 0$.
Hence

$$\sigma^2(X) = P(X^2) - P(X)^2 = \frac{98}{33} - \left(\frac{94}{55}\right)^2$$

and

$$P(Z) = P(X^2) = \frac{98}{33}.$$
Exercise 12.6 The random numbers X and Y are stochastically independent. The
probability density f X (x) of X is given by:
$$f_X(x) = \begin{cases} 2x & \text{for } 0 \le x \le 1,\\ 0 & \text{otherwise,} \end{cases}$$

while Y has exponential distribution with parameter $\lambda = 1$, so that

$$P(Y) = \frac{1}{\lambda} = 1, \qquad \sigma^2(Y) = \frac{1}{\lambda^2} = 1.$$
(b) The random numbers X and Y are stochastically independent, hence their joint
density is equal to
f (x, y) = f X (x) f Y (y),
i.e.
$$f(x,y) = \begin{cases} 2x\,e^{-y} & \text{for } 0 \le x \le 1 \text{ and } y \ge 0,\\ 0 & \text{otherwise.} \end{cases}$$
after having identified the domain D of definition of the joint density as shown
in Fig. 12.14.
We obtain that
$$F(x,y) = \begin{cases}
\displaystyle\int_0^x\!\!\int_0^y 2s\,e^{-t}\,dt\,ds = x^2\left(1 - e^{-y}\right) & \text{for } 0 \le x \le 1 \text{ and } y \ge 0,\\[4pt]
\displaystyle\int_0^1\!\!\int_0^y 2s\,e^{-t}\,dt\,ds = 1 - e^{-y} & \text{for } x > 1 \text{ and } y \ge 0,\\[4pt]
0 & \text{otherwise.}
\end{cases}$$
(ii) the formula for the variance of the sum of 2 random numbers:
$$\sigma^2(Z) = \sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y) + 2\operatorname{cov}(X,Y) = \sigma^2(X) + \sigma^2(Y) = \frac{19}{18}.$$
To compute the distribution function of $Z = X + Y$, we use the fact that

$$F_Z(z) = P(X + Y \le z) = \int_{D_z} f(x,y)\,dx\,dy,$$

where for every fixed z, $D_z$ is the region of the plane determined by the intersection of the domain D of definition of the density with the half-plane $S_z = \{(x,y) \mid x + y \le z\}$. Figures 12.15 and 12.16 show the region intersected by $S_z$ on D as z varies.
We obtain that:
(i) for $z < 0$, $F_Z(z) = 0$;

(ii) for $0 < z < 1$,

$$F_Z(z) = \int_0^z 2x\int_0^{z-x} e^{-y}\,dy\,dx = \int_0^z 2x\left(1 - e^{-(z-x)}\right)dx = z^2 + 2(1-z) - 2e^{-z};$$
Exercise 12.7 The random numbers X and Y have bidimensional Gaussian density
$$p(x,y) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x^2+y^2)}.$$
Let U = 2X + 3Y and V = X − Y . Compute:
1. The covariance matrix of U and V .
2. The joint density of U and V .
Solution 12.7 1. We compute the covariance matrix of U and V :
$$C = \begin{pmatrix} \sigma^2(U) & \operatorname{cov}(U,V)\\ \operatorname{cov}(U,V) & \sigma^2(V) \end{pmatrix}.$$
In order to compute C we use the formula of the variance of the sum of 2 random
numbers and the bilinearity of the covariance:
• $\sigma^2(U) = \sigma^2(2X+3Y) = 4\,\sigma^2(X) + 9\,\sigma^2(Y) + 2\cdot 6\,\operatorname{cov}(X,Y) = 13$;

• $\sigma^2(V) = \sigma^2(X-Y) = \sigma^2(X) + \sigma^2(Y) - 2\,\operatorname{cov}(X,Y) = 2$;
• $\operatorname{cov}(U,V) = \operatorname{cov}(2X+3Y,\,X-Y) = 2\,\sigma^2(X) + \operatorname{cov}(X,Y) - 3\,\sigma^2(Y) = 2 - 3 = -1.$

Hence $C = \begin{pmatrix} 13 & -1\\ -1 & 2 \end{pmatrix}$.
2. To compute the joint density of (U, V ), we first compute the joint c.d.f. of (U, V )
given by

$$F(u,v) = P(U \le u,\ V \le v) = P(2X + 3Y \le u,\ X - Y \le v).$$

This probability is given by the integral of the joint density on the domain $D_{u,v}$ of $\mathbb{R}^2$, where

$$D_{u,v} = \{(x,y) \in \mathbb{R}^2 \mid 2x + 3y \le u,\ x - y \le v\}.$$
We obtain

$$F(u,v) = \int_{D_{u,v}} f(x,y)\,dx\,dy.$$

With the change of variables

$$z = 2x + 3y, \qquad t = x - y,$$

the domain becomes

$$\hat D_{u,v} = \{(z,t) \in \mathbb{R}^2 \mid z \le u,\ t \le v\},$$

with sides parallel to the axes. If we now compute x, y as functions of z and t, we obtain

$$x = \frac{1}{5}(z + 3t), \qquad y = \frac{1}{5}(z - 2t).$$
It follows that the Jacobian matrix of $\Psi(z,t) = (\Psi_1(z,t), \Psi_2(z,t)) = \left(\frac{z+3t}{5}, \frac{z-2t}{5}\right)$ is equal to

$$J_\Psi = \begin{pmatrix} \frac{\partial\Psi_1}{\partial z} & \frac{\partial\Psi_1}{\partial t}\\[4pt] \frac{\partial\Psi_2}{\partial z} & \frac{\partial\Psi_2}{\partial t} \end{pmatrix} = \begin{pmatrix} \frac{1}{5} & \frac{3}{5}\\[4pt] \frac{1}{5} & -\frac{2}{5} \end{pmatrix},$$

with determinant

$$|\det J_\Psi| = \frac{1}{5}.$$
We obtain:

$$\begin{aligned}
F(u,v) &= \int_{D_{u,v}} f(x,y)\,dx\,dy = \int_{\hat D_{u,v}} f(\Psi(z,t))\,|\det J_\Psi|\,dz\,dt\\
&= \int_{-\infty}^{u}\!\int_{-\infty}^{v} \frac{1}{2\pi}\,e^{-\frac{1}{2}\left[\left(\frac{z+3t}{5}\right)^2 + \left(\frac{z-2t}{5}\right)^2\right]}\,\frac{1}{5}\,dt\,dz\\
&= \int_{-\infty}^{u}\!\int_{-\infty}^{v} \frac{1}{10\pi}\,e^{-\frac{1}{2}\cdot\frac{1}{25}\left(2z^2 + 13t^2 + 2zt\right)}\,dt\,dz.
\end{aligned}$$
1. Compute k.
2. Compute the expectations P(X ), P(Y ) and P(Z ).
3. Compute the density of the random vector (X, Z ).
4. Compute the correlation coefficient between X and Z and between X and Y .
5. Let W = X + Z ; compute the probability density of W .
$$f(x,y,z) = k\,e^{-\frac{1}{2}Av\cdot v + b\cdot v},$$

where b is the vector in $\mathbb{R}^3$

$$b = \begin{pmatrix} -1\\ 3\\ 0 \end{pmatrix},$$

and v is given by

$$v = \begin{pmatrix} x\\ y\\ z \end{pmatrix}.$$
We have

$$\det A = 1, \qquad A^{-1} = \begin{pmatrix} 1 & 1 & 0\\ 1 & 2 & 0\\ 0 & 0 & 1 \end{pmatrix},$$

from which

$$A^{-1}b = \begin{pmatrix} 2\\ 5\\ 0 \end{pmatrix}$$

and

$$k = e^{-\frac{1}{2}A^{-1}b\cdot b}\,\sqrt{\frac{\det A}{(2\pi)^3}} = \frac{e^{-\frac{13}{2}}}{\sqrt{(2\pi)^3}}.$$
To prove this, we derive the joint density $f_{X,Z}(x,z)$ from $f(x,y,z)$ as follows:

$$\begin{aligned}
f_{X,Z}(x,z) &= \int_{\mathbb{R}} f(x,y,z)\,dy = \int_{\mathbb{R}} k\,e^{-\frac{1}{2}\left(2x^2 - 2xy + y^2 + z^2 + 2x - 6y\right)}\,dy\\
&= k\,e^{-\frac{1}{2}\left(2x^2 + z^2 + 2x\right)}\int_{\mathbb{R}} e^{-\frac{1}{2}\left(y^2 - 2xy\right) + 3y}\,dy\\
&= k\,e^{-\frac{1}{2}\left(2x^2 + z^2\right) - x}\int_{\mathbb{R}} e^{-\frac{1}{2}y^2 + (3+x)y}\,dy.
\end{aligned}$$
We now apply the one-dimensional Gaussian integral formula $\int_{\mathbb{R}} e^{-\frac{1}{2}Ay^2 + by}\,dy = \sqrt{\frac{2\pi}{A}}\,e^{\frac{b^2}{2A}}$ with $A = 1$ and $b = 3 + x$ to the exponent $-\frac{1}{2}y^2 + (3+x)y$ in the integral. It follows that
$$\begin{aligned}
f_{X,Z}(x,z) &= k\,e^{-\frac{1}{2}\left(2x^2+z^2\right)-x}\cdot\sqrt{2\pi}\,e^{\frac{1}{2}(3+x)^2}\\
&= \frac{e^{-\frac{13}{2}+\frac{9}{2}}}{2\pi}\,e^{-\frac{1}{2}\left(x^2+z^2\right)+2x}\\
&= \frac{e^{-2}}{2\pi}\,e^{-\frac{1}{2}\left(x^2+z^2\right)+2x}.
\end{aligned}$$
Since $\operatorname{cov}(X,Z) = 0$, we have $\rho(X,Z) = 0$, while

$$\rho(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma(X)\,\sigma(Y)} = \frac{1}{\sqrt{1}\,\sqrt{2}} = \frac{\sqrt{2}}{2}.$$

Finally, $W = X + Z$ has expectation $P(W) = P(X) + P(Z) = 2$ and variance

$$\sigma^2(W) = \sigma^2(X) + \sigma^2(Z) + 2\operatorname{cov}(X,Z) = 2.$$
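The matrix computations of this solution can be verified with a few lines of linear algebra. The matrix A below is the one reconstructed from the exponent used in the marginalization; since A itself is not restated in this passage, it is an assumption:

```python
import numpy as np

# Matrix A assumed from the exponent
# -(1/2)(2x^2 - 2xy + y^2 + z^2) + (-x + 3y) used in the solution.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([-1.0, 3.0, 0.0])

assert np.isclose(np.linalg.det(A), 1.0)
assert np.allclose(np.linalg.inv(A), [[1, 1, 0], [1, 2, 0], [0, 0, 1]])
mean = np.linalg.solve(A, b)           # expectations P(X), P(Y), P(Z)
assert np.allclose(mean, [2.0, 5.0, 0.0])
assert np.isclose(mean @ b, 13.0)      # A^{-1}b . b, used in k
```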
Chapter 13
Markov Chains
Solution 13.1 (a) To determine equivalence classes of the states, we can draw a
graph of the transition probabilities by using the matrix P. We first represent
the states (see Fig. 13.1) and then connect with an arrow two states such that
the transition probability from one to the other is strictly positive. For example,
since
$$[P]_{1,2} = \frac{3}{4},$$
the chain has positive probability to go from the state 1 to the state 2 in one step.
We represent this on the graph by connecting 1 and 2 with an arrow going from
1 to 2, see Fig. 13.2.
Analogously, $[P]_{1,1} = \frac{1}{4}$ means that the chain has positive probability of remaining in the state 1. This can be represented as illustrated in Fig. 13.3.
By using this procedure we can construct the graph of Fig. 13.4.
From the graph we deduce that all elements can communicate with each other, i.e. there exist paths that connect each state to all the other ones with positive probability. We conclude that there exists only one equivalence class [1].
Furthermore, we can deduce from the graph that the period of the chain is 1, since there exists a path of length 1 from state 1 to itself, i.e.

$$1 \in \{n \mid p^{(n)}_{1,1} > 0\}.$$
(b) To compute $p^{(2)}_{2,1}$, i.e. the probability of going in 2 steps from the state 2 to the state 1, we write

$$p^{(2)}_{2,1} = \sum_{i\in S} p^{(1)}_{2,i}\,p^{(1)}_{i,1}.$$
This formula shows how the probability of going in 2 steps from the state 2 to
the state 1 can be computed as the sum of the probabilities of all possible paths
from 2 to 1.
From the graph relative to the matrix P we obtain the value of $p^{(2)}_{2,1}$. Note that we can compute $p^{(2)}_{2,1}$ by taking the product of a matrix row with a matrix column:

$$p^{(2)}_{2,1} = P_2 \cdot P^1,$$

where $P_2$ denotes the second row and $P^1$ the first column of the matrix P.
Analogously we compute $p^{(2)}_{1,4}$ and $p^{(2)}_{1,1}$.
(c) Since the chain is irreducible and aperiodic, the ergodic theorem guarantees the
existence of the limit
$$\lim_{n\to\infty} p^{(n)}_{1,3} = \pi_3,$$
where $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)$ solves ${}^t\pi\,P = {}^t\pi$, i.e.

$$\begin{cases}
\pi_1 = \sum_i \pi_i\,p_{i,1}\\
\pi_2 = \sum_i \pi_i\,p_{i,2}\\
\pi_3 = \sum_i \pi_i\,p_{i,3}\\
\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1.
\end{cases}$$
In this case

$$\begin{cases}
\pi_1 = \frac{1}{4}\pi_1 + \frac{1}{4}\pi_3\\[2pt]
\pi_2 = \frac{3}{4}\pi_1 + \frac{1}{3}\pi_4\\[2pt]
\pi_3 = \frac{2}{3}\pi_2 + \frac{3}{4}\pi_3\\[2pt]
\sum_{i=1}^{4}\pi_i = 1,
\end{cases}$$

from which

$$\lim_{n\to\infty} p^{(n)}_{1,3} = \pi_3 = \frac{12}{25}.$$
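The stationary system above is linear and can be solved numerically; a sketch where the redundant balance equation is replaced by the normalization condition:

```python
import numpy as np

# Stationary equations of Exercise 13.1 written as a linear system M pi = c.
M = np.array([[3/4, 0.0, -1/4, 0.0],    # pi1 = (1/4)pi1 + (1/4)pi3
              [-3/4, 1.0, 0.0, -1/3],   # pi2 = (3/4)pi1 + (1/3)pi4
              [0.0, -2/3, 1/4, 0.0],    # pi3 = (2/3)pi2 + (3/4)pi3
              [1.0, 1.0, 1.0, 1.0]])    # pi1 + pi2 + pi3 + pi4 = 1
c = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.solve(M, c)
assert np.allclose(pi, [4/25, 9/50, 12/25, 9/50])
```

In particular `pi[2]` reproduces the limit $\pi_3 = 12/25$ and `pi[1]` the value $\pi_2 = 9/50$ used below.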
184 13 Markov Chains
$$P(X_n = 2) = \sum_{i=1}^{4} P(X_n = 2 \mid X_0 = i)\,P(X_0 = i) = \frac{1}{4}\sum_{i=1}^{4} p^{(n)}_{i,2}.$$

Since all the states belong to the same aperiodic class, we have

$$\lim_{n\to\infty} P(X_n = 2) = \lim_{n\to\infty} \frac{1}{4}\sum_{i=1}^{4} p^{(n)}_{i,2} = \frac{1}{4}\cdot 4\,\pi_2 = \pi_2,$$

where $\pi_2 = \dfrac{9}{50}$.
Exercise 13.2 A Markov chain X n , n = 0, 1, 2 . . . with states
S = {1, 2, 3, 4, 5, 6}
Solution 13.2 1. As in the previous exercise, we draw the graph of the states as
in Fig. 13.5, in order to determine the equivalence classes of the states and their
periods. We connect the states with an arrow in the case there exists a positive
probability to pass from the state, where the arrow starts, to the state where the
arrow ends.
By using the transition matrix P we obtain the graph shown in Fig. 13.6, where
we can see that there exists only one equivalence class. We note that the number
of steps needed to come back to the state from where we started is always even.
Furthermore $p^{(2)}_{1,1} > 0$, hence it follows that the period of the equivalence class is 2. Namely

$$2 = \gcd A^+_s, \qquad \text{where } A^+_s = \{n \mid p^{(n)}_{s,s} > 0\}.$$
2. To study the limits, we consider the equivalence classes of the matrix P 2 ; we obtain
two equivalence classes, each of period 1. To derive the equivalence classes, it
is not necessary to compute the whole matrix P 2 ; for example, the equivalence
class of 1 will be given by all states that communicate with 1 with an even number
of steps. We obtain
The state 5 belongs to the class [1] calculated with respect to P 2 . Hence we can
apply the ergodic theorem to that class, since it has period 1 with respect to P 2 .
The submatrix of P 2 relative to [1] is given by:
$$\begin{pmatrix}
\frac{5}{18} & \frac{9}{18} & \frac{2}{9}\\[4pt]
\frac{1}{6} & \frac{11}{18} & \frac{2}{9}\\[4pt]
\frac{1}{6} & \frac{1}{2} & \frac{1}{3}
\end{pmatrix}.$$
We obtain

$$\pi_1 = \frac{3}{16}, \qquad \pi_3 = \frac{9}{16}, \qquad \pi_5 = \frac{1}{4}.$$

Hence

$$\lim_{n\to\infty} p^{(2n)}_{1,5} = \frac{1}{4}.$$
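Since the subchain is irreducible and aperiodic, the rows of $M^n$ all converge to the stationary distribution; a short numerical check:

```python
import numpy as np

# Submatrix of P^2 on the class [1] = {1, 3, 5}, as given above.
M = np.array([[5/18, 9/18, 2/9],
              [1/6, 11/18, 2/9],
              [1/6, 1/2, 1/3]])
assert np.allclose(M.sum(axis=1), 1.0)

# Every row of M^n converges to (3/16, 9/16, 1/4).
Mn = np.linalg.matrix_power(M, 60)
target = np.array([3/16, 9/16, 1/4])
assert np.allclose(Mn, np.outer(np.ones(3), target), atol=1e-10)
```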
To obtain the asymptotic behavior of $p^{(n)}_{3,5}$ for n going to infinity, we note that

(a) on the even steps, i.e. for $n = 2k$, we have

$$p^{(2k)}_{3,5} \xrightarrow[k\to\infty]{} \pi_5;$$

(b) on the odd steps, $p^{(2k+1)}_{3,5} = 0$ for all k, since the probability of going from the state 3 to the state 5 in an odd number of steps is zero.

Since the limits on the two subsequences are different, we can conclude that $\lim_{n\to\infty} p^{(n)}_{3,5}$ does not exist.
Finally we study

$$P(X_n = 5) = \sum_{i=1}^{6} P(X_n = 5 \mid X_0 = i)\,P(X_0 = i) = \sum_{i=1}^{6} p^{(n)}_{i,5}\,\mu_i = \frac{1}{3}\,p^{(n)}_{1,5} + \frac{2}{3}\,p^{(n)}_{2,5}.$$

On the odd steps, i.e. for $n = 2k+1$,

$$\frac{1}{3}\,p^{(2k+1)}_{1,5} + \frac{2}{3}\,p^{(2k+1)}_{2,5} = \frac{2}{3}\,p^{(2k+1)}_{2,5} = \frac{2}{3}\sum_{i=1}^{6} p^{(1)}_{2,i}\,p^{(2k)}_{i,5},$$

which tends to $\frac{2}{3}\,\pi_5\sum_{i=1}^{6} p^{(1)}_{2,i} = \frac{2}{3}\,\pi_5$ for $k \to \infty$, since $p^{(1)}_{2,i} = 0$ for every i such that $\lim_{k\to\infty} p^{(2k)}_{i,5} \ne \pi_5$. On the even steps the analogous argument gives the limit $\frac{1}{3}\,\pi_5$. Since the two subsequences have different limits, $\lim_{n\to\infty} P(X_n = 5)$ does not exist.
We now compute

$$P(X_2 < 3) = \sum_{i=1}^{2} P(X_2 = i),$$

where

$$P(X_2 = 1) = \sum_{i=1}^{6} P(X_2 = 1 \mid X_0 = i)\,P(X_0 = i) = \sum_{i=1}^{6} p^{(2)}_{i,1}\,\mu_i = \frac{1}{3}\,p^{(2)}_{1,1} + \frac{2}{3}\,p^{(2)}_{2,1} = \frac{5}{54}$$

and

$$P(X_2 = 2) = \sum_{i=1}^{6} P(X_2 = 2 \mid X_0 = i)\,P(X_0 = i) = \sum_{i=1}^{6} p^{(2)}_{i,2}\,\mu_i = \frac{1}{3}\,p^{(2)}_{1,2} + \frac{2}{3}\,p^{(2)}_{2,2} = \frac{2}{9}.$$

Finally, $P(X_2 < 3) = \dfrac{5}{54} + \dfrac{12}{54} = \dfrac{17}{54}$.
Exercise 13.3 A Markov chain X n , n = 0, 1, 2 . . . with states
S = {1, 2, 3, 4}
Solution 13.3 1. To find the equivalence classes we draw the graph of the states
as shown in Fig. 13.7.
Since we can reach all other states by starting from the state 1 and the state 1
can be reached from all other states, there exists only one equivalence class.
Furthermore, starting from 1 we always return to it in an even number of steps, and $p^{(2)}_{1,1} > 0$. We conclude that the period of the class is equal to 2.
2. To compute the conditional probability P(X 5 = 2|X 2 = 3) we use the fact that
the chain is homogeneous. It holds that
$$P(X_5 = 2 \mid X_2 = 3) = p^{(3)}_{3,2} = [P^3]_{3,2} = 0.$$

To compute this probability we would need the element on row 3 and column 2 of the matrix $P^3$, which can be obtained by multiplying the third row of $P^2$ with the second column of P. However, without any computation we can immediately state that the probability of going from the state 3 to the state 2 in an odd number of steps is 0!

Furthermore

$$p^{(2)}_{1,4} = [P^2]_{1,4} = \frac{7}{12}.$$
We now compute the expectation of the state of the chain at time t = 2 by using
the formulas of the expectation of a random number with discrete distribution
and of the total probabilities:
$$\begin{aligned}
P(X_2) &= \sum_{i=1}^{4} i\,P(X_2 = i) = \sum_{i=1}^{4}\sum_{j=1}^{4} i\,P(X_2 = i \mid X_0 = j)\,P(X_0 = j)\\
&= \sum_{i=1}^{4} \frac{i}{3}\left(P(X_2 = i \mid X_0 = 1) + P(X_2 = i \mid X_0 = 2) + P(X_2 = i \mid X_0 = 3)\right)\\
&= \sum_{i=1}^{4} \frac{i}{3}\left([P^2]_{1,i} + [P^2]_{2,i} + [P^2]_{3,i}\right).
\end{aligned}$$
3. We now compute the limits. The Markov chain observed on the even steps can
be considered as a Markov chain with transition matrix equal to P 2 . We can
immediately see that the state 3 cannot be reached from the state 1 with an even
number of steps. In fact, the equivalence classes relative to P 2 are given by
It follows that

$$\lim_{n\to\infty} p^{(2n)}_{1,3} = 0.$$

The state 4 belongs to the equivalence class [1] relative to $P^2$. This class has period 1, hence we can apply the ergodic theorem to this irreducible aperiodic subchain to compute $\lim_{n\to\infty} p^{(2n)}_{1,4}$. If we put $\pi_4 = \lim_{n\to\infty} p^{(2n)}_{1,4}$ and $\pi_1 = \lim_{n\to\infty} p^{(2n)}_{1,1}$, by the ergodic theorem we obtain
$$\begin{cases}
\pi_1 + \pi_4 = 1\\[2pt]
\frac{5}{12}\,\pi_1 + \frac{13}{24}\,\pi_4 = \pi_1,
\end{cases}$$

whence

$$\pi_1 = \frac{13}{27}, \qquad \pi_4 = \frac{14}{27}.$$

It follows that

$$\lim_{n\to\infty} p^{(2n)}_{1,4} = \frac{14}{27}.$$
To compute $\lim_{n\to\infty} p^{(n)}_{2,3}$ we observe the behavior of the chain on the even steps and on the odd ones.

(a) First we note that $2 \in [3]$ relative to $P^2$. By the ergodic theorem we have that on the even steps (i.e. if $n = 2k$)

$$p^{(2k)}_{2,3} \xrightarrow[k\to\infty]{} \pi_3.$$

(b) There exists no path with an odd number of steps from the state 2 to the state 3, hence for all k

$$p^{(2k+1)}_{2,3} = \sum_j p^{(2k)}_{2,j}\,p^{(1)}_{j,3} = p^{(2k)}_{2,1}\,p^{(1)}_{1,3} + p^{(2k)}_{2,4}\,p^{(1)}_{4,3} = 0.$$

Summing up,

$$p^{(2k)}_{2,3} \xrightarrow[k\to\infty]{} \pi_3 > 0, \qquad p^{(2k+1)}_{2,3} \xrightarrow[k\to\infty]{} 0.$$

Hence the limit $\lim_{n\to\infty} p^{(n)}_{2,3}$ does not exist.
Finally,

$$P(X_n = 2) = \sum_{i=1}^{4} P(X_n = 2 \mid X_0 = i)\,P(X_0 = i) = \frac{1}{3}\left(p^{(n)}_{1,2} + p^{(n)}_{2,2} + p^{(n)}_{3,2}\right).$$

On the even and on the odd steps we obtain

$$P(X_{2n} = 2) \xrightarrow[n\to\infty]{} \frac{2}{3}\,\pi_2, \qquad P(X_{2n+1} = 2) \xrightarrow[n\to\infty]{} \frac{1}{3}\,\pi_2,$$

so that $\lim_{n\to\infty} P(X_n = 2)$ does not exist.
Exercise 13.4 A Markov chain (X n )n ∈ N with states S = {1, 2, 3, 4, 5} has the fol-
lowing transition matrix
$$P = \begin{pmatrix}
0 & \frac{1}{2} & \frac{1}{2} & 0 & 0\\[4pt]
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0\\[4pt]
0 & \frac{2}{3} & 0 & \frac{1}{3} & 0\\[4pt]
0 & 0 & \frac{2}{3} & 0 & \frac{1}{3}\\[4pt]
\frac{2}{3} & 0 & 0 & \frac{1}{3} & 0
\end{pmatrix},$$

where rows and columns are indexed by the states 1, …, 5.
Solution 13.4 (a) To determine the equivalence classes of the states and their peri-
ods we draw the graph of the states, see Fig. 13.8.
We first note that all the states communicate with each other. Consider the set

$$A^+_1 = \{n \mid p^{(n)}_{1,1} > 0\},$$

i.e. the set of the lengths of the paths that start and end in 1. We note that there exists a path of length 2 (for example, from 1 to 2 and from 2 to 1) and of length 3 (from 1 to 3, from 3 to 2, from 2 to 1). We have

$$2, 3 \in A^+_1.$$

Since the period is $d = \gcd(A^+_1)$,
hence d must be equal to 1 since it must divide 2 and 3. We conclude that there
exists only one equivalence class of period 1.
(b) By the ergodic theorem it follows that all the limits exist since the chain has a
unique equivalence class of period 1. First we note that
$$\lim_{n\to\infty} p^{(n)}_{1,5} = \lim_{n\to\infty} p^{(n)}_{3,5} = \pi_5$$
and finally

$$\lim_{n\to\infty} P(X_n = 5) = \lim_{n\to\infty}\sum_{i=1}^{5} P(X_n = 5 \mid X_0 = i)\,P(X_0 = i) = \lim_{n\to\infty}\sum_{i=1}^{5} \mu(i)\,p^{(n)}_{i,5} = \pi_5\sum_{i=1}^{5}\mu(i) = \pi_5\cdot 1 = \pi_5,$$

since $\lim_{n\to\infty} p^{(n)}_{i,5} = \pi_5$ for all $i = 1, \dots, 5$ and $\sum_{i=1}^{5}\mu(i) = 1$. To obtain the $\pi_i$, it is sufficient to solve the system

$$\begin{cases} {}^t\pi\,P = {}^t\pi\\ \sum_{i=1}^{5}\pi_i = 1, \end{cases}$$
i.e.

$$\begin{cases}
\pi_1 = \frac{1}{2}\pi_2 + \frac{2}{3}\pi_5\\[2pt]
\pi_2 = \frac{1}{2}\pi_1 + \frac{2}{3}\pi_3\\[2pt]
\pi_4 = \frac{1}{3}\pi_3 + \frac{1}{3}\pi_5\\[2pt]
\pi_5 = \frac{1}{3}\pi_4\\[2pt]
\sum_{i=1}^{5}\pi_i = 1.
\end{cases}$$

Solving the system we obtain in particular

$$\pi_3 = \frac{7}{540} \quad\text{and}\quad \pi_5 = \frac{14}{135}.$$
Hence, summing up,

$$\lim_{n\to\infty} p^{(n)}_{1,5} = \lim_{n\to\infty} p^{(n)}_{3,5} = \lim_{n\to\infty} P(X_n = 5) = \pi_5 = \frac{14}{135},$$

$$\lim_{n\to\infty}\left(p^{(n)}_{2,3} + p^{(n)}_{3,5}\right) = \pi_3 + \pi_5 = \frac{7}{540} + \frac{14}{135} = \frac{7}{60}.$$
(c) To compute the probabilities, we note that

$$P(X_2 = 5) = \sum_{i=1}^{5} P(X_2 = 5 \mid X_0 = i)\,\mu(i) = \frac{2}{3}\,p^{(2)}_{2,5} + \frac{1}{3}\,p^{(2)}_{3,5} = \frac{2}{3}\,[P^2]_{2,5} + \frac{1}{3}\,[P^2]_{3,5} = \frac{1}{3}\cdot\frac{1}{9} = \frac{1}{27}.$$
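Assuming the transition matrix as reconstructed above (the extraction of row 2 is slightly ambiguous, so the matrix here is a hypothesis to be checked against the printed text), part (c) can be verified by a matrix power:

```python
import numpy as np

# Transition matrix of Exercise 13.4 as reconstructed above
# (row 2 read as (1/2, 1/2, 0, 0, 0) -- an assumption).
P = np.array([[0.0, 1/2, 1/2, 0.0, 0.0],
              [1/2, 1/2, 0.0, 0.0, 0.0],
              [0.0, 2/3, 0.0, 1/3, 0.0],
              [0.0, 0.0, 2/3, 0.0, 1/3],
              [2/3, 0.0, 0.0, 1/3, 0.0]])
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a distribution

P2 = P @ P
# P(X_2 = 5) with initial distribution mu(2) = 2/3, mu(3) = 1/3:
p = 2/3 * P2[1, 4] + 1/3 * P2[2, 4]
assert abs(p - 1/27) < 1e-12
```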
Chapter 14
Statistics
E 1 = 0, E 2 = 1, E 3 = 1, E 4 = 1.
Solution 14.1 1. The a posteriori density can be computed by using the formula
$$\pi_4(\theta \mid E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1) = k\,P(E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1 \mid \Theta = \theta)\,\pi_0(\theta).$$
1 Here and in the sequel, for a given function f we denote by arg max( f ) the points where f achieves
its maximum.
© Springer International Publishing Switzerland 2016 197
F. Biagini and M. Campanino, Elements of Probability and Statistics,
UNITEXT - La Matematica per il 3+2 98, DOI 10.1007/978-3-319-07254-8_14
198 14 Statistics
The likelihood $P(E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1 \mid \Theta = \theta)$ can be factorized as

$$P(E_1 = 0 \mid \Theta = \theta)\cdot P(E_2 = 1 \mid \Theta = \theta)\cdot P(E_3 = 1 \mid \Theta = \theta)\cdot P(E_4 = 1 \mid \Theta = \theta) = (1-\theta)\,\theta^3.$$

The normalization constant is

$$k = \frac{\Gamma(6+2)}{\Gamma(6)\,\Gamma(2)} = \frac{7!}{5!} = 42.$$
2. The a priori probability that Θ belongs to the interval $\left[\frac{1}{2}, 1\right]$ is given by:

$$P\!\left(\tfrac{1}{2} \le \Theta \le 1\right) = \int_{\frac{1}{2}}^{1} \pi_0(\theta)\,d\theta = \int_{\frac{1}{2}}^{1} 3\theta^2\,d\theta = \left[\theta^3\right]_{\frac{1}{2}}^{1} = \frac{7}{8}.$$
3. The a posteriori probability that Θ belongs to the interval $\left[\frac{1}{2}, 1\right]$ is given by:

$$P\!\left(\tfrac{1}{2} \le \Theta \le 1 \,\Big|\, E_1 = 0, E_2 = E_3 = E_4 = 1\right) = \int_{\frac{1}{2}}^{1} \pi_4(\theta \mid E_1 = 0, E_2 = E_3 = E_4 = 1)\,d\theta = 42\int_{\frac{1}{2}}^{1} (\theta^5 - \theta^6)\,d\theta = 42\left[\frac{\theta^6}{6} - \frac{\theta^7}{7}\right]_{\frac{1}{2}}^{1} = \frac{15}{16}.$$
Computing

$$\frac{d}{d\theta}\,\pi_4(\theta \mid E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1) = 42\,\theta^4(5 - 6\theta),$$

we have that it is equal to 0 at $\bar\theta = \frac{5}{6}$. Since

$$\frac{d^2}{d\theta^2}\,\pi_4\,\Big|_{\theta=\frac{5}{6}} = 42\,\theta^3(20 - 30\theta)\,\Big|_{\theta=\frac{5}{6}} < 0,$$

the point $\bar\theta = \frac{5}{6}$ is the maximum point of the a posteriori density.
Finally we compute the a posteriori probability of the event $E = E_5 \wedge E_6 = \min(E_5, E_6) = E_5E_6$:

$$\begin{aligned}
P(E_5E_6 \mid E_1 = 0, E_2 = E_3 = E_4 = 1) &= \int_0^1 P(E_5E_6 \mid \theta)\,\pi_4(\theta \mid E_1 = 0, E_2 = E_3 = E_4 = 1)\,d\theta\\
&= \int_0^1 P(E_5 \mid \theta)\,P(E_6 \mid \theta)\cdot 42\,\theta^5(1-\theta)\,d\theta\\
&= \int_0^1 \theta^2\cdot 42\,\theta^5(1-\theta)\,d\theta = 42\int_0^1 \theta^7(1-\theta)\,d\theta\\
&= \frac{\Gamma(8)}{\Gamma(6)\,\Gamma(2)}\cdot\frac{\Gamma(8)\,\Gamma(2)}{\Gamma(10)} = \frac{\Gamma(8)}{\Gamma(6)}\cdot\frac{\Gamma(8)}{\Gamma(10)} = \frac{7\cdot 6}{9\cdot 8} = \frac{7}{12}.
\end{aligned}$$
2 For this, see the proof for the density of the sum of Γ (α, λ) + Γ (β, λ), where Γ (α, λ), Γ (β, λ)
are stochastically independent.
$$\pi_0(\theta) = \begin{cases} K\,\theta^2(1-\theta) & 0 \le \theta \le 1,\\ 0 & \text{otherwise.} \end{cases}$$
E 1 = 0, E 2 = 1, E 3 = 1, E 4 = 0, E 5 = 1.
Solution 14.2 1. To compute the constant k, we impose that the integral of the
density is equal to 1, i.e.
$$\int_0^1 \pi_0(\theta)\,d\theta = 1.$$

It follows that

$$k = \frac{1}{\int_0^1 \theta^2(1-\theta)\,d\theta} = \frac{\Gamma(3+2)}{\Gamma(3)\,\Gamma(2)} = \frac{4!}{2!\cdot 1!} = 12.$$
$$\begin{aligned}
\pi_5(\theta \mid E_1 = 0, E_2 = E_3 = 1, E_4 = 0, E_5 = 1) &= c\,\pi_0(\theta)\,P(E_1 = 0 \mid \theta)P(E_2 = 1 \mid \theta)P(E_3 = 1 \mid \theta)P(E_4 = 0 \mid \theta)P(E_5 = 1 \mid \theta)\\
&= c\,\theta^2(1-\theta)\cdot\theta^3(1-\theta)^2 = c\,\theta^5(1-\theta)^3,
\end{aligned}$$

where

$$c = \frac{\Gamma(6+4)}{\Gamma(6)\,\Gamma(4)} = \frac{9!}{5!\,3!} = 504.$$
$$P(X \mid W) = \sum_{i=0}^{2} i\,P(X = i \mid W) = P(X = 1 \mid W) + 2\,P(X = 2 \mid W) = \frac{24}{55} + 2\cdot\frac{21}{55} = \frac{6}{5}.$$
Note that X = E 6 + E 7 is a random number, but not an event since it can assume
3 possible values: 0, 1 or 2.
The a posteriori probability of the events $E = E_6E_7$ and $F = E_6 \vee E_7$ can be calculated in the same way:

$$P(E \mid W) = P(E_6E_7 = 1 \mid W) = P(E_6 = E_7 = 1 \mid W) = \frac{21}{55}$$

and
i.e.

$$K = \frac{1}{\int_0^1 \theta^2(1-\theta)^2\,d\theta},$$

since the integral of a probability density must be equal to 1. The integral appearing at the denominator is well known and equal to

$$\int_0^1 \theta^2(1-\theta)^2\,d\theta = \frac{\Gamma(3)^2}{\Gamma(6)},$$

hence

$$K = \frac{\Gamma(6)}{\Gamma(3)^2} = \frac{5!}{(2!)^2} = 30.$$
$$\pi_4(\theta \mid E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1) = K\,\pi_0(\theta)\,P(E_1 = 0 \mid \theta)\,P(E_2 = 1 \mid \theta)\,P(E_3 = 1 \mid \theta)\,P(E_4 = 1 \mid \theta) = K\,\theta^5(1-\theta)^3,$$

where $K = \frac{\Gamma(10)}{\Gamma(6)\Gamma(4)} = 504$ and $\theta \in [0,1]$. For $\theta \notin [0,1]$, the a posteriori density is equal to 0. To compute the a posteriori expectation of Θ, we apply the formula of the expectation for absolutely continuous distributions, i.e.

$$P(\Theta \mid E_1 = 0, E_2 = E_3 = E_4 = 1) = \int_0^1 \theta\,\pi_4(\theta \mid E_1 = 0, E_2 = E_3 = E_4 = 1)\,d\theta = \frac{\Gamma(10)}{\Gamma(6)\Gamma(4)}\int_0^1 \theta^6(1-\theta)^3\,d\theta = \frac{\Gamma(10)}{\Gamma(6)\Gamma(4)}\cdot\frac{\Gamma(7)\Gamma(4)}{\Gamma(11)} = \frac{3}{5}.$$
(c) The event $F = E_5^2$ coincides with $E_5$, since it assumes only the values 0 or 1. The a posteriori probability of F is given by

$$P(F \mid E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1) = P(E_5 \mid E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1) = \int_0^1 \theta\,\pi_4(\theta \mid E_1 = 0, E_2 = 1, E_3 = 1, E_4 = 1)\,d\theta = \frac{3}{5}.$$
$$\begin{aligned}
\sigma^2(\tilde E_6 \mid \tilde E_1 E_2 E_3 E_4) &= P(\tilde E_6^2 \mid \tilde E_1 E_2 E_3 E_4) - P(\tilde E_6 \mid \tilde E_1 E_2 E_3 E_4)^2\\
&= P(\tilde E_6 \mid \tilde E_1 E_2 E_3 E_4)\left(1 - P(\tilde E_6 \mid \tilde E_1 E_2 E_3 E_4)\right),
\end{aligned}$$

since $\tilde E_6^2 = \tilde E_6$. Moreover

$$\begin{aligned}
P(\tilde E_6 \mid \tilde E_1 E_2 E_3 E_4) &= P(\tilde E_6 E_5 \mid \tilde E_1 E_2 E_3 E_4) + P(\tilde E_6 \tilde E_5 \mid \tilde E_1 E_2 E_3 E_4)\\
&= \int_0^1 P(\tilde E_6 E_5 \mid \theta)\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta + \int_0^1 P(\tilde E_6 \tilde E_5 \mid \theta)\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta\\
&= \int_0^1 \theta(1-\theta)\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta + \int_0^1 (1-\theta)^2\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta\\
&= \int_0^1 (1-\theta)\,[\theta + 1 - \theta]\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta\\
&= \int_0^1 (1-\theta)\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta = 1 - \int_0^1 \theta\,\pi_4(\theta \mid \tilde E_1 E_2 E_3 E_4)\,d\theta = \frac{2}{5}.
\end{aligned}$$
Note that the a posteriori probabilities of E 5 and E 6 coincide.
Solution 14.4 (a) The normalization constant K makes the integral of the density equal to 1, hence

$$K = \frac{1}{\int_0^1 \theta^2(1-\theta)^{\frac{1}{2}}\,d\theta}.$$

We know that

$$\int_0^1 \theta^2(1-\theta)^{\frac{1}{2}}\,d\theta = \frac{\Gamma(3)\,\Gamma\!\left(\frac{3}{2}\right)}{\Gamma\!\left(3 + \frac{3}{2}\right)},$$

hence

$$K = \frac{\Gamma\!\left(\frac{9}{2}\right)}{\Gamma(3)\,\Gamma\!\left(\frac{3}{2}\right)} = \frac{\frac{7}{2}\cdot\frac{5}{2}\cdot\frac{3}{2}\cdot\Gamma\!\left(\frac{3}{2}\right)}{2!\,\Gamma\!\left(\frac{3}{2}\right)} = \frac{105}{16}.$$
(b) We compute the a posteriori density by using the fact that the events are stochastically independent subordinately to Θ. We have

$$\pi_4(\theta \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = K\,P(E_1 = 1 \mid \theta)\cdot P(E_2 = 0 \mid \theta)\cdot P(E_3 = 0 \mid \theta)\cdot P(E_4 = 1 \mid \theta)\,\pi_0(\theta) = K\,\theta^4(1-\theta)^{\frac{5}{2}},$$

where now

$$K = \frac{\Gamma\!\left(5 + \frac{7}{2}\right)}{\Gamma(5)\,\Gamma\!\left(\frac{7}{2}\right)} = \frac{\frac{15}{2}\cdot\frac{13}{2}\cdot\frac{11}{2}\cdot\frac{9}{2}\cdot\frac{7}{2}}{4!} = \frac{45045}{256}.$$

Hence

$$\pi_4(\theta \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = \begin{cases} \dfrac{45045}{256}\,\theta^4(1-\theta)^{\frac{5}{2}} & \theta \in [0,1],\\ 0 & \text{otherwise.} \end{cases}$$
The arg max can be computed by finding the zeros of the first derivative. We have

$$\pi_4'(\theta \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = K\left[4\theta^3(1-\theta)^{\frac{5}{2}} - \frac{5}{2}\,\theta^4(1-\theta)^{\frac{3}{2}}\right] = K\,\theta^3(1-\theta)^{\frac{3}{2}}\left[4(1-\theta) - \frac{5}{2}\,\theta\right] = \frac{K}{2}\,\theta^3(1-\theta)^{\frac{3}{2}}\,(8 - 13\theta).$$

The derivative is equal to 0 at the extremes of the interval as well as at

$$\bar\theta = \frac{8}{13}.$$

Since $\pi_4' > 0$ for $\theta \in \left(0, \frac{8}{13}\right)$ and $\pi_4' < 0$ for $\theta \in \left(\frac{8}{13}, 1\right)$, we have that

$$\arg\max\,\pi_4(\theta \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = \frac{8}{13}.$$
(c) The a posteriori covariance of the events $E_6$ and $E_7$ is given by

$$\operatorname{cov}(E_6, E_7 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = P(E_6E_7 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) - P(E_6 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1)\,P(E_7 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1).$$

We have

$$P(E_6 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = P(E_7 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = \int_0^1 \theta\,\pi_4(\theta \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1)\,d\theta = \frac{10}{17}.$$

Analogously

$$P(E_6E_7 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = \int_0^1 \theta^2\,\pi_4(\theta \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1)\,d\theta = \frac{120}{323}.$$

We conclude that

$$\operatorname{cov}(E_6, E_7 \mid E_1 = 1, E_2 = E_3 = 0, E_4 = 1) = \frac{120}{323} - \left(\frac{10}{17}\right)^2.$$
We assume that Θ has standard normal distribution. We observe the values of the
first 4 experiments:
Solution 14.5 (a) Since Θ has a standard normal distribution as a priori distribution, we can write immediately the a priori density

$$\pi_0(\theta) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{\theta^2}{2}}, \qquad \theta \in \mathbb{R}.$$
(b) We compute the a posteriori density by using the fact that the random numbers
are stochastically independent subordinately to Θ:
Note that all factors which are independent of θ are now included in the constant k.
We obtain that the a posteriori distribution is Gaussian $N\!\left(\frac{8}{25}, \frac{1}{5}\right)$, hence

$$k = \frac{\sqrt{5}}{\sqrt{2\pi}}.$$

The graph of the a posteriori density is bell-shaped with symmetry axis $x = \frac{8}{25}$. Then $\arg\max\,\pi_4(\theta \mid x_1 = 0.1, x_2 = 2, x_3 = -1, x_4 = 0.5) = \frac{8}{25}$; verify this by computing the derivatives of the density function.

(c) The parameters of the a posteriori density provide us with:

1. the a posteriori expectation

$$P(\Theta \mid x_1 = 0.1, x_2 = 2, x_3 = -1, x_4 = 0.5) = \frac{8}{25};$$

2. the a posteriori variance

$$\sigma^2(\Theta \mid x_1 = 0.1, x_2 = 2, x_3 = -1, x_4 = 0.5) = \frac{1}{5}.$$
x1 = 1, x2 = 0.5, x3 = −1.
(b) By the computations for the likelihood factor, we immediately obtain the a posteriori density, where we have put in the constant k all terms which are independent of θ. The a posteriori distribution is a normal distribution $N\!\left(\frac{1}{2}, \frac{4}{5}\right)$ with normalization constant

$$k = \frac{\sqrt{5}}{2\sqrt{2\pi}}.$$
(c) To estimate the a posteriori probability of the event $(\Theta > 1000)$ we use the tail estimate for the Gaussian distribution. To this purpose we first express Θ as a function of a random variable Y with distribution N(0,1). Since the a posteriori distribution of Θ is Gaussian $N\!\left(\frac{1}{2}, \frac{4}{5}\right)$, we have

$$\Theta = \frac{2\sqrt{5}}{5}\,Y + \frac{1}{2},$$

where $Y \sim N(0,1)$. Hence

$$P(\Theta > 1000) = P\!\left(\frac{2\sqrt{5}}{5}\,Y + \frac{1}{2} > 1000\right) = P\!\left(Y > \frac{\sqrt{5}}{2}\cdot 999.5\right).$$

By the tail estimate of the standard normal distribution,

$$P(Y > x) \le \frac{n(x)}{x}, \qquad x > 0,$$

where $n(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$. To obtain an upper bound for $P(\Theta > 1000)$ we can compute $\frac{n(x)}{x}$ at the point $x = \frac{\sqrt{5}}{2}\cdot 999.5$.
x1 = 1.5, x2 = 0.5, x3 = 2.
Solution 14.7 (a) Since the a priori distribution of Φ is given by a Gamma distribution Γ(2, 1), we can write immediately the a priori density:

$$\pi_0(\phi) = \begin{cases} \phi\,e^{-\phi} & \phi \ge 0,\\ 0 & \phi < 0. \end{cases}$$

(b) The a posteriori density is obtained, for φ ≥ 0 (0 otherwise), by putting in the constant k all factors which are independent of φ. The a posteriori distribution is then a Gamma distribution $\Gamma\!\left(\frac{7}{2}, \frac{7}{4}\right)$ with normalization constant

$$k = \frac{\left(\frac{7}{4}\right)^{\frac{7}{2}}}{\Gamma\!\left(\frac{7}{2}\right)} = \frac{\sqrt{7^7}}{240\sqrt{\pi}}.$$

The derivative $\frac{d}{d\phi}\,\pi_3(\phi \mid x_1 = 1.5, x_2 = 0.5, x_3 = 2)$ vanishes if and only if $\phi = \frac{10}{7}$; by analyzing the sign of the first derivative we immediately obtain that $\arg\max\,\pi_3(\phi \mid x_1 = 1.5, x_2 = 0.5, x_3 = 2) = \frac{10}{7}$.

(c) The parameters of the a posteriori density provide us with

1. the a posteriori expectation $P(\Phi \mid x_1 = 1.5, x_2 = 0.5, x_3 = 2) = \frac{7/2}{7/4} = 2$.
1. the a posteriori expectation
(a) Write the a priori density of Φ and the a priori probability of the event (Φ > 2).
(b) Compute the a posteriori density of Φ and the a posteriori probability of the
event (Φ > 2).
(c) Compute the a posteriori expectation of Z = Φ 2 .
for φ ≥ 0, 0 otherwise. Note that we have put in the constant k all factors which are independent of φ. The a posteriori distribution is then a Gamma distribution $\Gamma(\alpha_4, \lambda_4) = \Gamma\!\left(3, \frac{45}{8}\right)$ with normalization constant

$$k = \left(\frac{45}{8}\right)^3 \frac{1}{\Gamma(3)} = \frac{45^3}{2^{10}}.$$
Consider a set $\Omega = \{a_1, \dots, a_n\}$ of n elements. We recall that the symbol $\binom{n}{r}$ is called binomial coefficient and that

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
A.1 Dispositions
We count the number of ways of choosing r elements out of a set of n elements with repetitions and taking into account their order, i.e. the number of dispositions of r elements out of n. We have:

1st element → n choices,
2nd element → n choices,
…
r-th element → n choices,

so that the dispositions with repetitions are $n^r$ in total. For the simple dispositions (without repetition) we have instead:

1st element → n choices,
2nd element → (n − 1) choices,
3rd element → (n − 2) choices,
…
r-th element → (n − r + 1) choices.
o
Totally, the simple dispositions are $n(n-1)\cdots(n-r+1) = \dfrac{n!}{(n-r)!}$ and are denoted by the symbol $D^n_r$ or $(n)_r$. They count the number of injective functions from a set of r elements to a set of n elements; if $r = n$, they are called permutations.
Appendix A: Elements of Combinatorics 215
A.4 Combinations
We count the number of ways of choosing r elements out of a set of n elements without taking into account their order, i.e. the number of combinations of r elements out of n. Given a combination $\{a_1, \dots, a_r\}$, without loss of generality we can suppose that $a_1 \le \dots \le a_r$. Starting from this combination, we construct a simple combination of r elements out of n + r − 1 elements in the following way:

$$b_1 = a_1, \qquad b_2 = a_2 + 1, \qquad \dots, \qquad b_r = a_r + r - 1.$$

Conversely, to every such simple combination we can associate a combination. Hence the r-combinations are as many as the simple r-combinations of n + r − 1 elements, i.e. $\binom{n+r-1}{r}$.
$$\frac{n!}{r_1!\,r_2!\cdots r_k!}.$$

To form the first group of $r_1$ elements, we have $\binom{n}{r_1}$ possibilities. For the second group, we have $\binom{n-r_1}{r_2}$ ways. Analogously we proceed for the remaining groups. We obtain

$$\binom{n}{r_1}\binom{n-r_1}{r_2}\cdots\binom{n-r_1-\cdots-r_{k-1}}{r_k} = \frac{n!}{r_1!\,r_2!\cdots r_k!}.$$
Appendix B
Relations Between Discrete and Absolutely
Continuous Distributions
In Table B.1 we summarize some analogies between discrete and absolutely contin-
uous distributions.
Table B.1 Some analogies between discrete and absolutely continuous distributions
Probability / density: $P(X = x) \longrightarrow f(x)$

Cumulative distribution function $P(X \le x)$: $\displaystyle\sum_{i\in I(X),\,i\le x} P(X = i) \longrightarrow \int_{-\infty}^{x} f(s)\,ds$

Expectation of X: $\displaystyle\sum_{i\in I(X)} i\,P(X = i) \longrightarrow \int_{-\infty}^{+\infty} s\,f(s)\,ds$

Expectation of $Y = \Psi(X)$: $\displaystyle\sum_{i\in I(X)} \Psi(i)\,P(X = i) \longrightarrow \int_{-\infty}^{+\infty} \Psi(s)\,f(s)\,ds$

$P(X \in A)$: $\displaystyle\sum_{i\in I(X),\,i\in A} P(X = i) \longrightarrow \int_A f(s)\,ds$
Density: $f(x_1, \dots, x_n) = k\,e^{-\frac{1}{2}Ax\cdot x + b\cdot x}$, where $x = (x_1, \dots, x_n)^t$, $A \in S(n)$, $b = (b_1, \dots, b_n)^t$

Normalization constant: $k = \sqrt{\dfrac{\det A}{(2\pi)^n}}\;e^{-\frac{1}{2}A^{-1}b\cdot b}$

Expectation: $P(X) = A^{-1}b$, i.e. $P(X_i) = (A^{-1}b)_i$

Variance and covariance matrix: $C = A^{-1}$

Marginal distribution of $X_i$: $X_i \sim N\big((A^{-1}b)_i,\ [A^{-1}]_{ii}\big)$
In this chapter we present Stirling's formula, which describes the asymptotic behavior of n! as n increases. It holds that:

Stirling's formula: $n! = \sqrt{2\pi}\; n^{n+\frac{1}{2}}\,e^{-n}\left(1 + O(n^{-1})\right)$.
Several kinds of proof can be used for this formula. Here we present the classical proof and a more general result, of which Stirling's formula represents a particular case. We start with the classical proof, which can be found in [2]; we recall it here for the reader's convenience.
Here we obtain Stirling's formula modulo a multiplicative constant. This value can be shown to be equal to $\sqrt{2\pi}$ as a consequence of Theorem 5.4.1, by approximating the probability that a random number with binomial distribution $Bn\!\left(2n, \frac{1}{2}\right)$ assumes the value n. Stirling's formula is equivalent to

$$\lim_{n\to\infty} \frac{n!}{\sqrt{2\pi}\,n^{n+\frac{1}{2}}\,e^{-n}} = 1.$$
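The rate of convergence claimed by the formula is easy to observe numerically; a sketch:

```python
import math

# Ratio n! / (sqrt(2*pi) * n**(n + 1/2) * exp(-n)); it tends to 1
# from above, at rate 1 + O(1/n).
def stirling_ratio(n):
    return math.factorial(n) / (math.sqrt(2 * math.pi)
                                * n**(n + 0.5) * math.exp(-n))

assert abs(stirling_ratio(10) - 1) < 0.01
assert abs(stirling_ratio(100) - 1) < 0.001
assert stirling_ratio(10) > stirling_ratio(100) > 1
```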
Hence, summing up,

$$\sum_{k=1}^{n}\int_{k-1}^{k} \log x\,dx < \sum_{k=1}^{n} \log k < \sum_{k=1}^{n}\int_{k}^{k+1} \log x\,dx,$$

we have

$$n\log n - n < \log n! < (n+1)\log(n+1) - n.$$
Let us compare $\log n!$ with $\left(n+\frac{1}{2}\right)\log n - n$. Setting $d_n = \log n! - \left(n+\frac{1}{2}\right)\log n + n$, we have

$$d_n - d_{n+1} = \left(n + \frac{1}{2}\right)\log\frac{n+1}{n} - 1; \tag{F.1}$$
however

$$\frac{n+1}{n} = \frac{1 + \frac{1}{2n+1}}{1 - \frac{1}{2n+1}}$$

and

$$\log(x+1) = \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{x^n}{n}. \tag{F.2}$$
Since

$$\log\frac{n+1}{n} = \log\frac{1 + \frac{1}{2n+1}}{1 - \frac{1}{2n+1}} = \log\!\left(1 + \frac{1}{2n+1}\right) - \log\!\left(1 - \frac{1}{2n+1}\right),$$

using (F.2) with $x = \pm\frac{1}{2n+1}$ we obtain
Appendix F: Stirling’s Formula 227
$$\begin{aligned}
d_n - d_{n+1} &= \frac{1}{2}(2n+1)\left[\log\!\left(1 + \frac{1}{2n+1}\right) - \log\!\left(1 - \frac{1}{2n+1}\right)\right] - 1\\
&= \frac{1}{2}(2n+1)\left[\frac{2}{2n+1} + \frac{2}{3(2n+1)^3} + \frac{2}{5(2n+1)^5} + \cdots\right] - 1\\
&= \frac{1}{3(2n+1)^2} + \frac{1}{5(2n+1)^4} + \cdots;
\end{aligned}$$
we obtain that the limit of $d_n$ exists and is finite, since the two sequences $a_n$ and $d_n$ are bounded by each other.
where α > 0. It represents a generalization of factorial n, since for all α > 0 it holds
that
$$\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha).$$
$$\phi(x) = \alpha\log\alpha - \alpha - \frac{1}{2\alpha}(x-\alpha)^2 + \sum_{k=3}^{n} \frac{(-1)^{k-1}(x-\alpha)^k}{k\,\alpha^{k-1}} + \alpha\,\frac{(-1)^n\,(x-\alpha)^{n+1}}{(n+1)\,\xi^{n+1}},$$

and we substitute

$$u = \frac{x-\alpha}{\sqrt{\alpha}}, \qquad dx = \sqrt{\alpha}\,du.$$
We obtain

$$\Gamma(\alpha+1) = \alpha^{\alpha+\frac{1}{2}}\,e^{-\alpha}\int_{-\sqrt{\alpha}}^{+\infty} e^{-\frac{u^2}{2} + \psi(u)}\,du,$$

where

$$\psi(u) = \sum_{k=3}^{n} \frac{(-1)^{k-1}\,u^k}{k\,\alpha^{\frac{k}{2}-1}} + \alpha^{\frac{n+3}{2}}\,\frac{(-1)^n\,u^{n+1}}{(n+1)\left(\alpha + \xi\sqrt{\alpha}\right)^{n+1}}.$$
We split the integral into $I_1 + I_2 + I_3$, over $u \le -\alpha^\delta$, $|u| \le \alpha^\delta$ and $u \ge \alpha^\delta$ respectively, where $\delta > 0$ is a sufficiently small constant. For what concerns $I_1$, $I_3$, we note that $\phi(u)$ is a concave function. Hence also the function

$$\theta(u) = -\frac{u^2}{2} + \psi(u),$$

obtained from φ by adding a constant and by a linear transformation of the underlying variable, is concave. For $u \le -\alpha^\delta$ we have

$$\theta(u) \le -\frac{u}{\alpha^\delta}\,\theta(-\alpha^\delta)$$
and for $u \ge \alpha^\delta$

$$\theta(u) \le \frac{u}{\alpha^\delta}\,\theta(\alpha^\delta).$$
By the expansion of ψ(u) with $n = 2$ we note that for α sufficiently big and $\delta < \frac{1}{6}$ we have $\theta(-\alpha^\delta) < -\frac{\alpha^{2\delta}}{4}$ and $\theta(\alpha^\delta) < -\frac{\alpha^{2\delta}}{4}$. Hence for $|u| \ge \alpha^\delta$ it holds that

$$\theta(u) \le -|u|\,\frac{\alpha^\delta}{4}.$$
It follows that

$$\int_{I_1} e^{\theta(u)}\,du + \int_{I_3} e^{\theta(u)}\,du \le \int_{|u|\ge\alpha^\delta} e^{-|u|\frac{\alpha^\delta}{4}}\,du = \frac{8}{\alpha^\delta}\,e^{-\frac{\alpha^{2\delta}}{4}},$$

and hence

$$\Gamma(\alpha+1) = \sqrt{2\pi}\;\alpha^{\alpha+\frac{1}{2}}\,e^{-\alpha}\left(1 + O(\alpha^{-1})\right).$$
Appendix G
Elements of Analysis
In this appendix we recall some definitions and results of analysis in one variable to
facilitate the theoretical comprehension and the execution of the exercises.
A sequence $(a_n)_{n\in\mathbb{N}}$ is:

1. convergent if $\lim_{n\to\infty} a_n = L < \infty$, i.e. if for all $\varepsilon > 0$ there exists $N = N(\varepsilon)$ such that for all $n > N$

$$|a_n - L| < \varepsilon;$$

2. divergent if

$$\lim_{n\to\infty} a_n = +\infty,$$

i.e. if for all $M > 0$ there exists $N = N(M)$ such that for all $n > N$, $a_n > M$; or $\lim_{n\to\infty} a_n = -\infty$, i.e. for all $M > 0$ there exists $N = N(M)$ such that for all $n > N$, $a_n < -M$.
A sequence may neither be convergent nor divergent. For example, the sequence
an = (−1)n oscillates between 1 and −1.
A function f : R −→ R has:
1. finite limit in x if

$$\lim_{y\to x} f(y) = L < \infty,$$

i.e. if for all $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon)$ such that for all y with $|y - x| < \delta$

$$|f(y) - L| < \varepsilon;$$

2. infinite limit in x if

$$\lim_{y\to x} f(y) = +\infty,$$

i.e. if for all $M > 0$ there exists $\delta = \delta(M)$ such that for all y with $|y - x| < \delta$, $f(y) > M$; or $\lim_{y\to x} f(y) = -\infty$, meaning that for all $M > 0$ there exists $\delta = \delta(M)$ such that for all y with $|y - x| < \delta$, $f(y) < -M$.
2. $\forall\,x \in \mathbb{R}$:

$$\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x;$$

3.

$$\lim_{x\to 0} \frac{\log(1+x)}{x} = 1.$$
Appendix G: Elements of Analysis 233
G.4 Series
2. the series

$$\sum_{n=1}^{\infty} n\,x^{n-1} = \frac{1}{(1-x)^2}$$

for all $|x| < 1$, which is obtained as the derivative of the geometric series;

3. the exponential series

$$\sum_{n=0}^{\infty} \frac{x^n}{n!} = e^x$$

for all $x \in \mathbb{R}$.
G.5 Continuity
where $\lim_{x\to x_0^-} f(x)$ and $\lim_{x\to x_0^+} f(x)$ are called left limit and right limit respectively; the left limit is taken over $x < x_0$, the right limit over $x > x_0$.
We summarize the most common derivatives and the principal rules of differentiation in Tables G.1 and G.2, respectively.
G.7 Integrals

2. Change of variable:
$$x = g(y) \;\Rightarrow\; dx = g'(y)\,dy,$$
$$\int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))\,g'(y)\,dy.$$
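The substitution formula can be checked numerically; a minimal Python sketch (an illustration with $f(x) = e^x$ and $g(y) = y^2$, both chosen by us), where both sides approximate $\int_0^1 e^x\,dx = e - 1$:

```python
import math

def midpoint_integral(f, a, b, n=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Left-hand side: integral of e^x over [0, 1].
lhs = midpoint_integral(math.exp, 0.0, 1.0)
# Right-hand side after x = g(y) = y^2, dx = 2y dy, with g^{-1}(0) = 0 and g^{-1}(1) = 1.
rhs = midpoint_integral(lambda y: math.exp(y * y) * 2 * y, 0.0, 1.0)
print(lhs, rhs, math.e - 1)
```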
Appendix H
Bidimensional Integrals
This is analogous to the one-dimensional case, where the length of a segment $[a, b]$ is given by
$$l([a, b]) = \int_a^b dx.$$
Let $f : \mathbb{R}^2 \to \mathbb{R}$ and put $z = f(x, y)$. A function of two variables describes a surface in $\mathbb{R}^3$ with coordinates $(x, y, f(x, y))$. We want to calculate the volume between the surface described by the function and the $xy$-plane. This volume is given by the double integral
$$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy,$$
which can be computed as an iterated integral:
$$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} f(x, y)\,dy\right)dx = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} f(x, y)\,dx\right)dy.$$
This result holds for sufficiently regular functions $f$ (for example, if $f$ is continuous) and is known as the Fubini-Tonelli theorem. We refer to [11] for further details.
Example H.2.1 Let $A = \{1 < x < 2,\; 3 < y < 4\}$. We compute the following integral on $A$:
$$\iint_A x^2 y\,dx\,dy = \int_1^2 dx \int_3^4 x^2 y\,dy \qquad (x \text{ is a parameter!})$$
$$= \int_1^2 x^2 \left(\int_3^4 y\,dy\right) dx = \int_1^2 x^2 \left[\frac{y^2}{2}\right]_3^4 dx = \frac{7}{2}\int_1^2 x^2\,dx = \frac{49}{6}.$$
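A numerical cross-check of this example; a minimal Python sketch using an iterated midpoint rule over the rectangle $A$ (the helper name is ours):

```python
def double_midpoint(f, ax, bx, ay, by, n=400):
    """Iterated midpoint rule for the double integral of f over the rectangle (ax, bx) x (ay, by)."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

approx = double_midpoint(lambda x, y: x * x * y, 1.0, 2.0, 3.0, 4.0)
print(approx, 49 / 6)
```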
Example H.2.2 Let B = {0 < x < 1, x − 1 < y < x + 1}, see Fig. H.2.
We calculate the following integral on $B$:
$$\iint_B e^{-y}\,dx\,dy = \int_0^1 dx \int_{x-1}^{x+1} e^{-y}\,dy = \int_0^1 \left[-e^{-y}\right]_{x-1}^{x+1} dx$$
$$= \int_0^1 \left(e^{-(x-1)} - e^{-(x+1)}\right) dx = \left(e - e^{-1}\right)\int_0^1 e^{-x}\,dx = \left(e - e^{-1}\right)\left(1 - e^{-1}\right).$$
[Fig. H.2: the domain $B$, bounded below by the line $y = x - 1$ and above by the line $y = x + 1$, for $0 < x < 1$.]
In the first step the limits of integration can be found by drawing lines parallel to the $x$-axis and finding their intersections with the boundary of the domain. In the second step the integral has been split into two parts and the limits have been found by drawing lines parallel to the $y$-axis.
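A numerical cross-check of Example H.2.2; a minimal Python sketch (the inner $y$-integral is evaluated in closed form, the outer $x$-integral by the midpoint rule; the helper name is ours):

```python
import math

def integral_over_B(n=2000):
    """Approximate the integral of e^{-y} over B = {0 < x < 1, x - 1 < y < x + 1}."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        # Inner integral: e^{-(x-1)} - e^{-(x+1)} is the exact value of the y-integral.
        total += math.exp(-(x - 1)) - math.exp(-(x + 1))
    return total * h

exact = (math.e - 1 / math.e) * (1 - 1 / math.e)
print(integral_over_B(), exact)
```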
1. $f(x, y) = x^2 y$:
$$\frac{\partial f}{\partial x} = 2xy\,, \qquad \frac{\partial f}{\partial y} = x^2\,;$$
Let $\Psi : \mathbb{R}^2 \to \mathbb{R}^2$, $\Psi(x, y) = (\Psi_1(x, y), \Psi_2(x, y))$. We call Jacobian $J_\Psi$ of the function $\Psi$ the matrix
$$J_\Psi = \begin{pmatrix} \dfrac{\partial \Psi_1}{\partial x} & \dfrac{\partial \Psi_1}{\partial y} \\[2mm] \dfrac{\partial \Psi_2}{\partial x} & \dfrac{\partial \Psi_2}{\partial y} \end{pmatrix}.$$
Consider a change of variables
$$\Psi : \mathbb{R}^2_{(u,v)} \to \mathbb{R}^2_{(x,y)}$$
and a function $f : \mathbb{R}^2_{(x,y)} \to \mathbb{R}$; the composition $f \circ \Psi : \mathbb{R}^2_{(u,v)} \to \mathbb{R}$ expresses $f$ in the new variables $(u, v)$.
The polar coordinates transformation is
$$\Psi : (\theta, \rho) \mapsto (x, y) = (\rho \cos\theta,\; \rho \sin\theta),$$
i.e. $x = \rho\cos\theta$, $y = \rho\sin\theta$, and its Jacobian determinant satisfies
$$|\det J_\Psi| = \rho.$$
It follows that
$$\iint_{\mathbb{R}^2} e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy = \int_0^{+\infty} d\rho \int_0^{2\pi} \rho\,e^{-\frac{1}{2}\rho^2}\,d\theta = \int_0^{+\infty} \rho\,e^{-\frac{1}{2}\rho^2}\,d\rho \int_0^{2\pi} d\theta = 2\pi\left[-e^{-\frac{1}{2}\rho^2}\right]_0^{+\infty} = 2\pi.$$
Hence
$$\left(\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\,dx\right)^2 = \int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\,dx \int_{-\infty}^{+\infty} e^{-\frac{y^2}{2}}\,dy = \iint_{\mathbb{R}^2} e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy = 2\pi.$$
Finally,
$$\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\,dx = \sqrt{2\pi}.$$
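The value $\sqrt{2\pi}$ can be confirmed numerically; a minimal Python sketch truncating the integral to $[-R, R]$ (the cutoff $R = 10$ is our choice; the neglected tails are negligible):

```python
import math

# Midpoint-rule approximation of the integral of exp(-x^2 / 2) over [-R, R].
R, n = 10.0, 200000
h = 2 * R / n
approx = h * sum(math.exp(-(-R + (k + 0.5) * h) ** 2 / 2) for k in range(n))
print(approx, math.sqrt(2 * math.pi))
```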
References
Index

A
Absolutely continuous distribution, 44
Absorbing boundary conditions, 82
Area, 235

B
Bayes' formula, 16
Bernoulli scheme, 27
Bet, 8
Binomial distribution, 28
Bounded, 3

C
Change of variables, 239
Client, 89
Coefficient
  binomial, 213
  multinomial, 215
Coherence, 8
Combinations, 215
Complementary, 5
Conditional probability, 14
Confidence
  intervals, 111
  region, 111
Constituent, 6
Continuity, 233
Convergence for sequences of cumulative distribution functions, 73
Covariance, 22
Cumulative distribution function, 43

D
Density
  a posteriori, 106
  conditional, 104
  joint probability, 58
  marginal probability, 59
Derivative, 234
Discrete distribution, 27
Dispositions, 213
  simple, 214
Distribution
  a priori, 106
  beta, 62
  Cauchy, 55
  χ², 54
  exponential, 48
  gamma, 52
  Gaussian n-dimensional, 66
  initial, 81
  normal, 50
  stationary, 96
  Student, 63
Double integral, 235

E
Equations
  Chapman-Kolmogorov, 91
  Kolmogorov forward, 91
Event, 5
Exhaustivity, 6
Expectation, 8

F
Formula
  of composite expectation, 15
  Stirling's, 225
Function
© Springer International Publishing Switzerland 2016 245
F. Biagini and M. Campanino, Elements of Probability and Statistics,
UNITEXT - La Matematica per il 3+2 98, DOI 10.1007/978-3-319-07254-8
J
Jacobian, 238
Joint distribution, 35

L
Law of large numbers, 25
Likelihood factor, 108
Limit, 231
Linearity, 8
Little's formula, 101
Logical
  product, 5
  sum, 5
Logically
  dependent event, 7
  independent event, 7
  semidependent event, 7
Lower bounded, 3

M
Marginal distribution, 35
Memoryless, 30
Monotonicity, 8
Multinomial distribution, 33

N
Negatively
  correlated, 22
Non-correlated, 22

Q
Queueing system, 89

R
Random
  number, 3
  vector, 35
  walk, 82

S
Series, 233
Server, 89
Service times, 89
Simple combinations, 214
State space, 81
Stationary regime, 100
Statistical
  induction, 104
  inference, 104
Successes, 28

T
Theorem
  De Moivre-Laplace, 77
Transition probability matrix, 81

U
Uniform distribution, 46
Upper bounded, 3