
Statistics I

Chapter 4: Probability and probabilistic models


Topic 4. Probability and Stochastic models

Summary
I Probability:
I Random experiments, sample space, elementary and composite
events.
I Axioms of Probability.
I Conditional probability and its properties.
I Random variables (RVs) and their properties.
I Stochastic models for discrete RVs: the Bernoulli and other related
models.
I Stochastic models for continuous RVs: the Normal (or Gaussian) and
related models.
Basic concepts
I Random Experiment: is the process of observing an outcome that
cannot be predicted with certainty.
I Sample Space: is the set of all possible outcomes of a random
experiment, denoted by

Ω = {e1 , e2 , . . . , en , . . .}

whose elements are called elementary events. These are disjoint (i.e.
they cannot occur at the same time).
I Event: is a collection of elementary events.

A = {e1 , e3 }

Examples:
I Result of a coin toss.
I The closing price of stock x at the end of next Monday.
Events: basic concepts

Intersection of events: Let A and B be two events of the sample space Ω;
the intersection, A ∩ B, is the set of all elementary events of Ω that
belong to both A and B.

Representation by Euler-Venn diagrams:


Events: basic concepts

A and B are said to be incompatible events if they have no common
elementary events, that is, their intersection is the empty set (denoted
by the symbol ∅): A ∩ B = ∅.
Events: basic concepts

Union of events: Let A and B be two events of the sample space Ω; the
union, A ∪ B, is the set of all elementary events of Ω that belong to A
or to B (or both).
Events: basic concepts

Trivial events:
I Sure event Ω: the event equal to the whole sample space.
I Impossible event ∅: the empty set.

Complementary event
The complement of an event A, denoted Ā, is the set of all elementary
events of Ω that do not belong to A.
Example: throw of a die

Consider the outcome of throwing a regular die (i.e. a die with six faces):

I elementary events: observed face 1, 2, 3, 4, 5, 6.


I sample space: Ω = {1, 2, 3, 4, 5, 6}
I events: A = {2, 4, 6} B = {4, 5, 6}
The event A corresponds to an even number, while B corresponds to a
number larger than three.
Example: throw of a die
Ω = {1, 2, 3, 4, 5, 6} A = {2, 4, 6} B = {4, 5, 6}
I Complementary:
Ā = {1, 3, 5} B̄ = {1, 2, 3}
I Intersection:
A ∩ B = {4, 6} Ā ∩ B̄ = {1, 3} = complement of A ∪ B (De Morgan)
I Union:
A ∪ B = {2, 4, 5, 6} Ā ∪ B̄ = {1, 2, 3, 5} = complement of A ∩ B (De Morgan)
A ∪ Ā = {1, 2, 3, 4, 5, 6} = Ω
I Incompatible events:
A ∩ Ā = ∅
I Note that:
A∩B ⊂A A∩B ⊂B
A⊂A∪B B ⊂A∪B
Notions of Probability

Classic (Laplace rule)

Consider an experiment with k elementary events, all equiprobable (i.e.
each with probability 1/k); then the probability of an event A is defined as

P(A) = (number of elementary events in A) / k

In general, the probability can be viewed as a function that assigns to
each event A a number between 0 and 1, that is, P(A) ∈ [0, 1].
Notions of Probability (... part 2)
Frequentist notion
The probability of A, P(A), is the limit of its relative frequency, that is,

P(A) = lim_{n→∞} n_A / n,

where n_A is the number of times that we observed A in n experiments.
A Flight Example
What is the probability that A flight crashes?
(roughly 10^(−6), since only about one flight in a million crashes)

Subjective / Epistemic notion

The probability of A, P(A), can be thought of as the proportion of money
that one judges fair to bet on A.
MY Flight Example
What is the probability that MY flight crashes?
(If I board it, it is because I judge it to be almost 0.)
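The Laplace and frequentist notions can be compared with a quick simulation;
a minimal Python sketch (not part of the original slides; the event chosen
and the variable names are ours):

```python
import random

# Laplace rule: P(even) for a fair die = 3/6 = 0.5
k = 6
A = {2, 4, 6}
p_laplace = len(A) / k

# Frequentist notion: relative frequency n_A / n over many simulated throws
n = 100_000
n_A = sum(random.randint(1, 6) in A for _ in range(n))
print(p_laplace, n_A / n)  # 0.5 and a value close to 0.5
```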
Postulates of Probability and some consequences

Postulates:
1. 0 ≤ P(A) ≤ 1.
2. If A = {e1 , e2 , . . . , en }, then P(A) = Σ_{i=1}^n P(ei ).

3. P(Ω) = 1.

Consequently:
I Complementary: P(Ā) = 1 − P(A).
I P(∅) = 0

I Union: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

I If A and B are incompatible (A ∩ B = ∅), then


P(A ∪ B) = P(A) + P(B).
Example: throw of a die

I Probability of the elementary events: P(ei ) = 1/6, where ei = i, for
i = 1, . . . , 6.
I Probability of an even score: A = {2, 4, 6}, then

P(A) = P("2") + P("4") + P("6") = 1/6 + 1/6 + 1/6 = 1/2

I Probability of a score greater than 3: B = {4, 5, 6}, then

P(B) = P("4") + P("5") + P("6") = 1/6 + 1/6 + 1/6 = 1/2

I Probability of an odd score:

P(Ā) = 1 − P(A) = 1 − 1/2 = 1/2
Example: throw of a die

I Probability of an even score (A = "even") OR greater than 3
(B = "greater than 3"):

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

As A ∩ B = {4, 6}, then P(A ∩ B) = 2/6 = 1/3, so

P(A ∪ B) = 1/2 + 1/2 − 1/3 = 4/6 = 2/3

I Probability of even or face 1:
Events A = {2, 4, 6} and C = {1} are incompatible (A ∩ C = ∅),
therefore

P(A ∪ C ) = P(A) + P(C ) = 1/2 + 1/6 = 4/6 = 2/3
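These set computations can be reproduced with Python sets; a minimal
sketch (assuming equiprobable outcomes, with names of our choosing):

```python
# Events of the die example as Python sets
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even score
B = {4, 5, 6}   # score greater than 3
C = {1}

def P(event):
    # Laplace rule over equiprobable elementary events
    return len(event) / len(omega)

print(P(A | B))                  # union: 4/6 = 2/3
print(P(A) + P(B) - P(A & B))    # same value by the addition rule
print(P(A | C), P(A) + P(C))     # incompatible events: both give 2/3
```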
Example: conditional probability

I We play roulette betting on the numbers 3, 13 and 22. What is
the probability of winning?

I The sample space is Ω = {0, 1, 2, . . . , 36}, made of 37 elementary
events. The event of interest is A = "our bet" = {3, 13, 22}, which
contains 3 elementary events.

I Therefore, the probability of winning is P(A) = 3/37.

I Suppose that before playing we are told that the roulette is biased
and only odd numbers come out. What is the probability of winning
given this information? Is it the same as before?
Notion of conditional probability
Conditional Probability
Let A and B be two events such that P(B) > 0; the conditional
probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B)

The product law

If P(B) > 0, then
P(A ∩ B) = P(A|B)P(B)
Notion of independence
Event A is independent of B if conditioning on B does not change its
probability:
P(A|B) = P(A);
moreover, if P(A) > 0 and P(B) > 0, the above is equivalent to the
following:
P(A ∩ B) = P(A)P(B).

Note: Do not confuse independent events with incompatible events.


Example: conditional probability

I Let B = "The outcome is odd" = {1, 3, 5, . . . , 35}, which contains
18 elementary events.

I Then, given that A ∩ B = {3, 13}, the conditional probability is:

P(A|B) = P(A ∩ B) / P(B) = (2/37) / (18/37) = 2/18 = 1/9

I Note that, once we know that the roulette is biased, the sample
space changes from the initial one, as no even numbers can be obtained.
It becomes Ω∗ = B = {1, 3, 5, . . . , 35}. The probability of A over Ω∗
is now 1/9.

I As P(A|B) ≠ P(A), A and B are dependent.
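The roulette computation as a sketch in the same style (our names):

```python
# Roulette: Omega = {0, ..., 36}, bet A = {3, 13, 22}, B = "odd outcome"
omega = set(range(37))
A = {3, 13, 22}
B = {n for n in omega if n % 2 == 1}   # the 18 odd numbers

def P(event):
    return len(event) / len(omega)

print(P(A))              # 3/37 ≈ 0.081
print(P(A & B) / P(B))   # P(A|B) = (2/37)/(18/37) = 1/9 ≈ 0.111
```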


Examples
From a pack of 40 Spanish cards, we draw two without replacement.
Calculate the probability that:
I The first card is a club: P(A) = 10/40.
I The second is a club knowing that the first was also a club:
P(B|A) = 9/39.
I The first two are clubs: P(A ∩ B) = P(B|A)P(A) = (9/39)(10/40).

Throw a die twice. Calculate the probability that:

I the first die is one: P(C ) = 1/6.
I the second die is one knowing that the first was one:
P(D|C ) = P(D) = 1/6.
I the first is one, knowing that the second is also one:
P(C |D) = P(C ) = 1/6.
I both are one: P(C ∩ D) = P(D)P(C ) = (1/6)(1/6) = 1/36 (independent events)
Law of total probability (1)
Consider a set of events B1 , B2 , . . . , Bk which are mutually exclusive,

Bi ∩ Bj = ∅, ∀i 6= j.

If they also satisfy


Ω = B1 ∪ B2 ∪ . . . ∪ Bk ,
we say that they are a partition of the sample space.
Example

I For the pack of Spanish cards, the following sets are partitions of the
sample space:

I Ω = {diamonds, clubs, spades, hearts} .

I Ω = {aces, threes, queens, horses, kings, the rest of cards} .


Law of total probability (2)

Given a partition of the sample space, B1 , B2 , . . . , Bk , and an event
A, it holds that

P(A) = P(A ∩ B1 ) + P(A ∩ B2 ) + . . . + P(A ∩ Bk ) =


= P(A|B1 )P(B1 ) + P(A|B2 )P(B2 ) + . . . + P(A|Bk )P(Bk ).
Example: the law of total probability
I In a Spanish pack of 48 cards, calculate the probability of drawing an
ace using the law of total probability.
I The four suits form a partition of the pack,
Ω = {diamonds, clubs, spades, hearts}, then

P (Ω) = P (diamonds) + P (clubs) + P (spades) + P (hearts) =
= 1/4 + 1/4 + 1/4 + 1/4 = 1

I If A = "ace", then

P (A) = P (A|diamonds) P (diamonds) + P (A|clubs) P (clubs) +
P (A|spades) P (spades) + P (A|hearts) P (hearts) =
= (1/12)(12/48) + (1/12)(12/48) + (1/12)(12/48) + (1/12)(12/48) = 4/48 = 1/12

I If we draw an ace, what is the probability that it is the ace of clubs? We
need to invert the conditioning.
The Bayes theorem

For two events A and B it holds that

P(A|B) = P(A ∩ B) / P(B) = P(B|A)P(A) / P(B)

Such a "theorem" is applied when we know P(B|A).

Example (cont.): Given that we draw an ace, what is the probability that
it is the ace of clubs?

P(clubs|A) = P(A|clubs)P(clubs) / P(A) = (1/12)(1/4) / (1/12) = 1/4
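Both the law of total probability and the Bayes inversion can be verified
with exact fractions; a minimal sketch (the dictionary layout is ours):

```python
from fractions import Fraction as F

# Spanish pack of 48 cards: P(ace | suit) = 1/12, P(suit) = 1/4
suits = ["diamonds", "clubs", "spades", "hearts"]
p_suit = {s: F(1, 4) for s in suits}
p_ace_given_suit = {s: F(1, 12) for s in suits}

# Law of total probability
p_ace = sum(p_ace_given_suit[s] * p_suit[s] for s in suits)
print(p_ace)  # 1/12

# Bayes theorem: P(clubs | ace)
print(p_ace_given_suit["clubs"] * p_suit["clubs"] / p_ace)  # 1/4
```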
Example

I A cat wants to fish in an aquarium with 5 fish: 3 yellow and 2 black.
Assuming it catches one, what is the probability that it is black?
Let R = "black"; then:

P (R) = 2/5

I Assuming it catches two, what is the probability that they are of
different colors?
Denoting by R1 = "first black", R2 = "second black", A1 = "first
yellow" and A2 = "second yellow", then:

P (R1 ∩ A2 ) + P (A1 ∩ R2 ) = P (A2 |R1 ) P (R1 ) + P (R2 |A1 ) P (A1 ) =
= (3/4)(2/5) + (2/4)(3/5) = 6/20 + 6/20 = 12/20 = 3/5
Example

I Assuming it catches two fish and knowing that the second is black,
what is the probability that the first was yellow?

P (A1 |R2 ) = P (R2 |A1 ) P (A1 ) / P (R2 ) =
= P (R2 |A1 ) P (A1 ) / [P (R2 |A1 ) P (A1 ) + P (R2 |R1 ) P (R1 )] =
= (2/4)(3/5) / [(2/4)(3/5) + (1/4)(2/5)] = (6/20) / (6/20 + 2/20) = 6/8 = 3/4
Random Variables

I Let Ω be the sample space of an experiment.


I We define a random variable (r.v.) X as a function X : Ω −→ R such
that to each element ei ∈ Ω corresponds a number X (ei ) = xi ∈ R.
I Intuitively, a r.v. is a quantity that varies according to the possible
outcome ei .
I The random variable is denoted by a capital letter (e.g. X ), while a
lowercase letter (e.g. xi ) denotes the specific value corresponding to a
point of the sample space (e.g. ei ).
I Note: The statistical variables we saw in the previous topics (1, 2 and
3) may be viewed as the outcomes of a r.v. observed in a certain
sample. Such variables differ from the stochastic variables considered
here.
Random Variables

Discrete r.v.
If X takes values in a finite or countably infinite set S ⊆ R, we say
that X is a discrete r.v.

Continuous r.v.
If X takes values in an uncountably infinite set S ⊆ R, we say that
X is a continuous r.v.

Examples
I X = “Score of throwing a die” is a discrete r.v., S = {1, 2, 3, 4, 5, 6}.
I Y = “Number of cars crossing a bridge in a week” is a discrete r.v.,
S = {0, 1, 2, . . .} = N ∪ {0}, as it is countably infinite.
I Z = “the height of a student” is a continuous r.v., S = [0, +∞).
Discrete r.v.

Probability function
Let X be a discrete r.v. with values {x1 , x2 , . . .}. We call probability
function, or probability mass function, the set of probabilities of the
values that X takes, that is, pi = P[X = xi ], for i = 1, 2, . . .

Example
X = the score when throwing a die. The probability mass function for a
fair die:

x          1    2    3    4    5    6
P[X = x]  1/6  1/6  1/6  1/6  1/6  1/6

In this case, S = {1, 2, 3, 4, 5, 6} and p1 = . . . = p6 = 1/6.


Discrete r.v.

Probability function. Properties

Let X be a discrete r.v. over the set S = {x1 , x2 , . . .} with probabilities
p1 = P(X = x1 ), p2 = P(X = x2 ), . . .
I 0 ≤ P[X = xi ] ≤ 1.

I Σ_i P[X = xi ] = 1.

I P[X ≤ x] = Σ_{i: xi ≤ x} P[X = xi ].

I P[X > x] = 1 − P[X ≤ x].

Example
I Consider a game where 3 rings must be tossed onto a post. The
payoffs are: −3 euros for participating, 4 euros for 1 ring, 6 for 2 rings
and 30 for 3 rings landed. Assume that the probability of landing a
ring at each throw is 0.1 and that the 3 throws are independent.
I Let X be the r.v. representing the gain. The sample space is

Ω = {(f , f , f ) , (a, f , f ) , (f , a, f ) , (f , f , a) ,
(a, a, f ) , (a, f , a) , (f , a, a) , (a, a, a)}

where a denotes a landed ring and f a failed throw. Then, X has only
four outcomes:

P (X = −3) = 0.9³ = 0.729
P (X = 1) = 3 × 0.1 × 0.9² = 0.243
P (X = 3) = 3 × 0.1² × 0.9 = 0.027
P (X = 27) = 0.1³ = 0.001
Example

I What is the probability of gaining at least 3 euros?

P (X ≥ 3) = P (X = 3) + P (X = 27) = 0.027 + 0.001 = 0.028

I What is the probability of not losing money?

P (X ≥ 0) = P (X = 1) + P (X = 3) + P (X = 27) =
= 0.243 + 0.027 + 0.001 = 0.271

or:

P (X ≥ 0) = 1 − P (X < 0) = 1 − P (X = −3) = 1 − 0.729 = 0.271


Discrete r.v.

Distribution Function
The cumulative distribution function (c.d.f.) of a r.v. X is a function
F : R → [0, 1] that to each x ∈ R assigns the probability:

F (x) = P[X ≤ x] = Σ_{xi ∈ S, xi ≤ x} P (X = xi )

Note that the c.d.f. is defined for all x ∈ R.

I 0 ≤ F (x) ≤ 1 for all x ∈ R.
I F (y ) = 0 for all y < min S. Hence, F (−∞) = 0.
I F (y ) = 1 for all y > max S. Hence, F (∞) = 1.
I If x1 ≤ x2 , then F (x1 ) ≤ F (x2 ), that is, F (x) is non-decreasing.
I For all a, b ∈ R,
P (a < X ≤ b) = P (X ≤ b) − P (X ≤ a) = F (b) − F (a).
Example

I The probability function of the variable X in the game of rings is:

             0.729   x = −3
P (X = x) =  0.243   x = 1
             0.027   x = 3
             0.001   x = 27

I The c.d.f. of the variable X in the game of rings is:

                     0                                      x < −3
                     0.729                                  −3 ≤ x < 1
F (x) = P (X ≤ x) =  0.729 + 0.243 = 0.972                  1 ≤ x < 3
                     0.729 + 0.243 + 0.027 = 0.999          3 ≤ x < 27
                     0.729 + 0.243 + 0.027 + 0.001 = 1      27 ≤ x

I Note that the c.d.f. is a discontinuous function with jump sizes
equal to P (X = x), for all x ∈ S.
Expectation of a r.v.

Let X be a discrete r.v. over S = {x1 , x2 , . . .} with probabilities
p1 = P (X = x1 ), p2 = P (X = x2 ), . . . Then the expectation of X is
defined as:

E [X ] = Σ_{x∈S} x P (X = x) = Σ_i xi P (X = xi ) = Σ_i xi pi

The following properties can be proved:

I If a, b ∈ R, then:
E [a + bX ] = a + bE [X ]
I If g is a real function, then:

E [g (X )] = Σ_{x∈S} g (x) P (X = x)
Example

The expectation of the variable X in the game of rings is:


E [X ] = Σ_{x∈S} x P (X = x) =
= −3 × P (X = −3) + 1 × P (X = 1) + 3 × P (X = 3) + 27 × P (X = 27) =
= −3 × 0.729 + 1 × 0.243 + 3 × 0.027 + 27 × 0.001 = −1.836

That is, the expected gain is −1.836 euros.


Variance of a r.v.
The variance of a discrete r.v. is

V [X ] = E [(X − E [X ])²] = Σ_{x∈S} (x − E [X ])² P (X = x) =
= Σ_i (xi − E [X ])² P (X = xi ) = Σ_i (xi − E [X ])² pi

The following properties can be verified:

I The variance can be written as:

V [X ] = E [X ²] − E [X ]²

I V [X ] ≥ 0, and V [X ] = 0 if and only if X is a constant.

I If a, b ∈ R, then:
V [a + bX ] = b² V [X ]

The square root of the variance is called the standard deviation and is
denoted by S[X ] = √V [X ].
Example

The variance of the variable X in the game of rings is:

V [X ] = E [X ²] − E [X ]² = 7.776 − (−1.836)² = 4.405

where:

E [X ²] = (−3)² × 0.729 + 1² × 0.243 + 3² × 0.027 + 27² × 0.001 = 7.776

therefore the standard deviation is S[X ] = √4.405 = 2.0988.
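The expectation and variance of the ring-toss gain can be checked directly
from the probability function; a minimal sketch (variable names are ours):

```python
import math

# pmf of the gain X in the ring-toss game
pmf = {-3: 0.9**3, 1: 3 * 0.1 * 0.9**2, 3: 3 * 0.1**2 * 0.9, 27: 0.1**3}

E = sum(x * p for x, p in pmf.items())       # expectation
E2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2]
V = E2 - E**2                                # variance

print(E, E2, V)       # -1.836, 7.776, ~4.405
print(math.sqrt(V))   # standard deviation ~2.0988
```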
Example

Let X count the number of tails in tossing a coin twice. The probability
function of X is

x          0    1    2
P[X = x]  1/4  1/2  1/4

The expectation is:

E [X ] = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1

and its variance is:

V [X ] = E [X ²] − E [X ]² = 3/2 − 1² = 1/2

where:

E [X ²] = 0² × 1/4 + 1² × 1/2 + 2² × 1/4 = 3/2
Chebyshev’s inequality

This inequality provides a bound for the probability of (for instance) a
discrete r.v. X when only the expectation E [X ] and the variance V [X ]
are available.
Let X be a discrete r.v. with expectation E [X ] and variance V [X ];
then for k ≥ 1:

P (|X − E [X ]| ≥ k) ≤ V (X ) / k²

or, equivalently,

P (|X − E [X ]| < k) ≥ 1 − V (X ) / k²

Note that this is a rough bound and it should be used only when the
distribution of X is unavailable.
Chebyshev’s inequality

Consider the application of Chebyshev’s inequality to the game of rings if
we assume that we do not know the probability of landing a ring but
only that E [X ] = −1.836 and V [X ] = 4.405. Then:

P (|X + 1.836| ≥ 3) ≤ 4.405 / 9 = 0.4894

Considering the probability function, the exact probability is:

P (|X + 1.836| ≥ 3) = P (X + 1.836 ≥ 3) + P (X + 1.836 ≤ −3) =
= P (X ≥ 1.164) + P (X ≤ −4.836) =
= P (X = 3) + P (X = 27) = 0.027 + 0.001 = 0.028

which shows that Chebyshev’s bound can be far from the
exact probability.
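Comparing the bound with the exact probability in code (our names):

```python
# Chebyshev bound vs exact probability for the ring-toss gain X
pmf = {-3: 0.729, 1: 0.243, 3: 0.027, 27: 0.001}
E, V, k = -1.836, 4.405, 3

bound = V / k**2
exact = sum(p for x, p in pmf.items() if abs(x - E) >= k)
print(bound, exact)  # ~0.4894 vs 0.028
```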
Summary Example

I Let X be a discrete r.v. representing the number of heads minus the
number of tails in tossing 3 biased coins, where the probability of
heads is twice that of tails.

I Let “c” = {heads} and “+” = {tails}.

I The sample space is:

Ω = { e1 = {c, c, c} , e2 = {+, c, c} , e3 = {c, +, c} , e4 = {c, c, +} ,
      e5 = {+, +, c} , e6 = {+, c, +} , e7 = {c, +, +} , e8 = {+, +, +} }
Summary Example

I The support of the r.v. is S = {−3, −1, 1, 3}, as:

X (e1 ) = 3 − 0 = 3
X (e2 ) = X (e3 ) = X (e4 ) = 2 − 1 = 1
X (e5 ) = X (e6 ) = X (e7 ) = 1 − 2 = −1
X (e8 ) = 0 − 3 = −3

I The probability function is:

P (X = −3) = (1/3)³ = 1/27
P (X = −1) = 3 × (1/3)² × (2/3) = 2/9
P (X = 1) = 3 × (1/3) × (2/3)² = 4/9
P (X = 3) = (2/3)³ = 8/27
Summary Example
I Assume that the payoffs for playing with the three coins are: −6 for
playing, plus 4, 6 and 30 if 1, 2, and 3 tails appear, respectively.
What is the expected gain?
I Let Y represent the gain; then:
I For no tails we have X = 3 heads, so Y = −6 with probability
P (Y = −6) = P (X = 3) = 8/27.
I For 1 tail we have X = 1, so Y = −2 with probability
P (Y = −2) = P (X = 1) = 4/9.
I For 2 tails we have X = −1, so Y = 0 with probability
P (Y = 0) = P (X = −1) = 2/9.
I For 3 tails, X = −3, so Y = 24 with probability
P (Y = 24) = P (X = −3) = 1/27.
I Therefore, Y takes values over the set S = {−6, −2, 0, 24}. The
expected gain is:

E [Y ] = −6 × 8/27 − 2 × 4/9 + 0 × 2/9 + 24 × 1/27 = −1.78 euros
Bernoulli model

Description
This probability model describes the outcome of an experiment with
only two possible outcomes, which we can denote (for instance) as
success and failure. The corresponding random variable is:

X = 1 if success
    0 if failure

Let p ∈ [0, 1] denote the probability of success; then 1 − p is the
probability of failure.

Such an experiment is called a Bernoulli experiment, and X is distributed
according to a Bernoulli distribution of parameter p,

X ∼ Ber (p).
Bernoulli model

Example
Throwing a fair coin:

X = 1 if heads
    0 if tails

It is a Bernoulli experiment and X follows a Bernoulli distribution with
p = 1/2.

Example
A certain airline assumes that a passenger has probability 0.05 of not
showing up at check-in.
Let

Y = 1 if check-in
    0 if no check-in

Y follows a Bernoulli distribution with parameter p = 0.95.
Bernoulli model

Probability function:

P[X = 0] = 1 − p    P[X = 1] = p

c.d.f.:
         0        if x < 0
F (x) =  1 − p    if 0 ≤ x < 1
         1        if x ≥ 1

Properties
I E [X ] = p × 1 + (1 − p) × 0 = p
I E [X ²] = p × 1² + (1 − p) × 0² = p
I V [X ] = E [X ²] − E [X ]² = p − p² = p(1 − p)
I S[X ] = √(p(1 − p))
Binomial model

Description
This model describes the total number of successes in n identical
Bernoulli experiments repeated independently. The r.v. represents the
number of successes and follows a binomial distribution with parameters
n ∈ N and p ∈ [0, 1].

Definition
A discrete r.v. X follows a binomial distribution with parameters n and
p if

P[X = x] = C(n, x) p^x (1 − p)^(n−x),

for x = 0, 1, . . . , n, where

C(n, x) = n! / (x!(n − x)!)

is the binomial coefficient, and we write X ∼ B(n, p).


Binomial model

Example
Suppose that the previous airline sold 80 tickets for a certain flight and
the probability of each passenger not showing up at check-in is 0.05.
Let X = number of checked-in passengers. Then (assuming independence
between passengers)
X ∼ B(80, 0.95)

I The probability that all passengers show up is

P[X = 80] = C(80, 80) 0.95⁸⁰ (1 − 0.95)⁸⁰⁻⁸⁰ = 0.0165

I The probability that at least one does not show up is:

P[X < 80] = 1 − P[X = 80] = 1 − 0.0165 = 0.9835
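A quick check with scipy (assumed available; not part of the original slides):

```python
from scipy.stats import binom

n, p = 80, 0.95
print(binom.pmf(80, n, p))                 # P(X = 80) ≈ 0.0165
print(1 - binom.pmf(80, n, p))             # P(X < 80) ≈ 0.9835
print(binom.mean(n, p), binom.var(n, p))   # np = 76, np(1 - p) = 3.8
```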


Binomial model

Properties
I E [X ] = np

I V [X ] = np(1 − p)

I S[X ] = √(np(1 − p))
Poisson model

Description
It models the number of rare events occurring in a certain domain such
as, for instance, time or space.
Examples: telephone calls in an hour, typos in a page, traffic accidents in
a week, particles in a m³ of air, "Prussian soldiers killed by horse
kicks", . . .

Definition
A r.v. X follows a Poisson distribution of parameter λ > 0 if

P[X = x] = λ^x e^(−λ) / x!,  for x = 0, 1, 2, . . . ,

and we write X ∼ P(λ).
Poisson model

Properties (1)
I E [X ] = λ
I V [X ] = λ
I S[X ] = √λ

Property (2)
Let X ∼ P(λ) represent the number of events in a unit of time, with
mean λ.
Let Y represent the number of events in a time interval of length t; then

Y ∼ P(tλ)
Poisson model

Example
The mean number of typos per slide is 0.2; let X represent such a number,
then
X ∼ P(0.2)
What is the probability of having no typos?

P[X = 0] = 0.2⁰ e^(−0.2) / 0! = e^(−0.2) = 0.8187.

What is the probability of having one typo in 4 slides?
Let Y be the number of typos in t = 4 slides; then

Y ∼ P(0.2 × 4) = P(0.8)

P[Y = 1] = 0.8¹ e^(−0.8) / 1! = 0.8 e^(−0.8) = 0.3595.
Continuous r.v.

Distribution function
For a continuous r.v. X , the distribution function is
F (x) = P[X ≤ x], ∀x ∈ R

As in the discrete case, F (x) gives the cumulative probability up to the
point x ∈ R, and now F (x) is a continuous function.
Continuous r.v.

Properties
I 0 ≤ F (x) ≤ 1, for all x ∈ R.
I F (−∞) = 0.
I F (∞) = 1.
I If x1 ≤ x2 , then F (x1 ) ≤ F (x2 ), that is, F (x) is non-decreasing.
I For all x1 , x2 ∈ R, P(x1 ≤ X ≤ x2 ) = F (x2 ) − F (x1 ).
I F (x) is continuous.

The probability mass function has no meaning for a continuous r.v.,
because P(X = x) = 0. In its place we use the so-called density function.
Continuous r.v.

Density function
For a continuous r.v. X with distribution function F (x), the density
function of X is:

f (x) = dF (x)/dx = F ′(x)

Properties
I f (x) ≥ 0 ∀x ∈ R
I P(a ≤ X ≤ b) = ∫_a^b f (x)dx ∀a, b ∈ R
I F (x) = P(X ≤ x) = ∫_{−∞}^x f (u)du
I ∫_{−∞}^{∞} f (x)dx = 1
Continuous r.v.

Example
For a r.v. X with density function

f (x) = 12x²(1 − x)   if 0 < x < 1
        0             otherwise

we have

P(X ≤ 0.5) = ∫_{−∞}^{0.5} f (u)du = ∫_0^{0.5} 12u²(1 − u)du = 0.3125

P(0.2 ≤ X ≤ 0.5) = ∫_{0.2}^{0.5} f (u)du = ∫_{0.2}^{0.5} 12u²(1 − u)du = 0.2853

                                       0                     if x ≤ 0
F (x) = P(X ≤ x) = ∫_{−∞}^x f (u)du =  12 (x³/3 − x⁴/4)      if 0 < x ≤ 1
                                       1                     if x > 1
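The integrals can be evaluated numerically; a minimal sketch with scipy
(assumed available; the function name f is ours):

```python
from scipy.integrate import quad

# density of the example: 12x^2(1 - x) on (0, 1), zero elsewhere
f = lambda x: 12 * x**2 * (1 - x) if 0 < x < 1 else 0.0

p1, _ = quad(f, 0, 0.5)     # P(X <= 0.5); the density is zero below 0
p2, _ = quad(f, 0.2, 0.5)   # P(0.2 <= X <= 0.5)
print(p1, p2)               # ≈ 0.3125 and ≈ 0.2853
```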

Expectation of a continuous r.v.

Let X be a continuous r.v. over S ⊆ R with density function f (x).
Then, the expectation of X is

E [X ] = ∫_S x f (x) dx

The following properties can be verified:

I If a, b ∈ R, then:
E [a + bX ] = a + bE [X ]
I If g is a real function, then:

E [g (X )] = ∫_S g (x) f (x) dx
Example

The expectation of the r.v. X in the previous example is

E [X ] = ∫_R x · f (x)dx = ∫_0^1 x · 12x²(1 − x)dx =
= ∫_0^1 12(x³ − x⁴) dx = 12 (x⁴/4 − x⁵/5) |_0^1 = 12 (1/4 − 1/5) = 3/5
Variance of a continuous r.v.

The variance of a continuous r.v. X is:

V [X ] = E [(X − E [X ])²] = ∫_S (x − E [X ])² f (x)dx =
= ∫_S x² f (x)dx − E [X ]² = E [X ²] − E [X ]²

The following properties can be verified:

I V [X ] ≥ 0, and V [X ] = 0 if and only if X is a constant.
I If a, b ∈ R, then:
V [a + bX ] = b² V [X ]
Again, the square root of the variance is called the standard deviation
and is denoted by S[X ] = √V [X ].
Example

The variance of the r.v. X in the previous example is

V [X ] = E [X ²] − E [X ]² = 2/5 − (3/5)² = 2/5 − 9/25 = 1/25

where:

E [X ²] = ∫_R x² f (x)dx = ∫_0^1 12x⁴(1 − x)dx = (12/5) x⁵ |_0^1 − (12/6) x⁶ |_0^1 =
= 12/5 − 2 = 2/5

Therefore the standard deviation is S[X ] = √(1/25) = 1/5.
Uniform distribution

Description
For the uniform distribution, every set of the same length has the same
probability; that is, the density is constant over the bounded set where
the r.v. takes its values.

Definition
A continuous r.v. X follows a uniform distribution over the
interval (a, b) (where a and b are the parameters of the distribution) if

f (x) = 1/(b − a)   if a < x ≤ b
        0           otherwise

and we write X ∼ U(a, b).


Uniform distribution

Density function:
Properties
I Expectation: E [X ] = (a + b)/2
I Variance: V [X ] = (b − a)²/12
I Standard deviation:
S[X ] = (b − a)/√12
Example of uniform distribution

Suppose X follows a uniform distribution over (a = 3, b = 5); then its
density function is

f (x) = 1/2   if 3 < x < 5
        0     otherwise

Let us calculate the following probabilities:

P(X ≤ 0.5) = ∫_{−∞}^{0.5} f (u)du = 0
P(X ≤ 4) = ∫_{−∞}^4 f (u)du = ∫_3^4 (1/2) du = (u/2)|_3^4 = 1/2
P(3.5 ≤ X ≤ 4.5) = ∫_{3.5}^{4.5} f (u)du = ∫_{3.5}^{4.5} (1/2) du = 1/2
Example of uniform distribution

Distribution function

F (x) = P(X ≤ x) = ∫_{−∞}^x f (u)du = . . .

I If x ≤ 3 then F (x) = P(X ≤ x) = 0.

I If 3 < x ≤ 5 then F (x) = P(X ≤ x) = ∫_3^x (1/2) du = (u/2)|_3^x = (x − 3)/2.

I If 5 < x then F (x) = P(X ≤ x) = ∫_3^5 (1/2) du = (u/2)|_3^5 = (5 − 3)/2 = 1.

Summarizing, we have:

         0            if x ≤ 3
F (x) =  (x − 3)/2    if 3 < x ≤ 5
         1            if x > 5
Example of uniform distribution

Expectation

E [X ] = ∫_R x · f (x)dx = ∫_3^5 x · (1/2) dx = (x²/4)|_3^5 = (5² − 3²)/4 = 4

Variance

V [X ] = ∫_R x² · f (x)dx − E [X ]² = ∫_3^5 (x²/2) dx − 4² = (x³/6)|_3^5 − 16 = 0.33
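The same quantities with scipy (assumed available; note that scipy
parametrizes the uniform as U(loc, loc + scale)):

```python
from scipy.stats import uniform

X = uniform(loc=3, scale=2)        # X ~ U(3, 5)
print(X.cdf(4))                    # P(X <= 4) = 1/2
print(X.cdf(4.5) - X.cdf(3.5))     # P(3.5 <= X <= 4.5) = 1/2
print(X.mean(), X.var())           # 4 and (b - a)^2 / 12 = 1/3
```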
Exponential distribution
Description
The exponential distribution models the time between two independent
events occurring uniformly over time (e.g. the time between two
Poisson events).

Definition
We say that X follows an exponential distribution of parameter λ > 0,
X ∼ E(λ), if its density function is

f (x) = λe^(−λx),  for x ≥ 0.

Note that X takes values over S = [0, ∞).

Examples
I Time between the arrivals of two trucks at the discharge point.
I Time between two emergency calls.
I Lifetime of a lightbulb.
Exponential distribution

Density function
Properties
I Expectation: E [X ] = 1/λ
I Variance: V [X ] = 1/λ²
I Standard deviation: S[X ] = 1/λ
I Distribution function:

F (x) = 1 − e^(−λx)   if x ≥ 0
        0             otherwise

I The exponential distribution is related to the Poisson distribution:
λ is the mean number of events in a certain unit of time (or space).
Exponential distribution
Example
In a given city there occur, on average, 50 fires every year, and we assume
that such a number follows a Poisson distribution. We can calculate:
I The probability of a certain time between two fires.
I Knowing that a fire has just occurred, what is the probability that
the next will occur in two weeks?
We have that:
I The number of fires is N ∼ P(λ) with λ = 50.
I The time between two fires is X ∼ E(λ) with λ = 50.
I Therefore the expected time between two fires is E [X ] = 1/λ = 1/50
years, i.e. 7.3 days.
I Two weeks correspond to the following fraction of the year:
(2 × 7)/365 = 0.03836.

I P[X > 0.03836] = 1 − P[X ≤ 0.03836] = 1 − (1 − e^(−50×0.03836)) =
= e^(−1.918) = 0.147.
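A pure-Python check of the fires example (our names):

```python
import math

lam = 50                    # mean number of fires per year
t = 2 * 7 / 365             # two weeks as a fraction of a year
print(365 / lam)            # expected time between fires ≈ 7.3 days
print(math.exp(-lam * t))   # P(X > t) = e^(-lambda t) ≈ 0.147
```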
Normal or Gaussian distribution

Description
The normal distribution models the measurement errors of a continuous
quantity and approximates very well many real situations. Statistics
makes wide use of this model and of the models derived from it.

Definition
The r.v. X follows a normal or Gaussian distribution with parameters
µ ∈ R and σ ∈ R⁺, X ∼ N (µ, σ), if

f (x) = (1/(σ√(2π))) exp( −(x − µ)²/(2σ²) )

Properties
E [X ] = µ    V [X ] = σ²
If X ∼ N (µ, σ), the density f (x) is symmetric around µ, which is also
the median.
Normal or Gaussian distribution
Density function for 3 different values of µ and σ
Normal or Gaussian distribution

Property
If X ∼ N (µ, σ):
I P(µ − σ < X < µ + σ) ≈ 0.683
I P(µ − 2σ < X < µ + 2σ) ≈ 0.955
I P(µ − 3σ < X < µ + 3σ) ≈ 0.997

Chebyshev’s inequality
Chebyshev’s inequality also applies to continuous variables when only
the mean and standard deviation are known. In particular, for X with
mean µ and standard deviation σ, we have that:

P (µ − k < X < µ + k) = P (|X − µ| < k) ≥ 1 − σ²/k²

therefore, if k = cσ, we have that P (µ − cσ < X < µ + cσ) ≥ 1 − 1/c².
Normal or Gaussian distribution

Linear transformation
If X ∼ N (µ, σ), then:

Y = aX + b ∼ N (aµ + b, |a|σ)

Standardization
If X ∼ N (µ, σ), it is possible to consider the standardized r.v.

Z = (X − µ)/σ ∼ N (0, 1)

The special case N (0, 1) is called the standard normal distribution. It
is symmetric around 0, and its c.d.f. (which has no closed-form
analytical expression) is tabulated.
Table of N (0, 1)
Example of Normal Distribution

Let Z ∼ N(0, 1); calculate the following probabilities:

I Pr(Z < 1.5) = 0.9332 (from the table).

I Pr(Z > −1.5) = Pr(Z < 1.5) = 0.9332. Why?

I Pr(Z < −1.5) = Pr(Z > 1.5) = 1 − Pr(Z < 1.5) = 1 − 0.9332 =
0.0668. Why not ≤?

I Pr(−1.5 < Z < 1.5) = Pr(Z < 1.5) − Pr(Z < −1.5) =
0.9332 − 0.0668 = 0.8664.
Example of Normal Distribution
Let X ∼ N(µ = 2, σ = 3); we want to calculate Pr(X < 4) and
Pr(−1 < X < 3.5) using the table of the standard normal:
I First we restate the event in terms of Z using standardization and then
we use the table:

Pr(X < 4) = Pr( (X − 2)/3 < (4 − 2)/3 ) = Pr(Z < 0.666̇) ≈ 0.7454,

where Z ∼ N(0, 1).

I The same applies to the second probability:

Pr(−1 < X < 3.5) = Pr(−1 − 2 < X − 2 < 3.5 − 2) =
= Pr( (−1 − 2)/3 < (X − 2)/3 < (3.5 − 2)/3 ) = Pr(−1 < Z < 0.5) =
= Pr(Z < 0.5) − Pr(Z < −1) = 0.6915 − 0.1587 = 0.5328,

where Z ∼ N(0, 1).
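The same probabilities with scipy (assumed available); note that the exact
c.d.f. value can differ slightly from the table lookup, which rounds z to
two decimals:

```python
from scipy.stats import norm

Z = norm(0, 1)
print(Z.cdf(1.5))              # Pr(Z < 1.5) ≈ 0.9332

X = norm(loc=2, scale=3)       # X ~ N(mu = 2, sigma = 3)
print(X.cdf(4))                # Pr(X < 4) ≈ 0.7475 (table at z = 0.66: 0.7454)
print(X.cdf(3.5) - X.cdf(-1))  # Pr(-1 < X < 3.5) ≈ 0.5328
```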


Example of Normal Distribution

It is difficult to label a pack of meat with its exact weight due to a
natural weight loss (defined as the percentage of weight lost). Assume
that the weight loss of a chicken pack is distributed according to a
normal distribution with mean 4% and standard deviation 1%.
Let X represent the loss of a pack.
I What is the probability that 3% < X < 5%?
I What is the value of x such that 90% of the packs have a loss
less than x?
I Consider a sample of 4 packs; compute the probability that all packs
show a loss between 3% and 5%.

Sexauer, B. (1980) Drained-Weight Labelling for Meat and Poultry: An


Economic Analysis of a Regulatory Proposal, Journal of Consumer Affairs, 14,
307-325.
Example of Normal Distribution

Pr(3 < X < 5) = Pr( (3 − 4)/1 < (X − 4)/1 < (5 − 4)/1 ) = Pr(−1 < Z < 1) =
= Pr(Z < 1) − Pr(Z < −1) = 0.8413 − 0.1587 = 0.6827

We want Pr(X < x) = 0.9. Then

Pr( (X − 4)/1 < (x − 4)/1 ) = Pr(Z < x − 4) = 0.9.

Looking at the table, the value that satisfies the above equation is
x − 4 ≈ 1.28, implying that 90% of the packs have a loss less than
x = 5.28%.
For one pack we have p = Pr(3 < X < 5) = 0.6827. Let Y count the
number of packs (out of the 4) with a loss between 3% and 5%; then
Y ∼ B(4, 0.6827) and

Pr(Y = 4) = C(4, 4) 0.6827⁴ (1 − 0.6827)⁰ = 0.2172.
Example of Normal Distribution

If the sample were made of 5 packs, what is the probability that at least
one has a loss between 3% and 5%? In this case we have n = 5 and
p = 0.6827, so Y ∼ B(5, 0.6827) and

Pr(Y ≥ 1) = 1 − Pr(Y < 1) = 1 − Pr(Y = 0) =
= 1 − C(5, 0) 0.6827⁰ (1 − 0.6827)⁵⁻⁰ = 1 − (1 − 0.6827)⁵ = 0.9968.
The Central Limit Theorem (CLT)

The following theorem applies to the mean of a set of independent and
identically distributed (i.i.d.) r.v.s,

X̄ = (1/n) Σ_{i=1}^n Xi

and states that in the limit, for n large, the distribution of X̄ is Gaussian,
independently of the distribution of X . Given its generality it is called
“central”.
Theorem
Let X1 , X2 , . . . , Xn be independent r.v.s, identically distributed, with mean
µ and standard deviation σ (both finite). For n large enough,

(X̄ − µ) / (σ/√n) ∼ N (0, 1)
Approximations with the CLT

Binomial
Let X ∼ B(n, p) with n large enough (that is, n ≥ 30 and 0.1 ≤ p ≤ 0.9,
or np ≥ 5 and n(1 − p) ≥ 5); then:

(X − np) / √(np(1 − p)) ∼ N (0, 1)

Poisson
If X ∼ P(λ) with λ large enough (λ > 5):

(X − λ) / √λ ∼ N (0, 1)

Equivalently, P(λ) ≈ N (λ, √λ).
Approximations with the CLT: Example
I Let X ∼ B(100, 1/3). Suppose we want to calculate Pr(X < 40); as
the exact calculation is heavy by hand, we use the
I CLT, whereby X ∼ B(100, 1/3) ≈ N (33.3̇, 4.714), because

E [X ] = 100 × 1/3 = 33.3̇
V [X ] = 100 × 1/3 × 2/3 = 22.2̇
S[X ] = √22.2̇ = 4.714

I Therefore,

Pr(X < 40) = Pr( (X − 33.3̇)/4.714 < (40 − 33.3̇)/4.714 )
≈ Pr (Z < 1.414), where Z ∼ N(0, 1)
≈ 0.921.

I The exact value, computed with a PC, is 0.934, so the approximation
given by the CLT is not very far from the exact value.
Distributions related to the normal one

χ² (Chi-squared)
Let X1 , X2 , . . . , Xn be i.i.d. N (0, 1) r.v.s. The distribution of

S = Σ_{i=1}^n Xi²

is called the χ² distribution; it is governed by one parameter, n, called
the degrees of freedom (d.f.), and has the following properties:
I E [S] = n
I V [S] = 2n
Distributions related to the normal one

Student's t
Let Y , X1 , X2 , . . . , Xn be i.i.d. N (0, 1) r.v.s. The distribution of

T = Y / √( Σ_{i=1}^n Xi² / n )

is called the Student's t distribution; it is governed by one parameter, n,
called the degrees of freedom (d.f.), and has the following properties:
I E [T ] = 0
I V [T ] = n/(n − 2)
Distributions related to the normal one

Fisher's Fn,m
Let X1 , X2 , . . . , Xn and Y1 , Y2 , . . . , Ym be i.i.d. N (0, 1) r.v.s. The
distribution of

F = ( m Σ_{i=1}^n Xi² ) / ( n Σ_{i=1}^m Yi² )

is called the Fisher distribution, Fn,m ; it is governed by two parameters,
n and m, called the degrees of freedom (d.f.), and has the following
properties:
I E [F ] = m/(m − 2)  (for m > 2)
I V [F ] = 2m²(n + m − 2) / ( n(m − 2)²(m − 4) )  (for m > 4)
I 1/F ∼ Fm,n
