Probability and Statistics
1. Random variables
The language of events is sometimes cumbersome. It is often more convenient to describe events numerically, identifying outcomes with numbers. For example, in dice rolling the outcomes are identified with the numbers Ω = {1, 2, 3, 4, 5, 6}. This leads to the definition of random variables.
Definition 1.1. A random variable on a probability space (Ω, A, Pr) is a map
$$X : \Omega \to \mathbb{R}$$
with the property that, for every x ∈ R, the preimage $X^{-1}((-\infty, x]) = \{\omega \in \Omega : X(\omega) \le x\}$ is an event in A.
A random variable simply translates elements of the sample space into numbers, with the condition that intervals of the form (−∞, x] have preimages in the σ–algebra A of the probability space, where probabilities are defined.
Example 1.2. Let A = {∅, A, Ā, Ω} be a Bernoulli algebra on a sample space Ω. The map
$$X : \Omega \to \mathbb{R}$$
defined by
$$X(\omega) = \begin{cases} 0 & \omega \notin A \\ 1 & \omega \in A \end{cases}$$
is a random variable: we have
$$X^{-1}((-\infty, x]) = \begin{cases} \emptyset & x < 0 \\ \bar{A} & 0 \le x < 1 \\ \Omega & x \ge 1. \end{cases}$$
Such random variables are called indicator random variables for the set A and are usually
written as 1A .
The purpose of the definition of random variables is to translate the probability func-
tion from the probability space to R. This is generally accomplished by the cumulative
distribution function.
Definition 1.3. Let X be a random variable on a probability space (Ω, A, Pr). The cumulative distribution function of X is
$$F_X : \mathbb{R} \to \mathbb{R}$$
defined as
$$F_X(x) = \Pr(X^{-1}((-\infty, x])) = \Pr(X \le x).$$
Example 1.4. Let 1_A be the indicator function of the event A in a probability space (Ω, A, Pr) with Pr(A) = p. The cumulative distribution function of 1_A is
$$F_{1_A}(x) = \begin{cases} 0 & x < 0 \\ 1 - p & 0 \le x < 1 \\ 1 & x \ge 1. \end{cases}$$
The distribution function of any random variable is nondecreasing, right–continuous, and satisfies $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$. These properties identify the class of real functions which are distribution functions of some random variable.
Remark 1.6. Usually capital letters like X, Y, Z, . . . are used to denote random variables in
the probability setting. For a subset A ⊂ R we use the shorthand X ∈ A for X^{-1}(A). Thus, we write Pr(X < 0), Pr(0 < X ≤ 1) or Pr(X = 2) instead of Pr(X^{-1}((−∞, 0))), Pr(X^{-1}((0, 1])), Pr(X^{-1}({2})) or Pr({ω ∈ Ω : X(ω) = 2}).
Probabilities of general events X ∈ A can be expressed in terms of the distribution function (sometimes not in a simple way). Some examples are collected in the following proposition.
Proposition 1.7. Let X be a random variable in a probability space with distribution function F_X. Then, for a < b,
$$\Pr(X > a) = 1 - F_X(a), \qquad \Pr(a < X \le b) = F_X(b) - F_X(a), \qquad \Pr(X = a) = F_X(a) - \lim_{x \uparrow a} F_X(x).$$
We now come to an important part of the course. A large part of probability theory is developed upon simple models which are used time and again to build more complex ones. We next describe some of the most important ones. Each is identified as a model and gives its name to the corresponding probability distribution and, by abuse of language, to the random variables that follow it.
3.1. Bernoulli model. The Bernoulli model is the simplest one, associated with the Bernoulli algebra and with indicator functions. In this model we are only interested in a single event A ⊂ Ω and want to determine whether this event occurs. Formally,
Definition 3.1. A random variable X has the Bernoulli distribution with parameter p ∈ [0, 1]
if X(Ω) = {0, 1} and
Pr(X = 0) = 1 − p, Pr(X = 1) = p.
Equivalently, X = 1A for some A ⊂ Ω. We write X ∼ B(p).
There is not much to say about the Bernoulli distribution except that it is the building
block of many more complicated models.
3.2. Binomial model. We again focus on a single event A in the sample space, but now we make n independent repetitions of the experiment associated with the probability space, and we are interested in counting how many times the event occurs in these n repetitions. By independence, any possible output of the n experiments in which there are k occurrences of A has probability $p^k q^{n-k}$, where p = Pr(A) and q = 1 − p. There are $\binom{n}{k}$ outputs with precisely k occurrences of A. This gives the probability distribution of our random variable.
Definition 3.2. A random variable X has the Binomial distribution with parameters n and
p if
$$\Pr(X = k) = \binom{n}{k} p^k q^{n-k}, \qquad k = 0, 1, \dots, n.$$
We write X ∼ Bin(n, p).
The Binomial model is quite a fundamental one. Among other interesting properties, we have
$$\Pr(X = k) \le \Pr(X = k+1), \qquad 0 \le k \le np - q,$$
and
$$\Pr(X = k) \ge \Pr(X = k+1), \qquad np - q \le k \le n - 1.$$
In other words, the sequence Pr(X = 0), Pr(X = 1), . . . , Pr(X = n) is unimodal, with a maximum at k = ⌊(n + 1)p⌋ (and also at (n + 1)p − 1 when (n + 1)p is an integer).
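As a quick numeric check of this unimodality (a minimal sketch in Python; the parameters n = 20, p = 0.3 are chosen here only for illustration), one can tabulate the Binomial probability function and locate its maximum:

from math import comb, floor

def binom_pmf(n, p, k):
    """Pr(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3                      # illustrative parameters
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]

mode = max(range(n + 1), key=lambda k: pmf[k])
print(mode, floor((n + 1) * p))     # both print 6 for these parameters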
3.3. Poisson Model. The values of the probability function of a Binomial variable are somewhat cumbersome to compute. A useful simplification is the limiting model for large n with np kept essentially constant. More precisely, if X_n ∼ Bin(n, p_n) with lim_{n→∞} np_n = λ, then the probability function of X_n has a limit which can be written in a simpler way:
\begin{align*}
\Pr(X_n = k) &= \binom{n}{k} p_n^k (1 - p_n)^{n-k} \\
&\sim \frac{1}{k!}\,(np_n)^k (1 - p_n)^{-k} (1 - p_n)^{n} \\
&\sim \frac{1}{k!}\,\lambda^k e^{-\lambda},
\end{align*}
where in the last line we have used that
$$\lim_{n \to \infty} (1 - \lambda/n)^n = e^{-\lambda}.$$
This limiting probability function defines the Poisson distribution: a random variable X has the Poisson distribution with parameter λ > 0 if
$$\Pr(X = k) = e^{-\lambda}\,\frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \dots$$
We write X ∼ Pois(λ). As an illustration of the approximation, the following table compares the first values of the probability function of X ∼ Bin(30, 0.1) with those of Y ∼ Pois(3) (λ = np = 3):

k    Pr(X = k)    Pr(Y = k)
0    0.042        0.049
1    0.141        0.149
2    0.227        0.224
3    0.236        0.224
4    0.177        0.168
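The comparison in the table is easy to reproduce (a small Python sketch; the parameters n = 30, p = 0.1, hence λ = 3, are the ones that match the values above, up to rounding):

from math import comb, exp, factorial

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

n, p, lam = 30, 0.1, 3.0            # lambda = n*p = 3
for k in range(5):
    print(k, round(binom_pmf(n, p, k), 3), round(poisson_pmf(lam, k), 3))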
The number of radioactive particles emitted, the number of phone calls received, or the number of raindrop impacts are examples where the Poisson distribution fits well: they correspond to a large number of trials with a small probability of success, which can reasonably be imagined to be independent.
3.4. Geometric model. We now count the number X of independent repetitions of the experiment needed until the event A occurs for the first time. Writing p = Pr(A) and q = 1 − p, the first 'success' happens at the k–th trial precisely when the first k − 1 trials are 'failures', so
$$\Pr(X = k) = q^{k-1} p, \qquad k = 1, 2, \dots,$$
and we write X ∼ Geom(p). The name of the distribution comes from the fact that the sum of the probabilities, which must add up to one, is the geometric series
$$\sum_{k \ge 1} \Pr(X = k) = \sum_{k \ge 1} q^{k-1} p = p\,\frac{1}{1 - q} = 1.$$
We note that a slight variation of the geometric distribution Y = X − 1, which counts the
number of ‘failures’ before the first ‘success’ is usually also called geometric, with probability
distribution
Pr(Y = k) = q k p, k = 0, 1, 2, . . .
starting with k = 0 instead of k = 1.
The geometric distribution has a characteristic property, the 'lack of memory'. If we know that in the first r repetitions there has been no success, then the probability of having to wait at least s additional repetitions is the same as from the start:
$$\Pr(X \ge r + s \mid X > r) = \frac{\Pr(X \ge r + s)}{\Pr(X > r)} = \frac{q^{r+s-1}}{q^{r}} = q^{s-1} = \Pr(X \ge s),$$
where we have used $\Pr(X \ge m) = \sum_{k \ge m} q^{k-1} p = q^{m-1}$. In other words, if we are waiting for
Heads in coin tossing, the fact that we have waited 103 tosses without seeing the event does
not mean that Heads are approaching faster in the future.
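The lack of memory is easy to observe in a simulation (a minimal sketch in Python; the values p = 0.2, r = 4, s = 3 are arbitrary choices for illustration):

import random

random.seed(1)

def geometric(p):
    """Number of independent Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

p, r, s = 0.2, 4, 3
samples = [geometric(p) for _ in range(200_000)]

# Pr(X >= s): waiting at least s repetitions from the start.
p_start = sum(x >= s for x in samples) / len(samples)

# Pr(X >= r + s | X > r): no success in the first r repetitions, then at least s more.
no_success = [x for x in samples if x > r]
p_cond = sum(x >= r + s for x in no_success) / len(no_success)

print(round(p_start, 3), round(p_cond, 3), round((1 - p) ** (s - 1), 3))   # all close to q^(s-1)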
Proposition 3.6. Let X be a discrete random variable taking values in the positive integers
with Pr(X = 1) = p. If X has the ‘lack of memory’ property then X ∼ Geom(p).
3.5. Negative Binomial model. If, instead of waiting for the first 'success' as in the geometric distribution, one waits until the appearance of the r–th 'success', then the probability of having to wait exactly k repetitions is that of having r − 1 successes in the first k − 1 trials (a binomial probability) and then having the r–th one in the k–th trial.
Definition 3.7. A random variable X has the negative binomial distribution with parameters p and r if
$$\Pr(X = k) = \binom{k-1}{r-1}\, p^r q^{k-r}, \qquad k = r, r+1, \dots$$
We write X ∼ NegBin(p, r).
3.7. Uniform model. The basic distribution we have repeatedly seen on a finite sample space Ω = {1, 2, . . . , n} is the uniform one, where each outcome gets the same probability.
Definition 3.9. A random variable X has the uniform distribution with parameter n if
$$\Pr(X = k) = \frac{1}{n}, \qquad k = 1, 2, \dots, n.$$
We write X ∼ U (n).
We simply note that a discrete random variable cannot have the uniform distribution on an infinite countable set, say Ω = N. This would lead to Pr(X = k) = 0 for all k and then, by σ–additivity, $\Pr(\mathbb{N}) = \sum_{x \in \mathbb{N}} \Pr(X = x) = 0$, contradicting the first axiom of a probability measure.
4. Expectation
Definition 4.1 (Expectation). Let X be a discrete random variable taking values x1, x2, . . .. The expectation of X is
$$E(X) = \sum_i x_i \Pr(X = x_i),$$
whenever the sum is absolutely convergent.
In other words, the expectation is the sum of the values of the random variable weighted by their probabilities. We will see later on that E(X) is the single number which best represents a probability distribution, a clear intuitive fact. Of course the expectation can be large because X takes very large values, even if only with small probabilities, so the expectation (as the mean) may be a misleading representative of a random variable. A large amount of probability and statistics is devoted to clarifying this statement.
The caution in the definition about the convergence of the sum is not superfluous: there are random variables which do not have an expectation, although sometimes the value ∞ is accepted.
Example 4.2. Let X be a random variable taking values on the positive integers with probability
$$\Pr(X = k) = \frac{6}{\pi^2 k^2}.$$
One can check (it is a famous problem in the history of mathematics) that $\sum_k \Pr(X = k) = 1$. However, the series $\sum_k k \Pr(X = k) = (6/\pi^2) \sum_k 1/k$ is not convergent.
Distribution               Expectation
X ∼ B(p)                   p
X ∼ Bin(n, p)              np
X ∼ Pois(λ)                λ
X ∼ Geom(p)                1/p
X ∼ NegBin(p, r)           r/p
X ∼ HypGeom(n, n1, r)      r n1/n
Proposition 4.4 (the linearity of the expectation) is particularly useful. For example, it provides a simple derivation of the expectation of the Binomial and Negative Binomial distributions.
A second important property of the expectation is related to functions of random variables.
Proposition 4.5 (Functions of random variables). Let X be a discrete random variable
on a probability space and let g : R → R be a function such that the preimage of each
interval (−∞, x], x ∈ R belongs to the Borel σ–algebra (the class of such functions is called
‘measurable’ and it includes continuous functions).
Then the composition Y = g(X) is a discrete random variable on the same probability
space.
One can sometimes obtain explicitly the distribution of Y = g(X). However the following
result, usually called the theorem of expectation or the formula of change of variables for
expectation, is often useful.
Theorem 4.6. Let X be a discrete random variable on a probability space taking values
x1 , x2 , . . . and let g : R → R be a measurable function. Then the expectation of Y = g(X)
satisfies
$$E(Y) = \sum_i g(x_i) \Pr(X = x_i).$$
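The point of the theorem is that one can average g over the values of X without ever computing the law of Y = g(X). A minimal sketch in Python (the choices X ∼ Bin(10, 0.4) and g(x) = (x − 3)² are arbitrary illustrations) compares the two computations:

from math import comb

n, p = 10, 0.4
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

g = lambda x: (x - 3) ** 2                       # any measurable function

# Theorem 4.6: average g over the values of X.
e_g = sum(g(k) * pk for k, pk in pmf.items())

# Direct computation: build the law of Y = g(X) explicitly and average over it.
law_Y = {}
for k, pk in pmf.items():
    law_Y[g(k)] = law_Y.get(g(k), 0.0) + pk
e_g_direct = sum(y * py for y, py in law_Y.items())

print(round(e_g, 6), round(e_g_direct, 6))       # identical values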
Particularly important functions are the powers $x^k$, which lead to the following definition.
Definition 4.7 (Moments). Let X be a discrete random variable taking values x1, x2, . . .. The k–th moment of X is
$$E(X^k) = \sum_i x_i^k \Pr(X = x_i),$$
whenever the sum is absolutely convergent. The k–th central moment of X is
$$E\big((X - E(X))^k\big) = \sum_i (x_i - E(X))^k \Pr(X = x_i).$$
The second central moment is the variance of X,
$$\operatorname{Var}(X) = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2,$$
which measures the mean quadratic deviation of X with respect to the expected value. The smaller the variance, the more concentrated are the values of X around its mean (there is less probability that it takes values far from its mean). A quantitative measure of this concentration is given by the Chebyshev inequality.
Theorem 4.9 (Markov and Chebyshev inequalities). Let X be a discrete random variable taking only nonnegative values, with finite expectation. Then, for each a ∈ R+,
$$\Pr(X \ge a) \le \frac{E(X)}{a} \qquad \text{(Markov inequality)}.$$
Let X be a discrete random variable with first and second moments. Then, for each a ∈ R+,
$$\Pr(|X - E(X)| \ge a) \le \frac{\operatorname{Var}(X)}{a^2} \qquad \text{(Chebyshev inequality)}.$$
One particular consequence is that, if Var(X) = 0, then X takes the single value E(X) with probability one: such a variable is a constant.
Example 4.10. Let X be a random variable with uniform distribution U(n). Then
$$E(X) = \sum_{i=1}^{n} \frac{i}{n} = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}.$$
Computing the variance requires the formula $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$, which can be proved by induction on n:
$$E(X^2) = \sum_{i=1}^{n} \frac{i^2}{n} = \frac{(n+1)(2n+1)}{6}.$$
Hence,
$$\operatorname{Var}(X) = E(X^2) - (E(X))^2 = \frac{n^2 - 1}{12}.$$
Chebyshev's inequality gives
$$\Pr\big(|X - (n+1)/2| \ge k\big) \le \frac{n^2 - 1}{12 k^2},$$
while the actual value is
$$\Pr\big(|X - (n+1)/2| \ge k\big) = 1 - \Pr\Big(\frac{n - 2k + 1}{2} < X < \frac{n + 2k + 1}{2}\Big) = 1 - \frac{2k - 1}{n} = \frac{n - 2k + 1}{n}.$$
For example, for n = 13 the two values are
k 1 2 3 4 5 6
Chebyshev 14 7/2 14/9 7/8 14/25 7/18
Actual value 12/13 10/13 8/13 6/13 4/13 2/13
which shows that the bounds can be rather poor. However, the Chebyshev estimation is valid
for any probability distribution and it can be tight.
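The table is straightforward to reproduce exactly (a small Python sketch using exact rational arithmetic for n = 13):

from fractions import Fraction

n = 13
var = Fraction(n * n - 1, 12)                  # Var(X) = (n^2 - 1)/12 = 14
for k in range(1, 7):
    chebyshev = var / (k * k)                  # Chebyshev bound on Pr(|X - (n+1)/2| >= k)
    actual = Fraction(n - 2 * k + 1, n)        # exact value for X ~ U(13)
    print(k, chebyshev, actual)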
5. Law of averages
We now come to one of the connections between the axiomatic approach to probability
theory and the frequency one. Let A be an event in a probability space with probability p =
Pr(A). If we make n independent repetitions of the experiment associated with the probability space, the number X of times the event A appears follows a binomial distribution X ∼ Bin(n, p). The proportion of successes in the n trials is X/n, the relative frequency of appearance of A. By Chebyshev's inequality, for every a > 0,
$$\Pr(|X/n - p| \ge a) = \Pr(|X - E(X)| \ge na) \le \frac{pq}{a^2}\,\frac{1}{n} \to 0 \qquad (n \to \infty).$$
In words, the relative frequency of appearance of A approaches its probability for n large. This is precisely the intuitive meaning of probability and matches its axiomatic presentation. The above fact is the simplest expression of what is called the Law of Large Numbers, already obtained by Bernoulli in 1692.
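A simulation makes the statement tangible (a minimal Python sketch; p = 0.3 and the sample sizes are arbitrary illustrative choices):

import random

random.seed(0)
p = 0.3                                              # probability of the event A

for n in (100, 10_000, 1_000_000):
    x = sum(random.random() < p for _ in range(n))   # X ~ Bin(n, p)
    print(n, x / n)                                  # the relative frequency X/n approaches p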
It is interesting to note that, for the particular case presented before, one can obtain a
much better estimation than the one given by Chebyshev inequality. The next bound was
first obtained by Bernstein and belongs to the family of what are known as Chernoff bounds. The proof illustrates an interesting technique worth seeing.
Let $m = \lceil n(p + \varepsilon) \rceil$. For each λ > 0 and each k ≥ m we have $e^{\lambda k} \ge e^{\lambda n(p+\varepsilon)}$. Therefore,
\begin{align*}
\Pr(X/n \ge p + \varepsilon) &\le \sum_{k=m}^{n} e^{\lambda(k - n(p+\varepsilon))} \binom{n}{k} p^k q^{n-k} \\
&\le e^{-\lambda n \varepsilon} \sum_{k=0}^{n} \binom{n}{k} (p e^{\lambda q})^k (q e^{-\lambda p})^{n-k} \\
&= e^{-\lambda n \varepsilon} \big(p e^{\lambda q} + q e^{-\lambda p}\big)^{n}.
\end{align*}
By using $e^x \le x + e^{x^2}$, valid for all x, one can turn both exponents into positive ones:
\begin{align*}
\Pr(X/n \ge p + \varepsilon) &\le e^{-\lambda n \varepsilon} \big(p e^{\lambda^2 q^2} + q e^{\lambda^2 p^2}\big)^{n} \\
&\le e^{\lambda^2 n - \lambda n \varepsilon},
\end{align*}
where in the last inequality we have used $e^{\lambda^2 q^2}, e^{\lambda^2 p^2} \le e^{\lambda^2}$. This inequality, valid for every λ > 0, is optimized when λ = ε/2, giving
$$\Pr(X/n \ge p + \varepsilon) \le e^{-n \varepsilon^2 / 4}.$$
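The following sketch (Python; p = 0.5 and ε = 0.2 are illustrative choices) compares the exact binomial tail with the Chebyshev bound pq/(nε²) and with the bound e^{−nε²/4} obtained above; for moderate n Chebyshev may still be smaller, but as n grows the exponential bound improves dramatically:

from math import ceil, comb, exp

def binom_tail(n, p, m):
    """Exact Pr(X >= m) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

p, eps = 0.5, 0.2
q = 1 - p
for n in (100, 400, 800):
    exact = binom_tail(n, p, ceil(n * (p + eps)))    # Pr(X/n >= p + eps)
    chebyshev = p * q / (eps**2 * n)                 # bound for the two-sided event
    chernoff = exp(-n * eps**2 / 4)                  # bound derived above
    print(n, exact, chebyshev, chernoff)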
6. Continuous random variables
Continuous random variables are roughly identified by the fact that the distribution function is continuous. As it happens, this requirement is not enough to identify the class of continuous random variables. Instead we ask for the stronger requirement that the distribution function can be obtained by integration of a density function: a random variable X is continuous if there is a nonnegative function f such that
$$F_X(x) = \int_{-\infty}^{x} f(t)\, dt \qquad \text{for every } x \in \mathbb{R}.$$
The function f is called the probability density function of X and is usually denoted by f_X.
By the fundamental theorem of Calculus, we have
$$f_X(x) = F_X'(x)$$
at each point x where FX has a derivative. It is a result from Calculus that a continuous
random variable has a continuous distribution function. This in particular shows that
$$\Pr(X = x) = F_X(x) - \lim_{t \uparrow x} F_X(t) = 0$$
for all x. In particular, Pr(X ∈ A) = 0 for every countable set A ⊂ R. By the Fundamental
theorem of Calculus,
Pr(a < X < a + h) = fX (a)h + o(h),
which can be written, for small h, as
$$\frac{\Pr(a < X < a+h)}{h} \approx f_X(a),$$
which explains the name ‘probability density’ for fX . So, large values of fX indicate large
probability of being locally around the argument. In this sense one can interpret continuous
random variables as limit versions of discrete ones.
Example 6.2. Let X_n have the uniform discrete distribution on n equally spaced points of the interval [0, 1]. As n → ∞ the distribution function F_{X_n} tends to the function F with F(x) = x on [0, 1] (and F(x) = 0 for x < 0, F(x) = 1 for x > 1), which is the distribution function of a continuous random variable Y. The density of Y is $f_Y = \mathbf{1}_{[0,1]}$. The density function is constant on the points where it is nonzero, indicating that the distribution is uniform. This is called the uniform distribution U(0, 1) and corresponds to the random choice of a point in the interval.
The probability that X lies in a set A can be obtained from the density function as
$$\Pr(X \in A) = \int_A f_X(t)\, dt.$$
In particular,
$$\Pr(a < X \le b) = \Pr(a < X < b) = \int_a^b f_X(t)\, dt.$$
The density function of a continuous random variable has the following properties:
Proposition 6.3. Let f_X be the density function of a continuous random variable X. Then $f_X(x) \ge 0$ for every x ∈ R, and $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$.
The above properties characterize the class of functions which are density functions of
some continuous random variable.
7. Probability models
7.1. Uniform distribution. We choose a random point in an interval (a, b). All intervals
with the same length have the same probability. This leads to:
Definition 7.1. A random variable X has the uniform distribution on an interval (a, b) if
its density function is
$$f_X(x) = \frac{1}{b-a}\,\mathbf{1}_{(a,b)}(x) = \begin{cases} 1/(b-a) & x \in (a,b) \\ 0 & x \notin (a,b). \end{cases}$$
The distribution function of X is
$$F_X(x) = \begin{cases} 0 & x \le a \\ \dfrac{x-a}{b-a} & a < x < b \\ 1 & x \ge b. \end{cases}$$
We write X ∼ U(a, b).
7.2. Exponential distribution. The exponential distribution can be seen as the limiting distribution of a geometric one. The model corresponds to the time at which a random event occurs, when its occurrence in a small interval behaves like a Bernoulli variable independent of the occurrences in other disjoint intervals.
Definition 7.2. A random variable X has the exponential distribution with parameter λ if
its density function is
fX (x) = λe−λx 1(0,∞) (x)
The distribution function of X is, for x > 0,
$$F_X(x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}.$$
We write X ∼ Exp(λ).
If X_n is a discrete random variable with the geometric distribution, X_n ∼ Geom(1/n), then
$$\Pr(X_n \le kn) = 1 - (1 - 1/n)^{kn} \to 1 - e^{-k} \qquad (n \to \infty).$$
This illustrates how the exponential distribution can be seen as the limit of geometric distributions. In particular, the exponential distribution also has the memoryless property:
Proposition 7.3. Let X be a random variable with the exponential distribution. Then, for
s, t > 0
Pr(X > t|X > s) = Pr(X > t − s).
It can be shown that a continuous random variable taking values in (0, ∞) with the
memoryless property has an exponential distribution.
An interesting connection with the Poisson distribution is worth mentioning. Suppose
that the number of events in a time interval [0, t] is a random variable X with a Poisson law
X ∼ Pois(λt). There are natural assumptions which lead to such a distribution. Then the
waiting time for the first event to happen is a random variable T with distribution
Pr(T ≤ t) = Pr(X ≥ 1) = 1 − Pr(X = 0) = 1 − e−λt ,
so that T follows an Exponential distribution T ∼ Exp(λ).
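This connection is easy to check by simulation (a minimal Python sketch; λ = 1.5 and t = 2 are arbitrary illustrative choices): generating events whose waiting times are independent Exp(λ) variables, the number of events in [0, t] behaves like a Pois(λt) variable.

import math
import random

random.seed(2)
lam, t, trials = 1.5, 2.0, 100_000

counts = []
for _ in range(trials):
    s, n = random.expovariate(lam), 0
    while s <= t:                         # accumulate Exp(lam) inter-arrival times up to time t
        n += 1
        s += random.expovariate(lam)
    counts.append(n)

mean_count = sum(counts) / trials
p_zero = counts.count(0) / trials
print(round(mean_count, 3), lam * t)                      # close to E(X) = lambda * t
print(round(p_zero, 4), round(math.exp(-lam * t), 4))     # close to Pr(X = 0) = e^{-lambda t}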
7.3. Normal distribution. The Normal distribution is among the most important ones
in Probability and Statistics. One of the reasons is that it can be seen as the limiting
distribution of the Binomial distribution. As such, it models (after normalization) the sum of many independent Bernoulli variables. It occurs in random phenomena which are the sum of many
independent inputs. It was observed by Gauss as the law of errors in measurements, and for
that reason it is also known as the Gaussian distribution.
Definition 7.4. A random variable X has the normal distribution with parameters m, σ if
its density function is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}.$$
We write X ∼ N (m, σ 2 ).
As it happens, the density function of a normal distribution does not have a primitive
which can be expressed as a finite combination of elementary functions. For that reason,
the values of the normal distribution were historically recorded in tables. These values are accessible through most standard mathematical software, in particular in R.
For m = 0 and σ 2 = 1 the corresponding normal distribution N (0, 1) is called the standard
one and has the more transparent density function
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
This is a symmetric function with respect to the origin and has particularly small tails.
Proposition 7.5. Let X ∼ N (0, 1). Then, for x > 0,
$$\Pr(X > x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt < \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}.$$
In the celebrated treatise on probability by Laplace one can already find what is known as the De Moivre–Laplace theorem, which states that the binomial probabilities tend to the normal density. More precisely,
Theorem 7.6 (De Moivre-Laplace). For n large, X ∼ Bin(n, p) and k close to np we have
$$\Pr(X = k) = \binom{n}{k} p^k q^{n-k} \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k-np)^2}{2npq}},$$
which is the value of the density of the normal distribution N (np, npq) at k.
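A quick numeric check of the approximation (a minimal Python sketch; n = 100, p = 0.3 and the values of k are arbitrary illustrative choices):

from math import comb, exp, pi, sqrt

n, p = 100, 0.3
q = 1 - p
for k in (20, 25, 30, 35, 40):
    exact = comb(n, k) * p**k * q**(n - k)
    approx = exp(-((k - n * p) ** 2) / (2 * n * p * q)) / sqrt(2 * pi * n * p * q)
    print(k, round(exact, 4), round(approx, 4))    # the two columns are close near k = np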
The Galton board (bean machine) is a physical device which illustrates the above Theorem.
The de Moivre-Laplace Theorem is the first form of the celebrated Central Limit Theorem,
one of the central results in Probability and Statistics. The general form of this basic result
will be discussed later on in this course.
Expectation, and general moments, can be defined for general random variables. They
may be expressed in analogous forms for discrete and for continuous random variables, with
the same meaning.
Definition 8.1 (Expectation and Moments). Let X be a continuous random variable with density f_X. The expectation of X is
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx,$$
whenever the integral is absolutely convergent. The k–th moment of the random variable X is
$$E(X^k) = \int_{-\infty}^{\infty} x^k f_X(x)\, dx,$$
and the k–th central moment is
$$E\big((X - m)^k\big) = \int_{-\infty}^{\infty} (x - m)^k f_X(x)\, dx,$$
where m = E(X). The second central moment is the variance of X,
$$\sigma_X^2 = \operatorname{Var}(X) = E\big((X - m)^2\big) = E(X^2) - (E(X))^2,$$
whenever the corresponding integrals are absolutely convergent.
The following table summarizes the expectation and variance of the most common distri-
butions.
Distribution       Mean value       Variance
X ∼ U(a, b)        (a + b)/2        (b − a)²/12
X ∼ Exp(λ)         1/λ              1/λ²
X ∼ N(m, σ²)       m                σ²
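These values can be checked by Monte Carlo simulation (a minimal Python sketch; the distributions U(0, 2) and Exp(2) are arbitrary illustrative choices):

import random

random.seed(3)
N = 200_000

u = [random.uniform(0.0, 2.0) for _ in range(N)]    # U(0, 2): mean 1, variance 1/3
e = [random.expovariate(2.0) for _ in range(N)]     # Exp(2): mean 1/2, variance 1/4

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return round(m, 3), round(v, 3)

print(mean_var(u))   # approximately (1.0, 0.333)
print(mean_var(e))   # approximately (0.5, 0.25)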
Applying a continuous (or, more generally, measurable) function to a continuous random variable produces, under mild conditions, another continuous random variable. The following is a useful result for computing density functions and expectations of such functions of random variables.
Theorem 9.1 (Change of Variable). Let X be a continuous random variable with density f_X. Let g : R → R be a differentiable function with g′(x) > 0 for each x. Then,
$$f_Y(y) = \frac{1}{g'(x)}\, f_X(x),$$
where Y = g(X) and y = g(x). Moreover, if g′(x) < 0 for each x, then
$$f_Y(y) = \frac{1}{|g'(x)|}\, f_X(x).$$
Example 9.2 (Linear Transformations). One simple example of a transformation is a linear
one, g(x) = ax + b, and, for Y = aX + b, one gets
$$f_Y(y) = \frac{1}{|a|}\, f_X\Big(\frac{y-b}{a}\Big).$$
One important case is converting a random variable to a standard one. If m = E(X) and Var(X) = σ², the linear transformation Y = (X − m)/σ turns X into a variable with expectation E(Y) = 0 and variance Var(Y) = 1. Its density function is
$$f_Y(y) = \sigma f_X(\sigma y + m).$$
The above Theorem can be extended to differentiable functions g which are not necessarily
strictly monotone. The case Y = X 2 is an important example which illustrates the general
situation.
Proposition 9.4. Let X be a continuous random variable with density f_X. The density of Y = X² is, for y > 0,
$$f_Y(y) = \frac{1}{2\sqrt{y}}\,\big(f_X(\sqrt{y}) + f_X(-\sqrt{y})\big).$$
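For instance, if X ∼ U(−1, 1) then f_X = 1/2 on (−1, 1) and the proposition gives f_Y(y) = 1/(2√y) on (0, 1), hence F_Y(y) = √y. A minimal simulation sketch in Python confirms this:

import random

random.seed(4)
N = 200_000
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(N)]   # samples of Y = X^2, X ~ U(-1, 1)

for y in (0.04, 0.25, 0.64):
    empirical = sum(v <= y for v in ys) / N
    print(y, round(empirical, 3), round(y ** 0.5, 3))     # empirical F_Y(y) vs sqrt(y)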
As in the discrete case, the Markov and Chebyshev inequalities are valid for continuous
random variables.
Theorem 10.1 (Markov and Chebyshev inequalities). Let X be a continuous random variable taking only nonnegative values, with finite expectation. Then, for each a ∈ R+,
$$\Pr(X \ge a) \le \frac{E(X)}{a} \qquad \text{(Markov inequality)}.$$
Let X be a continuous random variable with first and second moments. Then, for each a ∈ R+,
$$\Pr(|X - E(X)| \ge a) \le \frac{\operatorname{Var}(X)}{a^2} \qquad \text{(Chebyshev inequality)}.$$
Most programming languages and software packages have a primitive to obtain a random
number with the uniform distribution in the interval [0, 1]. It is invoked as random() in
Python, or rand() in C++. Actually these numbers are produced by computational means
which are called Pseudo Random Number Generators. They produce sequences of numbers
which are not random but do have statistics close to what a truly random number would
have. Among the most common devices are the Linear Congruential Generators, which are based on a congruence recurrence of the form
$$x_{n+1} = (a x_n + b) \pmod{m},$$
for suitably chosen a, b and m. The sequence starts at an initial value, called the seed, which is often taken from the computer clock.
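A minimal linear congruential generator in Python (the constants a = 1103515245, b = 12345, m = 2³¹ are those of a classical C library generator and are used here only as an illustration, not as a recommendation):

def lcg(seed, a=1103515245, b=12345, m=2**31):
    """Linear congruential generator: x_{n+1} = (a*x_n + b) mod m, rescaled to [0, 1)."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x / m

g = lcg(seed=2024)
print([round(next(g), 4) for _ in range(5)])   # five pseudo-random numbers in [0, 1)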
The Mersenne Twister is based on an analogous linear recurrence over a finite field, with period the large Mersenne prime $2^{19937} - 1$, and it is the default random generator used by Python and R, among many other programming languages and mathematical software systems.
Once the uniform distribution is available, one can easily produce random samples from other distributions. The most common way is based on the following fact: if U ∼ U(0, 1) and F is a distribution function, then F^{-1}(U) is a random variable with distribution function F. Thus, if F_X is invertible (strictly monotone), one can obtain the distribution of X from a uniform random number simply by writing $X = F_X^{-1}(U)$.
Example 11.2. In order to sample the exponential distribution X ∼ Exp(λ), one can obtain a sample U of the uniform distribution in [0, 1] and then apply the function
$$X = -\frac{1}{\lambda}\,\ln(1 - U).$$
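A minimal sketch of this inverse-transform sampling in Python (λ = 2 is an arbitrary illustrative choice), together with a quick check of the resulting mean and distribution function:

import math
import random

random.seed(5)
lam = 2.0

def sample_exponential(lam):
    """Inverse transform: X = -ln(1 - U)/lam with U ~ U(0, 1)."""
    u = random.random()
    return -math.log(1.0 - u) / lam

xs = [sample_exponential(lam) for _ in range(100_000)]
print(round(sum(xs) / len(xs), 3))             # close to E(X) = 1/lam = 0.5

x0 = 0.7
empirical = sum(x <= x0 for x in xs) / len(xs)
print(round(empirical, 3), round(1 - math.exp(-lam * x0), 3))   # empirical vs F(x0) = 1 - e^{-lam x0}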
There are other specific functions, particularly to sample the Normal distribution, whose
distribution function is not expressible in simple analytic terms.
Discrete distributions can also be sampled with analogous methods. If X is a discrete random variable which takes integer values with distribution function F_X, then we sample X = k if F_X(k − 1) < U ≤ F_X(k).
Sampling in R according to a given distribution can be done directly with the sample function.
Simulation of random phenomena is a very common tool, and it can become an art full of subtleties and technical difficulties.