Lecture 7: Random Variables and Probability Distributions
Ye Tian
Department of Statistics, Columbia University
Calculus-based Introduction to Statistics (S1201)
July 14, 2022
Recap: independence and conditional independence
Independence:
◦ Two events A and B are independent if P(A|B) = P(A), denoted as
A ⊥⊥ B. They are dependent otherwise, denoted as A ⊥̸⊥ B.
◦ Events A1, . . . , An are (mutually) independent if for every
k = 2, 3, . . . , n and every subset of indices i1, i2, . . . , ik,
P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik) = P(Ai1)P(Ai2) · · · P(Aik).
Conditional independence:
◦ Two events A and B are independent conditioning on event C if
P(A ∩ B|C) = P(A|C)P(B|C), denoted as (A ⊥⊥ B)|C. They are
dependent otherwise, denoted as (A ⊥̸⊥ B)|C.
◦ Events A1, . . . , An are (mutually) independent conditioning on event
C if for every k = 2, 3, . . . , n and every subset of indices i1, i2, . . . , ik,
P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik|C) = P(Ai1|C)P(Ai2|C) · · · P(Aik|C).
Recap: probability calculation
P(·) is a mapping/function from the set of all events to numbers: it maps
an event A to a number P(A)
Three axioms:
◦ 0 ≤ P(A) ≤ 1 for any event A ⊆ Ω
◦ P(Ω) = 1, P(∅) = 0
◦ If A1, A2, A3, . . . , An is a collection of disjoint (mutually exclusive)
events, then
P(A1 ∪ A2 ∪ . . . ∪ An) = ∑_{i=1}^{n} P(Ai).
Theorem: In an experiment consisting of N outcomes with equal
probability, for any event A,
P(A) = N(A)/N,
where we use counting techniques to calculate N(A) and N.
Recap: probability calculation
General idea for calculating a probability:
(1) Translate the conditions and the event of interest into probability
language.
(2) Calculate:
⊲ For equally likely outcomes, consider P(A) = N(A)/N combined with
counting techniques
⊲ If a conditional probability is easier to calculate or is given, consider
the multiplication rule, the rule of total probability, and Bayes' Theorem
⊲ Check whether independence or conditional independence holds,
which may simplify the calculation
Today's goal
◦ Understand random variables and know how to use them in practice
◦ Understand the probability distribution, probability mass function (pmf),
probability density function (pdf), and cumulative distribution function
(cdf)
◦ Know the difference between discrete and continuous random variables
◦ See some examples of discrete and continuous random variables
Random Variables
Why do we need random variables
It seems that we have covered everything about probability...
◦ Frequently, we are interested in some numerical aspects of the
outcome
Example:
⊲ In a political poll, the number of people voting for Trump among 100
⊲ The number of heads when flipping a coin 10 times
◦ And sometimes we are interested in the probabilities of many related
events rather than a single one, and we want a systematic formula
covering every case instead of studying each event separately
Example:
⊲ The number of people voting for Trump among 100 (denoted as
X): P({the number = 20}) =?, P({the number = 50}) =?,
P({the number = 100}) =?
⊲ The number of heads when flipping a coin 10 times (denoted as
X): P({the number = 5}) =?, P({the number = 0}) =?,
P({the number = 10}) =?
Random variables
Definition: Given an experiment and the sample space Ω, a random
variable is a function mapping an outcome (ω ∈ Ω) to a real number, i.e.
X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).
Example 1: Toss a coin 3 times: the sample space is
Ω = {H, T} × {H, T} × {H, T}. Random variable X = the number of
heads.
For instance: X({HHH}) = 3, X({THH}) = X({HHT}) = 2,
X({TTT}) = 0
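To make the "a random variable is a function" idea concrete, here is a minimal Python sketch of Example 1 (not from the slides; the names omega and X are ours):

```python
from itertools import product

# Sample space for three coin tosses: all 8 sequences of H/T.
omega = list(product("HT", repeat=3))   # ('H','H','H'), ('H','H','T'), ...

# The random variable X maps each outcome to a real number:
# here, the number of heads in the sequence.
def X(outcome):
    return outcome.count("H")

print(X(("H", "H", "H")))   # 3
print(X(("T", "H", "H")))   # 2
print(X(("T", "T", "T")))   # 0
```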
Random variables
Definition: Given an experiment and the sample space Ω, a random
variable is a function mapping an outcome (ω ∈ Ω) to a real number,
i.e.
X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).
Example 2: Toss two dice: the sample space is
Ω = {1, 2, . . . , 6} × {1, 2, . . . , 6}. Random variable X = the sum of dice.
For instance: X({(1, 3)}) = 4, X({(4, 5)}) = 9, X({(6, 6)}) = 12
Random variables
Definition: Given an experiment and the sample space Ω, a random
variable is a function mapping an outcome (ω ∈ Ω) to a real number, i.e.
X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).
Example 3: Suppose that we select a location at random (defined by
latitude and longitude) and define X to be the temperature at that
location at the current time.
Discrete and continuous random variables
◦ When the possible values of a random variable are countable¹, the
random variable is discrete.
Examples: the number of heads/tails when flipping coins, the number
rolled on a die, etc.
◦ When both of the following apply, the random variable is continuous.
⊲ The range is uncountable (e.g.: an interval on the number line)
⊲ No possible value of the variable has positive probability, i.e.
P(X = c) = 0 for any number c.
Examples: the temperature at a random location
¹ Either constitute a finite set or else can be listed in an infinite sequence in which
there is a first element, a second element, and so on ("countably" infinite).
Exercise
Describe the set of possible values for the variable, and state whether the
variable is discrete.
(1) X = the number of unbroken eggs in a randomly chosen standard
egg carton
(2) Y = the number of students on a class list for a particular course
who are absent on the first day of classes
(3) U = the number of times a duffer has to swing at a golf ball before
hitting it
(4) X = the length of a randomly selected rattlesnake
(5) Z = the sales tax percentage for a randomly selected Amazon
purchase
(6) Y = the pH of a randomly chosen soil sample
(7) X = the tension (psi) at which a randomly selected tennis racket
has been strung
(8) X = the total number of times three tennis players must spin their
rackets to obtain something other than UUU or DDD (to determine
which two play next)
Random variables and random events
Compare their definitions:
◦ A random variable is a function mapping an outcome (ω ∈ Ω) to a
real number, i.e. X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).
◦ An event is a set (collection) of outcomes.
Furthermore, for any subset B of the number line², {ω : X(ω) ∈ B} is a
random event, which is a set (collection) of outcomes. And we can
calculate the corresponding probability.
Therefore, P(X ∈ B) = P({ω : X(ω) ∈ B}).
² Actually not "any" subset, but the current statement is enough and correct for this
course. You will learn more in a PhD-level probability course in the future. Currently, B
can be any union/intersection of intervals/points on the real line. E.g., B can be (0, 1),
[−2, +∞), (5, 5.5], {1}, {−1, 2.5}, (−3, −1) ∪ (9, 10], etc.
Random variables and random events
Let's recall our previous example:
Toss a fair coin 3 times: the sample space is
Ω = {H, T} × {H, T} × {H, T}. Random variable X = the number of
heads.
Therefore,
P(X = 3) = P({HHH}) = 1/8,
P(X = 2) = P({HTH, THH, HHT}) = 3/8,
P(X = 1) = P({HTT, THT, TTH}) = 3/8,
P(X = 0) = P({TTT}) = 1/8.
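Because the 8 outcomes are equally likely, these probabilities can be verified by brute-force enumeration; a minimal Python sketch (our own illustration, not from the slides):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

outcomes = list(product("HT", repeat=3))          # 8 equally likely outcomes
counts = Counter(seq.count("H") for seq in outcomes)

# Each outcome has probability 1/8, so P(X = k) = (# outcomes with k heads) / 8.
for k in sorted(counts):
    print(k, Fraction(counts[k], len(outcomes)))
# 0 1/8, 1 3/8, 2 3/8, 3 1/8
```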
Distribution of Random Variables
Distribution
Definition: The (probability) distribution of a random variable X
describes how the total probability of 1 is distributed among all possible
values of X. It tells us P(X ∈ B) = P({ω : X(ω) ∈ B}) for any subset B
of the number line³.
Definition: Cumulative distribution function (cdf) of a r.v. X is
defined as
F (x) = P(X ≤ x)
for any number x (including −∞ and +∞).
Proposition: The cdf fully describes the distribution of a random variable.
Why? (Not required to know): Because any subset B of the real line can be
expressed as a union/intersection/difference of intervals like (−∞, x].
E.g.: (5, 10] = (−∞, 10] \ (−∞, 5]
⇒ Then P(X ∈ (5, 10]) = P(X ≤ 10) − P(X ≤ 5) = F(10) − F(5).
³ Currently, B can be any union/intersection of intervals/points on the real line. E.g.,
B can be (0, 1), [−2, +∞), (5, 5.5], {1}, {−1, 2.5}, (−3, −1) ∪ (9, 10], etc.
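As a quick illustration of computing an interval probability from a cdf, here is a hedged Python sketch; the cdf F(x) = 1 − e^{−x} of an Exponential(1) variable is our own choice of example, not from the slides:

```python
import math

# cdf of an Exponential(1) random variable (illustrative choice)
def F(x):
    return 1 - math.exp(-x) if x > 0 else 0.0

# P(X in (5, 10]) = F(10) - F(5)
print(F(10) - F(5))
```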
Distribution
Definition: Cumulative distribution function (cdf) of a r.v. X is
defined as
F (x) = P(X ≤ x)
for any number x (including −∞ and +∞).
Remark:
◦ F(+∞) = 1
because {ω : X(ω) ∈ (−∞, +∞]} ⊇ Ω ⇒ F(+∞) = P(X ∈ (−∞, +∞]) ≥ P(Ω) = 1
◦ F(−∞) = 0
because (−∞, −∞) behaves like an empty set⁴
⇒ F(−∞) = P(X ∈ (−∞, −∞)) = 0
◦ Therefore 0 ≤ F(x) ≤ 1 for any number x
◦ F(x) is an increasing function, i.e. for x1 ≤ x2, F(x1) ≤ F(x2) (why?)
◦ (Not required to know) F(x) is right-continuous, i.e.
lim_{z→x⁺} F(z) = F(x)
⁴ Not accurate, but enough for this course.
Distribution
Definition: For a discrete r.v. X, its distribution can also be described by
the probability mass function (pmf)
p(x) = P(X = x) = P({ω : X(ω) = x})
for any number x (including −∞ and +∞).
◦ For a discrete r.v., suppose S = {z1, z2, z3, . . .} contains all possible
values of X; then:
⊲ p(x) > 0 only when x ∈ S, and p(x) = 0 elsewhere
⊲ F(x) = ∑_{i: zi ≤ x} p(zi)
⊲ p(zi) = F(x2) − F(x1) for any x1 and x2 with (x1, x2] ∩ S = {zi}
◦ Two conditions for a valid pmf:
(1) p(x) ≥ 0 for any x;
(2) ∑_{x∈S} p(x) = 1.
◦ It is meaningless to talk about the pmf of a continuous r.v., because
P(X = x) = 0 for any number x if X is continuous! (We will see this from
the viewpoint of integrals.)
cdf and pmf of discrete random variables
cdf: F (x) = P(X ≤ x)
pmf: p(x) = P(X = x)
For a discrete r.v., suppose S = {z1, z2, z3, . . .} contains all possible values
of X; then:
◦ p(x) > 0 only when x ∈ S, and p(x) = 0 elsewhere
◦ F(x) = ∑_{i: zi ≤ x} p(zi)
◦ p(zi) = F(x2) − F(x1) for any x1 and x2 with (x1, x2] ∩ S = {zi}
◦ P(x1 < X ≤ x2) = F(x2) − F(x1) = ∑_{i: x1 < zi ≤ x2} p(zi)
Example: The pmf of a discrete r.v. X is given in a table on the slide
(it includes p(0) = 0.05, p(1) = 0.1, p(2) = 0.15, p(3) = 0.25). Then
F(2) = P(X ≤ 2) = p(0) + p(1) + p(2) = 0.05 + 0.1 + 0.15 = 0.3,
F(1) = P(X ≤ 1) = p(0) + p(1) = 0.05 + 0.1 = 0.15,
P(1 ≤ X ≤ 3) = p(1) + p(2) + p(3) = 0.1 + 0.15 + 0.25 = 0.5.
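These calculations are easy to script; a small Python sketch using only the pmf values quoted above (the rest of the slide's table is not shown, so this dictionary is deliberately partial):

```python
# Partial pmf from the slide's table, stored as {x: p(x)}.
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25}

def F(x):
    # cdf of a discrete r.v.: sum p(z) over possible values z <= x
    return sum(p for z, p in pmf.items() if z <= x)

print(F(2))                            # 0.30
print(F(1))                            # 0.15
print(sum(pmf[k] for k in (1, 2, 3)))  # P(1 <= X <= 3) = 0.50
```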
Example of discrete distribution: binomial distribution
Toss an unfair coin 10 times. Suppose each time the probabilities of heads
and tails are p and 1 − p, respectively. Random variable X = the number
of heads.
◦ P(X = 0) = P({TTTTTTTTTT}) = (1 − p)^10
◦ P(X = 1) = P({9 T's and 1 H}) = C(10, 1) p(1 − p)^9, where
C(n, x) = n!/(x!(n − x)!) is the binomial coefficient
◦ ...
◦ In general, the pmf is
p(x) = P(X = x) = P({(10 − x) T's and x H's}) = C(10, x) p^x (1 − p)^{10−x},
x = 0, . . . , 10.
◦ The cdf is F(x) = ∑_{k: 0≤k≤x} p(k) = ∑_{k: 0≤k≤x} C(10, k) p^k (1 − p)^{10−k}
We call such a variable X a binomial random variable and its
distribution the binomial distribution.
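A short Python sketch of this pmf and cdf (the value p = 0.3 below is an arbitrary choice for illustration):

```python
from math import comb

def binom_pmf(x, n=10, p=0.3):
    # P(X = x) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n=10, p=0.3):
    # F(x) = sum of the pmf over k = 0, ..., floor(x)
    return sum(binom_pmf(k, n, p) for k in range(int(x) + 1))

print(binom_pmf(0))    # (1 - p)^10
print(binom_cdf(10))   # the pmf sums to 1
```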
cdf and pdf of continuous random variables
Definition: For a continuous r.v. X, its distribution can also be described
by a non-negative probability density function (pdf) f(x) which
satisfies
P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
for any two numbers a and b with a ≤ b (including −∞ and +∞).
◦ By letting a = −∞: cdf F(b) = ∫_{−∞}^b f(x) dx
◦ By the Fundamental Theorem of Calculus (Newton–Leibniz theorem): the
cdf F of a continuous variable is differentiable and F′(x) = f(x)
◦ For a continuous r.v., a single point doesn't matter, i.e.
P(a < X ≤ b) = P(a ≤ X ≤ b) (why?)
◦ An appropriate pdf should satisfy two conditions:
(1) f(x) ≥ 0 for any number x
(2) ∫_{−∞}^{+∞} f(x) dx = 1
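A numerical sanity check of these two conditions, sketched in Python with SciPy (assumed available); the pdf f(x) = 2e^{−2x} for x ≥ 0 is borrowed from the exercise later in this lecture:

```python
import math
from scipy.integrate import quad

# pdf of an Exponential(2) variable: f(x) = 2 e^{-2x} for x >= 0
f = lambda x: 2 * math.exp(-2 * x)

total, _ = quad(f, 0, math.inf)    # integral over the support: should be 1
prob, _  = quad(f, 0.5, 1.0)       # P(0.5 < X <= 1) = F(1) - F(0.5)
print(total, prob)
```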
cdf and pdf of continuous random variables
Definition: For a continuous r.v. X, its distribution can also be described
by a probability density function (pdf) f(x) which satisfies
P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
for any two numbers a and b with a ≤ b (including −∞ and +∞).
[Figure: pdf (left, with shaded area) and cdf (right)]
The probability of a r.v. falling into a region is the area of the shaded
region under the pdf f(x), which connects to the physical meaning of the
integral!
cdf and pdf of continuous random variables
Definition: The probability density function (pdf) f(x) satisfies
P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
for any two numbers a and b with a ≤ b (including −∞ and +∞).
◦ By letting a = −∞: cdf F(x) = ∫_{−∞}^x f(t) dt
◦ F′(x) = f(x)
[Figure: pdf and cdf]
Exercise: continuous variables
Given the pdf, write down the corresponding cdf:
(1) f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere
(2) f(x) = (3/2)x² for −1 ≤ x ≤ 1, and f(x) = 0 elsewhere
(3) f(x) = 2e^{−2x} for x ≥ 0, and f(x) = 0 elsewhere
Given the cdf, write down the corresponding pdf:
(1) F (x) = x, 0 ≤ x ≤ 1
(2) F (x) = 1 − e−x , x ≥ 0
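One way to check your answers is to integrate each pdf from the lower end of its support up to x; a sketch with SymPy (assumed available):

```python
import sympy as sp

x, t = sp.symbols("x t")

# F(x) = integral of the pdf from the lower end of the support to x
print(sp.integrate(1, (t, 0, x)))                          # (1): x on [0, 1]
print(sp.integrate(sp.Rational(3, 2) * t**2, (t, -1, x)))  # (2): x**3/2 + 1/2
print(sp.integrate(2 * sp.exp(-2 * t), (t, 0, x)))         # (3): 1 - exp(-2*x)
```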
Example of continuous distribution: uniform distribution
We say X follows a uniform distribution on [A, B] if:
◦ Its pdf is
f(x) = 1/(B − A) for A ≤ x ≤ B, and f(x) = 0 elsewhere
◦ Its cdf is
F(x) = 0 for x ≤ A; F(x) = (x − A)/(B − A) for A < x ≤ B; F(x) = 1 for x > B
[Figure: pdf of the uniform distribution]
Example: uniform distribution
We say X follows a uniform distribution on [A, B] if:
◦ Its pdf is
f(x) = 1/(B − A) for A ≤ x ≤ B, and f(x) = 0 elsewhere
◦ Its cdf is
F(x) = 0 for x ≤ A; F(x) = (x − A)/(B − A) for A < x ≤ B; F(x) = 1 for x > B
[Figure: cdf of the uniform distribution]
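These formulas transcribe directly into Python; a minimal sketch (the endpoints A = 0 and B = 1 are arbitrary illustrative defaults):

```python
def uniform_pdf(x, A=0.0, B=1.0):
    # f(x) = 1/(B - A) on [A, B], 0 elsewhere
    return 1.0 / (B - A) if A <= x <= B else 0.0

def uniform_cdf(x, A=0.0, B=1.0):
    # F(x) = 0 left of A, (x - A)/(B - A) on (A, B], 1 right of B
    if x <= A:
        return 0.0
    if x >= B:
        return 1.0
    return (x - A) / (B - A)

print(uniform_pdf(0.3), uniform_cdf(0.3))   # 1.0 0.3 on [0, 1]
```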
Comparison: discrete and continuous variables
Underlying intuition:
◦ The probability "mass" of discrete variables concentrates at a few points
◦ The probability "mass" of continuous variables spreads out in a dense
region
Characterization of their distributions:
◦ cdf is available for both of them: F (x) = P(X ≤ x)
◦ pmf only works for discrete variables: p(x) = P(X = x)
◦ pdf only works for continuous variables: f(x) = F′(x) and
F(x) = ∫_{−∞}^x f(t) dt
Comparison: discrete and continuous variables
Discrete distribution:
[Figure: pmf and cdf of a discrete distribution]
Continuous distribution:
[Figure: pdf and cdf of a continuous distribution]
Relative frequency bar chart and pmf
Suppose a r.v. X has the distribution given by this pmf. I sampled X1, X2, . . . ,
X1000 independently from this distribution.
[Figure: pmf and relative frequency bar chart]
◦ The empirical relative frequency is an approximation of the pmf.
◦ If we sampled infinitely many points, the relative frequencies would equal
the pmf.
◦ We will discuss this more next week.
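A sketch of this experiment in Python with NumPy (assumed available); since the slide's pmf lives in a figure, the pmf below is an invented stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

values = np.array([0, 1, 2, 3])          # possible values (assumed)
pmf    = np.array([0.2, 0.4, 0.3, 0.1])  # their probabilities (assumed)

# Draw 1000 independent samples and compare relative frequencies to the pmf.
sample = rng.choice(values, size=1000, p=pmf)
freq = np.bincount(sample, minlength=len(values)) / len(sample)
print(freq)   # close to [0.2, 0.4, 0.3, 0.1]
```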
Density histogram and pdf
Suppose a r.v. X has the distribution given by this pdf. I sampled X1, X2, . . . ,
X1000 independently from this distribution.
[Figure: pdf and density histogram]
◦ The empirical density histogram is an approximation of the pdf.
◦ If we sampled infinitely many points and the bin width were infinitely
small, the density histogram would coincide with the pdf curve.
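A parallel sketch for the continuous case (NumPy assumed; the Exponential(2) distribution, with pdf f(x) = 2e^{−2x}, is our stand-in for the slide's pdf):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 independent draws from Exponential(rate=2); scale = 1/rate.
sample = rng.exponential(scale=0.5, size=1000)

# Density histogram heights vs. the true pdf at the bin midpoints.
heights, edges = np.histogram(sample, bins=20, density=True)
mids = (edges[:-1] + edges[1:]) / 2
print(np.round(heights[:5], 2))
print(np.round(2 * np.exp(-2 * mids[:5]), 2))   # the histogram tracks the pdf
```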
Reading list (optional)
◦ "Probability and Statistics for Engineering and the Sciences" (9th
edition):
⊲ Chapter 3.1, 3.2, 4.1 and 4.2 (skip the part of expectations)
◦ "OpenIntro statistics" (4th edition, free online, download [here]):
⊲ Chapter 3.4 and 3.5 (It's OK if you find expectation and variance
difficult to understand; we will cover them next week.)
Many thanks to
Yang Feng
Joyce Robbins
Chengliang Tang
Owen Ward
Wenda Zhou
And all my teachers in the past 25 years