
Lecture 7: Random Variables and Probability Distributions

Ye Tian

Department of Statistics, Columbia University


Calculus-based Introduction to Statistics (S1201)

July 14, 2022

Recap: independence and conditional independence
Independence:
◦ Two events A and B are independent if P(A|B) = P(A), denoted as A ⊥⊥ B. They are dependent otherwise, denoted as A ∕⊥⊥ B.
◦ Events A1 , . . . , An are (mutually) independent if for every
k = 2, 3, . . . , n and every subset of indices i1 , i2 , . . . , ik ,

P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik ) = P(Ai1 )P(Ai2 ) · · · P(Aik ).

Conditional independence:
◦ Two events A and B are independent conditioning on event C if P(A ∩ B|C) = P(A|C)P(B|C), denoted as (A ⊥⊥ B)|C. They are dependent otherwise, denoted as (A ∕⊥⊥ B)|C.
◦ Events A1 , . . . , An are (mutually) independent conditioning on event
C if for every k = 2, 3, . . . , n and every subset of indices i1 , i2 , . . . , ik ,

P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik |C) = P(Ai1 |C)P(Ai2 |C) · · · P(Aik |C).


Recap: probability calculation
P(·) : A ∈ {all events} → a number P(A) (a mapping/function which maps an event to a number)

Three axioms:
◦ 0 ≤ P(A) ≤ 1 for any event A ⊆ Ω
◦ P(Ω) = 1, P(∅) = 0
◦ If A1, A2, A3, …, An is a collection of disjoint (mutually exclusive) events, then

P(A1 ∪ A2 ∪ . . . ∪ An) = ∑_{i=1}^{n} P(Ai).

Theorem: In an experiment consisting of N outcomes with equal probability, for any event A,

P(A) = N(A)/N,

where we use counting techniques to calculate N(A) and N.
Recap: probability calculation

General idea to calculate probability:


(1) Translate the conditions and the event of interest into probability language.
(2) Calculate.
⊲ For equally likely outcomes, consider P(A) = N(A)/N combined with counting techniques
⊲ If conditional probability is easier to calculate or is given, consider the multiplication rule, the rule of total probability, and Bayes' Theorem
⊲ Check whether there is independence/conditional independence, which may help with the calculation

Today's goal

◦ Understand random variables and know how to use them in practice


◦ Understand the probability distribution, probability mass function (pmf),
probability density function (pdf), and cumulative distribution function
(cdf)
◦ Know the difference between discrete and continuous random variables
◦ See some examples of discrete and continuous random variables

Random Variables

Why do we need random variables
It seems that we have covered everything about probability...
◦ Frequently, we are interested in some numerical aspects of the
outcome
Example:
⊲ In a political poll, the number of people voting for Trump among 100 respondents
⊲ The number of heads when flipping a coin 10 times

◦ And sometimes we are interested in the probabilities of many events instead of a single one, where we want a systematic formula covering every case instead of studying each event separately
Example:
⊲ The number of people voting for Trump among 100 (denoted as
X): P({the number = 20}) =?, P({the number = 50}) =?,
P({the number = 100}) =?
⊲ The number of heads when flipping a coin 10 times (denoted as
X): P({the number = 5}) =?, P({the number = 0}) =?,
P({the number = 10}) =?
Random variables

Definition: Given an experiment and the sample space Ω, a random variable is a function mapping an outcome (ω ∈ Ω) to a real number, i.e.

X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).

Example 1: Toss a coin 3 times: the sample space is


Ω = {H, T} × {H, T} × {H, T}. Random variable X = the number of
heads.

For instance: X({HHH}) = 3, X({THH}) = X({HHT}) = 2,


X({TTT}) = 0
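As an illustration (my addition, not part of the original slides), here is a minimal Python sketch of Example 1 that treats X literally as a function on Ω:

```python
from itertools import product

# Build the sample space of 3 coin tosses: Ω = {H,T} × {H,T} × {H,T}.
omega = list(product("HT", repeat=3))

def X(outcome):
    """The random variable X maps an outcome ω to a real number:
    here, the number of heads in ω."""
    return outcome.count("H")

print(X(("H", "H", "H")))  # 3
print(X(("T", "H", "H")))  # 2
print(X(("T", "T", "T")))  # 0
```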

Random variables
Definition: Given an experiment and the sample space Ω, a random variable is a function mapping an outcome (ω ∈ Ω) to a real number, i.e.

X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).
Example 2: Toss two dice: the sample space is
Ω = {1, 2, . . . , 6} × {1, 2, . . . , 6}. Random variable X = the sum of dice.

For instance: X({(1, 3)}) = 4, X({(4, 5)}) = 9, X({(6, 6)}) = 12
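A matching sketch for Example 2 (again my own illustration, not from the slides):

```python
from itertools import product

# Sample space of two dice: Ω = {1,…,6} × {1,…,6}.
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """X maps the pair (i, j) to the sum of the two dice."""
    i, j = outcome
    return i + j

print(X((1, 3)), X((4, 5)), X((6, 6)))  # 4 9 12
```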


Random variables
Definition: Given an experiment and the sample space Ω, a random variable is a function mapping an outcome (ω ∈ Ω) to a real number, i.e.

X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).

Example 3: Suppose that we select a location at random (defined by


latitude and longitude) and define X to be the temperature at that
location at the current time.

Discrete and continuous random variables

◦ When the possible values of a random variable are countable¹, the random variable is discrete.
Examples: the number of heads/tails in coin flips, the number shown on a die, etc.
◦ When both of the following apply, the random variable is continuous.
⊲ The range is uncountable (e.g.: an interval on the number line)
⊲ No possible value of the variable has positive probability, i.e.
P(X = c) = 0 for any number c.
Examples: the temperature at a random location

¹ Either constitute a finite set or else can be listed in an infinite sequence in which there is a first element, a second element, and so on ("countably" infinite).
Exercise
Describe the set of possible values for the variable, and state whether the
variable is discrete.
(1) X = the number of unbroken eggs in a randomly chosen standard
egg carton
(2) Y = the number of students on a class list for a particular course
who are absent on the first day of classes
(3) U = the number of times a duffer has to swing at a golf ball before
hitting it
(4) X = the length of a randomly selected rattlesnake
(5) Z = the sales tax percentage for a randomly selected Amazon
purchase
(6) Y = the pH of a randomly chosen soil sample
(7) X = the tension (psi) at which a randomly selected tennis racket
has been strung
(8) X = the total number of times three tennis players must spin their rackets to obtain something other than UUU or DDD (to determine which two play next)
Random variables and random events

Compare their definitions:


◦ A random variable is a function mapping an outcome (ω ∈ Ω) to a real number, i.e. X : ω ∈ Ω ↦ X(ω) ∈ (−∞, +∞).
◦ An event is a set (collection) of outcomes.

Furthermore, for any subset B of the number line², {ω : X(ω) ∈ B} is a random event, which is a set (collection) of outcomes, and we can calculate the corresponding probability.

Therefore, P(X ∈ B) = P({ω : X(ω) ∈ B}).

² Actually not "any" subset, but the current statement is enough and correct for this course. You will learn more in a PhD-level probability course in the future. Currently, B can be any union/intersection of intervals/points on the real line. E.g., B can be (0, 1), [−2, +∞), (5, 5.5], {1}, {−1, 2.5}, (−3, −1) ∪ (9, 10] etc.
Random variables and random events
Let's recall our previous example:

Toss a fair coin 3 times: the sample space is


Ω = {H, T} × {H, T} × {H, T}. Random variable X = the number of
heads.

Therefore,
P(X = 3) = P({HHH}) = 1/8,
P(X = 2) = P({HTH, THH, HHT}) = 3/8,
P(X = 1) = P({HTT, THT, TTH}) = 3/8,
P(X = 0) = P({TTT}) = 1/8.
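Since the coin is fair, each of the 8 outcomes has probability 1/8, so these values can be checked by counting. A small sketch (my addition):

```python
from itertools import product
from fractions import Fraction

# For a fair coin, P(X = k) = (# outcomes with k heads) / |Ω|.
omega = list(product("HT", repeat=3))
for k in range(4):
    n_k = sum(1 for w in omega if w.count("H") == k)
    print(f"P(X = {k}) = {Fraction(n_k, len(omega))}")
# Prints 1/8, 3/8, 3/8, 1/8 for k = 0, 1, 2, 3.
```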
Distribution of Random Variables

Distribution
Definition: The (probability) distribution of a random variable X
describes how the total probability of 1 is distributed among all possible
values of X. It tells us P(X ∈ B) = P({ω : X(ω) ∈ B}) for any subset B of the number line³.
Definition: Cumulative distribution function (cdf) of a r.v. X is
defined as
F (x) = P(X ≤ x)
for any number x (including −∞ and +∞).
Proposition: The cdf can describe the distribution of random variables.
Why? (No need to know): Because any subset B of the real line can be expressed as unions/intersections/differences of intervals like (−∞, x].
E.g.: (5, 10] = (−∞, 10] \ (−∞, 5] (why?)
⇒ Then P(X ∈ (5, 10]) = P(X ≤ 10) − P(X ≤ 5) = F(10) − F(5).

³ Currently, B can be any union/intersection of intervals/points on the real line. E.g., B can be (0, 1), [−2, +∞), (5, 5.5], {1}, {−1, 2.5}, (−3, −1) ∪ (9, 10] etc.
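To see the identity in action numerically, here is a sketch (my own; the N(7, 2²) distribution is an arbitrary stand-in for X, since the slide fixes no distribution):

```python
from scipy import stats
from scipy.integrate import quad

# P(X ∈ (5, 10]) computed two ways: via the cdf difference F(10) − F(5),
# and via integrating the density over (5, 10).
X = stats.norm(loc=7, scale=2)
print(X.cdf(10) - X.cdf(5))   # F(10) − F(5)
print(quad(X.pdf, 5, 10)[0])  # same probability, ≈ 0.7745
```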
Distribution
Definition: Cumulative distribution function (cdf) of a r.v. X is
defined as
F (x) = P(X ≤ x)
for any number x (including −∞ and +∞).

Remark:
◦ F(+∞) = 1
because {X ∈ (−∞, +∞)} = Ω ⇒ F(+∞) = P(Ω) = 1
◦ F(−∞) = 0
because (−∞, −∞) behaves like an empty set⁴ ⇒ F(−∞) = P(∅) = 0


◦ Therefore 0 ≤ F(x) ≤ 1 for any number x
◦ F(x) is a non-decreasing function, i.e. for x1 ≤ x2, F(x1) ≤ F(x2) (why?)
◦ (Not required to know) F(x) is right-continuous, i.e. lim_{z→x⁺} F(z) = F(x)
⁴ Not accurate, but enough for this course.
Distribution
Definition: For a discrete r.v. X, its distribution can also be described by its probability mass function (pmf)
p(x) = P(X = x) = P({ω : X(ω) = x})
for any number x (including −∞ and +∞).

◦ For a discrete r.v., suppose S = {z1, z2, z3, . . .} contains all possible values of X; then:
⊲ p(x) > 0 only when x ∈ S, and p(x) = 0 elsewhere
⊲ F(x) = ∑_{i: zi ≤ x} p(zi)
⊲ p(zi) = F(x2) − F(x1) for any x1 and x2 with (x1, x2] ∩ S = {zi}

◦ Two conditions for a valid pmf:
(1) p(x) ≥ 0 for any x;
(2) ∑_{x∈S} p(x) = 1.
◦ It makes no sense to talk about the pmf of a continuous r.v., because P(X = x) = 0 for any number x if X is continuous! (We will see this from the view of the integral.)
cdf and pmf of discrete random variables
cdf: F (x) = P(X ≤ x)
pmf: p(x) = P(X = x)
For a discrete r.v., suppose S = {z1, z2, z3, . . .} contains all possible values of X; then:
◦ p(x) > 0 only when x ∈ S, and p(x) = 0 elsewhere
◦ F(x) = ∑_{i: zi ≤ x} p(zi)
◦ p(zi) = F(x2) − F(x1) for any x1 and x2 with (x1, x2] ∩ S = {zi}
◦ P(x1 < X ≤ x2) = F(x2) − F(x1) = ∑_{i: x1 < zi ≤ x2} p(zi)
Example: The pmf of a discrete r.v. is given by a table with p(0) = 0.05, p(1) = 0.1, p(2) = 0.15, p(3) = 0.25, … Then
F(2) = P(X ≤ 2) = p(0) + p(1) + p(2) = 0.05 + 0.1 + 0.15 = 0.3,
F(1) = P(X ≤ 1) = p(0) + p(1) = 0.05 + 0.1 = 0.15,
P(1 ≤ X ≤ 3) = p(1) + p(2) + p(3) = 0.1 + 0.15 + 0.25 = 0.5.
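A short sketch of this computation (my addition; only p(0) through p(3) are recoverable from the slide, so the pmf below is truncated):

```python
# Truncated pmf from the example table.
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25}

def F(x):
    """cdf on the listed support: F(x) = sum of p(z) over z <= x."""
    return sum(p for z, p in pmf.items() if z <= x)

print(round(F(2), 2))  # 0.3
print(round(F(1), 2))  # 0.15
print(round(sum(pmf[z] for z in pmf if 1 <= z <= 3), 2))  # 0.5
```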
Example of discrete distribution: binomial distribution

Toss an unfair coin 10 times. Suppose each time the probabilities of heads and tails are p and 1 − p, respectively. Random variable X = the number of heads.
◦ P(X = 0) = P({TTTTTTTTTT}) = (1 − p)^10
◦ P(X = 1) = P({9 T's and 1 H}) = C(10, 1) p(1 − p)^9
◦ ...
◦ In general, the pmf is
p(x) = P(X = x) = P({(10 − x) T's and x H's}) = C(10, x) p^x (1 − p)^{10−x}, x = 0, . . . , 10,
where C(10, x) = 10!/(x!(10 − x)!) is the binomial coefficient.
◦ The cdf is F(x) = ∑_{k: 0 ≤ k ≤ x} p(k) = ∑_{k: 0 ≤ k ≤ x} C(10, k) p^k (1 − p)^{10−k}

We call such a variable X a binomial random variable and its distribution the binomial distribution.
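A minimal sketch of the binomial pmf and cdf (my own; p = 0.6 is an arbitrary illustration value):

```python
from math import comb

n, p = 10, 0.6

def pmf(x):
    # C(10, x) p^x (1 - p)^(10 - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(x):
    return sum(pmf(k) for k in range(0, int(x) + 1))

print(pmf(0) == (1 - p)**10)              # True
print(sum(pmf(k) for k in range(n + 1)))  # total probability (≈ 1)
```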

cdf and pdf of continuous random variables
Definition: For a continuous r.v. X, its distribution can also be described by a non-negative probability density function (pdf) f(x) which satisfies

P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x)dx

for any two numbers a and b with a ≤ b (including −∞ and +∞).
◦ By letting a = −∞: cdf F(b) = ∫_{−∞}^b f(x)dx
◦ By the Fundamental Theorem of Calculus (Newton–Leibniz Theorem): the cdf F of a continuous variable is differentiable and F′(x) = f(x)
◦ For a continuous r.v., a single point doesn't matter, i.e. P(a < X ≤ b) = P(a ≤ X ≤ b) (why?)
◦ An appropriate pdf should satisfy two conditions:
(1) f(x) ≥ 0 for any number x
(2) ∫_{−∞}^{+∞} f(x)dx = 1
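A quick numerical check of these conditions (my addition), using f(x) = 2e^{−2x} on x ≥ 0, which reappears as exercise (3) below:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: 2 * np.exp(-2 * x)       # pdf: 2e^{-2x} for x >= 0

total, _ = quad(f, 0, np.inf)          # condition (2): integrates to 1
prob, _ = quad(f, 0.5, 2.0)            # P(0.5 < X <= 2) via the integral
F = lambda x: 1 - np.exp(-2 * x)       # its cdf, F(x) = ∫_0^x f(t) dt
print(total)                           # ≈ 1.0
print(prob, F(2.0) - F(0.5))           # the two computations agree
```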

cdf and pdf of continuous random variables

Definition: For a continuous r.v. X, its distribution can also be described by a probability density function (pdf) f(x) which satisfies

P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x)dx

for any two numbers a and b with a ≤ b (including −∞ and +∞).

[Figures: pdf (left); cdf (right)]

The probability of a r.v. falling into a region is the area of the shaded region under the pdf f(x), which connects to the physical meaning of the integral!

cdf and pdf of continuous random variables
Definition: A probability density function (pdf) f(x) satisfies

P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x)dx

for any two numbers a and b with a ≤ b (including −∞ and +∞).
◦ By letting a = −∞: cdf F(x) = ∫_{−∞}^x f(t)dt
◦ F′(x) = f(x)

[Figures: pdf (left); cdf (right)]


Exercise: continuous variables

Given the pdf, write down the corresponding cdf:


(1) f(x) = 1, 0 ≤ x ≤ 1, and f(x) = 0 elsewhere
(2) f(x) = (3/2)x², −1 ≤ x ≤ 1, and f(x) = 0 elsewhere
(3) f(x) = 2e^{−2x}, x ≥ 0, and f(x) = 0 elsewhere

Given the cdf, write down the corresponding pdf:


(1) F(x) = x, 0 ≤ x ≤ 1
(2) F(x) = 1 − e^{−x}, x ≥ 0
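One way to check your answers (my addition, not part of the lecture) is symbolic calculus with sympy — differentiate a cdf to recover its pdf, or integrate a pdf to recover its cdf:

```python
import sympy as sp

x, t = sp.symbols("x t", positive=True)

F = 1 - sp.exp(-x)                 # cdf from part (2) above
print(sp.diff(F, x))               # exp(-x): the pdf on x >= 0

f = 2 * sp.exp(-2 * t)             # pdf from exercise (3) above
print(sp.integrate(f, (t, 0, x)))  # 1 - exp(-2*x): the cdf on x >= 0
```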

Example of continuous distribution: uniform distribution
We say X follows a uniform distribution on [A, B] if:
◦ Its pdf is f(x) = 1/(B − A) for A ≤ x ≤ B, and f(x) = 0 elsewhere
◦ Its cdf is
F(x) = 0, x ≤ A
F(x) = (x − A)/(B − A), A < x ≤ B
F(x) = 1, x > B

[Figure: pdf of the uniform distribution]
Example: uniform distribution

We say X follows a uniform distribution on [A, B] if:
◦ Its pdf is f(x) = 1/(B − A) for A ≤ x ≤ B, and f(x) = 0 elsewhere
◦ Its cdf is
F(x) = 0, x ≤ A
F(x) = (x − A)/(B − A), A < x ≤ B
F(x) = 1, x > B

[Figure: cdf of the uniform distribution]
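The formulas can be compared against a library implementation; a sketch (my own, with A = 2 and B = 5 as arbitrary values):

```python
import numpy as np
from scipy import stats

A, B = 2.0, 5.0
U = stats.uniform(loc=A, scale=B - A)  # scipy parameterizes by loc and width

xs = np.array([1.0, 2.5, 3.5, 4.9, 6.0])
pdf_formula = np.where((xs >= A) & (xs <= B), 1 / (B - A), 0.0)
cdf_formula = np.clip((xs - A) / (B - A), 0.0, 1.0)

print(np.allclose(pdf_formula, U.pdf(xs)))  # True
print(np.allclose(cdf_formula, U.cdf(xs)))  # True
```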
Comparison: discrete and continuous variables

Underlying intuition:
◦ The probability "mass" of discrete variables concentrates at a few points
◦ The probability "mass" of continuous variables spreads out in a dense
region

Characterization of their distributions:


◦ cdf is available for both of them: F (x) = P(X ≤ x)
◦ pmf only works for discrete variables: p(x) = P(X = x)
◦ pdf only works for continuous variables: f(x) = F′(x) and F(x) = ∫_{−∞}^x f(t)dt

Comparison: discrete and continuous variables
Discrete distribution:

[Figures: pmf (left); cdf (right)]
Continuous distribution:

[Figures: pdf (left); cdf (right)]


Relative frequency bar chart and pmf
Suppose a r.v. X has the distribution given by this pmf. I sampled X1, X2, …, X1000 independently from this distribution.

[Figures: pmf (left); relative frequency bar chart (right)]


◦ The empirical relative frequency is an approximation of the pmf.
◦ If we sample infinitely many points, the relative frequency will equal the pmf.
◦ We will discuss this more next week.
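A simulation sketch of this point (my addition; the pmf values are arbitrary, since the slide's chart is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
support = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.4, 0.3, 0.2])   # an arbitrary pmf

# Draw 1000 points and compare relative frequencies with the true pmf.
sample = rng.choice(support, size=1000, p=probs)
rel_freq = np.bincount(sample, minlength=4) / sample.size
print(probs)     # true pmf
print(rel_freq)  # close to the pmf; closer as the sample grows
```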
Density histogram and pdf
Suppose a r.v. X has the distribution given by this pdf. I sampled X1, X2, …, X1000 independently from this distribution.

[Figures: pdf (left); density histogram (right)]
◦ The empirical density histogram is an approximation of the pdf.
◦ If we sample infinitely many points and the bin width is infinitely small, the density histogram will be the same as the pdf curve.
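A matching sketch for the continuous case (my addition; the Exp(2) density f(x) = 2e^{−2x} stands in for the slide's unreproduced pdf):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=0.5, size=1000)  # scale 1/2 = rate 2

# Density histogram heights vs. the pdf at the bin centers.
heights, edges = np.histogram(sample, bins=20, density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.column_stack([centers[:5], heights[:5], 2 * np.exp(-2 * centers[:5])]))
# Columns: bin center, histogram height, f(center) — heights track the pdf.
```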
Reading list (optional)

◦ "Probability and Statistics for Engineering and the Sciences" (9th


edition):
⊲ Chapters 3.1, 3.2, 4.1 and 4.2 (skip the parts on expectations)

◦ "OpenIntro statistics" (4th edition, free online, download [here]):


⊲ Chapters 3.4 and 3.5 (It's OK if you find expectation and variance difficult to understand; we will cover them next week.)

Many thanks to
Yang Feng
Joyce Robbins
Chengliang Tang
Owen Ward
Wenda Zhou
And all my teachers in the past 25 years

