The Normal Distribution
Will Monroe
July 19, 2017
with materials by Mehran Sahami and Chris Piech
Announcements: Midterm
A week from yesterday:
Tuesday, July 25, 7:00-9:00pm
Building 320-105
One page (both sides) of notes
Material through today’s lecture
Review session:
Tomorrow, July 20, 2:30-3:20pm
in Gates B01
Review: A grid of random variables
                          number of successes    time to get successes
One trial / one success   X ∼ Ber(p)  (n = 1)    X ∼ Geo(p)  (r = 1)
Several trials/successes  X ∼ Bin(n, p)          X ∼ NegBin(r, p)
Interval of time          X ∼ Poi(λ)             X ∼ Exp(λ)  (one success after an interval of time; continuous!)
Review: Continuous distributions
A continuous random variable has a
value that’s a real number (not
necessarily an integer).
Replace sums with integrals!
P(a < X ≤ b) = F_X(b) − F_X(a)

F_X(a) = ∫_{x=−∞}^{a} f_X(x) dx
Review: Probability density function
The probability density function (PDF)
of a continuous random variable
represents the relative likelihood of
various values.
Units of probability divided by units of X.
Integrate it to get probabilities!
P(a < X ≤ b) = ∫_{x=a}^{b} f_X(x) dx
Continuous expectation and variance
Remember: replace sums with integrals!
discrete:                              continuous:
E[X]  = Σ_{x} x·p_X(x)                 E[X]  = ∫_{−∞}^{∞} x·f_X(x) dx
E[X²] = Σ_{x} x²·p_X(x)                E[X²] = ∫_{−∞}^{∞} x²·f_X(x) dx

Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
(still!)
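To make the sum-to-integral switch concrete, here is a quick numerical check (my own sketch, not from the slides) that computes E[X] and Var(X) for X ∼ Uni(0, 2) by integrating against the density with scipy:

```python
from scipy import integrate

# X ~ Uni(0, 2): f_X(x) = 1/2 on [0, 2] and 0 elsewhere, so the
# integrals over the whole real line reduce to integrals over [0, 2]
pdf = lambda x: 0.5

e_x, _ = integrate.quad(lambda x: x * pdf(x), 0, 2)      # E[X] = 1
e_x2, _ = integrate.quad(lambda x: x**2 * pdf(x), 0, 2)  # E[X^2] = 4/3
var = e_x2 - e_x**2                                      # Var(X) = 1/3
```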
Review: Uniform random variable
A uniform random variable is
equally likely to be any value in
a single real number interval.
X ∼ Uni(α, β)

f_X(x) = 1/(β−α)  if x ∈ [α, β],  0 otherwise
Uniform: Fact sheet
X ∼ Uni(α, β)    (α = minimum value, β = maximum value)

PDF:  f_X(x) = 1/(β−α)  if x ∈ [α, β],  0 otherwise

CDF:  F_X(x) = (x−α)/(β−α)  if x ∈ [α, β],  1 if x > β,  0 otherwise

expectation:  E[X] = (α+β)/2
variance:     Var(X) = (β−α)²/12
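These facts are easy to verify with scipy.stats, which parameterizes the uniform by loc = α and scale = β − α (a sanity-check sketch, not part of the original slides):

```python
from scipy import stats

alpha, beta = 2.0, 10.0
X = stats.uniform(loc=alpha, scale=beta - alpha)  # X ~ Uni(2, 10)

mean = X.mean()           # (alpha + beta) / 2 = 6
var = X.var()             # (beta - alpha)^2 / 12 = 64/12
p = X.cdf(7) - X.cdf(4)   # P(4 < X <= 7) = 3/8
```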
Review: Exponential random variable
An exponential random variable
is the amount of time until the
first event when events occur
as in the Poisson distribution.
X ∼Exp(λ)
f_X(x) = λe^{−λx}  if x ≥ 0,  0 otherwise
Exponential: Fact sheet
X ∼ Exp(λ)    (λ = rate of events per unit time; X = time until first event)

PDF:  f_X(x) = λe^{−λx}  if x ≥ 0,  0 otherwise

CDF:  F_X(x) = 1 − e^{−λx}  if x ≥ 0,  0 otherwise

expectation:  E[X] = 1/λ
variance:     Var(X) = 1/λ²
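A quick check with scipy.stats (my own sketch, not from the slides): scipy parameterizes the exponential by its mean, so use scale = 1/λ.

```python
from scipy import stats

lam = 3.0  # rate of events per unit time
X = stats.expon(scale=1 / lam)  # scipy uses scale = 1/lambda

mean = X.mean()   # 1/lambda
var = X.var()     # 1/lambda^2
p = X.cdf(1.0)    # P(X <= 1) = 1 - e^{-lambda}
```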
Normal random variable
A normal (= Gaussian) random variable is
a good approximation to many other
distributions. It often results from sums or
averages of independent random variables.
X ∼ N(μ, σ²)

f_X(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}
Déjà vu?
[plot: a PMF P(X = k) vs. k, with a bell-shaped outline]

Déjà vu?
[plot: the PDF f_X(x) of X = sum of n independent Uni(0, 1) variables, also bell-shaped]
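The bell shape of a sum of uniforms is easy to see by simulation (a sketch of my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # number of independent Uni(0, 1) variables per sum
sums = rng.uniform(0.0, 1.0, size=(100_000, n)).sum(axis=1)

# Each sum is approximately N(n/2, n/12): mean 10, variance 20/12
mean, var = sums.mean(), sums.var()
```

A histogram of `sums` looks strikingly like a normal PDF even for modest n.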
“The normal distribution”
Also known as: Gaussian distribution
Shape: bell curve
Personality: easygoing
What is normally distributed?
Natural phenomena: heights, weights… (approximately)
Noise in measurements
Sums/averages of many random variables (caveats: independence, equal weighting, continuity...)
Averages of samples from a population (with sufficient sample sizes)
The Know-Nothing Distribution
“maximum entropy”
The normal is the most spread-out distribution
with a fixed expectation and variance.
If you know E[X] and Var(X) but nothing else,
a normal is probably a good starting point!
Normal: Fact sheet
X ∼ N(μ, σ²)    (μ = mean, σ² = variance, σ = standard deviation)

PDF:  f_X(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}
The Standard Normal
Z ∼ N(0, 1)    (μ = 0, σ² = 1)

If X ∼ N(μ, σ²):   X = σZ + μ   and   Z = (X − μ)/σ
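Standardization means any normal probability can be computed from the standard normal CDF. A quick check with scipy (my own sketch, not from the slides):

```python
from scipy import stats

mu, sigma = 3.0, 4.0
x = 5.0

# P(X <= x) = P(Z <= (x - mu)/sigma) = Phi((x - mu)/sigma)
p_direct = stats.norm(mu, sigma).cdf(x)
p_standardized = stats.norm(0, 1).cdf((x - mu) / sigma)
```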
De-scarifying the normal PDF

General PDF:

f_X(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}

Plug in the standard normal (μ = 0, σ = 1):

f_Z(z) = (1/(1·√(2π))) e^{−(1/2)((z−0)/1)²}
       = (1/√(2π)) e^{−z²/2}
       = C e^{−z²/2}

So the standard normal PDF is just e^{−z²/2} times a normalizing
constant C = 1/√(2π), and the general PDF is the same curve in the
standardized variable Z = (X − μ)/σ.
Normal: Fact sheet
X ∼ N(μ, σ²)    (μ = mean, σ² = variance, σ = standard deviation)

PDF:  f_X(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}

CDF:  F_X(x) = Φ((x−μ)/σ) = ∫_{−∞}^{x} f_X(t) dt    (no closed form)
The Standard Normal
Z ∼ N(0, 1)    (μ = 0, σ² = 1)

If X ∼ N(μ, σ²):   X = σZ + μ   and   Z = (X − μ)/σ

Φ(z) = F_Z(z) = P(Z ≤ z)
Symmetry of the normal
P( X≤μ−x)=P( X≥μ+ x)
and don’t forget:
P( X > x)=1−P( X ≤x)
Symmetry of the normal
P(Z≤−z)=P(Z≥z)
and don’t forget:
P(Z > z)=1−P(Z≤z)
Symmetry of the normal
Φ(−z)=P(Z≥z)
and don’t forget:
P(Z > z)=1−Φ( z)
The standard normal table
Φ(0.54)=P(Z≤0.54)=0.7054
With today’s technology
scipy.stats.norm(mean, std).cdf(x)
(note: std is the standard deviation, not the variance; you might need math.sqrt here)
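For example, computing the table value Φ(0.54) and a non-standard normal CDF (a sketch of my own, not from the slides):

```python
import math
from scipy import stats

# Phi(0.54), the value looked up in the table above
phi = stats.norm(0, 1).cdf(0.54)  # ~0.7054

# For X ~ N(mu, sigma^2), pass the standard deviation, not the variance
mu, var = 3.0, 16.0
p = stats.norm(mu, math.sqrt(var)).cdf(0)  # P(X <= 0) = Phi(-3/4)
```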
Break time!
Practice with the Gaussian
X ~ N(3, 16)
μ=3
σ² = 16
σ=4
P(X > 0) = P((X − 3)/4 > (0 − 3)/4)
         = P(Z > −3/4)
         = 1 − P(Z ≤ −3/4) = 1 − Φ(−3/4)
         = 1 − (1 − Φ(3/4))
         = Φ(3/4) ≈ 0.7734
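Checking this numerically with scipy (my own sketch, not from the slides):

```python
import math
from scipy import stats

# X ~ N(3, 16), so sigma = 4; P(X > 0) should equal Phi(3/4)
X = stats.norm(3, math.sqrt(16))
p = 1 - X.cdf(0)                  # P(X > 0)
phi = stats.norm(0, 1).cdf(0.75)  # Phi(3/4)
```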
Practice with the Gaussian
X ~ N(3, 16)
μ=3
σ² = 16
σ=4
P(|X − 3| > 4) = P(X < −1) + P(X > 7)
              = P((X − 3)/4 < (−1 − 3)/4) + P((X − 3)/4 > (7 − 3)/4)
              = P(Z < −1) + P(Z > 1)
              = Φ(−1) + (1 − Φ(1))
              = (1 − Φ(1)) + (1 − Φ(1))
              ≈ 2·(1 − 0.8413) ≈ 0.3173
Practice with the Gaussian
X ~ N(3, 16)
μ=3
σ² = 16
σ=4
P(|X − μ| > σ) = P(X < μ−σ) + P(X > μ+σ)
              = P((X − μ)/σ < (μ−σ−μ)/σ) + P((X − μ)/σ > (μ+σ−μ)/σ)
              = P(Z < −1) + P(Z > 1)
              = Φ(−1) + (1 − Φ(1))
              = (1 − Φ(1)) + (1 − Φ(1))
              ≈ 2·(1 − 0.8413) ≈ 0.3173
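The punchline is that this probability does not depend on μ or σ. A quick numerical check (my own sketch, not from the slides):

```python
from scipy import stats

# For any normal X: P(|X - mu| > sigma) = 2 * (1 - Phi(1))
p_standard = 2 * (1 - stats.norm(0, 1).cdf(1))

# Check with X ~ N(3, 16): mu = 3, sigma = 4
X = stats.norm(3, 4)
p = X.cdf(3 - 4) + (1 - X.cdf(3 + 4))  # P(X < -1) + P(X > 7)
```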
Normal: Fact sheet
X ∼ N(μ, σ²)    (μ = mean, σ² = variance, σ = standard deviation)

PDF:  f_X(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}

CDF:  F_X(x) = Φ((x−μ)/σ) = ∫_{−∞}^{x} f_X(t) dt    (no closed form)

expectation:  E[X] = μ
variance:     Var(X) = σ²
Carl Friedrich Gauss
(1777-1855): remarkably influential
German mathematician
Started doing groundbreaking math
as a teenager
Didn’t invent the normal distribution
(but popularized it)
Noisy wires
Send a voltage of X = 2 or -2 on a wire.
+2 represents 1, -2 represents 0.
Receive voltage of X + Y on other end,
where Y ~ N(0, 1).
If X + Y ≥ 0.5, then output 1, else 0.
P(incorrect output | original bit = 1) =
P(2+Y <0.5)=P (Y <−1.5)
=Φ(−1.5)
=1−Φ(1.5)≈0.0668
Noisy wires
Send a voltage of X = 2 or -2 on a wire.
+2 represents 1, -2 represents 0.
Receive voltage of X + Y on other end,
where Y ~ N(0, 1).
If X + Y ≥ 0.5, then output 1, else 0.
P(incorrect output | original bit = 0) =
P(−2+Y ≥0.5)=P(Y ≥2.5)
=1−P(Y <2.5)
=1−Φ(2.5)≈0.0062
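Both error probabilities can be computed directly from the noise distribution (a sanity-check sketch of my own, not from the slides):

```python
from scipy import stats

Y = stats.norm(0, 1)  # channel noise Y ~ N(0, 1)

# Bit 1 sent (voltage +2): error if 2 + Y < 0.5, i.e. Y < -1.5
p_err_1 = Y.cdf(-1.5)

# Bit 0 sent (voltage -2): error if -2 + Y >= 0.5, i.e. Y >= 2.5
p_err_0 = 1 - Y.cdf(2.5)
```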
Poisson approximation to binomial
large n, small p:  Bin(n, p) ≈ Poi(λ)

Normal approximation to binomial
large n, medium p:  Bin(n, p) ≈ N(μ, σ²)
Something is strange...
Continuity correction
X ∼Bin (n , p)
Y ∼N (np , np(1− p))
P ( X ≥55)≈ P (Y >54.5)
When approximating a discrete distribution with
a continuous distribution, adjust the bounds by
0.5 to account for the missing half-bar.
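Comparing the corrected and uncorrected approximations against the exact binomial tail shows why the half-unit shift matters (my own sketch, not from the slides):

```python
import math
from scipy import stats

n, p = 100, 0.5
X = stats.binom(n, p)
Y = stats.norm(n * p, math.sqrt(n * p * (1 - p)))  # N(50, 25)

exact = 1 - X.cdf(54)     # P(X >= 55), exact binomial
approx = 1 - Y.cdf(54.5)  # normal approximation with continuity correction
naive = 1 - Y.cdf(55)     # without the correction (noticeably worse)
```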
Miracle diets
100 people placed on a special diet.
Doctor will endorse the diet if ≥ 65 people's
cholesterol levels decrease.
What is P(doctor endorses | diet has no effect)?
X: # people whose cholesterol decreases
X ~ Bin(100, 0.5)
np = 50
np(1 – p) = 50(1 – 0.5) = 25
≈ Y ~ N(50, 25)
P(Y > 64.5) = P((Y − 50)/5 > (64.5 − 50)/5)
            = P(Z > 2.9) = 1 − Φ(2.9) ≈ 0.00187
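Verifying with scipy, including a comparison to the exact binomial tail (my own sketch, not from the slides):

```python
import math
from scipy import stats

X = stats.binom(100, 0.5)          # people whose cholesterol decreases
Y = stats.norm(50, math.sqrt(25))  # normal approximation N(50, 25)

approx = 1 - Y.cdf(64.5)  # P(X >= 65) with continuity correction
exact = 1 - X.cdf(64)     # exact binomial tail, for comparison
```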
Stanford admissions
Stanford accepts 2480 students.
Each student independently
decides to attend with p = 0.68.
What is
P(at least 1750 students attend)?
X: # of students who will attend.
X ~ Bin(2480, 0.68)
np = 1686.4
σ² = np(1 – p) ≈ 539.65
≈ Y ~ N(1686.4, 539.65)
P(Y > 1749.5) = P((Y − 1686.4)/√539.65 > (1749.5 − 1686.4)/√539.65)
              ≈ P(Z > 2.72) = 1 − Φ(2.72) ≈ 0.0033
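The same tail can be computed directly; note the continuity-corrected z-value is (1749.5 − 1686.4)/√539.65 ≈ 2.72 (a sanity-check sketch of my own, not from the slides):

```python
import math
from scipy import stats

n, p = 2480, 0.68
mu = n * p             # 1686.4
var = n * p * (1 - p)  # ~539.65
Y = stats.norm(mu, math.sqrt(var))

approx = 1 - Y.cdf(1749.5)          # P(X >= 1750), continuity-corrected
exact = stats.binom(n, p).sf(1749)  # exact binomial tail P(X >= 1750)
```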
Stanford admissions changes