Probability - Course Notes
Introduction to Probability: Cheat Sheet
Probability Formula | Sample Space | Expected Values | Complements
Words of welcome
You are here because you want to comprehend the basics of probability before you dive into the world of statistics
and machine learning. Understanding the driving forces behind key statistical features is crucial to reaching your goal of
mastering data science. This way you will be able to extract important insights when analysing data through supervised
machine learning methods like regressions, but also understand the outputs that unsupervised or assisted ML methods give you.
Bayesian Inference is a key tool, heavily used in many fields of mathematics, to succinctly express complicated
statements. Through Bayesian notation we can convey the relationships between elements, sets and events.
Understanding these new concepts will aid you in interpreting the mathematical intuition behind sophisticated data
analytics methods.
Distributions are the main way we like to classify sets of data. If a dataset complies with certain characteristics, we can
usually attribute the likelihood of its values to a specific distribution. Since many of these distributions have elegant
relationships between certain outcomes and their probabilities of occurring, knowing the key features of our data is
extremely convenient and useful.
What is probability?
Probability is the likelihood of an event occurring. This event can be pretty much anything – getting heads, rolling a 4 or even
bench pressing 225lbs. We measure probability with numeric values between 0 and 1, because we like to compare the relative
likelihood of events. Observe the general probability formula.
P(X) = Preferred outcomes / Sample space
Probability Formula:
• The Probability of event X occurring equals the number of preferred outcomes over the number of outcomes in the
sample space.
• Preferred outcomes are the outcomes we want to occur or the outcomes we are interested in. We also refer to such
outcomes as "favorable".
• Sample space refers to all possible outcomes that can occur. Its “size” indicates the amount of elements in it.
For example, if we flip a coin 20 times and record the outcomes, the experimental probability of getting heads would equal the
number of heads we record over the course of those 20 trials, divided by 20 (the total number of trials).
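To see how experimental probability behaves, here is a minimal Python sketch that simulates the 20 flips (the seed and the 50/50 coin are assumptions made for the illustration):

```python
import random

random.seed(42)          # fixed seed so the run is reproducible
trials = 20              # number of coin flips, matching the example above

# Simulate the flips: each flip is heads with theoretical probability 0.5.
flips = [random.choice(["heads", "tails"]) for _ in range(trials)]

heads_count = flips.count("heads")
experimental_p = heads_count / trials   # favorable outcomes over all trials

print(f"Heads recorded: {heads_count} out of {trials}")
print(f"Experimental P(heads) = {experimental_p}")   # usually close to, but not exactly, 0.5
```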
The expected value can be numerical, Boolean, categorical or other, depending on the type of event we are interested in. For
instance, the expected value of a single trial would be the more likely of the two outcomes, whereas the expected value of the experiment
would be the number of times we expect to get either heads or tails over the 20 trials.
Expected value for categorical variables: E(X) = n × p
Expected value for numeric variables: E(X) = Σ_{i=1}^{n} x_i × p_i
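Both formulas are easy to compute directly. A short Python sketch, using made-up example values (20 coin flips for the categorical case, a fair die for the numeric case):

```python
# Expected value for a categorical outcome repeated over n trials: E(X) = n * p
n, p = 20, 0.5                      # e.g. 20 coin flips, P(heads) = 0.5
expected_heads = n * p
print("E(number of heads) =", expected_heads)        # 10.0

# Expected value for a numeric variable: E(X) = sum of x_i * p_i
values = [1, 2, 3, 4, 5, 6]          # outcomes of a fair die (hypothetical example)
probs = [1 / 6] * 6                  # each outcome is equally likely
expected_value = sum(x * p_i for x, p_i in zip(values, probs))
print("E(die roll) =", expected_value)               # 3.5
```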
Probability Frequency Distribution
What is a frequency?
Frequency is the number of times a given value or outcome appears in the sample space.
A probability frequency distribution is a collection of the probabilities for each possible outcome of an event.
The complement of an event is everything an event is not. We denote the complement of an event with an apostrophe:
A' = Not A (the complement is the opposite of the original event)
Characteristics of complements:
• Can never occur simultaneously.
• Add up to the sample space. (A + A’ = Sample space)
• Their probabilities add up to 1. (P(A) + P(A’) = 1)
• The complement of a complement is the original event. ((A’)’ = A)
Example:
• Assume event A represents drawing a spade, so P(A) = 0.25.
• Then, A’ represents not drawing a spade, so drawing a club, a diamond or a heart. P(A’) = 1 – P(A), so P(A’) = 0.75.
Permutations
Permutations represent the number of different possible ways we can arrange a number of elements.
P(n) = n × (n − 1) × (n − 2) × ⋯ × 1
(n options for who we put first, n − 1 options for who we put second, …, and a single option for who we put last.)
Characteristics of Permutations:
• Arranging all elements within the sample space.
• No repetition.
• P(n) = n × (n − 1) × (n − 2) × ⋯ × 1 = n! (read "n factorial")
Example:
• If we need to arrange 5 people, we would have P(5) = 120 ways of doing so.
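A minimal sketch of the permutation formula in Python, relying only on the standard library's math.factorial:

```python
import math

def permutations(n):
    """Number of ways to arrange all n elements: P(n) = n!"""
    return math.factorial(n)

print(permutations(5))   # 120 ways to arrange 5 people, as in the example
print(permutations(3))   # 6 ways to arrange 3 elements
```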
Factorials
Factorials express the product of all integers from 1 to n and we denote them with the “!” symbol.
n! = n × (n − 1) × (n − 2) × ⋯ × 1
Key Values:
• 0! = 1.
• If n<0, n! does not exist.
Rules for factorial multiplication (for n > 0 and n > k):
• (n + k)! = n! × (n + 1) × ⋯ × (n + k)
• (n − k)! = n! / ((n − k + 1) × ⋯ × n)
• n! / k! = (k + 1) × (k + 2) × ⋯ × n
Examples (n = 7, k = 4):
• (7 + 4)! = 11! = 7! × 8 × 9 × 10 × 11
• (7 − 4)! = 3! = 7! / (4 × 5 × 6 × 7)
• 7! / 4! = 5 × 6 × 7
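These rules can be sanity-checked numerically. A short Python sketch using the same example values, n = 7 and k = 4:

```python
import math

n, k = 7, 4

# (n + k)! = n! * (n + 1) * ... * (n + k)
lhs = math.factorial(n + k)
rhs = math.factorial(n) * math.prod(range(n + 1, n + k + 1))
print(lhs == rhs)   # True: 11! = 7! * 8 * 9 * 10 * 11

# (n - k)! = n! / ((n - k + 1) * ... * n)
lhs = math.factorial(n - k)
rhs = math.factorial(n) / math.prod(range(n - k + 1, n + 1))
print(lhs == rhs)   # True: 3! = 7! / (4 * 5 * 6 * 7)

# n! / k! = (k + 1) * ... * n
lhs = math.factorial(n) / math.factorial(k)
rhs = math.prod(range(k + 1, n + 1))
print(lhs == rhs)   # True: 7! / 4! = 5 * 6 * 7
```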
Variations
Variations represent the number of different possible ways we can pick and arrange a number of elements.
Variations with repetition: V̄(n, p) = n^p
Variations without repetition: V(n, p) = n! / (n − p)!
(n is the number of different elements available; p is the number of elements we are arranging.)
Intuition behind the formula (with repetition):
• We have n-many options for the first element.
• We still have n-many options for the second element because repetition is allowed.
• We have n-many options for each of the p-many elements.
• n × n × ⋯ × n = n^p
Intuition behind the formula (without repetition):
• We have n-many options for the first element.
• We only have (n − 1)-many options for the second element because we cannot repeat the value we chose to start with.
• We have fewer options left for each additional element.
• n × (n − 1) × (n − 2) × ⋯ × (n − p + 1) = n! / (n − p)!
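A brief Python sketch of both variation formulas; the example values (arranging 2 out of 4 elements) are arbitrary:

```python
import math

def variations_with_repetition(n, p):
    """V̄(n, p) = n^p: n options for each of the p positions."""
    return n ** p

def variations_without_repetition(n, p):
    """V(n, p) = n! / (n - p)!: one fewer option for each extra position."""
    return math.perm(n, p)

# Hypothetical example: arranging 2 out of 4 distinct elements.
print(variations_with_repetition(4, 2))      # 16
print(variations_without_repetition(4, 2))   # 12
```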
Combinations
Combinations represent the number of different possible ways we can pick a number of elements.
C(n, p) = n! / ((n − p)! × p!)
When an event consists of several independent sample spaces, the total number of combinations equals the product of their sizes:
C = n₁ × n₂ × ⋯ × n_p
(n₁ is the size of the first sample space, n₂ the size of the second, and so on up to the size of the last sample space.)
To win the lottery, you need to satisfy two distinct independent events:
• Correctly guess the “Powerball” number. (From 1 to 26)
• Correctly guess the 5 regular numbers. (From 1 to 69)
C = (69! / (64! × 5!)) × 26
(The first factor, C(69, 5), is the number of ways to pick the 5 regular numbers; the second factor, 26, is the number of options for the Powerball number; their product is the total number of combinations.)
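The same calculation takes a couple of lines in Python, where math.comb(n, p) computes n! / ((n − p)! × p!):

```python
import math

regular = math.comb(69, 5)      # ways to pick the 5 regular numbers out of 69
powerball = 26                  # options for the Powerball number
total = regular * powerball     # independent events, so we multiply the sample spaces

print(regular)                  # 11238513 combinations of regular numbers
print(total)                    # 292201338 total combinations
print(1 / total)                # probability of winning with a single ticket
```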
Combinations represent the number of different possible ways we can pick a number of elements. In special
cases we can have repetition in combinations and for those we use a different formula.
C̄(n, p) = (n + p − 1)! / ((n − 1)! × p!)
(n is the total number of elements in the sample space; p is the number of elements we need to select.)
Now that you know what the formula looks like, we are going to walk you through the process of deriving it from the
combinations without repetition formula. This way you will be able to fully understand the intuition behind it and not have
to bother memorizing it.
Applications of Combinations with Repetition
To understand how combinations with repetition work, you need to understand the instances where they occur.
To get a better grasp of the number of combinations we have, let us explore a specific example: ordering a pizza with
3 toppings chosen from 6 available ingredients, where repeating an ingredient (e.g. extra cheese) is allowed.
The methodology we use for such combinations is rather abstract. We like to represent each type of pizza with
a special sequence of 0s and 1s. To do so, we first need to select a specific order for the available ingredients.
For convenience we can refer to each ingredient by its associated letter (e.g. "c" means cheese, and "o" means onions).
To construct the sequence for each unique type of pizza we follow 2 rules as we go through the ingredients in
the order we wrote down earlier.
1. If we want no more from a certain topping, we write a 0 and move to the next topping.
2. If we want to include a certain topping, we write a 1 and stay on the same topping.
• Not going to the next topping allows us to indicate if we want extra by adding another 1, before we
move forward. Say, if we want our pizza to have extra cheese, the sequence would begin with “1, 1”.
• Also, we always apply rule 1 before moving on to another topping, so the sequence will actually start
with “1, 1, 0”.
Pizzas and Sequences
If we need to write a "0" after each topping, then every sequence consists of 6 zeroes and 3 ones.
For example, a vegan variety pizza with onions, green peppers and mushrooms would be represented by the sequence
0,1,0,1,0,1,0,0,0.
Now, what kind of pizza would the sequence 0,0,1,0,0,0,1,1,0 represent? Putting the sequence into a table with one
column per topping, we can see that it represents a pizza with green peppers and extra bacon.
Always Ending in 0
Every sequence ends in a "0", because we always close the last topping before finishing. That final "0" carries no
information, so only the first 8 positions of each sequence can actually vary:
• 1,0,0,0,0,1,1,0,0
• 0,1,0,1,0,1,0,0,0
• 0,0,1,0,0,0,1,1,0
As stated before, we have 3 “1s” and 8 different positions. Therefore, the number of pizzas we can get would be
the number of combinations of picking 3 elements out of a set of 8. This means we can transform combinations
with repetition to combinations without repetition.
C̄(6, 3) = C(8, 3)
Now that we know the relationship between the number of combinations with and without repetition, we can
plug "n + p − 1" into the combinations without repetition formula to get:
C̄(n, p) = C(n + p − 1, p) = (n + p − 1)! / (((n + p − 1) − p)! × p!) = (n + p − 1)! / ((n − 1)! × p!)
This is the exact same formula we showed you at the beginning.
Before you continue to the next lecture, let's make a quick recap of the algorithm and the formula.
1. We started by ordering the possible values and expressing every combination as a sequence.
2. We observed that only certain elements of the sequence may differ.
3. We concluded that every unique sequence can be expressed as a combination of the positions of the
"1" values.
4. We discovered a relationship between the formulas for combinations with and without repetition.
5. We used said relationship to create a general formula for combinations with repetition.
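To tie the recap together, here is a short Python check of the pizza example and of the general formula:

```python
import math

def comb_with_repetition(n, p):
    """C̄(n, p) = (n + p - 1)! / ((n - 1)! * p!), i.e. C(n + p - 1, p)."""
    return math.comb(n + p - 1, p)

# Pizza example: 3 toppings chosen from 6 ingredients, repetition allowed.
print(comb_with_repetition(6, 3))   # 56
print(math.comb(8, 3))              # 56, confirming C̄(6, 3) = C(8, 3)
```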
Symmetry of Combinations
Let’s see the algebraic proof of the notion that selecting p-many elements out of a set of n is the same as omitting n-p many
elements.
For starters, recall the combination formula:
C(n, p) = n! / ((n − p)! × p!)
C(n, n − p) = n! / ((n − (n − p))! × (n − p)!) = n! / ((n − n + p)! × (n − p)!) = n! / (p! × (n − p)!) = C(n, p)
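The identity can also be confirmed numerically; the values n = 10 and p = 3 below are arbitrary:

```python
import math

n, p = 10, 3
print(math.comb(n, p))        # 120: ways to select 3 out of 10
print(math.comb(n, n - p))    # 120: ways to omit the other 7 - the same number
```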
Sets and Events
We denote an element x belonging to a set A as x ∈ A (elements are written in lower-case, sets in upper-case).
Two sets can relate to each other in one of three ways: they do not touch at all, they intersect (partially overlap), or one
completely overlaps the other.
Examples:
• Do not touch at all: A -> Diamonds, B -> Hearts.
• Intersect (partially overlap): Diamonds and Queens.
• One completely overlaps the other: Red Cards and Diamonds.
Intersection
The intersection of two or more events expresses the set of outcomes that satisfy all the events
simultaneously. Graphically, this is the area where the sets intersect.
We denote the intersection of two sets with the "intersect" sign, which resembles an upside-down capital letter U:
A ∩ B
Union
The union of two or more events expresses the set of outcomes that satisfy at least one of the events.
Graphically, this is the area that includes both sets.
We denote the union of two sets with the "union" sign, which resembles a capital letter U:
A ∪ B
A ∪ B = A + B − A ∩ B
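Python's built-in sets make the intersection, the union and the relationship above easy to see. A small sketch with two made-up sets of outcomes:

```python
# Two hypothetical overlapping sets of outcomes.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

intersection = A & B          # outcomes that satisfy both events
union = A | B                 # outcomes that satisfy at least one event

print(intersection)           # {3, 4}
print(union)                  # {1, 2, 3, 4, 5, 6}

# |A ∪ B| = |A| + |B| - |A ∩ B|, mirroring A ∪ B = A + B - A ∩ B above.
print(len(union) == len(A) + len(B) - len(intersection))   # True
```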
Mutually Exclusive Sets
Sets with no overlapping elements are called mutually exclusive. Graphically, their circles never touch.
If A ∩ B = ∅, then the two sets are mutually exclusive.
Remember:
All complements are mutually exclusive, but not all mutually exclusive sets are complements.
Example:
Dogs and cats are mutually exclusive sets, since no species is simultaneously a feline and a canine, but the two are not
complements, since there exist other types of animals as well.
Independent and Dependent Events
If the likelihood of event A occurring (P(A)) is affected by event B occurring, then we say that A and B are dependent events.
Alternatively, if it isn't – the two events are independent.
We express the probability of event A occurring, given that event B has occurred, the following way: P(A|B).
We call this the conditional probability.
Independent:
• All the probabilities we have examined so far.
• The outcome of A does not depend on the outcome of B.
• P(A|B) = P(A)
• Example: A -> Hearts, B -> Jacks
Dependent:
• New concept.
• The outcome of A depends on the outcome of B.
• P(A|B) ≠ P(A)
• Example: A -> Hearts, B -> Red
Conditional Probability
For any two events A and B, such that the likelihood of B occurring is greater than 0 (𝑃 𝐵 > 0), the conditional probability
formula states the following.
P(A|B) = P(A ∩ B) / P(B)
(P(A|B) is the probability of A, given B has occurred; P(A ∩ B) is the probability of the intersection; P(B) is the probability of event B.)
Intuition behind the formula:
• We are only interested in the outcomes where B is satisfied.
• Only the elements in the intersection would satisfy A as well.
• Parallel to the "favoured over all" formula: the intersection plays the role of the "preferred outcomes".
Remember:
• Unlike the union or the intersection, changing the order of A and B in the conditional probability alters its meaning.
• P(A|B) is not the same as P(B|A), even if they happen to be equal numerically.
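For a quick numerical illustration of the formula, we can reuse the card examples from the previous section (a standard 52-card deck):

```python
# Standard 52-card deck: 13 hearts, 26 red cards, 4 jacks.
deck_size = 52

p_hearts = 13 / deck_size
p_red = 26 / deck_size
p_hearts_and_red = 13 / deck_size    # every heart is a red card

# P(hearts | red) = P(hearts ∩ red) / P(red)
p_hearts_given_red = p_hearts_and_red / p_red
print(p_hearts_given_red)            # 0.5  (dependent: different from P(hearts) = 0.25)

# Independent example: P(hearts | jack) = P(hearts ∩ jack) / P(jack)
p_jack = 4 / deck_size
p_hearts_and_jack = 1 / deck_size    # only the jack of hearts is in both sets
print(p_hearts_and_jack / p_jack)    # 0.25, the same as P(hearts) -> independent
```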
The law of total probability dictates that for any set A, which is a union of many mutually exclusive sets 𝐵1 , 𝐵2 , … , 𝐵𝑛 , its probability equals the
following sum.
P(A) = P(A|B₁) × P(B₁) + P(A|B₂) × P(B₂) + ⋯ + P(A|Bₙ) × P(Bₙ)
(Each term multiplies the conditional probability of A, given Bᵢ has occurred, by the probability of Bᵢ occurring.)
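A small numeric sketch of the law of total probability; the partition B1, B2, B3 and the probabilities below are made up for illustration:

```python
# Hypothetical partition B1, B2, B3 of the sample space and the conditional
# probabilities of A within each part.
p_B = [0.5, 0.3, 0.2]                 # P(B1), P(B2), P(B3) - they sum to 1
p_A_given_B = [0.10, 0.40, 0.80]      # P(A|B1), P(A|B2), P(A|B3)

# P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ... + P(A|Bn)P(Bn)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(round(p_A, 2))                  # 0.33
```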
The additive law calculates the probability of the union based on the probability of the individual sets it accounts for.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
The multiplication rule calculates the probability of the intersection based on the conditional probability.
P(A ∩ B) = P(A|B) × P(B)
Bayes’ Law helps us understand the relationship between two events by computing the different conditional probabilities.
We also call it Bayes’ Rule or Bayes’ Theorem.
P(A|B) = (P(B|A) × P(A)) / P(B)
(P(A|B) is the conditional probability of A, given B; P(B|A) is the conditional probability of B, given A.)
• Bayes’ Law is often used in medical or business analysis to determine which of two symptoms affects the other one more.
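As a sketch of how Bayes' Law is applied, consider a hypothetical medical-test scenario (all numbers are invented for the illustration):

```python
# Hypothetical test: A = "has the condition", B = "test is positive".
p_A = 0.01              # prior probability of the condition
p_B_given_A = 0.95      # P(positive | condition)
p_B_given_not_A = 0.05  # P(positive | no condition)

# P(B) via the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes' Law: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))   # ~0.161: a positive test is far from a certain diagnosis
```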
An overview of Distributions
A distribution shows the possible values a random variable can take and how frequently they occur.
Certain distributions share characteristics, so we separate them into types. The well-defined types of
distributions we often deal with have elegant statistics. We distinguish between two big types of
distributions based on the type of the possible values for the variable – discrete and continuous.
Discrete Distributions have finitely many different possible outcomes. They possess several key
characteristics which separate them from continuous ones.
A distribution where all the outcomes are equally likely is called a Uniform Distribution.
Notation:
• 𝒀~ 𝑼(𝒂, 𝒃)
• Alternatively, if the values are categorical, we simply indicate the number of categories, like so: Y ~ U(a)
Key characteristics
• All outcomes are equally likely.
• All the bars on the graph are equally tall.
• The expected value and variance have no predictive
power.
A distribution consisting of a single trial and only two possible outcomes – success or failure is called a
Bernoulli Distribution.
Notation:
• 𝒀~ 𝑩𝒆𝒓𝒏(𝒑)
Key characteristics
• One trial.
• Two possible outcomes.
• 𝑬 𝒀 =𝒑
• 𝑽𝒂𝒓 𝒀 = 𝒑 × (𝟏 − 𝒑)
A sequence of identical Bernoulli events is called Binomial and follows a Binomial Distribution.
Notation:
• 𝒀~ 𝑩(𝒏, 𝒑)
Key characteristics
• Measures the frequency of occurrence of one of the
possible outcomes over the n trials.
• P(Y = y) = C(n, y) × p^y × (1 − p)^(n − y)
• 𝑬 𝒀 =𝒏×𝒑
• 𝑽𝒂𝒓 𝒀 = 𝒏 × 𝒑 × (𝟏 − 𝒑)
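A brief Python sketch of the Binomial formulas; the values n = 10 and p = 0.5 are arbitrary:

```python
import math

def binomial_pmf(y, n, p):
    """P(Y = y) = C(n, y) * p^y * (1 - p)^(n - y)"""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.5                      # e.g. 10 coin flips
print(binomial_pmf(4, n, p))        # probability of exactly 4 successes (~0.205)
print(n * p)                        # E(Y) = n * p = 5.0
print(n * p * (1 - p))              # Var(Y) = n * p * (1 - p) = 2.5
```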
Poisson Distribution
A distribution that measures how frequently a certain event occurs over a specific interval of time or distance is called a
Poisson Distribution.
Notation:
• 𝒀~ 𝑷𝒐(λ)
Key characteristics
• Measures the frequency over an interval of time or
distance. (Only non-negative values.)
• P(Y = y) = (λ^y × e^(−λ)) / y!
• 𝑬 𝒀 =λ
• 𝑽𝒂𝒓 𝒀 = λ
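A matching sketch for the Poisson formulas, with an arbitrary λ = 4:

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) = lam^y * e^(-lam) / y!"""
    return lam**y * math.exp(-lam) / math.factorial(y)

lam = 4                                                # e.g. an average of 4 events per interval
print(poisson_pmf(2, lam))                             # probability of exactly 2 events (~0.147)
print(sum(poisson_pmf(y, lam) for y in range(100)))    # the PMF sums to ~1
# E(Y) and Var(Y) are both equal to lam (= 4 here).
```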
If the possible values a random variable can take are a sequence of infinitely many consecutive values, we
are dealing with a continuous distribution.
Key characteristics
• Have infinitely many consecutive possible values.
• Cannot add up the individual values that make up
an interval because there are infinitely many of them.
• Can be expressed with a graph or a continuous function, but cannot be expressed with a table.
• Graph consists of a smooth curve.
• To calculate the likelihood of an interval, we need
integrals.
• They have important CDFs.
• 𝑷 𝒀 = 𝒚 = 0 for any individual value y.
• 𝑷 𝒀<𝒚 =𝑷 𝒀≤𝒚
Normal Distribution
A Normal Distribution represents a distribution that most natural events follow.
Notation:
• 𝒀~ 𝑵(μ, σ𝟐 )
Key characteristics
• Its graph is a bell-shaped curve, symmetric and with thin tails.
• 𝑬 𝒀 =μ
• 𝑽𝒂𝒓 𝒀 = σ𝟐
• 68% of all its values should fall in the interval:
• (μ − 𝝈, 𝝁 + 𝝈)
To standardize any normal distribution we need to transform it so that the mean is 0 and the variance and
standard deviation are 1.
We use a transformation to create a new random variable z:
z = (y − μ) / σ
Subtracting the mean ensures the new mean is 0; dividing by the standard deviation ensures the new standard deviation is 1.
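The transformation is a one-liner in practice. A sketch that standardizes a hypothetical sample drawn from N(100, 15²):

```python
import random
import statistics

random.seed(0)
mu, sigma = 100, 15                      # hypothetical mean and standard deviation

# Draw a sample from N(mu, sigma^2) and standardize every value: z = (y - mu) / sigma.
sample = [random.gauss(mu, sigma) for _ in range(10_000)]
z_scores = [(y - mu) / sigma for y in sample]

print(round(statistics.mean(z_scores), 3))    # close to 0
print(round(statistics.stdev(z_scores), 3))   # close to 1
```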
Students' T Distribution
Notation:
• 𝒀~ 𝒕 (𝒌)
Key characteristics
• A small sample size approximation of a Normal
Distribution.
• Its graph is a bell-shaped curve, symmetric, but with fat tails.
• Accounts for extreme values better than the Normal Distribution.
• If k > 2: E(Y) = μ and Var(Y) = s² × k / (k − 2)
Example and uses:
• Often used in analysis when examining a small
sample of data that usually follows a Normal
Distribution.
Chi-Squared Distribution
A Chi-Squared distribution appears frequently in statistical analysis, most often when testing goodness of fit.
Notation:
• 𝒀~ 𝝌𝟐 (𝒌)
Key characteristics
• Its graph is asymmetric and skewed to the right.
• 𝑬 𝒀 =𝒌
• 𝑽𝒂𝒓 𝒀 = 𝟐𝒌
• The Chi-Squared distribution with k degrees of freedom is the distribution of the sum of the squares of k
independent standard Normal variables.
Exponential Distribution
Notation:
• Y ~ Exp(λ)
Key characteristics
• Both the PDF and the CDF plateau after a certain
point.
• E(Y) = 1 / λ
• Var(Y) = 1 / λ²
• We often use the natural logarithm to transform the
values of such distributions since we do not have a
table of known values like the Normal or Chi-
Squared.
Example and uses:
• Often used with dynamically changing variables, like
online website traffic or radioactive decay.
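A quick simulation check of these formulas, assuming an arbitrary λ (random.expovariate draws from an exponential distribution with rate λ):

```python
import random
import statistics

random.seed(1)
lam = 0.5                                          # hypothetical rate parameter

sample = [random.expovariate(lam) for _ in range(100_000)]

print(round(statistics.mean(sample), 2))           # close to E(Y) = 1 / lam = 2.0
print(round(statistics.variance(sample), 2))       # close to Var(Y) = 1 / lam^2 = 4.0
```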
Logistic Distribution
The Continuous Logistic Distribution is observed when trying to determine how continuous variable inputs
can affect the probability of a binary outcome.
Notation:
• 𝒀~ 𝑳𝒐𝒈𝒊𝒔𝒕𝒊𝒄 (𝝁, 𝒔)
Key characteristics.
• 𝑬 𝒀 =𝝁
• Var(Y) = (s² × π²) / 3
• The CDF picks up when we reach values near the
mean.
• The smaller the scale parameter, the quicker it
reaches values close to 1.
Assume we have a random variable Y, such that 𝑌 ~ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆), then we can find its expected
value and variance the following way. Recall that the expected value for any discrete random
variable is a sum of all possible values multiplied by their likelihood of occurring P(y). Thus,
E(Y) = Σ_{y=0}^{∞} y × (e^(−λ) × λ^y) / y!
Now, when y=0, the entire product is 0, so we can start the sum from y=1 instead. Additionally, we
can divide the numerator and denominator by “y”, since “y” will be non-zero in every case.
= Σ_{y=1}^{∞} y × (e^(−λ) × λ^y) / y! = Σ_{y=1}^{∞} (e^(−λ) × λ^y) / (y − 1)!
Since lambda is a constant number, we can take out 𝜆𝑒 −𝜆 , in front of the sum.
= λe^(−λ) Σ_{y=1}^{∞} λ^(y−1) / (y − 1)!
Since we have “y-1” in both the numerator and denominator and the sum starts from 1, this is
equivalent to starting the sum from 0 and using y instead.
= λe^(−λ) Σ_{y=0}^{∞} λ^y / y!
Calculus dictates that for any constant c, Σ_{x=0}^{∞} c^x / x! = e^c. We use this to simplify the expression to:
= λe^(−λ) × e^λ
Lastly, since any value to the negative power is the same as 1 divided by that same value, then
𝑒 −𝜆 𝑒 𝜆 = 1.
=𝜆
Now, let’s move on to the variance. We are first going to express it in terms of expected values and
then we are going to apply a similar approach to the one we used for the expected value.
We start off with the well-known relationship between the expected value and the variance: the variance is equal to the
expected value of the squared variable, minus the squared expected value of the variable.
Var(Y) = E(Y²) − [E(Y)]²
Writing Y² as Y(Y − 1) + Y and plugging in E(Y) = λ for the terms we already know, we get:
= E(Y(Y − 1)) + E(Y) − [E(Y)]² = E(Y(Y − 1)) + (λ − λ²)
From here on out, the steps are pretty much the same as the ones we took for the expected value:
1. We change the starting value of the sum to y = 2, since the first two terms are zeroes.
2. We cancel y(y − 1) against y! and take λ²e^(−λ) out in front of the sum, leaving Σ_{y=2}^{∞} λ^(y−2) / (y − 2)!.
3. We shift the index so the sum starts from 0 and apply the same calculus identity as before, so E(Y(Y − 1)) = λ²e^(−λ) × e^λ = λ².
Therefore:
= λ²e^(−λ) × e^λ + (λ − λ²)
= λ² + λ − λ²
= λ
Therefore, both the mean and variance for a Poisson Distribution are equal to lambda (𝜆).
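The algebraic result can be double-checked numerically by computing E(Y) and Var(Y) straight from the PMF. A short sketch with a hypothetical λ = 3.5:

```python
import math

lam = 3.5                          # hypothetical rate parameter
mean = 0.0
second_moment = 0.0

p = math.exp(-lam)                 # P(Y = 0)
for y in range(200):
    mean += y * p                  # accumulate E(Y) = sum of y * P(y)
    second_moment += y**2 * p      # accumulate E(Y^2)
    p *= lam / (y + 1)             # P(Y = y + 1) from P(Y = y)

var = second_moment - mean**2      # Var(Y) = E(Y^2) - [E(Y)]^2
print(round(mean, 6))              # 3.5
print(round(var, 6))               # 3.5
```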
Setting up Wolfram Alpha
Step 1: Open a browser of your choosing.
(We have opted for Google Chrome in
this specific case, but any browser will
work.)
Step 2: Go to the https://www.wolframalpha.com/ website.
Normal Distribution E(Y), Var(Y)
We find the expected value by taking the sum of the products of each possible outcome and its chance of occurring.
Step 1: For continuous variables, this means using an integral going from negative infinity to infinity. The chance of
each outcome occurring is given by the PDF, f(y), so E(Y) = ∫_{−∞}^{∞} y f(y) dy.
Step 2: The PDF for a Normal Distribution is the following expression: f(y) = (1 / (σ√(2π))) × e^(−(y − μ)² / (2σ²))
Step 3: Thus, the expected value equals: E(Y) = ∫_{−∞}^{∞} y × (1 / (σ√(2π))) × e^(−(y − μ)² / (2σ²)) dy
Step 4: Since sigma and pi are constant numbers, we can take them out of the integral:
E(Y) = ∫_{−∞}^{∞} y × (1 / (σ√(2π))) × e^(−(y − μ)² / (2σ²)) dy = (1 / (σ√(2π))) ∫_{−∞}^{∞} y e^(−(y − μ)² / (2σ²)) dy
Step 5: We will substitute t in for (y − μ) / (√2 σ) to make the integral more manageable. To do so, we need to transform y and dy.
If t = (y − μ) / (√2 σ), then clearly y = μ + √2 σt. Knowing this, dy/dt = √2 σ, so dy = √2 σ dt. Therefore, we can substitute and take
the constant out of the integral, before simplifying:
(1 / (σ√(2π))) ∫_{−∞}^{∞} y e^(−(y − μ)² / (2σ²)) dy = (1 / (σ√(2π))) ∫_{−∞}^{∞} (μ + √2 σt) e^(−t²) √2 σ dt = (√2 σ / (σ√(2π))) ∫_{−∞}^{∞} (μ + √2 σt) e^(−t²) dt = (1 / √π) ∫_{−∞}^{∞} (μ + √2 σt) e^(−t²) dt
Step 6: We expand the expression within the parentheses and split the integral:
(1 / √π) ∫_{−∞}^{∞} (μ + √2 σt) e^(−t²) dt = (1 / √π) μ ∫_{−∞}^{∞} e^(−t²) dt + (1 / √π) √2 σ ∫_{−∞}^{∞} t e^(−t²) dt
Step 7: The first integral is the Gaussian integral, which equals √π, while the second integrand has antiderivative −(1/2) e^(−t²):
(1 / √π) μ ∫_{−∞}^{∞} e^(−t²) dt + (1 / √π) √2 σ ∫_{−∞}^{∞} t e^(−t²) dt = (1 / √π) ( μ√π + √2 σ [ −(1/2) e^(−t²) ]_{−∞}^{∞} )
Step 8: Since e^(−t²) goes to 0 at both −∞ and +∞, the second term vanishes:
(1 / √π) ( μ√π + √2 σ [ −(1/2) e^(−t²) ]_{−∞}^{∞} ) = (1 / √π) ( μ√π + 0 ) = (μ√π) / √π = μ
Step 9: Using calculus, we just showed that for a variable y which follows a Normal Distribution and has a PDF of
f(y) = (1 / (σ√(2π))) × e^(−(y − μ)² / (2σ²)), the expected value equals μ.
To find the Variance of the distribution, we need to use the relationship between Expected Value and Variance we
already know, namely:
Var(Y) = E(Y²) − [E(Y)]²
Step 1: We already know the expected value, so we can plug in μ² for [E(Y)]², hence:
Var(Y) = E(Y²) − μ²
Step 2: To compute the expected value for 𝑌 2 , we need to go over the same process we did when calculating the
expected value for 𝑌, so let’s quickly go over the obvious simplifications.
E(Y²) − μ² = ∫_{−∞}^{∞} y² × (1 / (σ√(2π))) × e^(−(y − μ)² / (2σ²)) dy − μ² =
(substituting t = (y − μ) / (√2 σ) as before, so y = μ + √2 σt and dy = √2 σ dt)
= (1 / (σ√(2π))) ∫_{−∞}^{∞} (√2 σt + μ)² e^(−t²) √2 σ dt − μ² =
= (√2 σ / (σ√(2π))) ∫_{−∞}^{∞} (√2 σt + μ)² e^(−t²) dt − μ² =
= (1 / √π) ∫_{−∞}^{∞} (√2 σt + μ)² e^(−t²) dt − μ² =
= (1 / √π) ∫_{−∞}^{∞} (2σ²t² + 2√2 σμt + μ²) e^(−t²) dt − μ² =
= (1 / √π) ( 2σ² ∫_{−∞}^{∞} t² e^(−t²) dt + 2√2 σμ ∫_{−∞}^{∞} t e^(−t²) dt + μ² ∫_{−∞}^{∞} e^(−t²) dt ) − μ²
Step 3: We already evaluated two of the integrals when finding the expected value, so let’s just use the results and
simplify.
(1 / √π) ( 2σ² ∫_{−∞}^{∞} t² e^(−t²) dt + 2√2 σμ ∫_{−∞}^{∞} t e^(−t²) dt + μ² ∫_{−∞}^{∞} e^(−t²) dt ) − μ² =
= (1 / √π) ( 2σ² ∫_{−∞}^{∞} t² e^(−t²) dt + 2√2 σμ × 0 + μ²√π ) − μ² =
= (1 / √π) 2σ² ∫_{−∞}^{∞} t² e^(−t²) dt + (1 / √π) μ²√π − μ² =
= (1 / √π) 2σ² ∫_{−∞}^{∞} t² e^(−t²) dt + μ² − μ² =
= (2σ² / √π) ∫_{−∞}^{∞} t² e^(−t²) dt
Step 4: Integrating by parts, we get:
(2σ² / √π) ∫_{−∞}^{∞} t² e^(−t²) dt = (2σ² / √π) ( [ −(t/2) e^(−t²) ]_{−∞}^{∞} + (1/2) ∫_{−∞}^{∞} e^(−t²) dt )
Step 5: The term (t/2) e^(−t²) goes to 0 at both −∞ and +∞, so:
(2σ² / √π) ( [ −(t/2) e^(−t²) ]_{−∞}^{∞} + (1/2) ∫_{−∞}^{∞} e^(−t²) dt ) =
= (2σ² / √π) ( 0 + (1/2) ∫_{−∞}^{∞} e^(−t²) dt ) =
= (2σ² / √π) × (1/2) ∫_{−∞}^{∞} e^(−t²) dt =
= (σ² / √π) ∫_{−∞}^{∞} e^(−t²) dt
Step 6: As we computed earlier, ∫_{−∞}^{∞} e^(−t²) dt = √π, which means:
(σ² / √π) ∫_{−∞}^{∞} e^(−t²) dt = (σ² / √π) × √π = σ²
Thus, the variance for a variable whose PDF is f(y) = (1 / (σ√(2π))) × e^(−(y − μ)² / (2σ²)) equals σ².
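Both results can be verified numerically by integrating the PDF on a fine grid. A rough sketch with hypothetical parameters μ = 2 and σ = 3, using a simple Riemann sum:

```python
import math

mu, sigma = 2.0, 3.0               # hypothetical parameters of N(mu, sigma^2)

def pdf(y):
    """Normal PDF: f(y) = exp(-(y - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-(y - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Riemann sum over a wide interval around the mean (the tails are negligible).
step = 0.001
grid = [mu - 10 * sigma + i * step for i in range(int(20 * sigma / step))]

expected = sum(y * pdf(y) * step for y in grid)
second_moment = sum(y**2 * pdf(y) * step for y in grid)
variance = second_moment - expected**2

print(round(expected, 4))    # ~2.0  (= mu)
print(round(variance, 4))    # ~9.0  (= sigma^2)
```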