Probability Distributions
If 275 males occur in 500 human births, the relative frequency of males is
f/n=275/500=0.55 (or 55%)
Biostatistics I Probability distributions 3
Master Degree Public Health 2016/2017
Probability of an event
The probability of an event is the likelihood of that event, expressed either
by the relative frequency observed in a large number of data or by
knowledge of the system under study.
Using the data from the preceding slide, we can estimate:
the probability that a human birth will be a male is 0.55.
the probability that a tossed coin lands head side up is 0.52.
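As a rough illustration of probability as a long-run relative frequency, the sketch below recomputes f/n for the birth data and then simulates a hypothetical fair coin. The helper names (`relative_frequency`, `simulate`), the seed and the true probability of 0.5 are assumptions made for the example, not part of the slides.

```python
import random

def relative_frequency(successes: int, trials: int) -> float:
    """Relative frequency f/n of an event seen `successes` times in `trials`."""
    return successes / trials

# Values from the text: 275 males in 500 human births.
births = relative_frequency(275, 500)
print(births)  # 0.55

# ASSUMPTION: a hypothetical fair coin (true probability 0.5), fixed seed.
rng = random.Random(42)

def simulate(p_true: float, n: int) -> float:
    """Relative frequency of 'success' in n simulated independent trials."""
    hits = sum(rng.random() < p_true for _ in range(n))
    return relative_frequency(hits, n)

# The estimate tends to settle near the true probability as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, simulate(0.5, n))
```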
Probability models
The sample space S of a random phenomenon is the set of all possible
outcomes.
An event is an outcome or a set of outcomes of a random phenomenon.
That is, an event is a subset of the sample space.
A probability model is a mathematical description of a random
phenomenon consisting of two parts: a sample space and a way of
assigning probabilities to events.
When one child is born, there are only two outcomes, male or female. The
sample space is S = {M, F}
When the National Health Survey records the body weights in a random
sample of adults, the sample space contains all possible adult weights over a
realistic interval.
Examples:
Discrete probability models (x = 0, 1, 2,..., n)
- Binomial distribution
- Poisson distribution
Normal distribution
Of all the continuous probability distributions, none is more widely used than
the family of normal distributions.
Normal distributions are extremely important in statistical inference.
They are also a good mathematical model for many (but certainly not all)
biological variables, such as blood pressure, bone mineral density, and the
height of plants or animals.
Formally, the normal probability density function can be represented by the
expression:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}\right]$$
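The density expression can be checked numerically. The sketch below codes the formula directly and compares it with `statistics.NormalDist.pdf` from the Python standard library. The function name `normal_pdf` and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-0.5 * ((x - mu) / sigma)**2) / sqrt(2 * pi * sigma**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)

# ASSUMPTION: illustrative parameters, chosen only for the comparison.
mu, sigma = 70.0, 10.0
for x in (50.0, 70.0, 80.0):
    by_hand = normal_pdf(x, mu, sigma)
    library = NormalDist(mu, sigma).pdf(x)
    print(x, by_hand, library)
```

The density is highest at x = µ and takes the same value at equal distances on either side of µ, which is the symmetry seen in the bell-shaped curve.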
[Figure: normal curve from µ − 3σ to µ + 3σ. On each side of the mean, the area is 34.1% between µ and 1σ, 13.6% between 1σ and 2σ, 2.2% between 2σ and 3σ, and 0.15% beyond 3σ.]
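The areas under the curve can be recomputed from the standard Normal cumulative distribution function. This is a minimal sketch using `statistics.NormalDist`; the variable names are illustrative. The computed areas for the two outer bands come out at about 2.1% and 0.13%, slightly different from the rounded labels of the figure.

```python
from statistics import NormalDist

std = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Area between k and k + 1 standard deviations above the mean, for k = 0, 1, 2.
bands = {(k, k + 1): std.cdf(k + 1) - std.cdf(k) for k in range(3)}
# Area beyond 3 standard deviations above the mean.
tail = 1.0 - std.cdf(3)

for (lo, hi), area in bands.items():
    print(f"{lo} to {hi} sd: {100 * area:.2f}%")
print(f"beyond 3 sd: {100 * tail:.2f}%")
```

By symmetry the same areas apply below the mean, so about 68% of values lie within 1σ of µ and about 95% within 2σ.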
There is not just one normal distribution, as one might naively believe when
encountering the same bell-shaped image.
Rather, the number of such curves is infinite, because the two parameters, the
parametric mean μ and the parametric standard deviation σ, can assume an
infinite number of values.
The curve is symmetrical around the mean; therefore, the mean and median
of the normal distribution are at the same point.
[Figure: normal curve from µ − 3σ to µ + 3σ, with the area of each one-standard-deviation band marked (34.1%, 13.6%, 2.2%, 0.15% on each side of the mean).]
That is, 95% of young women are between 151.13 and 176.53 cm tall.
The tallest 2.5% of young women are taller than 176.53 cm.
In probability terms, there is probability 0.025 approximately that a
randomly selected woman is taller than 176.53 cm.
$$Z = \frac{X - \mu}{\sigma}$$
has the standard Normal distribution.
[Figure: a normal curve on the original scale (µ − 3σ to µ + 3σ) is transformed by $Z_i = (X_i - \mu)/\sigma$ into the standard Normal curve on the scale −3 to 3. The areas of the bands are unchanged: 34.1%, 13.6%, 2.2% and 0.15% on each side of the mean.]
Example:
The distribution of weights of men in a given population
is Normal with mean μ = 70 kg and standard deviation σ = 10 kg.
What is the probability that a randomly selected man weighs more than 80 kg?
$$P(X > 80) = P\left(Z > \frac{80 - 70}{10}\right) = P(Z > 1) = 0.1587$$
What is the probability that a randomly selected man weighs less than 50 kg?
$$P(X < 50) = P\left(Z < \frac{50 - 70}{10}\right) = P(Z < -2) = 0.0228$$
What is the probability that a randomly selected man weighs between 50 and
80 kg?
$$\begin{aligned}
P(50 < X < 80) &= P(-2 < Z < 1) \\
&= 1 - P(Z < -2) - P(Z > 1) \\
&= 1 - P(Z > 2) - P(Z > 1) \\
&= 0.8185
\end{aligned}$$
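The three probabilities in this example can be reproduced without a table. The sketch below uses `statistics.NormalDist` with the mean and standard deviation given in the example; the variable names are illustrative. The last result comes out as 0.8186 rather than 0.8185, because the slide subtracts table values already rounded to four decimals.

```python
from statistics import NormalDist

# Parameters from the example: mean 70 kg, standard deviation 10 kg.
weights = NormalDist(mu=70, sigma=10)

p_over_80 = 1.0 - weights.cdf(80)               # P(X > 80) = P(Z > 1)
p_under_50 = weights.cdf(50)                    # P(X < 50) = P(Z < -2)
p_between = weights.cdf(80) - weights.cdf(50)   # P(50 < X < 80)

print(f"P(X > 80)      = {p_over_80:.4f}")
print(f"P(X < 50)      = {p_under_50:.4f}")
print(f"P(50 < X < 80) = {p_between:.4f}")
```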
$$z_1 = \frac{X_1 - 145}{4} \Rightarrow 1.645 = \frac{X_1 - 145}{4}$$
• Solving this equation for X1 gives X1 = 145 + 4 × 1.645 = 151.6 cm
What is the body length above which 95% of the fish are found?
• X2 = ? : P(x > X2) = 0.95
• Finding the corresponding z value:
• P(Z > z2) = 0.95 ⇔ P(Z < z2) = 0.05
• As P(Z > 1.645) = 0.05, then P(Z < −1.645) = 0.05: z2 = −1.645
• Unstandardize to transform z back to the original scale
$$z_2 = \frac{X_2 - 145}{4} \Rightarrow -1.645 = \frac{X_2 - 145}{4}$$
Solving this equation for X2 gives X2 = 145 − 4 × 1.645 = 138.4 cm
$$z_4 = \frac{X_4 - 145}{4} \Rightarrow 1.96 = \frac{X_4 - 145}{4} \Leftrightarrow X_4 = 152.8 \text{ cm}$$
$$z_3 = \frac{X_3 - 145}{4} \Rightarrow -1.96 = \frac{X_3 - 145}{4} \Leftrightarrow X_3 = 137.2 \text{ cm}$$
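Finding a value from a probability is the inverse of the previous calculations. The sketch below uses `NormalDist.inv_cdf`, assuming body lengths are Normal with mean 145 cm and standard deviation 4 cm as the equations above imply; the variable names are illustrative.

```python
from statistics import NormalDist

# ASSUMPTION: fish body length ~ Normal(mean 145 cm, sd 4 cm), read off the equations.
lengths = NormalDist(mu=145, sigma=4)

x1 = lengths.inv_cdf(0.95)    # length below which 95% of the fish lie (z = 1.645)
x2 = lengths.inv_cdf(0.05)    # length above which 95% of the fish lie (z = -1.645)
x3 = lengths.inv_cdf(0.025)   # lower limit of the central 95% (z = -1.96)
x4 = lengths.inv_cdf(0.975)   # upper limit of the central 95% (z = 1.96)

for name, value in (("X1", x1), ("X2", x2), ("X3", x3), ("X4", x4)):
    print(f"{name} = {value:.1f} cm")
```

The same results follow by hand from X = µ + zσ, which is the "unstandardize" step.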
Sampling distribution of means, with mean $\mu$ and standard deviation $\sigma_{\bar{X}}$.
[Figure: normal curve of sample means on the scale µ − 3σ to µ + 3σ, with band areas 34.1%, 13.6%, 2.2% and 0.15% on each side of the mean.]
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
[Figure: Standard Normal distribution (z) on the scale −3 to 3, with the same band areas.]
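$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{50.0 - 47.0}{4.0} = 0.75$$
$$P(\bar{X} > 50.0) = P(Z > 0.75) = 0.2266$$
With $\sigma_{\bar{X}} = 2.4$ cm, $Z = (40.0 - 47.0)/2.4 = -2.92$ and
$$P(\bar{X} < 40.0) = P(Z < -2.92) = P(Z > 2.92) = 0.0018$$
(Zar, 2010).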
If 500 random samples of size 25 are taken from the preceding population,
how many would have means larger than 50.0 cm?
$$\sigma_{\bar{X}} = \frac{12}{\sqrt{25}} = 2.4 \text{ cm}$$
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{50.0 - 47.0}{2.4} = 1.25$$
$$P(\bar{X} > 50.0) = P(Z > 1.25) = 0.1056$$
Therefore 0.1056 × 500 = 53 samples would be expected to have means larger than
50.0 cm.
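The calculation for sample means can be sketched as follows, taking µ = 47.0 cm, σ = 12 cm and n = 25 from the example. The helper `standard_error` and the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Values from the example: population mean 47.0 cm, sd 12 cm, samples of size 25.
mu, sigma, n = 47.0, 12.0, 25
se = standard_error(sigma, n)            # 2.4 cm
sampling = NormalDist(mu, se)            # distribution of the sample mean

p_above_50 = 1.0 - sampling.cdf(50.0)    # P(mean > 50.0) = P(Z > 1.25)
p_below_40 = sampling.cdf(40.0)          # P(mean < 40.0) = P(Z < -2.92)
expected_samples = 500 * p_above_50      # expected count among 500 samples

print(f"standard error = {se:.1f} cm")
print(f"P(mean > 50.0) = {p_above_50:.4f}")
print(f"P(mean < 40.0) = {p_below_40:.4f}")
print(f"expected samples above 50.0 cm: {expected_samples:.0f}")
```

Larger samples shrink the standard error, so a sample mean as far from µ as 50.0 cm becomes less likely as n grows.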
Binomial distribution
The count X of successes in the binomial setting has the binomial distribution with
parameters n and p.
The parameter n is the number of observations, and p is the probability of a
success on any one observation.
The possible values of X are the whole numbers from 0 to n.
Binomial probability
If X has the binomial distribution with n observations and probability p of success
on each observation, the possible values of X are 0, 1, 2, …, n. If k is any one of
these values,
$$P(X = k) = C_k^n \, p^k (1 - p)^{n-k}$$
where
$$C_k^n = \frac{n!}{k!\,(n - k)!}$$
The count of children with type O blood is a binomial random variable X with n = 5
tries and probability p = 0.25 of success on each try. We want P(X = 2).
1. Find the probability that a specific 2 of the 5 tries give successes. The
probability is: 0.25 × 0.25 × 0.75 × 0.75 × 0.75 = (0.25)^2 × (0.75)^3
2. Any arrangement of 2 successes and 3 failures has this same probability. Here
are all the possible arrangements:
SSFFF SFSFF SFFSF SFFFS FSSFF
FSFSF FSFFS FFSSF FFSFS FFFSS
3. There are 10 of them, all with the same probability. The overall probability of 2
successes is therefore:
P(X = 2) = 10 × (0.25)^2 × (0.75)^3 = 0.2637
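The three steps above can be sketched in a few lines: `math.comb` counts the arrangements and the product of powers gives the probability of each one. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial count with n observations and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Values from the example: n = 5 children, p = 0.25 for type O blood.
n, p = 5, 0.25
print("arrangements of 2 successes in 5 tries:", comb(n, 2))
print(f"P(X = 2) = {binomial_pmf(2, n, p):.4f}")

# Full distribution; the probabilities add up to 1.
for k in range(n + 1):
    print(f"P(X = {k}) = {binomial_pmf(k, n, p):.4f}")
```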
P(X = 0) = 0.2373
P(X = 1) = 0.3955
P(X = 2) = 0.2637
P(X = 3) = 0.0879
P(X = 4) = 0.0146
P(X = 5) = 0.0010
[Figure: bar chart of these probabilities, with probability (0 to 0.5) on the vertical axis and count (0 to 5) on the horizontal axis.]
If the probability that a human birth will be a male is 0.50, the mean
number of baby boys in 10 births overseen by an obstetrician should be 50% of 10,
or 5.
In general, a binomial count has mean $\mu = np$ and standard deviation
$$\sigma = \sqrt{np(1 - p)}$$
[Figure: bar chart with probability (0 to 0.5) on the vertical axis and count (0 to 5) on the horizontal axis.]
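The mean and standard deviation of the binomial distribution with n = 5 and p =
0.25 are
$$\mu = np = 5 \times 0.25 = 1.25$$
$$\sigma = \sqrt{np(1 - p)} = \sqrt{5 \times 0.25 \times 0.75} = 0.968$$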
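For n = 5 and p = 0.25, the mean and standard deviation can be computed from the closed-form expressions np and √(np(1 − p)) and cross-checked against the definition of a mean and variance over the six possible counts. The function names in this sketch are illustrative.

```python
import math
from math import comb

def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Closed-form mean n*p and standard deviation sqrt(n*p*(1-p))."""
    return n * p, math.sqrt(n * p * (1 - p))

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial count."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 5, 0.25
mean, sd = binomial_mean_sd(n, p)

# Cross-check: probability-weighted average and spread over k = 0..n.
mean_check = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_check = sum((k - mean_check) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"mean = {mean:.2f}, sd = {sd:.3f}")
print(f"from the distribution: mean = {mean_check:.2f}, sd = {math.sqrt(var_check):.3f}")
```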
The Normal approximation to the binomial distribution does not work well when n
is small. As a rule of thumb, np and n(1-p) should be at least 5.
We will use the Normal approximation when n is so large that:
np ≥ 5 and n(1-p) ≥ 5
What is the probability that a randomly selected fish has a length greater than
149 cm?
$$P(X > 149) = P\left(Z > \frac{149 - 145}{4}\right) = P(Z > 1) = 0.1587$$
What is the probability that exactly 2 fish in a sample of 5 have a length greater than 149 cm?
$$P(X = 2) = \frac{5!}{2!\,(5 - 2)!} \times 0.16^{2} \times 0.84^{3} = 0.15$$
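This example chains the two models: a Normal tail probability becomes the success probability of a binomial count. The sketch below assumes fish lengths are Normal with mean 145 cm and standard deviation 4 cm, as in the standardization above, and compares the slide's rounded p = 0.16 with the unrounded value. The function and variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial count."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# ASSUMPTION: lengths ~ Normal(mean 145 cm, sd 4 cm), as used in the example.
p_long = 1.0 - NormalDist(mu=145, sigma=4).cdf(149)   # P(length > 149 cm)

p_rounded = binomial_pmf(2, 5, 0.16)       # with p rounded to 0.16, as on the slide
p_unrounded = binomial_pmf(2, 5, p_long)   # with the unrounded tail probability

print(f"P(length > 149) = {p_long:.4f}")
print(f"P(2 of 5), p = 0.16    : {p_rounded:.4f}")
print(f"P(2 of 5), p unrounded : {p_unrounded:.4f}")
```

Both versions round to 0.15, so the rounding of p does not change the answer at two decimals.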
For slightly more accurate results with the Normal distribution, you can use a
continuity correction. Because counts can only take integer values but the Normal
distribution can take any real value, the proper continuous equivalent to a count is
the interval around it with size 1.
In the example, the continuous equivalent to a count of 20 is the interval between 19.5
and 20.5.
$$P(X > 19.5) = P\left(Z > \frac{19.5 - 15.87}{3.65}\right) = P(Z > 0.99) = 0.1611$$
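The continuity correction can be compared with the exact binomial probability. The mean 15.87 and standard deviation 3.65 are consistent with n = 100 and p = 0.1587; n = 100 is an inference from those numbers, not a value stated on the slide. Reading P(x > 19.5) as "a count of 20 or more" is likewise an assumption of this sketch.

```python
import math
from math import comb
from statistics import NormalDist

# ASSUMPTION: n = 100 is inferred from mean 15.87 = n * 0.1587; it is not in the source.
n, p = 100, 0.1587
mean = n * p                          # 15.87
sd = math.sqrt(n * p * (1 - p))       # about 3.65

# Normal approximation with continuity correction: a count >= 20 starts at 19.5.
approx = 1.0 - NormalDist(mean, sd).cdf(19.5)

# Exact binomial probability of a count of 20 or more.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(20, n + 1))

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"Normal approximation with continuity correction: {approx:.4f}")
print(f"Exact binomial: {exact:.4f}")
```

The approximation printed here is slightly below the 0.1611 on the slide, because the slide rounds z to 0.99 before using the table.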
References
The text of these slides may be found in the following references*:
Baldi B. and D.S. Moore - The Practice of Statistics in the Life Sciences, W.H. Freeman and
Company, 2012.
Sokal R. and F.J. Rohlf - Biometry: The Principles and Practice of Statistics in Biological
Research, 4th edition, W.H. Freeman and Company, 2012.
Zar J.H. - Biostatistical Analysis, 5th edition, Prentice-Hall International Inc., 2010.
* The copy and reproduction of portions created by other authors was done only for
educational use.