Chapter 3 of 3rd ed.
Chapter 2
Discrete Random Variables
Chien-Kang Huang (黃乾綱)
Department of Engineering Science and Ocean Engineering, National Taiwan University
1
Outline
2.1 Definitions (3.1)
2.2 Probability Mass Function (3.2)
2.3 Families of Discrete Random Variables (3.3)
2.4 Cumulative Distribution Function (CDF) (3.4)
2.5 Averages (3.5)
2.6 Functions of a Random Variable (3.6)
2.7 Expected Value of a Derived Random Variable (3.7)
2.8 Variance and Standard Deviation (3.8)
2.9 Conditional Probability Mass Function (7.1)
2
3.1
2.1 Definitions
3
Review of Probability Model
• It begins with a physical model of an experiment.
– An experiment consists of a procedure and observations.
• The set of all possible observations, S, is the sample space of
the experiment.
• S is the beginning of the mathematical probability model.
• In addition to S, the mathematical model includes a rule for
assigning numbers between 0 and 1 to sets A in S.
• Thus for every A ⊆ S, the model gives us a probability P[A],
where 0 ≤ P[A] ≤ 1.
4
Terminology
• We will examine probability models that assign numbers to
the outcomes in the sample space.
• Random variable, X
– That is, the observation
• Range, SX
– The set of possible values of X is the range of X
– Analog to S, the set of all possible outcomes of an experiment.
5
Probability Model and Experiment
• A probability model always begins with an experiment.
Each random variable is related directly to this experiment.
There are three types of relationships.
1. The random variable is the observation.
– Example 2.1
2. The random variable is a function of the observation.
– Example 2.2
3. The random variable is a function of another random
variable.
– Example 2.3
6
Example 3.1
Example 2.1 (The random variable is the
observation)
7
Example 3.2
Example 2.2 (The random variable is a function
of the observation)
8
Example 3.3
Example 2.3 (The random variable is a function
of another random variable)
9
Definition 3.1
Definition 2.1
• A random variable is the result of an underlying experiment, but it
also permits us to separate the experiment, in particular the
observations, from the process of assigning numbers to outcomes.
• In some definitions of experiments, the procedures contain variable
parameters. In these experiments, there can be values of the
parameters for which it is impossible to perform the observations
specified in the experiments. In these cases, the experiments do not
produce random variables. We refer to experiments with parameter
settings that do not produce random variables as improper
experiments.
10
Example 3.4
Example 2.4 improper experiments
11
Notation
• On occasion, it is important to identify the random variable X
by the function X(s) that maps the sample outcomes s to the
corresponding value of the random variable X.
• As needed we will write {X = x} to emphasize that there is a
set of sample points s ∈ S for which X(s) = x; that is, we have
adopted the shorthand notation
{X = x} = {s ∈ S | X(s) = x}
• More random variables:
– A, the number of students asleep in the next probability lecture;
– C, the number of phone calls you answer in the next hour;
– M, the number of minutes you wait until you next answer the phone.
– Random variables A and C are discrete, M is continuous. (Chapter 3)
12
Definition 3.2
Definitions 2.2 ~ 2.3
– By contrast, a random variable Y that can take on any real number y in
an interval a ≤ y ≤ b is a continuous random variable.
13
• Often, but not always, a discrete random variable takes on
integer values. An exception is the random variable related
to your probability grade.
– The experiment is to take this course and observe your grade.
– At Rutgers, the sample space is
S = {F, D, C, C+, B, B+, A}.
– The function G(·) that transforms this sample space into a random
variable, G, is
G(F) = 0, G(D) = 1, G(C) = 2, G(C+) = 2.5, G(B) = 3, G(B+) = 3.5, G(A) = 4
– G is a finite random variable. Its values are in the set
SG = {0, 1, 2, 2.5, 3, 3.5, 4}.
– Random variables allow us to compute averages. In the mathematics
of probability, averages are called expectations or expected values of
random variables.
14
Example 2.5
15
Quiz 3.1 Quiz 2.1
16
3.2
2.2 Probability Mass Function
17
Definition 3.3
Definition 2.4 Probability Mass Function (PMF)
• Recall that a discrete probability model assigns a number
between 0 and 1 to each outcome in a sample space.
• When we have a discrete random variable X, we express the
probability model as a probability mass function (PMF).
• The argument of the PMF ranges over all real numbers.
18
Note and Notation
• Note that X = x is an event consisting of all outcomes s of the
underlying experiment for which X(s) = x. On the other hand,
PX(x) is a function ranging over all real numbers x. For any
value of x, the function PX(x) is the probability of the event X =
x.
• Notation
Notation Meaning
X The name of a random variable
x A possible value of the random
variable
PX() The PMF of random variable X
19
Example 2.6
20
Example 2.6 Solution
The final line is necessary to specify the function at all other numbers.
It is helpful to keep this part of the definition in mind when working with the PMF.
21
Example 2.7
22
Example 2.7 Solution
P_X(x) = 1/4   x = 0
         1/2   x = 1
         1/4   x = 2
         0     otherwise
23
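As a sketch of how a PMF behaves as an ordinary function of a real argument (the helper name `P_X` is hypothetical), the three-valued PMF of Example 2.7 can be written in Python, and the requirement that the probabilities sum to 1 over S_X checked directly:

```python
def P_X(x):
    """PMF of Example 2.7: defined for every real x, zero off the support."""
    return {0: 0.25, 1: 0.5, 2: 0.25}.get(x, 0)

# The PMF is zero outside S_X = {0, 1, 2}.
assert P_X(0.5) == 0

# Summing over the range S_X gives 1 (Theorem 2.1).
total = sum(P_X(x) for x in [0, 1, 2])
print(total)  # 1.0
```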
Theorem 3.1
Theorem 2.1
24
Theorem 2.1 Proof
25
Quiz 3.2
Quiz 2.2
26
Quiz 2.2 Solution
27
3.3
2.3 Families of Discrete
Random Variables
6 families of discrete random variables
Bernoulli, Geometric, Binomial, Pascal,
Discrete Uniform, Poisson
28
• In this section, we define 6 families of discrete random
variables.
• There is one formula for the PMF of all the random variables
in a family. Depending on the family, the PMF formula
contains one or two parameters. By assigning numerical
values to the parameters, we obtain a specific random
variable.
– Binomial(n, p) → Binomial(7, 0.1), given n = 7, p = 0.1.
• Appendix A summarizes important properties of 17 families of
random variables.
29
Example 3.6
Example 2.8
• Consider the following experiments:
1. Flip a coin and let it land on a table. Observe whether the side facing
up is heads or tails. Let X be the number of heads observed.
2. Select a student at random and find out her telephone number. Let X =
0 if the last digit is even. Otherwise, let X = 1.
3. Observe one bit transmitted by a modem that is downloading a file from
the Internet. Let X be the value of the bit (0 or 1).
• All three experiments lead to the probability mass function
P_X(x) = 1/2   x = 0
         1/2   x = 1
         0     otherwise
30
Definition 3.4
Definition 2.5 Bernoulli (p) R. V.
P_X(x) = 1 − p   x = 0
         p       x = 1
         0       otherwise
31
Example 3.7
Example 2.9
32
Example 2.10 ~ 2.11
Example 3.8
33
Example 2.11 Solution
34
Definition 3.5
Definition 2.6 Geometric (p) R. V.
P_X(x) = p(1 − p)^(x−1)   x = 1, 2, …
         0                otherwise
35
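A minimal numerical sketch (hypothetical function name), confirming that the geometric probabilities p(1 − p)^(x−1) decay geometrically and sum to 1 over x = 1, 2, …:

```python
def geometric_pmf(x, p):
    """P_X(x) = p (1 - p)^(x - 1) for x = 1, 2, ...; 0 otherwise."""
    return p * (1 - p) ** (x - 1) if isinstance(x, int) and x >= 1 else 0

p = 0.3
# The support is infinite, but a long partial sum is already
# within (1 - p)^199 of 1.
partial = sum(geometric_pmf(x, p) for x in range(1, 200))
print(partial)
```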
Example 2.12
36
Example 3.9 Example 2.13
37
Definition 3.6
Definition 2.7 Binomial (n, p) R. V.
• Whenever we have a sequence of n independent trials, each
with success probability p, the number of successes is a
binomial random variable.
• A Bernoulli (p) random variable is a binomial (n = 1, p) random
variable.
P_X(x) = C(n, x) p^x (1 − p)^(n−x)   x = 0, 1, …, n
         0                           otherwise
38
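A sketch of the binomial PMF using the standard-library binomial coefficient (function name is hypothetical), checking the two facts stated above: n = 1 reduces to Bernoulli (p), and the probabilities sum to 1:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P_X(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n; 0 otherwise."""
    return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0

# n = 1 reduces to the Bernoulli(p) PMF.
assert binomial_pmf(1, 1, 0.4) == 0.4
assert binomial_pmf(0, 1, 0.4) == 0.6

# The Binomial(7, 0.1) probabilities sum to 1.
total = sum(binomial_pmf(x, 7, 0.1) for x in range(8))
print(total)
```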
Example 2.14 ~ 2.15
Example 3.10
39
Example 2.15 Solution
40
Definition 3.7
Definition 2.8 Pascal (k, p) R. V.
Negative Binomial (k, p) Random Variable
• For a sequence of independent trials each with success
probability p, a Pascal random variable is the number of trials
up to and including the k-th success.
• A geometric (p) random variable is a Pascal (k = 1, p) random
variable.
P_X(x) = C(x − 1, k − 1) p^k (1 − p)^(x−k)   x = k, k + 1, …
         0                                   otherwise
41
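A numerical sketch (hypothetical function name) checking the relationship stated above: Pascal (k = 1, p) matches the geometric (p) PMF term by term:

```python
from math import comb

def pascal_pmf(x, k, p):
    """P_X(x) = C(x - 1, k - 1) p^k (1 - p)^(x - k): trial of the k-th success."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k) if x >= k else 0

p = 0.25
# k = 1 reduces to the geometric(p) PMF p (1 - p)^(x - 1).
match = all(abs(pascal_pmf(x, 1, p) - p * (1 - p)**(x - 1)) < 1e-12
            for x in range(1, 20))
print(match)
```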
Example 2.16
42
Example 3.11
Example 2.17
43
Definition 3.8
Definition 2.9 Discrete Uniform (k, l) R. V.
• X is uniformly distributed between k and l.
P_X(x) = 1 / (l − k + 1)   x = k, k + 1, k + 2, …, l
         0                 otherwise
44
Example 3.12
Example 2.18
45
Poisson Random Variable
• The probability model of a Poisson random variable describes
phenomena that occur randomly in time.
• While the time of each occurrence is completely random, there
is a known average number of occurrences per unit time.
• The Poisson model is used widely in many fields
– Arrival of information requests at a WWW server
– Initiation of telephone calls
– Emission of particles from a radioactive source
46
Definition 3.9
Definition 2.10 Poisson (α) R. V.
• To describe a Poisson random variable, we will call the
occurrence of the phenomenon of interest an arrival.
λ: average rate (arrivals per second), T: interval length (seconds)
α = λT
P_X(x) = α^x e^(−α) / x!   x = 0, 1, 2, …
         0                 otherwise
47
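A sketch of the Poisson PMF with the α = λT parameterization (function and variable names are hypothetical):

```python
from math import exp, factorial

def poisson_pmf(x, alpha):
    """P_X(x) = alpha^x e^(-alpha) / x! for x = 0, 1, 2, ...; 0 otherwise."""
    return alpha**x * exp(-alpha) / factorial(x) if x >= 0 else 0

# Example parameterization: 2 arrivals/second observed over 3 seconds.
lam, T = 2.0, 3.0
alpha = lam * T  # alpha = 6 expected arrivals in the interval
print(poisson_pmf(0, alpha))  # probability of no arrivals, e^(-6)
```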
Poisson ↔ Binomial
Example 3.13
Example 2.19
48
Example 2.19 Solution
49
Example 2.20
50
Example 2.20 Solution
51
Example 3.14
Example 2.21
52
Example 2.22
53
Quiz 3.3
Quiz 2.3
54
Quiz 2.3 Solution
(1 − p)^9 = (0.9)^9 = 0.3874
55
Quiz 2.3 Solution (continued)
56
3.4
2.4 Cumulative Distribution
Function (CDF)
57
Definition 3.10
Definition 2.11
• For any real number x, the CDF is the probability that the
random variable X is no larger than x.
• All random variables have cumulative distribution functions,
but only discrete random variables have probability mass
functions.
58
Theorem 3.2
Theorem 2.2
59
Equivalent Statements for Theorem 2.2
(a) Going from left to right on the x-axis, FX(x) starts at zero and
ends at one.
(b) The CDF never decreases as it goes from left to right.
(c) For a discrete random variable X, there is a jump
(discontinuity) at each value of xi SX. The height of the jump
at xi is PX(xi).
(d) Between jumps, the graph of the CDF of the discrete random
variable X is a horizontal line.
60
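The four properties above can be seen in a small sketch (hypothetical helper name) that accumulates PMF mass to build the CDF of a discrete random variable:

```python
def cdf_from_pmf(pmf, x):
    """F_X(x) = P[X <= x]: total PMF mass at support points not exceeding x."""
    return sum(p for s, p in pmf.items() if s <= x)

pmf = {0: 0.25, 1: 0.5, 2: 0.25}
# Left to right: starts at 0, never decreases, jumps by P_X(x_i) at each
# x_i in S_X, is flat between jumps, and ends at 1.
values = [cdf_from_pmf(pmf, x) for x in [-1, 0, 0.5, 1, 2, 3]]
print(values)  # [0, 0.25, 0.25, 0.75, 1.0, 1.0]
```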
Theorem 3.3
Theorem 2.3
61
Note
• It is necessary to pay careful attention to the nature of
inequalities, strict (<) or loose (≤).
• The definition of the CDF contains a loose inequality, which
means that the function is continuous from the right.
62
Example 2.23
63
Example 2.23 Solution
64
Note
• Consider any finite random variable X with possible values
(nonzero probability) between x_min and x_max.
• For this random variable, the numerical specification of the
CDF begins with
F_X(x) = 0   x < x_min
and ends with
F_X(x) = 1   x ≥ x_max
65
Example 3.16
Example 2.24
66
Example 2.24 Solution
67
Example 2.24 Solution (continued)
68
Quiz 3.4
Quiz 2.4
69
Quiz 2.4 Solution
70
3.5
2.5 Averages
71
Averages
• The average value of a collection of numerical observations is
a statistic of the collection, a single number (parameter) that
describes the entire collection.
• Mean: adding up all the numbers in the collection and dividing
by the number of terms in the sum.
• Median: the median is a number in the middle of the set of
numbers, in the sense that an equal number of members of
the set are below the median and above the median.
• Mode: the mode is the most common number in the collection
of observations. If there are two or more numbers with this
property, the collection of observations is called multimodal.
72
Example 3.17
Example 2.25
4, 5, 5, 5, 7, 7, 8, 8, 9, 10
73
Definition 2.12 ~ 2.13
74
Comments on Mode and Median
• Can a random variable have several modes?
• Can a random variable have several medians?
• Yes: a random variable can have several modes or medians.
75
Definition 3.11
Definition 2.14 Expected Value
E[X] = μ_X = Σ_{x ∈ S_X} x P_X(x)
76
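The definition translates directly into a one-line sum over the range of X (helper name is hypothetical):

```python
def expected_value(pmf):
    """E[X] = sum over x in S_X of x * P_X(x)."""
    return sum(x * p for x, p in pmf.items())

# Example 2.7's PMF has mean 0(1/4) + 1(1/2) + 2(1/4) = 1.
print(expected_value({0: 0.25, 1: 0.5, 2: 0.25}))  # 1.0
```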
Comments on Expectation
• expected value = expectation = mean value
• μ_X looks like a center of mass; that is why P_X(x) is called a
probability mass function.
• X: a random variable,
n: number of independent trials,
x(1), …, x(n): sample values; x ∈ S_X occurs N_x times
m_n: sample average
m_n = (1/n) Σ_{i=1}^{n} x(i) = (1/n) Σ_{x ∈ S_X} N_x x = Σ_{x ∈ S_X} (N_x / n) x
• Recall Section 1.3, relative frequency:
P[A] = lim_{n→∞} N_A / n,   P_X(x) = lim_{n→∞} N_x / n
so that
lim_{n→∞} m_n = Σ_{x ∈ S_X} x P_X(x) = E[X]
77
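The relative-frequency argument above can be illustrated with a small simulation (a sketch; seed and sample size are arbitrary choices): the sample average m_n of i.i.d. draws from a PMF approaches E[X] as n grows.

```python
import random

random.seed(7)  # fixed seed so the run is reproducible
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
values, weights = zip(*pmf.items())
expected = sum(x * p for x, p in pmf.items())  # E[X] = 1

# m_n = (1/n) sum x(i) = sum_x (N_x / n) x, and N_x / n -> P_X(x),
# so m_n -> E[X].
n = 100_000
samples = random.choices(values, weights, k=n)
m_n = sum(samples) / n
print(abs(m_n - expected))  # small
```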
Theorem 3.4 ~ 3.7
Theorem 2.4 ~ 2.7
Random Variable          Expected Value, E[X]
Bernoulli (p)            p
Geometric (p)            1/p
Poisson (α)              α
Binomial (n, p)          np
Pascal (k, p)            k/p
Discrete uniform (k, l)  (k + l) / 2
78
Theorem 3.4
Theorem 2.4
79
Example 2.26
80
Theorem 3.5
Theorem 2.5
With q = 1 − p:
E[X] = Σ_{x=1}^∞ x p q^(x−1) = p Σ_{x=1}^∞ x q^(x−1)
     = p d/dq [ Σ_{x=1}^∞ q^x ] = p d/dq [ q / (1 − q) ]
     = p / (1 − q)^2 = p / p^2 = 1/p
81
Theorem 3.6
Theorem 2.6
82
Theorem 3.7
Theorem 2.7
83
Theorem 3.8
Theorem 2.8
lim_{n→∞} Binomial(n, α/n) = Poisson(α)
lim_{n→∞} C(n, k) (α/n)^k (1 − α/n)^(n−k) = α^k e^(−α) / k!
84
Proof: Theorem 2.8
85
Hints for Theorem 2.8
Poisson(α): probability of k arrivals in T seconds,
given average arrival rate λ = α/T, so α = λT.
[Figure: the interval [0, T] is divided into n slots i = 1, …, n,
each of length T/n; k of the n slots contain an arrival.]
Binomial(n, p): probability of selecting k slots from n slots,
given the probability of one arrival in one slot:
p = λ(T/n) = α/n
86
Proof of Theorem 2.8
• K_n is the binomial (n, α/n) random variable with PMF, for k = 0, …, n:
P_{K_n}(k) = C(n, k) (α/n)^k (1 − α/n)^(n−k)
           = [ n(n−1)⋯(n−k+1) / n^k ] (α^k / k!) (1 − α/n)^n (1 − α/n)^(−k)
• As n → ∞:
lim_{n→∞} n(n−1)⋯(n−k+1) / n^k = 1
lim_{n→∞} (1 − α/n)^n = e^(−α)
lim_{n→∞} (1 − α/n)^(−k) = 1
so
lim_{n→∞} P_{K_n}(k) = α^k e^(−α) / k!   k = 0, 1, …
                       0                 otherwise
87
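As a quick numerical sanity check of this limit (helper name is hypothetical), the binomial (n, α/n) PMF at a fixed k approaches the Poisson (α) PMF as n grows:

```python
from math import comb, exp, factorial

alpha, k = 2.0, 3
poisson = alpha**k * exp(-alpha) / factorial(k)

def binom_at_k(n):
    p = alpha / n  # p = alpha/n keeps the mean n*p = alpha fixed
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The gap to the Poisson value shrinks toward 0 as n increases.
errors = [abs(binom_at_k(n) - poisson) for n in (10, 100, 1000, 10000)]
print(errors)
```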
Supplements
(1 + x/n)^n → e^x as n → ∞
• Proof: take logarithms,
ln (1 + x/n)^n = n ln(1 + x/n) = x · [ ln(1 + x/n) − ln 1 ] / (x/n)
As n → ∞, h = x/n → 0, and
lim_{h→0} [ ln(1 + h) − ln 1 ] / h = d/dt ln t |_{t=1} = 1
so n ln(1 + x/n) → x and (1 + x/n)^n → e^x.
More generally, lim_{h→0} [ ln(t + h) − ln t ] / h = 1/t.
88
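A one-line numerical check of the limit (the choice x = −2 is arbitrary; negative x is the case used in Theorem 2.8 with x = −α):

```python
from math import exp

x = -2.0
# (1 + x/n)^n for increasing n approaches e^x.
approx = [(1 + x / n) ** n for n in (10, 1000, 100000)]
print(approx, exp(x))
```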
Quiz 2.5
89
Quiz 2.5 Solution
90
3.6
2.6 Functions of a Random
Variable
91
Definition 3.12
Definition 2.15 Derived Random Variable
92
Examples 2.27
93
Example 2.27 Solution
94
Theorem 3.9
Theorem 2.9
P_Y(y) = Σ_{x: g(x) = y} P_X(x)
95
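The theorem's sum over {x : g(x) = y} can be sketched generically (helper name is hypothetical): walk the PMF of X and accumulate mass onto each derived value y = g(x).

```python
def derived_pmf(pmf, g):
    """P_Y(y) = sum of P_X(x) over all x with g(x) = y."""
    out = {}
    for x, p in pmf.items():
        y = g(x)
        out[y] = out.get(y, 0) + p  # several x may map to the same y
    return out

# Y = X^2 merges x = -1 and x = 1 into the single value y = 1.
P_X = {-1: 0.25, 0: 0.5, 1: 0.25}
P_Y = derived_pmf(P_X, lambda x: x * x)
print(P_Y)  # mass 0.5 at y = 0 and 0.5 at y = 1
```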
Example 3.20
Example 2.28
96
Example 2.28 Solution
97
Example 3.21
Example 2.29
98
Figure 2.1
99
Example 2.29 Solution
E[Y] = 0.15 (10 + 19 + 27 + 34) + 0.10 (40) + 0.30 (50) = 32.5
100
Example 2.30
101
Example 2.30 Solution
102
Quiz 2.6
103
Quiz 2.6 Solution
104
3.7
2.7 Expected Value of a
Derived Random Variable
105
Theorem 3.10
Theorem 2.10
106
Theorem 2.10 Proof
107
Example 3.22
Example 2.31
108
Example 2.31 Solution
109
Theorem 3.11
Theorem 2.11
110
Theorem 3.12
Theorem 2.12
111
Example 2.32
112
Example 2.33
113
Quiz 3.7
Quiz 2.7
114
Quiz 2.7 Solution
115
3.8
2.8 Variance and Standard
Deviation
116
Dispersion
• Average is one number that summarizes an entire probability
model.
• Questions
– How typical is the average?
– What are the chances of observing an event far from the average?
• A measure of dispersion is an answer to these questions
wrapped up in a single number.
– Small measure → observations are likely to be near the average.
• The most important measures of dispersion are the standard
deviation and its close relative, the variance.
E[X − μ_X] = 0, so instead we use
E[(X − μ_X)^2]
117
Definition 3.13, 3.14
Definition 2.16 ~ 2.17
118
Comments
• σ_X has the same units as X.
• σ_X can be compared directly with the expected value.
• Informally, we think of outcomes within σ_X of μ_X as being in
the center of the distribution.
• Informally, we think of sample values within σ_X of the expected
value, x ∈ [μ_X − σ_X, μ_X + σ_X], as “typical” values of X and other
values as “unusual”.
Var[X] = σ_X^2 = Σ_{x ∈ S_X} (x − μ_X)^2 P_X(x)
119
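A small sketch (helper name is hypothetical) computing the variance from the definition above, and checking it against the shortcut Var[X] = E[X^2] − (E[X])^2 used later in Theorem 2.13:

```python
def variance(pmf):
    """Var[X] via the definition: sum of (x - mu)^2 P_X(x) over S_X."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

pmf = {0: 0.25, 2: 0.75}
mu = sum(x * p for x, p in pmf.items())                 # E[X] = 3/2
second_moment = sum(x * x * p for x, p in pmf.items())  # E[X^2] = 3
# Both routes give the same number.
print(variance(pmf), second_moment - mu ** 2)  # 0.75 0.75
```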
Theorem 3.14
Theorem 2.13
120
Definition 3.15
Definition 2.18
Comments
• E[X]: first moment
• E[X2]: second moment
• The set of moments of X is a complete probability model.
(Section 6.3, moment generating function)
121
Example 2.34
122
Example 2.34 Solution
123
Example 2.34
• Recall that in Example 2.6, we found that R has PMF
P_R(r) = 1/4   r = 0
         3/4   r = 2
         0     otherwise
In Example 2.26, we calculated E[R] = μ_R = 3/2. What is the
variance of R?
Solution:
• To apply Theorem 2.13, we find that
• E[R^2] = 0^2 P_R(0) + 2^2 P_R(2) = 3
• Var[R] = E[R^2] − μ_R^2 = 3 − (3/2)^2 = 3/4
124
Theorem 3.15
Theorem 2.14
125
Example 3.26
Example 2.35
126
Example 3.27
Example 2.36
127
Example 2.36 Solution
128
Theorem 3.16
Theorem 2.15
129
Theorem 3.16
Theorem 2.15
Random Variable          Variance, Var[X]
Bernoulli (p)            p(1 − p)
Geometric (p)            (1 − p) / p^2
Poisson (α)              α
Binomial (n, p)          np(1 − p)
Pascal (k, p)            k(1 − p) / p^2
Discrete uniform (k, l)  (l − k)(l − k + 2) / 12
130
Quiz 2.8
131
Quiz 2.8 Solution
132
7.1
2.9 Conditional Probability Mass
Function
133
Example 7.1
Example 2.37
134
Definition 7.2
Definition 2.19 Conditional PMF
135
Theorem 7.3
Theorem 2.16
136
Example 7.6
Example 2.38
137
Example 2.38 Solution
138
Page 228, 229
Conditional PMF
• When a conditioning event B ⊆ S_X, the PMF P_X(x) determines
both the probability of B as well as the conditional PMF:
P_{X|B}(x) = P[X = x, B] / P[B]
• Now either the event {X = x} is contained in the event B or it is
not.
• If x ∈ B, then {X = x} ∩ B = {X = x} and P[X = x, B] = P_X(x).
• If x ∉ B, then {X = x} ∩ B = ∅, and P[X = x, B] = 0.
139
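The two cases above (x ∈ B versus x ∉ B) can be sketched directly (helper name is hypothetical): zero out mass off B and rescale the rest by 1/P[B].

```python
def conditional_pmf(pmf, B):
    """P_{X|B}(x) = P_X(x) / P[B] for x in B, 0 otherwise (B a subset of S_X)."""
    pB = sum(pmf[x] for x in B)  # P[B]
    return {x: (p / pB if x in B else 0) for x, p in pmf.items()}

# Conditioning on B = {1, 2} zeroes x = 0 and scales the rest up by 1/P[B].
P_X = {0: 0.25, 1: 0.5, 2: 0.25}
P_XB = conditional_pmf(P_X, {1, 2})
print(P_XB)  # mass 0 at x = 0, 2/3 at x = 1, 1/3 at x = 2
```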
Theorem 7.1
Theorem 2.17
• The theorem states that when we learn that an outcome x ∈ B,
the probabilities of all x ∉ B are zero in our conditional model
and the probabilities of all x ∈ B are proportionally higher than
they were before we learned x ∈ B.
140
Example 7.2
Example 2.39
141
Example 2.39 Solution
142
Theorem 7.4
Example 2.40
143
Theorem 7.3
Theorem 2.18
144
Definition 7.4
Definition 2.20
145
Theorem 7.4
Theorem 2.19
P_X(x) = Σ_{i=1}^{m} P_{X|B_i}(x) P[B_i]
146
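The theorem's weighted sum over a partition B_1, …, B_m can be sketched as follows (helper name and the two-condition example are hypothetical):

```python
def total_pmf(cond_pmfs, priors):
    """P_X(x) = sum over i of P_{X|B_i}(x) * P[B_i] for a partition B_1..B_m."""
    support = set().union(*(c.keys() for c in cond_pmfs))
    return {x: sum(c.get(x, 0) * q for c, q in zip(cond_pmfs, priors))
            for x in support}

# Two equally likely conditions with different conditional PMFs.
P_X = total_pmf([{0: 0.5, 1: 0.5}, {1: 0.5, 2: 0.5}], [0.5, 0.5])
print(P_X)  # mass 0.25 at x = 0, 0.5 at x = 1, 0.25 at x = 2
```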
Theorem 7.5
Theorem 2.20
147
Example 7.8
Example 2.41
148
Quiz 2.9
149
Quiz 2.9 Solution
150
Quiz 2.9 Solution (continued)
151
Quiz 2.9 Solution (continued)
152
Summary of Discrete RV Families
• Theorems 2.4 ~ 2.7 and 2.15
Random Variable          Expected Value E[X]   Variance Var[X]
Bernoulli (p)            p                     p(1 − p)
Geometric (p)            1/p                   (1 − p) / p^2
Poisson (α)              α                     α
Binomial (n, p)          np                    np(1 − p)
Pascal (k, p)            k/p                   k(1 − p) / p^2
Discrete uniform (k, l)  (k + l) / 2           (l − k)(l − k + 2) / 12
153