Math382 Lecture Notes Probability and Statistics PDF
Contents
1 Probability in the World Around Us
2 Probability
2.1 What is Probability
2.2 Review of set notation
2.3 Types of Probability
2.4 Laws of Probability
2.5 Counting Rules useful in Probability
2.6 Conditional probability and independence
2.7 Bayes Rule
3 Discrete probability distributions
3.1 Discrete distributions
3.2 Expected values of Random Variables
3.3 Bernoulli distribution
3.4 Binomial distribution
3.5 Geometric distribution
3.6 Negative Binomial distribution
3.7 Poisson distribution
3.8 Hypergeometric distribution
3.9 Moment generating function
7 Descriptive statistics
7.1 Sample and population
7.2 Graphical summaries
7.3 Numerical summaries
7.3.1 Sample mean and variance
7.3.2 Percentiles
8 Statistical inference
8.1 Introduction
8.1.1 Unbiased Estimation
8.2 Confidence intervals
8.3 Statistical hypotheses
8.3.1 Hypothesis tests of a population mean
8.4 The case of unknown
8.4.1 Confidence intervals
8.4.2 Hypothesis test
8.4.3 Connection between Hypothesis tests and C.I.s
8.4.4 Statistical significance vs Practical significance
8.5 C.I. and tests for two means
8.5.1 Matched pairs
8.6 Inference for Proportions
8.6.1 Confidence interval for population proportion
8.6.2 Test for a single proportion
8.6.3 Comparing two proportions*
9 Linear Regression
9.1 Correlation coefficient
9.2 Least squares regression line
9.3 Inference for regression
9.3.1 Correlation test for linear relationship
9.3.2 Confidence and prediction intervals
9.3.3 Checking the assumptions
Chapter 1
Now you are to stop and think: what are the factors that will make this model more uncertain?
resolution). The models should also take into account the uncertainties from many sources,
including our imperfect knowledge of the current state of Earth, our imperfect understanding of all physical processes involved, and the uncertainty about future scenarios of human
development.2
Understanding and communicating this uncertainty is greatly aided by the knowledge
of the rules of probability.
The authors thank Lynda Ballou for contributing some examples and exercises, and
Brian Borchers for valuable comments.
Not the least, our ability to calculate the output of such models is also limited by the current state of
computational science.
Chapter 2
Probability
2.1 What is Probability
Probability theory is the branch of mathematics that studies the possible outcomes of
given events together with the outcomes' relative likelihoods and distributions. In common
usage, the word probability is used to mean the chance that a particular event (or set of
events) will occur expressed on a linear scale from 0 (impossibility) to 1 (certainty), also
expressed as a percentage between 0 and 100%. The analysis of data (possibly generated
by probability models) is called statistics.
Probability is a way of summarizing the uncertainty of statements or events. It gives a
numerical measure for the degree of certainty (or degree of uncertainty) of the occurrence
of an event.
Another way to define probability is the ratio of the number of favorable outcomes to
the total number of all possible outcomes. This is true if the outcomes are assumed to be
equally likely. The collection of all possible outcomes is called the sample space.
If there are n total possible outcomes in a sample space S, and m of those are favorable
for an event A, then the probability of event A is given as
\[ P(A) = \frac{m}{n} \]
CHAPTER 2. PROBABILITY
Definition 2.1.
Random Experiment: A random experiment is the process of observing the
outcome of a chance event.
Outcome: The elementary outcomes are all possible results of the random
experiment.
Sample Space(SS): The sample space is the set or collection of all the outcomes
of an experiment and is denoted by S.
Example 2.2.
a) Flip a coin once, then the sample space is: $S = \{H, T\}$
b) Flip a coin twice, then the sample space is: $S = \{HH, HT, TH, TT\}$
We want to assign a numerical weight or probability to each outcome. We write
the probability of an outcome $A_i$ as $P(A_i)$. For example, in our coin toss experiment, we may assign
$P(H) = P(T) = 0.5$: each outcome comes up half the time.
2.2 Review of set notation
Venn diagram
A Venn diagram is often used to illustrate the relations between sets (events). The sets
A and B are represented as circles; operations between them (intersections, unions and
complements) can also be represented as parts of the diagram. The entire sample space
S is the bounding box. See Figure 2.1.
Figure 2.1: Venn diagram of events A (in bold) and B, represented as insides of circles,
and various intersections
Example 2.4. Set notation
Suppose a set $S$ consists of points labeled 1, 2, 3 and 4. We denote this by $S = \{1, 2, 3, 4\}$.
If $A = \{1, 2\}$ and $B = \{2, 3, 4\}$, then $A$ and $B$ are subsets of $S$, denoted by $A \subset S$ and
$B \subset S$ ($B$ is contained in $S$). We denote the fact that 2 is an element of $A$ by $2 \in A$.
The union of $A$ and $B$ is $A \cup B = \{1, 2, 3, 4\}$. If $C = \{4\}$, then $A \cup C = \{1, 2, 4\}$. The
intersection is $A \cap B = AB = \{2\}$. The complement is $A' = \{3, 4\}$.
Distributive laws
\[ A(B \cup C) = AB \cup AC \]
and
\[ A \cup (BC) = (A \cup B)(A \cup C) \]
De Morgan's Law
\[ (A \cup B)' = A'B' \]
\[ (AB)' = A' \cup B' \]
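As a quick sanity check, the distributive laws and De Morgan's laws can be tried out on the small sets of Example 2.4 using Python's built-in set type. This is only a sketch: the helper `complement` is a name introduced here for illustration, and a check on one example is of course not a proof.

```python
# Sets from Example 2.4; complements are taken relative to S.
S = {1, 2, 3, 4}
A = {1, 2}
B = {2, 3, 4}
C = {4}

def complement(event):
    """Return the complement of an event within the sample space S."""
    return S - event

# Distributive laws: & is intersection, | is union.
dist1 = (A & (B | C)) == ((A & B) | (A & C))
dist2 = (A | (B & C)) == ((A | B) & (A | C))

# De Morgan's laws.
dm1 = complement(A | B) == (complement(A) & complement(B))
dm2 = complement(A & B) == (complement(A) | complement(B))

print(dist1, dist2, dm1, dm2)
```

Changing A, B and C to other subsets of S is an easy way to convince yourself that the identities keep holding.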
Exercises
2.1.
Use Venn diagrams to illustrate the distributive laws and De Morgan's law.
2.2.
Simplify the following (Draw the Venn diagrams to visualize)
a) (A0 )0
b) (AB)0 A
c) (AB) (AB 0 )
d) (A B C)B
2.3.
Represent by set notation the following events
a) both A and B occur
b) exactly one of A, B occurs
c) at least one of A, B, C occurs
d) at most one of A, B, C occurs
2.4.
The sample space consists of eight capital letters (outcomes), A, B, C ,..., H. Let V =
event that the letter represents a vowel, and L = event that the letter is made of straight
lines. Describe the outcomes that comprise
a) V L
b) V L0
c) V 0 L0
Two-way table
This is a popular way to represent statistical data. The cells of the table
correspond to the intersections of row and column events. Note that the
contents of the table add up across rows and columns of the table. The
bottom-right corner of the table contains $P(S) = 1$.

        B      B'     Total
A       0.26   0.32   0.58
A'      0.11   ?      0.42
Total   0.37   0.63   1
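Because rows and columns add up, a missing cell can be recovered from its row (or column) total. A minimal sketch in Python, with variable names chosen here for illustration and values rounded to two decimals to avoid floating-point noise:

```python
# Known entries of the two-way table (rows A, A'; columns B, B').
p_A_and_B = 0.26
p_A_and_notB = 0.32
p_notA_and_B = 0.11
p_notA = 0.42  # row total for A'

# The missing cell is the row total minus the known cell in that row.
p_notA_and_notB = round(p_notA - p_notA_and_B, 2)

# All four cells together must give P(S) = 1.
total = round(p_A_and_B + p_A_and_notB + p_notA_and_B + p_notA_and_notB, 2)
print(p_notA_and_notB, total)
```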
Tree diagram
A tree diagram may be used to show the sequence of choices that lead to the
complete description of outcomes. For example, when tossing two coins, we
may represent this as follows:
[Tree diagram: the first toss (H or T) branches into the second toss (H or T), giving the outcomes HH, HT, TH, TT.]
2.5.
Out of all items sent for refurbishing, 40% had mechanical defects, 50% had electrical
defects, and 25% had both.
Denoting A = {an item has a mechanical defect} and
B = {an item has an electrical defect}, fill the probabilities into the Venn diagram and
determine the quantities listed below.
a) P (A)
b) P (AB)
c) P (A0 B)
d) P (A0 B 0 )
e) P (A B)
f) P (A0 B 0 )
g) P ([A B]0 )
2.6.
Do the following satisfy the definitions of probability? If not, explain why.
a) P (A) = 0.3, P (B) = 0.5 and P (AB 0 ) = 0.4.
b) P (A) = 0.4, P (B) = 0.6 and P (AB) = 0.2.
c) P (A) = 0.7, P (B) = 0.6 and P (AB) = 0.2.
2.7.
For tossing a six-sided die, find the following probabilities (assume equally likely outcomes).
a) Probability to get both a number more than 3 and an even number.
b) Probability to get a number less than 4 or an odd number.
2.8.
When tossing two six-sided dice, find the following probabilities (assume equally likely
outcomes).
a) Probability that the first die shows a number more than 3 and the second one shows
an even number.
b) Probability that the first die shows a number less than 4 or the second one shows
an odd number.
2.9.
a) Suppose that P (A B) = 0.8 and P (A0 B) = 0.7. Find P (B).
b) Suppose that P (A) = 0.4 and P (B) = 0.3. What are the possible values for P (A0 B 0 )?
2.10.
A sample of mutual funds was classified according to whether a fund was up or down last
year (A and A0 ) and whether it was investing in international stocks (B and B 0 ). The
probabilities of these events and their intersections are represented in the two-way table
below.
        B      B'     Total
A       0.33   ?      ?
A'      ?      ?      0.52
Total   0.64   ?      1
2.3 Types of Probability
There are three ways to define probability, namely classical, empirical and subjective
probability.
Definition 2.6. Classical probability
Classical or theoretical probability is used when each outcome in a sample space is
equally likely to occur. The classical probability for an event A is given by
\[ P(A) = \frac{\text{Number of outcomes in } A}{\text{Total number of outcomes in } S} \]
Example 2.5.
Roll a die and observe that P (A) = P (rolling a 3) = 1/6.
Definition 2.7. Empirical probability
Empirical (or statistical) probability is based on observed data. The empirical
probability of an event A is the relative frequency of event A, that is
\[ P(A) = \frac{\text{Frequency of event } A}{\text{Total number of observations}} \]
Example 2.6.
The following are the counts of fish of each type that you have caught before.

Fish type    Number of times caught
Blue gill    13
Red gill     17
Crappie      10
Total        40

Estimate the probability that the next fish you catch will be a Blue gill.
$P(\text{Blue gill}) = 13/40 = 0.325$
Example 2.7.
Based on genetics, the proportion of male children among all children conceived should
be around 0.5. However, based on the statistics from a large number of live births, the
probability that a newborn child is male is about 0.512.
The empirical probability definition has a weakness that it depends on the results of
a particular experiment. The next time this experiment is repeated, you are likely to get
a somewhat different result.
However, as an experiment is repeated many times, the empirical probability of an
event, based on the combined results, approaches the theoretical probability of the event.
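The convergence of empirical to theoretical probability can be illustrated with a small simulation. The sketch below assumes a fair six-sided die simulated with Python's `random` module; the function name `empirical_probability` and the fixed seed are choices made here for reproducibility, not part of the notes.

```python
import random

def empirical_probability(n_rolls, target=3, seed=0):
    """Estimate P(rolling `target`) from n_rolls simulated fair-die rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls

# The relative frequency should settle near the classical value 1/6
# as the number of rolls grows.
for n in (100, 10_000, 200_000):
    print(n, empirical_probability(n))
```

With only 100 rolls the estimate can be noticeably off; with a few hundred thousand rolls it is typically within a fraction of a percent of 1/6.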
Subjective Probability: Subjective probabilities result from intuition, educated guesses,
and estimates. For example, given a patient's health and extent of injuries, a doctor may
feel that the patient has a 90% chance of a full recovery.
Regardless of the way probabilities are defined, they always follow the same laws, which
we will explore starting with the following section.
2.4 Laws of Probability
As we have seen in the previous section, probabilities are not always based on the
assumption of equally likely outcomes.
Definition 2.8. Axioms of Probability
For an experiment with a sample space $S = \{e_1, e_2, \ldots, e_n\}$ we can assign
probabilities $P(e_1), P(e_2), \ldots, P(e_n)$ provided that
a) $0 \le P(e_i) \le 1$
b) $P(S) = \sum_{i=1}^{n} P(e_i) = 1$.
If a set (event) $A$ consists of outcomes $\{e_1, e_2, \ldots, e_k\}$, then
\[ P(A) = \sum_{i=1}^{k} P(e_i) \]
This definition just tells us which probability assignments are legal, but not necessarily
which ones would work in practice. However, once we have assigned the probability to
each outcome, they are subject to further rules which we will describe below.
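The two axioms are easy to check mechanically for a finite sample space. The following sketch stores an assignment as a dictionary from outcomes to probabilities; the function names `is_valid_assignment` and `event_probability` are hypothetical helpers introduced here.

```python
import math

def is_valid_assignment(probs):
    """Check the two axioms: each P(e_i) lies in [0, 1] and the total is 1."""
    in_range = all(0 <= p <= 1 for p in probs.values())
    sums_to_one = math.isclose(sum(probs.values()), 1.0)
    return in_range and sums_to_one

def event_probability(probs, event):
    """P(A) is the sum of P(e_i) over the outcomes e_i in A."""
    return sum(probs[outcome] for outcome in event)

fair_die = {face: 1 / 6 for face in range(1, 7)}
print(is_valid_assignment(fair_die))
print(event_probability(fair_die, {2, 4, 6}))
```

An assignment such as P(H) = 0.7, P(T) = 0.4 fails the check because the total exceeds 1.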
Theorem 2.1. Complement Rule
For any event A,
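\[ P(A') = 1 - P(A) \tag{2.1} \]
Theorem 2.2. Addition Rule
For any events A and B,
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{2.2} \]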
Proof. Consider the Venn diagram. $P(A \cup B)$ is the sum of the probabilities of all sample
points in $A \cup B$. Now $P(A) + P(B)$ is the sum of probabilities of sample points in $A$ and
in $B$. Since we added up the sample points in $A \cap B$ twice, we need to subtract them once to
obtain the sum of probabilities in $A \cup B$, which is $P(A \cup B)$.
Example 2.8. Probability that John passes a Math exam is 4/5 and that he passes a
Chemistry exam is 5/6. If the probability that he passes both exams is 3/4, find the
probability that he will pass at least one exam.
Solution. Let $M$ = John passes Math exam, and $C$ = John passes Chemistry exam.
\[ P(\text{John passes at least one exam}) = P(M \cup C) = P(M) + P(C) - P(M \cap C) = 4/5 + 5/6 - 3/4 = 53/60 \]
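The arithmetic of Example 2.8 can be reproduced exactly with Python's `fractions` module, which avoids rounding. The variable names are chosen here for illustration; the last step also applies the complement rule to get the probability of passing neither exam.

```python
from fractions import Fraction

p_math = Fraction(4, 5)
p_chem = Fraction(5, 6)
p_both = Fraction(3, 4)

# Addition rule: P(M or C) = P(M) + P(C) - P(M and C)
p_at_least_one = p_math + p_chem - p_both
# Complement rule: P(neither) = 1 - P(at least one)
p_neither = 1 - p_at_least_one
print(p_at_least_one, p_neither)
```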
Corollary. If two events $A$ and $B$ are mutually exclusive, then
\[ P(A \cup B) = P(A) + P(B). \]
This follows immediately from (2.2). Since $A$ and $B$ are mutually exclusive, $P(A \cap B) = 0$.
Example 2.9. What is the probability of getting a total of 7 or 11, when two dice are
rolled?
[Table: the 36 outcomes of rolling two dice, arranged in a 6 by 6 grid from (1,1) to (6,6).]
Solution. Let $A$ be the event that the total is 7 and $B$ be the event that it is 11. The
sample space for this experiment is
\[ S = \{(1, 1), (1, 2), \ldots, (2, 1), (2, 2), \ldots, (6, 6)\}, \quad n(S) = 36 \]
$A = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$ and $n(A) = 6$.
So, $P(A) = 6/36 = 1/6$.
$B = \{(5, 6), (6, 5)\}$ and $n(B) = 2$.
So, $P(B) = 2/36 = 1/18$.
Since we cannot have a total equal to both 7 and 11, $A$ and $B$ are mutually exclusive, i.e.
$P(A \cap B) = 0$.
So, we have $P(A \cup B) = P(A) + P(B) = 1/6 + 1/18 = 2/9$.
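For a sample space this small, the counting in Example 2.9 can also be done by brute force. The sketch below lists all 36 outcomes with `itertools.product` and counts the favorable ones; `prob` is a helper name introduced here and assumes equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favorable outcomes over all outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_seven = prob(lambda o: sum(o) == 7)
p_eleven = prob(lambda o: sum(o) == 11)
p_either = prob(lambda o: sum(o) in (7, 11))
print(p_seven, p_eleven, p_either)
```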
Exercises
2.11.
Two cards are drawn from a 52-card deck, without replacement. What is the probability
that both are greater than 2 and less than 8?
2.12.
A permutation of the word "white" is chosen at random. Find the probability that it
begins with a vowel. Also, find the probability that it ends with a consonant, and the
probability that it begins with a vowel and ends with a consonant.
2.13.
Find the probability that a leap year will have 53 Sundays.
2.14.
As a foreign language, 40% of the students took Spanish and 30% took French, while 60%
took at least one of these languages. What percent of students took both Spanish and
French?
2.15.
In a class of 100 students, 30 major in Mathematics. Moreover, of the 40 females in the
class, 10 major in Mathematics. If a student is selected at random from the class, what is
the probability that the student will be a male or will major in Mathematics (or both)?
2.16.
Suppose that P (A) = 0.4, P (B) = 0.5 and P (AB) = 0.2. Find the following:
a) P (A B)
b) P (A0 B)
c) P [A0 (A B)]
d) P [A (A0 B)]
2.17.
Two tetrahedral (4-sided) symmetrical dice are rolled, one after the other.
a) Find the probability that both dice will land on the same number.
b) Find the probability that each die will land on a number less than 3.
c) Find the probability that the two numbers will differ by at most 1.
d) Will the answers change if we rolled the dice simultaneously?
2.5 Counting Rules useful in Probability
In some experiments it is helpful to list the elements of the sample space systematically
by means of a tree diagram (see the two-coin example in Section 2.2).
In many cases, we shall be able to solve a probability problem by counting the number
of points in the sample space without actually listing each element.
Theorem 2.3. Multiplication principle
If one operation can be performed in n1 ways, and if for each of these a second
operation can be performed in n2 ways, then the two operations can be performed
together in n1 n2 ways.
Example 2.10. How large is the sample space when a pair of dice is thrown?
Solution. The first die can be thrown in n1 = 6 ways and the second in n2 = 6 ways.
Therefore, the pair of dice can land in n1 n2 = 36 possible ways.
Theorem 2.3 can naturally be extended to more than two operations: if we have $n_1, n_2, \ldots, n_k$
consecutive choices, then the total number of ways is $n_1 n_2 \cdots n_k$.
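The multiplication principle corresponds to forming a Cartesian product, so it can be checked by listing outcomes. In the sketch below, the three-step experiment (die, coin, die) is an example made up here to show the extension to more than two operations.

```python
from itertools import product

die = range(1, 7)
pairs = list(product(die, die))          # two operations: n1 * n2 outcomes
triples = list(product(die, "HT", die))  # three operations: n1 * n2 * n3 outcomes
print(len(pairs), len(triples))
```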
The term permutations refers to an arrangement of objects when the order matters
(for example, letters in a word).
Theorem 2.4. Permutations
The number of permutations of $n$ distinct objects taken $r$ at a time is
\[ {}_nP_r = \frac{n!}{(n - r)!} \]
Example 2.11.
From among ten employees, three are to be selected to travel to three out-of-town
plants A, B, and C, one to each plant. Since the plants are located in different cities, the
order in which the employees are assigned to the plants is an important consideration. In
how many ways can the assignments be made?
Solution. Because order is important, the number of possible distinct assignments is
\[ {}_{10}P_3 = \frac{10!}{7!} = 10(9)(8) = 720. \]
In other words, there are ten choices for plant A, but then only nine for plant B, and eight
for plant C. This gives a total of 10(9)(8) ways of assigning employees to the plants.
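The count in Example 2.11 can be confirmed in three ways: by listing the ordered selections, by the standard-library function `math.perm` (available in Python 3.8 and later), and by the factorial formula. Labeling the employees 0 to 9 is an arbitrary choice made for this sketch.

```python
import math
from itertools import permutations

employees = range(10)  # ten employees, labeled 0..9 for illustration
assignments = list(permutations(employees, 3))  # ordered choices for plants A, B, C

by_listing = len(assignments)
by_formula = math.factorial(10) // math.factorial(7)
print(by_listing, math.perm(10, 3), by_formula)
```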
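The term combination refers to a selection of objects when order does not matter.
For example, choosing 4 books to buy at the store in any order will leave you with the
same set of books. The number of combinations of $n$ distinct objects taken $r$ at a time is
\[ \binom{n}{r} = \frac{n!}{r!\,(n - r)!} \]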
Example 2.12.
In the previous example, suppose that three employees are to be selected from among the
ten available to go to the same plant. In how many ways can this selection be made?
Solution. Here, order is not important; we want to know how many subsets of size $r = 3$
can be selected from $n = 10$ people. The result is
\[ \binom{10}{3} = \frac{10!}{3!\,7!} = \frac{10(9)(8)}{3(2)(1)} = 120 \]
Example 2.13.
A package of six light bulbs contains 2 defective bulbs. If three bulbs are selected for use,
find the probability that none of the three is defective.
Solution.
\[ P(\text{none are defective}) = \frac{\text{number of ways 3 nondefectives can be chosen}}{\text{total number of ways a sample of 3 can be chosen}} = \frac{\binom{4}{3}}{\binom{6}{3}} = \frac{1}{5} \]
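Examples 2.12 and 2.13 can be checked with `math.comb` and by listing the unordered samples directly. In this sketch the bulbs are labeled 0 to 5 and bulbs 0 and 1 are taken to be the defective ones; the labels are an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations
import math

# Example 2.12: unordered selections of 3 employees out of 10.
n_committees = math.comb(10, 3)

# Example 2.13: list every sample of 3 bulbs and keep those with no defective bulb.
bulbs = range(6)
defective = {0, 1}
samples = list(combinations(bulbs, 3))
good_samples = [s for s in samples if not defective & set(s)]
p_none_defective = Fraction(len(good_samples), len(samples))
print(n_committees, p_none_defective)
```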
Example 2.14.
In a poker hand consisting of 5 cards, find the probability of holding 2 aces and 3 jacks.
Solution. The number of ways of being dealt 2 aces from 4 is $\binom{4}{2} = 6$ and the number of
ways of being dealt 3 jacks from 4 is $\binom{4}{3} = 4$.
The total number of 5-card poker hands, all of which are equally likely, is
\[ \binom{52}{5} = 2{,}598{,}960 \]
Hence, the probability of getting 2 aces and 3 jacks in a 5-card poker hand is
\[ P(C) = \frac{6 \cdot 4}{2{,}598{,}960} \]
Example 2.15.
A university warehouse has received a shipment of 25 printers, of which 10 are laser
printers and 15 are inkjet models. If 6 of these 25 are selected at random to be checked by
a particular technician, what is the probability that exactly 3 of these selected are laser
printers? At least 3 inkjet printers?
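Solution. First choose 3 of the 15 inkjet and then 3 of the 10 laser printers. There are
$\binom{15}{3}$ and $\binom{10}{3}$ ways to do it, and therefore
\[ P(\text{exactly 3 of the 6}) = \frac{\binom{15}{3}\binom{10}{3}}{\binom{25}{6}} = 0.3083 \]
\[ P(\text{at least 3 inkjet}) = \frac{\binom{15}{3}\binom{10}{3}}{\binom{25}{6}} + \frac{\binom{15}{4}\binom{10}{2}}{\binom{25}{6}} + \frac{\binom{15}{5}\binom{10}{1}}{\binom{25}{6}} + \frac{\binom{15}{6}\binom{10}{0}}{\binom{25}{6}} = 0.8530 \]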
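The two answers of Example 2.15 follow the same pattern: choose k inkjet printers and the rest laser printers, then divide by the number of all samples. A short sketch using `math.comb`, with the function name `p_inkjet` and its default arguments introduced here for illustration:

```python
import math

def p_inkjet(k, n_inkjet=15, n_laser=10, sample=6):
    """P(exactly k inkjet printers among `sample` printers drawn without replacement)."""
    ways = math.comb(n_inkjet, k) * math.comb(n_laser, sample - k)
    return ways / math.comb(n_inkjet + n_laser, sample)

p_exactly_3 = p_inkjet(3)
p_at_least_3 = sum(p_inkjet(k) for k in range(3, 7))
print(round(p_exactly_3, 4), round(p_at_least_3, 4))
```

Summing `p_inkjet(k)` over all k from 0 to 6 gives 1, which is a useful check that no case was missed.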
Exercises
2.18.
An incoming lot of silicon wafers is to be inspected for defectives by an engineer in a microchip manufacturing plant. Suppose that, in a tray containing 20 wafers, 4 are defective.
Two wafers are to be selected randomly for inspection. Find the probability that neither
is defective.
2.19.
A person draws 5 cards from a shuffled pack of 52 cards. Find the probability that the
person has at least 3 aces. Find the probability that the person has at least 4 cards of the
same suit.
2.20.
Three people enter the elevator on the basement level. The building has 7 floors. Find
the probability that all three get off at different floors.
2.21.
In a group of 7 people, each person shakes hands with every other person. How many
handshakes did occur?
2.22.
In a lottery, 6 numbers are drawn out of 45. You hit a jackpot if you guess all 6 numbers
correctly, and get $400 if you guess 5 numbers out of 6 correctly. What are the probabilities
of each of those events?
2.23.
A marketing director considers that there's overwhelming agreement in a 5-member
focus group when either 4 or 5 people like or dislike the product. If, in fact, the product's
popularity is 50% (so that all outcomes are equally likely), what is the probability that
the focus group will be in overwhelming agreement about it? Is the marketing director
making a judgement error in declaring such agreement overwhelming?
2.24.
A die is tossed 5 times. Find the probability that we will have 4 of a kind.
2.25.
There are 21 Bachelor of Science programs at New Mexico Tech. Given 21 areas from
which to choose, in how many ways can a student select:
a) A major area and a minor area?
b) A major area and two minors (regardless of order)?
2.26.
In a math modeling class, we have 15 students and want to split them into 3 groups, 5
students each, to do group projects. How many possible group assignments are there?
2.27.
If a group consists of 8 men and 6 women, in how many ways can a committee of 5 be
selected if:
a) The committee is to consist of 3 men and 2 women.
b) There are no restrictions on the number of men and women on the committee.
c) There must be at least one man.
d) There must be at least one of each sex.
2.28.
From a box containing 5 chocolates and 4 hard candies, a child takes a handful of 4 (at
random). What is the probability that exactly 3 of the 4 are chocolates?
2.29.
Suppose we have a lot of 40 transistors of which 8 are defective. If we sample without
replacement, what is the probability that we get 4 good transistors in the first 5 draws?
[Figure: a triangular array of numbers (Pascal's triangle), showing a step in its construction.]
The number in each cell represents the number of downward routes from the
vertex to that point (can you explain why?). It is also the number of ways to choose
$r$ objects out of $n$ (can you explain why?), that is, $\binom{n}{r}$.
[Figure: the completed triangle through row $n = 9$, whose entries are 1, 9, 36, 84, 126, 126, 84, 36, 9, 1.]
The binomial formula reads
\[ (a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^r b^{n-r} \]
Note that, if you let $a = b = 1/2$, then on the right-hand side of the sum you will
get the probabilities
\[ P(a \text{ is chosen } r \text{ times and } b \text{ is chosen } n - r \text{ times}) = \frac{\binom{n}{r}}{2^n} \]
and on the left-hand side you will have 1 (the total of all probabilities).
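The construction of the triangle, where each inner entry is the sum of the two entries above it, can be sketched in a few lines and compared against `math.comb`. The function name `pascal_rows` is introduced here; the last lines check that the probabilities for a = b = 1/2 add up to 1.

```python
import math

def pascal_rows(n_max):
    """Build rows 0..n_max; each inner entry is the sum of the two entries above it."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(9)
matches_comb = all(rows[n][r] == math.comb(n, r)
                   for n in range(10) for r in range(n + 1))
# With a = b = 1/2 the binomial probabilities comb(n, r) / 2**n add up to 1.
total = sum(math.comb(9, r) / 2**9 for r in range(10))
print(rows[9], matches_comb, total)
```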
2.30.
A housewife is asked to rank four brands A, B, C, and D of household cleaner according
to her preference, number one being the one she prefers most, etc. She really has no
preference among the four brands. Hence, any ordering is equally likely to occur.
a) Find the probability that brand A is ranked number one.
b) Find the probability that brand C is number 1 and D is number 2 in the rankings.
c) Find the probability that brand A is ranked number 1 or number 2.
2.31.
On a given day, 8 soccer games are played. How many different outcomes are possible,
if it's known that 4 games are won by the home team, 2 by the visiting team and 2 are
drawn?
2.32.
In how many ways can one arrange the letters of the word ADVANTAGE so that the
three A's are adjacent to each other?
2.33.
How many distinct words can be formed by permuting the letters in the word PROBABILITY?
2.34.
Eight tires of different brands are ranked 1 to 8 (best to worst) according to mileage
performance. If four of these tires are chosen at random by a customer, find the probability
that the best tire among the four selected by the customer is actually ranked third among
the original eight.
2.35.
A drawer contains 3 white and 2 brown socks. Two socks are taken at random. What is
the probability that you got two socks of the same color?
2.36.
For password security, it is often recommended that users choose passwords that contain
at least two digits, some capital letters, etc. Calculate and compare the available number
of passwords when using the following conditions:
a) A 6-letter password only consisting of lowercase letters.
b) A 6-letter password consisting of lowercase and capital letters, with at least 2 capital
letters.
c) A 6-letter password consisting of lowercase and capital letters and some digits, with
at least 1 capital letter and at least 1 digit.
2.6 Conditional probability and independence
Humans often have to act based on incomplete information. If your boss has looked at
you gloomily, you might conclude that something's wrong with your job performance.
However, if you know that she just suffered some losses in the stock market, this extra
information may change your assessment of the situation. Conditional probability is a
tool for dealing with additional information like this.
Conditional probability is the probability of an event occurring given the knowledge
that another event has occurred. The conditional probability of event A occurring, given
that event B has occurred, is denoted by $P(A|B)$ and is read "probability of A given B".
Definition 2.9. Conditional probability
The conditional probability of event A given B is
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{for } P(B) > 0 \tag{2.4} \]
cards are red. So the probability of the card being the five of diamonds is now 1/26. What
we have just calculated is a conditional probability: the probability that the card is the
five of diamonds, given that it is red.
If we let $A$ stand for the card being the five of diamonds, and $B$ stand for the card
being red, then the conditional probability that the card is the five of diamonds given that
it is red is written $P(A|B)$.
In our case, $P(A \cap B)$ is the probability that the card is the five of diamonds and red,
which is 1/52 (exactly the same as $P(A)$, since there are no black fives of diamonds!). $P(B)$,
the probability that the card is red, is 1/2. So the definition of conditional probability
tells us that $P(A|B) = (1/52)/(1/2) = 1/26$, exactly as it should. In this simple case we didn't really
need to use a formula to tell us this, but the formula is very useful in more complex cases.
If we rearrange the definition of conditional probability, we obtain the multiplication
rule for probabilities:
\[ P(A \cap B) = P(A \mid B)\,P(B) \tag{2.5} \]
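The card calculation above can be replayed by listing a standard 52-card deck and counting. This is only a sketch: the rank and suit labels, and the helper `prob`, are conventions chosen here, and all cards are assumed equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Fraction of cards in the deck for which the event holds."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_five_of_diamonds(card):
    return card == ("5", "diamonds")

def is_red(card):
    return card[1] in ("hearts", "diamonds")

p_b = prob(is_red)
p_a_and_b = prob(lambda card: is_five_of_diamonds(card) and is_red(card))
p_a_given_b = p_a_and_b / p_b  # definition of conditional probability
print(p_b, p_a_and_b, p_a_given_b)
```

Multiplying `p_a_given_b` by `p_b` recovers `p_a_and_b`, which is the multiplication rule read in the other direction.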
The next concept, statistical independence of events, is very important.
Definition 2.10. Independence
The events A and B are called (statistically) independent if
\[ P(A \cap B) = P(A)\,P(B) \tag{2.6} \]
Another way to express independence is to say that the knowledge of B occurring does
not change our assessment of P (A). This means that P (A|B) = P (A). (The probability
that a person is female given that he or she was born in March is just the same as the
probability that the person is female.)
Equation (2.6) is often called the simplified multiplication rule because it can be obtained
from (2.5) by substituting P (A|B) = P (A).
Example 2.18.
For a coin tossed twice, let $H_1$ denote the event that we got Heads on the first toss, and $H_2$
the event that we got Heads on the second. Clearly, $P(H_1) = P(H_2) = 1/2$. Then, counting the outcomes,
$P(H_1 H_2) = 1/4 = P(H_1)P(H_2)$, therefore $H_1$ and $H_2$ are independent events. This agrees
with our intuition that the result of the first toss should not affect the chances for $H_2$ to
occur.
The situation of the above example is very common for repeated experiments, like rolling
dice, or looking at random numbers etc.
Definition 2.10 can be extended to more than two events, but it's fairly difficult to
describe. (For example, the relation $P(ABC) = P(A)P(B)P(C)$ does not guarantee that the events $A$, $B$, $C$ are
independent.) However, it is often used in this context:
If events $A_1, A_2, \ldots, A_k$ are independent, then
\[ P(A_1 A_2 \cdots A_k) = P(A_1)\,P(A_2) \cdots P(A_k) \tag{2.7} \]
For example, if we tossed a coin 5 times, the probability that all are Heads is $P(H_1)
P(H_2) \cdots P(H_5) = (1/2)^5 = 1/32$. However, this calculation also extends to outcomes
with unequal probabilities.
Example 2.19.
Three bits (0 or 1 digits) are transmitted over a noisy channel, so they will be flipped
independently with probability 0.1 each. What is the probability that
a) At least one bit is flipped?
b) Exactly one bit is flipped?
Solution. a) Using the complement rule, $P(\text{at least one}) = 1 - P(\text{none})$. If we denote by $F_k$
the event that the $k$th bit is flipped, then $P(\text{no bits are flipped}) = P(F_1' F_2' F_3') = (1 - 0.1)^3$
due to independence. Then,
\[ P(\text{at least one}) = 1 - 0.9^3 = 0.271 \]
b) Flipping exactly one bit can be accomplished in 3 ways:
\[ P(\text{exactly one}) = P(F_1 F_2' F_3') + P(F_1' F_2 F_3') + P(F_1' F_2' F_3) = 3(0.1)(1 - 0.1)^2 = 0.243 \]
It is slightly smaller than the one in part (a).
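Both answers of Example 2.19 can be double-checked by listing all 8 flip patterns and multiplying the per-bit probabilities, which is exactly what independence allows. The function name `pattern_probability` is introduced here for illustration.

```python
from itertools import product

p_flip = 0.1

def pattern_probability(pattern):
    """Probability of one flip pattern, e.g. (1, 0, 0) = only the first bit flipped."""
    result = 1.0
    for flipped in pattern:
        result *= p_flip if flipped else 1 - p_flip
    return result

patterns = list(product((0, 1), repeat=3))
p_at_least_one = sum(pattern_probability(p) for p in patterns if sum(p) >= 1)
p_exactly_one = sum(pattern_probability(p) for p in patterns if sum(p) == 1)
print(round(p_at_least_one, 3), round(p_exactly_one, 3))
```

The difference between the two results is the probability that two or three bits are flipped.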
Self-test questions
Suppose you throw two dice, one after the other.
a) What is the probability that the first die shows a 2?
b) What is the probability that the second die shows a 2?
c) What is the probability that both dice show a 2?
d) What is the probability that the dice add up to 4?
e) What is the probability that the dice add up to 4 given that the first die shows a 2?
f) What is the probability that the dice add up to 4 and the first die shows a 2?
Answers:
a) The probability that the first die shows a 2 is 1/6.
b) The probability that the second die shows a 2 is 1/6.
c) The probability that both dice show a 2 is (1/6)(1/6) = 1/36 (using the special
multiplication rule, since the rolls are independent).
d) For the dice to add up to 4, there are three possibilities: either both dice show a 2,
or the first shows a 3 and the second shows a 1, or the first shows a 1 and the second
shows a 3. Each of these has a probability of (1/6)(1/6) = 1/36 (using the special
multiplication rule, since the rolls are independent). Hence the probability that the
dice add up to 4 is 1/36 + 1/36 + 1/36 = 3/36 = 1/12 (using the special addition
rule, since the outcomes are mutually exclusive).
e) If the first die shows a 2, then for the dice to add up to 4 the second die must also
show a 2. So the probability that the dice add up to 4 given that the first shows a
2 is 1/6.
f) Note that we cannot use the simplified multiplication rule here, because the dice
adding up to 4 is not independent of the first die showing a 2. So we need to use
the full multiplication rule. This tells us that the probability that the first die shows a
2 and the dice add up to 4 is given by the probability that the first die shows a 2,
multiplied by the probability that the dice add up to 4 given that the first die shows
a 2. This is (1/6)(1/6) = 1/36.
Alternatively, see part (c).
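The answers to parts (d), (e) and (f) can be double-checked by brute-force enumeration of the 36 equally likely outcomes. The following is a minimal sketch in Python; the helper `prob` and the predicate names are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first, second) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_2(o):
    return o[0] == 2

def sum_is_4(o):
    return o[0] + o[1] == 4

def both(o):
    return first_is_2(o) and sum_is_4(o)

p_d = prob(sum_is_4)            # part (d)
p_f = prob(both)                # part (f)
p_e = p_f / prob(first_is_2)    # part (e): conditional probability
print(p_d, p_e, p_f)            # 1/12 1/6 1/36
```

Exact fractions are used so that the results can be compared with the hand calculation without rounding.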
and so on.
Let's fill out the tree representing the consecutive choices. See Figure 2.2.
[Figure 2.2: tree diagram of the two draws]
First marble     Second marble        Intersection
P(R1) = 7/10     P(R2|R1) = 6/9       P(R1R2) = 7/10 * 6/9 = 42/90
                 P(G2|R1) = 3/9       P(R1G2) = 21/90
P(G1) = 3/10     P(R2|G1) = 7/9       P(G1R2) = ?
                 P(G2|G1) = 2/9       P(G1G2) = ?
The conditional probability P (R2 | R1 ) can be obtained directly from reasoning that after
we took the first red marble, there remain 6 red and 3 green marbles. On the other hand,
we could use the formula (2.4) and get
$$P(R_2 \mid R_1) = \frac{P(R_2 R_1)}{P(R_1)} = \frac{42/90}{7/10} = \frac{2}{3}$$
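where the probability P (R2 R1 ), which is the same as P (R1 R2 ), can be obtained from counting the
outcomes:
$$P(R_1 R_2) = \frac{\binom{7}{2}}{\binom{10}{2}} = \frac{\frac{7 \cdot 6}{2 \cdot 1}}{\frac{10 \cdot 9}{2 \cdot 1}} = \frac{42}{90} = \frac{7}{15}$$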
Now, can you tell me what P (R2 ) and P (R1 | R2 ) are? Maybe you know the answer
already. However, we will get back to this question in Section 2.7.
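As a quick check of the two routes to P (R1 R2 ) described above, here is a small Python sketch. It assumes the urn holds 7 red and 3 green marbles (consistent with 6 red and 3 green remaining after one red is removed); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, green = 7, 3
total = red + green

# Multiplication rule: P(R1 R2) = P(R1) * P(R2 | R1)
p_r1 = Fraction(red, total)
p_r2_given_r1 = Fraction(red - 1, total - 1)
p_both_tree = p_r1 * p_r2_given_r1

# Counting rule: pairs of red marbles over all pairs of marbles
p_both_count = Fraction(comb(red, 2), comb(total, 2))

print(p_both_tree, p_both_count)   # 7/15 7/15
print(p_both_tree / p_r1)          # 2/3, recovering P(R2 | R1)
```

Both routes give 42/90 = 7/15, and dividing by P (R1 ) recovers the conditional probability 2/3.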
Example 2.21.
Suppose that of all individuals buying a certain digital camera, 60% include an optional
memory card in their purchase, 40% include a set of batteries, and 30% include both a card
and batteries. Consider randomly selecting a buyer and let A={memory card purchased}
and B= {battery purchased}. Then find P (A|B) and P (B|A).
Solution. From the given information, we have P (A) = 0.60, P (B) = 0.40, and P(both purchased) = P (A ∩ B) = 0.30. Given that the selected individual purchased an extra battery,
the probability that an optional card was also purchased is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.30}{0.40} = 0.75$$
That is, of all those purchasing an extra battery, 75% purchased an optional memory card.
Similarly
$$P(\text{battery} \mid \text{memory card}) = P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{0.30}{0.60} = 0.50$$
Notice that P (A|B) ≠ P (A) and P (B|A) ≠ P (B), that is, the events A and B are
dependent.
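The arithmetic of Example 2.21, together with the independence check, can be sketched in a few lines of Python. The variable names are illustrative; exact fractions avoid floating-point comparisons.

```python
from fractions import Fraction

p_a = Fraction(60, 100)    # P(A): memory card purchased
p_b = Fraction(40, 100)    # P(B): battery purchased
p_ab = Fraction(30, 100)   # P(A and B): both purchased

p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

# A and B are independent exactly when P(A and B) = P(A) P(B)
independent = (p_ab == p_a * p_b)
print(p_a_given_b, p_b_given_a, independent)   # 3/4 1/2 False
```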
Exercises
2.37.
A year has 53 Sundays. What is the conditional probability that it is a leap year?
2.38.
The probability that a majority of the stockholders of a company will attend a special
meeting is 0.5. If the majority attends, then the probability that an important merger will
be approved is 0.9. What is the probability that a majority will attend and the merger
will be approved?
2.39.
Let events A, B have positive probabilities. Show that, if P (A | B) = P (A) then also
P (B | A) = P (B).
2.40.
The cards numbered 1 through 10 are placed in a hat, mixed up, then one of the cards is
drawn. If we are told that the number on the drawn card is at least five, then what is the
probability that it is ten?
2.41.
In the roll of a fair die, consider the events A = {2, 4, 6} = "even numbers" and B =
{4, 5, 6} = "high scores". Find the probability that the die shows an even number given that
it is a high score.
2.42.
There are two urns. In the first urn there are 3 white and 2 black balls and in the second
urn there are 1 white and 4 black balls. From a randomly chosen urn, one ball is drawn. What
is the probability that the ball is white?
2.43.
The level of college attainment of the US population by racial and ethnic group in 1998 is
given in the following table. [b]
Racial or        Number of    Percentage     Percentage     Percentage with
Ethnic Group     Adults       with           with           Graduate or
                 (Millions)   Associate's    Bachelor's     Professional
                              Degree         Degree         Degree
Native Americans   1.1          6.4            6.1            3.3
Blacks             16.8         5.3            7.5            3.8
Asians             4.3          7.7            22.7           13.9
Hispanics          11.2         4.8            5.9            3.3
Whites             132.0        6.3            13.9           7.7
The percentages given in the right three columns are conditional percentages.
a) How many Asians had a graduate or professional degree in 1998?
b) What percent of all adult Americans had a Bachelor's degree?
c) Given that the person had an Associate's degree, what is the probability that the
person was Hispanic?
2.44.
Given that P (A) = 0.3, P (B) = 0.5 and P (B | A) = 0.4, find the following
a) P (AB)
b) P (A | B)
c) P (A' | B)
d) P (A | B')
2.45.
During the Spring semester, the probability that Johnny was late to school was 0.15. Also,
the probability it rained in the morning was 0.2. Finally, the probability it rained and
Johnny was late to school was 0.1.
a) Find the probability that Johnny was late to school if it rained that morning.
b) Find the probability that Johnny was late to school if it didn't rain that morning.
c) Are the events {Late} and {Rained} independent? Explain.
2.46.
The dealer's lot contains 40 cars arranged in 5 rows and 8 columns. We pick one car
at random. Are the events A = {the car comes from an odd-numbered row} and B =
{the car comes from one of the last 4 columns} independent? Prove your point of view.
2.47.
You have sent applications to two colleges. If you are considering your chances to be
accepted to either college as 60%, and believe the results are statistically independent,
what is the probability that you'll be accepted to at least one?
How will your answer change if you applied to 5 colleges?
2.48.
Show that, if the events A and B are independent, then so are A' and B'.
2.49.
In a high school class, 50% of the students took Spanish, 25% took French and 30% of the
students took neither.
Let A = event that a randomly chosen student took Spanish, and B = event that a
student took French. Fill in either the Venn diagram or a 2-way table and answer the
questions:
[Figure: blank Venn diagram for the events A and B, and a blank 2-way table with margins A, A' and B, B'.]
a) Describe in words the meaning of the event AB'. Find the probability of this event.
b) Are the events A, B independent? Explain with numbers why or why not.
c) If it is known that the student took Spanish, what are the chances that she also took
French?
2.50.
Suppose that the events A and B are independent with P (A ∪ B) = 0.7 and P (A') = 0.4.
Find P (A).
2.51.
Error-correcting codes are designed to withstand errors in data being sent over communication lines. Suppose we are sending a binary signal (consisting of a sequence of 0s and
1s), and during transmission, any bit may get flipped with probability p, independently
of any other bit. However, we might choose to repeat each bit 3 times. For example, if
we want to send a sequence 010, we will code it as 000111000. If one of the three bits
flips, say, the receiver gets the sequence 001111000, he will still be able to decode it as
010 by majority voting. That is, reading the first three bits, 001, he will interpret it as an
attempt to send 000. However, if two of the three bits are flipped, for example 011, this
will be interpreted as an attempt to send 111, and thus decoded incorrectly.
What is the probability of a bit being decoded incorrectly under this scheme? [c]
2.52. *
One half of all female physicists are married. Among those married, 50% are married to
other physicists, 29% to scientists other than physicists and 21% to nonscientists. Among
male physicists, 74% are married. Among them, 7% are married to other physicists, 11%
to scientists other than physicists and 82% to nonscientists. [d] What percent of all physicists
are female? [Hint: This problem can be solved as is, but if you want to, assume that physicists
comprise 1% of the whole population.]
2.53. *
Give an example of events A, B, C such that they are pairwise independent (i.e. P (AB) =
P (A)P (B) etc.) but P (ABC) ≠ P (A)P (B)P (C). [Hint: You may build them on a sample
space with 4 elementary outcomes.]
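2.7 Bayes Rule
[Figure: the sample space partitioned into the events B1, B2, ..., Bk, with the event A overlapping them (for example, in the region B2A).]
$$P(A \mid B_1) = \frac{P(AB_1)}{P(B_1)} \quad \text{and} \quad P(A \mid B_2) = \frac{P(AB_2)}{P(B_2)},$$
$$P(B_1 \mid A) = \frac{P(AB_1)}{P(A)} = \frac{P(A \mid B_1)\, P(B_1)}{P(A)},$$
therefore
$$P(B_1 \mid A) = \frac{P(B_1) P(A \mid B_1)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)}$$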
$$P(A) = \sum_{i=1}^{k} P(B_i A) = \sum_{i=1}^{k} P(B_i) P(A \mid B_i) \qquad (2.8)$$
Subsequently,
$$P(B_j \mid A) = \frac{P(B_j) P(A \mid B_j)}{P(A)} \qquad (2.9)$$
The equation (2.8) is often called the Law of Total Probability.
Example 2.22.
A rare genetic disease (occurring in 1 out of 1000 people) is diagnosed using a DNA screening test. The test has a false positive rate of 0.5%, meaning that
P (test positive | no disease) = 0.005. Given that a person has tested positive, what is the
probability that this person actually has the disease?
First, guess the answer, then read on.
Solution. Let's reason in terms of actual numbers of people, for a change. Imagine 1000
people, 1 of them having the disease. How many out of 1000 will test positive? One that
actually has the disease, and about 5 disease-free people who would test false positive. [4]
Thus, P (disease | test positive) ≈ 1/6.
It is left as an exercise for the reader to write down the formal probability calculation.
Example 2.23.
At a certain assembly plant, three machines make 30%, 45%, and 25%, respectively, of
the products. It is known from the past experience that 2%, 3%, and 2% of the products
made by each machine, respectively, are defective. Now, suppose that a finished product
is randomly selected.
a) What is the probability that it is defective?
b) If a product were chosen randomly and found to be defective, what is the probability
that it was made by machine 3?
Solution. Consider the following events:
A: the product is defective
B1 : the product is made by machine 1,
B2 : the product is made by machine 2,
B3 : the product is made by machine 3.
a) By the Law of Total Probability (2.8),
$$P(A) = P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + P(B_3)P(A \mid B_3) = (0.30)(0.02) + (0.45)(0.03) + (0.25)(0.02) = 0.0245$$
b) By Bayes Rule (2.9),
$$P(B_3 \mid A) = \frac{P(B_3) P(A \mid B_3)}{P(A)} = \frac{0.005}{0.0245} = 0.2041$$
[4] a) Of course, of any actual 1000 people, the number of people having the disease and the number of
people who test positive will vary randomly, so our calculation only makes sense when considering averages
in a much larger population. b) There's also a possibility of a false negative, i.e. a person having the disease
and the test coming out negative. We will neglect this quite rare event.
This calculation can also be represented using a tree. Here, the first branching represents probabilities of the events Bi , and the second branching represents conditional
probabilities P (A | Bi ). The probabilities of intersections, given by the products, are on
the right. P (A) is their sum.
[Figure: tree diagram for Example 2.23; the defective branches are summarized below]
Machine   P(Bi)   P(A | Bi)   P(Bi A) = P(Bi) P(A | Bi)
B1        0.30    0.02        0.006
B2        0.45    0.03        0.0135
B3        0.25    0.02        0.005
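The tree calculation of Example 2.23 can be mirrored in code: multiply along each branch, add the products to get P (A), then divide to get the posterior. This is a minimal Python sketch; the dictionary names are illustrative, not from the notes.

```python
# Prior probabilities P(B_i) and conditional defect rates P(A | B_i)
prior = {"machine 1": 0.30, "machine 2": 0.45, "machine 3": 0.25}
defect_rate = {"machine 1": 0.02, "machine 2": 0.03, "machine 3": 0.02}

# Intersections P(B_i A) = P(B_i) P(A | B_i); their sum is P(A), eq. (2.8)
joint = {m: prior[m] * defect_rate[m] for m in prior}
p_defective = sum(joint.values())

# Bayes rule, eq. (2.9): P(B_j | A) = P(B_j A) / P(A)
posterior = {m: joint[m] / p_defective for m in prior}

print(round(p_defective, 4))              # 0.0245
print(round(posterior["machine 3"], 4))   # 0.2041
```

The posterior probabilities over the three machines add up to 1, which is a useful sanity check on any Bayes calculation.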
Exercises
2.54.
Lucy is undecided as to whether to take a Math course or a Chemistry course. She
estimates that her probability of receiving an A grade would be 1/2 in a math course, and 2/3
in a chemistry course. If Lucy decides to base her decision on the flip of a fair coin, what
is the probability that she gets an A?
2.55.
Of the customers at a gas station, 70% use regular gas, and 30% use diesel. Of the
customers who use regular gas, 60% will fill the tank completely, and of those who use
diesel, 80% will fill the tank completely.
a) What percent of all customers will fill the tank completely?
b) If a customer has filled up completely, what is the probability it was a customer
buying diesel?
2.56.
For an on-line electronics retailer, 5% of customers who buy Zony digital cameras will return them, 3% of customers who buy Lucky Star digital cameras will return them, and 8%
of customers who buy any other brand will return them. Also, among all digital cameras
bought, there are 20% Zonys and 30% Lucky Stars.
Fill in the tree diagram and answer the questions.
(a) What percent of all cameras are returned?
(b) If the camera was just returned, what is the probability it is a Lucky Star?
(c) What percent of all cameras sold were Zony and were not returned?
[Figure: blank tree diagram to fill in, with first-level branches P (B1 ), P (B2 ), P (B3 ), second-level branches such as P (A|B1 ), and intersection probabilities such as P (A ∩ B1 ) on the right.]
2.57.
In 2004, 57% of White households directly and/or indirectly owned stocks, compared to
26% of Black households and 19% of Hispanic households.e The data for Asian households
is not given, but let's assume the same rate as for Whites. Additionally, 77% of households
are classified as either White or Asian, 12% as African American, and 11% as Hispanic.
a) What proportion of all families owned stocks?
b) If a family owned stock, what is the probability it was White/Asian?
2.58.
Drawer one has five pairs of white and three pairs of red socks, while drawer two has three
pairs of white and seven pairs of red socks. One drawer is selected at random, and a pair
of socks is selected at random from that drawer.
a) What is the probability that it is a white pair of socks?
b) Suppose a white pair of socks is obtained. What is the probability that it came from
drawer two?
2.59.
Three newspapers, A, B, and C are published in a certain city. It is estimated from a
survey that, of the adult population, 20% read A, 16% read B, 14% read C, 8%
read both A and B, 5% read both A and C, 4% read both B and C, 2% read all three.
What percentage reads at least one of the papers? Of those that read at least one, what
percentage reads both A and B?
2.60.
Suppose P (A|B) = 0.3, P (B) = 0.4, P (B|A) = 0.6. Find:
a) P (A)
b) P (A B)
2.61. *
This is the famous Monty Hall problem. [f] A contestant on a game show is asked to choose
among 3 doors. There is a prize behind one door and nothing behind the other two. You
(the contestant) have chosen one door. Then the host flings open one of the other doors, and
there's nothing behind it. What is the best strategy? Should you switch to the remaining
door, or just stay with the door you have chosen? What is your probability of success
(getting the prize) for either strategy?
2.62. *
There are two children in a family. We overheard one of them referred to as a boy.
a) Find the probability that there are 2 boys in the family.
b) Suppose that the oldest child is a boy. Again, find the probability that there are 2
boys in the family. [g] [Why is it different from part (a)?]
Chapter exercises
2.63.
At a university, two students were doing well for the entire semester but failed to show up
for the final exam. Their excuse was that they traveled out of state and had a flat tire.
The professor gave them the exam in separate rooms, with one question worth 95 points:
"Which tire was it?" Find the probability that both students mentioned the same tire. [h]
2.64.
In firing the company's CEO, the argument was that during the six years of her tenure, for
the last three years the company's market share was lower than for the first three years.
The CEO claims bad luck. Find the probability that, given six random numbers, the last
three are the lowest among six.
Notes
a
[d] Laurie McNeil and Marc Sher. The dual-career-couple problem. Physics Today, July 1999.
[e] According to http://www.highbeam.com/doc/1G1-167842487.html, Consumer Interests Annual, January 1, 2007, by Hanna, Sherman D.; Lindamood, Suzanne.
[f] There are some interesting factoids about this in Mlodinow's book, including Marilyn vos Savant's
column in Parade magazine and scathing replies from academics, who believed that the probability was
50%. Vos Savant did it again in 2011 with another probability question that seems, however, intentionally
ambiguously worded.
[g] Puzzle cited by Martin Gardner, mentioned in Math Horizons, Sept. 2010. See also the discussion at
http://www.stat.columbia.edu/~cook/movabletype/archives/2010/05/hype\_about\_cond.html
[h] This example is also from Mlodinow's book.
Chapter 3
Discrete distributions
In this chapter, we will consider random quantities that are usually called random variables.
Definition 3.1. Random variable
A random variable (RV) is a number associated with each outcome of some
random experiment.
One can think of the shoe size of a randomly chosen person as a random variable. We
have already seen the example when a die was rolled and a number was recorded. This
number is also a random variable.
Example 3.1.
Toss two coins and record the number of heads: 0, 1 or 2. Then the following outcomes
can be observed.
Outcome            TT   HT   TH   HH
Number of heads    0    1    1    2
The random variables will be denoted with capital letters X, Y, Z, ... and the lowercase x
would represent a particular value of X. For the above example, x = 2 if heads comes up
twice. Now we want to look at the probabilities of the outcomes. For the probability that
the random variable X has the value x, we write P (X = x), or just p(x).
For the coin flipping random variable X, we can make the table:
x      0     1     2
p(x)   1/4   1/2   1/4
A set is called countable if it can be enumerated with positive integers 1, 2, 3, .... Most frequently
we will use integers themselves, or nonnegative integers as possible values of X. Note, however,
that the set of all rational fractions m/n, where both m and n are integers, is also countable.
What does this actually mean? A discrete probability function is a function that
can take a discrete number of values (not necessarily finite). There is no mathematical
restriction that discrete probability functions only be defined at integers, but we will use
integers in many practical situations. For example, if you toss a coin 6 times, you can get
2 heads or 3 heads but not 2.5 heads.
Each of the discrete values has a certain probability of occurrence that is between zero
and one. That is, a discrete function that allows negative values or values greater than
one is not a PMF. The condition that the probabilities add up to one means that one of
the values has to occur.
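The two conditions just described (each probability lies between zero and one, and the probabilities add up to one) are easy to test mechanically. Below is a minimal Python sketch; the function name `is_pmf` and the second example table are made up for illustration.

```python
from fractions import Fraction

def is_pmf(p):
    """Return True if the dict {value: probability} is a valid PMF."""
    probs = list(p.values())
    return all(0 <= q <= 1 for q in probs) and sum(probs) == 1

# Number of heads in two coin tosses (Example 3.1)
coin_heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# Sums to one, but contains a negative value, so it is not a PMF
not_a_pmf = {0: Fraction(3, 4), 1: Fraction(1, 2), 2: Fraction(-1, 4)}

print(is_pmf(coin_heads), is_pmf(not_a_pmf))   # True False
```

With floating-point probabilities the exact comparison `sum(probs) == 1` would need a tolerance; fractions are used here to sidestep that.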
Example 3.2.
A shipment of 8 similar microcomputers to a retail outlet contains 3 that are defective.
If a school makes a random purchase of 2 of these computers, find the probability mass
function for the number of defectives.
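Solution. Let X be a random variable whose values x are the possible numbers of defective
computers purchased by the school. Then x must be 0, 1 or 2. Then,
$$P(X = 0) = \frac{\binom{3}{0}\binom{5}{2}}{\binom{8}{2}} = \frac{10}{28}, \qquad P(X = 1) = \frac{\binom{3}{1}\binom{5}{1}}{\binom{8}{2}} = \frac{15}{28}, \qquad P(X = 2) = \frac{\binom{3}{2}\binom{5}{0}}{\binom{8}{2}} = \frac{3}{28}$$
x      0       1       2
p(x)   10/28   15/28   3/28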
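The counting in Example 3.2 can be reproduced with binomial coefficients from the standard library. This is a small Python sketch under the example's numbers (3 defective, 5 good, 2 purchased); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

defective, good, chosen = 3, 5, 2

# P(X = x): choose x of the defectives and the rest from the good ones
pmf = {
    x: Fraction(comb(defective, x) * comb(good, chosen - x),
                comb(defective + good, chosen))
    for x in range(chosen + 1)
}

for x, p in pmf.items():
    print(x, p)   # Fraction reduces 10/28 to 5/14
```

Note that `Fraction` prints values in lowest terms, so p(0) appears as 5/14 rather than 10/28.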
$$F(x) = P(X \le x) = \sum_{y \le x} p(y)$$
a) $\lim_{x \to -\infty} F(x) = 0$
b) $\lim_{x \to \infty} F(x) = 1$
c) F (x) is non-decreasing
d) $p(x) = F(x) - F(x-) = F(x) - \lim_{y \uparrow x} F(y)$
In words, the CDF of a discrete RV is a step function, whose jumps occur at the values x for
which p(x) > 0 and are equal in size to p(x). It ranges from 0 on the left to 1 on the right.
Example 3.3.
Find the CDF of the random variable from Example 3.2. Using F (x), verify that P (X =
1) = 15/28.
Solution. The CDF of the random variable X is:
F (0) = p(0) = 10/28
F (1) = p(0) + p(1) = 25/28
F (2) = p(0) + p(1) + p(2) = 28/28 = 1.
Hence,
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 10/28 & \text{for } 0 \le x < 1 \\ 25/28 & \text{for } 1 \le x < 2 \\ 1 & \text{for } x \ge 2 \end{cases} \qquad (3.1)$$
Now, P (X = 1) = p(1) = F (1) - F (0) = 25/28 - 10/28 = 15/28.
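The step-function CDF of Example 3.3 can be sketched as a function that adds up p(y) over all y not exceeding x. This is a minimal Python illustration; the names `values`, `p` and `cdf` are assumptions made here.

```python
from fractions import Fraction

values = [0, 1, 2]
p = [Fraction(10, 28), Fraction(15, 28), Fraction(3, 28)]

def cdf(x):
    """Step-function CDF: F(x) = sum of p(y) over all values y <= x."""
    return sum((pi for v, pi in zip(values, p) if v <= x), Fraction(0))

print(cdf(1))            # 25/28
print(cdf(1.5))          # 25/28, F is flat between the jumps
print(cdf(1) - cdf(0))   # 15/28, the jump at x = 1 equals p(1)
print(cdf(-1), cdf(7))   # 0 1
```

The last line illustrates that F ranges from 0 on the left to 1 on the right.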
Exercises
3.1.
Suppose that two dice are rolled independently, with outcomes X1 and X2 . Find the
distribution of the random variable Y = X1 + X2 . [Hint: It's easier to visualize all the
outcomes if you make a two-way table.]
$p(x) = \frac{k}{2^x}$ for $x = -1, 0, 1, 2$
3.4.
With reference to the previous problem find an expression for the values of F (x), that is
CDF of X.
3.5.
For an on-line electronics retailer, X = the number of Zony digital cameras returned per
day follows the distribution given by
x      0      1     2   3     4      5
p(x)   0.05   0.1   ?   0.2   0.25   0.1
3.7.
The CDF of a discrete random variable X is shown in the plot below.
[Figure: plot titled "CDF", with F(x) on the vertical axis ranging from 0.0 to 1.0.]
3.2 Expected values of Random Variables
One of the most important things we'd like to know about a random variable is: what
value does it take on average? What is the average price of a computer? What is the
average value of a number that rolls on a die?
The value is found as the average of all possible values, weighted by how often they
occur (i.e. by their probabilities).
Definition 3.4. Expected value (mean)
The mean or expected value of a discrete random variable X with probability mass
function p(x) is given by
$$E(X) = \sum_x x\, p(x)$$
$$E(X^2) = \sum_x x^2\, p(x).$$
The variance defines the average (or expected) value of the squared difference from the
mean.
If we use $V(X) = E(X - \mu)^2$ as a definition, we can see that
$$V(X) = E(X - \mu)^2 = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$$
due to the linearity of expectation (see Theorem 3.2 below).
Definition 3.6. Standard deviation
The standard deviation of a random variable X is the square root of the variance,
and is given by
$$\sigma = \sqrt{\sigma^2} = \sqrt{E(X - \mu)^2}$$
The mean describes the center of the probability distribution, while standard deviation
describes the spread. Larger values of σ signify a distribution with larger variation. This
will be undesirable in some situations, e.g. industrial process control, where we would like
the manufactured items to have identical characteristics. On the other hand, a degenerate
random variable X that has P (X = a) = 1 for some value of a is not random at all, and
it has the standard deviation of 0.
Example 3.4.
The number of fire emergencies at a rural county in a week has the following distribution:
x           0      1      2      3      4
P (X = x)   0.52   0.28   0.14   0.04   0.02
Find E (X), V (X) and σ.
Solution. From Definition 3.4, we see that
$$E(X) = 0(0.52) + 1(0.28) + 2(0.14) + 3(0.04) + 4(0.02) = 0.76 = \mu$$
and from the definition of $E(X^2)$, we get
$$E(X^2) = 0^2(0.52) + 1^2(0.28) + 2^2(0.14) + 3^2(0.04) + 4^2(0.02) = 1.52$$
Hence, from Definition 3.5, we get
$$V(X) = E(X^2) - \mu^2 = 1.52 - (0.76)^2 = 0.9424$$
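The computations of Example 3.4 follow directly from the definitions, and the standard deviation is then the square root of the variance. The following Python sketch repeats them; the variable names are illustrative.

```python
from math import sqrt

# PMF of the weekly number of fire emergencies (Example 3.4)
pmf = {0: 0.52, 1: 0.28, 2: 0.14, 3: 0.04, 4: 0.02}

mean = sum(x * p for x, p in pmf.items())             # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items()) # E(X^2)
variance = second_moment - mean**2                    # V(X) = E(X^2) - mu^2
sigma = sqrt(variance)                                # standard deviation

print(round(mean, 4), round(second_moment, 4),
      round(variance, 4), round(sigma, 4))
# 0.76 1.52 0.9424 0.9708
```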
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
To interpret this result, let k = 2, for example. Then the interval from μ - 2σ to μ + 2σ
must contain at least $1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4}$ of the probability mass for the random variable.
The Chebyshev inequality is useful when the mean and variance of an RV are known and we
would like to calculate estimates of some probabilities. However, these estimates are
usually quite crude.
Example 3.6.
The performance period of a certain car battery is known to have a mean of 30 months
and standard deviation of 5 months.
a) Estimate the probability that a car battery will last at least 18 months.
b) Give a range of values to which at least 90% of all batteries' lifetimes will belong.
Solution. (a) Let X be the battery performance period. Calculate k such that the value of
18 is k standard deviations below the mean: 18 = 30 - 5k, therefore k = (30 - 18)/5 = 2.4.
From Chebyshev's theorem we have
$$P(30 - 5k < X < 30 + 5k) \ge 1 - \frac{1}{k^2} = 1 - \frac{1}{2.4^2} = 0.826$$
Thus, at least 82.6% of batteries will make it to 18 months. (However, in reality this
percentage could be much higher, depending on the distribution.)
(b) From Chebyshev's theorem we have
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
Setting $1 - \frac{1}{k^2} = 0.90$ and solving for k, we get $k = \sqrt{10} = 3.16$. Hence,
the desired interval is from 30 - 3.16(5) to 30 + 3.16(5), that is, from 14.2 to 45.8 months.
[1] Note that in general E (g(X)) ≠ g(E X); the equality is guaranteed only if g is a linear function!
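Both parts of Example 3.6 amount to evaluating or inverting the bound 1 - 1/k². A short Python sketch of that arithmetic follows; the function name `chebyshev_lower_bound` is an illustrative choice.

```python
from math import sqrt

mu, sigma = 30.0, 5.0   # mean and standard deviation in months

def chebyshev_lower_bound(k):
    """Chebyshev lower bound on P(|X - mu| < k*sigma), for k > 0."""
    return 1 - 1 / k**2

# (a) 18 months is k standard deviations below the mean
k_a = (mu - 18) / sigma
print(k_a, round(chebyshev_lower_bound(k_a), 3))   # 2.4 0.826

# (b) solve 1 - 1/k^2 = 0.90 for k, then build the interval
k_b = sqrt(1 / (1 - 0.90))
print(round(k_b, 2), round(mu - k_b * sigma, 1), round(mu + k_b * sigma, 1))
# 3.16 14.2 45.8
```

For k ≤ 1 the bound is zero or negative and therefore carries no information, which is one reason these estimates are called crude.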
Example 3.7.
The number of customers per day at a certain sales counter, X, has a mean of 20 customers
and standard deviation of 2 customers. The probability distribution of X is not known.
What can be said about the probability that X will be between 16 and 24 tomorrow?
Solution. We want P (16 ≤ X ≤ 24) = P (15 < X < 25). From Chebyshev's theorem
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
Exercises
3.8.
Timmy is selling chocolates door to door. The probability distribution of X, the number
of chocolates he sells in each house, is given by
x           0      1      2      3     4
P (X = x)   0.45   0.25   0.15   0.1   0.05
3.12.
Consider X with the distribution of a random digit, p(x) = 1/10, x = 0, 1, 2, ..., 9
a) Find the mean and standard deviation of X.
b) According to Chebyshev's inequality, estimate the probability that a random digit
will be between 1 and 8, inclusive. Compare to the actual probability.
3.13.
In the Numbers game, two players choose a random number between 1 and 6, and compute
the absolute difference.
That is, if Player 1 gets the number Y1 , and Player 2 gets Y2 , then they find
$X = |Y_1 - Y_2|$
a) Find the distribution of the random variable X (make a table). [Hint: consider all
outcomes (y1 , y2 ).]
b) Find the expected value and variance of X, and $E(X^3)$.
c) If Player 1 wins whenever the difference is 3 or more, and Player 2 wins whenever
the difference is 2 or less, who is more likely to win?
d) If Player 1 bets $1, what is the value that Player 2 should bet to make the game
fair?
3.14.
According to ScanUS.com, the number of cars per household in an Albuquerque neighborhood was distributed as follows
x           0       1       2       3+
P (X = x)   0.047   0.344   0.402   0.207
"3+" really means 3 or more, but let's assume that there are no more than 3 cars in any
household.
Find the expected value and standard deviation of X.
3.15.
For the above Problem, the web site actually reported an average of 1.9 cars per household.
This is higher than the answer for Problem 3.14. Probably, it's due to the fact that
we limited the number of cars to 3.
Suppose we limit the number of cars to 4. This means the distribution will look like
x      0       1       2       3    4
p(x)   0.047   0.344   0.402   p3   p4
where p3 + p4 = 0.207. Assuming that E (X) = 1.9, reverse-engineer this information to
find p3 and p4 .
3.16.
The frequencies of electromagnetic waves in the upper ionosphere observed in the vicinity
of earthquakes have the mean 1.7 kHz, and standard deviation of 0.2 kHz. According to
Chebyshev inequality,
a) What percent of all observed waves is guaranteed to be contained in the interval 1.4
to 2.0 kHz?
b) Give an interval that would contain at least 95% of all such observed waves.
3.17.
Find the mean and variance of the given PMF p(x) = 1/k, where x = 1, 2, 3, ..., k.
3.18.
Show that the function defined by $p(x) = 2^{-x}$ for x = 1, 2, 3, ... can represent a probability
mass function of a random variable X. Find the mean and the variance of X.
3.19.
For t > 0 show that $p(x) = e^{-t}(1 - e^{-t})^{x-1}$, x = 1, 2, 3, ... can represent a probability
mass function. Also, find E (X) and V (X).
3.20. *
The average salary of the employees in a firm is 80 thousand dollars, and the standard
deviation is 100 thousand. Given that the salary can't be negative, what can you say
about the proportion of the employees who earn more than 150 thousand?
3.21. * Baker's problem
A shopkeeper is selling the quantity X (between 0 and 3) of a certain item per week, with
a given probability distribution:
x      0      1     2     3
p(x)   0.05   0.2   0.5   0.25
For each item bought, the profit is $50. On the other hand, if the item is stocked, but was
not bought, then the cost of upkeep, insurance etc. is $20. At the beginning of the week,
the shopkeeper stocks a items.
For example, if 3 items were stocked, then the expected profit can be calculated from the
following table:
Y = Profit
y      -$60   $10   $80   $150
p(y)   0.05   0.2   0.5   0.25
3.3 Bernoulli distribution
Let X be the random variable denoting the condition of the inspected item. Agree to
write X = 1 when the item is defective and X = 0 when it is not. (This is a convenient
notation because, once we inspect n such items, X1 , X2 , ..., Xn denoting their condition,
the total number of defectives will be given by X1 + X2 + ... + Xn .)
Let p denote the probability of observing a defective item. The probability distribution
of X, then, is given by
x      0           1
p(x)   q = 1 - p   p
3.4 Binomial distribution
Now, let us inspect n items and count the total number of defectives. This process of
repeating an experiment n times is called Bernoulli trials. The Bernoulli trials are
formally defined by the following properties:
a) The result of each trial is either a success or a failure
b) The probability of success p is constant from trial to trial.
c) The trials are independent
d) The random variable X is defined to be the number of successes in n repeated trials
This situation applies to many random processes with just two possible outcomes: a heads-or-tails coin toss, a made or missed free throw in basketball etc. [2] We arbitrarily call one
of these outcomes a success and the other a failure.
Definition 3.7. Binomial RV
Assume that each Bernoulli trial can result in a success with probability p and a
failure with probability q = 1 - p. Then the probability distribution of the
binomial random variable X, the number of successes in n independent trials, is
$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n.$$
The mean and variance of the binomial distribution are
$$E(X) = \mu = np \quad \text{and} \quad V(X) = \sigma^2 = npq.$$
We can notice that the mean and variance of the Binomial are n times larger than those
of the Bernoulli random variable.
[2] However, we have to make sure that the probability of success remains constant. Thus, for example,
wins or losses in a series of football games may not be a Bernoulli experiment!
Figure 3.2: Binomial PMF: left, with n = 60, p = 0.6; right, with n = 15, p = 0.5
Note that the Binomial distribution is symmetric when p = 0.5. Also, two Binomials with
the same n and p2 = 1 - p1 are mirror images of each other.
Figure 3.3: Binomial PMF: left, with n = 15, p = 0.1; right, with n = 15, p = 0.8
Example 3.8.
The probability that a certain kind of component will survive a shock test is 0.75. Find
the probability that
a) exactly 2 of the next 8 components tested survive,
b) at least 2 will survive,
c) at most 6 will survive.
Solution. (a) Assuming that the tests are independent and p = 0.75 for each of the 8 tests,
we get
$$P(X = 2) = \binom{8}{2}(0.75)^2(0.25)^{8-2} = \frac{8!}{2!\,(8-2)!}\, 0.75^2\, 0.25^6 = \frac{40320}{2 \times 720}(0.5625)(0.000244) = 0.003843$$
(b)
$$P(X \ge 2) = 1 - P(X \le 1) = 1 - [P(X = 1) + P(X = 0)] = 1 - [8(0.75)(0.000061) + 0.000015] = 1 - 0.000381 \approx 0.9996$$
(c)
$$P(X \le 6) = 1 - P(X \ge 7) = 1 - [P(X = 7) + P(X = 8)] = 1 - [0.2669 + 0.1001] = 1 - 0.367 = 0.633$$
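The three answers of Example 3.8 can be checked directly from the Binomial formula of Definition 3.7. Here is a minimal Python sketch; the helper `binom_pmf` is an illustrative name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.75
p_exactly_2 = binom_pmf(2, n, p)
p_at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)   # complement
p_at_most_6 = 1 - binom_pmf(7, n, p) - binom_pmf(8, n, p)    # complement

print(round(p_exactly_2, 6), round(p_at_least_2, 4), round(p_at_most_6, 3))
# 0.003845 0.9996 0.633
```

The value for part (a) differs from the hand calculation in the last digit only because the hand calculation rounds 0.25^6 to 0.000244 before multiplying.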
Example 3.9.
It has been claimed that in 60% of all solar heating installations the utility bill is reduced
by at least one-third. Accordingly, what are the probabilities that the utility bill will be
reduced by at least one-third in
(a) four of five installations;
(b) at least four of five installations?
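Solution.
(a)
$$P(X = 4) = \binom{5}{4}(0.60)^4(0.40)^{5-4} = 5(0.1296)(0.4) = 0.2592$$
(b)
$$P(X = 5) = \binom{5}{5}(0.60)^5(0.40)^{5-5} = 0.60^5 = 0.0778$$
so that P (X ≥ 4) = P (X = 4) + P (X = 5) = 0.2592 + 0.0778 = 0.3370.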
Exercises
3.22.
There's a 50% chance that a mutual fund's return in any given year will beat the industry's
average. What proportion of funds will beat the industry average in at least 4 of the
last 5 years?
3.23.
Biologists would like to catch Costa Rican glass frogs for breeding. There is a 75% probability that a glass frog they catch is male. If 10 glass frogs of a certain species are caught,
what are the chances that they will have at least 2 male and 2 female frogs? What is the
expected value of the number of female frogs caught?
3.24.
A 5-member focus group is testing a new game console. Suppose that there's a 50%
chance that any given group member approves of the new console, and their opinions are
independent of each other.
a) Calculate and fill out the probability distribution for X = number of group members
who approve of the new console.
b) Calculate P (X 3).
c) How does your answer in part (b) change when there's a 70% chance that any group
member approves of the new console?
3.25.
Suppose that the four engines of a commercial airplane were arranged to operate independently and that the probability of in-flight failure of a single engine is 0.01. Find:
3.26.
Suppose a television contains 60 transistors, 2 of which are defectives. Five transistors are
selected at random, removed and inspected. Approximate
a) probability of selecting no defectives,
b) probability of selecting at least one defective.
c) The mean and variance for the number of defectives selected.
3.27.
A train is made up of 50 railroad cars. Each car may need service with probability 0.05.
Let X be the total number of cars in the train that need service.
a) Find the mean and standard deviation of X.
b) Find the probability that no cars need service.
c) Find the probability that at least two cars need service.
3.28.
Show that mean and variance of the binomial random variable X are np and npq respectively.
3.29.
If a thumb-tack is flipped, then the probability that it will land point-up is 1/3. If this
thumb-tack is flipped 6 times, then find:
a) the probability that it lands point-up on exactly 2 flips,
b) at least 2 flips,
c) at most 4 flips.
3.30.
The proportion of people with type A blood in a certain city is reported to be 0.20.
Suppose a random group of 20 people is taken and their blood types are to be checked.
What is the probability that there are at least 4 people who have type A blood in the
sample? What is the probability that at most 5 people in the group have type A blood?
3.31.
A die and a coin are tossed together. Let us define success as the event that the die shows
an odd number and the coin shows a head. We repeat the experiment 5 times. What is
the probability of exactly 3 successes?
3.5 Geometric distribution
In the case of Binomial distribution, the number of trials was a fixed number n, and
the variable of interest was the number of successes. It is sometimes of interest to count
instead how many trials are required to achieve a specified number of successes.
The number of trials Y required to obtain the first success is called a Geometric random
variable with parameter p.
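Theorem 3.4. Geometric RV
The probability mass function for a Geometric random variable is
$$g(y; p) := P(Y = y) = (1 - p)^{y-1}\, p, \qquad y = 1, 2, 3, \dots$$
Its CDF is
$$F(y) = 1 - q^{y}, \qquad y = 1, 2, 3, \dots, \qquad q = 1 - p.$$
Its mean and variance are
$$\mu = \frac{1}{p} \quad \text{and} \quad \sigma^2 = \frac{1 - p}{p^2}.$$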
Proof. To achieve the first success on the $y$th trial means that the first $y - 1$ trials result
in failures and the last, $y$th, one is a success. Then, by independence of trials,
$$P(FF \dots FS) = q^{y-1} p$$
Now the CDF
$$F(y) = P(Y \le y) = 1 - P(Y > y)$$
The event $Y > y$ means that all the trials up to and including the $y$th one resulted in failures,
which has probability $P(y \text{ failures in a row}) = q^{y}$, and we get the CDF by subtracting this from 1.
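The mean $E(Y)$ can be found by differentiating a geometric series:
$$E(Y) = \sum_{y=1}^{\infty} y\,p(y) = \sum_{y=1}^{\infty} y\,p\,(1-p)^{y-1} = p \sum_{y=1}^{\infty} y\,q^{y-1} = p \sum_{y=1}^{\infty} \frac{d}{dq} q^{y} = p\,\frac{d}{dq} \sum_{y=1}^{\infty} q^{y} =$$
$$= p\,\frac{d}{dq}\left(1 + q + q^2 + q^3 + \dots - 1\right) = p\,\frac{d}{dq}\left[(1-q)^{-1} - 1\right] = \frac{p}{(1-q)^2} = \frac{1}{p}.$$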
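The variance can be calculated by differentiating a geometric series twice:
$$E\{Y(Y-1)\} = \sum_{y=2}^{\infty} y(y-1)\,p\,q^{y-1} = pq \sum_{y=2}^{\infty} \frac{d^2}{dq^2}\left(q^{y}\right) = pq\,\frac{d^2}{dq^2}\left[\sum_{y=0}^{\infty} q^{y}\right] = pq\,\frac{d^2}{dq^2}(1-q)^{-1} = pq\,\frac{2}{(1-q)^3} = \frac{2q}{p^2}$$
Hence
$$E(Y^2) = \frac{2q}{p^2} + \frac{1}{p} \quad \text{and} \quad V(Y) = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{q}{p^2}$$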
Figure 3.4: Geometric PMF: left, with p = 0.2; right, with p = 0.5 (vertical axis: p(x))
Example 3.10.
For a certain manufacturing process it is known that, on the average, 1 in every 100 items
is defective. What is the probability that the first defective item found is the fifth item
inspected? What is the average number of items that should be sampled before the first
defective is found?
Solution. Using the geometric distribution with y = 5 and p = 0.01, we have
$$g(5; 0.01) = (0.01)(0.99)^4 = 0.0096.$$
The mean number of items needed is $\mu = 1/p = 100$.
Example 3.11.
If the probability is 0.20 that a burglar will get caught on any given job, what is the
probability that he will get caught no later than on his fourth job?
Solution. Substituting y = 4 and p = 0.20 into the geometric CDF, we get
$$P(Y \le 4) = 1 - 0.8^4 = 0.5904$$
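The two examples above can be checked numerically. The following is a minimal Python sketch; the function names `geom_pmf` and `geom_cdf` are illustrative choices, not part of the notes.

```python
def geom_pmf(y: int, p: float) -> float:
    """P(Y = y): the first success occurs on trial y (y = 1, 2, ...)."""
    return (1 - p) ** (y - 1) * p


def geom_cdf(y: int, p: float) -> float:
    """P(Y <= y) = 1 - q^y with q = 1 - p."""
    return 1 - (1 - p) ** y


# Example 3.10: first defective item is the fifth one inspected
print(round(geom_pmf(5, 0.01), 4))
# Example 3.11: burglar caught no later than the fourth job
print(round(geom_cdf(4, 0.20), 4))
# The CDF should agree with the running sum of the PMF
print(abs(sum(geom_pmf(y, 0.2) for y in range(1, 5)) - geom_cdf(4, 0.2)) < 1e-12)
```

Summing the PMF over y = 1, ..., 4 gives the same value as the closed-form CDF, which is a quick sanity check on both formulas.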
Exercises
3.32.
The probability to be caught while running a red light is estimated as 0.1. What is the
probability that a person is first caught on his 10th attempt to run a red light? What is
the probability that a person runs a red light at least 10 times without being caught?
3.33.
A computing center is interviewing people until they find a qualified person to fill a vacant
position. The probability that any single applicant is qualified is 0.15.
a) Find the expected number of people to interview.
b) Find the probability the center will need to interview between 4 and 8 people (inclusive).
3.34.
From past experience it is known that 3% of accounts in a large accounting population
are in error. What is the probability that the first account in error is found on the 5th
try? What is the probability that the first account in error occurs in the first five accounts
audited?
3.35.
A rat must choose between five doors, one of which contains chocolate. If the rat chooses
the wrong door, it is returned to the starting point and chooses again (randomly), and
continues until it gets the chocolate. What is the probability of the rat getting chocolate
on the second attempt?
3.36.
If the probability of a success is 0.01, how many trials are necessary so that probability of
at least one success is greater than 0.5?
3.6 Negative Binomial distribution
Let Y denote the number of the trial on which the rth success occurs in a sequence of
independent Bernoulli trials, with p the probability of success. Such Y is said to have
Negative Binomial distribution. When r = 1, we will of course obtain the Geometric
distribution.
Theorem 3.5. Negative Binomial RV
The PMF of the Negative Binomial random variable Y is
$$nb(y; r, p) := P(Y = y) = \binom{y-1}{r-1} p^{r} q^{y-r}, \qquad y = r, r+1, \dots$$
The mean and variance of Y are:
$$E(Y) = \frac{r}{p} \quad \text{and} \quad V(Y) = \frac{rq}{p^2}.$$
Proof. We have
$$P(Y = y) = P[\text{first } y-1 \text{ trials contain } r-1 \text{ successes and the } y\text{th trial is a success}] =$$
$$= \binom{y-1}{r-1} p^{r-1} q^{y-r} \cdot p = \binom{y-1}{r-1} p^{r} q^{y-r}, \qquad y = r, r+1, r+2, \dots$$
The proof for the mean and variance uses the properties of the independent sums to
be discussed in Section 5.4. However, note at this point that both $\mu$ and $\sigma^2$ are r times
larger than those of the Geometric distribution.
Example 3.12.
In an NBA championship series, the team which wins four games out of seven will be the
winner. Suppose that team A has probability 0.55 of winning over the team B, and the
teams A and B face each other in the championship games.
(a) What is the probability that team A will win the series in six games?
(b) What is the probability that team A will win the series?
Solution.
(a) $nb(6; 4, 0.55) = \binom{5}{3} (0.55)^4 (1 - 0.55)^{6-4} = 0.1853$.
(b) P(team A wins the championship series) =
= nb(4; 4, 0.55) + nb(5; 4, 0.55) + nb(6; 4, 0.55) + nb(7; 4, 0.55) =
= 0.0915 + 0.1647 + 0.1853 + 0.1668 = 0.6083
Example 3.13.
A pediatrician wishes to recruit 5 couples, each of whom is expecting their first child, to
participate in a new childbirth regimen. She anticipates that 20% of all couples she asks
will agree. What is the probability that 15 couples must be asked before 5 are found who
agree to participate?
Solution. Substituting y = 15, p = 0.2, r = 5, we get
$$nb(15; 5, 0.2) = \binom{14}{4} (0.2)^5 (0.8)^{15-5} = 0.034$$
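The Negative Binomial probabilities in Examples 3.12 and 3.13 can be reproduced with a few lines of Python. This is only a sketch; `nb_pmf` is a name chosen here for illustration.

```python
from math import comb


def nb_pmf(y: int, r: int, p: float) -> float:
    """P(Y = y): the r-th success occurs on trial y (y = r, r + 1, ...)."""
    return comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r)


# Example 3.12 (a): team A wins the series in exactly six games
win_in_six = nb_pmf(6, 4, 0.55)
# Example 3.12 (b): team A wins the series in 4, 5, 6 or 7 games
win_series = sum(nb_pmf(y, 4, 0.55) for y in range(4, 8))
# Example 3.13: the fifth agreeing couple is the 15th couple asked
fifteen_asked = nb_pmf(15, 5, 0.2)

print(round(win_in_six, 4), round(win_series, 4), round(fifteen_asked, 3))
```

With r = 1 the same function reduces to the Geometric PMF, in line with the remark at the start of this section.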
Exercises
3.37.
Biologists catch Costa Rican glass frogs for breeding. There is 75% probability that a
glass frog they catch is male. Biologists would like to have at least 2 female frogs. What is
the expected value of the total number of frogs caught, until they reach their goal? What
is the probability that they will need exactly 6 frogs to reach their goal?
3.38.
Jim is a high school baseball player. He has a 0.25 batting average, meaning that he makes
a hit in 25% of his tries (at-bats). What is the probability that Jim makes his second
hit of the season on his sixth at-bat?
3.39.
A telemarketer needs to sell 3 insurance policies before lunch. He estimates the probability
of a sale as 0.1. How many calls, on average, does he need to make before lunch? What
is the probability that he needs exactly 25 calls to reach his goal?
3.40.
In the best-of-5 series, Team A has 60% chance to win any single game, and the outcomes
of the games are independent. Find the probability that Team A will win the series (i.e.
will win the majority of the games).
3.41.
For Problem 3.40, find the expected duration of the series (regardless of which team wins).
[Hint: First, fill out the table containing d and p(d), the distribution of the duration D. For example,
P(D = 3) = P(team A wins in 3) + P(team B wins in 3).]
3.7 Poisson distribution
It is often useful to define a random variable that counts the number of events that occur
within certain specified boundaries. For example, the number of telephone calls
received by customer service within a certain time limit. The Poisson distribution is often
appropriate to model such situations.
Definition 3.8. Poisson RV
A random variable X with a Poisson distribution takes the values x = 0, 1, 2, . . .
with a probability mass function
$$pois(x; \mu) := P(X = x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
Some textbooks use $\lambda$ for the parameter. We will use $\lambda$ for the intensity of the Poisson process,
to be discussed later.
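Recall the series $e^{x} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$ Then
$$E(X) = \sum_{x=0}^{\infty} x\, pois(x, \mu) = \sum_{x=1}^{\infty} x\,\frac{e^{-\mu}\mu^{x}}{x!} = \sum_{x=1}^{\infty} x\,\frac{\mu\, e^{-\mu}\mu^{x-1}}{x\,(x-1)!} =$$
$$= \mu e^{-\mu} \sum_{x=1}^{\infty} \frac{\mu^{x-1}}{(x-1)!} = \mu e^{-\mu}\left(1 + \frac{\mu}{1!} + \frac{\mu^2}{2!} + \frac{\mu^3}{3!} + \dots\right) = \mu e^{-\mu} e^{\mu} = \mu$$
Now,
$$E[X(X-1)] = \sum_{x=0}^{\infty} x(x-1)\,\frac{e^{-\mu}\mu^{x}}{x!} = \sum_{x=2}^{\infty} x(x-1)\,\frac{\mu^2 e^{-\mu}\mu^{x-2}}{x(x-1)(x-2)!} =$$
$$= \mu^2 e^{-\mu} \sum_{x=2}^{\infty} \frac{\mu^{x-2}}{(x-2)!} = \mu^2 e^{-\mu} e^{\mu} = \mu^2$$
Hence $V(X) = E[X(X-1)] + E(X) - [E(X)]^2 = \mu^2 + \mu - \mu^2 = \mu$.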
Figure 3.5: Poisson PMF: left, with $\mu = 1.75$; right, with $\mu = 8$ (axes: x, p(x))
Example 3.14.
During World War II, the Nazis bombed London using V-2 missiles. To study the locations
where missiles fell, the British divided the central area of London into 576 half-kilometer
squares. The following is the distribution of counts per square:

Number of missiles     Number of     Expected (Poisson)
in a square            squares       number of squares
0                      229           227.5
1                      211           211.3
2                      93            98.1
3                      35            30.4
4                      7             7.1
5 and over             1             1.6
Total                  576           576.0

For example, with an average of 0.9288 missiles per square, the expected number of squares with no missiles is
$$576 \cdot \frac{e^{-0.9288}\, 0.9288^{0}}{0!} = 227.5$$
The same way, fill out the rest of the expected counts column. As you can see, the data
match the Poisson model very closely!
The Poisson distribution is often described as a distribution of spatial randomness. Because the
counts matched it so closely, the British command was able to conclude that the missiles were unguided.
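The expected counts in Example 3.14 can be recomputed from the observed column. The sketch below assumes that the single square in the "5 and over" row received exactly 5 missiles when estimating the mean (an assumption made here, not stated in the notes); the last digits may differ slightly from the table because of rounding.

```python
from math import exp, factorial

# Observed number of squares for each missile count; the "5 and over"
# row is treated as exactly 5 hits (assumption for estimating the mean).
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n_squares = sum(observed.values())
mu_hat = sum(k * count for k, count in observed.items()) / n_squares


def pois_pmf(x: int, mu: float) -> float:
    """Poisson probability mass function."""
    return exp(-mu) * mu**x / factorial(x)


# Expected counts for 0..4 missiles, then "5 and over" as the remainder
expected = [n_squares * pois_pmf(x, mu_hat) for x in range(5)]
expected.append(n_squares - sum(expected))

print(n_squares, round(mu_hat, 4))
print([round(e, 1) for e in expected])
```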
Using the CDF
Knowledge of the CDF (cumulative distribution function) is useful for calculating probabilities
of the type $P(a \le X \le b)$. In fact,
$$P(a < X \le b) = F_X(b) - F_X(a) \qquad (3.2)$$
(you have to carefully watch strict and non-strict inequalities). We might use CDF tables
(see Appendix) to calculate such probabilities. Nowadays, CDFs of popular distributions
are built into various software packages.
Example 3.15.
During a laboratory experiment, the average number of radioactive particles passing
through a counter in one millisecond is 4. What is the probability that 6 particles enter
the counter in a given millisecond? What is the probability of at least 6 particles?
Solution. Using the Poisson distribution with x = 6 and $\mu = 4$, we get
$$pois(6; 4) = \frac{e^{-4}\, 4^{6}}{6!} = 0.1042$$
$$\frac{e^{-2}\, 2^{1}}{1!} = 0.271$$
b) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{e^{-2} 2^{0}}{0!} + \frac{e^{-2} 2^{1}}{1!} + \frac{e^{-2} 2^{2}}{2!} =$
= 0.1353 + 0.271 + 0.271 = 0.6766
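Both questions of Example 3.15 can be handled with the Poisson PMF and CDF, using relation (3.2) in the form $P(X \ge 6) = 1 - F_X(5)$. The following Python sketch uses illustrative names `pois_pmf` and `pois_cdf`.

```python
from math import exp, factorial


def pois_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def pois_cdf(x: int, mu: float) -> float:
    """P(X <= x), obtained by summing the PMF from 0 to x."""
    return sum(pois_pmf(k, mu) for k in range(x + 1))


p_exactly_6 = pois_pmf(6, 4)        # exactly 6 particles
p_at_least_6 = 1 - pois_cdf(5, 4)   # at least 6 = 1 - P(X <= 5)

print(round(p_exactly_6, 4), round(p_at_least_6, 4))
```

Note the non-strict inequality: "at least 6" removes the values 0 through 5, so the CDF is evaluated at 5, not at 6.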
Exercises
3.42.
The number of cable breakages in a year is known to have a Poisson distribution with $\mu = 0.32$.
a) Find the mean and standard deviation of the number of cable breakages in a year.
b) According to Chebyshev's inequality, what is the upper bound for $P(X \ge 2)$?
c) What is the exact probability $P(X \ge 2)$, based on the Poisson model?
3.43.
Bolted assemblies on a hull of a spacecraft may become loose with probability 0.005. There
are 96 such assemblies on board. Assuming that assemblies behave statistically independently, find the probability that there is at most one loose assembly on board.
3.44.
At a barber shop, the expected number of customers per day is 8. What is the probability that,
on a given day, between 7 and 9 customers (inclusive) show up? At least 3 customers?
3.45.
Poisson distribution can be derived by considering Binomial with n large and p small.
Compare computationally
a) Binomial with n = 20, p = 0.05: find P (X = 0), P (X = 1) and P (X = 2).
b) Repeat for Binomial with n = 200, p = 0.005
c) Poisson with $\mu = np = 1$ [Note that $\mu$ matches the expected value for both (a) and
(b).]
d) Compare the standard deviations for distributions in (a)-(c)
3.46.
An airline finds that 5% of the people making reservations on a certain flight will not
show up for the flight. If the airline sells 160 tickets for a flight with 155 seats, what is
the probability that the flight ends up overbooked, i.e. more than 155 people will show
up? [Hint: Use the Poisson approximation for the number of people who will not show up.]
3.47.
A region experiences, on average, 7.5 earthquakes (magnitude 5 or higher), per year.
Assuming Poisson distribution, find the probability that
a) between 5 and 9 earthquakes will happen in a year;
b) at least one earthquake will happen in a given month.
c) Find the mean and standard deviation of the number of earthquakes per year.
3.48.
A plumbing company estimates that it gets an average of 60 service calls per week. Assuming
Poisson distribution, find the probability that, in a given week
a) it gets exactly 60 service calls;
b) it gets between 55 and 59 service calls.
3.49.
A credit card company estimates that, on average, 0.18% of all its internet transactions
are fraudulent. Out of 1000 transactions,
a) find the mean and standard deviation of the number of fraudulent transactions,
b) approximate the probability that at least one transaction will be fraudulent,
c) approximate the probability that 3 or fewer transactions will be fraudulent.
3.8 Hypergeometric distribution
Consider the Hypergeometric experiment, that is, one that possesses the following
two properties:
a) A random sample of size n is selected without replacement from N items.
b) Of the N items overall, k may be classified as successes and $N - k$ are classified as
failures.
We will be interested, as before, in the number of successes X, but now the probability
of success is not constant (why?).
Theorem 3.7.
The PMF of the hypergeometric random variable X, the number of successes in a
random sample of size n selected from N items of which k are labeled success and
$N - k$ labeled failure, is
$$hg(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad x = 0, 1, \dots, \min(n, k)$$
The mean and variance of the hypergeometric distribution are
$$\mu = n\,\frac{k}{N} \quad \text{and} \quad \sigma^2 = n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1}$$
We have already seen such a random variable: see Example 3.2. Here are some more
examples.
Example 3.17.
Lots of 40 components each are called unacceptable if they contain as many as 3 defectives
or more. The procedure for sampling the lot is to select 5 components at random and to
reject the lot if a defective is found. What is the probability that exactly 1 defective is
found in the sample if there are 3 defectives in the entire lot?
Solution. Using the above distribution with n = 5, N = 40, k = 3 and x = 1, we can find
the probability of obtaining one defective to be
$$hg(1; 40, 5, 3) = \frac{\binom{3}{1}\binom{37}{4}}{\binom{40}{5}} = 0.3011$$
Example 3.18.
A shipment of 20 tape recorders contains 5 that are defective. If 10 of them are randomly
chosen for inspection, what is the probability that 2 of the 10 will be defective?
Solution. Substituting x = 2, n = 10, k = 5, and N = 20 into the formula, we get
$$P(X = 2) = \frac{\binom{5}{2}\binom{15}{8}}{\binom{20}{10}} = \frac{10(6435)}{184756} = 0.348$$
Note that, if we were sampling with replacement, we would have Binomial distribution
(why?) with p = k/N . In fact, if N is much larger than n, then the difference between
Binomial and Hypergeometric distribution becomes small.
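The closing remark can be illustrated numerically. The Python sketch below recomputes Examples 3.17 and 3.18 and then compares the Hypergeometric and Binomial probabilities for a large population; the values N = 2000, k = 500, n = 10 in the comparison are hypothetical, chosen only so that N is much larger than n.

```python
from math import comb


def hg_pmf(x: int, N: int, n: int, k: int) -> float:
    """Hypergeometric PMF: x successes in a sample of n drawn without
    replacement from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial PMF: x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


print(round(hg_pmf(1, 40, 5, 3), 4))    # Example 3.17
print(round(hg_pmf(2, 20, 10, 5), 3))   # Example 3.18

# Hypothetical large population: sampling with or without replacement
# gives nearly the same answer when N is much larger than n.
N, k, n = 2000, 500, 10
gap = abs(hg_pmf(2, N, n, k) - binom_pmf(2, n, k / N))
print(gap < 0.005)
```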
Exercises
3.50.
Out of 10 construction facilities, 4 are in-state and 6 are out of state. Three facilities are
earmarked as test sites for a new technology. What is the probability that 2 out of 3 are
out of state?
3.51.
A box contains 8 diodes, among them 3 are of new design. If 4 diodes are picked randomly
for a circuit, what is the probability that at least one is of new design?
3.52.
There are 25 schools in a district, 10 of which are performing below standard. Five schools
are selected at random for an in-depth study. Find:
a) Probability that in your sample, no schools perform below standard.
b) Probability of selecting at least one that performs below standard.
c) The mean and variance for the number of the schools that perform below standard.
3.53.
A small division, consisting of 6 women and 4 men, picks employee of the month for 3
months in a row. Suppose that, in fact, a random person is picked each month. Let X be
the number of times a woman was picked. Calculate the distribution of X (make a table
with all possible values), for the cases
a) No repetitions are allowed.
b) Repetitions are allowed (the same person can be picked again and again).
c) Compare the results.
3.54.
A jar contains 50 red marbles and 30 blue marbles. Four marbles were selected at random.
Find the probability to obtain at least 3 red marbles, if the sampling was
a) without replacement;
b) with replacement.
c) Compare the results.
3.9 Moment generating function
We saw in an earlier section that, if g(Y) is a function of a random variable Y with PMF
p(y), then
$$E[g(Y)] = \sum_{y} g(y)\,p(y)$$
The expected values of powers of random variables are often called moments. For
example, $E(Y)$ is the first moment of Y, and $E(Y^2)$ is the second moment of Y. When
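$$\sum_{x=1}^{\infty} e^{tx}\, p\, q^{x-1} = \frac{p}{q} \sum_{x=1}^{\infty} \left(q e^{t}\right)^{x}$$
On the right, we have an infinite geometric series with first term $qe^{t}$ and ratio $qe^{t}$. Its
sum is $\sum_{x=1}^{\infty} (qe^{t})^{x} = \frac{qe^{t}}{1 - qe^{t}}$. We obtain
$$M(t) = \frac{p\,e^{t}}{1 - qe^{t}}$$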
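A numerical check of this closed form is sketched below in Python. It assumes the standard property that the first derivative of a moment generating function at t = 0 equals the mean, and it uses a central finite difference rather than symbolic differentiation; the names `geom_mgf`, `mgf_partial_sum` and `derivative` are illustrative.

```python
from math import exp


def geom_mgf(t: float, p: float) -> float:
    """Closed form M(t) = p e^t / (1 - q e^t), valid when q e^t < 1."""
    q = 1 - p
    return p * exp(t) / (1 - q * exp(t))


def mgf_partial_sum(t: float, p: float, terms: int = 2000) -> float:
    """Direct partial sum of e^{tx} p q^{x-1} over x = 1..terms."""
    q = 1 - p
    return sum(exp(t * x) * p * q ** (x - 1) for x in range(1, terms + 1))


def derivative(f, t: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


p = 0.2
# The series and the closed form should agree where the series converges
print(abs(geom_mgf(0.1, p) - mgf_partial_sum(0.1, p)) < 1e-9)
# M'(0) should be close to the Geometric mean 1/p = 5
print(round(derivative(lambda t: geom_mgf(t, p), 0.0), 4))
```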
Exercises
3.55.
Find $M_X(t)$ for random variables X given by
a) $p(x) = 1/3$, $x = -1, 0, 1$
b) $p(x) = \left(\frac{1}{2}\right)^{x+1}$, $x = 0, 1, 2, \dots$
c) $p(x) = \frac{1}{8}\binom{3}{x}$, $x = 0, 1, 2, 3$
3.56.
$$\frac{1}{1 - t^2}$$
Chapter 4
Continuous probability distributions
4.1
All of the random variables discussed previously were discrete, meaning they can take only
a finite (or, at most, countable) number of values. However, many of the random variables
seen in practice have more than a countable collection of possible values. For example, the
metal content of ore samples may run from 0.10 to 0.80. Such random variables can take
any value in an interval of real numbers. Since the random variables of this type have a
continuum of possible values, they are called continuous random variables.1
Definition 4.1. Density (PDF)
The function f(x) is a probability density function (PDF) for the continuous
random variable X, defined over the set of real numbers R, if
a) $f(x) \ge 0$, for all $x \in R$
b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
c) $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$.
What does this actually mean? Since continuous probability functions are defined for an infinite
number of points over a continuous interval, the probability at a single point is always zero. Probabilities are measured over intervals, not single points. That is, the area under the curve between
two distinct points defines the probability for that interval. This means that the height of the
probability function can in fact be greater than one. The property that the integral must equal
one is equivalent to the property for discrete distributions that the sum of all the probabilities
must equal one.
1
Even though the tools we will use to describe continuous RVs are different from the tools we use for
discrete ones, practically there is not an enormous gulf between them. For example, a physical measurement
of, say, wavelength may be continuous. However, when the measurements are recorded (either on paper
or in computer memory), they will take a finite number of values. The number of values will increase
if we keep more decimals in the recorded quantity. With rounding we can discretize the problem, that
is, reduce a continuous problem to a discrete one, whose solution will hopefully be close enough to the
continuous one. In order to see if we have discretized a problem in a right way, we still need to know
something about the nature of continuous random variables.
Example 4.1.
Suppose that the error in the reaction temperature, in °C, for a controlled laboratory
experiment is a continuous random variable X having the density
$$f(x) = \begin{cases} \frac{x^2}{3} & \text{for } -1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Verify condition (b) of Definition 4.1.
(b) Find P (0 < X < 1).
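Solution. (a) $\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{2} \frac{x^2}{3}\,dx = \frac{x^3}{9}\Big|_{-1}^{2} = \frac{8}{9} + \frac{1}{9} = 1$
(b) $P(0 < X < 1) = \int_{0}^{1} \frac{x^2}{3}\,dx = \frac{x^3}{9}\Big|_{0}^{1} = \frac{1}{9}$.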
As an immediate consequence of equation (4.1) one can write these two results:
(a) $P(a < X \le b) = F(b) - F(a)$
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-1}^{x} \frac{t^2}{3}\,dt = \frac{t^3}{9}\Big|_{-1}^{x} = \frac{x^3 + 1}{9},$$
Note that the same relation holds for discrete RVs but in the continuous case $P(a \le X \le b)$,
$P(a < X \le b)$ and $P(a < X < b)$ are all the same. Why?
Therefore,
$$F(x) = \begin{cases} 0 & x \le -1 \\ \frac{x^3 + 1}{9} & \text{for } -1 < x < 2 \\ 1 & x \ge 2. \end{cases}$$
1
9
Example 4.3.
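The time X in months until failure of a certain product has the PDF
$$f(x) = \begin{cases} \frac{3x^2}{64} \exp\left(-\frac{x^3}{64}\right) & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find F(x) and evaluate $P(2.84 < X < 5.28)$.
Solution. $F(x) = 1 - \exp\left(-\frac{x^3}{64}\right)$ for $x > 0$, and
$P(2.84 \le X \le 5.28) = F(5.28) - F(2.84) = 0.5988$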
Example 4.4.
The life length of batteries X (in hundreds of hours) has the density
$$f(x) = \begin{cases} \frac{1}{2} e^{-x/2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability that the life of a battery of this type is less than 200 or greater than
400 hours.
Solution. Let A denote the event that X is less than 2, and let B denote the event that
X is greater than 4. Then
$$P(A \cup B) = P(A) + P(B) \text{ (why?)} = \int_{0}^{2} \frac{1}{2} e^{-x/2}\,dx + \int_{4}^{\infty} \frac{1}{2} e^{-x/2}\,dx$$
$$= (1 - e^{-1}) + (e^{-2}) = 1 - 0.368 + 0.135 = 0.767$$
Example 4.5.
Refer to Example 4.4. Find the probability that a battery of this type lasts more than
300 hours, given that it already has been in use for more than 200 hours.
Solution. We are interested in $P(X > 3 \mid X > 2)$; and by the definition of conditional
probability,
$$P(X > 3 \mid X > 2) = \frac{P(X > 3, X > 2)}{P(X > 2)} = \frac{P(X > 3)}{P(X > 2)}$$
because the intersection of the events (X > 3) and (X > 2) is the event (X > 3). Now
$$\frac{P(X > 3)}{P(X > 2)} = \frac{\int_{3}^{\infty} \frac{1}{2} e^{-x/2}\,dx}{\int_{2}^{\infty} \frac{1}{2} e^{-x/2}\,dx} = \frac{e^{-3/2}}{e^{-1}} = e^{-1/2} = 0.606$$
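Examples 4.4 and 4.5 can be verified numerically. The Python sketch below integrates the battery density with a simple midpoint rule over a finite interval and uses the tail probability $P(X > x) = e^{-x/2}$, which follows from the integrals above; the helper names are illustrative.

```python
from math import exp


def battery_pdf(x: float) -> float:
    """Density from Example 4.4: (1/2) e^{-x/2} for x > 0."""
    return 0.5 * exp(-x / 2) if x > 0 else 0.0


def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def survival(x: float) -> float:
    """P(X > x) = e^{-x/2}, the tail integral of the density."""
    return exp(-x / 2)


p_short = integrate(battery_pdf, 0, 2)   # P(X < 2), numerically
p_union = p_short + survival(4)          # Example 4.4
p_cond = survival(3) / survival(2)       # Example 4.5

print(round(p_union, 3), round(p_cond, 3))
```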
Example 4.6.
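For each of the following functions,
(i) find the constant c so that f(x) is a PDF of a random variable X, and
(ii) find the distribution function F(x).
a) $f(x) = \begin{cases} \frac{x^3}{4} & \text{for } 0 < x < c \\ 0 & \text{elsewhere} \end{cases}$
b) $f(x) = \begin{cases} \frac{3}{16} x^2 & \text{for } -c < x < c \\ 0 & \text{elsewhere} \end{cases}$
c) $f(x) = \begin{cases} 4x^{c} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$
d) $f(x) = \begin{cases} \frac{c}{x^{3/4}} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$
Answers.
a) c = 2 and $F(x) = \frac{x^4}{16}$, $0 < x < 2$.
b) c = 2 and $F(x) = \frac{x^3}{16} + \frac{1}{2}$, $-2 < x < 2$.
c) c = 3 and $F(x) = x^4$, $0 < x < 1$.
d) $c = \frac{1}{4}$ and $F(x) = x^{1/4}$, $0 < x < 1$.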
Exercises
4.1.
The lifetime of a vacuum cleaner, in years, is described by
0
elsewhere
Find the probability that the lifetime of a vacuum cleaner is
(a) less than 2.5 years
(b) between 1 and 3 years.
4.2.
The demand for an antibiotic from a local pharmacy is given by a random variable X with
CDF
$$F(x) = \begin{cases} 1 - \frac{2500}{(x+50)^2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
a) Find the probability that the demand is at least 50 doses
b) Find the probability that the demand is between 40 and 80 doses
c) Find the density function of X.
4.3.
The proportion of warehouse items claimed within 1 month is given by a random variable
X with density
$$f(x) = \begin{cases} c(x + 1) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Find c to make this a legitimate density function.
(b) Find the probability that the proportion of items claimed will be between 0.5 and 0.7.
4.4.
The waiting time, in minutes, between customers coming into a store is a continuous
random variable with CDF
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - \exp(-x/2) & \text{for } x \ge 0 \end{cases}$$
Find the probability of waiting less than 1.5 minutes between successive customers
a) using the cumulative distribution of X;
b) using the probability density function of X (first, you have to find it).
4.5.
A continuous random variable X has a density function given by
$$f(x) = \begin{cases} \frac{1}{5} & \text{for } -1 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
a) Show that the area under the curve is equal to 1.
b) Find P (0 < X < 2).
c) Find c such that P (X < c) = 1/2. [This is called a median of the distribution.]
4.6.
A continuous random variable X has a density function given by
$$f(x) = \frac{c}{1 + x^2} \qquad \text{for } -\infty < x < \infty$$
4.2
The expected values of continuous RVs are obtained using formulas similar to those of
discrete ones. However, the summation is now replaced by integration.
Example 4.7.
Suppose that X has density function given by
$$f(x) = \begin{cases} 3x^2 & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Find the mean and variance of X
(b) Find mean and variance of u(X) = 4X + 3.
(c) Find the median of X
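Solution. (a) From the above definitions,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{1} x\,(3x^2)\,dx = \int_{0}^{1} 3x^3\,dx = 3\,\frac{x^4}{4}\Big|_{0}^{1} = \frac{3}{4} = 0.75$$
Now,
$$E(X^2) = \int_{0}^{1} x^2 (3x^2)\,dx = \int_{0}^{1} 3x^4\,dx = 3\,\frac{x^5}{5}\Big|_{0}^{1} = \frac{3}{5} = 0.6$$
Hence $V(X) = E(X^2) - [E(X)]^2 = 0.6 - 0.75^2 = 0.0375$.
(b) Since u is linear, $E(4X + 3) = 4E(X) + 3 = 6$ and $V(4X + 3) = 16\,V(X) = 0.6$.
(c) The CDF is
$$F(x) = \int_{0}^{x} 3y^2\,dy = y^3\Big]_{0}^{x} = x^3,$$
so the median m solves $m^3 = 0.5$, that is, $m = 0.5^{1/3} \approx 0.794$.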
Note that, according to the Theorem 3.2, E g(X) = g(E X) when g is a linear function,
that is, g(x) = a + bx. What happens when g is not linear?
Example 4.8.
Suppose that X has density function given by
$$f(x) = \begin{cases} (x + 1)/2 & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Find the expected value of $g(X) = X^3$
(b) Is it true that $E(X^3) = (E X)^3$?
Solution. (a) $E(X^3) = \int_{-1}^{1} x^3 f(x)\,dx = \int_{-1}^{1} x^3 (x + 1)/2\,dx = \frac{1}{2}\int_{-1}^{1} (x^4 + x^3)\,dx = 1/5$
(b) Since $E X = \int_{-1}^{1} x(x + 1)/2\,dx = 1/3$, then $E(X^3) \ne (E X)^3$.
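The integrals in Example 4.8 can also be approximated numerically, which makes the gap between $E(X^3)$ and $(E X)^3$ concrete. This is a minimal Python sketch using a midpoint rule; the helper `expectation` is a name introduced here for illustration.

```python
def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def density(x: float) -> float:
    """Density from Example 4.8 on [-1, 1]."""
    return (x + 1) / 2


def expectation(g, a: float = -1.0, b: float = 1.0) -> float:
    """E g(X) as the integral of g(x) f(x) over the support [a, b]."""
    return integrate(lambda x: g(x) * density(x), a, b)


e_x = expectation(lambda x: x)         # should be close to 1/3
e_x3 = expectation(lambda x: x**3)     # should be close to 1/5

print(round(e_x, 4), round(e_x3, 4), round(e_x**3, 4))
```

The cube of the mean is about 0.037, far from $E(X^3) = 0.2$, so the linear shortcut $E\,g(X) = g(E X)$ fails for this non-linear g.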
Exercises
4.7.
For the density described in Exercise 4.3, find the mean and standard deviation of X.
4.8.
For a random variable X with the density
21 x
f (x) =
0
a) Find the mean of X
b) Find V (X)
c) Find E (X 4 )
4.9.
For a random variable X with the density
2 x
f (x) =
0
                               Discrete                          Continuous
Probability                    Probability function              Density
                               $p(x) = P(X = x)$                 $f(x) = \frac{d}{dx} P(X \le x) = F'(x)$;
                                                                 $P(X = x)$ is 0 for any x
CDF $F(x) = P(X \le x)$        Is a ladder function              Is continuous
                               $P(a < X \le b) = F(b) - F(a)$ (in both cases)
Mean $E(X) = \mu$              $\sum x\,p(x)$                    $\int x f(x)\,dx$
Mean of a function $E\,g(X)$   $\sum g(x)\,p(x)$                 $\int g(x) f(x)\,dx$
Variance                       $\sum (x - \mu)^2 p(x)$           $\int (x - \mu)^2 f(x)\,dx$
$\sigma^2 = E(X^2) - \mu^2$
4.10.
For the density described in Exercise 4.1,
a) find the mean and standard deviation of X;
b) Use Chebyshev inequality to estimate the probability that X is between 1 and 3
years. Compare with the answer to Exercise 4.1.
4.11.
For the density described in Exercise 4.5,
a) find the mean and standard deviation of X.
b) Discretize the problem by assigning equal probabilities for each integer between 1
and 4. Re-calculate the mean and standard deviation and compare the results to
(a).
4.12.
For a random variable X with the CDF
x /8
F (x) = 0,
1,
4.3 Uniform distribution
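One of the simplest continuous distributions is the continuous uniform distribution. This
distribution is characterized by a density function that is flat and thus the probability
is uniform in a finite interval, say [a, b]. The density function of the continuous uniform
random variable X on the interval [a, b] is
$$f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$$
The mean and variance of the uniform distribution are
$$\mu = \frac{b + a}{2} \quad \text{and} \quad \sigma^2 = \frac{(b - a)^2}{12}.$$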
Example 4.9.
Suppose that a large conference room for a certain company can be reserved for no more
than 4 hours. However, the use of the conference room is such that both long and short
conferences occur quite often. In fact, it can be assumed that length X of a conference
has a uniform distribution on the interval [0, 4].
a) What is the probability density function of X?
b) What is the probability that any given conference lasts at least 3 hours?
Solution. (a) The appropriate density function for the uniformly distributed random variable X in this situation is
$$f(x) = \begin{cases} 1/4 & \text{for } 0 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
(b)
$$P(X \ge 3) = \int_{3}^{4} \frac{1}{4}\,dx = \frac{1}{4}.$$
Example 4.10.
The failure of a circuit board interrupts work by a computing system until a new board
is delivered. Delivery time X is uniformly distributed over the interval of at least one but
no more than five days. The cost C of this failure and interruption consists of a fixed cost
$C_0$ for the new part and a cost that increases proportionally to $X^2$, so that
$$C = C_0 + C_1 X^2$$
(a) Find the probability that the delivery time is two or more days.
(b) Find the expected cost of a single failure, in terms of C0 and C1 .
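Solution. a) The density is
$$f(x) = \begin{cases} \frac{1}{4} & \text{for } 1 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$$
Thus,
$$P(X \ge 2) = \int_{2}^{5} \frac{1}{4}\,dx = \frac{1}{4}(5 - 2) = \frac{3}{4}$$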
b) We know that
$$E(C) = C_0 + C_1 E(X^2)$$
so it remains for us to find $E(X^2)$. This value could be found directly from the definition
or by using the variance and the fact that $E(X^2) = V(X) + \mu^2$. Using the latter approach,
we find
$$E(X^2) = \frac{(b - a)^2}{12} + \left(\frac{a + b}{2}\right)^2 = \frac{(5 - 1)^2}{12} + \left(\frac{1 + 5}{2}\right)^2 = \frac{31}{3}$$
Thus, $E(C) = C_0 + \frac{31}{3}\,C_1$.
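Both routes to $E(X^2)$ mentioned in the solution can be compared exactly with rational arithmetic. In the Python sketch below, the cost values passed to `expected_cost` are hypothetical numbers used only to exercise the formula.

```python
from fractions import Fraction

a, b = Fraction(1), Fraction(5)   # delivery time is Uniform[1, 5]

mean = (a + b) / 2
variance = (b - a) ** 2 / 12

# Route 1: E(X^2) = V(X) + mu^2
ex2_from_variance = variance + mean**2
# Route 2: direct integral of x^2 / (b - a) over [a, b]
ex2_direct = (b**3 - a**3) / (3 * (b - a))


def expected_cost(c0, c1):
    """E(C) = C0 + C1 * E(X^2)."""
    return c0 + c1 * ex2_from_variance


print(ex2_from_variance, ex2_direct)
print(expected_cost(10, 3))   # hypothetical C0 = 10, C1 = 3
```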
Exercises
4.15.
For a digital measuring device, rounding errors have Uniform distribution, between -0.05
and 0.05 mm.
a) Find the probability that the rounding error is between 0.01 and 0.03mm
b) Find the expected value and the standard deviation of the rounding error.
c) Calculate and plot the CDF of the rounding errors.
4.16.
The capacitances of 1mF (microfarad) capacitors are, in fact, Uniform[0.95, 1.05] mF.
a) What proportion of capacitors are 0.98 mF or above?
b) What proportion of capacitors are within 0.03 of the nominal value?
4.17.
For X having a Uniform[1, 4] distribution, find the mean and variance. Then, use the
formula for variance and a little algebra to find E (X 2 ).
4.18.
Suppose the radii of spheres R have a uniform distribution on [2, 3]. Find the mean volume
($V = \frac{4}{3}\pi R^3$). Find the mean surface area ($A = 4\pi R^2$).
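4.4 Exponential distribution
$$f(x) = \begin{cases} \frac{1}{\beta} e^{-x/\beta} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
The mean and variance of the exponential distribution are
$$\mu = \beta \quad \text{and} \quad \sigma^2 = \beta^2.$$
The distribution function for the exponential distribution has the simple form:
$$F(t) = P(X \le t) = \int_{0}^{t} \frac{1}{\beta} e^{-x/\beta}\,dx = 1 - e^{-t/\beta} \qquad \text{for } t \ge 0$$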
The failure rate function r(t) is defined as
$$r(t) = \frac{f(t)}{1 - F(t)}, \qquad t > 0 \qquad (4.2)$$
Suppose that X, with density f, is a lifetime of an item. Consider the proportion of items currently
alive (at the time t) that will fail in the next time interval $(t, t + \Delta t]$, where $\Delta t$ is small. Thus, by
the conditional probability formula,
$$P\{\text{die in the next } (t, t + \Delta t] \mid \text{currently alive}\} = \frac{P\{X \in (t, t + \Delta t]\}}{P(X > t)} \approx \frac{f(t)\,\Delta t}{1 - F(t)} = r(t)\,\Delta t$$
For the exponential distribution,
$$r(t) = \frac{f(t)}{1 - F(t)} = \frac{(1/\beta)\, e^{-t/\beta}}{e^{-t/\beta}} = \frac{1}{\beta}$$
Note that the failure rate $1/\beta$ of an item with exponential lifetime does not depend on the
item's age. This is known as the memoryless property of exponential distribution. The exponential
distribution is the only continuous distribution to have a constant failure rate.
In reliability studies, the mean of a positive-valued distribution is also called Mean Time To
Fail or MTTF. So, we have exponential MTTF = $\beta$.
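The constant failure rate and the memoryless property can be checked numerically. The Python sketch below uses a hypothetical mean $\beta = 2$ and illustrative function names; it evaluates the failure rate (4.2) at several ages and compares a conditional tail probability with the unconditional one.

```python
from math import exp


def exp_pdf(t: float, beta: float) -> float:
    """Exponential density (1/beta) e^{-t/beta} for t > 0."""
    return exp(-t / beta) / beta if t > 0 else 0.0


def exp_cdf(t: float, beta: float) -> float:
    """Exponential CDF 1 - e^{-t/beta} for t >= 0."""
    return 1 - exp(-t / beta) if t >= 0 else 0.0


def failure_rate(t: float, beta: float) -> float:
    """r(t) = f(t) / (1 - F(t)), as in equation (4.2)."""
    return exp_pdf(t, beta) / (1 - exp_cdf(t, beta))


beta = 2.0  # hypothetical MTTF
rates = [failure_rate(t, beta) for t in (0.5, 1.0, 5.0)]
print([round(r, 6) for r in rates])   # all close to 1/beta

# Memoryless check: P(X > t + s | X > t) versus P(X > s)
t, s = 1.5, 0.7
conditional = (1 - exp_cdf(t + s, beta)) / (1 - exp_cdf(t, beta))
print(abs(conditional - (1 - exp_cdf(s, beta))) < 1e-12)
```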
$$\frac{e^{-\lambda t} (\lambda t)^{0}}{0!} = e^{-\lambda t}.$$
Exercises
4.19.
Prove another version of the memoryless property of the exponential distribution,
P (X > t + s | X > t) = P (X > s).
Thus, an item that is t years old has the same probabilistic properties as a brand-new
item. [Hint: Use the definition of conditional probability and the expression for exponential
CDF.]
4.20.
The 1-hour carbon monoxide concentrations in a big city are found to have an exponential
distribution with a mean of 3.6 parts per million (ppm).
(a) Find the probability that a concentration will exceed 9 ppm.
(b) A traffic control policy is trying to reduce the average concentration. Find the new
target mean so that the probability in part (a) will equal 0.01
4.5
The Gamma distribution derives its name from the well-known gamma function, studied
in many areas of mathematics. This distribution plays an important role in both queuing
theory and reliability problems. Time between arrivals at service facilities, and time to
failure of component parts and electrical systems, often are nicely modeled by the Gamma
distribution.
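Definition 4.7. Gamma function
The gamma function, for $\alpha > 0$, is defined by
$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\,dx$$
The density of the Gamma distribution with parameters $\alpha > 0$ and $\beta > 0$ is
$$f(x) = \begin{cases} \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
The mean and variance of the Gamma distribution are
$$\mu = \alpha\beta \quad \text{and} \quad \sigma^2 = \alpha\beta^2.$$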
Note: When $\alpha = 1$, the Gamma reduces to the exponential distribution. Another
well-known statistical distribution, chi-square, is also a special case of the Gamma.
Uses of the Gamma Distribution Model
a) The gamma is a flexible life distribution model that may offer a good fit to some sets of
failure data, or other data where positivity is enforced.
b) The gamma does arise naturally as the time-to-failure distribution for a system with standby
exponentially distributed backups. If there are $n - 1$ standby backup units and the system
and all backups have exponential lifetimes with mean $\beta$, then the total lifetime has a
Gamma distribution with $\alpha = n$. Note: when $\alpha$ is a positive integer, the Gamma is
sometimes called the Erlang distribution. The Erlang distribution
is used frequently in queuing theory applications.
[Figure: Gamma densities f(x) for $\alpha = 0.5$, $\alpha = 1$, $\alpha = 2$ and $\alpha = 5$]
Example 4.13.
The total monthly rainfall (in inches) for a particular region can be modeled using Gamma
distribution with $\alpha = 2$ and $\beta = 1.6$. Find the mean and variance of the monthly rainfall.
Solution. $E(X) = \alpha\beta = 3.2$, and variance $V(X) = \alpha\beta^2 = 2(1.6^2) = 5.12$
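The mean and variance in Example 4.13 can be confirmed by integrating the Gamma density numerically. The Python sketch below assumes the standard density $x^{\alpha-1} e^{-x/\beta} / (\beta^{\alpha}\Gamma(\alpha))$ and truncates the integration at x = 80, where the remaining tail is negligible for these parameters; both the truncation point and the helper names are choices made here.

```python
from math import exp, gamma


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * exp(-x / beta) / (beta**alpha * gamma(alpha))


def integrate(f, a: float, b: float, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


alpha, beta = 2, 1.6
upper = 80.0  # truncation point; the tail beyond it is negligible here

total = integrate(lambda x: gamma_pdf(x, alpha, beta), 0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0, upper)
second = integrate(lambda x: x * x * gamma_pdf(x, alpha, beta), 0, upper)
variance = second - mean**2

print(round(total, 4), round(mean, 4), round(variance, 4))
```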
4.5.1 Poisson process
Following our discussion of the Exponential distribution, recall that it is a good model for
the waiting times between randomly occurring events. Adding independent Exponential
RVs will result in the Poisson process.
The Poisson process was first studied in the 1900s (not by Poisson!) when modeling the observation times
of radioactive particles recorded by a Geiger counter. It consists of the consecutive event
times $Y_1, Y_2, \dots$ such that the interarrival times $X_1 = Y_1$, $X_2 = Y_2 - Y_1$, ... have independent
Exponential distributions. (The observations start at the time t = 0.)
[Figure: event times $Y_1$, $Y_2$, $Y_3$ marked on a time axis from 0.0 to 2.0]
(4.3)
$$\sum_{i=0}^{k-1} e^{-t/\beta}\, \frac{(t/\beta)^{i}}{i!} \qquad (4.4)$$
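The role of the sum in (4.4) can be illustrated by simulation. The Python sketch below assumes the standard reading of that sum as the probability of fewer than k events by time t, so that $P(Y_k \le t)$ equals one minus the sum; the parameter values (k = 3, $\beta = 0.5$, t = 1.5), the seed and the function names are hypothetical choices made for this illustration.

```python
import random
from math import exp, factorial


def erlang_cdf(t: float, k: int, beta: float) -> float:
    """P(Y_k <= t) = 1 - sum_{i<k} e^{-t/beta} (t/beta)^i / i!
    (assumed interpretation of the sum in equation (4.4))."""
    tail = sum(exp(-t / beta) * (t / beta) ** i / factorial(i) for i in range(k))
    return 1 - tail


def simulated_cdf(t: float, k: int, beta: float,
                  trials: int = 100_000, seed: int = 1) -> float:
    """Estimate P(Y_k <= t) by adding k independent Exponential
    interarrival times with mean beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y_k = sum(rng.expovariate(1 / beta) for _ in range(k))
        hits += y_k <= t
    return hits / trials


exact = erlang_cdf(1.5, 3, 0.5)
approx = simulated_cdf(1.5, 3, 0.5)
print(round(exact, 4), round(approx, 4))
```

With k = 1 the formula reduces to the exponential CDF $1 - e^{-t/\beta}$, consistent with the first event time being a single Exponential waiting time.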
Exercises
4.22.
Customers come to a barber shop with the frequency of 3 per hour. Suppose Y4 is the
time when 4th customer has come.
a) Find the expected value and the standard deviation of Y4
b) Find the probability that the 4th customer comes within the 1st hour.
4.23.
A truck has 2 spare tires. Under intense driving conditions, tire blowouts are determined
to approximately follow a Poisson process with the intensity of 1.2 per 100 miles. Let X
be the total distance the truck can go with 2 spare tires.
a) Find the expected value and the standard deviation of X
b) Find the probability that the truck can go at least 200 miles
4.24.
Differentiate Equation 4.4 for k = 2 to show that you indeed will get the Gamma density
function with α = 2.
4.25.
The time X between successive visits to a repair shop is estimated to have Gamma distribution with α = 2 and β = 50 days.
a) Find the expected value and the standard deviation of X.
b) Find the probability that 80 days pass without a visit.
4.26.
The bicycle sales at a store follow a Poisson process with the rate of 0.1 sales per working
hour.
a) Find the probability of having exactly 3 bicycle sales over the course of 30 hours.
b) What is the average time between bicycle sales?
c) Describe the distribution of the time between bicycle sales.
4.27.
The counts of user requests incoming to a server are approximated by a Poisson process
with the intensity of 560 per second.
a) Describe the distribution of time between requests
b) Find the probability that, during the next 10 ms (= 0.01 sec), between 4 and 6
requests (inclusive) will arrive.
4.28.
For X having a Gamma distribution with α = 3.5 and β = 4,
a) Can you apply the Poisson process formulas to obtain the CDF of X? Do it or explain
why you can't.
b) Calculate E(X^5)
4.6 Normal distribution
The most widely used of all the continuous probability distributions is the normal distribution (also known as Gaussian). It serves as a popular model for measurement errors,
particle displacements under Brownian motion, stock market fluctuations, human intelligence and many other things. It is also used as an approximation for Binomial (for large
n) and Gamma (for large α) distributions.
The normal density follows the well-known symmetric bell-shaped curve. The curve
is centered at the mean value μ and its spread is measured by the standard
deviation σ. These two parameters, μ and σ², completely determine the shape and center
of the normal density function.
Definition 4.9.
The normal random variable X has the PDF
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty$$
It will be denoted as X ∼ N(μ, σ²)
The normal random variable with μ = 0 and σ = 1 is said to have the standard normal
distribution and will be called Z. Its density becomes
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp(-z^2/2).$$
Direct integration would show that E(Z) = 0 and V(Z) = 1.
[Figure: Normal densities f(x) for several combinations of μ and σ.]
Example 4.15.
Popular (and controversial) IQ scores are scaled to have the mean μ = 100 and standard
deviation σ = 15. Then, if a person has an IQ of 115, it can be transformed into a Z-score
as z = (115 − 100)/15 = 1 and expressed as "one standard deviation above the mean". A
lot of standardized test scores (like SAT) follow the same principle.
The values of the CDF of Z can be obtained from Table A. Namely,
$$F(z) = \begin{cases} 0.5 + TA(z), & z \ge 0 \\ 0.5 - TA(|z|), & z < 0 \end{cases}$$
where TA(z) = P(0 < Z < z) denotes the table area of z. The second equation follows from
the symmetry of the Z distribution.
Table A allows us to calculate probabilities and percentiles associated with normal
random variables, since the normal density has no closed-form antiderivative and cannot
be integrated directly.
Example 4.16.
If Z denotes a standard normal variable, find
(a) P(Z ≤ 1)
(b) P(Z > 1)
(c) P(Z < −1.5)
(d) P(−1.5 ≤ Z ≤ 0.5).
(e) Find a number, say z0, such that P(0 ≤ Z ≤ z0) = 0.49
Solution. This example provides practice in using the Normal probability table (Table A). We see that
a) P(Z ≤ 1) = P(Z ≤ 0) + P(0 ≤ Z ≤ 1) = 0.5 + 0.3413 = 0.8413.
b) P(Z > 1) = 0.5 − P(0 ≤ Z ≤ 1) = 0.5 − 0.3413 = 0.1587
c) P(Z < −1.5) = P(Z > 1.5) = 0.5 − P(0 ≤ Z ≤ 1.5) = 0.5 − 0.4332 = 0.0668.
d) P(−1.5 ≤ Z ≤ 0.5) = P(−1.5 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.5)
= P(0 ≤ Z ≤ 1.5) + P(0 ≤ Z ≤ 0.5) = 0.4332 + 0.1915 = 0.6247.
e) To find the value of z0 we must look for the given probability of 0.49 on the area
side of the Normal probability table. The closest we can come is at 0.4901, which
corresponds to a Z value of 2.33. Hence z0 = 2.33.
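The table lookups in Example 4.16 can be cross-checked in software. The sketch below is one way to do it with the Python standard library, using the error function to evaluate the standard Normal CDF; the function names `std_normal_cdf` and `table_area` are made up for this illustration.

```python
import math

def std_normal_cdf(z):
    """F(z) = P(Z <= z) for the standard Normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(z):
    """TA(z) = P(0 < Z < z), the quantity tabulated in Table A (for z >= 0)."""
    return std_normal_cdf(z) - 0.5

part_a = std_normal_cdf(1.0)                          # P(Z <= 1)
part_b = 1.0 - std_normal_cdf(1.0)                    # P(Z > 1)
part_c = std_normal_cdf(-1.5)                         # P(Z < -1.5)
part_d = std_normal_cdf(0.5) - std_normal_cdf(-1.5)   # P(-1.5 <= Z <= 0.5)
part_e = table_area(2.33)                             # should be close to 0.49

for label, value in zip("abcde", [part_a, part_b, part_c, part_d, part_e]):
    print(label, round(value, 4))
```

Rounded to four decimals, the printed values should match the table-based answers above.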
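Example 4.17.
For X ∼ N(50, 10²), find the probability that X is between 45 and 62.
Solution. The Z-values corresponding to X = 45 and X = 62 are
$$Z_1 = \frac{45 - 50}{10} = -0.5 \quad\text{and}\quad Z_2 = \frac{62 - 50}{10} = 1.2.$$
Therefore P(45 < X < 62) = P(−0.5 < Z < 1.2) = TA(0.5) + TA(1.2) = 0.1915 + 0.3849 = 0.5764.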
Example 4.18.
Given a random variable X having a normal distribution with μ = 300 and σ = 50, find
the probability that X is greater than 362.
Solution. To find P(X > 362), we need to evaluate the area under the normal curve to
the right of x = 362. This can be done by transforming x = 362 to the corresponding
Z-value. We get
$$z = \frac{x - \mu}{\sigma} = \frac{362 - 300}{50} = 1.24$$
Hence P(X > 362) = P(Z > 1.24) = P(Z < −1.24) = 0.5 − TA(1.24) = 0.1075.
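Example 4.19.
A diameter X of a shaft produced has a normal distribution with parameters μ = 1.005, σ =
0.01. The shaft will meet specifications if its diameter is between 0.98 and 1.02 cm. Which
percent of shafts will not meet specifications?
Solution.
$$1 - P(0.98 < X < 1.02) = 1 - P\left(\frac{0.98 - 1.005}{0.01} < Z < \frac{1.02 - 1.005}{0.01}\right) = 1 - P(-2.5 < Z < 1.5)$$
From Table A, this equals 1 − (0.4938 + 0.4332) = 0.0730, so about 7.3% of shafts will not meet specifications.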
4.6.1
Table A. Table area TA(z) = P(0 < Z < z) for the standard Normal distribution

 z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
Example 4.21.
The SAT Math exam is scaled to have the average of 500 points, and the standard deviation
of 100 points. What is the cutoff score for the top 10% of the SAT takers?
Solution. In this example we begin with a known area, find the z-value, and then find x
from the formula x = μ + σz. The 90th percentile corresponds to the 90% area under the
normal curve to the left of x. Thus, we also require a z-value that leaves 0.9 area to the
left and hence, the Table Area of 0.4. From Table A, P (0 < Z < 1.28) = 0.3997. Hence
x = 500 + 100(1.28) = 628
Therefore, the cutoff for the top 10% is 628 points.
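The "table in reverse" step can also be done with an inverse CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later); the helper name `normal_percentile` is made up for this illustration. The second call uses the numbers of part (b) of Example 4.22 below.

```python
from statistics import NormalDist

def normal_percentile(p, mu, sigma):
    """Return x such that P(X < x) = p for X ~ N(mu, sigma^2), via x = mu + sigma * z."""
    z = NormalDist().inv_cdf(p)   # z-value that leaves area p to the left
    return mu + sigma * z

cutoff = normal_percentile(0.90, mu=500, sigma=100)   # top 10% of SAT Math scores
budget = normal_percentile(0.80, mu=200, sigma=20)    # sick-leave budget, 80% level

print(round(cutoff, 1), round(budget, 1))
```

The results differ slightly from the table-based answers (628 and 216.8) because the table lookup rounds z to two decimals.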
Example 4.22.
Let X = monthly sick leave time have normal distribution with parameters μ = 200 hours
and σ = 20 hours.
a) What percentage of months will have sick leave below 150 hours?
b) What amount of time x0 should be budgeted for sick leave so that the budget will
not be exceeded with 80% probability?
Solution. (a) P(X < 150) = P(Z < −2.5) = 0.5 − 0.4938 = 0.0062
(b) P(X < x0) = P(Z < z0) = 0.8, which leaves a table area for z0 of 0.3. Thus, z0 = 0.84
and hence x0 = 200 + 20(0.84) = 216.8 hours
Quantile-Quantile (Q-Q) plots
If X has a Normal(μ, σ²) distribution, then
X = μ + σZ
and there is a perfect linear relationship between X and Z. This relationship is the basis
of a graphical method for checking normality.
The details of this method will be considered in Chapter 7.
4.6.2
As another example of using the Normal distribution, consider the Normal approximation
to Binomial distribution. This will be also used when discussing sample proportions.
Theorem 4.2. Normal approximation to Binomial
If X is a Binomial random variable with mean μ = np and variance σ² = npq, then
the random variables
$$Z_n = \frac{X - np}{\sqrt{npq}}$$
approach the standard Normal as n gets large.
We already know one Binomial approximation (by Poisson). It mostly applies when
the Binomial distribution in question has a skewed shape, that is, when p is close to 0 or 1.
When the shape of the Binomial distribution is close to symmetric, the Normal approximation
will work better. Practically, we will require that both np and n(1 − p) be at least 5.
Example 4.23.
Suppose X is Binomial with parameters n = 15 and p = 0.4; then μ = np = (15)(0.4) = 6
and σ² = npq = 15(0.4)(0.6) = 3.6. Suppose we are interested in the probability that X
assumes a value from 7 to 9 inclusive, that is, P(7 ≤ X ≤ 9). The exact probability is
given by
$$P(7 \le X \le 9) = \sum_{x=7}^{9} \binom{15}{x} (0.4)^x (0.6)^{15-x} = 0.3564$$
For the Normal approximation we find the area between x1 = 6.5 and x2 = 9.5 using z-values,
which are
$$z_1 = \frac{x_1 - np}{\sqrt{npq}} = \frac{x_1 - \mu}{\sigma} = \frac{6.5 - 6}{1.897} = 0.26 \quad\text{and}\quad z_2 = \frac{9.5 - 6}{1.897} = 1.85$$
so that P(7 ≤ X ≤ 9) ≈ P(0.26 < Z < 1.85) = 0.4678 − 0.1026 = 0.3652.
[Figure 4.7: the Binomial probabilities and the approximating Normal density.]
The value 0.5 we add or subtract is called the continuity correction. It arises when we try
to approximate a distribution with integer values (here, Binomial) through the use of a
continuous distribution (here, Normal). As shown in Fig. 4.7, the sum over the discrete set
{7 ≤ X ≤ 9} is approximated by the integral of the continuous density from 6.5 to 9.5.
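The comparison in Example 4.23 is easy to reproduce in code. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the Normal CDF is evaluated directly rather than read from Table A, so the approximate value differs slightly from a table-based one.

```python
import math
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_approx(lo, hi, n, p):
    """Approximate P(lo <= X <= hi) with the continuity correction of 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = NormalDist()                      # standard Normal
    return z.cdf((hi + 0.5 - mu) / sigma) - z.cdf((lo - 0.5 - mu) / sigma)

n, p = 15, 0.4
exact = sum(binomial_pmf(x, n, p) for x in range(7, 10))   # x = 7, 8, 9
approx = normal_approx(7, 9, n, p)

print("exact :", round(exact, 4))
print("approx:", round(approx, 4))
```

Dropping the two 0.5 terms in `normal_approx` shows how much the continuity correction matters for a sample size this small.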
Solution. Let the binomial variable X represent the number of patients that survive. Since
n = 100 and p = 0.4, we have
μ = np = (100)(0.4) = 40
and also
$$\sigma = \sqrt{npq} = \sqrt{(100)(0.4)(0.6)} = 4.899$$
Thus,
$$z = \frac{x - \mu}{\sigma} = \frac{30.5 - 40}{4.899} = -1.94,$$
and the probability of at most 30 of the 100 patients surviving is P(X ≤ 30) ≈ P(Z <
−1.94) = 0.5 − 0.4738 = 0.0262.
Example 4.25.
A fair coin (p = 0.5) is tossed 10,000 times, and the number of Heads X is recorded. What
are the values that contain X with 95% certainty?
Solution.
We have μ = np = 10,000(0.5) = 5,000 and σ = √(10,000(0.5)(1 − 0.5)) = 50. We need
to find x1 and x2 so that P(x1 ≤ X ≤ x2) = 0.95. Since the mean of X is large, we will neglect
the continuity correction.
Since we will be working with the Normal approximation, let's find z1 and z2 such that
P(z1 ≤ Z ≤ z2) = 0.95
The solution is not unique, but we can choose the values of z1 and z2 that are symmetric about
0. This will mean finding z such that P(0 < Z < z) = 0.475. Using Normal tables in
reverse we will get z = 1.96. Thus, P(−1.96 < Z < 1.96) = 0.95.
Next, transforming back into X, use the formula x = μ + σz, so
x1 = 5000 + 50(−1.96) = 4902
and
x2 = 5000 + 50(1.96) = 5098
Thus, with a large likelihood, our Heads count will be within 100 of the expected value of
5,000.
This is an example of the famous 68% - 95% rule.
Exercises
4.29.
Given a standard normal distribution Z, find
a) P (0 < Z < 1.28)
b) P(−2.14 < Z < 0)
c) P(Z > 1.28)
d) P(−2.3 < Z < 0.75)
4
To set this up correctly, remember to include the value of 30 because it's already included in the
inequality. For P(X < 30) you would have used 29.5
4.35.
The likelihood that a job application will result in an interview is estimated as 0.1. A
grad student has mailed 40 applications. Find the probability that she will get at least 3
interviews,
a) Using the Normal approximation.
b) Using the Poisson approximation.
c) Find the exact probability. Which approximation has worked better? Why?
4.36.
It is estimated that 33% of individuals in a population of Atlantic puffins have a certain
recessive gene. If 90 individuals are caught, estimate the probability that there will be
between 30 and 40 (inclusive) with the recessive gene.
4.7 Weibull distribution
(4.5)
Note: if γ = 1 then we get the Exponential distribution. The parameter β has the
dimension of time and γ is dimensionless.
By differentiating the CDF, we get the Weibull density.
Definition 4.11. Weibull distribution
The Weibull RV has the density function
$$f(x) = \frac{\gamma x^{\gamma-1}}{\beta^{\gamma}} \exp\left[-\left(\frac{x}{\beta}\right)^{\gamma}\right], \qquad x > 0$$
The Weibull distribution with γ > 1 typically has an asymmetric shape with a peak in
the middle and a long right tail. Shapes of the Weibull density are shown in Fig. 4.8 for
various values of γ.
Regarding the computation of the mean: the Gamma function of a non-integer parameter
is, generally, not easy to find. Note only that Γ(0.5) = √π, and we can use the recursive
relation Γ(α + 1) = αΓ(α) to compute the Gamma function for α = 1.5, 2.5 etc. Also, for
large γ, Γ(1 + 1/γ) ≈ Γ(1) = 1
[Figure 4.8: Weibull densities f(x) for three values of the shape parameter (1, 2 and 5).]
Exercises
4.37.
The time it takes for a server to respond to a request is modeled by the Weibull distribution
with shape γ = 2/3 and scale β = 15 milliseconds.
a) Find the average time to respond.
b) Find the probability that it takes less than 12 milliseconds to respond.
c) Find the 70th percentile of the response times.
4.38.
The lifetimes of refrigerators are assumed to follow Weibull distribution with parameters
β = 7 years (scale) and γ = 4 (shape). Find:
a) The proportion of refrigerators with lifetime between 2 and 5 years.
b) If a refrigerator has already worked for 2 years, what is the probability that it will
work for at least 3 more years?
4.39.
The tensile strength (in MPa) of titanium rods is estimated to follow Weibull distribution
with shape γ = 10.2 and scale β = 415.
a) Find the critical value c so that only 5% of rods will break before reaching the load
c.
b) What proportion of rods will have tensile strength above 450 MPa?
4.8
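The moment generating function of a continuous random variable X with a pdf of f(x) is
given by
$$M(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$$
when the integral exists. For the exponential distribution, this becomes
$$M(t) = \int_0^{\infty} e^{tx}\,\frac{1}{\beta} e^{-x/\beta}\,dx = \frac{1}{\beta}\int_0^{\infty} e^{-x(1/\beta - t)}\,dx = \frac{1}{\beta}\cdot\frac{1}{(1/\beta - t)} = \frac{1}{1 - \beta t}$$
(the integral converges for t < 1/β).
For properties of MGFs, see Section 3.9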
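As a quick check of the exponential MGF, the first two moments can be recovered by differentiation. The short derivation below assumes the standard property that the k-th derivative of M(t) at t = 0 equals E(X^k); it is a worked illustration rather than part of the original derivation.

```latex
\begin{align*}
M(t)   &= (1 - \beta t)^{-1}, \\
M'(t)  &= \beta (1 - \beta t)^{-2}    & &\Rightarrow\quad E(X)   = M'(0)  = \beta, \\
M''(t) &= 2\beta^2 (1 - \beta t)^{-3} & &\Rightarrow\quad E(X^2) = M''(0) = 2\beta^2, \\
V(X)   &= E(X^2) - [E(X)]^2 = 2\beta^2 - \beta^2 = \beta^2 .
\end{align*}
```

Both results agree with the mean β and variance β² of the Exponential distribution.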
Exercises
4.40.
Calculate MGF for the distribution with a given PDF
a) f(x) = (1/b) exp[−(x − a)/b], x > a
b) f(x) = exp[−(x + 2)], x > −2
c) f(x) = 2x exp(−2x), x > 0
d) f(x) = …, x > 7
e) f(x) = (1/96) x³ exp(−x/2), x > 0
f) f(x) = (b³x²/2) exp(−bx), x > 0
4.41.
a) Calculate the MGF for the Standard Normal distribution.
b) Using properties (b) and (c) of Theorem 3.8, find the MGF for the RV
X ∼ N(μ, σ²)
4.42.
It is known that Gamma RV Gamma(α, β) for integer α = n is the sum of n independent
copies of Exponential RV. Calculate MGF for the Gamma distribution and check the
property given by Theorem 3.8(d), p. 63
Chapter 5
All of the random variables discussed previously were one dimensional, that is, we consider
random quantities one at a time. In some situations, however, we may want to record the
simultaneous outcomes of several random variables.
Examples:
a) We might measure the amount of precipitate A and volume V of gas released from
a controlled chemical experiment, giving rise to a two-dimensional sample space.
b) A physician studies the relationship between weekly exercise amount and resting
pulse rate of his patients.
c) An educator studies the relationship between students' grades and time devoted to
study.
If X and Y are two discrete random variables, the probability that X equals x while Y
equals y is described by p(x, y) = P (X = x, Y = y). That is, the function p(x, y) describes
the probability behavior of the pair X, Y . It is not enough to know only how X or Y
behave on their own (which is described by their marginal probability functions).
Definition 5.1. Joint PMF
The function p(x, y) is a joint probability mass function of the discrete random
variables X and Y if
a) p(x, y) ≥ 0 for all pairs (x, y),
b) $\sum_x \sum_y p(x, y) = 1$,
c) P(X = x, Y = y) = p(x, y).
For any region A in the xy-plane,
$$P[(X, Y) \text{ belongs to } A] = \sum_{(x,y) \in A} p(x, y).$$
Example 5.1.
If two dice are rolled independently, then the numbers X and Y on the first and second
die, respectively, will each have marginal PMF p(x) = 1/6 for x = 1, 2, ..., 6.
The joint PMF is p(x, y) = 1/36, so that $p(x) = \sum_{y=1}^{6} p(x, y)$
Example 5.2.
Consider X = person's age and Y = income. The data are abridged from the US Current
Population Survey [j]. For the purposes of this example, we replace the age and income
groups by their midpoints. For example, the first row represents ages 25-34 and the first
column represents incomes $0-$10,000.

                  Y, income (thousands of $)
X, age     5       20      40      60      85      Total
30         0.049   0.116   0.084   0.039   0.032   0.320
40         0.042   0.093   0.081   0.045   0.061   0.322
50         0.047   0.102   0.084   0.053   0.072   0.358
Total      0.139   0.310   0.249   0.137   0.165   1.000

Here, the joint PMF is given inside the table and the marginal PMFs of X and Y are row
and column totals, respectively.
For example, p(30, 60) = 0.039 and pY(40) = 0.084 + 0.081 + 0.084 = 0.249.
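Marginal PMFs are just row and column sums of the joint table, which takes a few lines to compute. The sketch below stores the joint PMF of Example 5.2 as a dictionary of rows; the variable names are illustrative.

```python
ages = [30, 40, 50]
incomes = [5, 20, 40, 60, 85]

# joint[x][j] = p(x, incomes[j]), copied from the table of Example 5.2
joint = {
    30: [0.049, 0.116, 0.084, 0.039, 0.032],
    40: [0.042, 0.093, 0.081, 0.045, 0.061],
    50: [0.047, 0.102, 0.084, 0.053, 0.072],
}

# Marginal PMF of X: sum each row over all income values
p_x = {x: sum(joint[x]) for x in ages}

# Marginal PMF of Y: sum each column over all ages
p_y = {y: sum(joint[x][j] for x in ages) for j, y in enumerate(incomes)}

print({x: round(p, 3) for x, p in p_x.items()})
print({y: round(p, 3) for y, p in p_y.items()})
```

The computed column totals can differ from the printed "Total" row in the last digit (for instance 0.138 against 0.139 for Y = 5), since the published table entries are themselves rounded.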
For continuous random variables, the PMFs turn into densities, and summation into
integration.
Definition 5.3. Joint density, marginal densities
The function f(x, y) is a joint probability density function for the continuous
random variables X and Y if
a) f(x, y) ≥ 0, for all (x, y)
b) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
c) $P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy$
The marginal densities of X and Y are
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$
Note that, even if X, Y are each continuous RVs, this does not always mean that the joint
density exists. For example, X is Uniform[0,1] and Y = X. For this reason, X, Y satisfying this
definition might be called jointly continuous.
When X and Y are continuous random variables, the joint density function f (x, y)
describes the likelihood that the pair (X, Y ) belongs to the neighborhood of the point
(x, y). It is visualized as a surface lying above the xy plane.
Figure 5.1: An example of a joint density function. Left: surface plot. Right: contour
plot.
Example 5.3.
A certain process for producing an industrial chemical yields a product that contains two
main types of impurities. Suppose that the joint probability distribution of the impurity
concentrations (in mg/l) X and Y is given by
$$f(x, y) = \begin{cases} 2(1 - x) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Verify the condition (b) of Definition 5.3
(b) Find P(0 < X < 0.5, 0.4 < Y < 0.7) [1]
(c) Find the marginal probability density functions for X and Y.
Solution. (a) Condition $\iint f(x, y)\,dx\,dy = 1$ can be verified by integrating one of the
densities in part (c).
(b)
$$P(0 < X < 0.5,\ 0.4 < Y < 0.7) = \int_{0.4}^{0.7}\int_{0}^{0.5} 2(1 - x)\,dx\,dy = 0.225$$
(c)
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 2(1 - x)\,dy = 2(1 - x), \quad 0 < x < 1$$
and
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^1 2(1 - x)\,dx = 1, \quad 0 < y < 1$$
[1] Recall that for continuous RVs, the choice of the < or ≤ sign in the inequality does not matter.
5.2
$$p(x|y) = \frac{p(x, y)}{p_Y(y)} \quad \text{for } y \text{ such that } p_Y(y) > 0$$
For a pair of continuous RVs with joint density f(x, y), the conditional density
function of X given Y = y is defined as
$$f(x|y) = \frac{f(x, y)}{f_Y(y)} \quad \text{for } y \text{ such that } f_Y(y) > 0$$
and the conditional density function of Y given X = x as
$$f(y|x) = \frac{f(x, y)}{f_X(x)} \quad \text{for } x \text{ such that } f_X(x) > 0$$
For discrete RVs, the conditional probability distribution of X given Y fixes a value of
Y. For example, conditioning on Y = 0 produces
$$P(X = 0 \mid Y = 0) = \frac{P(X = 0, Y = 0)}{P(Y = 0)}$$
Example 5.4.
Using the data from Example 5.2,

                  Y, income
X, age     5       20      40      60      85      Total
30         0.049   0.116   0.084   0.039   0.032   0.320
40         0.042   0.093   0.081   0.045   0.061   0.322
50         0.047   0.102   0.084   0.053   0.072   0.358
Total      0.139   0.310   0.249   0.137   0.165   1.000

the conditional PMF of Y given X = 30 is obtained by dividing the first row by its total:

Y, income    p(y | X = 30)
5            0.049/0.320 = 0.153
20           0.116/0.320 = 0.362
40           0.084/0.320 = 0.263
60           0.039/0.320 = 0.122
85           0.032/0.320 = 0.1
Total        1
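The conditional PMF is the corresponding row of the joint table rescaled so that it sums to 1. A minimal Python sketch of that rescaling, using the X = 30 row of the table (the function name `conditional_pmf` is illustrative):

```python
incomes = [5, 20, 40, 60, 85]
joint_row_age_30 = [0.049, 0.116, 0.084, 0.039, 0.032]   # p(30, y) for each income y

def conditional_pmf(joint_row):
    """Divide each joint probability by the row total (the marginal probability)."""
    total = sum(joint_row)
    return [p / total for p in joint_row]

cond = conditional_pmf(joint_row_age_30)

for y, p in zip(incomes, cond):
    print(y, round(p, 3))
print("sum:", round(sum(cond), 3))
```

The same function applied to the rows for ages 40 and 50 gives the other two conditional distributions of income.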
Example 5.6.
Suppose that X, Y are Uniform on the [0, 2] × [0, 2] square. Find the probability that
X + Y ≤ 3.
[Figure: the square [0, 2] × [0, 2] with the line x + y = 3.]
$$f_X(x) = \int_x^1 10xy^2\,dy = \frac{10}{3}\,x(1 - x^3), \quad 0 < x < 1$$
$$f_Y(y) = \int_0^y 10xy^2\,dx = 5y^4, \quad 0 < y < 1$$
Figure 5.3: Left: Joint density from Example 5.7, right: a typical sample from this
distribution
(b) Now
$$f(y|x) = \frac{f(x, y)}{f_X(x)} = \frac{10xy^2}{(10/3)\,x(1 - x^3)} = \frac{3y^2}{1 - x^3}, \quad 0 < x < y < 1$$
and
$$f(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{10xy^2}{5y^4} = \frac{2x}{y^2}, \quad 0 < x < y < 1$$
For the last one, say, treat y as fixed (given) and x as the variable.
5.3
Example 5.8.
Show that the random variables in Example 5.3 are independent.
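Solution. Here,
$$f(x, y) = \begin{cases} 2(1 - x) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
From Example 5.3(c), fX(x) = 2(1 − x) for 0 < x < 1 and fY(y) = 1 for 0 < y < 1, so
f(x, y) = fX(x) fY(y) for all (x, y), and X and Y are independent.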
Exercises
5.1.
Suppose that the rolls of two dice, X1 and X2 have joint PMF
p(i, j) = P (X1 = i, X2 = j) = 1/36
a) Are random variables X1 , X2 independent? Explain.
b) Are the events A = {X1 3} and B = {X2 3} independent? Explain.
c) Are the events C = {X1 + X2 3} and D = {X1 X2 3} independent? Explain.
5.2.
X and Y have the following joint density:
$$f(x, y) = \begin{cases} k & \text{for } 0 \le x \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
a) Calculate the constant k that makes f a legitimate density.
b) Calculate the marginal densities of X and Y .
5.3.
The joint distribution for the number of total sales =X1 and number of electronic equipment sales =X2 per hour for a wholesale retailer are given below
0
X1 = 0
0.1
X1 = 1
0.1
0.2
X1 = 2
0.1
0.15
X2
a) Fill in the ?
b) Compute the marginal probability function for X2 . (That is, find P (X2 = i) for
every i.)
for 0 x, y 2
elsewhere
x, y > 0
5.4
The sum is over all values of (x, y) for which p(x, y) > 0.
If (X, Y) are continuous random variables, with joint PDF f(x, y), then
$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy.$$
The covariance helps us assess the relationship between two variables. Positive covariance means positive association between X and Y meaning that, as X increases, Y also
tends to increase. Negative covariance means negative association.
[Figure 5.4: contour plot of a joint density, with the sign of the product marked in each quadrant.]
In Figure 5.4, positive covariance is achieved, since pairs of x, y with positive products
have higher densities than those with the negative products.
This definition also extends our notion of variance as Cov(X, X) = V (X).
While covariance measures the direction of the association between two random variables,
its magnitude is not directly interpretable. Correlation coefficient, introduced below,
measures the strength of the association and has some nice properties.
Definition 5.8. Correlation
The correlation coefficient between two random variables X and Y is given by
$$\rho = \frac{Cov(X, Y)}{\sqrt{V(X)\,V(Y)}}$$
Properties of correlation:
• The correlation coefficient lies between −1 and +1.
• The correlation coefficient is dimensionless (while covariance has dimension of XY).
• If ρ = +1 or ρ = −1, then Y must be a linear function of X.
• The correlation coefficient does not change when X or Y are linearly transformed with a
positive slope (e.g. when you change the units from miles to angstroms).
However, the correlation coefficient is not a good indicator of a nonlinear relationship.
The following Theorem simplifies the computation of covariance. Compare it to the
variance identity V(X) = E(X²) − (E X)².
Theorem 5.1. Covariance
Cov(X, Y) = E(XY) − E(X)E(Y)
Example 5.9.
Let X and Y have the following joint PMF

           Y = 0    Y = 1    Total
X = 0      0.1      0.25     0.35
X = 1      0.35     0.3      0.65
Total      0.45     0.55
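Theorem 5.1 turns the covariance into three expected values, each a weighted sum over the joint PMF. The sketch below applies it to the 2×2 table of Example 5.9. It assumes that Y takes the values 0 and 1 (only the 0 is legible in the table header), and the helper name `expect` is made up for this illustration.

```python
import math

# Joint PMF p(x, y); the Y values 0 and 1 are an assumption about the table header
pmf = {(0, 0): 0.10, (0, 1): 0.25,
       (1, 0): 0.35, (1, 1): 0.30}

def expect(g):
    """E[g(X, Y)]: sum of g(x, y) * p(x, y) over all pairs with positive probability."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
exy = expect(lambda x, y: x * y)

cov = exy - ex * ey                          # Theorem 5.1
var_x = expect(lambda x, y: x**2) - ex**2    # variance identity
var_y = expect(lambda x, y: y**2) - ey**2
rho = cov / math.sqrt(var_x * var_y)

print("Cov =", round(cov, 4), " rho =", round(rho, 4))
```

Under that assumption the covariance comes out negative: the pair (1, 1) carries less probability than it would if X and Y were independent (0.65 × 0.55 ≈ 0.36).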
Example 5.10.
Let X denote the proportion of calls to the Support Instruction Center (SIC) about computers and Y the proportion of calls to SIC about projectors. It is estimated that X and
Y have a joint density
$$f(x, y) = \begin{cases} c & \text{for } x \ge 0,\ y \ge 0,\ x + y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the constant c that makes f a legitimate density. Then, find the covariance and
correlation of X and Y.
Solution. We first compute the marginal density functions (sketching the density may
help you set up the limits of integration). They are: $f_X(x) = \int_0^{1-x} c\,dy = c(1 - x)$,
and $f_Y(y) = \int_0^{1-y} c\,dx = c(1 - y)$. Then, integrating one of them, say
$\int_0^1 f_X(x)\,dx = \int_0^1 c(1 - x)\,dx = c/2 = 1$, we get c = 2. Thus,
$$f_X(x) = \begin{cases} 2(1 - x) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
and you can notice that fY = fX here. From the marginal density functions, we get
$$E(X) = \int_0^1 x \cdot 2(1 - x)\,dx = \frac{1}{3} \quad\text{and}\quad E(Y) = E(X)$$
Now, we are ready to calculate covariance. From the joint density function given, we have
$$E(XY) = \int_0^1\int_0^{1-y} xy \cdot 2\,dx\,dy = \frac{1}{12}.$$
Then
$$Cov(X, Y) = E(XY) - E(X)E(Y) = \frac{1}{12} - \frac{1}{3}\cdot\frac{1}{3} = -\frac{1}{36}$$
$$E(X^2) = \int_0^1 x^2 \cdot 2(1 - x)\,dx = \frac{1}{6} \quad\text{and}\quad E(Y^2) = \int_0^1 y^2 \cdot 2(1 - y)\,dy = E(X^2)$$
Hence V(X) = V(Y) = 1/6 − (1/3)² = 1/18, and the correlation is ρ = (−1/36)/(1/18) = −1/2.
Proof. We will show the proof for the continuous case; the discrete case follows similarly.
For independent X, Y, the joint density factors as f(x, y) = fX(x) fY(y), so
$$E(XY) = \iint xy\, f(x, y)\,dx\,dy = \iint x f_X(x)\, y f_Y(y)\,dx\,dy = \int x f_X(x)\,dx \int y f_Y(y)\,dy = E(X)\,E(Y)$$
5.4.1 Variance of sums
Example 5.12.
We have discussed in Chapter 3 that the Binomial random variable Y with parameters
n, p can be represented as Y = X1 + X2 + ... + Xn . Here Xi are independent Bernoulli
(0/1) random variables with P (Xi = 1) = p.
It was found that V(Xi) = p(1 − p). Then, using the above Note, V(Y) = V(X1) +
V(X2) + ... + V(Xn) = np(1 − p), which agrees with the formula for Binomial variance in
Section 3.4.
The same reasoning applies to Gamma RVs. If Y = X1 + X2 + ... + Xn, where Xi are
independent Exponentials, each with mean β, then we know that V(Xi) = β² and Y has
Gamma distribution with α = n. Then, V(Y) = V(X1) + V(X2) + ... + V(Xn) = nβ².
Example 5.13.
A very important application of Theorem 5.3 is the calculation of variance of the sample
mean
$$\bar{X} = \frac{X_1 + X_2 + ... + X_n}{n} = \frac{Y}{n}$$
where Xi are independent and identically distributed RVs (representing a sample of measurements), and Y denotes the total of all measurements.
Suppose that V(Xi) = σ² for each i. Then
$$V(\bar{X}) = \frac{V(Y)}{n^2} = \frac{V(X_1) + V(X_2) + ... + V(X_n)}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
Example 5.14.
The error in a single permeability measurement has the standard deviation of 0.01 millidarcies (md). If we made 8 independent measurements, how large is the error we should
expect from their mean?
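The result V(X̄) = σ²/n of Example 5.13 means that the standard deviation of the mean is σ/√n. The sketch below evaluates this for the numbers in Example 5.14 and compares it with a small simulation. The simulation draws Normal measurement errors, which is an assumption made only for illustration, since the formula itself does not require normality.

```python
import math
import random
import statistics

def sd_of_mean(sigma, n):
    """Standard deviation of the mean of n independent measurements, each with SD sigma."""
    return sigma / math.sqrt(n)

sigma, n = 0.01, 8                   # values from Example 5.14 (millidarcies)
theory = sd_of_mean(sigma, n)

rng = random.Random(514)             # fixed seed so the run is repeatable
means = [
    statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])   # assumed Normal errors
    for _ in range(20_000)
]
simulated = statistics.pstdev(means)

print("theory    :", round(theory, 5))
print("simulation:", round(simulated, 5))
```

Averaging 8 measurements therefore cuts the error by a factor of √8 ≈ 2.8, not by a factor of 8.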
Exercises
5.10.
Y
X =0
0.1
X =1
0.1
0.2
X =2
0.1
0.35
0.15
5.12.
X and Y have the following joint density:
$$f(x, y) = \begin{cases} 2 & \text{for } 0 \le x \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
a) Calculate E (X 2 Y ).
b) Calculate E (X/Y ).
5.13.
Using the density in Problem 5.12, find the covariance and correlation between X and Y .
5.14.
The random variables X, Y have the following joint density function:
$$f(x, y) = \begin{cases} k, & 0 \le x,\ 0 \le y,\ \text{and } x + y \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
Sketch the region where f is positive and answer the following questions:
a) Find the constant k that makes f a true density function.
b) Find the marginal density of X.
c) Find the probability that X + Y > 1
d) Set up, do not evaluate the expression for the expected value of U = X 2 1 + Y 3 .
5.15.
Ten people get into an elevator. Assume that their weights are independent, with the
mean 150 lbs and standard deviation 30 lbs.
a) Find the expected value and the standard deviation of their total weight.
b) Assuming Normal distribution, find the probability that their combined weight is
less than 1700 pounds.
5.16.
While estimating speed of light in a transparent medium, an individual measurement X
is determined to be unbiased (that is, the mean of X equals the unknown speed of light),
but the measurement error, assessed as the standard deviation of X, equals 35 kilometers
per second (km/s).
a) In an experiment, 20 independent measurements of the speed of light were made.
What is the standard deviation of the mean of these measurements?
b) How many measurements should be made so that the error in estimating the speed
of light (measured as X ) will decrease to 5 km/s?
5.17.
Near-Earth asteroids (NEA) are being surveyed. The average mass of these is 250 tons,
with a standard deviation of 180 tons. Suppose that 3 NEAs are randomly selected.
a) Find the expected value and standard deviation of the total mass of these NEAs,
assuming that their masses are independent.
b) How does your answer change if you assume that each pair of masses has a
correlation of ρ = 0.5?
5.18.
A part is composed of two segments. One segment is produced with the mean length
4.2cm and standard deviation of 0.1cm, and the second segment is produced with the mean
length 2.5cm and standard deviation of 0.05cm. Assuming that the production errors are
independent, calculate the mean and standard deviation of the total part length.
5.19.
Random variables X and Y have means 3 and 5, and variances 0.5 and 2, respectively.
Further, the correlation coefficient between X and Y equals 0.5. Find the mean and
variance of U = X + Y and W = X − Y.
5.20. ?
Find an example of uncorrelated, but not independent random variables.
[Hint: Two
5.5 Conditional Expectations*
$$\sum_x x\, p(x|y)$$
$$P(X = x \mid Y = 0) = \frac{P(X = x, Y = 0)}{P(Y = 0)} = \frac{1/6}{1/2} = 1/3$$
for x = 1, 2, 3, and
P(X = x | Y = 1) = 1/3 for x = 4, 5, 6.
Thus, E (X | Y = 0) = (1/3)(1 + 2 + 3) = 2 and E (X | Y = 1) = (1/3)(4 + 5 + 6) = 5
(b) E (X | Y ) is 2 or 5, depending on Y . Each value may happen with probability 1/2.
Thus, P [E (X | Y ) = 2] = 0.5 and P [E (X | Y ) = 5] = 0.5
Theorem 5.4. Expectation of expectation
Let X and Y denote random variables. Then
(a) E(X) = E[E(X|Y)]
(b)
$$\iint x f(x|y) f(y)\,dx\,dy = \int \left[\int x f(x|y)\,dx\right] f(y)\,dy$$
Example 5.16.
Suppose we are interested in the total weight X of occupants in a car. Let the number of
occupants equal Y, and each occupant weighs 150 lbs on average. Then E(X | Y = y) =
150y. Suppose Y has the following distribution

y       1      2      3      4
p(y)    0.62   0.28   0.07   0.03
150y    150    300    450    600

Then E(X | Y) has the distribution with values given in the last row of the table, and probabilities identical to p(y). We can verify by straightforward calculation that E[E(X | Y)] =
E(150Y) = 226.5. Then the Theorem says that E(X) = 226.5 as well, so we don't even
have to know the distribution of occupant weights, only its mean (150).
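The calculation in Example 5.16 is a weighted average of the conditional means E(X | Y = y). A minimal Python sketch follows; the y values 1 to 4 are inferred from the 150y row of the table, and the variable names are illustrative.

```python
# Distribution of the number of occupants Y (values inferred from the 150y row)
p_y = {1: 0.62, 2: 0.28, 3: 0.07, 4: 0.03}
mean_weight = 150                                  # average weight of one occupant, lbs

# Conditional mean of the total weight given Y = y
cond_mean = {y: mean_weight * y for y in p_y}

# E(X) = E[E(X | Y)]: average the conditional means using the distribution of Y
e_x = sum(cond_mean[y] * p for y, p in p_y.items())

print(round(e_x, 1))
```

Only the mean occupant weight enters the calculation, which is the point of part (a) of Theorem 5.4.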
Exercises
5.21.
For the random variables X and Y from Example 5.15, verify the identity in part (a) of
the Theorem 5.4.
5.22.
Suppose that the number of lobsters caught in a trap follows the distribution
y
p(y)
0.5
0.3
0.15
0.05
and the average weight of a lobster is 1.7 lbs, with variance 0.25 lbs². Find the expected
value and the variance of the total catch in one trap. Assume independence of lobsters'
weights.
5.23.
In the following table from the US Social Security Administration [k], the survival probabilities
P(X ≥ a) for US males are given, where a is the current age.

a, age      10     20     30     40     50     60     70     80     90     100    110
P(X ≥ a)    0.992  0.987  0.975  0.959  0.928  0.860  0.734  0.499  0.172  0.087  0.001

a) Find P(X ≥ 80 | X ≥ 60)
b) Find E(X | X ≥ 60). Since the data are only given once per decade, approximate
the age at death at the midpoint, e.g. if 60 ≤ X < 70 then count it as X = 65.
Explain why the result is higher than unconditional E(X).
5.24.
Tires from Manufacturer A last 50 thousand miles, on average, with the standard deviation
2.887 thousand miles; and those from Manufacturer B last 60 thousand miles, on average,
with standard deviation 2.887 too. You pick a tire at random, with 50% chance it comes
from A and 50% chance it comes from B. Find the expected lifetime of your tire, and
standard deviation of the lifetime.
Verify your calculations assuming that tire A has Uniform[45,55] lifetime and tire B
has Uniform[55,65] lifetime; so that the random tire will have Uniform[45,65] lifetime.
Chapter 6
Introduction
At times we are faced with a situation where we must deal not with the random variable
whose distribution is known but rather with some function of that random variable. For
example, we might know the distribution of particle sizes, and would like to infer the
distribution of particle weights.
In the case of a simple linear function, we have already asserted what the effect is
on the mean and variance. What has been omitted was what actually happens to the
distribution.
We will discuss several methods of obtaining the distribution of Y = g(X) from known
distribution of X. The CDF method and the transformation method are most frequently
used. The CDF method is all-purpose and flexible. The transformation method is typically
faster (when it works).
6.1.1 Simulation
One use of the above methods is to generate random variables with a given distribution. This is
important in simulation studies. Suppose that we have a complex operation that involves several
components. Suppose that each component is described by a random variable and that the outcome
of the operation depends on the components in a complicated way. One approach to analyzing
such a system is to simulate each component and calculate the outcome for the simulated values.
If we repeat the simulation many times, then we can get an idea of the probability distribution of
the outcomes. Some examples of simulation are given in Labs.
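The simulation approach described above can be sketched in a few lines of Python. The system here is hypothetical and chosen only for illustration: two components with independent Exponential lifetimes (mean 1 each) are connected in series, so the system lifetime is the smaller of the two. For this toy case the exact answers are known (the minimum is Exponential with mean 1/2, and the probability of lasting beyond 1 is $e^{-2}$), which lets us check the simulation.

```python
import random


def simulate_system_lifetime(rng):
    """One simulated outcome for a hypothetical two-component series system."""
    a = rng.expovariate(1.0)  # lifetime of component A, Exponential with mean 1 (assumed)
    b = rng.expovariate(1.0)  # lifetime of component B, Exponential with mean 1 (assumed)
    return min(a, b)          # a series system fails as soon as either component fails


def estimate(n_runs=100_000, seed=1):
    """Repeat the simulation many times and summarize the simulated outcomes."""
    rng = random.Random(seed)
    outcomes = [simulate_system_lifetime(rng) for _ in range(n_runs)]
    mean_lifetime = sum(outcomes) / n_runs
    prob_beyond_1 = sum(t > 1.0 for t in outcomes) / n_runs
    return mean_lifetime, prob_beyond_1


mean_lifetime, prob_beyond_1 = estimate()
print(mean_lifetime, prob_beyond_1)  # compare with 0.5 and exp(-2)
```

In a realistic study the function `simulate_system_lifetime` would be replaced by whatever rule combines the components, and no exact answer would be available for comparison.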
6.2
The CDF method is straightforward and very versatile. The procedure is to derive the
CDF for Y = g(X) in terms of both the CDF of X, F (x), and the function g, while
also noting how the range of possible values changes. This is done by starting with the
computation of P (Y < y) and inverting this into a statement that can often be expressed
in terms of the CDF of X.
If we also need to find the density of Y , we can do this by differentiating its CDF.
Example 6.1.
Suppose X has CDF given by $F(x) = 1 - e^{-\lambda x}$, so that X is Exponential with the mean
$1/\lambda$. Let Y = bX where b > 0. Note that the range of Y is the same as the range of X,
namely $(0, \infty)$.
$$P(Y < y) = P(bX < y) = P(X < y/b) = 1 - e^{-\lambda y/b} = 1 - e^{-(\lambda/b)y}$$
(Since b > 0, the inequality sign does not change.)
The student should recognize this as the CDF of the exponential distribution with the mean
$b/\lambda$. We already knew that the mean would be $b/\lambda$, but we did not know that Y also has
an exponential distribution.
Example 6.2.
Suppose X has a uniform distribution on [a, b] and Y = cX + d, with c > 0. Find the
CDF of Y .
Solution. Recall that $F(t) = (t - a)/(b - a)$. Note that the range of Y is $[ca + d, cb + d]$.
We have
$$P(Y < t) = P(cX + d < t) = P\left(X < \frac{t - d}{c}\right) = F\left(\frac{t - d}{c}\right) = \frac{(t - d)/c - a}{b - a} = \frac{t - d - ac}{c(b - a)}$$
With a little algebra, this can be shown to be the uniform CDF on $[ca + d, cb + d]$.
This example shows that certain simple transformations do not change the distribution
type, only the parameters. Sometimes, however, the change is dramatic.
Example 6.3.
Show that if X has a uniform distribution on the interval [0, 1] then
$Y = -\ln(1 - X)$ has an exponential distribution with mean 1.
Solution. Recall that for the uniform distribution on (0, 1), $P(X < x) = x$. Also, note
that the range of Y is $(0, \infty)$.
$$P(Y < t) = P(-\ln(1 - X) < t) = P(\ln(1 - X) > -t) = P(1 - X > e^{-t}) = P(X < 1 - e^{-t}) = 1 - e^{-t}$$
Incidentally, note that if X has a uniform distribution on (0, 1), then so does $W = 1 - X$.
(See exercises.)
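The result of Example 6.3 is also a practical recipe for generating Exponential random numbers from Uniform ones. The following Python sketch checks it by simulation; the sample size and the seed are arbitrary choices made for this illustration.

```python
import math
import random


def exponential_from_uniform(u):
    """Map a Uniform(0, 1) value u to an Exponential (mean 1) value, Y = -ln(1 - u)."""
    return -math.log(1.0 - u)


rng = random.Random(2)
# rng.random() returns values in [0, 1), so 1 - u is never zero
sample = [exponential_from_uniform(rng.random()) for _ in range(100_000)]

sample_mean = sum(sample) / len(sample)                       # should be close to 1
frac_below_1 = sum(y < 1.0 for y in sample) / len(sample)     # should be close to 1 - e^{-1}
print(sample_mean, frac_below_1, 1 - math.exp(-1))
```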
Example 6.4.
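The pdf of X is given by
$$f(x) = \begin{cases} 3x^2 & 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the pdf of $U = 40(1 - X)$.
Solution.
$$F_U(u) = P(U \le u) = P[40(1 - X) \le u] = P\left(X > 1 - \frac{u}{40}\right) = 1 - P\left(X \le 1 - \frac{u}{40}\right)$$
$$= 1 - F_X\left(1 - \frac{u}{40}\right) = 1 - \int_0^{1 - u/40} f(x)\,dx = 1 - \left(1 - \frac{u}{40}\right)^3.$$
Therefore,
$$f_U(u) = F_U'(u) = \frac{3}{40}\left(1 - \frac{u}{40}\right)^2, \quad \text{for } 0 \le u \le 40$$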
Exercises
6.1.
Show that if X has a uniform distribution on [0, 1], then so does $1 - X$.
6.2.
Let X have a uniform distribution on [0, 1]. Let $Y = X^{1/3}$.
a) Find the distribution of Y.
b) Find the mean of Y using the result in (a).
c) Find the mean of Y using the formula $E\,g(X) = \int g(x) f(x)\,dx$.
6.3.
Using the CDF method, show that the Weibull random variable Y (with some parameter
> 0, and = 1) can be obtained from Exponential X (with the mean 1) as Y = X 1/ .
6.4.
Suppose the radii of spheres have a normal distribution with mean 2.5 and variance
$\frac{1}{12}$. Find the median volume and median surface area.
6.5.
Let X have a uniform distribution on [0, 1]. Show how you could define H(x) so that
Y = H(X) would have a Poisson distribution with mean 1.3.
6.6.
A point lands in the square $[0, 1] \times [0, 1]$ with random coordinates X, Y independent, each
having a Uniform[0, 1] distribution. Use the CDF method to find the distribution of U =
max(X, Y).
6.7.
6.3
Method of transformations
Example 6.5.
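Let X be a geometric random variable with PMF
$$p(x) = \frac{3}{4}\left(\frac{1}{4}\right)^{x-1}, \quad x = 1, 2, 3, \ldots$$
Find the distribution of $Y = X^2$.
Solution. Since X takes only positive values, $y = x^2$ is one-to-one with the inverse $x = \sqrt{y}$, so
$$p_Y(y) = p_X(\sqrt{y}) = \frac{3}{4}\left(\frac{1}{4}\right)^{\sqrt{y}-1}, \quad y = 1, 4, 9, \ldots$$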
For continuous RVs, the transformation formula originates from the change of variable
formula for integrals.
Theorem 6.2. Transformations: continuous
Suppose that X is a continuous random variable with density fX (x). Let y = h(x)
define a one-to-one transformation that can be uniquely solved for x, say x = w(y),
and $J = w'(y)$ exists (it is called the Jacobian of the transformation). Then the
density of Y = h(X) is
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = f_X[w(y)]\,|J|$$
Example 6.6.
Let X be a continuous random variable with probability distribution
$$f(x) = \begin{cases} x/12 & \text{for } 1 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability distribution of the random variable $Y = 2X - 3$.
Solution. The inverse solution of $y = 2x - 3$ yields $x = (y + 3)/2$, from which we obtain
$J = w'(y) = \frac{dx}{dy} = \frac{1}{2}$. Therefore, using the above Theorem 6.2, we find the density function
of Y to be
$$f_Y(y) = \frac{1}{12}\cdot\frac{y + 3}{2}\cdot\frac{1}{2} = \frac{y + 3}{48}, \quad -1 < y < 7$$
Example 6.7.
Let X be a Uniform[0, 1] random variable. Find the distribution of $Y = X^5$.
Solution. Inverting, $x = y^{1/5}$, and $dx/dy = (1/5)y^{-4/5}$. Thus, we obtain
$$f_Y(y) = 1 \cdot (1/5)y^{-4/5} = (1/5)y^{-4/5}, \quad 0 < y < 1$$
Example 6.8.
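Let X be a continuous random variable with density
$$f(x) = \begin{cases} \frac{x + 1}{2} & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the density of the random variable $Y = X^2$.
Solution. The inversion of $y = x^2$ yields $x_{1,2} = \pm\sqrt{y}$, from which we obtain
$J_1 = w_1'(y) = \frac{dx_1}{dy} = \frac{1}{2\sqrt{y}}$ and $J_2 = w_2'(y) = \frac{dx_2}{dy} = -\frac{1}{2\sqrt{y}}$.
The transformation is not one-to-one, so the contributions of both branches are added:
$$f_Y(y) = f(\sqrt{y})\,|J_1| + f(-\sqrt{y})\,|J_2| = \frac{\sqrt{y} + 1}{2}\cdot\frac{1}{2\sqrt{y}} + \frac{-\sqrt{y} + 1}{2}\cdot\frac{1}{2\sqrt{y}} = \frac{1}{2\sqrt{y}}, \quad \text{for } 0 < y \le 1$$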
Example 6.9. Location and Scale parameters
Suppose that X is some standard distribution (for example, Standard Normal, or maybe
Exponential with mean 1) and $Y = a + bX$, or, solving for X,
$$X = \frac{Y - a}{b}$$
Let X have the density f(x). Then the density of Y can be obtained from Theorem 6.2:
$$f_Y(y) = f(x)\left|\frac{dx}{dy}\right| = \frac{1}{|b|}\, f\left(\frac{y - a}{b}\right) \qquad (6.1)$$
For example, let X be Exponential with the mean 1, and Y = bX. Then $f(x) = e^{-x}$,
x > 0, and (6.1) gives
$$f_Y(y) = (1/b)e^{-y/b}, \quad y > 0$$
That is, Y is Exponentially distributed with the mean b. This agrees with the result of
Example 6.1.
Another example of location and scale parameters is provided by the Normal distribution:
if Z is standard Normal, then $Y = \mu + \sigma Z$ produces Y, a Normal$(\mu, \sigma^2)$ random variable.
Thus, $\mu$ is the location and $\sigma$ is the scale parameter.
Formula (6.1) also provides a faster way to solve some of the above Examples.
Exercises
6.8.
Suppose that Y = cos(X) where the RV X is given by the table
x
p(x)
0.1
0.2
0.3
0.3
0.1
6.9.
The random variable X has a distribution given by the table
x
$$\frac{10}{x^2}, \quad x > 10$$
Write down the density function of $Y = 4X - 20$ (do not forget the limits!)
6.13.
X is a random variable with Uniform [0, 5] distribution.
6.4
Sample mean (average of all observations) plays a central role in statistics. We have
discussed the variance of the sample mean in Section 5.4. Here are more facts about the
behavior of the sample mean.
From the linear properties of the expectation, it is clear that
$$E(\bar X) = E\left(\frac{X_1 + X_2 + \ldots + X_n}{n}\right) = \frac{n\mu}{n} = \mu.$$
Summarizing the above, we obtain
Definition 6.1. Sample mean
A group of independent random variables from some distribution is called a
sample, usually denoted as
X1 , X2 , ..., Xn .
Sample mean, denoted $\bar X$, is
$$\bar X = \frac{X_1 + X_2 + \ldots + X_n}{n}$$
If $E(X_i) = \mu$ and $V(X_i) = \sigma^2$ for all i, then the mean and variance of the sample
mean are
$$E(\bar X) = \mu \quad \text{and} \quad V(\bar X) = \sigma^2/n$$
Not only can we find the mean and variance of $\bar X$, we can also describe (albeit approximately) its entire distribution!
Theorem 6.3. CLT
Let $\bar X$ be the mean of a sample coming from some distribution with mean $\mu$ and
variance $\sigma^2$. Then, for large n, $\bar X$ is approximately Normal with mean $\mu$ and
variance $\sigma^2/n$.
Here we mention (without proof, which can be obtained using the moment generating
functions) some properties of the sums of independent random variables.
What do these have in common?
The sum of independent Normal RVs is always Normal. The shape of the sum distribution for other independent RVs starts resembling Normal as n increases.
The Central Limit Theorem (CLT) ensures a similar property for most distributions. However, it holds in the limit, that is, as n gets large (in practice, n > 30 is usually
enough). According to it, the distribution of a sum of independent RVs approaches the Normal distribution.
The same holds for averages, since they are sums divided by n.
If n < 30, the approximation is good only if the population distribution is not too
different from a normal. If the population is normal, the sampling distribution of $\bar X$ will
follow a normal distribution exactly, no matter how small the sample size.1
1
There are some cases of the so-called heavy-tailed distributions for which the CLT does not hold,
but they will not be discussed here.
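The statements above about the sample mean can be checked numerically. The Python sketch below is only an illustration under assumed settings: the population is Exponential with mean 1 (so $\mu = 1$ and $\sigma^2 = 1$), the sample size is n = 30, and 20,000 samples are drawn. It compares the simulated mean and variance of $\bar X$ with $\mu$ and $\sigma^2/n$, and counts how often $\bar X$ falls within 1.96 standard deviations of $\mu$, which would be about 95% for a Normal distribution.

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from Exponential (mean 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(3)
n = 30
means = sample_means(n, 20_000, rng)

avg_of_means = statistics.fmean(means)      # theory: mu = 1
var_of_means = statistics.variance(means)   # theory: sigma^2 / n = 1/30
std_error = (1.0 / n) ** 0.5
within = sum(abs(m - 1.0) < 1.96 * std_error for m in means) / len(means)

print(avg_of_means, var_of_means, within)
```

Because the Exponential population is skewed, the agreement with the Normal value of 95% is close but not exact at n = 30.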
[Figure: panels titled Gamma, Normal, Poisson and Exponential; axis label "Distribution of Xi"]
Example 6.10.
The average voltage of the batteries is 9.2V and standard deviation is 0.25V. Assuming
normal distribution and independence, what is the distribution of total voltage Y = X1 +
... + X4 ? Find the probability that the total voltage is above 37.
Solution. The mean is $4 \times 9.2 = 36.8$. The variance is $4 \times 0.25^2 = 0.25$. Furthermore, Y
itself will have a normal distribution.
Using z-scores, $P(Y > 37) = P(Z > (37 - 36.8)/0.5) = P(Z > 0.4) = 0.5 - 0.1554 = 0.3446$
from Normal table, p. 85.
Example 6.11.
An electrical firm manufactures light bulbs with average lifetime equal to 800 hours and
standard deviation of lifetimes equal 400 hours. Approximate the probability that a
random sample of 16 bulbs will have an average life of less than 725 hours.
Solution. The sampling distribution of $\bar X$ will be approximately normal, with mean $\mu_{\bar X} = 800$
and $\sigma_{\bar X} = 400/\sqrt{16} = 100$. Therefore,
$$P(\bar X < 725) \approx P\left(Z < \frac{725 - 800}{100}\right) = P(Z < -0.75) = 0.5 - 0.2734 = 0.2266$$
Dependence on n
As n increases, two things happen to the distribution of X: it is becoming sharper (due
to the variance decreasing) and also the shape is becoming more and more Normal. For
Example 6.12.
The fracture strengths of a certain type of glass average 14 (thousands of pounds per
square inch) and have a standard deviation of 2. What is the probability that the average
fracture strength for 100 pieces of this glass exceeds 14.5?
Solution. By the central limit theorem the average strength $\bar X$ has approximately a normal
distribution with mean $\mu = 14$ and standard deviation $\sigma_{\bar X} = 2/\sqrt{100} = 0.2$. Thus,
$$P(\bar X > 14.5) \approx P\left(Z > \frac{14.5 - 14}{0.2}\right) = P(Z > 2.5) = 0.5 - 0.4938 = 0.0062$$
6.4.1
Historically, CLT was first discovered in the case of the Binomial distribution. Since Binomial
Y is a sum of n independent Bernoulli RVs, CLT applies and says that $\bar X = Y/n$ is
approximately Normal, with mean p and variance $p(1 - p)/n$. In this case, $\hat p := Y/n$ is called the
sample proportion. The Binomial Y itself is also approximately Normal with mean np and
variance $np(1 - p)$, as was discussed earlier in Section 4.6.2.
Example 6.13.
A fair (p = 0.5) coin is tossed 500 times.
a) What is the expected proportion of Heads?
b) What is the typical deviation from the expected proportion?
c) What is the probability that the sample proportion is between 0.46 and 0.54?
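Solution. (a) We have $E(\hat p) = p = 0.5$ and $\sigma_{\hat p} = \sqrt{p(1 - p)/n} = \sqrt{0.25/500} = 0.0224$.
(b) For example, the empirical rule states that about 68% of a normal distribution is
contained within one standard deviation of its mean. Here, the 68% interval is about
$0.5 \pm 0.0224$.
(c)
$$P(0.46 < \hat p < 0.54) \approx P\left(\frac{0.46 - 0.5}{0.0224} < Z < \frac{0.54 - 0.5}{0.0224}\right) = P(-1.79 < Z < 1.79) = 2(0.4633) = 0.9266$$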
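The quality of the Normal approximation for the coin of Example 6.13 can be examined directly, since the exact Binomial probability is computable. The Python sketch below (Python 3.8 or later, for `math.comb`) is an illustration rather than part of the original solution: it compares the CLT approximation of the probability that the sample proportion lies between 0.46 and 0.54 with the exact Binomial probability of 230 to 270 Heads, endpoints included.

```python
import math


def normal_cdf(z):
    """Standard Normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p = 500, 0.5
std_error = math.sqrt(p * (1 - p) / n)   # about 0.0224

# CLT approximation for P(0.46 < p_hat < 0.54)
approx = normal_cdf((0.54 - p) / std_error) - normal_cdf((0.46 - p) / std_error)

# Exact Binomial probability: 0.46 <= Y/n <= 0.54 means 230 <= Y <= 270
exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(230, 271)
)

print(approx, exact)
```

The exact value comes out slightly larger than the approximation because the Binomial sum includes both endpoints in full; a continuity correction would narrow the gap.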
The Normal approximation is not very good when np is small. Here is an example with
n = 50 and p = 0.05:
[Figure: Binomial distribution with n = 50 and p = 0.05]
Exercises
6.17.
The average concentration of potassium in county soils was determined as 85 ppm, with
standard deviation 30 ppm. If n = 20 samples of soils are taken, find the probability that
their average potassium concentration will be in the medium range (80 to 120 ppm).
6.18.
The heights of students have a mean of 174.5 centimeters (cm) and a standard deviation
of 6.9 cm. If a random sample of 25 students is obtained, determine
a) the mean and standard deviation of $\bar X$;
b) the probability that the sample mean will fall between 172.5 and 175.8 cm;
c) the 70th percentile of the $\bar X$ distribution.
6.19.
The measurements of an irregular signal's frequency have a mean of 20 Hz and standard
deviation of 5 Hz. 50 independent measurements are done.
a) Find the probability that the average of these 50 measurements will be within 1 unit
of the theoretical mean 20.
b) How many measurements should be done to ensure that the probability in part (a)
equals 0.9?
6.20.
A process yields 10% defective items. If 200 items are randomly selected from the process,
what is the probability that the sample proportion of defectives
a) exceeds 13%?
b) is less than 8%?
6.21.
The weight $X_i$ of a Giant Siamese Frog has an approximately normal distribution with
the mean of 215g and standard deviation of 40g.
a) Find the probability that a single frog weighs between 200 and 230g.
b) Find the 85th percentile of the frogs' weights.
c) Let $\bar X$ be the average weight of 25 frogs. Find the mean and standard deviation of
$\bar X$. Assume that the frogs' weights are independent.
d) Approximate the probability that $\bar X$ is between 200 and 230g. Compare to part (a).
6.22.
The proportion of office workers who frequently check Facebook at work is believed to
be 0.8. When n = 100 office workers are observed, X of them will be found checking
Facebook.
a) Find the mean and standard deviation of X.
b) Find the Normal approximation for the probability that X will be between 75 and
87 (inclusive).
Chapter 7
Descriptive statistics
The goal of statistics is somewhat complementary to that of probability. Probability
answers the question of what data are likely to be obtained from known probability distributions.
Statistics answers the opposite question: what kind of probability distributions are likely
to have generated the data at hand?
Descriptive statistics are the ways to summarize the data set, to represent its tendencies
in a concise form and/or describe them graphically.
7.1
We will usually refer to the given data set as a sample and denote its entries as X1 , X2 , ..., Xn .
The objects whose measurements are represented by Xi are often called experimental units
and are usually assumed to be sampled randomly from a larger population of interest. The
probability distribution of Xi is then referred to as population distribution.
Definition 7.1. Population and sample
Population is the collection of all objects of interest. Sample is the collection of
objects from the population picked for the study.
A simple random sample (SRS) is a sample for which each object in the population
has the same probability to be picked as any other object, and is picked
independently of any other object.
Example 7.1.
a) We would like to learn the public opinion regarding a tax reform. We set up phone
interviews with n = 1000 people. Here, the population (which we really would like to
learn about) is all U.S. adults, and the sample (which are the objects, or individuals
we actually get), is the 1000 people contacted.
For some really important matters, the U.S. Census Bureau tries to reach every
single American, but this is practically impossible.
b) The gas mileage of a car is investigated. Suppose that we drive n = 20 times starting
with a full tank of gas, until it is empty, and calculate the average gas mileage after
Usually, we require that our sample be a simple random sample (SRS) so that we can
extend our findings to the entire population of interest. This means that no part of the
population is preferentially selected for, or excluded from the study. It is amazing that,
given proper sampling procedures, we can sometimes tell a lot about a large population
after sampling only a small fraction of it!
Bias often occurs when the sample is not an SRS. For example, self-selection bias
occurs when subjects volunteer for the study. Medical studies that pay for participation
may attract lower-income volunteers. A questionnaire issued by a website will represent
only the people that visit that website etc.
The ideal way to implement an SRS is to create a list of all objects in a population,
and then use a random number generator to pick the objects to be sampled. In practice,
this is usually very difficult to accomplish.
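When a complete list of the population is available, the procedure just described takes only a few lines. The Python sketch below uses a made-up population of 500 labelled units and a sample size of 20; both are assumptions for illustration. Note that `random.sample` draws without replacement, so no object can be picked twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct objects from the list; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population list: 500 labelled units
population = [f"unit-{i}" for i in range(1, 501)]
chosen = simple_random_sample(population, 20, seed=7)
print(chosen)
```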
In the future, we will always assume that we are dealing with an SRS, unless otherwise
noted. Thus, we will obtain a sequence of independent and identically distributed (IID)
random variables X1 , X2 , ..., Xn from the population distribution we are studying.
7.2
Graphical summaries
The most popular graphical summary for a numeric data set is a histogram.
Definition 7.2.
The histogram of the data set X1 , X2 , . . . , Xn is a bar chart representing the
classes (or bins) on the x-axis and frequencies (or proportions) on the y-axis.
Bins should be of equal width so that all bars would visually be on the same level.1
The construction of a histogram is easier to show by example.
Example 7.2.
Old Faithful is a famous geyser in Yellowstone National Park. The data recorded represent
waiting times between eruptions (in minutes). There are n = 272 observations. The first
ten observations are 79, 54, 74, 62, 85, 55, 88, 85, 51, 85. Using the bins 41-45, 46-50 etc
we get
Bin      41-45   46-50   51-55   56-60   61-65   66-70   71-75
Count        4      22      33      24      14      10      27

Bin      76-80   81-85   86-90   91-95   96-100
Count       54      55      23       5        1
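The counting step behind such a table is mechanical, as the following Python sketch shows. Only the first ten observations quoted in Example 7.2 are used here, since the full data set of 272 values is not reproduced in these notes; the helper name `bin_label` is an invention for this illustration.

```python
from collections import Counter


def bin_label(x, start=41, width=5):
    """Label 'lo-hi' of the bin containing the integer x (bins 41-45, 46-50, ...)."""
    k = (x - start) // width
    lo = start + k * width
    return f"{lo}-{lo + width - 1}"


first_ten = [79, 54, 74, 62, 85, 55, 88, 85, 51, 85]
counts = Counter(bin_label(x) for x in first_ten)

# Print the non-empty bins in increasing order
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(label, counts[label])
```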
The choice of bins of course affects the appearance of a histogram. With too many
bins, the graph becomes hard to read, and with too few bins, a lot of information is
lost. We would generally recommend using more bins for larger sample sizes, but not too
many bins, so that the histogram keeps a smooth appearance. Some authors recommend
Bins can be of unequal width but then some adjustment to their heights must be made.
Describing the shape of a histogram, we may note its features as being symmetric, or
maybe skewed (left or right); having one bulge (mode) - that is, unimodal distribution,
or two modes - that is, bimodal distribution etc. The Old Faithful data have bimodal
shape. Some skewed histogram shapes are shown in Fig. 8.1
[Figure: "Histogram of Y" (y-axis: Frequency, x-axis from 40 to 100)]
Figure 7.2: histograms of Old Faithful data: bins too wide, bins too narrow
[Figure: three histogram shapes, titled Symmetric, Right skewed, Left skewed]
7.3
Numerical summaries
7.3.1
Sample mean and variance
The easiest and most popular summary for a data set is its mean X. The mean is a
measure of location for the data set. We often need also a measure of spread. One such
measure is the sample standard deviation.
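$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n - 1} = \frac{\sum_{i=1}^n X_i^2 - n\bar X^2}{n - 1} \qquad (7.1)$$
and $\bar X = 5$. Then, $\sum X_i^2 = 364$ and we get $S^2 = \frac{364 - 5^2(8)}{8 - 1} = 23.43$ and $S = \sqrt{23.43} = 4.84$.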
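The two forms of formula (7.1) can be compared numerically. The Python sketch below uses the eight heights listed in Example 7.4 and subtracts 180 from each; this shift is an assumption made here because it reproduces the values $\bar X = 5$ and $\sum X_i^2 = 364$ quoted above. Shifting the data does not change the variance, which the last line checks against the standard library.

```python
import math
import statistics

heights = [177, 182, 182, 185, 185, 188, 188, 193]   # ordered data from Example 7.4
shifted = [h - 180 for h in heights]                  # assumed shift by 180

n = len(shifted)
xbar = sum(shifted) / n                               # 5.0
sum_sq = sum(x * x for x in shifted)                  # 364

s2_definition = sum((x - xbar) ** 2 for x in shifted) / (n - 1)
s2_shortcut = (sum_sq - n * xbar**2) / (n - 1)
s = math.sqrt(s2_definition)

print(round(s2_definition, 2), round(s2_shortcut, 2), round(s, 2))
print(round(statistics.variance(heights), 2))         # same value for the unshifted data
```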
7.3.2
Percentiles
Definition 7.4.
The pth percentile (or quantile) of a data set is a number q such that p% of the
entire sample are below this number. It can be calculated as r = ((n + 1)p/100)th
smallest number in the sample.
The algorithm for calculating pth percentile is then as follows.2
a) Order the sample, from smallest to largest, denote these as
$X_{(1)}, X_{(2)}, \ldots, X_{(n)}$.
b) Calculate $r = (n + 1)p/100$, let $k = \lfloor r \rfloor$ be the integer part of r.
c) If interpolation is desired, take $X_{(k)} + (r - k)[X_{(k+1)} - X_{(k)}]$.
If interpolation is not needed, take $X_{(r^*)}$, where $r^*$ is the rounded value of r.
Generally, if the sample size n is large, the interpolation is not needed.3
The 50-th percentile is known as median. It is, along with the mean, a measure of
center of the data set.
Example 7.4.
Back to the example of US presidents: find the median and 22nd percentile of the presidents' heights.
Solution. The ordered data are 177, 182, 182, 185, 185, 188, 188, 193. For n = 8 we have
two middle observations: ranked 4th and 5th, these are both 185. Thus, the median is
185 (coincidentally, $\bar X = 185$ as well).
To find the 22nd percentile, take $r = (n + 1)p/100 = 9(0.22) = 1.98$, round it to 2. Then, take
the 2nd ranked observation, which is 182.
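The percentile algorithm of this section translates directly into code. The Python sketch below follows steps (a) to (c); two details are assumptions added here because the text does not specify them: ranks that fall outside 1..n are clamped to the smallest or largest observation, and rounding is done half-up.

```python
import math


def percentile(data, p, interpolate=True):
    """p-th percentile using the rank r = (n + 1) p / 100 of the ordered sample."""
    xs = sorted(data)                       # step (a): order the sample
    n = len(xs)
    r = (n + 1) * p / 100                   # step (b)
    if interpolate:                         # step (c), with interpolation
        k = math.floor(r)
        if k < 1:                           # assumed: clamp below the smallest rank
            return xs[0]
        if k >= n:                          # assumed: clamp above the largest rank
            return xs[-1]
        return xs[k - 1] + (r - k) * (xs[k] - xs[k - 1])
    r_rounded = min(max(math.floor(r + 0.5), 1), n)   # step (c), rounded rank
    return xs[r_rounded - 1]


heights = [177, 182, 182, 185, 185, 188, 188, 193]
print(percentile(heights, 50))                        # median
print(percentile(heights, 22, interpolate=False))     # 22nd percentile, rounded rank
```

For the data of Example 7.4 this reproduces the median 185 and the 22nd percentile 182.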
Mean and median
The mean and median are popular measures of center. For a symmetric data set, both
give roughly the same result. However, for a skewed data set, they might produce fairly
different results. For the right-skewed distribution, mean > median, and for the left-skewed, mean < median.
The median is resistant to outliers. This means that the unusually high or low observations do not greatly affect the median. The mean X is not resistant to outliers.
Mean of a function
We can define the mean of any function g of our data as
$$\overline{g(X)} = \frac{g(X_1) + g(X_2) + \ldots + g(X_n)}{n}$$
Similarly to the properties of the expected values (see Theorem 3.2), we have the following
properties:
a) $\overline{aX + b} = a\bar X + b$
b) but, generally, $\overline{g(X)} \ne g(\bar X)$
c) For sample standard deviation, $S_{aX+b} = |a| S_X$
Exercises
7.1.
The temperature data one morning from different weather stations in the vicinity of Socorro were
71.9, 73.7, 72.3, 74.6, 72.8, 67.5, 72.0 (in °F)
a) Find the mean and standard deviation of the temperatures.
b) Find the median and 86th percentile.
c) Suppose that the last measurement came from Magdalena Ridge and was equal
to 41.7 instead of 72.0. How will this affect the mean and the median, respectively?
d) Re-calculate the above answers if the temperature is expressed in Celsius.
[Hint: you do not have to do it from scratch!]
7.2.
The heights of the last 20 US presidents are, in cm: 185, 182, 188, 188, 185, 177, 182, 193,
183, 179, 175, 188, 182, 178, 183, 180, 182, 178, 170, 180.
a) Make a histogram of the heights, choosing bins wisely.
b) Calculate mean and the median, compare. How do these relate to the shape of the
histogram?
7.3.
The permeabilities of 12 oil pumping locations, in millidarcies, are: 0.07, 0.17, 0.06, 0.09,
0.17, 0.18, 0.04, 0.07, 0.02, 0.57, 0.71, 0.05.
a) Make a histogram of the permeabilities, choosing bins wisely.
b) Calculate mean and the median, compare. How do these relate to the shape of the
histogram?
c) Find standard deviation of permeabilities.
7.4.
Several runners have completed a 1 mile race, with these results: 4.35, 4.51, 4.18, 4.56,
4.10, 3.75 (in minutes).
a) Find the average time of these runners.
b) Find the average speed (note: you will have to find each runner's individual speed
first).
c) Compare the answers to (a) and (b): why is mean speed not equal to the inverse of
mean running time?
7.5.
Here are samples of temperature measurements at three cities (on random days). They have
been ordered for your convenience.
Albuquerque, NM
28 28 30 34 36 42 48 48 52 55 56 65 66 69 70 74 76 77 80 83
San Francisco, CA
46 47 48 51 52 52 53 55 57 58 58 58 59 60 60 61 61 62 62 64
Anchorage, AK
9 15 24 26 28 30 32 32 33 33 34 38 41 48 52 54 55 58 59 63
Plot the histograms overlaid on one another (using different colors, maybe) or directly
above one another with a common scale. Also, calculate the means and standard deviations. Based on these numbers and graphs, compare the climates of these 3 cities.
7.6.
The following histogram was obtained for the distribution of 113 final grades in a math
course.
[Figure: histogram of the final grades, x-axis from 20 to 100]
Chapter 8
Statistical inference
8.1
Introduction
8.1.1
Unbiased Estimation
What are the properties of desirable estimators? We would like the sampling distribution
of $\hat\theta$ to have a mean equal to the parameter estimated. An estimator possessing this
property is said to be unbiased.
Definition 8.1.
A statistic $\hat\theta$ is said to be an unbiased estimator of the parameter $\theta$ if
$$E(\hat\theta) = \theta.$$
The unbiased estimators are correct on average, while actual samples yield results
higher or lower than the true value of the parameter.
On the other hand, biased estimators would consistently overestimate or underestimate
the target parameter.
Example 8.1.
We have seen (p. 117) that $E(\bar X) = \mu$, therefore $\bar X$ is an unbiased estimate of $\mu$.
Example 8.2.
One reason that the sample variance $S^2 = \sum (X_i - \bar X)^2/(n - 1)$ is divided by n − 1 (instead
of n) is the unbiasedness property. Indeed, it can be shown that $E(S^2) = \sigma^2$. However,
$E(S) \ne \sigma$.
8.2
Confidence intervals
(8.1)
Proof. Central Limit Theorem (CLT) claims that, regardless of the initial distribution,
the sample mean $\bar X = (X_1 + \ldots + X_n)/n$ will be approximately Normal:
$$\bar X \approx \text{Normal}(\mu, \sigma^2/n)$$
for n reasonably large (usually $n \ge 30$ is considered enough).
Suppose that a confidence level $C = 100\%(1 - \alpha)$ is given. Then, find $z_{\alpha/2}$ such that
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha, \quad Z \text{ is a standard Normal RV}$$
Due to the symmetry of Z-distribution, we need to find the z-value with the upper tail
probability $\alpha/2$. That is, table area $TA(z_{\alpha/2}) = 0.5 - \alpha/2$.
1
On a technical note, a 95% confidence interval does not mean that there is a 95% probability that the
interval contains the true mean. The interval computed from a given sample either contains the true mean
or it does not. Instead, the level of confidence is associated with the method of calculating the interval.
For example, for a 95% confidence interval, if many samples are collected and a confidence interval is
computed for each, in the long run about 95% of these intervals would contain the true mean.
2
In this case, will also be an unbiased estimate of .
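The standardized sample mean is $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}}$, therefore
$$P\left(-z_{\alpha/2} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$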
2.5 2.8 4.8 4.4 2.9 4.0 3.6 5.2 2.8 3.0 3.3 4.8 5.6
Compute the 95% confidence interval for the mean drying time. Assume that $\sigma = 1$.
Solution. We compute $\bar X = 3.79$ and $z_{\alpha/2} = 1.96$
($\alpha = 0.05$, upper-tail probability = 0.025, table area = 0.5 − 0.025 = 0.475).
Then, using (8.1), the 95% C.I. for the mean is
Example 8.5.
An important property of plastic clays is the amount of shrinkage on drying. For a certain
type of plastic clay 45 test specimens showed an average shrinkage percentage of 18.4 and
a standard deviation of 1.2. Estimate the true average shrinkage for clays of this type
with a 95% confidence interval.
Solution. For these data, a point estimate of $\mu$ is $\bar X = 18.4$. The sample standard deviation
is S = 1.2. Since n is fairly large, we can replace $\sigma$ by S.
Hence, the 95% confidence interval for $\mu$ is
$$18.4 - 1.96\,\frac{1.2}{\sqrt{45}} < \mu < 18.4 + 1.96\,\frac{1.2}{\sqrt{45}}, \quad \text{that is, } (18.05, 18.75)$$
Thus we are 95% confident that the true mean lies between 18.05 and 18.75.
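The interval of Example 8.5 can be reproduced with a short Python function. This is a sketch under the same large-sample assumption (the sample standard deviation stands in for $\sigma$); it uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the Normal table, and the function name is an invention for this illustration.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sd, n, level=0.95):
    """Large-sample interval xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper-tail probability alpha/2
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin


lower, upper = z_confidence_interval(18.4, 1.2, 45)
print(round(lower, 2), round(upper, 2))   # prints 18.05 18.75
```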
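$$m = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad (8.2)$$
Solving for n gives
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{m}\right)^2$$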
Example 8.6.
We would like to estimate the pH of a certain type of soil to within 0.1, with 99% confidence. From past experience, we know that the soils of this type usually have pH in the
5 to 7 range. Find the sample size necessary to achieve our goal.
Solution. Let us take the reported 5 to 7 range as the $\pm 2\sigma$ range. This way, the crude
estimate of $\sigma$ is (7 − 5)/4 = 0.5. For 99% confidence, we find the upper tail area $\alpha/2 =
(1 - 0.99)/2 = 0.005$, thus $z_{\alpha/2} = 2.576$, and $n = (2.576 \times 0.5/0.1)^2 \approx 166$.
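The sample size calculation of Example 8.6 can be sketched in Python as follows. The function name is hypothetical, and the result is rounded up because a fractional observation cannot be collected; `statistics.NormalDist` (Python 3.8 or later) replaces the table lookup.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, sigma, level):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) does not exceed the margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z * sigma / margin) ** 2)


print(required_sample_size(0.1, 0.5, 0.99))   # prints 166
```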
Exercises
8.1.
We toss a coin n times, let X be the total number of Heads observed. Let the probability
of a Head in a single toss be p. Is $\hat p = X/n$ an unbiased estimate of p? Explain your
reasoning.
8.2.
In a school district, they would like to estimate the average reading rate of first-graders.
After selecting a random sample of n = 65 readers, they obtained sample mean of 53.4
words per minute (wpm), and standard deviation of 33.9 wpm. Calculate a 98% confidence interval for the average reading rate of all first-graders in the district.
8.3.
A random sample of 200 calls initiated while driving had a mean duration of 3.5 minutes
with standard deviation 2.2 minutes. Find a 99% confidence interval for the mean duration
of telephone calls initiated while driving.
8.4.
a) Bursting strength of a certain brand of paper is supposed to have Normal distribution
with $\mu = 150$ kPa and $\sigma = 15$ kPa. Give an interval that contains about 95% of all
bursting strength values.
b) Assuming now that the true $\mu$ and $\sigma$ are unknown, the researchers collected a sample
of n = 100 paper bags and measured their bursting strength. They obtained $\bar X =
148.4$ kPa and S = 18.9 kPa. Calculate the 95% C.I. for the mean bursting strength.
c) Sketch a Normal density curve with $\mu = 150$, $\sigma = 15$, with both of your intervals
shown on the x-axis. Compare the intervals' widths.
8.5.
In determining the mean viscosity of a new type of motor oil, the lab needs to collect
enough observations to approximate the mean within 0.2 SAE grade, with 96% confidence. The standard deviation typical for this type of measurement is 0.4. How many
samples of motor oil should the lab test?
8.6.
The times to react to a pistol start were measured for a sample of 100 experienced swimmers, yielding a mean of 0.214 sec and standard deviation 0.036 sec. Find a 95% confidence
interval for the average reaction time for the population of all experienced swimmers.
8.7.
A new petroleum extraction method was tested on 60 wells. The average improvement
in total extraction was 18.3%. Assuming that standard deviation of the improvement
was $\sigma = 10.3\%$, find the 96% CI for the true (i.e. all possible future wells) average
improvement in total extraction by the new method.
8.8.
a) Show that, for n = 2, $S^2$ is an unbiased estimate of $\sigma^2$, that is, $E(S^2) = \sigma^2$ [Hint:
use the fact that $E(X_i^2) = (E X_i)^2 + Var(X_i)$]
b)?
8.3
Statistical hypotheses
Definition 8.2.
A statistical hypothesis is an assertion or conjecture concerning one or more
population parameters.
The goal of a statistical hypothesis test is to make a decision about an unknown
parameter (or parameters). This decision is usually expressed in terms of rejecting or
accepting a certain value of parameter or parameters.
Some common situations to consider:
In making the decision, we will compare the statement (say, p = 1/2) with the available
data and will reject the claim p = 1/2 if it contradicts the data. In the subsequent sections
we will learn how to set up and test the hypotheses in various situations.
Null and alternative hypotheses
A statement like p = 1/2 is called the Null hypothesis (denoted by $H_0$). It expresses
the idea that the parameter (or a function of parameters) is equal to some fixed value.
For the coin example, it is
$H_0: p = 1/2$
and for the drug example it is
$H_0: \mu_1 = \mu_2$
where $\mu_1$ is the mean effectiveness of the old drug compared to $\mu_2$ for the new one.
Alternative hypothesis (denoted by $H_A$) seeks to disprove the null. For example, we
may consider two-sided alternatives
$H_A: p \ne 1/2$
8.3.1
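d) Test statistic: $z = \dfrac{\sqrt{n}(\bar X - \mu_0)}{\sigma}$
e) Decision: reject $H_0$ if
$|z| > z_{\alpha/2}$ for two-tailed
$z > z_{\alpha}$ for right-tailed
$z < -z_{\alpha}$ for left-tailed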
A null hypothesis $H_0$ for the population mean $\mu$ is a statement that designates the
value $\mu_0$ for the population mean to be tested. It is associated with an alternative
hypothesis $H_A$, which is a statement incompatible with the null. A two-sided (or
two-tailed) hypothesis setup is
$H_0: \mu = \mu_0$ versus $H_A: \mu \ne \mu_0$
for a specified value of $\mu_0$, and a one-sided (or one-tailed) hypothesis setup is either
$H_0: \mu = \mu_0$ versus $H_A: \mu > \mu_0$
(right-tailed test)
or
$H_0: \mu = \mu_0$ versus $H_A: \mu < \mu_0$
(left-tailed test)
Calculation of P-values
For the two-tailed alternative hypothesis, P-value $= 2P(Z > |z|)$.
For the right-tailed hypothesis, $H_A: \mu > \mu_0$, P-value $= P(Z > z)$.
For the left-tailed hypothesis, $H_A: \mu < \mu_0$, P-value $= P(Z < z)$.
[Figure: P-value areas under the standard Normal density; panels labeled $\mu < \mu_0$ and $\mu > \mu_0$]
is true, but nevertheless will be rejected by our test (known as Type I error, or proportion
of false positives). Decreasing $\alpha$ decreases the proportion of false positives, but also makes
it harder to reject $H_0$. Usually p-value < 0.01 is considered strong evidence against $H_0$,
and p-value around 0.10 as weak evidence. Which $\alpha$ to use as the threshold for rejecting
$H_0$ may depend on how important it is to avoid false positives. Many people think that
$\alpha = 0.05$ provides a good practical choice.
Example 8.7.
A manufacturer of sports equipment has developed a new synthetic fishing line that he
claims has a mean breaking strength of 8.0 kg with a standard deviation of 0.5 kg. A
random sample of 50 lines is tested and found to have a mean breaking strength of 7.80
kg. Test the hypothesis that $\mu = 8$ against the alternative that $\mu \ne 8$. Use $\alpha = 0.01$ level
of significance.
Solution.
a) $H_0: \mu = 8$
b) $H_A: \mu \ne 8$
c) $\alpha = 0.01$ and hence critical value $z_{\alpha/2} = 2.57$
d) Test statistic:
$$z = \frac{\sqrt{n}(\bar X - \mu_0)}{\sigma} = \frac{\sqrt{50}(7.8 - 8)}{0.5} = -2.83$$
$$z = \frac{\sqrt{n}(\bar X - \mu_0)}{\sigma} = \frac{\sqrt{100}(71.8 - 70)}{8.9} = 2.02$$
e) Decision: Reject $H_0$ if z > 1.645; since 2.02 > 1.645, we reject $H_0$.
f) Conclusion: We conclude that the mean life span today is greater than 70 years.
Decision based on P-value:
Since the test in this example is one-sided, the desired p-value is the area to the right of
z = 2.02. Using Normal Table, we have
P-value $= P(Z > 2.02) = 0.5 - 0.4783 = 0.0217$.
Conclusion: Reject H0 .
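The P-value calculations above can be checked numerically. The following is a minimal sketch, not part of the original notes: it uses the normal CDF from Python's standard library in place of the Normal Table, and the helper name `z_test` is hypothetical. The numbers are those of the right-tailed example above (n = 100, sample mean 71.8, $\mu_0 = 70$, $\sigma = 8.9$).

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, tail="two"):
    """Return (z, p_value) for a one-sample Z test with known sigma.

    tail is "two", "right" or "left", matching the alternative hypothesis.
    """
    z = sqrt(n) * (xbar - mu0) / sigma
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        p = 1 - std_normal.cdf(z)             # P(Z > z)
    elif tail == "left":
        p = std_normal.cdf(z)                 # P(Z < z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))  # 2 P(Z > |z|)
    return z, p


z, p = z_test(xbar=71.8, mu0=70, sigma=8.9, n=100, tail="right")
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

The software value may differ from the table-based one in the last digit, because the table requires rounding z to two decimals first.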
Example 8.9.
The nominal output voltage for a certain electrical circuit is 130V. A random sample of
40 independent readings on the voltage for this circuit gave a sample mean of 128.6V and
a standard deviation of 2.1V. Test the hypothesis that the average output voltage is 130
against the alternative that it is less than 130. Use a 5% significance level.
Solution.
a) $H_0: \mu = 130$
b) $H_A: \mu < 130$
c) $\alpha = 0.05$ and $z_\alpha = 1.645$
d) Test statistic:
\[ z = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sigma} = \frac{\sqrt{40}\,(128.6 - 130)}{2.1} = -4.22 \]
e) Decision: Reject $H_0$ since $-4.22 < -1.645$.
f) Conclusion: We conclude that the average output voltage is less than 130.
Decision based on p-value:
P-value = $P(Z < -4.22) < 0.5 - 0.4990 = 0.001$.
As a result, the evidence in favor of HA is even stronger than that suggested by the 0.05
level of significance. (P-value is very small!)
Exercises
8.9.
It is known that the average height of US adult males is about 173 cm, with standard
deviation of about 6 cm. Assume that the heights follow Normal distribution.
Referring to Exercise 7.2, the average height of 20 last US presidents was 181.9 cm.
Are the presidents taller than the average? Test at the level $\alpha = 0.05$ and also compute
the p-value.
8.10.
In an industrial process, nanotubes should have the average diameter of 5 angstrom. The
typical variance for the nanotubes obtained in this process is 0.2 angstrom.
The sample of 50 nanotubes was studied with the observed average diameter of 5.12
angstrom. Is there evidence that the process average is different from 5 angstrom?
8.11.
A biologist knows that, under normal conditions, the average length of a leaf of a certain
full-grown plant is 4 inches, with the standard deviation of 0.6 inches. A sample of 45
leaves from the plants that were given a new type of plant food had an average length of
4.2 inches. Is there reason to believe that the new plant food is responsible for a change
in the average growth of leaves? Use $\alpha = 0.02$.
8.12.
In the situation of Exercise 8.7, test $H_0: \mu = 0$ against $H_A: \mu > 0$. Test at the level
$\alpha = 0.04$ and also compute the p-value.
8.13.
Is it more difficult to reject H0 when the significance level is smaller? Suppose that the
p-value for a test was 0.023. Would you reject $H_0$ at the level $\alpha = 0.05$? At $\alpha = 0.01$?
8.14.
It is well known that the normal human temperature is 98.6°. If a sample of 75 healthy
adults is collected and the sample mean was 98.3°, can we claim that 98.6° is a plausible
value for the mean temperature of all adults? Assume that $\sigma = 0.8$°. Make your decision
based on the p-value.
8.4
8.4.1 Confidence intervals
Frequently, we are attempting to estimate the mean of a population when the variance is
unknown. Suppose that we have a random sample from a normal distribution. Then the
random variable
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
is said to have a (Student) T-distribution with $n - 1$ degrees of freedom. Here, $S$ is the
sample standard deviation.
With $\sigma$ unknown, $T$ should be used instead of $Z$ to construct a confidence interval for
$\mu$. The procedure is the same as for known $\sigma$, except that $\sigma$ is replaced by $S$ and the standard
normal distribution is replaced by the T-distribution.
The T-distribution is also symmetric, but has somewhat heavier tails than $Z$. This is
because of the extra uncertainty of not knowing $\sigma$.
Definition 8.4. CI for mean, $\sigma$ unknown
If $\bar{X}$ and $S$ are the mean and standard deviation of a random sample from a
normal population with unknown variance $\sigma^2$, a $(1 - \alpha)100\%$ confidence interval
for $\mu$ is
\[ \bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}, \]
where $t_{\alpha/2}$ is the t-value with $n - 1$ degrees of freedom leaving an area of $\alpha/2$ to
the right. (See Table B.)
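As a quick illustration of the definition, the interval can be computed in a few lines. This is only a sketch: the function name `t_confidence_interval` and the data summary are made up for the example, and the critical value is read from the t-table rather than computed.

```python
from math import sqrt


def t_confidence_interval(xbar, s, n, t_crit):
    """CI for the mean when sigma is unknown: xbar -/+ t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, read from a t-table.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Made-up illustration: n = 16 observations, sample mean 10.0, S = 2.0.
# For a 95% interval, t_{0.025} with df = 15 is 2.131 (t-table).
low, high = t_confidence_interval(xbar=10.0, s=2.0, n=16, t_crit=2.131)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

Passing the table value in as an argument keeps the sketch free of third-party libraries; statistical software would look up the quantile itself.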
[Figure: density f(x) of the T-distribution for df = 1, df = 4 and df = 10, compared with the Z distribution]
8.4.2 Hypothesis test
When sample sizes are small and population variance is unknown, use the test statistic
\[ t = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{S}, \]
with $n - 1$ degrees of freedom.
Steps of a Hypothesis Test
a) Null Hypothesis $H_0: \mu = \mu_0$
b) Alternative Hypothesis $H_A: \mu \neq \mu_0$, or $H_A: \mu > \mu_0$, or $H_A: \mu < \mu_0$.
c) Critical value: $t_{\alpha/2}$ for two-tailed or $t_\alpha$ for one-tailed test.
d) Test Statistic $t = \dfrac{\sqrt{n}\,(\bar{X} - \mu_0)}{S}$ with $n - 1$ degrees of freedom
e) Decision Rule: Reject $H_0$ if
   $|t| > t_{\alpha/2}$ for two-tailed
   $t > t_\alpha$ for right-tailed
   $t < -t_\alpha$ for left-tailed
\[ t = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{S} = \frac{\sqrt{25}\,(88.3 - 85.0)}{7.49} = 2.203 \]
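The test statistic and the decision rule can be sketched in code as follows. The helper names are hypothetical. The statistic reuses the numbers shown above (n = 25, sample mean 88.3, $\mu_0 = 85.0$, S = 7.49); the choice of a two-tailed test at $\alpha = 0.05$ is an assumption made only for this illustration, since the direction and level of the test are not visible here.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic sqrt(n) * (xbar - mu0) / s, with df = n - 1."""
    return sqrt(n) * (xbar - mu0) / s


def reject_h0(t, t_crit, tail):
    """Apply the decision rule; t_crit is t_{alpha/2} (two-tailed) or t_alpha."""
    if tail == "two":
        return abs(t) > t_crit
    if tail == "right":
        return t > t_crit
    if tail == "left":
        return t < -t_crit
    raise ValueError("tail must be 'two', 'right' or 'left'")


t = t_statistic(xbar=88.3, mu0=85.0, s=7.49, n=25)
# df = 24; a two-tailed test at alpha = 0.05 uses t_{0.025} = 2.064 (t-table)
print(f"t = {t:.3f}, reject H0: {reject_h0(t, 2.064, 'two')}")
```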
t-table: critical values by upper-tail probability
df      0.10    0.05    0.025    0.01     0.005    0.001     0.0005
1       3.078   6.314   12.706   31.821   63.657   318.309   636.619
2       1.886   2.920   4.303    6.965    9.925    22.327    31.599
3       1.638   2.353   3.182    4.541    5.841    10.215    12.924
4       1.533   2.132   2.776    3.747    4.604    7.173     8.610
5       1.476   2.015   2.571    3.365    4.032    5.893     6.869
6       1.440   1.943   2.447    3.143    3.707    5.208     5.959
7       1.415   1.895   2.365    2.998    3.499    4.785     5.408
8       1.397   1.860   2.306    2.896    3.355    4.501     5.041
9       1.383   1.833   2.262    2.821    3.250    4.297     4.781
10      1.372   1.812   2.228    2.764    3.169    4.144     4.587
11      1.363   1.796   2.201    2.718    3.106    4.025     4.437
12      1.356   1.782   2.179    2.681    3.055    3.930     4.318
13      1.350   1.771   2.160    2.650    3.012    3.852     4.221
14      1.345   1.761   2.145    2.624    2.977    3.787     4.140
15      1.341   1.753   2.131    2.602    2.947    3.733     4.073
16      1.337   1.746   2.120    2.583    2.921    3.686     4.015
17      1.333   1.740   2.110    2.567    2.898    3.646     3.965
18      1.330   1.734   2.101    2.552    2.878    3.610     3.922
19      1.328   1.729   2.093    2.539    2.861    3.579     3.883
20      1.325   1.725   2.086    2.528    2.845    3.552     3.850
21      1.323   1.721   2.080    2.518    2.831    3.527     3.819
22      1.321   1.717   2.074    2.508    2.819    3.505     3.792
23      1.319   1.714   2.069    2.500    2.807    3.485     3.768
24      1.318   1.711   2.064    2.492    2.797    3.467     3.745
25      1.316   1.708   2.060    2.485    2.787    3.450     3.725
30      1.310   1.697   2.042    2.457    2.750    3.385     3.646
40      1.303   1.684   2.021    2.423    2.704    3.307     3.551
60      1.296   1.671   2.000    2.390    2.660    3.232     3.460
120     1.289   1.658   1.980    2.358    2.617    3.160     3.373
inf     1.282   1.645   1.960    2.326    2.576    3.090     3.291
\[ t = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{S} = \frac{\sqrt{20}\,(34.271 - 35.0)}{2.915} = -1.119 \]
8.4.3
8.4.4 Exercises
8.15.
In determining the gas mileage of a new model of hybrid car, the independent research
company collected information from 14 randomly selected drivers. They obtained the
sample mean of 38.4 mpg, with the standard deviation of 5.2 mpg. Obtain a 99% C.I. for $\mu$.
What is the meaning of $\mu$ in this problem? What assumptions are necessary for your C.I.
to be correct?
8.16.
This problem is based on the well-known Newcomb data set for the speed of light. It
contains measurements of the time (in nanoseconds) it took the light to bounce inside a network
of mirrors. The numbers given are the time recorded minus 24,800 ns. We will only use
the first ten values.
28   26   33   24   34   -44   27   16   40   -2
Some mishaps in the experimental procedure led to the two unusually low values (-44
and -2). Calculate the 95% C.I.s for the mean in case when
a) all the values are used
b) the two outliers are removed
Which of the intervals will you trust more and why?
8.17.
The following data were collected for salinity of water from a sample of municipal sources
(in parts per thousand)
0.5   0.5   0.6   0.6   0.8   0.8   0.8   0.9   1.0   1.1   1.3
Find a 98% confidence interval for the average salinity in all municipal sources in the
sampling area.
8.18.
A job placement director claims that mean starting salary for nurses is $25 per hour. A
random sample of 10 nurses' salaries has a mean of $21.6 and a standard deviation of $4.7
per hour. Is there enough evidence to reject the director's claim at $\alpha = 0.01$?
Repeat the exercise for the following results: sample mean $21.6 and a standard deviation of $0.47. What is your answer now? What can you conclude about the role of
noise (standard deviation) in statistical testing?
8.19.
Refer to Exercise 8.14. If, now, a sample of only 25 adults is collected and the sample
mean was still 98.3°, with sample standard deviation $S = 0.8$°, what will your conclusion
be? Comparing to the answer to Exercise 8.14, what can you conclude about the role of
sample size in statistical testing?
8.20.
Suppose that 95% CI for the mean of a large sample was computed and equaled [10.15, 10.83].
What will be your decision about the hypothesis $H_0: \mu = 10$ vs $H_A: \mu \neq 10$ at 5% level
of significance? At 10% level? At 1% level?
8.21.
For the situation in Example 8.7 (fishing line strength), test the hypotheses using the C.I.
approach.
8.5
Two-sample problems:
The goal of inference is to compare the response in two groups.
Each group is considered to be a sample from a distinct population.
The responses in each group are independent of those in the other group.
Suppose that we have two independent samples, from two distinct populations. Here
is the notation that we will use to describe the two populations:
population   Variable   Mean      Standard deviation
1            $X_1$      $\mu_1$   $\sigma_1$
2            $X_2$      $\mu_2$   $\sigma_2$
We want to compare the two population means, either by giving a confidence interval for
$\mu_1 - \mu_2$ or by testing the hypothesis of difference, $H_0: \mu_1 = \mu_2$. Inference is based on two
independent random samples. Here is the notation that describes the samples:
sample   sample size   sample mean   sample st.dev.
1        $n_1$         $\bar{X}_1$   $S_1$
2        $n_2$         $\bar{X}_2$   $S_2$
If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, with
means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of the
difference of the means, $\bar{X}_1 - \bar{X}_2$, is normal with mean $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
and variance $\sigma_D^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$. Then, the two-sample Z statistic
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_D} \qquad (8.3) \]
for two-tailed
for right-tailed
for left-tailed
        Breast-fed   Formula
n       23           19
mean    13.3         12.4
s       1.7          1.8
(a) Is there significant evidence that the mean hemoglobin level is higher among breast-fed
babies?
(b) Give a 95% confidence interval for the mean difference in hemoglobin level between
the two populations of infants.
Solution. (a) $H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 > 0$, where $\mu_1$ is the mean of the Breast-fed
population and $\mu_2$ is the mean of the Formula population. The test statistic is
\[ t = \frac{13.3 - 12.4}{\sqrt{\dfrac{1.7^2}{23} + \dfrac{1.8^2}{19}}} = \frac{0.9}{0.544} = 1.654 \]
with 18 degrees of freedom. The p-value is $P(T > 1.654) = 0.058$ using software. Using
Table B, we see that 1.654 is between table values 1.330 and 1.734, which gives upper-tail
probability between 0.05 and 0.10. This is not quite significant at the 5% level.
(b) The 95% confidence interval is
\[ 0.9 \pm 2.101(0.544) = 0.9 \pm 1.1429 = (-0.2429,\ 2.0429) \]
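The two-sample computation above can be reproduced with a short script. This is a minimal sketch with a hypothetical helper name; the critical value 2.101 ($t_{0.025}$, df = 18) is taken from the t-table as in the solution.

```python
from math import sqrt


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Return (difference, standard_error, t) for H0: mu1 - mu2 = 0."""
    diff = xbar1 - xbar2
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return diff, se, diff / se


# Breast-fed: mean 13.3, s 1.7, n 23.  Formula: mean 12.4, s 1.8, n 19.
diff, se, t = two_sample_t(13.3, 1.7, 23, 12.4, 1.8, 19)
t_crit = 2.101  # t_{0.025} with df = 18, from the t-table
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"SE = {se:.3f}, t = {t:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Small differences in the last digits against the hand calculation come from rounding the standard error to 0.544 before multiplying.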
Standard Error
All previous formulas involving t-distribution have a common structure. For example,
(8.3) can be re-written as
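\[ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\, SE_{\bar{X}_1 - \bar{X}_2}, \]
where the quantity $SE_{\bar{X}_1 - \bar{X}_2} = \sqrt{S_1^2/n_1 + S_2^2/n_2}$ is called the Standard Error. Likewise,
the one-sample confidence interval for the mean is
\[ \bar{X} \pm t_{\alpha/2}\, SE_{\bar{X}}, \]
where $SE_{\bar{X}} = S/\sqrt{n}$.
Likewise, the formulas for the t-statistic are
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}} \ \text{for 2-sample, and} \quad t = \frac{\bar{X} - \mu_0}{SE_{\bar{X}}} \ \text{for 1-sample situation.} \]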
We will see a lot of similar structure in the CI and hypothesis testing formulas in the
future. The value of standard error is often reported by the software when you request
CIs or hypothesis tests.
8.5.1 Matched pairs
Sometimes, we are comparing data that come in pairs of matched observations. A good
example of this are before and after studies. They present the measurement of some
quantity for the same set of subjects before and after a certain treatment has been administered. Another example of this situation is twin studies for which pairs of identical twins
are selected and one twin (at random) is given a treatment, while the other is serving as
a control (that is, does not receive any treatment, or maybe receives a fake treatment,
placebo, to eliminate psychological effects).
When the same (or somehow related) subjects are used, we should not consider the
measurements independent. This is the Matched Pairs design. In this case, we would compute Difference = Before - After or Treatment - Control and just do a one-sample
test for the mean difference.
Example 8.16.
The following are the left hippocampus volumes (in cm^3) for a group of twin pairs,
one is affected by schizophrenia, and the other is not.
Pair number   Unaffected   Affected   Difference
1             1.94         1.27        0.67
2             1.44         1.63       -0.19
3             1.56         1.47        0.09
4             1.58         1.39        0.19
5             2.06         1.93        0.13
6             1.66         1.26        0.40
7             1.75         1.71        0.04
8             1.77         1.67        0.10
9             1.78         1.28        0.50
10            1.92         1.85        0.07
11            1.25         1.02        0.23
12            1.93         1.34        0.59
Is there evidence that the LH volumes for schizophrenia-affected people are different
from the unaffected ones?
Solution. Since the twins' LH volumes are clearly not independent (if one is large the other
is likely to be large, too: positive correlation!), we cannot use the 2-sample procedure.
However, we can just compute the differences (Unaffected - Affected) and test for the
mean difference to be equal to 0. That is,
\[ H_0: \mu = 0 \quad \text{versus} \quad H_A: \mu \neq 0 \]
where $\mu$ is the true average difference, and $\bar{X}$, $S$ are computed for the sample of differences.
Given that $\bar{X} = 0.235$ and $S = 0.254$, let's test these hypotheses at $\alpha = 0.10$. We
obtain $t = (0.235 - 0)/(0.254/\sqrt{12}) = 3.20$. From the t-table with df = 11 we get a p-value
between 2(0.001) = 0.002 and 2(0.005) = 0.01. Since the p-value is below $\alpha = 0.10$ (and also below 0.05), we reject $H_0$, thus stating
that there is a significant difference between LH volumes of normal and schizophrenic
people.
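The matched-pairs computation reduces to a one-sample t statistic on the differences, which is easy to verify. The sketch below uses only the Python standard library; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Left hippocampus volumes for the 12 twin pairs of the example
unaffected = [1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77, 1.78, 1.92, 1.25, 1.93]
affected = [1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67, 1.28, 1.85, 1.02, 1.34]

# Work with the paired differences only (Unaffected - Affected)
diffs = [u - a for u, a in zip(unaffected, affected)]
n = len(diffs)
d_bar = mean(diffs)   # sample mean of the differences
s_d = stdev(diffs)    # sample standard deviation (divides by n - 1)

# One-sample t statistic for H0: mean difference = 0, with df = n - 1
t = (d_bar - 0) / (s_d / sqrt(n))
print(f"mean = {d_bar:.3f}, S = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
```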
Exercises
More exercises for this section are located at the end of this Chapter.
8.22.
In studying how humans pick random objects, the subjects were presented a population
of rectangles and have used two different sampling methods. They then calculated the
average areas of the sampled rectangles for each method. Their results were
           mean   st.dev.   n
Method 1   10.8   4.0       16
Method 2   6.1    2.3       16
Calculate the 99% C.I. for the difference of true means by the two methods. Is there
evidence that the two methods produce different results?
8.23.
The sports research lab studies the effects of swimming on maximal volume of oxygen
uptake.
For 8 volunteers, the maximal oxygen uptake was measured before and after the 6-week
swimming program. The results are as follows:
Before   2.1   3.3   2.0   1.9   3.5   2.2   3.1   2.4
After    2.7   3.5   2.8   2.3   3.2   2.1   3.6   2.9
Is there evidence that the swimming program has increased the maximal oxygen uptake?
8.24.
Visitors to an electronics website rated their satisfaction with two models of printers/scanners,
on the scale of 1 to 5. The following statistics were obtained:
          n    mean   st.dev.
Model A   31   3.6    1.5
Model B   65   4.2    0.9
At the level of 5%, test the hypothesis that both printers would have the same average
rating in the general population, that is, $H_0: \mu_A = \mu_B$. Also, calculate the 95% confidence
interval for the mean difference $\mu_A - \mu_B$.
8.6
8.6.1
In this Chapter, we will consider estimating the proportion p of items of certain type,
or maybe some probability p. The unknown population proportion p is estimated by the
sample proportion
\[ \hat{p} = \frac{X}{n}. \]
We know (from CLT, Section 6.4) that if the sample size is sufficiently large, $\hat{p}$ has an approximately normal distribution, with mean $E(\hat{p}) = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
Based on this, we obtain the confidence intervals and hypothesis tests for proportion.
Theorem 8.2. CI for proportion
For a random sample of size n from a large population with unknown proportion p
of successes, the $(1 - \alpha)100\%$ confidence interval for p is
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} \]
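A small sketch of the interval in Theorem 8.2 follows. The function name `proportion_ci` and the counts (40 successes out of 100) are made up for illustration; the critical value $z_{\alpha/2}$ is obtained from the standard library's inverse normal CDF instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample CI for a proportion: p_hat -/+ z_{alpha/2} * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, e.g. about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Made-up data: 40 successes in a sample of 100
low, high = proportion_ci(successes=40, n=100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The interval relies on the normal approximation, so it should only be trusted when n is reasonably large.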
8.6.2
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \]
z
/2
2
= 0.09
2.33
0.01
2
= 4883
8.6.3
We will call the two groups being compared Population 1 and Population 2, with population proportions of successes $p_1$ and $p_2$. Here is the notation we will use in this section:
Population   pop. prop.   sample size   # successes   sample prop.
1            $p_1$        $n_1$         $X_1$         $\hat{p}_1 = X_1/n_1$
2            $p_2$        $n_2$         $X_2$         $\hat{p}_2 = X_2/n_2$
To compare the two proportions, we use the difference between the two sample proportions:
$\hat{p}_1 - \hat{p}_2$. Therefore, when $n_1$ and $n_2$ are large, $\hat{p}_1 - \hat{p}_2$ is approximately normal with mean
$\mu = p_1 - p_2$ and standard deviation
\[ \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}. \]
Since we do not know $p_1$ and $p_2$, we replace them by $\hat{p}_1$ and $\hat{p}_2$, respectively.
\[ \hat{p}_1 = \frac{56}{80} = 0.7 \quad \text{and} \quad \hat{p}_2 = \frac{38}{80} = 0.475 \quad \text{and} \quad \hat{p} = \frac{56 + 38}{80 + 80} = 0.5875 \]
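The pooled proportion above is what one would use in a two-proportion Z test of $H_0: p_1 = p_2$. The sketch below assumes the usual pooled standard error $\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}$; the test formula itself is not visible in this part of the notes, so treat that choice, and the helper name, as assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z test of H0: p1 = p2 (two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1, p2, pooled, z, p_two_sided


p1, p2, pooled, z, p_value = two_proportion_z(56, 80, 38, 80)
print(f"p1 = {p1}, p2 = {p2}, pooled = {pooled}, z = {z:.2f}, p-value = {p_value:.4f}")
```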
Exercises
8.25.
A nutritionist claims that at least 75% of the preschool children in a certain country have
protein deficient diets. A sample survey reveals that 206 preschool children in a sample
of 300 have protein deficient diets. Test the claim at the 0.02 level of significance. Also,
compute a 98% confidence interval.
8.26.
In a survey of 200 office workers, 165 said they were interrupted three or more times an
hour by phone messages, faxes etc. Find and interpret a 90% confidence interval for the
population proportion of workers who are interrupted three or more times an hour.
8.27.
You would like to design a poll to determine what percent of your peers volunteer for
charities. You have no clear idea of what the value of p is going to be like, and you'll be
satisfied with the 90% margin of error equal to 10%. Find the sample size needed for
your study.
8.28.
In an opinion poll, out of a sample of 300 people, 182 were in support of Proposition Z. At
the level of 5%, test the hypothesis that more than half of population support Proposition
Z. Also, find the p-value.
8.29.
In random samples of 200 tractors from one assembly line and 400 tractors from another,
there were, respectively, 16 tractors and 20 tractors which required extensive adjustments
before they could be shipped. At the 5% level of significance, can we conclude that there
is a difference in the quality of the work of the two assembly lines?
8.30.
In a survey of customer satisfaction on amazon.com, 86 out of 120 customers of Supplier A
gave it 5 stars, and 75 out of 136 customers of Supplier B gave it 5 stars. Is there evidence
that customers are more satisfied with one supplier than the other? Also, compute a 95%
confidence interval for the difference of two proportions.
Chapter Exercises
For each of the questions involving hypothesis tests, state the null and alternative hypotheses, compute the test statistic, determine the p-value, make the decision and summarize
the results in plain English. Use $\alpha = 0.05$ unless otherwise specified.
8.31.
Two brands of batteries are tested and their voltages are compared. The summary statistics are below. Find and interpret a 95% confidence interval for the true difference in
means.
          mean   st.dev.   n
Brand 1   9.2    0.3       25
Brand 2   8.9    0.6       27
8.32.
You are studying yield of a new variety of tomato. In the past, yields of similar types of
tomato have shown a standard deviation of 8.5 lbs per plant. You would like to design
a study that will determine the average yield within a 90% error margin of 2 lbs. How
many plants should you sample?
8.33.
College Board claims that in 2010, public four-year colleges charged, on average, $7,605
per year in tuition and fees for in-state students. A sample of 20 public four-year colleges
collected in 2011 indicated a sample mean of $8,039 and the sample standard deviation
was $1,950. Is there sufficient evidence to conclude that the average in-state tuition has
increased?
8.34.
The weights of grapefruit follow a normal distribution. A random sample of 12 new hybrid
grapefruit had a mean weight of 1.7 pounds with standard deviation 0.24 pounds. Find a
95% confidence interval for the mean weight of the population of the new hybrid grapefruit.
8.35.
The Mountain View Credit Union claims that the average amount of money owed on their
car loans is $7,500. Suppose a random sample of 45 loans shows the average amount
owed equals $8,125, with standard deviation $4,930. Does this indicate that the average
amount owed on their car loans is not $7,500? Use a level of significance $\alpha = 0.01$. Would
your conclusion have changed if you used $\alpha = 0.05$?
8.36.
An overnight package delivery service has a promotional discount rate in effect this week
only. For several years the mean weight of a package delivered by this company has been
10.7 ounces. However, a random sample of 12 packages mailed this week gave the following
weights in ounces:
12.1   15.3   9.5   10.5   14.2   8.8   10.6   11.4   13.7   15.0   9.5   11.1
Use a 1% level of significance to test the claim that the packages are averaging more
than 10.7 ounces during the discount week.
8.37.
Some people claim that during US elections, the taller of the two major party candidates
tends to prevail. Here are some data on the last 15 elections (heights are in cm).
Year   Winning candidate   Losing candidate
2008   185                 175
2004   182                 193
2000   182                 185
1996   188                 187
1992   188                 188
1988   188                 173
1984   185                 180
1980   185                 177
1976   177                 183
1972   182                 185
1968   182                 180
1964   193                 180
1960   183                 182
1956   179                 178
1952   179                 178
Test the hypothesis that the winning candidates tend to be taller, on average.
8.38.
An item in USA Today reported that 63% of Americans owned a mobile browsing device.
A survey of 143 employees at a large school showed that 85 owned a mobile browsing
device. At $\alpha = 0.02$, test the claim that the percentage is the same as stated in USA
Today.
8.39.
A poll by CNN revealed that 47% of Americans approve of the job performance of the
President. The poll was based on a random sample of 537 adults.
a) Find the 95% margin of error for this poll.
b) Based on your result in part (a), test the hypothesis H0 : p = 0.5 where p is
the proportion of all American adults that approve of the job performance of the
President. Do not compute the test statistic and p-value.
c) Would you have also reached the same conclusion for H0 : p = 0.45?
8.40.
Find a poll cited in a newspaper, web site or other news source, with a mention of the
sample size and the margin of error. (For example, rasmussenreports.com frequently
discuss their polling methods.) Confirm the margin of error presented by the pollsters,
using your own calculations.
Chapter 9
Linear Regression
In science and engineering, there is often a need to investigate the relationship between
two continuous random variables.
Suppose that, for every case observed, we record two variables, X and Y . The linear
relationship between X and Y means that E (Y ) = b0 + b1 X.
X variable is usually called predictor or independent variable and Y variable is the
response or dependent variable; the parameters are slope b1 and intercept b0 .
Example 9.1.
Imagine that we are opening an ice cream stand and would like to be able to predict how
many customers we will have. We might use the temperature as a predictor. We decided
to collect data over a 30-week period from March to July.
Week   Mean temp   Consumption
1      41          0.386
2      56          0.374
3      63          0.393
4      68          0.425
5      69          0.406
6      65          0.344
7      61          0.327
8      47          0.288
9      32          0.269
10     24          0.256
11     28          0.286
12     26          0.298
13     32          0.329
14     40          0.318
15     55          0.381
16     63          0.381
17     72          0.47
18     72          0.443
19     67          0.386
20     60          0.342
21     44          0.319
22     40          0.307
23     32          0.284
24     27          0.326
25     28          0.309
26     33          0.359
27     41          0.376
28     52          0.416
29     64          0.437
30     71          0.548
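9.1 Correlation coefficient
\[ \rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \]
\[ r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \,\sum_{i=1}^{n}(Y_i - \bar{Y})^2}} = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}} \]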
You can recognize the summation on top as a discrete version of Cov(X, Y) and the sums
on the bottom as part of the computation for the sample variances of X, Y. These are
\[ SS_{XY} = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}, \]
\[ SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{n}, \qquad SS_Y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n} \]
All sums are taken from 1 to n.
For example, the sample variance of X is $S_X^2 = SS_X/(n - 1)$.
Let's review the properties of the correlation coefficient $\rho$ and its sample estimate, r:
• the sign of r points to positive (when X increases, Y increases too) or negative (when
  one increases, the other decreases) relationship
• $-1 \le r \le 1$, with +1 being a perfect positive and -1 a perfect negative relationship
• $r \approx 0$ means no linear relationship between X and Y (caution: there can still be a
  non-linear relationship!)
• r is dimensionless, and it does not change when X or Y are linearly transformed.
[Figure: scatterplot panels with vertical axes Y2, Y3 and Y4]
9.2
\[ Y_i = b_0 + b_1 X_i + \varepsilon_i, \quad i = 1, ..., n \]
\[ SSE = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \]
(SSE is for the Sum of Squared Errors; however, the quantities $Y_i - b_0 - b_1 X_i$ are usually
referred to as residuals.)
To find the minimum, we would calculate partial derivatives of SSE with respect to b0 , b1 .
Solving the resulting system of equations, we get the following
Theorem 9.1. Least squares estimates
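The estimates for the regression equation $Y_i = b_0 + b_1 X_i + \varepsilon_i$, $i = 1, ..., n$ are:
\[ \text{Slope } b_1 = \frac{SS_{XY}}{SS_X} = r\,\frac{S_Y}{S_X} \quad \text{and} \quad \text{Intercept } b_0 = \bar{Y} - b_1 \bar{X} \]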
Example 9.2.
To illustrate the computations, let's consider another data set. Here, X = amount of
tannin in the larva food, and Y = growth of insect larvae.
X    0    1    2    3    4    5    6    7    8
Y   12   10    8   11    6    7    2    3    3
Estimate the regression equation and correlation coefficient.
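Solution.
$\sum X = 36$, $\sum Y = 62$, $\sum X^2 = 204$, $\sum Y^2 = 536$, $\sum XY = 175$.
Therefore,
$\bar{X} = 36/9 = 4$, $\bar{Y} = 62/9 = 6.89$,
$SS_{XY} = 175 - (36)(62)/9 = -73$, $SS_X = 204 - 36^2/9 = 60$, $SS_Y = 536 - 62^2/9 = 108.89$,
and finally,
$b_1 = -73/60 = -1.22$, $b_0 = 6.89 - (-1.22)4 = 11.76$, $r = -73/\sqrt{(60)(108.89)} = -0.903$.
Thus, we get the equation
\[ \hat{Y} = 11.76 - 1.22X \]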
that is interpretable as a prediction for any given value X. In practice, the accuracy of
prediction depends on X, see the next Section.
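The hand computation for the tannin data can be double-checked with a few lines of code. This is only a sketch using the shortcut formulas for $SS_{XY}$, $SS_X$ and $SS_Y$; the variable names are chosen for this illustration.

```python
from math import sqrt

# Tannin data: X = amount of tannin, Y = growth of insect larvae
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [12, 10, 8, 11, 6, 7, 2, 3, 3]
n = len(x)

# Shortcut formulas for the sums of squares and cross-products
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n

b1 = ss_xy / ss_x                   # slope
b0 = sum(y) / n - b1 * sum(x) / n   # intercept: Y-bar minus b1 times X-bar
r = ss_xy / sqrt(ss_x * ss_y)       # sample correlation

print(f"SSXY = {ss_xy:.0f}, SSX = {ss_x:.0f}, SSY = {ss_y:.2f}")
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}, r = {r:.3f}")
```

The negative slope and correlation agree with the data: growth decreases as the amount of tannin increases.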
Example 9.3.
For the data in Example 9.1,
a) Calculate and plot the least squares regression line
b) Predict the consumption when X = 50°F.
Solution.
(a) For the hand calculation, $\bar{X} = 49.1$, $\bar{Y} = 0.3594$, $SS_X = 7820.7$, $SS_Y = 0.1255$ and
$SS_{XY} = 24.30$. We obtain the following estimates (can also be done by a computer)
$b_0 = 0.2069$, $b_1 = 0.003107$ and $r = 0.776$
These can be used to plot the regression line (Fig. 9.3) and make predictions. Can you
interpret the slope and the intercept for this problem in plain English?
[Figure 9.3: Least squares regression line for the ice cream example]
(b) $\hat{Y} = b_0 + b_1 X = 0.2069 + 0.003107(50) = 0.362$ pints per person.
9.3
The error variance $\sigma^2$ determines the amount of scatter of the Y-values about the line.
That is, it reflects the uncertainty of prediction of Y using X.
Its sample estimate is
\[ S^2 = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2}{n - 2} = \frac{(1 - r^2)\,SS_Y}{n - 2}, \]
where SSE is the Sum of Squared Errors (Residuals). It is divided by $n - 2$ because two
degrees of freedom have been used up when estimating $b_0$, $b_1$. The estimate of S can be
obtained by hand or using the computer output.
The values $\hat{Y}_i = b_0 + b_1 X_i$ are called predicted or fitted values of Y.
The differences
Actual - Predicted = $Y_i - \hat{Y}_i = e_i$, $i = 1, ..., n$
are called residuals.
The least squares estimates for slope and intercept can be viewed as sample estimates for
the true (unknown) slope and intercept. We can apply the same methods we have
used for, say, estimating the unknown mean $\mu$. To make confidence intervals and perform
hypothesis testing for the slope and intercept, we will need standard errors (that is, the
estimates of standard deviations) of their estimates.
$100\%(1 - \alpha)$ CIs for regression parameters are then found as
\[ \text{Estimate} \pm t_{\alpha/2}\,(\text{Std.Error}), \quad t \text{ has } df = n - 2 \]
The standard errors for slope and intercept can be obtained using the formulas
\[ SE_{b_1} = \frac{S}{S_X\sqrt{n - 1}}, \qquad SE_{b_0} = S\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n - 1)S_X^2}} \]
or using the computer output. Notice that the errors decrease at the familiar $\sqrt{n}$ rate, as
sample size n grows.
Example 9.4.
Continuing the analysis of data from Example 9.1, let's examine a portion of computer
output (done by R statistical package).
              Estimate   Std.Error   t-value   Pr(>|t|)
(Intercept)   0.2069     0.0247      8.375     4.13e-09
X             0.003107   0.000478    6.502     4.79e-07
We can calculate confidence intervals and hypothesis tests for the parameters $b_0$ and $b_1$.
The 95% C.I. for the slope $b_1$ is
\[ 0.003107 \pm 2.048(0.000478) = [0.002128,\ 0.004086] \]
To test the hypothesis $H_0: b_1 = 0$ we could use the test statistic
\[ t = \frac{\text{Estimate}}{\text{Std.Error}} \]
For the above data, we have t = 0.003107/0.000478 = 6.502, as reported in the table.
The p-values for this test can be found using a t-table; they are also reported by the
computer. Above, the reported p-value of 4.79e-07 is very small, meaning that the
hypothesis H0 : b1 = 0 is strongly rejected.
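The slope calculations from the output above can be reproduced directly. The sketch below takes the estimate and standard error from the table and the critical value $t_{0.025} = 2.048$ (df = 28) from the example; the helper name is hypothetical.

```python
def slope_inference(estimate, std_error, t_crit):
    """Return the t statistic for H0: slope = 0 and the CI estimate -/+ t_crit * SE."""
    t = estimate / std_error
    ci = (estimate - t_crit * std_error, estimate + t_crit * std_error)
    return t, ci


# Slope row of the regression output; df = n - 2 = 28 gives t_{0.025} = 2.048
t, ci = slope_inference(estimate=0.003107, std_error=0.000478, t_crit=2.048)
print(f"t = {t:.3f}, 95% CI for slope = [{ci[0]:.6f}, {ci[1]:.6f}]")
```

The t statistic computed from the rounded table entries can differ slightly from the reported 6.502, which the software obtains from unrounded values.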
Another part of the output will be useful later. This is a so-called ANOVA (ANalysis
Of VAriance) table:
              Df   Sum Sq     Mean Sq    F value   Pr(>F)
temperature    1   0.075514   0.075514   42.28     4.789e-07
Residuals     28   0.050009   0.001786
Here, we are interested in the Mean Square of Residuals S 2 = 0.001786. Also note that
the p-value (here given as Pr(>F)) coincides with the T-test p-value for slope.
9.3.1
To test whether the true slope equals 0 (and therefore the linear relationship does not
exist), we can use the test statistic $t = \dfrac{b_1}{SE_{b_1}}$.
In terms of correlation r, the above test can be calculated more easily using the test
statistic
\[ t = r\sqrt{\frac{n - 2}{1 - r^2}}, \quad df = n - 2 \]
Strictly speaking, this is for testing correlation
\[ H_0: \rho = 0 \quad \text{versus} \quad H_A: \rho \neq 0 \]
but $\rho = 0$ and $b_1 = 0$ are equivalent statements.
Example 9.5.
For a relationship between Population size and Divorce rate in n = 20 American cities the
correlation of 0.28 was found. Is there a significant linear relationship between Population
size and Divorce rate?
Solution.
\[ t = 0.28\sqrt{\frac{20 - 2}{1 - 0.28^2}} = 1.24 \quad \text{with } df = 18 \]
From the T-table (comparing with table value t = 1.33), p-value > 2(0.1) = 0.2. Since
the p-value is larger than our default level $\alpha = 0.05$, do not reject $H_0$. Thus, we can claim
no significant evidence of the linear relationship between Population size and Divorce
rate.
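The correlation test statistic is simple enough to evaluate in one line of code. The following sketch (function name hypothetical) repeats the computation of this example and compares it with the table value for upper-tail probability 0.10 at df = 18.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * sqrt((n - 2) / (1 - r**2))


t = correlation_t(r=0.28, n=20)
t_table = 1.330  # t-table value for upper-tail probability 0.10, df = 18
print(f"t = {t:.3f}, df = 18")
# |t| below the 0.10 table value means the two-sided p-value exceeds 2(0.10) = 0.2
print("two-sided p-value > 0.2:", abs(t) < t_table)
```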
9.3.2
In addition to the C.I.s for b0 and b1 , we might be interested in the uncertainty of estimating Y-values given the particular value of X.
Example 9.6.
Continuing the analysis of data from Example 9.1, calculate both 95% confidence and
prediction intervals for the ice cream consumption when temperature is 70°F.
[Figure 9.4: Confidence (solid lines) and prediction bands (broken lines) for the ice cream example]
Solution. $\hat{Y} = b_0 + b_1 x = 0.2069 + (0.003107)70 = 0.4244$, and using the computer output
in Example 9.4, we will get
$S = \sqrt{\text{Mean Sq Residuals}} = \sqrt{0.001786} = 0.0423$ and $\dfrac{1}{n} + \dfrac{(x - \bar{X})^2}{(n - 1)S_X^2} = 0.0892$. Then,
with $t_{\alpha/2} = 2.048$ (df = 28),
CI: 0.4244 ± 0.0259,
PI: 0.4244 ± 0.0904
For comparison, both intervals are plotted in Fig. 9.4 for various values of x. Note that
the 95% prediction band (broken lines) contains all but one observation.
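The two margins can be reproduced numerically. The interval formulas themselves are not shown in this part of the notes, so the sketch below assumes the standard forms: margin $t_{\alpha/2}\, S\sqrt{h}$ for the confidence interval and $t_{\alpha/2}\, S\sqrt{1 + h}$ for the prediction interval, where $h = 1/n + (x - \bar{X})^2/((n - 1)S_X^2)$ and $(n - 1)S_X^2 = SS_X$. The function name is hypothetical.

```python
from math import sqrt


def regression_intervals(x, b0, b1, s, n, x_bar, ss_x, t_crit):
    """Return (y_hat, ci_margin, pi_margin) at a given x.

    ss_x is SS_X = (n - 1) * S_X^2; t_crit is t_{alpha/2} with df = n - 2.
    """
    y_hat = b0 + b1 * x
    h = 1 / n + (x - x_bar) ** 2 / ss_x
    ci_margin = t_crit * s * sqrt(h)      # uncertainty of the mean response
    pi_margin = t_crit * s * sqrt(1 + h)  # adds the scatter of a single new Y
    return y_hat, ci_margin, pi_margin


# Ice cream example at x = 70, using values quoted in Examples 9.3 and 9.4
y_hat, ci_margin, pi_margin = regression_intervals(
    x=70, b0=0.2069, b1=0.003107, s=sqrt(0.001786),
    n=30, x_bar=49.1, ss_x=7820.7, t_crit=2.048,
)
print(f"Y-hat = {y_hat:.4f}, CI margin = {ci_margin:.4f}, PI margin = {pi_margin:.4f}")
```

The prediction interval is always wider, because it accounts for the variability of an individual observation in addition to the uncertainty of the fitted line.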
9.3.3
To check the assumption of linear relationship and the constant variance ($\sigma^2$) of the
residuals, we might make a plot of Residuals $e_i = Y_i - \hat{Y}_i$ versus Predicted (Fitted) values.
If there is any trend or pattern in the residuals, then the assumptions for linear regression
[Figure: Residual versus Fitted values]
Exercises
9.1.
In the file http://www.nmt.edu/~olegm/382book/cars2010.csv, there are some data
on several 2010 compact car models. The variables are: engine displacement (liters), city
MPG, highway MPG, and manufacturer's suggested price.
a) Is the car price related to its highway MPG?
b) Is there a relationship between city and highway MPG?
Use scatterplots, calculate and interpret the correlation coefficient, test to determine if
there is a linear relationship.
For part (b), also compute and interpret the regression equation. Plot the regression line
on the scatterplot. Plot the residuals versus predicted values. Does the model fit well?
9.2.
The following is an illustration of the famous Moore's Law for computer chips. X = Year
(minus 1900, for ease of computation), Y = number of transistors (in 1000)
X   71    79   83    85    90     93     95
Y   2.3   31   110   280   1200   3100   5500
\[ Y_i = a_0 e^{a_1 X_i} \]
then
\[ \ln Y_i = \ln a_0 + a_1 X_i \]
That is, doing the linear regression analysis of ln Y on X will help recover the
exponential growth. Make the regression analysis of ln Y on X. Does this model do
a good job fitting the data?
c) Predict the number of transistors in the year 2005. Did this prediction come true?
9.3.
We are trying to model the relationship between X = Year (1981 = Year 1), and Y =
Corn crop in US (in millions of tons). The following data were obtained for years 1981 to
1996:
        mean    st.dev.
Year    8.5     4.76
Crop    112.9   15.06
3.6
79
1.8
54
3.3
74
2.3
62
4.5
85
2.9
55
4.7
88
3.6
85
1.9
51
Perform the regression analysis of Y on X. Interpret the slope and give a 95% confidence
interval for the slope.
166
9.6.
Are stock prices predictable? The following is a data set of the daily increases (in percent)
of the S&P500 companies' stock prices for 21 consecutive business days. Is the next day's
increase significantly correlated to the last day's increase? [Hint: X and Y are both in
here!]
0.16
9.7.
Does the price of the first-class postal stamp follow linear regression, or some other pattern?
Year (since 1900)   32   58   63   68   71    74    75    78    81    85
Price (cents)        3    4    5    6    8    10    13    15    20    22
Year (since 1900)   88   91   95   99   101   102   106   108   111   112
Price (cents)       25   29   32   33   34    37    39    42    44    45
Chapter 10
10.1
This is a test for the fit of the sample proportions to given numbers. Suppose that we have
observations that can each be classified into one of k groups (categorical data). We would like
to test
$H_0: p_1 = p_1^0,\ p_2 = p_2^0,\ \ldots,\ p_k = p_k^0$
$H_A$: some of the $p_i$'s are unequal to the $p_i^0$'s
where $p_i$ is the probability that a subject will belong to group $i$, and $p_i^0$, $i = 1, ..., k$, are
given numbers. (Note that $\sum p_i = \sum p_i^0 = 1$, so that $p_k$ can actually be obtained from
the rest of the $p_i$'s.)
To adjust for the size of each group, we would take the squared difference divided by
$E_i$, that is $(E_i - X_i)^2 / E_i$. Adding up, we obtain the
Chi-square statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(E_i - X_i)^2}{E_i} \qquad (10.1)$$
We would reject $H_0$ when the $\chi^2$ statistic is large (that is, when the Observed counts are far from
the Expected counts). Thus, our test is always one-sided. To find the p-value, use the $\chi^2$ upper-tail probability table, which is read very much like the t-table. See Table C.
[Figure: chi-square density curves f(x) for df = 2, df = 5, and df = 10]
Table C: chi-square critical values by upper tail probability

Degrees of                      Upper tail probability
freedom     0.100     0.050     0.025     0.010     0.005     0.001     0.0005
1           2.706     3.841     5.024     6.635     7.879     10.828    12.116
2           4.605     5.991     7.378     9.210     10.597    13.816    15.202
3           6.251     7.815     9.348     11.345    12.838    16.266    17.730
4           7.779     9.488     11.143    13.277    14.860    18.467    19.997
5           9.236     11.070    12.833    15.086    16.750    20.515    22.105

6           10.645    12.592    14.449    16.812    18.548    22.458    24.103
7           12.017    14.067    16.013    18.475    20.278    24.322    26.018
8           13.362    15.507    17.535    20.090    21.955    26.124    27.868
9           14.684    16.919    19.023    21.666    23.589    27.877    29.666
10          15.987    18.307    20.483    23.209    25.188    29.588    31.420

11          17.275    19.675    21.920    24.725    26.757    31.264    33.137
12          18.549    21.026    23.337    26.217    28.300    32.909    34.821
13          19.812    22.362    24.736    27.688    29.819    34.528    36.478
14          21.064    23.685    26.119    29.141    31.319    36.123    38.109
15          22.307    24.996    27.488    30.578    32.801    37.697    39.719

16          23.542    26.296    28.845    32.000    34.267    39.252    41.308
17          24.769    27.587    30.191    33.409    35.718    40.790    42.879
18          25.989    28.869    31.526    34.805    37.156    42.312    44.434
19          27.204    30.144    32.852    36.191    38.582    43.820    45.973
20          28.412    31.410    34.170    37.566    39.997    45.315    47.498

21          29.615    32.671    35.479    38.932    41.401    46.797    49.011
22          30.813    33.924    36.781    40.289    42.796    48.268    50.511
23          32.007    35.172    38.076    41.638    44.181    49.728    52.000
24          33.196    36.415    39.364    42.980    45.559    51.179    53.479
25          34.382    37.652    40.646    44.314    46.928    52.620    54.947

30          40.256    43.773    46.979    50.892    53.672    59.703    62.162
40          51.805    55.758    59.342    63.691    66.766    73.402    76.095
60          74.397    79.082    83.298    88.379    91.952    99.607    102.695
80          96.578    101.879   106.629   112.329   116.321   124.839   128.261
100         118.498   124.342   129.561   135.807   140.169   149.449   153.167
Example 10.1.
When studying earthquakes, we recorded the following numbers of earthquakes (1 and
above on the Richter scale) for 7 consecutive days in January 2008.

Day         1       2       3       4       5       6       7       Total
Count       85      98      79      118     112     135     137     764
Expected    109.1   109.1   109.1   109.1   109.1   109.1   109.1   764

Here, n = 764. Is there evidence that the rate of earthquake activity changes during this
week?
Solution. If the null hypothesis $H_0: p_1 = p_2 = \ldots = p_7$ were true, then each $p_i = 1/7$,
$i = 1, ..., 7$. Thus, we can find the expected counts $E_i = 764/7 = 109.1$.
Results: $\chi^2 = 28.8$, df = 6, p-value < 0.0005 from Table C. (The highest number in the df = 6 row,
24.103, corresponds to upper tail area 0.0005.) Since the p-value is small, we reject $H_0$
and claim that the earthquake frequency does change during the week. [2]
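The arithmetic of Example 10.1 can be checked with a short script. This is only a sketch, not part of the original notes: the function names are made up for illustration, df = k − 1 is taken from the df = 6 reported for the seven days, and the p-value uses a closed form for the chi-square upper tail that holds only when df is even.

```python
import math

def chi_square_statistic(observed, expected):
    """Chi-square statistic of formula (10.1): sum of (E_i - X_i)^2 / E_i."""
    return sum((e - x) ** 2 / e for x, e in zip(observed, expected))

def chi_square_upper_tail_even_df(stat, df):
    """P(chi-square > stat) for EVEN df only (hypothetical helper).

    For df = 2m the upper tail equals exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))

counts = [85, 98, 79, 118, 112, 135, 137]    # observed earthquakes, days 1-7
n = sum(counts)                               # total count, 764
expected = [n / len(counts)] * len(counts)    # n * (1/7) for each day under H0
stat = chi_square_statistic(counts, expected)
df = len(counts) - 1                          # 7 groups give df = 6
p_value = chi_square_upper_tail_even_df(stat, df)

print(f"chi-square = {stat:.1f}, df = {df}, p-value = {p_value:.6f}")
```

For odd df the closed form above does not apply; Table C or statistical software would be needed instead.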
Example 10.2.
In this example, we will test whether a particular distribution matches our experimental
results. These are the data from the probability board (quincunx); we test whether the distribution is really Binomial (as is often claimed). The slots are labeled 0-19. Some slots were
merged together (why?)

Slots       0-6    7      8      9      10     11     12     13-19   Total
Observed    16     2      11     18     14     14     7      18      100
Expected    8.4    9.6    14.4   17.6   17.6   14.4   9.6    8.4     100

Solution. The expected counts are computed using the Binomial(n = 19, p = 0.5) distribution,
and then multiplying by the Total = 100. For example,
$$E_9 = \binom{19}{9}\, 0.5^9\, (1 - 0.5)^{19-9} \cdot 100 = 17.6$$
Next, $\chi^2 = 26.45$, df = 7, and p-value < 0.0005.
Conclusion: Reject $H_0$; the distribution is not exactly Binomial.
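The expected counts and the statistic of Example 10.2 can be reproduced as follows. This is a sketch under stated assumptions: the helper `binomial_pmf` and the variable names are illustrative, the expected counts are kept unrounded (rounding them to one decimal first gives a slightly smaller statistic), and the p-value is only bounded by comparing against the Table C entry for df = 7.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p) (illustrative helper)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n_trials, p, total = 19, 0.5, 100

# Slot groups as merged in the example: 0-6, 7, 8, ..., 12, 13-19
groups = [range(0, 7), [7], [8], [9], [10], [11], [12], range(13, 20)]
observed = [16, 2, 11, 18, 14, 14, 7, 18]

# Expected count of a group = Total * (sum of Binomial probabilities in it)
expected = [total * sum(binomial_pmf(n_trials, p, k) for k in group)
            for group in groups]

stat = sum((e - x) ** 2 / e for x, e in zip(observed, expected))
df = len(groups) - 1            # 8 groups give df = 7

critical_0005 = 26.018          # Table C: df = 7, upper tail probability 0.0005
print([round(e, 1) for e in expected])
print(f"chi-square = {stat:.2f}, df = {df}, "
      f"p-value < 0.0005: {stat > critical_0005}")
```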
10.2
This test is applied to the category probabilities for two variables. Each case is classified
according to variable 1 (for example, Gender) and variable 2 (for example, College Major).
The data are usually given in a cross-classification table (a 2-way table). Let $X_{ij}$ be the
observed table counts for row $i$ and column $j$.
We are interested in testing whether Variable 1 (in $r$ rows) is independent of Variable 2
(in $c$ columns). [3]
[2] We did not specify $\alpha$ for this example. As mentioned earlier, $\alpha = 0.05$ is a good default choice.
Even if we pick a conservative $\alpha = 0.01$, we would still reject $H_0$ here.
[3] These are not random variables in the sense of Chapter 3, because they are categorical, not numerical.
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(E_{ij} - X_{ij})^2}{E_{ij}} \qquad (10.2)$$
Example 10.3.
Suppose that we ordered 50 components from each of the vendors A, B and C, and the
results are as follows:

             Vendor A   Vendor B   Vendor C
Succeeded    49         45         41
Failed       1          5          9
Total        50         50         50

We would like to investigate whether all the vendors are equally reliable. That is,
$H_0$: Failure rate is independent of Vendor
$H_A$: Not all Vendors have the same failure rate
Solution. We'll put all the expected counts into the table.
Expected counts:

             Succeeded   Failed   Total
Vendor A     45          5        50
Vendor B     45          5        50
Vendor C     45          5        50
_____________________________________________________
Total        135         15       150

The $\chi^2$ statistic will have df = (3 − 1)(2 − 1) = 2.
Here, $\chi^2 = (45 - 49)^2/45 + (1 - 5)^2/5 + \ldots = 7.11$. Since the $\chi^2$ statistic is between the table
values 5.991 and 7.378, the p-value is between 0.025 and 0.05. At the standard $\alpha = 0.05$
we reject $H_0$. Thus, there is evidence that vendors have different failure rates. [4]
[4] For this particular example, since df = 2, there is a more exact p-value calculation based on the Exponential distribution: $P(Y > 7.11) = \exp(-7.11/2) = 0.0286$. For df $\neq$ 2, we can use the R function pchisq,
the Excel function chidist, or other software to compute the exact p-values.
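The numbers in Example 10.3 can be reproduced with the sketch below. It is not from the original notes: the variable names are illustrative, and the expected counts are assumed to follow the usual rule (row total × column total / grand total), which does reproduce the expected table of 45 and 5 shown in the example. The p-value line uses the exponential shortcut from the footnote and is valid only because df = 2.

```python
import math

# Observed 2-way table: rows = outcome, columns = vendors A, B, C
observed = [
    [49, 45, 41],   # Succeeded
    [1, 5, 9],      # Failed
]

row_totals = [sum(row) for row in observed]          # [135, 15]
col_totals = [sum(col) for col in zip(*observed)]    # [50, 50, 50]
grand_total = sum(row_totals)                        # 150

# Assumed rule for expected counts under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Formula (10.2): sum over all cells of (E_ij - X_ij)^2 / E_ij
stat = sum((expected[i][j] - observed[i][j]) ** 2 / expected[i][j]
           for i in range(len(observed))
           for j in range(len(observed[0])))

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (2 - 1)(3 - 1) = 2

# Exponential shortcut for the upper tail, correct only when df == 2
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```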
NOTES
Exercises
10.1.
In testing how well people can generate random patterns, the researchers asked everyone
in a group of 20 people to write a list of 5 random digits. The results are tabulated below.

Digits      0    1    2    3    4    5    6    7    8    9    Total
Observed    6    11   10   13   8    13   7    17   8    7    100

Are the digits completely random, or do humans have a preference for some particular digits
over the others?
10.2.
Forensic statistics. To uncover rigged elections, a variety of statistical tests might be
applied. For example, made-up precinct totals are sometimes likely to have an excess of 0
or 5 as their last digits. For a city election, the observers counted that 21 precinct totals
had the last digit 0, 18 had the last digit 5, while 102 had some other last digit. Is there
evidence that the elections were rigged?
10.3.
In an earlier example of the Poisson distribution, we discussed the number of Nazi bombs
hitting 0.5 × 0.5 km squares in London. The following were counts of squares that have
0, 1, 2, ... hits:

number of hits    0      1      2     3     4 and up
count             229    211    93    35    8

Test whether the data fit the Poisson distribution (for $p_1^0, \ldots, p_k^0$ use the Poisson probabilities,
with the parameter estimated as the average number of hits per square, $\lambda = 0.9288$).
10.4.
To test the attitudes to a tax reform, state officials collected data on the opinions of
likely voters, along with their income level:

            Income Level:
            Low    Medium   High
For         182    213      203
Against     154    138      110

Do people with different incomes have significantly different opinions on tax reform?
(That is, test whether the Opinion variable is independent of the Income variable.)
10.5.
Using the exponential distribution, confirm the calculation of the chi-square (df = 2) critical
points from Table C for upper tail areas 0.1 and 0.005. Find the critical point of the
$\chi^2$ (df = 2) distribution for upper tail area 0.2.
Notes
[s] Kotswara Rao Kadilyala (1970). Testing for the independence of regression disturbances. Econometrica, 38, 97-117. Appears in: A Handbook of Small Data Sets, D. J. Hand et al., editors (1994). Chapman
and Hall, London.
[t] From The R Book by Michael Crawley.
[u] Mlodinow again. The director, Sherry Lansing, was subsequently fired, only to see several films developed during her tenure, including Men in Black, hit it big.
[v] See http://www.akdart.com/postrate.html
Appendix
Table D
Binomial CDF: $F(x) = \sum_{k=0}^{x} \binom{n}{k} p^k (1 - p)^{n-k}$

n = 5
p     x=0   1     2     3     4     5
.01   .951  .999  1     1     1     1
.05   .774  .977  .999  1     1     1
.1    .59   .919  .991  1     1     1
.2    .328  .737  .942  .993  1     1
.3    .168  .528  .837  .969  .998  1
.4    .078  .337  .683  .913  .99   1
.5    .031  .187  .5    .812  .969  1
.6    .01   .087  .317  .663  .922  1
.7    .002  .031  .163  .472  .832  1
.8    0     .007  .058  .263  .672  1
.9    0     0     .009  .081  .41   1
.95   0     0     .001  .023  .226  1
.99   0     0     0     .001  .049  1

n = 6
p     x=0   1     2     3     4     5     6
.01   .941  .999  1     1     1     1     1
.05   .735  .967  .998  1     1     1     1
.1    .531  .886  .984  .999  1     1     1
.2    .262  .655  .901  .983  .998  1     1
.3    .118  .42   .744  .93   .989  .999  1
.4    .047  .233  .544  .821  .959  .996  1
.5    .016  .109  .344  .656  .891  .984  1
.6    .004  .041  .179  .456  .767  .953  1
.7    .001  .011  .07   .256  .58   .882  1
.8    0     .002  .017  .099  .345  .738  1
.9    0     0     .001  .016  .114  .469  1
.95   0     0     0     .002  .033  .265  1
.99   0     0     0     0     .001  .059  1

n = 7
p     x=0   1     2     3     4     5     6     7
.01   .932  .998  1     1     1     1     1     1
.05   .698  .956  .996  1     1     1     1     1
.1    .478  .85   .974  .997  1     1     1     1
.2    .21   .577  .852  .967  .995  1     1     1
.3    .082  .329  .647  .874  .971  .996  1     1
.4    .028  .159  .42   .71   .904  .981  .998  1
.5    .008  .063  .227  .5    .773  .938  .992  1
.6    .002  .019  .096  .29   .58   .841  .972  1
.7    0     .004  .029  .126  .353  .671  .918  1
.8    0     0     .005  .033  .148  .423  .79   1
.9    0     0     0     .003  .026  .15   .522  1
.95   0     0     0     0     .004  .044  .302  1
.99   0     0     0     0     0     .002  .068  1

n = 8
p     x=0   1     2     3     4     5     6     7     8
.01   .923  .997  1     1     1     1     1     1     1
.05   .663  .943  .994  1     1     1     1     1     1
.1    .43   .813  .962  .995  1     1     1     1     1
.2    .168  .503  .797  .944  .99   .999  1     1     1
.3    .058  .255  .552  .806  .942  .989  .999  1     1
.4    .017  .106  .315  .594  .826  .95   .991  .999  1
.5    .004  .035  .145  .363  .637  .855  .965  .996  1
.6    .001  .009  .05   .174  .406  .685  .894  .983  1
.7    0     .001  .011  .058  .194  .448  .745  .942  1
.8    0     0     .001  .01   .056  .203  .497  .832  1
.9    0     0     0     0     .005  .038  .187  .57   1
.95   0     0     0     0     0     .006  .057  .337  1
.99   0     0     0     0     0     0     .003  .077  1
Table D (continued)

n = 9
p     x=0   1     2     3     4     5     6     7     8     9
.01   .914  .997  1     1     1     1     1     1     1     1
.05   .63   .929  .992  .999  1     1     1     1     1     1
.1    .387  .775  .947  .992  .999  1     1     1     1     1
.2    .134  .436  .738  .914  .98   .997  1     1     1     1
.3    .04   .196  .463  .73   .901  .975  .996  1     1     1
.4    .01   .071  .232  .483  .733  .901  .975  .996  1     1
.5    .002  .02   .09   .254  .5    .746  .91   .98   .998  1
.6    0     .004  .025  .099  .267  .517  .768  .929  .99   1
.7    0     0     .004  .025  .099  .27   .537  .804  .96   1
.8    0     0     0     .003  .02   .086  .262  .564  .866  1
.9    0     0     0     0     .001  .008  .053  .225  .613  1
.95   0     0     0     0     0     .001  .008  .071  .37   1
.99   0     0     0     0     0     0     0     .003  .086  1

n = 10
p     x=0   1     2     3     4     5     6     7     8     9     10
.01   .904  .996  1     1     1     1     1     1     1     1     1
.05   .599  .914  .988  .999  1     1     1     1     1     1     1
.1    .349  .736  .93   .987  .998  1     1     1     1     1     1
.2    .107  .376  .678  .879  .967  .994  .999  1     1     1     1
.3    .028  .149  .383  .65   .85   .953  .989  .998  1     1     1
.4    .006  .046  .167  .382  .633  .834  .945  .988  .998  1     1
.5    .001  .011  .055  .172  .377  .623  .828  .945  .989  .999  1
.6    0     .002  .012  .055  .166  .367  .618  .833  .954  .994  1
.7    0     0     .002  .011  .047  .15   .35   .617  .851  .972  1
.8    0     0     0     .001  .006  .033  .121  .322  .624  .893  1
.9    0     0     0     0     0     .002  .013  .07   .264  .651  1
.95   0     0     0     0     0     0     .001  .012  .086  .401  1
.99   0     0     0     0     0     0     0     0     .004  .096  1

n = 11
p     x=0   1     2     3     4     5     6     7     8     9     10    11
.01   .895  .995  1     1     1     1     1     1     1     1     1     1
.05   .569  .898  .985  .998  1     1     1     1     1     1     1     1
.1    .314  .697  .91   .981  .997  1     1     1     1     1     1     1
.2    .086  .322  .617  .839  .95   .988  .998  1     1     1     1     1
.3    .02   .113  .313  .57   .79   .922  .978  .996  .999  1     1     1
.4    .004  .03   .119  .296  .533  .753  .901  .971  .994  .999  1     1
.5    0     .006  .033  .113  .274  .5    .726  .887  .967  .994  1     1
.6    0     .001  .006  .029  .099  .247  .467  .704  .881  .97   .996  1
.7    0     0     .001  .004  .022  .078  .21   .43   .687  .887  .98   1
.8    0     0     0     0     .002  .012  .05   .161  .383  .678  .914  1
.9    0     0     0     0     0     0     .003  .019  .09   .303  .686  1
.95   0     0     0     0     0     0     0     .002  .015  .102  .431  1
.99   0     0     0     0     0     0     0     0     0     .005  .105  1
Table D (continued)

n = 15
p     x=0   1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
.01   .86   .99   1     1     1     1     1     1     1     1     1     1     1     1     1     1
.05   .463  .829  .964  .995  .999  1     1     1     1     1     1     1     1     1     1     1
.1    .206  .549  .816  .944  .987  .998  1     1     1     1     1     1     1     1     1     1
.2    .035  .167  .398  .648  .836  .939  .982  .996  .999  1     1     1     1     1     1     1
.3    .005  .035  .127  .297  .515  .722  .869  .95   .985  .996  .999  1     1     1     1     1
.4    0     .005  .027  .091  .217  .403  .61   .787  .905  .966  .991  .998  1     1     1     1
.5    0     0     .004  .018  .059  .151  .304  .5    .696  .849  .941  .982  .996  1     1     1
.6    0     0     0     .002  .009  .034  .095  .213  .39   .597  .783  .909  .973  .995  1     1
.7    0     0     0     0     .001  .004  .015  .05   .131  .278  .485  .703  .873  .965  .995  1
.8    0     0     0     0     0     0     .001  .004  .018  .061  .164  .352  .602  .833  .965  1
.9    0     0     0     0     0     0     0     0     0     .002  .013  .056  .184  .451  .794  1
.95   0     0     0     0     0     0     0     0     0     0     .001  .005  .036  .171  .537  1
.99   0     0     0     0     0     0     0     0     0     0     0     0     0     .01   .14   1

n = 20
p     x=0   1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19    20
.01   .818  .983  .999  1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
.05   .358  .736  .925  .984  .997  1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
.1    .122  .392  .677  .867  .957  .989  .998  1     1     1     1     1     1     1     1     1     1     1     1     1     1
.2    .012  .069  .206  .411  .63   .804  .913  .968  .99   .997  .999  1     1     1     1     1     1     1     1     1     1
.3    .001  .008  .035  .107  .238  .416  .608  .772  .887  .952  .983  .995  .999  1     1     1     1     1     1     1     1
.4    0     .001  .004  .016  .051  .126  .25   .416  .596  .755  .872  .943  .979  .994  .998  1     1     1     1     1     1
.5    0     0     0     .001  .006  .021  .058  .132  .252  .412  .588  .748  .868  .942  .979  .994  .999  1     1     1     1
.6    0     0     0     0     0     .002  .006  .021  .057  .128  .245  .404  .584  .75   .874  .949  .984  .996  .999  1     1
.7    0     0     0     0     0     0     0     .001  .005  .017  .048  .113  .228  .392  .584  .762  .893  .965  .992  .999  1
.8    0     0     0     0     0     0     0     0     0     .001  .003  .01   .032  .087  .196  .37   .589  .794  .931  .988  1
.9    0     0     0     0     0     0     0     0     0     0     0     0     0     .002  .011  .043  .133  .323  .608  .878  1
.95   0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     .003  .016  .075  .264  .642  1
.99   0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     .001  .017  .182  1
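For values of n or p that Table D does not list, the binomial CDF can be computed directly from its defining sum. The following is a minimal sketch; `binomial_cdf` is an illustrative name, not a function from the notes.

```python
import math

def binomial_cdf(n, p, x):
    """F(x) = sum_{k=0}^{x} C(n, k) p^k (1 - p)^(n - k), for integer 0 <= x <= n."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x + 1))

# Print F(0), ..., F(5) for n = 5 at a few values of p, rounded to three
# decimals as in Table D
for p in (0.2, 0.5, 0.8):
    row = [round(binomial_cdf(5, p, x), 3) for x in range(6)]
    print(p, row)
```

The summation is exact up to floating-point rounding, so for small n it should agree with the table to the three decimals shown.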
Table E
Poisson CDF: $F(x) = \sum_{k=0}^{x} \frac{e^{-\lambda} \lambda^k}{k!}$

λ     x=0   1     2     3     4     5     6     7     8
.1    .905  .995  1     1     1     1     1     1     1
.2    .819  .982  .999  1     1     1     1     1     1
.3    .741  .963  .996  1     1     1     1     1     1
.4    .67   .938  .992  .999  1     1     1     1     1
.5    .607  .91   .986  .998  1     1     1     1     1
.6    .549  .878  .977  .997  1     1     1     1     1
.7    .497  .844  .966  .994  .999  1     1     1     1
.8    .449  .809  .953  .991  .999  1     1     1     1
.9    .407  .772  .937  .987  .998  1     1     1     1
1     .368  .736  .92   .981  .996  .999  1     1     1
1.5   .223  .558  .809  .934  .981  .996  .999  1     1
2     .135  .406  .677  .857  .947  .983  .995  .999  1

λ     x=0   1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19
2.5   .082  .287  .544  .758  .891  .958  .986  .996  .999  1     1     1     1     1     1     1     1     1     1     1
3     .05   .199  .423  .647  .815  .916  .966  .988  .996  .999  1     1     1     1     1     1     1     1     1     1
3.5   .03   .136  .321  .537  .725  .858  .935  .973  .99   .997  .999  1     1     1     1     1     1     1     1     1
4     .018  .092  .238  .433  .629  .785  .889  .949  .979  .992  .997  .999  1     1     1     1     1     1     1     1
4.5   .011  .061  .174  .342  .532  .703  .831  .913  .96   .983  .993  .998  .999  1     1     1     1     1     1     1
5     .007  .04   .125  .265  .44   .616  .762  .867  .932  .968  .986  .995  .998  .999  1     1     1     1     1     1
5.5   .004  .027  .088  .202  .358  .529  .686  .809  .894  .946  .975  .989  .996  .998  .999  1     1     1     1     1
6     .002  .017  .062  .151  .285  .446  .606  .744  .847  .916  .957  .98   .991  .996  .999  .999  1     1     1     1
6.5   .002  .011  .043  .112  .224  .369  .527  .673  .792  .877  .933  .966  .984  .993  .997  .999  1     1     1     1
7     .001  .007  .03   .082  .173  .301  .45   .599  .729  .83   .901  .947  .973  .987  .994  .998  .999  1     1     1
7.5   .001  .005  .02   .059  .132  .241  .378  .525  .662  .776  .862  .921  .957  .978  .99   .995  .998  .999  1     1
8     0     .003  .014  .042  .1    .191  .313  .453  .593  .717  .816  .888  .936  .966  .983  .992  .996  .998  .999  1
Table E (continued)

λ     x=0   1     2     3     4     5
9     0     .001  .006  .021  .055  .116
10    0     0     .003  .01   .029  .067
11    0     0     .001  .005  .015  .038
12    0     0     .001  .002  .008  .02
13    0     0     0     .001  .004  .011
14    0     0     0     0     .002  .006
15    0     0     0     0     .001  .003
16    0     0     0     0     0     .001
17    0     0     0     0     0     .001
18    0     0     0     0     0     0
19    0     0     0     0     0     0
20    0     0     0     0     0     0

λ     x=6   7     8     9     10
9     .207  .324  .456  .587  .706
10    .13   .22   .333  .458  .583
11    .079  .143  .232  .341  .46
12    .046  .09   .155  .242  .347
13    .026  .054  .1    .166  .252
14    .014  .032  .062  .109  .176
15    .008  .018  .037  .07   .118
16    .004  .01   .022  .043  .077
17    .002  .005  .013  .026  .049
18    .001  .003  .007  .015  .03
19    .001  .002  .004  .009  .018
20    0     .001  .002  .005  .011

λ     x=11  12    13    14    15
9     .803  .876  .926  .959  .978
10    .697  .792  .864  .917  .951
11    .579  .689  .781  .854  .907
12    .462  .576  .682  .772  .844
13    .353  .463  .573  .675  .764
14    .26   .358  .464  .57   .669
15    .185  .268  .363  .466  .568
16    .127  .193  .275  .368  .467
17    .085  .135  .201  .281  .371
18    .055  .092  .143  .208  .287
19    .035  .061  .098  .15   .215
20    .021  .039  .066  .105  .157

λ     x=16  17    18    19    20
9     .989  .995  .998  .999  1
10    .973  .986  .993  .997  .998
11    .944  .968  .982  .991  .995
12    .899  .937  .963  .979  .988
13    .835  .89   .93   .957  .975
14    .756  .827  .883  .923  .952
15    .664  .749  .819  .875  .917
16    .566  .659  .742  .812  .868
17    .468  .564  .655  .736  .805
18    .375  .469  .562  .651  .731
19    .292  .378  .469  .561  .647
20    .221  .297  .381  .47   .559

λ     x=21  22    23    24    25
9     1     1     1     1     1
10    .999  1     1     1     1
11    .998  .999  1     1     1
12    .994  .997  .999  .999  1
13    .986  .992  .996  .998  .999
14    .971  .983  .991  .995  .997
15    .947  .967  .981  .989  .994
16    .911  .942  .963  .978  .987
17    .861  .905  .937  .959  .975
18    .799  .855  .899  .932  .955
19    .725  .793  .849  .893  .927
20    .644  .721  .787  .843  .888

λ     x=26  27    28    29    30
9     1     1     1     1     1
10    1     1     1     1     1
11    1     1     1     1     1
12    1     1     1     1     1
13    1     1     1     1     1
14    .999  .999  1     1     1
15    .997  .998  .999  1     1
16    .993  .996  .998  .999  .999
17    .985  .991  .995  .997  .999
18    .972  .983  .99   .994  .997
19    .951  .969  .98   .988  .993
20    .922  .948  .966  .978  .987

λ     x=31  32    33    34    35    36
9     1     1     1     1     1     1
10    1     1     1     1     1     1
11    1     1     1     1     1     1
12    1     1     1     1     1     1
13    1     1     1     1     1     1
14    1     1     1     1     1     1
15    1     1     1     1     1     1
16    1     1     1     1     1     1
17    .999  1     1     1     1     1
18    .998  .999  1     1     1     1
19    .996  .998  .999  .999  1     1
20    .992  .995  .997  .999  .999  1
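Entries of Table E, and values for parameters the table does not list, can be computed from the Poisson CDF sum. This is a small sketch; `poisson_cdf` and the argument name `lam` are illustrative and do not come from the notes.

```python
import math

def poisson_cdf(lam, x):
    """F(x) = sum_{k=0}^{x} exp(-lam) * lam^k / k!, for integer x >= 0."""
    return math.exp(-lam) * sum(lam ** k / math.factorial(k)
                                for k in range(x + 1))

# A few spot checks, rounded to three decimals as in Table E
for lam, x in [(1, 0), (2, 2), (9, 10)]:
    print(f"lambda = {lam}, x = {x}: F(x) = {poisson_cdf(lam, x):.3f}")
```

The direct sum is adequate for the moderate parameter values tabulated here; for very large parameters a library routine would be a safer choice.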