Introduction To Probability and Mathematical Statistics
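Special Continuous Distributions
(For each distribution: notation and parameters, continuous pdf f(x), mean, variance, MGF M(t).)

Student's t: $X \sim t(\nu)$, $\nu = 1, 2, \ldots$
  pdf: $f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$
  Mean: $0$, for $1 < \nu$
  Variance: $\frac{\nu}{\nu - 2}$, for $2 < \nu$
  MGF: does not exist

Snedecor's F: $X \sim F(\nu_1, \nu_2)$, $\nu_1 = 1, 2, \ldots$, $\nu_2 = 1, 2, \ldots$
  pdf: $f(x) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1}\left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2}$, $0 < x$
  Mean: $\frac{\nu_2}{\nu_2 - 2}$, for $2 < \nu_2$
  Variance: $\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$, for $4 < \nu_2$
  MGF: does not exist

Beta: $X \sim \mathrm{BETA}(a, b)$, $0 < a$, $0 < b$
  pdf: $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}$, $0 < x < 1$
  Mean: $\frac{a}{a+b}$
  Variance: $\frac{ab}{(a+b+1)(a+b)^2}$
  MGF: not tractable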
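Weibull: $X \sim \mathrm{WEI}(\theta, \beta)$, $0 < \theta$, $0 < \beta$
  pdf: $f(x) = \frac{\beta}{\theta^{\beta}} x^{\beta-1} e^{-(x/\theta)^{\beta}}$, $0 < x$
  Mean: $\theta\,\Gamma\left(1 + \frac{1}{\beta}\right)$
  Variance: $\theta^2\left[\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^2\left(1 + \frac{1}{\beta}\right)\right]$
  MGF: not tractable

Extreme Value: $X \sim \mathrm{EV}(\theta, \eta)$, $0 < \theta$
  pdf: $f(x) = \frac{1}{\theta}\exp\{[(x-\eta)/\theta] - \exp[(x-\eta)/\theta]\}$
  Mean: $\eta - \gamma\theta$, where $\gamma \approx 0.5772$ (Euler's constant)
  Variance: $\frac{\pi^2\theta^2}{6}$
  MGF: $e^{\eta t}\,\Gamma(1 + \theta t)$

Cauchy: $X \sim \mathrm{CAU}(\theta, \eta)$, $0 < \theta$
  pdf: $f(x) = \frac{1}{\theta\pi\{1 + [(x-\eta)/\theta]^2\}}$
  Mean, variance, MGF: do not exist

Pareto: $X \sim \mathrm{PAR}(\theta, \kappa)$, $0 < \theta$, $0 < \kappa$
  pdf: $f(x) = \frac{\kappa}{\theta(1 + x/\theta)^{\kappa+1}}$, $0 < x$
  Mean: $\frac{\theta}{\kappa - 1}$, for $1 < \kappa$
  Variance: $\frac{\theta^2\kappa}{(\kappa-2)(\kappa-1)^2}$, for $2 < \kappa$
  MGF: does not exist

Chi-Square: $X \sim \chi^2(\nu)$
  pdf: $f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} x^{\nu/2 - 1} e^{-x/2}$, $0 < x$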
SECOND EDITION
Idaho
Duxbury Thomson Learning
Australia Canada Mexico Singapore Spain United Kingdom United States
The Duxbury Classic Series is a collection of authoritative works from respected authors.
Reissued as paperbacks, these successful titles are now more affordable.
For more information about this or any other Duxbury product, contact:
DUXBURY, 511 Forest Lodge Road, Pacific Grove, CA 93950 USA
www.duxbury.com 1-800-423-0563 (Thomson Learning Academic Resource Center)
All rights reserved. No part of this work may be reproduced, transcribed or used in any form or by any
means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution,
or information storage and/or retrieval systems) without the prior written permission of the publisher.
10 9 8 7 6 5 4 3 2
Bain, Lee J.
Introduction to probability and mathematical statistics / Lee J. Bain, Max Engelhardt. 2nd
ed. p. cm. (The Duxbury advanced series in statistics and decision sciences) Includes
(paperback) 1. Probabilities. 2. Mathematical statistics. I. Engelhardt, Max. II. Title. III. Series.
QA273.B2546 1991 519.2--dc20
91-25923
CONTENTS
CHAPTER 1
PROBABILITY
1.1 Introduction; 1.2 Notation and terminology; 1.3 Definition of probability; 1.4 Some
properties of probability; 1.5 Conditional probability; 1.6 Counting techniques; Summary; Exercises
CHAPTER 2
CHAPTER 3
CHAPTER 4
4.1 Introduction; 4.2 Joint discrete distributions; 4.3 Joint continuous distributions;
4.4 Independent random variables; 4.5 Conditional distributions; 4.6 Random samples;
Summary; Exercises
CHAPTER 5
CHAPTER 6
CHAPTER 7
CHAPTER 8
CHAPTER 9
CHAPTER 10
SUFFICIENCY AND COMPLETENESS
CHAPTER 11
INTERVAL ESTIMATION
11.1 Introduction; 11.2 Confidence intervals; 11.3 Pivotal quantity method; 11.4 General
method; 11.5 Two-sample problems; 11.6 Bayesian interval estimation; Summary; Exercises
CHAPTER 12
TESTS OF HYPOTHESES
12.1 Introduction; 12.2 Composite hypotheses; 12.3 Tests for the normal distribution;
12.4 Binomial tests; 12.5 Poisson tests; 12.6 Most powerful tests; 12.7 Uniformly most
powerful tests; 12.8 Generalized likelihood ratio tests; 12.9 Conditional tests;
12.10 Sequential tests; Summary; Exercises
CHAPTER 13
CONTINGENCY TABLES AND GOODNESS-OF-FIT
13.1 Introduction; 13.2 One-sample binomial case; 13.3 r-Sample binomial test (completely
specified H0); 13.4 One-sample multinomial; 13.5 r-Sample multinomial; 13.6 Test for
independence, r x c contingency table; 13.7 Chi-squared goodness-of-fit test; 13.8 Other
goodness-of-fit tests; Summary; Exercises
CHAPTER 14
NONPARAMETRIC METHODS
14.1 Introduction; 14.2 One-sample sign test; 14.3 Binomial test (test on quantiles);
14.4 Two-sample sign test;
14.5 Wilcoxon paired-sample signed-rank test; 14.6 Paired-sample randomization test;
14.7 Wilcoxon and Mann-Whitney (WMW) tests; 14.8 Correlation tests: tests of independence;
14.9 Wald-Wolfowitz runs test; Summary; Exercises
CHAPTER 15*
REGRESSION AND LINEAR MODELS
15.1 Introduction; 15.2 Linear regression; 15.3 Simple linear regression; 15.4 General
linear model; 15.5 Analysis of bivariate data; Summary; Exercises
CHAPTER 16*
RELIABILITY AND SURVIVAL DISTRIBUTIONS
16.1 Introduction; 16.2 Reliability concepts; 16.3 Exponential distribution; 16.4 Weibull
distribution; 16.5 Repairable systems; Summary; Exercises
APPENDIX A REVIEW OF SETS
APPENDIX B SPECIAL DISTRIBUTIONS
APPENDIX C TABLES OF DISTRIBUTIONS
ANSWERS TO SELECTED EXERCISES
REFERENCES
INDEX
* Advanced (or optional) topics
PREFACE
and moment generating functions, which occurred somewhat later in the first
covering the basic material is calculus, with the lone exception of the material
matrices. This material can be omitted if so desired. Our intent was to produce
the
those who use the book will find it both interesting and informative.
ACKNOWLEDGMENTS
Thanks also are due to the following users of the first edition who were kind
enough to relate their experiences to the authors: H. A. David, Iowa State University;
Peter Griffin, California State University, Sacramento.
Finally, special thanks are due for the moral support of our wives, Harriet Bain and Linda Engelhardt.
Lee J. Bain Max Engelhardt
CHAPTER 1
PROBABILITY
1.1
INTRODUCTION
mathematical model that makes it possible to describe or predict the observed value of some
characteristic of interest. As an example, consider the velocity of a falling body after a certain
length of time, t. The formula v = gt, where g ≈ 32.17 feet per second per second, provides a
useful mathematical model for the velocity, in feet per second, of a body falling from rest in a
vacuum. This is an example of a deterministic model. For such a model, carrying out repeated
experiments under ideal conditions would result in essentially the same velocity each time, and this
would be predicted by the model. On the other hand, such a model may not be adequate when
the experiments are carried out under less than ideal conditions. There may be unknown or
well as measurement error or other factors that might cause the results to vary on different
performances of the
and for which a deterministic model would not be appropriate. For example,
called probability models (or probabilistic models). The term stochastic, which is
derived from the Greek word stochos, meaning "guess," is sometimes used
instead of the term probabilistic. A careful study of probability models requires
some familiarity with the notation and terminology of set theory. We will
1.2
NOTATION AND TERMINOLOGY
The term experiment refers to the process of obtaining an observed result of some
about which outcome will occur when the experiment is performed. We will
assume that an experiment is repeatable under essentially the same conditions,
and that the set of all possible outcomes can be completely specified before
experimentation.
Definition 1.2.1
The set of all possible outcomes of an experiment is called the sample space, denoted
by S. Note that one and only one of the possible outcomes will occur on any given
trial of the experiment.
Example 1.2.1 An experiment consists of tossing two coins, and the observed face of each coin is
of interest. The set of possible outcomes may be represented by the sample space
S = {HH, HT, TH, TT}
which simply lists all possible pairings of the symbols H (heads) and T (tails). An
alternate way of representing such a sample space is to list all possible ordered
pairs of the numbers 1 and 0, S = {(1, 1), (1, 0), (0, 1), (0, 0)}, where, for example,
(1, 0) indicates that the first coin landed heads up and the second coin landed
tails up.
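The two representations of this sample space can be generated mechanically. The following is a minimal sketch, not part of the original text; the variable names `S` and `S_numeric` are chosen here for illustration only.

```python
from itertools import product

# All ordered pairings of H and T for two coins.
S = ["".join(pair) for pair in product("HT", repeat=2)]

# Alternate coding of the same outcomes: 1 = heads, 0 = tails.
S_numeric = [tuple(1 if face == "H" else 0 for face in outcome) for outcome in S]

print(S)
print(S_numeric)
```

Each entry of `S_numeric` corresponds position by position to an entry of `S`, so (1, 0) is the coded form of HT.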
Example 1.2.2 Suppose that in Example 1.2.1 we were not interested in the individual outcomes
of the coins, but only in the total number of heads obtained from the two coins. An appropriate
sample space could then be written as S* = {0, 1, 2}. Thus, different sample spaces may be
appropriate for the same experiment, depending on the characteristic of interest.
Example 1.2.3 If a coin is tossed repeatedly until a head occurs, then the natural sample space is
S = {H, TH, TTH, ...}. If one is interested in the number of tosses required to obtain a head, then
a possible sample space for this experiment would be the set of all positive integers,
S* = {1, 2, 3, ...}, and the outcomes would correspond directly to the number of tosses required
to obtain the first head. We will show in the next chapter that an outcome corresponding to a
sequence of tosses in which a head is never obtained need not be included in the sample space.
Example 1.2.4 A light bulb is placed in service and the time of operation until it burns out is
measured. At least conceptually, the sample space for this experiment can be taken to be the
set of nonnegative real numbers, $S = \{t \mid 0 \le t < \infty\}$. Note that if the actual failure
time could be measured only to the nearest hour, then the sample space for the actual observed failure time
observable sample space, one might prefer to describe the properties and behavior of light bulbs
in terms of the conceptual sample space S. In cases of this type, the discreteness imposed by
measurement limitations is sufficiently negligible that it can be ignored, and both the measured
response and the conceptual response can be discussed relative to the conceptual sample space S.
comes, say $S = \{e_1, e_2, \ldots, e_N\}$, and it is said to be countably infinite if its
outcomes can be put into a one-to-one correspondence with the positive integers, say
$S = \{e_1, e_2, \ldots\}$.
Definition 1.2.2
If a sample space S is either finite or countably infinite then it is called a discrete
sample space.
A set that is either finite or countably infinite also is said to be countable. This is
the case in the first three examples. It is also true for the last example when
failure times are recorded to the nearest hour, but not for the conceptual sample
space. Because the conceptual space involves outcomes that may assume any
value in some interval of real numbers (i.e., the set of nonnegative real numbers),
it could be termed a continuous sample space, and it provides an example where a
experiments exist, the sample spaces of which also could be characterized as continuous,
such as experiments involving two or more continuous responses.
Suppose a heat lamp is tested and X, the amount of light produced (in lumens), and Y, the amount of
be the Cartesian product of the set of all nonnegative real numbers with itself,
made during a 24-hour period. The observed result is the graph of a continuous real-valued
function f(t) defined on the time interval $[0, 24] = \{t \mid 0 \le t \le 24\}$, and an appropriate sample space
Definition 1.2.3
An event is a subset of the sample space S. If A is an event, then A has occurred if it
contains the outcomes that correspond to the event of obtaining "at least one
head." As mentioned earlier, if one of the outcomes in A occurs, then we say that
the event A has occurred. Similarly, if one of the outcomes in B = {HT, TH, TT} occurs, then we say that the
event "at least one tail" has occurred. Set notation and terminology provide a useful framework for
describing the possible outcomes and related physical events that may be of interest in an experiment.
As suggested above, a subset of outcomes corresponds to a physical event, and the event or the subset is
said to occur if any outcome in the subset occurs. The usual set operations of union, intersection, and
complement provide a way of expressing new events in terms of events that already have been defined. For
example, the event C of obtaining "at least one head and at least one tail" can be expressed as the
intersection of A and B, $C = A \cap B = \{HT, TH\}$. Similarly, the event "at least one head or at least one
tail" can be expressed as the union $A \cup B = \{HH, HT, TH, TT\}$, and the event "no heads" can be expressed
as the complement of A relative to S, $A' = \{TT\}$.
A review of set notation and terminology is given in Appendix A. In general, suppose S is the sample space for
some experiment, and that A and B are events. The intersection $A \cap B$ represents the outcomes of the event
"A and B," while the union $A \cup B$ represents the event "A or B." The complement $A'$ corresponds to the
event "not A." Other events also can be represented in terms of intersections, unions, and complements. For
example, the event "A but not B" is said to occur if the outcome of the experiment belongs to $A \cap B'$, which
sometimes is written as $A - B$. The event "exactly one of A or B" is said to occur if the outcome belongs to
$(A \cap B') \cup (A' \cap B)$. The set $A' \cap B'$ corresponds to the event "neither A nor B." The set identity
$A' \cap B' = (A \cup B)'$ is another way to represent this event. This is one of the set properties that usually
are referred to as De Morgan's laws. The other such property is $A' \cup B' = (A \cap B)'$.
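These set operations map directly onto Python's built-in `set` type. The sketch below is an illustration added here, not part of the original text; it uses the two-coin events A ("at least one head") and B ("at least one tail"), and the helper name `complement` is hypothetical.

```python
S = {"HH", "HT", "TH", "TT"}   # two-coin sample space
A = {"HH", "HT", "TH"}         # at least one head
B = {"HT", "TH", "TT"}         # at least one tail

def complement(event, space=S):
    """Complement of an event relative to the sample space."""
    return space - event

A_and_B = A & B                                   # "A and B"
A_or_B = A | B                                    # "A or B"
A_not_B = A & complement(B)                       # "A but not B", also A - B
exactly_one = (A & complement(B)) | (complement(A) & B)

# De Morgan's laws hold for these events.
assert complement(A) & complement(B) == complement(A | B)
assert complement(A) | complement(B) == complement(A & B)
```

For these particular events, "neither A nor B" is the empty set, because every outcome contains at least one head or at least one tail.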
$A_1 \cap \cdots \cap A_k$ (or $\bigcap_{i=1}^{k} A_i$) corresponds to the occurrence of the event
"every $A_i$; i = 1, ..., k." The occurrence of an outcome in the union $A_1 \cup \cdots \cup A_k$
(or $\bigcup_{i=1}^{k} A_i$) corresponds to the occurrence of the event "at least one $A_i$; i = 1, ..., k."
Similar remarks apply in the case of a countably infinite collection $A_1, A_2, \ldots$, with the
notations $A_1 \cap A_2 \cap \cdots$ (or $\bigcap_{i=1}^{\infty} A_i$) for
In a discrete sample space, any subset can be written as a countable union of elementary events, and
we have no difficulty in associating every subset with an event in the discrete case.
In Example 1.2.1, the elementary events are {HH}, {HT}, {TH}, and {TT}, and any other event can be
written as a finite union of these elementary events. Similarly, in Example 1.2.3, the elementary
events are {H}, {TH}, {TTH}, ..., and any event can be represented as a countable union of these
elementary events.
It is not as easy to represent events for the continuous examples. Rather than
attempting to characterize these events rigorously, we will discuss some examples.
In Example 1.2.4, the light bulbs could fail during any time interval, and any
interval of nonnegative real numbers would correspond to an interesting event
for that experiment. Specifically, suppose the time until failure is measured in
hours. The event that the light bulb "survives at most 10 hours" corresponds to
the interval $A = [0, 10] = \{t \mid 0 \le t \le 10\}$. The event that the light bulb "survives
more than 10 hours" is $A' = (10, \infty) = \{t \mid 10 < t < \infty\}$. If $B = [0, 15)$, then
$C = B \cap A' = (10, 15)$ is the event of "failure between 10 and 15 hours."
In Example 1.2.5, any Cartesian product based on intervals of nonnegative real numbers would
correspond to an event of interest. For example, the event
$(10, 20) \times [5, \infty) = \{(x, y) \mid 10 < x < 20 \text{ and } 5 \le y < \infty\}$ corresponds to "the amount
of light is between 10 and 20 lumens and the amount of energy is at least 5 joules." Such
an event can be represented graphically as a rectangle in the xy plane with sides
parallel to the coordinate axes.
In general, any physical event can be associated with a reasonable subset of S, and often a subset of S
can be associated with some meaningful event. For mathematical reasons,
If events are mutually exclusive, then they have no outcomes in common. Thus, the occurrence of one
event precludes the possibility of the other occurring. In Example 1.2.1, if A is
the event "at least one head" and if we let B be the event "both tails," then A
and B are mutually exclusive. Actually, in this example $B = A'$ (the
complement of A). In general, complementary events are mutually exclusive,
but the converse is not true. For example, if C is the event "both heads,"
then B and C are mutually exclusive, but not complementary.
The notion of mutually exclusive events can be extended easily to more than two events.
Definition 1.2.6 Events $A_1, A_2, A_3, \ldots$ are said to be mutually exclusive
if they are pairwise mutually exclusive. That is, if $A_i \cap A_j = \emptyset$ whenever $i \ne j$.
One possible approach to assigning probabilities to events involves the notion of relative frequency.
converge to some constant p. One then might define the probability of obtaining
represents the number of times that the event A occurs among M trials of a given
second graph gives the results for M = 600 rolls. By inspection of these graphs,
obviously the relative frequencies tend to "stabilize" near some fixed value as M increases. Also included in
the figure is a dotted line of height 1/6, which is the value that experience would suggest as the long-term
relative frequency of the outcomes of rolling a die. Of course, in this example, the results are more relevant
to the properties of the random number generator used to simulate the experiment than to those of actual
dice.
FIGURE 1.1 Relative frequencies of elementary events for die-rolling experiment (two panels: M = 30 and M = 600)
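A simulation of this kind is easy to reproduce. The sketch below is not the authors' program; it assumes Python's standard pseudorandom generator, a fixed seed chosen arbitrarily, and a hypothetical helper name `relative_frequencies`. It reports how far the worst relative frequency is from 1/6 as the number of rolls M grows.

```python
import random
from collections import Counter

def relative_frequencies(num_rolls, seed=0):
    """Simulate die rolls and return the relative frequency of each face."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}

for M in (30, 600, 60000):
    freqs = relative_frequencies(M)
    worst_gap = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"M = {M:>6}: largest deviation from 1/6 is {worst_gap:.4f}")
```

As in the text, the output says more about the random number generator than about physical dice, but the stabilizing behavior is the same.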
If, for an event A, the limit of $f_A$ as M approaches infinity exists, then one could assign probability to A by
$P(A) = \lim_{M \to \infty} f_A$ (1.2.1)
This expresses a property known as statistical regularity. Certain technical questions about this
and $m(S) = M$, because $m(A)$ counts the number of times $A$ occurs, and $S$ occurs on each trial.
Furthermore, if A and B are mutually exclusive events, then outcomes in A are distinct from outcomes
in B, and consequently $m(A \cup B) = m(A) + m(B)$. More generally, if $A_1, A_2, \ldots$ are pairwise
mutually exclusive, then $m(A_1 \cup A_2 \cup \cdots) = m(A_1) + m(A_2) + \cdots$. Thus, the following
properties hold for relative frequencies:
$0 \le f_A$ (1.2.2)
$f_S = 1$ (1.2.3)
$f_{A_1 \cup A_2 \cup \cdots} = f_{A_1} + f_{A_2} + \cdots$ (1.2.4)
if $A_1, A_2, \ldots$ are pairwise mutually exclusive events.
Although the relative frequency approach may not always be adequate as a practical method of
However, many people consider this interpretation too restrictive. By they
1.3
DEFINITION OF PROBABILITY
numbers, is a collection of sets (events), and the range of which is a
Some set functions are not suitable for assigning probabilities to events. The properties
given in the following definition are motivated by similar properties that hold
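Definition 1.3.1
For a given experiment, S denotes the sample space and A, A1, A2, ... represent
possible events. A set function that associates a real value P(A) with each event A is
called a probability set function, and P(A) is called the probability of A, if the following
properties are satisfied:
$0 \le P(A)$ for every $A$ (1.3.1)
$P(S) = 1$ (1.3.2)
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$ (1.3.3)
if $A_1, A_2, \ldots$ are pairwise mutually exclusive events.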
These properties all seem to agree with our intuitive concept of probability, and these few
properties are sufficient to allow a mathematical structure to be developed.
One consequence of the properties is that the null event (empty set) has probability
zero, $P(\emptyset) = 0$ (see Exercise 11). Also, if A and B are two mutually exclusive events, then
$P(A \cup B) = P(A) + P(B)$ (1.3.4)
Similarly, if $A_1, A_2, \ldots, A_k$ is a finite collection of
(See Exercise 12.) In the case of a finite sample space, notice that there is at most
a finite number of nonempty mutually exclusive events. Thus, in this case it
Example 1.3.1 The successful completion of a construction project requires that a piece of
equipment works properly. Assume that either the "project succeeds" (A1) or it fails because of
one and only one of the following: "mechanical failure" (A2) or "electrical failure" (A3). Suppose
that mechanical failure is three times as likely as electrical failure, and successful completion is
occur, we also have from (1.3.2) and (1.3.5) that P(A1) + P(A2) + P(A3) = 1. These equations
P(A3) = 0.1. The event "failure" is represented by the union $A_2 \cup A_3$, and because A2 and A3
are assumed to be mutually exclusive, we have from equation (1.3.5) that the probability of
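The arithmetic of this example can be checked with exact fractions. This sketch uses only the two facts that survive in the text, namely P(A3) = 0.1 and that mechanical failure is three times as likely as electrical failure, together with the requirement that the three probabilities sum to 1; the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_electrical = Fraction(1, 10)       # P(A3) = 0.1, as stated in the example
p_mechanical = 3 * p_electrical      # mechanical failure is three times as likely
p_success = 1 - p_mechanical - p_electrical   # P(A1) + P(A2) + P(A3) = 1

# A2 and A3 are mutually exclusive, so their probabilities add.
p_failure = p_mechanical + p_electrical

print(p_success, p_mechanical, p_electrical, p_failure)
```

The failure probability is also the complement of the success probability, which gives a quick consistency check.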
Because each term in the sum (1.3.7) corresponds to an outcome in S, it is an ordinary summation
probability of any other event then can be determined from the above
With this notation, we understand that the summation is taken over all indices i
such that $e_i$ is an outcome in A. This approach works equally well for both finite
and countably infinite sample spaces, but if A is a countably infinite set the
summation in (1.3.8) is actually an infinite series.
Example 1.3.2 If two coins are tossed as in Example 1.2.1, then S = {HH, HT, TH, TT}; if the
coins are balanced, it is reasonable to assume that each of the four outcomes is equally likely.
Because P(S) = 1, the probability assigned to each elementary event must be 1/4. Any event in
a finite sample space can be written as a finite union of distinct elementary events, so the
probability of any event is a sum including the constant term 1/4 for each elementary event
P(C) = P({HT}) + P({TH}) = 1/4 + 1/4 = 1/2
Note that the "equally likely" assumption cannot be applied indiscriminately. For example,
in Example 1.2.2 the number of heads is of interest, and the sample space is S* = {0, 1, 2}.
The elementary event {1} corresponds to the event C = {HT, TH} in S. Rather than assigning the
CLASSICAL PROBABILITY
this description. Note that the equally likely assumption requires the experiment
to be carried out in such a way that the assumption is realistic. That is, the
coin should be balanced, the die should not be loaded, the deck should be shuffled,
the lottery tickets should be well mixed, and so forth. This imposes a very
outcomes are favorable to the occurrence of the event as well as how many are in the sample space, and
then finding the ratio. Some techniques that will be useful in solving some of the more complicated counting
problems will be presented in Section 1.6.
The formula presented in (1.3.12) sometimes is referred to as classical probability. For problems in which
this method of assignment is appropriate, it is fairly easy to show that our general definition of probability is
satisfied. Specifically, for any event A,
$P(A) = \frac{n(A)}{N} \ge 0$
$P(S) = \frac{n(S)}{N} = \frac{N}{N} = 1$
$P(A \cup B) = \frac{n(A \cup B)}{N} = \frac{n(A) + n(B)}{N} = P(A) + P(B)$
if A and B are mutually exclusive.
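The counting definition and the three properties just verified can be illustrated on the two-coin sample space. This is a minimal sketch under the equally likely assumption; the function name `classical_probability` is hypothetical and not from the text.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Ratio of favorable outcomes to total outcomes, n(A)/N."""
    space = set(sample_space)
    favorable = set(event) & space   # ignore anything outside the sample space
    return Fraction(len(favorable), len(space))

S = {"HH", "HT", "TH", "TT"}
A = {"HH"}            # both heads
B = {"TT"}            # both tails, mutually exclusive with A
C = {"HT", "TH"}      # exactly one head

p_S = classical_probability(S, S)
p_C = classical_probability(C, S)
p_union = classical_probability(A | B, S)
p_sum = classical_probability(A, S) + classical_probability(B, S)
```

Here `p_union` and `p_sum` agree only because A and B share no outcomes; for overlapping events the simple sum would overcount.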
RANDOM SELECTION
A major application of classical probability arises in connection with choosing an
object or a set of objects at random from a collection of objects.
(1.3.12)
2 3 2 If an object is chosen from a finite collection of distinct objects in such a manner that each object has the
Similarly, if a subset of the objects is chosen so that each subset of the same
size has the same probability of being chosen, then we say that the subset was
Example 1.3.3 A game of chance involves drawing a card from an ordinary deck of 52 playing
cards. It should not matter whether the card comes from the top or some other part of the
deck if the cards are well shuffled. Each card would have the same probability, 1/52, of being
In Section 1.6 we will develop, among other things, a method for counting the number of
subsets of a given size.
From general properties of sets and the properties of Definition 1.3.1 we can derive other useful
properties of probability. Each of the following theorems pertains to one or
more events relative to the same experiment.
$P(A) = 1 - P(A')$ (1.4.1)
This theorem is particularly useful when an event A is relatively complicated, but its complement A'
is easier to analyze.
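As a small illustration of the complement rule, consider four tosses of a balanced coin and the event "at least one head." The sketch below assumes the 16 outcomes are equally likely and compares the complement calculation with a direct count; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=4))   # 16 equally likely sequences

# Complement: "no heads" contains a single outcome, TTTT.
no_heads = [o for o in outcomes if "H" not in o]
p_no_heads = Fraction(len(no_heads), len(outcomes))
p_at_least_one_head = 1 - p_no_heads

# Direct count of the same event, for comparison.
p_direct = Fraction(sum("H" in o for o in outcomes), len(outcomes))
```

The complement route needs only one outcome to be identified, whereas the direct route must account for fifteen.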
Example 1.4.1 An experiment consists of tossing a coin four times, and the event A of interest is
"at least one head." The event A contains most of the possible outcomes, but the complement, "no
Proof From Theorem (1.4.1), $P(A) = 1 - P(A')$. Also, from Definition (1.3.1),
we know that $P(A') \ge 0$. Therefore, $P(A) \le 1$.
Note that this theorem combined with Definition (1.3.1) implies that
$0 \le P(A) \le 1$ (1.4.2)
Equations (1.3.3), (1.3.4), and (1.3.5) provide formulas for the
probability of a union in the case of mutually exclusive events. The following
theorems provide formulas that apply more generally.
$A \cup B = (A \cap B') \cup B$ and
$A = (A \cap B) \cup (A \cap B')$
It also follows that the events $A \cap B'$ and $B$ are mutually exclusive because
$(A \cap B') \cap B = \emptyset$, so that equation (1.3.4) implies
$P(A \cup B) = P(A \cap B') + P(B)$
Similarly, $A \cap B$ and $A \cap B'$ are mutually exclusive, so that
$P(A) = P(A \cap B) + P(A \cap B')$, and therefore
$P(A \cup B) = P(A \cap B') + P(B)$
$= [P(A) - P(A \cap B)] + P(B)$
$= P(A) + P(B) - P(A \cap B)$
Example 1.4.2 Suppose one card is drawn at random from an ordinary deck of 52 playing cards.
As noted in Example 1.3.3, this means that each card has the same probability, 1/52, of being
chosen.
Let A be the event of obtaining "a red ace" and let B be the event "a heart." Then P(A) = 2/52,
P(B) = 13/52, and $P(A \cap B) = 1/52$. From Theorem (1.4.3) we have
$P(A \cup B) = 2/52 + 13/52 - 1/52 = 14/52 = 7/26$.
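The union formula in this example can be confirmed by listing the deck. The sketch below is an illustration only; the rank and suit labels and the helper name `prob` are choices made here, and each card is assumed equally likely.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}

red_ace = {card for card in deck if card[0] == "A" and card[1] in ("hearts", "diamonds")}
heart = {card for card in deck if card[1] == "hearts"}

def prob(event):
    """Classical probability of an event for one random draw."""
    return Fraction(len(event), len(deck))

p_union_counted = prob(red_ace | heart)
p_union_formula = prob(red_ace) + prob(heart) - prob(red_ace & heart)
```

Subtracting the intersection removes the ace of hearts, which would otherwise be counted once as a red ace and once as a heart.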
Proof
See Exercise 16.
It is intuitively clear that if every outcome of A is also an outcome of B, then A is no more likely to
occur than B. The next theorem formalizes this notion.
Theorem 1.4.5 If $A \subset B$, then $P(A) \le P(B)$.
$P\left(\bigcap_{i=1}^{k} A_i\right) \ge 1 - \sum_{i=1}^{k} P(A_i')$ (1.4.7)
Proof
together with inequality (1.4.6).
1.5
CONDITIONAL PROBABILITY
A major objective of probability modeling is to determine how likely it is that an event A will occur when a
certain experiment is performed. However, in numerous cases the probability assigned to A will be
affected by knowledge of the occurrence or nonoccurrence of another event B. In such an example we will
use the terminology conditional probability of A given B, and the notation $P(A \mid B)$ will be used to
distinguish between this new concept and ordinary probability P(A).
Example 1.5.1 A box contains 100 microchips, some of which were produced by factory 1 and the rest by
factory 2. Some of the microchips are defective and some are good (nondefective). An experiment
Let A be the event "obtaining a defective microchip"; consequently, A' is the event "obtaining a good
microchip." Let B be the event "the microchip was produced by factory 1" and B' the event "the microchip
was produced by factory 2." Table 1.1 gives the number of microchips in each category.
$P(A) = \frac{n(A)}{n(S)} = \frac{20}{100} = 0.20$
Now suppose that each microchip has a number stamped on it that identifies which factory produced it.
Thus, before testing whether it is defective, it can be determined whether B has occurred (produced by
factory 1) or B' has occurred (produced by factory 2). Knowledge of which factory produced the
microchip affects the likelihood that a defective microchip is selected, and the use of conditional
probability is appropriate. For example, if the event B has occurred, then the only microchips we should
consider are those in the first column of Table 1.1, and the total number is n(B) = 60. Furthermore, the only
defective chips to consider are those in both the first column and the first row, and the total number is
$n(A \cap B) = 15$. Thus, the conditional probability of A given B is
$P(A \mid B) = \frac{n(A \cap B)}{n(B)} = \frac{15}{60} = 0.25$
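The microchip calculation can be written out with exact fractions. In the sketch below, n(B) = 60 and n(A ∩ B) = 15 are taken from the text, while n(A) = 20 is an assumption inferred from P(A) = 0.20 with 100 chips, since Table 1.1 itself is not reproduced here; the variable names are illustrative.

```python
from fractions import Fraction

n_S = 100          # microchips in the box
n_B = 60           # produced by factory 1
n_A_and_B = 15     # defective and produced by factory 1
n_A = 20           # assumption: inferred from P(A) = 0.20

p_A = Fraction(n_A, n_S)

# Conditional probability by counting within the reduced sample space B.
p_A_given_B = Fraction(n_A_and_B, n_B)

# The same value as a ratio of unconditional probabilities.
p_ratio = Fraction(n_A_and_B, n_S) / Fraction(n_B, n_S)
```

Knowing the chip came from factory 1 raises the chance of a defective from 0.20 to 0.25 under these counts.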
probabilities,
$P(A \mid B) = \frac{n(A \cap B)}{n(B)} = \frac{n(A \cap B)/n(S)}{n(B)/n(S)} = \frac{P(A \cap B)}{P(B)}$
This last result can be derived under more general circumstances as follows. Suppose we conduct an
experiment with a sample space S, and suppose we are given that the event B has occurred. We wish to
know the probability that an event A has occurred given that B has occurred, written $P(A \mid B)$. That is, we
subset of B for which A is true, so the probability of A given B should be proportional to
$P(A \cap B)$, say $P(A \mid B) = kP(A \cap B)$. Similarly, $P(A' \mid B) = kP(A' \cap B)$. Together these
should represent the total probability relative to B, so
$P(A \mid B) + P(A' \mid B) = k[P(A \cap B) + P(A' \cap B)]$
$= kP[(A \cap B) \cup (A' \cap B)]$
$= kP(B)$
and, because this total must equal 1, $k = 1/P(B)$. That is,
$P(A \mid B) = \frac{P(A \cap B)}{P(A \cap B) + P(A' \cap B)} = \frac{P(A \cap B)}{P(B)}$
Relative to the sample space B, conditional probabilities defined by (1.5.1) satisfy the original definition
Thus, the properties
$P(B \mid B) = 1$, so the conditions of a probability set function are satisfied.
Table 1.2. The enumeration of possible outcomes can be a tedious problem, and useful techniques that are
Section 1.6. The values in this example are based on a counting principle, which says that if there
are $n_1$ ways of doing one thing and $n_2$ ways of doing another, then there are $n_1 n_2$ ways of
doing both. Thus, for example, the total number of ordered two-card hands that
can be formed from 52 cards (without replacement) is $52 \cdot 51 = 2652$. Similarly,
the number of ordered two-card hands in which both cards are aces is $4 \cdot 3$, the
number in which the first card is an ace and the second is not an ace is $4 \cdot 48$, and
so forth. The appropriate products for all cases are provided in Table 1.2.
TABLE 1.2
            A1          A1'         Total
A2          4 · 3       48 · 4      4 · 51
A2'         4 · 48      48 · 47     48 · 51
Total       4 · 51      48 · 51     52 · 51
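The cell counts for ordered two-card hands can be confirmed by brute-force enumeration. This sketch is an illustration, not the authors' method; the deck encoding (rank 1 standing for an ace, suits labeled 0 to 3) and the names `cells` and `is_ace` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding: rank 1 stands for an ace; suits are labeled 0-3.
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]
hands = list(permutations(deck, 2))   # ordered pairs, drawn without replacement

def is_ace(card):
    return card[0] == 1

# Cell counts keyed by (first card is an ace, second card is an ace).
cells = {(a1, a2): 0 for a1 in (True, False) for a2 in (True, False)}
for first, second in hands:
    cells[(is_ace(first), is_ace(second))] += 1

total = len(hands)
p_a1 = Fraction(cells[(True, True)] + cells[(True, False)], total)
p_a2 = Fraction(cells[(True, True)] + cells[(False, True)], total)
p_a2_given_a1 = Fraction(cells[(True, True)],
                         cells[(True, True)] + cells[(True, False)])
```

The two marginal probabilities come out equal, and the conditional probability reduces to three remaining aces among 51 remaining cards.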
For example, the probability of getting "an ace on the first draw and an ace on the second draw" is
given by
$P(A_1 \cap A_2) = \frac{4 \cdot 3}{52 \cdot 51}$
Suppose one is interested in P(A1) without regard to what happens on the second
This same result would have occurred if A1 had been partitioned by another event, say B, which
deals only with the face value of the second card. This follows because $n(B \cup B') = 51$,
and relative to the $52 \cdot 51$ ordered pairs of cards,
$n(A_1) = 4\,n(B) + 4\,n(B') = 4\,n(B \cup B') = 4 \cdot 51$
The numerators of probabilities such as $P(A_1)$, $P(A_1')$, $P(A_2)$, and $P(A_2')$, which deal with
only one of the draws, appear in the margins of Table 1.2. These probabilities may be referred to as
marginal probabilities. Note that the marginal probabilities in fact can be
computed directly from the original 52-card sample space, and it is not necessary to consider the sample
space of ordered pairs at all. For example, $P(A_1) = (4 \cdot 51)/(52 \cdot 51) = 4/52$, which is the
probability that would be obtained for one draw from the original 52-card sample space. Clearly, this result
would apply to sampling-without-replacement problems in general. What may be less intuitive is that these
results also apply to marginal probabilities such as $P(A_2)$, and not just to the outcomes on the first draw.
That is, if the outcome of the first draw is not known, then $P(A_2)$ also can be computed from the original
sample space and is given by $P(A_2) = 4/52$. This can be verified in this example because
$A_2 = (A_2 \cap A_1) \cup (A_2 \cap A_1')$
and
$P(A_2) = \frac{4 \cdot 3}{52 \cdot 51} + \frac{48 \cdot 4}{52 \cdot 51} = \frac{4 \cdot 51}{52 \cdot 51} = \frac{4}{52}$
$P(A_2 \mid A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)} = \frac{4 \cdot 3}{(4 \cdot 3) + (4 \cdot 48)}$
That is, given that A1 is true, we are restricted to the first column of Table 1.2, and the relative proportion
of the time that A2 is true on the reduced sample space is $(4 \cdot 3)/[(4 \cdot 3) + (4 \cdot 48)]$. Again,
it may be less obvious, but it is possible to carry this problem one step further and compute
$P(A_2 \mid A_1)$ directly in terms of the 51-card conditional sample space, and obtain the much simpler
solution that $P(A_2 \mid A_1) = 3/51$, there being three aces remaining in the 51 remaining cards in the
conditional sample space. Thus, it is common practice in this type of problem to compute the conditional
probabilities and marginal probabilities directly from the one-dimensional sample spaces (one marginal and
one conditional space), rather than obtain the
the first draw. The conditional probability that an ace is drawn on the second draw given that
an ace was
obtained on the first draw is
P(A1 n A2)
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
22 CHAPTER 1 PROBABILITY
43
52 51
This procedure would extend to three or more draws (without replacement) where, for example, if
A3 denotes obtaining "an ace on the third draw," then
432
52 51 50
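The joint, marginal, and conditional probabilities for the two-card example can be checked by brute-force enumeration of the ordered pairs. The sketch below is only an illustration: the deck encoding (rank 0 standing for an ace) and the names `deck`, `pairs`, and `prob` are assumptions introduced here, not part of the text.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding: each card is (rank, suit); rank 0 stands for an ace.
deck = [(rank, suit) for rank in range(13) for suit in range(4)]

# All 52 * 51 ordered pairs of distinct cards (two draws without replacement).
pairs = list(permutations(deck, 2))

def prob(event):
    """Classical probability of an event over the equally likely ordered pairs."""
    return Fraction(sum(1 for p in pairs if event(p)), len(pairs))

def ace_first(p):
    return p[0][0] == 0

def ace_second(p):
    return p[1][0] == 0

p_joint = prob(lambda p: ace_first(p) and ace_second(p))  # P(A1 and A2)
p_a1 = prob(ace_first)                                    # marginal P(A1)
p_a2 = prob(ace_second)                                   # marginal P(A2)
p_a2_given_a1 = p_joint / p_a1                            # P(A2 | A1)

print(p_joint, p_a1, p_a2, p_a2_given_a1)
```

Fractions print in lowest terms, so 4/52 appears as 1/13 and 3/51 as 1/17.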
An indication of the general validity of this approach for computing conditional probabilities is
obtained by considering $P(A_2 \mid A_1)$ in the example. Relative to the joint sample
the given event A1 can occur on the first draw and 51 is the total number of
possible outcomes in the conditional sample space for the second draw;
also, 12 = 4 . 3 represents the number of ways the given event A1 can occur
times the number of ways a success, A2, can occur in the conditional
sample space. Because the number of ways A1 can occur is a common multiplier
in the numerator and denominator when counting ordered pairs, one may equivalently
count directly in the one-dimensional conditional space associated with
events that pertain to the first draw from a deck, and if A is an event that
that involve information about both draws. More generally, if B1, B2, ..., Bk
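Theorem 1.5.2 Total Probability If $B_1, B_2, \ldots, B_k$ is a collection of mutually exclusive and exhaustive events, then for any event $A$,
$$P(A) = \sum_{i=1}^{k} P(B_i)P(A \mid B_i) \qquad (1.5.6)$$
Proof
The events $A \cap B_1, A \cap B_2, \ldots, A \cap B_k$ are mutually exclusive and their union is $A$, so
$$P(A) = \sum_{i=1}^{k} P(A \cap B_i)$$
and the theorem results from applying Theorem 1.5.1 to each term in this summation.
Theorem 1.5.2 sometimes is known as the Law of Total Probability, because it corresponds to mutually exclusive ways in which $A$ can occur relative to a partition of the total sample space $S$.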
Sometimes it is helpful to illustrate this result with a tree diagram. One such diagram for the case of
three events B1, B2, and B3 is given in Figure 1.3.
The probability associated with branch $B_j$ is $P(B_j)$, and the probability associated with each branch
categorized according to which shift produced them. As before, the experiment consists of choosing a
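Various probabilities can be computed directly from the following table of defective ($A$) and nondefective ($A'$) microchips by shift.
          B1    B2    B3    Totals
A          5    10     5      20
A'        20    25    35      80
Totals    25    35    40     100
For example, $P(B_1) = 25/100$, $P(B_2) = 35/100$, $P(B_3) = 40/100$, $P(A \mid B_1) = 5/25$, $P(A \mid B_2) = 10/35$, and $P(A \mid B_3) = 5/40$. It is possible to compute $P(A)$ either directly from the table, $P(A) = 20/100 = 0.20$, or by using the Law of Total Probability:
$$P(A) = P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + P(B_3)P(A \mid B_3) = \frac{25}{100} \cdot \frac{5}{25} + \frac{35}{100} \cdot \frac{10}{35} + \frac{40}{100} \cdot \frac{5}{40} = \frac{20}{100}$$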
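The agreement between the direct count and the Law of Total Probability can be confirmed with exact fractions. This is a minimal sketch that assumes the shift counts read from the table; the dictionary layout and the names `shifts`, `prior`, and `cond` are illustrative choices, not from the text.

```python
from fractions import Fraction

# Counts read from the shift table: (defective, total) for B1, B2, B3.
shifts = {"B1": (5, 25), "B2": (10, 35), "B3": (5, 40)}
total = sum(n for _, n in shifts.values())  # 100 microchips in all

# P(B_j) and P(A | B_j) for each shift.
prior = {b: Fraction(n, total) for b, (_, n) in shifts.items()}
cond = {b: Fraction(d, n) for b, (d, n) in shifts.items()}

# Law of Total Probability: P(A) = sum over j of P(B_j) * P(A | B_j).
p_a = sum(prior[b] * cond[b] for b in shifts)

# Direct count from the table margins: 20 defective out of 100.
p_a_direct = Fraction(sum(d for d, _ in shifts.values()), total)

print(p_a, p_a_direct)
```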
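Example 1.5.4 Box 1 contains the 25 microchips from shift 1, box 2 contains the 35 microchips from shift 2, and box 3 contains the 40 microchips from shift 3. Suppose a box is selected at random, so that each box has probability 1/3, and a microchip is then selected at random from that box. By the Law of Total Probability,
$$P(A) = \frac{1}{3} \cdot \frac{5}{25} + \frac{1}{3} \cdot \frac{10}{35} + \frac{1}{3} \cdot \frac{5}{40} = \frac{57}{280}$$
[Figure: tree diagram for selecting a box and then a microchip. Box 1 holds 5 defective and 20 good, box 2 holds 10 defective and 25 good, box 3 holds 5 defective and 35 good; the branch probabilities for a defective are 5/25, 10/35, and 5/40.]
As a result of this new experiment, suppose that the component obtained is defective, but it is not known which box it came from. It is possible to compute the probability that it came from a particular box.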
Theorem 1.5.3 Bayes' Rule If we assume the conditions of Theorem 1.5.2, then for each $j = 1, 2, \ldots, k$,
$$P(B_j \mid A) = \frac{P(B_j)P(A \mid B_j)}{\sum_{i=1}^{k} P(B_i)P(A \mid B_i)} \qquad (1.5.7)$$
Proof
From Definition 1.5.1 and the Multiplication Theorem (1.5.5) we have
$$P(B_j \mid A) = \frac{P(A \cap B_j)}{P(A)} = \frac{P(B_j)P(A \mid B_j)}{P(A)}$$
The theorem follows by replacing the denominator with the right side of (1.5.6).
For the data of Example 1.5.4, the conditional probability that the microchip came from box 1, given
that it is defective, is
$$P(B_1 \mid A) = \frac{(1/3)(5/25)}{(1/3)(5/25) + (1/3)(10/35) + (1/3)(5/40)} = \frac{56}{171} = 0.327$$
Similarly, $P(B_2 \mid A) = 80/171 = 0.468$ and $P(B_3 \mid A) = 35/171 = 0.205$. Notice that these differ from the unconditional probabilities, $P(B_j) = 1/3$.
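Bayes' Rule amounts to normalizing the products $P(B_j)P(A \mid B_j)$ by their sum. The helper below is a sketch under that reading; the function name `posterior` and its list-based interface are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return the list of P(B_j | A) given lists of P(B_j) and P(A | B_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A and B_j)
    p_a = sum(joint)                                      # Law of Total Probability
    return [j / p_a for j in joint]

# Data of Example 1.5.4: each box chosen with probability 1/3.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(5, 25), Fraction(10, 35), Fraction(5, 40)]
post = posterior(priors, likelihoods)

print(post)
print([round(float(p), 3) for p in post])
```

The same helper applies to any partition; for instance, likelihoods of 1/4, 1/2, and 0 with equal priors give posteriors of 1/3, 2/3, and 0.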
Example 1.5.5 A man starts at the point O on the map shown in Figure 1.6. He first chooses a
path at random and follows it to point $B_1$, $B_2$, or $B_3$. From that point, he chooses a
It might be of interest to know the probability that the man arrives at point A4. This can be computed
from the Law of Total Probability:
$$P(A_4) = P(B_1)P(A_4 \mid B_1) + P(B_2)P(A_4 \mid B_2) + P(B_3)P(A_4 \mid B_3) = \frac{1}{3} \cdot \frac{1}{4} + \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot 0 = \frac{1}{4}$$
Suppose the man arrives at point $A_4$, but it is not known which route he took. The probability that he passed through a particular point, $B_1$, $B_2$, or $B_3$, can be computed from Bayes' Rule. For example,
$$P(B_1 \mid A_4) = \frac{(1/3)(1/4)}{(1/3)(1/4) + (1/3)(1/2) + (1/3)(0)} = \frac{1}{3}$$
which agrees with the unconditional probability, $P(B_1) = 1/3$. This is an example of a very special situation called "independence," which we will pursue in the next section. However, this does not occur in every case. For example, an application of Bayes' Rule also leads to $P(B_2 \mid A_4) = 2/3$, which does not agree with $P(B_2) = 1/3$. Thus, if he arrived at point $A_4$, it is twice as likely that he passed through point $B_2$ as it is that he passed through $B_1$. Of course, the most striking result concerns point $B_3$, because $P(B_3 \mid A_4) = 0$.
In some situations, knowledge that an event $A$ has occurred does not affect the probability that an event $B$ will occur. In other words, $P(B \mid A) = P(B)$. We saw this happen in Example 1.5.5, because the probability of passing through point $B_1$ was 1/3 whether or not the knowledge that the man arrived at point $A_4$ was taken into account. As a result of the Multiplication Theorem (1.5.5), an equivalent formulation of this situation is $P(A \cap B) = P(A)P(B \mid A) = P(A)P(B)$. When this happens, the two events are said to be independent.
Definition 1.5.2
Two events $A$ and $B$ are called independent events if
$$P(A \cap B) = P(A)P(B) \qquad (1.5.9)$$
Otherwise, $A$ and $B$ are called dependent events.
For nonnull events $A$ and $B$, independence is equivalent to either of the conditions
$$P(A \mid B) = P(A) \qquad P(B \mid A) = P(B)$$
We saw examples of both independent and dependent events in Example 1.5.5. There was also an example of mutually exclusive events, because $P(B_3 \mid A_4) = 0$, which implies $P(B_3 \cap A_4) = 0$. There is often confusion between the concepts of independent events and mutually exclusive events. Actually, these are quite different notions, and perhaps this is seen best by comparisons involving conditional probabilities. Specifically, if $A$ and $B$ are mutually exclusive, then $P(A \mid B) = P(B \mid A) = 0$, whereas for independent nonnull events the conditional probabilities are nonzero as noted by Theorem 1.5.4. In other words, the property of being mutually exclusive involves a very strong form of dependence, because, for nonnull events, the occurrence of one event precludes the occurrence of the other event.
There are many applications in which events are assumed to be independent.
Example 1.5.6 A "system" consists of several components that are hooked up in some particular configuration. It is often assumed in applications that the failure of one component does not affect the likelihood that another component will fail. Thus, the failure of one component is assumed to be independent of the failure of another component.
A series system of two components, $C_1$ and $C_2$, is illustrated by Figure 1.7. It is easy to think of such a system in terms of two electrical components (for example, batteries in a flashlight) where current must pass through both components for the system to function. If $A_1$ is the event "$C_1$ fails" and $A_2$ is the event "$C_2$ fails," then the event "the system fails" is $A_1 \cup A_2$. Suppose that $P(A_1) = 0.1$ and $P(A_2) = 0.2$. If we assume that $A_1$ and $A_2$ are independent, then the probability that the system fails is
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) = P(A_1) + P(A_2) - P(A_1)P(A_2) = 0.1 + 0.2 - (0.1)(0.2) = 0.28$$
The probability that the system works properly is $1 - 0.28 = 0.72$.
FIGURE 1.7 Series system of two components
Notice that the assumption of independence permits us to factor the probability of the joint event, $P(A_1 \cap A_2)$, into the product of the marginal probabilities, $P(A_1)P(A_2)$.
Another common example involves the notion of a parallel system, as illustrated in Figure 1.8. For a parallel system to fail, it is necessary that both components fail, so the event "the system fails" is $A_1 \cap A_2$. The probability that the system fails is $P(A_1 \cap A_2) = P(A_1)P(A_2) = (0.1)(0.2) = 0.02$, again assuming the failures are independent.
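The two failure probabilities can be reproduced with a pair of small functions. This sketch assumes independent component failures, as in the example; the function names `series_failure` and `parallel_failure` are illustrative only.

```python
def series_failure(p1, p2):
    """P(A1 or A2): a series system fails if either component fails."""
    return p1 + p2 - p1 * p2

def parallel_failure(p1, p2):
    """P(A1 and A2): a parallel system fails only if both components fail."""
    return p1 * p2

p1, p2 = 0.1, 0.2  # component failure probabilities from Example 1.5.6
print(round(series_failure(p1, p2), 2), round(parallel_failure(p1, p2), 2))
```

Rounding is used only because binary floating point does not represent 0.1 and 0.2 exactly.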
the outcome of the first card is recorded and then the card is replaced in the deck and the deck is shuffled before the second draw is made. This type of sampling is referred to as sampling with replacement, and it would be reasonable to assume that the draws are independent trials. In this case $P(A_1 \cap A_2) = P(A_1)P(A_2)$. There are many other problems in which it is reasonable
Theorem 1.5.5 Two events $A$ and $B$ are independent if and only if the following pairs of events are also independent:
$A$ and $B'$.
$A'$ and $B$.
$A'$ and $B'$.
Proof
See Exercise 38.
It is also possible to extend the notion of independence to more than two events.
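Definition 1.5.3
The $k$ events $A_1, A_2, \ldots, A_k$ are said to be independent or mutually independent if for every $j = 2, 3, \ldots, k$ and every subset of distinct indices $i_1, i_2, \ldots, i_j$,
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_j})$$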
Example 1.5.7 A box contains eight tickets, each labeled with a binary number. Two are labeled 111, two are labeled 100, two 010, and two 001. An experiment consists of drawing one ticket at random from the box. Let $A$ be the event "the first digit is 1," $B$ the event "the second digit is 1," and $C$ the event "the third digit is 1." This is illustrated by Figure 1.9. It follows that $P(A) = P(B) = P(C) = 4/8 = 1/2$ and that $P(A \cap B) = P(A \cap C) = P(B \cap C) = 2/8 = 1/4$; thus $A$, $B$, and $C$ are pairwise independent. However, they are not mutually independent, because
$$P(A \cap B \cap C) = \frac{2}{8} = \frac{1}{4} \ne \frac{1}{8} = P(A)P(B)P(C)$$
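The distinction between pairwise factorization and three-way factorization can be checked mechanically for any box of tickets. The sketch below assumes tickets are given as three-character strings of 0s and 1s; the function `check_independence` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def check_independence(tickets):
    """Return (pairwise, three_way) flags for the digit events A, B, C."""
    n = len(tickets)

    def prob(positions):
        # Probability that every listed digit position shows a 1.
        hits = sum(1 for t in tickets if all(t[i] == "1" for i in positions))
        return Fraction(hits, n)

    single = [prob([i]) for i in range(3)]
    pairwise = all(prob([i, j]) == single[i] * single[j]
                   for i, j in combinations(range(3), 2))
    three_way = prob([0, 1, 2]) == single[0] * single[1] * single[2]
    return pairwise, three_way

box = ["111", "111", "100", "100", "010", "010", "001", "001"]
print(check_independence(box))
```

For the box of Example 1.5.7 the pairwise flag is true and the three-way flag is false. Changing one 111 ticket to 110 and one 100 ticket to 101 reverses both flags, so neither property implies the other.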
Example 1.5.8 In Figure 1.9, let us change the number on one ticket in the first column from 111 to 110, and the number on one ticket in the second column from 100 to 101. We still have
$$P(A) = P(B) = P(C) = \frac{1}{2}$$
but
$$P(B \cap C) = \frac{1}{8} \ne \frac{1}{4} = P(B)P(C)$$
and
$$P(A \cap B \cap C) = \frac{1}{8} = P(A)P(B)P(C)$$
In this case we have three-way factorization, but not independence of all pairs.
1.6 COUNTING TECHNIQUES
In many experiments with finite sample spaces, such as games of chance, it may be reasonable to assume that all possible outcomes are equally likely. In that case a realistic probability model should result by following the classical approach and taking the probability of any event $A$ to be $P(A) = n(A)/N$, where $N$ is the total number of possible outcomes and $n(A)$ is the number of these outcomes that correspond to occurrence of the event $A$. Counting the number of ways in which an event may occur can be a tedious problem in complicated experiments. A few helpful counting techniques will be discussed.
MULTIPLICATION PRINCIPLE First note that if one operation can be performed in $n_1$ ways and a second operation can be performed in $n_2$ ways, then there are $n_1 \cdot n_2$ ways in which both operations can be carried out.
Example 1.6.1 Suppose a coin is tossed and then a marble is selected at random from a box containing one black (B), one red (R), and one green (G) marble. The possible outcomes are HB, HR, HG, TB, TR, and TG. For each of the two possible outcomes of the coin there are three marbles that may be selected, for a total of $2 \cdot 3 = 6$ possible outcomes. The situation also is easily illustrated by a tree diagram, as in Figure 1.10.
FIGURE 1.10 Tree diagram of two-stage experiment
Another application of the multiplication principle was discussed in Example 1.5.2 in connection with counting the number of ordered two-card hands.
Note that the multiplication principle can be extended to more than two operations. In particular, if the $i$th of $r$ successive operations can be performed in $n_i$ ways, then the total number of ways to carry out all $r$ operations is the product
$$\prod_{i=1}^{r} n_i = n_1 \cdot n_2 \cdots n_r \qquad (1.6.1)$$
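The multiplication principle corresponds to forming a Cartesian product of the per-stage choices. As a small sketch (the variable names `coin`, `marbles`, and `ways` are assumptions for illustration), the coin-and-marble experiment can be enumerated directly:

```python
from itertools import product
from math import prod

coin = ["H", "T"]
marbles = ["B", "R", "G"]

# Each outcome pairs one coin result with one marble.
outcomes = ["".join(pair) for pair in product(coin, marbles)]
print(outcomes)

# Extension to r operations: multiply the numbers of ways n_1, ..., n_r.
ways = [len(coin), len(marbles)]
print(prod(ways))
```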
One standard type of counting problem is covered by the following theorem.
Theorem 1.6.1 If there are $N$ possible outcomes of each of $r$ trials of an experiment, then there are $N^r$ possible outcomes in the sample space.
Example 1.6.2 How many ways can a 20-question true-false test be answered? The answer is $2^{20}$.
Example 1.6.3 How many subsets are there from a set of $m$ elements? In forming a subset, one must decide for each element whether to include that element in the subset. Thus for each of $m$ elements there are two choices, which give a total of $2^m$ possible subsets. This includes the null set, which corresponds to the case of not including any element in the subset.
As suggested earlier, the way an experiment is carried out or the method of sampling may affect the
sample space and the probability assignment over the sample space. In
Example 1.6.4 If five cards are drawn from a deck of 52 cards with replacement, then there are $52^5$ possible hands. If the five cards are drawn without replacement, then the more general multiplication principle may be applied to determine that there are $52 \cdot 51 \cdot 50 \cdot 49 \cdot 48$ possible hands. In the first case, the same card may occur more than once in the same hand. In the second case, however, a card may not be repeated.
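Both counts are quick to evaluate. The sketch below uses `math.perm` from the Python standard library (available in Python 3.8 and later); the variable names are illustrative.

```python
from math import perm

with_replacement = 52 ** 5         # N^r ordered hands, repeats allowed
without_replacement = perm(52, 5)  # 52 * 51 * 50 * 49 * 48 ordered hands

print(with_replacement, without_replacement)
```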
Note that in both cases in the above example, order is considered important. That is, two five-card hands may eventually end up with the same five cards, but they are counted as different hands in the example if the cards were obtained in a different order. For example, let all five cards be spades. The outcome (ace, king, queen, jack, ten) is different from the outcome (king, ace, queen, jack, ten). If order had not been considered important, both of these outcomes would be considered the same; indeed, $5! = 120$ different ordered outcomes correspond to this same (unordered) outcome. On the other hand, only one outcome corresponds to all five cards being the ace of spades (in the sampling-with-replacement case), whether the cards are ordered or unordered. This
noted earlier that there are fewer distinct results if order is not taken into
account, but the probability of any one of these unordered results occurring then
would be greater. Note also that it is common practice to assume that order is
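The number of permutations of $n$ distinct objects taken $r$ at a time is
$${}_nP_r = \frac{n!}{(n-r)!}$$
Proof
To fill $r$ positions from $n$ objects, the first position may be filled in $n$ ways using any one of the $n$ objects, the second position may be filled in $n - 1$ ways, and so on until $n - (r - 1)$ objects are left to fill the $r$th position. By the multiplication principle, the total number of ways is $n(n-1) \cdots (n-r+1) = n!/(n-r)!$.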
Example 1.6.5 The number of permutations of the four letters a, b, c, d taken two at a time is $4!/2! = 12$. These are displayed in Figure 1.11. In picking two out of the four letters, there are six unordered ways to choose two letters from the four, as given by the top row. Each combination of two letters then can be permuted 2! ways to get the total of 12 ordered arrangements.
FIGURE 1.11 Permutations of four objects taken two at a time
ab ac ad bc bd cd
ba ca da cb db dc
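The two rows of Figure 1.11 can be regenerated with the standard library, which makes the relation between ordered and unordered selections concrete. This is a sketch only; the names `unordered` and `ordered` are chosen here for illustration.

```python
from itertools import combinations, permutations

letters = "abcd"

# Six unordered pairs (the top row of the figure).
unordered = ["".join(c) for c in combinations(letters, 2)]

# Twelve ordered pairs: each unordered pair in both of its 2! orders.
ordered = ["".join(p) for p in permutations(letters, 2)]

print(unordered)
print(len(ordered))
```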
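Example 1.6.6 A box contains $n$ tickets, each marked with a different integer, $1, 2, 3, \ldots, n$. If three tickets are selected at random without replacement, what is the probability of obtaining tickets with consecutive integers? One possible solution would be to let the sample space consist of all ordered triples $(i, j, k)$, where $i$, $j$, and $k$ are different integers in the range 1 to $n$. The number of such triples is ${}_nP_3 = n!/(n-3)! = n(n-1)(n-2)$. The triples that consist of consecutive integers would be $(1, 2, 3), (2, 3, 4), \ldots, (n-2, n-1, n)$ or any of the triples formed by permuting the entries in these triples. There would be $3!\,(n-2) = 6(n-2)$ such triples, so the desired probability is
$$\frac{6(n-2)}{n(n-1)(n-2)} = \frac{6}{n(n-1)}$$
The number of combinations of $n$ distinct objects chosen $r$ at a time, $\binom{n}{r}$, is related to the number of permutations: each combination of $r$ objects can be permuted in $r!$ ways, so
$${}_nP_r = \binom{n}{r}\,r! = \frac{n!}{(n-r)!}$$
Dividing by $r!$ gives the desired expression for $\binom{n}{r}$:
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$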
Thus, the number of combinations of four letters taken two at a time is
$$\binom{4}{2} = \frac{4!}{2!\,2!} = 6$$
as noted above. If order is considered, then the number of arrangements becomes $6 \cdot 2! = 12$ as before. Thus, $\binom{4}{2}$ counts the number of paired symbols in either the first or second row, but not both, in Figure 1.11. It also is possible to solve the problem of Example 1.6.6 using combinations, with a sample space consisting of all combinations of the integers $1, 2, \ldots, n$ taken three at a time. Equivalently, this would be the collection of all subsets of size 3 from the set $\{1, 2, 3, \ldots, n\}$, of which there are $\binom{n}{3} = n(n-1)(n-2)/6$. Exactly $n - 2$ of these subsets consist of consecutive integers, so the probability is $(n-2)/\binom{n}{3} = 6/[n(n-1)]$, as before. This shows that some problems can be solved using either combinations or permutations. Usually, if there is a choice, the combination approach is simpler because the sample space is smaller. However, combinations are not appropriate in some problems.
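That the ordered and unordered sample spaces give the same answer for the consecutive-integer problem can be verified by enumeration for a small box. The sketch below assumes sampling without replacement; the ticket count `n = 10` and the helper names are hypothetical choices made for the check.

```python
from fractions import Fraction
from itertools import combinations, permutations

def is_consecutive(triple):
    """True if the three integers, in some order, are k, k+1, k+2."""
    a, b, c = sorted(triple)
    return b == a + 1 and c == b + 1

def prob_consecutive(n, ordered):
    """Probability of three consecutive integers, drawing without replacement."""
    space = permutations if ordered else combinations
    outcomes = list(space(range(1, n + 1), 3))
    hits = sum(1 for t in outcomes if is_consecutive(t))
    return Fraction(hits, len(outcomes))

n = 10  # hypothetical ticket count chosen for the check
print(prob_consecutive(n, ordered=True), prob_consecutive(n, ordered=False))
```

Both calls agree with $6/[n(n-1)]$; the unordered version simply works with a sample space that is smaller by a factor of 3!.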
Example 1.6.7 In Example 1.6.6, suppose that the sampling is done with replacement. Now, the same number can be repeated in the triples $(i, j, k)$, so that the sample space has $n^3$ outcomes. There are still only $6(n - 2)$ triples of consecutive integers, because repeated integers cannot be consecutive. The probability of consecutive integers in this case is $6(n - 2)/n^3$. Integers can be
Example 1.6.8 A familiar use of the combination notation is in expressing the binomial expansion
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \qquad (1.6.4)$$
In this case, $\binom{n}{k}$ is the coefficient of $a^k b^{n-k}$, and it represents the number of ways of choosing $k$ of the $n$ factors $(a + b)$ from which to use the $a$ term, with the $b$ term being used from the remaining $n - k$ factors.
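The expansion can be spot-checked numerically for particular integers. This is only a sanity check under arbitrarily chosen values of `a`, `b`, and `n`, not a proof; the function name `binomial_expansion` is illustrative.

```python
from math import comb

def binomial_expansion(a, b, n):
    """Sum of C(n, k) * a**k * b**(n - k) for k = 0, ..., n."""
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))

a, b, n = 2, 3, 5  # arbitrary integers chosen for illustration
print(binomial_expansion(a, b, n), (a + b) ** n)
```

Setting `a = b = 1` gives the sum of all binomial coefficients, which equals $2^n$, the number of subsets of a set of $n$ elements.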
Example 1.6.9 The combination concept can be used to determine the number of subsets of a set
If order is taken into account as in Example 1.6.4, then the number of ordered five-card hands is ${}_{52}P_5$ = (52
INDISTINGUISHABLE OBJECTS
The discussion to this point has dealt with distinguishable objects. There are also many applications involving objects that are not all distinguishable.
Suppose there are five marbles, two black and three white, but otherwise indistinguishable. In Figure 1.12, we represent the distinct arrangements of two black (B) and three white (W) marbles.
Notice that arrangements are distinguishable if they differ by exchanging marbles of different colors, but not if the exchange involves the same color. We will refer to these 10 different arrangements as permutations of the five objects even though the objects are not all distinguishable.
A more general way to count such permutations first would be to introduce labels for the objects, say $B_1, B_2, W_1, W_2, W_3$. There are 5! permutations of these labeled objects, but within each color there are permutations that we don't want to count. We can compensate by dividing by the number of permutations of black objects (2!) and of white objects (3!). Thus, the number of permutations of nondistinguishable objects is
$$\frac{5!}{2!\,3!} = 10 \qquad (1.6.5)$$
Theorem 1.6.5 The number of distinguishable permutations of $n$ objects of which $r$ are of one kind and $n - r$ are of another kind is
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!} \qquad (1.6.6)$$
Clearly, this concept can be generalized to the case of permuting $k$ types of objects.
Theorem 1.6.6 The number of permutations of $n$ objects of which $r_1$ are of one kind, $r_2$ of a second kind, ..., $r_k$ of a $k$th kind is
$$\frac{n!}{r_1!\,r_2! \cdots r_k!} \qquad (1.6.7)$$
Proof
This follows from the argument of Example 1.6.11, except with $k$ different colors of balls.
Example 1.6.12 You have 10 marbles: two black, three white, and five red, but otherwise not distinguishable. The number of different permutations is
$$\frac{10!}{2!\,3!\,5!} = 2520$$
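The count in (1.6.7) is easy to evaluate with exact integer arithmetic. The helper below is a sketch; the name `multinomial` and its variadic interface are assumptions made for this illustration.

```python
from math import factorial

def multinomial(*counts):
    """Number of distinguishable permutations: n! / (r_1! r_2! ... r_k!)."""
    result = factorial(sum(counts))
    for r in counts:
        result //= factorial(r)  # each division is exact
    return result

print(multinomial(2, 3), multinomial(2, 3, 5))
```

The first call reproduces the 10 arrangements of two black and three white marbles, and the second the 2520 arrangements of Example 1.6.12.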
The notion of permutations of n objects, not all of which are distinguishable, is related to yet another
type of operation with n distinct objects.
PARTITIONING Let us select $r$ objects from $n$ distinct objects and place them in a box or "cell," and then place the remaining $n - r$ objects in a second cell. Clearly, there are $\binom{n}{r}$ ways of doing this (because permuting the objects within a cell will not produce a new result), and this is referred to as the number of ways of partitioning $n$ objects into two cells with $r$ objects in one cell and $n - r$ objects in the other.
Note that partitioning assumes that the number of objects to be placed in each cell is fixed, and that the order in which the objects are placed into cells is not considered. By successively selecting the objects, the same reasoning extends to more than two cells. For example, 12 objects can be partitioned into four cells of three objects each in
$$\frac{12!}{3!\,3!\,3!\,3!} = 369{,}600$$
ways. This is also the number of ways of arranging 12 popsicles, of which three are red, three are green, three are orange, and three are yellow, if popsicles of the same color are otherwise indistinguishable.
PROBABILITY COMPUTATIONS
As mentioned earlier, if it can be assumed that all possible outcomes are equally likely to occur, then the
classical probability concept is useful for assigning prob- abilities to events, and the counting techniques
reviewed in this section may be helpful in computing the number of ways an event may occur.
whether the items are indistinguishable, and so on, may have an effect on the
number of possible outcomes.
Example 1.6.14 A student answers 20 true-false questions at random. The probability of getting 100% on the test is $P(100\%) = 1/2^{20} = 0.00000095$. We wish to know the probability of getting 80% right, that is, answering 16 questions correctly. We do not care which 16 questions are answered correctly, so there are $\binom{20}{16}$ ways of choosing the questions answered correctly, and $P(80\%) = \binom{20}{16}/2^{20} = 0.0046$.
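The two probabilities for the true-false test follow from counting equally likely answer patterns. A brief sketch using `math.comb` (the variable names are illustrative):

```python
from math import comb

n_questions = 20
total = 2 ** n_questions                   # equally likely answer patterns
p_perfect = 1 / total                      # all 20 correct
p_sixteen = comb(n_questions, 16) / total  # exactly 16 correct

print(p_perfect, p_sixteen)
```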
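Example 1.6.15 Sampling Without Replacement A box contains 10 black marbles and 20 white marbles, and five marbles are selected without replacement. The probability of getting exactly two black marbles is
$$P(\text{exactly 2 black}) = \frac{\binom{10}{2}\binom{20}{3}}{\binom{30}{5}} \qquad (1.6.8)$$
There are $\binom{30}{5}$ total possible outcomes. Also, there are $\binom{10}{2}$ ways of choosing the two black marbles from the 10 black marbles, and $\binom{20}{3}$ ways of choosing the remaining three white marbles from the 20 white marbles. By the multiplication principle, there are $\binom{10}{2}\binom{20}{3}$ ways of achieving the event of getting two black marbles. Note that order was not considered important in this problem, although all 30 marbles are considered distinct in this computation, both in considering the total number of outcomes in the sample space and in considering how many outcomes correspond to the desired event occurring. Even though the question does not distinguish between the order of outcomes, it is possible to consider the question relative to the larger sample space of equally likely ordered outcomes. In that case one would have ${}_{30}P_5 = \binom{30}{5} \cdot 5!$ possible outcomes and
$$P(\text{exactly 2 black}) = \frac{\binom{10}{2}\binom{20}{3} \cdot 5!}{\binom{30}{5} \cdot 5!} \qquad (1.6.9)$$
It also is possible to work with conditional probabilities for a particular ordering. For example,
$$P(BBWWW) = \frac{10}{30} \cdot \frac{9}{29} \cdot \frac{20}{28} \cdot \frac{19}{27} \cdot \frac{18}{26}$$
Similarly,
$$P(BWBWW) = \frac{10}{30} \cdot \frac{20}{29} \cdot \frac{9}{28} \cdot \frac{19}{27} \cdot \frac{18}{26}$$
and so on. Thus, each particular ordering has the same probability. If we do not wish to distinguish between the ordering of the black and white marbles, then
$$P(\text{exactly 2 black}) = \binom{5}{2}\,\frac{10 \cdot 9 \cdot 20 \cdot 19 \cdot 18}{30 \cdot 29 \cdot 28 \cdot 27 \cdot 26}$$
which again is the same as equation (1.6.8). That is, there are $\binom{5}{2} = 10$ different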
particular orderings that have two black and three white marbles (see Figure 1.12).
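The agreement between the unordered count of equation (1.6.8) and the conditional probability approach can be confirmed with exact fractions. This is a sketch under the example's assumptions (10 black, 20 white, five draws without replacement); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Unordered count, as in equation (1.6.8).
p_unordered = Fraction(comb(10, 2) * comb(20, 3), comb(30, 5))

# Conditional probability approach: one particular ordering, BBWWW ...
p_bbwww = (Fraction(10, 30) * Fraction(9, 29) * Fraction(20, 28)
           * Fraction(19, 27) * Fraction(18, 26))

# ... times the number of orderings with two black and three white marbles.
p_sequential = comb(5, 2) * p_bbwww

print(p_unordered == p_sequential, float(p_unordered))
```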
One could consider $\binom{5}{2}$ as the number of ways of choosing two positions out of the five positions in which to place two black marbles. If a particular order is not
particular sequence, it follows that there are only $\binom{5}{2}$ unordered sequences rather than 5! sequences. Thus, although two black marbles may be distinct, permuting them does not produce a different result. The order of the black marbles within themselves was not considered important when defining the ordered sequences; only the order between black and white was considered. Thus the coefficient
$P(B) = 10/30$ and $P(W) = 20/30$. Indeed, the assumption that the black marbles and white marbles are indistinguishable within themselves appears more natural in the conditional probability approach. Nevertheless, the distinctness assumption is a convenient aid in the first approach to obtain the more basic equally likely sample space, even though the question itself does not
$$P(\text{exactly 2 black}) = \binom{5}{2}\left(\frac{10}{30}\right)^2\left(\frac{20}{30}\right)^3 \qquad (1.6.11)$$
Of course, in this case the outcomes on each draw are independent. If one chooses to use the classical approach in this case, it is more convenient to consider the sample space of $30^5$ equally likely ordered outcomes; in Example 1.6.15 it is more convenient just to consider the sample space of unordered outcomes as in equation (1.6.8), rather than the ordered outcomes as in equation (1.6.9). For event $A$, "exactly 2 black," one then has
$$P(A) = \frac{n(A)}{N} = \frac{\binom{5}{2}\,10^2 \cdot 20^3}{30^5}$$
The form in this case remains quite similar to equation (1.6.11), although the argument would be somewhat different. There are $\binom{5}{2}$ different patterns in which the ordered arrangements may contain two black and three white marbles, and for each pattern there are $10^2 \cdot 20^3$ distinct arrangements that can be formed in this sample space.
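For sampling with replacement, the independent-draws form (1.6.11) and the classical count over $30^5$ ordered outcomes can be compared exactly. The sketch below assumes the same box of 10 black and 20 white marbles; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Independent-draws form, as in equation (1.6.11).
p_binomial = comb(5, 2) * Fraction(10, 30) ** 2 * Fraction(20, 30) ** 3

# Classical form: favorable ordered outcomes over 30**5 equally likely ones.
p_classical = Fraction(comb(5, 2) * 10**2 * 20**3, 30**5)

print(p_binomial == p_classical, float(p_binomial))
```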
Because many diverse types of probability problems can be stated, a unique approach often may be
needed to identify the mutually exclusive ways that an event can occur in such a manner that these
the observed result is uncertain before experimentation. The basic approach involves defining the sample
outcomes of the experiment, and defining an event mathematically as the set of outcomes associated with occurrence of the event. The primary motivation for assigning probability to an event involves the long-term relative frequency interpretation. However, the approach of defining probability in terms of a simple set of axioms is more
event has occurred, then the events are considered independent. Care should be
EXERCISES
2. Two gum balls are obtained from the machine in Exercise 1 from two trials. The order of the outcomes is important. Assume that at least two balls of each color are in the machine.
What is an appropriate sample space?
How many total possible events are there that contain eight outcomes?
Express the following events as unions of elementary events. $C_1$ = getting a red ball on the first trial, $C_2$ = getting at least one red ball, $C_1 \cap C_2$, $C_1' \cap C_2$.
3. There are four basic blood groups: O, A, B, and AB. Ordinarily, anyone can receive the blood of a donor from their own group. Also, anyone can receive the blood of a donor from the O group, and any of the four types can be used by a recipient from the AB group.
All other possibilities are undesirable. An experiment consists of drawing a pint of blood
and determining its type for each of the next two donors who enter a blood bank.
List the possible (ordered) outcomes of this experiment. List the outcomes
corresponding to the event that the second donor can receive the blood of the first
donor. List the outcomes corresponding to the event that each donor can receive
the blood of the other.
4. An experiment consists of drawing gum balls from a gum-ball machine until a red ball is
obtained. Describe a sample space for this experiment.
5. The number of alpha particles emitted by a radioactive sample in a fixed time interval is counted.
Give a sample space for this experiment.
The elapsed time is measured until the first alpha particle is emitted. Give a sample space for this experiment.
6. An experiment is conducted to determine what fraction of a piece of metal is gold. Give a
sample space for this experiment.
7. A randomly selected car battery is tested and the time of failure is recorded. Give an
appropriate sample space for this experiment.
8. We obtain 100 gum balls from a machine, and we get 20 red (R), 30 black (B), and 50 green (G) gum balls.
Can we use, as a probability model for the color of a gum ball from the machine, one given by $p_1 = P(R) = 0.2$, $p_2 = P(B) = 0.3$, and $p_3 = P(G) = 0.5$?
Suppose we later notice that some yellow (Y) gum balls are also in the machine. Could we use as a model $p_1 = 0.2$, $p_2 = 0.3$, $p_3 = 0.5$, and $p_4 = P(Y) = 0.1$?
9. In Exercise 2, suppose that each of the nine possible outcomes in the sample space is equally likely to occur. Compute each of the following:
P(both red).
$P(C_1)$.
$P(C_2)$.
$P(C_1 \cap C_2)$.
$P(C_1' \cap C_2)$.
$P(C_1 \cup C_2)$.
10. Consider Exercise 3. Suppose, for a particular racial group, the four blood types are equally likely to occur.
Compute the probability that the second donor can receive blood from the first donor.
Compute the probability that each donor can receive blood from the other.
Compute the probability that neither can receive blood from the other.
11. Prove that $P(\emptyset) = 0$. Hint: Let $A_i = \emptyset$ for all $i$ in equation (1.3.3).
12. Prove equation (1.3.5). Hint: Let $A_i = \emptyset$ for all $i > k$ in equation (1.3.3).
13. When an experiment is performed, one and only one of the events $A_1$, $A_2$, or $A_3$ will occur. Find $P(A_1)$, $P(A_2)$, and $P(A_3)$ under each of the following assumptions:
$P(A_1) = P(A_2) = P(A_3)$.
$P(A_1) = P(A_2)$ and $P(A_3) = 1/2$.
$P(A_1) = 2P(A_2) = 3P(A_3)$.
14. A balanced coin is tossed four times. List the possible outcomes and compute the probability of each of the following events:
exactly three heads.
at least one head.
the number of heads equals the number of tails.
the number of heads exceeds the number of tails.
15. Two part-time teachers are hired by the mathematics department and each is assigned at
random to teach a single course, in trigonometry, algebra, or calculus. List the outcomes in the
sample space and find the probability that they will teach different courses. Assume that more
than one section of each course is offered.
16. Prove Theorem 1.4.4. Hint: Write $A \cup B \cup C = (A \cup B) \cup C$ and apply Theorem 1.4.3.
17. Prove Theorem 1.4.5. Hint: If $A \subset B$, then we can write $B = A \cup (B \cap A')$, a disjoint union.
19. Let $P(A) = P(B) = 1/3$ and $P(A \cap B) = 1/10$. Find the following:
$P(B')$.
$P(A \cup B')$.
$P(B \cap A')$.
$P(A' \cup B')$.
20. Let $P(A) = 1/2$, $P(B) = 1/8$, and $P(C) = 1/4$, where $A$, $B$, and $C$ are mutually exclusive. Find the following:
$P(A \cup B \cup C)$.
$P(A' \cap B' \cap C')$.
21. The event that exactly one of the events $A$ or $B$ occurs can be represented as
23. A certain family owns two television sets, one color and one black-and-white set. Let $A$ be the event "the color set is on" and $B$ the event "the black-and-white set is on." If $P(A) = 0.4$, $P(B) = 0.3$, and $P(A \cup B) = 0.5$, find the probability of each event:
both are on.
the color set is on and the other is off.
exactly one set is on.
neither set is on.
25. A box contains three good cards and two bad (penalty) cards. Player A chooses a card and then player B chooses a card. Compute the following probabilities:
P(A good).
P(B good | A good).
P(B good | A bad).
P(B good $\cap$ A good) using (1.5.5).
Write out the sample space of ordered pairs and compute P(B good $\cap$ A good) and P(B good | A good) directly from definitions. (Note: Assume that the cards are distinct.)
P(B good).
P(A good | B good).
26. Repeat Exercise 25, but assume that player A looks at his card, replaces it in the box, and
remixes the cards before player B draws.
27. A bag contains five blue balls and three red balls. A boy draws a ball, and then draws another without replacement. Compute the following probabilities:
P(2 blue balls).
P(1 blue and 1 red).
P(at least 1 blue).
P(2 red balls).
28. In Exercise 27, suppose a third ball is drawn without replacement. Find:
(a) P(no red balls left after third draw).
(b) P(1 red ball left).
(c) P(first red ball on last draw).
(d) P(a red ball on last draw).
29. A family has two children. It is known that at least one is a boy. What is the probability
that the family has two boys, given at least one boy? Assume P(boy) = 1/2.
30. Two cards are drawn from a deck of cards without replacement.
What is the probability that the second card is a heart, given that the first card is a heart?
What is the probability that both cards are hearts, given that at least one is a heart?
31. A box contains five green balls, three black balls, and seven red balls. Two balls are selected at random without replacement from the box. What is the probability that:
both balls are red?
both balls are the same color?
32. A softball team has three pitchers, A, B, and C, with winning percentages of 0.4, 0.6, and 0.8, respectively. These pitchers pitch with frequency 2, 3, and 5 out of every 10 games, respectively. In other words, for a randomly selected game, $P(A) = 0.2$, $P(B) = 0.3$, and $P(C) = 0.5$. Find:
P(team wins game) = $P(W)$.
P(A pitched game | team won) = $P(A \mid W)$.
33. One card is selected from a deck of 52 cards and placed in a second deck. A card then is
output. Of their respective outputs, 5%, 3%, and 2% are defective. A bolt is selected.
What is the probability that it is defective?
Given that it is defective, what is the probability that it was made by machine 1?
36. Drawer A contains five pennies and three dimes, while drawer B contains three pennies and seven dimes. A drawer is
Three independent components are hooked in series. Each component fails with probability p. What is the probability that the system does not fail?
Three independent components are hooked in parallel. Each component fails with probability p. What is the probability that the system does not fail?
Consider the following system with assigned probabilities of malfunction for the five components. Assume that malfunctions occur independently.
The probability that a marksman hits a target is 0.9 on any given shot, and repeated shots are independent. He has two pistols; one contains two bullets and the other contains only one bullet. He selects a pistol at random and shoots at the target until the pistol is empty. What is the probability of hitting the target exactly one time?
43. Rework Exercise 27 assuming that the balls are chosen with replacement.
44. In a marble game a shooter may (A) miss, (B) hit one marble out and stick in the ring, or (C) hit one marble out and leave the ring. If B occurs, the shooter shoots again. If P(A) = p1, P(B) = p2, and P(C) = p3, and these probabilities do not change from shot to shot, then express the probability of getting out exactly three marbles on one turn. What is the probability of getting out exactly x marbles in one turn? Show that the probability of getting one marble is greater than the probability of getting zero marbles if
46. A, B, and C are events such that P(A) = 1/3, P(B) = 1/4, and P(C) = 1/5. Find
47. A bowl contains four lottery tickets with the numbers 111, 221, 212, and 122. One ticket is drawn at random from the bowl, and A_i is the event "2 in the ith place," i = 1, 2, 3. Determine whether A_1, A_2, and A_3 are independent.
49. License plate numbers consist of two letters followed by a four-digit number, such as SB7904 or AY1637.
(a) How many different plates are possible if letters and digits can be repeated?
(b) Answer (a) if letters can be repeated but digits cannot.
(c) How many of the plates in (b) have a four-digit number that is greater than 5500?
50. In how many ways can three boys and three girls sit in a row if boys and girls must alternate?
51. How many odd three-digit numbers can be formed from the digits 0, 1, 2, 3, 4 if digits can
be repeated, but the first digit cannot be zero?
52. Suppose that from 10 distinct objects, four are chosen at random with replacement.
(a) What is the probability that no object is chosen more than once?
(b) What is the probability that at least one object is chosen more than once?
53. A restaurant advertises 256 types of nachos. How many topping ingredients must be available to meet this claim if plain corn chips count as one type?
54. A club consists of 17 men and 13 women, and a committee of five members must be chosen.
(b) How many committees are possible with three men and two women?
(c) Answer (b) if a particular man must be included.
55. A football coach has 49 players available for duty on a special kick receiving team.
(a) If 11 must be chosen to play on this special team, how many different teams are possible?
(b) If the 49 include 24 offensive and 25 defensive players, what is the probability that a randomly selected team has five offensive and six defensive players?
(n (n
1
(b)
r)r-1
57. Provide solutions for the following sums:
74
74\ 74\
2) + 4) +
58. Seven people show up to apply for jobs as cashiers at a discount store.
(a) If only three jobs are available, in how many ways can three be selected from the seven applicants?
(b) Suppose there are three male and four female applicants, and all seven are equally qualified, so the three jobs are filled at random. What is the probability that the three hired are all of the same sex?
(c) In how many different ways could the seven applicants be lined up while waiting for an interview?
(d) If there are four females and three males, in how many ways can the applicants be lined up if the first three are female?
59. The club in Exercise 54 must elect three officers: president, vice-president, and secretary. How many different ways can this turn out?
60. How many ways can 10 students be lined up to get on a bus if a particular pair of students refuse to follow each other in line?
61. Each student in a class of size n was born in a year with 365 days, and each reports his or her birth date (month and day, but not year).
(a) How many ways can this happen?
(b) How many ways can this happen with no repeated birth dates?
(c) What is the probability of no matching birth dates?
(d) In a class of 23 students, what is the probability of at least one repeated birth date?
63. How many ways can you partition 26 letters into three boxes containing 9, 11, and 6 letters?
64. How many ways can you permute 9 a's, 11 b's, and 6 c's?
65. A contest consists of finding all of the code words that can be formed from the letters in the name "ATARI." Assume that the letter A can be used twice, but the others at most once.
(a) How many five-letter words can be formed?
(b) How many two-letter words can be formed?
(c) How many words can be formed?
66. Three buses are available to transport 60 students on a field trip. The buses seat 15, 20, and 25 passengers, respectively. How many different ways can the students be loaded on the buses?
67. A certain machine has nine switches mounted in a row. Each switch has three positions, a,
possible?
Answer (a) if each position is used three times.
69. Suppose the winning number in a lottery is a four-digit number determined by drawing four slips of paper (without replacement) from a box that contains nine slips numbered consecutively 1 through 9 and then recording the digits in order from smallest to largest.
(a) How many different lottery numbers are possible?
(b) Find the probability that the winning number has only odd digits.
(c) How many different lottery numbers are possible if the digits are recorded in the order they were drawn?
70. Consider four dice A, B, C, and D numbered as follows: A has 4 on four faces and 0 on two faces; B has 3 on all six faces; C has 2 on four faces and 6 on two faces; and D has 5 on three faces and 1 on the other three faces. Suppose the statement A > B means that the face showing on A is greater than on B, and so forth. Show that P[A > B] = P[B > C] = P[C > D] = P[D > A] = 2/3. In other words, if an opponent chooses a die, you can always select one that will defeat him with probability 2/3.
71. A laboratory test for steroid use in professional athletes has detection rates given in the following table:
                 Test Result
Steroid Use      +      -
Yes             .90    .10
No              .01    .99
2.1 INTRODUCTION
Our purpose is to develop mathematical models for describing the probabilities of outcomes or events occurring in a sample space. Because mathematical equations are expressed in terms of numerical values rather than as heads, colors, or other properties, it is convenient to define a function, known as a random variable, that associates each outcome in the experiment with a real number. We then can express the probability model for the experiment in terms of this associated random variable. Of course, in many experiments the results of interest already are numerical quantities, and in that case the natural function to use as the random variable is the identity function.
Definition 2.1.1 Random Variable  A random variable, say X, is a function defined over a sample space, S, that associates a real number, X(e) = x, with each possible outcome e in S.
54 CHAPTER 2 RANDOM VARIABLES AND THEIR DISTRIBUTIONS
Capital letters, such as X, Y, and Z will be used to denote random variables. The lower case letters
x, y, z, ... will be used to denote possible values that the corresponding random
variables can attain. For mathematical reasons, it will be necessary to restrict
the types of functions that are considered to be random variables. We will
discuss this point after the following example.
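FIGURE 2.1 Sample space for two rolls of a four-sided die
Each of the events $B_1$, $B_2$, $B_3$, and $B_4$ in $S$ contains the outcomes that have a common maximum. In other words, $X$ has value $x = 1$ over $B_1$, $x = 2$ over $B_2$, $x = 3$ over $B_3$, and $x = 4$ over $B_4$. Other random variables also could be considered.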
If $A$ denotes an event consisting of real numbers (a real-valued event), then we want the associated set
$$B = \{e \mid e \in S \text{ and } X(e) \in A\} \qquad (2.1.1)$$
to be an event in the underlying sample space $S$. Even though $A$ and $B$ are subsets of different spaces, they usually are referred to as equivalent events, and we write
$$P[X \in A] = P(B) \qquad (2.1.2)$$
The notation $P_X(A)$ sometimes is used instead of $P[X \in A]$ in equation (2.1.2). This defines a set function on the collection of real-valued events, and it can be shown to satisfy the three basic conditions of a probability set function, as given by Definition 1.3.1.
Although the random variable $X$ is defined as a function of $e$, it usually is possible to express the events of interest only in terms of the real values that $X$ assumes. Thus, our notation usually will suppress the dependence on the outcomes in $S$, such as we have done in equation (2.1.2).
For instance, in Example 2.1.1, if we were interested in the event of obtaining a score of "at most 3," one approach would be to represent the event in terms of some interval that contains the values 1, 2, and 3 but not 4, such as $A = (-\infty, 3]$. The associated equivalent event in $S$ is $B = B_1 \cup B_2 \cup B_3$, and its probability is $P[X \in A] = P(B) = 1/16 + 3/16 + 5/16 = 9/16$. A convenient notation for $P[X \in A]$, in this example, is $P[X \le 3]$. Actually, any other real event containing 1, 2, and 3 but not 4 could be used in this way, but intervals, and especially those of the form $(-\infty, x]$, will be of special importance in developing the properties of random variables.
As mentioned in Section 1.3, if the probabilities can be determined for each elementary event in a discrete sample space, then the probability of any event can be calculated from these by expressing the event as a union of mutually exclusive elementary events and summing their probabilities. A more general approach can be based on assigning probabilities to intervals of the form $(-\infty, x]$ for all real $x$. For this reason, we consider as random variables only functions $X$ for which sets of the form
$$B = [X \le x] = \{e \mid e \in S \text{ and } X(e) \in (-\infty, x]\} \qquad (2.1.3)$$
are events in the sample space $S$. The probabilities of other real events can be evaluated in terms of the probabilities assigned to such intervals. For example, for the game of Example 2.1.1, we have determined that $P[X \le 3] = 9/16$, and it also follows, by a similar argument, that $P[X \le 2] = 1/4$. Because $(-\infty, 2]$ contains 1 and 2 but not 3, and $(-\infty, 3] = (-\infty, 2] \cup (2, 3]$, it follows that $P[X = 3] = P[X \le 3] - P[X \le 2] = 9/16 - 1/4 = 5/16$.
Other examples of random variables can be based on the sampling problems of Section 1.6.
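The probabilities quoted for the game of Example 2.1.1 can be checked by brute-force enumeration. The Python sketch below is an illustration only: it assumes the 16 ordered pairs of rolls are equally likely and that the score is the maximum of the two rolls, and the function names `max_roll_pdf` and `prob_at_most` are hypothetical helpers introduced here.

```python
from fractions import Fraction
from itertools import product


def max_roll_pdf(sides=4):
    """Return {x: P[X = x]}, where X is the maximum of two fair rolls."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    weight = Fraction(1, len(outcomes))  # assumption: ordered pairs equally likely
    pdf = {}
    for i, j in outcomes:
        x = max(i, j)  # X(e) for the outcome e = (i, j)
        pdf[x] = pdf.get(x, Fraction(0)) + weight
    return pdf


def prob_at_most(pdf, x):
    """P[X <= x], summed over the values of X in (-infinity, x]."""
    return sum(p for value, p in pdf.items() if value <= x)


pdf = max_roll_pdf(4)
p_le_3 = prob_at_most(pdf, 3)
p_le_2 = prob_at_most(pdf, 2)
p_eq_3 = p_le_3 - p_le_2  # P[X = 3] = P[X <= 3] - P[X <= 2]
print(p_le_3, p_le_2, p_eq_3)
```

Exact fractions are used so that the results can be compared directly with the values 9/16, 1/4, and 5/16 given in the text.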
Example 2.1.2 In Example 1.6.15, we discussed several alternative approaches for computing the probability of obtaining exactly two black marbles, when selecting five (without replacement) from a collection of 10 black and 20 white marbles. Suppose we are concerned with the general problem of obtaining $x$ black marbles, for arbitrary $x$. Our approach will be to define a random variable $X$ as the number of black marbles in the sample, and to determine the probability $P[X = x]$ for every possible value $x$. This is easily accomplished with the approach given by equation (1.6.8), and the result is
$$P[X = x] = \frac{\binom{10}{x}\binom{20}{5-x}}{\binom{30}{5}} \qquad x = 0, 1, 2, 3, 4, 5 \qquad (2.1.4)$$
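As a quick numerical check of equation (2.1.4), the sketch below evaluates the probability of $x$ black marbles for each possible $x$ and confirms that the probabilities sum to 1. The function name `black_marble_pdf` and its keyword parameters are hypothetical, chosen here only for illustration; `math.comb` supplies the binomial coefficients.

```python
from fractions import Fraction
from math import comb


def black_marble_pdf(x, black=10, white=20, drawn=5):
    """P[X = x] when `drawn` marbles are taken without replacement."""
    if x < 0 or x > drawn or x > black or drawn - x > white:
        return Fraction(0)  # impossible counts get probability zero
    return Fraction(comb(black, x) * comb(white, drawn - x),
                    comb(black + white, drawn))


probs = {x: black_marble_pdf(x) for x in range(6)}
total = sum(probs.values())
for x, p in probs.items():
    print(x, p, float(p))
```

The entry for $x = 2$ is the probability of exactly two black marbles discussed in Example 1.6.15.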
Random variables that arise from counting operations, such as the random variables in Examples 2.1.1
subscripted notation, $f_X(x)$, is used.
The following theorem gives general properties that any discrete pdf must satisfy.
2.2 DISCRETE RANDOM VARIABLES
Theorem 2.2.1 A function $f(x)$ is a discrete pdf if and only if it satisfies both of the following properties for at most a countably infinite set of reals $x_1, x_2, \ldots$:
$$f(x_i) \ge 0 \qquad (2.2.2)$$
for all $x_i$, and
$$\sum_{\text{all } x_i} f(x_i) = 1 \qquad (2.2.3)$$
Proof
Property (2.2.2) follows from the fact that the value of a discrete pdf is a probability and must be nonnegative. Because $x_1, x_2, \ldots$ represent all possible values of $X$, the events $[X = x_1], [X = x_2], \ldots$ form an exhaustive partition of the sample space, so their probabilities sum to 1, which is property (2.2.3).
Consequently, any pdf must satisfy properties (2.2.2) and (2.2.3), and any function that satisfies properties (2.2.2) and (2.2.3) will assign probabilities consistent with Definition 1.3.1.
In some problems, it is possible to express the pdf by means of an equation, such as equation (2.1.4). In other cases it is more convenient to give the pdf in tabular form. For example, one way to specify the pdf of $X$ for the random variable $X$ in Example 2.1.1 is given in Table 2.1.
TABLE 2.1 Values of the discrete pdf of the maximum of two rolls of a four-sided die
x      1     2     3     4
f(x)   1/16  3/16  5/16  7/16
Of course, these are the probabilities, respectively, of the events $B_1$, $B_2$, $B_3$, and $B_4$ in $S$.
A graphic representation of $f(x)$ is also of some interest. It would be possible to leave $f(x)$ undefined at points that are not possible values of $X$, but it is convenient to define $f(x)$ as zero at such points. The graph of the pdf in Table 2.1 is shown in Figure 2.2.
FIGURE 2.2 Discrete pdf of the maximum of two rolls of a four-sided die
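Example 2.2.1 Example 2.1.1 involves two rolls of a four-sided die. Now we will roll a 12-sided (dodecahedral) die twice. If each face is marked with an integer, 1 through 12, then each value is equally likely to occur on a single roll of the die. As before, we define a random variable $X$ to be the maximum obtained on the two rolls. It is not hard to see that for each value $x$ there are $2x - 1$ equally likely pairs of rolls whose maximum is $x$, so the pdf of $X$ must have the form $f(x) = c(2x - 1)$ for $x = 1, 2, \ldots, 12$. One way to determine $c$ is to use property (2.2.3):
$$1 = \sum_{x=1}^{12} c(2x - 1) = c(12)^2$$
So $c = 1/(12)^2 = 1/144$.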
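The counting argument behind Example 2.2.1 can be verified directly. The sketch below is illustrative and rests on the assumption that all 144 ordered pairs of rolls are equally likely; it counts how many pairs have each maximum, checks that the count equals $2x - 1$, and then obtains the normalizing constant $c$ from the requirement that the probabilities sum to 1.

```python
from fractions import Fraction
from itertools import product

sides = 12

# Count the ordered pairs (i, j) whose maximum equals x.
counts = {x: 0 for x in range(1, sides + 1)}
for i, j in product(range(1, sides + 1), repeat=2):
    counts[max(i, j)] += 1

# The number of ways to obtain maximum x should be the odd number 2x - 1.
ways_match = all(counts[x] == 2 * x - 1 for x in counts)

# Normalizing constant: probabilities c * (2x - 1) must sum to 1.
c = Fraction(1, sum(2 * x - 1 for x in range(1, sides + 1)))
pdf = {x: c * (2 * x - 1) for x in counts}
print(ways_match, c, sum(pdf.values()))
```

Changing `sides` to 4 reproduces the distribution of the four-sided game in Example 2.1.1.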
As mentioned in the last section, another way to specify the distribution of probability is to assign probabilities to intervals of the form $(-\infty, x]$, for all real $x$. The probability assigned to such an interval is given by the cumulative distribution function.
Definition 2.2.2
The cumulative distribution function (CDF) of a random variable $X$ is defined for any real $x$ by
$$F(x) = P[X \le x]$$
FIGURE 2.3 The CDF of the maximum of two rolls of a four-sided die
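The function $F(x)$ often is referred to simply as the distribution function of $X$, and the subscripted notation, $F_X(x)$, sometimes is used. We often will use a short notation to indicate that a distribution of a particular form is appropriate: if we write $X \sim f(x)$ or $X \sim F(x)$, this will mean that the random variable $X$ has pdf $f(x)$ and CDF $F(x)$. As seen in Figure 2.3, the CDF of the distribution given in Table 2.1 is a nondecreasing step function, and the sizes of the jumps correspond to the values of $f(x)$ at those points, as can be seen by comparing Figures 2.2 and 2.3. The general relationship between $F(x)$ and $f(x)$ for a discrete distribution is given by the following theorem.
Theorem 2.2.2 Let $X$ be a discrete random variable with pdf $f(x)$ and CDF $F(x)$. If the possible values of $X$ are indexed in increasing order, $x_1 < x_2 < x_3 < \cdots$, then $f(x_1) = F(x_1)$, and for any $i > 1$,
$$f(x_i) = F(x_i) - F(x_{i-1})$$
Furthermore, if $x < x_1$ then $F(x) = 0$, and for any other real $x$
$$F(x) = \sum_{x_i \le x} f(x_i)$$
where the summation is taken over all indices $i$ such that $x_i \le x$.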
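The relationship between a discrete pdf and its CDF can be illustrated with the maximum of two rolls of a four-sided die. The sketch below assumes the pdf values 1/16, 3/16, 5/16, and 7/16 at $x = 1, 2, 3, 4$ (these follow from the probabilities computed in Section 2.1); the names `cdf` and `recovered` are hypothetical. It builds the step-function CDF by summation and then recovers the pdf from successive differences of the CDF.

```python
from fractions import Fraction
from itertools import accumulate

# Assumed pdf of the maximum of two rolls of a four-sided die.
values = [1, 2, 3, 4]
pdf = [Fraction(1, 16), Fraction(3, 16), Fraction(5, 16), Fraction(7, 16)]

# Heights of the CDF steps at the possible values x_1 < x_2 < ...
cdf_at_values = list(accumulate(pdf))


def cdf(x):
    """F(x): sum of f(x_i) over all possible values x_i <= x."""
    return sum((p for v, p in zip(values, pdf) if v <= x), Fraction(0))


# Recover the pdf: f(x_1) = F(x_1), f(x_i) = F(x_i) - F(x_{i-1}) for i > 1.
recovered = [cdf_at_values[0]] + [
    cdf_at_values[i] - cdf_at_values[i - 1] for i in range(1, len(values))
]
print(cdf_at_values, cdf(2.5), recovered == pdf)
```

Evaluating `cdf` between two possible values, such as at 2.5, returns the height of the lower step, which is the step-function behavior shown in Figure 2.3.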
The CDF of any random variable must satisfy the properties of the following theorem.
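Theorem 2.2.3 A function $F(x)$ is a CDF for some random variable $X$ if and only if it satisfies the following properties:
$$\lim_{x \to -\infty} F(x) = 0 \qquad (2.2.8)$$
$$\lim_{x \to \infty} F(x) = 1 \qquad (2.2.9)$$
$$\lim_{h \to 0^+} F(x + h) = F(x) \qquad (2.2.10)$$
$$a < b \text{ implies } F(a) \le F(b) \qquad (2.2.11)$$
The first two properties say that $F(x)$ can be made arbitrarily close to 0 or 1 by taking $x$ arbitrarily large, and negative or positive, respectively. In the examples considered so far, it turns out that $F(x)$ actually assumes these limiting values. Property (2.2.10) says that $F(x)$ is continuous from the right. Notice that in Figure 2.3 the only discontinuities occur at the values 1, 2, 3, and 4, and the limit of $F(x)$ as $x$ approaches one of these values from the right is the value of $F(x)$ at that point. On the other hand, as $x$ approaches these values from the left, the limit of $F(x)$ is the value of $F(x)$ on the lower step, so $F(x)$ is not (in general) continuous from the left. Property (2.2.11) says that $F(x)$ is nondecreasing, which is easily seen to be the case in Figure 2.3. In general, this property follows from the fact that an interval of the form $(-\infty, b]$ can be represented as the union of two disjoint intervals,
$$(-\infty, b] = (-\infty, a] \cup (a, b]$$
for any $a < b$. It follows that $F(b) = F(a) + P[a < X \le b] \ge F(a)$, because $P[a < X \le b] \ge 0$, and thus property (2.2.11) is obtained. The same argument gives
$$P[a < X \le b] = F(b) - F(a)$$
This reduces the problem of computing probabilities for events defined in terms of intervals of the form $(a, b]$ to taking differences with $F(x)$.
Generally, it is easier to understand the nature of a random variable and its probability distribution by considering the pdf directly, rather than the CDF, although the CDF will provide a good basis for defining continuous probability distributions. This will be considered in the next section.
Some important properties of probability distributions are described by expected values.
Definition 2.2.3
If $X$ is a discrete random variable with pdf $f(x)$, then the expected value of $X$ is
$$E(X) = \sum_{x} x f(x)$$
The sum is taken over all possible values of $X$; it is an ordinary sum if the range of $X$ is finite and an infinite series if the range of $X$ is infinite. In the latter case, if the infinite series is not absolutely convergent, then we will say that $E(X)$ does not exist. Other common notations for $E(X)$ include $\mu$, possibly with a subscript, $\mu_X$. The terms mean and expectation also are often used.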
Example 2.2.2 A box contains four chips. Two are labeled with the number 2, one is labeled with a 4, and the other with an 8. The average of the numbers on the four chips is $(2 + 2 + 4 + 8)/4 = 4$. The experiment of choosing a chip at random and recording its number can be associated with a discrete random variable $X$ having distinct values $x = 2$, 4, or 8, with $f(2) = 1/2$ and $f(4) = f(8) = 1/4$. The corresponding expected value or mean is
$$\mu = E(X) = 2\left(\tfrac{1}{2}\right) + 4\left(\tfrac{1}{4}\right) + 8\left(\tfrac{1}{4}\right) = 4$$
as before. Notice that this also could model selection from a larger collection, as long as the possible observed values of $X$ and the respective proportions in the collection, $f(x)$, remain the same as in the present example.
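The computation in Example 2.2.2 can be written as a short Python sketch. It builds the pdf from the chip labels, assuming each chip is equally likely to be chosen, and then forms the weighted sum that defines the expected value; the helper name `expected_value` is hypothetical.

```python
from fractions import Fraction


def expected_value(pdf):
    """E(X) = sum of x * f(x) over the possible values x."""
    return sum(x * p for x, p in pdf.items())


chips = [2, 2, 4, 8]

# Each chip is assumed equally likely, so f(x) is the proportion labeled x.
pdf = {}
for label in chips:
    pdf[label] = pdf.get(label, Fraction(0)) + Fraction(1, len(chips))

mean = expected_value(pdf)
print(pdf, mean)
```

Because `pdf` depends only on the proportions of each label, the same result holds for any larger collection with the same proportions, as noted in the example.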
There is an analogy between the distribution of probability to values, $x$, and the distribution of mass to points in a physical system. For example, if masses of 0.5, 0.25, and 0.25 grams are placed at the respective points $x = 2$, 4, and 8 cm on the horizontal axis, then the value $2(0.5) + 4(0.25) + 8(0.25) = 4$ is the "center of mass" or balance point of the system.
In the previous example $E(X)$ coincides with one of the possible values of $X$, but this is not always the case, as illustrated by the following example.
Example 2.2.3 A game of chance consists of drawing two chips at random without replacement from the box considered in Example 2.2.2. If the numbers on the two chips match, then the player wins $2; otherwise, she loses $1. Let $X$ be the amount won by the player on a single play of the game. There are only two possible values, $X = 2$ and $X = -1$, which correspond to a match and no match, respectively. The distribution of $X$ is $f(2) = 1/6$ and $f(-1) = 5/6$, and consequently the expected amount won is $E(X) = (-1)(5/6) + (2)(1/6) = -1/2$. If the game is played $M$ times, the relative frequencies of losing and winning approach $f(-1)$ and $f(2)$, respectively, and thus the player's average winnings approach $E(X)$ as $M$ approaches infinity.
Notice also that the game will be more equitable if the payoff to the player is changed to $5 rather than $2, because the resulting expected amount won then will be $(-1)(5/6) + (5)(1/6) = 0$.
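The expected winnings in the chip-matching game can be checked by enumerating the possible draws. The sketch below assumes that two chips are drawn without replacement from the box of Example 2.2.2 and that all six unordered pairs are equally likely, which is consistent with the stated value $f(2) = 1/6$; the function name `expected_winnings` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

chips = [2, 2, 4, 8]


def expected_winnings(win_amount, loss_amount=-1):
    """Expected amount won when a match pays win_amount, otherwise loss_amount."""
    # Assumption: two chips drawn without replacement, all pairs equally likely.
    pairs = list(combinations(range(len(chips)), 2))
    matches = sum(chips[i] == chips[j] for i, j in pairs)
    p_match = Fraction(matches, len(pairs))
    return win_amount * p_match + loss_amount * (1 - p_match)


original_game = expected_winnings(2)  # payoff of $2 for a match
fair_game = expected_winnings(5)      # payoff of $5 for a match
print(original_game, fair_game)
```

The two calls compare the original payoff of $2 with the more equitable payoff of $5 discussed above.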
Example 2.3.1 Each work day a man rides a bus to his place of business. Although a new bus
arrives promptly every five minutes, the man generally arrives at the bus stop at a random time
between bus arrivals. Thus, we might take his waiting time on any given morning to be a random
variable X.
Although in practice we usually measure time only to the nearest unit (seconds, minutes, etc.), in theory
we could measure time to within some arbitrarily small unit. Thus, even though
in practice it might be possible to regard X as a discrete
FIGURE 2.5 Graphs of the CDF F(x) and the pdf f(x) of the waiting time
2.3 CONTINUOUS RANDOM VARIABLES
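Suppose the man has observed over the years that the frequency of days when he waits no more than $x$ minutes is proportional to $x$, so that the CDF has the form $F(x) = cx$ for $x$ in $[0, 5]$. He never waits longer than five minutes. In other words, $P[0 \le X \le 5] = 1$, and it follows that $1 = F(5) = c \cdot 5$, and thus $c = 1/5$, and $F(x) = x/5$ if $0 \le x \le 5$. It also follows that $F(x) = 0$ if $x < 0$ and $F(x) = 1$ if $x > 5$.
Another way to study this distribution is to consider the frequency of waiting times that fall in short intervals. If this frequency is proportional to the length of the interval, then the condition this imposes on the distribution of $X$ is
$$P[x < X \le x + \Delta x] = F(x + \Delta x) - F(x) = c\,\Delta x$$
for all $0 \le x < x + \Delta x \le 5$ and some $c > 0$. Of course, this implies that if $F(x)$ is differentiable at $x$, then its derivative is $F'(x) = c$. For $x < 0$ or $x > 5$, the derivative also exists, but $F'(x) = 0$ because $P[x < X \le x + \Delta x] = 0$ when $x$ and $x + \Delta x$ are not possible values of $X$.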
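The waiting-time distribution of Example 2.3.1 can be explored numerically. The sketch below is a minimal illustration assuming the CDF $F(x) = x/5$ on $[0, 5]$, with $F(x) = 0$ below 0 and $F(x) = 1$ above 5; the function names are hypothetical. It computes interval probabilities as differences of the CDF and estimates the derivative of $F$ with a difference quotient.

```python
def waiting_time_cdf(x, interval=5.0):
    """CDF of the waiting time when buses arrive every `interval` minutes."""
    if x < 0:
        return 0.0
    if x > interval:
        return 1.0
    return x / interval


def interval_probability(a, b):
    """P[a < X <= b] = F(b) - F(a)."""
    return waiting_time_cdf(b) - waiting_time_cdf(a)


def difference_quotient(x, h=1e-3):
    """Approximate F'(x) by (F(x + h) - F(x)) / h."""
    return (waiting_time_cdf(x + h) - waiting_time_cdf(x)) / h


p_one_to_three = interval_probability(1, 3)  # chance of waiting 1 to 3 minutes
slope_inside = difference_quotient(2.0)      # close to c = 1/5
slope_outside = difference_quotient(7.0)     # zero beyond the interval
print(p_one_to_three, slope_inside, slope_outside)
```

The difference quotient is roughly constant inside $[0, 5]$ and zero outside it, matching the behavior of the derivative described above.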
In general, if $F(x)$ is the CDF of a continuous random variable $X$, then we will denote its derivative (where it exists) by $f(x)$, and under certain conditions, which will be specified shortly, we will call $f(x)$ the probability density function of $X$. In our example, $F(x)$ can be represented for values of $x$ in the interval $[0, 5]$ as the integral of its derivative:
$$F(x) = \int_{0}^{x} \frac{1}{5}\,dt = \frac{x}{5}$$
This provides a general approach to defining the distribution of a continuous random variable X.
Definition 2.3.1
A random variable $X$ is called a continuous random variable if there is a function $f(x)$, called the probability density function (pdf) of $X$, such that the CDF can be represented as
$$F(x) = \int_{-\infty}^{x} f(t)\,dt \qquad (2.3.1)$$
In more advanced treatments of probability, such distributions sometimes are called "absolutely continuous" distributions. The reason for such a distinction is that CDFs exist that are continuous (in the usual sense), but which cannot be represented as the integral of the derivative. We will apply the terminology continuous distribution only to probability distributions that satisfy property (2.3.1).
Sometimes it is convenient to use a subscripted notation, $F_X(x)$ and $f_X(x)$, for the CDF and pdf, respectively.
The defining property (2.3.1) provides a way to derive the CDF when the pdf is given, and it follows by the Fundamental Theorem of Calculus that the pdf can be obtained from the CDF by differentiation. Specifically,
$$f(x) = \frac{d}{dx}F(x) = F'(x)$$
wherever the derivative exists. Recall from Example 2.3.1 that there were two values of $x$ where the derivative of $F(x)$ did not exist. In general, there may be many values of $x$ where $F(x)$ is not differentiable, and these will occur at discontinuity points of the pdf, $f(x)$. Inspection of the graphs of $f(x)$ and $F(x)$ in Figure 2.5 shows that this situation occurs in the example at $x = 0$ and $x = 5$. However, this will not usually create a problem if the set of such values is finite, because an integrand can be redefined at a finite number of points without affecting the value of the integral. It also follows that an event of the form $[X = c]$, where $c$ is a constant, has probability zero when $X$ is continuous, so an event $[X \in I]$, where $I$ is an interval, has the same probability whether $I$ includes the endpoints or not. In other words, for a continuous random variable $X$, if $a < b$,
$$P[a < X \le b] = P[a \le X < b] = P[a < X < b] = P[a \le X \le b] \qquad (2.3.3)$$
and each of these has the value $F(b) - F(a)$.
Thus, the CDF, $F(x)$, assigns probabilities to events of the form $(-\infty, x]$, and equation (2.3.3) shows how the probability assignment can be extended to any interval. Any function $f(x)$ that produces a legitimate CDF when integrated as in (2.3.1) can serve as a pdf, and the following theorem characterizes such functions.
Theorem 2.3.1 A function $f(x)$ is a pdf for some continuous random variable $X$ if and only if it satisfies the properties
$$f(x) \ge 0 \qquad (2.3.4)$$
for all real $x$, and
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (2.3.5)$$
Proof
Properties (2.2.9) and (2.2.11) of a CDF follow from properties (2.3.5) and (2.3.4), respectively. The other properties follow from general results about integrals.
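Example 2.3.2 A machine produces copper wire, and occasionally there is a flaw at some point along the wire. The length of wire (in meters) produced between successive flaws is a continuous random variable $X$ with pdf of the form
$$f(x) = \begin{cases} c(1 + x)^{-3} & x > 0 \\ 0 & x \le 0 \end{cases} \qquad (2.3.6)$$
where $c$ is a constant. Property (2.3.5) requires $1 = \int_{0}^{\infty} c(1 + x)^{-3}\,dx = c/2$, so $c = 2$, and integrating the pdf gives the CDF $F(x) = 1 - (1 + x)^{-2}$ for $x > 0$ and $F(x) = 0$ for $x \le 0$.
Probabilities of intervals, such as $P[a \le X \le b]$, can be expressed directly in terms of the CDF or as integrals of the pdf. For example, the probability that a flaw occurs between 0.40 and 0.45 meters is given by
$$P[0.40 \le X \le 0.45] = \int_{0.40}^{0.45} f(x)\,dx = F(0.45) - F(0.40) = 0.035$$
Consideration of the frequency of occurrences over short intervals was suggested as a possible way to study a continuous distribution in Example 2.3.1. This approach provides some insight into the general nature of continuous distributions. For example, it may be observed that the frequency of occurrences over short intervals of length $\Delta x$, say $[x, x + \Delta x]$, is at least approximately proportional to the length of the interval, with a proportionality factor that depends on $x$, say $f(x)$:
$$P[x \le X \le x + \Delta x] = F(x + \Delta x) - F(x) \approx f(x)\,\Delta x \qquad (2.3.7)$$
This is illustrated in Figure 2.6 for the copper wire example. The exact probability in equation (2.3.7) is the area under the graph of $f(x)$ over the interval, while the approximation is the area of the rectangle with height $f(x)$ and width $\Delta x$. The area under the graph of $f(x)$ assigns probability to intervals, so that for $a < b$,
$$P[a \le X \le b] = \int_{a}^{b} f(x)\,dx \qquad (2.3.8)$$
In Example 2.3.2, the approximation with $x = 0.40$ and $\Delta x = 0.05$ gives $f(0.40)(0.05) = 0.036$, or we could integrate between the limits 0.40 and 0.45 to obtain the exact answer 0.035. For longer intervals, integrating $f(x)$ as in equation (2.3.8) would be more reasonable.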
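The exact and approximate probabilities for the copper wire example can be compared in a few lines of Python. This sketch assumes the pdf $f(x) = 2(1 + x)^{-3}$ for $x > 0$, which is the integrand used for the mean of this distribution later in the section, and the CDF $F(x) = 1 - (1 + x)^{-2}$ obtained by integrating it; the function names are hypothetical.

```python
def wire_pdf(x):
    """Assumed pdf of the length between flaws: 2(1 + x)^(-3) for x > 0."""
    return 2.0 * (1.0 + x) ** -3 if x > 0 else 0.0


def wire_cdf(x):
    """CDF obtained by integrating the assumed pdf: 1 - (1 + x)^(-2)."""
    return 1.0 - (1.0 + x) ** -2 if x > 0 else 0.0


# Exact probability of a flaw between 0.40 and 0.45 meters: F(b) - F(a).
exact = wire_cdf(0.45) - wire_cdf(0.40)

# Short-interval approximation: f(x) * (delta x) with x = 0.40, delta x = 0.05.
approx = wire_pdf(0.40) * 0.05

print(round(exact, 4), round(approx, 4))
```

The two values differ only in the third decimal place, which is why the rectangle approximation is adequate for short intervals but less reliable for longer ones.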
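Note that in Section 2.2 we referred to a probability density function or density function for a discrete random variable, but the interpretation there is different, because probability is assigned at discrete points in that case rather than in a continuous manner. However, it will be convenient to refer to the "density function" or pdf in both continuous and discrete cases, and to use the same notation, $f(x)$ or $f_X(x)$, in later chapters.
If $X$ is a continuous random variable with pdf $f(x)$, then the expected value of $X$ is
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
provided the integral is absolutely convergent; otherwise we say that $E(X)$ does not exist.
As in the discrete case, other notations for $E(X)$ are $\mu$ or $\mu_X$, and the terms mean or expectation of $X$ also are used. The center-of-mass analogy is still valid in this case, where mass is assigned to the x-axis in a continuous manner and in accordance with $f(x)$. Thus, $\mu$ can also be regarded as a central measure for a continuous distribution.
In Example 2.3.2, the mean length between flaws in a piece of wire is
$$\mu = \int_{-\infty}^{0} x \cdot 0\,dx + \int_{0}^{\infty} x \cdot 2(1 + x)^{-3}\,dx$$
If we make the substitution $t = 1 + x$, then
$$\mu = 2\int_{1}^{\infty} (t - 1)t^{-3}\,dt = 2\left(1 - \tfrac{1}{2}\right) = 1$$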
Definition 2.3.2
If $0 < p < 1$, then a $100 \times p$th percentile of the distribution of a continuous random variable $X$ is a solution $x_p$ to the equation
$$F(x_p) = p \qquad (2.3.10)$$
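For a CDF that can be inverted in closed form, a percentile is found by solving equation (2.3.10) directly. The sketch below uses the copper wire distribution as an illustration, assuming the CDF $F(x) = 1 - (1 + x)^{-2}$ for $x > 0$; solving $F(x_p) = p$ gives $x_p = (1 - p)^{-1/2} - 1$. The function names are hypothetical.

```python
def wire_cdf(x):
    """Assumed CDF of the length between flaws: 1 - (1 + x)^(-2) for x > 0."""
    return 1.0 - (1.0 + x) ** -2 if x > 0 else 0.0


def wire_percentile(p):
    """Solve F(x_p) = p for the assumed CDF: x_p = (1 - p)^(-1/2) - 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    return (1.0 - p) ** -0.5 - 1.0


median = wire_percentile(0.5)           # 50th percentile
ninetieth = wire_percentile(0.9)        # 90th percentile
check = wire_cdf(ninetieth)             # should return approximately 0.9
print(median, ninetieth, check)
```

Substituting a computed percentile back into the CDF, as in `check`, is a simple way to confirm that the inversion was carried out correctly.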
In general, a distribution may not be continuous, and if it has a discontinuity, then there will be some values of $p$ for which equation (2.3.10) has no solution. Although we emphasize the continuous case in this book, it is possible to state a general definition of percentile by defining a $p$th percentile of the distribution of $X$ to be a value $x_p$ such that $P[X \le x_p] \ge p$ and $P[X \ge x_p] \ge 1 - p$. In essence, $x_p$ is a value such that $100 \times p$ percent of the population values are at most $x_p$, and 100 × (1