Special Continuous Distributions

Notation and Parameters | Continuous pdf f(x) | Mean | Variance | MGF M(t)

Student's t:  X ~ t(ν),  ν = 1, 2, ...
  f(x) = [Γ((ν+1)/2) / (Γ(ν/2)√(νπ))] (1 + x²/ν)^(-(ν+1)/2),  -∞ < x < ∞
  Mean: 0 (1 < ν);  Variance: ν/(ν - 2) (2 < ν);  MGF: does not exist.

Snedecor's F:  X ~ F(ν₁, ν₂),  ν₁ = 1, 2, ...,  ν₂ = 1, 2, ...
  f(x) = [Γ((ν₁+ν₂)/2) / (Γ(ν₁/2)Γ(ν₂/2))] (ν₁/ν₂)^(ν₁/2) x^(ν₁/2 - 1) (1 + ν₁x/ν₂)^(-(ν₁+ν₂)/2),  0 < x
  Mean: ν₂/(ν₂ - 2) (2 < ν₂);  Variance: 2ν₂²(ν₁ + ν₂ - 2) / [ν₁(ν₂ - 2)²(ν₂ - 4)] (4 < ν₂);  MGF: does not exist.

Beta:  X ~ BETA(a, b),  0 < a,  0 < b
  f(x) = [Γ(a+b) / (Γ(a)Γ(b))] x^(a-1) (1 - x)^(b-1),  0 < x < 1
  Mean: a/(a+b);  Variance: ab / [(a+b+1)(a+b)²];  MGF: not tractable.

Weibull:  X ~ WEI(θ, β),  0 < θ,  0 < β
  f(x) = (β/θ^β) x^(β-1) e^(-(x/θ)^β),  0 < x
  Mean: θΓ(1 + 1/β);  Variance: θ²[Γ(1 + 2/β) - Γ²(1 + 1/β)];  MGF: not tractable.

Extreme Value:  X ~ EV(θ, η),  0 < θ
  f(x) = (1/θ) exp{(x - η)/θ - exp[(x - η)/θ]},  -∞ < x < ∞
  Mean: η - γθ, where γ ≈ 0.5772 (Euler's const.);  Variance: π²θ²/6;  MGF: e^(ηt) Γ(1 + θt).

Cauchy:  X ~ CAU(θ, η),  0 < θ
  f(x) = 1 / (θπ{1 + [(x - η)/θ]²}),  -∞ < x < ∞
  Mean: does not exist;  Variance: does not exist;  MGF: does not exist.

Pareto:  X ~ PAR(θ, κ),  0 < θ,  0 < κ
  f(x) = κ / [θ(1 + x/θ)^(κ+1)],  0 < x
  Mean: θ/(κ - 1) (1 < κ);  Variance: θ²κ / [(κ - 2)(κ - 1)²] (2 < κ);  MGF: does not exist.

Chi-Square:  X ~ χ²(ν),  ν = 1, 2, ...
  f(x) = x^(ν/2 - 1) e^(-x/2) / [2^(ν/2) Γ(ν/2)],  0 < x
  Mean: ν;  Variance: 2ν;  MGF: (1 - 2t)^(-ν/2),  t < 1/2.
INTRODUCTION TO PROBABILITY AND MATHEMATICAL STATISTICS

SECOND EDITION

Lee J. Bain, University of Missouri-Rolla
Max Engelhardt, University of Idaho

Duxbury / Thomson Learning
Australia, Canada, Mexico, Singapore, Spain, United Kingdom, United States
The Duxbury Classic Series is a collection of authoritative works from respected authors.
Reissued as paperbacks, these successful titles are now more affordable.

COPYRIGHT © 1992, 1987 by Brooks/Cole
Duxbury is an imprint of Brooks/Cole, a division of Thomson Learning. The Thomson Learning logo is a trademark used herein under license.

For more information about this or any other Duxbury product, contact:
DUXBURY
511 Forest Lodge Road
Pacific Grove, CA 93950 USA
www.duxbury.com
1-800-423-0563 (Thomson Learning Academic Resource Center)

All rights reserved. No part of this work may be reproduced, transcribed or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, or information storage and/or retrieval systems) without the prior written permission of the publisher.

For permission to use material from this work, contact us by
Web: www.thomsonrights.com
fax: 1-800-730-2215
phone: 1-800-730-2214

Printed in the United States of America
10 9 8 7 6 5 4 3 2

Library of Congress Cataloging-in-Publication Data

Bain, Lee J.
Introduction to probability and mathematical statistics / Lee J. Bain, Max Engelhardt. 2nd ed.
p. cm. (The Duxbury advanced series in statistics and decision sciences)
Includes bibliographical references and index.
ISBN 0-534-92930-3 (hard cover)
ISBN 0-534-38020-4 (paperback)
1. Probabilities. 2. Mathematical statistics. I. Engelhardt, Max. II. Title. III. Series.
QA273.B2546 1991 519.2 dc20
91-25923
CONTENTS

CHAPTER 1
PROBABILITY 1
1.1 Introduction 1
1.2 Notation and terminology 2
1.3 Definition of probability 9
1.4 Some properties of probability 13
1.5 Conditional probability 16
1.6 Counting techniques 31
Summary 42
Exercises 43

CHAPTER 2
RANDOM VARIABLES AND THEIR DISTRIBUTIONS 53
2.1 Introduction 53
2.2 Discrete random variables 56
2.3 Continuous random variables 62
2.4 Some properties of expected values 71
2.5 Moment generating functions 78
Summary 83
Exercises 83

CHAPTER 3
SPECIAL PROBABILITY DISTRIBUTIONS 90
3.1 Introduction 90
3.2 Special discrete distributions 91
3.3 Special continuous distributions 109
3.4 Location and scale parameters 124
Summary 127
Exercises 128

CHAPTER 4
JOINT DISTRIBUTIONS 136
4.1 Introduction 136
4.2 Joint discrete distributions 137
4.3 Joint continuous distributions 144
4.4 Independent random variables 149
4.5 Conditional distributions 153
4.6 Random samples 158
Summary 165
Exercises 165

CHAPTER 5
PROPERTIES OF RANDOM VARIABLES 171
5.1 Introduction 171
5.2 Properties of expected values 172
5.3 Correlation 177
5.4 Conditional expectation 180
5.5 Joint moment generating functions 186
Summary 188
Exercises 189

CHAPTER 6
FUNCTIONS OF RANDOM VARIABLES 193
6.1 Introduction 193
6.2 The CDF technique 194
6.3 Transformation methods 197
6.4 Sums of random variables 209
6.5 Order statistics 214
Summary 226
Exercises 226

CHAPTER 7
LIMITING DISTRIBUTIONS 231
7.1 Introduction 231
7.2 Sequences of random variables 232
7.3 The central limit theorem 236
7.4 Approximations for the binomial distribution 240
7.5 Asymptotic normal distributions 243
7.6 Properties of stochastic convergence 245
7.7 Additional limit theorems 247
7.8* Asymptotic distributions of extreme-order statistics 250
Summary 259
Exercises 259

CHAPTER 8
STATISTICS AND SAMPLING DISTRIBUTIONS 263
8.1 Introduction 263
8.2 Statistics 263
8.3 Sampling distributions 267
8.4 The t, F, and beta distributions 273
8.5 Large-sample approximations 280
Summary 283
Exercises 283

CHAPTER 9
POINT ESTIMATION 288
9.1 Introduction 288
9.2 Some methods of estimation 290
9.3 Criteria for evaluating estimators 302
9.4 Large-sample properties 311
9.5 Bayes and minimax estimators 319
Summary 327
Exercises 328

CHAPTER 10
SUFFICIENCY AND COMPLETENESS 335
10.1 Introduction 335
10.2 Sufficient statistics 337
10.3 Further properties of sufficient statistics 342
10.4 Completeness and exponential class 345
Summary 353
Exercises 353

CHAPTER 11
INTERVAL ESTIMATION 358
11.1 Introduction 358
11.2 Confidence intervals 359
11.3 Pivotal quantity method 362
11.4 General method 369
11.5 Two-sample problems 377
11.6 Bayesian interval estimation 382
Summary 383
Exercises 384

CHAPTER 12
TESTS OF HYPOTHESES 389
12.1 Introduction 389
12.2 Composite hypotheses 395
12.3 Tests for the normal distribution 398
12.4 Binomial tests 404
12.5 Poisson tests 406
12.6 Most powerful tests 406
12.7 Uniformly most powerful tests 411
12.8 Generalized likelihood ratio tests 417
12.9 Conditional tests 426
12.10 Sequential tests 428
Summary 435
Exercises 436

CHAPTER 13
CONTINGENCY TABLES AND GOODNESS-OF-FIT 442
13.1 Introduction 442
13.2 One-sample binomial case 443
13.3 r-Sample binomial test (completely specified H0) 444
13.4 One-sample multinomial 447
13.5 r-Sample multinomial 448
13.6 Test for independence, r x c contingency table 450
13.7 Chi-squared goodness-of-fit test 453
13.8 Other goodness-of-fit tests 457
Summary 461
Exercises 462

CHAPTER 14
NONPARAMETRIC METHODS 468
14.1 Introduction 468
14.2 One-sample sign test 469
14.3 Binomial test (test on quantiles) 471
14.4 Two-sample sign test 476
14.5 Wilcoxon paired-sample signed-rank test 477
14.6 Paired-sample randomization test 482
14.7 Wilcoxon and Mann-Whitney (WMW) tests 483
14.8 Correlation tests (tests of independence) 486
14.9 Wald-Wolfowitz runs test 492
Summary 494
Exercises 495

CHAPTER 15*
REGRESSION AND LINEAR MODELS 499
15.1 Introduction 499
15.2 Linear regression 500
15.3 Simple linear regression 501
15.4 General linear model 515
15.5 Analysis of bivariate data 529
Summary 534
Exercises 535

CHAPTER 16*
RELIABILITY AND SURVIVAL DISTRIBUTIONS 540
16.1 Introduction 540
16.2 Reliability concepts 541
16.3 Exponential distribution 548
16.4 Weibull distribution 560
16.5 Repairable systems 570
Summary 579
Exercises 579

APPENDIX A REVIEW OF SETS 587
APPENDIX B SPECIAL DISTRIBUTIONS 594
APPENDIX C TABLES OF DISTRIBUTIONS 598
ANSWERS TO SELECTED EXERCISES 619
REFERENCES 638
INDEX 641

* Advanced (or optional) topics
PREFACE

This book provides an introduction to probability and mathematical statistics. Although the primary focus of the book is on a mathematical development of the subject, we also have included numerous examples and exercises that are oriented toward applications. We have attempted to achieve a level of presentation that is appropriate for senior-level undergraduates and beginning graduate students.

The second edition involves several major changes, many of which were suggested by reviewers and users of the first edition. Chapter 2 now is devoted to general properties of random variables and their distributions. The chapter now includes moments and moment generating functions, which occurred somewhat later in the first edition. Special distributions have been placed in Chapter 3. Chapter 8 is completely changed. It now considers sampling distributions and some basic properties of statistics. Chapter 15 is also new. It deals with regression and related aspects of linear models.

As with the first edition, the only prerequisite for covering the basic material is calculus, with the lone exception of the material on general linear models in Section 15.4; this assumes some familiarity with matrices. This material can be omitted if so desired.

Our intent was to produce a book that could be used as a textbook for a two-semester sequence in which the first semester is devoted to probability concepts and the second covers mathematical statistics. Chapters 1 through 7 include topics that usually are covered in a one-semester introductory course in probability, while Chapters 8 through 12 contain standard topics in mathematical statistics. Chapters 13 and 14 deal with goodness-of-fit and nonparametric statistics. These chapters tend to be more methods-oriented. Chapters 15 and 16 cover material in regression and reliability, and these would be considered as optional or special topics. In any event, judgment undoubtedly will be required in the choice of topics covered or the amount of time allotted to topics if the desired material is to be completed in a two-semester course.

It is our hope that those who use the book will find it both interesting and informative.

ACKNOWLEDGMENTS

We gratefully acknowledge the numerous suggestions provided by the following reviewers:

Dean H. Fearn, California State University, Hayward
Joseph Glaz, University of Connecticut
Victor Goodman, Rensselaer Polytechnic Institute
Shu-ping C. Hodgson, Central Michigan University
Robert A. Hultquist, Pennsylvania State University
Alan M. Johnson, University of Arkansas, Little Rock
Benny P. Lo, Ohlone College
D. Ramachandran, California State University, Sacramento
Douglas A. Wolfe, Ohio State University
Linda J. Young, Oklahoma State University

Thanks also are due to the following users of the first edition who were kind enough to relate their experiences to the authors: H. A. David, Iowa State University; Peter Griffin, California State University, Sacramento.

Finally, special thanks are due for the moral support of our wives, Harriet Bain and Linda Engelhardt.

Lee J. Bain
Max Engelhardt
CHAPTER 1

PROBABILITY

1.1
INTRODUCTION

In any scientific study of a physical phenomenon, it is desirable to have a mathematical model that makes it possible to describe or predict the observed value of some characteristic of interest. As an example, consider the velocity of a falling body after a certain length of time, t. The formula v = gt, where g = 32.17 feet per second per second, provides a useful mathematical model for the velocity, in feet per second, of a body falling from rest in a vacuum. This is an example of a deterministic model. For such a model, carrying out repeated experiments under ideal conditions would result in essentially the same velocity each time, and this would be predicted by the model. On the other hand, such a model may not be adequate when the experiments are carried out under less than ideal conditions. There may be unknown or uncontrolled variables, such as air temperature or humidity, that might affect the outcome, as well as measurement error or other factors that might cause the results to vary on different performances of the

experiment. Furthermore, we may not have sufficient knowledge to derive a


more complicated model that could account for all causes of variation.
There are also other types of phenomena in which different results may naturally occur by chance, and for which a deterministic model would not be appropriate. For example, an experiment may consist of observing the number of particles emitted by a radioactive source, the time until failure of a manufactured component, or the outcome of a game of chance.
The motivation for the study of probability is to provide mathematical models for such nondeterministic situations; the corresponding mathematical models will be called probability models (or probabilistic models). The term stochastic, which is derived from the Greek word stochos, meaning "guess," is sometimes used instead of the term probabilistic.

A careful study of probability models requires some familiarity with the notation and terminology of set theory. We will assume that the reader has some knowledge of sets, but for convenience we have included a review of the basic ideas of set theory in Appendix A.


​ of sets, but for convenience we
have included a review of the basic ​ideas of set theory in Appendix A.

1.2
NOTATION AND TERMINOLOGY

The term experiment refers to the process of obtaining an observed result of some

phenomenon. A performance of an experiment is called a trial of the experiment,

and an observed result is called an outcome. This terminology is rather general,


and it could pertain to such diverse activities as scientific experiments or games

of chance. Our primary interest will be in situations where there is uncertainty

about which outcome will occur when the experiment is performed. We will
assume that an experiment is repeatable under essentially the same conditions,

and that the set of all possible outcomes can be completely specified before
experimentation.

Definition 1.2.1
The set of all possible outcomes of an experiment is called the sample space, denoted by S.

Note that one and only one of the possible outcomes will occur on any given trial of the experiment.

Example 1.2.1  An experiment consists of tossing two coins, and the observed face of each coin is of interest. The set of possible outcomes may be represented by the sample space

S = {HH, HT, TH, TT}

which simply lists all possible pairings of the symbols H (heads) and T (tails). An alternate way of representing such a sample space is to list all possible ordered pairs of the numbers 1 and 0, S = {(1, 1), (1, 0), (0, 1), (0, 0)}, where, for example, (1, 0) indicates that the first coin landed heads up and the second coin landed tails up.
tails up.

Example 1.2.2  Suppose that in Example 1.2.1 we were not interested in the individual outcomes of the coins, but only in the total number of heads obtained from the two coins. An appropriate sample space could then be written as S* = {0, 1, 2}. Thus, different sample spaces may be appropriate for the same experiment, depending on the characteristic of interest.

Example 1.2.3  If a coin is tossed repeatedly until a head occurs, then the natural sample space is S = {H, TH, TTH, ...}. If one is interested in the number of tosses required to obtain a head, then a possible sample space for this experiment would be the set of all positive integers, S* = {1, 2, 3, ...}, and the outcomes would correspond directly to the number of tosses required to obtain the first head. We will show in the next chapter that an outcome corresponding to a sequence of tosses in which a head is never obtained need not be included in the sample space.

Example 1.2.4  A light bulb is placed in service and the time of operation until it burns out is measured. At least conceptually, the sample space for this experiment can be taken to be the set of nonnegative real numbers, S = {t | 0 ≤ t < ∞}. Note that if the actual failure time could be measured only to the nearest hour, then the sample space for the actual observed failure time would be the set of nonnegative integers, S* = {0, 1, 2, 3, ...}. Even though S* may be the observable sample space, one might prefer to describe the properties and behavior of light bulbs in terms of the conceptual sample space S. In cases of this type, the discreteness imposed by measurement limitations is sufficiently negligible that it can be ignored, and both the measured response and the conceptual response can be discussed relative to the conceptual sample space S.

A sample space S is said to be finite if it consists of a finite number of outcomes, say S = {e1, e2, ..., eN}, and it is said to be countably infinite if its outcomes can be put into a one-to-one correspondence with the positive integers, say S = {e1, e2, ...}.

Definition 1.2.2
If a sample space S is either finite or countably infinite, then it is called a discrete sample space.

A set that is either finite or countably infinite also is said to be countable. This is the case in the first three examples. It is also true for the last example when failure times are recorded to the nearest hour, but not for the conceptual sample space. Because the conceptual space involves outcomes that may assume any value in some interval of real numbers (i.e., the set of nonnegative real numbers), it could be termed a continuous sample space, and it provides an example where a discrete sample space is not an appropriate model. Other, more complicated experiments exist, the sample spaces of which also could be characterized as continuous, such as experiments involving two or more continuous responses.

Example 1.2.5  Suppose a heat lamp is tested and X, the amount of light produced (in lumens), and Y, the amount of heat energy (in joules), are measured. An appropriate sample space would be the Cartesian product of the set of all nonnegative real numbers with itself,

S = [0, ∞) × [0, ∞) = {(x, y) | 0 ≤ x < ∞ and 0 ≤ y < ∞}

Each variable would be capable of assuming any value in some subinterval of [0, ∞).

Sometimes it is possible to determine bounds on such physical variables, but often it is more convenient to consider a conceptual model in which the variables are not bounded. If the likelihood of the variables in the conceptual model exceeding such bounds is negligible, then there is no practical difficulty in using the conceptual model.

Example 1.2.6  A thermograph is a machine that records temperature continuously by tracing a graph on a roll of paper as it moves through the machine. A thermographic recording is made during a 24-hour period. The observed result is the graph of a continuous real-valued function f(t) defined on the time interval [0, 24] = {t | 0 ≤ t ≤ 24}, and an appropriate sample space would be a collection of such functions.


Definition 1.2.3
An event is a subset of the sample space S. If A is an event, then A has occurred if it contains the outcome that occurred.

To illustrate this concept, consider Example 1.2.1. The subset

A = {HH, HT, TH}

contains the outcomes that correspond to the event of obtaining "at least one head." As mentioned earlier, if one of the outcomes in A occurs, then we say that

the event A has occurred. Similarly, if one of the outcomes in B = {HT, TH, TT} occurs, then we say that the event "at least one tail" has occurred.

Set notation and terminology provide a useful framework for describing the possible outcomes and related physical events that may be of interest in an experiment. As suggested above, a subset of outcomes corresponds to a physical event, and the event or the subset is said to occur if any outcome in the subset occurs. The usual set operations of union, intersection, and complement provide a way of expressing new events in terms of events that already have been defined. For example, the event C of obtaining "at least one head and at least one tail" can be expressed as the intersection of A and B, C = A ∩ B = {HT, TH}. Similarly, the event "at least one head or at least one tail" can be expressed as the union A ∪ B = {HH, HT, TH, TT}, and the event "no heads" can be expressed as the complement of A relative to S, A' = {TT}.

A review of set notation and terminology is given in Appendix A.

In general, suppose S is the sample space for some experiment, and that A and B are events. The intersection A ∩ B represents the outcomes of the event "A and B," while the union A ∪ B represents the event "A or B." The complement A' corresponds to the event "not A." Other events also can be represented in terms of intersections, unions, and complements. For example, the event "A but not B" is said to occur if the outcome of the experiment belongs to A ∩ B', which sometimes is written as A - B. The event "exactly one of A or B" is said to occur if the outcome belongs to (A ∩ B') ∪ (A' ∩ B). The set A' ∩ B' corresponds to the event "neither A nor B." The set identity A' ∩ B' = (A ∪ B)' is another way to represent this event. This is one of the set properties that usually are referred to as De Morgan's laws. The other such property is A' ∪ B' = (A ∩ B)'.
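As an aside, these set identities are easy to check mechanically. The following minimal Python sketch (illustrative only, not part of the original text; all names are ours) verifies De Morgan's laws for the two-coin sample space of Example 1.2.1.

```python
# Illustrative sketch: De Morgan's laws on the sample space of Example 1.2.1.
S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT", "TH"}   # "at least one head"
B = {"HT", "TH", "TT"}   # "at least one tail"

def complement(E):
    # Complement of the event E relative to the sample space S.
    return S - E

# A' n B' = (A u B)'  and  A' u B' = (A n B)'
assert complement(A) & complement(B) == complement(A | B)
assert complement(A) | complement(B) == complement(A & B)
print(complement(A))   # {'TT'}, the event "no heads"
```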

​ is a finite collection of events, occurrence of an ​k ​outcome in the intersection


More generally, if A1, ​..., A

​ corresponds to the ​occurrence of the event "every A1; i = 1, ..., k." The occurrence of
A1 n ​n Ak (or ​fl A,)
an outcome in
the union A1 u ​u Ak (or ​A,) corresponds to the occurrence of the event
"at least one A,; i = 1, ..., k." Similar remarks apply in the case of a countably ​infinite collection A1, A2,
..., ​with the notations A1 n A2 n ​(or ​fl ​
A1) for

the intersection and A1 u A2 u ​(or ​Y A)


​ for the union.
The intersection (or union) of a finite or countably infinite collection of events ​is called a countable intersection
(or union).
We will consider the whole sample space S as a special type of event, called the ​sure event, and we also will
include the empty set Ø as an event, called the null ​event. Certainly, any set consisting of only a single
outcome may be considered as ​an event.

Definition 1.2.4
An event is called an elementary event if it contains exactly one outcome of the experiment.

In a discrete sample space, any subset can be written as a countable union of elementary events, and we have no difficulty in associating every subset with an event in the discrete case.

In Example 1.2.1, the elementary events are {HH}, {HT}, {TH}, and {TT}, and any other event can be written as a finite union of these elementary events. Similarly, in Example 1.2.3, the elementary events are {H}, {TH}, {TTH}, ..., and any event can be represented as a countable union of these elementary events.
It is not as easy to represent events for the continuous examples. Rather than attempting to characterize these events rigorously, we will discuss some examples. In Example 1.2.4, the light bulbs could fail during any time interval, and any interval of nonnegative real numbers would correspond to an interesting event for that experiment. Specifically, suppose the time until failure is measured in hours. The event that the light bulb "survives at most 10 hours" corresponds to the interval A = [0, 10] = {t | 0 ≤ t ≤ 10}. The event that the light bulb "survives more than 10 hours" is A' = (10, ∞) = {t | 10 < t < ∞}. If B = [0, 15), then C = B ∩ A' = (10, 15) is the event of "failure between 10 and 15 hours."

In Example 1.2.5, any Cartesian product based on intervals of nonnegative real numbers would correspond to an event of interest. For example, the event

(10, 20) × [5, ∞) = {(x, y) | 10 < x < 20 and 5 ≤ y < ∞}

corresponds to "the amount of light is between 10 and 20 lumens and the amount of energy is at least 5 joules." Such an event can be represented graphically as a rectangle in the xy plane with sides parallel to the coordinate axes.
In general, any physical event can be associated with a reasonable subset of S, and often a subset of S can be associated with some meaningful event. For mathematical reasons, though, when defining probability it is desirable to restrict the types of subsets that we will consider as events in some cases. Given a collection of events, we will want any countable union of these events to be an event. We also will want complements of events and countable intersections of events to be included in the collection of subsets that are defined to be events. We will assume that the collection of possible events includes all such subsets, but we will not attempt to describe all subsets that might be called events.

An important situation arises in the following developments when two events correspond to disjoint subsets.
Definition 1.2.5
Two events A and B are called mutually exclusive if A ∩ B = ∅.

If events are mutually exclusive, then they have no outcomes in common. Thus, the occurrence of one event precludes the possibility of the other occurring. In Example 1.2.1, if A is the event "at least one head" and if we let B be the event "both tails," then A and B are mutually exclusive. Actually, in this example B = A' (the complement of A). In general, complementary events are mutually exclusive, but the converse is not true. For example, if C is the event "both heads," then B and C are mutually exclusive, but not complementary.

The notion of mutually exclusive events can be extended easily to more than two events.

Definition 1.2.6
Events A1, A2, A3, ... are said to be mutually exclusive if they are pairwise mutually exclusive. That is, Ai ∩ Aj = ∅ whenever i ≠ j.

One possible approach to assigning probabilities to events involves the notion of relative frequency.

RELATIVE FREQUENCY

For the experiment of tossing a coin, we may declare that the probability of obtaining a head is 1/2. This could be interpreted in terms of the relative frequency with which a head is obtained on repeated tosses. Even though the coin may be tossed only once, conceivably it could be tossed many times, and experience leads us to expect a head on approximately one-half of the tosses. At least conceptually, as the number of tosses approaches infinity, the proportion of times a head occurs is expected to converge to some constant p. One then might define the probability of obtaining a head to be this conceptual limiting value. For a balanced coin, one would expect p = 1/2, but if the coin is unbalanced, or if the experiment is conducted under unusual conditions that tend to bias the outcomes in favor of either heads or tails, then this assignment would not be appropriate.

More generally, if m(A) represents the number of times that the event A occurs among M trials of a given experiment, then fA = m(A)/M represents the relative frequency of occurrence of A on these trials of the experiment.
Example 1.2.7  An experiment consists of rolling an ordinary six-sided die. A natural sample space is the set of the first six positive integers, S = {1, 2, 3, 4, 5, 6}. A simulated die-rolling experiment is performed, using a "random number generator" on a computer. In Figure 1.1, the relative frequencies of the elementary events A1 = {1}, A2 = {2}, and so on are represented as the heights of vertical lines. The first graph shows the relative frequencies for the first M = 30 rolls, and the second graph gives the results for M = 600 rolls. By inspection of these graphs,
obviously the relative frequencies tend to "stabilize" near some fixed value as M increases. Also included in the figure is a dotted line of height 1/6, which is the value that experience would suggest as the long-term relative frequency of the outcomes of rolling a die. Of course, in this example, the results are more relevant to the properties of the random number generator used to simulate the experiment than to those of actual dice.
FIGURE 1.1  Relative frequencies of elementary events for the die-rolling experiment (first panel: M = 30 rolls; second panel: M = 600 rolls).
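The simulation behind Figure 1.1 is easy to reproduce. The sketch below is our illustration, not the authors' code; the function name and seed are arbitrary. It rolls a simulated die M times and reports the relative frequency of each elementary event.

```python
import random

def relative_frequencies(M, seed=0):
    # Roll a fair six-sided die M times; return the relative frequency
    # m(A)/M of each elementary event A = {1}, ..., {6}.
    rng = random.Random(seed)
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(M):
        counts[rng.randint(1, 6)] += 1
    return {face: m / M for face, m in counts.items()}

print(relative_frequencies(30))    # quite uneven for small M
print(relative_frequencies(600))   # each frequency near 1/6 = 0.1667
```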
If, for an event A, the limit of fA as M approaches infinity exists, then one could assign probability to A by

P(A) = lim (M→∞) fA    (1.2.1)

This expresses a property known as statistical regularity. Certain technical questions about this property require further discussion. For example, it is not clear whether the limit in equation (1.2.1) will exist for every sequence of trials, or in what sense, or under what conditions it will necessarily be the same. Our approach to this problem will be to define probability in terms of a set of axioms and eventually show that the desired limiting behavior follows.
To motivate the defining axioms of probability, consider the following properties of relative frequencies. If S is the sample space for an experiment and A is an event, then clearly 0 ≤ m(A) ≤ M, because m(A) counts occurrences of A, and m(S) = M, because S occurs on each trial. Furthermore, if A and B are mutually exclusive events, then outcomes in A are distinct from outcomes in B, and consequently m(A ∪ B) = m(A) + m(B). More generally, if A1, A2, ... are pairwise mutually exclusive, then m(A1 ∪ A2 ∪ ⋯) = m(A1) + m(A2) + ⋯. Thus, the following properties hold for relative frequencies:

0 ≤ fA ≤ 1    (1.2.2)

fS = 1    (1.2.3)

f(A1 ∪ A2 ∪ ⋯) = fA1 + fA2 + ⋯    (1.2.4)

if A1, A2, ... are pairwise mutually exclusive events.

Although the relative frequency approach may not always be adequate as a practical method of assigning probabilities, it is the way that probability usually is interpreted. However, many people consider this interpretation too restrictive. By regarding probability as a subjective measure of belief, they are willing to assign a probability to an event in any situation involving uncertainty, without assuming properties such as repeatability or statistical regularity. Statistical methods based on both the relative frequency approach and the subjective approach will be discussed in later chapters.

1.3
DEFINITION OF PROBABILITY

Given an experiment with an associated sample space S, the primary objective of probability modeling is to assign to each event A a real number P(A), called the probability of A, that will provide a measure of the likelihood that A will occur when the experiment is performed. Mathematically, we can think of P(A) as a set function. In other words, it is a function whose domain is a collection of sets (events), and whose range is a subset of the real numbers. Some set functions are not suitable for assigning probabilities to events. The properties given in the following definition are motivated by similar properties that hold for relative frequencies.

Definition 1.3.1
For a given experiment, S denotes the sample space and A, A1, A2, ... represent possible events. A set function that associates a real value P(A) with each event A is called a probability set function, and P(A) is called the probability of A, if the following properties are satisfied:

P(A) ≥ 0 for every A    (1.3.1)

P(S) = 1    (1.3.2)

P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯    (1.3.3)

if A1, A2, ... are pairwise mutually exclusive events.

These properties all seem to agree with our intuitive concept of probability, and these few properties are sufficient to allow a mathematical structure to be developed. One consequence of the properties is that the null event (empty set) has probability zero, P(∅) = 0 (see Exercise 11). Also, if A and B are two mutually exclusive events, then

P(A ∪ B) = P(A) + P(B)    (1.3.4)

Similarly, if A1, A2, ..., Ak is a finite collection of pairwise mutually exclusive events, then

P(A1 ∪ A2 ∪ ⋯ ∪ Ak) = P(A1) + P(A2) + ⋯ + P(Ak)    (1.3.5)

(See Exercise 12.) In the case of a finite sample space, notice that there is at most a finite number of nonempty mutually exclusive events. Thus, in this case it would suffice to verify equation (1.3.4) or (1.3.5) instead of (1.3.3).

Example 1.3.1  The successful completion of a construction project requires that a piece of equipment works properly. Assume that either the "project succeeds" (A1) or it fails because of one and only one of the following: "mechanical failure" (A2) or "electrical failure" (A3). Suppose that mechanical failure is three times as likely as electrical failure, and successful completion is twice as likely as mechanical failure. The resulting assignment of probability is determined by the equations P(A2) = 3P(A3) and P(A1) = 2P(A2). Because one and only one of these events will occur, we also have from (1.3.2) and (1.3.5) that P(A1) + P(A2) + P(A3) = 1. Substituting P(A2) = 3P(A3) and P(A1) = 6P(A3) into this equation gives 10P(A3) = 1, so solving the system simultaneously yields P(A1) = 0.6, P(A2) = 0.3, and P(A3) = 0.1. The event "failure" is represented by the union A2 ∪ A3, and because A2 and A3 are assumed to be mutually exclusive, we have from equation (1.3.5) that the probability of failure is P(A2 ∪ A3) = 0.3 + 0.1 = 0.4.

PROBABILITY IN DISCRETE SPACES

The assignment of probability in the case of a discrete sample space can be reduced to assigning probabilities to the elementary events. Suppose that to each elementary event {ei} we assign a real number pi, so that P({ei}) = pi. To satisfy the conditions of Definition 1.3.1, it is necessary that

pi ≥ 0 for all i    (1.3.6)

Σi pi = 1    (1.3.7)

Because each term in the sum (1.3.7) corresponds to an outcome in S, it is an ordinary summation when S is finite, and an infinite series when S is countably infinite. The probability of any other event then can be determined from the above assignment by representing the event as a union of mutually exclusive elementary events, and summing the corresponding values of pi. A concise notation for this is given by

P(A) = Σ (ei in A) P({ei})    (1.3.8)

With this notation, we understand that the summation is taken over all indices i such that ei is an outcome in A. This approach works equally well for both finite and countably infinite sample spaces, but if A is a countably infinite set the summation in (1.3.8) is actually an infinite series.

Example 1.3.2  If two coins are tossed as in Example 1.2.1, then S = {HH, HT, TH, TT}; if the coins are balanced, it is reasonable to assume that each of the four outcomes is equally likely. Because P(S) = 1, the probability assigned to each elementary event must be 1/4. Any event in a finite sample space can be written as a finite union of distinct elementary events, so the probability of any event is a sum including the constant term 1/4 for each elementary event in the union. For example, if C = {HT, TH} represents the event "exactly one head," then

P(C) = P({HT}) + P({TH}) = 1/4 + 1/4 = 1/2

Note that the "equally likely" assumption cannot be applied indiscriminately. For example, in Example 1.2.2 the number of heads is of interest, and the sample space is S* = {0, 1, 2}. The elementary event {1} corresponds to the event C = {HT, TH} in S. Rather than assigning the probability 1/3 to each of the outcomes in S*, we should assign P({1}) = 1/2 and P({0}) = P({2}) = 1/4.
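As a quick illustration of assignment (1.3.8) (a sketch of ours, not from the text), the elementary-event probabilities of Example 1.3.2 can be tabulated and summed over the outcomes of any event:

```python
# Elementary-event probabilities for two balanced coins (Example 1.3.2).
p = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def prob(A):
    # P(A) as the sum of p_i over the outcomes e_i in A, as in (1.3.8).
    return sum(p[e] for e in A)

print(prob({"HT", "TH"}))   # 0.5, the event "exactly one head"
# Induced probabilities on S* = {0, 1, 2} (number of heads):
print(prob({"TT"}), prob({"HT", "TH"}), prob({"HH"}))   # 0.25 0.5 0.25
```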

In many problems, including those involving games of chance, the nature of

the outcomes dictates the assignment of equal probability to each elementary

event. This type of model sometimes is referred to as the classical probability


model.

CLASSICAL PROBABILITY

Suppose that a finite number of possible outcomes may occur in an experiment, and that it is reasonable to assume that each outcome is equally likely to occur. Typical problems involving games of chance, such as tossing a coin, rolling a die, drawing cards from a deck, and picking the winning number in a lottery, fit this description. Note that the "equally likely" assumption requires the experiment to be carried out in such a way that the assumption is realistic. That is, the coin should be balanced, the die should not be loaded, the deck should be shuffled, the lottery tickets should be well mixed, and so forth.

This imposes a very special requirement on the assignment of probabilities to the elementary outcomes. In particular, let the sample space consist of N distinct outcomes,

S = {e1, e2, ..., eN}    (1.3.9)


The "equally likely" assumption requires of the values pi that

p1 = p2 = ⋯ = pN    (1.3.10)

and, to satisfy equations (1.3.6) and (1.3.7), necessarily

pi = P({ei}) = 1/N    (1.3.11)

In this case, because all terms in the sum (1.3.8) are the same, pi = 1/N, it follows that

P(A) = n(A)/N    (1.3.12)

where n(A) represents the number of outcomes in A. In other words, if the outcomes of an experiment are equally likely, then the problem of assigning probabilities to events is reduced to counting how many outcomes are favorable to the occurrence of the event as well as how many are in the sample space, and then finding the ratio. Some techniques that will be useful in solving some of the more complicated counting problems will be presented in Section 1.6.

The formula presented in (1.3.12) sometimes is referred to as classical probability. For problems in which this method of assignment is appropriate, it is fairly easy to show that our general definition of probability is satisfied. Specifically, for any mutually exclusive events A and B,

P(A ∪ B) = n(A ∪ B)/N = [n(A) + n(B)]/N = n(A)/N + n(B)/N = P(A) + P(B)

RANDOM SELECTION

A major application of classical probability arises in connection with choosing an object or a set of objects at random from a collection of objects.

Definition 1.3.2
If an object is chosen from a finite collection of distinct objects in such a manner that each object has the same probability of being chosen, then we say that the object was chosen at random.
​ was chosen at random.
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
1.4 ​SOME PROPERTIES OF PROBABILITY ​13

Similarly, if a subset of the objects is chosen so that each subset of the same

size has the same probability of being chosen, then we say that the subset was

chosen at random. Usually, no distinction is made when the elements of the


subset are listed in a different order, but occasionally it will be useful to make this
distinction.

Example 1.3.3  A game of chance involves drawing a card from an ordinary deck of 52 playing cards. It should not matter whether the card comes from the top or some other part of the deck if the cards are well shuffled. Each card would have the same probability, 1/52, of being selected. Similarly, if a game involves drawing five cards, then it should not matter whether the top five cards or any other five cards are drawn. The probability assigned to each possible set of five cards would be the reciprocal of the total number of subsets of size 5 from a set of size 52. In Section 1.6 we will develop, among other things, a method for counting the number of subsets of a given size.
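Anticipating the counting techniques of Section 1.6, the number of five-card subsets is a binomial coefficient; the short sketch below (our illustration, not from the text) shows the resulting probability of any particular hand.

```python
from math import comb

n_hands = comb(52, 5)   # number of subsets of size 5 from 52 cards
print(n_hands)          # 2598960
print(1 / n_hands)      # probability of any one five-card hand, about 3.85e-07
```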

1.4
SOME PROPERTIES OF PROBABILITY

From general properties of sets and the properties of Definition 1.3.1 we can derive other useful properties of probability. Each of the following theorems pertains to one or more events relative to the same experiment.

Theorem 1.4.1  If A is an event and A' is its complement, then

P(A) = 1 - P(A')    (1.4.1)

Proof
Because A' is the complement of A relative to S, S = A ∪ A'. Because A ∩ A' = ∅, A and A' are mutually exclusive, so it follows from equations (1.3.2) and (1.3.4) that

1 = P(S) = P(A ∪ A') = P(A) + P(A')

which establishes the theorem.

This theorem is particularly useful when an event A is relatively complicated, but its complement A' is easier to analyze.

Example 1.4.1  An experiment consists of tossing a coin four times, and the event A of interest is "at least one head." The event A contains most of the possible outcomes, but the complement, "no heads," contains only one, A' = {TTTT}, so n(A') = 1. It can be shown by listing all of the possible outcomes that n(S) = 16, so that P(A') = n(A')/n(S) = 1/16. Thus P(A) = 1 - P(A') = 1 - 1/16 = 15/16.
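For a sample space this small, the complement computation can be confirmed by brute-force enumeration. The following sketch (ours, purely illustrative) lists all 16 outcomes of four tosses:

```python
from itertools import product

S = list(product("HT", repeat=4))        # all 16 outcomes of four tosses
A = [s for s in S if "H" in s]           # "at least one head"
print(len(A) / len(S), 1 - 1 / len(S))   # 0.9375 both ways: 15/16
```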

Theorem 1.4.2  For any event A, P(A) ≤ 1.

Proof
From Theorem 1.4.1, P(A) = 1 - P(A'). Also, from Definition 1.3.1, we know that P(A') ≥ 0. Therefore, P(A) ≤ 1.

Note that this theorem combined with Definition 1.3.1 implies that

0 ≤ P(A) ≤ 1    (1.4.2)

Equations (1.3.3), (1.3.4), and (1.3.5) provide formulas for the probability of a union in the case of mutually exclusive events. The following theorems provide formulas that apply more generally.

Theorem 1.4.3  For any two events A and B,

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)    (1.4.3)

Proof
The approach will be to express the events A ∪ B and A as unions of mutually exclusive events. From set properties we can show that

A ∪ B = (A ∩ B') ∪ B  and  A = (A ∩ B) ∪ (A ∩ B')

See Figure 1.2 for an illustration of these identities.

FIGURE 1.2  Partitioning of events: A ∪ B = (A ∩ B') ∪ B and A = (A ∩ B) ∪ (A ∩ B')



It also follows that the events A ∩ B' and B are mutually exclusive because (A ∩ B') ∩ B = ∅, so that equation (1.3.4) implies

P(A ∪ B) = P(A ∩ B') + P(B)

Similarly, A ∩ B and A ∩ B' are mutually exclusive, so that

P(A) = P(A ∩ B) + P(A ∩ B')

The theorem follows from these equations:

P(A ∪ B) = P(A ∩ B') + P(B)
         = [P(A) - P(A ∩ B)] + P(B)
         = P(A) + P(B) - P(A ∩ B)

Example 1.4.2  Suppose one card is drawn at random from an ordinary deck of 52 playing cards. As noted in Example 1.3.3, this means that each card has the same probability, 1/52, of being chosen. Let A be the event of obtaining "a red ace" and let B be the event "a heart." Then P(A) = 2/52, P(B) = 13/52, and P(A ∩ B) = 1/52. From Theorem 1.4.3 we have P(A ∪ B) = 2/52 + 13/52 - 1/52 = 14/52 = 7/26.
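The same computation can be verified directly with explicit sets (an illustrative sketch of ours; the rank and suit labels are arbitrary):

```python
# Build a 52-card deck as (rank, suit) pairs and verify Theorem 1.4.3.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]   # hearts and diamonds are red
deck = {(r, s) for r in ranks for s in suits}

A = {(r, s) for (r, s) in deck if r == "A" and s in ("hearts", "diamonds")}
B = {(r, s) for (r, s) in deck if s == "hearts"}

N = len(deck)
print(len(A | B) / N)                              # 14/52 = 0.2692...
print(len(A) / N + len(B) / N - len(A & B) / N)    # same value, by (1.4.3)
```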

Theorem 1.4.3 can be extended easily to three events.

Theorem 1.4.4  For any three events A, B, and C,

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)    (1.4.4)

Proof
See Exercise 16.

It is intuitively clear that if every outcome of A is also an outcome of B, then A is no more likely to occur than B. The next theorem formalizes this notion.

Theorem 1.4.5  If A ⊂ B, then P(A) ≤ P(B).

Proof
See Exercise 17.


Property (1.3.3) provides a formula for the probability of a countably infinite union when the events are mutually exclusive. If the events are not mutually exclusive, then the right side of property (1.3.3) still provides an upper bound for this probability, as shown in the following theorem.

Theorem 1.4.6  Boole's Inequality  If A1, A2, ... is a sequence of events, then

P(A1 ∪ A2 ∪ ⋯) ≤ P(A1) + P(A2) + ⋯    (1.4.5)

Proof
Let B1 = A1, B2 = A2 ∩ A1', and in general Bi = Ai ∩ (A1 ∪ ⋯ ∪ A(i-1))'. It follows that B1 ∪ B2 ∪ ⋯ = A1 ∪ A2 ∪ ⋯ and that B1, B2, ... are mutually exclusive. Because Bi ⊂ Ai, it follows from Theorem 1.4.5 that P(Bi) ≤ P(Ai), and thus

P(A1 ∪ A2 ∪ ⋯) = P(B1 ∪ B2 ∪ ⋯) = P(B1) + P(B2) + ⋯ ≤ P(A1) + P(A2) + ⋯

A similar result holds for finite unions. In particular,

P(A1 ∪ A2 ∪ ⋯ ∪ Ak) ≤ P(A1) + P(A2) + ⋯ + P(Ak)    (1.4.6)

which can be shown by a proof similar to that of Theorem 1.4.6.
Theorem 1.4.7  Bonferroni's Inequality  If A1, A2, ..., Ak are events, then

P(A1 ∩ A2 ∩ ⋯ ∩ Ak) ≥ 1 - [P(A1') + P(A2') + ⋯ + P(Ak')]    (1.4.7)

Proof
This follows from Theorem 1.4.1 applied to (A1 ∩ ⋯ ∩ Ak)' = A1' ∪ ⋯ ∪ Ak', together with inequality (1.4.6).

1.5
CONDITIONAL PROBABILITY

A major objective of probability modeling is to determine how likely it is that an event A will occur when a certain experiment is performed. However, in numerous cases the probability assigned to A will be affected by knowledge of the
occurrence or nonoccurrence of another event B. In such cases we will use the terminology "conditional probability of A given B," and the notation P(A|B) will be used to distinguish between this new concept and ordinary probability P(A).
Example 1.5.1  A box contains 100 microchips, some of which were produced by factory 1 and the rest by factory 2. Some of the microchips are defective and some are good (nondefective). An experiment consists of choosing one microchip at random from the box and testing whether it is good or defective. Let A be the event "obtaining a defective microchip"; consequently, A' is the event "obtaining a good microchip." Let B be the event "the microchip was produced by factory 1" and B' the event "the microchip was produced by factory 2." Table 1.1 gives the number of microchips in each category.

TABLE 1.1  Numbers of defective and nondefective microchips from two factories

          B     B'    Totals
A        15      5        20
A'       45     35        80
Totals   60     40       100
The probability of obtaining a defective microchip is

P(A) = n(A)/n(S) = 20/100 = 0.20
Now suppose that each microchip has a number stamped on it that identifies which factory produced it. Thus, before testing whether it is defective, it can be determined whether B has occurred (produced by factory 1) or B' has occurred (produced by factory 2). Knowledge of which factory produced the microchip affects the likelihood that a defective microchip is selected, and the use of conditional probability is appropriate. For example, if the event B has occurred, then the only microchips we should consider are those in the first column of Table 1.1, and the total number is n(B) = 60. Furthermore, the only defective chips to consider are those in both the first column and the first row, and the total number is n(A ∩ B) = 15. Thus, the conditional probability of A given B is

P(A|B) = n(A ∩ B)/n(B) = 15/60 = 0.25
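Computations of this kind are conveniently organized by storing the cell counts of Table 1.1; the sketch below is our illustration, not the authors' code:

```python
# Cell counts from Table 1.1: rows A (defective), A' (good);
# columns B (factory 1), B' (factory 2).
counts = {("A", "B"): 15, ("A", "B'"): 5,
          ("A'", "B"): 45, ("A'", "B'"): 35}
N = sum(counts.values())                               # 100

P_B = (counts[("A", "B")] + counts[("A'", "B")]) / N   # 0.60
P_A_and_B = counts[("A", "B")] / N                     # 0.15
print(P_A_and_B / P_B)                                 # P(A|B) = 0.25
```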

Notice that if we divide both the numerator and denominator by n(S) = 100, we can express conditional probability in terms of some ordinary unconditional probabilities,

P(A|B) = [n(A ∩ B)/n(S)] / [n(B)/n(S)] = P(A ∩ B)/P(B)

This last result can be derived under more general circumstances as follows. Suppose we conduct an experiment with a sample space S, and suppose we are given that the event B has occurred. We wish to know the probability that an event A has occurred given that B has occurred, written P(A|B). That is, we want the probability of A relative to the reduced sample space B. We know that B can be partitioned into two subsets,

B = (A ∩ B) ∪ (A' ∩ B)

A ∩ B is the subset of B for which A is true, so the probability of A given B should be proportional to P(A ∩ B), say P(A|B) = kP(A ∩ B). Similarly, P(A'|B) = kP(A' ∩ B). Together these should represent the total probability relative to B, so

P(A|B) + P(A'|B) = k[P(A ∩ B) + P(A' ∩ B)]
                 = kP[(A ∩ B) ∪ (A' ∩ B)]
                 = kP(B)

Because this total must equal 1, k = 1/P(B). That is,

P(A|B) = P(A ∩ B) / [P(A ∩ B) + P(A' ∩ B)] = P(A ∩ B)/P(B)

and 1/P(B) is the proportionality constant that makes the probabilities on the reduced sample space add to 1.

Definition 1.5.1
The conditional probability of an event A, given the event B, is defined by

P(A|B) = P(A ∩ B)/P(B)    (1.5.1)

if P(B) ≠ 0.

Relative to the sample space B, conditional probabilities defined by (1.5.1) satisfy the original definition of probability, and thus conditional probabilities enjoy all the usual properties of probability on the reduced sample space. For
example, if two events A1 and A2 are mutually exclusive, then

P(A1 ∪ A2|B) = P[(A1 ∪ A2) ∩ B]/P(B)
             = P[(A1 ∩ B) ∪ (A2 ∩ B)]/P(B)
             = [P(A1 ∩ B) + P(A2 ∩ B)]/P(B)
             = P(A1|B) + P(A2|B)

This result generalizes to more than two events. Similarly, P(A|B) ≥ 0 and P(S|B) = P(B|B) = 1, so the conditions of a probability set function are satisfied. Thus, the properties derived in Section 1.4 hold conditionally. In particular,

P(A|B) = 1 - P(A'|B)

0 ≤ P(A|B) ≤ 1

P(A1 ∪ A2|B) = P(A1|B) + P(A2|B) - P(A1 ∩ A2|B)

The following theorem results immediately from equation (1.5.1).

Theorem 1.5.1  For any events A and B,

P(A ∩ B) = P(B)P(A|B) = P(A)P(B|A)    (1.5.5)
This sometimes is referred to as the Multiplication Theorem of probability. It provides a way to compute the probability of the joint occurrence of A and B by multiplying the probability of one event and the conditional probability of the other event. In terms of Example 1.5.1, we can compute P(A ∩ B) = 15/100 = 0.15 directly, or we can compute it as P(B)P(A|B) = (60/100)(15/60) = 0.15 or as P(A)P(B|A) = (20/100)(15/20) = 0.15.

Formula (1.5.5) also is quite useful in dealing with problems involving sampling without replacement. Such experiments consist of choosing objects one at a time from a finite collection, without replacing chosen objects before the next choice. Perhaps the most common example of this is dealing cards from a deck.
Example 1.5.2 ​Two cards are drawn without replacement from a deck of cards. Let A1 denote
the event of getting "an ace on the first draw" and A2 denote the event of getting ​"an ace on the second
draw."
The number of ways in which different outcomes can occur can be enumerated, and the results are given in Table 1.2. The enumeration of possible outcomes can be a tedious problem, and useful techniques that are helpful in such counting problems are discussed in Section 1.6. The values in this example are based on the so-called multiplication principle, which says that if there are n1 ways of doing one thing and n2 ways of doing another, then there are n1 n2 ways of doing both. Thus, for example, the total number of ordered two-card hands that can be formed from 52 cards (without replacement) is 52 · 51 = 2652. Similarly, the number of ordered two-card hands in which both cards are aces is 4 · 3, the number in which the first card is an ace and the second is not an ace is 4 · 48, and so forth. The appropriate products for all cases are provided in Table 1.2.

TABLE 1.2  Partitioning of the numbers of ways to draw two cards

           A2        A2'       Totals
A1        4 · 3     4 · 48    4 · 51
A1'      48 · 4    48 · 47   48 · 51
Totals   51 · 4    51 · 48   52 · 51

For example, the probability of getting "an ace on the first draw and an ace on the second draw" is given by

P(A1 ∩ A2) = (4 · 3)/(52 · 51)

Suppose one is interested in P(A1) without regard to what happens on the second draw. First note that A1 may be partitioned as

A1 = (A1 ∩ A2) ∪ (A1 ∩ A2')

so that

P(A1) = P(A1 ∩ A2) + P(A1 ∩ A2')
      = (4 · 3)/(52 · 51) + (4 · 48)/(52 · 51)
      = (4 · 51)/(52 · 51)
      = 4/52

This same result would have occurred if A1 had been partitioned by another event, say B, which deals only with the face value of the second card. This follows because n(B ∪ B') = 51, and relative to the 52 · 51 ordered pairs of cards,

n(A1) = 4 · n(B) + 4 · n(B') = 4 · n(B ∪ B') = 4 · 51

The numerators of probabilities such as P(A1), P(A1'), P(A2), and P(A2'), which deal with only one of the draws, appear in the margins of Table 1.2. These prob-
abilities may be referred to as marginal probabilities. Note that the marginal probabilities in fact can be computed directly from the original 52-card sample space, and it is not necessary to consider the sample space of ordered pairs at all. For example, P(A1) = (4 · 51)/(52 · 51) = 4/52, which is the probability that would be obtained for one draw from the original 52-card sample space. Clearly, this result would apply to sampling-without-replacement problems in general. What may be less intuitive is that these results also apply to marginal probabilities such as P(A2), and not just to the outcomes on the first draw. That is, if the outcome of the first draw is not known, then P(A2) also can be computed from the original sample space and is given by P(A2) = 4/52. This can be verified in this example because

A2 = (A2 ∩ A1) ∪ (A2 ∩ A1')

and

P(A2) = (4 · 3)/(52 · 51) + (48 · 4)/(52 · 51) = (51 · 4)/(52 · 51) = 4/52

Indeed, if the result of the first draw is not known, then the second draw could just as well be considered as the first draw.

The conditional probability that an ace is drawn on the second draw given that an ace was obtained on the first draw is

P(A2|A1) = P(A1 ∩ A2)/P(A1) = [(4 · 3)/(52 · 51)] / [(4 · 51)/(52 · 51)] = 3/51

That is, given that A1 is true, we are restricted to the first column of Table 1.2, and the relative proportion of the time that A2 is true on the reduced sample space is (4 · 3)/[(4 · 3) + (4 · 48)]. Again, it may be less obvious, but it is possible to carry this problem one step further and compute P(A2|A1) directly in terms of the 51-card conditional sample space, and obtain the much simpler solution P(A2|A1) = 3/51, there being three aces remaining among the 51 cards of the conditional sample space. Thus, it is common practice in this type of problem to compute the conditional probabilities and marginal probabilities directly from the one-dimensional sample spaces (one marginal and one conditional space), rather than obtain the joint probabilities from the joint sample space of ordered
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
22 ​CHAPTER 1 ​PROBABILITY

pairs. For example,

P(A1 n A2)=P(A1)P(A2 JA1)

43
52 ​51

This procedure would extend to three or more draws (without replacement) where, for example, if A3 denotes obtaining "an ace on the third draw," then

P(A1 ∩ A2 ∩ A3) = P(A1)P(A2 | A1)P(A3 | A1 ∩ A2) = (4 · 3 · 2)/(52 · 51 · 50)
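To make the sequential computation concrete, here is a minimal Python sketch (ours, not part of the text) that multiplies the conditional probabilities for any number of draws and checks the three-draw case by brute-force enumeration; the function and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

def p_all_aces(draws, aces=4, deck_size=52):
    """Multiply P(A1)P(A2|A1)...: an ace on each of `draws` draws
    without replacement."""
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(aces - i, deck_size - i)
    return p

# Brute-force check over all ordered triples from a symbolic deck
# with 4 aces ('A') and 48 other cards ('x').
deck = ['A'] * 4 + ['x'] * 48
count = sum(1 for t in permutations(range(52), 3)
            if all(deck[i] == 'A' for i in t))
print(p_all_aces(3), Fraction(count, 52 * 51 * 50))  # both 1/5525
```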

An indication of the general validity of this approach for computing conditional probabilities is obtained by considering P(A2 | A1) in the example. Relative to the joint sample space of ordered pairs, 204 = 4 · 51, where 4 represents the number of ways the given event A1 can occur on the first draw and 51 is the total number of possible outcomes in the conditional sample space for the second draw; also, 12 = 4 · 3 represents the number of ways the given event A1 can occur times the number of ways a success, A2, can occur in the conditional sample space. Because the number of ways A1 can occur is a common multiplier in the numerator and denominator when counting ordered pairs, one may equivalently count directly in the one-dimensional conditional space associated with the second draw.

The computational advantage of this approach is obvious, because it allows the computation of the probability of an event in a complicated higher-dimensional product space as a product of probabilities, one marginal and the others conditional, of events in simpler one-dimensional sample spaces.
The above discussion is somewhat tedious, but it may provide insight into the ​physical meaning of
conditional probability and marginal probability, and also ​into the topic of
sampling without replacement, which will come up again in the ​following
sections.

TOTAL PROBABILITY AND BAYES' RULE


As noted in Example 1.5.2, it sometimes is useful to partition an event, say A, into the union of two or more mutually exclusive events. For example, if B and B' are events that pertain to the first draw from a deck, and if A is an event that pertains to the second draw, then it is worthwhile to consider the partition A = (A ∩ B) ∪ (A ∩ B') to compute P(A), because this separates A into two events that involve information about both draws. More generally, if B1, B2, ..., Bk

are mutually exclusive and exhaustive, in the sense that B1 ∪ B2 ∪ ⋯ ∪ Bk = S, then

A = (A ∩ B1) ∪ (A ∩ B2) ∪ ⋯ ∪ (A ∩ Bk)

This is useful in the following theorem.

Theorem 1.5.2  Total Probability  If B1, B2, ..., Bk is a collection of mutually exclusive and exhaustive events, then for any event A,

P(A) = \sum_{i=1}^{k} P(B_i)P(A | B_i)    (1.5.6)

Proof
The events A ∩ B1, A ∩ B2, ..., A ∩ Bk are mutually exclusive, so it follows that

P(A) = \sum_{i=1}^{k} P(A ∩ B_i)    (1.5.7)

and the theorem results from applying Theorem 1.5.1 to each term in this summation.

Theorem 1.5.2 sometimes is known as the Law of Total Probability, because it corresponds to mutually exclusive ways in which A can occur relative to a partition of the total sample space S.
Sometimes it is helpful to illustrate this result with a tree diagram. One such ​diagram for the case of
three events B1, B2, and B3 is given in Figure 1.3.

FIGURE 1.3 ​Tree diagram showing the Law of Total Probability



The probability associated with branch Bj is P(Bj), and the probability associated with each branch labeled A is a conditional probability P(A | Bj), which may be different depending on which branch, Bj, it follows. For A to occur, it must occur jointly with one and only one of the events Bj. Thus, one and only one of A ∩ B1, A ∩ B2, or A ∩ B3 must occur, and the probability of A is the sum of the probabilities of these joint events, P(Bj)P(A | Bj).
Example 1.5.3  Factory 1 in Example 1.5.1 has two shifts, and the microchips from factory 1 can be categorized according to which shift produced them. As before, the experiment consists of choosing a microchip at random from the box and testing to see whether it is defective. Let B1 be the event "produced by shift 1" (factory 1), B2 the event "produced by shift 2" (factory 1), and B3 the event "produced by factory 2." As before, let A be the event of obtaining a defective microchip. The categories are given by Table 1.3.

TABLE 1.3  Numbers of defective and non-defective microchips from a common lot

            B1    B2    B3    Totals
A            5    10     5     20
A'          20    25    35     80
Totals      25    35    40    100

Various probabilities can be computed directly from the table. For example, P(B1) = 25/100, P(B2) = 35/100, P(B3) = 40/100, P(A | B1) = 5/25, P(A | B2) = 10/35, and P(A | B3) = 5/40. It is possible to compute P(A) either directly from the table, P(A) = 20/100 = 0.20, or by using the Law of Total Probability:

P(A) = P(B1)P(A | B1) + P(B2)P(A | B2) + P(B3)P(A | B3)
     = (25/100)(5/25) + (35/100)(10/35) + (40/100)(5/40)
     = 0.05 + 0.10 + 0.05 = 0.20


This problem is illustrated by the tree diagram in Figure 1.4.
FIGURE 1.4  Tree diagram for selection of microchips from combined lot
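The tree computation is easy to mirror in code. Below is a small Python sketch (ours, not from the text) of the Law of Total Probability applied to Example 1.5.3; the variable names are illustrative.

```python
from fractions import Fraction

# P(Bi): shift 1, shift 2, factory 2; P(A|Bi): defective rate per source
p_B = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]
p_A_given_B = [Fraction(5, 25), Fraction(10, 35), Fraction(5, 40)]

# Law of Total Probability: P(A) = sum over i of P(Bi) * P(A|Bi)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)  # 1/5, i.e., 0.20, matching the direct count 20/100
```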
Example 1.5.4  Consider the following variation on Example 1.5.3. The microchips are sorted into three separate boxes. Box 1 contains the 25 microchips from shift 1, box 2 contains the 35 microchips from shift 2, and box 3 contains the remaining 40 microchips from factory 2. The new experiment consists of choosing a box at random and then selecting a microchip from the box. This experiment is illustrated in Figure 1.5.

FIGURE 1.5  Selection of microchips from three different sources (box 1: 5 defective, 20 good; box 2: 10 defective, 25 good; box 3: 5 defective, 35 good)

In this case, it is not possible to compute P(A) directly from Table 1.3, but it still is possible to use equation (1.5.6) by redefining the events B1, B2, and B3 to be respectively choosing "box 1," "box 2," and "box 3." Thus, the new assignment of probability to B1, B2, and B3 is P(B1) = P(B2) = P(B3) = 1/3, and

P(A) = (1/3)(5/25) + (1/3)(10/35) + (1/3)(5/40) = 57/280

As a result of this new experiment, suppose that the component obtained is defective, but it is not known which box it came from. It is possible to compute the probability that it came from a particular box given that it was defective, although a special formula is required.

Theorem 1.5.3  Bayes' Rule  If we assume the conditions of Theorem 1.5.2, then for each j = 1, 2, ..., k,

P(B_j | A) = P(B_j)P(A | B_j) / \sum_{i=1}^{k} P(B_i)P(A | B_i)    (1.5.8)

Proof
From Definition 1.5.1 and the Multiplication Theorem (1.5.5) we have

P(B_j | A) = P(A ∩ B_j)/P(A) = P(B_j)P(A | B_j)/P(A)

The theorem follows by replacing the denominator with the right side of (1.5.6).

For the data of Example 1.5.4, the conditional probability that the microchip came from box 1, given that it is defective, is

P(B1 | A) = (1/3)(5/25) / [(1/3)(5/25) + (1/3)(10/35) + (1/3)(5/40)] = 56/171 ≈ 0.327

Similarly, P(B2 | A) = 80/171 ≈ 0.468 and P(B3 | A) = 35/171 ≈ 0.205. Notice that these differ from the unconditional probabilities, P(Bj) = 1/3 ≈ 0.333. This reflects the different proportions of defective items in the boxes. In other words, because box 2 has a higher proportion of defectives, choosing a defective item effectively increases the likelihood that it was chosen from box 2.
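Bayes' Rule is just as direct to code. The sketch below (ours, not from the text; names are illustrative) computes the posterior probabilities P(Bj | A) for the three boxes.

```python
from fractions import Fraction

p_B = [Fraction(1, 3)] * 3                      # a box chosen at random
p_A_given_B = [Fraction(5, 25), Fraction(10, 35), Fraction(5, 40)]

p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))  # 57/280

# Bayes' Rule: P(Bj|A) = P(Bj)P(A|Bj) / sum over i of P(Bi)P(A|Bi)
posterior = [pb * pa / p_A for pb, pa in zip(p_B, p_A_given_B)]
print(posterior)  # [56/171, 80/171, 35/171]
```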
For another illustration, consider the following example.

Example 1.5.5  A man starts at the point O on the map shown in Figure 1.6. He first chooses a path at random and follows it to point B1, B2, or B3. From that point, he chooses a new path at random and follows it to one of the points Ai, i = 1, 2, ..., 7.

FIGURE 1.6  Map of possible paths (terminal points A1, A2, ..., A7)

It might be of interest to know the probability that the man arrives at point A4. This can be computed from the Law of Total Probability:

P(A4) = P(B1)P(A4 | B1) + P(B2)P(A4 | B2) + P(B3)P(A4 | B3)
      = (1/3)(1/4) + (1/3)(1/2) + (1/3)(0) = 1/4
Suppose the man arrives at point A4, but it is not known which route he took. The probability that he passed through a particular point, B1, B2, or B3, can be computed from Bayes' Rule. For example,

P(B1 | A4) = (1/3)(1/4) / [(1/3)(1/4) + (1/3)(1/2) + (1/3)(0)] = 1/3

which agrees with the unconditional probability, P(B1) = 1/3. This is an example of a very special situation called "independence," which we will pursue in the next section. However, this does not occur in every case. For example, an application of Bayes' Rule also leads to P(B2 | A4) = 2/3, which does not agree with P(B2) = 1/3. Thus, if he arrived at point A4, it is twice as likely that he passed through point B2 as it is that he passed through B1. Of course, the most striking result concerns point B3, because P(B3 | A4) = 0, while P(B3) = 1/3. This reflects the obvious fact that he cannot arrive at point A4 by passing through point B3. The practical value of conditioning is obvious when considering some action such as betting on whether the man passed through point B3.
INDEPENDENT EVENTS

In some situations, knowledge that an event A has occurred will not affect the probability that an event B will occur. In other words, P(B | A) = P(B). We saw this happen in Example 1.5.5, because the probability of passing through point B1 was 1/3 whether or not the knowledge that the man arrived at point A4 was taken into account. As a result of the Multiplication Theorem (1.5.5), an equivalent formulation of this situation is P(A ∩ B) = P(A)P(B | A) = P(A)P(B). In general, when this happens the two events are said to be independent or stochastically independent.

Definition 1.5.2
Two events A and B are called independent events if

P(A ∩ B) = P(A)P(B)    (1.5.9)

Otherwise, A and B are called dependent events.

As already noted, an equivalent formulation can be given in terms of conditional probability.
Theorem 1.5.4  If A and B are events such that P(A) > 0 and P(B) > 0, then A and B are independent if and only if either of the following holds:

P(A | B) = P(A)
P(B | A) = P(B)
We saw examples of both independent and dependent events in Example 1.5.5. There was also an example of mutually exclusive events, because P(B3 | A4) = 0, which implies P(B3 ∩ A4) = 0. There is often confusion between the concepts of independent events and mutually exclusive events. Actually, these are quite different notions, and perhaps this is seen best by comparisons involving conditional probabilities. Specifically, if A and B are mutually exclusive, then P(A | B) = P(B | A) = 0, whereas for independent nonnull events the conditional probabilities are nonzero as noted by Theorem 1.5.4. In other words, the property of being mutually exclusive involves a very strong form of dependence, because, for nonnull events, the occurrence of one event precludes the occurrence of the other event.
There are many applications in which events are assumed to be independent.
Example 1.5.6  A "system" consists of several components that are hooked up in some particular configuration. It is often assumed in applications that the failure of one component does not affect the likelihood that another component will fail. Thus, the failure of one component is assumed to be independent of the failure of another component.

A series system of two components, C1 and C2, is illustrated by Figure 1.7. It is easy to think of such a system in terms of two electrical components (for example, batteries in a flashlight) where current must pass through both components for the system to function. If A1 is the event "C1 fails" and A2 is the event "C2 fails," then the event "the system fails" is A1 ∪ A2. Suppose that P(A1) = 0.1 and P(A2) = 0.2. If we assume that A1 and A2 are independent, then the probability that the system fails is

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
           = P(A1) + P(A2) − P(A1)P(A2)
           = 0.1 + 0.2 − (0.1)(0.2) = 0.28

The probability that the system works properly is 1 − 0.28 = 0.72.
FIGURE 1.7 ​Series system of two components
Notice that the assumption of independence permits us to factor the probability of the joint event, P(A1 ∩ A2), into the product of the marginal probabilities, P(A1)P(A2).

Another common example involves the notion of a parallel system, as illustrated in Figure 1.8. For a parallel system to fail, it is necessary that both components fail, so the event "the system fails" is A1 ∩ A2. The probability that the system fails is P(A1 ∩ A2) = P(A1)P(A2) = (0.1)(0.2) = 0.02, again assuming the components fail independently.

Note that the probability of failure for a series system is greater than the probability of failure of either component, whereas for a parallel system it is less. This is because both components must function for a series system to function, and consequently the system is more likely to fail than an individual component. On the other hand, a parallel system is a redundant system: One component can fail, but the system will continue to function provided the other component functions. Such redundancy is common in aerospace systems, where the failure of the system may be catastrophic.
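A short Python sketch (our illustration, with made-up helper names) captures both formulas for independent component failures.

```python
def series_fail(p1, p2):
    """P(series system fails) = P(A1 or A2) for independent failures."""
    return p1 + p2 - p1 * p2

def parallel_fail(p1, p2):
    """P(parallel system fails) = P(A1 and A2) for independent failures."""
    return p1 * p2

print(series_fail(0.1, 0.2))    # 0.28, so reliability 0.72
print(parallel_fail(0.1, 0.2))  # 0.02, so reliability 0.98
```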
A common example of dependent events occurs in connection with repeated sampling without replacement from a finite collection. In Example 1.5.2 we considered the results of drawing two cards in succession from a deck. It turns out that the events A1 (ace on the first draw) and A2 (ace on the second draw) are dependent, because P(A2) = 4/52, while P(A2 | A1) = 3/51.

Suppose instead that the outcome of the first card is recorded, and then the card is replaced in the deck and the deck is shuffled before the second draw is made. This type of sampling is referred to as sampling with replacement, and it would be reasonable to assume that the draws are independent trials. In this case P(A1 ∩ A2) = P(A1)P(A2). There are many other problems in which it is reasonable to assume that repeated trials of an experiment are independent, such as tossing a coin or rolling a die repeatedly.
It is possible to show that independence of two events also implies the indepen- ​dence of some related
events.

FIGURE 1.8 ​Parallel system of two components



Theorem 1.5.5  Two events A and B are independent if and only if the following pairs of events are also independent:
1. A and B'.
2. A' and B.
3. A' and B'.

Proof
See Exercise 38.

It is also possible to extend the notion of independence to more than two ​events.

Definition 1.5.3
The k events A1, A2, ..., Ak are said to be independent or mutually independent if for every j = 2, 3, ..., k and every subset of distinct indices i1, i2, ..., ij,

P(A_{i1} ∩ A_{i2} ∩ ⋯ ∩ A_{ij}) = P(A_{i1})P(A_{i2}) ⋯ P(A_{ij})    (1.5.10)

Suppose A, B, and C are three mutually independent events. According to the definition of mutually independent events, it is not sufficient simply to verify pairwise independence. It would be necessary to verify P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C), and also P(A ∩ B ∩ C) = P(A)P(B)P(C). The following examples show that pairwise independence does not imply this last three-way factorization and vice versa.

Example 1.5.7  A box contains eight tickets, each labeled with a binary number. Two are labeled 111, two are labeled 100, two 010, and two 001. An experiment consists of drawing one ticket at random from the box. Let A be the event "the first digit is 1," B the event "the second digit is 1," and C the event "the third digit is 1." This is illustrated by Figure 1.9. It follows that P(A) = P(B) = P(C) = 4/8 = 1/2 and that P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 2/8 = 1/4; thus A, B, and C are pairwise independent. However, they are not mutually independent, because

P(A ∩ B ∩ C) = 2/8 = 1/4 ≠ 1/8 = P(A)P(B)P(C)

FIGURE 1.9 ​Selection of numbered tickets

Example 1.5.8  In Figure 1.9, let us change the number on one ticket in the first column from 111 to 110, and the number on one ticket in the second column from 100 to 101. We still have P(A) = P(B) = P(C) = 1/2, but

P(B ∩ C) = 1/8 ≠ 1/4 = P(B)P(C)

and

P(A ∩ B ∩ C) = 1/8 = P(A)P(B)P(C)

In this case we have three-way factorization, but not independence of all pairs.
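A quick enumeration in Python (ours; not from the text) confirms both examples by checking every required factorization over the eight equally likely tickets.

```python
from fractions import Fraction
from itertools import combinations

def prob(tickets, pred):
    """Probability of an event over equally likely tickets."""
    return Fraction(sum(pred(t) for t in tickets), len(tickets))

def check(tickets):
    # A, B, C: the digit in position 0, 1, 2 equals '1'
    events = [lambda t, i=i: t[i] == '1' for i in range(3)]
    pair_ok = all(prob(tickets, lambda t: e1(t) and e2(t))
                  == prob(tickets, e1) * prob(tickets, e2)
                  for e1, e2 in combinations(events, 2))
    triple_ok = (prob(tickets, lambda t: all(e(t) for e in events))
                 == prob(tickets, events[0]) * prob(tickets, events[1])
                 * prob(tickets, events[2]))
    return pair_ok, triple_ok

ex7 = ['111', '111', '100', '100', '010', '010', '001', '001']
ex8 = ['111', '110', '100', '101', '010', '010', '001', '001']
print(check(ex7))  # (True, False): pairwise but not mutually independent
print(check(ex8))  # (False, True): three-way factorization, not all pairs
```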
1.6  COUNTING TECHNIQUES

In many experiments with finite sample spaces, such as games of chance, it may be reasonable to assume that all possible outcomes are equally likely. In that case a realistic probability model should result by following the classical approach and taking the probability of any event A to be P(A) = n(A)/N, where N is the total number of possible outcomes and n(A) is the number of these outcomes that correspond to occurrence of the event A. Counting the number of ways in which an event may occur can be a tedious problem in complicated experiments. A few helpful counting techniques will be discussed.
MULTIPLICATION PRINCIPLE

First note that if one operation can be performed in n1 ways and a second operation can be performed in n2 ways, then there are n1 · n2 ways in which both operations can be carried out.
Example 1.6.1  Suppose a coin is tossed and then a marble is selected at random from a box containing one black (B), one red (R), and one green (G) marble. The possible outcomes are HB, HR, HG, TB, TR, and TG. For each of the two possible outcomes of the coin there are three marbles that may be selected, for a total of 2 · 3 = 6 possible outcomes. The situation also is easily illustrated by a tree diagram, as in Figure 1.10.
FIGURE 1.10  Tree diagram of two-stage experiment
Another application of the multiplication principle was discussed in Example 1.5.2 in connection with counting the number of ordered two-card hands. Note that the multiplication principle can be extended to more than two operations. In particular, if the ith of r successive operations can be performed in ni ways, then the total number of ways to carry out all r operations is the product

n1 · n2 ⋯ nr    (1.6.1)
One standard type of counting problem is covered by the ​following theorem.
Theorem 1.6.1  If there are N possible outcomes of each of r trials of an experiment, then there are N^r possible outcomes in the sample space.

Example 1.6.2  How many ways can a 20-question true–false test be answered? The answer is 2^20.

Example 1.6.3  How many subsets are there from a set of m elements? In forming a subset, one must decide for each element whether to include that element in the subset. Thus for each of m elements there are two choices, which gives a total of 2^m possible subsets. This includes the null set, which corresponds to the case of not including any element in the subset.

As suggested earlier, the way an experiment is carried out or the method of sampling may affect the sample space and the probability assignment over the sample space. In particular, sampling items from a finite population with and without replacement are two common schemes. Sampling without replacement was illustrated in Example 1.5.2. Sampling with replacement is covered by Theorem 1.6.1.

Example 1.6.4  If five cards are drawn from a deck of 52 cards with replacement, then there are (52)^5 possible hands. If the five cards are drawn without replacement, then the more general multiplication principle may be applied to determine that there are 52 · 51 · 50 · 49 · 48 possible hands. In the first case, the same card may occur more than once in the same hand. In the second case, however, a card may not be repeated.

Note that in both cases in the above example, order is considered important. That is, two five-card hands may eventually end up with the same five cards, but they are counted as different hands in the example if the cards were obtained in a different order. For example, let all five cards be spades. The outcome (ace, king, queen, jack, ten) is different from the outcome (king, ace, queen, jack, ten). If order had not been considered important, both of these outcomes would be considered the same; indeed, there would be several different ordered outcomes corresponding to this same (unordered) outcome. On the other hand, only one outcome corresponds to all five cards being the ace of spades (in the sampling-with-replacement case), whether the cards are ordered or unordered.

This introduces the concept of distinguishable and indistinguishable elements. Even though order may be important, a new result or arrangement will not be obtained if two indistinguishable elements are interchanged. Thus, fewer ordered arrangements are possible if some of the items are indistinguishable. We also noted earlier that there are fewer distinct results if order is not taken into account, but the probability of any one of these unordered results occurring then would be greater. Note also that it is common practice to assume that order is not important when drawing without replacement, unless otherwise specified, although we did consider order important in Example 1.6.4.
PERMUTATIONS AND COMBINATIONS

Some particular formulas that are helpful in counting the number of possible arrangements for some of the cases mentioned will be given. An ordered arrangement of a set of objects is known as a permutation.
Theorem 1.6.2  The number of permutations of n distinguishable objects is n!.

Proof
This follows by applying the multiplication principle. To fill n positions with n distinct objects, the first position may be filled n ways using any one of the n objects, the second position may be filled n − 1 ways using any of the remaining n − 1 objects, and so on until the last object is placed in the last position. Thus, by the multiplication principle, this operation may be carried out in n(n − 1) ⋯ 1 = n! ways.
For example, the number of arrangements of five distinct cards is 5! = 120. ​One also may be interested in the
number of ways of selecting r objects from ​n ​distinct objects and then ordering these r objects.
Theorem 1.6.3  The number of permutations of n distinct objects taken r at a time is

nPr = n!/(n − r)!    (1.6.2)

Proof
To fill r positions from n objects, the first position may be filled in n ways using any one of the n objects, the second position may be filled in n − 1 ways, and so on until n − (r − 1) objects are left to fill the rth position. Thus, the total number of ways of carrying out this operation is

n(n − 1)(n − 2) ⋯ (n − (r − 1)) = n!/(n − r)!
Example 1.6.5  The number of permutations of the four letters a, b, c, d taken two at a time is 4!/2! = 12. These are displayed in Figure 1.11. In picking two out of the four letters, there are six unordered ways to choose two letters from the four, as given by the top row. Each combination of two letters then can be permuted 2! ways to get the total of 12 ordered arrangements.
FIGURE 1.11  Permutations of four objects taken two at a time

ab  ac  ad  bc  bd  cd
ba  ca  da  cb  db  dc
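The counts in Figure 1.11 are easy to reproduce with Python's standard library; this sketch (ours, not from the text) also checks the formulas nPr = n!/(n − r)! and the corresponding combination count.

```python
from itertools import permutations, combinations
from math import perm, comb

letters = 'abcd'
perms = [''.join(p) for p in permutations(letters, 2)]
combos = [''.join(c) for c in combinations(letters, 2)]

print(len(perms), perm(4, 2))   # 12 12, matching 4!/2!
print(len(combos), comb(4, 2))  # 6 6, matching 4!/(2!2!)
```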

Example 1.6.6  A box contains n tickets, each marked with a different integer, 1, 2, 3, ..., n. If three tickets are selected at random without replacement, what is the probability of obtaining tickets with consecutive integers? One possible solution would be to let the sample space consist of all ordered triples (i, j, k), where i, j, and k are different integers in the range 1 to n. The number of such triples is nP3 = n!/(n − 3)! = n(n − 1)(n − 2). The triples of consecutive integers would be (1, 2, 3), (2, 3, 4), ..., (n − 2, n − 1, n), or any of the triples formed by permuting the entries in these n − 2 triples. There would be 3!(n − 2) = 6(n − 2) such triples. The desired probability is

6(n − 2)/[n(n − 1)(n − 2)] = 6/[n(n − 1)]
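For a concrete check, the following brute-force enumeration (our sketch, with a small illustrative n) verifies 6/[n(n − 1)].

```python
from fractions import Fraction
from itertools import permutations

n = 10
triples = list(permutations(range(1, n + 1), 3))  # n(n-1)(n-2) ordered triples
hits = sum(1 for t in triples
           if sorted(t) == list(range(min(t), min(t) + 3)))
print(Fraction(hits, len(triples)), Fraction(6, n * (n - 1)))  # both 1/15
```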
If the order of the objects is not important, then one may simply be interested in the number of combinations that are possible when selecting r objects from n distinct objects. The symbol \binom{n}{r} usually is used to denote this number.

Theorem 1.6.4  The number of combinations of n distinct objects chosen r at a time is

\binom{n}{r} = n!/(r!(n − r)!)    (1.6.3)

Proof
As suggested in the preceding example, nPr may be interpreted as the number of ways of choosing r objects from n objects and then permuting the r objects r! ways, giving

nPr = \binom{n}{r} · r! = n!/(n − r)!

Dividing by r! gives the desired expression for \binom{n}{r}.
Thus, the number of combinations of four letters taken two at a time is

\binom{4}{2} = 4!/(2!2!) = 6

as noted above. If order is considered, then the number of arrangements becomes 6 · 2! = 12 as before. Thus, \binom{4}{2} counts the number of paired symbols in either the first or second row, but not both, in Figure 1.11.

It also is possible to solve the probability problem in Example 1.6.6 using combinations. The sample space would consist of all combinations of the n integers 1, 2, ..., n taken three at a time. Equivalently, this would be the collection of all subsets of size 3 from the set {1, 2, 3, ..., n}, of which there are

\binom{n}{3} = n!/(3!(n − 3)!) = n(n − 1)(n − 2)/6

The n − 2 combinations or subsets of consecutive integers would be {1, 2, 3}, {2, 3, 4}, ..., {n − 2, n − 1, n}. As usual, no distinction should be made of subsets that list the elements in a different order. The resulting probability is

(n − 2)/[n(n − 1)(n − 2)/6] = 6/[n(n − 1)]
as before. This shows that some problems can be solved using either combinations or permutations. Usually, if there is a choice, the combination approach is simpler because the sample space is smaller. However, combinations are not appropriate in some problems.

Example 1.6.7  In Example 1.6.6, suppose that the sampling is done with replacement. Now, the same number can be repeated in the triples (i, j, k), so that the sample space has n^3 outcomes. There are still only 6(n − 2) triples of consecutive integers, because repeated integers cannot be consecutive. The probability of consecutive integers in this case is 6(n − 2)/n^3. Integers can be repeated in this case, so the combination approach is not appropriate.

Example 1.6.8  A familiar use of the combination notation is in expressing the binomial expansion

(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k}    (1.6.4)

In this case, \binom{n}{k} is the coefficient of a^k b^{n−k}, and it represents the number of ways of choosing k of the n factors (a + b) from which to use the a term, with the b term being used from the remaining n − k factors.

Example 1.6.9  The combination concept can be used to determine the number of subsets of a set of m elements. There are \binom{m}{j} ways of choosing j elements from the m elements, so there are \binom{m}{j} subsets of j elements for j = 0, 1, ..., m. The case j = 0 corresponds to the null set and is represented by \binom{m}{0} = 1, because 0! is defined to be equal to 1, for notational convenience. Thus the total number of subsets including the null set is given by

\sum_{j=0}^{m} \binom{m}{j} = (1 + 1)^m = 2^m
Example 1.6.10  If five cards are drawn from a deck of cards without replacement, the number of five-card hands is

\binom{52}{5} = 52!/(5!47!)

If order is taken into account as in Example 1.6.4, then the number of ordered five-card hands is

52P5 = \binom{52}{5} · 5! = 52!/47!

Similarly, in Example 1.5.2 the number of ordered two-card hands was given to be

\binom{52}{2} · 2! = 52 · 51
INDISTINGUISHABLE OBJECTS

The discussion to this point has dealt with arrangements of n distinguishable objects. There are also many applications involving objects that are not all distinguishable.

Example 1.6.11  You have five marbles, two black and three white, but otherwise indistinguishable. In Figure 1.12, we represent all the distinguishable arrangements of two black (B) and three white (W) marbles.

FIGURE 1.12  Distinguishable arrangements of five objects, two of one type and three of another

BBWWW  WWBBW  BWBWW  WWBWB  WWWBB
WBBWW  BWWBW  WBWWB  BWWWB  WBWBW

Notice that arrangements are distinguishable if they differ by exchanging marbles of different colors, but not if the exchange involves the same color. We will refer to these 10 different arrangements as permutations of the five objects even though the objects are not all distinguishable.

A more general way to count such permutations first would be to introduce labels for the objects, say B1, B2, W1, W2, W3. There are 5! permutations of these distinguishable objects, but within each color there are permutations that we don't want to count. We can compensate by dividing by the number of permutations of black objects (2!) and of white objects (3!). Thus, the number of permutations of nondistinguishable objects is

5!/(2!3!) = 10    (1.6.5)
This is a special case of the following theorem.

Theorem 1.6.5  The number of distinguishable permutations of n objects of which r are of one kind and n − r are of another kind is

\binom{n}{r} = n!/(r!(n − r)!)    (1.6.6)

Clearly, this concept can be generalized to the case of permuting k types of ​objects.

Theorem 1.6.6  The number of permutations of n objects of which r1 are of one kind, r2 of a second kind, ..., rk of a kth kind is

n!/(r1! r2! ⋯ rk!)    (1.6.7)

Proof
This follows from the argument of Example 1.6.11, except with k different colors of balls.

Example 1.6.12  You have 10 marbles: two black, three white, and five red, but otherwise not distinguishable. The number of different permutations is

10!/(2!3!5!) = 2520

The notion of permutations of n objects, not all of which are distinguishable, is ​related to yet another
type of operation with n distinct objects.

PARTITIONING

Let us select r objects from n distinct objects and place them in a box or "cell," and then place the remaining n − r objects in a second cell. Clearly, there are \binom{n}{r} ways of doing this (because permuting the objects within a cell will not produce a new result), and this is referred to as the number of ways of partitioning n objects into two cells with r objects in one cell and n − r in the other. The concept generalizes readily to partitioning n distinct objects into more than two cells.
Theorem 1.6.7  The number of ways of partitioning a set of n objects into k cells with r1 objects in the first cell, r2 in the second cell, and so forth is

n!/(r1! r2! ⋯ rk!)

where \sum_{i=1}^{k} r_i = n.

Note that partitioning assumes that the number of objects to be placed in each cell is fixed, and that the order in which the objects are placed into cells is not considered. By successively selecting the objects, the number of partitions also may be expressed as

\binom{n}{r_1} \binom{n − r_1}{r_2} ⋯ \binom{n − r_1 − ⋯ − r_{k−1}}{r_k} = n!/(r1! r2! ⋯ rk!)
Example 1.6.13  How many ways can you distribute 12 different popsicles equally among four children? By Theorem 1.6.7 this is

12!/(3!3!3!3!) = 369,600

This is also the number of ways of arranging 12 popsicles, of which three are red, three are green, three are orange, and three are yellow, if popsicles of the same color are otherwise indistinguishable.
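A two-line Python check of this multinomial count (our sketch, not from the text):

```python
from math import factorial, prod

counts = [3, 3, 3, 3]  # popsicles per child (or per color)
ways = factorial(sum(counts)) // prod(factorial(c) for c in counts)
print(ways)  # 369600
```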
PROBABILITY COMPUTATIONS
As mentioned earlier, if it can be assumed that all possible outcomes are equally ​likely to occur, then the
classical probability concept is useful for assigning prob- ​abilities to events, and the counting techniques
reviewed in this section may be ​helpful in computing the number of ways an event may occur.

Recall that the method of sampling, and assumptions concerning order,

whether the items are indistinguishable, and so on, may have an effect on the
number of possible outcomes.

Example 1.6.14  A student answers 20 true–false questions at random. The probability of getting 100% on the test is P(100%) = 1/2^20 ≈ 0.00000095. We wish to know the probability of getting 80% right, that is, answering 16 questions correctly. We do not care which 16 questions are answered correctly, so there are \binom{20}{16} ways of choosing exactly 16 correct answers, and P(80%) = \binom{20}{16}/2^20 ≈ 0.0046.
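These two numbers are quick to confirm in Python (our sketch):

```python
from math import comb

print(1 / 2**20)             # P(100%), about 9.5e-07
print(comb(20, 16) / 2**20)  # P(80%), about 0.0046
```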

Example 1.6.15  Sampling Without Replacement  A box contains 10 black marbles and 20 white marbles, and five marbles are selected without replacement. The probability of getting exactly two black marbles is

P(exactly 2 black) = \binom{10}{2} \binom{20}{3} / \binom{30}{5} ≈ 0.360    (1.6.8)

There are \binom{30}{5} total possible outcomes. Also, there are \binom{10}{2} ways of choosing the two black marbles from the 10 black marbles, and \binom{20}{3} ways of choosing the remaining three white marbles from the 20 white marbles. By the multiplication principle, there are \binom{10}{2} \binom{20}{3} ways of achieving the event of getting two black marbles. Note that order was not considered important in this problem, although all 30 marbles are considered distinct in this computation, both in considering the total number of outcomes in the sample space and in considering how many outcomes correspond to the desired event occurring. Even though the question does not distinguish between the order of outcomes, it is possible to consider the question relative to the larger sample space of equally likely ordered outcomes. In that case one would have 30P5 = \binom{30}{5} · 5! possible outcomes and

P(exactly 2 black) = \binom{10}{2} \binom{20}{3} · 5! / [\binom{30}{5} · 5!]    (1.6.9)

which gives the same answer as before.



It also is possible to attack this problem by the conditional probability approach discussed in Section 1.5. First consider the probability of getting the outcome BBWWW in the specified order. Here we choose to use the distinction between B and W, but not the distinction within the B's or within the W's. By the conditional probability approach, this joint probability may be expressed as

P(BBWWW) = (10 · 9 · 20 · 19 · 18)/(30 · 29 · 28 · 27 · 26)

Similarly,

P(BWBWW) = (10 · 20 · 9 · 19 · 18)/(30 · 29 · 28 · 27 · 26)

and so on. Thus, each particular ordering has the same probability. If we do not wish to distinguish between the ordering of the black and white marbles, then

P(exactly 2 black) = \binom{5}{2} (10 · 9 · 20 · 19 · 18)/(30 · 29 · 28 · 27 · 26)    (1.6.10)

which again is the same as equation (1.6.8). That is, there are \binom{5}{2} = 10 different

particular orderings that have two black and three white marbles (see Figure 1.12).

One could consider \binom{5}{2} as the number of ways of choosing two positions out of the five positions in which to place two black marbles. If a particular order is not required, the probability of a successful outcome is greater. We could continue to consider all 30 marbles distinct in this framework, but because only the order between black and white was considered in computing a particular sequence, it follows that there are only \binom{5}{2} unordered sequences rather than 5! sequences. Thus, although two black marbles may be distinct, permuting them does not produce a different result. The order of the black marbles within themselves was not considered important when defining the ordered sequences; only the order between black and white was considered. Thus the coefficient \binom{5}{2} could also be interpreted as the number of permutations of five things of which two were alike and three were alike (see Figure 1.12).
Thus, we have seen that it is possible to think of the black and white marbles ​as being
indistinguishable within themselves in this problem, and the same value ​for
P(exactly 2 black) is obtained; however, the computation is no longer carried
out over an original basic sample space of equally likely outcomes. For example,
on the first draw one would just have the two possible outcomes, B and W,
although these two outcomes obviously would not be equally likely, but rather

P(B) = 10/30 and P(W) = 20/30. Indeed, the assumption that the black marbles
and white marbles are indistinguishable within themselves appears more natural ​in the conditional
probability approach. Nevertheless, the distinctness assump- ​tion is a convenient aid in the first approach

to obtain the more basic equally ​likely sample space, even though the question itself does not

require dis- tinguishing


​ within a color.
Example 1.6.16  Sampling with Replacement  If the five marbles are drawn with replacement in Example 1.6.15, then the conditional probability approach seems most natural, and analogous to (1.6.10),

P(exactly 2 black) = \binom{5}{2} (10/30)^2 (20/30)^3    (1.6.11)

Of course, in this case the outcomes on each draw are independent. If one chooses to use the classical approach in this case, it is more convenient to consider the sample space of 30^5 equally likely ordered outcomes; in Example 1.6.15 it is more convenient to consider the sample space of unordered outcomes as in equation (1.6.8), rather than the ordered outcomes as in equation (1.6.9). For the event A, "exactly 2 black," one then has

P(A) = \binom{5}{2} 10^2 20^3 / 30^5

The form in this case remains quite similar to equation (1.6.11), although the argument would be somewhat different. There are \binom{5}{2} different patterns in which the ordered arrangements may contain two black and three white marbles, and for each pattern there are 10^2 · 20^3 distinct arrangements that can be formed in this sample space.
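Both sampling schemes are easy to check numerically. The Python sketch below (ours, not from the text) compares the combination count of (1.6.8) with the ordered-sequence product of (1.6.10), and then evaluates the with-replacement probability (1.6.11).

```python
from fractions import Fraction
from math import comb

# Without replacement: equation (1.6.8) vs the ordered approach (1.6.10)
p_hyper = Fraction(comb(10, 2) * comb(20, 3), comb(30, 5))
p_ordered = comb(5, 2) * Fraction(10 * 9 * 20 * 19 * 18,
                                  30 * 29 * 28 * 27 * 26)
print(p_hyper == p_ordered, float(p_hyper))  # True, about 0.360

# With replacement: equation (1.6.11), a binomial-type probability
p_binom = comb(5, 2) * Fraction(10, 30)**2 * Fraction(20, 30)**3
print(float(p_binom))                        # about 0.329
```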

Because many diverse types of probability problems can be stated, a unique approach often may be needed to identify the mutually exclusive ways that an event can occur in such a manner that these ways can be readily counted. However, certain classical problems (such as those illustrated in Examples 1.6.15 and 1.6.16) can be recognized easily, and general probability distribution functions can be determined for them. For these problems, the individual counting problems need not be analyzed so carefully each time.
SUMMARY
The purpose of this chapter was to develop the concept of probability in order to ​model phenomena where

the observed result is uncertain before experimentation. ​The basic approach involves defining the sample

space as the set of all possible


PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
EXERCISES ​
43

outcomes of the experiment, and defining an event mathematically as the set of outcomes associated with occurrence of the event. The primary motivation for assigning probability to an event involves the long-term relative frequency interpretation. However, the approach of defining probability in terms of a simple set of axioms is more general, and it allows the possibility of other methods of assignment and other interpretations of probability. This approach also makes it possible to derive general properties of probability.

The notion of conditional probability allows the introduction of additional information concerning the occurrence of one event when assigning probability to another. If the probability assigned to one event is not affected by the information that another event has occurred, then the events are considered independent. Care should be taken not to confuse the concepts of independent and mutually exclusive events. Specifically, mutually exclusive events are dependent, because the occurrence of one precludes the occurrence of the other. In other words, the conditional probability of one given the other is zero.
One of the primary methods of assigning probability, which applies in the case of
a finite sample space, is based on the assumption that all outcomes are equally
likely to occur. To implement this method, it is useful to have techniques for
counting the number of outcomes in an event. The primary techniques include

formulas for counting ordered arrangements of objects (permutations) and

unordered sets of objects (combinations). To


​ express probability models by
general formulas, it is convenient first to ​introduce the concept of a "random
variable" and a function that describes the ​probability distribution. These
concepts will be discussed in the next chapter, and ​general solutions then can be
provided for some of the basic counting problems ​most often encountered.

EXERCISES

1. A gum-ball machine gives out a red, a black, or a green gum ball.


(a) Describe an appropriate sample space.
(b) List all possible events.
(c) If R is the event "red," then list the outcomes in R'.
(d) If G is the event "green," then what is R ∩ G?

2. Two gum balls are obtained from the machine in Exercise 1 from two trials. The order of the outcomes is important. Assume that at least two balls of each color are in the machine. What is an appropriate sample space? How many total possible events are there that contain eight outcomes? Express the following events as unions of elementary events: C1 = getting a red ball on the first trial, C2 = getting at least one red ball, C1 ∩ C2, C1' ∩ C2.

3. There are four basic blood groups: O, A, B, and AB. Ordinarily, anyone can receive the blood of a donor from their own group. Also, anyone can receive the blood of a donor from the O group, and any of the four types can be used by a recipient from the AB group.

All other possibilities are undesirable. An experiment consists of drawing a pint of blood
and determining its type for each of the next two donors who enter a blood bank.
(a) List the possible (ordered) outcomes of this experiment.
(b) List the outcomes corresponding to the event that the second donor can receive the blood of the first donor.
(c) List the outcomes corresponding to the event that each donor can receive the blood of the other.

4. ​An experiment consists of drawing gum balls from a gum-ball machine until a red ball is
obtained. Describe a sample space for this experiment.

5. ​The number of alpha particles emitted by a radioactive sample in a fixed time interval is
counted.​Give a sample space for this

experiment.
The elapsed time is measured until the first alpha particle is emitted. Give a sample ​space for this experiment.

6. ​An experiment is conducted to determine what fraction of a piece of metal is gold. Give a
sample space for this experiment.

7. A randomly selected car battery is tested and the time of failure is recorded. Give an
appropriate sample space for this experiment.

8. ​We obtain 100 gum balls from a machine, and we get 20 red (R), 30 black (B), and 50 green
(G) gum balls.
(a) Can we use, as a probability model for the color of a gum ball from the machine, one given by p1 = P(R) = 0.2, p2 = P(B) = 0.3, and p3 = P(G) = 0.5?
(b) Suppose we later notice that some yellow (Y) gum balls are also in the machine. Could we use as a model p1 = 0.2, p2 = 0.3, p3 = 0.5, and p4 = P(Y) = 0.1?

9. ​In Exercise 2, suppose that each of the nine possible outcomes in the sample space is
equally likely to occur. Compute each of the following:
(a) P(both red).
(b) P(C1).
(c) P(C2).
(d) P(C1 ∩ C2).
(e) P(C1' ∩ C2).
(f) P(C1 ∪ C2).

10. Consider Exercise 3. Suppose, for a particular racial group, the four blood types are equally likely to occur.
(a) Compute the probability that the second donor can receive blood from the first donor.
(b) Compute the probability that each donor can receive blood from the other.
(c) Compute the probability that neither can receive blood from the other.

11. Prove that P(∅) = 0. Hint: Let Ai = ∅ for all i in equation (1.3.3).

12. Prove equation (1.3.5). Hint: Let Ai = ∅ for all i > k in equation (1.3.3).

13. When an experiment is performed, one and only one of the events A1, A2, or A3 will occur. Find P(A1), P(A2), and P(A3) under each of the following assumptions:
(a) P(A1) = P(A2) = P(A3).
(b) P(A1) = P(A2) and P(A3) = 1/2.
(c) P(A1) = 2P(A2) = 3P(A3).

14. A balanced coin is tossed four times. List the possible outcomes and compute the probability of each of the following events:
(a) exactly three heads.
(b) at least one head.
(c) the number of heads equals the number of tails.
(d) the number of heads exceeds the number of tails.

15. Two part-time teachers are hired by the mathematics department and each is assigned at
random to teach a single course, in trigonometry, algebra, or calculus. List the outcomes ​in the
sample space and find the probability that they will teach different courses. Assume that more
than one section of each course is offered.

16. Prove Theorem 1.4.4. Hint: Write A ∪ B ∪ C = (A ∪ B) ∪ C and apply Theorem 1.4.3.

17. Prove Theorem 1.4.5. Hint: If A ⊂ B, then we can write B = A ∪ (B ∩ A'), a disjoint union.

18. If A and B are events, show that:
(a) P(A ∩ B') = P(A) − P(A ∩ B).
(b) P(A ∪ B) = 1 − P(A' ∩ B').

19. Let P(A) = P(B) = 1/3 and P(A ∩ B) = 1/10. Find the following:
(a) P(B').
(b) P(A ∪ B').
(c) P(B ∩ A').
(d) P(A' ∪ B').

20. Let P(A) = 1/2, P(B) = 1/8, and P(C) = 1/4, where A, B, and C are mutually exclusive. Find the following:
(a) P(A ∪ B ∪ C).
(b) P(A' ∩ B' ∩ C').

21. The event that exactly one of the events A or B occurs can be represented as (A ∩ B') ∪ (A' ∩ B). Show that

P[(A ∩ B') ∪ (A' ∩ B)] = P(A) + P(B) − 2P(A ∩ B)
22. A track star runs two races on a certain day. The probability that he wins the first race is
0.7, the probability that he wins the second race is 0.6, and the probability that he wins

both races is 0.5. Find the probability that:


(a) he wins at least one
race.
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
1 ​PROBABILITY
46 CHAPTER

he wins exactly one race. ​(e) he wins neither race.

23. A certain family owns two television sets, one color and one black-and-white set. Let A be the event "the color set is on" and B the event "the black-and-white set is on." If P(A) = 0.4, P(B) = 0.3, and P(A ∪ B) = 0.5, find the probability of each event:
(a) both are on.
(b) the color set is on and the other is off.
(c) exactly one set is on.
(d) neither set is on.

24. Suppose P(Ai) = 1/(3 + i) for i = 1, 2, 3, 4. Find an upper bound for

P(A1 ∪ A2 ∪ A3 ∪ A4).

25. A box contains three good cards and two bad (penalty) cards. Player A chooses a card and then player B chooses a card. Compute the following probabilities:
(a) P(A good).
(b) P(B good | A good).
(c) P(B good | A bad).
(d) P(B good ∩ A good) using (1.5.5).
(e) Write out the sample space of ordered pairs and compute P(B good ∩ A good) and P(B good | A good) directly from definitions. (Note: Assume that the cards are distinct.)
(f) P(B good).
(g) P(A good | B good).

26. ​Repeat Exercise 25, but assume that player A looks at his card, replaces it in the box, and
remixes the cards before player B draws.

27. A bag contains five blue balls and three red balls. A boy draws a ball, and then draws another without replacement. Compute the following probabilities:
(a) P(2 blue balls).
(b) P(1 blue and 1 red).
(c) P(at least 1 blue).
(d) P(2 red balls).

28. In Exercise 27, suppose a third ball is drawn without replacement. Find:
(a) P(no red balls left after third draw).
(b) P(1 red ball left).
(c) P(first red ball on last draw).
(d) P(a red ball on last draw).

29. A family has two children. It is known that at least one is a boy. What is the probability
that the family has two boys, given at least one boy? Assume P(boy) = 1/2.
30. Two cards are drawn from a deck of cards without replacement.
(a) What is the probability that the second card is a heart, given that the first card is a heart?
(b) What is the probability that both cards are hearts, given that at least one is a heart?

31. A box contains five green balls, three black balls, and seven red balls. Two balls are selected at random without replacement from the box. What is the probability that:
(a) both balls are red?
(b) both balls are the same color?
32. A softball team has three pitchers, A, B, and C, with winning percentages of 0.4, 0.6, and 0.8, respectively. These pitchers pitch with frequency 2, 3, and 5 out of every 10 games, respectively. In other words, for a randomly selected game, P(A) = 0.2, P(B) = 0.3, and P(C) = 0.5. Find:
(a) P(team wins game) = P(W).
(b) P(A pitched game | team won) = P(A | W).

33. One card is selected from a deck of 52 cards and placed in a second deck. A card then is selected from the second deck.
(a) What is the probability the second card is an ace?
(b) If the first card is placed into a deck of 54 cards containing two jokers, then what is the probability that a card drawn from the second deck is an ace?
(c) Given that an ace was drawn from the second deck in (b), what is the conditional probability that an ace was transferred?
34. A pocket contains three coins, one of which has a head on both sides, while the other two coins are normal. A coin is chosen at random from the pocket and tossed three times.
(a) Find the probability of obtaining three heads.
(b) If a head turns up all three times, what is the probability that this is the two-headed coin?

35. In a bolt factory, machines 1, 2, and 3 respectively produce 20%, 30%, and 50% of the total output. Of their respective outputs, 5%, 3%, and 2% are defective. A bolt is selected at random.
(a) What is the probability that it is defective?
(b) Given that it is defective, what is the probability that it was made by machine 1?
36. Drawer A contains five pennies and three dimes, while drawer B contains three pennies and seven dimes. A drawer is selected at random, and a coin is selected at random from that drawer.
(a) Find the probability of selecting a dime.
(b) Suppose a dime is obtained. What is the probability that it came from drawer B?
37. Let P(A) = 0.4 and P(A ∪ B) = 0.6.
(a) For what value of P(B) are A and B mutually exclusive?
(b) For what value of P(B) are A and B independent?

38. Prove Theorem 1.5.5. Hint: Use Exercise 18.

39. Three independent components are hooked in series. Each component fails with probability p. What is the probability that the system does not fail?

40. Three independent components are hooked in parallel. Each component fails with probability p. What is the probability that the system does not fail?

41. Consider the following system with assigned probabilities of malfunction for the five components. Assume that malfunctions occur independently. What is the probability the system does not malfunction?

42. The probability that a marksman hits a target is 0.9 on any given shot, and repeated shots are independent. He has two pistols; one contains two bullets and the other contains only one bullet. He selects a pistol at random and shoots at the target until the pistol is empty. What is the probability of hitting the target exactly one time?

43. Rework Exercise 27 assuming that the balls are chosen with replacement.

44. In a marble game a shooter may (A) miss, (B) hit one marble out and stick in the ring, or (C) hit one marble out and leave the ring. If B occurs, the shooter shoots again. If P(A) = p1, P(B) = p2, and P(C) = p3, and these probabilities do not change from shot to shot:
(a) Express the probability of getting out exactly three marbles on one turn.
(b) What is the probability of getting out exactly x marbles in one turn?
(c) Show that the probability of getting one marble is greater than the probability of getting zero marbles if

p1 < (1 − p2)/(2 − p1)

45. In the marble game in Exercise 44, suppose the probabilities depend on the number of marbles left in the ring, N. Let

P(A) = 1/(N + 1)    P(B) = 0.2N/(N + 1)    P(C) = 0.8N/(N + 1)

Rework Exercise 44 under this assumption.

46. A, B, and C are events such that P(A) = 1/3, P(B) = 1/4, and P(C) = 1/5. Find P(A ∪ B ∪ C) under each of the following assumptions:
(a) If A, B, and C are mutually exclusive.
(b) If A, B, and C are independent.

47. A bowl contains four lottery tickets with the numbers 111, 221, 212, and 122. One ticket is drawn at random from the bowl, and Ai is the event "2 in the ith place," i = 1, 2, 3. Determine whether A1, A2, and A3 are independent.

48. Code words are formed from the letters A through Z.
(a) How many 26-letter words can be formed without repeating any letters?
(b) How many five-letter words can be formed without repeating any letters?
(c) How many five-letter words can be formed if letters can be repeated?

49. License plate numbers consist of two letters followed by a four-digit number, such as SB7904 or AY1637.
(a) How many different plates are possible if letters and digits can be repeated?
(b) Answer (a) if letters can be repeated but digits cannot.
(c) How many of the plates in (b) have a four-digit number that is greater than 5500?

50. In how many ways can three boys and three girls sit in a row if boys and girls must alternate?

51. How many odd three-digit numbers can be formed from the digits 0, 1, 2, 3, 4 if digits can be repeated, but the first digit cannot be zero?

52. Suppose that from 10 distinct objects, four are chosen at random with replacement.
(a) What is the probability that no object is chosen more than once?
(b) What is the probability that at least one object is chosen more than once?

53. A restaurant advertises 256 types of nachos. How many topping ingredients must be available to meet this claim if plain corn chips count as one type?

54. A club consists of 17 men and 13 women, and a committee of five members must be chosen.
(a) How many committees are possible?
(b) How many committees are possible with three men and two women?
(c) Answer (b) if a particular man must be included.

55. A football coach has 49 players available for duty on a special kick-receiving team.
(a) If 11 must be chosen to play on this special team, how many different teams are possible?
(b) If the 49 include 24 offensive and 25 defensive players, what is the probability that a randomly selected team has five offensive and six defensive players?

56. For positive integers n > r, show the following:
(a) \binom{n}{r} = \binom{n}{n − r}
(b) \binom{n}{r} = \binom{n − 1}{r} + \binom{n − 1}{r − 1}
57. Provide solutions for the following sums:
(a) \binom{74}{0} + \binom{74}{1} + \binom{74}{2} + ⋯ + \binom{74}{74}
(b) \binom{76}{0} + \binom{76}{2} + \binom{76}{4} + ⋯ + \binom{76}{76}  Hint: Use Exercise 56(b).

58. Seven people show up to apply for jobs as cashiers at a discount store.
(a) If only three jobs are available, in how many ways can three be selected from the seven applicants?
(b) Suppose there are three male and four female applicants, and all seven are equally qualified, so the three jobs are filled at random. What is the probability that the three hired are all of the same sex?
(c) In how many different ways could the seven applicants be lined up while waiting for an interview?
(d) If there are four females and three males, in how many ways can the applicants be lined up if the first three are female?

59. The club in Exercise 54 must elect three officers: president, vice-president, and secretary. How many different ways can this turn out?

60. How many ways can 10 students be lined up to get on a bus if a particular pair of students refuse to follow each other in line?

61. Each student in a class of size n was born in a year with 365 days, and each reports his or her birth date (month and day, but not year).
    (a) How many ways can this happen?
    (b) How many ways can this happen with no repeated birth dates?
    (c) What is the probability of no matching birth dates?
    (d) In a class of 23 students, what is the probability of at least one repeated birth date?

62. A kindergarten student has 12 crayons.
    (a) How many ways can three blue, four red, and five green crayons be arranged in a row?
    (b) How many ways can 12 distinct crayons be placed in three boxes containing 3, 4, and 5 crayons, respectively?

63. How many ways can you partition 26 letters into three boxes containing 9, 11, and 6 letters?

64. How many ways can you permute 9 a's, 11 b's, and 6 c's?

65. A contest consists of finding all of the code words that can be formed from the letters in the name "ATARI." Assume that the letter A can be used twice, but the others at most once.
    (a) How many five-letter words can be formed?
    (b) How many two-letter words can be formed?
    (c) How many words can be formed?

66. Three buses are available to transport 60 students on a field trip. The buses seat 15, 20, and 25 passengers, respectively. How many different ways can the students be loaded on the buses?

67. A certain machine has nine switches mounted in a row. Each switch has three positions, a, b, and c.
    (a) How many different settings are possible?
    (b) Answer (a) if each position is used three times.

68. Suppose 14 students have tickets for a concert.
    (a) Three students (Bob, Jim, and Tom) own cars and will provide transportation to the concert. Bob's car has room for three passengers (nondrivers), while the cars owned by Jim and Tom each have room for four passengers. In how many different ways can the 11 passengers be loaded into the cars?
    (b) At the concert hall the students are seated together in a row. If they take their seats in random order, find the probability that the three students who drove their cars have adjoining seats.

69. Suppose the winning number in a lottery is a four-digit number determined by drawing four slips of paper (without replacement) from a box that contains nine slips numbered consecutively 1 through 9 and then recording the digits in order from smallest to largest.
    (a) How many different lottery numbers are possible?
    (b) Find the probability that the winning number has only odd digits.
    (c) How many different lottery numbers are possible if the digits are recorded in the order they were drawn?

70. Consider four dice A, B, C, and D numbered as follows: A has 4 on four faces and 0 on two faces; B has 3 on all six faces; C has 2 on four faces and 6 on two faces; and D has 5 on three faces and 1 on the other three faces. Suppose the statement A > B means that the face showing on A is greater than on B, and so forth. Show that P[A > B] = P[B > C] = P[C > D] = P[D > A] = 2/3. In other words, if an opponent chooses a die, you can always select one that will defeat him with probability 2/3.
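The claim is easy to check by brute force. The following enumeration sketch (Python; not part of the original exercise) counts wins over the 36 equally likely face pairings:

    from itertools import product

    # Face labels for the four dice described in Exercise 70.
    dice = {
        "A": [4, 4, 4, 4, 0, 0],
        "B": [3, 3, 3, 3, 3, 3],
        "C": [2, 2, 2, 2, 6, 6],
        "D": [5, 5, 5, 1, 1, 1],
    }

    def prob_beats(x, y):
        # P[x > y], with all 36 face pairs equally likely.
        wins = sum(a > b for a, b in product(dice[x], dice[y]))
        return wins / 36

    for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
        print(x, ">", y, ":", prob_beats(x, y))   # each prints 2/3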

71. A laboratory test for steroid use in professional athletes has detection rates given in the following table:

                     Test Result
    Steroid Use      +        -
    Yes             .90      .10
    No              .01      .99

If the rate of steroid use among professional athletes is 1 in 50:
    (a) What is the probability that a professional athlete chosen at random will have a negative test result for steroid use?
    (b) If the athlete tests positive, what is the probability that he has actually been using steroids?

72. A box contains four disks that have different colors on each side. Disk 1 is red and green, disk 2 is red and white, disk 3 is red and black, and disk 4 is green and white. One disk is selected at random from the box. Define events as follows: A = one side is red, B = one side is green, C = one side is white, and D = one side is black.
    (a) Are A and B independent events? Why or why not?
    (b) Are B and C independent events? Why or why not?
    (c) Are any pairs of events mutually exclusive? Which ones?
RANDOM VARIABLES AND THEIR DISTRIBUTIONS

2.1 INTRODUCTION

Our purpose is to develop mathematical models for describing the probabilities of outcomes or events occurring in a sample space. Because mathematical equations are expressed in terms of numerical values rather than as heads, colors, or other properties, it is convenient to define a function, known as a random variable, that associates each outcome in the experiment with a real number. We then can express the probability model for the experiment in terms of this associated random variable. Of course, in many experiments the results of interest already are numerical quantities, and in that case the natural function to use as the random variable would be the identity function.

Definition 2.1.1
Random Variable  A random variable, say X, is a function defined over a sample space, S, that associates a real number, X(e) = x, with each possible outcome e in S.


Capital letters, such as X, Y, and Z will be used to denote random variables. The lower case letters
x, y, z, ... will be used to denote possible values that the ​corresponding random
variables can attain. For mathematical reasons, it will be ​necessary to restrict
the types of functions that are considered to be random ​variables. We will
discuss this point after the following example.

Example 2.1.1  A four-sided (tetrahedral) die has a different number, 1, 2, 3, or 4, affixed to each side. On any given roll, each of the four numbers is equally likely to occur. A game consists of rolling the die twice, and the score is the maximum of the two numbers that occur. Although the score cannot be predicted, we can determine the set of possible values and define a random variable. In particular, if e = (i, j) where i, j ∈ {1, 2, 3, 4}, then X(e) = max(i, j). The sample space, S, and X are illustrated in Figure 2.1.

FIGURE 2.1  Sample space for two rolls of a four-sided die

    S:  (1,4) (2,4) (3,4) (4,4)
        (1,3) (2,3) (3,3) (4,3)
        (1,2) (2,2) (3,2) (4,2)
        (1,1) (2,1) (3,1) (4,1)

​ 2, B3, and B4 of S contains the pairs ​(i,


Each of the events B_1, B_2, B_3, and B_4 of S contains the pairs (i, j) that have a common maximum. In other words, X has value x = 1 over B_1, x = 2 over B_2, x = 3 over B_3, and x = 4 over B_4. Other random variables also could be considered. For example, the random variable Y(e) = i + j represents the total on the two rolls.
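Distributions defined by counting arguments like this are easy to tabulate directly. The following is a small enumeration sketch (Python; not part of the original text) for the pdf of X = max(i, j) over the 16 equally likely outcomes:

    from itertools import product
    from collections import Counter

    # All 16 equally likely outcomes (i, j) for two rolls of a four-sided die.
    outcomes = list(product(range(1, 5), repeat=2))

    # Count how often each value of X = max(i, j) occurs.
    counts = Counter(max(i, j) for i, j in outcomes)

    for x in sorted(counts):
        print(x, counts[x], "/ 16")   # f(1)=1/16, f(2)=3/16, f(3)=5/16, f(4)=7/16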

The concept of a random variable permits us to associate with any sample


space, S, a sample space that is a set of real numbers, and in which the events of
interest are subsets of real numbers. If such a real-valued event is denoted by A,
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
2.1 ​INTRODUCTION ​
SS

then we would want the associated set

    B = {e | e ∈ S and X(e) ∈ A}   (2.1.1)

to be an event in the underlying sample space S. Even though A and B are subsets of different spaces, they usually are referred to as equivalent events, and we write

    P[X ∈ A] = P(B)   (2.1.2)

The notation Pr(A) sometimes is used instead of P[X ∈ A] in equation (2.1.2). This defines a set function on the collection of real-valued events, and it can be shown to satisfy the three basic conditions of a probability set function, as given by Definition 1.3.1.

Although the random variable X is defined as a function of e, it usually is possible to express the events of interest only in terms of the real values that X assumes. Thus, our notation usually will suppress the dependence on the outcomes in S, such as we have done in equation (2.1.2).
For instance, in Example 2.1.1, if we were interested in the event of obtaining a score of "at most 3," this would correspond to X = 1, 2, or 3, or X ∈ {1, 2, 3}. Another possibility would be to represent the event in terms of some interval that contains the values 1, 2, and 3 but not 4, such as A = (-∞, 3]. The associated equivalent event in S is B = B_1 ∪ B_2 ∪ B_3, and the probability is P[X ∈ A] = P(B) = 1/16 + 3/16 + 5/16 = 9/16. A convenient notation for P[X ∈ A], in this example, is P[X ≤ 3]. Actually, any other real event containing 1, 2, and 3 but not 4 could be used in this way, but intervals, and especially those of the form (-∞, x], will be of special importance in developing the properties of random variables.
As mentioned in Section 1.3, if the probabilities can be determined for each elementary event in a discrete sample space, then the probability of any event can be calculated from these by expressing the event as a union of mutually exclusive elementary events and summing over their probabilities. A more general approach for assigning probabilities to events in a real sample space can be based on assigning probabilities to intervals of the form (-∞, x] for all real numbers x. Thus, we will consider as random variables only functions X that satisfy the requirement that, for all real x, sets of the form

    B = [X ≤ x] = {e | e ∈ S and X(e) ∈ (-∞, x]}   (2.1.3)

are events in the sample space S. The probabilities of other real events can be evaluated in terms of the probabilities assigned to such intervals. For example, for the game of Example 2.1.1, we have determined that P[X ≤ 3] = 9/16, and it also follows, by a similar argument, that P[X ≤ 2] = 1/4. Because (-∞, 2] contains 1 and 2 but not 3, and (-∞, 3] = (-∞, 2] ∪ (2, 3], it follows that P[X = 3] = P[X ≤ 3] - P[X ≤ 2] = 9/16 - 1/4 = 5/16.

Other examples of random variables can be based on the sampling problems of Section 1.6.
Example 2.1.2  In Example 1.6.15, we discussed several alternative approaches for computing the probability of obtaining exactly two black marbles when selecting five (without replacement) from a collection of 10 black and 20 white marbles. Suppose we are concerned with the general problem of obtaining x black marbles, for arbitrary x. Our approach will be to define a random variable X as the number of black marbles in the sample, and to determine the probability P[X = x] for every possible value x. This is easily accomplished with the approach given by equation (1.6.8), and the result is

    P[X = x] = \binom{10}{x}\binom{20}{5-x} / \binom{30}{5}    x = 0, 1, 2, 3, 4, 5   (2.1.4)
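For readers who want to evaluate equation (2.1.4) numerically, here is a short sketch (Python, assuming version 3.8+ for math.comb; the function name f is ours, not the text's):

    from math import comb

    # Hypergeometric pdf of equation (2.1.4): 10 black, 20 white, sample of 5.
    def f(x):
        return comb(10, x) * comb(20, 5 - x) / comb(30, 5)

    probs = [f(x) for x in range(6)]
    print(probs)        # f(0), f(1), ..., f(5)
    print(sum(probs))   # 1.0, as required of any pdf
    print(f(2))         # P[X = 2], the case worked in Example 1.6.15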

Random variables that arise from counting operations, such as the random variables in Examples 2.1.1 and 2.1.2, are integer-valued. Integer-valued random variables are examples of an important special type known as discrete random variables.

2.2 DISCRETE RANDOM VARIABLES


Definition 2.2.1
If the set of all possible values of a random variable, X, is a countable set, x_1, x_2, ..., x_n, or x_1, x_2, ..., then X is called a discrete random variable. The function

    f(x) = P[X = x]    x = x_1, x_2, ...   (2.2.1)

that assigns the probability to each possible value x will be called the discrete probability density function (discrete pdf).

If it is clear from the context that X is discrete, then we simply will say pdf. Another common terminology is probability mass function (pmf), and the possible values, x, are called mass points of X. Sometimes a subscripted notation, f_X(x), is used. The following theorem gives general properties that any discrete pdf must satisfy.

must satisfy.


Theorem 2.2.1  A function f(x) is a discrete pdf if and only if it satisfies both of the following properties for at most a countably infinite set of reals x_1, x_2, ...:

    f(x_i) ≥ 0   for all x_i   (2.2.2)

and

    Σ_{all x_i} f(x_i) = 1   (2.2.3)

Proof
Property (2.2.2) follows from the fact that the value of a discrete pdf is a probability and must be nonnegative. Because x_1, x_2, ... represent all possible values of X, the events [X = x_1], [X = x_2], ... constitute an exhaustive partition of the sample space. Thus,

    1 = Σ_{all x_i} P[X = x_i] = Σ_{all x_i} f(x_i)

Consequently, any pdf must satisfy properties (2.2.2) and (2.2.3), and any function that satisfies properties (2.2.2) and (2.2.3) will assign probabilities consistent with Definition 1.3.1.

In some problems, it is possible to express the pdf by means of an equation, such as equation (2.1.4). However, it is sometimes more convenient to express it in tabular form. For example, one way to specify the pdf of the random variable X in Example 2.1.1 is given in Table 2.1.
Example 2.1.1 is given in Table 2.1.

TABLE 2.1  The pdf of the maximum of the two rolls

    x       1      2      3      4
    f(x)   1/16   3/16   5/16   7/16

Of course, these are the probabilities, respectively, of the events B_1, B_2, B_3, and B_4 in S.

A graphic representation of f(x) is also of some interest. It would be possible to leave f(x) undefined at points that are not possible values of X, but it is convenient to define f(x) as zero at such points. The graph of the pdf in Table 2.1 is shown in Figure 2.2.

FIGURE 2.2  Discrete pdf of the maximum of two rolls of a four-sided die

Example 2.2.1  Example 2.1.1 involves two rolls of a four-sided die. Now we will roll a 12-sided (dodecahedral) die twice. If each face is marked with an integer, 1 through 12, then each value is equally likely to occur on a single roll of the die. As before, we define a random variable X to be the maximum obtained on the two rolls. It is not hard to see that for each value x there are an odd number, 2x - 1, of ways for that value to occur. Thus, the pdf of X must have the form

    f(x) = c(2x - 1)   for x = 1, 2, ..., 12   (2.2.4)


One way to determine c would be to do a more complete analysis of the counting problem, but another way would be to use equation (2.2.3). In particular,

    1 = Σ_{x=1}^{12} f(x) = Σ_{x=1}^{12} c(2x - 1) = c[2(12)(13)/2 - 12] = c(12)^2

So c = 1/(12)^2 = 1/144.
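The counting claim 2x - 1, and hence the constant, is easy to confirm by enumeration (a Python sketch, not part of the original example):

    from itertools import product
    from collections import Counter

    # Tally X = max(i, j) over the 144 equally likely outcomes of two d12 rolls.
    counts = Counter(max(i, j) for i, j in product(range(1, 13), repeat=2))

    # Each count should equal 2x - 1, so f(x) = (2x - 1)/144.
    assert all(counts[x] == 2 * x - 1 for x in range(1, 13))
    print(sum(counts.values()))   # 144, so c = 1/144 makes the pdf sum to 1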

As mentioned in the last section, another way to specify the distribution of probability is to assign probabilities to intervals of the form (-∞, x], for all real x. The probability assigned to such an event is given by a function called the cumulative distribution function.

distribution function

Definition 2.2.2
The cumulative distribution function (CDF) of a random variable X is defined for any real x by

    F(x) = P[X ≤ x]   (2.2.5)



FIGURE 2.3  The CDF of the maximum of two rolls of a four-sided die

The function F(x) often is referred to simply as the distribution function of X, and the subscripted notation, F_X(x), sometimes is used. For brevity, a short notation often is appropriate to indicate that X has a particular distribution. If we write X ~ f(x) or X ~ F(x), this will mean that the random variable X has pdf f(x) and CDF F(x).

As seen in Figure 2.3, the CDF of the distribution given in Table 2.1 is a nondecreasing step function. The step-function form of F(x) is common to all discrete distributions, and the sizes of the steps or jumps in the graph of F(x) correspond to the values of f(x) at those points. This is easily seen by comparing Figures 2.2 and 2.3.

The general relationship between F(x) and f(x) for a discrete distribution is given by the following theorem.

Theorem 2.2.2  Let X be a discrete random variable with pdf f(x) and CDF F(x). If the possible values of X are indexed in increasing order, x_1 < x_2 < x_3 < ..., then f(x_1) = F(x_1), and for any i > 1,

    f(x_i) = F(x_i) - F(x_{i-1})   (2.2.6)

Furthermore, if x < x_1, then F(x) = 0, and for any other real x,

    F(x) = Σ_{x_i ≤ x} f(x_i)   (2.2.7)

where the summation is taken over all indices i such that x_i ≤ x.
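Computationally, equation (2.2.7) says the discrete CDF is a running total of the pdf, and equation (2.2.6) recovers the pdf by successive differences. A brief sketch (Python, using the distribution of Table 2.1):

    from itertools import accumulate

    xs = [1, 2, 3, 4]
    f = [1/16, 3/16, 5/16, 7/16]     # pdf from Table 2.1

    F = list(accumulate(f))          # CDF at the mass points, as in (2.2.7)
    print(F)                         # [0.0625, 0.25, 0.5625, 1.0]

    # Recover the pdf by differencing, as in (2.2.6).
    recovered = [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]
    print(recovered)                 # matches f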

The CDF of any random variable must satisfy the properties of the following ​theorem.

Theorem 2.2.3  A function F(x) is a CDF for some random variable X if and only if it satisfies the following properties:

    lim_{x→-∞} F(x) = 0   (2.2.8)

    lim_{x→∞} F(x) = 1   (2.2.9)

    lim_{h→0+} F(x + h) = F(x)   (2.2.10)

    a < b implies F(a) ≤ F(b)   (2.2.11)

The first two properties say that F(x) can be made arbitrarily close to 0 or 1 by taking x arbitrarily large, and negative or positive, respectively. In the examples considered so far, it turns out that F(x) actually assumes these limiting values. Property (2.2.10) says that F(x) is continuous from the right. Notice that in Figure 2.3 the only discontinuities are at the values 1, 2, 3, and 4, and the limit as x approaches these values from the right is the value of F(x) at these values. On the other hand, as x approaches these values from the left, the limit of F(x) is the value of F(x) on the lower step, so F(x) is not (in general) continuous from the left. Property (2.2.11) says that F(x) is nondecreasing, which is easily seen to be the case in Figure 2.3. In general, this property follows from the fact that an interval of the form (-∞, b] can be represented as the union of two disjoint intervals:

    (-∞, b] = (-∞, a] ∪ (a, b]   (2.2.12)

for any a < b. It follows that F(b) = F(a) + P[a < X ≤ b] ≥ F(a), because P[a < X ≤ b] ≥ 0, and thus property (2.2.11) is obtained. Actually, by this argument we have obtained another very useful result, namely

    P[a < X ≤ b] = F(b) - F(a)   (2.2.13)

This reduces the problem of computing probabilities for events defined in terms of intervals of the form (a, b] to taking differences with F(x).

Generally, it is somewhat easier to understand the nature of a random variable and its probability distribution by considering the pdf directly, rather than the CDF, although the CDF will provide a good basis for defining continuous probability distributions. This will be considered in the next section.

Some important properties of probability distributions involve numerical quantities called expected values.

Definition 2.2.3
If X is a discrete random variable with pdf f(x), then the expected value of X is defined by

    E(X) = Σ_x x f(x)   (2.2.14)

The sum (2.2.14) is understood to be over all possible values of X. Furthermore, it is an ordinary sum if the range of X is finite, and an infinite series if the range of X is infinite. In the latter case, if the infinite series is not absolutely convergent, then we will say that E(X) does not exist. Other common notations for E(X) include μ, possibly with a subscript, μ_X. The terms mean and expectation also are often used. The mean or expected value of a random variable is a "weighted average," and it can be considered as a measure of the "center" of the associated probability distribution.
the associated probability ​distribution.

Example 2.2.2  A box contains four chips. Two are labeled with the number 2, one is labeled with a 4, and the other with an 8. The average of the numbers on the four chips is (2 + 2 + 4 + 8)/4 = 4. The experiment of choosing a chip at random and recording its number can be associated with a discrete random variable X having distinct values x = 2, 4, or 8, with f(2) = 1/2 and f(4) = f(8) = 1/4. The corresponding expected value or mean is

    E(X) = 2(1/2) + 4(1/4) + 8(1/4) = 4

as before. Notice that this also could model selection from a larger collection, as long as the possible observed values of X and the respective proportions in the collection, f(x), remain the same as in the present example.
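As a quick check, the weighted average of equation (2.2.14) can be computed directly (a Python sketch):

    # Expected value of X for Example 2.2.2: sum of value times probability.
    pdf = {2: 1/2, 4: 1/4, 8: 1/4}
    mean = sum(x * p for x, p in pdf.items())
    print(mean)   # 4.0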

There is an analogy between the distribution of probability to values, x, and the distribution of mass to points in a physical system. For example, if masses of 0.5, 0.25, and 0.25 grams are placed at the respective points x = 2, 4, and 8 cm on the horizontal axis, then the value 2(0.5) + 4(0.25) + 8(0.25) = 4 is the "center of mass" or balance point of the corresponding system. This is illustrated in Figure 2.4.

2.4.

FIGURE 2.4 ​The center-of-mass interpretation of the mean



In the previous example E(X) coincides with one of the possible values of X, but this is not always the case, as illustrated by the following example.

Example 2.2.3  A game of chance is based on drawing two chips at random without replacement

from the box considered in Example 2.2.2. If the numbers on the two chips
match, then the player wins $2; otherwise, she loses $1. Let X be the amount won

by the player on a single play of the game. There are only two possible values,

X = 2 if both chips bear the number 2, and X = -1 otherwise. Furthermore, there are \binom{4}{2} = 6 ways to draw two chips, and only one of these outcomes corresponds to a match. The distribution of X is f(2) = 1/6 and f(-1) = 5/6, and consequently the expected amount won is E(X) = (-1)(5/6) + (2)(1/6) = -1/2. Thus, the expected amount "won" by the player is actually an expected loss of one-half dollar.

The connection with long-term relative frequency also is well illustrated by this example. Suppose the game is played M times in succession, and denote the relative frequencies of winning and losing by f_W and f_L, respectively. The average amount the player wins is (-1)f_L + (2)f_W. Because of statistical regularity, we have that f_L and f_W approach f(-1) and f(2), respectively, and thus the player's average winnings approach E(X) as M approaches infinity.
have that IL and f approach f( - 1) and f(2), respectively, and thus the player's
average winnings approach E(X) as M approaches infinity.
Notice also that the game will be more equitable if the payoff to the player is ​changed to $5 rather

than $2, because the resulting expected amount won then ​will be (- 1X5/6) +

(5X1/6) = O. In general, for a game of chance, if the net amount


​ won by a
player is X, then the game is said to be a fair game if E(X) = O.

2.3 CONTINUOUS RANDOM VARIABLES

The notion of a discrete random variable provides an adequate means of probability modeling for a large class of problems, including those that arise from the operation of counting. However, a discrete random variable is not an adequate model in many situations, and we must consider the notion of a continuous random variable. The CDF defined earlier remains meaningful for continuous random variables, but it also is useful to extend the concept of a pdf to continuous random variables.

Example 2.3.1 ​Each work day a man rides a bus to his place of business. Although a new bus
arrives promptly every five minutes, the man generally arrives at the bus stop at a ​random time
between bus arrivals. Thus, we might take his waiting time on any ​given morning to be a random
variable X.
Although in practice we usually measure time only to the nearest unit (seconds, ​minutes, etc.), in theory
we could measure time to within some arbitrarily small unit. Thus, even though
in practice it might be possible to regard X as a discrete

random variable with possible values determined by the smallest appropriate time unit, it usually is more convenient to consider the idealized situation in which X is assumed capable of attaining any value in some interval, and not just discrete points.

Returning to the man waiting for his bus, suppose that he is very observant and noticed over the years that the frequency of days when he waits no more than x minutes is proportional to x for all x. This suggests a CDF of the form F(x) = P[X ≤ x] = cx, for some constant c > 0. Because the buses arrive at regular five-minute intervals, the range of possible values of X is the time interval [0, 5]. In other words, P[0 ≤ X ≤ 5] = 1, and it follows that 1 = F(5) = c · 5, and thus c = 1/5 and F(x) = x/5 if 0 ≤ x ≤ 5. It also follows that F(x) = 0 if x < 0 and F(x) = 1 if x > 5.

Another way to study this distribution would be to observe the relative frequency of bus arrivals during short time intervals of the same length, but distributed throughout the waiting-time interval [0, 5]. It may be that the frequency of bus arrivals during intervals of the form (x, x + Δx] for small Δx was proportional to the length of the interval, Δx, regardless of the value of x. The corresponding condition this imposes on the distribution of X is

    P[x < X ≤ x + Δx] = F(x + Δx) - F(x) = cΔx

for all 0 ≤ x < x + Δx ≤ 5 and some c > 0. Of course, this implies that if F(x) is differentiable at x, its derivative is constant, F'(x) = c > 0. Note also that for x < 0 or x > 5 the derivative also exists, but F'(x) = 0, because P[x < X ≤ x + Δx] = 0 when x and x + Δx are not possible values of X; the derivative does not exist at all at x = 0 or 5.

In general, if F(x) is the CDF of a continuous random variable X, then we will denote its derivative (where it exists) by f(x), and under certain conditions, which will be specified shortly, we will call f(x) the probability density function of X. In our example, F(x) can be represented for values of x in the interval [0, 5] as the integral of its derivative:

    F(x) = ∫_0^x f(t) dt = ∫_0^x (1/5) dt = x/5

The graphs of F(x) and f(x) are shown in Figure 2.5.

FIGURE 2.5  Waiting time for a bus: graphs of F(x) and f(x)
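The uniform waiting-time model is easy to explore numerically. A simulation sketch (Python; the sample size is arbitrary) compares empirical frequencies with F(x) = x/5:

    import random

    def F(x):
        # CDF of the waiting time: uniform on [0, 5] minutes.
        return min(max(x / 5, 0.0), 1.0)

    n = 100_000
    waits = [random.uniform(0, 5) for _ in range(n)]
    for x in [1, 2.5, 4]:
        empirical = sum(w <= x for w in waits) / n
        print(x, F(x), round(empirical, 3))   # empirical fraction is close to x/5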

This provides a general approach to defining the distribution of a continuous ​random variable X.

Definition 2.3.1
A random variable X is called a continuous random variable if there is a function f(x), called the probability density function (pdf) of X, such that the CDF can be represented as

    F(x) = ∫_{-∞}^x f(t) dt   (2.3.1)

In more advanced treatments of probability, such distributions sometimes are called "absolutely continuous" distributions. The reason for such a distinction is that CDFs exist that are continuous (in the usual sense), but which cannot be represented as the integral of the derivative. We will apply the terminology continuous distribution only to probability distributions that satisfy property (2.3.1). Sometimes it is convenient to use a subscripted notation, F_X(x) and f_X(x), for the CDF and pdf, respectively.

The defining property (2.3.1) provides a way to derive the CDF when the pdf is given, and it follows by the Fundamental Theorem of Calculus that the pdf can be obtained from the CDF by differentiation. Specifically,

    f(x) = (d/dx) F(x) = F'(x)   (2.3.2)

wherever the derivative exists. Recall from Example 2.3.1 that there were two values of x where the derivative of F(x) did not exist. In general, there may be many values of x where F(x) is not differentiable, and these will occur at discontinuity points of the pdf, f(x). Inspection of the graphs of f(x) and F(x) in Figure 2.5 shows that this situation occurs in the example at x = 0 and x = 5. However, this will not usually create a problem if the set of such values is finite, because an integrand can be redefined arbitrarily at a finite number of values x without affecting the value of the integral. Thus, the function F(x), as represented in property (2.3.1), is unaffected regardless of how we treat such values. It also follows by similar considerations that events such as [X = c], where c is a constant, will have probability zero when X is a continuous random variable. Consequently, events of the form [X ∈ I], where I is an interval, are assigned the same probability whether I includes the endpoints or not. In other words, for a continuous random variable X, if a < b,

    P[a < X ≤ b] = P[a ≤ X < b] = P[a < X < b] = P[a ≤ X ≤ b]   (2.3.3)

and each of these has the value F(b) - F(a).

Thus, the CDF, F(x), assigns probabilities to events of the form (-∞, x], and equation (2.3.3) shows how the probability assignment can be extended to any interval.

Any function f(x) may be considered as a possible candidate for a pdf if it produces a legitimate CDF when integrated as in property (2.3.1). The following theorem provides conditions that will guarantee this.

Theorem 2.3.1  A function f(x) is a pdf for some continuous random variable X if and only if it satisfies the properties

    f(x) ≥ 0   (2.3.4)

for all real x, and

    ∫_{-∞}^∞ f(x) dx = 1   (2.3.5)

Proof
Properties (2.2.9) and (2.2.11) of a CDF follow from properties (2.3.5) and (2.3.4), respectively. The other properties follow from general results about integrals.

Example 2.3.2  A machine produces copper wire, and occasionally there is a flaw at some point along the wire. The length of wire (in meters) produced between successive flaws is a continuous random variable X with pdf of the form

    f(x) = c(1 + x)^{-3}   x > 0
    f(x) = 0               x ≤ 0                  (2.3.6)

where c is a constant. The value of c can be determined by means of property (2.3.5). Specifically, set

    1 = ∫_{-∞}^∞ f(x) dx = ∫_0^∞ c(1 + x)^{-3} dx = c/2

which is obtained following the substitution u = 1 + x and an application of the power rule for integrals. This implies that the constant is c = 2. Clearly property (2.3.4) also is satisfied in this case.
The CDF for this random variable is given by

    F(x) = P[X ≤ x] = ∫_{-∞}^x f(t) dt

so that F(x) = 0 if x ≤ 0, and for x > 0,

    F(x) = ∫_{-∞}^0 0 dt + ∫_0^x 2(1 + t)^{-3} dt = 1 - (1 + x)^{-2}

Probabilities of intervals, such as P[a ≤ X ≤ b], can be expressed directly in terms of the CDF or as integrals of the pdf. For example, the probability that a flaw occurs between 0.40 and 0.45 meters is given by

    P[0.40 ≤ X ≤ 0.45] = ∫_{0.40}^{0.45} f(x) dx = F(0.45) - F(0.40) = 0.035
Consideration of the frequency of occurrences over short intervals was suggested as a possible way to study a continuous distribution in Example 2.3.1. This approach provides some insight into the general nature of continuous distributions. For example, it may be observed that the frequency of occurrences over short intervals of length Δx, say [x, x + Δx], is at least approximately proportional to the length of the interval, Δx, where the proportionality factor depends on x, say f(x). The condition this imposes on the distribution of X is

    P[x ≤ X ≤ x + Δx] = F(x + Δx) - F(x) ≈ f(x)Δx   (2.3.7)

where the error in the approximation is negligible relative to the length of the interval, Δx. This is illustrated in Figure 2.6 for the copper wire example. The exact probability in equation (2.3.7) is represented by the area of the shaded region under the graph of f(x), while the approximation is the area of the corresponding rectangle with height f(x) and width Δx.

The smaller the value of Δx, the closer this approximation becomes. In this sense, it might be reasonable to think of f(x) as assigning "probability density" for the distribution of X, and the term probability density function seems appropriate for f(x). In other words, for a continuous random variable X, f(x) is not a probability, although it does determine the probability assigned to arbitrarily small intervals. The area between the x-axis and the graph of f(x) assigns probability to intervals, so that for a < b,

    P[a ≤ X ≤ b] = ∫_a^b f(x) dx   (2.3.8)


FIGURE 2.6 ​Continuous assignment of probability by pdf
In Example 2.3.2, we could take the probability that the length between successive flaws is between 0.40 and 0.45 meters to be approximately f(0.40)(0.05) = 2(1.4)^{-3}(0.05) = 0.036, or we could integrate the pdf between the limits 0.40 and 0.45 to obtain the exact answer, 0.035. For longer intervals, integrating f(x) as in equation (2.3.8) would be more reasonable.
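The quality of the rectangle approximation in equation (2.3.7) can be examined directly (a Python sketch for the copper wire pdf):

    f = lambda x: 2 * (1 + x) ** -3    # copper wire pdf
    F = lambda x: 1 - (1 + x) ** -2    # its CDF

    x = 0.40
    for dx in [0.05, 0.01, 0.001]:
        exact = F(x + dx) - F(x)       # exact interval probability
        approx = f(x) * dx             # rectangle approximation f(x)dx
        print(dx, round(exact, 6), round(approx, 6))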
Note that in Section 2.2 we referred to a probability density function or density function for a discrete random variable, but the interpretation there is different, because probability is assigned at discrete points in that case rather than in a continuous manner. However, it will be convenient to refer to the "density function" or pdf in both continuous and discrete cases, and to use the same notation, f(x) or f_X(x), in the later chapters of the book. This will avoid the necessity of separate statements of general results that apply to both cases.

The notion of expected value can be extended to continuous random variables.
Definition 2.3.2
If X is a continuous random variable with pdf f(x), then the expected value of X is defined by

    E(X) = ∫_{-∞}^∞ x f(x) dx   (2.3.9)

if the integral in equation (2.3.9) is absolutely convergent. Otherwise we say that E(X) does not exist.

As in the discrete case, other notations for E(X) are μ or μ_X, and the terms mean or expectation of X also are commonly used. The center-of-mass analogy is



still valid in this case, where mass is assigned to the x-axis in a continuous manner and in accordance with f(x). Thus, μ can also be regarded as a central measure for a continuous distribution.
In Example 2.3.2, the mean length between flaws in a piece of wire is

    μ = ∫_{-∞}^0 x · 0 dx + ∫_0^∞ x · 2(1 + x)^{-3} dx

If we make the substitution t = 1 + x, then

    μ = ∫_1^∞ (t - 1) · 2t^{-3} dt = 2(1 - 1/2) = 1
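A numerical check of this mean (a Python sketch, again assuming scipy for the quadrature):

    from scipy.integrate import quad

    mean, _ = quad(lambda x: x * 2 * (1 + x) ** -3, 0, float("inf"))
    print(mean)   # 1.0, so the mean length between flaws is 1 meter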


Other properties of probability distributions can be described in terms of quantities called percentiles.

Definition 2.3.3
If 0 < p < 1, then a 100 × pth percentile of the distribution of a continuous random variable X is a solution x_p to the equation

    F(x_p) = p   (2.3.10)

In general, a distribution may not be continuous, and if it has a discontinuity, then there will be some values of p for which equation (2.3.10) has no solution. Although we emphasize the continuous case in this book, it is possible to state a general definition of percentile by defining a pth percentile of the distribution of X to be a value x_p such that P[X ≤ x_p] ≥ p and P[X ≥ x_p] ≥ 1 - p.

In essence, x_p is a value such that 100 × p percent of the population values are at most x_p, and 100 × (1 - p) percent of the population values are at least x_p. This is illustrated for a continuous distribution in Figure 2.7. We also can think in terms of a proportion p rather than a percentage 100 × p of the population, and in this context x_p is called a pth quantile of the distribution.
proportion p rather than a percentage 100 x p of the population, ​and in this
context x is called a pth qiientile of the distribution.

FIGURE 2.7  A 100 × pth percentile
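For the copper wire distribution of Example 2.3.2, equation (2.3.10) can be solved in closed form: setting 1 - (1 + x_p)^{-2} = p gives x_p = (1 - p)^{-1/2} - 1. A quick check (Python sketch; the function names are ours):

    def quantile(p):
        # Solve F(x) = p for the CDF F(x) = 1 - (1 + x)**-2, x > 0.
        return (1 - p) ** -0.5 - 1

    F = lambda x: 1 - (1 + x) ** -2

    median = quantile(0.5)
    print(median)      # about 0.414 meters
    print(F(median))   # 0.5, confirming the solution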
