Special Continuous Distributions

Notation and Parameters | Continuous pdf f(x) | Mean | Variance | MGF M(t)

Student's t:  X ~ t(ν),  ν = 1, 2, ...
  f(x) = [Γ((ν+1)/2) / (Γ(ν/2)√(νπ))] (1 + x²/ν)^(-(ν+1)/2),  -∞ < x < ∞
  Mean: 0 (1 < ν);  Variance: ν/(ν - 2) (2 < ν);  MGF: does not exist.

Snedecor's F:  X ~ F(ν₁, ν₂),  ν₁ = 1, 2, ...,  ν₂ = 1, 2, ...
  f(x) = [Γ((ν₁+ν₂)/2) / (Γ(ν₁/2)Γ(ν₂/2))] (ν₁/ν₂)^(ν₁/2) x^(ν₁/2 - 1) (1 + ν₁x/ν₂)^(-(ν₁+ν₂)/2),  0 < x
  Mean: ν₂/(ν₂ - 2) (2 < ν₂);  Variance: 2ν₂²(ν₁ + ν₂ - 2) / [ν₁(ν₂ - 2)²(ν₂ - 4)] (4 < ν₂);  MGF: does not exist.

Beta:  X ~ BETA(a, b),  0 < a,  0 < b
  f(x) = [Γ(a+b) / (Γ(a)Γ(b))] x^(a-1) (1 - x)^(b-1),  0 < x < 1
  Mean: a/(a+b);  Variance: ab / [(a+b+1)(a+b)²];  MGF: not tractable.

Weibull:  X ~ WEI(θ, β),  0 < θ,  0 < β
  f(x) = (β/θ^β) x^(β-1) e^(-(x/θ)^β),  0 < x
  Mean: θΓ(1 + 1/β);  Variance: θ²[Γ(1 + 2/β) - Γ²(1 + 1/β)];  MGF: not tractable.

Extreme Value:  X ~ EV(θ, η),  0 < θ
  f(x) = (1/θ) exp{(x - η)/θ - exp[(x - η)/θ]},  -∞ < x < ∞
  Mean: η - γθ, where γ ≈ 0.5772 (Euler's const.);  Variance: π²θ²/6;  MGF: e^(ηt) Γ(1 + θt).

Cauchy:  X ~ CAU(θ, η),  0 < θ
  f(x) = 1 / (θπ{1 + [(x - η)/θ]²}),  -∞ < x < ∞
  Mean: does not exist;  Variance: does not exist;  MGF: does not exist.

Pareto:  X ~ PAR(θ, κ),  0 < θ,  0 < κ
  f(x) = κ / [θ(1 + x/θ)^(κ+1)],  0 < x
  Mean: θ/(κ - 1) (1 < κ);  Variance: θ²κ / [(κ - 2)(κ - 1)²] (2 < κ);  MGF: does not exist.

Chi-Square:  X ~ χ²(ν),  ν = 1, 2, ...
  f(x) = x^(ν/2 - 1) e^(-x/2) / [2^(ν/2) Γ(ν/2)],  0 < x
  Mean: ν;  Variance: 2ν;  MGF: (1 - 2t)^(-ν/2),  t < 1/2.
INTRODUCTION TO PROBABILITY AND MATHEMATICAL STATISTICS

SECOND EDITION

Lee J. Bain, University of Missouri-Rolla
Max Engelhardt, University of Idaho

Duxbury / Thomson Learning
Australia, Canada, Mexico, Singapore, Spain, United Kingdom, United States
The Duxbury Classic Series is a collection of authoritative works from respected authors.
Reissued as paperbacks, these successful titles are now more affordable.

COPYRIGHT © 1992, 1987 by Brooks/Cole
Duxbury is an imprint of Brooks/Cole, a division of Thomson Learning. The Thomson Learning logo is a trademark used herein under license.

For more information about this or any other Duxbury product, contact:
DUXBURY
511 Forest Lodge Road
Pacific Grove, CA 93950 USA
www.duxbury.com
1-800-423-0563 (Thomson Learning Academic Resource Center)

All rights reserved. No part of this work may be reproduced, transcribed or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, or information storage and/or retrieval systems) without the prior written permission of the publisher.

For permission to use material from this work, contact us by
Web: www.thomsonrights.com
fax: 1-800-730-2215
phone: 1-800-730-2214

Printed in the United States of America
10 9 8 7 6 5 4 3 2

Library of Congress Cataloging-in-Publication Data

Bain, Lee J.
Introduction to probability and mathematical statistics / Lee J. Bain, Max Engelhardt. 2nd ed.
p. cm. (The Duxbury advanced series in statistics and decision sciences)
Includes bibliographical references and index.
ISBN 0-534-92930-3 (hard cover)
ISBN 0-534-38020-4 (paperback)
1. Probabilities. 2. Mathematical statistics. I. Engelhardt, Max. II. Title. III. Series.
QA273.B2546 1991 519.2 dc20
91-25923
CONTENTS

CHAPTER 1
PROBABILITY 1
1.1 Introduction 1
1.2 Notation and terminology 2
1.3 Definition of probability 9
1.4 Some properties of probability 13
1.5 Conditional probability 16
1.6 Counting techniques 31
Summary 42
Exercises 43

CHAPTER 2
RANDOM VARIABLES AND THEIR DISTRIBUTIONS 53
2.1 Introduction 53
2.2 Discrete random variables 56
2.3 Continuous random variables 62
2.4 Some properties of expected values 71
2.5 Moment generating functions 78
Summary 83
Exercises 83

CHAPTER 3
SPECIAL PROBABILITY DISTRIBUTIONS 90
3.1 Introduction 90
3.2 Special discrete distributions 91
3.3 Special continuous distributions 109
3.4 Location and scale parameters 124
Summary 127
Exercises 128

CHAPTER 4
JOINT DISTRIBUTIONS 136
4.1 Introduction 136
4.2 Joint discrete distributions 137
4.3 Joint continuous distributions 144
4.4 Independent random variables 149
4.5 Conditional distributions 153
4.6 Random samples 158
Summary 165
Exercises 165

CHAPTER 5
PROPERTIES OF RANDOM VARIABLES 171
5.1 Introduction 171
5.2 Properties of expected values 172
5.3 Correlation 177
5.4 Conditional expectation 180
5.5 Joint moment generating functions 186
Summary 188
Exercises 189

CHAPTER 6
FUNCTIONS OF RANDOM VARIABLES 193
6.1 Introduction 193
6.2 The CDF technique 194
6.3 Transformation methods 197
6.4 Sums of random variables 209
6.5 Order statistics 214
Summary 226
Exercises 226

CHAPTER 7
LIMITING DISTRIBUTIONS 231
7.1 Introduction 231
7.2 Sequences of random variables 232
7.3 The central limit theorem 236
7.4 Approximations for the binomial distribution 240
7.5 Asymptotic normal distributions 243
7.6 Properties of stochastic convergence 245
7.7 Additional limit theorems 247
7.8* Asymptotic distributions of extreme-order statistics 250
Summary 259
Exercises 259

CHAPTER 8
STATISTICS AND SAMPLING DISTRIBUTIONS 263
8.1 Introduction 263
8.2 Statistics 263
8.3 Sampling distributions 267
8.4 The t, F, and beta distributions 273
8.5 Large-sample approximations 280
Summary 283
Exercises 283

CHAPTER 9
POINT ESTIMATION 288
9.1 Introduction 288
9.2 Some methods of estimation 290
9.3 Criteria for evaluating estimators 302
9.4 Large-sample properties 311
9.5 Bayes and minimax estimators 319
Summary 327
Exercises 328

CHAPTER 10
SUFFICIENCY AND COMPLETENESS 335
10.1 Introduction 335
10.2 Sufficient statistics 337
10.3 Further properties of sufficient statistics 342
10.4 Completeness and exponential class 345
Summary 353
Exercises 353

CHAPTER 11
INTERVAL ESTIMATION 358
11.1 Introduction 358
11.2 Confidence intervals 359
11.3 Pivotal quantity method 362
11.4 General method 369
11.5 Two-sample problems 377
11.6 Bayesian interval estimation 382
Summary 383
Exercises 384

CHAPTER 12
TESTS OF HYPOTHESES 389
12.1 Introduction 389
12.2 Composite hypotheses 395
12.3 Tests for the normal distribution 398
12.4 Binomial tests 404
12.5 Poisson tests 406
12.6 Most powerful tests 406
12.7 Uniformly most powerful tests 411
12.8 Generalized likelihood ratio tests 417
12.9 Conditional tests 426
12.10 Sequential tests 428
Summary 435
Exercises 436

CHAPTER 13
CONTINGENCY TABLES AND GOODNESS-OF-FIT 442
13.1 Introduction 442
13.2 One-sample binomial case 443
13.3 r-Sample binomial test (completely specified H0) 444
13.4 One-sample multinomial 447
13.5 r-Sample multinomial 448
13.6 Test for independence, r x c contingency table 450
13.7 Chi-squared goodness-of-fit test 453
13.8 Other goodness-of-fit tests 457
Summary 461
Exercises 462

CHAPTER 14
NONPARAMETRIC METHODS 468
14.1 Introduction 468
14.2 One-sample sign test 469
14.3 Binomial test (test on quantiles) 471
14.4 Two-sample sign test 476
14.5 Wilcoxon paired-sample signed-rank test 477
14.6 Paired-sample randomization test 482
14.7 Wilcoxon and Mann-Whitney (WMW) tests 483
14.8 Correlation tests (tests of independence) 486
14.9 Wald-Wolfowitz runs test 492
Summary 494
Exercises 495

CHAPTER 15*
REGRESSION AND LINEAR MODELS 499
15.1 Introduction 499
15.2 Linear regression 500
15.3 Simple linear regression 501
15.4 General linear model 515
15.5 Analysis of bivariate data 529
Summary 534
Exercises 535

CHAPTER 16*
RELIABILITY AND SURVIVAL DISTRIBUTIONS 540
16.1 Introduction 540
16.2 Reliability concepts 541
16.3 Exponential distribution 548
16.4 Weibull distribution 560
16.5 Repairable systems 570
Summary 579
Exercises 579

APPENDIX A REVIEW OF SETS 587
APPENDIX B SPECIAL DISTRIBUTIONS 594
APPENDIX C TABLES OF DISTRIBUTIONS 598
ANSWERS TO SELECTED EXERCISES 619
REFERENCES 638
INDEX 641

* Advanced (or optional) topics
PREFACE

This book provides an introduction to probability and mathematical statistics. Although the primary focus of the book is on a mathematical development of the subject, we also have included numerous examples and exercises that are oriented toward applications. We have attempted to achieve a level of presentation that is appropriate for senior-level undergraduates and beginning graduate students.

The second edition involves several major changes, many of which were suggested by reviewers and users of the first edition. Chapter 2 now is devoted to general properties of random variables and their distributions. The chapter now includes moments and moment generating functions, which occurred somewhat later in the first edition. Special distributions have been placed in Chapter 3. Chapter 8 is completely changed. It now considers sampling distributions and some basic properties of statistics. Chapter 15 is also new. It deals with regression and related aspects of linear models.

As with the first edition, the only prerequisite for covering the basic material is calculus, with the lone exception of the material on general linear models in Section 15.4; this assumes some familiarity with matrices. This material can be omitted if so desired.

Our intent was to produce a book that could be used as a textbook for a two-semester sequence in which the first semester is devoted to probability concepts and the second covers mathematical statistics. Chapters 1 through 7 include topics that usually are covered in a one-semester introductory course in probability, while Chapters 8 through 12 contain standard topics in mathematical statistics. Chapters 13 and 14 deal with goodness-of-fit and nonparametric statistics. These chapters tend to be more methods-oriented. Chapters 15 and 16 cover material in regression and reliability, and these would be considered as optional or special topics. In any event, judgment undoubtedly will be required in the choice of topics covered or the amount of time allotted to topics if the desired material is to be completed in a two-semester course.

It is our hope that those who use the book will find it both interesting and informative.

ACKNOWLEDGMENTS

We gratefully acknowledge the numerous suggestions provided by the following reviewers:

Dean H. Fearn, California State University, Hayward
Joseph Glaz, University of Connecticut
Victor Goodman, Rensselaer Polytechnic Institute
Shu-ping C. Hodgson, Central Michigan University
Robert A. Hultquist, Pennsylvania State University
Alan M. Johnson, University of Arkansas, Little Rock
Benny P. Lo, Ohlone College
D. Ramachandran, California State University, Sacramento
Douglas A. Wolfe, Ohio State University
Linda J. Young, Oklahoma State University

Thanks also are due to the following users of the first edition who were kind enough to relate their experiences to the authors: H. A. David, Iowa State University; Peter Griffin, California State University, Sacramento.

Finally, special thanks are due for the moral support of our wives, Harriet Bain and Linda Engelhardt.

Lee J. Bain
Max Engelhardt
CHAPTER 1

PROBABILITY

1.1
INTRODUCTION

In any scientific study of a physical phenomenon, it is desirable to have a mathematical model that makes it possible to describe or predict the observed value of some characteristic of interest. As an example, consider the velocity of a falling body after a certain length of time, t. The formula v = gt, where g = 32.17 feet per second per second, provides a useful mathematical model for the velocity, in feet per second, of a body falling from rest in a vacuum. This is an example of a deterministic model. For such a model, carrying out repeated experiments under ideal conditions would result in essentially the same velocity each time, and this would be predicted by the model. On the other hand, such a model may not be adequate when the experiments are carried out under less than ideal conditions. There may be unknown or uncontrolled variables, such as air temperature or humidity, that might affect the outcome, as well as measurement error or other factors that might cause the results to vary on different performances of the

experiment. Furthermore, we may not have sufficient knowledge to derive a


more complicated model that could account for all causes of variation.
There are also other types of phenomena in which different results may naturally occur by chance, and for which a deterministic model would not be appropriate. For example, an experiment may consist of observing the number of particles emitted by a radioactive source, the time until failure of a manufactured component, or the outcome of a game of chance.
The motivation for the study of probability is to provide mathematical models for such nondeterministic situations; the corresponding mathematical models will be called probability models (or probabilistic models). The term stochastic, which is derived from the Greek word stochos, meaning "guess," is sometimes used instead of the term probabilistic.

A careful study of probability models requires some familiarity with the notation and terminology of set theory. We will assume that the reader has some knowledge of sets, but for convenience we have included a review of the basic ideas of set theory in Appendix A.


​ of sets, but for convenience we
have included a review of the basic ​ideas of set theory in Appendix A.

1.2
NOTATION AND TERMINOLOGY

The term experiment refers to the process of obtaining an observed result of some

phenomenon. A performance of an experiment is called a trial of the experiment,

and an observed result is called an outcome. This terminology is rather general,


and it could pertain to such diverse activities as scientific experiments or games

of chance. Our primary interest will be in situations where there is uncertainty

about which outcome will occur when the experiment is performed. We will
assume that an experiment is repeatable under essentially the same conditions,

and that the set of all possible outcomes can be completely specified before
experimentation.

Definition 1.2.1
The set of all possible outcomes of an experiment is called the sample space, denoted by S.

Note that one and only one of the possible outcomes will occur on any given trial of the experiment.

Example 1.2.1  An experiment consists of tossing two coins, and the observed face of each coin is of interest. The set of possible outcomes may be represented by the sample space

S = {HH, HT, TH, TT}

which simply lists all possible pairings of the symbols H (heads) and T (tails). An alternate way of representing such a sample space is to list all possible ordered pairs of the numbers 1 and 0, S = {(1, 1), (1, 0), (0, 1), (0, 0)}, where, for example, (1, 0) indicates that the first coin landed heads up and the second coin landed tails up.
tails up.

Example 1.2.2  Suppose that in Example 1.2.1 we were not interested in the individual outcomes of the coins, but only in the total number of heads obtained from the two coins. An appropriate sample space could then be written as S* = {0, 1, 2}. Thus, different sample spaces may be appropriate for the same experiment, depending on the characteristic of interest.

Example 1.2.3  If a coin is tossed repeatedly until a head occurs, then the natural sample space is S = {H, TH, TTH, ...}. If one is interested in the number of tosses required to obtain a head, then a possible sample space for this experiment would be the set of all positive integers, S* = {1, 2, 3, ...}, and the outcomes would correspond directly to the number of tosses required to obtain the first head. We will show in the next chapter that an outcome corresponding to a sequence of tosses in which a head is never obtained need not be included in the sample space.

Example 1.2.4  A light bulb is placed in service and the time of operation until it burns out is measured. At least conceptually, the sample space for this experiment can be taken to be the set of nonnegative real numbers, S = {t | 0 ≤ t < ∞}. Note that if the actual failure time could be measured only to the nearest hour, then the sample space for the actual observed failure time would be the set of nonnegative integers, S* = {0, 1, 2, 3, ...}. Even though S* may be the observable sample space, one might prefer to describe the properties and behavior of light bulbs in terms of the conceptual sample space S. In cases of this type, the discreteness imposed by measurement limitations is sufficiently negligible that it can be ignored, and both the measured response and the conceptual response can be discussed relative to the conceptual sample space S.

A sample space S is said to be finite if it consists of a finite number of outcomes, say S = {e1, e2, ..., eN}, and it is said to be countably infinite if its outcomes can be put into a one-to-one correspondence with the positive integers, say S = {e1, e2, ...}.

Definition 1.2.2
If a sample space S is either finite or countably infinite, then it is called a discrete sample space.

A set that is either finite or countably infinite also is said to be countable. This is the case in the first three examples. It is also true for the last example when failure times are recorded to the nearest hour, but not for the conceptual sample space. Because the conceptual space involves outcomes that may assume any value in some interval of real numbers (i.e., the set of nonnegative real numbers), it could be termed a continuous sample space, and it provides an example where a discrete sample space is not an appropriate model. Other, more complicated experiments exist, the sample spaces of which also could be characterized as continuous, such as experiments involving two or more continuous responses.

Example 1.2.5  Suppose a heat lamp is tested and X, the amount of light produced (in lumens), and Y, the amount of heat energy (in joules), are measured. An appropriate sample space would be the Cartesian product of the set of all nonnegative real numbers with itself,

S = [0, ∞) × [0, ∞) = {(x, y) | 0 ≤ x < ∞ and 0 ≤ y < ∞}

Each variable would be capable of assuming any value in some subinterval of [0, ∞).

Sometimes it is possible to determine bounds on such physical variables, but often it is more convenient to consider a conceptual model in which the variables are not bounded. If the likelihood of the variables in the conceptual model exceeding such bounds is negligible, then there is no practical difficulty in using the conceptual model.

Example 1.2.6  A thermograph is a machine that records temperature continuously by tracing a graph on a roll of paper as it moves through the machine. A thermographic recording is made during a 24-hour period. The observed result is the graph of a continuous real-valued function f(t) defined on the time interval [0, 24] = {t | 0 ≤ t ≤ 24}, and an appropriate sample space would be a collection of such functions.


Definition 1.2.3
An event is a subset of the sample space S. If A is an event, then A has occurred if it contains the outcome that occurred.

To illustrate this concept, consider Example 1.2.1. The subset

A = {HH, HT, TH}

contains the outcomes that correspond to the event of obtaining "at least one head." As mentioned earlier, if one of the outcomes in A occurs, then we say that

the event A has occurred. Similarly, if one of the outcomes in B = {HT, TH, TT} occurs, then we say that the event "at least one tail" has occurred.

Set notation and terminology provide a useful framework for describing the possible outcomes and related physical events that may be of interest in an experiment. As suggested above, a subset of outcomes corresponds to a physical event, and the event or the subset is said to occur if any outcome in the subset occurs. The usual set operations of union, intersection, and complement provide a way of expressing new events in terms of events that already have been defined. For example, the event C of obtaining "at least one head and at least one tail" can be expressed as the intersection of A and B, C = A ∩ B = {HT, TH}. Similarly, the event "at least one head or at least one tail" can be expressed as the union A ∪ B = {HH, HT, TH, TT}, and the event "no heads" can be expressed as the complement of A relative to S, A' = {TT}.

A review of set notation and terminology is given in Appendix A.

In general, suppose S is the sample space for some experiment, and that A and B are events. The intersection A ∩ B represents the outcomes of the event "A and B," while the union A ∪ B represents the event "A or B." The complement A' corresponds to the event "not A." Other events also can be represented in terms of intersections, unions, and complements. For example, the event "A but not B" is said to occur if the outcome of the experiment belongs to A ∩ B', which sometimes is written as A - B. The event "exactly one of A or B" is said to occur if the outcome belongs to (A ∩ B') ∪ (A' ∩ B). The set A' ∩ B' corresponds to the event "neither A nor B." The set identity A' ∩ B' = (A ∪ B)' is another way to represent this event. This is one of the set properties that usually are referred to as De Morgan's laws. The other such property is A' ∪ B' = (A ∩ B)'.
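As an aside, these set identities are easy to check mechanically. The following minimal Python sketch (illustrative only, not part of the original text; all names are ours) verifies De Morgan's laws for the two-coin sample space of Example 1.2.1.

```python
# Illustrative sketch: De Morgan's laws on the sample space of Example 1.2.1.
S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT", "TH"}   # "at least one head"
B = {"HT", "TH", "TT"}   # "at least one tail"

def complement(E):
    # Complement of the event E relative to the sample space S.
    return S - E

# A' n B' = (A u B)'  and  A' u B' = (A n B)'
assert complement(A) & complement(B) == complement(A | B)
assert complement(A) | complement(B) == complement(A & B)
print(complement(A))   # {'TT'}, the event "no heads"
```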

​ is a finite collection of events, occurrence of an ​k ​outcome in the intersection


More generally, if A1, ​..., A

​ corresponds to the ​occurrence of the event "every A1; i = 1, ..., k." The occurrence of
A1 n ​n Ak (or ​fl A,)
an outcome in
the union A1 u ​u Ak (or ​A,) corresponds to the occurrence of the event
"at least one A,; i = 1, ..., k." Similar remarks apply in the case of a countably ​infinite collection A1, A2,
..., ​with the notations A1 n A2 n ​(or ​fl ​
A1) for

the intersection and A1 u A2 u ​(or ​Y A)


​ for the union.
The intersection (or union) of a finite or countably infinite collection of events ​is called a countable intersection
(or union).
We will consider the whole sample space S as a special type of event, called the ​sure event, and we also will
include the empty set Ø as an event, called the null ​event. Certainly, any set consisting of only a single
outcome may be considered as ​an event.

Definition 1.2.4
An event is called an elementary event if it contains exactly one outcome of the experiment.

In a discrete sample space, any subset can be written as a countable union of elementary events, and we have no difficulty in associating every subset with an event in the discrete case.

In Example 1.2.1, the elementary events are {HH}, {HT}, {TH}, and {TT}, and any other event can be written as a finite union of these elementary events. Similarly, in Example 1.2.3, the elementary events are {H}, {TH}, {TTH}, ..., and any event can be represented as a countable union of these elementary events.
It is not as easy to represent events for the continuous examples. Rather than attempting to characterize these events rigorously, we will discuss some examples. In Example 1.2.4, the light bulbs could fail during any time interval, and any interval of nonnegative real numbers would correspond to an interesting event for that experiment. Specifically, suppose the time until failure is measured in hours. The event that the light bulb "survives at most 10 hours" corresponds to the interval A = [0, 10] = {t | 0 ≤ t ≤ 10}. The event that the light bulb "survives more than 10 hours" is A' = (10, ∞) = {t | 10 < t < ∞}. If B = [0, 15), then C = B ∩ A' = (10, 15) is the event of "failure between 10 and 15 hours."

In Example 1.2.5, any Cartesian product based on intervals of nonnegative real numbers would correspond to an event of interest. For example, the event

(10, 20) × [5, ∞) = {(x, y) | 10 < x < 20 and 5 ≤ y < ∞}

corresponds to "the amount of light is between 10 and 20 lumens and the amount of energy is at least 5 joules." Such an event can be represented graphically as a rectangle in the xy plane with sides parallel to the coordinate axes.
In general, any physical event can be associated with a reasonable subset of S, and often a subset of S can be associated with some meaningful event. For mathematical reasons, though, when defining probability it is desirable to restrict the types of subsets that we will consider as events in some cases. Given a collection of events, we will want any countable union of these events to be an event. We also will want complements of events and countable intersections of events to be included in the collection of subsets that are defined to be events. We will assume that the collection of possible events includes all such subsets, but we will not attempt to describe all subsets that might be called events.

An important situation arises in the following developments when two events correspond to disjoint subsets.
Definition 1.2.5
Two events A and B are called mutually exclusive if A ∩ B = ∅.

If events are mutually exclusive, then they have no outcomes in common. Thus, the occurrence of one event precludes the possibility of the other occurring. In Example 1.2.1, if A is the event "at least one head" and if we let B be the event "both tails," then A and B are mutually exclusive. Actually, in this example B = A' (the complement of A). In general, complementary events are mutually exclusive, but the converse is not true. For example, if C is the event "both heads," then B and C are mutually exclusive, but not complementary.

The notion of mutually exclusive events can be extended easily to more than two events.

Definition 1.2.6
Events A1, A2, A3, ... are said to be mutually exclusive if they are pairwise mutually exclusive. That is, Ai ∩ Aj = ∅ whenever i ≠ j.

One possible approach to assigning probabilities to events involves the notion of relative frequency.

RELATIVE FREQUENCY

For the experiment of tossing a coin, we may declare that the probability of obtaining a head is 1/2. This could be interpreted in terms of the relative frequency with which a head is obtained on repeated tosses. Even though the coin may be tossed only once, conceivably it could be tossed many times, and experience leads us to expect a head on approximately one-half of the tosses. At least conceptually, as the number of tosses approaches infinity, the proportion of times a head occurs is expected to converge to some constant p. One then might define the probability of obtaining a head to be this conceptual limiting value. For a balanced coin, one would expect p = 1/2, but if the coin is unbalanced, or if the experiment is conducted under unusual conditions that tend to bias the outcomes in favor of either heads or tails, then this assignment would not be appropriate.

More generally, if m(A) represents the number of times that the event A occurs among M trials of a given experiment, then fA = m(A)/M represents the relative frequency of occurrence of A on these trials of the experiment.
Example 1.2.7  An experiment consists of rolling an ordinary six-sided die. A natural sample space is the set of the first six positive integers, S = {1, 2, 3, 4, 5, 6}. A simulated die-rolling experiment is performed, using a "random number generator" on a computer. In Figure 1.1, the relative frequencies of the elementary events A1 = {1}, A2 = {2}, and so on are represented as the heights of vertical lines. The first graph shows the relative frequencies for the first M = 30 rolls, and the second graph gives the results for M = 600 rolls. By inspection of these graphs,
obviously the relative frequencies tend to "stabilize" near some fixed value as M increases. Also included in the figure is a dotted line of height 1/6, which is the value that experience would suggest as the long-term relative frequency of the outcomes of rolling a die. Of course, in this example, the results are more relevant to the properties of the random number generator used to simulate the experiment than to those of actual dice.
FIGURE 1.1  Relative frequencies of elementary events for the die-rolling experiment (first panel: M = 30 rolls; second panel: M = 600 rolls).
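The simulation behind Figure 1.1 is easy to reproduce. The sketch below is our illustration, not the authors' code; the function name and seed are arbitrary. It rolls a simulated die M times and reports the relative frequency of each elementary event.

```python
import random

def relative_frequencies(M, seed=0):
    # Roll a fair six-sided die M times; return the relative frequency
    # m(A)/M of each elementary event A = {1}, ..., {6}.
    rng = random.Random(seed)
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(M):
        counts[rng.randint(1, 6)] += 1
    return {face: m / M for face, m in counts.items()}

print(relative_frequencies(30))    # quite uneven for small M
print(relative_frequencies(600))   # each frequency near 1/6 = 0.1667
```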
If, for an event A, the limit of fA as M approaches infinity exists, then one could assign probability to A by

P(A) = lim (M→∞) fA    (1.2.1)

This expresses a property known as statistical regularity. Certain technical questions about this property require further discussion. For example, it is not clear whether the limit in equation (1.2.1) will exist for every sequence of trials, or in what sense, or under what conditions it will necessarily be the same. Our approach to this problem will be to define probability in terms of a set of axioms and eventually show that the desired limiting behavior follows.
To motivate the defining axioms of probability, consider the following properties of relative frequencies. If S is the sample space for an experiment and A is an event, then clearly 0 ≤ m(A) ≤ M, because m(A) counts occurrences of A, and m(S) = M, because S occurs on each trial. Furthermore, if A and B are mutually exclusive events, then outcomes in A are distinct from outcomes in B, and consequently m(A ∪ B) = m(A) + m(B). More generally, if A1, A2, ... are pairwise mutually exclusive, then m(A1 ∪ A2 ∪ ⋯) = m(A1) + m(A2) + ⋯. Thus, the following properties hold for relative frequencies:

0 ≤ fA ≤ 1    (1.2.2)

fS = 1    (1.2.3)

f(A1 ∪ A2 ∪ ⋯) = fA1 + fA2 + ⋯    (1.2.4)

if A1, A2, ... are pairwise mutually exclusive events.

Although the relative frequency approach may not always be adequate as a practical method of assigning probabilities, it is the way that probability usually is interpreted. However, many people consider this interpretation too restrictive. By regarding probability as a subjective measure of belief, they are willing to assign a probability to an event in any situation involving uncertainty, without assuming properties such as repeatability or statistical regularity. Statistical methods based on both the relative frequency approach and the subjective approach will be discussed in later chapters.

1.3
DEFINITION OF PROBABILITY

Given an experiment with an associated sample space S, the primary objective of probability modeling is to assign to each event A a real number P(A), called the probability of A, that will provide a measure of the likelihood that A will occur when the experiment is performed. Mathematically, we can think of P(A) as a set function. In other words, it is a function whose domain is a collection of sets (events), and whose range is a subset of the real numbers. Some set functions are not suitable for assigning probabilities to events. The properties given in the following definition are motivated by similar properties that hold for relative frequencies.

Definition 1.3.1
For a given experiment, S denotes the sample space and A, A1, A2, ... represent possible events. A set function that associates a real value P(A) with each event A is called a probability set function, and P(A) is called the probability of A, if the following properties are satisfied:

P(A) ≥ 0 for every A    (1.3.1)

P(S) = 1    (1.3.2)

P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯    (1.3.3)

if A1, A2, ... are pairwise mutually exclusive events.

These properties all seem to agree with our intuitive concept of probability, and these few properties are sufficient to allow a mathematical structure to be developed. One consequence of the properties is that the null event (empty set) has probability zero, P(∅) = 0 (see Exercise 11). Also, if A and B are two mutually exclusive events, then

P(A ∪ B) = P(A) + P(B)    (1.3.4)

Similarly, if A1, A2, ..., Ak is a finite collection of pairwise mutually exclusive events, then

P(A1 ∪ A2 ∪ ⋯ ∪ Ak) = P(A1) + P(A2) + ⋯ + P(Ak)    (1.3.5)

(See Exercise 12.) In the case of a finite sample space, notice that there is at most a finite number of nonempty mutually exclusive events. Thus, in this case it would suffice to verify equation (1.3.4) or (1.3.5) instead of (1.3.3).

Example 1.3.1  The successful completion of a construction project requires that a piece of equipment works properly. Assume that either the "project succeeds" (A1) or it fails because of one and only one of the following: "mechanical failure" (A2) or "electrical failure" (A3). Suppose that mechanical failure is three times as likely as electrical failure, and successful completion is twice as likely as mechanical failure. The resulting assignment of probability is determined by the equations P(A2) = 3P(A3) and P(A1) = 2P(A2). Because one and only one of these events will occur, we also have from (1.3.2) and (1.3.5) that P(A1) + P(A2) + P(A3) = 1. Substituting P(A2) = 3P(A3) and P(A1) = 6P(A3) into this equation gives 10P(A3) = 1, so solving the system simultaneously yields P(A1) = 0.6, P(A2) = 0.3, and P(A3) = 0.1. The event "failure" is represented by the union A2 ∪ A3, and because A2 and A3 are assumed to be mutually exclusive, we have from equation (1.3.5) that the probability of failure is P(A2 ∪ A3) = 0.3 + 0.1 = 0.4.

PROBABILITY IN DISCRETE SPACES

The assignment of probability in the case of a discrete sample space can be reduced to assigning probabilities to the elementary events. Suppose that to each elementary event {ei} we assign a real number pi, so that P({ei}) = pi. To satisfy the conditions of Definition 1.3.1, it is necessary that

pi ≥ 0 for all i    (1.3.6)

Σi pi = 1    (1.3.7)

Because each term in the sum (1.3.7) corresponds to an outcome in S, it is an ordinary summation when S is finite, and an infinite series when S is countably infinite. The probability of any other event then can be determined from the above assignment by representing the event as a union of mutually exclusive elementary events, and summing the corresponding values of pi. A concise notation for this is given by

P(A) = Σ (ei in A) P({ei})    (1.3.8)

With this notation, we understand that the summation is taken over all indices i such that ei is an outcome in A. This approach works equally well for both finite and countably infinite sample spaces, but if A is a countably infinite set the summation in (1.3.8) is actually an infinite series.

Example 1.3.2  If two coins are tossed as in Example 1.2.1, then S = {HH, HT, TH, TT}; if the coins are balanced, it is reasonable to assume that each of the four outcomes is equally likely. Because P(S) = 1, the probability assigned to each elementary event must be 1/4. Any event in a finite sample space can be written as a finite union of distinct elementary events, so the probability of any event is a sum including the constant term 1/4 for each elementary event in the union. For example, if C = {HT, TH} represents the event "exactly one head," then

P(C) = P({HT}) + P({TH}) = 1/4 + 1/4 = 1/2

Note that the "equally likely" assumption cannot be applied indiscriminately. For example, in Example 1.2.2 the number of heads is of interest, and the sample space is S* = {0, 1, 2}. The elementary event {1} corresponds to the event C = {HT, TH} in S. Rather than assigning the probability 1/3 to each of the outcomes in S*, we should assign P({1}) = 1/2 and P({0}) = P({2}) = 1/4.
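As a quick illustration of assignment (1.3.8) (a sketch of ours, not from the text), the elementary-event probabilities of Example 1.3.2 can be tabulated and summed over the outcomes of any event:

```python
# Elementary-event probabilities for two balanced coins (Example 1.3.2).
p = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def prob(A):
    # P(A) as the sum of p_i over the outcomes e_i in A, as in (1.3.8).
    return sum(p[e] for e in A)

print(prob({"HT", "TH"}))   # 0.5, the event "exactly one head"
# Induced probabilities on S* = {0, 1, 2} (number of heads):
print(prob({"TT"}), prob({"HT", "TH"}), prob({"HH"}))   # 0.25 0.5 0.25
```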

In many problems, including those involving games of chance, the nature of

the outcomes dictates the assignment of equal probability to each elementary

event. This type of model sometimes is referred to as the classical probability


model.

CLASSICAL PROBABILITY

Suppose that a finite number of possible outcomes may occur in an experiment, and that it is reasonable to assume that each outcome is equally likely to occur. Typical problems involving games of chance, such as tossing a coin, rolling a die, drawing cards from a deck, and picking the winning number in a lottery, fit this description. Note that the "equally likely" assumption requires the experiment to be carried out in such a way that the assumption is realistic. That is, the coin should be balanced, the die should not be loaded, the deck should be shuffled, the lottery tickets should be well mixed, and so forth.

This imposes a very special requirement on the assignment of probabilities to the elementary outcomes. In particular, let the sample space consist of N distinct outcomes,

S = {e1, e2, ..., eN}    (1.3.9)


The "equally likely" assumption requires of the values pi that

p1 = p2 = ⋯ = pN    (1.3.10)

and, to satisfy equations (1.3.6) and (1.3.7), necessarily

pi = P({ei}) = 1/N    (1.3.11)

In this case, because all terms in the sum (1.3.8) are the same, pi = 1/N, it follows that

P(A) = n(A)/N    (1.3.12)

where n(A) represents the number of outcomes in A. In other words, if the outcomes of an experiment are equally likely, then the problem of assigning probabilities to events is reduced to counting how many outcomes are favorable to the occurrence of the event as well as how many are in the sample space, and then finding the ratio. Some techniques that will be useful in solving some of the more complicated counting problems will be presented in Section 1.6.

The formula presented in (1.3.12) sometimes is referred to as classical probability. For problems in which this method of assignment is appropriate, it is fairly easy to show that our general definition of probability is satisfied. Specifically, for any mutually exclusive events A and B,

P(A ∪ B) = n(A ∪ B)/N = [n(A) + n(B)]/N = n(A)/N + n(B)/N = P(A) + P(B)

RANDOM SELECTION

A major application of classical probability arises in connection with choosing an object or a set of objects at random from a collection of objects.

Definition 1.3.2
If an object is chosen from a finite collection of distinct objects in such a manner that each object has the same probability of being chosen, then we say that the object was chosen at random.
​ was chosen at random.
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
1.4 ​SOME PROPERTIES OF PROBABILITY ​13

Similarly, if a subset of the objects is chosen so that each subset of the same

size has the same probability of being chosen, then we say that the subset was

chosen at random. Usually, no distinction is made when the elements of the


subset are listed in a different order, but occasionally it will be useful to make this
distinction.

Example 1.3.3  A game of chance involves drawing a card from an ordinary deck of 52 playing cards. It should not matter whether the card comes from the top or some other part of the deck if the cards are well shuffled. Each card would have the same probability, 1/52, of being selected. Similarly, if a game involves drawing five cards, then it should not matter whether the top five cards or any other five cards are drawn. The probability assigned to each possible set of five cards would be the reciprocal of the total number of subsets of size 5 from a set of size 52. In Section 1.6 we will develop, among other things, a method for counting the number of subsets of a given size.
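Anticipating the counting techniques of Section 1.6, the number of five-card subsets is a binomial coefficient; the short sketch below (our illustration, not from the text) shows the resulting probability of any particular hand.

```python
from math import comb

n_hands = comb(52, 5)   # number of subsets of size 5 from 52 cards
print(n_hands)          # 2598960
print(1 / n_hands)      # probability of any one five-card hand, about 3.85e-07
```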

1.4
SOME PROPERTIES OF PROBABILITY

From general properties of sets and the properties of Definition 1.3.1 we can derive other useful properties of probability. Each of the following theorems pertains to one or more events relative to the same experiment.

Theorem 1.4.1  If A is an event and A' is its complement, then

P(A) = 1 - P(A')    (1.4.1)

Proof
Because A' is the complement of A relative to S, S = A ∪ A'. Because A ∩ A' = ∅, A and A' are mutually exclusive, so it follows from equations (1.3.2) and (1.3.4) that

1 = P(S) = P(A ∪ A') = P(A) + P(A')

which establishes the theorem.

This theorem is particularly useful when an event A is relatively complicated, but its complement A' is easier to analyze.

Example 1.4.1  An experiment consists of tossing a coin four times, and the event A of interest is "at least one head." The event A contains most of the possible outcomes, but the complement, "no heads," contains only one, A' = {TTTT}, so n(A') = 1. It can be shown by listing all of the possible outcomes that n(S) = 16, so that P(A') = n(A')/n(S) = 1/16. Thus P(A) = 1 - P(A') = 1 - 1/16 = 15/16.
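For a sample space this small, the complement computation can be confirmed by brute-force enumeration. The following sketch (ours, purely illustrative) lists all 16 outcomes of four tosses:

```python
from itertools import product

S = list(product("HT", repeat=4))        # all 16 outcomes of four tosses
A = [s for s in S if "H" in s]           # "at least one head"
print(len(A) / len(S), 1 - 1 / len(S))   # 0.9375 both ways: 15/16
```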

Theorem 1.4.2  For any event A, P(A) ≤ 1.

Proof
From Theorem 1.4.1, P(A) = 1 - P(A'). Also, from Definition 1.3.1, we know that P(A') ≥ 0. Therefore, P(A) ≤ 1.

Note that this theorem combined with Definition 1.3.1 implies that

0 ≤ P(A) ≤ 1    (1.4.2)

Equations (1.3.3), (1.3.4), and (1.3.5) provide formulas for the probability of a union in the case of mutually exclusive events. The following theorems provide formulas that apply more generally.

Theorem 1.4.3  For any two events A and B,

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)    (1.4.3)

Proof
The approach will be to express the events A ∪ B and A as unions of mutually exclusive events. From set properties we can show that

A ∪ B = (A ∩ B') ∪ B  and  A = (A ∩ B) ∪ (A ∩ B')

See Figure 1.2 for an illustration of these identities.

FIGURE 1.2  Partitioning of events: A ∪ B = (A ∩ B') ∪ B and A = (A ∩ B) ∪ (A ∩ B')



It also follows that the events A ∩ B' and B are mutually exclusive because (A ∩ B') ∩ B = ∅, so that equation (1.3.4) implies

P(A ∪ B) = P(A ∩ B') + P(B)

Similarly, A ∩ B and A ∩ B' are mutually exclusive, so that

P(A) = P(A ∩ B) + P(A ∩ B')

The theorem follows from these equations:

P(A ∪ B) = P(A ∩ B') + P(B)
         = [P(A) - P(A ∩ B)] + P(B)
         = P(A) + P(B) - P(A ∩ B)

Example 1.4.2  Suppose one card is drawn at random from an ordinary deck of 52 playing cards. As noted in Example 1.3.3, this means that each card has the same probability, 1/52, of being chosen. Let A be the event of obtaining "a red ace" and let B be the event "a heart." Then P(A) = 2/52, P(B) = 13/52, and P(A ∩ B) = 1/52. From Theorem 1.4.3 we have P(A ∪ B) = 2/52 + 13/52 - 1/52 = 14/52 = 7/26.
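The same computation can be verified directly with explicit sets (an illustrative sketch of ours; the rank and suit labels are arbitrary):

```python
# Build a 52-card deck as (rank, suit) pairs and verify Theorem 1.4.3.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]   # hearts and diamonds are red
deck = {(r, s) for r in ranks for s in suits}

A = {(r, s) for (r, s) in deck if r == "A" and s in ("hearts", "diamonds")}
B = {(r, s) for (r, s) in deck if s == "hearts"}

N = len(deck)
print(len(A | B) / N)                              # 14/52 = 0.2692...
print(len(A) / N + len(B) / N - len(A & B) / N)    # same value, by (1.4.3)
```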

Theorem 1.4.3 can be extended easily to three events.

Theorem 1.4.4  For any three events A, B, and C,

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)    (1.4.4)

Proof
See Exercise 16.

It is intuitively clear that if every outcome of A is also an outcome of B, then A is no more likely to occur than B. The next theorem formalizes this notion.

Theorem 1.4.5  If A ⊂ B, then P(A) ≤ P(B).

Proof
See Exercise 17.


Property (1.3.3) provides a formula for the probability of a countably infinite union when the events are mutually exclusive. If the events are not mutually exclusive, then the right side of property (1.3.3) still provides an upper bound for this probability, as shown in the following theorem.

Theorem 1.4.6  Boole's Inequality  If A1, A2, ... is a sequence of events, then

P(A1 ∪ A2 ∪ ⋯) ≤ P(A1) + P(A2) + ⋯    (1.4.5)

Proof
Let B1 = A1, B2 = A2 ∩ A1', and in general Bi = Ai ∩ (A1 ∪ ⋯ ∪ A(i-1))'. It follows that B1 ∪ B2 ∪ ⋯ = A1 ∪ A2 ∪ ⋯ and that B1, B2, ... are mutually exclusive. Because Bi ⊂ Ai, it follows from Theorem 1.4.5 that P(Bi) ≤ P(Ai), and thus

P(A1 ∪ A2 ∪ ⋯) = P(B1 ∪ B2 ∪ ⋯) = P(B1) + P(B2) + ⋯ ≤ P(A1) + P(A2) + ⋯

A similar result holds for finite unions. In particular,

P(A1 ∪ A2 ∪ ⋯ ∪ Ak) ≤ P(A1) + P(A2) + ⋯ + P(Ak)    (1.4.6)

which can be shown by a proof similar to that of Theorem 1.4.6.
Theorem 1.4.7  Bonferroni's Inequality  If A1, A2, ..., Ak are events, then

P(A1 ∩ A2 ∩ ⋯ ∩ Ak) ≥ 1 - [P(A1') + P(A2') + ⋯ + P(Ak')]    (1.4.7)

Proof
This follows from Theorem 1.4.1 applied to (A1 ∩ ⋯ ∩ Ak)' = A1' ∪ ⋯ ∪ Ak', together with inequality (1.4.6).

1.5
CONDITIONAL PROBABILITY

A major objective of probability modeling is to determine how likely it is that an event A will occur when a certain experiment is performed. However, in numerous cases the probability assigned to A will be affected by knowledge of the
occurrence or nonoccurrence of another event B. In such cases we will use the terminology "conditional probability of A given B," and the notation P(A|B) will be used to distinguish between this new concept and ordinary probability P(A).
Example 1.5.1  A box contains 100 microchips, some of which were produced by factory 1 and the rest by factory 2. Some of the microchips are defective and some are good (nondefective). An experiment consists of choosing one microchip at random from the box and testing whether it is good or defective. Let A be the event "obtaining a defective microchip"; consequently, A' is the event "obtaining a good microchip." Let B be the event "the microchip was produced by factory 1" and B' the event "the microchip was produced by factory 2." Table 1.1 gives the number of microchips in each category.

TABLE 1.1  Numbers of defective and nondefective microchips from two factories

          B     B'    Totals
A        15      5        20
A'       45     35        80
Totals   60     40       100
The probability of obtaining a defective microchip is

P(A) = n(A)/n(S) = 20/100 = 0.20
Now suppose that each microchip has a number stamped on it that identifies which factory produced it. Thus, before testing whether it is defective, it can be determined whether B has occurred (produced by factory 1) or B' has occurred (produced by factory 2). Knowledge of which factory produced the microchip affects the likelihood that a defective microchip is selected, and the use of conditional probability is appropriate. For example, if the event B has occurred, then the only microchips we should consider are those in the first column of Table 1.1, and the total number is n(B) = 60. Furthermore, the only defective chips to consider are those in both the first column and the first row, and the total number is n(A ∩ B) = 15. Thus, the conditional probability of A given B is

P(A|B) = n(A ∩ B)/n(B) = 15/60 = 0.25
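Computations of this kind are conveniently organized by storing the cell counts of Table 1.1; the sketch below is our illustration, not the authors' code:

```python
# Cell counts from Table 1.1: rows A (defective), A' (good);
# columns B (factory 1), B' (factory 2).
counts = {("A", "B"): 15, ("A", "B'"): 5,
          ("A'", "B"): 45, ("A'", "B'"): 35}
N = sum(counts.values())                               # 100

P_B = (counts[("A", "B")] + counts[("A'", "B")]) / N   # 0.60
P_A_and_B = counts[("A", "B")] / N                     # 0.15
print(P_A_and_B / P_B)                                 # P(A|B) = 0.25
```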

Notice that if we divide both the numerator and denominator by n(S) = 100, we can express conditional probability in terms of some ordinary unconditional probabilities,

P(A|B) = [n(A ∩ B)/n(S)] / [n(B)/n(S)] = P(A ∩ B)/P(B)

This last result can be derived under more general circumstances as follows. Suppose we conduct an experiment with a sample space S, and suppose we are given that the event B has occurred. We wish to know the probability that an event A has occurred given that B has occurred, written P(A|B). That is, we want the probability of A relative to the reduced sample space B. We know that B can be partitioned into two subsets,

B = (A ∩ B) ∪ (A' ∩ B)

A ∩ B is the subset of B for which A is true, so the probability of A given B should be proportional to P(A ∩ B), say P(A|B) = kP(A ∩ B). Similarly, P(A'|B) = kP(A' ∩ B). Together these should represent the total probability relative to B, so

P(A|B) + P(A'|B) = k[P(A ∩ B) + P(A' ∩ B)]
                 = kP[(A ∩ B) ∪ (A' ∩ B)]
                 = kP(B)

Because this total must equal 1, k = 1/P(B). That is,

P(A|B) = P(A ∩ B) / [P(A ∩ B) + P(A' ∩ B)] = P(A ∩ B)/P(B)

and 1/P(B) is the proportionality constant that makes the probabilities on the reduced sample space add to 1.

Definition 1.5.1
The conditional probability of an event A, given the event B, is defined by

P(A|B) = P(A ∩ B)/P(B)    (1.5.1)

if P(B) ≠ 0.

Relative to the sample space B, conditional probabilities defined by (1.5.1) satisfy the original definition of probability, and thus conditional probabilities enjoy all the usual properties of probability on the reduced sample space. For
example, if two events A1 and A2 are mutually exclusive, then

P(A1 ∪ A2|B) = P[(A1 ∪ A2) ∩ B]/P(B)
             = P[(A1 ∩ B) ∪ (A2 ∩ B)]/P(B)
             = [P(A1 ∩ B) + P(A2 ∩ B)]/P(B)
             = P(A1|B) + P(A2|B)

This result generalizes to more than two events. Similarly, P(A|B) ≥ 0 and P(S|B) = P(B|B) = 1, so the conditions of a probability set function are satisfied. Thus, the properties derived in Section 1.4 hold conditionally. In particular,

P(A|B) = 1 - P(A'|B)

0 ≤ P(A|B) ≤ 1

P(A1 ∪ A2|B) = P(A1|B) + P(A2|B) - P(A1 ∩ A2|B)

The following theorem results immediately from equation (1.5.1).

Theorem 1.5.1  For any events A and B,

P(A ∩ B) = P(B)P(A|B) = P(A)P(B|A)    (1.5.5)
This sometimes is referred to as the Multiplication Theorem of probability. It provides a way to compute the probability of the joint occurrence of A and B by multiplying the probability of one event and the conditional probability of the other event. In terms of Example 1.5.1, we can compute P(A ∩ B) = 15/100 = 0.15 directly, or we can compute it as P(B)P(A|B) = (60/100)(15/60) = 0.15 or as P(A)P(B|A) = (20/100)(15/20) = 0.15.

Formula (1.5.5) also is quite useful in dealing with problems involving sampling without replacement. Such experiments consist of choosing objects one at a time from a finite collection, without replacing chosen objects before the next choice. Perhaps the most common example of this is dealing cards from a deck.
Example 1.5.2 ​Two cards are drawn without replacement from a deck of cards. Let A1 denote
the event of getting "an ace on the first draw" and A2 denote the event of getting ​"an ace on the second
draw."
The number of ways in which different outcomes can occur can be enumerated, and the results are given in Table 1.2. The enumeration of possible outcomes can be a tedious problem, and useful techniques that are helpful in such counting problems are discussed in Section 1.6. The values in this example are based on the so-called multiplication principle, which says that if there are n1 ways of doing one thing and n2 ways of doing another, then there are n1 n2 ways of doing both. Thus, for example, the total number of ordered two-card hands that can be formed from 52 cards (without replacement) is 52 · 51 = 2652. Similarly, the number of ordered two-card hands in which both cards are aces is 4 · 3, the number in which the first card is an ace and the second is not an ace is 4 · 48, and so forth. The appropriate products for all cases are provided in Table 1.2.

TABLE 1.2  Partitioning of the numbers of ways to draw two cards

           A2        A2'       Totals
A1        4 · 3     4 · 48    4 · 51
A1'      48 · 4    48 · 47   48 · 51
Totals   51 · 4    51 · 48   52 · 51

For example, the probability of getting "an ace on the first draw and an ace on the second draw" is given by

P(A1 ∩ A2) = (4 · 3)/(52 · 51)

Suppose one is interested in P(A1) without regard to what happens on the second draw. First note that A1 may be partitioned as

A1 = (A1 ∩ A2) ∪ (A1 ∩ A2')

so that

P(A1) = P(A1 ∩ A2) + P(A1 ∩ A2')
      = (4 · 3)/(52 · 51) + (4 · 48)/(52 · 51)
      = (4 · 51)/(52 · 51)
      = 4/52

This same result would have occurred if A1 had been partitioned by another event, say B, which deals only with the face value of the second card. This follows because n(B ∪ B') = 51, and relative to the 52 · 51 ordered pairs of cards,

n(A1) = 4 · n(B) + 4 · n(B') = 4 · n(B ∪ B') = 4 · 51

The numerators of probabilities such as P(A1), P(A1'), P(A2), and P(A2'), which deal with only one of the draws, appear in the margins of Table 1.2. These prob-
abilities may be referred to as marginal probabilities. Note that the marginal probabilities in fact can be computed directly from the original 52-card sample space, and it is not necessary to consider the sample space of ordered pairs at all. For example, P(A1) = (4 · 51)/(52 · 51) = 4/52, which is the probability that would be obtained for one draw from the original 52-card sample space. Clearly, this result would apply to sampling-without-replacement problems in general. What may be less intuitive is that these results also apply to marginal probabilities such as P(A2), and not just to the outcomes on the first draw. That is, if the outcome of the first draw is not known, then P(A2) also can be computed from the original sample space and is given by P(A2) = 4/52. This can be verified in this example because

A2 = (A2 ∩ A1) ∪ (A2 ∩ A1')

and

P(A2) = (4 · 3)/(52 · 51) + (48 · 4)/(52 · 51) = (51 · 4)/(52 · 51) = 4/52

Indeed, if the result of the first draw is not known, then the second draw could just as well be considered as the first draw.

The conditional probability that an ace is drawn on the second draw given that an ace was obtained on the first draw is

P(A2|A1) = P(A1 ∩ A2)/P(A1) = [(4 · 3)/(52 · 51)] / [(4 · 51)/(52 · 51)] = 3/51

That is, given that A1 is true, we are restricted to the first column of Table 1.2, and the relative proportion of the time that A2 is true on the reduced sample space is (4 · 3)/[(4 · 3) + (4 · 48)]. Again, it may be less obvious, but it is possible to carry this problem one step further and compute P(A2|A1) directly in terms of the 51-card conditional sample space, and obtain the much simpler solution P(A2|A1) = 3/51, there being three aces remaining among the 51 cards of the conditional sample space. Thus, it is common practice in this type of problem to compute the conditional probabilities and marginal probabilities directly from the one-dimensional sample spaces (one marginal and one conditional space), rather than obtain the joint probabilities from the joint sample space of ordered
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
22 ​CHAPTER 1 ​PROBABILITY

pairs. For example,

P(A1 n A2)=P(A1)P(A2 JA1)

43
52 ​51

This procedure would extend to three or more draws (without replacement) where, for example, if A3 denotes obtaining "an ace on the third draw," then

P(A1 ∩ A2 ∩ A3) = P(A1)P(A2 | A1)P(A3 | A1 ∩ A2) = (4 · 3 · 2)/(52 · 51 · 50)
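To make the sequential computation concrete, here is a minimal Python sketch (ours, not part of the text) that multiplies the conditional probabilities for any number of draws and checks the three-draw case by brute-force enumeration; the function and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

def p_all_aces(draws, aces=4, deck_size=52):
    """Multiply P(A1)P(A2|A1)...: an ace on each of `draws` draws
    without replacement."""
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(aces - i, deck_size - i)
    return p

# Brute-force check over all ordered triples from a symbolic deck
# with 4 aces ('A') and 48 other cards ('x').
deck = ['A'] * 4 + ['x'] * 48
count = sum(1 for t in permutations(range(52), 3)
            if all(deck[i] == 'A' for i in t))
print(p_all_aces(3), Fraction(count, 52 * 51 * 50))  # both 1/5525
```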

An indication of the general validity of this approach for computing conditional probabilities is obtained by considering P(A2 | A1) in the example. Relative to the joint sample space of ordered pairs, 204 = 4 · 51, where 4 represents the number of ways the given event A1 can occur on the first draw and 51 is the total number of possible outcomes in the conditional sample space for the second draw; also, 12 = 4 · 3 represents the number of ways the given event A1 can occur times the number of ways a success, A2, can occur in the conditional sample space. Because the number of ways A1 can occur is a common multiplier in the numerator and denominator when counting ordered pairs, one may equivalently count directly in the one-dimensional conditional space associated with the second draw.

The computational advantage of this approach is obvious, because it allows the computation of the probability of an event in a complicated higher-dimensional product space as a product of probabilities, one marginal and the others conditional, of events in simpler one-dimensional sample spaces.
The above discussion is somewhat tedious, but it may provide insight into the ​physical meaning of
conditional probability and marginal probability, and also ​into the topic of
sampling without replacement, which will come up again in the ​following
sections.

TOTAL PROBABILITY AND BAYES' RULE


As noted in Example 1.5.2, it sometimes is useful to partition an event, say A, into the union of two or more mutually exclusive events. For example, if B and B' are events that pertain to the first draw from a deck, and if A is an event that pertains to the second draw, then it is worthwhile to consider the partition A = (A ∩ B) ∪ (A ∩ B') to compute P(A), because this separates A into two events that involve information about both draws. More generally, if B1, B2, ..., Bk

are mutually exclusive and exhaustive, in the sense that B1 ∪ B2 ∪ ⋯ ∪ Bk = S, then

A = (A ∩ B1) ∪ (A ∩ B2) ∪ ⋯ ∪ (A ∩ Bk)

This is useful in the following theorem.

Theorem 1.5.2  Total Probability  If B1, B2, ..., Bk is a collection of mutually exclusive and exhaustive events, then for any event A,

P(A) = \sum_{i=1}^{k} P(B_i)P(A | B_i)    (1.5.6)

Proof
The events A ∩ B1, A ∩ B2, ..., A ∩ Bk are mutually exclusive, so it follows that

P(A) = \sum_{i=1}^{k} P(A ∩ B_i)    (1.5.7)

and the theorem results from applying Theorem 1.5.1 to each term in this summation.

Theorem 1.5.2 sometimes is known as the Law of Total Probability, because it corresponds to mutually exclusive ways in which A can occur relative to a partition of the total sample space S.
Sometimes it is helpful to illustrate this result with a tree diagram. One such ​diagram for the case of
three events B1, B2, and B3 is given in Figure 1.3.

FIGURE 1.3 ​Tree diagram showing the Law of Total Probability



The probability associated with branch Bj is P(Bj), and the probability associated with each branch labeled A is a conditional probability P(A | Bj), which may be different depending on which branch, Bj, it follows. For A to occur, it must occur jointly with one and only one of the events Bj. Thus, one and only one of A ∩ B1, A ∩ B2, or A ∩ B3 must occur, and the probability of A is the sum of the probabilities of these joint events, P(Bj)P(A | Bj).
Example 1.5.3  Factory 1 in Example 1.5.1 has two shifts, and the microchips from factory 1 can be categorized according to which shift produced them. As before, the experiment consists of choosing a microchip at random from the box and testing to see whether it is defective. Let B1 be the event "produced by shift 1" (factory 1), B2 the event "produced by shift 2" (factory 1), and B3 the event "produced by factory 2." As before, let A be the event of obtaining a defective microchip. The categories are given by Table 1.3.

TABLE 1.3  Numbers of defective and non-defective microchips from a common lot

            B1    B2    B3    Totals
A            5    10     5     20
A'          20    25    35     80
Totals      25    35    40    100

Various probabilities can be computed directly from the table. For example, P(B1) = 25/100, P(B2) = 35/100, P(B3) = 40/100, P(A | B1) = 5/25, P(A | B2) = 10/35, and P(A | B3) = 5/40. It is possible to compute P(A) either directly from the table, P(A) = 20/100 = 0.20, or by using the Law of Total Probability:

P(A) = P(B1)P(A | B1) + P(B2)P(A | B2) + P(B3)P(A | B3)
     = (25/100)(5/25) + (35/100)(10/35) + (40/100)(5/40)
     = 0.05 + 0.10 + 0.05 = 0.20


This problem is illustrated by the tree diagram in Figure 1.4.
FIGURE 1.4  Tree diagram for selection of microchips from combined lot
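The tree computation is easy to mirror in code. Below is a small Python sketch (ours, not from the text) of the Law of Total Probability applied to Example 1.5.3; the variable names are illustrative.

```python
from fractions import Fraction

# P(Bi): shift 1, shift 2, factory 2; P(A|Bi): defective rate per source
p_B = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]
p_A_given_B = [Fraction(5, 25), Fraction(10, 35), Fraction(5, 40)]

# Law of Total Probability: P(A) = sum over i of P(Bi) * P(A|Bi)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)  # 1/5, i.e., 0.20, matching the direct count 20/100
```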
Example 1.5.4  Consider the following variation on Example 1.5.3. The microchips are sorted into three separate boxes. Box 1 contains the 25 microchips from shift 1, box 2 contains the 35 microchips from shift 2, and box 3 contains the remaining 40 microchips from factory 2. The new experiment consists of choosing a box at random and then selecting a microchip from the box. This experiment is illustrated in Figure 1.5.

FIGURE 1.5  Selection of microchips from three different sources (box 1: 5 defective, 20 good; box 2: 10 defective, 25 good; box 3: 5 defective, 35 good)

In this case, it is not possible to compute P(A) directly from Table 1.3, but it still is possible to use equation (1.5.6) by redefining the events B1, B2, and B3 to be respectively choosing "box 1," "box 2," and "box 3." Thus, the new assignment of probability to B1, B2, and B3 is P(B1) = P(B2) = P(B3) = 1/3, and

P(A) = (1/3)(5/25) + (1/3)(10/35) + (1/3)(5/40) = 57/280

As a result of this new experiment, suppose that the component obtained is defective, but it is not known which box it came from. It is possible to compute the probability that it came from a particular box given that it was defective, although a special formula is required.

Theorem 1.5.3  Bayes' Rule  If we assume the conditions of Theorem 1.5.2, then for each j = 1, 2, ..., k,

P(B_j | A) = P(B_j)P(A | B_j) / \sum_{i=1}^{k} P(B_i)P(A | B_i)    (1.5.8)

Proof
From Definition 1.5.1 and the Multiplication Theorem (1.5.5) we have

P(B_j | A) = P(A ∩ B_j)/P(A) = P(B_j)P(A | B_j)/P(A)

The theorem follows by replacing the denominator with the right side of (1.5.6).

For the data of Example 1.5.4, the conditional probability that the microchip came from box 1, given that it is defective, is

P(B1 | A) = (1/3)(5/25) / [(1/3)(5/25) + (1/3)(10/35) + (1/3)(5/40)] = 56/171 ≈ 0.327

Similarly, P(B2 | A) = 80/171 ≈ 0.468 and P(B3 | A) = 35/171 ≈ 0.205. Notice that these differ from the unconditional probabilities, P(Bj) = 1/3 ≈ 0.333. This reflects the different proportions of defective items in the boxes. In other words, because box 2 has a higher proportion of defectives, choosing a defective item effectively increases the likelihood that it was chosen from box 2.
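Bayes' Rule is just as direct to code. The sketch below (ours, not from the text; names are illustrative) computes the posterior probabilities P(Bj | A) for the three boxes.

```python
from fractions import Fraction

p_B = [Fraction(1, 3)] * 3                      # a box chosen at random
p_A_given_B = [Fraction(5, 25), Fraction(10, 35), Fraction(5, 40)]

p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))  # 57/280

# Bayes' Rule: P(Bj|A) = P(Bj)P(A|Bj) / sum over i of P(Bi)P(A|Bi)
posterior = [pb * pa / p_A for pb, pa in zip(p_B, p_A_given_B)]
print(posterior)  # [56/171, 80/171, 35/171]
```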
For another illustration, consider the following example.

Example 1.5.5  A man starts at the point O on the map shown in Figure 1.6. He first chooses a path at random and follows it to point B1, B2, or B3. From that point, he chooses a new path at random and follows it to one of the points Ai, i = 1, 2, ..., 7.

FIGURE 1.6  Map of possible paths (terminal points A1, A2, ..., A7)

It might be of interest to know the probability that the man arrives at point A4. This can be computed from the Law of Total Probability:

P(A4) = P(B1)P(A4 | B1) + P(B2)P(A4 | B2) + P(B3)P(A4 | B3)
      = (1/3)(1/4) + (1/3)(1/2) + (1/3)(0) = 1/4
Suppose the man arrives at point A4, but it is not known which route he took. The probability that he passed through a particular point, B1, B2, or B3, can be computed from Bayes' Rule. For example,

P(B1 | A4) = (1/3)(1/4) / [(1/3)(1/4) + (1/3)(1/2) + (1/3)(0)] = 1/3

which agrees with the unconditional probability, P(B1) = 1/3. This is an example of a very special situation called "independence," which we will pursue in the next section. However, this does not occur in every case. For example, an application of Bayes' Rule also leads to P(B2 | A4) = 2/3, which does not agree with P(B2) = 1/3. Thus, if he arrived at point A4, it is twice as likely that he passed through point B2 as it is that he passed through B1. Of course, the most striking result concerns point B3, because P(B3 | A4) = 0, while P(B3) = 1/3. This reflects the obvious fact that he cannot arrive at point A4 by passing through point B3. The practical value of conditioning is obvious when considering some action such as betting on whether the man passed through point B3.
INDEPENDENT EVENTS

In some situations, knowledge that an event A has occurred will not affect the probability that an event B will occur. In other words, P(B | A) = P(B). We saw this happen in Example 1.5.5, because the probability of passing through point B1 was 1/3 whether or not the knowledge that the man arrived at point A4 was taken into account. As a result of the Multiplication Theorem (1.5.5), an equivalent formulation of this situation is P(A ∩ B) = P(A)P(B | A) = P(A)P(B). In general, when this happens the two events are said to be independent or stochastically independent.

Definition 1.5.2
Two events A and B are called independent events if

P(A ∩ B) = P(A)P(B)    (1.5.9)

Otherwise, A and B are called dependent events.

As already noted, an equivalent formulation can be given in terms of conditional probability.
Theorem 1.5.4  If A and B are events such that P(A) > 0 and P(B) > 0, then A and B are independent if and only if either of the following holds:

P(A | B) = P(A)
P(B | A) = P(B)
We saw examples of both independent and dependent events in Example 1.5.5. There was also an example of mutually exclusive events, because P(B3 | A4) = 0, which implies P(B3 ∩ A4) = 0. There is often confusion between the concepts of independent events and mutually exclusive events. Actually, these are quite different notions, and perhaps this is seen best by comparisons involving conditional probabilities. Specifically, if A and B are mutually exclusive, then P(A | B) = P(B | A) = 0, whereas for independent nonnull events the conditional probabilities are nonzero as noted by Theorem 1.5.4. In other words, the property of being mutually exclusive involves a very strong form of dependence, because, for nonnull events, the occurrence of one event precludes the occurrence of the other event.
There are many applications in which events are assumed to be independent.
Example 1.5.6  A "system" consists of several components that are hooked up in some particular configuration. It is often assumed in applications that the failure of one component does not affect the likelihood that another component will fail. Thus, the failure of one component is assumed to be independent of the failure of another component.

A series system of two components, C1 and C2, is illustrated by Figure 1.7. It is easy to think of such a system in terms of two electrical components (for example, batteries in a flashlight) where current must pass through both components for the system to function. If A1 is the event "C1 fails" and A2 is the event "C2 fails," then the event "the system fails" is A1 ∪ A2. Suppose that P(A1) = 0.1 and P(A2) = 0.2. If we assume that A1 and A2 are independent, then the probability that the system fails is

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
           = P(A1) + P(A2) − P(A1)P(A2)
           = 0.1 + 0.2 − (0.1)(0.2) = 0.28

The probability that the system works properly is 1 − 0.28 = 0.72.
FIGURE 1.7 ​Series system of two components
Notice that the assumption of independence permits us to factor the probability of the joint event, P(A1 ∩ A2), into the product of the marginal probabilities, P(A1)P(A2).

Another common example involves the notion of a parallel system, as illustrated in Figure 1.8. For a parallel system to fail, it is necessary that both components fail, so the event "the system fails" is A1 ∩ A2. The probability that the system fails is P(A1 ∩ A2) = P(A1)P(A2) = (0.1)(0.2) = 0.02, again assuming the components fail independently.

Note that the probability of failure for a series system is greater than the probability of failure of either component, whereas for a parallel system it is less. This is because both components must function for a series system to function, and consequently the system is more likely to fail than an individual component. On the other hand, a parallel system is a redundant system: One component can fail, but the system will continue to function provided the other component functions. Such redundancy is common in aerospace systems, where the failure of the system may be catastrophic.
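A short Python sketch (our illustration, with made-up helper names) captures both formulas for independent component failures.

```python
def series_fail(p1, p2):
    """P(series system fails) = P(A1 or A2) for independent failures."""
    return p1 + p2 - p1 * p2

def parallel_fail(p1, p2):
    """P(parallel system fails) = P(A1 and A2) for independent failures."""
    return p1 * p2

print(series_fail(0.1, 0.2))    # 0.28, so reliability 0.72
print(parallel_fail(0.1, 0.2))  # 0.02, so reliability 0.98
```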
A common example of dependent events occurs in connection with repeated sampling without replacement from a finite collection. In Example 1.5.2 we considered the results of drawing two cards in succession from a deck. It turns out that the events A1 (ace on the first draw) and A2 (ace on the second draw) are dependent, because P(A2) = 4/52, while P(A2 | A1) = 3/51.

Suppose instead that the outcome of the first card is recorded, and then the card is replaced in the deck and the deck is shuffled before the second draw is made. This type of sampling is referred to as sampling with replacement, and it would be reasonable to assume that the draws are independent trials. In this case P(A1 ∩ A2) = P(A1)P(A2). There are many other problems in which it is reasonable to assume that repeated trials of an experiment are independent, such as tossing a coin or rolling a die repeatedly.
It is possible to show that independence of two events also implies the indepen- ​dence of some related
events.

FIGURE 1.8 ​Parallel system of two components



Theorem 1.5.5  Two events A and B are independent if and only if the following pairs of events are also independent:
1. A and B'.
2. A' and B.
3. A' and B'.

Proof
See Exercise 38.

It is also possible to extend the notion of independence to more than two ​events.

Definition 1.5.3
The k events A1, A2, ..., Ak are said to be independent or mutually independent if for every j = 2, 3, ..., k and every subset of distinct indices i1, i2, ..., ij,

P(A_{i1} ∩ A_{i2} ∩ ⋯ ∩ A_{ij}) = P(A_{i1})P(A_{i2}) ⋯ P(A_{ij})    (1.5.10)

Suppose A, B, and C are three mutually independent events. According to the definition of mutually independent events, it is not sufficient simply to verify pairwise independence. It would be necessary to verify P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C), and also P(A ∩ B ∩ C) = P(A)P(B)P(C). The following examples show that pairwise independence does not imply this last three-way factorization and vice versa.

Example 1.5.7  A box contains eight tickets, each labeled with a binary number. Two are labeled 111, two are labeled 100, two 010, and two 001. An experiment consists of drawing one ticket at random from the box. Let A be the event "the first digit is 1," B the event "the second digit is 1," and C the event "the third digit is 1." This is illustrated by Figure 1.9. It follows that P(A) = P(B) = P(C) = 4/8 = 1/2 and that P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 2/8 = 1/4; thus A, B, and C are pairwise independent. However, they are not mutually independent, because

P(A ∩ B ∩ C) = 2/8 = 1/4 ≠ 1/8 = P(A)P(B)P(C)

FIGURE 1.9 ​Selection of numbered tickets

Example 1.5.8  In Figure 1.9, let us change the number on one ticket in the first column from 111 to 110, and the number on one ticket in the second column from 100 to 101. We still have P(A) = P(B) = P(C) = 1/2, but

P(B ∩ C) = 1/8 ≠ 1/4 = P(B)P(C)

and

P(A ∩ B ∩ C) = 1/8 = P(A)P(B)P(C)

In this case we have three-way factorization, but not independence of all pairs.
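A quick enumeration in Python (ours; not from the text) confirms both examples by checking every required factorization over the eight equally likely tickets.

```python
from fractions import Fraction
from itertools import combinations

def prob(tickets, pred):
    """Probability of an event over equally likely tickets."""
    return Fraction(sum(pred(t) for t in tickets), len(tickets))

def check(tickets):
    # A, B, C: the digit in position 0, 1, 2 equals '1'
    events = [lambda t, i=i: t[i] == '1' for i in range(3)]
    pair_ok = all(prob(tickets, lambda t: e1(t) and e2(t))
                  == prob(tickets, e1) * prob(tickets, e2)
                  for e1, e2 in combinations(events, 2))
    triple_ok = (prob(tickets, lambda t: all(e(t) for e in events))
                 == prob(tickets, events[0]) * prob(tickets, events[1])
                 * prob(tickets, events[2]))
    return pair_ok, triple_ok

ex7 = ['111', '111', '100', '100', '010', '010', '001', '001']
ex8 = ['111', '110', '100', '101', '010', '010', '001', '001']
print(check(ex7))  # (True, False): pairwise but not mutually independent
print(check(ex8))  # (False, True): three-way factorization, not all pairs
```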
1.6  COUNTING TECHNIQUES

In many experiments with finite sample spaces, such as games of chance, it may be reasonable to assume that all possible outcomes are equally likely. In that case a realistic probability model should result by following the classical approach and taking the probability of any event A to be P(A) = n(A)/N, where N is the total number of possible outcomes and n(A) is the number of these outcomes that correspond to occurrence of the event A. Counting the number of ways in which an event may occur can be a tedious problem in complicated experiments. A few helpful counting techniques will be discussed.
MULTIPLICATION PRINCIPLE

First note that if one operation can be performed in n1 ways and a second operation can be performed in n2 ways, then there are n1 · n2 ways in which both operations can be carried out.
Example 1.6.1  Suppose a coin is tossed and then a marble is selected at random from a box containing one black (B), one red (R), and one green (G) marble. The possible outcomes are HB, HR, HG, TB, TR, and TG. For each of the two possible outcomes of the coin there are three marbles that may be selected, for a total of 2 · 3 = 6 possible outcomes. The situation also is easily illustrated by a tree diagram, as in Figure 1.10.
FIGURE 1.10  Tree diagram of two-stage experiment
Another application of the multiplication principle was discussed in Example 1.5.2 in connection with counting the number of ordered two-card hands. Note that the multiplication principle can be extended to more than two operations. In particular, if the ith of r successive operations can be performed in ni ways, then the total number of ways to carry out all r operations is the product

n1 · n2 ⋯ nr    (1.6.1)
One standard type of counting problem is covered by the ​following theorem.
Theorem 1.6.1  If there are N possible outcomes of each of r trials of an experiment, then there are N^r possible outcomes in the sample space.

Example 1.6.2  How many ways can a 20-question true–false test be answered? The answer is 2^20.

Example 1.6.3  How many subsets are there from a set of m elements? In forming a subset, one must decide for each element whether to include that element in the subset. Thus for each of m elements there are two choices, which gives a total of 2^m possible subsets. This includes the null set, which corresponds to the case of not including any element in the subset.

As suggested earlier, the way an experiment is carried out or the method of sampling may affect the sample space and the probability assignment over the sample space. In particular, sampling items from a finite population with and without replacement are two common schemes. Sampling without replacement was illustrated in Example 1.5.2. Sampling with replacement is covered by Theorem 1.6.1.

Example 1.6.4  If five cards are drawn from a deck of 52 cards with replacement, then there are (52)^5 possible hands. If the five cards are drawn without replacement, then the more general multiplication principle may be applied to determine that there are 52 · 51 · 50 · 49 · 48 possible hands. In the first case, the same card may occur more than once in the same hand. In the second case, however, a card may not be repeated.

Note that in both cases in the above example, order is considered important. That is, two five-card hands may eventually end up with the same five cards, but they are counted as different hands in the example if the cards were obtained in a different order. For example, let all five cards be spades. The outcome (ace, king, queen, jack, ten) is different from the outcome (king, ace, queen, jack, ten). If order had not been considered important, both of these outcomes would be considered the same; indeed, there would be several different ordered outcomes corresponding to this same (unordered) outcome. On the other hand, only one outcome corresponds to all five cards being the ace of spades (in the sampling-with-replacement case), whether the cards are ordered or unordered.

This introduces the concept of distinguishable and indistinguishable elements. Even though order may be important, a new result or arrangement will not be obtained if two indistinguishable elements are interchanged. Thus, fewer ordered arrangements are possible if some of the items are indistinguishable. We also noted earlier that there are fewer distinct results if order is not taken into account, but the probability of any one of these unordered results occurring then would be greater. Note also that it is common practice to assume that order is not important when drawing without replacement, unless otherwise specified, although we did consider order important in Example 1.6.4.
PERMUTATIONS AND COMBINATIONS

Some particular formulas that are helpful in counting the number of possible arrangements for some of the cases mentioned will be given. An ordered arrangement of a set of objects is known as a permutation.
Theorem 1.6.2  The number of permutations of n distinguishable objects is n!.

Proof
This follows by applying the multiplication principle. To fill n positions with n distinct objects, the first position may be filled n ways using any one of the n objects, the second position may be filled n − 1 ways using any of the remaining n − 1 objects, and so on until the last object is placed in the last position. Thus, by the multiplication principle, this operation may be carried out in n(n − 1) ⋯ 1 = n! ways.
For example, the number of arrangements of five distinct cards is 5! = 120. ​One also may be interested in the
number of ways of selecting r objects from ​n ​distinct objects and then ordering these r objects.
Theorem 1.6.3  The number of permutations of n distinct objects taken r at a time is

nPr = n!/(n − r)!    (1.6.2)

Proof
To fill r positions from n objects, the first position may be filled in n ways using any one of the n objects, the second position may be filled in n − 1 ways, and so on until n − (r − 1) objects are left to fill the rth position. Thus, the total number of ways of carrying out this operation is

n(n − 1)(n − 2) ⋯ (n − (r − 1)) = n!/(n − r)!
Example 1.6.5  The number of permutations of the four letters a, b, c, d taken two at a time is 4!/2! = 12. These are displayed in Figure 1.11. In picking two out of the four letters, there are six unordered ways to choose two letters from the four, as given by the top row. Each combination of two letters then can be permuted 2! ways to get the total of 12 ordered arrangements.
FIGURE 1.11  Permutations of four objects taken two at a time

ab  ac  ad  bc  bd  cd
ba  ca  da  cb  db  dc
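The counts in Figure 1.11 are easy to reproduce with Python's standard library; this sketch (ours, not from the text) also checks the formulas nPr = n!/(n − r)! and the corresponding combination count.

```python
from itertools import permutations, combinations
from math import perm, comb

letters = 'abcd'
perms = [''.join(p) for p in permutations(letters, 2)]
combos = [''.join(c) for c in combinations(letters, 2)]

print(len(perms), perm(4, 2))   # 12 12, matching 4!/2!
print(len(combos), comb(4, 2))  # 6 6, matching 4!/(2!2!)
```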

Example 1.6.6  A box contains n tickets, each marked with a different integer, 1, 2, 3, ..., n. If three tickets are selected at random without replacement, what is the probability of obtaining tickets with consecutive integers? One possible solution would be to let the sample space consist of all ordered triples (i, j, k), where i, j, and k are different integers in the range 1 to n. The number of such triples is nP3 = n!/(n − 3)! = n(n − 1)(n − 2). The triples of consecutive integers would be (1, 2, 3), (2, 3, 4), ..., (n − 2, n − 1, n), or any of the triples formed by permuting the entries in these n − 2 triples. There would be 3!(n − 2) = 6(n − 2) such triples. The desired probability is

6(n − 2)/[n(n − 1)(n − 2)] = 6/[n(n − 1)]
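For a concrete check, the following brute-force enumeration (our sketch, with a small illustrative n) verifies 6/[n(n − 1)].

```python
from fractions import Fraction
from itertools import permutations

n = 10
triples = list(permutations(range(1, n + 1), 3))  # n(n-1)(n-2) ordered triples
hits = sum(1 for t in triples
           if sorted(t) == list(range(min(t), min(t) + 3)))
print(Fraction(hits, len(triples)), Fraction(6, n * (n - 1)))  # both 1/15
```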
If the order of the objects is not important, then one may simply be interested in the number of combinations that are possible when selecting r objects from n distinct objects. The symbol \binom{n}{r} usually is used to denote this number.

Theorem 1.6.4  The number of combinations of n distinct objects chosen r at a time is

\binom{n}{r} = n!/(r!(n − r)!)    (1.6.3)

Proof
As suggested in the preceding example, nPr may be interpreted as the number of ways of choosing r objects from n objects and then permuting the r objects r! ways, giving

nPr = \binom{n}{r} · r! = n!/(n − r)!

Dividing by r! gives the desired expression for \binom{n}{r}.
Thus, the number of combinations of four letters taken two at a time is

\binom{4}{2} = 4!/(2!2!) = 6

as noted above. If order is considered, then the number of arrangements becomes 6 · 2! = 12 as before. Thus, \binom{4}{2} counts the number of paired symbols in either the first or second row, but not both, in Figure 1.11.

It also is possible to solve the probability problem in Example 1.6.6 using combinations. The sample space would consist of all combinations of the n integers 1, 2, ..., n taken three at a time. Equivalently, this would be the collection of all subsets of size 3 from the set {1, 2, 3, ..., n}, of which there are

\binom{n}{3} = n!/(3!(n − 3)!) = n(n − 1)(n − 2)/6

The n − 2 combinations or subsets of consecutive integers would be {1, 2, 3}, {2, 3, 4}, ..., {n − 2, n − 1, n}. As usual, no distinction should be made of subsets that list the elements in a different order. The resulting probability is

(n − 2)/[n(n − 1)(n − 2)/6] = 6/[n(n − 1)]
as before. This shows that some problems can be solved using either combinations or permutations. Usually, if there is a choice, the combination approach is simpler because the sample space is smaller. However, combinations are not appropriate in some problems.

Example 1.6.7  In Example 1.6.6, suppose that the sampling is done with replacement. Now, the same number can be repeated in the triples (i, j, k), so that the sample space has n^3 outcomes. There are still only 6(n − 2) triples of consecutive integers, because repeated integers cannot be consecutive. The probability of consecutive integers in this case is 6(n − 2)/n^3. Integers can be repeated in this case, so the combination approach is not appropriate.

Example 1.6.8  A familiar use of the combination notation is in expressing the binomial expansion

(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k}    (1.6.4)

In this case, \binom{n}{k} is the coefficient of a^k b^{n−k}, and it represents the number of ways of choosing k of the n factors (a + b) from which to use the a term, with the b term being used from the remaining n − k factors.

Example 1.6.9  The combination concept can be used to determine the number of subsets of a set of m elements. There are \binom{m}{j} ways of choosing j elements from the m elements, so there are \binom{m}{j} subsets of j elements for j = 0, 1, ..., m. The case j = 0 corresponds to the null set and is represented by \binom{m}{0} = 1, because 0! is defined to be equal to 1, for notational convenience. Thus the total number of subsets including the null set is given by

\sum_{j=0}^{m} \binom{m}{j} = (1 + 1)^m = 2^m
Example 1.6.10  If five cards are drawn from a deck of cards without replacement, the number of five-card hands is

\binom{52}{5} = 52!/(5!47!)

If order is taken into account as in Example 1.6.4, then the number of ordered five-card hands is

52P5 = \binom{52}{5} · 5! = 52!/47!

Similarly, in Example 1.5.2 the number of ordered two-card hands was given to be

\binom{52}{2} · 2! = 52 · 51
INDISTINGUISHABLE OBJECTS

The discussion to this point has dealt with arrangements of n distinguishable objects. There are also many applications involving objects that are not all distinguishable.

Example 1.6.11  You have five marbles, two black and three white, but otherwise indistinguishable. In Figure 1.12, we represent all the distinguishable arrangements of two black (B) and three white (W) marbles.

FIGURE 1.12  Distinguishable arrangements of five objects, two of one type and three of another

BBWWW  WWBBW  BWBWW  WWBWB  WWWBB
WBBWW  BWWBW  WBWWB  BWWWB  WBWBW

Notice that arrangements are distinguishable if they differ by exchanging marbles of different colors, but not if the exchange involves the same color. We will refer to these 10 different arrangements as permutations of the five objects even though the objects are not all distinguishable.

A more general way to count such permutations first would be to introduce labels for the objects, say B1, B2, W1, W2, W3. There are 5! permutations of these distinguishable objects, but within each color there are permutations that we don't want to count. We can compensate by dividing by the number of permutations of black objects (2!) and of white objects (3!). Thus, the number of permutations of nondistinguishable objects is

5!/(2!3!) = 10    (1.6.5)
This is a special case of the following theorem.

Theorem 1.6.5  The number of distinguishable permutations of n objects of which r are of one kind and n − r are of another kind is

\binom{n}{r} = n!/(r!(n − r)!)    (1.6.6)

Clearly, this concept can be generalized to the case of permuting k types of ​objects.

Theorem 1.6.6  The number of permutations of n objects of which r1 are of one kind, r2 of a second kind, ..., rk of a kth kind is

n!/(r1! r2! ⋯ rk!)    (1.6.7)

Proof
This follows from the argument of Example 1.6.11, except with k different colors of balls.

Example 1.6.12  You have 10 marbles: two black, three white, and five red, but otherwise not distinguishable. The number of different permutations is

10!/(2!3!5!) = 2520

The notion of permutations of n objects, not all of which are distinguishable, is ​related to yet another
type of operation with n distinct objects.

PARTITIONING

Let us select r objects from n distinct objects and place them in a box or "cell," and then place the remaining n − r objects in a second cell. Clearly, there are \binom{n}{r} ways of doing this (because permuting the objects within a cell will not produce a new result), and this is referred to as the number of ways of partitioning n objects into two cells with r objects in one cell and n − r in the other. The concept generalizes readily to partitioning n distinct objects into more than two cells.
Theorem 1.6.7  The number of ways of partitioning a set of n objects into k cells with r1 objects in the first cell, r2 in the second cell, and so forth is

n!/(r1! r2! ⋯ rk!)

where \sum_{i=1}^{k} r_i = n.

Note that partitioning assumes that the number of objects to be placed in each cell is fixed, and that the order in which the objects are placed into cells is not considered. By successively selecting the objects, the number of partitions also may be expressed as

\binom{n}{r_1} \binom{n − r_1}{r_2} ⋯ \binom{n − r_1 − ⋯ − r_{k−1}}{r_k} = n!/(r1! r2! ⋯ rk!)
Example 1.6.13  How many ways can you distribute 12 different popsicles equally among four children? By Theorem 1.6.7 this is

12!/(3!3!3!3!) = 369,600

This is also the number of ways of arranging 12 popsicles, of which three are red, three are green, three are orange, and three are yellow, if popsicles of the same color are otherwise indistinguishable.
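A two-line Python check of this multinomial count (our sketch, not from the text):

```python
from math import factorial, prod

counts = [3, 3, 3, 3]  # popsicles per child (or per color)
ways = factorial(sum(counts)) // prod(factorial(c) for c in counts)
print(ways)  # 369600
```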
PROBABILITY COMPUTATIONS
As mentioned earlier, if it can be assumed that all possible outcomes are equally ​likely to occur, then the
classical probability concept is useful for assigning prob- ​abilities to events, and the counting techniques
reviewed in this section may be ​helpful in computing the number of ways an event may occur.

Recall that the method of sampling, and assumptions concerning order,

whether the items are indistinguishable, and so on, may have an effect on the
number of possible outcomes.

Example 1.6.14  A student answers 20 true–false questions at random. The probability of getting 100% on the test is P(100%) = 1/2^20 ≈ 0.00000095. We wish to know the probability of getting 80% right, that is, answering 16 questions correctly. We do not care which 16 questions are answered correctly, so there are \binom{20}{16} ways of choosing exactly 16 correct answers, and P(80%) = \binom{20}{16}/2^20 ≈ 0.0046.
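These two numbers are quick to confirm in Python (our sketch):

```python
from math import comb

print(1 / 2**20)             # P(100%), about 9.5e-07
print(comb(20, 16) / 2**20)  # P(80%), about 0.0046
```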

Example 1.6.15  Sampling Without Replacement  A box contains 10 black marbles and 20 white marbles, and five marbles are selected without replacement. The probability of getting exactly two black marbles is

P(exactly 2 black) = \binom{10}{2} \binom{20}{3} / \binom{30}{5} ≈ 0.360    (1.6.8)

There are \binom{30}{5} total possible outcomes. Also, there are \binom{10}{2} ways of choosing the two black marbles from the 10 black marbles, and \binom{20}{3} ways of choosing the remaining three white marbles from the 20 white marbles. By the multiplication principle, there are \binom{10}{2} \binom{20}{3} ways of achieving the event of getting two black marbles. Note that order was not considered important in this problem, although all 30 marbles are considered distinct in this computation, both in considering the total number of outcomes in the sample space and in considering how many outcomes correspond to the desired event occurring. Even though the question does not distinguish between the order of outcomes, it is possible to consider the question relative to the larger sample space of equally likely ordered outcomes. In that case one would have 30P5 = \binom{30}{5} · 5! possible outcomes and

P(exactly 2 black) = \binom{10}{2} \binom{20}{3} · 5! / [\binom{30}{5} · 5!]    (1.6.9)

which gives the same answer as before.



It also is possible to attack this problem by the conditional probability approach discussed in Section 1.5. First consider the probability of getting the outcome BBWWW in the specified order. Here we choose to use the distinction between B and W, but not the distinction within the B's or within the W's. By the conditional probability approach, this joint probability may be expressed as

P(BBWWW) = (10 · 9 · 20 · 19 · 18)/(30 · 29 · 28 · 27 · 26)

Similarly,

P(BWBWW) = (10 · 20 · 9 · 19 · 18)/(30 · 29 · 28 · 27 · 26)

and so on. Thus, each particular ordering has the same probability. If we do not wish to distinguish between the ordering of the black and white marbles, then

P(exactly 2 black) = \binom{5}{2} (10 · 9 · 20 · 19 · 18)/(30 · 29 · 28 · 27 · 26)    (1.6.10)

which again is the same as equation (1.6.8). That is, there are \binom{5}{2} = 10 different

particular orderings that have two black and three white marbles (see Figure 1.12).

One could consider \binom{5}{2} as the number of ways of choosing two positions out of the five positions in which to place two black marbles. If a particular order is not required, the probability of a successful outcome is greater. We could continue to consider all 30 marbles distinct in this framework, but because only the order between black and white was considered in computing a particular sequence, it follows that there are only \binom{5}{2} unordered sequences rather than 5! sequences. Thus, although two black marbles may be distinct, permuting them does not produce a different result. The order of the black marbles within themselves was not considered important when defining the ordered sequences; only the order between black and white was considered. Thus the coefficient \binom{5}{2} could also be interpreted as the number of permutations of five things of which two were alike and three were alike (see Figure 1.12).
Thus, we have seen that it is possible to think of the black and white marbles ​as being
indistinguishable within themselves in this problem, and the same value ​for
P(exactly 2 black) is obtained; however, the computation is no longer carried
out over an original basic sample space of equally likely outcomes. For example,
on the first draw one would just have the two possible outcomes, B and W,
although these two outcomes obviously would not be equally likely, but rather

P(B) = 10/30 and P(W) = 20/30. Indeed, the assumption that the black marbles
and white marbles are indistinguishable within themselves appears more natural ​in the conditional
probability approach. Nevertheless, the distinctness assump- ​tion is a convenient aid in the first approach

to obtain the more basic equally ​likely sample space, even though the question itself does not

require dis- tinguishing


​ within a color.
Example 1.6.16  Sampling with Replacement  If the five marbles are drawn with replacement in Example 1.6.15, then the conditional probability approach seems most natural, and analogous to (1.6.10),

P(exactly 2 black) = \binom{5}{2} (10/30)^2 (20/30)^3    (1.6.11)

Of course, in this case the outcomes on each draw are independent. If one chooses to use the classical approach in this case, it is more convenient to consider the sample space of 30^5 equally likely ordered outcomes; in Example 1.6.15 it is more convenient to consider the sample space of unordered outcomes as in equation (1.6.8), rather than the ordered outcomes as in equation (1.6.9). For the event A, "exactly 2 black," one then has

P(A) = \binom{5}{2} 10^2 20^3 / 30^5

The form in this case remains quite similar to equation (1.6.11), although the argument would be somewhat different. There are \binom{5}{2} different patterns in which the ordered arrangements may contain two black and three white marbles, and for each pattern there are 10^2 · 20^3 distinct arrangements that can be formed in this sample space.
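Both sampling schemes are easy to check numerically. The Python sketch below (ours, not from the text) compares the combination count of (1.6.8) with the ordered-sequence product of (1.6.10), and then evaluates the with-replacement probability (1.6.11).

```python
from fractions import Fraction
from math import comb

# Without replacement: equation (1.6.8) vs the ordered approach (1.6.10)
p_hyper = Fraction(comb(10, 2) * comb(20, 3), comb(30, 5))
p_ordered = comb(5, 2) * Fraction(10 * 9 * 20 * 19 * 18,
                                  30 * 29 * 28 * 27 * 26)
print(p_hyper == p_ordered, float(p_hyper))  # True, about 0.360

# With replacement: equation (1.6.11), a binomial-type probability
p_binom = comb(5, 2) * Fraction(10, 30)**2 * Fraction(20, 30)**3
print(float(p_binom))                        # about 0.329
```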

Because many diverse types of probability problems can be stated, a unique approach often may be needed to identify the mutually exclusive ways that an event can occur in such a manner that these ways can be readily counted. However, certain classical problems (such as those illustrated in Examples 1.6.15 and 1.6.16) can be recognized easily, and general probability distribution functions can be determined for them. For these problems, the individual counting problems need not be analyzed so carefully each time.
SUMMARY
The purpose of this chapter was to develop the concept of probability in order to ​model phenomena where

the observed result is uncertain before experimentation. ​The basic approach involves defining the sample

space as the set of all possible


PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
EXERCISES ​
43

outcomes of the experiment, and defining an event mathematically as the set of outcomes associated with occurrence of the event. The primary motivation for assigning probability to an event involves the long-term relative frequency interpretation. However, the approach of defining probability in terms of a simple set of axioms is more general, and it allows the possibility of other methods of assignment and other interpretations of probability. This approach also makes it possible to derive general properties of probability.

The notion of conditional probability allows the introduction of additional information concerning the occurrence of one event when assigning probability to another. If the probability assigned to one event is not affected by the information that another event has occurred, then the events are considered independent. Care should be taken not to confuse the concepts of independent and mutually exclusive events. Specifically, mutually exclusive events are dependent, because the occurrence of one precludes the occurrence of the other. In other words, the conditional probability of one given the other is zero.
One of the primary methods of assigning probability, which applies in the case of
a finite sample space, is based on the assumption that all outcomes are equally
likely to occur. To implement this method, it is useful to have techniques for
counting the number of outcomes in an event. The primary techniques include

formulas for counting ordered arrangements of objects (permutations) and

unordered sets of objects (combinations). To


​ express probability models by
general formulas, it is convenient first to ​introduce the concept of a "random
variable" and a function that describes the ​probability distribution. These
concepts will be discussed in the next chapter, and ​general solutions then can be
provided for some of the basic counting problems ​most often encountered.

EXERCISES

1. A gum-ball machine gives out a red, a black, or a green gum ball.


(a) Describe an appropriate sample space.
(b) List all possible events.
(c) If R is the event "red," then list the outcomes in R'.
(d) If G is the event "green," then what is R ∩ G?

2. Two gum balls are obtained from the machine in Exercise 1 from two trials. The order of the outcomes is important. Assume that at least two balls of each color are in the machine. What is an appropriate sample space? How many total possible events are there that contain eight outcomes? Express the following events as unions of elementary events: C1 = getting a red ball on the first trial, C2 = getting at least one red ball, C1 ∩ C2, C1' ∩ C2.

3. There are four basic blood groups: O, A, B, and AB. Ordinarily, anyone can receive the blood of a donor from their own group. Also, anyone can receive the blood of a donor from the O group, and any of the four types can be used by a recipient from the AB group.

All other possibilities are undesirable. An experiment consists of drawing a pint of blood
and determining its type for each of the next two donors who enter a blood bank.
(a) List the possible (ordered) outcomes of this experiment.
(b) List the outcomes corresponding to the event that the second donor can receive the blood of the first donor.
(c) List the outcomes corresponding to the event that each donor can receive the blood of the other.

4. ​An experiment consists of drawing gum balls from a gum-ball machine until a red ball is
obtained. Describe a sample space for this experiment.

5. ​The number of alpha particles emitted by a radioactive sample in a fixed time interval is
counted.​Give a sample space for this

experiment.
The elapsed time is measured until the first alpha particle is emitted. Give a sample ​space for this experiment.

6. ​An experiment is conducted to determine what fraction of a piece of metal is gold. Give a
sample space for this experiment.

7. A randomly selected car battery is tested and the time of failure is recorded. Give an
appropriate sample space for this experiment.

8. ​We obtain 100 gum balls from a machine, and we get 20 red (R), 30 black (B), and 50 green
(G) gum balls.
(a) Can we use, as a probability model for the color of a gum ball from the machine, one given by p1 = P(R) = 0.2, p2 = P(B) = 0.3, and p3 = P(G) = 0.5?
(b) Suppose we later notice that some yellow (Y) gum balls are also in the machine. Could we use as a model p1 = 0.2, p2 = 0.3, p3 = 0.5, and p4 = P(Y) = 0.1?

9. ​In Exercise 2, suppose that each of the nine possible outcomes in the sample space is
equally likely to occur. Compute each of the following:
(a) P(both red).
(b) P(C1).
(c) P(C2).
(d) P(C1 ∩ C2).
(e) P(C1' ∩ C2).
(f) P(C1 ∪ C2).

10. Consider Exercise 3. Suppose, for a particular racial group, the four blood types are equally likely to occur.
(a) Compute the probability that the second donor can receive blood from the first donor.
(b) Compute the probability that each donor can receive blood from the other.
(c) Compute the probability that neither can receive blood from the other.

11. Prove that P(∅) = 0. Hint: Let Ai = ∅ for all i in equation (1.3.3).

12. Prove equation (1.3.5). Hint: Let Ai = ∅ for all i > k in equation (1.3.3).

13. When an experiment is performed, one and only one of the events A1, A2, or A3 will occur. Find P(A1), P(A2), and P(A3) under each of the following assumptions:
(a) P(A1) = P(A2) = P(A3).
(b) P(A1) = P(A2) and P(A3) = 1/2.
(c) P(A1) = 2P(A2) = 3P(A3).

14. A balanced coin is tossed four times. List the possible outcomes and compute the probability of each of the following events:
(a) exactly three heads.
(b) at least one head.
(c) the number of heads equals the number of tails.
(d) the number of heads exceeds the number of tails.

15. Two part-time teachers are hired by the mathematics department and each is assigned at
random to teach a single course, in trigonometry, algebra, or calculus. List the outcomes ​in the
sample space and find the probability that they will teach different courses. Assume that more
than one section of each course is offered.

16. Prove Theorem 1.4.4. Hint: Write A ∪ B ∪ C = (A ∪ B) ∪ C and apply Theorem 1.4.3.

17. Prove Theorem 1.4.5. Hint: If A ⊂ B, then we can write B = A ∪ (B ∩ A'), a disjoint union.

18. If A and B are events, show that:
(a) P(A ∩ B') = P(A) − P(A ∩ B).
(b) P(A ∪ B) = 1 − P(A' ∩ B').

19. Let P(A) = P(B) = 1/3 and P(A ∩ B) = 1/10. Find the following:
(a) P(B').
(b) P(A ∪ B').
(c) P(B ∩ A').
(d) P(A' ∪ B').

20. Let P(A) = 1/2, P(B) = 1/8, and P(C) = 1/4, where A, B, and C are mutually exclusive. Find the following:
(a) P(A ∪ B ∪ C).
(b) P(A' ∩ B' ∩ C').

21. The event that exactly one of the events A or B occurs can be represented as (A ∩ B') ∪ (A' ∩ B). Show that

P[(A ∩ B') ∪ (A' ∩ B)] = P(A) + P(B) − 2P(A ∩ B)
22. A track star runs two races on a certain day. The probability that he wins the first race is
0.7, the probability that he wins the second race is 0.6, and the probability that he wins

both races is 0.5. Find the probability that:


(a) he wins at least one
race.
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
1 ​PROBABILITY
46 CHAPTER

he wins exactly one race. ​(e) he wins neither race.

23. A certain family owns two television sets, one color and one black-and-white set. Let A be the event "the color set is on" and B the event "the black-and-white set is on." If P(A) = 0.4, P(B) = 0.3, and P(A ∪ B) = 0.5, find the probability of each event:
(a) both are on.
(b) the color set is on and the other is off.
(c) exactly one set is on.
(d) neither set is on.

24. Suppose P(Ai) = 1/(3 + i) for i = 1, 2, 3, 4. Find an upper bound for

P(A1 ∪ A2 ∪ A3 ∪ A4).

25. A box contains three good cards and two bad (penalty) cards. Player A chooses a card and then player B chooses a card. Compute the following probabilities:
(a) P(A good).
(b) P(B good | A good).
(c) P(B good | A bad).
(d) P(B good ∩ A good) using (1.5.5).
(e) Write out the sample space of ordered pairs and compute P(B good ∩ A good) and P(B good | A good) directly from definitions. (Note: Assume that the cards are distinct.)
(f) P(B good).
(g) P(A good | B good).

26. ​Repeat Exercise 25, but assume that player A looks at his card, replaces it in the box, and
remixes the cards before player B draws.

27. A bag contains five blue balls and three red balls. A boy draws a ball, and then draws another without replacement. Compute the following probabilities:
(a) P(2 blue balls).
(b) P(1 blue and 1 red).
(c) P(at least 1 blue).
(d) P(2 red balls).

28. In Exercise 27, suppose a third ball is drawn without replacement. Find:
(a) P(no red balls left after third draw).
(b) P(1 red ball left).
(c) P(first red ball on last draw).
(d) P(a red ball on last draw).

29. A family has two children. It is known that at least one is a boy. What is the probability
that the family has two boys, given at least one boy? Assume P(boy) = 1/2.
30. Two cards are drawn from a deck of cards without replacement.
(a) What is the probability that the second card is a heart, given that the first card is a heart?
(b) What is the probability that both cards are hearts, given that at least one is a heart?

31. A box contains five green balls, three black balls, and seven red balls. Two balls are selected at random without replacement from the box. What is the probability that:
(a) both balls are red?
(b) both balls are the same color?
32. A softball team has three pitchers, A, B, and C, with winning percentages of 0.4, 0.6, and 0.8, respectively. These pitchers pitch with frequency 2, 3, and 5 out of every 10 games, respectively. In other words, for a randomly selected game, P(A) = 0.2, P(B) = 0.3, and P(C) = 0.5. Find:
(a) P(team wins game) = P(W).
(b) P(A pitched game | team won) = P(A | W).

33. One card is selected from a deck of 52 cards and placed in a second deck. A card then is selected from the second deck.
(a) What is the probability the second card is an ace?
(b) If the first card is placed into a deck of 54 cards containing two jokers, then what is the probability that a card drawn from the second deck is an ace?
(c) Given that an ace was drawn from the second deck in (b), what is the conditional probability that an ace was transferred?
34. A pocket contains three coins, one of which has a head on both sides, while the other two coins are normal. A coin is chosen at random from the pocket and tossed three times.
(a) Find the probability of obtaining three heads.
(b) If a head turns up all three times, what is the probability that this is the two-headed coin?

35. In a bolt factory, machines 1, 2, and 3 respectively produce 20%, 30%, and 50% of the total output. Of their respective outputs, 5%, 3%, and 2% are defective. A bolt is selected at random.
(a) What is the probability that it is defective?
(b) Given that it is defective, what is the probability that it was made by machine 1?
36. Drawer A contains five pennies and three dimes, while drawer B contains three pennies and seven dimes. A drawer is selected at random, and a coin is selected at random from that drawer.
(a) Find the probability of selecting a dime.
(b) Suppose a dime is obtained. What is the probability that it came from drawer B?
37. Let P(A) = 0.4 and P(A ∪ B) = 0.6.
(a) For what value of P(B) are A and B mutually exclusive?
(b) For what value of P(B) are A and B independent?

38. Prove Theorem 1.5.5. Hint: Use Exercise 18.

39. Three independent components are hooked in series. Each component fails with probability p. What is the probability that the system does not fail?

40. Three independent components are hooked in parallel. Each component fails with probability p. What is the probability that the system does not fail?

41. Consider the following system with assigned probabilities of malfunction for the five components. Assume that malfunctions occur independently. What is the probability the system does not malfunction?

42. The probability that a marksman hits a target is 0.9 on any given shot, and repeated shots are independent. He has two pistols; one contains two bullets and the other contains only one bullet. He selects a pistol at random and shoots at the target until the pistol is empty. What is the probability of hitting the target exactly one time?

43. Rework Exercise 27 assuming that the balls are chosen with replacement.

44. In a marble game a shooter may (A) miss, (B) hit one marble out and stick in the ring, or (C) hit one marble out and leave the ring. If B occurs, the shooter shoots again. If P(A) = p1, P(B) = p2, and P(C) = p3, and these probabilities do not change from shot to shot:
(a) Express the probability of getting out exactly three marbles on one turn.
(b) What is the probability of getting out exactly x marbles in one turn?
(c) Show that the probability of getting one marble is greater than the probability of getting zero marbles if

p1 < (1 − p2)/(2 − p1)

45. In the marble game in Exercise 44, suppose the probabilities depend on the number of marbles left in the ring, N. Let

P(A) = 1/(N + 1)    P(B) = 0.2N/(N + 1)    P(C) = 0.8N/(N + 1)

Rework Exercise 44 under this assumption.

46. A, B, and C are events such that P(A) = 1/3, P(B) = 1/4, and P(C) = 1/5. Find P(A ∪ B ∪ C) under each of the following assumptions:
(a) If A, B, and C are mutually exclusive.
(b) If A, B, and C are independent.

47. A bowl contains four lottery tickets with the numbers 111, 221, 212, and 122. One ticket is drawn at random from the bowl, and Ai is the event "2 in the ith place," i = 1, 2, 3. Determine whether A1, A2, and A3 are independent.

48. Code words are formed from the letters A through Z.
(a) How many 26-letter words can be formed without repeating any letters?
(b) How many five-letter words can be formed without repeating any letters?
(c) How many five-letter words can be formed if letters can be repeated?

49. License plate numbers consist of two letters followed by a four-digit number, such as SB7904 or AY1637.
(a) How many different plates are possible if letters and digits can be repeated?
(b) Answer (a) if letters can be repeated but digits cannot.
(c) How many of the plates in (b) have a four-digit number that is greater than 5500?

50. In how many ways can three boys and three girls sit in a row if boys and girls must alternate?

51. How many odd three-digit numbers can be formed from the digits 0, 1, 2, 3, 4 if digits can be repeated, but the first digit cannot be zero?

52. Suppose that from 10 distinct objects, four are chosen at random with replacement.
(a) What is the probability that no object is chosen more than once?
(b) What is the probability that at least one object is chosen more than once?

53. A restaurant advertises 256 types of nachos. How many topping ingredients must be available to meet this claim if plain corn chips count as one type?

54. A club consists of 17 men and 13 women, and a committee of five members must be chosen.
(a) How many committees are possible?
(b) How many committees are possible with three men and two women?
(c) Answer (b) if a particular man must be included.

55. A football coach has 49 players available for duty on a special kick-receiving team.
(a) If 11 must be chosen to play on this special team, how many different teams are possible?
(b) If the 49 include 24 offensive and 25 defensive players, what is the probability that a randomly selected team has five offensive and six defensive players?

56. For positive integers n > r, show the following:
(a) \binom{n}{r} = \binom{n}{n − r}
(b) \binom{n}{r} = \binom{n − 1}{r} + \binom{n − 1}{r − 1}
57. Provide solutions for the following sums:
(a) \binom{74}{0} + \binom{74}{1} + \binom{74}{2} + ⋯ + \binom{74}{74}
(b) \binom{76}{0} + \binom{76}{2} + \binom{76}{4} + ⋯ + \binom{76}{76}  Hint: Use Exercise 56(b).

58. Seven people show up to apply for jobs as cashiers at a discount store.
(a) If only three jobs are available, in how many ways can three be selected from the seven applicants?
(b) Suppose there are three male and four female applicants, and all seven are equally qualified, so the three jobs are filled at random. What is the probability that the three hired are all of the same sex?
(c) In how many different ways could the seven applicants be lined up while waiting for an interview?
(d) If there are four females and three males, in how many ways can the applicants be lined up if the first three are female?

59. The club in Exercise 54 must elect three officers: president, vice-president, and secretary. How many different ways can this turn out?

60. How many ways can 10 students be lined up to get on a bus if a particular pair of students refuse to follow each other in line?

61. Each student in a class of size n was born in a year with 365 days, and each reports his or her birth date (month and day, but not year).
    (a) How many ways can this happen?
    (b) How many ways can this happen with no repeated birth dates?
    (c) What is the probability of no matching birth dates?
    (d) In a class of 23 students, what is the probability of at least one repeated birth date?

62. A kindergarten student has 12 crayons.
    (a) How many ways can three blue, four red, and five green crayons be arranged in a row?
    (b) How many ways can 12 distinct crayons be placed in three boxes containing 3, 4, and 5 crayons, respectively?

63. How many ways can you partition 26 letters into three boxes containing 9, 11, and 6 letters?

64. How many ways can you permute 9 a's, 11 b's, and 6 c's?

65. A contest consists of finding all of the code words that can be formed from the letters in the name "ATARI." Assume that the letter A can be used twice, but the others at most once.
    (a) How many five-letter words can be formed?
    (b) How many two-letter words can be formed?
    (c) How many words can be formed?

66. Three buses are available to transport 60 students on a field trip. The buses seat 15, 20, and 25 passengers, respectively. How many different ways can the students be loaded on the buses?

67. A certain machine has nine switches mounted in a row. Each switch has three positions, a, b, and c.
    (a) How many different settings are possible?
    (b) Answer (a) if each position is used three times.

68. Suppose 14 students have tickets for a concert.
    (a) Three students (Bob, Jim, and Tom) own cars and will provide transportation to the concert. Bob's car has room for three passengers (nondrivers), while the cars owned by Jim and Tom each have room for four passengers. In how many different ways can the 11 passengers be loaded into the cars?
    (b) At the concert hall the students are seated together in a row. If they take their seats in random order, find the probability that the three students who drove their cars have adjoining seats.

69. Suppose the winning number in a lottery is a four-digit number determined by drawing four slips of paper (without replacement) from a box that contains nine slips numbered consecutively 1 through 9 and then recording the digits in order from smallest to largest.
    (a) How many different lottery numbers are possible?
    (b) Find the probability that the winning number has only odd digits.
    (c) How many different lottery numbers are possible if the digits are recorded in the order they were drawn?

70. Consider four dice A, B, C, and D numbered as follows: A has 4 on four faces and 0 on two faces; B has 3 on all six faces; C has 2 on four faces and 6 on two faces; and D has 5 on three faces and 1 on the other three faces. Suppose the statement A > B means that the face showing on A is greater than on B, and so forth. Show that P[A > B] = P[B > C] = P[C > D] = P[D > A] = 2/3. In other words, if an opponent chooses a die, you can always select one that will defeat him with probability 2/3.
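The claim is easy to check by brute force. The following enumeration sketch (Python; not part of the original exercise) counts wins over the 36 equally likely face pairings:

    from itertools import product

    # Face labels for the four dice described in Exercise 70.
    dice = {
        "A": [4, 4, 4, 4, 0, 0],
        "B": [3, 3, 3, 3, 3, 3],
        "C": [2, 2, 2, 2, 6, 6],
        "D": [5, 5, 5, 1, 1, 1],
    }

    def prob_beats(x, y):
        # P[x > y], with all 36 face pairs equally likely.
        wins = sum(a > b for a, b in product(dice[x], dice[y]))
        return wins / 36

    for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
        print(x, ">", y, ":", prob_beats(x, y))   # each prints 2/3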

71. A laboratory test for steroid use in professional athletes has detection rates given in the following table:

                     Test Result
    Steroid Use      +        -
    Yes             .90      .10
    No              .01      .99

If the rate of steroid use among professional athletes is 1 in 50:
    (a) What is the probability that a professional athlete chosen at random will have a negative test result for steroid use?
    (b) If the athlete tests positive, what is the probability that he has actually been using steroids?

72. A box contains four disks that have different colors on each side. Disk 1 is red and green, disk 2 is red and white, disk 3 is red and black, and disk 4 is green and white. One disk is selected at random from the box. Define events as follows: A = one side is red, B = one side is green, C = one side is white, and D = one side is black.
    (a) Are A and B independent events? Why or why not?
    (b) Are B and C independent events? Why or why not?
    (c) Are any pairs of events mutually exclusive? Which ones?
RANDOM VARIABLES AND THEIR DISTRIBUTIONS

2.1 INTRODUCTION

Our purpose is to develop mathematical models for describing the probabilities of outcomes or events occurring in a sample space. Because mathematical equations are expressed in terms of numerical values rather than as heads, colors, or other properties, it is convenient to define a function, known as a random variable, that associates each outcome in the experiment with a real number. We then can express the probability model for the experiment in terms of this associated random variable. Of course, in many experiments the results of interest already are numerical quantities, and in that case the natural function to use as the random variable would be the identity function.

Definition 2.1.1
Random Variable  A random variable, say X, is a function defined over a sample space, S, that associates a real number, X(e) = x, with each possible outcome e in S.


Capital letters, such as X, Y, and Z will be used to denote random variables. The lower case letters
x, y, z, ... will be used to denote possible values that the ​corresponding random
variables can attain. For mathematical reasons, it will be ​necessary to restrict
the types of functions that are considered to be random ​variables. We will
discuss this point after the following example.

Example 2.1.1  A four-sided (tetrahedral) die has a different number, 1, 2, 3, or 4, affixed to each side. On any given roll, each of the four numbers is equally likely to occur. A game consists of rolling the die twice, and the score is the maximum of the two numbers that occur. Although the score cannot be predicted, we can determine the set of possible values and define a random variable. In particular, if e = (i, j) where i, j ∈ {1, 2, 3, 4}, then X(e) = max(i, j). The sample space, S, and X are illustrated in Figure 2.1.

FIGURE 2.1  Sample space for two rolls of a four-sided die

    S:  (1,4) (2,4) (3,4) (4,4)
        (1,3) (2,3) (3,3) (4,3)
        (1,2) (2,2) (3,2) (4,2)
        (1,1) (2,1) (3,1) (4,1)

​ 2, B3, and B4 of S contains the pairs ​(i,


Each of the events B_1, B_2, B_3, and B_4 of S contains the pairs (i, j) that have a common maximum. In other words, X has value x = 1 over B_1, x = 2 over B_2, x = 3 over B_3, and x = 4 over B_4. Other random variables also could be considered. For example, the random variable Y(e) = i + j represents the total on the two rolls.
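Distributions defined by counting arguments like this are easy to tabulate directly. The following is a small enumeration sketch (Python; not part of the original text) for the pdf of X = max(i, j) over the 16 equally likely outcomes:

    from itertools import product
    from collections import Counter

    # All 16 equally likely outcomes (i, j) for two rolls of a four-sided die.
    outcomes = list(product(range(1, 5), repeat=2))

    # Count how often each value of X = max(i, j) occurs.
    counts = Counter(max(i, j) for i, j in outcomes)

    for x in sorted(counts):
        print(x, counts[x], "/ 16")   # f(1)=1/16, f(2)=3/16, f(3)=5/16, f(4)=7/16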

The concept of a random variable permits us to associate with any sample


space, S, a sample space that is a set of real numbers, and in which the events of
interest are subsets of real numbers. If such a real-valued event is denoted by A,
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
2.1 ​INTRODUCTION ​
SS

then we would want the associated set

    B = {e | e ∈ S and X(e) ∈ A}   (2.1.1)

to be an event in the underlying sample space S. Even though A and B are subsets of different spaces, they usually are referred to as equivalent events, and we write

    P[X ∈ A] = P(B)   (2.1.2)

The notation Pr(A) sometimes is used instead of P[X ∈ A] in equation (2.1.2). This defines a set function on the collection of real-valued events, and it can be shown to satisfy the three basic conditions of a probability set function, as given by Definition 1.3.1.

Although the random variable X is defined as a function of e, it usually is possible to express the events of interest only in terms of the real values that X assumes. Thus, our notation usually will suppress the dependence on the outcomes in S, such as we have done in equation (2.1.2).
For instance, in Example 2.1.1, if we were interested in the event of obtaining a score of "at most 3," this would correspond to X = 1, 2, or 3, or X ∈ {1, 2, 3}. Another possibility would be to represent the event in terms of some interval that contains the values 1, 2, and 3 but not 4, such as A = (-∞, 3]. The associated equivalent event in S is B = B_1 ∪ B_2 ∪ B_3, and the probability is P[X ∈ A] = P(B) = 1/16 + 3/16 + 5/16 = 9/16. A convenient notation for P[X ∈ A], in this example, is P[X ≤ 3]. Actually, any other real event containing 1, 2, and 3 but not 4 could be used in this way, but intervals, and especially those of the form (-∞, x], will be of special importance in developing the properties of random variables.
As mentioned in Section 1.3, if the probabilities can be determined for each elementary event in a discrete sample space, then the probability of any event can be calculated from these by expressing the event as a union of mutually exclusive elementary events and summing over their probabilities. A more general approach for assigning probabilities to events in a real sample space can be based on assigning probabilities to intervals of the form (-∞, x] for all real numbers x. Thus, we will consider as random variables only functions X that satisfy the requirement that, for all real x, sets of the form

    B = [X ≤ x] = {e | e ∈ S and X(e) ∈ (-∞, x]}   (2.1.3)

are events in the sample space S. The probabilities of other real events can be evaluated in terms of the probabilities assigned to such intervals. For example, for the game of Example 2.1.1, we have determined that P[X ≤ 3] = 9/16, and it also follows, by a similar argument, that P[X ≤ 2] = 1/4. Because (-∞, 2] contains 1 and 2 but not 3, and (-∞, 3] = (-∞, 2] ∪ (2, 3], it follows that P[X = 3] = P[X ≤ 3] - P[X ≤ 2] = 9/16 - 1/4 = 5/16.

Other examples of random variables can be based on the sampling problems of Section 1.6.
Example 2.1.2  In Example 1.6.15, we discussed several alternative approaches for computing the probability of obtaining exactly two black marbles when selecting five (without replacement) from a collection of 10 black and 20 white marbles. Suppose we are concerned with the general problem of obtaining x black marbles, for arbitrary x. Our approach will be to define a random variable X as the number of black marbles in the sample, and to determine the probability P[X = x] for every possible value x. This is easily accomplished with the approach given by equation (1.6.8), and the result is

    P[X = x] = \binom{10}{x}\binom{20}{5-x} / \binom{30}{5}    x = 0, 1, 2, 3, 4, 5   (2.1.4)
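For readers who want to evaluate equation (2.1.4) numerically, here is a short sketch (Python, assuming version 3.8+ for math.comb; the function name f is ours, not the text's):

    from math import comb

    # Hypergeometric pdf of equation (2.1.4): 10 black, 20 white, sample of 5.
    def f(x):
        return comb(10, x) * comb(20, 5 - x) / comb(30, 5)

    probs = [f(x) for x in range(6)]
    print(probs)        # f(0), f(1), ..., f(5)
    print(sum(probs))   # 1.0, as required of any pdf
    print(f(2))         # P[X = 2], the case worked in Example 1.6.15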

Random variables that arise from counting operations, such as the random variables in Examples 2.1.1 and 2.1.2, are integer-valued. Integer-valued random variables are examples of an important special type known as discrete random variables.

2.2 DISCRETE RANDOM VARIABLES


Definition 2.2.1
If the set of all possible values of a random variable, X, is a countable set, x_1, x_2, ..., x_n, or x_1, x_2, ..., then X is called a discrete random variable. The function

    f(x) = P[X = x]    x = x_1, x_2, ...   (2.2.1)

that assigns the probability to each possible value x will be called the discrete probability density function (discrete pdf).

If it is clear from the context that X is discrete, then we simply will say pdf. Another common terminology is probability mass function (pmf), and the possible values, x, are called mass points of X. Sometimes a subscripted notation, f_X(x), is used. The following theorem gives general properties that any discrete pdf must satisfy.

must satisfy.


Theorem 2.2.1  A function f(x) is a discrete pdf if and only if it satisfies both of the following properties for at most a countably infinite set of reals x_1, x_2, ...:

    f(x_i) ≥ 0   for all x_i   (2.2.2)

and

    Σ_{all x_i} f(x_i) = 1   (2.2.3)

Proof
Property (2.2.2) follows from the fact that the value of a discrete pdf is a probability and must be nonnegative. Because x_1, x_2, ... represent all possible values of X, the events [X = x_1], [X = x_2], ... constitute an exhaustive partition of the sample space. Thus,

    1 = Σ_{all x_i} P[X = x_i] = Σ_{all x_i} f(x_i)

Consequently, any pdf must satisfy properties (2.2.2) and (2.2.3), and any function that satisfies properties (2.2.2) and (2.2.3) will assign probabilities consistent with Definition 1.3.1.

In some problems, it is possible to express the pdf by means of an equation, such as equation (2.1.4). However, it is sometimes more convenient to express it in tabular form. For example, one way to specify the pdf of the random variable X in Example 2.1.1 is given in Table 2.1.
Example 2.1.1 is given in Table 2.1.

TABLE 2.1  The pdf of the maximum of the two rolls

    x       1      2      3      4
    f(x)   1/16   3/16   5/16   7/16

Of course, these are the probabilities, respectively, of the events B_1, B_2, B_3, and B_4 in S.

A graphic representation of f(x) is also of some interest. It would be possible to leave f(x) undefined at points that are not possible values of X, but it is convenient to define f(x) as zero at such points. The graph of the pdf in Table 2.1 is shown in Figure 2.2.

FIGURE 2.2  Discrete pdf of the maximum of two rolls of a four-sided die

Example 2.2.1  Example 2.1.1 involves two rolls of a four-sided die. Now we will roll a 12-sided (dodecahedral) die twice. If each face is marked with an integer, 1 through 12, then each value is equally likely to occur on a single roll of the die. As before, we define a random variable X to be the maximum obtained on the two rolls. It is not hard to see that for each value x there are an odd number, 2x - 1, of ways for that value to occur. Thus, the pdf of X must have the form

    f(x) = c(2x - 1)   for x = 1, 2, ..., 12   (2.2.4)


One way to determine c would be to do a more complete analysis of the counting problem, but another way would be to use equation (2.2.3). In particular,

    1 = Σ_{x=1}^{12} f(x) = Σ_{x=1}^{12} c(2x - 1) = c[2(12)(13)/2 - 12] = c(12)^2

So c = 1/(12)^2 = 1/144.
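The counting claim 2x - 1, and hence the constant, is easy to confirm by enumeration (a Python sketch, not part of the original example):

    from itertools import product
    from collections import Counter

    # Tally X = max(i, j) over the 144 equally likely outcomes of two d12 rolls.
    counts = Counter(max(i, j) for i, j in product(range(1, 13), repeat=2))

    # Each count should equal 2x - 1, so f(x) = (2x - 1)/144.
    assert all(counts[x] == 2 * x - 1 for x in range(1, 13))
    print(sum(counts.values()))   # 144, so c = 1/144 makes the pdf sum to 1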

As mentioned in the last section, another way to specify the distribution of probability is to assign probabilities to intervals of the form (-∞, x], for all real x. The probability assigned to such an event is given by a function called the cumulative distribution function.

distribution function

Definition 2.2.2
The cumulative distribution function (CDF) of a random variable X is defined for any real x by

    F(x) = P[X ≤ x]   (2.2.5)



FIGURE 2.3  The CDF of the maximum of two rolls of a four-sided die

The function F(x) often is referred to simply as the distribution function of X, and the subscripted notation, F_X(x), sometimes is used. For brevity, a short notation often is appropriate to indicate that X has a particular distribution. If we write X ~ f(x) or X ~ F(x), this will mean that the random variable X has pdf f(x) and CDF F(x).

As seen in Figure 2.3, the CDF of the distribution given in Table 2.1 is a nondecreasing step function. The step-function form of F(x) is common to all discrete distributions, and the sizes of the steps or jumps in the graph of F(x) correspond to the values of f(x) at those points. This is easily seen by comparing Figures 2.2 and 2.3.

The general relationship between F(x) and f(x) for a discrete distribution is given by the following theorem.

Theorem 2.2.2  Let X be a discrete random variable with pdf f(x) and CDF F(x). If the possible values of X are indexed in increasing order, x_1 < x_2 < x_3 < ..., then f(x_1) = F(x_1), and for any i > 1,

    f(x_i) = F(x_i) - F(x_{i-1})   (2.2.6)

Furthermore, if x < x_1, then F(x) = 0, and for any other real x,

    F(x) = Σ_{x_i ≤ x} f(x_i)   (2.2.7)

where the summation is taken over all indices i such that x_i ≤ x.
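Computationally, equation (2.2.7) says the discrete CDF is a running total of the pdf, and equation (2.2.6) recovers the pdf by successive differences. A brief sketch (Python, using the distribution of Table 2.1):

    from itertools import accumulate

    xs = [1, 2, 3, 4]
    f = [1/16, 3/16, 5/16, 7/16]     # pdf from Table 2.1

    F = list(accumulate(f))          # CDF at the mass points, as in (2.2.7)
    print(F)                         # [0.0625, 0.25, 0.5625, 1.0]

    # Recover the pdf by differencing, as in (2.2.6).
    recovered = [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]
    print(recovered)                 # matches f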

The CDF of any random variable must satisfy the properties of the following ​theorem.

Theorem 2.2.3  A function F(x) is a CDF for some random variable X if and only if it satisfies the following properties:

    lim_{x→-∞} F(x) = 0   (2.2.8)

    lim_{x→∞} F(x) = 1   (2.2.9)

    lim_{h→0+} F(x + h) = F(x)   (2.2.10)

    a < b implies F(a) ≤ F(b)   (2.2.11)

The first two properties say that F(x) can be made arbitrarily close to 0 or 1 by taking x arbitrarily large, and negative or positive, respectively. In the examples considered so far, it turns out that F(x) actually assumes these limiting values. Property (2.2.10) says that F(x) is continuous from the right. Notice that in Figure 2.3 the only discontinuities are at the values 1, 2, 3, and 4, and the limit as x approaches these values from the right is the value of F(x) at these values. On the other hand, as x approaches these values from the left, the limit of F(x) is the value of F(x) on the lower step, so F(x) is not (in general) continuous from the left. Property (2.2.11) says that F(x) is nondecreasing, which is easily seen to be the case in Figure 2.3. In general, this property follows from the fact that an interval of the form (-∞, b] can be represented as the union of two disjoint intervals:

    (-∞, b] = (-∞, a] ∪ (a, b]   (2.2.12)

for any a < b. It follows that F(b) = F(a) + P[a < X ≤ b] ≥ F(a), because P[a < X ≤ b] ≥ 0, and thus property (2.2.11) is obtained. Actually, by this argument we have obtained another very useful result, namely

    P[a < X ≤ b] = F(b) - F(a)   (2.2.13)

This reduces the problem of computing probabilities for events defined in terms of intervals of the form (a, b] to taking differences with F(x).

Generally, it is somewhat easier to understand the nature of a random variable and its probability distribution by considering the pdf directly, rather than the CDF, although the CDF will provide a good basis for defining continuous probability distributions. This will be considered in the next section.

Some important properties of probability distributions involve numerical quantities called expected values.

Definition 2.2.3
If X is a discrete random variable with pdf f(x), then the expected value of X is defined by

    E(X) = Σ_x x f(x)   (2.2.14)

The sum (2.2.14) is understood to be over all possible values of X. Furthermore, it is an ordinary sum if the range of X is finite, and an infinite series if the range of X is infinite. In the latter case, if the infinite series is not absolutely convergent, then we will say that E(X) does not exist. Other common notations for E(X) include μ, possibly with a subscript, μ_X. The terms mean and expectation also are often used. The mean or expected value of a random variable is a "weighted average," and it can be considered as a measure of the "center" of the associated probability distribution.
the associated probability ​distribution.

Example 2.2.2  A box contains four chips. Two are labeled with the number 2, one is labeled with a 4, and the other with an 8. The average of the numbers on the four chips is (2 + 2 + 4 + 8)/4 = 4. The experiment of choosing a chip at random and recording its number can be associated with a discrete random variable X having distinct values x = 2, 4, or 8, with f(2) = 1/2 and f(4) = f(8) = 1/4. The corresponding expected value or mean is

    E(X) = 2(1/2) + 4(1/4) + 8(1/4) = 4

as before. Notice that this also could model selection from a larger collection, as long as the possible observed values of X and the respective proportions in the collection, f(x), remain the same as in the present example.
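As a quick check, the weighted average of equation (2.2.14) can be computed directly (a Python sketch):

    # Expected value of X for Example 2.2.2: sum of value times probability.
    pdf = {2: 1/2, 4: 1/4, 8: 1/4}
    mean = sum(x * p for x, p in pdf.items())
    print(mean)   # 4.0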

There is an analogy between the distribution of probability to values, x, and the distribution of mass to points in a physical system. For example, if masses of 0.5, 0.25, and 0.25 grams are placed at the respective points x = 2, 4, and 8 cm on the horizontal axis, then the value 2(0.5) + 4(0.25) + 8(0.25) = 4 is the "center of mass" or balance point of the corresponding system. This is illustrated in Figure 2.4.

2.4.

FIGURE 2.4 ​The center-of-mass interpretation of the mean



In the previous example E(X) coincides with one of the possible values of X, but this is not always the case, as illustrated by the following example.

Example 2.2.3  A game of chance is based on drawing two chips at random without replacement

from the box considered in Example 2.2.2. If the numbers on the two chips
match, then the player wins $2; otherwise, she loses $1. Let X be the amount won

by the player on a single play of the game. There are only two possible values,

X = 2 if both chips bear the number 2, and X = -1 otherwise. Furthermore, there are \binom{4}{2} = 6 ways to draw two chips, and only one of these outcomes corresponds to a match. The distribution of X is f(2) = 1/6 and f(-1) = 5/6, and consequently the expected amount won is E(X) = (-1)(5/6) + (2)(1/6) = -1/2. Thus, the expected amount "won" by the player is actually an expected loss of one-half dollar.

The connection with long-term relative frequency also is well illustrated by this example. Suppose the game is played M times in succession, and denote the relative frequencies of winning and losing by f_W and f_L, respectively. The average amount the player wins is (-1)f_L + (2)f_W. Because of statistical regularity, we have that f_L and f_W approach f(-1) and f(2), respectively, and thus the player's average winnings approach E(X) as M approaches infinity.
have that IL and f approach f( - 1) and f(2), respectively, and thus the player's
average winnings approach E(X) as M approaches infinity.
Notice also that the game will be more equitable if the payoff to the player is ​changed to $5 rather

than $2, because the resulting expected amount won then ​will be (- 1X5/6) +

(5X1/6) = O. In general, for a game of chance, if the net amount


​ won by a
player is X, then the game is said to be a fair game if E(X) = O.

2.3 CONTINUOUS RANDOM VARIABLES

The notion of a discrete random variable provides an adequate means of probability modeling for a large class of problems, including those that arise from the operation of counting. However, a discrete random variable is not an adequate model in many situations, and we must consider the notion of a continuous random variable. The CDF defined earlier remains meaningful for continuous random variables, but it also is useful to extend the concept of a pdf to continuous random variables.

Example 2.3.1 ​Each work day a man rides a bus to his place of business. Although a new bus
arrives promptly every five minutes, the man generally arrives at the bus stop at a ​random time
between bus arrivals. Thus, we might take his waiting time on any ​given morning to be a random
variable X.
Although in practice we usually measure time only to the nearest unit (seconds, ​minutes, etc.), in theory
we could measure time to within some arbitrarily small unit. Thus, even though
in practice it might be possible to regard X as a discrete

random variable with possible values determined by the smallest appropriate time unit, it usually is more convenient to consider the idealized situation in which X is assumed capable of attaining any value in some interval, and not just discrete points.

Returning to the man waiting for his bus, suppose that he is very observant and noticed over the years that the frequency of days when he waits no more than x minutes is proportional to x for all x. This suggests a CDF of the form F(x) = P[X ≤ x] = cx, for some constant c > 0. Because the buses arrive at regular five-minute intervals, the range of possible values of X is the time interval [0, 5]. In other words, P[0 ≤ X ≤ 5] = 1, and it follows that 1 = F(5) = c · 5, and thus c = 1/5 and F(x) = x/5 if 0 ≤ x ≤ 5. It also follows that F(x) = 0 if x < 0 and F(x) = 1 if x > 5.

Another way to study this distribution would be to observe the relative frequency of bus arrivals during short time intervals of the same length, but distributed throughout the waiting-time interval [0, 5]. It may be that the frequency of bus arrivals during intervals of the form (x, x + Δx] for small Δx was proportional to the length of the interval, Δx, regardless of the value of x. The corresponding condition this imposes on the distribution of X is

    P[x < X ≤ x + Δx] = F(x + Δx) - F(x) = cΔx

for all 0 ≤ x < x + Δx ≤ 5 and some c > 0. Of course, this implies that if F(x) is differentiable at x, its derivative is constant, F'(x) = c > 0. Note also that for x < 0 or x > 5 the derivative also exists, but F'(x) = 0, because P[x < X ≤ x + Δx] = 0 when x and x + Δx are not possible values of X; the derivative does not exist at all at x = 0 or 5.

In general, if F(x) is the CDF of a continuous random variable X, then we will denote its derivative (where it exists) by f(x), and under certain conditions, which will be specified shortly, we will call f(x) the probability density function of X. In our example, F(x) can be represented for values of x in the interval [0, 5] as the integral of its derivative:

    F(x) = ∫_0^x f(t) dt = ∫_0^x (1/5) dt = x/5

The graphs of F(x) and f(x) are shown in Figure 2.5.

FIGURE 2.5  Waiting time for a bus: graphs of F(x) and f(x)
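The uniform waiting-time model is easy to explore numerically. A simulation sketch (Python; the sample size is arbitrary) compares empirical frequencies with F(x) = x/5:

    import random

    def F(x):
        # CDF of the waiting time: uniform on [0, 5] minutes.
        return min(max(x / 5, 0.0), 1.0)

    n = 100_000
    waits = [random.uniform(0, 5) for _ in range(n)]
    for x in [1, 2.5, 4]:
        empirical = sum(w <= x for w in waits) / n
        print(x, F(x), round(empirical, 3))   # empirical fraction is close to x/5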

This provides a general approach to defining the distribution of a continuous ​random variable X.

Definition 2.3.1
A random variable X is called a continuous random variable if there is a function f(x), called the probability density function (pdf) of X, such that the CDF can be represented as

    F(x) = ∫_{-∞}^x f(t) dt   (2.3.1)

In more advanced treatments of probability, such distributions sometimes are called "absolutely continuous" distributions. The reason for such a distinction is that CDFs exist that are continuous (in the usual sense), but which cannot be represented as the integral of the derivative. We will apply the terminology continuous distribution only to probability distributions that satisfy property (2.3.1). Sometimes it is convenient to use a subscripted notation, F_X(x) and f_X(x), for the CDF and pdf, respectively.

The defining property (2.3.1) provides a way to derive the CDF when the pdf is given, and it follows by the Fundamental Theorem of Calculus that the pdf can be obtained from the CDF by differentiation. Specifically,

    f(x) = (d/dx) F(x) = F'(x)   (2.3.2)

wherever the derivative exists. Recall from Example 2.3.1 that there were two values of x where the derivative of F(x) did not exist. In general, there may be many values of x where F(x) is not differentiable, and these will occur at discontinuity points of the pdf, f(x). Inspection of the graphs of f(x) and F(x) in Figure 2.5 shows that this situation occurs in the example at x = 0 and x = 5. However, this will not usually create a problem if the set of such values is finite, because an integrand can be redefined arbitrarily at a finite number of values x without affecting the value of the integral. Thus, the function F(x), as represented in property (2.3.1), is unaffected regardless of how we treat such values. It also follows by similar considerations that events such as [X = c], where c is a constant, will have probability zero when X is a continuous random variable. Consequently, events of the form [X ∈ I], where I is an interval, are assigned the same probability whether I includes the endpoints or not. In other words, for a continuous random variable X, if a < b,

    P[a < X ≤ b] = P[a ≤ X < b] = P[a < X < b] = P[a ≤ X ≤ b]   (2.3.3)

and each of these has the value F(b) - F(a).

Thus, the CDF, F(x), assigns probabilities to events of the form (-∞, x], and equation (2.3.3) shows how the probability assignment can be extended to any interval.

Any function f(x) may be considered as a possible candidate for a pdf if it produces a legitimate CDF when integrated as in property (2.3.1). The following theorem provides conditions that will guarantee this.

Theorem 2.3.1  A function f(x) is a pdf for some continuous random variable X if and only if it satisfies the properties

    f(x) ≥ 0   (2.3.4)

for all real x, and

    ∫_{-∞}^∞ f(x) dx = 1   (2.3.5)

Proof
Properties (2.2.9) and (2.2.11) of a CDF follow from properties (2.3.5) and (2.3.4), respectively. The other properties follow from general results about integrals.

Example 2.3.2  A machine produces copper wire, and occasionally there is a flaw at some point along the wire. The length of wire (in meters) produced between successive flaws is a continuous random variable X with pdf of the form

    f(x) = c(1 + x)^{-3}   x > 0
    f(x) = 0               x ≤ 0                  (2.3.6)

where c is a constant. The value of c can be determined by means of property (2.3.5). Specifically, set

    1 = ∫_{-∞}^∞ f(x) dx = ∫_0^∞ c(1 + x)^{-3} dx = c/2

which is obtained following the substitution u = 1 + x and an application of the power rule for integrals. This implies that the constant is c = 2. Clearly property (2.3.4) also is satisfied in this case.
The CDF for this random variable is given by

    F(x) = P[X ≤ x] = ∫_{-∞}^x f(t) dt

so that F(x) = 0 if x ≤ 0, and for x > 0,

    F(x) = ∫_{-∞}^0 0 dt + ∫_0^x 2(1 + t)^{-3} dt = 1 - (1 + x)^{-2}

Probabilities of intervals, such as P[a ≤ X ≤ b], can be expressed directly in terms of the CDF or as integrals of the pdf. For example, the probability that a flaw occurs between 0.40 and 0.45 meters is given by

    P[0.40 ≤ X ≤ 0.45] = ∫_{0.40}^{0.45} f(x) dx = F(0.45) - F(0.40) = 0.035
Consideration of the frequency of occurrences over short intervals was suggested as a possible way to study a continuous distribution in Example 2.3.1. This approach provides some insight into the general nature of continuous distributions. For example, it may be observed that the frequency of occurrences over short intervals of length Δx, say [x, x + Δx], is at least approximately proportional to the length of the interval, Δx, where the proportionality factor depends on x, say f(x). The condition this imposes on the distribution of X is

    P[x ≤ X ≤ x + Δx] = F(x + Δx) - F(x) ≈ f(x)Δx   (2.3.7)

where the error in the approximation is negligible relative to the length of the interval, Δx. This is illustrated in Figure 2.6 for the copper wire example. The exact probability in equation (2.3.7) is represented by the area of the shaded region under the graph of f(x), while the approximation is the area of the corresponding rectangle with height f(x) and width Δx.

The smaller the value of Δx, the closer this approximation becomes. In this sense, it might be reasonable to think of f(x) as assigning "probability density" for the distribution of X, and the term probability density function seems appropriate for f(x). In other words, for a continuous random variable X, f(x) is not a probability, although it does determine the probability assigned to arbitrarily small intervals. The area between the x-axis and the graph of f(x) assigns probability to intervals, so that for a < b,

    P[a ≤ X ≤ b] = ∫_a^b f(x) dx   (2.3.8)


FIGURE 2.6 ​Continuous assignment of probability by pdf
In Example 2.3.2, we could take the probability that the length between successive flaws is between 0.40 and 0.45 meters to be approximately f(0.40)(0.05) = 2(1.4)^{-3}(0.05) = 0.036, or we could integrate the pdf between the limits 0.40 and 0.45 to obtain the exact answer, 0.035. For longer intervals, integrating f(x) as in equation (2.3.8) would be more reasonable.
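The quality of the rectangle approximation in equation (2.3.7) can be examined directly (a Python sketch for the copper wire pdf):

    f = lambda x: 2 * (1 + x) ** -3    # copper wire pdf
    F = lambda x: 1 - (1 + x) ** -2    # its CDF

    x = 0.40
    for dx in [0.05, 0.01, 0.001]:
        exact = F(x + dx) - F(x)       # exact interval probability
        approx = f(x) * dx             # rectangle approximation f(x)dx
        print(dx, round(exact, 6), round(approx, 6))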
Note that in Section 2.2 we referred to a probability density function or density function for a discrete random variable, but the interpretation there is different, because probability is assigned at discrete points in that case rather than in a continuous manner. However, it will be convenient to refer to the "density function" or pdf in both continuous and discrete cases, and to use the same notation, f(x) or f_X(x), in the later chapters of the book. This will avoid the necessity of separate statements of general results that apply to both cases.

The notion of expected value can be extended to continuous random variables.
Definition 2.3.2
If X is a continuous random variable with pdf f(x), then the expected value of X is defined by

    E(X) = ∫_{-∞}^∞ x f(x) dx   (2.3.9)

if the integral in equation (2.3.9) is absolutely convergent. Otherwise we say that E(X) does not exist.

As in the discrete case, other notations for E(X) are μ or μ_X, and the terms mean or expectation of X also are commonly used. The center-of-mass analogy is



still valid in this case, where mass is assigned to the x-axis in a continuous manner and in accordance with f(x). Thus, μ can also be regarded as a central measure for a continuous distribution.
In Example 2.3.2, the mean length between flaws in a piece of wire is

    μ = ∫_{-∞}^0 x · 0 dx + ∫_0^∞ x · 2(1 + x)^{-3} dx

If we make the substitution t = 1 + x, then

    μ = ∫_1^∞ (t - 1) · 2t^{-3} dt = 2(1 - 1/2) = 1
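A numerical check of this mean (a Python sketch, again assuming scipy for the quadrature):

    from scipy.integrate import quad

    mean, _ = quad(lambda x: x * 2 * (1 + x) ** -3, 0, float("inf"))
    print(mean)   # 1.0, so the mean length between flaws is 1 meter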


Other properties of probability distributions can be described in terms of quantities called percentiles.

Definition 2.3.3
If 0 < p < 1, then a 100 × pth percentile of the distribution of a continuous random variable X is a solution x_p to the equation

    F(x_p) = p   (2.3.10)

In general, a distribution may not be continuous, and if it has a discontinuity, then there will be some values of p for which equation (2.3.10) has no solution. Although we emphasize the continuous case in this book, it is possible to state a general definition of percentile by defining a pth percentile of the distribution of X to be a value x_p such that P[X ≤ x_p] ≥ p and P[X ≥ x_p] ≥ 1 - p.

In essence, x_p is a value such that 100 × p percent of the population values are at most x_p, and 100 × (1 - p) percent of the population values are at least x_p. This is illustrated for a continuous distribution in Figure 2.7. We also can think in terms of a proportion p rather than a percentage 100 × p of the population, and in this context x_p is called a pth quantile of the distribution.
proportion p rather than a percentage 100 x p of the population, ​and in this
context x is called a pth qiientile of the distribution.

FIGURE 2.7  A 100 × pth percentile
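For the copper wire distribution of Example 2.3.2, equation (2.3.10) can be solved in closed form: setting 1 - (1 + x_p)^{-2} = p gives x_p = (1 - p)^{-1/2} - 1. A quick check (Python sketch; the function names are ours):

    def quantile(p):
        # Solve F(x) = p for the CDF F(x) = 1 - (1 + x)**-2, x > 0.
        return (1 - p) ** -0.5 - 1

    F = lambda x: 1 - (1 + x) ** -2

    median = quantile(0.5)
    print(median)      # about 0.414 meters
    print(F(median))   # 0.5, confirming the solution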
