Complete Probability & Statistics 2
We ensure every
Cambridge learner can...
Aspire
We help every student reach their full potential
with complete syllabus support from experienced
teachers, subject experts and examiners.
Succeed
We bring our esteemed academic standards
to your classroom and pack our resources
with effective exam preparation. You can trust
Oxford resources to secure the best results.
Progress
We embed critical thinking skills into our
resources, encouraging students to think
independently from an early age and building
foundations for future success.
Find out more
www.oxfordsecondary.com/cambridge
OXFORD
UNIVERSITY PRESS
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research,
scholarship, and education by publishing worldwide. Oxford is a registered
trade mark of Oxford University Press in the UK and in certain other countries.
© Oxford University Press 2018
The moral rights of the author have been asserted.
First published in 2015
Second edition 2018
All rights reserved. No part of this publication may be reproduced, stored
in a retrieval system, or transmitted, in any form or by any means, without
the prior permission in writing of Oxford University Press, or as expressly
permitted by law, or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope
of the above should be sent to the Rights Department, Oxford University Press,
at the address above.
You must not circulate this book in any other binding or cover
and you must impose this same condition on any acquirer.
British Library Cataloguing in Publication Data
Data available
ora-19-8025175
10987654321
Paper used in the production of this book is a natural, recyclable product
made from wood grown in sustainable forests.
The manufacturing process conforms to the environmental regulations
of the country of origin.
Printed in Great Britain by Bell and Bain Ltd Glasgow
The questions, all example answers and comments that appear in
this book were written by the authors.
Acknowledgements
The publisher would like to thank the following for permission to reproduce photographs:
DPI: Norma Jean Gargesz/age fotostock; p2 (TL): Reckermann/iStockphoto; p2 (TR): Maica/iStockphoto;
p2 (ML): dpa picture alliance/Alamy Stock Photo; p2 (Ml): Tim Graham/Alamy Stock Photo; p2 (HL):
Echo/Getty Images; p2 (BR): Richard Wear/age fotostock; p13 (1): ERproductions Ltd/age fotostock;
p13 (BL): Chungking/Shutterstock; p13 (BL): Phil Robinson/age fotostock; p14 (1): LoloStock/
Shutterstock; p14 (BL): Dzimin/Fotolia; pt (BR): Pixel Shepherd/age fotostock; p20: Leigh Prather/
Shutterstock; p24: Gwright/Alamy Stock Photo; p2#: Martin Plob/age fotostock; 7 (1): Bert de Ruiter/
Alamy Stock Photo; p47 (M): Vadim Petrakov/Shutterstock; p47 (Bk): Danita Delimont/Shutterstock;
PAB: Caro Seebery/Photothor; p55 (Tl): OJO Images/iStockphoto; p55 (TR): Denis Kuvaev/Shutterstock;
B60: Barna Tanko/Shutterstock; p62: Lucky photographer/Shutterstock; p63: Dariazu/Shutterstock;
p80: FloridaStock/Shutterstock; p86 (BL): James Steidl/Shutterstock; p86 (BR): Photology1971/
Shutterstock; p112 (I): £1 iphoto/Shutterstock; p112 (Bp): Ei Katsumata/age fotostock; p12 Frank
Vetere/Alamy Stock Photo; p14: Ian Murray/age fotostock; p36: Javier Larrea/age fotostock;
P160: Sarymsakov Andrey/Shutterstock; p16 (IT): MRI805/iStockphoto; p76 (TR): ssuaphoto/
iStockphoto; p177: AshDesign/Shutterstock
Cover illustration by Ian Norss, Oxford University Press

Contents
1 The Poisson distribution 1
1.1 Introducing the Poisson distribution 2
1.2 The role of the parameter of the Poisson distribution 5
1.3 The recurrence relation for the Poisson distribution 7
1.4 Mean and variance of the Poisson distribution 10
1.5 Modelling with the Poisson distribution 12
2 Approximations involving the Poisson distribution 20
2.1 Poisson as an approximation to the binomial 21
2.2 The normal approximation to the Poisson distribution 23
3 Linear combination of random variables 28
3.1 Expectation and variance of a linear function of a random variable 29
3.2 Linear combination of two (or more) independent random variables 34
3.3 Expectation and variance of a sum of repeated independent
observations of a random variable, and the mean of those observations 38
3.4 Comparing the sum of repeated independent observations with the
multiple of a single observation 40
Maths in real-life: The mathematics of the past 44
4 Linear combination of Poisson and normal variables 46
4.1 The distribution of the sum of two independent Poisson random
variables 47
4.2 Linear functions and combinations of normal random variables 50
5 Continuous random variables 58
5.1 Introduction to continuous random variables 59
5.2 Probability density functions 61
5.3 Mean and variance of a continuous random variable 65
5.4 Mode of a continuous random variable
6 Sampling 76
6.1 Populations, census and sampling 77
6.2 Advantages and disadvantages of sampling 79
6.3 Variability between samples and use of random numbers 81
6.4 The sampling distribution of a statistic 86
6.5 Sampling distribution of the mean of repeated observations
of a random variable 92
6.6 Sampling distribution of the mean of a sample from a
normal distribution 94
6.7 The Central Limit Theorem 96
6.8 Descriptions of some sampling methods 100
Maths in real-life: Modelling statistics 104
7 Estimation 106
7.1 Interval estimation 107
7.2 Unbiased estimate of the population mean 109
7.3 Unbiased estimate of the population variance
7.4 Confidence intervals for the mean of a normal distribution 116
7.5 Confidence intervals for the mean of a large sample
from any distribution 119
7.6 Confidence intervals for a proportion 121
8 Hypothesis testing for discrete distributions 127
8.1 The logical basis for hypothesis testing 128
8.2 Critical region 132
8.3 Type I and Type II errors 137
8.4 Hypothesis test for the proportion p of
a binomial distribution 142
8.5 Hypothesis test for the mean of a Poisson distribution 145
9 Hypothesis testing using the normal distribution 152
9.1 Hypothesis test for the mean of a normal distribution 153
9.2 Hypothesis test for the mean using a large sample
9.3 Using a confidence interval to carry out a hypothesis test 160
Maths in real-life: A risky business
Exam-style paper A 166
Exam-style paper B 168
Tables of the normal distribution 170
Answers 171
Glossary 194
Index 197

Introduction
About this book
This book has been written to cover the Cambridge International AS &
A Level Mathematics (9709) course, and is fully aligned to the syllabus.
In addition to the main curriculum content, you will find:
• 'Maths in real-life', showing how principles learned in this course are
used in the real world.
• Chapter openers, which outline how each topic in the Cambridge
9709 syllabus is used in real life.
The book contains the following features:
Did you know?
Advice on calculator use
EXAM-STYLE QUESTION
Throughout the book, you will encounter worked examples and a host
of rigorous exercises. The examples show you the important techniques
required to tackle questions. The exercises are carefully graded, starting
from a basic level and going up to exam standard, allowing you plenty of
opportunities to practise your skills. Together, the examples and exercises
put maths in a real-world context, with a truly international focus.
At the start of each chapter, you will see a list of objectives covered
in the chapter. These are drawn from the Cambridge AS and A Level
syllabus. Each chapter begins with a Before you start section and ends
with a Summary exercise and Chapter summary, ensuring that you fully
understand each topic.
Each chapter contains key mathematical terms to improve understanding,
highlighted in colour, with full definitions provided in the Glossary of
terms at the end of the book.
The answers given at the back of the book are concise. However, you
should show as many steps in your working as possible. All exam-style
questions have been written by the author.

About the author
James Nicholson is an experienced teacher of mathematics at secondary
level, taught for 12 years at Harrow School as well as spending 13 years as
Head of Mathematics in a large Belfast grammar school. He is the author
of two A Level statistics texts, and editor of the Concise Oxford Dictionary
of Mathematics. He has also contributed to a number of other sets of
curriculum and assessment materials, is an experienced examiner and has
acted as a consultant for UK government agencies on accreditation of new
specifications.
James ran schools workshops for the Royal Statistical Society for many
years, and has been a member of the Schools and Further Education
Committee of the Institute of Mathematics and its Applications since 2000,
including six years as chair, and is currently a member of the Community
of Interest group for the Advisory Committee on Mathematics Education.
He has served as a vice-president of the International Association for
Statistics Education for four years, and is currently Chair of the Advisory
Board to the International Statistical Literacy Project.
A note from the author
The aim of this book is to help students prepare for the Statistics 2 unit of the
Cambridge International AS and A Level Mathematics syllabus, though it
may also be found to be useful in providing support material for other
AS and A Level courses. The book contains a large number of practice
questions, many of which are exam-style.
In writing the book I have drawn on my experiences of teaching
Mathematics, Statistics and Further Mathematics to A Level over many
years as well as on my experience as an examiner, and discussion with
statistics educators from many countries at international conferences.
Student book & Cambridge syllabus
Student Book: Complete Probability & Statistics 2
for Cambridge International AS & A Level
Syllabus: Cambridge International AS & A Level
Mathematics: Probability and Statistics 2 (9709)
Syllabus overview
Unit S2: Probability & Statistics 2 (Paper 6)
1. The Poisson distribution
• Calculate probabilities for the distribution Po(λ) Pages 3-9
• Use the fact that if X ~ Po(λ) then the mean and variance of X are each equal to λ Pages 10-11
• Understand the relevance of the Poisson distribution to the distribution of random Pages 12-15
events, and use the Poisson distribution as a model
• Use the Poisson distribution as an approximation to the binomial distribution where Pages 21-22
appropriate (n > 50 and np < 5, approximately)
• Use the normal distribution, with continuity correction, as an approximation to the Pages 23-25
Poisson distribution where appropriate (λ > 15, approximately)
2. Linear combinations of random variables
• Use, in the course of solving problems, the results that:
– E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X) Pages 29-33
– E(aX + bY) = aE(X) + bE(Y) Pages 34-38
– Var(aX + bY) = a²Var(X) + b²Var(Y) for independent X and Y Pages 34-38
– if X has a normal distribution then so does aX + b Pages 50-54
– if X and Y have independent normal distributions then aX + bY has a normal distribution Pages 50-54
– if X and Y have independent Poisson distributions then X + Y has a Poisson distribution Pages 47-49
3. Continuous random variables
• Understand the concept of a continuous random variable, and recall and use properties Pages 59-61
of a probability density function (restricted to functions defined over a single interval)
• Use a probability density function to solve problems involving probabilities, and to Pages 61-73
calculate the mean and variance of a distribution (explicit knowledge of the cumulative
distribution function is not included, but location of the median, for example, in simple
cases by direct consideration of an area may be required)
4. Sampling and estimation
• Understand the distinction between a sample and a population, and appreciate the Pages 77-79
necessity for randomness in choosing samples
• Explain in simple terms why a given sampling method may be unsatisfactory (knowledge Pages 79-86
of particular sampling methods, such as quota or stratified sampling, is not required, but
candidates should have an elementary understanding of the use of random numbers in
producing random samples)
• Recognise that a sample mean can be regarded as a random variable, and use the Pages 86-94
facts that E(X̄) = μ and that Var(X̄) = σ²/n
• Use the fact that X̄ has a normal distribution if X has a normal distribution Pages 94-96
• Use the Central Limit Theorem where appropriate Pages 96-100
• Calculate unbiased estimates of the population mean and variance from a sample, using Pages 109-116
either raw or summarised data (only a simple understanding of the term 'unbiased' is
required)
• Determine and interpret a confidence interval for a population mean in cases where the Pages 116-121
population is normally distributed with known variance or where a large sample is used
• Determine, from a large sample, an approximate confidence interval for a population Pages 121-124
proportion
5. Hypothesis tests
• Understand the nature of a hypothesis test, the difference between one-tail and two-tail Pages 128-137
tests, and the terms null hypothesis, alternative hypothesis, significance level, rejection
region (or critical region), acceptance region and test statistic
• Formulate hypotheses and carry out a hypothesis test in the context of a single Pages 142-149
observation from a population which has a binomial or Poisson distribution, using
– direct evaluation of probabilities
– a normal approximation to the binomial or the Poisson distribution, where
appropriate
• Formulate hypotheses and carry out a hypothesis test concerning the population mean Pages 153-161
in cases where the population is normally distributed with known variance or where a
large sample is used
• Understand the terms Type I error and Type II error in relation to hypothesis tests Pages 137-139
• Calculate the probabilities of making Type I and Type II errors in specific situations Pages 140-141
involving tests based on a normal distribution or direct evaluation of binomial or Poisson
probabilities

The Poisson distribution
The Poisson distribution can be used to model, at least
approximately, a large number of
natural and social phenomena. You might not
expect the number of photons arriving at a
cosmic ray observatory, the number of claims
made to an insurance company, the number
of earthquakes of a given intensity and the
number of atoms decaying in a radioactive
material to have much in common, but
they can all be modelled by this distribution.
The photo is of VERITAS (the Very Energetic
Radiation Imaging Telescope Array System) in Arizona,
which is helping to shape our understanding of
how subatomic particles are accelerated to
extremely high energies.
Objectives
After studying this chapter you should be able to:
• Calculate probabilities for the distribution Po(λ).
• Use the fact that if X ~ Po(λ) then the mean and variance of X are each equal to λ.
• Understand the relevance of the Poisson distribution to the distribution of random events,
and use the Poisson distribution as a model.
Before you start
You should know how to:
1. Use your calculator to work out values of exponential functions, e.g.
Find the value of $e^{-2.5}$.
$e^{-2.5} = 0.0821$ (3 s.f.)
2. Substitute values into more complex formulae, e.g.
Find the value of $p = \dfrac{e^{-2.5} \times 2.5^4}{4!}$.
$p = \dfrac{0.0821 \times 39.06}{24} = 0.134$ (3 s.f.)
Skills check:
1. Find the value of:
a) e
b) e2
2. Find the value of p =

1.1 Introducing the Poisson distribution
Think about the following random variables:
• The number of dandelions in a square metre of a piece of open ground.
• The number of errors in a page of a typed manuscript.
• The number of cars passing a point on a motorway in a minute.
• The number of telephone calls received by a company switchboard in half an hour.
• The number of lightning strikes in an area over a year.
Do they have any features in common? Does any one of them stand out
as being rather different?
The behaviour in five of these photos follows the Poisson distribution.
Formally, the conditions are that
i) events occur at random
ii) events occur independently of one another
iii) the average rate of occurrences remains constant
iv) there is zero probability of simultaneous occurrences.
The Poisson distribution is defined as
$$P(X = r) = \frac{e^{-\lambda}\lambda^r}{r!} \quad \text{for } r = 0, 1, 2, \ldots$$
You need to have a value for λ in order for this to make sense, so there is
a family of Poisson distributions, but there is only one parameter, λ, which
is the mean number of occurrences in the time period (or length, area or
volume) being considered.
You can write the Poisson distribution as X ~ Po(λ).
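The definition translates directly into a few lines of code. The sketch below is only an illustration and is not part of the course: `poisson_pmf` is a hypothetical helper name, and it relies on Python's standard `math` module for `exp` and `factorial`. It reproduces the value found in Example 1.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """Return P(X = r) for X ~ Po(lam), i.e. e^(-lam) * lam^r / r!."""
    if r < 0:
        return 0.0  # a Poisson count can never be negative
    return math.exp(-lam) * lam**r / math.factorial(r)


# Example 1: X ~ Po(3), find P(X = 2)
p = poisson_pmf(2, 3)
print(f"P(X = 2) = {p:.3f}")  # P(X = 2) = 0.224
```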
Example 1
If X ~ Po(3) find P(X = 2).
$$P(X = 2) = \frac{e^{-3}3^2}{2!} = 0.224 \text{ (3 s.f.)}$$
Example 2
The number of cars passing a point on a road during a 5-minute period
may be modelled by the Poisson distribution with parameter 4.
Find the probability that in a 5-minute period
i) 2 cars go past ii) fewer than 3 cars go past.
X ~ Po(4)
i) $P(X = 2) = \dfrac{e^{-4}4^2}{2!} = 0.14652\ldots = 0.147$ (3 s.f.)
ii) $P(X = 0) = \dfrac{e^{-4}4^0}{0!} = 0.01831\ldots = 0.0183$ (3 s.f.) Remember that 0! = 1 and $a^0 = 1$.
$P(X = 1) = \dfrac{e^{-4}4^1}{1!} = 0.07326\ldots = 0.0733$ (3 s.f.)
$P(X < 3) = 0.01831\ldots + 0.07326\ldots + 0.14652\ldots = 0.238$ (3 s.f.)
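The same calculation can be checked numerically. This sketch again assumes a hypothetical `poisson_pmf` helper built on Python's `math` module. It keeps full precision until the final rounding, in the same way that the worked solution carries 0.01831... rather than 0.0183 into the sum.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


lam = 4  # mean number of cars in a 5-minute period

p_two = poisson_pmf(2, lam)
# "Fewer than 3" means X = 0, 1 or 2
p_fewer_than_three = sum(poisson_pmf(r, lam) for r in range(3))

print(f"P(X = 2) = {p_two:.3f}")               # P(X = 2) = 0.147
print(f"P(X < 3) = {p_fewer_than_three:.3f}")  # P(X < 3) = 0.238
```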
Mathematical note: It is not immediately obvious from the
mathematics you cover in this course that the form of the Poisson
distribution constitutes a probability distribution. Remember from
S1 Chapter 5 that this requires all probabilities to be non-negative (which
they obviously all are here because $e^{-\lambda} > 0$ for any value of λ) but
also that the sum of the probabilities is 1.
$P(X = r) = \dfrac{e^{-\lambda}\lambda^r}{r!}$ for r = 0, 1, 2, 3, ... is a probability distribution
because
$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$$
so that
$$\sum_{r=0}^{\infty} \frac{e^{-\lambda}\lambda^r}{r!} = e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right) = e^{-\lambda}e^{\lambda} = 1.$$
The series for $e^{\lambda}$ is an example of an advanced topic in Pure Maths, where functions like exponentials,
logarithms and the trigonometric functions have (infinite) power
series forms. Truncated forms of infinite series like these are one way that
electronic calculators can obtain values of these functions.
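A quick numerical check of this argument: adding up more and more Poisson probabilities should give partial sums that creep up towards 1. The helper name `poisson_partial_sum` is hypothetical, and λ = 4 is only an illustrative value.

```python
import math


def poisson_partial_sum(lam: float, n_terms: int) -> float:
    """Return P(X = 0) + P(X = 1) + ... + P(X = n_terms - 1) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(n_terms))


# The partial sums increase towards 1 as more terms of the series are included
for n in (3, 6, 12, 24):
    print(n, poisson_partial_sum(4.0, n))
```

Each extra term adds a positive amount, and the total can never exceed 1 because the complete series equals $e^{-\lambda}e^{\lambda}$.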
Exercise 1.1
1. If X ~ Po(2) find i) P(X = 1) ii) P(X = 2) iii) P(X = 3).
2. If X ~ Po(1.8) find i) P(X = 0) ii) P(X = 1) iii) P(X = 2).
3. If X ~ Po(5.3) find i) P(X = 3) ii) P(X = 5) iii) P(X = 7).
4. If X ~ Po(0.4) find i) P(X = 0) ii) P(X = 1) iii) P(X = 2).
5. If X ~ Po(2.15) find i) P(X = 2) ii) P(X = 4) iii) P(X = 6).
6. If X ~ Po(3.2) find i) P(X = 2) ii) P(X < 2) iii) P(X > 2).
7. The number of telephone calls arriving at an office switchboard in a
5-minute period may be modelled by a Poisson distribution with
parameter 3.2. Find the probability that in a 5-minute period
a) exactly 2 calls are received
b) more than 2 calls are received.
8. The number of accidents which occur on a particular stretch of road in
a day may be modelled by a Poisson distribution with parameter 1.3.
Find the probability that on a particular day
a) exactly 2 accidents occur on that stretch of road
b) fewer than 2 accidents occur.
1.2 The role of the parameter of the Poisson distribution
‘The mean number of events in an interval of time or space is proportional
to the size of the interval.
Example 2 in Section 1.1 looked at the number of cars passing a point on
a road during a 5-minute period. This may be modelled by the Poisson
distribution with parameter 4.
In this case, the number of cars passing that point in a 20-minute
period may be modelled by the Poisson distribution with parameter 16,
and in a 1-minute period may be modelled by the Poisson distribution
with parameter 0.8.
If the conditions for a Poisson distribution are satisfied in a given period,
they are also satisfied for periods of different length.
Example 3
The number of accidents in a week on a stretch of road is known to follow
a Poisson distribution with mean 2.1.
Find the probability that
a) in a given week there is 1 accident
b) in a two-week period there are 2 accidents
c) there is 1 accident in each of two successive weeks.
a) In one week, the number of accidents follows a Po(2.1) distribution,
so the probability of 1 accident $= \dfrac{e^{-2.1}2.1^1}{1!} = 0.257$ (3 s.f.).
b) In two weeks, the number of accidents follows a Po(4.2) distribution,
so the probability of 2 accidents $= \dfrac{e^{-4.2}4.2^2}{2!} = 0.132$ (3 s.f.).
c) This cannot be done directly as a Poisson distribution, since it says what has to
happen in each of two time periods. However, each week on its own is the situation
considered in part a), and the two weeks are independent.
So the probability this happens in two successive weeks is
$\left(\dfrac{e^{-2.1}2.1^1}{1!}\right)^2 = 0.0661$ (3 s.f.).
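The three parts of Example 3 can be reproduced with a short sketch (the `poisson_pmf` helper is hypothetical, written with Python's `math` module). The last line anticipates the result in Chapter 4 that the sum of independent Poisson variables is Poisson: the part b) answer equals the total over every way of splitting 2 accidents between the two weeks, whereas part c) is only the 1 + 1 split.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


rate_per_week = 2.1

a = poisson_pmf(1, rate_per_week)       # a) 1 accident in one week: Po(2.1)
b = poisson_pmf(2, 2 * rate_per_week)   # b) 2 accidents in two weeks: Po(4.2)
c = poisson_pmf(1, rate_per_week) ** 2  # c) 1 in each of two independent weeks

# b) also includes the splits 0 + 2 and 2 + 0, so it equals the sum over all splits
p = [poisson_pmf(r, rate_per_week) for r in range(3)]
b_by_splits = p[0] * p[2] + p[1] * p[1] + p[2] * p[0]

print(f"a) {a:.3f}  b) {b:.3f}  c) {c:.4f}  b by splits: {b_by_splits:.3f}")
```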
Example 4
The number of flaws in a metre length of dress material is known to
follow a Poisson distribution with parameter 0.4.
Find the probabilities that
a) there are no flaws in a 1 metre length
b) there is 1 flaw in a 3 metre length
c) there is 1 flaw in a piece of material which is half a metre long.
a) $X \sim \text{Po}(0.4) \Rightarrow P(X = 0) = \dfrac{e^{-0.4}0.4^0}{0!} = 0.670$ (3 s.f.).
b) $Y \sim \text{Po}(1.2) \Rightarrow P(Y = 1) = \dfrac{e^{-1.2}1.2^1}{1!} = 0.361$ (3 s.f.).
c) $Z \sim \text{Po}(0.2) \Rightarrow P(Z = 1) = \dfrac{e^{-0.2}0.2^1}{1!} = 0.164$ (3 s.f.).
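The key step in Example 4 is scaling the parameter with the length of material. A minimal sketch of that idea, assuming a hypothetical `poisson_pmf` helper; the parameter for each length is the rate per metre multiplied by the length.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


flaws_per_metre = 0.4
results = {}
# (length in metres, number of flaws asked about) for parts a), b) and c)
for length, flaws in [(1, 0), (3, 1), (0.5, 1)]:
    lam = flaws_per_metre * length  # the parameter is proportional to the length
    results[length] = poisson_pmf(flaws, lam)
    print(f"{length} m: Po({lam:g}), P(X = {flaws}) = {results[length]:.3f}")
```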
Exercise 1.2
1. The number of telephone calls arriving at an office switchboard in a
5-minute period may be modelled by a Poisson distribution with
parameter 1.4. Find the probability that in a 10-minute period
a) exactly 2 calls are received
b) more than 2 calls are received.
2. The number of accidents which occur on a particular stretch of road in
a day may be modelled by a Poisson distribution with parameter 0.4.
Find the probability that during a week (7 days)
a) exactly 2 accidents occur on that stretch of road
b) fewer than 2 accidents occur.
3. The number of letters delivered to a house on a day may be modelled by
a Poisson distribution with parameter 0.8.
a) Find the probability that there are 2 letters delivered on a particular day.
b) The home owner is away for 3 days. Find the probability that there
will be more than 2 letters waiting for him when he gets back.
4. The number of errors on a page of a booklet can be modelled by a
Poisson distribution with parameter 0.2.
a) Find the probability that there is exactly 1 error on a given page.
b) A section of the booklet has 7 pages. Find the probability that there
are no more than 2 errors in the section.
c) The booklet has 25 pages altogether. Find the probability that the
booklet contains exactly 6 errors altogether.
5. The number of people calling a car breakdown service can be modelled by
a Poisson distribution, and the service has an average of 6 calls per hour.
Find the probability that in a half-hour period
a) exactly 2 calls are received
b) more than 2 calls are received.
1.3 The recurrence relation for the Poisson distribution
You can calculate probabilities for a Poisson distribution in sequence using
a recurrence relation.
Example 5
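If X ~ Po(λ), write down P(X = k) and P(X = k + 1), and hence find the
relationship between them.
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \qquad P(X = k+1) = \frac{e^{-\lambda}\lambda^{k+1}}{(k+1)!} = \frac{\lambda}{k+1} \times \frac{e^{-\lambda}\lambda^k}{k!}$$
The general relationship is $P(X = k+1) = \dfrac{\lambda}{k+1} \times P(X = k)$.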
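The recurrence gives a convenient way to build a whole table of probabilities from a single exponential, using only multiplications afterwards. A minimal sketch in Python, where `poisson_probs` is a hypothetical helper; λ = 2.5 is chosen to match question 1 of Exercise 1.3, and the table is checked against the direct formula.

```python
import math


def poisson_probs(lam: float, k_max: int) -> list[float]:
    """Return [P(X = 0), ..., P(X = k_max)] for X ~ Po(lam), using the
    recurrence P(X = k + 1) = lam / (k + 1) * P(X = k) with P(X = 0) = e^(-lam)."""
    probs = [math.exp(-lam)]
    for k in range(k_max):
        probs.append(probs[-1] * lam / (k + 1))
    return probs


lam = 2.5
from_recurrence = poisson_probs(lam, 6)
direct = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(7)]
for k, (p_rec, p_dir) in enumerate(zip(from_recurrence, direct)):
    print(k, round(p_rec, 4), round(p_dir, 4))
```

Building the table this way avoids computing large powers and factorials separately, which is also what makes the recurrence handy on a calculator.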
The graphs on the next page show the probability distributions for different values of λ and
what effect changing the value of λ has on the shape of a particular Poisson distribution.
[Graph: Poisson, λ = 1.2; probability P(X = x) against x]
All Poisson variables have a sample space which is all of the non-negative integers.
However, when λ is relatively low, the probabilities tail off very quickly.
The initial probability that X = 0 is multiplied by 1.2/1 = 1.2, then 1.2/2 = 0.6,
then 0.4, 0.3, ... and so the mode of X is 1.

[Graph: Poisson, λ = 2.8; probability P(X = x) against x]
Here λ is larger than in the previous graph and the peak has moved across to the right.
For values of x which are less than λ the probability increases, but once x is greater
than λ the probabilities start to decrease.
More values of x have a noticeable probability, so the highest individual probability
is not as large as it was in the previous graph, and the distribution is more spread out.

[Graph: Poisson, λ = 4; probability P(X = x) against x]
What happens when λ is an integer?
Here $P(X = 4) = P(X = 3) \times \frac{4}{4} = P(X = 3)$ and the distribution has two modes, at 3 and 4.
Generally, the mode of the Po(λ) distribution is at the integer below λ when λ is
not an integer, and there are two modes (at λ and λ − 1) when λ is an integer.
λ < 1 is a special case.

[Graph: Poisson, λ = 0.8; probability P(X = x) against x]
Here, even the first time the recurrence relation is used, you are multiplying by a
number less than 1 (λ/1 = 0.8), so the mode will be 0 and the probability distribution
is strictly decreasing for all values of x.

The general forms for the probabilities of 0 and 1 for a Poisson distribution are
$P(X = 0) = e^{-\lambda}$ and $P(X = 1) = \lambda e^{-\lambda}$.
Example 7
X ~ Po(5.8). State the mode of X.
Since 5.8 is not an integer, the mode is the integer below it,
i.e. the mode is 5.
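The mode rule can be checked by brute force: build the probabilities with the recurrence and pick out the largest. This is a sketch under the assumption that λ is moderate (say below 50), so that the peak lies well inside the 0 to 100 range searched; `poisson_modes` is a hypothetical helper name.

```python
import math


def poisson_modes(lam: float, k_max: int = 100) -> list[int]:
    """Return every k in 0..k_max at which P(X = k) is largest, for X ~ Po(lam).

    Probabilities come from the recurrence P(X = k + 1) = lam / (k + 1) * P(X = k);
    k_max should be comfortably larger than lam so that the peak is in range."""
    probs = [math.exp(-lam)]
    for k in range(k_max):
        probs.append(probs[-1] * lam / (k + 1))
    largest = max(probs)
    # isclose guards against rounding when two neighbouring probabilities are equal
    return [k for k, p in enumerate(probs) if math.isclose(p, largest, rel_tol=1e-9)]


for lam in (0.8, 1.2, 2.8, 4, 5.8):
    print(lam, poisson_modes(lam))
```

The results agree with the rule in the text: the integer below λ when λ is not an integer, and the two modes λ − 1 and λ when it is.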
Exercise 1.3
1. X ~ Po(2.5)
a) Write down an expression for P(X = 4) in terms of P(X = 3).
b) If P(X = 3) = 0.214, calculate the value of your expression in part a).
c) Calculate P(X = 4) directly and check it is the same as your answer to b).
d) What is the mode of X?
2. X ~ Po(5)
a) Write down an expression for P(X = 5) in terms of P(X = 4).
b) Explain why X has two modes at 4 and 5.
3. X ~ Po(λ) and P(X = 4) = 1.2 × P(X = 3).
a) Find the value of λ.
b) What is the mode of X?
1.4 Mean and variance of the Poisson distribution
If X ~ Po(λ), then E(X) = λ and Var(X) = λ, so the standard deviation is σ = √λ.
A special property of the Poisson distribution is that the mean and variance
are always equal.
Example 8
The number of calls arriving at a company's switchboard in a 10-minute period can be modelled
by a Poisson distribution with parameter 3.5.
Give the mean and variance of the number of calls which arrive in
i) a 10-minute period ii) an hour iii) a 5-minute period.
i) Here λ = 3.5 so the mean and variance will both be 3.5.
ii) Here λ = 21 (= 3.5 × 6) so the mean and variance will both be 21.
iii) Here λ = 1.75 (= 3.5 ÷ 2) so the mean and variance will both be 1.75.
Example 9
A dual carriageway has one lane blocked off because of roadworks.
The number of cars passing a point in a road in a number of 1-minute intervals is summarised in
the table.
Number of cars | 0 | 1 | 2 | 3 | 4 | 5 | 6
Frequency | 3 | 4 | 4 | 25 | 30 | 3 | 1
a) Calculate the mean and variance of the number of cars passing in 1-minute intervals.
b) Is the Poisson likely to provide an adequate model for the distribution of the number of cars
passing in 1-minute intervals?
a) Σf = 70, Σxf = 228, Σx²f = 836, so
$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{228}{70} = 3.26 \text{ (3 s.f.)}$$
and
$$\text{Var}(X) = \frac{\sum x^2 f}{\sum f} - \bar{x}^2 = \frac{836}{70} - \left(\frac{228}{70}\right)^2 = 1.33 \text{ (3 s.f.)}$$
b) The mean and variance are not numerically close, so it is unlikely the Poisson will be an
adequate model. With only one lane open for traffic, overtaking cannot happen on this stretch
of the road and the numbers of cars will be much more consistent than would happen in
normal circumstances; hence the variance is much lower than would be expected if the
Poisson model did apply.
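The summary statistics in Example 9 can be reproduced from the frequency table with a few lines of Python. This sketch uses the same divisor Σf as the worked solution; the variable names are illustrative only.

```python
cars = [0, 1, 2, 3, 4, 5, 6]
freq = [3, 4, 4, 25, 30, 3, 1]

n = sum(freq)                                         # sum of f
sum_xf = sum(x * f for x, f in zip(cars, freq))       # sum of x*f
sum_x2f = sum(x * x * f for x, f in zip(cars, freq))  # sum of x^2*f

mean = sum_xf / n
variance = sum_x2f / n - mean**2  # divisor n, as in the worked solution

print(n, sum_xf, sum_x2f)                               # 70 228 836
print(f"mean = {mean:.3g}, variance = {variance:.3g}")  # mean = 3.26, variance = 1.33
```

For a Poisson model the two printed values should be roughly equal; here the variance is well under half the mean.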
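Derivation of mean and variance of the Poisson distribution
You must be able to use these results but are not required to be able to prove
them; they are included here for completeness, and as a nice manipulation
using the power series expression for the exponential function.
$$X \sim \text{Po}(\lambda) \iff P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
$$E(X) = \sum_{k=0}^{\infty} k \times \frac{e^{-\lambda}\lambda^k}{k!} = \sum_{k=1}^{\infty} \frac{e^{-\lambda}\lambda^k}{(k-1)!} \quad \text{(cancelling } k\text{, after discarding the zero case)}$$
$$= \lambda e^{-\lambda}\sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}e^{\lambda} = \lambda$$
Similarly, writing $k^2 = k(k-1) + k$,
$$E(X^2) = \sum_{k=0}^{\infty} k^2 \times \frac{e^{-\lambda}\lambda^k}{k!} = \sum_{k=2}^{\infty} \frac{e^{-\lambda}\lambda^k}{(k-2)!} + \sum_{k=1}^{\infty} \frac{e^{-\lambda}\lambda^k}{(k-1)!} = \lambda^2 e^{-\lambda}e^{\lambda} + \lambda = \lambda^2 + \lambda$$
Then $\text{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.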
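The derivation can be checked numerically by summing $k \times P(X = k)$ and $k^2 \times P(X = k)$ over enough terms. This is a sketch, not a proof: `poisson_mean_and_variance` is a hypothetical helper, and the sum is truncated at k = 200, which is assumed to be far beyond where the probabilities become negligible for the λ values tried.

```python
import math


def poisson_mean_and_variance(lam: float, k_max: int = 200) -> tuple[float, float]:
    """Approximate E(X) and Var(X) for X ~ Po(lam) by summing over k = 0..k_max.
    The terms beyond k_max are negligible provided k_max is much larger than lam."""
    probs = [math.exp(-lam)]
    for k in range(k_max):
        probs.append(probs[-1] * lam / (k + 1))  # recurrence from Section 1.3
    mean = sum(k * p for k, p in enumerate(probs))
    mean_of_squares = sum(k * k * p for k, p in enumerate(probs))  # E(X^2)
    return mean, mean_of_squares - mean**2


for lam in (0.8, 3.5, 21):
    m, v = poisson_mean_and_variance(lam)
    print(lam, round(m, 6), round(v, 6))
```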
Exercise 1.4
1. If X ~ Po(3.2) find i) E(X)
ii) Var(X).
2. If X ~ Po(49) find the mean and standard deviation of X.
3. X ~ Po(3.6)
a) Find the mean and standard deviation of X.
b) Find P(X > μ), where μ = E(X).
c) Find P(X > μ + 2σ), where σ is the standard deviation of X.
d) Find P(X < pt 20).
4. X is the number of telephone calls arriving at an office switchboard in a 10-minute
period. X may be modelled by a Poisson distribution with parameter 6.
a) Find the mean and standard deviation of X.
b) Find P(X > μ), where μ = E(X).
c) Find P(X > μ + 2σ), where σ is the standard deviation of X.
d) Find P(X
P(X > 3) = 1 − P(X ≤ 3) = 1 − 0.8571 = 0.1429.
b) In a 20-minute period ($\frac{1}{3}$ of an hour), the mean number of cyclists will be $\frac{1}{3} \times 2 = \frac{2}{3}$, so
P(exactly one) $= e^{-2/3} \times \frac{2}{3} = 0.342$ (3 s.f.).
c) The situation is that of a binomial distribution: there are 6 'trials',
the number of cyclists in each hour is independent of the other
periods, and the probability of more than 3 in an hour remains
the same for all six one-hour periods. If Y = the number of hours
in a 6-hour period in which more than 3 cyclists pass by, then
Y ~ B(6, 0.1429) (using the probability calculated in part a) ii)), and
$P(Y = 1) = 6 \times 0.1429^1 \times (1 - 0.1429)^5 = 0.397$ (3 s.f.).
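The start of this example is not reproduced above, so the sketch below makes a loudly labelled assumption: an average of 2 cyclists per hour. That rate reproduces the figures used in the solution (P(X ≤ 3) = 0.8571 for one hour, and 0.342 for one cyclist in 20 minutes). The code combines a Poisson model for each hour with a binomial count of hours; `poisson_pmf` is a hypothetical helper, and `math.comb` is the standard-library binomial coefficient.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


# ASSUMPTION: 2 cyclists per hour on average (inferred from P(X <= 3) = 0.8571)
rate_per_hour = 2

# a) ii) more than 3 cyclists in an hour
p_more_than_3 = 1 - sum(poisson_pmf(r, rate_per_hour) for r in range(4))

# b) exactly one cyclist in 20 minutes: the parameter is a third of the hourly rate
p_one_in_20_min = poisson_pmf(1, rate_per_hour / 3)

# c) Y ~ B(6, p) counts the six one-hour periods with more than 3 cyclists
p = p_more_than_3
p_exactly_once = math.comb(6, 1) * p * (1 - p) ** 5

print(f"{p_more_than_3:.4f} {p_one_in_20_min:.3f} {p_exactly_once:.3f}")  # 0.1429 0.342 0.397
```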
Example 12
At a certain harbour the number of boats arriving in a 15-minute period can be modelled by a
Poisson distribution with parameter 1.5.
a) Find the probability that exactly six boats will arrive in a period of an hour.
b) Given that exactly six boats arrive in a period of an hour, find the conditional probability that
twice as many arrive in the second half hour as arrive in the first half hour.
a) In an hour the average number of boats arriving is 6, so
P(6 boats arrive in an hour) $= \dfrac{e^{-6}6^6}{6!} = 0.161$.
b) If twice as many arrive in the second half hour, then there need to be 2 in the first half hour
and then 4 in the next half hour. In a half hour the average number of boats arriving is 3, so
P(2 boats arrive in half hour, then 4 boats in next half hour)
$= \dfrac{e^{-3}3^2}{2!} \times \dfrac{e^{-3}3^4}{4!} = 0.224 \times 0.168 = 0.0376$.
Then the conditional probability is
P(2 then 4 in the two half hours | 6 boats arrive in an hour) $= \dfrac{0.0376\ldots}{0.161\ldots} = 0.234$.
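A short sketch of Example 12, assuming a hypothetical `poisson_pmf` helper. The parameter is scaled from the 15-minute rate to a half hour and to an hour, and the two half hours are treated as independent, non-overlapping periods.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


lam_15_min = 1.5
lam_half_hour = 2 * lam_15_min  # 3.0
lam_hour = 4 * lam_15_min       # 6.0

p_six = poisson_pmf(6, lam_hour)
# 2 in the first half hour and 4 in the second (independent, non-overlapping periods)
p_two_then_four = poisson_pmf(2, lam_half_hour) * poisson_pmf(4, lam_half_hour)
p_conditional = p_two_then_four / p_six

print(f"{p_six:.3f} {p_two_then_four:.4f} {p_conditional:.3f}")  # 0.161 0.0376 0.234
```

In the ratio, the exponential factors cancel and the conditional probability works out to exactly $\binom{6}{2}/2^6 = 15/64 = 0.234375$. Given six arrivals in the hour, each boat behaves like an independent 50:50 choice between the two half hours.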
Exercise 1.5
1. For the following random variables state whether they can be modelled
by a Poisson distribution.
If they can, give the value of the parameter λ; if they cannot, explain why.
a) The average number of cars per minute passing a point on a road is 12.
The traffic is flowing freely.
X = number of cars which pass in a 15 second period.
b) The average number of cars per minute passing a point on a road is 14.
There are roadworks blocking one lane of the road.
X = number of cars which pass in a 30 second period.
c) Amelie normally gets letters at an average rate of 1.5 per day.
X = number of letters Amelie gets on December 22nd.
d) A petrol station which stays open all the time gets an average of
832 customers in a 24 hour time period.
X = number of customers in a quarter of an hour at the petrol station.
e) An A&E department in a hospital treats 32 patients an hour on average.
X = number of patients treated between 5 pm and 7 pm on a Friday evening.
2. For the following situations state what assumptions are needed if a
Poisson distribution is to be used to model them, and give the value
of λ that would be used.
You are not expected to do any calculations!
a) On average defects in a roll of cloth occur at a rate of 0.2 per metre.
How many defects are there in a roll which is 8m long?
b) On average defects in a roll of cloth occur once in 2 metres.
How many defects are there in a roll which is 8m long?
c) A small shop averages 8 customers per hour.
How many customers does it have in 20 minutes?
3. An explorer thinks that the number of mosquito bites he gets when he is
in the jungle will follow a Poisson distribution.
The explorer records the number of mosquito bites he gets in the jungle
during a number of hour-long periods, and the results are summarised
in the table.
Number of bites | 0 | 1 | 2 | 3 | 4 | 5 | 6 [57
Frequency 3{7{solteoelo6{[slilo
a) Calculate the mean and variance of the number of bites the explorer gets
in an hour in the jungle.
b) Do you think the Poisson is a good model for the number of bites
the explorer gets in an hour in the jungle?
4. The number of emails Serena gets can be modelled by a Poisson distribution
with a mean rate of 1.5 per hour.
a) i) What is the probability that Serena gets no emails between 4 pm and 5 pm?
ii) What is the probability that Serena gets more than 2 emails
between 4 pm and 5 pm?
iii) What is the probability that Serena gets one email between 6 pm and 6.20 pm?
b) What is the probability that Serena gets more than 2 emails in
an hour exactly twice in a 5-hour period?
c) Would it be sensible to use the Poisson distribution to find the
probability that Serena gets no emails between 4 am and 5 am?
5. The number of lightning strikes in the neighbourhood of a campsite in a
week can be modelled by a Poisson distribution with parameter 1.5.
a) Find the probability that there is exactly one lightning strike in the
neighbourhood in a given week.
b) Alejandra spends three weeks at the campsite. Find the probability that there
are exactly three lightning strikes in the neighbourhood during her holiday.
c) Given that the neighbourhood has exactly three lightning strikes during her
holiday, find the conditional probability that each week has exactly one strike.
Summary exercise 1
6. If X ~ Po(1.45), find
a) P(X = 2)
b) P(X < μ), where μ = E(X)
c) P(|X − μ| < σ), where σ is the
standard deviation of X.
7. An urban safety officer thinks that the
number of traffic accidents in an area will
follow a Poisson distribution.
The officer records the number of accidents
in the area each week over a period of several
months, and the results are summarised in
the table.
Number of
accidents |°| |?
4/5/16
Frequency |5/1[1|3[5|2/ 1] 0
a) Calculate the mean and variance of the
number of accidents in the area in a
week.
b) Do you think the Poisson is a good
model for the number of accidents in the
area in a week?
8. The number of errors on a page of a book
can be modelled by a Poisson distribution
with parameter 0.15.
a) Find the probability that there is exactly
1 error on a given page.
b) A chapter of the book has 20 pages.
Find the probability that there are no
more than 2 errors in the chapter.
c) What is the most likely number of errors
in the chapter?
EXAM-STYLE QUESTIONS
9. The number of errors on a page of the first
proofs of a book can be modelled by a
Poisson distribution with parameter 0.6.
a) Find the probability that a page has
exactly one error on it.
b) Find the probability that a double page
spread has exactly two errors on it.
c) Given that a double page spread
has exactly two errors on it, find the
conditional probability that each page
has exactly one error on it.
10. A shop sells spades. The demand for spades
follows a Poisson distribution with mean
2.7 per week.
a) Find the probability that the demand is
exactly 2 spades in any one week.
b) The shop has 4 spades in stock at
the beginning of a week. Find the
probability that this will be enough to
satisfy the demand for spades in that
week.
c) Given instead that there are n spades in
stock, find, by trial and error, the least
value of n for which the probability of
not being able to satisfy the demand for
spades in that week is less than 0.1.
11. The random variable X has the distribution
Po(2.5). The random variable Y is defined
by Y = 2X.
a) Find the mean and variance of Y.
b) Give a reason why the variable Y does
not have a Poisson distribution.
12. Cars travelling south on a rural road
pass a particular point randomly and
independently at an average rate of 2 cars
every three minutes.
a) Find the probability that exactly 3 cars
travel south past that point in a 5-minute
period.
Cars travelling north on that road pass the
same point randomly and independently at
an average rate of 1 car each minute.
b) Find the probability that a total of fewer
than 4 cars pass that point in a 3-minute
period.
13. The number of lightning strikes at a
particular place in a 28-day period has a
Poisson distribution with mean 1.2.
a) Find the probability that at most 2
lightning strikes will be recorded at that
place in a 42-day period.
b) Find, in days, correct to 1 decimal place,
the longest time period for which the
probability that no lightning strikes will
be recorded at that place is at least 0.9.

Chapter summary
The Poisson distribution is defined as
$P(X = r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$ for $r = 0, 1, 2, 3, \ldots$
• The Poisson distribution has a single parameter, λ.
• The Poisson distribution is often written as X ~ Po(λ).
• If X ~ Po(λ), then E(X) = λ and Var(X) = σ² = λ, so the standard deviation is σ = √λ.
• The conditions for the Poisson are
i) events occur at random
ii) events occur independently of one another
iii) the average rate of occurrences remains constant
iv) there is zero probability of simultaneous occurrences.
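As a quick numerical check of the summary (a sketch, not part of the original text), the Python below evaluates the Poisson formula directly and sums r·P(X = r) and r²·P(X = r) to confirm that the mean and the variance both come out as λ. The function names `poisson_pmf` and `poisson_mean_var` are illustrative, and because the outcome space has no upper limit the sums are cut off at an assumed r = 100, which leaves a negligible tail for small λ.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam), using e^(-lam) * lam^r / r!."""
    return math.exp(-lam) * lam**r / math.factorial(r)

def poisson_mean_var(lam, r_max=100):
    """Sum r*P(X = r) and r^2*P(X = r) for r = 0..r_max.

    The Poisson outcome space has no upper limit, so the sums are cut off
    at r_max (an assumed value that leaves a negligible tail for small lam).
    """
    probs = [poisson_pmf(r, lam) for r in range(r_max + 1)]
    mean = sum(r * p for r, p in enumerate(probs))
    var = sum(r * r * p for r, p in enumerate(probs)) - mean**2
    return mean, var

mean, var = poisson_mean_var(1.45)
print(f"P(X = 2) = {poisson_pmf(2, 1.45):.4f}")
print(f"mean = {mean:.6f}, variance = {var:.6f}")  # both should be close to 1.45
```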
Approximations involving
the Poisson distribution
The Poisson provides a good approximation
to binomial distributions where n is large
under certain conditions.
For example, the number of genetic
mutations in a stretch of DNA can be
modelled well by the Poisson distribution;
there is a lot of work currently being
done to understand the processes involved
in genetic mutations in both the plant
and animal domains, with the possibility
of significant medical advances in the
treatment of diseases like cancer and
Parkinson's.
Objectives
After studying this chapter you should be able to:
• Use the Poisson distribution as an approximation to the binomial distribution where
appropriate (n > 50 and np < 5, approximately).
• Use the normal distribution, with continuity correction, as an approximation to the Poisson
distribution where appropriate (λ > 15, approximately).
Before you start
You should know how to:
1. Calculate probabilities using the binomial distribution, e.g.
X ~ B(10, 0.3). Find P(X = 2).
$P(X = 2) = \binom{10}{2}(0.3)^{2}(0.7)^{8} = 0.233$ (3 s.f.)
2. Calculate probabilities using the normal distribution, e.g.
X ~ N(40, 15). Find P(X < 44.2).
$P(X < 44.2) = P\left(Z < \dfrac{44.2 - 40}{\sqrt{15}}\right) = P(Z < 1.084) = 0.861$ (3 s.f.)
Skills check:
1. X ~ B(40, 0.03). Find P(X < 2).
2. X ~ N(20, 20). Find P(X < 17.1).
2.1 Poisson as an approximation to the binomial
In the last chapter of S1 you met the use of the normal distribution as
an approximation to the binomial distribution, provided certain conditions
were satisfied by the parameters n and p. Here we meet a second
approximation to the binomial.
If X ~ B(n, p) with n large (n > 50) and p close to 0 (np < 5), then
X ~ approximately Po(λ) with λ = np.
Here are some examples where the binomial and Poisson distributions
have the same mean:
Figure: Poisson (mean = 4) and binomial (n = 10, p = 0.4), left; Poisson (mean = 4) and binomial (n = 40, p = 0.1), right. Bar charts of the probabilities for x = 0 to 10.
Left: The mean of the binomial is 4 and the variance is 2.4 (remember that the variance of the Poisson is 4). The two sets of probabilities are not particularly similar.
Right: The variance of the binomial is now 3.6. The agreement between the two sets of probabilities is now pretty strong.
Figure: Poisson (mean = 4) and binomial (n = 400, p = 0.01), left; Poisson (mean = 4) and binomial (n = 4000, p = 0.001), right. Bar charts of the probabilities for x = 0 to 10.
These two graphs both seem to show the binomial and Poisson to be exactly the same, but they are
not: while you cannot see any difference on this scale, there are differences between the
binomial and the Poisson in both cases, and the differences in the last case are much smaller than
the differences when n = 400 and p = 0.01.
There is a fundamental difference in that the Poisson outcome space has no
upper limit, whereas the binomial is bounded by the value of n. However, when
n is large and p is small, the probabilities of high values of x are very small, so
this is not a problem (in the same way that the normal can never provide an
exact model for physical measurements like heights or weights, because the
normal distribution can take negative values while the measurements cannot).
The use of the Poisson as an approximation to the binomial improves as n
increases and as p gets smaller.
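The shrinking differences described above can be checked numerically. This sketch assumes every panel pairs the binomial with a Poisson of mean 4, so it takes p = 4/n for n = 10, 40, 400 and 4000, and it reports the largest gap between the two sets of probabilities over x = 0 to 20 (or up to n when n is smaller). The helper names are hypothetical, and `math.comb` needs Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def max_abs_difference(n, p, x_max=20):
    """Largest |P(binomial = x) - P(Poisson = x)| for x = 0..min(n, x_max), with lam = np."""
    lam = n * p
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(n, x_max) + 1))

# Each binomial is paired with a Poisson of the same mean, 4 (assumed p = 4/n).
for n in (10, 40, 400, 4000):
    p = 4 / n
    print(f"n = {n:4d}, p = {p:<5}  largest difference = {max_abs_difference(n, p):.5f}")
```

Measuring the largest single-point gap is only one way to compare the distributions; it is used here because it matches what the bar charts let you see by eye.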
Example 1
The probability that a component coming off a production line is faulty is 0.01.
a) If a sample of size 5 is taken, find the probability that exactly one of the components is faulty.
b) What is the probability that a batch of 250 of these components has more than 3 faulty
components in it?
a) If X = number of faulty components in the sample, then X ~ B(5, 0.01) and
$P(X = 1) = 5 \times 0.01 \times 0.99^{4} = 0.0480$ (3 s.f.)
b) If Y = number of faulty components in the batch, then Y ~ B(250, 0.01), which is approximately Po(2.5),
and P(Y > 3) = 1 − P(Y ≤ 3) = 1 − 0.758 = 0.242 (3 s.f.)
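The arithmetic in Example 1 can be reproduced with a short script. This is a sketch using only Python's standard `math` module (Python 3.8+ for `math.comb`); as an extra not in the original, it also computes the exact binomial answer for part b) so you can see how close the Po(2.5) approximation is. The function names are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(k + 1))

# a) X ~ B(5, 0.01): n is small, so use the binomial directly
part_a = binomial_pmf(1, 5, 0.01)

# b) Y ~ B(250, 0.01) is approximated by Po(2.5), since n > 50 and np = 2.5 < 5
approx_b = 1 - poisson_cdf(3, 250 * 0.01)
exact_b = 1 - sum(binomial_pmf(y, 250, 0.01) for y in range(4))

print(f"a) {part_a:.4f}")
print(f"b) Poisson approximation {approx_b:.4f}, exact binomial {exact_b:.4f}")
```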
If you are working in a situation where p is close to 1, you can choose to count
failures instead of successes and still construct an appropriate Poisson approximation.
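To illustrate counting failures, here is a sketch with hypothetical numbers (n = 100 and p = 0.98, not taken from the text). Successes are not rare here, but the number of failures F = n − X is binomial with p = 0.02, so F is approximately Po(2), and P(X ≥ 98) = P(F ≤ 2).

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(k + 1))

n, p = 100, 0.98   # hypothetical values: p is close to 1
q = 1 - p          # probability of a failure, which is small

# P(at least 98 successes) is the same event as P(at most 2 failures)
exact = sum(binomial_pmf(s, n, p) for s in range(98, n + 1))
approx = poisson_cdf(2, n * q)   # failures F ~ B(100, 0.02), approximately Po(2)

print(f"exact {exact:.4f}, Poisson approximation {approx:.4f}")
```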
Exercise 2.1
1. The proportion of defective pipes coming off a production line is 0.
A sample of 60 pipes is examined.
a) Using the exact binomial distribution calculate the probabilities that there are
i) 0 ii) 1 iii) 2 iv) more than 2
defectives in the sample.
b) Using an appropriate approximate distribution calculate the probabilities
that there are
i) 0 ii) 1 iii) 2 iv) more than 2
defectives in the sample.
2. a) State the conditions under which a Poisson distribution may be used
to approximate a binomial distribution.
b) 5% of the times a faulty ATM asks for a personal identification number
(PIN) it does not register the number entered correctly. If I enter
my PIN correctly each time, what is the probability that the ATM will not
register it correctly in 3 attempts?
c) Over a period of time, 90 attempts are made to enter a PIN. If all of the
customers enter their PIN correctly, what is the probability that fewer
than 3 of the attempts are not registered correctly?
3. In a small town, the football team claim that 95% of the people in town
support them. If the claim is correct and a survey of 80 randomly chosen
people asks whether they support the football team, find the probability
that more than 75 people say they do.
4. A rare but harmless medical condition affects 1 in 200 people.
a) At a cinema showing which 130 people attend, what is the probability
that exactly one person has the condition?
b) At a concert where the audience is 600, use an appropriate approximate
distribution to find the probability that there are fewer than 5 people
with the condition.
5. The Nutty Fruitcase party claim that 1 in 250 people support their policy to
distribute free fruit and nut chocolate bars to children taking examinations.
a) In an opinion poll which asks 1000 voters about a range of policies put
forward by different parties, find the probability that
i) no-one will support the Nutty Fruitcase party policy
ii) at least 5 people will support the policy.
b) If the opinion poll had 7 people supporting the policy, does this mean
that the Nutty Fruitcase party have underestimated the support there
is for this policy?
6. A rare medical condition affects 1 in 150 sheep.
a) In a small farm holding with a flock of 180 sheep, what is the probability
that exactly one sheep has the condition?
b) A large farm has a flock of 500 sheep. Use an appropriate approximate
distribution to find the probability that there are fewer than 5 sheep with
the condition.
2.2 The normal approximation to the Poisson distribution
For large λ (λ > 15, approximately) you would often use a normal approximation,
particularly when the probability of an interval is required, e.g. P(X ≥ 15) or
P(6 < X < 14), since this is a single calculation for a continuous random variable
but requires multiple calculations for a discrete random variable.
Remember that the normal uses the standard deviation to calculate the
z-score, i.e. $z = \dfrac{x - \mu}{\sigma}$.
You must also include the continuity correction (which you met in S1
when using the normal to approximate another discrete distribution,
the binomial).
The parameters used are the mean and variance of the Poisson, i.e. μ = σ² = λ.
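The recipe above (mean and variance both equal to λ, plus a continuity correction) can be wrapped in a small helper. This is a sketch assuming Python 3.8 or later for `statistics.NormalDist`; the name `normal_approx_poisson` is hypothetical. It approximates P(a ≤ X ≤ b) for integers a and b, and `math.inf` can be passed for a one-sided probability.

```python
import math
from statistics import NormalDist

def normal_approx_poisson(lam, a, b):
    """Approximate P(a <= X <= b) for X ~ Po(lam), with a and b integers.

    Y ~ N(lam, lam) is used, so the standard deviation is sqrt(lam).
    Continuity correction: the integers a..b are covered by a - 0.5 < Y < b + 0.5.
    Pass math.inf as b (or -math.inf as a) for a one-sided probability.
    """
    y = NormalDist(mu=lam, sigma=math.sqrt(lam))
    return y.cdf(b + 0.5) - y.cdf(a - 0.5)

# P(X >= 15) when X ~ Po(20): the integers 15, 16, ... become Y > 14.5
print(round(normal_approx_poisson(20, 15, math.inf), 4))
```

Strict inequalities on X should be converted to inclusive integer bounds first, e.g. P(6 < X < 14) is P(7 ≤ X ≤ 13), before calling the helper.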
Example 2
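If X ~ Po(16), calculate P(11 ≤ X ≤ 15)
a) using the exact Poisson probabilities b) by using a normal approximation.
a) P(11 ≤ X ≤ 15) = P(X = 11, 12, 13, 14, 15)
$= e^{-16}\left(\dfrac{16^{11}}{11!} + \dfrac{16^{12}}{12!} + \dfrac{16^{13}}{13!} + \dfrac{16^{14}}{14!} + \dfrac{16^{15}}{15!}\right) = 0.389$
b) Since λ = 16 > 15, use the N(16, 16) distribution to approximate the
Po(16) distribution.
The continuity correction says P(11 ≤ X ≤ 15) ≈ P(10.5 < Y < 15.5),
where Y is the approximating normal.
$P(10.5 < Y < 15.5) = P\left(\dfrac{10.5 - 16}{\sqrt{16}} < Z < \dfrac{15.5 - 16}{\sqrt{16}}\right) = P(-1.375 < Z < -0.125)$
$= \Phi(1.375) - \Phi(0.125) = 0.9155 - 0.5498 = 0.366$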
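Both parts of Example 2 can be checked with the sketch below (standard library only; Python 3.8+ for `statistics.NormalDist`). The exact value sums the five Poisson probabilities; the approximate value uses the continuity-corrected z-values from part b).

```python
import math
from statistics import NormalDist

lam = 16

# a) exact: P(X = 11) + ... + P(X = 15) for X ~ Po(16)
exact = sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(11, 16))

# b) normal approximation N(16, 16) with continuity correction 10.5 < Y < 15.5
z_low = (10.5 - lam) / math.sqrt(lam)    # -1.375
z_high = (15.5 - lam) / math.sqrt(lam)   # -0.125
approx = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(f"exact {exact:.3f}, normal approximation {approx:.3f}")
```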
Example 3
The demand for a particular spare part in a car accessory
shop may be modelled by a Poisson distribution.
On average the demand per week for that part is 2.5.
a) The shop has 4 in stock at the start of one week.
What is the probability that they will not be able
to supply everyone who asks for that part during
the week?
b) The manager is going to be away for 6 weeks, and
wants to leave sufficient stock that there is no more than a 5% probability of running
out of any parts while he is away. How many of this particular spare part should
he leave in stock?
a) For the demand in a week, use the Po(2.5) distribution. Then if the demand is 4 or less,
the shop can supply all the customers.
P(X ≤ 4) = 0.0821 + 0.2052 + 0.2565 + 0.2138 + 0.1336 = 0.8912.
The probability of not being able to supply all the demand is 1 − 0.891 = 0.109 (3 s.f.).
b) For the demand in 6 weeks, use the Po(15) distribution, which can be approximated
by the N(15, 15) distribution.
You need to find k so that P(demand ≤ k) ≥ 0.95.
$\Phi^{-1}(0.95) = 1.6449$, so you need to find the smallest integer k which satisfies
$\dfrac{k + 0.5 - 15}{\sqrt{15}} \geq 1.6449$, which is k = 21 (the solution is k ≥ 20.9).
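Example 3 can be reproduced with the sketch below (standard library only; `statistics.NormalDist` needs Python 3.8+). Part b) follows the text: the continuity-corrected normal inequality is solved for k and rounded up. As an extra comparison not in the original, it also searches for the least k using exact Po(15) probabilities; since λ = 15 sits at the edge of the λ > 15 guideline, the two answers need not agree.

```python
import math
from statistics import NormalDist

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(k + 1))

# a) weekly demand ~ Po(2.5); 4 in stock covers a demand of 4 or fewer
p_short = 1 - poisson_cdf(4, 2.5)

# b) six-week demand ~ Po(15), approximated by N(15, 15). With continuity correction,
#    P(demand <= k) is approximated by P(Y < k + 0.5); find the least integer k
#    for which this is at least 0.95.
z = NormalDist().inv_cdf(0.95)                       # about 1.6449
k_normal = math.ceil(15 + z * math.sqrt(15) - 0.5)

# For comparison: the least k using exact Po(15) probabilities
k_exact = next(k for k in range(100) if poisson_cdf(k, 15) >= 0.95)

print(f"a) {p_short:.3f}")
print(f"b) normal approximation k = {k_normal}, exact Poisson k = {k_exact}")
```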
Exercise 2.2
1. Let X ~ Po(λ) and Y ~ N(λ, λ), where λ satisfies the conditions needed for
Y to be used as an approximation for X.
Write down the probability you need to calculate for Y (including the
continuity correction) as the approximation for each of the following
probabilities for X.
a) P(X < 16) b) P(X > 22) c) P(X ≤ 17) d) P(45 ≤ X < 62)
2. Which of the following could reasonably be approximated by a normal distribution?
(For those which can, state the normal distribution that would be used.)
a) X ~ Po(16) b) X ~ Po(12.32) c) X ~ Po(8.5)
3. Use normal approximations to calculate
a) P(X < 42) if X ~ Po(49) b) P(X ≥ 9) if X ~ Po(17.5) c) P(X ≥ 13) if X ~ Po(18.4)
d) P(25 2).
ii) Find the probability that X = 2 given
that X ≥ 2.
b) Using an appropriate approximate
distribution calculate the probabilities
that there are
i) 0 ii) 1 iii) 2
iv) more than 2 defective pipes in the sample.
b) Random samples of 150 values of X are
taken.
i) Describe fully the distribution of the
sample mean.
ii) Find the probability that the mean of
a random sample of size 150 is less
than 2.4.
2. A rare disease affects 1 in 2000 people on
average.
a) Use a suitable approximation to find the
probability that, of a random sample of
7500 people in a city, more than 3 people
have the disease.
b) In a random sample of n people, the
probability that no one has the disease
is less than 0.01. Find the least possible
value of n.
On average 3 people in every 10000 in
Canada have a particular gene. A random
sample of 4000 people in Canada is chosen.
The random variable X denotes the number
of people in the sample who have the gene.
Use an approximating distribution to calculate
the probability that there will be more than
2 people in the sample who have the gene.
3. Customers arrive at the exchange and
refunds desk in a store at a constant average
rate of 1 every 2 minutes.
a) State one condition for the number of
customers arriving in a given period to
be modelled by a Poisson distribution.
Assume now that a Poisson distribution is a
suitable model.
b) Find the probability that exactly 4
customers will arrive during a randomly
chosen 10-minute period.
c) Find the probability that fewer than 3
customers will arrive during a randomly
chosen 5-minute period.
2% of bottles on a production line do not
have their tops securely fastened. This fault
occurs randomly. 200 bottles are checked to
see whether the tops are securely fastened.
Use a suitable approximation to find the
probability that fewer than 4 do not have the
top securely fastened.
Summary exercise 2a
8. A dissertation contains 5480 words. For each
word, the probability it contains an error is
0.001, and these errors can be assumed to
occur independently. The number of words
with errors in the dissertation is represented
by the random variable X.
a) State the exact distribution of X,
including the value of any parameters.
b) State an approximate distribution for X,
including any parameters, and justify the
use of this approximation.
c) Use this approximate distribution to find
the probability that there are more
than 4 words printed wrongly in the
dissertation.
A manufacturer packs computer
components in boxes of 500. On average,
1 in 2000 components is faulty. Use a suitable
approximation to estimate the probability
that a randomly chosen box contains at least
one faulty component.
On average 1 in 3000 adults has a certain
medical condition.
a) Use a suitable approximation to find the
probability that, in a random sample
of 4500 people, fewer than 4 have this
condition.
b) In a random sample of n people, where
n is large, the probability that none has
the condition is less than 10%. Find the
smallest possible value of n.

Chapter summary
If X ~ B(n, p) with n large (n > 50) and p close to 0 (np < 5), then X ~ approximately Po(λ)
with λ = np.
If X ~ Po(λ) with λ > 15 (approximately), then X ~ approximately N(λ, λ).
When the Poisson is approximated by a normal distribution, a continuity correction must
be used.
Linear combination of random variables
The real world is not simple; many
things are made up of more than
one component. It is often easier to
model each component of a process
separately than it is to try to produce a
complex model of the whole process.
Simulations then allow you to get
a good idea of what the behaviour
of the overall process would be. For
example, simulating the number of
passengers on a flight, and then the
baggage and person weights would be
easier to do separately.
Objectives
After studying this chapter you should be able to:
• Use, in the course of solving problems, the results that
  • E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X)
  • E(aX + bY) = aE(X) + bE(Y)
  • Var(aX + bY) = a²Var(X) + b²Var(Y) for independent X and Y.
Before you start
You should know how to:
1. Calculate the mean and variance of a random variable, e.g.
x | 3 | 4 | 5
P(X = x) | 0.1 | 0.6 | 0.3
Calculate E(X) and Var(X).
E(X) = (3 × 0.1) + (4 × 0.6) + (5 × 0.3) = 4.2
E(X²) = (9 × 0.1) + (16 × 0.6) + (25 × 0.3) = 18
Var(X) = 18 − 4.2² = 0.36
Skills check:
1. Calculate E(X) and Var(X) for
x
P(X = x) | 0.2 | 0.4 | 0.4
3.1 Expectation and variance of a linear function
of a random variable
In S1 Sections 5.3 and 5.4 you met the expectation and variance of
a discrete random variable:
• The mean or expected value of a probability distribution is defined
as $\mu = E(X) = \sum px$.
• The variance of a probability distribution is defined as
$\text{Var}(X) = E[\{X - E(X)\}^2]$. The alternative version (which is easier to
use in practice) is $\text{Var}(X) = E(X^2) - \{E(X)\}^2$.
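Both formulas for the variance can be checked on the distribution used in Before you start (x = 3, 4, 5 with probabilities 0.1, 0.6, 0.3). This sketch uses Python's `fractions.Fraction` so that the two versions agree exactly rather than up to rounding; the helper name `expectation` is illustrative.

```python
from fractions import Fraction as F

# Distribution from the "Before you start" example: x = 3, 4, 5
dist = {3: F(1, 10), 4: F(6, 10), 5: F(3, 10)}

def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x) over the outcome space."""
    return sum(g(x) * p for x, p in dist.items())

mu = expectation(dist)                                        # E(X)
var_definition = expectation(dist, lambda x: (x - mu) ** 2)   # E[{X - E(X)}^2]
var_shortcut = expectation(dist, lambda x: x * x) - mu ** 2   # E(X^2) - {E(X)}^2

print(float(mu), float(var_definition), float(var_shortcut))
```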
In S1 Section 2.5 you saw:
If a set of data values X is related to a set of values Y so that Y = aX + b, then
• mean of Y = a × mean of X + b
• standard deviation of Y = |a| × standard deviation of X
• variance of Y = a² × variance of X.
The same relationship applies if X and Y are random variables defined
in the same way (Y = aX + b).
The proof of these results is easiest to do by considering multiplication
by a constant and addition of a constant separately; the full result is then
obtained just by applying them one after the other. We will show it in full
here for discrete random variables, but the same result holds for
continuous random variables, which you will meet in Chapter 5
(where the summation is replaced by integration).
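If Y = aX:
$\mu_Y = \sum y p = \sum (ax) p = a \sum x p = a\mu_X$
$E(Y^2) = \sum y^2 p = \sum (ax)^2 p = a^2 \sum x^2 p = a^2 E(X^2)$
$\text{Var}(X) = E(X^2) - (\mu_X)^2 \Rightarrow \text{Var}(Y) = E(Y^2) - (\mu_Y)^2 = a^2 E(X^2) - (a\mu_X)^2 = a^2\{E(X^2) - (\mu_X)^2\} = a^2\,\text{Var}(X)$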
If Y = X + b:
$\mu_Y = \sum y p = \sum (x + b) p = \sum x p + b \sum p = \mu_X + b$, since $\sum p = 1$.
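The two results can be checked numerically by building the distribution of Y = aX + b directly from that of X. This sketch reuses the Before you start distribution (x = 3, 4, 5 with probabilities 0.1, 0.6, 0.3) and takes a = −2, b = 7 as arbitrary illustrative values; a negative a is chosen deliberately, since the variance still scales by a². The helper names are hypothetical.

```python
from fractions import Fraction as F

dist = {3: F(1, 10), 4: F(6, 10), 5: F(3, 10)}

def mean_var(dist):
    """Return (E(X), Var(X)) using Var(X) = E(X^2) - {E(X)}^2."""
    mu = sum(x * p for x, p in dist.items())
    return mu, sum(x * x * p for x, p in dist.items()) - mu ** 2

def transform(dist, a, b):
    """Distribution of Y = aX + b: each outcome x moves to a*x + b with the same probability."""
    out = {}
    for x, p in dist.items():
        y = a * x + b
        out[y] = out.get(y, 0) + p
    return out

a, b = -2, 7
mu_x, var_x = mean_var(dist)
mu_y, var_y = mean_var(transform(dist, a, b))

assert mu_y == a * mu_x + b       # E(aX + b) = aE(X) + b
assert var_y == a ** 2 * var_x    # Var(aX + b) = a^2 Var(X); b has no effect
print(mu_y, var_y)
```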