A Level Further Mathematics For AQA
Statistics Student Book (AS/A Level)
Stephen Ward, Paul Fannon, Vesna Kadelburg and Ben Woolley
Contents
Introduction
How to use this resource
2 Poisson distribution
1: Using the Poisson model
2: Using the Poisson distribution in hypothesis tests
3 Chi-squared tests
1: Contingency tables
2: Yates’ correction
4 Continuous distributions
1: Continuous random variables
2: Expectation and variance of continuous random variables
3: Expectation and variance of functions of a random variable
4: Sums of independent random variables
5: Linear combinations of normal variables
6: Cumulative distribution functions
7: Piecewise-defined probability density functions
8: Rectangular distribution
9: Exponential distribution
10: Combining discrete and continuous random variables
Focus on … Proof 1
Focus on … Modelling 1
6 Confidence intervals
1: Confidence intervals
2: Confidence intervals for the mean when the population variance is unknown
Focus on … Proof 2
Focus on … Modelling 2
Formulae
Answers
Acknowledgements
Introduction
You have probably been told that mathematics is very useful, yet it can often seem like a lot of techniques
that just have to be learnt to answer examination questions. You are now getting to the point where you
will start to see where some of these techniques can be applied in solving real problems. However, as well
as seeing how maths can be useful, we hope that anyone working through this book will realise that it can
also be incredibly frustrating, surprising and ultimately beautiful.
The book is woven around three key themes from the new curriculum.
Proof
Maths is valued because it trains you to think logically and communicate precisely. At a high level, maths is
far less concerned about answers and more about the clear communication of ideas. It is not about being
neat – although that might help! It is about creating a coherent argument that other people can easily
follow but find difficult to refute. Have you ever tried looking at your own work? If you cannot follow it
yourself it is unlikely anybody else will be able to understand it. In maths we communicate using a variety
of means – feel free to use combinations of diagrams, words and algebra to aid your argument. And once
you have attempted a proof, try presenting it to your peers. Look critically (but positively) at some other
people’s attempts. It is only through having your own attempts evaluated and trying to find flaws in other
proofs that you will develop sophisticated mathematical thinking. This is why we have included lots of
common errors in our Work it out boxes – just in case your friends don’t make any mistakes!
Problem solving
Maths is valued because it trains you to look at situations in unusual, creative ways, to persevere and to
evaluate solutions along the way. We have been heavily influenced by a great mathematician and maths
educator George Polya, who believed that students were not just born with problem-solving skills – they
developed them by seeing problems being solved and reflecting on their solutions before trying similar
problems. You may not realise it but good mathematicians spend most of their time being stuck. You need
to spend some time on problems you can’t do, trying out different possibilities. If after a while you have not
cracked it, then look at the solution and try a similar problem. Don’t be disheartened if you cannot get it
immediately – in fact, the longer you spend puzzling over a problem the more you will learn from the
solution. You may never need to integrate a rational function in the future, but we firmly believe that the
problem-solving skills you will develop by trying it can be applied to many other situations.
Modelling
Maths is valued because it helps us solve real-world problems. However, maths describes ideal situations
and the real world is messy! Modelling is about deciding on the important features needed to describe the
essence of a situation and turning them into a mathematical form, then using that to make predictions,
compare to reality and possibly improve the model. In many situations the technical maths is actually the
easy part – especially with modern technology. Deciding which features of reality to include or ignore and
anticipating the consequences of these decisions is the hard part. Yet it is amazing how some fairly drastic
assumptions – such as pretending a car is a single point or that people’s votes are independent – can result
in models that are surprisingly accurate.
More than anything else, this book is about making links – links between the different chapters, the topics
covered and the themes above, links to other subjects and links to the real world. We hope that you will
grow to see maths as one great complex but beautiful web of interlinking ideas.
Maths is about so much more than examinations, but we hope that if you take on board these ideas (and
do plenty of practice!) you will find maths examinations a much more approachable and possibly even
enjoyable experience. However, always remember that the results of what you write down in a few hours
by yourself in silence under exam conditions are not the only measure you should consider when judging
your mathematical ability – it is only one variable in a much more complicated mathematical model!
How to use this resource
Throughout this resource you will notice particular features that are designed to aid your learning. This
section provides a brief overview of these features.
predict the mean, mode, median and variance of a discrete random variable
understand how a linear transformation of a variable changes the mean and variance
prove and use the formulae for expectation and variance of a special distribution called the uniform
distribution
recognise when it is appropriate to use a uniform distribution.
If you are following the A Level course, you will also learn how to:
calculate the mean of a discrete random variable after a non-linear transformation.
Learning objectives
A short summary of the content that you will learn in each chapter.
WORKED EXAMPLE
The left-hand side shows you how to set out your working. The right-hand side explains the more
difficult steps and helps you understand why a particular method was chosen.
PROOF
WORK IT OUT
Can you identify the correct solution and find the mistakes in the two incorrect solutions?
A Level Mathematics Student Book 1, Chapter 21
You should know how to use the rules of probability.
1 Two events A and B are independent. If P(A)=0.4 and P(B)=0.3, find P(A AND B).
A Level Mathematics Student Book 1, Chapter 21
You should know how to find probabilities of discrete random variables.
2 P(X=x)=kx for x=1, 2, 3. Find the value of k.
A Level Mathematics Student Book 1, Chapter 20
You should know how to find the mean, variance and standard deviation of data, including familiarity with formulae involving sigma notation.
3 Find the variance of 2, 5 and 8.
Key point
Common error
Specific mistakes that are often made. These typically appear next to the point in the Worked
example where the error could occur.
Tip
Each chapter ends with a Checklist of learning and understanding and a Mixed practice exercise, which
includes past paper questions marked with an icon.
In between chapters, you will find extra sections that bring together topics in a more synoptic way.
FOCUS ON…
Unique sections relating to the preceding chapters that develop your skills in proof, problem-solving and
modelling.
Questions covering topics from across the preceding chapters, testing your ability to apply what you
have learned.
Key terms are picked out in colour within chapters. You can hover over these terms to view their definitions,
or find them in the Glossary tab. Towards the end of the resource you will find practice paper questions,
short answers to all questions and worked solutions.
Rewind
Fast forward
Links to topics that you may cover in greater detail later in your study.
Focus on…
Links to problem-solving, modelling or proof exercises that relate to the topic currently being
studied.
Interesting or historical information and links with other subjects to improve your awareness
about how mathematics contributes to society.
1 A sequence is defined by $u_n = 2 \times 3^{n-1}$. Use the principle of mathematical induction to prove that
$u_1 + u_2 + \ldots + u_n = 3^n - 1$.
Black – practice questions which come in several parts, each with subparts i and ii. You only need
attempt subpart i at first; subpart ii is essentially the same question, which you can use for further
practice if you got part i wrong, for homework, or when you revisit the exercise during revision.
predict the mean, mode, median and variance of a discrete random variable
understand how a linear transformation of a variable changes the mean and
variance
prove and use the formulae for expectation and variance of a special distribution
called the uniform distribution
recognise when it is appropriate to use a uniform distribution.
If you are following the A Level course, you will also learn how to:
calculate the mean of a discrete random variable after a non-linear transformation.
A Level Mathematics Student Book 1, Chapter 21
You should know how to use the rules of probability.
1 Two events A and B are independent. If P(A)=0.4 and P(B)=0.3, find P(A AND B).
A Level Mathematics Student Book 1, Chapter 20
You should know how to find the mean, variance and standard deviation of data, including familiarity with formulae involving sigma notation.
3 Find the variance of 2, 5 and 8.
A Level Further You should know how to calculate sums of 4 Find and simplify an
Mathematics powers of .
expression for
Student Book 1,
Chapter 11
Tip
Discrete variables don’t have to take integer values. However, the possible distinct values can be
listed, though the list may be infinite. For example: if is the standard UK shoe size of a random
adult member of the public, takes values , , , up to and is a discrete random
variable. If is the exact foot length of a random adult member of the public (in cm), takes
values in the interval [ , ] and is a continuous random variable.
Many real-life situations follow probability distributions – such as the velocity of a molecule in a waterfall or
the amount of tax paid by an individual. It is extremely difficult to make a prediction about a single
observation, but it turns out that you can predict remarkably accurately the overall behaviour of many
millions of observations. In this chapter you will see how you can predict the mean and variance of a
discrete random variable.
Section 1: Average and spread of a discrete random variable
The most commonly used measure of the average of a random variable is the expectation. It is a value
representing the mean result if the variable were to be measured an infinite number of times.
Tip
The expectation of a random variable does not need to be a value that the variable can actually
take.
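Key point 1.1
The expectation of a random variable $X$ is written $E(X)$ and calculated as $E(X) = \sum_i x_i p_i$,
where $x_i$ is each possible value that $X$ can take and $p_i$ is the associated probability.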
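As a rough illustration of this definition, the Python sketch below computes an expectation as a probability-weighted sum of the possible values. The function name `expectation` and the fair six-sided dice are assumptions chosen for the example; they do not come from the text.

```python
from fractions import Fraction


def expectation(distribution):
    """Return the sum of x * P(X = x) for a dict mapping values to probabilities."""
    return sum(x * p for x, p in distribution.items())


# Hypothetical example: the score on a fair six-sided dice.
fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}

print(expectation(fair_dice))  # 7/2
```

Exact fractions are used so that the answer is not blurred by rounding. The result, 7/2, is not a score the dice can actually show.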
Tip
The subscript $i$ in the formula in Key point 1.1 is just a counter referring to each possible value
and its associated probability.
You do not need to be able to prove this result, but you might find it helpful to see this proof.
PROOF 1
The mean of pieces of discrete data is Start from the definition of the mean.
The random variable has a probability distribution as shown in the table. Calculate .
As well as knowing the expected average, you may also be interested in how far away from the average
you can expect an outcome to be. The variance, $\mathrm{Var}(X)$, of a random variable $X$ is a value representing the
degree of variation that would be seen if the variable were to be repeatedly measured an infinite number
of times. It is a measure of how spread out the variable is.
Fast forward
The quantity $E(X^2)$ is the expected value of $X^2$, read as ‘the mean of the squares’. The variance formula,
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, is often read as ‘the mean of the squares minus the square of the mean’.
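The phrase ‘the mean of the squares minus the square of the mean’ translates almost word for word into code. The sketch below is a minimal Python version; the function name `variance` and the fair six-sided dice are assumptions made for illustration.

```python
from fractions import Fraction


def variance(distribution):
    """Return (mean of the squares) minus (square of the mean)."""
    mean = sum(x * p for x, p in distribution.items())
    mean_of_squares = sum(x**2 * p for x, p in distribution.items())
    return mean_of_squares - mean**2


# Hypothetical example: the score on a fair six-sided dice.
fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}

print(variance(fair_dice))  # 35/12
```

Here the mean of the squares is 91/6 and the square of the mean is 49/4, giving 35/12.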
Use the values from the distribution in the formula in Key point 1.2.
Tip
Many calculators can simplify this process. You normally have to treat the values of the
random variable as data and the probabilities as the frequency.
Two other less commonly used measures of average are the mode and the median. For data, the mode is
the most common result and this extends to variables.
The mode of a discrete random variable $X$ is the value of $x$ associated with the largest
probability.
For data, the median is the value that has half the data values below it and half above it. You can interpret
this in terms of probabilities.
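One way to make these two measures concrete is the Python sketch below. It assumes the median is taken as the first value whose cumulative probability reaches one half, which is one common convention; when the cumulative probability equals one half exactly, other conventions exist and this sketch simply returns that value. The distribution used is hypothetical.

```python
from fractions import Fraction


def mode(distribution):
    """Return the value with the largest probability."""
    return max(distribution, key=distribution.get)


def median(distribution):
    """Return the first value whose cumulative probability reaches 1/2."""
    cumulative = 0
    for x in sorted(distribution):
        cumulative += distribution[x]
        if cumulative >= Fraction(1, 2):
            return x


# Hypothetical distribution, not taken from the text.
dist = {
    1: Fraction(1, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
    4: Fraction(2, 10),
}

print(mode(dist))    # 3
print(median(dist))  # 3
```

The cumulative probabilities are 1/10, 4/10 and 8/10, so the value 3 is the first to reach one half.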
b the median.
So the median is . Look for the first value that has a value
of greater than or equal to .
You could also check that
but this is not necessary here.
b
Use Key point 1.1.
Which is the correct solution? Identify the errors made in the incorrect solutions.
EXERCISE 1A
1 Calculate the expectation, mode, median, variance and standard deviation of each of these discrete
random variables.
a i
ii
b i
ii
c i
ii
d i ,
ii ,
, where .
a Show that .
, .
Score
Probability
Find the value of in order for the player to get an expected profit of counters per roll.
9 Two fair dice labelled with the values to are thrown. The random variable is the difference
between the larger and the smaller score, or zero if they are the same.
b Find .
c Find .
d Find the median of .
e Find .
10 a In a game a player pays an entrance fee of £ . He then selects one number from or and
rolls three fair four-sided dice, numbered to . If his chosen number appears on all three dice he
wins four times the entrance fee. If his number appears on exactly two of the dice he wins three
times the entrance fee. If his number appears on exactly one dice he wins £ . If his number does
not appear on any of the dice he wins nothing.
Copy and complete the probability table.
Profit £
Probability
b The game organiser wants to make a profit over many plays of the game. Given that he must
charge a whole number of pence, what is the minimum amount the organiser must charge?
11 Viewers are asked to rate a new film on a three-point scale. Their marks are modelled by the random
variable as shown.
a The mean, median and mode of are all equal. Find the variance of .
b Two independent viewers of the film are both asked their opinion.
i What is the probability that their total score is more than ?
The most common type of transformation is a linear transformation. This is where the new variable
is found from the old variable by multiplying by a constant and/or adding on a constant. You might do
this, for example, if you change the units of measurement. This kind of change is also known as ‘linear
coding’.
If you know the original mean and variance and how the data were transformed, you can use a shortcut to
find the mean and variance of the new data.
Fast forward
You will prove Key point 1.5 after you have developed a little more theory.
This means that the standard deviation of $aX + b$, where $a$ and $b$ are constants, is $|a|$ times the standard deviation of $X$. This makes sense, as multiplying the data by
$a$ does change how spread out they are, but adding on $b$ does not change the spread.
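The effect of a linear transformation can be checked numerically. The Python sketch below assumes the usual rules E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X); the constants a = -2 and b = 5 and the fair six-sided dice are hypothetical choices for the example.

```python
from fractions import Fraction
from math import isclose, sqrt


def mean_and_variance(distribution):
    """Return (E(X), Var(X)) for a dict mapping values to probabilities."""
    mean = sum(x * p for x, p in distribution.items())
    mean_of_squares = sum(x**2 * p for x, p in distribution.items())
    return mean, mean_of_squares - mean**2


# Hypothetical example: X is the score on a fair dice and Y = aX + b.
a, b = -2, 5
dice = {x: Fraction(1, 6) for x in range(1, 7)}
transformed = {a * x + b: p for x, p in dice.items()}

mean_x, var_x = mean_and_variance(dice)
mean_y, var_y = mean_and_variance(transformed)

print(mean_y == a * mean_x + b)                    # True
print(var_y == a**2 * var_x)                       # True
print(isclose(sqrt(var_y), abs(a) * sqrt(var_x)))  # True
```

A negative multiplier was chosen deliberately: the variance is scaled by a², and the standard deviation by the size of a, so neither becomes negative.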
b To find the standard deviation you first need to find the variance of ,
using Key point 1.5.
Common error
It is easy to get confused with the minus sign in the transformations in Worked example 1.5.
Remember that variances and standard deviations are never negative.
Non-linear transformations
You can also apply non-linear transformations to , such as , or . When you do this
there is no shortcut to finding the mean and variance of the transformed variable. You need to adapt
Key point 1.1.
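A minimal Python sketch of this idea follows. It assumes the adapted rule is to apply the transformation to each possible value and keep the original probabilities, adding together the probabilities of any values that coincide. The helper names `transform` and `expectation` and the distribution of X are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def transform(distribution, g):
    """Build the distribution of Y = g(X), merging x-values that give the same y."""
    result = defaultdict(Fraction)
    for x, p in distribution.items():
        result[g(x)] += p
    return dict(result)


def expectation(distribution):
    """Return the probability-weighted sum of the values."""
    return sum(x * p for x, p in distribution.items())


# Hypothetical X: equally likely to be -2, -1, 0, 1 or 2.
x_dist = {x: Fraction(1, 5) for x in range(-2, 3)}
y_dist = transform(x_dist, lambda x: x**2)

print(expectation(y_dist))       # 2
print(expectation(x_dist) ** 2)  # 0
```

The two printed values differ, which is why squaring the mean of X is not a valid shortcut for the mean of X².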
Consider the discrete random variable outcome on a fair six-sided dice. If , you can
construct the probability distribution for :
The discrete random variable has the distribution shown in the table.
If °, find:
a
b .
a Apply Key point 1.6
You can use Key point 1.6 to prove Key point 1.5.
PROOF 2
Let
You can separate out a sum into its different terms, taking
out constant factors.
Use the fact that for any probability distribution $\sum_i p_i = 1$
and the definition of expectation from Key point 1.1. You
have now established the first part of Key point 1.5.
Considering to get to
the variance:
EXERCISE 1B
ii
c i
ii
d i
ii
e i
ii .
ii
c i
ii
d i
ii .
3 Stephen goes on a mile bike ride every weekend. The distance until he stops for a picnic is
modelled by , where and .
is the distance remaining after his picnic. Find and .
4 The rule for converting between degrees Celsius and degrees Fahrenheit is:
When a bread oven is operating it has expected temperature with standard deviation
.
Find the expected temperature and standard deviation in degrees Fahrenheit.
5 The random variable has expectation and variance . If , find the values of
and so that the expectation of is zero and the standard deviation is .
6 is a discrete random variable where and . is a transformation of
such that . Find and the standard deviation of .
7 is a discrete random variable satisfying for .
Find:
a the value of
b
c
e .
10 The St Petersburg Paradox describes a game where a fair coin is tossed repeatedly until a head
is found. You win pounds if the first head occurs on the toss. How much should you pay to
play this game?
Section 3: The discrete uniform distribution
You have already met some special distributions that occur so often that they are named, for example the
binomial and the normal distributions. Another very common distribution is the discrete uniform
distribution.
This is a distribution in which all the whole numbers from $1$ to $n$ are equally likely and it is given the symbol
$U(n)$. For example, $U(6)$ gives the distribution of the outcomes on a fair six-sided dice.
If you identify a random variable as following a uniform distribution you can immediately write down the
expectation and variance.
If $X \sim U(n)$ then $E(X) = \frac{n+1}{2}$ and $\mathrm{Var}(X) = \frac{n^2-1}{12}$.
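The Python sketch below checks the expectation and variance of a discrete uniform distribution by direct summation. It assumes the standard results for the whole numbers 1 to n being equally likely, namely a mean of (n + 1)/2 and a variance of (n² - 1)/12; the function name `uniform_mean_variance` is hypothetical.

```python
from fractions import Fraction


def uniform_mean_variance(n):
    """Return (mean, variance) when 1, 2, ..., n are equally likely."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    mean_of_squares = sum(x**2 * p for x in range(1, n + 1))
    return mean, mean_of_squares - mean**2


# Compare the direct sums with the closed-form results for a few values of n.
for n in (2, 6, 10):
    mean, var = uniform_mean_variance(n)
    print(n, mean == Fraction(n + 1, 2), var == Fraction(n**2 - 1, 12))
```

For a fair six-sided dice (n = 6) this gives a mean of 7/2 and a variance of 35/12.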
Rewind
You met the rules for working with indices in A Level Mathematics Student Book 1, Chapter 2.
You can prove the result in Key point 1.8 by using your knowledge of sums of powers of integers.
PROOF 3
If then and .
Use the result for the sum of the first $n$ positive integers: $\sum_{r=1}^{n} r = \frac{n(n+1)}{2}$.
The discrete random variable is equally likely to take any even value from to inclusive. Find
the variance of .
EXERCISE 1C
1 Find the mean and variance of these distributions.
a i
ii
b i
ii
2 A fair spinner has sides labelled . Find the expected mean and standard deviation of the
results of the spinner.
3 A fair dice has sides labelled . Find the expectation and standard deviation of the outcome
of throwing the dice.
4 a The random variable is equally likely to take any integer value between and . Show that this
can be written as where .
b Hence find the variance of .
5 A string of Christmas lights starts with a plug then contains a light every from the plug.
One light is broken. Assuming all bulbs are equally likely to break, what are the expected mean and
variance of the distance of the broken light from the plug?
6 The random variable is equally likely to take the value of any odd number between and
inclusive. Find the variance of .
7 The discrete random variable takes values . Find the expectation and
variance of .
8 and . Find .
A discrete uniform distribution models situations in which all discrete outcomes are equally
likely.
2
A discrete random variable has a distribution defined by for . Find
. Choose from these options.
3 A drawer contains three white socks and five black socks. Two socks are drawn without
replacement. is the number of black socks drawn.
b Find .
4 A fair six-sided dice is thrown once. The random variable is calculated as half the result if
the dice shows an even number, or one higher than the result if the dice shows an odd
number.
b Find .
c Find .
b is the discrete random variable that is equally likely to take any integer value between
and .
Find and .
c is the discrete random variable that is equally likely to take any even value between
and .
Find and .
8 The random variable has expectation and variance . If , find the values of
and so that the expectation of is and the standard deviation is .
b . Find and .
10 A fair dice is thrown until a has been thrown or three throws have been made. is the
discrete random variable representing the number of throws made.
b Find .
d The number of points awarded in the game, , is given by . Find the variance of .
11 a A four-sided dice labelled with the values to is rolled twice. Write down, in a table, the
probability distribution of , the sum of the two rolls.
b Find and .
c A four-sided dice is rolled once and the score, , is twice the result. Find the mean and
variance of .
12 The discrete random variable follows the distribution. is the expectation of and
is the variance of . Find .
Find, in terms of :
d .
15 In a card game a pack of standard playing cards is used. The cards are dealt one at a time
until the Queen of Spades (a unique card in the pack) is revealed.
a What are the expected mean and standard deviation of the number of cards until the
Queen of Spades is revealed?
b In the game the player scores points if the Queen of Spades is the th card revealed.
Find the expected number of points scored.
16 A box contains a large number of pea pods. The number of peas in a pod can be modelled by
the random variable . The probability distribution of is shown here:
or fewer or more
a Two pods are picked randomly from the box. Find the probability that the number of peas in
each pod is at most .
b It is given that .
iii Some children play a game with the pods, randomly picking a pod and scoring points
depending on the number of peas in the pod. For each pod picked, the number of points
scored, , is found by doubling the number of peas in the pod and then subtracting .
[© AQA 2014]
17 In a computer game, players try to collect five treasures. The number of treasures that Isaac
collects in one play of the game is represented by the discrete random variable .
a i Show that .
b The number of points that Isaac scores for collecting treasures is where .
[© AQA 2014]
2 Poisson distribution
Chapter 1 You should know how to find 3 Find and for this distribution:
the expectation and variance
of discrete random variables.
A Level You should know how to 4 A coin is tossed times and tails are
Mathematics carry out hypothesis tests on observed. Use a two-tailed test to
Student Book 1, the binomial distribution. determine at the significance level if
Chapter 22 this coin is biased.
There are many situations in which you know the average rate of events within a given space or time, in
contexts ranging from commercial, such as the number of calls through a telephone exchange per minute,
to biological, such as the number of clover plants seen per square metre in a pasture. If the events can be
considered independent of each other (so that the probability of each event is not affected by what has
already been seen), the number of events in a fixed space or time interval can be modelled by the Poisson
distribution.
Section 1: Using the Poisson model
The Poisson distribution is commonly used when these conditions hold:
If these conditions are satisfied then the discrete random variable ‘number of events, $X$’, follows the
Poisson distribution with mean $\lambda$. You write this as $X \sim \mathrm{Po}(\lambda)$.
Tip
If a question mentions average rate of success, or events occurring at a constant rate, you
should use the Poisson distribution.
If you can identify a fixed number of trials, you should use the binomial distribution.
The Poisson distribution can also be a useful approximate model for discrete random variables in other
situations. However, if the stated conditions are not met this can only be established by looking empirically
at data.
Once you have identified that a situation follows a Poisson distribution, you can use facts about the
probability of a certain number of events, the expected number of events and the expected variance.
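Key point 2.1
If $X \sim \mathrm{Po}(\lambda)$ then $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$ for $x = 0, 1, 2, \ldots$, with $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.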
Common error
Notice that the values of the mean and variance are equal for the Poisson distribution. This is something
to look out for when determining if data are likely to fit a Poisson model, although in itself it is not sufficient
to decide – there are other distributions with this feature.
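The equality of the mean and the variance can be checked numerically. The Python sketch below assumes the standard Poisson probability formula P(X = x) = e^(-λ) λ^x / x! and a hypothetical mean of 2.5; the sums are cut off at x = 59, where the remaining probability is negligible.

```python
from math import exp, factorial, isclose


def poisson_pmf(x, lam):
    """Return P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.5  # hypothetical mean, chosen only for illustration
probs = [poisson_pmf(x, lam) for x in range(60)]

mean = sum(x * p for x, p in enumerate(probs))
var = sum(x**2 * p for x, p in enumerate(probs)) - mean**2

print(isclose(mean, lam), isclose(var, lam))  # True True
```

The same check applied to a sample of real data would only ever give approximate agreement, so it is supporting evidence for a Poisson model rather than proof.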
[Figure: probabilities plotted against x for x = 0 to 6, with the vertical axis running from 0 to 0.4]
Notice that:
Recordable accidents occur in a factory at an average rate of every year, independently of each
other. Find the probability that in a given year exactly recordable accidents occurred.
The Poisson distribution is scalable. For example, if the number of butterflies seen on a flower in minutes
follows a Poisson distribution with mean , then the number of butterflies seen on a flower in minutes
follows a Poisson distribution with mean , the number of butterflies seen on a flower in minutes follows
a Poisson distribution with mean , and so on.
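Scaling the mean in proportion to the length of the interval is easy to sketch in Python. All the numbers below are assumptions for illustration (an average of 6 buses per hour, and the probability of more than 3 buses in 20 minutes), as are the helper names `scaled_mean` and `poisson_cdf`.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """Return P(X <= k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))


def scaled_mean(mean, old_interval, new_interval):
    """Scale a Poisson mean in proportion to the length of the interval."""
    return mean * new_interval / old_interval


# Hypothetical numbers: 6 buses per 60 minutes, observed for 20 minutes.
lam = scaled_mean(6, 60, 20)
p_more_than_3 = 1 - poisson_cdf(3, lam)

print(lam)            # 2.0
print(p_more_than_3)
```

Note that 'more than 3' is found as one minus the cumulative probability of '3 or fewer', which is the form most calculators provide.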
Tip
Learn how to use your calculator to find Poisson probabilities, $P(X = x)$, and cumulative
probabilities, $P(X \le x)$.
If there are, on average, buses per hour arriving at a bus stop, find the probability that more
than buses arrive in minutes.
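If independent random variables $X$ and $Y$ follow Poisson distributions such that $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$ and
$Z = X + Y$, then $Z \sim \mathrm{Po}(\lambda + \mu)$.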
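The result for the sum of two independent Poisson variables can be checked numerically. The Python sketch below assumes means of 1.5 and 2.0 (hypothetical values) and builds the probability of each total by summing over every way the two variables can add up to it; the function name `sum_pmf` is also hypothetical.

```python
from math import exp, factorial, isclose


def poisson_pmf(x, lam):
    """Return P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def sum_pmf(z, lam, mu):
    """Return P(X + Y = z) for independent Poisson X (mean lam) and Y (mean mu)."""
    return sum(poisson_pmf(x, lam) * poisson_pmf(z - x, mu) for x in range(z + 1))


lam, mu = 1.5, 2.0  # hypothetical means
for z in range(8):
    # Each total should match a single Poisson distribution with mean lam + mu.
    print(z, isclose(sum_pmf(z, lam, mu), poisson_pmf(z, lam + mu)))
```

Multiplying the two probabilities inside the sum is only valid because the variables are taken to be independent.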
Although you do not need to know the proof of the result in Key point 2.2, it does show an interesting link
with the binomial expansion.
PROOF 4
Hywel receives an average of emails and texts each hour. These are the only types of
message he receives.
a Assuming that the numbers of emails and texts follow independent Poisson distributions, find the
probability that he receives more than messages in an hour.
b Explain why the assumption that the emails and texts form independent Poisson distributions is
unlikely to be true.
Common error
Sometimes people think that the mean rate in a Poisson distribution has to be a whole number.
This is not the case.
The number of errors in a computer code is believed to follow a Poisson distribution with a mean
of errors per lines of code. Find the probability that there are more than errors in lines
of code. Which is the correct solution? Identify the errors made in the incorrect solutions.
More than errors in lines is equivalent to more than error in lines, so you need
EXERCISE 2A
4 From a particular observatory, shooting stars are observed in the night sky at an average rate of
one every five minutes. Assuming that this rate is constant and that shooting stars occur (and
are observed) independently of each other, what is the probability that more than are seen
over a period of one hour?
5 When examining blood from a healthy individual, under a microscope, a haematologist knows
he should see on average four white blood cells in each high power field. Find the probability
that blood from a healthy individual will show:
a seven white blood cells in a single high power field
b a total of white blood cells in six high power fields, selected independently.
6 A wire manufacturer is looking for flaws. Experience suggests that there are on average
flaws per metre in the wire.
a Determine the probability that there is exactly one flaw in one metre of the wire.
b Determine the probability that there is at least one flaw in metres of the wire.
7 The random variable has a Poisson distribution with mean . Calculate:
a
b
c
d
8 The number of eagles observed in a forest in one day follows a Poisson distribution with mean
.
a Find the probability that more than three eagles will be observed on a given day.
b Given that at least one eagle is observed on a particular day, find the probability that exactly
two eagles are seen that day.
9 The random variable follows a Poisson distribution. Given that , find:
a the mean of the distribution
b .
10 Let be a random variable with a Poisson distribution, such that . Use technology
to estimate , giving your answer to three significant figures.
11 The number of emails Sarah receives per day follows a Poisson distribution with mean . Let
be the number of emails received in one day and the number of emails received in a seven-
day week.
a Calculate and .
b Find the probability that Sarah receives emails every day in a seven-day week.
c Explain why this is not the same as .
12 The number of mistakes a teacher makes while marking homework has a Poisson distribution
with a mean of errors per piece of homework.
a Find the probability that there are at least two marking errors in a randomly chosen piece of
homework.
b Find the most likely number of marking errors occurring in a piece of homework. Justify your
answer.
c Find the probability that in a class of students fewer than half of them have errors in their
marking.
13 A car company has two limousines that it hires out by the day. The number of requests per day
has a Poisson distribution with mean requests per day.
a Find the probability that neither limousine is hired on any given day.
b Find the probability that some requests have to be denied on any given day.
c If each limousine is to be used equally, on how many days in a period of days would you
expect a particular limousine to be in use?
14 The random variable follows a Poisson distribution with mean . Given that
, find the exact value of .
15 The random variable follows a Poisson distribution with mean .
a Show that .
For a one-tailed test, compare the calculated probability to the significance level directly. For a two-tailed
test, you usually find the probability of one tail and compare it to half of the significance level.
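The two-tailed procedure described here can be sketched in Python. The numbers are hypothetical (a null-hypothesis mean of 4, observations of 8 and 9, and a 5% significance level), and the function names are assumptions; the sketch compares a single tail with half of the significance level.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """Return P(X <= k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))


def two_tailed_poisson_test(observed, lam0, significance):
    """Return (tail probability, reject?) comparing one tail with half the level."""
    if observed >= lam0:
        # Upper tail: P(X >= observed) = 1 - P(X <= observed - 1).
        tail = 1 - poisson_cdf(observed - 1, lam0)
    else:
        # Lower tail: P(X <= observed).
        tail = poisson_cdf(observed, lam0)
    return tail, tail < significance / 2


# Hypothetical data: mean 4 under the null hypothesis, tested at the 5% level.
for observed in (8, 9):
    tail, reject = two_tailed_poisson_test(observed, 4, 0.05)
    print(observed, round(tail, 4), "reject H0" if reject else "do not reject H0")
```

With these assumed numbers an observation of 8 is not extreme enough to reject the null hypothesis, but an observation of 9 is. Doubling the tail probability gives the p-value to compare with the full significance level instead.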
The number of telephone calls received by a company follows a Poisson distribution. Over long
experience it is thought that the mean is calls per hour. After a redesign of their website it is
found that they got calls in an hour. Test at the significance level if this provides significant
evidence of a change in the mean number of calls per hour.
This is more than so do not reject $H_0$.
Compare the upper tail to half of the significance value, since this is a two-tailed test. If you want the p-value, double the probability to get a p-value of .
There is insufficient evidence to suggest that the mean number of calls has changed from per hour.
Write a conclusion within the context of the question.
is the random variable ‘number of absences per day in a school’. It is thought to follow a
Poisson distribution with mean . Following a change in the registration system, the number of
absences over five days was . Test at the significance level if the change in the registration
system has affected the average rate of absences. Which is the correct solution? Identify the
errors made in the incorrect solutions.
A ,
Under , .
If there are absences over five days, this is a rate of eight per day, so you need
. This is more than so you cannot reject . The average rate is
absences per day.
B ,
. Since this is a two-tailed test you must double this to get a p-value of
. This is less than the significance level so you can reject . There is evidence at the
significance level that the average rate has changed from absences per day.
C ,
. so reject .
EXERCISE 2B
1 Conduct these hypothesis tests based on the given observation. You can assume that the data follow
a Poisson distribution. Use the significance level.
a i
ii
b i
ii
c i
ii
d i
ii
2 Find the critical region (the set of values for which the null hypothesis is rejected) at the
significance level if:
a i
ii
b i
ii
c i
ii .
3 It is known that a sample of radium emits alpha particles per millisecond. A second sample of the
same size and shape emits alpha particles in a millisecond. Test at the significance level whether
this sample has the same emission rate as radium.
4 a Over a long period it is believed that the average number of cars travelling past a traffic light
follows a Poisson distribution, with cars per minute. After some roadworks, it is thought that the
number of cars passing is lower. In a one minute observation only cars pass the traffic light. Find
the p-value of this observation and hence decide at the significance level if the roadworks have
caused a decrease in traffic levels.
b Suggest two reasons why a Poisson distribution might not be appropriate.
5 a The number, , of accidents per month on a road is studied. The mean number of accidents per
month is with standard deviation . Explain why this supports the suggestion that the number
of accidents follows a Poisson distribution.
b Assume that does indeed follow a Poisson distribution. It is thought that adding a speed camera
will reduce the average number of accidents from . In the month after the camera was added
there were accidents. Test at the significance level if this is evidence of a reduction in the
average number of accidents.
6 The numbers of mistakes in nine pieces of a student’s homework are shown:
a Estimate the mean and standard deviation of the number of mistakes, based upon these data.
b After a new hedge has been planted it is thought that the number of bees arriving will increase. In
minutes bees visit the flower. Test at the significance level if there is evidence that the
number of bees has increased.
8 The number of leaks in a pipe is known to follow a Poisson distribution with mean leaks per km.
After the water pressure was changed, an inspection of of pipe revealed leaks. Has there been
a change in the mean number of leaks? Test, using the significance level.
9 It is known from long experience that earthquakes occur in a particular town once every four months.
Environmentalists believe that a change in the way oil is extracted from a well will increase the
number of earthquakes. They monitor the activity for one year and six earthquakes occur.
a Test at the significance level whether the number of earthquakes has increased from the long-
term trend, stating your -value.
b They continue to monitor earthquake activity and the following year six earthquakes also occur.
Test at the significance level whether the number of earthquakes has increased from the long-
term trend, stating your -value.
10 The discrete random variable follows . A single observation is used to test against
. What is the smallest value of for which will be rejected at the significance level
when the observation is ?
If , and , then
You can use the Poisson distribution to conduct a hypothesis test to see if it suggests that the
mean rate has changed.
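As an illustration of such a test, here is a minimal Python sketch of how a p-value could be computed. The numbers are hypothetical (a null-hypothesis mean of 3 and an observation of 6); they are not taken from any question in this chapter, and the function names are chosen only for this example.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    if k < 0:
        return 0.0
    return exp(-lam) * sum(lam**r / factorial(r) for r in range(k + 1))

def upper_tail_p_value(observed, lam):
    """P(X >= observed) under H0; used when H1 says the mean has increased."""
    return 1.0 - poisson_cdf(observed - 1, lam)

def lower_tail_p_value(observed, lam):
    """P(X <= observed) under H0; used when H1 says the mean has decreased."""
    return poisson_cdf(observed, lam)

# Hypothetical data: H0 mean of 3, one observation of 6, assumed 5% level.
p_upper = upper_tail_p_value(6, 3.0)
reject_h0 = p_upper < 0.05
print(round(p_upper, 4), reject_h0)
```

For a two-tailed test you would compare the smaller tail probability with half the significance level (or, equivalently, double it before comparing).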
Mixed practice 2
1 The number of complaints in a shop in any hour while it is open follows a Poisson distribution
with mean per hour. Find the probability that in a three-hour shift there are fewer than
complaints, giving your answer to three significant figures. Choose from these options.
3 The random variable is the number of robins that visit a bird table each hour. The random
variable is the number of thrushes that visit a bird table each hour. These are the only types
of bird that visit the table.
is the random variable ‘Number of birds visiting the table each hour’.
b Find the probability that no birds visit the table in one hour.
c Find
4 is the random variable ‘number of burgers ordered per hour in a restaurant’. It is thought
that .
a Write down two conditions required for the Poisson distribution to model data.
b Find
c During a ‘happy hour’ special offer the number of burgers sold increased to . Test at the
significance level whether the special offer has increased the average rate of burgers
ordered from .
5 Salah is sowing flower seeds in his garden. He scatters seeds randomly so that the number of
seeds falling on any particular region is a random variable with a Poisson distribution, with
mean value proportional to the area. He intends to sow fifty thousand seeds over an area of
.
7 Seven observations of the random variable , the number of power surges per day in a power
cable, are shown:
a Estimate the mean and standard deviation of , based upon these observations.
b Use your answer to part a to explain why the Poisson distribution is a plausible model for
.
c When a new brand of cable is used it is observed that there are power surges in five
days. Does this suggest that the new brand has a different average rate of power surges to
your answer in part a? Use a significance level.
a Find the probability that on a particular day she will answer more than phone calls.
b Find the probability that she will answer more than phone calls every day during a five-
day week.
9 During the month of August in Bangalore, India, there are on average rainy days.
a Find the probability that there are fewer than seven rainy days during the month of August
in a particular year.
b Find the probability that, in ten consecutive years, exactly five have fewer than seven rainy
days in August.
b .
12 A geyser erupts randomly. The eruptions at any given time are independent of one another
and can be modelled, using a Poisson distribution with mean per day.
a Determine the probability that there will be exactly one eruption between and
b Determine the probability that there are more than eruptions during one day.
c Determine the probability that there are no eruptions in the minutes Naomi spends
watching the geyser.
d Find the probability that the first eruption of a day occurs between and
e If each eruption produces litres of water, find the expected volume of water produced
in a week.
f Determine the probability that there will be at least one eruption in at least six out of the
eight hours the geyser is open for public viewing.
g Given that there is at least one eruption in an hour, find the probability that there is exactly
one eruption.
13 In a particular town, rainstorms occur at an average rate of two per week and can be
modelled, using a Poisson distribution.
a What is the probability of at least eight rainstorms occurring during a particular four-week
period?
b Given that the probability of at least one rainstorm occurring in a period of complete
weeks is greater than , find the least possible value of .
14 Patients arrive at random at an emergency room in a hospital at the rate of per hour
throughout the day.
a Find the probability that exactly four patients will arrive at the emergency room between
and .
b Given that fewer than patients arrive in one hour, find the probability that more than
arrive.
15 It is thought that . A single observation of takes the value . Does this provide
evidence at the significance level that the average rate has decreased? Support your
answer by writing down the -value of the observation.
16 Based on long experience a gardener knows that birds tend to arrive in his garden at an
average rate of per hour.
a State two assumptions required to model the birds’ arrival, using a Poisson distribution. Are
these reasonable assumptions?
b If these assumptions do hold, find the probability of observing more than birds in an
hour.
The gardener plants some new flowers. He wants to know if this changes the birds’ behaviour.
c If is the true average rate of arrival of birds after the new flowers have been planted,
write down suitable null and alternative hypotheses for answering the gardener’s question.
d If birds are observed in an hour, what is the conclusion of the test at significance?
17 A water company believes that pipes have leaks per km, following a Poisson distribution.
After increasing water pressure they are concerned that there are more leaks. They find
leaks in a section of pipe. Does this provide significant evidence at the significance
level to suggest that the mean number of leaks has increased?
18 A shop has four copies of the magazine Ballroom Dancing delivered each week. Any unsold
copies are returned. The demand for the magazine follows a Poisson distribution with mean
requests per week.
a Calculate the probability that the shop cannot meet the demand in a given week.
d Determine the smallest number of copies of the magazine that should be ordered each
week to ensure that the demand is met with a probability of at least .
19 Annette is a senior typist and makes an average of mistakes per letter. Bruno is a trainee
typist and makes an average of mistakes per letter. Assume that the number of mistakes
made by any typist follows a Poisson distribution.
i Find the probability that a randomly chosen letter contains exactly three mistakes.
ii Given that a letter contains exactly three mistakes, find the probability that it was typed
by Annette.
c Annette and Bruno type one letter each. Given that the two letters contain a total of three
mistakes, find the probability that Annette made more mistakes than Bruno.
20 The number of worms in a square metre in a forest satisfies the distribution . A scientist
samples many square-metre areas but only records areas where some worms are observed.
What is the mean value of her observations?
21 Mohammed is offered a week’s trial with a view to being permanently employed to service
bicycles in Robyn’s bicycle shop.
a Find the probability that, on Mohammed’s first day, the number of bicycles brought in to be
serviced is:
i or fewer
ii more than
iii exactly .
b Before starting work, Mohammed told his mother that he hoped that, during his first week (
days), the number of bicycles brought in to be serviced would be:
at least , otherwise Robyn might decide that there was not enough work to justify
permanently employing him
not more than , so that he would not have to work too hard.
Find the probability that Mohammed’s hopes will be met.
[© AQA 2011]
22 At a Roman site, coins are found at an average rate of coin per . Assume that the
number of coins found can be modelled by a Poisson distribution.
b Determine the probability that more than coins are found in an area of .
c Bronze brooches are less common than coins at this site, and are found at an average rate
of brooch per . The number of these brooches found is independent of the number of
coins found. Assume that the number of bronze brooches found can also be modelled by a
Poisson distribution.
i Determine the probability that the total number of coins and bronze brooches found in
an area of is at least .
ii Sometimes, Romans buried a hoard of several coins together. They did not usually bury
several bronze brooches together. State, with a reason, which of
[© AQA 2013]
3 Chi-squared tests
A Level Mathematics Student Book 1, Chapter 21
You should know how to calculate probabilities for independent events.
2 The probability of Andrew scoring a goal is and the probability of Helen scoring a goal is .
Given that these outcomes are independent, what is the probability that they both score a goal?
Independent?
One common question you can ask in a statistical situation is whether or not two variables are dependent –
for example, do future earnings depend on A Level choices? In this chapter you will look at a statistical test
to answer this type of question.
[Figure: graph of estimated angle (degrees) against time (s)]
It turns out that if two variables are independent then they will definitely be uncorrelated, but the
reverse is not true. You can write:
Section 1: Contingency tables
In this section you will try to design a hypothesis test that decides whether two variables are dependent:
Tip
Choose to be that the variables are independent because you can use that to calculate
expected values. You cannot use the fact that two variables are dependent to calculate expected
values unless you are given more information about what that dependence is.
To describe the two variables you use contingency tables that list how often each combination of
variables occurs. For example, this table illustrates the results of a survey of young families. The observed
value in cell is called .
Number of children
or more
or fewer
Number
of
bedrooms
or more
This is a contingency table. Notice that each cell contains actual frequencies rather than probabilities
or proportions.
You are going to need a way of measuring how far this is away from the numbers you would expect if the
two variables were independent. To do this you look at the totals.
Number of children
Total
or more
or fewer
Number
of
bedrooms
or more
Total
Based on the sample, the probability of having bedrooms or fewer is and the probability of having
children is . If the two variables are independent, the probability of both occurring is the product of
these probabilities, so the probability of two children and two bedrooms is . In a sample of size
you would then expect there to be families with two children and two bedrooms.
The expected frequency in cell is called .
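The calculation described above (row total multiplied by column total, divided by the sample size) can be sketched in Python. The observed table here is hypothetical and is not the survey data from the text.

```python
def expected_frequencies(observed):
    """Expected frequency for each cell, assuming the two variables are independent:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of observed frequencies.
observed = [[10, 20, 30],
            [20, 10, 10]]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Each row of the expected table has the same total as the corresponding row of the observed table, and likewise for the columns.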
Tip
You can create another contingency table containing all the expected frequencies.
Number of children
or more
or fewer
Number
of
bedrooms
or more
There are several possible measures of the difference between observed and expected values. The
measure you need to know is called chi-squared .
Tip
Notice that the row totals and the column totals are the same as for the original data. This is a
useful check.
The chi-squared value that gives the difference between the observed values, $O_i$, and the
expected values, $E_i$, is
$\sum \frac{(O_i - E_i)^2}{E_i}$
Large values indicate a big difference between observed and expected data. Is this value large enough to
conclude that number of bedrooms and number of children are not independent? To decide, you need to
know the distribution of to see how likely the observed value is. This distribution has a single
parameter that depends on the number of cells in the table and that, for historical reasons, is called the
degrees of freedom – often given the symbol ν (lowercase Greek letter ‘nu’) or DF.
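A short Python sketch of the calculation may help. It sums (observed − expected)² / expected over every cell and, for a contingency table, takes the degrees of freedom as (rows − 1) × (columns − 1), which is the standard rule. The data are hypothetical and the function names are chosen only for this example.

```python
def expected_frequencies(observed):
    """(row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Hypothetical 2 x 3 table of observed frequencies.
observed = [[10, 20, 30],
            [20, 10, 10]]
statistic = chi_squared_statistic(observed, expected_frequencies(observed))
df = degrees_of_freedom(observed)
print(round(statistic, 3), df)
```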
If the null hypothesis is true (that the variables are independent) approximately follows the chi-
squared distribution with degrees of freedom – the distribution. However, this approximation is only
valid if all the expected frequencies in the contingency table are greater than .
If the null hypothesis is true and the expected value for all , then the chi-squared value is
In the survey results, not all of the expected values are above . When this happens you need to combine
some rows or columns in a way that is sensible in context. The most obvious way with the example given is
to combine the ‘ children’ group with the ‘ or more children’ group.
Number of children
or more
or fewer
Number of
bedrooms
or more
You can then create the new contingency table of expected values.
Tip
You can find the expected values by adding up the corresponding expected values from the
original table. You don’t have to recalculate the frequencies, using Key point 3.1.
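You can check this shortcut numerically. The sketch below uses a hypothetical table (not the survey data) in which the last column has expected frequencies below 5; it merges the last two columns and confirms that recalculating the expected values gives the same result as adding the original expected values.

```python
def expected_frequencies(observed):
    """(row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def merge_last_two_columns(table):
    """Combine the final two columns of a table into one by adding them."""
    return [row[:-2] + [row[-2] + row[-1]] for row in table]

# Hypothetical observed data: the last column gives expected values below 5.
observed = [[12, 9, 3],
            [8, 11, 2]]
expected = expected_frequencies(observed)

# Route 1: merge the observed columns, then recalculate the expected values.
recalculated = expected_frequencies(merge_last_two_columns(observed))
# Route 2: simply add the corresponding expected values.
added = merge_last_two_columns(expected)

print(recalculated)
print(added)
```

Both routes agree because each expected value is proportional to its column total, and the column totals add when the columns are merged.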
Number of children
or more
or fewer
Number of
bedrooms
or more
You can find the contributions of each cell to the total chi-squared value.
Number of children
or more
or fewer
Number of
bedrooms
or more
Totalling these contributions, and . You can compare this value to critical
values given in the formula book.
The highlighted value, 9.488, gives the critical value for a test at the significance level with four
degrees of freedom. The column is headed because of chi-squared values with degrees of
freedom are below this value, therefore are higher. The calculated value of is higher than , so
you reject the null hypothesis and conclude that number of bedrooms and number of children are
dependent variables.
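If you want to see where a tabulated critical value comes from, the following Python sketch computes the upper-tail probability of a chi-squared distribution. It relies on a closed form that only holds when the number of degrees of freedom is even, so it is an illustration rather than a general method; the function name is chosen only for this example.

```python
from math import exp, factorial

def chi_squared_upper_tail(x, df):
    """P(X > x) for a chi-squared distribution with an EVEN number of degrees of freedom.
    For df = 2k the tail equals exp(-x/2) * sum_{j<k} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))

# Tail probability beyond the tabulated value 9.488 with four degrees of freedom.
tail = chi_squared_upper_tail(9.488, 4)
print(round(tail, 4))
```

The tail probability is the proportion of chi-squared values that lie above the critical value when the null hypothesis is true.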
The contingency table showing the contributions of each cell to the sum shows some cells have a
Tip
Some calculators allow you to do the chi-squared test automatically and provide you with the
-value. This alternative approach is acceptable.
Determine at the significance level whether or not the colour of a car sold by a dealership is
independent of the gender of the purchaser.
Gender
Total
Male Female
Blue
Red
Colour
Green
Silver
Total
Silver
This contingency table shows the favourite sports played by different age groups in a sample at a
sports centre.
Age
Soccer
Basketball
Favourite
sport
Swimming
Tennis
a Test at the significance level whether preferred sport and age are independent, showing the
contributions of each cell.
The expected values are:
Soccer
Basketball
Swimming
Tennis
Find expected values, using Key point 3.1. Check that all the expected values are above ,
which they are not in this case. Notice that the row and column totals are the same as those in
the observed data table.
Several cells in the range have a frequency less than , so combining this
column with the column:
The most obvious choice is to combine the and groups.
Soccer
Basketball
Swimming
Tennis
Soccer
Basketball
Swimming
Tennis
Soccer
Basketball
Swimming
Tennis
A biologist claims that the mobility of fish is dependent on their breeding ground. A sample of
fish was taken, with equal numbers from each of the two breeding grounds, Ellesmere and Duxbury,
studied. A test was used to classify the fish as sedentary, normal or highly mobile. In Ellesmere half
of the fish were classified as highly mobile and one-fifth as normal. In Duxbury one quarter of the
fish were classified as normal and of the Duxbury fish were classified as sedentary.
Expected values:
Sedentary Normal Highly mobile
Ellesmere
Duxbury
You can use Key point 3.1 to calculate the expected values. All expected values are larger
than so no combining is required.
Find the value of , using Key point 3.2.
EXERCISE 3A
1 Test these contingency tables to see if the two variables are dependent at the significance level.
State carefully the number of degrees of freedom and the value of . In part b, combine
suitable columns to make all expected frequencies greater than 5.
a i Exam grade
or , or or
Mr Archer
Teacher Ms Baker
Mrs Chui
ii Time working
hours hours hours
Male
Gender
Female
b i Age
Social
media
followers
ii Cost
Red
Colour Green
Blue
2 A Physics teacher wants to investigate whether or not there is any association between the
Physics grade her students get and the Mathematics course they study. She collects data for a
random sample of students over several years. The results are given in this table.
or lower or or
Further Maths
Maths AS or
No Maths
Walk
Car
Other
Total
a Copy and complete the contingency table.
b Calculate the value of for this data.
c Conduct an appropriate test at the significance level to answer James’ question.
d What assumptions does James have to make in conducting this test?
5 The owner of a beauty salon wants to find out whether there is any association between the
number of times in a year people visit the salon and the amount of money they spend on each
visit. He collects these data for a random sample of clients.
Number of visits
Amount spent
per visit £
Is there evidence, at the level of significance, that there is some association between the
number of visits and the amount of money spent? Interpret your result in context.
6 A drugs manufacturer claims that the speed of recovery from a certain illness is higher for people
who take a higher dose of their new drug. They provide these data for a sample of patients.
No drug taken Single dose Double dose
days
days
days
days
Test whether there is evidence for the manufacturer’s claim at the level of significance.
Interpret your results in context.
7 A company is investigating their gender equality policies. As a part of this investigation they
collect data on salaries, to the nearest pound, for a random sample of employees, as shown in
this table.
Male Female
c Find the largest sample size that will produce a significant result in the chi-squared test at the
significance level, assuming that all cells have an expected frequency of at least 5.
9 A researcher believes that these percentages are the true proportions of people voting for
different political parties based on their gender:
Male Female
Party A
Party B
Party C
First factor
Total
Second factor B
Total
a Copy the table and fill in the blanks.
b Hence explain why it can be said that this contingency table has degrees of freedom.
11 Explain why the formula for chi-squared contains:
a squaring before summing
b dividing by the expected value.
Section 2: Yates’ correction
It turns out that when the number of degrees of freedom is 1 (i.e. for a 2 × 2 contingency table),
the approximation by the chi-squared distribution is not very good. To improve upon this you use
an alternative formula, called Yates’ correction.
Yates’ correction:
Rewind
You met the modulus function, $|x|$, in A Level Mathematics Student Book 2, Chapter 3.
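The standard form of Yates’ correction subtracts 0.5 from the modulus of each difference before squaring. A minimal Python sketch, using a hypothetical 2 × 2 table rather than any data from this section, compares the corrected and uncorrected values.

```python
def expected_frequencies(observed):
    """(row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(observed, yates=False):
    """Chi-squared value; with yates=True use (|O - E| - 0.5)^2 / E for each cell."""
    if yates and (len(observed) != 2 or len(observed[0]) != 2):
        raise ValueError("Yates' correction is for 2 x 2 tables only")
    expected = expected_frequencies(observed)
    shift = 0.5 if yates else 0.0
    return sum((abs(o - e) - shift) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Hypothetical 2 x 2 table of observed frequencies.
observed = [[30, 20],
            [20, 30]]
uncorrected = chi_squared(observed)
corrected = chi_squared(observed, yates=True)
print(uncorrected, corrected)
```

With one degree of freedom the 5% critical value is 3.841, so in this made-up example the uncorrected value would lead you to reject the null hypothesis while the corrected value would not. The correction always makes the test a little more cautious.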
This contingency table shows the results of people in a driving test, along with their
gender.
Gender
Male Female
Pass
Result
Fail
Test at the significance level if the outcome of the test is independent of gender.
Gender
Male Female
Pass
Result
Fail
Find the expected values, using Key point 3.1.
WORK IT OUT
3.1
Test at the significance level if there is any association between teacher and test result:
Teacher
Total
Mr A Mrs B
Pass
Result
Fail
Total
Which is the correct solution? Identify the errors made in the incorrect solutions.
A If there is no association then each cell will be the same, so the expected values are:
Teacher
Mr Mrs
Pass
Result
Fail
So
The critical value when is , so you can reject ; the result does not
depend on the teacher.
Teacher
Mr Mrs
Pass
Result
Fail
The critical value is , which is more than the calculated value, so do not reject ;
the result does depend on the teacher.
Teacher
Mr Mrs
Pass
Result
Fail
Sometimes the need for Yates’ correction only becomes clear after you combine rows and columns
of a contingency table.
This contingency table shows the location and ownership status of a random sample of
houses.
Urban
Rural
Does this sample provide evidence at the significance level that ownership status depends
on location?
H0: Ownership status and location are independent.
H1: Ownership status and location are dependent.
First write the null and alternative hypotheses.
Urban
Rural
EXERCISE 3B
1 Use Yates’ correction to test these contingency tables for evidence of association, using
significance.
a i
ii
b i
ii
2 Gregor Mendel, the founder of modern genetics, carefully observed peas and found these
results:
Wrinkled Round
Yellow
Green
Show that the round or wrinkled appearance of the pea is independent of the colour at the
level of significance.
3 A scientist wanted to find out if a colleague could tell whether tea or milk was put in the cup first
when tea was prepared for her. The results are shown here.
Tea first Milk first
Likes
Dislikes
4 This table shows the number of books in libraries in rural and urban locations.
Number of books
to
Rural
Urban
Conduct a test at the significance level to determine if the number of books differs between
rural and urban libraries.
5 These data show the number of murders each year and the amount spent on horror films in the
cinema across the last years in the UK.
Amount spent on horror films in million £
to
Number of
murders
Test at the significance level to see if there is an association between the amount spent on
horror films and the number of murders each year. Does this provide evidence that watching
horror films encourages people to commit murder?
6 In admissions to the six largest departments in Berkeley, a university in California, followed
this pattern.
Accepted Rejected
Male
Female
Total
7 Explain why the null hypothesis in a chi-squared test cannot be ‘The two variables are
dependent’.
The distribution provides a very important method for deciding if two variables are
independent.
If the variables are independent, you use the formula
2 A contingency table has all expected frequencies larger than and a chi-squared value of
. What is the largest range of values of for which there is evidence that the two factors are
not independent at significance?
3 The area manager of a bank obtained information on randomly selected loans made by
the bank during the previous two years.
The loan recipient types were categorised as ‘Individual’, ‘Small business’ or ‘Large business’.
Recipient
Satisfactory
Outcome
Bad debt
Using a distribution and the level of significance, test whether the outcome of a loan is
independent of the type of recipient.
[© AQA 2013]
4 This contingency table shows the data on hair colour and eye colour for a sample of
children.
Eye colour
Brown
Hair colour
Blonde
a Assuming that hair colour and eye colour are independent, calculate the expected
frequencies.
b Calculate the value of the statistic for this data and state the number of degrees of
freedom.
c Perform a suitable hypothesis test at the level of significance to decide whether hair
colour and eye colour are independent. State your hypotheses and your conclusion clearly.
5 A nurse thinks that she has noticed that more boys are born at certain times of the year. She
records the data for babies born in her hospital in one year.
Boy
Girl
Test at the significance level whether her data gives evidence for any association between
the gender of the baby and the time of the year. You must show all your working clearly.
6 Find the value of the appropriate chi-squared test statistic (to three significant figures) for this
contingency table.
7 A large estate agency would like all the properties that it handles to be sold within three
months. A manager wants to know whether the type of property affects the time taken to sell
it. The data for a random sample of properties sold are tabulated here.
Type of property
Total
Flat Terraced Semidetached Detached
Total
b The manager plans to spend extra money on advertising for one type of property in an
attempt to increase the number sold within three months. Explain why the manager might
choose:
i terraced properties
ii flats.
[© AQA 2013]
8 Fiona, a lecturer in a school of engineering, believes that there is an association between the
class of degree obtained by her students and the grades they had achieved in A Level
Mathematics.
In order to investigate her belief, she collected the relevant data on the performances of a
random sample of recent graduates who had achieved grades or in A Level
Mathematics. These data are tabulated here.
Class of degree
Total
A Level grade
Total
b Make two comments on the degree performance of those students in the sample who
achieved a grade in A Level Mathematics.
[© AQA 2012]
9 An organisation kept details of sideswipe accidents involving heavy goods vehicles (HGVs)
during 2006.
The type of each sideswipe accident was recorded as changing lane to the left, changing lane
to the right or overtaking moving vehicle.
The HGV involved was identified as either British registered (right-hand drive) or foreign
registered (left-hand drive).
Total
ii Describe any differences found in the type of sideswipe accident between British
registered and foreign registered HGVs.
b A further random sample of serious HGV accidents was investigated. It was found that
of these involved drivers who were years of age or younger. Of these accidents,
resulted in prosecution for a driving offence. Of the other accidents, which involved drivers
over the age of years, resulted in prosecution for a driving offence.
ii Carry out a test, at the significance level, to investigate whether the age of the
driver is independent of whether a prosecution for a driving offence resulted.
Interpret your conclusion in context.
[© AQA 2011]
10 The director of a large company wants to know whether there is any association between the
ages of her staff and the departments they work in. The table shows the data for a sample of
employees.
Accounts
Personnel
Marketing
Communications
Perform a suitable test at the level of significance to decide whether there is any
association between age and department.
11 The table shows the experience of a bank over a long period of the types of loan that they
give and whether they are repaid or defaulted (i.e. not repaid).
Repaid Defaulted
Personal
Mortgage
Business
a Show that whether or not a loan gets repaid depends on the type of loan.
b A statistician wants to sample loans at random. Show that would not show
dependence between the two values, using significance.
c Find the smallest whole number for which the sample would be expected to show
dependence at significance. You can assume that all expected values are above .
12 Research was carried out to investigate for a possible connection between weekly alcohol
consumption and development of Type 2 diabetes. In the research report, it was stated that a
sample of women, aged between and , was studied and that of these women went
on to develop Type diabetes.
The women were categorised according to their average level of weekly alcohol
consumption. This was measured, in grams of alcohol per week, as ‘less than ’, ‘between
and ’ or ‘more than 30’.
Type 2 diabetes
developed
Yes No
Less than
More than
b A medical reviewer for a newspaper read the report and then stated that people should
increase their weekly alcohol consumption in order to decrease their chance of developing
Type diabetes.
Make two comments on his statement, referring to both the study and the sources of
association, if any, identified when carrying out the test in part a.
c In fact, women were involved in the research but the frequencies in the resulting
contingency table had been divided by in order to make the calculation simpler.
The test in part a was therefore repeated using the correct frequencies.
[© AQA 2010]
4 Continuous distributions
A Level Mathematics Student Book 1, Chapter 20
You should know the meaning of the statistical measures covered in AS Level Mathematics.
2 Find the interquartile range of:
A Level Mathematics Student Book 2, Chapter 20
You should know how to use the rules of probability including conditional probability.
5 Given that and , find .
Mass Frequency
Not all of the data in category has a mass of exactly . A bag with mass or
would be included in this category. It is impossible to list all the different possible actual masses, and it is
impossible to measure the mass absolutely accurately. When you collect continuous data, you have to put
it into groups. This means that you cannot talk about the probability of a single value of a continuous
random variable (CRV). You can only talk about the probability of the CRV being in a specified interval.
[Figure: x-axis showing the interval from 5.05 to 5.15, with the value 5.087 954 6 marked]
A useful way of representing probabilities of a CRV is as an area under a graph. The probability of a single
value would correspond to the ‘area’ of a vertical line, which would be zero. However, you can find the
probability of the CRV lying in any interval by integration.
The function which you have to integrate is called the probability density function (PDF), and it is often
denoted . The defining feature of is that the area between two values is the probability of the CRV
falling between those two values.
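To make this concrete, here is a small Python sketch that finds a probability as an area by numerical integration. The PDF used, f(x) = 2x for 0 ≤ x ≤ 1, is a hypothetical example chosen for simplicity, and the midpoint rule stands in for the exact integration you would do by hand.

```python
def f(x):
    """Hypothetical PDF: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g from a to b using the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# P(0.2 < X < 0.5) is the area under f between 0.2 and 0.5.
prob = integrate(f, 0.2, 0.5)
# The 'area' above a single value is zero.
single_value = integrate(f, 0.3, 0.3)
print(round(prob, 4), single_value)
```

Integrating by hand gives 0.5² − 0.2² = 0.21, which agrees with the numerical value.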
Tip
For a continuous random variable, it does not matter whether you use strict inequalities
or inclusive inequalities .
[Figure: graph of y = f(x), with the values a and b marked on the x-axis]
As with discrete probabilities, the total probability over all cases must equal . Also, no probability can ever
be negative. This provides two requirements for a function to be a probability density function.
, for all
Tip
The limits and represent the fact that, in theory, a continuous random variable can take
any real value. In practice, the limits of the integral are set to the lowest and the highest value
the variable can take.
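The two requirements can be checked numerically. In this sketch the PDF is assumed to have the hypothetical form f(x) = kx² for 0 ≤ x ≤ 3; the code finds the value of k that makes the total area 1 and confirms that the function is never negative.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g from a to b using the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def shape(x):
    """Hypothetical unnormalised density: x^2 on [0, 3], zero elsewhere."""
    return x**2 if 0 <= x <= 3 else 0.0

# Requirement 1: total area must be 1, so k is 1 / (area under the shape).
k = 1 / integrate(shape, 0, 3)

def f(x):
    return k * shape(x)

total = integrate(f, 0, 3)
# Requirement 2: f(x) >= 0 everywhere (checked here on a grid of points).
non_negative = all(f(i * 3 / 1000) >= 0 for i in range(1001))
print(round(k, 4), round(total, 4), non_negative)
```

By hand, the integral of x² from 0 to 3 is 9, so k = 1/9.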
[Figure: graph of f(x) against x, with the value 1 marked on the x-axis]
a
The total area is . The limits are and because the PDF is only
non-zero between and .
b Use the formula in Key point 4.1 and substitute the value of
found in part a.
EXERCISE 4A
1 For each of these distributions find the possible values of the unknown parameter .
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
f i
ii
g i
ii
h i
ii
2 In each part, a continuous random variable has the given probability density function.
a
i Find
ii Find
i Find
ii Find
i Find
ii Find .
3 In each part, a continuous random variable has the given probability density function.
i Find if
ii Find if
i Find if
ii Find if
c
i Find if
ii Find if
4 A model predicts that the angle, , an alpha particle is deflected by a nucleus is modelled by
the PDF
5 The probability density function of finding a seed at a distance from a tree is proportional to
. The minimum distance a seed is found from the tree is . Find the probability of a seed
being found more than from the tree.
7 Given that the continuous random variable has probability density function
8
A continuous random variable has probability density function
a Find in terms of if
b Find in terms of if
9 The continuous random variable has probability density function for and
otherwise. The probability of two independent observations of both being above is .
Find the values of and of .
10
The continuous random variable has probability density function . Find
11 The continuous random variable has probability density function given by for
. Prove that there is only one possible value of , and state its value.
Section 2: Expectation and variance of continuous random variables
The expressions for expectation and variance of continuous random variables all involve integration.
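Tip
You might notice that the expressions for $E(X)$ and $\mathrm{Var}(X)$ look similar to those for discrete
random variables, but with integration instead of summation signs. This is because there is a link
between sums and integrals.
For a continuous random variable $X$ with probability density function $f(x)$:
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - [E(X)]^2$
You need to evaluate these integrals over the whole domain of the probability density function.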
It is also possible to find the median and mode for a continuous distribution.
The defining feature of the median is that half of the data should be below this value and half above. You
can interpret this in terms of probability.
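If you represent the median of a continuous random variable with PDF $f(x)$ by $m$, then it satisfies
$\int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$.
The mode is the value of $x$ at which $f(x)$ takes its largest value.
Don't forget to look at the end points of the function when finding the mode.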
You can use similar ideas to find the quartiles (or any other percentile). For example, if $Q_1$ is the lower
quartile and $Q_3$ is the upper quartile then
$\int_{-\infty}^{Q_1} f(x)\,dx = 0.25$ and $\int_{-\infty}^{Q_3} f(x)\,dx = 0.75$.
Although the lower limit is written as minus infinity, in practice it starts from the lowest value for which the
probability density function is defined.
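When the integral cannot easily be solved for the median or a quartile, the defining equation can be solved numerically. The following is a minimal sketch under stated assumptions: the density f(x) = 3x^2 on [0, 1] and the function names are hypothetical, and bisection is just one convenient root-finding method.

```python
def integrate(f, lower, upper, steps=20_000):
    """Approximate the integral of f from lower to upper with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width


def percentile(f, lower, upper, p, tol=1e-9):
    """Find x such that the area under f from lower to x equals p, by bisection."""
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, lower, mid) < p:
            lo = mid   # not enough area yet: move right
        else:
            hi = mid   # too much area: move left
    return (lo + hi) / 2


def example_pdf(x):
    # Illustrative density (an assumption): f(x) = 3x^2 for 0 <= x <= 1.
    return 3 * x ** 2


median = percentile(example_pdf, 0, 1, 0.5)           # exact value is 0.5 ** (1/3)
lower_quartile = percentile(example_pdf, 0, 1, 0.25)  # exact value is 0.25 ** (1/3)
upper_quartile = percentile(example_pdf, 0, 1, 0.75)  # exact value is 0.75 ** (1/3)
print(median, lower_quartile, upper_quartile)
```

Note that the lower limit used is 0, the lowest value for which this density is defined, rather than minus infinity.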
Find the median and mode of the random variable with probability density function
For median :
Use the formula in Key point 4.4 with the lower limit as .
For the mode, check for a maximum point. This could be where the
derivative is zero or at an end point.
EXERCISE 4B
1 Find $E(X)$, the median of $X$, the mode of $X$ and $\mathrm{Var}(X)$ if $X$ has the given probability density
function.
a i
ii
b i
ii
c i
ii
d i
ii
ii
ii
3
The continuous random variable has pdf
b Find .
a Show that, for all values of , the function satisfies the conditions to be a PDF.
a Show that .
.
Section 3: Expectation and variance of functions of a random variable
Linear transformations
Suppose the average height of students in a class was and their standard deviation was . If they
all stood on their -high chairs then the new average height would be , but the range, and any
other measure of variability, would not change, so the standard deviation would still be . In other
words, if you add a constant on to a variable, you add the same constant on to the expectation, but the
variance does not change:
$E(X + c) = E(X) + c$ and $\mathrm{Var}(X + c) = \mathrm{Var}(X)$
Rewind
You have already met this idea for discrete random variables in Key point 2.5. In this chapter
you extend it to continuous random variables.
If, instead, each student were given a magical growing potion that doubled their heights, the new average
height would be . However, the range, along with any other measure of variability, would have
doubled, so the new standard deviation would be . This means that their variance would change from
to . In other words, if you multiply a variable by a constant, you multiply the expectation by
the constant and multiply the variance by the constant squared:
$E(aX) = aE(X)$ and $\mathrm{Var}(aX) = a^2\mathrm{Var}(X)$
Common error
It is important to know that this only works for the structure $aX + b$, which is called a linear
function. So, for example, $E(X^2)$ cannot be simplified to $[E(X)]^2$.
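For a random variable $X$ with expectation $E(X)$ and variance $\mathrm{Var}(X)$, if $a$ and $b$ are constants,
then
$E(aX + b) = aE(X) + b$, $\mathrm{Var}(aX + b) = a^2\mathrm{Var}(X)$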
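The linear transformation rules can be checked on any data set. The sketch below uses simulated heights; the numbers 165, 8, 2 and 40 and the function name `check_linear_rules` are assumptions chosen purely for illustration.

```python
import random
from statistics import mean, pvariance


def check_linear_rules(data, a, b):
    """Return how far the mean and variance of a*x + b are from the predicted values."""
    transformed = [a * x + b for x in data]
    mean_gap = abs(mean(transformed) - (a * mean(data) + b))
    variance_gap = abs(pvariance(transformed) - a ** 2 * pvariance(data))
    return mean_gap, variance_gap


random.seed(1)
# Hypothetical heights in cm (illustrative values only).
heights = [random.gauss(165, 8) for _ in range(1_000)]

# Double every height and add 40: both gaps should be zero up to rounding error.
mean_gap, variance_gap = check_linear_rules(heights, a=2, b=40)
print(mean_gap < 1e-9, variance_gap < 1e-6)
```

Trying a negative value of `a` shows that the variance is still multiplied by the positive quantity `a ** 2`.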
A length of pipe is cut into a long pipe with average length and standard deviation .
The leftover piece is used as a short pipe. Find the mean and standard deviation of the short pipe.
Length of long pipe
Define your variables.
Length of short pipe
Connect your variables.
General transformations
Key point 1.6 stated that for a discrete random variable
$E(g(X)) = \sum g(x)\,P(X = x)$
You can extend this to continuous random variables by changing the probability into a probability density
function and integrating instead of summing:
$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
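As a rough numerical illustration of the expectation of a general function of a continuous random variable, the sketch below integrates g(x) f(x) over the support. The density f(x) = 3x^2 on [0, 1] and the helper names are assumptions made for this example only.

```python
def integrate(f, lower, upper, steps=100_000):
    """Approximate the integral of f from lower to upper with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width


def expectation(g, pdf, lower, upper):
    """E(g(X)): integrate g(x) * f(x) over the interval where the PDF is non-zero."""
    return integrate(lambda x: g(x) * pdf(x), lower, upper)


def example_pdf(x):
    # Illustrative density (an assumption): f(x) = 3x^2 for 0 <= x <= 1.
    return 3 * x ** 2


mean_x = expectation(lambda x: x, example_pdf, 0, 1)          # exact value 3/4
mean_x_cubed = expectation(lambda x: x ** 3, example_pdf, 0, 1)  # exact value 1/2
variance_x = expectation(lambda x: x ** 2, example_pdf, 0, 1) - mean_x ** 2

print(mean_x, mean_x_cubed, mean_x ** 3, variance_x)
```

Here E(X^3) is 1/2 while [E(X)]^3 is 27/64, which shows that the expectation of a non-linear function is not found by applying the function to E(X).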
Common error
You will never get a negative variance (since squares of real numbers are never negative), even if the
coefficients are negative. If you find you have a negative variance, something has gone wrong!
Given that the random variable has probability density function for and
otherwise, find .
EXERCISE 4C
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii .
2 Given that , find:
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
3 Given that is a continuous random variable with PDF for and otherwise,
find:
a i
ii
b i
ii
c i
ii
d i
ii
4 The expected distance of a random taxi journey is miles with standard deviation miles.
The charge for a taxi journey is £ plus £ per mile (so that, for example, a mile journey
would cost £ ). Find:
6 Daniel has hours of playtime each Sunday afternoon. In that time he either reads or plays
games. If the expected amount of time reading is hours with a standard deviation of hours,
find:
a Find .
b Find the expected volume of the cube.
Common error
Notice that the answer to part b is not the cube of the answer to part a.
8 The continuous random variable has probability density function for
and otherwise.
Find:
a
b
c .
9 The continuous random variable has probability density function for
and otherwise.
.
Section 4: Sums of independent random variables
A tennis racquet is formed by adding together two components – the handle and the head. If both
components have their own distribution of length and they are combined together randomly then you have
formed a new random variable – the length of the racquet. It is not surprising that the average length of
the racquet is the sum of the average lengths of the parts, but with a little thought you can reason that the
standard deviation will be less than the sum of the standard deviations of the parts. To get extremely long
or extremely short tennis racquets you must have extremes in the same direction for both the handle and
the head. This is not very likely. It is more likely that:
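Tip
The first of the results in Key point 4.7 is true even if $X$ and $Y$ are not independent.
For independent random variables $X$, with expectation $E(X)$ and variance $\mathrm{Var}(X)$, and $Y$, with
expectation $E(Y)$ and variance $\mathrm{Var}(Y)$:
$E(X \pm Y) = E(X) \pm E(Y)$
$\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$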
The results in Key point 4.7 extend to more than two variables.
Find the mean and standard deviation of the total height of a whole burger in a bun, assuming that
the thicknesses of the individual parts are independent.
In Key point 4.7 it was stressed that $X$ and $Y$ have to be independent, but this does not mean that they
have to be drawn from different populations. They could be two different observations of the same
population, for example, the heights of two different people added together. This is a different variable
from the height of one person doubled. Use a subscript to emphasise when there are repeated
observations from the same population:
$X_1 + X_2$ means adding together two different observations of $X$.
$2X$ means observing $X$ once and doubling the result.
The expectation of both of these combinations is the same: $2E(X)$. However, the variance is different.
$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = 2\mathrm{Var}(X)$
$\mathrm{Var}(2X) = 4\mathrm{Var}(X)$
So the variability of a single observation doubled is greater than the variability of two independent
observations added together. This is consistent with the earlier argument about the possibility of
independent observations cancelling out extreme values.
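The difference between one observation doubled and two independent observations added can be seen exactly with a small discrete example. The sketch below uses a fair six-sided dice, which is an assumption chosen for illustration; the function name `mean_and_variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(outcomes):
    """Exact mean and variance of a list of equally likely outcomes."""
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    variance = Fraction(sum(x * x for x in outcomes), n) - mean ** 2
    return mean, variance


faces = range(1, 7)  # a fair dice (illustrative assumption)

# One roll doubled: six equally likely outcomes.
doubled = [2 * x for x in faces]
# Two independent rolls added: 36 equally likely ordered pairs.
summed = [x1 + x2 for x1, x2 in product(faces, repeat=2)]

print(mean_and_variance(doubled))  # (Fraction(7, 1), Fraction(35, 3))
print(mean_and_variance(summed))   # (Fraction(7, 1), Fraction(35, 6))
```

Both combinations have the same expectation, but the doubled roll has twice the variance of the sum of two rolls, in line with the factor of 4 against the factor of 2.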
You can also combine the results in Key points 4.5 and 4.7 to look at other linear combinations of
independent random variables.
The volume of lemonade Chris purchases at a supermarket is a discrete random variable with
mean and standard deviation . The volume of lemonade Chris drinks on the
journey home is a continuous random variable with mean and standard deviation .
Assume that is independent of . is the random variable: volume of lemonade in ml remaining
after the journey home.
a Find the expected mean and standard deviation of .
EXERCISE 4D
1 Let and be two independent variables with , , and . Find
the expectation and variance of:
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii .
2 Let and be two independent variables with , , and . Find:
a
b
c
d .
3 and are two independent observations of the random variable with and
.
The sample mean, , of these two observations is also a random variable defined by
.
a Show that .
b Find .
4 The average mass of a man in an office is with standard deviation . The average mass of a
woman in the office is with standard deviation . The empty lift has a mass of . What is
the expectation and standard deviation of the total mass of the lift when women and men are
inside?
5 A weighted dice has mean outcome with standard deviation . Brian rolls the dice once and doubles
the outcome. Camilla rolls the dice twice and adds the results together. Work out the expected mean
and standard deviation of the difference between their scores.
6 Exam scores at a large school have mean and standard deviation . Two students are selected at
random. Find the expected mean and standard deviation of the difference between their exam scores.
7 Adrian cycles to school with a mean time of minutes and a standard deviation of minutes.
Pamela walks to school with a mean time of minutes and a standard deviation of minutes.
They each calculate the total time it takes them to get to school over a five day week. Find the
expected mean and standard deviation of the difference in the total weekly journey times, assuming
journey times are independent.
8 is the random variable mass of a gerbil. Explain the difference between and .
Section 5: Linear combinations of normal variables
Although the proof is beyond the scope of this course, it turns out that any linear combination of normal
variables will also follow a normal distribution. You can use the methods from Section 4 to find out the
parameters of this distribution.
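A short sketch of how the parameters of a linear combination might be computed is given below. It uses `NormalDist` from Python's standard `statistics` module; the passenger and luggage figures, the threshold of 90 and the helper name `linear_combination` are assumptions made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def linear_combination(a, x, b, y):
    """Distribution of aX + bY for independent normal variables X and Y."""
    mean = a * x.mean + b * y.mean
    variance = a ** 2 * x.variance + b ** 2 * y.variance
    return NormalDist(mean, sqrt(variance))


# Hypothetical masses in kg: passenger ~ N(70, 10^2), luggage ~ N(8, 2^2).
passenger = NormalDist(70, 10)
luggage = NormalDist(8, 2)

total = linear_combination(1, passenger, 1, luggage)      # N(78, 104)
difference = linear_combination(1, passenger, -1, luggage)  # N(62, 104)

# Probability that the total mass exceeds 90 kg.
p_over_90 = 1 - total.cdf(90)
print(total.mean, total.variance, p_over_90)
```

The variances are added for both the sum and the difference, so `total` and `difference` have the same variance but different means.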
Rewind
You studied the normal distribution in A Level Mathematics Student Book 2, Chapter 21.
Use Key point 4.5.
Rewind
In Chapter 2 you met the idea that the Poisson distribution was scalable. You can now interpret
this as meaning that the sum of two Poisson variables is also Poisson. This is the only other
distribution in this course that has this property. However, it only applies to sums of Poisson
distributions – not to differences or multiples or linear combinations.
EXERCISE 4E
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
f i , where is the average of observations of .
The masses of their hand luggage follow a normal distribution with mean and variance
.
a State the distribution of the total mass of a passenger and their hand luggage and find any
necessary parameters.
b What is the probability that the total mass of a passenger and their luggage exceeds ?
3 Evidence suggests that the times Aaron takes to run are normally distributed with mean
and standard deviation . The times Bashir takes to run are normally
distributed with mean and standard deviation .
a Find the mean and standard deviation of the difference between Aaron’s and
Bashir’s times.
The rods are checked in batches of six, and a batch is rejected if the average length is less than
or more than .
a Find the distribution, including any necessary parameters, of the mean of a random sample
of six rods.
a What is the probability that a randomly chosen pipe has a length of or more?
b What is the probability that the average length of a randomly chosen set of pipes of this
type is or more?
6 The masses, , of male birds of a certain species are normally distributed with mean
and standard deviation .
The masses, , of female birds of this species are normally distributed with mean and
standard deviation .
b Find the probability that the mass of a randomly chosen male bird is more than twice the
mass of a randomly chosen female bird.
c Find the probability that the total mass of three male birds and female birds (chosen
independently) exceeds .
7 A shop sells apples and pears. The masses, in grams, of the apples can be assumed to have a
distribution and the masses of the pears, in grams, can be assumed to have a
distribution.
a Find the probability that the mass of a randomly chosen apple is more than double the mass
of a randomly chosen pear.
b A shopper buys apples and a pear. Find the probability that the total mass is greater than
.
8 The length of a corn snake is normally distributed with mean .
The probability that a randomly selected sample of corn snakes has an average of above
is .
9 a In a test, boys have scores that follow the distribution . Girls’ scores follow .
What is the probability that a randomly chosen boy and a randomly chosen girl differ in
scores by less than ?
b What is the probability that a randomly chosen boy scores less than three-quarters of the
mark of a randomly chosen girl?
10 The daily rainfall in Algebraville follows a normal distribution with mean and standard
deviation .
On a randomly chosen day, there is a probability of that the rainfall is greater than .
In a randomly chosen seven-day week, there is a probability of that the mean daily rainfall
is less than .
b What assumption was required in performing this calculation? How reasonable is this
assumption?
11 Anu uses public transport to go to school each morning. The time she waits each morning for
the transport is normally distributed with a mean of minutes and a standard deviation of
minutes.
a On a specific morning, what is the probability that Anu waits more than minutes?
b During a particular week (Monday to Friday), what is the probability that:
i her total morning waiting time does not exceed minutes
Tip
A cumulative distribution function (CDF) measures the probability of a random variable being less
than or equal to a particular value. Normally, if the probability density function is called $f(x)$, the
cumulative distribution function is called $F(x)$.
For a continuous random variable $X$ with probability density function $f(x)$:
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
Tip
The $t$ in the integral is a dummy variable. You could replace it with any other symbol. The
only real variable in this expression is the $x$ in the upper limit, which corresponds to the $x$ in
the left hand expression.
Since you can undo integration by differentiation, you can recover the probability density function from
$F(x)$.
Given that $F(x)$ is the cumulative distribution function, then you can find the probability
density function, $f(x)$, using:
$f(x) = \frac{d}{dx}F(x)$
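The link between the two functions can be illustrated numerically by differentiating a cumulative distribution function. In this sketch the CDF F(x) = x^3 on [0, 1] and the function names are assumptions for illustration, and the derivative is only estimated with a central difference.

```python
def example_cdf(x):
    # Illustrative CDF (an assumption): F(x) = x^3 on [0, 1], 0 below and 1 above.
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x ** 3


def pdf_from_cdf(cdf, x, h=1e-5):
    """Estimate f(x) = dF/dx using a central difference of width 2h."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


# The exact derivative of x^3 is 3x^2, so f(0.5) should be close to 0.75.
estimate = pdf_from_cdf(example_cdf, 0.5)
print(estimate)
```

The estimate is only reliable away from points where the CDF changes formula, such as the end points 0 and 1 here.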
Find the cumulative distribution function, given that a continuous random variable has
probability density function for and otherwise.
If :
Once you have the cumulative distribution function you can use it to find the median, quartiles and
any other percentiles, since the $p$th percentile is defined as the value $x$ such that $P(X \le x) = p\%$, i.e.
$F(x) = \frac{p}{100}$.
Rewind
You saw that you could do this without explicitly referring to a cumulative distribution
function, in Exercise 4B.
WORKED EXAMPLE 4.11
and otherwise.
Therefore .
EXERCISE 4F
1 Find the cumulative distribution function for each of these probability density functions, and
hence find the median of the distribution.
a i
ii
b i
ii
2 Given each continuous cumulative distribution function, find the probability density function and
the median.
a i
ii
b i
ii
3 Find the exact value of the percentile of the continuous random variable that has pdf
for and otherwise.
4
A continuous random variable has cumulative distribution function
Rewind
You already met this idea in the context of kinematics in A Level Mathematics Student Book 1,
Chapter 16.
a Sketch .
[Figure: sketch of the probability density function; the x-axis runs from 0 to 8.]
Again, you need to split the integral for into two parts.
When:
EXERCISE 4G
a Show that .
a Sketch .
b Show that .
c Find .
Rewind
Tip
The easiest way to get this result is not to use integration, but to realise that the graph forms
a rectangle with width $b - a$ and total area 1.
[Figure: rectangle between x = a and x = b with width b − a, height 1/(b − a) and area 1.]
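The rectangle argument translates directly into code: a probability is the overlapping width divided by the total width. The sketch below is illustrative only; the function names and the rounding-error example on [-0.5, 0.5] are assumptions, and the mean and standard deviation use the standard results (a + b)/2 and (b - a)/sqrt(12).

```python
from math import sqrt


def rectangular_probability(a, b, lower, upper):
    """P(lower < X < upper) for X rectangular on [a, b]: overlap width / (b - a)."""
    overlap = max(0.0, min(b, upper) - max(a, lower))
    return overlap / (b - a)


def rectangular_mean_sd(a, b):
    """Mean and standard deviation of the rectangular distribution on [a, b]."""
    return (a + b) / 2, (b - a) / sqrt(12)


# Hypothetical example: the error when rounding to the nearest whole unit,
# modelled as rectangular on [-0.5, 0.5].
p_error_above_0_2 = rectangular_probability(-0.5, 0.5, 0.2, 0.5)
mean_error, sd_error = rectangular_mean_sd(-0.5, 0.5)
print(p_error_above_0_2, mean_error, sd_error)
```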
Since
b Find the standard deviation of the difference between the quoted value and the true value (with
quoted values below the true value giving a negative difference)
a . Define variables.
EXERCISE 4H
1 Find these probabilities. In parts a to d, follows a rectangular distribution over .
a i ;
ii ;
b i ;
ii ;
c i ;
ii ;
d i ;
ii ;
e i When a measurement is quoted to the nearest cm it is equally likely to be anywhere within
of the stated value. Find the probability that a measurement quoted as being to
the nearest cm is actually above .
ii A car’s milometer shows the number of completed miles it has done. Jerry’s car shows
miles. What is the probability that it will show miles in the next miles?
2 Find the expected mean and standard deviation of:
ii the true age of a boy who (honestly) describes himself as eighteen years old.
3 A piece is cut off one end of a log of length . Given that the cut is equally likely to be made
anywhere along the log, find:
b the expected mean and standard deviation of the length of the piece.
4 A string of length is randomly cut into two pieces. Find the probability that the length of the
shorter piece is less than .
5 Five random numbers are selected from the interval . Find the probability that they are all
smaller than .
6 a Prove, using integration, that the variance of the rectangular distribution between $a$ and $b$ is
$\frac{(b - a)^2}{12}$.
b Hence prove that the ratio is independent of $a$ and $b$, stating its value.
7 A rod of length is cut into two parts. The position of the cut is uniformly distributed along the
length of the rod. Find the mean and standard deviation of the length of the shorter part.
Section 9: Exponential distribution
When you model the waiting interval until a first success in a Poisson-type situation you can use the
exponential distribution. It is defined by the average number of successes in a unit interval of time, $\lambda$, and it
is written as $\mathrm{Exp}(\lambda)$. Since the waiting interval is a continuous variable, the probability distribution is
described using a probability density function:
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x) = 0$ otherwise.
Rewind
You can find the mean and the variance of the exponential distribution by using integration.
Fast forward
You are asked to prove the formula for the variance in question 10 in Exercise 4I.
PROOF 5
So Use
When , .
WORKED EXAMPLE 4.16
The number of leaks in any miles of pipes in a sewer system follows a Poisson distribution
with mean .
a Find the probability that the first leak will be found in the first half mile.
b Find the variance of the distance until the first leak is found.
a Define variables.
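You could also be asked to find a probability of a variable with an exponential distribution being greater
than a particular value. You can do this by integration, but it is useful to know the cumulative
distribution function. You can find this using integration. If $X \sim \mathrm{Exp}(\lambda)$ then, for $x \ge 0$,
$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$
Rewind
You met integration by parts in A Level Mathematics Student Book 2, Chapter 11.
If $X \sim \mathrm{Exp}(\lambda)$, then $P(X \le x) = 1 - e^{-\lambda x}$ for $x \ge 0$.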
The exponential distribution also has a property called memorylessness. Prior waiting does not change
how long you are likely to wait for an event. This means that as well as measuring the amount of time
until a first event, it also measures the interval between events, as shown in Worked example 4.17.
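Memorylessness can be checked numerically from the cumulative distribution function. This is a minimal sketch: the function names are hypothetical, the rate of 2 per hour matches the average quoted in Worked example 4.17, and the waiting times s and t are arbitrary illustrative choices.

```python
from math import exp


def exponential_cdf(rate, x):
    """P(X <= x) = 1 - e^(-rate * x) for x >= 0, and 0 for negative x."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0


def exponential_survival(rate, x):
    """P(X > x) = e^(-rate * x) for x >= 0, and 1 for negative x."""
    return exp(-rate * x) if x >= 0 else 1.0


rate = 2  # events per hour

# Probability of waiting more than 1.5 hours for the first event: e^(-3).
p_long_wait = exponential_survival(rate, 1.5)

# Memorylessness: P(X > s + t | X > s) should equal P(X > t).
s, t = 0.5, 1.5  # illustrative waiting times in hours
conditional = exponential_survival(rate, s + t) / exponential_survival(rate, s)
print(p_long_wait, conditional, exponential_survival(rate, t))
```

The conditional probability equals the unconditional one because the factor e^(-rate * s) cancels in the ratio.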
During the summer Tanis sneezes on average two times every hour.
a State an assumption that must be made to model the time until the next sneeze by using an
exponential distribution.
b Assuming that the time until the next sneeze can be modelled by an exponential distribution,
find the exact probability that Tanis goes more than ninety minutes after waking up before
sneezing.
b Define variables.
EXERCISE 4I
a i if
ii if
b i if
ii if
c i waiting more than seconds for an emission from a radioactive substance that emits three
alpha particles per minute on average
ii waiting less than fifteen minutes for a bus that comes three times per hour on average.
2 Find the expected mean and standard deviation of:
a i
ii
b i the distance travelled in a car before reaching the first pot hole if pot holes along a certain
road are spread independently at an average rate of per kilometre
ii the time from the beginning of the day until the first phone call at a call centre that receives
an average of calls per hour.
3 The number of emails Khaled receives in an hour follows a Poisson distribution with mean . What
is the probability that the next email arrives in less than minutes?
a Find the probability that two birds will arrive in the next ten minutes.
b Find the probability that there is more than a ten-minute wait before the next bird arrives.
c Find the expected mean and standard deviation of the time (in minutes) spent waiting for a
bird.
5 When Ben walks down a particular street, he meets people he knows at an average rate of three
every 5 minutes. Different meetings are independent of each other. What is the probability that
Ben has to walk for more than minutes before he meets a person he knows?
6 The probability of waiting less than minutes for a bus is . If the waiting time is modelled by an
exponential distribution, find the probability of waiting more than minutes.
7 The probability of waiting more than minutes for a phone call is . Find an expression for the
mean waiting time for a phone call in terms of and , assuming the waiting time can be
modelled by the exponential distribution.
8 Show that the probability of a variable with an exponential distribution taking a value larger
than its mean is independent of .
9 The number of buses arriving at a bus stop in an hour follows a Poisson distribution with mean .
a Name the distribution which models the time, in minutes, Amanda has to wait until the next
bus arrives. State any necessary parameters.
b Given that Amanda has already been waiting for minutes, find the probability that she has to
wait at least minutes.
c Show that the answer in part b is the same as the probability that Amanda has to wait at least
minutes.
11 is the number of successes that occur in one unit of time, so that . is the number of
successes that occur in units of time.
If this is the case you apply all the rules learnt in this chapter and in Chapter 1 but using sums over the
discrete part of the random variable and integrals over the continuous part of the random variable.
Tip
Notice that in Worked example 4.18 the end point of the continuous part of the variable is a
part of the discrete random variable. You might worry about situations like this, but it is
perfectly possible to define random variables in this way.
When or then .
a Find the value of .
b Find .
EXERCISE 4J
a i for for
ii for for
b i for for
ii for for
c i for for
ii for for
d i for for
ii for for
2 The random variable is defined for and for . Between and the
probability density function is given by . It is also known that is . Find:
a the value of
b
c
d .
3 The random variable is defined for and for .
for
for
b Find .
5 The mixed random variable can take any values from to and the discrete values and . It
has cumulative distribution function:
for
for
Find:
a
b
d .
6 A mixed random variable can take the discrete values and and continuous values between
and . It has cumulative distribution function:
for
for
Find:
8 An athletics coach records the time of a squad of junior sprinters. He records the time of
anyone who runs between and seconds as precisely as he can. Anyone who runs between
and seconds gets their time recorded to the nearest tenth of a second. He models the time
recorded by this probability distribution:
for
for
The probability of a continuous random variable taking any single value is a meaningless
concept, but it is possible to work with the probability of it being in a given range. To do this
you use a probability density function such that the area under the curve represents
probability. The total area is therefore 1, and the function is never negative.
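The summation formulae for the expectation of discrete random variables become integrals for
continuous random variables:
$E(X) = \int x f(x)\,dx$ and $E(g(X)) = \int g(x) f(x)\,dx$
$\mathrm{Var}(X)$ is still $E(X^2) - [E(X)]^2$
The expectation and variance of a linear transformation are given by:
$E(aX + b) = aE(X) + b$ and $\mathrm{Var}(aX + b) = a^2\mathrm{Var}(X)$
The cumulative distribution function $F(x)$ gives the probability of the random variable taking a
value less than or equal to $x$.
For a continuous distribution with PDF $f(x)$:
$F(x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = \frac{d}{dx}F(x)$
The main uses of cumulative distribution functions are to find percentiles of a distribution and
to convert from a distribution of one continuous random variable to a distribution of a function
of that variable.
If $X$ follows a rectangular distribution between $a$ and $b$, then
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$, $E(X) = \frac{a + b}{2}$ and $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$
If $X \sim \mathrm{Exp}(\lambda)$, then
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, $E(X) = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$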
Mixed practice 4
1 A continuous random variable has probability density function for . Find the exact
value of .
2 and are independent random variables. has mean and standard deviation . has
mean and standard deviation . Given that , what is the standard deviation of
?
b Find .
c Find .
4
Given that is a continuous random variable with PDF , find:
a the value of
b the expectation of
c the variance of .
5 The Jones’ expected spend on their garden is £ with a variance of £ . This is paid for
out of a bank account containing £ .
a What is the standard deviation in the amount remaining in the bank account after the
garden has been paid for?
However much the Jones’ spend on their garden, the Smiths will spend twice as much plus
£ .
c What is the standard deviation in the amount that the Smiths will spend?
b Evaluate:
ii
and otherwise.
9 The probability density function for the continuous random variable is for and
otherwise.
b Find .
c Find .
10 A doctor measures the masses of babies. If a baby has a mass between and the mass is
recorded as accurately as possible. If the mass is between and the mass is recorded
to the nearest . The doctor models the recorded masses using the random variable
with probability distribution defined by:
for .
c Find .
d Find .
11 The time taken, in minutes, to wash the dishes is modelled by a random variable with
expectation and standard deviation .
The time taken, in minutes, to clean the table is modelled by a random variable with
expectation and standard deviation .
a Before leaving Hassan must wash the dishes then clean the table. is the total time this
takes. Find the expectation and standard deviation of .
b When Alice visits the jobs can be shared. Hassan washes the dishes and Alice cleans the
table. is the time Alice has to wait after finishing cleaning the table before they can
leave. Find the expected mean and standard deviation in .
d Hassan keeps a record of the total time he spends washing the dishes over days. He
assumes that the times taken each day are independent and models the total time in the
days using the random variable . For what values of will be more than times
the standard deviation of ?
12 The times Markus takes to answer a multiple choice question are normally distributed with
mean and standard deviation . He has one hour to complete a test
consisting of questions.
Assuming the questions are independent, find the probability that Markus does not complete
the test in time.
13 The masses of men in a factory are known to be normally distributed with mean and
standard deviation . There is an elevator with a maximum recommended load of .
With men in the elevator, calculate the probability that their combined mass exceeds the
maximum recommended load.
14 Davina makes bracelets by threading purple and yellow beads. Each bracelet consists of
seven randomly selected purple beads and four randomly selected yellow beads. The
diameters of the beads are normally distributed with standard deviation . The average
diameter of a purple bead is and the average diameter of a yellow bead is . Find
the probability that the length of a bracelet is less than .
15 The masses of the parents at a primary school are normally distributed with mean and
variance , and the masses of the children are normally distributed with mean and
variance . Let the random variable represent the combined mass of two randomly
chosen parents and the random variable represent the combined mass of four randomly
chosen children.
b Find the probability that four children weigh more than two parents.
[Figure: graph of the probability density function; the x-axis runs from 0 to 10 and 0.5 is marked on the vertical axis.]
a Find .
e Find the probability that the mean of a random sample of values of is greater than .
a Find the expected waiting time until the first beta particle is observed.
b Find the probability of waiting more than minutes to observe a beta particle.
c Given that no particles have been observed in the first minutes, find the probability that
it takes more than hours to observe a beta particle.
a Prove that .
c Two independent observations of are made. Find an expression for the probability that
the maximum of these two observations is less than where .
19 The humidity of air is measured by a weather station. It can only take values from to
inclusive.
b Find .
22 The marks students scored in a Mathematics test follow a normal distribution with mean
and variance . The marks of the same group of students in an English test follow a normal
distribution with mean and variance .
a Find the probability that a randomly chosen student scored a higher mark in English than in
Mathematics.
b Find the probability that the average English mark of a class of students is higher than
their average Mathematics mark.
a Show that .
b What is the probability that the random variable has a value that lies between and ?
c Find the mean and variance of the distribution. Give your answers in terms of .
The random variable represents the lifetime, in years, of a certain type of battery.
d Find the probability that a battery lasts more than six months.
A calculator is fitted with three of these batteries. Each battery fails independently of the
other two.
d Hence, or otherwise:
i find
e Calculate the value of the median of , giving your answer to three decimal places.
[© AQA, 2012]
25 The continuous random variable has probability density function defined by:
ii .
ii the median, , of .
[© AQA, 2011]
FOCUS ON … PROOF 1
Rewind
(Theorem 1)
(Theorem 2)
A finite double sum of a sum can be split into two sums: (Theorem 5)
PROOF 6
1 Theorem ____
2 Theorem ____
3 Theorem ____
Properties of sums
4
6 Theorem ____
7 Theorem ____
QUESTIONS
Use techniques similar to those in Proof 6 to answer these questions. and are independent discrete
random variables.
1 Prove that .
2 a Prove that .
WORKED EXAMPLE
In a Poisson distribution the probability of two events occurring is . Find the probability of one
event occurring.
If then
Write the information given in terms of , the parameter
of the Poisson distribution.
0.3
QUESTIONS
b In another tosses, two heads are observed. Show that the probability of this happening, and the
observation in part a happening, is .
c By using technology or otherwise, find the value of that maximises the probability of getting three
heads in tosses.
QUESTIONS
To help you to understand the required conditions in context, here are some examples of real-life
situations. Comment on whether the Poisson distribution would be an appropriate model in each of these
situations. Where the Poisson is not appropriate, state which conditions are not met.
1 The number of fish in a volume of an ocean where fish occur at an average rate of per .
2 The number of signals received in an hour by a mobile phone from a communication mast when a
signal is received every seconds.
3 The number of beta particles emitted every minute by a radioactive substance that emits on average
beta particle every seconds.
4 The waiting time for a bus when one arrives on average every minutes.
5 The number of errors in pages of a textbook if there is an average of error on every pages.
6 The number of fish caught in ten hours in a small pond if an average of fish are caught every hour.
7 The number of girls in randomly selected people if it is expected that of the population are
girls.
Tip
interpret the different types of errors that can be made while conducting
hypothesis tests, called type I and type II errors
calculate the probability of a type I error based on a Poisson distribution
calculate the probability of a type I error based on a binomial distribution.
If you are following the A Level course, you will also learn how to:
use a new type of hypothesis test for the mean, called a -test
calculate the probability of a type II error based on a Poisson distribution
calculate the probability of a type II error based on a binomial distribution
calculate the probability of type I and type II errors based on a normal distribution.
A Level Mathematics Student Book 1, Chapter 22
You should know how to conduct hypothesis tests, using the binomial distribution.
2 A six-sided dice is rolled five times and three sixes are observed. Test at the significance level if this
provides evidence that the dice is biased towards rolling more sixes than a fair dice.
Chapter 2
You should know how to conduct hypothesis tests, using the Poisson distribution.
3 The number of bees arriving at a flower is modelled by a Poisson distribution. If six bees arrive in
one minute, does this provide evidence at the significance level that the true mean is greater than ?
A Level Mathematics Student Book 2, Chapter 22
You should know how to conduct hypothesis tests with the normal distribution.
5 A sample of objects drawn from a normal distribution with a standard deviation of has a mean of .
Conduct a two-tailed test at significance to decide if this provides significant evidence of a change
from a mean of .
Rewind
You studied hypothesis testing using the normal distribution in A Level Mathematics
Student Book 2, Chapter 22.
One of the reasons hypothesis tests are so important in modern statistics is that they try to give a
probability of certain types of error. You will see in this chapter which types of errors are controlled and
which are not, and how to calculate their probabilities.
Section 1: -tests
In a -test to see if the population mean has changed from you are testing the hypotheses:
where and are found from the sample while is the value in the null hypothesis and , the
population standard deviation, is assumed to be the same as a previously held value. You can use the
fact that to do calculations with this statistic.
Tip
If you do not know (or have reason to believe that it has changed) you must instead estimate it
from the data. An appropriate way to estimate this is using the square root of the unbiased
estimate of the variance, . You can then construct a -score.
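The calculation in this tip can be sketched in Python. This is a minimal sketch, assuming the usual form of the T-score, (sample mean − hypothesised mean) ÷ (s ÷ √n), where s is the square root of the unbiased estimate of the variance. The function name, the data values and the hypothesised mean are hypothetical and are not taken from the text.

```python
import math
import statistics

def t_score(sample, mu0):
    """Return the T-score of a sample against a hypothesised mean mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    # statistics.stdev divides by n - 1, so it is the square root of
    # the unbiased estimate of the variance.
    s = statistics.stdev(sample)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical data, chosen only for illustration.
sample = [4.9, 5.3, 5.1, 4.7, 5.0]
t = t_score(sample, mu0=4.8)
print(round(t, 3))
```

Note that `statistics.pstdev` would divide by n instead of n − 1, which is not the estimate wanted here.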
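[Figure: normal distribution curve with the sample values x1, x2 and x3 marked on the x-axis, and σ and S labelled]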
-scores very rarely exceed (or go below ). However, if your sample just happens to have a very
small standard deviation – for example, if the sample is and in the graph shown – then the -
score can get quite large: it is not unusual for it to be around . This highlights that it does not follow
a normal distribution. The likelihood of getting a very tightly clustered sample depends on . At low
values of this possible clustering has a very big effect, but at large values of the sample standard
deviation is a very good approximation of the population standard deviation. This means that there
are lots of different -distributions, depending on the value of .
Tip
Although the t-distribution gives the distribution of the random variable T, conventionally it
is written with a lower case t.
[Figure: probability density curves of the Z-distribution, the t-distribution with n = 7 and the t-distribution with n = 4]
As the value of n grows, the t-distribution gets closer and closer to the Z-distribution.
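To see this convergence numerically, the sketch below evaluates the standard probability density function of the t-distribution and compares it with the standard normal density. It is an illustration only: the function names are hypothetical, and the degrees of freedom 3 and 6 are chosen on the assumption that they correspond to the sample sizes 4 and 7 in the figure.

```python
import math

def t_pdf(x, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def z_pdf(x):
    """Probability density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the height of each curve at the centre (x = 0) and in the tail (x = 3).
for df in (3, 6, 30, 300):
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("Z", round(z_pdf(0), 4), round(z_pdf(3), 4))
```

With few degrees of freedom the curve is lower at the centre and heavier in the tails, which is why large scores are more common than the normal distribution would suggest.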
You might wonder why there is an n − 1 in the formula in Key point 5.1. Rather than using the value of
n to describe the t-distribution, conventionally you use the degrees of freedom, ν. Because you fix
one parameter, the population mean, when you are doing a t-test you use the formula ν = n − 1.
Tip
Some graphical calculators can perform a t-test. You should state the test statistic, its
distribution and the p-value from your calculator. If the mean and standard deviation of
the sample are not given in the question you should state those too: your calculator will
find them in the process of performing the t-test.
To conduct a t-test, you calculate the value of T and then look up the critical value from the table
given in the formula book. This still requires some work as the information is given in terms of
cumulative probabilities. To do a one-tailed test with significance level α, you look up the column
headed by 1 − α in the table. To do a two-tailed test with significance level α, you look up the column
headed by 1 − α/2 in the table. If the modulus of your T-score is more than this value, you reject H0.
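[Figure: two density curves. One-tailed test: area 1 − α below the critical value and area α above it. Two-tailed test: area α/2 in each tail and area 1 − α between the critical values]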
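The table look-up and decision rule can be sketched as follows. This is only an illustration: it assumes the table is laid out by cumulative probability, so that a one-tailed test at level α uses the column 1 − α and a two-tailed test uses 1 − α/2. The function names are hypothetical, and the critical value 2.262 is the standard 97.5% point of the t-distribution with 9 degrees of freedom.

```python
def table_column(alpha, tails):
    """Cumulative probability heading the column to read in the t-table."""
    if tails == 1:
        return 1 - alpha
    if tails == 2:
        return 1 - alpha / 2
    raise ValueError("tails must be 1 or 2")

def reject_h0(t_score, critical_value):
    """Reject H0 when the modulus of the T-score exceeds the critical value."""
    return abs(t_score) > critical_value

# Two-tailed test at the 5% level with 9 degrees of freedom:
# read the 0.975 column, which gives a critical value of 2.262.
column = table_column(0.05, tails=2)
print(column, reject_h0(2.5, 2.262), reject_h0(-1.8, 2.262))
```

Taking the modulus means the same critical value serves both tails of a two-tailed test.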
The label of a pre-packaged steak claims that it has a mass of . A random sample of
steaks is taken and their masses are:
Test at the significance level whether the label’s claim is accurate, stating any
assumptions you need to make.
mass of a steak in
Define variables.
Assume that
You must use the t-test since the true variance is unknown, but to do this test the underlying
distribution must be normal.
From your calculator:
Find sample statistics from your calculator. Since you do not know the true population variance,
you use an unbiased estimate, .
Common error
Conclusions are often stated without a sense of statistical uncertainty. For example, it would
be wrong to state that the conclusion of the test in Worked example 5.1 is: ‘The label is
correct.’
EXERCISE 5A
1 In each of these situations it is believed that is normally distributed. Decide the result of the
test if it is conducted at the significance level.
a i ;
ii ;
b i ;
ii ;
c i Data:
ii Data:
2 John believes that the average time taken for his computer to start is seconds. To test his
belief, he records the times (in seconds) taken for the computer to start:
b Assuming that the masses are normally distributed, test Michael’s suspicion at the level of
significance.
4 The crawling ages of babies in a nursery are recorded. The sample has mean months and
standard deviation months. A parenting book claims that the average age for babies crawling
is months. Test at the level whether babies in the nursery crawl significantly earlier than
average, assuming that the distribution of crawling ages is normal.
5 Penelope thinks that cleaning the kettle will decrease the amount of time it takes to boil (
seconds). She knows that the average boiling time before cleaning is seconds. After cleaning,
she boils the kettle times and summarises the results as:
a Calculate an estimate of the mean time for the athletes in the club.
b Find an unbiased estimate of the population variance based on this sample.
c Test the coach’s belief at the level of significance.
d State what assumption you have made about the distribution of the athletes’ times.
7 The lengths of bananas are found to follow a normal distribution with mean Roland has
recently changed banana supplier and wants to test whether their mean length is different. He
takes a random sample of bananas and obtains these summary statistics:
b For what values of will Aki reject the null hypothesis at the significance level?
Section 2: Errors in hypothesis testing
Defining type I and type II errors
The acceptable conclusions to a hypothesis test are:
If the first conclusion is wrong – i.e. you have rejected H0 while it was true – it is called a type I error
(spoken as ‘type one error’).
If the second conclusion is wrong – i.e. you have failed to reject H0 when you should have done – it is called
a type II error (spoken as ‘type two error’).
a : Defendant is innocent.
: Defendant is guilty.
b A type I error would be saying that an innocent person is guilty.
A type I error is rejecting H0 when it was true.
c A type II error would be saying that a guilty person is innocent.
A type II error is not rejecting H0 when it was false.
In a hypothesis test, the probability of a type I error is equal to the actual significance level.
The phrase ‘actual significance level’ is used because you might find when testing a discrete random
variable that you cannot create a critical region that has exactly the desired significance level. If you are
asked to design a test with significance, because of the discrete nature of the variables you might have
to create a critical region that in fact has an actual significance level near to . Conventionally, you would
choose the largest significance level you can that is less than .
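The idea of an actual significance level can be illustrated with a short Python sketch. It searches for the upper-tail critical region of a binomial test whose probability is as large as possible while staying below the nominal level. The distribution B(20, 0.5), the 5% nominal level and the function names are hypothetical choices made for this example.

```python
from math import comb

def upper_tail(n, p, c):
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def upper_critical_value(n, p, alpha):
    """Smallest c for which P(X >= c) is less than alpha."""
    for c in range(n + 2):
        if upper_tail(n, p, c) < alpha:
            return c

# Hypothetical test: X ~ B(20, 0.5) with a nominal significance level of 5%.
c = upper_critical_value(20, 0.5, 0.05)
actual = upper_tail(20, 0.5, c)
print(c, round(actual, 4))  # prints: 15 0.0207
```

Here the critical region is X ≥ 15 and the probability of a type I error is about 2.1%, not 5%, because including 14 would push the probability above the nominal level.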
You might not be told the significance level of a test, in which case you need to use a formula to calculate
it.
In a hypothesis test:
The null hypothesis is rejected if . Find the probability of a type I error in this test.
Derren wants to test whether a six-sided dice is biased towards rolling sixes. He rolls a dice
times.
is true, :
Use your calculator to work out the probabilities
from the binomial distribution. Since you are
looking for evidence of , you need to find
as this is an event as extreme as
observed, or more extreme in the direction of the
alternative hypothesis, i.e. it is the -value of an
observed .
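A calculation of this kind can be checked with a few lines of Python. The numbers in this worked example are not reproduced here, so the sketch borrows illustrative values from the earlier 'before you start' question: five rolls, three sixes observed, and a null hypothesis that the probability of a six is 1/6. The function name is hypothetical.

```python
from math import comb

def binomial_upper_p_value(n, p, observed):
    """P(X >= observed) for X ~ B(n, p): the p-value of an upper-tail test."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1))

# Illustrative values: five rolls of a dice, three sixes observed, H0: p = 1/6.
p_value = binomial_upper_p_value(5, 1 / 6, 3)
print(round(p_value, 4))  # prints: 0.0355
```

The p-value would then be compared with the significance level of the test to decide whether to reject the null hypothesis.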
Rewind
You could also answer the type of test shown in Worked example 5.4 using a chi-squared
test – see Chapter 3 for a reminder of using a chi-squared test – with two categories: and
not . However, this would have only one degree of freedom so the fact that the chi-
squared test is only approximate is particularly problematic. It is preferable to use an exact
binomial test, as shown.
The following diagrams show the effect of different true population means on this test. In the first
diagram the true mean is , so anything that falls in the red region gets this right, but falling in the
blue regions results in a type I error. In the second, third and fourth diagrams the true mean is getting
further and further away from . Anything falling in the red regions picks this up, but anything falling
in the blue regions fails to detect that the true mean is not . All of these are now type II errors.
[Figure: four diagrams of the sampling distribution, each divided into a central acceptance region with a rejection region on either side. Diagram 1: H0 is true (5% type I error, 95% correctly not reject H0). Diagram 2: H0 is untrue, small difference, µ = 120.4. Diagram 3: H0 is untrue, medium difference, µ = 130. Diagram 4: H0 is untrue, large difference, µ = 160. Diagrams 2 to 4 each show a 'Type II error' region and a 'Correctly reject H0' region]
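The effect shown in the diagrams can be reproduced numerically. The sketch below is an illustration under assumed values: a two-tailed test at the 5% level of a null mean of 120, with a known standard deviation of 30 and a sample of size 9. Only the three true means are taken from the diagram labels; everything else, including the function name, is hypothetical.

```python
from statistics import NormalDist

def type_ii_probability(mu0, true_mu, sigma, n, alpha=0.05):
    """P(fail to reject H0: mu = mu0) in a two-tailed test when the mean is true_mu."""
    se = sigma / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Acceptance region for the sample mean, built under H0.
    lower, upper = mu0 - z * se, mu0 + z * se
    # Actual distribution of the sample mean.
    sampling = NormalDist(true_mu, se)
    return sampling.cdf(upper) - sampling.cdf(lower)

# Assumed set-up: H0 mean 120, sigma 30, sample size 9.
for true_mu in (120.4, 130, 160):
    print(true_mu, round(type_ii_probability(120, true_mu, 30, 9), 3))
```

The further the true mean lies from the null value, the smaller the probability of a type II error becomes.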
In a hypothesis test:
,
State hypotheses.
State the test statistic and its distribution (assuming H0 is true).
An important concept when studying tests is the power of a test. It is defined as the
probability of rejecting H0 when it is false, so it is the probability of not making a type II error.
In a hypothesis test:
A call centre believes that it receives calls at an average rate of per hour. To test this it looks
at the number of calls in a two-hour period. If that number is greater than or lower than , it
rejects the hypothesis that the average rate is per hour. Given that the actual rate of calls is
calls per hour, find the power of the test.
number of calls in two hours
You are considering the actual rate rather than the rate under the null hypothesis.
EXERCISE 5B
1 Given that , find the probability of a type I error for each of these situations.
a i ; reject if
ii ; reject if
b i ; reject if
ii ; reject if
c i ; reject if or
ii ; reject if or
2 Given that , find the probability of a type I error in each of these situations.
a i ; reject if
ii ; reject if
b i ; reject if
ii ; reject if
c i ; reject if or
ii ; reject if or
3 Given that , find the probability of a type I error for each of these situations.
a i ; reject if
ii ; reject if
b i ; reject if or
ii ; reject if or
4 Find the probability of a type II error for each of these situations. The sample mean is being
tested and the sample size, , is specified in each case.
a i with , ; significance; . In reality, .
ii ; reject if ; real
b i ; reject if ; real
ii ; reject if ; real
c i ; reject if or ; real
ii ; reject if or ; real
6 Given that , find the probability of a type II error in each of these situations.
a i ; reject if ; true
ii ; reject if ; true
b i ; reject if ; true
ii ; reject if ; true
c i ; reject if or ; true
ii ; reject if or ; true
7 What are the advantages and disadvantages of increasing the significance level of a hypothesis
test?
8 A television magician tries to trick an audience into believing that a coin is biased. He records
himself tossing a fair coin many hundreds of times until he tosses ten heads in a row. He then
shows the audience the film containing only the ten heads being tossed.
a State the null and alternative hypotheses in this situation.
b If an audience member believed that the coin is biased, is this an example of a type I or a
type II error?
9 A textbook says that there is positive correlation between two variables if the sample correlation
coefficient is more than . Describe, in this context, what is meant by:
a a type I error
b a type II error.
10 A student conducts a binomial hypothesis test to see if a six-sided dice is fair. He rolls the dice
times and if he sees more than sixes, he will claim that the dice is biased.
a Describe, in the context of this test, what is meant by:
i a type I error
ii a type II error.
b State two changes to the test that would make a type II error less likely.
11 The numbers of people arriving at a health club follow a Poisson distribution with mean per
hour. After a new swimming pool is opened, the management want to test whether the number
of people visiting the club has increased.
a State suitable null and alternative hypotheses.
b They decide to record the number of people arriving at the club during a randomly chosen
hour, and to reject the null hypothesis if this number is larger than .
Find the significance level of this test. Comment on your result.
12 A long-term study suggests that traffic accidents at a particular junction occur randomly at a
constant rate of per week. After new traffic lights are installed, it is believed that the number
of accidents has decreased. The number of accidents over a -week period is recorded.
a Let denote the average number of accidents in a -week period. State suitable hypotheses
involving .
b It is decided to reject the null hypothesis if the number of accidents recorded is less than or
equal to . Find the probability of making a type I error.
c The average number of accidents has in fact decreased to per week. Find the probability
of making a type II error in this test.
13 The masses of eggs are known to be normally distributed with standard deviation . Dhalia
wants to test whether eggs produced by her hens have mass greater than on average.
a State suitable null and alternative hypotheses to test Dhalia’s idea.
Dhalia weighs eggs and finds that their average mass is .
b Test at the significance level whether Dhalia’s eggs have mass greater than on
average. State your conclusion clearly.
c Write down the probability of making a type I error in this test.
d What is the smallest average mass of the eggs that would lead Dhalia to reject the null
hypothesis?
e Given that the average mass of Dhalia’s eggs is actually , find the power of the test.
14 A coin is flipped times. It is decided that it is a biased coin if or heads are observed.
a State the null and alternative hypotheses.
b Find the significance level of the test.
c Given that the true probability of flipping a head is , find the probability of a type II error as a
function of .
d Show that the probability of a type II error is maximised when .
15 A population is known to have a normal distribution with a variance of and an unknown mean
. It is proposed to test the hypotheses using the mean of a sample of size
.
a Find the appropriate critical regions corresponding to a significance level of:
i
ii
b Given that the true population mean is , calculate the probability of making a type II error
when the level of significance is:
i
ii
16 The number of worms in a square metre of forest is known to follow a Poisson distribution. The
mean is thought to be . This is rejected if no worms are observed when a square metre is
observed. If the true mean is , find an expression in terms of for the power of this test.
A -test is a way of testing to see if a sample provides evidence of a change in the population
mean from a previously held belief. It is based on the -score:
.
Mixed practice 5
1 Find the value of the -score (to three significant figures) for these data when testing the
null hypothesis .
a Write down null and alternative hypotheses for the chemist’s belief.
It is known that
In reality, .
d What could be done to decrease the probability of a type II error without changing the
probability of a type I error?
5 The number of beta particles emitted in one second by an isotope of a radioactive element
is known to follow a distribution. A theory suggests that but a physicist believes
that this might be an underestimate.
b The physicist decides that he will reject the null hypothesis if he sees more than beta
particles in a five-second period. What is the significance level of this test?
6 A union representative wishes to test a company’s claim that it pays an average salary of
£ . She suspects that the company pays less than this.
The union representative takes a random sample of employees and finds their wages
in thousands of pounds. Her results are summarised here:
7 A coin is flipped times and it is decided that it is a biased coin if more than heads or
fewer than heads are observed.
c If the coin is actually biased so that heads occur of the time, find the power of the
test.
8 Safeerah regularly cycles to and from work. She has a steel-framed bicycle that weighs
. Her mean journey time for the round trip is minutes. Her friend, Josh, has a carbon-
framed bicycle that weighs . Safeerah is thinking of buying a carbon-framed bicycle to
reduce her journey time, and Josh agrees to lend her his bicycle so that she can try it.
a The carbon-framed bicycle is sold using the slogan: ‘Less weight means more speed’.
Safeerah, who weighs , is expecting that the per cent reduction in bicycle mass
will substantially reduce her journey times. Josh tells her not to expect this as the
resultant mass reduction is actually closer to per cent.
b Safeerah records her journey times with the carbon-framed bicycle on typical days as:
Assuming that these times may be regarded as a random sample from a normal
distribution, test, at the significance level, whether her mean journey time with the
carbon-framed bicycle is less than minutes.
[© AQA 2013]
9 A company manufactures bath panels. The bath panels should be deep, but a small
amount of variability is acceptable. The depths are known to be normally distributed with
standard deviation .
a In order to check that the mean depth is , Amir takes a random sample of bath
panels from the current production and measures their depths, in millimetres, with these
results.
b Isabella, a manager, tells Amir that, in order to check whether the current mean is
, it is necessary to take a larger sample. Amir therefore takes a random sample of
size from the current production and finds that the mean depth is .
Test whether the current mean is , using the data from this second sample and
the significance level.
c It is proposed to carry out hypothesis tests at regular intervals to check that the mean
remains at .
Amir proposes that the tests be based on random samples of size , but Isabella favours
random samples of size . Explain which, if either, sample size would lead to a smaller
risk:
i of a type I error
ii of a type II error.
[© AQA 2011]
10 A town council wanted residents to apply for grants that were available for home insulation.
In a trial, a random sample of residents was encouraged, either in a letter or by a
phone call, to apply for the grants. The outcomes are shown in the table.
a The council believed that a phone call was more effective than a letter in encouraging
people to apply for a grant. Use a -test to investigate this belief at the significance
level.
b After the trial, all the residents in the town were encouraged, either in a letter or by a
phone call, to apply for the grants. It was found that there was no association between
the method of encouragement and the outcome. State, with a reason, whether a type I
error, a type II error or neither occurred in carrying out the test in part a.
[© AQA 2013]
6 Confidence intervals
Chapter 5: You should know how to conduct t-tests.
4 Based on this sample, test to see if there is evidence that at the significance level:
.
In this chapter you will learn how to construct confidence intervals for the mean in different situations and
see how such intervals can be interpreted.
Section 1: Confidence intervals
A single value calculated from a sample used to estimate a population parameter is called a point
estimate. You are trying to find an interval that has a specified probability of including the true population
value of the statistic you are interested in. This interval is called a confidence interval and the specified
probability is called the confidence level.
For example, given the data , you can calculate the sample mean, which is . However, it is
very unlikely that the mean of the population this sample was drawn from is exactly . You will now
develop a method that will allow you to say with confidence that the population mean is somewhere
between and . This does not mean that there is a probability of that the true mean is between
and , but rather that of confidence intervals constructed from samples like this would contain
the true mean.
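This interpretation can be checked by simulation. The sketch below repeatedly draws samples from a normal population with a known mean, builds a confidence interval from each sample and counts how often the interval contains the true mean. The population values, the sample size, the seed and the function name are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mean 50, standard deviation 8, samples of size 10.
print(coverage(50, 8, 10, 0.95, trials=4000))
```

The printed fraction should be close to the confidence level. Each individual interval either contains the true mean or does not; the confidence level describes the long-run proportion that do.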
To develop the theory, you are going to look at creating confidence levels, which are the default
choice. Suppose you are estimating the population mean, using the sample mean . Initially, you will only
consider random variables drawn from a normal distribution so that , where is the population
mean (the thing you want to find) and is the standard deviation in one observation of . You can find, in
terms of and , a region symmetrical about that has a probability of containing .
You can find the -score of the upper bound. Using the symmetry of the situation, you find that of the
distribution is above the upper bound, so the -score is . You can say that:
Rewind
You saw in A Level Mathematics Student Book 2, Chapter 21, that is the inverse
normal distribution that tells you the -score that results in the cumulative probability .
Be warned – although this looks like it is a statement about the probability of , in your derivation you
treated as a constant so it is meaningless to talk about a probability of . This statement is still
concerned with the probability distribution of .
So, if the sample mean is , your confidence interval for is:
Tip
You can generalise this method to other confidence levels. To find a confidence interval you can find the
critical -score geometrically, using the properties of this graph.
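[Figure: standard normal curve (Z) with an area of c%/2 on each side of the centre, an area of 50% below the centre, and the critical value q marked on the axis]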
From this diagram you can see that the critical -score is the one where there is a probability of of
being below it.
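The method can be sketched in Python using the inverse normal function in the standard library. This is a minimal sketch, assuming the interval takes the usual form of sample mean ± z × σ ÷ √n, with the critical z-score having probability 0.5 + (confidence level ÷ 2) below it. The function name and the numerical values are hypothetical.

```python
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level):
    """Symmetric confidence interval for the mean when sigma is known."""
    # The critical z-score has probability 0.5 + level/2 below it.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width

# Hypothetical values: sample mean 12.0 from 25 observations, sigma = 2.
low, high = z_confidence_interval(12.0, 2, 25, 0.95)
print(round(low, 3), round(high, 3))
```

Changing `level` to another value, such as 0.9 or 0.99, gives the corresponding interval without any other change to the calculation.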
Tip
This process creates a symmetric interval around the sample mean. It is also possible to create a
non-symmetric interval, but that is beyond the scope of this course.
Tip
where is the sample mean, is the standard deviation in one observation of and
The masses of fish in a pond are known to have a normal distribution with a standard deviation
. The mean mass of fish from the pond is found to be .
a Find a confidence interval for the mean mass of all the fish in the pond.
b Guidance from a vet suggests that the pond is an unsuitable environment if the mean mass of the
fish is below . Does your confidence interval suggest that the environment is unsuitable?
a For a confidence interval
Use your calculator to find the z-score associated with a confidence interval.
So confidence interval is , which is .
b A mean mass of is within the confidence interval so the confidence interval does not necessarily
suggest that the environment is unsuitable.
You need to consider whether a true mean of is consistent with your confidence interval.
You do not need to know the centre of the interval to find the width of the confidence interval. From Key
point 6.1 you know that the confidence interval goes from . Therefore its width is .
Fast forward
The inference in part b of Worked example 6.1 is effectively a type of hypothesis test. You will
see later in this chapter how you can quantify the significance level when using a confidence
interval to perform a hypothesis test.
The results in a test are known to be normally distributed with a standard deviation of . How
many people need to be tested to find an confidence interval with a width of less than ?
For an confidence interval
Find the z-score associated with an confidence interval.
Set up an inequality.
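The inequality approach can be sketched in Python. The sketch assumes the width of the interval is 2 × z × σ ÷ √n, so the sample size must exceed (2 × z × σ ÷ width)². The values used (standard deviation 15, an 80% confidence level and a maximum width of 3) and the function names are hypothetical, not those of the worked example.

```python
import math
from statistics import NormalDist

def interval_width(sigma, level, n):
    """Width of the symmetric confidence interval for the mean, sigma known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / math.sqrt(n)

def min_sample_size(sigma, level, max_width):
    """Smallest n for which the interval width is strictly less than max_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Smallest integer strictly greater than (2 z sigma / max_width)^2.
    return math.floor((2 * z * sigma / max_width) ** 2) + 1

# Hypothetical requirement: sigma = 15, 80% confidence, width below 3.
n = min_sample_size(15, 0.80, 3)
print(n, round(interval_width(15, 0.80, n), 4))
```

Because the sample size must be a whole number, the answer is always rounded up, never to the nearest integer.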
If the sample size is sufficiently large (greater than ) and you do not know the true variance, you need to
use the unbiased estimate of the variance as a substitute for the true variance.
You can use confidence intervals to conduct hypothesis tests. For example, if you find a confidence
interval you can use it to conduct a significance two-tailed hypothesis test.
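The link between the two procedures can be illustrated with a short sketch. It assumes a known standard deviation, and compares a two-tailed z-test at significance level α with the rule 'reject if the hypothesised mean lies outside the (1 − α) confidence interval'. The function names and all numerical values are hypothetical.

```python
from statistics import NormalDist

def ci_test(x_bar, sigma, n, mu0, level=0.95):
    """Reject H0: mu = mu0 if mu0 lies outside the confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    inside = x_bar - half_width <= mu0 <= x_bar + half_width
    return not inside  # True means "reject H0"

def z_test(x_bar, sigma, n, mu0, alpha=0.05):
    """The same two-tailed test carried out directly with a z-score."""
    z_score = (x_bar - mu0) / (sigma / n ** 0.5)
    return abs(z_score) > NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical sample: mean 12.0 from 25 observations, sigma = 2.
for mu0 in (11.0, 11.5, 12.9):
    print(mu0, ci_test(12.0, 2, 25, mu0), z_test(12.0, 2, 25, mu0))
```

For every hypothesised mean the two functions reach the same decision, which is why a confidence interval can stand in for a two-tailed test.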
A vet is measuring the masses of a breed of dog . Her data are summarised here:
So
You can then use the expression in Key point 6.1, substituting for .
b ,
The true mean being is consistent with the confidence interval found, so you do not reject H0 at
the significance level. There is not significant evidence that the textbook is incorrect.
You must take care not to draw false inferences from confidence intervals. It is important to know the types
of error that can be made, as shown in Worked example 6.4.
Ramon works out a confidence interval for the population mean as to . He claims that:
a of any observed data will be between and
b the probability that the population mean is between and is
Decide which of these statements, if any, are correct. Justify your answers.
a This is not necessarily true. The confidence interval is for the mean rather than a
single observation. Even if this statement was about the sample mean there would be
variations between samples.
b This is not true. The population mean is not a random variable so you cannot talk
about a probability associated with it.
c This is not necessarily true. The confidence interval will be centred on the sample
mean, which may not equal the population median.
EXERCISE 6A
b .
2 Find the required symmetric confidence interval for the population mean for the summarised data. You
can assume that the data are taken from a normal distribution with known variance.
a i , , ; confidence interval
ii , , ; confidence interval
b i , , ; confidence interval
ii , , ; confidence interval
3 Copy and complete this table. You can assume that the data are taken from a normal distribution with
known variance and that the confidence level is symmetric.
Confidence level Lower bound of Upper bound of
interval interval
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
4 The blood oxygen levels (measured as percentages) of an individual are known to be normally
distributed with a standard deviation of . Based upon six readings, Niamh finds that her blood
oxygen levels are on average .
a Find a symmetric confidence interval for Niamh’s true blood oxygen level.
b A doctor needs to be called if the true mean oxygen level falls below . Does the confidence
interval suggest that the true oxygen level is below ?
5 The birth masses of male babies in a hospital are known to be normally distributed with variance .
a Find a symmetric confidence interval for the average birth mass if a random sample of ten
male babies have an average mass of .
b If average birth masses are below then an investigation must be conducted. Based upon this
confidence interval, should an investigation be conducted?
6 A data set is summarised here:
Find a symmetric confidence interval for the mean, assuming that the data are drawn from a
normal distribution.
7 a A sample of people in a town have an average wage of £ with an unbiased estimate of the
population variance of million. The wages follow a normal distribution. Find a symmetric
confidence interval for the mean wage in the town.
b Is there significant evidence (at significance) that the mean wage in this town is different from
£ ?
8 When a scientist measures the concentration of a solution, the measurement obtained can be
assumed to be a normally distributed random variable with standard deviation .
a He makes independent measurements of the concentration of a particular solution and correctly
calculates the confidence interval for the true value as . Determine the confidence
level of this interval.
b The scientist claims that this means that of sample means will be between and . Is
this a correct interpretation of the confidence interval? Justify your answer.
c He is now given a different solution and is asked to determine a confidence interval for its
concentration. The symmetric confidence interval is required to have a width less than . Find
the minimum number of measurements required.
9 A supermarket wishes to estimate the average amount spent shopping each week by single men. It is
known that the amount spent has a normal distribution with standard deviation € . What is the
smallest sample required so that the margin of error (the difference between the centre of the interval
and the boundary) for an symmetric confidence interval is less than € ?
10 A physicist wishes to find a confidence interval for the mean voltage of some batteries. She therefore
randomly selects batteries and measures their voltages. Based on her results, she obtains the
confidence interval [ ]. The voltages of batteries are known to be normally distributed
with a standard deviation of .
a Find the value of .
b Assuming that the same confidence interval had been obtained from measuring batteries, what
would be its level of confidence?
c A confidence interval for the mean voltage of a different brand of batteries is found to be [
]. Is there significant evidence that the second brand of battery has a higher voltage than
the first brand of battery?
11 a A set of data items produces a confidence interval for the mean of ( ). You can assume
that the data are drawn from a normally distributed population.
Given that , find the confidence level, giving your answer to two significant figures.
b Jasmine wants to test these hypotheses:
Use the given confidence interval to conduct a hypothesis test, stating the significance level.
12 From experience it is known that the variance in the increase between marks in a beginning-of-year
test and an end-of-year test is . A random sample of four students in Mr Jack’s class was selected
and the results in the two tests were recorded.
Alma Brenda Ciaron Dominique
Beginning of year
End of year
a Assuming that the difference can be modelled by a normal distribution with variance , find a
symmetric confidence interval for the mean increase.
b How could the width of the confidence interval be decreased?
c Do these data provide evidence at the significance level that Mr Jack’s class is doing better than
the school average of a -mark increase?
13 Which of these statements are true for symmetric confidence intervals of the mean?
a There is a probability of that the true mean is within the interval.
b If you were to repeat the sampling process times, of the intervals would contain the true
mean.
c Once the interval has been created there is a chance that the next sample mean will be within
the interval.
Section 2: Confidence intervals for the mean when the population variance is unknown
does not follow the normal distribution, but rather the t-distribution (as long as follows a
normal distribution). In Section 1 you assumed that when the sample size is large the difference
between the t-distribution and the normal distribution is sufficiently small that it can be ignored. In
this section you will look at how you can adapt the theory from Section 1 when sample sizes are small
– less than about .
Rewind
The -distribution and associated calculations were covered in Chapter 5. Remember that
the number of degrees of freedom is given by .
You can follow a similar analysis to the one leading to Key point 6.1 to get a formula for a confidence
interval using a t-distribution when the sample size is small.
If the estimated variance is found from the sample and the sample size is small, the
symmetric confidence interval for the population mean is given by:
Tip
You can find the value of from some calculators or by using the percentage points
table in the formula book. For example, if you are looking at a symmetric
confidence interval, that means that there is below the upper bound of the
interval so you use the percentage point.
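[Figure: t-distribution curve with a central area of 95%, 2.5% in each tail, and 97.5% of the area below the upper bound]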
Find a confidence interval for the mean of the data , assuming that the data are
drawn from a normal distribution.
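A calculation of this type can be sketched in Python. The standard library has no inverse t-distribution, so the critical value is passed in as read from a percentage points table. The sketch assumes the interval has the form sample mean ± t × s ÷ √n, where s is the square root of the unbiased variance estimate. The data and the function name are hypothetical; 2.776 is the standard 97.5% point of the t-distribution with 4 degrees of freedom.

```python
import math
import statistics

def t_confidence_interval(sample, t_critical):
    """Symmetric confidence interval for the mean using a t critical value from tables."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # square root of the unbiased variance estimate
    half_width = t_critical * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical data with n = 5, so there are 4 degrees of freedom.
# A 95% interval uses the 97.5% point of the t-distribution, 2.776.
low, high = t_confidence_interval([4.9, 5.3, 5.1, 4.7, 5.0], 2.776)
print(round(low, 4), round(high, 4))
```

Using the normal value 1.96 in place of 2.776 would give a noticeably narrower interval, which would overstate the precision for such a small sample.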
EXERCISE 6B
1 Find the required symmetric confidence interval for the population mean for these data, some of
which have been summarised. You can assume that the data are taken from a normal
distribution.
a i , , ; confidence interval
ii , , ; confidence interval
b i , ; confidence interval
ii , ; confidence interval
c i ; confidence interval
ii ; confidence interval
2 A garden contains a large number of rose bushes. A random sample of eight bushes was taken
and the heights in cm were measured and the data were summarised as:
,
a State an assumption that is necessary to find a confidence interval for the mean height of
rose bushes.
b Find the sample mean.
c A newspaper report on this study claims that most students watch between and
hours of television each day. Is this a reasonable conclusion from this confidence interval?
Explain your answer.
4 The random variable is normally distributed with mean . A random sample of observations
is taken on , and it is found that:
c The manufacturer claims that the lifetime of the printer cartridge is at least pages. Is the
confidence interval found consistent with this claim?
6 The times taken for four people to complete a crossword puzzle are measured and the results
are shown in this table.
Person Time (minutes)
John
Diane
David
Jane
a Find a confidence interval for the true population mean, assuming that the times follow a
normal distribution.
b The newspaper says that the average time to complete the crossword is more than
minutes.
i State suitable null and alternative hypotheses for this test.
ii Use your confidence interval from part a to determine the conclusion to this hypothesis
test at the significance level.
7 The masses of four burgers, in grams, before and after being cooked for one minute, are
measured:
Burger
Before cooking
After cooking
A symmetric confidence interval for the mean mass loss was found to include values from
. It can be assumed that the masses follow a normal distribution.
a Find the value of .
A confidence interval for the mean is a range of possible values for the population mean,
along with a confidence level.
If the true population variance is known and the sample mean follows a normal distribution
then the confidence interval takes the form:
where .
When carrying out a hypothesis test or finding a confidence interval for the mean, if the
sample size is sufficiently large ( ) and you do not know the true variance, you can use the
unbiased estimate of the variance as a substitute for the true variance.
If the estimated variance is found from the sample and the sample size is small, the
confidence interval for the population mean is given by:
, ,
3 The masses of bananas are investigated. The masses of a random sample of of these
bananas were measured and the mean was found to be with an unbiased variance of
. It is assumed that the masses follow a normal distribution.
4 The time taken for a mechanic to replace a set of brake pads on a car is recorded. In a
week she changes sets of brake pads and minutes and .
Assuming that the times are normally distributed, calculate a symmetric confidence
interval for the mean time taken for the mechanic to replace a set of brake pads.
6 A random sample of four students in a school was selected and the results they got in two
tests were recorded:
Beginning of year
End of year
a Find a symmetric confidence interval for the mean increase in marks from the
beginning of year until the end of year, assuming that the differences follow a normal
distribution.
b Hence conduct a test at the significance level to see if the results have changed
between the beginning and end of the year.
7 The random variable is normally distributed with mean and standard deviation .
8 From experience it is known that the variance in the mass decrease during a diet is . A
random sample of four people was selected and their masses before and after their diet
were recorded.
Before diet
After diet
a Assuming that the mass loss follows a normal distribution, find a confidence interval
for the mean mass loss during the diet.
b Hence conduct a test at significance to see if the diet results in a change in mass.
Assuming that these masses form a random sample from a normal population, calculate:
ii If such confidence intervals are constructed from separate random samples from
the same population, find the probability that at least one of them will not include .
b Jurgen can run metres in a mean time of seconds. His coach changes his
training programme to concentrate on his starting speed. After following the new training
programme, a random sample of of Jurgen’s -metre running times has mean
seconds and standard deviation seconds.
ii Use the confidence limits to decide whether there is significant evidence that the new
training programme has been effective. Justify your decision.
[© AQA 2015]
FOCUS ON … PROOF 2
If $X \sim \mathrm{B}(n, p)$, then $E(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$.
You need to know the formula for binomial probabilities and the binomial expansion. One part of the proof
also involves differentiation using the chain rule.
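Before working through the proof, you might find it reassuring to check the result numerically. The sketch below assumes the result in question is the standard one for a binomial variable, mean np and variance np(1 − p), and computes both quantities directly from the probabilities. The values of `n` and `p` and the function names are arbitrary illustrative choices.

```python
from math import comb, isclose


def binomial_pmf(n, p):
    """Return the list of probabilities P(X = k) for k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mean_and_variance(pmf):
    """Expectation and variance computed directly from their definitions."""
    mu = sum(k * prob for k, prob in enumerate(pmf))
    second_moment = sum(k**2 * prob for k, prob in enumerate(pmf))
    return mu, second_moment - mu**2


# n and p are arbitrary illustrative values.
n, p = 12, 0.3
mu, var = mean_and_variance(binomial_pmf(n, p))
print(isclose(mu, n * p), isclose(var, n * p * (1 - p)))
```

A numerical check like this is not a proof, since it only covers the particular values tried, but it can catch an algebra slip.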
Rewind
Refer to A Level Mathematics Student Book 2 for revision on the binomial distribution and on
the chain rule.
QUESTIONS
3 Explain why .
6 a Show that .
The formula for the endpoints of a symmetric 95% confidence interval where the population variance is
known is approximately $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$.
Use a spreadsheet to create 10 random numbers generated from the normal distribution $N(20, 5^2)$:
Observation
Sample 1 2 3 4 5 6 7 8 9 10
1st 27.62 20.34 22.50 10.06 20.75 18.76 26.51 10.41 =NORMINV(RAND(),20,5)
NORMINV(probability, mean, standard_dev)
Then find the mean of the sample and use the formula to find the confidence interval:
A B C D E F G H I J K L M N O
1 Observation Confidence interval
2 Sample 1 2 3 4 5 6 7 8 9 10 Mean Lower Upper
3 1st 13.20 26.32 17.06 14.70 19.77 13.47 21.86 21.30 20.27 15.74 18.37 15.27 =L3+1.96*5/SQRT(10)
4
Tip
Some spreadsheets have the option of generating random numbers from a given distribution. If
your spreadsheet does not have this facility, then you can still use a random number generator
which provides random numbers from the rectangular distribution between 0 and 1; most
spreadsheets do have this function. You might have to think about why the formula shown then
provides random numbers from the normal distribution; it is not obvious.
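One way to see why this works: if U is rectangular on (0, 1) and F is the cumulative distribution function of the normal distribution, then P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x), so F⁻¹(U) has the required normal distribution. The Python sketch below imitates the spreadsheet formula under that reasoning. The function name, the seed and the number of values are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_from_uniform(mu, sigma, rng):
    """Turn one uniform random number on (0, 1) into a normal observation."""
    u = rng.random()
    while u == 0.0:  # inv_cdf is undefined at 0, so redraw in that rare case
        u = rng.random()
    # inv_cdf(u) is the value x with P(X <= x) = u, which is what NORMINV returns.
    return NormalDist(mu, sigma).inv_cdf(u)


rng = random.Random(1)  # fixed seed so the run is repeatable
values = [normal_from_uniform(20, 5, rng) for _ in range(10_000)]
print(round(mean(values), 2), round(stdev(values), 2))
```

The printed mean and standard deviation should be close to 20 and 5 respectively, though not exactly equal because the values are random.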
Check if the confidence interval does contain the true mean, which was 20:
L M N O P Q R S
Confidence interval
Mean Lower Upper Check
20.97 17.87 24.07 =IF(AND(M3<20,N3>20),1,0)
IF(logical_test, [value_if_true], [value_if_false])
Then copy this all down to consider 200 samples, all of size 10. Count how many do contain the true mean.
Confidence interval
Mean Lower Upper Check Counting:
20.80 17.70 23.90 1.00 =SUM(O3:O202)
21.52 18.42 24.62 1.00 SUM(number1, [number2], …)
21.94 18.84 25.04 1.00
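If you prefer to program the investigation rather than use a spreadsheet, the same steps can be sketched in Python. The mean 20, standard deviation 5, sample size 10 and multiplier 1.96 are taken from the spreadsheet formulas shown; the function name `coverage` and the seed are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean


def coverage(n_samples, sample_size, mu, sigma, z=1.96, seed=0):
    """Proportion of known-variance intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(sample_size)
    hits = 0
    for _ in range(n_samples):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(sample_size))
        # Same check as the spreadsheet's IF(AND(lower < 20, upper > 20), 1, 0).
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / n_samples


print(coverage(n_samples=200, sample_size=10, mu=20, sigma=5))
```

Changing `seed` gives a different set of samples, and the proportion will vary a little from run to run, just as it does each time the spreadsheet recalculates.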
QUESTIONS
1 For each sample, can you say with certainty whether or not the true mean is within the
calculated confidence interval?
2 What percentage of the calculated confidence intervals contain the true mean?
3 If, instead of using the true standard deviation, the sample standard deviation is used, then a
t-interval is required.
5 Repeat the investigation from question , but this time using one sample of size from a
distribution and one sample of size from a distribution.
Tip
The purpose of questions 4 and 5 is to highlight that it is not a good idea to use the overlap of
two confidence intervals to test to see if the mean of two distributions is the same, as the
significance level is not obvious.
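To see why the significance level is not obvious, consider a simplified case that you can calculate exactly. The sketch assumes two samples of equal size from distributions with the same mean and the same known variance, each giving a 95% interval; these conditions are assumptions made for illustration and are not taken from the questions.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
z = std_normal.inv_cdf(0.975)  # about 1.96, for 95% intervals

# Two intervals x1 +/- z*se and x2 +/- z*se overlap when |x1 - x2| < 2*z*se.
# With equal means, x1 - x2 is normal with standard deviation sqrt(2)*se,
# so the standardised threshold is 2*z / sqrt(2) = z * sqrt(2).
threshold = z * sqrt(2)
p_overlap = 2 * std_normal.cdf(threshold) - 1
implied_significance = 1 - p_overlap
print(round(p_overlap, 4), round(implied_significance, 4))
```

Under these assumptions, rejecting equality of the means only when the intervals fail to overlap corresponds to a significance level far smaller than 5%, which is why the overlap rule is a poor substitute for a proper test.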
FOCUS ON … MODELLING 2
QUESTIONS
1 Use a spreadsheet to create a list of samples of size , taken from the normal distribution
.
2 Find the mean of each sample. Is the mean of all the means of each sample zero?
Tip
In Excel you can create a random number from the distribution using the
syntax “ ” or the function provided by the Data Analysis
toolpak.
3 Find the standard deviation of these samples. Is the mean of the standard deviations of each
sample approximately ?
4 Construct the z-score for each sample mean using the formula:
Tip
If you are using Excel, you might want to use the Data Analysis toolpak to create
the histogram.
5 Construct the t-score for each sample mean using the formula:
Based on this exercise, you should see that for small sample sizes there is a noticeable difference
between z-scores and t-scores, necessitating the use of the t-distribution. However, for larger
sample sizes the differences are small compared to most other sources of uncertainty, so the
normal distribution can be used as an approximation to the t-distribution.
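The comparison can also be sketched in Python rather than a spreadsheet. The sketch assumes samples from the standard normal distribution, takes the z-score as the sample mean divided by 1/√n and the t-score as the sample mean divided by s/√n, and counts how often each score falls beyond ±1.96. The sample sizes 5 and 50, the number of samples and the function name are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev


def tail_fractions(sample_size, n_samples=5000, cutoff=1.96, seed=0):
    """Fractions of z-scores and t-scores beyond +/-cutoff for N(0, 1) samples."""
    rng = random.Random(seed)
    z_out = t_out = 0
    for _ in range(n_samples):
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        x_bar = mean(sample)
        z = x_bar / (1 / sqrt(sample_size))              # true sigma = 1
        t = x_bar / (stdev(sample) / sqrt(sample_size))  # estimated sigma
        z_out += abs(z) > cutoff
        t_out += abs(t) > cutoff
    return z_out / n_samples, t_out / n_samples


print(tail_fractions(sample_size=5))
print(tail_fractions(sample_size=50))
```

For the small sample size the t-scores should land beyond ±1.96 noticeably more often than the z-scores, while for the larger sample size the two fractions should be much closer.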
CROSS-TOPIC REVIEW EXERCISE 1
1 The discrete random variable can only take the values and . If , find .
Choose from these options.
2 The length of an athlete’s long jump is modelled by a normal distribution with standard
deviation . A sample of jumps is measured. What will be the width (to three
significant figures) of a confidence interval for the mean? Choose from these options.
A
B
C
D
3 A continuous random variable has probability density function defined by
b What assumptions are required to make the conclusion of the test in part a valid?
c Jane says that she would be more likely to study Further Mathematics if she attended
North Academy. Is this a valid inference from the data? Justify your answer.
a Show that .
It is given that .
Based on long experience, Andrew knows that the average time spent on the car section
is minutes with standard deviation minutes, and the average time spent on the train
section is minutes with standard deviation minutes.
a Assuming that there is no waiting time, find the expectation and standard deviation in
Andrew’s total journey time.
b For the meeting Andrew gets paid £ plus £ per hour he spends travelling. Find the
expectation and standard deviation in the amount Andrew gets paid.
7 For the year 2014, this table summarises the masses, kilograms, of a random sample of
women residing in a particular city who are aged between years and years.
Mass ( ) Number of women
Total
a Calculate estimates of the mean and the standard deviation of these masses.
b i Construct a confidence interval for the mean mass of women residing in the city,
who are aged between years and years.
ii Hence comment on a claim that the mean mass of women residing in the city, who
are aged between years and years has increased from that of in 1965.
[© AQA 2014]
8 Two independent random variables have normal distributions, and
.
a State the distribution of , including any necessary parameters.
b Find .
9 At a remote hospital, in an area where there are many venomous snakes, the number of
patients during one week requiring treatment after a venomous snake bite may be
modelled by a Poisson distribution with mean .
a For this hospital, find the probability that:
i no more than patient requires treatment after a venomous snake bite during a
particular week
ii at least patients require treatment after a venomous snake bite during a particular
period of weeks
iii more than patients but fewer than patients require treatment after a venomous
snake bite during a particular period of weeks.
b Each patient who has been bitten by a venomous snake is treated with a single dose of
an anti-venom which is effective against the venoms of all the snakes common in that
area.
The anti-venom is expensive and has a limited shelf life, so that a delivery of fresh anti-
venom is made at -week intervals.
The hospital stores just enough anti-venom so that the probability that it runs out of
anti-venom before the next delivery is less than per cent.
Quoting probabilities to justify your answer, state how many doses of anti-venom the
hospital should have in its store immediately after a delivery of fresh anti-venom.
[© AQA 2015]
10 Dana, a researcher in the USA, investigated game-related stress for sports officials in
inter-school baseball, basketball and soccer.
You may assume that the officials involved in this investigation represent a random
sample.
a Use the information in Table 1 to complete the contingency table, Table 2, with
frequencies that could be analysed to investigate whether the coping style used by
officials is associated with the sport involved.
Table 2
Coping style
AP AV
Baseball
Sport Basketball
Soccer
b Examine, using the level of significance, whether the coping style used by officials
is associated with the sport involved.
a Find in terms of .
b Show that .
c What is the largest possible value of the variance of ?
12 A sample of size is drawn from a normally distributed population with standard deviation
. A confidence interval for the mean was correctly calculated to be .
Find:
a the unbiased estimate of the population mean
b the value of .
13 In a diamond mine, the number of diamonds found per cubic metre of material mined is
known to follow a Poisson distribution with mean .
a If , find the probability of finding:
i diamonds in of mining
ii The survey results show that the sample contains diamonds. Conduct a hypothesis
test at the significance level.
e the median of .
16 The volume of lemonade in a can produced at a factory follows a normal distribution with
standard deviation . A quality control test takes a random, independent sample of
cans. The factory manager claims that the cans should, on average, contain .
a If the true mean is , find the probability that exactly cans contain less than
.
b Jane decides that if or more cans in the sample contain less than she will
reject the batch.
i State in this context what is meant by a type I error.
ii Find the probability of a type I error in Jane’s test.
c The mean of the sample is found to be .
i Construct a confidence interval for the true mean of the cans, giving your
answer to decimal place.
ii Phillip uses the confidence interval from part c i to determine whether the cans do
come from a population with a mean of less than . What conclusion does Phillip
draw and what is the significance level of his conclusion?
17 Members of a library may borrow up to books. Past experience has shown that the
number of books borrowed, , follows the distribution shown in the table.
b Assume that the numbers of books borrowed by two particular members are
independent. Find the probability that one of these members borrows more than
books and that the other borrows fewer than books.
c Show that the mean of is , and calculate the variance of .
d One of the library staff notices that the values of the mean and the variance of are
similar and suggests that a Poisson distribution could be used to model .
Without further calculations, give two reasons why a Poisson distribution would not be
suitable to model .
e The library introduces a fee of pence for each book borrowed.
Assuming that the probabilities do not change, calculate:
i the mean amount that will be paid by a member
ii the standard deviation of the amount that will be paid by a member.
[© AQA 2016]
CROSS-TOPIC REVIEW EXERCISE 2
1 It is assumed that people arrive in a queue randomly and at a constant average rate of
per minute. The random variable is the time, in minutes, between people arriving in
the queue.
a State the distribution of , including any parameters.
Basic
Higher
Total
a Use a $\chi^2$-test, at the level of significance, to investigate whether there is an
association between age when leaving education and greatest rate of income tax
paid.
b It is believed that residents of this town who had left education at a later age were
more likely to be paying the higher rate of income tax. Comment on this belief.
[© AQA 2015]
b Find the probability that the error resulting from this rounding down is greater than
.
c i State the value for .
Julie’s records of her students’ first-time performances in their driving tests are shown in
the table.
Age Pass Fail
a Use a $\chi^2$-test at the level of significance to investigate Julie’s belief.
is the amount Manuel spends on his meal in pounds. If the burger costs £ and each
drink costs £ , find:
b i
ii .
b Cauchy has a budget of £ per month for his phone. Anything that he does not
spend on his phone he saves. Find the mean and variance in , the amount saved in
pounds each month.
8 South Riding Alarms (SRA) maintains household burglar-alarm systems. The company
aims to carry out an annual service of a system in a mean time of minutes.
Technicians who carry out an annual service must record the times at which they start
and finish the service.
a Gary is employed as a technician by SRA and his manager, Rajul, calculates the times
taken for annual services carried out by Gary. The results, in minutes, are as
follows:
Assume that these times may be regarded as a random sample from a normal
distribution.
Carry out a hypothesis test, at the significance level, to examine whether the
mean time for an annual service carried out by Gary is minutes.
b Rajul suspects that Gary may be taking longer than minutes on average to carry
out an annual service. Rajul therefore calculates the times taken for annual
services carried out by Gary.
Assume that these times may also be regarded as a random sample from a normal
distribution but with a standard deviation of minutes.
Find the highest value of the sample mean which would not support Rajul’s suspicion
at the significance level. Give your answer to two decimal places.
[© AQA 2014]
9 The time taken to complete a test is modelled by the normal distribution. The average
score on this test is with standard deviation . A sample of students in a school
take the test and if their average is above it will be decided that the school is doing
better than the rest of the population.
a Explain why the normal distribution is a plausible model for the test results.
b Assuming that the standard deviation is still , find the significance level of this
test.
c If the true mean of students in the school is , find the power of the test.
d If the true mean of the students was higher than , would the power of the test be
higher or lower? Explain your answer. No further calculations are required.
Assuming that the holiday-makers are a random sample, use a test, at the
level of significance, to investigate the claim.
[© AQA, 2010]
13 The discrete random variable follows the distribution and satisfies .
a Find the value of .
b Show that .
14 Lorraine bought a new golf club. She then practised with this club by using it to hit golf
balls on a golf range.
After several such practice sessions, she believed that there had been no change from
metres in the mean distance that she had achieved when using her old club.
To investigate this belief, she measured, at her next practice session, the distance,
metres, of each of a random sample of shots with her new club. Her results gave
Investigate Lorraine’s belief at the level of significance, stating any assumption that
you make.
[© AQA 2010]
15 Wellgrove village has a main road running through it that has a speed limit. The
villagers were concerned that many vehicles travelled too fast through the village, and
so they set up a device for measuring the speed of vehicles on this main road. This
device indicated that the mean speed of vehicles travelling through Wellgrove was
.
In an attempt to reduce the mean speed of vehicles travelling through Wellgrove, life-
size photographs of a police officer were erected next to the road on the approaches to
the village. The speed, , of a sample of vehicles was then measured and the
following data obtained.
a State an assumption that must be made about the sample in order to carry out a
hypothesis test to investigate whether the desired reduction in mean speed had
occurred.
b Given that the assumption that you stated in part a is valid, carry out such a test,
using the level of significance.
c Explain, in the context of this question, the meaning of:
i a type I error
ii a type II error.
[© AQA 2015]
16 The discrete random variable satisfies this distribution:
17 Long-term observations suggest that the number of cars passing the school gates
follows a Poisson distribution with the mean of cars per minute. Following the opening
of a new supermarket at the end of the road, the head teacher wishes to find out
whether this mean has increased. She sends a group of students to count the cars
passing the school gates during a -minute interval.
Let be the number of cars passing the school gates in a -minute interval, so that
.
a Write down suitable null and alternative hypotheses.
b Find the critical region for the test at the significance level.
c The students counted cars. State the conclusion of the test.
In reality, the mean number of cars has increased to per minute.
ii What is the probability that all of these confidence intervals are above the true
mean?
AS LEVEL PRACTICE PAPER
45 minutes, 40 marks
1 The number of beetles in a forest can be modelled by a Poisson distribution with parameter
beetles per square metre. Find the probability, to three significant figures, that in a area
there are fewer than beetles.
Choose from these options.
A
D [1 mark]
2 The discrete random variable has a probability distribution given by for
and otherwise. Find .
Choose from these options.
A
B
C
D [1 mark]
b Find . [1 mark]
c Find the standard deviation of . [4 marks]
4 Sarah models the number of buses arriving at a bus stop using a Poisson distribution. is the
number of Route buses arriving in an hour and is the number of Route buses arriving in
an hour. Sarah models these as being independent with and .
a Given that , state in context an interpretation of the variable and write down its
distribution, including any parameters. [2 marks]
b Find the probability that or fewer buses arrive in an hour. [2 marks]
c Give one reason why the assumption that and are independent is unlikely to be the
case. [1 mark]
To check her model, Sarah counts the buses arriving in randomly selected hours.
5 The continuous random variable has probability density function given by for
and otherwise.
a Find the value of . [3 marks]
c Find . [3 marks]
6 This table shows the results of a survey in a school about weekly hours spent watching TV.
Test at the significance level whether school year and hours spent watching TV are
independent.
School year
Hours
[5 marks]
7 a The number of leaks in a pipe is known by a water company to follow a Poisson distribution
with mean leaks per . A new contractor claims that they can reduce the number of
leaks. After they have maintained the pipes for some time, a random stretch of pipe is
investigated and found to have leaks. Test the contractor’s claim at the significance
level. [5 marks]
b It is decided that if three or fewer leaks are found in , then the contractor has reduced
the number of leaks. What is the probability of a type I error? [2 marks]
A LEVEL PRACTICE PAPER
60 minutes, 50 marks
D [1 mark]
D [1 mark]
a Find . [1 mark]
b Find . [2 marks]
a State the null and alternative hypotheses when conducting a chi-squared test for
independence. [2 marks]
b Write down appropriate null and alternative hypotheses to test if the new
swimming technique is effective. [2 marks]
d Investigate, using the significance level, whether the new technique improved the
mean time. [4 marks]
e State one assumption required for your test to be valid. Comment on how reasonable
the assumption is in this context. [2 marks]
7 The number of phone calls received by an IT helpline is known to follow a Poisson
distribution. It is thought to receive a mean of phone calls per hour.
A change to the IT system is designed to encourage fewer phone calls to the helpline. If
there are phone calls or fewer in a -hour period, the change will be deemed successful.
a Find the probability of a type I error in this process. [3 marks]
b In reality the number of phone calls was per hour. Find the probability of a type II
error. [3 marks]
8 When a scientist records the volume of acid required to neutralise a solution she records
her results to the nearest millilitre. For example, if she records a volume of , she
believes that the true volume required is somewhere in between and with all
possibilities equally likely.
The error, , is a random variable defined as the true volume of acid required to
neutralise the solution minus the recorded volume.
a State an appropriate distribution to model , including its parameters. [2 marks]
b Find the probability that the magnitude of the error, , is less than . [1 mark]
c Find the probability that in two independent observations the magnitude of the error is
less than . [2 marks]
d Hence find the probability density function of the random variable , the maximum
magnitude of the error in two observations. [3 marks]
FORMULAE
Probability
Standard deviation
Discrete distributions
Distribution of Mean Variance
Binomial
Poisson
Sampling distributions
For a random sample of independent observations from a distribution having mean and
variance :
The table gives the values of $x$ satisfying $P(X \le x) = p$, where $X$ is a random variable having the Student’s
t-distribution with $\nu$ degrees of freedom.
[Table: percentage points of the t-distribution, with rows for 1 to 40, 45, 50, 55, …, 100, 125, 150 and 200 degrees of freedom; tabulated values not reproduced.]
The table gives the values of $x$ satisfying $P(X \le x) = p$, where $X$ is a random variable having the
$\chi^2$ distribution with $\nu$ degrees of freedom.
[Table: percentage points of the $\chi^2$ distribution, with rows for 1 to 40, 45, 50, 55, …, 100 degrees of freedom; tabulated values not reproduced.]
Answers
1 Discrete random variables
BEFORE YOU START
1
EXERCISE 1A
1 Answers are given to where appropriate.
a i
ii
b i
ii
c i
ii
d i
ii
2 a Proof.
3 a
4 a Proof.
5 a
c
6
7 a
8
9 a
10 a Profit £
Probability
b £
11 a
b i
ii Proof.
12 a
b
c
d i
ii
EXERCISE 1B
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a i
ii
b i
ii
c i
ii
d i
ii
7 a
e
8 a
9 Proof.
EXERCISE 1C
1 a i
ii
b i
ii
4 a Proof;
9 Proof.
10 Proof.
MIXED PRACTICE 1
1 C
2 C
3 a
b
4 a
e
5 a
6 a
8
9 a
10 a
d
11 a
12
13 a
14
15 a
b
16 a
b i
ii Proof.
iii
17 a i Proof.
ii
iii Proof.
iv
b
2 Poisson distribution
BEFORE YOU START
1
4 Do not reject $H_0$.
EXERCISE 2A
ii
b i
ii
c i
ii
2 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
4
5 a
6 a
7 a
b
8 a
9 a
10
11 a
c There are alternative ways to get emails in a week other than every day.
12 a
c
13 a
14
15 a Proof.
EXERCISE 2B
In this exercise answers are given to , where appropriate.
b i Reject $H_0$; p-value
ii Reject $H_0$; p-value
c i Do not reject $H_0$; p-value
ii Reject $H_0$; p-value
2 a i
ii
b i
ii
c i
ii
3 Reject $H_0$; p-value
4 a p-value . Do not reject $H_0$.
b Rate might be different at different times. Cars might not be independent – cars might
be travelling together.
5 a
c Reject $H_0$; p-value
b Reject $H_0$; p-value .
b Reject $H_0$; p-value
10
MIXED PRACTICE 2
In this exercise answers are given to , where appropriate.
1 C
2 C
3 a .
c
4 a Independent events. Constant rate of success.
c Reject $H_0$; p-value .
5 a
6 a
b
7 a
8 a
b
9 a
b
10 a
b
11 a
c
12 a
g
13 a
b
14 a
d Do not reject $H_0$
17 p-value
18 a
d
19 a i
ii
b i
ii
c
20
21 a i
ii
iii
b
22 a i
ii
c i
ii The coins buried in a hoard are no longer independent. The Poisson assumption requires
independence, so brooches are more likely to be modelled by a Poisson distribution.
3 Chi-squared tests
BEFORE YOU START
1 Yes; the p-value is
EXERCISE 3A
1 a i
ii
b i
ii
b or or or
Further Maths
Maths AS or A
No Maths
Walk
Car
Other
Total
d He must assume that his data are representative; for example, it was not a day with unusual
traffic. He must also assume that the respondents were independent; for example, not lots of
students from the same bus.
5 Significant evidence of association; . People who visit more often tend to spend
more money on each visit.
7 a Male Female
£
£
£
£
£
8 a Proof.
9 a Proof.
b Proof; .
10 a First factor
Total
A 12 12 1 5
Second factor B 10 1 4 5
C 3 2 25 20
Total
b Proof.
11 a Proof.
b Proof.
EXERCISE 3B
1 a i
ii
b i
ii
2 Proof;
3 Independent; .
7 You cannot do a calculation based on two factors being dependent unless you know exactly what
that dependency is.
MIXED PRACTICE 3
1 D
2 B
c Do not reject $H_0$. No significant evidence that hair colour and eye colour are dependent.
6 A
7 a ; significant evidence of association.
b i There are more of them.
ii Largest contribution to .
8 a ; Fiona’s belief is justified.
b Fewer than expected gained Class . More than expected gained Class 2ii.
9 a i ; significant evidence of association.
ii Accidents involving changing lane to the left are less likely, and accidents involving changing
lane to the right are more likely than expected for foreign registered HGVs.
b i Expected values Prosecution resulted No prosecution
years or under
Over years
b Proof;
ii
iii No change.
4 Continuous distributions
BEFORE YOU START
1
EXERCISE 4A
In this exercise answers are given to s.f., where appropriate.
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
f i
ii
g i
ii
h i
ii
2 a i
ii
b i
ii
c i
ii
3 a i
ii
b i
ii
c i
ii
4 a
8 a
10
11 Proof;
EXERCISE 4B
1 a i ; ; ;
ii ; ; ;
b i ; ; ;
ii ; ; ;
c i ; ; ;
ii ; ; ;
d i ; ; ;
ii ; ; ;
2 a i
ii
b i
ii
3 a
4 a
b
5 a Proof.
6 a Proof.
EXERCISE 4C
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
3 a i
ii
b i
ii
c i
ii
d i
ii
4 a £
b £
5
6 a
b
7 a
b
8 a
9 a if is odd, if is even.
EXERCISE 4D
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a
d
3 a Proof.
4 ;
5 ;
6 ;
7 minutes; minutes
8 is twice the mass of a single gerbil; is the sum of the masses of two different gerbils.
EXERCISE 4E
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
f i
ii
2 a
b
3 a
c
4 a
b
5 a
b
6 a
c
7 a
8
9 a
10 a
b Assumes that the rainfall each day is independent of the rainfall on other days. This is unlikely
to be the case.
11 a
b i
ii
iii
EXERCISE 4F
1 a i ;
ii
b i
ii
2 a i
ii
b i
ii
4 a
b , otherwise.
EXERCISE 4G
1 a [Graph: f(x) against x, with x-axis marks at 0, 5 and 10 and a vertical-axis mark at 5k.]
c
2 a
3 a Proof.
c
d
4 a
b [Graph: f(w) against w, with w-axis marks at 0, 3 and 7.]
d
5 a Proof.
6 a
7 a [Graph: f(x) against x, with x-axis marks at 0, π/2 and 5π/8.]
b Proof.
EXERCISE 4H
In this exercise answers are given to 3 s.f., where appropriate.
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
i
2 a i
ii
b i
ii
3 a
5
6 a Proof.
b Proof; .
EXERCISE 4I
In this exercise answers are given to , where appropriate.
1 a i
ii
b i
ii
c i
ii
2 a i
ii
b i ;
ii
3
4 a
8 Proof; it equals
9 a Exponential;
c Proof.
10 Proof.
a
c Proof.
d Proof.
EXERCISE 4J
1 a i
ii
b i
ii
c i ;
ii ;
d i ;
ii ;
2 a
3 a
4 a
b
5 a
6 a
7 Proof;
8 a
MIXED PRACTICE 4
1
2
3 a
c
4 a
5 a £
b £
c £
6 a
b i
ii
7 a
b
9 a
10 a
d
11 a
c Probably not. Especially in the situation in part b it is likely that when Alice is finished Hassan
might try to speed up.
12
13
14
15 a
16 a
d Proof.
e
17 a
c
18 a Proof.
b
c
19 a
20
21
22 a
b
23 a Proof.
d
e i
ii
24 a [Graph: f(x) against x, showing the line y = (x + 7)/40 between x = 1 and x = 5, with vertical-axis marks at 1/5 and 3/10.]
c Proof.
d i
ii Proof.
e
25 a [Graph: f(x) against x, with x-axis marks at 1 and 11/2 and a vertical-axis mark at 3/32.]
b i-ii Proof.
c i
ii
d
Focus on … Proof 1
Proof 6
1 Theorem 2.
2 Theorem 3.
3 Theorem 5.
4 Properties of sums.
6 Theorem 1.
7 Theorem 4.
1–3 Proof.
Focus on … Problem solving 1
1 0.613 or 0.168
2 0.199 or 0.416
3 8
4 a, b Proof.
c 18
Focus on … Modelling 1
1 Not very appropriate. Rate might not be constant in every part of the ocean. The presence of fish might
not be independent.
4 Not appropriate. This is not a number of events, and the rate might not be constant throughout the day.
Buses might not be independent.
5 The rate might not be constant, but the Poisson tends to work quite well in these situations.
6 Not appropriate. The number of fish being caught is sufficient that it might have a significant effect on
the number of fish remaining in the pond.
7 This is well modelled by the Poisson distribution (and indeed is used in a derivation of the chi-squared
statistic).
2 Reject $H_0$
3 Do not reject $H_0$
5 Do not reject $H_0$.
EXERCISE 5A
1 a i Reject $H_0$
ii Reject $H_0$
b i Do not reject $H_0$
ii Do not reject $H_0$
c i Do not reject $H_0$
ii Do not reject $H_0$
2 a
b Reject $H_0$
b Do not reject $H_0$.
4 Reject $H_0$
5 a
b Do not reject $H_0$
6 a
c Reject $H_0$.
7 a
b Do not reject $H_0$.
c i
ii Reject $H_0$.
8 a
EXERCISE 5B
1 a i
ii
b i
ii
c i
ii
2 a i
ii
b i
ii
c i
ii
3 a i
ii
b i
ii
4 a i
ii
b i
ii
c i
ii
5 a i
ii
b i
ii
c i
ii
6 a i
ii
b i
ii
c i
ii
7 Decreases the risk of a type II error, but increases the risk of a type I error.
b Type I error.
b For example: Roll the dice more times, look for more than sixes, consider other numbers,
do a chi-squared test.
11 a
b . This is very small and requires extreme evidence before change is found. This does not
seem to be required in this situation.
12 a
c
13 a
b Do not reject . There is not enough evidence that Dhalia’s eggs are heavier.
e
14 a
d Proof.
15 a i
ii
b i
ii
16
MIXED PRACTICE 5
1 C
2 A
3 a
c Do not reject
c
6 a
d Reject .
7 a
c
8 a Proof;
b Type I error.
Chapter 6 Confidence intervals
BEFORE YOU START
1
3 Do not reject
4 No significant evidence.
EXERCISE 6A
1 a
b
2 a i
ii
b i
ii
b Yes
7 a
b No. This is a confidence interval for the population mean, not sample means.
10 a
b
c No, since the confidence intervals overlap (although it is quite difficult to find the significance
level).
11 a
12 a
c Yes
13 a False.
b False.
c False.
d True.
e False.
14
EXERCISE 6B
1 a i
ii
b i
ii
c i
ii
2 a Assume heights are normally distributed.
3 a
c No. The confidence interval is for the population mean, not for an individual. The sample is too small for
meaningful generalisations.
4
5 a
c Yes
6 a
b i
ii Do not reject .
7 a
b
8 a
b Proof.
MIXED PRACTICE 6
1 D
2 a
c Do not reject
5
6 a
b No. The given probability suggests that which does not fall in the confidence interval.
8 a
b Do not reject
9 a
10 a i
ii
b i
2–3 Proof.
4 Proof.
5–6 Proof.
Focus on … Problem solving 2
1 Yes. No probability is involved.
2 About 95%.
b No change.
4 About 99.4%.
5 About 30%.
Focus on … Modelling 2
1 Investigation.
5 The shape looks like a normal distribution, but it is much wider – it extends to t-scores above 3 and
below −3.
6 The standard deviation is much closer to 1 and the t-scores histogram is very similar to the z-scores
histogram.
Cross-topic review exercise 1
1 B
2 D
3 a [Graph of f(x) against x: a curve rising from O to height 9k at x = 3, then constant at 9k up to x = 4.]
b Proof.
c i
ii
4 a ; degrees of freedom .
c No. Just because there is dependency does not mean that there is causality.
5 a Proof.
6 a minutes; minutes
b £ £
7 a Mean: ; s.d.:
b i
8 a
9 a i
ii
iii
b
10 a AP AV
Baseball
Basketball
Soccer
c Soccer officials are far less likely than expected to use an AV coping style. Baseball officials are far
more likely than expected to use an AV coping style.
11 a
b Proof.
12 a
13 a i
ii
b i
iii
iv Making a type II error, rejecting a genuine opportunity, might turn out to be very costly. Further
tests can always be done to be more certain.
14 a Phone calls are independent of each other, there is a constant average rate of phone calls.
b For example: The same customer might call back, breaking independence. The rate during office
hours might be different from the rate during the night.
c ( )
e hours ( s.f.).
15 a
e
16 a
b i or more cans containing less than even though the mean is actually .
ii
c i
ii Significant evidence that the cans contain less than on average; significance level is .
17 a
c Proof;
ii
Cross-topic review exercise 2
1 a
3 a
c i
ii
iii
More students than expected in the age group pass their test first time.
5 a
b i
ii
6 a
b
7 a
b
9 a Most students will be close to the average, with fewer and fewer students getting scores as you
move further from the mean.
d Power would increase because the test will be more likely to pick up the difference from .
10 a
11 ; sufficient evidence to support Judith’s belief. Assumption that the population is normally
distributed.
14 . Evidence to support Lorraine’s belief. Assume that the distances follow a normal
distribution.
15 a Random sample.
c i Concluding that the mean speed has reduced when in fact it has not.
ii Concluding that the mean speed is still when in fact it has reduced.
16 a
17 a
d
18 a
19 a
c No
d i
ii
AS Level Practice Paper
1 C
2 C
3 a 0.1
b 0.8
c 0.8
b 0.0591 (3 s.f.)
c 0.338 (3 s.f.)
b 0.0212 (3 s.f.)
A Level Practice Paper
1 B
2 D
3 a 53
b 59
c 5
4 a $H_0$: Gender and lessons are independent; $H_1$: Gender and lessons are not independent.
b 1
5 a $\frac{1}{\ln 2}$
c 0.144 (3 s.f.)
6 a Proof.
b $H_0$: $\mu = 35$; $H_1$: $\mu < 35$
c 11
e Assume that the swimming times are drawn from a normal distribution. Any reasonable
comment, for example: OK because swimming times will be mainly clustered around the average
with few people at extremes or not OK because the swimming club is likely to have people at the
upper tail of the distribution.
7 a 4.58% (3 s.f.)
b 92.5% (3 s.f.)
8 a Rectangular, between −0.5 and 0.5.
b 0.8
c $4x^2$
d $\begin{cases} 8x & 0 < x < 0.5 \\ 0 & \text{otherwise} \end{cases}$
Chapter 1 worked solutions
1 Discrete random variables
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
EXERCISE 1A
b and .
c
6 Let the discrete random variable represent the number rolled on the dice.
Substituting this expression into the formula for the expected mean and using the fact that
:
8 Using the formula for the expected mean and using the fact that you are looking for an expected profit
of :
9 a
d and .
So, .
10 a Profit £
Probability
On the other hand you know that the mode has to satisfy mode
Calculating the values of and using the total probability of and the known expected mean of :
Combined score
Probability
So,
12 a
Median
d , . Let the random variable represent the number of people who borrow no books,
.
ii
EXERCISE 1B
8 a Using :
10 The more often the coin is tossed, the more likely it becomes to show a head at least once. However,
there is no value of that guarantees a head. Hence, the expectation value is infinite and it is not
possible to define a fair price for this game.
EXERCISE 1C
2 Expected mean
3 Expected mean
4 a .
Let , where .
When :
When :
So, , where .
Then,
5 Let be the random variable describing the position number of the broken bulb.
Then .
Then .
and .
10 which is divisible by .
MIXED PRACTICE 1
(Answer )
(Answer )
3 a
4 a
b
c
So, .
Median
5 a
b .
c .
6 a
So, .
Solving for :
Solving for :
10 a
c .
So, .
11 a
12
13 a Using the general relation and the fact that the summed probability must be :
b Using the fact that and substituting the value for from part a:
14
15 a Expected mean
Standard deviation
b Using the fact that there are possible positions of the Queen of Spades, each of which is equally
likely:
16 a
b i Using the fact that the summed probability has to be :
Solving for :
ii
iii
So the standard deviation of is
17 a i Using the fact that the summed probability has to be :
ii
iii
iv
b
Chapter 2 Worked solutions
2 Poisson distribution
EXERCISE 2A
5 a Let be the number of white blood cells shown in a single high power field, :
b Let be the number of white blood cells shown in six high power fields, :
7 .
d Using the formula for conditional probability and your answers from parts a and b:
9 a
b
10 , so
Tip
c There are alternative ways to get emails in a week other than every day.
12 a Let be the number of errors in a piece of homework, :
Then and
c If there are no requests, no car is used. On a day with only a single request, each car is
used with probability one half.
15 a
EXERCISE 2B
Assuming is true: .
Using technology:
There is significant evidence that the sample emits a different number of alpha particles.
Tip
4 a
Assuming is true: .
There is not significant evidence to suggest that the average number of cars travelling past the
traffic light is lower than per minute.
b The rate might not be constant and the cars might not be independent. If one car drives slowly, it
might slow down the cars behind it.
5 a A Poisson distribution has equal mean and variance. Here they have approximately the same value.
b
Assuming is true: .
Do not reject at the significance level. There is not significant evidence that there has been a
reduction in the average number of accidents.
6 a The sample mean is and the unbiased estimate of the population variance is , giving a standard
deviation of approximately .
b A Poisson distribution has equal mean and variance. Here, they have approximately the same value.
Assuming is true: .
Reject at the significance level. There is significant evidence that the number of mistakes is
lower.
7 a You need a constant rate over the day and the bees have to arrive independently.
Assuming is true: .
Reject at the significance level. There is significant evidence that the number of bees has
increased.
Assuming is true: .
Using technology: .
Assuming is true: .
Do not reject at the significance level. There is not significant evidence that the number of
earthquakes has increased.
Reject at the significance level. There is significant evidence that the number of earthquakes
has increased.
10 If is true, then .
So,
MIXED PRACTICE 2
(Answer C)
(Answer C)
3 a Assuming that thrushes and robins visit the table independently of each other:
4 a The events have to be independent and the rate of success has to be constant.
Reject at the significance level. There is significant evidence that the average rate of burgers
ordered has increased from .
5 a
b :
6 a
7 a Mean
Standard deviation
Tip
b A Poisson distribution has equal mean and variance. Here they have approximately the same value.
Using technology: .
There is not significant evidence of a change in the average rate of power surges.
Tip
8 a
Let be the number of days in a five-day week on which there are more than calls. So
.
9 a
Let represent the number of years with fewer than seven rainy days.
10 a
So
11 a
So
So
For the first eruption to occur between and there are hours with no eruptions followed
by an hour with at least eruption:
e Over days with an estimate of eruptions per day, each producing litres of water:
15
Do not reject at the significance level. There is not significant evidence that the average rate has
decreased.
16 a The average rate must be constant. However, you might expect it to vary over different times of the
day and with different weather conditions. Birds must arrive independently, but they might come in
flocks.
b :
c ;
Assuming is true: .
Using technology: .
There is no significant evidence of a change in the number of birds arriving each hour.
Tip
Do not reject at the significance level. There is not significant evidence that the number of leaks
has increased.
18 a Using :
Probability
Probability
Tip
, so
ii
b i
ii
c
20 Exclude the probability of . Then scale the old mean by the new probability.
ii
iii
b Let represent the number of bicycles brought in to be serviced in one week ( days).
ii
ii The coins buried in a hoard are no longer independent. The Poisson assumption requires
independence, so brooches are more likely to be modelled by a Poisson distribution.
Chapter 3 Worked solutions
3 Chi-squared tests
EXERCISE 3A
b or lower or or
Further Maths
Maths AS or A
No Maths
Observed values:
Fiction
Non-Fiction
Expected values:
Fiction
Non-Fiction
and .
Walk
Car
Other
Total
b Expected values:
Car
Other
Total
critical value.
d He must assume that his data are representative. For example: it was not a day with unusual traffic.
He must also assume that the respondents were independent. For example: not lots of students from
the same bus.
Reject .
People who visit more often tend to spend more money on each visit.
6 The expected frequencies are given by:
days
days
days
Do not reject .
Hence, the drug does not appear to be effective in increasing the speed of recovery.
7 a Male Female
£
Do not reject .
If is true, half of the voters for each party will be males and half will be females.
Degrees of freedom, . Critical value from the table at the level of significance
is .
Observed Expected
Male Female Male Female
, so do not reject and conclude that, for a sample of size at the level, there is not
significant evidence to suggest that gender and voting intention are dependent.
c Let the smallest sample size required be times the sample of size .
, so
b
11 a The square causes larger deviations to have a stronger effect and ensures is positive.
b You divide by the expected value to prevent large groups contributing disproportionately and
overwhelming the contributions of smaller groups.
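The reasoning in question 11 can be illustrated with a short Python sketch. The observed frequencies below are hypothetical values chosen only for illustration; they do not come from any exercise. Each cell contributes $(O - E)^2 / E$, so squaring keeps every contribution non-negative and dividing by $E$ scales each deviation relative to the size of its group.

```python
# Hypothetical 2x3 contingency table of observed frequencies (illustrative only).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]


def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contributions(table):
    """Per-cell contribution (O - E)^2 / E to the chi-squared statistic."""
    expected = expected_frequencies(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


def chi_squared(table):
    """Chi-squared statistic: the sum of all per-cell contributions."""
    return sum(sum(row) for row in contributions(table))


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


print(expected_frequencies(observed))
print(contributions(observed))
print(chi_squared(observed), degrees_of_freedom(observed))
```

Printing the per-cell contributions makes it easy to see which cells drive the statistic, which is the kind of comment several of the answers in this chapter ask for.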
EXERCISE 3B
Wrinkled Round
Yellow
Green
If is true, then half the drinks that the colleague likes had tea added first, and half had milk added
first.
Observed Expected
(critical value from the table at the significance level), so do not reject at the
level. There is no significant evidence that the colleague's enjoyment depends on whether milk or
tea is added first.
4 The number of books does not differ between rural and urban libraries.
Observed Expected
Rural Rural
Urban Urban
(critical value from the table at the significance level), so do not reject at the
level. There is no significant evidence that the number of books differs between rural
and urban libraries.
5 There is no association between the amount spent on horror films and the number of murders each
year.
There is an association between the amount spent on horror films and the number of murders each
year.
Observed Expected
(critical value from the table at the significance level), so reject . At the level,
there is significant evidence of an association between the amount spent on horror films and the
number of murders each year. However, this does not establish causality (it does not show that high
spending on horror films causes a high number of murders).
6 a Acceptance patterns are independent of gender (no association).
Observed Expected
Accepted Rejected Accepted Rejected
Male Male
Female Female
(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that acceptance at this university is dependent on
gender.
The acceptance rates are for males and for females, which appears to be evidence of bias.
Expected
Total
Degrees of freedom, .
(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that acceptance patterns depend on department.
The proportion of men admitted is higher than the proportion of women admitted in two of the six
departments ( and ).
Dept
% Men admitted
% Women admitted
7 You cannot do a calculation based on two factors being dependent unless you know exactly what that
dependency is.
MIXED PRACTICE 3
(Answer B)
Recipient
Small Large
Individual
business business
Satisfactory
Outcome
Bad debt
(critical value from the table at the significance level).
Do not reject .
Eye colour
Blue Green Brown
Brown
Hair colour
Blonde
Do not reject with (critical value from the table at the significance level).
There is no significant evidence that hair colour and eye colour are dependent.
5 Expected frequencies:
Girl
Do not reject . There is no evidence of any association between the gender of the baby and the time
of year.
(Answer A)
7 a Combining the last two columns since otherwise the expected frequencies would be smaller than :
Semi-detached and
Flat Terraced detached
ii The large difference between observed and expected frequencies together with the small
expected frequency gives a large contribution to .
8 a There is no association between class of degree and A-level grade.
A-level grade
Total
Degrees of freedom, .
(critical value from the table at the significance level), so reject . There is some
evidence at the level to suggest that Fiona's belief is justified.
b They obtained more Class degrees, but fewer Class degrees than expected.
9 a i The type of sideswipe accident is independent of where the was registered.
(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that the type of sideswipe accident is dependent on
whether the involved was British registered or foreign registered.
ii Accidents involving changing lane to the left are less likely and accidents involving changing lane
to the right are more likely than expected for foreign registered .
b i Observed Expected
No No
Prosecution Prosecution
prosecution prosecution
(critical value from the table at the significance level), so do not reject . There
is no significant evidence at the level that the driver's age had an influence on
whether or not a prosecution resulted.
10 There is no association between the age of staff and the department they work in.
There is an association between the age of staff and the department they work in.
Combine the columns for staff aged and due to an expected frequency being .
Expected
Total
Accounts
Personnel
Marketing
Comms.
Total
Degrees of freedom, .
(critical value from the table at the significance level), so reject . There is significant
evidence at the level of an association between the age of staff and the department they work in.
11 a A sixth of businesses do not repay their loans while a fifth of all mortgages and personal loans are
defaulted.
Repaid Defaulted
Personal
Mortgage
Business
Observed frequencies:
Expected frequencies:
Personal
Mortgage
Business
Total
To show dependence at the significance level, (critical value from the table at the
significance level).
Degrees of freedom, .
Expected
Type diabetes
developed
Yes No Total
Less than
Between and
Average level of weekly alcohol
consumption
More than
Total
(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that development of Type diabetes is dependent on
the average level of weekly alcohol consumption, i.e. that there is an association.
Consumption
Diabetes developed
It appears that the advice given might lower the risk for people in the ' ' group, but increase the
risk for people in the ' ' group.
However, these proportions apply to this particular group of people, and there is no evidence to
suggest that the group is representative of the whole population, for whom the proportions might be
very different.
c i The number of rows and columns in the contingency table, the number of degrees of freedom and
the significance level of the test would not change, so the critical value would remain at .
ii Compare with .
Each of the terms that are summed to find the test statistic, , would increase by a factor of , so
would increase by the same factor from to .
iii The conclusion is the same because the test statistic is still greater than the critical value, but the
evidence to support that conclusion would be stronger than before.
Chapter 4 Worked solutions
4 Continuous distributions
EXERCISE 4A
5 Setting the total probability equal to , and using the lower limit and the upper limit , to find :
Using either :
CDF is
Using the fact that the probability of two independent observations of both being below is :
Using the fact that the area under the graph must be equal to and substituting :
10 Using the fact that the total probability has to be and solving the exponential equation either by using
the sinh function or by substituting :
EXERCISE 4B
3 a Using :
5 a is defined between and . To show that is a PDF, you need to show that , for all
and that .
b Using :
b Using , substituting in the value of from part a and using the fact that
:
so
Set and
Then and
So
EXERCISE 4C
4 a Expected value £ £ £
b Standard deviation £ £
6 a
7 a
9 a
EXERCISE 4D
2 a
d
3 a
4 Let represent the mass of a woman, represent the mass of a man and represent the total mass
of the women, men and the lift:
5 Let represent the outcome of the roll and let represent the difference between the two scores:
6 Let represent one student's exam scores, and let represent the difference between two students'
scores:
7 Let represent Pamela's journey time, let represent Adrian's journey time and let represent the
difference in total weekly journey times:
EXERCISE 4E
2 a
The distribution is .
b Using your calculator:
3 Let the times taken by Aaron and by Bashir be represented by and , respectively.
a
Standard deviation of
4 a
Tip
6 a
b
7 Let represent the mass of a randomly chosen apple, and let represent the mass of a randomly
chosen pear.
a
For snakes:
Using the fact that for a randomly selected sample of corn snakes, :
So
9 Let represent the score of a randomly chosen boy and let represent the score of a randomly chosen
girl.
a
Mean of
Tip
, so
This gives
Weekly rainfall .
Using the fact that in a randomly chosen -day week, there is a probability of that the mean daily
rainfall is less than :
This gives
b Assumes that the rainfall each day is independent of the rainfall on other days, which is unlikely to
be the case.
11 a
b i
Tip
ii On any day,
Let the number of mornings per week that she waits more than minutes be :
Tip
You can use a probability found from a normal distribution as the parameter in a
binomial distribution.
iii
c Average for a week is more than minutes if she waits more than minutes on the last day.
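The tip in part b ii (a normal probability used as the parameter of a binomial distribution) can be sketched in Python. All numbers here are hypothetical assumptions for illustration, since the values from the question are not reproduced in this answer: a waiting time modelled as $N(10, 2^2)$ minutes, a threshold of 12 minutes and a 5-day week.

```python
from math import comb
from statistics import NormalDist

# Hypothetical model (assumed values, not taken from the exercise).
waiting_time = NormalDist(mu=10, sigma=2)  # minutes
threshold = 12                             # minutes
days = 5                                   # mornings in one week

# Step 1: normal distribution gives the probability for a single morning.
p_long_wait = 1 - waiting_time.cdf(threshold)


def binomial_pmf(k, n, p):
    """P(M = k) for M ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Step 2: that probability becomes the binomial parameter for the week.
p_at_least_two = 1 - binomial_pmf(0, days, p_long_wait) - binomial_pmf(1, days, p_long_wait)

print(p_long_wait, p_at_least_two)
```

The binomial step assumes that the mornings are independent of one another, which is a modelling assumption rather than something the calculation can check.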
EXERCISE 4F
4 a
EXERCISE 4G
1 a
From , area is
, so the median is .
Tip
Using the fact that the area under the graph of the PDF is equal to :
c Using the fact that the area to the right of the upper quartile is :
Tip
You can use technology to find or, if working manually, test for a sign change in the
value of ; start by evaluating and
d The CDF,
First section:
At
Second section:
At
4 a
Using the fact that the area under the graph of the PDF is equal to :
Simplifying:
d Considering the areas under the two sections of the graph of the PDF:
Using area :
Simplifying:
5 a You must show that is never negative and that the area under the graph is .
The function for all real , because and for all real .
b Using integration by parts to find and and using the variance formula
:
and
Integrating by parts:
and
Tip
and
So,
6 a
Using the fact that the area under the graph of the PDF is :
, giving
c The CDF is
First section:
, so , giving
Substituting :
Second section:
, giving
Substituting :
CDF is
d By substitution:
e By substitution:
, giving
7 a
Using the fact that the area under the graph of the PDF is equal to :
, so
c
So,
and
EXERCISE 4H
3 a Let represent the length of the piece.
b Expected mean
Standard deviation
6 a
7 Mean of
Standard deviation of
EXERCISE 4I
7 Assuming is the average number of calls per minute, you can find an expression for the average
waiting time per call, :
9 a The waiting time per bus follows an exponential distribution with mean Hence .
10 Using integration by parts to find and and using the variance formula
:
and
Integrating by parts:
So , and
The PDF is
EXERCISE 4J
2 a Using the fact that the total probability has to be 1 and the fact that :
So
Hence
c Splitting the calculation of into an integral over the continuous part of the variable and then
adding the value for the discrete part:
d Splitting the calculation of into an integral over the continuous part of the variable and then
adding the value for the discrete part:
3 a Using the fact that the total probability has to be and the fact that :
So :
Hence .
b Splitting the calculation of into an integral over the continuous part of the variable and then
adding the value for the discrete part, then substituting in the value of c from part a:
4 a Using the fact that if you add the cumulative probability for the continuous interval to the probability
for the integer values you have to obtain a total probability of :
b For
Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:
5 a
Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:
d Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:
6 a Using the fact that and that has to have equal values at and due to continuity:
b For
Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:
MIXED PRACTICE 4
1 (Answer C)
2 Standard deviation of
(Answer B)
3 a Using the fact that the total probability has to equal :
b
c
4 a Using the fact that the total probability has to equal :
c Using :
b £ £
c £
Standard deviation £
b i
So
So
c For
11 a
b
c Probably not. Especially in the situation in part b it is likely that when Alice is finished Hassan might
try to speed up.
d standard deviation
Measures for over days
For :
So, the expectation will be greater than the standard deviation for days.
12 Let the mean time taken per question for the questions be minutes.
Mean of
Variance of
Tip
Alternatively, let the mean time taken to answer all questions be minutes.
Mean of
Variance of
Note that when values are taken from the same normal distribution with variance , of
these values have variance , not .
Mean of
The random variable .
Tip
Mean of ;
14 Let the mean mass of purple beads be , and let the mass of yellow beads be .
Mean of
15 a Mean of
Variance of
16 a
The median, or
CDF PDF
Interval
Tip
Only one statement is needed in the PDF for the intervals and .
Tip
Alternatively, you can find the probability that values have a sum greater than
.
The CDF is
You need to find the probability that both observations are less than , where each of these events
has probability .
b Finding by integrating over the continuous part and summing over the
discrete part:
c Let the median value of be , then .
This gives
Tip
20
21
b Let the average marks of the class in English and in Mathematics be and , respectively.
Mean of
Alternatively, you could find the probability that the sum of the marks is higher in English
than in Mathematics .
e i
ii
24 a
d i
ii
25 a
b i
ii
EXERCISE 5A
2 a
Use the table to find the critical value. When and for a two-tailed test you look at the
column.
Reject at the level. There is significant evidence that John's computer does not take
seconds to start on average.
c Assume that the times are normally distributed. True variance is unknown.
3 a
Use the table to find the critical value. When and for a one-tailed test you look
at the column.
Do not reject . There is no significant evidence that the packets contain more than on
average.
4 .
Use the table to find the critical value. When and for a one-tailed test you look
at the column.
Use the table to find the critical value. When and for a one-tailed test you look at
the column.
Do not reject . There is no significant evidence that cleaning the kettle decreases the time it takes
to boil.
6 a
Use the table to find the critical value. When and for a one-tailed test you
look at the column.
Reject . There is significant evidence that the athletes of the club do, indeed, run faster.
Use the table to find the critical value. When and for a two-tailed test you look
at the column.
Do not reject . There is no significant evidence that the mean length of the bananas is different.
c i
ii
Use the table to find the critical value. When and for a one-tailed test you
look at the column.
Reject . There is significant evidence that the mean length of the bananas is less than .
8 a
b Using :
So, calculating the values of for different , and comparing to the critical values from the
(since two-tailed test at level) column of the table:
Critical value
The first value of that falls into the critical region is so Aki will reject the null hypothesis at the
level for all .
EXERCISE 5B
7 This decreases the risk of a type II error but increases the risk of a type I error.
8 a $H_0$: coin is fair; $H_1$: coin is biased.
b For example: roll the dice more times, look for more than sixes, consider other numbers, do a chi-
squared test.
11 a
The significance level is , which is very small and requires extreme evidence before change is
found. This does not seem to be required in this situation.
12 a
13 a
b and
, so do not reject .
There is no evidence to suggest that the mean mass of the eggs is greater than .
14 a
c You need to find the probability that is accepted, given that the alternative value for is true, i.e.
.
d at a stationary point.
Solutions are and .
nd derivative:
b i
ii
16
There is a special case when , where you simply cannot make a type II error. Here the power is .
In all other cases, the power of the test is .
MIXED PRACTICE 5
c Use the table to find the critical value. When and for a two-tailed test you look
at the column.
Do not reject . There is no significant evidence that the volume of produced in a reaction
differs from .
5 a
6 a
d Use the table to find the critical value. When and for a one-tailed test you look
at the column.
Use the table to find the critical value. When and for a one-tailed test you
look at the column.
Do not reject . There is not enough evidence for a reduced journey time.
9 a .
Sample mean,
Sample
Test statistic,
There is no significant evidence at the significance level to doubt that the current mean depth is
.
b .
Sample mean,
Sample
Test statistic,
There is significant evidence at the significance level that the current mean depth is not .
c i Neither;
Differences from are easier to detect in a narrower interval, so a sample size of gives
smaller risk of making a type II error.
10 a $H_0$: method of encouragement and outcome are independent
Reject . There is sufficient evidence of an association between method of receiving information and
outcome.
EXERCISE 6A
4 a
b lies within the confidence interval found in part a. It is plausible that the true oxygen level is .
5 a
b Yes, there is sufficient evidence that the true mean is below since is above the upper bound of
the confidence interval found in part a.
7 a
b lies within the confidence interval found in part a. No significant evidence of a difference in the
mean wage.
8 a ,
b No. This is a confidence interval for the population mean, not sample means.
c
Using the fact that the width of the confidence interval must be less than :
Using the fact that the width of the confidence interval must be less than :
10 a
b ,
c No, since the confidence intervals overlap (although it is quite difficult to find the significance level).
11 a Using these unbiased estimates:
Upper bound of CI is
so
b In a one-tailed test at the significance level, a mean of lies outside the confidence interval
, so reject .
12 a
c Yes. is below the lower bound of the confidence interval found in part a. The data suggests an
increase of at least at the level.
13 a False.
b False.
c False.
d True.
e False.
14 The larger the interval, the more confidence you can have that the true mean lies within it, hence the
interval will be larger than the one.
EXERCISE 6B
3 a
c No, the confidence interval shows the likely position of the mean, not of the individual member of the
population. The sample is too small for meaningful generalisations.
4 , so unbiased estimate of population standard deviation is:
and
5 a
c Yes. is below the lower bound of the confidence interval. It could even be claimed that the
lifetime is at least pages.
6 a For , the table gives
i.e.
b i
7 a
i.e.
b As in part a, , and
, so for all .
MIXED PRACTICE 6
2 a
c lies within the confidence interval. Do not reject . There is not enough evidence to suggest a
different mean to .
3
6 a
7 a
, giving
Interval is
b .
Test statistic,
Two-tailed test at the level of significance, so critical value is .
is outside the rejection region, so do not reject . There is no significant evidence that the diet
results in a change in mass.
9 a
10 a i A confidence interval for the population mean has a probability of including the true
population mean.
So the probability that the interval will not include the value of , or
ii (At least one interval will not include ) (No intervals do not include )
Tip
Small sample with taken from the sample, so using the -distribution:
confidence interval is
So, the new programme seems to have been effective and the mean time seems to have
decreased.
Worked solutions
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
Then, using the fact that the probabilities must have a total sum of 1:
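$a+b=1 \Rightarrow a=1-b$
$E(Y)=1\times a+4\times b=3 \Rightarrow a=\frac{1}{3},\ b=\frac{2}{3}$
$\operatorname{Var}(Y)=E(Y^2)-(E(Y))^2=\frac{1}{3}\times 1^2+\frac{2}{3}\times 4^2-3^2=2$ (Answer B)
2 $Z=\Phi^{-1}(0.975)=1.96$ (3 s.f.)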
b Using the fact that the total area under the graph must be 1:
$\int_0^3 kx^2\,dx+9k(4-3)=9k+9k=1 \Rightarrow k=\frac{1}{18}$
c i $\int_0^3 kx^2\,dx=9k(4-3)=0.5$
$\int_0^q kx^2\,dx=\frac{kq^3}{3}=0.25$
$\Rightarrow q=\sqrt[3]{\frac{0.75}{k}}=2.38$ (3 s.f.)
ν=(3−1)(2−1)=2
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 5% significance level:
$\chi^2_{\text{calc}}=\sum\frac{(O_i-E_i)^2}{E_i}=31.6$ (3 s.f.) $>5.991$
c No. Just because there is dependency does not mean that there is causality.
5 a Using the fact that the total summed probabilities must be equal to 1:
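$\int_0^2 (px+q)\,dx=\left[\frac{px^2}{2}+qx\right]_0^2=2p+2q=1 \Rightarrow p+q=\frac{1}{2}$
b Using the fact that $E(X)=\frac{2}{3}$ and substituting in $q=\frac{1}{2}-p$ from part a:
$\int_0^2 (px^2+qx)\,dx=\left[\frac{px^3}{3}+\frac{qx^2}{2}\right]_0^2=\frac{8p}{3}+2q=\frac{8p}{3}+1-2p=\frac{2p}{3}+1=\frac{2}{3} \Rightarrow p=-\frac{1}{2},\ q=1$
c Calculating $P(X>1)$ and substituting in the values for $p$ and $q$ from part b:
$P(X>1)=\int_1^2 (px+q)\,dx=\left[\frac{px^2}{2}+qx\right]_1^2=1-\frac{p}{2}-q=\frac{1}{4}$
d Using the formula for $\operatorname{Var}(X)$ and substituting in the values for $p$ and $q$ from part b:
$\operatorname{Var}(X)=E(X^2)-(E(X))^2=\int_0^2 (px^3+qx^2)\,dx-\left(\frac{2}{3}\right)^2=4p+\frac{8q}{3}-\frac{4}{9}=\frac{2}{9}$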
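The values in question 5 can be checked with exact arithmetic. The following is a minimal sketch, assuming the density is $f(x)=px+q$ on $0\leqslant x\leqslant 2$ with the values of $p$ and $q$ found in part b; the helper name `moment` is illustrative only.

```python
from fractions import Fraction

# Values taken from part b of the worked solution.
p = Fraction(-1, 2)
q = Fraction(1)

def moment(k):
    """Exact value of the integral of x**k * (p*x + q) over [0, 2]."""
    upper = Fraction(2)
    # Antiderivative: p*x**(k+2)/(k+2) + q*x**(k+1)/(k+1); it is zero at x = 0.
    return p * upper ** (k + 2) / (k + 2) + q * upper ** (k + 1) / (k + 1)

total = moment(0)                  # total probability, should be 1
mean = moment(1)                   # E(X)
variance = moment(2) - mean ** 2   # Var(X) = E(X^2) - (E(X))^2

# P(X > 1) = 1 - P(X <= 1), where P(X <= 1) is the integral of p*x + q over [0, 1].
prob_gt_1 = 1 - (p * Fraction(1, 2) + q)

print(total, mean, variance, prob_gt_1)
```

Working with `Fraction` rather than floats keeps every result as an exact fraction, so it can be compared directly with the answers above.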
6 a X=C+T
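$E(X)=E(C)+E(T)=20+100=120$ minutes
$\sigma(X)=\sqrt{\sigma^2(C)+\sigma^2(T)}=\sqrt{5^2+10^2}=11.2$ minutes (3 s.f.)
b $X=C+T$, $Y=200+\frac{10}{60}X=200+\frac{1}{6}X$
$E(Y)=200+\frac{1}{6}E(X)=$ £220
$\sigma(Y)=\sqrt{\frac{1}{36}\sigma^2(X)}=\frac{1}{6}\sigma(X)=$ £1.86 (3 s.f.)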
7 a Mean $=\frac{10\,065}{160}=62.9$ kg (3 s.f.);
b i $Z=\Phi^{-1}(0.99)=2.33$ (3 s.f.)
ii 61.7 lies within the confidence interval found in part b i. There is no reason to doubt the claim.
8 a $(X+Y)$ has a normal distribution with mean $5+3=8$ and variance $2^2+5^2=29$.
$(X+Y)\sim N(8,29)$.
b $P(X\geqslant 15-Y)=P(X+Y\geqslant 15)=1-P(X+Y<15)=1-\Phi\left(\frac{15-8}{\sqrt{29}}\right)=1-\Phi(1.300)=0.0968$
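The probability in question 8 b can be checked numerically. This is a sketch only, assuming $X$ and $Y$ are independent normal variables so that their variances add; the function `normal_cdf` is a hypothetical helper built on the error function from Python's standard library.

```python
from math import erf, sqrt

def normal_cdf(x, mean, variance):
    """P(W <= x) for W ~ N(mean, variance), via the error function."""
    z = (x - mean) / sqrt(variance)
    return 0.5 * (1 + erf(z / sqrt(2)))

mean_sum = 5 + 3          # E(X + Y)
var_sum = 2**2 + 5**2     # Var(X + Y), assuming independence

prob = 1 - normal_cdf(15, mean_sum, var_sum)   # P(X + Y >= 15)
print(round(prob, 4))
```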
9 a Let X represent the number of patients during one week requiring treatment after a venomous snake
bite, X~ Po(0.5).
$P(X\leqslant 1)=e^{-0.5}+\frac{0.5^1 e^{-0.5}}{1!}=0.910$ (3 s.f.)
b i
ii Let Y represent the number of patients during an 8-week period requiring treatment after a
venomous snake bite, Y~ Po(4).
$P(Y\geqslant 5)=1-P(Y\leqslant 4)=1-\sum_{n=0}^{4}\frac{4^n e^{-4}}{n!}=0.371$ (3 s.f.)
iii Let Z represent the number of patients during a 26-week period requiring treatment after a
venomous snake bite, Z~Po(13).
$P(10<Z<20)=P(Z<20)-P(Z\leqslant 10)=\sum_{n=0}^{19}\frac{13^n e^{-13}}{n!}-\sum_{n=0}^{10}\frac{13^n e^{-13}}{n!}=0.706$ (3 s.f.)
c Let W represent the number of patients during a 4-week period requiring treatment after a venomous
snake bite, W~Po(2).
$P(W>5)=1-P(W\leqslant 5)=1-\sum_{n=0}^{5}\frac{2^n e^{-2}}{n!}=0.0166$ (3 s.f.)
$P(W>6)=1-P(W\leqslant 6)=1-\sum_{n=0}^{6}\frac{2^n e^{-2}}{n!}=0.004\,53$ (3 s.f.)
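The Poisson probabilities in question 9 all come from the same cumulative sum, so they can be reproduced with one short function. This is a minimal sketch; `poisson_cdf` is an illustrative name, not something defined in the book.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summing the probability mass function."""
    return sum(lam**n * exp(-lam) / factorial(n) for n in range(k + 1))

part_a = poisson_cdf(1, 0.5)                            # P(X <= 1), one week
part_b_ii = 1 - poisson_cdf(4, 4)                       # P(Y >= 5), 8 weeks
part_b_iii = poisson_cdf(19, 13) - poisson_cdf(10, 13)  # P(10 < Z < 20), 26 weeks
part_c_5 = 1 - poisson_cdf(5, 2)                        # P(W > 5), 4 weeks
part_c_6 = 1 - poisson_cdf(6, 2)                        # P(W > 6), 4 weeks

for value in (part_a, part_b_ii, part_b_iii, part_c_5, part_c_6):
    print(f"{value:.3g}")
```

Note how the rate is rescaled to the length of each period (0.5 per week gives 4 for 8 weeks, 13 for 26 weeks and 2 for 4 weeks) before the sum is taken.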
10 a AP AV
Baseball 275 50
Basketball 475 75
Soccer 350 25
AP AV
ν=(3−1)(2−1)=2
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 1% significance level:
$\chi^2_{\text{calc}}=\sum\frac{(O_i-E_i)^2}{E_i}=15.0$ (3 s.f.) $>9.210$
There is significant evidence of an association between coping strategy and the sport involved.
c Soccer officials are far less likely than expected to use an AV coping style. Baseball officials are far
more likely than expected to use an AV coping style.
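The test statistic in question 10 can be reproduced from the table in part a. The sketch below assumes that table holds the observed frequencies, with AP and AV as the two column categories; the critical value 9.210 is copied from the worked solution rather than computed.

```python
rows = ["Baseball", "Basketball", "Soccer"]
cols = ["AP", "AV"]
observed = [[275, 50], [475, 75], [350, 25]]   # assumed to be observed counts

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
residuals = {}   # observed minus expected, used to interpret the result
for i, sport in enumerate(rows):
    for j, style in enumerate(cols):
        expected = row_totals[i] * col_totals[j] / grand_total
        residuals[(sport, style)] = observed[i][j] - expected
        chi_squared += (observed[i][j] - expected) ** 2 / expected

critical_value = 9.210   # 1% level, 2 degrees of freedom (from the solution)
reject_h0 = chi_squared > critical_value

print(round(chi_squared, 1), reject_h0)
print(residuals[("Soccer", "AV")], residuals[("Baseball", "AV")])
```

The signs of the residuals for the AV column are what support the comments in part c: a negative residual means fewer officials than expected, a positive one means more.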
11 a Using the fact that the summed probabilities must equal 1:
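$a+b=1 \Rightarrow b=1-a$
$E(X)=b=1-a$
b $\operatorname{Var}(X)=E(X^2)-(E(X))^2=b-(E(X))^2=1-a-(1-a)^2=a-a^2$
c $\operatorname{Var}(X)=f(a)=a-a^2$
Differentiating with respect to $a$ and setting equal to 0 to find the maximum value:
$f'(a)=1-2a=0 \Rightarrow a=\frac{1}{2} \Rightarrow \operatorname{Var}(X)=\frac{1}{4}$
12 a $\bar{X}=\frac{12.7+13.3}{2}=13$
b $Z=\Phi^{-1}(0.95)=1.65$ (3 s.f.)
$\frac{Z\sigma}{\sqrt{n}}=0.3 \Rightarrow n=\left(\frac{Z\sigma}{0.3}\right)^2=636$ (3 s.f.)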
13 a i Let X represent the number of diamonds found per 2 m³, X~ Po(5):
$P(X=2)=\frac{5^2}{2!}\times e^{-5}=0.0842$ (3 s.f.)
b i For a one-tailed test at the 10% significance level, and using 1 m³ as your standard unit of volume:
H0: λ=2.5, H1: λ>2.5.
Tip
Alternatively, you could use 2 m³ as your standard unit of volume with
H0: λ=5 and H1: λ>5.
P(X⩾7)>10%, so do not reject H0. There is no significant evidence at the 10% level to suggest that
the number of diamonds found is greater than 5 per 2 m³ (or 2.5 per 1 m³), i.e. that the new mine
will be economically viable.
iii For a type I error to be made, the true null hypothesis H0: λ=5 per 2 m³ is rejected.
This will occur at the 10% level of significance in cases where the number of diamonds found, r, is
such that P(X>r)<0.10.
iv To reduce the likelihood of making a type II error. Making such an error would result in a lost
opportunity for the owner, as they would fail to find significant evidence that the mine is
economically viable when, in fact, it is.
14 a Phone calls are independent of each other. There is a constant average rate of phone calls.
b For example: The same customer might call back, breaking independence. The rate during office
hours might be different from the rate during the night.
c $\sigma(X)=\sqrt{\lambda}=\sqrt{4.5}=2.12$ (3 s.f.)
d Let X represent the number of phone calls answered in a 2-hour shift, X~ Po(2×4.5=9).
$P(X<10)=\sum_{n=0}^{9}\frac{9^n e^{-9}}{n!}=0.587$ (3 s.f.)
e Let Y represent the number of phone calls answered in an a-hour shift, Y~Po(a×4.5).
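b $E(X)=\int_0^1 x f(x)\,dx=\int_0^1 (ax^2-x^4)\,dx=\frac{a}{3}-\frac{1}{5}$
Substituting $a=\frac{5}{2}$ from part a:
$E(X)=\frac{5}{6}-\frac{1}{5}=\frac{19}{30}$
c $E(30X+2)=30E(X)+2=21$
d $E\left(\frac{1}{X}\right)=\int_0^1 \frac{1}{x}f(x)\,dx=\int_0^1 (a-x^2)\,dx=a-\frac{1}{3}$
Substituting $a=\frac{5}{2}$ from part a:
$E\left(\frac{1}{X}\right)=\frac{5}{2}-\frac{1}{3}=\frac{13}{6}$
$\int_0^m (ax-x^3)\,dx=\frac{am^2}{2}-\frac{m^4}{4}=\frac{1}{2}$
$n=m^2 \Rightarrow n^2-5n+2=0 \Rightarrow n=\frac{5}{2}\pm\sqrt{\frac{17}{4}}\approx 4.56,\ 0.438 \Rightarrow m=\pm 0.662,\ \pm 2.14$ (3 s.f.)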
$P(X=12)=\binom{25}{12}\times 0.5^{25}=0.155$ (3 s.f.)
b i Rejecting the batch when the mean contents are, in fact, 330 ml.
The hypothesised mean volume of 330 ml lies outside the 95% confidence interval found in
part c i, so Philip will reject H0. There is evidence to suggest that the cans come from a population
whose contents are, on average, less than 330 ml.
b The total probability is the product of the probabilities of each individual event happening multiplied
by two, since it is not specified who is borrowing more/fewer than 3 books.
2×P(X>3)×P(X<3)=2×0.35×0.45=0.315
c E(X)=1×0.19+2×0.26+3×0.20+4×0.13+5×0.07+6×0.15=3.08
e i 10×E(X)=30.8 pence
c If $X\sim\exp(\lambda)$, then the standard deviation of $X$ is $\frac{1}{\lambda}$, which in this case is $\frac{1}{3}$.
2 a Calculating the expected frequencies:
Some of the frequencies are less than 5, so merging the 17 or 18 and the 19 or more frequencies:
<17 >17
Zero 29.445 9.555
ν=2×1=2
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 5% significance level:
$\chi^2_{\text{calc}}=\sum\frac{(O_i-E_i)^2}{E_i}=7.05$ (3 s.f.) $>5.991$
There is evidence of an association between age when leaving education and greatest rate of income
tax paid.
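3 a Using the fact that for this probability density function, $f(x)$, $\int_0^{0.1} f(x)\,dx=1$:
$\int_0^{0.1} k\,dx=0.1k=1 \Rightarrow k=10$
$P(X>0.03)=\int_{0.03}^{0.1} 10\,dx=10\times 0.07=0.7$
$E(X)=\int_0^{0.1} x f(x)\,dx=\int_0^{0.1} 10x\,dx=\left[5x^2\right]_0^{0.1}=0.05=\frac{1}{20}$
c i
$E(X^2)=\int_0^{0.1} x^2 f(x)\,dx=\int_0^{0.1} 10x^2\,dx=\left[\frac{10}{3}x^3\right]_0^{0.1}=\frac{1}{300}$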
ii
H1: There is an association between students' first time performances and their ages.
Expected frequencies:
Pass Fail
The frequency in the Pass category of the 40–60 age group is less than 5, so combining the 31–39
and the 40–60 age groups:
Observed frequencies:
Pass Fail
17–18 28 20
19–30 2 14
31–60 18 38
Expected frequencies:
Pass Fail
17–18 19.2 28.8
$\chi^2_{\text{calc}}=\sum\frac{(O_i-E_i)^2}{E_i}=\frac{(28-19.2)^2}{19.2}+\dots+\frac{(38-33.6)^2}{33.6}=13.2$ (3 s.f.)
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table:
$\chi^2_{\text{calc}}=13.2>9.210$ (critical value from the table at the 1% significance level), so reject H0. There is
significant evidence at the 1% significance level to support Julie's belief.
b More students than expected in the age group 17−18 pass their test first time.
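The expected frequencies for the merged table, and the contribution of each cell to the statistic, can be generated from the observed frequencies alone. This is a sketch under the assumption that the merged observed table above is complete; the variable names are illustrative.

```python
observed = {
    "17-18": (28, 20),   # (Pass, Fail)
    "19-30": (2, 14),
    "31-60": (18, 38),
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Expected frequency = row total x column total / grand total
expected = {
    group: tuple(row_totals[group] * c / grand_total for c in col_totals)
    for group in observed
}

# Contribution of each cell: (O - E)^2 / E
contributions = {
    group: tuple((o - e) ** 2 / e for o, e in zip(observed[group], expected[group]))
    for group in observed
}
chi_squared = sum(sum(cells) for cells in contributions.values())
smallest_expected = min(min(cells) for cells in expected.values())

for group in observed:
    print(group, [round(e, 1) for e in expected[group]],
          [round(c, 2) for c in contributions[group]])
print(round(chi_squared, 1), smallest_expected)
```

Printing the smallest expected frequency is a quick way to confirm that, after merging, no cell falls below 5.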
$\sigma(Y)=\sqrt{\frac{n^2-1}{12}}=1.12$ (3 s.f.)
5 a
b i $E(Z)=6+2E(X)=6+n+1=11$
ii $\operatorname{Var}(Z)=2^2\operatorname{Var}(X)=\frac{n^2-1}{3}=5$
6 a $X\sim U(n)$, taking the values $1,2,3,\dots,n$.
n−10=3−1, so n=12.
$E(X)=\frac{12+1}{2}=\frac{13}{2}$
Let the random variable Y be the number of these three observations that have a value of 3 or less,
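so $Y\sim B\left(3,\frac{1}{4}\right)$.
$P(Y<2)=P(Y=0)+P(Y=1)=\binom{3}{0}\times\left(\frac{1}{4}\right)^0\times\left(\frac{3}{4}\right)^3+\binom{3}{1}\times\left(\frac{1}{4}\right)^1\times\left(\frac{3}{4}\right)^2=\frac{27}{32}=0.844$ (3 s.f.)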
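The binomial calculation can be checked exactly. The sketch below assumes, as in the solution, that each observation independently has probability $\frac{3}{12}=\frac{1}{4}$ of being 3 or less; `binomial_pmf` is an illustrative helper.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = Fraction(3, 12)   # P(X <= 3) for X uniform on 1, 2, ..., 12
prob = binomial_pmf(0, 3, p) + binomial_pmf(1, 3, p)   # P(Y < 2)

print(prob, float(prob))
```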
7 a Y=5+0.02X
$E(Y)=5+0.02E(X)=10$
$\operatorname{Var}(Y)=0.02^2\operatorname{Var}(X)=1$
b $Z=25-Y$
$E(Z)=25-E(Y)=15$
$\operatorname{Var}(Z)=\operatorname{Var}(Y)=1$
8 a H0: μ=20 minutes
H1: μ≠20 minutes
Use the table to find the critical value. For a two-tailed test you need to look at the 0.95 column.
|T|=1.63<1.895
Any value less than or equal to 20.75(2 d.p.) would lead to rejection of Rajul's claim.
9 a The population is large. Most marks would be concentrated around the mean, and the further a mark
is from the mean, the less likely it is to occur.
b The null hypothesis 'The school is not doing better than the rest of the population (μ=60)’ will be
rejected if the 10 students' average is more than 70%.
c Now test H0: The school is not doing better than the rest of the population (μ=60) against H1: μ=65,
and find the probability that the 10 students' average is not more than 70.
$P(\text{type II error})=P(\bar{X}\leqslant 70\mid\mu=65)=\Phi\left(\frac{70-65}{15/\sqrt{10}}\right)=0.8541$
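The type II error probability, and the power that part d refers to, can be computed directly. This is a sketch assuming a population standard deviation of 15 and a sample of 10, as read from the expression above; `phi` is a hypothetical helper for the standard normal cumulative distribution function.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

sigma = 15        # assumed population standard deviation
n = 10            # sample size
critical = 70     # H0 is rejected when the sample mean exceeds this
mu_alt = 65       # true mean under H1

standard_error = sigma / sqrt(n)
type_ii = phi((critical - mu_alt) / standard_error)   # P(do not reject H0 | mu = 65)
power = 1 - type_ii

print(round(type_ii, 4), round(power, 4))
```

Changing `mu_alt` to a value closer to 70 shows the power increasing, which is the behaviour described in part d.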
d The power of the test would increase. The test would be more likely to identify the difference from
60%.
In context, as the true mean gets closer to 70 (from below), it becomes more and more difficult to
believe that the school is not doing better than the rest of the population.
10 a $e^{-10\lambda}=0.25 \Rightarrow \lambda=0.1\ln 4$
$P(T>20)=e^{-20\lambda}=0.0625$
c $E(T)=\frac{1}{\lambda}=7.21$ (3 s.f.)
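The three values in question 10 follow from the single equation $e^{-10\lambda}=0.25$, which can be confirmed numerically. A minimal sketch, assuming $T$ is exponentially distributed with rate $\lambda$:

```python
from math import exp, log

lam = 0.1 * log(4)          # solves exp(-10 * lam) = 0.25

tail_10 = exp(-10 * lam)    # should recover 0.25
tail_20 = exp(-20 * lam)    # the value e^(-20*lambda) in the solution
mean_t = 1 / lam            # E(T) for an exponential distribution

print(round(tail_10, 4), round(tail_20, 4), round(mean_t, 2))
```

Because $e^{-20\lambda}=(e^{-10\lambda})^2$, the second value is simply $0.25^2$.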
H0: μ=79
H1: μ>79
Use the table to find the critical value at the 5% significance level.
12 H0: the new drug is not effective in the prevention of sickness in holiday-makers.
Sickness No sickness
Drug taken 28 52
No drug taken 7 13
$n=100$
$\nu=(2-1)(2-1)=1$
$\chi^2_{\text{Yates}}=\sum\frac{(|O_i-E_i|-0.5)^2}{E_i}$
Comparing the $\chi^2_{\text{Yates}}$ value with the critical value from the table at the 5% significance level:
$\chi^2_{\text{Yates}}=3.37$ (3 s.f.) $<3.841$
Do not reject H0. There is no evidence at the 5% significance level to support the claim that the drug is
effective against the sickness.
13 a Using the fact that E(X)= Var(X):
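$\frac{n+1}{2}=\frac{n^2-1}{12} \Rightarrow n^2-6n-7=0 \Rightarrow n=7$
b $\operatorname{Var}(2X)=2^2\operatorname{Var}(X)=4\operatorname{Var}(X)$
$4\operatorname{Var}(X)=4E(X)=2E(2X)\neq E(2X)$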
H0: μ=190 metres
H1: μ≠190 metres
Use the table to find the critical value. For a two-tailed test you need to look at the 0.99 column.
|T|=1.62 (3 s.f.)<2.821
Do not reject H0. There is insufficient evidence to reject Lorraine's belief that there has been no
change.
15 a It must be a random sample.
b H0: μ=44.1 mph
H1: μ<44.1 mph
Use the table to find the critical value at the 1% significance level.
Comparing your calculated t-value with the critical value:
|T|=2.71 (3 s.f.)>2.364
Reject H0. There is significant evidence that the mean speed has reduced.
c i Concluding that the mean speed has reduced when in fact it has not.
ii Concluding that the mean speed is still 44.1 when in fact it has reduced.
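16 a $E(X)=\frac{1}{4}+\frac{2}{2}+\frac{n}{4}=\frac{5+n}{4}$
$4E\left(\frac{1}{X}\right)=4\left(\frac{1}{4\times 1}+\frac{1}{2\times 2}+\frac{1}{4\times n}\right)=2+\frac{1}{n}$
$n^2-3n-4=0 \Rightarrow n=-1$ or $4$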
b For n=4:
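$\operatorname{Var}(X)=E(X^2)-(E(X))^2=\frac{1^2}{4}+\frac{2^2}{2}+\frac{4^2}{4}-\left(\frac{9}{4}\right)^2=1.1875$
$\operatorname{Var}\left(\frac{1}{X}\right)=E\left(\frac{1}{X^2}\right)-\left(E\left(\frac{1}{X}\right)\right)^2=\frac{1}{4\times 1^2}+\frac{1}{2\times 2^2}+\frac{1}{4\times 4^2}-0.5625^2=0.0742$
$\frac{\operatorname{Var}(X)}{\operatorname{Var}\left(\frac{1}{X}\right)}=16$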
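The results for question 16 can be verified with exact fractions. The sketch assumes the distribution implied by the working above, namely that $X$ takes the values 1, 2 and 4 with probabilities $\frac{1}{4}$, $\frac{1}{2}$ and $\frac{1}{4}$; the helper `expectation` is illustrative.

```python
from fractions import Fraction

# Assumed distribution for n = 4: value -> probability
dist = {1: Fraction(1, 4), 2: Fraction(1, 2), 4: Fraction(1, 4)}

def expectation(g):
    """E(g(X)) for the discrete distribution above."""
    return sum(g(x) * p for x, p in dist.items())

e_x = expectation(lambda x: Fraction(x))
e_inv = expectation(lambda x: Fraction(1, x))

var_x = expectation(lambda x: Fraction(x) ** 2) - e_x**2
var_inv = expectation(lambda x: Fraction(1, x * x)) - e_inv**2
ratio = var_x / var_inv

print(e_x, e_inv, var_x, var_inv, ratio)
```

The check `e_x == 4 * e_inv` also confirms that $n=4$ satisfies the condition used in part a.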
17 a H0: λ=70, H1: λ>70
b X represents the number of cars passing the school gates in a 10-minute interval, X~Po(70).
$P(X\geqslant x)=\sum_{n=x}^{\infty}\frac{70^n e^{-70}}{n!}\leqslant 0.1 \Rightarrow X\geqslant 82$
c Reject H0. There is sufficient evidence that the mean number of cars has increased.
$P(X\leqslant 81)=\sum_{n=0}^{81}\frac{120^n e^{-120}}{n!}=1.01\times 10^{-4}$ (3 s.f.)
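The critical region in question 17 is found by searching for the smallest value whose upper-tail probability is at most 10%. The following sketch shows one way to carry out that search; the function names are illustrative, and the terms are built up recursively to avoid very large factorials. The worked solution gives $X\geqslant 82$ for comparison.

```python
from math import exp

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), built up term by term."""
    if k < 0:
        return 0.0
    term = exp(-lam)          # P(X = 0)
    total = term
    for n in range(1, k + 1):
        term *= lam / n       # P(X = n) obtained from P(X = n - 1)
        total += term
    return total

def upper_critical_value(lam, alpha):
    """Smallest x with P(X >= x) <= alpha when X ~ Po(lam)."""
    x = 0
    while 1 - poisson_cdf(x - 1, lam) > alpha:
        x += 1
    return x

x_crit = upper_critical_value(70, 0.1)
tail_at_crit = 1 - poisson_cdf(x_crit - 1, 70)   # P(X >= x_crit) under H0
lower_prob_120 = poisson_cdf(81, 120)            # P(X <= 81) when the mean is 120

print(x_crit, tail_at_crit, lower_prob_120)
```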
18 a λ0=3×16=48
$P(X\leqslant 35)=\sum_{n=0}^{35}\frac{48^n e^{-48}}{n!}=0.0309$ (3 s.f.)
b
c The number of visitor groups is now 12 per hour, so if X represents the number of groups arriving in a
3-hour period, X~Po(3×12=36).
$P(X>35)=\sum_{n=36}^{\infty}\frac{36^n e^{-36}}{n!}\approx 0.522$
Power $=1-P(X>35)=0.478$ (3 s.f.)
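19 a Interval is $\bar{X}-Z\times\frac{s}{\sqrt{n}}<\mu_G<\bar{X}+Z\times\frac{s}{\sqrt{n}}$
$2.0001-1.960\times\frac{10^{-4}}{\sqrt{10}}<\mu_G<2.0001+1.960\times\frac{10^{-4}}{\sqrt{10}}$
$2.000\,038<\mu_G<2.000\,162$ (7 s.f.)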
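The interval in question 19 a can be reproduced in a few lines. This is a sketch assuming the values read from the working above: a sample mean of 2.0001, a standard deviation of $10^{-4}$, a sample of size 10 and $Z=1.960$.

```python
from math import sqrt

x_bar = 2.0001   # sample mean
s = 1e-4         # standard deviation, as read from the working
n = 10           # sample size
z = 1.960        # critical value for a 95% interval

half_width = z * s / sqrt(n)
lower = x_bar - half_width
upper = x_bar + half_width

print(f"{lower:.6f} < mu_G < {upper:.6f}")
```

Since the half-width is proportional to $\frac{1}{\sqrt{n}}$, quadrupling the sample size would halve the width of the interval.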
Interval is $0.380<\mu_Y<1.62$
ii Let the number of confidence intervals that are above the true mean be Y, then Y~B(3,0.025).
www.cambridge.org
Information on this title:
www.cambridge.org/9781316644508 (Paperback)
www.cambridge.org/9781316644324 (Paperback with Cambridge Elevate edition)
www.cambridge.org/9781316644584 (Cambridge Elevate edition 2 years)
www.cambridge.org/9781316644614 (Cambridge Elevate edition 1 year School Site
Licence)
© Cambridge University Press 2018
This publication is in copyright. Subject to statutory exception and to the provisions of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published 2018
Printed in the United Kingdom by Latimer Trend.
A catalogue record for the print publication is available from the British Library
ISBN 978-1-316-64450-8 Paperback
ISBN 978-1-316-64432-4 Paperback with Cambridge Elevate edition
It is illegal to reproduce any part of this work in material form (including photocopying
and electronic storage) except under the following circumstances:
(i) where you are abiding by a licence granted to your school or institution by the
Copyright Licensing Agency;
(ii) where no such licence exists, or where you wish to exceed the terms of a licence,
and you have gained the written permission of Cambridge University Press;
(iii) where you are allowed to reproduce without permission under the provisions of
Chapter 3 of the Copyright, Designs and Patents Act 1988, which covers, for
example, the reproduction of short passages within certain types of educational
anthology and reproduction for the purposes of setting examination questions.
This textbook has been approved by AQA for use with our qualification. This
means that we have checked that it broadly covers the specification and we are
satisfied with the overall quality. Full details of our approval process can be found
on our website.
We approve textbooks because we know how important it is for teachers and
students to have the right resources to support their teaching and learning.
However, the publisher is ultimately responsible for the editorial control and
quality of this book.
Please note that when teaching the A/AS Level Further Mathematics (7366, 7367)
course, you must refer to AQA’s specification as your definitive source of
information. While this book has been written to match the specification, it cannot
provide complete coverage of every aspect of the course.
A wide range of other useful resources can be found on the relevant subject pages
of our website: www.aqa.org.uk
IMPORTANT NOTE AQA has not approved any Cambridge Elevate content.