A Level Further Mathematics For AQA

Brighter

Thinking

A Level Further
Mathematics for AQA
Statistics Student Book (AS/A Level)
Stephen Ward, Paul Fannon, Vesna Kadelburg and Ben Woolley
Contents
Introduction
How to use this resource

1 Discrete random variables


1: Average and spread of a discrete random variable
2: Expectation and variance of transformations of discrete random variables
3: The discrete uniform distribution

2 Poisson distribution
1: Using the Poisson model
2: Using the Poisson distribution in hypothesis tests

3 Chi-squared tests
1: Contingency tables
2: Yates’ correction

4 Continuous distributions
1: Continuous random variables
2: Expectation and variance of continuous random variables
3: Expectation and variance of functions of a random variable
4: Sums of independent random variables
5: Linear combinations of normal variables
6: Cumulative distribution functions
7: Piecewise-defined probability density functions
8: Rectangular distribution
9: Exponential distribution
10: Combining discrete and continuous random variables

Focus on … Proof 1

Focus on … Problem solving 1

Focus on … Modelling 1

5 Further hypothesis testing


1: t-tests
2: Errors in hypothesis testing

6 Confidence intervals
1: Confidence intervals
2: Confidence intervals for the mean when the population variance is unknown

Focus on … Proof 2

Focus on … Problem solving 2

Focus on … Modelling 2

Cross-topic review exercise

AS Level Practice paper

A Level Practice paper

Formulae

Answers

Worked solutions for chapter exercises


1 Discrete random variables
2 Poisson distribution
3 Chi-squared tests
4 Continuous distributions
5 Further hypothesis testing
6 Confidence intervals

Worked solutions for cross-topic review exercises


Cross-topic review exercises

Acknowledgements
Introduction
You have probably been told that mathematics is very useful, yet it can often seem like a lot of techniques
that just have to be learnt to answer examination questions. You are now getting to the point where you
will start to see where some of these techniques can be applied in solving real problems. However, as well
as seeing how maths can be useful, we hope that anyone working through this book will realise that it can
also be incredibly frustrating, surprising and ultimately beautiful.

The book is woven around three key themes from the new curriculum.

Proof
Maths is valued because it trains you to think logically and communicate precisely. At a high level, maths is
far less concerned about answers and more about the clear communication of ideas. It is not about being
neat – although that might help! It is about creating a coherent argument that other people can easily
follow but find difficult to refute. Have you ever tried looking at your own work? If you cannot follow it
yourself it is unlikely anybody else will be able to understand it. In maths we communicate using a variety
of means – feel free to use combinations of diagrams, words and algebra to aid your argument. And once
you have attempted a proof, try presenting it to your peers. Look critically (but positively) at some other
people’s attempts. It is only through having your own attempts evaluated and trying to find flaws in other
proofs that you will develop sophisticated mathematical thinking. This is why we have included lots of
common errors in our Work it out boxes – just in case your friends don’t make any mistakes!

Problem solving
Maths is valued because it trains you to look at situations in unusual, creative ways, to persevere and to
evaluate solutions along the way. We have been heavily influenced by a great mathematician and maths
educator George Polya, who believed that students were not just born with problem-solving skills – they
developed them by seeing problems being solved and reflecting on their solutions before trying similar
problems. You may not realise it but good mathematicians spend most of their time being stuck. You need
to spend some time on problems you can’t do, trying out different possibilities. If after a while you have not
cracked it, then look at the solution and try a similar problem. Don’t be disheartened if you cannot get it
immediately – in fact, the longer you spend puzzling over a problem the more you will learn from the
solution. You may never need to integrate a rational function in the future, but we firmly believe that the
problem-solving skills you will develop by trying it can be applied to many other situations.

Modelling
Maths is valued because it helps us solve real-world problems. However, maths describes ideal situations
and the real world is messy! Modelling is about deciding on the important features needed to describe the
essence of a situation and turning them into a mathematical form, then using that to make predictions,
compare to reality and possibly improve the model. In many situations the technical maths is actually the
easy part – especially with modern technology. Deciding which features of reality to include or ignore and
anticipating the consequences of these decisions is the hard part. Yet it is amazing how some fairly drastic
assumptions – such as pretending a car is a single point or that people’s votes are independent – can result
in models that are surprisingly accurate.

More than anything else, this book is about making links – links between the different chapters, the topics
covered and the themes above, links to other subjects and links to the real world. We hope that you will
grow to see maths as one great complex but beautiful web of interlinking ideas.

Maths is about so much more than examinations, but we hope that if you take on board these ideas (and
do plenty of practice!) you will find maths examinations a much more approachable and possibly even
enjoyable experience. However, always remember that the results of what you write down in a few hours
by yourself in silence under exam conditions are not the only measure you should consider when judging
your mathematical ability – it is only one variable in a much more complicated mathematical model!
How to use this resource
Throughout this resource you will notice particular features that are designed to aid your learning. This
section provides a brief overview of these features.

In this chapter you will learn how to:

predict the mean, mode, median and variance of a discrete random variable
understand how a linear transformation of a variable changes the mean and variance
prove and use the formulae for expectation and variance of a special distribution called the uniform
distribution
recognise when it is appropriate to use a uniform distribution.

   If you are following the A Level course, you will also learn how to:
calculate the mean of a discrete random variable after a non-linear transformation.

Learning objectives
A short summary of the content that you will learn in each chapter.

WORKED EXAMPLE

The left-hand side shows you how to set out your working. The right-hand side explains the more
difficult steps and helps you understand why a particular method was chosen.

PROOF

Step-by-step walkthroughs of standard proofs and methods of proof.

WORK IT OUT

Can you identify the correct solution and find the mistakes in the two incorrect solutions?

A Level Mathematics Student Book 1, Chapter 21 | You should know how to use the rules of probability. | 1 Two events $A$ and $B$ are independent. If $P(A) = 0.4$ and $P(B) = 0.3$, find $P(A \text{ and } B)$.

A Level Mathematics Student Book 1, Chapter 21 | You should know how to find probabilities of discrete random variables. | 2 $P(X = x) = kx$ for $x = 1, 2, 3$. Find the value of $k$.

A Level Mathematics Student Book 1, Chapter 20 | You should know how to find the mean, variance and standard deviation of data, including familiarity with formulae involving sigma notation. | 3 Find the variance of 2, 5 and 8.

Before you start


Points you should know from your previous learning and questions to check that you're ready to start the
chapter.

Key point

A summary of the most important methods, facts and formulae.

Common error

Specific mistakes that are often made. These typically appear next to the point in the Worked
example where the error could occur.

Tip

Useful guidance, including on ways of calculating or checking and use of technology.

Each chapter ends with a Checklist of learning and understanding and a Mixed practice exercise, which
includes past paper questions marked with the icon .

In between chapters, you will find extra sections that bring together topics in a more synoptic way.

FOCUS ON…

Unique sections relating to the preceding chapters that develop your skills in proof, problem-solving and
modelling.

CROSS-TOPIC REVIEW EXERCISE

Questions covering topics from across the preceding chapters, testing your ability to apply what you
have learned.

Key terms are picked out in colour within chapters. You can hover over these terms to view their definitions,
or find them in the Glossary tab. Towards the end of the resource you will find practice paper questions,
short answers to all questions and worked solutions.

Rewind

Reminders of where to find useful information from earlier in your study.

Fast forward

Links to topics that you may cover in greater detail later in your study.

Focus on…

Links to problem-solving, modelling or proof exercises that relate to the topic currently being
studied.

Did you know?

Interesting or historical information and links with other subjects to improve your awareness
about how mathematics contributes to society.

Colour coding of exercises


The questions in the exercises are designed to provide careful progression, ranging from basic fluency to
practice questions. They are uniquely colour-coded, as shown below.

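1 A sequence is defined by $u_n = 2 \times 3^{n-1}$. Use the principle of mathematical induction to prove that $u_1 + u_2 + \ldots + u_n = 3^n - 1$.

2 Show that $1^2 + 2^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6}$

3 Show that $1^3 + 2^3 + \ldots + n^3 = \frac{n^2(n+1)^2}{4}$

4 Prove by induction that $\frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \ldots + \frac{1}{n(n+1)} = \frac{n}{n+1}$

5 Prove by induction that $\frac{1}{1 \times 3} + \frac{1}{3 \times 5} + \frac{1}{5 \times 7} + \ldots + \frac{1}{(2n-1) \times (2n+1)} = \frac{n}{2n+1}$

6 Prove that $1 \times 1! + 2 \times 2! + 3 \times 3! + \ldots + n \times n! = (n+1)! - 1$

7 Use the principle of mathematical induction to show that $1^2 - 2^2 + 3^2 - 4^2 + \ldots + (-1)^{n-1} n^2 = (-1)^{n-1} \frac{n(n+1)}{2}$.

8 Prove that $(n+1) + (n+2) + (n+3) + \ldots + (2n) = \frac{1}{2} n(3n+1)$

9 Prove using induction that $\sin\theta + \sin 3\theta + \ldots + \sin(2n-1)\theta = \frac{\sin^2 n\theta}{\sin\theta}$, $n \in \mathbb{Z}^+$

10 Prove that $\sum_{k=1}^{n} k\,2^k = (n-1)2^{n+1} + 2$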

Black – practice questions which come in several parts, each with subparts i and ii. You only need
attempt subpart i at first; subpart ii is essentially the same question, which you can use for further
practice if you got part i wrong, for homework, or when you revisit the exercise during revision.

Yellow – designed to encourage reflection and discussion


Green – practice questions at a basic level
Blue – practice questions at an intermediate level

Red – practice questions at an advanced level


Purple – challenging questions that apply the concept of the current chapter across other areas of
maths.

   indicates content that is for A level students only

   indicates a question that requires a calculator

   indicates a non-calculator question


1 Discrete random variables

In this chapter you will learn how to:

predict the mean, mode, median and variance of a discrete random variable
understand how a linear transformation of a variable changes the mean and
variance
prove and use the formulae for expectation and variance of a special distribution
called the uniform distribution
recognise when it is appropriate to use a uniform distribution.
If you are following the A Level course, you will also learn how to:
calculate the mean of a discrete random variable after a non-linear transformation.

Before you start…

A Level Mathematics Student Book 1, Chapter 21 | You should know how to use the rules of probability. | 1 Two events $A$ and $B$ are independent. If $P(A) = 0.4$ and $P(B) = 0.3$, find $P(A \text{ and } B)$.

A Level Mathematics Student Book 1, Chapter 21 | You should know how to find probabilities of discrete random variables. | 2 $P(X = x) = kx$ for $x = 1, 2, 3$. Find the value of $k$.

A Level Mathematics Student Book 1, Chapter 20 | You should know how to find the mean, variance and standard deviation of data, including familiarity with formulae involving sigma notation. | 3 Find the variance of 2, 5 and 8.

A Level Further You should know how to calculate sums of 4 Find and simplify an
Mathematics powers of .
expression for
Student Book 1,
Chapter 11

What are discrete random variables?


A random variable is a variable that can change every time it is observed – such as the outcome when
you roll a dice. A discrete random variable can only take certain values. In A Level Mathematics Student
Book 1, Chapter 21, you covered the probability distributions of discrete random variables – a table or
rule giving a list of all possible outcomes along with their probabilities.

Tip
Discrete variables don’t have to take integer values, but their possible distinct values can be
listed, though the list may be infinite. For example, the standard UK shoe size of a random
adult member of the public takes values from a list that includes half sizes, so it is a discrete
random variable. The exact foot length of a random adult member of the public (in cm) can take
any value in an interval, so it is a continuous random variable.

Many real-life situations follow probability distributions – such as the velocity of a molecule in a waterfall or
the amount of tax paid by an individual. It is extremely difficult to make a prediction about a single
observation, but it turns out that you can predict remarkably accurately the overall behaviour of many
millions of observations. In this chapter you will see how you can predict the mean and variance of a
discrete random variable.
Section 1: Average and spread of a discrete random variable
The most commonly used measure of the average of a random variable is the expectation. It is a value
representing the mean result if the variable were to be measured an infinite number of times.

Tip

The expectation of a random variable does not need to be a value that the variable can actually
take.

Key point 1.1

The expectation of a discrete random variable $X$ is written $E(X)$ and calculated as

$$E(X) = \sum_i x_i p_i$$

where $x_i$ is each possible value that $X$ can take and $p_i$ is the associated probability.

Tip

The subscript $i$ in the formula in Key point 1.1 is just a counter referring to each possible value
and its associated probability.

You do not need to be able to prove this result, but you might find it helpful to see this proof.

PROOF 1

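The mean of $n$ pieces of discrete data, in which the value $x_i$ occurs with frequency $f_i$, is
$$\bar{x} = \frac{1}{n} \sum_i f_i x_i$$
Start from the definition of the mean.

$$\bar{x} = \sum_i \frac{f_i}{n} x_i$$
Since $n$ is constant you can take it into the sum.

If $n$ is large, $\frac{f_i}{n}$ will tend towards the probability of $x_i$ happening, $p_i$, therefore
$$E(X) = \sum_i x_i p_i$$
When the sample size tends to infinity, the sample mean becomes the true population mean, $E(X)$.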
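The limiting idea in Proof 1 can be illustrated with a short simulation. This is only a sketch: the fair six-sided dice, the function name `sample_mean_of_dice` and the fixed seed are assumptions chosen for illustration, not part of the book's proof.

```python
import random


def sample_mean_of_dice(n_rolls, seed=0):
    """Roll a fair six-sided dice n_rolls times and return the sample mean."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls


# As the number of rolls grows, the sample mean should settle near E(X) = 3.5.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_dice(n))
```

With only a few rolls the sample mean can be some way from 3.5; with many rolls it is typically very close, which is the behaviour the proof describes.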

WORKED EXAMPLE 1.1

The random variable $X$ has a probability distribution as shown in the table. Calculate $E(X)$.

Use the values from the distribution in the


formula in Key point 1.1.
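The calculation in Key point 1.1 can be sketched in a few lines of code. The distribution used here (a fair six-sided dice) and the function name `expectation` are assumptions for illustration; they are not the table from the worked example.

```python
from fractions import Fraction


def expectation(dist):
    """Return E(X), the sum of x_i * p_i, for a distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())


# Hypothetical example: a fair six-sided dice, each score with probability 1/6.
fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}

print(expectation(fair_dice))  # 7/2
```

The result, 3.5, is not a score the dice can show, which matches the earlier Tip that the expectation need not be a value the variable can take.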

As well as knowing the expected average, you may also be interested in how far away from the average
you can expect an outcome to be. The variance, $\mathrm{Var}(X)$, of a random variable $X$ is a value representing the
degree of variation that would be seen if the variable were to be repeatedly measured an infinite number
of times. It is a measure of how spread out the variable is.

Fast forward

You will see in Section 2 how to find expectations of other functions of $X$.


Key point 1.2

The variance of a discrete random variable $X$ is written $\mathrm{Var}(X)$ and calculated as

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

where $E(X^2) = \sum_i x_i^2 p_i$

Did you know?


Standard deviation – the square root of variance – is a much more meaningful representation of
the spread of a variable. So why is variance used at all? The answer is purely to do with
mathematical elegance. It turns out that the algebra of variance is far neater than the algebra of
standard deviations.

The quantity $E(X^2)$ is the expected value of $X^2$, read as ‘the mean of the squares’. This variance formula is
often read as ‘the mean of the squares minus the square of the mean’.
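As a sketch of ‘the mean of the squares minus the square of the mean’, the code below computes the variance of a hypothetical distribution (a fair six-sided dice). The function names are illustrative assumptions, not notation from the book.

```python
from fractions import Fraction


def expectation(dist):
    """E(X): the sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    square_of_mean = expectation(dist) ** 2
    return mean_of_squares - square_of_mean


# Hypothetical example: a fair six-sided dice.
fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}

print(variance(fair_dice))  # 35/12
```

Here $E(X^2) = \frac{91}{6}$ and $[E(X)]^2 = \frac{49}{4}$, so the variance is $\frac{35}{12}$.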

WORKED EXAMPLE 1.2

Calculate $\mathrm{Var}(X)$ for the probability distribution in Worked example 1.1.

From Worked example 1.1:

Use the values from the distribution in the formula in Key point 1.2.

Tip

Many calculators can simplify this process. You normally have to treat the values of the
random variable as data and the probabilities as the frequency.
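The idea in this Tip, treating values as data and probabilities as frequencies, can be mimicked in code. The sketch below uses a hypothetical distribution whose probabilities are all multiples of 0.1, so multiplying by 10 gives whole-number frequencies; the variable names are assumptions for illustration.

```python
import statistics

# Hypothetical distribution: P(X = 1) = 0.1, P(X = 2) = 0.2, and so on.
values = [1, 2, 3, 4]
probabilities = [0.1, 0.2, 0.3, 0.4]

# Scale the probabilities to whole-number frequencies (here, multiply by 10).
frequencies = [round(p * 10) for p in probabilities]

# Build a data list in which each value appears as often as its frequency.
data = [x for x, f in zip(values, frequencies) for _ in range(f)]

mean = statistics.mean(data)
# pvariance is the population variance (divide by n), which is what matches Var(X).
var = statistics.pvariance(data)
print(mean, var)
```

Using the sample variance (dividing by n − 1) instead would not reproduce the variance of the random variable, so the population version is the one to choose.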

Two other less commonly used measures of average are the mode and the median. For data, the mode is
the most common result and this extends to variables.

Key point 1.3

The mode of a discrete random variable $X$ is the value of $x$ associated with the largest
probability.

For data, the median is the value that has half the data values below it and half above it. You can interpret
this in terms of probabilities.

Key point 1.4

The median, $M$, of a discrete random variable $X$ is any value that has

$$P(X \le M) \ge \frac{1}{2} \quad \text{and} \quad P(X \ge M) \ge \frac{1}{2}$$

If there are two possible values, you have to find their mean.

When there are two possible values and you have to take their mean, the median will take a value different
from any value that the random variable can actually take.
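Key points 1.3 and 1.4 can be sketched as follows. The distribution is a hypothetical one chosen so that two values satisfy the median conditions; the function names `modes` and `median` are assumptions for illustration.

```python
from fractions import Fraction


def modes(dist):
    """Return every value whose probability equals the largest probability."""
    largest = max(dist.values())
    return [x for x in sorted(dist) if dist[x] == largest]


def median(dist):
    """Return the median: the value(s) m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2.

    If two values qualify, their mean is returned.
    """
    half = Fraction(1, 2)
    candidates = [
        m for m in sorted(dist)
        if sum(p for x, p in dist.items() if x <= m) >= half
        and sum(p for x, p in dist.items() if x >= m) >= half
    ]
    return Fraction(sum(candidates), len(candidates))


# Hypothetical distribution of X.
example = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(3, 8), 4: Fraction(1, 8)}

print(modes(example))   # [3]
print(median(example))  # 5/2
```

Both 2 and 3 satisfy the two conditions here, so the median is their mean, 2.5, which is not a value that this variable can take.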

WORKED EXAMPLE 1.3

For the distribution in Worked example 1.1 find:


a the mode

b the median.

a The largest probability is so there are two


modes: and .

b You can create a table of .

So the median is . Look for the first value that has a value
of greater than or equal to .
You could also check that
but this is not necessary here.

A probability distribution can also be described by a function.

WORKED EXAMPLE 1.4

is a random variable that can take values and where

a Find the value of .


b Find the expected mean of .
c Find the standard deviation of .

a Use the fact that the total of all the probabilities


must be

b
Use Key point 1.1.

c To find the standard deviation you first need to


find the variance, which means you need to find
and use Key point 1.2.

Although you only write down three significant


figures in the working, make sure you use the full
So accuracy from your calculator to find the final
answer.
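The first step in a question like this, using the fact that the probabilities total 1 to find an unknown constant, can be sketched in code. The rule $P(X = x) = kx^2$ for $x = 1, 2, 3$ is a hypothetical example, not the distribution from the worked example.

```python
from fractions import Fraction

# Hypothetical rule: P(X = x) = k * x^2 for x = 1, 2, 3.
weights = {x: x * x for x in (1, 2, 3)}

# The probabilities must add up to 1, so k = 1 / (sum of the weights).
k = Fraction(1, sum(weights.values()))

dist = {x: k * w for x, w in weights.items()}
mean = sum(x * p for x, p in dist.items())

print(k, mean)  # 1/14 18/7
```

Working with exact fractions avoids the rounding issue mentioned in the worked example: nothing is rounded until the final answer is reported.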

WORK IT OUT 1.1


Find the variance of , the random variable defined by this distribution.

Which is the correct solution? Identify the errors made in the incorrect solutions.

EXERCISE 1A

1 Calculate the expectation, mode, median, variance and standard deviation of each of these discrete
random variables.

a i

ii

b i

ii

c i

ii

d i ,

ii ,

2 A discrete random variable is given by for .


a Show that .
b Find .
3 A discrete random variable has the probability distribution shown and .

a Find the values of and .

b Find the median of .


4 A discrete random variable has its probability given by

, where .

a Show that .

b Find the exact value of .


5 The probability distribution of a discrete random variable is defined by

, .

a Find the value of .


b Find .
c Find the standard deviation of .
6 A fair six-sided dice, with sides numbered is thrown. Find the mean and variance of the
score.

7 The table shows the probability distribution of a discrete random variable .

a Given that , find the values of and of .


b Find the standard deviation of .
8 A biased dice with four faces is used in a game. A player pays counters to roll the dice. The table
shows the possible scores on the dice, the probability of each score and the number of counters the
player receives in return for each score.

Score

Probability

Number of counters player


receives

Find the value of in order for the player to get an expected profit of counters per roll.

9 Two fair dice labelled with the values to are thrown. The random variable is the difference
between the larger and the smaller score, or zero if they are the same.

a Copy and complete this table to show the probability distribution of .

b Find .
c Find .
d Find the median of .
e Find .
10 a In a game a player pays an entrance fee of £ . He then selects one number from or and
rolls three fair four-sided dice, numbered to . If his chosen number appears on all three dice he
wins four times the entrance fee. If his number appears on exactly two of the dice he wins three
times the entrance fee. If his number appears on exactly one dice he wins £ . If his number does
not appear on any of the dice he wins nothing.
Copy and complete the probability table.
Profit £

Probability

b The game organiser wants to make a profit over many plays of the game. Given that he must
charge a whole number of pence, what is the minimum amount the organiser must charge?
11 Viewers are asked to rate a new film on a three-point scale. Their marks are modelled by the random
variable as shown.

a The mean, median and mode of are all equal. Find the variance of .
b Two independent viewers of the film are both asked their opinion.
i What is the probability that their total score is more than ?

ii Show that the expectation of their total score is .


12 The number of books borrowed by each person who visits a library is modelled by the random variable
.

a Find the mean of .

b Show that the expectation of is larger than the median of .


c Show that the standard deviation of is less than the median of .
d people visited the library during an audit period. The numbers of books they borrowed are
independent of each other. Find:
i the probability that exactly three people borrow no books
ii the expected number of people who borrow no books.
Section 2: Expectation and variance of transformations of discrete random variables
Linear transformations
You might have noticed a link between parts a and b of question 1 in Exercise 1A. The distributions were
very similar but in part b all the -values were multiplied by . All the averages and the standard
deviations were also multiplied by but the variances were multiplied by . This is an example of a
transformation.

The most common type of transformation is a linear transformation. This is where the new variable ($Y$)
is found from the old variable ($X$) by multiplying by a constant and/or adding on a constant. You might do
this, for example, if you change the units of measurement. This kind of change is also known as ‘linear
coding’.

If you know the original mean and variance and how the data were transformed, you can use a shortcut to
find the mean and variance of the new data.

Key point 1.5

If $X$ is a random variable and $Y$ is a new random variable such that $Y = aX + b$, then:

$$E(Y) = aE(X) + b$$
$$\mathrm{Var}(Y) = a^2\,\mathrm{Var}(X)$$

Fast forward

You will prove Key point 1.5 after you have developed a little more theory.

This means that the standard deviation of $Y = aX + b$ is $|a|$ times the standard deviation of $X$. This makes sense as multiplying the data by
$a$ does change how spread out they are, but adding on $b$ does not change the spread.
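One way to check the linear transformation rules is to transform a distribution directly and compare with the shortcut. The distribution of $X$ and the constants $a = -2$, $b = 5$ below are hypothetical choices for illustration.

```python
from fractions import Fraction


def mean(dist):
    """E(X): the sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2


def linear_transform(dist, a, b):
    """Distribution of Y = aX + b (assumes a is not zero, so values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}


# Hypothetical distribution of X and hypothetical constants a and b.
X = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
a, b = -2, 5
Y = linear_transform(X, a, b)

print(mean(Y), a * mean(X) + b)           # 5/2 5/2
print(variance(Y), a ** 2 * variance(X))  # 43/4 43/4
```

Even though $a$ is negative, the variance of $Y$ is positive, because the variance is multiplied by $a^2$ rather than by $a$.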

WORKED EXAMPLE 1.5

A random variable has expectation and variance . is a transformation of given by


. Find:
a the expectation of

b the standard deviation of .


a This is just a direct application of Key point 1.5.

b To find the standard deviation you first need to find the variance of ,
using Key point 1.5.

Common error

It is easy to get confused with the minus sign in the transformations in Worked example 1.5.
Remember that variances and standard deviations can never be negative.

Non-linear transformations
You can also apply non-linear transformations to , such as , or . When you do this
there is no shortcut to finding the mean and variance of the transformed variable. You need to adapt
Key point 1.1.

Consider the discrete random variable outcome on a fair six-sided dice. If , you can
construct the probability distribution for :

The probability of being is just the same as the probability of being . So


, i.e. it is .

Key point 1.6

If $X$ is a discrete random variable with expectation $E(X)$ and $g$ is a function applied to $X$,
then

$$E(g(X)) = \sum_i g(x_i)\, p_i$$
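Key point 1.6 translates directly into code: apply the function to each value and weight by the same probability. In this sketch the fair six-sided dice and the choice $g(x) = x^2$ are assumptions for illustration.

```python
from fractions import Fraction


def expectation_of(g, dist):
    """E(g(X)): the sum of g(x_i) * p_i."""
    return sum(g(x) * p for x, p in dist.items())


# Hypothetical example: a fair six-sided dice with g(x) = x^2.
fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}

mean_of_squares = expectation_of(lambda x: x * x, fair_dice)
square_of_mean = expectation_of(lambda x: x, fair_dice) ** 2

print(mean_of_squares)  # 91/6
print(square_of_mean)   # 49/4
```

The two printed values differ, which illustrates why there is no shortcut for non-linear transformations: in general $E(g(X))$ is not $g(E(X))$.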

WORKED EXAMPLE 1.6

The discrete random variable has the distribution shown in the table.

If °, find:
a
b .
a Apply Key point 1.6

b To find you need which is


.

You can use Key point 1.6 to prove Key point 1.5.

PROOF 2

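Let $Y = aX + b$.

Then:
$$E(Y) = \sum_i (a x_i + b)\, p_i$$
Apply Key point 1.6 to the function $g(X) = aX + b$.

$$E(Y) = a \sum_i x_i p_i + b \sum_i p_i$$
You can separate out a sum into its different terms, taking out constant factors.

$$E(Y) = aE(X) + b$$
Use the fact that for any probability distribution $\sum_i p_i = 1$ and the definition of expectation from Key point 1.1. You
have now established the first part of Key point 1.5.

Considering $E(Y^2)$ to get to the variance:
$$E(Y^2) = \sum_i (a x_i + b)^2 p_i = \sum_i (a^2 x_i^2 + 2ab\, x_i + b^2)\, p_i$$
Apply Key point 1.6 to the function $g(X) = (aX + b)^2$ and expand the brackets.

$$E(Y^2) = a^2 \sum_i x_i^2 p_i + 2ab \sum_i x_i p_i + b^2 \sum_i p_i$$
You can separate out a sum into its different terms, taking out constant factors.

$$E(Y^2) = a^2 E(X^2) + 2ab\, E(X) + b^2$$
Use the fact that for any probability distribution $\sum_i p_i = 1$ and the definitions of $E(X)$ and $E(X^2)$.

Using the definition of variance from Key point 1.2:
$$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = a^2 E(X^2) + 2ab\, E(X) + b^2 - (aE(X) + b)^2$$

$$\mathrm{Var}(Y) = a^2 E(X^2) + 2ab\, E(X) + b^2 - a^2 [E(X)]^2 - 2ab\, E(X) - b^2 = a^2 E(X^2) - a^2 [E(X)]^2$$
Expand the brackets and lots of terms cancel!

$$\mathrm{Var}(Y) = a^2 \left( E(X^2) - [E(X)]^2 \right) = a^2\, \mathrm{Var}(X)$$
Taking out a factor of $a^2$ leaves the expression for $\mathrm{Var}(X)$ from Key point 1.2. This completes the proof.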

EXERCISE 1B

1 and . Find and if:


a i
ii
b i

ii
c i
ii
d i
ii

e i

ii .

2 The discrete random variable follows this distribution:

Find and if:


a i
ii
b i

ii

c i
ii
d i
ii .
3 Stephen goes on a mile bike ride every weekend. The distance until he stops for a picnic is
modelled by , where and .
is the distance remaining after his picnic. Find and .
The rule for converting a temperature of $C$ degrees Celsius to $F$ degrees Fahrenheit is:

$$F = \frac{9}{5} C + 32$$

When a bread oven is operating it has expected temperature with standard deviation
.
Find the expected temperature and standard deviation in degrees Fahrenheit.
5 The random variable has expectation and variance . If , find the values of
and so that the expectation of is zero and the standard deviation is .
6 is a discrete random variable where and . is a transformation of
such that . Find and the standard deviation of .
7 is a discrete random variable satisfying for .

Find:

a the value of

b
c

e .

8 The discrete random variable has a distribution given by for


.

a Find, in terms of and , .


b Hence find, in terms of and , .
9 A discrete random variable has equal expectation and standard deviation. is a
transformation of such that . Prove that it is only possible for the expectation of
to equal the variance of if

10 The St Petersburg Paradox describes a game where a fair coin is tossed repeatedly until a head
is found. You win pounds if the first head occurs on the toss. How much should you pay to
play this game?
Section 3: The discrete uniform distribution
You have already met some special distributions that occur so often that they are named. For example, the
binomial and the normal distributions. Another very common distribution is the discrete uniform
distribution.

This is a distribution in which all the whole numbers from $1$ to $n$ are equally likely and it is given the symbol
$U(n)$. For example, $U(6)$ gives the distribution of the outcomes on a fair six-sided dice.

Key point 1.7

If a random variable $X$ follows a discrete uniform distribution $U(n)$, then

$$P(X = x) = \frac{1}{n}$$
for $x = 1, 2, \ldots, n$.

If you identify a random variable as following a uniform distribution you can immediately write down the
expectation and variance.

Key point 1.8

If a random variable $X$ follows a discrete uniform distribution $U(n)$, then

$E(X) = \frac{n+1}{2}$ and $\mathrm{Var}(X) = \frac{n^2 - 1}{12}$.
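The formulae in Key point 1.8 can be checked against a direct calculation from the probabilities. This is a sketch only; the function names `uniform_formulas` and `uniform_direct` are assumptions for illustration.

```python
from fractions import Fraction


def uniform_formulas(n):
    """Mean and variance of the uniform distribution on 1, 2, ..., n from the formulae."""
    return Fraction(n + 1, 2), Fraction(n * n - 1, 12)


def uniform_direct(n):
    """Mean and variance worked out directly from P(X = x) = 1/n."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    mean_of_squares = sum(x * x * p for x in range(1, n + 1))
    return mean, mean_of_squares - mean ** 2


mean, var = uniform_formulas(6)  # a fair six-sided dice
print(mean, var)  # 7/2 35/12
print(uniform_direct(6) == uniform_formulas(6))  # True
```

A numerical check like this does not replace the proof, but it is a quick way to catch a misremembered formula.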

Rewind

You met the rules for working with indices in A Level Mathematics Student Book 1, Chapter 2.

You can prove the result in Key point 1.8 by using your knowledge of sums of powers of integers.

PROOF 3

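If $X \sim U(n)$ then $x_i = i$ and $p_i = \frac{1}{n}$.
$x_i$ denotes the possible values of $X$, which are $1, 2, \ldots, n$.

$$E(X) = \sum_{i=1}^{n} i \times \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} i$$
$\frac{1}{n}$ is a constant so you can take it out of the sum.

$$E(X) = \frac{1}{n} \times \frac{n(n+1)}{2} = \frac{n+1}{2}$$
Use the result for the sum of the first $n$ positive integers: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$

$$E(X^2) = \sum_{i=1}^{n} i^2 \times \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} i^2$$
All the values of $X$ need to be squared.

$$E(X^2) = \frac{1}{n} \times \frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}$$
Use the result for $\sum_{i=1}^{n} i^2$.

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$$
Use the formula for variance.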


In Section 2 you saw how to find the expectation and variance of a linear transformation of a discrete
random variable. You can find the expectation and variance of a linear transformation of a discrete uniform
distribution in the same way.

WORKED EXAMPLE 1.7

The discrete random variable is equally likely to take any even value from to inclusive. Find
the variance of .

where The values of are . You can write these as ,


where .
So is a linear transformation of .

Apply Key point 1.8.

Apply Key point 1.5.
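The method of this worked example, writing the variable as a linear transformation of a uniform variable, can be sketched with hypothetical numbers. Here $Y$ is assumed to take the even values 2, 4, …, 20, so $Y = 2X$ with $X$ uniform on 1 to 10; these values are an illustration, not those of the worked example.

```python
from fractions import Fraction

# Hypothetical case: Y takes the even values 2, 4, ..., 20, so Y = 2X with X ~ U(10).
n = 10
var_x = Fraction(n * n - 1, 12)  # Key point 1.8
var_y = 2 ** 2 * var_x           # Key point 1.5 with a = 2, b = 0

# Direct check from the distribution of Y itself.
values = range(2, 21, 2)
p = Fraction(1, len(values))
mean_y = sum(y * p for y in values)
direct_var_y = sum(y * y * p for y in values) - mean_y ** 2

print(var_y, direct_var_y)  # 33 33
```

The shortcut and the direct calculation agree, and the shortcut needs far less arithmetic.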

EXERCISE 1C
1 Find the mean and variance of these distributions.
a i
ii
b i

ii
2 A fair spinner has sides labelled . Find the expected mean and standard deviation of the
results of the spinner.
3 A fair dice has sides labelled . Find the expectation and standard deviation of the outcome
of throwing the dice.
4 a The random variable is equally likely to take any integer value between and . Show that this
can be written as where .
b Hence find the variance of .
5 A string of Christmas lights starts with a plug then contains a light every from the plug.

One light is broken. Assuming all bulbs are equally likely to break, what are the expected mean and
variance of the distance of the broken light from the plug?
6 The random variable is equally likely to take the value of any odd number between and
inclusive. Find the variance of .
7 The discrete random variable takes values . Find the expectation and
variance of .
8 and . Find .

9 A random number, , is chosen from the fractions

Prove that but .

10 . Prove that is always divisible by .

Checklist of learning and understanding

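The expectation of a discrete random variable $X$ is written $E(X)$ and calculated as $E(X) = \sum_i x_i p_i$.
The variance of a discrete random variable $X$ is written $\mathrm{Var}(X)$ and calculated as
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = \sum_i x_i^2 p_i$.
The mode of $X$ is the value of $x$ associated with the largest probability.
The median, $M$, is any value which has $P(X \le M) \ge \frac{1}{2}$ and $P(X \ge M) \ge \frac{1}{2}$.
If there are two possible values, you have to find their mean.
If $Y = aX + b$, then: $E(Y) = aE(X) + b$ and $\mathrm{Var}(Y) = a^2\,\mathrm{Var}(X)$.

A discrete uniform distribution models situations in which all discrete outcomes are equally
likely.

If $X \sim U(n)$, then $P(X = x) = \frac{1}{n}$ for $x = 1, 2, \ldots, n$ and $E(X) = \frac{n+1}{2}$ and $\mathrm{Var}(X) = \frac{n^2 - 1}{12}$.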


Mixed practice 1
1 A discrete random variable has and . Find . Choose from these
options.

2
A discrete random variable has a distribution defined by for . Find
. Choose from these options.

3 A drawer contains three white socks and five black socks. Two socks are drawn without
replacement. is the number of black socks drawn.

a Find the probability distribution of .

b Find .

4 A fair six-sided dice is thrown once. The random variable is calculated as half the result if
the dice shows an even number, or one higher than the result if the dice shows an odd
number.

a Write down a table representing the probability distribution of .

b Find .

c Find .

d Find the mode of .

e Find the median of .

5 a . Find the expectation and variance of .

b is the discrete random variable that is equally likely to take any integer value between
and .
Find and .

c is the discrete random variable that is equally likely to take any even value between
and .
Find and .

6 The random variable follows this distribution:


a Write down the median of .

b If , find the values of and .

c Hence find and show that .

7 is a discrete random variable with and . . Find and the


standard deviation of .

8 The random variable has expectation and variance . If , find the values of
and so that the expectation of is and the standard deviation is .

9 is a discrete random variable that can take the value or .

a If , find the standard deviation of .

b . Find and .

10 A fair dice is thrown until a has been thrown or three throws have been made. is the
discrete random variable representing the number of throws made.

a Write down, in tabular form, the distribution of .

b Find .

c Find the median of .

d The number of points awarded in the game, , is given by . Find the variance of .

11 a A four-sided dice labelled with the values to is rolled twice. Write down, in a table, the
probability distribution of , the sum of the two rolls.

b Find and .

c A four-sided dice is rolled once and the score, , is twice the result. Find the mean and
variance of .

12 The discrete random variable follows the distribution. is the expectation of and
is the variance of . Find .

13 is a discrete random variable satisfying for .

Find, in terms of :

d .

14 A discrete random variable has . Find in terms of

15 In a card game a pack of standard playing cards is used. The cards are dealt one at a time
until the Queen of Spades (a unique card in the pack) is revealed.

a What are the expected mean and standard deviation of the number of cards until the
Queen of Spades is revealed?

b In the game the player scores points if the Queen of Spades is the th card revealed.
Find the expected number of points scored.
16 A box contains a large number of pea pods. The number of peas in a pod can be modelled by
the random variable . The probability distribution of is shown here:

or fewer or more

a Two pods are picked randomly from the box. Find the probability that the number of peas in
each pod is at most .

b It is given that .

i Determine the values of and .

ii Hence show that .

iii Some children play a game with the pods, randomly picking a pod and scoring points
depending on the number of peas in the pod. For each pod picked, the number of points
scored, , is found by doubling the number of peas in the pod and then subtracting .

Find the mean and the standard deviation of .

[© AQA 2014]

17 In a computer game, players try to collect five treasures. The number of treasures that Isaac
collects in one play of the game is represented by the discrete random variable .

The probability distribution of is defined by

a i Show that .

ii Calculate the value of .

iii Show that .

iv Find the probability that Isaac collects more than treasures.

b The number of points that Isaac scores for collecting treasures is where .

Calculate the mean and the standard deviation of .

[© AQA 2014]
2 Poisson distribution

In this chapter you will learn how to:

use the conditions required for a Poisson distribution to model a situation


use the Poisson formula and calculate Poisson probabilities
calculate the mean, variance and standard deviation of a Poisson variable
use the distribution of the sum of independent Poisson distributions
carry out a hypothesis test of a population mean from a single observation from a
Poisson distribution.

Before you start…


A Level Mathematics Student Book 1, Chapter 21 | You should know how to work with the binomial distribution. | 1 Given that , find .

A Level Mathematics Student Book 2, Chapter 20 | You should know how to work with conditional probability. | 2 Given that and , find .

Chapter 1 | You should know how to find the expectation and variance of discrete random variables. | 3 Find and for this distribution:

A Level Mathematics Student Book 1, Chapter 22 | You should know how to carry out hypothesis tests on the binomial distribution. | 4 A coin is tossed times and tails are observed. Use a two-tailed test to determine at the significance level if this coin is biased.

What is the Poisson distribution?


When you are waiting for a bus there are two possible outcomes – at any given moment the bus either
arrives or it doesn’t. You can try modelling this situation, using a binomial distribution, but it is not clear
what an individual trial is. Instead you have an average rate of success – the number of buses that arrive in
a fixed time period.

There are many situations in which you know the average rate of events within a given space or time, in
contexts ranging from commercial, such as the number of calls through a telephone exchange per minute,
to biological, such as the number of clover plants seen per square metre in a pasture. If the events can be
considered independent of each other (so that the probability of each event is not affected by what has
already been seen), the number of events in a fixed space or time interval can be modelled by the Poisson
distribution.
Section 1: Using the Poisson model
The Poisson distribution is commonly used when these conditions hold:

the events occur singly (one at a time)


the events are independent of each other
the average rate of events (conventionally called lambda, $\lambda$) is constant.

If these conditions are satisfied then the discrete random variable ‘number of events, $X$’, follows the
Poisson distribution with mean $\lambda$. You write this as $X \sim \mathrm{Po}(\lambda)$.

Tip

If a question mentions average rate of success, or events occurring at a constant rate, you
should use the Poisson distribution.

If you can identify a fixed number of trials, you should use the binomial distribution.

The Poisson distribution can also be a useful approximate model for discrete random variables in other
situations. However, if the stated conditions are not met, whether the model is appropriate can only be
established empirically, by looking at data.

Once you have identified that a situation follows a Poisson distribution, you can use facts about the
probability of a certain number of events, the expected number of events and the expected variance.

Key point 2.1

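If a random variable $X$ follows a Poisson distribution, $X \sim \mathrm{Po}(\lambda)$, then:

$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$ for $x = 0, 1, 2, \ldots$
$E(X) = \lambda$
$\mathrm{Var}(X) = \lambda$

These formulae will be given in your formula book.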
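
As a rough illustration of the formulae in Key point 2.1, the following Python sketch evaluates the Poisson probabilities directly and checks numerically that the mean and variance both equal the rate. The function name `poisson_pmf` and the rate 2.5 are assumptions chosen for this example; they do not come from the text.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam), using e^(-lam) * lam^x / x!."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Hypothetical rate, chosen only for illustration.
lam = 2.5

# Probabilities for x = 0, 1, ..., 59; the tail beyond this is negligible here.
probabilities = [poisson_pmf(x, lam) for x in range(60)]

# Mean and variance computed directly from the probabilities.
mean = sum(x * p for x, p in enumerate(probabilities))
variance = sum(x ** 2 * p for x, p in enumerate(probabilities)) - mean ** 2

print(round(mean, 6), round(variance, 6))
```

In an examination you would use the formula book and a calculator; the sketch is only a way of seeing that the two values agree.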

Common error

Remember that , not .

Notice that the values of the mean and variance are equal for the Poisson distribution. This is something
you look out for when determining if data are likely to fit a Poisson model, although in itself it is not
sufficient to decide – there are other distributions with this feature.

A typical Poisson distribution, the distribution, is shown here:


[Figure: graph of probability p (vertical axis, 0 to 0.4) against x (horizontal axis, 0 to 6).]

Notice that:

the mean rate does not have to be a whole number


the distribution is not symmetric
the graph, in theory, should continue on to infinite values of , but the probabilities of very large values
of get very small.

WORKED EXAMPLE 2.1

Recordable accidents occur in a factory at an average rate of every year, independently of each
other. Find the probability that in a given year exactly recordable accidents occurred.

Let be the number of recordable accidents in a year:
Define the random variable.

Give the probability distribution.

Write down the probability required, and calculate the answer.

The Poisson distribution is scalable. For example, if the number of butterflies seen on a flower in minutes
follows a Poisson distribution with mean , then the number of butterflies seen on a flower in minutes
follows a Poisson distribution with mean , the number of butterflies seen on a flower in minutes follows
a Poisson distribution with mean , and so on.

Tip

Learn how to use your calculator to find Poisson probabilities, $P(X = x)$, and cumulative
probabilities, $P(X \le x)$.
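
A calculator's cumulative function gives the probability of a value or fewer, so 'more than' questions need the complement. The sketch below shows this step together with the scaling of the mean to a shorter interval. The figures (6 events per hour, a 20 minute interval, a threshold of 3) and the name `poisson_cdf` are assumptions made for illustration only.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam), found by summing individual probabilities."""
    return sum(math.exp(-lam) * lam ** x / math.factorial(x)
               for x in range(k + 1))

# Hypothetical figures: 6 events per hour on average, observed for 20 minutes.
rate_per_hour = 6.0
minutes = 20
lam = rate_per_hour * minutes / 60  # scaled mean for the shorter interval

# "More than 3" is the complement of "3 or fewer".
p_more_than_3 = 1 - poisson_cdf(3, lam)

print(round(p_more_than_3, 4))
```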

WORKED EXAMPLE 2.2

If there are, on average, buses per hour arriving at a bus stop, find the probability that more
than buses arrive in minutes.

Let be the number of buses in minutes:
Define the random variable.

Give the probability distribution.

Write down the probability required. To use your calculator you must relate this probability to .

The scalability of the Poisson distribution is a consequence of a more general result. If two independent
variables both follow a Poisson distribution then so does their sum.

Key point 2.2

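If $X$ and $Y$ are independent random variables that follow Poisson distributions such that
$X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$, and $Z = X + Y$, then $Z \sim \mathrm{Po}(\lambda + \mu)$.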
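
The result in Key point 2.2 can be checked numerically. The sketch below adds up the probabilities of every way two independent Poisson variables can reach a given total and compares the answer with a single Poisson distribution whose mean is the sum of the two means. The means 1.5 and 2.0 and the function names are assumptions made for this example.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Hypothetical means for the two independent variables.
lam, mu = 1.5, 2.0

def sum_pmf(z: int) -> float:
    """P(X + Y = z): add up P(X = x) * P(Y = z - x) over x = 0, ..., z."""
    return sum(poisson_pmf(x, lam) * poisson_pmf(z - x, mu)
               for x in range(z + 1))

# Compare with a single Poisson distribution with mean lam + mu.
for z in range(6):
    print(z, round(sum_pmf(z), 6), round(poisson_pmf(z, lam + mu), 6))
```

The two columns of probabilities agree, which is what the key point predicts; the agreement relies on the two variables being independent.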

Although you do not need to know the proof of the result in Key point 2.2, it does show an interesting link
with the binomial expansion.

PROOF 4

Consider all the different ways in which $X + Y$ can take the value $z$. If $X = 0$ then $Y = z$. If
$X = 1$ then $Y = z - 1$, etc.

Rewrite in sigma notation to keep the expression shorter.

Use the formula for the Poisson distribution.

You can take out factors of $e^{-\lambda}$ and $e^{-\mu}$ from the sum since they are constants.

You are close to having a binomial coefficient. Multiply by $z!$ in the sum to get to this, but then
you have to divide by $z!$ too.

Replace the factorials with a binomial coefficient.

You can recognise the sum as a binomial expansion.

This is a Poisson distribution with mean $\lambda + \mu$.
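
The steps described in this proof can be written out as a chain of equalities. The following is a reconstruction rather than the printed working, and it assumes the notation $X \sim \mathrm{Po}(\lambda)$, $Y \sim \mathrm{Po}(\mu)$ with $X$ and $Y$ independent, and $Z = X + Y$.

```latex
\begin{align*}
P(Z = z) &= P(X = 0)\,P(Y = z) + P(X = 1)\,P(Y = z - 1) + \dots + P(X = z)\,P(Y = 0) \\
         &= \sum_{x=0}^{z} P(X = x)\,P(Y = z - x) \\
         &= \sum_{x=0}^{z} \frac{e^{-\lambda}\lambda^{x}}{x!} \cdot \frac{e^{-\mu}\mu^{z-x}}{(z-x)!} \\
         &= e^{-\lambda}e^{-\mu} \sum_{x=0}^{z} \frac{\lambda^{x}\mu^{z-x}}{x!\,(z-x)!} \\
         &= \frac{e^{-(\lambda+\mu)}}{z!} \sum_{x=0}^{z} \frac{z!}{x!\,(z-x)!}\,\lambda^{x}\mu^{z-x} \\
         &= \frac{e^{-(\lambda+\mu)}}{z!} \sum_{x=0}^{z} \binom{z}{x}\lambda^{x}\mu^{z-x} \\
         &= \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^{z}}{z!}
\end{align*}
```

The first line uses independence to multiply the probabilities, and the last line is the Poisson formula with $\lambda + \mu$ in place of $\lambda$.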

WORKED EXAMPLE 2.3

Hywel receives an average of emails and texts each hour. These are the only types of
message he receives.

a Assuming that the numbers of emails and texts follow independent Poisson distributions, find the
probability that he receives more than messages in an hour.

b Explain why the assumption that the emails and texts form independent Poisson distributions is
unlikely to be true.

Use Key point 2.2 to combine the two Poisson distributions.

You need to write the required probability in terms of a cumulative probability to use the
calculator function.

b The rate of arrival of messages is unlikely to be constant – there will probably be more at some
times of the day than at others. Within each distribution messages are not likely to be independent
as they may occur as part of a conversation. The two distributions are also probably not independent
of each other, as times when more emails arrive might be similar to times when more texts might
arrive.

Common error

Sometimes people think that the mean rate in a Poisson distribution has to be a whole number.
This is not the case.

WORK IT OUT 2.1

The number of errors in a computer code is believed to follow a Poisson distribution with a mean
of errors per lines of code. Find the probability that there are more than errors in lines
of code. Which is the correct solution? Identify the errors made in the incorrect solutions.

A If is the number of errors in lines, then .

B If is the number of errors in lines, then .

More than errors in lines is equivalent to more than error in lines, so you need

EXERCISE 2A

1 State the distribution of the variable in each of these situations.


a Cars pass under a motorway bridge at an average rate of per second period.
i The number of cars passing under the bridge in one minute.
ii The number of cars passing under the bridge in seconds.

b Leaks occur in water pipes at an average rate of per kilometre.


i The number of leaks in .

ii The number of leaks in .


c worms are found on average in a area of a garden.
i The number of worms found in a area of garden.

ii The number of worms found in a by area of garden.


2 Calculate these probabilities.
a If :
i
ii
b If :
i
ii
c If :
i
ii
d If :
i
ii
e If :
i
ii
3 A random variable follows a Poisson distribution with mean . Copy and complete this table
of probabilities, giving the results to three significant figures.

4 From a particular observatory, shooting stars are observed in the night sky at an average rate of
one every five minutes. Assuming that this rate is constant and that shooting stars occur (and
are observed) independently of each other, what is the probability that more than are seen
over a period of one hour?
5 When examining blood from a healthy individual, under a microscope, a haematologist knows
he should see on average four white blood cells in each high power field. Find the probability
that blood from a healthy individual will show:
a seven white blood cells in a single high power field
b a total of white blood cells in six high power fields, selected independently.
6 A wire manufacturer is looking for flaws. Experience suggests that there are on average
flaws per metre in the wire.
a Determine the probability that there is exactly one flaw in one metre of the wire.

b Determine the probability that there is at least one flaw in metres of the wire.
7 The random variable has a Poisson distribution with mean . Calculate:
a

b
c
d
8 The number of eagles observed in a forest in one day follows a Poisson distribution with mean
.
a Find the probability that more than three eagles will be observed on a given day.
b Given that at least one eagle is observed on a particular day, find the probability that exactly
two eagles are seen that day.
9 The random variable follows a Poisson distribution. Given that , find:
a the mean of the distribution
b .
10 Let be a random variable with a Poisson distribution, such that . Use technology
to estimate , giving your answer to three significant figures.
11 The number of emails Sarah receives per day follows a Poisson distribution with mean . Let
be the number of emails received in one day and the number of emails received in a seven-
day week.
a Calculate and .
b Find the probability that Sarah receives emails every day in a seven-day week.
c Explain why this is not the same as .
12 The number of mistakes a teacher makes while marking homework has a Poisson distribution
with a mean of errors per piece of homework.
a Find the probability that there are at least two marking errors in a randomly chosen piece of
homework.
b Find the most likely number of marking errors occurring in a piece of homework. Justify your
answer.

c Find the probability that in a class of students fewer than half of them have errors in their
marking.
13 A car company has two limousines that it hires out by the day. The number of requests per day
has a Poisson distribution with mean requests per day.
a Find the probability that neither limousine is hired on any given day.
b Find the probability that some requests have to be denied on any given day.

c If each limousine is to be used equally, on how many days in a period of days would you
expect a particular limousine to be in use?
14 The random variable follows a Poisson distribution with mean . Given that
, find the exact value of .
15 The random variable follows a Poisson distribution with mean .

a Show that .

b Given that , find the value of such that .


Section 2: Using the Poisson distribution in hypothesis tests
If it is known that a variable follows a Poisson distribution you can use data to make inferences about the
value of the mean. To do this you use a hypothesis test. First, you need to work out the p-value – the
probability of getting the observed result or more extreme, assuming that the null hypothesis is true. You
can then compare this to the significance level to determine whether or not to reject the null hypothesis.

For a one-tailed test, compare the calculated probability to the significance level directly. For a two-tailed
test, you usually find the probability of one tail and compare it to half of the significance level.
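
The comparison described here can be sketched in Python. The example finds the tail probability on the side of the observation and compares it with half of the significance level, as in a two-tailed test. All the numbers (a null mean of 5, an observation of 10, a 5% significance level) and the function names are assumptions chosen for illustration.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam ** x / math.factorial(x)
               for x in range(k + 1))

def two_tailed_test(observed: int, lam0: float, alpha: float):
    """Return the tail probability on the side of the observation and
    whether the null hypothesis (mean = lam0) is rejected at level alpha."""
    if observed >= lam0:
        # Upper tail: P(X >= observed) = 1 - P(X <= observed - 1)
        tail = 1 - poisson_cdf(observed - 1, lam0)
    else:
        # Lower tail: P(X <= observed)
        tail = poisson_cdf(observed, lam0)
    # Two-tailed: compare one tail with half of the significance level.
    return tail, tail < alpha / 2

# Hypothetical data: null mean 5, a single observation of 10, 5% level.
tail, reject = two_tailed_test(10, 5.0, 0.05)
print(round(tail, 4), reject)
```

With these illustrative numbers the tail probability lies between 2.5% and 5%, so the two-tailed test does not reject the null hypothesis, although a one-tailed test at the same level would.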

WORKED EXAMPLE 2.4

The number of telephone calls received by a company follows a Poisson distribution. Over long
experience it is thought that the mean is calls per hour. After a redesign of their website it is
found that they got calls in an hour. Test at the significance level if this provides significant
evidence of a change in the mean number of calls per hour.

It is a two-tailed test because you are looking for a change in either direction.

If
Calculate the probability of the observed outcome or more extreme, assuming that $H_0$ is true.

This is more than so do not reject $H_0$.
Compare the upper tail to half of the significance value, since this is a two-tailed test. If you
want the p-value, double the probability to get a p-value of

There is insufficient evidence to suggest that the mean number of calls has changed from per hour.
Write a conclusion within the context of the question.

WORK IT OUT 2.2

is the random variable ‘number of absences per day in a school’. It is thought to follow a
Poisson distribution with mean . Following a change in the registration system, the number of
absences over five days was . Test at the significance level if the change in the registration
system has affected the average rate of absences. Which is the correct solution? Identify the
errors made in the incorrect solutions.

A ,

Under , .

If there are absences over five days, this is a rate of eight per day, so you need
. This is more than so you cannot reject . The average rate is
absences per day.

B ,

Let be the number of absences in five days. Under , .

. Since this is a two-tailed test you must double this to get a p-value of
. This is less than the significance level so you can reject . There is evidence at the
significance level that the average rate has changed from absences per day.
C ,

. so reject .

EXERCISE 2B

1 Conduct these hypothesis tests based on the given observation. You can assume that the data follow
a Poisson distribution. Use the significance level.
a i

ii
b i

ii
c i

ii
d i

ii
2 Find the critical region (the set of values for which the null hypothesis is rejected) at the
significance level if:
a i
ii
b i

ii
c i
ii .
3 It is known that a sample of radium emits alpha particles per millisecond. A second sample of the
same size and shape emits alpha particles in a millisecond. Test at the significance level whether
this sample has the same emission rate as radium.
4 a Over a long period it is believed that the average number of cars travelling past a traffic light
follows a Poisson distribution, with cars per minute. After some roadworks, it is thought that the
number of cars passing is lower. In a one minute observation only cars pass the traffic light. Find
the p-value of this observation and hence decide at the significance level if the roadworks have
caused a decrease in traffic levels.
b Suggest two reasons why a Poisson distribution might not be appropriate.
5 a The number, , of accidents per month on a road is studied. The mean number of accidents per
month is with standard deviation . Explain why this supports the suggestion that the number
of accidents follows a Poisson distribution.

b Assume that does indeed follow a Poisson distribution. It is thought that adding a speed camera
will reduce the average number of accidents from . In the month after the camera was added
there were accidents. Test at the significance level if this is evidence of a reduction in the
average number of accidents.
6 The numbers of mistakes in nine pieces of a student’s homework are shown:

a Estimate the mean and standard deviation of the number of mistakes, based upon these data.

b Hence explain why the Poisson distribution is a plausible model.


c After a study skills session the student produced a piece of work with mistakes. You can assume
that the number of mistakes does follow a Poisson distribution. Test at the significance level if
the mean number of mistakes is lower than the value found in part a.
7 The number of bees visiting a flower is thought to follow a Poisson distribution with mean per
minute.
a Describe in context two conditions that must be met for the Poisson distribution to be an
appropriate model for the arrival of bees.

b After a new hedge has been planted it is thought that the number of bees arriving will increase. In
minutes bees visit the flower. Test at the significance level if there is evidence that the
number of bees has increased.
8 The number of leaks in a pipe is known to follow a Poisson distribution with mean leaks per km.
After the water pressure was changed, an inspection of of pipe revealed leaks. Has there been
a change in the mean number of leaks? Test, using the significance level.
9 It is known from long experience that earthquakes occur in a particular town once every four months.
Environmentalists believe that a change in the way oil is extracted from a well will increase the
number of earthquakes. They monitor the activity for one year and six earthquakes occur.
a Test at the significance level whether the number of earthquakes has increased from the long-
term trend, stating your p-value.

b They continue to monitor earthquake activity and the following year six earthquakes also occur.
Test at the significance level whether the number of earthquakes has increased from the long-
term trend, stating your p-value.
10 The discrete random variable follows . A single observation is used to test against
. What is the smallest value of for which will be rejected at the significance level
when the observation is ?

Checklist of learning and understanding

The Poisson distribution is commonly used when these conditions hold:


the events occur singly (one at a time)
the events are independent of each other
the average rate of events (conventionally called ) is constant.
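If $X \sim \mathrm{Po}(\lambda)$, then:
$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$ for $x = 0, 1, 2, \ldots$
$E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$

If $X \sim \mathrm{Po}(\lambda)$, $Y \sim \mathrm{Po}(\mu)$ and $Z = X + Y$, where $X$ and $Y$ are independent, then $Z \sim \mathrm{Po}(\lambda + \mu)$.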
You can use the Poisson distribution to conduct a hypothesis test to see if it suggests that the
mean rate has changed.
Mixed practice 2
1 The number of complaints in a shop in any hour while it is open follows a Poisson distribution
with mean per hour. Find the probability that in a three-hour shift there are fewer than
complaints, giving your answer to three significant figures. Choose from these options.

2 A random variable follows a Poisson distribution with standard deviation . Find to


three significant figures. Choose from these options.

3 The random variable is the number of robins that visit a bird table each hour. The random
variable is the number of thrushes that visit a bird table each hour. These are the only types
of bird that visit the table.

It is believed that and .

is the random variable ‘Number of birds visiting the table each hour’.

a Stating a necessary assumption, write down the distribution of .

b Find the probability that no birds visit the table in one hour.

c Find

4 is the random variable ‘number of burgers ordered per hour in a restaurant’. It is thought
that .

a Write down two conditions required for the Poisson distribution to model data.

b Find

c During a ‘happy hour’ special offer the number of burgers sold increased to . Test at the
significance level whether the special offer has increased the average rate of burgers
ordered from .

5 Salah is sowing flower seeds in his garden. He scatters seeds randomly so that the number of
seeds falling on any particular region is a random variable with a Poisson distribution, with
mean value proportional to the area. He intends to sow fifty thousand seeds over an area of
.

a Calculate the expected number of seeds falling on a region.

b Calculate the probability that a given area receives no seeds.

6 a If write down and .


b Hence find where is the expected standard deviation of .

7 Seven observations of the random variable , the number of power surges per day in a power
cable, are shown:

a Estimate the mean and standard deviation of , based upon these observations.

b Use your answer to part a to explain why the Poisson distribution is a plausible model for
.

c When a new brand of cable is used it is observed that there are power surges in five
days. Does this suggest that the new brand has a different average rate of power surges to
your answer in part a? Use a significance level.

8 A receptionist at a hotel answers on average phone calls a day.

a Find the probability that on a particular day she will answer more than phone calls.

b Find the probability that she will answer more than phone calls every day during a five-
day week.

9 During the month of August in Bangalore, India, there are on average rainy days.

a Find the probability that there are fewer than seven rainy days during the month of August
in a particular year.

b Find the probability that, in ten consecutive years, exactly five have fewer than seven rainy
days in August.

10 The random variable follows a Poisson distribution. Given that , find:

a the mean of the distribution

b .

11 a Given that and find the value of .

b . Find the possible values of such that .

c If and , express in terms of .

12 A geyser erupts randomly. The eruptions at any given time are independent of one another
and can be modelled, using a Poisson distribution with mean per day.

a Determine the probability that there will be exactly one eruption between and

b Determine the probability that there are more than eruptions during one day.

c Determine the probability that there are no eruptions in the minutes Naomi spends
watching the geyser.

d Find the probability that the first eruption of a day occurs between and

e If each eruption produces litres of water, find the expected volume of water produced
in a week.

f Determine the probability that there will be at least one eruption in at least six out of the
eight hours the geyser is open for public viewing.
g Given that there is at least one eruption in an hour, find the probability that there is exactly
one eruption.

13 In a particular town, rainstorms occur at an average rate of two per week and can be
modelled, using a Poisson distribution.

a What is the probability of at least eight rainstorms occurring during a particular four-week
period?

b Given that the probability of at least one rainstorm occurring in a period of complete
weeks is greater than , find the least possible value of .

14 Patients arrive at random at an emergency room in a hospital at the rate of per hour
throughout the day.

a Find the probability that exactly four patients will arrive at the emergency room between
and .

b Given that fewer than patients arrive in one hour, find the probability that more than
arrive.

15 It is thought that . A single observation of takes the value . Does this provide
evidence at the significance level that the average rate has decreased? Support your
answer by writing down the p-value of the observation.

16 Based on long experience a gardener knows that birds tend to arrive in his garden at an
average rate of per hour.

a State two assumptions required to model the birds’ arrival, using a Poisson distribution. Are
these reasonable assumptions?

b If these assumptions do hold, find the probability of observing more than birds in an
hour.

The gardener plants some new flowers. He wants to know if this changes the birds’ behaviour.

c If is the true average rate of arrival of birds after the new flowers have been planted,
write down suitable null and alternative hypotheses for answering the gardener’s question.

d If birds are observed in an hour, what is the conclusion of the test at significance?

17 A water company believes that pipes have leaks per km, following a Poisson distribution.
After increasing water pressure they are concerned that there are more leaks. They find
leaks in a section of pipe. Does this provide significant evidence at the significance
level to suggest that the mean number of leaks has increased?

18 A shop has four copies of the magazine Ballroom Dancing delivered each week. Any unsold
copies are returned. The demand for the magazine follows a Poisson distribution with mean
requests per week.

a Calculate the probability that the shop cannot meet the demand in a given week.

b Find the most probable number of magazines sold in one week.

c Find the expected number of magazines sold in one week.

d Determine the smallest number of copies of the magazine that should be ordered each
week to ensure that the demand is met with a probability of at least .

19 Annette is a senior typist and makes an average of mistakes per letter. Bruno is a trainee
typist and makes an average of mistakes per letter. Assume that the number of mistakes
made by any typist follows a Poisson distribution.

a Calculate the probability that on a particular letter:

i Annette makes exactly three mistakes

ii Bruno makes exactly three mistakes.

b Annette types of all the letters.

i Find the probability that a randomly chosen letter contains exactly three mistakes.

ii Given that a letter contains exactly three mistakes, find the probability that it was typed
by Annette.

c Annette and Bruno type one letter each. Given that the two letters contain a total of three
mistakes, find the probability that Annette made more mistakes than Bruno.

20 The number of worms in a square metre in a forest satisfies the distribution . A scientist
samples many square-metre areas but only records areas where some worms are observed.
What is the mean value of her observations?

21 Mohammed is offered a week’s trial with a view to being permanently employed to service
bicycles in Robyn’s bicycle shop.

The number of bicycles brought in to be serviced can be modelled by a Poisson distribution


with mean per day.

a Find the probability that, on Mohammed’s first day, the number of bicycles brought in to be
serviced is:

i or fewer

ii more than

iii exactly .

b Before starting work, Mohammed told his mother that he hoped that, during his first week (
days), the number of bicycles brought in to be serviced would be:

at least , otherwise Robyn might decide that there was not enough work to justify
permanently employing him
not more than , so that he would not have to work too hard.
Find the probability that Mohammed’s hopes will be met.

[© AQA 2011]

22 At a Roman site, coins are found at an average rate of coin per . Assume that the
number of coins found can be modelled by a Poisson distribution.

a Determine the probability that, in an area of :

i at most coins are found

ii exactly coins are found.

b Determine the probability that more than coins are found in an area of .

c Bronze brooches are less common than coins at this site, and are found at an average rate
of brooch per . The number of these brooches found is independent of the number of
coins found. Assume that the number of bronze brooches found can also be modelled by a
Poisson distribution.

i Determine the probability that the total number of coins and bronze brooches found in
an area of is at least .

ii Sometimes, Romans buried a hoard of several coins together. They did not usually bury
several bronze brooches together. State, with a reason, which of

the number of coins found or


the number of bronze brooches found
is likely to be better modelled by a Poisson distribution.

[© AQA 2013]
3 Chi-squared tests

In this chapter you will learn how to:

check if two variables are dependent.


If you are following the A Level course, you will also learn how to:
use Yates’ correction, a way of improving the method for checking if two variables
are dependent.

Before you start…


A Level Mathematics Student Book 1, Chapter 22 | You should know how to conduct hypothesis tests. | 1 A coin is tossed times and heads are observed. Does this provide evidence (at a significance level) that the coin is biased towards heads?

A Level Mathematics Student Book 1, Chapter 21 | You should know how to calculate probabilities for independent events. | 2 The probability of Andrew scoring a goal is and the probability of Helen scoring a goal is . Given that these outcomes are independent, what is the probability that they both score a goal?

A Level Mathematics Student Book 2, Chapter 3 | You should know how to evaluate expressions including the modulus function. | 3 Evaluate .

Independent?
One common question you can ask in a statistical situation is whether or not two variables are dependent –
for example, do future earnings depend on A Level choices? In this chapter you will look at a statistical test
to answer this type of question.

Did you know?


You might have already met a test to see if two variables are correlated. This is related to
independence, but it is not quite the same. For example, the scatter graph shows the results of a
psychology experiment where people are asked to estimate the size of an angle, and the time
taken for them to do so is measured.
The two variables are not correlated (there is no linear trend) but they are not independent –
people who spend longer making the estimate seem to have more tightly clustered estimates.
[Figure: scatter graph of estimated angle (degrees), from 50 to 90, against time (s), from 0 to 15.]

It turns out that if two variables are independent then they will definitely be uncorrelated, but the
reverse is not true. You can write: independent $\Rightarrow$ uncorrelated.
Section 1: Contingency tables
In this section you will try to design a hypothesis test that decides whether two variables are dependent:

$H_0$: The two variables are independent.

$H_1$: The two variables are dependent.

Tip

Choose $H_0$ to be that the variables are independent because you can use that to calculate
expected values. You cannot use the fact that two variables are dependent to calculate expected
values unless you are given more information about what that dependence is.

To describe the two variables you use contingency tables that list how often each combination of
variables occurs. For example, this table illustrates the results of a survey of young families. The observed
value in cell $i$ is called $O_i$.

Number of children

or more

or fewer
Number
of
bedrooms
or more

This is a contingency table. Notice that each cell contains actual frequencies rather than probabilities
or proportions.

You are going to need a way of measuring how far this is away from the numbers you would expect if the
two variables were independent. To do this you look at the totals.

Number of children
Total
or more

or fewer
Number
of
bedrooms
or more

Total

Based on the sample, the probability of having bedrooms or fewer is and the probability of having

children is . If the two variables are independent, the probability of both occurring is the product of

these probabilities, so the probability of two children and two bedrooms is . In a sample of size

you would then expect there to be families with two children and two bedrooms.
The expected frequency in cell $i$ is called $E_i$.

Tip

Expected frequencies do not have to be whole numbers.


Key point 3.1

In a contingency table, the expected frequency in cell $i$ is

$E_i = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

You can create another contingency table containing all the expected frequencies.

Number of children

or more

or fewer
Number
of
bedrooms
or more

There are several possible measures of the difference between observed and expected values. The
measure you need to know is called chi-squared ($\chi^2$).

Tip

Notice that the row totals and the column totals are the same as for the original data. This is a
useful check.
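The calculation in Key point 3.1, together with the check on the totals, can be sketched in a few lines of Python. The observed frequencies here are invented for illustration (they are not the survey data), and the function name `expected_frequencies` is an assumption.

```python
def expected_frequencies(observed):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed frequencies (2 rows, 3 columns).
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)

# Useful check: the row and column totals match those of the observed data.
print([sum(row) for row in expected], [sum(row) for row in observed])
print([sum(col) for col in zip(*expected)], [sum(col) for col in zip(*observed)])
```

Notice that the expected frequencies are calculated from the totals only, so they do not have to be whole numbers.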

Key point 3.2

The chi-squared value that gives the difference between the observed values, $O_i$, and the
expected values, $E_i$, is

$\chi^2_{\text{calc}} = \sum \dfrac{(O_i - E_i)^2}{E_i}$

For the given data,

Large values indicate a big difference between observed and expected data. Is this value large enough to
conclude that number of bedrooms and number of children are not independent? To decide, you need to
know the distribution of $\chi^2_{\text{calc}}$ to see how likely the observed value is. This distribution has a single
parameter that depends on the number of cells in the table and that, for historical reasons, is called the
degrees of freedom, often given the symbol $\nu$ (lowercase Greek letter ‘nu’) or DF.

Key point 3.3

In an $m$ by $n$ contingency table, the number of degrees of freedom is

$\nu = (m - 1)(n - 1)$

If the null hypothesis is true (that the variables are independent) $\chi^2_{\text{calc}}$ approximately follows the
chi-squared distribution with $\nu$ degrees of freedom, the $\chi^2_\nu$ distribution. However, this approximation is only
valid if all the expected frequencies in the contingency table are greater than 5.

Key point 3.4

If the null hypothesis is true and the expected value $E_i > 5$ for all $i$, then the chi-squared value satisfies

$\chi^2_{\text{calc}} \sim \chi^2_\nu$

This will be given in your formula book.
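Key points 3.2 to 3.4 can be put together in a short Python sketch. The observed and expected frequencies are made up for illustration, and the function names are assumptions rather than anything from the formula book.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

def all_expected_above_5(expected):
    """The chi-squared approximation needs every expected frequency above 5."""
    return all(e > 5 for row in expected for e in row)

# Hypothetical data: the expected values come from the row and column totals.
observed = [[10, 20, 30],
            [20, 20, 20]]
expected = [[15.0, 20.0, 25.0],
            [15.0, 20.0, 25.0]]

print(chi_squared(observed, expected))
print(degrees_of_freedom(observed))
print(all_expected_above_5(expected))
```

Only the expected values are tested against 5; the observed values play no part in that check.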


Tip

Only expected values need to be above 5. Observed values are irrelevant.

In the survey results, not all of the expected values are above 5. When this happens you need to combine
some rows or columns in a way that is sensible in context. The most obvious way with the example given is
to combine the ‘2 children’ group with the ‘3 or more children’ group.

Number of children

or more

or fewer
Number of
bedrooms
or more

You can then create the new contingency table of expected values.

Tip

You can find the expected values by adding up the corresponding expected values from the
original table. You don’t have to recalculate the frequencies, using Key point 3.1.

Number of children

or more

or fewer
Number of
bedrooms

or more
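The tip about adding expected values can be checked with a quick Python sketch. The data are invented (they are not the survey results) and the helper names `expected_frequencies` and `merge_last_two_columns` are assumptions.

```python
def expected_frequencies(observed):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def merge_last_two_columns(table):
    """Combine the final two columns of a table into a single column."""
    return [row[:-2] + [row[-2] + row[-1]] for row in table]

# Hypothetical observed data: the last two columns have expected values of exactly 5.
observed = [[12, 9, 6, 3],
            [8, 11, 4, 7]]

# Route 1: add up the corresponding expected values from the original table.
added = merge_last_two_columns(expected_frequencies(observed))

# Route 2: combine the observed columns, then use Key point 3.1 again.
recalculated = expected_frequencies(merge_last_two_columns(observed))

print(added)
print(recalculated)
```

Both routes give the same table because the combined column total is just the sum of the two original column totals.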

You can find the contributions of each cell to the total chi-squared value.

Number of children

or more

or fewer
Number of
bedrooms
or more

Totalling these contributions, and . You can compare this value to critical
values given in the formula book.

The highlighted value, 9.488, gives the critical value for a test at the 5% significance level with four
degrees of freedom. The column is headed 0.95 because 95% of chi-squared values with 4 degrees of
freedom are below this value, therefore 5% are higher. The calculated value of $\chi^2_{\text{calc}}$ is higher than 9.488, so
you reject the null hypothesis and conclude that number of bedrooms and number of children are
dependent variables.
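The comparison with the critical value can be written as a simple rule. This sketch stores a few upper 5% points of the chi-squared distribution (the standard tabulated values for 1 to 4 degrees of freedom); the dictionary and function names are assumptions, and the test values passed in are made up.

```python
# Upper 5% points of the chi-squared distribution, indexed by degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def reject_independence(chi_sq_calc, dof):
    """Reject H0 (independence) at the 5% level if the statistic exceeds the critical value."""
    return chi_sq_calc > CRITICAL_5_PERCENT[dof]

# Hypothetical calculated values.
print(reject_independence(12.3, 4))   # above 9.488, so reject H0
print(reject_independence(5.33, 2))   # below 5.991, so do not reject H0
```

For other significance levels or more degrees of freedom you would read the critical value from the formula book in the same way.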

The contingency table showing the contributions of each cell to the sum shows some cells have a
much larger value of $\dfrac{(O_i - E_i)^2}{E_i}$ than others.

You can use this to analyse which combinations of the variables are very different from the expected
frequencies. This can give you further insight into what is happening in the situation being investigated. In
the example you can see that two or more children in houses with four or more bedrooms makes the
largest contribution to the $\chi^2_{\text{calc}}$. You could interpret this as meaning that large families preferring large
houses is a big factor in why number of children and number of bedrooms are dependent.
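Finding the cell that contributes most can be automated. The sketch below uses invented frequencies, with expected values worked out from the row and column totals; the function names are assumptions.

```python
def contributions(observed, expected):
    """Table of (O - E)^2 / E for each cell."""
    return [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]

def largest_contribution(observed, expected):
    """Return (row index, column index, value) of the biggest contribution."""
    cells = contributions(observed, expected)
    value, i, j = max((value, i, j)
                      for i, row in enumerate(cells)
                      for j, value in enumerate(row))
    return i, j, value

# Hypothetical data (row totals 40 and 60, column totals 30, 30 and 40).
observed = [[20, 10, 10],
            [10, 20, 30]]
expected = [[12.0, 12.0, 16.0],
            [18.0, 18.0, 24.0]]

row, col, value = largest_contribution(observed, expected)
print(row, col, value)
```

The indices start from zero, so the result points to the first row and first column of this made-up table. Whether that cell has more or fewer observations than expected still has to be read from the data before interpreting it in context.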

Tip

Some calculators allow you to do the chi-squared test automatically and provide you with the
$p$-value. This alternative approach is acceptable.

WORKED EXAMPLE 3.1

Determine at the significance level whether or not the colour of a car sold by a dealership is
independent of the gender of the purchaser.

Gender
Total
Male Female

Blue

Red
Colour
Green

Silver

Total

$H_0$: Gender and car colour are independent.
$H_1$: Gender and car colour are not independent.
Set up hypotheses.
The expected values are:

Male Female
Blue
Red
Green
Silver

Find expected values, using Key point 3.1. Check that all the expected frequency values are above 5, which
they are in this case. Also check that the row and column totals match the row and column totals in the
original table of observed values.

Find the chi-squared value, using Key point 3.2.

Find the number of degrees of freedom, using Key point 3.3.

The critical value is which is more than .
Use the formula book to find the critical value.

Therefore do not reject $H_0$; the data set is consistent with gender and car colour being independent.
Remember that $\chi^2_{\text{calc}}$ is a measure of distance between observed and expected values, so your
calculated distance is less than the critical distance.

Worked example 3.2 shows how to deal with combining groups.

WORKED EXAMPLE 3.2

This contingency table shows the favourite sports played by different age groups in a sample at a
sports centre.

Age

Soccer

Basketball
Favourite
sport
Swimming

Tennis

a Test at the significance level whether preferred sport and age are independent, showing the
contributions of each cell.

b Interpret your results in context.

a $H_0$: Age and preferred sport are independent.
$H_1$: Age and preferred sport are not independent.
Set up hypotheses.

The expected values are:

Soccer
Basketball
Swimming
Tennis

Find expected values, using Key point 3.1. Check that all the expected values are above 5,
which they are not in this case. Notice that the row and column totals are the same as those in
the observed data table.

Several cells in the range have a frequency less than 5, so combining this column with the column:
The most obvious choice is to combine the and groups.

Soccer

Basketball

Swimming

Tennis

And the corresponding observed values are:

Soccer

Basketball
Swimming

Tennis

The contributions of each cell are: Use Key point 3.2.

Soccer

Basketball

Swimming

Tennis

The critical value is .

Therefore reject $H_0$; preferred sport and age are dependent.
Compare with the critical value from the formula book and conclude.

b The main contributions to $\chi^2_{\text{calc}}$ come from basketball and swimming. It appears
swimming is more popular than would be expected amongst younger children,
whilst basketball is more popular than would be expected amongst older children.
Use the contributions of each cell to find the most important factors.

You might have to construct a contingency table from given information.

WORKED EXAMPLE 3.3

A biologist claims that the mobility of fish is dependent on their breeding ground. A sample of
fish was taken, with equal numbers from each of the two breeding grounds, Ellesmere and Duxbury,
studied. A test was used to classify the fish as sedentary, normal or highly mobile. In Ellesmere half
of the fish were classified as highly mobile and one-fifth as normal. In Duxbury one quarter of the
fish were classified as normal and of the Duxbury fish were classified as sedentary.

Test the biologist’s claim at the significance level.

$H_0$: Mobility and breeding ground are independent.
$H_1$: Mobility and breeding ground are dependent.
First write the null and alternative hypotheses.
Observed values:

Sedentary Normal Highly mobile
Ellesmere
Duxbury

Create a contingency table. There are fish in each location. Turn the given proportions in
each location into frequencies and use the fact that each row adds up to to complete the table.

Expected values:

Sedentary Normal Highly mobile
Ellesmere
Duxbury

You can use Key point 3.1 to calculate the expected values. All expected values are larger
than 5 so no combining is required.

Find the value of $\chi^2_{\text{calc}}$, using Key point 3.2.

Find the number of degrees of freedom, using Key point 3.3.

The critical value is so do not reject $H_0$. There is no significant evidence
that mobility depends upon breeding ground.

EXERCISE 3A

1 Test these contingency tables to see if the two variables are dependent at the significance level.
State carefully the number of degrees of freedom and the value of $\chi^2_{\text{calc}}$. In part b, combine
suitable columns to make all expected frequencies greater than 5.
a i Exam grade
or , or or

Mr Archer

Teacher Ms Baker

Mrs Chui

ii Time working
hours hours hours

Male
Gender
Female

b i Age

Social
media
followers

ii Cost

Red

Colour Green

Blue

2 A Physics teacher wants to investigate whether or not there is any association between the
Physics grade her students get and the Mathematics course they study. She collects data for a
random sample of students over several years. The results are given in this table.
or lower or or

Further Maths

Maths AS or
No Maths

a State the null and alternative hypotheses.


b Calculate the expected frequencies.
c Calculate the value of $\chi^2_{\text{calc}}$ and write down the number of degrees of freedom.
d Test at the significance level whether the Physics grade is independent of the Mathematics
course studied. Show clearly how you arrived at your conclusion. Interpret your results in
context.
3 A random sample of books was taken from a library. One third of the books were fiction and
the rest were non-fiction. The reading level of each book was assessed as elementary, moderate
or advanced. of the non-fiction books were classified as advanced and were classified as
elementary. One quarter of the fiction books were classified as elementary. There were the same
number of moderate fiction and moderate non-fiction books.
Conduct an appropriate test to determine, at the significance level, if there is evidence to
suggest that reading level depends on whether a book is fiction or non-fiction.
4 James wanted to know whether people being late, early or on time to school depends upon their
mode of transport. This partially filled contingency table shows his results, based on asking
students.

Early On time Late Total

Walk

Car

Other

Total
a Copy and complete the contingency table.
b Calculate the value of $\chi^2_{\text{calc}}$ for this data.
c Conduct an appropriate test at the significance level to answer James’ question.
d What assumptions does James have to make in conducting this test?
5 The owner of a beauty salon wants to find out whether there is any association between the
number of times in a year people visit the salon and the amount of money they spend on each
visit. He collects these data for a random sample of clients.
Number of visits

Amount spent
per visit £

Is there evidence, at the level of significance, that there is some association between the
number of visits and the amount of money spent? Interpret your result in context.
6 A drugs manufacturer claims that the speed of recovery from a certain illness is higher for people
who take a higher dose of their new drug. They provide these data for a sample of patients.
No drug taken Single dose Double dose

days

days

days
days

Test whether there is evidence for the manufacturer’s claim at the level of significance.
Interpret your results in context.
7 A company is investigating their gender equality policies. As a part of this investigation they
collect data on salaries, to the nearest pound, for a random sample of employees, as shown in
this table.

Male Female

a Assuming salary is independent of gender, calculate the corresponding expected frequencies.


b Carry out a suitable test to determine whether salary is independent of gender. State and
justify your conclusion at the level of significance.

8 a Prove that , where .

b A contingency table has . Find the largest possible sample size.

c Find the largest sample size that will produce a significant result in the chi-squared test at the
significance level, assuming that all cells have an expected frequency of at least 5.
9 A researcher believes that these percentages are the true proportions of people voting for
different political parties based on their gender:

Male Female

Party A

Party B

Party C

a Show that gender and voting intention are dependent.


b Show that if a sample of size follows these proportions, it will not provide a significant result
in the chi-squared test at significance.
c Calculate an estimate of the smallest sample size required to find significant evidence that
gender and preferred political party are dependent, using a significance level.

d Explain why your answer to part c is only an estimate.


10 This contingency table has some blank spaces.

First factor
Total

Second factor B

Total
a Copy the table and fill in the blanks.
b Hence explain why it can be said that this contingency table has degrees of freedom.
11 Explain why the formula for chi-squared contains:
a squaring before summing
b dividing by the expected value.
Section 2: Yates’ correction
It turns out that when the number of degrees of freedom $\nu = 1$ (i.e. a $2 \times 2$ contingency table) then
the approximation that $\chi^2_{\text{calc}} \sim \chi^2_1$ is not very good. To improve upon this you use an alternative
formula, called Yates’ correction.

Key point 3.5

Yates’ correction:

$\chi^2_{\text{Yates}} = \sum \dfrac{(|O_i - E_i| - 0.5)^2}{E_i}$

Rewind

You met the modulus function, $|x|$, in A Level Mathematics Student Book 2, Chapter 3.
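Yates’ correction subtracts 0.5 from the size of each difference before squaring. A minimal Python sketch, using an invented 2 by 2 table (not the data from any worked example) and assumed function names, shows the effect next to the uncorrected formula.

```python
def chi_squared(observed, expected):
    """Uncorrected statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def yates_chi_squared(observed, expected):
    """Yates' correction: sum of (|O - E| - 0.5)^2 / E, for one degree of freedom."""
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Hypothetical 2 by 2 table (row totals 40 and 60, column totals 60 and 40).
observed = [[30, 10],
            [30, 30]]
expected = [[24.0, 16.0],
            [36.0, 24.0]]

print(chi_squared(observed, expected))
print(yates_chi_squared(observed, expected))
```

For this table the corrected value is smaller than the uncorrected one, which makes the test slightly less likely to reject the null hypothesis.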

WORKED EXAMPLE 3.4

This contingency table shows the results of people in a driving test, along with their
gender.

Gender

Male Female

Pass
Result
Fail

Test at the significance level if the outcome of the test is independent of gender.

$H_0$: Gender and result are independent.
$H_1$: Gender and result are not independent.
Set up hypotheses.

The expected values are:

Gender
Male Female
Pass
Result
Fail

Find the expected values, using Key point 3.1.

Use Key point 3.5.

$\nu = 1$ so the critical value is .

You cannot reject the null hypothesis; there is no significant evidence that the test
outcome depends on gender.

WORK IT OUT
3.1

Test at the significance level if there is any association between teacher and test result:

Teacher
Total
Mr A Mrs B

Pass
Result
Fail

Total

Which is the correct solution? Identify the errors made in the incorrect solutions.

A If there is no association then each cell will be the same, so the expected values are:
Teacher

Mr Mrs

Pass
Result
Fail

So

The critical value when is , so you can reject ; the result does not
depend on the teacher.

B $H_0$: The result depends on the teacher.
$H_1$: The result does not depend on the teacher.

The expected values are:

Teacher

Mr Mrs

Pass
Result
Fail

Using Yates’ correction:

The critical value is , which is more than the calculated value, so do not reject $H_0$;
the result does depend on the teacher.

C $H_0$: Teacher and result are independent.
$H_1$: Teacher and result are dependent.

The expected values are:

Teacher

Mr Mrs

Pass
Result
Fail

Using Yates’ correction:

The critical value is , which is more than the calculated value, so do not reject $H_0$;
the result and the teacher are independent.

Sometimes the need for Yates’ correction only becomes clear after you combine rows and columns of a
contingency table.

WORKED EXAMPLE 3.5

This contingency table shows the location and ownership status of a random sample of
houses.

Owned outright Owned with mortgage Rented

Urban

Rural

Does this sample provide evidence at the significance level that ownership status depends
on location?

$H_0$: Ownership status and location are independent.
$H_1$: Ownership status and location are dependent.
First write the null and alternative hypotheses.

Expected values:

Owned outright Owned with mortgage Rented
Urban
Rural

Use Key point 3.1 to find the expected values.

Since there are two cells with an expectation below 5, you must combine rows or columns.
The most reasonable combination here is the two types of owned category.

Combining the two owned categories gives observed data:

Owned Rented
Urban
Rural

The expected data is:

Owned Rented
Urban
Rural

You do not need to use Key point 3.1 again. You can just add the expected values of the
appropriate cells in the previous table.

Since there is now only one degree of freedom, it is appropriate to use Yates’ correction.

The critical value is , which is greater than the observed value, so you do not
reject $H_0$. There is no significant evidence that ownership status depends on location.

EXERCISE 3B

1 Use Yates’ correction to test these contingency tables for evidence of association, using
significance.
a i

ii

b i

ii

2 Gregor Mendel, the founder of modern genetics, carefully observed peas and found these
results:
Wrinkled Round
Yellow
Green

Show that the round or wrinkled appearance of the pea is independent of the colour at the
level of significance.

Did you know?


These results are actually suspiciously close to being perfect – some people believe
Mendel faked his results. However, it is not possible to conduct a hypothesis test to
check this. How is statistics used to check for authenticity in results? In particular, how
is Benford’s law used to check tax returns?

3 A scientist wanted to find out if a colleague could tell whether tea or milk was put in the cup first
when tea was prepared for her. The results are shown here.
Tea first Milk first
Likes
Dislikes

Determine, at the level of significance, whether the colleague’s enjoyment is independent
of whether tea or milk is added first.

Did you know?


‘The lady tasting tea’ was one of the experiments reported by eminent statistician
Ronald Fisher in his book, The Design of Experiments. He used a variant on the chi-
squared test, called Fisher’s exact test.

4 This table shows the number of books in libraries in rural and urban locations.
Number of books
to
Rural
Urban

Conduct a test at the significance level to determine if the number of books differs between
rural and urban libraries.
5 These data show the number of murders each year and the amount spent on horror films in the
cinema across the last years in the UK.
Amount spent on horror films in million £
to
Number of
murders

Test at the significance level to see if there is an association between the amount spent on
horror films and the number of murders each year. Does this provide evidence that watching
horror films encourages people to commit murder?
6 In admissions to the six largest departments in Berkeley, a university in California, followed
this pattern.
Accepted Rejected
Male
Female

a Conduct a chi-squared test at significance to show that acceptance patterns depend on
gender. Is a higher percentage of men or women admitted? Is this evidence of bias?
b Data from six departments are shown here.
Men Women
Total
Department Admitted Rejected Admitted Rejected

Total

Conduct a test at the significance level to determine if acceptance patterns vary in
different departments. In how many departments is the proportion of men admitted higher
than the proportion of women admitted?

Did you know?


This effect is called Simpson’s Paradox. You have to be very careful when using
statistics to support arguments!
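A small made-up example shows how Simpson’s Paradox can arise. The figures below are hypothetical (they are not the Berkeley data) and the names used are assumptions.

```python
# Hypothetical admissions data: (admitted, applied) for each group in two departments.
departments = {
    "X": {"men": (80, 100), "women": (9, 10)},
    "Y": {"men": (2, 10), "women": (30, 100)},
}

def rate(admitted, applied):
    """Proportion of applicants admitted."""
    return admitted / applied

def overall_rate(group):
    """Admission rate for a group with both departments pooled together."""
    admitted = sum(d[group][0] for d in departments.values())
    applied = sum(d[group][1] for d in departments.values())
    return admitted / applied

# Within each department, women have the higher admission rate ...
women_ahead_in_every_department = all(
    rate(*d["women"]) > rate(*d["men"]) for d in departments.values()
)
# ... but once the departments are pooled, men have the higher rate.
men_ahead_overall = overall_rate("men") > overall_rate("women")

print(women_ahead_in_every_department, men_ahead_overall)
```

The reversal happens because, in this invented data, most women apply to the department that admits few applicants, while most men apply to the one that admits many.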

7 Explain why the null hypothesis in a chi-squared test cannot be ‘The two variables are
dependent’.

Checklist of learning and understanding

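The $\chi^2$ distribution provides a very important method for deciding if two variables are
independent.
If the variables are independent, you use the formula

expected frequency $= \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

to find the expected values in each cell.

The test statistic used is

$\chi^2_{\text{calc}} = \sum \dfrac{(O_i - E_i)^2}{E_i}$

$\nu$ is the number of degrees of freedom, calculated as (rows $- 1$) $\times$ (columns $- 1$).

If $\nu > 1$, then $\chi^2_{\text{calc}} \sim \chi^2_\nu$.
If $\nu = 1$, then you use an alternative formula, called Yates’ correction:

$\chi^2_{\text{Yates}} = \sum \dfrac{(|O_i - E_i| - 0.5)^2}{E_i}$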
Mixed practice 3
1 What is the number of degrees of freedom for a contingency table?

Choose from these options.

D More information needed.

2 A contingency table has all expected frequencies larger than and a chi-squared value of
. What is the largest range of values of for which there is evidence that the two factors are
not independent at significance?

Choose from these options.

3 The area manager of a bank obtained information on randomly selected loans made by
the bank during the previous two years.

The loan outcomes were categorised as ‘Satisfactory’ or as a ‘Bad debt’.

The loan recipient types were categorised as ‘Individual’, ‘Small business’ or ‘Large business’.

Recipient

Individual Small business Large business

Satisfactory
Outcome
Bad debt

Using a $\chi^2$ distribution and the level of significance, test whether the outcome of a loan is
independent of the type of recipient.

Interpret your conclusion in the context of the question.

[© AQA 2013]

4 This contingency table shows the data on hair colour and eye colour for a sample of
children.

Eye colour

Blue Green Brown

Brown
Hair colour
Blonde

a Assuming that hair colour and eye colour are independent, calculate the expected
frequencies.
b Calculate the value of the $\chi^2$ statistic for this data and state the number of degrees of
freedom.

c Perform a suitable hypothesis test at the level of significance to decide whether hair
colour and eye colour are independent. State your hypotheses and your conclusion clearly.

5 A nurse thinks that she has noticed that more boys are born at certain times of the year. She
records the data for babies born in her hospital in one year.

Spring Summer Autumn Winter

Boy

Girl

Test at the significance level whether her data gives evidence for any association between
the gender of the baby and the time of the year. You must show all your working clearly.

6 Find the value of the appropriate chi-squared test statistic (to three significant figures) for this
contingency table.

Choose from these options.

7 A large estate agency would like all the properties that it handles to be sold within three
months. A manager wants to know whether the type of property affects the time taken to sell
it. The data for a random sample of properties sold are tabulated here.

Type of property
Total
Flat Terraced Semidetached Detached

Sold within three months

Sold in more than three months

Total

a Conduct a test, at the level of significance, to determine whether there is an


association between the type of property and the time taken to sell it. Explain why it is
necessary to combine two columns before carrying out this test.

b The manager plans to spend extra money on advertising for one type of property in an
attempt to increase the number sold within three months. Explain why the manager might
choose:

i terraced properties

ii flats.

[© AQA 2013]
8 Fiona, a lecturer in a school of engineering, believes that there is an association between the
class of degree obtained by her students and the grades they had achieved in A Level
Mathematics.

In order to investigate her belief, she collected the relevant data on the performances of a
random sample of recent graduates who had achieved grades or in A Level
Mathematics. These data are tabulated here.

Class of degree
Total

A Level grade

Total

a Conduct a test, at the level of significance, to determine whether Fiona’s belief is


justified.

b Make two comments on the degree performance of those students in the sample who
achieved a grade in A Level Mathematics.

[© AQA 2012]

9 An organisation kept details of sideswipe accidents involving heavy goods vehicles (HGVs)
during 2006.

The type of each sideswipe accident was recorded as changing lane to the left, changing lane
to the right or overtaking moving vehicle.

The HGV involved was identified as either British registered (right-hand drive) or foreign
registered (left-hand drive).

The table summarises details for a random sample of sideswipe accidents.

Type of sideswipe accident Total

Changing Changing Overtaking


lane to lane moving
the left to the right vehicle

British registered HGV

Foreign registered HGV

Total

a i Investigate, at the significance level, whether the type of sideswipe accident is


independent of whether the HGV involved was British registered or foreign registered.

ii Describe any differences found in the type of sideswipe accident between British
registered and foreign registered HGVs.

b A further random sample of serious HGV accidents was investigated. It was found that
of these involved drivers who were years of age or younger. Of these accidents,
resulted in prosecution for a driving offence. Of the other accidents, which involved drivers
over the age of years, resulted in prosecution for a driving offence.

i Form a contingency table from this information.

ii Carry out a test, at the significance level, to investigate whether the age of the
driver is independent of whether a prosecution for a driving offence resulted.
Interpret your conclusion in context.
[© AQA 2011]

10 The director of a large company wants to know whether there is any association between the
ages of her staff and the departments they work in. The table shows the data for a sample of
employees.

Accounts

Personnel

Marketing

Communications

Perform a suitable test at the level of significance to decide whether there is any
association between age and department.

11 The table shows the experience of a bank over a long period of the types of loan that they
give and whether they are repaid or defaulted (i.e. not repaid).

Repaid Defaulted

Personal

Mortgage

Business

a Show that whether or not a loan gets repaid depends on the type of loan.

b A statistician wants to sample loans at random. Show that would not show
dependence between the two values, using significance.

c Find the smallest whole number for which the sample would be expected to show
dependence at significance. You can assume that all expected values are above .

d State an additional assumption required for your calculations in parts b and c.

12 Research was carried out to investigate a possible connection between weekly alcohol
consumption and development of Type 2 diabetes. In the research report, it was stated that a
sample of women, aged between and , was studied and that of these women went
on to develop Type 2 diabetes.

The women were categorised according to their average level of weekly alcohol
consumption. This was measured, in grams of alcohol per week, as ‘less than ’, ‘between
and ’ or ‘more than 30’.

The results are summarised in the table.

Type 2 diabetes
developed

Yes No

Less than

Average level of weekly alcohol Between and


consumption

More than

a Test, at the level of significance, whether the development of Type 2 diabetes is
independent of the average level of weekly alcohol consumption.

b A medical reviewer for a newspaper read the report and then stated that people should
increase their weekly alcohol consumption in order to decrease their chance of developing
Type 2 diabetes.

Make two comments on his statement, referring to both the study and the sources of
association, if any, identified when carrying out the test in part a.

c In fact, women were involved in the research but the frequencies in the resulting
contingency table had been divided by in order to make the calculation simpler.

The test in part a was therefore repeated using the correct frequencies.

For this test, state:

i the critical value

ii the value of the test statistic

iii the conclusion.

[© AQA 2010]
4 Continuous distributions

In this chapter you will learn how to:

describe probabilities of continuous random variables


calculate expected statistics of continuous random variables and functions of
continuous random variables
find the median, mode and quartiles of continuous random variables
find the expected statistics of the sum of two continuous random variables
work with the sum of two normally distributed random variables.
If you are following the A Level course, you will also learn how to:
convert between probability density function, $f(x)$, and cumulative probability
function, $F(x)$
use distributions of random variables that are part discrete and part continuous
use two new probability distributions – the rectangular and the exponential
use the cumulative distribution function to find the distribution of the function of a
random variable.

Before you start …


Chapter 1
You should know how to calculate expectations and variances for discrete distributions.
1 Find , given that has the distribution:

A Level Mathematics Student Book 1, Chapter 20
You should know the meaning of the statistical measures covered in AS Level Mathematics.
2 Find the interquartile range of:

A Level Mathematics Student Book 1, Chapter 14
You should know how to integrate all functions from AS Level Mathematics.
3 Find

A Level Mathematics Student Book 2, Chapters 9 and 11
You should know how to integrate all functions from A Level Mathematics.
4 Find

A Level Mathematics Student Book 2, Chapter 20
You should know how to use the rules of probability including conditional probability.
5 Given that and , find .

A Level Mathematics Student Book 2, Chapter 21
You should be able to perform calculations with a normal distribution.
6 Given that , find .

From discrete to continuous


In Chapter 1 you saw that being able to describe random variables allowed you to make predictions about
their properties. However, a major limitation was that the methods in Chapter 1 only applied to discrete
variables. In reality, many variables you are interested in, such as height, weight and time, are continuous
variables. In this chapter you will extend the methods from Chapter 1 to work with continuous random
variables.
Section 1: Continuous random variables
Consider these data for the masses of several bags of rice labelled ‘ ’.

Mass Frequency

Not all of the data in category has a mass of exactly . A bag with mass or
would be included in this category. It is impossible to list all the different possible actual masses, and it is
impossible to measure the mass absolutely accurately. When you collect continuous data, you have to put
it into groups. This means that you cannot talk about the probability of a single value of a continuous
random variable (CRV). You can only talk about the probability of the CRV being in a specified interval.

[Number line for x from 5.05 to 5.15, with the value 5.087 954 6 marked.]

A useful way of representing probabilities of a CRV is as an area under a graph. The probability of a single
value would correspond to the ‘area’ of a vertical line, which would be zero. However, you can find the
probability of the CRV lying in any interval by integration.

The function which you have to integrate is called the probability density function (PDF), and it is often
denoted $f(x)$. The defining feature of $f(x)$ is that the area between two values is the probability of the CRV
falling between those two values.

Tip

For a continuous random variable, it does not matter whether you use strict inequalities
($a < X < b$) or inclusive inequalities ($a \le X \le b$).

Key point 4.1

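For a continuous random variable $X$ with probability density function $f(x)$:

$P(a < X < b) = \int_a^b f(x)\,\mathrm{d}x$

[Graph of y = f(x): the shaded area under the curve between x = a and x = b represents P(a < X < b).]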

As with discrete probabilities, the total probability over all cases must equal 1. Also, no probability can ever
be negative. This provides two requirements for a function to be a probability density function.

Key point 4.2

For $f(x)$ to be a probability density function, it must satisfy:

$f(x) \ge 0$, for all $x$

$\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$

Tip

The limits $-\infty$ and $\infty$ represent the fact that, in theory, a continuous random variable can take
any real value. In practice, the limits of the integral are set to the lowest and the highest value
the variable can take.
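Key points 4.1 and 4.2 can be explored numerically. The sketch below uses a made-up density, f(x) = 3x² for x between 0 and 1 and zero elsewhere, and approximates the integrals with the midpoint rule; the function names and the choice of density are assumptions for illustration only.

```python
def integrate(f, lower, upper, steps=10_000):
    """Approximate the integral of f from lower to upper using the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))

def f(x):
    """Hypothetical PDF: 3x^2 between 0 and 1, zero elsewhere."""
    return 3 * x ** 2 if 0 <= x <= 1 else 0.0

# Key point 4.2: the total area must be 1. The limits are 0 and 1 because
# the PDF is only non-zero between those values.
total_area = integrate(f, 0, 1)

# Key point 4.1: P(0.5 < X < 1) is the area under the curve between 0.5 and 1.
probability = integrate(f, 0.5, 1)

print(round(total_area, 6), round(probability, 6))
```

Integrating exactly gives a total area of 1 and a probability of 1 - 0.5³ = 0.875, so the numerical results should agree with these to several decimal places.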

WORKED EXAMPLE 4.1

The continuous random variable shown has probability density function:

[Graph of f(x) against x, with x = 1 marked on the horizontal axis.]

a Find the value of .


b Find the probability of being between and .

a
The total area is . The limits are and because the PDF is only
non-zero between and .

b Use the formula in Key point 4.1 and substitute the value of
found in part a.

EXERCISE 4A

1 For each of these distributions find the possible values of the unknown parameter .

a i

ii

b i

ii

c i
ii

d i

ii

e i

ii

f i

ii

g i

ii

h i

ii

2 In each part, a continuous random variable has the given probability density function.

a
i Find

ii Find

i Find

ii Find

i Find
ii Find .
3 In each part, a continuous random variable has the given probability density function.

i Find if
ii Find if

i Find if
ii Find if
c

i Find if

ii Find if
4 A model predicts that the angle, , an alpha particle is deflected by a nucleus is modelled by
the PDF

a Find the value of the constant


b alpha particles are fired at a nucleus. Assuming that the model is correct, estimate the
number of alpha particles deflected by less than .

5 The probability density function of finding a seed at a distance from a tree is proportional to
. The minimum distance seed is found from the tree is . Find the probability of a seed
being found more than from the tree.

6 A random variable has PDF

Find the exact value of

7 Given that the continuous random variable has probability density function

, find the interquartile range of .

8
A continuous random variable has probability density function

a Find in terms of if

b Find in terms of if
9 The continuous random variable has probability density function for and
otherwise. The probability of two independent observations of both being above is .
Find the values of and of .

10
The continuous random variable has probability density function . Find

11 The continuous random variable has probability density function given by for
. Prove that there is only one possible value of , and state its value.
Section 2: Expectation and variance of continuous random variables
The expressions for expectation and variance of continuous random variables all involve integration.

Key point 4.3

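The expectation and variance of a continuous random variable $X$ with probability density function $f(x)$ are:

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$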

Tip

You might notice that the expressions for $E(X)$ and $\mathrm{Var}(X)$ look similar to those for discrete
random variables, but with integration instead of summation signs. This is because there is a link
between sums and integrals.

You need to evaluate these integrals over the whole domain of the probability density function.
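
As a rough illustration of these integrals, the sketch below evaluates $E(X)$, $E(X^2)$ and the variance numerically. The density $f(x) = 3x^2$ on $0 \le x \le 1$ and the helper `simpson` are assumptions chosen for this example.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(x):
    """Hypothetical PDF: 3x^2 on its domain 0 <= x <= 1."""
    return 3 * x**2


lower, upper = 0, 1  # integrate over the whole domain of the PDF

mean = simpson(lambda x: x * f(x), lower, upper)        # E(X), exactly 3/4
mean_sq = simpson(lambda x: x**2 * f(x), lower, upper)  # E(X^2), exactly 3/5
variance = mean_sq - mean**2                            # exactly 3/80
sd = variance ** 0.5

print(round(mean, 4), round(variance, 4), round(sd, 4))
```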

WORKED EXAMPLE 4.2

A continuous random variable has pdf:

Find and the standard deviation of .

You can do the definite integration on your calculator.

To find the standard deviation you must first find $\mathrm{Var}(X)$,
which requires you to find $E(X^2)$.

It is also possible to find the median and mode for a continuous distribution.

The defining feature of the median is that half of the data should be below this value and half above. You
can interpret this in terms of probability.

Key point 4.4

If you represent the median of a continuous random variable $X$ with PDF $f(x)$ by $m$, then it satisfies
$\int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$.

The mode is the value of $x$ at the maximum value of $f(x)$.


Common error

Don't forget to look at the end points of the function when finding the mode.

You can use similar ideas to find the quartiles (or any other percentile). For example, if $Q_1$ is the lower
quartile and $Q_3$ is the upper quartile then $\int_{-\infty}^{Q_1} f(x)\,dx = 0.25$ and $\int_{-\infty}^{Q_3} f(x)\,dx = 0.75$.

Although the lower limit is written as minus infinity, in practice it starts from the lowest value for which the
probability density function is defined.
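
The median and quartile conditions are equations in the upper limit of an integral, so they can also be solved numerically. The sketch below assumes a hypothetical density $f(x) = 2x$ on $0 \le x \le 1$; the names `simpson`, `cdf` and `percentile` are illustrative helpers, not part of the text.

```python
def simpson(f, a, b, n=200):
    """Approximate the integral of f from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(x):
    """Hypothetical PDF: 2x on 0 <= x <= 1."""
    return 2 * x


def cdf(x):
    """Area under the PDF from the lowest possible value (0) up to x."""
    return simpson(f, 0, x)


def percentile(p, lo=0.0, hi=1.0, tol=1e-10):
    """Solve cdf(x) = p by bisection; this works because the area never decreases with x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = percentile(0.5)                     # exact value is 1/sqrt(2)
q1, q3 = percentile(0.25), percentile(0.75)  # exact values are 0.5 and sqrt(0.75)
iqr = q3 - q1

print(round(median, 4), round(iqr, 4))
```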

WORKED EXAMPLE 4.3

Find the median and mode of the random variable with probability density function

For median :

Use the formula in Key point 4.4 with the lower limit as .

Using your calculator:


This is a cubic equation. Use your calculator to solve it.

For the mode, check for a maximum point. This could be where the
derivative is zero or at an end point.

Hence the mode is . The largest of these three numbers is .

EXERCISE 4B

1 Find , the median of , the mode of and if has the given probability density
function.

a i

ii

b i

ii

c i
ii

d i

ii

2 a Given that , find if:

ii

b Given that , find if:

ii

3
The continuous random variable has pdf

a Find the expected mean of .

b Find .

4 A continuous random variable has pdf

a Find the value of the constant .


b Find .

5 Consider the function

a Show that, for all values of , the function satisfies the conditions to be a PDF.

b The random variable has probability density function . Find in terms of .

6 is a continuous random variable with probability density function

a Show that .

b Given that , find the exact value of .

7 Given that , is a probability distribution, find and prove that

.
Section 3: Expectation and variance of functions of a random variable
Linear transformations
Suppose the average height of students in a class was and their standard deviation was . If they
all stood on their -high chairs then the new average height would be , but the range, and any
other measure of variability, would not change, so the standard deviation would still be . In other
words, if you add a constant on to a variable, you add the same constant on to the expectation, but the
variance does not change: $E(X + c) = E(X) + c$ and $\mathrm{Var}(X + c) = \mathrm{Var}(X)$.

Rewind

You have already met this idea for discrete random variables in Key point 2.5. In this chapter
you extend it to continuous random variables.

If, instead, each student were given a magical growing potion that doubled their heights, the new average
height would be . However, the range, along with any other measure of variability, would have
doubled, so the new standard deviation would be . This means that their variance would change from
to . In other words, if you multiply a variable by a constant, you multiply the expectation by
the constant and multiply the variance by the constant squared: $E(aX) = aE(X)$ and $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.

Common error

It is important to know that this only works for the structure $aX + b$, which is called a linear
function. So, for example, cannot be simplified to or .

Key point 4.5

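For a random variable $X$ with expectation $E(X)$ and variance $\mathrm{Var}(X)$, if $a$ and $b$ are constants,
then
$E(aX + b) = aE(X) + b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$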
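
Key point 4.5 can be sketched as a small helper. The function name `linear_transform` and the pipe lengths used below are hypothetical, chosen only to mirror the kind of situation in Worked example 4.4.

```python
def linear_transform(mean, sd, a, b):
    """Return the mean and standard deviation of aX + b, given those of X.

    E(aX + b) = a E(X) + b and Var(aX + b) = a^2 Var(X),
    so the standard deviation is multiplied by |a| and is unaffected by b.
    """
    return a * mean + b, abs(a) * sd


# Hypothetical numbers: a 3 m pipe is cut so that the long piece L has
# mean 2.2 m and standard deviation 0.1 m. The short piece is S = 3 - L,
# which is a linear function of L with a = -1 and b = 3.
short_mean, short_sd = linear_transform(2.2, 0.1, a=-1, b=3)

print(round(short_mean, 3), round(short_sd, 3))
```

The negative coefficient changes the sign of the mean term but not the spread, which is why the short piece has the same standard deviation as the long piece.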

WORKED EXAMPLE 4.4

A length of pipe is cut into a long pipe with average length and standard deviation .
The leftover piece is used as a short pipe. Find the mean and standard deviation of the short pipe.
 Length of long pipe
Define your variables.
 Length of short pipe
Connect your variables.

Apply Key point 4.5.

So the standard deviation of is also .

General transformations
Key point 1.6 stated that for a discrete random variable $E(g(X)) = \sum_i g(x_i)\,p_i$
You can extend this to continuous random variables by changing the probability into a probability density
function and integrating instead of summing.

When finding the variance you used the fact that $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$.

This can be generalised to any function of $X$.

Common error

You will always get a positive variance (since square numbers are always positive), even if the
coefficients are negative. If you find you have a negative variance, something has gone wrong!

Key point 4.6

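If $X$ is a continuous random variable with pdf $f(x)$, then:

$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

$\mathrm{Var}(g(X)) = E\left(g(X)^2\right) - [E(g(X))]^2$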
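
A short worked calculation may help here. It assumes a density $f(x) = 2x$ for $0 \le x \le 1$ and the function $g(X) = X^2$, both chosen for illustration rather than taken from the text.

```latex
E(X^2) = \int_0^1 x^2 \cdot 2x \, dx = \left[ \tfrac{1}{2} x^4 \right]_0^1 = \tfrac{1}{2}

E\left( (X^2)^2 \right) = \int_0^1 x^4 \cdot 2x \, dx = \left[ \tfrac{1}{3} x^6 \right]_0^1 = \tfrac{1}{3}

\mathrm{Var}(X^2) = \tfrac{1}{3} - \left( \tfrac{1}{2} \right)^2 = \tfrac{1}{12}
```

For comparison, $E(X) = \int_0^1 x \cdot 2x \, dx = \frac{2}{3}$, so $[E(X)]^2 = \frac{4}{9} \ne \frac{1}{2} = E(X^2)$: the expectation of a non-linear function is not the function of the expectation.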

WORKED EXAMPLE 4.5

Given that the random variable has probability density function for and

otherwise, find .

Identify and . The limits are between and


because that is where is not zero.

EXERCISE 4C

1 Given that , find:

a i

ii

b i

ii

c i

ii
d i
ii
e i
ii .
2 Given that , find:
a i

ii

b i

ii

c i

ii
d i
ii
e i
ii
3 Given that is a continuous random variable with PDF for and otherwise,
find:

a i

ii

b i

ii

c i
ii
d i
ii
4 The expected distance of a random taxi journey is miles with standard deviation miles.
The charge for a taxi journey is £ plus £ per mile (so that, for example, a mile journey
would cost £ ). Find:

a the expected value


b the standard deviation in the charge for a taxi journey.
5 The random variable has and . Given that and , find
.

6 Daniel has hours of playtime each Sunday afternoon. In that time he either reads or plays
games. If the expected amount of time reading is hours with a standard deviation of hours,
find:

a the expected amount of time playing games


b the standard deviation in the amount of time spent playing a game.
7 The side of a cube, , is a continuous random variable with pdf for and
otherwise.

a Find .
b Find the expected volume of the cube.

Common error

Notice that the answer to part b is not the cube of the answer to part a.
8 The continuous random variable has probability density function for
and otherwise.

Find:

a
b
c .
9 The continuous random variable has probability density function for
and otherwise.

a Find where is a positive whole number.


b Find

.
Section 4: Sums of independent random variables
A tennis racquet is formed by adding together two components – the handle and the head. If both
components have their own distribution of length and they are combined together randomly then you have
formed a new random variable – the length of the racquet. It is not surprising that the average length of
the racquet is the sum of the average lengths of the parts, but with a little thought you can reason that the
standard deviation will be less than the sum of the standard deviations of the parts. To get extremely long
or extremely short tennis racquets you must have extremes in the same direction for both the handle and
the head. This is not very likely. It is more likely that:

both are close to average


an extreme value is paired with an average value
an extreme value in one direction is balanced by another.

Tip

The first of the results in Key point 4.7 is true even if $X$ and $Y$ are not independent.

Key point 4.7

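For independent random variables $X$ with expectation $E(X)$ and variance $\mathrm{Var}(X)$, and $Y$ with
expectation $E(Y)$ and variance $\mathrm{Var}(Y)$:

$E(X + Y) = E(X) + E(Y)$

$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$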

The results in Key point 4.7 extend to more than two variables.
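
The extension to several variables can be sketched as follows. The helper `sum_independent` and the thicknesses are hypothetical, in the spirit of a stack of independent components; note that variances are added, not standard deviations.

```python
from math import sqrt


def sum_independent(parts):
    """Mean and standard deviation of a sum of independent random variables.

    parts is a list of (mean, variance) pairs, one pair per variable.
    """
    mean = sum(m for m, _ in parts)
    variance = sum(v for _, v in parts)  # add variances, never standard deviations
    return mean, sqrt(variance)


# Hypothetical thicknesses in cm, given as (mean, variance).
parts = [(1.4, 0.02), (3.0, 0.14), (2.2, 0.20)]
total_mean, total_sd = sum_independent(parts)

print(round(total_mean, 3), round(total_sd, 3))
```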

WORKED EXAMPLE 4.6

The mean thickness of the base of a burger bun is with variance

The mean thickness of a burger is with variance .

The mean thickness of the top of a burger bun is with variance .

Find the mean and standard deviation of the total height of a whole burger in a bun, assuming that
the thicknesses of the individual parts are independent.

Define your variables.

Connect your variables.


Apply Key point 4.7.

So the standard deviation of is .

In Key point 4.7 it was stressed that $X$ and $Y$ have to be independent, but this does not mean that they
have to be drawn from different populations. They could be two different observations of the same
population, for example, the heights of two different people added together. This is a different variable
from the height of one person doubled. Use a subscript to emphasise when there are repeated
observations from the same population:

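$X_1 + X_2$ means adding together two different observations of $X$

$2X$ means observing $X$ once and doubling the result.

The expectation of both of these combinations is the same: $2E(X)$. However, the variance is different.

From Key point 4.7: $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = 2\,\mathrm{Var}(X)$

From Key point 4.5: $\mathrm{Var}(2X) = 4\,\mathrm{Var}(X)$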

So the variability of a single observation doubled is greater than the variability of two independent
observations added together. This is consistent with the earlier argument about the possibility of
independent observations cancelling out extreme values.
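
The difference between doubling one observation and adding two independent observations can be checked exactly with a simple discrete example. The sketch below uses a fair six-sided die, which is an assumption made purely for illustration; the same relationship holds for continuous variables.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely


def variance(dist):
    """Variance of a distribution given as {value: probability}."""
    mean = sum(v * pr for v, pr in dist.items())
    return sum(v * v * pr for v, pr in dist.items()) - mean**2


single = {x: p for x in faces}       # X: one roll
doubled = {2 * x: p for x in faces}  # 2X: one roll, doubled

summed = {}                          # X1 + X2: two independent rolls added
for x1, x2 in product(faces, repeat=2):
    summed[x1 + x2] = summed.get(x1 + x2, 0) + p * p

print(variance(single), variance(doubled), variance(summed))
```

Here the variance of the doubled roll is four times the variance of a single roll, while the variance of the sum of two rolls is only twice as large.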

You can also combine the results in Key points 4.5 and 4.7 to look at other linear combinations of
independent random variables.

WORKED EXAMPLE 4.7

The volume of lemonade Chris purchases at a supermarket is a discrete random variable with
mean and standard deviation . The volume of lemonade Chris drinks on the
journey home is a continuous random variable with mean and standard deviation .
Assume that is independent of . is the random variable: volume of lemonade in ml remaining
after the journey home.
a Find the expected mean and standard deviation of .

b How realistic is the assumption that is independent of ?


a Write the required random variable in
terms of the other two random
variables.
You only have a rule for sums so you
need to write it as a sum.

Use Key point 4.7.


Use Key point 4.5.

Use Key point 4.7.


Use Key point 4.5.
Remember that variance is the square of
the standard deviation.
So the standard deviation of is

b Although the two variables might reasonably be


thought to be independent there are also some
reasons to doubt this. If Chris is very thirsty he
might buy more lemonade and then drink more.
If Chris does not buy any lemonade then he
cannot drink any.
Worked example 4.7 illustrates that the theories of Key points 4.5 and 4.7 are applicable to both continuous
and discrete random variables, or indeed combinations of the two. It also highlights the counter-intuitive
fact that $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ for independent $X$ and $Y$.

EXERCISE 4D
1 Let and be two independent variables with , , and . Find
the expectation and variance of:

a i
ii
b i

ii

c i

ii

d i
ii
e i
ii .
2 Let and be two independent variables with , , and . Find:

a
b
c
d .
3 and are two independent observations of the random variable with and
.

The sample mean, $\bar{X}$, of these two observations is also a random variable defined by
$\bar{X} = \frac{X_1 + X_2}{2}$.

a Show that .
b Find .
4 The average mass of a man in an office is with standard deviation . The average mass of a
woman in the office is with standard deviation . The empty lift has a mass of . What is
the expectation and standard deviation of the total mass of the lift when women and men are
inside?

5 A weighted dice has mean outcome with standard deviation . Brian rolls the dice once and doubles
the outcome. Camilla rolls the dice twice and adds the results together. Work out the expected mean
and standard deviation of the difference between their scores.

6 Exam scores at a large school have mean and standard deviation . Two students are selected at
random. Find the expected mean and standard deviation of the difference between their exam scores.

7 Adrian cycles to school with a mean time of minutes and a standard deviation of minutes.
Pamela walks to school with a mean time of minutes and a standard deviation of minutes.

They each calculate the total time it takes them to get to school over a five day week. Find the
expected mean and standard deviation of the difference in the total weekly journey times, assuming
journey times are independent.

8 is the random variable mass of a gerbil. Explain the difference between and .
Section 5: Linear combinations of normal variables
Although the proof is beyond the scope of this course, it turns out that any linear combination of normal
variables will also follow a normal distribution. You can use the methods from Section 4 to find out the
parameters of this distribution.

Rewind

You studied the normal distribution in A Level Mathematics Student Book 2, Chapter 21.

Key point 4.8

If $X$ and $Y$ are independent random variables following a normal distribution and
$Z = aX + bY + c$, then $Z$ also follows a normal distribution.

WORKED EXAMPLE 4.8

Given that , and , find .


Use Key point 4.5.

State the distribution of .

Use your calculator to find the probability.

WORKED EXAMPLE 4.9

Given that and four independent observations of are made, find .

Express in terms of observations of .

Use Key point 4.7.

State the distribution of .


Use your calculator to find the probability.
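
The method used in these worked examples can be sketched with Python's standard library. The helper `linear_combination` and the distributions $X \sim N(12, 3^2)$ and $Y \sim N(8, 4^2)$ are assumptions made for this example only.

```python
from math import sqrt
from statistics import NormalDist


def linear_combination(a, x, b, y, c=0.0):
    """Distribution of aX + bY + c for independent normal variables X and Y."""
    mean = a * x.mean + b * y.mean + c
    variance = a**2 * x.variance + b**2 * y.variance  # Key points 4.5 and 4.7
    return NormalDist(mean, sqrt(variance))


# Hypothetical independent normal variables (mean, standard deviation).
x = NormalDist(12, 3)
y = NormalDist(8, 4)

# D = X - Y has mean 12 - 8 = 4 and variance 9 + 16 = 25.
d = linear_combination(1, x, -1, y)

# P(X > Y) is the same as P(D > 0).
p_x_greater = 1 - d.cdf(0)

print(d.mean, d.stdev, round(p_x_greater, 4))
```

Writing the event $X > Y$ as $X - Y > 0$ is the key step: it turns a question about two variables into a question about one normal variable.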

Rewind

In Chapter 2 you met the idea that the Poisson distribution was scalable. You can now interpret
this as meaning that the sum of two Poisson variables is also Poisson. This is the only other
distribution in this course that has this property. However, it only applies to sums of Poisson
distributions – not to differences or multiples or linear combinations.

EXERCISE 4E

1 Given that and , find:

a i

ii
b i

ii
c i
ii
d i

ii
e i
ii
f i , where is the average of observations of .

ii , where is the average of observations of .


2 An airline has found that the masses of their passengers follow a normal distribution with mean
and variance .

The masses of their hand luggage follow a normal distribution with mean and variance
.

a State the distribution of the total mass of a passenger and their hand luggage and find any
necessary parameters.

b What is the probability that the total mass of a passenger and their luggage exceeds ?
3 Evidence suggests that the times Aaron takes to run are normally distributed with mean
and standard deviation . The times Bashir takes to run are normally
distributed with mean and standard deviation .

a Find the mean and standard deviation of the difference between Aaron’s and
Bashir’s times.

b Find the probability that Aaron finishes a race before Bashir.


c What is the probability that Bashir beats Aaron by more than ?
4 A machine produces metal rods so that their lengths follow a normal distribution with mean
and variance .

The rods are checked in batches of six, and a batch is rejected if the average length is less than
or more than .

a Find the distribution, including any necessary parameters, of the mean of a random sample
of six rods.

b Hence find the probability that a batch is rejected.


5 The distribution of lengths of pipes produced by a machine is normal with mean and
standard deviation .

a What is the probability that a randomly chosen pipe has a length of or more?
b What is the probability that the average length of a randomly chosen set of pipes of this
type is or more?
6 The masses, , of male birds of a certain species are normally distributed with mean
and standard deviation .

The masses, , of female birds of this species are normally distributed with mean and
standard deviation .

a Find the mean and variance of .

b Find the probability that the mass of a randomly chosen male bird is more than twice the
mass of a randomly chosen female bird.
c Find the probability that the total mass of three male birds and female birds (chosen
independently) exceeds .
7 A shop sells apples and pears. The masses, in grams, of the apples can be assumed to have a
distribution and the masses of the pears, in grams, can be assumed to have a
distribution.

a Find the probability that the mass of a randomly chosen apple is more than double the mass
of a randomly chosen pear.
b A shopper buys apples and a pear. Find the probability that the total mass is greater than
.
8 The length of a corn snake is normally distributed with mean .

The probability that a randomly selected sample of corn snakes has an average of above
is .

Find the standard deviation of the length of a corn snake.

9 a In a test, boys have scores that follow the distribution . Girls’ scores follow .
What is the probability that a randomly chosen boy and a randomly chosen girl differ in
scores by less than ?

b What is the probability that a randomly chosen boy scores less than three-quarters of the
mark of a randomly chosen girl?
10 The daily rainfall in Algebraville follows a normal distribution with mean and standard
deviation .

On a randomly chosen day, there is a probability of that the rainfall is greater than .

In a randomly chosen seven-day week, there is a probability of that the mean daily rainfall
is less than .

a Find the value of and of .

b What assumption was required in performing this calculation? How reasonable is this
assumption?
11 Anu uses public transport to go to school each morning. The time she waits each morning for
the transport is normally distributed with a mean of  minutes and a standard deviation of  
minutes.

a On a specific morning, what is the probability that Anu waits more than  minutes?
b During a particular week (Monday to Friday), what is the probability that:
i her total morning waiting time does not exceed  minutes

ii she waits less than minutes on exactly mornings of the week

iii her average morning waiting time is more than  minutes?


c Given that the total morning waiting time for the first four days is  minutes, find the
probability that the average for the week is over  minutes.
d Given that Anu’s average morning waiting time in a week is over  minutes, find the
probability that it is less than minutes.

Tip

Only consider the last day.


Section 6: Cumulative distribution functions
In Key point 4.4 you saw a method for finding the median. This method can be generalised to find any
percentile, using a function called a cumulative distribution function. This function has many surprising
uses because, unlike a probability density function, it represents a real probability so can be combined
using laws of probability.

A cumulative distribution function (CDF) measures the probability of a random variable being less
than or equal to a particular value. Normally, if the probability density function is called $f(x)$, the
cumulative distribution function is called $F(x)$.

Key point 4.9

For a continuous distribution

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$

Tip

The $t$ in the integral is a dummy variable. You could replace it with any other symbol. The
only real variable in this expression is the $x$ in the upper limit, which corresponds to the $x$ in
the left hand expression.

Since you can undo integration by differentiation, you can recover the probability density function from
$F(x)$.

Key point 4.10

Given that $F(x)$ is the cumulative distribution function, then you can find the probability
density function, $f(x)$, using:

$f(x) = \frac{d}{dx} F(x)$

WORKED EXAMPLE 4.10

Find the cumulative distribution function, given that a continuous random variable has
probability density function for and otherwise.

If State when is below and above the range in which is


defined. When is above the probability of the random
If . variable being below is , because all observed values are
between and .

If :

Since there is no probability of the random variable being below


, the integral starts at .

Once you have the cumulative distribution function you can use it to find the median, quartiles and
any other percentiles, since the $p$th percentile is defined as the value $x$ such that $P(X \le x) = p\%$, i.e.
$F(x) = \frac{p}{100}$.
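
As an illustration of Key points 4.9 and 4.10 and of the percentile definition, here is a worked calculation for an assumed density $f(x) = 2x$ on $0 \le x \le 1$ (chosen for this sketch, not taken from the text).

```latex
F(x) = \int_0^x 2t \, dt = x^2 \quad (0 \le x \le 1), \qquad F(x) = 0 \ (x < 0), \qquad F(x) = 1 \ (x > 1)

\frac{d}{dx} F(x) = 2x = f(x) \quad (0 < x < 1)

F(m) = 0.5 \implies m^2 = 0.5 \implies m = \tfrac{1}{\sqrt{2}} \approx 0.707
```

The negative root of $m^2 = 0.5$ is rejected because it lies outside the interval on which the density is non-zero.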

Rewind

You saw that you could do this without explicitly referring to a cumulative distribution
function, in Exercise 4B.
WORKED EXAMPLE 4.11

The continuous random variable has cumulative distribution function

a Find the probability density function of


b Find the lower quartile of .

a PDF is the derivative of CDF.

and otherwise.

b At the lower quartile: The lower quartile is the 25th percentile.

is non-zero only if . Decide which solution to choose.

Therefore .

EXERCISE 4F
1 Find the cumulative distribution function for each of these probability density functions, and
hence find the median of the distribution.

a i

ii

b i

ii

2 Given each continuous cumulative distribution function, find the probability density function and
the median.

a i

ii

b i

ii

3 Find the exact value of the percentile of the continuous random variable that has pdf
for and otherwise.

4
A continuous random variable has cumulative distribution function

a Find the value of .


b Find the probability density function.
c Find the median of the distribution.
Section 7: Piecewise-defined probability density functions
A probability density function can have different function rules on different parts of its domain. Such a
function is said to be defined piecewise. All the techniques from Sections 1–6 still apply; however,
when evaluating definite integrals you need to split them into several parts.
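
Splitting the integral can be sketched in code as a sum over the pieces. The triangular density used below ($f(x) = x$ on $0 \le x \le 1$ and $f(x) = 2 - x$ on $1 < x \le 2$) and the helper names are assumptions made for illustration.

```python
def simpson(f, a, b, n=200):
    """Approximate the integral of f from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Each piece is (lower limit, upper limit, rule for f on that interval).
pieces = [
    (0, 1, lambda x: x),
    (1, 2, lambda x: 2 - x),
]


def integrate_piecewise(g):
    """Integrate g(x) * f(x) over the whole domain, one piece at a time."""
    return sum(
        simpson(lambda x, rule=rule: g(x) * rule(x), lo, hi)
        for lo, hi, rule in pieces
    )


total_area = integrate_piecewise(lambda x: 1)  # must be 1 for a valid PDF
mean = integrate_piecewise(lambda x: x)        # E(X), exactly 1
mean_sq = integrate_piecewise(lambda x: x**2)  # E(X^2), exactly 7/6
variance = mean_sq - mean**2                   # exactly 1/6

print(round(total_area, 6), round(mean, 6), round(variance, 6))
```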

Rewind

You already met this idea in the context of kinematics in A Level Mathematics Student Book 1,
Chapter 16.

WORKED EXAMPLE 4.12

A continuous random variable has probability density function

a Sketch .

b Find the value of .


c Find .
a [Sketch of $f(x)$ against $x$; the horizontal axis runs from 0 to 8.]

b Using the fact that the total area under the graph of $f(x)$ must be $1$:

The area is now made up of two separate parts, so you need


to work out the two separate areas and add these together.

Again, you need to split the integral for into two parts.

You might be able to evaluate definite integrals on your


calculator.

WORKED EXAMPLE 4.13

A random variable has probability density function


a Find the median of .

b Find the cumulative distribution function of .

a If the median is , then

If : You don’t know whether the median is in or


in , so you need to try both cases.

Remember to check that any solution you find is


in the correct interval. In this case, neither of
these values can be the median.
If :
You need to split the probability into two parts:
.

So the median is . The median must be between and .


b When : You need to look at the two parts of the domain
separately.

When:

You need to split the probability into two parts to


use the different expressions.
Use , which you have already
found.

Remember to write out the full expression for


So,
.

EXERCISE 4G

1 A continuous random variable has probability density function

a Sketch the graph of .


b Find the value of .
c Find the value of such that .
2 A random variable has cumulative distribution function

a Find the median of .

b Find the mean and the variance of .


3 A continuous random variable has probability density function

a Show that .

b Write down the value of .


c Find the upper quartile of .
d Find the cumulative distribution function of .
4 The continuous random variable is defined by the probability density function

a Find the value of .

b Sketch the probability density function.


c Find .
d Find the median of .
5 Function is defined by

a Show that is a valid probability density function.

b Find the variance of a random variable whose probability density function is .


6 The continuous random variable has probability density function

a Find the value of .


b Find the expectation of .
c Find the cumulative distribution function of .
d Find the median of .
e Find the lower quartile of .
7 A continuous random variable has probability density function

a Sketch .

b Show that .

c Find .

d Find the exact value of


Section 8: Rectangular distribution
The rectangular distribution is related to the discrete uniform distribution. It is a distribution where
any equally sized part of the domain has an equal probability of occurring. It is defined by the
endpoints of the domain, $a$ and $b$. The probability density function is a constant, and this constant must
be chosen so that the total area under the graph is $1$.

Rewind

You met the discrete uniform distribution in Chapter 1.

Key point 4.11

If $X$ follows a rectangular distribution between $a$ and $b$, then

$f(x) = \frac{1}{b - a}$ for $a \le x \le b$.

Tip

The easiest way to get this result is not to use integration, but to realise that the graph forms
a rectangle with width $b - a$ and total area $1$.

[Figure: rectangle of width $b - a$ and height $\frac{1}{b - a}$ between $x = a$ and $x = b$, with area 1.]

You can find the mean of this distribution by using integration.

WORKED EXAMPLE 4.14

Prove that if $X$ is a random variable following a rectangular distribution over $[a, b]$ with $a < b$,
then $E(X) = \frac{a + b}{2}$.

Use the definition of expectation from Key point 4.3.

Use the PDF of a rectangular distribution from Key point 4.11.

Use the laws of integration.

Use the difference of two squares.

Since
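
The steps listed in this worked example can be written out as follows. This is a reconstruction from the commentary, assuming the density $f(x) = \frac{1}{b - a}$ on $[a, b]$ from Key point 4.11.

```latex
E(X) = \int_a^b x \cdot \frac{1}{b - a} \, dx
     = \frac{1}{b - a} \left[ \frac{x^2}{2} \right]_a^b
     = \frac{b^2 - a^2}{2(b - a)}
     = \frac{(b - a)(b + a)}{2(b - a)}
     = \frac{a + b}{2}
```

The final cancellation is valid because $b - a \ne 0$.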

You can use a similar method to find the variance.


Fast forward

You are asked to prove this in question 6 in Exercise 4H.

Key point 4.12

Given that $X$ is a random variable following a rectangular distribution over $[a, b]$:

$E(X) = \frac{a + b}{2}$

$\mathrm{Var}(X) = \frac{(b - a)^2}{12}$

WORKED EXAMPLE 4.15

When a measurement is quoted to the nearest it is equally likely to be anywhere within


of the stated value. A large number of measurements of different objects, all of which round to
, are made and their accurate values noted.
a Find the probability that an object quoted as being to the nearest is actually more than
away from .

b Find the standard deviation of the difference between the quoted value and the true value (with
quoted values below the true value giving a negative difference)

a . Define variables.

follows a rectangular distribution over


Identify the distribution.
.

Required probability is Write the required distribution in


. mathematical terms.

Use areas of rectangles rather than


integration.

b Use Key point 4.5.

Use the formula for $\mathrm{Var}(X)$ from Key point
4.12.

So, standard deviation = =
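
The rectangle-area approach and the formulae in Key point 4.12 can be sketched as below. The rounding interval used (a measurement quoted to the nearest 10 units, so the error lies between $-5$ and $5$) is a hypothetical choice, and the helper names are illustrative.

```python
from math import sqrt


def rect_prob(a, b, lo, hi):
    """P(lo < X < hi) for X rectangular on [a, b], found as a rectangle area."""
    lo, hi = max(lo, a), min(hi, b)  # ignore anything outside the domain
    return max(hi - lo, 0) / (b - a)


def rect_mean(a, b):
    """E(X) = (a + b) / 2."""
    return (a + b) / 2


def rect_sd(a, b):
    """Standard deviation, from Var(X) = (b - a)^2 / 12."""
    return (b - a) / sqrt(12)


# Hypothetical rounding error, equally likely to be anywhere in [-5, 5].
a, b = -5, 5

# P(error is more than 4 away from zero) = 1 - P(-4 < error < 4).
p_far = 1 - rect_prob(a, b, -4, 4)

print(round(p_far, 3), rect_mean(a, b), round(rect_sd(a, b), 3))
```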

EXERCISE 4H
1 Find these probabilities. In parts a to d, follows a rectangular distribution over .

a i ;

ii ;

b i ;

ii ;

c i ;

ii ;
d i ;

ii ;
e i When a measurement is quoted to the nearest cm it is equally likely to be anywhere within
of the stated value. Find the probability that a measurement quoted as being to
the nearest cm is actually above .

ii A car’s milometer shows the number of completed miles it has done. Jerry’s car shows
miles. What is the probability that it will show miles in the next miles?
2 Find the expected mean and standard deviation of:

a i given that it follows a rectangular distribution over

ii given that it follows a rectangular distribution over


b i the true value of a result quoted as being to the nearest

ii the true age of a boy who (honestly) describes himself as eighteen years old.
3 A piece is cut off one end of a log of length . Given that the cut is equally likely to be made
anywhere along the log, find:

a the probability that the length of the piece is less than

b the expected mean and standard deviation of the length of the piece.
4 A string of length is randomly cut into two pieces. Find the probability that the length of the
shorter piece is less than .

5 Five random numbers are selected from the interval . Find the probability that they are all
smaller than .

6 a Prove, using integration, that the variance of the rectangular distribution between $a$ and $b$ is $\frac{(b - a)^2}{12}$.

b Hence prove that the ratio is independent of and , stating its value.

7 A rod of length is cut into two parts. The position of the cut is uniformly distributed along the
length of the rod. Find the mean and standard deviation of the length of the shorter part.
Section 9: Exponential distribution
When you model the waiting interval until a first success in a Poisson-type situation you can use the
exponential distribution. It is defined by the number of successes in a unit interval of time, $\lambda$, and it
is written as $X \sim \mathrm{Exp}(\lambda)$. Since the waiting interval is a continuous variable, the probability distribution is
described using a probability density function.

Rewind

You met the Poisson distribution in Chapter 2.

Key point 4.13

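Given that $X \sim \mathrm{Exp}(\lambda)$, then:

$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x) = 0$ otherwise.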

You can find the mean and the variance of the exponential distribution by using integration.

Key point 4.14

Given that $X \sim \mathrm{Exp}(\lambda)$, then:

$E(X) = \frac{1}{\lambda}$

$\mathrm{Var}(X) = \frac{1}{\lambda^2}$

Fast forward

You are asked to prove the formula for the variance in question 10 in Exercise 4I.

PROOF 5

Prove that if $X \sim \mathrm{Exp}(\lambda)$, then $E(X) = \frac{1}{\lambda}$.

Start from the definition of expectation (Key point 4.3).

The integral starts from $0$ as this is the lower limit of the probability distribution.
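Identify:
You need to use integration by parts. As usual when doing integration by parts, start by identifying $u$ and $\frac{dv}{dx}$ ...

So
... then find $\frac{du}{dx}$ and $v$.

So
Use $\int u\,\frac{dv}{dx}\,dx = uv - \int v\,\frac{du}{dx}\,dx$.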

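When $x = 0$ the square bracket term is $0$. It is less obvious what happens when $x \to \infty$, but it turns out
that the $e^{-\lambda x}$ term goes to zero faster than $x$ goes to infinity, so overall it is zero at both limits.

When $x \to \infty$, $e^{-\lambda x} \to 0$.

When $x = 0$, $e^{-\lambda x} = 1$.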
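
The steps described in Proof 5 can be written out in full as follows. This is a reconstruction from the commentary, using the standard choice $u = x$ and $\frac{dv}{dx} = \lambda e^{-\lambda x}$, which is assumed here rather than quoted from the text.

```latex
E(X) = \int_0^\infty x \, \lambda e^{-\lambda x} \, dx

u = x, \quad \frac{dv}{dx} = \lambda e^{-\lambda x}
\quad \Rightarrow \quad
\frac{du}{dx} = 1, \quad v = -e^{-\lambda x}

E(X) = \left[ -x e^{-\lambda x} \right]_0^\infty + \int_0^\infty e^{-\lambda x} \, dx
     = 0 + \left[ -\frac{1}{\lambda} e^{-\lambda x} \right]_0^\infty
     = 0 - \left( -\frac{1}{\lambda} \right)
     = \frac{1}{\lambda}
```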
WORKED EXAMPLE 4.16

The number of leaks in any miles of pipes in a sewer system follows a Poisson distribution
with mean .

a Find the probability that the first leak will be found in the first half mile.
b Find the variance of the distance until the first leak is found.

a Define variables.

Identify the distribution. Since the number of leaks in


miles follows a Poisson distribution, the distance until the
first leak will follow an exponential distribution. To find the
parameter, you need to find the number of leaks per unit of
distance (miles); here it is .
Write the required probability in mathematical terms
and use the probability density function.

b Use the formula for $\mathrm{Var}(X)$ from Key point 4.14.

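You could also be asked to find the probability of a variable with an exponential distribution being greater
than a particular value. You can do this by integration, but it is useful to know the cumulative
distribution function. You can find this using integration. If $X \sim \mathrm{Exp}(\lambda)$ then, for $x \ge 0$,
$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x}$.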

Rewind

You met integration by parts in A Level Mathematics Student Book 2, Chapter 11.

Key point 4.15

If $X \sim \mathrm{Exp}(\lambda)$, then $P(X \le x) = 1 - e^{-\lambda x}$ for $x \ge 0$.

The exponential distribution also has a property called memorylessness. Prior waiting does not change
how long you are likely to wait for an event. This means that as well as measuring the amount of time
until a first event, it also measures the interval between events, as shown in Worked example 4.17.

WORKED EXAMPLE 4.17

During the summer Tanis sneezes on average two times every hour.

a State an assumption that must be made to model the time until the next sneeze by using an
exponential distribution.
b Assuming that the time until the next sneeze can be modelled by an exponential distribution,
find the exact probability that Tanis goes more than ninety minutes after waking up before
sneezing.

a You must assume that sneezes occur independently of each other.

b Let $T$ be the time, in hours, until the next sneeze.
Define variables.

$T \sim \operatorname{Exp}(2)$
Identify the distribution. The exponential can be used with any starting point, so the fact that it is time after waking is not important.

$P(T > 1.5) = 1 - P(T \le 1.5) = 1 - (1 - e^{-2 \times 1.5}) = e^{-3}$
Write the required probability in mathematical terms and use the cumulative distribution function. Remember that units are hours.
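
The answer to part b can be checked numerically. The sketch below assumes, as in the question, a rate of two sneezes per hour and a wait of ninety minutes (1.5 hours); the function name `prob_wait_longer_than` and the prior waiting time `s` are illustrative choices, not part of the question. It also checks the memorylessness property for that one arbitrarily chosen amount of prior waiting.

```python
import math

def prob_wait_longer_than(t, rate):
    """P(T > t) for T ~ Exp(rate), i.e. 1 - F(t) = exp(-rate * t)."""
    return math.exp(-rate * t)

rate = 2.0   # sneezes per hour (from the question)
wait = 1.5   # ninety minutes, expressed in hours

# Exact answer is e^(-3); this is its decimal value.
p_exact = prob_wait_longer_than(wait, rate)

# Memorylessness: having already waited s hours does not change the
# probability of waiting a further `wait` hours.
s = 1.0      # arbitrary prior waiting time, chosen only for illustration
p_conditional = (prob_wait_longer_than(s + wait, rate)
                 / prob_wait_longer_than(s, rate))

print(f"P(T > 1.5)         = {p_exact:.4f}")
print(f"P(T > 2.5 | T > 1) = {p_conditional:.4f}")
```

The two printed probabilities agree, which is the memorylessness property in action: the conditional probability depends only on the additional waiting time.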

EXERCISE 4I

1 Find these probabilities.

a i if

ii if
b i if

ii if
c i waiting more than seconds for an emission from a radioactive substance that emits three
alpha particles per minute on average

ii waiting less than fifteen minutes for a bus that comes three times per hour on average.
2 Find the expected mean and standard deviation of:

a i

ii
b i the distance travelled in a car before reaching the first pot hole if pot holes along a certain
road are spread independently at an average rate of per kilometre

ii the time from the beginning of the day until the first phone call at a call centre that receives
an average of calls per hour.

3 The number of emails Khaled receives in an hour follows a Poisson distribution with mean . What
is the probability that the next email arrives in less than minutes?

4 Birds arrive at a feeding table independently, at an average rate of per hour.

a Find the probability that two birds will arrive in the next ten minutes.
b Find the probability that there is more than a ten-minute wait before the next bird arrives.

c Find the expected mean and standard deviation of the time (in minutes) spent waiting for a
bird.
5 When Ben walks down a particular street, he meets people he knows at an average rate of three
every 5 minutes. Different meetings are independent of each other. What is the probability that
Ben has to walk for more than minutes before he meets a person he knows?

6 The probability of waiting less than minutes for a bus is . If the waiting time is modelled by an
exponential distribution, find the probability of waiting more than minutes.

7 The probability of waiting more than minutes for a phone call is . Find an expression for the
mean waiting time for a phone call in terms of and , assuming the waiting time can be
modelled by the exponential distribution.

8 Show that the probability of a variable with an exponential distribution $\operatorname{Exp}(\lambda)$ taking a value larger than its mean is independent of $\lambda$.

9 The number of buses arriving at a bus stop in an hour follows a Poisson distribution with mean .

a Name the distribution which models the time, in minutes, Amanda has to wait until the next
bus arrives. State any necessary parameters.
b Given that Amanda has already been waiting for minutes, find the probability that she has to
wait at least minutes.
c Show that the answer in part b is the same as the probability that Amanda has to wait at least
minutes.

10 Prove that if $X \sim \operatorname{Exp}(\lambda)$, then $\operatorname{Var}(X) = \frac{1}{\lambda^2}$.

11 is the number of successes that occur in one unit of time, so that . is the number of
successes that occur in units of time.

a Write down the distribution of .


b Find , giving your answer in terms of and .
follows an exponential distribution .
c Explain why .
d Hence prove that the probability density function of is .
Section 10: Combining discrete and continuous random variables
It is possible for a random variable to be discrete in some parts of its domain and continuous in other
parts of its domain. For example, a doctor might measure the masses of babies less than as
precisely as possible (creating a continuous part of the random variable) but masses above might
be measured to the nearest (creating a discrete part of the random variable).

If this is the case you apply all the rules learnt in this chapter and in Chapter 1 but using sums over the
discrete part of the random variable and integrals over the continuous part of the random variable.
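
As a rough illustration of combining a sum and an integral, the sketch below uses a hypothetical mixed distribution (not one from this book): $f(x) = kx$ for $0 < x < 2$ together with $P(X = 2) = P(X = 3) = k$. The helper `integral_of_power` is an assumed convenience function that integrates a power of $x$ exactly.

```python
from fractions import Fraction

def integral_of_power(power, lower, upper):
    """Exact value of the integral of x**power from lower to upper."""
    n = power + 1
    return (Fraction(upper) ** n - Fraction(lower) ** n) / n

# Hypothetical mixed distribution, for illustration only:
#   continuous part: f(x) = k * x for 0 < x < 2
#   discrete part:   P(X = 2) = P(X = 3) = k
discrete_values = [2, 3]

# Total probability: k * (integral of x) + k * (number of discrete values) = 1
k = 1 / (integral_of_power(1, 0, 2) + len(discrete_values))

# E(X): integral of x * f(x) over the continuous part
#       plus sum of x * P(X = x) over the discrete part
expectation = (k * integral_of_power(2, 0, 2)
               + sum(x * k for x in discrete_values))

print("k =", k)
print("E(X) =", expectation)
```

Working in exact fractions keeps the answer in the form an exam question would usually expect.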

Tip

Notice that in Worked example 4.18 the end point of the continuous part of the variable is a
part of the discrete random variable. You might worry about situations like this, but it is
perfectly possible to define random variables in this way.

WORKED EXAMPLE 4.18

The random variable can only take the values , or .

If the variable has PDF .

When or then .
a Find the value of .
b Find .

a Total probability is: The total probability is an integral over the


continuous range plus a sum over the discrete
range.

So therefore Use the fact that the total probability equals 1.

b You need to split the expectation into an integral


over the continuous part of the variable and a sum
over the discrete part.

EXERCISE 4J

1 Find and for these mixed probability distributions.

a i for for

ii for for
b i for for

ii for for
c i for for

ii for for

d i for for

ii for for
2 The random variable is defined for and for . Between and the
probability density function is given by . It is also known that is . Find:

a the value of

b
c

d .
3 The random variable is defined for and for .

Between and the probability density function is given by . It is also known


that P is .

a Find an expression for in terms of .

b Given that , find .


4 The mixed random variable can take any value between and as well as integer values from
to . The distribution is defined by:

for

for

a Find the value of .

b Find .
5 The mixed random variable can take any values from to and the discrete values and . It
has cumulative distribution function:

for

for

Find:

a
b

d .

6 A mixed random variable can take the discrete values and and continuous values between
and . It has cumulative distribution function:

for

for

Find:

a the values of and


b .
7 The mixed random variable can take any values between and , and the discrete values
and . Between and it has probability density function . When or .
Prove that there is only one possible value of and find its value.

8 An athletics coach records the time of a squad of junior sprinters. He records the time of
anyone who runs between and seconds as precisely as he can. Anyone who runs between
and seconds gets their time recorded to the nearest tenth of a second. He models the time
recorded by this probability distribution:

for

for

a Find the value of .


b Find , giving your answer to three decimal places.
c Find the standard deviation of .
The true times of the athletes have probability density function:
for
for

d Find the value of .


e Find and comment on your answer in relation to part b.

Checklist of learning and understanding

The probability of a continuous random variable taking any single value is a meaningless
concept, but it is possible to work with the probability of it being in a given range. To do this
you use a probability density function such that the area under the curve represents
probability. The total area is therefore 1, and the function is never negative.
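The summation formulae for the expectation of discrete random variables become integrals for continuous random variables:

$E(X) = \int x f(x) \, dx$ and $E(X^2) = \int x^2 f(x) \, dx$

$\operatorname{Var}(X)$ is still $E(X^2) - [E(X)]^2$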
The expectation and variance of a linear transformation are given by:

If and are independent random variables then

If and are independent random variables following a normal distribution and


, then also follows a normal distribution.
The expectation of a function of a continuous random variable is given by:

The cumulative distribution function gives the probability of the random variable taking a
value less than or equal to .
For a continuous distribution with PDF :

The main uses of cumulative distribution functions are to find percentiles of a distribution and
to convert from a distribution of one continuous random variable to a distribution of a function
of that variable.
If follows a rectangular distribution between and , then

If , then
for
Mixed practice 4
1 A continuous random variable has probability density function for . Find the exact
value of .

Choose from these options.

2 and are independent random variables. has mean and standard deviation . has
mean and standard deviation . Given that , what is the standard deviation of
?

Choose from these options.

3 The continuous random variable has PDF and otherwise.

a Find the cumulative distribution function of .

b Find .

c Find .

4
Given that is a continuous random variable with PDF , find:

a the value of

b the expectation of

c the variance of .

5 The Jones’ expected spend on their garden is £ with a variance of £ . This is paid for
out of a bank account containing £ .

a What is the standard deviation in the amount remaining in the bank account after the
garden has been paid for?

However much the Jones’ spend on their garden, the Smiths will spend twice as much plus
£ .

b What is the expected amount that the Smiths will spend?

c What is the standard deviation in the amount that the Smiths will spend?

6 a If is a continuous random variable with PDF and , find


the value of the constants and .

b Evaluate:

ii

7 The continuous random variable has probability density function

and otherwise.

a Find the cumulative distribution function of .

b Find the exact value of the median of .

8 Given that the continuous random variable has PDF and


otherwise, find the interquartile range of . You will have to make appropriate use of
technology.

9 The probability density function for the continuous random variable is for and
otherwise.

a Find the value of .

b Find .

c Find .

d Find the exact value of .

10 A doctor measures the masses of babies. If a baby has a mass between and the mass is
recorded as accurately as possible. If the mass is between and the mass is recorded
to the nearest . The doctor models the recorded masses using the random variable
with probability distribution defined by:

for .

There are no masses recorded outside of the range from to .

a Write down the value of .

b Hence find the value of .

c Find .

d Find .

11 The time taken, in minutes, to wash the dishes is modelled by a random variable with
expectation and standard deviation .

The time taken, in minutes, to clean the table is modelled by a random variable with
expectation and standard deviation .

In this model and are considered to be independent.

a Before leaving Hassan must wash the dishes then clean the table. is the total time this
takes. Find the expectation and standard deviation of .
b When Alice visits the jobs can be shared. Hassan washes the dishes and Alice cleans the
table. is the time Alice has to wait after finishing cleaning the table before they can
leave. Find the expected mean and standard deviation in .

c Is the assumption that and are independent reasonable in these situations?

d Hassan keeps a record of the total time he spends washing the dishes over days. He
assumes that the times taken each day are independent and models the total time in the
days using the random variable . For what values of will be more than times
the standard deviation of ?

12 The times Markus takes to answer a multiple choice question are normally distributed with
mean and standard deviation . He has one hour to complete a test
consisting of questions.

Assuming the questions are independent, find the probability that Markus does not complete
the test in time.

13 The masses of men in a factory are known to be normally distributed with mean and
standard deviation . There is an elevator with a maximum recommended load of .
With men in the elevator, calculate the probability that their combined mass exceeds the
maximum recommended load.

14 Davina makes bracelets by threading purple and yellow beads. Each bracelet consists of
seven randomly selected purple beads and four randomly selected yellow beads. The
diameters of the beads are normally distributed with standard deviation . The average
diameter of a purple bead is and the average diameter of a yellow bead is . Find
the probability that the length of a bracelet is less than .

15 The masses of the parents at a primary school are normally distributed with mean and
variance , and the masses of the children are normally distributed with mean and
variance . Let the random variable represent the combined mass of two randomly
chosen parents and the random variable represent the combined mass of four randomly
chosen children.

a Find the mean and variance of .

b Find the probability that four children weigh more than two parents.

16 A random variable has cumulative distribution function given by

The diagram shows the graph of .

[Figure: graph of the cumulative distribution function against $x$, for $0 \le x \le 10$]

a Find .

b Find the median of .

c Find the probability density function for .

d Show that the mean of is .

You are given that the variance of is .

e Find the probability that the mean of a random sample of values of is greater than .

17 The number of beta particles emitted by a radioactive substance follows a Poisson


distribution. The probability of observing no particles in hours is

a Find the expected waiting time until the first beta particle is observed.

b Find the probability of waiting more than minutes to observe a beta particle.

c Given that no particles have been observed in the first minutes, find the probability that
it takes more than hours to observe a beta particle.

18 is a continuous random variable following a rectangular distribution between and , with


.

a Prove that .

b Find the cumulative distribution function of .

c Two independent observations of are made. Find an expression for the probability that
the maximum of these two observations is less than where .

19 The humidity of air is measured by a weather station. It can only take values from to
inclusive.

It is modelled by a mixed random variable, , with these properties:

Between and has PDF:

a Find the value of .

b Find .

c Find the median of .

20 The continuous random variable has cumulative distribution function


. Find the probability that in four observations of more than
two observations take a value of less than .

21 The continuous random variable has CDF . The median of is .


Find the values of , and .

22 The marks students scored in a Mathematics test follow a normal distribution with mean
and variance . The marks of the same group of students in an English test follow a normal
distribution with mean and variance .

a Find the probability that a randomly chosen student scored a higher mark in English than in
Mathematics.

b Find the probability that the average English mark of a class of students is higher than
their average Mathematics mark.

23 The continuous random variable has probability density function:

a Show that .

b What is the probability that the random variable has a value that lies between and ?

Give your answer in terms of .

c Find the mean and variance of the distribution. Give your answers in terms of .

The random variable represents the lifetime, in years, of a certain type of battery.

d Find the probability that a battery lasts more than six months.

A calculator is fitted with three of these batteries. Each battery fails independently of the
other two.

e Find the probability that at the end of six months:

i none of the batteries has failed

ii exactly one of the batteries has failed.

24 The random variable has probability density function defined by:

a Sketch the graph of .

b Find the exact value of .

c Prove that the distribution function , for , is defined by .

d Hence, or otherwise:

i find

ii show that the median, , of satisfies the equation .

e Calculate the value of the median of , giving your answer to three decimal places.

[© AQA, 2012]

25 The continuous random variable has probability density function defined by:

a Sketch the graph of .


b Show that:

ii .

c Hence write down the exact value of:

i the interquartile range of

ii the median, , of .

d Find the exact value of .

[© AQA, 2011]
FOCUS ON … PROOF 1

Sums of discrete independent random variables


In this section, you will prove this important result:

Rewind

You studied discrete random variables in Chapter 1.

If and are discrete independent random variables, then

You need to know:

(Theorem 1)

(Theorem 2)

If and are independent random variables, then


(Theorem 3)
(Theorem 4)

A finite double sum of a sum can be split into two sums: (Theorem 5)

PROOF 6

On each line, state which of these theorems are being applied.

1 Theorem ____

2 Theorem ____

3 Theorem ____

Properties of sums
4

5 Theorem ___ and


Theorem ___

6 Theorem ____
7 Theorem ____

QUESTIONS

Use techniques similar to those in Proof 6 to answer these questions. and are independent discrete
random variables.

1 Prove that .


2 a Prove that .

b Hence prove that .



3 Prove that .
FOCUS ON … PROBLEM SOLVING 1

Finding the parameters of a distribution


Often you are not told directly the parameters of a distribution, but have to infer them from given
information. If this is the case, sometimes the equations will be impossible to solve directly, so you have to
use technology to solve them.

WORKED EXAMPLE

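In a Poisson distribution the probability of two events occurring is $0.1$. Find the probability of one event occurring.

If $X \sim \operatorname{Po}(\lambda)$ then $P(X = 2) = \frac{e^{-\lambda} \lambda^2}{2} = 0.1$
Write the information given in terms of $\lambda$, the parameter of the Poisson distribution.

[Figure: sketch of $y = \frac{e^{-\lambda} \lambda^2}{2}$ against $\lambda$, meeting the line $y = 0.1$ at $(0.605, 0.1)$ and $(4.708, 0.1)$]
This equation is not solvable using standard functions, so you can instead sketch it.

So $\lambda = 0.605$ or $\lambda = 4.708$.
You can use graphing technology to find the intersection points.

$P(X = 1) = \lambda e^{-\lambda} \approx 0.330$ or $0.0425$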
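
If graphing technology is not to hand, the same intersection points can be approximated numerically. The sketch below assumes the equation being solved is $P(X = 2) = 0.1$ (the value suggested by the $y$-coordinates of the intersection points) and uses a simple bisection search; the function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bisect(f, low, high, tol=1e-10):
    """Find a root of f in [low, high]; f(low) and f(high) must differ in sign."""
    f_low = f(low)
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_low > 0):
            low, f_low = mid, f_mid   # root lies in the upper half
        else:
            high = mid                # root lies in the lower half
    return (low + high) / 2

target = 0.1  # assumed value of P(X = 2)

def gap(lam):
    """Difference between P(X = 2) and the target probability."""
    return poisson_pmf(2, lam) - target

# As a function of lam, P(X = 2) peaks at lam = 2,
# so there is one root on each side of 2.
roots = [bisect(gap, 0.0, 2.0), bisect(gap, 2.0, 10.0)]

for lam in roots:
    print(f"lambda = {lam:.3f}, P(X = 1) = {poisson_pmf(1, lam):.4f}")
```

Splitting the search at the peak is what guarantees that each interval contains exactly one sign change, which bisection needs.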

QUESTIONS

1 Given that and , find .

2 Given that and , find .

3 Given that and to decimal places, find .

4 The probability of a biased coin showing a head is .


a In tosses, one head is observed. Show that the probability of this happening is .

b In another tosses, two heads are observed. Show that the probability of this happening, and the
observation in part a happening, is .
c By using technology or otherwise, find the value of that maximises the probability of getting three
heads in tosses.

Did you know?


This type of method is called maximum likelihood estimation and is a very powerful tool in
advanced statistics. You could research and list the uses of this.
FOCUS ON … MODELLING 1

Situations for the Poisson distribution


The Poisson distribution is frequently used to model situations in which there is a rate of events. However,
it can be applied incorrectly because there are several conditions that must be met:

the process must be random, so that it is not totally predictable


there must be a constant average rate, not something that changes in different areas or over time
the events must be independent of each other.

QUESTIONS

To help you to understand the required conditions in context, here are some examples of real-life
situations. Comment on whether the Poisson distribution would be an appropriate model in each of these
situations. Where the Poisson is not appropriate, state which conditions are not met.

Did you know?


In several of the situations in which the Poisson conditions are not perfectly met, in reality
statisticians still use the Poisson model to make useful predictions. This is because all models are
imperfect and the errors in estimating the average rate might well be larger than the errors
caused by a weak dependency between the events. When interpreting models it is vital to
understand the sources and scales of uncertainty in the output.

1 The number of fish in a volume of an ocean where fish occur at an average rate of per .
2 The number of signals received in an hour by a mobile phone from a communication mast when a
signal is received every seconds.
3 The number of beta particles emitted every minute by a radioactive substance that emits on average
beta particle every seconds.
4 The waiting time for a bus when one arrives on average every minutes.

5 The number of errors in pages of a textbook if there is an average of error on every pages.
6 The number of fish caught in ten hours in a small pond if an average of fish are caught every hour.
7 The number of girls in randomly selected people if it is expected that of the population are
girls.

8 A binomial distribution when is very large.

Tip

You might want to use technology to confirm your answer to question 8.
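
One possible way to explore question 8 with technology is to tabulate binomial probabilities next to Poisson probabilities with the same mean. The values of `n` and `p` below are hypothetical, chosen so that `n` is large and `p` is small; trying a larger `p` with the same `n` is a useful comparison.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 1000, 0.003   # hypothetical values: n large, p small
lam = n * p          # Poisson mean matched to the binomial mean

print(" k  binomial  Poisson")
for k in range(7):
    print(f"{k:2d}  {binomial_pmf(k, n, p):.5f}   {poisson_pmf(k, lam):.5f}")

# Largest difference between the two models over k = 0, ..., 20
largest_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
                  for k in range(21))
print("largest difference:", round(largest_gap, 5))
```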


5 Further hypothesis testing

In this chapter you will learn how to:

interpret the different types of errors that can be made while conducting
hypothesis tests, called type I and type II errors
calculate the probability of a type I error based on a Poisson distribution
calculate the probability of a type I error based on a binomial distribution.
If you are following the A Level course, you will also learn how to:
use a new type of hypothesis test for the mean, called a $t$-test
calculate the probability of a type II error based on a Poisson distribution
calculate the probability of a type II error based on a binomial distribution
calculate the probability of type I and type II errors based on a normal distribution.

Before you start …


A Level Mathematics Student Book 1, Chapter 22
You should know how to calculate unbiased estimates of the population variance.
1 Find an unbiased estimate of the variance of a population based on this sample: .

A Level Mathematics Student Book 1, Chapter 22
You should know how to conduct hypothesis tests, using the binomial distribution.
2 A six-sided dice is rolled five times and three sixes are observed. Test at the significance level if this provides evidence that the dice is biased towards rolling more sixes than a fair dice.

Chapter 2
You should know how to conduct hypothesis tests, using the Poisson distribution.
3 The number of bees arriving at a flower is modelled by a Poisson distribution. If six bees arrive in one minute, does this provide evidence at the significance level that the true mean is greater than ?

A Level Mathematics Student Book 2, Chapter 21
You should know how to conduct calculations with the normal distribution.
4 If , find .

A Level Mathematics Student Book 2, Chapter 22
You should know how to conduct hypothesis tests with the normal distribution.
5 A sample of objects drawn from a normal distribution with a standard deviation of has a mean of . Conduct a two-tailed test at significance to decide if this provides significant evidence of a change from a mean of .

Rewind

You studied hypothesis testing using the normal distribution in A Level Mathematics
Student Book 2, Chapter 22.

Realistic hypothesis testing


When studying hypothesis testing, using the normal distribution (the $z$-test), you might have wondered about conditions required for using it. It is a test in which you are uncertain about the population mean but you do know the population variance. This is not a situation that occurs very frequently. More often, you need to use the sample to estimate the variance of the population. To do this, you use a $t$-test.

One of the reasons hypothesis tests are so important in modern statistics is that they try to give a
probability of certain types of error. You will see in this chapter which types of errors are controlled and
which are not, and how to calculate their probabilities.
Section 1: t-tests
In a $z$-test to see if the population mean has changed from $\mu_0$ you are testing the hypotheses:

$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$

You calculate the $z$-score for $\bar{x}$, your mean of a sample of size $n$:

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

where $\bar{x}$ and $n$ are found from the sample while $\mu_0$ is the value in the null hypothesis and $\sigma$, the population standard deviation, is assumed to be the same as a previously held value. You can use the fact that $Z \sim N(0, 1)$ to do calculations with this statistic.

Tip

$\bar{X}$ is a random variable representing the sample mean. $\bar{x}$ is a particular observation of this random variable.

If you do not know $\sigma$ (or have reason to believe that it has changed) you must instead estimate it from the data. An appropriate way to estimate this is using the square root of the unbiased estimate of the variance, $s$. You can then construct a $t$-score:

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

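[Figure: distribution of $x$ with standard deviation $\sigma$, showing three sample values $x_1$, $x_2$, $x_3$ with sample standard deviation $S$]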

$z$-scores very rarely exceed (or go below ). However, if your sample just happens to have a very small standard deviation – for example, if the sample is $x_1$, $x_2$ and $x_3$ in the graph shown – then the $t$-score can get quite large: it is not unusual for it to be around . This highlights that it does not follow a normal distribution. The likelihood of getting a very tightly clustered sample depends on $n$. At low values of $n$ this possible clustering has a very big effect, but at large values of $n$ the sample standard deviation is a very good approximation of the population standard deviation. This means that there are lots of different $t$-distributions, depending on the value of $n$.

Tip

Although the $t$-distribution gives the distribution of the random variable $T$, conventionally it is written with a lower case $t$.

[Figure: probability density curves of the $Z$-distribution, the $t$-distribution with $n = 7$ and the $t$-distribution with $n = 4$]

As the value of $n$ grows, the $t$-distribution gets closer and closer to the $Z$-distribution.

Key point 5.1

The $t$-test is based on the test statistic:

$T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$

This will be given in your formula book.

You might wonder why there is an $n - 1$ in the formula in Key point 5.1. Rather than using the value of $n$ to describe the $t$-distribution, conventionally you use the degrees of freedom, $\nu$. Because you fix one parameter, the population mean, when you are doing a $t$-test you use the formula $\nu = n - 1$.

Tip

Some graphical calculators can perform a $t$-test. You should state the test statistic, its distribution and the $p$-value from your calculator. If the mean and standard deviation of the sample are not given in the question you should state those too: your calculator will find them in the process of performing the $t$-test.

To conduct a $t$-test, you calculate the value of the test statistic, $t$, and then look up the critical value from the table given in the formula book. This still requires some work as the information is given in terms of cumulative probabilities. To do a one-tailed test with significance level $\alpha$, you look up the column headed by $1 - \alpha$ in the table. To do a two-tailed test with significance level $\alpha$, you look up the column headed by $1 - \frac{\alpha}{2}$ in the table. If the modulus of your $t$-score is more than this value, you reject $H_0$.

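[Figure: $t$-distribution curves with areas $1 - \alpha$ and $\alpha$ marked for a one-tailed test, and areas $\frac{\alpha}{2}$, $1 - \alpha$ and $\frac{\alpha}{2}$ marked for a two-tailed test]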
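
The steps of a two-tailed $t$-test can be sketched in a few lines of code. Everything numerical here is an assumption made for illustration: the sample is hypothetical, the null value `mu_0` is invented, and the critical value 2.571 is the standard table entry for $\nu = 5$ in the column headed 0.975 (a two-tailed test at the 5% level).

```python
import math
import statistics

# Hypothetical data, for illustration only.
sample = [248, 251, 247, 253, 249, 246]
mu_0 = 250               # hypothetical value of the mean under H0
critical_value = 2.571   # t-table: nu = 5, column 0.975 (two-tailed, 5%)

n = len(sample)
nu = n - 1                       # degrees of freedom
x_bar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)     # square root of the unbiased variance estimate

# Test statistic: (sample mean - hypothesised mean) / (s / sqrt(n))
t_score = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-tailed test: reject H0 if the modulus of t exceeds the critical value.
reject_h0 = abs(t_score) > critical_value

print(f"x_bar = {x_bar}, s = {s:.4f}, t = {t_score:.4f}, nu = {nu}")
print("Reject H0" if reject_h0 else "Insufficient evidence to reject H0")
```

Note that `statistics.stdev` divides by $n - 1$, which matches the unbiased estimate used in the $t$-test; `statistics.pstdev` would divide by $n$ and give the wrong statistic.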

WORKED EXAMPLE 5.1

The label of a pre-packaged steak claims that it has a mass of . A random sample of
steaks is taken and their masses are:
Test at the significance level whether the label’s claim is accurate, stating any
assumptions you need to make.
$X$ = mass of a steak in . Assume that $X \sim N(\mu, \sigma^2)$.
Define variables. You must use the $t$-test since the true variance is unknown, but to do this test the underlying distribution must be normal.

State the hypotheses.

State the test statistic and its distribution.

From your calculator:
Find sample statistics from your calculator. Since you do not know the true population variance, you use an unbiased estimate, $s^2$.

So
Calculate your $t$-score.

The critical value is .
To find the critical value when and for a two-tailed test you look in the column in the table given in the formula book.

[Table: extract from the formula book table of critical values of the $t$-distribution]

, therefore do not reject $H_0$ – there is no significant evidence to doubt the label’s claim.
Compare your calculated $t$-score with the critical value and conclude, putting the conclusion into context.

Common error

Conclusions are often stated without a sense of statistical uncertainty. For example, it would
be wrong to state that the conclusion of the test in Worked example 5.1 is: ‘The label is
correct.’

EXERCISE 5A

1 In each of these situations it is believed that is normally distributed. Decide the result of the
test if it is conducted at the significance level.
a i ;

ii ;
b i ;

ii ;
c i Data:
ii Data:
2 John believes that the average time taken for his computer to start is seconds. To test his
belief, he records the times (in seconds) taken for the computer to start:

a State suitable hypotheses.


b Test John’s belief at the significance level.
c Justify your choice of test, including any assumptions required.
3 Michael regularly buys packets of tea. He has noticed recently that he gets more cups of
tea than usual out of one packet, and suspects that the packets contain more than on
average. He weighs eight packets and finds that their mean mass is and the standard
deviation of their masses is .
a Find the unbiased estimate of the variance of the masses, based on Michael’s sample.

b Assuming that the masses are normally distributed, test Michael’s suspicion at the level of
significance.
4 The crawling ages of babies in a nursery are recorded. The sample has mean months and
standard deviation months. A parenting book claims that the average age for babies crawling
is months. Test at the level whether babies in the nursery crawl significantly earlier than
average, assuming that the distribution of crawling ages is normal.
5 Penelope thinks that cleaning the kettle will decrease the amount of time it takes to boil (
seconds). She knows that the average boiling time before cleaning is seconds. After cleaning,
she boils the kettle times and summarises the results as:

a State suitable hypotheses.


b Test Penelope’s idea at the significance level.
6 A national survey of athletics clubs found that the mean time for a -year-old athlete to run
is . A coach believes that athletes in his club are faster than average. To test his
belief he collects the times for athletes from his club and summarises the results in this table.
Time, Frequency

a Calculate an estimate of the mean time for the athletes in the club.
b Find an unbiased estimate of the population variance based on this sample.
c Test the coach’s belief at the level of significance.
d State what assumption you have made about the distribution of the athletes’ times.

7 The lengths of bananas are found to follow a normal distribution with mean Roland has
recently changed banana supplier and wants to test whether their mean length is different. He
takes a random sample of bananas and obtains these summary statistics:

a State suitable hypotheses for Roland’s test.


b Test at the significance level whether the data support the hypothesis that the mean
length of Roland’s bananas is different from .
c Roland’s assistant Sonia suggests that they should test whether the mean length of bananas
from the new supplier is less than .
i State suitable hypotheses for Sonia’s test.

ii Find the outcome of Sonia’s test at the significance level.


8 The manufacturers of tins of soup claim that the tins contain, on average, of soup. Aki
wants to test if this is an accurate claim. She samples tins of soup and finds that they have a
mean of and an unbiased estimate of the population standard deviation of .
a State appropriate null and alternative hypotheses.

b For what values of will Aki reject the null hypothesis at the significance level?
Section 2: Errors in hypothesis testing
Defining type I and type II errors
The acceptable conclusions to a hypothesis test are:

1 sufficient evidence to reject $H_0$ at the significance level

2 insufficient evidence to reject $H_0$ at the significance level.

It is always possible that these conclusions are wrong.

If the first conclusion is wrong – i.e. you have rejected $H_0$ while it was true – it is called a type I error (spoken as ‘type one error’).

If the second conclusion is wrong – i.e. you have failed to reject $H_0$ when you should have done – it is called a type II error (spoken as ‘type two error’).

Key point 5.2

In hypothesis tests, type I and type II errors can be summarised as:

                              Reality
                              $H_0$ is true      $H_0$ is false
Claim   $H_0$ is true                            type II error
        $H_0$ is false        type I error

WORKED EXAMPLE 5.2

In a court case, defendants are presumed innocent.


a What are the null and alternative hypotheses in this situation?

b What would a type I error be in this situation?


c What would a type II error be in this situation?

a $H_0$: Defendant is innocent.
$H_1$: Defendant is guilty.

b A type I error would be saying that an innocent person is guilty.
A type I error is rejecting $H_0$ when it was true.

c A type II error would be saying that a guilty person is innocent.
A type II error is not rejecting $H_0$ when it was false.

Probability of type I errors


You cannot eliminate these errors, but you can find the probability that they occur. For a type I error to
occur, the test statistic must fall within the rejection region while $H_0$ was true. The critical region is
designed so that this probability is the significance level.

Key point 5.3

In a hypothesis test, the probability of a type I error is equal to the actual significance level.

The phrase ‘actual significance level’ is used because you might find when testing a discrete random
variable that you cannot create a critical region that has exactly the desired significance level. If you are
asked to design a test with a given significance level, the discrete nature of the variable might force you
to create a critical region whose actual significance level is only near to the requested level. Conventionally, you would
choose the largest significance level you can that is less than the requested level.
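The search for an actual significance level can be sketched in a few lines of Python. This is only an illustration under assumed values: the sample size, the probability under the null hypothesis and the requested level below are hypothetical, and the helper names `binomial_upper_tail` and `upper_critical_value` are invented for this sketch.

```python
from math import comb

def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))

def upper_critical_value(n, p, alpha):
    """Smallest k whose upper tail P(X >= k) is less than alpha.

    Returns the critical value and the actual significance level.
    k = n + 1 gives an empty tail (probability 0), so the loop always returns.
    """
    for k in range(n + 2):
        tail = binomial_upper_tail(n, p, k)
        if tail < alpha:
            return k, tail

# Hypothetical test: 20 trials, H0: p = 0.5, requested level 5%.
k, actual = upper_critical_value(20, 0.5, 0.05)
print(k, round(actual, 4))  # 15 0.0207
```

In this assumed example, rejecting for 14 or more successes would give a level of about 5.8%, which is too large, so the critical region starts at 15 and the actual significance level is about 2.1% rather than 5%.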

You might not be told the significance level of a test, in which case you need to use a formula to calculate
it.

Key point 5.4

In a hypothesis test:

$P(\text{type I error}) = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$

WORKED EXAMPLE 5.3

and these hypotheses are tested:

The null hypothesis is rejected if . Find the probability of a type I error in this test.

Use the definition of a type I error.

Use the tables for the Poisson distribution from the formula book.
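If tables are not to hand, the same kind of tail probability can be computed directly. The following is a minimal sketch with assumed numbers (a null mean of 4 and a rejection rule of 9 or more), not the values of the worked example; `poisson_cdf` is a helper written for this illustration.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); an empty sum gives 0 when k < 0."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

# Hypothetical test: H0 says the mean is 4; reject H0 if X >= 9.
lam0, critical = 4.0, 9

# Type I error: the statistic lands in the rejection region while H0 is true.
p_type_1 = 1 - poisson_cdf(critical - 1, lam0)
print(round(p_type_1, 4))  # 0.0214
```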

WORKED EXAMPLE 5.4

Derren wants to test whether a six-sided dice is biased towards rolling sixes. He rolls a dice
times.

a State appropriate null and alternative hypotheses.


b If Derren is using a significance level, how many sixes would he have to see to conclude at
significance that the dice is biased?
c For the test proposed in part b, find the probability of a type I error.

a If $p$ is the probability of rolling a six then:
  $H_0: p = \frac{1}{6}$, $H_1: p > \frac{1}{6}$

b If $X$ is the number of sixes rolled then, if $H_0$ is true, $X$ follows a binomial distribution with $p = \frac{1}{6}$.
(Use your calculator to work out the probabilities from the binomial distribution. Since you are
looking for evidence of $p > \frac{1}{6}$, you need to find $P(X \ge x)$, as this is an event as extreme as
observed, or more extreme in the direction of the alternative hypothesis, i.e. it is the $p$-value of an
observed $x$.)

The dice can be said to be biased at the significance level if five or more sixes are observed.
(You need to find the first value of $x$ that has a $p$-value less than the significance level.)

c The probability of a type I error is $P(X \ge 5 \mid H_0 \text{ is true})$.
(You can just read this from the table. It is the probability of observing five or more sixes while
the null hypothesis is true.)

Probability of type II errors


If the true mean is anything other than that suggested by the null hypothesis and you have not
rejected the null hypothesis, then you have made a type II error (see Key point 5.2). The probability of
a type II error depends upon the true value of the population parameter. Suppose you are testing at
the 5% significance level the null hypothesis $\mu = 120$ with a known standard deviation. If the true mean
were 160 you would expect to be able to detect this very easily. If the true mean were 120.4 you might
have greater difficulty distinguishing this from 120. If you knew the true mean, then you could find the
probability of an observation of this distribution falling in the acceptance region for $H_0$.

Rewind

You could also answer the type of test shown in Worked example 5.4 using a chi-squared
test – see Chapter 3 for a reminder of using a chi-squared test – with two categories: ‘six’ and
‘not six’. However, this would have only one degree of freedom so the fact that the chi-squared
test is only approximate is particularly problematic. It is preferable to use an exact
binomial test, as shown.

The following diagrams show the effect of different true population means on this test. In the first
diagram the true mean is 120, so anything that falls in the red region gets this right, but falling in the
blue regions results in a type I error. In the second, third and fourth diagrams the true mean is getting
further and further away from 120. Anything falling in the red regions picks this up, but anything falling
in the blue regions fails to detect that the true mean is not 120. All of these are now type II errors.
[Figure: four diagrams of the distribution of $x$, each divided into a rejection region, an acceptance region and a rejection region. Panel 1, $H_0$ is true ($\mu = 120$): 95% correctly do not reject $H_0$, 5% type I error (2.5% in each tail). Panel 2, $H_0$ is untrue, small difference ($\mu = 120.4$). Panel 3, $H_0$ is untrue, medium difference ($\mu = 130$). Panel 4, $H_0$ is untrue, large difference ($\mu = 160$). Panels 2 to 4 are labelled ‘Type II error’ and ‘Correctly reject $H_0$’.]
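The effect pictured in the diagrams can be checked numerically. The sketch below uses the null mean of 120 and the true means 120.4, 130 and 160 from the diagrams, but the standard deviation of 10 and the use of a single observation are assumptions made purely for illustration.

```python
from statistics import NormalDist

mu0, sigma, alpha = 120, 10, 0.05   # sigma = 10 is an assumed value

# Two-tailed acceptance region for a single observation under H0.
z = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = mu0 - z * sigma, mu0 + z * sigma

def p_type_2(true_mean):
    """P(observation falls in the acceptance region) when H0 is false."""
    dist = NormalDist(true_mean, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

for m in (120.4, 130, 160):
    print(m, round(p_type_2(m), 3))
```

Under these assumptions the probability of a type II error is close to 95% when the true mean is 120.4, and falls steadily as the true mean moves away from 120.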

Key point 5.5

In a hypothesis test:

$P(\text{type II error}) = P(\text{not rejecting } H_0 \mid H_0 \text{ is false})$

WORKED EXAMPLE 5.5

Internet speeds to a household are normally distributed with a standard deviation of .


The internet provider claims that the average speed of an internet connection has increased
above its long-term value of . A sample is taken on occasions and a hypothesis test is
conducted at the significance level. Find the probability of a type II error if the true average
speed is .
is the continuous random variable speed
of internet connection Define variables.

, State hypotheses.
State the test statistic and its distribution
(assuming is true).

[Figure: distribution of the test statistic, with 9 and a marked on the x-axis.]

Decide the range of the test statistic that falls into the one-tailed acceptance region.
So do not reject if . State the acceptance region.

Use the definition of a type II error.

An important concept when studying tests is the power of a test. It is defined as the
probability of rejecting $H_0$ when it is false, so it is the probability of not making a type II error.

Key point 5.6

In a hypothesis test:

$\text{Power} = 1 - P(\text{type II error})$

WORKED EXAMPLE 5.6

A call centre believes that it receives calls at an average rate of per hour. To test this it looks
at the number of calls in a two-hour period. If that number is greater than or lower than , it
rejects the hypothesis that the average rate is per hour. Given that the actual rate of calls is
calls per hour, find the power of the test.

Let the random variable be the number of calls in two hours. (You are considering the actual rate rather than the
rate under the null hypothesis.)

(To find the probability of a type II error you look at how likely the number of calls is to fall into the acceptance region.)
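A calculation of this kind can be sketched as follows. The rejection limits and the actual mean are hypothetical stand-ins, not the values of the worked example, and `poisson_cdf` and `power_of_test` are names invented for this illustration.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); an empty sum gives 0 when k < 0."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

# Hypothetical rule: reject H0 if the two-hour count is
# greater than UPPER or lower than LOWER.
LOWER, UPPER = 12, 29

def power_of_test(actual_mean):
    """P(reject H0) when the two-hour count is Po(actual_mean)."""
    # The acceptance region is LOWER <= X <= UPPER.
    p_type_2 = poisson_cdf(UPPER, actual_mean) - poisson_cdf(LOWER - 1, actual_mean)
    return 1 - p_type_2

# For example, an assumed actual rate of 12 per hour gives a two-hour mean of 24.
power = power_of_test(24.0)
print(round(power, 3))
```

The key design point is that the Poisson mean used is the actual two-hour mean, not the mean under the null hypothesis.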

EXERCISE 5B

1 Given that , find the probability of a type I error for each of these situations.
a i ; reject if

ii ; reject if
b i ; reject if

ii ; reject if
c i ; reject if or

ii ; reject if or
2 Given that , find the probability of a type I error in each of these situations.
a i ; reject if

ii ; reject if
b i ; reject if

ii ; reject if

c i ; reject if or

ii ; reject if or
3 Given that , find the probability of a type I error for each of these situations.
a i ; reject if

ii ; reject if
b i ; reject if or

ii ; reject if or
4 Find the probability of a type II error for each of these situations. The sample mean is being
tested and the sample size, , is specified in each case.
a i with , ; significance; . In reality, .

ii with , ; significance; . In reality, .


b i with , ; significance; . In reality, .

ii with , ; significance; . In reality, .


c i with , ; significance; In reality, .

ii with , ; significance; . In reality, .


5 Given that , find the probability of a type II error for each of these situations. Find also
the power of the test.
a i ; reject if ; real

ii ; reject if ; real
b i ; reject if ; real

ii ; reject if ; real
c i ; reject if or ; real

ii ; reject if or ; real
6 Given that , find the probability of a type II error in each of these situations.
a i ; reject if ; true

ii ; reject if ; true
b i ; reject if ; true

ii ; reject if ; true
c i ; reject if or ; true

ii ; reject if or ; true
7 What are the advantages and disadvantages of increasing the significance level of a hypothesis
test?
8 A television magician tries to trick an audience into believing that a coin is biased. He records
himself tossing a fair coin many hundreds of times until he tosses ten heads in a row. He then
shows the audience the film containing only the ten heads being tossed.
a State the null and alternative hypotheses in this situation.
b If an audience member believed that the coin is biased, is this an example of a type I or a
type II error?
9 A textbook says that there is positive correlation between two variables if the sample correlation
coefficient is more than . Describe, in this context, what is meant by:
a a type I error
b a type II error.
10 A student conducts a binomial hypothesis test to see if a six-sided dice is fair. He rolls the dice
times and if he sees more than sixes, he will claim that the dice is biased.
a Describe, in the context of this test, what is meant by:
i a type I error
ii a type II error.
b State two changes to the test that would make a type II error less likely.
11 The numbers of people arriving at a health club follow a Poisson distribution with mean per
hour. After a new swimming pool is opened, the management want to test whether the number
of people visiting the club has increased.
a State suitable null and alternative hypotheses.
b They decide to record the number of people arriving at the club during a randomly chosen
hour, and to reject the null hypothesis if this number is larger than .
Find the significance level of this test. Comment on your result.
12 A long-term study suggests that traffic accidents at a particular junction occur randomly at a
constant rate of per week. After new traffic lights are installed, it is believed that the number
of accidents has decreased. The number of accidents over a -week period is recorded.
a Let denote the average number of accidents in a -week period. State suitable hypotheses
involving .
b It is decided to reject the null hypothesis if the number of accidents recorded is less than or
equal to . Find the probability of making a type I error.
c The average number of accidents has in fact decreased to per week. Find the probability
of making a type II error in this test.
13 The masses of eggs are known to be normally distributed with standard deviation . Dhalia
wants to test whether eggs produced by her hens have mass greater than on average.
a State suitable null and alternative hypotheses to test Dhalia’s idea.
Dhalia weighs eggs and finds that their average mass is .
b Test at the significance level whether Dhalia’s eggs have mass greater than on
average. State your conclusion clearly.
c Write down the probability of making a type I error in this test.

d What is the smallest average mass of the eggs that would lead Dhalia to reject the null
hypothesis?
e Given that the average mass of Dhalia’s eggs is actually , find the power of the test.
14 A coin is flipped times. It is decided that it is a biased coin if or heads are observed.
a State the null and alternative hypotheses.
b Find the significance level of the test.
c Given that the true probability of flipping a head is , find the probability of a type II error as a
function of .
d Show that the probability of a type II error is maximised when .
15 A population is known to have a normal distribution with a variance of and an unknown mean
. It is proposed to test the hypotheses using the mean of a sample of size
.
a Find the appropriate critical regions corresponding to a significance level of:
i
ii
b Given that the true population mean is , calculate the probability of making a type II error
when the level of significance is:
i
ii
16 The number of worms in a square metre of forest is known to follow a Poisson distribution. The
mean is thought to be . This is rejected if no worms are observed when a square metre is
observed. If the true mean is , find an expression in terms of for the power of this test.

Checklist of learning and understanding

A $t$-test is a way of testing to see if a sample provides evidence of a change in the population
mean from a previously held belief. It is based on the $t$-score:

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

A type I error is falsely rejecting $H_0$.

A type II error is not rejecting $H_0$ when it is false.

The power of a test is $1 - P(\text{type II error})$.
Mixed practice 5
1 Find the value of the -score (to three significant figures) for these data when testing the
null hypothesis .

Choose from these options.

2 What is the definition of the significance level in a hypothesis test of a continuous
parameter? Choose from these options.

3 A chemist collects data on the volume of produced in a reaction. He believes that it
should be .

a Write down null and alternative hypotheses for the chemist’s belief.

He measures the reaction times and gets these results:

b Find the -score for these data.

c Hence conduct a -test at the significance level.

d What assumption have you made in conducting a -test?

4 a Give the definition of a type II error.

A one-tailed -test is conducted for the hypotheses:

It is known that

It is decided to reject if a mean of observations is less than .

b Find the probability of a type I error in this test.

In reality, .

c Find the probability of a type II error in this test.

d What could be done to decrease the probability of a type II error without changing the
probability of a type I error?

5 The number of beta particles emitted in one second by an isotope of a radioactive element
is known to follow a distribution. A theory suggests that but a physicist believes
that this might be an underestimate.

a State the null and alternative hypotheses.

b The physicist decides that he will reject the null hypothesis if he sees more than beta
particles in a five-second period. What is the significance level of this test?

c In reality, . Find the power of this test.

6 A union representative wishes to test a company’s claim that it pays an average salary of
£ . She suspects that the company pays less than this.

a Write down null and alternative hypotheses for her test.

The union representative takes a random sample of employees and finds their wages
in thousands of pounds. Her results are summarised here:

b Find an unbiased estimate of the variance of .

c What is the -score for her results?

d Conduct a -test at the significance level to test her suspicion.

7 A coin is flipped times and it is decided that it is a biased coin if more than heads or
fewer than heads are observed.

a State the null and alternative hypotheses.

b Find the significance level of the test.

c If the coin is actually biased so that heads occur of the time, find the power of the
test.

8 Safeerah regularly cycles to and from work. She has a steel-framed bicycle that weighs
. Her mean journey time for the round trip is minutes. Her friend, Josh, has a carbon-
framed bicycle that weighs . Safeerah is thinking of buying a carbon-framed bicycle to
reduce her journey time, and Josh agrees to lend her his bicycle so that she can try it.

a The carbon-framed bicycle is sold using the slogan: ‘Less weight means more speed’.
Safeerah, who weighs , is expecting that the per cent reduction in bicycle mass
will substantially reduce her journey times. Josh tells her not to expect this as the
resultant mass reduction is actually closer to per cent.

Justify Josh’s figure of per cent.

b Safeerah records her journey times with the carbon-framed bicycle on typical days as:

Assuming that these times may be regarded as a random sample from a normal
distribution, test, at the significance level, whether her mean journey time with the
carbon-framed bicycle is less than minutes.

[© AQA 2013]

9 A company manufactures bath panels. The bath panels should be deep, but a small
amount of variability is acceptable. The depths are known to be normally distributed with
standard deviation .

a In order to check that the mean depth is , Amir takes a random sample of bath
panels from the current production and measures their depths, in millimetres, with these
results.

Test whether the current mean is , using the significance level.

b Isabella, a manager, tells Amir that, in order to check whether the current mean is
, it is necessary to take a larger sample. Amir therefore takes a random sample of
size from the current production and finds that the mean depth is .

Test whether the current mean is , using the data from this second sample and
the significance level.

c It is proposed to carry out hypothesis tests at regular intervals to check that the mean
remains at .

Amir proposes that the tests be based on random samples of size , but Isabella favours
random samples of size . Explain which, if either, sample size would lead to a smaller
risk:

i of a type I error

ii of a type II error.

[© AQA 2011]

10 A town council wanted residents to apply for grants that were available for home insulation.
In a trial, a random sample of residents was encouraged, either in a letter or by a
phone call, to apply for the grants. The outcomes are shown in the table.

Applied for grant Did not apply for grant Total


Letter
Phone call
Total

a The council believed that a phone call was more effective than a letter in encouraging
people to apply for a grant. Use a -test to investigate this belief at the significance
level.

b After the trial, all the residents in the town were encouraged, either in a letter or by a
phone call, to apply for the grants. It was found that there was no association between
the method of encouragement and the outcome. State, with a reason, whether a type I
error, a type II error or neither occurred in carrying out the test in part a.

[© AQA 2013]
6 Confidence intervals

In this chapter you will learn how to:

estimate the interval in which a population parameter lies, called a confidence interval
estimate the confidence interval when the population variance is known
use confidence intervals to conduct hypothesis tests.
If you are following the A Level course, you will also learn how to:
find confidence intervals when the population variance is unknown, using the $t$-distribution.

Before you start…


A Level Mathematics Student Book 1, Chapter 22
You should know how to calculate unbiased estimates of the population variance.
1 Calculate an unbiased estimate of the variance of a population based on this sample: .

A Level Mathematics Student Book 2, Chapter 21
You should know how to conduct calculations with the normal distribution.
2 Given that , find .

A Level Mathematics Student Book 2, Chapter 22
You should know how to conduct hypothesis tests with the normal distribution.
3 A sample of objects drawn from a normal distribution with a standard deviation of has a mean of . Conduct a two-tailed test at significance to decide if this provides significant evidence of a change from a mean of .

Chapter 5
You should know how to conduct $t$-tests.
4 Based on this sample, test to see if there is evidence that at the significance level: .

What is the best way of describing an estimate?


If you want to estimate a population parameter, which is better: having a single value that is very unlikely
to be correct, or having a range of values that is very likely to contain the population parameter? The latter is
usually preferable, and is called a confidence interval.

In this chapter you will learn how to construct confidence intervals for the mean in different situations and
see how such intervals can be interpreted.
Section 1: Confidence intervals
A single value calculated from a sample and used to estimate a population parameter is called a point
estimate. You are trying to find an interval that has a specified probability of including the true population
value of the parameter you are interested in. This interval is called a confidence interval and the specified
probability is called the confidence level.

For example, given the data , you can calculate the sample mean, which is . However, it is
very unlikely that the mean of the population this sample was drawn from is exactly . You will now
develop a method that will allow you to say with confidence that the population mean is somewhere
between and . This does not mean that there is a probability of that the true mean is between
and , but rather that of confidence intervals constructed from samples like this would contain
the true mean.

To develop the theory, you are going to look at creating $95\%$ confidence intervals, which are the default
choice. Suppose you are estimating the population mean, $\mu$, using the sample mean $\bar{X}$. Initially, you will only
consider random variables drawn from a normal distribution so that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, where $\mu$ is the population
mean (the thing you want to find) and $\sigma$ is the standard deviation in one observation of $X$. You can find, in
terms of $\mu$ and $\sigma$, a region symmetrical about $\mu$ that has a $95\%$ probability of containing $\bar{X}$.

[Figure: normal distribution centred on µ, with 95% of the area between the lower bound and the upper bound and 2.5% in each tail.]

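You can find the $z$-score of the upper bound. Using the symmetry of the situation, you find that $2.5\%$ of the
distribution is above the upper bound, so the $z$-score is $\Phi^{-1}(0.975) = 1.96$. You can say that:

$P(-1.96 < Z < 1.96) = 0.95$

You can use the fact that $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$:

$P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95$

You can rearrange the inequalities to focus on $\mu$:

$P\left(\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$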

Rewind

You saw in A Level Mathematics Student Book 2, Chapter 21, that $\Phi^{-1}(x)$ is the inverse
normal distribution that tells you the $z$-score that results in the cumulative probability $x$.

Be warned – although this looks like it is a statement about the probability of $\mu$, in your derivation you
treated $\mu$ as a constant so it is meaningless to talk about a probability of $\mu$. This statement is still
concerned with the probability distribution of $\bar{X}$.
So, if the sample mean is $\bar{x}$, your $95\%$ confidence interval for $\mu$ is:

$\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$

Tip

The quantity $\dfrac{\sigma}{\sqrt{n}}$ is sometimes referred to as the standard error.

You can generalise this method to other confidence levels. To find a $c\%$ confidence interval you can find the
critical $z$-score geometrically, using the properties of this graph.

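[Figure: standard normal distribution of $Z$ with an area of $\frac{c}{2}\%$ on each side of the mean, extending to the critical $z$-scores, and 50% of the area below the mean.]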

From this diagram you can see that the critical $z$-score is the one where there is a probability of $0.5 + \frac{c}{200}$ of
being below it.

Tip

This process creates a symmetric interval around the sample mean. It is also possible to create a
non-symmetric interval, but that is beyond the scope of this course.

Tip

Some calculators can find these confidence intervals for you.

Key point 6.1

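A symmetric $c\%$ confidence interval for the population mean $\mu$ is:

$\bar{x} - z\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\dfrac{\sigma}{\sqrt{n}}$

where $\bar{x}$ is the sample mean, $\sigma$ is the standard deviation in one observation of $X$ and
$z = \Phi^{-1}\left(0.5 + \dfrac{c}{200}\right)$.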
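A symmetric interval of the form $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$ is straightforward to compute. The sketch below is illustrative only: the function name `z_interval` and the sample figures are assumptions, and `c` is the confidence level written as a percentage.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, c):
    """Symmetric c% confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + c / 200)   # critical z-score
    half_width = z * sigma / n ** 0.5         # z times the standard error
    return xbar - half_width, xbar + half_width

# Hypothetical data: sample mean 10, known sigma 2, sample size 16.
low, high = z_interval(xbar=10.0, sigma=2.0, n=16, c=95)
print(round(low, 3), round(high, 3))  # 9.02 10.98
```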

WORKED EXAMPLE 6.1

The masses of fish in a pond are known to have a normal distribution with a standard deviation
. The mean mass of fish from the pond is found to be .

a Find a confidence interval for the mean mass of all the fish in the pond.
b Guidance from a vet suggests that the pond is an unsuitable environment if the mean mass of the
fish is below . Does your confidence interval suggest that the environment is unsuitable?
a For a confidence interval
So confidence interval is , which is .
(Use your calculator to find the $z$-score associated with a confidence interval.)

b A mean mass of is within the confidence interval, so the confidence interval does not necessarily
suggest that the environment is unsuitable. (You need to consider whether a true mean of is
consistent with your confidence interval.)

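You do not need to know the centre of the interval to find the width of the confidence interval. From Key
point 6.1 you know that the confidence interval goes from $\bar{x} - z\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z\frac{\sigma}{\sqrt{n}}$. Therefore its width is $2z\frac{\sigma}{\sqrt{n}}$.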

Fast forward

The inference in part b of Worked example 6.1 is effectively a type of hypothesis test. You will
see later in this chapter how you can quantify the significance level when using a confidence
interval to perform a hypothesis test.

Key point 6.2

The width of a confidence interval is $2z\dfrac{\sigma}{\sqrt{n}}$.

WORKED EXAMPLE 6.2

The results in a test are known to be normally distributed with a standard deviation of . How
many people need to be tested to find an confidence interval with a width of less than ?
For an confidence interval Find the -score associated with an confidence
interval.

Set up an inequality.

At least people need to be tested.
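The inequality for the sample size can also be solved programmatically. This is a sketch with assumed inputs (a standard deviation of 15, an 80% level and a maximum width of 5), not the figures of the worked example; `min_sample_size` is a name invented here.

```python
from math import floor
from statistics import NormalDist

def min_sample_size(sigma, c, max_width):
    """Smallest n for which the symmetric c% interval is narrower than max_width.

    The width 2 * z * sigma / sqrt(n) is below max_width exactly when
    n > (2 * z * sigma / max_width) ** 2.
    """
    z = NormalDist().inv_cdf(0.5 + c / 200)
    bound = (2 * z * sigma / max_width) ** 2
    return floor(bound) + 1   # smallest integer strictly greater than the bound

print(min_sample_size(sigma=15, c=80, max_width=5))  # 60
```

Because the width must be strictly less than the target, the sketch takes the next integer above the bound rather than rounding to the nearest one.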

If the sample size is sufficiently large (greater than ) and you do not know the true variance, you need to
use the unbiased estimate of the variance as a substitute for the true variance.

You can use confidence intervals to conduct hypothesis tests. For example, if you find a confidence
interval you can use it to conduct a significance two-tailed hypothesis test.

WORKED EXAMPLE 6.3

A vet is measuring the masses of a breed of dog . Her data are summarised here:

It can be assumed that the masses follow a normal distribution.


a Find a confidence interval for the mean.
b A textbook claims that the average mass of this breed is . Conduct a hypothesis test at the
significance level to decide if this sample suggests that the textbook figure is incorrect.

a (First you need to work out the sample statistics. You need to find the unbiased estimate of the
variance.)

(You can use the formula from Key point 6.1 to find the appropriate $z$-score.)

So
(You can then use the expression in Key point 6.1, substituting $s$ for $\sigma$.)

b ,
The true mean being is consistent with the confidence
interval found, so you do not reject $H_0$ at the significance
level. There is not significant evidence that the textbook is
incorrect.

You must take care not to draw false inferences from confidence intervals. It is important to know the types
of error that can be made, as shown in Worked example 6.4.

WORKED EXAMPLE 6.4

Ramon works out a confidence interval for the population mean as to . He claims that:
a of any observed data will be between and
b the probability that the population mean is between and is

c the median of the population is .

Decide which of these statements, if any, are correct. Justify your answers.

a This is not necessarily true. The confidence interval is for the mean rather than a
single observation. Even if this statement was about the sample mean there would be
variations between samples.
b This is not true. The population mean is not a random variable so you cannot talk
about a probability associated with it.

c This is not necessarily true. The confidence interval will be centred on the sample
mean, which may not equal the population median.
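The correct reading of a confidence level, as a long-run proportion of intervals that capture the fixed population mean, can be illustrated by simulation. Everything numerical here is an assumption for the sketch: a population with mean 50 and standard deviation 4, samples of size 10, a 95% level and a fixed random seed.

```python
import random
from statistics import NormalDist

random.seed(1)                      # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 4.0, 10        # assumed population and sample size
z = NormalDist().inv_cdf(0.975)     # critical z-score for a 95% interval
half_width = z * sigma / n ** 0.5

trials = 5000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    # The interval changes from sample to sample; mu stays fixed.
    if xbar - half_width < mu < xbar + half_width:
        hits += 1

coverage = hits / trials
print(coverage)
```

The printed proportion should be close to 0.95. It describes the procedure over many samples, not the probability that any one calculated interval contains the mean.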

EXERCISE 6A

1 Find the $z$-value for these symmetric confidence levels:


a

b .
2 Find the required symmetric confidence interval for the population mean for the summarised data. You
can assume that the data are taken from a normal distribution with known variance.
a i , , ; confidence interval

ii , , ; confidence interval
b i , , ; confidence interval

ii , , ; confidence interval
3 Copy and complete this table. You can assume that the data are taken from a normal distribution with
known variance and that the confidence level is symmetric.
Confidence level Lower bound of Upper bound of
interval interval
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
4 The blood oxygen levels (measured as percentages) of an individual are known to be normally
distributed with a standard deviation of . Based upon six readings, Niamh finds that her blood
oxygen levels are on average .
a Find a symmetric confidence interval for Niamh’s true blood oxygen level.
b A doctor needs to be called if the true mean oxygen level falls below . Does the confidence
interval suggest that the true oxygen level is below ?
5 The birth masses of male babies in a hospital are known to be normally distributed with variance .
a Find a symmetric confidence interval for the average birth mass if a random sample of ten
male babies have an average mass of .

b If average birth masses are below then an investigation must be conducted. Based upon this
confidence interval, should an investigation be conducted?
6 A data set is summarised here:

Find a symmetric confidence interval for the mean, assuming that the data are drawn from a
normal distribution.

7 a A sample of people in a town have an average wage of £ with an unbiased estimate of the
population variance of million. The wages follow a normal distribution. Find a symmetric
confidence interval for the mean wage in the town.
b Is there significant evidence (at significance) that the mean wage in this town is different from
£ ?

8 When a scientist measures the concentration of a solution, the measurement obtained can be
assumed to be a normally distributed random variable with standard deviation .
a He makes independent measurements of the concentration of a particular solution and correctly
calculates the confidence interval for the true value as . Determine the confidence
level of this interval.
b The scientist claims that this means that of sample means will be between and . Is
this a correct interpretation of the confidence interval? Justify your answer.
c He is now given a different solution and is asked to determine a confidence interval for its
concentration. The symmetric confidence interval is required to have a width less than . Find
the minimum number of measurements required.
9 A supermarket wishes to estimate the average amount spent shopping each week by single men. It is
known that the amount spent has a normal distribution with standard deviation € . What is the
smallest sample required so that the margin of error (the difference between the centre of the interval
and the boundary) for an symmetric confidence interval is less than € ?
10 A physicist wishes to find a confidence interval for the mean voltage of some batteries. She therefore
randomly selects batteries and measures their voltages. Based on her results, she obtains the
confidence interval [ ]. The voltages of batteries are known to be normally distributed
with a standard deviation of .
a Find the value of .
b Assuming that the same confidence interval had been obtained from measuring batteries, what
would be its level of confidence?

c A confidence interval for the mean voltage of a different brand of batteries is found to be [
]. Is there significant evidence that the second brand of battery has a higher voltage than
the first brand of battery?
11 a A set of data items produces a confidence interval for the mean of ( ). You can assume
that the data are drawn from a normally distributed population.
Given that , find the confidence level, giving your answer to two significant figures.
b Jasmine wants to test these hypotheses:

Use the given confidence interval to conduct a hypothesis test, stating the significance level.
12 From experience it is known that the variance in the increase between marks in a beginning-of-year
test and an end-of-year test is . A random sample of four students in Mr Jack’s class was selected
and the results in the two tests were recorded.
Alma Brenda Ciaron Dominique
Beginning of year

End of year

a Assuming that the difference can be modelled by a normal distribution with variance , find a
symmetric confidence interval for the mean increase.
b How could the width of the confidence interval be decreased?
c Do these data provide evidence at the significance level that Mr Jack’s class is doing better than
the school average of a -mark increase?
13 Which of these statements are true for symmetric confidence intervals of the mean?
a There is a probability of that the true mean is within the interval.
b If you were to repeat the sampling process times, of the intervals would contain the true
mean.
c Once the interval has been created there is a chance that the next sample mean will be within
the interval.

d On average of intervals created in this way contain the true mean.


e of sample means will fall within this interval.
14 For a given sample, which will be larger; an symmetric confidence interval for the mean or a
symmetric confidence interval for the mean?
Section 2: Confidence intervals for the mean when the population variance is unknown
In many real-life situations, when finding an estimate for the population mean you do not know the
true population variance – you estimate it from the sample variance, $s^2$. This means that the statistic
$\dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ does not follow the normal distribution, but rather the $t$-distribution (as long as $X$ follows a
normal distribution). In Section 1 you assumed that when the sample size is large the difference
between the $t$-distribution and the normal distribution is sufficiently small that it can be ignored. In
this section you will look at how you can adapt the theory from Section 1 when sample sizes are small
– less than about .

Rewind

The $t$-distribution and associated calculations were covered in Chapter 5. Remember that
the number of degrees of freedom is given by $\nu = n - 1$.

You can follow a similar analysis to the one leading to Key point 6.1 to get a formula for a confidence
interval using a t-distribution when the sample size is small.

Key point 6.3

If the estimated variance, $s^2$, is found from the sample and the sample size, $n$, is small, the $c\%$
symmetric confidence interval for the population mean, $\mu$, is given by:

$\bar{x} \pm t\,\dfrac{s}{\sqrt{n}}$

where $\bar{x}$ is the sample mean and $t$ is chosen so that

$P(-t < T_{n-1} < t) = \dfrac{c}{100}$

The sample must be drawn from a normal distribution.

Tip

You can find the value of $t$ from some calculators or by using the percentage points
table in the formula book. For example, if you are looking at a 95% symmetric
confidence interval, that means that there is 97.5% below the upper bound of the
interval so you use the 97.5 percentage point.

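[Figure: $t$-distribution curve with the central 95% shaded and 2.5% in each tail, so that 97.5% lies below the upper bound of the interval.]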

WORKED EXAMPLE 6.5

Find a confidence interval for the mean of the data , assuming that the data is
drawn from a normal distribution.

, Find the sample mean and unbiased estimate of the


variance.
Find the number of degrees of freedom.
th percentage point of is
Use tables to find the $t$-score associated with a
symmetric confidence interval when . If there is
within the confidence interval then there is
below the upper bound.

Apply the formula from Key point 6.3.
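
The steps of this worked example can also be sketched in code. The following is a minimal illustration only, using the Python standard library: the data set is hypothetical (it is not the data of Worked example 6.5), the function names are invented for this sketch, and the percentage point is found numerically rather than read from the formula book tables.

```python
import math
from statistics import mean, variance


def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(T < x), using Simpson's rule on [0, |x|] and the symmetry of the density."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_percentage_point(p, nu):
    """Value t with P(T < t) = p, for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:      # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(data, level=0.95):
    """Symmetric confidence interval for the mean, variance estimated from the sample."""
    n = len(data)
    x_bar = mean(data)
    s = math.sqrt(variance(data))               # variance() divides by n - 1 (unbiased)
    t = t_percentage_point(1 - (1 - level) / 2, n - 1)   # e.g. 97.5 point for 95%
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


data = [12, 15, 11, 14]                         # hypothetical data, not from the book
lower, upper = t_confidence_interval(data, 0.95)
print(round(lower, 2), round(upper, 2))
```

With four observations there are three degrees of freedom, so the sketch uses the 97.5 percentage point of the $t$-distribution with $\nu = 3$, matching the method described in the Tip.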

EXERCISE 6B

1 Find the required symmetric confidence interval for the population mean for these data, some of
which have been summarised. You can assume that the data are taken from a normal
distribution.
a i , , ; confidence interval

ii , , ; confidence interval

b i , ; confidence interval

ii , ; confidence interval

c i ; confidence interval

ii ; confidence interval
2 A garden contains a large number of rose bushes. A random sample of eight bushes was taken
and the heights in cm were measured and the data were summarised as:

,
a State an assumption that is necessary to find a confidence interval for the mean height of
rose bushes.
b Find the sample mean.

c Find an unbiased estimate for the population variance.


d Find an symmetric confidence interval for the mean height of rose bushes in the garden.
3 A sample of three randomly selected students are found to have an unbiased estimate of the
population variance of in the amount of time they watch television each weekday.
Based upon this sample, the symmetric confidence interval for the mean time a student spends
watching television is calculated as . It can be assumed that the times follow a normal
distribution.
a Find the mean time spent watching television.
b Find the confidence level of the interval.

c A newspaper report on this study claims that most students watch between and
hours of television each day. Is this a reasonable conclusion from this confidence interval?
Explain your answer.

4 The random variable is normally distributed with mean . A random sample of observations
is taken on , and it is found that:

A symmetric confidence interval is calculated for this sample.


Find the confidence level for this interval.

5 The lifetime of a printer cartridge, measured in pages, is believed to be approximately normally
distributed. The lifetimes of randomly chosen printer cartridges are measured and the results
are:
A symmetric confidence interval for the mean was found to be .
a Find the value of .
b What is the confidence level of this interval?

c The manufacturer claims that the lifetime of the printer cartridge is at least pages. Is the
confidence interval found consistent with this claim?
6 The times taken for four people to complete a crossword puzzle are measured and the results
are shown in this table.
Person Time (minutes)
John
Diane
David
Jane
a Find a confidence interval for the true population mean, assuming that the times follow a
normal distribution.
b The newspaper says that the average time to complete the crossword is more than
minutes.
i State suitable null and alternative hypotheses for this test.
ii Use your confidence interval from part a to determine the conclusion to this hypothesis
test at the significance level.
7 The masses of four burgers, in grams, before and after being cooked for one minute, are
measured:
Burger
Before cooking
After cooking

A symmetric confidence interval for the mean mass loss was found to include values from
. It can be assumed that the masses follow a normal distribution.
a Find the value of .

b Find the confidence level of this interval.


8 The temperature of a block of wood minutes after being lifted out of liquid nitrogen is
measured and then the experiment is repeated. The results are and .
a Assuming that the temperatures are normally distributed, find a confidence interval for
the mean temperature of a block of wood minutes after being lifted out of liquid nitrogen.
b A different block of wood is subjected to the same experiment and the results are and
, where . A second confidence interval is created. Prove that the two confidence
intervals overlap for all values of .

Checklist of learning and understanding

A confidence interval for the mean is a range of possible values for the population mean,
along with a confidence level.
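If the true population variance is known and the sample mean follows a normal distribution
then the confidence interval takes the form:
$\bar{x} \pm z\,\dfrac{\sigma}{\sqrt{n}}$ where $P(-z < Z < z)$ equals the confidence level.

The width of the confidence interval is given by $2z\,\dfrac{\sigma}{\sqrt{n}}$.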

When carrying out a hypothesis test or finding a confidence interval for the mean, if the
sample size is sufficiently large ( ) and you do not know the true variance, you can use the
unbiased estimate of the variance as a substitute for the true variance.
If the estimated variance is found from the sample and the sample size is small, the $c\%$
confidence interval for the population mean is given by:
$\bar{x} \pm t\,\dfrac{s}{\sqrt{n}}$
where $t$ is chosen so that $P(-t < T_{n-1} < t) = \dfrac{c}{100}$. The sample must be drawn from a normal
distribution.
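
The point about the width of a confidence interval can be turned into a quick calculation of the sample size needed for a target width. This is a minimal sketch, assuming a known standard deviation and the usual width $2z\sigma/\sqrt{n}$; the function name and the numbers used are illustrative assumptions, not values from the exercises.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, max_width, level=0.95):
    """Smallest n for which the known-variance interval is narrower than max_width."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    # width = 2 * z * sigma / sqrt(n) < max_width  <=>  n > (2 * z * sigma / max_width)^2
    bound = (2 * z * sigma / max_width) ** 2
    return math.floor(bound) + 1                     # smallest integer strictly above bound


# Illustrative values only: sigma = 5, width below 2, 95% confidence.
n = min_sample_size(sigma=5, max_width=2, level=0.95)
print(n)
```

Because the width shrinks like $1/\sqrt{n}$, halving the width needs roughly four times as many observations.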
Mixed practice 6
1 The mass of a particular breed of dog is known to be normally distributed with variance
. The masses of a random sample of dogs from this breed are found. What is the
smallest value of required to make the confidence interval for the mean mass less
than wide?
Choose from these options.

2 A data set taken from a normal distribution is summarised as:

, ,

a Calculate the unbiased estimate of the variance of these data.

b Find a confidence interval for the mean.

c Conduct a two-tailed test at significance to determine if there is a change from .

3 The masses of bananas are investigated. The masses of a random sample of of these
bananas were measured and the mean was found to be with an unbiased variance of
. It is assumed that the masses follow a normal distribution.

Find a symmetric confidence interval for .

4 The time taken for a mechanic to replace a set of brake pads on a car is recorded. In a
week she changes sets of brake pads and minutes and .
Assuming that the times are normally distributed, calculate a symmetric confidence
interval for the mean time taken for the mechanic to replace a set of brake pads.

5 The pH of a river is believed to be normally distributed with a standard deviation of .


What is the smallest number of samples that should be taken to get a confidence
interval for the mean with a width of less than ?

6 A random sample of four students in a school was selected and the results they got in two
tests were recorded:

Alma Brenda Ciaron Dominique

Beginning of year
End of year

a Find a symmetric confidence interval for the mean increase in marks from the
beginning of year until the end of year, assuming that the differences follow a normal
distribution.

b Hence conduct a test at the significance level to see if the results have changed
between the beginning and end of the year.

7 The random variable is normally distributed with mean and standard deviation .

A random sample of observations of has a mean of .

a Find a confidence interval for .


b It is believed that . Determine whether or not this is consistent with your
confidence interval for .

8 From experience it is known that the variance in the mass decrease during a diet is . A
random sample of four people was selected and their masses before and after their diet
were recorded.

Bobby Sam Francis Alex

Before diet
After diet

a Assuming that the mass loss follows a normal distribution, find a confidence interval
for the mean mass loss during the diet.

b Hence conduct a test at significance to see if the diet results in a change in mass.

9 A sample of eggs are weighed and the masses in grams are:

                 

Assuming that these masses form a random sample from a normal population, calculate:

a unbiased estimates of the mean and variance of this population

b a confidence interval for the mean.

10 a i A confidence interval for a population mean, , is to be constructed. What is the
probability that the interval will not include the value of ?

ii If such confidence intervals are constructed from separate random samples from
the same population, find the probability that at least one of them will not include .

b Jurgen can run metres in a mean time of seconds. His coach changes his
training programme to concentrate on his starting speed. After following the new training
programme, a random sample of of Jurgen’s -metre running times has mean
seconds and standard deviation seconds.

i Assuming Jurgen’s -metre times are normally distributed, construct a
confidence interval for his new mean time to run metres, giving the limits to three
decimal places.

ii Use the confidence limits to decide whether there is significant evidence that the new
training programme has been effective. Justify your decision.

[© AQA 2015]
FOCUS ON … PROOF 2

Proving the expectation and variance of the binomial distribution


In A Level Mathematics Student Book 2, Chapter 21, you used the formulae for the mean and variance of
the binomial distribution:

If $X \sim \mathrm{B}(n, p)$, then $E(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$.

In this section you will prove these facts.
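
Before working through the proof, it can help to check the two facts numerically for one case. The sketch below is only a sanity check, not a proof: it sums the binomial probabilities directly, and the values of `n` and `p` are arbitrary choices made for illustration.

```python
from math import comb


def binomial_mean_and_variance(n, p):
    """Total probability, mean and variance of B(n, p), computed from the definition."""
    q = 1 - p
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    mean_of_squares = sum(r * r * pr for r, pr in enumerate(probs))
    return sum(probs), mean, mean_of_squares - mean**2


# Arbitrary illustrative values: n = 12 trials, p = 0.3.
total, mean, var = binomial_mean_and_variance(n=12, p=0.3)
print(total, mean, var)   # compare with 1, n*p and n*p*(1-p)
```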

You need to know the formula for binomial probabilities and the binomial expansion. One part of the proof
also involves differentiation using the chain rule.

Rewind

Refer to A Level Mathematics Student Book 2 for revision on the binomial distribution and on
the chain rule.

QUESTIONS

1 Expand where is a positive integer.


2 Use your result from question 1 to prove that if is the probability of success and is the
probability of failure, then:

3 Explain why .

4 By differentiating with respect to and treating as a constant, prove that .

5 a By writing the binomial coefficient in terms of factorials, explain why .

b Hence prove that .

6 a Show that .

b Hence prove that .


FOCUS ON … PROBLEM SOLVING 2

Investigating confidence intervals


The meaning of the confidence level of a confidence interval is commonly misunderstood. In this section
you will use spreadsheets to simulate the construction of confidence intervals to gain a better
understanding of them. With many statistics problems, the ability to simulate the situation is an extremely
useful tool in getting started. The screenshots show the syntax of some common spreadsheets, although
you might need to adapt this for the program you are using.

The formula for the endpoints of a 95% symmetric confidence interval where the population variance is
known is approximately $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$.

Use a spreadsheet to create 10 random numbers generated from the normal distribution $\mathrm{N}(20, 5^2)$:

Observation
Sample 1 2 3 4 5 6 7 8 9 10
1st 27.62 20.34 22.50 10.06 20.75 18.76 26.51 10.41 =NORMINV(RAND(),20,5)
NORMINV(probability, mean, standard_dev)

Then find the mean of the sample and use the formula to find the confidence interval:

A B C D E F G H I J K L M N O
1 Observation Confidence interval
2 Sample 1 2 3 4 5 6 7 8 9 10 Mean Lower Upper
3 1st 13.20 26.32 17.06 14.70 19.77 13.47 21.86 21.30 20.27 15.74 18.37 15.27 =L3+1.96*5/SQRT(10)
4

Tip

Some spreadsheets have the option of generating random numbers from a given distribution. If
your spreadsheet does not have this facility, then you can still use a random number generator
which provides random numbers from the rectangular distribution between 0 and 1; most
spreadsheets do have this function. You might have to think about why the formula shown then
provides random numbers from the normal distribution; it is not obvious.

Check if the confidence interval does contain the true mean, which was 20:

L M N O P Q R S
Confidence interval
Mean Lower Upper Check
20.97 17.87 24.07 =IF(AND(M3<20,N3>20),1,0)
IF(logical_test, [value_if_true], [value_if_false])

Then copy this all down to consider 200 samples, all of size 10. Count how many do contain the true mean.

Confidence interval
Mean Lower Upper Check Counting:
20.80 17.70 23.90 1.00 =SUM(O3:O202)
21.52 18.42 24.62 1.00 SUM(number1, [number2], …)
21.94 18.84 25.04 1.00
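
The same simulation can be sketched outside a spreadsheet. The version below is one possible translation, not the book's method: it takes the mean 20, standard deviation 5, sample size 10 and multiplier 1.96 from the spreadsheet formulas shown, while the function name, the seed and the larger run of 20000 samples are assumptions added for illustration.

```python
import random
from math import sqrt


def coverage(num_samples=200, n=10, mu=20, sigma=5, z=1.96, seed=1):
    """Proportion of simulated known-variance intervals that contain the true mean."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        half_width = z * sigma / sqrt(n)       # same as 1.96*5/SQRT(10) in the sheet
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1                          # plays the role of the Check column
    return hits / num_samples


print(coverage())                      # 200 samples, as in the spreadsheet
print(coverage(num_samples=20000))     # a longer run settles closer to the confidence level
```

Any single interval either contains the true mean or it does not; the confidence level describes the long-run proportion of intervals that do.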

QUESTIONS
1 For each sample, can you say with certainty whether or not the true mean is within the
calculated confidence interval?

2 What percentage of the calculated confidence intervals contain the true mean?

3 If, instead of using the true standard deviation, the sample standard deviation is used, then a
$t$-interval is required.

a How does this affect the width of the confidence intervals?


b How does this affect your answer to question 2?
4 Adapt the spreadsheet to create two samples of size from a distribution. For each of
these samples, create a confidence interval. Note whether the two confidence intervals
overlap. Repeat for lots of pairs of samples of size from a distribution. What
percentage of the pairs have overlapping confidence intervals?

5 Repeat the investigation from question 4, but this time using one sample of size from a
distribution and one sample of size from a distribution.

Tip

The purpose of questions 4 and 5 is to highlight that it is not a good idea to use the overlap of
two confidence intervals to test whether the means of two distributions are the same, as the
significance level of such a test is not obvious.
FOCUS ON … MODELLING 2

Simulating the $t$-distribution


The normal distribution and the $t$-distribution are very closely related. To get a better feel for their
similarities and differences, this exercise investigates the shapes of these two distributions.

QUESTIONS

1 Use a spreadsheet to create a list of samples of size , taken from the normal distribution
.
2 Find the mean of each sample. Is the mean of all the sample means zero?

Tip

In Excel you can create a random number from the distribution using the
syntax “ ” or the function provided by the Data Analysis
toolpak.

3 Find the standard deviation of these samples. Is the mean of the standard deviations of each
sample approximately ?
4 Construct the -score for each sample mean using the formula:

Plot each of these -scores on a histogram. What do you observe?

Tip

If you are using Excel, you might want to use the Data Analysis toolpak to create
the histogram.

5 Construct the -score for each sample mean using the formula:

Plot each of these -scores on a histogram. What do you observe?


6 Repeat questions 1 to 5 using a list of samples of size taken from the
distribution. What do you observe?

Based on this exercise, you should see that for small sample sizes there is a noticeable difference
between $z$-scores and $t$-scores, necessitating the use of the $t$-distribution. However, for larger
sample sizes the differences are small compared to most other sources of uncertainty, so the
normal distribution can be used as an approximation to the $t$-distribution.
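
The comparison described in this exercise can also be sketched in code rather than a spreadsheet. This is a rough illustration under stated assumptions: samples are drawn from a standard normal distribution, the sample sizes 5 and 50 and the cut-off 1.96 are choices made here (the book's own values are not reproduced), and instead of a histogram the sketch simply reports how often each score falls outside $\pm 1.96$.

```python
import random
from math import sqrt


def tail_proportions(num_samples=20000, n=5, cutoff=1.96, seed=1):
    """Proportion of z-scores and t-scores of the sample mean beyond +/- cutoff."""
    rng = random.Random(seed)
    z_tail = t_tail = 0
    for _ in range(num_samples):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        x_bar = sum(sample) / n
        # unbiased sample standard deviation (divides by n - 1)
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        z = x_bar / (1 / sqrt(n))      # uses the true standard deviation, 1
        t = x_bar / (s / sqrt(n))      # uses the estimate s instead
        z_tail += abs(z) > cutoff
        t_tail += abs(t) > cutoff
    return z_tail / num_samples, t_tail / num_samples


small = tail_proportions(n=5)
large = tail_proportions(n=50)
print("n = 5: ", small)
print("n = 50:", large)
```

For the small samples the $t$-scores land in the tails noticeably more often than the $z$-scores, whereas for the larger samples the two proportions are close.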
CROSS-TOPIC REVIEW EXERCISE 1

The questions in this exercise cover AS Level material only.

1 The discrete random variable can only take the values and . If , find .
Choose from these options.

2 The length of an athlete’s long jump is modelled by a normal distribution with standard
deviation . A sample of jumps is measured. What will be the width (to three
significant figures) of a confidence interval for the mean? Choose from these options.
A

B
C

D
3 A continuous random variable has probability density function defined by

a Sketch the graph of .

b Show that the value of is .

c i Write down the median value of .

ii Calculate the value of the lower quartile of .


[© AQA 2013]
4 The numbers of people studying Mathematics at different levels in a sample of students
from two different schools were recorded.
North Academy South High School
No Maths
Single Maths
Further Maths
a Conduct an appropriate test to show that there is evidence at the significance level
that the level of Mathematics studied depends on the school attended.

b What assumptions are required to make the conclusion of the test in part a valid?
c Jane says that she would be more likely to study Further Mathematics if she attended
North Academy. Is this a valid inference from the data? Justify your answer.

5 The continuous random variable has probability density function given by


for and otherwise.

a Show that .

It is given that .

b Hence find the values of and .


c Find .
d Find the value of .
6 Andrew travels to a meeting. His journey consists of two independent parts: a section by
car and a section by train. The amount of time spent on the car section is modelled by the
random variable and the amount of time spent on the train section is modelled by the
random variable . All times are in minutes.

Based on long experience, Andrew knows that the average time spent on the car section
is minutes with standard deviation minutes, and the average time spent on the train
section is minutes with standard deviation minutes.
a Assuming that there is no waiting time, find the expectation and standard deviation in
Andrew’s total journey time.
b For the meeting Andrew gets paid £ plus £ per hour he spends travelling. Find the
expectation and standard deviation in the amount Andrew gets paid.

7 For the year 2014, this table summarises the masses, kilograms, of a random sample of
women residing in a particular city who are aged between years and years.
Mass ( ) Number of women

Total
a Calculate estimates of the mean and the standard deviation of these masses.

b i  Construct a confidence interval for the mean mass of women residing in the city,
who are aged between years and years.

ii Hence comment on a claim that the mean mass of women residing in the city, who
are aged between years and years has increased from that of in 1965.
[© AQA 2014]
8 Two independent random variables have normal distributions, and
.
a State the distribution of , including any necessary parameters.
b Find .

9 At a remote hospital, in an area where there are many venomous snakes, the number of
patients during one week requiring treatment after a venomous snake bite may be
modelled by a Poisson distribution with mean .
a For this hospital, find the probability that:
i no more than patient requires treatment after a venomous snake bite during a
particular week

ii at least patients require treatment after a venomous snake bite during a particular
period of weeks

iii more than patients but fewer than patients require treatment after a venomous
snake bite during a particular period of weeks.

b Each patient who has been bitten by a venomous snake is treated with a single dose of
an anti-venom which is effective against the venoms of all the snakes common in that
area.
The anti-venom is expensive and has a limited shelf life, so that a delivery of fresh anti-
venom is made at -week intervals.
The hospital stores just enough anti-venom so that the probability that it runs out of
anti-venom before the next delivery is less than per cent.
Quoting probabilities to justify your answer, state how many doses of anti-venom the
hospital should have in its store immediately after a delivery of fresh anti-venom.
[© AQA 2015]
10 Dana, a researcher in the USA, investigated game-related stress for sports officials in
inter-school baseball, basketball and soccer.

The officials involved in this investigation were categorised as either adopting an
approach (AP) coping style or an avoidance (AV) coping style when dealing with
game-related stress.

Table 1 summarises the results of this investigation.


Table 1
Coping style
AP AV
Baseball
Sport Basketball
Soccer

You may assume that the officials involved in this investigation represent a random
sample.
a Use the information in Table 1 to complete the contingency table, Table 2, with
frequencies that could be analysed to investigate whether the coping style used by
officials is associated with the sport involved.
Table 2

Coping style
AP AV
Baseball
Sport Basketball
Soccer
b Examine, using the level of significance, whether the coping style used by officials
is associated with the sport involved.

c By comparing observed and expected frequencies, identify, in context, two important
facts concerning coping style and sport involved.
[© AQA 2014, adapted]
11 The probability distribution of a discrete random variable is given by:

a Find in terms of .

b Show that .
c What is the largest possible value of the variance of ?
12 A sample of size is drawn from a normally distributed population with standard deviation
. A confidence interval for the mean was correctly calculated to be .
Find:
a the unbiased estimate of the population mean
b the value of .
13 In a diamond mine, the number of diamonds found per cubic metre of material mined is
known to follow a Poisson distribution with mean .
a If , find the probability of finding:
i diamonds in of mining

ii diamond in each of two sections of mining.


b To be economically viable a diamond mine needs more than diamonds per cubic
metre. To survey a potential new mine the owner examines a sample.
i State appropriate null and alternative hypotheses in terms of .

ii The survey results show that the sample contains diamonds. Conduct a hypothesis
test at the significance level.

iii What is the probability of a type I error in this context?


iv Why might the mine owner choose to use a significance level rather than a
significance level when conducting this test?

14 A receptionist answers phone calls for a company.


a State two conditions needed for the number of phone calls answered in an hour to be
modelled by the Poisson distribution.

b Explain why these conditions are unlikely to be met in this situation.


For a certain period of time you can now assume that the number of phone calls answered
in an hour can indeed be modelled by the distribution.

c Find the standard deviation in the number of phone calls answered.


d Find the probability that fewer than phone calls are answered in a -hour shift.
e Find the longest time for which the probability that no phone calls are answered is at
least .
15 The continuous random variable has probability density function given by
for and otherwise. Find:
a the value of
b
c

e the median of .

16 The volume of lemonade in a can produced at a factory follows a normal distribution with
standard deviation . A quality control test takes a random, independent sample of
cans. The factory manager claims that the cans should, on average, contain .
a If the true mean is , find the probability that exactly cans contain less than
.
b Jane decides that if or more cans in the sample contain less than she will
reject the batch.
i State in this context what is meant by a type I error.
ii Find the probability of a type I error in Jane’s test.
c The mean of the sample is found to be .
i Construct a confidence interval for the true mean of the cans, giving your
answer to decimal place.

ii Phillip uses the confidence interval from part c i to determine whether the cans do
come from a population with a mean of less than . What conclusion does Phillip
draw and what is the significance level of his conclusion?
17 Members of a library may borrow up to books. Past experience has shown that the
number of books borrowed, , follows the distribution shown in the table.

a Find the probability that a member borrows more than books.

b Assume that the numbers of books borrowed by two particular members are
independent. Find the probability that one of these members borrows more than
books and that the other borrows fewer than books.
c Show that the mean of is , and calculate the variance of .
d One of the library staff notices that the values of the mean and the variance of are
similar and suggests that a Poisson distribution could be used to model .
Without further calculations, give two reasons why a Poisson distribution would not be
suitable to model .
e The library introduces a fee of pence for each book borrowed.
Assuming that the probabilities do not change, calculate:
i the mean amount that will be paid by a member
ii the standard deviation of the amount that will be paid by a member.
[© AQA 2016]
CROSS-TOPIC REVIEW EXERCISE 2

1 It is assumed that people arrive in a queue randomly and at a constant average rate of
per minute. The random variable is the time, in minutes, between people arriving in
the queue.
a State the distribution of , including any parameters.

b Find the probability that there is a gap of between and minutes.


c What is the expected standard deviation of ?

2 In a particular town, a survey was conducted on a sample of residents aged years


to years. The survey questioned these residents to discover the age at which they
had left full-time education and the greatest rate of income tax that they were paying at
the time of the survey.
The summarised data obtained from the survey are shown in the table.
Age when leaving education (years)
Greatest rate of income tax paid
or less or or more Total
Zero

Basic

Higher

Total
a Use a -test, at the level of significance, to investigate whether there is an
association between age when leaving education and greatest rate of income tax
paid.

b It is believed that residents of this town who had left education at a later age were
more likely to be paying the higher rate of income tax. Comment on this belief.

[© AQA 2015]

3 A digital thermometer measures temperatures in degrees Celsius. The thermometer
rounds down the actual temperature to one decimal place, so that, for example,
and are both shown as . The error, , resulting from this rounding down can
be modelled by a rectangular distribution with the following probability density function.

a State the value of .

b Find the probability that the error resulting from this rounding down is greater than
.
c i State the value for .

ii Use integration to find the value for .

iii Hence find the value for the standard deviation of .


[© AQA 2016]
4 Julie, a driving instructor, believes that the first-time performances of her students in
their driving tests are associated with their ages.

Julie’s records of her students’ first-time performances in their driving tests are shown in
the table.
Age Pass Fail
a Use a -test at the level of significance to investigate Julie’s belief.

b Interpret your result in part a as it relates to the age group.


[© AQA 2010]
5 The random variable represents the number of soft drinks Manuel purchases while
eating a burger. Manuel models using the distribution.
a Find the standard deviation of .

is the amount Manuel spends on his meal in pounds. If the burger costs £ and each
drink costs £ , find:

b i

ii .

6 The discrete random variable follows the distribution and satisfies


.
a Find .
b In three independent observations of , find the probability that fewer than two have
.
7 The random variable measures the number of minutes Cauchy spends on a mobile
phone each month. The mean of is with standard deviation .
Cauchy is on a contract with a fixed charge of £ each month, then per minute.
a Find the mean and the variance in , the amount of money in pounds that Cauchy
spends each month on his mobile phone.

b Cauchy has a budget of £ per month for his phone. Anything that he does not
spend on his phone he saves. Find the mean and variance in , the amount saved in
pounds each month.

8 South Riding Alarms (SRA) maintains household burglar-alarm systems. The company
aims to carry out an annual service of a system in a mean time of minutes.
Technicians who carry out an annual service must record the times at which they start
and finish the service.
a Gary is employed as a technician by SRA and his manager, Rajul, calculates the times
taken for annual services carried out by Gary. The results, in minutes, are as
follows:

Assume that these times may be regarded as a random sample from a normal
distribution.
Carry out a hypothesis test, at the significance level, to examine whether the
mean time for an annual service carried out by Gary is minutes.
b Rajul suspects that Gary may be taking longer than minutes on average to carry
out an annual service. Rajul therefore calculates the times taken for annual
services carried out by Gary.
Assume that these times may also be regarded as a random sample from a normal
distribution but with a standard deviation of minutes.
Find the highest value of the sample mean which would not support Rajul’s suspicion
at the significance level. Give your answer to two decimal places.
[© AQA 2014]
9 The score achieved on a test is modelled by the normal distribution. The average
score on this test is with standard deviation . A sample of students in a school
take the test and if their average is above it will be decided that the school is doing
better than the rest of the population.
a Explain why the normal distribution is a plausible model for the test results.
b Assuming that the standard deviation is still , find the significance level of this
test.

c If the true mean of students in the school is , find the power of the test.
d If the true mean of the students was higher than , would the power of the test be
higher or lower? Explain your answer. No further calculations are required.

10 The time in seconds between errors in a piano performance is modelled by an
exponential distribution, exp .
a The probability that there is an error in any seconds is Find the value of .

b Find the probability that there is no error in any seconds.


c Find the expected time until the first error.
11 Judith, the village postmistress, believes that, since moving the post office counter into
the local pharmacy, the mean daily number of customers that she serves has increased
from .
In order to investigate her belief, she counts the number of customers that she serves
on randomly selected days, with the following results.

Stating a necessary distributional assumption, test Judith’s belief at the level of
significance.
[© AQA 2010]
12 It is claimed that a new drug is effective in the prevention of sickness in holiday-makers.
A sample of holiday-makers was surveyed, with the following results.
Sickness No sickness Total
Drug taken
No drug taken
Total

Assuming that the holiday-makers are a random sample, use a test, at the
level of significance, to investigate the claim.

[© AQA 2010]
13 The discrete random variable follows the distribution and satisfies .
a Find the value of .

b Show that .
14 Lorraine bought a new golf club. She then practised with this club by using it to hit golf
balls on a golf range.
After several such practice sessions, she believed that there had been no change from
metres in the mean distance that she had achieved when using her old club.

To investigate this belief, she measured, at her next practice session, the distance,
metres, of each of a random sample of shots with her new club. Her results gave

Investigate Lorraine’s belief at the level of significance, stating any assumption that
you make.
[© AQA 2010]

15 Wellgrove village has a main road running through it that has a speed limit. The
villagers were concerned that many vehicles travelled too fast through the village, and
so they set up a device for measuring the speed of vehicles on this main road. This
device indicated that the mean speed of vehicles travelling through Wellgrove was
.

In an attempt to reduce the mean speed of vehicles travelling through Wellgrove, life-
size photographs of a police officer were erected next to the road on the approaches to
the village. The speed, , of a sample of vehicles was then measured and the
following data obtained.

a State an assumption that must be made about the sample in order to carry out a
hypothesis test to investigate whether the desired reduction in mean speed had
occurred.
b Given that the assumption that you stated in part a is valid, carry out such a test,
using the level of significance.
c Explain, in the context of this question, the meaning of:
i a type I error
ii a type II error.
[© AQA 2015]
16 The discrete random variable satisfies this distribution:

a If , find the possible values of .

b For the larger value of , find the value of  .

17 Long-term observations suggest that the number of cars passing the school gates
follows a Poisson distribution with the mean of cars per minute. Following the opening
of a new supermarket at the end of the road, the head teacher wishes to find out
whether this mean has increased. She sends a group of students to count the cars
passing the school gates during a -minute interval.

Let be the number of cars passing the school gates in a -minute interval, so that
.
a Write down suitable null and alternative hypotheses.

b Find the critical region for the test at the significance level.
c The students counted cars. State the conclusion of the test.
In reality, the mean number of cars has increased to per minute.

d Find the probability that the test results in a type II error.


18 Groups of visitors arrive at a museum randomly, at a constant average rate of per
hour. The director wants to find out whether this rate is smaller on rainy days. She
randomly selects a rainy day and records the number of groups arriving over a -hour
period. She then conducts a hypothesis test, using these hypotheses:
,
where is the population mean number of groups arriving at the museum in a -hour
period.
a Write down the value of .
The director decides that she will reject the null hypothesis if the number of visitor
groups arriving in the -hour period is less than or equal to .
b Find the probability of a type I error in this test.
The number of visitor groups in fact decreases to per hour on a rainy day.

c Find the power of the test.


19 A physicist measures a quantity associated with the spin of an electron, . She takes
independent readings that have mean . She calculates an unbiased estimate of
the variance as .
She assumes that this quantity follows a normal distribution.
a Find a confidence interval for the true mean of , giving your answer to a
suitable level of accuracy.

b The random variable is defined as . Write down a confidence


interval for the mean of .
c A theory predicts that the true value of is exactly . Is the confidence interval found
in part a consistent with the theory?
d The physicist repeats her experiment three times. Each experiment consists of
independent readings followed by finding a confidence interval.
i What is the probability that at least two of these confidence intervals do contain
the true mean?

ii What is the probability that all of these confidence intervals are above the true
mean?
AS LEVEL PRACTICE PAPER

45 minutes, 40 marks

1 The number of beetles in a forest can be modelled by a Poisson distribution with parameter
beetles per square metre. Find the probability, to three significant figures, that in a area
there are fewer than beetles.
Choose from these options.
A

D [1 mark]
2 The discrete random variable has a probability distribution given by for
and otherwise. Find .
Choose from these options.
A
B

C
D [1 mark]

3 The discrete random variable has this distribution:


a Find the value of . [1 mark]

b Find . [1 mark]
c Find the standard deviation of . [4 marks]

4 Sarah models the number of buses arriving at a bus stop using a Poisson distribution. is the
number of Route buses arriving in an hour and is the number of Route buses arriving in
an hour. Sarah models these as being independent with and .
a Given that , state in context an interpretation of the variable and write down its
distribution, including any parameters. [2 marks]
b Find the probability that or fewer buses arrive in an hour. [2 marks]

c Give one reason why the assumption that and are independent is unlikely to be the
case. [1 mark]

To check her model, Sarah counts the buses arriving in randomly selected hours.

d Use suitable calculations to determine if a Poisson model is feasible. [4 marks]

5 The continuous random variable has probability density function given by for
and otherwise.
a Find the value of . [3 marks]

b Show that median of . [5 marks]

c Find . [3 marks]

6 This table shows the results of a survey in a school about weekly hours spent watching TV.
Test at the significance level whether school year and hours spent watching TV are
independent.
School year

Hours

[5 marks]

7 a The number of leaks in a pipe is known by a water company to follow a Poisson distribution
with mean leaks per . A new contractor claims that they can reduce the number of
leaks. After they have maintained the pipes for some time, a random stretch of pipe is
investigated and found to have leaks. Test the contractor’s claim at the significance
level. [5 marks]

b It is decided that if three or fewer leaks are found in , then the contractor has reduced
the number of leaks. What is the probability of a type I error? [2 marks]
A LEVEL PRACTICE PAPER

60 minutes, 50 marks

1 The number of beta particles emitted by a radioactive isotope follows a Poisson


distribution. On average, beta particles are emitted each second. What is the
probability (to three significant figures) that the second beta particle is emitted between
and seconds after the first beta particle is observed?

Choose from these options.


A

D [1 mark]

2 The discrete random variable has a probability distribution given by for


and otherwise. Find the median of .

Choose from these options.


A

D [1 mark]

3 The discrete random variable follows the distribution shown.

a Find . [1 mark]
b Find . [2 marks]

c Write down the value of . [1 mark]


4 The contingency table shows information about whether a random sample of people
have music lessons, and their gender.
Music lessons No music lessons
Female
Male

a State the null and alternative hypotheses when conducting a chi-squared test for
independence. [2 marks]

b Write down the number of degrees of freedom in this test. [1 mark]


c Conduct a chi-squared test at the significance level to see if there is a link between
gender and choice of lessons. [5 marks]

5 A continuous random variable has probability density function given by

a Find the exact value of . [3 marks]

b Find and , giving your answers to three significant figures. [4 marks]


c Find the standard deviation of   , giving your answer to three significant figures.
[4 marks]
6 A researcher is testing if a new swimming technique is more effective. She
knows the average time of swimmers in her club using the old technique is
seconds. After training swimmers with the new technique she times them over
and summarises their times in seconds:

Lower times are considered better in swimming.


a Show that the unbiased estimate of the variance is to two decimal places. [2 marks]

b Write down appropriate null and alternative hypotheses to test if the new
swimming technique is effective. [2 marks]

c Write down the number of degrees of freedom in the test. [1 mark]

d Investigate, using the significance level, whether the new technique improved the
mean time. [4 marks]

e State one assumption required for your test to be valid. Comment on how reasonable
the assumption is in this context. [2 marks]
7 The number of phone calls received by an IT helpline is known to follow a Poisson
distribution. It is thought to receive a mean of phone calls per hour.

A change to the IT system is designed to encourage fewer phone calls to the helpline. If
there are phone calls or fewer in a -hour period, the change will be deemed successful.
a Find the probability of a type I error in this process. [3 marks]

b In reality the number of phone calls was per hour. Find the probability of a type II
error. [3 marks]
8 When a scientist records the volume of acid required to neutralise a solution she records
her results to the nearest millilitre. For example, if she records a volume of , she
believes that the true volume required is somewhere in between and with all
possibilities equally likely.
The error, , is a random variable defined as the true volume of acid required to
neutralise the solution minus the recorded volume.
a State an appropriate distribution to model , including its parameters. [2 marks]
b Find the probability that the magnitude of the error, , is less than . [1 mark]

c Find the probability that in two independent observations the magnitude of the error is
less than . [2 marks]
d Hence find the probability density function of the random variable , the maximum
magnitude of the error in two observations. [3 marks]
FORMULAE

Probability

Standard deviation

Discrete distributions
Distribution of Mean Variance

Binomial

Poisson

Sampling distributions
For a random sample of independent observations from a distribution having mean and
variance :

For a random sample of observations from :

Distribution-free (non-parametric) tests

Contingency tables: is approximately distributed as

TABLE 1 Percentage points of the student’s -distribution

The table gives the values of satisfying , where is a random variable having the student’s
-distribution with degrees of freedom.

[Table values not recovered. Rows are given for 1 to 40 degrees of freedom, then 45 to 100 in steps of 5, then 125, 150 and 200.]

TABLE 2 Percentage points of the distribution

The table gives the values of satisfying , where is a random variable having the
distribution with degrees of freedom.

[Table values not recovered. Rows are given for 1 to 40 degrees of freedom, then 45 to 100 in steps of 5.]
Answers
1 Discrete random variables
BEFORE YOU START
1

WORK IT OUT 1.1


Solution B is correct.

EXERCISE 1A
1 Answers are given to where appropriate.
a i

ii

b i

ii

c i

ii

d i

ii

2 a Proof.

3 a

4 a Proof.

5 a

c
6

7 a

8
9 a

10 a Profit £

Probability

b £
11 a
b i

ii Proof.
12 a

b
c
d i

ii

EXERCISE 1B
1 a i

ii
b i

ii
c i

ii
d i

ii

e i

ii
2 a i

ii
b i

ii
c i

ii
d i

ii

7 a

e
8 a

9 Proof.

10 Any value allowed, there is no upper limit.

EXERCISE 1C
1 a i

ii

b i

ii

4 a Proof;

9 Proof.
10 Proof.

MIXED PRACTICE 1
1 C

2 C
3 a

b
4 a

e
5 a   

6 a

8
9 a

10 a

d
11 a

12
13 a

14
15 a

b
16 a

b i

ii Proof.

iii
17 a i Proof.

ii

iii Proof.

iv

b
2 Poisson distribution
BEFORE YOU START
1

4 Do not reject .

WORK IT OUT 2.1


Solution A is correct.

EXERCISE 2A

In this exercise answers are given to , where appropriate.


1 a i

ii
b i

ii

c i

ii

2 a i

ii
b i

ii

c i

ii
d i

ii

e i

ii

4
5 a

6 a

7 a
b

8 a

9 a

10

11 a

c There are alternative ways to get emails in a week other than every day.

12 a

c
13 a

14

15 a Proof.

WORK IT OUT 2.2


Solution B is correct.

EXERCISE 2B
In this exercise answers are given to , where appropriate.

1 a i Do not reject -value

ii Do not reject -value

b i Reject -value

ii Reject -value
c i Do not reject -value

ii Do not reject -value


d i Reject -value

ii Reject -value
2 a i

ii
b i
ii
c i

ii

3 Reject -value
4 a -value . Do not reject .

b Rate might be different at different times. Cars might not be independent – cars might
be travelling together.
5 a

b Do not reject -value


6 a

c Reject -value

7 a Constant rate over the day. Bees arrive independently.

b Reject -value .

8 Do not reject -value


9 a Do not reject -value

b Reject -value

10

MIXED PRACTICE 2
In this exercise answers are given to , where appropriate.
1 C

2 C

3 a .

c
4 a Independent events. Constant rate of success.

c Reject -value .
5 a

6 a

b
7 a

c Do not reject -value

8 a
b
9 a

b
10 a

b
11 a

c
12 a

g
13 a

b
14 a

15 Do not reject -value


16 a The average rate must be constant. However, you might expect it to vary over different times
of the day and with different weather conditions. Birds must arrive independently, but they
might come in flocks.

d Do not reject

17 -value

18 a

d
19 a i

ii
b i

ii
c

20
21 a i

ii

iii

b
22 a i

ii

c i

ii The coins buried in a hoard are no longer independent. The Poisson assumption requires
independence, so brooches are more likely to be modelled by a Poisson distribution.
3 Chi-squared tests
BEFORE YOU START
1 Yes; the -value is

EXERCISE 3A
1 a i

ii

b i

ii

2 a : Physics grade and Mathematics course are independent;

: Physics grade and Mathematics course are dependent.

b or or or

Further Maths

Maths AS or A

No Maths

d Reject . Significant evidence of association; students studying a higher level of Mathematics


tend to do better in physics.

3 critical value . Reject . There is significant evidence that the


reading level and fiction/non-fiction are dependent.

4 a Early On time Late Total

Walk

Car

Other
Total

c Reject . Significant evidence that lateness depends on the mode of transport.

d He must assume that his data are representative; for example, it was not a day with unusual
traffic. He must also assume that the respondents were independent; for example, not lots of
students from the same bus.

5 Significant evidence of association; . People who visit more often tend to spend
more money on each visit.

6 No significant dependency; . The drug does not appear to be effective in


increasing the speed of recovery.

7 a Male Female
£
£
£
£
£

b No significant evidence of dependency;

8 a Proof.

9 a Proof.

b Proof; .

d There will be random variation within the sample.

10 a First factor
Total

A 12 12 1 5
Second factor B 10 1 4 5
C 3 2 25 20
Total

b Proof.
11 a Proof.

b Proof.

WORK IT OUT 3.1


Solution C is correct.

EXERCISE 3B
1 a i

ii

b i

ii

2 Proof;

3 Independent; .

4 Independent; . Number of books does not seem to differ significantly between


rural and urban libraries.

5 There is significant evidence of an association; . However, this does not establish


causality.
6 a Proof; . A higher percentage of men are admitted . This appears
to be evidence of bias.

b . Two out of six departments have a higher proportion of men accepted.

7 You cannot do a calculation based on two factors being dependent unless you know exactly what
that dependency is.

MIXED PRACTICE 3
1 D

2 B

3 ; no significant evidence of association.


4 a

c Do not reject . No significant evidence that hair colour and eye colour are dependent.

5 ; do not reject . There is no evidence of association.

6 A
7 a ; significant evidence of association.
b i There are more of them.

ii Largest contribution to .
8 a ; Fiona’s belief is justified.

b Fewer than expected gained Class . More than expected gained Class 2ii.
9 a i ; significant evidence of association.

ii Accidents involving changing lane to the left are less likely, and accidents involving changing
lane to the right are more likely than expected for foreign registered HGVs.
b i Expected values Prosecution resulted No prosecution

years or under
Over years

ii ; it appears that they are independent. There is no significant evidence that


prosecution is dependent on age.

10 ; there is significant evidence of an association between age and department.


11 a Proof.

b Proof;

d The sample follows the long-term trend.


12 a ; development of Type 2 diabetes seems to be dependent on average level of
weekly alcohol consumption.

b Not generalisable to the whole population.

The ‘less than ’ category also has a large contribution.


c i No change, .

ii

iii No change.
4 Continuous distributions
BEFORE YOU START
1

EXERCISE 4A
In this exercise answers are given to s.f., where appropriate.

1 a i

ii

b i

ii
c i

ii

d i

ii
e i

ii

f i

ii

g i

ii

h i

ii

2 a i

ii
b i

ii

c i

ii

3 a i
ii
b i

ii

c i

ii

4 a

8 a

10

11 Proof;

EXERCISE 4B
1 a i ; ; ;

ii ; ; ;

b i ; ; ;

ii ; ; ;

c i ; ; ;

ii ; ; ;

d i ; ; ;

ii ; ; ;

2 a i

ii

b i

ii

3 a

4 a
b

5 a Proof.

6 a Proof.

EXERCISE 4C
1 a i

ii
b i

ii
c i

ii

d i

ii

e i

ii
2 a i

ii
b i

ii
c i

ii
d i

ii
e i

ii
3 a i

ii
b i

ii
c i

ii
d i

ii
4 a £

b £

5
6 a

b
7 a

b
8 a

9 a if is odd, if is even.

EXERCISE 4D
1 a i

ii

b i

ii
c i

ii

d i

ii
e i

ii
2 a

d
3 a Proof.

4 ;

5 ;

6 ;

7 minutes; minutes

8 is twice the mass of a single gerbil; is the sum of the masses of two different gerbils.
EXERCISE 4E
1 a i

ii
b i

ii
c i

ii

d i

ii
e i

ii
f i

ii
2 a

b
3 a

c
4 a

b
5 a

b
6 a

c
7 a

8
9 a

10 a

b Assumes that the rainfall each day is independent of the rainfall on other days. This is unlikely
to be the case.
11 a
b i

ii
iii

EXERCISE 4F

1 a i ;

ii

b i

ii

2 a i

ii

b i

ii

4 a

b , otherwise.

EXERCISE 4G
1 a f(x)

5k

0 x
0 5 10

c
2 a

3 a Proof.

c
d

4 a

b f(w)

0 w
0 3 7

d
5 a Proof.

6 a

7 a f(x)

0 x
0 π 5π
– –
2 8

b Proof.

EXERCISE 4H
In this exercise answers are given to 3 s.f., where appropriate.
1 a i

ii

b i

ii

c i

ii

d i

ii

e i
ii

2 a i

ii
b i

ii

3 a

5
6 a Proof.

b Proof; .

EXERCISE 4I
In this exercise answers are given to , where appropriate.
1 a i

ii
b i

ii
c i

ii
2 a i

ii
b i ;

ii
3
4 a

8 Proof; it equals

9 a Exponential;

c Proof.

10 Proof.
a

c Proof.

d Proof.

EXERCISE 4J
1 a i

ii
b i

ii

c i ;

ii ;

d i ;

ii ;

2 a

3 a

4 a
b

5 a

6 a

7 Proof;

8 a

e . Rounding causes a slight underestimate of the true mean time.

MIXED PRACTICE 4
1

2
3 a

c
4 a

5 a £

b £

c £

6 a

b i

ii

7 a
b

9 a

10 a

d
11 a

c Probably not. Especially in the situation in part b it is likely that when Alice is finished Hassan
might try to speed up.

12

13

14
15 a

16 a

d Proof.

e
17 a

c
18 a Proof.

b
c

19 a

20

21

22 a

b
23 a Proof.

d
e i

ii
24 a f(x)
3

10 x + 7
y =–
40

1

5

x
O 1 5

c Proof.
d i

ii Proof.

e
25 a f(x)
3

32

x
O 1 11

2

b i-ii Proof.

c i

ii
d
Focus on … Proof 1
Proof 6

1 Theorem 2.

2 Theorem 3.

3 Theorem 5.

4 Properties of sums.

5 Theorem 4 and Theorem 1.

6 Theorem 1.

7 Theorem 4.

1–3  Proof.
Focus on … Problem solving 1
1 0.613 or 0.168

2 0.199 or 0.416

3 8

4 a, b Proof.

c 18
Focus on … Modelling 1
1 Not very appropriate. Rate might not be constant in every part of the ocean. The presence of fish might
not be independent.

2 Not appropriate. Not a random process.

3 This is well modelled by a Poisson distribution.

4 Not appropriate. This is not a number of events, and the rate might not be constant throughout the day.
Buses might not be independent.

5 The rate might not be constant, but the Poisson tends to work quite well in these situations.

6 Not appropriate. The number of fish being caught is sufficient that it might have a significant effect on
the number of fish remaining in the pond.

7 This is well modelled by the Poisson distribution (and indeed is used in a derivation of the chi-squared
statistic).

8 This is well modelled by the Poisson distribution.


5 Further hypothesis testing
BEFORE YOU START
1

2 Reject

3 Do not reject

5 Do not reject .

EXERCISE 5A
1 a i Reject

ii Reject
b i Do not reject

ii Do not reject

c i Do not reject

ii Do not reject
2 a

b Reject

c True variance unknown. Assume that times are normally distributed.


3 a

b Do not reject .

4 Reject

5 a

b Do not reject
6 a

c Reject .

d Assume that they are normally distributed.

7 a

b Do not reject .

c i

ii Reject .
8 a

EXERCISE 5B
1 a i

ii
b i

ii
c i

ii

2 a i

ii

b i

ii
c i

ii

3 a i

ii

b i

ii

4 a i

ii

b i

ii
c i

ii
5 a i

ii
b i

ii
c i

ii

6 a i

ii

b i

ii
c i

ii

7 Decreases the risk of a type II error, but increases the risk of a type I error.

8 a : Coin is fair, : coin is biased.

b Type I error.

9 a Claiming that there is correlation when none really exists.


b Not recognising correlation when there is underlying correlation.
10 a i Claiming that the dice is biased when it is not.

ii Claiming that the dice is not biased when it is.

b For example: Roll the dice more times, look for more than sixes, consider other numbers,
do a chi-squared test.

11 a

b . This is very small and requires extreme evidence before change is found. This does not
seem to be required in this situation.

12 a

c
13 a

b Do not reject . There is not enough evidence that Dhalia’s eggs are heavier.

e
14 a

d Proof.
15 a i

ii
b i

ii

16

MIXED PRACTICE 5
1 C

2 A
3 a

c Do not reject

d Assume that the data are drawn from a normal distribution.


4 a Not rejecting when it is false.

d Increase the sample size.


5 a

c
6 a

d Reject .
7 a

c
8 a Proof;

b . No significant evidence that mean journey time is minutes.


9 a . No significant evidence to doubt that the mean is .

b . Significant evidence that the mean is not equal to .


c i Neither. Risk of a type I error is regardless of sample size.

ii Larger sample size leads to a smaller risk of a type II error.


10 a . Evidence of association between method of receiving information and
outcome.

b Type I error.
6 Confidence intervals
BEFORE YOU START
1

3 Do not reject

4 No significant evidence.

EXERCISE 6A
1 a

b
2 a i

ii

b i

ii

3 Confidence Lower bound of Upper bound of


level interval interval
a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
4 a

b It is plausible that the true oxygen level is .


5 a

b Yes

7 a

b No significant evidence of a difference in the mean wage.


8 a

b No. This is a confidence interval for the population mean, not sample means.

10 a

b
c No, since the confidence intervals overlap (although it is quite difficult to find the significance
level).

11 a

b Reject at the significance level.

12 a

b Increase the sample size.

c Yes

13 a False.

b False.

c False.

d True.

e False.

14

EXERCISE 6B
1 a i

ii
b i

ii
c i

ii
2 a Assume heights are normally distributed.

3 a

c No. The confidence interval is for the mean value for an individual. The sample is too small for
meaningful generalisations.

4
5 a

c Yes
6 a

b i

ii Do not reject .
7 a
b
8 a

b Proof.

MIXED PRACTICE 6
1 D
2 a

c Do not reject

5
6 a

b Significant evidence that the results are different.


7 a

b No. The given probability suggests that which does not fall in the confidence interval.

8 a

b Do not reject

9 a

10 a i

ii
b i

ii New programme seems to have been effective.


Focus on … Proof 2
1 $\binom{n}{0}p^n + \binom{n}{1}p^{n-1}q + \binom{n}{2}p^{n-2}q^2 + \dots + \binom{n}{n}q^n$

2–3 Proof.

4 Proof.

5–6 Proof.
Focus on … Problem solving 2
1 Yes. No probability is involved.

2 About 95%.

3 a They tend to be wider.

b No change.

4 About 99.4%.

5 About 30%.
Focus on … Modelling 2
1 Investigation.

2 It should be close to zero.

3 No, it should be about 0.7.

4 It looks a lot like a normal distribution.

5 The shape looks like a normal distribution, but it is much wider – it extends to t-scores above 3 and
below −3.

6 The standard deviation is much closer to 1 and the t-scores histogram is very similar to the z-scores
histogram.
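The comparison described in these answers can be explored with a short simulation. This is only a sketch of one possible investigation, not the book's own method: the function name `simulate_scores`, the sample sizes 4 and 50, the number of trials and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_scores(n, trials, mu=0.0, sigma=1.0, seed=1):
    """Draw `trials` samples of size n from N(mu, sigma^2).

    Returns two lists: the z-score of each sample mean (using the known
    sigma) and the t-score of each sample mean (using the sample's own
    unbiased standard deviation estimate).
    """
    rng = random.Random(seed)
    z_scores, t_scores = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        z_scores.append((xbar - mu) / (sigma / sqrt(n)))
        t_scores.append((xbar - mu) / (stdev(sample) / sqrt(n)))
    return z_scores, t_scores


# Small samples: t-scores spread out further than z-scores.
z_small, t_small = simulate_scores(n=4, trials=2000)
# Larger samples: the two sets of scores behave almost identically.
z_large, t_large = simulate_scores(n=50, trials=2000)

print("n = 4:  s.d. of z-scores", round(stdev(z_small), 2),
      " s.d. of t-scores", round(stdev(t_small), 2))
print("n = 50: s.d. of z-scores", round(stdev(z_large), 2),
      " s.d. of t-scores", round(stdev(t_large), 2))
```

Plotting histograms of the two lists for each sample size would show the wider tails of the t-scores when the sample is small.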
Cross-topic review exercise 1
1 B

2 D

3 a f(x)
9k

x
O 3 4

b Proof.

c i

ii
4 a ; degrees of freedom .

b Random, representative sample from each school.

c No. Just because there is dependency does not mean that there is causality.
5 a Proof.

6 a minutes; minutes

b £ £

7 a Mean: ; s.d.:
b i

ii There is reason to doubt the claim.

8 a

9 a i

ii

iii

b
10 a AP AV
Baseball
Basketball
Soccer

b ; number of degrees of freedom ; significant evidence that coping strategy


is associated with sport involved.

c Soccer officials are far less likely than expected to use an AV coping style. Baseball officials are far
more likely than expected to use an AV coping style.
11 a

b Proof.

12 a

13 a i

ii

b i

ii Do not reject null hypothesis.

iii

iv Making a type II error, rejecting a genuine opportunity, might turn out to be very costly. Further
tests can always be done to be more certain.

14 a Phone calls are independent of each other, there is a constant average rate of phone calls.

b For example: The same customer might call back, breaking independence. The rate during office
hours might be different from the rate during the night.

c ( )

e hours ( s.f.).

15 a

e
16 a

b i or more cans containing less than even though the mean is actually .

ii

c i

ii Significant evidence that the cans contain less than on average; significance level is .

17 a

c Proof;

d No probability of books borrowed and no probability of more than books borrowed.


e i

ii
Cross-topic review exercise 2
1 a

2 a ; significant evidence of association.

b Belief is supported at the level of significance.

3 a

c i

ii

iii

4 evidence to support Julie’s belief at significance level.

More students than expected in the age group pass their test first time.

5 a
b i

ii

6 a

b
7 a

8 a ; insufficient evidence to reject null hypothesis.

b
9 a Most students will be close to the average, with fewer and fewer students getting scores as you
move further from the mean.

d Power would increase because the test will be more likely to pick up the difference from .

10 a

11 ; sufficient evidence to support Judith’s belief. Assumption that the population is normally
distributed.

12 no evidence at significance to support the claim that the drug is effective


against the sickness.
13 a
b Proof.

14 . Evidence to support Lorraine’s belief. Assume that the distances follow a normal
distribution.

15 a Random sample.

b ; significant evidence that mean speed has reduced.

c i Concluding that the mean speed has reduced when in fact it has not.

ii Concluding that the mean speed is still when in fact it has reduced.

16 a

17 a

c Sufficient evidence that the mean number of cars has increased.

d
18 a

19 a

c No

d i

ii
AS Level Practice Paper
1 C

2 C

3 a 0.1

b 0.8

c 0.8

4 a Total number of buses arriving in an hour; $T \sim \mathrm{Po}(7.5)$.

b 0.0591 (3 s.f.)

c For example: Both are dependent on traffic.

d Not feasible. Mean ≠ variance.


5 a $\frac{1}{9}$

b Proof; median $= 2.38$ (3 s.f.); $E(X) = 2.25$.

c 0.338 (3 s.f.)

6 $\chi^2 = 4.22$, $\nu = 6$; do not reject $H_0$. No significant evidence of an association.


7 a $p$-value $= 0.0212$; reject $H_0$. Significant evidence that the contractors have reduced the mean
number of leaks.

b 0.0212 (3 s.f.)
A Level Practice Paper
1 B

2 D

3 a $\frac{5}{3}$

b $\frac{5}{9}$

c 5

4 a $H_0$: Gender and lessons are independent; $H_1$: Gender and lessons are not independent.

b 1

c $\chi^2_{\text{Yates}} = 4.84$; reject $H_0$.
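As a rough illustration of how a Yates-corrected statistic of this kind is obtained, the sketch below computes it for a 2 × 2 table. The observed counts are invented for illustration (the table in the question is not reproduced here), and `yates_chi_squared` is a hypothetical helper name.

```python
def yates_chi_squared(table):
    """Yates-corrected chi-squared statistic for a 2x2 table of observed counts.

    Each expected frequency is (row total x column total) / grand total,
    and each contribution is (|O - E| - 0.5)^2 / E.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (abs(observed - expected) - 0.5) ** 2 / expected
    return statistic


# Hypothetical observed counts, invented for illustration only.
table = [[30, 20],
         [15, 35]]
statistic = yates_chi_squared(table)

# With 1 degree of freedom the 5% critical value is 3.841.
print(round(statistic, 2), "reject H0" if statistic > 3.841 else "do not reject H0")
```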

5 a $\frac{1}{\ln 2}$

b $E(X) = 1.44$ (3 s.f.); $\mathrm{Var}(X) = 0.0827$ (3 s.f.)

c 0.144 (3 s.f.)
6 a Proof.

b $H_0$: $\mu = 35$; $H_1$: $\mu < 35$

c 11

d $t = (-)3.25$; reject $H_0$.

e Assume that the swimming times are drawn from a normal distribution. Any reasonable
comment, for example: OK because swimming times will be mainly clustered around the average
with few people at extremes or not OK because the swimming club is likely to have people at the
upper tail of the distribution.
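The test statistic in part d comes from the usual one-sample formula, with the sample mean, the unbiased variance estimate and the sample size substituted in. A minimal sketch is given here: the null value 35 and the sample size 12 (giving 11 degrees of freedom) are taken from the answers above, but the sample mean and variance are invented for illustration, and `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt


def one_sample_t(sample_mean, unbiased_variance, n, mu0):
    """t-statistic for H0: mu = mu0, using the unbiased variance estimate."""
    return (sample_mean - mu0) / sqrt(unbiased_variance / n)


# Hypothetical summary statistics, invented for illustration only.
t = one_sample_t(sample_mean=33.5, unbiased_variance=4.0, n=12, mu0=35)

# One-tailed test at the 5% level with 11 degrees of freedom:
# reject H0 when t is below -1.796.
print(round(t, 3), "reject H0" if t < -1.796 else "do not reject H0")
```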

7 a 4.58% (3 s.f.)

b 92.5% (3 s.f.)
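Both probabilities are Poisson tail calculations: a type I error is the chance of landing in the rejection region when the null mean is true, and a type II error is the chance of landing outside it when the alternative mean is true. The sketch below uses invented figures (null mean 10, true mean 7, reject when the count is 4 or fewer), not the values from the question; `poisson_cdf` and `error_probabilities` are hypothetical helper names.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))


def error_probabilities(critical_value, lam0, lam1):
    """Error probabilities for a test that rejects H0 when X <= critical_value.

    lam0 is the mean under H0; lam1 is the true mean under the alternative.
    """
    type_1 = poisson_cdf(critical_value, lam0)       # reject H0 when H0 is true
    type_2 = 1 - poisson_cdf(critical_value, lam1)   # keep H0 when H0 is false
    return type_1, type_2


# Hypothetical values, invented for illustration only.
type_1, type_2 = error_probabilities(critical_value=4, lam0=10.0, lam1=7.0)
print(f"P(type I) = {type_1:.4f}, P(type II) = {type_2:.4f}")
```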
8 a Rectangular, between −0.5 and 0.5.

b 0.8

c $4x^2$

d $\begin{cases} 8x & 0 < x < 0.5 \\ 0 & \text{otherwise} \end{cases}$
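The reasoning that links the answers to question 8 can be sketched as follows. The symbols $E$ for the error and $M$ for the larger of the two magnitudes are labels chosen here, since the original symbols are not shown; $F$ and $f$ denote the cumulative distribution function and probability density function of $M$.

```latex
\begin{align*}
E &\sim \text{Rectangular}(-0.5,\ 0.5)
  \quad\Rightarrow\quad P(|E| < x) = 2x, \qquad 0 < x < 0.5 \\
F(x) = P(M < x) &= P(|E_1| < x)\,P(|E_2| < x) = (2x)^2 = 4x^2 \\
f(x) = \frac{\mathrm{d}}{\mathrm{d}x}F(x) &= 8x, \qquad 0 < x < 0.5
\end{align*}
```

The second line uses the independence of the two observations: the maximum is below $x$ exactly when both magnitudes are below $x$.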
Chapter 1 worked solutions
1 Discrete random variables
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.

EXERCISE 1A

2 a Using the fact that the summed probability has to be :

3 a Using the fact that the summed probability has to be :

Using the fact that :

b and .

So the median is the mean of and .

Finding the mean of and :

4 a Using the fact that the summed probability has to be :

5 a Using the fact that the summed probability has to be :

c
6 Let the discrete random variable represent the number rolled on the dice.

7 a Using the fact that the summed probability has to be :

Substituting this expression into the formula for the expected mean and using the fact that
:

Solving for and :

8 Using the formula for the expected mean and using the fact that you are looking for an expected profit
of :

9 a

d and .

So, .

10 a Profit £

Probability

b Working out the value of that would give a zero profit:


He would have to charge at least per game.

11 a Since , you know that .

On the other hand you know that the mode has to satisfy mode

Both have to be equal, so .

Calculating the values of and using the total probability of and the known expected mean of :

Substituting the values of and into the formulae for and :

b i Calculating the probabilities and putting them in a table:

Combined score
Probability

So,

ii Using the table from part b i:

12 a

b and . So the median is the mean of and .

Median

d , . Let the random variable represent the number of people who borrow no books,
.

ii

EXERCISE 1B

3 Using the facts that and :


4

Solving for and :

7 a Using the fact that the summed probability has to be :

8 a Using :

The expression for only gives real values for .

So, only for those values of .

10 The more often the coin is tossed, the more likely it becomes to show a head at least once. However,
there is no value of that guarantees a head. Hence, the expectation value is infinite and it is not
possible to define a fair price for this game.

EXERCISE 1C
EXERCISE 1C

2 Expected mean

3 Expected mean

4 a .

Let , where .

When :

When :

So, , where .

b Using the result that when , :

Then,

5 Let be the random variable describing the position number of the broken bulb.

Then .

Let be the distance from the plug, in .

Then .

You know that and .

Using expectation algebra (Key point 1.5):

and .

Expected mean , variance .
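The two steps in this solution (moments of a discrete uniform variable, then a linear transformation) can be sketched in code. The figures are invented for illustration, since the original values are not shown: 10 bulbs, spaced 0.5 m apart, with the first bulb 1 m from the plug, so that the distance is 0.5 × position + 0.5. The function names are hypothetical.

```python
from fractions import Fraction


def discrete_uniform_moments(n):
    """E(X) and Var(X) for X uniform on 1, 2, ..., n."""
    return Fraction(n + 1, 2), Fraction(n * n - 1, 12)


def linear_transform(mean, variance, a, b):
    """Moments of aX + b: E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X)."""
    return a * mean + b, a * a * variance


# Hypothetical set-up, invented for illustration only.
mean_x, var_x = discrete_uniform_moments(10)
mean_d, var_d = linear_transform(mean_x, var_x, Fraction(1, 2), Fraction(1, 2))
print("E(X) =", mean_x, " Var(X) =", var_x)
print("E(D) =", mean_d, " Var(D) =", var_d)
```

Exact fractions are used so that the results can be compared directly with a hand calculation.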


6 You can define a new variable, for .

7 Assuming equal probability you can use for :

8 Using and the definition of and :

Taking only the positive solution:

9 Defining a new uniformly distributed variable, . For :

10 which is divisible by .

MIXED PRACTICE 1

(Answer )

2 Using the fact that the summed probability has to be :

(Answer )

3 a

4 a

b
c

d has the highest probability of .

So, .

e and . So the median is the mean of and .

Median

5 a

b .

c .

6 a

So, .

b Using the fact that :

Using the fact that the summed probability has to be :

Solving for :

Substituting and into to find the corresponding values for :


When and when
9 a Using the fact that :

Using the fact that the summed probability has to be :

Solving for :

10 a

c .

So, .

11 a

12
13 a Using the general relation and the fact that the summed probability must be :

b Using the fact that and substituting the value for from part a:

c Using the general relation :

14

15 a Expected mean

Standard deviation

b Using the fact that there are possible positions of the Queen of Spades, each of which is equally
likely:

Expected number of points scored

16 a
b i Using the fact that the summed probability has to be :

Using the fact that :

Solving for :

ii

iii
So the standard deviation of is
17 a i Using the fact that the summed probability has to be :

ii

iii

iv

b
Chapter 2 worked solutions
2 Poisson distribution
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.

EXERCISE 2A

4 Let be the number of shooting stars observed in one hour, :

5 a Let be the number of white blood cells shown in a single high power field, :

b Let be the number of white blood cells shown in six high power fields, :

6 a Let be the number of flaws per metre of the wire, :

b Let be the number of flaws in metres of the wire, :

7 .

d Using the formula for conditional probability and your answers from parts a and b:

8 Let be the number of eagles observed on a given day, :

9 a
b

10 , so

Tip

Use technology or trial-and-improvement methods to find
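One way to carry out a trial-and-improvement search with technology is interval bisection. The sketch below is only an illustration: the target equation (finding the Poisson mean for which the probability of at most one event is 0.5) is invented, not the equation from this question, and `solve_by_bisection` and `equation` are hypothetical names.

```python
from math import exp


def solve_by_bisection(f, lo, hi, tol=1e-9):
    """Find a root of f between lo and hi by repeated halving.

    Assumes f(lo) and f(hi) have opposite signs.
    """
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2


def equation(lam):
    """Hypothetical target: P(X <= 1) - 0.5 for X ~ Po(lam)."""
    return exp(-lam) * (1 + lam) - 0.5


lam = solve_by_bisection(equation, 0.0, 10.0)
print(round(lam, 3))
```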

11 a Let be the number of emails per day, :

Let be the number of emails per seven-day week, :

c There are alternative ways to get emails in a week other than every day.
12 a Let be the number of errors in a piece of homework, :

b and , with further probabilities decreasing.

So the most likely number of errors is .
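The search for the most likely value can be sketched numerically. This is a minimal illustration only: the mean `lam` is a hypothetical value, not the one from the question, and `poisson_pmf` is a helper name introduced here.

```python
import math

def poisson_pmf(k, lam):
    """Return P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.4  # hypothetical mean number of errors (assumption, not from the question)

# Tabulate the probabilities for the first few values and pick the largest.
probs = {k: poisson_pmf(k, lam) for k in range(15)}
mode = max(probs, key=probs.get)
print(mode, round(probs[mode], 4))
```

Because the probabilities rise to a peak and then decrease, it is enough to compare neighbouring values around the mean.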

Let be the number of students with at least one error:

Then and

13 Let be the number of requests per day, :

b If there are or more requests, then some of them have to be denied.

c If there are no requests, no car is used. On a day with only a single request, each car is
used with probability one half.

14 Using the fact that :

Multiplying through by and rearranging:


Taking only the positive solution:

15 a

b You have to ensure that the prefactor you found in part a is .

Taking only the positive solution:

EXERCISE 2B

is the number of alpha particles emitted in a millisecond.

Assuming is true: .

Using technology:

This is a two-tailed test, so comparing this to half of the significance level:

, so reject at the significance level.

There is significant evidence that the sample emits a different number of alpha particles.
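The two-tailed comparison can be sketched as follows. The null mean, observed count and significance level are hypothetical values chosen for illustration, and `poisson_cdf` is a helper defined here rather than a library function.

```python
import math

def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

lam0 = 3.0      # hypothetical mean under the null hypothesis
observed = 8    # hypothetical observed count
alpha = 0.05    # hypothetical significance level

# The observed count is above the mean, so the relevant tail is P(X >= observed).
tail = 1 - poisson_cdf(observed - 1, lam0)

# Two-tailed test: compare the tail probability with half the significance level.
reject = tail < alpha / 2
print(round(tail, 4), reject)
```

Comparing `2 * tail` with `alpha` gives the same decision.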

Tip

Alternatively you could compare the directly to the significance level .

4 a

Assuming is true: .

Calculating the -value and comparing to the significance level:

Do not reject at the significance level.

There is not significant evidence to suggest that the average number of cars travelling past the
traffic light is lower than per minute.

b The rate might not be constant and the cars might not be independent. If one car drives slowly, it
might slow down the cars behind it.
5 a A Poisson distribution has equal mean and variance. Here they have approximately the same value.
b

Assuming is true: .

Calculating the -value and comparing to the significance level:

Do not reject at the significance level. There is not significant evidence that there has been a
reduction in the average number of accidents.
6 a The sample mean is and the unbiased estimate of the population variance is , giving a standard
deviation of approximately .

b A Poisson distribution has equal mean and variance. Here, they have approximately the same value.

Assuming is true: .

Calculating the -value and comparing to the significance level:

Reject at the significance level. There is significant evidence that the number of mistakes is
lower.
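The check that the mean and variance are approximately equal, used in questions 5 and 6, can be sketched with the standard library. The data list is hypothetical and only illustrates the calculation.

```python
import statistics

# Hypothetical sample of counts (assumption, not the data from the question).
counts = [2, 0, 3, 1, 4, 2, 1, 3, 2, 2]

sample_mean = statistics.mean(counts)
# statistics.variance uses the n - 1 divisor, i.e. the unbiased estimate.
unbiased_variance = statistics.variance(counts)
standard_deviation = unbiased_variance ** 0.5

print(sample_mean, round(unbiased_variance, 3), round(standard_deviation, 3))
```

If the two values are close, a Poisson model is plausible; a large difference would count against it.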
7 a You need a constant rate over the day and the bees have to arrive independently.

b Let represent the number of bees arriving in minutes.

Assuming is true: .

Calculating the -value and comparing to the significance level:

Reject at the significance level. There is significant evidence that the number of bees has
increased.

8 represents the number of leaks in of pipe.

Assuming is true: .

Using technology: .

This is a two-tailed test, so comparing this to half of the significance level:

so do not reject at the significance level.

There is not significant evidence of a change in the mean number of leaks.


Tip

Alternatively you could compare the directly to the significance level .

9 a represents the number of earthquakes in one year.

Assuming is true: .

Calculating the -value and comparing to the significance level:

Do not reject at the significance level. There is not significant evidence that the number of
earthquakes has increased.

b represents the number of earthquakes in two years.

Assuming that is true: .

Calculating the -value and comparing to the significance level:

Reject at the significance level. There is significant evidence that the number of earthquakes
has increased.

10 If is true, then .

So,

So, for to be rejected at the significance level:

The smallest possible value is .
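A search for the smallest value in the critical region can be sketched like this. It assumes an upper-tail test, and the null mean and significance level are hypothetical; `poisson_cdf` is a helper defined here.

```python
import math

def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Po(lam); an empty sum gives 0 when k < 0."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

lam0 = 4.0    # hypothetical mean under the null hypothesis
alpha = 0.05  # hypothetical significance level

# Increase x until P(X >= x) first drops below the significance level.
critical = 0
while 1 - poisson_cdf(critical - 1, lam0) >= alpha:
    critical += 1

print(critical)
```

Every observed value at or above `critical` would lead to rejecting the null hypothesis in this one-tailed setting.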

MIXED PRACTICE 2

1 Let be the number of complaints in a hour shift, :

(Answer C)

(Answer C)
3 a Assuming that thrushes and robins visit the table independently of each other:

4 a The events have to be independent and the rate of success has to be constant.

c represents the number of burgers sold per hour.

Assuming that is true: .

Calculating the -value and comparing to the significance level:

Reject at the significance level. There is significant evidence that the average rate of burgers
ordered has increased from .

5 a

b :

6 a

7 a Mean

Unbiased estimate of population variance

Standard deviation

Tip

You learnt how to calculate an unbiased estimate of variance in A Level Mathematics
Student Book 1, Chapter 22. You should also be able to use your calculator to find the
unbiased estimate of the population variance.

b A Poisson distribution has equal mean and variance. Here they have approximately the same value.

c Let represent the number of power surges per days.


Assuming is true: .

Using technology: .

This is a two-tailed test, so comparing this to half of the significance level:


, so do not reject at the significance level.

There is not significant evidence of a change in the average rate of power surges.

Tip

Alternatively you could compare the directly to the significance level

8 a

Let be the number of days in a five-day week on which there are more than calls. So
.

9 a

b Using the result from part a:

Let represent the number of years with fewer than seven rainy days.

10 a

So

11 a

So

Using the fact that :

So

12 a Let represent the number of eruptions in one hour:


b Let represent the number of eruptions in one day:

c Let represent the number of eruptions in minutes.

There are half hours in a day.

d Let represent the number of eruptions in one hour:

For the first eruption to occur between and there are hours with no eruptions followed
by an hour with at least eruption:

e Over days with an estimate of eruptions per day, each producing litres of water:

f Let represent the number of eruptions in one hour:

g Let represent the number of eruptions in one hour:

13 a Let represent the number of rainstorms in a four-week period:

b Let represent the number of rainstorms in complete weeks:

Since is an integer, the value of has to be at least .


14 a Let represent the number of patients that arrive in minutes:

b Let represent the number of patients that arrive in one hour:

15

Assuming that is true: .

Calculating the -value and comparing to the significance level:

Do not reject at the significance level. There is not significant evidence that the average rate has
decreased.
16 a The average rate must be constant. However, you might expect it to vary over different times of the
day and with different weather conditions. Birds must arrive independently, but they might come in
flocks.

b :

c ;

d Let represent the number of birds arriving per hour.

Assuming is true: .

Using technology: .

This is a two-tailed test, so comparing this to half of the significance level:


, so do not reject at the significance level.

There is no significant evidence of a change in the number of birds arriving each hour.

Tip

Alternatively you could compare the directly to the significance level

17 represents the number of leaks in a section of pipe.


Assuming that is true: .

Calculating the -value and comparing to the significance level:

Do not reject at the significance level. There is not significant evidence that the number of leaks
has increased.
18 a Using :

b Number requested or more


Number sold

Probability

Probability

Most probable number sold in a week is .

c Let the number sold in a week be denoted by .

d Let the smallest number of copies be .

Tip

Note that is a constant.

, so

Summing up to gives the first value , so at least copies should be ordered.


19 a i

ii

b i

ii
c

20 Exclude the probability of . Then scale the old mean by the new probability.

21 a Let represent the number of bicycles brought in to be serviced per day.

ii

iii

b Let represent the number of bicycles brought in to be serviced in one week ( days).

22 a Let represent the number of coins found per .

ii

c Let represent the number of coins found per .

Let represent the number of brooches found per .


i

ii Coins buried in a hoard are not independent of each other. The Poisson model requires
independent events, so the brooches are more likely to be well modelled by a Poisson distribution.
Chapter 3 Worked solutions
3 Chi-squared tests

EXERCISE 3A

2 a : Physics grade and Mathematics course are independent,

: Physics grade and Mathematics course are dependent.

b or lower or or

Further Maths
Maths AS or A

No Maths

d (critical value from the table at the significance level)

Reject . There is significant evidence of association. Students studying a higher level of
Mathematics tend to do better in Physics.

3 : Reading level and fiction/non-fiction are independent.

: Reading level and fiction/non-fiction are dependent.

Observed values:

Elementary Moderate Advanced

Fiction
Non-Fiction

Expected values:

Elementary Moderate Advanced

Fiction
Non-Fiction

and .

The critical value from the table at the level of significance is .


Reject . There is significant evidence that the reading level and fiction/non-fiction are dependent.
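The expected frequencies and the test statistic for a contingency table can be sketched as below. The observed counts are hypothetical, since they are not the figures from the question, and the 5% significance level is an assumption. The critical value 5.991 is the standard table value for 2 degrees of freedom at that level.

```python
# Hypothetical observed table (rows: Fiction, Non-Fiction;
# columns: Elementary, Moderate, Advanced).
observed = [[30, 40, 30],
            [20, 30, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # table value for 2 degrees of freedom at the 5% level
print(expected, round(chi2, 3), dof, chi2 > critical_value)
```

With these illustrative counts the statistic exceeds the critical value, so the null hypothesis of independence would be rejected.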
4 a Early On Time Late Total

Walk
Car

Other

Total

b Expected values:

Early On Time Late Total


Walk

Car

Other

Total

c lateness and mode of transport are independent,

lateness and mode of transport are dependent.

Critical value from the table at the significance level

critical value.

Reject . There is significant evidence that lateness depends on the mode of transport.

d He must assume that his data are representative. For example: it was not a day with unusual traffic.
He must also assume that the respondents were independent. For example: not lots of students from
the same bus.

5 The expected frequencies are given by:

number of visits and money spent are independent,

number of visits and money spent are dependent.

(critical value from the table at the level of significance).

Reject .

There is significant evidence of an association.

People who visit more often tend to spend more money on each visit.
6 The expected frequencies are given by:

No drug taken Single dose Double dose


days

days
days
days

recovery speed and dose of drug are independent,

recovery speed and dose of drug are dependent.

(critical value from the table at the level of significance).

Do not reject .

There is no significant dependency.

Hence, the drug does not appear to be effective in increasing the speed of recovery.

7 a Male Female
£

b gender and salary are independent,

gender and salary are dependent.

(critical value from the table at the level of significance).

Do not reject .

There is no significant evidence of dependency between salary and gender.


8 a Using the fact that the sum over all is equal to the sum over all :

b , since has to be positive.

c The critical value here is . Hence find significance for .


9 a For both genders, of the corresponding population voted for any party.

b Gender and voting intention are independent.


Gender and voting intention are not independent.

If is true, half of the voters for each party will be males and half will be females.

Degrees of freedom, . Critical value from the table at the level of significance
is .

Observed Expected
Male Female Male Female

, so do not reject and conclude that, for a sample of size at the level, there is not
significant evidence to suggest that gender and voting intention are not independent.

c Let the smallest sample size required be times the sample of size .

, so

Sample must be greater than , so the smallest is .

d There will be random variation within the sample.


10 a

b
11 a The square causes larger deviations to have a stronger effect and ensures is positive.

b You divide by the expected value to prevent large groups contributing disproportionately and
overwhelming the contributions of smaller groups.

EXERCISE 3B

2 appearance and colour are independent,

appearance and colour are dependent.

The expected frequencies are given by:

Wrinkled Round
Yellow
Green

so use Yates’ correction.


(critical value from the table at the level of significance)

Do not reject . There is no significant evidence of association at the significance level.
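Yates' correction for a two-by-two table can be sketched as follows. The observed counts are hypothetical and the 5% significance level is an assumption. The critical value 3.841 is the standard table value for 1 degree of freedom at that level.

```python
# Hypothetical 2 x 2 observed table (rows: Yellow, Green; columns: Wrinkled, Round).
observed = [[12, 8],
            [6, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# With one degree of freedom, subtract 0.5 from each |O - E| before squaring.
chi2_yates = sum((abs(o - e) - 0.5) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))

critical_value = 3.841  # table value for 1 degree of freedom at the 5% level
print(round(chi2_yates, 3), chi2_yates > critical_value)
```

The correction always reduces the statistic, so it makes the test slightly more conservative.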

3 The colleague's enjoyment is independent of whether milk or tea is added first.

The colleague's enjoyment depends on whether milk or tea is added first.

If is true, then half the drinks that the colleague likes had tea added first, and half had milk added
first.

Observed Expected

Tea first Milk first Tea first Milk first


Likes Likes
Dislikes Dislikes

Degrees of freedom, , so you must apply Yates' correction.

(critical value from the table at the significance level), so do not reject at the
level. There is no significant evidence to suggest that the colleague's enjoyment depends on whether
milk or tea is added first.

4 The number of books does not differ between rural and urban libraries.

The number of books differs between rural and urban libraries.

Combine last two columns due to an expected frequency being .

Observed Expected

Rural Rural
Urban Urban

Degrees of freedom, , so you must apply Yates' correction.

(critical value from the table at the significance level), so do not reject at the
level. There is no significant evidence to suggest that the number of books differs between rural
and urban libraries.

5 There is no association between the amount spent on horror films and the number of murders each
year.

There is an association between the amount spent on horror films and the number of murders each
year.

Combine last two columns due to an expected frequency being .

Observed Expected

Degrees of freedom, , so you must apply Yates' correction.

(critical value from the table at the significance level), so reject . At the level,
there is significant evidence of an association between the amount spent on horror films and the
number of murders each year. However, this does not establish causality (it does not show that high
spending on horror films causes a high number of murders).
6 a Acceptance patterns are independent of gender (no association).

Acceptance patterns are dependent on gender (association).

Observed Expected
Accepted Rejected Accepted Rejected

Male Male
Female Female

Degrees of freedom, , so you must apply Yates' correction.

(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that acceptance at this university is dependent on
gender.

The acceptance rates are for males and for females, which appears to be evidence of bias.

b Acceptance patterns do not vary in different departments.

Acceptance patterns vary in different departments.

Expected

Dept. Men Women


Total
Admitted Rejected Admitted Rejected

Total

Degrees of freedom, .

(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that acceptance patterns depend on department.

The proportion of men admitted is higher than the proportion of women admitted in two of the six
departments ( and ).

Dept
% Men admitted
% Women admitted

7 You cannot do a calculation based on two factors being dependent unless you know exactly what that
dependency is.

MIXED PRACTICE 3

1 More information is needed. (Answer D)

2 Require to not be rejected in a test at the level.

and the critical value from the table is .

(Answer B)

3 loan outcome and recipient type are independent,

loan outcome and recipient type are dependent.

The expected frequencies are:

Recipient
Small Large
Individual
business business
Satisfactory
Outcome
Bad debt
(critical value from the table at the significance level).

Do not reject .

There is no significant evidence of an association.


4 a The expected frequencies are:

Eye colour
Blue Green Brown
Brown
Hair colour
Blonde

c hair colour and eye colour are independent,

hair colour and eye colour are dependent.

Do not reject with (critical value from the table at the significance level).
There is no significant evidence that hair colour and eye colour are dependent.

5 Expected frequencies:

Spring Summer Autumn Winter


Boy

Girl

gender and time of year are independent,

gender and time of year are dependent.

(critical value from the table at the significance level).

Do not reject . There is no significant evidence of any association between the gender of the baby
and the time of year.

6 The expected frequencies are:

Using the Yates’ correction:

(Answer A)

7 a Combining the last two columns since otherwise the expected frequencies would be smaller than :

Semi-detached and
Flat Terraced detached

Sold within three months


Sold in more than three
months

type of property and time taken to sell it are independent,

type of property and time taken to sell it are dependent.

(critical value from the table at the significance level)

Reject . There is significant evidence of an association.


b i The larger total number of properties could make it easier to sell.

ii The large difference between observed and expected frequencies together with the small
expected frequency gives a large contribution to .
8 a There is no association between class of degree and A-level grade.

There is an association between class of degree and A-level grade.

Combine degree classes and due to an expected frequency being .

Class of degree – Expected


or Total

A-level grade

Total

Degrees of freedom, .

(critical value from the table at the significance level), so reject . There is some
evidence at the level to suggest that Fiona's belief is justified.

b They obtained more Class degrees, but fewer Class degrees than expected.
9 a i The type of sideswipe accident is independent of where the HGV was registered.

The type of sideswipe accident is dependent on where the HGV was registered.

Type of sideswipe accident – Expected


Changing Changing Overtaking
lane to the lane to the moving Total
left right vehicle
British
reg.
HGV
Foreign
reg.
HGV
Total
Degrees of freedom, .

(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that the type of sideswipe accident is dependent on
whether the HGV involved was British registered or foreign registered.

ii Accidents involving changing lane to the left are less likely and accidents involving changing lane
to the right are more likely than expected for foreign registered HGVs.
b i Observed Expected
No No
Prosecution Prosecution
prosecution prosecution

ii Prosecutions result independently of the age of the driver.

Prosecutions resulting are dependent on the age of the driver.

Degrees of freedom, , so you must apply Yates' correction.

(critical value from the table at the significance level), so do not reject . There
is no significant evidence at the level that the driver's age had an influence on
whether or not a prosecution resulted.

10 There is no association between the age of staff and the department they work in.

There is an association between the age of staff and the department they work in.

Combine the columns for staff aged and due to an expected frequency being .

Expected
Total
Accounts
Personnel
Marketing
Comms.
Total

Degrees of freedom, .
(critical value from the table at the significance level), so reject . There is significant
evidence at the level of an association between the age of staff and the department they work in.
11 a A sixth of businesses do not repay their loans while a fifth of all mortgages and personal loans are
defaulted.

b The frequencies he would observe are:

Repaid Defaulted
Personal
Mortgage
Business

type of loan and whether it is repaid are independent,

type of loan and whether it is repaid are dependent.

(critical value from the table at the significance level),


so do not reject . There is no significant evidence of dependence.

c For a sample of loans:

Observed frequencies:

Repaid Defaulted Total


Personal
Mortgage
Business
Total

Expected frequencies:

Repaid Defaulted Total

Personal
Mortgage
Business
Total

To show dependence at the significance level, (critical value from the table at the
significance level).

So, the smallest whole number is .

d The sample follows the long-term trend.


12 a Development of Type diabetes is independent of the average level of weekly alcohol
consumption.
Development of Type diabetes is dependent on the average level of weekly alcohol
consumption.

Degrees of freedom, .

Expected
Type diabetes
developed
Yes No Total
Less than
Between and
Average level of weekly alcohol
consumption
More than
Total

(critical value from the table at the significance level), so reject . There is
significant evidence at the level to suggest that development of Type diabetes is dependent on
the average level of weekly alcohol consumption, i.e. that there is an association.

b The reviewer’s statement is most likely based on these proportions:

Consumption
Diabetes developed

It appears that the advice given might lower the risk for people in the ' ' group, but increase the
risk for people in the ' ' group.

However, these proportions apply to this particular group of people, and there is no evidence to
suggest that the group is representative of the whole population, for whom the proportions might be
very different.
c i The number of rows and columns in the contingency table, the number of degrees of freedom and
the significance level of the test would not change, so the critical value would remain at .

ii Compare with .

Each of the terms that are summed to find the test statistic, , would increase by a factor of , so
would increase by the same factor from to .

iii The conclusion is the same because the test statistic is still greater than the critical value, but the
evidence to support that conclusion would be stronger than before.
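The effect of scaling every observed frequency by the same factor can be checked numerically. The table and the factor `k` below are hypothetical, and `chi_squared` is a helper written for this sketch.

```python
import math

def chi_squared(observed):
    """Return the uncorrected chi-squared statistic for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat

base = [[30, 40, 30],
        [20, 30, 50]]   # hypothetical observed frequencies
k = 3                   # hypothetical scale factor for the sample size
scaled = [[k * o for o in row] for row in base]

stat_base = chi_squared(base)
stat_scaled = chi_squared(scaled)
print(round(stat_base, 3), round(stat_scaled, 3))
```

Each observed and expected frequency is multiplied by `k`, so each term (O - E)^2 / E is multiplied by `k` as well, while the critical value is unchanged.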
Chapter 4 Worked solutions
4 Continuous distributions

EXERCISE 4A

4 a Using the fact that the total probability has to be :

b Substituting the value of from part a:

5 Setting the total probability equal to , and using the lower limit and the upper limit , to find :

7 Obtaining the CDF by integrating the PDF:

Using either :

CDF is

Let the lower and upper quartiles be and , respectively.

8 You need to use the fact that in both parts a and b.

a Using the fact that :


b Using the fact that

9 The graph of the PDF is:

Using the fact that the probability of two independent observations of both being below is :

Using the fact that the area under the graph must be equal to and substituting :

Area under graph

10 Using the fact that the total probability has to be and solving the exponential equation either by using
the sinh function or by substituting :

Substituting the value of into the formula for :

11 Using the fact that the total probability has to be :

Making the substitution and solving for :


Since can only take positive values, and so .

has to be positive, giving only one possible solution:

EXERCISE 4B

3 a Using :

b Using the formula for and substituting from part a:

4 a Using the fact that the total probability has to be :

b Using and substituting the value of from part a:

5 a is defined between and . To show that is a PDF, you need to show that , for all
and that .

for and for every .

satisfies the conditions to be a PDF.
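The two conditions for a PDF can also be checked numerically. The density used here, 3x^2/8 on the interval from 0 to 2, is a hypothetical example and not the function from the question; `integrate` is a simple midpoint-rule helper written for this sketch.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def f(x):
    """Hypothetical PDF: 3x^2/8 on [0, 2] (assumption for illustration)."""
    return 3 * x**2 / 8

a, b = 0.0, 2.0

# Condition 1: f is never negative on its domain (checked on a grid).
non_negative = all(f(a + i * (b - a) / 1000) >= 0 for i in range(1001))

# Condition 2: the total area under the graph is 1.
total_area = integrate(f, a, b)

# The same helper gives the mean and variance.
mean = integrate(lambda x: x * f(x), a, b)
variance = integrate(lambda x: x**2 * f(x), a, b) - mean**2
print(non_negative, round(total_area, 6), round(mean, 6), round(variance, 6))
```

A grid check is not a proof of non-negativity, so in a written solution you would still argue algebraically.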

b Using :

6 a Using the fact that the total probability has to be :

b Using , substituting in the value of from part a and using the fact that
:

7 Using the formula, :


Since is a PDF:

so

Using integration by parts:

Set and

Then and

So

Evaluating the expression in the squared brackets and using :

EXERCISE 4C
4 a Expected value £ £ £

b Standard deviation £ £

Using the fact that :

6 a

7 a

8 a Using integration by parts to find :

9 a

b Using your result from part a:

EXERCISE 4D
2 a

d
3 a

4 Let represent the mass of a woman, represent the mass of a man and represent the total mass
of the women, men and the lift:

5 Let represent the outcome of the roll and let represent the difference between the two scores:

6 Let represent one student's exam scores, and let represent the difference between two students'
scores:

7 Let represent Pamela's journey time, let represent Adrian's journey time and let represent the
difference in total weekly journey times:

8 is twice the mass of a single gerbil.

is the sum of the masses of two different gerbils.
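The difference between doubling one observation and adding two independent observations can be sketched with `statistics.NormalDist`. The mean and standard deviation are hypothetical values, not those from the question.

```python
from statistics import NormalDist

# Hypothetical distribution for the mass of one gerbil (assumption).
gerbil = NormalDist(mu=70, sigma=5)

# Twice the mass of a single gerbil: the standard deviation doubles.
double_mass = gerbil * 2

# Sum of the masses of two different, independent gerbils: the variances add.
total_mass = gerbil + gerbil

print(double_mass.mean, double_mass.stdev)
print(total_mass.mean, round(total_mass.stdev, 4))
```

Note that `NormalDist` addition treats the two operands as independent, which is why `gerbil + gerbil` models two different gerbils rather than one gerbil counted twice.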

EXERCISE 4E

2 a

The distribution is .
b Using your calculator:

3 Let the times taken by Aaron and by Bashir be represented by and , respectively.
a

Standard deviation of

4 a

Let represent the mean of a random sample of six rods. Then .

b Using the distribution from part a,

5 a Let represent the length of a randomly chosen pipe, then .

Tip

Alternatively, for the total length of pipes:

6 a
b

c Let the total mass be represented by .

7 Let represent the mass of a randomly chosen apple, and let represent the mass of a randomly
chosen pear.
a

8 Let the standard deviation of the length of a corn snake be .

For snakes:

Using the fact that for a randomly selected sample of corn snakes, :

So

9 Let represent the score of a randomly chosen boy and let represent the score of a randomly chosen
girl.
a

If the marks differ by less than , then is between and .


Tip

Alternatively, you could use as your random variable.

b Taking as your random variable:

Mean of

Tip

Alternatively, you could use as your random variable.

10 a Daily rainfall . Using the fact that , so :

, so

This gives

Weekly rainfall .

Using the fact that in a randomly chosen -day week, there is a probability of that the mean daily
rainfall is less than :

This gives

Solving and : and

b This assumes that the rainfall each day is independent of the rainfall on other days, which is
unlikely to be the case.

11 a

b i

Tip

Alternatively, if , then mean .


Mean wait for days

ii On any day,

Let the number of mornings per week that she waits more than minutes be :

Tip

You can use a probability found from a normal distribution as the parameter in a
binomial distribution.
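The idea in this tip can be sketched as follows. All the numbers are hypothetical: the waiting-time distribution, the threshold, the number of mornings and the event being counted are assumptions made for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical daily waiting time in minutes (assumption).
wait = NormalDist(mu=10, sigma=2)

# Probability of waiting more than 12 minutes on any one morning.
p = 1 - wait.cdf(12)

# Number of such mornings in a 5-day week is then B(5, p).
n = 5
def binomial_pmf(k):
    """Return P(Y = k) for Y ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of at least two long waits in the week.
prob_at_least_two = 1 - binomial_pmf(0) - binomial_pmf(1)
print(round(p, 4), round(prob_at_least_two, 4))
```

This relies on the waiting times on different mornings being independent.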

iii

c Average for a week is more than minutes if she waits more than minutes on the last day.

d Average waiting time is .

EXERCISE 4F

3 Using the fact that for the percentile, :

4 a

c Using the fact that, for the median, , :

EXERCISE 4G

1 a

b Using the fact that the lines have gradients of :

c Using the fact that :

From , area is

2 a To find the median, you need to solve .

From the CDF or from its graph:

, so the median is .

Tip

Alternatively, looking at the PDF and its graph:

b Integrating to find the mean, :

Integrating and then subtracting the square of the mean to find :


3 a

Using the fact that the area under the graph of the PDF is equal to :

c Using the fact that the area to the right of the upper quartile is :

This simplifies to and is satisfied by , which is the value of .

Tip

You can use technology to find or, if working manually, test for a sign change in the
value of ; start by evaluating and
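The sign-change method mentioned in this tip can be sketched as a bisection search. The CDF used here, x^3/8 on the interval from 0 to 2, is a hypothetical example and not the one from the question.

```python
def cdf(x):
    """Hypothetical CDF on [0, 2] (assumption for illustration)."""
    return x**3 / 8

def g(q):
    """The upper quartile is the root of g, since F(q) = 0.75 there."""
    return cdf(q) - 0.75

# g changes sign between 1 and 2, so the root lies in that interval.
lo, hi = 1.0, 2.0
assert g(lo) < 0 < g(hi)

for _ in range(60):
    mid = (lo + hi) / 2
    if g(lo) * g(mid) <= 0:
        hi = mid   # the sign change is in the left half
    else:
        lo = mid   # the sign change is in the right half

upper_quartile = (lo + hi) / 2
print(round(upper_quartile, 4))
```

Each step halves the interval that contains the root, so a few evaluations already give the value to the accuracy needed in an exam answer.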

d The CDF,

First section:

At

Second section:

At

4 a
Using the fact that the area under the graph of the PDF is equal to :

Simplifying:

d Considering the areas under the two sections of the graph of the PDF:

Area , so the median is between and .

Using area :

Simplifying:

5 a You must show that is never negative and that the area under the graph is .

The function for all real , because and for all real .

b Using integration by parts to find and and using the variance formula
:

and

Integrating by parts:

and
Tip

A sketch graph of the PDF would tell you to expect that

Integrating by parts twice:

and

So,

6 a

Using the fact that the area under the graph of the PDF is :

, giving

Substituting from part a:

c The CDF is

First section:

, so , giving

Substituting :

Second section:

, giving
Substituting :

CDF is

d By substitution:

, and , which tell you that contains the median.

Let the median be . You know that and that .

e By substitution:

, which tell you that contains the lower quartile.

Let the lower quartile be .

You know that and that .

, giving

7 a

Using the fact that the area under the graph of the PDF is equal to :

, so

c  
So,

Using integration by parts for both functions:

and

EXERCISE 4H
3 a Let represent the length of the piece.

b Expected mean

Standard deviation

6 a

7 Mean of

Standard deviation of

EXERCISE 4I

3 Let represent the number of emails received in minutes.

4 Let represent the number of birds that arrive in ten minutes.

c The average rate is bird every minutes.

5 Let represent the number of people he meets that he knows in minutes.

6 Let represent the number of buses arriving in minutes:


Let represent the number of buses arriving in minutes:

7 Assuming is the average number of calls per minute, you can find an expression for the average
waiting time per call, :

9 a The waiting time per bus follows an exponential distribution with mean Hence .

10 Using integration by parts to find and and using the variance formula
:

and

Integrating by parts:

Integrating by parts twice:

So , and

11 a Poisson distribution with mean : .

c The expression describes the probability of no success in units of time, so it is the
probability of having to wait at least units of time for a success, which is described by .

d From parts b and c: , so

The expression represents the CDF of the random variable :


Now,

The PDF is
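The link between the Poisson and exponential distributions in this question can be checked numerically. The rate and the time below are hypothetical values chosen for illustration.

```python
import math

lam = 0.5  # hypothetical rate of successes per unit time
t = 3.0    # hypothetical waiting time

# Probability of no successes in t units of time: Po(lam * t) evaluated at 0.
p_no_success = math.exp(-lam * t)

def cdf(x):
    """CDF of the waiting time: P(T <= x) = 1 - P(no success in x units)."""
    return 1 - math.exp(-lam * x)

def pdf(x):
    """PDF of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x)

# Differentiating the CDF numerically should reproduce the PDF.
h = 1e-5
pdf_numeric = (cdf(t + h) - cdf(t - h)) / (2 * h)
print(round(1 - cdf(t), 6), round(p_no_success, 6), round(pdf_numeric, 6))
```

The first two printed values agree because waiting longer than `t` is the same event as seeing no successes in `t` units of time.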

EXERCISE 4J

2 a Using the fact that the total probability has to be 1 and the fact that :

So

Hence

b Substituting the value for from part a:

c Splitting the calculation of into an integral over the continuous part of the variable and then
adding the value for the discrete part:

d Splitting the calculation of into an integral over the continuous part of the variable and then
adding the value for the discrete part:

3 a Using the fact that the total probability has to be and the fact that :

So :

Hence .

b Splitting the calculation of into an integral over the continuous part of the variable and then
adding the value for the discrete part, then substituting in the value of c from part a:

Substituting into the calculation for :

4 a Using the fact that if you add the cumulative probability for the continuous interval to the probability
for the integer values you have to obtain a total probability of :
b For

Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:

5 a

Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:

d Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:

6 a Using the fact that and that has to have equal values at and due to continuity:

7 Using the fact that you have to obtain a total probability of :


Since you know that has to be positive, only is a valid solution.

8 a Constructing a probability table:

Using the fact that the total probability has to equal :

b For

Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:

d Using the fact that the total probability has to equal :

Rounding causes a slight underestimate of the true mean time.
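The approach used throughout this exercise, an integral over the continuous part plus a sum over the discrete part, can be sketched as below. The distribution is hypothetical: a point probability at 0 together with a density proportional to x on the interval from 0 to 2. The helper `integrate` is written for this sketch.

```python
def integrate(g, a, b, steps=100_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

p_zero = 0.2  # hypothetical discrete probability P(X = 0)

# Continuous part: density c * x on (0, 2]. The total probability must be 1,
# so the continuous part carries probability 1 - p_zero.
c = (1 - p_zero) / integrate(lambda x: x, 0, 2)

def density(x):
    return c * x

# E(X) = discrete contribution + integral over the continuous part.
mean = 0 * p_zero + integrate(lambda x: x * density(x), 0, 2)
total_probability = p_zero + integrate(density, 0, 2)
print(round(c, 4), round(total_probability, 6), round(mean, 4))
```

The same split works for E(X^2), which then gives the variance in the usual way.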

MIXED PRACTICE 4

1 (Answer C)

2 Standard deviation of

(Answer B)
3 a Using the fact that the total probability has to equal :
b

c
4 a Using the fact that the total probability has to equal :

b Using and substituting from part a:

c Using :

5 a Standard deviation of standard deviation of £

b £ £

c £

Standard deviation £

6 a Using the fact that the total probability has to equal :

Using the fact that :

b i

ii Using and substituting the values of and from part a:

7 a Using the fact that the total probability has to equal :

b For the median, :


8 Since the PDF is symmetric about : and :

9 a Using the fact that the total probability has to equal :

b Substituting into the formula for :

10 a Using the fact that there are no recorded masses below , so :

So

b Using the fact that there are no recorded masses over , so :

So

c For

11 a
b

c Probably not. Especially in the situation in part b it is likely that when Alice is finished Hassan might
try to speed up.

d standard deviation
Measures for over days

For :

So, the expectation will be greater than the standard deviation for days.

12 Let the mean time taken per question for the questions be minutes.

Mean of

Variance of

Markus will fail to complete the test if .

Tip

Alternatively, let the mean time taken to answer all questions be minutes.

Mean of

Variance of

Note that when values are taken from the same normal distribution with variance , of
these values have variance , not .

He will fail to complete the test if .

13 Let the mean mass of a man in a group of be .

Mean of
The random variable .

The total mass of the men exceeds when .

Tip

Alternatively, let the total mass of a group of men be .

Mean of ;

The random variable .

14 Let the mean mass of purple beads be , and let the mass of yellow beads be .

Mean of

The random variable .

15 a Mean of

Variance of

b The random variable .

16 a

b Let the median be , then .

From the graph:

The median, or

c The PDF, over each of the four section intervals.

CDF PDF
Interval

Tip

Only one statement is needed in the PDF for the intervals and .

e Let the mean of values of be , then and

Tip

Alternatively, you can find the probability that values have a sum greater than
.

Let the sum of values be .

17 a Let the mean rate of observations per hour be .

Mean rate of observations per hours is, therefore, .

(no particles observed in hours)

Using the fact that the probability of observing no particles in hours is :


Expected waiting time hours

b CDF is , where is the waiting time in hours and, from part a, .

c Using the memoryless property of the exponential distribution:

, where and both represent hours.

18 a The PDF of the distribution is for , and otherwise.

The CDF is

c Let the two independent observations of be and .

You need to find the probability that both observations are less than , where each of these events
has probability .

19 a Using the fact that the total probability must be equal to :

b Finding by integrating over the continuous part and summing over the
discrete part:
c Let the median value of be , then .

This gives

Tip

Alternatively, you could find m using .

20

21

Using the fact that the median of is :

Using the fact that :

22 a Let the marks in English and in Mathematics be represented by and by , respectively.

The random variable .

b Let the average marks of the class in English and in Mathematics be and , respectively.

Mean of

The random variable .


Tip

Alternatively, you could find the probability that the sum of the marks is higher in English
than in Mathematics .

The random variable .

23 a Using the fact that the total probability has to equal :

d Six months is half a year.

e i

ii
24 a

d i
ii

e Solving the equation from part d ii:

25 a

b i

ii

c i Using the results from part b:

ii You know that .

d Substituting the value for from part c ii:


Worked solutions
5 Further hypothesis testing
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.

EXERCISE 5A

2 a

Use the table to find the critical value. When and for a two-tailed test you look at the
column.

Comparing your calculated -score with the critical value:

Reject at the level. There is significant evidence that John's computer does not take
seconds to start on average.

c Assume that the times are normally distributed. True variance is unknown.

3 a

Use the table to find the critical value. When and for a one-tailed test you look
at the column.

Comparing your calculated -score with the critical value:

Do not reject . There is no significant evidence that the packets contain more than on
average.

4 .

Use the table to find the critical value. When and for a one-tailed test you look
at the column.

Comparing your calculated -score with the critical value:


Reject . There is significant evidence that babies in the nursery crawl earlier.
5 a

Use the table to find the critical value. When and for a one-tailed test you look at
the column.

Comparing your calculated -score with the critical value:

Do not reject . There is no significant evidence that cleaning the kettle decreases the time it takes
to boil.

6 a

Use the table to find the critical value. When and for a one-tailed test you
look at the column.

Comparing your calculated -score with the critical value:

Reject . There is significant evidence that the athletes of the club do, indeed, run faster.

d Assume that the times are normally distributed.


7 a

Use the table to find the critical value. When and for a two-tailed test you look
at the column.

Comparing your calculated -score with the critical value:

Do not reject . There is no significant evidence that the mean length of the bananas is different.
c i
ii

Use the table to find the critical value. When and for a one-tailed test you
look at the column.

Comparing your calculated -score with the critical value:

Reject . There is significant evidence that the mean length of the bananas is less than .
8 a

b Using :

So, calculating the values of for different , and comparing to the critical values from the
(since two-tailed test at level) column of the table:

Critical value

The first value of that falls into the critical region is so Aki will reject the null hypothesis at the
level for all .

EXERCISE 5B

7 This decreases the risk of a type II error but increases the risk of a type I error.
8 a : coin is fair, : coin is biased.

b Type I error. This is an example of rejecting when it is in fact true.


9 a Claiming that there is correlation when none really exists.

b Not recognising correlation when there is underlying correlation.


10 a i Claiming that the dice is biased when it is not.

ii Claiming that the dice is not biased when it is.

b For example: roll the dice more times, look for more than sixes, consider other numbers, do a chi-
squared test.
11 a

The significance level is , which is very small and requires extreme evidence before change is
found. This does not seem to be required in this situation.
12 a

13 a

b and

, so do not reject .

There is no evidence to suggest that the mean mass of the eggs is greater than .

d Let the smallest average mass necessary be .

, so the smallest average mass would be

e State the alternative mean value

14 a

So the significance level is

c You need to find the probability that is accepted, given that the alternative value for is true, i.e.
.

d at a stationary point.
Solutions are and .

2nd derivative:

, which is negative at and positive at .

Therefore, the probability of a type II error is maximised when .


15 a i Critical -values for a two-tailed test at are .

ii Critical -values for a two-tailed test at are .

b i

ii

16

There is a special case when , where you simply cannot make a type II error. Here the power is .
In all other cases, the power of the test is .

MIXED PRACTICE 5

1 Using ¸ , and the estimate of :

2 Using the definitions, answer is correct. (Answer A)


3 a
b

c Use the table to find the critical value. When and for a two-tailed test you look
at the column.

Comparing your calculated -score with the critical value:

Do not reject . There is no significant evidence that the volume of produced in a reaction
differs from .

d Assumes that the data are drawn from a normal distribution.


4 a Not rejecting when it is actually false.

d Increase the sample size, i.e. make more observations.

5 a

b For a second period, the expected number of beta particles is .

The significance level is

c If per second are expected, then per seconds.

6 a

d Use the table to find the critical value. When and for a one-tailed test you look
at the column.

Comparing your calculated -score with the critical value:


Reject . There is significant evidence that the average salary is less than £ .
7 a

Significance level of the test is

8 a Josh has taken Safeerah's own body mass into account.

Use the table to find the critical value. When and for a one-tailed test you
look at the column.

Comparing your calculated -score with the critical value:

Do not reject . There is not enough evidence for a reduced journey time.
9 a .

Sample mean,

Sample

Test statistic,

Critical values for a two-tailed test at are .

lies within the acceptance region, so do not reject .

There is no significant evidence at the significance level to doubt that the current mean depth is
.

b .

Sample mean,

Sample
Test statistic,

Critical values for a two-tailed test at are .

lies outside the acceptance region, so reject .

There is significant evidence at the significance level that the current mean depth is not .
c i Neither;

ii The value of decreases as increases, so the confidence interval, which is

, gets narrower as increases.

Differences from are easier to detect in a narrower interval, so a sample size of gives
smaller risk of making a type II error.
10 a : method of encouragement and outcome are independent

: method of encouragement and outcome are dependent.

Calculating the expected frequencies:

Applied for grant Did not apply


Letter
Phone

Using Yates’ correction:

(the critical value from the table at the


significance level)

Reject . There is sufficient evidence of an association between method of receiving information and
outcome.

b A type I error: was rejected despite it being correct.


Chapter 6 worked solutions
6 Confidence intervals

EXERCISE 6A

4 a

b lies within the confidence interval found in part a. It is plausible that the true oxygen level is .

5 a

b Yes, there is sufficient evidence that the true mean is below since is above the upper bound of
the confidence interval found in part a.

7 a

b lies within the confidence interval found in part a. No significant evidence of a difference in the
mean wage.

8 a ,

The confidence level is .

b No. This is a confidence interval for the population mean, not sample means.

c
Using the fact that the width of the confidence interval must be less than :

since must be an integer.

Using the fact that the width of the confidence interval must be less than :

The minimum sample size is , since must be an integer.

10 a

b ,

The confidence level is .

c No, since the confidence intervals overlap (although it is quite difficult to find the significance level).
11 a Using these unbiased estimates:

Upper bound of CI is

so

The confidence interval is .

b In a one-tailed test at the significance level, a mean of lies outside the confidence interval
, so reject .
12 a

Calculating the mean of the differences:

b Increase the sample size.

c Yes. is below the lower bound of the confidence interval found in part a. The data suggests an
increase of at least at the level.
13 a False.

b False.

c False.

d True.

e False.

14 The larger the interval, the more confidence you can have that the true mean lies within it, hence the
interval will be larger than the one.

EXERCISE 6B

2 a Assume heights are normally distributed.

d Using the table to find the -score when

3 a

Using the table:

The confidence level is .

c No, the confidence interval shows the likely position of the mean, not of the individual member of the
population. The sample is too small for meaningful generalisations.
4 , so unbiased estimate of population standard deviation is:

Unbiased estimate of population mean

and

The confidence level is .

5 a

Using the table:

The confidence level is .

c Yes. is below the lower bound of the confidence interval. It could even be claimed that the
lifetime is at least pages.
6 a For , the table gives

Variance of the four times is , so

confidence interval for is:

i.e.
b i

ii A time of minutes is within the confidence interval, so do not reject at the


significance level. The test provides no significant evidence that the average time taken is more
than minutes.

7 a

Using the calculation of the mean of the differences:


b

Using the table:

The confidence level is .


8 a For , the table gives a -score of

For the two blocks, and , so

confidence interval for is:

i.e.

b As in part a, , and

For the two blocks, and , so

Let the lower bound for the confidence interval of be , then:

, so for all .

The value of is always less than the upper bound of .

So the two intervals always overlap.

MIXED PRACTICE 6

Since must be an integer, the smallest value of is . (Answer D)

2 a

c lies within the confidence interval. Do not reject . There is not enough evidence to suggest a
different mean to .
3

Using the table to find the -score when :

since must be an integer.

6 a

Using the table to find the -score when :

does not lie in the confidence interval. Reject .

There is significant evidence that the results are different.

7 a

A confidence interval for is

b If it is true that , then

, giving

A mean of is not consistent with the confidence interval for in part a.

8 a Sample mean, and

confidence interval for is

Interval is

b .

Test statistic,
Two-tailed test at the level of significance, so critical value is .

is outside the rejection region, so do not reject . There is no significant evidence to suggest that the diet
results in a change in mass.

9 a

b Using the table to find the -score when :

10 a i A confidence interval for the population mean has a probability of including the true
population mean.

So the probability that the interval will not include the value of , or

ii (At least one interval will not include ) (No intervals do not include )

Using your results from part a:

Tip

Alternatively, using the cumulative binomial function on your calculator with :

b i Number of degrees of freedom

Small sample with taken from the sample, so using the -distribution:

confidence interval is

Using the table to find the -score when :

So, substituting in the values, the confidence interval is:

ii is beyond the upper limit of the confidence interval.

So, the new programme seems to have been effective and the mean time seems to have
decreased.
Worked solutions

Cross-topic review exercise 1


1 Let P(Y=1)=a and P(Y=4)=b.

Then, using the fact that the probabilities must have a total sum of 1:

a+b=1⇒a=1−b

Using the fact that E(Y)=3:

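$E(Y)=1\times a+4\times b=3 \Rightarrow a=\tfrac{1}{3},\ b=\tfrac{2}{3}$

$\operatorname{Var}(Y)=E(Y^2)-(E(Y))^2=\tfrac{1}{3}\times 1^2+\tfrac{2}{3}\times 4^2-3^2=2$ (Answer B)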
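The arithmetic above can be checked exactly with rational numbers. This is only a sketch: the variable names are illustrative, and it assumes (as the solution does) that Y takes just the values 1 and 4.

```python
from fractions import Fraction

# Two conditions from the solution: a + b = 1 and 1*a + 4*b = 3.
# Subtracting the first from the second gives 3b = 2.
b = Fraction(3 - 1, 4 - 1)
a = 1 - b

mean = 1 * a + 4 * b                           # E(Y)
variance = 1**2 * a + 4**2 * b - mean**2       # E(Y^2) - (E(Y))^2

print(a, b, mean, variance)
```

Working with `Fraction` avoids any rounding, so the variance comes out as an exact integer.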

2 $Z=\Phi^{-1}(0.975)=1.96$ (3 s.f.)

$2\times Z\dfrac{\sigma}{\sqrt{n}}=2\times Z\dfrac{30}{\sqrt{10}}=37.2$ (3 s.f.) (Answer D)


3 a

b Using the fact that the total area under the graph must be 1:

$\int_0^3 kx^2\,dx+9k(4-3)=9k+9k=1 \Rightarrow k=\tfrac{1}{18}$

c i $\int_0^3 kx^2\,dx=9k(4-3)=0.5$

This means that P(X⩽3)=0.5.

So, the median is 3.

ii Using the fact that for the lower quartile, $q$, $P(X\leqslant q)=0.25$:

$\int_0^q kx^2\,dx=\dfrac{kq^3}{3}=0.25$

$\Rightarrow q=\sqrt[3]{\dfrac{0.75}{k}}=2.38$ (3 s.f.)
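A short numerical check of parts b and c follows. It is a sketch under an assumption: the density is taken to be k x² on [0, 3] and the constant 9k on (3, 4], which is inferred from the integrals in the solution rather than stated here.

```python
from fractions import Fraction

# Assumed density (inferred from the integrals above):
# f(x) = k*x**2 on [0, 3] and f(x) = 9*k on (3, 4].
area_per_k = Fraction(3**3, 3) + 9 * (4 - 3)   # 9 + 9 = 18
k = 1 / area_per_k                              # total area must be 1

# Probability mass on [0, 3]; a value of 1/2 means the median is 3.
mass_below_3 = k * Fraction(3**3, 3)

# Lower quartile: k*q**3/3 = 0.25, so q is the cube root of 0.75/k.
lower_quartile = (0.75 / float(k)) ** (1 / 3)

print(k, mass_below_3, round(lower_quartile, 2))
```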

4 a The expected frequencies are:

                 North Academy   South High School
No Maths         28.8            31.2
Single Maths     42.2            45.8
Further Maths    11.0            12.0

$\nu=(3-1)(2-1)=2$

Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 5% significance level:

$\chi^2_{\text{calc}}=\sum\dfrac{(O_i-E_i)^2}{E_i}=31.6$ (3 s.f.) $>5.991$

There is significant evidence of an association.

b Random, representative sample from each school.

c No. Just because there is dependency does not mean that there is causality.

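5 a Using the fact that the total probability must be equal to 1:

$\int_0^2 (px+q)\,dx=\left[\dfrac{px^2}{2}+qx\right]_0^2=2p+2q=1 \Rightarrow p+q=\tfrac{1}{2}$

b Using the fact that $E(X)=\tfrac{2}{3}$ and substituting in $q=\tfrac{1}{2}-p$ from part a:

$\int_0^2 (px^2+qx)\,dx=\left[\dfrac{px^3}{3}+\dfrac{qx^2}{2}\right]_0^2=\dfrac{8p}{3}+2q=\dfrac{8p}{3}+1-2p=\dfrac{2p}{3}+1=\dfrac{2}{3} \Rightarrow p=-\tfrac{1}{2},\ q=1$

c Calculating $P(X>1)$ and substituting in the values for p and q from part b:

$P(X>1)=\int_1^2 (px+q)\,dx=\left[\dfrac{px^2}{2}+qx\right]_1^2=1-\dfrac{p}{2}-q=\dfrac{1}{4}$

d Using the formula for Var(X) and substituting in the values for p and q from part b:

$\operatorname{Var}(X)=E(X^2)-(E(X))^2=\int_0^2 (px^3+qx^2)\,dx-\left(\tfrac{2}{3}\right)^2=4p+\dfrac{8q}{3}-\dfrac{4}{9}=\dfrac{2}{9}$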
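The four integrals in this question can be verified exactly, since the density px + q is a polynomial. The helper names below are hypothetical, chosen only for this sketch.

```python
from fractions import Fraction

p, q = Fraction(-1, 2), Fraction(1)   # values found in part b

def integral_of_power(n, lo, hi):
    """Exact integral of x**n from lo to hi."""
    return (Fraction(hi) ** (n + 1) - Fraction(lo) ** (n + 1)) / (n + 1)

def moment(r, lo=0, hi=2):
    """Exact integral of x**r * (p*x + q) from lo to hi."""
    return p * integral_of_power(r + 1, lo, hi) + q * integral_of_power(r, lo, hi)

total_probability = moment(0)          # part a: should equal 1
mean = moment(1)                       # part b: E(X)
prob_above_1 = moment(0, lo=1, hi=2)   # part c: P(X > 1)
variance = moment(2) - mean**2         # part d: Var(X)

print(total_probability, mean, prob_above_1, variance)
```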

6 a X=C+T

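$E(X)=E(C)+E(T)=20+100=120$ minutes
$\sigma(X)=\sqrt{\sigma^2(C)+\sigma^2(T)}=\sqrt{5^2+10^2}=11.2$ minutes (3 s.f.)

b $X=C+T$, $Y=200+\dfrac{10}{60}X=200+\dfrac{1}{6}X$

Substituting in the values from part a:

$E(Y)=200+\tfrac{1}{6}E(X)$ = £220
$\sigma(Y)=\sqrt{\tfrac{1}{36}\sigma^2(X)}=\tfrac{1}{6}\sigma(X)$ = £1.86 (3 s.f.)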

7 a Mean $=\dfrac{10\,065}{160}=62.9$ kg (3 s.f.);

Standard deviation=12.3 kg  (3 s.f.)

b i Z=Φ−1(0.99)=2.33 (3 s.f.)

$Z\dfrac{s}{\sqrt{n}}=2.27$ (3 s.f.)
$\bar{x}-Z\dfrac{s}{\sqrt{n}}<\mu<\bar{x}+Z\dfrac{s}{\sqrt{n}} \Rightarrow 60.6<\mu<65.2$ (3 s.f.)

ii 61.7 lies within the confidence interval found in part b i. There is no reason to doubt the claim.

8 a $(X+Y)$ has a normal distribution with mean $5+3=8$ and variance $2^2+5^2=29$.

(X+Y)~ N(8,29).

b $P(X\geqslant 15-Y)=P(X+Y\geqslant 15)=1-P(X+Y<15)=1-\Phi\left(\dfrac{15-8}{\sqrt{29}}\right)=1-\Phi(1.300)=0.0968$
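The same probability can be reproduced with the standard library's normal distribution. The sketch assumes X and Y are independent, which is what adding the variances in part a relies on.

```python
from math import sqrt
from statistics import NormalDist

# X + Y ~ N(8, 29): means add, and variances add under independence.
total = NormalDist(mu=5 + 3, sigma=sqrt(2**2 + 5**2))

prob_at_least_15 = 1 - total.cdf(15)
print(round(prob_at_least_15, 4))
```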
9 a Let X represent the number of patients during one week requiring treatment after a venomous snake
bite, X~ Po(0.5).
$P(X\leqslant 1)=e^{-0.5}+\dfrac{0.5^1e^{-0.5}}{1!}=0.910$ (3 s.f.)
b i
ii Let Y represent the number of patients during an 8-week period requiring treatment after a
venomous snake bite, Y~ Po(4).

$P(Y\geqslant 5)=1-P(Y\leqslant 4)=1-\sum_{n=0}^{4}\dfrac{4^ne^{-4}}{n!}=0.371$ (3 s.f.)

iii Let Z represent the number of patients during a 26-week period requiring treatment after a
venomous snake bite, Z~Po(13).

$P(10<Z<20)=P(Z<20)-P(Z\leqslant 10)=\sum_{n=0}^{19}\dfrac{13^ne^{-13}}{n!}-\sum_{n=0}^{10}\dfrac{13^ne^{-13}}{n!}=0.706$ (3 s.f.)
c Let W represent the number of patients during a 4-week period requiring treatment after a venomous
snake bite, W~Po(2).

$P(W>5)=1-P(W\leqslant 5)=1-\sum_{n=0}^{5}\dfrac{2^ne^{-2}}{n!}=0.0166$ (3 s.f.)
$P(W>6)=1-P(W\leqslant 6)=1-\sum_{n=0}^{6}\dfrac{2^ne^{-2}}{n!}=0.004\,53$ (3 s.f.)

The hospital should have 6 doses of anti-venom after the delivery.
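The Poisson sums in question 9 are straightforward to evaluate directly. The function name `poisson_cdf` is an assumption of this sketch, not something defined in the book; the rates are the ones used above (0.5 per week, scaled to each period).

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(lam**n * exp(-lam) / factorial(n) for n in range(k + 1))

rate_per_week = 0.5

p_at_most_1_in_week = poisson_cdf(1, rate_per_week)                  # part a
p_at_least_5_in_8_weeks = 1 - poisson_cdf(4, 8 * rate_per_week)      # part b ii
p_between_10_and_20 = poisson_cdf(19, 26 * rate_per_week) - poisson_cdf(10, 26 * rate_per_week)  # part b iii
p_more_than_5_in_4_weeks = 1 - poisson_cdf(5, 4 * rate_per_week)     # part c
p_more_than_6_in_4_weeks = 1 - poisson_cdf(6, 4 * rate_per_week)     # part c

print(p_at_most_1_in_week, p_at_least_5_in_8_weeks, p_between_10_and_20)
print(p_more_than_5_in_4_weeks, p_more_than_6_in_4_weeks)
```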

10 a
              AP     AV
Baseball      275    50
Basketball    475    75
Soccer        350    25

b The expected frequencies are:

              AP     AV
Baseball      286    39.0
Basketball    484    66.0
Soccer        330    45.0

ν=(3−1)(2−1)=2

Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 1% significance level:

$\chi^2_{\text{calc}}=\sum\dfrac{(O_i-E_i)^2}{E_i}=15.0$ (3 s.f.) $>9.210$

There is significant evidence of an association between coping strategy and the sport involved.
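The expected frequencies and the test statistic for question 10 can be rebuilt from the observed table. This is a minimal sketch with illustrative names; the critical value 9.210 is the one quoted in the solution, not computed here.

```python
# Observed counts (AP, AV) for each sport, from part a.
observed = {"Baseball": (275, 50), "Basketball": (475, 75), "Soccer": (350, 25)}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

expected = {}
chi_squared = 0.0
for sport, row in observed.items():
    row_total = sum(row)
    # Expected count = row total * column total / grand total.
    expected[sport] = tuple(row_total * c / grand_total for c in col_totals)
    chi_squared += sum((o - e) ** 2 / e for o, e in zip(row, expected[sport]))

degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)
critical_value = 9.210   # 1% level, 2 degrees of freedom (quoted above)

print(expected)
print(round(chi_squared, 1), degrees_of_freedom, chi_squared > critical_value)
```

The largest single contribution comes from the Soccer/AV cell, which matches the comment in part c.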

c Soccer officials are far less likely than expected to use an AV coping style. Baseball officials are far
more likely than expected to use an AV coping style.
11 a Using the fact that the summed probabilities must equal 1:

$a+b=1 \Rightarrow b=1-a$
$E(X)=b=1-a$

b $\operatorname{Var}(X)=E(X^2)-(E(X))^2=b-(E(X))^2=1-a-(1-a)^2=a-a^2$

c Var(X)=f(a)=a−a2

Differentiating with respect to a and setting equal to 0 to find the maximum value:

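$f'(a)=1-2a=0 \Rightarrow a=\tfrac{1}{2} \Rightarrow \operatorname{Var}(X)=\tfrac{1}{4}$

12 a $\bar{X}=\dfrac{12.7+13.3}{2}=13$

b $Z=\Phi^{-1}(0.95)=1.65$ (3 s.f.)

$Z\dfrac{\sigma}{\sqrt{n}}=0.3 \Rightarrow n=\left(\dfrac{Z\sigma}{0.3}\right)^2=636$ (3 s.f.)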
13 a i Let X represent the number of diamonds found per 2 m³, X ~ Po(5):

$P(X=2)=\dfrac{5^2}{2!}\times e^{-5}=0.0842$ (3 s.f.)

ii Let Y represent the number of diamonds found per 1 m³, Y ~ Po(2.5):

$[P(Y=1)]^2=\left[\dfrac{2.5^1}{1!}\times e^{-2.5}\right]^2=0.0421$ (3 s.f.)

b i For a one-tailed test at the 10% significance level, and using 1 m³ as your standard unit of volume:

H0: λ=2.5, H1: λ>2.5.

Tip

Alternatively, you could use 2 m³ as your standard unit of volume with
H0: λ=5 and H1: λ>5.

ii Let X represent the number of diamonds found per 2 m³, X ~ Po(5):

$P(X\geqslant 7)=1-P(X\leqslant 6)=1-\sum_{n=0}^{6}\dfrac{5^ne^{-5}}{n!}=1-0.7621\ldots=0.238$ (3 s.f.)

$P(X\geqslant 7)>10\%$, so do not reject H0. There is no significant evidence at the 10% level to suggest that
the number of diamonds found is greater than 5 per 2 m³ (or 2.5 per 1 m³), i.e. that the new mine
will be economically viable.

iii For a type I error to be made, the true null hypothesis H0: λ=5 per 2 m³ is rejected.

This will occur at the 10% level of significance in cases where the number of diamonds found, r, is
such that P(X>r)<0.10.

First you need to find the least possible integer value of r:

You need to find the least value of r such that P(X>r)<0.10.

The table shows the probabilities $P(X>r)=1-P(X\leqslant r)=1-e^{-5}\sum_{n=0}^{r}\dfrac{5^n}{n!}$ for values of r from 5 to 8:

Probability                               Result
P(X>5) = 1 − P(X⩽5) = 0.384 039 349       >10%, so do not reject H0.
P(X>6) = 1 − P(X⩽6) = 0.237 865 41        >10%, so do not reject H0.
P(X>7) = 1 − P(X⩽7) = 0.133 371 678       >10%, so do not reject H0.
P(X>8) = 1 − P(X⩽8) = 0.068 093 639       <10%, so reject H0.

Least value of r for which the true H0 is rejected is r=8.

P(type I error)= P(X>8)=0.0681 or 6.81%.(3 s.f.)
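The search for the least r can be written as a short loop. The sketch below follows the rule stated above (reject when P(X > r) < 0.10 under Po(5)); the function name is illustrative.

```python
from math import exp, factorial

def poisson_tail(r, lam):
    """P(X > r) for X ~ Po(lam)."""
    return 1 - sum(lam**n * exp(-lam) / factorial(n) for n in range(r + 1))

lam, significance = 5, 0.10

# Increase r until the upper tail drops below the significance level.
r = 0
while poisson_tail(r, lam) >= significance:
    r += 1

type_1_error = poisson_tail(r, lam)
print(r, round(type_1_error, 4))
```

Because the Poisson distribution is discrete, the actual probability of a type I error is below the nominal 10% level.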

iv To reduce the likelihood of making a type II error. Making such an error would result in a lost
opportunity for the owner, as they would fail to find significant evidence that the mine is
economically viable when, in fact, it is.
14 a Phone calls are independent of each other. There is a constant average rate of phone calls.

b For example: The same customer might call back, breaking independence. The rate during office
hours might be different from the rate during the night.

c $\sigma(X)=\sqrt{\lambda}=\sqrt{4.5}=2.12$ (3 s.f.)

d Let X represent the number of phone calls answered in a 2-hour shift, X ~ Po(2×4.5=9).

$P(X<10)=\sum_{n=0}^{9}\dfrac{9^ne^{-9}}{n!}=0.587$ (3 s.f.)

e Let Y represent the number of phone calls answered in an a-hour shift, Y ~ Po(a×4.5).

$P(Y=0)=e^{-4.5a}=0.5 \Rightarrow a=\dfrac{-\ln(0.5)}{4.5}=0.154$ hours (3 s.f.)

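15 a Using the fact that $\int_0^1 f(x)\,dx=1$:

$\int_0^1 (ax-x^3)\,dx=\dfrac{a}{2}-\dfrac{1}{4}=1 \Rightarrow a=\dfrac{5}{2}$

b $E(X)=\int_0^1 x f(x)\,dx=\int_0^1 (ax^2-x^4)\,dx=\dfrac{a}{3}-\dfrac{1}{5}$
Substituting $a=\tfrac{5}{2}$ from part a:
$E(X)=\dfrac{5}{6}-\dfrac{1}{5}=\dfrac{19}{30}$

c $E(30X+2)=30E(X)+2=21$

d $E\left(\dfrac{1}{X}\right)=\int_0^1 \dfrac{1}{x}f(x)\,dx=\int_0^1 (a-x^2)\,dx=a-\dfrac{1}{3}$
Substituting $a=\tfrac{5}{2}$ from part a:

$E\left(\dfrac{1}{X}\right)=\dfrac{5}{2}-\dfrac{1}{3}=\dfrac{13}{6}$

e Using the fact that, for median m, $\int_0^m f(x)\,dx=\tfrac{1}{2}$:

$\int_0^m (ax-x^3)\,dx=\dfrac{am^2}{2}-\dfrac{m^4}{4}=\dfrac{1}{2}$
Let $n=m^2$: $n^2-5n+2=0 \Rightarrow n=\dfrac{5}{2}\pm\sqrt{\dfrac{17}{4}}\approx 4.56,\ 0.438 \Rightarrow m=\pm 0.662,\ \pm 2.14$ (3 s.f.)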

Using the fact that 0<m<1:

m=0.662(3 s.f.) is the only possible solution.
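The median calculation amounts to solving a quadratic in n = m² and keeping the root that lands in (0, 1). A minimal sketch, with illustrative names:

```python
from math import sqrt

a = 5 / 2   # from part a

# a*m**2/2 - m**4/4 = 1/2 becomes n**2 - 2*a*n + 2 = 0 with n = m**2.
disc = sqrt(a**2 - 2)
candidates_n = [a - disc, a + disc]

# The density is defined on 0 < x < 1, so only a root in that range is valid.
medians = [sqrt(n) for n in candidates_n if 0 < sqrt(n) < 1]
median = medians[0]

# Check: the CDF evaluated at the median should be one half.
cdf_at_median = a * median**2 / 2 - median**4 / 4
print(round(median, 3), cdf_at_median)
```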


16 a Let the number of cans containing less than 330 ml be X, then X~ B(25,0.5).

$P(X=12)=\dbinom{25}{12}\times 0.5^{25}=0.155$ (3 s.f.)
b i Rejecting the batch when the mean contents are, in fact, 330 ml.

$P(\text{type I error})=P(X\geqslant 20)=\sum_{n=20}^{25}\dbinom{25}{n}\times 0.5^{25}=0.00204$ (3 s.f.)
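Both binomial probabilities in question 16 can be checked with `math.comb`. The sketch assumes, as the solution does, that the batch is rejected when 20 or more of the 25 cans are under 330 ml.

```python
from math import comb

n, p = 25, 0.5

def binomial_pmf(k):
    """P(X = k) for X ~ B(25, 0.5)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_12 = binomial_pmf(12)                                     # part a
p_type_1_error = sum(binomial_pmf(k) for k in range(20, n + 1))     # part b i

print(round(p_exactly_12, 3), round(p_type_1_error, 5))
```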


ii
c i A 95% confidence interval is $328\pm Z\times\dfrac{\sigma}{\sqrt{n}}$, where $Z=\Phi^{-1}(0.975)=1.960$

$328\pm 1.960\times\dfrac{4}{\sqrt{25}}$, giving $326.4<\mu<329.6$ (1 d.p.)

ii H0: μ=330 and H1: μ<330

The hypothesised mean volume of 330 ml lies outside the 95% confidence interval found in
part c i, so Philip will reject H0. There is evidence to suggest that the cans come from a population
whose contents are, on average, less than 330 ml.

Significance level = P(μ>329.6…)=2.5%


17 a P(X>3)=0.13+0.07+0.15=0.35

b The total probability is the product of the probabilities of each individual event happening multiplied
by two, since it is not specified who is borrowing more/fewer than 3 books.

2×P(X>3)×P(X<3)=2×0.35×0.45=0.315

c E(X)=1×0.19+2×0.26+3×0.20+4×0.13+5×0.07+6×0.15=3.08

$\operatorname{Var}(X)=1\times0.19+4\times0.26+9\times0.20+16\times0.13+25\times0.07+36\times0.15-3.08^2=2.77$ (3 s.f.)

d No probability of 0 books borrowed and no probability of more than 6 books borrowed.

e i 10×E(X)=30.8 pence

ii $10\times\sigma(X)\approx 10\times\sqrt{2.774}=16.7$ pence (3 s.f.)
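The summary values in question 17 follow directly from the probability table. In the sketch below the table is reconstructed from the products in part c, and the 10 pence multiplier is the one used in part e; the variable names are illustrative.

```python
from math import sqrt

# P(X = x) for the number of books borrowed, read off the sums in part c.
pmf = {1: 0.19, 2: 0.26, 3: 0.20, 4: 0.13, 5: 0.07, 6: 0.15}

p_more_than_3 = sum(p for x, p in pmf.items() if x > 3)             # part a
p_fewer_than_3 = sum(p for x, p in pmf.items() if x < 3)
p_one_more_one_fewer = 2 * p_more_than_3 * p_fewer_than_3           # part b

mean = sum(x * p for x, p in pmf.items())                           # part c
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2

mean_pence = 10 * mean                                              # part e i
sd_pence = 10 * sqrt(variance)                                      # part e ii

print(p_more_than_3, p_one_more_one_fewer, mean, variance, mean_pence, sd_pence)
```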


Worked solutions

Cross-topic review exercise 2


1 a X~exp(3)

b $f(x)=3e^{-3x}$, so $P(1<X<2)=\int_1^2 3e^{-3x}\,dx=\left[-e^{-3x}\right]_1^2=e^{-3}-e^{-6}=0.0473$ (3 s.f.)

c If X~exp(λ), then the standard deviation of X is $\dfrac{1}{\lambda}$, which in this case is $\dfrac{1}{3}$.
2 a Calculating the expected frequencies:

Rows: greatest rate of income tax paid. Columns: age when leaving education (years).

           16 or less   17 or 18   19 or more
Zero       29.445       3.9        5.655
Basic      98.905       13.1       18.995
Higher     22.65        3          4.35

Some of the expected frequencies are less than 5, so merging the 17 or 18 and the 19 or more columns:

           16 or less   17 or more
Zero       29.445       9.555
Basic      98.905       32.095
Higher     22.65        7.35

$\nu=2\times 1=2$

Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 5% significance level:

$\chi^2_{\text{calc}}=\sum\dfrac{(O_i-E_i)^2}{E_i}=7.05$ (3 s.f.) $>5.991$

There is evidence of an association between age when leaving education and greatest rate of income
tax paid.

b This belief is supported at the 5% level of significance.

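3 a Using the fact that for this probability density function, f(x), $\int_0^{0.1} f(x)\,dx=1$:

$\int_0^{0.1} k\,dx=0.1k=1 \Rightarrow k=10$

b Substituting k=10 into the formula for probability:

$P(X>0.03)=\int_{0.03}^{0.1} 10\,dx=10\times 0.07=0.7$

c i $E(X)=\int_0^{0.1} xf(x)\,dx=\int_0^{0.1} 10x\,dx=\left[5x^2\right]_0^{0.1}=0.05=\dfrac{1}{20}$

ii $E(X^2)=\int_0^{0.1} x^2 f(x)\,dx=\int_0^{0.1} 10x^2\,dx=\left[\dfrac{10}{3}x^3\right]_0^{0.1}=\dfrac{1}{300}$

iii $\sigma(X)=\sqrt{E(X^2)-(E(X))^2}=\sqrt{\dfrac{1}{300}-\dfrac{1}{20^2}}=0.0289$ (3 s.f.)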


4 a H0: There is no association between students' first time performances and their ages.

H1:  There is an association between students' first time performances and their ages.
Expected frequencies:

Pass Fail

17–18 19.2 28.8

19–30 6.4 9.6


31–39 18 27

40–60 4.4 6.6

The frequency in the Pass category of the 40–60 age group is less than 5, so combining the 31–39
and the 40–60 age groups:

Observed frequencies:

Pass Fail

17–18 28 20

19–30 2 14
31–60 18 38

Expected frequencies:

Pass Fail
17–18 19.2 28.8

19–30 6.4 9.6


31–60 22.4 33.6

Calculating the χ2 value:

$\chi^2_{\text{calc}}=\sum\dfrac{(O_i-E_i)^2}{E_i}=\dfrac{(28-19.2)^2}{19.2}+\ldots+\dfrac{(38-33.6)^2}{33.6}=13.2$ (3 s.f.)

Number of degrees of freedom, $\nu=(3-1)(2-1)=2$.

Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table:

$\chi^2_{\text{calc}}=13.2>9.210$ (critical value from the table at the 1% significance level), so reject H0. There is
significant evidence at the 1% significance level to support Julie's belief.

b More students than expected in the age group 17−18 pass their test first time.

5 a $\sigma(Y)=\sqrt{\dfrac{n^2-1}{12}}=1.12$ (3 s.f.)

b i $E(Z)=6+2E(X)=6+n+1=11$

ii $\operatorname{Var}(Z)=2^2\operatorname{Var}(X)=\dfrac{n^2-1}{3}=5$
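6 a X ~ U(n), taking the values x=1,2,3,…,n.

n−10=3−1, so n=12.

$E(X)=\dfrac{12+1}{2}=\dfrac{13}{2}$

b X ~ U(12), so $P(X=x)=\tfrac{1}{12}$, and $P(X\leqslant 3)=\tfrac{3}{12}=\tfrac{1}{4}$.

Let the random variable Y be the number of these three observations that have a value of 3 or less,
so $Y\sim B\left(3,\tfrac{1}{4}\right)$.

$P(Y<2)=P(Y=0)+P(Y=1)=\dbinom{3}{0}\left(\tfrac{1}{4}\right)^0\left(\tfrac{3}{4}\right)^3+\dbinom{3}{1}\left(\tfrac{1}{4}\right)^1\left(\tfrac{3}{4}\right)^2=\dfrac{27}{32}=0.844$ (3 s.f.)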
7 a $Y=5+0.02X$
$E(Y)=5+0.02E(X)=10$
$\operatorname{Var}(Y)=0.02^2\operatorname{Var}(X)=1$

b $Z=25-Y$

$E(Z)=25-E(Y)=15$
$\operatorname{Var}(Z)=\operatorname{Var}(Y)=1$
8 a H0: μ=20 minutes

H1: μ≠20 minutes

$s=4.57$ (3 s.f.), $\bar{X}=22.6$, $\nu=7$
$T=\dfrac{\bar{X}-20}{s/\sqrt{8}}=1.63$ (3 s.f.)

Use the table to find the critical value. For a two-tailed test you need to look at the 0.95 column.

Comparing your calculated t-value with the critical value:

|T|=1.63<1.895

There is insufficient evidence to reject the null hypothesis.

b $Z=\Phi^{-1}(0.95)=1.65$ (3 s.f.), $\sigma=4.6$, $n=100$
$\mu-Z\dfrac{\sigma}{\sqrt{n}}<20 \Rightarrow \mu<20.76$ (2 d.p.)

Any value less than or equal to 20.75(2 d.p.) would lead to rejection of Rajul's claim.
9 a The population is large. Most marks would be concentrated around the mean, and the further a mark
is from the mean, the less likely it is to occur.

b The null hypothesis 'The school is not doing better than the rest of the population (μ=60)’ will be
rejected if the 10 students' average is more than 70%.

Significance level $=P(X>70)=1-P(X\leqslant 70)=1-\Phi\left(\dfrac{70-60}{15/\sqrt{10}}\right)=1.75\%$ (3 s.f.)

c Now test H0: The school is not doing better than the rest of the population (μ=60) against H1: μ=65,
and find the probability that the 10 students' average is not more than 70.

$P(\text{type II error})=P(X\leqslant 70\mid\mu=65)=\Phi\left(\dfrac{70-65}{15/\sqrt{10}}\right)=0.8541$

Power of test $=1-P(\text{type II error})=0.146$ or 14.6% (3 s.f.)
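The significance level, type II error probability and power can all be read off two normal distributions for the sample mean. This sketch uses the values from the solution (σ = 15, n = 10, rejection when the average exceeds 70); names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_null, mu_alternative = 60, 65
sigma, n, cutoff = 15, 10, 70

standard_error = sigma / sqrt(n)

# Distribution of the sample mean under each hypothesis.
under_null = NormalDist(mu_null, standard_error)
under_alternative = NormalDist(mu_alternative, standard_error)

significance_level = 1 - under_null.cdf(cutoff)      # P(reject H0 | H0 true)
p_type_2_error = under_alternative.cdf(cutoff)       # P(do not reject H0 | mu = 65)
power = 1 - p_type_2_error

print(round(significance_level, 4), round(p_type_2_error, 4), round(power, 3))
```

Changing `mu_alternative` to a value nearer 70 shows the power rising, which is the point made in part d.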

d The power of the test would increase. The test would be more likely to identify the difference from
60%.

In context, as the true mean gets closer to 70 (from below), it becomes more and more difficult to
believe that the school is not doing better than the rest of the population.
10 a $e^{-10\lambda}=0.25 \Rightarrow \lambda=0.1\ln 4$

b Using the value of λ from part a:

$P(T>20)=e^{-20\lambda}=0.0625$

c $E(T)=\dfrac{1}{\lambda}=7.21$ (3 s.f.)
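A quick numerical check of question 10, assuming T is exponentially distributed with rate λ so that the probability of waiting longer than t is e^(−λt):

```python
from math import exp, log

# Part a: exp(-10 * lam) = 0.25, so lam = ln(4) / 10.
lam = log(4) / 10

survival_20 = exp(-20 * lam)   # part b: equals 0.25 squared
mean_wait = 1 / lam            # part c: mean of an exponential distribution

print(round(lam, 4), survival_20, round(mean_wait, 2))
```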

11 Assume the population to be normally distributed.

H0: μ=79

H1: μ>79

$s=5.58$ (3 s.f.), $\bar{X}=82$, $\nu=12-1=11$
$T=\dfrac{\bar{X}-79}{s/\sqrt{12}}=1.86$ (3 s.f.)

Use the table to find the critical value at the 5% significance level.

Comparing your calculated t-value with the critical value:


|T|=1.86>1.796

Reject H0. There is sufficient evidence to support Judith's belief.
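The test statistic in question 11 can be recomputed from the summary values. This is a sketch only: the raw data are not reproduced here, so the rounded values of the sample mean and s are used, and the critical value 1.796 is taken from the solution rather than computed.

```python
from math import sqrt

sample_mean, s, n = 82, 5.58, 12   # summary values quoted above (s to 3 s.f.)
mu_null = 79
critical_value = 1.796             # one-tailed, 5% level, 11 degrees of freedom

t_statistic = (sample_mean - mu_null) / (s / sqrt(n))
degrees_of_freedom = n - 1
reject_null = t_statistic > critical_value

print(round(t_statistic, 2), degrees_of_freedom, reject_null)
```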

12 H0: the new drug is not effective in the prevention of sickness in holiday-makers.

H1:  the new drug is effective in the prevention of sickness in holiday-makers.

Calculating the expected frequencies:

Sickness No sickness

Drug taken 28 52
No drug taken 7 13

$n=100$, $\nu=(2-1)(2-1)=1$

Using Yates' correction:

$\chi^2_{\text{Yates}}=\sum\dfrac{(|O_i-E_i|-0.5)^2}{E_i}$

Comparing the $\chi^2_{\text{Yates}}$ value with the critical value from the table at the 5% significance level:

$\chi^2_{\text{Yates}}=3.37$ (3 s.f.) $<3.841$

Do not reject H0. There is no evidence at the 5% significance level to support the claim that the drug is
effective against the sickness.
13 a Using the fact that E(X)= Var(X):

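$\dfrac{n+1}{2}=\dfrac{n^2-1}{12} \Rightarrow n^2-6n-7=0 \Rightarrow n=7$

b $\operatorname{Var}(2X)=2^2\operatorname{Var}(X)=4\operatorname{Var}(X)$

Since $E(X)=\operatorname{Var}(X)$ from the question:

$4\operatorname{Var}(X)=4E(X)=2E(2X)\neq E(2X)$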

14 Assume that the distances follow a normal distribution.

H0: μ=190 metres

H1: μ≠190 metres

$s=11.7$ (3 s.f.), $\bar{X}=184$, $\nu=10-1=9$
$T=\dfrac{\bar{X}-190}{s/\sqrt{10}}=-1.62$ (3 s.f.)

Use the table to find the critical value. For a two-tailed test you need to look at the 0.99 column.

Comparing your calculated t-value with the critical value:

|T|=1.62 (3 s.f.)<2.821

Do not reject H0. There is insufficient evidence to doubt Lorraine's belief that there has been no
change.
15 a It must be a random sample.

b H0: μ=44.1 mph

H1: μ<44.1 mph

$s=9.35$ (3 s.f.), $\bar{X}=43.3$, $\nu=100-1=99$
$T=\dfrac{\bar{X}-44.1}{s/\sqrt{100}}=-2.71$ (3 s.f.)

Use the table to find the critical value at the 1% significance level.
Comparing your calculated t-value with the critical value:

|T|=2.71 (3 s.f.)>2.364

Reject H0. There is significant evidence that the mean speed has reduced.
c i Concluding that the mean speed has reduced when in fact it has not.

ii Concluding that the mean speed is still 44.1 when in fact it has reduced.

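16 a $E(X)=\dfrac{1}{4}+\dfrac{2}{2}+\dfrac{n}{4}=\dfrac{5+n}{4}$

$4E\left(\dfrac{1}{X}\right)=4\left(\dfrac{1}{4\times 1}+\dfrac{1}{2\times 2}+\dfrac{1}{4\times n}\right)=2+\dfrac{1}{n}$

Setting $E(X)=4E\left(\dfrac{1}{X}\right)$ and rearranging so that one side is 0:

$n^2-3n-4=0 \Rightarrow n=-1$ or $4$

b For n=4:

$\operatorname{Var}(X)=E(X^2)-(E(X))^2=\dfrac{1^2}{4}+\dfrac{2^2}{2}+\dfrac{4^2}{4}-\left(\dfrac{9}{4}\right)^2=1.1875$

$\operatorname{Var}\left(\dfrac{1}{X}\right)=E\left(\dfrac{1}{X^2}\right)-\left(E\left(\dfrac{1}{X}\right)\right)^2=\dfrac{1}{4\times 1^2}+\dfrac{1}{2\times 2^2}+\dfrac{1}{4\times 4^2}-0.5625^2=0.0742$

$\dfrac{\operatorname{Var}(X)}{\operatorname{Var}(1/X)}=16$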
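The ratio of 16 is exact, which rational arithmetic confirms. The sketch assumes the distribution implied by the sums above: X takes the values 1, 2 and n with probabilities 1/4, 1/2 and 1/4.

```python
from fractions import Fraction

def distribution(n):
    """Assumed probability table: values 1, 2, n with probabilities 1/4, 1/2, 1/4."""
    return {1: Fraction(1, 4), 2: Fraction(1, 2), n: Fraction(1, 4)}

def expectation(dist, g):
    """E(g(X)) for a discrete distribution given as {value: probability}."""
    return sum(prob * g(x) for x, prob in dist.items())

dist = distribution(4)

mean_x = expectation(dist, lambda x: Fraction(x))
mean_inv = expectation(dist, lambda x: Fraction(1, x))

var_x = expectation(dist, lambda x: Fraction(x) ** 2) - mean_x**2
var_inv = expectation(dist, lambda x: Fraction(1, x) ** 2) - mean_inv**2

print(mean_x, mean_inv, var_x, var_inv, var_x / var_inv)
```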
17 a H0: λ=70, H1: λ>70

b X represents the number of cars passing the school gates in a 10-minute interval, X~Po(70).

$P(X\geqslant x)=\sum_{n=x}^{\infty}\dfrac{70^ne^{-70}}{n!}\leqslant 0.1 \Rightarrow X\geqslant 82$

c Reject H0. There is sufficient evidence that the mean number of cars has increased.

d The mean number of cars passing is 12 per minute, so now X~ Po(120).

$P(X\leqslant 81)=\sum_{n=0}^{81}\dfrac{120^ne^{-120}}{n!}=1.01\times 10^{-4}$ (3 s.f.)
18 a $\lambda_0=3\times 16=48$

$P(X\leqslant 35)=\sum_{n=0}^{35}\dfrac{48^ne^{-48}}{n!}=0.0309$ (3 s.f.)
b
c The number of visitor groups is now 12 per hour, so if X represents the number of groups arriving in a
3-hour period, X ~ Po(3×12=36).

$P(X>35)=\sum_{n=36}^{\infty}\dfrac{36^ne^{-36}}{n!}\approx 0.522$
Power $=1-P(X>35)=0.478$ (3 s.f.)

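19 a Interval is $\bar{X}-Z\times\dfrac{s}{\sqrt{n}}<\mu_G<\bar{X}+Z\times\dfrac{s}{\sqrt{n}}$

$2.0001-1.960\times\dfrac{10^{-4}}{\sqrt{10}}<\mu_G<2.0001+1.960\times\dfrac{10^{-4}}{\sqrt{10}}$

$2.000\,038<\mu_G<2.000\,162$ (7 s.f.)

b Lower bound $=10\,000(2.000\,038-2)=0.380$ (3 s.f.)

Upper bound $=10\,000(2.000\,162-2)=1.62$ (3 s.f.)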

Interval is 0.380<μY<1.62
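Both intervals in question 19 can be reproduced in a few lines. The sketch assumes the values read from the working above (sample mean 2.0001, s = 10⁻⁴, n = 10) and the transformation Y = 10 000(G − 2) implied by part b; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 2.0001, 1e-4, 10

# Two-sided 95% interval: 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)
half_width = z * s / sqrt(n)

g_interval = (x_bar - half_width, x_bar + half_width)

# Part b: apply the linear transformation to each end of the interval.
y_interval = tuple(10_000 * (g - 2) for g in g_interval)

print(g_interval)
print(y_interval)
```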

c No; G=2 lies outside the confidence interval.


d i Let the number of confidence intervals that contain the true mean be X, then X~ B(3,0.95).

$P(X\geqslant 2)=\sum_{x=2}^{3}\dbinom{3}{x}\times 0.95^x\times 0.05^{3-x}=\dfrac{3971}{4000}$ or 0.993 (3 s.f.)

ii Let the number of confidence intervals that are above the true mean be Y, then Y~B(3,0.025).

$P(Y=3)=0.025^3=\dfrac{1}{64\,000}$ or 0.000 015 6 (3 s.f.)


Acknowledgements
The authors and publishers acknowledge the following sources of copyright material and are grateful for
the permissions granted. While every effort has been made, it has not always been possible to identify the
sources of all the material used, or to trace all copyright holders. If any omissions are brought to our
notice, we will be happy to include the appropriate acknowledgements on reprinting.

Thanks to the following for permission to reproduce images:

Cover image: Peter Medlicott Sola/Getty Images


Back cover: Fabian Oefner www.fabianoefner.com

Serdarbayraktar/Getty Images; Chris Hepburn/Getty Images; PM Images/Getty Images;


aaaaimages/Getty Images; John Lund/Getty Images; espy3008/Getty Images

AQA material is reproduced by permission of AQA.


University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025,
India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title:
www.cambridge.org/9781316644508 (Paperback)
www.cambridge.org/9781316644324 (Paperback with Cambridge Elevate edition)
www.cambridge.org/9781316644584 (Cambridge Elevate edition 2 years)
www.cambridge.org/9781316644614 (Cambridge Elevate edition 1 year School Site
Licence)
© Cambridge University Press 2018
This publication is in copyright. Subject to statutory exception and to the provisions of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published 2018
Printed in the United Kingdom by Latimer Trend.
A catalogue record for the print publication is available from the British Library
ISBN 978-1-316-64450-8 Paperback
ISBN 978-1-316-64432-4 Paperback with Cambridge Elevate edition

Additional resources for this publication at www.cambridge.org/education


Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication, and
does not guarantee that any content on such websites is, or will remain, accurate or
appropriate.

NOTICE TO TEACHERS IN THE UK

It is illegal to reproduce any part of this work in material form (including photocopying
and electronic storage) except under the following circumstances:
(i) where you are abiding by a licence granted to your school or institution by the
Copyright Licensing Agency;
(ii) where no such licence exists, or where you wish to exceed the terms of a licence,
and you have gained the written permission of Cambridge University Press;
(iii) where you are allowed to reproduce without permission under the provisions of
Chapter 3 of the Copyright, Designs and Patents Act 1988, which covers, for
example, the reproduction of short passages within certain types of educational
anthology and reproduction for the purposes of setting examination questions.

Message from AQA

This textbook has been approved by AQA for use with our qualification. This
means that we have checked that it broadly covers the specification and we are
satisfied with the overall quality. Full details of our approval process can be found
on our website.
We approve textbooks because we know how important it is for teachers and
students to have the right resources to support their teaching and learning.
However, the publisher is ultimately responsible for the editorial control and
quality of this book.
Please note that when teaching the A/AS Level Further Mathematics (7366, 7367)
course, you must refer to AQA’s specification as your definitive source of
information. While this book has been written to match the specification, it cannot
provide complete coverage of every aspect of the course.
A wide range of other useful resources can be found on the relevant subject pages
of our website: www.aqa.org.uk

IMPORTANT NOTE AQA has not approved any Cambridge Elevate content.
