1.3 Correlation & Regression
© 2024 Save My Exams, Ltd. Get more and ace your exams at savemyexams.com 1
Easy Questions
1 (a) Explain what is measured by the Pearson product moment correlation coefficient.
(2 marks)
(b) The product moment correlation coefficient between two variables is denoted r. Five
different values of r, rounded to four decimal places, are given below:
$r_1 = 0.0000$
$r_2 = 0.9812$
$r_3 = -1.0000$
$r_4 = 0.7652$
$r_5 = -0.7098$
Match each of the following four scatter graphs, showing observations from different
bivariate data sets, to one of the values of r given above. You should use each given
value of r no more than once.
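The PMCC measures how closely bivariate data follow a straight line, from -1 (exact decreasing line) through 0 (no linear trend) to 1 (exact increasing line). A minimal sketch, where the helper name `pmcc` and the toy data are hypothetical and chosen only to show the extreme values:

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical toy data, not taken from the question
xs = [1, 2, 3, 4, 5]
r_up = pmcc(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = pmcc(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line
r_none = pmcc(xs, [1, 2, 3, 2, 1])           # rises then falls: no linear trend
print(r_up, r_down, round(r_none, 10))
```

Note that r = 0 only rules out a linear relationship; the third data set has an obvious non-linear pattern.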
(4 marks)
(c) Sketch a scatter graph for the remaining value of r from the list above.
(2 marks)
2 A teacher is interested in the relationship between the number of hours her students
spend on a phone per day and the number of hours they spend on a computer. She
takes a sample of nine students and records the results in the table below.
Hours spent on a phone per day       7.6   7     8.9   3     3     7.5   2.1   1.3   5.8
Hours spent on a computer per day    1.7   1.1   0.7   5.8   5.2   1.7   6.9   7.1   3.3
(5 marks)
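The wording of this 5-mark task did not survive extraction. If it asks for the product moment correlation coefficient (an assumption), a minimal sketch using the same S-notation as Question 6 is:

```python
from math import sqrt

phone = [7.6, 7, 8.9, 3, 3, 7.5, 2.1, 1.3, 5.8]
computer = [1.7, 1.1, 0.7, 5.8, 5.2, 1.7, 6.9, 7.1, 3.3]
n = len(phone)

# S_pp, S_cc and S_pc from the usual summary-statistic formulas
s_pp = sum(p * p for p in phone) - sum(phone) ** 2 / n
s_cc = sum(c * c for c in computer) - sum(computer) ** 2 / n
s_pc = sum(p * c for p, c in zip(phone, computer)) - sum(phone) * sum(computer) / n
r = s_pc / sqrt(s_pp * s_cc)
print(f"S_pp = {s_pp:.4f}, S_cc = {s_cc:.4f}, S_pc = {s_pc:.4f}, r = {r:.4f}")
```

The result is close to -1: students who spend more time on a phone tend to spend less time on a computer in this sample.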
3 (a) The table below shows data for a sample of 8 people comparing the maximum number
of pull-ups they are able to complete, x, with the maximum number of press-ups, y.
(4 marks)
(ii) Explain the purpose of regression lines and how they may be used.
(4 marks)
4 (a) A class is asked to collect a sample of bivariate data. They collect data on the shoe size, S,
and the arm span, A cm, of 20 randomly selected boys from the class.
(1 mark)
(b) The class plot the data in a scatter diagram and find the equation of the regression line
of A on S to be $A = 4.5S + 133$. These are both plotted in the diagram below.
(iii) Explain how the sign of the coefficient of S in the equation is related to the
correlation shown in the scatter diagram.
(3 marks)
5 (a) The following table shows data comparing the length of time a cake was baked for, t
minutes, with the mass of the cake once it had cooled, m grams. Each cake in the sample
weighed the same before being baked.
t 37 35 36 31 30 28 36
m 825 868 812 943 947 997 837
State which variable is the explanatory (independent) variable and which is the response
(dependent) variable.
(1 mark)
(i) Use the regression line to estimate the mass of a cake if it is baked for 32 minutes.
(2 marks)
(c) (i) Use the regression line to estimate the mass of a cake if it is baked for 80 minutes.
(2 marks)
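Part (b), which presumably supplies the regression line, is missing from this extract. Assuming it is the least squares line of m on t fitted to the seven cakes in the table, a minimal sketch that fits the line and evaluates both estimates (the helper `predict` is illustrative):

```python
t = [37, 35, 36, 31, 30, 28, 36]
m = [825, 868, 812, 943, 947, 997, 837]
n = len(t)

s_tt = sum(x * x for x in t) - sum(t) ** 2 / n
s_tm = sum(x * y for x, y in zip(t, m)) - sum(t) * sum(m) / n
b = s_tm / s_tt                    # gradient of the line of m on t
a = sum(m) / n - b * sum(t) / n    # the line passes through (t-bar, m-bar)


def predict(minutes):
    return a + b * minutes


print(f"m = {a:.1f} + ({b:.3f})t")
print(f"32 min: {predict(32):.1f} g")   # inside the observed range, 28 to 36 min
print(f"80 min: {predict(80):.1f} g")   # far outside the observed range
```

The 80-minute estimate comes out negative, a symptom of extrapolating well beyond the baking times in the data.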
6 (a) Eight food critics are asked to give a rating, x out of ten, for a new restaurant. The
following shows their scores:
5 4 8 4 9 6 5 7
(iii) Use your answers to parts (i) and (ii) and the formula $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ to find the value of $S_{xx}$.
(4 marks)
(b) The ratings, y out of ten given to a different restaurant by the same food critics are
summarised below:
$\sum y = 52 \qquad \sum y^2 = 352 \qquad n = 8$
(2 marks)
(c) Use the formula $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and the statistic $\sum xy = 328$ to find the value of $S_{xy}$.
(1 mark)
(d) Use the formula $r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$ to calculate the product moment correlation coefficient.
(2 marks)
(e) State whether you think the food critics are consistent with their scoring, based on your
answer to part (d).
(2 marks)
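Parts (a) to (d) chain together. A minimal sketch reproducing the chain with the formulas quoted in the question (variable names are illustrative):

```python
from math import sqrt

x = [5, 4, 8, 4, 9, 6, 5, 7]   # ratings for the first restaurant
n = len(x)
sum_y, sum_y2 = 52, 352        # summary statistics from part (b)
sum_xy = 328                   # statistic from part (c)

s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum(x) * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)
print(s_xx, s_yy, s_xy, round(r, 4))
```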
7 (a) The heights, h metres rounded to 1 decimal place, and the weights, w kg rounded to the
nearest kilogram, of a group of newly born elephants are recorded in the table below.
Use the formula $S_{hh} = \sum h^2 - \frac{(\sum h)^2}{n}$ to find the value of $S_{hh}$.
(4 marks)
Weight, w kg     96    103   98    99    102   101
x = w − 100      −4    3
(1 mark)
(4 marks)
(d) Use the formula $r = \frac{S_{hx}}{\sqrt{S_{hh} \times S_{xx}}}$ to find the product moment correlation coefficient between h and x.
(2 marks)
(e) Hence, write down the product moment correlation coefficient between the heights, h,
and the weights, w, of the newly born elephants.
(1 mark)
8 (a) The manager of a local supermarket collected data on the distance a person’s house was
from the supermarket, d miles, and the average total cost of the person’s shopping, c
dollars. The information is given in the table below.
x = 10d        6    3
y = c − 30     3    −1
(2 marks)
(b) Find the mean of x, $\bar{x}$, and show that the mean of y is $\bar{y} = 5$.
(3 marks)
(4 marks)
(d) The equation of the least squares regression line of y on x is written in the form
$y = a + bx$, where $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$. Show that $b = \frac{5}{3}$ and find the value of a.
(3 marks)
(e) By substituting $x = 10d$ and $y = c - 30$ into your answer for part (d), show that the least
squares regression line of c on d is $c = 25 + \frac{50}{3}d$.
(3 marks)
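The substitution in part (e) is mechanical: with $x = 10d$ and $y = c - 30$, the line $y = a + bx$ becomes $c = (a + 30) + 10b\,d$. A small sketch using Python's `fractions` module to keep the thirds exact; the helper `decode_line` is hypothetical, and the intercept $a = -5$ below is simply the value the target line $c = 25 + \frac{50}{3}d$ requires, not an independently computed answer to part (d).

```python
from fractions import Fraction


def decode_line(a, b):
    """Rewrite y = a + b*x as c = p + q*d using x = 10d and y = c - 30.

    Substituting gives c - 30 = a + b*(10d), so c = (a + 30) + 10*b*d.
    """
    return a + 30, 10 * b


b = Fraction(5, 3)   # gradient shown in part (d)
a = Fraction(-5)     # intercept implied by the target line in part (e)
intercept, gradient = decode_line(a, b)
print(f"c = {intercept} + ({gradient})d")
```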
Medium Questions
1 (a) A teacher collected the maths and physics test scores of a number of students and drew
a scatter diagram to represent this data.
Describe the correlation shown by the scatter diagram, and interpret the correlation in
context.
(2 marks)
(b) An alternative therapist collected data on his clients’ reported levels of anxiety as well as
the number of trees they had hugged in the course of therapy. He drew a scatter
diagram to represent this data.
Describe the correlation shown by the scatter diagram, and interpret the correlation in
context.
(2 marks)
2 (a) The table below shows data from the United States regarding annual per capita cheese
consumption (in pounds) and the divorce rate (number of divorces per 1000 people) for
ten years between 2000 and 2018:
Year                                     2000  2002  2004  2006  2008  2010  2012  2014  2016  2018
Cheese consumption (pounds)              32.1  32.8  33.6  34.8  34.5  35    35.5  36.2  38.5  40
Divorce rate (number per 1000 people)    4     3.9   3.7   3.7   3.5   3.6   3.4   3.2   3.0   2.9
Draw a scatter diagram to represent this data, with per capita cheese consumption on
the horizontal axis and divorce rate on the vertical axis.
(3 marks)
(b) (i) Describe the correlation between per capita cheese consumption and divorce rate.
(ii) Do you think there is a causal relationship between per capita cheese consumption
and divorce rate in the United States?
Explain your reasoning.
(2 marks)
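The visual impression from the scatter diagram can be quantified. A minimal sketch computing the product moment correlation coefficient for the ten years in the table:

```python
from math import sqrt

cheese = [32.1, 32.8, 33.6, 34.8, 34.5, 35, 35.5, 36.2, 38.5, 40]
divorce = [4, 3.9, 3.7, 3.7, 3.5, 3.6, 3.4, 3.2, 3.0, 2.9]
n = len(cheese)

mx, my = sum(cheese) / n, sum(divorce) / n
s_xx = sum((x - mx) ** 2 for x in cheese)
s_yy = sum((y - my) ** 2 for y in divorce)
s_xy = sum((x - mx) * (y - my) for x, y in zip(cheese, divorce))
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")
```

A value this close to -1 measures the strength of a linear association only; on its own it cannot show that one variable causes changes in the other.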
3 (a) Myfanwy has been applying different voltages (v, measured in volts) to an electrical
circuit in her lab and recording the resulting currents (i, measured in amps). The
smallest voltage she applied was 0.5 volts, and the largest voltage she applied was 120
volts.
(ii) Use the equation to predict the current for a voltage of 70 volts.
(2 marks)
(b) Explain why it would not be sensible to use the regression equation to work out:
(2 marks)
(c) Myfanwy’s lab partner suggests that the value 0.056 in the regression equation
represents the current in the circuit when the voltage applied is zero. Explain why he
might suggest this, but also suggest a reason why his interpretation is most likely
incorrect.
(2 marks)
4 (a) The following table shows the height, h cm, and weight, w kg, for each of eleven students
at a sixth form college.
h 167 182 176 173 17 174 177 178 172 170 169
w 51 62 69 65 65 56 64 62 51 55 58
An outlier is an observation which lies more than ±2 standard deviations from the mean.
(ii) Explain why this outlier should be omitted from the data.
(2 marks)
(b) With the outlier data excluded, the equation of the regression line of w on h is $w = -87.6 + 0.845h$.
(i) Exclude the outlier data from the recorded measurements and draw a scatter
diagram to represent the data for the remaining ten students.
(5 marks)
(c) Based on your diagram, along with the regression equation, to what extent would you
say that a person’s height may be used as an accurate predictor of his or her weight?
(2 marks)
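The ±2 standard deviation rule and the quoted regression line can both be checked directly. A minimal sketch using the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) flags the same single value here. The helper `outliers` is illustrative.

```python
from statistics import mean, pstdev

h = [167, 182, 176, 173, 17, 174, 177, 178, 172, 170, 169]
w = [51, 62, 69, 65, 65, 56, 64, 62, 51, 55, 58]


def outliers(values, k=2):
    """Values lying more than k standard deviations from the mean."""
    mu, sd = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > k * sd]


flagged = outliers(h)   # a height of 17 cm is implausible for a student
print("height outliers:", flagged, "weight outliers:", outliers(w))

# Least squares line of w on h for the ten remaining students
pairs = [(x, y) for x, y in zip(h, w) if x not in flagged]
hs, ws = [p[0] for p in pairs], [p[1] for p in pairs]
mh, mw = mean(hs), mean(ws)
s_hh = sum((x - mh) ** 2 for x in hs)
s_hw = sum((x - mh) * (y - mw) for x, y in pairs)
b = s_hw / s_hh
a = mw - b * mh
print(f"w = {a:.1f} + {b:.3f}h")
```

The fitted line agrees with the equation given in part (b).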
5 (a) An A level music teacher is collecting data on the number of hours his students spend
rehearsing their final piece, h, and the number of mistakes made in their exam, m. He
calculates the following summary data of ten of his students.
(4 marks)
(b) The music teacher calculates the equation of the regression line of m on h to be
$m = a + bh$.
Show that $b = -0.0626$ correct to 3 significant figures and find the value of a.
(3 marks)
(2 marks)
6 (a) An estate agent, Terry, claims that there is a correlation between the value of a house, v
(£1000) and the distance between that house and the nearest nightclub, d (miles).
Terry has a database containing over 100 houses and he takes a random sample of
seven houses to investigate his claim. The scatter graph below shows the results:
Terry calculates the product moment correlation coefficient as $r = 0.852$. Using the
scatter graph, explain how you know Terry’s PMCC value is incorrect.
(1 mark)
(5 marks)
7 (a) The table below shows some data on daily mean air pressure, p (hPa), and daily total
sunshine, s (mins), in a certain area over a random sample of 7 days.
$x = p - 1011 \qquad y = \frac{s - 300}{4}$
x    6
y    7
(2 marks)
(4 marks)
(c) Use your answers to part (b) to find the product moment correlation coefficient for the
daily mean air pressure and daily total sunshine. Comment on the relationship between
the two variables.
(3 marks)
Hard Questions
1 (a) Ella measures how the extension, x mm, of a thin piece of metal wire varies with the
force applied to it, F kN. She records her results in the table below.
(1 mark)
(b) The correct equation for the regression line of F on x is F = 6.16 + 67.6x.
(1 mark)
(c) Using the correct regression line, Ella estimates that if she applies a force of 1000 kN
then the wire will show an extension of 14.7 mm.
(2 marks)
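Ella's figure can be reproduced by rearranging the F-on-x line to make x the subject. A least squares line of F on x minimises errors in F, so it is built for estimating F from x; estimating x from F is the job of the line of x on F, which generally differs. Whether 1000 kN also lies outside the range of forces she tested cannot be checked here, because her table is missing from this extract.

```python
# Regression line of F on x from part (b): F = 6.16 + 67.6x
a, b = 6.16, 67.6


def force_from_extension(x_mm):
    return a + b * x_mm


# Rearranging F = a + b*x for x when F = 1000 kN
x_rearranged = (1000 - a) / b
print(f"x = {x_rearranged:.1f} mm")
```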
2 (a) The table below shows a comparison of the average house price, H (£100 000), and the
average yearly income, I (£10 000), for different areas around the UK in 2021.
Area H I
Conwy 155.1 26.4
Perth and Kinross 181.3 27.9
Richmondshire 190.3 25.1
Monmouthshire 232.6 31.4
Trafford 260.2 32.0
Gwynedd 148.5 23.6
Basingstoke and Deane 297.7 33.7
Daventry 259.2 29.5
(4 marks)
(3 marks)
(c) A particularly unscrupulous politician uses this to claim that if you want a salary of
£35 000, all you need to do is buy a house that costs £583 000.
(2 marks)
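Parts (a) and (b) are missing from this extract, so it is not certain which regression line the question uses. As an illustration only, a sketch that computes the PMCC and the least squares line of H on I from the table (values in the units stated in the question):

```python
from math import sqrt

# Same order as the table, Conwy to Daventry
H = [155.1, 181.3, 190.3, 232.6, 260.2, 148.5, 297.7, 259.2]
I = [26.4, 27.9, 25.1, 31.4, 32.0, 23.6, 33.7, 29.5]
n = len(H)

mH, mI = sum(H) / n, sum(I) / n
s_HH = sum((h - mH) ** 2 for h in H)
s_II = sum((i - mI) ** 2 for i in I)
s_HI = sum((h - mH) * (i - mI) for h, i in zip(H, I))

r = s_HI / sqrt(s_HH * s_II)
b = s_HI / s_II       # gradient of the line of H on I
a = mH - b * mI
print(f"r = {r:.3f}; H = {a:.1f} + {b:.2f}I")
```

Whatever line is used, a strong positive r describes association within these eight areas; it does not imply that buying a more expensive house raises income.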
3 (a) Two researchers, Alwyn and Beth, are working on a project collecting data about the
self-reported happiness of students on a scale from 0 to 10, H, and the number of exams sat
by those students, n. After collecting data from 1000 students, they construct a scatter
diagram and find the equation of the regression line of H on n to be $H = a + bn$.
(5 marks)
(b) What information about the original data set would need to be checked before using the
regression line equation to estimate the self-reported happiness of a student sitting 8
exams?
(1 mark)
(c) After calculating the equation of the line of regression, Alwyn accidentally deletes all the
data collected about the self-reported happiness scores. Alwyn says it’s not a problem
since he can use the regression line and the number of exams sat to recalculate all the
values. Beth says that Alwyn is wrong and the original data is lost forever.
(2 marks)
4 (a) A consultant is trying to improve the efficiency of how a factory making chewing gum
operates. To help them do this, they collect many types of data about the factory
workers. One such type of data is the number of chewing gum packets made per shift.
The list below shows the number of chewing gum packets made by a particular worker
(Worker 1) during the last 10 shifts worked.
392 414 536 474 212 396 427 545 459 234
Calculate the mean number of chewing gum packets made per shift by Worker 1 to the
nearest whole number of packets.
(1 mark)
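The mean is the total number of packets divided by the number of shifts. A short sketch, which also evaluates the line $N = 18T + 95$ from part (b) at Worker 1's training time of $T = 18$ hours for comparison:

```python
packets = [392, 414, 536, 474, 212, 396, 427, 545, 459, 234]
mean_packets = sum(packets) / len(packets)
print(round(mean_packets))   # to the nearest whole packet

# Line of N on T from part (b), evaluated at Worker 1's T = 18
predicted = 18 * 18 + 95
print(predicted)
```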
(b) The table below shows the mean number of chewing gum packets, N, made by various
workers along with how many hours of training, T hours, they have received.
Worker 1 2 3 4 5 6 7 8 9
T 18 24 22.5 15 16 20 21 22 21
(i) Including your answer from (a), plot a scatter diagram of the data in the table
above.
(ii) Given that the equation of the regression line of N on T is N = 18T + 95, add the
regression line to your scatter diagram.
(5 marks)
(c) The consultant then goes on to collect even more data on other factory workers and
records some of it in the table below.
Worker 10 11 12 13 14 15 16 17 18
Without adding this new data to your scatter diagram, what advice could the consultant
give to the factory to improve the efficiency of their workers?
(3 marks)
5 (a) A snack shop owner has noticed that sales of energy drinks seem to increase later in
the school term. He decides to collect data over the final ten days of a school term to
see if the number of energy drinks sold per day, h, increases as the number of days until the
school holidays, d, decreases.
(i) What type of correlation is the snack shop owner testing for?
(ii) State which of the two variables is the explanatory variable.
(2 marks)
(b) Over the ten days the snack shop owner collects the following summary statistics:
(5 marks)
(c) The snack shop owner uses this data to calculate the regression line of d on h and uses
it to predict the number of energy drinks he will sell on the first day of the new term,
when there are still 90 days until the holidays. State two reasons why this is unlikely to
give a reliable prediction.
(2 marks)
6 (a) Hatter has noticed that over the past 50 years there seem to be fewer hatmakers in
London. He also knows that global temperatures have been rising over the same time
period. He decides to see if there could be any correlation, so he collects data on the
number of hatmakers each year in London, h, and the yearly global mean temperature,
t, from the past 50 years, and records the information in the graph below.
Explain why the product moment correlation coefficient between h and t cannot be
$r = 0.05$.
(2 marks)
$\sum h = 7423 \qquad \sum h^2 = 2107421 \qquad \sum ht = 4273.1 \qquad \bar{t} = 0.61 \qquad S_{tt} = 0.195$
Find the value of the product moment correlation coefficient, r , correct to 4 decimal
places.
(4 marks)
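Once $\sum t$ is recovered from $\bar{t}$, the summary statistics give r directly. A minimal sketch, assuming n = 50 (one observation per year over the past 50 years):

```python
from math import sqrt

n = 50                                   # assumed: one data point per year
sum_h, sum_h2, sum_ht = 7423, 2107421, 4273.1
t_bar, s_tt = 0.61, 0.195

sum_t = n * t_bar                        # recover the sum of t from its mean
s_hh = sum_h2 - sum_h ** 2 / n
s_ht = sum_ht - sum_h * sum_t / n
r = s_ht / sqrt(s_hh * s_tt)
print(f"S_hh = {s_hh:.2f}, S_ht = {s_ht:.2f}, r = {r:.5f}")
```

The value sits very close to a rounding boundary at the fourth decimal place, so it is worth carrying full precision through the intermediate steps.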
(c) Hatter concludes that the rise in mean global temperature is what is causing hatmakers
in London to go out of business.
(1 mark)
7 (a) On 21st January 2020, doctors in China started recording and reporting the number of
new daily cases of an unknown virus. Over the first five days there were 916 new cases.
The table below shows the number of new cases, c, of the virus in a town in China and
the number of days, d, after 21st January 2020. The numbers of new cases were not
available for the 6th and 11th days for this town.
d 7 8 9 10 12
c 700 1700 1600 1700 1500
Given that for days 1 to 5 the value of $\sum c^2 = 213622$, use the data you have for the 10
days when cases were reported to calculate the values of $S_{cc}$ and $S_{dd}$.
(4 marks)
(b) The value of the product moment correlation coefficient between the number of days
after 21st January 2020 and the number of new cases was calculated as $r = 0.8880$.
Use this value of r and your answers from part (a) to find the value of $S_{cd}$.
(3 marks)
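A sketch of parts (a) and (b), assuming that the 916 cases over the first five days and the given $\sum c^2$ both refer to this town, and that days 1 to 5 correspond to d = 1, ..., 5:

```python
from math import sqrt

# Days 1 to 5: only totals are available
first5_days = [1, 2, 3, 4, 5]
first5_sum_c, first5_sum_c2 = 916, 213622

# Days 7 to 10 and 12 from the table (days 6 and 11 are missing)
d_later = [7, 8, 9, 10, 12]
c_later = [700, 1700, 1600, 1700, 1500]

days = first5_days + d_later
n = len(days)                                  # 10 reported days
sum_d, sum_d2 = sum(days), sum(d * d for d in days)
sum_c = first5_sum_c + sum(c_later)
sum_c2 = first5_sum_c2 + sum(c * c for c in c_later)

s_dd = sum_d2 - sum_d ** 2 / n
s_cc = sum_c2 - sum_c ** 2 / n

# Part (b): rearrange r = S_cd / sqrt(S_cc * S_dd)
r = 0.8880
s_cd = r * sqrt(s_cc * s_dd)
print(f"S_dd = {s_dd:.1f}, S_cc = {s_cc:.1f}, S_cd = {s_cd:.1f}")
```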
Explain why the equation for the regression line should not be used to estimate
how many new cases there were on 19th January 2020.
(5 marks)
8 (a) A restaurant owner, Mr Capazio, suspects that there is positive correlation between the
number of alcoholic beverages a person has with their meal and the amount of time it
takes them to pay their bill at the end of the evening. He decides to collect some data to
test his theory.
(i) In the context of this question, describe what positive correlation would mean.
(ii) State which of the two variables is the dependent variable.
(2 marks)
(b) The table below shows the number of alcoholic beverages consumed, d, and the amount
of time taken to pay the bill, t seconds, for a sample of 10 visitors to the restaurant on a
particular night.
Number of drinks, d        0     1     3     2     8     4     2     0     3     2
Time taken, t seconds      155   190   320   245   375   540   130   190   180   250
(i) Using the coding $x = \frac{t}{5} - 50$, find the values of $S_{dd}$, $S_{xx}$ and $S_{xd}$.
(ii) Calculate the product moment correlation coefficient between d and t.
(6 marks)
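A sketch of part (b) with a consistency check on part (c). It relies on the standard result that a linear coding with a positive scale factor, such as $x = \frac{t}{5} - 50$, leaves r unchanged, so the PMCC between d and x equals that between d and t. The last lines undo the coding so the line of t on d can be compared with Mr Capazio’s. The helper `s` is illustrative.

```python
from math import sqrt

d = [0, 1, 3, 2, 8, 4, 2, 0, 3, 2]
t = [155, 190, 320, 245, 375, 540, 130, 190, 180, 250]
n = len(d)
x = [ti / 5 - 50 for ti in t]            # coding from part (b)(i)


def s(u, v):
    """S_uv = sum(uv) - (sum u)(sum v) / n."""
    return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n


s_dd, s_xx, s_xd = s(d, d), s(x, x), s(x, d)
r = s_xd / sqrt(s_xx * s_dd)
r_raw = s(t, d) / sqrt(s(t, t) * s_dd)   # same r computed from uncoded t

# Regression of x on d, then decode using t = 5(x + 50)
b_x = s_xd / s_dd
a_x = sum(x) / n - b_x * sum(d) / n
a_t, b_t = 5 * (a_x + 50), 5 * b_x
print(f"S_dd = {s_dd}, S_xx = {s_xx}, S_xd = {s_xd}, r = {r:.3f}")
print(f"t = {a_t:.1f} + {b_t:.1f}d")
```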
(c) Mr Capazio calculates the regression line of t on d to be $t = 171.8 + 34.3d$.
(i) Give an interpretation of the values 171.8 and 34.3 in the context of the question.
(ii) A person took 4.5 minutes to pay their bill. Explain why the regression line should
not be used to estimate the number of drinks they had ordered.
(3 marks)
Very Hard Questions
1 Four statisticians are arguing over which line best highlights the trend of the set of data
shown in the scatter diagram below.
The first statistician draws, by eye, a line of best fit and claims its equation is
$y = -0.05 + 0.17x$. The second draws, again by eye, a different line of best fit and
claims its equation is $y = -1.08 + 1.3x$. The third calculates the equation of the
regression line of y on x and claims it is $y = 0.18 + 0.11x$. The fourth statistician claims that
all three of the other statisticians are definitely wrong and that there is no line of best fit.
By adding each of these lines to the scatter diagram, comment on the claims of each of
the statisticians.
(5 marks)
2 (a) Paige takes a sample of 9 cities throughout the UK to compare the percentage of people
living in a city who identify as vegan, V %, and the percentage of restaurants offering
vegan options in that same city, R %.
(4 marks)
(b) In one of the cities, 1.16% of people were vegan and 55.9% of restaurants offered vegan
options.
(2 marks)
(c) Paige discovers that in one city every restaurant offers vegan options. Paige suggests
that the equation of the regression line of R on V can be used to find the percentage of
people in this city who identify as vegan. Explain why Paige is likely wrong.
(2 marks)
3 (a) A ride sharing app collected data on the time, t minutes, taken to complete a journey of
distance, d miles. Data from a random sample of 8 journeys is detailed in the table
below.
(5 marks)
(b) Using a new random sample of thousands of journeys, the ride sharing app calculated
the regression line of time on distance to be $t = -1.8 + 5.9d$.
The app uses this regression equation to predict that a journey of distance 7 km would
take 39.5 minutes. Explain why this is incorrect.
(1 mark)
(c) The regression equation predicts that for journeys less than 0.3 miles the time taken will
be less than zero minutes. What is the most likely reason that the regression equation
gives this false prediction?
(1 mark)
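Part (a) measures distance in miles. Assuming the line in part (b) also uses miles, a short sketch of the 7 km prediction and of the point where the line crosses zero (the constant name `KM_PER_MILE` is illustrative; 1.609344 km is the exact international mile):

```python
KM_PER_MILE = 1.609344


def time_minutes(distance_miles):
    """Regression line of time on distance: t = -1.8 + 5.9d, with d in miles."""
    return -1.8 + 5.9 * distance_miles


t_wrong = time_minutes(7)                 # treats 7 km as if it were 7 miles
t_km = time_minutes(7 / KM_PER_MILE)      # converts 7 km to miles first
d_zero = 1.8 / 5.9                        # distance at which the line gives t = 0
print(f"{t_wrong:.1f} min vs {t_km:.1f} min; t < 0 for d < {d_zero:.3f} miles")
```

Substituting 7 directly reproduces the app’s 39.5 minutes, while converting 7 km to about 4.35 miles first gives roughly 23.9 minutes. The zero crossing at about 0.305 miles matches the 0.3-mile threshold in part (c).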
4 (a) A maths teacher randomly selects 10 students from a class of 30 to answer a survey. The
survey asks students how many practice questions they completed when revising for a
recent test, Q, and their percentage score in that test, S %. Summary statistics for Q are
shown below:
$\bar{Q} = 21 \qquad \text{Range of } Q = 20$
(2 marks)
(b) Use the regression equation to find an estimate for the mean value and range of S. State
any assumptions that are needed.
(6 marks)
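The regression equation needed for part (b) is missing from this extract. Writing it, as an assumption, in the general form $S = a + bQ$ for a least squares line of S on Q (a and b are placeholders), the two summary statistics carry over as follows. The mean uses the fact that a least squares line passes through $(\bar{Q}, \bar{S})$; the range step additionally assumes every student's score lies exactly on the line.

```latex
\begin{aligned}
\bar{S} &= a + b\bar{Q} = a + 21b \\
\text{Range of } S &= |b| \times \text{Range of } Q = 20\,|b|
\end{aligned}
```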
(i) estimate the scores of the other students in the maths class,
(2 marks)
5 (a) An owner of a beach resort is comparing parasol sales, £p, and sun cream sales, £s, at
the resort over a period of eleven days. The data is standardised by coding the variables
using $x = \frac{s - 153}{103}$ and $y = \frac{p - 32}{37}$. The values for the first ten days are plotted on the
scatter diagram below.
(i) On the eleventh day, the resort sold £246 worth of sun cream and £69 worth of
parasols. Use this information to complete the scatter diagram.
(ii) The equation for the regression line of y on x is $y = 0.19 + 0.83x$. Add the regression
line to the scatter diagram.
(3 marks)
(b) (i) Show that by using the regression line of y on x and the coding equations above,
the regression line of p on s can be written in the form p = a + bs, where a and b are
constants to be found to 3 significant figures.
(ii) Hence, or otherwise, find an estimate for the amount of parasol sales on a day
where there are £170 of sun cream sales.
(5 marks)
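Substituting the coding into the coded line gives $\frac{p - 32}{37} = 0.19 + 0.83 \cdot \frac{s - 153}{103}$, which rearranges to a line of p on s. A minimal sketch of parts (a)(i) and (b), where the helper `code` is illustrative:

```python
def code(s, p):
    """Coding from part (a): x = (s - 153)/103, y = (p - 32)/37."""
    return (s - 153) / 103, (p - 32) / 37


# (a)(i) Eleventh day: s = 246, p = 69
x11, y11 = code(246, 69)

# (b)(i) p = 32 + 37*(0.19 + 0.83*(s - 153)/103), expanded into p = a + b*s
b = 37 * 0.83 / 103
a = 32 + 37 * 0.19 - b * 153

# (b)(ii) Parasol sales on a day with 170 pounds of sun cream sales
p_170 = a + b * 170
print(f"day 11: ({x11:.3f}, {y11:.3f}); p = {a:.2f} + {b:.3f}s; p(170) = {p_170:.2f}")
```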
6 (a) Effie loves watching the turtles play in the lake near her house. She thinks that there is a
relationship between the number of turtles and the number of ducks that live on
different parts of the lake. She decides to investigate this further and gathers data on
duck and turtle populations from six wildlife centres. Effie records the data in the table
below.
Centre A B C D E F
Effie codes the results using $x = \frac{d - m}{n}$ and $y = \frac{t - p}{q}$. Some of the
values for x and y are recorded in the table below.
Centre A B C D E F
x 2 1.5
y 1 1.2
(4 marks)
(6 marks)
(c) Use your answers to parts (a) and (b) to find the regression line of t on d , show your
working clearly.
(3 marks)
7 (a) Charlie is interested to find out if there is positive correlation between the number of
letters in someone’s name, l, and the time, t (rounded to the nearest five seconds), it
takes her six-year-old sister to correctly guess the spelling of the name. She decides to
test this by looking at a random sample of different names and timing how long it takes
her sister to guess their spelling.
Letters, l     4     5     5     5     6     7
Time, t        10    5     15    25    60    80
Frequency      x     3     29    17    7     1
Given that $S_u = 17.9375$, find the value of x and hence find the number of names in
Charlie’s sample.
(6 marks)
(5 marks)
8 (a) Two variables, p and q, are thought to be connected in the form $q = a + bp$, where a
and b are constants. A random sample of 100 pairs of data are taken from data sets p
and q and are coded such that $x = \frac{p - 100}{5}$ and $y = \frac{q - 20}{10}$. The data from the
coded records are summarised below.
$S_{xx} = 6 \qquad \sum y = 11 \qquad \sum y^2 = 1.29 \qquad \sum xy = 20.25$
Given that the product moment correlation coefficient between x and y is
$r = -0.93819$, find the value of $S_{xy}$ correct to 3 significant figures.
(3 marks)
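Rearranging $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ gives $S_{xy}$ once $S_{yy}$ is found from the summary statistics. A minimal sketch, with n = 100 taken from the sample size:

```python
from math import sqrt

n = 100
s_xx = 6
sum_y, sum_y2 = 11, 1.29
r = -0.93819

s_yy = sum_y2 - sum_y ** 2 / n     # 1.29 - 121/100
s_xy = r * sqrt(s_xx * s_yy)       # rearranged from r = S_xy / sqrt(S_xx * S_yy)
print(f"S_yy = {s_yy:.2f}, S_xy = {s_xy:.3g}")
```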
(5 marks)
(3 marks)