PhysicsAndMathsTutor.com
S1 Correlation and regression PMCC
1.
Gary compared the total attendance, x, at home matches and the total number of goals, y, scored
at home during a season for each of 12 football teams playing in a league. He correctly
calculated:
Sxx = 1022500,  Syy = 130.9,  Sxy = 8825
(a)
Calculate the product moment correlation coefficient for these data.
(2)
(b)
Interpret the value of the correlation coefficient.
(1)
Helen was given the same data to analyse. In view of the large numbers involved she decided to
divide the attendance figures by 100. She then calculated the product moment correlation
coefficient between x/100 and y.
(c)
Write down the value Helen should have obtained.
(1)
(Total 4 marks)
2.
The blood pressures, p mmHg, and the ages, t years, of 7 hospital patients are shown in the table
below.
t (years)    42    74    48    35    56    26    60
p (mmHg)     98   130   120    88   182    80   135
[Σt = 341, Σp = 833, Σt² = 18181, Σp² = 106397, Σtp = 42948]
(a)
Find Spp, Stp and Stt for these data.
(4)
(b)
Calculate the product moment correlation coefficient for these data.
(3)
Edexcel Internal Review
(c)
Interpret the correlation coefficient.
(1)
(d)
On the graph paper below, draw the scatter diagram of blood pressure against age for
these 7 patients.
(2)
(e)
Find the equation of the regression line of p on t.
(4)
(f)
Plot your regression line on your scatter diagram.
(2)
(g)
Use your regression line to estimate the blood pressure of a 40 year old patient.
(2)
(Total 18 marks)
3.
The volume of a sample of gas is kept constant. The gas is heated and the pressure, p, is
measured at 10 different temperatures, t. The results are summarised below.
Σp = 445,  Σp² = 38 125,  Σt = 240,  Σt² = 27 520,  Σpt = 26 830
(a)
Find Spp and Spt.
(3)
Given that Stt = 21 760,
(b)
calculate the product moment correlation coefficient.
(2)
(c)
Give an interpretation of your answer to part (b).
(1)
(Total 6 marks)
4.
In a study of how students use their mobile telephones, the phone usage of a random sample of
11 students was examined for a particular week.
The total length of calls, y minutes, for the 11 students were
17, 23, 35, 36, 51, 53, 54, 55, 60, 77, 110
(a)
Find the median and quartiles for these data.
(3)
A value that is greater than Q3 + 1.5 × (Q3 − Q1) or smaller than Q1 − 1.5 × (Q3 − Q1) is defined
as an outlier.
(b)
Show that 110 is the only outlier.
(2)
(c)
Using the graph below draw a box plot for these data indicating clearly the position of the
outlier.
(3)
The value of 110 is omitted.
(d)
Show that Syy for the remaining 10 students is 2966.9
(3)
These 10 students were each asked how many text messages, x, they sent in the same week.
The values of Sxx and Sxy for these 10 students are Sxx = 3463.6 and Sxy = −18.3.
(e)
Calculate the product moment correlation coefficient between the number of text
messages sent and the total length of calls for these 10 students.
(2)
A parent believes that a student who sends a large number of text messages will spend fewer
minutes on calls.
(f)
Comment on this belief in the light of your calculation in part (e).
(1)
(Total 14 marks)
5.
As part of a statistics project, Gill collected data relating to the length of time, to the nearest
minute, spent by shoppers in a supermarket and the amount of money they spent.
Her data for a random sample of 10 shoppers are summarised in the table below, where t
represents time and m the amount spent over £20.
t (minutes)   15   23    5   16   30    6   32   23   35   27
m (£)         −3   17  −19    4   12   −9   27    6   20    6
(a)
Write down the actual amount spent by the shopper who was in the supermarket for 15
minutes.
(1)
(b)
Calculate Stt, Smm and Stm.
(You may use Σt² = 5478, Σm² = 2101, Σtm = 2485)
(6)
(c)
Calculate the value of the product moment correlation coefficient between t and m.
(3)
(d)
Write down the value of the product moment correlation coefficient between t and the
actual amount spent. Give a reason to justify your value.
(2)
On another day Gill collected similar data. For these data the product moment correlation
coefficient was 0.178
(e)
Give an interpretation to both of these coefficients.
(2)
(f)
Suggest a practical reason why these two values are so different.
(1)
(Total 15 marks)
6.
Students in Mr Brawn's exercise class have to do press-ups and sit-ups. The number of
press-ups x and the number of sit-ups y done by a random sample of 8 students are summarised
below.
Σx = 272,  Σx² = 10 164,  Σy = 320,  Σy² = 13 464,  Σxy = 11 222.
(a)
Evaluate Sxx, Syy and Sxy.
(4)
(b)
Calculate, to 3 decimal places, the product moment correlation coefficient between x and
y.
(3)
(c)
Give an interpretation of your coefficient.
(2)
(d)
Calculate the mean and the standard deviation of the number of press-ups done by these
students.
(4)
Mr Brawn assumes that the number of press-ups that can be done by any student can be
modelled by a normal distribution with mean μ and standard deviation σ. Assuming that μ and
σ take the same values as those calculated in part (d),
(e)
find the value of a such that P(μ − a < X < μ + a) = 0.95.
(3)
(f)
Comment on Mr Brawn's assumption of normality.
(2)
(Total 18 marks)
7.
A researcher thinks there is a link between a person's height and level of confidence. She
measured the height h, to the nearest cm, of a random sample of 9 people. She also devised a
test to measure the level of confidence c of each person. The data are shown in the table below.
h    179   169   187   166   162   193   161   177   168
c    569   561   579   561   540   598   542   565   573
[You may use Σh² = 272 094, Σc² = 2 878 966, Σhc = 884 484]
(a)
Draw a scatter diagram to illustrate these data.
(4)
(b)
Find exact values of Shc Shh and Scc.
(4)
(c)
Calculate the value of the product moment correlation coefficient for these data.
(3)
(d)
Give an interpretation of your correlation coefficient.
(1)
(e)
Calculate the equation of the regression line of c on h in the form c = a + bh.
(3)
(f)
Estimate the level of confidence of a person of height 180 cm.
(2)
(g)
State the range of values of h for which estimates of c are reliable.
(1)
(Total 18 marks)
8.
A company owns two petrol stations P and Q along a main road. Total daily sales in the same
week for P (p) and for Q (q) are summarised in the table below.
               p       q
Monday       4760    5380
Tuesday      5395    4460
Wednesday    5840    4640
Thursday     4650    5450
Friday       5365    4340
Saturday     4990    5550
Sunday       4365    5840
When these data are coded using x = (p − 4365)/100 and y = (q − 4340)/100,
Σx = 48.1, Σy = 52.8, Σx² = 486.44, Σy² = 613.22 and Σxy = 204.95.
(a)
Calculate Sxy, Sxx and Syy.
(4)
(b)
Calculate, to 3 significant figures, the value of the product moment correlation coefficient
between x and y.
(3)
(c)
(i)
Write down the value of the product moment correlation coefficient between p and
q.
(ii)
Give an interpretation of this value.
(2)
(Total 9 marks)
1.
(a)
r = 8825 / √(1022500 × 130.9) = awrt 0.763    M1 A1
Note
M1 for a correct expression, square root required
Correct answer award 2/2
(b)
Teams with high attendance scored more goals (oe, statement in context)    B1
Note
Context required (attendance and goals). Condone causality.
B0 for "strong positive correlation between attendance and goals" on its own oe
(c)
0.76(3)    B1ft
Note
Value required.
Must be a correlation coefficient between −1 and +1 inclusive.
B1ft for 0.76 or better or same answer as their value from part (a) to at least 2 d.p.
[4]
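As a quick check of parts (a) and (c), the short Python sketch below evaluates r = Sxy / √(Sxx × Syy) from Gary's summary statistics, then repeats the calculation with x replaced by x/100. The function name `pmcc` and the variable names are illustrative choices, not part of the mark scheme.

```python
import math


def pmcc(s_xy: float, s_xx: float, s_yy: float) -> float:
    """Return r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy / math.sqrt(s_xx * s_yy)


# Gary's values from question 1.
r_gary = pmcc(8825, 1022500, 130.9)

# Helen uses x/100, so Sxy is divided by 100 and Sxx by 100**2; Syy is unchanged.
r_helen = pmcc(8825 / 100, 1022500 / 100**2, 130.9)

print(round(r_gary, 3), round(r_helen, 3))
```

The factor of 100 cancels between numerator and denominator, which is why part (c) only asks candidates to write the value down.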
2.
(a)
Spp = 106397 − 833²/7 = 7270
Stp = 42948 − (341 × 833)/7 = 2369
Stt = 18181 − 341²/7 = 1569.42857... or 10986/7    M1 A1 A1 A1
Note
M1 for at least one correct expression
1st A1 for Spp = 7270, 2nd A1 for Stp = 2369 or 2370,
3rd A1 for Stt = awrt 1570
(b)
r = 2369 / √(7270 × 1569.42857...) = 0.7013375    awrt 0.701    M1 A1ft A1
Note
M1 for attempt at correct formula and at least one correct value (or correct ft).
M0 for 42948 / √(106397 × 18181)
A1ft All values correct or correct ft. Allow for an answer of 0.7 or 0.70.
Answer only: awrt 0.701 is 3/3, answer of 0.7 or 0.70 is 2/3
(c)
(Pmcc shows positive correlation.)
Older patients have higher blood pressure    B1
Note
B1 for comment in context that interprets the fact that correlation is positive, as in scheme.
Must mention age and blood pressure in words, not just "t" and "p".
(d)
Points plotted correctly on graph: −1 each error or omission
(within one square of correct position)    B2
Note
Record 1 point incorrect as B1B0 on epen. [NB overlay for (60, 135) is slightly wrong]
(e)
b = 2369 / 1569.42857... = 1.509466...    M1 A1
a = 833/7 − b × 341/7 = 45.467413...    M1
p = 45.5 + 1.51t    A1
Note
1st M1 for use of the correct formula for b, ft their values from (a)
1st A1 allow 1.5 or better
2nd M1 for use of ȳ − b x̄ with their values
2nd A1 for full equation with a = awrt 45.5 and b = awrt 1.51. Must be p in terms of t, not x and y.
(f)
Line drawn with correct intercept, and gradient    B1ft B1
Diagram for (d) + (f)
Note
1st B1ft ft their intercept (within one square). You may have to extend their line.
2nd B1 for correct gradient i.e. parallel to given line (Allow 1 square out when t = 80)
(g)
t = 40, p = 105.84... from equation or graph.    awrt 106    M1 A1
Note
M1 for clear use of their equation with t = 40 or correct value from their graph.
A1 for awrt 106. Correct answer only (2/2), otherwise look for evidence on graph to award M1
[18]
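The working for parts (a), (e) and (g) can be reproduced from the summary sums given in the question. The sketch below is one way to do that in Python; the helper name `s_cross` is a hypothetical choice for illustration.

```python
def s_cross(sum_x: float, sum_y: float, sum_xy: float, n: int) -> float:
    """Return Sxy = sum(xy) - sum(x) * sum(y) / n (pass x twice for Sxx)."""
    return sum_xy - sum_x * sum_y / n


n = 7
sum_t, sum_p = 341, 833
s_tt = s_cross(sum_t, sum_t, 18181, n)   # uses sum of t squared
s_tp = s_cross(sum_t, sum_p, 42948, n)   # uses sum of t times p

b = s_tp / s_tt                          # gradient of the line of p on t
a = sum_p / n - b * sum_t / n            # intercept: mean(p) - b * mean(t)
p_at_40 = a + b * 40                     # estimate for a 40 year old patient

print(round(a, 1), round(b, 2), round(p_at_40))
```

Keeping b and a unrounded until the final step avoids the small drift that appears if 45.5 and 1.51 are substituted directly.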
3.
(a)
Spp = 38125 − 445²/10 = 18322.5    awrt 18300    M1 A1
Spt = 26830 − (445 × 240)/10 = 16150    awrt 16200    A1
Note
M1 for seeing a correct expression: 38125 − 445²/10 or 26830 − (445 × 240)/10
If no working seen, at least one answer must be exact to score M1 by implication.
(b)
r = "16150" / √("18322.5" × 21760)    Using their values for method    M1
= 0.8088...    awrt 0.809    A1
Note
Square root and their values with 21760 all in the right places required for method.
Anything which rounds to (awrt) 0.809 for A1.
(c)
As the temperature increases the pressure increases.    B1
Note
Require a correct statement in context using temperature/heat and pressure for B1.
Don't allow "as t increases p increases".
Don't allow proportionality.
"Positive correlation" only is B0 since there is no interpretation.
[6]
4.
(a)
Q2 = 53,
Q1 = 35,
Q3 = 60
B1, B1, B1
Note
1st B1 for median
2nd B1 for lower quartile
3rd B1 for upper quartile
(b)
Q3 − Q1 = 25 ⇒ Q1 − 1.5 × 25 = −2.5 (no outlier)    M1
Q3 + 1.5 × 25 = 97.5 (so 110 is an outlier)    A1
Note
M1 for attempt to find one limit
A1 for both limits found and correct. No explicit comment about outliers needed.
(c)
M1 A1ft A1ft
Note
M1 for a box and two whiskers
1st A1ft for correct position of box, median and quartiles. Follow through their values.
2nd A1ft for 17 and 77 or "their" 97.5 and * . If 110 is not an outlier then score A0 here.
Penalise no gap between end of whisker and outlier. Must label outlier, needn't be with * .
Accuracy should be within the correct square so 97 or 98 will do for 97.5
(d)
Σy = 461, Σy² = 24 219    B1, B1
Syy = 24219 − 461²/10 = 2966.9 (*)    B1cso
Note
1st B1 for Σy. N.B. (Σy)² = 212521 and can imply this mark
2nd B1 for Σy² or at least three correct terms of Σ(y − ȳ)² seen.
3rd B1 for complete correct expression seen leading to 2966.9. So all 10 terms of Σ(y − ȳ)²
(e)
r = −18.3 / √(3463.6 × 2966.9) or −18.3 / 3205.64... = −0.0057    AWRT −0.006 or −6 × 10⁻³    M1 A1
Note
M1 for attempt at correct expression for r. Can ft their Syy for M1.
(f)
r suggests correlation is close to zero so parent's claim is not justified    B1
Note
B1 for comment rejecting parent's claim on basis of weak or zero correlation
Typical error is "negative correlation so comment is true" which scores B0
Weak negative or weak positive correlation is OK as the basis for their rejection.
[14]
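Parts (a), (b) and (d) can be checked with a few lines of Python. The quartile rule coded here (round n × fraction up to the next position, or average two neighbours when it is a whole number) is the convention usually taught for S1; treat it as an assumption, since the scheme only states the answers. The function name `s1_quantile` is hypothetical.

```python
import math


def s1_quantile(sorted_data, fraction):
    """Quartile by the assumed S1 rule: round n*fraction up, or average if whole."""
    n = len(sorted_data)
    pos = n * fraction
    if pos == int(pos):
        k = int(pos)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(pos) - 1]


calls = [17, 23, 35, 36, 51, 53, 54, 55, 60, 77, 110]
q1, q2, q3 = (s1_quantile(calls, f) for f in (0.25, 0.5, 0.75))

iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [y for y in calls if y < lower or y > upper]

# Part (d): Syy for the remaining students once the outlier is omitted.
remaining = [y for y in calls if y not in outliers]
s_yy = sum(y * y for y in remaining) - sum(remaining) ** 2 / len(remaining)

print(q1, q2, q3, lower, upper, outliers, round(s_yy, 1))
```

Checking both fences, not just the upper one, is what establishes that 110 is the only outlier.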
5.
(a)
(£) 17    Just 17    B1
(b)
Σt = 212 and Σm = 61 (Accept as totals under each column in qu.)    B1, B1
Stm = 2485 − (61 × 212)/10 = 1191.8    awrt 1190 or 119 (3sf)    M1, A1
Stt = 983.6 (awrt 984) and Smm = 1728.9 (awrt 1730) (or 98.4 and 173)    A1, A1
M1 for one correct formula seen, ft. their Σt, Σm
[Use 1st A1 for 1 correct, 2nd A1 for 2 etc]
(c)
r = 1191.8 / √(983.6 × 1728.9) = 0.913922...    awrt 0.914    M1, A1ft, A1
M1 for attempt at correct formula, 2485 / √(2101 × 5478) scores M1A0A0
A1ft ft. their values for Stt etc from (b) but don't give for Stt = 5478 etc (see above)
Answer only (awrt 0.914) scores 3/3, 0.913 (i.e. truncation) can score M1A1ft by implication.
(d)
0.914 (Must be the same as (c) or awrt 0.914)    B1ft (r<1)
e.g. linear transformation, coding does not affect coefficient (or recalculate)    dB1
2nd B1 dependent on 1st B1. Accept Σm = 261, Σm² = 8541, Σtm = 6725 → 0.914
(e)
0.914 suggests longer spent shopping the more spent. (Idea more time, more spent)    B1
0.178 different amounts spent for same time.    B1
One mark for a sensible comment relating to each coefficient
For 0.178 allow "little or no link between time and amount spent". Must be in context.
Just saying 0.914 is strong +ve correlation between amount spent and time shopping and 0.178 is weak
correlation scores B0B0.
(f)
e.g. might spend short time buying 1 expensive item OR might spend a long time checking for bargains,
talking, buying lots of cheap items.    B1g
B1g for a sensible, practical suggestion showing that other factors might affect the amount spent.
E.g. different day (weekend vs weekday) or time of day (time spent queuing if busy)
[15]
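The "or recalculate" route in part (d) can be illustrated numerically. The sketch below computes r once from the coded sums given in the question and once from the alternative sums for the actual amounts that the scheme accepts (261, 8541, 6725). The function name `pmcc_from_sums` is an illustrative choice.

```python
import math


def pmcc_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return r from raw sums, via Sxx, Syy and Sxy."""
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Coded amounts m (amount spent minus 20): sums from the question.
r_coded = pmcc_from_sums(10, 212, 61, 5478, 2101, 2485)

# Actual amounts m + 20: the alternative sums accepted in the scheme.
r_actual = pmcc_from_sums(10, 212, 261, 5478, 8541, 6725)

print(round(r_coded, 3), round(r_actual, 3))
```

Adding a constant to every m shifts the mean by the same constant, so each deviation from the mean, and hence Smm and Stm, is unchanged.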
6.
(a)
Sxx = 10164 − 272²/8 = 916    Any one method, cao    M1, A1
Syy = 13464 − 320²/8 = 664    cao    A1
Sxy = 11222 − (272 × 320)/8 = 342    cao    A1
(Or 114.5, 83 & 42.75)
(b)
r = 342 / √(916 × 664) = 0.43852    M1A1ftA1
formula, all correct (√608224), 0.439
(c)
Slight / weak evidence,    B1
students perform similarly in press-ups and sit-ups    B1
context for +ve
(d)
x̄ = 272/8 = 34    M1A1
s = √(10164/8 − 34²) = √114.5 = 10.700    M1A1
method includes √, awrt 10.7 OR divisor (n − 1) awrt 11.4
(e)
a = 1.96 × 10.700 = 20.9729 (or 22.4 divisor (n − 1))    1.96 B1
1.96 × s, 21.0 or 22.4    M1A1
(f)
Press-ups discrete, Normal continuous    B1
Not a very good assumption    B1 dep
[18]
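Parts (d) and (e) can be checked with Python's standard library. This sketch uses divisor n for the standard deviation, as in the main line of the scheme, and obtains the 1.96 multiplier from `statistics.NormalDist` rather than from tables; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, sum_x, sum_xx = 8, 272, 10164

mean = sum_x / n
sd = math.sqrt(sum_xx / n - mean ** 2)   # divisor n, as in the scheme

# P(mu - a < X < mu + a) = 0.95 leaves 0.025 in each tail.
z = NormalDist().inv_cdf(0.975)          # about 1.96
a = z * sd

# Confirm the probability within a of the mean under N(mean, sd^2).
model = NormalDist(mean, sd)
prob = model.cdf(mean + a) - model.cdf(mean - a)

print(mean, round(sd, 1), round(a, 1), round(prob, 2))
```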
7.
(a)
[Scatter diagram: c (axis 500 to 600) plotted against h (axis 160 to 200)]
Labels (not x, y)    B1
Sensible scales, allow axis interchange    B1
Points    B2 (−1 ee)
(b)
Shc = 884484 − (1562 × 5088)/9 = 1433⅓    correct use of S    M1
1433⅓; 1433.3    A1
Shh = 1000 2/9; Scc = 2550    1000 2/9, 1000.2; 2550    A1; A1
(NB: accept ÷ 9, i.e. 159 7/27; 111 11/81; 283⅓)
(c)
r = 1433⅓ / √(1000 2/9 × 2550)    substitution in correct formula    M1 A1 ft
= 0.897488.    AWRT 0.897 (accept 0.8975)    A1
(d)
Taller people tend to be more confident    context    B1
(e)
b = 1433.3 / 1000.2 = 1.433014..    M1
a = 5088/9 − (1433.3/1000.2) × 1562/9 = 316.6256    allow use of their b    M1
c = 317 + 1.43h (3sf)    A1
(f)
h = 180 ⇒ c = 574.4 or 574.5683.    subst. of 180    M1
574 to 575    A1
(g)
161 ≤ h ≤ 193    B1
[18]
NB (a) No graph paper 0/4
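Part (b) asks for exact values, which is easy to lose on a calculator. As an illustration, the sketch below recomputes Shc, Shh and Scc from the table in the question using Python's `fractions.Fraction`, so the thirds and ninths are kept exactly. The helper name `s_exact` is hypothetical.

```python
import math
from fractions import Fraction

# Heights and confidence scores from the table in question 7.
h = [179, 169, 187, 166, 162, 193, 161, 177, 168]
c = [569, 561, 579, 561, 540, 598, 542, 565, 573]


def s_exact(xs, ys):
    """Return sum(xy) - sum(x) * sum(y) / n as an exact Fraction."""
    return sum(x * y for x, y in zip(xs, ys)) - Fraction(sum(xs) * sum(ys), len(xs))


s_hc = s_exact(h, c)   # 4300/3, i.e. 1433 1/3
s_hh = s_exact(h, h)   # 9002/9, i.e. 1000 2/9
s_cc = s_exact(c, c)   # 2550

r = float(s_hc) / math.sqrt(float(s_hh * s_cc))
print(s_hc, s_hh, s_cc, round(r, 3))
```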
8.
(a)
Sxy = 204.95 − (48.1 × 52.8)/7 = −157.86142    correct method, AWRT −158/−22.6    M1 A1
Sxx = 155.92428    AWRT 156/22.3    A1
Syy = 214.95714    AWRT 215/30.7    A1
(b)
r = −157.86142 / √(155.92428... × 214.95714...) = −0.862269 (awrt −0.862)    M1 A1 ft A1
SR: No working, r = −0.862    B1 only
(c)
(i)
−0.862    B1 ft
(ii)
As sales at one petrol station increases, the other decreases;
limited pool of customers; close one garage    B1
2
[9]
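To see where the coded sums in the question come from, the sketch below applies the coding x = (p − 4365)/100 and y = (q − 4340)/100 to the daily sales in the table and then computes Sxy, Sxx, Syy and r. It is a check on the arithmetic under that reading of the table (first column p, second column q), not part of the scheme.

```python
import math

# Daily sales, Monday to Sunday, as read from the table in question 8.
p = [4760, 5395, 5840, 4650, 5365, 4990, 4365]
q = [5380, 4460, 4640, 5450, 4340, 5550, 5840]

# Apply the coding given in the question.
x = [(v - 4365) / 100 for v in p]
y = [(v - 4340) / 100 for v in q]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum(a * a for a in x) - sum_x ** 2 / n
s_yy = sum(b * b for b in y) - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(sum_x, 1), round(sum_y, 1), round(sum_xy, 2), round(r, 3))
```

The computed Sxy is negative, which agrees with the interpretation in (c)(ii) that sales at one station fall as sales at the other rise.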
1.
Typically candidates successfully used the correct formula in order to calculate the product
moment correlation coefficient in part (a). However, a number of candidates lost the accuracy
mark by only giving a rounded answer to two decimal places. Providing an interpretation of
their value of the correlation coefficient was less straightforward. Most frequently candidates
made general remarks and described the correlation as positive without relating this to the
context of the question. Of those who did attempt to provide an interpretation, many failed to
appreciate that it was the attendance at the matches being compared to the total number of goals
scored and not the number of home matches that were played.
Part (c) was answered well overall and correct answers were often justified by accompanying
statements which indicated that linear coding does not affect the product moment correlation
coefficient. Some candidates, however, seemed unaware of this fact and a common mistake was
to divide their original product moment correlation coefficient by 100. In addition many
candidates failed to recognise the significance of them being asked to write down their answer
and chose to perform a full calculation in order to obtain the product moment correlation
coefficient, which sometimes led to processing errors.
2.
This was a high scoring question for most candidates. The calculations in parts (a) and (b) were
answered very well with very few failing to use the formulae correctly. Part (c) received a good
number of correct responses but many still failed to interpret their value and simply described
the correlation as strongly positive. The scatter diagram was usually plotted correctly and most
knew how to calculate the equation of the regression line although some used Spp instead of Stt
and some gave their final equation in terms of y and x instead of p and t. Plotting the line in part
(f) proved quite challenging for many candidates and a number with the correct equation did not
have the gradient correct. Part (g) was usually well done but some chose to use their graph
rather than their equation of the line and lost the final accuracy mark.
3.
The vast majority scored full marks in part (a). The most common reason for losing marks for
the correlation coefficient was for rounding to less than 3 significant figures without having
stated the more accurate answer first. A large proportion of candidates still believe that stating
"it's a high level of correlation" will be enough to gain the mark for interpretation. A fully
contextual comment is required here, using the named variables of pressure and temperature and
not just the letters p and t.
4.
This question was usually answered well. In part (b) some did not realise that they needed to
check the lower limit as well in order to be sure that 110 was the only outlier. Part (c) was
answered very well although some lost the last mark because there was no gap between the end
of their whisker and the outlier. Part (d) was answered very well and most gave the correct
values for Σy and Σy² in the appropriate formula. A few tried to use the Σ(y − ȳ)²
approach but this requires all 10 terms to be seen for a complete "show that" and this was rare.
Part (e) was answered well although some gave the answer as −5.7 having forgotten the × 10⁻³, or
failed to interpret their calculator correctly. Many candidates gave comments about the
correlation being small or negative in part (f) but they did not give a clear reason for rejecting
the parent's belief. Once again the interpretation of a calculated statistic caused difficulties.
5.
Most candidates knew how to carry out the required calculations in parts (b) and (c) and these
were usually completed accurately and with suitable working shown. Although the majority
gave an answer of 17 in part (a) 60 and 3 were sometimes seen. The coding on the variable
m also caused some confusion with candidates using a value of 261 for m and then trying to
combine this with the sums of squares given in the question.
In part (d) most knew that the correlation coefficient remains unchanged but some thought the
value should be increased by 20 and a few candidates found new values of
Σm, Σm² and Σtm and then seemed surprised when their correlation coefficient was
unchanged. In part (e), the commonest response was to simply state that 0.914 represented
strong positive correlation whilst 0.178 was weak correlation rather than attempting to interpret
the values in terms of time spent shopping and amount of money spent as required. There were
a number of sensible practical suggestions offered in response to part (f).
6.
Parts (a) and (b) were extremely well answered by candidates; the value of 664 for Syy was
occasionally miscopied as 646 from part (a) to part (b). Candidates found it surprisingly difficult
to obtain both marks in part (c), with a contextual relationship frequently being omitted. In part
(d) the calculation of the mean was straightforward for nearly all candidates. Those candidates
who were able to provide a correct formula also accurately found the standard deviation;
however, too many candidates at this level were quoting an incorrect formula. Part (e) proved a
good discriminator, with relatively few concise solutions; some candidates managed to obtain
the correct value of a after a page or so of working. Only a handful of candidates were able to
see that the number of press-ups is a discrete variable, whereas normal distributions are
continuous.
7.
This question was familiar to most candidates and many of them answered it very well. This
being said, too many used scales that were not sensible for the scatter diagram and far too many
ignored the instruction to find the exact value. The interpretation of the correlation coefficient
was rarely given in terms of the context of the question and many candidates did not give the
values of a and b to 3 significant figures in spite of previous advice.
8.
Parts (a) and (b) were generally well answered with many candidates gaining full marks. This
being said, it was not unusual to see ridiculous values for the correlation coefficient and for
candidates to follow this through into part (c). Many candidates realised that the value of the
correlation coefficient would be the same in (c)(i) and those that attempted (c)(ii) often did so
without reference to the context of the question.