Statistics Booklet For NEW A Level AQA
A-level Biology Statistics booklet
CONTENTS
About Statistics
Descriptive Statistics
Range
Standard deviation
± 2 Standard deviations
Worked examples
Questions
Worked examples
Questions
Questions
Questions
Student’s t test
Questions
About Statistics -
Measuring ‘Spread’
(Descriptive statistics)
In the Biology A-level, we will use descriptive statistics to measure the ‘spread’ of data.
Descriptive statistics are used to describe data. For example, if you were investigating the
number of visitors to a beach in August (nice job if you can get it!), you might draw a graph
to see how the number of visitors varied each day, work out the average number of visitors
each day (using mean, mode or median), or work out the range of visitor numbers.
This would all be descriptive statistics. In this booklet, descriptive statistics also involve using:
Range
± 2 Standard Deviations from the mean
± 1.96 Standard Errors of the mean
With these you can NOT use the term “significant”.
The statistical tests covered later are:
Chi squared
Spearman’s rank correlation coefficient
Student’s t-test
With these you CAN use the term “significant”.
Measuring ‘Spread’
[Figure: two distribution curves, labelled ‘Distribution 1’ and ‘Distribution 2’]
‘Distribution 1’ is tall and thin and ‘Distribution 2’ is short and fat, yet they have the same
mean (and these data also have the same mode and the same median).
Nevertheless, the distributions have different spreads.
Range
To measure the spread, we could calculate the range (the largest value minus the smallest
value). Calculating the range is simple, but it is often not a good measure of spread because
it depends only on the two most extreme values, which may be outliers.
The standard deviation is often a better measure of spread about the mean.
Standard deviation
Using standard deviation is better than the range as it uses all the observations, and is
less affected by the outliers.
A “thin” curve means that most values remain close to the average, and the standard
deviation is small.
A “fat” curve means that there is a wider spread of values about the mean, and the
standard deviation is large.
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
SD = standard deviation
x = value
x̄ = mean
n = number of values
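As a minimal sketch of the formula above, the Python below works through the calculation one step at a time. The function name `sample_sd` and the example values are illustrative assumptions and are not taken from the booklet.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    # Sum of the squared differences between each value and the mean
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))

# Made-up example values, for illustration only
example = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(example), 2))
```

Python's built-in `statistics.stdev` uses the same n - 1 formula, so it can be used to check a hand calculation.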
± 2 Standard Deviations
When data has a ‘normal distribution’ as shown in the lovely ‘bell-shaped’ graph above, we
can see that just over 95% of the data is within two standard deviations either side of the
mean. We can make use of this when comparing data.
You have found the following ages of lions in two different zoos. The lions were randomly
selected from all the lions in each zoo.
Age of Lions at Bristol Zoo (months) Age of Lions at London Zoo (months)
36 46
31 50
35 48
24 49
21 51
47 49
You can use the calculators to calculate the mean and standard deviation, using the
instructions on the calculator to help!
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
We can draw a bar chart of the mean and plot the ± 2 Standard deviations from the mean
and look at the overlap of the bars.
You have measured the sizes of the different genders of a species of tropical fish.
Female Male
mean 21 10
SD 3 1.6
2 x SD 6 3.2
Mean + (2 x SD) 27 13
Mean - (2 x SD) 15 7
We can draw a bar chart of the mean and plot the ± 2 Standard deviations from the mean
and look at the overlap of the bars.
There is no overlap between the ± 2 standard deviation bars (females 15 to 27, males 7 to 13).
This indicates that the difference in the means (the size of the fish) is unlikely to be
due to chance.
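The overlap check for the fish data can be sketched in a few lines of Python. The function names `two_sd_range` and `ranges_overlap` are hypothetical helpers introduced only for this illustration; the means and standard deviations are the ones in the table above.

```python
def two_sd_range(mean, sd):
    """Return (lower, upper) limits of mean ± 2 standard deviations."""
    return (mean - 2 * sd, mean + 2 * sd)

def ranges_overlap(range_a, range_b):
    """True if two (lower, upper) ranges share any values."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]

female = two_sd_range(21, 3)     # mean 21, SD 3
male = two_sd_range(10, 1.6)     # mean 10, SD 1.6

print("Female range:", female)
print("Male range:", male)
print("Overlap?", ranges_overlap(female, male))
```

If the function reports no overlap, the ± 2 SD bars on the bar chart would not overlap either.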
Compare the data for resting heart rate whilst watching two different TV shows. Describe
the data.
mean
SD
2 x SD
Mean + (2 x SD)
Mean - (2 x SD)
Plot the bar chart on graph paper and draw on the 2 x SD bars. You may find you can
describe the data without plotting the bar chart though.
± 2 Standard deviations Question 2. Drugs
Compare the data shown in the graph above. This figure depicts two experiments, A and B.
In each experiment, control and treatment measurements were obtained. The graph shows
the mean difference between control and treatment for each experiment. A positive number
denotes an increase; a negative number denotes a decrease. The bars show the 2 x SD for
those differences.
There is ……………………………………………….
This ……………………………………………………………………………………………
………………………………………………………………………………………………….
Standard Error of the Mean and 95% confidence interval
The Standard Error (SE) of a mean is calculated using the following formula:
$$SE = \frac{SD}{\sqrt{n}}$$
SD = standard deviation
n = number of values in the sample
When you take a sample and calculate the mean, it is important to remember that this is
only an estimate of the true mean for the whole of the population you are measuring. If you
took a second sample, you would probably arrive at a slightly different estimate of the
mean. There is no reason to suppose that the real mean will be exactly equal to the sample
mean. It is likely to be close to it, however, and the amount by which it is likely to differ
from the estimate can be found from the standard error.
The 95% Confidence Interval is then calculated using the following formula:
$$95\%\ CI = \bar{x} \pm (1.96 \times SE)$$
We are 95% confident that the true mean value of the population from which the
sample was taken lies between the upper and lower confidence limits.
If the intervals of two calculated means do not overlap, we are 95% confident that
these means are different.
NOTE: People find the terms ‘standard error’ and ‘standard deviation’ confusing. We use
the term ‘standard deviation’ when we are talking about distributions, either of a sample or
a population. We use the term ‘standard error’ when we are talking about an estimate
found from a sample. If we want to say how good our estimate of the mean measurement
is, we quote the standard error of the mean. If we want to say how widely scattered the
measurements are, we quote the standard deviation.
Standard error of the Mean and 95% Confidence Interval
The owner of a pet shop wants to work out the average mass of small guinea pigs in her
shop. She only had time to measure the mass of a few, which she caught at random.
Calculate the mean mass with 95% confidence intervals, using the sample data below.
Mass of guinea pigs (g): 19.4 21.4 22.3 22.1 20.1 23.8 24.6 19.9 19.1
mean 21.4
n 9
√n 3
SD 1.95
SE (SD ÷ √n) 0.65
1.96 x SE 1.27
Mean + 1.96 x SE 22.7
Mean - 1.96 x SE 20.1
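The rows of the table above can be reproduced with a short Python sketch. The function name `ci95` is a hypothetical helper introduced for this illustration; the masses are the nine values listed in the example.

```python
import math
import statistics

masses = [19.4, 21.4, 22.3, 22.1, 20.1, 23.8, 24.6, 19.9, 19.1]

def ci95(values):
    """Return (mean, lower, upper) for mean ± 1.96 × standard error."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample SD, dividing by n - 1
    se = sd / math.sqrt(len(values))     # SE = SD ÷ √n
    margin = 1.96 * se
    return mean, mean - margin, mean + margin

mean, lower, upper = ci95(masses)
print(f"Mean {mean:.1f} g, 95% CI {lower:.1f} g to {upper:.1f} g")
```

Rounding is left until the final step, so intermediate values may differ slightly from a calculation that rounds at every row of the table.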
Standard error of the mean and 95% confidence interval
A resident on the Isle of Wight wants to know if the size of limpets is different between the
upper and middle ledges at Bembridge rocky shore. The resident collects the following
data. (Obviously the resident is not a proper scientist otherwise they would never have
presented the data in such a terrible way!) (Sizes in mm, measured using callipers:)
Middle: 43.9, 38.4, 39.4, 44.7, 40.3, 37.8, 35.6, 56.7, 47.3, 36.0, 38.7, 37.7, 41.6, 42.5,
48.3, 34.9, 21.3, 42.5, 36.2, 46.8, 41.6, 43.5, 48.6, 39.5, 50.2, 46.9, 48.7, 42.9, 37.6, 53.4
Upper: 40.3, 35.0, 36.2, 40.8, 31.2, 30.2, 27.3, 37.7, 26.5, 25.5, 33.6, 24.8, 42.3, 38.6, 30.7,
23.2, 37.3, 34.6, 32.3, 33.0, 34.6, 38.0, 28.3, 22.5, 37.0, 45.0, 32.8, 36.9, 44.2, 28.6
The 95% CI of the size of limpets on the Middle ledge is 39.71 mm to 44.53 mm.
The 95% CI of the size of limpets on the Upper ledge is 31.45 mm to 35.81 mm.
Some people prefer to use graph paper to draw a bar chart with bars to show the
confidence intervals, but it is not essential to do this. The 95% CI bars do not overlap.
There is no overlap in the 95% confidence intervals of the two calculated means
(the mean size of limpets on the Middle ledge and on the Upper ledge), so we are 95%
confident that these means are different.
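With 30 values per ledge the arithmetic is tedious by hand, so a sketch in Python may help to check it. The variable names and the helper `ci95` are assumptions made for this illustration; the data are the limpet sizes listed above.

```python
import math
import statistics

middle_ledge = [43.9, 38.4, 39.4, 44.7, 40.3, 37.8, 35.6, 56.7, 47.3, 36.0,
                38.7, 37.7, 41.6, 42.5, 48.3, 34.9, 21.3, 42.5, 36.2, 46.8,
                41.6, 43.5, 48.6, 39.5, 50.2, 46.9, 48.7, 42.9, 37.6, 53.4]
upper_ledge = [40.3, 35.0, 36.2, 40.8, 31.2, 30.2, 27.3, 37.7, 26.5, 25.5,
               33.6, 24.8, 42.3, 38.6, 30.7, 23.2, 37.3, 34.6, 32.3, 33.0,
               34.6, 38.0, 28.3, 22.5, 37.0, 45.0, 32.8, 36.9, 44.2, 28.6]

def ci95(values):
    """Return (lower, upper) limits of mean ± 1.96 × standard error."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se, mean + 1.96 * se

middle_ci = ci95(middle_ledge)
upper_ci = ci95(upper_ledge)
# Two intervals overlap if each one starts before the other one ends
overlap = middle_ci[0] <= upper_ci[1] and upper_ci[0] <= middle_ci[1]

print(f"Middle ledge: {middle_ci[0]:.2f} mm to {middle_ci[1]:.2f} mm")
print(f"Upper ledge:  {upper_ci[0]:.2f} mm to {upper_ci[1]:.2f} mm")
print("Overlap?", overlap)
```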
In the UK almost a third of adults are obese. A new diet called the South Beach diet has
become popular, and the NHS wants to assess how effective it is compared to a more
traditional low calorie diet. The following data is available. What conclusions should be
drawn from the available data?
% Reduction in BMI (body mass index) after 4 weeks of completion of diet
South Beach diet Traditional Low Calorie diet
2.1 2.2
2.0 1.8
1.8 1.9
2.0 2.0
1.9 1.9
2.4
There are six values for the South Beach diet so n= 6, and five values for the traditional low
calorie diet so n=5.
The 95% confidence interval for the South Beach diet is 1.8 to 2.2 percent.
The 95% confidence interval for the traditional low calorie diet is 1.9 to 2.1 percent.
There is an overlap in the 95% confidence intervals of the two calculated means (of the %
reduction in BMI for the South Beach diet and the traditional low calorie diet), so we cannot
be 95% confident that these means are different.
Standard error of the mean and 95% Confidence Interval
4 2 58 1 22 5 7 17 1 4
Calculate the mean count of the orchid (per metre²) with 95% confidence intervals.
mean
n
√n
SD
SE (SD ÷ √n)
1.96 x SE
Mean + 1.96 x SE
Mean - 1.96 x SE
is between …………………………………..
Standard error of the mean and 95% confidence interval
Question 2. Beans
Q: What's the fastest vegetable?
A: A runner bean
There is an/no overlap in the ……………………………….. of the two calculated means (of the mean
height of Paul or Simon’s bean plants).
Ultra-competitive Paul was a bit disappointed at the result of his last experiment, so this
time Paul has decided to grow his beans using a super new fertiliser, whereas Simon just
likes to use horse-manure. Can a conclusion be made about which fertiliser is best?
Height of Runner bean plants (cm)
Paul’s plants (grown using new fertiliser) Simon’s plants (grown using horse manure)
103 160
113 145
117 120
123 125
119 126
98 212
124 123
120 129
There ………………………………………………………………………………………………………..
We ……………………………………………………………………………………………………………
More Standard error and 95% confidence interval problems.
4. Spotted knapweed is a common weed in the USA. Two methods, chemical control and
biological control, have been used to reduce the numbers of spotted knapweed plants.
Use Standard error and 95% confidence intervals to determine whether biological
control is better than chemical control.
2 2
15 3
3 3
20 5
3 4
16 3
2 2
5. Two fields, A and B, were used to grow the same crop. Random samples of crop plants
from each plot were collected and their mass determined. The results are shown in the
table. Does the evidence suggest that previous use of the field affects the mass of the
crop?
1 14.5 6.4
2 16.7 9.8
3 17.4 12.9
4 17.5 16.2
5 17.5 17.1
6 17.5 17.1
7 17.5 17.1
6. Mayflies are insects which lay their eggs in streams and rivers. The nymphs hatch from
the eggs and live in the water for several years. Mayfly nymphs were collected by
disturbing the gravel of a stream bed. A net placed immediately downstream caught any
animals which were washed out of the gravel. Eight samples were collected from
shallow, fast-flowing parts of the stream and eight from deeper, slow-flowing parts. The
results are given in the table. What can be concluded about the differences in the
numbers of nymphs found in shallow water and deep water?
7. The table shows the relative thickness of the walls of the aorta and vena cava in four
different people. Is it possible to show that the walls of arteries are thicker than the
walls of veins?
Name      Thickness / µm
          Artery    Vein
John      420       180
James     490       175
Daniel    370       165
David     120       120
8. A general practitioner has been investigating whether the diastolic blood pressure of
men aged 20-44 differs between farm workers and firemen. For this purpose, she has
obtained a random sample of 72 firemen and 48 farm workers and calculated the mean
and standard deviations. What conclusions can be made?
Farm workers Firemen
Using what we know to make inferences
about what we don’t know
Inferential Statistics
You should be aware that inferential statistics are used to test a theory, known as a
hypothesis. Before we choose a statistical test we should write a null hypothesis. The table
shows how hypotheses can be turned into null hypotheses.
Once the null hypothesis is stated, a statistical test is then chosen. This either supports or
fails to support the null hypothesis.
Chi-squared test
All chi-squared tests are concerned with counts of things (frequencies) that you can put into
categories. For example, you might be investigating flower colour and have counted the
numbers (frequencies) of red flowers and white flowers (categories). Or you might be
investigating human health and have frequencies of smokers and non-smokers.
The test looks at the frequencies you obtained when you counted them and compares them
with the frequencies you might expect to get in order to determine whether the difference is
significant or not.
Chi-squared Worked Example 1. Snails on the Seashore
You have been wandering about on a seashore and you have noticed that a small snail (the
flat periwinkle) seems to live only on certain types of seaweed. You decide to investigate
whether the animals prefer certain types of seaweed by counting numbers of animals on
the different types of seaweed. You end up with the following data:
Null hypothesis
In other words, the periwinkle does not have a preference for which seaweed it lives on.
This is now used to work out the ‘expected’ frequencies.
Expected Frequencies
Our null hypothesis is that there is no difference between the observed and expected
frequencies. If this were exactly the case there would be no differences in the frequencies
over all of our categories (i.e. the five types of seaweed). The best estimate we could make
therefore would be to add up all our observed frequencies and divide by the number of
categories. So our expected frequency for each category would be:
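45 + 38 + 10 + 5 + 2 = 100
100 ÷ 5 = 20
Chi-squared is then calculated from the observed and expected frequencies:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed frequency
E = Expected frequency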
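The calculation for the periwinkle data can be sketched in Python as follows. It assumes the standard chi-squared statistic, the sum of (O - E)² ÷ E over all categories; the function name `chi_squared` is a hypothetical helper, and the observed counts are the five values added up above.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [45, 38, 10, 5, 2]                    # one count per type of seaweed
expected_each = sum(observed) / len(observed)    # total ÷ number of categories
expected = [expected_each] * len(observed)

print("Expected frequency per category:", expected_each)
print("Chi-squared:", round(chi_squared(observed, expected), 1))
```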
The ‘critical value’ & the ‘degrees of freedom’
Before we can interpret our results we need to work out the ‘critical value’. The critical value
represents the borderline between accepting or rejecting our null hypothesis. We get the
critical value from the data sheet, but this depends on the number of ‘degrees of freedom’.
'Degrees of freedom' is a term that can be a bit confusing. A simple (though not completely
accurate) way of thinking about degrees of freedom is to imagine you are picking people to
play in a team. You have eleven positions to fill and eleven people to put into those
positions. How many decisions do you have? In fact you have ten, because when you come
to the eleventh person, there is only one person and one position, so you have no choice.
You thus have ten 'degrees of freedom' as it is called. So 11 categories but only 10
‘degrees of freedom’. Hence, degrees of freedom = number of categories - 1
Likewise, the periwinkle snail was found on the serrated wrack, bladder wrack, egg wrack,
spiral wrack, or other seaweed. There are five categories (five different types of seaweed),
so only 4 degrees of freedom.
Chi-squared gives a number which indicates how big the difference is between the
observed data and the expected data.
If the Chi-squared value is small, then there is a small difference between the observed and
the expected data. This means the null hypothesis is accepted (likely to be correct). In other
words, the snails don’t mind which seaweed they live on!
If the Chi-squared value is huge, then there is a huge difference between the observed and
the expected data. This means the null hypothesis is rejected. In other words, the snails do
indeed have a preference for living on a particular seaweed.
Looking at the table above we can see that the critical value of Chi-squared at 5%
significance (p=0.05) and 4 degrees of freedom is 9.49.
Our calculated value is 79.9
The calculated value is bigger (much bigger!) than the critical value. In a chi-squared test
this means we must reject the null hypothesis. In doing this we are saying that the snails
are not scattered about the various sorts of seaweed randomly. Biologists would infer that
this means they seem to prefer living on certain species.
It is worth pointing out that statistics of this kind tell you nothing about the biology of the
situation. All we are saying is that our observed frequencies are different to our expected
ones. (For example, you could criticise our approach by pointing out that it might be that
there are not equal amounts of each type of seaweed on the shore for the animals to live
on.)
Chi-squared Worked Example 2. Birds on the Bird-Table
Three neighbours have very similar bird-tables in their gardens. Bill owns the middle garden
and is a keen birdwatcher but he suspects that Kate, one of his neighbours, is actively
encouraging birds away from Bill’s garden into her own somehow. Is Bill right?
Garden    Observed frequency (the number of birds visiting the garden on one day in March)
Bill      112
Kate      145
Chris     139
TOTAL     396
Null hypothesis
We would expect the same numbers of birds to visit each garden if the null hypothesis is
correct. A total of 396 birds were seen, so we would expect 132 in each garden (396 ÷ 3)
Degrees of Freedom
We have three categories (i.e. three gardens) so the degrees of freedom is 3-1 = 2
Looking at the table above we can see that the critical value of Chi-squared at 5%
significance (p=0.05) and 2 degrees of freedom is 5.99.
The calculated value is smaller than the critical value. In a chi-squared test this means we
must accept the null hypothesis. (In other words, Bill is not right! – Kate is not secretly
putting out expensive bird-food to encourage the birds into her garden!)
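The whole bird-table example, from expected frequencies to the final decision, can be sketched in Python. The variable names are assumptions made for this illustration; the counts and the critical value of 5.99 are the ones quoted in the example.

```python
observed = {"Bill": 112, "Kate": 145, "Chris": 139}

total = sum(observed.values())
expected = total / len(observed)          # same expected count for each garden

# Chi-squared: sum of (O - E)^2 / E over the three gardens
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())

degrees_of_freedom = len(observed) - 1    # number of categories - 1
critical_value = 5.99                     # quoted in the text for p = 0.05, 2 df

reject_null = chi_sq > critical_value
print(f"Expected per garden: {expected:.0f}")
print(f"Chi-squared = {chi_sq:.2f} with {degrees_of_freedom} degrees of freedom")
print("Reject the null hypothesis?", reject_null)
```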
Chi-squared Question 1. Mendel and his Peas
Mendel planted some round peas which grew into plants that produced a total of 556 peas.
423 were round peas and 133 were wrinkled peas. Mendel postulated that round is
dominant to wrinkled, and he worked out the expected ratio for two heterozygous parent
plants as 3:1. Do his experimental data support his 3:1 expected ratio?
…………………………………………………………………………………………………………..
…………………………………………………………………………………………………………..
Chi-squared Question 2. A zookeeper’s dilemma
…………………………………………………………………………………………………………..
…………………………………………………………………………………………………………..
Chi-squared Question 3. The West African bee-eaters
You have just returned from a 3 year stint in the jungles of western
Africa, where you studied the habitat selected by the native bee-eaters
(a family of birds that specialize in catching bees and wasps on the
wing, taking them to a perch, bashing their stingers out, and devouring
them. In a pinch, they will eat other flying or hopping insects, such as
grasshoppers). Several habitats were available to the bee-eaters. Is there evidence to
suggest that birds prefer a particular habitat?
…………………………………………………………………………………………………………..
…………………………………………………………………………………………………………..
4. One section of a river was trawled and four species of fish counted and frequencies
recorded. There were 15 Rudd, 15 Roach, 4 Dace and 6 Bream. Are the fish present
in the river in equal proportions?
6. The table below shows the number of patients requesting an urgent appointment to
see a Doctor on particular days of the week. Are the differences significant?
7. Cranes are large birds. Biologists have used DNA hybridisation to confirm the
relationships between different species of crane. They made samples of hybrid DNA
from the same and from different species. They measured the percentage of
hybridisation of each sample. The results are shown in the table. Are there any
differences which are statistically significant?
8. The table shows the number of cases of tuberculosis in the East Midlands between
2000 and 2005. Are the differences in number of cases of tuberculosis significant?
Spearman’s Rank
Spearman’s rank correlation is a statistical test that is carried out in order to assess the degree of
association between different measurements from the same sample. That is, if you are looking for a
positive or negative correlation between two variables.
When the points on a graph clearly fit onto a line of best fit it is easy to determine whether a correlation
exists. However, as the points become more scattered it becomes harder to make an accurate
judgement. This is where statistics is used: to clarify how confident we are that a correlation exists.
Spearman’s Rank Worked Example 1. Whales
Do blue whales get heavier as they get longer? From the table below it certainly looks as if they do. If
we drew a graph we would get an ‘uphill line’ (a positive correlation). However, to find out if the
correlation is statistically significant we must calculate Spearman’s rank correlation coefficient ( rs ).
Null hypothesis
When doing Spearman’s rank the two variables are used to construct the null
hypothesis. The null hypothesis always assumes there is no relationship.
“There is no correlation between the length of a blue whale and the mass of the whale.”
Rank the data
For each set of data assign ranks from lowest to highest. The lowest value in a column will be
given the rank of 1, the next smallest number will be given a 2 etc. If there are tied scores each of those
will share the ranks and be given the average (mean) rank.
For example, there are two whales with the same length (7 metres). These would be ranked 8 and 9, so
they are given the mean value. (i.e. 8+9 = 17 17÷2 = 8.5).
Length (m)   Rank (length)   Mass    Rank (mass)
1            1               1.5     1
1.5          2               2.3     2
2.3          3               3.6     4
3.4          4               7.1     5
4.4          5               2.6     3
4.6          6               13.3    11
6.2          7               7.5     6
7            8.5             11.2    8
7            8.5             12.1    10
8.7          10              11      7
10.5         11              12      9
12           12              18.2    12
Next we have to work out D²; this is the difference in the rankings, squared.
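Then we calculate the value of Spearman’s rank correlation, rs, using the equation below.
$$r_s = 1 - \frac{6 \sum D^2}{n^3 - n}$$
∑ = the sum of
n = the number of pairs of values
For the whale data, ∑D² = 47.5 and n = 12, so:
rs = 1 - (6 × 47.5) ÷ (12³ - 12)
= 1 - (285 ÷ 1716)
= 1 – 0.166
rs = 0.834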
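The ranking and the calculation of rs for the whale data can be sketched in Python. This is an illustration under stated assumptions: it uses the formula rs = 1 - 6∑D² ÷ (n³ - n), the function names `average_ranks` and `spearman_rs` are hypothetical helpers, and tied values share the mean rank as described in the ranking step.

```python
def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share the mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1       # rank of the first tied position
        count = ordered.count(v)           # how many values are tied
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(D^2) / (n^3 - n)."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

# Whale data from the ranked table (first and third columns)
lengths = [1, 1.5, 2.3, 3.4, 4.4, 4.6, 6.2, 7, 7, 8.7, 10.5, 12]
masses = [1.5, 2.3, 3.6, 7.1, 2.6, 13.3, 7.5, 11.2, 12.1, 11, 12, 18.2]

print(round(spearman_rs(lengths, masses), 3))
```

The two whales with a length of 7 both receive rank 8.5, matching the worked ranking.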
The first thing we notice is that the answer is a positive number, so we know that length
and mass are positively correlated (i.e. not negatively correlated).
The closer rs is to 1 or -1, the stronger the correlation. A perfect positive correlation has
an rs value of 1, a perfect negative correlation has a value of -1.
If the value lies between -1 and 1 we need to carry out a test for significance.
Before we can interpret our results we need to work out the ‘critical value’. The critical value
represents the borderline between accepting or rejecting our null hypothesis.
We have 12 paired values which gives us a critical value of 0.59 (for a positive correlation)
or -0.59 (for a negative correlation). This means that any value between +0.59 and +1 is a
statistically significant (positive) correlation.
The Spearman’s rank correlation coefficient, rs, between the length and mass of a blue
whale was +0.834. The calculated value is bigger than the critical value. In a Spearman’s
rank test this means we must reject the null hypothesis. In doing this we are saying that
there is a relationship (a positive correlation) between the length and mass of the blue
whale.
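The comparison with the critical value can be written as a one-line rule. The sketch below is an assumption about how to express it in code: the hypothetical function `is_significant` ignores the sign of rs and asks whether its size reaches the critical value.

```python
def is_significant(rs, critical_value):
    """True if rs is at least as far from zero as the critical value."""
    return abs(rs) >= abs(critical_value)

# Whale example: rs = 0.834, critical value 0.59 for 12 pairs
if is_significant(0.834, 0.59):
    print("Reject the null hypothesis: the correlation is significant.")
else:
    print("Accept the null hypothesis: the correlation is not significant.")
```

The sign of rs is used separately to say whether the correlation is positive or negative.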
Spearman’s Rank Worked Example 2. Heather in a Moorland
During monitoring of a moor, an ecologist collected data from an area of moorland that had been
restored in 2003. In order to assess whether a correlation existed between the two plant species she
studied or whether they were growing independently of one another, she used a quadrat to collect the
following data.
Number of Bilberry plants (per m²)    Number of Common Heather plants (per m²)
50 18
175 12
270 20
375 10
425 10
580 12
710 8
790 6
890 10
980 9
Null hypothesis
“There is no correlation between the number of Bilberry plants growing in an area and
the number of Common Heather plants growing in the area”
Rank the data & Calculate Spearman’s rank
Columns: Number of Bilberry plants (per m²), Rank Bilberry (variable 1), Number of Common
Heather plants (per m²), Rank Heather (variable 2), Difference (D), D²

Bilberry   Rank Bilberry   Heather   Rank Heather   D      D²
50         1               18        9              -8     64
175        2               12        7.5            -5.5   30.25
270        3               20        10             -7     49
375        4               10        5              -1     1
425        5               10        5              0      0
580        6               12        7.5            -1.5   2.25
710        7               8         2              5      25
790        8               6         1              7      49
890        9               10        5              4      16
980        10              9         3              7      49
∑D² = 285.5
n = 10
rs = 1 - (6 × 285.5) ÷ (10³ - 10)
= 1 - (1713 ÷ 990)
= 1 – 1.730
rs = - 0.730
Interpret the results
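The answer is a negative number, so we know that the relationship between the two plants
shows a negative correlation. We have 10 paired values which gives us a critical value
(from the table of critical values) of -0.65 (for a negative correlation). The calculated
value (-0.730) is further from zero than the critical value, so we must reject the null
hypothesis. There is less than 5% probability that the correlation between the numbers of
the two plants is due to chance.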
Males of the Frigatebird have a large red throat pouch. They visually display this pouch and
use it to make a drumming sound when seeking mates. Researchers wanted to know
whether females, who presumably choose mates based on their pouch size, could use the
pitch of the drumming sound as an indicator of pouch size. They estimated the volume of
the pouch and the frequency of the drumming sound in 16 males:
Volume (cm³)    Frequency (Hz)
1760 529
2040 390
2440 473
2550 461
2730 465
2740 532
3010 484
3080 527
3370 488
3740 485
4910 478
5090 434
5380 468
5850 449
6730 464
6990 530
Null hypothesis
“There is no correlation between the volume of the bird’s pouch and frequency of the
sound it makes.”
Columns: Volume (cm³), Rank Volume (variable 1), Frequency (Hz), Rank Frequency
(variable 2), Difference (D), D²

Volume   Rank Volume   Frequency   Rank Frequency   D     D²
1760     1             529         14               -13   169
2040     2             390         1                1     1
2440     3             473         8                -5    25
2550     4             461         4                0     0
2730     5             465         6                -1    1
2740     6             532         16               -10   100
3010     7             484         10               -3    9
3080     8             527         13               -5    25
3370     9             488         12               -3    9
3740     10            485         11               -1    1
4910     11            478         9                2     4
5090     12            434         2                10    100
5380     13            468         7                6     36
5850     14            449         3                11    121
6730     15            464         5                10    100
6990     16            530         15               1     1
∑D² = 702
n = 16
rs = 1 - (6 × 702) ÷ (16³ - 16)
= 1 - (4212 ÷ 4080)
= 1 – 1.032
rs = - 0.032
The answer is a negative number, so any relationship between volume and frequency would
be a negative correlation. We have 16 paired values which gives us a critical
value (from the table of critical values) of -0.51 (for a negative correlation). The
calculated value is closer to zero than the critical value, so we must accept the null
hypothesis.
There is more than 5% probability that the correlation between the volume
of the pouch and the frequency of the sound it makes is due to chance.
Spearman’s rank Question 1. Purple loosestrife control
Q: What do you call a beetle that can't have too much sugar?
A: a diabeetle.
A European beetle was tested to see whether it could be used for the biological control of
purple loosestrife in the USA. In an investigation, beetles were released in an area where
purple loosestrife was a pest. The table shows some of the results. Is it possible to prove
that the beetles are effective in controlling purple loosestrife?
Mean number of Purple loosestrife plants (per m2) Mean number of Beetles (per m2)
28 4
22 5
8 40
6 62
7 68
…………………………………………………………………………………………………………..
…………………………………………………………………………………………………………..
Mean number of Purple loosestrife plants (per m²) | Rank Purple loosestrife (variable 1) | Mean number of Beetles (per m²) | Rank Beetles (variable 2) | Difference (D) | D²
∑D2 =
∑D2 = ………………….
n = ……………………...
rs = ………………
Spearman’s rank Question 2. Blood sugar control
A student ate a meal containing carbohydrates at 07:00. He ate nothing else for the next
five hours. The table shows the concentration of glucose in his blood at hourly intervals
after the meal. Is there a significant relationship between time of day and blood sugar
concentration?
…………………………………………………………………………………………………………..
…………………………………………………………………………………………………………..
∑D2 =
∑D2 = ………………….
n = ……………………...
rs = ………………
Spearman’s rank Question 3. Biodiversity on roundabouts
Roundabouts are common at road junctions in towns and cities. Ecologists investigated the
species of plants and animals found on roundabouts in a small town. The grass on the
roundabouts was mown at different time intervals. The table shows the mean number of
plant species found on the roundabouts. From the data, is it possible to prove that mowing
too frequently reduces biodiversity?
…………………………………………………………………………………………………………..
…………………………………………………………………………………………………………..
∑D2 =
∑D2 = ………………….
n = ……………………...
rs = ………………
More Spearman’s rank correlation coefficient problems.
5. Great tits are small birds. In a study of growth in great tits, the relationship between
the mass of the eggs and the mass of the young bird on hatching was investigated.
Is there a relationship?
6. A student carried out an investigation to find out if there is a link between Mussel
shell length and width on a rocky shore. Is there a relationship?
7. In a study of 18 volunteers, the correlation between mood and the amount of
liquid consumed by daily drinking was investigated. The Spearman’s rank
correlation, rs = 0.12 was obtained. How should this data be interpreted?
8. The correlation value obtained in a study of correlation between body height and
biological age was rs = 0.97. May we conclude that height and age are definitely
excellently correlated?
9. A student carried out 6 samples examining the growth rate of bacteria and the
concentration of citric acid present in the growth medium, and then calculated an rs
value of -0.89. How should this data be interpreted?
Student’s t-test
Use this test when you are looking for the difference between two means, and you want to
know if the difference is ‘significant’ or not. OK, so the formula looks a bit scary, but if you
look through the worked examples you’ll realise it’s not so tough!
Student’s t-test Worked Example 1. Bacteria
We have been growing two different strains of bacteria in flasks containing glucose. We had
4 replicate flasks for each bacterium. We have measured the biomass and want to find out
whether or not the results are significantly different for the two different strains of bacteria.
Null hypothesis
Calculating the value of t
Bacterium A Bacterium B
Mean 487.5 257.5
N 4 4
s (standard deviation) 27.54 22.17
s2 758.5 491.5
s2 ÷ n 189.6 122.9
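$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$$
$$t = \frac{487.5 - 257.5}{\sqrt{189.6 + 122.9}}$$
$$t = \frac{230}{\sqrt{312.5}}$$
$$t = \frac{230}{17.7}$$
$$t = 12.99$$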
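The same calculation can be sketched in Python from the summary values in the table. The function name `t_value` is a hypothetical helper, and the formula assumed is the difference in means divided by √(s₁² ÷ n₁ + s₂² ÷ n₂), as in the working above. Because the code does not round the intermediate steps, it gives a value of about 13.0 rather than exactly 12.99.

```python
import math

def t_value(mean1, sd1, n1, mean2, sd2, n2):
    """t = |mean1 - mean2| / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    difference = abs(mean1 - mean2)      # the sign of the difference is ignored
    return difference / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Bacterium A: mean 487.5, s 27.54, n 4; Bacterium B: mean 257.5, s 22.17, n 4
t = t_value(487.5, 27.54, 4, 257.5, 22.17, 4)
print(round(t, 2))
```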
The ‘critical value’ & the ‘degrees of freedom’
Before we can interpret our results we need to work out the ‘critical value’. You will
remember from Chi-squared that this represents the borderline between accepting or
rejecting our null hypothesis. We get the critical value from the data sheet, but this depends
on the number of ‘degrees of freedom’. Hopefully you will remember all about 'degrees of
freedom' from Chi-squared. The calculation is slightly different simply because it allows for
two sets of data.
There were 4 flasks for each of the bacteria (n=4 for both bacteria)
Hence: degrees of freedom = (4 - 1) + (4 - 1) = 6
So we can see from the table of critical values of t that the critical value for 6 degrees
of freedom is 2.48.
Our value of t = 12.99. This is much higher than the critical value, so we must reject the
null hypothesis.
There is less than 5% probability that the difference in the means (mean
mass of bacterium A and mean mass of bacterium B) is due to chance.
Some species of bacteria cause diseases of the stomach. Most are killed by acid gastric
juices produced by the stomach lining. Some species of bacteria survive the antibacterial
action of gastric juices by secreting the enzyme urease. This enzyme catalyses a reaction
that produces ammonia. The ammonia neutralises the acid in gastric juice. A student
believes that a small increase in temperature reduces the effect of urease and has
produced the table of results below. The student wants to know whether her findings are
significant or not.
Null hypothesis
36.5 °C 37.5 °C
Mean 48.6 54.3
N 5 6
s (standard deviation) 5.68 3.67
s2 32.26 13.47
s2 ÷ n 6.5 2.2
Substitute these values into the formula
$$t = \frac{48.6 - 54.3}{\sqrt{6.5 + 2.2}}$$
$$t = \frac{5.7}{\sqrt{8.7}}$$ (Ignore the minus sign, as the ‘difference in means’ is intended)
$$t = \frac{5.7}{2.95}$$
$$t = 1.9$$
There were n = 5 for the lower temperature, but n = 6 for the higher temperature
Hence: degrees of freedom = (5 - 1) + (6 - 1) = 9
So we can see from the table of critical values of t that the critical value for 9 degrees
of freedom is 2.26.
Interpreting the results
Our value of t = 1.9 is lower than the critical value, so we must accept the null hypothesis.
There is more than 5% probability that the difference in the means (the mean
at 36.5 °C and the mean at 37.5 °C) is due to chance.
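When the two samples are different sizes, the degrees of freedom are easy to get wrong, so a sketch of the whole decision may help. It assumes degrees of freedom = (n₁ - 1) + (n₂ - 1); the variable names are illustrative, and the critical value of 2.26 is the one quoted in the example.

```python
import math

# Summary values from the table for the two temperatures
n_low, mean_low, sd_low = 5, 48.6, 5.68       # 36.5 °C
n_high, mean_high, sd_high = 6, 54.3, 3.67    # 37.5 °C

# t = difference in means ÷ sqrt(s1^2 / n1 + s2^2 / n2), ignoring the sign
t = abs(mean_low - mean_high) / math.sqrt(sd_low ** 2 / n_low + sd_high ** 2 / n_high)

degrees_of_freedom = (n_low - 1) + (n_high - 1)
critical_value = 2.26    # quoted in the text; assumed to be the 5% value

significant = t > critical_value
print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
print("Significant difference?", significant)
```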
A theatre nurse suspects that his newly purchased bit of expensive kit that gives instant
blood - haemoglobin levels is faulty. So he takes readings and compares these with his
trusty old bit of kit. His results are in the table below. Are his results significantly different?
The Null hypothesis is
…………………………………………………………………………………………………………
…………………………………………………………………………………………………………
t = (……… − ………) ÷ √(……… + ………)
t = ……… ÷ √………
t = ……… ÷ ………
t = ………
Use the table on the previous page to find the critical values of t = ……………………
There is more / less than 5% probability that the difference in the means (mean
haemoglobin reading from the new kit and mean reading from the old kit) is due to chance.
Student’s t-test Question 2.
A scientist is examining the rate of mitosis in the root cells and shoot cells of a species of
grass. She wants to know whether or not the rate of cell division in the root cells is quicker than
the rate of mitosis in the shoot cells. Are her results significantly different?
…………………………………………………………………………………………………………
…………………………………………………………………………………………………………
Use the table on the previous page to find the critical values of t = ……………………
There is ……………………………………………………………………………………………………………………………………………………………………..
We …………………………………………………………………………………………………………..
Final Questions.
For each of the following questions, use the flow diagram below to help you to use
the most appropriate statistical test.
Looking for association?    Yes → Correlation coefficient
        No ↓
Comparing frequencies?      Yes → Chi-squared test
        No ↓
Student’s t-test
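The same decision path can be written as a tiny Python function. This is only a sketch of the flow diagram: the function name `choose_test` and its two yes/no arguments are made up for illustration.

```python
def choose_test(looking_for_association, comparing_frequencies):
    """Follow the flow diagram: answer each question with True (yes) or False (no)."""
    if looking_for_association:
        return "Correlation coefficient"
    if comparing_frequencies:
        return "Chi-squared test"
    # Neither an association nor frequencies: we are comparing two means
    return "Student's t-test"

# Example: not looking for an association, but comparing frequencies
print(choose_test(False, True))
```

Notice that the questions are asked in order, so the second question is only reached when the answer to the first is "No".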
1. The two-spot ladybird is a small beetle. It has a red form and a black form. These two
forms are shown in the diagram.
Colour is controlled by a single gene with two alleles. The allele for black, B, is dominant
to the allele for red, b. Scientists working in Germany compared the number of red and
black ladybirds over a six-year period. They collected random samples of ladybirds from
birch trees. Some of the results from the investigation are shown in the table.
How could you show that the frequency of the allele has remained the same?
Which statistical test should you use? Justify your choice of statistical test.
State the null hypothesis and interpret the results using the terms probability &
chance.
2. Fur seals live in Antarctic seas. They feed on fish and shrimp-like animals called krill.
During the summer the fur seals come ashore to breed. The table shows the number of
fur seals breeding on an Antarctic island from 1956 to 1986.
How could the increase in adult fur seal numbers be shown to be significant?
Which statistical test should you use? Justify your choice of statistical test.
State the null hypothesis and interpret the results using the terms probability &
chance.
Year    Number of fur seals breeding
1956    100
1964    100
1970    200
1975    100
1976    1600
1981    2900
1983    3100
1986    11700
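If you decide that a rank correlation is appropriate for data like these, the working can be checked with the Python sketch below. It rests on two assumptions: the simple formula rs = 1 − 6Σd² ÷ (n(n² − 1)) is used, and tied values share the average of the ranks they would occupy (with ties, the simple formula is only an approximation). The function names `average_ranks` and `spearman_rank` are made up for illustration.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share their average rank."""
    ordered = sorted(values)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1   # first position this value occupies
        count = ordered.count(value)       # how many times the value occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_rank(x, y):
    """Spearman's rank correlation coefficient using the simple sum-of-d-squared formula."""
    n = len(x)
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n**2 - 1))

years = [1956, 1964, 1970, 1975, 1976, 1981, 1983, 1986]
seals = [100, 100, 200, 100, 1600, 2900, 3100, 11700]

rs = spearman_rank(years, seals)
print(round(rs, 2))
```

The value of rs still has to be compared with the critical value for n = 8 pairs in the table of critical values before anything can be said about significance.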
3. A young bird watcher was watching a pair of breeding blue tits bringing food back to the
nest for the newly hatched chicks. He measured their ‘return rate’ which is a factor that
takes into account the bird’s time away from the nest, and the success in returning to the
nest with food for the chicks. He created a table of his results. Is the male blue tit a
significantly better provider for the chicks than the female?
Gender          Return rate (mg/hour)
Female adult    56 75 45 71 61 64 58 80 76 61
Male adult      66 70 40 60 65 56 59 77 67 63
Which statistical test should you use? Justify your choice of statistical test.
State the null hypothesis and interpret the results using the terms probability &
chance.
4. Malaria is a disease caused by a parasite. Scientists investigated the effect of malaria
on competition between two species of Anolis lizard on a small Caribbean island. They
sampled both populations by collecting lizards from a large number of sites on the
island.
The scientists investigated the percentage of lizards of both species that were
infected with malaria at different sites on the island. They collected samples of both
lizards at intervals of 3 months for 1 year. They also recorded the elevation (height
above sea level) of each site. Some of their results are shown in the table.
Elevation
Total number Total number
of Percentage of Percentage of
of of
collectio A. gingivinus A. wattsi
Site A. gingivinus A. wattsi
n infected with infected with
collected in collected in
site / malaria malaria
one year one year
metres
1 10 13 0 0 0
2 80 30 0 0 0
3 120 35 23 3 0
4 200 40 30 7 0
5 300 52 46 12 0
6 315 35 31 13 1
7 370 155 37 79 2
8 414 124 44 68 4
(a) A preliminary study suggested that malarial infections in A. gingivinus were more
common at higher elevations. Use the data provided to determine whether this
suggestion is statistically significant.
Which statistical test should you use? Justify your choice of statistical test.
State the null hypothesis and interpret the results using the terms probability & chance.
(b) The scientists carried out a statistical test to determine whether the correlation
between the number of A. wattsi collected and the percentage of A. gingivinus
infected was significant.
Which statistical test should you use? Justify your choice of statistical test.
State the null hypothesis and interpret the results using the terms probability & chance.
5. In an investigation by a student into the responses of maggots, the bottom of a large box
was marked with six coloured segments, as shown in the diagram.
30 maggots were placed on each segment in the box. A transparent cover was put on the
box and light bulbs were positioned so that the segments were evenly illuminated. The
positions of the maggots were recorded after one hour. The intensity of the light reflected by
each segment was measured. The experiment was repeated three more times. The total
number of maggots in each segment from the four experiments is shown in the table.
Colour of segment    Intensity of reflected light    Total number of maggots
Black                4                               154
Red                  25                              229
Blue                 10                              178
White                44                              47
Green                25                              48
Yellow               40                              64
Give one conclusion about the responses of maggots which is supported by these results,
and test your conclusion to see if it is statistically significant, using a suitable statistical test.
Which statistical test should you use? Justify your choice of statistical test.
State the null hypothesis and interpret the results using the terms probability &
chance.
6. Here are the results of an investigation into the rate of photosynthesis in the pond weed
Elodea. The number of bubbles given off in one minute was counted under different light
intensities, and each measurement was repeated 5 times. How can you show that the mean
rate of photosynthesis at each light intensity is significantly different?
7. In a test of two drugs 8 patients were given one drug and 8 patients another drug. The
number of hours of relief from symptoms was measured with the following results:
Drug Time spent symptom free (hours)
A 3.2 1.6 5.7 2.8 5.5 1.2 6.1 2.9
B 3.8 1.0 8.4 3.6 5.0 3.5 7.3 4.8
Find out which drug is better by using an appropriate statistical test to find if it is
significantly better than the other drug.
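If you decide that Student’s t-test is appropriate here, the working from raw data can be checked with the Python sketch below. It assumes the same formula as the worked example, t = (difference in means) ÷ √(s₁² ÷ n₁ + s₂² ÷ n₂), with s calculated using the (n − 1) form of the standard deviation. The function name `t_from_samples` is made up for illustration.

```python
import math
import statistics

drug_a = [3.2, 1.6, 5.7, 2.8, 5.5, 1.2, 6.1, 2.9]
drug_b = [3.8, 1.0, 8.4, 3.6, 5.0, 3.5, 7.3, 4.8]

def t_from_samples(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    mean1 = statistics.mean(sample1)
    mean2 = statistics.mean(sample2)
    # statistics.variance gives s squared using the (n - 1) form
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    t = abs(mean1 - mean2) / math.sqrt(var1 / len(sample1) + var2 / len(sample2))
    degrees_of_freedom = len(sample1) + len(sample2) - 2
    return t, degrees_of_freedom

t_value, df = t_from_samples(drug_a, drug_b)
print(round(t_value, 2), df)
```

The printed value of t is then compared with the critical value for the printed degrees of freedom in the table of critical values of t.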
8. In one of Mendel's dihybrid crosses, the following types and numbers of pea plants were
recorded in the F2 generation:
Number of Yellow    Number of Yellow    Number of Green    Number of Green
round seeds         wrinkled seeds      round seeds        wrinkled seeds
395                 122                 96                 39
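If you decide to compare these frequencies with the ones you would expect, the working can be checked with the Python sketch below. It assumes the expected ratio is 9:3:3:1 (the usual expectation for a dihybrid cross, but not stated in the question) and uses chi-squared = Σ(O − E)² ÷ E with degrees of freedom = number of categories − 1. The variable names are made up for illustration.

```python
observed = {
    "yellow round": 395,
    "yellow wrinkled": 122,
    "green round": 96,
    "green wrinkled": 39,
}
# Assumed expected ratio for a dihybrid cross (9:3:3:1)
ratio = {
    "yellow round": 9,
    "yellow wrinkled": 3,
    "green round": 3,
    "green wrinkled": 1,
}

total = sum(observed.values())
ratio_total = sum(ratio.values())
# Expected number in each category = total x (share of the ratio)
expected = {name: total * ratio[name] / ratio_total for name in observed}

chi_squared = sum(
    (observed[name] - expected[name]) ** 2 / expected[name] for name in observed
)
degrees_of_freedom = len(observed) - 1

print(round(chi_squared, 2), degrees_of_freedom)
```

The printed chi-squared value is then compared with the critical value for the printed degrees of freedom in the chi-squared table.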
9. The areas of moss growing on the north and south sides of a group of trees were
compared.
Orientation           Total area of moss growing (m²)
North side of tree    20 43 53 86 70 54
South side of tree    63 11 21 54 9 74
10. The table below shows the results of an experiment. Five different trays of seedlings
were grown under red or yellow light over a four-hour period. The growth of the
seedlings was measured. Are the differences in growth significant?
P 5.2 3.8
Q 3.9 4.2
R 4.9 3.4
S 4.1 3.3
T 4.9 3.7