Darmand’s Ultimate Science Academy
CSEC Additional Mathematics
Statistical Diagrams
Statistical diagrams are used to illustrate data distribution so that trends and other observations can be deduced.
Even though these diagrams can be used to construct hypotheses about the data distribution, we should seek
to confirm these hypotheses through calculation of appropriate statistical measures.
For example, if a statistical diagram suggests that the data set is positively skewed, you should confirm this
deduction by calculating appropriate statistical measures.
Luckily, the CSEC Additional Mathematics syllabus covers only two statistical diagrams:
1. The Stem and Leaf Diagram
2. The Box and Whisker Diagram
Note: While we only study these two statistical diagrams, the other significant statistical diagrams are covered in
the CSEC General Mathematics syllabus: pie chart, bar chart, line graph, histogram, frequency polygon and
cumulative frequency curve. It is assumed that you have a working knowledge of these diagrams.
The Stem and Leaf Diagram
The stem and leaf diagram, also referred to as the stem plot, is a very efficient way of illustrating the distribution
of values in a data set while retaining the original values. As the name suggests, the diagram resembles a single
stem with multiple leaves attached to it. The process of constructing a stem and leaf diagram will be illustrated in
the example below.
Example 1
The marks of 20 students in an assignment are given below:
84 17 38 45 47 53 76 54 75 22
66 65 55 54 51 44 39 19 54 72
Construct an ordered stem and leaf diagram to show the distribution of marks.
HINT: For large data sets, it is usually helpful to construct an unordered stem and leaf diagram first, then
construct the ordered diagram from it.
From our example above, you will notice the following:
1. The size of the leaf is restricted to one digit only.
While the leaf has a restricted size, there is no restriction on the size of the stem. For example,
- If we wanted to represent the value 359 in a stem and leaf diagram, the stem would be 35 and the leaf
would be 9.
- If we wanted to represent the value 1.4 in a stem and leaf diagram, the stem would be 1 and the leaf
would be 4
As you can see in our examples above, the leaf is usually reserved for the trailing digit of each number.
2. The values must be grouped in intervals of equal widths.
At the CSEC Additional Mathematics level, our grouping will always have a width of 10 units. Thus, our
groupings are usually 0 – 9 , 10 – 19 , 20 – 29, 30 – 39 and so on.
3. The diagram must be ordered
Prepared by Shamar Mundell
The values in each interval must be ordered in ascending order.
4. The diagram must have a key
The key is essential to explain the meaning of the values in the diagram to the reader.
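The four rules above can be sketched in a few lines of Python. This is only a minimal illustration and not part of the syllabus: the function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions introduced here, and the data are the marks from Example 1.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group values into stems with single-digit leaves, sorted ascending.

    leaf_unit is the place value of the leaf (1 for whole numbers,
    0.1 for values given to one decimal place).
    """
    groups = defaultdict(list)
    for value in values:
        scaled = round(value / leaf_unit)   # e.g. 1.4 -> 14 when leaf_unit is 0.1
        stem, leaf = divmod(scaled, 10)     # the trailing digit becomes the leaf
        groups[stem].append(leaf)
    # Include empty stems so that every interval of equal width is shown.
    return {stem: sorted(groups.get(stem, []))
            for stem in range(min(groups), max(groups) + 1)}

marks = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
         66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

diagram = stem_and_leaf(marks)
for stem, leaves in diagram.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 7 means 17 marks")
```

With the same sketch, `stem_and_leaf([359])` places 359 under stem 35 with leaf 9, and `stem_and_leaf([1.4], leaf_unit=0.1)` places 1.4 under stem 1 with leaf 4, matching the two cases described in rule 1.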
Advantages and Disadvantages of the Stem Plots
While the stem and leaf diagram is useful in our statistical analysis, the diagram has the following advantages and
disadvantages:
Advantages:
The individual original data values are retained
When we look at a stem and leaf diagram, we are able to tell the individual data values.
The distribution of the data values becomes visible
We can rotate the stem and leaf diagram 90 degrees anticlockwise to get an idea of the shape of the
distribution. While this presents us with a general idea of the shape of the distribution, we must confirm
this through the calculation of appropriate statistical measures.
Disadvantages:
The diagram is not suitable for extremely large or small data sets
Consider a data set containing 1000 values. The presentation of this diagram would be cluttered and hard
to interpret. Also, consider a data set containing 5 values. Illustrating these values using a stem plot would
not add any value to the analysis.
Time consuming for large data sets
With a data set containing 1000 values, it would be tedious and time consuming to arrange all the values
in ascending order within their respective stems.
Example 2
The times, in minutes, taken by a group of students to solve a puzzle are given below:
5 10 15 12 8 7 20 35 24 15
20 33 15 24 10 8 10 20 16 10
(a) Construct a stem and leaf diagram to show the distribution of the time taken to solve the puzzle.
(b) Another group of 3 students from a local high school completed the puzzle and recorded the time it took
them to solve the puzzle. Explain why the stem and leaf diagram is not suitable to represent the values of
these three students.
Example 3
The amount of rainfall, in millimetres, across different cities for the month of May is given below
3.1 4.2 5.9 1.2 2.2
0.3 4.1 0.5 4.7 3.3
2.5 3.4 2.1 6.0 5.9
3.2 3.1 1.5 1.4 2.6
3.1 2.4 2.2 3.6 6.5
2.1 2.0 0.2 0.0 5.9
(a) Construct a stem and leaf diagram showing the distribution of the data values.
(b) State one advantage and one disadvantage of using a stem plot
(c) Calculate the following from your stem plot
(i) Semi – IQR
(ii) Mean
(iii) Standard Deviation
Try this on your own!
Calculate each of the following statistical measures from the stem and leaf diagrams in Examples 1 and 2
(i) Mean
(ii) Median
(iii) Mode
(iv) Semi – IQR
(v) Variance
(vi) Standard Deviation
(vii) Range
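For reference, the measures listed above are commonly defined as shown below. These notes do not state which conventions they use, so the formulas are an assumption: the variance is written in its population form (dividing by n), while some texts divide by n − 1 instead, and the method for locating the quartiles also varies between texts.

```latex
\begin{align*}
\text{Mean: } \bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{Variance: } \sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{Standard deviation: } \sigma &= \sqrt{\sigma^2} \\
\text{Range} &= x_{\max} - x_{\min} \\
\text{Interquartile range (IQR)} &= Q_3 - Q_1 \\
\text{Semi-IQR (quartile deviation)} &= \frac{Q_3 - Q_1}{2}
\end{align*}
```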
Example 4
Jan measures the heights, in millimetres, of 20 plants in her greenhouse. The results of her findings are
shared below:
178 189 147 147 166
167 153 171 164 158
189 166 165 155 152
147 158 148 151 172
(a) Construct a stem and leaf diagram to show the distribution of the heights.
(b) Determine the mean and standard deviation of the heights of the plants.
(c) Calculate the interquartile range of the heights of the plants
Example 5
The stem and leaf diagram below represents data collected for the number of hits on an internet site on each day
in March 2007. There is one missing value, denoted by 𝑥
(a) Find the median and lower quartile for the number of hits each day
(b) The interquartile range is 19. Find the value of 𝑥
(c) Hence, determine the variance for the number of hits per day.
Back – To – Back Stem Plot
Two data sets can be compared at once by using the same stem with leaves extending outwards from both sides
of the stem. The examples below demonstrate how this can be done.
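The shared-stem idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `back_to_back` and its `leaf_unit` parameter are introduced here for the sketch, and the data are the TV diagonals from Example 6.

```python
from collections import defaultdict

def back_to_back(left_values, right_values, leaf_unit=1):
    """Return rows (left_leaves, stem, right_leaves) sharing one stem column."""
    def group(values):
        groups = defaultdict(list)
        for value in values:
            stem, leaf = divmod(round(value / leaf_unit), 10)
            groups[stem].append(leaf)
        return groups

    left, right = group(left_values), group(right_values)
    stems = list(left) + list(right)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        # Left leaves are listed in descending order so that the smallest
        # leaf sits next to the stem when read outwards.
        rows.append((sorted(left.get(stem, []), reverse=True),
                     stem,
                     sorted(right.get(stem, []))))
    return rows

conventional = [0.69, 0.65, 0.85, 0.77, 0.74, 0.67, 0.71, 0.86, 0.75]
flat_screen = [0.85, 0.94, 0.91, 0.96, 1.04, 0.89, 1.07, 0.92, 0.76]

rows = back_to_back(conventional, flat_screen, leaf_unit=0.01)
for left_leaves, stem, right_leaves in rows:
    left_text = " ".join(map(str, left_leaves))
    right_text = " ".join(map(str, right_leaves))
    print(f"{left_text:>8} | {stem:>2} | {right_text}")
print("Key: 5 | 8 | 9 means 0.85 m (conventional) and 0.89 m (flat screen)")
```

Both sets of leaves are ordered outwards from the stem, so each side still reads in ascending order when followed away from the centre.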
Example 6
The lengths of the diagonals in metres of the 9 most popular flat screen TVs and the 9 most popular
conventional TVs are shown below.
Flat Screen: 0.85 0.94 0.91 0.96 1.04 0.89 1.07 0.92 0.76
Conventional: 0.69 0.65 0.85 0.77 0.74 0.67 0.71 0.86 0.75
(a) Represent this information on a back to back stem plot
(b) Find the median and the interquartile range of the lengths of the diagonals of the 9 conventional TVs
(c) Find the mean and standard deviation of the lengths of the diagonals of the 9 flat screen TVs.
Example 7
The lengths of time in minutes to swim a certain distance by the members of a class of twelve 9 – year – olds
and by the members of a class of eight 16 – year – olds are shown below.
9 – year – olds: 13.0 16.1 16.0 14.4 15.9 15.1 14.2 13.8 16.7 16.4 15.0 13.2
16 – year – olds: 14.8 13.0 11.4 11.7 16.5 13.7 12.8 12.9
(a) Draw a back to back stem and leaf diagram to illustrate the information above
(b) Calculate the 65𝑡ℎ percentile of ALL the times.
(c) A new pupil joined the 16 – year – old class and swam the distance. The mean time for the class of nine
pupils was now 13.6 minutes. Find the new pupil’s time to swim the distance
(d) Compare the distribution of swim times in both classes by finding the standard deviation. Be sure to
include the new student in the 16 – year – old class.
Example 8
The scores of 20 students on a French and English Examination are shown below.
French 75 69 58 58 46 44 32 50 53 78
81 61 61 45 31 44 53 66 47 57
English 52 58 68 77 38 85 43 44 56 65
65 79 44 71 84 72 63 69 72 79
(a) Construct a single stem and leaf diagram to show the distribution of scores on both examinations.
(b) Determine the quartile deviation of ALL the scores from both the English and French Examination
(c) The standard deviation of the scores on the French Examination is 13.5 marks.
Determine the standard deviation of the marks from the English Examination. Provide an interpretation
for your answer for the English scores compared to the French scores.
Measures of Shape: Stem and Leaf Diagram
Measures of shape give an idea of the distribution of values through the data set. There are three measures of
shape that we will study at this level:
1. Positively Skewed: 𝑀𝑜𝑑𝑒 < 𝑀𝑒𝑑𝑖𝑎𝑛 < 𝑀𝑒𝑎𝑛
The majority of the data values are clustered in the lower end of the data set with a few values in the
upper end of the data set. Therefore, the positively skewed distribution will have a longer tail that
extends to the right.
2. Symmetric: 𝑀𝑜𝑑𝑒 = 𝑀𝑒𝑑𝑖𝑎𝑛 = 𝑀𝑒𝑎𝑛
The data values are evenly distributed in the upper and lower ends of the data set. Both tails are of
equal length.
3. Negatively Skewed: 𝑀𝑜𝑑𝑒 > 𝑀𝑒𝑑𝑖𝑎𝑛 > 𝑀𝑒𝑎𝑛
The majority of the data values are clustered in the upper end of the data set with a few values dispersed
in the lower end. Therefore, the negatively skewed distribution will have a longer tail that extends to
the left.
The diagram below shows the shape of distributions graphically.
The shape of the distribution can be inferred from the stem and leaf diagram by completing the following steps:
1. Rotate the stem and leaf diagram anticlockwise by 90 degrees. You must ensure that the lowest stem
value is to the left.
2. From the highest bar, draw a line down to the 𝑥 axis
3. The side with the longest tail will tell you what type of distribution you have. (symmetric, positive or
negative).
WARNING! To be absolutely sure about the shape of the distribution from a stem and leaf diagram, we MUST
calculate and analyze the measures of central tendency as shown in the diagrams above.
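The comparison of mode, median and mean described above can be sketched in Python. This is a minimal illustration only: the function name `shape_from_averages` and the three small data sets are hypothetical, chosen so that the averages work out to whole numbers, and a data set with more than one mode is simply reported as inconclusive.

```python
from statistics import mean, median, multimode  # multimode needs Python 3.8+

def shape_from_averages(values):
    """Classify the shape by comparing mode, median and mean."""
    modes = multimode(values)
    if len(modes) != 1:
        return "inconclusive"            # no single mode to compare
    mode, med, avg = modes[0], median(values), mean(values)
    if mode < med < avg:
        return "positively skewed"
    if mode > med > avg:
        return "negatively skewed"
    if mode == med == avg:
        return "symmetric"
    return "inconclusive"                # the three averages do not line up

# Hypothetical data sets, not taken from the examples in these notes.
print(shape_from_averages([1, 2, 2, 3, 4, 7, 9]))   # mode 2, median 3, mean 4
print(shape_from_averages([1, 2, 3, 3, 3, 4, 5]))   # mode 3, median 3, mean 3
print(shape_from_averages([1, 3, 6, 7, 8, 8, 9]))   # mode 8, median 7, mean 6
```

Real data rarely gives three exactly equal averages, so in practice a distribution whose mode, median and mean are very close together is usually described as approximately symmetric.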
Example 9
The values in a data set are given as:
48 42 35 38 28 22 32 29
25 19 17 14 12 12 10 11
(a) Represent the data using a stem and leaf diagram
(b) From your diagram, calculate the mean, median and mode
(c) Hence, state the shape of the distribution
Example 10
The following back to back stem and leaf diagram shows the annual salaries of a group of 39 females and 39
males.
(a) Find the mean, median and mode of the females’ salaries
(b) Hence, state the shape of the distribution for the females’ salaries
(c) Find the mean, median and mode of the males’ salaries
(d) Hence, state the shape of the distribution for the males’ salaries
(e) Calculate the semi interquartile range for ALL the salaries
(f) By calculating the standard deviation for both the males’ and females’ salaries, compare the distribution of
salaries for both genders.
The Box and Whisker Plot
A box and whisker plot is called a five point plot as it is constructed using the following five data points:
1. Minimum Value
2. Lower Quartile
3. Median
4. Upper Quartile
5. Maximum Value
With these values, the box and whisker plot is a scaled diagram whose standard forms are:
Vertical Box Plot: Horizontal Box Plot
Steps to Constructing a Box and Whisker Plot:
Step 1: Calculate the values for the five data points
Step 2: Plug these values into the standard form: either a vertical or horizontal box plot
Remember it is a scaled drawing!
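Step 1 can be sketched in Python as shown below. The quartile convention is an assumption, since these notes do not state one: here the lower and upper quartiles are taken as the medians of the lower and upper halves of the ordered data (leaving out the middle value when the number of values is odd). The function name `five_number_summary` is introduced only for this sketch, and the data are the marks used in Examples 1 and 11.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumed convention: quartiles are the medians of the lower and upper
    halves of the ordered data, excluding the middle value when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + n % 2:]     # skip the middle value if n is odd
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

marks = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
         66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

summary = five_number_summary(marks)
print(summary)
```

Under this convention the sketch gives a minimum of 17, a lower quartile of 41.5, a median of 53.5, an upper quartile of 65.5 and a maximum of 84. A different quartile convention may give slightly different quartiles, so use the method taught in class when answering examination questions.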
Example 11
The marks of 20 students in an assignment are given below
84 17 38 45 47 53 76 54 75 22
66 65 55 54 51 44 39 19 54 72
Construct a box and whisker plot to show the distribution of the grades of students in this class
Advantages and Disadvantages of the Box Plots
While the box plot is useful in our statistical analysis, the diagram has the following advantages and
disadvantages:
Advantages:
The distribution of the data set is visible
The shape of the distribution can be inferred from how the box spreads between the three quartiles. We
will discuss this more later in the lesson.
Can be used to compare multiple (more than two) data sets
While the stem and leaf diagram can only compare up to two data sets, the box and whisker plot can be
used to compare a significant number of data sets.
Disadvantages
The individual original data values are lost
The box plot does not retain all the original data values. In fact, there are cases where the quartiles are not
in the data set.
Not ideal for small data sets
Consider a data set with three values. The box plot would not be appropriate in representing this
distribution as a minimum of 5 data points would be needed.
Example 12
The ages of teachers in the Mathematics department at a school are
22 45 52 21 33 61 30 40 42 48 35
(a) Determine the mean, median and modal ages.
(b) Determine the range and quartile deviation of ages
(c) Determine the standard deviation of the ages
(d) Construct a stem and leaf diagram to show the distribution of ages
(e) Construct a box and whisker plot to show the distribution of ages
(f) State one advantage of the stem and leaf diagram over the box plot
(g) State one advantage of the box plot over the stem plot.
Example 13
The following stem and leaf diagram shows the number of minutes that patients waited at a medical centre before
they were seen by a doctor.
(a) Determine the median waiting time for the sample
(b) Calculate the interquartile range for the data
(c) Draw a box and whisker diagram to show this data
Example 14
The box and whisker plot below shows the volunteer service hours performed by students at Indian Train
Middle School last summer.
(a) What is the median of the data set?
(b) What is the lower quartile of the data set?
(c) What is the upper quartile of the data set?
(d) What is the range?
(e) What is the quartile deviation?
Example 15
The box plot gives information about the distribution of the weights of bags on a plane.
(a) Jean says the heaviest bag weighs 23𝑘𝑔. State, with a reason, whether her statement is true or false.
(b) Write down the median weight
(c) Work out the inter quartile range
(e) There are 240 bags on the plane. Determine the number of bags with a weight of 10𝑘𝑔 or less.
Example 16
Mr Green measured the height, in cm, of each tomato plant in his greenhouse. He used the results to draw the
box and whisker diagram shown below.
(a) Write down the median height.
(b) 25% of the tomato plants are taller than 𝑥 𝑐𝑚. State the value of 𝑥
(c) Determine the semi interquartile range
(d) Explain why the interquartile range may be a better measure of spread than the range.
Comparative Box and Whisker Plot
To construct a comparative box and whisker plot, we simply construct both box and whisker plots on the same
number line. This is used to compare the distributions of two similar data sets.
Example 17
The scores of 20 students on a French and English Examination are shown below.
French 75 69 58 58 46 44 32 50 53 78
81 61 61 45 31 44 53 66 47 57
English 52 58 68 77 38 85 43 44 56 65
65 79 44 71 84 72 63 69 72 79
Construct a box plot to compare the grades.
Example 18
The following back to back stem and leaf diagram shows the cholesterol count for a group of 45 people who
exercise daily and for another group of 63 who do not exercise. The figures in brackets show the number of
people corresponding to each set of leaves.
(a) Give one useful feature of the stem and leaf diagram.
(b) Find the median and the quartiles of the cholesterol count for the group who do not exercise.
You are given that the lower quartile, median and upper quartile of the cholesterol count for the group who
exercise are 4.25, 5.3 and 6.6 respectively.
(c) On a single diagram on graph paper, draw two box – and – whisker plots to illustrate the data
Measures of Shape: The Box and Whisker Diagram
Since the box and whisker diagram only uses five data points, our measure of shape from the box and
whisker plot is dependent on these (specifically the quartiles)
Symmetric: 𝑄3 − 𝑄2 = 𝑄2 − 𝑄1
For a symmetric distribution, the quartiles are equidistant from each other. In the box
and whisker plot, the box will be evenly divided and the whiskers will have
approximately the same length
Positively Skewed: 𝑄3 − 𝑄2 > 𝑄2 − 𝑄1
For a positively skewed distribution, the distance between the upper quartile and the
median, is greater than the distance between the median and the lower quartile. In the
box and whisker plot, the box will have a larger portion between 𝑄3 and 𝑄2 and the
upper whisker will normally be longer than the lower whisker
Negatively Skewed: 𝑄3 − 𝑄2 < 𝑄2 − 𝑄1
For a negatively skewed distribution, the distance between the upper quartile and the
median, is less than the distance between the median and the lower quartile. In the
box and whisker plot, the box will have a larger portion between 𝑄2 and 𝑄1 and the
lower whisker will normally be longer than the upper whisker.
The diagram below shows the shape of distribution using the quartiles.
WARNING! The shape of the distribution found using the measures of central tendency and the shape found using
the quartiles do not always agree.
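The quartile comparison above can be sketched in Python. This is a minimal illustration: the function name `shape_from_quartiles` is introduced here for the sketch, and the quartile values used are the ones quoted in Example 18 (the group who exercise) and Example 19 (country P).

```python
from math import isclose

def shape_from_quartiles(q1, q2, q3):
    """Classify the shape by comparing Q3 - Q2 with Q2 - Q1."""
    upper_part = q3 - q2     # width of the box above the median
    lower_part = q2 - q1     # width of the box below the median
    if isclose(upper_part, lower_part):
        return "symmetric"
    if upper_part > lower_part:
        return "positively skewed"
    return "negatively skewed"

# Quartiles quoted in Example 18 for the group who exercise.
print(shape_from_quartiles(4.25, 5.3, 6.6))
# Quartiles quoted in Example 19 for country P.
print(shape_from_quartiles(84, 94, 98))
```

The equality check uses `isclose` rather than `==` so that small rounding errors in decimal quartiles are not mistaken for skew.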
Example 19
The weights in kilograms of two groups of 17 – year – old males from country 𝑃 and country 𝑄 are displayed in
the following back to back stem plot. In the third row of the diagram … 4 | 7 | 1 …. denotes weights of 74𝑘𝑔 for
a male in country 𝑃 and 71 𝑘𝑔 for a male in country 𝑄.
(a) Find the median and quartile weights for country 𝑄
(b) You are given the lower quartile, median and upper quartile for country 𝑃 are 84, 94 and 98𝑘𝑔
respectively. On a single diagram, draw two box and whisker plots of the data.
(c) Make two comments on the weights of the two groups