Statistics 2024


2024 Edition

MATHEMATICS

STATISTICS
Maths and Science Infinity

Grade 12 Learner Booklet


Contents
STATISTICS
CURRICULUM OVERVIEW
TERMINOLOGY
MEASURES OF CENTRAL TENDENCY & DISPERSION
CUMULATIVE FREQUENCY CURVE (OGIVE)
REGRESSION ANALYSIS

MSI 2
STATISTICS
CURRICULUM OVERVIEW
GRADE 10
1. Measures of central tendency in ungrouped data.
Calculate the mean. Determine the median and the mode.
2. Measures of central tendency in grouped data:
Calculation of mean estimate of grouped data and
identification of modal interval and interval in which the
median lies.
3. Range as a measure of dispersion and extension to
include percentiles, quartiles, inter-quartile and
semi-inter-quartile range.
4. Five number summary (maximum, minimum and
quartiles) and box and whisker diagram.
5. Use the statistical summaries (measures of central
tendency and dispersion), and graphs to analyse and make
meaningful comments on the context associated with the
given data.
6. Histogram.

GRADE 11
1. Revise measures of central tendency and dispersion in
ungrouped and grouped data.
2. Revise five number summary (maximum, minimum and
quartiles) and box and whisker diagram.
3. Revise histograms.
4. Frequency polygons.
5. Ogives (cumulative frequency curves).
6. Variance and standard deviation of ungrouped data.
7. Symmetric and skewed data.
8. Identification of outliers.

GRADE 12
1. Revise:
• Histograms.
• Frequency polygons.
• Ogives (cumulative frequency curves).
• Variance and standard deviation of ungrouped data.
• Symmetric and skewed data.
• Identification of outliers.
2. Use statistical summaries, scatterplots, regression (in
particular the least squares regression line) and correlation
to analyse and make meaningful comments on the context
associated with given bivariate data, including
interpolation, extrapolation and discussions on skewness.

TERMINOLOGY
MEAN
The mean (also known as the arithmetic mean or average) of a group of numbers (or
data set) is written with the bar symbol x̄ (pronounced ‘x-bar’).
The mean of a set of values is calculated by adding up all the values in the set and
dividing by the number of values in that set. An exact mean is calculated from the raw,
ungrouped data; grouped data give only an estimate of the mean.
MEDIAN
The median of a set of data is the data value in the central position, when the data set
has been arranged from highest to lowest or from lowest to highest.
There are an equal number of data values on either side of the median value.

MODE
This is the most frequently occurring value.
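
As a quick check of the three definitions above, the sketch below uses Python's standard `statistics` module. The data set is made up for illustration and does not come from the booklet.

```python
from statistics import mean, median, mode

# Illustrative data set (made up for this sketch).
data = [4, 7, 7, 9, 10, 12, 21]

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # middle value once the data are sorted
data_mode = mode(data)      # most frequently occurring value

print("mean:", data_mean)
print("median:", data_median)
print("mode:", data_mode)
```

Here the single large value 21 pulls the mean above the median, while the mode is simply the repeated value.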
QUARTILES
Quartiles are measures of dispersion around the median, which is a good measure of
central tendency. The median divides the data into two halves. The lower and upper
quartiles further subdivide the data into quarters.
There are three quartiles:
The Lower Quartile (Q1): This is the median of the lower half of the values.
Q1 is the $\frac{n}{4}$th value or the $\frac{n+1}{4}$th value.

The Median (Q2): This is the value that divides the data into halves.
Q2 is the $\frac{n}{2}$th value or the $\frac{n+1}{2}$th value.

The Upper Quartile (Q3): This is the median of the upper half of the values.
Q3 is the $\frac{3n}{4}$th value or the $\frac{3(n+1)}{4}$th value.

If the set of values being halved contains an odd number of values, then its median
(the specific quartile) will be a value in the data set. If it contains an even number of
values, then the quartile need not be a value in the data set: it is the average of the
two middle numbers. This applies to the whole data set when finding the median, and
to the lower and upper halves when finding Q1 and Q3.
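
The "median of each half" description can be sketched in Python as follows. The function name `quartiles` and the data are illustrative. The sketch assumes that, for an odd number of values, the median itself is left out of both halves; some textbooks and calculators use a different convention.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: when the number of values is odd, the median itself is
    left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # values below the median position
    upper = ordered[n - half:]    # values above the median position
    return median(lower), median(ordered), median(upper)

# Seven values: each half has three values, so every quartile is a data value.
print(quartiles([2, 4, 5, 7, 8, 10, 11]))      # (4, 7, 10)

# Eight values: each half has four values, so every quartile is an average.
print(quartiles([2, 4, 5, 7, 8, 10, 11, 13]))  # (4.5, 7.5, 10.5)
```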
RANGE
The range is the difference between the largest and the smallest value in the data set.
The bigger the range, the more spread out the data is.
THE INTER-QUARTILE RANGE (IQR)
It is a measure which provides information about the spread of a data set.
The difference between the lower and upper quartile is called the inter-quartile range.
IQR=Q3-Q1
SEMI-INTERQUARTILE RANGE
is half the interquartile range: $\frac{1}{2}(Q_3 - Q_1)$

FIVE NUMBER SUMMARY


The Five Number Summary consists of the following five values:
• Minimum: The smallest value in the data.
• Lower Quartile: The median of the lower half of the values.
• Median: The middle value when the data is arranged from smallest to
greatest.
• Upper Quartile: The median of the upper half of the values.
• Maximum: The largest value in the data.
BOX AND WHISKER PLOT
A Box and Whisker Plot is a graphical representation of the Five Number Summary
and it allows us to interpret the spread of data.

It is very important to note that the first 25% (first quarter) of results lies between the
minimum and the lower quartile.
The next 25% (second quarter) of results lies between the lower quartile and the
median.
The third quarter lies between the median and the upper quartile and the last quarter
of data lies between the upper quartile and the maximum value.

HISTOGRAM
To interpret a histogram, we find the events on the x-axis and the counts on the y-axis.
Each event has a rectangle that shows what its count (or frequency) is. To draw a
histogram of a data set containing numbers, the numbers first must be grouped. Each
group is defined by an interval. We then count how many times numbers from each
group appear in the data set and draw a histogram using the counts.
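
The grouping and counting step described above can be sketched in Python. The function name `frequency_table`, its parameters and the marks are all made up for illustration, and the sketch assumes equal-width intervals of the form lower ≤ x < upper.

```python
def frequency_table(values, start, width, classes):
    """Count the values in each interval lower <= x < lower + width."""
    counts = [0] * classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < classes:       # ignore values outside the table
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(classes)]

# Illustrative marks (made up for this sketch).
marks = [35, 42, 47, 51, 55, 58, 60, 63, 68, 71, 74, 79, 82, 90]
table = frequency_table(marks, start=30, width=10, classes=7)

for lower, upper, count in table:
    # One '#' per data value gives a rough sideways histogram.
    print(f"{lower:3d} <= x < {upper:3d} | {'#' * count}")
```

Each printed row corresponds to one bar of the histogram, with the bar length equal to the count for that interval.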
FREQUENCY POLYGON
A frequency polygon is sometimes used to represent the same information as in a
histogram. A frequency polygon is drawn by using line segments to connect the
middle of the top of each bar in the histogram. This means that the frequency polygon
connects the coordinates at the centre of each interval and the count in each interval.
OGIVES
Cumulative frequency curves (ogives) are graphs that can be used to determine how
many data values lie above or below a particular value in a data set. The cumulative
frequency is calculated from a frequency table, by adding each frequency to the total
of the frequencies of all data values before it in the data set. The last value for the
cumulative frequency will always be equal to the total number of data values. Ogives
are useful for determining the median, percentiles and five number summary of data.
The starting point is obtained by plotting (x; 0), where x is the lower boundary of the
lowest class interval. Each cumulative frequency is then plotted against the upper
boundary of its class interval.
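
A minimal Python sketch of building the points of an ogive from a frequency table is given below. The grouped data are made up for illustration, and the helper `read_off` is hypothetical: it joins neighbouring points with straight lines, whereas a value read from a hand-drawn curve may differ slightly.

```python
from itertools import accumulate

# Illustrative grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 12), (30, 40, 5)]

frequencies = [f for _, _, f in classes]
cumulative = list(accumulate(frequencies))   # running totals

# Ogive points: start at (lowest lower boundary; 0), then plot each
# cumulative frequency against the upper boundary of its class.
ogive_points = [(classes[0][0], 0)] + [
    (upper, total) for (_, upper, _), total in zip(classes, cumulative)
]

def read_off(points, target):
    """Estimate the data value at a given cumulative frequency by
    joining neighbouring ogive points with straight lines."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")

# The median is read off at half of the total number of data values.
median_estimate = read_off(ogive_points, cumulative[-1] / 2)

print(ogive_points)
print(round(median_estimate, 2))
```

The lower and upper quartiles can be estimated the same way by reading off at one quarter and three quarters of the total.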
VARIANCE
The variance is measured in the square of the data units.
STANDARD DEVIATION
The standard deviation is a very common measure of dispersion. Standard deviation
measures how spread out the values in a data set are around the mean. It is the
square root of the variance.
• If the data values are all similar, then the standard deviation will be low (closer
to zero).
• If the data values are highly variable, then the standard deviation is high (further
from zero).
The standard deviation is never negative (it is zero only when all the data values are
equal) and is always measured in the same units as the original data. The standard
deviation does not depend on the size of the mean: adding the same number to every
data value changes the mean but leaves the standard deviation unchanged.
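
The variance and standard deviation can be worked out step by step as in the Python sketch below. It assumes the population formulas, which divide by the number of values n; the sample formulas divide by n − 1 and give slightly larger answers. The data set is made up for illustration.

```python
from math import sqrt

# Illustrative data set (made up for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Variance: the mean of the squared deviations from the mean
# (population formula, dividing by n).
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: the square root of the variance,
# so it is in the same units as the data.
std_dev = sqrt(variance)

print(mean, variance, std_dev)   # 5.0 4.0 2.0
```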

VARIATION
A larger range generally goes with a larger standard deviation.
The larger the standard deviation, the greater the variability of the data (the greater
the spread of the data).
A large standard deviation indicates that the data values are far from the mean and a
small standard deviation indicates that they are clustered closely around the mean.
Data values are often described as lying within one or two standard deviations of the
mean. With mean x̄ and standard deviation σ, the minimum and maximum of each
interval are:
▪ Within one standard deviation of the mean: (x̄ − σ; x̄ + σ)
▪ Within two standard deviations of the mean: (x̄ − 2σ; x̄ + 2σ)
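
Counting the data values inside these intervals can be sketched in Python as below. The helper `count_within` and the data are illustrative, and the sketch assumes the population standard deviation (`statistics.pstdev`) and open intervals, so a value exactly on an endpoint is not counted.

```python
from statistics import mean, pstdev

# Illustrative data set (made up for this sketch).
data = [12, 15, 17, 18, 20, 21, 23, 26, 28, 30]

centre = mean(data)
sigma = pstdev(data)   # population standard deviation (divides by n)

def count_within(values, centre, sigma, k):
    """Count the values strictly inside (centre - k*sigma; centre + k*sigma)."""
    low, high = centre - k * sigma, centre + k * sigma
    return sum(1 for v in values if low < v < high)

within_one = count_within(data, centre, sigma, 1)
within_two = count_within(data, centre, sigma, 2)

print(centre, round(sigma, 2))
print(within_one, within_two)
```

Dividing each count by the number of data values gives the percentage of values within one or two standard deviations of the mean.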

SYMMETRIC AND SKEWED DATA
A symmetric distribution is one where the left and right hand sides of the distribution
are roughly equally balanced around the mean. The histogram below shows a typical
symmetric distribution.

Symmetric distribution
• the mean is approximately equal to the median; and
• the tails of the distribution are balanced.
Right (positively) skewed distribution.
• the mean is greater than the median.
• the tail on the right-hand side is longer than the tail on the left-
hand side.
• the median is closer to the first quartile than the third quartile.
Left (negatively) skewed distribution.
• the mean is less than the median.
• the tail on the left-hand side is longer than the tail on the right-
hand side.
• the median is closer to the third quartile than the first quartile.

In summary: Mean=Median (symmetrical data set)
Mean>Median (skewed to the right/positively skewed)
Mean<Median (skewed to the left/negatively skewed)
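
The comparison in this summary can be sketched as a small Python function. The name `describe_skewness` and the `tolerance` parameter are hypothetical: the booklet only says the mean and median are "approximately equal" for symmetric data, so the cut-off used here (a fraction of the range) is an assumption.

```python
from statistics import mean, median

def describe_skewness(values, tolerance=0.05):
    """Classify a data set by comparing its mean with its median.

    `tolerance` is a hypothetical cut-off (as a fraction of the range)
    for treating the mean and median as approximately equal.
    """
    spread = max(values) - min(values)
    gap = mean(values) - median(values)
    if spread == 0 or abs(gap) <= tolerance * spread:
        return "symmetric"
    return "skewed right" if gap > 0 else "skewed left"

print(describe_skewness([2, 4, 6, 8, 10]))                 # symmetric
print(describe_skewness([1, 2, 2, 3, 3, 4, 10, 15]))       # skewed right
print(describe_skewness([1, 6, 12, 13, 13, 14, 14, 15]))   # skewed left
```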
IDENTIFICATION OF OUTLIERS
An outlier is a value that is far away from the rest of the data.
To check for outliers: a value is an outlier if it lies outside the interval
[Q1 − 1,5 × IQR; Q3 + 1,5 × IQR].
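
A short Python sketch of this check is shown below. The function names and the data are illustrative, and the quartiles are passed in as already-known values (for this made-up data set, Q1 = 13 and Q3 = 21 by the median-of-halves method).

```python
def outlier_fences(q1, q3):
    """Return the lower and upper limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List the values that lie outside the two limits."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Illustrative data set (made up for this sketch).
data = [3, 12, 14, 15, 16, 18, 20, 22, 45]

fences = outlier_fences(13, 21)
outliers = find_outliers(data, 13, 21)

print(fences)     # (1.0, 33.0)
print(outliers)   # [45]
```

The value 3 looks far from the rest but lies inside the interval, so only 45 is flagged.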
SCATTERPLOTS AND CORRELATION
The scatter plot is widely used to present measurements of two related
variables (bivariate data).
The linear correlation coefficient, r, is a measure which tells us the strength
and direction of the linear relationship between two variables. The correlation
coefficient r ∈ [−1; 1]. When r = −1, there is perfect negative correlation,
when r = 0, there is no linear correlation and when r = 1 there is perfect positive
correlation.
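
The booklet does not give a formula for r. The Python sketch below uses the standard product-moment (Pearson) formula as an assumption about what a calculator computes; the function name and the data are illustrative.

```python
from math import sqrt

def correlation(xs, ys):
    """Linear (Pearson) correlation coefficient r of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative data (made up for this sketch).
xs = [1, 2, 3, 4, 5]

r_positive = correlation(xs, [2, 4, 6, 8, 10])   # points on a rising line
r_negative = correlation(xs, [10, 8, 6, 4, 2])   # points on a falling line
r_scattered = correlation(xs, [2, 4, 5, 4, 5])   # rising trend with scatter

print(r_positive, r_negative, round(r_scattered, 2))
```

The first two data sets lie exactly on straight lines, so r is 1 and −1; the third gives a value between 0 and 1.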

CURVE FITTING
The process of fitting functions to data.
INTUITIVE CURVE FITTING
is performed by visually interpreting if the points on the scatter plot conform
to a linear, exponential, quadratic or some other function.
THE LINE OF BEST FIT OR TREND LINE
is a straight line through the data which best approximates the available data
points. This allows for the estimation of missing data values. The line has the
equation $y = a + bx$, where a is the y-intercept and b is the gradient.

INTERPOLATION
is the technique used to predict values that fall within the range of the available
data.
EXTRAPOLATION
is the technique used to predict the value of variables beyond the range of the
available data.
LINEAR REGRESSION ANALYSIS
is a statistical technique of finding out exactly which linear function best fits a
given set of data.
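
A minimal Python sketch of the least squares regression line y = a + bx, followed by one interpolated and one extrapolated prediction, is given below. The function names and data are illustrative, and the formulas used for a and b are the standard least squares ones, which the booklet does not state explicitly.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx               # gradient
    a = mean_y - b * mean_x       # y-intercept
    return a, b

# Illustrative bivariate data (made up for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)

def predict(x):
    """Estimate y from the fitted line."""
    return a + b * x

inside = predict(2.5)   # interpolation: 2.5 lies within the x-values 1 to 5
outside = predict(8)    # extrapolation: 8 lies beyond the x-values

print(round(a, 2), round(b, 2))
print(round(inside, 2), round(outside, 2))
```

An extrapolated value is less reliable than an interpolated one, because nothing in the data confirms that the trend continues beyond the observed range.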

MEASURES OF CENTRAL TENDENCY & DISPERSION
QUESTION 1

The time taken, in minutes, to complete a 5 km race by a group of 10 runners is given below:

18 21 16 24 28 20 22 29 19 23

1.1 Calculate the mean time taken to complete the race. (2)

1.2 Calculate the standard deviation of the time taken to complete the race. (2)

1.3 How many runners completed the race within one standard deviation of the mean? (2)

[6]

QUESTION 2

The data below shows the energy levels, in kilocalories per 100g, of 10 different snack foods.

440 520 480 560 615 550 620 680 545 490

2.1 Calculate the mean energy level of these snack foods. (2)

2.2 Calculate the standard deviation. (2)

2.3 The energy levels, in kilocalories per 100g, of 10 different breakfast cereals had a mean
of 545,7 kilocalories and a standard deviation of 28 kilocalories. Which of the two types
of food shows greater variation in energy levels? What do you conclude? (2)

[6]

QUESTION 3

The members of a local gym had to undertake a fitness test. The performance scores were
analysed and found to follow a normal distribution with a mean of 100 and a standard deviation
of 15.

3.1 Approximately what percentage of scores lie between 85 and 115? (2)

3.2 If a performance score between 115 and 130 indicates that a member is fit,
approximately what percentage of members fall in this category? (2)

3.3 If there are 500 members at the local gym, how many of them would you expect to
score more than 130? (2)

QUESTION 4

Fifteen members of a basketball team took part in a tournament. Each player was allowed the
same amount of time on the court. The points scored by each player at the end of the tournament
are shown below.

27 28 30 32 34 38 41 42 43 43 44 46 53 56 62
4.1 Determine the median of the given data. (1)

4.2 Determine the interquartile range of the data. (3)

4.3 Draw a box and whisker diagram to represent the data. (3)

4.4 Use the box and whisker diagram to comment on the points scored by the players
in this team. (2)

[9]

QUESTION 5

MSI employs 9 salespersons. The commission that each salesperson earned (in rands) in a
certain month is shown below:

3 900 5 700 7 300 10 600 13 000 13 600 15 100 15 800 17 100

5.1 Calculate the mean of the above data. (2)

5.2 Calculate the standard deviation for the data. (2)

5.3 MSI rates the sales staff according to the amount of commission earned. A salesperson
whose commission is more than one standard deviation above the mean receives a
rating of “good”. How many salespersons will receive a rating of “good” for that
month? (2)

[6]

QUESTION 6
Hlaluminathi plays for his school’s cricket team. The number of runs scored by Hlaluminathi
in the eight games that he batted in, is shown below. (Hlaluminathi was given out in all of the
games)
21 8 19 7 15 32 14 12

6.1 Determine the average runs scored by Hlaluminathi in the eight games. (2)

6.2 Determine the standard deviation of the data set. (2)

6.3 Hlaluminathi’s scores for the first three of the next eight games were 22, 35 and 2
respectively. Describe the effect of his performance on the standard deviation
of this larger set having 11 data points. (2)

6.4 Hlaluminathi hopes to score an average of 20 runs in the first 16 games. What should
his average in the last five games be so that he may reach his goal? (3)

[9]

QUESTION 7
The data below shows the total monthly rainfall (in millimetres) at King Phalo Airport for the
year 2023.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
60,9 14,9 9,3 28,0 71,9 76,4 98,2 65,7 26,1 32,5 23,6 15,0

7.1 Determine the mean monthly rainfall for 2023. (2)

7.2 Write down the five-number summary for the data. (5)

7.3 Draw a box and whisker diagram for the data. (3)

7.4 By referring to the box and whisker diagram, comment on the spread of the rainfall for
the year. (2)

7.5 Calculate the standard deviation for the data. (3)

[15]

QUESTION 8

Two Mathematics classes, A and B, are in competition to see which class performed best in the
June examination. The marks of the learners in Class A are given below and the box and
whisker diagram below illustrates the results of Class B. Both classes have 25 learners. (Marks
are given in %.)

8.1 Write down the five-number summary for Class A. (4)

8.2 Draw the box and whisker diagram that represents Class A’s marks. Clearly indicate
ALL relevant values. (2)

8.3 Determine which class performed better in the June examination and give reasons for
your conclusion. (3)

[9]

QUESTION 9
The table below gives a breakdown of the PSL log standings for the 8 top teams at the end of
2008/2009.

POSITION TEAM POINTS


1 Super Sport United 55
2 Orlando Pirates 55
3 Kaizer Chiefs 50
4 Free State Stars 47
5 Golden Arrows 𝑥
6 Bidvest Wits 𝑥
7 Ajax Cape Town 𝑥
8 AmaZulu Royals 42
9.1 If the average points for these 8 teams is 48,375, show that 𝑥 = 46. (2)

9.2 Draw a box and whisker diagram for the data. (4)

[6]

QUESTION 10

In a certain school 60 learners wrote examinations in Mathematics and Physical Sciences. The
box and whisker diagram below shows the marks (out of 100) that these learners scored in the
Physical Sciences examination.

10.1 Write down the range of the marks scored in the Physical Sciences examination. (1)

10.2 Use the information below to draw the box-and-whisker diagram for the
Mathematics results.

Minimum mark = 30

Range = 55

Upper quartile = 70

Interquartile range = 30

Median = 55

10.3 How many learners scored less than 70% in the Mathematics examination? (2)

10.4 Lwazi claims that the number of learners who scored between 30 and 45 in Physical
Sciences is smaller than the number of learners who scored between 30 and 55 in
Mathematics. Is Lwazi’s claim valid? Justify your answer. (2)

[9]

CUMULATIVE FREQUENCY CURVE (OGIVE)
QUESTION 1

The heights, ℎ, of the learners at Ntabankulu Senior Secondary School in Grades 10, 11 and 12
were recorded as follows:

HEIGHT (in cm)     FREQUENCY
118 ≤ ℎ < 127      16
127 ≤ ℎ < 136      26
136 ≤ ℎ < 145      42
145 ≤ ℎ < 154      54
154 ≤ ℎ < 163      26
163 ≤ ℎ < 172      22
172 ≤ ℎ < 181      14

1.1 Draw a cumulative frequency table. (2)

1.2 Draw an ogive for the data. (3)

1.3 Use the ogive, or otherwise, to determine the lower quartile, median and upper quartile.
(3)

1.4 If the minimum height was 119 cm and the maximum height was 178 cm, draw box
and whisker diagram for the data. (3)

1.5 Comment on the distribution of the heights of the learners. (1)

1.6 Approximately how many learners are between 138 cm and 158 cm tall? (1)

[13]

QUESTION 2
Thirty learners were asked to answer a question in Mathematics. The time taken, in minutes,
to answer the question correctly, is shown in the frequency table below.

TIME, 𝑡 (in minutes)     NUMBER OF LEARNERS
1 ≤ 𝑡 < 3                3
3 ≤ 𝑡 < 5                6
5 ≤ 𝑡 < 7                7
7 ≤ 𝑡 < 9                8
9 ≤ 𝑡 < 11               5
11 ≤ 𝑡 < 13              1

2.1 Draw a cumulative frequency table for the data. (3)

2.2 Draw a cumulative frequency graph (ogive) of the above data. (4)

2.3 If a learner answers the question correctly in less than 4 minutes, then he/she is
classified as a ‘gifted learner’. Estimate the percentage of ‘gifted learners’ in this
group. (2)

[9]

QUESTION 3

The individual masses (in kg) of 25 rugby players are given below:

78 102 88 93 81 90 75 60 76 75
68 90 80 77 81 69 60 83 91 100
80 70 81 64 70
3.1 Complete the following table.

MASS (kg)          FREQUENCY     CUMULATIVE FREQUENCY
60 ≤ 𝑥 < 70
70 ≤ 𝑥 < 80
80 ≤ 𝑥 < 90
90 ≤ 𝑥 < 100
100 ≤ 𝑥 < 110
(4)

3.2 Draw an ogive (cumulative frequency curve) of the above information. (3)

3.3 Calculate the mean mass of the rugby players. (2)

3.4 How many rugby players have masses within one standard deviation of the mean? From
your calculations, calculate the percentage of the rugby players who have
masses within one standard deviation of the mean. (5)

[14]

QUESTION 4

The time taken (to the nearest minute) for a certain task to be completed was recorded on
48 occasions and the following data was obtained.

TIME (in minutes) FREQUENCY


11 ≤ 𝑡 < 15 6
15 ≤ 𝑡 < 19 9
19 ≤ 𝑡 < 23 13
23 ≤ 𝑡 < 27 12
27 ≤ 𝑡 < 30 8

4.1 Complete the cumulative frequency table. (1)

4.2 Draw an ogive (Cumulative frequency curve) for the given data. (4)

4.3 Determine, from the ogive, the median, lower quartile and upper quartile for
the data. (3)

4.4 Draw a box-and-whisker diagram of the data. (2)

4.5 Comment on the spread of the time taken to complete the task. (1)

[11]

REGRESSION ANALYSIS

THE END
