Willemse I., Nyelisani P. Statistical Methods and Calculation Skills
Fourth
Edition
Statistical Methods
and Calculation Skills
Key features:
• A theoretical framework for statistical problem-solving
• A practical step-by-step approach to applying methods and calculations
• A complete list of outcomes in each unit
• Worked examples with detailed explanations
• Guided activities and a range of self-test questions.
Part A – statistical methods – covers the collection and presentation of data; descriptive
and inferential methods of analysis; index numbers; regression and correlation analysis;
time series; probability and probability distributions; statistical estimation; and hypothesis
testing. Calculation skills are revised in Part B, which deals with elementary calculations
such as exponents, decimals, scientific notation, logarithms and rounding. Students with no
mathematics background can learn how to do basic calculations before going on to statistical
applications. For some courses, calculations such as interest, future values of investments,
graphs and ratios form part of the core module and are also covered here.
The book includes examples and activities from the fields of business, food and biotechnology,
engineering, medicine and environmental studies.
I Willemse | P Nyelisani
www.jutaacademic.co.za
Isabel Willemse
Peter Nyelisani
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage or retrieval system,
without prior permission in writing from the publisher. Subject to any applicable licensing terms and conditions
in the case of electronically supplied publications, a person may engage in fair dealing with a copy of this
publication for his or her personal or private use, or his or her research or private study. See section 12(1)(a) of the
Copyright Act 98 of 1978.
The author and the publisher believe on the strength of due diligence exercised that this work does not contain
any material that is the subject of copyright held by another person. In the alternative, they believe that any
protected pre-existing material that may be comprised in it has been used with appropriate authority or has been
used in circumstances that make such use permissible under the law.
Contents
Unit 1: Introduction
1.1 Problem-solving steps
1.2 Definition
1.3 The language of statistics
1.4 Measurement
1.5 Role of the computer in statistics
TEST YOURSELF 1
Unit 6: Summarising bivariate data: simple regression and correlation analysis
6.1 Response variable (y) and explanatory variable (x)
6.2 Scatter diagram
6.3 Correlation analysis (r)
6.4 Regression analysis
6.5 Spearman rank correlation coefficient (rs)
TEST YOURSELF 6
12.7 Square roots (√)
12.8 Logarithms (log)
12.9 Factorial notation (!)
12.10 Sigma notation (Σ)
12.11 Fractions
12.12 Decimal numbers
12.13 Scientific notation
12.14 Rounding off decimals
12.15 Significant digits
12.16 The metric system
1 Introduction
This unit deals with the role of statistics in the data analysis process. Concepts
that are basic to the study of statistics are discussed.
Example 1.1
average to assess the calibration of the filling machine. You can extend the
results from the sample of 50 bottles to all bottles filled during that week.
occur, but must have distance between them, for example interest rates and
stock prices.
• If you have to measure or weigh to get the value of the variable, it is continuous.
It has an infinite number of possible values that are not countable. For example
mass, length, time taken to complete a task, age, etc. can be measured to any
desired accuracy or number of decimal places within a given range.
Example 1.2
Example 1.3
1.4 Measurement
Measurement is the process we use to assign a value to the observations or
elements of a variable. This set of values for a given variable is known as data.
Activity 1.1
Activity 1.2
The student council at a university with 10 000 students is interested in the
proportion of students who favour a change in the admission requirements at
the university. Two hundred students are interviewed to determine their attitude
toward this proposed change. Of the 200, 64 (or 32%) are in favour of a change.
The student council announced that less than 35% of all the students are in
favour of a change.
a) What is the question to be answered in this investigation?
TEST YOURSELF 1
1. A survey of 100 people is conducted and questions are asked relating to the
following characteristics:
• marital status
• salary
• occupation
• number of hours of television watched per week.
What type of data and measurement scales are applicable?
2. The personnel manager of a business is studying employee morale and uses
a questionnaire to collect data. A typical question on the questionnaire: ‘I
feel that I am performing a valuable service for society when I do my job
well.’ Circle the letter that most closely represents your agreement with the
statement:
2 Collection of data
This unit deals with how and where to obtain data that can be used to make
informed decisions. Data collection is the process of collecting, counting and
recording of information.
The quality of the final results depends on the quality of the raw material
collected. Researchers have adopted the acronym GIGO – garbage in, garbage
out – to emphasise this fact.
for some purpose other than you intend to use it for. Data is often collected
through the use of secondary sources because it is available at low cost, but you
need to be sure that you are not using unsuitable data just because it is easily
available. Secondary data can be obtained internally or externally.
Internal data comes from within the organisation for its own use, for example
from accounting records, payrolls, inventories, sales records, etc.
Primary data is information collected by those wishing to collect their own data.
The distinguishing feature of this data is that it will be both reliable and relevant
to your purpose. As a result, primary data can take a long time to collect and may
be expensive. Sources of primary data include experiments, observation, group
discussions and the use of questionnaires under controlled conditions.
There are multiple methods and tools that can be used to collect data, but you
must decide which method(s) will best answer your research questions.
The four main methods of collecting data are:
• face to face
• by phone
• by post
• via the Internet.
There are advantages and disadvantages to using each of these methods. One
might be better suited to a particular survey than another.
Example 2.1
2.2.2 Observations
In an observational survey, collecting data relies on watching or listening very
carefully, and then counting or measuring events as they happen without any
interaction with the individuals or objects. You draw up an observation sheet
and keep count of the observations in a tally table, using straight vertical lines
for each item counted up to 4 (| | | |). The fifth event is a line drawn across the first four
lines so that you can easily tally the total in multiples of 5. The variables
of interest are not controlled.
Example 2.2
The metro police wanted to determine whether motorists using a certain road
wore seatbelts. They observed whether drivers used seatbelts and counted how
many wore seatbelts and how many did not.
Activity 2.1
• yes/no answers
• tick boxes
• numbered responses
• word responses.
2. Open-ended questions will allow respondents to give their own opinions in
their own words and to express any thoughts that they feel are appropriate
to the question. As a result, depending on the nature of the question and
the interest of the respondent, answers may vary a great deal in length and
detail.
Activity 2.2
is small and a small sample will be sufficient. If the amounts differ a lot, the
variability is greater and a larger sample is needed.
Types of samples
2.6 Random sampling
A random sample is one in which the items chosen are based on chance – the
procedure must be such that every element of the population has the same chance
(or probability) of being selected into the sample. Following such a method is
considered ‘fair’ and free from bias, therefore allowing sample statistics to be
generalised to the whole population from which the sample was taken. Some
basic random sampling techniques are simple random sampling, systematic
random sampling, stratified sampling and cluster sampling.
Example 2.3
Assume that you have 100 employees in a company and you wish to interview
a random sample of 10. Assign every employee a two-digit number from 00 to 99,
so that you can read two digits at a time from each number in the random
number list. The first step in
selecting a sample is to decide where in the random table you should start. Use
the random table given below. You can choose to use the first two digits, the
middle two digits or the last two digits. You can even choose which columns
to use. You can make this decision by using the ‘goldfish-bowl’ technique or by
closing your eyes and pointing to a spot in the table. Suppose you have decided
to start in the first column with the first two digits, and the population consists
of numbers from 00 to 99. If you reach the bottom of the last column on the
right and are still short of your desired number, go back to the beginning and
start reading the third and fourth digits of each number. According to the table,
employee numbers 70, 23, 20, 22, 53, 39, 48, 64, 12 and 45 will be in the
sample of 10.
Note that if a number occurs more than once, you skip it. You can’t use any
population ID twice because there is a unique ID assigned to each element in the
population.
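The table look-up in Example 2.3 can be imitated in software. The following is a minimal sketch, assuming that Python's standard `random` module stands in for the printed random number table. The function name `draw_sample` and the seed value are illustrative choices and are not part of the example.

```python
import random

def draw_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct IDs from 0 .. population_size - 1."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # rng.sample never returns the same ID twice, which mirrors the rule
    # of skipping a number that has already been selected.
    return rng.sample(range(population_size), sample_size)

# 100 employees numbered 00 to 99, sample of 10 (seed chosen arbitrarily)
sample = draw_sample(100, 10, seed=1)
print([f"{i:02d}" for i in sample])
```

The selected numbers will differ from those read off the printed table, because a different source of random digits is used.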
Activity 2.3
Each student at the university has a mailbox on campus. The mailboxes are
numbered from 0000 to 9000. Use the random number table and select 10
mailbox numbers in your sample. Compare your results with some of the results
obtained by other students in the class and comment on your findings.
TEST YOURSELF 2
1. ‘How much do you trust information about health that you find on the
Internet?’ You want to ask a sample of 10 students chosen from your class the
question. Describe how you will select your sample using a random method.
2. You want to select a random sample of 25 of the approximately 371 active
telephone area codes covering South Africa. Explain the method you will
use and select your sample.
3. At a party there are 30 students over the age of 21 and 15 students under
age 21. You want to select a representative sample of five to interview about
attitudes toward alcohol. Explain your method and select your sample.
4. Based on satellite images, a forest area in KwaZulu-Natal is divided into 14
types. The area of each type is divided into large sectors. Choose 18 sectors
of each type at random and count the tree species in a 20 × 25 m rectangle
randomly placed within each sector selected. Explain the method you will
use and select the sectors.
5. You want to choose four addresses at random from a list of 120 addresses.
Use the systematic method and describe how you will obtain your sample.
6. The New Firearm Policy Survey asked respondents’ opinions about
government regulation of firearms. If you are the researcher, and you want
to follow the telephone interview method using the multi-stage cluster
sampling method, how will you go about selecting your sample?
7. In the 1940s the public was greatly concerned about polio. In an attempt
to prevent this disease Jonas Salk of the University of Pittsburgh developed
a polio vaccine. To test the vaccine 1 000 000 children received the Salk
vaccine and another 1 000 000 a placebo, in this case an injection of salt
dissolved in water. Neither the children nor the doctors performing the
diagnoses knew which children belonged to which group, but an evaluation
centre did. The centre found that the incidence of polio was far lower
among children inoculated with the Salk vaccine. From that information
the researchers concluded that the vaccine would be effective in preventing
polio for all school children and made it available for general use.
Is this investigation an observational study or a designed experiment? Justify
your answer. Is the conclusion of the researchers descriptive or inferential?
8. An inspector of the Department of Health obtains all vitamin pills produced in
an hour at the Herbal Supply Company. She thoroughly mixes them and then
scoops a sample of 10 pills that are to be tested for the exact amount of vitamin
content. Does this sampling design result in a random sample? Explain.
When data is collected the initial result is usually a list of the observations for
each variable. This is referred to as raw data. Raw data has not been processed
and provides little information. Statistics give us some tools or techniques to
organise and summarise the raw data into tables and graphs. Data in this format
is easy to understand because it focuses on the key characteristics only.
A graph shows the relationship between two variables: one will be the x-variable
on the horizontal axis and the other the y-variable on the vertical axis. A graph
does not replace a table, but complements it by showing the data’s general
Steps
1. Draw a column in which each row lists one of the categories for the variable
of interest.
2. Draw a second column to list the corresponding number of times that the
category occurs (frequency f).
3. Add up the frequency column to make sure that the total is the same as the
number of observations.
4. The order for the categories in the frequency table is not important, unless
there is a logical order in the given data set.
5. Interpret the table results.
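The steps above can be sketched in a few lines of Python. This is only an illustration: the list `observations` is hypothetical data made up for the sketch, and the standard-library `Counter` is one of several ways to do the counting.

```python
from collections import Counter

# Hypothetical raw data: the condition of 10 delivered boxes
observations = ["good", "crushed", "good", "torn", "good",
                "crushed", "unsealed", "good", "good", "torn"]

frequency = Counter(observations)      # steps 1 and 2: category -> frequency f
total = sum(frequency.values())
assert total == len(observations)      # step 3: the totals must agree

print(f"{'Category':<10}{'f':>4}{'%f':>8}")
for category, f in frequency.most_common():
    print(f"{category:<10}{f:>4}{100 * f / total:>8.1f}")
print(f"{'Total':<10}{total:>4}{100.0:>8.1f}")
```

The last column divides each frequency by the total, which gives the relative (%) frequency distribution.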
Example 3.1
1. Create a frequency table for the data and determine if Toni’s concerns are
justified.
2. Change the frequency distribution to a relative (%) frequency distribution by
dividing each frequency by the total frequencies.
Conclusions
1. The table shows that half of the boxes that arrived were damaged, which is
definitely a matter for concern. Only two of the boxes were unsealed and can
be considered as unsafe for use.
2. 50% of the boxes are damaged with half of the damaged boxes crushed.
Activity 3.1
3.1.2 Cross-tabulation
Data resulting from observations made on two different related categorical
variables (bivariate) can be summarised using a table known as a two-way
frequency table or contingency table.
The word ‘contingency’ is used because the table is used to determine if there
is an association between the variables.
Steps
1. This table displays the one variable (x) in the rows and the other variable (y)
in the columns.
2. Each row and column combination in the table is called a cell.
3. The number of times each (x, y) combination occurs in the data set is
recorded and these numbers are entered in the corresponding cells of the
table. These are known as the observed cell counts.
4. Add the observed cell counts in each row and also in each column of the
table to obtain the marginal totals.
5. The grand total is the total of all the observed cell counts in the table.
All the row marginal totals will add up to the grand total. All the column
marginal totals will also add up to the grand total.
6. We use a contingency table if we want to compare two different populations
on the basis of a single categorical variable, or when two categorical variables
are observed in a single sample. For example, data could be collected at a
university to compare students, staff and management on the basis of their
means of transport to campus (taxi, bus, car, train, motorcycle, bicycle or
on foot). This will result in a (3 × 7) two-way frequency table with row
categories of Student, Staff and Management, and column categories
corresponding to the seven possible modes of transport. The observed cell
counts could then be used to gain insight into differences and similarities in
means of transport in the three groups.
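The construction of a two-way frequency table can be sketched as follows. The `(group, transport)` pairs are hypothetical observations invented for this sketch, and only some of the seven modes of transport appear in them.

```python
from collections import Counter

# Hypothetical (group, transport) observations
observations = [
    ("Student", "bus"), ("Student", "taxi"), ("Student", "bus"),
    ("Staff", "car"), ("Staff", "bus"), ("Management", "car"),
    ("Management", "car"), ("Student", "on foot"),
]

cells = Counter(observations)                     # observed cell counts
row_totals = Counter(x for x, _ in observations)  # marginal totals per row
col_totals = Counter(y for _, y in observations)  # marginal totals per column
grand_total = sum(cells.values())

# Both sets of marginal totals must add up to the grand total
assert sum(row_totals.values()) == grand_total
assert sum(col_totals.values()) == grand_total

columns = sorted(col_totals)
print(f"{'':<12}" + "".join(f"{c:>9}" for c in columns) + f"{'Total':>9}")
for row in sorted(row_totals):
    counts = "".join(f"{cells[(row, c)]:>9}" for c in columns)
    print(f"{row:<12}{counts}{row_totals[row]:>9}")
print(f"{'Total':<12}" + "".join(f"{col_totals[c]:>9}" for c in columns)
      + f"{grand_total:>9}")
```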
Activity 3.2
People believe that organic foods are healthier than conventionally grown fruit
and vegetables. An investigation is carried out on a sample of 10 000 food items
by the local health department as part of the regulatory monitoring of foods for
pesticide residues. The following table displays the frequencies of foods for all
possible category combinations of the two variables: food type and pesticide status.
                     Pesticides
Food type       Present   Not present     Total
Organic              28            99       127
Conventional      9 085           788     9 873
Total             9 113           887    10 000
Activity 3.3
Steps
1. Communicate only a single idea or variable.
2. Draw a pair of axes, x and y.
Example 3.2
[Figure: simple bar chart of Number of boys (y-axis) against Number of fruit servings (x-axis, 0 to 3)]
Conclusion: 67% of the boys ate fewer than the recommended number of servings.
Activity 3.4
Draw a simple bar chart showing the ages of employees and draw conclusions
from your results.
Steps
1. Draw a pair of axes, x and y.
2. Label the axes and give the graph a title.
3. At evenly spaced intervals on the x-axis put tick marks and label them with
the categories from the frequency table.
Example 3.3
The contingency table below summarises the responses of two different groups
to their perceived risk of smoking. Portray the data using a multiple bar graph to
determine whether smokers and former smokers perceive the risks of smoking
differently.
[Figure: multiple bar graph; y-axis %f (0 to 100); labels: Very harmful, Not too harmful; Smokers, Former smokers]
The graph shows that the proportion of former smokers who believe that
smoking is very harmful is larger than the proportion of smokers who believe
that smoking is very harmful. In other words, smokers are less likely to believe
that smoking is very harmful than former smokers.
Activity 3.5
Draw a multiple bar chart showing the ages of male and female employees and
comment on your result.
Steps
1. Draw a single bar for each category, with the height of the bar representing
the total of each category.
2. Subdivide each bar to show the components that make up each category.
3. Identify the components involved by colouring or fill effects, accompanied
by an explanatory key to show what each colour or fill effect represents.
4. Interpret your results.
5. If the components are converted to percentages of the total of each
category, the bars are divided in proportion to these percentages. The scale
is a percentage scale and the height of each bar is then 100%. This is known
as a percentage component bar graph.
Example 3.4
The contingency table below summarises the responses of two different groups to
their perceived risk of smoking. Portray the data using a percentage component
bar graph to determine whether smokers and former smokers perceive the risks
of smoking differently.
[Figure: percentage component bar graph; y-axis %f (0% to 100%); labels: Very harmful, Not too harmful; Smokers, Former smokers]
[Figure: stacked bar chart; y-axis f (0 to 300); labels: Very harmful, Not too harmful; Smokers, Former smokers]
From the stacked bar chart you can conclude that there are more smokers and
former smokers who believe that smoking is very harmful than those who believe
it is not too harmful.
Activity 3.6
Draw a stacked bar chart showing the ages of male and female employees and
comment on your result.
Steps
1. Draw a circle to represent the entire data set.
2. Keep the categories to 10 or fewer.
3. For each category calculate the ‘slice’ size.
4. A circle has 360° and ‘slice’ sizes are calculated as a proportion of 360°.
‘Slices’ are drawn by making use of a protractor.
5. Put any labelling outside the circle.
6. Look for categories that form large and small proportions of the data set
when interpreting the chart.
Example 3.5
A random sample of 2 000 shoppers was asked why they were visiting a shopping
centre on a specific day.
Reason for visit   Number of shoppers   Relative frequency   Angle (°)
Groceries                         790                0.395         142
Clothing                          570                0.285         103
DIY                               580                0.290         104
Other                              60                0.030          11
Total                           2 000                1.000         360
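The slice sizes in the table can be checked with a short calculation. This is a minimal sketch using the shopper counts from Example 3.5; the variable names are illustrative.

```python
# Shopper counts from Example 3.5
shoppers = {"Groceries": 790, "Clothing": 570, "DIY": 580, "Other": 60}
total = sum(shoppers.values())

angles = {}
for reason, f in shoppers.items():
    proportion = f / total                      # relative frequency
    angles[reason] = round(proportion * 360)    # slice size in whole degrees
    print(f"{reason:<10}{f:>6}{proportion:>8.3f}{angles[reason]:>6}")
```

Here the rounded angles happen to add up to exactly 360°. With other data, rounding can leave the total a degree or two off, in which case one slice is usually adjusted.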
[Figure: pie chart with slices for Groceries, Clothing, DIY and Other]
The largest group of shoppers on that specific day wanted to buy groceries. Almost
equal proportions wanted to buy clothing or DIY items and only a few people were
there for other purposes.
Activity 3.7
Draw a pie chart to portray the data and comment on the results.
3.1.6 Pictograms
Pictograms are small symbols or simplified pictures that represent data.
Steps
1. Give the pictogram a title.
2. Choose a simple symbol or picture that is easy to draw.
3. The quantity that each symbol represents should be given.
4. It is important that the symbols are all the same size. It is possible to use half
a picture to represent half the quantity.
5. Draw the symbols neatly and professionally.
Example 3.6
Activity 3.8
Steps
1. Construct a single horizontal axis and label it with the name of the variable.
2. Mark the axis with an appropriate measurement scale to fit the smallest as
well as the largest value in the data set.
3. For each observation, place a dot above its value on the number line.
4. If there are two or more observations with the same value, stack the dots
vertically.
5. The number of dots above a value on the number line represents the
frequency of occurrence of that value.
Example 3.7
The purpose of a study is to investigate how much sugar and how much sodium
(the main ingredient of salt) is in breakfast cereals. The following table lists 15
popular cereals and the amounts of sodium and sugar contained in a single
serving of 180 ml.
Cereal Sodium (mg) Sugar (g) Cereal Sodium (mg) Sugar (g)
A 290 2 I 250 10
B 200 3 J 125 14
C 230 3 K 220 3
D 125 13 L 0 7
E 260 5 M 220 12
F 200 11 N 170 3
G 210 12 O 140 10
H 140 10
Construct a dot plot for the sodium values of the breakfast cereals.
[Figure: dot plot of sodium (mg); horizontal axis from 0 to 300]
The dot plot gives us an overview of all the data. We see that the sodium values
fall between 0 and 290 mg, with most cereals falling between 125 and 250 mg.
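A rough text version of this dot plot can be produced as follows. This is only a sketch: it uses the sodium values from the table in Example 3.7 and prints the plot rotated, with one row per distinct value and the dots stacked sideways instead of vertically.

```python
from collections import Counter

# Sodium values (mg) for cereals A to O, from the table in Example 3.7
sodium = [290, 200, 230, 125, 260, 200, 210, 140,
          250, 125, 220, 0, 220, 170, 140]

counts = Counter(sodium)   # frequency of each distinct value
for value in sorted(counts):
    # one dot per observation; equal values stack up on the same row
    print(f"{value:>4} mg  " + "." * counts[value])
```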
Activity 3.9
1. Construct a dot plot for the sugar values of the breakfast cereals.
2. What does the dot plot tell us about the data?
Stem Leaf
7 6
Units of measure: Stem: tens
Leaf: ones
Steps
1. Select one or more leading digit(s) for the stem values. You can choose the
digits to serve as the stem, but keep them constant for all the stems.
2. Find the smallest number and the largest number in the distribution of
numbers. These will give the first stem and the last stem.
3. List all possible stems in increasing order to the left of the line.
4. The trailing digit(s) become the leaves.
5. Record the leaf for every observation beside the corresponding stem value.
6. Place the leaves with the same stem on the same row as the stem.
7. Arrange the leaves in each row from lowest to highest to form a stem-and-
leaf plot.
8. Use a label to indicate the units for stems and leaves in the display.
9. Count the number of leaves per row and enter the answer in a column next
to the display. That is the frequency of each row.
10. The display conveys information about:
• a representative or typical value in the data set
• the extent to which the data values are spread out
• the presence of any gaps in the data
• the extent of symmetry in the distribution of values
• the number and location of peaks
• the presence of unusual values (outliers) in the data set.
11. When the stem has many leaves it does not clearly portray where the data
falls. In this case it is useful to split each stem in two: putting leaves from 0
to 4 on the first stem and from 5 to 9 on the second stem.
12. To make a stem-and-leaf plot more compact we can remove the last digit. For
example, 0.311, 370 and 125 will become 0.31, 37 and 12. Just remember
to indicate the correct unit for the leaves: for instance, in the case of 125, if
the 5 falls away, the stem will be 1 with unit hundred and the leaf will be 2
with unit ten.
Example 3.8
78 82 96 74 52 68 82 78 74 76
88 62 66 76 76 84 95 91 58 86
1. The smallest number is 52 and the largest number is 96. Use the first digit
(the tens) in each number as the stem and the last digit (the units) as the leaf.
Stem Leaf
5
6
7
8
9
2. Place each leaf on its stem by placing the trailing digit of each data value on
the right side of the vertical line opposite its corresponding leading digit (stem).
The first value is 78 with 7 the stem and 8 the leaf. Thus, we place 8 opposite
the stem 7.
Stem Leaf
5 28
6 826
7 8484666
8 22846
9 651
3. Order the trailing digits (leaves) in each row from lowest to highest to form a
stem-and-leaf plot.
Stem Leaf
5 28
6 268
7 4466688
8 22468
9 156
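The ordered display can be reproduced with a short program. This is a minimal sketch that assumes two-digit whole numbers, as in Example 3.8; the function name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict

# Data from Example 3.8
data = [78, 82, 96, 74, 52, 68, 82, 78, 74, 76,
        88, 62, 66, 76, 76, 84, 95, 91, 58, 86]

def stem_and_leaf(values):
    """Group values by tens digit (stem) with sorted units digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in range(min(plot), max(plot) + 1):   # list every possible stem
    leaves = plot.get(stem, [])
    # the number in brackets is the frequency of the row
    print(f"{stem} | {''.join(map(str, leaves))}   ({len(leaves)})")
```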
Activity 3.10
The following is an array of the daily litres of used sunflower oil bought by a bio-
diesel plant. Construct a stem-and-leaf plot for the data.
58 63 69 69 70 71 71 72 72 72
73 73 74 75 77 79 80 62 84 84
85 88 91 91 91 94 96 97 99 100
Example 3.9
15 6 14 15 4 15 17 6 18 15
An array of the x-values and the number of times each one occurs (frequency f ):
Activity 3.11
Steps
1. Determine the range of the given ungrouped or ‘raw’ data. The range (R) is
the difference between the largest and smallest values in the data set.
2. Determine the number of class intervals (K). Frequency tables should
contain between five and 20 classes. As a guideline, the number of classes
(K) should be approximately equal to the square root of the sample size, n.
$K = \sqrt{\text{number of observations}} = \sqrt{n}$
Round the answer up to the next whole number.
3. Determine the width (c) of the class interval, which is the range divided by
the number of classes.
$c = \dfrac{R}{K}$
This answer should be rounded to a whole number (either up or down) or to
the same number of decimals as the raw data.
4. Test: the number of intervals multiplied by the width must always be larger
than the range.
$(K \times c > R)$
5. Choose the lower and upper class boundaries of each interval to indicate
the smallest and largest data values that will fall into each class. The classes
must span the entire data set and must not overlap.
Begin by choosing a number for the lower boundary of the first class. Choose
either the lowest data value or a convenient value that is a little smaller. Add
the class width (c) to this value to get the second lower class boundary. Add the
class width to the second lower class boundary to get the third, and so on. List
the lower class boundaries in a vertical column. The upper class boundary of
the first interval is the same as the lower class boundary of the second interval.
The last class ends at a value more than the highest number in the range.
6. Sort the raw data into the classes by making use of the tally method. The tally
method is a method of counting data that falls into each interval. Examine
each data value and determine which class contains the data value. Make a
tally mark or vertical stroke beside that class. For ease of counting, each fifth
tally mark of a class is placed across the prior four marks rather than beside them
as a fifth vertical stroke (| | | | |). Observations that fall exactly on the lower class boundary stay
in that interval; observations that fall exactly on the upper class boundary
go into the next higher class interval. A class contains all observations
from the lower boundary of the class up to but not including the
upper boundary.
7. Count the number of tallies (observations) in each class to obtain the
frequency (f ) for each class.
8. The sum of the frequencies for all class intervals must equal the number of
original data values.
9. It is possible to come to some conclusions, such as: in which class do you find
the majority of the values or the least number of values?
Notes
1. The number of classes should be small enough to provide an effective
summary, but large enough to display the relevant characteristics of the data.
2. Class boundaries must be selected in such a way that the smallest value is
included in the first interval and the largest value in the last interval.
3. Avoid overlapping of intervals so that an observation falls in one class only.
4. The width of all classes should be equal.
5. Open-ended class intervals should be avoided, although they may be useful
when a few values are extremely large or small in comparison with the rest
of the values.
6. Class intervals with a frequency of 0 should be avoided.
Example 3.10
366 155 326 187 245 270 319 223 212 190
193 247 255 235 300 311 180 333 289 245
328 201 260 259 263 313 151 322 270 299
The sample from 30 franchises has been counted into six classes, with a
width of 36 each. For example, 151 up to just under 187 is the first class
interval, the two numbers 151 and 187 are the class boundaries and 3 (the
number of franchises) is the frequency of that class. This means that in three
of the franchises the AA levels in the French fries were between 151 and just
under 187.
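The class construction for Example 3.10 can be retraced in code. This is a minimal sketch of the steps for building a grouped frequency distribution, applied to the 30 AA levels listed in the example. It assumes the first lower class boundary is the smallest data value, and the variable names are illustrative.

```python
import math

# AA levels from Example 3.10
aa_levels = [366, 155, 326, 187, 245, 270, 319, 223, 212, 190,
             193, 247, 255, 235, 300, 311, 180, 333, 289, 245,
             328, 201, 260, 259, 263, 313, 151, 322, 270, 299]

n = len(aa_levels)
data_range = max(aa_levels) - min(aa_levels)   # R = 366 - 151 = 215
k = math.ceil(math.sqrt(n))                    # K = sqrt(30), rounded up
width = round(data_range / k)                  # c = R / K, rounded
assert k * width > data_range                  # test: K x c must exceed R

lower = min(aa_levels)                         # first lower class boundary
frequencies = [0] * k
for value in aa_levels:
    # a value on an upper boundary goes into the next higher class
    frequencies[(value - lower) // width] += 1

for i, f in enumerate(frequencies):
    print(f"{lower + i * width} to under {lower + (i + 1) * width}: {f}")
assert sum(frequencies) == n                   # frequencies must total n
```

The integer division replaces the manual tally: it finds the class whose lower boundary is at or below the value and whose upper boundary is above it.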
Note: The Greek capital letter sigma (Σ) stands for ‘sum the appropriate values’.
Thus we write $x_1 + x_2 + x_3 + x_4 + \dots + x_n$ as

$$\sum_{i=1}^{n} x_i$$

This means the sum of all the x values from 1 to n. This index system must be
used whenever only part of the available information is to be used. In statistics,
however, we usually use all the available information and the notation will be
adjusted by doing away with the index system:

$$\sum_{i=1}^{n} x_i = \Sigma x$$
Activity 3.12
Activity 3.13
A study was recently carried out to determine the amount of time that non-
secretarial office staff spend using computer terminals. The study involved
50 staff and the times spent using computers, in hours per week, were as
follows:
1.2 4.8 10.3 7.0 13.1 16.0 12.7 0.5 5.1 2.2
8.2 0.7 9.0 7.8 2.2 1.8 5.2 14.1 5.5 13.6
12.2 12.5 12.8 13.5 2.5 5.0 15.5 2.5 3.9 6.5
4.2 8.8 7.5 14.4 10.8 16.5 2.8 9.5 17.0 10.5
12.5 10.5 16.0 14.9 0.3 11.6 12.8 17.7 18.0 22.0
Example 3.11
The following is a frequency table showing the AA levels in the potato fries from
a sample of Big Mac’s outlets:
Interpreting interval 2: 17% of the outlets have AA levels in the potato fries
of between 187 and 223. Eight of the outlets have AA levels of less than 223
representing 27% of the outlets.
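The percentages quoted in this interpretation can be reproduced as follows. This is a small Python sketch that assumes the class frequencies 3, 5, 6, 6, 8 and 2, which are what the Example 3.10 data give when tallied into the six classes; the variable names are illustrative only.

```python
# Class frequencies (assumed: tallied from the Example 3.10 data)
freq = [3, 5, 6, 6, 8, 2]
n = sum(freq)

rel = [100 * f / n for f in freq]      # relative frequency (%)

cum = []                               # cumulative 'less than' frequency
running = 0
for f in freq:
    running += f
    cum.append(running)

rel_cum = [100 * c / n for c in cum]   # relative cumulative frequency (%)

# Interval 2 is the class from 187 up to just under 223
print(round(rel[1]), cum[1], round(rel_cum[1]))
```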
Activity 3.14
Use your frequency table from activity 3.13 and construct a relative frequency
distribution, a cumulative frequency distribution, a relative cumulative
frequency distribution and the class midpoints.
Steps
1. Mark the class boundaries on the x-axis. The class intervals are equal in
width; therefore the points must be equidistant from one another.
2. Use either f or % f on the y-axis. A proper scale showing the true zero must
be used on the y-axis in order not to misrepresent the character of the data.
3. Whenever the zero point on the horizontal axis is not in its usual position
at the intersection of the horizontal and vertical axis, the symbol // or some
similar symbol should be used to indicate that.
4. Draw a rectangle for each class directly above the corresponding interval.
The height of each rectangle is the frequency (or relative frequency) of the
corresponding class.
A distribution is skewed to the right if the ‘tail’ (larger values) extends much
farther out to the right :
A distribution is skewed to the left if the ‘tail’ (smaller values) extends much
farther out to the left :
A distribution is uniform if the frequency of each class is the same and the bars
of the histogram have the same length.
Example 3.12
Draw a histogram showing the AA levels in the potato fries from a sample of Big
Mac’s outlets.
[Figure: histogram of frequency (f) against class boundaries showing acrylamide levels (151, 187, 223, 259, 295, 331, 367).]
Activity 3.15
Steps
1. The frequency distribution must show the class midpoint (x) of each class.
2. Mark the class midpoints on the x-axis.
3. Mark the frequencies on the y-axis using a proper scale and preferably
starting at the zero point. The scale must include values large enough to
include the largest frequency.
4. Plot each midpoint together with its corresponding frequency.
5. Connect the successive dots with a straight line to form the polygon.
6. Frequency polygons begin and end on the horizontal axis with a frequency
of zero. On the left end, plot a point one class width to the left of the first
midpoint with a frequency of zero. On the right end, plot a point one class
width to the right of the last midpoint with a frequency of zero.
A polygon that uses the relative frequencies of the intervals rather than the
actual number of points is called a relative polygon. It has the same shape as the
frequency polygon, but uses a percentage scale on the y-axis.
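The points to plot for a frequency polygon can be listed before drawing. The sketch below, in Python, assumes the class boundaries of the acrylamide example (151 to 367 in steps of 36) and the frequencies 3, 5, 6, 6, 8 and 2 tallied from the Example 3.10 data; both are assumptions made for illustration.

```python
boundaries = [151, 187, 223, 259, 295, 331, 367]
freq = [3, 5, 6, 6, 8, 2]  # assumed: tallied from the Example 3.10 data

width = boundaries[1] - boundaries[0]

# Class midpoint = (lower boundary + upper boundary) / 2
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Close the polygon on the horizontal axis: one class width beyond each end
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, freq))
          + [(midpoints[-1] + width, 0)])

for x, f in points:
    print(x, f)
```

Dividing each frequency by the total and multiplying by 100 before plotting gives the points of the relative polygon instead.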
Example 3.13
Below is a frequency table showing the AA levels in the potato fries from a sample
of Big Mac’s outlets. Draw the polygon for this distribution.
[Figure: frequency polygon of number of franchises against class midpoints (169, 205, 241, 277, 313, 349).]
Activity 3.16
Steps
1. The frequency distribution must show class boundaries and cumulative
frequencies.
2. The frequency scale on the y-axis must extend to the total of the frequencies.
Example 3.14
Below is a frequency table showing the AA levels in the potato fries from a sample
of Big Mac’s outlets. Draw an ogive for this distribution.
[Figure: ogive of cumulative number of outlets (0 to 30) against acrylamide levels at the class boundaries (151, 187, 223, 259, 295, 331, 367).]
Activity 3.17
3.4 Using software
There are a number of useful software packages available for data presentation
and most of them are simple and easy to use.
Computers can help you develop your ideas about how to organise the
information by using a ‘try and refine’ approach, which would take too long to
carry out manually. For example, if you decide to break the information down
in a certain way and the results are not what you need, it is a simple matter to
create new ways and experiment again.
Computer software can produce accurate and professional graphs and charts
from data, but these are only as useful as the data and instructions used to make
them.
TEST YOURSELF 3
N T N R N T N R N
R T M R M M N M
M N R T R R T M
a) Construct a simple bar chart to portray the data for each year.
b) Construct a stacked bar chart to portray the data.
c) Construct a multiple bar chart to portray the data.
d) Comment on your results.
4. A recent newspaper article ‘The need to be connected’ described the results
of a survey of 1 000 adults who were asked about how various essential
technologies, including personal computers, cell phones and DVD players,
influenced their daily lives. The table summarises the responses:
Construct a comparative bar graph to portray the responses for the different
technologies.
5. In the manufacture of printed circuit boards, finished boards are subjected
to a final inspection before they are distributed to customers. The type of
defect for each board rejected at this final inspection during a randomly
selected day is listed together with the frequency of occurrence:
8. A survey of 3 000 adults asked ‘How accurate are the weather forecasts in
your area?’ The responses are summarised in the given table:
Extremely accurate 5%
Very accurate 26%
Sometimes accurate 55%
Not too accurate 9%
Not at all accurate 4%
Not sure 1%
250 325 333 368 301 386 295 308 320 315
310 332 270 334 356 315 334 370 274 260
Construct a dot plot, stem-and-leaf plot and a frequency distribution for the
data.
11. An ecologist wishes to investigate the level of mercury pollution in a stream
in the Dullstroom area. He catches 25 trout and measures the concentration
of mercury (measured in parts per million) in each fish:
2.2 1.4 1.7 3.4 2.7 2.6 3.0 3.6 3.5 2.6 1.9 3.0 3.8
2.2 2.9 1.8 3.0 3.4 2.8 3.3 3.1 3.2 2.3 2.4 3.7
Construct a dot plot and a stem-and-leaf plot from the data. (Hint: split the
stems in two.)
12. The following observations represent the lifetimes (hours) of a certain type
of energy-saver lamp. Construct a dot plot and stem-and-leaf plot for the
data.
612 1 016 1 022 1 003 1 201 883 898 1 029 1 088 1 135
623 666 744 983 1 029 1 058 1 085 1 122 970 964
13. The following observations were measurements on coating thickness for a
sample of low-viscosity paint. Construct a dot plot and stem-and-leaf plot
for the data.
14. The following observations are carbon monoxide levels (ppm) in air samples
obtained from a certain region in Gauteng. Construct a stem-and-leaf plot
for the data.
43 29 29 31 35 39 42 41 31 27 31 33 32
34 28 22 30 40 33 19 30 32 31 28 23
Construct a stem-and-leaf plot and comment on the calorie content of beer.
10 12 22 21 20 24 20 35 31 24
24 45 12 26 17 29 19 7 27 29
17 18 16 13 2 16 12 15 22 11
11 15 41 29 16 21 24 14 24 16
8 33 18 21 12 13 15 21 10 33
The health clinic advertises that 90% of all its patients have a treatment
time of 40 minutes or less. Does the sample data support this claim? (Hint:
Use the cumulative relative ogive to answer the question.)
17. The amounts of protein (g) for a variety of fast-food sandwiches are reported
here:
29 33 23 25 27 40 30 15 35 35 20 18 26
38 27 27 43 57 44 19 35 22 26 22 14 42
35 12 24 24 20 26 12 21 29 34 23 31 15
18. The following are the numbers of kilometres (in thousands) driven during
the year by 110 food inspectors:
40 29 35 33 88 24 38 28 20 21
43 31 18 67 29 76 26 30 23 18
49 44 97 40 48 15 37 43 36 22
55 54 41 34 35 24 38 47 66 34
65 60 32 56 68 38 42 62 55 42
73 31 31 30 36 61 45 52 50 90
30 50 75 20 34 71 51 48 45 84
36 27 52 39 44 51 11 35 41 73
32 65 40 32 81 42 42 53 45 61
10 41 46 84 28 39 47 63 50 52
26 93 36 38 44 58 52 41 55 48
6.9 4.6 4.3 5.0 6.0 5.3 4.6 3.9 6.0 3.9
6.3 4.2 6.0 5.6 4.2 4.6 6.0 4.3 3.6 6.0
6.0 5.8 3.9 5.7 6.0 3.9 3.7 3.9 3.7 3.9
20. The following data represent tons of maize harvested each year for 40 years
from Section 20 on an agricultural experiment farm in the Delmas area:
2.71 2.82 1.35 2.20 1.47 2.39 0.59 0.46 1.31 2.50
1.80 0.89 1.64 1.62 1.39 2.19 1.18 1.26 2.04 2.33
1.32 2.60 2.07 0.94 1.42 1.19 2.34 0.77 0.89 1.44
1.62 2.15 0.95 2.02 1.67 1.99 1.48 0.70 0.98 2.00
21. Many people consider the number of calories in an ice-cream bar more
important than cost. To investigate the calorie content, a sample of 26 bars
gave the following results:
342 310 131 294 209 319 111 353 201 295 182 233 323
234 197 377 439 151 286 147 377 190 182 151 260 301
22. The time, in minutes, for a sample of 70 workers waiting at various points
in the production line were as follows:
1 3 7 23 1 2 5 1 0 6 2 14 5 3 5 6
5 0 1 2 4 5 18 0 1 3 3 1 11 21 1 3
0 6 7 1 19 3 5 1 17 3 5 16 10 2 5 6
1 3 8 5 4 14 15 12 0 2 4 4 2 5 9 6
11 15 13 17 2 20
22 11 33 10 28 7 12 25 32
22 46 21 10 18 17 29 14 2
37 35 3 5 18 4 29 21 20
44 23 31 31 24 13 23 10 36
24. Each of the following figures represents the weight of a package passing
through a sorting office. Construct a frequency distribution with cumulative
and relative columns:
7.9 7.8 5.0 8.6 8.1 7.9 8.2 8.1 7.3 8.0 8.2 4.9 8.0 7.5 7.4
8.0 8.0 7.7 7.8 7.5 7.8 5.3 7.9 6.8 7.5 6.9 5.2 8.5 7.9 7.5
5.2 8.2 4.9 8.7 7.7 7.8 6.0 8.1 8.5 8.0 6.1 7.8 8.1 7.6 7.8
7.9 7.9 5.3 7.9 8.1 7.6 7.9 8.3 7.4 8.4 7.6 8.0 8.0 8.2 8.2
6.9 8.1 5.7 7.9 7.7 7.9 6.8 7.8 7.7 7.5 8.1 8.1 8.0 5.1 5.7
6.0 8.0 5.6 8.2 7.6 7.9 6.2 5.4 5.9 7.8 8.7 6.6 8.1 7.7 6.1
7.8 7.4 8.1 7.3 7.1
25. The following data were recorded on a study of flexural strength of
high-performance concrete obtained by using certain binders and
superplasticisers:
4 using numerical
descriptors
In this unit we look at numerical measures that can be used to describe the
characteristics of data collected in its raw form (ungrouped data) as well as for
data summarised into frequency distributions (grouped data).
Numbers used to describe data sets are called descriptive measures. Statistics
are summary measures used to describe a sample and populations are described
by parameters. For the purpose of this text, samples’ statistics are calculated
and used in later units to estimate the population parameters. Data has three
major characteristics: location, dispersion and shape.
Ungrouped data
Ungrouped (or raw) data will usually be presented as a list of numbers in any
order or quantity.
$\bar{x} = \frac{\sum x}{n}$
Where:
$\bar{x}$ = arithmetic mean
x = each observation value
n = number of observations
Steps
1. Add the values of the individual observations ($\sum x$).
2. Count the number of observations (n).
3. Substitute the totals into the formula for $\bar{x}$.
4. Divide the sum of the values by the number of observations (n).
5. Interpret your answer.
Example 4.1
Calculate the arithmetic mean for the number of cars entering a parking area
during a sample of ten 10-minute intervals.
10 22 31 9 24 27 29 9 23 12
We can conclude that on average 20 cars enter the parking area during a
10-minute interval.
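The calculation behind this conclusion can be sketched in Python. The list name `cars` is illustrative; the values are the ten observations of the example.

```python
from statistics import mean

cars = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

total = sum(cars)     # sum of the observations
n = len(cars)         # number of observations
x_bar = total / n     # arithmetic mean

print(total, n, x_bar)
print(mean(cars))     # the standard library gives the same result
```

The sum is 196 and the mean is 19.6, which rounds to the 20 cars quoted above.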
Activity 4.1
22 29 27 30 12 22 31 15 26 16 48 23
Grouped data
We cannot calculate exact values of the mean without raw data. If the
source of data is from a grouped frequency distribution, the mean can be
approximated using the technique in this section.
You have to assume that each observation of a class falls on the midpoint
(x) of that class. That means that the observations in a particular interval all
take the same value.
$\bar{x} = \frac{\sum xf}{n}$
Where:
x = class midpoint of each class
f = frequency of each class
n = number of observations in the sample ($\sum f$)
Steps
1. Determine the class midpoint (x) of each class.
2. Multiply the midpoint (x) by the frequency (f) to obtain (xf) of each class.
Write the products in a column with the heading xf.
3. Sum the xf column to obtain $\sum xf$.
4. Sum the frequency column to obtain n: $n = \sum f$
5. Substitute the column totals into the formula.
6. Calculate the mean ($\bar{x}$) for grouped data.
7. Interpret your answer.
Example 4.2
The following frequency table shows the time (in minutes) taken to travel to
work for a sample of 25 people from Gauteng. Calculate the mean time to travel
to work.
Class boundaries     f      x       xf
15.5 – < 21.5        2     18.5     37
21.5 – < 27.5        6     24.5    147
27.5 – < 33.5        8     30.5    244
33.5 – < 39.5        4     36.5    146
39.5 – < 45.5        4     42.5    170
45.5 – < 51.5        1     48.5     48.5
Total               25      -      792.5
Steps
1. Calculate the midpoints (x) column by adding the lower boundary to the
upper boundary of each class and dividing the sum by 2.
2. Multiply the midpoint by the frequency of each class to obtain the xf column.
3. Sum the xf column.
4. Sum the f column to obtain n.
5. Substitute $\sum xf$ and n into the formula and calculate $\bar{x}$:
$\bar{x} = \frac{\sum xf}{n} = \frac{792.5}{25} = 31.7$ minutes
The approximate mean time to travel to work for the sample of people from
Gauteng is 31.7 minutes.
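The same steps can be written as a short Python sketch. The variable names are illustrative; the boundaries and frequencies are those of the table in this example.

```python
boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
freq = [2, 6, 8, 4, 4, 1]

# Step 1: class midpoints
mid = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Steps 2 and 3: the xf column and its total
xf = [x * f for x, f in zip(mid, freq)]
sum_xf = sum(xf)

# Step 4: n is the sum of the frequencies
n = sum(freq)

# Step 5: approximate mean for grouped data
grouped_mean = sum_xf / n
print(sum_xf, n, round(grouped_mean, 1))
```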
Activity 4.2
Calculate the mean number of hours of personal computer usage per week for a
sample of 16 people and interpret your answer.
Hours spent          f      x      xf
1.95 – < 3.95        2
3.95 – < 5.95        5
5.95 – < 7.95        5
7.95 – < 9.95        3
9.95 – < 11.95       1
Total               16
4.1.2 Median
The median is the value that occupies the middle position in a data set when
arranged in numerical order. This means that there are an equal number of data
values in the ordered distribution that are above it and below it.
Ungrouped data
Steps
1. Arrange the data in numerical order.
2. Count the number of observations (n).
3. Determine the position of the median.
Median position = $\frac{n + 1}{2}$
4. Read the value of the median from the number list.
• If the number of observations is odd, the median is the value that is
exactly in the middle of the data set.
• If the number of observations is even, the median is the average of the
two middle values in the data set.
Example 4.3
• Determine the position of the median: $\frac{n + 1}{2} = \frac{7 + 1}{2}$ = value number 4
• Count up to value number 4 on the numerical list: median = 12
• 50% of the time there were fewer than 12 customers in the shop and 50%
of the time there were more than 12 customers in the shop.
2. A city planner working on special tracks for bicycles recorded how many
minutes it takes bicycle commuters to pedal from home to their destination.
A sample of 12 bicycle commuters yields the following times:
22 29 27 30 12 22 31 15 26 16 48 23
• Numerical order:
12 15 16 22 22 23 26 27 29 30 31 48
• Position of the median: $\frac{n + 1}{2} = \frac{12 + 1}{2}$ = value number 6.5
• Value number 6.5 falls between 23 and 26.
• Median = $\frac{23 + 26}{2} = 24.5$
• 50% of the commuters took less than 24.5 minutes to travel to their
destination and 50% took more than 24.5 minutes to travel to their
destination.
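The position rule for the median can be sketched in Python as follows. The function name `median_by_position` is illustrative and not taken from the text; the function handles both an odd and an even number of observations.

```python
def median_by_position(values):
    """Median of ungrouped data using the position (n + 1) / 2."""
    ordered = sorted(values)          # arrange in numerical order
    n = len(ordered)
    pos = (n + 1) / 2                 # median position, counting from 1
    if n % 2 == 1:
        return ordered[int(pos) - 1]  # exactly the middle value
    lower = ordered[int(pos) - 1]     # value just below the position
    upper = ordered[int(pos)]         # value just above the position
    return (lower + upper) / 2        # average of the two middle values

times = [22, 29, 27, 30, 12, 22, 31, 15, 26, 16, 48, 23]
print(median_by_position(times))
```

For the 12 commuting times the position is 6.5, so the function averages the sixth and seventh ordered values, 23 and 26.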
Activity 4.3
1. The following numbers represent the typing speeds in words per minute of
five secretaries.
30 90 45 25 55
2. How many calories are in a serving of cheese pizza? A variety of pizzas from
different outlets were sampled and the calories per serving were determined.
The calories were as follows:
332 275 393 347 350 353 357 296 358 322 337 323 333 299
Grouped data
You can calculate an estimated median for a frequency distribution either
graphically or by calculation.
With grouped data we are unable to determine where the true middle value
falls, but we assume that the median value will be value number $\frac{n}{2}$ and that the
frequencies in the median class are evenly spread. Use the following formula to
calculate an estimate for the median.
$\text{Median} = L + \frac{\left(\frac{n}{2} - F\right)c}{f_m}$
Where:
$n = \sum f$
L = lower boundary of the median class
$f_m$ = frequency of the median class
c = width of interval
F = sum of all the frequencies up to but not including the median class.
Steps
1. Determine the location of the median: $\frac{n}{2}$
2. Construct the cumulative frequency column (cum < f).
3. Compare the position of the median with the cum < f column to determine
which one of the intervals contains the median. The median class is the
interval where the cumulative frequency is equal to or exceeds $\frac{n}{2}$ for the first
time.
4. Estimate the value of the median using the formula for grouped data.
Example 4.4
Calculate the median time (in minutes) taken to travel to work for a sample of
25 people in Gauteng.
The median time to travel to work for the 25 people in the sample is:
$\text{Median} = L + \frac{\left(\frac{n}{2} - F\right)c}{f_m} = 27.5 + \frac{\left(\frac{25}{2} - 8\right)6}{8} = 30.88$
This means that half the sampled people travelled less than 30.88 minutes to
work and the other half more than 30.88 minutes.
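The grouped median estimate can be sketched in Python. The function name `grouped_median` is illustrative; the boundaries and frequencies are those of the travel-time distribution used throughout this unit.

```python
def grouped_median(boundaries, freq):
    """Estimate the median of a frequency distribution by interpolation."""
    n = sum(freq)
    half = n / 2                       # location of the median
    F = 0                              # cumulative frequency before the current class
    for i, f in enumerate(freq):
        if F + f >= half:              # first class whose cumulative frequency reaches n/2
            L = boundaries[i]                      # lower boundary of the median class
            c = boundaries[i + 1] - boundaries[i]  # class width
            return L + (half - F) * c / f
        F += f

boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
freq = [2, 6, 8, 4, 4, 1]
print(round(grouped_median(boundaries, freq), 2))
```

Here the median class is the third one (27.5 up to 33.5), with F = 8 and a class frequency of 8.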
Steps
1. Draw the cumulative ‘less than’ ogive.
2. Find the median position $\frac{n}{2}$ on the vertical axis.
3. Draw a straight horizontal line up to the ogive. Drop a straight line down to
the x-axis.
4. The corresponding value on the horizontal x-axis is the median value.
Example 4.5
Use the table from example 4.4 and determine the median value graphically.
1. Find the position of the median on the vertical axis: $\frac{n}{2} = \frac{25}{2} = 12.5$
[Figure: ogive of cum f (no. of people, 0 to 25) against class boundaries in minutes (15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5).]
2. Draw a straight horizontal line from the position on the y-axis up to the
ogive. Drop a straight line down to the x-axis.
3. Read the median value from the x-axis: median ≈ 31 minutes.
Activity 4.4
Calculate the median number of hours of personal computer usage per week for
a sample of 16 people by making use of a formula as well as a graph.
Hours                f
1.95 – < 3.95        2
3.95 – < 5.95        5
5.95 – < 7.95        5
7.95 – < 9.95        3
9.95 – < 11.95       1
Total               16
4.1.3 Mode
The mode of a data set is the value that occurs most frequently. It can be a good
measure to represent a typical value such as the most popular shirt size.
Ungrouped data
Tally the number of observations that occur for each data value. If there is no
value that occurs more often than the others, then there is no mode. (Note: this
is not the same as a mode of 0.) A set of data can have no mode, one mode or
more than one mode (bi-modal or multi-modal).
Example 4.6
1. The commission earnings of five colleagues for the previous month were as
follows:
R5 000 R5 200 R5 200 R5 700 R8 600
The modal commission was R5 200 because more of the colleagues earn
R5 200 than any other income.
2. The lengths of stay (in days) for a sample of nine patients in Ward A are:
17 19 19 4 19 26 4 21 4
The modal lengths of stay are 19 days and four days: more of the patients
stay either four days or 19 days than any other number of days.
There is no mode: none of the workers earn the same income rate.
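The three possible outcomes (one mode, more than one mode, no mode) can be illustrated with Python's `statistics.multimode`. The wrapper `modes` is a hypothetical helper written for this sketch; it returns an empty list when no value occurs more often than the others, following the rule stated above.

```python
from statistics import multimode

def modes(values):
    """Return the sorted list of modes, or [] if every value is equally frequent."""
    found = multimode(values)  # all values that share the highest count
    if len(found) == len(set(values)) and len(found) > 1:
        return []              # no value occurs more often than the others
    return sorted(found)

commissions = [5000, 5200, 5200, 5700, 8600]
stays = [17, 19, 19, 4, 19, 26, 4, 21, 4]

print(modes(commissions))  # one mode
print(modes(stays))        # two modes (bi-modal)
print(modes([1, 2, 3]))    # no mode
```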
Activity 4.5
1.4 15.5 2.1 8 15.5 1.4 17.7 7.2 9.1 15.5
Grouped data
An estimate of the mode can be approximated either graphically or by making
use of a formula. Grouped data does not show a single most frequently
occurring value but assumes that the mode will occur in the interval with the
highest frequency.
$\text{Mode} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)c$
Where:
L = lower boundary of modal class
c = width of interval
$\Delta_1$ = frequency (f) of modal class minus f of previous class
$\Delta_2$ = frequency (f) of modal class minus f of following class
Steps
1. Select the class containing the highest frequency as the modal class.
2. Determine the $\Delta_1$ value by subtracting the frequency of the class preceding
the modal class from the frequency of the modal class.
3. Determine $\Delta_2$ by subtracting the frequency of the class following the modal
class from the frequency of the modal class.
4. Use the formula to estimate the modal value.
5. Interpret your answer.
Example 4.7
The modal time (in minutes) taken to travel to work for a sample of 25 people is:
Time (minutes)       f
15.5 – < 21.5        2
21.5 – < 27.5        6
27.5 – < 33.5        8
33.5 – < 39.5        4
39.5 – < 45.5        4
45.5 – < 51.5        1
Total               25
1. Choose the class with the highest frequency – that is class number 3.
2. $\text{Mode} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)c = 27.5 + \left(\frac{2}{2 + 4}\right)6 = 29.5$ minutes
3. More of the people take 29.5 minutes to travel to work than any other time.
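The modal estimate can be sketched in Python as below. The function name `grouped_mode` is illustrative. Treating a missing neighbouring class as having a frequency of zero is an assumption of this sketch, not something stated in the text, and the sketch does not handle the case where both differences are zero.

```python
def grouped_mode(boundaries, freq):
    """Estimate the mode of a frequency distribution from the modal class."""
    i = freq.index(max(freq))                       # first class with the highest frequency
    prev_f = freq[i - 1] if i > 0 else 0            # assumption: no previous class counts as 0
    next_f = freq[i + 1] if i < len(freq) - 1 else 0
    d1 = freq[i] - prev_f                           # modal f minus f of previous class
    d2 = freq[i] - next_f                           # modal f minus f of following class
    c = boundaries[i + 1] - boundaries[i]           # class width
    return boundaries[i] + d1 / (d1 + d2) * c

boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
freq = [2, 6, 8, 4, 4, 1]
print(round(grouped_mode(boundaries, freq), 1))
```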
Steps
1. Draw the histogram of the frequency distribution.
2. Identify the longest bar on the histogram as the modal bar.
3. Draw a line from the top right corner of the modal bar up to the right corner
of the bar to its immediate left.
4. Draw a second line from the top left corner of the modal bar up to the top left
corner of the bar to its immediate right.
5. Draw a straight line parallel to the y-axis through the intersection point of
the previous two lines down to the x-axis.
6. The value on the x-axis approximates the modal value.
Example 4.8
[Figure: histogram of no. of people (f) against class boundaries (15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5), used to read off the modal value.]
Activity 4.6
Calculate the modal number of hours of personal computer usage per week for a
sample of 20 people using a formula and read the modal number of hours from
an appropriate graph. Interpret your result in the context of the data.
Time (hours)         f
1.95 – < 3.95        2
3.95 – < 5.95        5
5.95 – < 7.95        9
7.95 – < 9.95        3
9.95 – < 11.95       1
Total               20
Example 4.9
The arithmetic mean of the marks is 68. This means that the sum of all the
marks evenly divided by all the learners will give you 68. The median value
is 66, which means that half of the learners scored less than 66 and the
other half scored more than 66. The mode is 66, which means that more
learners obtained 66 than any other mark.
If the values are arranged in numerical order and you slot the arithmetic
mean value in position, you will see that there are four values smaller than
the mean and only one bigger than the mean. This means that the value
on the right is an outlier which pulled the mean to the right, causing the
distribution to be positively skewed. For this reason the median or the mode
will be a better measure to choose.
The arithmetic mean of 89.5 is probably the best average to use since it
takes into account all the test marks of the student and therefore indicates
overall performance.
Activity 4.7
For grouped data the range is the difference between the upper boundary of the
last interval and the lower boundary of the first interval.
Example 4.10
Supplier 1: 480 600 600 600 760
Supplier 2: 480 540 570 760 760
Calculate the range of punnet weights for each supplier and comment on your
results.
The ranges are the same, but it is obvious that the variations within the samples
are different. So the range will not solve the bakery’s problem if they want to
choose a supplier that will provide punnets with consistent weights.
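A quick Python check of the two ranges, using the punnet weights listed in the example (the function name `value_range` is illustrative):

```python
supplier_1 = [480, 600, 600, 600, 760]
supplier_2 = [480, 540, 570, 760, 760]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(value_range(supplier_1), value_range(supplier_2))
```

Both ranges depend only on the two extreme weights, 480 and 760, which is why they cannot distinguish between the suppliers.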
Note: The deviation of a value in a data set is the difference between that value
and the mean of the data set.
Some of the values are smaller than the mean, which will result in a negative
deviation, and others are larger than the mean, which will result in a positive
deviation.
To prevent negative deviations from the mean cancelling positive deviations,
the algebraic signs of the deviations are ignored and the absolute differences are
averaged.
Ungrouped data
Steps
1. Calculate the arithmetic mean ($\bar{x}$) of the distribution.
2. Determine the difference between each value and $\bar{x}$ without regard to the
algebraic sign: $|x - \bar{x}|$. The two vertical lines indicate that you are using
the absolute value.
3. Add the absolute values of the deviations: $\sum |x - \bar{x}|$
4. Divide the sum by the number of values (n).
$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$
Example 4.11
Calculate the mean absolute deviation for the number of cars entering a parking
area during a sample of 10-minute intervals.
x       $|x - \bar{x}|$
10       9.6
22       2.4
31      11.4
9       10.6
24       4.4
27       7.4
29       9.4
9       10.6
23       3.4
12       7.6
$\sum x = 196$    $\sum |x - \bar{x}| = 76.8$
1. $\bar{x} = \frac{196}{10} = 19.6$
2. $\sum |x - \bar{x}| = 76.8$
3. n = 10
4. $\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{76.8}{10} = 7.68$ cars
5. The typical deviation from the mean is 7.68 cars. The smaller the answer,
the less variation we have in the distribution.
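The mean absolute deviation for ungrouped data can be sketched in Python as follows. The function name is illustrative; the data are the ten car counts of the example.

```python
def mean_absolute_deviation(values):
    """MAD = sum of |x - mean| divided by the number of values."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

cars = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]
print(round(mean_absolute_deviation(cars), 2))
```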
Activity 4.8
22 29 27 30 12 22 31 15 26 16 48 23
Determine the mean absolute deviation travelling time for the riders and interpret
the result in the context of the data.
Steps
1. Calculate the arithmetic mean ($\bar{x}$) of the distribution.
2. Determine the deviation of each midpoint (x) from $\bar{x}$ without regard to the
algebraic sign: $|x - \bar{x}|$
3. Multiply the absolute deviation in each class by the frequency of that class:
$|x - \bar{x}|f$
4. Add the absolute values of the deviations: $\sum |x - \bar{x}|f$
5. Divide the sum by the number of values ($n = \sum f$)
6. Formula: $\text{MAD} = \frac{\sum |x - \bar{x}|f}{n}$
Example 4.12
The following frequency table shows the time (in minutes) taken to travel to work
for a sample of 25 people from Gauteng. Calculate the mean absolute deviation
time to travel to work.
Time (minutes)       f      x      $|x - \bar{x}|f$
15.5 – < 21.5        2     18.5     26.40
21.5 – < 27.5        6     24.5     43.20
27.5 – < 33.5        8     30.5      9.60
33.5 – < 39.5        4     36.5     19.20
39.5 – < 45.5        4     42.5     43.20
45.5 – < 51.5        1     48.5     16.80
Total               25             158.40
The average absolute difference between each observation of the time taken to
travel to work and the mean is 6.34 minutes.
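The figure of 6.34 minutes follows from dividing the column total 158.40 by n = 25. A minimal Python sketch of the grouped calculation, with illustrative variable names:

```python
boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
freq = [2, 6, 8, 4, 4, 1]

n = sum(freq)
mid = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
x_bar = sum(x * f for x, f in zip(mid, freq)) / n          # grouped mean

abs_dev_f = [abs(x - x_bar) * f for x, f in zip(mid, freq)]  # |x - mean| * f per class
mad = sum(abs_dev_f) / n

print(round(sum(abs_dev_f), 2), round(mad, 2))
```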
Activity 4.9
Ungrouped data
Steps
1. Calculate the arithmetic mean ($\bar{x}$).
2. Find the difference between each observation and the mean by subtracting
$\bar{x}$ from each data value: $(x - \bar{x})$
3. Square each difference: $(x - \bar{x})^2$
4. Sum the squared differences: $\sum (x - \bar{x})^2$
5. Divide the sum by $(n - 1)$ to get the average squared difference.
Example 4.13
Calculate the standard deviation for the number of cars entering a parking area
during a sample of 10-minute intervals.
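A minimal Python sketch of this calculation is given here, assuming the sample standard deviation is the square root of the sum of squared deviations divided by (n − 1), which is the form used for grouped data later in this unit. The data are the ten car counts from Example 4.1.

```python
from math import sqrt
from statistics import stdev

cars = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

n = len(cars)
x_bar = sum(cars) / n                          # arithmetic mean
sum_sq = sum((x - x_bar) ** 2 for x in cars)   # sum of squared differences
s = sqrt(sum_sq / (n - 1))                     # sample standard deviation

print(round(s, 2))
print(round(stdev(cars), 2))                   # library check, same divisor (n - 1)
```

The result, about 8.72 cars, agrees with the value of s quoted for this data set in Activity 4.15.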
Activity 4.10
22 29 27 30 12 22 31 15 26 16 48 23
Determine the standard deviation of the travelling time for the riders and
interpret the result in the context of the data.
Grouped data
To estimate the standard deviation from data grouped into a frequency
distribution, we assume that each class is represented by its midpoint (x).
Steps
1. You need a frequency table with the following columns: classes, frequencies
and midpoints.
2. Compute the arithmetic mean ($\bar{x}$).
3. Subtract the mean from each class midpoint and square the difference:
$(x - \bar{x})^2$
4. Multiply the squared difference by the frequency within each class: $(x - \bar{x})^2 f$
5. Sum the result to obtain the total squared deviation from the mean:
$\sum (x - \bar{x})^2 f$
6. Calculate the average of this total by dividing by $(n - 1)$.
7. The standard deviation is the square root of this average.
8. Formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$
Example 4.14
The following frequency table shows the time (in minutes) taken to travel to
work for a sample of 25 people from Gauteng. Calculate the standard deviation
of the time to travel to work.
Class boundaries     f      x      $(x - \bar{x})^2 f$
15.5 – < 21.5        2     18.5      348.48
21.5 – < 27.5        6     24.5      311.04
27.5 – < 33.5        8     30.5       11.52
33.5 – < 39.5        4     36.5       92.16
39.5 – < 45.5        4     42.5      466.56
45.5 – < 51.5        1     48.5      282.24
Total               25      -      1 512.00
1. $\bar{x} = \frac{\sum xf}{n} = \frac{792.5}{25} = 31.7$ minutes (from example 4.12)
2. Subtract 31.7 from each x-value and square the difference.
3. Multiply each squared difference by the frequency of that class and
record the answers in the $(x - \bar{x})^2 f$ column. The first value in this column is
calculated as $(18.5 - 31.7)^2 \times 2$.
4. Sum the results.
5. Substitute the totals into the formula to determine the standard deviation.
The standard deviation of 7.94 minutes is the typical deviation of the travelling
times from the mean.
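The substitution into the formula is $s = \sqrt{1\,512 / 24} = \sqrt{63}$. A short Python sketch of the whole grouped calculation, with illustrative variable names:

```python
from math import sqrt

boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
freq = [2, 6, 8, 4, 4, 1]

n = sum(freq)
mid = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
x_bar = sum(x * f for x, f in zip(mid, freq)) / n             # grouped mean

sq_dev_f = [(x - x_bar) ** 2 * f for x, f in zip(mid, freq)]  # (x - mean)^2 * f per class
total = sum(sq_dev_f)
s = sqrt(total / (n - 1))                                     # grouped standard deviation

print(round(total, 2), round(s, 2))
```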
Activity 4.11
Class intervals      f
1.95 – < 3.95        2
3.95 – < 5.95        5
5.95 – < 7.95        5
7.95 – < 9.95        3
9.95 – < 11.95       1
Total               16
$\text{CV} = \frac{s}{\bar{x}} \times 100$
This is a unit-free number because the standard deviation and mean are
measured using the same units. The higher the result the more variability there
is in a set of data.
All the measures of dispersion described so far have dealt with a single set of
data. In practice it is often important to compare two or more sets of data with
different means, sample sizes or measurement units.
Example 4.15
CV(1 000 ml) = $\frac{50}{1\,000} \times 100 = 5\%$
CV(500 ml) = $\frac{40}{500} \times 100 = 8\%$
For the 1 000 ml bottle, the standard deviation of the filling process is 5% of
the filling mean. For the 500 ml bottle, it is 8% of the filling mean.
Although the machine filling the smaller bottle has a lower standard deviation,
the CVs indicate that it is the machine filling the larger bottle which is relatively
more consistent.
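The comparison can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is illustrative; the standard deviations (50 and 40) and means (1 000 and 500) are the values used in the example.

```python
def coefficient_of_variation(s, x_bar):
    """CV = (standard deviation / mean) * 100, a unit-free percentage."""
    return s / x_bar * 100

cv_large = coefficient_of_variation(50, 1000)   # 1 000 ml bottle
cv_small = coefficient_of_variation(40, 500)    # 500 ml bottle

print(cv_large, cv_small)
print("larger bottle more consistent:", cv_large < cv_small)
```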
Activity 4.12
Two growers of grapefruit have obtained the following statistics regarding the
mass of their current crops:
Grower A: $\bar{x} = 300$ g with s = 20 g
Grower B: $\bar{x} = 280$ g with s = 40 g
A distribution is skewed if the curve appears skewed either to the left or to the
right, meaning that the one tail extends more to one side than the other. The
mode stays at the peak of the distribution because outliers do not influence the
mode at all. The influence of outliers is highest on the arithmetic mean because
the mean is affected by all values in the data set, including the extreme ones, and
tends to be located toward the tail of the skewed distribution. The median, being
dependent on the number of values in the data set rather than on the size of those
values, is less sensitive than the mean, since only the middle measurements are
used for its calculation. It is located somewhere between the mode and the mean.
Positive skewness (or skewed to the right) occurs when the majority of
the data values are concentrated on the left. There are a few data values that
are substantially larger than others and these larger values cause the mean to
increase while having little, if any, effect on the median. The mean will exceed
the median, and both the mean and the median will be greater than the mode.
The tail to the right will be longer than to the left.
Negative skewness occurs when the majority of the data values are concentrated
on the right of the distribution. There are a few data values that are substantially
lower than others and these smaller values cause the mean to decrease while
having little, if any, effect on the median. The mean will be less than the median,
and the mode will exceed both the mean and the median. The tail to the left will
be longer than to the right.
Simply by knowing the value of the skewness coefficient we can infer the general
shape of the distribution without resorting to a diagram:
• Skewness is measured on a scale: $-3 \le SK \le +3$
Example 4.16
Activity 4.13
Leptokurtic
Mesokurtic
Platykurtic
Consider a data set with a mean of 100 and a standard deviation of 15.
• The mean minus one standard deviation = $100 - 15 = 85$. This means that
85 is one standard deviation below the mean. The z-score = $\frac{85 - 100}{15} = -1$.
A z-score is negative if the data value is less than the mean.
• The z-score for a value of 115 = $\frac{115 - 100}{15} = 1$. This means that 115 is one
standard deviation above the mean. A z-score is positive if a data value is
greater than the mean.
• All observations that fall between 85 and 115 are within one standard
deviation from the mean.
• Two standard deviations = $2 \times 15 = 30$, $100 - 30 = 70$ and $100 + 30 = 130$.
All observations that fall between 70 and 130 are within two standard
deviations from the mean.
• $100 + 3(15) = 145$. Observations above 145 exceed the mean by more than
three standard deviations.
4. The following two rules can be applied, depending on the shape of the
distribution.
• If the distribution is symmetrical, you can make a statement about the
proportion of data values that fall within a specified number of standard
deviations of the mean by making use of the empirical rule.
• A more general interpretation of the proportion of data values that fall
within a specified number of standard deviations of the mean is derived
from Chebysheff’s theorem, which applies to distributions of all shapes.
Empirical rule:
• Approximately 68% of all observations fall within one standard deviation
from the mean.
• Approximately 95% of all observations fall within two standard deviations
from the mean.
• Approximately 99.7% of all observations fall within three standard deviations
from the mean.
Chebysheff’s theorem
• The proportion of observations in any sample that lies within z standard
deviations of the mean must be at least $\left(1 - \frac{1}{z^2}\right) \times 100$ per cent, where z is any value
greater than 1.
• For z = 2, at least $\left(1 - \frac{1}{2^2}\right) \times 100 = 75\%$ of all observations will fall within
two standard deviations of the mean. That will be the values between $\bar{x} - 2s$
and $\bar{x} + 2s$.
• For z = 3, at least $\left(1 - \frac{1}{3^2}\right) \times 100 = 88.89\%$ of all observations will fall
within three standard deviations of the mean. That will be the values between
$\bar{x} - 3s$ and $\bar{x} + 3s$.
• For z = 4, at least $\left(1 - \frac{1}{4^2}\right) \times 100 = 93.75\%$ of all observations will fall within
four standard deviations of the mean. That will be the values between $\bar{x} - 4s$
and $\bar{x} + 4s$.
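The z-score and the Chebysheff bound can be checked with a few lines of code. The sketch below is illustrative only: the function names are assumptions made for this example, and the numbers are the ones used in the text (mean 100, standard deviation 15).

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


def chebysheff_minimum_percent(z):
    """Minimum percentage of observations within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return (1 - 1 / z**2) * 100


print(z_score(85, 100, 15))    # -1.0
print(z_score(115, 100, 15))   # 1.0
for z in (2, 3, 4):
    # Prints 75.0, 88.89 and 93.75 for z = 2, 3 and 4
    print(z, round(chebysheff_minimum_percent(z), 2))
```

The bound holds for a distribution of any shape, which is why it is lower than the empirical-rule percentages for symmetrical distributions.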
Example 4.17
13 26 16 21 15 31 15 30 14 11
1. $\bar{x}$ = 19.2 hours
2. s = 7.33 hours
3. Range: 31 − 11 = 20 hours
4. The stem-and-leaf plot shows a positively skewed distribution. Pearson’s
second coefficient of skewness can also be used to determine the shape.
Stem | Leaf
1 | 1 3 4 5 5 6
2 | 1 6
3 | 0 1
Example 4.18
The stem-and-leaf plot below displays the IQ scores of a sample of 112 children.
Stem: tens
Leaf: ones
Stem Leaf
6 1
7 25679
8 0000124555668
9 0000112333446666778889
10 0001122222333566677778899999
11 00001122333344444477778899999
12 01111123445669
13 006
14 26
15 2
The summary statistics for this distribution are: $\bar{x}$ = 104.5 and s = 16.3
Activity 4.14
The following frequency table shows the time (in minutes) taken to travel to work
for a sample of 25 people from Gauteng. Interpret the centre and variability for
the sample.
Class boundaries    f
15.5 – < 21.5       2
21.5 – < 27.5       6
27.5 – < 33.5       8
33.5 – < 39.5       4
39.5 – < 45.5       4
45.5 – < 51.5       1
Total              25
Activity 4.15
Interpret the centre and variability of the number of cars entering a parking
area during a sample of 10-minute intervals. From previous examples we know
that:
$\bar{x}$ = 19.6
s = 8.72
median = 22.5
10 22 31 9 24 27 29 9 23 12
quartile, Q2 or P50. The exact location must therefore be determined before the
value can be calculated.
Ungrouped data
Steps
1. Arrange the numbers in numerical order.
2. Change quartiles to percentiles (Q1 = P25 and Q3 = P75).
3. Determine the position of the percentile or quartile you want to obtain.
(Where in the numerical list will you locate this value?) Use the following
formula to determine the position of the quartile or percentile:
$\text{Position of } P_j = \frac{jn}{100}$
with j the jth percentile and n the number of observations.
4. If the position results in a fraction, choose the next larger integer. If the
position results in an integer, add 0.5, that is, use the average of the value
at that position and the value at the next position.
5. Read the value from the numerical list.
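The steps above can be sketched in code. This is a minimal illustration of the textbook's position rule only (statistical packages often use other interpolation rules and may give slightly different answers); the function name is an assumption made for this example.

```python
import math


def percentile_value(values, j):
    """Return the jth percentile (0 < j < 100) using the position rule jn/100."""
    if not 0 < j < 100:
        raise ValueError("j must lie strictly between 0 and 100")
    data = sorted(values)               # Step 1: arrange in numerical order
    n = len(data)
    position = j * n / 100              # Step 3: position of the percentile
    if position.is_integer():           # Step 4: integer position, so add 0.5,
        k = int(position)               # i.e. average the kth and (k + 1)th values
        return (data[k - 1] + data[k]) / 2
    k = math.ceil(position)             # Step 4: fraction, so take the next larger integer
    return data[k - 1]                  # Step 5: read the value (list index is k - 1)


data = [26, 13, 16, 21, 15, 31, 15, 30, 14, 11]
print(percentile_value(data, 25))   # Q1 -> 14
print(percentile_value(data, 75))   # Q3 -> 26
print(percentile_value(data, 80))   # P80 -> 28.0
print(percentile_value(data, 20))   # P20 -> 13.5
```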
Example 4.19
26 13 16 21 15 31 15 30 14 11
2. Determine the position of each value and read the value from the array.
• Q1 position = P25 = $\frac{jn}{100} = \frac{25(10)}{100} = 2.5$, rounded up to position 3.
Q1 value = 14. This means that 25% of the TV cartoons have less than 14
incidents of verbal and physical violence per cartoon and the other 75%
have more than 14 incidents per cartoon.
• Q3 position = P75 = $\frac{75(10)}{100} = 7.5$, rounded up to position 8.
Q3 value = 26. This means that 75% of the cartoons have less than 26
incidents of verbal and physical violence per cartoon and 25% have more
than 26 incidents.
• P80 position = $\frac{80(10)}{100} = 8$, adjusted to position 8.5.
P80 value = $\frac{26 + 30}{2} = 28$
This means that 80% of the cartoons have less than 28 incidents and
20% have more than 28 incidents of verbal and physical violence.
• P20 position = $\frac{20(10)}{100} = 2$, adjusted to position 2.5.
P20 value = $\frac{13 + 14}{2} = 13.5$
This means that 20% of the cartoons have less than 13.5 incidents.
Activity 4.16
The following data shows the number of cars entering a parking area during a
sample of 10-minute intervals.
10 22 9 24 27 29 9 23 12 31
Calculate Q1, Q3, P90 and P10. Interpret your answers in context of the data.
Grouped data
Steps
1. Construct a frequency distribution with classes and frequencies.
2. Construct the cumulative ‘less than’ frequency column.
3. Determine the position of the quartile or percentile value.
$\text{Position of } Q_j = \frac{jn}{4}$
$\text{Position of } P_j = \frac{jn}{100}$
$Q_j = L + \frac{\left(\frac{jn}{4} - F\right) c}{f_Q}$
$P_j = L + \frac{\left(\frac{jn}{100} - F\right) c}{f_P}$
Where:
L = Lower boundary of the quartile or percentile class
n = Σf
fQ = frequency of the quartile class
fP = frequency of the percentile class
c = width of the class
F = sum of the frequencies up to but not including the chosen class
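As an illustration, the grouped-data formulas can be written as two small functions. This is a sketch under stated assumptions: the function names are invented for this example, all classes are assumed to have the same width, and the frequency table used is the travel-time table for 25 people from Gauteng given elsewhere in this unit.

```python
def interpolate_fractile(L, position, F, f, c):
    """Apply L + (position - F) * c / f for the class that contains the position."""
    return L + (position - F) * c / f


def grouped_percentile(lower_boundaries, width, frequencies, j):
    """Find the class holding the jth percentile and interpolate within it."""
    n = sum(frequencies)
    position = j * n / 100
    cumulative = 0                      # F: frequencies before the current class
    for L, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= position:  # the position falls in this class
            return interpolate_fractile(L, position, cumulative, f, width)
        cumulative += f
    raise ValueError("position lies beyond the last class")


# Travel-time table: class boundaries 15.5 to 51.5 in steps of 6 minutes
lower = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5]
freq = [2, 6, 8, 4, 4, 1]
print(grouped_percentile(lower, 6, freq, 25))   # Q1 -> 25.75
print(grouped_percentile(lower, 6, freq, 75))   # Q3 -> 37.625
```

A quartile is obtained by asking for the matching percentile, since Q1 = P25 and Q3 = P75.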
These values can also be determined graphically by making use of the ogive.
Steps
1. Locate the position of the fractile on the y-axis.
2. Draw a horizontal line from the y-axis to the ogive.
3. Drop a straight line down to the x-axis.
4. Read the estimated value of the fractile from the x-axis.
Example 4.20
1. Position of Q1 = $\frac{1n}{4} = \frac{1(255)}{4} = 63.75$
Position of Q3 = $\frac{3n}{4} = \frac{3(255)}{4} = 191.25$
Position of P85 = $\frac{85n}{100} = \frac{85(255)}{100} = 216.75$
3. $Q_1 = 3.5 + \frac{\left(\frac{1(255)}{4} - 32\right) \times 3}{108} = 4.38$
This means that 25% of the patients stay less than 4.38 days in hospital.
The rest, 75% of the patients, stay longer than 4.38 days.
4. $Q_3 = 6.5 + \frac{\left(\frac{3(255)}{4} - 140\right) \times 3}{67} = 8.79$
This means that 75% of the patients stay less than 8.79 days in hospital.
5. $P_j = L + \frac{\left(\frac{jn}{100} - F\right) c}{f_P}$
$P_{85} = 9.5 + \frac{\left(\frac{85(255)}{100} - 207\right) \times 3}{28} = 10.54$
This means that 85% of the patients stay less than 10.54 days in the hospital.
The other 15% of the patients stay longer than 10.54 days.
Activity 4.17
The following frequency table shows the time (in minutes) taken to travel to work
for a sample of 25 people from Gauteng. Calculate Q1, Q3, P90 and P10. Interpret
your answers in context of the data.
Class boundaries    f
15.5 – < 21.5       2
21.5 – < 27.5       6
27.5 – < 33.5       8
33.5 – < 39.5       4
39.5 – < 45.5       4
45.5 – < 51.5       1
Total              25
Q1 Q2 Q3
P25 P50 P75
The interquartile range = Q3 − Q1, which is the range of the middle two quarters, or the middle 50%, of the data.
2. A middle range is the middle proportion of the data between two percentiles
where the cut-off portions, at the beginning of the data set and the end of
the data set, are equal.
• Middle 80% range = P90 − P10
[This means that the first and the last 10% of the data are cut off.]
• Middle 40% range = P70 − P30
[This means that the first and the last 30% of the data are cut off.]
Example 4.21
26 13 16 21 15 31 15 30 14 11
Q1 value = 14
Q3 value = 26
P80 value = 28
P20 value = 13.5
Interquartile range = Q3 − Q1 = 26 − 14 = 12
Middle 60% range = P80 − P20 = 28 − 13.5 = 14.5
Activity 4.18
The following data shows the number of cars entering a parking area during a
sample of 10-minute intervals. Calculate the middle 80% range, middle 70%
range and the middle 60% range.
10 22 9 24 27 29 9 23 12 31
Steps
1. Determine the Q3 and Q1 values.
2. Subtract Q1 from Q3 and divide the difference by 2.
Example 4.22
Q1 = 4.38
Q3 = 8.79 (from Example 4.20)
Quartile deviation = QD = $\frac{Q_3 - Q_1}{2} = \frac{8.79 - 4.38}{2} = 2.2$
Activity 4.19
Determine the quartile deviation for the number of cars entering a parking area
during a sample of 10-minute intervals.
10 22 9 24 27 29 9 23 12 31
Note: The five numerical values divide the data set into four subsets, with
approximately 25% of the observations in each quarter.
S Q1 Med Q3 L
Example 4.23
11 13 14 15 15 16 21 26 30 31
Activity 4.20
Do a five-number summary table for the number of cars entering a parking area
during a sample of 10-minute intervals.
10 22 9 24 27 29 9 23 12 31
It visually shows the range of the data values by indicating the smallest and
largest values, the first and third quartiles, the median (to show where the data
is centred) and how spread out the data is. The degree of symmetry can be
identified by inspection.
• It is compact, and provides information about centre, spread, symmetry and
the presence of outliers.
• These plots are particularly useful when you want to compare several sets of
related data.
Steps
1. Draw a horizontal x-axis which covers the range of the data values (S and L).
2. Do the five-number summary table.
3. Draw a rectangular box horizontal to the x-axis whose left edge is at the
lower quartile value (Q1) and whose right edge is at the upper quartile value
(Q3). The box width is the interquartile range and shows the spread of the
middle 50% of the data.
4. Draw a vertical line inside the box at the median.
5. Draw whiskers (horizontal lines) from the midpoint of each end of the
box out to the smallest value and to the largest value.
6. Place markers at distances 1.5 times the interquartile range from either end
of the box. These are known as the inner fences.
7. Outliers are data values that lie beyond the inner fences, that is, between
an inner fence and the smallest or largest value.
8. A box plot also shows the symmetry or skewness of a distribution. In a
symmetric distribution the Q1 and Q3 values are equally distant from the
median. If the distribution is skewed to the right, the Q3 value will be farther
away from the median than the Q1. If the distribution is skewed to the left,
Q3 will be closer to the median than Q1.
Example 4.24
11 13 14 15 15 16 21 26 30 31
[Figure: box plot of the data on an x-axis running from −5 to 45, with Q1 and Q3 marked]
Interpretation
The five-number summary values are all indicated on the plot:
• Outliers are any observations larger than 26 + 1.5(12) = 44 or smaller than
14 − 1.5(12) = −4. The whisker to the left extends to 11, which is the smallest
value and not an outlier. The whisker to the right extends to 31, which is the
largest value and not an outlier. That means there are no outliers.
• The Q1 is closer to the median than Q3; therefore the distribution is positively
skewed.
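The fence calculation in this interpretation can be reproduced with a short sketch. The function names are assumptions made for this example; the quartile values Q1 = 14 and Q3 = 26 are the ones obtained earlier for this data set.

```python
def inner_fences(q1, q3):
    """Return the lower and upper inner fences, 1.5 interquartile ranges beyond the box."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the observations that fall outside the inner fences."""
    lower, upper = inner_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


data = [11, 13, 14, 15, 15, 16, 21, 26, 30, 31]
print(inner_fences(14, 26))          # (-4.0, 44.0)
print(find_outliers(data, 14, 26))   # [] -> no outliers
```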
Activity 4.21
10 22 9 24 27 29 9 23 12 31
TEST YOURSELF 4
For the distributions below, calculate and interpret in the context of the data (if
possible):
• arithmetic mean
• median
• mode
• range
• mean absolute deviation
• standard deviation
• variance
• coefficient of variation
• Pearson’s second coefficient of skewness
• draw graphs to determine median and mode
• interquartile range
• quartile deviation
• box plot.
1. The following data represents the pulse rates (beats per minute) of nine
students enrolled for the statistics course:
76 60 60 81 72 80 80 68 73
4. During a quality assurance check, the actual coffee content (in grams) of six
jars of instant coffee was recorded as:
82.9 76.9 88.0 82.5 82.4 82.8
5. A sample of apples, guavas and mangos were analysed for the pesticide residues
in the fruit. The amounts, in mg/kg, of a certain pesticide were as follows:
0.2 1.6 4.0 5.4 5.7 11.4 0.2 3.4 2.4 6.6 4.2 2.7
6. During one month, records show the following results for the number of
workers absent per day:
13 14 9 17 21 10 15 22 19 13 5
22 13 19 23 17 21 10 9 20 18
7. The daily sales of a small business (in R’000) are given below for an eight-
day period:
8.2 11.5 10.1 9.4 15.1 6.1 10.3 12.3
8. The following sample of lifetimes (in hours) of a certain type of battery used
in a remote control is recorded as follows:
5.5 5.1 6.2 6.5 5.8 5.6 5.8 6.0
11. The number of minutes after their appointment times each of a random
sample of 64 patients had to wait to be served in a major local health facility
were observed as follows:
Waiting time    Number of patients
0 – < 4         10
4 – < 8         17
8 – < 12        16
12 – < 16       14
16 – < 20        7
13. The lecturer in computer science recorded the amount of computer time (in
minutes) needed by each student to complete an assignment:
Time            Number of students
0.1 – < 0.5      3
0.5 – < 0.9     10
0.9 – < 1.3     16
1.3 – < 1.7      9
1.7 – < 2.1      5
14. A study of the number of trips on a particular day for a sample of 40 taxi
drivers revealed the following data:
Number of trips    Frequency
0 – < 5             3
5 – < 10            6
10 – < 15           8
15 – < 20          13
20 – < 25           7
25 – < 30           3
15. A factory manager records the yearly sick leave (rounded to the nearest half
day) taken by his employees:
17. The Strongbo Rubber Company has two factories. Both factories employ
students during the holiday seasons. In factory A, the students are paid on
average R982 per week with a standard deviation of R158. In factory B,
the students earn on average R1 208 per week with a standard deviation of
R214. Which factory has the greatest relative dispersion?
18. Data has been collected on the life (in hours) of two brands of light bulbs.
Compare the two brands using the coefficient of variation.
Brand A                         Brand B
Mean = 5 800                    Mean = 5 770
Standard deviation = 5 100      Standard deviation = 560
5 Index numbers
Index numbers are used in business and economics as indicators to measure how
much an economic variable changes over time or differs between two locations.
The types of indexes that dominate business and economic applications are:
1. Price indexes (Ip) which are the most frequently used. They measure the
percentage change between two time periods (or other characteristic)
of a product or group of products. The current unit price of a product is
expressed as a percentage of the unit price in the base period.
2. Quantity or volume indexes (Iq) which measure how much the quantity of a
commodity or group of commodities changes over time.
An index number that represents a comparison for a single item is a simple index
number. In contrast, when the index number has been constructed for a group of
items, known as a basket of goods, it is an aggregate or composite index number.
Steps
1. Obtain the prices or quantities for the product over the time period of interest.
2. Select the period to be used as base.
3. Divide the current price (Pi) of the commodity by the base price (Pb).
4. Multiply this ratio by 100.
5. Price index: $I_P = \frac{\text{current price}}{\text{base price}} \times 100 = \frac{P_i}{P_b} \times 100$
Pi represents the current period price and Pb the base period price.
6. The formula for a quantity index can be obtained by replacing the prices P
with the quantities Q in the price index formula:
$I_Q = \frac{Q_i}{Q_b} \times 100$
Qi represents the current period quantity and Qb the base period quantity.
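A simple index is a single division followed by a multiplication, as the following sketch shows. The function name is an assumption made for this example; the figures are the milk prices of R3.50 (2012, base period) and R3.85 (2013).

```python
def simple_index(current, base):
    """Simple price or quantity index: current value as a percentage of the base value."""
    return current / base * 100


milk_index = simple_index(current=3.85, base=3.50)
print(round(milk_index, 2))         # 110.0
print(round(milk_index - 100, 2))   # 10.0 -> a 10% increase on the base period
```

The same function gives a quantity index when quantities are passed in place of prices.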
Example 5.1
If milk cost R3.50 per litre in 2012 and R3.85 in 2013, the simple price index
for 2013 will be
$I_P = \frac{P_i}{P_b} \times 100 = \frac{3.85}{3.50} \times 100 = 110$
This means that milk increased in price by 10% during the period concerned.
Activity 5.1
The following table provides the price per 500 g and the quantities purchased of
nuts during the years 2012 and 2013. Use 2012 as base and construct simple
price and quantity indexes per commodity for 2013.
Prices per kg (rands) Quantities purchased (kg)
2012 2013 2012 2013
Peanuts 45 50 600 550
Pecans 55 60 300 350
Cashews 60 70 325 325
Steps
1. Obtain the prices for the commodity over the time period of interest.
2. Select the base period.
3. Sum the prices of all the items in the current period (ΣPi). This is the numerator.
4. Sum the prices of all the items in the base period (ΣPb). This is the denominator.
5. Divide the numerator by the denominator.
6. Multiply the result by 100.
7. Interpret the answer.
8. If you want to calculate a simple unweighted quantity index, substitute all
the prices with quantities.
$I_Q = \frac{\sum Q_i}{\sum Q_b} \times 100$
Example 5.2
The following table shows the costs of course material and price per unit that a
student needs for a course in statistics:
2012 2013
Pb Pi
Textbook 203 229
Calculator 80 70
Answer manual 10 10
Total 293 309
$I_P = \frac{\sum P_i}{\sum P_b} \times 100 = \frac{309}{293} \times 100 = 105.46$
This means that the prices increased by 5.46% over the period under
consideration.
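The calculation can be sketched as follows, using the course-material prices from the table in this example. The function name is an assumption made for illustration.

```python
def unweighted_aggregate_index(current_values, base_values):
    """Sum of current-period values as a percentage of the sum of base-period values."""
    return sum(current_values) / sum(base_values) * 100


# Textbook, calculator and answer manual prices
prices_2012 = [203, 80, 10]   # base period, Pb
prices_2013 = [229, 70, 10]   # current period, Pi
print(round(unweighted_aggregate_index(prices_2013, prices_2012), 2))   # 105.46
```

Because no weights are used, an expensive item such as the textbook dominates the result regardless of how many units are bought.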
Activity 5.2
Laspeyres index
This method uses quantities consumed during the base period as a weighting
factor and assumes that whatever the price changes, the quantities purchased
will remain the same. The changes in the index can therefore be attributed to
price changes. Using this method will be misleading when the buying quantities
change significantly from those in the base period. One solution to the problem
of buying quantities that change relative to those of the base period is to change
the base period regularly, so that the quantities are regularly updated.
Laspeyres index may lead to an overestimation of the inflation rate because
people tend to reduce consumption of items which get more expensive.
The best-known Laspeyres index is the consumer price index (CPI).
$I_p(L) = \frac{\sum P_i Q_b}{\sum P_b Q_b} \times 100$
Steps
1. Collect price and quantity information for each of the product items to be
used in the composite index.
2. Select the base period.
3. Denote the current prices and quantities as Pi and Qi respectively.
4. Denote the base prices and quantities as Pb and Qb respectively.
5. Multiply the current period price (Pi) by the base period quantity (Qb) for
each product item and sum the resulting values (ΣPiQb).
6. Multiply the base period price (Pb) by the base period quantity (Qb) for each
product item and sum the resulting values (ΣPbQb).
7. Divide the first sum by the second sum and multiply the result by 100.
8. Interpret the answer.
Paasche index
This method uses quantities of the products in the basket consumed during the
current period as a weighting factor. It measures the change in total cost of goods
that represent a consumption pattern typical of the current year, and therefore
avoids the problem of changing consumption patterns. This can lead to an
underestimation in the rise of the inflation rate. It is also difficult to make year-
to-year comparisons because of the continuous changing of the base period.
$I_p(P) = \frac{\sum P_i Q_i}{\sum P_b Q_i} \times 100$
Steps
1. Collect price and quantity information for each of the product items to be
used in the composite index.
2. Select the base period.
3. Denote the current prices and quantities as Pi and Qi respectively.
4. Denote the base prices and quantities as Pb and Qb respectively.
5. Multiply the current period price (Pi) by the current period quantity (Qi) for
each product item and sum the resulting values (ΣPiQi).
6. Multiply the base period price (Pb) by the current period quantity (Qi) for
each item and sum the resulting values (ΣPbQi).
7. Divide the first sum by the second sum and multiply the result by 100.
8. Interpret the answer.
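The two weighted indexes differ only in the quantities used as weights, which a short sketch makes visible. The function names are assumptions made for this example; the prices and quantities are those of the nuts table (peanuts, pecans, cashews) given earlier in this unit, with 2012 as base.

```python
def laspeyres_index(p_current, p_base, q_base):
    """Weighted price index using base period quantities as weights."""
    numerator = sum(p * q for p, q in zip(p_current, q_base))    # sum of Pi x Qb
    denominator = sum(p * q for p, q in zip(p_base, q_base))     # sum of Pb x Qb
    return numerator / denominator * 100


def paasche_index(p_current, p_base, q_current):
    """Weighted price index using current period quantities as weights."""
    numerator = sum(p * q for p, q in zip(p_current, q_current))   # sum of Pi x Qi
    denominator = sum(p * q for p, q in zip(p_base, q_current))    # sum of Pb x Qi
    return numerator / denominator * 100


prices_2012, prices_2013 = [45, 55, 60], [50, 60, 70]
quantities_2012, quantities_2013 = [600, 300, 325], [550, 350, 325]
print(round(laspeyres_index(prices_2013, prices_2012, quantities_2012), 2))   # 112.3
print(round(paasche_index(prices_2013, prices_2012, quantities_2013), 2))     # 112.2
```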
Example 5.3
$I_p(L) = \frac{\sum P_i Q_b}{\sum P_b Q_b} \times 100 = \frac{514.8}{471.0} \times 100 = 109.3$
Activity 5.3
The formula used in determining the CPI in South Africa is that of Laspeyres.
$\text{CPI} = I_p(L) = \frac{\sum P_i Q_b}{\sum P_b Q_b} \times 100$
To determine the base year weight factor, at least 10 000 households were sampled
out of different income groups and metropolitan areas. Following international
practice, the base period used for the CPI must change at least every five years.
A monthly index for each consumer item for each area is determined by making
use of the above formula and then a combined CPI is calculated.
Example 5.4
2008 = $\frac{74.2}{117.0} \times 100 = 63.4$
2009 = $\frac{81.4}{117.0} \times 100 = 69.6$
2010 = $\frac{100.0}{117.0} \times 100 = 85.5$
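Moving the base amounts to dividing every index value by the old index value of the new base period and multiplying by 100. The sketch below repeats the three calculations above; the function name is an assumption made for this example, and 117.0 is the old index value of the new base period.

```python
def shift_base(index_by_period, new_base_value):
    """Re-express each index value as a percentage of the new base period's old value."""
    return {
        period: round(value / new_base_value * 100, 1)
        for period, value in index_by_period.items()
    }


old_series = {2008: 74.2, 2009: 81.4, 2010: 100.0}
print(shift_base(old_series, new_base_value=117.0))
# {2008: 63.4, 2009: 69.6, 2010: 85.5}
```

The new base period itself always comes out at 100, since its value is divided by itself.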
Activity 5.4
For the following consumer price indexes, move the base to 2011.
2007 2008 2009 2010 2011 2012 2013
100 104.2 109.8 116.3 121.3 120.0 117.4
Example 5.5
The rand value of sales for the Papillion Café between January and May is as
follows:
Month            January   February   March    April    May
Sales (R)        14 980    16 433     20 194   23 015   23 621
Link relative    –         109.7      122.9    114.0    102.6
• February index = $\frac{16\,433}{14\,980} \times 100 = 109.7$
• March index = $\frac{20\,194}{16\,433} \times 100 = 122.9$
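A link relative expresses each period's value as a percentage of the value in the period immediately before it, so the base moves along with the series. A minimal sketch, with an assumed function name and the Papillion Café sales figures:

```python
def link_relatives(values):
    """Each value as a percentage of the previous value (the first period has none)."""
    return [
        round(current / previous * 100, 1)
        for previous, current in zip(values, values[1:])
    ]


sales = [14980, 16433, 20194, 23015, 23621]   # January to May
print(link_relatives(sales))   # [109.7, 122.9, 114.0, 102.6]
```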
Activity 5.5
The following table lists the retail price of milk per litre between June and October
at a local grocery store. Determine the link relatives for milk.
Example 5.6
Percentage change = $\frac{20}{100} \times 100 = 20\%$
= $\frac{40}{120} \times 100 = 33.33\%$
= $\frac{28}{160} \times 100 = 17.5\%$
As the index gets larger in a fixed-base index, the same percentage change is
represented by a larger difference. For example, a change from 100 to 120 is the
same as a change from 300 to 360, but the impression can be very different. In
practice the solution will be to change the base year.
Example 5.7
If your income increased from R70 000 to R80 000 and the current inflation
rate is 6%, the nominal growth rate is 14% and the real growth rate is:
$10\,000 - \left(\frac{6}{100} \times 10\,000\right) = 9\,400$
$\frac{9\,400}{70\,000} \times 100 = 13\%$
TEST YOURSELF 5
3. The table below shows the prices and annual consumption of the raw
materials used in Gauteng Breweries in 2011 and 2012:
Prices (R) Unit Quantities
2011 2012 2011 2012
Malt 49 46 10 874 15 116
Hops 512 724 732 696
Sugar 46 51 1 865 2 486
Wheat flour 31 27 873 1 093
4. Tixif Limited sells three types of chain saws. Company records showed the
prices (R’000) and quantities sold as follows:
Price (R’000) Quantities
2012 2013 2012 2013
X 30 40 22 30
Y 50 60 31 40
Z 120 99 8 12
5. Mr Hiram, a pensioner, has kept a record of the costs of certain items
purchased weekly:
Price per unit (R) Quantities Purchased
2012 2013 2012 2013
Coffee 12.00 15.00 30 32
Cookies 5.60 4.99 7 9
Sugar 7.50 8.20 20 24
6. Mr Rolling has been offered a job in Cape Town with a salary of R123 500 a
year. The cost of living index is 132. If he presently earns R100 000 a year
in Johannesburg with a cost of living index of 120, will he be financially
better off in the new job?
7. The CPI values for the first eight months of 2012, with 2010 = 100, were:
233 236 240 243 248 249 252 255
Shift the base of the index to June 2009 and determine the purchasing
power of the rand per month for both index series.
8. The producer price indexes with 2010 = 100, for the previous twelve
months were:
181 201 221 227 234 238 245 249 260 268 290 300
Calculate the percentage point increases and the percentage increases in the
index numbers.
9. The reported new cases of tuberculosis in a busy hospital were as follows:
January February March April May June
239 311 289 264 321 199
Calculate the chain indexes and then the percentage point and percentage
increases.
10. The following figures relate to the library expenditure (R) of a small town.
Also given is the retail price index per year:
Year 2010 2011 2012 2013
Expenditure (R) 4 800 5 230 5 800 6 700
Retail index 103 110 120 128
If the retail index is taken into consideration, was there a real increase in the
library expenditure?
Regression and correlation analysis are statistical tools used to study the
relationship between two variables, one of which is dependent and the other
independent. They are used to determine:
• whether there is a relationship between the two variables
To describe the relationship between the two variables we first graphically
represent the data in a scatter diagram. This visual representation can give
an immediate impression of a set of data; it will illustrate whether there is a
relationship and also suggest whether the relationship is linear, non-linear,
positive or negative. The strength of the relationship may be concluded
tentatively. Note that this unit deals only with linear relationships. Linear
means that a straight line can be used to represent the data pairs.
• how good that relationship is
The correlation coefficient measures how good this relationship is by making
use of a single number.
• how the relationship can be used to make predictions
Once the scatter diagram and correlation coefficient indicate that a linear
relationship exists between the two variables we proceed to find a linear equation
that describes the relationship between the two variables. This equation can be
used to make predictions within the given range of the data (interpolation).
Activity 6.1
Steps
1. Collect pairs of data (x, y). The data are paired in a way that matches each
value from one data set with a corresponding value from a second data set.
2. Select which variable is the dependent (y) variable and which is the
independent (x) variable. The label y goes to the variable which we want to
predict. The other variable is then labelled x.
3. Arrange the data in two columns, x and y.
4. Draw a set of axes.
5. The horizontal axis represents the x variable and is scaled so that any x
value can be easily located.
6. The vertical axis represents the y variable and is scaled so that any y value
can be easily located.
7. Each pair of observations (x, y) is plotted as a point. That is where a vertical
line from the value on the x axis meets a horizontal line from the value on
the y axis.
8. The points are not connected.
9. Scatter plots can take on the following patterns:
• The plot can show no relationship, because no pattern can be identified.
• The plot can show a positive relationship because the dots start at the bottom
left and move upwards to the top right. Although the data points do not fall
exactly on a line, they appear to cluster about a line. A positive relationship
means that if the x variable increases, the y variable will also increase.
• The plot can show a negative relationship because the dots start at the
top left and move downwards to the bottom right. A negative relationship
means that if the x variable increases, the y variable will decrease.
• If all the points fall exactly along a straight line, in a negative or positive
direction, we say the relationship is perfect.
• Non-linear relationships are beyond the scope of this unit.
[Figure: four scatter plots of y against x, showing no correlation, negative correlation, positive correlation and non-linear correlation]
Example 6.1
You timed how long it takes 10 workers to assemble an item. It was possible for
you to match these times with the length of the workers’ experience (in months).
The results obtained are shown below:
Person Experience (months) x Time (min) y
A 2 27
B 5 26
C 3 30
D 8 20
E 5 22
F 9 20
G 12 16
H 16 15
I 1 30
J 6 19
67 225
[Figure: scatter plot of Time (min) against Experience (months)]
The scatter plot shows a negative relationship, which means the more experienced
workers take a shorter time to assemble the product.
Activity 6.2
During the baking of a certain type of bread roll, each bread roll goes through
a series of heat processes. The length of time spent under this heat treatment
is related to the lifespan of the bread rolls. A sample of eight bread rolls that
underwent different baking times was selected and the life span (in hours) of
each was recorded:
Length of time 18 13 18 15 10 12 8 4
Life span 23 20 18 16 14 11 10 7
Draw a scatter diagram and interpret the relationship in the context of the given
data.
Steps
1. List the values of the x and the y variables in two columns.
2. Sum the x values (Σx) and sum the y values (Σy).
3. Square each of the x values and add the column (Σx²).
4. Square each of the y values and add the column (Σy²).
5. Multiply each x with its corresponding y value and add the products (Σxy).
6. Substitute these values into the formula for r and determine the value of r.
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
• If the association between two variables is strong, then knowing the one
variable helps a lot in predicting the other. But when there is a weak association,
information about one variable does not help much in estimating the other.
Example 6.2
You timed how long it takes 10 workers to assemble an item. It was possible for
you to match these times with the length of the workers’ experience. The results
obtained are shown below:
Person Experience (x) Time (min) (y) xy x² y²
A 2 27 54 4 729
B 5 26 130 25 676
C 3 30 90 9 900
D 8 20 160 64 400
E 5 22 110 25 484
F 9 20 180 81 400
G 12 16 192 144 256
H 16 15 240 256 225
I 1 30 30 1 900
J 6 19 114 36 361
Total 67 225 1 300 645 5 331
1. Correlation coefficient:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
$r = \frac{10(1\,300) - (67)(225)}{\sqrt{\left[10(645) - (67)^2\right]\left[10(5\,331) - (225)^2\right]}} = -0.90$
2. Coefficient of determination:
r² × 100 = (−0.90)² × 100 = 81%
81% of the total variation in the time taken to produce an item can be
explained by the variation in experience. The remaining 19% is explained
by other factors.
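The column totals and the value of r in this example can be verified with a short sketch. The function name is an assumption made for illustration; the data are the experience and assembly-time pairs from the table above.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's r from the sums n, sum x, sum y, sum xy, sum x^2 and sum y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


experience = [2, 5, 3, 8, 5, 9, 12, 16, 1, 6]
time_taken = [27, 26, 30, 20, 22, 20, 16, 15, 30, 19]
r = correlation_coefficient(experience, time_taken)
print(round(r, 2))             # -0.9
print(round(r * r * 100, 1))   # 81.8
```

The unrounded r gives a coefficient of determination of about 81.8%; the 81% in the example comes from squaring r after it has been rounded to −0.90.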
Activity 6.3
During the baking of a certain type of bread roll, each bread roll goes through a series
of heat processes. The length of time spent under this heat treatment is related to the
lifespan of the bread rolls. A sample of eight bread rolls that underwent different
baking times was selected and the life span (in hours) of each roll was recorded.
Length of time 18 13 18 15 10 12 8 4
Life span 23 20 18 16 14 11 10 7
Where:
ŷ = estimated y value for a given x value
a = intercept on the y axis
b = the slope (the average change in y for each change of 1 unit in x)
Steps
1. Obtain a random sample of n data pairs (x, y), with x the independent
variable and y the dependent variable.
2. Use the data pairs to calculate n, Σx, Σy, Σxy and Σx².
3. Calculate a and b values in the equation by making use of the method of least
squares: the least squares principle states that the method used to determine
the regression equation must be such that the sum of the squares of the
differences between each actual y value and the corresponding estimated y
value is a minimum.
4. The b value is the slope of the straight line equation. The slope is the amount
by which y increases or decreases when x increases by 1 unit.
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
5. The a value is the y intercept, that is where x = 0 and the straight line crosses
the y axis.
$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$
6. Substitute the a and the b values into the regression line equation:
$\hat{y} = a + bx$
Steps
1. For predictions using a graph, a straight line is fitted to the data. The
best fitting straight line can be obtained by making use of the equation:
ŷ = a + bx
2. Since we are dealing with a linear function, we need to estimate only two
points; the rest of the ŷ values will all fall on that straight line.
3. Choose any two x values from the x scale on the scatter diagram. Substitute
the two chosen values for x into the regression equation and obtain the two
corresponding values for ŷ.
4. Plot the two coordinate points on the same axis as the scatter diagram.
5. Use a ruler and draw a straight line through the two points.
6. When the estimated ŷ is plotted against x and the points are connected, the
result is called the regression line or the line of best fit.
7. To do a prediction for a specific x value, find the required x on the x axis and
draw a vertical line to the regression line. From this point draw a horizontal
line to the y axis and read the estimate from the y axis.
Note: Estimated y values are meaningful only for x values in (or close to) the
range of the given data.
Example 6.3
You timed (in minutes) how long it takes 10 workers to assemble an item. It was
possible for you to match these times with the length of the workers’ experience
(months).The results obtained are shown below. Develop the regression equation
and estimate the assembly time for a worker with four months’ experience and
for a worker with 10 months’ experience.
Person Experience x Time y xy x² y²
A 2 27 54 4 729
B 5 26 130 25 676
C 3 30 90 9 900
D 8 20 160 64 400
E 5 22 110 25 484
F 9 20 180 81 400
G 12 16 192 144 256
H 16 15 240 256 225
I 1 30 30 1 900
J 6 19 114 36 361
67 225 1 300 645 5 331
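1. Regression equation:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(1\,300) - (67)(225)}{10(645) - (67)^2} = -1.06$
$a = \frac{225}{10} - (-1.06)\,\frac{67}{10} = 29.60$
ŷ = a + bx
ŷ = 29.60 + (−1.06)x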
2. Interpret a and b:
The a value in the equation represents the y intercept, that is the point on
the y axis where x 5 0. In this equation it means that a worker with no
experience will take 29.6 minutes to assemble the product.
The b value represents the slope, which means that for every additional month
of experience, a worker will take 1.06 minutes less to assemble the product.
3. Estimates:
For a worker with four months’ experience:
ŷ = 29.60 + (−1.06)x
ŷ(x = 4) = 29.60 + (−1.06)(4) = 25.36 min
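The least squares calculation can be reproduced in a few lines. This is a sketch with assumed function names, using the same ten data pairs; because the code keeps the unrounded slope, the intercept comes out as 29.59 rather than the 29.60 obtained above from the rounded slope of −1.06.

```python
def least_squares_line(xs, ys):
    """Return the intercept a and slope b of the line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    a = sum_y / n - b * sum_x / n
    return a, b


def predict(a, b, x):
    """Estimated y value for a given x; meaningful only within the range of the data."""
    return a + b * x


experience = [2, 5, 3, 8, 5, 9, 12, 16, 1, 6]
time_taken = [27, 26, 30, 20, 22, 20, 16, 15, 30, 19]
a, b = least_squares_line(experience, time_taken)
print(round(a, 2), round(b, 2))       # 29.59 -1.06
print(round(predict(a, b, 4), 2))     # 25.36 minutes for 4 months' experience
print(round(predict(a, b, 10), 2))    # 19.01 minutes for 10 months' experience
```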
Activity 6.4
During the baking of a certain type of bread roll, each bread roll goes through
a series of heat processes. The length of time spent under this heat treatment
is related to the lifespan of the bread rolls. A sample of eight bread rolls that
underwent different baking times was selected and the life span (in hours) of
each roll was recorded.
Length of time 18 13 18 15 10 12 8 4
Life span 23 20 18 16 14 11 10 7
If a bread roll spends 16 hours under heat treatment, how long do you expect
it will remain fresh? Do your estimate using the regression equation and the
regression line. Give the meaning of the slope.
$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Steps
1. Rank the x and the y variables: assign numbers from 1 onwards to the data
values, starting with the smallest (or largest) value up to the largest (or smallest)
value. Keep each x together with its y. Remember to use the same type of ranking
for x and y: from high to low or from low to high. If two values are the same,
they are first assigned ranks (say 2 and 3) and then the average of the ranks is
determined (2.5). That average is then assigned to each appropriate value.
2. Calculate the difference between ranks of the two variables (d).
3. Square these differences (d²) and add the column.
4. Substitute the required values into the formula and calculate the rank-order
correlation coefficient.
5. Interpret the coefficient.
Example 6.4
The safety officer of a company wants to know if experience influences the quality
of an employee’s work. She selects 10 employees at random and records their
years of work experience and their quality rating as assessed by their supervisors.
Assume that the employees with the most years of experience will be the most highly
rated. To keep the type of ranking the same, the least years of experience will be
ranked lowest. Note that the y ranking (quality of work) was done by the safety officer.
Employee  Experience (x)  Rating (y, used as y code)  x code  d     d²
1         1               2                           1       1     1
2         17              5                           8       −3    9
3         20              5                           9       −4    16
4         9               6                           4.5     1.5   2.25
5         2               3                           2       1     1
6         13              5                           7       −2    4
7         9               4                           4.5     −0.5  0.25
8         23              6                           10      −4    16
9         7               3                           3       0     0
10        10              6                           6       0     0
Total                                                               49.5
The correlation is moderate and positive. This means that the more
experience the employees have, the better their rating.
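The rank-order steps can be traced with a short Python sketch. It is an illustration only: the helper `average_ranks` is a hypothetical name, and the rating is used directly as the y code because that is what the table appears to do. The coefficient it computes follows from the table total of 49.5 and the formula for r_s.

```python
def average_ranks(values):
    """Rank from smallest (1) upwards; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first position of v when sorted
        ties = ordered.count(v)                 # how many times v occurs
        ranks.append(first + (ties - 1) / 2)    # mean of the tied positions
    return ranks

experience = [1, 17, 20, 9, 2, 13, 9, 23, 7, 10]
rating = [2, 5, 5, 6, 3, 5, 4, 6, 3, 6]         # used as the y code, as in the table

x_code = average_ranks(experience)
d = [y - x for y, x in zip(rating, x_code)]     # difference between the codes
sum_d2 = sum(di ** 2 for di in d)
n = len(experience)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(x_code)
print(sum_d2, round(r_s, 2))
```

With these data the sum of squared differences is 49.5 and r_s works out to 0.70, which agrees with the description of the correlation as positive.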
Activity 6.5
During the baking of a certain type of bread roll, advertised as having a long
shelf-life, each bread roll goes through a series of heat processes. The length of
time spent under this baking process is related to the shelf-life of the bread rolls.
A sample of 8 bread rolls that underwent different baking times was selected,
and the shelf-life (in hours) of each roll was recorded.
Length of time 18 13 18 15 10 12 8 4
Shelf-life 23 20 18 16 14 11 10 7
TEST YOURSELF 6
Frying time 65 50 35 30 20 15 10 5
Moisture content 1.4 1.9 3.0 3.4 4.2 8.1 9.7 16.3
Predict the moisture content of the chips after 40 seconds’ frying time.
2. From the following data determine the resting pulse rate that you would
expect from someone exercising for a daily average of (a) 45 minutes (b) 15
minutes and (c) 2.5 hours:
Daily exercise (min) 20 30 60 10 100 0 120 160 160 180
Pulse/min. 75 70 70 85 50 90 60 52 48 64
3. The following sample data measures levels of anxiety before a test and test
marks obtained for the test.
Anxiety 23 14 14 0 17 20 20 15 21
Test % 43 59 48 77 50 52 46 51 51
What test marks do you expect if the anxiety level was 12?
4. A chemistry lab testing food has 7 divisions that do different chemical tests
on food products. The number of hours devoted to safety training and the
number of hours lost due to industry-related accidents were recorded for
each division:
Safety training 10 19 30 45 50 65 80
Hours lost due to accidents 80 65 68 55 35 10 12
After 60 hours of training, how many hours do you expect to lose due to
accidents?
5. The Rip-Off Vending Machine Company operates coffee vending machines
in several office buildings. The company wants to study the relationship that
exists between the number of cups of coffee sold per day and the number
of persons working in each office. Data for this study was collected by the
company and is presented below:
Number of cups sold 10 20 30 40 30 20 40 40 50 10 40 20
Number of persons 5 6 14 19 15 11 18 22 26 4 23 10
Predict the number of cups of coffee that you would expect to sell if there
are 45 people working in an office.
6. The following data reflects the family income and food expenditure (in
R’000) of a sample of 10 low-income families:
10. Below are the rankings of the top 10 products produced by Peter’s Party
Products for last year and this year:
Product Last year This year
Crackers 1 3
Hats 3 1
Masks 4 2
Balloons 6 10
Whistles 7 9
Streamers 8 7
Flags 9 8
Face paint 2 4
Joke food 5 6
Joke cards 10 5
Compute the Spearman correlation coefficient and interpret your answer.
11. Ten sales agents of a company had the following number of years of service:
Agent A B C D E F G H I J
Years 8 6 4 12 5 3 1 14 9 10
The manager of the company arranged the agents in the following order,
from most excellent (H) to least excellent (F).
H I D J A E C B G F
7 Time series
This unit discusses the general use of forecasting in business and several methods
that are available for making forecasts.
\[ y = T \times C \times S \times I \]
Note: The trend component (T) is stated in the same units as y, while the
remaining three components are expressed as percent adjustments. A value
above 100 indicates an above average effect for the component and a value
below 100 indicates a below average effect.
For a time series composed only of annual data there is no seasonal component.
In that case the time series model becomes:
\[ y = T \times C \times I \]
Note: Analysis of cyclical and irregular influences on data is useful for describing
past variations but, because of their unpredictability, their value in forecasting
is very limited. Instead, a number of business indicators are used to forecast
cyclical turning points. Predicting cyclical and irregular variations requires
techniques beyond the scope of this unit.
If we ignore the C and I components, since by definition they can’t be predicted,
the forecasting model will become:
\[ \hat{y} = T \times S \]
Activity 7.1
The sales (R’000) for Turtle Toys have been analysed and the values of the four
components have been determined for the preceding four quarters. Find the
missing values in the table below, assuming a multiplicative model.
Sales (y) Trend (T) Seasonal (S) Cyclical (C) Irregular (I)
Winter 1 000 50 107 101
Spring 820 1 100 70 105
Summer 988 105 98
Autumn 2 623 1 300 200 97
7.2 Historigram
The standard graph to portray the behaviour of data over time is a line graph,
known as a historigram.
1. Time is the independent variable (x) and is measured along the horizontal
axis.
2. The variable of interest is the dependent variable (y) and is measured on the
vertical axis.
3. Plot the time series values (on the vertical y axis) against time (on the
horizontal x axis) as single points, and join the points by straight-line
segments.
7.3 Time-series decomposition
Steps
1. The observed time series data is the dependent y variable since this is the
variable that we want to predict.
2. The time variable is the independent x variable. The time periods are
translated into x values by using a simple coding process: 1 represents the
first time period, 2 the second time period, and so on, until the final time
period. Assume that the x code falls in the middle of the time period it
represents.
3. Having established the values for the x and y variables, the following
formulae can be used to identify the trend line through the data:
\[ \hat{y} = a + bx \]
where $\hat{y}$ is the trend value for a given time period.
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \]
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \]
where a = y-intercept.
4. To calculate the trend values ($\hat{y}$) with the trend line equation, substitute the
appropriate x code into the equation and compute the value of $\hat{y}$.
5. Plot the trend values ($\hat{y}$) together with the corresponding time periods on
the time-series graph and draw a line through the points. This is the trend
line. By extending this line, future values can be read from it.
6. To forecast future values using the equation, substitute the x with an
appropriate code for the year of forecast (extrapolation) and calculate $\hat{y}$.
Example 7.1
The following table shows the income (R’000) of Super 10 Taxis by year.
Year x-code Income y xy x² ŷ
2009 1 28 28 1 29.0
2010 2 31 62 4 30.3
2011 3 34 102 9 31.6
2012 4 30 120 16 32.9
2013 5 35 175 25 34.2
Total 15 158 487 55 158
The b value is the slope, which means that for every additional year the income
will increase by R1 300.
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{158}{5} - 1.3\left(\frac{15}{5}\right) = 27.7 \]
The a value is the y intercept, which means that the trend line will cross the
y axis at point 27.7.
\[ \hat{y} = a + bx = 27.7 + 1.3x \]
To forecast the income for the year 2014 we need a code for 2014. The x-unit
is one year, which means that for every year you move forward, the x-code
increases by one. Therefore the code for 2014 will be 6 and the forecast for 2014 will be
\[ \hat{y}(2014) = 27.7 + 1.3(6) = 35.5 \]
that is, R35 500.
[Figure: time-series graph of Income (R’000) against Years, 2009 to 2013]
To obtain the coordinates to plot the trend line, substitute the x in the trend
formula with the appropriate x code for that year to calculate the trend ($\hat{y}$) values
for each of the given years:
\[ \hat{y}(2009) = 27.7 + 1.3(1) = 29 \quad (1;\ 29) \]
\[ \hat{y}(2014) = 27.7 + 1.3(6) = 35.5 \quad (6;\ 35.5) \]
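The least-squares trend for the Super 10 Taxis data can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, assuming the x codes 1 to 5 from the table; the variable names are illustrative and not taken from the text.

```python
years = [2009, 2010, 2011, 2012, 2013]
income = [28, 31, 34, 30, 35]                 # R'000
x = list(range(1, len(years) + 1))            # x codes 1, 2, ..., 5

n = len(x)
sum_x, sum_y = sum(x), sum(income)
sum_xy = sum(xi * yi for xi, yi in zip(x, income))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
a = sum_y / n - b * sum_x / n                                  # y intercept

trend = [round(a + b * xi, 1) for xi in x]    # trend value for each given year
forecast_2014 = a + b * 6                     # x code 6 stands for 2014

print(round(b, 1), round(a, 1))
print(trend, round(forecast_2014, 1))
```

The trend values come out as 29.0, 30.3, 31.6, 32.9 and 34.2, matching the last column of the table, and the forecast for code 6 is 35.5.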
Activity 7.2
The following table shows the number of traffic tickets issued by the Alberton
Traffic Department for the first six months of the year.
Month No of tickets
January 120
February 120
March 100
April 90
May 130
June 150
1. Use the method of least squares to calculate the trend values for the period
under question.
2. Forecast the number of tickets that the Traffic Department can expect to
issue for the next three months.
3. Graph the time series as well as the trend line.
Method of semi-averages
This technique involves the calculation of two averages which, when plotted
on a graph as two separate points and joined up, form a straight line (or trend
line).
Steps
1. Split the data into two equal groups. If there is an odd number of years,
simply omit the middle year.
2. Calculate the arithmetic mean for each group.
3. Plot the two means at the midpoints of the time intervals covered by the
respective groups.
4. Join these two points with a straight line. This is the trend line.
To forecast:
5. Extend the straight line up to the required forecast period and use the graph
by reading the value from the y axis; or
6. Calculate the average increase per year by determining the difference
between the two averages and divide this difference by the number of years
between the two averages. Add this increment an appropriate number of
times to the mean of the latter group.
Note: It is important that the two groups in question have an equal number of
data values.
Example 7.2
The table below shows the sales for Yolanda’s Coffee Shop.
Year  Sales  Group total  Semi-average
2008  80
2009  95     275          275 ÷ 3 ≈ 92
2010  100
2011  110    (middle year, omitted)
2012  130
2013  145    425          425 ÷ 3 ≈ 142
2014  150
[Figure: graph for Yolanda’s Coffee Shop with Years 2008 to 2014 on the horizontal axis]
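The semi-average steps, including the calculated forecast described in step 6, can be sketched in Python as follows. The function name `semi_average_trend` is hypothetical, and 2015 is chosen as the forecast year purely for illustration; the text does not state which year is forecast.

```python
years = [2008, 2009, 2010, 2011, 2012, 2013, 2014]
sales = [80, 95, 100, 110, 130, 145, 150]

def semi_average_trend(years, values):
    """Return the two (midpoint, mean) points and the average increase per year."""
    half = len(values) // 2                   # with an odd count the middle year is omitted
    first_mean = sum(values[:half]) / half
    second_mean = sum(values[-half:]) / half
    first_mid = sum(years[:half]) / half      # midpoint of the first group
    second_mid = sum(years[-half:]) / half    # midpoint of the second group
    increase = (second_mean - first_mean) / (second_mid - first_mid)
    return (first_mid, first_mean), (second_mid, second_mean), increase

point_1, point_2, increase_per_year = semi_average_trend(years, sales)

def forecast(year):
    """Add the yearly increase to the mean of the latter group."""
    return point_2[1] + increase_per_year * (year - point_2[0])

print(point_1, point_2, increase_per_year)
print(round(forecast(2015), 1))
```

The two means sit at 2009 and 2013, four years apart, so the average increase is 50 ÷ 4 = 12.5 per year.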
Activity 7.3
The following table shows the number of traffic tickets issued by the Alberton
Traffic Department for the first six months of the year. Forecast the number of
traffic tickets for July by making use of the method of semi-averages and portray
the trend line on a graph.
Month No of tickets
January 120
February 120
March 100
April 90
May 130
June 150
Steps
1. If you calculate an odd-numbered moving average (i.e. 3, 5 or 7), there will
be a middle time point opposite which to record the answers. For example, in
calculating a three-year moving average, you will start by adding the y-values
for the first three years and dividing the answer by three. This answer will
correspond with the middle of the second year. Move down one year and
calculate the average for years two to four. This answer will correspond with
the middle of year three. Complete the process for all the years.
Example 7.3
The quarterly sales of petrol at Jack’s Garage are represented in the table below.
1. The first value in the 3-quarterly moving average column is obtained by
$(40 + 37 + 61) \div 3 = 46$. Write the answer in the middle of the second quarter,
which is the middle of the time period used to calculate this average. Move
down one quarter from the top and calculate the second value:
$(37 + 61 + 58) \div 3 = 52$. This answer corresponds with the middle of the third quarter.
2. The 4-quarterly moving average requires two columns. The first moving
average value in the column is: $(40 + 37 + 61 + 58) \div 4 = 49$. Write this
answer between the second and third quarters, because that is the middle of
the time period used to calculate the average. Move down one quarter from
the top and calculate the second value in the column:
$(37 + 61 + 58 + 16) \div 4 = 43$. This answer corresponds with the position between
the third and the fourth quarters.
These values do not correspond with the middle of a specific time period;
therefore a centred column is required. The first value in this column is obtained
by: $(49 + 43) \div 2 = 46$. This answer corresponds with the middle of the third
quarter. Move down one quarter and calculate the second value:
$(43 + 47.5) \div 2 \approx 45.2$. This answer corresponds with the middle of the fourth quarter.
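The moving-average arithmetic above can be sketched in Python. Only the first five quarterly values quoted in the explanation (40, 37, 61, 58, 16) are used, because the full table is not reproduced here; the function names are illustrative assumptions.

```python
def moving_average(values, k):
    """k-period moving averages, one for each window of k consecutive values."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

def centre(averages):
    """Centre an even-period moving average by averaging adjacent values."""
    return [(averages[i] + averages[i + 1]) / 2 for i in range(len(averages) - 1)]

sales = [40, 37, 61, 58, 16]          # first five quarters quoted in the example

ma3 = moving_average(sales, 3)        # aligned with quarters 2, 3 and 4
ma4 = moving_average(sales, 4)        # falls between quarters 2-3 and 3-4
centred = centre(ma4)                 # aligned with quarter 3

print(ma3)
print(ma4)
print(centred)
```

An odd period needs no centring because each average already lines up with a middle quarter; an even period always needs the extra `centre` step.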
[Figure: Quarterly petrol sales for Jack’s Garage; Sales (R’000) against Time period (quarters 1 to 4 over four years)]
Activity 7.4
Year y
2005 14
2006 20
2007 40
2008 30
2009 28
2010 42
2011 51
2012 25
2013 32
Ratio-to-moving-average method
Steps
1. List data in date order.
2. Determine the time period to be used for the moving average. If the data is
monthly, use a 12-monthly moving average. Quarterly data needs a four-
quarterly moving average and six days a week needs a six-daily moving
average.
3. Calculate the required moving average. Remember, if the moving average
period is an even number, centre the averages by averaging adjacent moving
averages.
4. Express the original time series values as percentages of the corresponding
centred moving averages by dividing the moving average into the original
data and multiplying the result by 100. These are the individual seasonal
percentages.
Note: A modified mean is the arithmetic mean of the values that remain after
elimination of the smallest and largest values in the column.
6. Add the unadjusted seasonal indexes.
7. Determine the factor needed to adjust the index numbers to typical index
numbers.
Typical quarterly index $= 100 \times 4$
Typical monthly index $= 100 \times 12$
Steps
1. Determine a linear trend over the given data using: $\hat{y} = a + bx$
2. Code the different time units (quarters, months, weeks, etc) within the
forecasting period.
3. Calculate the trend value for each time unit within the forecasting period.
4. Multiply the trend value for each time unit with the seasonal index of that
time unit.
Deseasonalising data
The influence of seasonality can be removed from a time series by dividing
each original value in the series by the appropriate typical seasonal index for
that period and then multiplying the result by 100. The result is known as
deseasonalised data. Deseasonalised data is used if we wish to compare data
Example 7.4
The quarterly income (R’000) of a soft drink company has been recorded for
four years. The time period in date order is shown in column 1 and the actual
sales (y) in column 2.
1. Quarterly data is given; therefore a four-quarterly moving average is
determined and listed in column 3. The first value is:
$(52 + 67 + 85 + 54) \div 4 = 64.5$. This answer corresponds with a position between
the second and third quarters of 2009. The second value in the column is calculated by
moving down one quarter: $(67 + 85 + 54 + 57) \div 4 = 65.8$. This answer
corresponds with a position between the third and fourth quarters of 2009.
By moving down one quarter at a time, calculate the rest of the moving
averages.
(1) Year and quarter  (2) y  (3) 4-q m.a.  (4) Centred m.a.  (5) %
2009 1 52
2 67
64.5
3 85 65.2 130.4
65.8
4 54 66.8 80.8
67.8
2010 1 57 68.4 83.3
69
2 75 69.9 107.3
70.8
3 90 71.2 126.4
71.5
4 61 71.8 85.0
72
2011 1 60 72.5 82.8
73
2 77 73.2 105.2
73.5
3 94 74.2 126.6
75
4 63 75.9 83.0
76.8
2012 1 66 77.3 85.4
77.8
2 84 78.3 107.3
78.8
3 98
4 67
Total 1 150
4. Construct a summary table with the years in the first column and the
quarters in the top row. Enter the values from the percentage column into
the summary table. The purpose of this table is to group together all first
quarters, second quarters, third quarters and fourth quarters.
5. Calculate the unadjusted index for the first quarter by cancelling the smallest
value (82.8) and the highest value (85.4) in the column and averaging the
rest of the values ($83.3 \div 1 = 83.3$). This average is known as the modified
mean. Continue to do the same for the other quarters.
Summary table:
Year           Q1      Q2      Q3      Q4      Total
2009                           130.4   80.8
2010           83.3    107.3   126.4   85.0
2011           82.8    105.2   126.6   83.0
2012           85.4    107.3
Unadjusted     83.3    107.3   126.6   83.0    400.2
Factor         × 0.9995 for each quarter
Typical index  83.3    107.2   126.5   83.0    400.0
8. Interpret the index numbers: the influence of the season caused the sales
during the first quarter to be 16.7% lower than expected. The second quarter
sales are 7.2% higher than expected, the third quarter sales are 26.5% higher
than expected, and the fourth quarter sales are 17% lower than expected.
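The ratio-to-moving-average steps for this example can be sketched in Python. This is an illustration under stated assumptions: the names are hypothetical, and no intermediate value is rounded, so the indexes differ from the printed ones by a few hundredths (the text rounds each moving average to one decimal first).

```python
y = [52, 67, 85, 54, 57, 75, 90, 61, 60, 77, 94, 63, 66, 84, 98, 67]
period = 4                                       # quarterly data

# Four-quarterly moving averages, then centred by averaging adjacent values.
ma = [sum(y[i:i + period]) / period for i in range(len(y) - period + 1)]
centred = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# The first centred value lines up with the third observation (index 2).
offset = period // 2
percentages = [100 * y[i + offset] / c for i, c in enumerate(centred)]

# Group the percentages by quarter (0 = first quarter, ..., 3 = fourth quarter).
by_quarter = {q: [] for q in range(period)}
for i, pct in enumerate(percentages):
    by_quarter[(i + offset) % period].append(pct)

def modified_mean(values):
    """Mean of what remains after dropping the smallest and largest value."""
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)

unadjusted = [modified_mean(by_quarter[q]) for q in range(period)]
factor = (100 * period) / sum(unadjusted)        # scales the total to 400
typical = [u * factor for u in unadjusted]

print([round(t, 1) for t in typical])
```

Because each quarter has only three percentages here, the modified mean is simply the middle value of the three.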
Forecasting:
Year Quarter  y  Seasonal index  Deseasonalised y  x-code  xy  x²
2009 1 52 83.3 62.4 1 52 1
2 67 107.2 62.5 2 134 4
3 85 126.5 67.2 3 255 9
4 54 83.0 65.1 4 216 16
2010 1 57 83.3 68.4 5 285 25
2 75 107.2 70.0 6 450 36
3 90 126.5 71.1 7 630 49
4 61 83.0 73.5 8 488 64
2011 1 60 83.3 72.0 9 540 81
2 77 107.2 71.8 10 770 100
3 94 126.5 74.3 11 1 034 121
4 63 83.0 75.9 12 756 144
2012 1 66 83.3 79.2 13 858 169
2 84 107.2 78.4 14 1 176 196
3 98 126.5 77.5 15 1 470 225
4 67 83.0 80.7 16 1 072 256
Total 1 150 1 150 136 10 186 1 496
\[ 82.16 \times \frac{83.3}{100} = 68.44 \]
Year Quarter  x code  Trend forecast  Seasonal index  Seasonalised forecast
2013 1        17      82.16           83.3            68.44
     2        18      83.37           107.2           89.37
     3        19      84.58           126.5           106.99
     4        20      85.79           83.0            71.21
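The forecasting table can be reproduced with the sketch below. The intermediate values b ≈ 1.21 and a ≈ 61.59 are not shown in the text; they are computed here from the column totals (Σx = 136, Σy = 1 150, Σxy = 10 186, Σx² = 1 496), under the assumption that both are rounded to two decimals, which reproduces the printed trend forecasts. Variable names are illustrative.

```python
y = [52, 67, 85, 54, 57, 75, 90, 61, 60, 77, 94, 63, 66, 84, 98, 67]
typical_index = [83.3, 107.2, 126.5, 83.0]       # quarters 1 to 4, from the example
x = list(range(1, len(y) + 1))                   # x codes 1 to 16

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Linear trend, rounded to two decimals (an assumption about the book's rounding).
b = round((n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2), 2)
a = round(sum_y / n - b * sum_x / n, 2)

forecasts = []
for quarter, code in enumerate(range(17, 21)):   # 2013 quarters have codes 17 to 20
    trend = a + b * code
    seasonalised = trend * typical_index[quarter] / 100
    forecasts.append((code, round(trend, 2), round(seasonalised, 2)))

for row in forecasts:
    print(row)
```

Each seasonalised forecast is the trend value multiplied by the typical index for that quarter and divided by 100, as in the calculation shown for the first quarter.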
Activity 7.5
The owner of a pizzeria recorded the number of pizzas sold during the past four
weeks in order to determine the influence of the day of the week on the sales. Do
a seasonalised forecast per day for week 5 and graph the original time series, the
deseasonalised time series and the trend line.
Week Monday Tuesday Wednesday Thursday Friday
1 12 18 16 25 31
2 11 17 19 24 27
3 14 16 16 28 25
4 17 21 20 24 32
TEST YOURSELF 7
1. Complete the following table assuming the classical multiplicative model:
Trend (T) C S Forecast
Winter 130 80 120
Spring 132 90 100
Summer 134 100 70
Autumn 136 110 110
2. Using the classical model, find the missing values:
Trend C S I Sales
Winter 100 100 90 99
Spring 200 80 100 168
Summer 300 110 110
Autumn 400 120 120 604.8
Total
3. The total units of new government housing under construction for the past
six years in the Gauteng province are given below:
Year Total units
2008 1 488
2009 1 014
2010 1 354
2011 1 474
2012 1 617
2013 1 666
Forecast the number of units that will be built during 2014 and 2015.
4. The data given below (in hundreds) was prepared by a marketing research
agency for Radio UJ:
Year Audience size
2005 31
2006 32
2007 33
2008 30
2009 29
2010 30
2011 28
2012 26
a) Estimate the audience for the year 2013 using the method of least
squares and graph the series.
b) Estimate the audience for the year 2013 using the method of semi-
averages.
5. a) Use the data given in the following table to smooth the time series by
making use of a:
two-year moving average
three-year moving average
four-year moving average.
b) Graph the original data and the moving averages.
c) Forecast the value for the year 2014.
2005 2006 2007 2008 2009 2010 2011 2012 2013
642 819 845 755 767 720 749 794 686
6. The numbers of operational nuclear power reactors in the world for the
given years are listed in the following table:
Year 2008 2009 2010 2011 2012 2013
Number of reactors 83 99 108 110 105 104
The theory of probability grew out of the study of various games of chance using
coins, dice, cards, lottery and gambling machines. Since then, probability theory
has been developed to determine uncertainties in our everyday lives as well.
8.1 Language of probability
• An experiment or investigation is an action that generates the uncertain
outcomes to which we will assign probabilities.
• A particular result of an experiment is an outcome.
• A sample space of a random experiment is a list of all the possible outcomes
of the random experiment.
• The individual outcomes in a sample space are called simple events. An event
is a collection of one or more outcomes of an experiment.
Example 8.1
A family has three children. Their blood type can either be O or A. The sample
space for the blood types of the three children contains eight possible outcomes:
[OOO, AOO, OAO, OOA, AAO, AOA, OAA, AAA]
Activity 8.1
The type of transmission – automatic (A) or manual (M) – is recorded for each of
the next two cars purchased from a certain dealer.
1. What is the random experiment?
2. What is the sample space?
3. List the outcomes in each of the following events:
a) that at least one car has an automatic transmission
b) that exactly one car has an automatic transmission
c) that neither car has an automatic transmission.
\[ P(E) = \frac{\text{number of successes}}{\text{total number of outcomes}} \]
Steps
1. Identify the event (or success).
2. Find the number of successes.
3. Find the total number of outcomes in the experiment.
4. Divide the number of successes by the total number of possible outcomes.
5. Interpret the probability.
Example 8.2
In rolling a fair die once, each of the possible outcomes in the sample space (1,
2, 3, 4, 5 or 6), has an equal chance of occurring. In calculating the probability
of the event obtaining an even number in one roll of the die, your number of
successes is (2 or 4 or 6).
\[ P(\text{even number}) = \frac{3}{6} = 0.5 \]
Activity 8.2
In drawing a card from a deck of 52 cards, what is the probability that it will be
an ace?
\[ P(E) = \frac{\text{number of times the event occurred}}{\text{total number of observations}} \]
Note: Chance behaviour is unpredictable over the short run, but has a regular
and predictable pattern in the long run.
Steps
1. Identify the event (or success).
2. Find the frequency of the event, that is the number of times the event
occurred in the experiment or in the past.
3. Find the total number of outcomes in the experiment.
4. Divide the number of times the event occurred by the total number of possible
outcomes.
5. Interpret the probability.
Example 8.3
1. If you toss a coin 10 times and get three heads, you obtain an empirical
probability of $\frac{3}{10}$. The frequency of the event ‘heads’ is 3 and the total of the
possible frequencies is 10. Because you tossed the coin only a few times, your
empirical probability is not representative of the classical probability, which
is $\frac{1}{2}$. If, however, you toss the coin several thousand times, the probability
will move very close to the classical probability.
Activity 8.3
In a survey, a sample of 100 students were asked if they think that cloning of
humans should be allowed. Ninety-two said it should not be allowed, five said it
should be allowed and three had no opinion. Calculate the probabilities for each
event in the survey.
Activity 8.4
We all know that fruit is good for us and that we don’t eat enough. In a recent
study among a random sample of 75 teenage boys the following information
was collected:
Fruit servings per day Number of boys % of boys
0 20 27
1 15 20
2 15 20
3 12 16
4 8 11
5 5 6
Total 75 100
1. What is the probability of a teenage boy eating three fruit servings per day?
2. What is the probability of a teenage boy eating no fruit per day?
Example 8.4
Given a patient’s health and extent of injuries, a doctor may feel that the patient
has a 90% chance of full recovery.
Example 8.5
\[ P(\text{accepting next batch}) = \frac{80}{90} = 0.89 \]
If you add the probabilities of the two possible outcomes, the total is 1.
Despite its simplicity, the complement rule can be very useful. The task of finding
the probability that an event of interest will not occur is sometimes easier or less
time-consuming than finding the probability that it will occur.
Example 8.6
Activity 8.5
The probability that a typist will make at most five mistakes is 0.64. What is the
probability that she will make more than five mistakes?
Activity 8.6
We all know that fruit is good for us and that we don’t eat enough. In a
recent study done among a random sample of 75 teenage boys, the following
information was collected:
Fruit servings per day Number of boys % of boys
0 20 27
1 15 20
2 15 20
3 12 16
4 8 11
5 5 6
Total 75 100
1. What is the probability of teenage boys eating three fruit servings per day?
2. What is the probability of teenage boys not eating three fruits per day?
3. What is the probability that teenage boys will eat at least one fruit per day?
Example 8.7
• B and C = you are interested in the possible outcomes that will result in a red
king card. Only two of the king cards are red, so there are two possible
outcomes.
Example 8.8
Statistics (29)
$S^c$: $40 - 29 = 11$
5. \[ P(\text{Students enrolled for Statistics}) = \frac{29}{40} \]
\[ P(\text{Students not enrolled for Statistics}) = 1 - \frac{29}{40} = \frac{11}{40} \]
2. The total area in the two circles displays the union of the two events (S∪A). It
consists of all the outcomes in either event Statistics (S) or event Accounting
(A) or both.
3. The area where S and A overlap is where both S and A occur.
[Venn diagram: 40 students; Statistics only $(29 - 5 = 24)$, both subjects 5, Accounting only $(14 - 5 = 9)$]
4. If one student is selected from this sample, what is the probability that
the student is enrolled in at least one of the subjects? This consists of all
the students enrolled in Statistics or in Accounting or in both subjects
$(24 + 5 + 9 = 38)$.
\[ P(S \text{ or } A) = \frac{38}{40} \]
If one student is selected from this sample, what is the probability that the
student is enrolled in neither subject? The student is not enrolled in either of
the two subjects, but forms part of the group of 40: $(40 - 38) = 2$.
\[ P(\overline{S \text{ or } A}) = \frac{2}{40} \]
What is the probability that a randomly chosen student from this sample is
enrolled in Statistics only? That will exclude the students who are enrolled
in Statistics as well as in Accounting $(29 - 5 = 24)$.
\[ P(\text{Statistics only}) = \frac{24}{40} \]
What is the probability that a randomly chosen student from this sample is
enrolled in Accounting but not Statistics?
\[ P(A \text{ and } \overline{S}) = \frac{9}{40} \]
[Venn diagram: circles S and A overlapping; S and A = (5)]
3. What is the probability that a randomly chosen student from this sample is
enrolled in Statistics and Accounting?
\[ P(S \text{ and } A) = \frac{5}{40} \]
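The counts behind the Venn diagram can be checked with exact fractions. This is a minimal Python sketch using only the figures given in the example (40 students, 29 in Statistics, 14 in Accounting, 5 in both); the variable names are illustrative.

```python
from fractions import Fraction

total = 40
statistics = 29
accounting = 14
both = 5

statistics_only = statistics - both                 # 24
accounting_only = accounting - both                 # 9
at_least_one = statistics_only + both + accounting_only
neither = total - at_least_one

p_s_or_a = Fraction(at_least_one, total)            # union of S and A
p_neither = Fraction(neither, total)                # complement of the union
p_s_only = Fraction(statistics_only, total)
p_a_not_s = Fraction(accounting_only, total)
p_s_and_a = Fraction(both, total)                   # intersection of S and A

# The union can also be found from the addition rule for overlapping events.
assert p_s_or_a == Fraction(statistics, total) + Fraction(accounting, total) - p_s_and_a

print(p_s_or_a, p_neither, p_s_only, p_a_not_s, p_s_and_a)
```

`Fraction` reduces each result to lowest terms, so 38/40 prints as 19/20 and 2/40 as 1/20; the values are the same as those in the example.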
means that none of the other events can occur at the same time. An outcome
can belong to event A or to event B but not to both.
If you flip a coin, you can have either heads or tails. Both can’t happen at
the same time!
3. Two events are independent if the occurrence of one is in no way affected by
the occurrence of the other; that is, they are unrelated.
If you flip two coins and you obtained heads on the one, it will have no
influence on the outcome of the second flip.
4. If there is a particular relationship between events such that the occurrence
of one event affects the occurrence of the second event, the events are
dependent. The probability attached to the occurrence of such events is
known as conditional probability.
Example 8.9
We all know that fruit is good for us and that we don’t eat enough. In a recent study
done among a random sample of 75 people, the following information was collected:
Fruit servings per day Number of people
0 20
1 15
2 15
3 12
4 8
5 5
Total 75
• The probability that a selected person eats two fruits per day is $\frac{15}{75} = 0.20$.
The probability that a selected person will eat three fruits per day is
$\frac{12}{75} = 0.16$.
The two events are mutually exclusive because if the person eats exactly two
fruits per day he cannot eat exactly three fruits as well.
• The probability that a selected person will eat two or three fruits per day is
$P(2 \text{ or } 3) = P(2) + P(3) = 0.20 + 0.16 = 0.36$.
• Calculate the probability that a randomly selected person will eat at least four
fruits per day.
Activity 8.7
The probabilities that a wine tasting will rate a new shiraz as very poor, poor,
fair, good, very good or excellent are 0.05, 0.14, 0.17, 0.33, 0.20 and 0.11.
What are the probabilities that the shiraz will be rated as:
1. very poor or poor?
2. good, very good or excellent?
To avoid double counting the probability of the outcomes that fulfil the
conditions for both events, P(A and B) is subtracted from the sum of the
probability of A and B.
Note: In this rule P(A and B) denotes the probability that A and B both occur
in the same observation. In the multiplication rule P(A and B) denotes the
probability that event A occurs on one trial followed by event B on another trial.
Example 8.10
The probability that a person stopping at a petrol garage will ask to have his tyres
checked is 0.12, the probability that he will ask to have his oil checked is 0.29
and the probability that he will ask to have both checked is 0.07. What is the
probability that a person stopping at this garage will ask to have:
• either his tyres or his oil checked?
\[ P(T \text{ or } O) = P(T) + P(O) - P(T \text{ and } O) = 0.12 + 0.29 - 0.07 = 0.34 \]
• neither his tyres nor his oil checked?
\[ 1 - P(T \text{ or } O) = 1 - 0.34 = 0.66 \]
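The addition rule for events that are not mutually exclusive, together with the complement rule, can be written as two small functions. This is an illustrative Python sketch; the function names are assumptions and not part of the text.

```python
def p_either(p_a, p_b, p_a_and_b):
    """P(A or B) = P(A) + P(B) - P(A and B): the overlap is counted only once."""
    return p_a + p_b - p_a_and_b

def p_complement(p_event):
    """P(not E) = 1 - P(E)."""
    return 1 - p_event

p_tyres, p_oil, p_both = 0.12, 0.29, 0.07

p_tyres_or_oil = p_either(p_tyres, p_oil, p_both)
p_neither = p_complement(p_tyres_or_oil)

print(round(p_tyres_or_oil, 2), round(p_neither, 2))
```

If the two events were mutually exclusive, `p_a_and_b` would be 0 and the function would reduce to the simple sum P(A) + P(B).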
Activity 8.8
There are two secretaries in the office. The probability that the one secretary will
be absent on any given day is 0.08 and the probability that the other one will be
absent on any given day is 0.07. The probability that both will be absent is 0.02.
What is the probability that on a given day:
1. either or both secretaries will be absent?
2. at least one secretary comes to work?
Two events are independent if the occurrence of one does not change the
probability of the next one occurring.
Example 8.11
A survey among students found that 46% of them suffer stress at least once a
week. If three students are selected at random, find the following probabilities:
• That all three will say that they suffer stress at least once a week:
The three students are independent.
$P(\text{all three suffer stress}) = 0.46 \times 0.46 \times 0.46 = 0.097$
• That none of them will say that they suffer stress:
If 46% suffer stress, then 54% will not suffer any stress, so
$P(\text{none of them suffer stress}) = 0.54 \times 0.54 \times 0.54 = 0.157$
Activity 8.9
The quality control manager of a company questions the reliability of the two
quality control checks in the food processor manufacturing process. A worker
who manually checks the processors performs one check and a computer
monitor performs a second check. The manager knows that 5% of the time the
worker is apt to miss a defective processor and that 2% of the time the computer
will malfunction and fail to detect defective processors. What is the probability
that a worker will miss a defective processor and the computer will malfunction,
allowing a defective processor to leave the manufacturing process?
Example 8.12
You are not aware of it, but in a case of wine bought, five of the 12 bottles are
bad. The given table lists all the possible outcomes in the experiment together
with the probability for each possible outcome.
G = good
B = bad
G1G2 = first bottle good and second bottle good
1. To calculate P(G1G2):
There are seven good bottles in the case of 12: $P(G_1) = \frac{7}{12}$
The probability of G2 if the first bottle is good (G1): $P(G_2 \mid G_1) = \frac{6}{11}$. (Remember
that if the first bottle selected is good, there are only 11 bottles left in the box and
only six good ones. The condition is: if the first bottle is good.)
All the possible outcomes  Probability for each outcome
G1 G2                      $\frac{7}{12} \times \frac{6}{11} = \frac{42}{132}$
G1 B2                      $\frac{7}{12} \times \frac{5}{11} = \frac{35}{132}$
B1 G2                      $\frac{5}{12} \times \frac{7}{11} = \frac{35}{132}$
B1 B2                      $\frac{5}{12} \times \frac{4}{11} = \frac{20}{132}$
Total                      $\frac{132}{132} = 1$
If you were to select two bottles from the case, what is the probability that:
\[ P(B_1 B_2) = \frac{5}{12} \times \frac{4}{11} = \frac{20}{132} \]
Note: The probability of an event is the sum of all the possibilities that will give
you the answer.
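The table of outcomes for two bottles drawn without replacement can be generated with exact fractions. This is a sketch in Python under the example's figures (seven good and five bad bottles in a case of 12); `p_two_draws` is a hypothetical helper name, and the "exactly one bad bottle" event is added only to illustrate the note about summing possibilities.

```python
from fractions import Fraction

GOOD, BAD = 7, 5
TOTAL = GOOD + BAD

def p_two_draws(first, second):
    """Joint probability of drawing `first` then `second` without replacement."""
    remaining = {"G": GOOD, "B": BAD}
    p_first = Fraction(remaining[first], TOTAL)
    remaining[first] -= 1                          # the first bottle is not put back
    p_second_given_first = Fraction(remaining[second], TOTAL - 1)
    return p_first * p_second_given_first

outcomes = {a + b: p_two_draws(a, b) for a in "GB" for b in "GB"}

# An event made up of several outcomes: exactly one bad bottle (GB or BG).
p_exactly_one_bad = outcomes["GB"] + outcomes["BG"]

for name, p in outcomes.items():
    print(name, p)
print(sum(outcomes.values()), p_exactly_one_bad)
```

The four joint probabilities add up to 1 because together they cover the whole sample space.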
Example 8.13
A certain testing apparatus has two batteries. The probability that the first
battery will run down is 0.3 (B1) and the probability that both batteries (B1
and B2) will run down is 0.06. If the first battery is found to be flat, what is the
probability that the second battery will be flat?
\[ P(B_2 \mid B_1) = \frac{0.06}{0.30} = 0.20 \]
This means that 20% of the time, if the first battery is flat, the second battery
will also be flat.
Activity 8.10
Example 8.14
P(female) 5 0.15
P(male) 5 0.85
P(promoted) 5 0.20
P(not promoted) 5 0.80
P(female and promoted) 5 0.03
P(male and promoted) 5 0.17
P(female and not promoted) 5 0.12
P(male and not promoted) 5 0.68
Activity 8.11
The table shows the results of a study on 102 children in which a child’s IQ was
examined and the presence of a specific gene was found in the child.
Gene present Gene not present Total
High IQ 33 19 52
Normal IQ 39 11 50
Total 72 30 102
Steps
1. Plot a dot on the left to represent the root of the tree.
2. Construct a column for each trial.
3. Start on the left at the dot and determine the possibilities for the first trial,
which forms the branches of the tree in the first column.
4. Branches grow from each of the original branches, representing the
possibilities for the second trial. The second stage is based on the choice made
in the first stage. Determine if the outcomes are dependent or independent.
5. The branches of the tree are weighted by probabilities; therefore show the
probabilities for each event on the branches.
6. List all the outcomes together with the joint probability for each combined
outcome.
7. Add the probabilities. Because the tree represents the sample space of the
experiment, the sum of the probabilities should equal 1.
Example 8.15
A bag contains five red balls and three black balls. Two balls are drawn from the
bag. Construct a probability tree to list all the possible outcomes together with
each outcome’s probability.
1. The first stage of the tree consists of the possibilities in the first draw. There
are only red and black balls in the bag, therefore if you draw one ball from
the bag it can be either red or black. These possible outcomes are represented
by the branches of a tree.
2. If the ball in the first draw was red, there are still four red balls and three
black balls in the bag; therefore the second ball you draw can be either red
or black. If the first ball was black, there are still five red and two black balls
in the bag; therefore the second ball you draw can be either red or black.
3. The probabilities alongside each possibility must be calculated. In the first
stage, if you draw a ball, the chance that it is red is $\frac{5}{8}$. The chance that the
first ball is black is $\frac{3}{8}$. Please note that the first ball can be either red or black.
Only one of the two possibilities can happen.
4. If the first ball is red, there are only four red ones left and seven balls in the
bag; therefore the chance that the second one is red is only $\frac{4}{7}$. The chance that
the second one is black is $\frac{3}{7}$.
5. If the first ball was black, the chance that the second one is red is $\frac{5}{7}$, and the
chance that the second one is also black is $\frac{2}{7}$.
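Stage 1              Stage 2              Outcome    Joint probability
r ($\frac{5}{8}$)    r ($\frac{4}{7}$)    rr         $\frac{5}{8} \times \frac{4}{7} = \frac{20}{56}$
r ($\frac{5}{8}$)    b ($\frac{3}{7}$)    rb         $\frac{5}{8} \times \frac{3}{7} = \frac{15}{56}$
b ($\frac{3}{8}$)    r ($\frac{5}{7}$)    br         $\frac{3}{8} \times \frac{5}{7} = \frac{15}{56}$
b ($\frac{3}{8}$)    b ($\frac{2}{7}$)    bb         $\frac{3}{8} \times \frac{2}{7} = \frac{6}{56}$
Total                                                $\frac{56}{56} = 1$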
6. If you add the probabilities of all the possible events, the total must be 1.
7. From the tree diagram, the probability of drawing two red balls is $\frac{20}{56}$.
The probability of a red and a black ball is $\frac{15}{56} + \frac{15}{56} = \frac{30}{56}$.
There are two possible outcomes that result in a red and a black ball. A
probability of an event is the sum of the probabilities of all the possibilities that
can result in the required event.
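The probability tree in Example 8.15 can be sketched in code as follows. The function `two_draw_tree` is a hypothetical helper written for this illustration; it assumes two draws without replacement from a bag that contains only red and black balls.

```python
from fractions import Fraction
from itertools import product

def two_draw_tree(red, black):
    """Return {outcome: joint probability} for two draws without replacement."""
    total = red + black
    tree = {}
    for first, second in product("rb", repeat=2):
        left = {"r": red, "b": black}
        p_first = Fraction(left[first], total)        # stage 1 branch
        left[first] -= 1                              # the drawn ball is not put back
        p_second = Fraction(left[second], total - 1)  # stage 2 branch
        tree[first + second] = p_first * p_second
    return tree

tree = two_draw_tree(red=5, black=3)
p_two_red = tree["rr"]
# An event's probability is the sum over all branches that produce it.
p_red_and_black = tree["rb"] + tree["br"]
print(p_two_red, p_red_and_black, sum(tree.values()))
```

The last line adds the two branches that give one red and one black ball, which is the rule stated in the paragraph above.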
Activity 8.12
$m \times n \times \ldots$
Example 8.16
Activity 8.13
If a restaurant menu had a choice of three salads, six main dishes and six
desserts, how many different possible dinners can be ordered?
${}_nP_x = \frac{n!}{(n - x)!}$
Example 8.17
Suppose we need to select a group of 3 people from a larger group of 10. They
are to fill the roles of chairperson, secretary and treasurer in a committee. The
number of possible ways of filling these roles is:
${}_{10}P_3 = \frac{10!}{(10 - 3)!} = 720$
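A short sketch of the permutation count in Example 8.17, written directly from the factorial formula. The function name `permutations_count` is illustrative; Python's standard `math.perm` (available from Python 3.8) gives the same result.

```python
import math

def permutations_count(n, x):
    """Number of ordered selections of x objects from n: n! / (n - x)!"""
    return math.factorial(n) // math.factorial(n - x)

# Example 8.17: chairperson, secretary and treasurer chosen from 10 people.
ways = permutations_count(10, 3)
print(ways, ways == math.perm(10, 3))
```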
Activity 8.14
Assume there are five carriages that need to be unloaded at a dock, but there is
only enough time left in the day to unload three of them. Since the goods in each
of the carriages are needed by customers, the order of unloading is important.
In how many ways can three of the five carriages be unloaded in first, second
and third order?
selected. For example, ABC is considered the same selection as BCA or CBA. The
number of combinations of n objects taken x at a time is:
${}_nC_x = \frac{n!}{x!\,(n - x)!}$
Example 8.18
${}_7C_5 = \frac{7!}{5!\,(7 - 5)!} = 21$
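The combination formula can be sketched the same way. The helper name `combinations_count` is an assumption for illustration; `math.comb` is the standard-library equivalent.

```python
import math

def combinations_count(n, x):
    """Number of unordered selections of x objects from n: n! / (x! (n - x)!)"""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

# Example 8.18: five objects chosen from seven, order ignored.
groups = combinations_count(7, 5)
print(groups, groups == math.comb(7, 5))
```

Dividing by x! is what removes the orderings that a permutation would count separately.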
Activity 8.15
You are given a list of 10 books and you are to read four of them. How many
possible combinations of four books are available from the list of 10?
TEST YOURSELF 8
1. There are six balls of the same size in a box: two are red, three are blue and
one is yellow. If you draw a ball from the box, what is the probability that
a) a red ball will be selected?
b) the ball will not be yellow?
c) the ball will be red or yellow?
2. This table shows the blood type of a randomly chosen person in South
Africa:
Blood type A O B AB
Probability 0.40 0.45 0.11 ?
a) Find the probability of a randomly selected adult who doesn’t always eat
healthy foods because he or she has no time to cook or is confused about
nutrition.
b) Find the probability of a randomly selected adult who feels that healthy
foods have poor taste or are hard to find.
4. What is the probability that an even number will result from one roll of a
die?
5. If you draw a Smartie at random from a box of Smarties, you can draw one
of six possible colours.
Colour Blue Green Brown Orange Red Yellow
Probability ? 0.16 0.13 0.20 0.13 0.14
17. If 18% of all South Africans are underweight, find the probability that if two
citizens are selected at random, both will be underweight.
18. Approximately 9% of men have a type of colour blindness that prevents
them from distinguishing between red and green. If three men are selected
at random, find the probability that all three will have this type of red–green
colour blindness.
19. The probability that a person has type O+ blood is 38%. Three unrelated
people are selected in a random sample. Find the probability that:
a) all three have type O+ blood
b) none of the three has type O+ blood
c) at least one has type O+ blood.
20. Students take two independent tests: 30% of them pass test A and 60% pass
test B.
Find the probability that a student selected at random passes:
a) both tests
b) only test A
c) only one test.
21. Ten students are being interviewed for appointment to the students’ council.
Six of them are female and four are male. If two are selected at random for a
newspaper interview, what is the probability that:
a) at least one is female?
b) one female and one male are selected?
22. A person owns a collection of 30 CDs, of which five are classical music. If
two CDs are selected at random, find the probability that both are classical
music.
23. A doctor gives a patient a 60% chance of surviving bypass surgery after a
heart attack. If the patient survives the surgery, he has a 50% chance that
the heart damage will heal. Find the probability that the patient survives the
surgery and that the heart damage will heal.
24. The probability that Jack parks in a disabled parking zone and gets a parking
fine is 0.06. The probability that Jack cannot find a legal parking space and
has to park in the disabled parking is 0.20. On Monday, Jack arrives at the
shopping centre and has to park in the disabled parking zone. Find the
probability that he will get a parking fine.
25. A batch of 10 calculators has three defective calculators. What is the
probability that a sample of three calculators will have:
a) no defective calculators?
b) all defective calculators?
What is the probability that a drill bit issued at random will last
a) less than 10 days?
b) five or more days?
27. There are 100 students in a class, of whom 40 are male. When questioned,
60 students agreed that they were happy in the school and of these, 30 were
female. Construct a contingency table and find the following probabilities
for a randomly selected student:
a) male and unhappy
b) female and happy
c) given a female, she is happy
d) an unhappy student
e) if it is a male, he is happy.
28. A study done on a sample of 1 000 people to determine answers on gender
and dominant hand produced the following information:
Men Women Total
Left-handed 63 50 113
Right-handed 462 425 887
Total 525 475 1 000
If one person is selected at random, calculate the probability that the person:
a) is left-handed
b) is a woman
c) is a man or left-handed
d) if it is a man, he is left-handed
e) is a woman and left-handed.
29. The table below indicates the results of steroid tests given to 1 000 male and
500 female athletes who were randomly selected for testing.
40. Space shuttle astronauts each consume an average of 3 000 calories per day.
One meal normally consists of a meat dish, a vegetable dish and a dessert.
The astronauts can choose from 10 meat dishes, eight vegetable dishes and
13 desserts. How many meal combinations are possible?
41. A mail-order company sells eight different books. As part of a special
promotion, customers may select three different books to make up a package.
How many different packages are possible?
42. There are 15 qualified applicants for five trainee positions in a fast food
management programme. How many different groups of trainees can be
selected?
43. A food technologist must select three tests to perform on ice cream. He has a
choice of seven tests. In how many ways can he perform three different tests?
44. A new drug is in the test phase: the first phase involves five volunteers and
the objective is to test the safety of the drug. If eight volunteers are available
and five of them are to be selected, how many different combinations of five
volunteers are possible?
45. The CEO of a research centre has to reduce the management staff from
10 to seven. He wants to get rid of the oldest three. How many possible
arrangements are there of the management staff in order of age?
46. The Big Triple at the local race track consists of picking, in the correct
order, the first three horses in the ninth race. How many possible Big Triple
outcomes are there if the ninth race is run by 12 horses? What is the
probability that your ticket will be a winning ticket?
47. A rugby team must schedule a game with each of three other teams. There are
five dates available for games. How many different schedules can be arranged?
What is the probability that it will be scheduled on one specific day?
48. Suppose that 60 of 200 students have Statistics as a subject, 40 have
Accountancy as a subject and 25 have both subjects. Portray the given data
using a Venn diagram and answer the following questions:
a) How many students have only Statistics as a subject? Calculate the
probability of having only Statistics as a subject.
b) How many have only Accountancy? What is the probability of taking
only Accountancy as a subject?
c) How many have Statistics or Accountancy or both? Calculate the
probability attached to this answer.
d) How many students have neither of the two subjects? What is the
probability of taking neither of the two subjects?
In Unit 8 you studied the computation of the probability of an event. This unit
introduces probability distributions and shows how to calculate probabilities
using a probability distribution.
It is important that you can distinguish between discrete and continuous random
variables because different statistical techniques are used to analyse each.
Steps
1. List all the possible outcomes of an investigation in a frequency distribution.
2. Add the frequencies of each possible outcome.
3. Find the probability of each possible outcome by dividing the frequency of
the outcome by the sum of the frequencies.
4. Make sure each probability is between 0 and 1 and the total of all the
probabilities is 1.
5. The mean of a probability distribution is referred to as its expected value
and denoted by $E(x) = \sum [x \cdot P(x)]$
6. In this text we will cover the binomial and Poisson discrete probability
distributions.
Example 9.1
A survey asked a sample of 200 people how many times they donate blood each
year. The results are summarised as a probability distribution. The random
variable (x) represents the number of donations for one year.
x        f      P(x)    x · P(x)
0        60     0.30    0.00
1        50     0.25    0.25
2        50     0.25    0.50
3        20     0.10    0.30
4        10     0.05    0.20
5        6      0.03    0.15
6        4      0.02    0.12
Total    200    1.00    E(x) = 1.52
Interpretation: 0.3 of the people did not donate blood and 0.25 of the people
donated once.
The expected number of times someone will donate blood per year is 1.52.
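The steps for building a discrete probability distribution and its expected value can be sketched with the frequencies from Example 9.1. The variable names are illustrative.

```python
# Frequencies from Example 9.1: number of blood donations per year -> people.
freq = {0: 60, 1: 50, 2: 50, 3: 20, 4: 10, 5: 6, 6: 4}

n = sum(freq.values())                       # step 2: add the frequencies
prob = {x: f / n for x, f in freq.items()}   # step 3: P(x) = f / n

# Step 4: each probability lies between 0 and 1 and they total 1.
assert all(0 <= p <= 1 for p in prob.values())
assert abs(sum(prob.values()) - 1) < 1e-9

# Step 5: expected value E(x) = sum of x * P(x).
expected = sum(x * p for x, p in prob.items())
print(n, round(expected, 2))
```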
Where:
x = the number of successes, 0, 1, 2, etc
n = number of trials or sample size
p = probability of success on each trial
Steps
1. Find the probability (p) of a success in each trial.
2. Find the number of trials (n).
3. Decide on the number of successes (x) for which you want to determine the
probability.
4. Substitute the values into the formula.
Example 9.2
Suppose the probability is 0.2 that any given avocado will show measurable
damage when the temperature falls to 15 °C. Construct the binomial distribution for
a sample of five avocados if the temperature does drop to 15 °C.
p = 0.2
n = 5
x = 0, 1, 2, 3, 4, 5 (damage can occur in none of the five, one of the five . . .
up to all five)
x P(x)
0 0.3277
1 0.4096
2 0.2048
3 0.0512
4 0.0064
5 0.0003
Total 1.0000
$P(x = 0) = \frac{5!}{0!\,(5 - 0)!} \cdot 0.2^0 \cdot (1 - 0.2)^{5 - 0} = 0.3277$
Note: A probability is the sum of the probabilities of all the possibilities that will
give you the answer.
Note: You can apply the complement rule or you can calculate P(x = 2, 3, 4 or
5). It will give you the same answer. Use the complement rule if it will give you a
short-cut method to your answer.
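The binomial table in Example 9.2 and the complement rule can be sketched as follows. The function `binomial_pmf` is an illustrative helper that follows the formula used for P(x = 0) in the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example 9.2: n = 5 avocados, p = 0.2 chance of damage for each.
dist = {x: binomial_pmf(x, n=5, p=0.2) for x in range(6)}
for x, prob in dist.items():
    print(x, round(prob, 4))

# P(x >= 2) two ways: adding P(2) to P(5), or 1 - [P(0) + P(1)].
direct = sum(dist[x] for x in range(2, 6))
by_complement = 1 - (dist[0] + dist[1])
print(round(direct, 4), round(by_complement, 4))
```

Both routes give the same answer; the complement route needs two terms instead of four.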
Activity 9.1
A shoe store’s records show that 30% of customers making a purchase use a
credit card to make payment. This morning seven customers purchased shoes
from the store. What is the probability that
1. three customers will pay using a credit card?
2. at least two customers will pay using a credit card?
3. more than five customers will pay using a credit card?
4. exactly three customers will not use a credit card to pay?
Characteristics
1. The number of successes that occur in one interval is independent of the
number of successes that occur in any other interval.
2. The probability that a success will occur in an interval is the same for all
intervals of equal size and is proportional to the size of the interval.
3. x is the count of the number of successes that occur in a given interval of
time or other measurement and may take on any value from 0 to infinity.
4. If x is a Poisson random variable, the probability distribution of x is given by
$P(x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
Where:
x = 0, 1, 2, . . . ∞
λ (pronounced lambda) = number of successes in the given unit of
measurement
e = the base of natural logarithms (use the $e^x$ key on the calculator)
Example 9.3
$P(x = 0) = \frac{3^0 \cdot e^{-3}}{0!} = 0.0498$
$P(x = 5) = \frac{6^5 \cdot e^{-6}}{5!} = 0.1606$
3. at least two calls will be received during the next 2.5 minutes
λ = 3 per 5 minutes, therefore λ = 1.5 for 2.5 minutes
$P(x = 0) = \frac{1.5^0 \cdot e^{-1.5}}{0!} = 0.2231$
$P(x = 1) = \frac{1.5^1 \cdot e^{-1.5}}{1!} = 0.3347$
Note: There is no n value available and therefore no upper limit to the x counts.
If you are required to do a probability involving > or <, you have to apply the
complement rule.
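A minimal sketch of the Poisson calculation for "at least two calls in the next 2.5 minutes", using λ = 1.5 from the working above. Because x has no upper limit, the complement rule is applied. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 1.5  # 3 calls per 5 minutes, scaled to a 2.5-minute interval
p0 = poisson_pmf(0, lam)
p1 = poisson_pmf(1, lam)

# P(x >= 2) = 1 - [P(0) + P(1)], since the counts 2, 3, 4, ... never end.
p_at_least_two = 1 - (p0 + p1)
print(round(p0, 4), round(p1, 4), round(p_at_least_two, 4))
```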
Activity 9.2
Characteristics
• The graph for a continuous random x variable is a smooth curve.
• The curve is unimodal (single mode).
[Figure: normal curve with an area of 0.5 on each side of the mean]
Steps
1. Any normal random variable can be converted to a standard normal random
variable by calculating the corresponding z score. The z score expresses the
difference between the value of interest (x) and the mean (μ) in units of standard
deviation (z).
$z = \frac{x - \mu}{\sigma}$
2. This z score shows the number of standard deviations that a specific value lies
to the right or left of the mean. Any x value smaller than the mean will have
a negative z score and any x value greater than the mean will have a positive z
score. A negative z score only indicates that the area is to the left of the mean.
3. The x axis of the graph becomes the z axis with z 5 0 in the middle.
4. The z score is used to find the area under the standard normal curve in
published tables.
5. Because the distribution is symmetrical, the normal table deals only with
positive z scores. The negative z scores will give exactly the same areas as the
positive ones; therefore we use the absolute value of the z score.
Note: Absolute value means that you ignore the sign of the value.
6. The area is always positive because all probabilities are positive.
7. Within the same number of standard deviations (z) from the mean all
normal distributions will have the same area or probability.
Steps
1. The first column in the table gives the z score to the first decimal place, and
the top row gives the second decimal for a z score.
2. For example, to find the area of a z score of 0.23:
a) find 0.2 in the first column
b) go across with this row up to the column headed with the second decimal
of 0.03
c) where the corresponding row and column intersect, the area is 0.0910
d) this is the area between a z score of 0 and 0.23 and is denoted as
P(0 ≤ z ≤ 0.23)
e) a z score of 1.02 or P(0 ≤ z ≤ 1.02) will correspond to an area of 0.3461.
3. This table always gives the area between the mean and the required z score.
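The table look-up can be reproduced with Python's standard `statistics.NormalDist`. This is a sketch, not the book's method: the helper names `z_score` and `area_mean_to_z` are assumptions, and the areas are computed rather than read from a printed table, so they can differ from table values in the last decimal.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z.

    The sign of z is ignored because the curve is symmetrical,
    which is why the printed table lists positive z scores only.
    """
    return NormalDist().cdf(abs(z)) - 0.5

print(round(area_mean_to_z(0.23), 4))
print(round(area_mean_to_z(1.02), 4))
print(round(area_mean_to_z(-1.02), 4))
```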
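[Figure: sketches (1) to (3) of the normal curve, marking x in sketches (1) and (2), and x1 and x2 in sketch (3)]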
Steps
1. Sketch (1): Calculate the z score and look up the value in the Normal table.
2. Sketch (2): Calculate the z score. The answer for z is negative; therefore use
the absolute value of z and look up the value in the Normal table.
3. Sketch (3): If the area to be determined falls on both sides of the mean:
• calculate the z score for the area between μ and the x value to the
right of μ
• determine the z score for the area between μ and the x value to the left
of μ
• look up the areas for the two z scores from the Normal table
• add the two areas.
Steps
1. Calculate the z score for the area between μ and the larger x value.
2. Calculate the z score for the area between μ and the smaller x value.
3. Look up the two z scores in the Normal table to obtain the two areas.
4. Subtract the smaller area from the larger area.
Steps
1. Calculate the z score for the area between µ and the x value.
2. Use the absolute z score to look up the area in the Normal table.
3. Subtract the area from 0.5 (the area from the mean to the end of the
distribution is 0.5).
Area to the right of any x-value, where x is less than the mean and the area to
the left of any x-value, where x is greater than the mean
Steps
1. Calculate the z score for the area between μ and the x value.
2. Look up the z score to get the area.
3. Add 0.5 to the area.
Steps
If a probability or area is given and you are required to determine an unknown
x value:
1. Calculate the area between µ and the unknown x value.
2. If the known area falls in the tail of the curve, subtract the tail area from
0.5.
3. Compare this area with the areas in the body of the Normal table.
4. If the exact area is not listed, use the closest value.
5. Read the z score from the first column and top row.
6. Use this z score in the z formula to obtain the unknown x value.
Note: If the area falls to the left of μ, the z is negative, and if the area falls to
the right of μ, the z is positive.
For example, if the area between μ and the unknown x value is 0.08, find the
closest area to 0.08 in the table (that is 0.0793) and read the z score from the
first column and top row, that is 0.2 from the first column and 0.00 from the top
row. Therefore z = 0.20.
Example 9.4
$z = \frac{100 - 120}{20} = -1.00$    Area: 0.3413
$z = \frac{130 - 120}{20} = 0.5$    Area: 0.1915
2. The probability that a randomly selected employee will complete the task
between 75 and 100 sec:
$z = \frac{75 - 120}{20} = -2.25$    Area: 0.4878
$z = \frac{100 - 120}{20} = -1$    Area: 0.3413
3. The probability that a randomly selected employee will complete the task
within 75 sec:
P(x ≤ 75) = 0.50 − 0.4878 = 0.0122
4. The probability that a randomly selected employee will complete the task in
more than 75 sec:
P(x > 75) = 0.50 + 0.4878 = 0.9878
5. The 10% of the employees who complete the task within the shortest time
are to be given advanced training. What task times qualify individuals for
such training?
• The fastest 10% will fall in the left-hand tail of the distribution because
those times will be the shortest.
• The area between the mean and the 10% in the tail is: 0.5 − 0.1 = 0.4
• Look up an area of 0.40 in the Normal table and obtain the z value. The
area you are interested in falls on the left of μ, resulting in a negative z
score.
$z = \frac{x - \mu}{\sigma}$
$-1.28 = \frac{x - 120}{20}$
$x = 94.4$
This means that the employees who complete the task in 94.4 seconds or less
will qualify for advanced training.
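The calculations in Example 9.4 can be cross-checked with `statistics.NormalDist`. The mean of 120 seconds and standard deviation of 20 seconds are taken from the z calculations in the example. Results are computed directly, so they differ slightly from the table-based answers (for instance, the cut-off is about 94.37 instead of 94.4, because the table z value is rounded to −1.28).

```python
from statistics import NormalDist

# Task times: mean 120 s, standard deviation 20 s (from the z calculations).
task = NormalDist(mu=120, sigma=20)

p_100_to_130 = task.cdf(130) - task.cdf(100)   # areas on both sides of the mean are added
p_75_to_100 = task.cdf(100) - task.cdf(75)     # same side: smaller area is subtracted
p_within_75 = task.cdf(75)                     # tail area: 0.5 minus the table area
p_more_than_75 = 1 - task.cdf(75)              # 0.5 plus the table area
cutoff_fastest_10 = task.inv_cdf(0.10)         # x value with 10% of times below it

print(round(p_100_to_130, 4), round(p_75_to_100, 4))
print(round(p_within_75, 4), round(p_more_than_75, 4))
print(round(cutoff_fastest_10, 1))
```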
Activity 9.3
The lifetimes of a certain kind of battery have a mean of 300 hours and a
standard deviation of 35 hours. Assume that the lifetimes, measured to the
nearest hour, follow a normal distribution, and determine
1. the percentage of batteries that have a lifetime of more than 320 hours
2. the value above which the best 30% of the batteries lie
3. the proportion of batteries that have a lifetime from 250 to 350 hours
4. the proportion of batteries with a lifetime between 250 and 280 hours
5. the maximum lifetime below which the weakest 20% of the batteries will fall
6. the minimum lifetime above which the 15% of the batteries with the longest
life will fall.
TEST YOURSELF 9
1. A textile firm has found from experience that only 20% of the people applying
for a certain stitching-machine job are qualified for the work.
a) Construct the probability distribution for this investigation if five persons
are interviewed to find qualified persons.
b) What is the probability that at least two are qualified for the job?
2. Testing blood for HIV, the virus that causes AIDS, gives a positive result with
probability of about 0.004 when a person who is free of HIV antibodies is
tested. A clinic tests three people who are all free of HIV antibodies.
a) Construct the probability distribution for this investigation.
b) What is the probability that you will get one false-positive result?
c) What is the probability that you will get more than one false-positive
result?
3. You read that one out of four eggs contains salmonella bacteria. If you use
six eggs in your chocolate cake, what is the probability that:
a) one of the eggs contains salmonella?
b) at most, two of the eggs contain salmonella?
4. If 40% of all patients have medical aid, what is the probability that, in a
sample of 10 patients:
a) exactly four will have medical aid?
b) at least four will have medical aid?
c) at most, four will have medical aid?
5. Shortly after being put into service, some buses of a certain type develop
cracks on the underside of the mainframe. A particular city has 20 buses of
this type, eight of which have cracks. If five buses are randomly selected for
inspection, determine the probability of finding:
a) exactly two buses with cracks
b) at most, two buses with cracks.
6. About 15% of the population is left-handed. Fifteen individuals are randomly
selected. What is the probability that:
a) three or fewer are left-handed?
b) one or more are right-handed?
7. According to the National Environmental Programme, air pollution
standards for particulate matter are exceeded an average of 5.6 days in
every three-week period. What is the probability that the standard is:
a) not exceeded on any day during a three-week period?
b) exceeded two days or more of a two-week period?
14. Manufactured items are sold in boxes that are stated to contain a mass of at
least 40 kg. The actual mass in a box varies with a mean of 41.2 kg and a
standard deviation of 0.8 kg.
a) Calculate the proportion of boxes whose mass is between 40 kg and
42 kg.
b) Calculate the mass below which 20% of the lightest boxes fall.
c) All boxes containing less than 40 kg are scrapped at a cost of R100 per
box. Calculate the scrapping cost associated with the packing of 50
boxes.
d) To what mean mass should the box contents be adjusted, with the
standard deviation unchanged, if only 1% of the boxes are to be
scrapped?
15. The production foreman of the Oros Fruit Company estimates that the
average sales of oranges are 4 700 and the standard deviation, 500 oranges.
Calculate the probability that sales will be:
a) more than 5 500 oranges
b) more than 4 500 oranges
c) less than 4 900 oranges
d) between 4 500 and 4 900 oranges
e) between 4 900 and 5 500 oranges.
16. Birth masses are normally distributed with a mean of 3 579 g and a standard
deviation of 500 g. What is the cut-off point for the lightest 2% of babies?
17. The Faber Co produces a pencil called Ultra-Light. Sales follow a normal
distribution with a mean of 457 000 pencils each year. Furthermore, 90%
of the time sales have been between 460 000 and 454 000 pencils. Estimate
the standard deviation of this distribution.
18. The average number of calories in a 50 g chocolate bar is 225. If the
distribution of calories is approximately normal, with a standard deviation
of 10, find the probability that a randomly selected chocolate bar will have
between 200 and 220 calories.
19. The thickness of bolts (mm) manufactured by a certain process follows a
normal distribution with a mean of 10 mm and a standard deviation of
1 mm.
a) What proportion of the bolts in the long run are at most 11 mm?
b) What proportion of the bolts will have thickness values between 7.5 mm
and 12.5 mm?
c) What proportion of bolts will have thicknesses that exceed 11.5 mm?
20. The amount of distilled water dispensed by a certain machine has a normal
distribution with a mean of 64 litres and a standard deviation of 0.78 litres.
What container size will ensure that overflow occurs only 0.5% of the time?
The t distributions are more dispersed than the normal distribution and
are distinguished by a positive whole number, called degrees of freedom:
df = n − 1
Degrees of freedom (df) are defined as 1 less than the sample size (n – 1) and
represent the number of observations that are ‘free to vary’ around the mean
of the sample.
There is a different t distribution for each sample size. But as the sample
size gets larger, the t distribution becomes more and more normal. Once the
number of degrees of freedom reaches 30, the t distribution is so close to the
normal distribution that we can use the normal distribution to approximate
the t. That means that the t distribution will only be used for samples with
sizes of less than 30.
Example 10.1
The average number of hours per week spent on the internet was calculated for
a sample of 20 students as $\bar{x} = 7$.
The point estimate for the population mean will be: μ = 7 hours per week.
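[Figure: normal curve with a central area of (1 − α) between −z and +z, and an area of α/2 in each tail]
$\mu = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$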
Steps
1. Collect a sample of an adequate size (n).
2. Compute the sample mean ($\bar{x}$) and standard deviation (s) or proportion (p).
3. Determine the type of sampling distribution:
a) normal (z) if population is normally distributed with known σ
b) normal (z) via the central limit theorem with (n ≥ 30) and known σ
c) normal (z) via the central limit theorem (n ≥ 30) if σ is unknown
d) student t distribution for n < 30 and unknown σ
e) normal (z) if dealing with proportions with np ≥ 5 and n(1 − p) ≥ 5.
4. Identify the level of confidence (1 − α): a 95% level implies that if 100
different confidence intervals are constructed, each based on a different
sample from the same population, we expect 95 of the intervals to include
the parameter and five not to include the parameter. We are capturing the
middle 95% between the two critical values and 2.5% in each tail.
5. Find the critical value z or t that corresponds to the level of confidence by
making use of the appropriate table (normal z table or student t table) and
the level of confidence. A critical value is the cut-off point between the
sample statistics that are likely to occur and those that are unlikely to occur.
[Figure: standard normal curve. 95% of all sample means are between a z-score of −1.96 and +1.96; 2.5% of all sample means will lie in each tail.]
For a confidence level of 95%: the standard normal table is used to determine
a value z such that a central area of 0.95 falls between −z = −1.96 and +z
= +1.96.
Identify the critical z value for a 95% confidence level:
• 95% or 0.95 covers the middle area of the curve. Do not look up 0.95
in the body of the Normal table because the Normal table contains
probabilities only for one half of the normal curve. Divide 0.95 by 2 to
obtain the area to the left or right of the mean (0.95 ÷ 2 = 0.475).
• Look up 0.475 or the closest to this area in the body of the table to find
the corresponding z as ±1.96 (± because the area to the left of the mean
will result in a −z and, to the right of the mean, a +z).
6. Find the margin of error (E), which is the critical value multiplied by the
standard error of estimate:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ or $\sqrt{\frac{p(1 - p)}{n}}$
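7. Calculate the upper and lower confidence limits by making use of the
appropriate formula:
• $\mu = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$ or $\mu = \bar{x} \pm z \cdot \frac{s}{\sqrt{n}}$
• $\mu = \bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$
• population proportion $= p \pm z\sqrt{\frac{p(1 - p)}{n}}$, where p is the sample proportion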
Activity 10.1
Identify the critical z values associated with confidence levels of 90%, 98% and
99%.
10.3.3 Confidence interval estimate for the population mean (μ) for
data obtained from a population that is normally distributed
or from large samples (n ≥ 30)
The central limit theorem states that if n is large (n ≥ 30), the sampling
distribution of the mean will be approximately normal. It does not matter
whether σ is known or unknown or if the distribution is normal or not. If σ is
unknown, substitute σ with the sample standard deviation s.
Example 10.2
The Pappi Paper Company wanted to estimate the average time required for a
new machine to produce a ream of paper. A sample of 36 reams required an
average production time of 1.5 minutes for each ream. The population standard
deviation was 0.30 min and the confidence level was 95%.
1. The population distribution is not known to be normal, but via the central
limit theorem we assume a normal distribution.
2. To obtain the z value from the Normal table, divide the confidence level by
2: (0.95 ÷ 2 = 0.475) and look up the area in the body of the Normal table.
An area of 0.475 corresponds to z = ±1.96
3. The population standard deviation is given (σ = 0.30), so it is used directly.
If σ were unknown, the sample standard deviation (s) would be used to estimate it.
4. Use the normal distribution formula to calculate the interval boundaries.
$\mu = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$
$\mu = 1.5 \pm 1.96 \cdot \frac{0.3}{\sqrt{36}}$
$= 1.5 \pm 0.098$
1.5 − 0.098 = 1.402 and 1.5 + 0.098 = 1.598
1.402 ≤ μ ≤ 1.598
5. Based on the sample data, the Pappi Paper Company can be 95% confident
that the average time required for a new machine to produce a ream of
paper lies between 1.402 and 1.598 minutes.
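The interval in Example 10.2 can be sketched as follows. The function `z_interval` is a hypothetical helper; it obtains the critical z value from `statistics.NormalDist` instead of the printed Normal table, which gives 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence limits for the population mean using the normal distribution."""
    # Central area = confidence, so the area below +z is 0.5 + confidence / 2.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)   # critical value x standard error
    return mean - margin, mean + margin

# Example 10.2: 36 reams, mean 1.5 min, sigma 0.30 min, 95% confidence.
low, high = z_interval(mean=1.5, sigma=0.30, n=36, confidence=0.95)
print(round(low, 3), round(high, 3))
```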
Activity 10.2
Example 10.3
The number of home fires that were started by candles in low-cost housing areas
was recorded for a sample of seven years. The mean number of fires was 7 046
with a standard deviation of 1 605. Calculate the 99% confidence interval for
the average number of home fires started by candles.
• Find the critical values for 99% confidence and n 5 7 using the t table from
Appendix 2:
df = n − 1 = 7 − 1 = 6
Use the t table to look up the critical values. The top rows of the table indicate a
one-tail test or a two-tail test at a specified significance level (α). All confidence
interval estimates are two-tail tests. If the level of confidence is 0.99, the α value
is: (1 − 0.99) = 0.01. Choose the 0.01 column under the two-tail row and go
down in this column to where it corresponds with the desired degrees of freedom
(df), which is 6. The df column is the first column in the table. This t table value
is 3.707.
$\mu = \bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$
$\mu = 7\,046 \pm 3.707 \cdot \frac{1\,605}{\sqrt{7}}$
$= 7\,046 \pm 2\,248.8$
4 797.2 ≤ μ ≤ 9 294.8
• Based on the sample data, we can be 99% confident that the mean number of
home fires per year will be between 4 797 and 9 295.
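A minimal sketch of the t interval in Example 10.3. Python's standard library has no t-distribution quantile function, so the critical value 3.707 (df = 6, two-tail α = 0.01) is passed in as read from the t table in the example. The function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(mean, s, n, t_critical):
    """Confidence limits for the mean of a small sample (n < 30, sigma unknown)."""
    margin = t_critical * s / sqrt(n)
    return mean - margin, mean + margin

# Example 10.3: n = 7 years, mean 7 046 fires, s = 1 605, t = 3.707 from the table.
low, high = t_interval(mean=7046, s=1605, n=7, t_critical=3.707)
print(round(low, 1), round(high, 1))
```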
Activity 10.3
The time taken to complete the same task (in minutes) was recorded for nine
participants in a training exercise as follows:
8 7 8 9 7 7 9 10 9
Construct a 95% confidence interval for the average time taken to complete the
task.
Steps
1. Identify the sample statistics n and x (or p).
2. Find the point estimate if not given: $p = \frac{x}{n}$
3. Verify that the sampling distribution of p can be approximated by the normal
distribution: np ≥ 5 and n(1 − p) ≥ 5
4. Find the critical value z that corresponds to the given level of confidence.
5. Use the formula to calculate the margin of error and the confidence interval
boundaries.
population proportion $= p \pm z\sqrt{\frac{p(1 - p)}{n}}$, where p is the sample proportion
Example 10.4
A survey found that, out of 200 workers, 168 said they were interrupted three
or more times an hour by phone calls. Find the 90% confidence interval of the
population proportion of workers who are interrupted three or more times an
hour.
$\hat{p} = \dfrac{168}{200} = 0.84$
You can be 90% confident that the true proportion of workers that are interrupted
three or more times per hour is between 80% and 88%.
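A minimal Python sketch of the interval for a proportion, applied to Example 10.4. The function name is hypothetical, and the critical z value is computed with Python's `statistics.NormalDist` instead of being read from the printed table, so it uses z ≈ 1.645.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Return (lower, upper) for p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n  # point estimate
    # Check that the normal approximation is reasonable
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation not appropriate")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 10.4: 168 of 200 workers, 90% confidence
low, high = proportion_confidence_interval(168, 200, 0.90)
print(f"{low:.2f} <= p <= {high:.2f}")
```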
Activity 10.4
Formula for determining the sample size when estimating the population mean:
$n = \left( \dfrac{z \cdot \sigma}{E} \right)^2$
Formula for determining the sample size when estimating the population
proportion:
$n = \dfrac{p(1 - p) \cdot z^2}{E^2}$
Example 10.5
Consider a machine that is filling cans with tomato paste. Experience has shown
that in this process the population of fill masses is normally distributed with
a standard deviation of 0.31 g. The production supervisor wants to collect a
sample just large enough to provide a sample mean within 0.25 g of the true
process mean at a 99% confidence level.
$n = \left( \dfrac{z \cdot \sigma}{E} \right)^2 = \left( \dfrac{2.58 \times 0.31}{0.25} \right)^2 = 10.23 \approx 11$
Activity 10.5
Example 10.6
In a sample of 100 invoices randomly selected from the debtors file, 12 were
found to be incorrect. How large a sample is necessary if we want the percentage
of incorrect invoices to be within 3% of the true population proportion at a 95%
confidence level?
$n = \dfrac{p(1 - p) \cdot z^2}{E^2} = \dfrac{0.12(1 - 0.12)(1.96)^2}{0.03^2} = 450.74 \approx 451$
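Both sample-size formulas can be sketched in Python as follows. The function names are illustrative, and the z values 2.58 (99%) and 1.96 (95%) are assumed to be the values read from the normal table. The result is always rounded up, because a fraction of an observation cannot be sampled.

```python
import math


def sample_size_for_mean(z, sigma, error):
    """n = (z * sigma / E)^2, rounded up to the next whole number."""
    return math.ceil((z * sigma / error) ** 2)


def sample_size_for_proportion(z, p, error):
    """n = p(1 - p) * z^2 / E^2, rounded up to the next whole number."""
    return math.ceil(p * (1 - p) * z ** 2 / error ** 2)


# Example 10.5: sigma = 0.31 g, E = 0.25 g, 99% confidence
n_mean = sample_size_for_mean(2.58, 0.31, 0.25)
# Example 10.6: p = 12/100 = 0.12, E = 0.03, 95% confidence
n_prop = sample_size_for_proportion(1.96, 0.12, 0.03)
print(n_mean, n_prop)
```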
Activity 10.6
TEST YOURSELF 10
11 Hypothesis testing
Methods for making inferences about population parameters fall into one of two
categories. In Unit 10 you studied how to estimate the value of the population
parameter of interest and in this unit you will learn how to test a claim or
hypothesis about a population parameter.
Results are statistically significant if the difference between the sample result
and the statement made in the null hypothesis is unlikely to occur due to chance
alone. It indicates that the sample came from a population with a mean other
than the hypothesised mean.
Steps
Understand the problem
1. Set up the null and alternative hypothesis.
the population, there is always a possibility that the decision could be wrong.
The probability associated with this uncertainty is your level of significance.
Just as we place a level of confidence in the construction of an interval, we can
determine the probability of making errors. The significance level is chosen by
the researcher before the sample data are collected.
• α is the probability of a type I error, which occurs if H0 is rejected when it is true
• a type II error occurs if H0 is not rejected when it is false.
For example, when α = 0.10, there is a 10% chance of rejecting a true H0. You
can decrease the probability of rejecting H0 when it is actually true by lowering
the significance level.
The significance level is the maximum probability of making a type I error
and is denoted by α.
• The purpose of the level of significance is to provide a probability basis for deciding
whether an observed difference between a sample statistic and a hypothesised
parameter is a chance difference or a statistically significant difference, since α is
the probability that the test statistic will fall in the rejection area when H0 is true.
• Usually tests are performed at an α value of 0.01, 0.02, 0.05 or 0.10.
11.1.4 Determine the critical value(s) and identify the rejection region
• The critical value represents the maximum number of standard deviations
that the sample mean or proportion can differ from the hypothesised value
before the null hypothesis (H0) is rejected.
• The critical value separates the area under the curve into two regions: the
non-rejection region and the rejection region. The rejection area(s) falls in the
tail(s) of the distribution.
• The rejection region (or decision rule) is a range of values such that, if the test
statistic falls into that range, we reject the null hypothesis (H0).
Steps
1. Specify the level of significance (α).
2. Decide whether the test is two-tailed, left-tailed or right-tailed.
HA indicates whether to perform a one-tailed test or a two-tailed test.
• If HA: μ ≠ hypothesised value: two-tailed test
• If HA: μ < hypothesised value: left-tailed test
• If HA: μ > hypothesised value: right-tailed test
3. Find the critical value(s). One or two critical values are established on the
horizontal axis of the distribution, which serve as cut-off points between
the non-rejection and rejection areas.
• Critical values are expressed in the same measurement units as the test
statistic (z or t).
• A two-tailed test will have two critical values close to the two tails of the
curve. Each tail contains the fraction α/2 of the sample means farthest from
the hypothesised mean. The critical values (expressed as z or t scores) will
correspond to an area of (0.5 − α/2) from the mean. The critical value in the left
tail will take on a negative sign and in the right tail, a positive sign.
• A one-tailed test will have one critical value placed close to one side of the
curve. This one tail contains the fraction α of the sample means farthest
from the hypothesised mean. The z or t score (from the corresponding table)
will correspond to an area of (0.5 − α) from the mean.
4. Sketch the normal curve. Draw a vertical line at the critical value(s) and
shade the rejection region(s).
5. State the rejection region in words.
Example 11.1
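At α = 0.05:
• Two-tailed: 2.5% in each tail; area from the mean to each critical value = 0.5 − 0.025 = 0.475
• Left-tailed: 5% in the left tail; area from the mean to the critical value = 0.5 − 0.05 = 0.45
• Right-tailed: 5% in the right tail; area from the mean to the critical value = 0.5 − 0.05 = 0.45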
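As a cross-check on table look-ups for the normal distribution, the critical z values can be computed directly. This is only a sketch: the function `critical_z` is hypothetical, and it relies on `statistics.NormalDist` from the Python standard library instead of the printed z table.

```python
from statistics import NormalDist


def critical_z(alpha, tail):
    """Return the critical z value(s) for a 'two', 'left' or 'right' tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-z, z)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)  # negative value
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)  # positive value
    raise ValueError("tail must be 'two', 'left' or 'right'")


for tail in ("two", "left", "right"):
    print(tail, [round(z, 2) for z in critical_z(0.05, tail)])
```

At α = 0.05 this gives about ±1.96 for the two-tailed test and about 1.64 in absolute value for the one-tailed tests.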
Activity 11.1
Find the critical value(s) and rejection region for a two-tailed test, a left-tailed
test and a right-tailed test for an approximately normal distribution at levels of
significance of 1%, 2% and 10%.
Normal z distribution:
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Example 11.2
A machine is set to release 30 g of dried fruit into a box of cereal moving along
the production line. A sample of 36 boxes revealed that the average mass of fruit
inserted was 30.3 g with a standard deviation of 0.5 g. Is the increase in the
amount of fruit inserted significant at the 0.01 level of significance?
1. H0: μ = 30
2. HA: μ > 30 (indication of ‘more than’)
3. α = 0.01
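[Figure: right-tailed test on the normal curve. The rejection region (1% of the area) lies to the right of the critical value z = 2.33; H0 is not rejected to the left of it.]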
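The remaining steps of this example can be sketched in Python, assuming the z formula for a sample mean and the critical value 2.33 shown in the figure. The function name `z_statistic` is illustrative only; the sample standard deviation is used in place of σ because the sample is large (n = 36).

```python
import math


def z_statistic(sample_mean, hypothesised_mean, sd, n):
    """z = (x_bar - mu) / (sd / sqrt(n))."""
    return (sample_mean - hypothesised_mean) / (sd / math.sqrt(n))


# Example 11.2: x_bar = 30.3 g, mu = 30 g, s = 0.5 g, n = 36
z = z_statistic(30.3, 30, 0.5, 36)
critical_value = 2.33  # right-tailed test at alpha = 0.01
reject_h0 = z > critical_value
print(round(z, 2), reject_h0)
```

Under these assumptions the test statistic is about 3.6, which lies in the rejection region, so the increase would be judged significant at the 0.01 level.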
Activity 11.2
A sample of 100 healthy adult males has a systolic blood pressure of 125 mmHg
with a standard deviation of 15. Test at a 2% level of significance whether the
mean systolic blood pressure is different from the generally accepted level of
130 mmHg.
Example 11.3
standard deviation of 2.4 minutes. Is there any significant reduction in the time
at a level of significance of 0.05?
1. H0: μ = 35
HA: μ < 35 (reduction in time is an indication of less than)
2. α = 0.05
The alternative hypothesis uses <, so the test is a one-tail test to the left.
Use the t distribution because σ is unknown and the sample size is small.
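[Figure: left-tailed test on the t curve. The rejection region (5% of the area) lies to the left of the critical value t = −1.782; H0 is not rejected to the right of it.]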
3. To look up the critical t value you need to know the direction of the test and
the α value. Find the α in the one-tail test row of the t table if the test is
one-tailed, and in the two-tail test row if two-tailed. Move down in the chosen
column to the required number of df. (Remember df is (n − 1).) The critical t
value is where the df value corresponds with the α.
6. The sample evidence does suggest that there is a significant reduction of the
process time.
Activity 11.3
From past records we know that the average unbroken sleep periods of patients
with a certain kind of insomnia is 2.8 hours. A new drug is tested on a sample
of 25 patients and this yields an average of three hours of unbroken sleep with
a standard deviation of 0.8 hours. Is there a significant improvement in the
number of hours of unbroken sleep? Test at α = 2.5%.
Example 11.4
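= −1.67
[Figure: two-tailed test on the normal curve. Each tail holds 2.5% of the area (0.5 − 0.025 = 0.475 on each side of the mean); the critical values are z = −1.96 and z = 1.96.]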
Activity 11.4
To determine if new flavours of ice cream must be introduced into the market, a
random sample of 320 people was asked to taste and choose their favourite ice
cream flavour. Of the 320 people surveyed, 58 responded that they preferred the
chocolate flake flavour. If less than 25% of the sample prefers the new flavour,
it will not be used. Test the claim that less than 25% of people prefer chocolate
flake flavour ice cream at the α = 0.01 level of significance.
Steps
(Note: The first three steps and the last step are identical to the classical
approach.)
1. State the null hypothesis, H0, and the alternative hypothesis, HA.
2. Choose the level of significance (α).
3. Determine the type of sampling distribution (normal z or t) and calculate the
test statistic.
4. Find the P value (or area) by looking up your test answer in the z table or t
table.
• For a one-tailed test, use the test statistic and look up the area in the
normal z or t table. This area is the P value.
• For a two-tailed test, use the test statistic and look up the area in the
normal z or t table. Multiply this area by 2 to obtain the P value.
5. Make the decision.
Compare the P value to α to determine whether or not to reject the null
hypothesis.
P value ≤ α: reject H0
P value > α: do not reject H0
• A large test statistic, one that is larger than the critical value, is associated
with a small P value, one that is smaller than α.
• Small P values will result in the rejection of H0 and large P values will result
in failing to reject H0.
• You can conduct the hypothesis test at different levels of significance. If, for
example, the P value of a sample is 0.025, then you reject the null hypothesis
at the 5% level of significance, but not at the 1% level of significance.
• The conclusion may be based upon the P value alone, which is the lowest level
of significance at which H0 can be rejected.
If the P value < 0.01: strong evidence against H0
If the P value is 0.01 to 0.05: some evidence against H0
If the P value > 0.05: insufficient evidence against H0
6. Interpret the decision in the context of the original claim.
Example 11.5
A cell phone operator’s manager believes that customer monthly cell phone bills
average more than R85 per month. To test this claim, a sample of 64 customer
cell phone accounts was randomly selected. The mean of the sample is found to
be R88 and the standard deviation R16.
1. H0: μ = 85
HA: μ > 85
2. α = 0.05
3. Since our hypothesis involves the population mean and the sample size is
more than 30, the test statistic is z.
4. $z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}} = \dfrac{88 - 85}{16 / \sqrt{64}} = 1.5$
Use the Normal table to look up the area for z = 1.5. The area is 0.4332.
HA indicates a one-tailed test to the right of the
distribution.
The P value for z = 1.5 is 0.5 − 0.4332 = 0.0668. This means that the
probability of obtaining a sample whose mean is R88 or more when H0 is
true is 0.0668.
5. Decision: Since 0.0668 is greater than the level of significance, α = 0.05,
we do not reject H0.
6. Conclusion: Based on the sample, there is insufficient evidence that the mean
customer cell phone account is more than R85 per month.
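The P value approach for a z test can be sketched in Python as follows. The helper `p_value_from_z` is hypothetical, and it uses `statistics.NormalDist` in place of the printed normal table, so the result may differ from the table value in the last decimal place.

```python
import math
from statistics import NormalDist


def p_value_from_z(z, tail):
    """Return the P value of a z test statistic for a 'right', 'left' or 'two' tailed test."""
    cdf = NormalDist().cdf  # area to the left of z under the standard normal curve
    if tail == "right":
        return 1 - cdf(z)
    if tail == "left":
        return cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # one-tail area multiplied by 2
    raise ValueError("tail must be 'right', 'left' or 'two'")


# Example 11.5: x_bar = 88, mu = 85, s = 16, n = 64, right-tailed test
z = (88 - 85) / (16 / math.sqrt(64))
p_value = p_value_from_z(z, "right")
alpha = 0.05
print(round(z, 2), round(p_value, 4), "reject H0" if p_value <= alpha else "do not reject H0")
```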
Example 11.6
Example 11.7
The National Road Safety Council claimed that 50% of the accidents that occur
over the Easter weekend would be caused by drunk driving. A sample of 130
accidents over the Easter weekend showed that 70 were caused by drunk driving.
Use these data to test the NSC’s claim at a 0.05 significance level.
1. H0: p = 0.50
HA: p ≠ 0.50
2. α = 0.05
3. Since our hypothesis involves the population proportion and the sample size
is more than 30, the test statistic is z.
4. $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \dfrac{0.54 - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{130}}} = 0.91$
Use the Normal table to look up the area for z = 0.91. The area is 0.3186.
HA indicates a two-tailed test.
The P value for z = 0.91 is: 0.5 − 0.3186 = 0.1814 per tail.
For both tails: 0.1814 × 2 = 0.3628
This means that the probability of obtaining a sample whose proportion differs
from 50% by at least four percentage points when H0 is true is 0.3628.
5. Decision: Since 0.3628 is greater than the level of significance, α = 0.05,
we do not reject H0.
6. Conclusion: Based on the hypothesis, there is no evidence that the proportion
of accidents caused by drunk driving is significantly different from the
claimed 50%.
Example 11.8
The management of a mine wishes to investigate the effect of the four-day work
week on absenteeism. Two random samples of 40 were selected; employees of
group A worked 10-hour days (four-day week) and group B worked eight-hour
days (five-day week). If group A averaged four hours of absenteeism per week
with a standard deviation of 1.2 and group B averaged 4.4 hours of absenteeism
per week with a standard deviation of 1.5, should we conclude that the shorter
work week reduces absenteeism? Set α = 0.05.
1. State the null hypothesis and the alternative hypothesis:
H0: μ1 = μ2
HA: μ1 < μ2 (indication of ‘less than’)
2. Select the level of significance:
α = 0.05
3. Formulate the decision rule:
The alternative hypothesis uses <, so the test is a one-tail test to the left. The
central limit theorem applies and we use the z distribution.
Reject H0 if the z test < −1.64
4. Determine the value of the test statistic:
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{4 - 4.4}{\sqrt{\dfrac{1.2^2}{40} + \dfrac{1.5^2}{40}}} = -1.32$
5. Since −1.32 > −1.64 we do not reject H0 at the 0.05 level of significance.
6. There is insufficient sample evidence to conclude that the shorter work
week reduces absenteeism.
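The test statistic for two large independent samples can be checked with a short Python sketch. The function name `two_sample_z` is illustrative only; the critical value −1.64 is the one used in the example.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z = (x_bar1 - x_bar2) / sqrt(s1^2 / n1 + s2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Example 11.8: group A (four-day week) versus group B (five-day week)
z = two_sample_z(4, 1.2, 40, 4.4, 1.5, 40)
reject_h0 = z < -1.64  # left-tailed test at alpha = 0.05
print(round(z, 2), reject_h0)
```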
Activity 11.5
A report on personal savings of 240 citizens of the Gauteng region showed that
the average annual savings was R9 300 with a standard deviation of R3 600.
The data for a sample of 150 citizens in the Western Cape region showed annual
savings of R8 400 with a standard deviation of R2 100. Test at a 5% significance
level whether there is a significant difference in the annual savings between the
two regions.
Example 11.9
To test whether two training methods perform equally well,
samples of individuals using each of the methods were checked. For the six
individuals from method one, the mean efficiency score was 35 with a standard
deviation of six. For the eight individuals in method two, the mean efficiency
score was 27 with a standard deviation of seven. Set α = 0.01.
1. H0: μ1 = μ2
HA: μ1 ≠ μ2 (indication of two-tailed test)
2. α = 0.01
3. The alternative hypothesis sign is ≠, so the test is a two-tailed test.
Both samples are small, therefore we use the t distribution.
If two samples are used, the number of degrees of freedom will be:
n1 + n2 − 2 = 6 + 8 − 2 = 12
Reject H0 if the t test > 3.055 or if t test < −3.055
4. Test: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
$= \dfrac{35 - 27}{\sqrt{\dfrac{(6 - 1)6^2 + (8 - 1)7^2}{6 + 8 - 2}} \cdot \sqrt{\dfrac{1}{6} + \dfrac{1}{8}}} = 2.24$
5. Since 2.24 falls in the non-rejection region, we do not reject H0 at the 0.01 level
of significance.
6. The sample evidence suggests that there is no significant difference in the
performance of individuals using the two training methods.
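A minimal Python sketch of the pooled two-sample t statistic used in this example. The function name `pooled_t` is hypothetical; the code only computes the test statistic and the degrees of freedom, and the critical value still has to be read from the t table.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for two small independent samples with a pooled variance."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Example 11.9: method one (n = 6) versus method two (n = 8)
t, df = pooled_t(35, 6, 6, 27, 7, 8)
print(round(t, 2), df)
```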
Activity 11.6
The manufacturer of two styles of shoes (A and B) wishes to test the hypothesis
that the average retail price of style A is less than the average price of style B.
A random sample of 12 retailers who stock style A yielded an average price of
Example 11.10
Workers in two different mining groups were asked what they considered to be the
most important problem they have with management. In group A, 200 out of a
random sample of 400 workers felt that a fair adjustment of grievances was the
most important problem. In group B, 60 out of a random sample of 100 workers
felt that this was the most important problem. Would you conclude that these two
groups differed with respect to the proportion of workers who believed that a fair
adjustment of grievances was the most important problem? Set α = 0.1.
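1. H0: pA = pB
HA: pA ≠ pB
2. α = 0.10
3. Reject H0 if the z test > 1.64 or if the z test < −1.64
4. $z = \dfrac{p_1 - p_2}{\sqrt{\hat{p} \cdot \hat{q} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$
where $\hat{p} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{400(0.5) + 100(0.6)}{400 + 100} = 0.52$ and $\hat{q} = 1 - \hat{p} = 0.48$
$z = \dfrac{0.5 - 0.6}{\sqrt{0.52 \times 0.48 \left( \dfrac{1}{400} + \dfrac{1}{100} \right)}} = -1.79$
5. Since −1.79 < −1.64, reject H0.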
6. There is significant evidence to conclude that the two groups differ in their
beliefs.
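The two-proportion test statistic can be sketched in Python as follows. The function name `two_proportion_z` is illustrative only; it pools the two samples to estimate the common proportion, as in the example.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z for the difference between two sample proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Example 11.10: group A 200 of 400, group B 60 of 100
z = two_proportion_z(200, 400, 60, 100)
reject_h0 = abs(z) > 1.64  # two-tailed test at alpha = 0.10
print(round(z, 2), reject_h0)
```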
Activity 11.7
The manufacturer of Munchy Breakfast Bars believes that her product will be
more popular in region Y than in region X, where it is currently being produced
and distributed. In order to check this hypothesis, a random sample was taken
from each region. The sample in region X contained 700 people, 560 of whom
claimed to prefer the taste of Munchy. In region Y, 525 of the 750 people sampled
responded favourably. Based on these results, does it appear that Munchy will be
more popular in region Y than in region X at a 2% significance level?
Two basic tests that will be discussed here are the test for independence of
variables and the goodness-of-fit test.
Steps
1. State H0 and HA: the null hypothesis states that the two variables are
statistically independent. This means that knowledge of the one variable
does not help in predicting the other variable.
H0: the variables are independent (no relationship)
HA: the variables are dependent (there is a relationship)
2. Select the level of significance (α).
3. State the decision rule by defining the rejection region.
You must know the level of significance (α) and the degrees of freedom (df) to
find the critical value from the χ² table in Appendix 3.
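[Figure: chi-square distribution. The non-rejection region lies to the left of the critical value and the rejection region to the right.]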
4. Calculate the value of the chi-square test by substituting cell by cell the
values from the fo and fe table into the formula:
$\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$
• Construct a table with columns showing the fo, fe and χ² value for each
entry.
• The observed frequencies (fo) are obtained from the sample data given in
the contingency table.
• In order to perform the chi-square test, expected frequencies (fe) are needed.
The (fe) for any given cell in the contingency table is the product of the total
of the frequencies observed in that row and the total of the frequencies
observed in that column, divided by the overall size of the sample.
Example 11.11
A random sample of adults was selected from each of four ethnic groups in Cape
Town. They were asked to specify their primary source of news. The results were
as follows:
             Ethnic group
             A    B    C    D   Total
TV          30   20   25   20     95
Radio       25   25   20   20     90
Newspaper   10   10    5   30     55
Total       65   55   50   70    240
Is there a relationship between ethnic groups and the source of news at a 2.5%
level of significance?
1. H0: there is no relationship between ethnic group and source of news
HA: there is a relationship between ethnic group and source of news
2. α = 0.025
3. Reject H0 if the χ² test value > 14.449
(use df = (k − 1)(r − 1) = (3 − 1)(4 − 1) = 6)
4. Test: $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$
fo      fe       (fo − fe)²/fe
30      25.73     0.71
25      24.38     0.02
10      14.90     1.61
20      21.77     0.14
25      20.62     0.93
10      12.60     0.54
25      19.79     1.37
20      18.75     0.08
5       11.46     3.64
20      27.71     2.15
20      26.25     1.49
30      16.04    12.15
240     240      24.83
5. Decision: Because the test statistic falls in the rejection region, the H0 should
be rejected at a 0.025 significance level.
6. Conclusion: There is evidence to suggest that there is a relationship between
ethnic group and source of news.
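The expected frequencies and the chi-square statistic for a contingency table can be sketched in Python as follows. The function name `chi_square_independence` is hypothetical. Because the code does not round the individual cells, the total comes to about 24.82, compared with 24.83 obtained by summing the rounded cell values in the table.

```python
def chi_square_independence(observed):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total  # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Example 11.11: rows are TV, Radio, Newspaper; columns are groups A to D
news = [
    [30, 20, 25, 20],
    [25, 25, 20, 20],
    [10, 10, 5, 30],
]
chi2, df = chi_square_independence(news)
reject_h0 = chi2 > 14.449  # critical value for df = 6 at alpha = 0.025
print(round(chi2, 2), df, reject_h0)
```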
Activity 11.8
Steps
1. State the null and alternative hypotheses:
H0: The population under investigation fits some specified or expected
distribution.
HA: The population does not fit the specified or expected distribution.
2. Select the level of significance: α is the criterion used to formulate the
rejection area for H0.
3. Define the rejection region: to find the critical values from the chi-square distribution
table you need the level of significance (α) and the degrees of freedom:
df = k − 1 where k is the number of possible outcomes in the investigation.
Reject H0 if the χ² test statistic > χ² critical value.
4. Calculate the χ² test statistic: $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$
Where:
fo is the observed frequency from the sample data, and
fe is the expected frequency that is calculated to conform to the null
hypothesis that is being tested.
If the calculated χ² test statistic is zero, it means that the observed frequencies
and expected frequencies are identical, or exactly what we had expected.
5. Make the decision.
6. Interpret the decision.
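The goodness-of-fit steps can be sketched in Python as follows. The data are hypothetical and are not taken from the text: a die is rolled 120 times and H0 states that all six faces are equally likely. The function name is also illustrative, and the critical value 11.070 is the χ² table value for df = 5 at α = 0.05.

```python
def chi_square_goodness_of_fit(observed, expected_proportions):
    """Return (chi2, df) comparing observed counts with the proportions stated in H0."""
    n = sum(observed)
    chi2 = 0.0
    for fo, proportion in zip(observed, expected_proportions):
        fe = n * proportion  # expected frequency under H0
        chi2 += (fo - fe) ** 2 / fe
    return chi2, len(observed) - 1


# Hypothetical data: counts of faces 1 to 6 in 120 rolls of a die
rolls = [25, 17, 15, 23, 24, 16]
chi2, df = chi_square_goodness_of_fit(rolls, [1 / 6] * 6)
reject_h0 = chi2 > 11.070
print(round(chi2, 2), df, reject_h0)
```

With these made-up counts the statistic is about 5.0, which is below the critical value, so H0 would not be rejected.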
Example 11.12
Example 11.13
The respective car manufacturers’ shares of the national market are as follows:
Manufacturer % of market share
Volkswagen 37
Toyota 30
Delta 15
BMW 10
Mercedes 8
5. Decision: Because the test statistic falls in the rejection region, the H0 should
be rejected at a 0.05 significance level.
6. Conclusion: There is evidence to suggest that the pattern in Pretoria differs
from the national pattern.
Activity 11.9
TEST YOURSELF 11
fruit inserted was 30.3 g with a standard deviation of 0.5 g. Is the increase
in the amount of fruit inserted significant at the 0.05 level of significance?
5. A company that makes cola drinks states that the mean caffeine content per
bottle of cola is 40 mg. The quality controller is convinced that it is lower. A
sample of 30 bottles of cola has a mean caffeine content of 39.2 mg with a
standard deviation of 7.5 mg. At α = 0.01, can the quality controller reject
the claim?
6. Hyperactive children are often disruptive in the typical classroom setting
because they find it difficult to remain seated for extended periods of time. The
typical number of ‘out-of-seat’ behaviours was 12.40 per hour. Treatment
was applied to a group of 25 hyperactive children and after treatment the
‘out-of-seat’ behaviours reduced to 11.60 per hour with a standard deviation
of 3.5. Using α = 0.01, can we conclude that this decline is significant?
7. Medical research has shown that repeated wrist extension beyond 20°
increases the risk of wrist and hand injuries. In each of 24 randomly selected
students in the information technology field, wrist extension was recorded
while using a mouse with a proposed new design. The sample mean was found
to be 24° with a standard deviation of 5°. Test the hypothesis that the mean
wrist extension for people using the new mouse design is greater than 20°.
8. You are involved in an environmental awareness programme and want to
test the claim that the mean waste generated by adults is more than 1.8 kg
per day. In a random sample of 15 adults you find that the mean waste
generated per person per day is 1.9 kg with a standard deviation of 0.54 kg.
At a 5% level of significance, is the claim justified?
9. A sample of 16 unflavoured ice cream tubs were selected at random and
subjected to chocolate flavouring. The sample mean time required to flavour
the ice cream was 13 minutes with a standard deviation of 2 minutes. Perform
a hypothesis test at the 1% level of significance to test that the population
mean time required to flavour ice cream is greater than 10 minutes.
10. A chicken producer claims that the average mass of a particular group of
chickens is 1 kg. Before agreeing to purchase, a customer selected a sample of
25 chickens, which yielded a sample mean of 1.12 kg and standard deviation
of 0.1 kg. If the masses can be considered to be normally distributed, should
the claim be rejected at the 1% level of significance?
11. A personnel manager claims that 60% of all single women hired for
secretarial jobs leave to get married within two years. An analysis shows
that of a random sample of 120 single women, 64 left to get married. Is this
evidence consistent with the company’s claim, at a 1% level of significance?
Is there sufficient evidence to conclude that brand B has a longer shelf life
than brand A at a 2% level of significance?
22. In a public opinion survey, 60 out of a sample of 100 high-income voters
and 40 out of a sample of 75 low-income voters supported a decrease in
VAT. Can we conclude at a 5% level of significance that the proportion of
voters favouring a decrease differs between high- and low-income voters?
23. In an Aids awareness programme, it was found that 110 males in a random
sample of 310 males were aware of Aids. In another similar programme, it was
found that 87 women in a random sample of 290 women were aware of Aids. Test
at the 2% level of significance whether the first campaign was more successful.
24. Tests have been carried out on the effects of three fertilisers on sugar cane
growth. Each fertiliser was tried on several different plots of land. Each
value is a number of plots of land.
Fertiliser
A B C
Strong growth 94 124 44
Weak growth 50 96 38
Test for an association between the choice of fertiliser and plant growth at a
1% level of significance.
25. A car manufacturer is interested in predicting purchase patterns for a new
small capacity car they are producing. The car comes in four colours and
the manufacturer wants to relate colour preference to the gender of the
purchaser. Use the following sample data and do the hypotheses tests at a
10% level of significance.
White Green Red Silver
Male 260 240 175 420
Female 130 200 240 340
26. Two different manufacturers supply parts for a production process. Each
part is tested for six possible defects. The following table shows the number
of each type of defect by each supplier:
Defect
Supplier 1 2 3 4 5 6
A 35 10 10 2 5 10
B 45 20 0 10 15 20
Would you conclude that the defect is independent of the supplier, using a
2.5% level of significance?
27. A sales manager has become interested in the number of sales calls made
by each of the employees. He reasoned that if all the employees are working
equally hard, they should make the same number of calls during a set period
of time. In order to investigate this hypothesis, the manager used a sample
of five employees and recorded the number of calls they made during a set
time period:
Employee A B C D E
No of calls 31 62 59 40 58
28. An accountant for a department store knows from past experience that
23% of customers pay cash for their purchases, 35% write cheques and
the remaining 42% use credit cards. A random sample of 200 sales receipts
during a month-end week was examined and the following results were
obtained:
Cash Cheque Credit card Total
Number of customers 37 47 116 200
Are the customers’ payment methods still the same as before? Use α = 0.05.
29. The manager of a local Spar counted the number of customers using the
store’s five checkout lanes during Friday and Saturday of a certain week.
The results were as follows:
Checkout Lane
1 2 3 4 5
Number of customers 160 200 300 120 100
Lane 5 is closed much of the time because it is used during busy times only.
The manager suspects, prior to taking the actual count, that lane 5 will be
used half as often as lanes 1, 2 and 4. Checkout lane 3 is the express lane
and is used by twice as many people as lanes 1, 2 and 4. Test the manager’s
belief that certain lanes are used more than others using α = 0.01.
30. Two companies have recently conducted aggressive advertising campaigns
to maintain or increase their respective market shares for a particular
product. Before the campaigns the market share of company A was 45%,
while company B had a share of 40%. Other competitors accounted for the
remaining 15%. To determine whether these market shares changed after
the campaigns, a market analyst determined the preferences of a random
sample of 200 customers of this product. Of the sample, 100 indicated a
preference for company A’s product, 85 preferred company B’s product and
the remainder preferred another competitor’s product. Conduct a test to
determine, at a 2.5% level of significance, whether the market shares have
changed from the previous levels.
31. A computer store owner feels that 50% of her customers purchase word-
processing programs, 25% purchase spreadsheet programs and 25%
purchase games. A sample of purchases shows the following distribution:
12 calculations
In this unit you revise your calculation skills and how to utilise your calculator.
The purpose of this course is to provide you with the numeracy skills to
understand the basic principles of business calculations and make sound
decisions based on them.
These skills will benefit you in other subjects, in a business career, and even in
the everyday business of living.
5. The number of digits that can be entered into a calculator depends on the
size of the display, normally 10 digits. When the resulting answer exceeds
the display limit, the value is displayed in scientific notation. The display
reads as follows in these cases:
• 1.4 05 is 140 000. The 05 to the right of the value means that the decimal
point must move five places to the right.
• 1.4 −04 is 0.00014. The decimal point is moved four places to the left.
6. The different keys to use for specific calculations will be mentioned when
the operation is dealt with in this unit. It is also recommended that you
keep your calculator’s owner’s manual or user’s guide accessible because
different calculators use different methods, which your trainer may not be
familiar with.
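Programming languages show very large or very small values in the same mantissa-and-exponent form as the calculator display described in point 5. As an illustration only (Python is not part of the original text), the two display examples can be reproduced as follows:

```python
# Scientific notation: a mantissa followed by a power of 10
large = 1.4e5    # 1.4 with the decimal point moved five places to the right
small = 1.4e-4   # 1.4 with the decimal point moved four places to the left

print(f"{140000:.1e}")   # 1.4e+05
print(f"{0.00014:.1e}")  # 1.4e-04
print(large == 140000)   # True
```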
Integers
Fractions
and their interrelations. Negative numbers are indicated to the left of the 0 and
positive numbers to the right of the 0. The number line can extend to infinity
with fractions or decimals between the whole numbers.
−5 −4 −3 −2 −1 0 1 2 3 4 5
Activity 12.1
Division by 0 is undefined.
The three groups of rational numbers are:
1. Integers or whole numbers. For example, 6 is a rational number because it
can be written as 6/1 (both numerator and denominator are whole numbers).
2. Finite or terminating fractions. For example: 1/4 is a rational number
because the decimal expression 0.25 is terminating or finite.
3. Recurring or repeated fractions. For example: 2/3 is a rational number
because the decimal expression 0.666… has a pattern that repeats or can
carry on forever. This recurring decimal can also be written as 0.6̇ (a dot above the recurring digit).
Irrational numbers are real numbers which cannot be expressed as the ratio of
two integers. The decimal value cannot be expressed with either a finite number
Example 12.1
Recurring decimals
√5 = 2.2360679775… is an irrational number because it cannot be written as a/b
and this never-ending string of digits will not form a pattern that continues to
repeat itself.
Activity 12.2
2. 5/7
3. 3/13
4. 13/19
Note: The distinction between rational and irrational numbers is of very little
significance as far as practical applications are concerned. This is due to the fact
that any irrational number can be approximated (rounded) to any desired degree
of accuracy by means of a rational number.
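The difference between rational and irrational numbers, and the approximation mentioned in the note, can be illustrated with Python's `fractions` module. This is a sketch only; the denominator limit of 1 000 is an arbitrary choice made for the illustration.

```python
import math
from fractions import Fraction

# Rational numbers: a ratio of two integers with a terminating or recurring decimal
quarter = Fraction(1, 4)
two_thirds = Fraction(2, 3)
print(float(quarter))      # 0.25 (terminating)
print(float(two_thirds))   # 0.6666666666666666 (recurring, cut off by the display)

# Irrational number: the square root of 5 can only be approximated by a ratio
approx = Fraction(math.sqrt(5)).limit_denominator(1000)
print(approx, float(approx), math.sqrt(5))
```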
A mixed number, such as 1 1/2, 1 6/7 or 5 3/4, is a whole number with a fraction.
Any mixed number can be turned into an improper fraction.
To change a fraction into a decimal, divide the numerator by the denominator.
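Both conversions can be sketched in Python. The helper `mixed_to_improper` is hypothetical; it adds the whole number to the fraction part and lets the `fractions` module produce the improper fraction.

```python
from fractions import Fraction


def mixed_to_improper(whole, numerator, denominator):
    """Convert a mixed number such as 1 1/2 into an improper fraction."""
    return whole + Fraction(numerator, denominator)


one_and_a_half = mixed_to_improper(1, 1, 2)
five_and_three_quarters = mixed_to_improper(5, 3, 4)
print(one_and_a_half)                   # 3/2
print(five_and_three_quarters)          # 23/4
# Fraction to decimal: divide the numerator by the denominator
print(float(five_and_three_quarters))   # 5.75
```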
Activity 12.3
In the table below tick (✓) in the column(s) that correctly describe the number as
real, irrational, rational, integer, whole or natural. Use a calculator to help you.
Real number   Irrational number   Rational number   Natural number   Whole number
5
54 876
216
8
5
1.25
( 12 )
3
1
7
3
12.3 Common notation
Mathematical ‘shorthand’, or symbols, is often used in analysing and presenting
results rather than descriptive text.
Arithmetic symbols
+ add                − subtract
× multiply           ÷ or / divide
< less than          ≤ less than or equal to
> more than          ≥ more than or equal to
= equal to           ≠ not equal to
± plus/minus         Σ sum of
≈ rounded as         n! factorial
12.4 Basic operations
Note: To change the order of priority, brackets or the calculator can be used. Note
that the multiplication symbol ‘×’ is frequently omitted in some expressions. For
example: 6 × (5 − 2) will normally be shown as 6(5 − 2).
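The effect of brackets on the order of priority can be seen in a short Python sketch. Only the first expression comes from the note above; the others are added here for comparison. Unlike written notation, Python requires the multiplication sign to be typed explicitly.

```python
with_brackets = 6 * (5 - 2)      # brackets first: 6 * 3
without_brackets = 6 * 5 - 2     # multiplication first: 30 - 2
sum_then_product = (2 + 3) * 4   # brackets first: 5 * 4
product_then_sum = 2 + 3 * 4     # multiplication first: 2 + 12

print(with_brackets, without_brackets)        # 18 28
print(sum_then_product, product_then_sum)     # 20 14
```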
Activity 12.4
be achieved only through practice. Adding is to add more (or calculate the sum
of) and subtracting is to take away (or calculate the difference).
Activity 12.5
Note: The order in which you multiply and divide does not matter, but, if the
calculation includes addition and subtraction, you must first calculate values
inside brackets, or multiply and divide, before you add and subtract.
Note: An alternative to the multiplication sign (×) is the multiplication point (·),
a point that is set above the line and not to be confused with the decimal point.
An alternative to the division sign (÷) is the right oblique (/), as used in
writing fractions.
Example 12.2
1. You have four boxes with 24 bars of soap in each and you want to know how
many bars of soap you have in total:
4 × 24 = 96
2. You must pack 100 items in boxes containing five items each and you want
to know how many boxes you need:
100 ÷ 5 = 20 boxes
Activity 12.6
If each trip takes 2 hours and 15 minutes, how many hours will you take to
move the clothing? How many tonnes will you take on each trip?
8. You want to put a carpet in the staff room. You have to pay R55 per m². The
room is 11 m long and 6 m wide. How many square metres of carpet do you
need? How much will it cost you?
Note: Use the (−) or ± key on the calculator to change the sign of a number.
Activity 12.7
1. (+3) + (−2) =
2. (−2) − (+3) =
3. (−4) + (−7) =
4. 0 − (−3a) =
5. (−3) + (+1) =
6. (+5) × (+1) =
7. (−xy)(−1) =
8. (−12xy) ÷ (−4) =
9. (+2t)(−t) =
10. (−5p) × (+2p) =
Note: Use the power key x^y or ^ on the calculator to calculate the answer. Enter a
value for x, press the power key, and then enter a value for y.
2 x^y 4 = 16
12.7 Square roots (√)
The square root is the inverse of squaring. The root of a number is that quantity
which, when multiplied by itself, equals the number.
For example: √25 = 5 (read as the square root of 25 equals 5) and 5² = 25.
Other roots – the third, fourth etc – are possible, and all roots can also be written
as fractional exponents. To convert a root to its exponential form, the index of the
root is first converted to its reciprocal and the quantity is then raised to that
reciprocal power.
For example: ∛16 = 16^(1/3) or ∜5 = 5^(1/4) or √25 = 25^(1/2).
Activity 12.8
3. ∛65 =
12.8 Logarithms (log)
The logarithm of a number is the power (exponent) to which a base number
must be raised to produce that number.
The common logarithm of a number is the exponent of 10, which equates to
that number. For example: the log of 1 000 is 3 because 10³ = 1 000. All numbers
can be converted to a common base of 10. When the base is not indicated, it is
understood to be 10.
Logarithms which use e = 2.718… as their base are called natural logarithms
and can be denoted by ln x. The anti-logarithm of the natural log ln is e^x.
Note: To calculate a log on the calculator press the log key and then the value.
On some calculators the value is entered first, followed by the log key. To obtain
the e^x value press the e^x key followed by the exponent. On some calculators the
exponent is entered first, then the e^x key.
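The calculator steps above can also be checked in software. The following is a minimal sketch using Python's standard math module; the value 5.0 is an arbitrary illustration and is not taken from the text.

```python
import math

# Common logarithm: the exponent to which 10 must be raised.
common_log = math.log10(1000)       # 10**3 == 1000, so this is 3
print(common_log)

# Natural logarithm uses e = 2.718... as its base.
x = 5.0                             # arbitrary illustrative value
ln_x = math.log(x)

# The anti-logarithm of ln is e**x, so exp() undoes log().
recovered = math.exp(ln_x)
print(math.isclose(recovered, x))
```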
Activity 12.9
Notes:
• The symbol n! has no meaning if n represents anything other than a positive
whole number or zero. ((−3)! is undefined)
• The value of 0! is defined to be 1. (0! = 1)
• To obtain the factorial value from the calculator, enter the value followed by
the x! or n! key.
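As a hedged illustration of the notes above, Python's standard math.factorial behaves the same way as the calculator key: it accepts zero and positive whole numbers and rejects negative values. The inputs 5 and −3 are chosen only for illustration.

```python
import math

five_factorial = math.factorial(5)   # 5 x 4 x 3 x 2 x 1
zero_factorial = math.factorial(0)   # defined to be 1
print(five_factorial, zero_factorial)

# n! is undefined for negative n; Python signals this with a ValueError.
negative_is_undefined = False
try:
    math.factorial(-3)
except ValueError:
    negative_is_undefined = True
print(negative_is_undefined)
```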
Activity 12.10
2. 13!/10! =
3. 20!/(8!12!) =
Thus we write $x_1 + x_2 + x_3 + x_4 + \dots + x_n$ as $\sum_{i=1}^{n} x_i$
This means the sum of all the x values from 1 through n. This index system
must be used whenever only part of the available information is to be used. In
statistics, however, we will usually use all the available information and the
notation will be adjusted by doing away with the index system.
$\sum_{i=1}^{n} x_i$ will become Σx if all the data is used.
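The difference between summing all the data (Σx) and summing only part of it with an index can be sketched in Python. The data values below are hypothetical and serve only to illustrate the notation.

```python
# Hypothetical data set x_1 ... x_5 (not from the text).
x = [4, 7, 2, 9, 5]

# Sum over all the data: the index can be dropped and we simply write Σx.
total = sum(x)

# Sum over only part of the data: the index is needed, here i = 1 to 3.
# Python lists start at position 0, so x_1..x_3 are x[0]..x[2].
partial = sum(x[0:3])

print(total, partial)
```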
Activity 12.11
12.11 Fractions
A fraction is a number that can represent part of a whole and is denoted by:
a/b, where a and b are integers and b ≠ 0.
• The numerator (a) at the top tells you how many parts of the whole are
actually used.
• The denominator (b) at the bottom tells you how many parts the whole has
been divided into. If the numerator is larger than the denominator, we have
an improper fraction. If the numerator is smaller than the denominator, we
have a proper fraction.
• A horizontal line drawn between the numerator and denominator separates
them. This horizontal line indicates that the numerator value must be divided
by the denominator value to find a single numerical result. A decimal can
thus be found by dividing a numerator by a denominator.
• Converting the fraction to a decimal before performing the arithmetic
operation can save computational time.
Example 12.3
The fraction 3/4 indicates 3 is to be divided by 4 as follows:
3 ÷ 4 = 0.75 or 3/4 = 0.75
• Division by 0 is mathematically undefined, whereas a fraction with a numerator of 0
and a non-zero denominator will always equal 0.
• When multiplying or dividing both the numerator and the denominator by
the same value, the fraction value does not change.
{ 5
12 .
}
12
5 5
• A fraction can be reduced to its lowest terms by dividing both the numerator
and denominator by a common factor.
(5 ÷ 5)/(10 ÷ 5) = 1/2
• To add or subtract fractions with the same denominator add or subtract the
numerators and write the sum over the common denominator.
1/5 + 3/5 = 4/5
{ }
5
5 20
12
3 43 5
12 3 36
4
Note: Fraction calculations can be done on the calculator if the fraction key ab/c
is available.
Activity 12.12
b) 5/20 − 1/5 =
c) 2/6 × 3/9 =
d) 5/24 ÷ 1/6 =
Example 12.4
Place values, from left to right, for the number 5 432 364.310875:
Millions, Hundred thousands, Ten thousands, Thousands, Hundreds, Tens, Units,
Decimal point, Tenths, Hundredths, Thousandths, Ten thousandths,
Hundred thousandths, Millionths
There are 5 millions (5 000 000), 4 hundred thousands (400 000), 3 ten
thousands (30 000), 2 thousands (2 000), 3 hundreds (300), 6 tens (60),
4 units (4), 3 tenths (0.3), 1 hundredth (0.01), 0 thousandths (0.000), 8
ten thousandths (0.0008), 7 hundred thousandths (0.00007), 5 millionths
(0.000005).
Activity 12.13
Example 12.5
0 0 1 3 . 4 3 5
0 thousands plus 0 hundreds plus 1 ten plus 3 units plus 4/10 plus 3/100 plus 5/1 000
= 13 and 435/1 000
or 13.435
Note: The two zeros in the example representing the first two cells are usually
not indicated as leading numbers. They have simply been put into the example
to indicate the positions that the different numbers take up in the value 13.435
from a decimal perspective.
Activity 12.14
Scientific notation:
1. 76 591 = 7.6591 × 10⁴
2. 7 000 000 = 7 × 10⁶
3. 0.0238 = 2.38 × 10⁻²
4. 0.00006 =
5. 891 245 =
6. 6 million =
Activity 12.15
may not, however, be more decimals to the right of the decimal point than in
the values being processed – that means you can’t be more accurate than the
least accurate value of the given data. To correct such an instance requires
a ‘rounding off’ to the appropriate number of decimal places, for example
1.2 × 0.54 = 0.648 ≈ 0.6 (the least accurate value in the original data is 1.2;
therefore the answer must be rounded to the nearest tenth).
Example 12.6
Activity 12.16
1. A company reports a profit figure last year of R1 078 245.67. Round this
figure to the:
a) nearest million
b) nearest thousand
c) nearest unit
d) nearest tenth
2. Round 539.345 to the nearest hundredth.
3. Round 4.2355 to the nearest thousandth.
4. Round 5.009 to the nearest hundredth.
Example 12.7
Activity 12.17
Example 12.8
Activity 12.18
Example 12.9
Activity 12.19
Activity 12.20
13 ratios
This unit deals with applying the concept of percentage and ratio calculations
in business.
‘Percentage’ is derived from the Latin per centum meaning ‘per hundred’ and uses
the symbol %. It is a universal basis for comparison in which a value is expressed as
a number of parts out of 100. The basis of comparison is therefore
always 100. Once a value is expressed in terms of its portion of 100, the result is
indicated as a percentage by adding the percentage sign ‘%’ after the result.
Example 13.1
75% = 75/100 = 0.75
Activity 13.1
Example 13.2
1. Express 6/20 as a percentage:
6/20 × 100 = 30%
Activity 13.2
1. Express 1/20 as a percentage.
2. Express 0.033 as a percentage.
Example 13.3
Calculate 5% of R200:
5/100 × R200 = R10
Activity 13.3
Example 13.4
What % is 8 of 18?
8/18 × 100 = 44.44%
Activity 13.4
Example 13.5
90/14 × 100 = 642.86
Activity 13.5
If you receive R4 600 simple interest on an investment earning 9%, how much
did you invest?
• percentage decrease = (decrease ÷ base value) × 100
Example 13.6
The daily sales have increased from R5 000 per day to R5 500 per day – that is a
difference of R500. The percentage rate increase is:
500/5 000 × 100 = 10%
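The percentage increase and decrease calculations can be sketched as one small Python function. The function name percentage_change is an illustrative choice, not something from the text; a positive result is an increase and a negative result a decrease.

```python
def percentage_change(base_value, new_value):
    """Change from base_value to new_value, as a percentage of base_value."""
    if base_value == 0:
        raise ValueError("the base value must not be zero")
    # Multiply before dividing so that simple cases stay exact.
    return (new_value - base_value) * 100 / base_value

# Daily sales rising from R5 000 to R5 500 (the worked example above).
increase = percentage_change(5000, 5500)
print(f"{increase:.2f}%")
```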
Activity 13.6
When the tenants decided to vacate the premises they expected to receive their
entire deposit of R2 500, but instead received only R2 000. What percent of
their deposit was kept by the landlord?
Example 13.7
1. If you earn R500 a week and you have 15% payroll deductions, how much
do you take home per week?
Your deductions are: 15/100 × 500 = R75
You take home: R500 − R75 = R425
Alternatively, you can say if the deductions are 15%, you take home 85%
of R500.
85/100 × 500 = R425
2. This month’s sales exceeded those of last month by 12%. If last month’s
sales were R26 521, calculate this month’s sales.
Percentage amount with last month’s sales as base
= 12/100 × 26 521 = R3 182.52
Alternatively we can say that if last month’s sales are the base (100%), then
this month’s sales would be 112% (that is 100% + 12%).
This month’s amount = 112/100 × 26 521 = R29 703.52
Activity 13.7
1. The total mass of a packaged article is 25% greater than its nett mass of
6 kg.
Determine the total packaged mass.
2. If an article weighs 9 kg after it is packaged, and the increase in mass is
20%, how much did the article weigh before packaging?
Example 13.8
1. In comparing two drums of paint, one with a volume of 20 litres and the
other a volume of 50 litres, we can say that the ratio between the two drums
is 20 : 50 or 2 : 5. That means that the small drum has 2/5 the volume of the
big drum, which is also 2/5 × 100 = 40%. Alternatively we can say that the
volume of the big drum is 5/2 = 2.5 times that of the small drum, or 2.5 ×
100 = 250% of it.
2. If examinations normally result in a failure rate of 7 per 200 students,
the number of failures that can be expected if 800 students write the
examination is:
7/200 × 800 = 28 students
Activity 13.8
much from training, and the rest from tournaments. If his total income for
the year is R340 000, how much did he get from each source?
13.3 Business applications
Although percentages have many applications in many disciplines, in the
manufacturing, retail and wholesale environment they are usually applied to
pricing of goods or services, to determine final prices after adding profit margins
or allowing for discounts. Percentages can also be used in stock control levels.
The cost price is the price a wholesaler or retailer paid for a product or service
excluding the VAT, or the cost to the manufacturer of manufacturing the product
from scratch.
The selling price is the price for which a product or service is sold.
Example 13.9
ABS Manufacturers produce classroom desks used in schools. The cost price of a
desk is R120.00 and ABS adds a mark-up of 40% to this price to determine the
selling price per unit.
The selling price per unit is therefore: R120.00 + (40/100 × 120) = R168.00
ABS will now sell these desks to Tablecor (Pty) Ltd at R168.00 + VAT per desk.
A desk will cost Tablecor R168.00 + (14/100 × 168) = R191.52
Assume that Tablecor decides to add its own profit or mark-up. Tablecor now
sells the desks to the Education Department for use in schools. The cost price of a
desk for Tablecor is R168.00 (because they reclaim the VAT that they have paid
as input VAT). Tablecor uses a mark-up percentage of 20%.
The price at which Tablecor will then sell each desk is:
R168.00 + (20/100 × 168) = R201.60, excluding VAT.
Adding VAT to the selling price means that the final consumer will pay:
R201.60 + (14/100 × 201.60) = R229.82 per desk.
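The chain of mark-ups and VAT in this example can be traced with a short Python sketch. The helper name add_percentage and the constant VAT_RATE are illustrative; the 14% rate, the 40% and 20% mark-ups and the R120.00 cost price are the figures used in the example.

```python
VAT_RATE = 0.14  # the 14% VAT rate used in the example

def add_percentage(amount, rate):
    """Increase amount by the given rate (0.40 means 40%)."""
    return amount * (1 + rate)

abs_selling = add_percentage(120.00, 0.40)             # ABS mark-up of 40%
tablecor_pays = add_percentage(abs_selling, VAT_RATE)  # including input VAT
tablecor_selling = add_percentage(abs_selling, 0.20)   # Tablecor mark-up of 20%
consumer_pays = add_percentage(tablecor_selling, VAT_RATE)

for label, value in [("ABS selling price", abs_selling),
                     ("Tablecor pays incl. VAT", tablecor_pays),
                     ("Tablecor selling price", tablecor_selling),
                     ("Final consumer pays", consumer_pays)]:
    print(f"{label}: R{value:.2f}")
```

Note that Tablecor's mark-up is applied to R168.00 and not to R191.52, because the input VAT is reclaimed.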
Activity 13.9
Example 13.10
A carpenter is eligible for a 15% trade discount on all purchases from a wholesaler.
If R2 400 was the total list price of goods purchased, how much did the goods
cost the carpenter?
R2 400 − (R2 400 × 15/100) = R2 040
Activity 13.10
Example 13.11
ABC Stores sells 20 cases of soft drinks to a customer. ABC applies the VAT
excluded method to charge VAT. That means that VAT is not part of the marked
price, but is added on at the end of the invoice. The price per case is R120.00 and
there are no discounts.
1. How much does the purchase cost excluding VAT?
20 cases × R120.00 per case = R2 400.00
2. How much VAT is charged on this purchase?
R2 400 purchase amount × 14% VAT = R2 400 × 14/100 = R336.00
3. How much must the customer pay ABC Stores?
R2 400 purchase amount + R336 VAT = R2 736.00
Example 13.12
Smart Stores advertises garden furniture sets at R3 200.00 per set (VAT inclusive).
1. What is the price of a set exclusive of VAT?
The price quoted includes VAT and therefore equates to 100% + 14% = 114%
Price without VAT = 3 200/114 × 100 = R2 807.02
2. How much VAT is charged per set?
VAT charged per set = R3 200 − R2 807.02 = R392.98
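Working back from a VAT-inclusive price can be sketched in Python as follows. The function name split_vat_inclusive is hypothetical, and the default rate of 14% is the one used in this example.

```python
def split_vat_inclusive(price_incl, vat_rate=0.14):
    """Split a VAT-inclusive price into (price excluding VAT, VAT amount)."""
    # The inclusive price represents 100% + VAT%, so divide by (1 + rate).
    price_excl = price_incl / (1 + vat_rate)
    return price_excl, price_incl - price_excl

price_excl, vat_amount = split_vat_inclusive(3200.00)
print(f"Price without VAT: R{price_excl:.2f}")
print(f"VAT per set: R{vat_amount:.2f}")
```

A common mistake is to take 14% of the inclusive price (R448.00), which overstates the VAT because the base must be the exclusive price.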
Example 13.13
Sarah purchases a hair dryer from Nice Appliances for R110.00 excluding VAT.
She negotiates a 10% discount for cash payment.
1. What does she pay for the hair dryer before VAT?
100% less 10% discount = 90%
110 × 90/100 = R99.00
2. What is the discount amount?
R110.00 − R99.00 = R11.00
3. How much VAT does she pay?
R99.00 × 14/100 = R13.86
4. What is the total invoice amount?
R99.00 + R13.86 = R112.86
Activity 13.11
TEST YOURSELF 13
1. At the end of 2001 there were 101 stores open in South Africa – 39 in
Gauteng, 33 in Cape Town, 19 in Natal and 10 in the Free State. Find the
percentage of each in relation to the total.
2. Each section in a department store is given a target for the year, with Jack’s
section targeted for an increase of 25% over last year’s results. If last year’s
sales were R1.5 million, what was Jack’s targeted sum?
3. A salesman’s commission makes up 13% of his total weekly income. If his
commission is R948 for a particular week, what is his total income?
Smart City returns four refrigerators because they are damaged and requests
a credit for the returned goods. Freezer has also granted Smart City 10% trade
discount on the order. The discount has already been included in the invoice.
a) How much did a refrigerator originally cost?
b) How much discount did Smart City get per refrigerator?
c) How much VAT is included in the original invoice?
d) How much credit must Smart City get, excluding VAT?
e) By how much must Freezer adjust the VAT charged?
f) How much will Smart City now have to pay Freezer Manufacturers?
19. Smart Stores sells fashion clothing directly to the public. The prices of all
items are inclusive of VAT. The price reflected on the price tag of a garment
is the price to be paid. Agnes purchases three dresses at R160.00 each and
a track suit for R210.00.
a) How much does she have to pay Smart Stores for the purchases?
b) What were the prices of the dresses and track suit before Smart Stores
added the VAT?
c) How much VAT did she pay on the whole transaction?
d) How much will Agnes have to pay if Smart Stores grants her 10%
discount on the dresses and 8% discount on the track suit?
e) How much VAT will Smart Stores add to the transaction if they grant the
discounts above?
20. Eight slabs of chocolate cost R32. Find the cost of three slabs of chocolate.
21. John takes 30 minutes to walk from his home to school at a speed of 4 km/h.
How long will he take if he cycles at 10 km/h?
22. A lecturer takes three hours to mark the books of all the students in her
class. How long will it take three lecturers to mark the same books if they
work at the same pace?
23. It takes three markers 120 hours to mark students’ examination scripts.
Assuming they all work at the same pace, calculate how long it will take if
there are:
a) 6 markers
b) 10 markers
c) 20 markers.
24. If I travel at 50 km/h, I can do a journey in 6 hours. How long will the
same journey take at 40 km/h?
25. A farmer buys enough chicken feed to last 200 chickens for a week. How
long will the same amount of feed last for 350 chickens? (Each chicken eats
the same amount each day.)
14 construction
In this unit we look at solving equations and ways to make this easier.
14.1 Graph construction
A graph shows a picture of the trend or relationship between two variables (x
and y), that is, how one quantity changes with respect to another.
The type of graph to be drawn depends on the type of data, the complexity
of the data and the requirements of the user. In this text we deal with the linear
graph only.
Two variable functions are graphed on a set of rectangular coordinate axes.
The plane formed by the coordinate axes is called the Cartesian plane. In order
to set up a Cartesian graph the following steps must be followed:
• Two lines, known as coordinate axes, are drawn at right angles dividing the
plane into four quadrants. The point where the two lines cross is known as the
origin (0). The horizontal line is known as the x axis and the vertical line as
the y axis.
• Indicate units of length or a scale on the two axes (not necessarily the same
for each one). To select a scale determine the maximum and minimum
numbers you will use for each variable and subdivide the axis in multiples
of, for example, 1, 2, 3, 5, 10 and 100 as necessary to accommodate the
maximum and minimum number of each variable. To the right of the y axis,
x is positive and to the left it is negative. Above the x axis, y is positive and
below it is negative.
• All values along the x axis are known as abscissas and are plotted below the
x axis.
• All the values along the y axis are known as ordinates and are shown to the
left of the vertical axis.
• The graph should have a title and both the axes should be labelled.
• Any point in the Cartesian plane is defined by an ordered pair of coordinates
(x, y), with the value of x always given first.
• A mathematical function assigns one value of y to each value of x within its
equation and by arbitrarily selecting values for x a corresponding value for y
can be computed with a resulting set of ordered paired coordinates (x, y).
• Each of these pairs of coordinates corresponds to a point on the Cartesian
plane and if we plot all the points, we obtain the graph of the function.
For example, the coordinate point (1, 4) is exactly 1 unit to the right of the 0
along the horizontal line and 4 units above the 0 on the vertical line.
[Figure: Cartesian plane with the x axis marked from −3 to 3 and the y axis marked from −3 to 7, showing the plotted points (1, 4), (2, 5) and (3, 7).]
Note: The Cartesian plane has four quadrants. While all are mathematically
important, we find that in business the most important of the four is the top
right quadrant, where both x and y have positive values.
Where:
a = y intercept – that is, the point on the y axis where the line will cut
b = slope or gradient.
The slope can be measured between any two points on the line; it is always the
same and can be defined as:
slope = (increase in y) ÷ (increase in x)
You can interpret it as follows: it is the number of units the line rises or falls
vertically (y axis) for each unit of horizontal (x) change from left to right.
When the slope (b) is positive, the line has an increasing trend and when b is
negative, the line has a decreasing trend. In business the slope is seen as the ratio
of change in y to the change in x or the marginal value.
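That the slope is the same between any two points on a straight line can be checked with a short Python sketch. The three points below are hypothetical and lie on an assumed line with intercept 3 and slope 2; they are not taken from the text.

```python
def slope(p1, p2):
    """Slope between two points: increase in y divided by increase in x."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: the slope is undefined")
    return (y2 - y1) / (x2 - x1)

# Hypothetical points on the line with y intercept 3 and slope 2.
points = [(0, 3), (1, 5), (4, 11)]

first_pair = slope(points[0], points[1])
second_pair = slope(points[1], points[2])
print(first_pair, second_pair)   # the same value for both pairs
```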
Some examples of linear functions to measure profitability in business are:
• the linear cost function
• the linear income function
• the linear profit function.
14.2.2 Linear cost function: C(x)
Organisations are concerned about costs because they reflect money flowing out
of the business. These costs are usually to pay for salaries, raw materials, rent,
municipal charges and so forth.
Cost is defined in terms of two components: total variable cost and total fixed
cost. These two components must be added to obtain the total cost. Variable
costs vary with the level of output. The linear cost function is:
C = F + Vx
Where:
C = total cost
V = variable cost per unit
F = fixed cost per period
Example 14.1
The y intercept tells us that the cost of producing zero units is R50 000. This
is the fixed cost. The slope tells us that for each unit that the line moves to the
right, the cost increases by R5.50. Therefore, the cost of producing one extra
unit each time is R5.50 and this is then the marginal cost of the product.
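The interpretation above corresponds to a linear cost function with a fixed cost of R50 000 and a variable cost of R5.50 per unit. A minimal Python sketch, assuming exactly those two figures, shows the intercept and the marginal cost; the output level of 1 000 units is an arbitrary illustration.

```python
FIXED_COST = 50_000.00          # y intercept: cost of producing zero units
VARIABLE_COST_PER_UNIT = 5.50   # slope: cost of each additional unit

def total_cost(units):
    """Linear cost function C = F + Vx."""
    return FIXED_COST + VARIABLE_COST_PER_UNIT * units

cost_at_zero = total_cost(0)
# Marginal cost: extra cost of one more unit, at any level of output.
marginal_cost = total_cost(1001) - total_cost(1000)
print(cost_at_zero, marginal_cost)
```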
Activity 14.1
Example 14.2
A local car rental agent is trying to compete with some of the larger companies
and bought good second-hand cars for his fleet. He also simplified the rental rate
structure by charging a flat R125 per day for the use of the car. The total linear
revenue function is R = 125x.
If a car was rented out for 20 days last month, what was the total revenue for
the car?
R(x = 20) = 125(20) = 2 500 rand
When total revenue exceeds total costs, profit is positive and is referred to as net
gain. When total costs exceed total revenue, profit is negative and it is called net
loss or deficit.
Example 14.3
The price of a single product is R65. Variable costs per unit are R20 for materials
and R27.50 for labour. Annual fixed costs are R100 000. Construct the profit
function and determine the profit if annual sales are 20 000 units.
C(x) = 100 000 + 47.50x
R(x) = 65x
P(x) = 65x − (100 000 + 47.50x)
= −100 000 + 17.50x
P(20 000) = −100 000 + 17.50(20 000) = R250 000
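The profit function of this example, profit = revenue − cost, can be evaluated with a brief Python sketch. The function names are illustrative; the price of R65, the variable cost of R47.50 per unit (R20 materials plus R27.50 labour) and the fixed cost of R100 000 come from the example.

```python
def cost(x):
    """Total cost: fixed cost plus variable cost per unit."""
    return 100_000 + 47.50 * x

def revenue(x):
    """Total revenue at a price of R65 per unit."""
    return 65 * x

def profit(x):
    """Profit = revenue - cost, which simplifies to -100 000 + 17.50x."""
    return revenue(x) - cost(x)

annual_profit = profit(20_000)
print(annual_profit)
```

At zero output the profit equals minus the fixed cost, and each unit sold adds R17.50.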
Steps
1. Construct the total cost function C(x), where x represents the level of output.
2. Construct the total revenue function R(x).
3. Set C(x) = R(x) and solve for x.
Example 14.4
A product is priced at R10 and the variable cost is R6 per unit. If total fixed costs
are R1 000, the break-even point in units of output sold is:
C(x) = 1 000 + 6x
R(x) = 10x
10x = 1 000 + 6x
10x − 6x = 1 000
4x = 1 000
x = 250 units
250 units at R10 each will give a break-even income of R2 500.
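Solving C(x) = R(x) for a linear cost and revenue function always reduces to dividing the fixed cost by the margin per unit. A minimal Python sketch of that shortcut follows; the function name break_even_units is hypothetical, and the figures are those of the example.

```python
def break_even_units(price, variable_cost, fixed_cost):
    """Units at which revenue equals cost: F / (price - variable cost)."""
    margin = price - variable_cost
    if margin <= 0:
        raise ValueError("price must exceed variable cost to break even")
    return fixed_cost / margin

units = break_even_units(price=10, variable_cost=6, fixed_cost=1000)
break_even_income = units * 10
print(units, break_even_income)
```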
TEST YOURSELF 14
a) Determine the number of smoke detectors that must be sold in order for
the company to break even.
b) Determine the break-even value.
c) If marketing research indicated that the firm can expect to sell
approximately 30 000 smoke detectors over the life of the project,
determine expected profits at this level of output.
2. A company produces a product which sells at a price of R25 per unit. Variable
costs are estimated to be R18.75 per unit and fixed costs are R50 000.
a) Determine the break-even level of output.
b) Compute the total cost and total revenue at the break-even point.
c) What will profit be if demand is 7 500 units?
3. A local Gauteng charity organisation is planning a one-week holiday in
Cape Town. The venture is a fund-raising effort. A package deal has been
worked out with a commercial airline whereby the charity will be charged
a fixed cost of R10 000 plus R300 per person. The R300 covers the flight
cost, airport tax, hotel and meals. The organisation is planning to price the
package at R450 per person.
a) Determine the number of persons necessary to break even on the
venture.
b) The goal of the organisation is to net a profit of R10 000. How many
people must participate for the goal to be realised?
UNIT
15 Interest calculations
Interest is the cost of money. When money is borrowed, the cost involved in
using the money is that the borrower will be required to pay back more than was
borrowed. When capital is invested, the cost of money will be the interest the
investor receives in return.
The fact is ‘money earns money’.
15.1 Basic concepts
Interest (I) is the money paid for the use of borrowed money or money earned
when capital is invested.
The capital on which the interest is calculated at the beginning of the
transaction is called the principal (P) or present value.
The rate of interest (r) is that percentage of the principal that is to be paid for
each unit of time and is expressed as a percentage per year.
The time period (t) is the period for which the money is borrowed and is
expressed in years or a fraction of a year.
The amount to be repaid at the end of the term, that is, the principal plus the
interest, is referred to as the amount (A) or future value.
Interest can be calculated on the principal sum as:
• simple interest
• compound interest.
Where:
I = amount of interest
P = principal
A = amount
r = interest rate per annum expressed as a decimal
t = time in years or a portion of a year
Note: Exact interest is calculated on a basis of 365 days per year or 366 in a leap
year. Ordinary interest is calculated on a basis of 360 days per year or 30 days
per month.
Example 15.1
1. Thandi borrows R5 000 from Simon. Thandi must repay the R5 000 before
the end of 12 months and the interest is 15% per year.
How much must Thandi pay Simon after 12 months?
I = Prt
= 5 000(0.15)(1)
= 750
A = P + I
= 5 000 + 750
= R5 750
3. B borrows R500 from A and at the end of eight months pays A an amount
of R525.
What is the simple rate of interest earned?
I = Prt
25 = 500 × r × (8/12)
r = 0.075 = 7.5%
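The simple interest relationship I = Prt used in this example can be rearranged for any one of its terms. The Python sketch below shows two of those arrangements; the function names are illustrative and the figures are those of items 1 and 3.

```python
def simple_interest(principal, rate, years):
    """I = Prt, with the rate as a decimal and the time in years."""
    return principal * rate * years

def simple_rate(principal, interest, years):
    """Rearranged for r: r = I / (Pt)."""
    return interest / (principal * years)

# Item 1: R5 000 at 15% for one year.
interest = simple_interest(5000, 0.15, 1)
amount = 5000 + interest
print(f"I = R{interest:.2f}, A = R{amount:.2f}")

# Item 3: R500 grows to R525 in eight months, so I = 25 and t = 8/12.
rate = simple_rate(500, 525 - 500, 8 / 12)
print(f"r = {rate * 100:.1f}%")
```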
Activity 15.1
15.3 Compound interest
When interest is not paid out at the end of each period but continuously added to
the principal, the principal is continuously increasing and we say the interest is
compounded. This means that interest calculated in period one on the principal
amount is added to the principal amount so that the interest calculated in period
two is calculated on the increased balance.
Interest can be compounded annually (once a year), semi-annually (twice a
year), quarterly (four times a year), monthly (12 times a year), or even daily
(365 times per year). If interest is compounded, the interest rate, quoted as
a yearly rate, should be adjusted to a period rate. The time period, which is
normally quoted in years, should be adjusted to the number of interest periods
per transaction.
• For example, if interest is compounded quarterly, and the time period is five
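$i = \dfrac{\text{annual rate}}{\text{number of interest periods per year}}$
$A = P(1 + i)^n$
$i = \left(\dfrac{A}{P}\right)^{1/n} - 1$
$t = \dfrac{\log(A/P)}{\log(1 + i)}$
Where:
A = amount or future value
P = principal or present value
i = interest rate per period within a year expressed as a decimal
n = total number of interest periods over the term (periods per year × number of years)
t = number of interest periods needed for P to amount to A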
Example 15.2
1. Simon lends R1 000 to Thandi at a rate of 15% per annum calculated
monthly. What is the amount she must repay at the end of two years?
The interest rate of 15% is the interest that is charged for the year.
However, if the interest is to be calculated monthly, then the annual interest
rate (15%) must be converted to a monthly interest rate by dividing by 12:
$i = \dfrac{15\%}{12} = 1.25\%$
The two-year time period should change to n = 12 × 2 = 24
$A = 1\,000(1 + i)^{24} = 1\,000(1.0125)^{24} = \text{R}1\,347.35$
Amount of interest paid:
R1 347.35 − R1 000 = R347.35
money should be invested if the money will earn 8% per year compounded
semi-annually and how much interest will be earned over the period?
i = 4%, n = 30
$A = P(1 + i)^n$
$300\,000 = P\left(1 + \dfrac{4}{100}\right)^{30}$
$P = \dfrac{300\,000}{3.2434} = \text{R}92\,495.53$
I = 300 000 − 92 495.53 = R207 504.47
3. Determine the interest rate on a study loan which would increase its value
from R36 000 to R50 000 in five years if the interest is compounded
monthly.
$i = \left(\dfrac{A}{P}\right)^{1/n} - 1 = \left(\dfrac{50\,000}{36\,000}\right)^{1/60} - 1 = 0.0055$
The monthly rate is 0.0055, so the annual rate is 0.0055 × 12 × 100 = 6.6%
4. How long will it take for R20 to amount to R30 at 5% compounded quarterly?
$t = \dfrac{\log(A/P)}{\log(1 + i)} = \dfrac{\log(30/20)}{\log\left(1 + \dfrac{1.25}{100}\right)} \approx 32.64$ quarters, or about 8.16 years
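The four compound interest calculations in this example (future value, present value, rate and time) can be sketched as four small Python functions. The function names are illustrative, not from the text. Because the code does not round intermediate factors, results can differ by a few cents from hand calculations that use rounded factors such as 3.2434.

```python
import math

def future_value(principal, i, n):
    """A = P(1 + i)^n, with i the rate per period and n the number of periods."""
    return principal * (1 + i) ** n

def present_value(amount, i, n):
    """P = A / (1 + i)^n."""
    return amount / (1 + i) ** n

def periodic_rate(principal, amount, n):
    """i = (A/P)^(1/n) - 1."""
    return (amount / principal) ** (1 / n) - 1

def periods_needed(principal, amount, i):
    """t = log(A/P) / log(1 + i), measured in interest periods."""
    return math.log(amount / principal) / math.log(1 + i)

print(f"{future_value(1000, 0.15 / 12, 24):.2f}")     # item 1
print(f"{present_value(300_000, 0.04, 30):.2f}")      # item 2
print(f"{periodic_rate(36_000, 50_000, 60):.4f}")     # item 3, per month
print(f"{periods_needed(20, 30, 0.05 / 4):.2f}")      # item 4, in quarters
```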
Activity 15.2
1. Find the present value of R2 000 due in 18 months if money is worth 11%
compounded semi-annually.
2. R800 is invested in an account which earns 10% compounded quarterly.
Calculate the amount in the account at the end of five years and how much
interest will be earned.
3. Find the time in which R1 000 will amount to R1 500 at 4% compounded
monthly.
4. Find the rate of interest, compounded quarterly, at which R4 400 will
amount to R8 500 in 16 years.
Example 15.3
Activity 15.3
15.5 Annuities
An annuity is a sequence of equal payments made at equal time intervals, such
as instalment payments, pensions, insurance premiums, home loan payments,
rent, etc. The time between successive payments (R) is called the payment
interval and the time between the first payment and the last payment is called
the term of the annuity. The payment interval and the interest period always
coincide, which means that if the interest is compounded monthly, the payments
will be monthly.
Annuities are classified into two main classes:
• Ordinary annuities certain refer to annuities where the regular payments
are made at the end of each payment interval.
• Annuities due refer to annuities where the periodic payment (R)
falls at the beginning of each payment interval.
$P = R\left[\dfrac{1 - (1 + i)^{-n}}{i}\right]$
Example 15.4
1. Determine the amount of an annuity certain of R150 per month for three
years if money is worth 12% compounded quarterly.
$A = R\left[\dfrac{(1 + i)^n - 1}{i}\right] = 150\left[\dfrac{\left(1 + \frac{3}{100}\right)^{12} - 1}{\frac{3}{100}}\right] = 2\,128.80$ rand
2. A student needs R3 000 a year for books for four years with the first R3 000
available one year from now. If the student can get 8% p.a. return on
investment, how much money should he invest now?
$P = R\left[\dfrac{1 - (1 + i)^{-n}}{i}\right] = 3\,000\left[\dfrac{1 - \left(1 + \frac{8}{100}\right)^{-4}}{\frac{8}{100}}\right] = 9\,936.38$ rand
3. Arthur wants to have R6 000 in the bank in five years’ time. He plans to deposit
the correct amount at the end of each month to achieve this. What should the
value of each monthly payment be if interest is 15% compounded monthly?
$A = R\left[\dfrac{(1 + i)^n - 1}{i}\right]$
$6\,000 = R\left[\dfrac{\left(1 + \frac{1.25}{100}\right)^{60} - 1}{\frac{1.25}{100}}\right]$
$R = \dfrac{6\,000}{88.5745} = 67.74$ rand
4. A family buys a refrigerator that sells for R350. They pay R50 deposit
and the balance in 24 equal monthly payments. If the seller charges 18%
compounded monthly, how much will the monthly payments be?
$P = R\left[\dfrac{1 - (1 + i)^{-n}}{i}\right]$
$300 = R\left[\dfrac{1 - \left(1 + \frac{1.5}{100}\right)^{-24}}{\frac{1.5}{100}}\right]$
$R = \dfrac{300}{20.03} = 14.98$ rand
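The four items of this example use two formulas for payments made at the end of each interval: one for the amount (future value) and one for the present value. A minimal Python sketch of both follows; the function names are illustrative, and dividing a target amount by the factor for a payment of 1 gives the required payment R.

```python
def annuity_future_value(payment, i, n):
    """A = R[(1 + i)^n - 1] / i, payments at the end of each interval."""
    return payment * ((1 + i) ** n - 1) / i

def annuity_present_value(payment, i, n):
    """P = R[1 - (1 + i)^(-n)] / i, payments at the end of each interval."""
    return payment * (1 - (1 + i) ** -n) / i

# Items 1 and 2: amount and present value for a known payment.
print(f"{annuity_future_value(150, 0.03, 12):.2f}")
print(f"{annuity_present_value(3000, 0.08, 4):.2f}")

# Items 3 and 4: solve for the payment by dividing by the factor for R = 1.
monthly_deposit = 6000 / annuity_future_value(1, 0.0125, 60)
monthly_instalment = 300 / annuity_present_value(1, 0.015, 24)
print(f"{monthly_deposit:.2f} {monthly_instalment:.2f}")
```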
Activity 15.4
1. At the end of every month, Julie deposits R500 in a bank account for her son.
How much will be in the account at the end of four years if it accumulates
interest at a rate of 9%?
2. John wants to accumulate R20 000 to purchase a business upon retiring from
his present job. How much must be put aside at the end of every six months for
10 years if the interest he receives is compounded semi-annually at 6%?
3. A television set is bought for R100 deposit and R580 payable at the end of
the next four quarters. What is the equivalent cash price if the rate is 27%
quarterly?
4. Mr Bones wants to buy a house costing R260 000. If he pays R26 000
deposit, how much will his monthly payments be if he gets a 20-year bond
at 14% converted monthly?
$A = R\left[\dfrac{\left((1 + i)^n - 1\right)(1 + i)}{i}\right]$
Example 15.5
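$A = R\left[\dfrac{\left((1 + i)^n - 1\right)(1 + i)}{i}\right] = 200\left[\dfrac{\left(\left(1 + \frac{12}{100}\right)^{10} - 1\right)\left(1 + \frac{12}{100}\right)}{\frac{12}{100}}\right] = 3\,930.92$ rand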
$P = 60\left[\dfrac{\left(1 - \left(1 + \frac{2.5}{100}\right)^{-4}\right)\left(1 + \frac{2.5}{100}\right)}{\frac{2.5}{100}}\right] = 231.36$
3. The beneficiary of a life insurance policy may take R10 000 in cash or
10 equal payments, the first to be made immediately. What is the annual
payment if money is worth 12%?
$10\,000 = R\left[\dfrac{\left(1 - \left(1 + \frac{12}{100}\right)^{-10}\right)\left(1 + \frac{12}{100}\right)}{\frac{12}{100}}\right]$
$R = \dfrac{10\,000}{6.3282} = 1\,580.23$
4. The Bell Company plans to open a new retail outlet in its chain of telephone
equipment stores three years from today. How much must Bell invest at the
beginning of each half-year to have enough for the estimated costs of
R100 000 if the interest rate is 9%?
$100\,000 = R\left[\dfrac{\left(\left(1 + \frac{4.5}{100}\right)^{6} - 1\right)\left(1 + \frac{4.5}{100}\right)}{\frac{4.5}{100}}\right]$
$R = \dfrac{100\,000}{7.0192} = 14\,246.64$
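When payments fall at the beginning of each interval, every payment earns one extra period of interest, so both annuity formulas gain a factor of (1 + i). The Python sketch below applies that factor; the function names are illustrative and the figures are those used in this example.

```python
def annuity_due_future_value(payment, i, n):
    """A = R[(1 + i)^n - 1](1 + i) / i, payments at the start of each interval."""
    return payment * ((1 + i) ** n - 1) * (1 + i) / i

def annuity_due_present_value(payment, i, n):
    """P = R[1 - (1 + i)^(-n)](1 + i) / i, payments at the start of each interval."""
    return payment * (1 - (1 + i) ** -n) * (1 + i) / i

print(f"{annuity_due_future_value(200, 0.12, 10):.2f}")
print(f"{annuity_due_present_value(60, 0.025, 4):.2f}")

# Solve for the payment by dividing the target by the factor for R = 1.
insurance_payment = 10_000 / annuity_due_present_value(1, 0.12, 10)
bell_deposit = 100_000 / annuity_due_future_value(1, 0.045, 6)
print(f"{insurance_payment:.2f} {bell_deposit:.2f}")
```

Results can differ by a cent or two from the worked figures, which divide by factors rounded to four decimal places.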
Activity 15.5
1. The rent of a building is R15 000 per year payable in advance. If the interest
rate is 6% compounded monthly, what will the equivalent monthly rental,
payable in advance, be?
2. Mr Cute bought a car paying R2 000 deposit and R200 at the beginning of
each week for two years. If the interest rate is 9% compounded weekly, what
was the cash price of the car?
3. A school sets aside R10 000 at the beginning of each year to create a fund in
case of further expansion. If the fund earns 5%, how much does it amount
to at the end of the seventh year?
4. A debt of R5 000, inclusive of 5% interest compounded quarterly, is to be
settled within three years in equal quarterly payments. If the first payment
is due today, what will be the size of each payment?
TEST YOURSELF 15
1. Using the simple interest approach, how much interest will you pay on a
loan of R15 000 at 12.5% interest per annum? How much must you repay
after three years and six months?
2. If the amount plus interest that George had to repay at the end of a one-
year loan was R11 000 and the interest rate was 10% per annum, what
was the amount of the principal sum? Use the simple interest formula to
calculate.
3. A waitress who was temporarily pressed for funds pawned her watch and
diamond ring for R55. At the end of one month she redeemed them by
paying R59.40. What was the annual rate of interest?
4. A 26% interest charge on an overdue account of R800 came to R21. How
late was the account?
5. A mechanic borrowed R125 from a cash loan company and at the end of
one month paid off the loan with R128.75. What annual rate of interest
was paid?
6. At what rate will simple interest on R1 127 amount to R318 in 135 days?
7. John invests R200 at 7.75% simple interest per annum and receives R295
after a certain time. For how long was the money invested?
8. Philemon has an option of financing the purchase of a new music centre,
with a price of R800, through a loan for one year. The interest rate on the
loan is 12% per annum. He has an option of taking a loan with interest
calculated quarterly or a loan where the interest is calculated semi-annually.
Which option will you recommend to Philemon? The lender will apply the
compound interest formula.
9. The outstanding amount on your account is R2 650. If the store charges
24% interest compounded monthly, how much will you owe after three
months if no payment was made during that period?
10. A cell phone company will need R500 000 to replace a piece of equipment in
eight years. How much must be invested now at 6% compounded quarterly
to accumulate this amount?
11. If R500 amounts to R700 in five years with interest compounded quarterly,
what is the rate of interest?
12. A cash loan company charges 36% compounded monthly on small loans.
How long will the loan company take to triple its money at this rate?
13. How long will it take R4 000 to amount to R5 000 at 9% compounded
quarterly?
14. What is the effective rate of interest equivalent to 15% converted semi-
annually, quarterly and monthly?
15. What is the nominal rate of interest, compounded semi-annually and
monthly, equivalent to 24% effective?
16. Which gives the better annual return on investment: 4% compounded
quarterly, 4% converted semi-annually or 4% converted monthly?
17. A refrigerator can be bought for R50 deposit and R28 per month for 24
months, payable at the end of each month. What is the equivalent cash
price if the rate is 26%?
18. A company bought a machine costing R8 000 and estimates that its useful
life will be five years, after which it will be sold as scrap for R300. The
company decides to set up a reserve fund to cover the cost of a replacement
machine in five years’ time. Equal amounts are to be invested at the end of
each year in an account that earns 10% compound interest. Due to inflation,
it is estimated that the cost of this machine will be R15 000. How much
must be invested each year to cover the cost of a replacement machine,
allowing for the scrap value of the present one?
19. If money is worth 15% compounded quarterly, what single payment today
is equivalent to 15 quarterly payments of R100 each, the first due three
months from today?
20. A cash loan company charges 36% converted monthly for small loans. What
would be the payment at the end of every month if a loan of R250 is to be
repaid within one year?
21. Mr Smith invests R20 at the end of every week at 18% compounded weekly.
What amount will be in his savings account after six months?
22. A student wants to save R15 000 for a trip after graduation, four years from
now. How much must she save at the end of every six months if she gets
15% compounded semi-annually?
23. Mr T. Bone took out a R100 000 loan on a steakhouse over a 10-year period
at an interest rate of 12% compounded monthly. After 3.5 years, interest
rates climbed to 15% compounded monthly. If his repayments were made
at the end of each month, how much did Mr Bone owe at the end of the first
3.5 years? What was his monthly repayment for the remaining 6.5 years?
24. Instead of taking R5 000 from an inheritance, Peter decides to take
monthly payments for a period of five years, with the first to be made
immediately. If interest is 6% compounded monthly, what will be the size
of each payment?
25. Instead of paying R1 250 rent at the beginning of each month for the next
eight years, Mary decides to buy a flat. Considering interest of 15% to be
compounded monthly, what is the cash equivalent of the eight years’ rent?
26. At the beginning of each semester, Abdul invests R900 at an interest rate
of 7% compounded semi-annually to guarantee a sum sufficient to start a
practice for his daughter, who is entering medical school. If his daughter
finishes within eight years, how much will Abdul have for the practice?
27. Dr Kaye wants to spend five years researching a new book on the motor
industry. He calculated that he needs R12 000 a month to live on over the
five years. How much must Dr Kaye deposit today in an account earning