Statistics For Beginners 2024
1.1. Statistical Definitions
• The number of values in the sample (sample size) is denoted by n. The number of
values in the population (population size) is denoted by N.
Examples
• Nominal scale is a level of measurement which classifies data into categories in
which no order or ranking can be imposed on the data.
• A variable can be treated as nominal when its values represent categories with no
intrinsic ranking.
Examples
➢ Gender
➢ Region of residence
➢ Religious affiliation.
• Ordinal scale – Level of measurement which classifies data into categories that can
be ordered or ranked.
• A variable can be treated as ordinal when its values represent categories with some
intrinsic order or ranking.
Examples
• Discrete variables are variables that can assume a finite or countable number of
possible values. Such variables are usually obtained by counting.
Examples
➢ A person’s response (agree or do not agree) to a statement. A one (1) is recorded when the
person agrees with the statement; a zero (0) is recorded when the person does not agree.
• Continuous variables are variables that can assume any value within a given interval, i.e. an
infinite (uncountable) number of possible values. Such variables are usually obtained by measurement.
Examples
➢ The weight of a person.
• Interval scale is a level of measurement which classifies data that can be ordered and
ranked and where differences are meaningful. However, there is no meaningful zero
and ratios are meaningless.
• Interval data is generated mainly from rating scales, which are used in survey
questionnaires to measure respondents’ attitudes, motivations, preferences and
perceptions.
• Ratio scale is the level of measurement where differences and ratios are meaningful
and there is a natural zero.
• Variables like height, weight, test mark and speed are ratio variables.
Examples
1.3. Sampling methods
• Sampling frame (synonyms: "sample frame", "survey frame") – This is the actual set
of units from which a sample is drawn.
Example
Consider a survey aimed at establishing the number of potential customers for a new
service in a certain city. The research team has drawn 1000 numbers at random from a
telephone directory for the city, made 200 calls each day from Monday to Friday from
8am to 5pm and asked some questions.
➢ In this example, the population of interest is all the inhabitants in the city. The
sampling frame includes only those city dwellers that satisfy all the following
conditions:
• The sampling frame in this case clearly differs from the population. For example, it
under-represents people who have no telephone (e.g. the poorest), who have an unlisted
number, who were not at home at the time of the calls (e.g. employed people) or who
do not like to participate in telephone interviews (e.g. busier and more active people).
Such differences between the sampling frame and the population of interest are a major
cause of bias when drawing conclusions based on the sample.
➢ Simple random sampling – Sampling in which each sample of a given size that can
be drawn will have the same chance of being drawn.
Examples
1) The 6 winning numbers (drawn from 49 numbers) in a Lotto draw. Each potential sample
of 6 winning numbers has the same chance of being drawn.
2) Each name in a telephone directory could be numbered sequentially. If the sample size were
to include 2 000 people, then 2 000 numbers could be randomly generated by computer or
numbers could be picked out of a hat. These numbers could then be matched to names in the
telephone directory, thereby providing a list of 2 000 people.
Solutions
Suppose the first 6 random numbers in the table of random numbers are:
10480, 22368, 24130, 42167, 37570, 77921.
The 49 numbers from which the draw is made all involve 2 digits i.e. 01, 02, . . . , 49.
Putting the above numbers from the table of random numbers next to each other in a string of
digits gives: 10 48 02 23 68 24 13 04 21 67 37 57 07 79 21 .
The winning numbers can be selected by taking pairs of digits between 01 and 49, working
either from left to right or from right to left in the above string, and discarding any number
outside this range as well as any repeat.
By working from left to right the winning numbers are: 10, 48, 2, 23, 24 and 13. By working
from right to left the winning numbers are: 21, 7, 37, 4, 13 and 24 (the second 21 is a repeat
and is discarded).
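The pair-by-pair selection in the solution above can be sketched in Python. This is a minimal illustration, assuming the random digits are supplied as strings exactly as they appear in the table; the function name `lotto_from_digits` is hypothetical and not part of the notes.

```python
def lotto_from_digits(groups, count=6, low=1, high=49, reverse=False):
    """Select `count` distinct numbers in [low, high] from random-number groups.

    The groups are joined into one digit string and split into 2-digit pairs;
    pairs outside the range, or already chosen, are discarded.
    """
    digits = "".join(groups)
    pairs = [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]
    if reverse:
        pairs.reverse()  # work through the pairs from right to left
    chosen = []
    for pair in pairs:
        value = int(pair)
        if low <= value <= high and value not in chosen:
            chosen.append(value)
        if len(chosen) == count:
            break
    return chosen


table = ["10480", "22368", "24130", "42167", "37570", "77921"]
print(lotto_from_digits(table))                # [10, 48, 2, 23, 24, 13]
print(lotto_from_digits(table, reverse=True))  # [21, 7, 37, 4, 13, 24]
```

Skipping values already in `chosen` is what removes the repeated 21 when working from right to left.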
• The advantage of simple random sampling is that it is simple and easy to apply
when small populations are involved. However, because every person or item in a
population has to be listed before the corresponding random numbers can be read,
this method is very cumbersome to use for large populations and cannot be used
if no list of the population items is available. It can also be very time consuming
to try and locate every person included in the sample. There is also a possibility
that some of the persons in the sample cannot be contacted at all.
Examples
1) A manufacturer might decide to select every 20th item on a production line to test for
defects and quality. This technique requires the first item to be selected at random as a
starting point for testing and, thereafter, every 20th item is chosen.
2) A market researcher might select every 10th person who enters a particular store, after
selecting a person at random as a starting point; or interview occupants of every 5th house in
a street, after selecting a house at random as a starting point.
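The "every k-th item after a random start" rule in these examples can be sketched as follows. This is a minimal illustration: the function name `systematic_sample` and the production run of 100 items are hypothetical, and the random start is drawn from the first k positions.

```python
import random


def systematic_sample(population, k, rng=random):
    """Take every k-th item, starting at a random position among the first k."""
    start = rng.randrange(k)  # random starting index: 0, 1, ..., k - 1
    return population[start::k]


items = list(range(1, 101))  # hypothetical production run of 100 items
sample = systematic_sample(items, 20)
print(sample)  # five items, each 20 apart; the values depend on the random start
```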
• A general problem with random sampling is that you could, by chance, miss out a
particular group in the sample. However, if you subdivide the population into
groups, and sample from each group, you can make sure the sample is
representative. Some examples of strata commonly used are those according to
province, age and gender. Other strata may be according to religion, academic
ability or marital status.
Example
In a study investigating the expenditure pattern of consumers, they were divided into low,
medium and high income groups.
When sampling is proportional to size (an income group comprises the same percentage of
the sample as of the population) the sample sizes for the strata should be calculated as
follows.
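Proportional allocation gives stratum h the sample size n_h = n × N_h / N, where N_h is the stratum's population size and N is the total. The sketch below uses hypothetical low, medium and high income counts (not figures from the study) and rounds each share to a whole number; in practice the rounded sizes may need a small adjustment so that they still add up to n.

```python
def proportional_allocation(strata_sizes, n):
    """Sample size per stratum, proportional to the stratum's share of N."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}


# Hypothetical population counts for the three income groups
population = {"low": 5000, "medium": 3000, "high": 2000}
print(proportional_allocation(population, 200))
# {'low': 100, 'medium': 60, 'high': 40}
```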
➢ Cluster sampling – this is a sampling method in which you divide a population into
groups (clusters) such as districts or schools, and then randomly select some of these
clusters as your sample.
NB: In stratified random sampling, elements are sampled from within every stratum,
whereas in cluster sampling only the selected clusters are sampled.
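To contrast with stratified sampling, the sketch below picks whole clusters at random and keeps every unit inside them. It is a minimal illustration only; the school names, learner labels and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, m, rng=random):
    """Randomly pick m whole clusters and return every unit in them."""
    chosen = rng.sample(sorted(clusters), m)  # names of the selected clusters
    return {name: clusters[name] for name in chosen}


# Hypothetical schools (clusters) and their learners
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1", "d2"],
}
sample = cluster_sample(schools, 2)
```

Schools that are not selected contribute no learners at all, whereas a stratified sample would draw some learners from every school.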
➢ Convenience sampling.
➢ Quota sampling.
➢ Snowball sampling.
➢ Judgement sampling.
➢ Determine the smallest unit of measurement (um). This is the smallest unit
in which data can be measured.
➢ Determine the range (R). This is the difference between the maximum value
and the minimum value in the data set.
➢ Determine the number of classes (k). We use Sturges’ rule:
$k = 1 + 3.3\log_{10} n$. Always round up to the next whole number.
➢ Determine the class length (l). We divide the range by the number of classes
and round up to the nearest unit of measurement.
➢ Determine the class boundaries (cb), which set the minimum and maximum values
of the observations in each class. For the first class:
▪ Lower class boundary $= x_{min} - \frac{1}{2}(um)$
▪ Add the class length to obtain the upper boundary for the first class,
which will be the lower boundary for the second class, and so on.
➢ Determine the class frequencies. The class frequencies are the number of
observations that belong to each class.
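The steps above can be sketched in Python. This is a minimal illustration, not the notes' own procedure in code: the function name `frequency_table` is hypothetical, the class length is rounded up to a whole number of units of measurement, and an observation is counted in a class when lower boundary ≤ x < upper boundary.

```python
import math


def frequency_table(data, um):
    """Class boundaries and frequencies, following the steps listed above.

    data: list of observations; um: smallest unit of measurement.
    """
    n = len(data)
    x_min, x_max = min(data), max(data)
    R = x_max - x_min                          # range
    k = math.ceil(1 + 3.3 * math.log10(n))     # Sturges' rule, rounded up
    # Class length: R / k rounded UP to a whole number of units of measurement
    # (round() first removes floating-point noise such as 607.9999999)
    units = math.ceil(round(R / um, 9) / k)
    length = units * um
    lower = x_min - um / 2                     # first lower class boundary
    bounds = [round(lower + i * length, 10) for i in range(k + 1)]
    freqs = [sum(lo <= x < hi for x in data)
             for lo, hi in zip(bounds, bounds[1:])]
    return bounds, freqs
```

For data recorded to two decimals, one would call `frequency_table(data, 0.01)`; the returned `bounds` list has one more entry than `freqs`, since consecutive boundaries delimit each class.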
Exercise
4.21 5.55 3.02 5.13 4.77 2.34 3.54 3.20 4.5 6.1
0.38 5.12 6.46 6.19 3.79
Arrange this data into a frequency distribution table (clearly show the class
boundaries and the frequencies)
• Histogram
➢ Class boundaries on the horizontal axis.
➢ Class frequencies on the vertical axis.
➢ Bars are constructed on the horizontal axis from one class boundary to the
next with a height equal to the frequency of the respective class.
• Frequency polygon
• Ogive
• Three characteristics are commonly used to describe the data profile of a variable.
These are:
o measures of location (both central and non-central)
o measures of spread (or dispersion)
o a measure of shape (skewness).
• Location refers to where the data values are concentrated.
• Central location is a representative ‘middle’ value of concentration of the data, while
non-central location measures identify relevant ‘off-centre’ reference points in the
data set (such as quartiles).
• Dispersion refers to the extent to which the data values are spread about the central
location value.
• Finally, skewness identifies the shape (or degree of symmetry) of the data values
about the central location measure.
➢ To illustrate, an electronic goods company has recorded the daily sales (in
rand) over a 12-month trading period.
o The average daily sales are a measure of central location, while the extent
to which daily sales vary around the average daily sales would be a
measure of dispersion. Finally, a measure of skewness would identify
whether any very large or very small daily sales values relative to the
average daily sales have occurred over this period.
• A central location statistic is a single number that gives a sense of the ‘centrality’ of
data values in a sample.
• These statements all refer to a typical or central data value used to represent where the
majority of data values lie. They are called central location statistics.
• Three commonly used central location statistics are: (a) the arithmetic mean (also called
the average), (b) the median (also called the second quartile, the middle quartile or the
50th percentile) and (c) the mode (or modal value).
• All three measures (mean, median and mode) can be used for numeric data, while
only the mode is valid for categorical data.
➢ Raw data: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ are the observations from the data set and
$n$ is the number of observations in the data set.
➢ Grouped data: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i X_i$, where $f_i$ is the frequency of the $i$th class and
$X_1, X_2, \ldots, X_k$ are the class midpoints of the frequency distribution.
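The two mean formulas translate directly into code. A minimal sketch; the function names and the small data sets are hypothetical.

```python
def mean_raw(xs):
    """Arithmetic mean of raw data: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def mean_grouped(freqs, midpoints):
    """Mean of grouped data: (1/n) * sum of f_i * X_i, with n = sum of f_i."""
    n = sum(freqs)
    return sum(f * X for f, X in zip(freqs, midpoints)) / n


print(mean_raw([2, 4, 4, 4, 5, 5, 7, 9]))    # 5.0
print(mean_grouped([2, 3, 5], [10, 20, 30]))  # 23.0
```

Note that for grouped data n is the total frequency, not the number of classes k.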
➢ Raw data :
➢ Grouped data:
o First add the cumulative frequency column and find the median position, $\frac{n}{2}$.
Then use the formula:
$$me = O_{me} + \frac{c\left(\frac{n}{2} - f(<)\right)}{f_{me}}$$
where $O_{me}$ is the lower limit of the median interval, $f_{me}$ is the frequency of the
median interval, $c$ is the class width, $n$ is the number of observations and $f(<)$ is the
cumulative frequency for the class before the median interval.
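The grouped-median steps (cumulative frequencies, locate the median class, then interpolate) can be sketched as below. This is a minimal illustration; the function name `grouped_median` and the three-class example are hypothetical.

```python
def grouped_median(lower_limits, freqs, c):
    """Median of grouped data: O_me + c * (n/2 - f(<)) / f_me."""
    n = sum(freqs)
    position = n / 2
    cum_before = 0                          # f(<): cumulative frequency so far
    for O_me, f_me in zip(lower_limits, freqs):
        if cum_before + f_me >= position:   # first class that reaches n/2
            return O_me + c * (position - cum_before) / f_me
        cum_before += f_me
    raise ValueError("frequencies are empty")


# Hypothetical classes 0-<10, 10-<20, 20-<30 with frequencies 3, 5 and 2
print(grouped_median([0, 10, 20], [3, 5, 2], 10))  # 14.0
```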
• The $i$th percentile is a numerical value that separates the bottom $i$ percent of the values in the
data set from the top $(100 - i)$ percent. For example, the first quartile is referred to as the 25th
percentile as it separates the bottom 25% of the values in the data set from the top
75% of the values.
• Calculating percentiles ($P_i$)
➢ Raw data:
We first arrange the data in ascending order and find the position of $P_i$ using
$\frac{i}{100}(n + 1)$. Suppose that the position is a numerical value $a.b$, with whole part $a$
and decimal part $0.b$.
o The percentile value is then calculated as follows:
$$P_i = a\text{th value} + 0.b\,(\text{value after the } a\text{th value} - a\text{th value})$$
➢ Grouped data:
We find the cumulative frequencies and the $P_i$ position, given by $\frac{i}{100}(n)$, to
identify the $P_i$ class, then use the formula:
$$P_i = O_{p_i} + \frac{c\left(\frac{in}{100} - f(<)\right)}{f_{p_i}}$$
where $O_{p_i}$ is the lower limit of the percentile class, $f_{p_i}$ is the frequency of the
percentile interval, $c$ is the class width and $f(<)$ is the cumulative frequency of the
classes before the percentile class.
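Both percentile methods can be sketched in Python. This is a minimal illustration with hypothetical function names. The clamping at the ends of `percentile_raw` is an added assumption, because the notes do not say what to do when the position (i/100)(n + 1) falls below 1 or beyond n.

```python
def percentile_raw(xs, i):
    """P_i of raw data: position (i/100)(n + 1), then linear interpolation."""
    data = sorted(xs)
    n = len(data)
    pos = i / 100 * (n + 1)
    a = int(pos)          # whole part: the a-th value (counting from 1)
    b = pos - a           # decimal part 0.b
    if a < 1:             # assumption: position before the first value
        return data[0]
    if a >= n:            # assumption: position at or beyond the last value
        return data[-1]
    return data[a - 1] + b * (data[a] - data[a - 1])


def percentile_grouped(lower_limits, freqs, c, i):
    """P_i of grouped data: O_pi + c * (i*n/100 - f(<)) / f_pi."""
    n = sum(freqs)
    position = i * n / 100
    cum_before = 0        # f(<): cumulative frequency of earlier classes
    for O_pi, f_pi in zip(lower_limits, freqs):
        if cum_before + f_pi >= position:
            return O_pi + c * (position - cum_before) / f_pi
        cum_before += f_pi
    raise ValueError("frequencies are empty")


print(percentile_raw(range(1, 11), 25))  # 2.75
```

With i = 50, `percentile_grouped` reproduces the grouped median formula, since in/100 = n/2.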
• The degree to which numerical data spread about the average value is called
variation or spread. There are various measures of dispersion;
the most common ones are:
➢ Range (R)
➢ Interquartile range (IQR)
➢ Standard deviation (s)
➢ Variance ($s^2$)
➢ Coefficient of variation (CV)
• The range is the difference between the highest and the lowest value in the data set, as
discussed earlier. The range is usually calculated only for ungrouped data.
• The interquartile range is the difference between the upper quartile ($Q_3$/$P_{75}$) and the
lower quartile ($Q_1$/$P_{25}$), that is: $IQR = Q_3 - Q_1$. The interquartile range is also
usually calculated for ungrouped data.
• The standard deviation is the most commonly used measure of spread about the mean. It
has the advantage that it uses all the data values in its calculation. It is calculated as
follows:
➢ Raw data:
$$s = \sqrt{\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)}$$
where $x_i$ are the observations in the data set and $\bar{x}$ is the arithmetic mean.
o We can also calculate the standard deviation using a scientific calculator.
➢ Grouped data:
$$s = \sqrt{\frac{1}{n-1}\left(\sum_{i=1}^{k} f_i X_i^2 - n\bar{x}^2\right)}$$
where $f_i$ are the class frequencies and $X_i$ are the class midpoints.
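The two standard deviation formulas can be checked with a short sketch. The function names and the small data sets are hypothetical. The code mirrors the "sum of squares minus n times the squared mean" form used in the notes; for very large values that form can lose precision, and Python's `statistics.stdev` is the safer general-purpose choice for raw data.

```python
import math


def sd_raw(xs):
    """s = sqrt( (sum of x_i^2 - n * xbar^2) / (n - 1) )."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt((sum(x * x for x in xs) - n * xbar ** 2) / (n - 1))


def sd_grouped(freqs, midpoints):
    """s = sqrt( (sum of f_i * X_i^2 - n * xbar^2) / (n - 1) ), with n = sum of f_i."""
    n = sum(freqs)
    xbar = sum(f * X for f, X in zip(freqs, midpoints)) / n
    sum_fx2 = sum(f * X * X for f, X in zip(freqs, midpoints))
    return math.sqrt((sum_fx2 - n * xbar ** 2) / (n - 1))


print(round(sd_raw([2, 4, 4, 4, 5, 5, 7, 9]), 4))  # 2.1381
```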
Symmetrical Distribution
• A histogram is symmetrical if it has a single central peak and mirror-image slopes on
either side of the centre position. Such a histogram is also described as bell-shaped, which
is the shape of the normal distribution.
• If a distribution is symmetrical, all three central location measures (mean, median and
mode) will be equal and therefore any one of them could be chosen to represent the
central location measure for the sample data.
• A histogram is positively skewed (or skewed to the right) when there are a few
extremely large data values (outliers) relative to the other data values in the sample.
• A positively skewed distribution will have a ‘long’ tail to the right.
• The mean is most influenced (‘inflated and distorted’) by the few extremely large data
values and hence will lie furthest to the right of the mode and the median. The
median is therefore preferred as the representative measure of central location in
right-skewed distributions.
• A histogram is negatively skewed (or skewed to the left) when there are a few
extremely small data values (outliers) relative to the other data values in the sample.
• A negatively skewed distribution will have a ‘long’ tail to the left.
• The mean, again, will be most influenced (‘deflated and distorted’) by the few
extremely small data values and hence will lie furthest to the left of the mode and the
median. The median is therefore preferred as the representative measure of central
location in left-skewed distributions.
• From the box plot, it is easy to read the range of the data (between the minimum
and the maximum data values) and the spread of data values between the quartiles
and median. It also highlights the degree of skewness in the data.
➢ On a horizontal number line, construct a box between the lower and upper quartile
numeric positions.
➢ Mark the median inside the box at its numeric value position on the number line.
➢ Draw a horizontal line from the minimum value position to the Q1 position. This is
called the lower whisker.
➢ Draw another horizontal line from the Q3 position to the maximum value position.
This is called the upper whisker.
➢ If the upper whisker and the box are ‘stretched’ at the upper end of the box plot, then
the histogram is positively skewed, with a few extremely large values causing the
skewness.
• An outlier (or extreme value) is any data value that lies either
➢ below a lower limit of 𝑄1 − 1.5(𝑄3 − 𝑄1 )
➢ or above an upper limit of 𝑄3 + 1.5(𝑄3 − 𝑄1 )
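The outlier limits can be computed directly from the quartiles. A minimal sketch with hypothetical function names; the usage line takes Q1 = 43 and Q3 = 54 from the five-number summary in the example that follows.

```python
def outlier_fences(q1, q3):
    """Lower and upper outlier limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(xs, q1, q3):
    """Data values lying below the lower limit or above the upper limit."""
    low, high = outlier_fences(q1, q3)
    return [x for x in xs if x < low or x > high]


print(outlier_fences(43, 54))  # (26.5, 70.5)
```

Since that summary's minimum (33) and maximum (61) both lie inside these limits, its box plot has no outliers.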
Example
Construct a box plot based on the five-number summary table values are: Minimum = 33,
first quartile = 43, Median = 47, third quartile = 54 kWh and Maximum = 61
EXERCISES
3 Consider the data set below:
Summarize the data above using a frequency distribution table (clearly
show the class boundaries and the frequencies). Show all your workings.
4.1.1. mean.
4.1.2. mode.
4.1.8. Hence, or otherwise, comment on the skewness of this data.
5 Repeat all the exercises in Question 4 above using the table below:
The frequency distribution table below shows the heights (in centimetres)
of selected tomato trees in a vegetable farm in Vryheid.
Height (cm)        Frequency
30.5 − < 33.5      15
33.5 − < 36.5       7
36.5 − < 39.5      23
39.5 − < 42.5      30
42.5 − < 45.5      15
45.5 − < 48.5      13
3. BASIC PROBABILITY CONCEPTS
3.1 Introduction
In most cases in real life, decisions are made under conditions of uncertainty. Probability
theory provides the foundation for quantifying and measuring uncertainty. It is used to
estimate the reliability in making inferences from samples to populations, as well as to
quantify the uncertainty of future events. It is therefore necessary to understand the basic
concepts and laws of probability to be able to manage uncertainty.
• A probability is the chance, or likelihood, that a particular event will occur.
Although we cannot foretell what the outcome of any single repetition of the experiment
will be, we must be able to list the set of all possible outcomes of the experiment. In
general, random experiments must be capable, in theory at least, of indefinite repetition.
It must also be possible to observe the outcome of each repetition of the experiment.
• The set of all possible outcomes of a random experiment is called the sample space of
the random experiment.
We usually use the letter S to denote the sample space. Each repetition of the procedure
for the random experiment is called a trial, and gives rise to one and only one of the
possible outcomes.
The following are examples of random experiments and their sample spaces:
• We toss a coin. We can list the set of possible outcomes: S = {heads, tails}. We can
repeat the experiment endlessly, and we can observe the result of every trial.
• A phone number is chosen at random. The number is dialled, and the person who
answers is asked whether he/she is currently watching television. If the telephone is
unanswered after 45 seconds, the outcome, “no reply”, is recorded. The set of
possible outcomes, the sample space, is
positive numbers plus zero — the bulb might not burn at all). The sample space is
thus
S = {t | t ≥ 0}.
• A die is thrown out onto the table. The dots on the upturned face are counted. The
sample space is
S = {1, 2, 3, 4, 5, 6}.
• In a survey of traffic passing a particular point on the N2 northbound, a time period
of one minute is chosen at random, and the number of vehicles that pass the point in
the minute is counted. The possible outcomes are the non-negative integers (zero included);
therefore
S = {0, 1, 2, 3, ...}.
where:
• A = event of a specific type (or with specific properties)
• r = number of outcomes of event A
• n = total number of possible outcomes (the number of outcomes in the sample space)
• P(A) = probability of event A occurring
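With r and n defined as above, the classical rule is P(A) = r/n, which assumes that every outcome in the sample space is equally likely. A minimal sketch using exact fractions; the function name `classical_probability` is hypothetical.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = r / n for equally likely outcomes."""
    r = sum(1 for outcome in sample_space if outcome in event)
    n = len(sample_space)
    return Fraction(r, n)


S = {1, 2, 3, 4, 5, 6}                # one throw of a die
A = {2, 4, 6}                         # event: an even number of dots
print(classical_probability(A, S))    # 1/2
```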
For example, if cash, cheque, debit card or credit card (i.e. k = 4) are the only
possible payment methods (events) for groceries, then for a randomly selected
grocery purchase, the probability that a customer pays by either cash, cheque, debit
card or credit card is: P(𝐴1 = cash) + P(𝐴2 = cheque) + P(𝐴3 = debit card) + P(𝐴4 =
credit card) = 1.
• Complementary probability: If P(A) is the probability of event A occurring, then the
probability of event A not occurring is defined as P(𝐴̅) = 1 − P(A). For example, if
there is a 7% chance that a part is defective, then P(a defective part) = 0.07 and P(not
a defective part) = 1 − 0.07 = 0.93.
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
• The essential feature of the conditional probability is that the sample space is reduced
to the set of outcomes associated with the given prior event B only. The prior
information (i.e. event B) can change the likelihood of event A occurring.
• If two events are mutually exclusive, they cannot occur together in a single trial of a
random experiment. If events A and B are mutually exclusive, then 𝑃(𝐴 ∩ 𝐵) = 0.
The addition rule relates to the union of events. It is used to find the probability of
either event A or event B, or both events occurring simultaneously in a single trial of
a random experiment.
• If two events are not mutually exclusive, they can occur together in a single trial of a
random experiment. Then the probability of either event A or event B or both
occurring in a single trial of a random experiment is defined as:
• Multiplication rule for Statistically Independent events: If two events A and B are
statistically independent (i.e. there is no association between the two events) then the
multiplication rule reduces to the product of the two marginal probabilities only.
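The rules above can be checked by listing every outcome of a small experiment. The sketch below, a minimal illustration with hypothetical events, throws two fair dice and verifies the complement rule, the conditional probability formula, the addition rule for events that are not mutually exclusive, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), and the multiplication rule for independent events.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing two fair dice: 36 equally likely outcomes
S = list(product(range(1, 7), repeat=2))


def P(event):
    """Probability of an event, given as a True/False test on an outcome."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def A(s):
    return s[0] == 6            # first die shows a 6


def B(s):
    return s[0] + s[1] >= 10    # total of at least 10


def C(s):
    return s[1] % 2 == 0        # second die shows an even number


def A_and_B(s):
    return A(s) and B(s)


def A_or_B(s):
    return A(s) or B(s)


p_not_A = 1 - P(A)                                # complementary probability
p_A_given_B = P(A_and_B) / P(B)                   # conditional probability
assert P(A_or_B) == P(A) + P(B) - P(A_and_B)      # addition rule
assert P(lambda s: A(s) and C(s)) == P(A) * P(C)  # A and C are independent
print(p_not_A, p_A_given_B)  # 5/6 1/2
```

Because P(A|B) = 1/2 differs from P(A) = 1/6, knowing B changes the likelihood of A, so A and B are not independent, while A and C are.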
1.1 a male.
2 Determine whether the events that an employee is female and that she is an engineer
are statistically independent.
More Exercises
3. An apple cooperative in Elgin, Western Cape receives and groups apples into
A, B, C and D grades for packaging and export. In a batch of 1 500 apples,
795 were found to be grade A, 410 were grade B, 106 were grade C and the
rest grade D. If an apple is selected at random from the batch, what is the
likelihood that it is of neither grade B nor grade D?
5 Let 𝐴 and 𝐵 be events in a sample space 𝑆 such that 𝑃(𝐴̅ ) = 0.6, 𝑃(𝐵) = 𝑥
and 𝑃(𝐴 ∪ 𝐵) = 0.5.
6 Let 𝐴 and 𝐵 be events in a sample space, 𝑆. Let 𝑃(𝐴) = 0.35, 𝑃(𝐵) = 0.6
and 𝑃(𝐴|𝐵) =0.52.
6.2. 𝑃(𝐴|𝐵̅).
6.3. Are 𝐴 and 𝐵 mutually exclusive? Substantiate your answer.
8. The two-way table below shows the IQ rating as well as the creativity rating of 250
individuals in a psychological study.
8.2. Find the probability that a randomly selected individual from this study
will be classified as:
8.2.2. having a low IQ, given that she has high creativity.
8.3. Let A be an event that a randomly selected individual has a low IQ and let
B be an event that a randomly selected individual has high creativity.
3.4. Bayes Theorem
A very useful tool for finding conditional probabilities is Bayes’ theorem, named in honour of
the Rev. Thomas Bayes, who did pioneering work in probability theory in the 1700s. It connects
P(B|A) with P(A|B):
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A})}$$
Example: The miners are out on strike, with a list of demands. Negotiators reckon that if
management meets one of the demands, the probability that the strike will end is 0.85. But if
this demand is not met, the probability that the strike will end is 0.08. You assess the
probability that management will agree to meet the demand as 0.3. Later you hear that the
strike has ended. What is the probability that the demand was met?
Suppose that $A_1, A_2, \ldots, A_n$ are mutually exclusive events whose union is the sample space $S$
and $P(A_i) > 0$. Then, for any event $B$ with $P(B) > 0$ and any $k \in \{1, 2, \ldots, n\}$, we have
$$P(A_k|B) = \frac{P(B|A_k) \times P(A_k)}{\sum_{i=1}^{n} P(B|A_i) \times P(A_i)}$$
Example
A family has two dogs (Rex and Rover) and a cat called Garfield. None of them is fond of the
postman. If they are outside, the probabilities that Rex, Rover and Garfield will attack the
postman are 30%, 40% and 15%, respectively. Only one is outside at a time, with
probabilities 10%, 20% and 70%, respectively. If the postman is attacked, what is the
probability that Garfield was the culprit?
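The general form of Bayes' theorem is a short computation: multiply each prior P(A_i) by its likelihood P(B|A_i), then divide by the total. A minimal sketch, with a hypothetical function name, applied to the strike and postman examples above.

```python
def bayes(priors, likelihoods):
    """Posterior P(A_k | B) for mutually exclusive, exhaustive events A_k.

    priors[k] = P(A_k) and likelihoods[k] = P(B | A_k).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)          # P(B), the denominator of Bayes' theorem
    return [j / total for j in joint]


# Strike example: A = demand met, A-bar = not met; B = strike ends
print(round(bayes([0.3, 0.7], [0.85, 0.08])[0], 4))             # 0.8199

# Postman example: Rex, Rover, Garfield; B = postman is attacked
print(round(bayes([0.1, 0.2, 0.7], [0.30, 0.40, 0.15])[2], 4))  # 0.4884
```

The same function handles the two-event form, since A and Ā are mutually exclusive and together make up the whole sample space.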
Exercises
1 The probability that a student passes Statistics is 0.8 if he studies for the exam and 0.3
if he does not study. If 60% of the class studied for the exam, and a student chosen at
random from the class passes, what is the probability that he did not study?
2 The probability that a cancer test will detect the disease in a person who has cancer is
0.98. The probability that a person who does not have cancer will give a positive
reading on the test is 0.1 (i.e. the test says he has the disease even though he has not).
If 1 per cent of the population has cancer, what is the probability that a person
selected at random will in fact have cancer, given that he shows a positive reading on
the cancer test?
3 The probability that twins are identical is 0.7. Identical twins are always of the same
sex, while non-identical twins are of the same sex with probability 0.5. What is the
probability that twin boys are identical twins?
4 An assembler of electric fans uses motors from two sources. Company A supplies 90%
of the motors and Company B supplies the other 10% of the motors. Suppose that it is
known that 5% of the motors supplied by Company A are defective and 3% of the
motors supplied by Company B are defective. An assembled fan is found to have a
defective motor. What is the probability that this motor was supplied by Company B?
___________________________________________________________________________
• It is usually impractical to list and to count all the elementary events contained in the
sample space or in the event of interest.
• The theory of combinations and permutations frequently comes to the rescue, and
enables the number of elementary events contained in sample spaces and events to be
determined quite easily. This theory is summarized in a series of “counting rules”
given later.
different orderings of the objects belonging to the set.
• We can see this by thinking in terms of having n slots to fill with the n objects in the
set. Each slot can hold one object.
• We can choose any object for the first slot in n ways; there are then n−1 objects
available for the second slot, so we can select an object for the second slot in n−1
ways, leaving n−2 objects available for the third slot, and so on, until the last remaining
object has to be placed in the final slot.
• We say that there are n! distinct arrangements (technically, we call each arrangement
or ordering a permutation) of the n objects in the set.
Example: If the set A = {1,2,3}, list all the possible permutations. There are 3! = 3×2×1 = 6
distinct arrangements of the objects in A.
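Python's standard `itertools.permutations` lists every arrangement, which gives a quick check of the n! count for the set A = {1, 2, 3}. A minimal sketch:

```python
from itertools import permutations
from math import factorial

A = [1, 2, 3]
arrangements = list(permutations(A))
for arrangement in arrangements:
    print(arrangement)
# prints, one per line: (1, 2, 3) (1, 3, 2) (2, 1, 3) (2, 3, 1) (3, 1, 2) (3, 2, 1)
print(len(arrangements), factorial(len(A)))  # 6 6
```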
• Note that 𝑛𝑃𝑟 can be found directly from the calculator (without using factorial
notation) by pressing n, then SHIFT, then the multiplication sign, followed by r and
then the equals sign.
• Note that we are
(a) choosing r objects and
(b) arranging them.
• We are here involved in two processes, choosing and arranging. The number of ways
of choosing and arranging r objects out of n distinguishable objects is called the
number of permutations of n objects taken r at a time and is denoted by 𝑛𝑃𝑟 (“n
permutation r”). This formula is also valid for r = n if we adopt the convention that
0! = 1.
Example: How many different pictures of three people (a rearrangement of the same people is
considered a different picture) are possible if 10 people are present?
Solution: This is the same as asking for the number of permutations of 10 objects taken 3 at a
time, given by 10P3 = 720.
Example: Suppose that 19 political parties contested an election. How many different ways
can the top four political parties be lined up?
Solution: This is equivalent to asking: “How many permutations of 19 objects taken 4 at a
time are there?” The answer is 19P4 = 93024.
Example: In how many ways can a 9-man work team be formed from 15 men?
Solution: The problem asks only for the number of ways of choosing 9 men out of 15,
which is 15C9 = 5005.
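The worked answers above can be checked with Python's `math.perm` and `math.comb` (available from Python 3.8), which compute nPr = n!/(n − r)! and nCr = n!/(r!(n − r)!) directly. A minimal sketch:

```python
from math import comb, factorial, perm

print(perm(10, 3))   # 720    ordered pictures of 3 people chosen from 10
print(perm(19, 4))   # 93024  ordered top four of 19 parties
print(comb(15, 9))   # 5005   unordered 9-man teams from 15 men

# The same values from the factorial formulas
assert perm(10, 3) == factorial(10) // factorial(7)
assert comb(15, 9) == factorial(15) // (factorial(9) * factorial(6))
```

The only difference between the two is whether order matters: each team of 9 corresponds to 9! different orderings, so 15P9 = 9! × 15C9.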
Exercise: From 8 accountants and 5 computer programmers, in how many ways can one
select a committee of
(a) 5 people.
(b) 3 accountants and 2 computer programmers?
(c) 5 people, subject to the condition that the committee contain at least 2 computer
programmers and at least two accountants.
Example: How many four-digit numbers can be made from the 10 digits from 0 to 9, if
repetitions are permitted?
Solution: We have four slots to fill. But because all of the 10 digits remain available to fill
every slot, this can be done in 10 × 10 × 10 × 10 = 10000 ways. This makes sense, because
there are 10 000 numbers from 0 (actually 0 000) to 9 999.
Example: How many four-letter words can be made with a 26-letter alphabet, including all
nonsense words?
Solution: 26 × 26 × 26 × 26 = 456976
Example: It is proposed to adopt a system of motor car number plates which uses three
letters of the alphabet (excluding I and O) followed by three digits. How many number plates
are possible?
Solution: The number of possible number plates is 24 × 24 × 24 × 10 × 10 × 10 = 13 824 000.
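When repetition is allowed, each slot independently offers the full set of choices, so the counts are products of powers. The sketch below checks the three examples above, including a brute-force enumeration of the four-digit case with `itertools.product`.

```python
from itertools import product

# Four-digit strings 0000 to 9999: 10 choices for each of 4 slots
digits = "0123456789"
print(len(digits) ** 4)                            # 10000
print(sum(1 for _ in product(digits, repeat=4)))   # 10000, by brute force

# Four-letter "words" from a 26-letter alphabet
print(26 ** 4)                                     # 456976

# Number plates: 3 letters (I and O excluded) followed by 3 digits
letters = [chr(c) for c in range(ord("A"), ord("Z") + 1) if chr(c) not in "IO"]
print(len(letters) ** 3 * len(digits) ** 3)        # 13824000
```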
Exercises
1. Determine whether each of the following situations would require calculating
a permutation or combination:
1.1. Selecting a treasurer, president and a vice president from a council of six
members.
1.2. Assigning different visitors’ cars to different parking bays.
1.3. Selecting five students to attend a State of the Nation Address.
2. There are 6 doors to a lecture hall. How many ways can a lecturer enter a
lecture hall through one door and leave a hall through a different door?
3. If seven friends were asked to line up for a group selfie with the owner of the cell
phone in the centre, how many distinguishable photos are possible?
4. Suppose that a lotto player plays a single ticket. What is the probability of getting
all the numbers correct?
5. In how many different ways can the letters of the word ‘SURVEYING’
be arranged if:
8. All telephone numbers at MUT start with 031907 followed by a four-digit number.
Using the digits 0;1;2;…..9, determine the total number of distinct telephone numbers
if:
9. In how many ways can the letters of the word “SIPHESIHLE” be arranged?
10. Suppose that a student whose name is Cyril would like to create a 10-character
password for his email account. He decided to use five English letters (A-Z)
followed by five digits (0-9). How many possible passwords can he create if:
10.1 the first five characters are his name and, for the following five characters,
repetition of digits is allowed but the last digit of the password cannot be zero.
10.2 the first three characters will be letters from the name of his boyfriend, Xolani,
in any order (without repetition of letters), the next two characters will be vowels
(allowing repetition) and the last five characters will be odd numbers (not
allowing repetition).
11. Given a class of 12 girls and 10 boys. What is the probability of making a committee
of five students if a committee must consist of:
12. Out of 20 tyres 3 are defective (you do not know which ones are defective). You
select four tyres. What is the probability that out of the four selected tyres:
• We previously defined a sample space as the set consisting of all the elementary
events that are possible outcomes of a random experiment. Sometimes, we expressed.
• In order to manipulate the events defined on a sample space mathematically, it is
necessary to attach a numerical value to each elementary event.
• The motivation for assigning numbers to elementary events is that it clears the way for us
to develop a general mathematical theory for handling the probabilities of events in a
sample space.
• Once all the elementary events in a sample space have numerical values assigned to
them, we follow the classic algebraic tradition and let X “stand for” the numerical
values of the elementary events.
• We then call X a random variable. X is a variable because it can “take on” (or
assume) different values. X is a random variable because the particular value it takes
on depends on the outcome of a random experiment.
• By convention, statisticians use the capital letters near the end of the alphabet to
denote random variables. Their favourite choice is the letter X.
4.1.1. Discrete and continuous random variables
• Random variables fall into two categories — discrete and continuous. The
mathematical treatment of these two types of random variables is very different — as
you will learn later.
• Discrete random variables take on isolated values along the real line, usually (but by
no means always) integer values. Examples of integer-valued discrete random
variables are:
➢ the number of customers entering a store between 09h00 and 10h00
➢ the number of occupied tables at a restaurant
➢ the number of clients visited by a salesperson during a day
➢ the number of applicants who respond to a job advertisement.
Exercises: Write down the probability distribution for each of the following
random experiments and the associated random variables
(a) Flip a coin twice and observe the number of heads
(b) Flip a coin three times and observe the number of tails.
(c) Roll a die once and observe the number of dots appearing.
Exercise: Calculate the mean and the standard deviation for each of the random variables in
the previous exercise (number of heads, number of tails and the number of dots appearing).
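A probability distribution can be built by listing equally likely outcomes and counting how often each value of X occurs; the mean and variance then follow from the standard definitions E(X) = Σ x p(x) and Var(X) = Σ x² p(x) − μ². The sketch uses a hypothetical example that is not one of the exercises (X = number of sixes when two fair dice are rolled) so that it can serve as a template.

```python
from fractions import Fraction
from itertools import product
import math

# Hypothetical example: roll two fair dice, X = number of sixes
outcomes = list(product(range(1, 7), repeat=2))
pmf = {}
for o in outcomes:
    x = o.count(6)
    pmf[x] = pmf.get(x, 0) + Fraction(1, len(outcomes))

mean = sum(x * p for x, p in pmf.items())                      # E(X)
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2  # E(X^2) - mean^2
sd = math.sqrt(variance)

print(pmf)             # {0: Fraction(25, 36), 1: Fraction(5, 18), 2: Fraction(1, 36)}
print(mean, variance)  # 1/3 5/18
```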
Exercises
1 Check which of the following functions can serve as probability mass functions:
a. $p(x) = \frac{x}{6}, \quad x = 1, 2, 3$; $p(x) = 0$ otherwise
b. $p(x) = x, \quad x = \frac{1}{16}, \frac{3}{16}, \frac{1}{4}, \frac{1}{2}$
c. $p(x) = \frac{x}{15}, \quad x = 1, 2, 3, 4, 5$
2 Find the value of $c$ such that the function below is a probability mass function:
$p(x) = cx, \quad x = 0, 1, 2, 3, 4$
Hence, find
a. $P(X \ge 1)$
b. $P(0 < X \le 3)$
c. $P(X \le 4)$
3 Calculate the mean and the standard deviation of 𝑋 for each probability mass function
in question number 1.
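A function can serve as a probability mass function only if every $p(x) \ge 0$ and the probabilities over the whole support sum to exactly 1. The Python sketch below checks both conditions with exact fractions and shows how the sum-to-one condition fixes $c$ in question 2. The helper `is_pmf` is a hypothetical name for illustration.

```python
from fractions import Fraction


def is_pmf(pmf, support):
    """A function p(x) is a valid pmf on `support` if every p(x) >= 0
    and the probabilities sum to exactly 1."""
    values = [pmf(x) for x in support]
    return all(v >= 0 for v in values) and sum(values) == 1


# Question 1(a): p(x) = x/6 for x = 1, 2, 3.
print(is_pmf(lambda x: Fraction(x, 6), [1, 2, 3]))

# Question 2: p(x) = c*x on x = 0, ..., 4. The probabilities must sum to 1,
# so c * (0 + 1 + 2 + 3 + 4) = 1, i.e. c = 1 / sum(support).
support = range(5)
c = Fraction(1, sum(support))
print(c, is_pmf(lambda x: c * x, support))
```

Exact fractions avoid the rounding error that could make a valid pmf appear to sum to 0.9999999 with floating-point numbers.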
___________________________________________________________________________
➢ There are only two mutually exclusive and collectively exhaustive outcomes
associated with the random variable on each object in the sample. These two
outcomes are labelled success and failure (e.g. a product is defective or not
defective; an employee is absent or not absent from work; a consumer prefers
brand A or not brand A).
➢ Each outcome has an associated probability. The probability for the success
outcome is denoted by p. The probability for the failure outcome is denoted by
1 − p.
‘What is the probability that x successes will occur in a randomly drawn sample of n
objects?’
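This probability can be calculated using the binomial probability distribution formula:
$P(x) = \binom{n}{x} p^{x} (1 - p)^{n - x} = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$ for $x = 0, 1, 2, \dots, n$
Where: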
➢ n = the sample size, i.e. the number of independent trials (observations)
➢ x = the number of success outcomes in the n independently drawn objects
➢ p = the probability of a success outcome on a single independent object
➢ 1 − p = the probability of a failure outcome on a single independent object
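The binomial formula translates directly into code. The Python sketch below uses `math.comb` (Python 3.8 or later) for the number of ways of choosing x successes from n trials; the function name `binomial_pmf` is a hypothetical choice for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable: n independent trials,
    probability p of success on each trial."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example: probability of exactly 3 heads in 5 flips of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 10 * 0.5**5 = 0.3125
```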
Exercise: The Avis car hire company has a fleet of rental cars that includes the make Opel.
Experience has shown that one in four clients requests to hire an Opel. If five reservations are
randomly selected from today’s bookings,
(a) what is the probability that two clients will have requested an Opel?
(b) what is the probability that at most two clients will have requested an Opel?
(c) what is the probability that at least one client will have requested an Opel?
(d) what is the probability that three clients will not have requested an Opel?
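The four parts of this exercise use three standard techniques: a single binomial probability, a cumulative sum for "at most", and the complement rule for "at least". The Python sketch below applies them with n = 5 and p = 0.25 taken from the exercise; `binomial_cdf` is a hypothetical helper name. Part (d) relies on the observation that "three did not request an Opel" is the same event as "two did".

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x): add the individual probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 5, 0.25  # five reservations, one in four clients asks for an Opel

prob_exactly_two = binomial_pmf(2, n, p)
prob_at_most_two = binomial_cdf(2, n, p)
# "At least one" is the complement of "none": P(X >= 1) = 1 - P(X = 0).
prob_at_least_one = 1 - binomial_pmf(0, n, p)
# "Three did NOT request an Opel" means 5 - 3 = 2 did request one.
prob_three_not = binomial_pmf(n - 3, n, p)

print(round(prob_exactly_two, 4), round(prob_at_most_two, 4),
      round(prob_at_least_one, 4), round(prob_three_not, 4))
```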
• The mean and variance of the binomial random variable are given by $\mu = np$ and
$\sigma^2 = np(1 - p)$, respectively.
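These shortcut formulas agree with the general definitions $\mu = \sum x\,p(x)$ and $\sigma^2 = \sum (x - \mu)^2\,p(x)$ applied to the full binomial distribution. The Python sketch below checks this numerically for n = 10 and p = 0.3, values chosen only as an illustration.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3

# Shortcut formulas.
mu_formula = n * p
var_formula = n * p * (1 - p)

# Definitions applied to the full distribution x = 0, 1, ..., n.
mu_direct = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var_direct = sum((x - mu_direct) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(round(mu_formula, 4), round(var_formula, 4))
print(isclose(mu_direct, mu_formula), isclose(var_direct, var_formula))
```

The standard deviation is the square root of the variance, $\sigma = \sqrt{np(1 - p)}$.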
Exercises
1. The South African Department of Health has reported that 30% of all goats
born in South Africa have been diagnosed with abscesses. If 10 goats born in
South Africa are randomly selected,
1.1 calculate the expected number of goats that will not be diagnosed with
this disease.
1.2 calculate the probability that six of these will not be diagnosed with
abscesses.
1.3 calculate the probability that at most two of these goats will be diagnosed
with this disease.
2. six stores for which she is responsible. Experience has shown that there is a
one-in-five chance that a given store will run out of stock before the merchandiser’s
weekly visit.
2.1. What is the probability that, on a given weekly round, the merchandiser
will find exactly one store out of stock?
2.2 What is the probability that at most two stores will be out of stock?
2.3 What is the probability that a minimum of two stores will be out of stock?
2.4 What is the mean number of stores out of stock each week?
3. A marketing manager makes the statement that the long-run probability that a
customer would prefer the deluxe model to the standard model of a product is 30%.
3.1. What is the probability that exactly three in a random sample of 10 customers will
prefer the deluxe model?
3.2. What is the probability that more than two in a random sample of 10 customers will
prefer the standard model?
3.3. In a random sample of 10 customers, calculate the standard deviation of the number
of customers who prefer the standard model.
• In each case, the number of occurrences of a given outcome of the random variable, x,
can take on any integer value from 0, 1, 2, 3, … up to infinity (in theory).
The Poisson Question
$P(x) = \frac{e^{-a} a^{x}}{x!}$ for $x = 0, 1, 2, \dots$
Where:
➢ a = the mean number of occurrences of a given outcome of the random variable
for a predetermined time, space or volume interval
➢ e = a mathematical constant, approximately 2.71828 (the base of the natural logarithm).
➢ x = number of occurrences of a given outcome for which a probability is required.
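The Poisson formula can be evaluated directly with `math.exp` and `math.factorial`. The Python sketch below uses a hypothetical mean of a = 2 occurrences per interval, purely for illustration, and shows the complement rule for an "at least two" question; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(x, a):
    """P(X = x) for a Poisson random variable with mean a per interval."""
    return exp(-a) * a**x / factorial(x)


# Illustration with a hypothetical mean of a = 2 occurrences per interval.
a = 2
for x in range(5):
    print(x, round(poisson_pmf(x, a), 4))

# P(X >= 2) via the complement: 1 - P(X = 0) - P(X = 1).
print(round(1 - poisson_pmf(0, a) - poisson_pmf(1, a), 4))
```

Because x can in theory be any non-negative integer, "at least" questions are almost always answered through the complement rather than by summing an infinite number of terms.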
Exercises
1.1 Calculate the standard deviation of the number of orders that a farm receives
in an eight-day interval.
1.2 Calculate the probability that in a given four-day interval, a company will
receive three orders.
1.3 Calculate the probability that in a given 16-day interval, a company will
receive at least two orders.
2.1. What is the probability that he sells more than one ice-cream in his first
hour of operation?
3. A company that supplies ready-mix concrete receives, on average, six orders per day.
3.1.1 only one order will be received?
3.1.4 What is the probability that, on a given half-day, only one order will be received?
3.2 What is the mean and standard deviation of orders received per day?
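When the question uses a different interval from the one in which the mean is given, the mean a is usually rescaled in proportion to the interval length; this assumes occurrences arrive at a constant average rate. A Poisson random variable also has variance equal to its mean, so $\sigma = \sqrt{a}$. The Python sketch below applies both ideas to the ready-mix concrete example (six orders per day).

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, a):
    """P(X = x) for a Poisson random variable with mean a per interval."""
    return exp(-a) * a**x / factorial(x)


orders_per_day = 6  # from the ready-mix concrete example

# Assumption: orders arrive at a constant average rate, so the mean for a
# half-day interval is half the daily mean.
a_half_day = orders_per_day * 0.5
print(round(poisson_pmf(1, a_half_day), 4))

# For a Poisson variable the variance equals the mean a.
mean_per_day = orders_per_day
sd_per_day = sqrt(orders_per_day)
print(mean_per_day, round(sd_per_day, 4))
```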
4.1 What is the probability that a typical family will purchase at least three
tubes of toothpaste in any given month?
4.2 What is the probability that a typical family will purchase fewer than four tubes of
toothpaste in any given month?
__________________________________________________________________________
• The normal probability distribution is continuous and has the following properties:
• To find the probability that $x$ lies between $x_1$ and $x_2$, it is necessary to find the area
under the bell-shaped curve between these x-limits.
• This is done by converting the x-limits into limits that correspond to another normal
distribution called the standard normal distribution (or z-distribution as it is commonly
called) for which areas have already been worked out. These areas are given in a
statistical table.
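Each x-limit is converted with the standard formula $z = \frac{x - \mu}{\sigma}$, and the area to the left of z under the standard normal curve can be computed from the error function as $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z / \sqrt{2})\right)$. The Python sketch below replaces the statistical table with `math.erf`; the mean and standard deviation used (50 and 10) are hypothetical values chosen only for illustration.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for X ~ Normal(mu, sigma): convert each x-limit to
    z = (x - mu) / sigma, then take the difference of the two areas."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return standard_normal_cdf(z2) - standard_normal_cdf(z1)


# Illustration with hypothetical values: mu = 50, sigma = 10.
# The interval 40..60 is one standard deviation either side of the mean.
print(round(normal_prob_between(40, 60, 50, 10), 4))  # about 0.6827
```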
Exercises
1 The manager of a local gym has determined that the length of time patrons spend at the
gym is a normally distributed variable with a mean of 80 minutes and a standard
deviation of 20 minutes.
1.1 What proportion of patrons spend more than two hours at the gym?
1.2 What proportion of patrons spend less than one hour at the gym?
1.3 What is the least amount of time spent by 60% of patrons at the gym?
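Python's standard library offers `statistics.NormalDist` (Python 3.8 or later), whose `cdf` gives areas to the left of an x-value and whose `inv_cdf` works backwards from an area to an x-value. The sketch below applies it to the gym exercise. For 1.3 it assumes the question asks for the time t that 60% of patrons meet or exceed, so that 40% of the area lies to the left of t.

```python
from statistics import NormalDist

# Time at the gym ~ Normal(mean = 80 min, sd = 20 min), as stated above.
gym = NormalDist(mu=80, sigma=20)

# 1.1 More than two hours (120 min): area to the right of x = 120.
more_than_two_hours = 1 - gym.cdf(120)

# 1.2 Less than one hour (60 min): area to the left of x = 60.
less_than_one_hour = gym.cdf(60)

# 1.3 The time t that 60% of patrons meet or exceed, i.e. P(X >= t) = 0.60,
# so t is the value with 40% of the area to its left.
t = gym.inv_cdf(0.40)

print(round(more_than_two_hours, 4), round(less_than_one_hour, 4), round(t, 1))
```

Converting the times to minutes first keeps the units consistent with the stated mean and standard deviation.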
2.1 If this type of washing machine is guaranteed for one year, what percentage of original
sales will require replacement if they fail within the guarantee period?
3.1. Calculate the probability that a randomly selected telemarketer uses daily:
3.4. What is the minimum amount of airtime spent by 95% of the telemarketers?
References
• INTROSTAT. L. Underhill & D. Bradfield. University of Cape Town, 2013.