Sta 2100 Probability and Statistics I (Course Outline With Notes)
OF
AGRICULTURE & TECHNOLOGY
KAREN CAMPUS
V. S. ANDIKA
Course description
1. Data: sources, collection, classification and processing.
Prerequisite: SMA 2104: Mathematics for Science and SMA 2102: Calculus I
Learning outcomes
Upon completion of this course you should be able to;
5. Explain the purpose of measures of central tendency, variability, skewness
and kurtosis
6. Acquire basic knowledge of the method of least squares and correlation theory
Instruction methodology
• Online Tutorials
• Case studies
Core References:
1. P.S. Mann. Introductory Statistics. John Wiley & Sons Ltd, 2001 ISBN 13:
9780471395119.
2. S Ross A first course in Probability 4th ed. Prentice Hall, 1994 ISBN-10:
0131856626 ISBN-13: 9780131856622
4. GM Clarke & D Cooke A Basic Course in Statistics. 5th ed. Arnold, 2004
ISBN13: 978-0-340-81406-2 ISBN10: 0-340-81406-3
Additional Reference:
• Uppal, S. M., Odhiambo, R. O. & Humphreys, H. M. Introduction to
Probability and Statistics. JKUAT Press, 2005.
Assessment information
The module will be assessed as follows;
Table of Contents
3. Measures of variability
3.1. Range
3.2. Semi-interquartile range
3.3. Variance and standard deviations
3.4. Coefficient of Variation
3.5. Pearson’s Measure of Skewness (psk)
4. Measures of location/Position
4.1. Quartiles
4.2. Percentiles
5. Summary
6. Assignment 1
5. Numerical Summaries of Data (Grouped Frequency Distributions)
1. Introduction
2. Measures of central tendency for grouped data
2.1. Computing the Mean
2.2. Assumed mean and coding method
2.3. Median for a grouped frequency
2.4. Mode for a grouped frequency:
3. Measures of variability for grouped data
3.1. Variance and Standard deviation
3.2. Mean absolute deviation (MAD)
4. Measures of location/Position
4.1. Quartiles and Percentiles
5. Combining sets of Data
6. Summary
6. Introduction to Probability
1. Introduction
2. Definitions
3. Rules of probability
4. How do we measure probability?
4.1. Classical (or theoretical) probability
4.2. Frequentist or Empirical (or statistical) probability
4.3. Subjective
5. Laws of probability
STA 2100 Probability and Statistics I
Chapter 1
Review of Basic Set theory, Permutations and
Combinations
Learning outcomes
Upon completion of this section, you should be able to:
• Perform basic set operations using the Venn diagram, union and intersection
of sets
1.2. Definitions
Set
A set is a collection of objects which are called the members or elements of that
set. If we have a set we say that some objects belong (or do not belong) to this
set, are (or are not) in the set. We also say that sets consist of their objects such
that given an object, it is possible to determine whether the object belongs to the
given collection or not.
Example. The following are some examples of sets:
Elements of a set
The members of a set are called elements. We use upper case (capital) letters
to denote sets and lower case (small) letters to denote elements of a set. For
instance, if a is an element of the set A, we write this as a ∈ A (read as a belongs
to A) and if a is not an element of A, we write this as a ∉ A (read as a does not
belong to A).
Set denotation/specification
There are different ways of describing a set. We usually use braces {}
to denote a set. For instance, the set consisting of the elements 1, 2, 3, 4, 5, 6 could be
written as {1, 2, 3, 4, 5, 6} or {1, 2, 3, ..., 6} or {x | x ∈ N, x ≤ 6}, where N is the
set of the natural numbers. As we can see from this example, we also use the set
builder notation {x | x ...} to denote a set, where the bar "|" replaces the
words 'such that'.
There are three main ways to specify a set:
1. by listing all its members (list notation). This method is mostly suitable
for finite sets. In this case we list the names of the elements of the set, separate
them by commas and enclose them in braces, for instance {1, 12, 45},
{George Washington, Bill Clinton}, {a, b, d, m}.
2. by stating a property of its elements (predicate notation), for instance
{x | x is a natural number and x < 8}, read as "the set of all x such that x is a
natural number and is less than 8". The second part of this notation is a property
that the members of the set share (a condition or a predicate which holds for
members of this set).
Note: The order in which elements appear in a set is not important. That is, the
set {a, b, c, d, e} and {a, e, c, d, b} are equal.
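As a rough illustration, list notation and predicate notation correspond closely to set literals and set comprehensions in a programming language. The sketch below uses Python, which is an assumption of this example and not part of the course material; the variable names are illustrative, and the natural numbers are taken to start at 1.

```python
# List notation: write out the members explicitly.
listed = {1, 2, 3, 4, 5, 6, 7}

# Predicate notation {x | x is a natural number and x < 8}.
# Assumption: the natural numbers start at 1; range(1, 100) is just a
# finite stand-in for "natural numbers" that is large enough here.
by_property = {x for x in range(1, 100) if x < 8}

# The order in which elements are listed does not affect equality.
same = {"a", "b", "c", "d", "e"} == {"a", "e", "c", "d", "b"}

print(listed == by_property)  # True
print(same)                   # True
```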
A set which has a finite number of elements is called a finite set; otherwise we call it
an infinite set. For instance, if A is the set of all integers, then A is an infinite set,
denoted as {..., −2, −1, 0, 1, 2, ...} or {x | x is an integer}.
Singleton
In mathematics, a singleton, also known as a unit set, is a set with exactly one
element. If a is the element of the singleton A, then A is denoted as A = {a}.
Note that {a} and a do not mean the same thing: the former is a set
consisting of the single element a, whereas the latter is just an element.
Equality of sets
Two sets A and B are said to be equal if and only if (iff) every element of the set
A is a member of the set B and vice versa. We express this by writing A = B.
Logically speaking, A = B means (x ∈ A) ≡ (x ∈ B), that is, the bi-conditional
statement (x ∈ A) ⇐⇒ (x ∈ B) is true for all x.
Subset
Let A and B be two sets. If every element of A is an element of B then A is called a
subset of B and we write this as A ⊆ B or B ⊇ A (read as A is contained in B or
B contains A respectively). Equivalently, A ⊆ B means (x ∈ A) =⇒ (x ∈ B)
is true ∀x.
Note:
A set which has no elements is called the null or empty set, and is denoted as ∅ or
{}. For example:
• the set of all those integers that are both even and odd
• the set of all JKUAT IT students who are both sick and well at the same
time
Universal set
We assume that every set under discussion is a subset of a fixed set U or ξ, known as
the universal set. It is the set which contains all the objects under consideration in a
given discussion.
Power sets
The set of all subsets of a set A is called the power set of A and is denoted as ℘(A)
or sometimes as 2^A. For example, if A = {a, b}, then ℘(A) = {∅, {a}, {b}, {a, b}}. From
this example: a ∈ A; {a} ⊆ A; {a} ∈ ℘(A); ∅ ∉ A; ∅ ⊆ A; ∅ ⊆ ℘(A).
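The power set of a small set can be enumerated mechanically. The following is a minimal sketch in Python (the language and the helper name power_set are assumptions of this example, not part of the notes). Subsets are stored as frozensets because ordinary Python sets cannot contain other sets.

```python
from itertools import combinations


def power_set(s):
    """Return all subsets of s as a set of frozensets (hypothetical helper)."""
    items = list(s)
    return {
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }


A = {"a", "b"}
P = power_set(A)

print(len(P))                 # 4, which equals 2 ** len(A)
print(frozenset({"a"}) in P)  # True: {a} is a member of the power set
print({"a"} <= A)             # True: {a} is a subset of A
print(frozenset() <= A)       # True: the empty set is a subset of A
```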
Complements
If A and B are two sets, then the complement of B relative to A is the set of all those
elements x ∈ A such that x ∉ B, and is denoted as A − B. This set is also known as
the relative complement of B in A. Whenever we talk of the complement of the set B
without further qualification, we usually mean the complement of B relative to the
universal set U. In such cases, we denote the complement of B by B′ or B̄, thus
B′ = U − B or B̄ = U − B.
Given a set A and a universal set U, the set of elements that are in U and are NOT
in A is called the complement of A, or A^c.
Example. Suppose we have the set S = {a, b, c, d, e, f}. Then the set S′, which
is the complement of the set S, is given by S′ = {g, h, i, j, ..., z}, given that the
universal set U is assumed to be the set of all letters of the alphabet.
Operations on sets
Union
Here, we would like to define some operations that can be done on sets. Suppose
we let A and B be two arbitrary sets. Then the union of the set A and B, written
A ∪ B, is the set whose elements are just the elements of A or B or of both. Using
the set builder notation, the definition is A ∪ B = {x|x ∈ A or x ∈ B}.
Example. Let us define the following sets: K = {a, b}, L = {c, d} and M =
{a, b, d}. Then:
• K ∪ L = {a, b, c, d}
• K ∪ M = {a, b, d}
• L ∪ M = {a, b, c, d}
• (K ∪ L) ∪ M = K ∪ (L ∪ M ) = {a, b, c, d}
• K ∪ K = K
• K ∪ ∅ = ∅ ∪ K = K = {a, b}
Intersection
The intersection of two sets A and B, written A ∩ B, is the set whose elements
are just the elements of both A and B. Using the set builder notation, we define
an intersection as follows: A ∩ B = {x | x ∈ A and x ∈ B}. When two sets A
and B are disjoint, then A ∩ B = ∅.
Example. Using the three sets seen in our previous example, we can obtain the
following sets using the operation, intersection.
• K ∩ L = ∅
• K ∩ M = {a, b}
• L ∩ M = {d}
• (K ∩ L) ∩ M = K ∩ (L ∩ M ) = ∅
• K ∩ K = K
• K ∩ ∅ = ∅ ∩ K = ∅.
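The union and intersection examples for K, L and M can be checked with a few lines of code. This is a small sketch in Python (an assumption of this example), where | denotes union and & denotes intersection.

```python
K = {"a", "b"}
L = {"c", "d"}
M = {"a", "b", "d"}

# Union: elements of either set (or both).
print(sorted(K | L))               # ['a', 'b', 'c', 'd']
print(sorted(K | M))               # ['a', 'b', 'd']
print((K | L) | M == K | (L | M))  # True: union is associative

# Intersection: elements common to both sets.
print(K & L == set())              # True: K and L are disjoint
print(sorted(K & M))               # ['a', 'b']
print(sorted(L & M))               # ['d']
print((K & L) & M == K & (L & M))  # True: intersection is associative
```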
Venn Diagram
A Venn diagram or set diagram is a diagram that shows all possible logical relations
between a finite collection of sets. These diagrams help us understand
elementary set theory, and help to illustrate simple set relationships in
probability, logic, statistics, linguistics and computer science. The diagram consists
of the universal set U represented by points in and on a rectangle, and subsets
A, B, C, ... by points in and on the circles or ellipses drawn inside the rectangle.
The following is a diagrammatic representation of the Venn diagram showing
two disjoint sets A and B (left diagram), and two overlapping sets A and
B (right diagram).
Solution:
If we let x1 represent the number of jackets, x2 the number of shirts and x3 the number
of slacks, then the man can have x1 × x2 × x3 = 3 × 10 × 5 = 150 outfits.
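The multiplication used in this solution can be verified by listing every possible outfit. The sketch below is in Python (an assumption of this example); the item labels are made up for illustration, and only the counts 3, 10 and 5 come from the solution.

```python
from itertools import product

# Hypothetical labels; only the counts (3, 10, 5) come from the solution.
jackets = [f"jacket{i}" for i in range(1, 4)]   # 3 jackets
shirts = [f"shirt{i}" for i in range(1, 11)]    # 10 shirts
slacks = [f"slacks{i}" for i in range(1, 6)]    # 5 pairs of slacks

# Each outfit is one (jacket, shirt, slacks) combination.
outfits = list(product(jackets, shirts, slacks))

print(len(outfits))  # 150
```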
Permutations
Combinations
That is, ⁴C₂ × ³C₂ = 6 × 3 = 18 ways.
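The arithmetic in this result can be reproduced with a standard library function for combinations. This is a minimal check in Python (an assumption of this example), where math.comb(n, r) gives the number of ways of choosing r items from n.

```python
from math import comb

# Choose 2 from 4 and, independently, 2 from 3.
ways = comb(4, 2) * comb(3, 2)

print(comb(4, 2), comb(3, 2), ways)  # 6 3 18
```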
Learning Activities
There are a number of general laws about sets which follow from the
definitions of the set theoretic operations and of subsets seen in the previous
sections. These include the Idempotent Laws, the Commutative Laws and
DeMorgan’s Laws, amongst others. Using relevant examples, discuss these
eight laws of sets.
Summary
Using the knowledge of sets, Venn diagrams, permutations and combinations, we
can find the probabilities of outcomes, as we
shall see in the subsequent chapters. By knowing the number of elements in the
different sets, and in the universal set, we can compute the probability
of an event occurring, which is the proportion of elements in a set relative to the
number of elements in the universal set.
Revision Questions
List the following sets; N denotes the set of natural numbers while Z denotes the
set of integers.
Exercise 1. S = {x|x ∈ N, x < 10}
Exercise 2. P = {x|x ∈ Z, x < 6}
Exercise 3. Q = {x|x ∈ Z, and 2 < x < 10}
Exercise 4. Find the number of arrangements that can be made out of the
letters of the words ASSASINATIONS and ABRACADABRA.
Exercise 5. A committee of 5 members is to be formed out of 6 men and 4
women. How many committees can be formed so that at least one woman is
always there in the committee?
Assignment 1
1. A group of 50 BIT students were asked which of the three Computer Science
Journals, A, B or C they read. The results showed that 25 read A, 16 read
B, 14 read C. 5 read both A and B, 4 read both B and C, 6 read both C
and A and 2 read all three. a.
2. In a certain computer firm, there are 400 employees; of these 400, 150 are
men, 276 are university graduates, 212 are married persons, 94 are male
university graduates, 151 are married university graduates, 119 are married
men, 72 are married male university graduates. Find the number of single
women in this firm who are not university graduates.
3. A container has 12 blue balls and 8 red balls. If we are interested in choosing
10 balls from the container so that there are always at least two red balls in
the ten balls chosen, how many groups of ten balls can we make?
Chapter 2
Data: Collection and Sampling
Learning outcomes
Upon completing this topic, you should be able to:
• Identify the different methods of data collection and the criteria that we use
to select a method of data collection
1. Introduction
Before we look at data presentation and the subsequent chapters, let us first define
what we mean by the terms Probability and Statistics:
We often make statements about probability or chances. For instance, a weather
forecaster may predict that there is an 80% chance of rain tomorrow; a health worker
may state that smokers are twice as likely to get lung cancer as non-smokers; or
a computer technician may say that there is a 60% chance that a laptop fan will
not last more than 6 months after repair.
Definitions
Probability
Probability, which measures the likelihood that an event will occur, is an important
part of statistics. It is the basis of inferential statistics, in which we make
decisions under conditions of uncertainty. Probability theory is used to evaluate
the uncertainty involved in those decisions. Combining probability and probability
distributions with descriptive statistics helps us to make decisions about populations
based on information obtained from samples. Probability is therefore
the chance or relative possibility or likelihood that an event will occur.
It reflects the long-run relative frequency of an outcome.
Statistics
The word ’statistics’ means one or more measures describing the characteristics of
a population. The main question that we attempt to answer using statistics is whether
there is a relationship between variables. To demonstrate this relationship, we must
show that when one variable changes, the other variable changes too and that the
amount of change is not due to mere chance. Statistics therefore is the science
of data. It deals with the scientific methods of collecting, organizing,
Role of statistics
Branches of Statistics
The study of statistics has two major branches: descriptive statistics and inferential
statistics.
Descriptive statistics - involves the organization, summarization, and display
of data.
Inferential statistics - involves using a sample to draw conclusions about a
population.
Example. In a recent study, volunteers who had less than 6 hours of sleep were
four times more likely to answer incorrectly on a Computer Science test than were
participants who had at least 8 hours of sleep. Decide which part is the descriptive
statistic and what conclusion might be drawn using inferential statistics.
Solution: The statement “four times more likely to answer incorrectly” is a descriptive
statistic. An inference drawn from the sample is that all individuals
sleeping less than 6 hours are more likely to answer Computer Science test questions
incorrectly than individuals who sleep at least 8 hours.
2.1. Definitions
Data consist of (raw) information coming from observations, counts, measurements,
or responses. They are measurements taken on a variable. A datum is
a single number which may represent a count (e.g. the number of computers
delivered to a client in Tatu City) or a measurement (e.g. the mass of a bag of
tomatoes).
A population is the collection of all outcomes, responses, measurements, or
counts that are of interest.
A sample is a subset of a population.
A parameter is a numerical description of a population characteristic. (Parameter → Population)
A statistic is a numerical description of a sample characteristic. (Statistic → Sample)
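The distinction between a parameter and a statistic can be made concrete with a small numerical sketch. The example below is in Python (an assumption of this example) and uses made-up masses of bags of tomatoes as the population; neither the data nor the variable names come from the notes.

```python
import random
from statistics import mean

# Hypothetical population: masses (kg) of ten bags of tomatoes.
population = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

random.seed(1)                         # fixed seed so the sketch is repeatable
sample = random.sample(population, 4)  # a sample is a subset of the population

parameter = mean(population)  # numerical description of the population
statistic = mean(sample)      # numerical description of the sample

print("population mean (parameter):", parameter)
print("sample mean (statistic):", statistic)
```

A different sample would generally give a different statistic, while the parameter stays fixed for a given population.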
Primary data
These are data that are fresh and collected for the first time and are therefore
original in character; that is, they are first hand information collected, compiled
and published by an organization or an individual for some purpose. They are
the most original data in character and have not undergone any sort of statistical
treatment.
For instance, population census reports are primary data because they are collected,
compiled and published by the population census body. Other examples are the
economic survey publication, the well-being survey, etc.
The first hand information obtained by the investigator is more reliable and
accurate since the investigator can extract the correct information by removing
doubts, if any, in the minds of the respondents regarding certain questions. High
response rates might be obtained since the answers to various questions are obtained
on the spot. It permits explanation of questions concerning difficult subject
matter.
Secondary data
This is second hand information which has already been collected by some person or
organization for some purpose and is available for the present study. Such data are
not pure in character, and have undergone some statistical treatment at least once.
When an investigator uses data, which have already been collected by others,
such data are called "Secondary Data". Such data are primary data for the agency
that collected them, and become secondary for someone else who uses these data
for his own purposes.
Sources of data.
When one is collecting data it is important to consider whether they are primary
or secondary data. Some of the main methods used for data collection include:
a) Census: A study that obtains data from every member of the population.
It is not practical in most studies because of the cost and/or time required.
b) Sample survey: A study that obtains data from a subset of a population
in order to estimate population attributes/characteristics. Surveys of human
populations and institutions are common in government, health, social science, IT,
marketing, etc.
c) Experiment: A controlled study in which the researcher attempts to understand
cause-and-effect relationships. The actual experiment is carried out
on certain individuals/units about whom information is drawn. The study is controlled
in the sense that the researcher controls how subjects are assigned to groups and
which treatments/conditions each group receives.
b) Semi-official: from the state bank, railway board, etc.
Editing of data
After collecting data from either source, the next step is editing, i.e. the examination
of the collected data to discover any errors and mistakes before presenting them. Editing
of secondary data is simpler than editing of primary data.
What is a Variable?
incomes) and can also vary over time for each data unit (i.e. income can go up or
down).
A random variable, denoted by X, is a variable whose possible values are
numerical outcomes of a random experiment.
Types of variables
There are different ways variables can be described according to the ways they can
be studied, measured, and presented.
Numeric variables have values that describe a measurable quantity as a number,
like ’how many’ or ’how much’. Therefore numeric variables are quantitative
variables.
Numeric variables may be further described as either continuous or discrete:
Therefore, the data collected for a numeric variable are quantitative data.
Categorical variables have values that describe a ’quality’ or ’characteristic’
of a data unit, like ’what type’ or ’which category’. Categorical variables fall
into mutually exclusive (in one category or in another) and exhaustive (include
all possible options) categories. Therefore, categorical variables are qualitative
variables and tend to be represented by a non-numeric value.
Learning Activities
• Write a summary report, typed in font size 12, Times New Roman font,
with a line spacing of 1.5, discussing the following concepts: sampling
process, sampling frame, sampling methods, probability sampling methods
and non-probability sampling methods.
Chapter 3
Presentation of Data
Learning outcomes
Upon completing this topic, you should be able to:
• Compare the presentations of the same set of data by using various graphs.
• Understand the criteria for the selection of a method to organize and present
data
1. Introduction
The raw data collected through the various methods of data collection will be
in a haphazard and unsystematic form and is not appropriately formed to draw
conclusions about the group or the population under study. Hence it becomes
necessary to arrange or organize the data in a form which is suitable for analysis.
Data can be presented in different forms such as: text, in a table, or pictorially as
a chart, diagram or graph.
Data graphics are a good way to communicate important data in your reports.
The purpose of putting results of research into graphs, charts and tables is twofold.
First, it is a visual way to look at the data and see what happened and make
interpretations. Second, it is usually the best way to show the data to others.
Reading lots of numbers in the text puts people to sleep and does little to
convey information. Tables are the most commonly used form of data graphics,
but graphs, charts or diagrams that include symbols and pictures will get your
results across to the reader faster and will liven up your presentation or report.
2. Tables
Once we have collected our data, often the first stage of any analysis is to present
them in a simple and easily understood way. Tables are perhaps the simplest means
of presenting data.
There are many types of tables. For example, we have all seen tables listing
sales of computers by type, or exchange rates, or the financial performance of
companies. These types of tables can be very informative. However, they can also
be difficult to interpret, especially those which contain vast amounts of data.
Frequency tables are amongst the most commonly used tables and are perhaps
the most easily understood. They can be used with continuous, discrete, categorical
and ordinal data. Frequency tables have uses in some of the techniques we
will see in the next lecture.
3. Graphical presentations
Graphics, such as maps, graphs and diagrams, are used to represent large volumes of
data. With large amounts of data graphical presentation methods are often clearer
• Who is my audience?
• induce the viewer to think about the substance of the graphic rather than
the methodology, graphic design, or something else
• encourage the viewer to use the graphic as you intend, e.g. make comparisons
• be as simple as possible
In the following section we discuss some of the commonly used graphical presen-
tations
Pie Chart
To create a pie chart for the data, find the relative frequency
(percent) of each category.
Type                        Frequency   Relative Frequency
Motor Vehicle                  43,500                0.578
Falls                          12,200                0.162
Poison                          6,400                0.085
Drowning                        4,600                0.061
Fire                            4,200                0.056
Ingestion of Food/Object        2,900                0.039
Firearms                        1,400                0.019
                           n = 75,200
Pie Chart
Next, find the central angle. To find the central angle,
multiply the relative frequency by 360°.
Type                        Frequency   Relative Frequency    Angle
Motor Vehicle                  43,500                0.578   208.2°
Falls                          12,200                0.162    58.4°
Poison                          6,400                0.085    30.6°
Drowning                        4,600                0.061    22.0°
Fire                            4,200                0.056    20.1°
Ingestion of Food/Object        2,900                0.039    13.9°
Firearms                        1,400                0.019     6.7°
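The relative frequencies and central angles in the table can be recomputed directly from the frequencies. The following is a short sketch in Python (an assumption of this example; the variable names are illustrative). The angles agree with the table when the unrounded relative frequency is multiplied by 360.

```python
frequencies = {
    "Motor Vehicle": 43_500,
    "Falls": 12_200,
    "Poison": 6_400,
    "Drowning": 4_600,
    "Fire": 4_200,
    "Ingestion of Food/Object": 2_900,
    "Firearms": 1_400,
}

n = sum(frequencies.values())  # total number of observations

# relative frequency = f / n; central angle = relative frequency * 360
rows = {
    category: (f / n, f / n * 360)
    for category, f in frequencies.items()
}

for category, (rel, angle) in rows.items():
    print(f"{category:<26}{rel:8.3f}{angle:8.1f}")
```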
Pie Chart
[Figure: pie chart of the data above. Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%,
Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%.]
1. First decide what goes on each axis of the chart. By convention the variable
being measured goes on the horizontal (x–axis) and the frequency goes on
the vertical (y–axis).
2. Next decide on a numeric scale for the frequency axis. This axis represents
the frequency in each category by its height. It must start at zero and include
the largest frequency. It is common to extend the axis slightly above the
largest value so you are not drawing to the edge of the graph.
5. Draw a bar for each category. When drawing the bars it is essential to ensure
the following:
• the bars are separated from each other by equally sized gaps.
Example. Using the following data, which represent the number of guests who were
booked in a hotel in Mombasa on a particular day in the month of December
2013, construct a suitable bar graph for the data.
Note that the above graph is a vertical bar graph. We can also obtain a horizontal
bar graph presenting the same information.
3.3. Histograms
Bar charts have their limitations; for example, they cannot be used to present
continuous data. When dealing with continuous random variables a different kind
of graph is required. This is called a histogram. At first sight these look similar
to bar charts. There are, however, two critical differences:
• the height of the rectangle is only proportional to the frequency if the class
intervals are all equal. With histograms it is the area of the rectangle that
Initially we will only consider histograms with equal class intervals. Those with
uneven class intervals require more careful thought. Producing a histogram is
much like producing a bar chart and in many respects can be considered to be the
next stage after producing a grouped frequency table. In reality, it is often best to
produce a frequency table first which collects all the data together in an ordered
format. Once we have the frequency table, the process is very similar to drawing
a bar chart.
1. Find the maximum frequency and draw the vertical (y–axis) from zero to
this value, including a sensible numeric scale.
2. The range of the horizontal (x–axis) needs to include not only the full range
of observations but also the full range of the class intervals from the fre-
quency table.
3. Draw a bar for each group in your frequency table. These should be the
same width and touch each other (unless there are no data in one particular
class).
Guidelines
3. Find the class width as follows. Determine the range of the data, divide the
range by the number of classes, and round up to the next convenient
number.
4. Find the class limits. You can use the minimum entry as the lower limit of
the first class. To find the remaining lower limits, add the class width to the
lower limit of the preceding class. Then find the upper class limits.
5. Make a tally mark for each data entry in the row of the appropriate class.
6. Count the tally marks to find the total frequency f for each class.
• The minimum data entry is 18 and maximum entry is 54, so the range is
36. Divide the range by the number of classes to find the class width.
– Class width = 36/5 = 7.2, rounded up to 8
• The minimum data entry of 18 may be used for the lower limit of the first
class. To find the lower class limits of the remaining classes, add the width
(8) to each lower limit.
– The lower class limits are 18, 26, 34, 42, and 50.
– The upper class limits are 25, 33, 41, 49, and 57.
• Make a tally mark for each data entry in the appropriate class.
• The number of tally marks for a class is the frequency for that class.
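The class width and class limits in this worked example can be reproduced as follows. This is a minimal sketch in Python (an assumption of this example). It interprets "round up to the next convenient number" as rounding up to the next whole number, which matches 7.2 becoming 8 here but would need adjusting if the quotient were already a whole number or if a different convenient width were preferred.

```python
import math

minimum, maximum, num_classes = 18, 54, 5

data_range = maximum - minimum                      # 36
class_width = math.ceil(data_range / num_classes)   # 7.2 rounded up to 8

# Lower limits: start at the minimum and keep adding the class width.
lower_limits = [minimum + i * class_width for i in range(num_classes)]

# For whole-number data, each upper limit is one less than the next lower limit.
upper_limits = [low + class_width - 1 for low in lower_limits]

print(class_width)    # 8
print(lower_limits)   # [18, 26, 34, 42, 50]
print(upper_limits)   # [25, 33, 41, 49, 57]
```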
Once this is done and we have the class intervals, we can construct
the histogram as follows:
Frequency Histogram
A frequency histogram is a bar graph that represents
the frequency distribution of a data set.
1. The horizontal scale is quantitative and measures
the data values.
2. The vertical scale measures the frequencies of the
classes.
3. Consecutive bars must touch.
Class boundaries are the numbers that separate the
classes without forming gaps between them.
The horizontal scale of a histogram can be marked with
either the class boundaries or the midpoints.
Class Boundaries
Let us consider the class boundaries for the “Ages of the IT Students”
frequency distribution.
Ages of Students
Class      Frequency, f   Class Boundaries
18 – 25         13          17.5 – 25.5
26 – 33          8          25.5 – 33.5
34 – 41          4          33.5 – 41.5
42 – 49          3          41.5 – 49.5
50 – 57          2          49.5 – 57.5
             Σf = 30
The distance from the upper limit of the first class to the lower limit of the
second class is 1. Half this distance is 0.5.
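The class boundaries can be derived from the class limits by subtracting half the gap from each lower limit and adding it to each upper limit. The sketch below is in Python (an assumption of this example); it also computes the class midpoints, which can be used to mark the horizontal scale of a histogram.

```python
classes = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]   # 1
half_gap = gap / 2                    # 0.5

boundaries = [(low - half_gap, high + half_gap) for low, high in classes]
midpoints = [(low + high) / 2 for low, high in classes]

print(boundaries)
# [(17.5, 25.5), (25.5, 33.5), (33.5, 41.5), (41.5, 49.5), (49.5, 57.5)]
print(midpoints)
# [21.5, 29.5, 37.5, 45.5, 53.5]
```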
Frequency Histogram
To draw a frequency histogram for the “Ages of Students”
frequency distribution, we use the class boundaries.
[Figure: frequency histogram “Ages of Students”. Horizontal axis: Age (in years),
broken axis, marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5.
Vertical axis: frequency f. Bar heights: 13, 8, 4, 3, 2.]
You may have noticed that we referred to the above histogram as a frequency
histogram. Instead of constructing a frequency histogram, we may also be interested
in constructing a relative frequency histogram. The process is quite similar
to the above after obtaining the relative frequencies as illustrated below.
Relative Frequency
First, we need to find the relative frequencies for the “Ages of IT
Students” frequency distribution as follows.
Class      Frequency, f   Relative Frequency (portion of students)
18 – 25         13          0.433
26 – 33          8          0.267
34 – 41          4          0.133
42 – 49          3          0.1
50 – 57          2          0.067
             Σf = 30      Σ(f/n) = 1
For the first class, f/n = 13/30 ≈ 0.433.
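The relative frequencies follow from dividing each class frequency by the total n. A brief sketch in Python (an assumption of this example) using the class frequencies from the “Ages of Students” distribution:

```python
frequencies = [13, 8, 4, 3, 2]

n = sum(frequencies)                     # 30
relative = [f / n for f in frequencies]  # relative frequency = f / n

print([round(r, 3) for r in relative])   # [0.433, 0.267, 0.133, 0.1, 0.067]
print(round(sum(relative), 10))          # 1.0
```

Because of rounding, the three-decimal values in a table may not add up to exactly 1 even though the unrounded values do.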
[Figure: relative frequency histogram. Vertical axis: relative frequency (portion of
students), scaled up to 0.5; the tallest bar has height 0.433.]
2. Draw the axes: the x-axis needs to contain the full range of the classes
used. The y-axis needs to range from 0 to the maximum percentage relative
frequency.
3. Plot points: pick the mid point of the class interval on the x-axis and go up
until you reach the appropriate percentage value on the y-axis and mark the
point. Do this for each class.
Frequency Polygon
[Figure: frequency polygon “Ages of Students”. Horizontal axis: Age (in years),
broken axis, marked at the midpoints 13.5, 21.5, 29.5, 37.5, 45.5, 53.5, 61.5.
Vertical axis: frequency f, from 0 to 14. The line is extended to the x-axis.]
2. Label the x-axis with the full range of the data and the y-axis from 0 to
100%.
3. Plot the cumulative % relative frequency at the end point of each class.
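[Figure: cumulative frequency graph “Ages of Students”. Horizontal axis: Age (in years),
marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis:
cumulative frequency (portion of students), from 0 to 30. The graph ends at the upper
boundary of the last class.]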
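The points plotted on a cumulative frequency graph can be computed by keeping a running total of the class frequencies. The following is a small sketch in Python (an assumption of this example), using the class frequencies 13, 8, 4, 3, 2 and the upper class boundaries of the “Ages of Students” distribution.

```python
from itertools import accumulate

upper_boundaries = [25.5, 33.5, 41.5, 49.5, 57.5]
frequencies = [13, 8, 4, 3, 2]

cumulative = list(accumulate(frequencies))       # running totals
n = cumulative[-1]                               # total number of students
cumulative_percent = [100 * c / n for c in cumulative]

# Points to plot: (upper class boundary, cumulative frequency).
points = list(zip(upper_boundaries, cumulative))

print(points)
# [(25.5, 13), (33.5, 21), (41.5, 25), (49.5, 28), (57.5, 30)]
print([round(p, 1) for p in cumulative_percent])
```

The last cumulative frequency equals n, so the last cumulative percentage is 100.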
small values, this interval width could also be 1, or even 0.1 or 0.01. Once we
have decided on our intervals we can construct the stem and leaf plot.
Consider the following data: 11, 12, 9, 15, 21, 25, 19, 8. The first step is to
decide on interval widths – one obvious choice would be to go up in 10s. This
would give a stem unit of 10 and a leaf unit of 1. The stem and leaf plot is
constructed as below.
Stem units: 10, leaf digits: 1 (the value 8.000 is represented by 0|8)
0|89
1|1259
2|15
In a stem-and-leaf plot, each number is separated into a stem (usually the
entry’s leftmost digits) and a leaf (usually the rightmost digit). This is an example
of exploratory data analysis.
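The small stem-and-leaf plot above can be generated programmatically by splitting each value into a stem (tens) and a leaf (units). This is a minimal sketch in Python (an assumption of this example), using the data 11, 12, 9, 15, 21, 25, 19, 8 with a stem unit of 10 and a leaf unit of 1.

```python
from collections import defaultdict

data = [11, 12, 9, 15, 21, 25, 19, 8]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # stem unit 10, leaf unit 1
    stems[stem].append(leaf)

# One line per stem, including any stems that have no leaves.
lines = [
    f"{stem}|{''.join(str(leaf) for leaf in stems[stem])}"
    for stem in range(min(stems), max(stems) + 1)
]

print("\n".join(lines))
# 0|89
# 1|1259
# 2|15
```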
Using the IT students age data set, we can construct a stem and leaf plot as
follows
Stem-and-Leaf Plot
Ages of Students
Key: 1|8 = 18
1 | 888999
2 | 0011124799
3 | 002234789
4 | 469
5 | 14
Most of the values lie between 20 and 39.
This graph allows us to see the shape of the data as well as the actual values.
Stem-and-Leaf Plot
Constructing a stem-and-leaf plot that has two lines for
each stem.
Ages of Students
Key: 1|8 = 18
1 |
1 | 888999
2 | 0011124
2 | 799
3 | 002234
3 | 789
4 | 4
4 | 69
5 | 14
5 |
From this graph, we can conclude that more than 50% of the data lie between 20
and 34.
Revision Questions
Learning Activities
1. Read more on scatter plots and box plots and summarize their use, advantages
and disadvantages versus the methods we have presented in this
lecture. If possible, give some examples of a scatter plot and a box plot.
2. Search for at least five bad graphs and discuss why they are bad.
Chapter 4
Numerical Summaries of Data (Simple frequency
distributions)
Learning outcomes
Upon completing this topic, you should be able to:
• Choose the appropriate measure that can best describe a given data.
1. Introduction
Collected data need to be organized in such a way as to condense the information
they contain and show patterns of variation clearly. Precise methods of analysis
can be decided upon only when the characteristics of the data are understood; this
is the primary objective of the different techniques of data organization and
presentation, such as ordered arrays, tables and diagrams.
For frequency distributions of data to be more easily appreciated, and to draw
quick comparisons, it is often useful to arrange the data in the form of a table,
or in one of a number of different graphical forms. When analyzing voluminous
data collected from, say, an IT firm's records, it is quite useful to put them into
compact tables. Quite often, the presentation of data in a meaningful way is done
by preparing a frequency distribution. If this is not done, the raw data will not
convey any meaning, and any pattern in them may not be detected.
A frequency distribution is a table that shows classes or intervals of data
with a count of the number in each class. The frequency f of a class is the number
of data points in the class.
Array (ordered array) is a serial arrangement of numerical data in ascending or
descending order. This enables us to know the range over which the items are
spread and also gives an idea of their general distribution. An ordered array is an
appropriate way of presentation when the data set is small (usually fewer than 20
items).
The first step in looking at data is to describe the data at hand in some concise
way. In smaller studies this step can be accomplished by listing each data point.
In general, however, this procedure is tedious or impossible and, even if it were
possible would not give an over-all picture of what the data look like.
The basic problem of statistics can be stated as follows: Consider a sample
of data $x_1, \ldots, x_n$, where $x_1$ corresponds to the first sample point and $x_n$
corresponds to the nth sample point.
Presuming that the sample is drawn from some population P , what inferences
or conclusion can be made about P from the sample?
To answer this question, first, the data must be summarized as succinctly
(concisely, briefly) as possible, since the number of sample points is frequently
large and it is easy to lose track of the overall picture by looking at all the data
at once. One type of measure useful for summarizing data defines the center, or
2.1. Mean
• Arithmetic mean
The arithmetic mean is the sum of all observations divided by the number of
observations. If n measurements $x_1, x_2, x_3, \ldots, x_n$ have been taken on a variable,
the arithmetic mean of the observations is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
In the case of a frequency distribution where $x_i$ occurs with frequency $f_i$, the mean
$\bar{x}$ is obtained as follows:
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$$
Example. The members of an orchestra were asked how many instruments each
would be able to play. Below are the results of their responses.
2,5,2,4,1,1,1,2,1,3,3,2,1,2,1,1,2,2,4,3,2,1,2,3,1,4,2,3,1,1,2
Obtain the mean number of instruments played by a member of the orchestra.
Solution:
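One way to carry out this computation is sketched below in Python. It builds the simple frequency distribution first and then applies the formula $\bar{x} = \sum x_i f_i / \sum f_i$. The variable names are our own, and this is an illustration of the formula rather than the layout used in the original worked solution.

```python
from collections import Counter

responses = [2, 5, 2, 4, 1, 1, 1, 2, 1, 3, 3, 2, 1, 2, 1, 1,
             2, 2, 4, 3, 2, 1, 2, 3, 1, 4, 2, 3, 1, 1, 2]

freq = Counter(responses)                        # maps each x_i to its f_i
total_f = sum(freq.values())                     # sum of f_i
total_xf = sum(x * f for x, f in freq.items())   # sum of x_i * f_i
mean_instruments = total_xf / total_f

# Print the frequency table (x, f, x*f) followed by the mean.
for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print("mean =", round(mean_instruments, 2))
```

The result is identical to adding the raw responses and dividing by their count; the frequency table only shortens the arithmetic when the work is done by hand.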
14,830.7
Thus for 87 employees, we have 14,830.7 + 158.80 = 14,989.5 as the total
wage.
To obtain the mean, we have 14,989.5/87 = 172.29
Weighted mean
A common problem arises when the means of a number of groups need to be
combined to form a grand mean. For instance, suppose a company splits its
home sales into three regions, each region having a sales representative. Over a
particular period, rep A averaged 8642 per sale from 24 sales, rep B averaged 1129
from 37 sales and rep C averaged 10422 from 25 sales. The average sale overall is
$$\frac{(8642 \times 24) + (1129 \times 37) + (10422 \times 25)}{24 + 37 + 25} = \frac{509{,}731}{86} = 5{,}927.10 \text{ per sale}$$
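The grand mean can be checked with a short Python sketch. The figures are those used in the worked formula (8642, 1129 and 10422 per sale); the dictionary layout and names are our own.

```python
# (average value per sale, number of sales) for reps A, B and C
reps = {"A": (8642, 24), "B": (1129, 37), "C": (10422, 25)}

# Each group mean is weighted by the number of sales it is based on.
total_value = sum(avg * n for avg, n in reps.values())
total_sales = sum(n for _, n in reps.values())
grand_mean = total_value / total_sales

print(total_value, total_sales, round(grand_mean, 2))
```

Simply averaging the three group means would give each rep equal weight, which is only appropriate when every group contains the same number of sales.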
Note:
The arithmetic mean is, in general, a very natural measure of central location.
One of its principal limitations, however, is that it is overly sensitive to extreme
values. In this instance it may not be representative of the location of the great
majority of the sample points.
• Geometric mean
The geometric mean is a type of mean or average, which indicates the central
tendency or typical value of a set of numbers by using the product of their values.
It is defined as the nth root (where n is the count of numbers) of the product
of the numbers. Geometric mean of two numbers a, b is the square root of their
product.
$$G.M = \sqrt[n]{a_1 a_2 \cdots a_n}$$
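As a small illustration of the definition, the sketch below computes the geometric mean in Python. The sample values (4 and 9, then 2, 4 and 8) are hypothetical and chosen only because their products have exact roots; the function assumes all values are positive.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

gm_two = geometric_mean([4, 9])        # square root of 4 * 9 = 36
gm_three = geometric_mean([2, 4, 8])   # cube root of 2 * 4 * 8 = 64

print(round(gm_two, 4), round(gm_three, 4))
```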
• Harmonic mean
The harmonic mean is another measure of central tendency and, like the arithmetic
mean and the geometric mean, it rests on a mathematical footing and is useful for
quantitative data. The harmonic mean is defined as the quotient of the “number of
the given values” and the “sum of the reciprocals of the given values”. The harmonic
mean of two numbers a, b is the reciprocal of the arithmetic mean of their reciprocals.
That is,
$$\frac{1}{\frac{1}{2}\left(\frac{1}{a} + \frac{1}{b}\right)}$$
This can then be simplified to become
$$\frac{1}{\frac{1}{2}\left(\frac{1}{a} + \frac{1}{b}\right)} = \frac{1}{\frac{1}{2}\left(\frac{b + a}{ab}\right)} = \frac{2ab}{a + b}$$
Generally, the harmonic mean of n numbers is given by
$$\frac{1}{\frac{1}{n}\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right)}$$
Example. Purity visits her aunt Daniella who stays some 60 km away. She travels
to her aunt's home by bicycle at an average speed of 20 km/h. She returns in a
friend's car at an average speed of 40 km/h. What is her average speed for the
round trip?
Solution
When computing the average speed, a common mistake is to compute it as follows:
$\frac{1}{2}(20 + 40) = 30$ km/h. This is not correct!
There are two ways of computing this speed correctly.
First approach
1st leg: she does 60 km at 20 km/h, which takes 3 hours
2nd leg: she again does 60 km at 40 km/h, which takes 1.5 hours
Total distance covered is 120 km in 4.5 hours
Average speed is $\frac{120}{4.5} = 26.7$ km/h
Second approach
This can also be obtained using the harmonic mean formula as follows:
$$\frac{2ab}{a + b} = \frac{2 \times 40 \times 20}{40 + 20} = \frac{1600}{60} = 26.7 \text{ km/h}$$
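Both approaches can be verified with the following Python sketch, which computes the average speed from total distance and total time and then from the general harmonic mean formula. The function name `harmonic_mean` is our own.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

distance_km = 60          # one-way distance
speeds = [20, 40]         # km/h out and back

# First approach: total distance divided by total time.
total_time = sum(distance_km / s for s in speeds)    # 3 h + 1.5 h
avg_speed_direct = 2 * distance_km / total_time

# Second approach: harmonic mean of the two speeds.
avg_speed_hm = harmonic_mean(speeds)

print(round(avg_speed_direct, 1), round(avg_speed_hm, 1))
```

The two approaches agree only because both legs cover the same distance; with unequal distances the first approach still applies, while the plain harmonic mean does not.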
• Easy to calculate
• Center of gravity
Cons:
• Extreme cases (outliers) affect results a lot (e.g. mean income is often not
very meaningful).
2.2. Median
The median is the midpoint of a distribution; i.e., the observation such that half
of the observations are smaller and the other half are larger. It is sometimes used
instead of the mean, particularly when the histogram of the observations is skewed.
It is obtained by placing the observations in ascending order of magnitude and then
picking out the middle observation. When a set of data contains an even number of
items, there is no unique middle value, so the mean of the middle two items is used
to give a practical median. The median has the advantage that it is not influenced
by odd extreme observations.
COMPUTATION: Here are the steps we take for computing the median, M.
1. Order the data from smallest to largest.
2. Compute the median position location; i.e., compute median position
location = $\frac{n+1}{2}$
3. The median of the data is given by the ordered value in this position.
Example. Suppose we have the following data giving the final exam scores for a
class in a particular unit; 96 92 84 77 74 84 80 74. Find the median.
Solution
Here, n = 8 (the number of observations).
First, we order the data from low to high (or we can also order them from high
to low)
74 74 77 80 84 84 92 96
The median position location is $\frac{n+1}{2} = \frac{8+1}{2} = 4.5$
Thus, the median is the average of the 4th and 5th ordered values; i.e.,
the median is $M = \frac{80 + 84}{2} = 82$
Example. Obtain the median of the following set of data:
205,207,220,217,219,208,206,212,215,218,204
Solution
Arranging the data, we have 204,205,206,207,208,212,215,217,218,219,220
The median is the value in position $\frac{n+1}{2}$. We have n = 11, thus $\frac{11+1}{2} = 6$.
Thus the value in the 6th position is 212.
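The median position rule can be sketched in Python as follows. The function follows the computation steps above (order the data, find position (n+1)/2, average the two neighbouring values when the position is not a whole number); the names are our own.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                # 1-based median position
    if position.is_integer():
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]    # value just below the position
    upper = ordered[int(position)]        # value just above the position
    return (lower + upper) / 2

scores = [96, 92, 84, 77, 74, 84, 80, 74]
readings = [205, 207, 220, 217, 219, 208, 206, 212, 215, 218, 204]

print(median(scores), median(readings))
```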
Suppose we have frequencies, then the median can be obtained as shown in
the following example.
Example. The following table shows the records for number of computers not
available for use in the multimedia center for 80 consecutive days in an institution
of higher learning.
No. of computers not available 0 1 2 3 4 5 6
No. of days 15 24 18 12 8 2 1
To obtain the median of this distribution, we follow these steps.
Obtain the cumulative frequency of the distribution, then find the position of
the median using the above formula.
That is
No. of computers not available    0    1    2    3    4    5    6
No. of days                      15   24   18   12    8    2    1
Cumulative frequency, CF         15   39   57   69   77   79   80
n in this case is even. Using the positions $\frac{n}{2}$ and $\frac{n}{2} + 1$, we have $\frac{80}{2} = 40$ and
$\frac{80}{2} + 1 = 41$, so the median is the average of the values in positions 40 and 41.
Using the CF, we look for the first cumulative frequency that reaches or exceeds
40 and 41. In the above example, it is 57 in both cases. Both positions therefore hold
the value corresponding to this CF, and the median is 2.
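The same look-up can be sketched in Python. The helper `value_at` is a hypothetical name; it returns the value occupying a given position in the ordered data by scanning the cumulative frequencies.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5, 6]        # computers not available
days = [15, 24, 18, 12, 8, 2, 1]      # frequency of each value
cum_freq = list(accumulate(days))     # running total of the frequencies
n = cum_freq[-1]

def value_at(position):
    """Value occupying the given 1-based position in the ordered data."""
    for value, cf in zip(values, cum_freq):
        if cf >= position:
            return value

if n % 2 == 0:
    # Even n: average the values in positions n/2 and n/2 + 1.
    median_days = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
else:
    median_days = value_at((n + 1) // 2)

print(cum_freq, median_days)
```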
2.3. Mode
Sometimes a set of data is obtained where it is appropriate to measure a represen-
tative value in terms of ‘popularity’. The mode of a set of data is that value which
occurs most often or equivalently has the largest frequency and is appropriate for
all types of data. It is usually found by inspection. For discrete data this is easy.
The mode is simply the most common value. A data set may have no mode, one
mode (unimodal), two modes (bimodal) or more than two modes (multimodal).
Example. Given the following data, obtain the mode.
205,207,220,217,219,208,206,212,215,218,204, 205,219
Solution
Here, we have two modes, that is 205 and 219 as they both appear twice in
the data set.
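Finding the mode by inspection amounts to counting how often each value occurs, which can be sketched in Python as below. Treating a data set in which every value occurs exactly once as having no mode is one common convention and is an assumption of this sketch.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency (sorted)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []          # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

data = [205, 207, 220, 217, 219, 208, 206, 212, 215, 218, 204, 205, 219]
print(modes(data))
```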
65, 60, 60, 90, and 99. A quick scan of the data suggests that the amounts of
cholesterol vary somewhat in the different burgers. Find the amount of cholesterol
in a burger by calculating three measures of central tendency: median, mode,
and mean.
3. Measures of variability
A measure of central tendency is insufficient in itself to summarise data as it only
describes the value of a typical outcome and not how much variation there is in
the data. For example, the two data sets 6, 22, 38 and 21, 22, 23 both have
the same mean (22) and the same median (22). However, the first set of data
ranges considerably from this value while the second stays very close. They are
clearly very different data sets. Measures of variability or dispersion are descriptive
statistics that describe how similar a set of scores are to each other. They describe
how “spread out” a distribution is around its center. They include: range, inter-quartile
range (IQR), quartile deviation (semi inter-quartile range), mean absolute
deviation, variance and standard deviation.
3.1. Range
This is the difference between the maximum and minimum observations in the
data set. The range is used when we have ordinal data or are presenting results to
people with little or no knowledge of statistics. The range is rarely used in scientific
work as it is fairly insensitive, since it depends on only two scores in the set of data,
the maximum and minimum values. It is also possible that two very different sets
of data can have the same range:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n} = \frac{\sum x^2}{n} - \mu^2$$
The variance of a sample $s^2$ is calculated differently as:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2}{n - 1} - \frac{(\sum x)^2}{n(n - 1)}$$
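The following Python sketch applies these formulas to the two small data sets quoted at the start of this section (6, 22, 38 and 21, 22, 23), and checks that the definitional and shortcut forms of the sample variance agree. The function names are our own.

```python
import math

def population_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    # sum(x^2)/(n-1) - (sum x)^2 / (n(n-1))
    n = len(xs)
    return sum(x * x for x in xs) / (n - 1) - sum(xs) ** 2 / (n * (n - 1))

wide = [6, 22, 38]
narrow = [21, 22, 23]

for name, xs in (("wide", wide), ("narrow", narrow)):
    s2 = sample_variance(xs)
    print(name, max(xs) - min(xs), s2, round(math.sqrt(s2), 2))
```

Although both sets have mean 22, the first has a much larger variance, which is the difference the mean and median alone could not show.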
Exercise 11. Find the variance and standard deviation for the following: 9, 10,
5, 6, 5, 7, 8, and 5.
The following notes give more information on the variance and standard
deviation of a population and a sample.
Guidelines
In Words                                              In Symbols
1. Find the mean of the population data set.          $\mu = \frac{\sum x}{N}$
... variance.                                         $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to get        $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
   the population standard deviation.
Guidelines
In Words                                              In Symbols
1. Find the mean of the sample data set.              $\bar{x} = \frac{\sum x}{n}$
... variance.                                         $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance to get        $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
   the sample standard deviation.
Example:
The following data are the closing prices for a certain Computer
store stock on five successive Fridays. The population mean is 61.
Find the population standard deviation.
Always positive!
[Figure: two histograms of Frequency (0 to 14) against Data value (2 to 6), each with
mean 4. The left histogram has s = 1.18 and the right histogram has s = 0.]
4. Measures of location/Position
4.1. Quartiles
We have seen that the median M is the value which halves the data (the lower
half and the upper half). Informally, the first quartile is the median of the lower
half; similarly, the third quartile is the median of the upper half.
To calculate the quartiles, we do the following:
(a) The first quartile, Q1, is the median of the lower half of the data.
(b) The third quartile, Q3 is the median of the upper half of the data.
4.2. Percentiles
Percentiles are like quartiles, except that percentiles divide the set of data into 100
equal parts while quartiles divide the set of data into 4 equal parts. Percentiles
measure position from the bottom.
Percentiles are most often used for determining the relative standing of an
individual in a population or the rank position of the individual.
Quartiles
The three quartiles, Q1, Q2, and Q3, approximately divide
an ordered data set into four equal parts.
[Figure: ordered data on a scale from 0 to 100, with Q1 at 25, Q2 (the median) at 50
and Q3 at 75.]
Finding Quartiles
Example:
The quiz scores for 15 students are listed below. Find the first,
second and third quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38
Ordered data:
28 30 33 37 37 38 42 43 43 44 45 48 48 51 55
Q1 = 37, Q2 = 43, Q3 = 48
About one fourth of the students score 37 or less; about one
half score 43 or less; and about three fourths score 48 or less.
Interquartile Range
The interquartile range (IQR) of a data set is the difference
between the third and first quartiles.
Interquartile range (IQR) = Q3 – Q1.
Example:
The quartiles for 15 quiz scores are listed below. Find the
interquartile range.
Q1 = 37 Q2 = 43 Q3 = 48
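The quartiles and the interquartile range for the 15 quiz scores can be reproduced with the Python sketch below. It uses the "median of each half" rule described earlier and, when n is odd, leaves the median itself out of both halves; that last choice is one of several conventions in use and is an assumption here. The function names are our own.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]           # values below the median position
    upper = ordered[(n + 1) // 2:]      # values above the median position
    return median(lower), median(ordered), median(upper)

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Software packages often interpolate between data points instead, so their quartiles can differ slightly from the values obtained with this rule.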
[Figure: box-and-whisker plot of the quiz scores with five-number summary 28, 37, 43,
48, 55, drawn on an axis from 28 to 56.]
5. Summary
In this section, we have discussed the measures of central tendency, variation/dispersion
and the measures of location for simple frequency distribution. We have noted
that each measure has its pros and cons, and the choice of the measure that one
would be using for their data may be determined by some other factors. However,
computing these measures for simple frequency distributions may not be that chal-
lenging as compared to when we have grouped data. In the next section, we look
at similar issues with regard to numerical data summaries; but instead consider a
grouped frequency distribution data.
6. Assignment 1
1. The following data are the weights (in kg) of 50 computer components
leaving an IT store located in Nairobi Forestland.
10.4, 10.0, 9.3, 11.3, 9.6, 11.2, 10.5, 8.5, 10.4, 8.2, 9.3, 9.6, 10.3, 10.0, 11.5,
11.3, 10.8,
8.9, 10.0, 9.5, 10.0, 11.3, 11.0, 9.7, 10.6, 9.9, 10.2, 10.6, 10.2, 8.1, 8.7, 9.4,
10.9,
10.0, 9.9, 9.2, 11.6, 9.6, 9.5, 10.4, 10.6, 8.8, 10.1, 10.3, 9.7, 10.7, 10.6, 12.8,
10.6, 10.2
Revision Questions
Exercise 12. Discuss the importance of the measures of central tendency, loca-
tion and dispersion in data analysis.
Chapter 5
Numerical Summaries of Data (Grouped Frequency
Distributions)
Learning outcomes
Upon completing this topic, you should be able to:
• Calculate and interpret the measures of central tendency for grouped data.
1. Introduction
In our last lecture we looked at the measures of central tendency, measures of
dispersion/variability and location for ungrouped data. We saw different ways
of computing the mean, that is arithmetic mean, harmonic mean and geometric
mean. We also saw the importance of each of these measures in terms of the data
that we have at hand. In addition, we also saw how to compute the measures of
dispersion and location, and their merits and demerits given a situation. All these
were done using the simple frequency distributions, also referred to as ungrouped
data.
In this lesson, we focus on the same measures as discussed in the last lesson but
we now consider grouped frequency distributions. To help us solve the problems
that we shall deal with in this lesson, we refer to lesson 3 under the section
(Constructing a Frequency Distribution), where we showed how to construct
a frequency distribution (also known as grouped frequency) table. Therefore, in
this lesson, we shall go straight to the use of these tables, assuming that we are
now familiar with their construction.
Example. Consider the following data that gives the masses of 100 male JKUAT
IT students as recorded in the frequency table below.
Class intervals and class limits: The symbol showing a class such as 60-62 is
called a class interval. The numbers 60 and 62 are called class limits. The
value 60 is the Lower Class Limit (LCL) while 62 is the Upper Class Limit
(UCL). Sometimes, it may be theoretically possible to have a class with no
UCL or LCL. Such a class is an open-ended class, for instance the class “30
years and over”.
Class Boundaries: In our table, the masses were recorded correct to the
nearest kg. However, masses recorded in the interval 60-62 could theoretically
include masses from 59.5 to 62.5; e.g. 59.8 belongs to this class. The
numbers 59.5 and 62.5 are called class boundaries or true class limits. In
practice, a boundary is obtained by averaging the upper limit of one class and
the lower limit of the next.
Class size or interval or width: This is the difference between the upper class
boundary (UCB) and the lower class boundary (LCB), for instance 62.5 − 59.5 = 3.
Example. The speeds, to the nearest mile per hour, of 120 vehicles passing a check
point were recorded and grouped as follows:
Speed (mph)    21-25  26-30  31-35  36-45  46-60
No. of cars       22     48     25     16      9
Estimate the mean of this distribution.
Solution
First, we need to work out the mid-interval values. For the first interval, 21-25,
we use the LCB = 20.5 and UCB = 25.5.
Thus the midpoint for this class is midpoint = $\frac{1}{2}(20.5 + 25.5) = 23$
Thus we assume that all values in the interval 21-25 are now represented by
the value 23.
Similarly, we get the midpoints for the remaining classes as shown below.
Speed (mph)   Midpoint, x     f      fx
21-25         23             22     506
26-30         28             48    1344
31-35         33             25     825
36-45         40.5           16     648
46-60         53              9     477
Total                $\sum f = 120$   $\sum fx = 3800$
Hence, mean $\bar{x} = \sum fx / \sum f = 3800/120 = 31\frac{2}{3}$ mph
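The table above can be reproduced with a short Python sketch. It uses the fact that the midpoint of the class limits equals the midpoint of the class boundaries; the variable names are our own.

```python
classes = [(21, 25), (26, 30), (31, 35), (36, 45), (46, 60)]   # class limits
freqs = [22, 48, 25, 16, 9]

# (20.5 + 25.5) / 2 and (21 + 25) / 2 both give 23, so the limits suffice.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
fx = [f * x for f, x in zip(freqs, midpoints)]

grouped_mean = sum(fx) / sum(freqs)
print(midpoints, fx, round(grouped_mean, 2))
```

The result is an estimate, since every vehicle in a class is treated as travelling at the class midpoint.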
adjusted further if we find that all classes have the same class width (interval)
equal to a constant c. Therefore, $d_i = c u_i$
Thus we have $\bar{x} = a + \sum f d / \sum f = a + \sum f c u_i / \sum f = a + c\bar{u}$, where
$\bar{u} = \sum f u / \sum f$ is the mean of u.
The formula $\bar{x} = a + c\bar{u}$ is what we refer to as the coding method for computing
the mean and other measures from a frequency distribution table. It is mainly
useful when class intervals are equal!
Example. Consider the data that we saw in Example 1. Using the method of
assumed mean and coding method, we can obtain the mean as follows, taking
a = 67:
Mass (kg)   No. of students, f   Class mark, x   d = x − a   u = d/3    fu
60-62               5                 61            -6         -2      -10
63-65              18                 64            -3         -1      -18
66-68              42                 67             0          0        0
69-71              27                 70             3          1       27
72-74               8                 73             6          2       16
Total       $\sum f = 100$                                          $\sum fu = 15$
Thus, using the coding formula we have $\bar{x} = a + c\bar{u}$, where $\bar{u} = \sum fu / N =
15/100 = 0.15$ (remember $\sum f = N$)
Implying $\bar{x} = a + c\bar{u} = 67 + 3(0.15) = 67.45$ kg
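A quick way to convince yourself that coding does not change the answer is to compute the mean both ways. The Python sketch below uses the class marks and frequencies from the table, with a = 67 and c = 3; the variable names are our own.

```python
class_marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]
a, c = 67, 3                      # assumed mean and common class width

# Coded values u = (x - a) / c and their frequency-weighted mean.
u = [(x - a) / c for x in class_marks]
u_bar = sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)
mean_coded = a + c * u_bar

# Direct computation for comparison.
mean_direct = sum(f * x for f, x in zip(freqs, class_marks)) / sum(freqs)

print(u_bar, round(mean_coded, 2), round(mean_direct, 2))
```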
Given grouped frequency data, the best that we can do is to estimate the group/class
that contains the median item and hence obtain the ‘theoretical’ value. To achieve
this objective, we proceed as follows:
Step 1: Form a Cumulative Frequency (CF) column
Step 2: Find N/2
Step 3: Find the CF value that first exceeds N/2, which identifies the median
class M
Step 4: Calculate the median using the formula
$$\text{median} = L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M$$
Where:
$L_M$: is the lower class boundary of the median class
$F_{M-1}$: is the cumulative frequency of the class just prior to the median class
$f_M$: is the observed frequency of the median class
$C_M$: is the class interval/width of the median class
Example. Estimate the median for the following data which represents the ages
of 130 representatives who took part in a statistical survey.
Age in years   20-25  25-30  30-35  35-40  40-45  45-50
No. of reps        2     14     29     43     33      9
Solution:
Using the procedure illustrated above, we have
Age in years   20-25  25-30  30-35  35-40  40-45  45-50
No. of reps        2     14     29     43     33      9
CF                 2     16     45     88    121    130
Thus, N/2 = 130/2 = 65
The CF value which first exceeds 65 is 88; thus, the class represented by
CF = 88 is the median class.
The median class is therefore the class 35-40.
So, $\text{median} = L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M = 35 + \left(\frac{130/2 - 45}{43}\right)5 = 35 + \left(\frac{20}{43}\right)5 = 35 + 2.33 = 37.33$ years
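The four steps can be followed one by one in Python. This sketch assumes equal class widths of 5 years and uses our own variable names; it is meant to mirror the hand calculation rather than to be a general-purpose routine.

```python
from itertools import accumulate

lower_boundaries = [20, 25, 30, 35, 40, 45]
width = 5
freqs = [2, 14, 29, 43, 33, 9]

cum_freq = list(accumulate(freqs))                 # Step 1: CF column
half = cum_freq[-1] / 2                            # Step 2: N/2
# Step 3: index of the first class whose CF reaches N/2 (the median class)
m = next(i for i, cf in enumerate(cum_freq) if cf >= half)
f_before = cum_freq[m - 1] if m > 0 else 0         # CF of the class before it
# Step 4: interpolate within the median class
median_age = lower_boundaries[m] + (half - f_before) / freqs[m] * width

print(cum_freq, lower_boundaries[m], round(median_age, 2))
```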
As in the case of the mean and median, the mode for grouped data cannot be
determined exactly, but it can be estimated by use of the interpolation technique
or graphically using a histogram.
An estimate can therefore be obtained as follows:
Step 1: Determine the modal class (class with the highest frequency)
Step 2: Calculate $D_1$ = difference between largest frequency and the frequency
immediately preceding it.
Step 3: Calculate $D_2$ = difference between largest frequency and the frequency
immediately following it.
Step 4: Use the interpolation formula
$$\text{mode} = L_m + \left(\frac{D_1}{D_1 + D_2}\right) C_m$$
Where:
$L_m$: is the lower class boundary of the modal class
$C_m$: is the modal class interval/width
Example. Estimate the mode for the following data which represents the ages of
130 representatives who took part in a statistical survey.
Age in years   20-25  25-30  30-35  35-40  40-45  45-50
No. of reps        2     14     29     43     33      9
Solution:
$D_1 = 43 - 29 = 14$
$D_2 = 43 - 33 = 10$
$L_m = 35$
$C_m = 5$
Hence, $\text{mode} = 35 + \left(\frac{14}{14 + 10}\right)5 = 35 + \left(\frac{14}{24}\right)5 = 37.92$ years
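The interpolation formula can be sketched in Python as below. The function name `grouped_mode` is our own, and treating a missing neighbouring class (when the modal class is first or last) as having frequency 0 is an assumption of this sketch.

```python
def grouped_mode(boundaries, freqs):
    """boundaries: list of (lower, upper) class boundaries, in order."""
    m = freqs.index(max(freqs))                       # modal class index
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - before                            # D1
    d2 = freqs[m] - after                             # D2
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

ages = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
reps = [2, 14, 29, 43, 33, 9]

print(round(grouped_mode(ages, reps), 2))
```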
This can then be used to find the variance of the data as shown in the following
example.
Example. The data below relate to the number of successful sales made by the
salesmen employed by a large microcomputer firm in a particular quarter. Calculate
the standard deviation of the number of sales.
No. of sales         0 to 4  5 to 9  10 to 14  15 to 19  20 to 24  25 to 29
No. of salesmen, f        1      14        23        21        15         6
Solution:
We can solve this problem by first finding the midpoints, then computing the mean
and then the variance.
No. of sales   No. of salesmen, f   Midpoint, x     fx     $fx^2$
0 to 4                  1                 2           2        4
5 to 9                 14                 7          98      686
10 to 14               23                12         276     3312
15 to 19               21                17         357     6069
20 to 24               15                22         330     7260
25 to 29                6                27         162     4374
Total                  80                          1225   21,705
Hence, mean, $\bar{x} = \frac{1225}{80} = 15.31$ sales
$sd = \sqrt{\frac{21705}{80} - (15.31)^2} = \sqrt{271.31 - 234.40} = \sqrt{36.91} \approx 6.1$ sales
Note: In this case we have assumed that we are dealing with the whole population,
hence we divide by n and not n − 1.
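The column totals and the standard deviation can be checked with the Python sketch below, which sums the fx and fx² columns directly from the class midpoints and divides by n, as in the note above. The variable names are our own.

```python
import math

classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
freqs = [1, 14, 23, 21, 15, 6]

midpoints = [(lo + hi) // 2 for lo, hi in classes]    # 2, 7, 12, ...
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean_sales = sum_fx / n
# Population form: mean of the squares minus the square of the mean.
sd_sales = math.sqrt(sum_fx2 / n - mean_sales ** 2)

print(n, sum_fx, sum_fx2, round(mean_sales, 2), round(sd_sales, 1))
```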
We can use the coding method to find the standard deviation and variance of the
grouped data as follows:
Let $x = a + cu$ and $\bar{x} = a + c\bar{u}$
Therefore, variance $= \sum f (x - \bar{x})^2 / \sum f$. Substituting the above expressions, we
have
$= \sum f (a + cu - a - c\bar{u})^2 / \sum f$. Simplifying the equation, we have
$= c^2 \sum f (u - \bar{u})^2 / \sum f$
4. Measures of location/Position
4.1. Quartiles and Percentiles
We have already discussed how to find the median of grouped data. The process
of obtaining the quartiles and percentiles in grouped data is quite similar to what
we have seen with the median. For instance, if we are interested in the 1st quartile,
then instead of using N × 2/4 = N/2 we use N × 1/4 = N/4, and the rest remains
similar to the median computation procedure. Remember, the median is the 2nd
quartile. Similarly, for the percentiles, we work in hundredths of N. For instance, the
10th percentile will be at N × 10/100 = N/10.
Thus,
Q1 is the $\frac{1}{4}N$th value
Q2 is the $\frac{2}{4}N$th value
Q3 is the $\frac{3}{4}N$th value
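Since only the target position changes, a single function can serve for the median, the quartiles and any percentile. The Python sketch below takes the fraction p of the data lying below the required value (0.25 for Q1, 0.10 for the 10th percentile, and so on) and is applied to the ages of the 130 survey representatives used earlier. The function name `grouped_quantile` is our own.

```python
from itertools import accumulate

def grouped_quantile(boundaries, freqs, p):
    """Interpolated value below which a fraction p of the data lies."""
    cum = list(accumulate(freqs))
    target = p * cum[-1]                    # e.g. N/4 for the first quartile
    for i, cf in enumerate(cum):
        if cf >= target:                    # class containing the target
            lower, upper = boundaries[i]
            before = cum[i - 1] if i > 0 else 0
            return lower + (target - before) / freqs[i] * (upper - lower)

ages = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
reps = [2, 14, 29, 43, 33, 9]

q1 = grouped_quantile(ages, reps, 0.25)
q2 = grouped_quantile(ages, reps, 0.50)
q3 = grouped_quantile(ages, reps, 0.75)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Because each class carries its own width, the same function also works when the class widths are unequal.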
Exercise 13. Jua Kali Solicitors monitored the time spent on consultations with
a random sample of 120 of their clients. The times spent, to the nearest minute
are summarized in the following table.
Time             10-14  15-19  20-24  25-29  30-34  35-44  45-59  60-89  90-119
No. of clients       2      5     17     33     27     25      7      3       1
(a) Obtain the estimates of the median and quartiles of this distribution.
(b) Comment on the skewness of the distribution.
The mean is 4.6 errors per page and standard deviation is 2 errors.
(b) For the errors, y, on the further 50 pages
Mean = 4.4
Therefore, $4.4 = \sum y / 50$, which implies that $\sum y = 4.4 \times 50 = 220$
The standard deviation = 2.2
$= \frac{5032 + 1210}{250} - 4.56^2 = 4.1744$
Standard deviation $= \sqrt{4.1744} = 2.04$ (3 s.f.)
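The combination of the two sets of pages can be sketched in Python as follows. The size of the first set (200 pages) is not stated in the fragment above; it is deduced here from the combined total of 250 pages and should be read as an assumption. The helper name `sums_from_summary` is our own.

```python
import math

def sums_from_summary(n, mean, sd):
    """Recover sum(x) and sum(x^2) from n, the mean and the population sd."""
    return n * mean, n * (sd ** 2 + mean ** 2)

n1, n2 = 200, 50        # n1 = 250 - 50 is deduced, not quoted
sum_x, sum_x2 = sums_from_summary(n1, 4.6, 2.0)
sum_y, sum_y2 = sums_from_summary(n2, 4.4, 2.2)

n = n1 + n2
combined_mean = (sum_x + sum_y) / n
combined_var = (sum_x2 + sum_y2) / n - combined_mean ** 2
combined_sd = math.sqrt(combined_var)

print(round(sum_x2), round(sum_y2), round(combined_mean, 2), round(combined_sd, 2))
```

The key point is that means and standard deviations cannot be averaged directly; the sums of x and of x² are recovered first and then pooled.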
6. Summary
The relationship between mean, median and mode is as follows.
The median lies between the mean and mode but closer to the mean by a
factor of 2 to 1. Hence the relationship median − mode = 2(mean − median) is
approximately true. We can therefore express the following relationships:
• $\text{median} = \frac{2(\text{mean}) + \text{mode}}{3}$
• $\text{mean} = \frac{3(\text{median}) - \text{mode}}{2}$
To comment on the skewness of the distribution of a data set, we may use the
Quartile coefficient of skewness given by $\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$
Learning Activities
• Briefly show how you can use the coding method to obtain the mean and
standard deviation of a simple frequency distribution table.
• With relevant examples, briefly discuss how we can compute the quartiles
and percentiles for grouped frequency table.
Chapter 6
Introduction to Probability
Learning outcomes
Upon completing this topic, you should be able to:
• Define probability
• Calculate probabilities
1. Introduction
Probability is the language we use to model uncertainty. We all intuitively under-
stand that few things in life are certain. There is usually an element of uncertainty
or randomness around outcomes of our choices. For instance in business this un-
certainty can make all the difference between a good investment and a poor one.
Hence an understanding of probability and how we might incorporate this into
our decision making processes is important. In this lesson, we look at the logical
basis for how we might express a probability and some basic rules that probabilities
should follow. In subsequent lessons, we look at how we can use probabilities to
aid decision making. It is advisable that you revisit the set theory lesson to help
you understand this lesson better.
2. Definitions
The probability of a specific event is a mathematical statement about the likelihood
that it will occur. All probabilities are numbers between 0 and 1, inclusive; a
probability of 0 means that the event will never occur, and a probability of 1
means that the event will always occur. We often use the letter P to represent a
probability. For example, P (Rain) would be the probability that it rains. In other
cases P r is used to represent a probability. It is important to understand some
terms used in probability. They include:
Probability Experiment
An experiment is an activity where we do not know for certain what will happen,
but we will observe what happens. For example:
• We may ask someone whether or not they have used our IT products.
• Rolling a die and observing the number that is rolled is a probability experi-
ment.
Outcome
An outcome, or elementary event, is one of the possible things that can happen.
For example, suppose that we are interested in the shoe size of the next customer
to come into a shoe shop. Possible outcomes include “eight”, “twelve”, “nine and
a half” and so on. In any experiment, one, and only one, outcome occurs.
The result of a single trial in a probability experiment is the outcome.
Sample space
The sample space is the set of all possible outcomes. For example, it could be the
set of all shoe sizes. When rolling a die, the sample space has six outcomes:
{1, 2, 3, 4, 5, 6}.
Event
An event consists of one or more outcomes and is a subset of the sample space.
An event is usually denoted using a capital (uppercase) letter. For example “the
shoe size of the next customer is less than 9” is an event. It is made up of all of
the outcomes where the shoe size is less than 9. Of course an event might contain
just one outcome. We can set a letter say E to represent this event.
For instance, a die is rolled and event A is rolling an even number.
A simple event is an event that consists of a single outcome.
Example. A die is rolled. Event A is rolling an even number. This is not a simple
event because the outcomes of event A are {2, 4, 6}.
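Sample spaces and events map naturally onto sets, as the Python sketch below shows for the die example. The `probability` helper assumes that all outcomes are equally likely, which is an assumption added for this illustration; the function names are our own.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                      # rolling a die
event_a = {x for x in sample_space if x % 2 == 0}      # rolling an even number

def is_simple(event):
    """A simple event consists of exactly one outcome."""
    return len(event) == 1

def probability(event, space):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event & space), len(space))

print(sorted(event_a), event_a <= sample_space, is_simple(event_a))
print(probability(event_a, sample_space))
```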
3. Rules of probability
• Probabilities are usually expressed in terms of fractions or decimal numbers
or percentages.
• All probabilities are measured on a scale ranging from zero to one. The
probabilities of most events lie strictly between zero and one. An event with
probability zero is an impossible event and an event with probability one is
said to be a certain event.
• The collection of all possible outcomes, that is the sample space, has a
probability of 1. For example, if an experiment consists of only two outcomes
– success or failure – then the probability of either a success or a failure is
1. That is P(success or failure) = 1.
• With respect to an event E, the complementary event, denoted as $E^c$ or $E'$
(read as “E prime”) or $\sim E$, is the negation of the event E. For example,
consider the event that it will rain tomorrow. The complement of this
event is the event that it will not rain tomorrow. We should note that the
probabilities of an event E and its complement sum to 1, i.e.
$P(E) + P(E^c) = 1$
Example. There are 5 red chips, 4 blue chips, and 6 white chips in a basket. Find
the probability of randomly selecting a chip that is not blue.
Solution: P (selecting a blue chip) = 4/15 = 0.267
implying P (not selecting a blue chip) = 1 − 0.267 = 0.733
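The complement rule can be checked against a direct count with the Python sketch below. Exact fractions are used so that no rounding is involved until the final display; the variable names are our own.

```python
from fractions import Fraction

chips = {"red": 5, "blue": 4, "white": 6}
total = sum(chips.values())

p_blue = Fraction(chips["blue"], total)
p_not_blue = 1 - p_blue                                    # complement rule
p_direct = Fraction(chips["red"] + chips["white"], total)  # count non-blue chips

print(p_blue, p_not_blue, round(float(p_not_blue), 3))
```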
• Two or more events are said to be mutually exclusive if they cannot occur
simultaneously. In the example above, the outcomes success and failure are
mutually exclusive because both cannot occur at the same time. Two events
A and B are mutually exclusive if A ∩ B = ∅.
Example. Let A = the event that it is Monday, B = the event that it is Tuesday,
and C = the event that it is the year 2014. A and B are mutually exclusive events,
since it cannot be both Monday and Tuesday at the same time. A and C are not
mutually exclusive events, since it can be a Monday in the year 2014.
• Two events are said to be independent if the occurrence of one does not affect
the probability of the second occurring. If two events are independent, then
the probability that both will occur is equal to the product of their individual
probabilities. In other words, if A and B are independent, then
P (A ∩ B) = P (A) × P (B)
Example. If you toss a coin and look out of the window, it would be reasonable
to suppose that the events “get heads” and “it is raining” would be independent.
However, not all events are independent.
The larger the experiment, the closer this probability is to the “true” probability.
The frequentist view of probability regards probability as the long run relative
frequency (or proportion). So, in the defects example, the “true” probability of
getting a defective item is the proportion obtained in a very large experiment
(strictly an infinitely long sequence of trials). In the frequentist view, probability is
a property of nature and, since, in practice, we cannot conduct infinite sequences
of trials, in many cases we never really know the “true” values of probabilities. We
also have to be able to imagine a long sequence of “identical” trials. This does not
seem to be appropriate for “one-off” experiments like the launch of a new product.
For these reasons (and others) some people prefer the subjective or Bayesian view
of probability.
Example. A travel agent determines that in every 50 reservations she makes, 12
will be for a cruise. What is the probability that the next reservation she makes
will be for a cruise?
Solution:
P(cruise) = 12/50 = 0.24
For instance, Sally flips a coin 20 times and gets 3 heads. The empirical probability
is 3/20. This is not representative of the theoretical probability, which is 1/2.
As the number of times Sally tosses the coin increases, the empirical
probability will get closer and closer to the theoretical probability.
This is referred to as the Law of Large Numbers.
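The convergence described above can be illustrated with a short simulation. The following is a minimal Python sketch and is not part of the original notes; the function name `empirical_probability`, the seed and the numbers of tosses are illustrative assumptions.

```python
import random


def empirical_probability(n_tosses, seed=2100):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    rng = random.Random(seed)  # fixed (arbitrary) seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The proportion of heads should settle near the theoretical value 1/2
# as the number of tosses grows.
for n in (20, 200, 2000, 200000):
    print(n, empirical_probability(n))
```

With only 20 tosses the proportion can be far from 1/2, as in Sally's case; with many more tosses it is usually much closer.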
4.3. Subjective
We are probably all intuitively familiar with this method of assigning probabilities.
When we board an airplane, we judge the probability of it crashing to be sufficiently
small that we are happy to undertake the journey. Similarly, the odds given by
bookmakers on a football match reflect people’s beliefs about which team will win.
This probability does not fit within the frequentist definition as the match cannot
be played more than once.
One potential difficulty with using subjective probabilities is that they are
subjective, so the probabilities which two people assign to the same event can be
different. This becomes important if these probabilities are to be used in
decision making. For example, if you were deciding whether to launch a new product
and two people had very different ideas about how likely success or failure of this
product was, then the decision to go ahead could be controversial.
If both individuals assessed the probability of success to be 0.8 then the decision
to go ahead could easily be based on this belief. However, if one said 0.8 and the
other 0.3, then the decision is not straightforward. We would need a way to
reconcile these different positions.
Subjective probability is based on personal judgment, accumulated knowledge
and experience. For instance, medical doctors sometimes assign subjective
probabilities to the life expectancy of people with breast cancer.
5. Laws of probability
5.1. Multiplication law
The probability of two independent events E1 and E2 both occurring can be written
as
$$P(E_1 \cap E_2) = P(E_1) \times P(E_2),$$
and this is known as the multiplication law of probability.
For example, the probability of throwing a six followed by another six on two
rolls of a die is calculated as follows. The outcomes of the two rolls of the die are
independent. Let E1 denote a six on the first roll and E2 a six on the second roll.
Then
$$P(\text{two sixes}) = P(E_1 \text{ and } E_2) = P(E_1) \times P(E_2) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$
6. Conditional probability
So far we have only considered probabilities of single events or of several indepen-
dent events, like two rolls of a die. However, in reality, many events are related.
For example, the probability of it raining in 5 minutes time is dependent on whether
or not it is raining now. We need a mathematical notation to capture how the
probability of one event depends on other events taking place. We do this as
follows. Consider two events A and B. We write P (A|B) for the probability of
A given that B has already happened. We describe P (A|B) as the conditional
probability of A given B.
We can calculate these conditional probabilities using the formula
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$
that is, in terms of the probability of both events occurring, P(A and B), and the
probability of the event that has already taken place, P(B).
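As an illustration of the formula, the sketch below computes a conditional probability by counting outcomes. The choice of experiment (two fair dice, with A = "the sum is 10" and B = "the score is a double") and all function names are assumptions made for this example, not something worked in the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on an outcome (a, b)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def is_sum_ten(o):
    """Event A: the two scores add up to 10."""
    return o[0] + o[1] == 10


def is_double(o):
    """Event B: both dice show the same score."""
    return o[0] == o[1]


p_b = prob(is_double)
p_a_and_b = prob(lambda o: is_sum_ten(o) and is_double(o))
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)

print(p_a_given_b)        # 1/6
print(prob(is_sum_ten))   # 1/12, so knowing B changes the probability of A
```

Since P(A|B) differs from P(A) here, these two events are not independent.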
Exercise 15. Given that the events A and B are independent, copy and complete the
following contingency table. The results can be obtained as follows:
         A       A′      Total
B        3/20    y       u
B′       x       z       v
Total    1/4     t       1
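One way to approach this exercise is sketched below. It assumes the table is read with columns A, A′ and rows B, B′, so that P(A ∩ B) = 3/20 and P(A) = 1/4; this reading of the layout is an assumption, and the variable names simply mirror the unknowns in the table.

```python
from fractions import Fraction

p_a = Fraction(1, 4)          # column total for A (assumed reading of the table)
p_a_and_b = Fraction(3, 20)   # cell for "A and B" (assumed reading of the table)

# Independence: P(A and B) = P(A) * P(B), so P(B) = P(A and B) / P(A).
u = p_a_and_b / p_a           # P(B), row total for B
t = 1 - p_a                   # P(A'), column total for A'
v = 1 - u                     # P(B'), row total for B'
y = u - p_a_and_b             # P(A' and B)
x = p_a - p_a_and_b           # P(A and B')
z = v - x                     # P(A' and B')

print(u, t, v, y, x, z)
```

As a check on the independence assumption, each inner cell should equal the product of its row and column totals, for example z = t × v.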
7. Tree Diagrams
In some cases, especially where there are three or more different events being
considered, tree diagrams are an alternative to the contingency tables.
Tree diagrams or probability trees are simple clear ways of presenting proba-
bilistic information. Let us first consider a simple example in which a fair coin is
tossed twice. Suppose we are interested in the probability that we get a head on
both tosses. This probability can be calculated as
P(Head and Head) = P(Head on 1st toss) × P(Head on 2nd toss|head on 1st
toss)
This example can be represented as a tree diagram in which experiments are
represented by circles (called nodes) and the outcomes of the experiments as
branches. The branches are annotated by the probability of the particular out-
come.
Example. In a large farm, 20% of a particular kind of flower is red and 80% is
white. The farmer decides to take samples of flowers from the production of this
particular kind. What is the probability that he obtains;
(a) One or two red flowers in a sample of two?
(b) At least two red flowers in a sample of three?
Solution:
This information can be represented in the tree diagram as follows.
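[Tree diagram: starting from Start, each flower picked is red (R) with probability
1/5 or white (W) with probability 4/5; the tree has three levels of branches, one
for each flower picked.]
Resulting in the outcomes:
RRR RRW RWR RWW WRR WRW WWR WWW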
In this problem, we assume that the probabilities of these events remain the same
even after picking a small number of flowers from the production line.
(a) One or two red flowers in a sample of two is represented by
P (RR) + P (RW ) + P (W R)
But
P (RR) = 1/5 ∗ 1/5 = 1/25
P (RW ) = 1/5 ∗ 4/5 = 4/25
P (W R) = 4/5 ∗ 1/5 = 4/25
=⇒ P (RR) + P (RW ) + P (W R) = 1/25 + 4/25 + 4/25 = 9/25
Alternatively,
P (one or two red flowers) = 1 − P (no red flower) = 1 − P (W W )
= 1 − (4/5 ∗ 4/5) = 1 − 16/25 = 9/25
(b) P (RRR) + P (RRW ) + P (RW R) + P (W RR)
= (1/5)³ + (1/5 ∗ 1/5 ∗ 4/5) + (1/5 ∗ 4/5 ∗ 1/5) + (4/5 ∗ 1/5 ∗ 1/5) = 13/125
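The tree-diagram calculation can be checked by listing every path and multiplying the branch probabilities along it. The sketch below does this for the flower example; the helper names `path_probability` and `prob_red_count` are illustrative, not from the notes.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities, assumed constant from pick to pick as in the example.
P = {"R": Fraction(1, 5), "W": Fraction(4, 5)}


def path_probability(path):
    """Multiply the branch probabilities along one root-to-leaf path."""
    result = Fraction(1)
    for colour in path:
        result *= P[colour]
    return result


def prob_red_count(n_flowers, condition):
    """Add up the paths whose number of red flowers satisfies `condition`."""
    return sum(
        path_probability(path)
        for path in product("RW", repeat=n_flowers)
        if condition(path.count("R"))
    )


part_a = prob_red_count(2, lambda reds: reds >= 1)  # one or two red in two picks
part_b = prob_red_count(3, lambda reds: reds >= 2)  # at least two red in three picks
print(part_a, part_b)  # 9/25 13/125
```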
Example. A box has 6 blue beads and 4 red beads. Three beads are drawn at
random (without replacement). What is the probability that: (a) they are all blue
(b) there are exactly two blue beads (c) there is at least one blue bead
Solution:
When draws are made without replacement and the tree diagram would be complex,
with many branches, we can use combinations for quick computation of
probabilities.
(a) The total number of ways of selecting 3 beads from 10 is 10C3 = 120
The number of ways of selecting 3 blue beads from 6 is 6C3 = 20
Therefore, P (all blue) = 6C3 / 10C3 = 20/120 = 1/6
(b) Selecting 2 blue beads from 6: 6C2 = 15
Selecting 1 red bead from 4: 4C1 = 4
Therefore, P (exactly 2 blue) = (15 ∗ 4)/120 = 1/2
(c) P (at least one blue) = 1 − P (all red) = 1 − 4C3 / 10C3 = 1 − 4/120 = 29/30
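The same counts can be reproduced with Python's standard `math.comb` function. This is only a sketch to check the arithmetic of the bead example; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

blue, red, drawn = 6, 4, 3
total_ways = comb(blue + red, drawn)  # 10C3 = 120

p_all_blue = Fraction(comb(blue, 3), total_ways)                 # part (a)
p_two_blue = Fraction(comb(blue, 2) * comb(red, 1), total_ways)  # part (b)
p_at_least_one_blue = 1 - Fraction(comb(red, 3), total_ways)     # part (c)

print(p_all_blue, p_two_blue, p_at_least_one_blue)  # 1/6 1/2 29/30
```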
8. Bayes Theorem
Suppose we know P(A), P(A′) and also P(B|A) and P(B|A′); then we can
represent A and A′ on the first branches of a tree diagram and B and B′ on the
second branches. Can we then determine P(A|B)?
This problem can be solved by using Bayes' theorem. Thomas Bayes was
an English mathematician, and his theorem has given us a fundamental result of
statistical inference.
Mathematically, Bayes' theorem gives the relationship between the probabilities of
A and B, P(A) and P(B), and the conditional probabilities of A given B and
B given A, denoted by P(A|B) and P(B|A).
Commonly, Bayes' theorem is stated in two forms.
Simple:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \quad \text{for } P(B) \neq 0$$
(The meaning depends on the interpretation of probability ascribed to the
terms)
Extended:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')}$$
Example. Kamau has two gardeners, David and James. David comes on 1/3 of
the occasions and James 2/3 of the occasions. There is a probability of 1/10
that David will forget to water the flowers and a probability of 1/2 that James
will forget to water the flowers. One day, Kamau had to leave the house before
the gardener arrived. On his return, he found that the gardener had come and
gone, and also that the flowers were not watered. What is the probability that it
is James who came that day?
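Solution:
Let
D: David comes
J: James comes
W: Flowers watered
The tree diagram will then look like this:
[Tree diagram: the first branches are J (probability 2/3) and D (probability 1/3).
From J, the second branches are W′ (probability 1/2) and W (probability 1/2).
From D, the second branches are W (probability 9/10) and W′ (probability 1/10).]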
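The required probability is P(J|W′), the probability that James came given that the flowers were not watered. The sketch below applies the extended form of Bayes' theorem to the figures stated in the example; the variable names are illustrative and the code is a check on the arithmetic rather than the notes' own working.

```python
from fractions import Fraction

p_d = Fraction(1, 3)               # P(D): David comes
p_j = Fraction(2, 3)               # P(J): James comes
p_not_w_given_d = Fraction(1, 10)  # P(W'|D): David forgets to water
p_not_w_given_j = Fraction(1, 2)   # P(W'|J): James forgets to water

# Total probability that the flowers are not watered (sum over both gardeners).
p_not_w = p_not_w_given_j * p_j + p_not_w_given_d * p_d

# Bayes' theorem: P(J|W') = P(W'|J) P(J) / P(W').
p_j_given_not_w = p_not_w_given_j * p_j / p_not_w

print(p_not_w)          # 11/30
print(p_j_given_not_w)  # 10/11
```

So, under the stated probabilities, it is very likely (10/11) that James was the gardener who came that day.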
Exercise 16. A certain video store uses blank tapes bought from two sources,
say source A and source B. Suppose that the owner of the video store buys 30%
of the tapes from source A, where it is known that 5% of the video tapes are
defective, and 70% from source B, where 20% are usually defective. On recording some
movies on the tapes, the owner discovers that a certain tape is defective. What is
9. Summary
An experiment is a process that, when performed, results in one and only one of
many observations. The observations are called the outcomes of an experiment.
The collection of all possible outcomes of an experiment is called a sample space.
A sample space is denoted by S. Therefore, the sample space for an experiment
of inspecting a computer fan is written as S = {good, defective}, and for tossing
a coin twice it is S = {0, 1, 2} for the number of heads obtained.
For three or more events, it is easier to construct a probability tree than a
contingency table, since contingency tables are only practicable for two events.
(a) What is the sample space if the picked piece is not replaced?
(b) What is the sample space if the picked piece is replaced?
2. If 85% of people have a bowl of cereal for breakfast, 60% of people have
toast, and 50% of people have both cereal and toast for breakfast, what
percentage of people have neither cereal nor toast for breakfast?
5. Two events A and B are such that P (A) = 1/4, P (A|B) = 1/2 and
P (B|A) = 2/3. (a) Are A and B independent? (b) Are A and B Mutually
exclusive? (c) Find P (A ∩ B) (d) Find P (B).
6. A group of 50 BIT students were asked which of the three Computer Science
Journals, A, B or C they read. The results showed that 25 read A, 16 read
B, 14 read C. 5 read both A and B, 4 read both B and C, 6 read both C
and A and 2 read all three.
Learning Activities
• Two fair six-faced dice are rolled. Let T be the event that the sum is 10 and
D be the event that the score is a double. Construct a tree diagram with the first
branch being T, and also another tree diagram with the first branch being D.
Chapter 7
Discrete Probability Distribution
Learning outcomes
Upon completing this topic, you should be able to:
1. Introduction
An important part of any analysis of decision making under stochastic conditions
is a probability distribution. Probability distributions state the relative frequency of
occurrence of a set of mutually exclusive events. Probability distributions can be
univariate or multivariate. They give the relative frequency of observing a particular
event. We saw that surveys can be used to get information on population
quantities. In most cases, it is not possible to measure the variables on every member of
the population and so some sampling scheme is used. This means that there is
uncertainty in our conclusions. Before we can make inferences about populations,
we need a language to describe the uncertainty we find when taking samples from
populations. This can be done using probability distributions.
2. Random Variable
In many experiments the outcomes of the experiment can be assigned numerical
values. For instance, if you roll a die, each outcome has a value from 1 through
6. If you ascertain the midterm test score of a student in your class, the outcome
is again a number.
A random variable is just a rule that assigns a number to each outcome of
an experiment. These numbers are called the values of the random variable. We
often use letters like X, Y and Z to denote a random variable. Here are some
examples
• Discrete random variables that can take on only finitely many values (like
the outcome of a roll of a die) are called finite random variables.
A continuous random variable, on the other hand, can take on any value
within a continuous range or an interval, like the temperature, the height of an
athlete in centimeters, the yield of maize from an acre of land, or the weight of a
laptop in a supplier’s store.
The distinction between the capital letter X and small letter x is important;
X stands for the random variable in question, whereas x stands for a specific value
or outcome.
Example. Two tetrahedral dice, with faces labeled 1,2,3,4 are thrown and the
score noted, where the score is the sum of the two numbers on which the dice
land. Find the probability density function (pdf ) of X, where X is the random
variable ’the score where two dice are thrown’
Solution:
x 2 3 4 5 6 7 8
P (X = x) 1/16 2/16 3/16 4/16 3/16 2/16 1/16
Since
$$\sum P(X = x) = \frac{1}{16}(1 + 2 + 3 + 4 + 3 + 2 + 1) = 1$$
Thus X is a random variable.
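The table above can be rebuilt by listing all 16 equally likely outcomes and counting how often each score occurs. The following sketch does this; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 5)  # faces of a tetrahedral die: 1, 2, 3, 4

# Count how many of the 16 equally likely (first, second) outcomes give each score.
counts = Counter(a + b for a, b in product(faces, repeat=2))
pmf = {score: Fraction(n, 16) for score, n in sorted(counts.items())}

for score, p in pmf.items():
    print(score, p)  # fractions are shown in lowest terms, e.g. 2/16 appears as 1/8

print(sum(pmf.values()))  # 1, so the probabilities form a valid distribution
```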
3.2. Expectation
E(X), read as ‘E of X’, gives the average or typical value of X, known as the
expected value or expectation of X. X represents the random variable.
The mean of a discrete random variable is the mean of its probability distribu-
tion. This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value.
This is the value we expect to observe per repetition, if we repeat an experiment
several times. This value is a useful summary of the variable’s distribution.
Stating the expected value gives a general impression of some random variable
without giving full details of its probability distribution. The expected value of a
random variable X is symbolized by E(X) or µ, read as “E of X”, and is defined
as:
$$E(X) = \sum x\,P(X = x)$$
Exercise 17. A fruit machine consists of three windows which operate indepen-
dently. Each window shows pictures of fruits: Lemon, Apples, Cherries or Bananas.
The probability that a window shows a particular fruit is as follows:
P (Lemon) = 0.4
P (Cherries) = 0.2
P (Apple) = 0.1
P (Bananas) = 0.3
The rule for playing the game on the fruit machine is as follows: It costs Kshs
10 to play the game. A player will win Kshs 100 if he/she gets three Apples in a
row, Kshs 50 if he/she gets three Cherries in a row, Kshs 40 if he/she gets three
Lemons in a row and Kshs 80 if he/she gets two Apples and a Cherry in the game.
The order in which the fruits appear is not important. Based on this information,
would you expect to gain or lose if you play the game?
For instance,
$$E(10X) = \sum 10x\,P(X = x)$$
$$E(X^2) = \sum x^2\,P(X = x)$$
$$E\left(\frac{1}{X}\right) = \sum \frac{1}{x}\,P(X = x)$$
$$E(X - 4) = \sum (x - 4)\,P(X = x)$$
Example. The random variable X has the probability distribution shown below.
x 1 2 3
P (X = x) 0.1 0.6 0.3
Find;
i) E(X)
$$E(X) = \sum x\,P(X = x) = (1 \times 0.1) + (2 \times 0.6) + (3 \times 0.3) = 2.2$$
ii) E(3)
$$E(3) = \sum 3\,P(X = x) = (3 \times 0.1) + (3 \times 0.6) + (3 \times 0.3) = 3$$
iii) E(5X)
$$E(5X) = \sum 5x\,P(X = x) = (5 \times 0.1) + (10 \times 0.6) + (15 \times 0.3) = 11$$
Notice that 5E(X) = 5 ∗ 2.2 = 11
In general, for two constants a and b;
E(a) = a
E(aX) = aE(X)
E(aX + b) = aE(X) + b
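These rules can be checked numerically on the distribution from the example (x = 1, 2, 3 with probabilities 0.1, 0.6, 0.3). The sketch below uses exact fractions; the helper name `expectation` and the constants a = 2, b = 1 are illustrative choices.

```python
from fractions import Fraction

# Distribution from the example: P(X=1) = 0.1, P(X=2) = 0.6, P(X=3) = 0.3.
pmf = {1: Fraction(1, 10), 2: Fraction(6, 10), 3: Fraction(3, 10)}


def expectation(g, pmf):
    """E[g(X)]: the sum of g(x) * P(X = x) over all values x."""
    return sum(g(x) * p for x, p in pmf.items())


e_x = expectation(lambda x: x, pmf)            # E(X) = 11/5 = 2.2
e_const = expectation(lambda x: 3, pmf)        # E(3) = 3
e_5x = expectation(lambda x: 5 * x, pmf)       # E(5X) = 11 = 5 E(X)
e_lin = expectation(lambda x: 2 * x + 1, pmf)  # E(2X + 1) = 2 E(X) + 1

print(e_x, e_const, e_5x, e_lin)
```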
Exercise 18. X is the number of heads obtained when two coins are tossed.
Find (a) the expected number of heads (b) E(X²) (c) E(X² − X)
3.3. Variance
The variance of a random variable is a non-negative number which gives an idea
of how widely spread the values of the random variable are likely to be; the larger
the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely concentrated round the
expected value the distribution is; it is a measure of the ’spread’ of a distribution
about its average value. Variance is symbolized by V(X), Var(X) or σ² and is
defined as:
$$\mathrm{var}(X) = E\left[(X - \mu)^2\right]$$
or, equivalently,
$$\mathrm{var}(X) = E(X^2) - \mu^2$$
Example. Find the variance of the following distribution
x 1 2 3 4 5
P (X = x) 0.1 0.3 0.2 0.3 0.1
$$\mathrm{var}(X) = \sum x^2\,P(X = x) - \left[\sum x\,P(X = x)\right]^2$$
$$= (1^2 \times 0.1) + (2^2 \times 0.3) + \dots + (5^2 \times 0.1) - \left[(1 \times 0.1) + \dots + (5 \times 0.1)\right]^2$$
$$= 10.4 - 9 = 1.4$$
In general, if a and b are any two constants, then;
var(a) = 0
var(aX) = a² var(X)
var(aX + b) = a² var(X)
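The worked variance and the rule for var(aX + b) can be verified with a few lines of Python. This is a sketch using the distribution from the example; the helper names and the constants a = 3, b = 2 are illustrative assumptions.

```python
from fractions import Fraction

# Distribution from the example: x = 1..5 with probabilities 0.1, 0.3, 0.2, 0.3, 0.1.
pmf = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
       4: Fraction(3, 10), 5: Fraction(1, 10)}


def expectation(g, pmf):
    """E[g(X)]: the sum of g(x) * P(X = x) over all values x."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """var(X) = E(X^2) - [E(X)]^2."""
    mu = expectation(lambda x: x, pmf)
    return expectation(lambda x: x * x, pmf) - mu ** 2


var_x = variance(pmf)  # 7/5 = 1.4

# Distribution of aX + b: same probabilities, transformed values.
a, b = 3, 2
transformed = {a * x + b: p for x, p in pmf.items()}
var_transformed = variance(transformed)  # should equal a^2 * var(X)

print(var_x, var_transformed)
```

Adding the constant b shifts every value by the same amount, so it does not change the spread; only the factor a matters.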
The variance of X is pq
Example. A random variable whose value represents the outcome of a coin toss
(1 for heads, 0 for tails, or vice-versa) is a Bernoulli variable with parameter p,
where p is the probability that the outcome corresponding to the value 1 occurs.
For an unbiased coin, where heads or tails are equally likely to occur, p = 0.5.
There are only two possible outcomes: either the card is an Ace or not. Therefore,
n = 8, p = 4/52 = 1/13, q = 12/13 and x = 0, 1, 2, 3, 4, 5, 6, 7, 8
In the next few sections of the lesson, we discuss the binomial distribution,
mainly showing how to solve a number of problems. We also define the binomial
probability function.
Binomial Probability Formula
In a binomial experiment, the probability of exactly x
successes in n trials is
$$P(x) = {}^{n}C_{x}\,p^{x}q^{n-x} = \frac{n!}{(n-x)!\,x!}\,p^{x}q^{n-x}.$$
Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Three chips are selected, with
replacement. Find the probability that you select exactly one red chip.
p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 – p = 0.7
n = 3
x = 1
$$P(1) = {}^{3}C_{1}(0.3)^{1}(0.7)^{2} = 3(0.3)(0.49) = 0.441$$
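The binomial probability formula translates directly into code. The sketch below is a minimal implementation using the standard `math.comb` function, applied to the chip example; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p: nCx * p^x * q^(n - x), where q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


# Exactly one red chip in three draws with replacement, p = 0.3.
p_one_red = binomial_pmf(1, 3, 0.3)
print(round(p_one_red, 3))  # 0.441

# The probabilities over x = 0, 1, ..., n add up to 1 (up to rounding error).
print(sum(binomial_pmf(x, 3, 0.3) for x in range(4)))
```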
Finding Probabilities
Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected.
x 0 1 2 3 4
P (x) 0.24 0.412 0.265 0.076 0.008
a.) Find the probability of selecting no more than 3 red chips.
b.) Find the probability of selecting at least 1 red chip.
a.) P (no more than 3) = P (x ≤ 3) = P (0) + P (1) + P (2) + P (3)
= 0.24 + 0.412 + 0.265 + 0.076 = 0.993
b.) P (at least 1) = P (x ≥ 1) = 1 – P (0) = 1 – 0.24 = 0.76 (using the complement)
[Figure: bar chart of the probability distribution, with the number of red chips
(x = 0, 1, 2, 3, 4) on the horizontal axis and the probability on the vertical axis.]
5. Summary
A discrete probability distribution lists each possible value the random variable can
assume, together with its probability. A probability distribution must satisfy the
following conditions.
The mean of a discrete random variable is the mean of its probability distribution.
This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value. This is the value we expect
to observe per repetition, if we repeat an experiment several times. This value is
a useful summary of the variable’s distribution. Stating the expected value gives
a general impression of some random variable without giving full details of its
probability distribution.
The variance of a random variable is a non-negative number which gives an
idea of how widely spread the values of the random variable are likely to be; the
larger the variance, the more scattered the observations on average. Stating the
variance gives an impression of how closely concentrated round the expected value
the distribution is; it is a measure of the ’spread’ of a distribution about its
average value
Guidelines for Constructing a Discrete Probability Distribution
• Check that each probability is between 0 and 1 and that the sum is 1.
3. Suppose a fair six sided die is tossed 5 times. What is the probability of
getting exactly 2 fours?
(a) Use the binomial probability formula to complete the probability distri-
bution .
X 0 1 2 3 4
P (X = x) 0.316 0.422 ? 0.047 ?
(e) What is the probability that two or fewer seniors in the sample played
sports all four years?
(f) If a new random variable Y = X² + 2X, use the above table to obtain
E(Y ) and sd(Y )
Chapter 8
Relations (Correlation)
Learning outcomes
Upon completing this topic, you should be able to:
1. Introduction
It is frequently of interest to know whether two or more variables are related and, if
so, how they are related. For instance, we may ask whether there is a relationship
between a student's mean grade and class attendance record. If the two
variables are related, how are they related? Similarly, the president of a large
computer firm knows very well that there is a tendency for sales to increase
as advertising expenditure increases, but how strong is that tendency, and how
can he/she predict the approximate sales that will result from various advertising
expenditures?
The number of possible relationships between two continuous variables is
infinite. Of course there may be no relationship at all, but in the simplest case where
one does exist, it may be that high scores on one variable tend to accompany
high scores on the second.
For instance, consider age and vocabulary: the younger you are, the fewer words
you are likely to know, while the older you are, the more you know. This kind of
relationship is described as a positive relationship. A second kind of relationship is
one in which high values of one variable tend to accompany lower values of the
other, for example the degree of education and crime rate. Other kinds of
relationships may exist too, e.g. age and physical strength: strength increases up to
a certain age and then drops to lower values.
Two variables can correlate quite nicely, yet have no cause/effect relationship.
Correlation metrics are measures of associations between variables. It is important
to note that association is a concept that has no implication of causation.
In this lesson, and the next lesson, we will examine some widely employed
procedures that are used to analyze the relationship between the two variables
(e.g. sales and advertising). These procedures are part of what is known as
correlation and linear regression. We shall start by looking at correlation, then
look at regression in the next lesson.
2. Bivariate Data
So far we have confined our discussion to distributions involving only one
variable. However, in practical applications, we might come across certain sets
of data where each item of the set comprises the values of two or more
variables.
Bivariate data is a set of paired measurements of the form (x, y). For example:
• The series of sales revenue and advertising expenditure of the various branches
of an IT firm in a particular year.
In this kind of data, each pair represents the values of the two variables. Our
interest therefore, is to find a relationship (if it exists) between the two variables
under study.
Positive Correlation
If the values of the two variables deviate in the same direction i.e. if an increase (or
decrease) in the values of one variable results, on an average, in a corresponding
increase (or decrease) in the values of the other variable the correlation is said to
be positive. Some examples of series of positive correlation are:
Negative Correlation
Scatter diagrams will generally show one of six possible correlations between the
variables:
1. List the two values for each participant. You should do this in a column
format so as not to get confused
2. Compute the sum of all the x values, and compute the sum of all the y
values.
• Spearman’s rho
• Kendall’s tau.
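$$r_{xy} = \frac{\mathrm{Cov}(X, Y)}{S_x S_y} = \frac{S_{xy}}{S_x S_y}$$
where Cov(X, Y ) is the covariance between the variables X and Y, which is
given by
$$\mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$
This equation can be simplified to:
$$\mathrm{Cov}(x, y) = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{n}$$
where
$$\bar{x} = \frac{\sum x}{n}, \quad \bar{y} = \frac{\sum y}{n}, \quad S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{and} \quad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$$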
Example. Given the following data, where X is the Number of power blackouts at
night reported in a month and Y is the corresponding Number of crimes reported in
that month at Juja police station, Thika. Find the Pearson’s moment correlation
coefficient and comment on the results.
X 93 44 53 08 71 81 06 10 32 21
Y 45 62 12 28 92 84 73 03 51 32
Solution:
$$S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = 30.174$$
$$S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = 28.368$$
$$r_{xy} = \frac{\mathrm{Cov}(X, Y)}{S_x S_y} = \frac{368.52}{30.174 \times 28.368} = 0.4305$$
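The quantities in this solution can be reproduced from the raw data. The sketch below follows the covariance form of the formula, dividing by n throughout as in the notes; the variable names are illustrative.

```python
from math import sqrt

# Blackout (x) and crime (y) data from the example.
x = [93, 44, 53, 8, 71, 81, 6, 10, 32, 21]
y = [45, 62, 12, 28, 92, 84, 73, 3, 51, 32]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Covariance and standard deviations, each with divisor n.
cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / n)
s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / n)

r_xy = cov_xy / (s_x * s_y)
print(round(cov_xy, 2), round(s_x, 3), round(s_y, 3), round(r_xy, 4))
```

The positive value of r suggests a moderate positive linear association between the number of blackouts and the number of crimes reported; by itself it says nothing about cause and effect.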
The formula for the linear correlation coefficient r for data points can be written
in different forms. For instance,
$$r_{xy} = \frac{\mathrm{Cov}(X, Y)}{S_x S_y}$$
or
$$r_{xy} = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$
where n is the number of pairs of readings.
Using the second expression, we can obtain correlation as illustrated using the
following example;
Example. Suppose we have the following data on Age Versus the Price of Datsun
Z cars.
Age (X) 5 7 6 6 5 4 7 6 5 5 2
Price (Y) 80 57 58 55 70 88 43 60 69 63 118
We can use the data to answer the following questions:
2. Interpret the value of r obtained in the previous part in terms of the
linear relationship between age and price
Example-Solution
1. The Datsun Z data can be summarized as follows, with the last Row indi-
cating the sum of the respective columns (Σ), and n=11
x     y      xy      x²     y²
5     80     400     25     6,400
7     57     399     49     3,249
6     58     348     36     3,364
6     55     330     36     3,025
5     70     350     25     4,900
4     88     352     16     7,744
7     43     301     49     1,849
6     60     360     36     3,600
5     69     345     25     4,761
5     63     315     25     3,969
2     118    236     4      13,924
Σ: 58   761   3,736   326   56,785
Applying the formula
$$r_{xy} = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$
that is
$$r_{xy} = \frac{11(3{,}736) - (58)(761)}{\sqrt{11(326) - (58)^2}\,\sqrt{11(56{,}785) - (761)^2}}$$
therefore
$$r_{xy} = -0.957$$
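2. Since r = −0.957 is close to −1, there is a strong negative linear relationship
between age and price: older cars tend to have lower prices. This implies that the
data points are expected to be clustered closely about the regression line.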
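The column sums and the value of r for the Datsun Z data can be checked with the computational form of the formula. The following is a sketch with illustrative variable names.

```python
from math import sqrt

# Age (x) and price (y) of the Datsun Z cars in the example.
age = [5, 7, 6, 6, 5, 4, 7, 6, 5, 5, 2]
price = [80, 57, 58, 55, 70, 88, 43, 60, 69, 63, 118]
n = len(age)

# Column sums, as in the last row of the summary table.
sum_x = sum(age)
sum_y = sum(price)
sum_xy = sum(a * p for a, p in zip(age, price))
sum_x2 = sum(a * a for a in age)
sum_y2 = sum(p * p for p in price)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r_xy = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r_xy, 3))
```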
Exercise 19. A study was conducted to find whether there is any relationship
between the weight and blood pressure of an individual. The following set of data
was arrived at from a clinical study.
Weight 78 86 72 82 80 86 84 89 68 71
Blood pressure 140 160 134 144 180 176 174 178 128 132
Determine the coefficient of correlation for the set of data in the above table.
Exercise 20. Obtain the correlation coefficient of the following data, and com-
ment on your result.
Mean Temp (x) 14.2 14.3 14.6 14.9 15.2 15.6 15.9
Pirates (y) 35000 45000 20000 15000 5000 400 17
Again we need to find the value of the following:
Pearson's method is based on the assumption that the population being studied is
normally distributed. However, it is possible to avoid making any assumptions
about the populations being studied by ranking the observations according to size,
and basing the calculation on the ranks rather than upon the original observations.
It does not matter which way (direction) the items are ranked [ascending or
descending].
The formula to determine the coefficient of rank correlation, rs , is given by
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where n is the sample size and d is the difference in ranks of the variables xi and
yi .
Spearman’s rank correlation also lies between −1 and +1. It is a
non-parametric measure, i.e. it does not assume any particular distribution for the
population from which the sample comes.
Remark 1. To find Spearman’s rank correlation coefficient, we first give ranks to
the given values as per their hierarchy, then proceed to apply the formula. The
ranking order must be the same for the two variables, i.e either both ascending
or both descending .
Example. Based on the data in the earlier example on crime and blackouts in Juja, let us
check using Spearman’s rank correlation whether there is correlation between crime and
blackouts in the town. That is;
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$r_s = 1 - \frac{6 \times 110}{10(10^2 - 1)} = 0.3333$$
Once again the value of rs is positive, even though slightly different from the value
obtained using Pearson’s correlation method [rxy = 0.4305]. This difference
arises because the second method uses only the ranks of the observations rather
than their actual values, so it can be regarded as an approximation to
Pearson’s method. However, it is evident that by using this method, one is
still able to arrive at the same decision as with Pearson’s method.
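The ranking and the value of Σd² used in this example can be reproduced as follows. This is a sketch: the helper `ranks` is illustrative and assumes there are no tied values, which holds for this data set. Both variables are ranked in ascending order.

```python
def ranks(values):
    """Rank values from smallest (rank 1) to largest (rank n); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# Blackout (x) and crime (y) data from the Juja example.
x = [93, 44, 53, 8, 71, 81, 6, 10, 32, 21]
y = [45, 62, 12, 28, 92, 84, 73, 3, 51, 32]
n = len(x)

rank_x = ranks(x)
rank_y = ranks(y)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # differences in ranks

sum_d = sum(d)                       # always 0 when there are no ties
sum_d2 = sum(di ** 2 for di in d)    # the quantity used in the formula
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d, sum_d2, round(r_s, 4))  # 0 110 0.3333
```

Because Σd is always zero, it is the sum of the squared differences that carries the information in the formula.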
• The sum of the differences of ranks between two variables is equal to Zero.
Symbolically Σd = 0
Exercise 21. The data given below are obtained from student records, that
is Grade Point Average,GPA (x) and Graduate Record Exam, GRE score (y).
Calculate the rank correlation coefficient r for the data.
Subject 1 2 3 4 5 6 7 8 9 10
x 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260
Note that in the x row, we have two students having a grade point average of
8.6 and also in the y row; there is a tie for 2000.
4. Summary
Correlation Analysis is a method designed to measure the degree of association
between variables. E.g. correlation may express the degree of association
between verbal and mathematical scores of students on entering a college.
Data which are arranged in ascending order are said to be in ranks or ranked data.
The coefficient of correlation for such type of data is given by Spearman
rank difference correlation coefficient and is denoted by rs .
Measures of central tendency and measures of variability are not the only descrip-
tive statistics we are interested in using to get a picture of what a set of
scores looks like.
We have already learnt that knowing the values of the one most representative
score (central tendency) and a measure of spread or dispersion (variability) is
critical for describing the characteristics of a distribution. However, sometimes
we are just as interested in the relationship between variables or, to be more
precise, how the value of one variable changes when the value of another
variable changes. The way we express this interest is through the computation
of a simple correlation coefficient.
The Pearson correlation coefficient examines the relationship between two
variables, where both of these variables are continuous in nature. In other words,
they are variables that can assume any value along some underlying
continuum, such as height, age, test score, or income. But there is a host of
other variables that are not continuous. They are called discrete or
categorical variables, like race (such as black and white), social class (such as
high and low) and political affiliation (such as Democrat and
Republican). For these you need to use other correlation techniques, which we don’t
cover in this module.
There are several (easy but important) things to remember about the correlation
coefficient:
2. The absolute value of the coefficient reflects the strength of the correlation.
So a correlation of -0.70 is stronger than a correlation of +0.50.
4. A correlation always reflects the situation where there are at least two data
points (or variables) per case.
5. Another easy mistake is to assign a value judgment to the sign of the
correlation. Many students assume that a negative relationship is not good and
a positive one is good. That is why the terms “indirect” and “direct” are
sometimes used instead of “negative” and “positive”: they communicate the
meaning more clearly.
Remark 2. What is really interesting about correlations is that they measure the
extent to which one variable co-varies with another. So, if both variables are
highly variable (have lots of wide-ranging scores), the correlation between them
is more likely to be high than if they are not. That is not to say that lots of
variability guarantees a higher correlation, because the scores have to vary in
a systematic way. But if the variability is confined to one variable, then no
matter how much that variable changes, the correlation will be lower. For
example, let's say you are examining the correlation between academic
achievement in high school and first-year grades in college, and you look at
only the top 10 of the class. That top 10 is likely to have very similar grades,
leaving little variability and no room for one variable to vary as a function of
the other. Guess what you get when you correlate one variable with another that
does not change? You get no correlation, i.e. $r_{xy} = 0$.
Learning Activities
1. Refer to the Datsun Z cars data. Obtain Spearman’s rank correlation
coefficient and compare your result with the Pearson product-moment correlation
result given in the notes.
2. The following data relate to ages of husbands and their wives. Obtain
Pearson’s product-moment correlation coefficient and Spearman’s rank correlation
coefficient. Hence compare the two results.
Husband   18 19 25 27 30 35 36 40 42 44
Wife      16 17 24 23 22 30 28 36 40 41
3. Consider the following data and draw a scatter plot.
X   1.0 1.9 2.0 2.9 3.0 3.1 4.0 4.1 5.0
Y   10 99 100 999 1,000 1,001 10,000 10,001 100,000
4. The ranks of two sets of variables (Heights and Weights) are given below.
Calculate the Spearman rank difference correlation coefficient r.
          1 2 3 4 5 6 7 8 9 10
Heights   2 6 8 4 7 4 9.5 4 1 9.5
Weights   9 1 9 4 5 9 2 7 6 3
Chapter 9
Relations (Simple Linear Regression)
Learning outcomes
Upon completing this topic, you should be able to:
• Define correlation, regression and know the link between these two concepts
• Calculate the equations of least squares regression lines, and use them to
estimate any given values for a set of data
• Interpret the meaning of the values obtained in the regression equation, and
how the values link to the correlation coefficient.
1. Introduction to Regression
As indicated in the introductory section of the previous lesson, we shall now look
at regression in this lesson, and specifically at simple linear regression.
If two variables are significantly correlated, and if there is some theoretical basis
for doing so, it is possible to predict values of one variable from the other.
Regression analysis, in the general sense, means the estimation or prediction of
the unknown value of one variable from the known value of another variable. It is
one of the most important statistical tools and is used extensively in almost all
sciences: natural, social and physical.
“Regression analysis is a mathematical measure of the average relationship
between two or more variables in terms of the original units of the data.”
Definitions
Predictions
One of the primary advantages of knowing about a relationship between two
variables is that one can use the knowledge to make predictions. Specifically,
when one has exact knowledge of an individual's score on one of the two
variables, one can use the knowledge of the relationship to increase the
accuracy of a prediction of that individual's score on the other variable. By
the term prediction we mean here a “best guess” of what a single value or score
will be, e.g. the number of years that a 62-year-old woman with high blood
pressure will live, or the time an individual machine will run before it breaks
down. A prediction is therefore a guess about the value of a term to be drawn
from a specified population.
A predictor variable is one that provides relevant information for predicting what
scores will be on some other variable.
A predicted variable is one about which predictions are made. (An exact
relationship between the predictor and the predicted variable is essential if we
are to make the most accurate predictions possible.)
The simplest form of regression analysis, called simple linear regression or
straight line regression, involves the statistical modelling of the relationship
between a single input factor X (the “regressor”) and a single output variable Y
(the “response”).
• The plot of a line with slope 2 and intercept 1 is depicted in the following
figure:
• Even though the straight-line model is perfect for algebra class, in the real
world finding such a perfect linear relationship is next to impossible.
• Most real world relationships are not perfectly linear models, but imperfect
models where the relationship between x and y is more like the correlation plot
we saw earlier.
• In this case, the question literally is “Where do you draw the line?”.
Simple linear regression is the statistical technique to correctly answer this
question.
• Simple linear regression is the statistical model between X and Y in the real
world, where there is random variation associated with measured variable
quantities. To study the relationship between X and Y, the simplest
relationship is that of a straight line, as opposed to a more complex
relationship such as a polynomial.
• Therefore in most cases we want to try to fit the data to a linear model.
• The second step, once it is determined that a linear model is a good idea,
is to determine the best fitting line that represents the relationship.
• For any given value of X, the true mean value of Y depends on X, which can
be written $\mu_{y|x}$.
• In regression, the line represents mean values of Y, not individual data values.
Each observation $Y_i$ is independent of every other observation $Y_j$, $j \neq i$.
[Figure: scatter plot of gene2 (vertical axis) against gene1 (horizontal axis).]
• This method is subject to much error, and it is unlikely that we will produce
the “best fitting” line. Therefore a more sophisticated method is needed.
• Regression analysis can be thought of as being sort of like the flip side of
correlation. It has to do with finding the equation for the kind of straight
lines we have just looked at.
• Suppose we have a sample of size n and it has two sets of measures, denoted
by x and y. We can predict the values of y given the values of x by using
the equation $y^* = a + bx$, where
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}.$$
For a we have
$$a = \bar{y} - b\bar{x},$$
or rewritten as
$$a = \frac{\sum y - b\sum x}{n}.$$
The symbol $y^*$ refers to the value of y predicted by the regression equation
for a given value of x.
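The two formulas can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `fit_line` and the four sample points (chosen to lie exactly on a line with intercept 1 and slope 2) are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y* = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - (Sx)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = (Sy - b*Sx) / n, which equals y_bar - b*x_bar
    a = (sum_y - b * sum_x) / n
    return a, b


# Illustrative points lying exactly on y = 1 + 2x (hypothetical data)
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = fit_line(xs, ys)
print(a, b)  # prints 1.0 2.0
```

Because the sample points are perfectly linear, the fitted line recovers the intercept and slope exactly; with real data the line only passes near the points.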
• Suppose we have the linear equation y = 25 + 20x, which gives the total
cost, y, of a word processing job. Given the amount of time required, x, we
can use the equation to determine the exact cost of the job, y.
• However, things are rarely as simple as in this word processing example,
so more often than not we have to be content with rough predictions.
In fact, in many circumstances the variable being predicted will vary even
for a fixed value of the variable being used to make the prediction.
• For instance, we cannot predict the exact price of a Datsun Z car just by
knowing its age. Indeed, even for a fixed age, say three (3) years, the
price of a Datsun Z varies from car to car.
Example. Suppose we have the following data on Age Vs Price of Datsun Z’s.
Age (yrs)      5  7  6  6  5  4  7  6  5  5  2
Price ($100)   80 57 58 55 70 88 43 60 69 63 118
It’s useful to plot the data so that we can visualize the apparent relationship
between Age and price. Such a plot is known as a scatter diagram.
From the diagram, it’s clear that the points are not on a straight line, but it’s
apparent they are clustered about a straight line. Hence, we fit a straight line
to the data, and then we could use that line to predict the price of Datsun Z’s.
Since it is possible to draw many reasonable looking straight lines through the
cluster of points, we need a method to choose the “Best” line. The method used
is known as the least-squares criterion.
So how does it work?
Simple illustration;
Suppose we have two lines A and B drawn for a set of plots in a scatter
diagram, say;
Line A: y = 0.5 + 1.25x
Line B: y = −0.25 + 1.5x
Then we have the following predicted values and errors for the two lines:
x   y   ŷA     eA = y − ŷA   eA²       ŷB      eB = y − ŷB   eB²
0   2   0.5    1.5           2.25      −0.25   2.25          5.0625
1   4   1.75   2.25          5.0625    1.25    2.75          7.5625
2   6   3      3             9.00      2.75    3.25          10.5625
3   8   4.25   3.75          14.0625   4.25    3.75          14.0625
ΣeA² = 30.375                          ΣeB² = 37.25
Where,
x is the observed value of x
y is the observed value of y
eA is the error made if we use line A for prediction
eB is the error made if we use line B for prediction
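The comparison of lines A and B can be checked with a short Python sketch. The helper name `sse` is an assumption introduced for this illustration; the data and the two candidate lines are those of the table above.

```python
# Observed points from the illustration
xs = [0, 1, 2, 3]
ys = [2, 4, 6, 8]


def sse(a, b, xs, ys):
    """Sum of squared errors for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


sse_a = sse(0.5, 1.25, xs, ys)    # Line A: y = 0.5 + 1.25x
sse_b = sse(-0.25, 1.5, xs, ys)   # Line B: y = -0.25 + 1.5x
print(sse_a, sse_b)  # prints 30.375 37.25

# The least-squares criterion prefers the line with the smaller sum
better = "A" if sse_a < sse_b else "B"
print(better)  # prints A
```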
• The rule for choosing the best line among several possible lines is that we
choose the line with the smallest value of $\sum e^2$. This line will give the best
fit for the data at hand. This may not be an easy task, as we would be forced
to draw all the possible lines, which may not be possible. To solve the
• Hence, from the above example of lines A and B, we would choose line A,
as it has the smaller sum of squared errors, i.e. it is the line of best fit for
the data if we were to consider only these two lines.
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
and $a = \bar{y} - b\bar{x}$, or rewritten as
$$a = \frac{\sum y - b\sum x}{n}$$
We can then derive the line of best fit for the Datsun Z cars example, and also
answer the following questions;
Example. Refer to Age Vs Price data for the Datsun Z’s:
1. Determine the regression equation for the data; i.e. find the equation of the
regression line.
2. Describe the apparent relationship between Age and price for Datsun Zs
3. What does the slope of the regression equation represent in terms of the
prices for Datsun Zs?
4. Use the regression equation to predict the price for a two year-old Z and a
five-year old Z.
Solution
 x     y      xy    x²
 5    80     400    25
 7    57     399    49
 6    58     348    36
 6    55     330    36
 5    70     350    25
 4    88     352    16
 7    43     301    49
 6    60     360    36
 5    69     345    25
 5    63     315    25
 2   118     236     4
58   761   3,736   326   (column totals)
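As a check on the arithmetic, the column totals can be fed into the least-squares formulas. The sketch below is an illustration rather than the notes' own working; the variable names and the `predict` helper are assumptions. The slope it gives agrees with the value of −13.70 used in the interpretation of the slope, while the intercept and the two predictions are simply what the formulas yield for these totals.

```python
# Column totals from the table (n = 11 cars)
n = 11
sum_x, sum_y, sum_xy, sum_x2 = 58, 761, 3736, 326

# Slope and intercept of the least-squares line y* = a + b*x
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n


def predict(age):
    """Predicted price, in hundreds of dollars, for a car of the given age."""
    return a + b * age


print(round(b, 2), round(a, 2))  # slope about -13.70, intercept about 141.43
print(round(predict(2), 2), round(predict(5), 2))
```

Under these totals the predicted prices are roughly 114.03 (about $11,403) for a two-year-old Z and 72.92 (about $7,292) for a five-year-old Z.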
3. Here, we are to describe the apparent relationship between age and price for
Datsun Zs. Since the slope of the regression line is negative, we see that
the price tends to decrease as age increases. Any surprises?
4. For this part, we are to interpret the slope of the regression equation in
terms of the prices for Datsun Zs. To begin, recall that x represents age, in
years, and y represents price, in hundreds of dollars. The slope of −13.70, or
$1,370, indicates that Datsun Zs depreciate by an estimated $1,370 per year, at
least in the two- to seven-year-old range.
Remark 4. If you plan to find a regression line for a set of data points, first look at
a scatter diagram of the data. If data points do not appear to be scattered about
a straight line, do not determine a regression line.
2. Exercises
Exercise 22. Scores made by students in a statistics class in the mid-term and
final examinations are given here. Develop a regression equation which may be used
to predict final examination scores from the mid-term score.
Student    1  2  3   4  5  6  7  8  9  10
Mid-term   98 66 100 96 88 45 76 60 74 82
Final      90 74 98  88 80 62 78 74 86 80
3. Revision Questions
The following is a list of questions that will assist you in your revision.
Practice Problems:
Subject 1 2 3 4 5
Hamburgers 5 4 3 2 1
Beers 8 10 4 6 2
2. A horse owner is investigating the relationship between weight carried and the finish
position of several horses in his stable. Calculate r and R for the data given
Weight 11 11 12 11 11 11 11 12 10 10 11 11
carried
Position 0
2 3
6 0
3 5
4 0
6 5 7
4 3
2 6
1 8
4 0
1 0
3
Finished
3. The top and bottom numbers which may appear on a die are as follows. Calculate r
and R for these values. Are the results surprising?
Top 1 2 3 4 5 6
Bottom 5 6 4 3 1 2
X 38 42 29 31 28 15 24 17 19 11 8 19 3 14 6
y 4 3 11 5 9 6 14 9 10 15 19 17 10 14 18
b) What does this statistic mean concerning the relationship between death
anxiety and religiosity?
c) What percent of the variability is accounted for by the relation of these two
variables?
5. The data given below are obtained from student records (Grade Point Average (x)
and Graduate Record Exam score (y)). Calculate the regression equation and compute
the estimated GRE scores for GPA = 7.5 and 8.5.
Subject 11 12 13 14 15 16 17 18 19 20
X 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260
6. A horse was tested to see how many minutes it takes to reach a point from
the starting point. The horse was made to carry luggage of various weights on 10
trials. The data collected are presented in the table below. Find the regression
equation between the load and the time taken to reach the goal. Estimate the time
taken for loads of 35 kg, 23 kg and 9 kg. Are the answers in agreement with
your intuitive feelings? Justify.
Trial 1 2 3 4 5 6 8 8 9 10
Number 11
Weight 23 16 32 12 28 29 19 25 20
(in Kgs) 13
Time taken (in mins) 22 16 47 13 39 43 21 32 22
7. A study was conducted to find whether there is any relationship between the weight
and blood pressure of an individual. The following set of data was arrived at from a
clinical study.
Serial 1 2 3 4 5 6 8 8 9 10
Number 78
Weight 86 72 822 80 86 84 89 68 71
Blood 140 160 134 144 180 176 174 178 128 132
Pressure
8. It is assumed that achievement test scores should be correlated with student's
classroom performance. One would expect that students who consistently perform
well in the classroom (tests, quizzes, etc.) would also perform well on a standardized
achievement test (0 - 100 with 100 indicating high achievement (x)). A teacher
decides to examine this hypothesis. At the end of the academic year, she computes a
correlation between the students achievement test scores (she purposefully did not
look at this data until after she submitted students grades) and the overall G.P.A.(y)
for each student computed over the entire year. The data for her class are provided
below.
X 98 96 94 88 01 77 86 71 59 6 8 7 7 7 8 8 7 9 9 6
Y 3.6 2.7 3.1 4.0 3.2 3.0 3.8 2.6 3.0 3 4 9
2 1 5 2
3 2 6 3
2 2 5 1 3 3
2 3 0 2
1
. . . . . . . . . . .
a) Compute the correlation coefficient. 2 7 1 6 9 4 4 8 7 2 6
d) What would be the slope and y-intercept for a regression line based on this
data?
the nearest dollar) and degree of customer satisfaction (on a scale of 1 - 10 with a 1
being not at all satisfied and a 10 being extremely satisfied). The researcher only
includes programs with comparable types of services. A sample of the data is
provided below.
Dollars 11 18 17 15 9 5 12 19 22 25
Satisfaction 6 8 10 4 9 6 3 5 2 10
b) What does this statistic mean concerning the relationship between amount of
money spent per month on internet provider service and level of customer
satisfaction?
10. It is hypothesized that there are fluctuations in norepinephrine (NE) levels which
accompany fluctuations in affect with bipolar affective disorder (manic-depressive
illness). Thus, during depressive states, NE levels drop; during manic states, NE
levels increase. To test this relationship, researchers measured the level of NE by
measuring the metabolite 3-methoxy-4-hydroxyphenylglycol (MHPG in micro gram
per 24 hour) in the patient's urine experiencing varying levels of mania/depression.
Increased levels of MHPG are correlated with increased metabolism (thus higher
levels) of central nervous system NE. Levels of mania/depression were also recorded
on a scale with a low score indicating increased mania and a high score increased
depression. The data is provided below.
MHPG 980 1209 1403 1950 1814 1280 1073 1066 880 776
Affect 22 26 8 10 5 19 26 12 23 28
b) What does this statistic mean concerning the relationship between MHPG
levels and affect?
d) What would be the slope and y-intercept for a regression line based on this
data?
e) What would be the predicted affect score if the individual had an MHPG level
of 1100? of 950? of 700?
4. Learning Activities
1. Daniel computed the following statistics based on the amount (X) in millions
(Kshs) that he invested in his cyber café business, and the income (Y) in
millions (Kshs) generated.
$n = 10$, $\sum x_i = 93$, $\sum x_i^2 = 999$, $\sum x_i y_i = 293$, $\sum y_i = 28$, $\sum y_i^2 = 90$
• Using the data, fit a linear regression line of the income (y) generated on
the amount (x) invested.
• Use the regression equation to determine how much Daniel would realize if
he invested Kshs 2.5M and comment on your results.
Chapter 10
Revision Questions
Learning outcomes
Upon completing this lesson, you should be able to:
1. Introduction
So far we have looked at all the areas that we needed to cover in this module. In
this chapter, we are going to look at some revision questions that may help us in
reviewing the course content, and be able to handle the queries that may come at
the end of the semester or in the subsequent years of study.
The questions that we provide in this section are divided into two sections.
In Section One, we list some example questions that you may attempt, while in
Section Two, we give some exercises with solutions for your revision purposes.
Example. Two tetrahedral dice are rolled together once and the sum of the scores
facing down was noted. Find the pmf of the random variable ‘the sum of the
scores facing down.’
Solution:
       1  2  3  4
  1    2  3  4  5
  2    3  4  5  6
  3    4  5  6  7
  4    5  6  7  8
i.e. X = {2, 3, 4, 5, 6, 7, 8}
Therefore the pmf of X is given by the table below:
x           2     3     4     5     6     7     8
P(X = x)   1/16  2/16  3/16  4/16  3/16  2/16  1/16
This can also be written as a function:
$$P(X = x) = \begin{cases} \dfrac{x-1}{16}, & \text{for } x = 2, 3, 4, 5 \\[4pt] \dfrac{9-x}{16}, & \text{for } x = 6, 7, 8 \end{cases}$$
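The pmf can also be obtained by brute-force enumeration of the 16 equally likely outcomes. The sketch below is an illustration, assuming fair dice with faces numbered 1 to 4; the variable names are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]

# Count how many of the 16 ordered outcomes give each sum
counts = Counter(a + b for a, b in product(faces, repeat=2))

# Convert counts to exact probabilities (Fraction reduces 2/16 to 1/8, etc.)
pmf = {s: Fraction(c, 16) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)
```

Each probability agrees with the piecewise rule: (x − 1)/16 for sums up to 5 and (9 − x)/16 for sums from 6 to 8.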
2. Sample Questions
Example. The following table gives the Age(X), and Price (Y) in (£’000) of cars
driven by Eleven IT department Lecturers who attended a retreat in Mombasa in
the month of January, 2014. Use the data to answer the following questions:
Age (X)            5   7   6   6   5   4   7   6   5   5   2
Price (Y) (£’000)  8.0 5.7 5.8 5.5 7.0 8.8 4.3 6.0 6.9 6.3 11.8
(i) Determine the regression equation of Price (Y) on Age (X)
(ii) Estimate the price of a car aged 3 years
Exercise 32. The following data relate to ages of husbands and their wives.
Obtain the Spearman’s rank correlation coefficient and comment on your result.
Husband’s Age   18 19 25 27 30 35 36 40 42 44
Wife’s Age      16 17 24 23 22 30 28 36 40 41
Exercise 33. Interviews with 36 IT students at JKUAT revealed that 10 of the 16
ladies and 16 of the 20 men preferred the course Distributed Systems to Artificial
Intelligence. Determine the probability that the first person interviewed was either
a man or someone who preferred Distributed Systems to Artificial Intelligence,
assuming each of the 36 people were equally likely to have been the first to be
interviewed.
Exercise 34. The frequency distribution table below shows the number of com-
puters available for use in a computer lab for a period of 60 days.
Number of Days        0-12   12-24   24-36   36-48   48-60
Number of Computers   3      10      ?       8       4
If the mode of the data was 29 days, find the number of computers available
between “24-36” days.
Exercise 35. A conservative design team and an innovative design team were
asked separately to design a new product within a period of one month. From past
experience, we know that: The probability that the conservative team is successful
is 2/3, while the probability that the innovative team is successful is 1/2. The
probability that at least one team is successful is 3/4. Assuming that exactly one
successful design is produced, what is the probability that it was designed by the
innovative design team?
Exercise 36. An urn contains 10 balls: 4 red and 6 blue. A second urn contains
16 balls with an unknown number of blue balls. A single ball is drawn from each
urn. The probability that both balls are of the same colour is 0.44. Calculate the
number of blue balls in the second urn.
Exercise 37. Explain what you understand by simple random sampling and
stratified random sampling.
Problem 5. A and B are two identical boxes. Box A contains 5 Diamond rings
and 4 Gold rings. Box B contains 6 Diamond rings and 5 Gold rings. A box is
chosen at random, and from it a ring is drawn at random and then put into the
other box. A ring is then drawn at random from this latter box. Illustrate the
information given in a probability tree diagram, and hence determine the probability
that the first ring drawn is a Diamond ring given that the second ring is a Gold
ring.
Exercise 38. Past records show that 5% of Bridging Mathematics students who
pass the JKUAT certificate examination join an IT course at JKUAT Westland
campus. If a group of Six students, selected at random from former Bridging
Mathematics students had passed the examination, what is the probability that;
(a) None of these former students will be doing an IT course at JKUAT West-
land campus?
(b) At most two will join the IT course at JKUAT Westland campus?
Exercise 39. Giving relevant examples, define the following terms;
(i) Secondary data
(ii) Discrete random variable
(iii) An event
(iv) Probability space
Problem 6. The mean and variance of a random variable X are 4 and 2.25 respec-
tively. Find the mean and standard deviation of the random variable Y=15X+5.
Problem 7. The discrete random variable X has the probability distribution shown
below.
x 0 1 2 3
P(x) 0.2 p q r
If F (1) = 0.3 and E(X) = 1.7, find V ar(X).
Exercise 40. The following table shows the marks of ten candidates in Physics
and Mathematics. Find the product-moment correlation coefficient and comment
on your result.
Mark in Physics (x)       18 20 30 40 46 54 60 80 88 92
Mark in Mathematics (y)   42 54 60 54 62 68 80 66 80 100
Exercise 41. Two judges rank the eight photographs in a competition as follows:
Photograph A B C D E F G H
1st Judge 2 5 3 6 1 4 7 8
2nd Judge 4 3 2 6 1 8 5 7
Calculate Spearman’s coefficient of rank correlation for the data.
Exercise 42. A Personnel Manager in the city of Mwaimbasa divides the
company days into “Good”, “Better” or “Best”. He estimates that the probability
of a good day is 0.5 and that 30% of the company days are better. He has also
calculated that the company’s average revenue on the three types of days is Kshs
40 million, Kshs 130 million and Kshs 220 million respectively. If the company’s
average running cost per day is Kshs 80 million, calculate the company’s expected
profit per day.
Exercise 43. The Governor of Kiambu County has allocated funds for entertain-
ment for his office to the tune of 52 Million. When asked to explain the rationale
of his budget, he gave several explanations. Of interest to note, he mentioned
that within the few days that he had been in office, each of the five working days
had several people of different background visiting his office. He went on to check
the records with the secretary and noted the number of visitors coming to his
office per day at once, and their respective frequencies, which he later converted
to probabilities as shown below.
X 1 2 3 4 5
P (X = x) 0.3 a 0.1 0.2 b
Using this probability distribution, the Governor computed the expected value
as, E(X) = 3.1. Hence;
Example. What is the purpose of the measures of central tendency and dispersion?
Solution:
(1) Descriptive statistics (numerical measures) has two branches: measures of
central tendency and measures of dispersion.
(2) Measures of central tendency are the mean, median, and mode. These
show the central location around which all data points tend to congregate. They
determine the central value around which the various items concentrate, which is
used for describing the data
(3) Measures of dispersion are the range, standard deviation, variance, inter-
quartile range to name a few. These show how the data spreads or varies from
the central point. Together these two branches describe the data.
4. Conclusion
In this lesson we have tried to give a list of some of the questions that one may face
while dealing with the issues learnt in this course. While we encourage you to do as
many questions from the list as possible, if not ALL, we also discourage students
from memorizing the questions given here. Rather, we encourage students to look
at every question listed here, both problem sets and exercises, and always ensure
that they are aware of the concept being tested in each question, for instance,
binomial distribution and its applications. That way, you will have learnt more
regarding Probability and Statistics I, and be on a good footing for the coming
course in Probability and Statistics II.
We wish you the very best in the course and always ask questions whenever
something is not clear to you!
—Adieu—
@HKOA
Solutions to Exercises
Exercise 1. S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Exercise 2. P = {..., −2, −1, 0, 1, 2, 3, 4, 5}
Exercise 3. Q = {3, 4, 5, 6, 7, 8, 9}
Exercise 4. For the first word, we have
$$\frac{13!}{3!\,4!\,2!\,2!\,1!\,1!} = 10810800$$
arrangements, while for the second word we have
$$\frac{11!}{5!\,2!\,2!\,1!\,1!} = 83160$$
arrangements.
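The two counts can be verified with a short Python sketch. The helper name `arrangements` is an assumption for this illustration, and the letter counts are read off the denominators of the two fractions; `math.prod` requires Python 3.8 or later.

```python
from math import factorial, prod


def arrangements(letter_counts):
    """Distinct arrangements of a word whose letters repeat as given."""
    n = sum(letter_counts)
    return factorial(n) // prod(factorial(c) for c in letter_counts)


first = arrangements([3, 4, 2, 2, 1, 1])   # 13 letters in total
second = arrangements([5, 2, 2, 1, 1])     # 11 letters in total
print(first, second)  # prints 10810800 83160
```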
Exercise 5. The possible number of committees can be found in two different
approaches.
Option 1
That is, the number of all possible committees is $^{10}C_5 = 252$. A committee
without any woman (that is, all 5 men) can be chosen in $^{6}C_5 = {}^{6}C_1 = 6$
ways. Hence the total number of ways of obtaining at least one woman is
252 − 6 = 246 ways.
Option 2
For the second option, we can get all the possible committees with at least
one woman as follows:
$^{6}C_1 \times {}^{4}C_4 = 6$ (1 man and 4 women, total 5 members chosen)
$^{6}C_2 \times {}^{4}C_3 = 60$ (2 men and 3 women, total 5 members chosen)
$^{6}C_3 \times {}^{4}C_2 = 120$ (3 men and 2 women, total 5 members chosen)
$^{6}C_4 \times {}^{4}C_1 = 60$ (4 men and 1 woman, total 5 members chosen)
Thus the number of all possible ways is 6 + 60 + 120 + 60 = 246 ways.
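Both options can be confirmed numerically. The sketch below is an illustration that assumes, as the counts in the solution imply, a pool of 6 men and 4 women from which a committee of 5 is chosen.

```python
from math import comb

# Option 1: all committees minus the all-male committees
total = comb(10, 5)
no_women = comb(6, 5)
option1 = total - no_women

# Option 2: sum over the number of men m (1 to 4), with 5 - m women
option2 = sum(comb(6, m) * comb(4, 5 - m) for m in range(1, 5))

print(option1, option2)  # prints 246 246
```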
Exercise 7. (a) This is a cluster sample because each department is a naturally
occurring subdivision. (b) This is a convenience sample because you are using
the lecturers that are readily available to you. (c) This is a stratified sample
because the lecturers are divided by department and some from each department
are randomly selected.
Exercise 8. A pie chart is mostly useful for displaying a relative frequency
(percentage) distribution, much like a bar chart, while a histogram is useful for
revealing the general pattern or distribution of (quantitative) values.
Exercise 10. 1. Finding the median.
Arrange the numbers in order from lowest to highest, and find the number in
the middle of the set. 50, 50, 55, 60, 60, 60, 65, 65, 80, 90, and 99
$$\bar{x} = \frac{9 + 10 + 5 + 6 + 5 + 7 + 8 + 5}{8} = \frac{55}{8} = 6.875$$
xi    xi − x̄    (xi − x̄)²
5     −1.875    3.516
6     −0.875    0.766
5     −1.875    3.516
7     0.125     0.016
8     1.125     1.266
9     2.125     4.516
10    3.125     9.766
5     −1.875    3.516
$\sum (x - \bar{x})^2 = 26.878$
Hence $s^2 = \dfrac{26.878}{8 - 1} \approx 3.839$
The standard deviation is $\sqrt{3.839} = 1.959$
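The same figures can be reproduced with Python's standard `statistics` module, which uses the n − 1 divisor for `variance` and `stdev`. This is a minimal check on the eight values in the table, not part of the original solution.

```python
import statistics

data = [9, 10, 5, 6, 5, 7, 8, 5]

mean = statistics.mean(data)       # 55 / 8
var = statistics.variance(data)    # sample variance, divisor n - 1
sd = statistics.stdev(data)        # square root of the sample variance

print(mean, round(var, 3), round(sd, 3))
```

Working with unrounded deviations gives a sum of squares of exactly 26.875, so small differences in the third decimal place of intermediate values are due to rounding only.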
Exercise 13. For grouped continuous data with n = 120,
$Q_1$ is the $\tfrac{1}{4}n$th value, i.e. the 30th value,
$Q_2$ is the $\tfrac{2}{4}n$th value, i.e. the 60th value,
$Q_3$ is the $\tfrac{3}{4}n$th value, i.e. the 90th value.
Our table should now look like this
Time CF
9.5-14.5 2
14.5-19.5 7
19.5-24.5 24
24.5-29.5 57
29.5-34.5 84
34.5-44.5 109
44.5-59.5 116
59.5-89.5 119
89.5-119.5 120
For solution (a)
Q1 lies in the interval 24.5-29.5 (width = 5)
There are 33 items in this interval, and the 30th value is 30 − 24 = 6 items into it
So $Q_1 = 24.5 + \frac{6}{33} \times 5 \approx 25.4$ min
Q2 lies in the interval 29.5-34.5 (width = 5)
There are 27 items in this interval, and the 60th value is 60 − 57 = 3 items into it
So $Q_2 = 29.5 + \frac{3}{27} \times 5 \approx 30$ min
Q3 lies in the interval 34.5-44.5 (width = 10)
There are 25 items in this interval, and the 90th value is 90 − 84 = 6 items into it
So $Q_3 = 34.5 + \frac{6}{25} \times 10 = 36.9$ min
This is an implementation of the formula
$$\text{median} = L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M$$
Solution (b)
Q3 − Q2 = 6.9 min, Q2 − Q1 = 4.6 min
Since Q3 − Q2 > Q2 − Q1, it implies that we have a positive skew
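The interpolation used for the three quartiles can be sketched as a small Python function. The function name `grouped_quantile` and the tuple layout are assumptions for this illustration; the class boundaries and cumulative frequencies are those of the table above.

```python
# (lower boundary, upper boundary, cumulative frequency)
classes = [
    (9.5, 14.5, 2), (14.5, 19.5, 7), (19.5, 24.5, 24),
    (24.5, 29.5, 57), (29.5, 34.5, 84), (34.5, 44.5, 109),
    (44.5, 59.5, 116), (59.5, 89.5, 119), (89.5, 119.5, 120),
]


def grouped_quantile(position, classes):
    """Linear interpolation for the value at a given position."""
    prev_cf = 0
    for lower, upper, cf in classes:
        if position <= cf:
            freq = cf - prev_cf  # number of items in this class
            return lower + (position - prev_cf) / freq * (upper - lower)
        prev_cf = cf
    raise ValueError("position exceeds the total frequency")


n = 120
q1, q2, q3 = (grouped_quantile(n * k / 4, classes) for k in (1, 2, 3))
print(round(q1, 1), round(q2, 1), round(q3, 1))
```

Without intermediate rounding Q2 is about 30.06, so the two differences come out near 6.84 and 4.65 rather than 6.9 and 4.6; the conclusion of a positive skew is unchanged.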
Exercise 14. $\sum x = 101.4$, $\sum x^2 = 102.83$, $n = 100$
Therefore, $\bar{x} = \sum x / n = 101.4/100 = 1.014$
= 1.5
(c) $E(X^2 - X) = \sum (x^2 - x)P(X = x) = E(X^2) - E(X) = 1.5 - 1 = 0.5$
Exercise 19. We can organize the table as shown below;
Thus;
Exercise 19
Exercise 20. Thus, we have
Exercise 20
Exercise 21. Now we arrange the data in descending order, and then rank them
1, 2, 3, ..., 10 accordingly.
In case of a tie, the rank of each tied value is the mean of all positions they
occupy.
In x, for instance, the two values of 8.6 occupy ranks 5 and 6, so each has a
rank of (5 + 6)/2 = 5.5.
Similarly, in y, the two values of 2000 occupy ranks 9 and 10, so each has rank 9.5.
Now we come back to our formula,
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
So
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{10(100 - 1)} = 1 - 0.0727 = 0.9273$$
Note: If we are provided with only ranks, without the values of x and y, we
can still find the Spearman rank difference correlation $r_s$ by taking the
differences of the ranks and proceeding in the manner shown above.
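The ranking with ties and the final coefficient can be reproduced in Python. This sketch assumes the data are the grade point averages (x) and GRE scores (y) listed in revision question 5, which contain the ties described above (8.6 twice, 2000 twice); the helper name `average_ranks` is introduced for the illustration.

```python
def average_ranks(values):
    """Rank in descending order; tied values share the mean of their positions."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        positions = [i + 1 for i, u in enumerate(order) if u == v]
        ranks[v] = sum(positions) / len(positions)
    return [ranks[v] for v in values]


# Assumed data: GPA (x) and GRE score (y) from revision question 5
x = [8.3, 8.6, 9.2, 9.8, 8.0, 7.8, 9.4, 9.0, 7.2, 8.6]
y = [2300, 2250, 2380, 2400, 2000, 2100, 2360, 2350, 2000, 2260]

rx, ry = average_ranks(x), average_ranks(y)
sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
n = len(x)
rs = 1 - 6 * sum_d2 / (n * (n * n - 1))
print(sum_d2, round(rs, 4))  # prints 12.0 0.9273
```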
Exercise 22. We want to predict the final exam scores from the mid-term scores.
So let us designate ‘y’ for the final exam scores and ‘x’ for the mid-term exam
scores. We open the following table for the calculations.
Exercise 45. This question can be solved using the formula for the harmonic mean,
i.e.
$$\text{Harmonic mean} = \frac{2ab}{a + b} = \frac{2 \times 80 \times 50}{80 + 50} = 61.54 \text{ km/h}$$
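The result can be cross-checked against the standard library. This sketch assumes the two values are speeds of 80 km/h and 50 km/h over equal distances, which is the usual setting for a harmonic-mean question; the exercise statement itself is not reproduced here.

```python
from statistics import harmonic_mean

speeds = [80, 50]  # km/h, assumed to apply over equal distances

hm = harmonic_mean(speeds)
by_formula = 2 * 80 * 50 / (80 + 50)  # 2ab / (a + b)

print(round(hm, 2), round(by_formula, 2))  # prints 61.54 61.54
```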