
Sta 2100 Probability and Statistics I (Course Outline With Notes)

The document outlines the STA 2100: Probability and Statistics I course at Jomo Kenyatta University, detailing its purpose, course description, learning outcomes, and assessment methods. It covers fundamental concepts in probability and statistics, including data collection, frequency distributions, measures of central tendency, and regression analysis. The course aims to equip students with the skills to analyze data and apply statistical methods in Information Technology.


JOMO KENYATTA UNIVERSITY

OF
AGRICULTURE & TECHNOLOGY

SCHOOL OF OPEN, DISTANCE AND


eLEARNING
IN COLLABORATION WITH

KAREN CAMPUS

STA 2100: PROBABILITY AND STATISTICS I

LAST REVISION ON August,2020

V. S. ANDIKA
([email protected])

P.O. Box 62000, 00200


Nairobi, Kenya
STA 2100: PROBABILITY AND STATISTICS I
Course Purpose
The purpose of this course is to introduce students to the basic concepts of prob-
ability and statistics, and focus on the applications of statistical knowledge in the
field of Information Technology. At the end of the course the student should be
proficient in handling probability and probability distributions, including expecta-
tion and variance of a discrete random variable; representing data graphically and
handling summary statistics, simple correlation and best fitting line.

Course description
1. Data: sources, collection, classification and processing.

2. Frequency distributions and graphical representation of data, including bar
diagrams, histograms and stem-and-leaf diagrams.

3. Measures of central tendency and dispersion, Skewness and Kurtosis.

4. Introduction to Probability: Classical and axiomatic approaches to probability.
Compound and conditional probability, including Bayes’ theorem.

5. Concept of discrete random variable: expectation and variance.

6. Regression and Correlation. Fitting data to a best straight line.

Prerequisite: SMA 2104: Mathematics for Science and SMA 2102: Calculus I

Learning outcomes
Upon completion of this course you should be able to:

1. Define and evaluate probability and probability distribution

2. Identify Mutually exclusive, and Independent events

3. Evaluate expected value and variance from a discrete probability distribution

4. Proficiently represent data graphically and handle summary statistics

5. Explain the purpose of measures of central tendency, variability, skewness
and Kurtosis

6. Acquire basic knowledge of the method of least squares and correlation theory

Instruction methodology
• Online Tutorials

• Case studies

Core References:
1. P. S. Mann. Introductory Statistics. John Wiley & Sons Ltd, 2001. ISBN-13:
9780471395119.

2. S. Ross. A First Course in Probability, 4th ed. Prentice Hall, 1994. ISBN-10:
0131856626, ISBN-13: 9780131856622.

3. J. Crawshaw & J. Chambers. A Concise Course in A-Level Statistics, with
Worked Examples, 3rd ed. Stanley Thornes, 1994. ISBN 0-534-42362-0.

4. G. M. Clarke & D. Cooke. A Basic Course in Statistics, 5th ed. Arnold, 2004.
ISBN-13: 978-0-340-81406-2, ISBN-10: 0-340-81406-3.

Additional Reference:
• Uppal, S. M., Odhiambo, R. O. & Humphreys, H. M. Introduction to Prob-
ability and Statistics. JKUAT Press, 2005.

Assessment information
The module will be assessed as follows:

• 10% of marks from two (2) assignments to be submitted online

• 20% of marks from one written CAT to be administered at JKUAT main
campus or one of the approved centres

• 70% of marks from written Examination to be administered at JKUAT main
campus or one of the approved centres

Table of Contents

1. Review of Basic Set theory, Permutations and Combinations


1. Basic Set Theory
1.1. Introduction
1.2. Definitions
2. Permutations and Combinations
2. Data: Collection and Sampling
1. Introduction
2. Data: Source/types and collection methods
2.1. Definitions
2.2. Types of Data
2.3. Data Classification
2.4. Methods of Data Collection
3. Presentation of Data
1. Introduction
2. Tables
3. Graphical presentations
3.1. Pie chart
3.2. Bar Charts
3.3. Histograms
3.4. Frequency Polygons
3.5. Cumulative Frequency Polygons (Ogive)
3.6. Scatter Plots
3.7. Box plots
3.8. Stem and leaf plots
4. Numerical Summaries of Data (Simple frequency distributions)
1. Introduction
2. Measures of central tendency
2.1. Mean
• Arithmetic mean • Geometric mean • Harmonic mean
2.2. Median
2.3. Mode

3. Measures of variability
3.1. Range
3.2. Semi-interquartile range
3.3. Variance and standard deviations
3.4. Coefficient of Variation
3.5. Pearson’s Measure of Skewness (psk)
4. Measures of location/Position
4.1. Quartiles
4.2. Percentiles
5. Summary
6. Assignment 1
5. Numerical Summaries of Data (Grouped Frequency Distributions)
1. Introduction
2. Measures of central tendency for grouped data
2.1. Computing the Mean
2.2. Assumed mean and coding method
2.3. Median for a grouped frequency
2.4. Mode for a grouped frequency:
3. Measures of variability for grouped data
3.1. Variance and Standard deviation
3.2. Mean absolute deviation (MAD)
4. Measures of location/Position
4.1. Quartiles and Percentiles
5. Combining sets of Data
6. Summary
6. Introduction to Probability
1. Introduction
2. Definitions
3. Rules of probability
4. How do we measure probability?
4.1. Classical (or theoretical) probability
4.2. Frequentist or Empirical (or statistical) probability
4.3. Subjective
5. Laws of probability


5.1. Multiplication law


5.2. Addition law
6. Conditional probability
6.1. Independence of two compound events
7. Tree Diagrams
8. Bayes Theorem
9. Summary
9.1. Revision questions or guidelines
7. Discrete Probability Distribution
1. Introduction
2. Random Variable
3. Discrete probability distribution
3.1. Finding probabilities
3.2. Expectation
• Expectation of a function of a discrete random variable
3.3. Variance
3.4. The cumulative distribution function, F (X)
4. Special discrete probability distributions
4.1. Bernoulli distribution
4.2. Binomial distribution
5. Summary
6. Revision questions or guidelines
8. Relations (Correlation)
1. Introduction
2. Bivariate Data
2.1. Scatter Diagrams and Correlation
2.2. Correlation Coefficient
3. Methods of determining correlation
3.1. Pearson’s Product Moment Correlation Coefficient
3.2. Spearman’s Rank Correlation
4. Summary
9. Relations (Simple Linear Regression)
1. Introduction Regression
1.1. Simple Linear (least squares) Regression Model


1.2. Assumptions of linear regression


1.3. Fitting the Regression Model/Equation
1.4. Regression Equation:
2. Exercises
3. Revision Questions
4. Learning Activities
10. Revision Questions
1. Introduction
2. Sample Questions
3. Exercises with solutions
4. Conclusion
Solutions to Exercises

STA 2100 Probability and Statistics I

Chapter 1
Review of Basic Set theory, Permutations and
Combinations

Learning outcomes
Upon completion of this section, you should be able to:

• Define a set and other set related terminologies

• Define a Venn diagram

• Perform basic set operations using the Venn diagram, union and intersection
of sets

• Define and perform basic operations of permutation and combinations


1. Basic Set Theory


1.1. Introduction
In set theory, we are concerned with the concept of a set, essentially a collection
of objects that we call elements. Generally, set theory forms the foundation of
nearly every other part of mathematics. The study of probability mostly deals
with combining different events and studying these events alongside each other.
The way these different events relate to each other determines the methods and
rules that we should follow when we’re studying their probabilities. As a result, it is
important that we understand the concept of sets to assist us in fully understanding
the concept of probability.

1.2. Definitions
Set
A set is a collection of objects, which are called the members or elements of that
set. Given a set, we say that an object belongs (or does not belong) to the set,
or is (or is not) in the set. A set must be well defined: given any object, it must
be possible to determine whether the object belongs to the collection or not.
Example. The following are some examples of sets:

• The set of all integers

• The set of all JKUAT IT students

• The set of all letters of the alphabet

• The set of even integers

• {Jane, David, James, Anthony}

Elements of a set

The members of a set are called elements. We use upper case (capital) letters
to denote sets and lower case (small) letters to denote elements of a set. For
instance, if a is an element of the set A, we write this as a ∈ A (read as a belongs
to A) and if a is not an element of A, we write this as a ∉ A (read as a does not
belong to A).


Set denotation/specification

There are different ways of describing a set. We usually use braces {}
to denote a set. For instance, the set consisting of the elements 1, 2, 3, 4, 5, 6 could be
written as {1, 2, 3, 4, 5, 6} or {1, 2, 3, ..., 6} or {x | x ∈ N, x ≤ 6}, where N is the
set of the natural numbers. As we can see from this example, we also use the set
builder notation {x | x ...} to denote a set, where the bar “|” replaces the
words “such that”.
There are three main ways to specify a set:

1. by listing all its members (list notation). This method is mostly suitable
for finite sets. In this case we list the names of the elements of the set, separate
them by commas and enclose them in braces, for instance {1, 12, 45},
{George Washington, Bill Clinton}, {a, b, d, m}.

2. by stating a property of its elements (predicate notation), for instance
{x | x is a natural number and x < 8}, read as “the set of all x such that x is a
natural number and is less than 8”. The second part of this notation is a property
that the members of the set share (a condition or predicate which holds for
members of this set).

3. by defining a set of rules which generates (defines) its members (recursive
rules). (Always safe.) Example: the set E of even numbers greater than 3 is
generated by the rules: (a) 4 ∈ E; (b) if x ∈ E, then x + 2 ∈ E; (c) nothing
else belongs to E.

Note: The order in which elements appear in a set is not important. That is, the
set {a, b, c, d, e} and {a, e, c, d, b} are equal.
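
The three ways of specifying a set can be mimicked in Python as a rough sketch. The names `listed`, `by_predicate` and `evens_greater_than_3` are illustrative only, the search window `range(1, 100)` is an arbitrary assumption, and the recursive rule used ("start at 4, keep adding 2") is one plausible way to generate the even numbers greater than 3.

```python
# List notation: write out every member explicitly.
listed = {1, 2, 3, 4, 5, 6}

# Predicate notation: {x | x is a natural number and x <= 6}.
# Assumption: natural numbers start at 1; range(1, 100) is an arbitrary window.
by_predicate = {x for x in range(1, 100) if x <= 6}


def evens_greater_than_3(limit):
    """Rule-based generation (assumed rules): 4 is a member, and whenever
    x is a member so is x + 2. 'limit' only stops the loop, because a
    program cannot hold an infinite set."""
    members = set()
    x = 4
    while x <= limit:
        members.add(x)
        x += 2
    return members


print(listed == by_predicate)              # both notations describe the same set
print({"a", "b", "c"} == {"c", "a", "b"})  # order of elements is irrelevant
print(sorted(evens_greater_than_3(12)))
```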

Finite and infinite sets

A set which has a finite number of elements is called a finite set; otherwise we call it
an infinite set. For instance, if A is the set of all integers, then A is an infinite set,
denoted as {..., −2, −1, 0, 1, 2, ...} or {x | x is an integer}.

Singleton

In mathematics, a singleton, also known as a unit set, is a set with exactly one
element. If a is the element of the singleton A, then A is denoted as A = {a}.
Note that {a} and a do not mean the same thing. Whereas the former is a set
consisting of the single element a, the latter is just an element.


Equality of sets

Two sets A and B are said to be equal if and only if (iff) every element of the set
A is a member of the set B and vice versa. We express this by writing A = B.
Logically speaking, A = B means (x ∈ A) ≡ (x ∈ B), or the bi-conditional
statement (x ∈ A) ⇐⇒ (x ∈ B) is true for all x.

Subset

Let A and B be two sets. If every element of A is an element of B then A is called a
subset of B and we write this as A ⊆ B or B ⊇ A (read as A is contained in B or
B contains A respectively). Equivalently, A ⊆ B means (x ∈ A) =⇒ (x ∈ B)
is true ∀x.

Note:

• If A ⊆ B and A ≠ B, we write A ⊂ B or B ⊃ A (read as A is a proper
subset of B or B is a proper superset of A)

• Every set is a subset and a superset of itself

• If A is not a subset of B, we write this as A ⊈ B
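
As a small illustration, Python's built-in `set` type supports these comparisons directly. The sets `A` and `B` below are hypothetical examples chosen only to show the operators.

```python
A = {1, 2}
B = {1, 2, 3}

print(A <= B)             # A ⊆ B: every element of A is in B
print(A < B)              # A ⊂ B: A is a subset of B and A != B
print(B >= A)             # B ⊇ A: B contains A
print(A <= A)             # every set is a subset of itself
print(A < A)              # but no set is a proper subset of itself
print(B <= A)             # B is not a subset of A
print({1, 2} == {2, 1})   # equality: same elements, order irrelevant
```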

Empty or Null set

A set which has no element is called the null or empty set, and is denoted as ∅ or
{}. For example:

• the set of all real numbers whose square is equal to -1

• the set of all those integers that are both even and odd

• the set of all JKUAT IT students who are both sick and well at the same
time

The null or empty set is a subset of every set.

Universal set

We assume that every set is a subset of a fixed set U or ξ, known as the universal
set. It is the set which contains all the objects under consideration in a given problem.


Power sets

The set of all subsets of a set A is called the power set of A and denoted as ℘(A)
or sometimes as 2^A. For example, if A = {a, b}, then ℘(A) = {∅, {a}, {b}, {a, b}}. From
this example: a ∈ A; {a} ⊆ A; {a} ∈ ℘(A); ∅ ∉ A; ∅ ⊆ A; ∅ ∈ ℘(A); ∅ ⊆ ℘(A)
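
A power set can be generated mechanically by collecting the subsets of every size. The sketch below is one way to do it with the standard `itertools` module; the helper name `power_set` is our own, and `frozenset` is used only because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    all_subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in all_subsets}


A = {"a", "b"}
P = power_set(A)

print(len(P) == 2 ** len(A))   # a set with n elements has 2^n subsets
print(frozenset({"a"}) in P)   # {a} is a member of the power set
print({"a"} <= A)              # {a} is a subset of A
```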

Complements

If A and B are two sets, then the complement of B relative to A is the set of all those
elements x ∈ A such that x ∉ B, and is denoted as A − B. This set is also known
as the relative complement of B in A. Whenever we talk of the complement of the
set B, we usually mean the complement of B relative to the universal set U. In
such cases, we denote the complement of B by B′ or B̄, thus B′ = U − B or
B̄ = U − B.
Given a set A and a universal set U, the set of elements that are in U and are NOT
in A is called the complement of A, written A′ or Aᶜ.
Example. Suppose we have the set S = {a, b, c, d, e, f}. Then the set S′, which
is the complement of the set S, is given by S′ = {g, h, i, j, ..., z}, given that the
universal set U is assumed to be the set of all letters of the alphabet.
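
The same complement can be checked with Python's set difference operator. This is only a sketch: it assumes the universal set is the 26 lowercase English letters, and the sets `A` and `B` at the end are hypothetical values used to show a relative complement.

```python
import string

U = set(string.ascii_lowercase)   # assumed universal set: the letters a..z
S = set("abcdef")

S_complement = U - S              # elements of U that are not in S
print(sorted(S_complement))       # the letters g through z
print(len(S_complement))          # 26 - 6 = 20 letters

# Relative complement A − B: elements of A that are not in B.
A = {1, 2, 3, 4}
B = {3, 4, 5}
print(sorted(A - B))              # [1, 2]
```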

Operations on sets

Union

Here, we define some operations that can be performed on sets. Let A and B be
two arbitrary sets. Then the union of the sets A and B, written
A ∪ B, is the set whose elements are the elements of A or B or of both. Using
the set builder notation, the definition is A ∪ B = {x | x ∈ A or x ∈ B}.
Example. Let us define the following sets: K = {a, b}, L = {c, d} and M =
{a, b, d}; then:

• K ∪ L = {a, b, c, d}

• K ∪ M = {a, b, d}

• L ∪ M = {a, b, c, d}

• (K ∪ L) ∪ M = K ∪ (L ∪ M ) = {a, b, c, d}


• K ∪ K = K

• K ∪ ∅ = ∅ ∪ K = K = {a, b}

Note that K ∪ M = {a, b, a, b, d} = {a, b, d}, since an element is only counted once in
a set. Both K and M contain the elements a and b.

Intersection

The intersection of two sets A and B, written A ∩ B, is the set whose elements
are the elements that belong to both A and B. Using the set builder notation, we define
an intersection as follows: A ∩ B = {x | x ∈ A and x ∈ B}. When two sets A
and B are disjoint, then A ∩ B = ∅.
Example. Using the three sets seen in our previous example, we can obtain the
following sets using the operation, intersection.

• K ∩ L = ∅

• K ∩ M = {a, b}

• L ∩ M = {d}

• (K ∩ L) ∩ M = K ∩ (L ∩ M) = ∅

• K ∩ K = K

• K ∩ ∅ = ∅ ∩ K = ∅.
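
The union and intersection results for K, L and M can be reproduced with Python's `|` (union) and `&` (intersection) operators. This is a quick check rather than a proof; `sorted` is used only so that the printed order is predictable.

```python
K = {"a", "b"}
L = {"c", "d"}
M = {"a", "b", "d"}

print(sorted(K | L))                  # ['a', 'b', 'c', 'd']
print(sorted(K | M))                  # ['a', 'b', 'd']
print(sorted(K & M))                  # ['a', 'b']
print(sorted(L & M))                  # ['d']
print(K & L == set())                 # True: K and L are disjoint
print((K | L) | M == K | (L | M))     # True: union is associative
print((K & L) & M == K & (L & M))     # True: intersection is associative
```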

Venn Diagram

A Venn diagram or set diagram is a diagram that shows all possible logical rela-
tions between a finite collection of sets. These diagrams help us understand the
elementary set theory, as well as helping in illustrating simple set relationships in
probability, logic, statistics, linguistics and computer science. The diagram consists
of the universal set U represented by points in and on a rectangle, and subsets
A, B, C, ... by points in and on circles or ellipses drawn inside the rectangle.
The following is a diagrammatic representation of the Venn diagram showing
two disjoint sets A and B (left diagram), and two overlapping sets A and
B (right diagram).


Number of elements in a set


We denote the number of elements in a set S by n(S). For example, suppose
we have the set S = {a, b, c, d, 1, 2, 3}; then the number of elements in the set
S is n(S) = 7. Using the Venn diagram on the right hand side of the
figure above, let the number of elements in the set A be n(A) and
the number of elements in the set B be n(B). For the two overlapping sets
A and B, if we add n(A) and n(B) together, we count the overlap twice.
Therefore, to find the number of elements in A ∪ B, we subtract the overlap once:
n(A ∪ B) = n(A) + n(B) − n(A ∩ B).
Example. In a survey, 73 participants said they had bought lottery tickets, 49 had
bought premium bonds while 22 had bought both lottery and premium bonds. Find
the number of people who had bought either lottery tickets or premium bonds.
Solution
Let
L: the set of participants who bought lottery tickets, n(L) = 73
B: the set of participants who bought premium bonds, n(B) = 49
Those who bought both form the set L ∩ B, so n(L ∩ B) = 22
Hence, n(L or B) = n(L ∪ B) = n(L) + n(B) − n(L ∩ B) = 73 + 49 − 22 = 100 participants
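
The counting rule n(A ∪ B) = n(A) + n(B) − n(A ∩ B) can be sketched in Python as follows. The function name `union_size` and the participant ID ranges are hypothetical; the IDs are constructed only so that the set sizes match the survey figures.

```python
def union_size(n_a, n_b, n_both):
    """n(A ∪ B) = n(A) + n(B) − n(A ∩ B)."""
    return n_a + n_b - n_both


print(union_size(73, 49, 22))   # 100

# Cross-check with explicit sets of made-up participant IDs.
lottery = set(range(0, 73))     # 73 participants: IDs 0..72
bonds = set(range(51, 100))     # 49 participants: IDs 51..99
print(len(lottery & bonds))     # 22 bought both
print(len(lottery | bonds))     # 100 bought at least one
```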

2. Permutations and Combinations


If one thing can be done in m different ways and, after it has been done in any one of
these ways, a second thing can be done in n different ways, then the two things in
succession can be done in mn different ways. For instance, if there are 3 candidates for
governor and 5 for mayor, then the two offices can be filled in 3 × 5 = 15 ways.
Example. A man has 3 jackets, 10 shirts and 5 pairs of slacks. If an outfit consists
of a jacket, a shirt and a pair of slacks; how many different outfits can the man
make?


Solution:
If we let x1 represent the number of jackets, x2 the number of shirts and x3 the number
of pairs of slacks, then the man can make x1 × x2 × x3 = 3 × 10 × 5 = 150 outfits.
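
The multiplication principle can be verified by actually listing every outfit. The sketch below uses `itertools.product` from the standard library; the item labels such as "jacket 1" are placeholders invented for illustration.

```python
from itertools import product

jackets = [f"jacket {i}" for i in range(1, 4)]    # 3 jackets
shirts = [f"shirt {i}" for i in range(1, 11)]     # 10 shirts
slacks = [f"slacks {i}" for i in range(1, 6)]     # 5 pairs of slacks

# Every outfit is one (jacket, shirt, slacks) triple.
outfits = list(product(jackets, shirts, slacks))

print(len(outfits))                               # 150
print(len(jackets) * len(shirts) * len(slacks))   # 150, by the multiplication principle
print(outfits[0])                                 # ('jacket 1', 'shirt 1', 'slacks 1')
```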

Permutations

A: A permutation is an arrangement of all or part of a number of things in a
definite order. For instance, the permutations of the letters a, b, c taken all at a time
are abc, acb, bca, bac, cba, cab. The permutations of the 3 letters taken 2 at a
time are ab, ac, ba, bc, ca, cb.
The symbol nPr or $^nP_r$ represents the number of permutations (arrangements,
orders) of n things taken r at a time. This can also be written as P(n, r).
The number of permutations of n different things taken r at a time is given as follows:
$^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$
When r = n: $^nP_n = n(n-1)(n-2)\cdots 1 = n!$
Example. The number of ways in which 4 persons can take their places in a cab
having 6 seats is:
$^6P_4 = \frac{6!}{(6-4)!} = \frac{6!}{2!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1} = 360$ ways
B: The number of permutations with some things alike, taken all at a time, can be computed
as follows. The number of permutations P of n things taken all at a time, of which
n1 are alike, n2 are alike, n3 are alike, etc., is obtained as
$P = \frac{n!}{n_1!\,n_2!\,n_3!\cdots}$ where $n_1 + n_2 + n_3 + \cdots = n$.
Example. The number of ways in which 3 dimes and 7 quarters can be distributed
among 10 boys, each receiving one coin, is $\frac{10!}{3!\,7!} = 120$ ways
C: Circular permutations
The number of ways of arranging n different objects around a circle is (n − 1)!.
For instance, 10 persons may be seated at a round table in (10 − 1)! = 9! = 362880
ways.
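
The three permutation formulas can be checked numerically. The sketch below relies on `math.perm` and `math.factorial` (available in Python 3.8 and later); the helper `permutations_with_repeats` is our own name for the formula n!/(n1! n2! ...).

```python
from itertools import permutations
from math import factorial, perm

# A: ordered arrangements of the letters a, b, c taken 2 at a time.
print(["".join(p) for p in permutations("abc", 2)])  # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

# 4 persons taking places in a cab with 6 seats: 6P4 = 6!/(6-4)!
print(perm(6, 4))                          # 360
print(factorial(6) // factorial(6 - 4))    # 360


def permutations_with_repeats(*group_sizes):
    """n! / (n1! * n2! * ...), where n is the sum of the group sizes."""
    result = factorial(sum(group_sizes))
    for size in group_sizes:
        result //= factorial(size)
    return result


# B: 3 dimes and 7 quarters shared among 10 boys.
print(permutations_with_repeats(3, 7))     # 120

# C: 10 persons around a round table: (10 - 1)!
print(factorial(10 - 1))                   # 362880
```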

Combinations

A combination is a grouping or selection of all or part of a number of things
without reference to the arrangement of the things selected. Thus, the combinations
of the three letters a, b, c taken 2 at a time are ab, ac, bc. We note
here that for a combination ab = ba, while for a permutation ab ≠ ba.


The symbol nCr = $^nC_r$ = C(n, r) denotes the number of combinations (selections,
groups) of n things taken r at a time.
Hence, $^nC_r = \frac{^nP_r}{r!} = \frac{n!}{(n-r)!\,r!}$
Example. The number of handshakes in a party of 12 students, if each
student shakes hands once with every other student, is obtained as follows:
$^{12}C_2 = \frac{12!}{10!\,2!} = \frac{12 \cdot 11}{2 \cdot 1} = 66$ handshakes.
It can be noted that when dealing with combinations, the following expressions
give the same result: $^nC_r = {}^nC_{n-r}$, e.g. $^9C_7 = {}^9C_{9-7} = {}^9C_2 = 36$
Example. A committee of 4 must be chosen from 3 women and 4 men. Calculate:

1. In how many ways the committee can be chosen.
This involves choosing 4 people from 7 members. That is, $^7C_4 = \frac{7!}{4!\,3!} = 35$
ways

2. In how many ways 2 men and 2 women can be chosen.
That is, $^4C_2 \times {}^3C_2 = 6 \times 3 = 18$ ways
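
The combination results above can be confirmed with `math.comb` (Python 3.8 and later). This is a numerical check only; the variable names are our own.

```python
from itertools import combinations
from math import comb

# Selections of the letters a, b, c taken 2 at a time (order ignored).
print(["".join(c) for c in combinations("abc", 2)])  # ['ab', 'ac', 'bc']

handshakes = comb(12, 2)            # 12 students, each pair shakes hands once
print(handshakes)                   # 66

print(comb(9, 7), comb(9, 2))       # 36 36, since nCr = nC(n-r)

any_committee = comb(7, 4)          # 4 people chosen from 3 women + 4 men
print(any_committee)                # 35

two_men_two_women = comb(4, 2) * comb(3, 2)
print(two_men_two_women)            # 18
```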

Learning Activities
There are a number of general laws about sets which follow from the
definitions of set-theoretic operations and subsets seen in the previous
sections. These include the Idempotent Laws, Commutative Laws and
De Morgan’s Laws, amongst other laws. Using relevant examples, discuss these
eight laws of sets.

Summary
Using the knowledge of sets, Venn diagrams, permutations and combinations, we
can find probabilities of outcomes, as we
shall see in the subsequent chapters. By knowing the number of elements in the
different sets and in the universal set, we can compute the probability
of an event occurring, which is the proportion of elements in a set relative to the
number of elements in the universal set.


Revision Questions
List the following sets; N denotes the set of natural numbers while Z denotes the
set of integers.
Exercise 1. S = {x|x ∈ N, x < 10}
Exercise 2. P = {x|x ∈ Z, x < 6}
Exercise 3. Q = {x|x ∈ Z, and 2 < x < 10}
Exercise 4. Find the number of arrangements that can be made out of the
letters of the words ASSASINATIONS and ABRACADABRA.
Exercise 5. A committee of 5 members is to be formed out of 6 men and 4
women. How many committees can be formed so that at least one woman is
always there in the committee?

Assignment 1
1. A group of 50 BIT students were asked which of the three Computer Science
Journals, A, B or C, they read. The results showed that 25 read A, 16 read
B, 14 read C, 5 read both A and B, 4 read both B and C, 6 read both C
and A and 2 read all three.

(a) Represent these data on a Venn diagram
(b) Find the probability that a person selected at random from this group
reads
i. At least one of the Journals
ii. Only one of the Journals

2. In a certain computer firm, there are 400 employees; of these 400, 150 are
men, 276 are university graduates, 212 are married persons, 94 are male
university graduates, 151 are married university graduates, 119 are married
men, 72 are married male university graduates. Find the number of single
women in this firm who are not university graduates.

3. A container has 12 blue balls and 8 red balls. If we are interested in choosing
10 balls from the container so that there are always at least two red balls in
the ten balls chosen, how many groups of ten balls can we make?


Chapter 2
Data: Collection and Sampling

Learning outcomes
Upon completing this topic, you should be able to:

• Clearly define the terms probability and statistics

• Describe the different sources and types of data

• Identify the different methods of data collection and the criteria that we use
to select a method of data collection

• Define and distinguish the different types of variables


1. Introduction
Before we look at data presentation and the subsequent chapters, let us first define
what we mean by the terms Probability and Statistics.
We often make statements about probability or chances. For instance, a weather
forecaster may predict that there is an 80% chance of rain tomorrow; a health worker
may state that smokers are twice as likely to get lung cancer as non-smokers; or
a computer technician may say that there is a 60% chance that a laptop fan will
not last more than 6 months after repair.

Definitions

Probability

Probability, which measures the likelihood that an event will occur, is an important
part of statistics. It is the basis of inferential statistics, in which we make
decisions under conditions of uncertainty. Probability theory is used to evaluate
the uncertainty involved in those decisions. Combining probability and probability
distributions with descriptive statistics helps us to make decisions about populations
based on information obtained from samples. Probability is therefore
the chance or relative possibility or likelihood that an event will occur.
It reflects the long-run relative frequency of an outcome.

Statistics

The word ’statistics’ means one or more measures describing the characteristics of
a population. The main question that we attempt to answer using statistics is whether
there is a relationship between variables. To demonstrate this relationship, we must
show that when one variable changes, the other variable changes too and that the
amount of change is not by mere chance. Statistics therefore is the science
of data. It deals with the scientific methods of collecting, organizing,
summarizing, analyzing, interpreting and presenting information
or data. The collecting, organizing and summarizing of data is what is
called descriptive statistics, while drawing valid conclusions from the data is
inferential statistics, which comes after data analysis.
The first step in any analysis (investigation) is the collection of data. The data may
be collected for the whole population or for a sample only; they are mostly collected
on a sample basis. Collection of data is a very difficult job, and the enumerator or
investigator who collects the statistical data should be a well-trained person. The
respondents (informants) are the persons from whom the information is collected.
Role of statistics

It is virtually impossible to avoid data analysis if we wish to monitor and improve
the quality of products, services and processes in an organization. Statistics plays
an important role in almost every field of life and human activity. It is applicable
to a wide variety of disciplines, including natural and social sciences, engineering,
government, information technology and business. For instance, statistics has
proved to be of immense use in physics and chemistry. It also plays a major role
in psychology, IT and education.
Exercise 6. Briefly describe the different aspects of statistics

Branches of Statistics

The study of statistics has two major branches: descriptive statistics and inferential
statistics.
Descriptive statistics - involves the organization, summarization, and display
of data.
Inferential statistics - involves using a sample to draw conclusions about a
population.
Example. In a recent study, volunteers who had less than 6 hours of sleep were
four times more likely to answer incorrectly on a Computer Science test than were
participants who had at least 8 hours of sleep. Decide which part is the descriptive
statistic and what conclusion might be drawn using inferential statistics.
Soln: The statement “four times more likely to answer incorrectly” is a
descriptive statistic. An inference drawn from the sample is that all individuals
sleeping less than 6 hours are more likely to answer Computer Science questions
incorrectly than individuals who sleep at least 8 hours.

2. Data: Source/types and collection methods


Before any statistical work can be done, data must be collected. Depending on the
type of variable and the objective of the study; different data collection methods
can be employed.
Data collection techniques allow us to systematically collect data about our
objects of study (people, objects, and phenomena) and about the setting in which
they occur. In the collection of data we have to be systematic because if data
are collected haphazardly, it will be difficult to answer our research questions in a
conclusive way.

2.1. Definitions
Data consists of (raw) information coming from observations, counts, measure-
ments, or responses. They are measurements taken on a variable. A datum is
a single number which may represent a count (e.g. the number of computers
delivered to a client in Tatu City) or measurement (e.g. the mass of a bag of
tomatoes)
A population is the collection of all outcomes, responses, measurements, or
counts that are of interest.
A sample is a subset of a population.
A parameter is a numerical description of a population characteristic. (Parameter → Population)
A statistic is a numerical description of a sample characteristic. (Statistic → Sample)

Example. Decide whether the numerical value describes a population parameter
or a sample statistic.

1. A recent survey of a sample of 450 college IT graduates reported that the
average weekly income for graduates is £325.
Soln: Because the average of £325 is based on a sample, this is a sample
statistic.

2. The average weekly income for all IT graduates is £405.
Soln: Because the average of £405 is based on a population, this is a
population parameter.
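
The distinction between a parameter and a statistic can be illustrated with a short simulation. Everything in this sketch is hypothetical: the population of 5,000 incomes is randomly generated (it is not real survey data), and the mean of 405 and spread of 60 are assumptions chosen only to echo the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly incomes (in pounds) of 5,000 IT graduates.
population = [random.gauss(405, 60) for _ in range(5000)]

# A sample of 450 graduates drawn without replacement from that population.
sample = random.sample(population, 450)

parameter = statistics.mean(population)   # describes the whole population
statistic = statistics.mean(sample)       # describes only the sample

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two means are close but generally not equal, which is why a statistic is treated as an estimate of the corresponding parameter.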

2.2. Types of Data


Individuals and organizations collect data because the information is needed. They
may want information to keep the records for administrative purposes, make de-
cisions about important issues, or they may be required to pass information on to
others.
Data are known to be crude information and not knowledge by themselves. The
sequence from data to knowledge is: from data to information, from information
to facts, and finally, from facts to knowledge.
Data become information when they become relevant to your decision problem.
Information becomes facts when the data can support it. Facts are what the data
reveal. However, decisive instrumental knowledge is expressed together with some
statistical degree of confidence.
There are two types of sources for the collection of data:

1. Primary data source

2. Secondary data source

Primary data
These are data that are fresh and collected for the first time and are therefore
original in character; that is, they are first-hand information collected, compiled
and published by an organization or an individual for some purpose. They are
the most original data in character and have not undergone any sort of statistical
treatment.


For instance, population census reports are primary data because they are collected,
compiled and published by the population census body. Other examples include the
economic survey publication and the well-being survey.
The first hand information obtained by the investigator is more reliable and
accurate since the investigator can extract the correct information by removing
doubts, if any, in the minds of the respondents regarding certain questions. High
response rates might be obtained since the answers to various questions are ob-
tained on the spot. It permits explanation of questions concerning difficult subject
matter.

Secondary data
These are second-hand information which have already been collected by some person or
organization for some purpose and are available for the present study. They are
not pure in character, and have undergone some statistical treatment at least once.
When an investigator uses data, which have already been collected by others,
such data are called "Secondary Data". Such data are primary data for the agency
that collected them, and become secondary for someone else who uses these data
for his own purposes.

Sources of data.
When one is collecting data it is important to consider whether they are primary
or secondary data. Some of the main methods used for data collection include:
a) Census: A study that obtains data from every member of the population.
It is not practical in most studies, because of the cost and/or time required.
b) Sample survey: A study that obtains data from a subset of a population,
in order to estimate population attributes/characteristics. Surveys of human
populations and institutions are common in government, health, social science, IT,
marketing etc.
c) Experiment: A controlled study in which the researcher attempts to understand
cause-and-effect relationships. The actual experiment is carried out
on certain individuals/units about whom information is drawn. The study is controlled
in the sense that the researcher controls how subjects are assigned to groups and
which treatments/conditions each group receives.


d) Observational study: Like an experiment, an observational study attempts to
understand cause-and-effect relationships. However, here the researcher is not able
to control how subjects are assigned to groups and/or which treatment each group
receives.

Methods of collecting primary data


a) Personal investigation: The researcher conducts the survey him/herself and collects
data from it. The method is accurate, reliable and suitable for small research
projects.

b) Through investigators: Trained investigators are employed to collect data.

c) Collection through questionnaire: The researcher gets data from local representatives
or agents, based upon their own experience. The method is quick but only
gives rough estimates.

d) Through telephone: The method is quick, accurate and reliable.

Methods of collecting secondary data


Secondary data may be collected from the following sources:

a) Official: from publications of KNBS, ministries etc.

b) Semi-official: from the state bank, Railway board etc.

c) Publication of trade associations

d) Technical & trade journals such as newspapers

e) Research organizations such as universities & other institutions

Editing of data

After collecting data from either source, the next step is editing, i.e. examination
of the collected data to discover any errors and mistakes before presenting them. Editing
of secondary data is simpler than editing of primary data.


2.3. Data Classification


Data sets can consist of two types of data: qualitative data and quantitative data.
Qualitative data - consists of attributes, labels, or nonnumerical entries.
Quantitative Data - consists of numerical measurements or counts.

2.4. Methods of Data Collection


There are several methods of data collection. Depending on how the data are
collected, these methods result in one of the following types of sample.

• A stratified sample has members from each segment of a population. This
ensures that each segment of the population is represented.

• A cluster sample has all members from randomly selected segments of a
population. This is used when the population falls into naturally occurring
subgroups.

• A systematic sample is a sample in which each member of the population
is assigned a number. A starting number is randomly selected and sample
members are selected at regular intervals.

• A convenience sample consists only of available members of the population.
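As a rough illustration of how these four sampling schemes differ, the following Python sketch draws each kind of sample from a small made-up population. The population, the department names and the function names are assumptions introduced for illustration only; they are not part of the course material.

```python
import random

# Hypothetical population: 12 lecturers, each tagged with a department (illustrative data).
population = [(f"L{i:02d}", dept)
              for i, dept in enumerate(["Maths", "IT", "Physics"] * 4, start=1)]

def group_by_segment(units):
    """Group (name, segment) pairs into a dict: segment -> list of units."""
    groups = {}
    for unit in units:
        groups.setdefault(unit[1], []).append(unit)
    return groups

def stratified_sample(units, per_segment, rng):
    """Take some members from every segment."""
    sample = []
    for members in group_by_segment(units).values():
        sample.extend(rng.sample(members, per_segment))
    return sample

def cluster_sample(units, n_segments, rng):
    """Take all members of a few randomly selected segments."""
    groups = group_by_segment(units)
    chosen = rng.sample(sorted(groups), n_segments)
    return [unit for seg in chosen for unit in groups[seg]]

def systematic_sample(units, step, rng):
    """Pick a random starting position, then every step-th member."""
    start = rng.randrange(step)
    return units[start::step]

def convenience_sample(units, n):
    """Whoever is easiest to reach; here, simply the first n listed."""
    return units[:n]

rng = random.Random(1)  # fixed seed so that a run is repeatable
print(stratified_sample(population, 2, rng))
print(cluster_sample(population, 1, rng))
print(systematic_sample(population, 4, rng))
print(convenience_sample(population, 3))
```

Only the first three schemes involve random selection; the convenience sample depends entirely on which members happen to be available.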

What is a Variable?

In statistics, a variable is any characteristic, number, or quantity that can be
measured or counted. A variable may also be called a data item. Age, sex,
business income and expenses, country of birth, capital expenditure, class grades,
eye colour and vehicle type are examples of variables.
It is called a variable because the value may vary between data units in a
population, and may change in value over time.
For example, 'income' is a variable that can vary between data units in a
population (i.e. the people or businesses being studied may not have the same
incomes) and can also vary over time for each data unit (i.e. income can go up or
down).
A random variable, denoted by X, is a variable whose possible values are
numerical outcomes of a random experiment.

Types of variables

There are different ways variables can be described according to the ways they can
be studied, measured, and presented.
Numeric variables have values that describe a measurable quantity as a number,
like ’how many’ or ’how much’. Therefore numeric variables are quantitative
variables.
Numeric variables may be further described as either continuous or discrete:

• A continuous variable is a numeric variable. Observations can take any
value within a certain range of real numbers. The value given to an observation
for a continuous variable can include values as small as the instrument
of measurement allows. Examples of continuous variables include height,
time, age, and temperature.

• A discrete variable is a numeric variable. Observations can take a value
based on a count from a set of distinct whole values. A discrete variable
cannot take the value of a fraction between one value and the next closest
value. Examples of discrete variables include the number of registered cars,
number of business locations, and number of children in a family, all of
which are measured in whole units (e.g. 1, 2, 3 cars).

Therefore, the data collected for a numeric variable are quantitative data.
Categorical variables have values that describe a ’quality’ or ’characteristic’
of a data unit, like ’what type’ or ’which category’. Categorical variables fall
into mutually exclusive (in one category or in another) and exhaustive (include
all possible options) categories. Therefore, categorical variables are qualitative
variables and tend to be represented by a non-numeric value.


Categorical variables may be further described as ordinal or nominal :

• An ordinal variable is a categorical variable. Observations can take a value
that can be logically ordered or ranked. The categories associated with
ordinal variables can be ranked higher or lower than one another, but do not
necessarily establish a numeric difference between each category. Examples
of ordinal categorical variables include academic grades (e.g. A, B, C), clothing
size (e.g. small, medium, large, extra large) and attitudes (e.g. strongly
agree, agree, disagree, strongly disagree).

• A nominal variable is a categorical variable. Observations can take a value
that cannot be organized in a logical sequence. Examples of nominal
categorical variables include sex, business type, eye colour, religion and
brand.

The data collected for a categorical variable are qualitative data.


Exercise 7. You are doing a study to determine the number of years of education
each lecturer at your college has. Identify the sampling technique used if you select
the samples listed. (a) You randomly select two different departments and survey
each lecturer in those departments. (b) You select only the lecturers you currently
have this semester. (c) You divide the lecturers up according to their department
and then choose and survey some lecturers in each department.


Learning Activities

• Write a summary report, typed in font size 12, Times New Roman, with 1.5
line spacing, discussing the following concepts: sampling process, sampling
frame, sampling methods, probability sampling methods and non-probability
sampling methods.


Chapter 3
Presentation of Data

Learning outcomes
Upon completing this topic, you should be able to:

• Identify different methods of presenting data.

• Construct Charts and graphs for the purpose of data presentation.

• Compare the presentations of the same set of data by using various graphs.

• Understand the criterion for the selection of a method to organize and present
data

• Identify the different methods of data organization and presentation

• Identify sources of deception in misleading graphs.


1. Introduction
Raw data collected through the various methods of data collection are usually
in a haphazard and unsystematic form, and are not in a suitable form for drawing
conclusions about the group or the population under study. Hence it becomes
necessary to arrange or organize the data in a form which is suitable for analysis.
Data can be presented in different forms such as: text, in a table, or pictorially as
a chart, diagram or graph.
Data graphics are a good way to communicate important data in your reports.
The purpose of putting results of research into graphs, charts and tables is two-
fold. First, it is a visual way to look at the data and see what happened and make
interpretations. Second, it is usually the best way to show the data to others.
Reading lots of numbers in the text puts people to sleep and does little to
convey information. Tables are the most commonly used form of data graphics,
but graphs, charts or diagrams that include symbols and pictures will get your
results across to the reader faster and will liven up your presentation or report.

2. Tables
Once we have collected our data, often the first stage of any analysis is to present
them in a simple and easily understood way. Tables are perhaps the simplest means
of presenting data.
There are many types of tables. For example, we have all seen tables listing
sales of computers by type, or exchange rates, or the financial performance of
companies. These types of tables can be very informative. However, they can also
be difficult to interpret, especially those which contain vast amounts of data.
Frequency tables are amongst the most commonly–used tables and are perhaps
the most easily understood. They can be used with continuous, discrete, categor-
ical and ordinal data. Frequency tables have uses in some of the techniques we
will see in the next lecture.

3. Graphical presentations
Graphics, such as maps, graphs and diagrams, are used to represent large volumes of
data. With large amounts of data, graphical presentation methods are often clearer
to understand. We look at methods for producing graphical representations of data
of the different types. These methods are necessary because:

• If the information is presented in tabular form or in a descriptive record,
it becomes difficult to draw results.

• Graphical form makes it possible to easily draw visual impressions of data.

• The graphic method of the representation of data enhances our understanding.

• It makes comparisons easy.

• Such methods create an imprint on the mind for a longer time.

• It is a time-consuming task to draw inferences about whatever is being
presented in non-graphical form.

• It presents characteristics in a simplified way.

• It makes it easy to understand patterns such as population growth, distribution
and density, sex ratio, age-sex composition, occupational structure, etc.

When creating graphic displays, keep in mind the following questions:

• What am I trying to communicate?

• Who is my audience?

• What might prevent them from understanding this display?

• Does the display tell the entire story?

Some Rules of Thumb

• show the data

• avoid distorting the data

• induce the viewer to think about the substance of the graphic rather than
the methodology, graphic design, or something else


• make large amounts of data coherent

• encourage the viewer to use the graphic as you intend, e.g. make compar-
isons

• be closely integrated with statistical and verbal descriptions of the data

• be as simple as possible

In the following section we discuss some of the commonly used graphical presen-
tations

3.1. Pie chart


Pie charts are simple diagrams for displaying categorical or grouped data. These
charts are commonly used within industry to communicate simple ideas, for example
the market share of ISPs in a country. They are used to show the proportions of a
whole. They are best used when there are only a handful of categories to display.
A pie chart consists of a circle divided into segments (sectors), one segment for each
category. The size of each segment is determined by the frequency of the category
and measured by the angle of the segment, so that the area of each sector is
proportional to the frequency of its category. As the total number of degrees in a
circle is 360, the angle given to a segment is 360 times the fraction of the data in
the category, that is,
$$\text{angle} = \frac{\text{number in category}}{\text{total number in sample } (n)} \times 360^\circ$$
Example. Construct a pie chart for the following data on accidental deaths in
the USA in 2002.

Type                        Frequency
Motor Vehicle               43,500
Falls                       12,200
Poison                      6,400
Drowning                    4,600
Fire                        4,200
Ingestion of Food/Object    2,900
Firearms                    1,400


To create a pie chart for the data, first find the relative frequency
(proportion) of each category.

Type                        Frequency   Relative frequency
Motor Vehicle               43,500      0.578
Falls                       12,200      0.162
Poison                      6,400       0.085
Drowning                    4,600       0.061
Fire                        4,200       0.056
Ingestion of Food/Object    2,900       0.039
Firearms                    1,400       0.019
                            n = 75,200


Next, find the central angle of each sector by multiplying the relative
frequency by 360°.

Type                        Frequency   Relative frequency   Angle
Motor Vehicle               43,500      0.578                208.2°
Falls                       12,200      0.162                58.4°
Poison                      6,400       0.085                30.6°
Drowning                    4,600       0.061                22.0°
Fire                        4,200       0.056                20.1°
Ingestion of Food/Object    2,900       0.039                13.9°
Firearms                    1,400       0.019                6.7°
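The two steps of the example (relative frequency, then central angle) can be sketched in Python as follows. The dictionary and function names are illustrative assumptions; the frequencies are those of the accidental deaths example, and the angles are computed from the unrounded relative frequencies.

```python
# Accidental deaths in the USA, 2002 (frequencies from the example).
deaths = {
    "Motor Vehicle": 43_500,
    "Falls": 12_200,
    "Poison": 6_400,
    "Drowning": 4_600,
    "Fire": 4_200,
    "Ingestion of Food/Object": 2_900,
    "Firearms": 1_400,
}

def pie_angles(freqs):
    """Return {category: (relative frequency, central angle in degrees)}."""
    n = sum(freqs.values())  # total number in the sample
    return {cat: (f / n, f / n * 360) for cat, f in freqs.items()}

for cat, (rel, angle) in pie_angles(deaths).items():
    print(f"{cat:<26}{rel:>7.3f}{angle:>8.1f}")
```

The angles always add up to 360°, which is a useful check before drawing the chart.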


[Figure: pie chart of accidental deaths in the USA, 2002. Sectors: Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%, Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%.]


3.2. Bar Charts


Bar charts are a commonly used and clear way of presenting categorical data or
any ungrouped discrete frequency observations.
The five-step process for creating a bar chart is shown below:

1. First decide what goes on each axis of the chart. By convention the variable
being measured goes on the horizontal (x–axis) and the frequency goes on
the vertical (y–axis).

2. Next decide on a numeric scale for the frequency axis. This axis represents
the frequency in each category by its height. It must start at zero and include
the largest frequency. It is common to extend the axis slightly above the
largest value so you are not drawing to the edge of the graph.

3. Having decided on a range for the frequency axis we need to decide on a
suitable number scale to label this axis. This should have sensible values,
for example, 0, 1, 2, . . . , or 0, 10, 20 . . . , or other such values as make
sense given the data.

4. Draw the axes and label them appropriately.

5. Draw a bar for each category. When drawing the bars it is essential to ensure
the following:

• the width of each bar is the same;

• the bars are separated from each other by equally sized gaps.

Example. The following data show the number of guests who were
booked into a hotel in Mombasa on a particular day in December
2013. Construct a suitable bar graph for the data.

Country of origin     Male   Female
Irish                 10     7
British               4      10
Mainland European     2      2
Rest of the world     5      7
Total                 21     26
The corresponding bar graph is as shown below.

Note that the above graph is a vertical bar graph. We can also obtain a horizontal
bar graph presenting the same information.

3.3. Histograms
Bar charts have their limitations; for example, they cannot be used to present
continuous data. When dealing with continuous random variables a different kind
of graph is required. This is called a histogram. At first sight these look similar
to bar charts. There are, however, two critical differences:

• the horizontal axis (x-axis) is a continuous scale. As a result, there are
no gaps between the bars (unless there are no observations within a class
interval);

• the height of the rectangle is only proportional to the frequency if the class
intervals are all equal. With histograms it is the area of the rectangle that
is proportional to the frequency.

Initially we will only consider histograms with equal class intervals. Those with
uneven class intervals require more careful thought. Producing a histogram is
much like producing a bar chart and in many respects can be considered to be the
next stage after producing a grouped frequency table. In reality, it is often best to
produce a frequency table first which collects all the data together in an ordered
format. Once we have the frequency table, the process is very similar to drawing
a bar chart.

1. Find the maximum frequency and draw the vertical (y–axis) from zero to
this value, including a sensible numeric scale.

2. The range of the horizontal (x–axis) needs to include not only the full range
of observations but also the full range of the class intervals from the fre-
quency table.

3. Draw a bar for each group in your frequency table. These should be the
same width and touch each other (unless there are no data in one particular
class).

Example. The following data represent the ages of 30 IT students in a statistics
class. Construct a histogram using these data.
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 38 37 22
30 39 32 44 33 46
54 49 18 51 21 21
Before constructing the histogram, we first need to construct a frequency
distribution. The process is as follows.

Constructing a Frequency Distribution


Guidelines

1. Decide on the number of classes to include.

2. The number of classes should be between 5 and 20; otherwise, it may be
difficult to detect any patterns.

3. Find the class width as follows. Determine the range of the data, divide the
range by the number of classes, and round up to the next convenient
number.

4. Find the class limits. You can use the minimum entry as the lower limit of
the first class. To find the remaining lower limits, add the class width to the
lower limit of the preceding class. Then find the upper class limits.

5. Make a tally mark for each data entry in the row of the appropriate class.

6. Count the tally marks to find the total frequency f for each class.

For the above data;

• Obtain the number of classes (k) as k = roundup(log(n)/log(2)). Here n = 30,
so log(30)/log(2) ≈ 4.91 and k = 5.

• The minimum data entry is 18 and the maximum entry is 54, so the range is
36. Divide the range by the number of classes to find the class width.

– Class width = 36/5 = 7.2, rounded up to 8.

• The minimum data entry of 18 may be used for the lower limit of the first
class. To find the lower class limits of the remaining classes, add the width
(8) to each lower limit.

– The lower class limits are 18, 26, 34, 42, and 50.
– The upper class limits are 25, 33, 41, 49, and 57.

• Make a tally mark for each data entry in the appropriate class.

• The number of tally marks for a class is the frequency for that class.

In summary, the frequency distribution is as follows;

Ages of Students

Class (ages)   Frequency, f (number of students)
18 – 25        13
26 – 33        8
34 – 41        4
42 – 49        3
50 – 57        2
               Σf = 30

Check that the sum of the frequencies equals the number in the sample.
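The construction of this frequency distribution can be sketched in Python as below. The function name is an assumption, and so is the rule used for the class width: taking floor(range/k) + 1 is one way to read "round up to the next convenient number" for whole-number data, chosen here so that the last class always contains the maximum entry.

```python
import math

# Ages of the 30 IT students from the example.
ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19, 34, 19, 24, 29, 18,
        38, 37, 22, 30, 39, 32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

def frequency_distribution(data):
    """Build equal-width classes for whole-number data.

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    k = math.ceil(math.log2(len(data)))      # number of classes: roundup(log n / log 2)
    data_range = max(data) - min(data)
    width = data_range // k + 1              # assumed rounding rule (see lead-in)
    table = []
    for i in range(k):
        lower = min(data) + i * width
        upper = lower + width - 1            # upper class limit for whole numbers
        f = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, f))
    return table

for lower, upper, f in frequency_distribution(ages):
    print(f"{lower} - {upper}: {f}")
```

For the ages data this gives k = 5 classes of width 8, matching the hand calculation.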

Once this is done and we now have the class intervals, we can then construct
the histogram as follows;

Frequency Histogram
A frequency histogram is a bar graph that represents
the frequency distribution of a data set.
1. The horizontal scale is quantitative and measures
the data values.
2. The vertical scale measures the frequencies of the
classes.
3. Consecutive bars must touch.
Class boundaries are the numbers that separate the
classes without forming gaps between them.
The horizontal scale of a histogram can be marked with
either the class boundaries or the midpoints.

Class Boundaries
Let's consider the class boundaries for the “Ages of the IT Students”
frequency distribution. The distance from the upper limit of the first class to the
lower limit of the second class is 1. Half this distance is 0.5, so each boundary lies
0.5 below a lower class limit or 0.5 above an upper class limit.

Ages of Students

Class      Frequency, f   Class boundaries
18 – 25    13             17.5 – 25.5
26 – 33    8              25.5 – 33.5
34 – 41    4              33.5 – 41.5
42 – 49    3              41.5 – 49.5
50 – 57    2              49.5 – 57.5
           Σf = 30
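As a small sketch of the rule just described, the class boundaries can be computed from the class limits by halving the gap between consecutive classes. The function name is an illustrative assumption.

```python
# Class limits of the "Ages of Students" frequency distribution.
limits = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]

def class_boundaries(class_limits):
    """Return (lower boundary, upper boundary) for each class."""
    # Gap between the upper limit of the first class and the lower limit of the second.
    gap = class_limits[1][0] - class_limits[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in class_limits]

print(class_boundaries(limits))
```

Because the upper boundary of one class equals the lower boundary of the next, bars drawn between boundaries touch without gaps.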


And finally, the histogram:

Frequency Histogram
To draw a frequency histogram for the “Ages of Students”
frequency distribution, we use the class boundaries.

[Figure: frequency histogram, “Ages of Students”. Horizontal axis: age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5 and 57.5 (broken axis); vertical axis: frequency f, from 0 to 14. Bar heights: 13, 8, 4, 3 and 2.]

You may have noticed that we referred to the above histogram as a frequency
histogram. Instead of constructing a frequency histogram, we may also be inter-
ested in constructing a relative frequency histogram. The process is quite similar
to the above after obtaining the relative frequencies as illustrated below.

Relative Frequency
First, we need to find the relative frequencies for the “Ages of IT
Students” frequency distribution. The relative frequency of a class is the portion
of students in that class, f/n; for the first class, f/n = 13/30 ≈ 0.433.

Class      Frequency, f   Relative frequency
18 – 25    13             0.433
26 – 33    8              0.267
34 – 41    4              0.133
42 – 49    3              0.1
50 – 57    2              0.067
           Σf = 30        Σ(f/n) = 1
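The relative frequencies can be checked with a few lines of Python. The variable names are illustrative assumptions; the class frequencies are those of the ages example.

```python
# Classes and frequencies of the "Ages of Students" distribution.
classes = ["18-25", "26-33", "34-41", "42-49", "50-57"]
freqs = [13, 8, 4, 3, 2]

n = sum(freqs)                      # number in the sample
rel_freqs = [f / n for f in freqs]  # relative frequency f/n of each class

for label, f, r in zip(classes, freqs, rel_freqs):
    print(f"{label:<8}{f:>4}{r:>8.3f}")
print("sum of relative frequencies:", round(sum(rel_freqs), 3))
```

The relative frequencies sum to 1, apart from rounding in the printed values.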


Relative Frequency Histogram
A relative frequency histogram has the same shape and
the same horizontal scale as the corresponding frequency
histogram.

[Figure: relative frequency histogram, “Ages of Students”. Horizontal axis: age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5 and 57.5; vertical axis: relative frequency (portion of students), from 0 to 0.5. Bar heights: 0.433, 0.267, 0.133, 0.1 and 0.067.]

3.4. Frequency Polygons


These are a natural extension of the relative frequency histogram. They differ in
that, rather than drawing bars, each class is represented by one point and these
are joined together by straight lines. The method is similar to that for producing
a histogram:

1. Produce a percentage relative frequency table.

2. Draw the axes. The x-axis needs to contain the full range of the classes
used. The y-axis needs to range from 0 to the maximum percentage relative
frequency.

3. Plot points: pick the mid point of the class interval on the x-axis and go up
until you reach the appropriate percentage value on the y-axis and mark the
point. Do this for each class.

4. Join adjacent points together with straight lines.

A frequency polygon is a line graph that emphasizes the continuous change in
frequencies.
Example. Using the same data on the ages of IT students, we can construct the following
frequency polygon:

Frequency Polygon

[Figure: frequency polygon, “Ages of Students”. Horizontal axis: age (in years), marked at the midpoints 13.5, 21.5, 29.5, 37.5, 45.5, 53.5 and 61.5 (broken axis); vertical axis: frequency f, from 0 to 14. The line is extended to the x-axis at both ends.]
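The points to be joined in this frequency polygon can be computed as in the sketch below. The function name is an assumption, and the polygon is built from plain frequencies, as in the figure; percentage relative frequencies could be passed in the same way.

```python
# Class limits and frequencies of the "Ages of Students" distribution.
limits = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]
freqs = [13, 8, 4, 3, 2]

def polygon_points(class_limits, frequencies):
    """Return the (midpoint, frequency) points of a frequency polygon.

    One extra point with frequency 0 is added one class width before the first
    midpoint and after the last, so that the line is extended to the x-axis.
    """
    mids = [(lower + upper) / 2 for lower, upper in class_limits]
    width = mids[1] - mids[0]  # equal class widths assumed
    return ([(mids[0] - width, 0)]
            + list(zip(mids, frequencies))
            + [(mids[-1] + width, 0)])

print(polygon_points(limits, freqs))
```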

3.5. Cumulative Frequency Polygons (Ogive)


Cumulative percentage relative frequency is also a useful tool. The cumulative
percentage relative frequency is simply the sum of the percentage relative frequen-
cies at the end of each class interval (i.e. we add the frequencies up as we go
along).
A cumulative frequency graph or Ogive, is a line graph that displays the cumu-
lative frequency of each class at its upper class boundary. The graph, or Ogive, is
simple to produce by hand:

1. Draw the axes.

2. Label the x-axis with the full range of the data and the y-axis from 0 to
100%.

3. Plot the cumulative % relative frequency at the end point of each class.

4. Join adjacent points, starting at 0% at the lowest class boundary.

Example. Use the IT students age data to construct an Ogive curve/plot.

Cumulative Frequency Graph

[Figure: cumulative frequency graph (ogive), “Ages of Students”. Horizontal axis: age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5 and 57.5; vertical axis: cumulative frequency, from 0 to 30. The graph ends at the upper boundary of the last class.]
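A minimal sketch of the ogive calculation in Python follows. The variable names are assumptions; the boundaries and frequencies come from the ages example. Each cumulative value is plotted at the upper boundary of its class, starting from 0 at the lowest boundary.

```python
from itertools import accumulate

# Class boundaries and frequencies of the "Ages of Students" distribution.
boundaries = [17.5, 25.5, 33.5, 41.5, 49.5, 57.5]
freqs = [13, 8, 4, 3, 2]

# Running totals, with 0 at the lowest class boundary.
cum_freqs = [0] + list(accumulate(freqs))
# The same totals as cumulative percentage relative frequencies.
cum_percent = [100 * c / cum_freqs[-1] for c in cum_freqs]

ogive_points = list(zip(boundaries, cum_freqs))
print(ogive_points)
print([round(p, 1) for p in cum_percent])
```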

3.6. Scatter Plots


Scatter plots are used to plot two variables which you believe might be related, for
example, height and weight, advertising expenditure and sales, or age of machinery
and maintenance costs.

3.7. Box plots


Box plots or “box and whisker” plots are another graphical method for displaying
data and are particularly useful for highlighting differences between groups. These
plots use some of the key summary statistics, such as the quartiles and also the
maximum and minimum observations. Box plots are a useful tool for comparing
groups of data. They can also be used to show outliers and to give an
idea of the shape of the distribution.

3.8. Stem and leaf plots


Stem and leaf plots are a quick and easy way of representing data graphically. They
can be used with both discrete and continuous data. The method for creating a
stem and leaf plot is similar to that for creating a grouped frequency table. The
first stage, as with grouped frequency tables, is to decide on a reasonable number
of intervals which span the range of the data. The interval widths for a stem and leaf
plot must be equal. Because of the way the plot works it is best to use “sensible”
values for the interval width, e.g. 5, 10, 100 or 1000; if a data set consists of many
small values, this interval width could also be 1, or even 0.1 or 0.01. Once we
have decided on our intervals we can construct the stem and leaf plot.
Consider the following data: 11, 12, 9, 15, 21, 25, 19, 8. The first step is to
decide on interval widths – one obvious choice would be to go up in 10s. This
would give a stem unit of 10 and a leaf unit of 1. The stem and leaf plot is
constructed as below.
Stem units: 10, leaf digits: 1 (the value 8.000 is represented by 0|8)
0|89
1|1259
2|15
In a stem-and-leaf plot, each number is separated into a stem (usually the
entry’s leftmost digits) and a leaf (usually the rightmost digit). This is an example
of exploratory data analysis.
Using the IT students age data set, we can construct a stem and leaf plot as
follows

Stem-and-Leaf Plot

Ages of Students
Key: 1|8 = 18
1 | 888999
2 | 0011124799
3 | 002234789
4 | 469
5 | 14

Most of the values lie between 20 and 39. This graph allows us to see
the shape of the data as well as the actual values.

Stem-and-Leaf Plot
Constructing a stem-and-leaf plot that has two lines for
each stem.

Ages of Students
Key: 1|8 = 18
1 |
1 | 888999
2 | 0011124
2 | 799
3 | 002234
3 | 789
4 | 4
4 | 69
5 | 14
5 |

From this graph, we can conclude that more than 50%
of the data lie between 20 and 34.
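A one-line-per-stem plot of this kind can be produced with a short Python sketch such as the one below. The function name is an assumption, and the sketch handles only non-negative whole numbers with a stem unit of 10, as in the two examples of this section.

```python
from collections import defaultdict

# Ages of the 30 IT students from the example.
ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19, 34, 19, 24, 29, 18,
        38, 37, 22, 30, 39, 32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

def stem_and_leaf(data, stem_unit=10):
    """Return {stem: string of sorted leaf digits} for non-negative whole numbers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // stem_unit].append(x % stem_unit)
    # Include every stem between the smallest and largest, even if it has no leaves.
    stems = range(min(leaves), max(leaves) + 1)
    return {s: "".join(str(d) for d in leaves.get(s, [])) for s in stems}

for stem, leaf_digits in stem_and_leaf(ages).items():
    print(f"{stem} | {leaf_digits}")
```

Sorting the data first means the leaves on each stem appear in increasing order, which makes the shape of the distribution easier to read.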


Revision Questions

Exercise 8. Distinguish between a pie chart and a histogram.


Exercise 9. Identify a data set of your choice and use it to construct a bar or
pie chart. Similarly, obtain some quantitative data and use the data to construct
a histogram and a cumulative frequency polygon.

Learning Activities
1. Read more on scatter plots and Box plots and summarize their use, ad-
vantages and disadvantages versus the methods we have presented in this
lecture. If possible, give some examples of a scatter plot and a box plot.

2. Search for at least five bad graphs and discuss why they are bad.


Chapter 4
Numerical Summaries of Data (Simple frequency
distributions)

Learning outcomes
Upon completing this topic, you should be able to:

• Identify and give examples of the measures of central tendency.

• Calculate and interpret the measures of central tendency.

• Identify and calculate the measures of variability.

• Identify and calculate the measures of location.

• List the characteristics of the measures of central tendency.

• Choose the appropriate measure that can best describe a given data set.

• Discuss the relative merits of different measures of central tendency for a
given situation.


1. Introduction
Collected data need to be organized in such a way as to condense the information
they contain and show patterns of variation clearly. Precise methods
of analysis can be decided upon only when the characteristics of the data
are understood, and this is the primary objective of the different techniques of data
organization and presentation, such as ordered arrays, tables and diagrams.
For frequency distributions of data to be more easily appreciated, and to draw
quick comparisons, it is often useful to arrange the data in the form of a table,
or in one of a number of different graphical forms. When analyzing voluminous
data collected from, say, an IT firm's records, it is quite useful to put them into
compact tables. Quite often, the presentation of data in a meaningful way is done
by preparing a frequency distribution. If this is not done, the raw data will not
convey any meaning, and any pattern in them may not be detected.
A frequency distribution is a table that shows classes or intervals of data
with a count of the number in each class. The frequency f of a class is the number
of data points in the class.
Array (ordered array) is a serial arrangement of numerical data in an ascending
or descending order. This enables us to know the range over which the
items are spread and also gives an idea of their general distribution. An ordered
array is an appropriate way of presentation when the data set is small (usually
fewer than 20 items).
The first step in looking at data is to describe the data at hand in some concise
way. In smaller studies this step can be accomplished by listing each data point.
In general, however, this procedure is tedious or impossible and, even if it were
possible would not give an over-all picture of what the data look like.
The basic problem of statistics can be stated as follows: Consider a sample
of data x1, ..., xn, where x1 corresponds to the first sample point and xn
corresponds to the nth sample point.
Presuming that the sample is drawn from some population P , what inferences
or conclusion can be made about P from the sample?
To answer this question, first, the data must be summarized as succinctly
(concisely, briefly) as possible, since the number of sample points is frequently
large and it is easy to lose track of the overall picture by looking at all the data
at once. One type of measure useful for summarizing data defines the center, or
middle, of the sample. This type of measure is a measure of central tendency
(location).
The two most common numerical descriptive measures are; measures of central
tendency and measures of variability; that is, we seek to describe the center of
the distribution of measurements and also how the measurements vary about the
center of the distribution.
In this topic we introduce the numerical summaries of data. We discuss the
measures of central tendency, variability and measures of location for univariate
data. We also draw a distinction between parameters and statistics.

2. Measures of central tendency


The tendency of statistical data to get concentrated at certain values is called the
“Central Tendency” and the various methods of determining the actual value
at which the data tend to concentrate are called measures of central Tendency
or averages. Hence, an average is a value which tends to sum up or describe the
mass of the data. Herein, we describe some of the measures of central tendency
for simple frequency distributions.

2.1. Mean
• Arithmetic mean
The arithmetic mean is the sum of all observations divided by the number of
observations. If n measurements x1, x2, x3, ..., xn have been taken on a variable,
the arithmetic mean of the observations is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$
In the case of a frequency distribution where xi occurs with frequency fi, the mean
x̄ is obtained as follows:
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \dots + x_n f_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$$
Example. The members of an orchestra were asked how many instruments each
would be able to play. Below are the results of their response.
2,5,2,4,1,1,1,2,1,3,3,2,1,2,1,1,2,2,4,3,2,1,2,3,1,4,2,3,1,1,2
Obtain the mean number of instruments played by a member of the orchestra.
Solution:

From the information given, n = 30 and
$$\sum x = 2 + 5 + 2 + 4 + \dots + 1 + 2 = 63$$
Thus $\bar{x} = \sum x / n = 63/30 = 2.1$
Thus the mean number of instruments played is 2.1, that is, about 2 per member.


Using frequencies, we can get the mean using the frequency table as follows;
No. of instruments, x    1    2    3    4    5
Frequency, f             11   10   5    3    1
xf                       11   20   15   12   5
From the table, $\sum f = 30$ and $\sum xf = 63$.
Thus, the mean is obtained again as $\bar{x} = \sum xf / \sum f = 63/30 = 2.1$
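The frequency-table calculation can be sketched in Python as below. This is only an illustration: the dictionary `freq` and the function name `mean_from_frequencies` are hypothetical names chosen here, with the values copied from the table.

```python
# Frequency table from the orchestra example: number of instruments -> frequency.
freq = {1: 11, 2: 10, 3: 5, 4: 3, 5: 1}


def mean_from_frequencies(table):
    """Return sum(x * f) / sum(f) for a {value: frequency} table."""
    total_f = sum(table.values())                      # sum of f
    total_xf = sum(x * f for x, f in table.items())    # sum of x*f
    return total_xf / total_f


mean_instruments = mean_from_frequencies(freq)
print(mean_instruments)
```

The same function works for any simple frequency distribution, provided the frequencies do not sum to zero.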
Manipulation of the mean formula
$\bar{x} = \sum x / n \implies n\bar{x} = \sum x$
Using this idea, we can now compute the mean of a simple frequency distribution
Example. The mean weekly wage of 86 IT managers in a county has been cal-
culated as $172.45 but the wage of the 87th employee was later on discovered to
have been missed in the process. If his wage was $158.80, what is the mean wage
of all the 87 employees?
Solution:
Initially, n = 86 and $\bar{x} = 172.45$. This implies that $\sum x_i = n\bar{x} = 172.45 \times 86 =
14,830.7$
Thus for 87 employees, we have 14,830.7 + 158.80 = 14,989.5 as the total
wage;
To obtain the mean, we have 14,989.5/87 = 172.29
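The correction of a mean for a missed observation can be sketched as follows. The helper name `add_observation` is an assumption made for this illustration; it simply rebuilds the total from the old mean and count, adds the new value and divides by the new count.

```python
def add_observation(old_mean, old_n, new_value):
    """Update a mean when one extra observation is discovered."""
    total = old_mean * old_n + new_value   # n * x-bar recovers the old total
    return total / (old_n + 1)


# 86 wages with mean 172.45, plus the missed wage of 158.80.
new_mean = add_observation(172.45, 86, 158.80)
print(round(new_mean, 2))
```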
Weighted mean
A common problem is when the means of a number of groups need to be
combined to form a grand mean. For instance, suppose a company splits its
home sales into three regions, each region having a sales representative. Over a
particular period, rep A averages 8642 per sale from 24 sales, rep B averages 1129 from
37 sales and rep C averages 10422 from 25 sales. The overall average per sale is the
mean of the three averages weighted by the number of sales, thus
$$\frac{(8642 \times 24) + (1129 \times 37) + (10422 \times 25)}{24 + 37 + 25} = \frac{509,731}{86} = 5,927.10 \text{ per sale}$$
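A minimal sketch of the weighted mean, using the per-rep averages and sale counts that appear in the worked formula (8642, 1129 and 10422 with weights 24, 37 and 25). The function name `weighted_mean` is a hypothetical choice for this illustration.

```python
def weighted_mean(means, weights):
    """Combine group means into a grand mean, weighting by group size."""
    weighted_total = sum(m * w for m, w in zip(means, weights))
    return weighted_total / sum(weights)


rep_means = [8642, 1129, 10422]   # average value per sale for reps A, B, C
rep_sales = [24, 37, 25]          # number of sales for reps A, B, C

grand_mean = weighted_mean(rep_means, rep_sales)
print(round(grand_mean, 2))
```

Note that the plain average of the three means would ignore the fact that the reps made different numbers of sales.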
Note:


The arithmetic mean is, in general, a very natural measure of central location.
One of its principal limitations, however, is that it is overly sensitive to extreme
values. In this instance it may not be representative of the location of the great
majority of the sample points.

• Geometric mean
The geometric mean is a type of mean or average, which indicates the central
tendency or typical value of a set of numbers by using the product of their values.
It is defined as the nth root (where n is the count of numbers) of the product
of the numbers. Geometric mean of two numbers a, b is the square root of their
product.


$$G.M = \sqrt[n]{a_1 a_2 \cdots a_n}$$

Example. Tawaguchi invested a sum of money at compound interest for three
years. In the first year, the rate of interest was 3%, in the second year it was 7%
and in the third year it was 14%. What was the average annual rate of interest over
the three years?
Solution
Using the arithmetic mean, the average interest would be
$\frac{1}{3}(3 + 7 + 14) = 8\%$
However, if we think of him investing p initially, the investment is worth
$p(1 + 3/100) = p(1.03)$ after the 1st year
$p(1.03)(1.07)$ after the 2nd year
$p(1.03)(1.07)(1.14)$ after the 3rd year
If it was a constant rate of r% for three years, his investment would be worth
$p(1 + \frac{r}{100})^3$ after 3
years, and we may determine the equivalent constant rate by equating $p(1 + \frac{r}{100})^3$
and $p(1.03)(1.07)(1.14)$,
that is, $(1 + \frac{r}{100}) = [(1.03)(1.07)(1.14)]^{1/3} = 1.079$, implying $r/100 = 0.079$ or
r = 7.9%
This is slightly less than the average rate obtained using the arithmetic mean.
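The equivalent constant rate can be checked numerically with a short sketch. The function name `equivalent_rate` is an assumption for illustration; it takes the geometric mean of the yearly growth factors and converts it back to a percentage.

```python
import math


def equivalent_rate(rates_percent):
    """Constant yearly rate (in %) giving the same growth as the given yearly rates."""
    factors = [1 + r / 100 for r in rates_percent]       # e.g. 3% -> 1.03
    geometric_mean = math.prod(factors) ** (1 / len(factors))
    return (geometric_mean - 1) * 100


rate = equivalent_rate([3, 7, 14])
print(round(rate, 1))
```

`math.prod` requires Python 3.8 or later; on older versions the product can be accumulated in a loop instead.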

• Harmonic mean
The harmonic mean is another measure of central tendency and, like the arithmetic
mean and the geometric mean, it rests on a mathematical footing and is useful
for quantitative data. The harmonic
mean is defined as the quotient of the “number of the
given values” and the “sum of the reciprocals of the given values”. The harmonic mean
of two numbers a, b is the reciprocal of the arithmetic mean of their reciprocals.
That is;
$$\frac{1}{\frac{1}{2}\left(\frac{1}{a} + \frac{1}{b}\right)}$$
This can then be simplified to become
$$\frac{1}{\frac{1}{2}\left(\frac{1}{a} + \frac{1}{b}\right)} = \frac{1}{\frac{1}{2}\left(\frac{b + a}{ab}\right)} = \frac{2ab}{a + b}$$
Generally, the harmonic mean of n numbers is given by
$$\frac{1}{\frac{1}{n}\left(\frac{1}{a_1} + \frac{1}{a_2} + \dots + \frac{1}{a_n}\right)}$$

Example. Purity visits her aunt Daniella who stays some 60 km away. She travels
to her aunt’s home by bicycle at an average speed of 20km/h. She returns in a
friend’s car at an average speed of 40km/h. What is her average speed for the
round trip?
Solution
When computing the average speed, a common mistake is to compute
the speed as follows;
$\frac{1}{2}(20 + 40) = 30$ km/h. This is not correct!
There are two ways of getting the correct speed;
First approach
1st leg: she does 60km at 20km/h, which takes 3 hours
2nd leg: she again does 60km at 40km/h, which takes 1.5 hours
Total distance covered is 120km in 4.5 hours
Average speed is $\frac{120}{4.5} = 26.7$ km/h
Second approach
This can also be obtained using the harmonic mean formula as follows:
$$\frac{2ab}{a + b} = \frac{2 \times 40 \times 20}{40 + 20} = \frac{1600}{60} = 26.7 \text{ km/h}$$
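Both approaches can be compared in a few lines of Python. This is a sketch only: `harmonic_mean` is a hand-written helper (an assumption of this illustration, written to mirror the general formula), and the distances and speeds are those of the example.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)


# First approach: total distance over total time.
distance_each_way = 60                                        # km
total_time = distance_each_way / 20 + distance_each_way / 40  # 3 h + 1.5 h
speed_direct = 2 * distance_each_way / total_time

# Second approach: harmonic mean of the two speeds.
speed_harmonic = harmonic_mean([20, 40])

print(round(speed_direct, 1), round(speed_harmonic, 1))
```

The two results agree because the two legs cover the same distance; with unequal distances a weighted version would be needed.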

Properties of the Mean


Pros:

• Gives a sense of “typical” case

• Useful for continuous data

• Easy to calculate

• Center of gravity


Cons:

• Every case influences outcome

• Extreme cases (outliers) affect results a lot (e.g. mean income is often not
very meaningful)

• Doesn’t give you a full sense of the distribution

• Appropriate only for quantitative data

2.2. Median
The median is the midpoint of a distribution; i.e., the observation such that half
of observations are smaller and the other half are larger. This is sometimes used
instead of the mean particularly when the histogram of the observations is skewed.
It is obtained by placing the observations in ascending order of magnitude and then
picking out the middle observation. When a set of data contains an
even number of items, there is no unique middle value, hence we use
the mean of the middle two items to give a practical median. The median has the
advantage that it is not influenced by odd extreme observations.
COMPUTATION: Here are the steps we take for computing the median, M.

1. Order the data from low to high.

2. Compute the median position location; i.e., compute median position
location = $(n + 1)/2$

3. The median of the data is given by the ordered value in this position.

• If n is odd, the median position location will be a whole number, say,
8; in this case, the median would be the 8th ordered observation. That
is, the value in position $(n + 1)/2$.

• If n is even, the median position location will be a fraction, say, 8.5;
in this case, the median would be the (arithmetic) average of the 8th
and 9th ordered observations. That is, the average of the values in positions
$n/2$ and $n/2 + 1$.


Example. Suppose we have the following data giving the final exam scores for a
class in a particular unit; 96 92 84 77 74 84 80 74. Find the median.
Solution
Here, n = 8 (the number of observations).
First, we order the data from low to high (or we can also order them from high
to low)
74 74 77 80 84 84 92 96
The median position location is $(n + 1)/2 = (8 + 1)/2 = 4.5$
Thus, the median is the average of the 4th and 5th ordered values; i.e.,
the median is $M = (80 + 84)/2 = 82$
Example. Obtain the median of the following set of data:
205,207,220,217,219,208,206,212,215,218,204
Solution
Arranging the data, we have 204,205,206,207,208,212,215,217,218,219,220
The median is the value in position $(n + 1)/2$. We have n = 11, thus $(11 + 1)/2 = 6$.
Thus the value in the 6th position is 212
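The position rule used in the two examples can be sketched as follows. The function name `median_by_position` is a hypothetical choice; positions in the notes are counted from 1, while Python lists are indexed from 0, hence the `- 1` adjustments.

```python
def median_by_position(data):
    """Median via the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


exam_scores = [96, 92, 84, 77, 74, 84, 80, 74]
second_set = [205, 207, 220, 217, 219, 208, 206, 212, 215, 218, 204]

print(median_by_position(exam_scores))
print(median_by_position(second_set))
```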
Suppose we have frequencies, then the median can be obtained as shown in
the following example.
Example. The following table shows the records for number of computers not
available for use in the multimedia center for 80 consecutive days in an institution
of higher learning.
No. of computers not available    0    1    2    3    4    5    6
No. of days                       15   24   18   12   8    2    1
To obtain the median of this distribution, we follow these steps.
Obtain the cumulative frequency of the distribution, then find the position of
the median using the above formula.
That is
No. of computers not available    0    1    2    3    4    5    6
No. of days                       15   24   18   12   8    2    1
Cumulative frequency, CF          15   39   57   69   77   79   80
n in this case is even. Using the positions $n/2$ and $n/2 + 1$, we have $80/2 = 40$ and
$80/2 + 1 = 41$, so the median is the average of the values
in positions 40 and 41.
Using the CF, we look for the first cumulative frequency that reaches or exceeds 40 and 41.
In the above example, it is 57, so the 40th and 41st ordered values both belong to
that column. The median therefore is the value corresponding
to this CF, and that is 2.
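A sketch of the cumulative-frequency lookup for a simple frequency table. The names `value_at_position` and `median_from_frequencies` are assumptions made for this illustration; the table is the one from the computers example.

```python
def value_at_position(table, position):
    """Value of the ordered observation in the given 1-based position."""
    running = 0
    for value in sorted(table):
        running += table[value]        # cumulative frequency so far
        if running >= position:        # first CF to reach the position
            return value
    raise ValueError("position is larger than the number of observations")


def median_from_frequencies(table):
    """Median of a {value: frequency} table using the position rule."""
    n = sum(table.values())
    if n % 2 == 1:
        return value_at_position(table, (n + 1) // 2)
    low = value_at_position(table, n // 2)
    high = value_at_position(table, n // 2 + 1)
    return (low + high) / 2


days = {0: 15, 1: 24, 2: 18, 3: 12, 4: 8, 5: 2, 6: 1}
print(median_from_frequencies(days))
```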

2.3. Mode
Sometimes a set of data is obtained where it is appropriate to measure a represen-
tative value in terms of ‘popularity’. The mode of a set of data is that value which
occurs most often or equivalently has the largest frequency and is appropriate for
all types of data. It is usually found by inspection. For discrete data this is easy.
The mode is simply the most common value. A data set may have no mode, one
mode (unimodal), two modes (bimodal) or more than two modes (multimodal).
Example. Given the following data, obtain the mode.
205,207,220,217,219,208,206,212,215,218,204, 205,219
Solution
Here, we have two modes, that is 205 and 219 as they both appear twice in
the data set.
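Finding the mode by inspection amounts to counting how often each value occurs. A minimal sketch follows; the function name `modes` is hypothetical, and returning an empty list when every value occurs exactly once is an assumption made here to match the "no mode" case mentioned above.

```python
from collections import Counter


def modes(data):
    """Return all values sharing the highest frequency, in ascending order."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == top)


readings = [205, 207, 220, 217, 219, 208, 206, 212, 215, 218, 204, 205, 219]
print(modes(readings))
```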

Exercise 10. One of the characteristics of young adolescents is their ability to


consume fast foods particularly burgers. If you eat a hamburger, you consume
more than just bun, beef, and trimmings. While enjoying the delicious taste of a
hamburger, generous amounts of fat, calories, and cholesterol are being consumed
as well. What is the “typical” amount of cholesterol in a burger? The cholesterol
levels (in mg) are given for 11 different fast food burgers: 65, 50, 55, 60, 80, 50,


65, 60, 60, 90, and 99. A quick scan of the data suggests that the amounts of
cholesterol vary somewhat in the different burgers. Find the amount of cholesterol
in a burger by calculating three measures of central tendency: median, mode,
and mean.

3. Measures of variability
A measure of central tendency is insufficient in itself to summarise data as it only
describes the value of a typical outcome and not how much variation there is in
the data. For example, the two data sets 6, 22, 38 and 21, 22, 23 both have
the same mean (22) and the same median (22). However the first set of data
ranges considerably from this value while the second stays very close. They are
clearly very different data sets. Measures of variability or dispersion are descriptive
statistics that describe how similar a set of scores are to each other. They describe
how “spread out” a distribution is around its center. They include: range, inter-quartile
range (IQR), quartile deviation (semi inter-quartile range), mean absolute
deviation, variance and standard deviation.

3.1. Range
This is the difference between the maximum and minimum observations in the
data set. The range is used when we have ordinal data or are presenting results to
people with little or no knowledge of statistics. The range is rarely used in scientific
work as it is a fairly crude measure: it depends on only two scores in the set of data,
the maximum and minimum values, so two very different sets
of data can have the same range.

3.2. Semi-interquartile range


The semi-interquartile range (or SIR) is defined as the difference between the third
and first quartiles divided by two, i.e. $SIR = (Q_3 - Q_1)/2$, where the first quartile
$Q_1$ is the 25th percentile and the third quartile $Q_3$ is the 75th percentile. The SIR
is often used with skewed
data as it is insensitive to the extreme scores. The method of obtaining the 25th
and 75th percentiles is quite similar to the method of obtaining the median, as the
median is the 50th percentile.


3.3. Variance and standard deviations


The variance is a measure of how items are dispersed about their mean. Variance is
defined as the average of the squared deviations of the observations from the mean,
while the standard deviation is the positive square root of the variance. The variance
$\sigma^2$ of a whole population is given by the equation
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n} = \frac{\sum x^2}{n} - \mu^2$$
The variance of a sample, $s^2$, is calculated differently as:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2}{n - 1} - \frac{(\sum x)^2}{n(n - 1)}$$

Example. Calculate the standard deviation of the following set of data: 2, 4,
6, 8, 10 and 12
Solution
$\bar{x} = \frac{2 + 4 + 6 + 8 + 10 + 12}{6} = \frac{42}{6} = 7$
$\frac{\sum (x_i - \bar{x})}{n} = \frac{(2-7) + (4-7) + (6-7) + (8-7) + (10-7) + (12-7)}{6} = \frac{-5 - 3 - 1 + 1 + 3 + 5}{6} = \frac{0}{6} = 0$
To avoid the result being zero, we need to square the deviations from the mean
to obtain the squared deviations from the mean.
That is, $\frac{\sum (x_i - \bar{x})^2}{n} = \frac{(2-7)^2 + (4-7)^2 + (6-7)^2 + (8-7)^2 + (10-7)^2 + (12-7)^2}{6} = \frac{70}{6} = 11.67$
Hence the variance is 11.67 and, to find the standard deviation, we take the
square root of the variance, thus $\sqrt{11.67} = 3.42$.
If we treated the above data set as a sample, then we would have used the
formula $\frac{\sum (x_i - \bar{x})^2}{n - 1}$
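The population and sample versions differ only in the divisor, which the following sketch makes explicit. The function name `variance` and its `sample` flag are hypothetical choices for this illustration; the data are those of the example.

```python
import math


def variance(data, sample=False):
    """Average squared deviation from the mean.

    Divides by n for a population and by n - 1 for a sample.
    """
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return sum_of_squares / (n - 1 if sample else n)


values = [2, 4, 6, 8, 10, 12]

population_variance = variance(values)
population_sd = math.sqrt(population_variance)
sample_variance = variance(values, sample=True)

print(round(population_variance, 2), round(population_sd, 2), sample_variance)
```

Python's standard `statistics` module offers `pvariance` and `variance` for the same two quantities.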

Exercise 11. Find the variance and standard deviation for the following: 9, 10,
5, 6, 5, 7, 8, and 5.
The following notes give more information on the variance and standard
deviation of a population and a sample.


Finding the Population Standard Deviation

Guidelines
In Words                                               In Symbols
1. Find the mean of the population data set.           $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry.                   $x - \mu$
3. Square each deviation.                              $(x - \mu)^2$
4. Add to get the sum of squares.                      $SS_x = \sum (x - \mu)^2$
5. Divide by N to get the population variance.         $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to get         $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
   the population standard deviation.

Athiany, HKO 5


Finding the Sample Standard Deviation

Guidelines
In Words                                               In Symbols
1. Find the mean of the sample data set.               $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry.                   $x - \bar{x}$
3. Square each deviation.                              $(x - \bar{x})^2$
4. Add to get the sum of squares.                      $SS_x = \sum (x - \bar{x})^2$
5. Divide by n - 1 to get the sample variance.         $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance to get         $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
   the sample standard deviation.


Finding the Population Standard Deviation

Example:
The following data are the closing prices for a certain Computer
store stock on five successive Fridays. The population mean is 61.
Find the population standard deviation.
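Stock, x      Deviation, x - μ      Squared, (x - μ)² (always positive)
56            -5                    25
58            -3                    9
61            0                     0
63            2                     4
67            6                     36
Σx = 305      Σ(x - μ) = 0          Σ(x - μ)² = 74

$SS_x = \sum (x - \mu)^2 = 74$
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{74}{5} = 14.8$
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{14.8} \approx 3.85$
The population standard deviation is therefore about 3.85 dollars.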


Interpreting Standard Deviation

When interpreting standard deviation, remember that it is a measure
of the typical amount an entry deviates from the mean. The more
the entries are spread out, the greater the standard deviation.

[Figure: two histograms of Frequency against Data value (values 2 to 6), both
centred at 4; the first has s = 1.18 and the second, with every entry equal, has s = 0.]

3.4. Coefficient of Variation


It is sometimes necessary to compare two different distributions with regard to
variability. For example, if two machines were engaged in the production of
identical components, it would be of considerable value to compare the variation of
certain critical dimensions of their output. The standard deviation is used as a
measure for the comparison only when the units in the distribution are the same
and the respective means are roughly comparable.
In the majority of cases where distributions need to be compared with respect to
variability, the following measure, known as the Coefficient of Variation (CV),
is much more appropriate and is considered the standard measure of relative
variation, i.e.
$$CV = \frac{\text{standard deviation (sd)}}{\text{mean}} \times 100\%$$
CV expresses the standard deviation as a percentage of the mean. The actual
units cancel each other, hence CV is a unit-free value which is very useful for
relative comparisons.
Example. Over a period of three months, the daily number of components pro-
duced by two comparable machines was measured giving the following statistics.
Machine A: mean=242.8 s.d =20.5 and Machine B: mean =281.3 s.d =23.0
Solution
Coefficient of Variation, CV for;
Machine A: $\frac{20.5}{242.8} \times 100 = 8.4\%$
Machine B: $\frac{23.0}{281.3} \times 100 = 8.2\%$

Comment: Although the standard deviation for machine B is higher in absolute
terms, the dispersion for machine A is higher in relative terms. Among
the two machines, which one would you go for?
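The comparison of the two machines can be reproduced with a one-line function. The name `coefficient_of_variation` is a hypothetical choice for this sketch; the means and standard deviations are those given in the example.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(20.5, 242.8)   # Machine A
cv_b = coefficient_of_variation(23.0, 281.3)   # Machine B

print(round(cv_a, 1), round(cv_b, 1))
```

The function assumes a non-zero mean; the CV is not meaningful when the mean is zero or close to it.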

3.5. Pearson’s Measure of Skewness (psk)


The degree of skewness can be measured using the mean, the mode and the standard
deviation. For practical
reasons, it is usual to require a measure of skewness to be unit-free (i.e. a
coefficient), hence we use Pearson’s measure of skewness (psk):
$$psk = \frac{\text{mean} - \text{mode}}{\text{s.d}} \approx \frac{3(\text{mean} - \text{median})}{\text{s.d}}$$
Thus using this measure, we can compare the skewness of two different sets
of employees’ remuneration if one is given in terms of weekly wages and the other
in terms of annual salary.
NOTE:
psk > 0 signifies right or positive skew
psk = 0 signifies no skew (mean = mode)
psk < 0 signifies left or negative skew
The greater the absolute value of psk, the more the distribution is skewed.
Example. Suppose we have computed the mean, mode and standard deviation of a data
set, and the values are 15.31, 14.09 and 6.1 respectively.
Then the psk can be obtained as follows:
$psk = \frac{15.31 - 14.09}{6.1} = 0.2$
Conclusion: There is a small degree of right (positive) skew.
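A short sketch of the skewness coefficient and its interpretation. The function names `pearson_skew` and `describe_skew` are hypothetical, and treating exactly zero as "no skew" follows the note above rather than any tolerance used in practice.

```python
def pearson_skew(mean, mode, sd):
    """Pearson's measure of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def describe_skew(psk):
    """Translate the sign of psk into words, following the note above."""
    if psk > 0:
        return "right (positive) skew"
    if psk < 0:
        return "left (negative) skew"
    return "no skew"


psk = pearson_skew(15.31, 14.09, 6.1)
print(round(psk, 1), describe_skew(psk))
```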

4. Measures of location/Position
4.1. Quartiles
We have seen that the median M is the value which halves the data (the lower
half and the upper half). Informally, the first quartile is the median of the lower
half; similarly, the third quartile is the median of the upper half.
To calculate the quartiles, we do the following:

1. Arrange data from low to high and calculate the median M.

• Identify the lower half of the data (M excluded)


• Identify the upper half of the data (M excluded)


(a) The first quartile, Q1, is the median of the lower half of the data.
(b) The third quartile, Q3 is the median of the upper half of the data.

4.2. Percentiles

Percentiles are like quartiles, except that percentiles divide the set of data into 100
equal parts while quartiles divide the set of data into 4 equal parts. Percentiles
measure position from the bottom.
Percentiles are most often used for determining the relative standing of an
individual in a population or the rank position of the individual.

Example. Use the following information as a guide on the method of calculating


the measures of location listed above.

Quartiles
The three quartiles, Q1, Q2, and Q3, approximately divide
an ordered data set into four equal parts.

[Figure: a number line from 0 to 100 with Q1 at 25, Q2 (the median) at 50 and Q3 at 75.]
Q1 is the median of the data below Q2. Q3 is the median of the data above Q2.


Finding Quartiles
Example:
The quiz scores for 15 students are listed below. Find the first,
second and third quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38

Order the data.
28 30 33 37 37 38 42 43 43 44 45 48 48 51 55

The median is Q2 = 43 (the 8th value). The lower half, 28 30 33 37 37 38 42,
has median Q1 = 37, and the upper half, 43 44 45 48 48 51 55, has median Q3 = 48.
About one fourth of the students score 37 or less; about one
half score 43 or less; and about three fourths score 48 or less.
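The median-of-halves procedure for quartiles can be sketched as below. The helper names `middle` and `quartiles` are assumptions for this illustration, and the median is excluded from both halves when n is odd, as in the steps given earlier. Other textbooks and software use slightly different quartile conventions, so results may differ for some data sets.

```python
def middle(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the median itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return middle(lower), middle(ordered), middle(upper)


quiz = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
q1, q2, q3 = quartiles(quiz)
iqr = q3 - q1
five_number_summary = (min(quiz), q1, q2, q3, max(quiz))
print(five_number_summary, iqr)
```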


Interquartile Range
The interquartile range (IQR) of a data set is the difference
between the third and first quartiles.
Interquartile range (IQR) = Q3 – Q1.

Example:
The quartiles for 15 quiz scores are listed below. Find the
interquartile range.
Q1 = 37 Q2 = 43 Q3 = 48

IQR = Q3 - Q1 = 48 - 37 = 11
The quiz scores in the middle portion of the data set vary by at most 11 points.


Box and Whisker Plot


A box-and-whisker plot is an exploratory data analysis tool
that highlights the important features of a data set.
The five-number summary is used to draw the graph.
• The minimum entry
• Q1
• Q2 (median)
• Q3
• The maximum entry
Example:
Use the data from the 15 quiz scores to draw a box-and-
whisker plot.
28 30 33 37 37 38 42 43 43 44 45 48 48 51 55
Five-number summary
• The minimum entry 28
• Q1 37
• Q2 (median) 43
• Q3 48
• The maximum entry 55
[Figure: box-and-whisker plot titled "Quiz Scores", with whiskers ending at 28 and 55,
the box spanning 37 to 48 and the median line at 43, on an axis running from 28 to 56.]


Percentiles and Deciles


Fractiles are numbers that partition, or divide, an
ordered data set.

Percentiles divide an ordered data set into 100 parts.
There are 99 percentiles: P1, P2, P3, ..., P99.

Deciles divide an ordered data set into 10 parts. There
are 9 deciles: D1, D2, D3, ..., D9.

A test score at the 80th percentile (P80) indicates that the
test score is greater than 80% of all other test scores and
less than or equal to 20% of the scores.

5. Summary
In this section, we have discussed the measures of central tendency, variation/dispersion
and the measures of location for simple frequency distribution. We have noted
that each measure has its pros and cons, and the choice of the measure that one
would be using for their data may be determined by some other factors. However,
computing these measures for simple frequency distributions may not be that chal-
lenging as compared to when we have grouped data. In the next section, we look
at similar issues with regard to numerical data summaries; but instead consider a
grouped frequency distribution data.

6. Assignment 1
1. The following data is on the weight (in kg) of 50 computer components
leaving an IT store located in Nairobi Forestland.

10.4, 10.0, 9.3, 11.3, 9.6, 11.2, 10.5, 8.5, 10.4, 8.2, 9.3, 9.6, 10.3, 10.0, 11.5,
11.3, 10.8,
8.9, 10.0, 9.5, 10.0, 11.3, 11.0, 9.7, 10.6, 9.9, 10.2, 10.6, 10.2, 8.1, 8.7, 9.4,
10.9,
10.0, 9.9, 9.2, 11.6, 9.6, 9.5, 10.4, 10.6, 8.8, 10.1, 10.3, 9.7, 10.7, 10.6, 12.8,
10.6, 10.2

• Calculate the mean of the data.


• Put the data in a grouped frequency table.

• Estimate the sample mean from the grouped frequency table.

• Calculate the median of the data.

• Find the modal class.

• Calculate the range of the data.

• Calculate the inter–quartile range.

• Calculate the sample standard deviation.

• Draw a box plot for these data and comment on it.


Revision Questions

Exercise 12. Discuss the importance of the measures of central tendency, loca-
tion and dispersion in data analysis.


Chapter 5
Numerical Summaries of Data (Grouped Frequency
Distributions)

Learning outcomes
Upon completing this topic, you should be able to:

• Calculate and interpret the measures of central tendency for grouped data.

• Identify and calculate the measures of variability for grouped data

• Identify and calculate the measures of location for grouped data.


1. Introduction
In our last lecture we looked at the measures of central tendency, measures of
dispersion/variability and location for ungrouped data. We saw different ways
of computing the mean, that is arithmetic mean, harmonic mean and geometric
mean. We also saw the importance of each of these measures in terms of the data
that we have at hand. In addition, we also saw how to compute the measures of
dispersion and location, and their merits and demerits given a situation. All these
were done using the simple frequency distributions, also referred to as ungrouped
data.
In this lesson, we focus on the same measures as discussed in the last lesson but
we now consider grouped frequency distributions. To assist us solve the problems
that we shall deal with in this lesson, we refer to lesson 3 under the section
(Constructing a Frequency Distribution),where we showed how to construct
a frequency distribution (also known as grouped frequency) table. Therefore, in
this lesson, we shall go straight to the use of these tables assuming that we are
now familiar with the construction of the tables.

2. Measures of central tendency for grouped data


As previously indicated, measures of central tendency generally show the tendency
of statistical data to get concentrated at certain values. To begin with, let us remind
ourselves of some of the useful information that we need in this lesson by defining
the following:

Frequency distribution: When summarizing large masses of raw data, it is often


useful to distribute the data into classes or categories and determine the
number of individuals belonging to each class or group or category. Such
a number is what we refer to as class frequency. When data are arranged
in classes together with the corresponding class frequency for each class
in a table, then such a table is referred to as frequency distribution or
frequency table.

Example. Consider the following data that gives the masses of 100 male JKUAT
IT students as recorded in the frequency table below.

Mass (kg)   No. of Students   Relative frequency   Cumulative frequency
60-62       5                 0.05                 5
63-65       18                0.18                 23
66-68       42                0.42                 65
69-71       27                0.27                 92
72-74       8                 0.08                 100
Total       100               1.00
Data that is organized and summarized as in this table are often called grouped
data. Much of the original details of the raw data may be lost because of the
grouping. However, an important advantage of using grouped data is that a clear
overall picture of the data emerges, as evidence of certain vital relationships.

Class intervals and class limits: The symbol showing a class such as 60-62 is
called a class interval. The numbers 60 and 62 are called class limits. The
value 60 is the Lower Class Limit (LCL) while 62 is the Upper Class Limit
(UCL). Sometimes, it may be theoretically possible to have a class with no
UCL or LCL. Such is an open ended class, for instance - the class “30 years
and over”

Class Boundaries: In our table, the masses were recorded to the
nearest kg. However, masses recorded in the interval 60-62 could theoretically
include masses from 59.5 to 62.5; e.g. 59.8 belongs to this class. The
numbers 59.5 and 62.5 are called class boundaries or true class limits. In
practice, each boundary is obtained by averaging the upper limit of one class and
the lower limit of the next class.

Class size or interval or width: This is the difference between the UCB and
the LCB, for instance 62.5-59.5=3

Class mark/mid-mark: This is the midpoint of class interval obtained by taking


the average of UCB and LCB. In grouped data, while computing the mean
and standard deviation, we assume that the observations coincide with the
midpoint.


2.1. Computing the Mean

Example. The speed, to the nearest mile per hour, of 120 vehicles passing a check
point were recorded and grouped as follows:
Speed (mph)    21-25   26-30   31-35   36-45   46-60
No. of cars    22      48      25      16      9
Estimate the mean of this distribution.

Solution
First, we need to work out the mid-interval value for the first interval, 21-25,
using the LCB = 20.5 and UCB = 25.5
Thus the midpoint for this class is $\frac{1}{2}(20.5 + 25.5) = 23$
Thus we assume that all values in the interval 21-25 are now represented by
the value 23.
Similarly, we get the midpoints for the remaining classes as shown below.
Speed (mph)   Midpoint, x   f                fx
21-25         23            22               506
26-30         28            48               1344
31-35         33            25               825
36-45         40.5          16               648
46-60         53            9                477
Total                       $\sum f = 120$   $\sum fx = 3800$
Hence, mean $\bar{x} = \sum fx / \sum f = 3800/120 = 31\frac{2}{3} \approx 31.67$ mph
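The grouped-mean estimate can be sketched as follows, assuming each class is represented by its midpoint. The list `speed_classes` and the function name `grouped_mean` are hypothetical; each tuple holds the lower class limit, upper class limit and frequency from the table.

```python
# (lower class limit, upper class limit, frequency) for each speed class.
speed_classes = [(21, 25, 22), (26, 30, 48), (31, 35, 25), (36, 45, 16), (46, 60, 9)]


def grouped_mean(classes):
    """Estimate the mean by treating every value as its class midpoint."""
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # same as averaging the class boundaries
        total_f += f
        total_fx += midpoint * f
    return total_fx / total_f


estimated_mean = grouped_mean(speed_classes)
print(round(estimated_mean, 2))
```

Averaging the limits gives the same midpoint as averaging the boundaries, since each boundary is shifted by the same 0.5 in opposite directions.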
P P

2.2. Assumed mean and coding method


Suppose we guess that the mean of a distribution is a (the assumed mean); then
each $x_i$ value can be written as:
$x_i = a + d_i$, where $d_i$ is the deviation of $x_i$ from the assumed mean a.
Thus, the mean is $\bar{x} = \sum f_i x_i / \sum f_i = \sum f_i (a + d_i) / \sum f_i$; expanding this and
simplifying the expression, we get $\bar{x} = a + \sum f_i d_i / \sum f_i$, which we can simply write as
$\bar{x} = a + \sum f d / \sum f$ without the subscripts.
So the mean is $\bar{x}$ = assumed mean + mean deviation from the assumed mean.
If we use grouped data, all values falling in a class interval are considered as
coincident with the class mark of the interval. The last formula can then be
adjusted further if we find that all classes have the same class width (interval)
equal to a constant c. Therefore, $d_i = c u_i$
Thus we have $\bar{x} = a + \sum f d / \sum f = a + \sum f c u / \sum f$
Since c is a constant, we have $\bar{x} = a + c(\sum f u / \sum f) = a + c\bar{u}$, where $\bar{u}$ is
the mean of u.
The formula $\bar{x} = a + c\bar{u}$ is what we refer to as the coding method for computing
the mean and other measures from a frequency distribution table. It is mainly
useful when class intervals are equal!
Example. Consider the data that we saw in Example 1. Using the method of
assumed mean and coding method, we can obtain the mean as follows, taking
a = 67:
Mass (kg)   No. of Students, f   Class mark, x   d = x - a   u = d/3   fu
60-62       5                    61              -6          -2        -10
63-65       18                   64              -3          -1        -18
66-68       42                   67              0           0         0
69-71       27                   70              3           1         27
72-74       8                    73              6           2         16
Total       $\sum f = 100$                                             $\sum fu = 15$
Thus, using the coding formula we have $\bar{x} = a + c\bar{u}$, where $\bar{u} = \sum fu / N =
15/100 = 0.15$ (remember $\sum f = N$)
Implying $\bar{x} = a + c\bar{u} = 67 + 3(0.15) = 67.45$ kg
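The coding method can be checked against the direct calculation with the sketch below. The names `coded_mean`, `class_marks` and `frequencies` are hypothetical; a = 67 and c = 3 are the assumed mean and class width used in the example.

```python
class_marks = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]


def coded_mean(marks, freqs, assumed_mean, width):
    """Mean via x-bar = a + c * u-bar, where u = (x - a) / c."""
    coded = [(x - assumed_mean) / width for x in marks]           # u values
    u_bar = sum(f * u for f, u in zip(freqs, coded)) / sum(freqs)
    return assumed_mean + width * u_bar


mean_coded = coded_mean(class_marks, frequencies, assumed_mean=67, width=3)

# Direct calculation, sum(f * x) / sum(f), for comparison.
mean_direct = sum(f * x for f, x in zip(frequencies, class_marks)) / sum(frequencies)

print(round(mean_coded, 2), round(mean_direct, 2))
```

Any assumed mean gives the same answer; choosing a class mark near the centre just keeps the coded values small.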

2.3. Median for a grouped frequency


As previously mentioned, there is loss of individual identities; hence we cannot
calculate the exact value for the median but can be estimated using two methods.

1. Using the interpolation formula

2. By graphical interpolation (check on the assignment for this)

Using the interpolation formula


Given grouped frequency data, the best that we can do is to estimate the group/class
that contains the median item and hence obtain the ‘theoretical’ value. To achieve
this objective, we proceed as follows:
Step 1: Form a Cumulative Frequency (CF) column
Step 2: Find N/2
Step 3: Find the CF value that first exceeds N/2, which identifies the median
class M
Step 4: Calculate the median using the formula
$$\text{median} = L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M$$
Where:
$L_M$ is the lower class boundary of the median class
$F_{M-1}$ is the cumulative frequency of the class just prior to the median class
$f_M$ is the observed frequency of the median class
$C_M$ is the class interval/width of the median class
Example. Estimate the median for the following data which represents the ages
of 130 representatives who took part in a statistical survey.
Age in Years   20-25   25-30   30-35   35-40   40-45   45-50
No. of reps    2       14      29      43      33      9
Solution:
Using the procedure illustrated above, we have
Age in Years   20-25   25-30   30-35   35-40   40-45   45-50
No. of reps    2       14      29      43      33      9
CF             2       16      45      88      121     130
Thus, N/2 = 130/2 = 65
Using the value 65, the CF value which first exceeds 65 is 88; thus, the class
represented by CF = 88 is the median class.
The median class is therefore the class 35-40
So, $\text{median} = L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M = 35 + \left(\frac{130/2 - 45}{43}\right)5 = 35 + \left(\frac{20}{43}\right)5 = 35 +
2.33 = 37.33$ years
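The four steps can be sketched in Python as follows. The numerator is taken as N/2 minus the cumulative frequency before the median class, which is what the arithmetic of the worked example uses (65 - 45 = 20). The names `grouped_median`, `age_classes` and `age_counts` are hypothetical choices for this illustration.

```python
age_classes = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
age_counts = [2, 14, 29, 43, 33, 9]


def grouped_median(boundaries, freqs):
    """Interpolated median for grouped data.

    boundaries: list of (lower boundary, upper boundary) per class.
    freqs: list of class frequencies in the same order.
    """
    half = sum(freqs) / 2                  # Step 2: N / 2
    cumulative_before = 0                  # CF of the classes already passed
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative_before + f >= half:  # Step 3: median class found
            width = upper - lower
            # Step 4: interpolate within the median class.
            return lower + (half - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("frequencies must not all be zero")


median_age = grouped_median(age_classes, age_counts)
print(round(median_age, 2))
```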

2.4. Mode for a grouped frequency:


Sometimes a set of data is obtained where it is appropriate to measure a repre-
sentative value in terms of ‘popularity ’. The mode of a set of data is that value
which occurs most often or equivalently has the largest frequency.


As in the case of the mean and median, the mode for grouped data cannot
be determined exactly, but it can be estimated by use of the interpolation technique
or graphically using a histogram.
An estimate can therefore be obtained as follows:
Step 1: Determine the modal class (class with the highest frequency)
Step 2: Calculate D1 = difference between the largest frequency and the frequency
immediately preceding it.
Step 3: Calculate D2 = difference between the largest frequency and the frequency
immediately following it.
Step 4: Use the interpolation formula
$$\text{mode} = L_m + \left(\frac{D_1}{D_1 + D_2}\right) C_m$$
Where:
$L_m$ is the lower class boundary of the modal class
$C_m$ is the modal class interval/width
Example. Estimate the mode for the following data which represents the ages of
130 representatives who took part in a statistical survey.
Age in Years 20-25 25-30 30-35 35-40 40-45 45-50
No. of reps 2 14 29 43 33 9
Solution:
D1 = 43 − 29 = 14
D2 = 43 − 33 = 10
Lm = 35
Cm = 5
Hence, $\text{mode} = 35 + \left(\frac{14}{14 + 10}\right) 5 = 35 + \left(\frac{14}{24}\right) 5 = 37.92$ years
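As a rough check, the interpolation formula for the mode can be coded as below. The name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption made here for the case where the modal class is the first or last class.

```python
def grouped_mode(boundaries, freqs):
    """Estimate the mode of grouped data with the interpolation formula."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    before = freqs[m - 1] if m > 0 else 0               # assumed 0 if no class precedes
    after = freqs[m + 1] if m + 1 < len(freqs) else 0   # assumed 0 if no class follows
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    lower, upper = boundaries[m], boundaries[m + 1]
    return lower + d1 / (d1 + d2) * (upper - lower)


ages = [20, 25, 30, 35, 40, 45, 50]
reps = [2, 14, 29, 43, 33, 9]
print(round(grouped_mode(ages, reps), 2))   # 37.92
```

If two classes share the highest frequency, this sketch simply uses the first of them.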

3. Measures of variability for grouped data


In this section, we want to see how we can find the variance and standard deviation
of our data given a grouped frequency table. We shall also consider the coding
method in computing the variance and standard deviation of the data. In addition,
we also talk briefly about the mean absolute deviation (MAD).

3.1. Variance and Standard deviation


In most cases, we are interested in a measure that can be used for further
statistical analysis of a set of data. The variance and standard deviation
are measures that can be used for this purpose.

Variance is defined as $\text{var}(x) = \sum f (x - \bar{x})^2 / \sum f$, while standard deviation is
its square root, $\sqrt{\sum f (x - \bar{x})^2 / \sum f}$. This formula can be written further as follows;

$\text{sd}(x) = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\frac{1}{n}\left[\sum f x^2 - \frac{(\sum f x)^2}{n}\right]}$
or $\text{sd}(x) = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$
where $n = \sum f$.

This can then be used to find the variance of the data as shown in the following
example.
Example. The data below relates to the number of successful sales made by the
salesmen employed by a large microcomputer firm in a particular quarter. Calculate
the standard deviation of the number of sales.
No. of sales          0 to 4   5 to 9   10 to 14   15 to 19   20 to 24   25 to 29
No. of salesmen, f    1        14       23         21         15         6
Solution:
We can solve this problem by first finding the midpoint, computing the mean
and then variance
No. of sales   No. of salesmen, f   Midpoint, x   fx     fx²
0 to 4         1                    2             2      4
5 to 9         14                   7             98     686
10 to 14       23                   12            276    3312
15 to 19       21                   17            357    6069
20 to 24       15                   22            330    7260
25 to 29       6                    27            162    4374
Total          80                                 1225   21,705
Hence, mean, $\bar{x} = \frac{1225}{80} = 15.3125 \approx 15.31$ sales
$\text{sd} = \sqrt{\frac{21705}{80} - (15.3125)^2} = \sqrt{271.31 - 234.47} = \sqrt{36.84} = 6.07 \approx 6.1$ sales

Note: In this case we have assumed that we are dealing with the whole population,
hence we divide by n and not n − 1.
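The calculation in this example can be reproduced with a short Python sketch. The function name `grouped_mean_sd` is our own, and the sketch uses the population form (divisor n) in line with the note above.

```python
from math import sqrt


def grouped_mean_sd(midpoints, freqs):
    """Mean and population standard deviation from class midpoints and frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2     # sum(f x^2)/n - (mean)^2
    return mean, sqrt(variance)


midpoints = [2, 7, 12, 17, 22, 27]         # midpoints of the sales classes
salesmen = [1, 14, 23, 21, 15, 6]
mean, sd = grouped_mean_sd(midpoints, salesmen)
print(round(mean, 2), round(sd, 2))        # 15.31 6.07
```

To work with a sample rather than a whole population, the variance would instead be rescaled by n/(n − 1) before taking the square root.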


We can use the coding method to find the standard deviation and variance of the
grouped data as follows:
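Let $x = a + cu$, so that $\bar{x} = a + c\bar{u}$.
Therefore, $\text{variance}(x) = \sum f (x - \bar{x})^2 / \sum f$; substituting the above expressions, we
have
$= \sum f (a + cu - a - c\bar{u})^2 / \sum f$; simplifying the equation, we have
$= c^2 \sum f (u - \bar{u})^2 / \sum f$
$= c^2\, \text{variance}(u)$, implying that
$\text{sd}(x) = c \sqrt{\text{variance}(u)}$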
Example. Consider once more the data that we saw in Example 1. Using the
above formulas, we can obtain the standard deviation as follows:
First, remember, we already obtained the mean x̄ = 67.45kg
Mass (kg)   No. of students, f   Class mark, x   x − x̄    (x − x̄)²   f(x − x̄)²
60-62       5                    61              -6.45    41.6025    208.0125
63-65       18                   64              -3.45    11.9025    214.2450
66-68       42                   67              -0.45    0.2025     8.5050
69-71       27                   70              2.55     6.5025     175.5675
72-74       8                    73              5.55     30.8025    246.4200
Total       100                                                      852.7500
Variance $= \sum f (x - \bar{x})^2 / \sum f = 852.75/100 = 8.5275$
Standard deviation $= \sqrt{\text{variance}} = \sqrt{8.5275} = 2.92$ kg
Alternatively, we can use the coding method.
Example. Using the coding method to find the standard deviation of the same
data set.

Mass (kg)   No. of students, f   Class mark, x   u = (x − 67)/3   fu    fu²
60-62       5                    61              -2               -10   20
63-65       18                   64              -1               -18   18
66-68       42                   67              0                0     0
69-71       27                   70              1                27    27
72-74       8                    73              2                16    32
Total       100                                                   15    97
Again, recall that we already obtained the mean x̄ = 67.45 kg. Here a = 67, c = 3
and $\bar{u} = \sum f u / \sum f = 15/100 = 0.15$.
Thus, variance $= c^2 \left[\left(\sum f u^2 / \sum f\right) - \bar{u}^2\right] = 9 \left[97/100 - 0.15^2\right] = 8.5275$
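The coding method can also be checked numerically. The sketch below assumes the coding u = (x − a)/c with a = 67 and c = 3 (the middle class mark and the class width); the function name `coded_mean_variance` is hypothetical.

```python
def coded_mean_variance(marks, freqs, a, c):
    """Mean and population variance of x obtained through the coding u = (x - a) / c."""
    n = sum(freqs)
    u = [(x - a) / c for x in marks]                      # coded class marks
    u_bar = sum(f * ui for ui, f in zip(u, freqs)) / n
    var_u = sum(f * ui * ui for ui, f in zip(u, freqs)) / n - u_bar ** 2
    # undo the coding: mean(x) = a + c * mean(u), var(x) = c^2 * var(u)
    return a + c * u_bar, c * c * var_u


marks = [61, 64, 67, 70, 73]
students = [5, 18, 42, 27, 8]
mean, variance = coded_mean_variance(marks, students, a=67, c=3)
print(round(mean, 2), round(variance, 4))   # 67.45 8.5275
```

Any other choice of a and c gives the same answers; convenient values only make the hand arithmetic shorter.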


3.2. Mean absolute deviation (MAD)


Also known as mean deviation, it is defined by $\sum f |x - \bar{x}| / \sum f$ or $\sum f |x - \bar{x}| / N$,
where |x − x̄| is the absolute difference between the value x and the mean x̄.
The function |x| or absolute value of x is defined by
|x| = x if x ≥ 0
|x| = −x if x < 0
For instance, | − 56| = 56 , |9| = 9 or | − 3.8| = 3.8
Example. Given the data in Example 3, obtain the MAD for the data.
Solution:
Mass (kg)   No. of students, f   Class mark, x   x − x̄    |x − x̄|   f|x − x̄|
60-62       5                    61              -6.45    6.45      32.25
63-65       18                   64              -3.45    3.45      62.10
66-68       42                   67              -0.45    0.45      18.90
69-71       27                   70              2.55     2.55      68.85
72-74       8                    73              5.55     5.55      44.40
Total       100                                                     226.50
Hence, MAD = 226.5/100 = 2.265 ≈ 2.27 kg
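A table of this kind is easy to verify by machine. The following sketch, with the hypothetical name `grouped_mad`, applies the definition of the mean absolute deviation directly to the class marks and frequencies.

```python
def grouped_mad(marks, freqs):
    """Mean absolute deviation about the mean for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(marks, freqs)) / n


marks = [61, 64, 67, 70, 73]
students = [5, 18, 42, 27, 8]
print(round(grouped_mad(marks, students), 3))
```

For any data set the MAD cannot exceed the standard deviation, which gives a quick sanity check on hand calculations.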

4. Measures of location/Position
4.1. Quartiles and Percentiles
We have already discussed how to find the median of grouped data. The process
of obtaining the quartiles and percentiles of grouped data is quite similar to what
we have seen with the median. For instance, if we are interested in the 1st quartile,
then instead of using N × 2/4 = N/2 we use N × 1/4 = N/4, and the rest of the
procedure is the same as for the median. Remember, the median is the 2nd
quartile. Similarly, for the percentiles, we divide N into 100 parts. For instance, the
10th percentile uses N × 10/100 = N/10.
Thus,
Q1 is the (1/4)N-th value
Q2 is the (2/4)N-th value
Q3 is the (3/4)N-th value
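Since only the target position changes, one function can cover the median, the quartiles and the percentiles. The sketch below is an assumption-laden illustration (the name `grouped_quantile` and the proportion argument `p` are ours), applied to the ages data used for the median.

```python
def grouped_quantile(boundaries, freqs, p):
    """Interpolated value below which a proportion p (0 < p <= 1) of the data lies."""
    target = p * sum(freqs)               # e.g. N/4 for Q1, N/2 for the median
    cum = 0                               # cumulative frequency before this class
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cum + f >= target:   # class containing the target position
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("the frequencies must contain at least one observation")


ages = [20, 25, 30, 35, 40, 45, 50]
reps = [2, 14, 29, 43, 33, 9]
q1, q2, q3 = (grouped_quantile(ages, reps, p) for p in (0.25, 0.50, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))   # 32.84 37.33 41.44
```

A percentile is obtained in the same way, for example `grouped_quantile(ages, reps, 0.10)` for the 10th percentile.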


Exercise 13. Jua Kali Solicitors monitored the time spent on consultations with
a random sample of 120 of their clients. The times spent, to the nearest minute
are summarized in the following table.
Time (min)       10-14   15-19   20-24   25-29   30-34   35-44   45-59   60-89   90-119
No. of clients   2       5       17      33      27      25      7       3       1
(a) Obtain the estimates of the median and quartiles of this distribution.
(b) Comment on the skewness of the distribution.

5. Combining sets of Data


There are some instances where we have been given (a) the number of observa-
tions, (b) the mean and (c) the standard deviation for each data set, but we need
to combine the data.
We may then be forced to find the mean and standard deviation for all the
data in the combined set.
Example. The number of errors, x, on each of 200 pages of a book was noted and
the results summarized as follows;
$\sum x = 920$, $\sum x^2 = 5032$
(a) Calculate the mean and standard deviation of the number of errors per
page. A further 50 pages were added and checked and it was found that the mean
was 4.4 errors with a standard deviation of 2.2 errors.
(b) Find the mean and standard deviation of the number of errors per page for
the 250 pages.
Solution:
(a) $\bar{x} = \sum x / n = 920/200 = 4.6$
$s^2 = \sum x^2 / n - \bar{x}^2 = 5032/200 - 4.6^2 = 4$
The mean is 4.6 errors per page and the standard deviation is $\sqrt{4} = 2$ errors.
(b) For the errors, y, on the further 50 pages
Mean = 4.4
Therefore, $4.4 = \sum y / 50$, which implies that
$\sum y = 4.4 \times 50 = 220$
The standard deviation = 2.2
Meaning, $2.2^2 = \sum y^2 / 50 - 4.4^2$, implying
$\sum y^2 = 50(2.2^2 + 4.4^2) = 1210$
Thus, for the combined set of 250 pages, total number of errors $= \sum x + \sum y = 920 + 220 = 1140$
Mean = 1140/250 = 4.56 and
$(\text{standard deviation})^2 = \left(\sum x^2 + \sum y^2\right)/250 - 4.56^2$
$= \frac{5032 + 1210}{250} - 4.56^2 = 4.1744$
Standard deviation $= \sqrt{4.1744} = 2.04$ (3 s.f.)
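The same reasoning works for any number of data sets: recover the sum and the sum of squares of each set from its n, mean and standard deviation, add them, and recompute. A minimal sketch, assuming every standard deviation was computed with divisor n (the function name `combine` is hypothetical):

```python
from math import sqrt


def combine(summaries):
    """Combine data sets given as (n, mean, sd) tuples, sd computed with divisor n."""
    n = sum(k for k, _, _ in summaries)
    sum_x = sum(k * m for k, m, _ in summaries)                    # total of the values
    sum_x2 = sum(k * (s ** 2 + m ** 2) for k, m, s in summaries)   # total of the squares
    mean = sum_x / n
    return n, mean, sqrt(sum_x2 / n - mean ** 2)


n, mean, sd = combine([(200, 4.6, 2.0), (50, 4.4, 2.2)])
print(n, round(mean, 2), round(sd, 2))    # 250 4.56 2.04
```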

Exercise 14. Cartons of orange juice are advertised as containing 1 litre. A
random sample of 100 cartons gave the following results for the volume, x.
$\sum x = 101.4$, $\sum x^2 = 102.83$
Calculate the mean and standard deviation of the volume of orange juice in
these 100 cartons.

6. Summary
The relationship between mean, median and mode is as follows.
The median lies between the mean and mode but closer to the mean by a
factor of 2 to 1. Hence the relationship median − mode = 2(mean − median) is
approximately true. We can therefore express the following relationships:
• median = (2(mean) + mode)/3

• mode = 3(median) − 2(mean)

• mean = (3(median) − mode)/2

To comment on the skewness of the distribution of a data set, we may use the
Quartile coefficient of skewness given by $\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$

Learning Activities
• Briefly show how you can use the coding method to obtain the mean and
standard deviation of a simple frequency distribution table.

• With relevant examples, discuss the interpolation method of finding the
median and mode of grouped frequency data.

• With relevant examples, briefly discuss how we can compute the quartiles
and percentiles for grouped frequency table.


Chapter 6
Introduction to Probability

Learning outcomes
Upon completing this topic, you should be able to:

• Define probability

• Calculate probabilities

• List and use the rules of probability

• Identify mutually exclusive events and independent events


1. Introduction
Probability is the language we use to model uncertainty. We all intuitively under-
stand that few things in life are certain. There is usually an element of uncertainty
or randomness around outcomes of our choices. For instance in business this un-
certainty can make all the difference between a good investment and a poor one.
Hence an understanding of probability and how we might incorporate this into
our decision making processes is important. In this lesson, we look at the logical
basis for how we might express a probability and some basic rules that probabilities
should follow. In subsequent lessons, we look at how we can use probabilities to
aid decision making. It is advisable that you revisit the set theory lesson to help
you understand this lesson better.

2. Definitions
The probability of a specific event is a mathematical statement about the likelihood
that it will occur. All probabilities are numbers between 0 and 1, inclusive; a
probability of 0 means that the event will never occur, and a probability of 1
means that the event will always occur. We often use the letter P to represent a
probability. For example, P (Rain) would be the probability that it rains. In other
cases P r is used to represent a probability. It is important to understand some
terms used in probability. They include:

Probability Experiment
An experiment is an activity where we do not know for certain what will happen,
but we will observe what happens. For example:

• We may ask someone whether or not they have used our IT products.

• We may observe the temperature at midday tomorrow.

• We may toss a coin and observe whether it shows “heads” or “tails”.

• Rolling a die and observing the number that is rolled is a probability experi-
ment.

Alternatively, a probability experiment is an action through which specific results
(counts, measurements or responses) are obtained.


Outcome
An outcome, or elementary event, is one of the possible things that can happen.
For example, suppose that we are interested in the shoe size of the next customer
to come into a shoe shop. Possible outcomes include “eight”, “twelve”, “nine and
a half” and so on. In any experiment, one, and only one, outcome occurs.
The result of a single trial in a probability experiment is the outcome.

Sample space
The sample space is the set of all possible outcomes. For example, it could be the
set of all shoe sizes. When rolling a die, the sample space has six outcomes:
{1, 2, 3, 4, 5, 6}.

Event
An event consists of one or more outcomes and is a subset of the sample space.
An event is usually denoted using a capital (uppercase) letter. For example “the
shoe size of the next customer is less than 9” is an event. It is made up of all of
the outcomes where the shoe size is less than 9. Of course an event might contain
just one outcome. We can set a letter say E to represent this event.
For instance, when a die is rolled, "rolling an even number" is an event.
A simple event is an event that consists of a single outcome.
Example. A die is rolled. Event A is rolling an even number. This is not a simple
event because the outcomes of event A are {2, 4, 6}.

3. Rules of probability
• Probabilities are usually expressed in terms of fractions or decimal numbers
or percentages.

Therefore we could express the probability of it raining today as
p(rain) = 1/20 = 0.05

• All probabilities are measured on a scale ranging from zero to one. The
probabilities of most events lie strictly between zero and one. An event with
probability zero is an impossible event and an event with probability one is
said to be a certain event.


• The collection of all possible outcomes, that is the sample space, has a
probability of 1. For example, if an experiment consists of only two outcomes
– success or failure – then the probability of either a success or a failure is
1. That is P(success or failure) = 1.
• With respect to an event E, the complementary event, denoted as E^c or E′
(read as "E prime") or ∼E, is the negation of the event E. For example, if
we consider the event that it will rain tomorrow, the complement of this
event is the event that it will not rain tomorrow. We should note that the
probabilities of an event E and its complement sum to 1, i.e.
P(E) + P(E^c) = 1

Example. There are 5 red chips, 4 blue chips, and 6 white chips in a basket. Find
the probability of randomly selecting a chip that is not blue.
Solution: P (selecting a blue chip) = 4/15 = 0.267
implying P (not selecting a blue chip) = 1 − 0.267 = 0.733

• Two or more events are said to be mutually exclusive if both cannot occur
simultaneously. In the example above, the outcomes success and failure are
mutually exclusive because both cannot occur at the same time. Two events
A and B are mutually exclusive if A ∩ B = ∅, so that P(A ∩ B) = 0.

• A list of collectively exhaustive events contains all possible elementary events
for an experiment. For example, when rolling a die once, the possible events are the
numbers {1,2,3,4,5,6}, which are said to be collectively exhaustive because
they include all possible outcomes. Thus, the outcomes in a sample space are
collectively exhaustive.

Example. Let A = the event that it is Monday, B = the event that it is Tuesday,
and C = the event that it is the year 2014. A and B are mutually exclusive events,
since it cannot be both Monday and Tuesday at the same time. A and C are not
mutually exclusive events, since it can be a Monday in the year 2014.

• Two events are said to be independent if the occurrence of one does not affect
the probability of the second occurring. If two events are independent, then
the probability that both will occur is equal to the product of their individual
probabilities. In other words, if A and B are independent, then


P (A ∩ B) = P (A) × P (B)

Example. If you toss a coin and look out of the window, it would be reasonable
to suppose that the events “get heads” and “it is raining” would be independent.
However, not all events are independent.

4. How do we measure probability?


There are three main ways in which we can measure probability. All three obey
the basic rules described above. Different people argue in favour of the different
views of probability and some will argue that each kind has its uses depending on
the circumstances.

4.1. Classical (or theoretical) probability


If all possible outcomes are “equally likely” then we can adopt the classical approach
to measuring probability. For example, if we tossed a fair coin, there are only two
possible outcomes – a head or a tail – both of which are equally likely, and hence
P(head) = P(tail) = 1/2
The underlying idea behind this view of probability is symmetry. In this ex-
ample, there is no reason to think that the outcome Head and the outcome Tail
have different probabilities and so they should have the same probability. Since
there are two outcomes and one of them must occur, both outcomes must have
probability 1/2. Another commonly used example is rolling dice. There are six
possible outcomes {1, 2, 3, 4, 5, 6} when a die is rolled and each of them should
have an equal chance of occurring. Hence P(1) = 1/6, P(2) = 1/6, . . . .
Other calculations can be made such as P(Even number) = 3/6.
This follows from the formula

P(Event) = (Total number of outcomes in which event occurs) / (Total number of possible outcomes)
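When the outcomes are equally likely, the formula amounts to counting. The sketch below illustrates this for one roll of a fair die; the helper name `classical_probability` is our own, and `Fraction` is used only so that the answers appear as exact fractions.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(event) = outcomes in which the event occurs / all possible outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


die = [1, 2, 3, 4, 5, 6]                               # equally likely outcomes
p_even = classical_probability(lambda o: o % 2 == 0, die)
p_one = classical_probability(lambda o: o == 1, die)
print(p_even, p_one, 1 - p_even)                       # 1/2 1/6 1/2
```

The last value uses the complement rule: the probability of "not even" is 1 − P(even).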

4.2. Frequentist or Empirical (or statistical) probability


When the outcomes of an experiment are not equally likely, we can conduct exper-
iments to give us some idea of how likely the different outcomes are. For example,
suppose we were interested in measuring the probability of producing a defective
item in a manufacturing process. This probability could be measured by monitoring
the process over a reasonably long period of time and calculating the proportion of
defective items. What constitutes a reasonably long period of time is, of course,
a difficult question to answer. In a more simple case, if we did not believe that a
coin was fair, we could toss the coin a large number of times and see how often we
obtained a head. In both cases we perform the same experiment a large number
of times and observe the outcome. This is the basis of the frequentist view. By
conducting experiments the probability of an event can easily be estimated using
the following formula:

P(Event) = (Number of times an event occurs) / (Total number of times the experiment is performed)

The larger the experiment, the closer this probability is to the “true” probability.
The frequentist view of probability regards probability as the long run relative
frequency (or proportion). So, in the defects example, the “true” probability of
getting a defective item is the proportion obtained in a very large experiment
(strictly an infinitely long sequence of trials). In the frequentist view, probability is
a property of nature and, since, in practice, we cannot conduct infinite sequences
of trials, in many cases we never really know the “true” values of probabilities. We
also have to be able to imagine a long sequence of “identical” trials. This does not
seem to be appropriate for “one-off” experiments like the launch of a new product.
For these reasons (and others) some people prefer the subjective or Bayesian view
of probability.
Example. A travel agent determines that in every 50 reservations she makes, 12
will be for a cruise. What is the probability that the next reservation she makes
will be for a cruise?
Solution:

p(cruise) = 12/50 = 0.24

Note: As an experiment is repeated over and over, the empirical probability of an
event approaches the theoretical (actual) probability of the event. This is referred
to as the Law of Large Numbers.

For instance, Sally flips a coin 20 times and gets 3 heads. The empirical probability
is 3/20. This is not representative of the theoretical probability, which is 1/2.
As the number of times Sally tosses the coin increases, the Law of Large
Numbers indicates that the empirical probability will get closer and closer
to the theoretical probability.
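A small simulation gives a feel for this behaviour. The sketch below is only an illustration: it assumes a fair coin simulated with Python's pseudo-random number generator, and the function name `empirical_probability` and the fixed seed are our own choices.

```python
import random


def empirical_probability(trials, seed=0):
    """Relative frequency of heads in `trials` simulated tosses of a fair coin."""
    rng = random.Random(seed)             # fixed seed so that runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


for n in (20, 2_000, 200_000):
    print(n, empirical_probability(n))    # values tend to settle near 0.5 as n grows
```

With only 20 tosses the relative frequency can be far from 1/2, as in Sally's case; with many more tosses a large departure becomes very unlikely.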

4.3. Subjective
We are probably all intuitively familiar with this method of assigning probabilities.
When we board an aeroplane, we judge the probability of it crashing to be sufficiently
small that we are happy to undertake the journey. Similarly, the odds given by
bookmakers on a football match reflect people’s beliefs about which team will win.
This probability does not fit within the frequentist definition as the match cannot
be played more than once.
One potential difficulty with using subjective probabilities is that they are
subjective, so the probabilities which two people assign to the same event can be
different. This becomes important if these probabilities are to be used in deci-
sion making. For example, if you were deciding whether to launch a new product
and two people had very different ideas about how likely success or failure of this
product was, then the decision to go ahead could be controversial.
If both individuals assessed the probability of success to be 0.8 then the decision
to go ahead could easily be based on this belief. However, if one said 0.8 and the
other 0.3, then the decision is not straightforward. We would need a way to
reconcile these different positions.
Subjective probability is based on personal judgment, accumulated knowledge
and experience. For instance, medical doctors sometimes assign subjective
probabilities to the life expectancy of people with breast cancer.

5. Laws of probability
5.1. Multiplication law

The probability of two independent events E1 and E2 both occurring can be written
as
P(E1 ∩ E2) = P(E1) × P(E2), and this is known as the multiplication law
of probability.
For example, the probability of throwing a six followed by another six on two
rolls of a die is calculated as follows. The outcomes of the two rolls of the die are
independent. Let E1 denote a six on the first roll and E2 a six on the second roll.
Then
P(two sixes) = P(E1 and E2)
= P(E1) × P(E2) = 1/6 × 1/6 = 1/36
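The result can be confirmed by listing all 36 equally likely pairs of rolls and counting. This is a minimal sketch; the helper `prob` and the variable names are our own.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))    # all 36 (first roll, second roll) pairs


def prob(event):
    """Classical probability of an event over the 36 equally likely pairs."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


p_e1 = prob(lambda r: r[0] == 6)                # six on the first roll
p_e2 = prob(lambda r: r[1] == 6)                # six on the second roll
p_both = prob(lambda r: r == (6, 6))            # six on both rolls
print(p_both, p_both == p_e1 * p_e2)            # 1/36 True
```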

5.2. Addition law


The multiplication law is concerned with the probability of two or more independent
events occurring. The addition law describes the probability of any of two or more
events occurring. The addition law for two events E1 and E2 is
P(E1 or E2) = P(E1) + P(E2) − P(E1 and E2). Recall that
n(A ∪ B) = n(A) + n(B) − n(A ∩ B) from set theory.
This describes the probability of either event E1 or event E2 happening.
A more basic version of the rule works where events are mutually exclusive: if
events E1 and E2 are mutually exclusive then
P (E1 or E2) = P (E1) + P (E2).
This simplification occurs because when two events are mutually exclusive they
cannot happen together and so P (E1 and E2) = 0.
Example. Consider the following information: 50 percent of families in a certain
city subscribe to the morning newspaper, 65 percent subscribe to the afternoon
newspaper, and 30 percent of the families subscribe to both newspapers. What
percentage of families subscribe to at least one newspaper?
We are told P(Morning) = 0.5, P(Afternoon) = 0.65 and P(Morning and
Afternoon) = 0.3.
Therefore using the addition law
P(at least one paper) = P(Morning or Afternoon) = P(Morning) + P(Afternoon)
− P(Morning and Afternoon)
= 0.5 + 0.65 − 0.3 = 0.85.


So 85% of the families in the city subscribe to at least one of the newspapers.
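The addition law is a one-line calculation, sketched below with exact fractions. The function name `p_union` is hypothetical; leaving out the third argument corresponds to the mutually exclusive case, where P(E1 and E2) = 0.

```python
from fractions import Fraction


def p_union(p_a, p_b, p_a_and_b=0):
    """Addition law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# Newspaper example: 50% morning, 65% afternoon, 30% both
print(p_union(Fraction(50, 100), Fraction(65, 100), Fraction(30, 100)))   # 17/20

# Mutually exclusive case: a number less than 3, or a 4, on one roll of a die
print(p_union(Fraction(2, 6), Fraction(1, 6)))                            # 1/2
```

Here 17/20 is the same as 0.85, in agreement with the example.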


Below are some graphical illustrations on the laws, and use of the laws of
probability.
Mutually Exclusive Events
Two events, A and B, are mutually exclusive if they cannot occur at the same time.
[Figure: two Venn diagrams. In the first, circles A and B do not overlap (A and B are
mutually exclusive). In the second, circles A and B overlap in the region "A and B"
(A and B are not mutually exclusive).]

Example:
Decide if the two events are mutually exclusive.
Event A: Roll a number less than 3 on a die.
Event B: Roll a 4 on a die.
[Figure: Venn diagram with circle A containing 1 and 2, and a separate circle B
containing 4.]
These events cannot happen at the same time, so the events are mutually exclusive.

Example:
Decide if the two events are mutually exclusive.
Event A: Select a Jack from a deck of cards.
Event B: Select a heart from a deck of cards.
[Figure: Venn diagram with overlapping circles A (the Jacks) and B (the hearts);
the Jack of hearts lies in the overlap.]
Because the card can be a Jack and a heart at the same time, the events are not
mutually exclusive.

The Addition Rule
The probability that event A or B will occur is given by
P(A or B) = P(A) + P(B) − P(A and B).
If events A and B are mutually exclusive, then the rule
can be simplified to P(A or B) = P(A) + P(B).
Example:
You roll a die. Find the probability that you roll a number
less than 3 or a 4.
The events are mutually exclusive.
P(roll a number less than 3 or roll a 4)
= P(number is less than 3) + P(4)
= 2/6 + 1/6 = 3/6 = 0.5

Example:
A card is randomly selected from a deck of cards. Find the
probability that the card is a Jack or the card is a heart.
The events are not mutually exclusive because the
Jack of hearts can occur in both events.
P(select a Jack or select a heart)
= P(Jack) + P(heart) − P(Jack of hearts)
= 4/52 + 13/52 − 1/52
= 16/52 ≈ 0.308

Example:
100 college students were surveyed and asked how many
hours a week they spent studying. The results are in the
table below. Find the probability that a student spends
between 5 and 10 hours or more than 10 hours studying.
          Less than 5   5 to 10   More than 10   Total
Male      11            22        16             49
Female    13            24        14             51
Total     24            46        30             100
The events are mutually exclusive.
P(5 to 10 hours or more than 10 hours) = P(5 to 10) + P(more than 10)
= 46/100 + 30/100 = 76/100 = 0.76

6. Conditional probability
So far we have only considered probabilities of single events or of several indepen-
dent events, like two rolls of a die. However, in reality, many events are related.
For example, the probability of it raining in 5 minutes time is dependent on whether
or not it is raining now. We need a mathematical notation to capture how the
probability of one event depends on other events taking place. We do this as
follows. Consider two events A and B. We write P (A|B) for the probability of
A given that B has already happened. We describe P (A|B) as the conditional
probability of A given B.
We can calculate these conditional probabilities using the formula

P(A|B) = P(A and B) / P(B)
that is, in terms of the probability of both events occurring, P(A and B), and the
probability of the event that has already taken place, P(B).

6.1. Independence of two compound events


Suppose twenty identical cards, numbered 1-20 inclusive, are placed in a large box
and a card is then drawn at random. The events A and B are defined as follows:
A: the number is prime
B: the number is 14 or more
If the number is 14 or more, a player is given a prize. Suppose the person
drawing the card does not reveal the number on the card but only says that the
number is prime. How does this affect the player's chance of winning?

To fully understand this question, we need to use a contingency table as shown
below.
          A       A′      Total
B         2/20    5/20    7/20
B′        6/20    7/20    13/20
Total     8/20    12/20   1
This implies that
(i) P(B/A) = P(B ∩ A)/P(A) = P(A ∩ B)/P(A) = (2/20)/(8/20) = 1/4
(ii) P(B/A′) = P(B ∩ A′)/P(A′) = (5/20)/(12/20) = 5/12
Is P(B/A) = P(B/A′)?
Consider now that B is "the number is 16 or more"; is P(B/A) = P(B/A′)?
Is the player's winning of a prize dependent on the previous event?
Two events A and B are said to be independent whenever P(B/A) = P(B/A′) = P(B).
To test the independence of the two events A and B, it is sufficient to show
that any two of the probabilities P(B/A), P(B/A′) and P(B) are equal.
Example. With B redefined as "the number is 16 or more", show that
P(B′) = P(B′/A) = P(B′/A′) = 15/20 = 3/4 and P(A) = P(A/B) = P(A/B′) = 8/20 = 2/5
For independent events, we can multiply their probabilities.
Thus the statements 'A and B are independent' and P(A ∩ B) = P(A)P(B)
are equivalent.
Proof
Suppose A and B are independent, then P(A) = P(A/B), that is,
P(A) = P(A ∩ B)/P(B)
⟹ P(A) × P(B) = P(A ∩ B)
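The card example can be explored by direct counting. The sketch below assumes one card drawn uniformly from 1 to 20; the helper `conditional` and the other names are our own. It compares the chance of winning given "prime" with the chance given "not prime", for the winning thresholds 14 and 16.

```python
from fractions import Fraction

cards = range(1, 21)
primes = {2, 3, 5, 7, 11, 13, 17, 19}


def conditional(event, given):
    """P(event / given) for one card drawn uniformly at random from 1..20."""
    given_cards = [c for c in cards if given(c)]
    return Fraction(sum(1 for c in given_cards if event(c)), len(given_cards))


def is_prime(c):
    return c in primes


def not_prime(c):
    return c not in primes


for threshold in (14, 16):
    def wins(c, t=threshold):
        return c >= t
    print(threshold, conditional(wins, is_prime), conditional(wins, not_prime))
# 14 1/4 5/12
# 16 1/4 1/4
```

With the threshold at 14 the two conditional probabilities differ, so winning depends on whether the number is prime; with the threshold at 16 they agree, so the two events are independent.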

Exercise 15. Given that the events A and B are independent, copy and complete the
following contingency table. The results can be obtained as follows:
          A       A′      Total
B         3/20    y       u
B′        x       z       v
Total     1/4     t       1

7. Tree Diagrams
In some cases, especially where there are three or more different events being
considered, tree diagrams are an alternative to the contingency tables.
Tree diagrams or probability trees are simple clear ways of presenting proba-
bilistic information. Let us first consider a simple example in which a fair coin is
tossed twice. Suppose we are interested in the probability that we get a head on
both tosses. This probability can be calculated as
P(Head and Head) = P(Head on 1st toss) × P(Head on 2nd toss|head on 1st
toss)
This example can be represented as a tree diagram in which experiments are
represented by circles (called nodes) and the outcomes of the experiments as
branches. The branches are annotated by the probability of the particular out-
come.
Example. In a large farm, 20% of a particular kind of flower is red and 80% is
white. The farmer decides to take samples of flowers from the production of this
particular kind. What is the probability that he obtains;
(a) One or two red flowers in a sample of two?
(b) At least two red flowers in a sample of three?
Solution:
This information can be represented in the tree diagram as follows.

[Figure: probability tree starting at "Start" with three levels of branches. At every
node the branch to R (red) has probability 1/5 and the branch to W (white) has
probability 4/5. The eight end points are
RRR RRW RWR RWW WRR WRW WWR WWW]

In this problem, we assume that probability of these events remain the same
even after picking a small number of flowers from the production line.
(a) P (RR) + P (RW ) + P (W R) this represents one or two red flowers
But
P (RR) = 1/5 ∗ 1/5 = 1/25
P (RW ) = 1/5 ∗ 4/5 = 4/25
P (W R) = 4/5 ∗ 1/5 = 4/25
=⇒P (RR) + P (RW ) + P (W R) = 1/25 + 4/25 + 4/25 = 9/25
Alternatively,
P(one or two red flowers) = 1 − P(no red flower) = 1 − P(WW)
= 1 − (4/5 × 4/5) = 1 − 16/25 = 9/25
(b) P(RRR) + P(RRW) + P(RWR) + P(WRR)
= (1/5)³ + (1/5 × 1/5 × 4/5) + (1/5 × 4/5 × 1/5) + (4/5 × 1/5 × 1/5) = 13/125
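Walking along every path of the tree and adding the probabilities of the paths we want can be automated. This is a sketch under the same assumption as the example, namely that each flower is red with probability 1/5 independently of the others; the names `COLOUR_PROB` and `prob_of` are hypothetical.

```python
from fractions import Fraction
from itertools import product

COLOUR_PROB = {"R": Fraction(1, 5), "W": Fraction(4, 5)}


def prob_of(sample_size, condition):
    """Add the probabilities of all tree paths whose number of reds meets `condition`."""
    total = Fraction(0)
    for path in product("RW", repeat=sample_size):   # e.g. ('R', 'W', 'R')
        if condition(path.count("R")):
            p = Fraction(1)
            for colour in path:                      # multiply along the branches
                p *= COLOUR_PROB[colour]
            total += p
    return total


print(prob_of(2, lambda reds: reds >= 1))   # 9/25   (one or two red in a sample of two)
print(prob_of(3, lambda reds: reds >= 2))   # 13/125 (at least two red in a sample of three)
```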

Example. A box has 6 blue beads and 4 red beads. Three beads are drawn at
random (without replacement). What is the probability that: (a) they are all blue
(b) there are exactly two blue beads (c) there is at least one blue bead
Solution:

In the case of draws made without replacement, where tree diagrams become complex
with many branches, we can use combinations for quick computation of probabilities.
(a) For this case, the total number of ways of selecting 3 beads from 10 is
${}^{10}C_3 = 120$
Selecting 3 blue beads from 6 is ${}^{6}C_3 = 20$
Therefore, P(all blue) $= {}^{6}C_3 / {}^{10}C_3 = 20/120 = 1/6$
(b) Selecting 2 blue beads from 6: ${}^{6}C_2 = 15$
Selecting 1 red bead from 4: ${}^{4}C_1 = 4$
Therefore, P(exactly 2 blue) $= \frac{15 \times 4}{120} = 1/2$
(c) P(at least one blue) $= 1 - P(\text{all red}) = 1 - {}^{4}C_3 / {}^{10}C_3 = 1 - 4/120 = 29/30$
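These counts can be reproduced with `math.comb` from the Python standard library (available from Python 3.8). The function name `p_blue` and its default arguments are our own, chosen to match the bead example.

```python
from fractions import Fraction
from math import comb


def p_blue(k, blue=6, red=4, draws=3):
    """P(exactly k blue beads) when `draws` beads are taken without replacement."""
    ways = comb(blue, k) * comb(red, draws - k)      # choose the blue, then the red
    return Fraction(ways, comb(blue + red, draws))


print(p_blue(3))        # 1/6    (a) all blue
print(p_blue(2))        # 1/2    (b) exactly two blue
print(1 - p_blue(0))    # 29/30  (c) at least one blue
```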

8. Bayes Theorem
Suppose we know P (A),P (∼ A) and also P (B/A) and P (B/ ∼ A), then we can
represent the first branches of a tree diagram and those of B and ∼ B in the
second branches. Can we then determine P (A/B)?
This problem can be solved by using Thomas Bayes theorem. Bayes was
an English Mathematician and his theorem has given us a fundamental result of
statistical inference.
Mathematically, Bayes theorem gives the relationship between the probabilities of
A and B, P(A) and P(B), and the conditional probabilities of A given B and
B given A, denoted by P(A/B) and P(B/A).
Commonly, Bayes theorem is stated as;
Simple: $P(A/B) = \frac{P(B/A)\, P(A)}{P(B)}$ for $P(B) \neq 0$
(The meaning depends on the interpretation of probability ascribed to the
terms)
Extended: $P(A/B) = \frac{P(B/A)\, P(A)}{P(B/A)\, P(A) + P(B/A')\, P(A')}$

Example. Kamau has two gardeners, David and James. David comes on 1/3 of
the occasions and James 2/3 of the occasions. There is a probability of 1/10
that David will forget to water the flowers and a probability of 1/2 that James
will forget to water the flowers. One day, Kamau had to leave the house before
the gardener arrived. On his return, he found that the gardener had come and
gone, and also that the flowers were not watered. What is the probability that it
is James who came that day?

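Solution:

Let
D: David comes
J: James comes
W: Flowers watered
The tree diagram will then look like this
[Figure: tree diagram. The first branches are J (probability 2/3) and D (probability
1/3). From J the branches are W′ (probability 1/2) and W; from D the branches are
W (probability 9/10) and W′.]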

We need to find P(J/W′)

$P(J/W') = \frac{P(J \cap W')}{P(W')}$
⟹ $P(W') = P(W'/J)\, P(J) + P(W'/D)\, P(D)$
From the diagram
$P(W'/J) = 1/2$ and $P(W'/D) = 1/10$
Then
$P(J/W') = \frac{P(J \cap W')}{P(W')} = \frac{(2/3 \times 1/2)}{(1/3 \times 1/10) + (2/3 \times 1/2)} = \frac{1/3}{11/30} = \frac{10}{11}$
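The same calculation can be written as a small function that works for any number of mutually exclusive causes. This is a sketch only: the function name `posterior` and the dictionary layout are our own assumptions.

```python
from fractions import Fraction


def posterior(priors, likelihoods, target):
    """Bayes theorem: P(target / evidence) from P(cause) and P(evidence / cause)."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)   # total probability P(W')
    return priors[target] * likelihoods[target] / evidence


came = {"David": Fraction(1, 3), "James": Fraction(2, 3)}        # P(D), P(J)
forgets = {"David": Fraction(1, 10), "James": Fraction(1, 2)}    # P(W'/D), P(W'/J)
print(posterior(came, forgets, "James"))                         # 10/11
```

The two posterior probabilities add up to 1, so the probability that it was David is 1/11.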


Exercise 16. A certain video store uses blank tapes bought from two sources,
say source A and source B. Suppose that the owner of the video store buys 30%
of the tapes from A, of which it is known that 5% are defective, and buys 70%
from source B, of which 20% are usually defective. On recording some movies on
the tapes, the owner discovers that a certain tape is defective. What is
the probability that the video tape was supplied by A?

9. Summary
An experiment is a process that, when performed, results in one and only one of
many observations. The observations are called the outcomes of an experiment.
The collection of all possible outcomes of an experiment is called a sample space.
A sample space is denoted by S. For example, the sample space for an experiment
of inspecting a computer fan is written as S = {good, defective}, and the sample
space for the number of heads obtained when a coin is tossed twice is S = {0, 1, 2}.
For three or more events, it is easier to construct a probability space than a
contingency table, because contingency tables are only practicable for two events.


9.1. Revision questions or guidelines


1. A box contains three pieces of identical mouse pads of different colors: Red
(R), White (W) and Blue (B). Two pieces of pads are randomly picked from
the box.

(a) What is the sample space if the picked piece is not replaced?
(b) What is the sample space if the picked piece is replaced?

2. If 85% of people have a bowl of cereal for breakfast, 60% of people have
toast, and 50% of people have both cereal and toast for breakfast, what
percentage of people have neither cereal nor toast for breakfast?

3. Do you think the following pairs of events are independent or dependent?
Explain.

(a) E: An individual has a high IQ
F: An individual is accepted for a University place
(b) A: A student plays table tennis
B: A student is good at maths
(c) E1: An individual has a large outstanding credit card debt
E2: An individual is allowed to extend his bank overdraft

4. A company manufactures a device which contains three components A, B
and C. The device fails if any of these components fail and the company offers
to its customers a full money-back warranty if the product fails within one
year. The company has assessed the probabilities of each of the components
lasting at least a year as 0.98, 0.99 and 0.95 for A, B and C respectively. The
three components within a single device are considered to be independent.
Consider a single device chosen at random. Calculate the probability that

(a) all three components will last for at least a year;
(b) the device will be returned for a refund

5. Two events A and B are such that P (A) = 1/4, P (A|B) = 1/2 and
P (B|A) = 2/3. (a) Are A and B independent? (b) Are A and B Mutually
exclusive? (c) Find P (A ∩ B) (d) Find P (B).


6. A group of 50 BIT students were asked which of the three Computer Science
Journals, A, B or C they read. The results showed that 25 read A, 16 read
B, 14 read C. 5 read both A and B, 4 read both B and C, 6 read both C
and A and 2 read all three.

(a) Represent these data on a Venn diagram


(b) Find the probability that a person selected at random from this group
reads
i. At least one of the Journals
ii. Only one of the Journals


Learning Activities
• Two fair six-faced dice are rolled. Let T be the event that the sum is 10 and
D be the event that the score is a double. Construct a tree diagram with the
first branch being T, and also another tree diagram with the first branch being D.


Chapter 7
Discrete Probability Distribution

Learning outcomes
Upon completing this topic, you should be able to:

• Define probability distributions.

• Differentiate between discrete and continuous variables and probability
distributions.

• Explain the discrete probability distributions considered.

• Calculate the expected value from probability distribution.

• Evaluate variance from probability distribution.


1. Introduction
An important part of any analysis of decision making under stochastic conditions
is a probability distribution. Probability distributions state the relative frequency of
occurrence of a set of mutually exclusive events. Probability distributions can be
univariate or multivariate. They give the relative frequency of observing a particular
event. We saw that surveys can be used to get information on population
quantities. In most cases, it is not possible to measure the variables on every member of
the population and so some sampling scheme is used. This means that there is
uncertainty in our conclusions. Before we can make inferences about populations,
we need a language to describe the uncertainty we find when taking samples from
populations. This can be done using probability distributions.

2. Random Variable
In many experiments the outcomes of the experiment can be assigned numerical
values. For instance, if you roll a die, each outcome has a value from 1 through
6. If you ascertain the midterm test score of a student in your class, the outcome
is again a number.
A random variable is just a rule that assigns a number to each outcome of
an experiment. These numbers are called the values of the random variable. We
often use letters like X, Y and Z to denote a random variable. Here are some
examples:

1. Experiment: Select a mutual fund; X = the number of companies in the
fund portfolio. The values of X are 2, 3, 4, ...

2. Experiment: Select a soccer player; Y = the number of goals the player
has scored during the season. The values of Y are 0, 1, 2, 3, ...

Random variables may be discrete or continuous.


A discrete random variable can take on only specific, isolated numerical
values, like the outcome of a roll of a die, or the number of shillings in a randomly
chosen bank account.

• Discrete random variables that can take on only finitely many values (like
the outcome of a roll of a die) are called finite random variables.


• Discrete random variables that can take on an unlimited number of values
(like the number of stars estimated to be in the universe) are infinite discrete
random variables.

A continuous random variable, on the other hand, can take on any value
within a continuous range or an interval, like the temperature, the height of an
athlete in centimeters, the yield of maize from an acre of land, or the weight of a
laptop in a supplier’s store.

3. Discrete probability distribution


Given a random variable X, it is natural to look at certain events, for instance,
the event that X = 2. By this, we mean the event consisting of all outcomes that
have an assigned X value of 2. To illustrate this, let’s look at an example: suppose
we throw a pair of fair dice, and take X to be the sum of the numbers facing up.
Then the event that X = 2 is {(1, 1)}. The event that X = 3 is {(2, 1), (1, 2)}.
The event that X = 4 is {(3, 1), (2, 2), (1, 3)}, and so on. Each of these
events has a certain probability associated with it. For instance, the
probability that X = 4 is 3/36 = 1/12 because the event in question consists of three
of the thirty-six possible (equally likely) outcomes.
For the probability distribution of a discrete random variable, we need to know
both the set of values of the random variable and the probability with which it
takes each of the values.
The function that is responsible for allocating probabilities, P(X = x), is
known as the probability mass function of X, sometimes abbreviated as the PMF of
X. (These notes also use the abbreviation pdf for this function.)
A PMF can either be a list of individual probabilities or a summary of
them in a formula.
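
As a rough illustration of listing a PMF, the sketch below enumerates all equally likely outcomes for a pair of dice and counts how often each sum occurs. The function name sum_pmf and its faces parameter are assumptions made for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_pmf(faces=6):
    """Return the PMF of the sum of two fair dice with the given number of faces."""
    # Count how many of the faces * faces equally likely outcomes give each sum.
    counts = Counter(a + b for a, b in product(range(1, faces + 1), repeat=2))
    total = faces ** 2
    return {x: Fraction(n, total) for x, n in sorted(counts.items())}


pmf = sum_pmf()
print(pmf[4])             # 1/12
print(sum(pmf.values()))  # 1
```

Calling sum_pmf(4) gives the corresponding distribution for two four-faced (tetrahedral) dice.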
Example. You toss a coin 4 times. There are 16 possible outcomes:
HHHH; HHHT; HHTH; HHTT; HTHH; HTHT; HTTH; HTTT; THHH;
THHT; THTH; THTT; TTHH; TTHT; TTTH; TTTT (H = heads; T = tails)
Now take X = number of heads. Here are some of the probabilities:
P(X = 0) = 1/16. Only one of the 16 possible outcomes has X = 0 (no heads), namely
TTTT.
P(X = 1) = 4/16. Four of the 16 possible outcomes have X = 1, namely
HTTT, THTT, TTHT and TTTH.


The distinction between the capital letter X and small letter x is important;
X stands for the random variable in question, whereas x stands for a specific value
or outcome.
Example. Two tetrahedral dice, with faces labeled 1, 2, 3, 4, are thrown and the
score noted, where the score is the sum of the two numbers on which the dice
land. Find the probability density function (pdf) of X, where X is the random
variable ’the score when two dice are thrown’.
Solution:
x          2      3      4      5      6      7      8
P(X = x)   1/16   2/16   3/16   4/16   3/16   2/16   1/16
Since Σ P(X = x) = (1/16)(1 + 2 + 3 + 4 + 3 + 2 + 1) = 1,
X is a random variable.

3.1. Finding probabilities

Example. The pdf of a discrete random variable Y is given by P(Y = y) = cy²
for y = 0, 1, 2, 3, 4. Given that c is a constant, find the value of c.
Solution:
y          0   1   2    3    4
P(Y = y)   0   c   4c   9c   16c
Since Y is a random variable, Σ P(Y = y) = 1.
Then c + 4c + 9c + 16c = 30c = 1 =⇒ c = 1/30
Example. The discrete random variable T has the probability distribution shown
below:
t          -3    -2     -1    0      1
P(T = t)   0.1   0.25   0.3   0.15   d
Find:
(a) the value of d
Σ P(T = t) = 1
0.1 + 0.25 + 0.3 + 0.15 + d = 1 =⇒ d = 0.2
(b) P(−3 ≤ T ≤ 0)
P(−3 ≤ T ≤ 0) = P(T = −3) + P(T = −2) + P(T = −1) + P(T = 0)
= 0.1 + 0.25 + 0.3 + 0.15 = 0.8


(c) P (T > −1)


P (T > −1) = P (T = 0) + P (T = 1) = 0.35
(d) P (−1 < T < 1)
P (−1 < T < 1) = P (T = 0) = 0.15
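
The two ideas used in this subsection (probabilities must sum to 1, and the probability of an event is the sum over the values it contains) can be sketched in Python as follows. The helper prob and the variable names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# P(Y = y) = c * y**2 for y = 0..4; the probabilities must sum to 1,
# so c is the reciprocal of the sum of the weights y**2.
weights = {y: y ** 2 for y in range(5)}
c = Fraction(1, sum(weights.values()))
print(c)  # 1/30

# Distribution of T: the missing probability d = P(T = 1) follows from the same rule.
pmf_t = {-3: Fraction(1, 10), -2: Fraction(1, 4), -1: Fraction(3, 10), 0: Fraction(3, 20)}
pmf_t[1] = 1 - sum(pmf_t.values())
print(pmf_t[1])  # 1/5


def prob(pmf, condition):
    """Sum P(value) over every value for which condition(value) is true."""
    return sum(p for value, p in pmf.items() if condition(value))


print(prob(pmf_t, lambda t: t > -1))      # 7/20
print(prob(pmf_t, lambda t: -1 < t < 1))  # 3/20
```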

3.2. Expectation
E(X), read as 'E of X', gives the average or typical value of X, known as the
expected value or expectation of X. X represents the random variable.
The mean of a discrete random variable is the mean of its probability distribu-
tion. This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value.
This is the value we expect to observe per repetition, if we repeat an experiment
several times. This value is a useful summary of the variable’s distribution.
Stating the expected value gives a general impression of some random variable
without giving full details of its probability distribution. The expected value of a
random variable X is symbolized by E(X) or µ and is defined as:

E[X] = Σ x P(X = x)

Example. A random variable X has the probability distribution shown below.
Find the expectation, E(X).
x          -2    -1    0      1     2
P(X = x)   0.3   0.1   0.15   0.4   0.05
Solution:
E(X) = Σ x P(X = x)
= (−2 ∗ 0.3) + (−1 ∗ 0.1) + (0 ∗ 0.15) + (1 ∗ 0.4) + (2 ∗ 0.05)
= −0.2
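
The same sum can be written as a small Python function. This is only a sketch; the name expectation and the dictionary representation of the distribution are assumptions made for the example.

```python
def expectation(pmf):
    """Return E(X) = sum of x * P(X = x) for a PMF given as {x: probability}."""
    return sum(x * p for x, p in pmf.items())


# Distribution from the example above.
pmf = {-2: 0.3, -1: 0.1, 0: 0.15, 1: 0.4, 2: 0.05}
mean = expectation(pmf)
print(round(mean, 10))  # -0.2
```

The result is rounded before printing because decimal probabilities are stored as binary floating-point numbers, which can introduce tiny rounding errors.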


Exercise 17. A fruit machine consists of three windows which operate indepen-
dently. Each window shows pictures of fruits: Lemon, Apples, Cherries or Bananas.
The probability that a window shows a particular fruit is as follows:
P (Lemon) = 0.4
P (Cherries) = 0.2
P (Apple) = 0.1
P (Bananas) = 0.3
The rule for playing the game on the fruit machine is as follows: It costs Kshs
10 to play the game. A player will win Kshs 100 if he/she gets three Apples in a
row, Kshs 50 if he/she gets three Cherries in a row, Kshs 40 if he/she gets three
Lemons in a row and Kshs 80 if he/she gets two Apples and a Cherry in the game.
The order in which the fruits appear is not important. Based on this information,
would you expect to gain or lose if you play the game?

• Expectation of a function of a discrete random variable

The definition of expectation can be extended to any function of X such as 10X,
X², X − 4, etc. In general, if g(X) is any function of the discrete random variable
X, then
E(g(X)) = Σ g(x) P(X = x)
For instance,
E(10X) = Σ 10x P(X = x)
E(X²) = Σ x² P(X = x)
E(1/X) = Σ (1/x) P(X = x)
E(X − 4) = Σ (x − 4) P(X = x)
Example. The random variable X has the probability distribution shown below.
x          1     2     3
P(X = x)   0.1   0.6   0.3
Find:
i) E(X)
E(X) = Σ x P(X = x) = (1 ∗ 0.1) + (2 ∗ 0.6) + (3 ∗ 0.3) = 2.2
ii) E(3)
E(3) = Σ 3 P(X = x) = (3 ∗ 0.1) + (3 ∗ 0.6) + (3 ∗ 0.3) = 3
iii) E(5X)
E(5X) = Σ 5x P(X = x) = (5 ∗ 0.1) + (10 ∗ 0.6) + (15 ∗ 0.3) = 11
Notice that 5E(X) = 5 ∗ 2.2 = 11.
In general, for two constants a and b:
E(a) = a
E(aX) = aE(X)
E(aX + b) = aE(X) + b
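
These rules can be checked numerically. In the sketch below, expect is a hypothetical helper that computes E(g(X)) for any function g; the constants 2 and 7 in the last check are arbitrary values chosen for illustration.

```python
from fractions import Fraction


def expect(pmf, g=lambda x: x):
    """Return E(g(X)) = sum of g(x) * P(X = x); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())


# Distribution from the example above, stored as exact fractions.
pmf = {1: Fraction(1, 10), 2: Fraction(6, 10), 3: Fraction(3, 10)}

e_x = expect(pmf)                        # E(X)
e_const = expect(pmf, lambda x: 3)       # E(3)
e_5x = expect(pmf, lambda x: 5 * x)      # E(5X)
e_lin = expect(pmf, lambda x: 2 * x + 7) # E(2X + 7)

print(e_x, e_const, e_5x)    # 11/5 3 11
print(e_lin == 2 * e_x + 7)  # True
```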


Exercise 18. X is the number of heads obtained when two coins are tossed.
Find (a) the expected number of heads (b) E(X²) (c) E(X² − X)

3.3. Variance
The variance of a random variable is a non-negative number which gives an idea
of how widely spread the values of the random variable are likely to be; the larger
the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely concentrated round the
expected value the distribution is; it is a measure of the ’spread’ of a distribution
about its average value. Variance is symbolized by V(X) or Var(X) or σ² and is
defined as:

var(X) = E[(X − E[X])²]

or var(X) = E[(X − µ)²]

which is equivalent to

var(X) = E[X²] − (E[X])²

or var(X) = E(X²) − µ²
Example. Find the variance of the following distribution
x          1     2     3     4     5
P(X = x)   0.1   0.3   0.2   0.3   0.1
var(X) = Σ x² P(X = x) − [Σ x P(X = x)]²
= [(1² ∗ 0.1) + (2² ∗ 0.3) + ... + (5² ∗ 0.1)] − [(1 ∗ 0.1) + ... + (5 ∗ 0.1)]²
= 10.4 − 9 = 1.4
In general, if a and b are any two constants, then:
var(a) = 0
var(aX) = a² var(X)
var(aX + b) = a² var(X)
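
A short Python sketch of the variance calculation and of the rule for var(aX + b) is given below. The helper names mean, variance and transform, and the constants 3 and 4, are assumptions made for this illustration.

```python
from fractions import Fraction


def mean(pmf):
    """E(X) for a PMF given as {x: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """var(X) = E(X^2) - (E(X))^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2


def transform(pmf, a, b):
    """PMF of aX + b (a must be non-zero so that distinct x stay distinct)."""
    return {a * x + b: p for x, p in pmf.items()}


# Distribution from the variance example above.
pmf = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
       4: Fraction(3, 10), 5: Fraction(1, 10)}

print(variance(pmf))                                       # 7/5
print(variance(transform(pmf, 3, 4)) == 9 * variance(pmf)) # True
```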

3.4. The cumulative distribution function, F (X)

In a frequency distribution, the cumulative frequencies are obtained by summing all
the frequencies up to a particular value. In the same way, in a probability distribution,
the probabilities up to a certain value are summed to give a cumulative probability.
The cumulative probability function is denoted as F(x).

Example. Consider the following probability distribution
x          1      2     3     4      5
P(X = x)   0.05   0.4   0.3   0.15   0.1
From the above table,
F(1) = P(X ≤ 1) = 0.05
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.05 + 0.4 = 0.45
...
F(5) = P(X ≤ 5) = 1.0
Thus, the cumulative distribution is given by
x      1      2      3      4     5
F(x)   0.05   0.45   0.75   0.9   1.0
Generally, for a discrete random variable X, the cumulative distribution func-
tion is given by F (x) where F (x) = P (X ≤ x)
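
The running sums can be produced with a few lines of Python. This is a minimal sketch using the distribution from the example above; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

# P(X = x) from the example above, stored as exact fractions.
pmf = {1: Fraction(1, 20), 2: Fraction(2, 5), 3: Fraction(3, 10),
       4: Fraction(3, 20), 5: Fraction(1, 10)}

values = sorted(pmf)
# F(x) = P(X <= x): running total of the probabilities in increasing order of x.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

print([float(cdf[x]) for x in values])  # [0.05, 0.45, 0.75, 0.9, 1.0]
```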

4. Special discrete probability distributions


The following are some of the well known examples of discrete distributions:
Bernoulli, Binomial, Uniform, Geometric, Poisson and Hypergeometric distributions,
amongst others. In this lesson, we only focus on two of these distributions, that
is, the Bernoulli and Binomial distributions.

4.1. Bernoulli distribution


A random variable X has a Bernoulli distribution with parameter p if it can assume
a value of 1 with a probability of p and the value of 0 with a probability of (1 − p).
The random variable X is also known as a Bernoulli variable with parameter p and
has the following probability mass function:
P(X = x) = p^x (1 − p)^(1−x), for x = 0, 1
The mean of a random variable X that has a Bernoulli distribution with
parameter p is
E(X) = 1(p) + 0(1 − p) = p


The variance of X is pq, where q = 1 − p.
Example. A random variable whose value represents the outcome of a coin toss
(1 for heads, 0 for tails, or vice-versa) is a Bernoulli variable with parameter p,
where p is the probability that the outcome corresponding to the value 1 occurs.
For an unbiased coin, where heads or tails are equally likely to occur, p = 0.5.

4.2. Binomial distribution


• The binomial distribution is one of the most popular distributions.

• The origin of binomial distribution lies in Bernoulli’s trials.

• A Bernoulli trial is an experiment having only two possible outcomes, i.e.
success or failure. For example, when one flips a coin, either a head or a tail
will show on the upper face; the sex of an expected baby will either be male
or female; a person will either be healthy or sick; a computer fan will either
be working or defective, etc.

• A binomial experiment is a probability experiment that satisfies the following
conditions.

– The experiment is repeated for a fixed number of trials, where each
trial is independent of other trials.
– There are only two possible outcomes of interest for each trial. The
outcomes can be classified as a success (S) or as a failure (F).
– The probability of a success P (S) is the same for each trial.
– The random variable x counts the number of successful trials.

Example. Decide whether the experiment is a binomial experiment. If it is, specify
the values of n, p, and q, and list the possible values of the random variable x. If
it is not a binomial experiment, explain why.
Experiment: You randomly select a card from a deck of cards, and note if
the card is an Ace. You then put the card back and repeat this process 8 times.
Solution: This is a binomial experiment. Each of the 8 selections represents
an independent trial because the card is replaced before the next one is drawn.
There are only two possible outcomes: either the card is an Ace or not. Therefore,
n = 8, p = 4/52 = 1/13, q = 12/13 and x = 0, 1, 2, 3, 4, 5, 6, 7, 8.
In the next few sections of the lesson, we discuss the binomial distribution,
mainly showing how to solve a number of problems. We also define the binomial
probability function.
Binomial Probability Formula
In a binomial experiment, the probability of exactly x
successes in n trials is
P(x) = nCx p^x q^(n−x) = [n! / ((n − x)! x!)] p^x q^(n−x).
Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Three chips are selected, with
replacement. Find the probability that you select exactly one red chip.
p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 − p = 0.7
n = 3
x = 1
P(1) = 3C1 (0.3)^1 (0.7)^2 = 3(0.3)(0.49) = 0.441
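
The formula translates directly into Python using math.comb from the standard library for nCx. The function name binomial_pmf is an illustrative choice, not something defined in the course notes.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Red-chip example: n = 3 draws with replacement, p = 0.3, exactly one red chip.
p_one_red = binomial_pmf(1, 3, 0.3)
print(round(p_one_red, 3))  # 0.441

# The probabilities over x = 0, 1, ..., n should sum to 1.
total = sum(binomial_pmf(x, 3, 0.3) for x in range(4))
print(round(total, 10))  # 1.0
```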

Binomial Probability Distribution

Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Four chips are selected, with
replacement. Create a probability distribution for the number of red
chips selected.
p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 − p = 0.7
n = 4
x = 0, 1, 2, 3, 4
The binomial probability formula is used to find each probability.
x      0       1       2       3       4
P(x)   0.240   0.412   0.265   0.076   0.008


Finding Probabilities
Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected.
x      0      1       2       3       4
P(x)   0.24   0.412   0.265   0.076   0.008
a.) Find the probability of selecting no more than 3 red chips.
b.) Find the probability of selecting at least 1 red chip.
a.) P(no more than 3) = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3)
= 0.24 + 0.412 + 0.265 + 0.076 = 0.993
b.) P(at least 1) = P(x ≥ 1) = 1 − P(0) = 1 − 0.24 = 0.76
(using the complement)
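
The same two probabilities can be computed from the binomial formula without rounding the individual terms first. The sketch below assumes n = 4 and p = 0.3 as in the red-chip example; the function name binomial_pmf is hypothetical. Because the table values are rounded to three decimal places, the unrounded answers differ slightly from 0.993 and 0.76.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.3

# P(x <= 3): add the probabilities for x = 0, 1, 2, 3.
at_most_3 = sum(binomial_pmf(x, n, p) for x in range(4))

# P(x >= 1): use the complement, 1 - P(0).
at_least_1 = 1 - binomial_pmf(0, n, p)

print(round(at_most_3, 4))   # 0.9919
print(round(at_least_1, 4))  # 0.7599
```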
Athiany, HKO 30

113
STA 2100 Probability and Statistics I

Graphing Binomial Probabilities

Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected. Graph
the distribution using a histogram.
x      0      1       2       3       4
P(x)   0.24   0.412   0.265   0.076   0.008
[Figure: histogram titled 'Selecting Red Chips', with the number of red chips
(0 to 4) on the horizontal axis and the probability P(x) on the vertical axis.]

Mean, Variance and Standard Deviation

Population Parameters of a Binomial Distribution
Mean: µ = np
Variance: σ² = npq
Standard deviation: σ = √(npq)
Example:
One out of 5 students at a local college say that they skip breakfast in
the morning. Find the mean, variance and standard deviation if 10
students are randomly selected.
n = 10, p = 1/5 = 0.2, q = 0.8
µ = np = 10(0.2) = 2
σ² = npq = (10)(0.2)(0.8) = 1.6
σ = √(npq) = √1.6 ≈ 1.3
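
A minimal Python sketch of these three parameter formulas follows; the function name binomial_summary is an assumption made for the example.

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)


# Breakfast example: n = 10 students, p = 0.2.
mean, variance, sd = binomial_summary(10, 0.2)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```

To two decimal places the standard deviation is about 1.26, which rounds to the 1.3 quoted in the example.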


5. Summary
A discrete probability distribution lists each possible value the random variable can
assume, together with its probability. A probability distribution must satisfy the
following conditions.

• The probability of each value of the discrete random variable is between 0
and 1, inclusive.

• The sum of all the probabilities is 1.

The mean of a discrete random variable is the mean of its probability distribution.
This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value. This is the value we expect
to observe per repetition, if we repeat an experiment several times. This value is
a useful summary of the variable’s distribution. Stating the expected value gives
a general impression of some random variable without giving full details of its
probability distribution.
The variance of a random variable is a non-negative number which gives an
idea of how widely spread the values of the random variable are likely to be; the
larger the variance, the more scattered the observations on average. Stating the
variance gives an impression of how closely concentrated round the expected value
the distribution is; it is a measure of the ’spread’ of a distribution about its
average value
Guidelines for Constructing a Discrete Probability Distribution

• Let x be a discrete random variable with possible outcomes x1 , x2 , . . . , xn .


• Make a frequency distribution for the possible outcomes.

• Find the sum of the frequencies.

• Find the probability of each possible outcome by dividing its frequency by
the sum of the frequencies.

• Check that each probability is between 0 and 1 and that the sum is 1.

As indicated above, in a binomial experiment, the probability of x successes in n
trials can be obtained by the following formula:
P(X = x) = nCx p^x (1 − p)^(n−x), for x = 0, 1, 2, 3, ..., n
From any one of the discrete distributions, we can create a probability
distribution of the random variable. Using the same distributions, we can then find the
expected value, variance, standard deviation and other measures.
We have seen that the expected value (mean) of a random variable can
be obtained from the probability distribution by using the formula
E(X) = Σ x P(X = x). This implies that after creating the probability distribution table
for a Bernoulli/binomial experiment, we can then obtain these measures.
However, we have also seen that for the Bernoulli/binomial distribution, the mean and
variance can also be obtained using the parameters. That is:
Bernoulli distribution, mean = p, variance = pq
Binomial distribution, mean = np, variance = npq

6. Revision questions or guidelines


1. A computer firm claims that only 5% of all new computers delivered to its
customers have been infected by a virus. If the firm has 15 new computers
to deliver to its customers, find the following probabilities:

a) None has been infected by the virus.
b) One or more have been infected by the virus.
c) Two or more have been infected by the virus.

2. State whether each of the following variables is continuous or discrete.

(a) Number of misspelled words
(b) Amount of water through Hoover dam in a day
(c) How late a student is for class
(d) Number of bacteria in a water sample
(e) Amount of carbon monoxide produced from burning a gallon of
unleaded gas
(f) The number of workers in the institution
(g) Number of checkout lanes at grocery store
(h) Amount of time waiting in line at grocery store

3. Suppose a fair six sided die is tossed 5 times. What is the probability of
getting exactly 2 fours?

4. Explain the meaning of the probability distribution of a discrete random
variable. Give an example of such a probability distribution. What are the
ways to present the probability distribution of a discrete random variable?

5. Approximately 25% of students at a local high school participate in
after-school sports all four years of high school. A group of four seniors is randomly
chosen. Let X be a random variable that represents the number of seniors
in the sample who have participated in sports all four years.

(a) Use the binomial probability formula to complete the probability
distribution.

x          0       1       2   3       4
P(X = x)   0.316   0.422   ?   0.047   ?

(b) Which value of X is most likely? Which value of X is least likely?
(c) On average, how many seniors in the sample would you expect to have
played sports all four years? In other words, what is the mean of X?
(d) What is the standard deviation of X?
(e) What is the probability that all four seniors in the sample played sports
all four years?
(f) What is the probability that two or fewer seniors in the sample played
sports all four years?
(g) If a new random variable Y = X² + 2X, use the above table to obtain
E(Y) and sd(Y)

6. According to a study carried out by a computer company in Uganda, the
probability that a randomly selected laptop fan will last longer than 1.2 years
is 0.15. What is the probability that out of six randomly selected fans: (a)
Exactly two last longer than 1.2 years (b) None lasts longer than 1.2 years?


Chapter 8
Relations (Correlation)

Learning outcomes
Upon completing this topic, you should be able to:

• Define terms related to correlation

• Draw and interpret scatter diagram for bivariate data

• Calculate and interpret the value of the product-moment correlation
coefficient.

• Calculate and interpret the value of Spearman’s rank correlation coefficient


1. Introduction
It is frequently of interest to know whether two or more variables are related and, if
so, how they are related. For instance, we may ask whether there is a relationship
between a student's mean grade and class attendance record. If the two
variables are related, how are they related? Similarly, the president of a large
computer firm knows very well that there is a tendency for sales to increase
as advertising expenditure increases, but how strong is that tendency, and how
can he/she predict the approximate sales that will result from various advertising
expenditures?
The number of possible relationships between two continuous variables is
infinite. Of course there may be no relationship at all, but in the simplest case where
one does exist, it may be that high scores on one variable tend to accompany
high scores on the second.
For instance, consider age and vocabulary: the younger a person is, the fewer
words he/she is likely to know, while the older a person is, the more words he/she
knows. This kind of relationship is described as a positive relationship. A second
kind of relationship is one in which high values of one variable tend to accompany
lower values of the other, for example, the degree of education and crime rate.
Other kinds of relationships may exist too, e.g. age and physical strength:
strength increases up to a certain age and then drops to lower values.
Two variables can correlate quite nicely, yet have no cause/effect relationship.
Correlation metrics are measures of associations between variables. It is important
to note that association is a concept that has no implication of causation.
In this lesson, and the next lesson, we will examine some widely employed
procedures that are used to analyze the relationship between the two variables
(e.g. sales and advertising). These procedures are part of what is known as
correlation and linear regression. We shall start by looking at correlation, then
look at regression in the next lesson.


2. Bivariate Data
So far we have confined our discussion to distributions involving only one
variable. However, in practical applications, we might come across certain sets
of data, where each item of the set may comprise the values of two or more
variables.
Bivariate data is a set of paired measurements. Examples include:

• Marks obtained in two subjects by 60 students in a class.

• The series of sales revenue and advertising expenditure of the various branches
of an IT firm in a particular year.

• The series of ages of husbands and wives in a sample of selected married
couples.

In this kind of data, each pair represents the values of the two variables. Our
interest therefore, is to find a relationship (if it exists) between the two variables
under study.

2.1. Scatter Diagrams and Correlation


A scatter diagram is a tool for analyzing relationships between two variables. One
variable is plotted on the horizontal axis and the other is plotted on the vertical
axis. The pattern of their intersecting points can graphically show relationship
patterns.
Most often a scatter diagram is used to explore a possible cause-and-effect relationship.
However, while the diagram shows relationships, it does not by itself prove that
one variable causes the other. In brief, the easiest way to visualize bivariate data
is through a scatter plot. “Two variables are said to be correlated if the change
in one of the variables results in a change in the other variable”.

Positive and Negative Correlation

Positive Correlation

If the values of the two variables deviate in the same direction i.e. if an increase (or
decrease) in the values of one variable results, on an average, in a corresponding
increase (or decrease) in the values of the other variable the correlation is said to
be positive. Some examples of series of positive correlation are:


i. Heights and weights;


ii. Household income and expenditure;
iii. Price and supply of commodities;
iv. Amount of rainfall and yield of crops.

Negative Correlation

Correlation between two variables is said to be negative or inverse if the variables
deviate in opposite directions. That is, an increase (or decrease) in the values of
one variable results, on average, in a corresponding decrease (or increase) in the
values of the other variable. For instance, price and demand of goods.
The following figure shows the different examples of correlations:

Interpreting a Scatter Plot

Scatter diagrams will generally show one of six possible correlations between the
variables:

1. Strong Positive Correlation: The value of Y clearly increases as the value of
X increases.

2. Strong Negative Correlation: The value of Y clearly decreases as the value
of X increases.

3. Weak Positive Correlation: The value of Y increases slightly as the value of
X increases.

4. Weak Negative Correlation: The value of Y decreases slightly as the value
of X increases.

5. Complex Correlation: The value of Y seems to be related to the value of X,
but the relationship is not easily determined.

6. No Correlation: There is no demonstrated connection between the two
variables.

2.2. Correlation Coefficient


The correlation coefficient measures the degree of linear association between two paired
variables. It takes values from −1 to +1. That is:

1. If r = +1, we have a perfect positive relationship.

2. If r = −1, we have a perfect negative relationship.

3. If r = 0, there is no linear relationship, i.e. the variables are uncorrelated.

Steps in computing the correlation coefficient

1. List the two values for each participant. You should do this in a column
format so as not to get confused

2. Compute the sum of all the x values, and compute the sum of all the y
values.

3. Square each of the x values and square each of the y values

4. Find the sum of xy products


3. Methods of determining correlation


A correlation metric is called a correlation coefficient, usually designated by
the letter r. The three most commonly used correlation metrics are:

• Pearson’s product moment correlation coefficient

• Spearman’s rho

• Kendall’s tau.

3.1. Pearson’s Product Moment Correlation Coefficient


Pearson’s product moment correlation coefficient, usually denoted by r, is one
example of a correlation coefficient. It is a measure of the linear association
between two variables that have been measured on interval or ratio scales, such as
the relationship between height in inches and weight in pounds. However, it can
be misleadingly small when there is a relationship between the variables but it is a
non-linear one.
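This coefficient was defined by the British biometrician Karl Pearson as

$$ r_{xy} = \frac{\mathrm{Cov}(X, Y)}{S_x S_y} = \frac{S_{xy}}{S_x S_y} $$

where Cov(X, Y) is the covariance between the variables X and Y, which is
given by

$$ \mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} $$

This equation can be simplified to

$$ \mathrm{Cov}(x, y) = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{n} $$

where

$$ \bar{x} = \frac{\sum x}{n}, \quad \bar{y} = \frac{\sum y}{n}, \quad S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{and} \quad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} $$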


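Example. In the following data, X is the number of power blackouts at night
reported in a month and Y is the corresponding number of crimes reported in
that month at Juja police station, Thika. Find Pearson’s product moment
correlation coefficient and comment on the result.

X   93  44  53  08  71  81  06  10  32  21
Y   45  62  12  28  92  84  73  03  51  32

Solution:

$$ S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = 30.174 $$

$$ S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = 28.368 $$

$$ r_{xy} = \frac{\mathrm{Cov}(X, Y)}{S_x S_y} = \frac{368.52}{30.174 \times 28.368} = 0.4305 $$

Since r = 0.4305 is positive but well below 1, the data suggest a weak to moderate
positive linear relationship between the number of blackouts and the number of
crimes reported.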
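The worked solution above can be checked with a short script. This is a minimal sketch, assuming the definition of r given earlier (covariance and standard deviations all divided by n); the function name `pearson_r` is illustrative and not part of the notes.

```python
from math import sqrt

# Data from the example above: blackouts (x) and reported crimes (y).
x = [93, 44, 53, 8, 71, 81, 6, 10, 32, 21]
y = [45, 62, 12, 28, 92, 84, 73, 3, 51, 32]


def pearson_r(xs, ys):
    """Return (r, cov, s_x, s_y) using Cov(X, Y) / (Sx * Sy), dividing by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    s_x = sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    s_y = sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    return cov / (s_x * s_y), cov, s_x, s_y


r, cov, s_x, s_y = pearson_r(x, y)
print(round(cov, 2), round(s_x, 3), round(s_y, 3), round(r, 4))
```

The rounded values should agree with those quoted in the solution (covariance 368.52, standard deviations 30.174 and 28.368, r = 0.4305).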
The formula for the linear correlation coefficient r can be written
in different forms. For instance,

$$ r_{xy} = \frac{\mathrm{Cov}(X, Y)}{S_x S_y} $$

or

$$ r_{xy} = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} $$

where n is the number of pairs of readings.
Using the second expression, we can obtain the correlation as illustrated in the
following example.
Example. Suppose we have the following data on Age versus the Price of Datsun
Z cars.

Age (X)     5   7   6   6   5   4   7   6   5   5   2
Price (Y)   80  57  58  55  70  88  43  60  69  63  118

We can use the data to answer the following questions:

1. Compute the linear correlation coefficient r of the data.

2. Interpret the value of r obtained in part 1 in terms of the
linear relationship between age and price.

3. Discuss the graphical implications of the value of r.

To answer these questions, we organize the data by obtaining

n, Σx, Σy, Σxy, Σx², Σy²

and then apply the formula

$$ r_{xy} = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} $$

Example-Solution

1. The Datsun Z data can be summarized as follows, with the last row giving
the sum of the respective columns (Σ), and n = 11.

x     y      xy      x²     y²
5     80     400     25     6,400
7     57     399     49     3,249
6     58     348     36     3,364
6     55     330     36     3,025
5     70     350     25     4,900
4     88     352     16     7,744
7     43     301     49     1,849
6     60     360     36     3,600
5     69     345     25     4,761
5     63     315     25     3,969
2     118    236     4      13,924
Σ 58  761    3,736   326    56,785

Applying the formula

$$ r_{xy} = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} $$

we obtain

$$ r_{xy} = \frac{11(3{,}736) - (58)(761)}{\sqrt{11(326) - (58)^2}\,\sqrt{11(56{,}785) - (761)^2}} = \frac{-3{,}042}{\sqrt{222}\,\sqrt{45{,}514}} $$

therefore

$$ r_{xy} = -0.957 $$

2. The linear correlation coefficient, r = −0.957, suggests that there is a strong
negative linear relationship between age and price of the Datsun Z cars. In
particular, it indicates that as age increases, there is a tendency for the price
to decrease.

3. This implies that the data points are expected to be clustered closely about
the regression line.
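The column sums and the value of r in this example can be reproduced with a few lines of code. The following is a minimal sketch of the computational formula; the variable names are illustrative choices, not notation from the notes.

```python
from math import sqrt

# Datsun Z data from the example: age in years, price as tabulated.
age = [5, 7, 6, 6, 5, 4, 7, 6, 5, 5, 2]
price = [80, 57, 58, 55, 70, 88, 43, 60, 69, 63, 118]

# Column sums, as in the last row of the summary table.
n = len(age)
sum_x = sum(age)
sum_y = sum(price)
sum_xy = sum(a * p for a, p in zip(age, price))
sum_x2 = sum(a * a for a in age)
sum_y2 = sum(p * p for p in price)

# Computational formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 3))
```

The printed sums should match the Σ row of the table, and the rounded coefficient should match r = −0.957.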

Exercise 19. A study was conducted to find whether there is any relationship
between the weight and blood pressure of an individual. The following set of data
was obtained from a clinical study.

Weight           78   86   72   82   80   86   84   89   68   71
Blood Pressure   140  160  134  144  180  176  174  178  128  132

Determine the coefficient of correlation for this set of data.

Exercise 20. Obtain the correlation coefficient of the following data, and
comment on your result.

Mean Temp (x)   14.2   14.3   14.6   14.9   15.2  15.6  15.9
Pirates (y)     35000  45000  20000  15000  5000  400   17

Again we need to find the values of the following:

n, Σx, Σy, Σxy, Σx², Σy²

3.2. Spearman’s Rank Correlation


This method was introduced by Charles Spearman in 1904. He derived the
formula from the definition of Karl Pearson’s correlation coefficient by considering
the case of non-repeated entries (no tied ranks). We shall not go into the
derivation of the formula but simply state and use it.


Pearson’s method is usually applied under the assumption that the population
being studied is normally distributed. It is possible to avoid making any assumptions
about the populations being studied by ranking the observations according to size,
and basing the calculation on the ranks rather than upon the original observations.
It does not matter which way (direction) the items are ranked [ascending or
descending], provided the same direction is used for both variables.
The formula for the coefficient of rank correlation, r_s, is given by

$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

where n is the sample size and d is the difference between the ranks of x_i and
y_i.
Spearman’s rank correlation also lies between −1 and +1. It is a
non-parametric measure, i.e. it does not assume any particular distribution for
the population from which the sample comes.
Remark 1. To find Spearman’s rank correlation coefficient, we first give ranks to
the given values according to their size, then proceed to apply the formula. The
ranking order must be the same for the two variables, i.e. either both ascending
or both descending.

Example. Using the data of the earlier example on crime and blackouts in Juja, let us
check using Spearman’s rank correlation whether there is correlation between crime
and blackouts in the town. That is:

x    y    Rank of x   Rank of y   d_i = (Rank of x − Rank of y)   d_i²
93   45   1           6           −5                              25
44   62   5           4           1                               1
53   12   4           9           −5                              25
08   28   9           8           1                               1
71   92   3           1           2                               4
81   84   2           2           0                               0
06   73   10          3           7                               49
10   03   8           10          −2                              4
32   51   6           5           1                               1
21   32   7           7           0                               0
                                                        Σd_i² =   110

Spearman’s rank correlation is

$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 110}{10(10^2 - 1)} = 0.3333 $$
Once again the value of the coefficient is positive, even though it is slightly
different from the value obtained using Pearson’s method [r = 0.4305]. The
difference arises because r_s is computed from the ranks only and not from the
original observations. However, it is evident that by using this method one
still arrives at the same conclusion as with Pearson’s method.
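The ranking and the rank-difference formula used in this example can be sketched in code as follows. The helper names `ranks_descending` and `spearman_rs` are illustrative, and the ranking helper assumes there are no tied values, which holds for this data set.

```python
# Blackouts (x) and reported crimes (y) from the Juja example.
x = [93, 44, 53, 8, 71, 81, 6, 10, 32, 21]
y = [45, 62, 12, 28, 92, 84, 73, 3, 51, 32]


def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Return (r_s, sum of squared rank differences)."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


rs, sum_d2 = spearman_rs(x, y)
print(ranks_descending(x))
print(ranks_descending(y))
print(sum_d2, round(rs, 4))
```

Both variables are ranked in the same direction (largest value first), as Remark 1 requires. The sum of squared differences should come out as 110 and the coefficient as 0.3333 to four decimal places.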

Features of the Spearman’s correlation

• The sum of the differences of ranks between the two variables is equal to zero.
Symbolically, Σd = 0.

• Spearman’s correlation coefficient is distribution-free or non-parametric
because no strict assumptions are made about the form of the population from
which the sample observations are drawn.

• Spearman’s correlation coefficient is nothing but Karl Pearson’s
correlation coefficient between the ranks. Hence it can be interpreted in the same
manner as Pearson’s correlation coefficient.
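Two of these features can be verified numerically on the ranks from the Juja example. The sketch below, with illustrative variable names, checks that the rank differences sum to zero and that Pearson’s formula applied to the ranks gives the same value as the rank-difference formula (this equality holds when there are no ties).

```python
from math import sqrt

# Ranks copied from the worked Spearman table (no tied ranks).
rank_x = [1, 5, 4, 9, 3, 2, 10, 8, 6, 7]
rank_y = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

n = len(rank_x)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d = sum(d)  # feature 1: the rank differences sum to zero

# Spearman's rank-difference formula.
rs_formula = 1 - 6 * sum(v * v for v in d) / (n * (n ** 2 - 1))

# Pearson's r computed directly on the ranks.
mean_x = sum(rank_x) / n
mean_y = sum(rank_y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
s_xx = sum((a - mean_x) ** 2 for a in rank_x)
s_yy = sum((b - mean_y) ** 2 for b in rank_y)
r_on_ranks = s_xy / sqrt(s_xx * s_yy)

print(sum_d, round(rs_formula, 4), round(r_on_ranks, 4))
```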

Exercise 21. The data given below are obtained from student records, that
is, Grade Point Average, GPA (x), and Graduate Record Exam, GRE, score (y).
Calculate the rank correlation coefficient for the data.

Subject   1     2     3     4     5     6     7     8     9     10
x         8.3   8.6   9.2   9.8   8.0   7.8   9.4   9.0   7.2   8.6
y         2300  2250  2380  2400  2000  2100  2360  2350  2000  2260

Note that in the x row two students have a grade point average of
8.6, and in the y row there is a tie for 2000.

4. Summary
Correlation Analysis is a method designed to measure the degree of association
between variables. E.g. correlation may express the degree of association
between verbal and mathematical scores of students on entering a college.

Data which are arranged in ascending order are said to be in ranks or ranked data.
The coefficient of correlation for such type of data is given by Spearman
rank difference correlation coefficient and is denoted by rs .

Measures of central tendency and measures of variability are not the only descrip-
tive statistics we are interested in using to get a picture of what a set of
scores looks like.

We have already learnt that knowing the value of the one most representative
score (central tendency) and a measure of spread or dispersion (variability) is
critical for describing the characteristics of a distribution. However, sometimes
we are just as interested in the relationship between variables or, to be more
precise, in how the value of one variable changes when the value of another
variable changes. The way we express this interest is through the computation
of a simple correlation coefficient.

A correlation coefficient is a numerical index that reflects the relationship between
two variables. The value of this descriptive statistic ranges between −1 and
+1. A correlation between two variables is sometimes referred to as a bivariate
(for two variables) correlation. More specifically, the type of correlation
that we have discussed for most of this lesson is called the Pearson
product moment correlation, named after its inventor, Karl Pearson.

The Pearson correlation coefficient examines the relationship between two
variables when both of these variables are continuous in nature. In other words,
they are variables that can assume any value along some underlying
continuum, such as height, age, test score, or income. But there is a host of
other variables that are not continuous. They are called discrete or
categorical variables, like race (such as black and white), social class (such as
high and low) and political affiliation (such as Democrat and
Republican). For these you need to use other correlation techniques, which we do
not cover in this module.

There are several (easy but important) things to remember about the correlation
coefficient:

1. A correlation can range in value from −1 to +1.

2. The absolute value of the coefficient reflects the strength of the correlation.
So a correlation of −0.70 is stronger than a correlation of +0.50.

3. One of the most frequent mistakes regarding the correlation coefficient
occurs when students assume that a direct or positive correlation is always
stronger (i.e. better) than an indirect or negative correlation because of its
sign and nothing else.

4. A correlation always reflects a situation where there are at least two data
points (or variables) per case.

5. Another easy mistake is to assign a value judgment to the sign of the
correlation. Many students assume that a negative relationship is not good and
a positive one is good. That is why, instead of the terms “negative”
and “positive”, the terms “indirect” and “direct” communicate the meaning more
clearly.

6. The Pearson product moment correlation coefficient is represented by the
small letter r with a subscript representing the variables that are being
correlated. For example, r_xy is the correlation between variable x and variable y;
r_weight−height is the correlation between weight and height, etc.

Remark 2. What is really interesting about correlations is that they measure the
extent to which one variable co-varies with another. So, if both
variables are highly variable (have lots of wide-ranging scores), the correlation
between them is more likely to be higher than if not. That is not to say that
lots of variability guarantees a higher correlation, because the scores have to vary in
a systematic way. But if the variability of one variable is restricted, then no matter
how much the other variable changes, the correlation will be lower. For example, let
us say you are examining the correlation between academic achievement in high school
and first-year grades in college, and you look at only the top 10 of the class.
That top 10 is likely to have very similar grades, introducing almost no variability
and no room for the one variable to vary as a function of the other. Guess what
you get when you correlate one variable with another that does not change? There is
no co-variation at all, i.e. r_xy = 0 (strictly, r_xy is undefined because the
standard deviation of the unchanging variable is zero).
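The effect of restricted variability described in Remark 2 can be illustrated with the Datsun Z data from this lesson. This is only a sketch: restricting the sample to cars aged 5 or 6 years is an arbitrary choice made here for illustration, not something done in the notes.

```python
from math import sqrt

# Datsun Z data from this lesson.
age = [5, 7, 6, 6, 5, 4, 7, 6, 5, 5, 2]
price = [80, 57, 58, 55, 70, 88, 43, 60, 69, 63, 118]


def pearson_r(xs, ys):
    """Pearson's r via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r_full = pearson_r(age, price)

# Illustrative restriction (an assumption of this sketch): keep only ages 5 and 6.
restricted = [(a, p) for a, p in zip(age, price) if 5 <= a <= 6]
r_restricted = pearson_r([a for a, _ in restricted], [p for _, p in restricted])

print(round(r_full, 3), round(r_restricted, 3))
```

With the age range narrowed, the coefficient should be smaller in absolute value than the one for the full data set, in line with the remark.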


Learning Activities
1. Refer to the Datsun Z cars data. Obtain Spearman’s rank correlation
coefficient and compare your result with the Pearson’s product moment correlation
result given in the notes.

2. The following data relate to ages of husbands and their wives. Obtain
Pearson’s product moment correlation coefficient and Spearman’s rank correlation
coefficient. Hence compare the two results.

Husband   18  19  25  27  30  35  36  40  42  44
Wife      16  17  24  23  22  30  28  36  40  41

3. Consider the following data and draw a scatter plot.

X   1.0  1.9  2.0  2.9  3.0    3.1    4.0     4.1     5.0
Y   10   99   100  999  1,000  1,001  10,000  10,001  100,000

4. The ranks of two sets of variables (Heights and Weights) are given below.
Calculate the Spearman rank difference correlation coefficient.

          1   2   3   4   5   6   7    8   9   10
Heights   2   6   8   4   7   4   9.5  4   1   9.5
Weights   9   1   9   4   5   9   2    7   6   3


Chapter 9
Relations (Simple Linear Regression)

Learning outcomes
Upon completing this topic, you should be able to:

• Define correlation, regression and know the link between these two concepts

• Know the assumptions of linear regression

• Calculate the equations of least squares regression lines, and use them to
estimate any given values for a set of data

• Interpret the meaning of the values obtained in the regression equation, and
how the values link to the correlation coefficient.


1. Introduction Regression
As indicated in the introductory section of the previous lesson, we shall now look
at regression in this lesson, specifically simple linear regression.
If two variables are significantly correlated, and if there is some theoretical basis
for doing so, it is possible to predict values of one variable from the other.
Regression analysis, in the general sense, means the estimation or prediction of the
unknown value of one variable from the known value of the other variable. It is
one of the most important statistical tools and is extensively used in almost all
sciences: Natural, Social and Physical.
“Regression analysis is a mathematical measure of the average relationship
between two or more variables in terms of the original units of the data.”

Definitions

Regression analysis is a measure of the average relationship between two or
more variables in terms of the original units in which the data were given.
It provides a mathematical expression, an equation, for estimating or
predicting the values of one variable from the known values of one or more other
variables.

Predictions

One of the primary advantages of knowing about a relationship between two
variables is that one can use the knowledge to make predictions.
Specifically, when one has exact knowledge of an individual’s score on one of the two
variables, one can use the knowledge of the relationship to increase the
accuracy of a prediction of the individual’s score on the other variable.
By the term prediction we mean a “best guess” of what a single
value or score will be, e.g. the value to be predicted can be the number of years
that a 62-year-old woman with high blood pressure will live, or the time an individual
machine will run before it breaks down. Therefore, a prediction is a guess
about the value of an observation to be drawn from a specified population.

A predictor variable is one that provides relevant information for predicting what
scores will be on some other variable.

A predicted variable is one about which predictions are made. (An exact
relationship between the predictor and predicted variable is essential if we
are to make the most accurate predictions possible.)

Simple Regression: Here, we use a single variable to estimate or predict another
variable.

Linear Regression Analysis: A regression analysis is called linear if the equation
of the model represents a straight line.

Curvilinear Regression Analysis: A regression analysis is called curvilinear if
it represents a curve.

Multiple Regression: This is a regression that involves several variables, i.e.
more than one independent variable.

Independent Variable: This is the variable whose value is known.

Dependent Variable: This is the variable whose value is to be predicted. By
convention, the independent variable is denoted by X and the dependent
variable by Y.

The simplest form of regression analysis is called simple linear regression or straight
line regression. It involves the statistical modelling of the relationship between a
single input factor X (the “regressor”) and a single output variable Y (the “response”).

1.1. Simple Linear (least squares) Regression Model


• At some point in your life you will have encountered the equation
of a straight line, y = mx + b.

• In this equation m is the “slope” of the line (change in y over change in
x) and b is the “intercept”, the point where the line crosses the y-axis.

• The plot of a line with slope 2 and intercept 1 is depicted in the following
figure:

• Even though the straight-line model is perfect for algebra class, in the real
world finding such a perfect linear relationship is next to impossible.

• Most real world relationships are not perfectly linear, but imperfect
ones where the relationship between x and y looks more like the correlation plots
we saw earlier.

• In this case, the question literally is “Where do you draw the line?”.
Simple linear regression is the statistical technique used to answer this
question.

• Simple linear regression is the statistical model relating X and Y in the real
world, where there is random variation associated with measured
quantities. To study the relationship between X and Y, the simplest
relationship is that of a straight line, as opposed to a more complex relationship
such as a polynomial.

• Therefore in most cases we first try to fit the data to a linear model.

• Plotting X versus Y, as we did in the correlation plot, is a good first step
to determine whether a linear model is appropriate. This may reveal a lot of
details about the data at hand.

• Sometimes data can be transformed (often by taking logarithms, square roots
or other mathematical transformations) to fit a linear pattern if they do not do so
in the original measurement scale.

• Therefore, determining whether a straight line relationship between X and Y is
appropriate is the first step.

• The second step, once it is determined that a linear model is a good idea,
is to determine the best fitting line that represents the relationship.
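The point about transforming data can be illustrated with the X and Y values given for the scatter plot in the Learning Activities of the correlation lesson. The sketch below is an illustration only (the choice of a base-10 logarithm is an assumption made here): it compares the linear correlation before and after taking logarithms of Y.

```python
from math import log10, sqrt

# X and Y values from the scatter-plot learning activity in the correlation lesson.
x = [1.0, 1.9, 2.0, 2.9, 3.0, 3.1, 4.0, 4.1, 5.0]
y = [10, 99, 100, 999, 1000, 1001, 10000, 10001, 100000]


def pearson_r(xs, ys):
    """Pearson's r via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r_raw = pearson_r(x, y)                      # original measurement scale
log_y = [log10(value) for value in y]        # transformed scale
r_log = pearson_r(x, log_y)

print(round(r_raw, 3), round(r_log, 3))
```

On the original scale the points follow a curve, so r is well below 1; after the logarithmic transformation the points lie almost exactly on a straight line and r is very close to 1.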

1.2. Assumptions of linear regression


Generally, to fit a regression model, linear or otherwise, some assumptions have
to be made. These include:

• For any given value of X, the true mean value of Y depends on X, which can
be written µ_{y|x}.

• In regression, the line represents mean values of Y, not individual data values.
Each observation Y_i is independent of all the others, Y_j (j ≠ i).

• We assume linearity between X and the mean of Y. The mean of Y is
determined by the straight line relationship, which can be written as µ_{y|x} =
β0 + β1 X or y∗ = a + bx, where β0 (or a) is the intercept and β1 (or b) is the
slope of the line.

• The variance of the Y_i’s is constant (homoscedasticity property).

• The responses Y_i are normally distributed with a constant variance.

1.3. Fitting the Regression Model/Equation


• Consider the correlation plot between gene1 and gene2 as shown in the
following figure.

[Figure: scatter plot of gene2 (vertical axis) against gene1 (horizontal axis)]

• We could simply take a ruler and draw in a straight line which, according to
our subjective eyes, best goes through the data.

• This method is subject to much error, and it is unlikely that we will produce the
“best fitting” line. Therefore a more sophisticated method is needed.

• Regression analysis can be thought of as the flip side of
correlation. It has to do with finding the equation for the kind of straight
lines we have just looked at.

• Suppose we have a sample of size n with two sets of measures, denoted
by x and y. We can predict the values of y given the values of x by using
the equation y∗ = a + bx, where

$$ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} $$

This can be rearranged and expressed as

$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} $$

For a we have

$$ a = \bar{y} - b\,\bar{x} $$

or, rewritten,

$$ a = \frac{\sum y - b \sum x}{n} $$

The symbol y∗ refers to the value of y predicted from a given value of x by
the regression equation.


• Suppose we have the linear equation y = 25 + 20x which gives the total
cost, y, of a word processing job. Given the amount of time required, x, we
can use the equation to determine the exact cost of the job, y.

• However, things are rarely as simple as in the word processing
example. More often than not we have to be content with rough predictions.
In fact, in many circumstances the variable being predicted will vary even
for a fixed value of the variable being used to make the prediction.

• For instance, we cannot predict the exact price of a Datsun Z car by just
knowing its age. Indeed, even for a fixed age, say three (3) years old, the
price of a Datsun Z varies from car to car.
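The formulas for a and b given above can be sketched as a small function. The name `least_squares_line` and the four x values used for the check are illustrative assumptions; the check uses points lying exactly on the word processing cost line y = 25 + 20x, for which the fitted line should reproduce the equation exactly.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y* = a + b x using the sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b


# Hypothetical check data: hours chosen for illustration, costs from y = 25 + 20x.
hours = [1, 2, 3, 4]
cost = [25 + 20 * h for h in hours]

a, b = least_squares_line(hours, cost)
print(a, b)
```

When the points are not exactly on a line, as with the Datsun Z prices, the same function returns the intercept and slope of the best-fitting line rather than an exact relationship.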

Example. Suppose we have the following data on Age vs Price of Datsun Zs.

Age (yrs)      5   7   6   6   5   4   7   6   5   5   2
Price ($100)   80  57  58  55  70  88  43  60  69  63  118

It is useful to plot the data so that we can visualize the apparent relationship
between age and price. Such a plot is known as a scatter diagram.


From the diagram, it is clear that the points are not on a straight line, but it is
apparent that they are clustered about a straight line. Hence, we fit a straight line
to the data, and then use that line to predict the price of Datsun Zs.
Since it is possible to draw many reasonable looking straight lines through the
cluster of points, we need a method to choose the “best” line. The method used
is known as the least-squares criterion.
So how does it work?
Simple illustration:
Suppose we have two lines A and B drawn for a set of points in a scatter
diagram, say:
Line A: y = 0.5 + 1.25x
Line B: y = −0.25 + 1.5x
Then the predicted values and errors for the two lines are as
follows:

x   y   ŷ_A    e_A = y − ŷ_A   e_A²      ŷ_B     e_B = y − ŷ_B   e_B²
0   2   0.5    1.5             2.25      −0.25   2.25            5.0625
1   4   1.75   2.25            5.0625    1.25    2.75            7.5625
2   6   3      3               9.00      2.75    3.25            10.5625
3   8   4.25   3.75            14.0625   4.25    3.75            14.0625
                       Σe_A² = 30.375                    Σe_B² = 37.25

where
x is the observed value of x
y is the observed value of y
e_A is the error made if we use line A for prediction
e_B is the error made if we use line B for prediction

• The rule for choosing the best line among several possible lines is that we
choose the line with the smallest value of Σe². This line will give the best
fit for the data at hand. This may not be an easy task, as we would be forced
to draw all the possible lines, which is not possible. To solve the
problem, we use the regression equation formulas as previously illustrated.
The line obtained using those formulas is the line with the least sum
of squared errors, hence the name least squares regression!

• Hence, from the above example of lines A and B, we would choose line A
as it has the smaller sum of squared errors, i.e. it is the line of better fit for the
data if we were to consider only these two lines.

• The least-squares criterion tells us what property the best-fitting line to a
set of data points must have, but it does not by itself present a formula that permits
us to actually determine the best-fitting line. (Here
we just use the formulas; they are derived using elementary
calculus.)
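The comparison of lines A and B can be reproduced in code. The following is a minimal sketch; the function name `sum_squared_errors` is an illustrative choice.

```python
# Observed points from the illustration.
x = [0, 1, 2, 3]
y = [2, 4, 6, 8]


def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of squared errors e = y - (intercept + slope * x) over all points."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


sse_a = sum_squared_errors(0.5, 1.25, x, y)    # Line A: y = 0.5 + 1.25x
sse_b = sum_squared_errors(-0.25, 1.5, x, y)   # Line B: y = -0.25 + 1.5x

# These four points happen to lie exactly on y = 2 + 2x, so that line has zero error.
sse_exact = sum_squared_errors(2, 2, x, y)

print(sse_a, sse_b, sse_exact)
```

Line A has the smaller sum of squared errors of the two candidates, but neither is the least-squares line: for these particular points the line y = 2 + 2x passes through every point, so no other line can do better.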

1.4. Regression Equation:


As previously indicated, the equation of the best-fitting line (regression line) to a
set of data points is given by

$$ y^* = a + bx $$

where

$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} $$

and

$$ a = \bar{y} - b\,\bar{x} $$

or, rewritten,

$$ a = \frac{\sum y - b \sum x}{n} $$

We can then derive the line of best fit for the Datsun Z cars example, and also
answer the following questions.
Example. Refer to the Age vs Price data for the Datsun Zs:

1. Determine the regression equation for the data, i.e. find the equation of the
regression line.

2. Describe the apparent relationship between age and price for Datsun Zs.

3. What does the slope of the regression equation represent in terms of the
prices of Datsun Zs?

4. Use the regression equation to predict the price of a two-year-old Z and a
five-year-old Z.

Solution

1. To determine the regression equation, we need to compute a and b using the
formulas above. It is therefore convenient to construct a table of values for

n, Σx, Σy, Σxy, Σx²

and their sums as presented below:

x     y     xy      x²
5     80    400     25
7     57    399     49
6     58    348     36
6     55    330     36
5     70    350     25
4     88    352     16
7     43    301     49
6     60    360     36
5     69    345     25
5     63    315     25
2     118   236     4
Σ 58  761   3,736   326

The slope of the regression equation is therefore

$$ b = \frac{11(3{,}736) - (58)(761)}{11(326) - (58)^2} = \frac{-3{,}042}{222} = -13.7027 \approx -13.7 $$

while the intercept is

$$ a = \frac{761 - (-13.7027)(58)}{11} = 141.43 $$

Thus, the regression equation for these data is

ŷ = 141.43 − 13.7x


2. To graph the regression equation, we need only substitute two different
x-values to obtain two distinct points (why?). Using the x-values x = 2 and
x = 8, the corresponding y-values are:

ŷ = 141.43 − 13.7(2) = 114.03

ŷ = 141.43 − 13.7(8) = 31.83

Consequently, the regression line passes through the two points (2, 114.03)
and (8, 31.83). These points should be plotted on the scatter diagram, together
with the data points given in the table, and joined by a straight line. This is the
straight line that best fits the data points according to the least-squares criterion
(i.e. the straight line whose sum of squared errors is smallest).

3. Here, we are to describe the apparent relationship between age and price for
Datsun Zs. Since the slope of the regression line is negative, we see that
the price tends to decrease as age increases. Any surprises?

4. For this part, we are to interpret the slope of the regression equation in
terms of the prices of Datsun Zs. To begin, recall that x represents age, in
years, and y represents price, in hundreds of dollars. The slope of −13.70, or −$1,370,
indicates that Datsun Zs depreciate by an estimated $1,370 per year, at least
in the two- to seven-year-old range.

5. Finally, we use the regression equation ŷ = 141.43 − 13.7x to
predict the price of a two-year-old Z and a five-year-old Z, i.e. x = 2 and
x = 5. The predicted price of a two-year-old Z is

ŷ = 141.43 − 13.7(2) = 114.03

or $11,403. Similarly, the predicted price of a five-year-old Z is

ŷ = 141.43 − 13.7(5) = 72.93

or $7,293.
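The whole solution can be checked with a short script. This is a minimal sketch with illustrative names; the prediction helper `predict_price` uses the rounded coefficients from the notes (141.43 and −13.7) as defaults so that its output matches the worked figures, and prices are in hundreds of dollars as in the data table.

```python
# Datsun Z data: age in years, price in hundreds of dollars.
age = [5, 7, 6, 6, 5, 4, 7, 6, 5, 5, 2]
price = [80, 57, 58, 55, 70, 88, 43, 60, 69, 63, 118]

n = len(age)
sum_x = sum(age)
sum_y = sum(price)
sum_xy = sum(a * p for a, p in zip(age, price))
sum_x2 = sum(a * a for a in age)

# Least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n


def predict_price(age_years, intercept=141.43, slope=-13.7):
    """Predicted price (in $100) using the rounded regression equation."""
    return intercept + slope * age_years


print(round(a, 2), round(b, 2))
print(round(predict_price(2), 2), round(predict_price(5), 2))
```

Multiplying the predictions by 100 converts them to dollars, giving roughly $11,403 for a two-year-old car and $7,293 for a five-year-old car.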
Remark 3. Warning on use of the linear regression line: The idea behind the
regression line is based on the assumption that the data points are actually scattered
around a straight line. But data points can at times be scattered about a curve.
Unfortunately, the formulas for a and b will still work for such a data set, but they
will fit an inappropriate regression line to the data.

Remark 4. If you plan to find a regression line for a set of data points, first look at
a scatter diagram of the data. If the data points do not appear to be scattered about
a straight line, do not determine a regression line.

2. Exercises

Exercise 22. Scores made by students in a statistics class in the mid-term and
final examinations are given here. Develop a regression equation which may be used
to predict final examination scores from the mid-term score.

Student    1   2   3    4   5   6   7   8   9   10
Mid-term   98  66  100  96  88  45  76  60  74  82
Final      90  74  98   88  80  62  78  74  86  80

3. Revision Questions
The following is a list of questions that will assist you in your revision.
Practice Problems:

1. Let variable X be the number of hamburgers consumed at a cook-out, and variable Y
the number of beers consumed. Develop a regression equation to predict how
many beers a person will consume given that we know how many hamburgers that
person will consume.

Subject 1 2 3 4 5
Hamburgers 5 4 3 2 1
Beers 8 10 4 6 2

2. A horse owner is investigating the relationship between weight carried and the finish
position of several horses in his stable. Calculate r and R for the data given.

Weight carried      11 11 12 11 11 11 11 12 10 10 11 11
Position Finished   0
2 3
6 0
3 5
4 0
6 5 7
4 3
2 6
1 8
4 0
1 0
3

3. The top and bottom numbers which may appear on a die are as follows. Calculate r
and R for these values. Are the results surprising?

Top 1 2 3 4 5 6

Bottom 5 6 4 3 1 2

4. Researchers interested in determining whether there is a relationship between death
anxiety and religiosity conducted the following study. Subjects completed a death
anxiety scale (high score = high anxiety) and also completed a checklist designed to
measure an individual’s degree of religiosity (belief in a particular religion, regular
attendance at religious services, number of times per week they regularly pray, etc.)
(high score = greater religiosity). A data sample is provided below:

X 38 42 29 31 28 15 24 17 19 11 8 19 3 14 6
y 4 3 11 5 9 6 14 9 10 15 19 17 10 14 18

a) What is your computed answer?

b) What does this statistic mean concerning the relationship between death
anxiety and religiosity?

c) What percent of the variability is accounted for by the relation of these two
variables?

5. The data given below are obtained from student records (Grade Point Average (x)
and Graduate Record Exam score (y)). Calculate the regression equation and compute
the estimated GRE scores for GPA = 7.5 and 8.5.

Subject 11 12 13 14 15 16 17 18 19 20
X 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260


6. A horse was tested to see how many minutes it takes to reach a point from
the starting point. The horse was made to carry luggage of various weights on 10
trials. The data collected are presented in the table below. Find the regression
equation between the load and the time taken to reach the goal. Estimate the time
taken for loads of 35 kg, 23 kg and 9 kg. Are the answers in agreement with
your intuitive feelings? Justify.

Trial Number           1   2   3   4   5   6   7   8   9   10
Weight (in kg)         11  23  16  32  12  28  29  19  25  20
Time taken (in mins)   13  22  16  47  13  39  43  21  32  22

7. A study was conducted to find whether there is any relationship between the weight
and blood pressure of an individual. The following set of data was obtained from a
clinical study.

Serial Number    1    2    3    4    5    6    7    8    9    10
Weight           78   86   72   82   80   86   84   89   68   71
Blood Pressure   140  160  134  144  180  176  174  178  128  132
8. It is assumed that achievement test scores should be correlated with students'
classroom performance. One would expect that students who consistently perform
well in the classroom (tests, quizzes, etc.) would also perform well on a standardized
achievement test (x, scored 0 - 100 with 100 indicating high achievement). A teacher
decides to examine this hypothesis. At the end of the academic year, she computes a
correlation between the students' achievement test scores (she purposefully did not
look at these data until after she had submitted the students' grades) and the overall
G.P.A. (y) for each student computed over the entire year. The data for her class are
provided below.

X 98 96 94 88 01 77 86 71 59 6 8 7 7 7 8 8 7 9 9 6
Y 3.6 2.7 3.1 4.0 3.2 3.0 3.8 2.6 3.0 3 4 9
2 1 5 2
3 2 6 3
2 2 5 1 3 3
2 3 0 2
1
. . . . . . . . . . .
2 7 1 6 9 4 4 8 7 2 6

a) Compute the correlation coefficient.

b) What does this statistic mean concerning the relationship between
achievement test performance and G.P.A.?

c) What percent of the variability is accounted for by the relationship between
the two variables, and what does this statistic mean?

d) What would be the slope and y-intercept for a regression line based on these
data?

e) If a student scored a 93 on the achievement test, what would be their
predicted G.P.A.? If they scored a 74? An 88?

9. With the growth of internet service providers, a researcher decides to examine


whether there is a correlation between cost of internet service per month (rounded to

146
STA 2100 Probability and Statistics I

the nearest dollar) and degree of customer satisfaction (on a scale of 1 - 10 with a 1
being not at all satisfied and a 10 being extremely satisfied). The researcher only
includes programs with comparable types of services. A sample of the data is
provided below.

Dollars 11 18 17 15 9 5 12 19 22 25
Satisfaction 6 8 10 4 9 6 3 5 2 10

a) Compute the correlation coefficient.

b) What does this statistic mean concerning the relationship between amount of
money spent per month on internet provider service and level of customer
satisfaction?

c) What percent of the variability is accounted for by the relationship between
the two variables and what does this statistic mean?

10. It is hypothesized that there are fluctuations in norepinephrine (NE) levels which
accompany fluctuations in affect with bipolar affective disorder (manic-depressive
illness). Thus, during depressive states, NE levels drop; during manic states, NE
levels increase. To test this relationship, researchers measured the level of NE by
measuring the metabolite 3-methoxy-4-hydroxyphenylglycol (MHPG in micro gram
per 24 hour) in the patient's urine experiencing varying levels of mania/depression.
Increased levels of MHPG are correlated with increased metabolism (thus higher
levels) of central nervous system NE. Levels of mania/depression were also recorded
on a scale with a low score indicating increased mania and a high score increased
depression. The data is provided below.

MHPG 980 1209 1403 1950 1814 1280 1073 1066 880 776

Affect 22 26 8 10 5 19 26 12 23 28

a) Compute the correlation coefficient.

b) What does this statistic mean concerning the relationship between MHPG
levels and affect?

c) What percent of the variability is accounted for by the relationship between
the two variables?

d) What would be the slope and y-intercept for a regression line based on this
data?

e) What would be the predicted affect score if the individual had an MHPG level
of 1100? of 950? of 700?


4. Learning Activities
1. Daniel computed the following statistics based on the amount (X) in millions
(Kshs) that he invested in his cyber café business, and the income (Y) in
millions (Kshs) generated.
$n = 10$, $\sum x_i = 93$, $\sum x_i^2 = 999$, $\sum x_i y_i = 293$, $\sum y_i = 28$, $\sum y_i^2 = 90$

• Using the data, fit a linear regression line of the income (y) generated on
the amount (x) invested.

• Use the regression equation to determine how much Daniel would realize if
he invested Kshs 2.5M and comment on your results.


Chapter 10
Revision Questions

Learning outcomes
Upon completing this lesson, you should be able to:

• Have an overview of the whole course, and be in a position to answer any
questions that may arise, be it in measures of central tendency, variability/spread,
probability and probability distributions, correlation and simple linear regression,
amongst others.


1. Introduction
So far we have looked at all the areas that we needed to cover in this module. In
this chapter, we are going to look at some revision questions that may help us in
reviewing the course content, and be able to handle the queries that may come at
the end of the semester or in the subsequent years of study.
The questions that we provide in this section are divided into two sections.
In Section One, we list some example questions that you may attempt, while in
Section Two, we give some exercises with solutions for your revision purposes.
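Example. Two tetrahedral dice are rolled together once and the sum of the scores
facing down is noted. Find the pmf of the random variable ‘the sum of the
scores facing down.’
Solution: The possible sums are shown in the table below, with the score on one die
across the top and the score on the other die down the side.
+   1  2  3  4
1   2  3  4  5
2   3  4  5  6
3   4  5  6  7
4   5  6  7  8
i.e. X takes the values {2, 3, 4, 5, 6, 7, 8}, and each of the 16 outcomes is equally likely.
Therefore the pmf of X is given by the table below:
x          2     3     4     5     6     7     8
P(X = x)   1/16  2/16  3/16  4/16  3/16  2/16  1/16
This can also be written as a function:
\[ P(X = x) = \begin{cases} \dfrac{x-1}{16}, & \text{for } x = 2, 3, 4, 5 \\[2mm] \dfrac{9-x}{16}, & \text{for } x = 6, 7, 8 \end{cases} \]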
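The pmf above can be checked by listing all 16 equally likely outcomes. The following is a minimal Python sketch of that check; the names `pmf` and `pmf_formula` are our own and are not part of the course notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each tetrahedral die shows 1, 2, 3 or 4; all 16 ordered pairs are equally likely.
faces = range(1, 5)
counts = Counter(a + b for a, b in product(faces, repeat=2))

# Convert counts to exact probabilities.
pmf = {x: Fraction(c, 16) for x, c in sorted(counts.items())}

def pmf_formula(x):
    """Piecewise formula from the worked example."""
    return Fraction(x - 1, 16) if x <= 5 else Fraction(9 - x, 16)

for x, p in pmf.items():
    print(x, p, p == pmf_formula(x))
```

Exact fractions are used so that the comparison with the formula is not affected by rounding.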

2. Sample Questions

Exercise 23. Explain the role of statistics in various fields.


Exercise 24. The following nine measurements are the heights in inches in a
sample of nine IT CEOs in Naivasha town:
Height (X): 69, 66, 67, 69, 64, 63, 65, 68, 72.
Calculate the sample variance of the heights.
Exercise 25. An analysis of time taken by a company’s TQM Manager to train
employees on Risk management for a period of 120 days was recorded as follows.
Training duration (in days)   30-35  35-40  40-45  45-50  50-55  55-60
Frequency, f                  17     24     19     28     19     13
Determine the mean and standard deviation of the training duration for this program.
Your manager, who is not familiar with grouped data, would like to know what
these results mean. Explain.
Exercise 26. A die is loaded such that the probability of a face showing up is
proportional to the face number. Determine the probability of each sample point.
Exercise 27. Roll a fair die and let X be the square of the score that shows
up. Write down the probability distribution of X and hence compute P(X ≤ 15) and
P(3 ≤ X < 30).
Exercise 28. Let X be the random variable the number of fours observed when
two dice are rolled together once. Show that X is a discrete random variable.
Exercise 29. The pmf of a discrete random variable X is given by P (X =
x) = kx for x = 1, 2, 3, 4, 5, 6. Find the value of the constant k, P (X < 4) and
P (3 ≤ X < 6)
Exercise 30. A fair coin is flipped until a head appears. Let N represent the number
of tosses required to realize a head. Find the pmf of N.
Exercise 31. A random variable X has the following probability function
x          0  1  2   3   4   5      6       7
P(X = x)   0  K  2K  2K  3K  $K^2$  $2K^2$  $K^2 + K$
Evaluate i. P (X ≥ 3) ii. E(X ) iii. V ar(4X + 6) iv. F (6)
3

Example. The following table gives the Age(X), and Price (Y) in (£’000) of cars
driven by Eleven IT department Lecturers who attended a retreat in Mombasa in
the month of January, 2014. Use the data to answer the following questions:
Age(X) 5 7 6 6 5 4 7 6 5 5 2
Price 8.0 5.7 5.8 5.5 7.0 8.8 4.3 6.0 6.9 6.3 11.8
(Y)(£’000)
(i) Determine the regression equation of Price (Y) on Age (X)
(ii) Estimate the price of a car aged 3 years

(iii) Estimate the expected age, if the car is costing £ 9,000.
(iv) Compute the linear correlation coefficient, r of the data
(v) Interpret the result in part (iv) in terms of the linear relationship between age and price
(vi) Discuss the graphical implications of the value of r

Exercise 32. The following data relate to ages of husbands and their wives.
Obtain the Spearman’s rank correlation coefficient and comment on your result.
Husband’s Age   18  19  25  27  30  35  36  40  42  44
Wife’s Age      16  17  24  23  22  30  28  36  40  41
Exercise 33. Interviews with 36 IT students at JKUAT revealed that 10 of the 16
ladies and 16 of the 20 men preferred the course Distributed Systems to Artificial
Intelligence. Determine the probability that the first person interviewed was either
a man or someone who preferred Distributed Systems to Artificial Intelligence,
assuming each of the 36 people were equally likely to have been the first to be
interviewed.
Exercise 34. The frequency distribution table below shows the number of computers
available for use in a computer lab for a period of 60 days.
Number of Days        0-12  12-24  24-36  36-48  48-60
Number of Computers   3     10     ?      8      4
If the mode of the data was 29 days, find the number of computers available
between “24-36” days.

Problem 1. The percentages of families owning a television set, a computer, and
both a television set and a computer are 87%, 36% and 29% respectively. Based on
this information, is the family ownership of a television set independent of the
family ownership of a computer?

Exercise 35. A conservative design team and an innovative design team were
asked separately to design a new product within a period of one month. From past
experience, we know that: The probability that the conservative team is successful
is 2/3, while the probability that the innovative team is successful is 1/2. The
probability that at least one team is successful is 3/4. Assuming that exactly one
successful design is produced, what is the probability that it was designed by the
innovative design team?
Exercise 36. An urn contains 10 balls: 4 red and 6 blue. A second urn contains
16 balls with an unknown number of blue balls. A single ball is drawn from each
urn. The probability that both balls are of the same colour is 0.44. Calculate the
number of blue balls in the second urn.
Exercise 37. Explain what you understand by simple random sampling and
stratified random sampling.

Problem 2. An internet service provider has installed 15 modems to serve the
needs of a population of 100 dialup users/customers. It is estimated that at
a given time, the probability that each customer will need a connection is 0.1,
independent of the others. What is the probability that there are 8 customers
needing connection at a given time?

Problem 3. Let X be a random variable from a binomial distribution, $X \sim B(n, p)$.
Suppose that E(X) = 5 and Var(X) = 4; find n and p.

Problem 4. Explain the meaning of the following two statements:

(a) Demand and supply are positively linearly correlated


(b) Age and Salary are linearly uncorrelated

Problem 5. A and B are two identical boxes. Box A contains 5 Diamond rings
and 4 Gold rings. Box B contains 6 Diamond rings and 5 Gold rings. A box is
chosen at random, and from it a ring is drawn at random and then put into the
other box. A ring is then drawn at random from this latter box. Illustrate the
information given in a probability tree diagram, hence determine the probability
that the first ring drawn is a Diamond ring given that the second ring is a Gold
ring.

Exercise 38. Past records show that 5% of Bridging Mathematics students who
pass the JKUAT certificate examination join an IT course at JKUAT Westland
campus. If a group of six students, selected at random from former Bridging
Mathematics students, had passed the examination, what is the probability that:
(a) None of these former students will be doing an IT course at JKUAT Westland campus?
(b) At most two will join the IT course at JKUAT Westland campus?
Exercise 39. Giving relevant examples, define the following terms;
(i) Secondary data
(ii) Discrete random variable
(iii) An event
(iv) Probability space

Problem 6. The mean and variance of a random variable X are 4 and 2.25 respec-
tively. Find the mean and standard deviation of the random variable Y=15X+5.

Problem 7. The discrete random variable X has the probability distribution shown
below.
x 0 1 2 3
P(x) 0.2 p q r
If F (1) = 0.3 and E(X) = 1.7, find V ar(X).

Exercise 40. The following table shows the marks of ten candidates in Physics
and Mathematics. Find the product-moment correlation coefficient and comment
on your result.
Mark in Physics (x)       18  20  30  40  46  54  60  80  88  92
Mark in Mathematics (y)   42  54  60  54  62  68  80  66  80  100
Exercise 41. Two judges rank the eight photographs in a competition as follows:
Photograph   A  B  C  D  E  F  G  H
1st Judge    2  5  3  6  1  4  7  8
2nd Judge    4  3  2  6  1  8  5  7
Calculate Spearman’s coefficient of rank correlation for the data.

3. Exercises with solutions

Exercise 42. A Personnel Manager in the city of Mwaimbasa divides the company
days into “Good”, “Better” or “Best”. He estimates that the probability
of a good day is 0.5 and that 30% of the company days are better. He has also
calculated that the company’s average revenue on the three types of days is Kshs
40 million, Kshs 130 million and Kshs 220 million respectively. If the company’s
average running cost per day is Kshs 80 million, calculate the company’s expected
profit per day.
Exercise 43. The Governor of Kiambu County has allocated funds for entertain-
ment for his office to the tune of 52 Million. When asked to explain the rationale
of his budget, he gave several explanations. Of interest to note, he mentioned
that within the few days that he had been in office, each of the five working days
had several people of different background visiting his office. He went on to check
the records with the secretary and noted the number of visitors coming to his
office per day at once, and their respective frequencies, which he later converted
to probabilities as shown below.
X 1 2 3 4 5
P (X = x) 0.3 a 0.1 0.2 b
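Using this probability distribution, the Governor computed the expected value
as E(X) = 3.1. Hence:
(1) Compute the values of a and b
(2) Determine the mode of the distribution
(3) Determine Var(X)
(4) Compute P(X > 2), P(X < 1) and P(X ≥ 2)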
Example. What is the purpose of the measures of central tendency and dispersion?
Solution:
(1) Descriptive statistics (numerical measures) has two branches: measures of
central tendency and measures of dispersion.
(2) Measures of central tendency are the mean, median, and mode. These
show the central location around which all data points tend to congregate. They
determine the central value around which the various items concentrate, which is
used for describing the data.
(3) Measures of dispersion are the range, standard deviation, variance and
interquartile range, to name a few. These show how the data spreads or varies from
the central point. Together these two branches describe the data.


Exercise 44. MAUA is a consulting firm working independently on two separate
jobs. There is a probability of only 0.3 that either of the jobs will be finished on
time. Find the probability that:
(1) Both of the jobs are finished on time.
(2) Just one of the jobs is finished on time.
Exercise 45. Last week James decided to visit his grandparents who stay 200km
away from his current home. On his way to the grandparents’ home, he drives at a
speed of 80km/h. While coming back, there was traffic jam, so he only managed
a speed of 50km/h. What was James’ average speed for the round trip?


4. Conclusion
In this lesson we have tried to give a list of some of the questions that one may face
while dealing with the issues learnt in this course. While we encourage you to do as
many questions from the list as possible, if not ALL, we also discourage students
from memorizing the questions given here. Rather, we encourage students to look
at every question listed here, both problem sets and exercises, and always ensure
that they are aware of the concept being tested in each question, for instance,
binomial distribution and its applications. That way, you will have learnt more
regarding Probability and Statistics I, and be on a good footing for the coming
course in Probability and Statistics II.
We wish you the very best in the course and always ask questions whenever
something is not clear to you!

—Adieu—

@HKOA


Solutions to Exercises
Exercise 1. S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} Exercise 1
Exercise 2. P = {..., −2, −1, 0, 1, 2, 3, 4, 5} Exercise 2
Exercise 3. Q = {3, 4, 5, 6, 7, 8, 9} Exercise 3
Exercise 4. For the first word, we have $\frac{13!}{3!\,4!\,2!\,2!\,1!\,1!} = 10810800$ arrangements, while
for the second word we have $\frac{11!}{5!\,2!\,2!\,1!\,1!} = 83160$ arrangements. Exercise 4
Exercise 5. The possible number of committees can be found in two different
approaches.
Option 1
That is, the number of all possible committees is $^{10}C_5 = 252$. A committee
without any woman (that is, all 5 men) can be chosen in $^{6}C_5 = {}^{6}C_1 = 6$ ways. Hence the
total number of ways of obtaining at least one woman is 252 − 6 = 246 ways.
Option 2
For the second option, we can get all the possible committees with at least
one woman as follows:
$^{6}C_1 \times {}^{4}C_4 = 6$ (1 man and 4 women, total 5 members chosen)
$^{6}C_2 \times {}^{4}C_3 = 60$ (2 men and 3 women, total 5 members chosen)
$^{6}C_3 \times {}^{4}C_2 = 120$ (3 men and 2 women, total 5 members chosen)
$^{6}C_4 \times {}^{4}C_1 = 60$ (4 men and 1 woman, total 5 members chosen)
Thus the total number of possible ways is 6 + 60 + 120 + 60 = 246 ways
Exercise 5
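Both options can be verified numerically. The sketch below uses Python's `math.comb`; the variable names (`men`, `women`, `size`) are ours and simply restate the figures used in the solution (6 men, 4 women, committees of 5).

```python
from math import comb

men, women, size = 6, 4, 5

# Option 1: all committees minus the committees with no woman.
by_complement = comb(men + women, size) - comb(men, size)

# Option 2: add up the cases with 1, 2, 3 or 4 men (hence 4, 3, 2 or 1 women).
by_cases = sum(comb(men, m) * comb(women, size - m) for m in range(1, 5))

print(by_complement, by_cases)
```

The two counts agree, which is a useful check whenever an "at least one" question is solved by cases.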
Exercise 7. (a) This is a cluster sample because each department is a naturally
occurring subdivision. (b) This is a convenience sample because you are using
the lecturers that are readily available to you. (c) This is a stratified sample
because the lecturers are divided by department and some from each department
are randomly selected. Exercise 7
Exercise 8. A pie chart is most useful for displaying a relative frequency
(percentage) distribution, much like a bar chart, while a histogram is useful for revealing
the general pattern or distribution of (quantitative) values. Exercise 8
Exercise 10. 1. Find the Median.
Arrange the numbers in order from lowest to highest, and find the number in
the middle of the set: 50, 50, 55, 60, 60, 60, 65, 65, 80, 90, and 99.
The median, or middle, value is 60 mg.
2. Find the Mode. The number that appears the most is the mode. Therefore,
the mode is 60 mg.
3. Find the Mean. The average of the data is the mean. The mean is
calculated as the sum of all the data elements divided by the number of elements.
That is, 734/11 = 66.73 mg. Exercise 10
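The three measures can be confirmed with Python's standard `statistics` module. This is only a quick check of the arithmetic above; the list name `values_mg` is our own.

```python
import statistics

# The eleven observations from the solution, in mg.
values_mg = [50, 50, 55, 60, 60, 60, 65, 65, 80, 90, 99]

median = statistics.median(values_mg)  # middle value of the sorted data
mode = statistics.mode(values_mg)      # most frequent value
mean = statistics.mean(values_mg)      # sum of the values divided by their count

print(median, mode, round(mean, 2))
```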
Exercise 11. Find the mean x̄:

\[ \bar{x} = \frac{9 + 10 + 5 + 6 + 5 + 7 + 8 + 5}{8} = \frac{55}{8} = 6.875 \]

x_i   x_i − x̄   (x_i − x̄)^2
5     -1.875    3.516
6     -0.875    0.766
5     -1.875    3.516
7     0.125     0.016
8     1.125     1.266
9     2.125     4.516
10    3.125     9.766
5     -1.875    3.516

\[ \sum (x_i - \bar{x})^2 = 26.875 \quad \text{(using the unrounded squares)} \]

Hence $s^2 = \frac{26.875}{8 - 1} = 3.839$

The standard deviation is $s = \sqrt{3.839} = 1.959$ Exercise 11
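As a check on the table, the sample variance and standard deviation (with divisor n − 1) can be computed directly. The sketch below is a minimal one using the standard library; the variable names are ours.

```python
import statistics

x = [9, 10, 5, 6, 5, 7, 8, 5]

mean = statistics.mean(x)                         # 55 / 8
sum_sq_dev = sum((xi - mean) ** 2 for xi in x)    # sum of squared deviations
sample_var = statistics.variance(x)               # divides by n - 1
sample_sd = statistics.stdev(x)                   # square root of the sample variance

print(mean, sum_sq_dev, round(sample_var, 3), round(sample_sd, 3))
```

Note that `statistics.variance` uses the n − 1 divisor, matching the formula in the solution; `statistics.pvariance` would divide by n instead.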
Exercise 13. For grouped continuous data with n = 120 ,
$Q_1$ is the $\frac{1}{4}n$th value, i.e. the 30th value,
$Q_2$ is the $\frac{2}{4}n$th value, i.e. the 60th value,
$Q_3$ is the $\frac{3}{4}n$th value, i.e. the 90th value.
Our table should now look like this:

Time CF
9.5-14.5 2
14.5-19.5 7
19.5-24.5 24
24.5-29.5 57
29.5-34.5 84
34.5-44.5 109
44.5-59.5 116
59.5-89.5 119
89.5-119.5 120
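For solution (a)
$Q_1$ lies in the interval 24.5-29.5 (width = 5).
There are 33 items in this interval, and the 30th value is the 6th of them (30 − 24 = 6).
So $Q_1 = 24.5 + \frac{6}{33} \times 5 = 25.4$ min
$Q_2$ lies in the interval 29.5-34.5 (width = 5).
There are 27 items in this interval, and the 60th value is the 3rd of them (60 − 57 = 3).
So $Q_2 = 29.5 + \frac{3}{27} \times 5 \approx 30$ min
$Q_3$ lies in the interval 34.5-44.5 (width = 10).
There are 25 items in this interval, and the 90th value is the 6th of them (90 − 84 = 6).
So $Q_3 = 34.5 + \frac{6}{25} \times 10 = 36.9$ min
This is an implementation of the formula median $= L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M$,
where $L_M$ is the lower boundary of the median class, $F_{M-1}$ the cumulative frequency
before it, $f_M$ its frequency and $C_M$ its width.
Solution (b)
$Q_3 − Q_2 = 6.9$ min, $Q_2 − Q_1 = 4.6$ min
Since $Q_3 − Q_2 > Q_2 − Q_1$, it implies that we have a positive skew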
Exercise 13
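The interpolation used for the three quartiles can be written once and reused. The following Python sketch assumes the class boundaries and cumulative frequencies from the table in this solution; the function name `grouped_quantile` is our own.

```python
# Class boundaries and cumulative frequencies taken from the table above.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 44.5, 59.5, 89.5, 119.5]
cum_freq = [2, 7, 24, 57, 84, 109, 116, 119, 120]

def grouped_quantile(position):
    """Linearly interpolate the value at a given position (e.g. the 30th value)."""
    prev_cf = 0
    for i, cf in enumerate(cum_freq):
        if position <= cf:
            lower, upper = boundaries[i], boundaries[i + 1]
            class_freq = cf - prev_cf
            return lower + (position - prev_cf) / class_freq * (upper - lower)
        prev_cf = cf
    raise ValueError("position exceeds the total frequency")

n = cum_freq[-1]
q1, q2, q3 = (grouped_quantile(k * n / 4) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))

# Quartile skewness check: a longer upper half suggests positive skew.
print(q3 - q2 > q2 - q1)
```

Working unrounded gives a median slightly above 30 min, which the solution rounds to 30; the conclusion about the skew is unchanged.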
Exercise 14. $\sum x = 101.4$, $\sum x^2 = 102.83$, $n = 100$
Therefore, $\bar{x} = \sum x / n = 101.4/100 = 1.014$
So the mean volume is 1.014 litres.
$s = \sqrt{\sum x^2 / n - \bar{x}^2} = \sqrt{102.83/100 - 1.014^2} = 0.0101\ldots$
Thus, standard deviation is 0.010 litres (2 s.f.)
Exercise 14
Exercise 15.

From the table,
$\frac{3}{20} + x = \frac{1}{4} \implies x = \frac{1}{4} - \frac{3}{20} = \frac{2}{20}$
$\frac{1}{4} + t = 1 \implies t = \frac{3}{4}$
Since A and B are independent;
$P(A) = P(A|B) \implies P(A) = \frac{P(A \cap B)}{P(B)} \implies P(A \cap B) = P(A) \times P(B)$
So, $P(A \cap B) = P(A) \times P(B)$ gives $3/20 = P(A) \times 1/4 \implies P(A) = \frac{3/20}{1/4} = 3/5$
Thus, $3/20 + y = 3/5 \implies y = 3/5 − 3/20 = 9/20$
$u + v = 1 \implies v = 8/20$
$y + z = t \implies z = 6/20$
The final contingency table will be;
         B      B′     Total
A        3/20   9/20   3/5
A′       1/10   3/10   2/5
Total    1/4    3/4    1
Exercise 15
Exercise 16.
Let D: be the event that the video CD is defective
A: From store A
B: From store B
P (D|A) = 0.05
P (D|B) = 0.02
P (A) = 0.3
P (B) = 0.7
\[ \implies P(A|D) = \frac{P(A \cap D)}{P(D)} = \frac{P(D|A)P(A)}{P(D|A)P(A) + P(D|B)P(B)} = \frac{(0.05)(0.3)}{(0.05)(0.3) + (0.02)(0.7)} = \frac{0.015}{0.029} = 0.517 \]
Exercise 16
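The Bayes' theorem calculation can be reproduced in a few lines. This is a minimal sketch using the probabilities stated in the solution; the variable names are ours.

```python
# Prior probabilities of the two stores and defect rates given each store.
p_a, p_b = 0.3, 0.7
p_d_given_a, p_d_given_b = 0.05, 0.02

# Total probability of a defective CD.
p_d = p_d_given_a * p_a + p_d_given_b * p_b

# Bayes' theorem: probability the CD came from store A given it is defective.
p_a_given_d = p_d_given_a * p_a / p_d

print(round(p_d, 3), round(p_a_given_d, 3))
```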
Exercise 17. Let X be the amount gained.
It costs Kshs. 10 to play, therefore the possible amounts that can be gained
are;
Kshs. 90,70,40,30 and -10
$P(X = 90) = P(\text{3 apples}) = 0.1 \times 0.1 \times 0.1 = 0.001$
$P(X = 70) = P(AAC) + P(ACA) + P(CAA) = 3 \times 0.1^2 \times 0.2 = 0.006$
$P(X = 40) = P(CCC) = 0.2^3 = 0.008$
$P(X = 30) = P(LLL) = 0.4^3 = 0.064$
$P(X = −10) = P(\text{win none}) = 1 − (0.001 + 0.006 + 0.008 + 0.064) = 0.921$
This can then be summarized as follows:
x          90     70     40     30     -10
P(X = x)   0.001  0.006  0.008  0.064  0.921
$E(X) = \sum x P(X = x)$
$= (90 \times 0.001) + (70 \times 0.006) + \ldots + (−10 \times 0.921) = −6.46$
Thus there is an expected loss per turn of Kshs. 6.46 Exercise 17
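The expected gain can be recomputed from the pmf as a check. The sketch below builds the table from the probabilities used in the solution; the dictionary name `pmf` is our own.

```python
# Gains (Kshs) and their probabilities, as derived in the solution.
pmf = {
    90: 0.1 ** 3,            # three apples
    70: 3 * 0.1 ** 2 * 0.2,  # AAC in any of its three orders
    40: 0.2 ** 3,            # CCC
    30: 0.4 ** 3,            # LLL
}
pmf[-10] = 1 - sum(pmf.values())  # no win: lose the Kshs. 10 stake

expected_gain = sum(x * p for x, p in pmf.items())
print(round(pmf[-10], 3), round(expected_gain, 2))
```

A negative expected gain means the player loses money on average over many turns.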
Exercise 18.
P(X = 0) = 1/4
P(X = 1) = 1/2
P(X = 2) = 1/4
x          0    1    2
P(X = x)   1/4  1/2  1/4
(a) $E(X) = \sum x P(X = x) = 0 \times 1/4 + 1 \times 1/2 + 2 \times 1/4 = 1$
(b) $E(X^2) = \sum x^2 P(X = x) = 0^2 \times 1/4 + 1^2 \times 1/2 + 2^2 \times 1/4 = 1.5$
(c) $E(X^2 − X) = \sum (x^2 − x) P(X = x) = E(X^2) − E(X) = 1.5 − 1 = 0.5$
Exercise 18
Exercise 19. We can organize the table as shown below;

Thus;

\[ r_{xy} = \frac{10(124206) - (796)(1546)}{\sqrt{10(63776) - (796)^2}\,\sqrt{10(243036) - (1546)^2}} = \frac{11444}{\sqrt{(4144)(40244)}} = 0.8862 \]

Exercise 19
Exercise 20. Thus, we have

We then have that

\[ r = \frac{-62583}{\sqrt{2.5(1828695447)}} \approx -0.93 \]

Exercise 20
Exercise 21. Now we arrange the data in descending order, and then rank 1,2,3,.
. . . .10 accordingly.
In case of a tie, the rank of each tied value is the mean of all positions they
occupy.
In x, for instance, 8.6 occupies ranks 5 and 6. So each has a rank of $\frac{5+6}{2} = 5.5$.
Similarly in y, 2000 occupies ranks 9 and 10, so each has rank 9.5.
Now we come back to our formula,

\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]

We compute d, square it and substitute its value in the formula as follows;

So here, $n = 10$ and $\sum d^2 = 12$.

So

\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{10(100 - 1)} = 1 - 0.0727 = 0.9273 \]

Note: If we are provided with only ranks without giving the values of x and y we
can still find Spearman rank difference correlation rs by taking the difference of
the ranks and proceeding in the above shown manner.
Exercise 21
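The two steps used above, ranking with averaged ties and then applying the rank-difference formula, can be sketched in Python as follows. The helper names and the short list passed to `average_ranks` are hypothetical illustrations, not the data of Exercise 21; only n = 10 and the sum of squared differences 12 come from the solution.

```python
def average_ranks(values):
    """Rank in descending order (1 = largest); tied values share the mean of their positions."""
    order = sorted(values, reverse=True)
    # A value first found at 0-based index i and repeated c times occupies
    # positions i+1 .. i+c, whose mean is (2*i + c + 1) / 2.
    return [(2 * order.index(v) + order.count(v) + 1) / 2 for v in values]

def spearman_from_d2(n, sum_d2):
    """Spearman's rank correlation from n pairs and the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical illustration of tie handling: the two 8.6s share ranks 2 and 3.
print(average_ranks([9.1, 8.6, 8.6, 7.0]))

# Figures from the solution: n = 10 and sum of d^2 = 12.
r_s = spearman_from_d2(10, 12)
print(round(r_s, 4))
```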
Exercise 22. We want to predict the final exam scores from the mid-term scores.
So let us designate ‘y’ for the final exam scores and ‘x’ for the mid-term exam
scores. We open the following table for the calculations.

The slope is:

\[ b = \frac{10(65071) - (785)(810)}{10(64521) - (785)^2} = 0.5127 \]

While the intercept is:

\[ a = \frac{810 - (785)(0.5127)}{10} = 40.7531 \]

Thus, the regression equation for this data is:

\[ \hat{y} = 40.7531 + 0.5127x \]

We can use this to find the projected or estimated final scores of the students.
E.g. for the midterm score of 50 the projected final score is ŷ = 40.7531 +
0.5127(50) = 66.3881.
To give another example, consider the midterm score of 70. Then the projected
final score is ŷ = 40.7531 + 0.5127(70) = 76.6421. Both predictions lie within the
range of scores used to fit the line, so they are reasonable estimates. Exercise 22
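The slope and intercept can be recomputed from the column totals quoted in the formulas above. The following is a minimal sketch; the variable names and the helper `predict_final` are ours. Because it keeps the slope unrounded, the intercept comes out as about 40.7547 rather than 40.7531, a difference that is purely due to rounding b to 0.5127 before computing a.

```python
# Column totals quoted in the solution (x = mid-term score, y = final score).
n = 10
sum_x, sum_y = 785, 810
sum_xy, sum_x2 = 65071, 64521

# Least-squares slope and intercept of the line y = a + b*x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

def predict_final(midterm):
    """Projected final score for a given mid-term score."""
    return a + b * midterm

print(round(b, 4), round(a, 4))
print(round(predict_final(50), 2), round(predict_final(70), 2))
```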
Exercise 42. This can be expressed as shown in the table below:
Company days        Good  Better  Best
Avg. Revenue, (x)   40    130     220
P(X=x)              0.5   0.3     0.2
x.P(X=x)            20    39      44
Here P(Best) = 1 − 0.5 − 0.3 = 0.2, since the three types of day cover all company days.
Thus expected revenue is: $\sum x P(X = x) = 20 + 39 + 44 = 103$. Hence, the company’s
expected profit per day is: (103 − 80) = Kshs. 23 million.
Remember the company spends on average Kshs. 80 million per day, that’s why
we have to subtract it from the expected revenue. Exercise 42
Exercise 43. From this information, we can construct the following table:
X            1    2    3    4    5
P(X = x)     0.3  a    0.1  0.2  b
xP(X = x)    0.3  2a   0.3  0.8  5b
(1) Compute the values of a and b.
From the table above, $\sum P(X = x) = 1$ for it to meet the requirements
of a probability distribution. Thus;
$\sum P(X = x) = 0.3 + a + 0.1 + 0.2 + b = 1$
Implying a + b = 0.4 ..........(i)
$\sum xP(X = x) = 0.3 + 2a + 0.3 + 0.8 + 5b = 3.1$
Implying 2a + 5b = 1.7 ..........(ii)
Solving the simultaneous equations we find a = 0.1 and b = 0.3.
(2) Determine the mode of the distribution.
There are two modes (values with highest frequency/probability): 1 and 5.
(3) Determine Var(X).
Use the formula $Var(X) = E(X^2) − (E(X))^2$, noting that we already have E(X) = 3.1.
Here $E(X^2) = 1(0.3) + 4(0.1) + 9(0.1) + 16(0.2) + 25(0.3) = 12.3$, so
$Var(X) = 12.3 − 3.1^2 = 2.69$.
(4) Compute P(X > 2), P(X < 1) and P(X ≥ 2).
We obtain these values from the table above. For instance,
P(X > 2) = 1 − P(X < 3) = 1 − [0.3 + 0.1] = 0.6. Similarly, P(X < 1) = 0 and
P(X ≥ 2) = 1 − P(X = 1) = 0.7. Exercise 43
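The steps of this solution can be checked numerically. The sketch below solves the two equations by substitution and then evaluates the variance and the requested probabilities; all variable names are ours.

```python
values = [1, 2, 3, 4, 5]

# (i) a + b = 0.4 and (ii) 2a + 5b = 1.7; substituting a = 0.4 - b into (ii)
# gives 3b = 1.7 - 2(0.4).
b = (1.7 - 2 * 0.4) / 3
a = 0.4 - b
probs = [0.3, a, 0.1, 0.2, b]

mean = sum(x * p for x, p in zip(values, probs))
second_moment = sum(x ** 2 * p for x, p in zip(values, probs))
variance = second_moment - mean ** 2

p_gt_2 = sum(p for x, p in zip(values, probs) if x > 2)
p_lt_1 = sum(p for x, p in zip(values, probs) if x < 1)
p_ge_2 = sum(p for x, p in zip(values, probs) if x >= 2)

print(round(a, 2), round(b, 2), round(mean, 2), round(variance, 2))
print(round(p_gt_2, 2), round(p_lt_1, 2), round(p_ge_2, 2))
```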
Exercise 44. This can be solved as follows:
(1) Both of the jobs are finished on time.
Suppose we let F represent the event that a job is finished on time and F′ the event
that it is not finished on time. Then P(FF) = 0.3 × 0.3 = 0.09.
Remember the jobs are independent, so we can multiply the probabilities.
(2) Just one of the jobs is finished on time.
For this case, just one job means: either the first is finished but the second is not,
or the first is not finished but the second is finished.
This can be represented as follows: P(FF′ or F′F) = P(FF′) + P(F′F) =
(0.3 × 0.7) + (0.7 × 0.3) = 0.42 Exercise 44


Exercise 45. This question can be solved using the formula for Harmonic mean;
i.e.
\[ \text{Harmonic mean} = \frac{2ab}{a + b} = \frac{2 \times 80 \times 50}{80 + 50} = 61.54 \text{ km/h} \] Exercise 45
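To see why the harmonic mean is the right average here, we can compare it with total distance divided by total time. The following is a small Python sketch; the variable names are ours, and the 200 km one-way distance comes from the exercise.

```python
import statistics

speed_out, speed_back = 80, 50   # km/h
distance = 200                   # km each way

# Average speed is total distance over total time.
total_time = distance / speed_out + distance / speed_back   # 2.5 h + 4 h
average_speed = 2 * distance / total_time

# The harmonic mean of the two speeds gives the same value.
harmonic = statistics.harmonic_mean([speed_out, speed_back])

print(round(average_speed, 2), round(harmonic, 2))
```

The arithmetic mean, 65 km/h, would overstate the average because more time is spent travelling at the slower speed.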
a+b 80 + 50

168

You might also like