Basic Statistics Lecture Note 2024/2025
CHAPTER 3
3. MEASURES OF CENTRAL TENDENCY
3.1 Introduction
On the scale of values of a variable there is usually a point around which the largest number of
items tends to cluster. Because this point typically lies near the center of the distribution, the
tendency of statistical data to concentrate around it is called "central tendency". The various
measures that determine the value at which the data tend to concentrate are called measures of
central tendency. A measure of central location is thus the single value that best represents the
whole series. This single value is called the average of the group. An average that is representative
is called a typical average, and an average that is not representative and has only theoretical value
is called a descriptive average.
A typical average should possess the following properties:
It should be rigidly defined.
It should be based on all observations under investigation.
It should be affected as little as possible by extreme observations.
It should be capable of further algebraic treatment.
It should be affected as little as possible by fluctuations of sampling.
It should be easy to calculate and simple to understand.
The Summation Notation (∑)
Statistical Symbols: Let X1, X2, X3, …, XN be a set of measurements, where N is the total number
of observations and Xi is the ith observation. Very often in statistics an algebraic expression of the
form X1 + X2 + X3 + ... + XN is used in a formula to compute a statistic. It is tedious to write such an
expression repeatedly, so mathematicians have developed a shorthand notation for a sum of scores,
called the summation notation.
The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \dots + X_N$.
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."
Example 3.1: Suppose the following were scores made on the first homework assignment for five
students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N = 5, the
summation could be written:
$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$$
BY: Habtamu W.(MSc in Biostatistics) Page 1
The "i=1" at the bottom of the summation sign tells where to begin the sequence of summation.
If the expression were written with "i=3", the summation would start with the third number in the
set. For example, with the scores of Example 3.1:
$$\sum_{i=3}^{5} X_i = X_3 + X_4 + X_5 = 7 + 6 + 8 = 21$$
Sometimes, when the summation notation must be written many times, as in a proof, a shorthand
for the shorthand is employed. When the summation sign "∑" is used without additional notation,
"i=1" and "N" are assumed. For example:
$$\sum X = \sum_{i=1}^{N} X_i$$
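Properties of summation
1. $\sum_{i=1}^{n} k = nk$, where k is any constant.
2. $\sum_{i=1}^{n} kX_i = k\sum_{i=1}^{n} X_i$, where k is any constant.
3. $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$, where a and b are any constants.
4. $\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$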
Example 3.2: Considering the following data, determine the quantities listed below.
X    Y
5    6
7    7
7    8
6    7
8    8
a) $\sum_{i=1}^{5} X_i$
b) $\sum_{i=1}^{5} Y_i$
c) $\sum_{i=1}^{5} 10$
d) $\sum_{i=1}^{5} (X_i + Y_i)$
e) $\sum_{i=1}^{5} (X_i - Y_i)$
f) $\sum_{i=1}^{5} X_i Y_i$
g) $\sum_{i=1}^{5} X_i^2$
h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right)$
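Solutions:
a) $\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$
b) $\sum_{i=1}^{5} Y_i = 6 + 7 + 8 + 7 + 8 = 36$
c) $\sum_{i=1}^{5} 10 = 5 \times 10 = 50$
d) $\sum_{i=1}^{5} (X_i + Y_i) = (5 + 6) + (7 + 7) + (7 + 8) + (6 + 7) + (8 + 8) = 69 = 33 + 36$
e) $\sum_{i=1}^{5} (X_i - Y_i) = (5 - 6) + (7 - 7) + (7 - 8) + (6 - 7) + (8 - 8) = -3 = 33 - 36$
f) $\sum_{i=1}^{5} X_i Y_i = 5 \times 6 + 7 \times 7 + 7 \times 8 + 6 \times 7 + 8 \times 8 = 241$
g) $\sum_{i=1}^{5} X_i^2 = 5^2 + 7^2 + 7^2 + 6^2 + 8^2 = 223$
h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right) = 33 \times 36 = 1188$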
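The sums in Example 3.2 can also be checked by machine. The following is a minimal Python sketch, not part of the original notes; the variable names are illustrative only.

```python
# Data from Example 3.2 (variable names are illustrative, not from the notes).
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]
n = len(X)

sum_x = sum(X)                                  # a) sum of X_i
sum_y = sum(Y)                                  # b) sum of Y_i
sum_const = sum(10 for _ in range(n))           # c) the constant 10 added n times
sum_plus = sum(x + y for x, y in zip(X, Y))     # d) sum of (X_i + Y_i)
sum_minus = sum(x - y for x, y in zip(X, Y))    # e) sum of (X_i - Y_i)
sum_xy = sum(x * y for x, y in zip(X, Y))       # f) sum of X_i * Y_i
sum_x_sq = sum(x ** 2 for x in X)               # g) sum of X_i squared
prod_of_sums = sum_x * sum_y                    # h) (sum of X_i) * (sum of Y_i)

print(sum_x, sum_y, sum_const, sum_plus, sum_minus, sum_xy, sum_x_sq, prod_of_sums)
# prints: 33 36 50 69 -3 241 223 1188
```

Note that f) and h) differ: the sum of products is not the product of the sums.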
3.2 Types of Measures of Central Tendency
There are several different measures of central tendency; each has its advantages and disadvantages.
The Mean (Arithmetic, Geometric and Harmonic)
The Mode
The Median
Quantiles (Quartiles, Deciles and Percentiles)
The choice among these averages depends on which best fits the property under discussion.
3.2.1 Arithmetic Mean: It is defined as the sum of the magnitudes of the items divided by the
number of items. The mean of X1, X2, X3, …, Xn is denoted by A.M, m or $\bar{X}$ and is given
by:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
If X1 occurs f1 times, X2 occurs f2 times, …, and Xk occurs fk times, then the mean will be
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i},$$
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.
Example 3.3: Obtain the mean of the following numbers:
2, 7, 8, 2, 7, 3, 7
Solution:
Xi       fi    Xi fi
2        2     4
3        1     3
7        3     21
8        1     8
Total    7     36
$$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$$
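As an illustration of the frequency form of the mean, the sketch below builds the frequency table of Example 3.3 and applies the formula. It is only a sketch: the function name `frequency_mean` is hypothetical, and `collections.Counter` is used simply as a convenient way to count repeated values.

```python
from collections import Counter

def frequency_mean(values):
    """Mean computed as sum(f_i * X_i) / sum(f_i) from a frequency table."""
    freq = Counter(values)                         # maps each X_i to its frequency f_i
    total = sum(x * f for x, f in freq.items())    # sum of f_i * X_i
    n = sum(freq.values())                         # sum of f_i
    return total / n

data = [2, 7, 8, 2, 7, 3, 7]
print(round(frequency_mean(data), 2))  # 36 / 7, i.e. 5.14 to two decimals
```

The result is identical to adding the seven raw values and dividing by 7, since the frequency table is just a compact way of writing the same sum.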
Arithmetic Mean for Grouped Data
If data are given in the form of a continuous frequency distribution, then the mean is obtained as
follows:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i},$$
where Xi = the class mark of the ith class and fi = the frequency of the ith class.
Example 3.4: Calculate the mean for the following age distribution.
Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
Solutions:
First find the class marks.
Find the product of frequency and class marks.
Find the mean using the formula.
Class     fi     Xi    Xifi
6- 10     35     8     280
11- 15    23     13    299
16- 20    15     18    270
21- 25    12     23    276
26- 30    9      28    252
31- 35    6      33    198
Total     100          1575
$$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2$ for any $A \neq \bar{X}$.
3. If $\bar{X}_1$ is the mean of $n_1$ observations, $\bar{X}_2$ is the mean of $n_2$ observations, …, and
$\bar{X}_k$ is the mean of $n_k$ observations, then the mean of all the observations in all groups, often
called the combined mean, is given by:
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \dots + \bar{X}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$$
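Example 3.5: In a class there are 30 females and 70 males. If the females averaged 60 in an
examination and the males averaged 72, find the mean for the entire class.
Solutions:
Females: $\bar{X}_1 = 60$, $n_1 = 30$
Males: $\bar{X}_2 = 72$, $n_2 = 70$
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i}$$
$$\bar{X}_c = \frac{30(60) + 70(72)}{30 + 70} = \frac{6840}{100} = 68.40$$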
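The combined-mean calculation of Example 3.5 can be sketched in a few lines of Python. The helper name `combined_mean` is hypothetical and is not taken from the notes.

```python
def combined_mean(means, sizes):
    """Combined mean: sum(mean_i * n_i) / sum(n_i)."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Females: mean 60, n = 30; males: mean 72, n = 70 (Example 3.5).
print(combined_mean([60, 72], [30, 70]))  # 68.4
```

The combined mean lies closer to 72 than to 60 because the larger group carries more weight; a simple average of the two group means, (60 + 72) / 2 = 66, would be wrong here.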
4. If a wrong figure has been used when calculating the mean, the correct mean can be obtained
without repeating the whole process using:
$$\text{Correct Mean} = \text{Wrong Mean} + \frac{\text{Correct Value} - \text{Wrong Value}}{n},$$
where n is the total number of observations.
Example 3.6: The average weight of 10 students was calculated to be 65 kg. Later it was discovered
that one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.
Solutions:
$$\text{Correct Mean} = \text{Wrong Mean} + \frac{\text{Correct Value} - \text{Wrong Value}}{n}$$
$$\text{Correct Mean} = 65 + \frac{80 - 40}{10} = 65 + 4 = 69 \text{ kg}$$
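The correction formula of property 4 can be sketched as a small Python function applied to the figures of Example 3.6. The function name and argument names are illustrative assumptions.

```python
def corrected_mean(wrong_mean, wrong_value, correct_value, n):
    """Correct a mean computed with one misrecorded observation."""
    return wrong_mean + (correct_value - wrong_value) / n

# Example 3.6: mean 65 kg from n = 10, with 80 kg misread as 40 kg.
print(corrected_mean(65, 40, 80, 10))  # 69.0
```

The formula works because replacing one value changes the total by (correct value - wrong value), and the mean is the total divided by n.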
5. The effect of transforming the original series on the mean.
a) If a constant k is added to (or subtracted from) every observation, then the new mean will
be the old mean plus (or minus) k.
b) If every observation is multiplied by a constant k, then the new mean will be k times the old
mean.
Example 3.7:
1. The mean of n Tetracycline capsules X1, X2, …, Xn is known to be 12 gm. A new set of
capsules of another drug is obtained by the linear transformation $Y_i = 2X_i - 0.5$ (i = 1,
2, …, n). What will be the mean of the new set of capsules?
Solutions:
$$\text{New Mean} = 2 \times \text{Old Mean} - 0.5 = 2 \times 12 - 0.5 = 23.5$$
2. The mean of a set of numbers is 500.
a) If 10 is added to each of the numbers in the set, then what will be the mean of the new
set?
b) If each of the numbers in the set is multiplied by -5, then what will be the mean of the
new set?
Solutions:
a) $\text{New Mean} = \text{Old Mean} + 10 = 500 + 10 = 510$
b) $\text{New Mean} = -5 \times \text{Old Mean} = -5 \times 500 = -2500$
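The effect of a linear transformation on the mean can be checked numerically. In the sketch below the three X values are hypothetical, chosen only so that their mean is 12 as in Example 3.7; the notes do not give the individual capsule weights.

```python
from statistics import mean

x = [10, 12, 14]                  # hypothetical values whose mean is 12
y = [2 * xi - 0.5 for xi in x]    # linear transformation Y_i = 2 * X_i - 0.5

direct = mean(y)                  # mean computed from the transformed values
shortcut = 2 * mean(x) - 0.5      # mean obtained from the old mean alone

print(direct, shortcut)           # both are 23.5
```

Any other data set with mean 12 would give the same new mean, which is why the individual values are not needed to answer the example.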
Weighted Arithmetic Mean
While calculating the simple arithmetic mean, all items were assumed to be of equal importance (each
value in the data set has equal weight). When the observations have different weights, we use the
weighted average. Weights are assigned to each item in proportion to its relative importance.
If $x_1, x_2, \dots, x_n$ represent the values of the items and $w_1, w_2, \dots, w_n$ are the corresponding
weights, then the weighted mean ($\bar{x}_w$) is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 3.8:
A student's final marks in Mathematics, Physics, Chemistry and Biology are respectively A, B, D
and C. If the respective credits received for these courses are 4, 4, 3 and 2, determine the
approximate average mark the student has got for the courses.
Solution
We use a weighted arithmetic mean, the weight associated with each course being the number
of credits received for that course. The letter grades are scored as A = 4, B = 3, C = 2 and D = 1.
Course              Mathematics   Physics   Chemistry   Biology   Total
Grade point (xi)    4             3         1           2
Credit (wi)         4             4         3           2         13
wi xi               16            12        3           4         35
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i} = \frac{16 + 12 + 3 + 4}{4 + 4 + 3 + 2} = \frac{35}{13} \approx 2.69$$
The average mark of the student is approximately 2.69.
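A short Python sketch of the weighted mean, applied to the grade points and credits of Example 3.8, is given here for illustration. The function name `weighted_mean` is an assumption, not something defined in the notes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 1, 2]   # A, B, D, C scored as 4, 3, 1, 2
credits = [4, 4, 3, 2]        # credit hours used as weights

print(round(weighted_mean(grade_points, credits), 2))  # 35 / 13, i.e. 2.69
```

With equal weights the function reduces to the simple arithmetic mean, which for these four grade points would be 2.5.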
Merits and Demerits of Arithmetic Mean
Merits:
It is based on all observations.
It is suitable for further mathematical treatment.
It is a stable average, i.e. it is affected relatively little by fluctuations of sampling.
It is easy to calculate and simple to understand.
Demerits:
It is affected by extreme observations.
It cannot be used in the case of open-ended classes.
It cannot be determined by the method of inspection.
It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty,
beauty.
3.2.2 Geometric Mean
The geometric mean, like the arithmetic mean, is a calculated average. It is used when observed
values are measured as ratios, percentages, proportions, indices or growth rates.
The geometric mean, G.M., of a set of n observations $x_1, x_2, \dots, x_n$ is defined as the nth root of
their product.
$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$$
Taking the logarithms of both sides:
$$\log(G.M) = \log\left(\sqrt[n]{x_1 \cdot x_2 \cdots x_n}\right) = \log\left((x_1 \cdot x_2 \cdots x_n)^{1/n}\right)$$
$$\log(G.M) = \frac{1}{n}\log(x_1 \cdot x_2 \cdots x_n) = \frac{1}{n}\left(\log x_1 + \log x_2 + \dots + \log x_n\right)$$
$$\log(G.M) = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$
The logarithm of the G.M of a set of observations is the arithmetic mean of their logarithms.
$$G.M = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$$
Example 3.9:
Find the G.M of the numbers 2, 4, 8.
Solutions:
$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$
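Geometric mean for discrete data arranged in a frequency distribution: When the numbers
$x_1, x_2, \dots, x_m$ occur with frequencies $f_1, f_2, \dots, f_m$ respectively, the geometric mean is obtained by
$$G.M = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}} = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{m} f_i \log x_i\right), \quad \text{where } n = \sum_{i=1}^{m} f_i$$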
Example 3.9
Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values 3 4 5 6
Frequency 2 3 1 2
$$G.M = \sqrt[8]{3^2 \cdot 4^3 \cdot 5^1 \cdot 6^2} = \sqrt[8]{103680} \approx 4.236$$
The geometric mean for the given data is 4.236.
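The logarithmic form of the geometric mean translates directly into code. The sketch below is illustrative only: the function name `geometric_mean` is an assumption, and natural logarithms with `math.exp` are used in place of base-10 logarithms and antilog tables, which gives the same result.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean as exp((1/n) * sum(f_i * ln x_i)); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)       # ungrouped data: every value counted once
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)

print(round(geometric_mean([2, 4, 8]), 3))                    # 4.0
print(round(geometric_mean([3, 4, 5, 6], [2, 3, 1, 2]), 3))   # 4.236
```

Working with the sum of logarithms avoids forming the full product, which can become very large for long data sets.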
Geometric mean for grouped data: The above formula can also be used when the frequency
distribution is grouped and continuous; the class marks of the class intervals are then taken as xi.
Properties of geometric mean
It is less affected by extreme values.
It takes each and every observation into consideration.
If the value of any one observation is zero, the geometric mean becomes zero.
3.2.3 Harmonic Mean
It is a suitable measure of central tendency when the data pertain to speed, rate and time. The
harmonic mean of n values is defined as n divided by the sum of their reciprocals.
Harmonic mean for individual series: If $x_1, x_2, \dots, x_n$ are n observations, then the harmonic mean
can be represented by the following formula:
$$H.M = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dots + \dfrac{1}{x_n}}$$
Example 3.10: A cyclist pedals from his house to his college at a speed of 10 km/hr and back from
the college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
X1 = 10 km/hr, X2 = 15 km/hr
$$H.M = \frac{2}{\dfrac{1}{10} + \dfrac{1}{15}} = 12 \text{ km/hr}$$
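To see why the harmonic mean is the right average in Example 3.10, the sketch below computes it and compares it with total distance divided by total time. The one-way distance of 30 km is a hypothetical figure chosen for illustration; the notes do not state the distance, and any positive value gives the same answer.

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)

hm = harmonic_mean([10, 15])
print(round(hm, 2))                     # 12.0

d = 30                                  # hypothetical one-way distance in km
total_time = d / 10 + d / 15            # hours going plus hours returning
average_speed = 2 * d / total_time      # total distance / total time
print(average_speed)                    # 12.0
```

The arithmetic mean of the two speeds, 12.5 km/hr, overstates the average speed because the cyclist spends more time travelling at the slower speed.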
Harmonic mean for discrete data: If the data are arranged in the form of a frequency distribution,
$$H.M = \frac{n}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \dots + \dfrac{f_m}{x_m}}, \quad \text{where } n = \sum_{k=1}^{m} f_k$$
Harmonic mean for continuous grouped data: When the frequency distribution is grouped and
continuous, the class marks of the class intervals are taken as $x_k$ and the above formula can be
used as
$$H.M = \frac{n}{\sum_{k=1}^{m} \dfrac{f_k}{x_k}}, \quad \text{where } n = \sum_{k=1}^{m} f_k$$
and $x_k$ is the class mark of the kth class.
Properties of harmonic mean
It is unique for a given set of data.
It takes each and every observation into consideration.
Difficult to calculate and understand.
Appropriate measure of central tendency in situations where data is in ratio, speed or rate.
3.2.4 The Mode or Modal Value
The mode or the modal value is the value with the highest frequency and is denoted by $\hat{X}$. The
mode may not exist, and even if it does exist, it may not be unique. A distribution is called a bimodal
distribution if it has two data values that appear with the greatest frequency. If a distribution has
more than two modes, then the distribution is multimodal. If a distribution has no modes, then the
distribution is non-modal.
Mode for ungrouped data: In case of discrete distribution the value having the maximum
frequency is the modal value.
Examples 3.11:
1. Find the mode of 5, 3, 5, 8, 9.
The mode is $\hat{X} = 5$.
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: $\hat{X} = 8$ and $\hat{X} = 9$.
3. Find the mode of 4, 12, 3, 6, and 7.
There is no mode for this data.
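The three cases of Examples 3.11 can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and the code adopts the convention used in these examples that a non-empty data set in which every value occurs exactly once has no mode.

```python
from collections import Counter

def modes(values):
    """Return all modal values in sorted order, or [] if every value occurs once."""
    counts = Counter(values)             # value -> frequency
    highest = max(counts.values())       # assumes values is not empty
    if highest == 1:
        return []                        # no value repeats: no mode
    return sorted(x for x, c in counts.items() if c == highest)

print(modes([5, 3, 5, 8, 9]))          # [5]
print(modes([8, 9, 9, 7, 8, 2, 5]))    # [8, 9]
print(modes([4, 12, 3, 6, 7]))         # []
```

A result with one element is a unique mode, two elements indicate a bimodal data set, and more than two a multimodal one.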
Mode for Grouped Data
If data are given in the form of a continuous frequency distribution, the mode is defined as:
$$\hat{X} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w$$
Where:
$\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
Note: The modal class is the class with the highest frequency.
Example 3.12: Following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.
Size of farms No. of farms
5-15 8
15-25 12
25-35 17
35-45 29
45-55 31
55-65 5
65-75 3
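Solutions:
45 - 55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, $w = 10$
$f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$
$$\hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71$$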
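The grouped-mode calculation of Example 3.12 can be sketched in Python as follows. The function name is hypothetical, and two assumptions are made that the notes do not state: all classes have the same width, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode of grouped data: L_mo + d1 / (d1 + d2) * w."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])    # index of the modal class
    f_mo = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                     # class before (assumed 0 if none)
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0        # class after (assumed 0 if none)
    d1, d2 = f_mo - f1, f_mo - f2
    return lower_boundaries[m] + d1 / (d1 + d2) * width

lowers = [5, 15, 25, 35, 45, 55, 65]     # lower boundaries of the farm-size classes
freqs = [8, 12, 17, 29, 31, 5, 3]        # number of farms in each class

print(round(grouped_mode(lowers, 10, freqs), 2))   # 45.71
```

The mode falls near the lower end of the 45 - 55 class because the preceding class (29 farms) is far more populated than the following one (5 farms).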
Merits and Demerits of Mode
Merits:
It is not affected by extreme observations.
It is easy to calculate and simple to understand.
It can be calculated for distributions with open-ended classes.
Demerits:
It is not rigidly defined.
It is not based on all observations.
It is not suitable for further mathematical treatment.
It is not a stable average, i.e. it is affected by fluctuations of sampling to
some extent.
Often its value is not unique.
Note: Being the point of maximum density, the mode is especially useful in finding the most popular
size in studies relating to marketing, trade, business, and industry. It is the appropriate average to
use to find the ideal size.
3.2.5 Median
The median is, as its name indicates, the middlemost value in the arrangement, which divides the
data into two equal parts. It is obtained by arranging the data in increasing or decreasing order of
magnitude. If X1, X2, …, Xn are the observations, then the numbers arranged in ascending order will
be X[1], X[2], …, X[n], where X[i] is the ith smallest value (i.e. X[1] ≤ X[2] ≤ … ≤ X[n]).
The median is denoted by $\tilde{X}$.
Median for ungrouped data: We arrange the sample in ascending order of the variable of interest.
Then, if the sample size n is odd, the median is the middle value; if the sample size n is even, the
median is the average of the two middle values.
The median is obtained by
$$\tilde{X} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\ \dfrac{1}{2}\left(X_{[n/2]} + X_{[(n/2)+1]}\right), & \text{if } n \text{ is even} \end{cases}$$
Example: Find the median of the following numbers.
a) 6, 5, 2, 8, 9, 4
b) 2, 1, 8, 3, 5
Solutions:
a) First order the data: 2, 4, 5, 6, 8, 9. Here n = 6, which is even, so
$$\tilde{X} = \frac{1}{2}\left(X_{[n/2]} + X_{[(n/2)+1]}\right) = \frac{1}{2}\left(X_{[3]} + X_{[4]}\right) = \frac{1}{2}(5 + 6) = 5.5$$
b) Order the data: 1, 2, 3, 5, 8. Here n = 5, which is odd, so the middle value is the 3rd
observation. So the median is 3.
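The odd and even cases of the median formula can be written as a short Python function. This is a minimal sketch; the function name is illustrative, and Python's zero-based indexing means the position (n + 1) / 2 in the formula corresponds to index n // 2 in the sorted list.

```python
def median(values):
    """Median of ungrouped data: middle value (n odd) or mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1) / 2)-th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the (n/2)-th and (n/2 + 1)-th values

print(median([6, 5, 2, 8, 9, 4]))   # 5.5
print(median([2, 1, 8, 3, 5]))      # 3
```

Sorting is done on a copy, so the original order of the data is left unchanged.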
Median for grouped data: If data are given in the form of a continuous frequency distribution, the
median is defined as:
$$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$
Where:
$L_{med}$ = lower class boundary of the median class
$w$ = the size of the median class
$n$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the median class
$f_{med}$ = the frequency of the median class
Remark:
The median class is the class with the smallest cumulative frequency (less than type) greater than
or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
First find the less than cumulative frequency.
Identify the median class.
Find the median using the formula.
Class    Frequency    Cumulative frequency (less than type)
40-44    7            7
45-49    10           17
50-54    22           39
55-59    15           54
60-64    12           66
65-69    6            72
70-74    3            75
$$\frac{n}{2} = \frac{75}{2} = 37.5$$
39 is the first cumulative frequency to be greater than or equal to 37.5, so
50 - 54 is the median class.
$L_{med} = 49.5$, $w = 5$, $n = 75$, $c = 17$, $f_{med} = 22$
$$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$$
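The steps of the grouped-median solution (cumulate, locate the median class, interpolate) can be sketched in Python. The function name is hypothetical, equal class widths are assumed, and the lower class boundaries are written out explicitly as 39.5, 44.5, and so on.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    """Median of grouped data: L_med + (w / f_med) * (n / 2 - c)."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                  # less-than cumulative frequencies
    # Median class: first class whose cumulative frequency is >= n / 2.
    k = next(i for i, cf in enumerate(cum) if cf >= n / 2)
    c = cum[k - 1] if k > 0 else 0                 # cumulative frequency before the median class
    return lower_boundaries[k] + width / freqs[k] * (n / 2 - c)

lowers = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]
freqs = [7, 10, 22, 15, 12, 6, 3]

print(round(grouped_median(lowers, 5, freqs), 2))  # 54.16
```

The interpolation assumes that observations are spread evenly within the median class, which is the same assumption behind the formula in the notes.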
Merits and Demerits of Median
Merits:
Median is a positional average and hence not influenced by extreme observations.
Can be calculated in the case of open end intervals.
Median can be located even if the data are incomplete.
Demerits:
It is not a good representative of data if the number of items is small.
It is not amenable to further algebraic treatment.
It is susceptible to sampling fluctuations.
Remark: In a symmetrical distribution the mean, median and mode coincide, that is,
mean = median = mode. For a moderately asymmetrical (non-symmetrical) distribution, however,
the mean and the mode lie at the two ends with the median between them, and the three satisfy the
following important empirical relationship: (Mean - Mode) = 3(Mean - Median).
3.2.6 Quantiles
When a distribution is arranged in order of magnitude of items, the median is the value of the
middle term. Other measures that likewise depend on position in the distribution, namely quartiles,
deciles, and percentiles, are collectively called quantiles.
Quartiles
Quartiles are measures that divide the frequency distribution into four equal parts. The values of the
variable corresponding to these divisions are denoted Q1, Q2, and Q3, often called the first, the
second and the third quartile respectively. Q1 is a value which has 25% of the items less than or
equal to it. Similarly, Q2 has 50% of the items with values less than or equal to it and Q3 has 75% of
the items with values less than or equal to it.
To find Qi (i = 1, 2, 3) we count off $\frac{iN}{4}$ observations, beginning from the lowest class.
For grouped data we have the following formula:
$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{iN}{4} - c\right), \quad i = 1, 2, 3$$
Where:
$L_{Q_i}$ = lower class boundary of the quartile class
$w$ = the size of the quartile class
$N$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the quartile class
$f_{Q_i}$ = the frequency of the quartile class
Remark:
The quartile class (the class containing Qi) is the class with the smallest cumulative frequency (less
than type) greater than or equal to $\frac{iN}{4}$.
1) Deciles: Deciles are measures that divide the frequency distribution into ten equal parts.
The values of the variable corresponding to these divisions are denoted D1, D2, …, D9, often called
the first, the second, …, the ninth decile respectively.
To find Di (i = 1, 2, …, 9) we count off $\frac{iN}{10}$ observations, beginning from the lowest class.
For grouped data we have the following formula:
$$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{iN}{10} - c\right), \quad i = 1, 2, \dots, 9$$
Where:
$L_{D_i}$ = lower class boundary of the decile class
$w$ = the size of the decile class
$N$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the decile class
$f_{D_i}$ = the frequency of the decile class
Remark:
The decile class (the class containing Di) is the class with the smallest cumulative frequency (less
than type) greater than or equal to $\frac{iN}{10}$.
2) Percentiles: Percentiles are measures that divide the frequency distribution into one hundred
equal parts. The values of the variable corresponding to these divisions are denoted P1, P2, …, P99,
often called the first, the second, …, the ninety-ninth percentile respectively.
To find Pi (i = 1, 2, …, 99) we count off $\frac{iN}{100}$ observations, beginning from the lowest class.
For grouped data we have the following formula:
$$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{iN}{100} - c\right), \quad i = 1, 2, \dots, 99$$
Where:
$L_{P_i}$ = lower class boundary of the percentile class
$w$ = the size of the percentile class
$N$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the percentile class
$f_{P_i}$ = the frequency of the percentile class
Remark:
The percentile class (the class containing Pi) is the class with the smallest cumulative frequency
(less than type) greater than or equal to $\frac{iN}{100}$.
Example: Considering the following distribution
Calculate:
a) All quartiles.
b) The 7th decile.
c) The 90th percentile.
Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solutions:
First find the less than cumulative frequency.
Use the formula to calculate the required quantile.
Values Frequency Cumulative frequency (less than type)
140- 150 17 17
150- 160 29 46
160- 170 42 88
170- 180 72 160
180- 190 84 244
190- 200 107 351
200- 210 49 400
210- 220 34 434
220- 230 31 465
230- 240 16 481
240- 250 12 493
a) Quartiles:
i. Q1
- Determine the class containing the first quartile.
$$\frac{N}{4} = \frac{493}{4} = 123.25$$
170 - 180 is the class containing the first quartile.
$L_{Q_1} = 170$, $w = 10$, $N = 493$, $c = 88$, $f_{Q_1} = 72$
$$Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{N}{4} - c\right) = 170 + \frac{10}{72}(123.25 - 88) = 174.90$$
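ii. Q2
- Determine the class containing the second quartile.
$$\frac{2N}{4} = \frac{2 \times 493}{4} = 246.5$$
190 - 200 is the class containing the second quartile.
$L_{Q_2} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{Q_2} = 107$
$$Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left(\frac{2N}{4} - c\right) = 190 + \frac{10}{107}(246.5 - 244) = 190.23$$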
iii. Q3
- Determine the class containing the third quartile.
$$\frac{3N}{4} = \frac{3 \times 493}{4} = 369.75$$
200 - 210 is the class containing the third quartile.
$L_{Q_3} = 200$, $w = 10$, $N = 493$, $c = 351$, $f_{Q_3} = 49$
$$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3N}{4} - c\right) = 200 + \frac{10}{49}(369.75 - 351) = 203.83$$
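b) D7
- Determine the class containing the 7th decile.
$$\frac{7N}{10} = \frac{7 \times 493}{10} = 345.1$$
190 - 200 is the class containing the seventh decile.
$L_{D_7} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{D_7} = 107$
$$D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7N}{10} - c\right) = 190 + \frac{10}{107}(345.1 - 244) = 199.45$$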
c) P90
- Determine the class containing the 90th percentile.
$$\frac{90N}{100} = \frac{90 \times 493}{100} = 443.7$$
220 - 230 is the class containing the 90th percentile.
$L_{P_{90}} = 220$, $w = 10$, $N = 493$, $c = 434$, $f_{P_{90}} = 31$
$$P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90N}{100} - c\right) = 220 + \frac{10}{31}(443.7 - 434) = 223.13$$
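Because the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), a single function can cover all three. The sketch below is illustrative: the function name `grouped_quantile` and its parameters `i` and `parts` are assumptions, and equal class widths are assumed as in the example.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """i-th quantile of grouped data when the distribution is divided into `parts` equal parts."""
    N = sum(freqs)
    target = i * N / parts                         # iN/4, iN/10 or iN/100
    cum = list(accumulate(freqs))                  # less-than cumulative frequencies
    # Quantile class: first class whose cumulative frequency is >= target.
    k = next(j for j, cf in enumerate(cum) if cf >= target)
    c = cum[k - 1] if k > 0 else 0                 # cumulative frequency before that class
    return lower_boundaries[k] + width / freqs[k] * (target - c)

lowers = list(range(140, 250, 10))                 # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

for label, i, parts in [("Q1", 1, 4), ("Q2", 2, 4), ("Q3", 3, 4),
                        ("D7", 7, 10), ("P90", 90, 100)]:
    print(label, round(grouped_quantile(lowers, 10, freqs, i, parts), 2))
```

Running the loop reproduces the five values worked out by hand above, to two decimal places. Calling the function with `i=1, parts=2` gives the grouped median, which equals Q2.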