Unit 2
Arithmetic mean, geometric mean, harmonic mean, median and mode are the
popular measures of central tendency. The arithmetic mean, geometric mean and
harmonic mean are known as mathematical averages while median and mode are
positional averages.
i) It should be rigidly defined, based on all the observations and easy to calculate
and understand.
Manual on “Introduction to Statistical Methods”
Arithmetic Mean:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
If the observations x1, x2, ....., xn form a discrete frequency distribution with
respective frequencies f1, f2, ....., fn, then the arithmetic mean is given by:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{N} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad \text{where } N = \sum_{i=1}^{n} f_i$$
For grouped frequency distribution, mid values are determined for the various
class intervals and then arithmetic mean is computed as in case of a discrete frequency
distribution.
Sum of deviations of the given values from their arithmetic mean is always zero.
Sum of the squares of deviations of the given values from their arithmetic mean is
always minimum.
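The two properties above can be checked numerically. The following is a minimal Python sketch on a small illustrative data set that is not taken from the manual; the helper names `arithmetic_mean`, `frequency_mean` and `sum_of_squares_about` are hypothetical and introduced only for this illustration.

```python
def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(f*x) / N."""
    n_total = sum(freqs)  # N = sum of the frequencies
    return sum(f * x for x, f in zip(values, freqs)) / n_total


def sum_of_squares_about(values, point):
    """Sum of squared deviations of the values from an arbitrary point."""
    return sum((x - point) ** 2 for x in values)


data = [2, 4, 4, 5, 10]  # illustrative observations (assumed)
mean = arithmetic_mean(data)
deviation_sum = sum(x - mean for x in data)

print(mean)                                          # 5.0
print(deviation_sum)                                 # 0.0
print(frequency_mean([2, 4, 5, 10], [1, 2, 1, 1]))   # 5.0
# The sum of squares about the mean is smaller than about any other point.
print(sum_of_squares_about(data, mean) < sum_of_squares_about(data, 4.5))  # True
```

The same data are passed once as raw observations and once as a frequency distribution, so both formulas can be seen to give the same mean.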
Example: The weights of 9 balls of cotton are 89, 90, 102, 107, 108, 115, 117, 119 and
126 gm. Find the arithmetic mean by the direct and short-cut methods.
Solution:
Direct Method:
$$\text{AM} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{89 + 90 + 102 + 107 + 108 + 115 + 117 + 119 + 126}{9} = 108.11 \text{ gm}$$
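Short-cut Method:
$$\text{AM} = A + \frac{\sum_{i=1}^{n} d_i}{n}$$
Where, n = number of observations; A = assumed mean; di (deviation) = xi - A.
Taking A = 108:

xi       di = xi - A
89       89 - 108 = -19
90       90 - 108 = -18
102      102 - 108 = -6
107      107 - 108 = -1
108      108 - 108 = 0
115      115 - 108 = 7
117      117 - 108 = 9
119      119 - 108 = 11
126      126 - 108 = 18
Σxi = 973     Σdi = 1

$$\text{AM} = 108 + \frac{\sum_{i=1}^{n} d_i}{n} = 108 + \frac{1}{9} = 108.11 \text{ gm}$$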
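The agreement between the direct and short-cut methods can be verified with a short Python sketch. It uses the nine cotton weights from the example; the function name `shortcut_mean` is a hypothetical helper, not something defined in the manual.

```python
def shortcut_mean(values, assumed_mean):
    """Arithmetic mean by the short-cut method: A + sum(d_i) / n."""
    deviations = [x - assumed_mean for x in values]  # d_i = x_i - A
    return assumed_mean + sum(deviations) / len(values)


weights = [89, 90, 102, 107, 108, 115, 117, 119, 126]  # gm, from the example
direct = sum(weights) / len(weights)
shortcut = shortcut_mean(weights, 108)

print(round(direct, 2), round(shortcut, 2))  # 108.11 108.11
```

Any other assumed mean, for example 100, gives the same result; a value close to the centre of the data only keeps the deviations small for hand calculation.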
81, 94, 64, 80, 75, 69, 96, 66, 80, 91, 85, and 79
Solution:
$$\text{AM} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{81 + 94 + 64 + 80 + 75 + 69 + 96 + 66 + 80 + 91 + 85 + 79}{12} = \frac{960}{12} = 80$$
Taking the assumed mean A = 75:

xi       di = xi - A
81       81 - 75 = 6
94       94 - 75 = 19
64       64 - 75 = -11
80       80 - 75 = 5
75       75 - 75 = 0
69       69 - 75 = -6
96       96 - 75 = 21
66       66 - 75 = -9
80       80 - 75 = 5
91       91 - 75 = 16
85       85 - 75 = 10
79       79 - 75 = 4
Σxi = 960     Σdi = 60
$$\text{AM} = A + \frac{\sum_{i=1}^{n} d_i}{n} = 75 + \frac{60}{12} = 75 + 5 = 80$$
Example: The following data, grouped into 10 classes, are the heights (cm) of 405
soybean plants collected from a particular plot. Find the arithmetic mean plant height
by the direct and step deviation methods:
x: 8-12 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
f: 6 17 25 86 125 77 55 9 4 1
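Solution:
Class     fi     Mid value (xi)   fi xi    di = (xi - 30)/5   fi di
8-12      6      10               60       -4                 -24
13-17     17     15               255      -3                 -51
18-22     25     20               500      -2                 -50
23-27     86     25               2150     -1                 -86
28-32     125    30               3750     0                  0
33-37     77     35               2695     1                  77
38-42     55     40               2200     2                  110
43-47     9      45               405      3                  27
48-52     4      50               200      4                  16
53-57     1      55               55       5                  5
Total     405                     12270                       24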
Direct Method:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{N} = \frac{1}{N}\sum f_i x_i = \frac{12270}{405} = 30.3$$
Step Deviation Method:
Here, $d_i = \frac{x_i - A}{h}$, where A = 30 and size of class (h) = 5.
$$\text{AM} = A + \frac{1}{N}\sum f_i d_i \times h = 30 + \frac{24}{405} \times 5 = 30.3 \text{ cm}$$
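The grouped-data calculation can be reproduced in a few lines of Python. This is a sketch that uses the class limits and frequencies of the soybean example; the variable names are illustrative choices, and the mid value of each class is taken as the average of its two limits.

```python
classes = [(8, 12), (13, 17), (18, 22), (23, 27), (28, 32),
           (33, 37), (38, 42), (43, 47), (48, 52), (53, 57)]
freqs = [6, 17, 25, 86, 125, 77, 55, 9, 4, 1]

mids = [(low + high) / 2 for low, high in classes]  # mid value of each class
n_total = sum(freqs)                                # N = 405

# Direct method: sum(f * x) / N
direct_mean = sum(f * x for x, f in zip(mids, freqs)) / n_total

# Step deviation method: A + h * sum(f * d) / N with d = (x - A) / h
A, h = 30, 5
step_devs = [(x - A) / h for x in mids]
step_mean = A + h * sum(f * d for d, f in zip(step_devs, freqs)) / n_total

print(round(direct_mean, 1), round(step_mean, 1))  # 30.3 30.3
```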
groups, based on n1, n2, ....., nk observations respectively, then the combined mean of all
(n1 + n2 + ..... + nk) observations of the k groups taken together is:
$$\bar{x}_p = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k}$$
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_k x_k}{\sum w_i} = \frac{\sum w_i x_i}{\sum w_i}$$
Example: Find the overall grade point average (OGPA) in a semester from the following:
$$\text{OGPA} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{62.9}{9} = 6.99$$
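A weighted mean and a combined mean are the same computation with different weights, which the following Python sketch illustrates. The grade points, credit hours and group figures are made-up illustrative values, not the data of the OGPA example, and the function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def combined_mean(group_means, group_sizes):
    """Combined mean of k groups: the group means weighted by the group sizes."""
    return weighted_mean(group_means, group_sizes)


grade_points = [7.0, 6.5, 8.0]  # illustrative grade points (assumed)
credits = [3, 2, 4]             # illustrative credit hours (assumed)
ogpa = weighted_mean(grade_points, credits)

print(round(ogpa, 2))                      # 7.33
print(combined_mean([10, 20], [30, 10]))   # 12.5
```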
Median: The median of a distribution is that value which divides it into two equal parts.
Median is a positional average and is a suitable measure of central tendency when
individuals are ranked on the basis of some qualitative characteristic such as
intelligence, poverty etc., which cannot be measured numerically. It is also used when
open-ended classes are given at one or both ends, or when the data arise from a skewed
distribution such as an income distribution.
$$M_d = L + \frac{h}{f}\left(\frac{N}{2} - C\right), \quad \text{where } N = \sum f$$
Demerits: Median is not based on all the observations. It is not suitable for further
treatment and is more sensitive to the fluctuations of sampling.
Partition Values or Quantiles: Sometimes we need not only the mean or the middle value
but also the values which divide the data into four, ten or a hundred equal parts.
Quartiles: Are the three values which divide data into four equal parts. These are
denoted by Q1, Q2, Q3
Q1 is the value such that 25 % of the data fall below Q1 and 75% above Q1.
Q2 is the value such that 50 % of the data fall below Q2 and 50% above Q2.
Q3 is the value such that 75 % of the data fall below Q3 and 25% above Q3.
$$Q_i = L + \frac{iN/4 - C}{f} \times h; \quad i = 1, 2, 3$$
where L, C and f are lower limit, C.F. of preceding class and frequency of the class
containing that quartile. Q1 and Q3 are called lower and upper quartiles respectively and
Q2 is equal to the median.
Deciles: Are the nine values which divide the data into 10 equal parts and deciles are
denoted by D1, D2, ....., D9; where
$$D_i = L + \frac{iN/10 - C}{f} \times h; \quad i = 1, 2, \dots, 9$$
D1 is the value such that 10 % of the data fall below D1 and 90% above D1.
D2 is the value such that 20 % of the data fall below D2 and 80% above D2
D9 is the value such that 90 % of the data fall below D9 and 10% above D9
Percentiles: Are the ninety nine values which divide the data into 100 equal parts and are
denoted by P1, P2, ....., P99; where
$$P_i = L + \frac{iN/100 - C}{f} \times h; \quad i = 1, 2, \dots, 99$$
Percentiles help to find the cut-off values when the data are to be divided into a number of
categories and the per cent of observations in the various categories is given.
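The median, quartile, decile and percentile formulas all have the form L + (iN/k - C)/f × h, so a single routine can serve for every partition value. The following Python sketch assumes continuous classes of equal width and uses an illustrative frequency distribution that is not from the manual; the function name `grouped_quantile` and its parameters are hypothetical.

```python
def grouped_quantile(lower_limits, freqs, width, i, parts):
    """i-th partition value when the data are cut into `parts` equal parts.

    Implements L + (i*N/parts - C) / f * h, where L, f and C are the lower
    limit, the frequency and the preceding cumulative frequency of the
    class containing the partition value.
    """
    n_total = sum(freqs)
    target = i * n_total / parts   # position i*N/parts
    cumulative = 0                 # C: cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("i must lie between 0 and parts")


# Illustrative continuous classes 0-10, 10-20, ..., 40-50 (assumed data)
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 15, 20, 7, 3]

median = grouped_quantile(lower_limits, freqs, 10, 1, 2)
q1 = grouped_quantile(lower_limits, freqs, 10, 1, 4)
d4 = grouped_quantile(lower_limits, freqs, 10, 4, 10)
p55 = grouped_quantile(lower_limits, freqs, 10, 55, 100)

print(median, q1, d4, p55)  # 22.5 15.0 20.0 23.75
```

When classes are written with inclusive limits such as 8-12, 13-17, the lower class boundaries (7.5, 12.5, ...) would be passed as `lower_limits` instead.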
n = 5 (odd number),
Median = value of the $\frac{5 + 1}{2}$th item, i.e. the 3rd item
Example: Calculate the AM, median, Q1, D4 and P55 for the data on salaries of 1000
employees in a company.
Solution:
Mode: Mode is defined as the value which occurs most frequently in a set of
observations and around which the other observations of the set cluster densely. The mode
of a distribution need not be unique. If two values share the maximum frequency, the
distribution is bimodal. A distribution is called multi-modal if several values
occur the maximum number of times. Mode is often used where we need the most
typical value, e.g. in business forecasting, where manufacturers of readymade garments,
shoes etc. need to know the most commonly required sizes.
$$\text{Mode} = L + h \times \frac{f_m - f_1}{2f_m - f_1 - f_2}$$
Here L, h, fm, f1 and f2 are respectively the lower limit of the modal class, the size
of modal class, frequency of the modal class, frequency preceding the modal class and
frequency following the modal class.
Merits: Mode is not affected by extreme observations and can be calculated for open
ended distributions.
Demerits: It is ill defined, not based on all the observations, not unique and often does
not exist.
x: 1 2 3 4 5 6 7 8
f: 3 7 15 22 25 16 9 4
3, 5, 5, 7, 7, 7, 8, 8, 9, 9, 10
x : 3 5 7 8 9 10
f : 1 2 3 2 2 1
Maximum frequency = 3
Mode = 7
Illustration: A sample of 100 chapattis chosen randomly from a railway catering service
gave the following distribution of weights of chapattis.
No. of chapattis 8 12 16 24 18 14 8
L = 24 and h = 1
$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 24 + \frac{24 - 16}{2 \times 24 - 16 - 18} \times 1 = 24.57 \text{ g}$$
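The grouped-mode formula is a one-line calculation once the modal class has been identified. The following Python sketch plugs in the values used in the illustration (L = 24, h = 1, fm = 24, f1 = 16, f2 = 18); the function name `grouped_mode` is a hypothetical helper.

```python
def grouped_mode(lower_limit, width, f_modal, f_prev, f_next):
    """Mode of a grouped distribution: L + h * (fm - f1) / (2*fm - f1 - f2)."""
    return lower_limit + width * (f_modal - f_prev) / (2 * f_modal - f_prev - f_next)


# Values from the chapatti illustration: L = 24, h = 1, fm = 24, f1 = 16, f2 = 18
mode = grouped_mode(24, 1, 24, 16, 18)
print(round(mode, 2))  # 24.57
```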
Illustration (Mode By The Method of Grouping) Find the mode of the following
frequency distribution.
Size (x) 1 2 3 4 5 6 7 8 9 10 11 12
(f) 3 8 15 23 35 40 32 28 20 45 14 6
Column III: ignoring the first frequency and combining the remaining frequencies two by two
Column VI: leaving the first two frequencies and combining the remaining frequencies three by three
Size            Frequency
(x)      I      II     III    IV     V      VI
1        3      11            26
2        8             23            46
3        15     38                          73
4        23            58     98
5        35     75                   107
6        40            72                   100
7        32     60            80
8        28            48            93
9        20     65                          79
10       45            59     65
11       14     20
12       6
(Each grouped total is written against the first size of its group.)
Column    Maximum frequency    Size(s) having maximum frequency
(i)       45                   10
(ii)      75                   5, 6
(iii)     72                   6, 7
(iv)      98                   4, 5, 6
(v)       107                  5, 6, 7
(vi)      100                  6, 7, 8
On examining the table, the size 6 is repeated the maximum number of times (five times), hence
Mode = 6
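The grouping procedure can be sketched in Python as follows. The code builds the six columns from the frequencies of the illustration, finds the maximum total in each column and counts how often each size takes part in a maximum. The function name `grouping_mode` and the way columns are encoded as (group length, starting offset) pairs are choices made only for this illustration.

```python
def grouping_mode(sizes, freqs):
    """Mode by the method of grouping; returns (mode, tally of each size)."""
    tally = {s: 0 for s in sizes}
    # (group length, starting offset) for columns I to VI
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for span, start in columns:
        groups = [
            (sum(freqs[i:i + span]), sizes[i:i + span])
            for i in range(start, len(freqs) - span + 1, span)
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:
                for s in members:
                    tally[s] += 1  # size took part in a column maximum
    return max(tally, key=tally.get), tally


sizes = list(range(1, 13))
freqs = [3, 8, 15, 23, 35, 40, 32, 28, 20, 45, 14, 6]
mode, tally = grouping_mode(sizes, freqs)

print(mode)       # 6
print(tally[6])   # 5
```

Although the single largest frequency (45) belongs to size 10, the grouping method points to size 6, around which the frequencies are most heavily concentrated.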
Geometric Mean: Geometric mean is more appropriate if the observations are measured
as ratios, proportions, growth rates or percentages. When the growth rates or increase in
production etc. are given for a number of years or periods then geometric mean should be
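used as a measure of central tendency. If x1, x2, ....., xn are n positive observations, then
$$G = (x_1 x_2 \cdots x_n)^{1/n} \quad \text{or} \quad \log G = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$
For a frequency distribution, the geometric mean is defined as
$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N} \quad \text{or} \quad \log G = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i, \quad \text{where } N = \sum f_i$$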
Properties of Geometric Mean:
ii) It is suitable for further mathematical treatment. If n1 and n2 are the sizes of two
data series with G1 and G2 as their geometric means respectively, then geometric
mean of combined series G is given by:
$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$
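The logarithmic form of the geometric mean and the combined-series formula can be checked with a short Python sketch. The two data series are illustrative values chosen so that the results are easy to verify by hand; the function names are hypothetical.

```python
import math


def geometric_mean(values, freqs=None):
    """Geometric mean through logarithms: log G = sum(f * log x) / N."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every frequency is 1
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n_total = sum(freqs)
    log_g = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total
    return 10 ** log_g


def combined_geometric_mean(n1, g1, n2, g2):
    """Combined G from log G = (n1*log G1 + n2*log G2) / (n1 + n2)."""
    return 10 ** ((n1 * math.log10(g1) + n2 * math.log10(g2)) / (n1 + n2))


series_1 = [2, 8]        # illustrative data (assumed), G1 = 4
series_2 = [4, 16, 64]   # illustrative data (assumed), G2 = 16
g1 = geometric_mean(series_1)
g2 = geometric_mean(series_2)
pooled = combined_geometric_mean(len(series_1), g1, len(series_2), g2)

print(round(g1, 6), round(g2, 6))  # 4.0 16.0
# The combined formula agrees with the geometric mean of all five values.
print(math.isclose(pooled, geometric_mean(series_1 + series_2)))  # True
```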
Demerits:
ii) It cannot be calculated when any value is zero or negative and gives an absurd
value if computed in case of even number of negative observations
iii) Like arithmetic mean it is also affected by the extreme values but to a lesser
extent.
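If x1, x2, ....., xn are n observed values, the harmonic mean is given by:
$$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$
For a frequency distribution:
$$H = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} = \frac{N}{\sum \frac{f_i}{x_i}}, \quad \text{where } N = \sum f_i$$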
Demerits:
Relation among Arithmetic Mean (A), Geometric Mean (G) and Harmonic Mean (H):
i) $G = \sqrt{A \times H}$
Example: A machine was purchased in 1990. Its value depreciated at the rate of 5% p.a.
for the first 4 years and then at the rate of 8% p.a. for 6 years. Find the average rate of
decrease.
Solution:
x : 5 8
f : 4 6
N = Σf = 10
$$\log G = \frac{1}{N}(f_1 \log x_1 + f_2 \log x_2) = \frac{1}{10}(4 \log 5 + 6 \log 8) = \frac{1}{10}(4 \times 0.699 + 6 \times 0.903) = 0.8214$$
G = Antilog (0.8214) = 6.63
Example: An aeroplane flies along the four sides of a square at speeds of 100, 200, 300
and 400 km per hour respectively. Find the average speed.
Solution: The average speed is given by the harmonic mean of the speeds along the four
sides:
$$\text{HM} = \frac{4}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300} + \frac{1}{400}} = \frac{4}{\frac{25}{1200}} = 192 \text{ km/hr}$$
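The reason the harmonic mean applies here is that equal distances are covered at each speed, so the average speed is total distance divided by total time. The Python sketch below checks this; the function name `harmonic_mean` and the side length of 50 km are assumptions made only for the illustration (any side length gives the same answer).

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: N / sum(f / x); every frequency is 1 for raw data."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


speeds = [100, 200, 300, 400]  # km per hour, one speed per side of the square
average_speed = harmonic_mean(speeds)

# Cross-check: total distance / total time for an assumed side of 50 km
side = 50
total_time = sum(side / v for v in speeds)
distance_over_time = 4 * side / total_time

print(round(average_speed, 2), round(distance_over_time, 2))  # 192.0 192.0
```

The arithmetic mean of the four speeds, 250 km/hr, would overstate the average because more time is spent on the slower sides.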
Measures of Dispersion
For comparing two distributions, average value may not give complete picture as
it may be possible that distributions may have the same mean but they may differ in
scatter or spread. The degree to which numerical data tend to scatter or spread around the
central value is called dispersion. Further, any quantity that measures the degree of
spread or scatter around the central value is called measure of dispersion. Important
measures of dispersion are:
i) Range
ii) Mean Deviation
iii) Variance
iv) Standard Deviation
v) Coefficient of Variation
Range, mean deviation, variance and standard deviation are absolute measures of
dispersion while coefficient of variation (CV), which is independent of the units of
measurement, is a relative measure of dispersion. CV is a pure number and often
expressed as percentage.
Range: It is defined as the difference of the two extreme observations of a data set.
Range is used when we need a rough comparison of two or more sets of data or when the
observations are too scattered to justify the computation of a more precise measure of
dispersion. Range is the simplest measure of dispersion and it is easy to interpret.
However, range is based only on two extreme observations and has greater chances of
being affected by fluctuations in sampling. It is also a crude and less reliable measure of
dispersion.
Mean Deviation (MD): The arithmetic mean of the absolute deviations about any point
A is called the mean deviation about the point A. The point ‘A’ may be taken as mean,
median or mode of the distribution. It is a useful measure of dispersion in business and
economics when extreme observations influence the standard deviation unduly.
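The MD of n observations x1, x2, x3, ....., xn about any point A is given by
$$\text{MD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$
For a frequency distribution:
$$\text{MD} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|, \quad \text{where } N = \sum f_i$$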
Mean deviation is rigidly defined, based on all the observations, easy to calculate and
relatively easy to interpret. It is least when deviations are taken about median and is not
affected much by the extreme values. MD has a serious demerit that it ignores the signs
of deviations and hence creates some artificiality in the result.
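The statement that the mean deviation is least when taken about the median can be seen on a small example. The Python sketch below uses illustrative observations that are not from the manual; `mean_deviation` is a hypothetical helper name.

```python
from statistics import mean, median


def mean_deviation(values, point, freqs=None):
    """Mean deviation about `point`: sum(f * |x - A|) / N."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(f * abs(x - point) for x, f in zip(values, freqs)) / sum(freqs)


data = [2, 4, 4, 5, 10]  # illustrative observations (assumed)
md_mean = mean_deviation(data, mean(data))       # about the mean (5)
md_median = mean_deviation(data, median(data))   # about the median (4)

print(md_mean, md_median)  # 2.0 1.8
```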
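Variance (σ²): The arithmetic mean of the squared deviations taken about the mean of a
series is called variance (σ²). Mathematically, variance is defined as:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$
For a frequency distribution:
$$\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum f_i x_i^2 - \bar{x}^2, \quad \text{where } N = \sum f_i \text{ and } \bar{x} = \frac{\sum f_i x_i}{N}$$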
Standard Deviation (σ): The positive square root of the arithmetic mean of the squared
deviations of observations in a data series about its arithmetic mean is called standard
deviation (σ) i.e. it is the square root of variance.
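$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$
For a frequency distribution:
$$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2} = \sqrt{\frac{1}{N}\sum f_i x_i^2 - \bar{x}^2}$$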
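Both forms of the variance formula, and the standard deviation as its square root, can be compared in a short Python sketch. The data are illustrative values chosen to give round results, and the function names `variance` and `variance_shortcut` are hypothetical. Note that the divisor is n (or N), as in the formulas above.

```python
import math


def variance(values, freqs=None):
    """Variance as the mean of squared deviations about the arithmetic mean."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    x_bar = sum(f * x for x, f in zip(values, freqs)) / n_total
    return sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs)) / n_total


def variance_shortcut(values, freqs=None):
    """Equivalent form: mean of the squares minus the square of the mean."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    x_bar = sum(f * x for x, f in zip(values, freqs)) / n_total
    return sum(f * x ** 2 for x, f in zip(values, freqs)) / n_total - x_bar ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations (assumed)
var = variance(data)
sd = math.sqrt(var)

print(var, sd)                  # 4.0 2.0
print(variance_shortcut(data))  # 4.0
```

The same values written as a frequency distribution, `variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])`, give the same variance.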
Standard deviation is a stable measure and is regarded as the best and most
powerful measure of dispersion. It is relatively less sensitive to the sampling fluctuations
and also suitable for further mathematical treatment. Standard deviation is independent
of change of origin but not of scale. The main demerit of standard deviation is that it
gives greater weight to extreme values and it cannot be calculated in case of open-ended
distributions.
i) Compute mid values of class intervals if not given. Let x1, x2, ....., xn be the mid
values, then for every class interval find ui = (xi – A)/h where A = assumed mean and h =
width of the class interval.
$$\text{Variance } (\sigma_x^2) = h^2\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right] = h^2 \sigma_u^2$$
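The step-deviation formula for the variance can be checked against the direct definition. The Python sketch below reuses the mid values and frequencies of the soybean plant height example given earlier, with A = 30 and h = 5; the variable names are illustrative.

```python
import math

mids = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]   # class mid values
freqs = [6, 17, 25, 86, 125, 77, 55, 9, 4, 1]     # class frequencies
A, h = 30, 5                                      # assumed mean and class width
n_total = sum(freqs)

# Step deviations u_i = (x_i - A) / h
u = [(x - A) / h for x in mids]
mean_u = sum(f * ui for ui, f in zip(u, freqs)) / n_total
var_u = sum(f * ui ** 2 for ui, f in zip(u, freqs)) / n_total - mean_u ** 2
var_x = h ** 2 * var_u  # variance of x = h^2 * variance of u

# Direct computation from the definition, for comparison
x_bar = sum(f * x for x, f in zip(mids, freqs)) / n_total
var_direct = sum(f * (x - x_bar) ** 2 for x, f in zip(mids, freqs)) / n_total

print(math.isclose(var_x, var_direct))  # True
```

The shift by A drops out of the variance entirely and only the scale factor h survives, which is the sense in which the measure is independent of change of origin but not of scale.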
Relative measures of dispersion are used while comparing the variability of series
having different or same units of measurement. They can also be used for comparing two
or more series for consistency. Coefficient of variation is defined as
$$\text{Coefficient of Variation (CV)} = \frac{\text{SD}}{\text{Mean}} \times 100$$
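$$\sigma_p^2 = \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}$$
where σ1 = SD of first group, σ2 = SD of second group,
$d_1 = (\bar{X}_1 - \bar{X}_p)$; $d_2 = (\bar{X}_2 - \bar{X}_p)$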
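The coefficient of variation and the combined variance of two groups can be sketched in Python as follows. The two groups are illustrative data, not taken from the manual, and the function names are hypothetical. The check at the end compares the combined formula with the variance computed directly from the merged data, using the standard library's population variance (divisor N).

```python
import math
from statistics import mean, pstdev, pvariance


def coefficient_of_variation(sd, mean_value):
    """CV = SD / Mean x 100, expressed as a percentage."""
    return sd / mean_value * 100


def pooled_variance(n1, mean1, sd1, n2, mean2, sd2):
    """Combined variance of two groups from their sizes, means and SDs."""
    pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - pooled_mean  # deviation of group 1 mean from combined mean
    d2 = mean2 - pooled_mean  # deviation of group 2 mean from combined mean
    return (n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2) / (n1 + n2)


group_1 = [2, 4, 6]          # illustrative data (assumed)
group_2 = [10, 12, 14, 16]   # illustrative data (assumed)

combined = pooled_variance(len(group_1), mean(group_1), pstdev(group_1),
                           len(group_2), mean(group_2), pstdev(group_2))

print(math.isclose(combined, pvariance(group_1 + group_2)))  # True
print(round(coefficient_of_variation(2.0, 5.0), 2))          # 40.0
```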
Example: Find the mean deviation about AM and hence find the coefficient of mean
deviation.
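$$\text{MD} = \frac{\sum f\,|x - \bar{x}|}{N} = \frac{593.8}{32} = 18.55$$
$$\text{Coefficient of MD} = \frac{\text{MD}}{\text{AM}} \times 100 = \frac{18.55}{56.9} \times 100 = 32.6\%$$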
Example: The runs scored by two batsmen X and Y in 10 innings are given below. Find
out who is the better run scorer and who is the more consistent player.
                                                                   Total
Y           65    68    52    47    63    25    25    60    55    60    520
(x - x̄)²   1225  3025  2500  2025  4900  1600  400   1521  6241  2025  25462
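$$\text{CV of X} = \frac{\sigma_x}{\bar{X}} \times 100 = \frac{50.46}{55} \times 100 = 91.74\%$$
$$\text{CV of Y} = \frac{\sigma_y}{\bar{Y}} \times 100 = \frac{14.71}{52} \times 100 = 28.28\%$$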
For a symmetrical distribution mean, mode and median are equal and in case of a
positively skewed distribution AM > Median > Mode and for a negatively skewed
distribution AM < Median < Mode.
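Skewness may be defined in terms of moments as $\gamma_1 = \sqrt{\beta_1}$, where $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and
$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$ is the rth order central moment of the variable X, and the sign of γ1
is taken to be the same as the sign of μ3.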
[Figure: symmetrical distribution]
Kurtosis: The flatness or peakedness of top of the curve is called kurtosis. Kurtosis is
measured in terms of moments and is given by:
$$\beta_2 = \frac{\mu_4}{\mu_2^2} \quad \text{and} \quad \gamma_2 = \beta_2 - 3$$
If β2 = 3, i.e. γ2 = 0, then the curve is neither peaked nor flat but normal; it is called a
Mesokurtic curve.
If β2 > 3, i.e. γ2 > 0, then the curve is more peaked than normal, i.e. a Leptokurtic curve.
If β2 < 3, i.e. γ2 < 0, then the curve is less peaked than normal, i.e. a Platykurtic curve.
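The moment-based coefficients of skewness and kurtosis can be computed with a short Python sketch. The data are illustrative, the function names are hypothetical, and the sign of γ1 is taken from μ3, which is an assumed convention for this sketch.

```python
import math


def central_moment(values, freqs, r):
    """r-th order central moment: (1/N) * sum(f * (x - mean)^r)."""
    n_total = sum(freqs)
    x_bar = sum(f * x for x, f in zip(values, freqs)) / n_total
    return sum(f * (x - x_bar) ** r for x, f in zip(values, freqs)) / n_total


def skewness_kurtosis(values, freqs):
    """Return (gamma1, beta2, gamma2) computed from central moments."""
    mu2 = central_moment(values, freqs, 2)
    mu3 = central_moment(values, freqs, 3)
    mu4 = central_moment(values, freqs, 4)
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign from mu3 (assumed convention)
    beta2 = mu4 / mu2 ** 2
    return gamma1, beta2, beta2 - 3


# Illustrative symmetric distribution (assumed data)
gamma1, beta2, gamma2 = skewness_kurtosis([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
print(gamma1, round(beta2, 2), round(gamma2, 2))  # 0.0 1.7 -1.3
```

For this symmetric data γ1 is zero, and γ2 is negative, so the distribution is flatter than normal (platykurtic).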
[Figure: leptokurtic, mesokurtic and platykurtic curves]