
Manual on “Introduction to Statistical Methods”

Measures of Central Tendency

Observations in a data series tend to concentrate around a central point; this
property is called central tendency. A measure of central tendency, or average, is a
single value within the range of the data that represents all the values in the data
series. Since an average lies somewhere within the range of the data, it is
sometimes called a measure of central value or location of a distribution.

Arithmetic mean, geometric mean, harmonic mean, median and mode are the
popular measures of central tendency. The arithmetic mean, geometric mean and
harmonic mean are known as mathematical averages while median and mode are
positional averages.

Characteristics of an Ideal Measure:

i) It should be rigidly defined, based on all the observations and easy to calculate
and understand.

ii) It should be suitable for further mathematical treatment.

iii) It should be least affected by extreme values and fluctuations in sampling.


Arithmetic Mean:

Arithmetic mean of a group of n observations x1, x2, ....., xn is defined as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If the observations x1, x2, ....., xn form a discrete frequency distribution with
respective frequencies f1, f2, ....., fn then the arithmetic mean is given by:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{N} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad \text{where } N = \sum_{i=1}^{n} f_i$$

For grouped frequency distribution, mid values are determined for the various
class intervals and then arithmetic mean is computed as in case of a discrete frequency
distribution.

Properties of Arithmetic Mean:

• Arithmetic mean is rigidly defined and based on all the observations.

• It is suitable for further mathematical treatment and is least affected by
fluctuations in sampling.

• Sum of deviations of the given values from their arithmetic mean is always zero.

• Sum of the squares of deviations of the given values from their arithmetic mean is
always minimum.

Demerits of Arithmetic Mean:

1. Arithmetic mean cannot be used as a suitable measure of central tendency if we are
dealing with qualitative characteristics or in case of extremely asymmetrical
distributions.

2. Arithmetic mean is likely to be affected by extreme values.

3. Arithmetic mean cannot be calculated in case of open-ended classes.

Example: The weights of 9 balls of cotton are 89, 90, 102, 107, 108, 115, 117, 119 and
126 gm. Find the arithmetic mean by the direct and short cut methods.


Solution:

Direct Method:

$$\text{AM} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{89 + 90 + 102 + 107 + 108 + 115 + 117 + 119 + 126}{9} = \frac{973}{9} = 108.11 \text{ gm}$$

Short cut Method:

$$\text{AM} = A + \frac{\sum_{i=1}^{n} d_i}{n}$$

where n = number of observations; A = assumed mean; di (deviation) = xi − A

   xi       di = xi − A  (A = 108)

   89       89 − 108 = −19
   90       90 − 108 = −18
  102      102 − 108 = −6
  107      107 − 108 = −1
  108      108 − 108 = 0
  115      115 − 108 = 7
  117      117 − 108 = 9
  119      119 − 108 = 11
  126      126 − 108 = 18

  Σxi = 973        Σdi = 1

$$\text{AM} = A + \frac{\sum_{i=1}^{n} d_i}{n} = 108 + \frac{1}{9} = 108.11 \text{ gm}$$
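
The two methods can be checked against each other with a short Python sketch. The function names below are illustrative and are not part of the manual; the data are the nine cotton weights from the example.

```python
def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short cut method: assumed mean A plus the average deviation from A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


weights = [89, 90, 102, 107, 108, 115, 117, 119, 126]  # gm

print(round(mean_direct(weights), 2))         # 108.11
print(round(mean_shortcut(weights, 108), 2))  # 108.11
```

Any assumed mean A gives the same answer; choosing A close to the true mean only keeps the deviations small for hand calculation.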


Example: Find the mean of the following temperatures recorded in an experiment in a
meteorology laboratory.

81, 94, 64, 80, 75, 69, 96, 66, 80, 91, 85, and 79

Solution:

Direct Method:

$$\text{A.M.} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{81 + 94 + 64 + 80 + 75 + 69 + 96 + 66 + 80 + 91 + 85 + 79}{12} = \frac{960}{12} = 80$$

Short cut Method:

   xi       di = xi − A  (A = 75)

   81       81 − 75 = 6
   94       94 − 75 = 19
   64       64 − 75 = −11
   80       80 − 75 = 5
   75       75 − 75 = 0
   69       69 − 75 = −6
   96       96 − 75 = 21
   66       66 − 75 = −9
   80       80 − 75 = 5
   91       91 − 75 = 16
   85       85 − 75 = 10
   79       79 − 75 = 4

  Σxi = 960        Σdi = 60

$$\text{AM} = A + \frac{\sum_{i=1}^{n} d_i}{n} = 75 + \frac{60}{12} = 75 + 5 = 80$$

Example: The heights (cm) of 405 soybean plants collected from a particular plot are
grouped into 10 classes below. Find the arithmetic mean height of the plants by the
Direct and Step Deviation methods:

x: 8-12 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57

f: 6 17 25 86 125 77 55 9 4 1

Solution:

Class       Frequency   Mid value    fi xi    di = (xi − A)/h     fi di
interval      (fi)        (xi)                (A = 30, h = 5)

 8-12           6          10           60          -4             -24
13-17          17          15          255          -3             -51
18-22          25          20          500          -2             -50
23-27          86          25         2150          -1             -86
28-32         125          30         3750           0               0
33-37          77          35         2695           1              77
38-42          55          40         2200           2             110
43-47           9          45          405           3              27
48-52           4          50          200           4              16
53-57           1          55           55           5               5

Total         405                    12270                          24


Method-I (Direct Method)

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{N} = \frac{1}{N}\sum f_i x_i = \frac{12270}{405} = 30.3 \text{ cm}$$

Method-II (Step Deviation Method)

Here, $d_i = (x_i - A)/h$, where A = 30 and size of class (h) = 5

$$\text{AM} = A + \left(\frac{1}{N}\sum f_i d_i\right) \times h = 30 + \frac{24}{405} \times 5 = 30.3 \text{ cm}$$
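
As a check on the arithmetic, the direct and step deviation methods can be sketched in Python as follows. The function and variable names are illustrative assumptions; the mid values and frequencies are those of the soybean example.

```python
def grouped_mean_direct(mid_values, freqs):
    """Direct method: sum(f * x) / N for a frequency distribution."""
    return sum(f * x for x, f in zip(mid_values, freqs)) / sum(freqs)


def grouped_mean_step_deviation(mid_values, freqs, assumed_mean, class_width):
    """Step deviation method: A + h * sum(f * d) / N with d = (x - A) / h."""
    n_total = sum(freqs)
    steps = [(x - assumed_mean) / class_width for x in mid_values]
    return assumed_mean + class_width * sum(f * d for f, d in zip(freqs, steps)) / n_total


mid_values = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [6, 17, 25, 86, 125, 77, 55, 9, 4, 1]

direct = grouped_mean_direct(mid_values, freqs)
step = grouped_mean_step_deviation(mid_values, freqs, assumed_mean=30, class_width=5)
print(round(direct, 1), round(step, 1))  # 30.3 30.3
```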

Pooled or Combined Arithmetic Mean: If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the arithmetic means of k
groups, based on n1, n2, ....., nk observations respectively, then the combined mean of all
(n1 + n2 + ..... + nk) observations of the k groups taken together is:

$$\bar{x}_p = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

Weighted Mean: When different values have unequal weightage or importance or
contributions then instead of simple mean the weighted mean is used. Let x1, x2, ....., xk be
k values with weights w1, w2, ....., wk respectively then weighted mean is

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum w_i x_i}{\sum w_i}$$

Example: Find over all grade point average (OGPA) in a semester from the following:

Course        Cr. Hrs. (wi)     Grade point (xi)     wi xi

Stat-101      2+1 = 3           6.3                  18.9
Comp-101      1+1 = 2           7.0                  14.0
Math-101      3+1 = 4           7.5                  30.0

Total:        6+3 = 9                                62.9

$$\text{OGPA} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{62.9}{9} = 6.99$$
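
The weighted mean is a one-line computation. The sketch below, with an illustrative function name, reproduces the OGPA using the total credit hours of each course as weights.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_points = [6.3, 7.0, 7.5]   # Stat-101, Comp-101, Math-101
credit_hours = [3, 2, 4]         # 2+1, 1+1, 3+1

ogpa = weighted_mean(grade_points, credit_hours)
print(round(ogpa, 2))  # 6.99
```

With equal weights the weighted mean reduces to the simple arithmetic mean.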


Median: The median of distribution is that value which divides it into two equal parts.
Median is a positional average and is the suitable measure of central tendency when
individuals are ranked on the basis of some qualitative characteristics such as
intelligence, poverty etc., which cannot be measured numerically. It is also used when
open-ended classes are given at one or both the ends or data arise from some skewed
distribution such as income distribution.

Median is computed by arranging the observations in ascending or descending
order of magnitude. If n, the number of observations, is odd, then the value of the
((n+1)/2)th observation is the median, and if n is even then the average of the two
middle observations is the median.

Median for discrete frequency distribution is the observation corresponding to the
cumulative frequency (less than type) just greater than N/2.

For a grouped frequency distribution the class corresponding to the cumulative
frequency just greater than N/2 is called the median class and median is determined by
the formula:

$$M_d = L + \frac{h}{f}\left(\frac{N}{2} - C\right)$$

where L: is the lower limit of the median class

f: is the frequency of the median class

h: is the size of the median class

C: is the cumulative frequency of the class preceding the median class

N = Σf

Merits: It is rigidly defined, easy to calculate and easy to understand. It is a positional
average and hence not affected by extreme values. It can be calculated for open-ended
distributions.

Demerits: Median is not based on all the observations. It is not suitable for further
mathematical treatment and is more sensitive to the fluctuations of sampling.


Partition Values or Quantiles: Sometimes we need not only the mean or the middle value
but also the values which divide the data into four, ten or a hundred equal parts.

Quartiles: These are the three values which divide the data into four equal parts. They are
denoted by Q1, Q2, Q3.

Q1 is the value such that 25% of the data fall below Q1 and 75% above Q1.

Q2 is the value such that 50% of the data fall below Q2 and 50% above Q2.

Q3 is the value such that 75% of the data fall below Q3 and 25% above Q3.

On the lines of median formula, quartiles are given by

$$Q_i = L + \frac{iN/4 - C}{f} \times h; \quad i = 1, 2, 3$$

where L, C and f are the lower limit, the C.F. of the preceding class and the frequency of
the class containing that quartile, and h is the size of that class. Q1 and Q3 are called
lower and upper quartiles respectively and Q2 is equal to the median.

Deciles: These are the nine values which divide the data into 10 equal parts. They are
denoted by D1, D2, ....., D9; where

$$D_i = L + \frac{iN/10 - C}{f} \times h; \quad i = 1, 2, \ldots, 9$$

D1 is the value such that 10% of the data fall below D1 and 90% above D1.

D2 is the value such that 20% of the data fall below D2 and 80% above D2.

D9 is the value such that 90% of the data fall below D9 and 10% above D9.

Percentiles: These are the ninety-nine values which divide the data into 100 equal parts.
They are denoted by P1, P2, ....., P99; where

$$P_i = L + \frac{iN/100 - C}{f} \times h; \quad i = 1, 2, \ldots, 99$$

It is obvious that Q1 = P25, Q2 = P50, and Q3 = P75.

Percentiles help to find the cut-off values when the data are to be divided into a number of
categories and the per cent of observations in the various categories is given.


Example: Find the median of 6, 11, 7, 9, 5

Sol. We first arrange the observations in ascending order of magnitude as 5, 6, 7, 9, 11

n = 5 (odd number),

Median = value of ((5 + 1)/2)th item

= value of 3rd item = 7

Example: Find the median of the data 5, 11, 9, 8, 13, 7

Solution: n = 6 (even number); the observations arranged in ascending order are 5, 7, 8, 9,
11, 13. Then n/2 = 6/2 = 3 and (n/2 + 1) = (3 + 1) = 4, and hence the 3rd and 4th items are
the two middle observations.

$$\text{Median} = \frac{\text{3rd item} + \text{4th item}}{2} = \frac{8 + 9}{2} = \frac{17}{2} = 8.5$$

Example: Calculate the AM, median, Q1, D4 and P55 for the data on salaries of 1000
employees in a company.

Salaries       No. of           Class        fx        cf
('000 Rs)      employees (f)    mark (x)

 1-3              50               2          100       50
 3-5             110               4          440      160
 5-7             162               6          972      322
 7-9             200               8         1600      522
 9-11            183              10         1830      705
11-13            145              12         1740      850
13-15            125              14         1750      975
15-17             15              16          240      990
17-19              8              18          144      998
19-21              2              20           40     1000

Total          N = 1000                      8856

Solution:


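(i) $\bar{X} = \sum fx / N = 8856/1000 = 8.856$ thousand = Rs. 8856

(ii) $\text{Median} = L + \frac{N/2 - C}{f} \times h = 7 + \frac{500 - 322}{200} \times 2 = 8.78$ = Rs. 8780

(iii) $\text{Lower Quartile } (Q_1) = L + \frac{N/4 - C}{f} \times h = 5 + \frac{250 - 160}{162} \times 2 = 6.111$ = Rs. 6111

(iv) $\text{4th decile } (D_4) = L + \frac{4N/10 - C}{f} \times h = 7 + \frac{400 - 322}{200} \times 2 = 7.78$ = Rs. 7780

(v) $\text{55th percentile } (P_{55}) = L + \frac{55N/100 - C}{f} \times h = 9 + \frac{550 - 522}{183} \times 2 = 9.306$ = Rs. 9306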
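
Since the median, quartiles, deciles and percentiles all share one formula, a single helper can compute any of them. The following Python sketch assumes equal class widths and contiguous classes; the function name and its arguments are illustrative and not part of the manual.

```python
def grouped_quantile(lower_limits, freqs, class_width, i, parts):
    """i-th of `parts` partition values: L + (i*N/parts - C) / f * h."""
    total = sum(freqs)
    target = i * total / parts
    cumulative = 0  # C: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        # The quantile class is the first whose cumulative frequency exceeds the target.
        if cumulative + f > target:
            return lower + (target - cumulative) / f * class_width
        cumulative += f
    raise ValueError("i must be smaller than parts")


lower_limits = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]   # '000 Rs
freqs = [50, 110, 162, 200, 183, 145, 125, 15, 8, 2]

median = grouped_quantile(lower_limits, freqs, 2, 1, 2)
q1 = grouped_quantile(lower_limits, freqs, 2, 1, 4)
d4 = grouped_quantile(lower_limits, freqs, 2, 4, 10)
p55 = grouped_quantile(lower_limits, freqs, 2, 55, 100)
print(round(median, 2), round(q1, 3), round(d4, 2), round(p55, 3))
# 8.78 6.111 7.78 9.306
```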

Mode: Mode is defined as the value which occurs most frequently in a set of
observations and around which the other observations of the set cluster densely. The mode
of a distribution need not be unique. If two values share the maximum frequency then the
distribution is bimodal. A distribution is called multi-modal if there are several values
that occur the maximum number of times. Mode is often used where we need the most
typical value, e.g. in business forecasting, where manufacturers of readymade garments,
shoes etc. need to know the most commonly demanded size.

Mode of a discrete frequency distribution can be located by inspection as the
variable value corresponding to maximum frequency. For a continuous frequency
distribution we first locate the modal class and then the mode is determined by the
following formula:

$$\text{Mode} = L + h \times \frac{f_m - f_1}{2f_m - f_1 - f_2}$$

Here L, h, fm, f1 and f2 are respectively the lower limit of the modal class, the size
of the modal class, the frequency of the modal class, the frequency of the class preceding
the modal class and the frequency of the class following the modal class.

If there are irregularities in the distribution or the maximum frequency is repeated
or the maximum frequency occurs in the very beginning or at the end of the distribution
then the mode is determined by the grouping method.


Merits: Mode is not affected by extreme observations and can be calculated for open
ended distributions.

Demerits: It is ill defined, not based on all the observations, not unique and often does
not exist.

Relationships among Mean, Mode and Median:

i) For a symmetrical distribution Mean = Median = Mode

ii) For a moderately skewed distribution the empirical relation
Mean − Mode = 3(Mean − Median), or

Mode = 3 Median − 2 Mean, holds approximately

iii) For a positively skewed distribution Mean > Median > Mode

iv) For a negatively skewed distribution Mean < Median < Mode

Example: Find mode for the following discrete frequency distribution

x: 1 2 3 4 5 6 7 8

f: 3 7 15 22 25 16 9 4

Solution: Maximum Frequency = 25

Mode = value corresponding to maximum frequency = 5

Example: Find the mode of the following distribution

3, 5, 5, 7, 7, 7, 8, 8, 9, 9, 10

Sol. First we obtain the frequency distribution

x :   3   5   7   8   9   10
f :   1   2   3   2   2    1

Maximum frequency = 3

The value corresponding to the maximum frequency is 7, therefore

Mode = 7

Illustration: A sample of 100 chapatis chosen randomly from a railway catering service
gave the following distribution of weights (g) of the chapatis

Weight (g)         21-22   22-23   23-24   24-25   25-26   26-27   27-28

No. of chapatis      8      12      16      24      18      14       8

Compute the modal weight of the chapatis.

Solution: The maximum frequency, 24, belongs to the class 24-25, which is therefore the
modal class. Here fm = 24, f1 = 16 and f2 = 18

L = 24 and h = 1

$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 24 + \frac{24 - 16}{2 \times 24 - 16 - 18} \times 1 = 24 + \frac{8}{14} = 24.57 \text{ g}$$
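
The modal-class formula translates directly into code. This is a minimal Python sketch with an illustrative function name; it assumes the modal class has already been identified and that it is neither the first nor the last class.

```python
def grouped_mode(lower_limit, class_width, f_modal, f_prev, f_next):
    """Mode = L + h * (fm - f1) / (2*fm - f1 - f2) for the modal class."""
    return lower_limit + class_width * (f_modal - f_prev) / (2 * f_modal - f_prev - f_next)


# Chapati weights: modal class 24-25 with fm = 24, f1 = 16, f2 = 18
modal_weight = grouped_mode(lower_limit=24, class_width=1, f_modal=24, f_prev=16, f_next=18)
print(round(modal_weight, 2))  # 24.57
```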

Computation of Mode By The Method of Grouping

If there are irregularities in the distribution and/or the maximum frequency is
repeated and/or the maximum frequency occurs in the very beginning or at the end of the
distribution then the mode is determined by the grouping method.

Illustration (Mode By The Method of Grouping) Find the mode of the following
frequency distribution.

Size (x) 1 2 3 4 5 6 7 8 9 10 11 12

(f) 3 8 15 23 35 40 32 28 20 45 14 6

Solution: The distribution appears to be irregular because the maximum frequency 45
does not seem to be consistent with the rest of the distribution. Moreover, the maximum
frequency appears near the end of the distribution. We therefore locate the mode by the
method of grouping frequencies as explained below

Column I: original frequencies

Column II: combining frequencies two by two

Column III.: ignoring first frequency & combining remaining frequencies two by two

Column IV: combining the frequencies three by three

Column V: combining frequencies three by three after ignoring first frequency


Column VI. leaving first two frequencies and combining the frequencies three by three

Each grouped total is written on the row of the first size in its group.

Size                     Frequency
(x)        I      II     III     IV      V      VI

 1         3      11             26
 2         8             23              46
 3        15      38                            73
 4        23             58      98
 5        35      75                    107
 6        40             72                    100
 7        32      60             80
 8        28             48              93
 9        20      65                            79
10        45             59      65
11        14      20
12         6


To find the mode, we form the following table.

Column number     Maximum frequency     Possible modes

(i)                     45               10
(ii)                    75               5, 6
(iii)                   72               6, 7
(iv)                    98               4, 5, 6
(v)                    107               5, 6, 7
(vi)                   100               6, 7, 8

On examining the possible modes, we find that the value 6 is repeated the maximum
number of times, appearing in five of the six columns. Therefore,

Mode = 6
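
The grouping procedure is mechanical enough to automate. The Python sketch below is one possible implementation under the six-column scheme described above; the function name and the tallying with `collections.Counter` are illustrative choices, not the manual's own notation.

```python
from collections import Counter


def mode_by_grouping(sizes, freqs):
    """Return the value(s) picked most often across grouping columns I to VI."""
    votes = Counter()
    # (group length, number of leading frequencies ignored) for columns I to VI
    for length, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [
            (sum(freqs[i:i + length]), sizes[i:i + length])
            for i in range(start, len(freqs) - length + 1, length)
        ]
        best_total = max(total for total, _ in groups)
        for total, members in groups:
            if total == best_total:
                votes.update(members)  # every size in the best group is a possible mode
    top = max(votes.values())
    return [size for size, count in votes.items() if count == top]


sizes = list(range(1, 13))
freqs = [3, 8, 15, 23, 35, 40, 32, 28, 20, 45, 14, 6]
print(mode_by_grouping(sizes, freqs))  # [6]
```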

Geometric Mean: Geometric mean is more appropriate if the observations are measured
as ratios, proportions, growth rates or percentages. When the growth rates or increase in
production etc. are given for a number of years or periods then geometric mean should be
used as a measure of central tendency. If x1, x2, ....., xn are n positive observations then

$$G = (x_1 x_2 \cdots x_n)^{1/n} \quad \text{or} \quad \log G = \frac{1}{n}\sum \log x_i$$

For a frequency distribution, the geometric mean is defined as

$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N} \quad \text{or} \quad \log G = \frac{1}{N}\sum f_i \log x_i, \quad \text{where } N = \sum f_i$$

Properties of Geometric Mean:

i) It is rigidly defined and is based on all the observations.

ii) It is suitable for further mathematical treatment. If n1 and n2 are the sizes of two
data series with G1 and G2 as their geometric means respectively, then the geometric
mean G of the combined series is given by:

$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$


iii) It is not affected much by fluctuations in sampling.

Demerits:

i) Geometric mean is a bit difficult to understand and to calculate for a
non-mathematician.

ii) It cannot be calculated when any value is zero or negative and gives an absurd
value if computed in case of an even number of negative observations.

iii) Like arithmetic mean it is also affected by the extreme values but to a lesser
extent.

Harmonic Mean: The reciprocal of the arithmetic mean of reciprocals of non-zero
values of a variable is called harmonic mean (H). Harmonic mean is a suitable average
when observations are given as rates, such as speed (distance per unit time).

If x1, x2, ....., xn are n observed values, the harmonic mean is given by:

$$H = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}} = \frac{n}{\sum \dfrac{1}{x_i}}$$

When observations are given in the form of a frequency distribution, then
harmonic mean is given by:

$$H = \frac{N}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_n}{x_n}} = \frac{N}{\sum \dfrac{f_i}{x_i}}, \quad \text{where } N = \sum f_i$$

Harmonic mean for a grouped frequency distribution can be obtained by first
calculating the class marks.

Properties of Harmonic Mean:

i) Harmonic mean is rigidly defined and is based on all observations.

ii) It is suitable for further mathematical treatment.

iii) Like geometric mean it is not affected much by fluctuation of sampling.


Demerits:

i) Harmonic mean is not easily understood and is difficult to compute.

ii) It gives greater importance to small items.

iii) It cannot be calculated when any observation is zero.

Relation among Arithmetic Mean (A), Geometric Mean (G) and Harmonic Mean (H):

i) $G = \sqrt{A \times H}$; this holds exactly for two positive observations.

ii) $A \geq G \geq H$; equality holds when all the observations are equal.

Example: A machine was purchased in 1990. Its value appreciated at the rate of 5% p.a.
for the first 4 years and then at the rate of 8% p.a. for 6 years. Find the average rate of
appreciation.

Solution:

x :  5   8
f :  4   6

N = Σf = 10

$$\begin{aligned}
\log G &= \frac{1}{N}(f_1 \log x_1 + f_2 \log x_2) \\
&= \frac{1}{10}(4 \log 5 + 6 \log 8) \\
&= \frac{1}{10}(4 \times 0.699 + 6 \times 0.903) \\
&= \frac{1}{10}(2.796 + 5.418) = 0.8214
\end{aligned}$$

G = Antilog (0.8214) = 6.63
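
The logarithmic computation can be verified numerically. The Python sketch below uses an illustrative function name and follows the manual's approach of averaging the percentage rates themselves; carrying the logarithms to full precision gives G ≈ 6.63.

```python
import math


def weighted_geometric_mean(values, freqs):
    """G = antilog( sum(f * log x) / N ) for positive values."""
    n_total = sum(freqs)
    log_g = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total
    return 10 ** log_g


rates = [5, 8]   # % per annum
years = [4, 6]   # number of years each rate applied

g = weighted_geometric_mean(rates, years)
print(round(g, 2))  # 6.63
```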

Example: An aeroplane flies along the four sides of a square at speeds of 100, 200, 300
and 400 km per hour respectively. Find the average speed.

Solution: Since the four sides are of equal length, the average speed is given by the
harmonic mean of the speeds along the four sides

$$\text{HM} = \frac{4}{\dfrac{1}{100} + \dfrac{1}{200} + \dfrac{1}{300} + \dfrac{1}{400}} = \frac{4}{\dfrac{25}{1200}} = 192 \text{ km/hr}$$
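
A quick numerical check of this result, and of the relation A ≥ G ≥ H for the same four speeds, can be sketched in Python as follows (function names are illustrative):

```python
import math


def harmonic_mean(values):
    """H = n / sum(1 / x) for non-zero values."""
    return len(values) / sum(1 / x for x in values)


def geometric_mean(values):
    """G = exp(mean of log x) for positive values."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


speeds = [100, 200, 300, 400]  # km/hr along the four equal sides
a = sum(speeds) / len(speeds)
g = geometric_mean(speeds)
h = harmonic_mean(speeds)
print(round(h, 2))   # 192.0
print(a >= g >= h)   # True
```

The arithmetic mean (250 km/hr) would overstate the average speed because the aeroplane spends more time on the slower sides.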


Measures of Dispersion

For comparing two distributions, average value may not give complete picture as
it may be possible that distributions may have the same mean but they may differ in
scatter or spread. The degree to which numerical data tend to scatter or spread around the
central value is called dispersion. Further, any quantity that measures the degree of
spread or scatter around the central value is called measure of dispersion. Important
measures of dispersion are:

i) Range

ii) Mean Deviation

iii) Variance

iv) Standard Deviation

v) Coefficient of Variation

Range, mean deviation, variance and standard deviation are absolute measures of
dispersion while coefficient of variation (CV), which is independent of the units of
measurement, is a relative measure of dispersion. CV is a pure number and often
expressed as percentage.

Range: It is defined as the difference of the two extreme observations of a data set.
Range is used when we need a rough comparison of two or more sets of data or when the
observations are too scattered to justify the computation of a more precise measure of
dispersion. Range is the simplest measure of dispersion and it is easy to interpret.
However, range is based only on two extreme observations and has greater chances of
being affected by fluctuations in sampling. It is also a crude and less reliable measure of
dispersion.

Mean Deviation (MD): The arithmetic mean of the absolute deviations about any point
A is called the mean deviation about the point A. The point ‘A’ may be taken as mean,
median or mode of the distribution. It is a useful measure of dispersion in business and
economics when extreme observations influence the standard deviation unduly.


The MD of n observations x1, x2, x3, ....., xn about any point A is given by

$$\text{MD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$

If a frequency distribution is given then,

$$\text{MD} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|, \quad \text{where } N = \sum f_i$$

Mean deviation is rigidly defined, based on all the observations, easy to calculate and
relatively easy to interpret. It is least when deviations are taken about median and is not
affected much by the extreme values. MD has a serious demerit that it ignores the signs
of deviations and hence creates some artificiality in the result.

Variance (σ²): The arithmetic mean of the squared deviations taken about the mean of a
series is called variance (σ²). Mathematically, variance is defined as:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$

For a frequency distribution, the variance is given by:

$$\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum f_i x_i^2 - \bar{x}^2, \quad \text{where } N = \sum f_i \text{ and } \bar{x} = \frac{\sum f_i x_i}{N}$$

Standard Deviation (σ): The positive square root of the arithmetic mean of the squared
deviations of observations in a data series about its arithmetic mean is called standard
deviation (σ) i.e. it is the square root of variance.

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$

and for a frequency distribution

$$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2} = \sqrt{\frac{1}{N}\sum f_i x_i^2 - \bar{x}^2}$$

Standard deviation is a stable measure and is regarded as the best and most
powerful measure of dispersion. It is relatively less sensitive to sampling fluctuations
and is also suitable for further mathematical treatment. Standard deviation is independent
of change of origin but not of scale. The main demerit of standard deviation is that it is
likely to be affected by extreme values and it cannot be calculated in case of
open-ended distributions.

Step deviation method for computing variance

i) Compute mid values of class intervals if not given. Let x1, x2, ....., xn be the mid
values, then for every class interval find ui = (xi − A)/h where A = assumed mean and h =
width of the class interval.

ii) Find fiui and fiui² and compute

$$\text{AM}\,(\bar{x}) = A + h\bar{u} = A + \frac{\sum f_i u_i}{N} \times h$$

$$\text{Variance}\,(\sigma_x^2) = h^2\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right] = h^2\sigma_u^2$$

Relative measures of dispersion are used while comparing the variability of series
having different or same units of measurement. They can also be used for comparing two
or more series for consistency. Coefficient of variation is defined as

$$\text{Coefficient of Variation (CV)} = \frac{\text{SD}}{\text{Mean}} \times 100$$

A series with a smaller coefficient of variation is said to be less dispersed or more
consistent.

Combined or Pooled Variance: The combined variance of two groups is denoted
by σp² and is computed as follows:

$$\sigma_p^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}$$

where σ1 = SD of first group

σ2 = SD of second group

$d_1 = (\bar{X}_1 - \bar{X}_p)$; $d_2 = (\bar{X}_2 - \bar{X}_p)$

$\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_p$ are the two group means and the pooled mean respectively.
The combined standard deviation is the positive square root of σp².
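
The pooled formula can be verified by splitting a data set into two groups and comparing the result with the variance of the combined data. In the Python sketch below the two groups are hypothetical numbers chosen only for illustration, and the function name is an assumption; `statistics.pvariance` is the standard-library population variance (divisor n), which matches the definition of σ² used in this manual.

```python
import statistics


def pooled_mean_variance(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups from their sizes, means and variances."""
    pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - pooled_mean
    d2 = mean2 - pooled_mean
    pooled_var = (n1 * var1 + n2 * var2 + n1 * d1 ** 2 + n2 * d2 ** 2) / (n1 + n2)
    return pooled_mean, pooled_var


group1 = [2, 4, 6]          # hypothetical data
group2 = [1, 3, 5, 7, 9]    # hypothetical data

mean_p, var_p = pooled_mean_variance(
    len(group1), statistics.mean(group1), statistics.pvariance(group1),
    len(group2), statistics.mean(group2), statistics.pvariance(group2),
)
print(mean_p, var_p)
print(statistics.pvariance(group1 + group2))  # agrees with var_p up to rounding error
```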

Example: Find the mean deviation about AM and hence find the coefficient of mean
deviation.

Daily wages (Rs.)           0-20    20-40    40-60    60-80    80-100    Total

No. of wage earners (f)       2        5       10       10        5      N = 32

Mid value (x)                10       30       50       70       90

fx                           20      150      500      700      450      1820

|x − x̄|                    46.9     26.9      6.9     13.1     33.1

f|x − x̄|                   93.8    134.5     69.0    131.0    165.5      593.8

AM (x̄) = Σfx/N = 1820/32 = 56.9

MD = Σf|x − x̄|/N = 593.8/32 = 18.55

$$\text{Coefficient of MD} = \frac{\text{MD}}{\text{AM}} \times 100 = \frac{18.55}{56.9} \times 100 = 32.6\%$$
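
The same computation in Python, as a minimal sketch with an illustrative function name. It uses the unrounded mean 56.875 rather than 56.9, so intermediate figures may differ from the table in the last decimal place.

```python
def grouped_mean_deviation(mid_values, freqs):
    """Return (mean, mean deviation about the mean) for a frequency distribution."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(mid_values, freqs)) / n_total
    md = sum(f * abs(x - mean) for x, f in zip(mid_values, freqs)) / n_total
    return mean, md


mid_values = [10, 30, 50, 70, 90]   # Rs.
freqs = [2, 5, 10, 10, 5]

mean, md = grouped_mean_deviation(mid_values, freqs)
print(round(mean, 1), round(md, 2), round(100 * md / mean, 1))  # 56.9 18.55 32.6
```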
Example: The runs scored by two batsmen X and Y in 10 innings are given below. Find
out who is the better run scorer and who is the more consistent player.

Innings        1     2     3     4     5     6     7     8     9    10    Total

X             90   110     5    10   125    15    35    16   134    10     550

Y             65    68    52    47    63    25    25    60    55    60     520

x − x̄        35    55   -50   -45    70   -40   -20   -39    79   -45

(x − x̄)²   1225  3025  2500  2025  4900  1600   400  1521  6241  2025   25462

y − ȳ        13    16     0    -5    11   -27   -27     8     3     8

(y − ȳ)²    169   256     0    25   121   729   729    64     9    64    2166


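Series X: $\bar{X} = \sum x / n = 550/10 = 55$

$\sigma_x^2 = \sum (x - \bar{x})^2 / n = 25462/10 = 2546.2$, so $\sigma_x = \sqrt{2546.2} = 50.46$

$$\text{CV} = \frac{\sigma_x}{\bar{X}} \times 100 = \frac{50.46}{55} \times 100 \approx 91.7\%$$

Series Y: $\bar{Y} = \sum y / n = 520/10 = 52$

$\sigma_y^2 = \sum (y - \bar{y})^2 / n = 2166/10 = 216.6$, so $\sigma_y = \sqrt{216.6} = 14.72$

$$\text{CV} = \frac{\sigma_y}{\bar{Y}} \times 100 = \frac{14.72}{52} \times 100 \approx 28.3\%$$

X is the better run scorer because X̄ = 55 is more than Ȳ = 52, and Y is the more consistent
player since the CV of the Y series is less than the CV of the X series.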
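
The comparison can be reproduced with a few lines of Python. The function name is illustrative; the standard deviation uses divisor n, as in the definition of σ given in this manual.

```python
import math


def mean_sd_cv(values):
    """Return (mean, standard deviation with divisor n, coefficient of variation in %)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return mean, sd, 100 * sd / mean


runs_x = [90, 110, 5, 10, 125, 15, 35, 16, 134, 10]
runs_y = [65, 68, 52, 47, 63, 25, 25, 60, 55, 60]

mean_x, sd_x, cv_x = mean_sd_cv(runs_x)
mean_y, sd_y, cv_y = mean_sd_cv(runs_y)
print(mean_x, round(sd_x, 2), round(cv_x, 1))  # 55.0 50.46 91.7
print(mean_y, round(sd_y, 2), round(cv_y, 1))  # 52.0 14.72 28.3
```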

Skewness and Kurtosis:


Skewness: Skewness literally means lack of symmetry. It gives an idea about the shape of
the curve that can be drawn with the help of the given data. The frequency curve of a
skewed distribution is not symmetrical but stretched more to one side than to the other.

Skewness is often measured by the Karl Pearson coefficient, defined as:

$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{SD}} = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$$

Since (Mean − Median)/SD lies between ±1, Sk lies between ±3.


For a symmetrical distribution mean, mode and median are equal and in case of a
positively skewed distribution AM > Median > Mode and for a negatively skewed
distribution AM < Median < Mode.

Skewness may be defined in terms of moments as $\gamma_1 = \pm\sqrt{\beta_1}$ where $\beta_1 = \mu_3^2 / \mu_2^3$ and
$\mu_r = (1/N)\sum f_i (x_i - \bar{x})^r$ is the rth order central moment of the variable X, and the sign of γ1
is the same as that of μ3. If γ1 = 0 the curve is symmetrical, if γ1 < 0 the curve is
skewed to the left (negative skewness), whereas if γ1 > 0 the curve is skewed to the right
(positive skewness). Graphically it can be shown as under:

[Figure: frequency curves of a negatively skewed distribution, a positively skewed
distribution and a symmetrical distribution]


Kurtosis: The flatness or peakedness of the top of the curve is called kurtosis. Kurtosis is
measured in terms of moments and is given by:

$$\beta_2 = \mu_4 / \mu_2^2 \quad \text{and} \quad \gamma_2 = \beta_2 - 3$$

For the normal distribution curve β2 = 3 (or γ2 = 0). The flatness or peakedness of the
curve of a distribution is compared in relation to the normal curve and distributions have
been classified as follows:

If β2 = 3, i.e. γ2 = 0, then the curve is neither peaked nor flat but normal; it is called a
Mesokurtic curve.

If β2 > 3, i.e. γ2 > 0, then the curve is more peaked than normal, i.e. a Leptokurtic curve.

If β2 < 3, i.e. γ2 < 0, then the curve is less peaked than normal, i.e. a Platykurtic curve.
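
The moment-based coefficients of skewness and kurtosis can be computed directly from the central moments. The Python sketch below uses illustrative function names and two small hypothetical samples, one symmetrical and one with a long right tail.

```python
import math


def central_moment(values, r):
    """r-th central moment: mean of (x - mean) ** r."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)


def gamma1_gamma2(values):
    """Return (gamma1, gamma2): moment coefficients of skewness and excess kurtosis."""
    mu2 = central_moment(values, 2)
    mu3 = central_moment(values, 3)
    mu4 = central_moment(values, 4)
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign of gamma1 follows mu3
    beta2 = mu4 / mu2 ** 2
    return gamma1, beta2 - 3


symmetric = [1, 2, 3, 4, 5]          # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 9]    # hypothetical data

g1_sym, g2_sym = gamma1_gamma2(symmetric)
g1_skew, g2_skew = gamma1_gamma2(right_skewed)
print(g1_sym == 0, g2_sym < 0)  # True True: symmetrical and platykurtic
print(g1_skew > 0)              # True: positively skewed
```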

[Figure: Mesokurtic, Leptokurtic and Platykurtic curves]
