DESCRIPTIVE
STATISTICS
MODULE-1 CH-3
BOOK : KEN BLACK 7TH EDITION
MEASURES OF CENTRAL TENDENCY
• Measures of central tendency yield information about
“particular places or locations in a group of numbers.”
• Common Measures of Location
Mean Median Mode
Quartiles Deciles Percentiles
FOR UNGROUPED AND GROUPED DATA
FORMULAS FOR
Mean
Median
Mode
Quartile
MEAN OR ARITHMETIC MEAN
• Mean is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set
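The computation described above can be sketched in a few lines of Python. The function name `arithmetic_mean` and the `scores` list are illustrative assumptions, not part of the slides.

```python
def arithmetic_mean(values):
    """Sum all values and divide by the number of values."""
    if not values:
        raise ValueError("mean is undefined for an empty data set")
    return sum(values) / len(values)


# Hypothetical data, chosen only for illustration.
scores = [4, 8, 15, 16, 23, 42]

print(arithmetic_mean(scores))          # 18.0
# Adding one extreme value pulls the mean far away from the bulk of the data.
print(arithmetic_mean(scores + [500]))
```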
MEDIAN
• Median - middle value in an ordered array of numbers.
– Half the data are above it, half the data are below it
– Mathematically, it’s the (n+1)/2 th ordered observation
• For an array with an odd number of terms, the median is the middle number
– n=11 => (n+1)/2 th = 12/2 th = 6th ordered observation
• For an array with an even number of terms, the median is the average of the middle two numbers
– n=10 => (n+1)/2 th = 11/2 th = 5.5th = average of 5th and 6th ordered observations
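The (n+1)/2 position rule can be sketched as follows. The function name `median_by_position` is an assumption made for this example; the input lists are hypothetical.

```python
def median_by_position(values):
    """Median via the (n+1)/2 th ordered observation rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median is undefined for an empty data set")
    position = (n + 1) / 2              # 1-based position in the ordered array
    lower = ordered[int(position) - 1]  # convert to 0-based index
    if position.is_integer():
        return lower                    # odd n: the middle observation
    upper = ordered[int(position)]
    return (lower + upper) / 2          # even n: average of the two middle observations


print(median_by_position(list(range(1, 12))))  # n=11, 6th observation -> 6
print(median_by_position(list(range(1, 11))))  # n=10, average of 5th and 6th -> 5.5
```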
MODE
• Mode - the most frequently occurring value in a data set
– Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
– Can be used to determine what categories occur most frequently
– Sometimes, no mode exists (no duplicates)
• Bimodal – In a tie for the most frequently occurring value, two modes are listed
• Multimodal – Data sets that contain more than two modes
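A minimal sketch of finding the mode(s) with a frequency count. The function name `modes` is an assumption; following the slide's convention, it returns an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no duplicates, so no mode under the convention above
    return [value for value, count in counts.items() if count == highest]


# Hypothetical data; the mode works for categories as well as numbers.
print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 1, 2, 2, 3]))                  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))                        # []      (no mode)
```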
QUARTILES
❑ Quartiles - measures of central tendency that divide a
group of data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
[Figure: Q1, Q2 and Q3 split the ordered data into four segments of 25% each]
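As a rough illustration, quartiles can be computed with Python's standard `statistics.quantiles` (available from Python 3.8). Several quartile conventions exist, and the textbook's location rule may give slightly different values on some data sets; the data below is hypothetical.

```python
import statistics

# Hypothetical data, chosen so the quartiles fall exactly on observations.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)  # 2.0 4.0 6.0
# Q2 is the median: half of the data lies below it.
print(q2 == statistics.median(data))  # True
```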
Which measure of central tendency is
most appropriate?
• In general, the mean is preferred, since it has
nice mathematical properties (in particular, see
chapter 7)
• The median and quartiles are resistant to
outliers
Consider the following three datasets:
• 1, 2, 3 (median=2, mean=2)
• 1, 2, 6 (median=2, mean=3)
• 1, 2, 30 (median=2, mean=11)
• All have median=2, but the mean is sensitive to
the outliers
In general, if there are outliers, the
median is preferred to the mean
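The three datasets above can be checked directly with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics

# The three datasets listed on the slide.
datasets = [[1, 2, 3], [1, 2, 6], [1, 2, 30]]

# Pair each dataset's median with its mean.
summary = [(statistics.median(d), statistics.mean(d)) for d in datasets]

for d, (med, avg) in zip(datasets, summary):
    # The median stays at 2 while the mean follows the largest value.
    print(d, "median =", med, "mean =", avg)
```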
MEASURES OF VARIABILITY
• Measures of Variability - tools that describe the spread or the
dispersion of a set of data.
• Provide more meaningful information when used together with measures of
central tendency, especially when comparing groups
Common measures of Variability or Spread or Dispersion
• Range
• Inter-quartile Range
• Variance
• Standard Deviation
• Coefficient of Variation
Statistical formulas for:
- Range
- Inter-Quartile Range [IQR]
- Variance
- Standard Deviation
- Coefficient of Variation
RANGE
• RANGE = Largest Value of Data Set – Lowest Value of Data Set
Advantages
• Rigidly Defined: Range is a rigidly defined measure of dispersion, so its value is always fixed.
• Simple And Easy Method: It does not require special mathematical knowledge to compute the range. It can be calculated simply by subtracting the lowest value from the largest value in the series. It is very simple to understand.
• Usefulness: Range can be used for quality control purposes. It is also useful for weather forecasting.
Disadvantages
• Not Based On All Items: Range may not be considered a reliable measure of dispersion because it is not based on all the items in the series.
• Highly Affected: Range is highly affected by sampling fluctuations.
• Range cannot be determined in the case of an open-end class distribution.
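A short sketch of the range calculation. The function name `data_range` and the sample values are assumptions made for illustration; the second call shows how a single extreme observation changes the result, since only the two end values enter the calculation.

```python
def data_range(values):
    """Largest value minus lowest value."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


# Hypothetical data.
sample = [12, 7, 3, 18, 9]

print(data_range(sample))         # 18 - 3 = 15
print(data_range(sample + [90]))  # 90 - 3 = 87
```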
INTER-QUARTILE RANGE
INTER-QUARTILE RANGE = Q3 – Q1
• Inter-quartile Range - range of values between the first and third quartiles
• Range of the “middle half”; middle 50%
– Useful when researchers are interested in the middle 50%, and not the extremes
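A minimal sketch of Q3 – Q1, assuming the quartile convention used by Python's standard `statistics.quantiles` (the textbook's rule may differ slightly on some data). The function name and the data are hypothetical.

```python
import statistics


def interquartile_range(values):
    """Q3 minus Q1, using the default convention of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical data: the second list replaces the largest value with an extreme one.
print(interquartile_range([1, 2, 3, 4, 5, 6, 7]))   # 6.0 - 2.0 = 4.0
print(interquartile_range([1, 2, 3, 4, 5, 6, 70]))  # still 4.0: extremes are ignored
```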
VARIANCE
• Sample Variance - sum of the squared deviations from the arithmetic mean,
divided by n – 1 (roughly the average squared deviation)
• Sample Variance – denoted by s²
X    X – X̄    (X – X̄)²
2,398 625 390,625
1,844 71 5,041
1,539 -234 54,756
1,311 -462 213,444
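The columns of the table above can be reproduced step by step. This sketch assumes the four rows shown are the complete sample and uses the usual n – 1 divisor for a sample variance; the variable names are illustrative.

```python
# The X column from the table above (assumed to be the whole sample).
values = [2398, 1844, 1539, 1311]

n = len(values)
mean = sum(values) / n                      # X-bar = 1773.0

deviations = [x - mean for x in values]     # X - X-bar column
squared = [d * d for d in deviations]       # (X - X-bar)^2 column

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared) / (n - 1)

print(deviations)                 # [625.0, 71.0, -234.0, -462.0]
print(squared)                    # [390625.0, 5041.0, 54756.0, 213444.0]
print(round(sample_variance, 2))  # 221288.67
```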
Standard Deviation
Sample standard deviation is the square root
of the sample variance
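As a quick check of this relationship, Python's standard `statistics` module provides both `variance` and `stdev` for samples. The four values are the X column from the variance slide, assumed here to be the complete sample.

```python
import math
import statistics

# X values from the variance slide (assumed to be the whole sample).
values = [2398, 1844, 1539, 1311]

s_squared = statistics.variance(values)  # sample variance (n - 1 divisor)
s = statistics.stdev(values)             # sample standard deviation

# The standard deviation is the square root of the variance.
print(math.isclose(s, math.sqrt(s_squared)))  # True
print(round(s, 2))                            # 470.41
```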
COEFFICIENT OF VARIATION
•Coefficient of Variation (CV) – measures the volatility of a
value (perhaps a stock portfolio), relative to its mean. It’s the
ratio of the standard deviation to the mean, expressed as a
percentage
•Useful when comparing standard deviations computed from
data with different means
•Measurement of relative dispersions
COEFFICIENT OF VARIATION
Consider two different populations
Since 15.86 > 11.90, the first population is more variable,
relative to its mean, than the second population
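The comparison can be sketched as below. The slide's underlying numbers did not survive in the text, so the means (29 and 84) and standard deviations (4.6 and 10) used here are assumptions, picked only because they reproduce the quoted 15.86 and 11.90. The function name is also illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Assumed inputs (not taken from the slide text).
cv_first = coefficient_of_variation(4.6, 29)
cv_second = coefficient_of_variation(10, 84)

print(f"{cv_first:.2f}")     # 15.86
print(f"{cv_second:.2f}")    # 11.90
print(cv_first > cv_second)  # True: the first population is relatively more variable
```

Note that the second population has the larger standard deviation in absolute terms, yet the smaller spread relative to its mean.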
MEASURES OF SHAPE
Kurtosis Skewness
Box Plot
SKEWNESS
• Symmetrical – the right half is a mirror
image of the left half
• Skewed – shows that the distribution lacks
symmetry; used to denote the data is sparse at
one end, and piled at the other end
– Absence of symmetry
– Extreme values or “tail” in one side of a
distribution
– Positively- or right-skewed vs.
negatively- or left-skewed
Skewness
Karl Pearson’s Coefficient of Skewness
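The formula itself is not preserved in the slide text. A commonly quoted form of Pearson's coefficient is Sk = 3 (mean – median) / standard deviation, and the sketch below assumes that form, with the sample standard deviation as the divisor. The function name and data are illustrative.

```python
import statistics


def pearson_skewness(values):
    """Assumed form: 3 * (mean - median) / sample standard deviation."""
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / s


print(pearson_skewness([1, 2, 3]))           # symmetric: zero
print(pearson_skewness([1, 2, 30]) > 0)      # long right tail: positive
print(pearson_skewness([-30, -2, -1]) < 0)   # long left tail: negative
```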
Kurtosis
• Kurtosis is a statistical measure that defines how heavily the tails of a
distribution differ from the tails of a normal distribution. In other
words, kurtosis identifies whether the tails of a given distribution contain
extreme values.
• Leptokurtic indicates a positive excess kurtosis. The leptokurtic
distribution shows heavy tails on either side, indicating large outliers. In
finance, a leptokurtic distribution shows that the investment returns may be
prone to extreme values on either side. Therefore, an investment whose
returns follow a leptokurtic distribution is considered to be risky.
• A platykurtic distribution shows a negative excess kurtosis. Its tails are
thin (light), so outliers are few and small. In the finance context, a
platykurtic distribution of investment returns is desirable for investors
because there is only a small probability that the investment would
experience extreme returns.
• Data that follows a mesokurtic distribution shows an excess kurtosis of
zero or close to zero. The normal distribution is the reference case: data
that follows a normal distribution is mesokurtic.
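A minimal sketch of excess kurtosis, assuming the simple moment-based (population) definition: the fourth central moment divided by the squared second central moment, minus 3. Statistical packages often apply small-sample corrections, so their values can differ. The function name and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


# Evenly spread values: negative excess kurtosis (platykurtic).
flat = [1, 2, 3, 4, 5]
# Mostly central values with two extremes: positive excess kurtosis (leptokurtic).
heavy = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]

print(excess_kurtosis(flat) < 0)   # True
print(excess_kurtosis(heavy) > 0)  # True
```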
KURTOSIS
Box and Whisker Plot
• Box plots show the five-number summary of a set of data: the
minimum score, first (lower) quartile, median, third (upper) quartile, and
maximum score.
• A Box and Whisker Plot (or Box Plot) is a convenient way of visually
displaying the data distribution through their quartiles. The lines extending
parallel from the boxes are known as the “whiskers”, which are used to
indicate variability outside the upper and lower quartiles.
• Summarizes variation in large datasets visually.
• Shows outliers.
• Compares multiple distributions.
• Indicates symmetry and skewness to a degree.
• Simple to sketch.
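The five-number summary behind a box plot can be sketched with the standard library. The quartiles follow the default convention of `statistics.quantiles`, which may differ slightly from the textbook's rule; the function name and data are assumptions for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return (ordered[0], q1, median, q3, ordered[-1])


# Hypothetical data, deliberately unsorted.
print(five_number_summary([7, 1, 4, 2, 6, 3, 5]))  # (1, 2.0, 4.0, 6.0, 7)
```

The box spans Q1 to Q3 with a line at the median; in the simplest form of the plot, the whiskers run out to the minimum and maximum.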
BOX PLOT