Measures of Dispersion: Standard Deviation and Skewness
Standard Deviation
Important Terminology
• Deviation : The deviation is a measure that is used to find the
difference between the observed value and the expected value
of a variable. In simple words, the deviation is the distance
from the center point.
• Dispersion : Statistical dispersion is the extent to which
numerical data vary about an average value. It describes how
widely the values of a variable are spread. Dispersion can be
measured with statistics such as the range, variance and
standard deviation.
Concept of Standard Deviation
• In descriptive statistics, the standard deviation is the degree of
dispersion or scatter of the data points relative to the mean. It measures
how far the data points deviate from the mean and so describes how the
values are spread over the data sample. Formally, it is the square root of
the average squared distance of the values from the mean, which can be read
as a typical distance of a value from the mean.
• The standard deviation, usually denoted by the Greek letter σ (small
sigma), was first suggested by Karl Pearson as a measure of dispersion
in 1893.
• Standard deviation measures the dispersion of a data set about its mean
value. It is always measured from the arithmetic mean, is never negative,
and is expressed in the same units as the data. It is calculated as the
square root of the variance, that is, the square root of the mean of the
squared deviations of the data points from the arithmetic mean.
It is a statistical tool used for judging the reliability (consistency) of data.
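The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `population_sd` and both data series are hypothetical, and the population form (dividing by n) is assumed.

```python
import math

def population_sd(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    # Variance is the average of the squared deviations from the arithmetic mean.
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

# Hypothetical data: both series have mean 50 but very different spread.
clustered = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

print(population_sd(clustered))   # small value: points sit close to the mean
print(population_sd(spread_out))  # larger value: points lie far from the mean
```

Both series share the same mean, so the difference in the two results comes only from how widely the values are scattered.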
The upper curve is more spread out and therefore has a higher standard
deviation, while the lower curve is more clustered around the mean and
therefore has a lower standard deviation.

A standard deviation (or σ) is a measure of how dispersed the data are in
relation to the mean. A high standard deviation shows that the data are
widely spread (less reliable). A low standard deviation shows that the data
are clustered closely around the mean (more reliable).
For Example

High Standard Deviation
A high standard deviation (large relative to the mean) indicates that data
points are spread out over a wider range, signifying high variability.
This could be “bad” in situations where you want low variance but “good”
when you are looking for a high degree of diversity or dispersion in your
data.

In the stock market, a high standard deviation of a stock’s price would
mean that the price is highly volatile, leading to a higher risk and
potential for significant returns.

Low Standard Deviation
A low standard deviation (small relative to the mean) suggests that the
data points tend to be closer to the mean, indicating low variance. This
might be considered “good” in contexts where consistency or predictability
is desired.

For instance, in a production line, a low standard deviation in the size of
produced items signifies high consistency and quality control.

Similarly, a mutual fund with a low standard deviation of returns would
indicate less risk, as the returns are relatively stable and do not deviate
much from the average return.
• A low standard deviation indicates that the values tend to be
close to the mean (also called the expected value) of the set,
while a high standard deviation indicates that the values are
spread out over a wider range.
Applications of Standard Deviation
• Standard deviation helps businesses quantify and manage various types
of risks. By calculating the standard deviation of certain outcomes,
businesses can assess the volatility or uncertainty associated with how
they operate.
• Standard deviation is used to analyze financial data and assess the
variability of financial performance metrics. Standard deviation is
employed to measure the volatility of investment returns.
• Standard deviation helps businesses identify seasonality, trends, and
patterns in sales data that allow them to plan for cash needs in the near
future.
• Standard deviation is also used in quality control processes such as Six
Sigma methodologies to measure process capability, reduce defects, and
optimize manufacturing processes for improved quality and customer
satisfaction.
Formula for Standard Deviation: Grouped Data
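The formula itself is not reproduced in this text. Since the problems below all quote an assumed mean, the standard textbook step-deviation form is given here as a likely match, not as a transcription of the slide. The symbols are assumptions of this sketch: A is the assumed mean, i the class width, m a class mid-point, f its frequency, and N the total frequency.

```latex
\begin{align*}
d &= \frac{m - A}{i}, \qquad N = \sum f \\
\bar{x} &= A + \frac{\sum f d}{N} \times i \\
\sigma &= i \times \sqrt{\frac{\sum f d^{2}}{N} - \left(\frac{\sum f d}{N}\right)^{2}}
\end{align*}
```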
Coefficient of variation
• Coefficient of variation is a relative measure of dispersion that is used
to determine the variability of data. It is a useful statistic for
comparing the degree of variation from one data series to another,
even if the means are drastically different from one another.
• The CV for a model describes the model fit in terms of the relative
sizes of the residuals and the outcome values. The lower the CV, the
smaller the residuals relative to the predicted value, which suggests a
good model fit. In finance, the coefficient of variation allows
investors to determine how much volatility, or risk, is assumed in
comparison to the amount of return expected from investments.
• The relative measure of dispersion based on the standard deviation,
given by Karl Pearson, is called the coefficient of standard deviation or
the coefficient of variation (CV). The formula to compute CV is:
CV = (σ / x̄) × 100, where σ is the standard deviation and x̄ is the
arithmetic mean.
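To illustrate why the coefficient of variation is useful when means differ widely, here is a small Python sketch. The function name and both series are hypothetical, and the population standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: (population SD / arithmetic mean) * 100."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Hypothetical series with very different means.
series_a = [10, 12, 8, 11, 9]            # mean 10
series_b = [1000, 1100, 900, 1050, 950]  # mean 1000

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(round(cv_a, 2), round(cv_b, 2))
```

Series B has the larger standard deviation in absolute terms, yet its CV is smaller, so relative to its own mean it is the more consistent of the two.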
Problems
Q.1 : From the following frequency distribution of the heights of 360 young boys in the
age group of 15-20 years, find the arithmetic mean, standard deviation and coefficient of
variation. (Take the assumed mean as 143.)
Answers (S.P. Gupta and M.P. Gupta): AM = 145.53, SD = 10.295
Assumed mean: 227.5. Answers (S.P. Gupta and M.P. Gupta): AM = 227.5, SD = 8.732
Assumed mean: 64.5. Answer (S.P. Gupta): AM = 68.1, SD = 12.505
Note: assumed mean is 22.5
Note: assumed mean is 2200
Assumed mean: 1000. CV of A = 21.63, CV of B = 23.41
Assumed mean: 17,500
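The frequency tables for these problems are not reproduced in this text, so the working cannot be shown on the original data. As a sketch of the assumed-mean (step-deviation) method on a made-up table, the following Python may help. The function name, the class mid-points and the frequencies are all hypothetical.

```python
import math

def grouped_mean_sd(midpoints, freqs, assumed_mean, width):
    """Mean and SD of grouped data by the step-deviation method."""
    n = sum(freqs)
    # Step deviations d = (m - A) / i for each class mid-point.
    d = [(m - assumed_mean) / width for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = assumed_mean + (sum_fd / n) * width
    sd = width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd

# Hypothetical table: classes 0-10, 10-20, ..., 40-50.
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 16, 6]

mean, sd = grouped_mean_sd(midpoints, freqs, assumed_mean=25, width=10)
cv = sd / mean * 100
print(round(mean, 2), round(sd, 3), round(cv, 2))
```

The choice of assumed mean changes only the intermediate sums, not the final mean or standard deviation, which is why each problem is free to use a convenient mid-point.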
Skewness
Concept of Skewness
• Skewness describes how far the distribution of a set of data departs from symmetry,
that is, from a shape in which the values are equally divided on each side of the
centre. It quantifies the degree to which the data deviate from a perfectly
symmetrical distribution, such as a normal (bell-shaped) distribution.
• "Measures of skewness tell us the direction and the extent of skewness. In
symmetrical distribution the mean, median and mode are identical. The more the
mean moves away from the mode, the larger the asymmetry or skewness." - Simpson
& Kafka
• Pearson, K. (1894,1895) introduced a coefficient of skewness, known as the
coefficient, based on calculations of the centered moments
• If the distribution is symmetric, it has a skewness of 0 and its Mean = Median =
Mode
• Skewness measures how much the distribution of a data set departs from symmetry. A
negative skewness value means the distribution is negatively skewed (longer tail on
the left), and a positive value means the distribution is positively skewed (longer
tail on the right). The larger the absolute value, the stronger the skew.
• Symmetrical Distribution. It is clear from the diagram (a) that in
a symmetrical distribution the values of mean, median and mode
coincide. The spread of the frequencies is the same on both
sides of the center point of the curve.
• Asymmetrical Distribution. A distribution, which is not
symmetrical, is called a skewed distribution and such a
distribution could either be positively skewed or negatively
skewed as would be clear from the diagrams (b) and (c)
• Positively Skewed Distribution. In a positively skewed
distribution the value of the mean is the greatest and that of the
mode the least; the median lies between the two, as is clear from
diagram (b).
• Negatively Skewed Distribution. Diagram (c) shows the shape of a
negatively skewed distribution. In a negatively skewed distribution
the value of the mode is the greatest and that of the mean the least;
the median lies between the two. In a positively skewed distribution
the frequencies are spread out over a greater range of values on the
high-value end of the curve.
Standard Value of Measurement of Skewness
If the value of skewness is:
• between -0.5 and 0.5, the distribution is almost symmetrical.
• between -1 and -0.5, the data are negatively skewed, and between
0.5 and 1, the data are positively skewed. In both cases the
skewness is moderate.
• lower than -1 (negatively skewed) or greater than 1 (positively
skewed), the data are highly skewed.
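These bands can be written as a small lookup, sketched here in Python. The function name is hypothetical, and the handling of values falling exactly on -1, -0.5, 0.5 or 1 is an assumption, since the rule of thumb does not say which band a boundary value belongs to.

```python
def describe_skewness(sk):
    """Classify a skewness value using the rule-of-thumb bands."""
    # ASSUMPTION: boundary values are assigned to the milder band.
    if sk < -1:
        return "highly negatively skewed"
    if sk < -0.5:
        return "moderately negatively skewed"
    if sk <= 0.5:
        return "almost symmetrical"
    if sk <= 1:
        return "moderately positively skewed"
    return "highly positively skewed"

for value in (-1.4, -0.68, -0.014, 0.75, 2.0):
    print(value, describe_skewness(value))
```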
Formula for Skewness
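The formula is not reproduced in this text. The answers recorded for the problems below are consistent with Karl Pearson's coefficient of skewness, so that form is given here as the likely one. For example, (4845.5 - 4850.56) / 365.9 is about -0.014, and 3 × (110.93 - 116) / 22.3043 is about -0.68. The symbol Sk_p is a label chosen for this sketch.

```latex
\begin{align*}
Sk_p &= \frac{\bar{x} - \text{Mode}}{\sigma}
      && \text{(mode well defined)} \\
Sk_p &= \frac{3\,(\bar{x} - \text{Median})}{\sigma}
      && \text{(mode ill defined)}
\end{align*}
```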
Problems : When MODE is well defined
Sample Problem
Assumed Mean: 4899.5; AM = 4845.5, Mode = 4850.56, SD = 365.9, Sk = -0.014
Assumed Mean: 4700; AM = 4781.5, Mode = 4866.67, SD = 340.4, Sk = -0.25
Assumed Mean: 35; AM = 33.156, Mode = 32.63, SD = 17.08, Sk = 0.031
Problems : When MODE is ill defined, use the median
Σf = 300, Σfd = 778, Σfd² = 3510
Assumed Mean: 85; AM = 110.93, Median = 116, SD = 22.3043, Sk = -0.6819
Assumed Mean: 35; AM = 38.93, Median = 40, SD = 22.8, Sk = -0.141
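As a check on the recorded answers, the totals given for the problem with assumed mean 85 (300, 778 and 3510, read here as Σf, Σfd and Σfd²) can be run through the step-deviation method and Karl Pearson's coefficient. This Python sketch rests on two assumptions: the function names are hypothetical, and the class width of 10 is not stated in the notes but inferred because it reproduces the recorded mean.

```python
import math

def pearson_skew_mode(mean, mode, sd):
    """Karl Pearson's coefficient when the mode is well defined."""
    return (mean - mode) / sd

def pearson_skew_median(mean, median, sd):
    """Karl Pearson's coefficient when the mode is ill defined."""
    return 3 * (mean - median) / sd

# Totals and median as recorded in the notes.
n, sum_fd, sum_fd2 = 300, 778, 3510
assumed_mean = 85
median = 116
class_width = 10  # ASSUMPTION: inferred, not stated in the notes

mean = assumed_mean + (sum_fd / n) * class_width
sd = class_width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
sk = pearson_skew_median(mean, median, sd)
print(round(mean, 2), round(sd, 2), round(sk, 2))

# Mode-based check using the first well-defined-mode answer.
print(round(pearson_skew_mode(4845.5, 4850.56, 365.9), 3))
```

To two decimal places the results agree with the recorded AM, SD and Sk. Small differences in later decimals come from rounding the mean before computing the coefficient.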