Chapter 3
Measures of Dispersion
Notes:
Measures of dispersion (also called measures of variability) are statistical tools used to describe
how spread out or scattered the values in a dataset are. They provide insight into the degree of
variation or inconsistency in the data.
Measures of dispersion:
1. Range:
- Definition: The difference between the maximum and minimum values in a dataset.
- Formula: Range = Maximum - Minimum
- Use: Simple and gives a quick sense of how spread out the data is. However, it is sensitive to
outliers.
2. Interquartile Range (IQR):
- Definition: The range of the middle 50% of the data, or the difference between the 75th
percentile (Q3) and the 25th percentile (Q1).
- Formula: IQR = Q3 - Q1
- Use: More resilient than the range because it focuses on the middle of the data and is less
influenced by outliers.
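As a quick sketch (standard library only, with made-up numbers), the range and IQR can be computed like this:

```python
# Range and IQR for a small numeric sample (illustrative data).
import statistics

data = [4, 8, 15, 16, 23, 42]

value_range = max(data) - min(data)   # Range = Maximum - Minimum

# statistics.quantiles with n=4 returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # IQR = Q3 - Q1

print("Range:", value_range)          # 38
print("IQR:", iqr)
```

Note that different quartile conventions (here, the `statistics` module's default "exclusive" method) can give slightly different IQR values for small samples.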
3. Variance:
- Definition: The average of the squared differences between each data point and the mean.
- Formula: σ² = Σ(xi − μ)² / N
- Use: Measures how much the data points deviate from the mean but in squared units, making
it less intuitive for interpretation.
4. Standard Deviation (SD):
- Definition: The square root of the variance. It indicates the average amount by which the
data points differ from the mean.
- Formula: SD = √[Σ(xi − μ)² / N]
- Use: Commonly used because it is in the same units as the original data and provides a clear
measure of spread, especially for normally distributed data.
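A minimal sketch of the variance and SD definitions above, cross-checked against the standard library (the data are made up):

```python
# Population variance (average squared deviation) and SD "by hand".
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                               # 5.0

variance = sum((x - mean) ** 2 for x in data) / len(data)  # squared units
sd = math.sqrt(variance)                                   # back to original units

assert variance == statistics.pvariance(data)              # 4.0
assert sd == statistics.pstdev(data)                       # 2.0
```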
5. Mean Absolute Deviation (MAD):
- Definition: The average of the absolute differences between each data point and the mean.
- Formula: MAD = Σ|xi − μ| / N
- Use: Shows the average distance of each data point from the mean, less sensitive to extreme
values compared to variance and SD.
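The mean absolute deviation follows directly from the definition (same made-up data as could be used for the variance):

```python
# Mean Absolute Deviation: average distance of each point from the mean.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                         # 5.0
mad = sum(abs(x - mean) for x in data) / len(data)   # 1.5
```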
6. Coefficient of Variation (CV):
- Definition: The ratio of the standard deviation to the mean, expressed as a percentage.
- Formula: CV = (SD / mean) × 100%
- Use: Useful for comparing variability between datasets with different units or scales.
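Since the CV divides the SD by the mean, it is unit-free, which is what makes cross-dataset comparison possible. A sketch with illustrative data:

```python
# Coefficient of Variation: SD as a percentage of the mean (unit-free).
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                       # 5.0
cv = statistics.pstdev(data) / mean * 100          # 40.0 (%)
```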
7. Index of Qualitative Variation (IQV):
- Definition: A measure used to assess the dispersion of nominal (categorical) data.
- Formula: IQV = K(N² − ∑nj²) / [N²(K − 1)]
- Use: It indicates how evenly distributed observations are across categories in a nominal
dataset.
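A sketch of the IQV computation, using the standard formula IQV = K(N² − ∑nj²) / [N²(K − 1)] on made-up category counts:

```python
# Index of Qualitative Variation for nominal (categorical) counts.
counts = {"A": 5, "B": 3, "C": 2}   # nj = number of cases in each category
N = sum(counts.values())            # total number of cases (10)
K = len(counts)                     # number of categories (3)

iqv = K * (N**2 - sum(n**2 for n in counts.values())) / (N**2 * (K - 1))
# 0 = all cases in one category; 1 = cases spread evenly across categories
```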
8. Inter-Decile Range (IDR):
- Definition: The range between the 90th percentile and the 10th percentile.
- Formula: IDR = P90 − P10
- Use: Like IQR, it focuses on the central portion of the data but covers a broader range (80%
of the data).
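A sketch of the IDR, using the standard library's decile cut points (illustrative data; decile conventions vary slightly between implementations):

```python
# Interdecile Range: spread of the middle 80% of the data.
import statistics

data = list(range(1, 21))                    # 1..20, made up for illustration
deciles = statistics.quantiles(data, n=10)   # nine cut points D1..D9
idr = deciles[8] - deciles[0]                # D9 - D1, i.e. P90 - P10
```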
9. Entropy:
- Definition: A measure of uncertainty or disorder, often used for categorical or nominal data.
- Formula: H = −∑ pi [log2(pi)]
- Use: Captures the unpredictability in a dataset, especially for nominal data.
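A sketch of Shannon entropy on nominal counts (the categories and counts are made up). An even spread gives the maximum entropy log2(K); concentration in one category drives entropy toward 0:

```python
# Shannon entropy H = -sum(pi * log2(pi)) for categorical counts.
import math

def shannon_entropy(counts):
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)

even = {"A": 1, "B": 1, "C": 1, "D": 1}    # maximally spread: H = log2(4) = 2 bits
skewed = {"A": 7, "B": 1}                  # concentrated: H well below 1 bit
```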
Review Questions:
1. When might we prefer to use an entropy measure of dispersion rather than an IQV? Rather
than a standard deviation?
Entropy (like Shannon entropy) measures uncertainty or disorder in categorical data,
where there may be many categories with varying probabilities. Use entropy when you
want to capture the unpredictability or diversity in a dataset, especially for nominal data.
Index of Qualitative Variation (IQV) is also designed for nominal data, but it gauges
dispersion relative to the maximum possible when cases are spread evenly across
categories, a benchmark entropy does not require.
Standard deviation is appropriate for interval and ratio data, where numerical distances
between values matter. It would not be suited for nominal data because nominal variables
lack meaningful distances between categories.
2. In the formula IQV = K(N² − ∑nj²) / [N²(K − 1)], what do N, nj, and K stand for? What
does this formula give us?
In this formula, N is the total number of cases, nj is the number of cases in category j, and K
is the number of categories. The formula for the Index of Qualitative Variation (IQV)
essentially tells us how spread out or varied cases are across different categories. The IQV
ranges from 0 to 1:
0 means no variation (all cases fall into one category).
1 means maximum variation (cases are evenly spread across all categories).
The IQV is a rescaled version of the Index of Diversity, where the denominator is set to ensure
this 0-1 range. This makes it easier to interpret as a measure of diversity or variation.
In short, the formula gives a straightforward way to quantify how diverse the distribution of
cases is across categories, where higher values indicate greater diversity.
3. Suppose we see the formula −∑ pi [log2(pi)]. What do pi and log2 stand for? If we calculate
this quantity, what will it tell us?
Here, pi stands for the proportion of cases falling in category i, and log2 is the logarithm
to base 2. The formula calculates the Shannon entropy, which measures the uncertainty or disorder
in a dataset. High entropy means the data is more spread out across different categories,
making it unpredictable. On the other hand, low entropy indicates that the data is more
concentrated in fewer categories, making it more predictable.
4. Suppose someone said to you that there is no measure of dispersion for nominal (or ordinal)
variables because dispersion is meaningless when we cannot tell how far apart categories are.
What might you say in reply?
We can reply that there are measures of dispersion for nominal and ordinal variables,
even without knowing the distances between categories. For nominal data, measures like
the Index of Diversity, Index of Qualitative Variation (IQV), and Entropy show how
spread out the data is across categories. For ordinal data, while precise distances can't be
specified, we can use the Interquartile Range (IQR), Interdecile Range (IDR), and
Median Absolute Deviation (MAD) to show how much of the data falls between
specific points. The IQR and IDR are commonly used, representing the range between the
middle 50% and 80% of the data, respectively.
Explanation:
We might reply that there are, in fact, measures that do not require knowing the distances
between categories, but still give valuable information about how spread out or concentrated
the data is across these categories. For example, dispersion in nominal variables can be
measured through the Index of Diversity, which tells us how likely it is that two cases,
drawn at random, will come from different categories, Index of Qualitative Variation
(IQV), or through Entropy (a measure of the absolute extent of diversity that is present).
With ordinal data, although we cannot specify precise distances between categories, we can
say how much of the sample lies between particular values. For these purposes, we could use
the Interquartile Range (IQR), the Interdecile Range (IDR), and the Median Absolute
Deviation (MAD). The most widely used measures for this purpose are the Interquartile and
the Interdecile Ranges. Respectively, these give us the range between the upper and lower
quartiles, and between the upper and lower deciles. Quartiles, unsurprisingly, are points that
divide an ordered distribution into quarters. Deciles are points dividing an ordered set of
cases into tenths.
5. In what way are the IQV and the entropy measure complementary?
Unlike the IQV, the entropy measure does not assess dispersion in relation to a maximum.
Instead, it measures the absolute extent of diversity present. Because of this difference,
the two are complementary and can be used together to highlight different aspects of the
data. Although entropy is calculated differently from the IQV or Index of Diversity, all
three measure the dispersion of nominal variables, and entropy is often well correlated
with the others.
Explanation:
The Index of Qualitative Variation (IQV) and the entropy measure are complementary because
they each highlight different aspects of diversity in categorical data.
- IQV gives us the amount of dispersion relative to the maximum possible variation.
It tells us how close a distribution is to being perfectly diverse or homogenous.
- Entropy measures the absolute extent of diversity without comparing it to a
theoretical maximum. It focuses on how much uncertainty or unpredictability exists within the
distribution.
Because the IQV focuses on relative dispersion and entropy measures absolute diversity, using
both together provides a fuller picture of how diverse or varied the data is. While they are
calculated differently, they often correlate well and can reinforce each other in showing the
degree of variation present.
In other words, IQV is about how far a distribution is from being maximally diverse, while
entropy quantifies the diversity in absolute terms, making them useful together for a more
complete analysis.
6. What measures of dispersion are commonly suggested for ordinal variables? Why, for truly
ordinal variables, may it be safer just to report key percentiles?
With ordinal data, while exact distances between categories cannot be measured, we can
still describe how much of the sample falls between certain values. The Interquartile
Range (IQR) and Interdecile Range (IDR) are widely used to capture the spread of the
middle 50% and 80% of the data, respectively. Another option sometimes suggested for
ordinal data is the MAD (the Median Absolute Deviation). Since ordinal data does not
assume equal distances between categories, reporting key percentiles often provides a
clearer and more accurate picture.
Explanation:
Interquartile Range (IQR) and Interdecile Range (IDR) are commonly suggested
because they focus on the middle part of the distribution, which is suitable for ordinal
data. Another option sometimes suggested for ordinal data is the MAD (the Median
Absolute Deviation).
For truly ordinal variables, reporting key percentiles (e.g., the 25th, 50th, and 75th) might
be safer because they give a clear sense of how the values are distributed across ordered
categories without assuming equal intervals between ranks.
7. Suppose that the IDR for the final grades in a course in social statistics was found to be 21.
What would this tell us about the distribution of grades? If the MAD was 10, what would this tell
us?
If the Interdecile Range (IDR) for the final grades is 21, it suggests that the middle 80%
of students' grades are spread across a 21-point range. This indicates moderate variability
in the distribution of grades.
If the Median Absolute Deviation (MAD) is 10, it means that the typical deviation from
the median grade is 10 points. This means that most grades cluster around the median
with some moderate spread. Combining these two measures shows both the overall range
and the central concentration of grades.
8. What measure of dispersion is typically suggested for ratio variables, and when is it liable to
be misleading?
For (interval or) ratio variables, the most widely used measure of dispersion is the
standard deviation (SD). It leverages the defined intervals between observations by calculating
the average distance of each value from the mean. The SD is expressed in the original units of
the variable and has four key advantages: it accounts for the precise intervals between data
points, uses information from all cases, is comparable across different samples, and is in
meaningful units. However, because the SD squares deviations from the mean, outliers can have
a significant impact, making it unstable when extreme values are present.
9. What is the formula for a standard deviation? Why is there a square root sign in the formula?
The standard deviation is SD = √[Σ(xi − μ)² / N]. The square root is there because the
deviations from the mean are squared before being averaged; taking the square root of the
result returns the measure to the original units of the variable.
10. Explain the meaning of the symbols in the formula for the SD.
In the formula, xi stands for each individual data point, μ is the mean of all the values, N is
the number of cases, and Σ indicates that the squared deviations are summed across all cases.
11. Briefly state the advantages of the standard deviation.
The standard deviation:
accounts for the precise intervals between data points
uses information from all cases
is comparable across different samples
is in meaningful units.
12. When might we prefer to use an IQR rather than a standard deviation?
The IQR is preferred when the data are skewed or contain outliers, as it focuses on the
middle 50% of the data and is less sensitive to extreme values.
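This outlier sensitivity is easy to demonstrate: adding one extreme value inflates the SD dramatically while leaving the IQR untouched (made-up data; the "inclusive" quartile method is used here so both samples share the same cut points):

```python
# SD vs IQR under a single outlier.
import statistics

no_outlier = [1, 2, 3, 4, 5, 6]
with_outlier = [1, 2, 3, 4, 5, 60]   # same data, one extreme value

def iqr(data):
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

# The SD jumps from roughly 1.7 to roughly 21.3;
# the IQR stays at 2.5 for both samples.
```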
13. Why do many researchers prefer the SD to the IQR or IDR (or MAD) when they have ordinal
data?
One reason to use the SD with ordinal data is that measures often recommended, the IQR,
IDR, and MAD, may show major changes when the shifts in the data are modest, or no changes
when the shifts in the data are major. The SD responds much more smoothly to changes in the
data than the IQR, the IDR, and the MAD. Many use the SD with ordinal data for that reason, as
long as they can accept that the distances between categories are not too seriously uneven.
14. What are the mean and SD of a z-score? What are two ways z-scores can be helpful?
A z-score is a standardized form of a variable that allows for easy comparison across
different datasets. The mean of a z-score distribution is zero, and its standard deviation (SD) is
one.
Z-scores can be helpful because:
- they allow us to compare the shape of two distributions without being distracted by
differing means and SDs.
- they let us see how far a given case lies from the mean by expressing the distance in
standard deviations.
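Standardizing any dataset this way yields a mean of zero and an SD of one, which a short sketch can verify (illustrative data, population SD):

```python
# z-scores: z = (x - mean) / SD; the result has mean 0 and SD 1.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)        # 5.0
sd = statistics.pstdev(data)        # 2.0

z = [(x - mean) / sd for x in data]
# e.g. the value 9 has z = 2.0: it lies two SDs above the mean
```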
15. If a variable is normally distributed, what percentage of observations will lie within
approximately two SDs of the mean? Within one SD?
If a variable is normally distributed, approximately 68% of cases will lie within one
standard deviation (SD) of the mean, and 95% will lie within 1.96 SDs of the mean. Often, a
two-SD approximation is used for simplicity when discussing the 95% range.
16. What is the “empirical rule”?
The empirical rule states that in a normal distribution (in practice, even if not normal,
many distributions tend to follow the same rule), approximately:
68% of data falls within 1 SD of the mean.
95% falls within 2 SDs.
99.7% falls within 3 SDs.
Explanation:
The empirical rule states that many distributions encountered in practice, even if they are
not perfectly normal, tend to follow similar patterns regarding the spread of data. Specifically,
about 95% of the data typically lies within two standard deviations (SDs) of the mean, and
around 99.7% lies within three SDs of the mean. This rule provides a useful guideline for
understanding the distribution of data in various contexts.
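For a truly normal distribution, the exact fraction of observations within k SDs of the mean is erf(k/√2), so the empirical-rule percentages can be checked directly with the standard library's error function:

```python
# Exact normal-distribution coverage behind the empirical rule.
import math

def within_k_sds(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

# within_k_sds(1) ~ 0.6827, within_k_sds(2) ~ 0.9545, within_k_sds(3) ~ 0.9973
```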
17. If a variable is strictly continuous and unimodal, what percentage of observations will lie
within two SDs of the mean?
If a distribution is strictly continuous and unimodal, with a definable standard deviation,
then no more than about 11.1% of the observations can lie further than two standard deviations
(SDs) from the mean. Conversely, this means that approximately 88.9% of the observations will
fall within that range. This principle helps to quantify the spread of data in continuous unimodal
distributions.
Explanation:
In a strictly continuous and unimodal distribution—which means the distribution has a single
peak and no gaps—there are certain expectations about where most of the data will fall. If the
distribution has a definable standard deviation (SD), it helps us understand how spread out the
data is around the mean (the average value).
1. Within Two Standard Deviations: About 88.9% of the data points will lie within two
standard deviations from the mean. This means that if you look at the range from two
SDs below the mean to two SDs above the mean, you will find that most of the data
points (almost 9 out of 10) fall within this range.
2. Outside Two Standard Deviations: The remaining 11.1% of data points will lie outside
this range—meaning they are either much lower than two SDs below the mean or much
higher than two SDs above the mean.
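The 11.1% figure quoted above matches the Vysochanskij–Petunin inequality, which bounds the mass lying more than k SDs from the mean of any unimodal distribution by 4/(9k²) (valid for k > √(8/3)). A small check at k = 2:

```python
# Vysochanskij-Petunin tail bound for unimodal distributions.
def vp_tail_bound(k):
    # At most 4/(9*k^2) of a unimodal distribution lies more than
    # k SDs from the mean (valid for k > sqrt(8/3) ~ 1.63).
    return 4 / (9 * k**2)

outside = vp_tail_bound(2)   # 1/9, i.e. "no more than about 11.1%"
inside = 1 - outside         # 8/9, i.e. "approximately 88.9%"
```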