03_WEEK2_Statistics_Part2 (2)

The document covers statistical measures of location, including mean, median, and mode, as well as their calculations for both sample and population data. It also introduces percentiles, quartiles, and exploratory data analysis techniques such as the five-number summary and box plots. Examples are provided to illustrate the computation of these statistics using apartment rent data.

Uploaded by

Alma Cseh

Statistics

Week 2
Part 2

Unit 8: Measures of Location

Unit 9: Percentiles and Exploratory Data Analysis
Unit 8 Measures of Location

■ Mean
■ Weighted Mean
■ Median
■ Mode
Measures of Location

■ Measures of location indicate at what numerical values certain characteristic points of the distribution are located.
■ If the measures are computed using data from a sample, they are called sample statistics.
■ If the measures are computed using data for a population, they are called population parameters.
■ A sample statistic is referred to as the point estimator of the corresponding population parameter.
Mean

■ The mean of a data set is the measure commonly referred to as the average of all the data values.
■ The sample mean $\bar{y}$ is the point estimator of the population mean $\mu$.
Sample Mean $\bar{y}$

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

The numerator is the sum of the values of the n observations; n is the number of observations in the sample.
Population Mean $\mu$

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

The numerator is the sum of the values of the N observations; N is the number of observations in the population.
Sample Mean

■ Example: Apartment Rents

Seventy apartments were randomly sampled in a small university town. The monthly rents for these apartments are listed below in ascending order.
Sample Mean

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Sample Mean

$$\bar{y} = \frac{\sum y_i}{n} = \frac{34{,}356}{70} = 490.80$$
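
The sample mean of the apartment rents can be checked with a few lines of Python. This is a minimal sketch; the variable names (`RENTS`, `sample_mean`) are illustrative and not part of the slides.

```python
# Monthly rents for the 70 sampled apartments, copied from the slide.
RENTS = [int(v) for v in """
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
""".split()]

n = len(RENTS)            # number of observations in the sample
total = sum(RENTS)        # sum of the values of the n observations
sample_mean = total / n   # y-bar = (sum of y_i) / n

print(n, total, f"{sample_mean:.2f}")  # 70 34356 490.80
```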

Weighted Mean

■ When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
■ When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
■ For example, in the computation of a grade point average (GPA) in U.S. universities, the weights are the number of credit hours earned for each grade.
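
The GPA example can be sketched in Python as follows. The grade points and credit hours are made-up values chosen only for illustration, and the function name `weighted_mean` is an assumption, not something defined in the slides. The sketch also shows that dividing each weight by the total weight (giving relative weights that sum to 1) leads to the same result.

```python
def weighted_mean(values, weights):
    """Return sum(W_i * y_i) / sum(W_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * y for w, y in zip(weights, values)) / total_weight

# Hypothetical transcript: grade points and the credit hours of each course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 2]

gpa = weighted_mean(grade_points, credit_hours)

# Same result using relative weights w_i = W_i / sum(W_j).
relative = [w / sum(credit_hours) for w in credit_hours]
gpa_relative = sum(w * y for w, y in zip(relative, grade_points))

print(f"{gpa:.3f}", f"{gpa_relative:.3f}")  # 3.111 3.111
```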
Weighted Mean

$$\bar{y} = \frac{\sum_i W_i y_i}{\sum_i W_i} = \sum_i w_i y_i$$

where:
$y_i$ = value of observation i
$W_i$ = weight for observation i
$w_i$ = relative weight for observation i, with $w_i = W_i / \sum_j W_j$

Grouped Data

■ The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
■ To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
■ We compute a weighted mean of the class midpoints using the class frequencies as weights.
■ Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
Mean for Grouped Data

■ Sample Data

$$\bar{y} \approx \frac{\sum_{i=1}^{k} f_i M_i}{n}$$

■ Population Data

$$\mu \approx \frac{\sum_{i=1}^{k} f_i M_i}{N}$$

where:
k = number of classes
$f_i$ = frequency of class i
$M_i$ = midpoint of class i
Sample Mean for Grouped Data

Given below is the previous sample of monthly rents for 70 apartments, presented here as grouped data in the form of a frequency distribution.
Rent (€) Frequency
420-439 8
440-459 17
460-479 12
480-499 8
500-519 7
520-539 4
540-559 2
560-579 4
580-599 2
600-619 6
Sample Mean for Grouped Data

Rent (€) fi Mi fiMi
420-439 8 430 3440
440-459 17 450 7650
460-479 12 470 5640
480-499 8 490 3920
500-519 7 510 3570
520-539 4 530 2120
540-559 2 550 1100
560-579 4 570 2280
580-599 2 590 1180
600-619 6 610 3660
Total 70 34560

$$\bar{y} \approx \frac{34{,}560}{70} = 493.71$$

This approximation differs by €2.91 from the actual sample mean of €490.80.
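
The grouped-data calculation in the table can be reproduced with a short Python sketch. The midpoints and frequencies are taken from the table; the name `FREQ_TABLE` is illustrative.

```python
# (class midpoint M_i, class frequency f_i) pairs from the frequency table.
FREQ_TABLE = [
    (430, 8), (450, 17), (470, 12), (490, 8), (510, 7),
    (530, 4), (550, 2), (570, 4), (590, 2), (610, 6),
]

n = sum(f for _, f in FREQ_TABLE)                 # total number of observations
weighted_sum = sum(m * f for m, f in FREQ_TABLE)  # sum of f_i * M_i
grouped_mean = weighted_sum / n                   # approximate sample mean

actual_mean = 490.80  # sample mean computed from the ungrouped rents
print(n, weighted_sum, f"{grouped_mean:.2f}")     # 70 34560 493.71
print(f"{grouped_mean - actual_mean:.2f}")        # 2.91
```

The gap between the two means arises because every rent in a class is replaced by the class midpoint.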
Median

■ The median of a data set is the value in the middle when the data items are arranged in ascending order.
■ Whenever a data set has extreme values, the median is the preferred measure of central location.
■ The median is the measure of location most often reported for annual income and property value data.
■ A few extremely large incomes or property values can inflate the mean.
Median

■ For an odd number of observations:

• Position of the median: i = (n+1)/2

• Value of the median: $Me = y_i$

26 18 27 12 14 27 19 (7 observations)

12 14 18 19 26 27 27 (in ascending order)

The median is the middle value.

Median = 19
Median

■ For an even number of observations:

• Position of the median: i = (n+1)/2

• Value of the median: $Me = (y_{i-0.5} + y_{i+0.5})/2$

26 18 27 12 14 27 30 19 (8 observations)

12 14 18 19 26 27 27 30 (in ascending order)

The median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
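
Both cases of the median rule can be sketched in one Python function. The function name `median` is illustrative; the two data sets are the ones used on the slides. Note that Python lists are indexed from 0, so position i on the slides corresponds to index i-1 in the code.

```python
def median(values):
    """Median by the slide's rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2, i.e. zero-based index n // 2.
        return ordered[mid]
    # Even n: average of positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([26, 18, 27, 12, 14, 27, 19]))      # 19
print(median([26, 18, 27, 12, 14, 27, 30, 19]))  # 22.5
```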

Median

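For the n = 70 apartment rents, the position of the median is i = (70+1)/2 = 35.5, so the median is the average of the 35th and 36th data values:

Median = (475 + 475)/2 = 475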
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Mode

■ The mode of a data set is the value that occurs with greatest frequency.
■ The greatest frequency can occur at two or more different values.
■ If the data have exactly two modes, the data are bimodal.
■ If the data have more than two modes, the data are multimodal.
Mode

450 occurred most frequently (7 times)

Mode = 450
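
A possible way to find the mode in Python is to count each value and keep every value that reaches the highest count, which also covers the bimodal and multimodal cases. The function name `modes` and the small bimodal list are illustrative assumptions; the rent data are copied from the slides.

```python
from collections import Counter

RENTS = [int(v) for v in """
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
""".split()]

def modes(values):
    """Return (sorted list of all modes, their shared frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

print(modes(RENTS))            # ([450], 7)
print(modes([1, 1, 2, 2, 3]))  # ([1, 2], 2), a bimodal example
```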

Percentiles

■ Percentiles (or quantiles) of a data set are cut-off values that separate the lower p% of the data from the upper (100-p)%.
■ The pth percentile, $Q_{p\%}$, is a value such that at least p% of the data items take on a value less than or equal to $Q_{p\%}$ and at least (100 - p)% of the data items take on a value greater than or equal to $Q_{p\%}$.
■ A well-chosen set of percentiles provides information about how the data are spread over the interval from the smallest to the largest value.
■ Admission test scores for colleges and universities are frequently reported in terms of percentiles.
Percentiles

Arrange the data in ascending order.

Compute index i, the position of the pth percentile.

$$i = \frac{p}{100}\, n$$

If i is not an integer, round it up. The pth percentile is the value in position $\lceil i \rceil$.

$$Q_{p\%} = y_{\lceil i \rceil}$$

If i is an integer, the pth percentile is the average of the values in positions i and i+1.

$$Q_{p\%} = \frac{y_i + y_{i+1}}{2}$$
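
The percentile procedure above can be sketched in Python as follows. This is one possible implementation, assuming p is a whole number strictly between 0 and 100; the function name `percentile` is illustrative. Integer arithmetic is used to decide whether i is an integer, which avoids floating-point rounding surprises.

```python
RENTS = [int(v) for v in """
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
""".split()]

def percentile(values, p):
    """pth percentile by the slide's rule (p: whole number, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    if (p * n) % 100 == 0:
        # i = (p/100) n is an integer: average positions i and i+1 (1-based).
        i = p * n // 100
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not an integer: round up and take the value in that position.
    i = (p * n + 99) // 100  # integer ceiling of p * n / 100
    return ordered[i - 1]

print(percentile(RENTS, 90))  # 585.0
print(percentile(RENTS, 75))  # 525
```

Statistical libraries often use other interpolation rules by default, so their percentiles may differ slightly from the values obtained with this rule.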
90th Percentile

i = (p/100)n = (90/100)×70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585

“At least 90% of the items take on a value of 585 or less.” 63/70 = 0.9 or 90%

“At least 10% of the items take on a value of 585 or more.” 7/70 = 0.1 or 10%

■ Quartiles are specific percentiles.

■ First Quartile = 25th Percentile
■ Second Quartile = 50th Percentile = Median
■ Third Quartile = 75th Percentile
Third Quartile

Third quartile = 75th percentile

i = (p/100)n = (75/100)×70 = 52.5, which is not an integer, so it is rounded up to position 53.
Third quartile = 53rd data value = 525

Exploratory Data Analysis

■ Five-Number Summary
■ Box Plot
Five-Number Summary

1 Smallest Value

2 First Quartile

3 Median

4 Third Quartile

5 Largest Value
Five-Number Summary

Smallest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
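
The five-number summary for the rent data can be reproduced with the following sketch, which applies the percentile rule from this unit to p = 25, 50, and 75. The helper names `percentile` and `five_number_summary` are illustrative assumptions, and p is assumed to be a whole number between 0 and 100 (exclusive).

```python
RENTS = [int(v) for v in """
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
""".split()]

def percentile(ordered, p):
    """pth percentile of an already sorted list (whole-number p, 0 < p < 100)."""
    n = len(ordered)
    if (p * n) % 100 == 0:
        i = p * n // 100              # integer index: average positions i, i+1
        return (ordered[i - 1] + ordered[i]) / 2
    i = (p * n + 99) // 100           # non-integer index: round up
    return ordered[i - 1]

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    ordered = sorted(values)
    return (
        ordered[0],
        percentile(ordered, 25),
        percentile(ordered, 50),
        percentile(ordered, 75),
        ordered[-1],
    )

print(five_number_summary(RENTS))  # (425, 445, 475.0, 525, 615)
```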
Box Plot

■ A box is drawn with its ends located at the first and third quartiles.
■ A vertical line is drawn in the box at the location of the median (= second quartile).

[Figure: box drawn above an axis running from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked.]
Box Plot

■ The interquartile range (IQR) is the difference between the third and the first quartiles.

IQR = Q3 – Q1 = 525 – 445 = 80

■ Limits are located (not drawn) using the interquartile range (IQR).
■ The normal range is defined as the interval between the lower and the upper limits.
■ Data outside these limits are considered outliers.
Box Plot

■ The lower limit is located 1.5×IQR below Q1.

Lower Limit: Q1 – 1.5×IQR = 445 – 1.5×80 = 325

■ The upper limit is located 1.5×IQR above Q3.

Upper Limit: Q3 + 1.5×IQR = 525 + 1.5×80 = 645

■ The normal range is the interval between the lower and the upper limits: [325, 645].
■ There are no outliers (values outside the normal range) in the apartment rent data.
Box Plot

■ Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values within the normal range.
■ The location of each outlier is shown by a suitable symbol, e.g. an asterisk (*).

[Figure: box plot with whiskers above an axis running from 375 to 625. Smallest value in normal range = 425; largest value in normal range = 615.]
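
The box plot quantities for the rent data (IQR, limits, outliers, and whisker ends) can be checked with a short Python sketch. The quartiles Q1 = 445 and Q3 = 525 are taken from the slides; the function name `box_plot_limits` and the parameter `k` are illustrative assumptions.

```python
RENTS = [int(v) for v in """
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
""".split()]

def box_plot_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

q1, q3 = 445, 525                      # quartiles from the slides
lower, upper = box_plot_limits(q1, q3)

# Values outside the normal range [lower, upper] are outliers.
outliers = [v for v in RENTS if v < lower or v > upper]

# Whiskers end at the most extreme values still inside the normal range.
in_range = [v for v in RENTS if lower <= v <= upper]
whisker_low, whisker_high = min(in_range), max(in_range)

print(q3 - q1, lower, upper)      # 80 325.0 645.0
print(outliers)                   # []
print(whisker_low, whisker_high)  # 425 615
```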
End of Unit 9
