Lecture 3 Numerical Measures of Data

Uploaded by

nicklin0419


Lecture 3.

Numerical Measures of Data

AGEC 2001 Statistics I

Feng-An Yang¹

¹ Department of Agricultural Economics

National Taiwan University

Fall Semester

1/36
Outline
Measures of Location
Mean
Median
Mode
Shape of a distribution
Measures of Variation
Range
Variance and Standard Deviation
Coefficient of Variation
Grouped Data
Measures of Position
Percentile
Location of Percentile
Quartile and Decile
Box plot
2/36
Measures of Location

Measures of Location
Numerical measures used to describe the central tendency of the
data
I Common measures of location
I Mean
I Median
I Mode

3/36
Mean

Mean
A numerical average of a set of numbers
I Arithmetic Mean
I Weighted Mean
I Geometric Mean

Example
I The mean height of AGEC students is 172 cm.
I The mean weight of AGEC students is 55.3 kg.

4/36
Arithmetic Mean

Arithmetic Mean
Arithmetic mean is the simplest and the most widely used measure
of mean, and it is the sum of all the numbers in a dataset divided
by the number of observations in that dataset

Population Mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

I µ is the population mean

I N is the number of observations in the population
I $x_i$ is the value of the i-th observation

5/36
Arithmetic Mean

Sample Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

I x̄ is the sample mean

I n is the number of observations in the sample

Example
{90,77,94,89,119,112,91,110,92,100,113,83}
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{90+77+\cdots+83}{12} = \frac{1{,}170}{12} = 97.5$$
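The sample mean calculation above can be reproduced with a short Python sketch. The function name `arithmetic_mean` is an illustrative choice, not part of the lecture.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Dataset from the example above.
scores = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]
print(sum(scores))              # 1170
print(arithmetic_mean(scores))  # 97.5
```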

6/36
Arithmetic Mean

Properties of the Arithmetic Mean

I All values in the dataset are used in the calculation of mean
I The mean is unique
I The sum of the deviations from the mean is zero
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

Example
{3,7,5}, x̄ = 5
$$\sum_{i=1}^{n} (x_i - \bar{x}) = (3 - 5) + (7 - 5) + (5 - 5) = 0$$

7/36
Arithmetic Mean

Properties of the Arithmetic Mean (cont’d)

I The mean can be affected by extreme values

Example
I A={1,2,3,4,5}, x̄A = 3
I B={1,2,3,4,100}, x̄B = 22

8/36
Median

Median
The midpoint of all values in a dataset

Steps for finding the median

I Sort the data in ascending (or descending) order
I If the number of observations n is odd, the median is at position (n + 1)/2
I Example: {11, 17, 25, 38, 60}. The median is 25
I If the number of observations is even, the median is the
simple average of the two middle numbers
I Example: {11, 17, 25, 38, 60, 65}. The median is (25 + 38)/2 = 31.5
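The steps for finding the median can be sketched in Python as follows. The function name `median` is illustrative; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Median following the slide: sort, then take the middle value(s)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even n: simple average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([11, 17, 25, 38, 60]))      # 25
print(median([11, 17, 25, 38, 60, 65]))  # 31.5
```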

9/36
Median

Median
I The median is less sensitive to extreme values
I The median is unique

Example
I A={1,2,3,4,5}, x̄A = 3, median=3
I B={1,2,3,4,100}, x̄B = 22, median=3

10/36
Mode

Mode
The value that appears most often in a dataset
I The mode is less sensitive to extreme values
I There may be multiple modes

Steps for finding the mode

I Organize the data and make a frequency table
I The mode is the value(s) with highest frequency

11/36
Mode

Example
{4,4,4,3,100,3,1,3,5,2,2,5,6,1,2,2,3,7,
1,3,7,8,1,4,7,5,2,2,5,1,1,3,3,1,2}

Value   Frequency
1       7
2       7
3       7
4       4
5       4
6       1
7       3
8       1
100     1

I The modes are 1, 2, and 3
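A minimal sketch of the two steps (build a frequency table, then pick the value or values with the highest frequency), using `collections.Counter` from the Python standard library. The function name `modes` is an illustrative choice.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    if not values:
        raise ValueError("the mode of an empty dataset is undefined")
    counts = Counter(values)          # frequency table: value -> count
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Dataset from the example above.
data = [4, 4, 4, 3, 100, 3, 1, 3, 5, 2, 2, 5, 6, 1, 2, 2, 3, 7,
        1, 3, 7, 8, 1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2]
print(modes(data))  # [1, 2, 3]
```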

12/36
Shape of a distribution

Skewness
Skewness is a measure of the asymmetry of a data distribution

[Figure: three density curves with the positions of the mean, median, and mode marked. (a) Left-skewed: Mean < Median. (b) Symmetric: Mean = Median = Mode. (c) Right-skewed: Mean > Median.]

13/36
Measures of Variation
Measures of Variation
Numerical measures used to describe the spread of data
I Common measures of variation
I Range
I Variance and Standard Deviation
I Coefficient of Variation

Why study dispersion?

Measures of location describe the central tendency of the data,
but they tell us nothing about its variability. Two data
distributions can have the same central tendency
but quite different variability
[Figure: distribution curves plotted against x]
14/36
Range

Range
The difference between the largest and the smallest values in a
dataset
Range = Maximum value - Minimum value

Example
{7,8,13,15,27,30}, Range=30-7=23

Issues
I It can be affected by extreme values
I {7,8,13,15,27,30}, Range=30-7=23
I {7,8,13,15,27,130}, Range=130-7=123
I It tells nothing about how data are distributed

15/36
Variance

Variance
The arithmetic mean of the squared deviations from the mean

Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

I $\sigma^2$ is the population variance

I $x_i$ is the value of the i-th observation
I µ is the population mean
I N is the number of observations in the population

16/36
Variance

Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

I $s^2$ is the sample variance

I x̄ is the sample mean

Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

17/36
Variance

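$$
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right)}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}
\end{aligned}
$$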
18/36
Variance
Example

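x       x²      x − x̄    (x − x̄)²
12      144     −5       25
20      400     3        9
16      256     −1       1
18      324     1        1
19      361     2        4
Total   85      1485     0        40

With n = 5 and x̄ = 85/5 = 17:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{40}{5-1} = 10$$

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{1485 - 5 \times 17^2}{5-1} = 10$$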
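The two equivalent forms of the sample variance (the definition and the shortcut form) can be checked against each other with a small Python sketch. The function names are illustrative, not from the lecture.

```python
import math


def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


data = [12, 20, 16, 18, 19]
print(sample_variance(data))             # 10.0
print(sample_variance_shortcut(data))    # 10.0
print(math.sqrt(sample_variance(data)))  # sample standard deviation
```

In floating-point arithmetic the shortcut form can lose precision when the mean is large relative to the spread, so the definition is usually the safer one to compute with.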
19/36
Variance

Properties of Variance
I Variance and standard deviation can never be negative
I Variance and standard deviation do not depend on the
location of data
I The more concentrated the data are, the smaller the variance
and standard deviation
I What if there is no variation in the data, i.e., all values are the
same?

[Figure: distribution curves plotted against x]
20/36
Empirical Rule

Empirical Rule
For a symmetrical, bell-shaped distribution, approximately 68%,
95%, and 99.7% of the observations lie within plus and minus one,
two, and three standard deviations of the mean, respectively
I Pr(µ − σ ≤ X ≤ µ + σ) ≈ 68%
I Pr(µ − 2σ ≤ X ≤ µ + 2σ) ≈ 95%
I Pr(µ − 3σ ≤ X ≤ µ + 3σ) ≈ 99.7%
[Figure: bell curve centered at µ with the central 68%, 95%, and 99.7% regions marked between −1σ and 1σ, −2σ and 2σ, and −3σ and 3σ]
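For an exactly normal distribution, the three percentages in the empirical rule can be computed from the error function, since P(µ − kσ ≤ X ≤ µ + kσ) = erf(k/√2). The sketch below assumes normality, which is stronger than the "symmetrical, bell-shaped" condition on the slide; the function name `normal_coverage` is illustrative.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```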

21/36
Chebyshev’s Theorem

Chebyshev’s Theorem
For any set of observations (sample or population), the proportion
of values that lie within k standard deviations of the mean is at
least $1 - \frac{1}{k^2}$, where k is any value greater than 1

Example
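The average height of AGEC students is 170 cm and the
corresponding standard deviation is 10 cm. At least what percentage of
students have heights within 3 standard deviations of the
mean (between 140 cm and 200 cm)?
$$1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 1 - \frac{1}{9} \approx 0.89$$
So at least about 89% of students fall in this range.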
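Chebyshev's bound is simple enough to tabulate for a few values of k. This is a minimal sketch; the function name `chebyshev_lower_bound` is an illustrative choice.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 2))
# 1.5 0.56
# 2 0.75
# 3 0.89
```

Unlike the empirical rule, this bound holds for any distribution, which is why it is weaker: at k = 2 it guarantees only 75%, compared with roughly 95% for a bell-shaped distribution.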

22/36
Coefficient of Variation

Coefficient of Variation (CV)

The coefficient of variation is a standardized measure of dispersion
of a data distribution, expressed as a percentage
I $CV = \frac{s}{\bar{x}} \times 100\%$
s is the sample standard deviation and x̄ is the sample mean
I It quantifies the variability relative to the mean and facilitates
the comparison of variability among data distributions with
different units or significantly different means

23/36
Coefficient of Variation
Example

Pollutant   Mean         Standard Deviation   CV
PM2.5       100 µg/m³    10 µg/m³             10%
Ozone       50 ppm       10 ppm               20%

Relative to its mean, the ozone concentration is more variable than the
PM2.5 concentration

Example

Company   Mean Production   Standard Deviation   CV
A         10000             10                   0.1%
B         50                10                   20%

Companies A and B have the same standard deviation in production, but
company B's production is far more variable relative to its mean
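The CV values in both tables can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is illustrative, and the guard against a zero mean is an added assumption, since the ratio is undefined there.

```python
import math


def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return 100 * std_dev / mean


# (label, mean, standard deviation) rows from the two example tables.
rows = [("PM2.5", 100, 10), ("Ozone", 50, 10), ("A", 10000, 10), ("B", 50, 10)]
for label, mean, sd in rows:
    print(label, coefficient_of_variation(sd, mean))
```

Because the units cancel in the ratio, the PM2.5 value (µg/m³) and the ozone value (ppm) can be compared directly.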
24/36
Arithmetic Mean of Grouped data

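Mean
$$\bar{x} = \frac{\sum f \times M}{n}$$
I f is the frequency in each class
I M is the midpoint in each class
I n is the total number of observations (the sum of the frequencies); the sum runs over the classes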

Example
Point    Frequency (f)   Midpoint (M)   f × M
0-10     5               5              25
10-20    1               15             15
20-30    3               25             75
30-40    4               35             140
40-50    2               45             90
Total    15                             345

$$\bar{x} = \frac{\sum f \times M}{n} = \frac{345}{15} = 23$$

25/36
Standard Deviation of Grouped data

Standard Deviation
$$s = \sqrt{\frac{\sum f (M - \bar{x})^2}{n-1}}$$

Example
Point    Frequency (f)   Midpoint (M)   f × M   (M − x̄)   (M − x̄)²   f(M − x̄)²
0-10     5               5              25      −18       324        1620
10-20    1               15             15      −8        64         64
20-30    3               25             75      2         4          12
30-40    4               35             140     12        144        576
40-50    2               45             90      22        484        968
Total    15                             345                          3240

$$s = \sqrt{\frac{\sum f (M - \bar{x})^2}{n-1}} = \sqrt{\frac{3240}{14}} \approx 15.21$$
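The grouped-data mean and standard deviation can be computed together from the class boundaries and frequencies. The sketch below is one way to do it; the function name and the `(lower, upper, frequency)` tuple layout are illustrative assumptions.

```python
import math


def grouped_mean_and_sd(classes):
    """classes: list of (lower, upper, frequency) tuples, one per class."""
    n = sum(f for _, _, f in classes)  # total number of observations
    if n < 2:
        raise ValueError("need at least two observations")
    # Each class is represented by its midpoint M.
    weighted = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(f * m for m, f in weighted) / n
    squared_dev = sum(f * (m - mean) ** 2 for m, f in weighted)
    return mean, math.sqrt(squared_dev / (n - 1))


# Classes from the example tables above.
classes = [(0, 10, 5), (10, 20, 1), (20, 30, 3), (30, 40, 4), (40, 50, 2)]
mean, sd = grouped_mean_and_sd(classes)
print(mean)          # 23.0
print(round(sd, 2))  # 15.21
```

Both results are approximations to the raw-data statistics, because every observation in a class is treated as if it sat at the class midpoint.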

26/36
Measures of Position

Measures of Position
Numerical measures used to divide data into equal parts
I Common measures of Position
I Quartile
I Decile
I Percentile

27/36
Percentile

Percentile
A percentile is a value below which a given percentage of the
observations in a dataset fall

Example
I The 87th percentile is 90 and it indicates that 87% of
observations are below 90

28/36
Location of Percentile

Steps for finding the pth percentile

I 1. Sort the data in ascending order
I 2. Multiply p percent by the number of observations n in the
data. Call the resulting number the index i = (p/100) × n
I 3. Check the index from Step 2.
I If i is a whole number, the pth percentile is the simple
average of the ith value and the (i + 1)th value in the
ordered data
I Otherwise, round the index up to the next whole number ⌈i⌉.
The pth percentile is the ⌈i⌉th value in the ordered data

Note
There are other ways to determine a percentile, such as the
nearest-rank method and the linear interpolation method

29/36
Location of Percentile

Example
{43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87,
88, 89, 93, 95, 96, 98, 99, 99}
I Suppose we want to find the 60th percentile. Index
i = 60/100 × 25 = 15
I The 60th percentile is then the simple average between the
15th value and 16th value
I P60 = (79 + 85)/2 = 82

30/36
Location of Percentile

Example
{34, 42, 51, 65, 69, 74, 78, 84, 85, 85, 86, 87}
I Suppose we want to find the 80th percentile. Index
i = 80/100 × 12 = 9.6
I Since the index is not a whole number, we round it up to 10.
Then the 80th percentile is at the 10th position in the
ordered data
I P80 = 85
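The percentile steps from this lecture can be sketched in Python as below. The function name `percentile` is illustrative, and the sketch assumes p is an integer strictly between 0 and 100 so that the whole-number check on the index can be done exactly with integer arithmetic. Library routines such as `numpy.percentile` default to a different (interpolation) rule and may give different answers.

```python
def percentile(values, p):
    """pth percentile using the index rule i = (p / 100) * n from the slides.

    Assumes p is an integer with 0 < p < 100.
    """
    if not values:
        raise ValueError("percentiles of an empty dataset are undefined")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # i = p * n / 100; divmod gives its whole part and tells us if it is exact.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the ith and (i + 1)th values (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round i up; the 1-based position ceil(i) is index `whole`.
    return ordered[whole]


first = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87,
         88, 89, 93, 95, 96, 98, 99, 99]
second = [34, 42, 51, 65, 69, 74, 78, 84, 85, 85, 86, 87]
print(percentile(first, 60))   # 82.0
print(percentile(second, 80))  # 85
```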

31/36
Quartile and Decile

Quartiles
I The first quartile is called Q1 and it is equal to the 25th
percentile, indicating that 25% of observations are below it
I The second quartile is called Q2 and it is equal to the 50th
percentile. It is also simply the median that splits the data in
half
I The third quartile is called Q3 and it is equal to the 75th
percentile, indicating that 75% of observations are below it
I Interquartile range = Q3 − Q1

Deciles
In a similar fashion to Quartiles, Deciles are nine values that divide
the data into ten equal parts

32/36
Box plot

Box plot
I A box plot is a graphical representation of the distribution of
a data set
I It displays the median, quartiles, and potential outliers of the
data, providing a visual summary of its central tendency and
spread
I Also known as a box-and-whisker plot

33/36
Box plot
Components of a Box Plot
I Box
I The central box represents the interquartile range (IQR), which
includes the middle 50% of the data
I The edges of the box are the first quartile (Q1) and the third
quartile (Q3)
I Median Line
I A line inside the box represents the median (the 50th
percentile), which divides the data into two equal halves
I Whiskers
I Whiskers extend from the edges of the box to the minimum
and maximum values within a defined range, typically 1.5
times the IQR from Q1 and Q3
I They show the spread of the data outside the middle 50%
I Outliers
I Data points that fall outside the whiskers are considered
outliers and are often marked with individual points or symbols
34/36
Box plot
Min and Max as the boundary

I Let’s consider an example where we have exam scores for a
group of students
I 55, 60, 65, 70, 72, 75, 78, 80, 83, 85, 88, 90, 92, 95, 100
I Summaries
I Minimum: 55
I Q1 (First Quartile): 70
I Median (Q2): 80
I Q3 (Third Quartile): 90
I Maximum: 100

[Figure: box plot with whiskers ending at 55 and 100, box edges at 70 and 90, and the median line at 80]

35/36
Box plot
1.5 IQR as the boundary

I 30,50,51,53,53,54,54,58,59,60,61,62,62,64,65,67,68,69,80,90
I Summaries
I Minimum: 30
I Q1 (First Quartile): 53.5
I Median (Q2): 60.5
I Q3 (Third Quartile): 66
I Maximum: 90
I Lower and upper bound
I Interquartile Range (IQR) = Q3 - Q1 = 66 - 53.5 = 12.5
I Lower Bound = 53.5 - 1.5 × 12.5 = 34.75
I Upper Bound = 66 + 1.5 × 12.5 = 84.75
I Outliers: 30 and 90

[Figure: box plot of the data above with the 1.5 × IQR bounds]
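The quartiles, the 1.5 × IQR bounds, and the outliers can be computed directly from the 20 listed values. The sketch below reuses the percentile rule from this lecture (index i = (p/100) × n), which gives Q1 = 53.5 as in the summary list; the function names are illustrative, and other quartile conventions would shift the bounds slightly.

```python
def percentile(values, p):
    """pth percentile by the lecture's index rule; p is an integer, 0 < p < 100."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


def box_plot_summary(values):
    """Quartiles, 1.5 * IQR bounds, and the points lying outside those bounds."""
    q1, q2, q3 = (percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in sorted(values) if x < lower or x > upper]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}


data = [30, 50, 51, 53, 53, 54, 54, 58, 59, 60,
        61, 62, 62, 64, 65, 67, 68, 69, 80, 90]
summary = box_plot_summary(data)
print(summary["q1"], summary["median"], summary["q3"])  # 53.5 60.5 66.0
print(summary["lower"], summary["upper"])               # 34.75 84.75
print(summary["outliers"])                              # [30, 90]
```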
36/36
