Statistics
Frequency Distribution
A frequency distribution is a tabular summary of data showing the number (frequency)
of observations in each of several nonoverlapping categories or classes.
Relative Frequency and Percent Frequency Distributions
A relative frequency distribution gives a tabular summary of data showing the relative frequency
for each class. A percent frequency distribution summarizes the percent frequency of the data for
each class.
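The three distributions above can be built from raw categorical data in a few lines. This is a minimal sketch: the function name `frequency_table` and the sample categories are hypothetical and do not come from the notes.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(observations)
    n = sum(counts.values())
    return {cat: (f, f / n, 100 * f / n) for cat, f in counts.items()}

# Hypothetical categorical sample (illustration only, not data from the notes).
sample = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]
table = frequency_table(sample)
for category, (freq, rel, pct) in sorted(table.items()):
    print(f"{category}: frequency={freq}, relative={rel:.2f}, percent={pct:.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any table built this way.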
Bar Charts and Pie Charts
A bar chart is a graphical display for depicting
categorical data summarized in a frequency, relative frequency, or percent
frequency distribution. On one axis of the chart (usually the horizontal axis),
we specify the labels that are used for the classes (categories).
The pie chart provides another graphical display for
presenting relative frequency and percent frequency distributions
for categorical data.
For example, because a circle contains 360 degrees and Coca-Cola shows a
relative frequency of .38, the sector of the pie chart labeled Coca-Cola
consists of .38(360) = 136.8 degrees.
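The sector computation can be sketched as a one-line function. The name `sector_degrees` is hypothetical; only the relative frequency .38 comes from the example.

```python
def sector_degrees(relative_frequency):
    """Central angle, in degrees, of the pie-chart sector for one class."""
    return relative_frequency * 360

coca_cola_angle = sector_degrees(0.38)
print(round(coca_cola_angle, 1))  # 136.8
```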
Chapter 2 / Section 2.2:
Summarizing Data for a Quantitative Variable:
* Frequency Distribution
Note:
This definition holds for quantitative as well as categorical data. However, with quantitative data we
must be more careful in defining the nonoverlapping classes to be used in the frequency distribution.
Steps:
1. Determine the number of nonoverlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Number of classes: Classes are formed by specifying ranges that will be used to group the data. As
a general guideline, we recommend using between 5 and 20 classes.
Width of the classes: As a general guideline, we recommend that the width be the same for each class.
$\text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$   (2.2)
The approximate class width given by equation (2.2) can be rounded up to a more convenient value.
Example: 9.28 might be rounded to 10.
Class limits: Class limits must be chosen so that each data item belongs to one and only one class.
Example:
class width = (33 − 12)/5 = 4.2, rounded up to 5
Note:
class width=15-10=5
class width=20-15=5
class width=25-20=5
class width=30-25=5
class width=35-30=5
Class midpoint: the value halfway between the lower and upper class limits.
Table columns: Audit Time (days), Frequency, Relative Frequency, Percent Frequency, Class Midpoint
The class midpoint is the same whether it is obtained from the class limits or from the class boundaries.
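The three steps (number of classes, class width, class limits) and the class midpoints can be sketched as follows. The smallest value 12, the largest value 33, the 5 classes and the first lower limit 10 are taken from the audit time example; the function name `build_classes`, the use of a plain ceiling for "rounding up", and the assumption of integer-valued data are illustrative choices, not part of the notes.

```python
import math

def build_classes(smallest, largest, n_classes, first_lower_limit):
    """Return (width, [(lower, upper, midpoint), ...]) for integer-valued data."""
    approx_width = (largest - smallest) / n_classes
    width = math.ceil(approx_width)  # assumption: round up to the next integer
    classes = []
    for k in range(n_classes):
        lower = first_lower_limit + k * width
        upper = lower + width - 1  # integer data: the next class starts at upper + 1
        classes.append((lower, upper, (lower + upper) / 2))
    return width, classes

width, classes = build_classes(smallest=12, largest=33, n_classes=5, first_lower_limit=10)
for lower, upper, midpoint in classes:
    print(f"{lower}-{upper}: midpoint {midpoint}")
```

With these inputs the sketch reproduces the classes 10–14, 15–19, 20–24, 25–29 and 30–34 with a common width of 5.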
Histogram
A common graphical display of quantitative data is a histogram. This graphical display can
be prepared for data previously summarized in either a frequency, relative frequency, or
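percent frequency distribution.
In our example, the audit time classes are stated as 10–14, 15–19, 20–24, 25–29, and 30–34,
which leaves one-unit gaps of 14 to 15, 19 to 20, 24 to 25, and 29 to 30 between the classes.
These gaps are eliminated when the histogram is constructed, so adjacent bars touch.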
Histogram types: Symmetric, Skewed to the left and Skewed to the right
Frequency polygon: Symmetric, Skewed to the left and Skewed to the right
Frequency Ogives: Symmetric, Skewed to the left and Skewed to the right
Cumulative Distributions
Exercise:
Consider the following frequency distribution (given in black).
Construct the Relative Frequency, Percent Frequency, Cumulative Frequency,
Relative Cumulative Frequency, Percent Cumulative Frequency, Class Midpoint,
and Class Width columns.
Table columns: Class Limit, Frequency, Relative Frequency, Percent Frequency, Cumulative Frequency,
Relative Cumulative Frequency, Percent Cumulative Frequency, Class Midpoint, Class Width
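The columns requested in the exercise can be derived from the frequency column alone. The sketch below uses made-up frequencies (the exercise's own data are not reproduced here), and the function name and dictionary keys are hypothetical.

```python
from itertools import accumulate

def cumulative_columns(frequencies):
    """Build relative, percent and cumulative columns from class frequencies."""
    n = sum(frequencies)
    rows = []
    for f, cf in zip(frequencies, accumulate(frequencies)):
        rows.append({
            "frequency": f,
            "relative": f / n,
            "percent": 100 * f / n,
            "cumulative": cf,                # running total of frequencies
            "cumulative_relative": cf / n,
            "cumulative_percent": 100 * cf / n,
        })
    return rows

# Made-up frequencies for illustration only.
rows = cumulative_columns([5, 10, 15, 20])
for row in rows:
    print(row)
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency equals 1.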
Measures of Location
(Percentiles, and Quartiles)
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values from a specific population.
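Note: The mean is a measure of central tendency. Thus, the mean must lie
between the lowest and highest values.
The mean is sometimes referred to as the arithmetic mean.
Case of values:
$\bar{x} = \frac{\sum x_i}{n}$
Example: Compute the mean of 46, 54, 42, 46, 32.
$\bar{x} = \frac{46 + 54 + 42 + 46 + 32}{5} = \frac{220}{5} = 44$
Example: Compute the mean of 46, 114, 42, 46, 32.
$\bar{x} = \frac{46 + 114 + 42 + 46 + 32}{5} = \frac{280}{5} = 56$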
Example
Compute the median of the values 46, 54, 42, 32, 46.
Start by arranging the values in ascending order:
value   32    42    46    46    54
order   1st   2nd   3rd   4th   5th
Median = 46
Example
Compute the median of the values 13, 8, 44, 32, 34, 10.
Start by arranging the values in ascending order:
value   8     10    13    32    34    44
order   1st   2nd   3rd   4th   5th   6th
Median = $\frac{13 + 32}{2} = \frac{45}{2} = 22.5$
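The mean and median examples can be checked with a short script. This is a sketch using the sample values from the examples; `arithmetic_mean` is a hypothetical helper name, and `median` comes from Python's standard `statistics` module.

```python
from statistics import median

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

first = [46, 54, 42, 46, 32]
with_large_value = [46, 114, 42, 46, 32]   # same data with 54 replaced by 114
even_sample = [13, 8, 44, 32, 34, 10]

mean_first = arithmetic_mean(first)              # 44.0
mean_large = arithmetic_mean(with_large_value)   # 56.0
median_first = median(first)                     # 46
median_large = median(with_large_value)          # 46
median_even = median(even_sample)                # 22.5
```

Replacing 54 by 114 moves the mean from 44 to 56 while the median stays at 46, which illustrates how a single extreme value affects the mean but not the median.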
Mode (M)
Case of values:
The mode is another measure of central tendency.
The mode is the value that occurs with the greatest frequency
(the value that occurs most often in a data set is called the mode).
Example
Compute the mode of the values 46, 54, 42, 32, 46.
Mode = 46 (unimodal)
Example
Compute the mode of the values 46, 54, 42, 32, 46, 54.
Mode = 46 and 54 (bimodal)
Note: The median and the mode are two measures of central tendency
that are not affected by outliers (an advantage).
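The unimodal and bimodal examples can be reproduced with `multimode` from Python's standard `statistics` module (available from Python 3.8), which returns every value that ties for the greatest frequency. A minimal sketch:

```python
from statistics import multimode

unimodal = multimode([46, 54, 42, 32, 46])
bimodal = multimode([46, 54, 42, 32, 46, 54])

print(unimodal)  # [46]
print(bimodal)   # [46, 54]
```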
Case of values
The mean
The mean of the values is the average: $\bar{x} = \frac{\sum x_i}{n}$
The mode
The mode is the value that occurs with the greatest frequency.
The median
The median is the value in the middle when the data are arranged in ascending order (smallest value
to largest value).
Weighted Mean
The mean: case of frequency table
$n = \sum f_i$
$\bar{x} = \frac{\sum x_i f_i}{n}$
$x_i$: midpoint of the $i$th class
$f_i$: frequency of the $i$th class
Example:
Compute the mean of student marks.
Marks    Frequency (f_i)   Class midpoint (x_i)   x_i f_i
1-8            4                  4.5               18
9-16           6                 12.5               75
17-24          2                 20.5               41
25-32          7                 28.5              199.5
33-40          1                 36.5               36.5
Total         20                                   370
$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{370}{20} = 18.5$
Case of frequency table
The mean
Example:
Compute the mean of student marks.
Marks    Frequency (f_i)   Class midpoint (x_i)   x_i f_i
0-10           3                  5                 15
10-20          8                 15                120
20-30          6                 25                150
30-40          5                 35                175
40-50          3                 45                135
Total         25                                   595
$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{595}{25} = 23.8$
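Both frequency-table means can be verified with a few lines of code. The sketch below takes the class midpoints and frequencies from the two student marks examples; the function name `grouped_mean` is hypothetical.

```python
def grouped_mean(midpoints, frequencies):
    """Weighted mean of class midpoints, weighted by class frequencies."""
    n = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / n

# First example: classes 1-8, 9-16, 17-24, 25-32, 33-40.
mean_first = grouped_mean([4.5, 12.5, 20.5, 28.5, 36.5], [4, 6, 2, 7, 1])   # 18.5

# Second example: classes 0-10, 10-20, 20-30, 30-40, 40-50.
mean_second = grouped_mean([5, 15, 25, 35, 45], [3, 8, 6, 5, 3])            # 23.8
```

Because every observation in a class is represented by the class midpoint, the result is an approximation of the mean of the underlying raw data.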
Case of frequency table
The Mode (M)
The mode is the midpoint of the class that has the greatest frequency.
Example:
Compute the mode of student marks.
Marks    Frequency (f_i)   Class midpoint (x_i)
1-8            4                  4.5
9-16           6                 12.5
17-24          2                 20.5
25-32          7                 28.5
33-40          1                 36.5
Total         20
The class 25-32 has the greatest frequency (7), so Mode = 28.5.
Example:
Compute the mode of student marks.
Marks    Frequency (f_i)   Class midpoint (x_i)
1-8            4                  4.5
9-16           7                 12.5
17-24          2                 20.5
25-32          7                 28.5
33-40          1                 36.5
Total         21
The classes 9-16 and 25-32 share the greatest frequency (7), so Mode = 12.5 and 28.5 (bimodal).
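Applying the rule "the mode is the midpoint of the class with the greatest frequency" to the two tables can be sketched as follows. The function name `grouped_modes` is hypothetical; it returns a list so that ties between classes are reported.

```python
def grouped_modes(midpoints, frequencies):
    """Midpoints of all classes that attain the greatest frequency."""
    greatest = max(frequencies)
    return [x for x, f in zip(midpoints, frequencies) if f == greatest]

midpoints = [4.5, 12.5, 20.5, 28.5, 36.5]
modes_first = grouped_modes(midpoints, [4, 6, 2, 7, 1])    # one modal class
modes_second = grouped_modes(midpoints, [4, 7, 2, 7, 1])   # two modal classes

print(modes_first)   # [28.5]
print(modes_second)  # [12.5, 28.5]
```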
The Median (case of frequency table)
1) Rank of median = $\frac{\sum f_i}{2}$
Example: Rank of median = $\frac{\sum f_i}{2} = \frac{20}{2} = 10$
Marks    Frequency   Class boundaries
1-8          4         0.5-8.5
9-16         6         8.5-16.5
17-24        2         16.5-24.5
25-32        7         24.5-32.5
33-40        1         32.5-40.5
Total       20
Cumulative class                 Cumulative frequency (rank)
less than or equal to 0.5              0
less than or equal to 8.5              4
less than or equal to 16.5            10
less than or equal to 24.5            12
less than or equal to 32.5            19
less than or equal to 40.5            20
The cumulative frequency reaches the rank of 10 at the boundary 16.5, so Median = 16.5.
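The cumulative table for this example, and the lookup of the median rank in it, can be sketched as follows. The function name `cumulative_table` is hypothetical. In this example the rank coincides exactly with a tabulated cumulative frequency, so a simple lookup is enough; in general the rank falls strictly inside a class and the median has to be interpolated within that class.

```python
from itertools import accumulate

def cumulative_table(boundaries, frequencies):
    """Pair each class boundary with the number of observations <= that boundary."""
    return list(zip(boundaries, [0, *accumulate(frequencies)]))

boundaries = [0.5, 8.5, 16.5, 24.5, 32.5, 40.5]
frequencies = [4, 6, 2, 7, 1]

table = cumulative_table(boundaries, frequencies)
median_rank = sum(frequencies) / 2

# First boundary at which the cumulative frequency reaches the median rank.
median_boundary = next(b for b, cf in table if cf >= median_rank)
print(median_rank, median_boundary)  # 10.0 16.5
```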
Example 2:
Compute the median of student marks.
Rank of median = $\frac{\sum f_i}{2} = \frac{18}{2} = 9$
Marks    Frequency   Class boundaries
1-8          2         0.5-8.5
9-16         3         8.5-16.5
17-24        8         16.5-24.5
25-32        1         24.5-32.5
33-40        4         32.5-40.5
Total       18
Another way to solve Example 2:
The cumulative frequency is 5 at the boundary 16.5 and 13 at the boundary 24.5, so the rank of 9
falls in the class 16.5-24.5. Interpolating within that class:
$\text{Median} = 16.5 + \frac{9 - 5}{13 - 5}(24.5 - 16.5) = 16.5 + \frac{4}{8}(8) = 16.5 + 4 = 20.5$
Percentiles (case of frequency table)
* The $p$th percentile is a value such that approximately p% of
the observations are less than the $p$th percentile
* and approximately (100 − p)% of the observations are greater than the $p$th percentile.
Note: The 50th percentile is also the median.
To find the $p$th percentile, begin by arranging the sample values in ascending order, then locate it
using the corresponding rank:
1) Rank (location) of the $p$th percentile = $\frac{p}{100} \sum f_i$
2) Construct the cumulative frequency table (using all class boundaries)
Example 3:
Compute the 20th percentile of student marks (i.e. p = 20).
Rank of 20th percentile = $\frac{20}{100} \sum f_i = \frac{20}{100}(18) = 3.6$
Marks    Frequency   Class boundaries
1-8          2         0.5-8.5
9-16         3         8.5-16.5
17-24        8         16.5-24.5
25-32        1         24.5-32.5
33-40        4         32.5-40.5
Total       18
Another way to solve Example 3:
The cumulative frequency is 2 at the boundary 8.5 and 5 at the boundary 16.5, so the rank of 3.6
falls in the class 8.5-16.5. Interpolating within that class:
$P_{20} = 8.5 + \frac{3.6 - 2}{5 - 2}(16.5 - 8.5) = 8.5 + \frac{1.6}{3}(8) = 8.5 + 4.27 = 12.77$
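Example 4:
Compute the 90th percentile of student marks (i.e. p = 90).
Rank of 90th percentile = $\frac{90}{100} \sum f_i = \frac{90}{100}(18) = 16.2$
Marks    Frequency   Class boundaries
1-8          2         0.5-8.5
9-16         3         8.5-16.5
17-24        8         16.5-24.5
25-32        1         24.5-32.5
33-40        4         32.5-40.5
Total       18
The cumulative frequency is 14 at the boundary 32.5 and 18 at the boundary 40.5, so the rank of 16.2
falls in the class 32.5-40.5. Interpolating within that class:
$P_{90} = 32.5 + \frac{16.2 - 14}{18 - 14}(40.5 - 32.5) = 32.5 + \frac{2.2}{4}(8) = 32.5 + 4.4 = 36.9$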
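The interpolation used for the median and for the 20th and 90th percentiles follows one pattern: compute the rank, find the class whose cumulative frequencies bracket it, and interpolate linearly between that class's boundaries. A minimal sketch under those assumptions follows; the function name `grouped_percentile` is hypothetical, and the data are the student marks table with frequencies 2, 3, 8, 1, 4.

```python
from itertools import accumulate

def grouped_percentile(p, boundaries, frequencies):
    """p-th percentile of a frequency table by linear interpolation within a class."""
    rank = p / 100 * sum(frequencies)
    cumulative = [0, *accumulate(frequencies)]
    for k in range(len(frequencies)):
        below, up_to = cumulative[k], cumulative[k + 1]
        # The rank falls in class k when it lies between the cumulative
        # frequencies at the class's lower and upper boundaries.
        if below <= rank <= up_to and up_to > below:
            lower, upper = boundaries[k], boundaries[k + 1]
            return lower + (rank - below) / (up_to - below) * (upper - lower)
    raise ValueError("p must be between 0 and 100")

boundaries = [0.5, 8.5, 16.5, 24.5, 32.5, 40.5]
frequencies = [2, 3, 8, 1, 4]

median = grouped_percentile(50, boundaries, frequencies)   # 20.5
p20 = grouped_percentile(20, boundaries, frequencies)      # about 12.77
p90 = grouped_percentile(90, boundaries, frequencies)      # about 36.9
```

Calling the same function with p = 25 and p = 75 gives the first and third quartiles.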
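Quartiles (case of frequency table)
Q1 = first quartile, or 25th percentile
For the student marks table with frequencies 2, 3, 8, 1, 4:
Rank of 25th percentile = $\frac{25}{100}(18) = 4.5$
$P_{25} = 8.5 + \frac{4.5 - 2}{5 - 2}(16.5 - 8.5) = 8.5 + 6.67 = 15.17$
25th percentile = 15.17 = P25 = Q1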
$p$th Percentile
* is a value such that approximately p% of the observations are less than the $p$th percentile
* and approximately (100 − p)% of the observations are greater than the $p$th percentile.
1) Rank of percentile P = $\frac{P}{100} \sum f_i$
2) Construct the cumulative frequency table (using all class boundaries)
3) Apply the proportion (interpolation) rule
Quartiles
Q1 = first quartile, or 25th percentile (Q1 = P25)
Q2 = second quartile, or 50th percentile, which is also the median (Q2 = P50 = Median)
Q3 = third quartile, or 75th percentile (Q3 = P75)
Chapter 3 / Section 3.2 Measures of variability
(Dispersion measure): Range, Interquartile range, Variance,
Standard Deviation, coefficient of Variation
Range
Range= Largest value - Smallest value
Interquartile Range (IQR):
IQR = Q3 – Q1
Variance:
Sample variance ($S^2$)
Standard Deviation:
Sample standard deviation $S = \sqrt{\text{Variance}} = \sqrt{S^2}$
Coefficient of variation:
$CV = \frac{S}{\bar{x}} \times 100\%$
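Sample Variance Formula
For Ungrouped Data:
$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$
For Grouped Data [case of frequency table]:
$S^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i - 1} = \frac{\sum x_i^2 f_i - \left(\sum f_i\right)\bar{x}^2}{\sum f_i - 1}$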
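The two forms of the ungrouped formula agree because the sum of squared deviations can be expanded using $\sum x_i = n\bar{x}$. A short derivation sketch (the grouped case is identical with each term weighted by $f_i$ and $n$ replaced by $\sum f_i$):

```latex
\begin{aligned}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - 2\bar{x}\,(n\bar{x}) + n\bar{x}^2 \\
  &= \sum x_i^2 - n\bar{x}^2
\end{aligned}
```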
MAE (MD):
mean absolute error = mean deviation
For Ungrouped Data:
$MAE(MD) = \frac{\sum |x_i - \bar{x}|}{n}$
For Grouped Data:
$MAE(MD) = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i}$
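Case of values:
Example:
For the sample values 46, 52, 42, 48, 32 (with $\bar{x} = \frac{220}{5} = 44$), compute:
1) Range = 52 − 32 = 20
2) IQR = Q3 − Q1 = 50 − 37 = 13
3) Variance $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{4 + 64 + 4 + 16 + 144}{4} = \frac{232}{4} = 58$
4) Standard deviation $S = \sqrt{58} = 7.62$
5) Coefficient of variation $CV = \frac{S}{\bar{x}} \times 100\% = \frac{7.62}{44} \times 100\% = 17.3\%$
6) MAE = $\frac{\sum |x_i - \bar{x}|}{n} = \frac{2 + 8 + 2 + 4 + 12}{5} = \frac{28}{5} = 5.6$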
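The measures for the sample 46, 52, 42, 48, 32 can be recomputed directly from the formulas. This is a sketch: the variable names are hypothetical, and the quartiles use `statistics.quantiles` with its default "exclusive" method, which is an assumption about the quartile convention (it reproduces Q1 = 37 and Q3 = 50 for this sample, but other conventions can give different quartiles).

```python
import math
from statistics import quantiles

values = [46, 52, 42, 48, 32]
n = len(values)
x_bar = sum(values) / n                                    # 44.0

data_range = max(values) - min(values)                     # 20
q1, q2, q3 = quantiles(values, n=4)                        # default method="exclusive"
iqr = q3 - q1                                              # 13.0

# Sample variance: definition and shortcut form should agree.
s2 = sum((x - x_bar) ** 2 for x in values) / (n - 1)       # 58.0
s2_shortcut = (sum(x * x for x in values) - n * x_bar ** 2) / (n - 1)

s = math.sqrt(s2)                                          # about 7.62
cv = s / x_bar * 100                                       # about 17.3 (percent)
mae = sum(abs(x - x_bar) for x in values) / n              # 5.6
```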
Example: For the sample of student marks, with $\bar{x} = \frac{\sum x_i f_i}{n} = 18.5$, compute:
1) Range = 40.5 − 0.5 = 40
2) IQR = Q3 − Q1 = 27.93 − 9.83 = 18.1
3) Variance $S^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i - 1} = \frac{2032}{19} = 106.95$, or equivalently $S^2 = \frac{\sum x_i^2 f_i - \left(\sum f_i\right)\bar{x}^2}{\sum f_i - 1} = \frac{8877 - 20(18.5)^2}{19} = 106.95$
4) Standard deviation $S = \sqrt{106.95} = 10.34$
5) MAE = $\frac{184}{20} = 9.2$
6) Coefficient of variation $CV = \frac{S}{\bar{x}} \times 100\% = \frac{10.34}{18.5} \times 100\% = 55.9\%$
Student   # of students   Class midpoint
marks        (f_i)            (x_i)         x_i f_i    x_i^2 f_i    (x_i − x̄)^2 f_i    |x_i − x̄| f_i
1-8            4               4.5             18         81             784                56
9-16           6              12.5             75        937.5           216                36
17-24          2              20.5             41        840.5             8                 4
25-32          7              28.5            199.5     5685.75          700                70
33-40          1              36.5             36.5     1332.25          324                18
Total         20                              370       8877            2032               184
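The grouped computations can be checked against the column totals of the table. The sketch below uses the class midpoints and frequencies from the student marks table; the function name `grouped_dispersion` and the dictionary keys are hypothetical. The coefficient of variation is computed from the standard deviation S, as in the formula CV = S / x̄ × 100%.

```python
import math

def grouped_dispersion(midpoints, frequencies):
    """Mean, sample variance, standard deviation, MAE and CV for a frequency table."""
    n = sum(frequencies)
    x_bar = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    squared = sum((x - x_bar) ** 2 * f for x, f in zip(midpoints, frequencies))
    absolute = sum(abs(x - x_bar) * f for x, f in zip(midpoints, frequencies))
    variance = squared / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "mean": x_bar,
        "sum_squared_dev": squared,
        "sum_abs_dev": absolute,
        "variance": variance,
        "std_dev": std_dev,
        "mae": absolute / n,
        "cv_percent": std_dev / x_bar * 100,
    }

result = grouped_dispersion([4.5, 12.5, 20.5, 28.5, 36.5], [4, 6, 2, 7, 1])
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```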
z-Score
$z_i = \frac{x_i - \bar{x}}{S}$
The z-score is used to determine the relative location of any observation.
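As an illustration, z-scores for the sample 46, 52, 42, 48, 32 (mean 44, sample standard deviation about 7.62) can be computed as follows. This is a sketch using `mean` and `stdev` from Python's standard `statistics` module; `stdev` is the sample standard deviation (divisor n − 1), matching S in the formula.

```python
from statistics import mean, stdev

values = [46, 52, 42, 48, 32]
x_bar = mean(values)
s = stdev(values)  # sample standard deviation, divisor n - 1

z_scores = [(x - x_bar) / s for x in values]
for x, z in zip(values, z_scores):
    print(f"x = {x}: z = {z:.2f}")
```

A positive z-score means the observation lies above the mean and a negative one below it; the z-scores of a sample always sum to zero.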