EDA_W3_Obtaining-Data
Obtaining Data
Editha A. Macorol
Measures of Describing Data
• Measure of Position
- Measure of finding the kth element of the
distribution
• Measure of Variation
- Measure of how the data is distributed about the
mean.
• Measure of Shape
- Measure of the degree of symmetry of a
distribution.
The Mean
Weighted Mean
Example:
1. The Carter Construction Company pays its hourly
employees $16.50, $19.00, or $25.00 per hour. There
are 26 hourly employees, 14 of whom are paid at the
$16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00
rate. What is the mean hourly rate paid to the 26
employees?
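As a check on the example above, the weighted mean multiplies each rate by the number of employees paid at that rate and divides by the total number of employees. A minimal Python sketch (the function name `weighted_mean` is illustrative and does not come from the slides):

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

rates = [16.50, 19.00, 25.00]   # hourly rates from the example
employees = [14, 10, 2]         # number of employees paid at each rate

# (14 * 16.50 + 10 * 19.00 + 2 * 25.00) / 26 = 471 / 26
mean_rate = weighted_mean(rates, employees)
print(f"Mean hourly rate: ${mean_rate:.2f}")   # prints $18.12
```

Because 14 of the 26 employees earn the lowest rate, the weighted mean sits much closer to $16.50 than the simple average of the three rates would.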
The Median
Characteristics
• There is a unique median for each data set.
• It is not affected by extremely large or small values and
is therefore a valuable measure of central tendency
when such values occur.
• It can be computed for ratio-level, interval-level, and
ordinal-level data.
• It can be computed for an open-ended frequency
distribution if the median does not lie in an open-
ended class.
The Median
• The midpoint of the values after they have been
ordered from the smallest to largest
• There are as many values above the median as below it
in the data array.
• For an even set of values, the median will be the
arithmetic average of the two middle numbers.
Median: Computational Procedure
First Procedure
Arrange the observations in an ordered array.
If there is an odd number of terms, the median is the middle term of
the ordered array.
If there is an even number of terms, the median is the average of the
middle two terms.
Second Procedure
The median’s position in an ordered array is given by (n+1)/2.
The Median
Example:
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
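Both procedures can be sketched in a few lines of Python and applied to the ordered array above. The function name `median` is illustrative; the position uses the (n+1)/2 rule from the second procedure.

```python
def median(values):
    """First procedure: middle term if n is odd, mean of the two middle terms if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]

# Second procedure: position of the median in the ordered array.
position = (len(data) + 1) / 2   # (17 + 1) / 2 = 9, i.e. the 9th term
med = median(data)
print(position, med)             # 9.0 15
```

With 17 observations the median is the 9th term of the array, which is 15.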
Deciles
• Values that divide the ordered dataset into 10 equal groups.
Percentiles
• Values that divide the ordered dataset into 100 equal groups.
Measures of Location
• Quartile – One fourth
First (1/4), Second (1/2), Third (3/4)
Quartile locator (Lq):
• Decile – One tenth
10%, 20%, …, 90%
Decile locator (Ld):
• Percentile – One hundredth
1%, 2%, …, 99%
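The sketch below computes quartiles, deciles, and percentiles for the ordered array from the median example. It rests on an assumption that is not stated in these notes: that the kth of m quantiles is located at position k(n+1)/m in the ordered array, which agrees with the median position (n+1)/2. Python's `statistics.quantiles` with `method="exclusive"` follows that convention and interpolates linearly between neighbouring values. Other conventions exist and give slightly different answers.

```python
import statistics

data = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]

# Assumed convention: kth of m quantiles at position k * (n + 1) / m.
quartiles = statistics.quantiles(data, n=4, method="exclusive")      # 3 cut points
deciles = statistics.quantiles(data, n=10, method="exclusive")       # 9 cut points
percentiles = statistics.quantiles(data, n=100, method="exclusive")  # 99 cut points

print(quartiles)          # [7.5, 15.0, 19.0]
print(deciles[2])         # 3rd decile
print(percentiles[34])    # 35th percentile
```

The second quartile, the fifth decile, and the 50th percentile all coincide with the median.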
Measures of Variation (Dispersion)
Why study dispersion?
• A measure of location, such as the mean or the median, does not
tell us anything about the spread of the data.
• For example, if your nature guide told you that the river averaged
3 feet in depth, would you want to wade across on foot without
additional information? Probably not. You would want to know
something about the variation in depth.
• A second reason is to compare the spread in two or
more distributions.
• These are measures of the average distance of each
observation from the center of the distribution.
• They measure the homogeneity or heterogeneity of a
particular group.
Measures of Variation
• Range
- The difference between the largest and smallest number in
the set
• Interquartile Range
- Range of values between the first and third quartiles
- Range of the “middle half”
• Mean Deviation
- The average of the absolute deviations from the mean
• Variance
- The average of the squared deviations from the mean
• Standard Deviation (SD)
- The population/sample standard deviation is given as the
positive square root of population/sample variance
• Coefficient of Variation (CV)
- The percentage of the ratio of standard deviation to the
mean
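A short Python sketch of the range, mean deviation, variance, and standard deviation, using the five male grades from the Range example in these notes. Two choices here are assumptions: the mean deviation divides by n, and the variance and standard deviation are the sample versions, dividing by n − 1 (this is what `statistics.variance` and `statistics.stdev` compute).

```python
import statistics

grades = [100, 65, 75, 85, 95]   # male grades from the Range example

mean = statistics.mean(grades)                     # 420 / 5 = 84
value_range = max(grades) - min(grades)            # highest minus lowest

# Mean deviation: average absolute deviation from the mean (divides by n here).
mean_deviation = sum(abs(x - mean) for x in grades) / len(grades)

variance = statistics.variance(grades)             # sample variance, n - 1 divisor
std_dev = statistics.stdev(grades)                 # positive square root of the variance

print(value_range, mean_deviation, variance, round(std_dev, 2))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by n instead.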
Range
R = H − L, where H is the highest value and L is the lowest value
Consider the following data.
Grades in Statistics
Males          Females
Jon    100     Ann    84
Ron     65     Ria    86
Dan     75     Let    85
Tom     85     Bel    82
Bob     95     Nel    83
Range   35     Range   4
Conclusion: The grades of the males are more scattered, while
the grades of the females are more tightly clustered. The females are
more homogeneous in their statistics grades.
Measures of Variation
• Coefficient of Variation
$CV = \frac{s}{\bar{x}} (100\%)$, where $s$ is the standard deviation and $\bar{x}$ is the mean
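Because the coefficient of variation is a ratio, it allows groups to be compared directly. The sketch below applies it to the male and female grades from the Range example. Using the sample standard deviation (n − 1 divisor) is an assumption, and the function name is illustrative.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (s / mean) * 100, using the sample standard deviation s."""
    return statistics.stdev(data) / statistics.mean(data) * 100

males = [100, 65, 75, 85, 95]
females = [84, 86, 85, 82, 83]

cv_males = coefficient_of_variation(males)
cv_females = coefficient_of_variation(females)
print(f"Males: {cv_males:.2f}%  Females: {cv_females:.2f}%")
```

Both groups have the same mean (84), so the larger CV for the males reflects only their larger standard deviation, in line with the conclusion drawn from the ranges.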
Measures of Shape
• Skewness
- The degree of asymmetry of a distribution about its mean; it
measures how far the data depart from being
symmetrical
- Can be interpreted as symmetric, positively skewed, or
negatively skewed
• Kurtosis
- The degree of peakedness exhibited by the distribution
- Computed from the fourth moment about the
mean
Skewness
Pearsonian Coefficient of Skewness (Pearson’s Coefficient
of Skewness)
$Sk = \frac{3(\bar{x} - \text{median})}{s}$
Interpretation of values:
1. Sk < 0, “negatively skewed” or “skewed to the left”
2. Sk = 0, symmetrical
3. Sk > 0, “positively skewed” or “skewed to the right”
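The interpretation rules above can be sketched in Python. The sketch assumes the form Sk = 3(mean − median)/s, which is the form used in the worked solution later in these notes, and takes its inputs (mean 50.9, median 47.83, standard deviation 11.96) from that solution. The function names are illustrative.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev

def interpret(sk):
    """Map the sign of Sk to the interpretation listed above."""
    if sk < 0:
        return "negatively skewed (skewed to the left)"
    if sk > 0:
        return "positively skewed (skewed to the right)"
    return "symmetrical"

sk = pearson_skewness(mean=50.9, median=47.83, std_dev=11.96)
print(round(sk, 2), interpret(sk))   # 0.77 positively skewed (skewed to the right)
```

A mean larger than the median pulls the coefficient above zero, which is why the distribution is read as skewed to the right.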
Skewness
• A measure of the asymmetry of the frequency distribution
Kurtosis
[Figure: leptokurtic, mesokurtic, and platykurtic curves]
Moment-Based Coefficient of Kurtosis
Class Interval   Frequency
25-29             1
30-34             1
35-39             5
40-44             8
45-49            15
50-54             4
55-59             4
60-64             3
65-69             4
70-74             3
75-79             2
Class Interval   Frequency (f)   Midpoint (x)   Cumulative Frequency (cf)   fx
25-29             1              27              1                            27
30-34             1              32              2                            32
35-39             5              37              7                           185
40-44             8              42             15                           336
45-49            15              47             30                           705
50-54             4              52             34                           208
55-59             4              57             38                           228
60-64             3              62             41                           186
65-69             4              67             45                           268
70-74             3              72             48                           216
75-79             2              77             50                           154
Total            50                                                         2545
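Mode:
$mo = 44.5 + \left(\frac{7}{7 + 11}\right) 5 = 46.44$

Quartiles, deciles, and percentiles:
$x_{lb}$ - lower boundary of the 1st quartile class
$cf$ - cumulative frequency for the class interval preceding the 1st quartile class
$f_m$ - frequency in the 1st quartile class
$i$ - class width or the interval size
$n$ - sample size
$Q_1 = 39.5 + \left(\frac{12.5 - 7}{8}\right) 5 \approx 42.94$
$D_3 = 39.5 + \left(\frac{15 - 7}{8}\right) 5 = 44.5$
$P_{35} = x_{lb} + \left(\frac{\frac{35n}{100} - cf}{f_m}\right) i$

Interquartile range:
$IR = 15.94$

Mean deviation:
$md = \frac{484}{49} = 9.9$

Standard deviation:
$s = \sqrt{\frac{7014.5}{49}} = 11.96$

Skewness:
$Sk = \frac{3(50.9 - 47.83)}{11.96} = 0.77$

Kurtosis:
$k = \frac{2536769.88}{50\,(\ldots)} = 1.12$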
SOLVE THE FOLLOWING:
1. Mean
2. Median
3. Mode
4. 1st Quartile
5. 3rd Quartile
6. 35th Percentile
7. 67th Percentile
8. IQR
9. Mean Deviation
10. Standard Deviation
11. Skewness
12. Kurtosis
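Most of the items above can be checked with a short Python sketch built on the grouped-data formulas in these notes. Some choices are assumptions: class boundaries are taken as 0.5 below each lower class limit, the helper names (`quantile`, `mean_dev`, and so on) are illustrative, and the mean deviation and standard deviation divide by n − 1 to match the worked solution. Kurtosis is left out because its full formula does not appear in these notes.

```python
import math

# Class limits and frequencies from the frequency table in these notes.
classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54),
           (55, 59), (60, 64), (65, 69), (70, 74), (75, 79)]
freqs = [1, 1, 5, 8, 15, 4, 4, 3, 4, 3, 2]

n = sum(freqs)                                # sample size
width = 5                                     # class width i
mids = [(lo + hi) / 2 for lo, hi in classes]  # class midpoints x

def quantile(k, m):
    """kth of m quantiles for grouped data: x_lb + ((k*n/m - cf) / f_m) * i."""
    target = k * n / m
    cf = 0  # cumulative frequency of the classes before the current one
    for (lo, _), f in zip(classes, freqs):
        if cf + f >= target:
            return (lo - 0.5) + (target - cf) / f * width
        cf += f
    raise ValueError("k must be smaller than m")

mean = sum(f * x for f, x in zip(freqs, mids)) / n
median = quantile(1, 2)

# Mode: x_lb + d1 / (d1 + d2) * i, where d1 and d2 are the differences between
# the modal frequency and the frequencies of the classes before and after it.
j = freqs.index(max(freqs))
d1 = freqs[j] - freqs[j - 1]
d2 = freqs[j] - freqs[j + 1]
mode = (classes[j][0] - 0.5) + d1 / (d1 + d2) * width

q1, q3 = quantile(1, 4), quantile(3, 4)
p35, p67 = quantile(35, 100), quantile(67, 100)
iqr = q3 - q1

# Both divide by n - 1, following the worked solution.
mean_dev = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / (n - 1)
s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / (n - 1))

sk = 3 * (mean - median) / s  # Pearson's coefficient of skewness

for name, value in [("mean", mean), ("median", median), ("mode", mode),
                    ("Q1", q1), ("Q3", q3), ("P35", p35), ("P67", p67),
                    ("IQR", iqr), ("mean deviation", mean_dev),
                    ("standard deviation", s), ("skewness", sk)]:
    print(f"{name}: {value:.2f}")
```

The printed values can be compared against the hand computations: for instance, the mean is 2545/50 = 50.9 and the mode is 46.44.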