Data
DR. FLORENTINA PUNGKY PRAMESTI, ST., MT.
Preliminary Data Analysis
Raw data: collected data that have not been organized numerically, e.g., the set of heights of 100 male students obtained from an alphabetical listing of university records
Array: an arrangement of raw numerical data in ascending or
descending order of magnitude
Frequency distribution (frequency table): a tabular
arrangement of data by classes together with the corresponding
class frequencies
Class Intervals And Class Limits
Class Boundaries: these can also serve as the class symbol (label). They should not coincide with actual observations ➔ to avoid ambiguity
The size, or width, of a class interval is the difference between the lower and
upper class boundaries
The class mark is the midpoint of the class interval
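As a small sketch of the three definitions above, assuming the data were recorded to a fixed precision (the hypothetical parameter `unit`, here whole units), the boundaries lie half a unit outside the class limits, and the width and class mark follow from the boundaries.

```python
def class_summary(lower_limit, upper_limit, unit=1.0):
    """Return (lower boundary, upper boundary, width, class mark) for one class.

    `unit` is the precision to which the data were recorded (an assumption,
    e.g. 1 for whole numbers, 0.01 for cents).
    """
    # Boundaries sit half a recording unit outside the class limits,
    # so no observed value can fall exactly on a boundary.
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    # Width: difference between the upper and lower class boundaries.
    width = upper_boundary - lower_boundary
    # Class mark: midpoint of the class interval.
    mark = (lower_boundary + upper_boundary) / 2
    return lower_boundary, upper_boundary, width, mark


# A hypothetical class of heights recorded to the nearest inch: 60-62 in.
print(class_summary(60, 62))  # (59.5, 62.5, 3.0, 61.0)
```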
Forming frequency distributions
Determine the largest and smallest numbers in the raw data and thus find the
range.
Divide the range into a convenient number of class intervals having the same
size. If this is not feasible, use class intervals of different sizes or open class
intervals. The number of class intervals is usually between 5 and 20, depending
on the data. Class intervals are also chosen so that the class marks (or midpoints)
coincide with the actually observed data. This tends to lessen the so-called
grouping error involved in further mathematical analysis. However, the class
boundaries should not coincide with the actually observed data.
Determine the number of observations falling into each class interval; that is, find
the class frequencies. This is best done by using a tally, or score sheet.
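The three steps above can be sketched in Python as follows. This is only an illustration, assuming equal-width classes; the lowest boundary `start` and the `width` are choices the analyst makes in step 2, and the helper name and data are hypothetical.

```python
import math
from collections import Counter


def frequency_distribution(data, start, width):
    """Tally raw data into equal-width classes.

    Class k covers [start + k*width, start + (k+1)*width). `start` is the
    lowest class boundary and `width` the class size; both are chosen by the
    analyst, with `start` picked so that no observation equals a boundary.
    Returns a list of (lower boundary, upper boundary, frequency) rows.
    """
    counts = Counter()
    for x in data:
        # Index of the class interval that contains x (the "tally" step).
        counts[math.floor((x - start) / width)] += 1
    # List every class from the lowest to the highest, keeping empty classes.
    rows = []
    for k in range(min(counts), max(counts) + 1):
        lower = start + k * width
        rows.append((lower, lower + width, counts[k]))
    return rows


heights = [61, 63, 64, 66, 67, 67, 70, 71]   # hypothetical raw data
print("range:", max(heights) - min(heights))  # step 1: the range
for row in frequency_distribution(heights, start=59.5, width=3):
    print(row)
```

Using boundaries that end in .5 for whole-number data follows the rule that class boundaries should not coincide with observed values.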
Table 1 shows a frequency distribution of the weekly wages of 65 employees at the P&R Company. Five new employees were hired at weekly wages of $285.34, $316.83, $335.78, $356.21, and $374.50. Construct a frequency distribution of wages for the 70 employees.
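Graphing the frequency distribution
Histogram (frequency histogram): a set of rectangles with bases on the horizontal axis, centers at the class marks, widths equal to the class sizes, and areas proportional to the class frequencies
Frequency polygon: a line graph of the class frequencies plotted against the class marks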
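A minimal sketch of the frequency polygon's vertices, assuming equally spaced class boundaries. It follows the common convention of closing the polygon on the horizontal axis at the class marks of the empty classes just outside the data; the function name and numbers are hypothetical.

```python
def frequency_polygon_points(boundaries, freqs):
    """(class mark, frequency) vertices of a frequency polygon.

    boundaries: class boundaries b0 < b1 < ... < bk, assumed equally spaced
    freqs: the k class frequencies
    """
    width = boundaries[1] - boundaries[0]
    # Class marks are the midpoints of consecutive boundaries.
    marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    vertices = list(zip(marks, freqs))
    # Close the polygon on the x-axis at the marks of the empty classes
    # just below the first class and just above the last one.
    return [(marks[0] - width, 0)] + vertices + [(marks[-1] + width, 0)]


print(frequency_polygon_points([59.5, 62.5, 65.5, 68.5, 71.5], [1, 2, 3, 2]))
# [(58.0, 0), (61.0, 1), (64.0, 2), (67.0, 3), (70.0, 2), (73.0, 0)]
```

The histogram uses the same boundaries as the edges of its rectangles, so the polygon passes through the midpoints of the rectangle tops.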
Cumulative-frequency distributions and ogives
Ogive: cumulative-frequency polygon
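As a sketch, the points of a "less than" ogive pair each upper class boundary with the running total of frequencies up to that class. The helper name and data below are hypothetical; dividing each total by the grand total would give a percentage ogive instead.

```python
from itertools import accumulate


def less_than_ogive(boundaries, freqs):
    """(upper class boundary, cumulative frequency) points for a 'less than' ogive."""
    # Running totals: number of observations below each upper boundary.
    cumulative = list(accumulate(freqs))
    # Nothing lies below the lowest boundary, so the curve starts at zero there.
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))


print(less_than_ogive([59.5, 62.5, 65.5, 68.5, 71.5], [1, 2, 3, 2]))
# [(59.5, 0), (62.5, 1), (65.5, 3), (68.5, 6), (71.5, 8)]
```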
Can you make it?
Case
The final grades in mathematics of 80 students at State University are recorded in the accompanying table.
Measuring the central tendency
Arithmetic mean
Arithmetic weighted mean
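The two means above follow the standard definitions X̄ = ΣX/N and, with weights w, X̄ = ΣwX/Σw. A minimal sketch with hypothetical numbers:

```python
def mean(xs):
    # X-bar = (X1 + X2 + ... + XN) / N
    return sum(xs) / len(xs)


def weighted_mean(xs, weights):
    # X-bar = (w1*X1 + w2*X2 + ... + wN*XN) / (w1 + w2 + ... + wN)
    return sum(w * x for x, w in zip(xs, weights)) / sum(weights)


print(mean([2, 4, 6, 8]))                        # 5.0
# Hypothetical scores where the final exam counts three times as much.
print(weighted_mean([85, 70, 90], [1, 1, 3]))    # 85.0
```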
Arithmetic mean from grouped data
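$\bar{X} = A + \left(\frac{\sum f u}{N}\right) c$, where
Size of class intervals: c
A: any guessed or assumed arithmetic mean (which may be any number)
N: total frequency
Deviations $d_j = X_j - A$, expressed as $c\,u_j$, where $u_j$ can be positive or negative integers or zero (when A is one of the class marks)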
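A sketch of the coding method, assuming equal class sizes c and the formula X̄ = A + (Σfu/N)c. The grouped data are hypothetical; the result does not depend on the choice of A, which is why A may be any number.

```python
def coded_grouped_mean(marks, freqs, A, c):
    """Mean of grouped data by the coding method: X-bar = A + (sum f*u / N) * c.

    marks: class marks X_j; freqs: class frequencies f_j
    A: assumed mean (conveniently one of the class marks); c: class size
    """
    N = sum(freqs)
    # Coded deviations u_j = (X_j - A) / c, so that d_j = X_j - A = c * u_j.
    us = [(x - A) / c for x in marks]
    return A + c * sum(f * u for f, u in zip(freqs, us)) / N


marks, freqs = [61, 64, 67, 70], [1, 2, 3, 2]   # hypothetical grouped data
print(coded_grouped_mean(marks, freqs, A=67, c=3))   # 66.25
```

Choosing A as a class mark near the middle keeps the u values small integers, which was the point of the method for hand calculation.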
Median: for a set of numbers arranged in order of magnitude, either the middle value (odd number of items) or the arithmetic mean of the two middle values (even number of items).
The set of numbers 3, 4, 4, 5, 6, 8, 8, 8, and 10 has median 6
The set of numbers 5, 5, 7, 9, 11, 12, 15, and 18 has median ½(9 + 11) = 10
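The definition translates directly into a short function; this sketch reproduces both examples above.

```python
def median(xs):
    s = sorted(xs)           # arrange in order of magnitude
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                      # odd N: the middle value
    return (s[mid - 1] + s[mid]) / 2       # even N: mean of the two middle values


print(median([3, 4, 4, 5, 6, 8, 8, 8, 10]))    # 6
print(median([5, 5, 7, 9, 11, 12, 15, 18]))    # 10.0
```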
For grouped data
$\text{Median} = L_1 + \left(\frac{N/2 - (\sum f)_1}{f_{\text{median}}}\right) c$
L1: lower class boundary of the median class (i.e., the class containing the median)
N: number of items in the data (i.e., total frequency)
(Σf)1: sum of frequencies of all classes lower than the median class
fmedian: frequency of the median class
c: size of the median class interval
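A sketch of the grouped-data median, assuming the standard interpolation formula Median = L1 + ((N/2 − (Σf)1)/fmedian)c. The function walks up the classes until the cumulative frequency reaches N/2; the data are hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Median = L1 + ((N/2 - (sum f)_1) / f_median) * c for grouped data."""
    N = sum(freqs)
    below = 0  # (sum f)_1: total frequency of the classes before class i
    for i, f in enumerate(freqs):
        if below + f >= N / 2:            # class i contains the median
            L1 = boundaries[i]            # its lower class boundary
            c = boundaries[i + 1] - boundaries[i]
            return L1 + (N / 2 - below) / f * c
        below += f
    raise ValueError("freqs is empty")


print(grouped_median([59.5, 62.5, 65.5, 68.5, 71.5], [1, 2, 3, 2]))  # 66.5
```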
Mode: the value that occurs with the greatest frequency
The set 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, and 18 has mode 9.
The set 3, 5, 8, 10, 12, 15, and 16 has no mode
The set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, and 9 has two modes, 4 and 7, and is called bimodal
A distribution having only one mode is called unimodal
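A sketch that reproduces the three examples above with `collections.Counter`. It assumes the convention that a set in which every value occurs equally often has no mode; an empty list stands for "no mode", and a list with two entries means the set is bimodal.

```python
from collections import Counter


def modes(xs):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(xs)
    top = max(counts.values())
    # Assumed convention: if every value occurs equally often, there is no mode.
    if all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18]))   # [9]
print(modes([3, 5, 8, 10, 12, 15, 16]))                   # []
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))           # [4, 7]
```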
For grouped data
$\text{Mode} = L_1 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$
where L1: lower class boundary of the modal class (i.e., the class containing the mode)
Δ1: excess of modal frequency over frequency of next-lower class
Δ2: excess of modal frequency over frequency of next-higher class
c: size of the modal class interval
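A sketch of the grouped-data mode, assuming the formula Mode = L1 + (Δ1/(Δ1 + Δ2))c and treating any class beyond either end of the table as having frequency 0. If several classes tie for the largest frequency, this sketch simply takes the first; the data are hypothetical.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L1 + (d1 / (d1 + d2)) * c for grouped data."""
    # Modal class: the class with the largest frequency (first one if tied).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    # Assumption: classes outside the table have zero frequency.
    lower_f = freqs[i - 1] if i > 0 else 0
    upper_f = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - lower_f     # excess over the next-lower class
    d2 = freqs[i] - upper_f     # excess over the next-higher class
    L1 = boundaries[i]
    c = boundaries[i + 1] - boundaries[i]
    return L1 + d1 / (d1 + d2) * c


print(grouped_mode([59.5, 62.5, 65.5, 68.5, 71.5], [1, 2, 3, 2]))  # 67.0
```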
EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN, AND MODE
For unimodal frequency curves that are moderately skewed (asymmetrical), we have the empirical relation
Mean − Mode = 3(Mean − Median)
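Solving the empirical relation for the mode gives a quick estimate when only the mean and median are known. The rearrangement is:

```latex
\begin{aligned}
\text{Mean} - \text{Mode} &= 3(\text{Mean} - \text{Median}) \\
\text{Mode} &= \text{Mean} - 3\,\text{Mean} + 3\,\text{Median} \\
            &= 3\,\text{Median} - 2\,\text{Mean}
\end{aligned}
```

For example, with a hypothetical mean of 66.25 and median of 66.5, the estimated mode is 3(66.5) − 2(66.25) = 67.0. The estimate is only as good as the relation itself, i.e., for moderately skewed unimodal data.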
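THE GEOMETRIC MEAN G
$G = \sqrt[N]{X_1 X_2 X_3 \cdots X_N}$
THE HARMONIC MEAN H
$H = \frac{N}{\sum_{j=1}^{N} \frac{1}{X_j}}$
The geometric mean of the numbers 2, 4, and 8 is ….
And the harmonic mean of the same numbers is ….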
RELATION BETWEEN THE ARITHMETIC, GEOMETRIC, AND HARMONIC MEANS
For positive numbers: $H \le G \le \bar{X}$, with equality only when all the numbers are identical.
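A sketch of the geometric and harmonic means for positive data, with a check of the ordering H ≤ G ≤ X̄. The data set [1, 3, 9] is hypothetical. The geometric mean is computed through logarithms so that long products do not overflow; Python's `statistics` module also offers `geometric_mean` and `harmonic_mean`.

```python
import math


def geometric_mean(xs):
    # G = (X1 * X2 * ... * XN) ** (1/N), computed via logs to avoid overflow
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # H = N / (1/X1 + 1/X2 + ... + 1/XN)
    return len(xs) / sum(1 / x for x in xs)


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


data = [1, 3, 9]   # hypothetical positive data
H, G, A = harmonic_mean(data), geometric_mean(data), arithmetic_mean(data)
print(H, G, A)
print(H <= G <= A)   # True
```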
THE ROOT MEAN SQUARE
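$\text{RMS} = \sqrt{\frac{\sum X^2}{N}}$
The RMS of the set 1, 3, 4, 5, and 7 is $\sqrt{\frac{1^2 + 3^2 + 4^2 + 5^2 + 7^2}{5}} = \sqrt{20} \approx 4.47$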
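The root mean square is the square root of the mean of the squares; a one-function sketch:

```python
import math


def rms(xs):
    # RMS = sqrt((X1^2 + X2^2 + ... + XN^2) / N)
    return math.sqrt(sum(x * x for x in xs) / len(xs))


print(round(rms([1, 3, 4, 5, 7]), 2))  # 4.47
```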
QUARTILES, DECILES, AND PERCENTILES
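Quartiles, deciles, and percentiles split ordered data into 4, 10, and 100 equal parts. As one possible sketch for ungrouped data, Python's `statistics.quantiles` returns the n − 1 cut points; it uses the "exclusive" interpolation method by default, which is only one of several conventions, so hand-computed textbook values may differ slightly. For grouped data, the same interpolation idea as the grouped median formula applies.

```python
import statistics

data = [1, 3, 4, 5, 7]   # hypothetical data

quartiles = statistics.quantiles(data, n=4)     # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)      # D1 ... D9
percentiles = statistics.quantiles(data, n=100) # P1 ... P99

print(quartiles)   # [2.0, 4.0, 6.0]
# The second quartile (Q2) coincides with the median.
print(quartiles[1] == statistics.median(data))  # True
```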
Standard deviation for grouped data
$s = w\sqrt{\frac{\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2 / n}{n - 1}}$
w: class width
fi: frequency
Xi: class midpoint, or deviation from an arbitrary origin in units of w (if Xi are the midpoints themselves, take w = 1)
n: total frequency (Σfi)
n − 1: degrees of freedom (page 39)
➔ for a sample, use the degrees of freedom (n − m)
➔ for a population, use n
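A sketch of the grouped standard deviation formula. The hypothetical flag `sample` switches between dividing by n − 1 (sample, one estimated parameter so m = 1) and by n (population). The usage shows that coded deviations u = (midpoint − 100)/5 with w = 5 give the same result as raw midpoints with w = 1, and that both match `statistics.stdev` on the expanded data.

```python
import math
import statistics


def grouped_sd(xs, freqs, w=1.0, sample=True):
    """Standard deviation of grouped data.

    xs: class midpoints (use w=1) or coded deviations in units of class width w
    sample=True divides by n - 1; sample=False divides by n (population)
    """
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, xs))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, xs))
    # Sum of squared deviations: sum f*X^2 - (sum f*X)^2 / n
    ss = sum_fx2 - sum_fx ** 2 / n
    return w * math.sqrt(ss / (n - 1 if sample else n))


freqs = [1, 2, 4, 2, 1]                  # hypothetical frequencies
midpoints = [90, 95, 100, 105, 110]
coded = [-2, -1, 0, 1, 2]                # (midpoint - 100) / 5
print(round(grouped_sd(midpoints, freqs), 4))      # 5.7735
print(round(grouped_sd(coded, freqs, w=5), 4))     # 5.7735
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
print(round(statistics.stdev(expanded), 4))        # 5.7735
```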
Standard deviation for ungrouped data
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
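A direct sketch of the ungrouped formula, dividing by n as for a population; the data are hypothetical. `statistics.pstdev` implements the same population formula, while `statistics.stdev` divides by n − 1 for a sample.

```python
import math
import statistics


def population_sd(xs):
    mu = sum(xs) / len(xs)                     # population mean
    # sigma = sqrt(sum (x_i - mu)^2 / n)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(population_sd(data))        # 2.0
print(statistics.pstdev(data))    # 2.0
print(statistics.stdev(data))     # sample version, divides by n - 1
```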
Note
We will learn about the construction of the ogive (cumulative frequency curve) and the cumulative frequency polygon. There are two methods of constructing them, but the drawing technique is the same for both:
1) Less than method
2) More than method
Less than method:
First prepare a less than type cumulative frequency table.
1) On the x-axis use the upper limits of the classes.
2) Mark the less than type cumulative frequency on the y-axis.
3) Plot the points using the upper limits and the corresponding cumulative frequencies.
4) Join the points with a freehand curve to get the ogive, or with line segments to get the cumulative frequency polygon.
More than method:
First prepare a more than type cumulative frequency table.
1) On the x-axis use the lower limits of the classes.
2) Mark the more than type cumulative frequency on the y-axis.
3) Plot the points using the lower limits and the corresponding cumulative frequencies.
4) Join the points with a freehand curve to get the ogive, or with line segments to get the cumulative frequency polygon.
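For the "more than" method, each lower class boundary is paired with the number of observations at or above it, which is a running total taken from the top class downward. A sketch with a hypothetical helper and data:

```python
from itertools import accumulate


def more_than_ogive(boundaries, freqs):
    """(lower class boundary, 'more than' cumulative frequency) points."""
    # Cumulate from the highest class downward, then restore ascending order.
    cumulative = list(accumulate(reversed(freqs)))[::-1]
    # No observation lies above the highest boundary, so the curve ends at zero.
    return list(zip(boundaries[:-1], cumulative)) + [(boundaries[-1], 0)]


print(more_than_ogive([59.5, 62.5, 65.5, 68.5, 71.5], [1, 2, 3, 2]))
# [(59.5, 8), (62.5, 7), (65.5, 5), (68.5, 2), (71.5, 0)]
```

When both cumulative frequency polygons are drawn on the same axes, they cross where the cumulative frequency equals N/2, i.e., at the median (66.5 for these hypothetical data).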