Chapter 2
Methods of Data Collection
and Presentation
2.1 Methods of data collection
There are two types of data: primary and secondary.
Primary data: data measured or collected by the investigator
or the user directly from the source.
Secondary data: data that the investigator does not collect
personally but obtains from someone else's records.
Primary data collection methods include observation,
personal interview, self-administered questionnaire, mailed
questionnaire, etc.
Secondary data collection methods: the data are obtained from
published or unpublished documents such as reports, journals,
magazines, articles, etc.
2.2 Methods of data presentation
The presentation of data is broadly classified into the following two
categories:
Tabular presentation
Diagrammatic and graphic presentation
The process of arranging data into classes or categories according to
similarities is technically called classification.
Definitions:
Raw data: recorded information in its original collected form, whether
it is counts or measurements.
Frequency: the number of values in a specific class of the
distribution.
Frequency distribution: the organization of raw data in
table form using classes and frequencies.
There are three basic types of frequency distribution:
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
There are specific procedures for constructing each type.
1. Categorical frequency distribution
Used for data that can be placed in specific categories,
such as nominal or ordinal data, e.g. marital status.
Example: The following data are on the political party affiliations of
a sample of 40 students. D, R, and O stand for Democratic, Republican
and Other, respectively.
D D D D O R O R O R O R O
D D R D D D R R O R D R R
O R R R R R O O R R D R D D
Construct a categorical frequency distribution.
Number of students by political party affiliation
Party        Frequency   Relative frequency
Democratic   13          0.325
Republican   18          0.450
Other         9          0.225
Total        40          1.000
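The counts in this table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the names `raw`, `labels`, `categorical_distribution` and `party_table` are illustrative, and the party codes are copied from the example data.

```python
from collections import Counter

# Party codes copied from the example: D = Democratic, R = Republican, O = Other.
raw = (
    "D D D D O R O R O R O R O "
    "D D R D D D R R O R D R R "
    "O R R R R R O O R R D R D D"
).split()

labels = {"D": "Democratic", "R": "Republican", "O": "Other"}


def categorical_distribution(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (freq, freq / total) for cat, freq in counts.items()}


party_table = categorical_distribution(raw)

# Print one row per category: name, frequency, relative frequency.
for code in ("D", "R", "O"):
    freq, rel = party_table[code]
    print(f"{labels[code]:<11}{freq:>4}{rel:>8.3f}")
```

Dividing each count by the total inside the function keeps the relative frequencies consistent with the frequencies, so they always sum to 1.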
2. Ungrouped frequency distribution
A table of all the potential raw score values that could possibly occur in
the data, along with the number of times each actually occurred.
- It is often constructed for small data sets or for data on a discrete variable.
Example: The following data are the number of cars in a sample of 30
government offices in SNNPR.
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4
4 5 4 3 5 2 7 3 3 7 7 3 8 4 5
Construct an ungrouped frequency distribution.
The frequency distribution of the number of cars
Number of cars   Frequency   Relative frequency
2                5           0.17
3                7           0.23
4                8           0.27
5                4           0.13
7                3           0.10
8                3           0.10
Total            30          1.00
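Recounting the raw data in code is a quick check on a hand tally. The sketch below is illustrative only (the names `cars`, `counts` and `ungrouped` are not from the slides); it builds one row per distinct value from the 30 observations listed in the example.

```python
from collections import Counter

# The 30 observations from the example (number of cars per office).
cars = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
        4, 5, 4, 3, 5, 2, 7, 3, 3, 7, 7, 3, 8, 4, 5]

counts = Counter(cars)
n = len(cars)

# One row per distinct value, in increasing order:
# (value, frequency, relative frequency rounded to two decimals).
ungrouped = [(value, counts[value], round(counts[value] / n, 2))
             for value in sorted(counts)]

for value, freq, rel in ungrouped:
    print(f"{value:>3}{freq:>5}{rel:>7.2f}")
print(f"Total{sum(counts.values()):>3}")
```

Because the frequencies are derived from the data, their total is guaranteed to equal the number of observations.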
3. Grouped frequency distribution
When the range of the data is large, the data must be grouped into
classes that are more than one unit in width.
Grouped frequency distribution: a frequency distribution
in which several values are grouped into one class.
Components of a grouped frequency distribution
Class limits: separate one class in a grouped frequency
distribution from another. The limits could actually appear in the
data, and there are gaps between the upper limit of one class and the
lower limit of the next.
Unit of measurement (U): the distance between two possible
consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001, ...
Class boundaries: separate one class in a grouped frequency
distribution from another. The boundaries have one more decimal
place than the raw data and therefore do not appear in the data.
There is no gap between the upper boundary of one class and the
lower boundary of the next class. The lower class boundary is
found by subtracting U/2 from the corresponding lower class
limit, and the upper class boundary is found by adding U/2 to the
corresponding upper class limit.
Class width: the difference between the upper and lower class
boundaries of any class. It is also the difference between the
lower limits of any two consecutive classes or the difference
between any two consecutive class marks.
Class mark (midpoint): the average of the lower and
upper class limits, or equivalently the average of the lower and
upper class boundaries.
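The definitions of class boundaries, class width and class mark translate directly into arithmetic. The helper names below are hypothetical, and the class 6 to 11 with U = 1 is used only as an illustration.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries lie half a unit of measurement (U/2) outside the limits."""
    return lower_limit - unit / 2, upper_limit + unit / 2


def class_width(lower_limit, upper_limit, unit=1):
    """Width is the upper boundary minus the lower boundary."""
    low, high = class_boundaries(lower_limit, upper_limit, unit)
    return high - low


def class_mark(lower_limit, upper_limit):
    """Class mark (midpoint) is the average of the two class limits."""
    return (lower_limit + upper_limit) / 2


# Illustration: the class with limits 6 and 11, measured to the nearest 1.
print(class_boundaries(6, 11))   # (5.5, 11.5)
print(class_width(6, 11))        # 6.0
print(class_mark(6, 11))         # 8.5
```

Averaging the boundaries instead of the limits gives the same class mark, since the boundaries are shifted outward by equal amounts.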
Cumulative frequency: the number of observations less than or
equal to, or greater than or equal to, a specific value.
Cumulative frequency above: the total frequency of all
values greater than or equal to the lower class boundary of a
given class.
Cumulative frequency below: the total frequency of all
values less than or equal to the upper class boundary of a given
class.
Cumulative frequency distribution (CFD): the tabular
arrangement of class intervals together with their corresponding
cumulative frequencies. It can be of the "more than" or "less than" type,
depending on the type of cumulative frequency used.
Relative frequency (rf): it is the frequency divided by the total
frequency.
Relative cumulative frequency (rcf): it is the cumulative
frequency divided by the total frequency.
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data
value can fall into two different classes.
3. The classes must be all inclusive or exhaustive. This means that
all data values must be included.
4. The classes must be continuous. There are no gaps in a
frequency distribution.
5. The classes must be equal in width. The exception here is the
first or last class. It is possible to have a "below ..." or "... and
above" class. This is often used with ages.
Steps for constructing Grouped frequency Distribution
1. Find the largest and smallest values
2. Compute the Range(R) = Maximum - Minimum
3. Select the number of classes desired, usually between 5 and 20, or use
Sturges' rule: k = 1 + 3.322 log10(n), where k is the number of classes
and n is the number of observations.
4. Find the class width by dividing the range by the number of
classes and rounding up, not off.
5. Pick a suitable starting point less than or equal to the minimum
value. The starting point is called the lower limit of the first
class. Continue to add the class width to this lower limit to get
the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the
lower limit of the second class. Then continue to add the class
width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower
limits and adding U/2 units to the upper limits.
8. Tally the data.
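Steps 1 to 7 can be sketched in Python as follows. This is one possible reading of the procedure, under stated assumptions: the data are whole numbers (U = 1), Sturges' rule is taken as k = 1 + 3.322 log10(n) with k rounded up, and the first class starts at the minimum value. The function names and the extra-class safeguard are not from the slides.

```python
import math

UNIT = 1  # unit of measurement U; whole-number data are assumed


def sturges_classes(n):
    """Number of classes from Sturges' rule, k = 1 + 3.322 * log10(n).

    Rounding k up is an assumption of this sketch.
    """
    return math.ceil(1 + 3.322 * math.log10(n))


def class_limits(data):
    """Return (limits, boundaries) following steps 1 to 7."""
    smallest, largest = min(data), max(data)            # step 1
    data_range = largest - smallest                     # step 2
    k = sturges_classes(len(data))                      # step 3
    width = max(math.ceil(data_range / k), UNIT)        # step 4: round up
    # Safeguard (not one of the listed steps): when range / k is a whole
    # number the last class can stop short of the maximum, so add a class.
    if smallest + k * width - UNIT < largest:
        k += 1
    lowers = [smallest + i * width for i in range(k)]   # step 5
    uppers = [low + width - UNIT for low in lowers]     # step 6
    boundaries = [(low - UNIT / 2, up + UNIT / 2)       # step 7
                  for low, up in zip(lowers, uppers)]
    return list(zip(lowers, uppers)), boundaries


sample = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
          18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits, boundaries = class_limits(sample)
for (low, up), (lb, ub) in zip(limits, boundaries):
    print(f"{low:>2} - {up:<2}   {lb} - {ub}")
```

For these 20 values the sketch yields six classes of width 6 starting at 6. A different starting point or rounding convention would give a different but equally valid set of classes.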
Example*:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
The complete frequency distribution follows:
Class     Class         Class   Tally     Freq.   Cf (less     Cf (more     rf     rcf (less
limits    boundaries    mark                      than type)   than type)          than type)
6 – 11    5.5 – 11.5    8.5     //        2       2            20           0.10   0.10
12 – 17   11.5 – 17.5   14.5    //        2       4            18           0.10   0.20
18 – 23   17.5 – 23.5   20.5    ///////   7       11           16           0.35   0.55
24 – 29   23.5 – 29.5   26.5    ////      4       15           9            0.20   0.75
30 – 35   29.5 – 35.5   32.5    ///       3       18           5            0.15   0.90
36 – 41   35.5 – 41.5   38.5    //        2       20           2            0.10   1.00
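The frequency and cumulative frequency columns of this table can be recomputed from the raw data once the class limits are fixed. The sketch below takes the six class limits from the table as given; the variable names are illustrative.

```python
from itertools import accumulate

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]

# Tally: count the observations falling inside each pair of class limits.
freq = [sum(low <= x <= up for x in data) for low, up in limits]
n = sum(freq)

# "Less than" type: running total from the first class downward.
cf_less = list(accumulate(freq))
# "More than" type: running total from the last class upward.
cf_more = list(accumulate(reversed(freq)))[::-1]

rf = [f / n for f in freq]            # relative frequency
rcf_less = [c / n for c in cf_less]   # relative cumulative frequency

for (low, up), f, cl, cm, r, rc in zip(limits, freq, cf_less, cf_more, rf, rcf_less):
    print(f"{low:>2} - {up:<2} {f:>3} {cl:>3} {cm:>3} {r:>6.2f} {rc:>6.2f}")
```

The last "less than" cumulative frequency and the first "more than" cumulative frequency both equal the total number of observations, which is a useful consistency check.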
Diagrammatic and Graphic presentation of data.
These are techniques for presenting data in visual displays using
geometric figures and pictures.
Importance:
They have greater attraction.
They facilitate comparison.
They are easily understandable.
Diagrams are appropriate for presenting discrete data.
The three most commonly used diagrammatic presentations for discrete as well
as qualitative data are:
Pie charts
Pictogram
Bar charts
Pie chart
A pie chart is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of the
distribution. The angle of each sector is obtained using:
angle of sector = (value of the part / the whole quantity) * 360
Example: Draw a suitable diagram to represent the following
population in a town.
Men     Women   Girls   Boys
2500    2000    4000    1500
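Applying the sector-angle formula to the town population gives the angles needed to draw the pie chart. The function name `sector_angles` and the dictionary layout are illustrative choices.

```python
def sector_angles(values):
    """Angle of each sector = (value of the part / the whole) * 360 degrees."""
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}


population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
angles = sector_angles(population)

for name, angle in angles.items():
    print(f"{name:<6}{angle:>6.1f} degrees")
# Men 90.0, Women 72.0, Girls 144.0, Boys 54.0
```

The angles sum to 360 degrees because each one is the same fraction of the circle as its category is of the whole population.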
Pictogram: a device used to represent data by means of
pictures or small symbols.
Example 2.23: The following table shows the orange
production in a plantation from production year 1990 to 1993.
Represent the data by a pictogram.
Production year   1990   1991   1992   1993
Amount (in kg)    3000   3850   3500   5000
Figure: Pictogram of the data on orange production from
1990 to 1993.
Bar Charts:
A set of bars (thick lines or narrow rectangles) representing
some magnitude over time or space.
They are useful for comparing aggregates over time or space.
Bars can be drawn either vertically or horizontally.
There are different types of bar charts. The most common
are:
Simple bar chart
Component or subdivided bar chart
Multiple bar chart
i. Simple Bar charts
Are used to display data on one variable.
They are thick lines (narrow rectangles) having the same breadth.
The magnitude of a quantity is represented by the height /length
of the bar.
Example: Draw a bar chart for the following coffee production data from
1990 to 1995.
Year                    1990   1991   1992   1993   1994   1995
Amount (in 1000 tons)   50     75     92     64     100    120
Figure: Production of coffee from 1990 to 1995 (simple bar chart; vertical
axis: amount of coffee in 1000 tons; horizontal axis: production year).
ii. Multiple bar charts:
These are used to display data on more than one variable.
They are used for comparing different variables at the same time.
Example: Draw a multiple bar chart for the data on
production of coffee (in 1000 tons) from 1991 to 1993 by
region.
Region      Production year
            1991   1992   1993
Region A    80     85     90
Region B    120    165    120
Total       200    250    210
Figure: Production of coffee from 1991 to 1993 in two regions (multiple bar
chart; vertical axis: amount of coffee in 1000 tons; horizontal axis:
production year; legend: Region A, Region B).
iii. Component bar charts:
When there is a desire to show how a total (or aggregate) is divided into its
component parts, we use a component bar chart. The bars represent the total value of
a variable, with each total broken into its component parts.
Example: Draw a component bar chart for the data on production of coffee
(in 1000 tons) from 1991 to 1993.
Figure: Component bar chart of coffee production from 1991 to 1993 (vertical
axis: amount of coffee in 1000 tons; horizontal axis: production year;
legend: Region A, Region B).
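In a component bar chart each region's segment starts where the previous one ends, so the top of the last segment is the yearly total. The sketch below computes those segment positions from the regional coffee production table; the function `stacked_segments` is a hypothetical helper, not a plotting routine.

```python
years = [1991, 1992, 1993]
production = {
    "Region A": [80, 85, 90],
    "Region B": [120, 165, 120],
}


def stacked_segments(series):
    """Return ({name: [(base, top), ...]}, totals) for a component bar chart."""
    n_bars = len(next(iter(series.values())))
    bases = [0] * n_bars
    segments = {}
    for name, values in series.items():
        tops = [base + value for base, value in zip(bases, values)]
        segments[name] = list(zip(bases, tops))
        bases = tops  # the next component is stacked on top of this one
    return segments, bases  # the final tops are the bar totals


segments, totals = stacked_segments(production)
for i, year in enumerate(years):
    parts = ", ".join(f"{name}: {seg[i][0]}-{seg[i][1]}"
                      for name, seg in segments.items())
    print(f"{year}: {parts} (total {totals[i]})")
```

The totals it produces (200, 250, 210) match the Total row of the regional table, which is what the full height of each bar represents.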
Graphic presentation of data
1. Histogram: a graph which displays the data by using
vertical bars of various heights to represent frequencies.
Class boundaries are placed along the horizontal axis.
Class marks and class limits are sometimes used as the
quantity on the horizontal axis.
Example: Construct a histogram for the frequency
distribution of the time spent by the automobile workers.
The frequency distribution is:
Time (class boundaries)   Class mark   Number of workers
15.5-21.5                 18.5         3
21.5-27.5                 24.5         6
27.5-33.5                 30.5         8
33.5-39.5                 36.5         4
39.5-45.5                 42.5         3
45.5-51.5                 48.5         1
Figure 1: The time in minutes spent by automobile workers to
travel from home to work.
2. Frequency polygon: a line graph drawn by plotting class marks
against class frequencies and joining the points by a set of line segments.
• To close the polygon, add two classes with zero frequencies at the two ends of
the frequency distribution.
Example: Construct a frequency polygon for the frequency
distribution of the time spent by the automobile workers.
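The points of the frequency polygon can be listed before drawing it. This sketch uses the class marks and frequencies from the automobile workers table and adds the two zero-frequency classes one class width beyond each end; the variable names are illustrative.

```python
marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]   # class marks
freq = [3, 6, 8, 4, 3, 1]                      # number of workers

width = marks[1] - marks[0]                    # equal class width assumed

# Add an empty class at each end so the polygon closes on the horizontal axis.
points = ([(marks[0] - width, 0)]
          + list(zip(marks, freq))
          + [(marks[-1] + width, 0)])

for x, y in points:
    print(f"({x}, {y})")
```

Joining these eight points in order with straight line segments gives the polygon, which starts at (12.5, 0) and ends at (54.5, 0).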
Figure 2: The time in minutes spent by automobile workers to
travel from home to work.
3. Ogive (Cumulative frequency graph): A graph showing
the cumulative frequency (less than or more than type)
plotted against upper or lower class boundaries
respectively.
That is, class boundaries are plotted along the horizontal
axis and the corresponding cumulative frequencies are
plotted along the vertical axis.
The points are joined by a freehand curve.
Example: Construct an ogive for the time spent by the
automobile workers.
The cumulative frequency distribution is (LCF = "less than" cumulative
frequency, MCF = "more than" cumulative frequency):
Class boundaries   LCF   MCF
15.5               0     25
21.5               3     22
27.5               9     16
33.5               17    8
39.5               21    4
45.5               24    1
51.5               25    0
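Both cumulative columns follow from the class frequencies of the automobile workers data (3, 6, 8, 4, 3, 1). A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
freq = [3, 6, 8, 4, 3, 1]   # frequency of the class between consecutive boundaries

# "Less than" type: number of observations below each boundary.
lcf = [0] + list(accumulate(freq))
n = lcf[-1]
# "More than" type: number of observations above each boundary.
mcf = [n - c for c in lcf]

for b, less, more in zip(boundaries, lcf, mcf):
    print(f"{b:>5}{less:>5}{more:>5}")
```

Plotting `lcf` against `boundaries` gives the "less than" ogive and plotting `mcf` gives the "more than" ogive. At every boundary the two values add up to the total of 25.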
Figure 3: The time in minutes spent by automobile workers to
travel from home to work.