Supplementary Readings for Research
Makonnen Asefa
Fasil Tessema
Department of Epidemiology and Biostatistics
Community Health Program
Jimma University
April 2000
Jimma
Table of Contents
Preface
1. Introduction to Research
1.1. Health for all
1.2. What is research?
1.3. What is meant by a “health system”?
2. Identifying and Prioritizing Problems for Research
2.1. Problem Identification
2.2. Criteria for Prioritizing Problems for Research
3. Statement of the Problem
3.1. Analyzing the Problem
3.2. Deciding on the Focus and Scope of the Research
3.3. Formulating the Problem Statement
4. Review of Literature and Available Information
Why is it important to review already available information when preparing a research proposal?
What are the possible sources of information?
Where can we find these different sources?
How do you write a review of literature?
5. Research Objectives
How should you state your objectives?
6. Research Methodology
6.1. Variables
What is a variable?
Dependent and Independent Variables
Background Variables
6.2. Study Type
6.2.1. Introduction
6.2.2. Overview of Study Types
6.3. Overview of Data-Collection Techniques
6.3.1. Introduction
6.3.2. Types of Questions
6.3.3. Steps in Designing a Questionnaire
6.4. Sampling
6.4.1. Introduction
6.4.2. Sampling Methods
6.5. Plan for Data Collection
6.5.1. Introduction
6.5.2. Stages in the Data-Collection Process
6.6. Plan for Processing and Analysis for Data
6.6.1. Introduction
6.6.2. Sorting Data
6.6.3. Performing Quality-Control Checks
6.6.4. Data Processing
6.6.5. Data Analysis
6.7. Pre-test/Pilot Study
What is a pretest or pilot study of the methodology?
7. Work plan
7.1. Introduction
7.2. Various Work Scheduling and Planning Techniques
7.2.1. The work schedule
7.2.2. The Gantt chart
8. Plan for Project Administration, Monitoring & Utilization
8.1. Administering Research Projects
8.2. Project Monitoring
8.3. Planning for the Utilization and Dissemination of Research Results
9. Budget
10. Finalizing and reviewing the research proposal
10.1. Finalizing the research proposal
10.2. Writing a Summary of the Research Proposal
A Short Handbook for beginners to EPI INFO: A Word Processing, Database, and Statistical System for Epidemiology on Microcomputers
1. The EPED program
2. The ENTER program
3. The CHECK program
4. The ANALYSIS program
5. Browsing
6. Graphics
7. The IMPORT program
8. The EXPORT program
9. The MERGE program
10. The Anthropometric calculations using EPINUT.
11. The CSAMPLE program: Analyzing data from complex survey samples
References
Preface
Jimma University's main objective is to train professionals who are responsive to the
needs of society. In order to realize this objective, the university uses three different
strategies which complement each other. The strategies are:
What is the purpose of the student research project? Its main purpose is to enhance students'
skills in problem identification and in proposing solutions by involving all stakeholders in a
community setting. To realize this, training in undertaking research is organized based
on a new approach: participatory training. The participatory approach to learning is
learner-centered, emphasizing the learners' development of abilities and skills to diagnose and
solve problems. The trainer merely facilitates a process of competency building and self-discovery
for the learner, whose needs, experience and goals are the focus of the training.
The methods for participatory learning include brainstorming, trainer-led discussion, case
methods, role-playing, exercises, group work, and plenary sessions. A trainer's guide is presented
in 'A manual for undertaking research, the participatory approach-learning by doing'
(Asefa 1998).
This document has been prepared as supplementary reading material for the participants. The
materials in the different sections are largely adapted from Varkevisser et al. 1991.
This training is organized as a workshop lasting two to three weeks.
In the workshop, each participant and trainer/facilitator brings his/her own experiences.
The workshop provides a forum for sharing information, where everyone can contribute
the benefits of his/her own experience and knowledge. This sharing will add greatly to
the richness and relevance of the course. During the training, participants will identify
priority problems in a community or in their own working situations. Following this,
participants will develop, in a modular approach, the different sections of a research
proposal that they will carry out in the field in community/organization settings.
Research is a systematic search for information and new knowledge. This training is intended
to advance applied/operational research for health. Operational research is necessary for
identifying priority problems and for designing and evaluating policies and programmes
that will be of the greatest health benefit, using existing knowledge and available
resources.
We hope that participants who successfully complete this training will be able to contribute
greatly to improving the health of a community by enhancing the effectiveness
of the health system as an integral part of the overall process of socio-economic
development.
1. Introduction to Research
The adoption of the philosophy and strategies for Health For All by the Year 2000
implies that we are committed to ensuring that all people (not just some) will attain a
level of health that enables them to participate actively in the social and economic life of
the community in which they live.
In the past, research has made major contributions to health by providing knowledge on
the causes of diseases and ill health and by developing the technology to cure and prevent
disease and promote health.
However, despite the considerable amount of knowledge and technology that is available
today, many people continue to be unable to achieve the targets of Health For All. Why
is this so?
The health of any community depends on the interaction and balance between the health
needs of the community, the health resources that are available, and the selection and
application of health and health-related interventions. This is illustrated in Fig.
1.1. It is therefore important to apply the available technology optimally, within the
limited resources available, in order to serve the health needs of the
community.
Figure 1.1 Evaluation of health interventions.
[Diagram: HEALTH NEEDS (perceived by professionals from different disciplines and the population) and HEALTH RESOURCES (available from health services, other sectors, and the population).]
To effect the necessary changes to achieve Health for All, countries must decide on the
best approaches to adopt. This requires detailed and accurate information on needs,
possibilities, and the consequences of recommended actions. Such information is often
lacking, inadequate, or unreliable. As a result, decisions based on assumptions
and unjustified conclusions often lead to the selection of inappropriate policy and
program choices, the consequences of which are only discovered after implementation.
In many instances, research can provide the information needed for informed decision-making.
1.2. What is research?
Characteristics of research:
First, basic research is necessary to generate new knowledge and technologies to deal
with major unresolved health problems. Second, applied research is necessary to identify
priority problems and to design and evaluate policies and programs that will deliver the
greatest health benefit, making optimal use of available resources.
During the past two (or even three) decades, there has been a rapid evolution of concepts
and research approaches to support managerial aspects of health development. Many of
these have been described by specific terms such as operations research, health services
research, health manpower research, policy and economic analysis, applied research, and
decision-linked research. Each of these has made crucial contributions to the
development of health systems research (HSR).
• A set of cultural beliefs about health and illness that forms the basis for
health-seeking and health-promoting behavior;
• The institutional arrangements within which that behavior occurs; and
• The socioeconomic/political/physical context for those beliefs and institutions.
It includes the following components:
The aim of HSR is to provide health managers at all levels, as well as community leaders,
with the relevant information they need to make decisions on problems they are facing.
Complex research projects at the policy level may require heavy involvement of a
multidisciplinary team of researchers; even so, health-care decision-makers, health
providers, and representatives of the community that will be affected by the policy
should be involved as well. Although service personnel may take the major role in simpler studies
focusing on practical problems in their own working situations, their projects may require
assistance from researchers with skills in relevant disciplines, as well as the participation
of health managers and the community.
2. Identifying and Prioritizing Problems for Research
These questions can be placed in three broad categories, depending on the type of
information sought:
1. Description of health problems required for planning interventions.
Planners need to know the magnitude and distribution of health needs
as well as of health resources, to formulate adequate policies and plan
interventions.
2. Information required to evaluate ongoing interventions with respect to:
• Coverage of health needs
• Coverage of target groups
• Quality
• Cost
• Effects/impact
to assess progress and the need for adjustment on a routine basis.
3. Information required to define problem situations arising during the
implementation of health activities, to analyze their possible causes, and to find
solutions.
Each problem that is proposed for research has to be judged according to certain
guidelines or criteria. There may be several ideas to choose from. Before deciding on a
research topic, each proposed topic must be compared with all other options.
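One way to make this comparison explicit is to score every candidate topic on the same criteria and rank the topics by their total score. The Python sketch below is only an illustration under stated assumptions: the three criteria are borrowed from the issues of usefulness, feasibility, and duplication discussed in Section 3.2, and the 1-3 scale, the topic list, and the scores are invented. The group would still discuss ties and borderline cases.

```python
# Hypothetical scoring sheet: each candidate topic gets a score from 1 (low)
# to 3 (high) on each criterion. Criteria names and the 1-3 scale are assumptions.
CRITERIA = ("usefulness", "feasibility", "avoids_duplication")


def rank_topics(scores: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (topic, total score) pairs, highest total first."""
    totals = []
    for topic, by_criterion in scores.items():
        for criterion in CRITERIA:
            if by_criterion[criterion] not in (1, 2, 3):
                raise ValueError(f"{topic}: {criterion} must be 1, 2 or 3")
        totals.append((topic, sum(by_criterion[c] for c in CRITERIA)))
    # Highest total first; equal totals are listed alphabetically by topic.
    return sorted(totals, key=lambda pair: (-pair[1], pair[0]))


# Invented scores for three example topics.
candidates = {
    "High defaulter rate among TB patients":
        {"usefulness": 3, "feasibility": 3, "avoids_duplication": 2},
    "Low utilization of child welfare clinics":
        {"usefulness": 3, "feasibility": 2, "avoids_duplication": 3},
    "Teenagers' KAP on bilharzia":
        {"usefulness": 2, "feasibility": 2, "avoids_duplication": 1},
}

for topic, total in rank_topics(candidates):
    print(f"{total:2d}  {topic}")
```

Here the first two topics tie, which is exactly the situation in which the final choice should come from discussion with the people concerned rather than from the numbers alone.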
3. Statement of the Problem
The first major section in a research proposal is the statement of the problem. The
following components are included in the statement of the problem.
In HSR, the researcher is often required to do research on a problem with which he or she
is not very familiar. Health workers, managers, or community members may be much
more familiar with the problems. But even they may never have given critical attention
to the various aspects of the problem.
In a workshop setting, it may be impossible to obtain input from all concerned. The
opinion of people who cannot be consulted (e.g., local health staff or community leaders)
should be solicited immediately after the workshop, before finalizing the proposal.
Step 3.1 Write down the core problem(s) as defined in Step 2 in the center of a
blackboard or flip chart.
Figure 3.1 Identifying several “generations” of predisposing factors causing a high defaulter rate among TB patients.
[Diagram linking the core problem, high defaulter rate of TB patients, to predisposing factors: other service factors; no systematic advice and counseling provided; no adequate materials or guidelines on TB education; staff not trained; inefficient health staff distribution; little understanding of patients’ perceptions of TB and TB treatment.]
Step 3.4 Attempt to organize related factors together into larger categories, and
develop your final draft of the diagram.
This final step in organizing the diagram will help you not to overlook important
factors and will make it easier to develop the data collection tools in a systematic
way.
For example, the revised diagram focusing on the “high defaulter rate” among
tuberculosis patients may group contributing factors into three main categories:
• Sociocultural factors;
• Service-related factors; and
• Disease-related factors.
Each of these groups can then be broken down into more specific factors. In our TB
example:
Sociocultural factors, which may be:
• Personal factors such as age, sex, education, occupation, and composition (and
possible support) of the family;
• Community-determined factors such as:
- Poor or conflicting community knowledge of the signs and causes of TB and
of the requirements for TB treatment,
- Availability of other types of treatment in the community,
- Preference for other types of treatment, and
- Poor understanding and support from employers.
For example, we may need information on the knowledge, attitudes, and practices (KAP) of
teenagers with respect to bilharzia to develop adequate health-education materials for
schools. In this case, we can make a different diagram listing the relevant KAP that we
want to study. We can, however, go one step further and list the factors that may
(have) contribute(d) to the development of the teenagers’ KAP.
After this detailed analysis of the problem, it is important to reconsider the focus and
scope of the research. Several issues are particularly important to consider, including:
1. Usefulness of the information. Will the information that would be collected on this
problem help improve health and health care? Who would use the findings related to
the factors in the diagram that would be studied? How would the findings be used?
2. Feasibility. Is it feasible to analyze all the factors related to the problem in the 4-6
months available for research?
3. Duplication. Is some of the information related to factors in the diagram already
available? What aspects of the problem need further research?
Review your problem diagram with these issues in mind. If your problem is complex and
has many possible contributing factors, identify and demarcate the boundaries of possible
smaller research topics. If there is more than one possible topic, use the selection criteria
and ranking method described in Chapter 2 to assist you in your final decision
concerning the focus and scope of your research.
2. A concise description of the nature of the problem (the discrepancy between what is
and what should be) and of its size, distribution, and severity (who is affected, where,
since when, and what are the consequences for those affected and for the services?)
3. An analysis of the major factors that may influence the problem and a convincing
argument that available knowledge is insufficient to solve it.
4. A brief description of any solutions that have been tried in the past, how well they
have worked, and why further research is needed.
5. A description of the type of information expected to result from the project and how
this information will be used to help solve the problem.
6. If necessary, a short list of definitions of crucial concepts used in the statement of the
problem.
A list of abbreviations may be annexed to the proposal, but each abbreviation also has to
be written out in full when introduced in the text for the first time.
4. Review of Literature and Available Information
Different sources of information can be consulted and reviewed at various levels of the
administrative system within your country and internationally.
You need to develop a strategy to gain access to each source and to obtain information in
the most productive manner. Your strategy may vary according to where you work and
the topic under study. It may include the following steps:
Some agencies will assist with your literature search if requested by telephone or in
writing. The request, however, should be very specific. Otherwise you will receive a
long list of references, most of which will not be relevant to your topic. If you are
requesting a computerized search it is useful to suggest key words that can be used in
locating the relevant references.
Note: Facilitators should be able to provide specific information regarding national and
international facilities to assist you with the search for literature.
Information on an index card should be organized in such a way that you can easily find
all data you will need for your report.
Example:
Fantahun M, Abebe T. Self-reported disease conditions among workers of the
textile mill in Bahir Dar, northwest Ethiopia. Ethiop J Health Dev 199; 13(2):
151-156.
For a book, the following information should be noted:
Author(s) (last name first). Title of book. Edition, Place: Publisher, year: number
of pages in the book.
Example:
Abramson JH. Survey methods in community medicine. 2nd ed. Edinburgh:
Churchill Livingstone, 1979: 229.
For a chapter in a book, the following information should be noted:
Author(s) (last name first). Chapter title. In: Editors of book (surname followed
by initials), eds. Title of book. Place: Publisher, year: page numbers of chapter.
Example:
Tsega E. Viral Hepatitis. In: Kloos H & Ahmed Z, eds. The ecology of health
and disease in Ethiopia. San Francisco: Westview Press, 1993: 213-222.
This information, recorded in a standard format such as that suggested above, can then
easily be used as part of your list of references for the proposal. The formats suggested
above have been adopted as standard by over 300 biomedical journals and are sometimes
referred to as “the Vancouver System.” For more information, see International
Committee of Medical Journal Editors (1988). Other references in this series follow
IDRC’s house style.
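If each index-card entry is kept as a structured record, it can be printed in exactly the same format every time. The Python sketch below assumes the book and chapter templates given above, with several authors or editors separated by commas; the class and field names (Book, Chapter, format_book, format_chapter) are hypothetical, and the code does not cover every Vancouver rule (for example, very long author lists).

```python
from dataclasses import dataclass


@dataclass
class Book:
    authors: list[str]   # "Surname Initials", e.g. "Abramson JH"
    title: str
    place: str
    publisher: str
    year: int
    pages: int           # number of pages in the book
    edition: str = ""    # e.g. "2nd ed."; leave empty for a first edition


@dataclass
class Chapter:
    authors: list[str]
    chapter_title: str
    editors: list[str]
    book_title: str
    place: str
    publisher: str
    year: int
    page_range: str      # page numbers of the chapter, e.g. "213-222"


def format_book(b: Book) -> str:
    """Author(s). Title. Edition. Place: Publisher, year: number of pages."""
    edition = f" {b.edition}" if b.edition else ""
    return (f"{', '.join(b.authors)}. {b.title}.{edition} "
            f"{b.place}: {b.publisher}, {b.year}: {b.pages}.")


def format_chapter(c: Chapter) -> str:
    """Author(s). Chapter title. In: Editors, eds. Book title. Place: Publisher, year: pages."""
    return (f"{', '.join(c.authors)}. {c.chapter_title}. "
            f"In: {', '.join(c.editors)}, eds. {c.book_title}. "
            f"{c.place}: {c.publisher}, {c.year}: {c.page_range}.")


abramson = Book(["Abramson JH"], "Survey methods in community medicine",
                "Edinburgh", "Churchill Livingstone", 1979, 229, edition="2nd ed.")
tsega = Chapter(["Tsega E"], "Viral Hepatitis", ["Kloos H", "Ahmed Z"],
                "The ecology of health and disease in Ethiopia",
                "San Francisco", "Westview Press", 1993, "213-222")
print(format_book(abramson))
print(format_chapter(tsega))
```

The first line printed reproduces the Abramson example above character for character; the chapter example differs only in separating the editors with a comma instead of "&".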
The index card or computer entry (one for each reference) could contain quotations and
information such as:
• Key words;
• A summary of the contents of the book or article, concentrating on
information relevant to your study; and
• A brief analysis of the content, with comments such as:
- Appropriateness of the methodology;
- Important aspects of the study; and
- How information from the study can be used in your research.
Index cards or computer entries can also be used to summarize information obtained from
other sources, such as informal discussions, reports of local health statistics, and internal
reports.
There are a number of steps you should take when preparing a review of available
literature and information:
• Then, decide in which order you want to discuss the various issues. If you
discover you have not yet found literature or information on some aspects of
your problem that you suspect are important, make a special effort to find this
literature.
• Finally, write a coherent discussion of one or two pages in your own words,
using all relevant references. You can use consecutive numbers in the text to
refer to your references. Then list your references in that order, using the
format described in the section above on index cards. Add this list as an
annex to your research proposal.
Alternatively, you can refer to the references more fully in the text, putting the
last name of the author, the year of publication, and the page number(s) referred
to in parentheses, e.g., (Kebede 1988: 15-17). If this system of citation is
used, the references at the end of the proposal should be listed in alphabetical
order.
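The consecutive-number system can be applied mechanically while drafting. The Python sketch below assumes a hypothetical drafting convention in which each source is marked in the text as [key]; it replaces each key with a number in order of first citation (a source cited again keeps its first number) and returns the keys in that order, which is the order of the numbered reference list. For the author-year system, the same keys are simply sorted alphabetically. The key AuthorX1995 is a placeholder, not a real reference.

```python
import re


def number_citations(text: str) -> tuple[str, list[str]]:
    """Replace [key] markers with (1), (2), ... in order of first use.

    Returns the numbered text and the reference keys in citation order.
    The [key] marker syntax is a hypothetical drafting convention.
    """
    order: list[str] = []

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in order:
            order.append(key)
        # A source cited again keeps the number it received the first time.
        return f"({order.index(key) + 1})"

    numbered = re.sub(r"\[([^\]]+)\]", replace, text)
    return numbered, order


draft = "First point [Kebede1988]. Second point [AuthorX1995]. Third point [Kebede1988]."
numbered, order = number_citations(draft)
print(numbered)       # numbered text for the Vancouver-style system
print(order)          # order of the numbered reference list
print(sorted(order))  # alphabetical order for the author-year system
```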
Possible bias
It is useful to be aware of various types of bias. This will help you to be critical of the
existing literature. If you have reservations about certain references, or if you find
conflicting opinions in the literature, discuss these openly and critically. Such a critical
attitude may also help you avoid biases in your own study. Common types of bias in
literature include:
• Playing down controversies and differences in one’s own study results;
• Restricting references to those that support the point of view of the author;
and
• Drawing far-reaching conclusions from preliminary or shaky research results
or making sweeping generalizations from just one case or a small study.
Ethical Considerations
The types of bias mentioned above would call the scientific integrity of the responsible
researcher into question. Moreover, careless presentation and interpretation of data may
put readers who want to use the study’s findings on the wrong track. This may have
serious consequences in terms of time and money spent on HSR, and it may even lead to
wrong decisions affecting people’s health.
A similarly serious act, for which a researcher can be taken to court, is presenting the
research results or scientific publications of other writers without acknowledging the author.
Therefore, appropriate referencing procedures should always be followed in research
proposals as well as in research reports.
5. Research Objectives
Objectives should be closely related to the statement of the problem. For example, if the
problem identified is low utilization of child welfare clinics, the general objective of the
study could be to identify the reasons for this low utilization in order to find solutions.
The general objective of a study states what is expected to be achieved by the study in
general terms.
It is possible (and advisable) to break down a general objective into smaller, logically
connected parts. These are normally referred to as specific objectives.
Specific objectives should systematically address the various aspects of the problem as
defined under “Statement of the problem” and the key factors that are assumed to
influence or cause the problem. They should specify what you would do in your study,
where, and for what purpose.
The general objective “to identify the reasons for low utilization of child welfare clinics
in District X in order to find solutions,” for example, could be broken down into the following
specific objectives:
3. Identify factors related to the child welfare services offered that make them
either attractive or not attractive to mothers. This objective may be divided
into smaller sub-objectives focusing on distance between the home and clinic,
acceptability of the services to mothers, quality of the services, etc.
4. Identify socioeconomic and cultural factors that may influence the mothers’
utilization of services. (Again, this objective may be broken down into
several sub-objectives.)
6. Work with all parties concerned to develop a plan for implementing the
recommendations.
Note: An objective focusing on how the results will be used should be included in every
applied research study.
• Cover the different aspects of the problem and its contributing factors in a
coherent way and in a logical sequence;
• Are clearly phrased in operational terms, specifying exactly what you are
going to do, where, and for what purpose;
Keep in mind that when the project is evaluated, the results will be compared to the
objectives. If the objectives have not been spelled out clearly, the project cannot be
evaluated.
Note:
Policymakers and field staff usually feel the need for research because they do NOT have
enough insight into the causes of a certain problem. Therefore, most HSR proposals
present the specific objectives in the form of open statements (as given in the examples
earlier) instead of focusing the study on a limited number of hypotheses.
6. Research Methodology
Research methodology describes how you are going to achieve your stated objectives. The
following are the different components of the research methodology.
6.1. Variables
What is a variable?
A simple example of a variable is a person’s age. The variable age can take on different
values because a person can be 20 years old, 35 years old, and so on. Other examples of
variables are:
• Weight (expressed in kilograms or in pounds);
• Distance between homes and clinic (expressed in kilometers or in minutes of walking
distance); and
• Monthly income (expressed in birr or dollars).
Because the values of all these variables are expressed in numbers, we call them
numerical variables.
The different values of a variable may also be expressed in categories. For example, the
variable sex has two values, male and female, which are distinct categories. Other
examples are:
Variable                          Categories
Color                             • Red
                                  • Blue
                                  • Green, etc.
Main type of staple food eaten    • Maize
                                  • Millet
                                  • Rice
                                  • Teff, etc.
Since the values of these variables are expressed in categories, we call them categorical
variables.
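The distinction matters when the data are analyzed: numerical variables can be summarized with averages, whereas categorical variables can only be counted category by category. A minimal Python sketch, using five invented respondents; the variable names (age, weight_kg, sex, staple_food) are hypothetical.

```python
from collections import Counter
from statistics import mean

# Invented records for five respondents, for illustration only.
records = [
    {"age": 20, "weight_kg": 52.0, "sex": "female", "staple_food": "teff"},
    {"age": 35, "weight_kg": 70.5, "sex": "male", "staple_food": "maize"},
    {"age": 28, "weight_kg": 61.0, "sex": "female", "staple_food": "teff"},
    {"age": 42, "weight_kg": 66.0, "sex": "male", "staple_food": "millet"},
    {"age": 31, "weight_kg": 58.5, "sex": "female", "staple_food": "teff"},
]

NUMERICAL = ("age", "weight_kg")
CATEGORICAL = ("sex", "staple_food")


def summarize(rows):
    """Mean for numerical variables, counts per category for categorical ones."""
    summary = {}
    for var in NUMERICAL:
        # Numbers can be added and averaged.
        summary[var] = round(mean(r[var] for r in rows), 1)
    for var in CATEGORICAL:
        # Categories cannot be averaged; they can only be counted.
        summary[var] = dict(Counter(r[var] for r in rows))
    return summary


print(summarize(records))
```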
Dependent and Independent Variables
Because in health systems research you often look for causal explanations, it is important
to make a distinction between dependent and independent variables.
The variable that is used to describe or measure the problem under study is called the
dependent variable.
The variables that are used to describe or measure the factors that are assumed to cause or
at least to influence the problem are called the independent variables.
A variable that is associated with the problem and with a possible cause of the problem is
a potential confounding variable.
[Diagram: Cause (independent variable) → Effect/outcome (dependent variable), with other factors (confounding variables) associated with both.]
Therefore, to give a true picture of cause and effect, the confounding variables must be
considered, either at the planning stage or during the analysis.
For example:
A relationship is shown between a low level of maternal education and malnutrition
in children under 5. However, family income may be related to the mother’s education as well as
to malnutrition.
[Diagram: mother’s education (independent variable) → malnutrition in under-5s (dependent variable), with family income as a confounding variable.]
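Taking a confounder into account during the analysis can be illustrated with a stratified analysis. One common approach, which this manual does not itself describe, is to compute the odds ratio within each stratum of the confounder and a pooled Mantel-Haenszel odds ratio, and to compare these with the crude odds ratio. The Python sketch below uses invented counts stratified by family income; in these made-up data the crude association between low maternal education and malnutrition disappears within each income group.

```python
# Each stratum is a 2x2 table (a, b, c, d):
#   a = low maternal education, malnourished
#   b = low maternal education, not malnourished
#   c = higher maternal education, malnourished
#   d = higher maternal education, not malnourished


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of the confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Invented counts, stratified by family income (illustration only).
low_income = (40, 60, 20, 30)
high_income = (5, 45, 10, 90)

# Crude table: add the two strata together cell by cell, ignoring income.
crude = odds_ratio(*(sum(cells) for cells in zip(low_income, high_income)))
adjusted = mantel_haenszel_or([low_income, high_income])

print(f"Crude OR: {crude:.2f}")                              # 1.71
print(f"OR within low income: {odds_ratio(*low_income):.2f}")   # 1.00
print(f"OR within high income: {odds_ratio(*high_income):.2f}")  # 1.00
print(f"Mantel-Haenszel OR: {adjusted:.2f}")                 # 1.00
```

The crude odds ratio suggests an association, but within each income group the odds ratio is 1, so in these invented data the apparent effect of education is entirely due to family income.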
Background Variables
In almost every study background variables appear, such as age, sex, educational level,
socioeconomic status, marital status, and religion. These background variables are often
related to a number of independent variables, so that they influence the problem
indirectly. (Hence they are called background variables.) If the background variables are
important to the study, they should be measured. However, try to keep the number of
background variables measured as few as possible, in the interest of economy.
Background variables are notorious “confounders.”
6.2. Study Type
6.2.1. Introduction
Depending on the existing state of knowledge about a problem that is being studied,
different types of questions may be asked that require different study designs. Some
examples are given below.
State of knowledge: Suspecting that certain factors contribute to the problem.
• Research question: Are certain factors indeed associated with the problem? (e.g., Is lack of preschool education related to low school performance? Is a low-fiber diet related to carcinoma of the large intestine?)
  Study type: Analytical (comparative) studies: cross-sectional comparative studies, case-control studies, cohort studies.
State of knowledge: Having established that certain factors are associated with the problem; desiring to establish the extent to which a particular factor causes or contributes to the problem.
• Research question: What is the cause of the problem?
  Study type: Cohort studies.
• Research question: Will the removal of a particular factor prevent or reduce the problem? (e.g., stopping smoking, providing safe water)
  Study type: Experimental or quasi-experimental study designs.
State of knowledge: Having sufficient knowledge about the cause to develop and assess an intervention that would prevent, control, or solve the problem.
• Research question: What is the effect of a particular intervention/strategy? (e.g., treating with a particular drug, being exposed to a certain type of health education)
  Study type: Experimental or quasi-experimental study designs.
The type of study design chosen depends on:
When exploring more complicated management problems and many health problems, we
usually want to go further and determine the extent to which one or several independent
variables contribute to the problem (for example, the contribution of a low-fiber diet to
cancer of the large intestine). For these types of problems, more rigorous analytical or
quasi-experimental studies will have to be conducted before we decide on appropriate
interventions.
Several classifications of study types are possible, depending on what research strategies
are used. Usually a combination of research strategies is used, including:
6.3. Overview of Data-Collection Techniques
6.3.1. Introduction
Interviews and self-administered questionnaires are probably the most commonly used
research techniques. Therefore, designing good “questioning tools” forms an important
and time-consuming phase in the development of most research proposals.
Once the decision has been made to use these techniques, the following questions should
be considered before designing our tools:
Before examining the steps in designing a questionnaire, we need to review the types of
questions used in questionnaires. Depending on how questions are asked and recorded
we can distinguish two major possibilities:
• Open-ended questions, and
• Closed questions.
Open-ended questions
Open-ended questions permit free responses that should be recorded in the respondent’s
own words. The respondent is not given any possible answers to choose from.
For example
“Can you describe exactly what the traditional birth attendant did when your labor
started?”
“What do you think are the reasons for a high drop-out rate of village health
committee members?”
“What would you do if you noticed that your daughter (a schoolgirl) had a
relationship with a teacher?”
Closed Questions
Closed questions offer a list of possible options or answers from which the respondents
must choose.
For example
“What is your marital status?”
1. Single
2. Married/living together
3. Separated/divorced/widowed
“Have you ever gone to the local village health worker for treatment?”
1. Yes
2. No
Closed questions may also be used if one is only interested in certain aspects of an issue
and does not want to waste the time of the respondent and interviewer by obtaining more
information than one needs.
For example, a researcher who is only interested in the protein content of a family diet
may ask:
“Did you eat any of the following foods yesterday?” (circle yes or no for each set
of items)
• Peas, beans, lentils Yes No
• Fish or meat Yes No
• Eggs Yes No
• Milk or Cheese Yes No
Closed questions may be used as well to get the respondents to express their opinions by
choosing rating points on a scale.
For example
“How useful would you say the activities of the Village Health Committee have
been in the development of this village?”
1. Extremely useful Ο
2. Very useful Ο
3. Useful Ο
4. Not very useful Ο
5. Not useful at all Ο
Advantages and disadvantages of open-ended and closed questions, and conditions for optimal use

Open-ended questions
Advantages:
• Issues not previously thought of when planning the study may be explored, thus providing valuable new insights into the problem.
Disadvantages:
• Skilled interviewers are needed to get the discussion started and focused on relevant issues and to record all important information.
• Analysis is time-consuming and requires experience.
Conditions for optimal use:
• Thoroughly train and supervise the interviewers or select experienced people.
• Prepare a list of further questions to keep at hand to use to “probe” for answer(s) in a systematic way.
• Pretest open-ended questions and, if possible, pre-categorize the most common responses, leaving enough space for other answers.

Closed questions
Advantages:
• Answers can be recorded quickly.
• Analysis is easy.
Disadvantages:
• Closed questions are less suitable for face-to-face interviews with non-literate respondents.
• Respondents may choose options they would not have thought of themselves (leading questions → bias).
Conditions for optimal use:
• Use closed questions only on issues that are simple.
• Pretest closed questions first as open-ended questions to see if your categories cover all possibilities.
• Use closed questions in combination with open-ended questions.
6.3.3. Steps in Designing a Questionnaire
Designing a good questionnaire always takes several drafts. In the first draft we should
concentrate on the content. In the second, we should look critically at the formulation
and sequencing of the questions. Then we should scrutinize the format of the
questionnaire. Finally, we should do a test-run to check whether the questionnaire gives
us the information we require and whether both we and the respondents feel at ease with
it. Usually the questionnaire will need some further adaptation before we can use it for
actual data collection.
Step 1: Content
Decide what questions will be needed to measure or to define your variables and
reach your objectives.
When developing the questionnaire, you should reconsider the variables you have
chosen, and, if necessary, add, drop or change some. You may even change some
of your objectives at this stage.
Formulate one or more questions that will provide the information needed
for each variable.
Take care that questions are specific and precise enough that different respondents
do not interpret them differently. For example, a question such as: “Where do
community members usually seek treatment when they are sick?” cannot be
asked in such a general way because each respondent may have something
different in mind when answering the question:
• One informant may think of measles with complications and say he goes to the
hospital, another may think of a cough and say he goes to the private pharmacy;
• Even if both think of the same disease, they may have different degrees of seriousness
in mind and thus answer differently;
• In all cases, self-care may be overlooked.
The question, therefore, as a rule has to be broken up into different parts and made so
specific that all informants focus on the same thing. For example, one could:
• Concentrate on illness that has occurred in the family over the past 14 days and ask
what has been done to treat it from the onset; or
• Concentrate on a number of diseases, ask whether they have occurred in the family
over the past X months (chronic or serious diseases have a longer recall period than
minor ailments) and what has been done to treat each of them from the onset.
Check whether each question measures one thing at a time.
For example, the question, “How large an interval would you and your husband prefer
between two successive births?” would better be divided into two questions because
husband and wife may have different opinions on the preferred interval.
A question is leading if it suggests a certain answer. For example, the question, “Do you
agree that the district health team should visit each health center monthly?” hardly leaves
room for “no” or for other options. Better would be: “Do you think that district health
teams should visit each health center? If yes, how often?”
Avoid words with double or vaguely defined meanings and emotionally laden words.
Concepts such as nasty (health staff), lazy (patients), or unhealthy (food), for example,
should be omitted.
• The sequence of questions must be logical for the respondent and allow as much as
possible for a “natural” discussion, even in more structured interviews.
• At the beginning of the interview, keep questions concerning “background variables”
(e.g., age, religion, education, marital status, or occupation) to a minimum. If
possible, pose most or all of these questions later in the interview. (Respondents may
be reluctant to provide “personal” information early in an interview and, if they
become worried about confidentiality, be wary about giving their true opinions.)
• Start with an interesting but non-controversial question (preferably open) that is
directly related to the subject of the study. This type of beginning should help to raise
the informants’ interest and lessen suspicions concerning the purpose of the interview
(e.g., that it will be used to provide information to use in levying taxes).
• Pose more sensitive questions as late as possible in the interview (e.g., questions
pertaining to income, political matters, sexual behavior, or diseases with stigma
attached to them).
• Use simple everyday language.
Make the questionnaire as short as possible. Conduct the interview in two parts if the
nature of the topic requires a long questionnaire (more than 1 hour).
Your questionnaire should be not only consumer-friendly (for the respondent) but also user-friendly (for the interviewer)!
Step 5: Translation
After having the questionnaire translated, you should have it translated back into the original
language. You can then compare the two versions for differences and make a
decision concerning the final phrasing of difficult concepts.
Bias in information collection is a distortion that results in the information not being
representative of the true situation.
1. Defective instruments
• Questionnaires with:
2. Observer bias
Observer bias can easily occur during observation or loosely structured group or
individual interviews. There is a risk that the data collector will see or hear only
things in which he or she is interested or will miss information that is critical to the
research. Observation protocols and guidelines for conducting loosely structured
interviews should be prepared, and training and practice should be provided to data
collectors in using both these tools. Moreover, it is highly recommended that data
collectors work in pairs when using flexible research techniques and discuss and
interpret the data immediately after collecting it.
This is a possible factor in all interview situations. The informant may mistrust the
intention of the interview and dodge certain questions or give misleading answers.
Such bias can be reduced by adequately introducing the purpose of the study to
informants, by taking sufficient time for the interview, and by assuring informants
that the data collected will be confidential.
Ethical Considerations
If sensitive questions are asked, for example about family planning practices, it may be
advisable to omit names and addresses from the questionnaires.
6.4: Sampling
6.4.1. Introduction
What is sampling?
Sampling involves the selection of a number of study units from a defined study
population.
Some studies involve only a small number of people and, thus, all of them can be included.
Often, however, research focuses on such a large population that, for practical reasons, it
is only possible to include some of its members in the investigation. We then have to
draw a SAMPLE from the total population.
• What is the group of people (study population) from which we want to draw a
sample?
• How many people do we need in our sample?
• How will these people be selected?
The study population has to be clearly defined, for example, according to age, sex, and
residence. Apart from persons, a study population may consist of villages, institutions,
records, etc.
Each study population consists of study units. The way we define our study population
and our study unit depends on the problem we want to investigate. For example:
Problem: High drop-out rates in primary schools in District Y
Study population: All primary schools in District Y
Study unit: One primary school in District Y
Representativeness
If researchers want to draw conclusions that are valid for the whole study population,
they should take care to draw a sample in such a way that it is representative of that
population.
A representative sample has all the important characteristics of the population from
which it is drawn.
For example:
An important issue influencing the choice of the most appropriate sampling method is
whether a sampling frame is available, that is, a listing of all the units that compose the
study population.
If a sampling frame is not available, it is not possible to sample the study units in such a
way that the probability for the different units to be selected in the sample is known.
Two such non-probability sampling methods will be reviewed:
• Convenience sampling and
• Quota sampling
1. Convenience sampling
Convenience sampling is a method in which, for convenience's sake, the study units that
happen to be available at the time of data collection are selected in the sample.
Many clinic-based studies use convenience samples.
For example, a researcher wants to study the attitudes of villagers toward family-
planning services provided by the MCH clinic. He decides to interview all adult
patients who visit the out-patient clinic during one particular day. This is more
convenient than taking a random sample of people in the village, and it gives a
useful first impression.
A drawback of convenience sampling is that the sample may be quite unrepresentative of
the population you want to study. Some units may be over-selected and others under-
selected or missed altogether. It is impossible to adjust for such a distortion. If you need
to be representative you have to use another sampling method.
2. Quota sampling
Quota sampling is a method that ensures that a certain number of sample units from
different categories with specific characteristics appear in the sample so that all these
characteristics are represented.
In this method the investigator interviews as many people in each category of study unit
as he can find until he has filled his quota.
For example, the researcher of the family-planning study just mentioned suspects
that religion might have a strong effect on patients' attitudes toward the family-
planning services. He is afraid to miss the Catholics, who are a minority in the
area. He, therefore, decides to include in the study 60 patients from each of the
different religious groups (Hindus, Muslims, Protestants, and Catholics) and to
extend the study over 3 or 4 days to obtain the desired sample.
Quota sampling is useful when researchers feel that a convenience sample would not
provide the desired balance of study units. However, like a convenience sample, it does
not claim to be representative of the entire population.
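The quota procedure can be sketched as a loop that accepts units in the order they turn up until every group's quota is filled. This is a minimal sketch, assuming units arrive one at a time (as clinic patients do); the function name `fill_quotas` and the patient stream are hypothetical. As the text stresses, the result is balanced across groups but is still not a representative sample.

```python
def fill_quotas(arrivals, group_of, quota):
    """Accept units in order of arrival until every group's quota is filled.

    arrivals: units in the order they become available (e.g. clinic patients);
    group_of: function returning the group (category) of a unit;
    quota: dict mapping group -> number of units wanted.
    Units whose group is already full, or not in the quota, are skipped.
    """
    selected = {g: [] for g in quota}
    for unit in arrivals:
        g = group_of(unit)
        if g in selected and len(selected[g]) < quota[g]:
            selected[g].append(unit)
            if all(len(selected[h]) == quota[h] for h in quota):
                break  # every quota is filled: data collection can stop
    return selected


# Hypothetical stream of 1,000 clinic patients as (patient_id, religion);
# every 10th patient is Catholic, so Catholics are a minority.
others = ["Hindu", "Muslim", "Protestant"]
patients = [(i, "Catholic" if i % 10 == 0 else others[i % 3]) for i in range(1, 1001)]

quota = {"Hindu": 60, "Muslim": 60, "Protestant": 60, "Catholic": 60}
selected = fill_quotas(patients, lambda p: p[1], quota)
for religion, group in selected.items():
    print(religion, len(group))
```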
Probability sampling involves random selection procedures to ensure that each unit of the
sample is chosen on the basis of chance. All units of the study population should have an
equal or at least a known chance of being included in the sample.
Probability sampling requires that a listing of all study units exists or can be compiled.
This listing is called the sampling frame.
If a sampling frame does exist or can be compiled, probability sampling methods can be
used. With these methods, each study unit has an equal or at least a known probability of
being selected in the sample. The following probability sampling methods will be
discussed:
• Simple random sampling,
• Systematic sampling,
• Stratified sampling,
• Cluster sampling, and
• Multistage sampling.
1. Simple random sampling
This is the simplest form of probability sampling. To select a simple random sample you
need to:
• Make a numbered list of all the units in the population from which you want to draw a
sample;
• Decide on the size of the sample (this will be discussed later);
• Select the required number of sampling units, using a “lottery” method or a table of
random numbers.
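The three steps above can be sketched in a few lines, with a pseudo-random number generator standing in for the "lottery" or table of random numbers. This is a minimal sketch, assuming the sampling frame is already a numbered list; the function name and the 1,200-household frame are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from a sampling frame (a list).

    Every unit has the same chance of being selected, as with a lottery.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible and checkable
    return rng.sample(frame, n)


# Hypothetical numbered list of 1,200 households
frame = list(range(1, 1201))
sample = simple_random_sample(frame, 100, seed=1)
print(sorted(sample)[:10])
```

Recording the seed lets a supervisor reproduce exactly the same selection later.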
2. Systematic sampling
In systematic sampling individuals are chosen at regular intervals (for example every fifth
individual) from the sampling frame. Ideally we randomly select a number to tell us
where to start selecting individuals from the list.
For example, suppose 100 students are to be selected from a numbered list of 1,200
students, so that the sampling interval is 1,200 / 100 = 12. The number of the first
student to be included in the sample is chosen randomly, for example by blindly picking
one out of twelve pieces of paper, numbered 1 to 12. If number 6 is picked, then every
twelfth student will be included in the sample, starting with student number 6, until 100
students are selected: the numbers selected would be 6, 18, 30, 42, etc.
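The student example can be reproduced with a short function. This is a minimal sketch, assuming the sampling interval is the frame size divided by the sample size (rounded down) and that the random start is drawn from 1 to the interval; `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every k-th unit from the frame, where k = len(frame) // n.

    If start is not given, it is drawn at random from 1..k
    (like picking one of k numbered pieces of paper).
    """
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    if start is None:
        start = random.Random(seed).randint(1, k)
    # start is a 1-based position in the list; convert it to a 0-based index
    return [frame[start - 1 + i * k] for i in range(n)]


students = list(range(1, 1201))  # students numbered 1..1200
chosen = systematic_sample(students, 100, start=6)
print(chosen[:4])   # [6, 18, 30, 42]
print(chosen[-1])   # 1194
```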
Systematic sampling is usually less time-consuming and easier to perform than simple
random sampling. However, there is a risk of bias, as the sampling interval may coincide
with a systematic variation in the sampling frame. For instance, if we want to select a
random sample of days on which to count clinic attendance, systematic sampling with a
sampling interval of 7 days would be inappropriate, as all study days would fall on the
same day of the week, which might, for example, be a market day.
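The clinic-attendance pitfall is easy to demonstrate. The sketch below assumes a hypothetical frame of 364 consecutive days starting on a Monday and compares an interval of 7 with an interval of 6. The 7-day interval lands on a single weekday every time, while the 6-day interval cycles through all seven.

```python
import datetime

# Hypothetical sampling frame: 364 consecutive days (52 full weeks) of clinic records.
first_day = datetime.date(2000, 1, 3)  # assumed start date (a Monday)
days = [first_day + datetime.timedelta(days=d) for d in range(364)]

def systematic_days(days, interval, start_index):
    """Every `interval`-th day, beginning at position start_index (0-based)."""
    return days[start_index::interval]

weekly = systematic_days(days, 7, 2)
six_daily = systematic_days(days, 6, 2)

# weekday(): 0 = Monday ... 6 = Sunday
print(len(weekly), {d.weekday() for d in weekly})        # 52 {2}: every day is a Wednesday
print(len(six_daily), len({d.weekday() for d in six_daily}))  # 61 7
```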
3. Stratified sampling
The simple random sampling method described above does not ensure that the
proportions of individuals with certain characteristics in the sample will be the same as those
in the whole study population.
If it is important that the sample includes representative groups of study units with
specific characteristics (for example, residents from urban and rural areas, or of different
age groups), then the sampling frame must be divided into groups, or strata, according to
these characteristics. Random or systematic samples of a predetermined size will then
have to be obtained from each group (stratum). This is called stratified sampling.
Stratified sampling is only possible when we know what proportion of the study
population belongs to each group we are interested in.
An advantage of stratified sampling is that we can take a relatively large sample from a
small group in our study population. This allows us to get a sample that is big enough to
enable us to draw valid conclusions about a relatively small group without having to
collect an unnecessarily large (and hence expensive) sample of the other larger groups.
However, in doing so, we are using unequal sampling fractions, and it is important to
correct for this when generalizing our findings to the whole study population.
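The correction for unequal sampling fractions can be sketched as weighting each stratum's result by that stratum's share of the population. All figures below are hypothetical: 1,800 rural and 200 urban residents, 100 sampled from each (sampling fractions of about 1/18 and 1/2), and invented stratum proportions of 0.20 and 0.80.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Draw a simple random sample of a fixed size from each stratum.

    frame: list of units; stratum_of: function giving a unit's stratum;
    sizes: dict mapping stratum -> number of units to draw.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {h: rng.sample(strata[h], n) for h, n in sizes.items()}

def weighted_proportion(pop_sizes, sample_props):
    """Combine stratum proportions, weighting each by its share of the population."""
    total = sum(pop_sizes.values())
    return sum(pop_sizes[h] / total * p for h, p in sample_props.items())


# Hypothetical frame: 1,800 rural and 200 urban residents, as (stratum, id)
frame = [("rural", i) for i in range(1800)] + [("urban", i) for i in range(200)]
samples = stratified_sample(frame, lambda u: u[0], {"rural": 100, "urban": 100}, seed=2)

# Hypothetical survey results: proportion with some characteristic in each stratum
props = {"rural": 0.20, "urban": 0.80}
unweighted = sum(props.values()) / 2   # 0.50: urban residents are over-represented
weighted = weighted_proportion({"rural": 1800, "urban": 200}, props)  # about 0.26
print(unweighted, round(weighted, 2))
```

Pooling the two samples naively would overstate the population proportion almost twofold in this invented example.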
4. Cluster sampling
It may be difficult or impossible to take a simple random sample of the units of the study
population, either because a complete sampling frame does not exist, or because of other
logistical difficulties (e.g., visiting people who are scattered over a large area may be too
time-consuming). However, when a list of groupings of study units is available (e.g.,
villages or schools) or can be easily compiled, a number of these groupings can be
randomly selected.
The selection of groups of study units (clusters) instead of the selection of study units
individually is called cluster sampling.
Clusters are often geographic units (e.g. districts, villages) or organizational units (e.g.
clinics, training groups).
For example, in a study of the knowledge, attitudes, and practices related to
family planning in rural communities of a region, a list is made of all the villages.
Using this list, a random sample of villages is chosen and all the adults in the
selected villages are interviewed.
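The village example can be sketched as follows. Only the list of cluster names is needed to make the random selection, and every unit in a chosen cluster is then included. This is a minimal sketch; the region of 40 villages with 50 adults each is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; every unit in a chosen cluster is included.

    clusters: dict mapping cluster name -> list of units in that cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical region: 40 villages with 50 adults each
villages = {f"village{v:02d}": [f"village{v:02d}-adult{a}" for a in range(50)]
            for v in range(1, 41)}
selected = cluster_sample(villages, 5, seed=3)
adults = [person for people in selected.values() for person in people]
print(sorted(selected), len(adults))  # 5 village names, 250 adults
```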
5. Multistage sampling
In very large and diverse populations, sampling may be done in two or more stages. This is
often the case in community-based studies, in which people to be interviewed are from
different villages, and the villages have to be chosen from different areas.
A multistage sampling procedure is carried out in phases and usually involves more than
one sampling method.
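A two-stage version can be sketched as follows: villages are first sampled within each area, then households within each selected village. Household lists are needed only for the villages actually chosen. This is a minimal sketch; the three areas of 10 villages with 60 households each are hypothetical, and simple random sampling is assumed at both stages.

```python
import random

def multistage_sample(areas, villages_per_area, households_per_village, seed=None):
    """Two-stage sketch: villages within each area, then households within each village.

    areas: dict area -> dict village -> list of households.
    """
    rng = random.Random(seed)
    sample = []
    for area in sorted(areas):
        villages = areas[area]
        # Stage 1: simple random sample of villages within this area
        for village in rng.sample(sorted(villages), villages_per_area):
            # Stage 2: simple random sample of households within the selected village
            sample.extend(rng.sample(villages[village], households_per_village))
    return sample


# Hypothetical population: 3 areas x 10 villages x 60 households
areas = {
    a: {f"{a}-v{v}": [f"{a}-v{v}-hh{h}" for h in range(60)] for v in range(10)}
    for a in ["A", "B", "C"]
}
sample = multistage_sample(areas, villages_per_area=2, households_per_village=15, seed=4)
print(len(sample))  # 90
```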
The main advantages of cluster and multistage sampling are that:
• A sampling frame of individual units is not required for the whole population.
Initially a sampling frame of clusters is sufficient. Only within the clusters that are
finally selected do we need to list and sample the individual units.
• The sample is easier to select than a simple random sample of similar size, because the
individual units in the sample are physically together in groups, instead of scattered
all over the study population.
A disadvantage is that, compared to simple random sampling, there is a larger probability
that the final sample will not be representative of the total study population. The
likelihood of the sample not being representative depends mainly on the number of
clusters selected in the first stage. The larger the number of clusters, the greater the
likelihood that the sample will be representative.
Bias in Sampling
For example, a study was conducted to determine the health needs of a rural
population to plan primary health care activities. However, a nomadic tribe,
which represented one third of the total population, was left out of the study. As a
result, the study did not give a picture of the health needs of the total population.
There are several possible sources of bias in sampling. The best known source of bias is
non-response.
There are several ways to deal with this problem and reduce the possibility of bias:
• Data collection tools (including written introductions for the interviewers to use with
potential respondents) have to be pretested. If necessary, adjustments should be
made to ensure better cooperation.
• If non-response is due to absence of the subjects, follow-up of non-respondents may
be considered.
• If non-response is due to refusal to cooperate, an extra, separate study of non-
respondents may be considered to discover to what extent they differ from
respondents.
• Another strategy is to include additional people in the sample so that non-respondents
who were absent during data collection can be replaced. However, this can only be
justified if their absence was very unlikely to be related to the topic being studied.
The bigger the non-response rate, the more necessary it becomes to take remedial action.
It is important in any study to mention the non-response rate and to honestly discuss
whether and how it might have influenced the results.
Other sources of bias in sampling may be less obvious, but at least as serious:
• Studying volunteers only. The fact that volunteers are motivated to participate in
the study may mean that they are also different from the study population on the
factors being studied. It is better to avoid using nonrandom procedures that introduce
the element of choice.
• Sampling of registered patients only. Patients reporting to a clinic are likely to
differ systematically from people seeking treatment at home.
• Missing cases of short duration. In studies of the prevalence of disease, cases of
short duration are more likely to be missed. This may often mean missing fatal cases,
cases with short episodes, and mild cases.
• Seasonal bias. It may be that the problem under study exhibits different
characteristics in different seasons of the year. For this reason, data on the prevalence
and distribution of malnutrition in a community, for example, should be collected
during all seasons rather than just at one time. When investigating health services’
performance, to take another example, one has to take into account the fact that
toward the end of the financial year shortages may occur in certain budget items
which may affect the quality of services delivered.
• Tarmac bias. Study areas are often selected because they are easily accessible.
However, these areas are likely to be systematically different from more inaccessible
areas.
Ethical Considerations
If the recommendations from a study will be implemented in the entire study population,
one has the ethical obligation to draw a sample from this population in a representative
way. If partway through the research, new evidence suggests that the sample was not
representative, this should be mentioned in any publication concerning the study, and
care must be taken not to draw conclusions or make recommendations that are not
justified.
Sample size
Having decided how to select our sample, we now have to determine our sample size. As
a general rule we can say that the desirable sample size is determined by the expected
variation in the data: the more varied the data are, the larger the sample size we need to
attain the same level of accuracy.
The eventual sample size is usually a compromise between what is desirable and what is
feasible. The feasible sample size is determined by the availability of resources: time,
manpower, transport, money, etc.
The desirable sample size also depends on the number of cells we will have in the
cross-tabulations required to analyze the results. A rough guideline is to have at least 20 to 30
study units per cell.
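The per-cell guideline translates into a rough floor for the sample size: multiply the number of cells by the units wanted per cell. This is a minimal sketch; the 2 × 3 table is a hypothetical example. Because real data rarely spread evenly over the cells, the result should be read as a minimum, not a target.

```python
from math import prod

def rough_minimum_sample(categories, per_cell=30):
    """Rough floor for sample size: number of cells times units per cell.

    categories: number of categories of each variable in the cross-tabulation.
    Assumes units spread evenly over the cells, which real data rarely do.
    """
    return prod(categories) * per_cell


# Hypothetical table: smoking status (2 categories) by age group (3 categories)
print(rough_minimum_sample([2, 3], per_cell=20))  # 120
print(rough_minimum_sample([2, 3], per_cell=30))  # 180
```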
In some studies, minimum sample sizes can be calculated, depending on the objective of
the study: estimating a population parameter with a certain precision, or testing the
significance of differences between groups.
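For the first objective, estimating a single proportion, a commonly used formula is n = z² p(1 − p) / d², where p is the expected proportion, d is the desired absolute precision, and z is the standard normal value for the chosen confidence level. This sketch assumes simple random sampling from a large population. A cluster or multistage design would usually need a larger n, and comparing groups needs a different formula; neither case is covered here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, d, confidence=0.95):
    """n = z^2 * p * (1 - p) / d^2, rounded up.

    p: expected proportion (0.5 gives the largest n if nothing is known);
    d: desired absolute precision (half-width of the confidence interval);
    confidence: confidence level, converted to a two-sided z value.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


print(sample_size_for_proportion(0.5, 0.05))  # 385
print(sample_size_for_proportion(0.3, 0.05))  # 323
```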
6.5. Plan for Data Collection
6.5.1. Introduction
1. Listing the tasks that have to be carried out and who should be involved, making a
rough estimate of the time needed for the different parts of the study, and identifying
the most appropriate period in which to carry out the research.
2. Actually scheduling the different activities that have to be carried out each week in a
work plan.
Before the workshop is finished, a pretest of the data collection and data analysis
procedures should be made. The advantage of conducting the pretest before we finalize
our proposal is that we can draft the work plan and budget based on realistic estimates, as
well as revise the data collection tools before we submit the proposal for approval.
However, if this is not possible (for example, because the proposal is drafted far from the
field, and there are no similar research settings available close to the workshop site), the
field test may be done after finishing the proposal, but long enough before the actual
fieldwork to allow for a thorough revision of data collection tools and procedures.
Consent must be obtained from the relevant authorities, individuals, and the community
in which the project is to be carried out. This may involve organizing meetings at the national or
provincial level, at district level, and at village level. For clinical studies this may also involve
obtaining written informed consent.
Most likely the principal investigator will be responsible for obtaining permission to
proceed at the various levels.
Stage II: Data collection
• Logistics: who will collect what, when and with what resources and
• Quality control.
When allocating tasks for data collection, it is recommended that you first list them.
Then you may identify who could best implement each of the tasks. If it is clear
beforehand that your research team will not be able to carry out the entire study by itself,
you might look for research assistants to assist in relatively simple but time-consuming
tasks.
2. Ensuring quality
In the previous modules possible sources of data distortion (bias) have been
discussed. Biases we should try to prevent include:
There are a number of measures that can be taken to prevent and partly correct such
distortions, but remember: prevention is FAR better than cure! Cure is usually surgery:
you may have to cut the bad parts of your data or, at best, devise crutches.
There are several other aspects of the data-collection process that will help ensure data
quality. You should:
• Select your research assistants, if required, with care. Choose assistants that
are:
- from the same educational level;
- knowledgeable concerning the topic and local conditions;
- not the object of study themselves; and
- not biased concerning the topic (for example, health staff are usually not
the best interviewers for a study on alternative health practices).
• Pretest research instruments and research procedures with the whole research
team, including research assistants.
• Take care that research assistants are not placed under too much stress
(requiring too many interviews a day; paying per interview instead of per
day).
• Devise methods to assure the quality of data collected by all members of the
research team. For example, quality can be assured by:
- asking the supervisor to check at the end of each day during the data
collection period whether the questionnaires are filled in completely and
whether the recorded information makes sense.
- having the researchers review the data during the data analysis stage to
check whether data are complete and consistent.
It is extremely important that the data we collect are of good quality, that is, reliable
and valid. Otherwise we may come up with false or misleading conclusions.
Once the data have been collected, a clear procedure should be developed for handling
and storing them:
• First, it is necessary to check that the data gathered are complete and accurate. At
some stage questionnaires will have to be numbered. Decide if this should be done at
the time of the interview or at the time the questionnaires are stored.
• Identify the person responsible for storing data and the place where they will be
stored.
• Decide how data should be stored. Record forms should be kept in the sequence in
which they have been numbered.
6.6. Plan for Processing and Analysis of Data
6.6.1. Introduction
Such a plan helps the researcher assure that at the end of the study:
• All the information he or she needs has indeed been collected, and in a standardized
way;
• He or she has not collected unnecessary data that will never be analyzed.
This implies that the plan for data processing and analysis must be made after careful
consideration of the objectives of the study as well as the list of variables.
The procedures for the analysis of data collected through qualitative and quantitative
techniques are quite different. Therefore, one must also consider the type(s) of study and
the different data-collection techniques used when making a plan for data processing and
analysis.
For quantitative data, the starting point in analysis is usually a description of the data
for each variable for all the study units included in the sample.
For qualitative data, it is more a matter of describing, summarizing, and interpreting data
obtained for each study unit (or for each group of study units). Here the researcher starts
analyzing while collecting the data so that questions that remain unanswered (or new
questions that come up) can be addressed before data collection is over.
Preparation of a plan for data processing and analysis will provide you with better insight
into the feasibility of the analysis to be performed as well as the resources that are
required. It also provides an important review of the appropriateness of your data-
collection tools.
The plan for processing and analysis of data must be prepared before the data is collected
in the field so that it is still possible to make changes in the list of variables or the data-
collection tools.
What should the plan include?
When making a plan for data processing and analysis the following issues should be
considered:
• Sorting data,
• Performing quality-control checks,
• Data processing, and
• Data analysis.
An appropriate system for sorting data is important for facilitating subsequent processing
and analysis.
If you have different study populations (for example village health workers, village
health committees, and the general population), you obviously would number the
questionnaires separately.
In a comparative study, it is best to sort the data right after collection into the two or
three groups that you will be comparing during data analysis.
For example, in a study concerning the reasons for low acceptance of family-
planning services, users and nonusers would be basic categories; in a study of the
reasons why nurses object to being posted in rural areas, rural and urban nurses
would be basic categories; in a case-control study obviously the cases are to be
compared with the controls.
In a cross-sectional survey it may be useful to sort the data into two or more groups,
depending on the objectives of the study.
Usually the data have already been checked in the field to ensure that all the information
has been properly collected and recorded. Before and during data processing, however,
the information should be checked again for completeness and internal consistency.
• If a questionnaire has not been filled in completely, you will have missing data for
some of your variables. If there are many missing items in a particular
questionnaire, you may decide to exclude the whole questionnaire from further
analysis.
• If an inconsistency is clearly due to a mistake made by the researcher or assistant
(for example, if a person in an earlier question is recorded as being a nonsmoker,
whereas all other questions reveal that he is smoking), it may still be possible to
check with the person who conducted the interview and to correct the answer.
• If the inconsistency is less clearly a mistake in recording, it may be possible (in a
small-scale study) to return to the respondent and ask for clarification.
• If it is not possible to correct information that is clearly inconsistent, you may
consider excluding this particular part of the data from further processing and
analysis. If a certain question produces ambiguous or vague answers throughout,
the whole question should be excluded from further analysis.
A decision to exclude data should be considered carefully, as it may affect the validity of
the study. Such a decision is ethically correct and it testifies to the scientific integrity of
the researcher. You should keep an accurate count of how many answers or
questionnaires have been excluded.
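The completeness and consistency checks described above can be sketched as a screening step. Questionnaires with problems are flagged for follow-up, not silently dropped, and the flagged ones are counted. This is a minimal sketch: the field names, the threshold of more than three missing items, and the smoking consistency rule are all hypothetical choices for illustration.

```python
def check_questionnaire(q, required, max_missing=3):
    """Return a list of problems found in one questionnaire (a dict of answers).

    Missing answers are stored as None. Field names are hypothetical.
    """
    problems = []
    missing = [f for f in required if q.get(f) is None]
    if len(missing) > max_missing:
        problems.append(f"too many missing items ({len(missing)})")
    # Internal consistency: a recorded nonsmoker should not report cigarettes per day
    if q.get("smoker") == "no" and (q.get("cigarettes_per_day") or 0) > 0:
        problems.append("nonsmoker reports smoking")
    return problems

def screen(questionnaires, required, max_missing=3):
    """Split questionnaires into kept and flagged (id -> problems) for follow-up."""
    kept, flagged = [], {}
    for q in questionnaires:
        problems = check_questionnaire(q, required, max_missing)
        if problems:
            flagged[q["id"]] = problems
        else:
            kept.append(q)
    return kept, flagged


required = ["age", "sex", "smoker", "cigarettes_per_day", "education"]
data = [
    {"id": 1, "age": 34, "sex": "F", "smoker": "no", "cigarettes_per_day": 0, "education": "primary"},
    {"id": 2, "age": 51, "sex": "M", "smoker": "no", "cigarettes_per_day": 10, "education": "none"},
    {"id": 3, "age": None, "sex": None, "smoker": None, "cigarettes_per_day": None,
     "education": "secondary"},
]
kept, flagged = screen(data, required)
print(len(kept), flagged)
```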
If you process your data by computer, quality-control checks must also include a
verification of how the data has been transformed into codes and subsequently entered
into the computer.
6.6.4. Data Processing
As you begin planning for data processing, you must make a decision concerning whether
to process and analyze the data:
Categorizing
For categorical variables that are investigated through closed questions or observation
(for example, observation of the presence or absence of latrines in households) the
categories have been decided upon beforehand.
In interviews, the answers to open-ended questions (for example, Why do you smoke?)
can be pre-categorized to a certain extent, depending on the knowledge of possible
answers. However, there should always be a category called “others, specify…”, which
can only be categorized afterwards. These responses should be listed and placed in
categories that are a logical continuation of the categories you already have. Answers
that are difficult or impossible to categorize may be put into a separate residual category
called “others,” but this category should not contain more than 5% of the answers
obtained.
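One quick way to check the 5% guideline after post-coding the open answers is to count how many fall into the residual category. The sketch below assumes the coded answers are held in a Python list and that the residual category is labelled "others"; the example answers are invented.

```python
from collections import Counter

RESIDUAL = "others"
MAX_RESIDUAL_SHARE = 0.05  # guideline from the text: at most 5% of answers


def residual_share(coded_answers):
    """Return the fraction of coded answers that fell into the residual category."""
    if not coded_answers:
        return 0.0
    counts = Counter(coded_answers)
    return counts[RESIDUAL] / len(coded_answers)


# Hypothetical post-coded answers to "Why do you smoke?"
answers = ["habit"] * 50 + ["stress"] * 30 + ["peer pressure"] * 17 + ["others"] * 3
share = residual_share(answers)
print(f"{share:.1%}")               # 3.0%
print(share <= MAX_RESIDUAL_SHARE)  # True
```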
For numerical variables, the data are usually collected without any pre-categorization.
Because you are often still discovering the range and the dispersion of the different
values of these variables when you collect your sample (e.g., home-clinic distance for
out-patients), decisions concerning how to categorize numerical data (and how to code
them) are usually made after they have been collected.
Coding
If data are entered into a computer for subsequent processing and analysis, it is essential
to develop a coding system.
Coding is a method used to convert (translate) the data gathered during the study into
symbols appropriate for analysis.
For computer analysis, each category of a variable is usually given a number, for
example, the answer “yes” may be coded as 1, “no” as 2 and “no response” as 9.
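A coding scheme like this can be kept as a simple lookup table. The following minimal sketch uses the codes given above; treating a blank answer as "no response" is an assumption made for illustration.

```python
# Coding scheme from the text: "yes" = 1, "no" = 2, "no response" = 9
YES_NO_CODES = {"yes": 1, "no": 2, "no response": 9}


def code_answer(answer):
    """Translate a recorded answer into its numeric code.

    A blank answer is treated as "no response" (an assumption).
    Answers without a defined code raise ValueError so they can be reviewed.
    """
    key = answer.strip().lower() if answer else "no response"
    if key not in YES_NO_CODES:
        raise ValueError(f"No code defined for answer: {answer!r}")
    return YES_NO_CODES[key]


print([code_answer(a) for a in ["Yes", "no", "", "No response"]])  # [1, 2, 9, 9]
```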
The codes should be entered on the questionnaires (or checklists) themselves. When
finalizing your questionnaire, for each question, you should insert a box for the code in
the right margin of the page. These boxes should not be used by the interviewer. They
are only filled in afterwards during data processing. Take care that you have as many
boxes as the number of digits in each code.
If you intend to process your data by computer, always consult an experienced person
before you finalize your questionnaire.
Frequency counts
From the data master sheets, simple tables can be made with frequency counts for each
variable. A frequency count is an enumeration of how often a certain measurement or a
certain answer to a specific question occurs.
For example,
Smokers 63
Nonsmokers 74
Total 137
A percentage is the number of units in the sample with a certain characteristic, divided by
the total number of units in the sample and multiplied by 100.
In the above example the calculation of the percentage answers the question: if I had
asked 100 people who had an episode of coughing if they smoke cigarettes, how many
would have answered “yes”? The percentage of people answering “yes” would be:
63/137 x 100 = 46%
A frequency table such as the following could then be presented:
Sometimes data are missing due to non-response or (in oral interviews) non-recording by
the interviewer. Usually you do not use missing data in the calculation of percentages.
However, the number of missing data is a useful indication of the quality of your data
collection and, therefore, this number should be mentioned.
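The frequency count, the percentage, and the separate count of missing answers can be computed together. In this sketch the 63 smokers and 74 nonsmokers come from the example above, while the 3 missing answers (recorded as `None`) are hypothetical; percentages are rounded to whole numbers.

```python
from collections import Counter


def frequency_table(values, missing=None):
    """Count each value and give its percentage of the non-missing answers.

    Missing answers are left out of the percentages, but their number
    is returned separately so that it can be reported.
    """
    counts = Counter(v for v in values if v != missing)
    n_valid = sum(counts.values())
    table = {v: (c, round(100 * c / n_valid)) for v, c in counts.items()}
    n_missing = sum(1 for v in values if v == missing)
    return table, n_valid, n_missing


data = ["smoker"] * 63 + ["nonsmoker"] * 74 + [None] * 3
table, n_valid, n_missing = frequency_table(data)
print(table)               # {'smoker': (63, 46), 'nonsmoker': (74, 54)}
print(n_valid, n_missing)  # 137 3
```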
It is usually necessary to summarize the data from numerical variables by dividing them
into categories. This process includes the following steps:
1. Inspect all the figures: What is their range? (The range is the difference between
the largest and the smallest measurement.)
2. Divide the range into three to five categories. You can either aim at having a
reasonable number in each category (e.g., 0-2 km, 3-4 km, 5-9 km, 10+ km for home-clinic
distance) or you can define the categories in such a way that they all start with
round numbers (e.g., 20-29 years, 30-39 years, 40-49 years, etc.).
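The two steps can be sketched as follows. The distances and the category boundaries (0-2, 3-4, 5-9 and 10+ km) are assumptions chosen for illustration, and distances are assumed to be recorded in whole kilometres.

```python
def value_range(values):
    """Range = largest measurement minus smallest measurement."""
    return max(values) - min(values)


def categorize(value, cutpoints, labels):
    """Place a value in the first category whose upper bound it does not exceed.

    cutpoints are the inclusive upper bounds of every category except the
    last, which is open-ended (e.g. "10+ km").
    """
    for upper, label in zip(cutpoints, labels):
        if value <= upper:
            return label
    return labels[-1]


distances = [0, 1, 3, 4, 6, 8, 12, 25]  # hypothetical, whole km
labels = ["0-2 km", "3-4 km", "5-9 km", "10+ km"]
cutpoints = [2, 4, 9]
print(value_range(distances))  # 25
print([categorize(d, cutpoints, labels) for d in distances])
# ['0-2 km', '0-2 km', '3-4 km', '3-4 km', '5-9 km', '5-9 km', '10+ km', '10+ km']
```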
Cross-tabulations
In addition to making frequency counts for one variable at a time, it may be useful to
combine information on two or more variables to describe the problem or to arrive at
possible explanations for it.
Depending on the objectives and the type of study, three different kinds of cross-
tabulations may be required:
When the plan for data analysis is being developed, the data are, of course, not yet available.
However, to visualize how the data can be organized and summarized, it is useful at this
stage to construct so-called dummy cross-tabulations.
A dummy table contains all elements of a real table, except that the cells are still empty.
In a research proposal dummy tables should be prepared to show the major relationships
between variables.
It is extremely important to determine before you start collecting the data what tables you
will need to assist you in looking for possible explanations of the problem you have
identified. This will prevent you from collecting too little or too much data in the field.
It will also save you much time in the data-processing stage. Care should be taken not to
embark on an unstructured comparison of all possible variables. The dummy tables to be
prepared follow from the specific objectives of the study.
• All tables should have a clear title and clear headings for all rows and columns.
• All tables should have a separate row and a separate column for totals to enable you
to check if your totals are the same for all variables and to make further analysis
easier.
• All tables related to each objective should be numbered and kept together so the work
can be easily organized and the writing of the final report will be simplified.
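A cross-tabulation with a separate totals row and totals column can be produced as in the sketch below; a dummy table has the same layout with the cells left empty. The breakdown of the 63 smokers and 74 nonsmokers by sex is invented for illustration.

```python
from collections import Counter


def cross_tab(pairs, row_levels, col_levels):
    """Two-way table of counts with a row-total column and a column-total row.

    pairs holds one (row_value, column_value) tuple per respondent.
    """
    counts = Counter(pairs)
    table = {}
    for r in row_levels:
        cells = [counts[(r, c)] for c in col_levels]
        table[r] = cells + [sum(cells)]  # last cell: row total
    # Column totals; the last entry is the grand total
    table["Total"] = [sum(table[r][j] for r in row_levels)
                      for j in range(len(col_levels) + 1)]
    return table


# Hypothetical data: smoking status by sex
pairs = ([("Smokers", "Male")] * 40 + [("Smokers", "Female")] * 23
         + [("Nonsmokers", "Male")] * 30 + [("Nonsmokers", "Female")] * 44)
table = cross_tab(pairs, ["Smokers", "Nonsmokers"], ["Male", "Female"])

print(f"{'':<12}{'Male':>8}{'Female':>8}{'Total':>8}")
for label, cells in table.items():
    print(f"{label:<12}" + "".join(f"{c:>8}" for c in cells))

# Check that the grand total equals the number of respondents
assert table["Total"][-1] == len(pairs)
```

The final check is one way of confirming that the totals agree, as recommended above.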
6.7. Pre-test/Pilot Study
A pilot study is the process of carrying out a preliminary study, going through the entire
research procedure with a small sample.
What aspects of your research methodology can be evaluated during pretesting?
1. Reactions of the respondents to the research procedures can be observed in the pretest
to determine:
• Availability of the study population and how respondents’ daily work schedules
can best be respected;
• Acceptability of the methods used to establish contact with the study population;
• Acceptability of the questions asked; and
• Willingness of the respondents to answer the questions and collaborate with the
study.
2. The data-collection tools can be pretested to determine:
• Whether the tools you use allow you to collect the information you need and
whether those tools are reliable. You may find that some of the data collected are
not relevant to the problem or are not in a form suitable for analysis. This is the
time to decide not to collect these data or to consider using alternative techniques
that will produce data in a more usable form.
• How much time is needed to administer the questionnaire, to conduct
observations or group interviews, and to make measurements.
• Whether there is any need to revise the format or presentation of questionnaires or
interview schedules, including whether:
- The sequence of questions is logical,
- The wording of the questions is clear,
- Translations are accurate,
- Space for answers is sufficient,
- There is a need to precategorize some answers or to change closed
questions into open-ended questions,
- There is a need to adjust the coding system, or
- There is a need for additional instructions for interviewers (e.g.,
guidelines for “probing” certain open questions).
7. Work plan
7.1. Introduction
A work plan is a schedule, chart or graph that summarizes, in a clear fashion, various
components of a research project and how they fit together.
It may include:
Example of Work Schedule: Child-Spacing Study (C/S)
7.2.2. The GANTT chart
The GANTT chart is a planning tool that depicts graphically the order in which various
tasks must be completed and the duration of each activity.
The length of each task is shown by a bar that extends over the number of days, weeks or
months the task is expected to take.
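A text version of such a chart can be produced with a short script. In this sketch only the task "Finalize research proposal" comes from the child-spacing example chart in this section; the other tasks and all the month ranges are hypothetical.

```python
MONTHS = ["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def gantt_rows(tasks, width=32):
    """Return the text lines of a simple Gantt chart.

    tasks is a list of (task_name, first_month, last_month); each month in
    which the task is planned to run is shown as '###', other months as '...'.
    """
    lines = ["Task".ljust(width) + " ".join(MONTHS)]
    for name, first, last in tasks:
        start, end = MONTHS.index(first), MONTHS.index(last)
        cells = ["###" if start <= i <= end else "..." for i in range(len(MONTHS))]
        lines.append(name.ljust(width) + " ".join(cells))
    return lines


# All month ranges below are hypothetical
tasks = [
    ("Finalize research proposal", "Apr", "Apr"),
    ("Pretest data-collection tools", "May", "May"),
    ("Data collection", "Jun", "Aug"),
    ("Data analysis", "Sep", "Oct"),
    ("Report writing", "Nov", "Dec"),
]
print("\n".join(gantt_rows(tasks)))
```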
Example of a GANTT chart for the child spacing study.
Tasks to be performed Responsible person April May June July Aug. Sept. Oct. Nov. Dec.
1. Finalize research proposal Research team
8. Plan for Project Administration, Monitoring & Utilization
Project administration is the term for all the activities involved in managing the human, material,
financial, and logistical resources of a project.
• It allows for orderly and accurate purchase and procurement of equipment, payment of bills,
and preparation of financial reports.
• It allows researchers to foresee the need for funds and to make timely requests to avoid
unwanted breaks in the implementation of the project.
• It allows researchers to devote most of their time to the technical and scientific aspects of the
project.
What administrative issues should be considered as the project proposal is being finalized?
As a team developing a research project, you should now consider the following issues:
• You should select one of the team members as principal investigator (PI). A
principal investigator is the “first among equals”; he or she is ultimately responsible for
implementing the proposal as planned and for solving problems that may arise. The
PI is the team’s representative for official contacts with the Ministry of Health and with other
relevant (funding, research, or service) institutions.
• An organizational unit or official outside the team who has the power to receive and handle
funds (a principal administrator) has to be identified. The research team has to consider what
service unit is best able to:
8.2. Project Monitoring
Monitoring should continue throughout the project and be organized so that it helps to alert
staff to problems as they develop and to changes that are needed. It is a valuable management and
learning tool for everyone concerned. Monitoring should cover:
• The resources needed for the project, including staff, equipment, supplies, logistical support
and funds, to assess if they are available when needed and being appropriately used;
• The activities of each team member and their relation to the project as a whole, to assess if
the work plan is being carried out as planned and what delays or difficulties, if any, have
emerged that need to be addressed;
• The flow and quality of the data that are being collected; and
• The communication and coordination of the research team with the study population, other
collaborating groups, and funding authorities.
Monitoring will usually take place at team meetings during field activities. If there is a gap in
the fieldwork, it may be necessary to convene a special meeting.
It is advisable to keep close track of changes in the work plan and of problems encountered and
solved (or unsolved) so that you can inform your facilitator and superiors, and include this
information in your preliminary report.
Before you finish drafting your research proposal you should start planning how the results of
your study could be used.
Why should the researcher be concerned about utilization and dissemination of research
results?
The fundamental reason for undertaking health systems research is to obtain results that can be
used to improve health and health care.
Who will be interested in the results?
Depending on the topic you selected, the results may be useful to the community, to staff and
managers of health and health-related services, and to researchers and donor agencies; in general,
they may be useful to all stakeholders in your own country as well as in other countries.
However, above all, you as a research team and your program should use the results, as you
have developed the proposal to help solve one of your own priority problems.
What strategies can you follow to ensure that the results of your study will be used?
1. Involve relevant authorities, staff, and community members in the selection of your
topic and in the definition of your problem.
If possible, these groups should be consulted before the proposal development workshop
begins. If the final decision on a topic is made during the workshop, however, not all
parties concerned may have been consulted. In that case, they should be consulted immediately
after the workshop.
2. List the major types of recommendations you expect to obtain from your study and
identify who should be involved in their implementation.
Most likely you will be authorized to implement certain recommendations yourself, but for
others you will need the approval of your superiors or of decision-makers from other sectors.
Some authorities may merely need to give their approval, but you may need the active
collaboration of others during the application of the results. Furthermore, you will need to
identify from which colleagues, subordinate staff, and target groups in the community
cooperation will be required for the implementation of the study’s recommendations.
3. Identify which existing communication channels can be used to disseminate
results.
Keep relevant parties informed of progress during project implementation and plan to obtain
their input when study findings and recommendations are drafted.
4. Determine what written materials should be prepared to keep relevant parties
informed. They may include:
• A 1- to 2-page summary of your project proposal, which includes details on expected results,
to distribute when you introduce the project to policymakers and staff concerned.
• A progress report of 4-5 pages including preliminary findings and recommendations that you
will prepare for presentation at the data analysis and report writing workshop. This report
can also be used to inform authorities who will be crucial to utilization of project results.
• The draft report of findings and recommendations prepared during the data analysis
workshop. The summary of this report can be used for discussion with policymakers and
staff. However, for decision-makers and target groups in the community, you will need a
different summary, concentrating in simple words on the findings and recommendations that
directly concern them.
Make sure that summaries of your findings and recommendations are adapted to the level of
understanding and interests of different audiences. This will increase their motivation to
provide feedback and to participate in the implementation of the final recommendations.
• Special visits to top policymaker(s) by the principal investigator or the whole research team
to report on progress during the fieldwork or to discuss preliminary results and
recommendations.
• An invitation to the one or two persons most crucial for implementing your
recommendations to attend the last day of the data analysis workshop, when you will present your
findings and recommendations in plenary.
• Special meetings with policymakers, staff, and representatives of the target groups concerned
to discuss the findings and recommendations of the study and develop a plan for action.
For complex projects of relatively long duration, it may be advisable to have a Project Advisory
Committee, representing the major parties involved. Because the projects developed during
workshops will, in general, not last longer than 6 months, you may be able to keep key
individuals or representatives informed through ad hoc or even routine meetings.
Do not forget to report the findings to the subjects, community, or organization studied before the
report is finalized. This should be done to fulfil an obligation to those studied, to obtain
information on possible errors in your draft report, and to discuss your proposed
recommendations and obtain useful feedback.
9. Budget
• A detailed budget will help you to identify which resources are already locally available and
which additional resources may be required.
• The process of budget design will encourage you to consider aspects of the work plan you
have not thought about before and will serve as a useful reminder of activities planned, as
your research gets underway.
A complete budget is normally not prepared until the final stage of project planning. However,
cost is usually a major limiting factor and, therefore, must always be kept in mind during
planning so that your proposals will not have an unrealistically high budget. Remember that both
ministries and donor agencies usually set limits for research project budgets.
The use of locally available resources increases the feasibility of the project from a financial
point of view.
It is convenient to use the work plan as a starting point. Specify, for each activity in the work
plan, what resources are required. Determine, for each resource needed, the unit cost and the total
cost.
Example:
In the work plan of a study to determine the utilization of family planning methods in a
certain district, it is specified that 5 interviewers will each visit 20 households in clusters
of 4 over a time period of 5 working days. A supervisor will accompany one of the
interviewers each day using a car. The other 4 interviewers will use motorcycles. The
clusters of households are scattered over the district but are on average 50 kilometers
from the district hospital from where the study is conducted.
The budget for the fieldwork component of the work plan will include funds for
personnel, transport and supplies.
Note that the unit cost (e.g., per diem or cost of petrol per km), the multiplying factor (number of days),
and the total cost should be clearly indicated for all budget categories.
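The calculation for the fieldwork example can be laid out as unit cost x quantity x number of days for each budget line. All unit costs below are invented placeholders in local currency units, the supplies line is omitted, and each vehicle is assumed to travel about 100 km per day (50 km out and 50 km back).

```python
# (category, item, unit cost, quantity per day, number of days)
# All unit costs are hypothetical placeholders; replace them with real figures.
FIELDWORK_LINES = [
    ("Personnel", "Per diem, interviewers", 20, 5, 5),
    ("Personnel", "Per diem, supervisor", 30, 1, 5),
    ("Transport", "Car fuel per km (100 km/day)", 2, 100, 5),
    ("Transport", "Motorcycle fuel per km (4 x 100 km/day)", 1, 400, 5),
]


def budget_table(lines):
    """Return (rows, grand_total); each row keeps the unit cost, quantity,
    multiplying factor (days) and total cost, so all of them can be shown."""
    rows, grand_total = [], 0
    for category, item, unit_cost, quantity, days in lines:
        total = unit_cost * quantity * days
        rows.append((category, item, unit_cost, quantity, days, total))
        grand_total += total
    return rows, grand_total


rows, grand_total = budget_table(FIELDWORK_LINES)
for row in rows:
    print("{:<10} {:<42} {:>4} x {:>4} x {:>2} days = {:>5}".format(*row))
print("Total fieldwork cost:", grand_total)  # 3650
```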
Budget justification
The budget justification follows the budget as an explanatory note justifying briefly, in the
context of the proposal, why the various items in the budget are required. Make sure you give
clear explanations concerning why items that may seem questionable or are particularly costly
are needed and discuss how complicated expenses have been calculated. If a strong budget
justification has been prepared, it is less likely that essential items will be cut during proposal
review.
10. Finalizing and reviewing the research proposal
When you have finished the methodological section of your research proposal and have
pretested the methodology or at least reviewed it thoroughly, you can start preparing the final draft
of various parts of your research proposal.
TABLE OF CONTENTS
1. INTRODUCTION
2. OBJECTIVES
3. METHODOLOGY
4. PROJECT MANAGEMENT
5. BUDGET
5.1 Budget
5.2 Budget justification
ANNEXES
Annex 1. References
Annex 2. List of abbreviations (if applicable)
Annex 3. Questionnaires (and/or other data collection tools)
10.2. Writing a Summary of the Research Proposal
When you have completed writing your research proposal, there is usually a need for the
protocol to be reviewed by senior authorities and policymakers or funding agencies. For the
purpose of obtaining approval from policymakers or very busy administrators, it is advisable to
add a summary (of no more than two pages) to the proposal.
A brief narrative summary of one page, which could contain the following elements:
You should put the summary at the beginning of the proposal, although it is the last thing you
prepare.
After the summary, a table of contents should follow. Adding numbers to the pages of your
report and including them in your table of contents is one of the last activities involved in
preparing your proposal.
The title page should be prepared, containing the title of your study, the names of the researchers
with their titles, the name of the organization or place, and the date.
A Short Handbook for beginners to EPI INFO: A Word Processing, Database, and
Statistical System for Epidemiology on Microcomputers
A questionnaire is a template that guides EPI INFO in making a data file. Once you have
created a questionnaire, making a data file is an automatic process.
An Epi Info questionnaire may have up to 500 lines. Headings and other text may appear
anywhere.
Places where you enter data are called “Fields” or in the analysis phase “variables”.
In EPED, holding down the control key and pressing “Q” twice (<Ctrl Q-Q>) will display a list
of field symbols. Choosing a field type from the list will automatically insert the field after the
current cursor position.
To begin making your questionnaire, first set EPED to questionnaire (.QES) mode.
Numeric data values can be integers or non-integers, and the length of a value depends on the
number of digits required.
Example: The variable birth weight may be measured in grams (integer values) or in kilograms
(non-integer values).
• Weight #### defines the variable in integer values with 4 digits
• Weight #.## defines the variable with two decimal digits
Note that the sign # is used to define numerical data values.
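How a # template limits the values that can be entered can be illustrated with a small Python check. This is a hypothetical helper, not part of Epi Info. It only handles non-negative numbers and assumes that a template such as #.## means one digit before the decimal point and two after it.

```python
def template_shape(template):
    """Return (integer_digits, decimal_digits) for a template like '####' or '#.##'."""
    whole, _, frac = template.partition(".")
    if set(whole + frac) - {"#"}:
        raise ValueError(f"not a numeric template: {template!r}")
    return len(whole), len(frac)


def fits(value, template):
    """True if a non-negative value can be written in the template
    without losing digits on either side of the decimal point."""
    int_digits, dec_digits = template_shape(template)
    text = f"{value:.{dec_digits}f}"  # value rounded to the template's decimals
    whole = text.split(".")[0]
    return len(whole) <= int_digits and round(value, dec_digits) == value


print(fits(3250, "####"))   # True  (birth weight in grams)
print(fits(3.25, "#.##"))   # True  (birth weight in kilograms)
print(fits(3250, "#.##"))   # False (too many digits before the point)
print(fits(3.257, "#.##"))  # False (too many decimal digits)
```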
Variables that take character (nonnumeric) values, such as name or country of birth, are
called general-purpose fields; the length of such a variable is represented by underscores (_______).
Example: Name _________________
Uppercase fields. These are similar to the general-purpose fields, but entries will be converted
to uppercase. The length is indicated by the number of characters between the < and > symbols.
Example: <A>, <A >
As far as possible, the variable names that you are going to use in data entry and analysis should be
short and clear so that you can identify the variables easily. You may wonder how Epi Info
assigns names to the fields or variables.
The ENTER program looks at the first ten characters of the question phrase that precedes a
field. For example, the variable “City or town” will become CITYORTOWN.
This is a rather awkward name for a variable you will manipulate during analysis, but you can
tell ENTER explicitly which name you would like by placing curly brackets, { and }, around
the part of the question that contains the name.
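The naming rule described above can be approximated in Python. This sketch is an assumption based only on the CITYORTOWN example: spaces and punctuation are dropped, letters are converted to upper case, the name is cut to ten characters, and text inside curly brackets, if present, is used instead of the whole question. ENTER itself may handle special cases differently.

```python
import re


def field_name(prompt, max_len=10):
    """Approximate how a field name is derived from its question text.

    If part of the prompt is enclosed in curly brackets, only that part is
    used. Spaces and punctuation are dropped, the result is upper-cased
    and cut to max_len characters.
    """
    match = re.search(r"\{([^}]*)\}", prompt)
    source = match.group(1) if match else prompt
    letters = "".join(ch for ch in source if ch.isalnum())
    return letters.upper()[:max_len]


print(field_name("City or town"))           # CITYORTOWN
print(field_name("{City} or town"))         # CITY
print(field_name("Distance from {Jimma}"))  # JIMMA
```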
To define dates that contain month, day and year, use <mm/dd/yy>, where mm is the month, dd the
day, and yy the year in two digits.
For example, the date 18 November 1999 (18/11/99 in day/month/year order) is entered in a
<mm/dd/yy> field as 11/18/99.
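Converting dates recorded in day/month/year order into the month/day/year order of a <mm/dd/yy> field can be done with Python's standard `datetime` module. This is a minimal sketch for preparing data outside Epi Info, not part of the EPED or ENTER programs.

```python
from datetime import datetime


def dmy_to_mdy(text):
    """Convert a dd/mm/yy date such as '18/11/99' to mm/dd/yy form."""
    return datetime.strptime(text, "%d/%m/%y").strftime("%m/%d/%y")


print(dmy_to_mdy("18/11/99"))  # 11/18/99
```

Because there is no month 18, calling `dmy_to_mdy("11/18/99")` raises `ValueError`, so dates that are already in the other order are caught rather than silently swapped.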
After defining your variables and making your questionnaire, you have to save the
questionnaire file with the extension .QES (the default), for example DIABETES.QES.
To save your questionnaire file, press F2, select ‘Save file to …’ and type the name of the
file. After defining and saving your .QES file, press F10 (Done) to quit the EPED
window.
A separate program, called CHECK, allows skip patterns, ranges, automatic coding, legal values,
and other more complex operations during data entry to be specified.
Font Styles
Alt – 1 Bold
Alt – 2 Double printing
Alt – 3 Underlining
Alt – 4 Superscript
Alt – 5 Subscript
Alt – 6 Compressed Type
Alt – 7 Italic type
Ctrl O-C Center current line
Ctrl Q-L Undo this line
Ctrl Q-U Undelete
F5 Print
2. The ENTER program
The ENTER program will create a data file from a questionnaire. Once the file is created,
ENTER guides the data entry process so that any number of data records may be added.
We have made a questionnaire file called DIABETES.QES. Now we will run the ENTER
program so that we can create a database, using DIABETES.QES as the template.
In ENTER you will see a screen that asks for the name of a data file. Type “DIABETES”.
Actually the data file will be called DIABETES.REC, but ENTER assumes the “.REC”, the
default, if it is not supplied.
When you run ENTER, the following menu choices will be displayed:
1. Enter or edit data
2. Create new data file from .QES file
3. Revise structure of data file using revised .QES
4. Reenter and verify records in existing data file
5. Rebuild index file(s) specified in .CHK file
After typing DIABETES, press <enter> and then choose menu item “2”, “Create new data file
from .QES file”.
Now ENTER will ask for the name of the .QES file. Type the file name of the questionnaire
you have just made, i.e., DIABETES.QES. Do not forget to include the path where you saved
your questionnaire file. Then press <enter> to confirm that the file you specified is the correct
questionnaire file to be used.
After you make these entries, ENTER will make a .REC file and give further instructions.
Note the following:
i. Once you have created the .REC file and started entering data, you can quit data entry at any
time. To continue data entry next time, select option “1” – “Enter or edit data” (the
default) in the ENTER program.
ii. If you modify the questionnaire file, for example by changing the length of fields
(variables), you have to revise the .REC file to match the new questionnaire file. To do this,
first go to the ENTER program, type the name of the data file, say DIABETES.REC,
then press <enter> and choose option “3” – “Revise structure of data file using revised .QES”.
Often it is useful to have the computer check for errors during the data entry process, to do
automatic coding of entries, and to skip over parts of the questionnaire if certain conditions are
met. The CHECK program makes it possible to instruct ENTER to perform such operations
automatically. By using CHECK, you can protect your data against many common types of error
and also make data entry easier and more automatic.
CHECK makes a file with a name ending in .CHK. The .CHK file contains instructions for
ENTER to restrict the data entered in specified fields. When ENTER is run, it automatically
looks for a file with the same name as the .REC file but ending in .CHK. Using CHECK is
optional, and the ENTER program will function just as well if you decide not to make a .CHK
file.
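To make the idea of such restrictions concrete, the following Python sketch applies three kinds of rule (must-enter fields, legal values and ranges, and a jump that skips questions) to a single record. It does not reproduce the .CHK file format or the way ENTER works; the field names, codes and limits are hypothetical.

```python
FIELD_ORDER = ["SEX", "SMOKE", "YEARS", "CIGS", "SBP"]

# Hypothetical rules: SEX 1 = male, 2 = female; SMOKE 1 = yes, 2 = no
RULES = {
    "SEX": {"must_enter": True, "legal": {1, 2}},
    "SMOKE": {"must_enter": True, "legal": {1, 2}, "jumps": {2: "SBP"}},
    "YEARS": {"range": (0, 80)},   # years of smoking
    "CIGS": {"range": (0, 99)},    # cigarettes per day
    "SBP": {"range": (60, 260)},   # systolic blood pressure
}


def check_record(record):
    """Return a list of error messages for one record (a dict of field values)."""
    errors, skipped = [], set()
    for i, name in enumerate(FIELD_ORDER):
        value, rule = record.get(name), RULES.get(name, {})
        if name in skipped:  # field jumped over by an earlier answer
            if value is not None:
                errors.append(f"{name}: should be blank (skipped)")
            continue
        if value is None:
            if rule.get("must_enter"):
                errors.append(f"{name}: must be entered")
            continue
        if "legal" in rule and value not in rule["legal"]:
            errors.append(f"{name}: {value} is not a legal value")
        if "range" in rule:
            low, high = rule["range"]
            if not low <= value <= high:
                errors.append(f"{name}: {value} outside {low}-{high}")
        target = rule.get("jumps", {}).get(value)
        if target:  # skip every field up to (not including) the target
            skipped.update(FIELD_ORDER[i + 1:FIELD_ORDER.index(target)])
    return errors


print(check_record({"SEX": 1, "SMOKE": 2, "SBP": 120}))  # []
print(check_record({"SEX": 3, "SMOKE": 2, "YEARS": 5, "SBP": 300}))
```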
Before running CHECK, a .REC file must already exist for the questionnaire you wish to
enhance. Use ENTER to make a .REC file from the .QES file if necessary before running
CHECK. Do not enter any data items, but exit immediately from ENTER with <F10> to make
an empty .REC file; then run CHECK from the main menu. Enter the name of the .REC file in
the data file space in CHECK and answer “Y” in the “Ready?” blank.
CHECK presents the questionnaire on the screen with the following function keys indicated on
the bottom line:
F1/F2 – Min/Max F5 – Link fields
F3 – Repeat F9 – Edit field
F4 – Must enter F10 – Quit
Legal: F6 – Add Shift F6 – Display Ctrl-F6 – Delete
Jump: F7 – Add Shift F7 – Display Ctrl-F7 – Delete
Codes: F8 – Add Shift F8 – Display Ctrl-F8 – Delete
ANALYSIS produces lists, frequencies, tables, statistics, and graphs from Epi Info files. In the
ANALYSIS program, the upper portion of the screen shows results and the lower window is for
entering commands. A STATUS line at the top of the screen gives the name of the active data
set and the amount of memory available.
The PROMPT line at the bottom describes commonly used commands accessible through
function keys.
The ANALYSIS program is command driven, that is, it requires commands for its operation.
The commands may be given from the keyboard or placed in a PROGRAM FILE that is then
RUN in ANALYSIS. Program files may be created in EPED simply by typing the commands
into a text file (sometimes called a syntax file), with one command on each line.
The most frequently used commands in ANALYSIS are:
1. General
READ – Specifies the data set to be analyzed. This command must precede all commands that
use data files.
LIST – Produces a line listing of variables with their data values
FREQ – Produces frequency distributions
TABLES – Produces cross-tabulations
MEANS – Calculates means, ANOVA, etc.
REGRESS – Performs regression and correlation analysis for quantitative variables
ROUTE – Directs output to a FILE, the PRINTER, or the SCREEN
2. Variable Manipulation
SELECT – Selects a subset of the data file
DEFINE – Defines a new variable
RECODE – Recodes an existing variable into a new variable
SORT – Sorts the data file by one or more sorting variables
IF – Defines conditions and one or more consequences (assignment of values) that apply to
every record in which the conditions are met.
5. Browsing
BROWSE: Displays the current data file in a spreadsheet format on the screen. You can use the
arrow keys to move around and see the entire data sheet, but you cannot use this
command to change data values.
UPDATE: Works just like BROWSE, except that you can change the data items. This is the only
command in ANALYSIS that actually changes the original data file; it is important to make a
back-up file before using UPDATE.
6. Graphics
Commands can be typed in upper case, lower case, or a mixture of both, except for items in
quotation marks, where case matters.
In subsequent operations, we want ANALYSIS to exclude missing values from the calculations
and tables. Normally, missing values are included, and are treated as zeroes. This can be
changed with the set command.
SET IGNORE = ON
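The effect of this setting on a mean can be seen in a small Python comparison. The blood pressure values are hypothetical, and `None` stands for a missing value; the two lists mimic the two behaviours described above rather than reproducing ANALYSIS itself.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Hypothetical diastolic blood pressures (mmHg); None marks a missing value
dbp = [80, 90, None, 70, None]

missing_as_zero = [0 if v is None else v for v in dbp]  # missing counted as 0
missing_ignored = [v for v in dbp if v is not None]     # like SET IGNORE = ON

print(mean(missing_as_zero))  # 48.0
print(mean(missing_ignored))  # 80.0
```

Counting the two missing readings as zeroes pulls the mean far below every value actually recorded, which is why missing values should be excluded from such calculations.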
IMPORT allows files created in other systems to be brought into Epi Info for processing or for
conversion to still other file formats. It accepts files either in fixed-length card format or in
comma-delimited format using string or text fields enclosed in quotation marks. It will also
import Lotus .WKS and .WK1 files and dBase II, III, or IV files directly.
In importing data files from other programs, the following need to be defined:
i. Input format – Fixed, Delimited, Lotus, and dBase
ii. Input file name – the name of the file you want to import
iii. Output file name – the name of the REC file to be created
8. The EXPORT program
Data files created in Epi Info have file names ending in .REC. They consist entirely of “ASCII”
(printable) characters and can be transmitted over electronic mail systems. If you prefer to do
analyses in a program other than Epi Info, however, the program called EXPORT will transform
.REC files into files that can be used in a variety of commercial software systems, as described
below.
When exporting your .REC file to other programs, you need to specify:
i. Input file name – the .REC file
ii. Output format – the type of data file you want to create from your .REC file
iii. Output file name – the name of the file you want to create, with the default extension
for the chosen format
9. The MERGE program
MERGE can be used for combining Epi Info files in several different ways or for updating
records in one file using data in another file. It operates in batch mode, making a permanent file
containing the results of merging two existing Epi Info files.
Many of the functions of MERGE can be accomplished in a dynamic way with the relational
features of ANALYSIS. Sometimes, however, a batch program like MERGE is useful to
incorporate into a permanent surveillance system or other database application. MERGE may be
used to combine records from many different sources submitted to a central processing facility in
different files, or to perform batch updates using update records sent in from other sites.
You can MERGE (combine) two files together to create a combined file for all the cases. To do
this you have to give the names of the two REC files and a name for the output file. You also
have to specify the MERGE option.
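The two basic operations described above, appending the records of two files and updating one file from another, can be sketched with Python dictionaries. This only illustrates the idea; it does not reproduce MERGE's options, and the key field `ID` and the example records are hypothetical.

```python
def append_files(file_a, file_b):
    """Combine all records from two files with the same structure."""
    return list(file_a) + list(file_b)


def update_by_key(master, updates, key="ID"):
    """Update master records from update records that share the same key.

    Fields in an update record overwrite those in the matching master record;
    update records with no match are added as new records.
    """
    merged = {rec[key]: dict(rec) for rec in master}  # copies: master is unchanged
    for rec in updates:
        merged.setdefault(rec[key], {}).update(rec)
    return list(merged.values())


site1 = [{"ID": 1, "AGE": 45}, {"ID": 2, "AGE": 52}]
site2 = [{"ID": 3, "AGE": 38}]
print(len(append_files(site1, site2)))  # 3
print(update_by_key(site1, [{"ID": 2, "AGE": 53}, {"ID": 4, "AGE": 60}]))
# [{'ID': 1, 'AGE': 45}, {'ID': 2, 'AGE': 53}, {'ID': 4, 'AGE': 60}]
```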
10. Anthropometric calculations using EPINUT
EPINUT is a program for performing calculations with anthropometric data in Epi Info files and
for displaying summary statistics from the data. To use EPINUT, you must have a .REC file
with the relevant information already entered (for example, sex, age, weight and height).
11. The CSAMPLE program: Analyzing data from complex survey samples
The ANALYSIS program in Epi Info performs statistical calculations that assume the data come
from simple random (or unbiased systematic) samples. In many survey applications, more
complicated sampling strategies are used. These may involve sampling features like
stratification, cluster sampling, and the use of unequal sampling fractions. Surveys that
use such complex sampling designs include the coverage surveys of the WHO Expanded Programme
on Immunization (EPI) and CDC's Behavioral Risk Factor Surveillance System.
CSAMPLE computes proportions or means with standard errors and confidence limits for
studies in which the data did not come from a simple random sample. If two-dimensional
tables are requested, the odds ratio, risk ratio, and risk difference are also calculated.
Data from complex sample designs should be analyzed with methods that account for the
sampling design. In the past, easy-to-use programs were not available for analysis of such data.
CSAMPLE provides these facilities and, combined with an understanding of sampling design and
analysis, can form the basis of a complete survey system.
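As an illustration of why the sampling design matters, the following sketch computes a proportion and its standard error from a one-stage cluster sample using the ratio estimator with a Taylor-series (linearization) variance, a standard textbook approach. It assumes clusters were selected with equal probability and ignores the finite population correction; it is not claimed to be CSAMPLE's exact algorithm, and the cluster counts are invented.

```python
import math


def cluster_proportion(clusters):
    """Proportion and standard error for a one-stage cluster sample.

    clusters: list of (persons_with_attribute, persons_sampled), one per cluster.
    Ratio estimator p = sum(y) / sum(m) with a linearization variance,
    assuming equal-probability cluster selection and no finite population
    correction.
    """
    n = len(clusters)
    total_y = sum(y for y, _ in clusters)
    total_m = sum(m for _, m in clusters)
    p = total_y / total_m
    ss = sum((y - p * m) ** 2 for y, m in clusters)
    variance = n / (n - 1) * ss / total_m ** 2
    return p, math.sqrt(variance)


clusters = [(3, 10), (5, 10), (2, 10), (6, 10)]  # hypothetical counts
p, se = cluster_proportion(clusters)
srs_se = math.sqrt(p * (1 - p) / 40)  # naive simple-random-sample formula
print(round(p, 3), round(se, 4))  # 0.4 0.0913
print(round(srs_se, 4))           # 0.0775
```

With these invented counts, the cluster-based standard error is larger than the one the simple-random-sample formula gives, so analyzing such data as if it came from a simple random sample would overstate its precision.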
Some basic commands in the ANALYSIS program
1. READ diabetes.rec – opens the data file diabetes.rec for analysis
2. LIST * – lists the records (* is shorthand for all fields)
The LIST command displays only as many variables as will fit across the current
screen width.
3. FREQ Q3 – produces a frequency table giving the absolute and relative frequencies (%) for
each value of age (Q3)
4. TABLES Q3 Q4 – produces a cross-tabulation of age (Q3) by sex (Q4), together with some
statistics
5. TABLES Q3 Q4 Q5 – produces tables of age by sex for each address category (Q5), which
serves as the stratifying variable
6. For quantitative data, the MEANS command produces a table that displays continuous or
ordinal data and then performs the appropriate statistical analysis.
MEANS Q48 Q4 – compares mean diastolic blood pressure between males and females
7. Charts and graphs
BAR Q9 – draws a bar chart of Q9
SCATTER Q47 Q48 /R – draws a scatter plot of Q47 and Q48
The /R option adds a least-squares regression line through the data points.
8. BROWSE Q3 Q4 Q9 Q16
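Three of these commands rest on simple arithmetic. The Python below is not Epi Info syntax. It is a hypothetical illustration of what FREQ (absolute and relative frequencies), MEANS with a grouping variable (group means only, without the accompanying tests) and the /R option of SCATTER (an ordinary least-squares line) compute. The function names and data are made up.

```python
from collections import Counter


def freq(values):
    """Absolute and relative (%) frequency of each value, as FREQ reports."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100.0 * c / n) for v, c in sorted(counts.items())}


def group_means(values, groups):
    """Mean of `values` within each level of `groups` (e.g. diastolic BP by sex)."""
    sums, counts = {}, {}
    for y, g in zip(values, groups):
        sums[g] = sums.get(g, 0) + y
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}


def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


print(freq([45, 52, 45, 60]))                      # {45: (2, 50.0), 52: (1, 25.0), 60: (1, 25.0)}
print(group_means([80, 90, 70, 100], [1, 2, 1, 2]))  # {1: 75.0, 2: 95.0}
print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))     # (1.0, 2.0)
```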
For example, the diabetes questionnaire may be defined as follows:
Card number ###
Study number ###
Age ##
Sex #
Address #
Distance from Jimma ###
Distance from the nearby town where there is transport ###
Duration of living in the area at time of diagnosis ###
Occupation #
Type of DM #
If NIDDM #
In those suspected to have MRDM #
Duration of DM ###
Means of first dx #
Time of first dx <dd/mm/yy>
Education #
Language #
Religion #
Other form of treatment tried #
If yes, when? #
Diagnosis of previous admission #
If yes to Q22, number of admissions #
Other concomitant diseases #
If yes to Q24, what? ##
Family history of diabetes mellitus #
If yes to Q26, mention the relative #
Compliance (for those who do not come on their appointment date) #
Reasons #
Symptoms at present while on treatment #
Treatment #
Insulin #
Insulin dose/day ###
OHA #
OHA dose per day ###
Do you know symptoms of hypoglycemia? #
If yes to Q35, what? #
If yes to Q35 what should you do? #
For females: Did you give birth after you were diagnosed to have diabetes? #
If yes to Q38, number of deliveries #
Outcome of delivery #
How long do you use the same needle? ###
History of smoking #
If yes to Q42, for how long? ## years
How many cigarettes per day? ##
Bare foot walking #
Systolic BP ###
Diastolic BP ###
Weight ###.#
Height ###.#
BMI ##
Waist circumference ###
Hip circumference ###
Bilateral parotid enlargement #
Discolored teeth, erythematous gums #
Injection site examination #
FBS ###
RBS ###
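The BMI field is derived from the weight and height fields, so it can be calculated rather than measured: BMI = weight (kg) / height (m)². The sketch below computes it, together with the waist-hip ratio from the two circumference fields. It assumes weight is recorded in kilograms and height in centimetres, which the questionnaire does not state. The function names are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2; height is converted from cm to m first."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def waist_hip_ratio(waist_cm, hip_cm):
    """Waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


# The BMI field (##) holds a whole number, so the value is rounded before entry.
print(round(bmi(70.0, 175.0)))                 # 23
print(round(waist_hip_ratio(90.0, 100.0), 2))  # 0.9
```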
The CHECK file for the diabetes questionnaire may look like the following:
Q4
   RANGE 1 2
END
Q5
   RANGE 1 2
END
Q9
   RANGE 1 6
END
Q10
   RANGE 1 8
END
Q11
   RANGE 1 2
END
Q11
   RANGE 1 4
END
Q14
   RANGE 1 7
END
Q16
   RANGE 1 4
END
Q17
   RANGE 1 9
END
Q18
   RANGE 1 3
END
Q19
   RANGE 1 2
   AUTOJUMP Q21
END
Q21
   RANGE 1 2
   JUMPS
      2 Q23
   END
END
Q23
   RANGE 1 2
   JUMPS
      2 Q25
   END
END
Q25
   RANGE 1 2
END
Q26
   RANGE 1 5
END
Q28
   RANGE 1 3
END
Q29
   RANGE 1 9
END
Q30
   RANGE 1 4
END
Q31
   RANGE 1 4
END
Q33
   RANGE 1 2
END
Q35
   RANGE 1 2
END
Q35
   RANGE 1 6
END
Q37
   RANGE 1 2
END
Q38
   RANGE 1 2
END
Q42
   RANGE 1 2
END
Q45
   RANGE 1 2
END
Q53
   RANGE 1 2
END
Q54
   RANGE 1 2
END
Q55
   RANGE 1 2
END
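During data entry, RANGE rejects values outside the legal limits, and JUMPS sends the cursor past questions that do not apply (for example, a "no" answer coded 2 skips the follow-up question). The Python sketch below shows that logic with a few of the rules from the example. The dictionary layout, field order and function names are hypothetical. This is not how Epi Info itself stores or runs CHECK files.

```python
# Hypothetical representation of a few CHECK rules from the diabetes example.
CHECKS = {
    "Q4": {"range": (1, 2)},
    "Q21": {"range": (1, 2), "jumps": {2: "Q23"}},
    "Q23": {"range": (1, 2), "jumps": {2: "Q25"}},
}

# Hypothetical order in which fields appear on the entry screen.
FIELD_ORDER = ["Q4", "Q21", "Q22", "Q23", "Q24", "Q25"]


def is_valid(field, value):
    """RANGE: accept the value only if it lies within the legal limits."""
    rule = CHECKS.get(field, {})
    if "range" not in rule:
        return True
    low, high = rule["range"]
    return low <= value <= high


def next_field(field, value):
    """JUMPS: go to the target field for a listed value, else to the next field."""
    jumps = CHECKS.get(field, {}).get("jumps", {})
    if value in jumps:
        return jumps[value]
    i = FIELD_ORDER.index(field)
    return FIELD_ORDER[i + 1] if i + 1 < len(FIELD_ORDER) else None


print(is_valid("Q4", 3))     # False
print(next_field("Q21", 2))  # Q23
print(next_field("Q21", 1))  # Q22
```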