Epidemiological Study Designs Overview
MODULE – 2
Epidemiological
Study Designs
This module introduces candidates to the various study designs used in
epidemiological studies. It will help the candidate develop the ability to
identify the conditions under which a particular study design should be used.
The candidate will also be able to understand the role of error, bias and confounding
in epidemiological research.
Unit 2.1
Descriptive Epidemiology
Unit 2.2
Unit 2.3
Learning Objectives
Learning Outcome
Epidemiology 1
Introduction
Classification of study designs: There are five major ways of classifying study
designs.
The term descriptive implies a study which provides information about the pattern
of disease or risk factors but not the underlying causes. The term analytic applies to
studies exploring hypotheses about causes of disease but, by inference, not primarily
concerned with patterns.
Retrospective studies are said to be concerned with data in the past and prospective
ones with data in the future. The terms retrospective and prospective have been used
synonymously with case-control and cohort studies, respectively. The distinction
between retrospective and prospective studies is also very blurred, for case-control
studies may enrol subjects prospectively and cohort studies may enrol subjects
retrospectively and both do, of course, collect data on risk factors in the past.
The observational study is one where the investigator observes the natural course of
events. The experimental study is one where the course of events is deliberately
altered. The investigation of a natural experiment is, strictly speaking, an
observational study. Most epidemiology is observational, for experiments are the
exception.
The word cross-section describes the shape that results from cutting an object
transversely. A cross-sectional study, therefore, exposes and studies disease and risk
factor patterns in a representative part of the population, in a narrowly defined time
period. It is also known as a prevalence study, a name that emphasises the key role of
prevalence in this design. The cross-sectional study seeks associations, generates and
tests hypotheses and, by repetition in different time periods, can be used to measure
change, and hence evaluate interventions. Its focus is simultaneously on disease and
population characteristics and risk factors. Comparisons between subgroups within
the sample are invariably made, but the study can also be deliberately designed with
comparison groups. The comparisons are usually based on differences in the
prevalence of risk factors and diseases, and the association between risk factors and
diseases.
the population at risk and the listing of the sampling frame usually conform to the
snapshot analogy. But the measurement of risk factors and disease is usually made
over a period of time which varies from as little as a day to several years. In most
studies, the measurements are made over a relatively short period of time such as a
year or two.
The benefit of a study collecting data over a year is that any seasonal differences will
be evened out to give a more valid annual measure of prevalence. When the time
period of data collection is long, the degree of mismeasurement of the point
prevalence of disease depends on the natural history. If the disease is permanent
then the point prevalence may be overestimated because incident cases will occur
and be included during the fieldwork. Because most diseases are rare,
the effect will be small. If the disease incidence or base population is changing
over the duration of the study, this too will affect the disease prevalence. If there is a
dynamic but balanced state with new cases arising and old cases recovering in equal
numbers, then the mismeasure of the point prevalence is small. The
mismeasurement of point prevalence is likely to be important for diseases which
vary greatly by season or where the incidence of a disease is changing rapidly. The
cross-sectional study design is excellent for measuring the population burden of
disease using prevalence rates, which are the most reliable summary measures
obtained from such surveys.
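The overestimation described above for a permanent disease can be illustrated with a minimal sketch. The figures (a population of 10,000, 50 existing cases and 5 new cases during fieldwork) are hypothetical and chosen only for illustration, and the population is assumed to stay fixed with no deaths or recoveries.

```python
def prevalence(cases, population):
    """Proportion of the population that has the disease."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population

# Hypothetical figures, assumed for illustration only.
population = 10_000
cases_at_start = 50               # prevalent cases when fieldwork begins
new_cases_during_fieldwork = 5    # incident cases picked up while the survey runs

# True point prevalence at the start of fieldwork.
prevalence_at_start = prevalence(cases_at_start, population)

# What a long survey records for a permanent disease: old cases plus new ones.
measured_prevalence = prevalence(cases_at_start + new_cases_during_fieldwork, population)

print(f"point prevalence at start: {prevalence_at_start:.4f}")
print(f"prevalence as measured:    {measured_prevalence:.4f}")
```

Under these assumptions the measured figure exceeds the point prevalence at the start of fieldwork, but only slightly, because the number of new cases is small relative to the population.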
In studies of the apparently well, cross-sectional studies may discover people with
previously unknown disease; that is, they uncover the iceberg of disease. The full
spectrum of disease can be described only by a combination of cross-sectional
surveys of the apparently well population and of the diseased population; the latter is
more often obtained from clinical case-series than from cross-sectional studies. The
cross-sectional study can be of populations in different places, so comparisons can be
made. Studies can also compare people with different characteristics; for example,
there may be a sample of women and another of men, or one of people of
Chinese origin and another of people of Indian origin. Such studies are
called comparative cross-sectional studies.
The case-control study is a comparative study where people with the disease (or
problem) of interest are compared with people without that disease. The word case
is used to describe the characteristics and medical history of a patient. The
comparison, control or reference group supplies information about the expected risk
factor profile in the population from which the case group is drawn.
Cases can be obtained from a number of sources, e.g. a clinical case-series, a
population register of cases, the new cases identified in a cohort study, or
those identified in a cross-sectional survey.
The ideal set of cases would be new (incident) and representative of all cases of the type of
interest to the study question in the population under study. The cases from population registers
and cohort studies usually meet this ideal the best. The cases identified in a clinical case-series
are usually highly selected; while those from a cross-sectional study are usually prevalent ones,
though there will be incident cases in a period prevalence study. The cases are compared with
controls, associations between the disease and potential risk factors are measured (usually by the
odds ratio), and through analysis of similarities and dissimilarities, hypotheses about disease
causes are generated or tested.
Information is obtained on the social and medical history of cases and controls and on potential
causal and confounding factors. As the causal factors have already had their effect in causing
disease in the case group, and the information required is recalled from the past, the case-control
study is sometimes referred to as a retrospective study.
In some studies, controls are recruited to match each case; for example, if a woman of 53 years
was recruited as a case, one could seek a female control of similar age. This matching process
reduces the risk of confounding, here by age and sex. If a mix of ages is likely to arise anyway,
the control group can be recruited without one-to-one matching.
Information is collected to confirm the presence of disease in cases and the absence of disease in
controls and on the past exposure to risk factors which may have caused the disease in both
groups. The concept is to find differences in exposure to the hypothesized causes in the past lives
of cases as compared with controls. These differences can be quantified and summarised either
as differences in prevalence of exposure, or more usually as the odds ratio which in defined
circumstances approximates to the relative risk. An exposure that may have caused disease will
be more common in cases than in controls giving an odds ratio greater than one, and one that
may protect against disease will be less common, giving an odds ratio less than one. Of all the
epidemiological designs, case control is most focused on establishing aetiology and least on
measuring burden of disease or risk factors, which is a by-product.
The estimate of risk in a case-control study, using the odds ratio as a valid estimator of the
relative risk, is based on the following assumptions:
• The cases are incident cases drawn from a known and defined population;
• The controls are drawn from the same defined population and would have been in the
case group if they had developed the disease;
• Controls are selected in an unbiased way, e.g. independently of exposure status;
• In some types of study, the disease is rare.
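The odds ratio described above can be sketched as follows. This is a minimal illustration, not a full analysis: the function name, its parameters and the counts in the example are hypothetical, and no confidence interval or adjustment for confounding is attempted.

```python
def exposure_odds_ratio(exposed_cases, unexposed_cases,
                        exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls.

    Computed as the cross-product ratio of the 2x2 table.
    """
    denominator = unexposed_cases * exposed_controls
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_cases * unexposed_controls) / denominator

# Hypothetical counts: exposure is more common in cases than in controls.
harmful = exposure_odds_ratio(exposed_cases=40, unexposed_cases=60,
                              exposed_controls=20, unexposed_controls=80)

# Hypothetical counts: exposure is less common in cases than in controls.
protective = exposure_odds_ratio(exposed_cases=10, unexposed_cases=90,
                                 exposed_controls=30, unexposed_controls=70)

print(f"odds ratio, exposure more common in cases: {harmful:.2f}")
print(f"odds ratio, exposure less common in cases: {protective:.2f}")
```

With these assumed counts the first odds ratio is greater than one and the second is less than one, matching the interpretation given in the text.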
The word cohort is derived from the Latin ‘cohors’ meaning an enclosure, company,
or crowd. In Roman times, a cohort was a body of 300–600 infantry. In
epidemiological terms, ‘cohort is a group of people with something in common, usually an
exposure or involvement in a defined population group’. The three synonyms for cohort
study design are
• Follow-up
• Longitudinal, and
• Prospective study
A cohort study involves tracking the study population over a period of time.
The cohort study population may be a general one, or one with characteristics of
particular interest, for example, people with a defined lifestyle or even a disease. The
hallmark of this design is that health outcome or health change data are obtained on
the same individuals in a population at more than one time, not just once as in the
cross-sectional study. The idea is to study part of the natural history of risk factors or
diseases in individuals, and to relate one or more characteristics, exercise for
example, to future outcomes such as coronary heart disease.
In causal research, cohort studies usually test the hypothesis that disease incidence
differs in people with different characteristics (exposures) at baseline; that is, there is
an association between exposure and outcome. The cohort study begins by
establishing baseline data, usually from a cross-sectional study, or less commonly by
the extraction of baseline data from sources such as the census or a routine
information system such as a birth register. The cohort can either be followed up
directly with repeated surveys of the same population or the baseline data can be
linked to health records, so providing information on outcomes of interest, usually
disease related but potentially also on risk factors. The new cases of disease
identified are incident cases and can be enrolled into a case-control study. Controls
can also be identified from within the cohort, and this is best done as each case
occurs. This is known as a nested case-control study.
Retrospective cohort studies are cohort studies based on data from the past, without any
prospective work. Essentially, the cohort is identified from past records of exposure
status and this is the vital step. Usually, the outcome data is also obtained in the past
from records (but this information can be supplemented with direct questioning of
those subjects who are alive, and can be traced). Once identified the subjects can be
followed up over time (prospectively) so using both currently available and future
data on outcome. The difference between this design and the prospective cohort is
minimal; a retrospective cohort is assembled from historical records on exposure
status, the prospective cohort on exposure status in the present. Cohort studies are
often described as analytic but one of their main functions is to provide information
on the incidence and to describe the natural history of disease and not just to explore
or generate hypotheses, for which they are of course, extremely useful. If the cohort
study is based on a defined and characterised population, the incidence rates can
often be extrapolated beyond the study group to similar populations elsewhere. ‘The
most important information from a cohort study is on incidence rates’. The ratio of the
incidence rates in the exposed and non-exposed groups derived from the cohort
study is the relative risk, the primary basis for measuring the strength of an
association.
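The relative risk calculation can be sketched as below. This is a simplified illustration that assumes every subject is followed for the same length of time, so that incidence can be taken as new cases divided by persons at risk; the function names and the counts are hypothetical.

```python
def incidence(new_cases, persons_at_risk):
    """New cases per person at risk, assuming equal follow-up for everyone."""
    if persons_at_risk <= 0:
        raise ValueError("persons_at_risk must be positive")
    return new_cases / persons_at_risk

def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed group divided by incidence in the non-exposed group."""
    incidence_unexposed = incidence(cases_unexposed, n_unexposed)
    if incidence_unexposed == 0:
        raise ValueError("relative risk is undefined when no non-exposed subject is affected")
    return incidence(cases_exposed, n_exposed) / incidence_unexposed

# Hypothetical cohort: 30 cases among 1,000 exposed, 10 cases among 1,000 non-exposed.
rr = relative_risk(cases_exposed=30, n_exposed=1_000,
                   cases_unexposed=10, n_unexposed=1_000)
print(f"relative risk: {rr:.2f}")
```

A relative risk above one indicates higher incidence in the exposed group; where follow-up time differs between subjects, person-time denominators would be needed instead of simple counts.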
Trials:
Fig 2.2.6: Trials
Trials are studies where an intervention designed to improve health has been
applied to a population, and the outcome is assessed at follow-up. Such studies give
information about the causes of a disease, effectiveness of interventions to influence
the natural history of disease, and the costs and benefits of interventions.
Trials are experiments that are not done in a laboratory setting and are conducted on
humans or whole animals. They are also known by other terms such as ‘intervention
studies’, ‘clinical trials’ and ‘community trials’.
The trial has, essentially, the same design as a cohort study with one vital difference,
that the exposure status of the study population has been deliberately changed by
the investigator to see how this alters the incidence of disease or other features of the
natural history. Clinical and public health trials usually have a practical question to
answer: whether a particular intervention is sufficiently effective to be introduced into
clinical or public health practice. Hence, trials must be based on a study population
with proper understanding of how it relates to the (target) population which will be
offered the intervention should it be shown to be successful. Otherwise, an
intervention which works in a selected population may not fulfil its goals when put
into public health or clinical practice in the general population.
Proof of concept trials are trials designed solely to produce knowledge about cause
and effect, the intention being to test the efficacy of the intervention in actual practice
at a later date. For these trials, understanding the relationship of the study
population to the target population is not so essential (but still advised).
a) To provide "scientific proof" of aetiological (or risk) factors which may permit
the modification or control of those diseases.
b) To provide a method of measuring the effectiveness and efficiency of health
services for the prevention, control and treatment of disease and improve the
health of the community.
Randomised Controlled Trials (RCT): The randomised trial is the gold
standard for evaluating the efficacy of therapeutic, preventive, and other measures
in both clinical medicine and public health, because randomisation, if
correctly conducted, prevents any biases on the part of the study investigators from
influencing the treatment assignment for each participant. If the study is large
enough, randomisation will also most likely lead to comparability between
treatment groups on factors that may be important for the outcome, such as age, sex
and race, as well as on factors one has not measured, or may not even be aware of as
important. The basic steps in conducting an RCT include the following:
1. The protocol
This is an essential feature of a randomised controlled trial. The protocol specifies the
aims and objectives of the study, questions to be answered, criteria for the selection of
study and control groups, size of the sample, the procedures for allocation of subjects
into study and control groups, treatments to be applied when and where and how to
what kind of patients, standardisation of working procedures and schedules as well as
responsibilities of the parties involved in the trial, up to the stage of evaluation of
outcome of the study. A protocol is essential especially when a number of centres are
participating in the trial. Once a protocol has been developed, it should be strictly adhered
to throughout the study. The protocol aims to prevent bias and to reduce the sources
of error in the study. A preliminary test run of the protocol may be necessary to see
whether it contains any flaws. The final version of the protocol is then agreed upon by
all concerned before the trial begins.
2. Selecting reference and experimental populations
(a) Reference or target population: It is the population to which the findings of the trial
(e.g., of a drug, vaccine or other procedure), if found successful, are expected to be
applicable. A reference population may be as broad as mankind or a whole city, or it may
be geographically limited or limited to persons in specific age, sex, occupational or social
groups, e.g. school children, industrial workers or an obstetric population.
(b) Experimental or study population: The study population is derived from the
reference population. It is the actual population that participates in the experimental
study. Ideally, it should be randomly chosen from the reference population, so that it
has the same characteristics as the reference population. If the study population differs
from the reference population, it may not be possible to generalise the findings of the
study to the reference population.
i. They must give "informed consent", that is they must agree to participate in the
trial after having been fully informed about the purpose, procedures and possible
dangers of the trial
ii. They should be representative of the population to which they belong (i.e.,
reference population)
iii. They should be qualified or eligible for the trial. For example, suppose we are
testing the effectiveness of a new drug for the treatment of anaemia. Volunteers
who are not anaemic are not eligible or qualified for the trial. In other words,
the participants must be fully susceptible to the
disease under study.
3. Randomisation
This is a statistical procedure by which the participants are allocated into groups
usually called "study" and "control" groups, to receive or not to receive an
experimental preventive or therapeutic procedure, manoeuvre or intervention.
Randomisation is an attempt to eliminate "bias" and allow for comparability. Unlike
matching, which can only match those factors which are known to be important to
assure comparability, randomisation seeks to take care of other factors which are
important but whose effect is not recognised or cannot be determined. By a process
of randomisation, it is hoped, these factors will be distributed equally between the
two groups. In other words, by random allocation, every individual gets an equal
chance of being allocated into either group or any of the trial groups. Randomisation
is done only after the participant has entered the study, that is, after having been
qualified for the trial and having given informed consent to participate in the study.
Randomisation is best done by using a table of random numbers.
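Random allocation can be sketched in a few lines. The example below is an illustration only: it uses Python's pseudo-random number generator as a stand-in for a table of random numbers (an assumption, not the method recommended above), and the function name and participant identifiers are hypothetical.

```python
import random

def randomise(participant_ids, seed=None):
    """Allocate each consented, eligible participant to 'study' or 'control'.

    Simple randomisation: every participant has an equal chance of either group,
    independently of the others. A fixed seed makes the allocation reproducible.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(["study", "control"]) for pid in participant_ids}

# Hypothetical participants who have already qualified and given informed consent.
participants = ["P01", "P02", "P03", "P04", "P05", "P06"]
allocation = randomise(participants, seed=2024)

for pid, group in allocation.items():
    print(pid, group)
```

Simple randomisation of this kind does not guarantee groups of equal size in a small trial; that is one reason the text stresses that comparability is most likely when the study is large enough.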
4. Manipulation
5. Follow-up
This implies examination of the experimental and control group subjects at defined
intervals of time, in a standard manner, with equal intensity, under the same given
circumstances, in the same time frame till final assessment of outcome. The duration
of the trial is usually based on the expectation that a significant difference (e.g.
mortality) will be demonstrable at a given point in time after the start of the trial.
Some losses to follow-up are inevitable due to factors such as death,
migration and loss of interest. This is known as attrition. If the attrition is substantial,
it may be difficult to generalise the results of the study to the reference population.
6. Assessment
The final step is to assess the outcome of the trial in terms of:
a) Positive results: That is, the benefits of the experimental measure such as reduced
incidence or severity of the disease, cost to the health service or other appropriate
outcome in the study and control groups.
b) Negative results: That is, the severity and frequency of side-effects and
complications, if any, including death. Adverse effects may be missed if they are
not sought.
Bias may arise from errors in the assessment of the outcome due to the human element.
These may be from three sources:
a) Subject Bias: There may be bias on the part of the participants, who may
subjectively feel better or report improvement if they know they are receiving a
new form of treatment. This is known as "subject variation".
b) Observer Bias: The investigator measuring the outcome of a therapeutic
trial may be influenced if he knows beforehand the particular procedure or
therapy to which the patient has been subjected.
c) Evaluation Bias: There may be bias in evaluation, that is, the investigator may
subconsciously give a favourable report of the outcome of the trial.
Neither randomisation nor the size of the sample can guard against these sorts of
bias. Blinding is used to reduce them, and can be done in three ways:
i. Single Blind Trial: The trial is so planned that the participant is not aware
whether he belongs to the study group or the control group.
ii. Double Blind Trial: The trial is so planned that neither the investigator nor
the participant is aware of the group allocation and the treatment received.
iii. Triple Blind Trial: This goes one step further. The participant, the
investigator and the person analysing the data are all "blind". Ideally, of
course, triple blinding should be used; but double blinding is the most
frequently used method when a blind trial is conducted. When an outcome
such as death is being measured, blinding is not so essential.
Each study has advantages and disadvantages and no one study design is superior.
The ‘hierarchy of evidence’ whereby the trial is said to produce definitive evidence,
and other designs weaker evidence, is a narrow idea that only applies to evaluation,
particularly of drugs. Other designs are stronger for measuring the burden of disease
and in understanding causality. It should be noted that causal understanding comes
from all types of study. Some of the strengths and weaknesses of each study design
are given in the Table below.
Table 2.2.3: Advantages and Disadvantages of Different Study Designs
[Earlier rows are incomplete; the surviving fragments read: "... to analysis of
association and may confirm or spark hypothesis"; "... hypotheses. Control group
may supply burden of data"; "... need".]

8. Observer bias
   Cross-sectional: Small studies may be done by one observer, but for most
   studies inter-observer bias may be a problem.
   Case-control: Small studies may be done by one observer, but large studies
   usually require a few.
   Cohort: Usually require multiple observers.
   Trial: Usually require multiple observers, but exceptionally studies may be
   small.

9. Selection bias
   Cross-sectional: Selection bias arising from non-response is almost
   inevitable.
   Case-control: Studies of prevalent cases have selection bias; those of
   incident cases minimise this. All studies have recall bias.
   Cohort: Selection bias due to non-response at baseline is augmented by loss
   to follow-up.
   Trial: Selection biases are particularly severe because non-participation may
   be high and the intervention may only be suitable for some of the target
   population.

10. Analytic output
   Cross-sectional: Main output is prevalence, though other measures like the
   odds ratio are possible.
   Case-control: Proportions exposed and odds ratio.
   Cohort: Incidence rate and the relative incidence, i.e. relative risk.
   Trial: Incidence, survival and numbers needed to treat or prevent.
The odds ratio is a good measure of relative risk when the disease being studied is
rare.
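Why rarity matters can be seen by computing both measures from the same 2x2 table. The sketch below is illustrative only; the two tables are hypothetical and are constructed so that the relative risk is the same in both, while the disease is rare in one and common in the other.

```python
def risk_ratio(a, b, c, d):
    """Relative risk from a 2x2 table.

    a, b: exposed subjects with and without disease
    c, d: unexposed subjects with and without disease
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)

# Hypothetical tables (a, b, c, d), assumed for illustration.
rare_disease = (20, 9_980, 10, 9_990)   # risks of 0.2% and 0.1%
common_disease = (400, 600, 200, 800)   # risks of 40% and 20%

for label, table in (("rare", rare_disease), ("common", common_disease)):
    rr = risk_ratio(*table)
    odds = odds_ratio(*table)
    print(f"{label}: relative risk {rr:.2f}, odds ratio {odds:.2f}")
```

In the rare-disease table the odds ratio is almost identical to the relative risk of about 2, whereas in the common-disease table it rises to about 2.67 and overstates the relative risk.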
Table 2.2.4: Odds Ratio
• When exposure is associated with protection from disease, Odds ratio < 1
In a cohort study, the relative risk is the rate of disease in exposed subjects divided by
the rate of disease in unexposed subjects.
Table 2.2.5: Cohort Study
2.2.5 Conclusion
Summary
o The observational study is one where the investigator observes the natural
course of events.
o A cross-sectional study exposes and studies disease and risk factor patterns in
a representative part of the population, in a narrowly defined time period. Its
focus is simultaneously on disease and population characteristics and risk
factors.
o The case-control study is a comparative study where people with the disease
(or problem) of interest are compared with people without that disease.
o A cohort study involves tracking the study population over a period of time.
Case Study-2
In a prospective study comprising 10,000 subjects, 6,000 subjects were put on
beta carotene and 4,000 were not. Three of the 6,000 subjects on beta carotene
developed lung cancer, and two of the 4,000 subjects not on it developed lung cancer.
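One way to work through these figures, assuming the intended measure is the relative risk of lung cancer in subjects given beta carotene compared with those not given it, and that both groups were followed for the same period, is:

```latex
\[
\text{Incidence (beta carotene)} = \frac{3}{6000} = 0.0005,
\qquad
\text{Incidence (no beta carotene)} = \frac{2}{4000} = 0.0005
\]
\[
\text{Relative risk} = \frac{3/6000}{2/4000} = \frac{0.0005}{0.0005} = 1.0
\]
```

On this reading the incidence is the same in both groups, so these figures show no association between beta carotene and lung cancer.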