Understanding systematic reviews and meta-analysis 845
EVIDENCE BASED CHILD HEALTH 3
Understanding systematic reviews and meta-analysis
A K Akobeng
Arch Dis Child 2005;90:845–848. doi: 10.1136/adc.2004.058230
www.archdischild.com

This review covers the basic principles of systematic reviews and meta-analyses. The problems associated with traditional narrative reviews are discussed, as is the role of systematic reviews in limiting bias associated with the assembly, critical appraisal, and synthesis of studies addressing specific clinical questions. Important issues that need to be considered when appraising a systematic review or meta-analysis are outlined, and some of the terms used in the reporting of systematic reviews and meta-analyses, such as odds ratio, relative risk, confidence interval, and the forest plot, are introduced.

Correspondence to: Dr A K Akobeng, Department of Paediatric Gastroenterology, Central Manchester and Manchester Children's University Hospitals, Booth Hall Children's Hospital, Charlestown Road, Blackley, Manchester, M9 7AA, UK; tony.akobeng@cmmc.nhs.uk

Accepted 22 April 2005

Abbreviations: RCT, randomised controlled trial

Health care professionals are increasingly required to base their practice on the best available evidence. In the first article of the series, I described basic strategies that could be used to search the medical literature.1 After a literature search on a specific clinical question, many articles may be retrieved. The quality of the studies may be variable, and the individual studies might have produced conflicting results. It is therefore important that health care decisions are not based solely on one or two studies without account being taken of the whole range of research information available on that topic.

Health care professionals have always used review articles as a source of summarised evidence on a particular topic. Review articles in the medical literature have traditionally been in the form of "narrative reviews", in which experts in a particular field provide what is supposed to be a "summary of evidence" in that field. Narrative reviews, although still very common in the medical field, have been criticised because of their high risk of bias, and "systematic reviews" are preferred.2 Systematic reviews apply scientific strategies in ways that limit bias to the assembly, critical appraisal, and synthesis of relevant studies that address a specific clinical question.2

THE PROBLEM WITH TRADITIONAL REVIEWS
The validity of a review article depends on its methodological quality. While traditional review articles or narrative reviews can be useful when conducted properly, there is evidence that they are usually of poor quality. Authors of narrative reviews often use informal, subjective methods to collect and interpret studies and tend to be selective in citing reports that reinforce their preconceived ideas or promote their own views on a topic.3 4 They are also rarely explicit about how they selected, assessed, and analysed the primary studies, which prevents readers from assessing potential bias in the review process. Narrative reviews are therefore often biased, and the recommendations made may be inappropriate.5

WHAT IS A SYSTEMATIC REVIEW?
In contrast to a narrative review, a systematic review is a form of research that provides a summary of medical reports on a specific clinical question, using explicit methods to search, critically appraise, and synthesise the world literature systematically.6 It is particularly useful in bringing together a number of separately conducted studies, sometimes with conflicting findings, and synthesising their results.

By providing a clear and explicit summary of all the studies addressing a specific clinical question,4 systematic reviews allow us to take account of the whole range of relevant findings from research on a particular topic, and not just the results of one or two studies. Other advantages of systematic reviews have been discussed by Mulrow.7 They can be used to establish whether scientific findings are consistent and generalisable across populations, settings, and treatment variations, or whether findings vary significantly by particular subgroups. Moreover, the explicit methods used in systematic reviews limit bias and, hopefully, will improve the reliability and accuracy of conclusions. For these reasons, systematic reviews of randomised controlled trials (RCTs) are considered to be evidence of the highest level in the hierarchy of research designs evaluating effectiveness of interventions.8

METHODOLOGY OF A SYSTEMATIC REVIEW
The need for rigour in the preparation of a systematic review means that there should be a formal process for its conduct. Figure 1 summarises the process for conducting a systematic review of RCTs.9 This includes a comprehensive, exhaustive search for primary studies on a focused clinical question, selection of studies using clear and reproducible eligibility criteria, critical appraisal of primary studies for quality, and synthesis of results according to a predetermined and explicit method.3 9

WHAT IS A META-ANALYSIS?
Following a systematic review, data from individual studies may be pooled quantitatively and reanalysed using established statistical methods.10 This technique is called meta-analysis. The
rationale for a meta-analysis is that, by combining the samples of the individual studies, the overall sample size is increased, thereby improving the statistical power of the analysis as well as the precision of the estimates of treatment effects.11

State objectives of the review and outline eligibility criteria
↓
Comprehensively search for trials that seem to meet eligibility criteria
↓
Tabulate characteristics of each trial identified and assess its methodological quality
↓
Apply eligibility criteria and justify any exclusions
↓
Assemble the most comprehensive dataset feasible
↓
Analyse results of eligible RCTs using statistical synthesis of data (meta-analysis) if appropriate and possible
↓
Compare alternative analyses if appropriate and possible
↓
Prepare a critical summary of the review, stating aims, describing materials, and reporting results
Figure 1 Methodology for a systematic review of randomised controlled trials.9

Meta-analysis is a two stage process.12 The first stage involves the calculation of a measure of treatment effect with its 95% confidence interval (CI) for each individual study. The summary statistics usually used to measure treatment effect include odds ratios (OR), relative risks (RR), and risk differences.

In the second stage of meta-analysis, an overall treatment effect is calculated as a weighted average of the individual summary statistics. Readers should note that, in meta-analysis, data from the individual studies are not simply combined as if they were from a single study. Greater weights are given to the results from studies that provide more information, because they are likely to be closer to the "true effect" we are trying to estimate. The weights are often the inverse of the variance (the square of the standard error) of the treatment effect, which relates closely to sample size.12 The typical graph for displaying the results of a meta-analysis is called a "forest plot".13

The forest plot
The plot shows, at a glance, information from the individual studies that went into the meta-analysis, and an estimate of the overall results. It also allows a visual assessment of the amount of variation between the results of the studies (heterogeneity). Figure 2 shows a typical forest plot. This figure is adapted from a recent systematic review and meta-analysis which examined the efficacy of probiotics compared with placebo in the prevention and treatment of diarrhoea associated with the use of antibiotics.14

Description of the forest plot
In the forest plot shown in fig 2, the results of nine studies have been pooled. The names on the left of the plot are the first authors of the primary studies included. The black squares represent the odds ratios of the individual studies, and the horizontal lines their 95% confidence intervals. The area of each black square reflects the weight that trial contributes to the meta-analysis. The 95% confidence intervals would contain the true underlying effect on 95% of occasions if the study were repeated again and again. The solid vertical line corresponds to no effect of treatment (OR = 1.0). If the CI includes 1, then the difference in the effect of experimental and control treatment is not significant at conventional levels (p>0.05).15 The overall treatment effect (calculated as a weighted average of the individual ORs) from the meta-analysis and its CI are shown at the bottom as a diamond. The centre of the diamond represents the combined treatment effect (0.37), and the horizontal tips represent the 95% CI (0.26 to 0.52). If the diamond shape is on the Left of the line of no effect, then Less of the outcome of interest (fewer episodes) is seen in the treatment group. If the diamond shape is on the Right of the line, then moRe episodes of the outcome of interest are seen in the treatment group. In fig 2, the diamond shape is on the left of the line of no effect, meaning that less diarrhoea (fewer episodes) was seen in the probiotic group than in the placebo group. If the diamond touches the line of no effect (where the OR is 1), then there is no statistically significant difference between the groups being compared. In fig 2, the diamond shape does not touch the line of no effect (that is, the confidence interval for the odds ratio does not include 1), which means that the difference found between the two groups was statistically significant.

APPRAISING A SYSTEMATIC REVIEW WITH OR WITHOUT META-ANALYSIS
Although systematic reviews occupy the highest position in the hierarchy of evidence for articles on effectiveness of interventions,8 it should not be assumed that a study is valid merely because it is stated to be a systematic review. As with RCTs, the main issues to consider when appraising a systematic review can be condensed into three important areas8:
• The validity of the trial methodology.
• The magnitude and precision of the treatment effect.
• The applicability of the results to your patient or population.

Box 1 shows a list of 10 questions that may be used to appraise a systematic review in all three areas.16

ASSESSING THE VALIDITY OF TRIAL METHODOLOGY
Focused research question
As in all research reports, the authors should state the research question clearly at the outset. The research question should include the relevant population or patient groups being studied, the intervention of interest, any comparators (where relevant), and the outcomes of interest. Keywords from the research question and their synonyms are usually used to identify studies for inclusion in the review.

Types of studies included in the review
The validity of a systematic review or meta-analysis depends heavily on the validity of the studies included. The authors should explicitly state the type of studies they have included in their review, and readers of such reports should decide whether the included studies have the appropriate study design to answer the clinical question. In a recent systematic review which determined the effects of glutamine supplementation on morbidity and weight gain in preterm babies, the investigators based their review only on RCTs.17
Study        Odds ratio (95% CI)    Weight (%)
Surawicz     0.37 (0.16 to 0.88)    15.1
McFarland    0.46 (0.18 to 1.18)    12.1
Lewis        1.67 (0.47 to 5.89)     3.5
Adam         0.22 (0.10 to 0.48)    29.9
Tankanow     0.88 (0.22 to 3.52)     3.9
Vanderhoof   0.23 (0.09 to 0.56)    21.2
Orrhage      0.58 (0.07 to 4.56)     2.2
Wunderlich   0.25 (0.05 to 1.43)     5.2
Gotz         0.34 (0.09 to 1.38)     7.0
Overall      0.37 (0.26 to 0.52)

[Forest plot of the odds ratios above on a log scale from 0.01 to 10, line of no effect at 1; left of the line "Favours treatment", right of the line "Favours control"]
Figure 2 Effect of probiotics on the risk of antibiotic associated diarrhoea.14
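The two-stage calculation and the diamond described for fig 2 can be approximately reproduced from the printed numbers. The Python sketch below is an illustration only, not the method used in the source meta-analysis: it assumes each study's 95% CI was computed symmetrically on the log odds scale (so a standard error can be recovered from the interval width), pools with a fixed-effect inverse-variance weighted average, and adds Cochran's Q and I² as one common numerical check of heterogeneity. The article does not say which pooling method or heterogeneity test was used, and all function and variable names are illustrative.

```python
import math

# Study-level odds ratios and 95% CIs as printed in figure 2
# (name, OR, lower limit, upper limit).
FIGURE_2 = [
    ("Surawicz", 0.37, 0.16, 0.88),
    ("McFarland", 0.46, 0.18, 1.18),
    ("Lewis", 1.67, 0.47, 5.89),
    ("Adam", 0.22, 0.10, 0.48),
    ("Tankanow", 0.88, 0.22, 3.52),
    ("Vanderhoof", 0.23, 0.09, 0.56),
    ("Orrhage", 0.58, 0.07, 4.56),
    ("Wunderlich", 0.25, 0.05, 1.43),
    ("Gotz", 0.34, 0.09, 1.38),
]

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, lower, upper):
    """Stage 1: log OR and its standard error recovered from a 95% CI.

    Assumption: the CI was symmetric on the log scale, so its width
    on that scale equals 2 * Z95 * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(odds_ratio), se


def pool_fixed_effect(studies):
    """Stage 2: inverse-variance weighted average of the log ORs."""
    estimates = [log_or_and_se(or_, lo, hi) for _, or_, lo, hi in studies]
    weights = [1 / se ** 2 for _, se in estimates]  # weight = 1 / variance
    total = sum(weights)
    pooled_log = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled_log) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(studies) - 1
    # I^2: share of the variation beyond what chance alone would give.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(pooled_log),
        "ci": (math.exp(pooled_log - Z95 * pooled_se),
               math.exp(pooled_log + Z95 * pooled_se)),
        "weights_pct": [100 * w / total for w in weights],
        "q": q,
        "df": df,
        "i_squared": i_squared,
    }


def includes_no_effect(ci):
    """True if a ratio CI contains 1, i.e. no significant difference."""
    lower, upper = ci
    return lower <= 1.0 <= upper


if __name__ == "__main__":
    result = pool_fixed_effect(FIGURE_2)
    lo, hi = result["ci"]
    print(f"Pooled OR {result['or']:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    print("CI includes 1:", includes_no_effect(result["ci"]))
    print(f"Q = {result['q']:.1f} on {result['df']} df, "
          f"I^2 = {100 * result['i_squared']:.0f}%")
    for (name, _, _, _), w in zip(FIGURE_2, result["weights_pct"]):
        print(f"{name:<11} weight {w:.1f}%")
```

Run as a script, this should give a pooled odds ratio close to, but not identical with, the published 0.37 (0.26 to 0.52), with a confidence interval that excludes 1. The printed weights will not match the Weight (%) column of figure 2, which was presumably calculated with a different pooling method (for example Mantel-Haenszel). As the article notes, a heterogeneity statistic from only nine studies has limited power.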
Box 1: Questions to consider when appraising a systematic review16
• Did the review address a clearly focused question?
• Did the review include the right type of study?
• Did the reviewers try to identify all relevant studies?
• Did the reviewers assess the quality of all the studies included?
• If the results of the study have been combined, was it reasonable to do so?
• How are the results presented and what are the main results?
• How precise are the results?
• Can the results be applied to your local population?
• Were all important outcomes considered?
• Should practice or policy change as a result of the evidence contained in this review?

Search strategy used to identify relevant articles
There is evidence that single electronic database searches lack sensitivity and that relevant articles may be missed if only one database is searched. Dickersin et al showed that only 30–80% of all known published RCTs were identifiable using Medline.18 Even if relevant records are in a database, it can be difficult to retrieve them. A comprehensive search is therefore important, not only to ensure that as many studies as possible are identified but also to minimise selection bias among those that are found. Relying exclusively on one database may retrieve a set of studies that is unrepresentative of all the studies that would have been identified through a comprehensive search of multiple sources. Therefore, to retrieve all relevant studies on a topic, several different sources should be searched to identify relevant studies (published and unpublished), and the search strategy should not be limited to the English language. The aim of an extensive search is to avoid the problem of publication bias, which occurs when trials with statistically significant results are more likely to be published and cited, and are preferentially published in English language journals and in those indexed in Medline.

In the systematic review referred to above, which examined the effects of glutamine supplementation on morbidity and weight gain in preterm babies, the authors searched the Cochrane controlled trials register, Medline, and Embase,17 and they also hand searched selected journals, cross referencing where necessary from other publications.

Quality assessment of included trials
The reviewers should state a predetermined method for assessing the eligibility and quality of the studies included. At least two reviewers should independently assess the quality of the included studies to minimise the risk of selection bias. There is evidence that using at least two reviewers has an important effect on reducing the possibility that relevant reports will be discarded.19

Pooling results and heterogeneity
If the results of the individual studies were pooled in a meta-analysis, it is important to determine whether it was reasonable to do so. A clinical judgement should be made about whether it was reasonable to combine the studies, based on whether the individual trials differed considerably in the populations studied, the interventions and comparisons used, or the outcomes measured.

The statistical validity of combining the results of the various trials should be assessed by looking for homogeneity of the outcomes from the various trials. In other words, there should be some consistency in the results of the included trials. One way of doing this is to inspect the graphical display of results of the individual studies (forest plot, see above), looking for similarities in the direction of the results. When the results differ greatly in their direction, that is, if there is significant heterogeneity, then it may not be wise to pool the results. Some articles may also report a statistical test for heterogeneity, but it should be noted that the statistical power of many meta-analyses is too low to allow heterogeneity to be detected by statistical tests. If a review finds significant heterogeneity among the included studies, the authors should attempt to offer explanations for potential sources of the heterogeneity.

Magnitude of the treatment effect
Common measures used to report the results of meta-analyses include the odds ratio, relative risk, and mean difference. If the outcome is binary (for example, disease v no disease, remission v no remission), odds ratios or relative risks are used. If the outcome is continuous (for example, blood pressure measurement), mean differences may be used.
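The binary-outcome measures named under "Magnitude of the treatment effect" can be computed directly from a study's 2×2 table. The Python sketch below is a minimal illustration, assuming log-scale standard errors for the 95% confidence intervals (the Woolf formula for the odds ratio and the usual Katz formula for the relative risk) and no zero cells; the class and function names are hypothetical. It reuses the acne example worked through in the next section (6 of 10 treated v 3 of 10 control patients achieving resolution) and a hypothetical rarer version of it (6 of 100 v 3 of 100).

```python
import math
from dataclasses import dataclass

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


@dataclass
class TwoByTwo:
    """Counts for one study: patients reaching the end point, and arm totals."""
    treat_events: int
    treat_total: int
    control_events: int
    control_total: int


def odds_ratio(t: TwoByTwo):
    """Odds ratio with a Woolf (log-scale) 95% CI; assumes no zero cells."""
    a, b = t.treat_events, t.treat_total - t.treat_events
    c, d = t.control_events, t.control_total - t.control_events
    or_ = (a / b) / (c / d)  # odds in treatment group / odds in control group
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (or_,
            math.exp(math.log(or_) - Z95 * se),
            math.exp(math.log(or_) + Z95 * se))


def relative_risk(t: TwoByTwo):
    """Relative risk with a log-scale 95% CI; assumes events in both arms."""
    a, n1 = t.treat_events, t.treat_total
    c, n2 = t.control_events, t.control_total
    rr = (a / n1) / (c / n2)  # risk in treatment group / risk in control group
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return (rr,
            math.exp(math.log(rr) - Z95 * se),
            math.exp(math.log(rr) + Z95 * se))


if __name__ == "__main__":
    acne = TwoByTwo(6, 10, 3, 10)  # acne example from the text
    or_, lo, hi = odds_ratio(acne)
    print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    rr, rlo, rhi = relative_risk(acne)
    print(f"RR {rr:.2f} (95% CI {rlo:.2f} to {rhi:.2f})")
    rare = TwoByTwo(6, 100, 3, 100)  # hypothetical: same events, rarer outcome
    print(f"Rare outcome: OR {odds_ratio(rare)[0]:.2f} v RR {relative_risk(rare)[0]:.2f}")
```

The acne counts reproduce the odds ratio of 3.5 and relative risk of 2 given in the text; with only 10 patients per group both intervals include 1, so on the rule described in this article no statistically significant difference would be demonstrated. With the rarer outcome the odds ratio (about 2.06) is close to the relative risk (2.0), as expected when the end point is uncommon.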
ODDS RATIOS AND RELATIVE RISKS
Odds and odds ratio
The odds for a group is defined as the number of patients in the group who achieve the stated end point divided by the number of patients who do not. For example, the odds of acne resolution during treatment with an antibiotic in a group of 10 patients may be 6 to 4 (6 with resolution of acne divided by 4 without = 1.5); in a control group of 10 patients the odds may be 3 to 7 (0.43). The odds ratio, as the name implies, is a ratio of two odds. It is simply defined as the ratio of the odds of the treatment group to the odds of the control group. In our example, the odds ratio of treatment to control group would be 3.5 (1.5 divided by 0.43).

Risk and relative risk
Risk, as opposed to odds, is calculated as the number of patients in the group who achieve the stated end point divided by the total number of patients in the group. Risk ratio or relative risk is a ratio of two "risks". In the example above the risks would be 6 in 10 in the treatment group (6 divided by 10 = 0.6) and 3 in 10 in the control group (0.3), giving a risk ratio, or relative risk, of 2 (0.6 divided by 0.3).

Interpretation of odds ratios and relative risk
An odds ratio or relative risk greater than 1 indicates increased likelihood of the stated outcome being achieved in the treatment group. If the odds ratio or relative risk is less than 1, there is a decreased likelihood in the treatment group. A ratio of 1 indicates no difference; that is, the outcome is just as likely to occur in the treatment group as in the control group.11 As with all estimates of treatment effect, odds ratios or relative risks reported in meta-analysis should be accompanied by confidence intervals.

Readers should understand that the odds ratio will be close to the relative risk if the end point occurs relatively infrequently, say in less than 20%.15 If the outcome is more common, then the odds ratio will be considerably further from 1 than the relative risk (in the acne example above, 3.5 v 2). The advantages and disadvantages of odds ratios v relative risks in the reporting of the results of meta-analysis have been reviewed elsewhere.12

Precision of the treatment effect: confidence intervals
As stated earlier, confidence intervals should accompany estimates of treatment effects. I discussed the concept of confidence intervals in the second article of the series.8 Ninety five per cent confidence intervals are commonly reported, but other intervals such as 90% or 99% are also sometimes used. The 95% CI of an estimate (for example, of odds ratios or relative risks) will be the range within which we are 95% certain that the true population treatment effect will lie. The width of a confidence interval indicates the precision of the estimate: the wider the interval, the less the precision. A very wide interval makes us less sure about the accuracy of a study in predicting the true size of the effect. If the confidence interval for the relative risk or odds ratio of an estimate includes 1, then we have been unable to demonstrate a statistically significant difference between the groups being compared; if it does not include 1, then we say that there is a statistically significant difference.

APPLICABILITY OF RESULTS TO PATIENTS
Health care professionals should always make judgements about whether the results of a particular study are applicable to their own patient or group of patients. Some of the issues that one needs to consider before deciding whether to incorporate a particular piece of research evidence into clinical practice were discussed in the second article of the series.8 These include similarity of the study population to your population, benefit v harm, patients' preferences, availability, and costs.

CONCLUSIONS
Systematic reviews apply scientific strategies to provide an explicit summary of all studies addressing a specific question, thereby allowing the whole range of relevant findings on a particular topic to be taken into account. Meta-analysis, which may accompany a systematic review, can increase the power and precision of estimates of treatment effects. People working in the field of paediatrics and child health should understand the fundamental principles of systematic reviews and meta-analyses, including the ability to apply critical appraisal not only to the methodologies of review articles, but also to the applicability of the results to their own patients.

Competing interests: none declared

REFERENCES
1 Akobeng AK. Evidence based child health 1. Principles of evidence based medicine. Arch Dis Child 2005;90:837–40.
2 Cook DJ, Mulrow CD, Haynes RB. Systematic reviews: synthesis of best evidence for clinical decisions. Ann Intern Med 1997;126:376–80.
3 Pai M, McCulloch M, Gorman JD, et al. Systematic reviews and meta-analyses: an illustrated, step-by-step guide. Natl Med J India 2004;17:86–95.
4 McGovern DPB. Systematic reviews. In: McGovern DPB, Valori RM, Summerskill WSM, eds. Key topics in evidence based medicine. Oxford: BIOS Scientific Publishers, 2001:17–9.
5 McAlister FA, Clark HD, van Walraven C, et al. The medical review article revisited: has the science improved? Ann Intern Med 1999;131:947–51.
6 Sackett DL, Straus SE, Richardson WS, et al. Evidence-based medicine: how to practice and teach EBM. London: Churchill Livingstone, 2000.
7 Mulrow CD. Systematic reviews: rationale for systematic reviews. BMJ 1994;309:597–9.
8 Akobeng AK. Evidence based child health 2. Understanding randomised controlled trials. Arch Dis Child 2005;90:840–4.
9 Greenhalgh T. How to read a paper: papers that summarise other papers (systematic reviews and meta-analyses). BMJ 1997;315:672–5.
10 Muir Gray JA. Evidence based healthcare. How to make health policy and management decisions. London: Churchill Livingstone, 2001:125–6.
11 Lang TA, Secic M. How to report statistics in medicine. Philadelphia: American College of Physicians, 1997.
12 Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Smith GD, Altman DG, eds. Systematic reviews in healthcare: meta-analysis in context. London: BMJ Publishing Group, 2001:285–312.
13 Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001;322:1479–80.
14 D'Souza AL, Rajkumar C, Cooke J, et al. Probiotics in prevention of antibiotic associated diarrhoea: meta-analysis. BMJ 2002;324:1361.
15 Egger M, Smith GD, Phillips AN. Meta-analysis: principles and procedures. BMJ 1997;315:1533–7.
16 Critical Appraisal Skills Programme. Appraisal Tools. Oxford, UK. http://www.phru.nhs.uk/casp/appraisa.htm (accessed 10 Dec 2004).
17 Tubman TRJ, Thompson SW. Glutamine supplementation for prevention of morbidity in preterm infants. The Cochrane Database of Systematic Reviews 2001, Issue 4.
18 Dickersin K, Scherer R, Lefebvre C. Systematic reviews: identifying relevant studies for systematic reviews. BMJ 1994;309:1286–91.
19 Clarke M, Oxman AD, eds. Selecting studies. Cochrane reviewers' handbook 4.2.0 [updated March 2003]. In: The Cochrane library, issue 2. Oxford: Update Software, 2003.